[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)
On May 15, 2020, at 21:35, Steven D'Aprano wrote: > > On Fri, May 15, 2020 at 05:44:59PM -0700, Andrew Barnert wrote: > >> Once you go with a separate view-creating function (or type), do we even >> need the dunder? > > Possibly not. But the beauty of a protocol is that it can work even if > the object doesn't define a `__view__` dunder. Sure, but if there’s no good reason for any class to provide a __view__ dunder, it’s better not to call one. Which is why I asked—in the message you’re replying to—a bunch of questions to try to determine whether there’s any reason for a class to want to provide an override. I’m not going to repeat the whole thing here; it’s all still in that same message you replied to. > - If the object defines `__view__`, call it; this allows objects to > return an optimized view, if it makes sense to them; e.g. bytes > might simply return a memoryview. Not if memoryview doesn’t have the right API, as we discussed earlier in this thread. But more importantly, if it’s only builtins that will likely ever need an optimization, we can do that inside the functions. That’s exactly what we do in hundreds of places already. Even the one optimization that’s exposed as part of the public C API, PySequence_Fast, isn’t hookable, much less all the functions that fast-path directly on the array in list/tuple or on the split hash table in set/dict/dict_keys and so on. It seems to work well enough in practice, and it’s simpler, and faster for the builtins, and it means we don’t have hundreds of extra dunders (and type slots in CPython) that will almost never be used, and PyPy doesn’t need to write hooks that are actually pessimizations just because they’re optimizations in CPython, and so on. Of course there might be a reason that doesn’t apply in this case (there obviously is a good reason for non-builtin types to optimize __contains__, for example), but “there might be” isn’t an answer to YAGNI. 
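To make the tradeoff concrete, here is a rough sketch of the protocol-with-fallback dispatch being debated. All the names here (`view`, `__view__`, `SliceView`) are hypothetical, and a real implementation would fast-path list/tuple in C rather than going through a dunder:

```python
class SliceView:
    """Generic read-only view over any sequence (minimal sketch)."""
    def __init__(self, seq):
        self._seq = seq
    def __len__(self):
        return len(self._seq)
    def __getitem__(self, i):
        return self._seq[i]

def view(seq):
    # Real protocols look the dunder up on the type, not the instance.
    hook = getattr(type(seq), '__view__', None)
    if hook is not None:
        return hook(seq)      # class-provided optimized view
    return SliceView(seq)     # generic fallback for everything else

v = view([1, 2, 3])
```

The question in the thread is whether that `hook is not None` branch would ever be taken by anything other than builtins, which the function could just special-case internally.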
Especially if we can add the dunder later if someone finds a need for it. And honestly, I’m not sure even list and tuple are worth optimizing here. After all, you can’t do the index arithmetic and the call to sq_item significantly faster than a generic C function; it only helps if you can avoid the call to sq_item, and I think we can’t do that in any of the most useful cases (at least not without patching up a whole lot more code than we want). But I’ll try it and see if I’m wrong.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/JWBWCVKBBZMKGGMR6UQDP5ZII4NN6IWM/
Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)
On May 15, 2020, at 21:25, Steven D'Aprano wrote:
> On Fri, May 15, 2020 at 01:00:09PM -0700, Christopher Barker wrote:
>> I know you winked there, but frankly, there isn't a clear most Pythonic API here. Surely you don't think Python should have no methods?
>
> That's not what I said. Of course Python should have methods -- it's an OOP language after all, and it's pretty hard to have objects unless they have behaviour (methods). Objects with no behaviour are just structs.
>
> But seriously, and this time no winking, Python's design philosophy is very different from that of Java and even Ruby, and protocols are a hugely important part of that. Python without protocols wouldn't be Python, and it would be a much lesser language.
>
> [Aside: despite what the Zen says, I think *protocols* are far more important to Python than *namespaces*.]

I agree up to this point. But what you’re missing is that Python (even with stdlib stuff like pickle/copy and math.floor) has only a couple dozen protocols, and hundreds and hundreds of methods. Some things should be protocols, but not everything should, or even close. Very few things should be protocols. More to the point, things should be protocols if and only if they have a specific reason to be a protocol. For example:

1. You need something more complicated than just a single straightforward call, like the fallback behavior for __contains__ and __iter__ with “old-style sequences”, or the whole pickle __getnewargs_ex__ family and friends, or __add__ vs. __radd__.
2. Syntax, especially operator overloading, like __contains__ and __add__.
3. The function is so ubiquitously important that you don’t want anything else using the same name for different meanings, like __len__.

(There are probably other good reasons.) When you have a reason like this, you should design a protocol. But when you don’t, dot syntax is the default.
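Reason 1 is easy to demonstrate: the `in` operator already works through exactly this kind of fallback, iterating via __iter__ when a class defines no __contains__:

```python
class Squares:
    """No __contains__ defined; `in` falls back to the __iter__ protocol."""
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return (i * i for i in range(self.n))

print(9 in Squares(5))   # True -- found by iterating
print(8 in Squares(5))   # False
```

That multi-step fallback machinery is what makes __contains__ worth being a protocol rather than a plain method.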
And it’s not just complexity, or “too many builtins” (after all, pickle.dump and math.ceil aren’t builtins). It’s that dot syntax gives you built-in disambiguation that function call syntax doesn’t. If I have a sequence, xs.index(x) has an obvious meaning. But index(xs, x) would not, because index means too many different things (in fact, we already have an __index__ protocol that does one of those different things), and it’s not like len, where one of those meanings is so fundamental that we actually want to discourage all the others. As I said elsewhere, I think we probably can’t have dot syntax in this case for other reasons. But that _still_ doesn’t necessarily mean we need a protocol. If we need to be able to override behavior but we can’t have dot syntax, *that* might be a good reason for a protocol, but either of those on its own is not a good reason, only the combination. It’s worth comparing C++, where “free functions are part of a class’s interface”. They don’t spell their protocols with underscores, or call them protocols, but the idea is all over the place. x+y tries x.operator+(y) plus various fallbacks. The way you get an iterator is begin(xs), which by default calls xs.begin(), so that’s the standard place to customize it, but there are fallbacks. Converting a C to a D tries (among other things) both C::operator D() and D::D(C). And so on. But, unlike Python, they don’t try to distinguish what is and isn’t a protocol; the dogma is basically that everything should be a protocol if it possibly can be. Which doesn’t work. They keep trying to solve the compiler-ambiguity problem by adding features like argument-dependent lookup, and almost adding D’s uniform call syntax every 3 years, but none of that will ever solve the human-ambiguity problem.
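The name collision is concrete: Python already reserves the name for a protocol with a completely different meaning, exposed through `operator.index`:

```python
import operator

xs = [10, 20, 30]
print(xs.index(20))          # 1 -- position of a value in this sequence
print(operator.index(True))  # 1 -- the __index__ protocol: lossless
                             #      conversion of an object to an int
```

The method spelling keeps the two meanings apart for free; a top-level index() function would have to pick one and confuse users of the other.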
Things like + and begin and swap belong at the top level because they should always mean the same thing even if they have to be implemented differently, but things like draw should be methods because they mean totally different things on different types, and even if the compiler can tell which one is meant, even if an IDE can help you, deck.draw(5) vs. shape.draw(ctx) is still more readable than draw(deck, 5) vs. draw(shape, ctx). Ultimately, it’s just as bad as Java; it just goes too far in the opposite direction, which is still too far, and that’s what always happens when you’re looking for a perfect and simple dogma that applies to both iter and index so you never have to think about design. > Python tends to use protocol-based top-level functions: > > len, int, str, repr, bool, iter, list > > etc are all based on *protocols*, not inheritance. > The most notable > counter-example to that was `iterator.next` which turned out to be a > mistake and was changed in Python 3 to become a protocol based on a > dunder. No, the most notable counter examples are things like insert, extend, index, count, etc. on sequences; keys, items, update, setdefault, etc. on mappings; add, isdisjoint, etc. on sets; real, imag, etc. on numbers;
[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)
On May 15, 2020, at 18:21, Christopher Barker wrote:
> Hmm, more thought needed.

Speaking of “more thought needed”, I took a first pass over cleaning up my quick slice view class and adding the slicer class, and found some bikesheddable options. I think in most cases the best answer is obvious, but I’ve been wrong before. :)

Assume s and t are Sequences of the same type, u is a Sequence of a different type, and vs, vt, and vu are view slices on those sequences. Also assume that we call the view slicer type vslice, and the view slice type SliceView, although obviously those are up for bikeshedding.

When s==t is allowed, is vs==vt? What about vs==t? Same for <, etc.? I think yes, yes, yes.

When s is hashable, is vs hashable? If so, is it the same hash an equivalent copy-slice would have? The answer to == constrains the answer here, of course. I think they can just not be hashable, but it’s a bit weird to have an immutable builtin sequence that isn’t. (Maybe hash could be left out but then added in a future version if there’s a need?)

When s+t is allowed, is vs+t? vs+vt? (Similarly when s+u is allowed, but that usually isn’t.) vs*3? I think all yes, but I’m not sure. (Imagine you create a million view slices but filter them down to just 2, and then concatenate those two. That makes sense, I think.)

Should there be a way to ask vs for the corresponding regular copy slice? Like vslice(s)[10:].strictify() == s[10:]? I’m not sure what it’s good for, but either __hash__ or __add__ seems to imply a private method for this, and then I can’t see any reason to prevent people from calling it. (Except that I can’t think of a good name.)

Should the underlying sequence be a public attribute? It seems easy and harmless and potentially useful, and memoryview has .obj (although dict views don’t have a public reference to the dict). What about the original slice object? This seems less useful, since you don’t pass around slice objects that often.
And we may not actually be storing it. (The simplest solution is to store slice.indices(len(seq)) instead of slice.) So I think no.

If s isn’t a Sequence, should vslice(s) be a TypeError? I think we want the C API sequence check, but not the full ABC check.

What does vslice(s)[1] do? I think TypeError('not a slice').

Does the vslice type need any other methods besides __new__ and __getitem__? I don’t think so. The only use for vslice(s) besides slicing it is stashing it to be sliced later, just like the only use for a method besides calling it is stashing it to be called later. But it should have the sequence as a public attribute for debugging/introspection, just like methods make their self and function attributes public.

Is the SliceView type public? (Only in types?) Or is “what the vslice slicer factory creates” an implementation detail, like list_iter? I think the latter.

What’s the repr for a SliceView? Something like vslice([1, 2, 10, 20])[::2] seems most useful, since that’s the way you construct it, even if it is a bit unusual. Although a tiny slice of a giant sequence would then have a giant repr. What’s the str? I think same as the repr, but will people expect a view of a list/tuple/etc. to look “nice” like list/tuple/etc. do?

Does vs[:] return self? (And, presumably, vs[0:len(s)+100] and so on.) I think so, but that doesn’t need to be guaranteed (just like tuple, range, etc.).

If vs is an instance of a subclass of SliceView, is vs[10:20] a SliceView, or an instance of the subclass? I think the base class, just like tuple, etc.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/X45QVVPMB5JOQDKI7OEV4JAQ7WMA4XHO/
Code of Conduct: http://python.org/psf/codeofconduct/
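To make the questions above concrete, here is a minimal sketch of the vslice/SliceView pair. Every choice in it (storing the result of slice.indices() instead of the slice object, TypeError for non-slice subscripts, re-slicing returning the base class) is one possible answer from the list, not a decision:

```python
class SliceView:
    """Read-only view over a slice of a sequence (illustrative sketch)."""
    def __init__(self, seq, start, stop, step):
        self.obj = seq                       # public, like memoryview.obj
        self._start, self._stop, self._step = start, stop, step
    def _range(self):
        # The stored indices double as the index arithmetic, via range.
        return range(self._start, self._stop, self._step)
    def __len__(self):
        return len(self._range())
    def __getitem__(self, i):
        if isinstance(i, slice):
            # Re-slicing composes the arithmetic; always returns the base class.
            r = self._range()[i]
            return SliceView(self.obj, r.start, r.stop, r.step)
        return self.obj[self._range()[i]]    # delegate element access
    def __repr__(self):
        return f'vslice({self.obj!r})[{self._start}:{self._stop}:{self._step}]'

class vslice:
    """Slicer factory: the only thing you can do with it is slice it."""
    def __init__(self, seq):
        self.obj = seq                       # public, for introspection
    def __getitem__(self, s):
        if not isinstance(s, slice):
            raise TypeError('not a slice')
        # Normalize immediately rather than storing the slice object.
        return SliceView(self.obj, *s.indices(len(self.obj)))

vs = vslice([1, 2, 10, 20])[::2]
print(list(vs))    # [1, 10]
```

Using range objects for the index arithmetic keeps the sketch short and also answers the re-slicing question for free, since slicing a range already composes start/stop/step correctly.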
[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)
On May 15, 2020, at 18:21, Christopher Barker wrote:
>> On Fri, May 15, 2020 at 5:45 PM Andrew Barnert wrote:
>> On May 15, 2020, at 13:03, Christopher Barker wrote:
>> > Taking all that into account, if we want to add "something" to Sequence behavior (in this case a sequence_view object), then adding a dunder is really the only option -- you'd need a really compelling reason to add a Sequence method, and since there are quite a few folks that think that's the wrong approach anyway, we don't have a compelling reason.
>> >
>> > So IF a sequence_view is to be added, then a dunder is really the only option.
>>
>> Once you go with a separate view-creating function (or type), do we even need the dunder?
>
> Indeed -- maybe not. We'd need a dunder if we wanted to make it an "official" part of the Sequence protocol/ABC, but as you point out there may be no need to do that at all.

That’s actually what triggered this thought. We need collections.abc.Sequence to support the dunder with a default implementation so code using it as a mixin works. What would that default implementation be? Basically just a class whose __getitem__ constructs the thing I posted earlier and that does nothing else. And why would anyone want to override that default? Being able to override dunders like __contains__ and regular methods like count is useful for multiple reasons: a string-like class needs to extend their behavior for substring searching, a range-like class can implement them without searching at all, etc. But none of those seemed to apply to overriding __viewslice__ (or whatever we’d call it).

> Hmm, more thought needed.

Yeah, certainly just because I couldn’t think of a use doesn’t mean there isn’t one.
But if I’m right that the dunder could be retrofitted in later (I want to try building an implementation without the dunder and then retrofitting one in along with a class that overrides it, if I get the time this weekend, to verify that it really isn’t a problem), that seems like a much better case for leaving it out. Another point: now that we’re thinking generic function (albeit maybe a C builtin with fast-path code for list/tuple), maybe it’s worth putting an implementation on PyPI as soon as possible, so we can get some experience using it and make sure the design doesn’t have any unexpected holes and, if we’re lucky, get some uptake from people outside this thread.___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/VSGQLYF6B25BB6KLZALMYST7IQWMVI3I/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Improve handling of Unicode quotes and hyphens
On May 14, 2020, at 20:01, Stephen J. Turnbull wrote:
> Executive summary:
>
> AFAICT, my guess at what's going on in the C tokenizer was exactly right. It greedily consumes as many non-operator, non-whitespace characters as possible, then validates.

Well, it looks like it’s not quite “non-operator, non-whitespace characters”, but rather “ASCII identifier or non-ASCII characters”:

>     (c >= 'a' && c <= 'z')\
>     || (c >= 'A' && c <= 'Z')\
>     || c == '_'\
>     || (c >= 128))

(That’s the initial char rule; the continuing char rule is similar but of course allows digits.) So it won’t treat a $ or a ^G as potentially part of an identifier, so the caret will show up in the right place for one of those, but it will treat an emoji as potentially part of an identifier, so (if that emoji is immediately followed by legal identifier characters, ASCII or otherwise) the caret will show up too far to the right.

I’m still glad the Python tokenizer doesn’t do this (because, as I said, I’ve relied on the documented behavior in import hooks for playing around with Python, and they use the Python tokenizer), but that doesn’t matter for the C tokenizer, because its output is not public, it’s only seen by the parser. And I think you can prove that the error caret placement is the only thing that could be affected by this shortcut.[1] And if it makes the tokenizer faster, or just simpler to maintain, that could easily be worth it. (At least until one of those periodic “Python should add this Unicode operator” proposals actually gets some traction, but I don’t see that as likely any time soon.)

—
[1] Python only allows non-ASCII characters in identifiers, strings, and comments.
Therefore, any string of characters that should be tokenized as a sequence of 1 ERRORTOKEN followed by 0 or more NAME and ERRORTOKEN tokens by the documented rule (and the Python code) will still give you a sequence of 1 ERRORTOKEN followed by 0 or more NAME and ERRORTOKEN tokens by the C code, just not necessarily the same such sequence. And any such sequence will be parsed as a SyntaxError pointing at the end of the initial ERRORTOKEN. So, the caret might be somewhere else within that block of identifier and non-ASCII characters, but it will be somewhere within that block. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/LPLKLECRRW2UEONMN6RAROU5HKKQC6XO/ Code of Conduct: http://python.org/psf/codeofconduct/
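The quoted C condition transliterates directly to Python; this is just an illustrative per-character predicate, not the actual tokenizer code (which works a character at a time in C):

```python
def maybe_ident_start(ch):
    # Transliteration of the quoted C test: ASCII letter, underscore,
    # or any non-ASCII character (c >= 128) is greedily consumed.
    return ('a' <= ch <= 'z') or ('A' <= ch <= 'Z') or ch == '_' or ord(ch) >= 128

print(maybe_ident_start('$'))           # False -- caret lands in the right place
print(maybe_ident_start('\u00e9'))      # True  -- legal identifier char
print(maybe_ident_start('\U0001F40D'))  # True  -- emoji greedily consumed,
                                        #          caret can drift right
```

Note the test deliberately over-accepts: validation against the real (Unicode-category-based) identifier rules happens afterward, which is exactly the greedy-consume-then-validate behavior described above.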
[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)
On May 15, 2020, at 13:03, Christopher Barker wrote:
> Taking all that into account, if we want to add "something" to Sequence behavior (in this case a sequence_view object), then adding a dunder is really the only option -- you'd need a really compelling reason to add a Sequence method, and since there are quite a few folks that think that's the wrong approach anyway, we don't have a compelling reason.
>
> So IF a sequence_view is to be added, then a dunder is really the only option.

Once you go with a separate view-creating function (or type), do we even need the dunder? I’m pretty sure a generic slice-view-wrapper (that just does index arithmetic and delegates) will work correctly on every sequence type. I won’t promise that the one I posted early in this thread does, of course, and obviously we need a bit more proof than “I’m pretty sure…”, but can anyone think of a way a Sequence could legally work that would break this? And I can’t think of any custom features a Sequence might want to add to its view slices (or its view-slice-making wrapper).

I can definitely see how a custom wrapper for list and tuple could be faster, and imagine how real life code could use it often enough that this matters. But if it’s just list and tuple, CPython’s already full of builtins that fast-path on list and tuple, and there’s no reason this one can’t do the same thing.

So, it seems like it only needs a dunder if there are likely to be third-party classes that can do view-slicing significantly faster than a generic view-slicer, and are used in code where it’s likely to matter. Can anyone think of such a case? (At first numpy seems like an obvious answer. Arrays aren’t Sequences, but I think as long as the wrapper doesn’t actually type-check that at __new__ time they’d work anyway. But why would anyone, especially when they care about speed, use a generic viewslice function on a numpy array instead of just using numpy’s own view slicing?)
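The “generic wrapper” claim is easy to sanity-check. This sketch (all names invented) does only index arithmetic plus delegation, and can be compared against copy-slicing on several builtin sequence types:

```python
class ViewSlice:
    """Generic slice view: index arithmetic once, delegation per access."""
    def __init__(self, seq, sl):
        self._seq = seq
        self._range = range(*sl.indices(len(seq)))  # the arithmetic, done once
    def __len__(self):
        return len(self._range)
    def __getitem__(self, i):
        return self._seq[self._range[i]]            # delegate every access

# The same wrapper works unchanged on list, tuple, range, and str,
# and agrees with ordinary copy-slicing on each.
for seq in ([0, 1, 2, 3, 4], (0, 1, 2, 3, 4), range(5), "01234"):
    v = ViewSlice(seq, slice(1, None, 2))
    assert list(v) == list(seq[1::2])
print('ok')
```

A counterexample would have to be a Sequence whose __getitem__ and __len__ alone don't determine its slicing behavior, which is the question posed above.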
It seems like a dunder is something that could be added as a refinement in the next Python version, if it turns out to be needed. If so, then, unless we have an example in advance to disprove the YAGNI presumption, why not just do it without the dunder? ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/G3L6NP4PWPR2O2VSVXGGJNALYECKDG5G/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)
On May 15, 2020, at 03:50, Steven D'Aprano wrote: > > On Thu, May 14, 2020 at 09:47:36AM -0700, Andrew Barnert wrote: >>> On May 14, 2020, at 03:01, Steven D'Aprano wrote: >>> >> Which is exactly why Christopher said from the start of this thread, >> and everyone else has agreed at every step of the way, that we can’t >> change the default behavior of slicing, we have to instead add some >> new way to specifically ask for something different. > > Which is why I was so surprised that you suddenly started talking about > not being able to insert into a slice of a list rather than a view. We’re talking about slice views. The sentence you quoted and responded to was about the difference between a slice view from a list and a slice view from a string. A slice view from a list may or may not be the same type as a slice view from a tuple (I don’t think there’s a reason to care whether they are or not), but either way, it being immutable will, I think, not surprise anyone. By contrast, a slice view from a string being not stringy _might_ surprise someone. >> Not only that, but whatever gives >> you view-slicing must look sufficiently different that you notice the >> difference—and ideally that gives you something you can look up if you >> don’t know what it means. I think lst.view[10:20] fits that bill. > > Have we forgotten how to look at prior art all of a sudden? Suddenly > been possessed by the spirits of deceased Java and Ruby programmers > intent on changing the look and feel of Python to make it "real object > oriented"? *wink* No, we have remembered that language design is not made up of trivial rules like “functions good, methods bad”, but of understanding the tradeoffs and how they apply in each case. > We have prior art here: > >b'abcd'.memoryview # No, not this. >memoryview(b'abcd') # That's the one. >'abcd'.iter # No, not that either. 
>     iter('abcd')        # That's it
>
> In fairness, I do have to point out that dict views do use a method interface,

This is a secondary issue that I’ll come back to, but first: the whole thing that this started off with is being able to use slicing syntax even when you don’t want a copy. The parallel to the prior art is obvious:

    itertools.islice(seq, 10, 20)    # if you don’t care about iterator or view
    sliceviews.slice(seq, 10, 20)    # if you do

The first one already exists. The second one takes 15 lines of code, which I slapped together and posted near the start of the thread. The only problem is that they don’t solve the problem of “use slicing syntax”. But if that’s the entire point of the proposal (at least for Chris), that’s a pretty big problem.

Now, as we’d already been discussing (and as you quoted), you _could_ have a callable like this:

    viewslice(seq)[10:20]

I can write that in only a few more lines than what I posted before, and it works. But it’s no longer parallel to the prior art. It’s not a function that returns a view, it’s a wrapper object that can be sliced to provide a view. There are pros and cons of this wrapper object vs. the property, but a false parallel with other functions is not one of them.

> 1. Dict views came with a lot of backwards-compatibility baggage; they were initially methods that returned lists; then methods that returned iterators were added, then methods that returned views were added, and finally in 3.x the view methods were renamed and the other six methods were removed.

This is, if anything, a reason they _shouldn’t_ have been methods. Changing the methods from 2.6 to 2.7 to 3.x, and in a way that tools like six couldn’t even help without making all of your code a bit uglier, was bad, and wouldn’t have been nearly as much of a problem if we’d just made them all functions in 2.6. And yet, the reasons for them being methods were compelling enough that they remain methods in 3.x, despite that problem.
That’s how tradeoffs work.

> 2. There is only a single builtin mapping object, dict, not like sequences where there are lists, tuples, range objects, strings, byte strings and bytearrays.

Well, there’s also mappingproxy, which is a builtin even if its name is only visible in types. And there are other mappings in the stdlib, as well as popular third-party libraries like SortedContainers. And they all support these methods. There are some legacy third-party libraries never fully updated for 3.x still out there, but they don’t meet the Mapping protocol or its ABC. So, how does this distinction matter?

Note that there is a nearly opposite argument for the wrapper object that someone already made, one that seems a lot more compelling to me: third-party types. We can’t change them overnight. And some of them might already have an attribute named view, or anything else we might come up with. Those are real negatives with the property design, in a way that “more of the code we _can_ easily change is in the Objects rather than Lib
[Python-ideas] Re: Documenting iterators vs. iterables [was: Adding slice Iterator ...]
On May 14, 2020, at 20:17, Stephen J. Turnbull wrote:
> Andrew Barnert writes:
>
>> Students often want to know why this doesn’t work:
>>
>>     with open("file") as f:
>>         for line in file:
>>             do_stuff(line)
>>         for line in file:
>>             do_other_stuff(line)
>
> Sure. *Some* students do. I've never gotten that question from mine, though I do occasionally see
>
>     with open("file") as f:
>         for line in f:  # ;-)
>             do_stuff(line)
>     with open("file") as f:
>         for line in f:
>             do_other_stuff(line)
>
> I don't know, maybe they asked the student next to them. :-)

Or they got it off StackOverflow or Python-list or Quora or wherever. Those resources really do occasionally work as intended, providing answers to people who search without them having to ask a duplicate question. :)

>> The answer is that files are iterators, while lists are… well, there is no word.
>
> As Chris B said, sure there are words: File objects are *already* iterators, while lists are *not*. My question is, "why isn't that instructive?"

Well, it’s not _completely_ not instructive, it’s just not _sufficiently_ instructive. Language is more useful when the concepts it names carve up the world in the same way you usually think about it. Yes, it’s true that we can talk about “iterables that are not iterators”. But that doesn’t mean there’s no need for a word. We don’t technically need the word “liquid” because we could always talk about “compressibles that are not solid” (or “fluids that are not gas”); we don’t need the word “bird” because we could always talk about “diapsids that are not reptiles”; etc. Theoretically, English could express all the same propositions and questions and so on that it does today without those words. But practically, it would be harder to communicate with. And that’s why we have the words “bird” and “liquid”. And the reason we don’t have a word for all diapsids except birds and turtles is that we don’t need to communicate about that category.
Natural languages get there naturally; jargon sometimes needs help. >> We shouldn’t define everything up front, just the most important >> things. But this is one of the most important things. People need >> to understand this distinction very early on to use Python, > > No, they don't. They neither understand, nor (to a large extent) do > they *need* to. > ISTM that all we need to say is that > > 1. An *iterator* is a Python object whose only necessary function is > to return an object when next is applied to it. Its purpose is to > keep track of "next" for *for*. (It might do other useful things > for the user, eg, file objects.) > > 2. The *for* statement and the *next* builtin require an iterator > object to work. Since for *always* needs an iterator object, it > automatically converts the "in" object to an iterator implicitly. > (Technical note: for the convenience of implementors of 'for', > when iter is applied to an iterator, it always returns the > iterator itself.) I think this is more complicated than people need to know, or usually learn. People use for loops almost from the start, but many people get by with never calling next. All you need is the concept “thing that can be used in a for loop”, which we call “iterable”. Once you know that, everything else in Python that loops is the same as a for loop—the inputs to zip and enumerate are iterables, because they get looped over. “Iterable” is the fundamental concept. Yeah, it sucks that it has such a clumsy word, but at least it has a word. You don’t need the concept “iterator” here, much less need to know that looping uses iterables by calling iter() to get an iterator and then calling next() until StopIteration, until you get to the point of needing to read or write some code that iterates manually. Of course you will need to learn the concept “iterator” pretty soon anyway, but only because Python actually gives you iterators all over the place. 
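The distinction being argued over fits in a few lines, including the ABC spellings of the two concepts:

```python
import collections.abc as abc

nums = [1, 2, 3]
print(sum(nums), sum(nums))   # 6 6 -- a list can be looped over again and again

gen = (n for n in nums)       # a generator is an iterator
print(sum(gen), sum(gen))     # 6 0 -- the second pass finds it exhausted

print(isinstance(nums, abc.Iterable), isinstance(nums, abc.Iterator))  # True False
print(isinstance(gen, abc.Iterable), isinstance(gen, abc.Iterator))    # True True
```

The awkward part is exactly the one under discussion: there is a word for the gen-like things (“iterator”) but only the clumsy “iterable that is not an iterator” for the nums-like things.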
In a language (like Swift) where zip and enumerate were views, files weren’t iterable at all, etc., you wouldn’t need the concept “iterator” until very late, but in Python it shows up early. But you still don’t need to learn about next(); that’s as much a technical detail as the fact that they return self from iter(). You want to know whether they can be used in for loops—and they can, because (unlike in Swift) iterators are iterable, and you already understand that. > 3. When a "generic" iterator "runs out", it's exhausted, it's truly > done. It is no longer useful, and there's nothing you can do but > throw it away. Generic iterators do not have a reset method. > Specialized iterators may provide one, but most do not. Yes, this is the next thing you need to know about iterators. But you also need to know that many iterables don’t get consumed in this way. Lists, ranges, dicts, etc. do _not_ run out when you use
[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)
On May 14, 2020, at 11:53, Ricky Teachey wrote: > >> So that means a view() function (with maybe a different name) -- however, >> that brings up the issue of where to put it. I'm not sure that it warrants >> being in builtins, but where does it belong? Maybe the collections module? >> And I really think the extra import would be a barrier. >> > > It occurs to me-- and please quickly shut me down if this is a really dumb > idea, I won't be offended-- `memoryview` is already a top-level built-in. I > know it has a near completely different meaning with regards to bytes objects > than we are talking about with a sequence view object. But could it do double > duty as a creator of views for sequences, too? But bytes and bytearray are Sequences, and maybe other things that support the buffer protocol are too. At first glance, it sounds terrible that the same function gives you a locking buffer view for some sequences and an indirect regular sequence view for others, and that there’s no way to get the latter for bytes even when you explicitly want that. But maybe in practice it wouldn’t be nearly as bad as it sounds? I don’t know. It sounds terrible in theory that NumPy arrays are almost but not quite Sequences, but in practice I rarely get confused by that. Maybe the same would be true here? There’s also the problem that “memoryview” is kind of a misleading name if you apply it to, say, a range instead of a list. But again, I’m not sure how bad that would be in practice.___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/BMB5DAW67NRODTH46NXIZ55D4VDRBO2Y/ Code of Conduct: http://python.org/psf/codeofconduct/
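For reference, this is what memoryview does today, and why the double-duty idea above would be a real behavior fork: buffer-protocol objects get a zero-copy locking view, while plain sequences currently get a TypeError rather than the hypothetical generic sequence view:

```python
b = bytes(range(5))
mv = memoryview(b)
print(mv[1:3].tobytes())   # b'\x01\x02' -- zero-copy slicing of a buffer

try:
    memoryview([1, 2, 3])  # lists don't support the buffer protocol
except TypeError:
    print('TypeError')     # the case the proposal would have to change
```

So bytes would keep getting the buffer view, and there would be no spelling left for asking bytes for the indirect sequence view, which is the objection raised above.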
[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)
On May 14, 2020, at 10:45, Rhodri James wrote: > > On 14/05/2020 17:47, Andrew Barnert via Python-ideas wrote: >> Which is exactly why Christopher said from the start of this thread, >> and everyone else has agreed at every step of the way, that we can’t >> change the default behavior of slicing, we have to instead add some >> new way to specifically ask for something different. > > Erm, did someone actually ask for something different? As far as I can tell > the original thread OP was asking for islice-maker objects, which don't > require the behaviour of slicing to change at all. Quite where the demand > for slice views has come from I'm not at all clear. That doesn’t make any difference here. If you want slicing sequences to return iterators rather than copies, that would break way too much code, so it’s not going to happen. A different method/property/class/function that gives you iterators would be fine. If you want slicing sequences to return views rather than copies, that would break way too much code, so it’s not going to happen. A different method/property/class/function that gives you views would be fine. Which is why nobody has proposed changing what list.__getitem__, etc. will do. As for where views came from: because they do everything iterators do plus things they don’t, and in this case they’re about as easy to implement. It’s really the same thing as dict.items. People wanted a dict.items that didn’t copy the whole thing into a giant list. The first suggestion was for an iterator. But that would break too much code, so it couldn’t be done until 3.0. But it was still so useful that it was worth having before 3.x, so it was added to 2.6 with a distinct name, iteritems. But then people realized they could have a view just as easily as an iterator, and it would do more, so that’s what actually went into 3.0. 
And that turned out to be so useful that it was worth having before 3.x, so, even though iteritems had already been added in 2.6, it was phased out for viewitems in 2.7. I’m just trying to jump to the end here. Some of the issues aren’t the same (should it be a function or an attribute, is it worth having custom implementations for some builtin types, …), but some of them are, so we can learn from the past instead of repeating the same process. We can just build the equivalent of viewitems right off the bat, and not even think about changing plain slicing (because we never want another 3.0 break). (Of course there may still be good arguments for why this isn’t the same, or for why it should end up differently even if it _is_ the same.)
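For anyone who hasn’t played with the dict views being used as the precedent here, their extra capabilities over an iterator are easy to see:

```python
d = {"a": 1, "b": 2}
items = d.items()            # a view, not an iterator

print(len(items))            # views are sized: 2
print(list(items))           # can be iterated...
print(list(items))           # ...and iterated again; an iterator would be empty now

d["c"] = 3                   # views are live: they reflect later mutation
print(("c", 3) in items)     # True
```

A sequence slice view would offer the same extras (len, containment, repeated iteration) over what an islice-style iterator can do.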
[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)
On May 14, 2020, at 03:35, Steven D'Aprano wrote: > > On Sun, May 10, 2020 at 09:36:14PM -0700, Andrew Barnert via Python-ideas > wrote: > > >>> for i in itertools.seq_view(a_list)[::2]: >>>... >>> >>> I still think I prefer this though: >>> >>> for i in a_list.view[::2]: >>>... > >> Agreed. A property on sequences would be best, > > Why? Because the whole point of this is for something to apply slicing syntax to. And compare: lst.view[10:20] view(lst)[10:20] view(lst, 10, 20) The last one is clearly the worst, because it doesn’t let you use slicing syntax. The others are both OK, but the first seems the most readable. I’ll give more detailed reasons below. (There may be reasons why it can’t or shouldn’t be done, which is why I ranked all of the options in order rather than just insisting that we must have the first one or I hate this whole idea.) > This leads to the same problem that len() solves by being a function, > not a method on list and tuple and str and bytes and dict and deque and > Making views a method or property means that every sequence type > needs to implement its own method, or inherit from the same base class, But len doesn’t solve that problem at all, and isn’t meant to. It just means that every sequence type has to implement __len__ instead of every sequence type having to implement len. Protocols often provide some added functionality. iter() doesn’t just call __iter__, it can also fall back to old-style sequence methods, and it has the 2-arg form. Similarly, str() falls back to __repr__, and has other parameter forms, and doubles as the constructor for the string type. And next() even changed from being a normal method to a protocol and function, breaking backward compatibility, specifically to make it easier to do the 2-arg form. But len() isn’t like that. There is no fallback, no added behavior, nothing. It doesn’t add anything. So why do we have it? Guido’s argument is in the FAQ. 
It starts off with “For some operations, prefix notation just reads better than postfix”. He then backs up the general principle that this is sometimes true by appeal to math. And then he explains the reasons this is one of those operations by arguing that “len” is the most important piece of information here so it belongs first. It’s the same principle here, but the specific answer is different. View-ness is not more important than the sequence and the slicing, so it doesn’t call out to be fronted. In fact, view-ness is (at least in the user’s mind) strongly tied to the slicing, so it calls out to be near the slice. And it’s not like this is some unprecedented thing. Most of the collection types, and corresponding ABCs, have regular methods as well as protocol dunders. Is anyone ever confused by having to write xs.index(x) instead of index(xs, x)? I don’t think so. In fact, I think the latter would be _more_ confusing, because “index” has so many different meanings that “list.index” is useful to nail it down. (Notice that we already _have_ a dunder named __index__, and it does something totally different…) And the same is true for “view”. In fact, everything in your argument is so generic that it acts as an argument against not just .index() but against any public methods or attributes on anything. Obviously you didn’t intend it that way, but once you actually target it so that it argues against .len() but not .index(), I don’t think there’s any argument against .view left. > and that's why in the Java world nobody agrees what method to call to > get the length of an object. Nobody can agree on what function to call in C or PHP even though they’re functions rather than methods in those languages. Everyone can agree on what method to use in C++ and Smalltalk even though they’re methods in those languages, just like Java. 
(In fact, C++ even loosely enforces consistency the same way Python loosely does, except at compile time instead of run time—if your class doesn’t have a size() method, it doesn’t duck type as a collection and therefore can’t be used in templates that want a collection.) Or just look at Python: nobody is confused about how to spell the .index method even though it’s a method. So the problem in Java has nothing to do with methods. (We don’t have to get into what’s wrong with Java here; it’s not relevant.) > So if we are to have a generic view proxy object, as opposed to the very > much non-generic dict views, then it ought to be a callable function We don’t actually _know_ how generic it can/should be yet. That’s something we’ve been discussing in this thread. It might well be a quality-of-implementation issue that has different best answers in different Pythons. Or it might not. It’s not obvious. Which implies that whatever the answer is, it’s not something that peopl
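To make the lst.view[10:20] vs. view(lst)[10:20] comparison above concrete, here is a minimal sketch of the function-style spelling — the names view, _Viewer, and SliceView are all invented for illustration, not a proposed API:

```python
from collections.abc import Sequence

class SliceView(Sequence):
    """Hypothetical read-only view onto a slice of an underlying sequence."""
    def __init__(self, base, rng):
        self._base = base
        self._range = rng            # maps view index -> base index

    def __len__(self):
        return len(self._range)

    def __getitem__(self, i):
        if isinstance(i, slice):
            # Slicing a view composes the index ranges ("flattens out"),
            # so no chain of views builds up.
            return SliceView(self._base, self._range[i])
        return self._base[self._range[i]]

class _Viewer:
    """Makes view(lst)[10:20] read like slicing; a .view property on
    sequences would return something like this."""
    def __init__(self, base):
        self._base = base
    def __getitem__(self, slc):
        return SliceView(self._base, range(len(self._base))[slc])

def view(seq):
    return _Viewer(seq)
```

With this sketch, view(lst)[10:20] indexes lazily into the underlying list (so it sees later mutations), and view(lst)[2::2][1::3] selects the same elements as lst[4::6].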
[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)
On May 14, 2020, at 03:01, Steven D'Aprano wrote: > > On Mon, May 11, 2020 at 10:41:06AM -0700, Andrew Barnert via Python-ideas > wrote: > >> I think in general people will expect that a slice view on a sequence >> acts like “some kind of sequence”, not like the same kind they’re >> viewing—again, they won’t be surprised if you can’t insert into a >> slice of a list. > > o_O > > For nearly 30 years, we've been able to insert into a slice of a list. > I'm going to be *really* surprised if that stops working Which is exactly why Christopher said from the start of this thread, and everyone else has agreed at every step of the way, that we can’t change the default behavior of slicing, we have to instead add some new way to specifically ask for something different. Well, not _just_ this. There’s also the fact that for 30 years people have been using [:] to mean copy, and the fact that for 30 years people have taken small slices of giant lists and then expected the giant lists to get collected, and so on. But any one of these is enough reason on its own that copy-slicing must remain the default behavior you get from lst[10:20]. Not only that, but whatever gives you view-slicing must look sufficiently different that you notice the difference—and ideally that gives you something you can look up if you don’t know what it means. I think lst.view[10:20] fits that bill. 
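The 30-year-old behaviors being preserved here are worth spelling out, since they are exactly what a view must not silently change:

```python
lst = [1, 2, 3, 4]

lst[2:2] = [10, 11]     # inserting into a slice of a list: works, mutates in place
print(lst)              # [1, 2, 10, 11, 3, 4]

copy_ = lst[:]          # [:] has always meant "shallow copy"
copy_[0] = 99
print(lst[0])           # 1 -- the original is untouched
```

A view-returning lst[2:2] could not support that slice assignment, and a view-returning [:] would alias rather than copy — which is why both must stay behind new, visibly different syntax.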
[Python-ideas] Re: [Suspected Spam]Re: Adding slice Iterator to Sequences (was: islice with actual slices)
On May 13, 2020, at 20:49, Christopher Barker wrote: > > OK, now for: > >> On Wed, May 13, 2020 at 7:50 PM Andrew Barnert wrote: > >> But that’s the wrong generalization. Because sets also work the same way, >> and they aren’t Sequences. Nor are dict views, or many of the other kinds of >> things-that-can-be-iterated-over-and-over-independently. > > > But file.readlines() does not return any of those objects. It returns a list. > If you see this as an opportunity to teach about the iteration protocol, then > sure, you'd want to make that distinction. But I think the file object is the > wrong first example -- it's an oddball, having both the iteration protocol, > AND methods for doing most of the same things. Agreed, it’s not an ideal first example, and zip or map would be much better. Unfortunately, files seem to be the example that many people run into first. (Or maybe lots of people do run into map first, but fewer of them get confused and need to go ask for help?) When you’re teaching a class, you can guide people to hit the things you want them to think about, but the intern, or C# guru who only touches Python once a year, or random person on StackOverflow that I’m dealing with apparently didn’t take your class. This is where they got confused, so this is what they ask about. > Most iterables don't have the equivalent of readlines() or readline() -- and > in this case, I think THAT's the main source of confusion, rather than the > iterable vs iterator distinction. But notice that they’re already writing `for line in f:`. That means they *do* understand that files are iterables. Sure, they probably don’t know the word “iterable”, but they understand that files are things you can use in a for loop (and that’s all “iterable” really means, unless you’re trying to implement rather than use them). And honestly, if Python didn’t make iteration so central, I’m not sure as many novices would get that far that fast in the first place. 
Imagine if, instead of just calling open and then doing `for line in f:`, you had to call an opener factory to get a filesystem opener, call that to get a file object, bind a line-buffered read stream to it, then call a method on that read stream with a callback function that processes the line and makes the next async read call. Anyone who gets that far is probably already a lot more experienced with JavaScript than someone who iterates their first file is with Python. > > > You can explain it anyway. In fact, you _have_ to give an explanation > > > with analogies and examples and so on, and that would be true even if > > > there were a word for what lists are. But it would be easier to explain > > > if there were such a word, and if you could link that word to something > > > in the glossary, and a chapter in the tutorial. > > OK -- time for someone to come up with word for "Iterable that isn't an > Iterator" -- I'd start using it :-) People used to loosely use “collection” for this, back before it was defined to mean “sized container”, but that no longer works. Maybe we need to come up with a word that can’t possibly have any existing meaning to anyone, like Standard Oil did with “Exxon”.
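And zip or map really do make a cleaner first example than files, since no I/O is involved:

```python
words = ["spam", "eggs"]

m = map(str.upper, words)   # map returns an iterator
print(list(m))              # ['SPAM', 'EGGS']
print(list(m))              # [] -- exhausted, exactly like a file object

# The list itself is iterable but not an iterator: each loop starts fresh.
print([w for w in words])
print([w for w in words])
```

The same one-shot behavior that surprises novices with files shows up here with no open() and no with-statement to distract from it.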
[Python-ideas] Re: [Suspected Spam]Re: Adding slice Iterator to Sequences (was: islice with actual slices)
> On May 13, 2020, at 20:32, Christopher Barker wrote: > >>> On Wed, May 13, 2020 at 7:50 PM Andrew Barnert wrote: >> On May 13, 2020, at 12:40, Christopher Barker wrote: >> Back to the Sequence View idea, I need to write this up properly, but I'm thinking something like: >>> >>> Can we just say that it returns an immutable sequence that blah blah, >>> without defining or naming the type of that sequence? > > Sure -- but it ends up getting a lot more wordy if you dont' have a name for > a thing. You’re right. Looking at the dict and similar docs, what they mostly do is to talk about “the key view”, and sometimes even “the key view type”, etc., in plain English, while being careful not to say anything that implies it has any particular name or identity. (In particular, “key view type” obviously can’t be the name of an actual type, because it has a space in it.) Anyway, if the proposal gets far enough to need docstrings and documentation, I guess you can worry about getting it right then, but until then you don’t have to be that careful; as long as we all know that list_view isn’t meant to name a specific type (and to be guaranteed distinct from tuple_view), I think we’ll all be fine. >> Python doesn’t define the types of most things you never construct directly. > > No, but there are ABCs so that might be the way to talk about this. That’s a good point. Does a sequence slice view (or a more general sequence view?) need an ABC beyond just being a Sequence? I wasn’t expecting that to be needed, but now that you bring it up… if there’s, say, a public attribute/property or method to get the underlying object, presumably it should be the same name on all such views, and maybe that’s something you’d want to be documented, and maybe even testable, by an ABC after all. >> And nobody even notices that list and tuple use the same type for their >> __iter__ in some Python implementations but not others. 
> > I sure haven't noticed that :-) It’s actually a bit surprising what tuple and list share under the covers in CPython, even at the public C API level. >> > calling.view on a list_view is another trick -- does it reference the host >> > view? or go straight back to the original sequence? > > > I think it’s the same answer again. In fact, I think .view on any slice > > view should just return self. > > Hmm -- this makes me nervous, but as long as its immutable, why not? Exactly. The same as these: >>> import copy, random, string >>> s = ''.join(random.choices(string.ascii_lowercase, k=10)) >>> s[:] is s True >>> str.__new__(str, s) is s True >>> copy.copy(s) is s True >>> t = tuple(s) >>> t[:] is t True etc. But all of those are just allowed, and implemented that way by CPython, not guaranteed by the language. So maybe the same should be true here. You can implement .view as just self, but if other implementations want to do something different they can, as long as it meets the same documented behavior (which could just be something like “view-slicing a view slice has the same effect as slicing a view slice” or something?). 
[Python-ideas] Re: [Suspected Spam]Re: Adding slice Iterator to Sequences (was: islice with actual slices)
On May 13, 2020, at 12:40, Christopher Barker wrote: I hope you don’t mind, but I’m going to take your reply out of order to get the most important stuff first, in case anyone else is still reading. :) >> Back to the Sequence View idea, I need to write this up properly, but I'm >> thinking something like: > > (using a concrete example or list) > > list.view is a read-only property that returns an indexable object. > indexing that object with a slice returns a list_view object > > a_view = list.view[a:b:c] > > a_view is a list_ view object > > a list_view object is an immutable sequence. indexing it returns elements from > the original list. Can we just say that it returns an immutable sequence that blah blah, without defining or naming the type of that sequence? Python doesn’t define the types of most things you never construct directly. (Sometimes there is a public name for it buried away in the types module, but it’s not mentioned anywhere else.) Even the dict view objects, which need a whole docs section to describe them, never say what type they are. And I think this is intentional. For example, nowhere does it say what type function.__get__ returns, only what behavior that object has—and that allowed Python 3 to get rid of unbound methods, because a function already has the right behavior. And nobody even notices that list and tuple use the same type for their __iter__ in some Python implementations but not others. Similarly, I think dict.__iter__() used to return a different type from dict.keys().__iter__() in CPython but now they share a type, and that didn’t break any backward compatibility guarantees. And it seems there’s no reason you couldn’t use the same generic sequence view type on all sequences, but also it’s possible that a custom one for list and tuple might allow some optimization (and even more likely so for range, although it may be less important). 
So if you don’t specify the type, that can be left up to each version of each implementation to decide. > slicing a list view returns I'm not sure what here -- it should probably > be a copy, so a new list_view object referencing the same list? That will > need to be thought out carefully) Good question. I suppose there are three choices: (1) a list (or, in general, whatever the original object returns from slicing), (2) a new view of the same list, or (3) a view of the view of the list. I think I agree with you here that (2) is the best option. In other words, lst.view[2::2][1::3] gives you the exact same thing as lst.view[4::6]. At first that sounds weird because if you can inspect the attributes of the view object, there’s no way to see that you did a [1::3] anywhere. But that’s exactly the same thing that happens with, e.g., range(100)[2::2][1::3]. You just get range(4, 100, 6), and there’s no way to see that you did a [1::3] anywhere. And the same is true for memoryview, and for numpy arrays and bintrees tree slices—despite them being radically different things in lots of other ways, they all made the same choice here. And even beyond Python, it’s what slicing a slice view does in Swift (even though other kinds of views of views don’t “flatten out” like this, slice views of slice views do), and in Go. (Although C++20 is a counterexample here.) > calling.view on a list_view is another trick -- does it reference the host > view? or go straight back to the original sequence? I think it’s the same answer again. In fact, I think .view on any slice view should just return self. Think about it: whether you decided that lst.view[2::2][1::3] gives lst.view[4::6] or a nested view-of-a-view-of-a-list, it would be confusing if lst.view[2::2].view[1::3] gave you the other one, and what other options would make sense? And, unless there’s some other behavior besides slicing on view properties, if self.view slices the same as self, it might as well just be self. 
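range is also a handy way to sanity-check the composition arithmetic claimed above:

```python
lst = list(range(100))

# range already "flattens" a slice of a slice into a single range:
print(range(100)[2::2][1::3])       # range(4, 100, 6)

# so double slicing selects the same elements as the composed slice:
assert lst[2::2][1::3] == lst[4::6]
```

Starting at index 2 with step 2, then taking index 1 with step 3 of that, lands on base index 2 + 2·1 = 4 with step 2·3 = 6 — which is exactly what range computes for you.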
> iter(a_list_view) returns a list_viewiterator. Here, it seems even more useful to leave the type unspecified. For list (and tuple) in CPython, I’m not sure if you can get away with using the special list_iterator type used by list and tuple (which accesses the underlying array directly), or, if not that, the PySeqIter type used for old-style iter-by-indexing, but if you can, it would be both simpler and more efficient. And similarly, range.view might be able to use the range_iterator type. Or, if you can’t do that, a generic PyIter around tp_next would be less efficient than a custom type, but again simpler, and the efficiency might not matter. Or, if you just had a single sequence view type rather than custom ones for each sequence type, that would obviously mean a single iterator type. And so on. That all seems like quality-of-implementation stuff that should be left open to whatever turns out to be best. > iterating that gets you items from the "host" on the fly. > > All this is a fair bit more complicated than
[Python-ideas] Re: Improve handling of Unicode quotes and hyphens
On May 13, 2020, at 05:31, Richard Damon wrote: > > On 5/13/20 2:22 AM, Stephen J. Turnbull wrote: >> MRAB writes: >>> >>> This isn't a parsing problem as such. I am not an expert on the >>> parser, but what's going on is something like this: the parser >>> (tokenizer) sees the character "=" and expects an operator. Next, it >>> sees something that is not "=" and not whitespace, so it expects a >>> literal or an identifier. " “" is not parsable as the start of a >>> literal, so the parser consumes up to the next boundary character >>> (whitespace or operator). Now it checks for the different types of >>> barewords: keywords and identifiers, and neither one works. >>> >>> Here's the critical point: identifier fails because the tokenizer >>> tries to match a sequence of Unicode word constituents, and " “" >>> isn't one. So it fails the sequence of non-whitespace characters, and >>> points to the end of the last thing it saw. >> But that is the problem, identifier fails too late, it should have seen >> at the start that the first character wasn't valid in an identifier, and >> failed THERE, pointing at the bad character. There shouldn't be a >> post-hoc test for bad characters in the identifier, it should be a >> pre-test in the tokenizer. >> >> So I see no reason why we need to transition to the new parser to fix >> this. (And the new parser (as of the last comment I saw from Guido) >> probably doesn't help: he kept the tokenizer.) We just need to make a >> second pass over the invalid identifier and identify the invalid >> characters it contains and their positions. > There is no need to rescan/reparse, the tokenizer shouldn't treat > illegal characters as possibly part of a token. Isn’t this what already happens? 
>>> import tokenize, io >>> def tok(s): return list(tokenize.tokenize(io.BytesIO(s.encode()).readline)) >>> tok('spam(“Abc”)') When I run this in 3.7, the fourth token is an ERRORTOKEN with string “, then there’s a NAME with Abc, then another ERRORTOKEN with ”. And reading the Lexical Analysis chapter of the docs, this seems correct. The smart quote is not a possible xid_start, or any other start of any token terminal, so it should immediately fail as an error. (The fact that the tokenizer eats it, generates an ERRORTOKEN, and then lexes the Abc as a NAME, rather than throwing an exception or otherwise punting, is a pretty nice error-recovery attempt, and seems perfectly reasonable.) Is that not true for the internal C tokenizer? Or is it true, but the parser or the error generating code isn’t taking advantage of it? (By the way, I’m pretty sure this behavior isn’t specific to 3.7, but has been that way back into the mists of whenever you could first write old-style import hooks, even up to the way error recovery works. I’ve taken advantage of this behavior in experimenting with new syntax. If your new syntax is not just unambiguous at the parser level, but even at the lexical level, you can just scan the token stream for your matching ERRORTOKEN.)
[Python-ideas] Re: [Suspected Spam]Re: Adding slice Iterator to Sequences (was: islice with actual slices)
On May 12, 2020, at 23:29, Stephen J. Turnbull wrote: > > Andrew Barnert writes: >>> On May 10, 2020, at 22:36, Stephen J. Turnbull >>> wrote: >>> >>> Andrew Barnert via Python-ideas writes: >>> >>>> A lot of people get this confused. I think the problem is that we >>>> don’t have a word for “iterable that’s not an iterator”, >>> >>> I think part of the problem is that people rarely see explicit >>> iterator objects in the wild. Most of the time we encounter iterator >>> objects only implicitly. >> >> We encounter iterators in the wild all the time, we just don’t >> usually _care_ that they’re iterators instead of “some kind of >> iterable”, and I think that’s the key distinction you’re looking >> for. > > It *is* the distinction I'm making with the word "explicit". I never > use "next" on an open file. I'm not sure your more precise statement > is better. > > I think the real difference is that I'm thinking of "people" as > including my students who have no clue what an iterator does and don't > care what an iterable is, they just cargo cult > >with open("file") as f: >for line in f: >do_stuff(line) > > while as you point out (and I think is appropriate in this discussion) > some people who are discussing proposed changes are using the available > terminology incorrectly, and that's not good. Students often want to know why this doesn’t work: with open("file") as f: for line in f: do_stuff(line) for line in f: do_other_stuff(line) … when this works fine: with open("file") as f: lines = f.readlines() for line in lines: do_stuff(line) for line in lines: do_other_stuff(line) This question (or a variation on it) gets asked by novices every few days on StackOverflow; it’s one of the top common duplicates. The answer is that files are iterators, while lists are… well, there is no word. You can explain it anyway. In fact, you _have_ to give an explanation with analogies and examples and so on, and that would be true even if there were a word for what lists are. 
But it would be easier to explain if there were such a word, and if you could link that word to something in the glossary, and a chapter in the tutorial. >> Still, having clear names with simple definitions would help that >> problem without watering down the benefits. > > I disagree. I agree there's "amortized zero" cost to the crowd who > would use those names fairly frequently in design discussions, but > there is a cost to the "lazy in the technical sense" programmer, who > might want to read the documentation if it gave "simple answers to > simple questions", > but not if they have to wade through a thicket of > "twisty subtle definitions all alike" to get to the simple answer, and > especially not if it's not obvious after all that what the answer is. We shouldn’t define everything up front, just the most important things. But this is one of the most important things. People need to understand this distinction very early on to use Python, and many of them don’t get it, hence all the StackOverflow duplicates. People run into this problem well before they run into a problem that requires them to understand the distinction between arguments and parameters, or protocols and ABCs, or Mapping and dict. > It also makes conversations with experts fraught, as those experts > will tend to provide more detail and precision than the questioner > wants (speaking for myself, anyway!) "Not every one-sentence > explanation needs terminology in the documentation." I think it’s the opposite. I can teach a child why a glass will break permanently when you hit it while a lake won’t by using the words “solid” and “liquid”. I don’t have to give them the scientific definitions and all the equations. I might not even know them. And in the same way, I can teach novices why the x after x=y+1 doesn’t change when y changes by teaching them about variables without having to explain __getattr__ and fast locals and the import system and so on. 
Knowing all the subtleties of shear force or __getattribute__ or whatever doesn’t prevent me from teaching a kid without getting into those subtleties. The better I understand “solid” or “variable”, the easier it is for me to teach it. That’s how words work, or how the human mind works, or whatever, and that’s why language is useful for teaching. >>>> But that last thing is exactly the behavior you expect from “things >>>> like list, dict, etc.”, and it’s hard to explain, and therefore >>>> ha
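For the record, the actual distinction being taught fits in a few lines, even though (as argued above) explaining it still benefits from having a word for it:

```python
lines = ["a\n", "b\n"]

it = iter(lines)          # explicit iterator over the list
assert iter(it) is it     # an iterator's __iter__ returns itself
print(list(it))           # ['a\n', 'b\n']
print(list(it))           # [] -- consumed, just like a file object

# A list is an iterable but not an iterator:
assert iter(lines) is not lines
try:
    next(lines)           # lists have no __next__
except TypeError:
    print("a list is not an iterator")
```

The whole student puzzle above reduces to this: files return themselves from iter(), lists return a fresh iterator each time.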
[Python-ideas] Re: Sanitize filename (path part)
On May 12, 2020, at 01:32, Barry Scott wrote: > > >> On 11 May 2020, at 23:24, Andrew Barnert wrote: >> >>> On May 11, 2020, at 13:31, Barry Scott wrote: >>> >>> macOS and Unix version (I only use Unicode input so avoid the random bytes >>> problems): >> >> But that doesn’t avoid the problem. If someone gives you a character whose >> encoding on the target filesystem includes a null or pathsep byte, your >> sanitizer will pass it as safe, when it shouldn’t. > > Do you have an example that shows an encoding that produces a NUL or pathsep? > I'm not aware of any. UTF-1 encodes U+D7FF to the bytes F7 2F C3. BOCU has similar examples. In the other direction, MUTF-8 decodes the bytes C0 80 to U+0000. There were a number of cross-site scripting and misleading-link attacks abusing (mostly) BOCU in this way, which is part of the reason WHATWG banned them as charsets. Although there were other reasons (they banned stuff like SCSU and CESU-8 and UTF-7 at the same time, and I don’t think any of them have the same problem). And if there were widespread legitimate uses of these codecs, they probably wouldn’t have been banned (see UTF-16LE, which is even easier to exploit this way, but unfortunately way too common). I don’t think Python comes with codecs for any of these encodings. And I don’t know of anyone who ever used them for filenames. (SCSU was the default fs encoding on Symbian flash memory drives, but again, I don’t think it has this problem.) So this may well not be a practical problem. >> Is it still a realistic problem today? I don’t know. I’m pretty sure the >> modern versions of Shift-JIS, EUC-*, Big5, and GB can never have >> continuation bytes below 0x30, but even if I’m right, are these (and UTF-8, >> of course) the only multi-byte encodings anyone ever uses on Unix >> filesystems? > > I suspect that legacy encodings are used in organisations with old data, but > don't have direct experience of this. 
I have direct experience of some of those East Asian codecs, albeit 15 or so years ago. I’m pretty sure the only ones they used were all safe. I also have experience even further back of mounting drives from Ataris and classic Macs and IBM mainframes and all kinds of other crazy things under Unix, but the filesystem drivers recoded filenames on the fly, along with providing a Unix-style hierarchical filesystem, so user-level code didn’t have to worry about MacKorean or EBCDIC or whatever any more than it had to worry about : as a pathsep and absolute paths being the ones that _don’t_ start with a pathsep and so on. So, based on my experience, it doesn’t seem likely to come up even in shops full of old data. But that experience isn’t worth much… 
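One reason UTF-8 is always safe here can be checked directly: every byte of a multi-byte UTF-8 sequence has the high bit set, so '/' (0x2F) and NUL (0x00) can only ever encode themselves:

```python
# In UTF-8, lead bytes of multi-byte sequences are >= 0xC2 and continuation
# bytes are 0x80-0xBF, so ASCII bytes like '/' (0x2F) and NUL (0x00) can
# never appear inside an encoded character -- a str-level sanitizer is safe.
for ch in "\u00e9", "\u65e5", "\U0001d11e", "\u2029":   # 2-, 3-, 4-, 3-byte chars
    data = ch.encode("utf-8")
    assert len(data) > 1
    assert all(b >= 0x80 for b in data)
    assert b"/" not in data and b"\x00" not in data
print("no '/' or NUL can hide inside a multi-byte UTF-8 sequence")
```

It is exactly the encodings without this property (UTF-1, BOCU, and — below 0x30 rather than 0x80 — some legacy East Asian codecs) that the smuggling concern applies to.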
[Python-ideas] Re: Sanitize filename (path part)
On May 11, 2020, at 13:31, Barry Scott wrote: > > macOS and Unix version (I only use Unicode input so avoid the random bytes > problems): But that doesn’t avoid the problem. If someone gives you a character whose encoding on the target filesystem includes a null or pathsep byte, your sanitizer will pass it as safe, when it shouldn’t. This isn’t possible on macOS because the OS won’t let you mount any filesystem whose encoding isn’t UTF-8, but it is possible on most other *nixes, and it has been used as an attack in the past. Is it still a realistic problem today? I don’t know. I’m pretty sure the modern versions of Shift-JIS, EUC-*, Big5, and GB can never have continuation bytes below 0x30, but even if I’m right, are these (and UTF-8, of course) the only multi-byte encodings anyone ever uses on Unix filesystems? ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/KPEEFJHXFH26EMLYRPAG27MQD2LJHCHG/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Sanitize filename (path part) 2nd try
> On May 11, 2020, at 14:18, Steve Jorgensen wrote: > > Andrew Barnert wrote: >>> On May 11, 2020, at 00:40, Steve Jorgensen ste...@stevej.name wrote: >>> Proposal: >>> Add a new function (possibly os.path.sanitizepart) to sanitize a value for >>> use as a single component of a path. In the default case, the value must >>> also not be a >>> reference to the current or parent directory ("." or "..") and must not >>> contain control >>> characters. > >> If not: the result can contain the path separator, illegal characters that >> aren’t >> control characters, nonprinting characters that aren’t control characters, >> and characters >> whose bytes (in the filesystem’s encoding) are ASCII control characters? >> And it can be a reserved name, or even something like C:; as long as it’s >> not the Unix >> . or ..? > > Are there non-printing characters outside of those in the Unicode general > category of "C" that make sense to omit? Off the top of my head, everything in the Z category (like U+2029 PARAGRAPH SEPARATOR) is non-printable, and makes sense to sanitize. Meanwhile, what about invalid characters being smuggled through str by surrogateescape? I don’t know if those are printable, or what category they are… or whether you want to sanitize them, for that matter, so I have no idea if this rule does the right thing or not. More generally, we shouldn’t be relying on what respondents know off the top of their heads in the first place for something that people are going to rely on for security/safety purposes. > Regarding names like "C:", you are absolutely right to point that out. When > the platform is Windows, certainly, ":" should not be allowed, and > perhaps colon should not be allowed at all. I'll need to research that a bit. 
> This matters because if the path part is used without explicit "./" prefixed > to it, then it will refer to a root path, The name `C:spam` means spam in the current directory for the C drive—which isn’t the same as the current working directory unless C is the current working drive, but it’s definitely not (in general) the same as the root. And what about all the other questions I asked? Most importantly, you need to clarify what the use case is, and why this proposal meets it. Otherwise, it sounds more like a trap to make people think their code is safe when it isn’t, not a fix for the real problem. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/XRFT77TXLE7MNAP2MV2IC57NG4EWQIGP/ Code of Conduct: http://python.org/psf/codeofconduct/
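The drive-relative behavior described above can be checked from any platform with ntpath, the Windows flavor of os.path:

```python
import ntpath

# 'C:spam' is drive-relative: relative to C's current directory,
# not rooted at the top of the C drive.
assert not ntpath.isabs("C:spam")
assert ntpath.splitdrive("C:spam") == ("C:", "spam")

# By contrast, 'C:\spam' really is rooted.
assert ntpath.isabs("C:\\spam")
```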
[Python-ideas] Re: Sanitize filename (path part) 2nd try
On May 11, 2020, at 12:54, Wes Turner wrote: > > > What does sanitizepart do with newlines \n \r \r\n in filenames? Are these > control characters? >>> unicodedata.category('\n') Cc ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/D6HRO6UIEXK56KV6NMR676CJCZKMKZJV/ Code of Conduct: http://python.org/psf/codeofconduct/
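Extending that one-liner to the other line endings Wes asked about:

```python
import unicodedata

# Each of these reports general category 'Cc' (control character),
# so a category-C filter catches them; '\r\n' is just two Cc
# characters in sequence.
for ch in ("\n", "\r", "\t"):
    print(hex(ord(ch)), unicodedata.category(ch))
```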
[Python-ideas] Re: Sanitize filename (path part) 2nd try
On May 11, 2020, at 12:59, Barry Scott wrote: > > >> On 11 May 2020, at 18:09, Andrew Barnert via Python-ideas >> wrote: >> >> More generally, what’s the use case for %-encoding filenames like this? Are >> people expecting it to interact transparently with URLs, so if I save a file >> “spam\0eggs” in a Python script and then try to browse to >> file:///spam\0eggs” in a browser, the browser will convert the \0 character >> to %00 the same way my Python script did and therefore find the file? > > No. > > The \0 can never be part of a valid file in Unix, macOS or Windows. Of course. Which is exactly the kind of thing this sanitize function is meant for. Hence my question: if my Python script is sanitizing all filenames with this function with escape='%', is the expectation that it’ll actually give me something that can be used if I paste the same thing into a browser and let it url-escape a file URL? If so, will that actually work? If not, what _is_ the intended use for this option? ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/CMYTQZQHSALH4ZREIMTDMFLYMJXWSPG3/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)
On May 11, 2020, at 10:57, Alex Hall wrote: > > >> On Mon, May 11, 2020 at 12:50 AM Christopher Barker >> wrote: > > >> Though it is heading in a different direction than where Andrew was >> proposing, that this would be about making and using views on sequences, >> which really wouldn't make sense for any iterator. > > The idea is that islice would be the default behaviour and classes could > override that to return views if they want. It is possible to get both, but I don’t think it’s easy. I think the ultimate unification of these ideas is the “views everywhere” design of Swift. Whether you have a sequence or just a collection or just a one-shot forward-only iterable, you use the same syntax and the same functions to do everything—copy-slicing, view-slicing, chaining, mapping, zipping, etc. And the result is always a view with as much functionality as makes sense (so filtering a sequence gives you a view that’s a reversible collection, not a sequence). So you can view-slice the result of a genexpr the same way you would a list, and you just get a forward-only iterable view instead of a full-fledged sequence view. I’ve started designing such a thing multiple times, every couple years or so, and always realize it’s even more work than I thought and harder to fit into Python than I thought and give up. But maybe doing it _just_ for view slicing, rather than for everything, and requiring a wrapper object to use it, is a lot simpler, and useful enough on its own. 
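A minimal sketch of that wrapper-object idea (the class name and details are hypothetical, not part of any proposal): a view that resolves a slice to a range of indices instead of copying, so slicing is O(1) and the view stays live against the underlying sequence.

```python
class SeqView:
    """Hypothetical lazy slice view over any sequence (proof of concept)."""

    def __init__(self, seq, indices=None):
        self._seq = seq
        # range() resolves negative and extended slices for us, lazily.
        self._indices = range(len(seq)) if indices is None else indices

    def __len__(self):
        return len(self._indices)

    def __getitem__(self, i):
        if isinstance(i, slice):
            # Slicing a range returns another range in O(1): no copy.
            return SeqView(self._seq, self._indices[i])
        return self._seq[self._indices[i]]

    def __iter__(self):
        return (self._seq[i] for i in self._indices)


v = SeqView([0, 1, 2, 3, 4])[::2]
print(list(v))  # [0, 2, 4]
```

Because the view only holds the sequence and a range, it also exhibits the dynamic behavior discussed elsewhere in the thread: mutating the underlying list changes what the view yields.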
And that would fit well into the Python way of growing by adding stuff as needed, and only trying to come up with a complete and perfect general design up front when absolutely necessary.___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/4CQE7Q4TRJTQF66ZHMCPJMCLCUEXHEAT/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: [Suspected Spam]Re: Adding slice Iterator to Sequences (was: islice with actual slices)
On May 10, 2020, at 22:36, Stephen J. Turnbull wrote: > > Andrew Barnert via Python-ideas writes: > >> A lot of people get this confused. I think the problem is that we >> don’t have a word for “iterable that’s not an iterator”, > > I think part of the problem is that people rarely see explicit > iterator objects in the wild. Most of the time we encounter iterator > objects only implicitly. We encounter iterators in the wild all the time, we just don’t usually _care_ that they’re iterators instead of “some kind of iterable”, and I think that’s the key distinction you’re looking for. Certainly when you open a file, you usually deal with the file object. And whenever you feed the result of one genexpr into another, or into a map call, you are using an iterator. You often even store those iterators in variables. But if you change that first genexpr to a listcomp (say, because you want to be able to breakpoint there and print it to the debugger, or dump it to a log), nothing changes except performance. And people know this and take advantage of it without even thinking. And that’s true of the majority of places you use iterators. Code that explicitly needs an iterator (like the grouper idiom where you zip an iterator with itself) certainly does exist, but it’s nowhere near as common as code that can use any iterable and only uses an iterator because that’s the easiest thing to write or the most efficient thing. This is a big part of what I meant about the concepts being so nice that people manage to use them despite not being able to talk about them. > Nomenclature *is* a problem (I still don't > know what a "generator" is: a function that contains "yield" in its > def, or the result of invoking such a function), but part of the > reason for that is that Python successfully hides objects like > iterators and generator objects much of the time (I use generator > expressions a lot, "yield" rarely). You’re right. 
The fact that these concepts (and their implementations) are so nice that we rarely have to think about them explicitly is actually part of the reason it’s hard to do so on the rare occasions we need to. And put that way, it’s a pretty good tradeoff. Still, having clear names with simple definitions would help that problem without watering down the benefits. >> or for the refinement “iterable that’s not an iterator and is >> reusable”, much less the further refinement “iterable that’s >> reusable, providing a distinct iterator that starts from the head >> each time, and allows multiple such iterators in parallel”. > > Aside: Does "multiple parallel iterators" add anything to "distinct > iterator that starts from the head each time"? Or did you mean what I > would express as "and *so* it allows multiple parallel iterators"? I’m being redundant here to make sure I’m understood, because just saying it the second way apparently didn’t get the idea across the first time. >> But that last thing is exactly the behavior you expect from “things >> like list, dict, etc.”, and it’s hard to explain, and therefore >> hard to document. > > Um, you just did *explain* it, quite well IMHO, you just didn't *name* > it. ;-) Well, it was a long, and redundant, explanation, not something you’d want to see in the docs or even a PEP. >> The closest word for that is “collection”, but Collection is also a >> protocol that adds being a Container and being Sized on top of >> being Iterable, so it’s misleading unless you’re really careful. So >> the docs don’t clearly tell people that range, dict_keys, etc. are >> exactly that “like list, dict, etc.” thing, so people are confused >> about what they are. People know they’re lazy, they know iterators >> are lazy, > > I'm not sure what "lazy" means here. range is lazy: the index it > reports doesn't exist anywhere in the program's data until it computes > it. 
But I wouldn't call a dict view "lazy" any more than I'd call the > underlying dict "lazy". Views are references, or alternative access > interfaces if you like. But the data for the view already exists. “lazy” as in it creates something that acts like a list or a set, but hasn’t actually stored a list or set or other data structure in memory or done a bunch of up-front CPU work. You’re right that a more precise definition would probably include range but not dict_keys, but I think people do use it in a way that includes both, and that’s part of the reason they’re equally confused into thinking both are iterators. >> so they think they’re a kind of iterator, and the docs don’t ever >> make it clear why that’s wrong. > > I don't think the problem is in the docs. Iterators and vie
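The iterator-vs-reusable-iterable distinction being discussed can be made concrete in a few lines:

```python
# A generator is an iterator: it is its own iterator, and is consumed once.
gen = (x * x for x in range(3))
assert iter(gen) is gen          # iterators return themselves from iter()
assert list(gen) == [0, 1, 4]
assert list(gen) == []           # one-shot: already exhausted

# A list is a reusable iterable: each iter() call yields a fresh,
# independent iterator starting from the head.
lst = [0, 1, 4]
assert iter(lst) is not lst
assert list(lst) == list(lst)    # so iteration is repeatable
```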
[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)
On May 10, 2020, at 21:51, Christopher Barker wrote: > > > On Sun, May 10, 2020 at 9:36 PM Andrew Barnert wrote: > >> However, there is one potential problem with the property I hadn’t thought >> of until just now: I think people will understand that mylist.view[2:] is >> not mutable, but will they understand that mystr.view[2:] is not a string? >> I’m pretty sure that isn’t a problem for seqview(mystr)[2:], but I’m not >> sure about mystr.view[2:]. > > One more issue around the whole "a string is sequence of strings" thing :-) > Of course, it *could* be a string -- not much difference with immutables. > Though I suppose if you took a large slice of a large string, you probably > don't want the copy. But what *would* you want to do with it. That “string is a sequence of strings” issue, plus the “nothing can duck type as a string“ issue. Here’s an example that I can write in, say, Swift or Rust or even C++, but not in Python: I mmap a giant mailbox file, and I can treat that as a string without copying it anywhere. I split it into a string for each message—I don’t want to copy them all into a list of strings, and ideally I don’t even want to copy one at a time into an iterator of strings because some of them can be pretty huge; I want a list or iterator of views into substrings of the mmap. (This isn’t actually a great example, because even with substring views, the mmap can’t be used as a str in the first place, but it has the virtue of being a real example of code I’ve actually written.) > but if you had a view of a slice, and it was a proper view, it might be > pretty poky for many string operations, so probably just as well not to have > them. I think in general people will expect that a slice view on a sequence acts like “some kind of sequence”, not like the same kind they’re viewing—again, they won’t be surprised if you can’t insert into a slice of a list. 
It’s only with str that I’m worried they might expect more than we can provide, which sucks because str is the one place we _couldn’t_ provide it even if we wanted to. But maybe I’m wrong and people won’t have this assumption, or will be easily cured of it. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/CMHJKKVH2TQLED2W3KICEIQY43SBX27S/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Sanitize filename (path part) 2nd try
On May 11, 2020, at 00:40, Steve Jorgensen wrote: > > Proposal: > > Add a new function (possibly `os.path.sanitizepart`) to sanitize a value for > use as a single component of a path. In the default case, the value must also > not be a reference to the current or parent directory ("." or "..") and must > not contain control characters. “Also” in addition to what? Are there other requirements enforced besides these two that aren’t specified anywhere? If not: the result can contain the path separator, illegal characters that aren’t control characters, nonprinting characters that aren’t control characters, and characters whose bytes (in the filesystem’s encoding) are ASCII control characters? And it can be a reserved name, or even something like C:; as long as it’s not the Unix . or ..? What’s the use case where you need to sanitize these things but nothing else? As I said on the previous proposal, I have had a variety of times where I needed to sanitize filenames, but I don’t think this would have been what I wanted for _any_ of them, much less for most. Are there existing tools, libraries, recommendations, etc. that this is based on, or is it just an educated guess at what’s important? For something that’s meant to go into the stdlib with a name that strongly implies “if you use this, you’re safe from stupid or malicious filenames”, it would be misleading, and possibly dangerous, if it didn’t actually make you safe because it didn’t catch common mistakes/exploits that everyone else considers important to catch. And without any cites to what everyone else considers important, why should anyone trust that this proposal isn’t missing, or getting wrong, anything critical? Why isn’t this also available in pathlib? Is it the kind of thing you don’t envision high-level pathlib-style code ever needing to do, only low-level os-style code? 
> When `replace` is supplied, it is used as a replacement for any invalid > characters or for the first character of an invalid name. When `prefix` is > not also supplied, this is also used as the replacement for the first > character of the name if it is invalid, not simply due to containing invalid > characters. What’s the use case for separate prefix and replace? Or just for prefix in the first place? > When `escape` is supplied (typically "%") it is used as the escape character > in the same way that "%" is used in URL encoding. Why allow other escape strings? Has anyone ever wanted URL-encoding but with some other string in place of %, in this or any other context? The escape character is not itself escaped? More generally, what’s the use case for %-encoding filenames like this? Are people expecting it to interact transparently with URLs, so if I save a file “spam\0eggs” in a Python script and then try to browse to “file:///spam\0eggs” in a browser, the browser will convert the \0 character to %00 the same way my Python script did and therefore find the file? If so, doesn’t it need to escape all the same characters that URLs do, not a different set? If not, isn’t using something similar to URL-encoding but not identical just going to confuse people rather than help them? What happens if you supply a string longer than one character as escape? Or replace or prefix, for that matter? Overall, it seems like there is a problem to be solved, but I don’t see any reason to be confident that this is the solution for anyone, and if it’s not the solution for _most_ people, adding it to the stdlib will just mean people don’t search for and find the right one, all the while misleading themselves into thinking they’re safe when they’re not, which will make the overall problem worse, not better. 
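On the escape-character question: URL-encoding stays reversible precisely because quote() escapes '%' itself, which a quick sketch shows. (urllib.parse is just used here as the reference behavior the proposal says it imitates.)

```python
from urllib.parse import quote, unquote

# quote() escapes '%' itself (as %25), so encoding round-trips:
assert quote("spam%eggs", safe="") == "spam%25eggs"
assert unquote("spam%25eggs") == "spam%eggs"

# A scheme that leaves '%' unescaped is ambiguous: after sanitizing,
# 'a%00b' could mean an escaped NUL or a literal percent-zero-zero.
assert unquote("a%00b") == "a\x00b"
```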
___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/ZBBMQ34OHSR3RYKVUFLNUIM34WG3R2N7/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)
On May 10, 2020, at 15:39, Christopher Barker wrote: > > >> On Sun, May 10, 2020 at 12:48 PM Andrew Barnert wrote: > >> Is there any way you can fix the reply quoting on your mail client, or >> manually work around it? > > I'm trying -- sorry I've missed a few. It seems more and more "modern" email > clients make "interspersed" posting really hard. But I hate bottom posting > maybe even more than top posting :-( (gmail seems to have magically gotten > worse in this regard recently) It seems like the one place Google still sees (the remnants of) Yahoo as a competitor is who can screw up mailing lists worse. > It's also interesting to note (from another part of this thread) that slicing > isn't part of the Sequence ABC, or any? "official" protocol? If we still had separate __getitem__ and __getslice__ when ABCs and the idea of being clearer about protocols had come along, I’ll bet __getslice__ would have been made part of the protocol. But I suppose it’s a little too late for me to complain about a change that I think went in even before new-style classes. :) > I do see this, though not entirely sure what to make of it: > > https://docs.python.org/3/c-api/sequence.html?highlight=sequence Yeah, the fact that sequences and mappings have identical methods means that from Python those two protocols are opt-in rather than automatic, while from C you have to be more prepared for errors after checking than with other protocols. Annoying, but not using the same syntax and dunders for indexing and keying would be a lot more annoying. > > Also, notice that this is true for all of the existing views, and none of > > them try to be un-featureful to avoid it. > > But there is no full featured mapping-view that otherwise acts much like a > mapping. types.MappingProxyType. In most cases, type(self).__dict__ will get you one of these. But of course this is a view of the whole dict, not a subset. 
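A quick illustration of MappingProxyType as a read-only, live view of a whole dict:

```python
import types

d = {"a": 1}
proxy = types.MappingProxyType(d)
assert proxy["a"] == 1

d["b"] = 2
assert "b" in proxy        # the proxy reflects later mutations

try:
    proxy["c"] = 3         # but writing through it is refused
except TypeError:
    pass
```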
> in theory, there *could* be -- if there was some nice way to specify a subset > of a mapping without copying the whole thing -- I can't think of one at the > moment. Not in the stdlib, but for a SortedDict type, key-slicing makes total sense, and many of them do it—although coming up with a nice API is hard enough that they all seem to do it differently. (Obviously d[lo:hi] should be some iterable of the values from the keys lo <= key < hi.) >> I think the biggest question is actually the API. Making this a function (or >> a class that most people think of as a function, like most of itertools) is >> easy, but as soon as you say it should be a method or property of sequences, >> that’s trickier. You can add it to all the builtin sequence types, but >> should other sequences in the stdlib have it? Should Sequence provide it as >> a mixin? Should it be part of the sequence protocol, and therefore checked >> by Sequence as an ABC (even though that could be a breaking change)? > > Here is where I think you (Andrew) and I (Chris B.) differ in our goals. My > goal here is to have an easily accessible way to use the slice syntax to get > an iterable that does not make a copy. It’s just a small difference in emphasis. I want a way to get a non-copying slice, and I’d really like it to be easily accessible—I’d grumble if you didn’t make it a member, but I’d still use it. > While we're at it, getting a sequence view that can provide an iterator, and > all sorts of other nifty features, is great. But making it a callable in > itertools (or any other module) wouldn't accomplish that goal. > > Hmm, but maybe not that bad: > > for i in itertools.seq_view(a_list)[::2]: > ... > > I still think I prefer this though: > > for i in a_list.view[::2]: > ... Agreed. A property on sequences would be best, a wrapper object that takes slice syntax clearly back in second, and a callable that takes only islice syntax a very distant third. So if the first one is possible, I’m all for it. 
My slices repo provides the islice API just because it’s easier for slapping together a proof of concept of the slicing part, definitely not because I’d want that added to the stdlib as-is. However, there is one potential problem with the property I hadn’t thought of until just now: I think people will understand that mylist.view[2:] is not mutable, but will they understand that mystr.view[2:] is not a string? I’m pretty sure that isn’t a problem for seqview(mystr)[2:], but I’m not sure about mystr.view[2:]. > So to all those questions: I say "yes" except maybe: > > "checked by Sequence as an ABC (even though that could be a breaking change)" > -- because, well, breaking changes are "Not good". > > I wonder if there is a way to make something standard, but not quite break > things -- hmm. > > For instance: It seems to be possible to have Sequence provide it as a mixin, > but not have it checked by Sequence as an ABC? Actually, now that I think about it, Sequence _never_ checks methods. Most of the
[Python-ideas] Re: Improve handling of Unicode quotes and hyphens
On May 10, 2020, at 14:33, Christopher Barker wrote: > > Having a "tabnanny-like" function / module in the stdlib would be nice, > though I'd think a stand alone module in PyPi would be almost as good, and a > good way to see if it gains traction. Good point. Plus, it might well turn out that, say, the right thing for most Windows users and the right thing for most iOS Pythonista users is sufficiently different that two separate defancier packages are better than a one-size-fits-all could be, which we’d find out a lot more easily if people go out and use it in the field than if we try to design it here. > BTW -- there are a whole lot of Syntax Errors that a semi smart algorithm > could provide meaningful suggestions about about. I'm pretty sure that's come > up before on this list, but maybe "helpful" mode you could run Python in > that would do that for all Syntax errors that it could. We could even have a > way for folks to extend it with additional checks. This already exists on PyPI. Actually, there are a few different ones. One of them (I think friendly-tracebacks?) is very detailed. One of the authors sometimes posts about it here, when we’re talking about how some exception should be improved, with an example showing that they’ve already thought of it and done something better than is being proposed in the list.:) That one may already be a category killer. I looked over some of the others and the only thing that jumped out at me was that one of them (better-errors?) integrates really nicely into iPython and Jupyter (using iPython’s syntax coloring settings, making more use of line-drawing and box characters, etc.) But that doesn’t mean the category killer should be in the stdlib; I suspect they’re still improving it at a much faster pace than the stdlib could handle. But maybe the docs should link to it. 
The only problem is that the obvious places (like Interface Options section in the Usage docs) are things almost nobody reads… ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/WWPXZFCCFH7UF3Q52EONRASPQOQQD2OH/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Improve handling of Unicode quotes and hyphens
On May 10, 2020, at 00:11, Steve Barnes wrote: > > What can be done? I think there’s another option (in addition to improving SyntaxError, not instead of it): Add a defancier module to the stdlib. It has functions that take some text and turn smart quotes into plain ASCII quotes, dashes and minuses into ASCII hyphens, etc., or just detect them and produce useful objects and/or text. And it’s a runnable module that can either lint or fix source code. Then instead of telling people who get this SyntaxError “Use a proper editor, and all the code you wrote so far has to be rewritten or fixed manually, and that’ll show you”, we can tell them “Use a proper editor in the future, but meanwhile you can fix your existing script with `python -m defancier -f script.py`“. And a simple IDE or editor mode that doesn’t want to come up with something better could run defancier on SyntaxError or on open or whenever and show the output in a nice way and offer a single-click fix. There’s nothing in the stdlib quite like this, but textwrap, tabnanny, 2to3, etc. are vaguely similar precedents. And it seems like the kind of thing that will evolve on about the right scale for the stdlib—new problems to add to the list come up about once a decade, not every few months or anything. The place I’d _really_ like this is Pythonista, which does an admirable job fighting iOS text input for me, but it’s not so helpful for fixing up pasted code. (And needless to say, I can’t just get a better editor/IDE; it’s by far the best option for the platform.) (By the way, the reason I used -f rather than —fix is that I can’t figure out how to get the iPhone Mail.app to not replace double hyphens with an em-dash, or even how to fix it when it does. 
All of the other fancifier stuff can be worked around pretty easily, but apparently not that one…) ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/T6JPAQWP3P3IJSGGZWMDPBKPFUE6LQJ2/ Code of Conduct: http://python.org/psf/codeofconduct/
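A hedged sketch of what the proposed defancier module's core could look like (the module doesn't exist; the name, function, and translation table here are purely illustrative):

```python
# Map common "smart" punctuation back to plain ASCII. The table is a
# small illustrative sample, not an exhaustive list.
FANCY_TO_ASCII = str.maketrans({
    "\u201c": '"', "\u201d": '"',    # curly double quotes
    "\u2018": "'", "\u2019": "'",    # curly single quotes
    "\u2013": "-", "\u2014": "--",   # en dash, em dash
    "\u00a0": " ",                   # no-break space
})

def defancify(source: str) -> str:
    """Replace common 'smart' punctuation with ASCII equivalents."""
    return source.translate(FANCY_TO_ASCII)

print(defancify("print(\u201chello\u201d)"))  # print("hello")
```

A real fixer would of course have to skip string literals intentionally containing such characters, which is why running it as an explicit `python -m` command rather than automatically makes sense.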
[Python-ideas] Re: Improve handling of Unicode quotes and hyphens
On May 10, 2020, at 03:47, Ned Batchelder wrote: > > On 5/10/20 3:09 AM, Steve Barnes wrote: >> Change the error message “SyntaxError: invalid character in identifier” to >> include which character and its Unicode value so that it becomes >> “SyntaxError: invalid character 0x201c “ in identifier” – this is almost >> certainly the easiest change and fits well with explicit is better than >> implicit but still leaves it to the user to correct the erroneous input >> (which could be argued is both good and bad). > > Or change it to, "SyntaxError, only plain quotes can be used: you have 0x201c > which is a fancy quote" (or something). We have a specific SyntaxError > message for print-without-parens, we should be able to do this also. Can the error message actually include the Unicode character itself? A novice isn’t going to know what U+201c means, they may not be entirely sure what fancy quote means or how to search for it, but they will know what “ means and can search for it by just copying and pasting from the error message to the Find box in their editor. (I think we do include Unicode characters in other error messages when they come directly from the user’s input text. For example, if I try to 2+Spám(), the error message will have á in the string.) ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/K5QQ64GF2YVUHWRQ5LNCDRCN5VA6OZOZ/ Code of Conduct: http://python.org/psf/codeofconduct/
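The last point is easy to verify: runtime error messages already carry the offending character verbatim when it comes from the source text.

```python
# NameError keeps the non-ASCII identifier as-is in its message.
try:
    eval("2+Spám()")
except NameError as e:
    print(e)  # name 'Spám' is not defined
```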
[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)
On May 10, 2020, at 11:09, Christopher Barker wrote: Is there any way you can fix the reply quoting on your mail client, or manually work around it? I keep reading paragraphs and saying “why is he saying the same thing I said” only to realize that you’re not, that’s just a quote from me that isn’t marked, up until the last line where it isn’t… > On Sat, May 9, 2020 at 9:11 PM Andrew Barnert wrote: > > > That’s no more of a problem for a list slice view than for any of the > > existing views. The simplest way to implement a view is to keep a reference > > to the underlying object and delegate to it, which is effectively what the > > dict views do. > > Fair enough. Though you still could get potentially surprising behavior if > the original sequence's length is changed. I don’t think it’s surprising. When you go out of your way to ask for a dynamic view instead of the default snapshot copy, and then you change the list, you’d expect the view to change. If you don’t keep views around, because you’re only using them for more efficient one-shot iteration, you might never think about that, but then you’ll never notice it to be surprised by it. The dynamic behavior of dict views presumably hasn’t ever surprised you in the 12 years it’s worked that way. > And you probably don't want to lock the "host" anyway -- that could be very > confusing if the view is kept alive somewhere far from the code trying to > change the sequence. Yes. I think memoryview’s locking behavior is a special case, not something we’d want to emulate here. I’m guessing many people just never use memoryview at all, but when you do, you’re generally thinking about raw buffers rather than abstract behavior. (It’s right there in the name…) And when you need something more featureful than an invisible hard lock on the host, it’s time for numpy. :) > I'm still a bit confused about what a dict.* view actually is The docs explain it reasonably well. 
See https://docs.python.org/3/glossary.html#term-dictionary-view for the basic idea, https://docs.python.org/3/library/stdtypes.html#dict-views for the details on the concrete types, and I think the relevant ABCs and data model entries are linked from there. > -- for instance, a dict_keys object pretty much acts like a set, but it isn't > a subclass of set, and it has an isdisjoint() method, but not .union or any > of the other set methods. But it does have what at a glance looks like a pretty > complete set of dunders The point of collections.abc.Set, and ABCs in general, and the whole concept of protocols, is that the set protocol can be implemented by different concrete types—set, frozenset, dict_keys, third-party types like sortedcontainers.SortedSet or pyobjc.Foundation.NSSet, etc.—that are generally completely unrelated to each other, and implemented in different ways—a dict_keys is a link to the keys table in a dict somewhere, a set or frozenset has its own hash table, a SortedSet has a wide-B-tree-like structure, an NSSet is a proxy to an ObjC object, etc. If they all had to be subclasses of set, they’d be carrying around a set’s hash table but never using it; they’d have to be careful to override every method to make sure it never accidentally got used (and what would frozenset or dict_keys override add with?), etc. And if you look at the ABC, union isn’t part of the protocol, but __or__ is, and so on. > Anyway, a Sequence view is simpler, because it could probably simply be an > immutable sequence -- not much need for contemplating every bit of the API. It’s really the same thing, it’s just the Sequence protocol rather than the Set protocol. If anything, it’s _less_ simple, because for sequences you have to decide whether indexing should work with negative indices, extended slices, etc., which the protocol is silent about. But the answer there is pretty easy—unless there’s a good reason not to support those things, you want to support them. 
(The only open question is when you’re designing a sequence that you expect to be subclassed, but I don’t think we’re designing for subclassing here.) > I do see a possible objection here though. Making a small view of a large > sequence would keep that sequence alive, which could be a memory issue. Which > is one reason why slices don't do that by default. Yes. When you just want to iterate something once, non-lazily, you don’t care whether it’s a view or a snapshot, but when you want to keep it around, you do care, and you have to decide which one you want. So we certainly can’t change the default; that would be a huge but subtle change that would break all kinds of code. But I don’t think it’s a problem for offering an alternative that people have to explicitly ask for. Also, notice that this is true for all of the existing views, and none of them try to be un-featureful to avoid it. > And it could simply be a buyer beware issue. But the more featureful you make > a view, the
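Coming back to the protocol point about dict_keys: it’s easy to see in an interactive session that the views implement the Set protocol without inheriting from set (the specific dict here is just an example):

```python
from collections.abc import Set

d = {"a": 1, "b": 2}
keys = d.keys()

# dict_keys satisfies the Set ABC without subclassing set...
assert isinstance(keys, Set)
assert not isinstance(keys, set)

# ...and the protocol's operations work across unrelated concrete
# types: __or__ is part of the protocol (union isn't), and isdisjoint
# is the one named method the ABC includes.
assert keys | {"c"} == {"a", "b", "c"}
assert keys.isdisjoint({"x", "y"})
```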
[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)
On May 10, 2020, at 02:42, Alex Hall wrote: > > - Handling negative indices for sequences (is there any reason we don't have > that now?) Presumably partly just to keep it minimal and simple. Itertools is all about transforming iterables into other iterables in as generic a way as possible. None of the other functions do anything special if given a more fully-featured iterable. But also, negative indexing isn’t actually part of the Sequence protocol. (You don’t get negative indexes for free by inheriting Sequence as a mixin, nor is it ensured by testing isinstance with Sequence as an ABC.) It’s part of the extra stuff that list and the other builtin sequences happen to do. You didn’t suggest allowing negative islicing on set even though it could just as easily be implemented there, because you don’t expect negative indexing as part of the Set protocol (or the Sized Iterable protocol); you did expect it as part of the Sequence protocol, but Python’s model disagrees. Maybe practicality beats purity here, and islice should take negative indices on any Sequence, or even Sized, input, even though that makes it different from other itertools functions, and ignores the fact that it could be simulating negative indexing on some types where it’s meaningless. But how often have you wanted to call islice with a negative index? How horrible is the workaround you had to write instead? I suspect that it’s already rare enough of a problem that it’s not worth it, and that any form of this proposal would make it even rarer, but I could be wrong. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/ZGHJJJP43VZI4ZG7PRTIH3GJGTXANJK6/ Code of Conduct: http://python.org/psf/codeofconduct/
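For what it’s worth, the workaround for a Sized input isn’t horrible—normalize the indices against len() yourself before calling islice. A hypothetical helper (islice_seq is my name for illustration, not a real itertools function):

```python
from itertools import islice

def islice_seq(seq, start, stop):
    """Hypothetical helper: islice over a sized sequence, allowing
    negative indices by normalizing them against len(seq) first.
    (islice itself raises ValueError on negative indices.)"""
    n = len(seq)
    if start is not None and start < 0:
        start = max(0, n + start)
    if stop is not None and stop < 0:
        stop = max(0, n + stop)
    return islice(seq, start, stop)

assert list(islice_seq(range(10), -3, None)) == [7, 8, 9]
assert list(islice_seq("hello", 1, -1)) == ["e", "l", "l"]
```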
[Python-ideas] Re: Sanitize filename (path part)
On May 9, 2020, at 17:35, Steve Jorgensen wrote: > > I believe the Python standard library should include a means of sanitizing a > filesystem entry, and this should not be something requiring a 3rd party > package. > > One of reasons I think this should be in the standard lib is because that > provides a common, simple means for code reviewers and static analysis > services such as Veracode to recognize that a value is sanitized in an > accepted manner. This does seem like a good idea. People who do this themselves get it wrong all the time, occasionally with disastrous consequences, so if Python can solve that, that would be great. But, at least historically, this has been more complicated than what you’re suggesting here. For example, don’t you have to catch things like directories named “Con” or files whose 8.3 representation has “CON” as the 8 part? I don’t think you can hang an entire Windows system by abusing those anymore, but you can still produce filenames that some APIs, and some tools (possibly including Explorer, cmd, powershell, Cygwin, mingw/native shells, Python itself…) can’t access (or can only access if the user manually specified a \\.\ absolute path, or whatever). Is there an established algorithm/rule that lots of people in the industry trust that Python can just reference, instead of having to research or invent it? Because otherwise, we run the risk of making things worse instead of better. > What I am envisioning is a function (presumably in `os.path` with a signature > roughly like > {{{ > sanitizepart(name, permissive=False, mode=ESCAPE, system=None) > }}} Maybe it would make more sense to put this in pathlib. Then you construct a PurePath of the appropriate type, and call sanitize() on it (maybe with a flag that ensures that it’s a single path component if you expected it to be one). I think some, but not all, of this logic already exists in pathlib. > When `permissive` is `False`, characters that are generally unsafe are > rejected. 
When `permissive` is `True`, only path separator characters are > rejected. Generally unsafe characters besides path separators would include > things like a leading ".", any non-printing character, any wildcard, piping > and redirection characters, etc. I think neither of these is what I’d usually want. I never want to sanitize just pathsep characters without sanitizing all illegal characters. I do often want to sanitize all illegal characters (just \0 and the path sep on POSIX, a larger set that I don’t know by heart on Windows). I don’t think I’ve ever wanted to sanitize the set of potentially-unsafe characters you’re proposing here. I have wanted to sanitize (or pop up an “are you sure?” dialog, etc.) a wider range of potentially confusing characters. For example, newlines or Unicode separators can be very confusing in filenames. I’ve used one of those “potentially misleading URL” libs for this even though files and URLs aren’t quite the same and it was definitely overzealous, but if I’m not really confident that someone has thought through the details and widely vetted them, I’d rather have overzealous than underzealous for something like this. Meanwhile, on POSIX, it’s actually bytes rather than characters that are illegal. Any character that, in the filesystem’s encoding, would contain a \0 or \x2f byte is therefore illegal. Of course in UTF-8, the only such characters are NUL and /, so in scripts I write for my own use on my own systems where I know all the filesystems are UTF-8, I don’t worry about this. But something meant for hardening/verification tools seems like it needs to meet a higher standard and work on more varied systems. 
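To make the bytes-vs-characters point concrete, here’s a hypothetical check (posix_illegal_bytes is made up for illustration—the test has to be applied to the encoded form, not the str):

```python
def posix_illegal_bytes(name: str, encoding: str = "utf-8") -> bool:
    """Hypothetical check: on POSIX it's the *bytes* NUL (0x00) and
    '/' (0x2f) that are illegal in a path component, so we encode
    first and look for those bytes."""
    encoded = name.encode(encoding)
    return b"\x00" in encoded or b"/" in encoded

# In UTF-8, only NUL and '/' themselves produce those bytes...
assert not posix_illegal_bytes("r\u00e9sum\u00e9.txt")
assert posix_illegal_bytes("a/b")
# ...but in other encodings, perfectly innocent characters can hide an
# illegal byte: every ASCII character in UTF-16-LE carries a NUL byte.
assert posix_illegal_bytes("abc", encoding="utf-16-le")
```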
And I don’t know how you could even apply the right rule without knowing what the file system encoding is (which means you need the full path, not just the component to be checked) or requiring bytes rather than str (but then it doesn’t work for Windows, and resolving that whole mess gets extra fun, and even on POSIX it’s a lot less common to use). Speaking of encodings and Windows, isn’t any character not in the user’s OEM code page likely to be confusing? Sure, it’ll work with other Python 3.8 scripts, but it’ll crash or do the wrong thing or display mojibake when used with lots of other tools. > The `mode` argument indicates what to do with unacceptable characters. Escape > them (`ESCAPE`), omit them (`OMIT`) or raise an exception (`RAISE`). What’s the exception, and what attributes does it have? Usually I don’t care too much as long as the traceback/log entry/whatever is good enough for debugging, but for this function, I think I’d often want to be able to programmatically access the character(s) that triggered the error so I can tell the user. Especially if the rule isn’t a fixed, well-known one that you can describe the way Windows Explorer does when you try to use an illegal character. > This could also double as an escape character argument
[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)
On May 9, 2020, at 19:43, Christopher Barker wrote: > > On Sat, May 9, 2020 at 1:03 PM Andrew Barnert wrote: > > https://github.com/PythonCHB/islice-pep/blob/master/pep-xxx-islice.rst > > I haven’t read the whole thing yet, but one thing immediately jumped out at > me: > > > and methods on containers, such as dict.keys return iterators in Python 3, > > No they don’t. They return views—objects that are collections in their own > right (in particular, they’re not one-shot; they can be iterated over and > over) but just delegate to another object rather than storing the data. > > Thanks -- that's that kind of thing that led me to say that this is probably > not ready for a PEP. > > but I don't think that invalidates the idea at all -- there is debate about > what an "islice" should return, but an iterable view would be a good option. I don’t think it invalidates the basic idea at all, just that it suggests the design should be different. Originally, dict returned lists for keys, values, and items. In 2.2, iterator variants were added. In 3.0, the list and iterator variants were both replaced with view versions, which were enough of an improvement that they were backported to 2.x. Because a view does cover almost all of the uses of both a sequence copy and an iterator. And I think the same is true here. > I'm inclined to think that it would be a bad idea to have it return a full > sequence view object, and not sure it should do anything other than be > iterable. Why? What’s the downside to being able to do more with them for the same performance cost and only a little more up-front design work? > > And this is important here, because a view is what you ideally _want_. The > > reason range, key view, etc. are views rather than iterators isn’t that > > it’s easier to implement or explain or anything, it’s that it’s a little > > harder to implement and explain but so much more useful that it’s worth it. 
> > It’s something people take advantage of all the time in real code. > > Maybe -- but "all the time?" I'd venture to say that absolutely the most > common thing done with, e.g. dict.keys() is to iterate over it. Really? When I just want to iterate over a dict’s keys, I iterate the dict itself. > > For prior art specifically on slicing as a view, rather than just views in > > general, see memoryview (which only works on buffers, not all sequences) > > and NumPy (which is weird in many ways, but people rely on slicing giving > > you a storage-sharing view) > > I am a long-time numpy user, and yes, I very much take advantage of the > memory sharing view. > > But I do not think that that would be a good idea for the standard library. > numpy slices return a full-fledged numpy array, which shares a data view with > its "host" -- this is really helpful for performance reasons -- moving > large blocks of data around is expensive, but it's also pretty confusing. And > it would be a lot more problematic with, e.g. lists, as the underlying buffer > can be reallocated -- numpy arrays are mutable, but not re-sizable, once > you've made one its data buffer does not change. That’s no more of a problem for a list slice view than for any of the existing views. The simplest way to implement a view is to keep a reference to the underlying object and delegate to it, which is effectively what the dict views do. (Well, did from 2.x to 3.5. The dict improvements in 3.6 opened up an optimization opportunity, because in the split layout a dict is effectively a wrapper around a keys view and a separate table, so the keys view can refer directly to that thing that already exists. But that isn’t relevant here.) (You _could_ instead refuse to allow expanding a sequence when there’s a live view, as bytearray does with memoryview, but I don’t think that’s necessary here. It’s only needed there as a consequence of the fact that the buffer protocol is provided in C rather than in Python. 
For a slice view, it would just make things more complicated and less functional for no good reason.) > > But just replacing islice is a much simpler task (mainly because the input > > has to be a sequence and the output is always a sequence, so the only > > complexity that arises is whether you want to allow mutable views into > > mutable sequences), and it may well be useful on its own. > > Agreed. And while yes, dict_keys and friends are not JUST iterators, they > also aren't very functional views, either. They are not sequences, That’s not true. They are very functional—as functional as reasonably makes sense. The only reason they’re not Sequences is that they’re views on dicts, so indexing makes little sense, but set operations do—and they are in fact Sets. (Except for values.) > certainly not mutable sequences. Well, yes, but mutating a dict through its views wouldn’t make sense in the first place:

>>> d = {1: 2}
>>> k = d.keys()
>>> k |= 3

You’ve told it to
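Incidentally, the keep-a-reference-and-delegate approach described above really is only a few lines. A minimal sketch (SliceView is hypothetical, and it skips nested slicing and other details a real version would need):

```python
from collections.abc import Sequence

class SliceView(Sequence):
    """Hypothetical dynamic view: holds a reference to the host
    sequence and delegates to it, like the dict views do, so it
    reflects later mutations of the host."""
    def __init__(self, seq, start, stop):
        self._seq, self._start, self._stop = seq, start, stop
    def __len__(self):
        start, stop, _ = slice(self._start, self._stop).indices(len(self._seq))
        return max(0, stop - start)
    def __getitem__(self, i):
        if isinstance(i, slice):
            raise NotImplementedError("nested slicing elided for brevity")
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self._seq[self._start + i]

lst = [0, 1, 2, 3, 4]
v = SliceView(lst, 1, 4)
assert list(v) == [1, 2, 3]
lst[2] = 99                      # the view sees the change,
assert list(v) == [1, 99, 3]     # just like a dict view would
```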
[Python-ideas] Re: Equality between some of the indexed collections
On May 9, 2020, at 13:24, Dominik Vilsmeier wrote: > > >> On 09.05.20 22:16, Andrew Barnert wrote: >>> >> There’s an obvious use for the .all, but do you ever have a use for the >> elementwise itself? When do you need to iterate all the individual >> comparisons? (In numpy, an array of bools has all kinds of uses, starting >> with indexing or selecting with it, but I don’t think any of them are doable >> here.) > I probably took too much inspiration from Numpy :-) Also I thought it > would nicely fit with the builtin `all` and `any`, but you are right, > there's probably not much use for the elementwise iterator itself. So > one could use `elementwise` as a namespace for `elementwise.all(chars) > == string` and `elementwise.any(chars) == string` which automatically > reduce the elementwise comparisons and the former also performs a length > check prior to that. This would still leave the option of having > `elementwise(x) == y` return an iterator without reducing (if desired). But do you have any use for the .any? Again, it’s useful in NumPy, but would any of those uses translate? If you’re never going to use elementwise.any, and you’re never going to use elementwise itself, having elementwise.all rather than just making that the callable is just making the useful bit a little harder to access. And it’s definitely complicating the implementation, too. If you have a use for the other features, that may easily be worth it, but if you don’t, why bother? I took my lexicompare, stripped out the dependency on other helpers in my toolbox (which meant rewriting < in a way that might be a little slower; I haven’t tested) and the YAGNI stuff (like trying to be “view-ready” even though I never finished my views library), and posted it at https://github.com/abarnert/lexicompare (no promises that it’s stdlib-ready as-is, of course, but I think it’s at least a useful comparison point here). 
It’s pretty hard to beat this for simplicity:

@total_ordering
class _Smallest:
    def __lt__(self, other):
        return True

@total_ordering
class lexicompare:
    def __new__(cls, it):
        self = super(lexicompare, cls).__new__(cls)
        self.it = it
        return self
    def __eq__(self, other):
        return all(x == y for x, y in zip_longest(self.it, other, fillvalue=object()))
    def __lt__(self, other):
        for x, y in zip_longest(self.it, other, fillvalue=_Smallest()):
            if x < y:
                return True
            elif y < x:
                return False
        return False

___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/JUG6XZMEGTRYWBUKUVAOXN64FTAJGTX7/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Equality between some of the indexed collections
On May 9, 2020, at 02:58, Dominik Vilsmeier wrote: > > > Initially I assumed that the reason for this new functionality was > concerned with cases where the types of two objects are not precisely > known and hence instead of converting them to a common type such as > list, a direct elementwise comparison is preferable (that's probably > uncommon though). Instead in the case where two objects are known to > have different types but nevertheless need to be compared > element-by-element, the performance argument makes sense of course. > > So as a practical step forward, what about providing a wrapper type > which performs all operations elementwise on the operands. So for example: > > if all(elementwise(chars) == string): > ... > > Here the `elementwise(chars) == string` part returns a generator which > performs the `==` comparison element-by-element. > > This doesn't perform any length checks yet, so as a bonus one could add > an `all` property: > > if elementwise(chars).all == string: > ... There’s an obvious use for the .all, but do you ever have a use for the elementwise itself? When do you need to iterate all the individual comparisons? (In numpy, an array of bools has all kinds of uses, starting with indexing or selecting with it, but I don’t think any of them are doable here.) And obviously this would be a lot simpler if it was just the all object rather than the elementwise object—and even a little simpler to use: element_compare(chars) == string (In fact, I think someone submitted effectively that under a different name for more-itertools and it was rejected because it seemed really useful but more-itertools didn’t seem like the right place for it. I have a similar “lexicompare” in my toolbox, but it has extra options that YAGNI. Anyway, even if I’m remembering right, you probably don’t need to dig up the more-itertools PR because it’s easy enough to redo from scratch.) 
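A minimal sketch of that simpler version, under the hypothetical name element_compare (the length check falls out of zip_longest with a sentinel fill value that never compares equal to anything):

```python
from itertools import zip_longest

class element_compare:
    """Hypothetical wrapper: == compares two iterables elementwise,
    with the length check built in (unequal lengths compare unequal,
    because the sentinel filling the shorter side equals nothing)."""
    def __init__(self, it):
        self._it = it
    def __eq__(self, other):
        sentinel = object()
        return all(x == y for x, y in
                   zip_longest(self._it, other, fillvalue=sentinel))

assert element_compare((1, 2, 3)) == [1, 2, 3]
assert not (element_compare((1, 2)) == [1, 2, 3])
assert not (element_compare("abc") == ["a", "b"])
```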
___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/AF3Z63YYQQVWCV3DZQJMKFNKO2G5AXKG/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)
On May 9, 2020, at 12:38, Christopher Barker wrote: > > https://github.com/PythonCHB/islice-pep/blob/master/pep-xxx-islice.rst I haven’t read the whole thing yet, but one thing immediately jumped out at me: > and methods on containers, such as dict.keys return iterators in Python 3, No they don’t. They return views—objects that are collections in their own right (in particular, they’re not one-shot; they can be iterated over and over) but just delegate to another object rather than storing the data. People also commonly say that range is an iterator instead of a function that returns a list in Python 3, and that’s wrong for the same reason. And this is important here, because a view is what you ideally _want_. The reason range, key view, etc. are views rather than iterators isn’t that it’s easier to implement or explain or anything, it’s that it’s a little harder to implement and explain but so much more useful that it’s worth it. It’s something people take advantage of all the time in real code. And this is pretty easy to implement. I have a quick and dirty version at https://github.com/abarnert/slices, but I think I may have a better version somewhere with more unit tests. For prior art specifically on slicing as a view, rather than just views in general, see memoryview (which only works on buffers, not all sequences) and NumPy (which is weird in many ways, but people rely on slicing giving you a storage-sharing view). The reason I never proposed this for the stdlib (even though that would allow adding methods directly onto the builtin container types, as your proposal does) is that I always want to build a _complete_ view library, with replacements for map, zip, enumerate, all of itertools, etc., and with enough cleverness to present exactly as much functionality as is possible. 
But just replacing islice is a much simpler task (mainly because the input has to be a sequence and the output is always a sequence, so the only complexity that arises is whether you want to allow mutable views into mutable sequences), and it may well be useful on its own. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/YQKKS4RADWU3QOFWFUU6PHS3ZU523T7P/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: islice with actual slices
On May 9, 2020, at 02:12, Ram Rachum wrote: > > > Here's an idea I've had. How about instead of this: > > itertools.islice(iterable, 7, 20) > > We'll just have: > > itertools.islice(iterable)[7:20] I’ve actually built this.[1] From my experience, it feels clever at first, but it can get confusing. The problem is that if you slice twice, or slice after nexting, you can’t get a feel for what the remaining values should be unless you work it through. Of course the exact same thing is true with using islice twice today, but you don’t _expect_ that to be comprehensible in terms of slicing the original iterable twice, while with slice notation, you do. Or at least I do; maybe that’s just me. And meanwhile, even though the simple uses aren’t confusing, I’ve never had any code where it made things nicer enough that it seemed worth reaching into the toolbox. But again, maybe that’s just me. If you want to play with this and can’t implement it yourself easily, I could dig up my implementation. But it’s pretty easy (especially if you don’t try to optimize and just have __getitem__ return a new islice around self). — [1] Actually, I built an incomplete viewtools (a replacement for itertools plus zip, map, etc. that gives you views that are reusable iterables and forward as much input behavior as possible—so map(lambda i: i*2, range(10)) is a sequence, while filter(lambda i: i%2, range(10)) is not a sequence but it is reversible, and so on) and then extracted and simplified the vslice because I thought it might be useful without the views stuff. 
(I also extracted and simplified it in a different way, as view slices that only work on sequences, and that actually did turn out to be occasionally useful.)___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/CQ43LF5UICMYNNB43JJM2CXOILUOCSPC/ Code of Conduct: http://python.org/psf/codeofconduct/
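For reference, the unoptimized version described in this message—__getitem__ just returns a new wrapper around islice of self—really is only a few lines. This sketch (sliceable_islice is a hypothetical name) also shows why repeated slicing gets confusing: each slice is relative to the items remaining, not to the original iterable:

```python
import itertools

class sliceable_islice:
    """Hypothetical wrapper: an iterator whose __getitem__ accepts a
    slice and returns a new wrapper around itertools.islice(self, ...),
    i.e. the unoptimized approach described in the text."""
    def __init__(self, iterable):
        self._it = iter(iterable)
    def __iter__(self):
        return self
    def __next__(self):
        return next(self._it)
    def __getitem__(self, s):
        if not isinstance(s, slice):
            raise TypeError("only slices are supported")
        return sliceable_islice(
            itertools.islice(self._it, s.start, s.stop, s.step))

assert list(sliceable_islice(range(100))[7:20]) == list(range(7, 20))

# Slicing twice composes relative to the *remaining* items, which is
# exactly the behavior that can get confusing:
s = sliceable_islice(range(100))[10:]
assert list(s[5:8]) == [15, 16, 17]
```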
[Python-ideas] Re: zip() as a class constructor (meta) [was: ... Length-Checking To zip]
On May 9, 2020, at 03:46, Chris Angelico wrote: > > But ultimately, a generator function is very similar to a class with a > __next__ method. When you call it, you get back a state object that > you can ping for the next value. That's really all that matters. Well, it’s very similar to a class with __next__, send, throw, and close methods. But that doesn’t really change your point. For a different angle on this: What if Python 3.10 changed things so that every generator function actually did define a new class (maybe even accessible as a member of the function)? What would break? You could make inspect.isgenerator() continue to work, and provide the same internal attributes documented in the inspect module. So only code that depends on type(gen()) is types.GeneratorType would break (and there probably is very little of that—not even throwaway REPL code). Also: a generator isn’t actually a way of defining a class, but it’s a way of defining a factory for objects that meet a certain API, and Python goes out of its way to hide that distinction wherever possible (not just for generators, but in general). The only meaningful thing that’s different between a generator function and a generator class is that the author of the function doesn’t directly write the __next__ (and send, close, etc.) code, but instead writes code that defines their behavior implicitly. And that’s obviously just an implementation detail, and it isn’t that much different from the fact that the author of a @dataclass doesn’t directly write the __init__, __repr__, etc. So you’re right, from outside, it really doesn’t matter. > I > think the C implementations tend to be classes but the Python ones > tend to be generators - possibly because a generator function is way > easier to write in Python, but maybe the advantage isn't as strong in > C. It’s not just not as strong, it runs in the opposite direction. In fact, it’s impossible to write generator functions in C. 
There’s no way to yield control in a C function. (Even if you build a coro library around setjmp, or use C++20 coros, it wouldn’t help you yield back into CPython’s ceval loop.) A generator object is basically just a wrapper around an interpreter frame and its bytecode; there’s no way to exploit that from C. There are a few shortcuts to writing an iterator (e.g., when you have a raw array, implement the old-style sequence protocol, want to delegate to a member, or can steal another type’s implementation as frozenset does with set), but a generator function isn’t one of them. (If you’re curious how Cython compiles generators, it’s worth looking at what it produces—but doing the same thing in raw C would not be a shortcut to writing a generator class.) ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/OGNVZK6SUCE2YUMM4IUHHD4TG76A7CYX/ Code of Conduct: http://python.org/psf/codeofconduct/
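The equivalence discussed in this message is easy to show in Python itself: the same iterator written once as a generator function and once as a class with __next__ (both examples are mine, for illustration):

```python
from collections.abc import Iterator

def countdown_gen(n):
    """Generator-function form: the state lives in the frame."""
    while n > 0:
        yield n
        n -= 1

class CountdownIter(Iterator):
    """Class-with-__next__ form: the state the generator keeps in its
    frame is kept explicitly in an attribute instead."""
    def __init__(self, n):
        self.n = n
    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

# From the outside, the two are interchangeable:
assert list(countdown_gen(3)) == [3, 2, 1]
assert list(CountdownIter(3)) == [3, 2, 1]
```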
[Python-ideas] Re: zip(x, y, z, strict=True)
> On May 9, 2020, at 04:30, Alex Hall wrote: > >> On Fri, May 8, 2020 at 11:22 PM Andrew Barnert via Python-ideas >> wrote: > >> Trying to make it a flag (which will always be passed a constant value) is a >> clever way to try to get the best of both worlds—and so is the >> chain.from_iterable style. > > At this point it sounds like you're saying that zip(..., strict=True) and > zip.strict(...) are equally bad. You’re right, it did sound like that, and I don’t mean that. Sorry. zip.strict has _some_ of the same problems as zip(strict=True), but definitely not _all_ of them. And I definitely prefer zip.strict to the flag. At the time I wrote this (I don’t know why it took a few days to get delivered…), zip.strict had come up the first time and been roundly shouted down, and it seemed like nobody but me (and the proposer, of course) had found it at all acceptable, and I was trying to make the point that if people don’t like zip.strict, the same things and more apply to passing an always-constant flag, so the flag should be even less acceptable. Then, over the last few days, a bunch of people came around on zip.strict. And that seems to be at least in part because people came up with better arguments than the first time around. (For example, I forget who it was that pointed out that you don’t really have to start thinking of zip as a class and zip.strict as an alternate constructor, because plenty of people don’t realize that’s true for chain.from_iterable and they still have no more problem using it than they do for datetime.now.) So now, rather than it being a +0 for me and a distant second choice behind an itertools function, I think I’m pretty close to evenly torn between the two. I do think that if we add zip.strict, we should also probably add zip.longest, not just think about maybe adding it some day. And it might even be worth adding zip.shortest, even if we have no intention of ever eliminating zip() itself or changing it to mean zip.strict. 
But I don’t have good arguments for these; I’ll have to think about it a bit more to explain why I think consistency easily trumps the costs for this variant of the proposal but probably fails for other variants. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/RAFDWYYUIDOLCQ4M7HS35DZL56LR32YX/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Equality between some of the indexed collections
On May 8, 2020, at 20:36, Dan Sommers <2qdxy4rzwzuui...@potatochowder.com> wrote: > > On Fri, 8 May 2020 17:40:31 -0700 > Andrew Barnert via Python-ideas wrote: > >> So, the OP is right that (1,2,3)==[1,2,3] would sometimes be handy, >> the opponents are right that it would often be misleading, and the >> question isn’t which one is right ... > > That's a good summary. Thank you. :-) > >> [1] If anyone still wants to argue that using a tuple as a hashable >> sequence instead of an anonymous struct is wrong, how would you change >> this excerpt of code: >> >>memomean = memoize(mean, key=tuple) >>def player_stats(player): >># … >>… = memomean(player.scores) … >># … >> >> Player.scores is a list of ints, and a new one is appended after each >> match, so a list is clearly the right thing. But you can’t use a list >> as a cache key. You need a hashable sequence of the same values. And >> the way to spell that in Python is tuple. > > Very clever. I don’t think it’s particularly clever. And that’s fine—using common idioms usually is one of the least clever ways to do something out of the infinite number of possible ways. Because being intuitively the one obvious way tends to be important to becoming an idiom, and it tends to run counter to being clever. (Being concise, using well-tested code, and being efficient are also often important, but being clever doesn’t automatically give you any of those.) > Then again, it wouldn't be python-ideas if it were that > simple! "hashable sequence of the same values" is too strict. I think > all memoize needs is a key function such that if x != y, then key(x) != > key(y). Well, it does have to be hashable. (Unless you’re proposing to also replace the dict with an alist or something?) I suppose it only needs to be a hashable _encoding_ of a sequence of the same values, but surely the simplest encoding of a sequence is the sequence itself, so, unless “hashable sequence” is impossible (which it obviously isn’t), who cares? 
>def key(scores): >','.join(str(-score * 42) for score in scores) This is still a sequence. If you really want to get clever, why not: def key(scores): return sum(prime**score for prime, score in zip(calcprimes(), scores)) But this just demonstrates why you don’t really want to get clever. It’s more code to write, read, and debug than tuple, easier to get wrong, harder to understand, and almost certainly slower, and the only advantage is that it deliberately avoids meeting a requirement that we technically didn’t need but got for free. > Oh, wait, even that's too strict. All memoize really needs is if > mean(x) != mean(y), then key(x) != key(y): > >memomean = memoize(mean, key=mean) >def player_stats(player): ># … >… = memomean(player.scores) … ># … Well, it seems pretty unlikely that calculating the mean to use it as a cache key will be more efficient than just calculating the mean, but hey, if you’ve got benchmarks, benchmarks always win. :) (In fact, I predicted that memoizing here would be a waste of time in the first place, because the only players likely to have equal score lists to earlier players would be the ones with really short lists—but someone wanted to try it anyway, and he was able to show that it did speed up the script on our test data set by something like 10%. Not nearly as much as he’d hoped, but still enough that it was hard to argue against keeping it.) ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/QLEDJ7XBE3EHG2C3J2QEFOWROSTMSH4C/ Code of Conduct: http://python.org/psf/codeofconduct/
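[Editor's note: the memoize(mean, key=tuple) idiom under discussion can be sketched like this. The memoize below is my own minimal version, since the thread never shows its actual implementation; assume the real one looks roughly similar.]

```python
import functools

def memoize(func, key=None):
    # Minimal single-argument memoize with an optional key function.
    # key turns the (possibly unhashable) argument into a hashable cache key.
    cache = {}

    @functools.wraps(func)
    def wrapper(arg):
        k = key(arg) if key is not None else arg
        if k not in cache:
            cache[k] = func(arg)
        return cache[k]

    return wrapper

def mean(xs):
    return sum(xs) / len(xs)

# The excerpt from the thread: a list of scores can't be a dict key,
# but tuple(scores) is the same sequence, hashable.
memomean = memoize(mean, key=tuple)
print(memomean([1, 2, 3]))  # 2.0 (computed)
print(memomean([1, 2, 3]))  # 2.0 (cache hit on the key (1, 2, 3))
```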
[Python-ideas] Re: PEP 618: Add Optional Length-Checking To zip
On May 4, 2020, at 10:44, Steve Barnes wrote: > And "equal" doesn't say what it's equal. > > What we need is a word that means "same length", much as "shorter" and > "longer" are about length. > > There's "coextensive", but that'll probably get a -1. If “equal” is bad, “coextensive” is much worse. “Equal” is arguably ambiguous between “same length” and “same values”, but “coextensive” usually means “same values”. “The County shall be coextensive with the City of San Francisco” doesn’t mean that it’s 49.81 square miles, it means it consists of the exact same 49.81 square miles as the city. “The golden age of Dutch culture was roughly coextensive with the Netherlands’ reign as a world power…” doesn’t mean it was roughly 67 years, it means it was roughly the same 67 years from 1585 to 1652.[1] “Consciousness and knowledge are coextensive” means that you know the things you’re conscious of. And in math[2], a popular example in undergrad textbooks[3] is that (Z/7Z, +) and (Z/7Z, *) are coextensive but still distinct groups. The most popular formulation of the axiom of reducibility in early predicative set theory was “to each propositional function there corresponds a coextensive predicative function”. Even in measure theory, it seems to always mean “same extension”, not “same extent”. So, this would be a great name for the function in the other thread about comparing lists and tuples as equal, but it’s not a great name here. Some dictionaries do give “commensurate” or similar as a secondary[4] meaning, but at best that would mean it’s still ambiguous. —- [1] And here I thought it was 1989 until whenever Guido left. [2] I didn’t even remember that it was used in math until I used the word in its normal English sense and one of the other Steves accused me of resorting to mathematical jargon—but after that, I did some searching, and I was wrong, and it actually is reasonably common. [3] Seriously, I found the exact same example in three apparently unrelated textbooks. 
Which is pretty odd. [4] Or even later, after giving the same spatial boundaries, then the same temporal boundaries, then the math/logic definition, but I’m lumping those all together as one sense because they’re coextensive if spacetime is topological. :) ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/W3RLUQ2GUQX4I5GV6X7UUTLQ7QPJ6ZA2/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: General methods
On May 8, 2020, at 15:44, Steven D'Aprano wrote: > > On Fri, May 08, 2020 at 10:46:45PM +0300, Serhiy Storchaka wrote: > >> I propose to add the METH_GENERAL flag, which is applicable to methods >> as METH_CLASS and METH_STATIC (and is mutually incompatible with them). >> If it is set, the check for the type of self will be omitted, and you >> can pass an arbitrary object as the first argument of the unbound method. > > Does this effect code written in Python? As I understand, in Python > code, unbound methods are just plain old functions, and there is no > type-checking done on `self`. > >py> class C: >... def method(self, arg): >... return (self,) >... >py> C.method(999, None) >(999,) > > So I think your proposal will only affect builtin methods written in C. > Is that correct? Maybe the best way to see it is this: For classes implemented in Python, you have to go out of your way to typecheck self. For classes implemented in C, you have to go out of your way to _not_ typecheck self. It’s probably way too big of a change to make them consistent at this point, so Serhiy is just proposing a way to make it a lot easier for C methods to act like Python ones when you need them to. And, given that he has some solid use cases, it’s hard to see any problem with that. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/JKAYR4SWRM7HTJS3RN775R2I4K3B75XQ/ Code of Conduct: http://python.org/psf/codeofconduct/
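[Editor's note: the asymmetry being discussed is easy to demonstrate. Python-level methods accept any object as self; C-level methods typecheck it, with str.upper here standing in for any builtin method.]

```python
class C:
    def method(self):
        return self

# A class written in Python: no typecheck on self at all.
print(C.method(999))  # 999

# A class written in C: passing the wrong type of self raises TypeError.
try:
    str.upper(999)
except TypeError as e:
    print("rejected:", e)
```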
[Python-ideas] Re: Equality between some of the indexed collections
On May 6, 2020, at 05:22, Richard Damon wrote: > > In my mind, tuples and lists seem very different concepts, that just > happen to work similarly at a low level (and because of that, are > sometimes 'misused' as each other because it happens to 'work'). I think this thread has gotten off track, and this is really the key issue here. If someone wants this proposal, it’s because they believe it’s _not_ a misuse to use a tuple as a frozen list (or a list as a mutable tuple). If someone doesn’t want this proposal, the most likely reason (although admittedly there are others) is because they believe it _is_ a misuse to use a tuple as a frozen list. It’s not always a misuse; it’s sometimes perfectly idiomatic to use a tuple as an immutable hashable sequence. It doesn’t just happen to 'work', it works, for principled reasons (tuple is a Sequence), and this is a good thing.[1] It’s just that it’s _also_ common (probably a lot more common, but even that isn’t necessary) to use it as an anonymous struct. So, the OP is right that (1,2,3)==[1,2,3] would sometimes be handy, the opponents are right that it would often be misleading, and the question isn’t which one is right, it’s just how often is often. And the answer is obviously: often enough that it can’t be ignored. And that’s all that matters here. And that’s why tuple is different from frozenset. Very few uses of frozenset are as something other than a frozen set, so it’s almost never misleading that frozensets equal sets; plenty of tuples aren’t frozen lists, so it would often be misleading if tuples equaled lists. —- [1] If anyone still wants to argue that using a tuple as a hashable sequence instead of an anonymous struct is wrong, how would you change this excerpt of code: memomean = memoize(mean, key=tuple) def player_stats(player): # … … = memomean(player.scores) … # … Player.scores is a list of ints, and a new one is appended after each match, so a list is clearly the right thing. 
But you can’t use a list as a cache key. You need a hashable sequence of the same values. And the way to spell that in Python is tuple. And that’s not a design flaw in Python, it’s a feature. (Shimmer is a floor wax _and_ a dessert topping!) Sure, when you see a tuple, the default first guess is that it’s an anonymous struct—but when it isn’t, it’s usually so obvious from context that you don’t even have to think about it. It’s confusing a lot less often than, say, str, and it’s helpful a lot more often. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/F65FI2QMUOUCD2RVW4APQMNAFALQZFXB/ Code of Conduct: http://python.org/psf/codeofconduct/
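[Editor's note: a quick runnable restatement of the point above; nothing here is from the thread beyond the idea itself.]

```python
cache = {}
scores = [10, 20, 30]

try:
    cache[scores] = "stats"        # lists are unhashable: TypeError
except TypeError:
    pass

cache[tuple(scores)] = "stats"     # the idiomatic spelling of a hashable sequence

scores.append(40)                  # mutating the list doesn't disturb the key
print((10, 20, 30) in cache)       # True
```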
[Python-ideas] Re: Auto-assign attributes from __init__ arguments
On May 4, 2020, at 17:26, Steven D'Aprano wrote: > > Proposal: > > We should have a mechanism that collects the current function or > method's parameters into a dict, similar to the way locals() returns all > local variables. > > This mechanism could be a new function,or it could even be a magic local > variable inside each function, similar to what is done to make super() > work. But for the sake of this discussion, I'll assume it is a function, > `parameters()`, without worrying about whether it is a built-in or > imported from the `inspect` module. Some other popular languages have something pretty similar. (And they’re not all as horrible as perl $*.) For example, in JavaScript, there’s a magic local variable named arguments whose value is (a thing that duck-types as) a list of the arguments passed to the current function’s parameters. (Not a dict, but that’s just because JS doesn’t have keyword arguments.) > function spam(x, y) { console.log(arguments) } > spam(23, 42) [23, 42] Whether it’s called arguments or parameters, and whether it’s a magic variable or a magic function, are minor bikeshedding issues (which you already raised), not serious objections to considering them parallel. And I think all of the other differences are either irrelevant, or obviously compelled by differences between the languages (e.g., Python doesn’t need a rule for how it’s different between the two different kinds of functions, because lambda doesn’t produce a different kind of function). So, I think this counts as a prior-art/cross-language argument for your proposal. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/JABSTNZJ2D5GMI23FXJD7UAG7QPXVHJK/ Code of Conduct: http://python.org/psf/codeofconduct/
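[Editor's note: something close to the proposed parameters() can already be approximated today with frame introspection. This is a sketch using inspect, not the proposal itself; the real proposal presumably wouldn't need frame hacking.]

```python
import inspect

def parameters():
    # Return a dict of the *calling* function's parameters and their
    # current values, by inspecting the caller's frame.
    frame = inspect.currentframe().f_back
    info = inspect.getargvalues(frame)
    params = {name: info.locals[name] for name in info.args}
    if info.varargs:                     # *args, if any
        params[info.varargs] = info.locals[info.varargs]
    if info.keywords:                    # **kwargs, if any
        params[info.keywords] = info.locals[info.keywords]
    return params

def spam(x, y, *rest, **kw):
    return parameters()

print(spam(23, 42, 99, z=1))
# {'x': 23, 'y': 42, 'rest': (99,), 'kw': {'z': 1}}
```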
[Python-ideas] Re: zip(x, y, z, strict=True)
On May 5, 2020, at 12:50, Christopher Barker wrote: > > Another key point is that if you want zip_longest() functionality, you simply > can not get it with the builtin zip() -- you are forced to look elsewhere. > Whereas most code that might want "strict" behavior will still work, albeit > less safely, with the builtin. I think this is a key point, but I think you’ve got it backward. You _can_ build zip_longest with zip, and before 2.6, people _did_. (Well, they built izip_longest with izip.) I’ve still got my version in an old toolbox. You chain a repeat(None) onto each iterable, izip, and you get an infinite iterator that you have to read until all(is None). You can just takewhile that into exactly the same thing as izip_longest, but unfortunately that’s a lot slower than filtering when you iterate, so I had both _longest and _infinite variants, and I think I used the latter more even though it was usually less convenient. That sounds like a silly way to do it, and it’s certainly easier to get subtly wrong than just writing a generator function like the “as if” code in the (i)zip_longest docs, but a comment in my code assures me that this is almost 4x as fast, and half the speed of a custom C implementation, so I’m pretty sure that’s why I did it. And I doubt I’m the only person who solved it that way. In fact, I’ll bet I copied it from an ActiveState recipe or a colleague or an open source project. So, most likely, izip_longest wasn’t added because you can’t build it on top of izip, but because building it on top of izip is easy to get subtly wrong (especially if you need it to be fast—or don’t need it to be fast but micro optimize it anyway, for that matter), and often people punt and do something clunkier (use _infinite instead of _longest and make the final for loop more complicated). Which is actually a pretty good parallel for the current proposal. 
You can write your own zip_strict on top of zip, and at least a few people do—but, as people have shown in this thread, the obvious solution is too slow, the obvious fast solution is very easy to get subtly wrong, and often people punt and do something clunkier (listify and compare len). That’s why I’m +1 on this proposal in some form. Assuming zip_strict would be useful at least as often as zip_longest (and I’ve been sold on that part, and I think most people on all sides of this discussion agree?), it calls out for a good official solution. The fact that the ecosystem is different nowadays (pip install more-itertools or copying off StackOverflow is a lot simpler, and more common, than finding a recipe on ActiveState) does make it a little less compelling, but at most that means the official solution should be a docs link to more-itertools, still not that we should do nothing. But that’s also part of the reason I’m -1 on it being a flag. Just like zip_longest, it’s a different function, one you shouldn’t think of as being built on zip even if it could be. Maybe strict really is needed so much more often than longest that “import itertools” is too onerous, but if that’s really true, that different function should be another builtin. I think nobody is arguing for that, because it’s just obvious that it isn’t needed enough to reach the high bar of adding another function to builtins. But that means it belongs in itertools. Trying to make it a flag (which will always be passed a constant value) is a clever way to try to get the best of both worlds—and so is the chain.from_iterable style. But if either of those really did get the best of both worlds and the problems of neither, it would be used all over the place, rather than as sparingly as possible. And of course it doesn’t get the best of both worlds. A flag is hiding code as data, and it looks misleadingly like the much more common uses of flags where you actually do often set the flag with a runtime value. 
It’s harder to type (and autocomplete makes the difference worse, not better). It’s a tiny bit harder to read, because you’re adding as much meaningless boilerplate (True) as important information (strict). It’s increasing the amount of stuff to learn in builtins just as much as another function would. And so on. So it’s only worth doing for really special cases, like open. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/IEMCC3WXEHV2J7DLP7OXWSYATLSC3BBI/ Code of Conduct: http://python.org/psf/codeofconduct/
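[Editor's note: the hand-rolled zip_strict the message above describes can be sketched with the sentinel trick — roughly what more-itertools' zip_equal does, not PEP 618's eventual implementation.]

```python
from itertools import zip_longest

def zip_strict(*iterables):
    # zip() that raises if the iterables have unequal lengths.
    # Identity-check the sentinel (not ==) so elements with odd
    # __eq__ methods can't trigger a false positive.
    sentinel = object()
    for row in zip_longest(*iterables, fillvalue=sentinel):
        if any(x is sentinel for x in row):
            raise ValueError('iterables have different lengths')
        yield row

print(list(zip_strict([1, 2], 'ab')))  # [(1, 'a'), (2, 'b')]

try:
    list(zip_strict([1, 2, 3], 'ab'))
except ValueError as e:
    print(e)
```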
[Python-ideas] Re: Adding a "once" function to functools
On May 1, 2020, at 09:51, Tom Forbes wrote: > >> You’ve written an exactly equIvalent to the double-checked locking for >> singletons examples that broke Java 1.4 and C++03 and led to us having once >> functions in the first place. >> … but what about on Jython, or PyPy-STM, or a future GIL-less Python? > > While I truly do appreciate your feedback on this idea, I’m really not clear > on your line of reasoning here. What specifically do you propose would be the > issue with the *Python* implementation? Are you proposing that under some > Python implementations `cache = func()` could be… the result of half a > function call? I could buy an issue with some implementations meaning that > `cache` still appears as `sentinel` in specific situations, but I feel that > would constitute a pretty obvious bug in the implementation that would impact > a _lot_ of other multithreaded code rather than a glaring issue with this > snippet. Both the issues you’ve referenced valid, but also are rather > specific to the languages that they affect. I don’t believe they apply to > Python. But the issues really aren’t specific to C++ and Java. The only reason C#, Swift, Go, etc. don’t have the same problem is that their memory models were designed from the start to provide a way to do this correctly. Python was not. There was an attempt to define a memory model in the 00’s (PEP 583), but it was withdrawn. According to the discussion around that PEP about when you can see uninitialized variables (not exactly the same issue, but closely related), Jython is safe when they’re globals or instance attributes and you haven’t replaced the module or object dict, but otherwise probably not; IronPython is probably safe in the same cases and more but nobody’s actually sure. Does that sound good enough to dismiss the problem? > I still think the point stands. With your two-separate-decorators approach > you’re paying it on every call. 
As a general purpose `call_once()` > implementation I think the snippet works well, but obviously if you have some > very specific use-case where it’s not appropriate - well then you are > probably able to write a very specific and suitable decorator. Being willing to trade safety or portability for speed is sometimes a good tradeoff, but that’s the special use case, not the other way around. People who don’t know exactly what they need should get something safe and portable. Plus, there’s still the huge issue with single-threaded programs. It’s not like multi-threaded programs are ubiquitous in Python but, e.g., asyncio is some rare niche thing that the stdlib doesn’t have to worry about. A bunch of coroutines using a once function needs either nothing, or a coro lock; if you build a threading lock into the function, they waste time and can occasionally deadlock at startup for no benefit whatsoever. Why is that acceptable for a general purpose function? ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/SSBEI5BD3HNONNSH5RGGPYKZ2LY3DEXR/ Code of Conduct: http://python.org/psf/codeofconduct/
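[Editor's note: for contrast with the double-checked snippet under discussion, here is the boring safe version — always take the lock. A sketch, not anyone's actual proposal; it trades a lock acquisition on every call for portability across memory models.]

```python
import functools
import threading

def call_once(func):
    # Thread-safe once-decorator with no double-checked-locking hazard:
    # every call acquires the lock before reading the done flag.
    lock = threading.Lock()
    done = False
    result = None

    @functools.wraps(func)
    def wrapper():
        nonlocal done, result
        with lock:
            if not done:
                result = func()
                done = True
        return result

    return wrapper

calls = []

@call_once
def init():
    calls.append(1)
    return 42

print(init(), init(), calls)  # 42 42 [1]
```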
[Python-ideas] Re: is a
On May 1, 2020, at 15:35, Steven D'Aprano wrote: > > but if it is all functions, then I think you have no choice but to > either live with it or shift languages, because the syntax for functions > is too deeply baked into Python to change now. Actually, I’m pretty sure Python could add infix calling without complicating the grammar much, or breaking backward compatibility at all. I don’t think it *should*, but maybe others would disagree. The most obvious way to do it is borrowing straight out of Haskell, so this: x `spam` y … compiles to exactly the same code as this: spam(x, y) That should be a very easy change to the grammar and no change at all to the later stages of compiling, so it’s about as simple as any new syntax could be. It doesn’t get in the way of anything else to the parser—and, more importantly, I don’t think it’s confusable as meaning something else to humans. (Of course it would be one extra thing to learn, like any syntax change.) Maybe something like $ instead of backticks is better for people with gritty monitors, but no point bikeshedding that (or the precedence) unless the basic idea is sound. Anyway, it’s up to the user to decide which binary functions to infix and which to call normally, which sounds like a consenting-adults issue, but… does it _ever_ look Pythonic? For this particular use case: isa = isinstance thing `isa` Fruit and not thing `isa` Apple … honestly, the lack of any parens here makes it seem harder to read, even if it is a bit closer to English. 
Here’s the best use cases I can come up with: xs `cross` ys array([[0,1], [1,1]]) `matrix_power` n prices `round` 2 These are all things I have written infix in Haskell, and can’t in Python/NumPy, so you’d think I’d like the improvement… but if I can’t have real operators, I think I want dot-syntax methods with parens instead in Python: prices.round(2) And outside of NumPy, the examples seem to just get worse: with open(path, 'w') as f: obj `json.dump` f Of course maybe I’m just failing to imagine good examples. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/AQPPHKL4EMFMT5NPB66W4GAFMGE5YYAB/ Code of Conduct: http://python.org/psf/codeofconduct/
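[Editor's note: for comparison with the backtick proposal, Python can already fake infix calling today with the well-known bitwise-or wrapper hack — no new syntax, though it's arguably even less Pythonic than backticks would be.]

```python
class Infix:
    # Wrap a binary function so it can be written x |f| y.
    # x |f triggers __ror__ (x has no __or__ for Infix), capturing the
    # left operand; the result |y triggers __or__, applying the function.
    def __init__(self, func):
        self.func = func

    def __ror__(self, left):
        return Infix(lambda right: self.func(left, right))

    def __or__(self, right):
        return self.func(right)

isa = Infix(isinstance)
print(3 |isa| int)    # True
print('x' |isa| int)  # False
```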
[Python-ideas] Re: Introduce 100 more built-in exceptions
On May 1, 2020, at 16:32, Steven D'Aprano wrote: > > On Fri, May 01, 2020 at 12:28:02PM -0700, Andrew Barnert via Python-ideas > wrote: >>> On May 1, 2020, at 09:24, Christopher Barker wrote: >>> Maybe it's too late for this, but I would love it if ".errno or similar" >>> were more standardized. As it is, every exception may have it's own way to >>> find out more about exactly what caused it, and often you are left with >>> parsing the message if you really want to know. >> I don’t think there are many cases where a standardized .errno would >> help—and I think most such cases would be better served by separate >> exceptions. With OSError, errno was a problem to be fixed, not an ideal >> solution to emulate everywhere. >> You do often need to be able to get more information, and that is a problem, >> but I think it usually needs to be specific to each exception, not something >> generic. >> Does code often need to distinguish between an unpacking error and an int >> parsing error? If so, you should be able to handle UnpackingError and >> IntParsingError, not handle ValueError and check an .errno against some set >> of dozens of new builtin int constants. If not, then we shouldn’t change >> anything at all. >> As for parsing the error message, that usually comes up because >> there’s auxiliary information that you need but that isn’t accessible. >> For example, in 2.x, to get the filename that failed to open, you had >> to regex .args[0], and that sucked. > > Why would you parse the error message when you already have the > file name? > > try: >f = open(filename) > except IOError as err: >print(filename) try: config = parse_config() except IOError as err: print(filename) You can’t get the local variable out of some other function that you called, even with frame hacking. At any rate, it’s a bit silly to relitigate this change. 
All of the new IOError subclasses where a filename is relevant have had a filename attribute since 3.0, so this problem has been solved for over a decade. If you really prefer the 2.x situation where sometimes those exception instances had the filename and sometimes not, you’ll need a time machine. >> It seems like every year or two, someone suggests that we should go >> through the stdlib and fix all the exceptions to be reasonably >> distinguishable and to make their relevant information more >> accessible, and I don’t think anyone ever has a problem with that, > > I do! > > Christopher's proposal of a central registry of error numbers and > standardised error messages just adds a lot more work to every core > developer for negligible or zero actual real world benefit. You’re replying to a message saying “errno was a problem to be fixed, not an ideal solution to emulate” and likewise having to parse errors. And you’re insisting that you disagree because adding errno and standardizing messages so they could be parsed would be a problem for maintainers as well as for users. Sure, you’re right, but that’s not in any way an argument against Ram’s proposal, or against the paragraph you quoted; if anything, it’s an argument *for* it. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/EESMDX2YO5KAIQQVVSSNKSTKTOYMSNH2/ Code of Conduct: http://python.org/psf/codeofconduct/
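[Editor's note: the 3.x behavior described above, as runnable code — the filename rides along on the exception, no regex over the message needed.]

```python
def failing_filename(path):
    # FileNotFoundError is an OSError subclass; since 3.0 the filename
    # that failed to open is an attribute on the exception.
    try:
        open(path)
    except OSError as err:
        return err.filename
    return None

print(failing_filename('/no/such/file.cfg'))  # /no/such/file.cfg
```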
[Python-ideas] Re: Introduce 100 more built-in exceptions
> On May 1, 2020, at 14:34, Christopher Barker wrote: > > But it seems clear that "doing a big revamp if all the Exceptions and adding > alot more subclasses" is not supported. Which doesn't means that a few more > expansions wouldn't be excepted. > > So folks that like this idea may be best served by finding the lowest hanging > fruit, and suggesting jsut a few. I think you’re right. It _might_ be accepted if someone did the work, but it’s probably a lot easier to get separate small changes with solid use cases in one by one. As long as you’re not being sneaky and pretending like you don’t have a master plan, and each of the changes is convincing, I think they’d have a much better chance. And there ought to be good use cases for “these builtin parse functions should have a .string with the input that failed so you don’t have to regex it out of the message” or “I need to distinguish this one kind of ValueError from all the other kinds” or whatever; a lot easier to argue for those use cases than something abstract and general. And almost any way it turns out seems like a win. Even if they all get rejected, better to know you were on the wrong track early rather than after a hundred hours of work. Or if it turns out to be more work than you expected and you get sick of doing it, at least you’ve improved some of the most important cases. Or maybe you’d just keep doing it and people just keep saying “fine”. Or maybe someone says, “Hold on, another one of these? They’re all good on their own, but shouldn’t we have some master plan behind it all?” and then you can point back to the master plan you posted in this thread that nobody wanted to read at the time, and now they’ll want to read it and start bikeshedding. :) (By “you” here I don’t mean you, Christopher; I mean Ram, or whoever else wants to do all this work.) 
By the way: > Python2 DID have a .message attribute -- I guess I should go look and find > documentation for the reasoning behind that, but it does seem like a step > backwards to me. In 2.6 and 2.7, it’s undocumented, and should always be either the same thing __str__ returns or the empty string. So, not particularly useful. I believe it exists as a consequence of the first time someone suggested “let’s clean up all the exceptions” but then that cleanup didn’t get started. It was added in 2.5, along with a planned deprecation of args, and a new rule for __str__ (return self.message instead of formatting self.args), and a new idiom for how newly-written exception classes should super: don’t pass *args, pass a single formatted string; anything worth keeping around for users is worth storing in a nicely named attribute the way SyntaxError and IOError always have. And Py3000 was going to change all the existing exceptions to use that new idiom. But that never happened, and 2.6 and 3.0 basically went back to 2.4: there’s no mention of message at all, args isn’t going to be deprecated, the rule for __str__ is the old one, etc. There are more custom attributes on more exceptions than there used to be, but they seem to mostly have grown on a case by case basis (and mostly on brand new exceptions) rather than in one fell swoop. Which implies that you were right about the best way to get anything done. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/ODSM2MPGCQPPOMB4V5Q2BQFROLTP6KR3/ Code of Conduct: http://python.org/psf/codeofconduct/
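[Editor's note: the 2.5-era idiom described above — pass a single formatted string to super(), and keep anything worth keeping in named attributes, the way SyntaxError and IOError always have — looks like this. ParseError and its attributes are illustrative, not from the stdlib.]

```python
class ParseError(Exception):
    # One formatted string for display; the useful values live in
    # named attributes instead of being regexed back out of args[0].
    def __init__(self, string, position):
        super().__init__(f'could not parse {string!r} at position {position}')
        self.string = string
        self.position = position

try:
    raise ParseError('12x4', 2)
except ParseError as e:
    print(e)                      # could not parse '12x4' at position 2
    print(e.string, e.position)   # 12x4 2
```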
[Python-ideas] Re: PEP 618: Add Optional Length-Checking To zip
On May 1, 2020, at 11:19, Brandt Bucher wrote: > > I have pushed a first draft of PEP 618: > > https://www.python.org/dev/peps/pep-0618 The document says “… with nobody challenging the use of the word ‘strict’”, but people did challenge it, and even more people just called it “equal” instead of “strict” when arguing for it or +’ing it (which implies a preference even if there’s no argument there), and the only known prior art on this is more-itertools, which has a zip_equal function, not a zip_strict function. I think it misrepresents the arguments for a separate function and undersells the advantages—it basically just addresses the objections that are easiest to reject. I don’t want to rehash all of my arguments and those of a dozen other people, since they’re already in the thread, but let me just give one: A separate function can be used in third-party libraries immediately, as long as there’s an available backport (whether that’s more-itertools, or a trivial zip39 or whatever) that they can require; a flag can’t be used in libraries until they’re able to require Python 3.9 (unless they want to use a backport that monkey patches or shadows the builtin, but I doubt you’d suggest that, since you called it an antipattern elsewhere in the PEP). It implies that infinite iterators are the only legitimate place where you’d ever want the existing shortest behavior. Also, I don’t think anyone on the thread suggested the alternative of changing the behavior of zip _today_. Serhiy only suggested that we should leave the door open to doing so in the future, by having an enum-valued flag instead of a bool, or zip_shortest alongside zip_equal and zip_longest, or whatever. That allows people to explicitly say they want shortest when they want it, now—which might be beneficial even on its own terms. And if people end up usually using strict, and usually being explicit when they want shortest, then at that point it might be worth changing the default (or just not having one). 
So the argument against the alternative doesn’t really cover the actual thing suggested, but a different thing nobody wanted. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/MHP3V2GFFBIDXVCY4T62TL4YRLGYGTGW/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: is a
On May 1, 2020, at 10:27, gbs--- via Python-ideas wrote: > > In cases where it makes sense to do explicit type checking (with ABCs or > whatever), I really detest the look of the isinstance() function. > > if isinstance(thing, Fruit) and not isinstance(thing, Apple): > > Yucky. I think it’s intentional that it’s a little yucky. It makes you think “could I be using duck typing or overridden methods here instead of type switching?” Sure, sometimes the answer is, “No, I can’t,” which is why ABCs were added. But if you’re using them so often that you get annoyed by the ugliness, then maybe you’re using an antipattern—or, if not, there’s a good chance you’re doing something that’s perfectly valid but unusual for Python, so the language just isn’t going to cater to you. Maybe Python leans a little too far toward discouraging type checks, because there was so much resistance to the very idea of ABCs until people got used to them. But if so, I suspect you’ll need a solid example of realistic code that should look better, and can’t be reasonably redesigned, to convince people, not just showing that isinstance is about as ugly as it was designed to be. > What I really want to write is: > > if thing is a Fruit and thing is not an Apple: > and after thinking about it on and off for a while I wonder if it might > indeed be possible to teach the parser to handle that in a way that > eliminates almost all possible ambiguity with the regular "is", including > perhaps 100% of all existing standard library code and almost all user code? Possible? Yes, at least with the new parser coming from PEP 617. But that doesn’t mean it’s a good idea. You certainly can’t make a and an into keywords, because lots of people have variables named a. You can’t even make them into “conditional keywords”, that only have a special meaning after “is” and “is not”—besides all the usual negatives of conditional keywords, it won’t work, because “b is a” is already perfectly reasonable code today. 
So you’d need to add some kind of backtracking: they’re conditional keywords only if they follow “is” or “is not” and are followed by a valid expression. Which is more complicated (and less efficient) to parse. Some third-party parser tools might even have to be completely rewritten, or at least need special-case hacks for this. And, more importantly, the more context it takes to parse things (or the more special cases you have to learn and memorize), the harder the language’s syntax is to internalize. The fact that Python is (almost) an LL(1) language makes it pretty easy to get most of the syntax for the subset that you use firmly into your head. Every special case makes that less true, which means more cases where you get confused by a SyntaxError in your code or about what someone else’s code means, and means it’s harder to manually work through the parse when you do get stumped like that and you resort to shotgun-debugging antics instead. For a practical example, look at some languages that are actually designed to be executable English rather than executable pseudocode, like AppleScript or Inform. The fact that “bring every window of the first app to the foreground” reads like a normal English sentence is pretty cool, but the fact that “bring the first window of every app to the foreground” gives you an error message about not knowing what the every is, and the only way to rewrite it is “tell every app to bring the first window of it to the foreground”, severely dampens the coolness factor. > Maybe this has been considered at some point in the past? The "is [not] a|an" > proposal would at least be a strong contender for "hardest thing to search > for on the internet" lol. That will also make it hard to search for when you see some code you don’t understand and need to search for help, won’t it?
A search for “isinstance” (even without including Python) brings me the docs page, some tutorials and blogs, and some StackOverflow questions; what’s a search for “is a” or even “Python is a” going to get me? Maybe you could get more mileage out of going halfway there, with an operator named isa. Other languages use that spelling for related things (in Perl it’s exactly the operator you want; in ObjC it’s a property on the instance but it’s still about types), and people often use “isa” or “is-a” as a technical term in comp sci. if thing isa Fruit and thing not isa Apple: That’s still pretty readable, and easy to parse. But it still breaks backward compatibility, because people do have code that uses “isa” as a normal identifier. (For one thing, it’s how you access the isa attribute of an ObjC object in PyObjC.) ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at
[Python-ideas] Re: Introduce 100 more built-in exceptions
On May 1, 2020, at 09:24, Christopher Barker wrote: > > Maybe it's too late for this, but I would love it if ".errno or similar" were > more standardized. As it is, every exception may have its own way to find > out more about exactly what caused it, and often you are left with parsing > the message if you really want to know. I don’t think there are many cases where a standardized .errno would help—and I think most such cases would be better served by separate exceptions. With OSError, errno was a problem to be fixed, not an ideal solution to emulate everywhere. You do often need to be able to get more information, and that is a problem, but I think it usually needs to be specific to each exception, not something generic. Does code often need to distinguish between an unpacking error and an int parsing error? If so, you should be able to handle UnpackingError and IntParsingError, not handle ValueError and check an .errno against some set of dozens of new builtin int constants. If not, then we shouldn’t change anything at all. As for parsing the error message, that usually comes up because there’s auxiliary information that you need but that isn’t accessible. For example, in 2.x, to get the filename that failed to open, you had to regex .args[0], and that sucked. But the fix was to add a .filename to all of the relevant exceptions, and now it’s great. If you need to be able to get the failing string for int(s) raising a ValueError today, you have to regex .args[0], and that sucks. Do people actually need to do that? If so, there should be a .string or something that carries that information; an .errno won’t help.
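To make the “regex .args[0]” point concrete, here’s what extracting the failing string from int() looks like today. (The message format is CPython’s current wording, not a guaranteed API, which is exactly the problem.)

```python
import re

try:
    int("spam")
except ValueError as e:
    # No .string attribute exists, so we have to scrape the message itself.
    m = re.search(r"invalid literal for int\(\) with base 10: '(.*)'", e.args[0])
    failing = m.group(1) if m else None

print(failing)  # spam
```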
It seems like every year or two, someone suggests that we should go through the stdlib and fix all the exceptions to be reasonably distinguishable and to make their relevant information more accessible, and I don’t think anyone ever has a problem with that, it’s just that nobody’s ever willing to volunteer to survey every place a builtin or stdlib raises, list them all, and work out exactly what should be changed and where.
[Python-ideas] Re: zip(x, y, z, strict=True)
On May 1, 2020, at 08:08, Christopher Barker wrote: > > Also please keep in mind that the members of this list, and the python-dev > list, are not representative of most Python users. Certainly not beginners > but also many (most?) fairly active, but more "casual" users. > > Folks on this list are very invested in the itertools module and iteration in > general. But many folks write a LOT of code without ever touching > itertools. Honestly, a lot of it is pretty esoteric (zip_longest is not) -- > I need to read the docs and think carefully before I know what they even do. So what? Most of the os module is pretty esoteric, but that doesn’t stop you—or even a novice who just asked “how do I get my files like dir”—from using os.listdir. For that matter, zip is in the same place as stuff like setattr and memoryview, which are a lot harder to grok than chain. That novice will never guess to look in os. And if I told them “go look in os”, that would be useless and cruel. But I don’t, I tell them “that’s called os.listdir”, and they don’t have to learn about effective/real/saved user ids or the 11 different spawn functions to “get my files like dir” like they asked. > Example: Here's the docstring for itertools.chain: > > chain(*iterables) --> chain object > > Return a chain object whose .__next__() method returns elements from the > first iterable until it is exhausted, then elements from the next > iterable, until all of the iterables are exhausted. > > I can tell you that I have no idea what that means -- maybe folks with CS > training do, but that is NOT most people that use Python. And here’s the docstring for zip: > Return a zip object whose .__next__() method returns a tuple where > the i-th element comes from the i-th iterable argument. The .__next__() > method continues until the shortest iterable in the argument sequence > is exhausted and then it raises StopIteration Most people have no idea what that means either.
In fact, chain is simpler to grok than zip (it just doesn’t come up as often, so it doesn’t need to be a builtin). > Anyway, inscrutable docstrings are another issue, and one I keep hoping I'll > find the time to try to address one day, Yes, many of Python’s docstrings tersely explain the details of how the function does what it does, rather than telling you why it’s useful or how to use it. And yes, that’s less than ideal. But that isn’t an advantage to adding a flag to zip over adding a new function. Making zip more complicated certainly won’t magically fix its docstring, it’ll just make the docstring more complicated. > but the point is : > > "Folks will go look in itertools when zip() doesn't do what they want " just > does not apply to most people. But nobody suggested that they will. That’s exactly why people keep saying it should be mentioned in the docstring and the docs page and maybe even the tutorial. And you’re also right that it’s also not true that “folks will read the docstring for zip() when zip() doesn’t do what they want and figure it out from there”, but that’s equally a problem for both versions of the proposal. In fact, most people, unless they learned it from a tutorial or class or book or blog post or from existing code before they needed it, are going to go to a coworker, StackOverflow, the TA for their class, a general web search, etc. to find out how to do what they want. There’s only so much Python can do about that—the docstring, docs page, and official tutorial (which isn’t the tutorial most people learn from) is about it. We have to trust that if this really is something novices need, the people who teach classes and answer on StackOverflow and write tutorials and mentor interns and help out C# experts who only use Python twice a year and so on will teach it. There’s no way around that. But if those people can and do teach os.listdir and math.sin and so on, they can also teach zip_equal. 
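For what it’s worth, the behavior both of those docstrings are describing boils down to two short examples:

```python
from itertools import chain

# chain: every element of the first iterable, then the next, and so on
print(list(chain([1, 2], [3, 4])))  # [1, 2, 3, 4]

# zip: one tuple per position, stopping at the shortest input
print(list(zip([1, 2], "abc")))     # [(1, 'a'), (2, 'b')]
```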
> Finally, yes, a pointer to itertools in the docstring would help a lot, but > yes, it's still a heavier lift than adding a flag, 'cause you have to then go > and import a new module, etc. What’s the “etc.” here? What additional thing do they have to do besides import a new module? People have to import a new module to get a list of their files. And lots of other things that are builtins in other languages. In JavaScript, I don’t have to import anything to decode JSON, to do basic math functions like sin or mean, to create a simple object (where I don’t have to worry about writing __init__ and __repr__ and __eq__ and so on), to make a basic web request, etc. In Python, I have to import a module to do any of those things (for the last one, I even have to install a third-party package first). Namespaces are a honking great idea, but there is a cost to that idea, and that cost includes people having to learn import pretty early on.
[Python-ideas] Re: zip(x, y, z, strict=True)
On Apr 29, 2020, at 22:50, Stephen J. Turnbull wrote: > Andrew Barnert via Python-ideas writes: > >>> Also -1 on the flag. > > Also -1 on the flag, for the same set of reasons. > > I have to dissent somewhat from one of the complaints, though: > >> auto-complete won’t help at all, Thanks for pointing this out; I didn’t realize how misleadingly I stated this. What I meant to say is that auto-complete won’t help at all with the problem that flags are less discoverable and harder to type than separate functions. Not that it won’t help at all with typing flags—it will actually help a little, it’ll just help a lot less than with separate functions, making the problem even starker rather than eliminating it. It’s worth trying this out to see for yourself. > Many (most?) people use IDEs that will catch up more or less quickly, > though. In fact, most IDEs should just automatically work without needing to change anything, because they work off the signatures and/or typesheds in the first place. That’s not the issue; the issue is what they can actually do for you. And it’s not really any different from in your terminal. In an iPython REPL in my terminal, I enter these definitions:

```
def spam(*args, equal=False): pass

def eggs(*args): pass

def eggs_equal(*args): pass
```

I can now type eggs_equal(x, y) with `e TAB TAB x, y` or `eggs_ TAB x, y`. And either way, a pop-up is showing me exactly the options I want to see when I ask for completion, I’m not just typing that blind. I can type spam(x, y, equal=True) with `s TAB x, y, e TAB T TAB`. That is better than typing out the whole thing, but notice that it requires three autocompletes rather than one, and they aren’t nearly as helpful. Why? Well, it has no idea that the third argument I want to pass is the equal keyword rather than anything at all, because *args takes anything at all.
And, even after it knows I’m passing the equal argument, it has no idea what value I want for it, so the only way to get suggestions for what to pass as the value is to type T and complete all values in scope starting with T (and usually True will be the first one). And it’s not giving me much useful information at each step; I had to know that I was looking to type equal=True before it could help me type that. The popup signature that shows *args, equal=False does clue me in, but still not nearly as well as offering eggs_equal did. Now repeat the same thing in a source file in PyCharm, and it’s basically the same. Sure, the popups are nicer, and PyCharm actually infers that equal is of type bool even though I didn’t annotate so it can show me True, False, and all bool variables in scope instead of showing me everything in scope, but otherwise, no difference. I still need to ask for help three times instead of once, and get less guidance when I do. And that’s with a bool (or Enum) flag. Change it to end="shortest", and it’s even worse. Strings aren’t code, they’re data, so PyCharm suggests nothing at all for the argument value, while iPython suggests generally-interesting strings like the files in my cwd. (I suppose they could add a special case for this argument of this function, although they don’t do that for anything else, not even the mode argument of open—and, even if they did, at best that makes things only a little worse than a bool or Enum instead of a lot worse…)
[Python-ideas] Re: zip(x, y, z, strict=True)
On Apr 30, 2020, at 07:58, Christopher Barker wrote: > >> I think that the issue of searchability and signature are pretty >> compelling reasons for such a simple feature to be part of the >> function name. > > I would absolutely agree with that if all three functions were in the same > namespace (like the string methods referred to earlier), but in this case, > one is a built in and the others will not be — which makes a huge difference > in discoverability. > > Imagine someone that uses zip() in code that works for a while, and then > discovers a bug triggered by unequal length inputs. > > If it’s a flag, they look at the zip docstring, and find the flag, and their > problem is solved. > > If it’s in itertools, they have to think to look there. Granted, some > googling will probably lead them there, and the zip() docstring can point > them there, but it’s still a heavier lift. I don’t understand. You’re arguing that being discoverable in the docstring is sufficient for the flag, but that being discoverable in the docstring is a heavier lift for the function. Why would this be true, unless you intentionally write the docstring badly? To make this more concrete, let’s say we want to just add on to the existing docstring (even though it seems aimed more at reminding experts of the exact details than at teaching novices) and stick to the same style. We’re then talking about something like this: > Return a zip object whose .__next__() method returns a tuple where > the i-th element comes from the i-th iterable argument. The .__next__() > method continues until the shortest iterable in the argument sequence > is exhausted and then it raises StopIteration, or, if equal is true, > it checks that the remaining iterables are exhausted and otherwise > raises ValueError. … vs. this: > Return a zip object whose .__next__() method returns a tuple where > the i-th element comes from the i-th iterable argument.
The .__next__() > method continues until the shortest iterable in the argument sequence > is exhausted and then it raises StopIteration. If you need to check > that all iterables are exhausted, use itertools.zip_equal, > which raises ValueError if they aren’t. If they can figure out that equal=True is what they’re looking for from the first one, it’ll be just as easy to figure out that zip_equal is what they’re looking for from the second. Of course it might be better to rewrite the whole thing to be more novice-friendly and to describe what zip iterates at a higher level instead of describing how its __next__ method operates, but that applies to both versions.
[Python-ideas] Re: Adding a "once" function to functools
On Apr 29, 2020, at 11:15, Tom Forbes wrote: > >> Thread 2 wakes up with the lock, calls the function, fills the cache, and >> releases the lock. > > What exactly would the issue be with this: >
> ```
> import functools
> from threading import Lock
>
> def once(func):
>     sentinel = object()
>     cache = sentinel
>     lock = Lock()
>
>     @functools.wraps(func)
>     def _wrapper():
>         nonlocal cache, lock, sentinel
>         if cache is sentinel:
>             with lock:
>                 if cache is sentinel:
>                     cache = func()
>         return cache
>
>     return _wrapper
> ```

You’ve written an exact equivalent of the double-checked locking singleton examples that broke Java 1.4 and C++03, and that led to us having once functions in the first place. In both of those languages, and most others, there is no guarantee that the write to cache in thread 1 happens between the two reads from cache in thread 2. Which gives you the fun kind of bug where every few thousand runs you have corrupted data an hour later, or it works fine on your computer but it crashes for one of your users because they have two CPUs that don’t share L2 cache while you have all your cores on the same die, or it works fine until you change some completely unrelated part of the code, etc. Java solved this by adding volatile variables in Java 5 (existing code was still broken, but just mark cache volatile and it’s fixed); C++11 added a compiler-assisted call_once function (and added a memory model that allows them to specify exactly what happens and when so that the desired behavior was actually guaranteeable). Newer languages learned from their experience and got it right the first time, rather than repeating the same mistake. Is there anything in Python’s memory model that guarantees this can’t happen? I don’t think there _is_ a memory model.
In CPython, or any GIL-based implementation, I _think_ it’s safe (the other thread can’t be running at the same time on a different core, so there can’t be a cache coherency ordering issue between the cores, right?), but what about on Jython, or PyPy-STM, or a future GIL-less Python? And in both of those languages, double-checked locking is still nowhere near as efficient as using a local static. > Seems generally more correct, even in single threaded cases, to pay the > overhead only in the first call if you want `call_once` semantics. Which is > why you would be using `call_once` in the first place? But you won’t be paying the overhead only on the first call, you’ll be paying it on all of the calls that happen before the first one completes. That’s the whole point of the lock, after all—they have to wait until it’s ready—and they can’t possibly do that without the lock overhead. And for the next few afterward, because they’ll have gotten far enough to check even if they haven’t gotten far enough to get the lock, and there’s no way they can know they don’t need the lock. And for the next few after that, because unless the system only runs one thread at a time and synchronizes all of memory every time you switch threads they may not see the write yet anyway.
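For comparison, here is a sketch (mine, not code from the thread) of the straightforwardly safe variant: always take the lock, so both the read and the write of the cached value are ordered by the lock rather than relying on an unsynchronized first check. It gives up the double-checked fast path in exchange for being correct on any memory model where locks synchronize.

```python
import functools
import threading

def once(func):
    sentinel = object()
    cache = sentinel
    lock = threading.Lock()

    @functools.wraps(func)
    def wrapper():
        nonlocal cache
        # Always take the lock: slower than double-checking, but the
        # read and write of `cache` both happen under the lock, so
        # there is no unsynchronized first read to go wrong.
        with lock:
            if cache is sentinel:
                cache = func()
        return cache

    return wrapper

calls = []

@once
def expensive():
    calls.append(1)
    return 42

print(expensive(), expensive(), len(calls))  # 42 42 1
```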
[Python-ideas] Re: deque: Allow efficient operations
On Apr 29, 2020, at 12:03, Christopher Barker wrote: > > > Isn't much demand for a *generic* linked list. It would probably be a good > recipe though -- so users could have a starting point for their custom > version. I think what would be really handy would be a HOWTO on linked lists that showed the different options and tradeoffs and how to implement and use at least a few different ones, and showed why they’re useful with examples. (And also showed why the Sequence/Iterable API can be helpful but also why it’s not sufficient.) Then the collections module (and the tutorial?) could both just have a sentence saying “Python doesn’t have a linked list type because there are so many useful kinds of linked lists and they’re all easy to build but very different—see the Linked Lists HOWTO for details.” But if I wrote it, it would probably be 4x as long as any novice would want to read. (I think I wrote some blog posts on linked lists in Python years ago, and ended up building a Haskell-style lazy list out of a trigger function and then showing how to do Fibonacci numbers by recursively zipping it, or something crazy like that.) In the old days we could probably just post three different simple recipes on ActiveState and link to them from the docs and let people build on the examples there, rather than try to write it all up-front and fit it into the Python docs style, but that doesn’t work so well anymore.
[Python-ideas] Re: zip(x, y, z, strict=True)
On Apr 29, 2020, at 07:08, Barry Scott wrote: > > >> On 28 Apr 2020, at 16:12, Rhodri James wrote: >> >>> On 28/04/2020 15:46, Brandt Bucher wrote: >>> Thanks for weighing in, everybody. >>> Over the course of the last week, it has become surprisingly clear that >>> this change is controversial enough to require a PEP. >>> With that in mind, I've started drafting one summarizing the discussion >>> that took place here, and arguing for the addition of a boolean flag to the >>> `zip` constructor. Antoine Pitrou has agreed to sponsor, and I've chatted >>> with another core developer who shares my view that such a flag wouldn't >>> violate Python's existing design philosophies. >>> I'll be watching this thread, and should have a draft posted to the list >>> for feedback this week. >> >> -1 on the flag. I'd be happy to have a separate zip_strict() (however you >> spell it), but behaviour switches just smell wrong. > > Also -1 on the flag. > > 1. A new name can be searched for. > 2. You do not force an if on the flag for every single call to zip. Agreed on both Rhodri’s and Barry’s reasons, and more below. I also prefer the name zip_equal to zip_strict, because what we’re being strict about isn’t nearly as obvious as what’s different between shortest vs. equal vs. longest, but that’s just a mild preference, not a -1 like the flag. In addition to the three points above: Having one common zip variant spelled as a different function and the other as a flag seems really bad for learning and remembering the language. And zip_longest has a solidly established precedent. And I don’t think you want to add multiple bool flags to zip? Also, just look at these:

```
zip_strict(xs, ys)
zip(xs, ys, strict=True)
```

The first one is easier to read because it doesn’t have the extra 5 characters to skim over that don’t really add anything to the meaning, and it puts the important distinction up front.
It’s also shorter, and a lot easier to type with auto-complete—which isn’t nearly as big of a deal, but if this is really meant to be used often it does add up. And it’s obviously more extensible, if it really is at all possible that we might want to eventually deprecate shortest or add new end behaviors like yielding partial tuples or Soni’s thing of stashing the leftovers somehow (none of which I find very convincing, but others apparently do, and picking a design that rules them out means explicitly rejecting them). A string or enum flag instead of a bool solves half of those problems (as long as “longest” is one of the options), but it makes others even worse. The available strings aren’t even discoverable as part of the signature, auto-complete won’t help at all, and the result is even longer and even more deemphasizes the important thing.
[Python-ideas] Re: deque: Allow efficient operations
On Apr 29, 2020, at 08:33, Christopher Barker wrote: > > I've wondered about Linked Lists for a while, but while there are many > versions on PyPi, I can't find one that seems to be mature and maintained. > Which seems to indicate that there isn't much demand for them. I think there’s lots of demand for them, but there are so many different variants that can’t substitute for each other (try taking any nontrivial sample code using Haskell’s single-linked, no-handle, immutable tail-sharing list and rewriting it with C++’s doubly-linked handled mutable list, or vice-versa), and most of the key operations fit so poorly with Python’s sequence/iterable API, and they’re all so easy to build, that people just build the one they need whenever they need it. I do have a few different linked lists in my toolbox that have come up often enough that I stashed them (an immutable cons, a handled double-linked list, a cffi wrapper for a common style of C internally-linked lists, probably others), but half the time I reach for one I have to modify it anyway, so I haven’t bothered to turn them into a package I just import and use. And, while I did add the whole (Mutable)Sequence API to each one (because it’s convenient for debugging and REPL exploration to be able to list(xs), or to get a repr that’s written in terms of a from_iter classmethod so I can eval it back, etc.), I usually don’t use that API for anything but debugging. When you’re dealing with linked lists, you usually need to deal with the nodes directly. For example, one big reason to use linked lists is constant-time splicing, but you can’t splice in constant time if all you have is the head/handle and/or an opaque iterator that only knows how to go forward; you need the node before the splice point (or, for doubly-linked, after is fine too). 
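As an illustration of the “you need the node, not just the head” point, here is a minimal doubly-linked node with a constant-time splice (my sketch, not code from the thread):

```python
class Node:
    __slots__ = ("value", "prev", "next")

    def __init__(self, value):
        self.value = value
        self.prev = self.next = None

def splice_after(node, first, last):
    # O(1) precisely because we hold `node` itself; the bare head or
    # an opaque forward iterator wouldn't be enough.
    last.next = node.next
    if node.next is not None:
        node.next.prev = last
    node.next = first
    first.prev = node

a, c = Node(1), Node(3)
a.next, c.prev = c, a
b = Node(2)
splice_after(a, b, b)  # splice the one-node chain b..b after a

values, n = [], a
while n is not None:
    values.append(n.value)
    n = n.next
print(values)  # [1, 2, 3]
```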
Another reason to use (Lisp/Haskell-style) linked lists is that they automatically release nodes as you iterate unless you keep a reference to the head, but that’s clumsy to do with Python-style APIs. And so on.
[Python-ideas] Re: Adding a "once" function to functools
On Apr 28, 2020, at 16:25, Steven D'Aprano wrote: > > On Tue, Apr 28, 2020 at 11:45:49AM -0700, Raymond Hettinger wrote: > >> It seems like you would get just about everything you want with one line: >> >> once = lru_cache(maxsize=None) > > But is it thread-safe? You can add thread safety the same way as any other function:

```
@synchronized
@once
def spam():
    """Return 42, in a slow, non-thread-safe, non-idempotent way
    (and also launch the missiles the second time we're called).
    """
    return 42
```

Or wrap a with lock: around the code that calls it, or whatever. Not all uses of once require thread safety. For the really obvious example, imagine you’re sharing a singleton between coroutines instead of threads. And if people are really concerned with the overhead of lru_cache(maxsize=None), the overhead of locking every time you access the value is probably even less acceptable when unnecessary. So, I think it makes sense to leave it up to the user (but to explain the issue in the docs). Or maybe we could add a threading.once (and asyncio.once?) as well as functools.once?
[Python-ideas] Re: Adding a "once" function to functools
On Apr 28, 2020, at 12:02, Alex Hall wrote: > > Some libraries implement a 'lazy object' which forwards all operations to a > wrapped object, which gets lazily initialised once: > > https://github.com/ionelmc/python-lazy-object-proxy > https://docs.djangoproject.com/en/3.0/_modules/django/utils/functional/ > > There's also a more general concept of proxying everything to some target. > wrapt provides ObjectProxy which is the simplest case, the idea being that > you override specific operations: > > https://wrapt.readthedocs.io/en/latest/wrappers.html > > Flask and werkzeug provide proxies which forward based on the request being > handled, e.g. which thread or greenlet you're in, which allows magic like the > global request object: > > https://flask.palletsprojects.com/en/1.1.x/api/#flask.request > > All of these have messy looking implementations and hairy edge cases. I > imagine the language could be changed to make this kind of thing easier, more > robust, and more performant. But I'm struggling to formulate what exactly > "this kind of thing is", i.e. what feature the language could use. For the case where you’re trying to do the “singleton pattern” for a complex object whose behavior is all about calling specific methods, a proxy might work, and the only thing Python might need, if anything, is ways to make it possible/easier to write a GenericProxy that just delegates everything in some clean way, but even that isn’t really needed if you’re willing to make the proxy specific to the type you’re singleton-ing. But often what you want to lazily initialize is a simple object—a str, a small integer, a list of str, etc. Guido’s example lazily initialized by calling getcwd(), and the first example given for the Swift feature is usually a fullname string built on demand from firstname and lastname. 
And if you look for examples of @cachedproperty (which really is exactly what you want for @lazy except that it only works for instance attributes, and you want it for class attributes or globals), the singleton pattern seems to be a notable exception, not the usual case; mostly you lazily initialize either simple objects like a str, a pair of floats, a list of int, etc., or numpy/pandas objects. And you can’t proxy either of those in Python. Especially str. Proxies work by duck-typing as the target, but you can’t duck-type as a str, because most builtin and extension functions that want a str ignore its methods and use the PyUnicode API to get directly at its array of characters. Numbers, lists, numpy arrays, etc. aren’t quite as bad as str, but they still have problems. Also, even when it works, the performance cost of a proxy would often be prohibitive. If you write this:

```
@lazy
def fullname():
    return firstname + " " + lastname
```

… presumably it’s because you need to eliminate the cost of string concatenation every time you need the fullname. But if it then requires every operation on that fullname to go through a dynamic proxy, you’ve probably added more overhead than you saved. So I don’t think proxies are the answer here. Really, we either need descriptors that can somehow work for globals and class attributes (which is probably not solvable), or some brand new language semantics that aren’t built on what’s already there. The latter sounds like probably way more work than this feature deserves, but maybe the experience of Swift argues otherwise.
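For instance attributes, this much already exists in the stdlib as functools.cached_property (3.8+); the gap described above is that descriptors like it can’t serve module globals or class attributes:

```python
from functools import cached_property

class Person:
    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname

    @cached_property
    def fullname(self):
        # Computed on first access, then stored on the instance,
        # so later reads are plain attribute lookups.
        return self.firstname + " " + self.lastname

p = Person("Grace", "Hopper")
print(p.fullname)  # Grace Hopper
```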
[Python-ideas] Re: extended for-else, extended continue, and a rant about zip()
On Apr 28, 2020, at 09:18, Chris Angelico wrote: > > I suggest forking CPython and implementing the feature. I’d suggest trying MacroPy first. There’s no way to get the desired syntax with macros, but at least at first glance it seems like you should be able to get the desired semantics with something that’s only kind of ugly and clumsy, rather than totally hideous. And if so, that’s usually good enough for playing around with fun ideas to see where they can lead, and a lot less work. Plus, playing with MacroPy is actually fun in itself; playing with the CPython parser is kind of the opposite of fun. :)
[Python-ideas] Re: Adding a "once" function to functools
On Apr 26, 2020, at 10:41, Guido van Rossum wrote: > > > Since the function has no parameters and is pre-computed, why force all users > to *call* it? The @once decorator could just return the value of calling the > function: > > def once(func): > return func() > > @once > def pwd(): > return os.getcwd() If that’s all @once does, you don’t need it. Surely this is even clearer: pwd = os.getcwd() The decorator has to add initialization on first demand, or it’s not doing anything. But I think you’re onto something important that everyone else is missing. To the user of this module, this really should look like a variable, not a function. The fact that we want to initialize it later shouldn’t change that. Especially not in Python—other languages bend over backward to make you write getters around every public attribute even when you don’t need any computation; Python bends over backward to let you expose public attributes even when you do need computation. And this isn’t unprecedented. Swift added a lazy variable initialization feature in version 2 even though they already had dispatch_once. And then they discovered that it eliminated nearly all good uses of dispatch_once and deprecated it. All you need is lazy-initialized variables. Your singletons, your possibly-unused expensive tables, your fiddly low-level things with complicated initialization order dependencies, they’re all lazy variables. So what’s left for @once functions that need to look like functions? I think once you think about it in these terms, @lazy makes more sense than @once. The difference between these special attributes and normal ones is that they’re initialized on first demand rather than at definition time. The “on demand” is the salient bit, not the “first”. The only problem is: how could this be implemented? Most of the time you want these things on modules. For lazy imports, the __getattr__ solution of PEP 562 was good enough, but this isn’t nearly as much of an expert feature. 
Novices write lazy variables in Swift, and if we have to tell them they can’t do it in Python without learning the deep magic of how variable lookup works, that would be a major shame. But I can’t think of an answer that doesn’t run into all the same problems that PEP 562’s competing protocols did.
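For completeness, the version of @once that actually defers the call to first use (rather than computing at decoration time, as in the quoted sketch) is easy to write with a closure. The caching scheme here is my own sketch, not a stdlib API:

```python
import functools
import os

def once(func):
    # Call func the first time the wrapper is called, then cache the
    # result forever. A sentinel distinguishes "not yet computed" from
    # a legitimately-None result.
    sentinel = object()
    value = sentinel

    @functools.wraps(func)
    def wrapper():
        nonlocal value
        if value is sentinel:
            value = func()
        return value
    return wrapper

@once
def pwd():
    return os.getcwd()
```

The catch, as argued above, is that callers still have to write pwd() rather than pwd, which is exactly what the lazy-variable idea is trying to avoid.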
[Python-ideas] Re: extended for-else, extended continue, and a rant about zip()
On Apr 27, 2020, at 20:48, Soni L. wrote > >> Here are four ways of doing this today: … >> So, why do we need another way to do something that’s probably pretty >> uncommon and can already be done pretty easily? Especially if that new way >> isn’t more readable or more powerful? > > the only one with equivalent semantics is the last one. I won’t argue about whether two functions that give the exact same results in every case but get there in different ways are “equivalent” or not, since one is already good enough. If you agree that there is obvious code that works in Python 3.8 (and even in Python 2.7, for that matter) to get the semantics you want, why should we add a new language feature that gives you a less readable, more verbose, and more complicated way to do the same thing? > tbh my particular case doesn't make a ton of practical sense. That’s hardly a good argument for your proposal. Do you actually want the things you propose to be added to the language, or even to be seriously considered? If not, why are you proposing them? >> > > see: why are we perfectly happy with ignoring extra lines at the end? >> >> Because there aren’t any. The file was made by catting together 2022 4-line >> files, so it’s 8088 lines long. It will always be 8088 lines long. If I >> really thought that was important to check, surely I’d want to check 8088 >> rather than just divisible by 4. But I didn’t think it was worth checking >> either of those—or that the text is pure ASCII, or that the newlines are \n, >> etc. For a more general purpose script (especially if it had to accept input >> from potentially stupid or malicious end users and produce useful error >> responses instead of just punting), I would have checked many of those >> things and more, but for this script, it wasn’t worth it. > > that's what assert is for - making assumptions that you know are correct now, > but might not remain so in the future! 
Would you want to read, or maintain, code like this:

    s = "spam"
    assert isinstance(s, str)
    assert isinstance(type(s), type)
    assert len(s) == 4
    assert len(set(s)) == len(s)
    for c in s:
        assert type(c) == type(s)
        assert c is not None
        assert len(c) == 1
        assert s.count(c) == 1
        assert 0 <= ord(c) < 0x110000
        assert len(c.encode()) <= 4
        assert not sys.stdout.closed
        print(f"{c}...")
        if sys.implementation.name == "cpython":
            assert chr(ord(c)) is c
    assert c == s[-1]
    assert s == "spam"

I’m assuming all of those things are true, and hundreds more (from the fact that s was unbound before the assignment to the fact that nobody has modified the interned 0 value to mean 1), but that doesn’t mean they’re all worth testing. Trying to test absolutely everything just means you’re more likely to forget to test one of the important things, and more likely to miss it if you do forget. (And that’s even assuming all of your tests are correct, which they almost certainly won’t be if you’re trying to test everything you can imagine. So you’ll also waste time debugging useless tests that could have been spent verifying, debugging or improving the useful tests and/or the actual functionality.) On top of that, if my input file doesn’t have 8088 lines, that’s almost certainly not a bug in my code, but either user error (I put the wrong file at that path) or corrupted data (I accidentally truncated the file). So testing it with an assert would actually be misleading myself; it should be something like a ValueError. Even if you never programmatically handle the error, having the right error makes a big difference to ease of debugging. >> Even if you think Python should be doing more to encourage such checks, your >> proposal doesn’t help that at all—what you want is something like Serhiy’s >> proposal in the other thread (to eventually rename zip to zip_shortest and >> either get rid of plain zip or make it an alias for zip_equal). > > ... why not? 
I know assert is discouraged by many, but I wouldn't say > enabling ppl to do these checks doesn't help ppl do these checks...? unless I > misunderstand what you mean by this? Because people already are enabled to check, and they’re just choosing not to. Giving them a harder and less discoverable way isn’t going to change that. Anyone who’s decided it’s not worth using zip_equal instead of zip is not going to think it’s worth adding an else and a test to the loop around that zip.
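To make the assert-vs-ValueError distinction above concrete, here is a hedged sketch; the function name and format are invented for illustration:

```python
def parse_entries(lines, lines_per_entry=4):
    # Bad input is the *caller's* problem: raise a real exception, don't
    # assert, so the check survives `python -O` and gives a useful error.
    if len(lines) % lines_per_entry != 0:
        raise ValueError(
            f"expected a multiple of {lines_per_entry} lines, "
            f"got {len(lines)}")
    entries = [lines[i:i + lines_per_entry]
               for i in range(0, len(lines), lines_per_entry)]
    # An internal invariant of this function, on the other hand, is a
    # legitimate use of assert: if it fails, *this* code has a bug.
    assert len(entries) * lines_per_entry == len(lines)
    return entries
```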
[Python-ideas] Re: extended for-else, extended continue, and a rant about zip()
On Apr 27, 2020, at 17:01, Soni L. wrote: > >>> On 2020-04-27 8:37 p.m., Andrew Barnert wrote: >>> On Apr 27, 2020, at 14:38, Soni L. wrote: >> [snipping a long unanswered reply] >>> The explicit case for zip is if you *don't* want it to consume anything >>> after the stop. >> Sure, but *when do you want that*? What’s an example of code you want to >> write that would be more readable, or easier to write, or whatever, if you >> could work around consuming anything after the stop? > > so here's one example, let's say you want to iterate multiple things (like > with zip), get a count out of it, as well as partially consume an external > iterator without swallowing any extra values from it. What do you want to do that for? This still isn’t a concrete use case, so it’s still not much more of a rationale than “let’s say you want to intermingle the bits of two 16-bit integers into a 32-bit integer”. Sure, that’s something that’s easy to do in some other languages (it’s the builtin $ operator in INTERCAL) but very hard to do readably or efficiently in Python. If we added a $ operator with a __bigmoney__ protocol and made int.__bigmoney__ implement this operation in C, that would definitely solve the problem. But it’s only worth proposing that solution if anyone actually needs a solution to the problem in the first place. When’s the last time anyone ever needed to efficiently intermingle bits? (Except in INTERCAL, where the language intentionally leaves out useful operators like +, |, and << and even 32-bit literals to force you to write things in clever ways around $ and ~ instead). On top of that, this abstract example you want can already be written today. > it'd look something like this: > >def foo(self, other_things): > for x in zip(range(sys.maxsize), self.my_things, other_things): >do_stuff > else as y: >return y[0] # count > using extended for-else + partial-zip. it stops as soon as self.my_things > stops. 
and then the caller can do whatever else it needs with other_things. > (altho maybe it's considered unpythonic to reuse iterators like this? I like > it tho.) Here are four ways of doing this today:

    def foo(self, other_things):
        for x in zip(count(1), self.my_things, other_things):
            do_stuff
        return x[0]

    def foo(self, other_things):
        c = count(-1)
        for x in zip(c, self.my_things, other_things):
            do_stuff
        return next(c)

    def foo(self, other_things):
        c = count()
        for x in zip(self.my_things, other_things, c):
            do_stuff
        return next(c)

    def foo(self, other_things):
        c = lastable(count())
        for x in zip(c, self.my_things, other_things):
            do_stuff
        return c.last

So, why do we need another way to do something that’s probably pretty uncommon and can already be done pretty easily? Especially if that new way isn’t more readable or more powerful? > if anything my motivating example is because I wanna do some very unpythonic > things. Then you should have given that example in the first place. Sure, the fact that it’s unpythonic might mean it’s not very convincing, but it doesn’t become more convincing after multiple people have to go back and forth to drag it out of you. All that means is that everyone else has already tuned out and won’t even see your example, so your proposal has basically zero chance instead of whatever chance it should have had. And sometimes unpythonic things really do get into the language—sometimes because they’re just so useful, but more often, because they point to a reason for changing what everyone’s definition of “pythonic” is. Think of the abc module. Or, better, if you can dig up the 3.1-era vs. 3.3-era threads on the original coroutine PEP 3152, you can see how the consensus changed from “wtf, that doesn’t look like Python at all and nobody will ever understand it” to “this is obviously the pythonic way to write reactors (modulo a bunch of bikeshedding)”. 
That wouldn’t have happened if Greg Ewing had refused to tell anyone that he wanted coroutines to provide a better, if unfamiliar, way to write things like reactors, and instead tried to come up with less-unpythonic-looking but completely useless examples. >> That grouping idiom is useful for all kinds of things that _aren’t_ about >> optimization. Maybe the zip docs aren’t the best place for it (but it’s also >> in the itertools recipes, which probably is the best place for it), but it’s >> definitely useful. In fact, I used it less than a week ago. We’ve got this >> tool that writes a bunch of 4-line files, and someone concatenated a bunch >> of them together and wrote this horrible code to pull them back apart in >> another language I won’t mention here, and rather than debug their code, I >> just rewrote it in Python like this: >> with open(path) as f: >> for entry in chunkify(f, 4): >> process(entry)
[Python-ideas] Re: extended for-else, extended continue, and a rant about zip()
On Apr 27, 2020, at 16:35, Soni L. wrote: > > the point of posting here is that someone else may have a similar existing > use-case Similar to *what*? It can’t be similar to your use case if you don’t have a use case for it to be similar to. If you really can’t imagine why something might be useful, and nobody else has ever asked for it, it probably isn’t actually needed. Sure, there are rare exceptions to that, but that shouldn’t be your default assumption for everything that could ever conceivably be done. > where this would make things better. I can't take a look at proprietary code > so I post about stuff in the hopes that the ppl who can will back this stuff > up. > > (doesn't proprietary software make things so much harder? :/) A little bit, but not nearly as much as you seem to be thinking. There are zillions of lines of open source Python code easily searchable. There may be a few kinds of problems that are likely to only come up in proprietary code, but something generic like this is just as likely to be useful to Django or MusicBrainz or Jupyter or DNF or even the Python stdlib as to some internal Dropbox service or the guts of the Civ V scripting engine. So the fact that you can’t search the Dropbox or Firaxis source is not actually a big problem.
[Python-ideas] Re: extended for-else, extended continue, and a rant about zip()
On Apr 27, 2020, at 14:38, Soni L. wrote: [snipping a long unanswered reply] > The explicit case for zip is if you *don't* want it to consume anything after > the stop. Sure, but *when do you want that*? What’s an example of code you want to write that would be more readable, or easier to write, or whatever, if you could work around consuming anything after the stop? > btw: I suggest reading the whole post as one rather than trying to pick it > apart. I did read the whole post, and then went back to reply to each part in-line. You can tell by the fact that I refer to things later in the post. For example, when I refer to your proposed code being better than “the ugly mess that you posted below” as the current alternative, it should be pretty clear that I’ve already read the ugly mess that you posted below. So why did I format it as replies inline? Because that’s standard netiquette that goes back to the earliest days of email lists. Most people find it confusing (and sometimes annoying) to read a giant quote and then a giant reply and try to figure out what’s being referred to where, so when you have a giant message to reply to, it’s helpful to reply inline. But as a bonus, writing a reply that way makes it clear to yourself if you’ve left out anything important. You didn’t reply to multiple issues that I raised, and I doubt that it’s because you don’t have any answers and are just trying to hide that fact to trick people into accepting your proposal anyway, but rather that you just forgot to get to some things because it’s easy to miss important stuff when you’re not replying inline. > the purpose of the proposal, as a whole, is to make it easier to pick things > - generators in particular - apart. I tried to make that clear but clearly I > failed. 
No, you did make that part clear; what you didn’t make clear is (a) what exactly you’re trying to pick apart from the generators and why, (b) what actual problems look like, (c) how your proposal could make that code better, and (d) why existing solutions (like manually nexting iterators in a while loop, or using tools like peekable) don’t already solve the problem. Without any of that, all you’re doing is offering something abstract that might conceivably be useful, but it’s not clear where or why or even whether it would ever come up, so for all we know it’ll *never* actually be useful. Nobody’s likely to get on board with such a change. > Side note, here's one case where it'd be better than using zip_longest: Your motivating example should not be a “side note”, it should be the core of any proposal. But it should also be a real example, not a meaningless toy example. Especially not one where even you can’t imagine an actual similar use case. “We should add this feature because it would let you write code that I can’t imagine ever wanting to write” isn’t a rationale that’s going to attract much support. > for a, b, c, d, e, f, g in zip(*[iter(x)]*7): # this pattern is suggested by > the zip() docs, btw. >use_7x_algorithm(a, b, c, d, e, f, g) > else as x: # leftovers that didn't fit the 7-tuple. >use_slow_variable_arity_algorithm(*x) Why do you want to unpack into 7 variables with meaningless names just to pass those 7 variables? And if you don’t need that part, why can’t you just write this with zip_skip (which, as mentioned in the other thread, is pretty easy to write around zip_longest)? The best guess I can come up with is that in a real life example maybe that would have some performance cost that’s hard to see in this toy. But then if that’s the case, given that x is clearly not an iterator, is it a sequence? You could then presumably get much more optimization by looping over slices instead of using the grouper idiom in the first place. 
Or, as you say, by using numpy. > I haven't found a real use-case for this yet, tho. > SIMD is handled by numpy, which does a better job than you could ever hope > for in plain python, and for SIMD you could use zip_longest with a suitable > dummy instead. but... yeah, not really useful. > (actually: why do the docs for zip() even suggest this stuff anyway? seems > like something nobody would actually use.) That grouping idiom is useful for all kinds of things that _aren’t_ about optimization. Maybe the zip docs aren’t the best place for it (but it’s also in the itertools recipes, which probably is the best place for it), but it’s definitely useful. In fact, I used it less than a week ago. We’ve got this tool that writes a bunch of 4-line files, and someone concatenated a bunch of them together and wrote this horrible code to pull them back apart in another language I won’t mention here, and rather than debug their code, I just rewrote it in Python like this: with open(path) as f: for entry in chunkify(f, 4): process(entry) I used a function called chunkify because I think that’s a lot easier to understand (especially for
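For anyone who wants the chunkify used above: it’s just the grouping idiom from the zip docs and itertools recipes, wrapped in a function. A sketch:

```python
def chunkify(iterable, n):
    # The grouper idiom: n references to a *single* iterator, so each
    # output tuple pulls n consecutive items. Trailing items that don't
    # fill a complete chunk are dropped, which is fine for a file known
    # to be an exact multiple of n lines.
    return zip(*[iter(iterable)] * n)
```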
[Python-ideas] Smarter zip, map, etc. iterables (Re: Re: zip(x, y, z, strict=True))
On Apr 27, 2020, at 13:41, Christopher Barker wrote: > > SIDE NOTE: this is reminding me that there have been calls in the past for an > optional __len__ protocol for iterators that are not proper sequences, but DO > know their length -- maybe one more place to use that if it existed. But __len__ doesn’t really make sense on iterators. And no iterator is a proper sequence, so I think you meant _iterables_ that aren’t proper sequences anyway—and that’s already there:

    xs = {1, 2, 3}
    len(xs)  # 3
    isinstance(xs, collections.abc.Sized)  # True

I think the issue is that people don’t actually want zip to be an Iterator, they want it to be a smarter Iterable that preserves (at least) Sized from its inputs. The same way, e.g., dict.items or memoryview does. The same way range is lazy but not an Iterator. And it’s not just zip; the same thing is true for map, enumerate, islice, etc. And it’s also not just Sized. It would be just as cool if zip, enumerate, etc. preserved Reversible. In fact, “how do I both enumerate and reverse” comes up often enough that I’ve got a reverse_enumerate function in my toolbox to work around it. And, for that matter, why do they have to be only one-shot-iterable unless their input is? Again, dict.items and range come to mind, and there’s no real reason zip, map, islice, etc. couldn’t preserve as much of their input behavior as possible:

    xs = [1, 2, 3]
    ys = map(lambda x: x*3, xs)
    len(ys)  # 3
    reversed(enumerate(ys))[-1]  # (0, 3)

Of course it’s not always possible to preserve all behavior:

    xs = [1, 2, 3]
    ys = filter(lambda x: x%2, xs)
    len(ys)  # still a TypeError even though xs is sized

… but the cases where it is or isn’t possible can all be worked out for each function and each ABC: filter can _never_ preserve Sized but can _always_ preserve Reversible, etc. 
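A sketch of that reverse_enumerate toolbox workaround, for the record (any input that is both Sized and Reversible works):

```python
def reverse_enumerate(seq):
    # enumerate(seq) in reverse order without materializing a list of
    # pairs: indices count down while reversed() walks the items back.
    return zip(range(len(seq) - 1, -1, -1), reversed(seq))
```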
This is clearly feasible—Swift does it, and C++ is trying to do it in their next version, and Python already does it in a few special cases (as mentioned earlier), just not in all (or even most) of the potentially useful cases. The only really hard part of this is designing a framework that makes it possible to write all those views simply. You don’t want to have to write five different map view classes for all the ways a map can act based on its inputs, and then repeat 80% of that same work again for filter, and again for islice and so on. The boilerplate would be insane. (See the Swift 1.0 stdlib for an example of how horrible it could be, and they only implemented a handful of the possibilities.) And, except for a couple of things (notably genexprs), most of this could be written as a third-party library today. (And if it existed and people were using it widely, it would be pretty easy to argue that it should come with Python, so that it _could_ handle those last few things like genexprs, and also to serve as an example to encourage third-party libraries like toolz to similarly implement smart views instead of dumb iterators, and also as helpers to make that easier for them. That argument might or might not win the day, but at least it’s obvious what it would look like.) So I suspect the only reason nobody’s done so is that you don’t actually run into a need for it very often. How often do you actually need the result of zip to be Sized anyway? At least for me, it’s not very often. Whenever I run into any of these needs, I start thinking about the fully general solution, but put it off until I run into a second good use for it and meanwhile write a simple 2-minute workaround for my immediate use (or add a new special case like reversed_enumerate to my toolbox), and then by the time I run into another need for it, it’s been so long that I’ve almost forgotten the idea… But maybe there would be a lot more demand for this if people knew the idea was feasible? 
Maybe there are people who have tons of real-life examples where they could use a Sized zip or a Reversible enumerate or a Sequence map, and they just never thought they could have it so they never tried or asked?
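To show the idea is as feasible as claimed, here is a minimal sketch of a map view that preserves Sized, Reversible, and re-iterability when its input has them. MapView is a made-up name for illustration, not a proposal for the actual API, and a real framework would need to handle inputs that lack these protocols:

```python
class MapView:
    # A map that stays Sized and Reversible when its input is: the
    # function is applied lazily on each pass, and since iteration
    # re-walks the underlying sequence, the view is not one-shot.
    def __init__(self, func, seq):
        self.func, self.seq = func, seq

    def __iter__(self):
        return (self.func(x) for x in self.seq)

    def __len__(self):
        return len(self.seq)

    def __reversed__(self):
        return (self.func(x) for x in reversed(self.seq))

ys = MapView(lambda x: x * 3, [1, 2, 3])
len(ys)             # 3, unlike builtin map
list(reversed(ys))  # [9, 6, 3]
list(ys)            # [3, 6, 9] -- and it can be iterated again
```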
[Python-ideas] Re: extended for-else, extended continue, and a rant about zip()
On Apr 27, 2020, at 12:49, Soni L. wrote: > > I wanna propose making generators even weirder! Why? Most people would consider that a negative, not a positive. Even if you demonstrate some useful functionality with realistic examples that benefit from it, all you’ve done here is set the bar higher for yourself to convince anyone that your change is worth it. > so, extended continue is an oldie: > https://www.python.org/dev/peps/pep-0342/#the-extended-continue-statement > > it'd allow one to turn: > > yield from foo > > into: > > for bar in foo: > continue (yield bar) And what’s the advantage of that? It’s a lot more verbose, harder to read, probably easier to get wrong, and presumably less efficient. If this is your best argument for why we should revisit an old rejected idea, it’s not a very good one. (If you’re accepting that it’s a pointless feature on its own but proposing it because, together with your other proposed new feature, it would no longer be pointless, then say that, don’t offer an obviously bad argument for it on its own.) > but what's this extended for-else? well, currently you have for-else: > > for x, y, z in zip(a, b, c): > ... > else: > pass > > and this works. you get the stuff from the iterators, and if you break the > loop, the else doesn't run. the else basically behaves like "except > StopIteration:"... > > so I propose an extended for-else, that behaves like "except StopIteration as > foo:". 
that is, assuming we could get a zip() that returns partial results in > the StopIteration (see other threads), we could do: > > for x, y, z in zip(a, b, c): > do_stuff_with(x, y, z) > else as partial_xy: > if len(partial_xy) == 0: > x = dummy > try: > y = next(b) > except StopIteration: y = dummy > try: > z = next(c) > except StopIteration: z = dummy > if (x, y, z) != (dummy, dummy dummy): > do_stuff_with(x, y, z) > if len(partial_xy) == 1: > x, = partial_xy > y = dummy > try: > z = next(c) > except StopIteration: z = dummy > do_stuff_with(x, y, z) > if len(partial_xy) == 2: > x, y = partial_xy > z = dummy > do_stuff_with(x, y, z) > > (this example is better served by zip_longest. however, it's nevertheless a > good way to demonstrate functionality, thanks to zip_longest's (and zip's) > trivial/easy to understand behaviour.) Would it always be this complicated and verbose to use this feature? I mean, compare it to the “roughly equivalent” zip_longest in the docs, which is a lot shorter, easier to understand, harder to get wrong, and more flexible (e.g., it works unchanged with any number of iterables, while yours to had to rewritten for any different number of iterables because it requires N! chunks of explicit boilerplate). Are there any examples where it lets you do something useful that can’t be done with existing features, so it’s actually worth learning this weird new feature and requiring Python 3.10+ and writing 22 lines of extra code? Even if there is such an example, if the code to deal with the post-for state is 11x as long and complicated as the for loop and can’t be easily simplified or abstracted, is the benefit of using a for loop instead of manually nexting iterators still a net benefit? 
I don’t know that manually nexting the iterators will always avoid the problem, but it certainly often is (again, look at many of the equivalents in the itertools docs that do it), and it definitely is in your emulating-zip_longest example, and that’s the only example you’ve offered. Also notice that many cases like this can be trivially solved by a simple peekable or unnextable (I believe more-itertools has both, and the first one is a recipe in itertools too, but I can’t remember the names they use; if not, they’re really easy to write) or tee. We don’t even need any of that for your example, but if you can actually come up with another example, make sure it isn’t already doable a lot more simply with peekable/etc. > this would enable one to turn: > > return yield from foo > > into: > > for bar in foo: > continue (yield bar) > else as baz: > return baz > > allowing one to pick apart and modify the yielded and sent parts, while still > getting access to the return values. Again, this is letting you turn something simple into something more complicated, and it’s not at all clear why you want to do that. What exactly are you trying to pick apart that makes that necessary, that can’t be written better today? I’ll grant that writing something fully general that supports all the different things that could be theoretically done with your desired feature requires the ugly mess that you posted
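For reference, the core of the peekable mentioned above is tiny. This is a bare-bones sketch of the idea, not the much richer more-itertools version:

```python
class peekable:
    # Look at the next item without consuming it, by caching one item.
    _empty = object()

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = self._empty

    def peek(self, default=_empty):
        if self._cache is self._empty:
            try:
                self._cache = next(self._it)
            except StopIteration:
                if default is self._empty:
                    raise
                return default
        return self._cache

    def __iter__(self):
        return self

    def __next__(self):
        if self._cache is not self._empty:
            value, self._cache = self._cache, self._empty
            return value
        return next(self._it)
```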
[Python-ideas] Re: zip(x, y, z, strict=True)
On Apr 26, 2020, at 21:23, David Mertz wrote: > > >> On Sun, Apr 26, 2020 at 11:56 PM Christopher Barker >> wrote: >> > If I have two or more "sequences" there are basically two cases of that. >> >> so you need to write different code, depending on which case? that seems not >> very "there's only one way to do it" to me. > > This difference is built into the problem itself. There CANNOT be only one > way to do these fundamentally different things. > > With iterators, there is at heart a difference between "sequences that one > can (reasonably) concretize" and "sequences that must be lazy." And that > difference means that for some versions of a seemingly similar problem it is > possible to ask len() before looping through them while for others that is > not possible (and hence we may have done some work that we want to > "roll-back" in some sense). Agreed. But here’s a different way to look at it: The Python iteration protocol hides the difference between different kinds of iterables; every iterator is just a dumb next-only iterator. So any distinction between things you can pre-check and things you can post-check has to be made at a higher level, up wherever the code knows what’s being iterated (probably the application level). That isn’t inherent to the idea of iteration, as demonstrated by C++ (and later languages like Swift), where you can have reversible or random-accessible iterators and write tools that switch on those features, so you wouldn’t be forced to make the decision at the application level. You could write a generic C++ zip_equal function that pre-checks random-accessible iterators but post-checks other iterators. But when would you want that generic function? 
When you’re writing that application code, you know whether you have sequences, inherently lazy iterators, or generic iterables as input, and you know whether you want no check, a pre-check, or a post-check on equal lengths, and those aren’t independent questions: when you want a pre-check, it’s because you’re thinking in sequence terms, not general iteration terms. Pre-checking sequences is so trivial that you don’t need any helpers. The only piece Python is (arguably) missing is a way to do that post-check easily when you’ve decided you need it, and that’s what the proposals in this thread are trying to solve. The fact that asking for post-checking on the zip iterator won’t look the same as manually pre-checking the input sequences isn’t a violation of TOOWTDI because the “it” you’re doing is a different thing, different in a way that’s meaningful to your code, and there doesn’t have to be one obvious way to do two different things. Just like slicing doesn’t have to look the same as islice, and a find method doesn’t have to look the same as a generic iterable find function, and so on; they only look the same when the distinction between thinking about sequences and thinking about lazy iterables is irrelevant to the problem.
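That missing post-check piece is small enough to sketch on top of zip_longest. zip_equal is one of the names floated in these threads, not an existing stdlib function:

```python
from itertools import zip_longest

_sentinel = object()

def zip_equal(*iterables):
    # Post-check: pad with a private sentinel; the sentinel showing up
    # in a row means one input ran out before the others, i.e. the
    # lengths differed. The error fires mid-iteration, after the last
    # complete row, which is the best a post-check can do.
    for row in zip_longest(*iterables, fillvalue=_sentinel):
        if any(x is _sentinel for x in row):
            raise ValueError("zip_equal: iterables have different lengths")
        yield row
```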
[Python-ideas] Re: zip(x, y, z, strict=True)
On Apr 26, 2020, at 16:58, Steven D'Aprano wrote:
>
> On Sun, Apr 26, 2020 at 04:13:27PM -0700, Andrew Barnert via Python-ideas wrote:
>
>> But if we add methods on zip objects, and then we add a new skip()
>> method in 3.10, how does the backport work? It can’t monkeypatch the
>> zip type (unless we both make the type public and specifically design
>> it to be monkeypatchable, which C builtins usually aren’t).
>
> Depends on how you define monkey-patching.
>
> I'm not saying this because I see the need for a plethora of methods on
> zip (on the contrary); but I do like the methods-on-function API, like
> itertools.chain has. Functions are namespaces, and we under-utilise
> that fact in our APIs.
>
>     Namespaces are one honking great idea -- let's do more of those!
>
> Here is a sketch of how you might do it:
>
>     # Untested.
>     class MyZipBackport():
>         real_zip = builtins.zip
>         def __call__(self, *args):
>             return self.real_zip(*args)
>         def __getattr__(self, name):
>             # Delegation is another under-utilised technique.
>             return getattr(self.real_zip, name)
>         def skip(self, *args):
>             # insert implementation here...
>
>     builtins.zip = MyZipBackport()

But this doesn’t do what the OP suggested; it’s a completely different proposal. They wanted to write this:

    zipped = zip(xs, ys).skip()

… and you’re offering this:

    zipped = zip.skip(xs, ys)

That’s a decent proposal—arguably better than the one being discussed—but it’s definitely not the same one.

> I don't know what "zip.skip" is supposed to do,

I quoted it in the email you’re responding to: it’s supposed to yield short tuples that skip the iterables that ran out early. But from the wording you quoted it should be obvious that isn’t an issue here anyway. As long as you understand their point that they want to leave things open for expansion to new forms of zipping in the future, you can understand my point that their design makes that harder rather than easier.

>> Also, what exactly do these methods return?
>
> An iterator.
What kind of iterator is an implementation detail. > > The type of the zip objects is not part of the public API, only the > functional behaviour. Now go back and do what the OP actually asked for, with the zip iterator type having shortest(), equal(), and longest() methods in 3.9 and a skip() method added in 3.10. It’s no longer just “some iterator type, doesn’t matter”, it has specific methods on it, documented as part of the public API, and you need to either subclass it or emulate it. That’s exactly the problem I’m pointing out. The fact that it’s not true in 3.8, it’s not required by the problem, it’s not true of other designs proposed in this thread like just having more separate functions in itertools, it’s specifically a flaw with this design. So the fact that you can come up with a different design without that flaw isn’t an argument against my point, it’s just a probably-unnecessary further demonstration of my point. Your design looks like a pretty good one at least at first glance, and I think you should propose it seriously. You should be showing why it’s better than adding methods to zip objects—and also better than adding more functions to itertools or builtins, or flags to zip, or doing nothing—not pretending it’s the same as one of those other proposals and then trying to defend that other proposal by confusing the problems with it. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/WINTXNJWN7THOKAWTCFK3GZICEFDJJIC/ Code of Conduct: http://python.org/psf/codeofconduct/
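[Editor's note: the delegation sketch quoted above leaves skip() unimplemented. Here is a runnable version, with a plausible "skip" behaviour filled in (yield shorter tuples that omit exhausted iterables, as the OP described). The skip() body is my guess at that semantics, not code from the thread, and it does not reassign builtins.zip.]

```python
import builtins

class MyZipBackport:
    """Callable namespace wrapping the real zip, per the quoted sketch."""
    real_zip = builtins.zip

    def __call__(self, *args):
        return self.real_zip(*args)

    def __getattr__(self, name):
        # Delegate anything else to the real zip type.
        return getattr(self.real_zip, name)

    def skip(self, *iterables):
        # Hypothetical 'skip' semantics: keep yielding tuples, dropping
        # any iterable that has already run out.
        iterators = [iter(it) for it in iterables]
        sentinel = object()
        while iterators:
            values = [next(it, sentinel) for it in iterators]
            iterators = [it for it, v in zip(iterators, values)
                         if v is not sentinel]
            values = [v for v in values if v is not sentinel]
            if not values:
                return
            yield tuple(values)

zip_ns = MyZipBackport()
```

With this, `zip_ns(xs, ys)` behaves like the builtin, while `zip_ns.skip([1, 2, 3], [4])` yields `(1, 4)`, then `(2,)`, then `(3,)`.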
[Python-ideas] Re: zip(x, y, z, strict=True)
On Apr 26, 2020, at 14:36, Daniel Moisset wrote: > > This idea is something I could have used many times. I agree with many people > here that the strict=True API is at least "unusual" in Python. I was thinking > of 2 different API approaches that could be used for this and I think no one > has mentioned: > we could add a callable filler_factory keyword argument to zip_longest. That > would allow passing a function that raises an exception if I want "strict" > behaviour, and also has some other uses (for example, if I want to use [] as > a filler value, but not the *same* empty list for all fillers) This could be useful, and doesn’t seem too bad. I still think an itertools.zip_equal would be more discoverable and more easily understandable than something like itertools.zip_longest(fill_factory=lambda: throw(ValueError)), especially since you have to write that thrower function yourself. But if there really are other common uses like zip_longest(fill_factory=list), that might make up for it. > we could add methods to the zip() type that provide different behaviours. > That way you could use zip(seq, seq2).shortest(), zip(seq1, seq2).equal(), > zip(seq1, seq2).longer(filler="foo") ; zip(...).shortest() would be > equivalent to zip(...). Other names might work better with this API, I can > think of zip(...).drop_tails(), zip(...).consume_all() and zip(...).fill(). > This also allows adding other possible behaviours (I wouldn't say it's > common, but at least once I've wanted to zip lists of different length, but > get shorter tuples on the tails instead of fillers). This second one is a cool idea—but your argument for it seems to be an argument against it. If we stick with separate functions in itertools, and then we add a new one for your zip_skip (or whatever you’d call it) in 3.10, the backport is trivial. 
Either more-itertools adds zip_skip, or someone writes an itertools310 library with the new functions in 3.10, and then people just do this:

    try:
        from itertools import zip_skip
    except ImportError:
        from more_itertools import zip_skip

But if we add methods on zip objects, and then we add a new skip() method in 3.10, how does the backport work? It can’t monkeypatch the zip type (unless we both make the type public and specifically design it to be monkeypatchable, which C builtins usually aren’t). So more-itertools or zip310 or whatever has to provide a full implementation of the zip type, with all of its methods, and probably twice (in Python for other implementations plus a C accelerator for CPython). Sure, maybe it could delegate to a real zip object for the methods that are already there, but that’s still not trivial (and adds a performance cost). Also, what exactly do these methods return? Do they set some flag and return self? If so, that goes against the usual Python rule that mutator methods return None rather than self. Plus, it opens the question of what zip(xs, ys).equal().shortest() should do. I think you’d want that to be an AttributeError, but the only sensible way to get that is if equal() actually returns a new object of a new zip_equal type rather than self. So, that solves both problems, but it means you have to implement four different builtin types. (Also, while the C implementation of those types, and constructing them from the zip type’s methods, seems trivial, I think the pure Python version would have to be pretty clunky.)
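[Editor's note: a minimal pure-Python sketch of the "equal() returns a new object of a distinct type" point made above. All class names here are hypothetical; the point is that chaining a second mode fails with AttributeError instead of silently re-flagging a shared object.]

```python
from itertools import zip_longest

class _ZipBase:
    def __init__(self, *iterables):
        self._iterables = iterables

    def __iter__(self):
        # Default behaviour: plain shortest-wins zip.
        return zip(*self._iterables)

class _ZipEqual(_ZipBase):
    def __iter__(self):
        # Sentinel-padded iteration that raises on a length mismatch.
        sentinel = object()
        for combo in zip_longest(*self._iterables, fillvalue=sentinel):
            if any(v is sentinel for v in combo):
                raise ValueError("zipped iterables have unequal lengths")
            yield combo

class myzip(_ZipBase):
    def equal(self):
        # Return a NEW object of a distinct type rather than self, so
        # myzip(...).equal().shortest() is an AttributeError, not a
        # silent mode change.
        return _ZipEqual(*self._iterables)
```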
[Python-ideas] Re: Adding a "once" function to functools
On Apr 26, 2020, at 10:49, Eric Fahlgren wrote: > >> On Sun, Apr 26, 2020 at 9:46 AM Alex Hall wrote: >> It's not clear to me why people prefer an extra function which would be >> exactly equivalent to lru_cache in the expected use case (i.e. decorating a >> function without arguments). It seems like a good way to cause confusion, >> especially for beginners. Based on the Zen, there should be one obvious way >> to do it. > > I don't believe it is. lru_cache only guarantees that you will get the same > result back for identical arguments, not that the function will only be > called once. Seems to me if you call it, then in the middle of caching the > value, there's a thread change, you could get to the function wrapped by > lru_cache twice (or more times). In order to implement once, it needs to > contain a thread lock to ensure its "once" moniker and support the singleton > pattern for which it is currently being used (apparently incorrectly) in > django and other places. Am I understanding threading correctly here? There are three different use cases for “once” in a threaded program: 1. It’s incorrect or dangerous to even call the function twice. 2. The function isn’t idempotent but you need it to be. 3. The function is idempotent and it’s purely a performance optimization. For the third case, you don’t need any synchronization for correctness (as long as reading and writing the cache value is atomic), and it may actually be a lot faster. Sure, it means occasionally you end up doing the work two or even more times at startup, but in exchange you avoid a zillion thread locks, which can be a lot more expensive. If that’s the case with those Django uses, they’re not using it incorrectly. Also, if you know your app’s sequencing well enough and know exactly what the GIL guarantees, you might be able to prove (or at least convince yourself well enough that if test X passes it’s almost certainly safe) that there’s no chance of startup contention. 
This includes the really trivial case where you know what’s needed before you fork any threads that might need it (although for a lot of those cases, in Python, it’s probably simpler to just use a module global, but using an unsynchronized cache isn’t terrible for readability). Of course it’s also possible that Django is using it incorrectly and it just shows up as a handful of web apps starting up wrong in one-in-a-million instances and there are live bugs all over the internet that nobody’s handling right. But I wouldn’t just assume that it’s incorrect and add a new feature to Python and encourage Django to rewrite a whole lot of code to use it without finding an actual bug first. Also, it’s pretty easy to turn an unsynchronized implementation into a synchronized one: just add a @synchronized decorator around the @lru_cache or @cached_property decorator (or write a simple @synchronized_lru_cache or @synchronized_cached_property decorator and use that). So, does Python really need to include anything in the stdlib to make it easier? (That’s not a rhetorical question; I’m not sure.) On the other hand, if the bugs are actually the second case rather than the first, you can solve that with something faster than a full read-write mutex, but it’s a lot more complicated (and may not even be writeable in Python at all): read-acquire the cache, and if it’s empty, call the function and then compare-and-swap-release the cache, and if the CAS fails that means someone else got there first so discard your value and return theirs. If that comes up a lot and the performance benefit is often worth having, that seems like it should definitely be in the stdlib because people won’t get it right. But I doubt it does. One last thing: the best way to cache an idempotent nullary function with lru_cache is to use maxsize=None. If people are leaving the default 128, maybe the docs need to be improved in some way?
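[Editor's note: the "add a @synchronized decorator around the cache" option mentioned above, collapsed into one hypothetical helper. This is a sketch, not a stdlib API: a double-checked lock around a call-once cache for nullary functions.]

```python
import functools
import threading

def synchronized_once(func):
    """Hypothetical thread-safe call-once cache for a nullary function:
    the 'full mutex' option, so func() runs at most once ever."""
    lock = threading.Lock()
    sentinel = object()
    result = sentinel

    @functools.wraps(func)
    def wrapper():
        nonlocal result
        if result is sentinel:          # fast path: no lock after first call
            with lock:
                if result is sentinel:  # double-check under the lock
                    result = func()
        return result

    return wrapper
```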
[Python-ideas] Re: zip(x, y, z, strict=True)
On Apr 25, 2020, at 09:40, Christopher Barker wrote: > > - The main exception to this may be when one of them is infinite, but how > common is that, really? Remember that when zip was first created (py2) it was > a list builder, not an iterator, and Python itself was much less > iterable-focused. Well, yes, and improvements like that are why Python 3.9 is a better language than Python 2.0 (when zip was first added). Python wasn’t just much less iterable-focused, it didn’t even have the concept of “iterable”. While it did have map and filter, the tutorial taught you to loop over range(len(xs)), only mentioning map and filter as “good candidates to pass to lambda forms” for people who really want to pretend Python is Lisp rather than using it properly. Adding the iterator protocol and more powerful for loop; functions like zip, enumerate, and iter; generators, comprehensions, and generator expressions; itertools; yield from; and changing map and friends to iterators is a big part of why you can write all kinds of things naturally in Python 3.9 that were clumsy, complicated, or even impossible. Sure, you can use it as if it were Python 2.0 but with Unicode, but it’s a lot more than that. But also, why was zip added with “shortest” behavior in 2.0 in the first place? It wasn’t to support infinite or otherwise lazy lists, because those didn’t exist. And it wasn’t chosen on a whim. In Python 1.x, if you knew your lists were the same length, you used map with None as the function. (Well, usually you just looped over range(len(first_list)), but if you wanted to be all Lispy, you used map.) But if you didn’t know the lists were the same length, you couldn’t (because map had “longest” behavior, with an unchangeable fillvalue of None, until 3.0). If that didn’t actually come up for people even in Python 1.x, nobody would have asked for it in 2.0. 
[Python-ideas] Re: zip(x, y, z, strict=True)
On Apr 24, 2020, at 11:07, Brandt Bucher wrote:
>
> 1. Likely the most common case, for me, is when I have some data and want to
> iterate over both it and a calculated pairing:
>
>     x = ["a", "b", "c", "d"]
>     y = iter_apply_some_transformation(x)
>     for a, b in zip(x, y):
>         ...  # Do something.

Your other examples are a lot more compelling. I can easily imagine actually being bitten by zip(*ragged_iterables_that_I_thought_were_rectangular) and having a hard time debugging that, and the other one is an actual bug in actual code, which is even harder to dismiss. I think this one, on the other hand, is exactly what I think doubters are imagining. I can easily imagine cases where you want to zip together two obviously-equal iterables, but when they’re obviously equal, adding a check for that is hardly the first thing I’d think about defending against. (For example, things like using “spam eggs cheese”.strip() instead of .split() as the input are more common logic errors and even less fun to debug…) And that’s why people keep asking for examples—because the proponents of the change keep talking as if there are examples like your 2 and 3 where everyone would agree that there’s a significant benefit to making it easier to be defensive, but the wary conservatives are only imagining examples like your 1. Anyway, if I’m right, I think you just solved that problem, and now everyone can stop talking past each other. (Although the couple of people who suggested wanting to _handle_ the error as a normal case rather than treating it as a logic error to debug like your examples still need to give use cases if they want anything different than what you want.)
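[Editor's note: the ragged-transpose failure mode mentioned above is easy to reproduce, and zip's shortest-wins behaviour makes the mistake silent, which is what makes it hard to debug.]

```python
# Two rows that were *meant* to be the same length.
rows = [[1, 2, 3], [4, 5]]

# Transposing with zip silently truncates to the shortest row:
cols = list(zip(*rows))
# cols == [(1, 4), (2, 5)] -- the 3 vanished with no error at all.
```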
[Python-ideas] Re: Make type(None) the no-op function
> On Apr 24, 2020, at 08:46, Soni L. wrote:
>
> it's not my own use case for once. the PEP clearly lists a use-case. we
> should support that use-case.

So your use case is the rationale from a PEP written because Barry “can’t resist” and rejected as a joke in record time, for which a better solution was already added (use 0 for the environment variable) 3 years ago. And you ended your email with this code snippet:

>     NoneType
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     NameError: name 'NoneType' is not defined

… which demonstrates that even if you had a real need for a PYTHONBREAKPOINT=noop, your proposal wouldn’t have helped anyway. There is actually a way you could pass it, but if you don’t know that way, and felt the need to show us that you don’t know it, that can’t be your use case. And even after you figure it out, it would hardly be any more obvious to other people who need a noop function where to go digging for it than it was for you.
[Python-ideas] Re: zip(x, y, z, strict=True)
> On Apr 22, 2020, at 14:09, Steven D'Aprano wrote: > > On Wed, Apr 22, 2020 at 10:33:24AM -0700, Andrew Barnert via Python-ideas > wrote: > >> If that is your long-term goal, I think you could do it in three steps. > > I think the first step is a PEP. This is not a small change that can be > just done on a whim. Yes, I agree. Each of the three steps will very likely require a PEP. And not only that, the PEP for this first step has to make it clear that it’s useful on its own—not just to people like Serhiy who eventually want to replace zip and see it as a first step, but also to people who do not want zip to ever change but do want a convenient way to opt in to checking zips (and don’t find more-itertools convenient enough) and see this as the _only_ step. >> And of course after the first two steps you can proselytize for the >> next one. If you can convince lots of people that they should care >> about the choice more often and get them using the explicit functions, >> it’ll be a lot harder to argue that everyone is happy with today’s >> behavior. > > If they need to be *convinced* to use the new function, then they don't > really need it and didn't want it. I had to be convinced that I wanted str.format. (The guy who convinced me was enthusiastic enough that he went through the effort of writing a __format__ method for my Fixed1616 class to show how easily extensible it is.) But really, I did want it, and just didn’t know it yet. Hell, I had to be convinced to use Python instead of sticking with Perl and Tcl, but it turned out I did want it. Let’s assume that the proponents of adding zip_strict are right that using it will often give you early failures on some common uses that are today painful to debug. If so, most people don’t know that today, and aren’t going to think of it just because a new function shows up in itertools, or a new flag on a builtin, or whatever. Someone will have to convince them to use it. 
But then, one evening, they’ll get an exception and realize, “Whoa, that would have taken me hours to debug otherwise, if I’d even spotted the bug…”, and they’ll realize they needed it, just as much as the handful who noticed the need in advance and went looking. The proponents of the bigger, longer-term change of eventually making this the default behavior for zip may be right too. If so, many of the people who were convinced to use zip_strict will find it helpful so often, and zip_shortest so unusual in their code, that they start asking why the hell strict isn’t the default instead of shortest. And then it’ll be a lot easier for Serhiy or whoever to sell such a big change. Of course if that doesn’t ever happen, it’ll be a lot harder to sell the change—but in that case, the change would be a mistake, so that’s good too. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/PPSOSLWFLGV4KF2X44THDJ53XPIOSZTY/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Add extend_const action to argparse
On Apr 22, 2020, at 15:04, pyt...@roganartu.com wrote: > > The natural extension to this filtering idea are convenience args that set > two const values (eg: `--filter x --filter y` being equivalent to > `--filter-x-y`), but there is no `extend_const` action to enable this. > > While this is possible (and rather straight forward) to add via a custom > action, I feel like this should be a built-in action instead. `append` has > `append_const`, it seems intuitive and reasonable to expect `extend` to have > `extend_const` too (my anecdotal experience the first time I came across this > need was that I simply tried using `extend_const` without checking the docs, > assuming it already existed). I’m pretty sure I’ve run into the exact same situation (well, not accumulating filters, but accumulating something and wanting to add multiple constants from one flag), had the same “Really? It’s not there?” reaction as you, and then just muttered and worked around it. It makes sense to me to fix it, exactly the way you propose. My only comment is that when you write the example(s) for the docs, it might be worth using a tuple rather than a list for the const value. It doesn’t really make a difference, but people might be momentarily confused by a mutable list called “const”. Also, looking at the _copy_items function you’re calling: it has a comment saying it’s only used by append and append_const, but that’s wrong as it’s also used by extend. And of course you’re adding extend_const. I don’t know if that’s worth fixing separately, but if not it seems to me it’s probably worth fixing in your patch. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/VGXVYH5ICUJSMMWWZZGBDHBUIYFA5IWT/ Code of Conduct: http://python.org/psf/codeofconduct/
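[Editor's note: a hedged sketch of the custom-action workaround described above. The class name is hypothetical; only the proposal itself would make this available as a built-in `extend_const` action string. Note the tuple const, per the docs suggestion in the reply.]

```python
import argparse

class ExtendConstAction(argparse.Action):
    """Like 'append_const', but extends the dest list with every item
    of const instead of appending const itself."""

    def __call__(self, parser, namespace, values, option_string=None):
        items = list(getattr(namespace, self.dest, None) or [])
        items.extend(self.const)
        setattr(namespace, self.dest, items)

parser = argparse.ArgumentParser()
parser.add_argument("--filter", dest="filters", action="append")
# A tuple for const, so nobody is confused by a mutable "const".
parser.add_argument("--filter-x-y", dest="filters", nargs=0,
                    action=ExtendConstAction, const=("x", "y"))
```

With this, `--filter a --filter-x-y` yields `filters == ["a", "x", "y"]`, the same as `--filter a --filter x --filter y`.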
[Python-ideas] Re: zip(x, y, z, strict=True)
> On Apr 21, 2020, at 16:02, Steven D'Aprano wrote: > > On Tue, Apr 21, 2020 at 12:25:06PM -0700, Andrew Barnert via Python-ideas > wrote: >>> On Apr 21, 2020, at 01:36, Serhiy Storchaka wrote: >>> except ValueError: # assuming that’s the exception you want? >> For what it’s worth, more_itertools.zip_equal raises an >> UnequalIterablesError, which is a subclass of ValueError. >> I’m not sure whether having a special error class is worth it, but that’s >> because nobody’s providing any examples of code where they’d want to handle >> this error. Presumably there are cases where something else in the >> expression could raise a ValueError for a different reason, and being able >> to catch this one instead of that one would be worthwhile. But how often? No >> idea. > >> At a guess, I’d say that if this has to be a builtin (whether >> flag-switchable behavior in zip or a new builtin function) it’s >> probably not worth adding a new builtin exception, but if it’s going >> to go into itertools it probably is worth it. > > Why? Well, you quoted the answer above, but I’ll repeat it: >> Presumably there are cases where something else in the expression could >> raise a ValueError for a different reason, and being able to catch this one >> instead of that one would be worthwhile. But how often? No idea. For a little more detail: A few people (like Soni) keep trying to come up with general-purpose ways to differentiate exceptions better. The strong consensus is always that we don’t need any such thing, because in most cases, Python gives you just enough to differentiate what you actually need in most code. (That wasn’t quite true in Python 2, but it is now.) We have LookupError with subclasses KeyError and IndexError, but not additional subclasses IndexTooBigError and IndexTooSmallError, and so on. For the IOError subclasses, Python does kind of lean on C/POSIX, but that’s still good enough that it’s fine. 
The question in every case is: do you often need to distinguish this case? In this case: will the zip_strict postcondition violation be used in a lot of places where there are other likely sources of ValueError that need to be distinguished? If so, it should be a separate subclass. If that will be rare, it shouldn’t. As I said, I don’t know the answer to that question, because none of the people saying they need an exception here have given any examples where they’d want to handle the exception, and it’s hard to guess how people want to handle an exception when you don’t even know where and when they want to handle it. So I took a guess to start the discussion. If you have a different guess, fine. But really, we need the people who have code in mind that would actually use this to show us that code or tell us about it. > I know that the Python community has a love-affair with more-itertools, > but I don't think that it is a well-designed library offering good APIs. > It's a grab-bag of "everything including the kitchen sink". Just because > they use a distinct exception doesn't mean we should follow them. If I thought we should just do what more-itertools does without thinking, I would have said “more-itertools has a separate exception, so we should”, rather than saying “For what it’s worth, more-itertools has a separate exception” and then concluding that I don’t know if we actually need one and we need to look at actual examples to decide. When all else is equal, I think it’s worth being consistent with more-itertools just because that way we get an automatic backport. But that’s not a huge win, and quite often, all else isn’t equal, so looking at what more-itertools does and why isn’t the answer, it’s just one piece of information to throw into the discussion. And I think that’s the case here: their design raises a question for us to answer, but it doesn’t answer it for us. 
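[Editor's note: the trade-off above is easier to see in code. This sketch mirrors the more-itertools approach (the stdlib has no such exception; the helper function is mine): a ValueError subclass costs nothing for code that catches plain ValueError, but lets careful callers single out the length mismatch.]

```python
class UnequalIterablesError(ValueError):
    """A ValueError subclass, as in more-itertools: old 'except
    ValueError' code still works, but the case is distinguishable."""

def check_same_length(a, b):
    # Hypothetical helper standing in for a zip_strict postcondition.
    if len(a) != len(b):
        raise UnequalIterablesError("iterables differ in length")

try:
    check_same_length([1, 2], [1, 2, 3])
except UnequalIterablesError:
    pass  # handle the length mismatch specifically...
except ValueError:
    pass  # ...without swallowing other ValueErrors nearby
```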
[Python-ideas] Re: zip(x, y, z, strict=True)
On Apr 21, 2020, at 19:35, Steven D'Aprano wrote:
>
> On Mon, Apr 20, 2020 at 07:47:51PM -0700, Andrew Barnert wrote:
>
>>>>     counter = itertools.count()
>>>>     yield from zip(counter, itertools.chain(headers, [''], body, ['']))
>>>>     lines = next(counter)
>>>
>>> That gives you one more than the number of lines yielded.
>>
>> Yeah, I screwed that up in simplifying the real code without testing
>> the result. And your version gives one _less_ than the number yielded.
>
> No, my version repeats the last number yielded, which is precisely what
> you wanted (as I understand it).

No, I wanted the number of lines yielded. You not only quoted that, but directly claimed that you were giving the number of lines yielded. But you’re not; you’re giving me the number of the last line, which is 1 less than that.

>     py> def test():
>     ...     headers = body = ''
>     ...     for t in enumerate(itertools.chain(headers, [''], body, [''])):
>     ...         yield t
>     ...     print(t[0])
>     ...
>     py> list(test())
>     1
>     [(0, ''), (1, '')]

Right. The number of pairs yielded is 2. Your code prints 1.

>> (With either enumerate(xs) or zip(counter, xs) the last element will
>> be (len(xs)-1, xs[-1]).)

> Um, yes? That's because both enumerate and counter start from zero by
> default. I would have asked you why you were counting your lines
> starting from zero instead of using `enumerate(xs, 1)` but I thought
> that was intentional.

You were right, counting from 0 was intentional. Just as it is almost everywhere in Python. The caller needs those line numbers; otherwise I wouldn’t be yielding them in the first place. And that’s why your solution is wrong: you correctly left it counting from 0, but then incorrectly assumed that the last number equals the count, which is only true when counting from 1. If that’s not a classic fencepost error, I don’t know what is. And my originally-posted version has a different fencepost error, as you pointed out. And my real code doesn’t, but I may well have made one and had to spend a minute debugging it.
Nontrivial counting code often has fencepost errors, and Python only eliminates the sources that come up often, not every possible one that might come up rarely, which is fine. And this proposal doesn’t change that in any way, nor is it meant to. >> Your version has the additional problem that >> if the iterable is empty, t is not off by one but unbound (or bound to >> some stale old value)—but that’s not possible in my example, and >> probably not in most similar examples. > > But the iterable is never empty, because you always yield at least > two blanks. Yes; I said “but that’s not possible in my example”, as you quoted directly above. > I don't believe this zip_strict proposal would help you in this > situation. I think it will make it worse, Well, of course. Since it wasn’t an argument for the proposal, but an example pointing out a potential hole in the proposal that needed to be thought through, why would you expect the proposal to help it? To recap: Someone had said that it doesn’t matter what state the iterables are left in, because nobody ever looks at an iterator after zip. So I gave an example of (simplified) real code that looks at an iterator after zip. So people thought through what state the iterables should be left in by this new zip_strict function, and there is a reasonable answer. Even if your arguments about this example were correct, they wouldn’t be relevant to the thread, because the entire purpose of giving the example has been fulfilled. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/B64KN2HMCNXPRKPULNP3KE4HQM5A4F2U/ Code of Conduct: http://python.org/psf/codeofconduct/
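[Editor's note: the two off-by-one behaviours being argued about can be pinned down in a few lines. The helper name is mine; the pattern is the counter-plus-zip idiom from the quoted code.]

```python
import itertools

def count_after_zip(xs):
    """Pair each item with an index, then read the counter afterwards."""
    counter = itertools.count()
    pairs = list(zip(counter, xs))
    # zip pulls from counter *before* discovering xs is exhausted, so
    # next(counter) overshoots the number of pairs yielded by one.
    return pairs, next(counter)

pairs, n = count_after_zip(["a", "b"])
# len(pairs) == 2, but n == 3: one MORE than the count.
# Meanwhile the last enumerate-style index is one LESS than the count:
last_index = max(i for i, _ in pairs)  # 1 == len(pairs) - 1
```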
[Python-ideas] Re: zip(x, y, z, strict=True)
On Apr 22, 2020, at 01:52, Serhiy Storchaka wrote: > > 22.04.20 11:20, Antoine Pitrou пише: >> Ideally, that's what it would do. Whether it's desirable to transition >> to that behaviour is an open question. >> But, as far as I'm concerned, the number of times where I took >> advantage of zip()'s current acceptance of heteregenously-sized inputs >> is extremely small. In most of my uses of zip(), a size difference >> would have been a logic error that deserves noticing and fixing. > > I concur with Antoine. Ideally we should have several functions: > zip_shortest(), zip_equal(), zip_longest(). In most cases (80% or 90% or > more) they are equivalent, because input iterators has the same length, but > it is safer to use zip_equal() to catch bugs. In other cases you would use > zip_shortest() or zip_longest(). And it would be natural to rename the most > popular variant to just zip(). > > Now it is a breaking change. We had a chance to do it in 3.0, when other > breaking change was performed in zip(). I do not know if it is worth to do > now. But when we plan any changes in zip() we should take into account > possible future changes and make them simpler, not harder. If that is your long-term goal, I think you could do it in three steps. First, just add itertools.zip_equal. Ideally the docs should replace the usual “Added in 3.9” with something like “Added in 3.9; if you need the same function in earlier versions see more-itertools” (linked to the more-itertools blurb at the top of the page). It seems like there’s a lot of support for this step even from people who don’t want your big goal. Second, add itertools.zip_shortest. And change zip’s docs to say that it’s the same as zip_shortest and mention the other two choices, and maybe even to try to nudge people to explicitly decide which of the three they want. And find some places in the tutorial that use zip and change them to use zip_equal and zip_shortest as appropriate. 
I think that gets you about as much as you can get without backward compatibility issues, and it also gets you closer to being able to deprecate zip or change it to alias zip_equal, rather than making it harder. Third, do the deprecation. By that point, everyone maintaining existing code will have an easy way to defensively prepare for it: as long as they can require 3.10+ or more-itertools, they can just change all uses of zip to zip_shortest and they’re done. Still not painless, but about as painless as a backward compatibility break could ever be. And of course after the first two steps you can proselytize for the next one. If you can convince lots of people that they should care about the choice more often and get them using the explicit functions, it’ll be a lot harder to argue that everyone is happy with today’s behavior. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/NVBFRNG4PPJQ3SEIZJMGXY5UFB3LNWZ3/ Code of Conduct: http://python.org/psf/codeofconduct/
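The step-one function is small enough to sketch in pure Python (a rough, unoptimized version; the real more-itertools zip_equal differs in detail):

```python
def zip_equal(*iterables):
    """Like zip(), but raise ValueError if the inputs differ in length."""
    iterators = [iter(it) for it in iterables]
    while True:
        values, stopped = [], 0
        for it in iterators:
            try:
                values.append(next(it))
            except StopIteration:
                stopped += 1
        if stopped == len(iterators):
            return  # all inputs ended together
        if stopped:
            raise ValueError("zip_equal: iterables have different lengths")
        yield tuple(values)

# Step two is then mostly a rename:
zip_shortest = zip

assert list(zip_equal("ab", [1, 2])) == [("a", 1), ("b", 2)]
try:
    list(zip_equal("abc", [1, 2]))
except ValueError:
    pass  # the mismatch is caught instead of silently truncated
```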
[Python-ideas] Re: zip(x, y, z, strict=True)
On Apr 21, 2020, at 01:36, Serhiy Storchaka wrote: > >except ValueError: # assuming that’s the exception you want? For what it’s worth, more_itertools.zip_equal raises an UnequalIterablesError, which is a subclass of ValueError. I’m not sure whether having a special error class is worth it, but that’s because nobody’s providing any examples of code where they’d want to handle this error. Presumably there are cases where something else in the expression could raise a ValueError for a different reason, and being able to catch this one instead of that one would be worthwhile. But how often? No idea. At a guess, I’d say that if this has to be a builtin (whether flag-switchable behavior in zip or a new builtin function) it’s probably not worth adding a new builtin exception, but if it’s going to go into itertools it probably is worth it. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/3CMU427P5H7RNZF5QQ7QAAEEMAYLFTBU/ Code of Conduct: http://python.org/psf/codeofconduct/
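The point of making it a ValueError subclass can be shown with a toy stand-in for the exception (the real more_itertools.UnequalIterablesError behaves the same way for catching purposes):

```python
class UnequalIterablesError(ValueError):
    """Toy stand-in for more_itertools.UnequalIterablesError."""

def zip_equal(*seqs):
    # A length-checking zip over sized inputs, just for demonstration.
    if len({len(s) for s in seqs}) > 1:
        raise UnequalIterablesError("iterables have different lengths")
    return list(zip(*seqs))

# Catching the subclass lets unrelated ValueErrors propagate:
try:
    pairs = zip_equal("ab", [1, 2, 3])
except UnequalIterablesError:
    pairs = None
assert pairs is None

# ...while code that only cares about "some ValueError" still works:
try:
    zip_equal("ab", [1])
except ValueError:
    caught = True
assert caught
```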
[Python-ideas] Re: zip(x, y, z, strict=True)
On Apr 21, 2020, at 01:36, Serhiy Storchaka wrote: > > 20.04.20 23:33, Andrew Barnert via Python-ideas пише: >> Should this print 1 or 2 or raise StopIteration or be a don’t-care? >> Should it matter if you zip(y, x, strict=True) instead? > > It should print 2 in both cases. The only way to determine whether the > iterator ends is to try to get its next value. And this value (1) will lost, > because there is no way to return it or "unput" to the iterator. There is no > reason to consume more values, so StopIteration is irrelevant. > > There is more interesting example: > >x = iter(range(5)) >y = [0] >z = iter(range(5)) >try: >zipped = list(zip(x, y, z, strict=True)) >except ValueError: # assuming that’s the exception you want? >assert zipped == [(0, 0, 0)] >assert next(x) == 2 >print(next(z)) > > Should this print 1 or 2? > > The simple implementation using zip_longest() would print 2, but more optimal > implementation can print 1. You’re right; that’s the question I should have asked; thanks. As I said, I think either answer is probably acceptable as long as it’s documented (and, therefore, it’s also clear that the consequences have been thought through). ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/EAWRMFD3JOSMIGHRLOHYQZMWNKKVDBRU/ Code of Conduct: http://python.org/psf/codeofconduct/
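Serhiy's example can be run against the simple zip_longest-based implementation he mentions (a sketch; real code would need to guard against a fillvalue collision), and it does indeed leave both surviving iterators at 2:

```python
import itertools

_MISSING = object()

def zip_equal(*iterables):
    # zip_longest advances *every* input each step, so by the time a
    # mismatch is detected, the surviving iterators have already lost
    # one extra value each.
    for combo in itertools.zip_longest(*iterables, fillvalue=_MISSING):
        if any(v is _MISSING for v in combo):
            raise ValueError("iterables have different lengths")
        yield combo

x = iter(range(5))
y = [0]
z = iter(range(5))
zipped = None
try:
    zipped = list(zip_equal(x, y, z))
except ValueError:
    pass
assert zipped is None   # the exception escaped before list() finished
assert next(x) == 2     # 0 and 1 were consumed from x
assert next(z) == 2     # z was advanced too, before the check fired
```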
[Python-ideas] Re: Keyword arguments self-assignment
On Apr 21, 2020, at 01:27, M.-A. Lemburg wrote: > > On 21.04.2020 04:25, Andrew Barnert via Python-ideas wrote: >>> On Apr 20, 2020, at 16:24, M.-A. Lemburg wrote: >>> >>> On 20.04.2020 19:43, Andrew Barnert via Python-ideas wrote: >>>>> On Apr 20, 2020, at 01:06, M.-A. Lemburg wrote: >>>>> >>>>> The current version already strikes me as way too complex. >>>>> It's by far the most complex piece of grammar we have in Python: >>>>> >>>>> funcdef: 'def' NAME parameters ['->' test] ':' [TYPE_COMMENT] >>>>> func_body_suite >>>> >>>> But nobody’s proposing changing the function definition syntax here, only >>>> the function call syntax. Which is a lot less hairy. It is still somewhat >>>> hairy, but nowhere near as bad, so this argument doesn’t really apply. >>> >>> True, I quoted the wrong part of the grammar for the argument, >>> sorry. I meant this part: >>> >>> https://docs.python.org/3/reference/expressions.html#calls >>> >>> which is simpler, but not really much, since the devil is in >>> the details. >> >> Let’s just take one of the variant proposals under discussion here, adding >> ::identifier to dict displays. This makes no change to the call grammar, or >> to any of the call-related bits, or any other horribly complicated piece of >> grammar. It just changes key_datum (a nonterminal referenced only in >> dict_display) from this: >> >>expression ":" expression | “**” or_expr >> >> … to this: >> >>expression ":" expression | “::” identifier | “**” or_expr >> >> That’s about as simple as any syntax change ever gets. >> >> Which is still not nothing. But you’re absolutely right that a big and messy >> change to function definition grammar would have a higher bar to clear than >> most syntax proposals—and for the exact same reason, a small and local >> change to dict display datum grammar has a lower bar than most syntax >> proposals. 
> > I think the real issue you would like to resolve is how to get > at the variable names used for calling a function, essentially > pass-by-reference (in the Python sense, where variable names are > references to objects, not pointers as in C) rather than > pass-by-value, as is the default for Python functions. No, nobody’s asking for that either. It wouldn’t directly solve most of the examples in this thread, or even indirectly make them easier to solve. The problem in most cases is that they have to call a function that they can’t change with a big mess of parameters. Any change to help the callee side doesn’t do any good, because the callee is the thing they can’t change. The fix needs to be on the caller side alone. This also wouldn’t give you useful pass-by-reference in the usual sense of “I want to let the callee rebind the variables I pass in”, because a name isn’t a reference in Python without the namespace to look it up in. Even if the callee knew the name the caller used for one of its parameters, how would it know whether that name was a local or a cell or a global? If it’s a local, how would it get at the caller’s local environment without frame hacking? (As people have demonstrated on this thread, frame hacking on its own is enough, without any new changes.) Even if it could get that local environment, how could it rebind the variable when you can’t mutate locals dicts? Also, most arguments in Python do not have names, because arguments are arbitrary expressions. Of course the same thing is true in, say, C++, but that’s fine in C++, because lvalue expressions have perfectly good lvalues even if they don’t have good names. You can pass p->vec[idx+1].spam to a function that wants an int&, and it can modify its parameter and you’ll see the change on your side. How could your proposal handle even the simplest case of passing lst[0]? Even if it could work as advertised, it’s hugely overkill for this problem. 
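The frame hacking alluded to above, which already works today without any language change, can be sketched like this (CPython-specific, fragile by design, and with a hypothetical helper name):

```python
import inspect

def grab_from_caller(*names):
    """Toy frame hack: build a dict from the caller's local variables.

    This only *reads* the caller's locals by name; it shows why frame
    hacking is enough for the keyword-repetition use case, and also why
    it can't give you pass-by-reference: it sees locals, not arbitrary
    expressions, and can't rebind anything in the caller.
    """
    frame = inspect.currentframe().f_back
    try:
        return {name: frame.f_locals[name] for name in names}
    finally:
        del frame  # break the frame reference cycle promptly

def caller():
    genre = "punk"
    year = 2020
    return grab_from_caller("genre", "year")

assert caller() == {"genre": "punk", "year": 2020}
```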
A full-blown macro system would let people solve this problem, and half the other things people propose for Python, but that doesn’t mean that half the proposals on this list are requests for a full-blown macro system, or that it’s the right answer for them. > The f-string logic addresses a similar need. Similar, yes, but the f-string logic (a) runs in the caller’s scope and (b) evaluates code that’s textually part of the caller. > With a way to get at the variable names used for calling a > function from inside a function, you could then write a dict > constructor which gives you the subset of vars() you are > looking for. Most of the use cases involve “I
[Python-ideas] Re: zip(x, y, z, strict=True)
> On Apr 20, 2020, at 17:22, Steven D'Aprano wrote: > > On Mon, Apr 20, 2020 at 03:28:09PM -0700, Andrew Barnert via Python-ideas > wrote: > >> Admittedly, such cases are almost surely not that common, but I >> actually have some line-numbering code that did something like this >> (simplified a bit from real code): >> yield from enumerate(itertools.chain(headers, [''], body, [''])) >> … but then I needed to know how many lines I yielded, and there’s no >> way to get that from enumerate, so instead I had to do this: > > Did you actually need to "yield from"? Unless your caller was sending > values into the enumerate iterable, which as far as I know enumerate > doesn't support, "yield from" isn't necessary. True. Using yield from is more efficient, more composable, and usually (but not here) more concise and readable, but none of those are relevant to my example (or the real code). I suppose it’s just a matter of habit to reach for yield from before a loop over yield even in cases where it doesn’t matter much. >> counter = itertools.count() >> yield from zip(counter, itertools.chain(headers, [''], body, [''])) >> lines = next(counter) > > That gives you one more than the number of lines yielded. Yeah, I screwed that up in simplifying the real code without testing the result. And your version gives one _less_ than the number yielded. (With either enumerate(xs) or zip(counter, xs) the last element will be (len(xs)-1, xs[-1]).) Your version has the additional problem that if the iterable is empty, t is not off by one but unbound (or bound to some stale old value)—but that’s not possible in my example, and probably not in most similar examples. Both are easy to fix in practice, but both (as we just demonstrated) even easier to get wrong the first time, like all fencepost errors. 
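Both off-by-one behaviors above are easy to check directly (a three-line toy body stands in for the real headers/body chain):

```python
import itertools

lines = ["a", "b", "c"]

# zip(counter, lines): zip pulls from the counter first, so it advances
# the counter once more before discovering lines is exhausted.
counter = itertools.count()
yielded = list(zip(counter, lines))
assert len(yielded) == 3
assert next(counter) == 4        # one *more* than the number yielded

# zip(lines, counter): lines is pulled first, so the counter is never
# over-advanced...
counter = itertools.count()
yielded = list(zip(lines, counter))
assert next(counter) == 3        # exactly the number yielded

# ...but recovering the count from the last tuple instead is off the
# other way (and fails entirely on empty input, where there is no
# last tuple at all):
last_index = yielded[-1][1]
assert last_index == 2           # one *less* than the number yielded
```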
Maybe it would be better to use an undoable/peekable/tee wrapper after all, but without writing it out I’m not sure that wouldn’t be just as fencepostable… Anyway, that’s exactly why I want to make sure the fencepost behavior is actually defined for this new proposal. Any reasonable answer is probably fine; people probably won’t run into wanting the leftovers, but if they ever do, as long as the docs say what should be there, they’ll work it out. That, and the implementation constraint. If everyone were convinced that the only reasonable answer is to fully consume all inputs on error, that would be a bit of a problem, so it’s worth making sure nobody is convinced of that. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/6KC76CDWZM45K3E6V3JXHJRTMLBXKY2R/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Keyword arguments self-assignment
On Apr 20, 2020, at 16:24, M.-A. Lemburg wrote: > > On 20.04.2020 19:43, Andrew Barnert via Python-ideas wrote: >>> On Apr 20, 2020, at 01:06, M.-A. Lemburg wrote: >>> >>> The current version already strikes me as way too complex. >>> It's by far the most complex piece of grammar we have in Python: >>> >>> funcdef: 'def' NAME parameters ['->' test] ':' [TYPE_COMMENT] >>> func_body_suite >> >> But nobody’s proposing changing the function definition syntax here, only >> the function call syntax. Which is a lot less hairy. It is still somewhat >> hairy, but nowhere near as bad, so this argument doesn’t really apply. > > True, I quoted the wrong part of the grammar for the argument, > sorry. I meant this part: > > https://docs.python.org/3/reference/expressions.html#calls > > which is simpler, but not really much, since the devil is in > the details. Let’s just take one of the variant proposals under discussion here, adding ::identifier to dict displays. This makes no change to the call grammar, or to any of the call-related bits, or any other horribly complicated piece of grammar. It just changes key_datum (a nonterminal referenced only in dict_display) from this: expression ":" expression | “**” or_expr … to this: expression ":" expression | “::” identifier | “**” or_expr That’s about as simple as any syntax change ever gets. Which is still not nothing. But you’re absolutely right that a big and messy change to function definition grammar would have a higher bar to clear than most syntax proposals—and for the exact same reason, a small and local change to dict display datum grammar has a lower bar than most syntax proposals. 
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/T2LDV7TT6GA7Y2ZJ4FTWEY3EXCAJV3GR/
[Python-ideas] Re: Keyword arguments self-assignment
On Apr 20, 2020, at 16:46, Christopher Barker wrote: > On Mon, Apr 20, 2020 at 3:13 PM Andrew Barnert wrote: > > Sure, it’s a declarative format, it’s just that often it’s intended to be > > understood as representing an object graph. > > I'm not sure of the point here -- I was not going into detail nor explaining > myself well, but I think there are (kind of) three ways to "name" just one > piece of data that came from a bunch of JSON: > > - a key, as in a dict `data['this']` > - an attribute of an object: `data.this` > - a local variable: `this` > > what I was getting at is that there may be a fine line between the data > version and the object version, but you can go between those easily without > typing all the names. OK, I thought you were saying that line is a serious problem for this proposal, so I was arguing that the same problems actually arise either way, and the same proposal helps both. Since you weren’t saying that and I misinterpreted you, that whole part of the message is irrelevant. So I’ll strip all the irrelevant bits down to this quote from you that I agree with 100%: > It's only when you have it in a local variable that this whole idea starts to > matter. And I think we also agree that it would be better to make this a dict display feature, and a bunch of other bits. But here’s the big issue: > > If I have 38 locals for all 38 selectors in the API—or, worse, a > > dynamically-chosen subset of them—then “get rid of those locals” is almost > > surely the answer, but with just 1? Probably not. And maybe 3 or 4 is > > reasonable too— > > right. but I don't think anyone is suggesting a language change for 1, or > even 3-4 names (maybe 4...) The original post only had 2 arguments. Other people came up with examples like the popen one, which has something insane like 19 arguments, but most of them were either constants or computed values not worth storing; only 4 of them near the end were copied from locals. Steven’s example had 5. 
The “but JavaScript lets me do it” post has 3. I think someone suggested the same setup.py example you came up with later in this same thread, and it had 3 or 4. So I think people really are suggesting this for around 4 names. And I agree that’s kind of a marginal benefit. That’s why I think the whole proposal is marginal. It’s almost never going to be a huge win—but it may be a small win in so many places that it adds up to being worth it. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/IH2O23QXS53LY4KM4TS2V23Q6SSEYD7V/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: zip(x, y, z, strict=True)
Alex Hall wrote: > Surely no exception is raised because zip is lazy? Ack, you're right. The same problem would come up wherever you actually _use_ the zip, of course, but it's harder to demonstrate and reason about. So change that toy example to `zipped = list(zip(x, y, strict=True))`. (Fortunately, it looks like Ram got what I intended despite my mistake.) > Doesn't it still have to be even with strict=True? Well, I suppose technically it doesn't _have_ to be, but it certainly _should_ be. (Although it's a bit weird to say "it should be lazy even with `strict=True`" out loud; maybe that's a mild argument for using a different qualifier like `equal`, as in more-itertools?) ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/HMJMEHH72F3WKODK6CHS3F6CQC6TDDUK/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: zip(x, y, z, strict=True)
On Apr 20, 2020, at 13:49, Ram Rachum wrote: > > Good point. It would have to be dependent on position. In other words, you > would never pass an iterator into zip with any expectation that it would be > in a usable condition by the time it's done. > > Actually, I can't think of any current scenario in which someone would want > to do this, with the existing zip logic. Admittedly, such cases are almost surely not that common, but I actually have some line-numbering code that did something like this (simplified a bit from real code): yield from enumerate(itertools.chain(headers, [''], body, [''])) … but then I needed to know how many lines I yielded, and there’s no way to get that from enumerate, so instead I had to do this: counter = itertools.count() yield from zip(counter, itertools.chain(headers, [''], body, [''])) lines = next(counter) (Actually, at the same time I did that, I also needed to add some conditional bits to the chain, and it got way too messy for one line, so I ended up rewriting it as a sequence of separate `yield from zip(counter, things)` statements. But that’s just a more complicated demonstration of the same idea.) But again, this probably isn’t very common. And also, while you were asking about the existing zip logic, the more important question is the new logic you’re proposing. I can’t imagine a case where you’d want to check for non-empty and _then_ use it, which is what’s relevant here. There probably are such cases, but if so, they’re even rarer, enough so that the fact that you have to wrap something in itertools.tee or more_itertools.peekable to pull it off (or just not use the new strict=True/zip_strict/zip_equal) is probably not a great tragedy. 
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/B4BOQYPUQNFHLTUDZKIJIR2526UAQR2V/
[Python-ideas] Re: Keyword arguments self-assignment
On Apr 20, 2020, at 13:42, Christopher Barker wrote: > > On Mon, Apr 20, 2020 at 12:17 PM Andrew Barnert wrote: >> >> A lot of JSON is designed to be consumed by JavaScript, where there is no >> real line (there is no dict type; objects have both dict-style and dot-style >> access). So in practice, a lot of JSON maps well to data, a lot maps well to >> objects, and some is mushily designed and doesn’t quite fit either way, >> because in JS they all look the same anyway. > > Well, sure. Though JSON itself is declarative data. Sure, it’s a declarative format, it’s just that often it’s intended to be understood as representing an object graph. > In Python, you need to decide how you want to work with it, either as an > object with attributes or a dict. But if you are getting it from JSON, it's a > dict to begin with. So you can keep it as a dict, or populate an object with > it. But populating that object can be automated: > > an_instance = MyObject(**the_dict_from_JSON) But unless your object graph is flat, this doesn’t work. A MyObject doesn’t just have strings and numbers, it also has a list of MySubObjects; if you just ** the JSON dict, you get subobjs=[{… d1 … }, { … d2 … }], when what you actually wanted was subobjs=[MySubObject(**d) for d in …]. It’s not like it’s _hard_ to write code to serialize and deserialize object graphs as JSON (although it’s hard enough that people keep proposing a __json__ method to go one way and then realizing they don’t have a proposal to go the other way…), but it’s not as trivial as just ** the dict into keywords. > > But, maybe even more importantly: even if you _do_ decide it makes more > > sense to stick to data for this API, you have the parallel `{'country': > > country, 'year': year}` issue, which is just as repetitive and verbose. > > only if you have those as local variables -- why are they ? Apologies for my overly-fragmentary toy example. 
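(The nesting problem with ** unpacking is easy to see with a toy object graph; dataclasses are used here just for brevity:)

```python
from dataclasses import dataclass

@dataclass
class MySubObject:
    spam: int

@dataclass
class MyObject:
    name: str
    subobjs: list

data = {"name": "x", "subobjs": [{"spam": 1}, {"spam": 2}]}

# Flat ** unpacking "works", but the children stay plain dicts:
obj = MyObject(**data)
assert isinstance(obj.subobjs[0], dict)

# Deserializing the graph properly needs per-level code:
obj = MyObject(name=data["name"],
               subobjs=[MySubObject(**d) for d in data["subobjs"]])
assert isinstance(obj.subobjs[0], MySubObject)
assert obj.subobjs[1].spam == 2
```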
Let’s say you have a function that makes an API request to some video site to get the local-language names of all movies of the user-chosen genre in the current year. If you’ve built an object model, it’ll look something like this: query = api.Query(genre=genre, year=datetime.date.today().year) response = api.query_movies(query) result = [movie.name[language] for movie in response.movies] If you’re treating the JSON as data instead, it’ll look something like this: query = {'query': {'genre': genre, 'year': datetime.date.today().year}} response = requests.post(api.query_movies_url, json=query).json result = [movie['name'][language] for movie in response.movies] Either way, the problem is in that first line, and it’s the same problem. (And the existence of ** unpacking and the dict() constructor from keywords means that solving either one very likely solves the other nearly for free.) Here I’ve got one local, `genre`. (I also included one global representing a global setting, just to show that they _can_ be reasonable as well, although I think a lot less often than locals, so ignore that.) I think it’s pretty reasonable that the local variable has the same name as the selector key/keyword. If I ask “why do I have to repeat myself with genre=genre or 'genre': genre”, what’s the answer? If I have 38 locals for all 38 selectors in the API—or, worse, a dynamically-chosen subset of them—then “get rid of those locals” is almost surely the answer, but with just 1? Probably not. And maybe 3 or 4 is reasonable too—a function that select by genre, subgenre, and mood seems like a reasonable thing. (If it isn’t… well, then I was stupid to pick an application area I haven’t done much work in… but you definitely don’t want to just select subgenre without genre in many music APIs, because your user rarely wants to hear both hardcore punk and hardcore techno.) And it’s clearly not an accident that the local and the selector have the same name. 
So, I think that case is real, and not dismissible. > I'm not saying it never comes up in well designed code -- it sure does, but > if there's a LOT of that, then maybe some refactoring is in order. Yes. And now that you point that out, thinking of how many people go to StackOverflow and python-list and so on looking for help with exactly that antipattern when they shouldn’t be doing it in the first place, there is definitely a risk that making this syntax easier could be an antipattern magnet. So, it’s not just whether the cases with 4 locals are important enough to overcome the cost of making Python syntax more complicated; the benefit has to _also_ overcome the cost of being a potential antipattern magnet. For me, this proposal is right on the border of being worth it (and I’m not sure which side it falls on), so that could be enough to change the answer, so… good thing you brought it up. But I don’t think it eliminates the rationale for the proposal, or even the rationale for using it with JSON-related
[Python-ideas] Re: zip(x, y, z, strict=True)
On Apr 20, 2020, at 10:42, Ram Rachum wrote: > > Here's something that would have saved me some debugging yesterday: > > >>> zipped = zip(x, y, z, strict=True) > > I suggest that `strict=True` would ensure that all the iterables have been > exhausted, raising an exception otherwise. One quick bikeshedding question (which also gets to the heart of how you’d want to implement it); apologies if this came up in the thread from 2 years ago or the discussion in the more-itertools PR that I just suggested everyone should read before commenting, but I wanted to get this down before I forget it. x = iter(range(5)) y = [0] try: zipped = zip(x, y, strict=True) except ValueError: # assuming that’s the exception you want? print(next(x)) Should this print 1 or 2 or raise StopIteration or be a don’t-care? Should it matter if you zip(y, x, strict=True) instead? ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/3CWTSFTLVGYHWPKMG45N4GZFEV2Z7RZF/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: zip(x, y, z, strict=True)
On Apr 20, 2020, at 13:03, Eric V. Smith wrote: > > On 4/20/2020 3:39 PM, Andrew Barnert via Python-ideas wrote: >> >> >> As I said, wanting to check does come up sometimes—I know I have written >> this myself at least once, and I’d be a little surprised if it’s not in >> more-itertools. > > Interestingly, it looks like it it might be more_itertools.zip_equal, which > is listed at https://github.com/more-itertools/more-itertools, but is linked > to > https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.zip_equal > which is missing. Maybe it's new? Yeah, it is new. See PR 415 (https://github.com/more-itertools/more-itertools/pull/415) 21 days ago. There must be something in the air that’s made people suddenly want this more. :) The PR does a great job linking to other discussions about this, including an -ideas thread from two years ago. I haven’t read through everything yet, but I notice that the first objection last time around was David Mertz pointing out that it’s not even in more-itertools, so maybe that more-itertools PR means it’s the perfect time to reopen this discussion? Or maybe it means we should wait a few months and see if people seem to be using the one in more-itertools? (And also maybe to wait for it to stabilize—there are a few bug fix commits to it after the initial merge.) ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/FF4IK5KX6EFRO2SHCCKLCZN72JD3XJ3H/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: zip(x, y, z, strict=True)
On Apr 20, 2020, at 11:25, Brandt Bucher wrote: > > I disagree. In my own personal experience, ~80% of the time when I use `zip` > there is an assumption that all of the iterables are the same length. Sure, but I think cases where you want that assumption _checked_ are a lot less common. There are lots of postconditions that you assume just as often as “x, y, and z are fully consumed” and just as rarely want to check, so we don’t need to make it easy to check every possible one of them. As I said, wanting to check does come up sometimes—I know I have written this myself at least once, and I’d be a little surprised if it’s not in more-itertools. But often enough to be a (flag on a) builtin? I’ve also written a zip that uses the length of the first rather than the shortest or longest, and a zip that skips rather than filling past the end of short inputs, and there are probably other variations that come up occasionally. But if they don’t come up that often, and are easy to write yourself, is there really a problem that needs to be fixed? And even if checking is the most common option after the default, it seems like a weird API to have some options for what to do at the end as keyword parameter flags and other options as entirely separate functions. Maybe a flag for longest (or a single at_end parameter with an enum of different end-behaviors truncate, check, fill, skip where the signature can immediately show you that the default is truncate) would be a better design if you were doing Python from scratch, but I think the established existence of zip_longest pushes us the other way. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/BNBKATJM4NDXUG53WRZFSO7VWRWDCA5B/ Code of Conduct: http://python.org/psf/codeofconduct/
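The single at_end parameter mused about above might look something like this sketch (entirely hypothetical names and API, not a concrete proposal):

```python
import enum
import itertools

_MISSING = object()

class AtEnd(enum.Enum):
    TRUNCATE = "truncate"   # today's zip()
    CHECK = "check"         # raise on a length mismatch
    FILL = "fill"           # zip_longest()
    SKIP = "skip"           # drop exhausted inputs, keep zipping the rest

def zip_at_end(*iterables, at_end=AtEnd.TRUNCATE, fillvalue=None):
    if at_end is AtEnd.TRUNCATE:
        yield from zip(*iterables)
    elif at_end is AtEnd.FILL:
        yield from itertools.zip_longest(*iterables, fillvalue=fillvalue)
    elif at_end is AtEnd.CHECK:
        for combo in itertools.zip_longest(*iterables, fillvalue=_MISSING):
            if any(v is _MISSING for v in combo):
                raise ValueError("iterables have different lengths")
            yield combo
    else:  # AtEnd.SKIP: tuples shrink as inputs run dry
        iterators = [iter(it) for it in iterables]
        while iterators:
            values, alive = [], []
            for it in iterators:
                try:
                    values.append(next(it))
                except StopIteration:
                    continue
                alive.append(it)
            iterators = alive
            if values:
                yield tuple(values)

assert list(zip_at_end("ab", [1, 2, 3], at_end=AtEnd.SKIP)) == \
    [("a", 1), ("b", 2), (3,)]
```

The signature makes the default behavior visible at a glance, which is the design advantage mentioned above; the cost is that zip_longest already exists as a separate function rather than a mode.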
[Python-ideas] Re: Keyword arguments self-assignment
On Apr 20, 2020, at 11:01, Christopher Barker wrote: > > The JSON - related example is a good one -- JSON maps well to "data" in > Python, dicts and lists of numbers and strings. If you find yourself > converting a bunch of variable names to/from JSON, you probably should be > simply using a dict, and passing that around anyway. A lot of JSON is designed to be consumed by JavaScript, where there is no real line (there is no dict type; objects have both dict-style and dot-style access). So in practice, a lot of JSON maps well to data, a lot maps well to objects, and some is mushily designed and doesn’t quite fit either way, because in JS they all look the same anyway. The example code for an API often shows you doing `result.movies[0].title.en`, because in JS you can. And in other languages, sometimes it is worth writing (or auto-generating) the code for Movie, etc. classes and serializing them to/from JSON so you can do the same. This is really the same point as “sometimes ORMs are useful”, which I don’t think is that controversial. But, maybe even more importantly: even if you _do_ decide it makes more sense to stick to data for this API, you have the parallel `{'country': country, 'year': year}` issue, which is just as repetitive and verbose. The `{::country, ::year}` syntax obviously solves that dict key issue just as easily as it does for keywords. But most of the other variant proposals solve it at least indirectly via dict constructor calls—`dict(**, country, year)`, `dict(country=, year=)`, `dict(**{country, year})`, which isn’t quite as beautiful, but is still better than repeating yourself if the list of members or query conditions gets long. 
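Until any such syntax exists, the closest spelling today is an explicit helper (hypothetical name `grab`):

```python
def grab(mapping, *names):
    """Build {'country': country, 'year': year}-style dicts by name."""
    return {name: mapping[name] for name in names}

country = "SE"
year = 2020

# The repetitive spelling...
query = {"country": country, "year": year}
# ...versus the helper-based one:
assert grab(locals(), "country", "year") == query
```

The string repetition is still there, but only once per name, which is roughly what the `{::country, ::year}` proposals are trying to eliminate entirely.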
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/EH24NKL5AGOK42DG4LO7XX43ZEG6V5D4/
[Python-ideas] Re: zip(x, y, z, strict=True)
On Apr 20, 2020, at 10:42, Ram Rachum wrote: > > Here's something that would have saved me some debugging yesterday: > > >>> zipped = zip(x, y, z, strict=True) > > I suggest that `strict=True` would ensure that all the iterables have been > exhausted, raising an exception otherwise. This is definitely sometimes useful, but I think less often than zip_longest, which we already decided long ago isn’t important enough to push into zip but instead should be a separate function living in itertools. I’ll bet there’s a zip_strict (or some other name for the same idea) in the more-itertools library. (If not, it’s probably worth submitting.) Whether it’s important enough to bring into itertools, add as a recipe, or call out as an example of what more-itertools can do in the itertools docs, I’m not sure. But I don’t think it needs to be added as a flag on the builtin. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/QKU65L2QIEKUOABDAF6QCBJ3AKSXOHWG/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Keyword arguments self-assignment
On Apr 20, 2020, at 01:06, M.-A. Lemburg wrote: > > The current version already strikes me as way too complex. > It's by far the most complex piece of grammar we have in Python: > > funcdef: 'def' NAME parameters ['->' test] ':' [TYPE_COMMENT] > func_body_suite But nobody’s proposing changing the function definition syntax here, only the function call syntax. Which is a lot less hairy. It is still somewhat hairy, but nowhere near as bad, so this argument doesn’t really apply. Also, you’re lumping all the different proposals together here, but they don’t all have the same effect, which makes the argument even weaker. Adding a ** mode switch does make calls significantly more complicated, because it effectively clones half of the call grammar to switch to a similar but new grammar. But allowing keyword= is a simple and local change to one small subpart of the call grammar that I don’t think adds too much burden. And ::value in dict displays doesn’t touch the call syntax at all; it makes only a trivial and local change to a subpart of the much simpler dict display grammar. And **{a,b,c} is by far the most complicated, but the complicated part isn’t in calls (which would just gain one simple alternation); it’s in cloning half of the expression grammar to create a new nonset_expression node; the change to call syntax to use that new node is simple. (I’m assuming this proposal would make **{set display} in a call a syntax error when it’s not a magic set-of-identifiers unpacking, because otherwise I don’t know how you could disambiguate at all.) So, even if you hadn’t mixed up definitions and calls, I don’t think this argument really holds much water. I think your point that “hard to parse means hard to reason about” is a good one, however. That’s part of my rationale for the ::value syntax in dict displays: it’s a simple change to a simple piece of syntax that’s well isolated and consistent everywhere it appears.
But I don’t think people would actually have a problem learning, internalizing, and reading keyword= syntax. And I think it may be an argument against the **{a,b,c} syntax, but only in a more subtle way than you’re advancing—people wouldn’t even internalize the right grammar; they’d just think of it as a special use of set displays (in fact Steven, who proposed it, encourages that reading), which is an extra special case to learn. Which can still be acceptable (lots of people get away with thinking of target lists as a special use of tuple displays…); it’s just a really high bar to clear.
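For readers joining the thread here, all of the variants being compared target the same repetition; in today’s Python the call reads (the function and names below are hypothetical):

```python
def search(country=None, year=None):
    return {"country": country, "year": year}

country, year = "NZ", 1999

# The repetition at issue: each name is written twice
result = search(country=country, year=year)

# The proposals under discussion would spell roughly the same call as:
#   search(**, country, year)    # the ** mode switch
#   search(country=, year=)      # the keyword= shorthand
#   search(**{country, year})    # set-of-identifiers unpacking
print(result)
```

None of these spellings is valid Python today; only the explicit `country=country` form runs.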
[Python-ideas] Re: list.append(x) could return x
On Apr 20, 2020, at 08:41, J. Pic wrote: > > > Currently, list.append(x) mutates the list and returns None. Yes, which is consistent with the vast majority of mutating methods in Python. It would be pretty weird to make lst.append(x) return x, while lst.extend(xs) still returns None, not to mention similar methods on other types like set.add. > It would be a little syntactic sugar to return x, for example: > > something = mylist.append(Something()) You can already get the same effect with: mylist.append(something := Something()) I think usually this would be more readable (and pythonic) as two lines rather than combined into one: something = Something() mylist.append(something) But “usually” isn’t “always”, and that’s why we have escape hatches like the walrus operator. It’s for exactly this purpose, where a subexpression needs to be bound to a name, but for some reason it can’t or shouldn’t be extracted to a separate assignment statement. And I think this is a lot clearer about what’s getting assigned to something, too. Someone who’s never seen the walrus operator won’t understand it, but at least it’s unlikely they’re going to misunderstand it as something other than what it means. This is especially an issue with methods like list.append. In a lot of other languages, they return self, because the language encourages method chaining for fluent programming (Perl, Ruby), or because list appending is non-mutating (Scala, Haskell), or for bizarre reasons specific to that language (Go), so a lot of people are likely to misunderstand your syntax as meaning that something gets the new value of the list.
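The two spellings above, side by side (the Something class is a stand-in from the quoted example):

```python
class Something:
    pass

# Two-line form: usually the clearest option
mylist = []
something = Something()
mylist.append(something)

# Walrus form: binds and appends in one expression, and it is
# unambiguous that the name gets the new object, not the list
mylist2 = []
mylist2.append(something2 := Something())
print(something2 is mylist2[0])   # True
```

Note that in both forms `append` itself still returns None; the walrus just makes the intermediate binding explicit.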
[Python-ideas] Re: Proposal: Keyword Unpacking Shortcut [was Re: Keyword arguments self-assignment]
On Apr 18, 2020, at 05:16, Alex Hall wrote: > > Is there anything else similar in the language? Obviously there are cases > where the same text has different meanings in different contexts, but I don't > think you can ever refactor an expression (or text that looks like an > expression) into a variable and change its meaning while keeping the program > runnable. I suppose it depends on where you draw the “looks like an expression” line, but I think there are cases that fit. It’s just that there are _not many_ of them, and most of them are well motivated. Each exception adds a small cost to learning the language, but Python doesn’t have to be perfectly regular like Lisp or Smalltalk, it just has to be a lot less irregular than C or Perl. Most special cases aren’t special enough, but some are. A subscription looks like a list display, but it’s not. Mixing them up will only give you a syntax error if you use slices, ellipses, or *-unpacking in the wrong one, and often won’t even give you a runtime error. And the parallel isn’t even useful. But this is worth it anyway because subscription is so tremendously important. A target list looks like a tuple display, but it’s not. Mixing them up will only give you a syntax error if you try to use a tuple display with a constant or a complex expression in it as a target list. Mixing them up in other ways will only give you at best an UnboundLocalError or NameError at runtime, and at worst silently wrong behavior. But the parallel here is more helpful than confusing (it’s why multiple-value return looks so natural in Python, for one thing), so it’s worth it. **{a, b, c} is a special case in two ways: **-unpacking is no longer one thing but two different things, although with a very useful and still pretty solid parallel between them, and set display syntax now has two meanings, with a somewhat useful and weaker parallel. Even added together, that’s not as much of a learning burden as subscription looking like list displays.
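Both existing overlaps described above are easy to demonstrate with nothing beyond builtins:

```python
# Subscription looks like a list display, but isn't: the comma in the
# brackets makes a tuple key, not a list
d = {(1, 2): "point"}
print(d[1, 2])        # point
print([1, 2])         # [1, 2] -- the same-looking text builds a new list

# A target list looks like a tuple display, but isn't: the left side of
# the = is a target list, the right side a tuple display
a, b = 1, 2
a, b = b, a           # the classic swap the parallel makes natural
print(a, b)           # 2 1
```

And, as the text notes, mixing these up mostly fails late or silently: `d[1, 2] = "x"` and `[1, 2]` both parse, and a bad target list often only surfaces as a NameError at runtime.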
But it also isn’t as important a benefit. The magic ** mode switch only pushes two complicated and already-not-quite-parallel forms a little farther apart, which is less of a cost. The keyword= is similar but even less so, especially since anywhere it could be confused is a syntax error. The dict display ::value doesn’t cause any new exceptions or break any existing parallels at all, so it’s even less of a cost. But there are plenty of other advantages and disadvantages of each of the four (and the minor variations on them in this thread); that’s just one factor of many to weigh.
[Python-ideas] Re: Proposal: Keyword Unpacking Shortcut [was Re: Keyword arguments self-assignment]
On Apr 17, 2020, at 23:18, Steven D'Aprano wrote: > > > Keyword Unpacking Shortcut > -- > > Inside function calls, the syntax > > **{identifier [, ...]} > > expands to a set of `identifier=identifier` argument bindings. > > This will be legal anywhere inside a function call that keyword > unpacking would be legal. Which means that you can’t just learn ** unpacking as a single consistent thing that’s usable in multiple contexts with (almost) identical syntax and identical meaning, you have to learn that it has an additional syntax with a different meaning in just one specific context, calls, that’s not legal in the others. Each special case like that makes the language’s syntax a little harder to internalize, and it’s a good thing that Python has a lot fewer such special cases than, say, C. Worse, this exact same syntax is a set display anywhere except in a ** in a call. Not only is that another special case to learn about the differences between set and dict displays, it also means that if you naively copy and paste a subexpression from a call into somewhere else (say, to print the value of that dict), you don’t get what you wanted, or a syntax error, or even a runtime error, you get a perfectly valid but very different value. > On the other hand, plain keyword unpacking: > > **textinfo > > is terse, but perhaps too terse. Neither the keys nor the values are > immediately visible. Instead, one must search the rest of the function > or module for the definition of `textinfo` to learn which parameters are > being filled in. You can easily put the dict right before the call, and when you don’t, it’s usually because there was a good reason. And there are good reasons. Ideally you shouldn’t have any function calls that are so hairy that you want to refactor them, but the existence of libraries you can’t control that are too huge and unwieldy is the entire rationale here.
Sometimes it’s worth pulling out a group of related parameters to a “launch_params” or “timeout_and_retry_params” dict, or even to a “build_launch_params” method, not just for readability but sometimes for flexibility (e.g., to use it as a cache or stats key, or to give you somewhere to hook easily in the debugger and swap out the launch_params dict). > Backwards compatibility > --- > > The syntax is not currently legal so there are no backwards > compatibility concerns. The syntax is perfectly legal today. The syntax for ** unpacking in a call expression takes any legal expression, and a set display is a legal expression. You can see this by calling compile (or, better, dis.dis) on the string 'spam(**{a, b, c})'. The semantics will be a guaranteed TypeError at runtime unless you’ve done something pathological, so almost surely nobody’s deployed any code that depends on the existing semantics. But that’s not the same as the syntax not being legal. And, outside of that trivial backward compatibility nit, this raises a bunch of more serious issues. Running Python 3.9 code in 3.8 would do the wrong thing, but maybe not wrong enough to break your program visibly, which could lead to some fun debugging sessions. That’s not a dealbreaker, but it’s definitely better for new syntax to raise a syntax error in old versions, if possible. And of course existing linters, IDEs, etc. will misunderstand the new syntax (which is worse than failing to parse it) until they’re taught the new special case. This also raises an implementation issue. The grammar rule to disambiguate this will probably either be pretty hairy, or require building a parallel fork of half the expression tree so you can have an “expression except for set displays” node. Or there won’t be one, and it’ll be done as a special case post-parse hack, which Python uses sparingly. But all of that goes right along with the human confusion.
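The legality claim above is checkable in a few lines: the string parses today, and only fails at runtime because ** expects a mapping, not a set.

```python
import dis

# No SyntaxError: ** in a call accepts any expression, including a set display
code = compile('spam(**{a, b, c})', '<demo>', 'eval')
dis.dis(code)   # shows the set being built and unpacked into the call

def spam(**kwargs):
    return kwargs

a, b, c = 1, 2, 3
try:
    spam(**{a, b, c})          # legal syntax, guaranteed TypeError at runtime
except TypeError as err:
    print(type(err).__name__)  # TypeError
```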
If the same syntax can mean two different things in different contexts, it’s harder to internalize a usable approximate version of the grammar. For something important enough, that may be worth it, but I don’t think the benefits of this proposal reach that bar.