On 21 August 2016 at 14:10, Chris Angelico <ros...@gmail.com> wrote:
> On Sun, Aug 21, 2016 at 12:52 PM, Steven D'Aprano <st...@pearwood.info> wrote:
>> I think that while the suggestion does bring some benefit, the benefit
>> isn't enough to make up for the code churn and disruption it would
>> cause. But I encourage the OP to go through the standard library, pick a
>> couple of modules, and re-write them to see how they would look using
>> this proposal.
>
> Python still has a rule that you can iterate over anything that has
> __getitem__, and it'll be called with 0, 1, 2, 3... until it raises
> IndexError. So you have two options: Remove that rule, and require
> that all iterable objects actually define __iter__; or make strings
> non-subscriptable, which means you need to do something like
> "asdf".char_at(0) instead of "asdf"[0]. IMO the second option is a
> total non-flyer - good luck convincing anyone that THAT is an
> improvement. The first one is possible, but dramatically broadens the
> backward-compatibility issue. You'd have to search for any class that
> defines __getitem__ and not __iter__.

That's not actually true - any type that defines __getitem__ can prevent
iteration just by explicitly raising TypeError from __iter__. It would be
*weird* to do so, but it's entirely possible.

However, the real problem with this proposal (and the reason why the switch
from 8-bit str to "bytes are effectively a tuple of ints" in Python 3 was
such a pain) is that there are a lot of bytes and text processing operations
that *really do* operate code point by code point. Scanning a path for
directory separators, scanning a CSV (or other delimited format) for
delimiters, processing regular expressions, tokenising according to a
grammar, analysing words in a text for character popularity, answering
questions like "Is this a valid identifier?" all involve looking at each
character in a sequence individually, rather than looking at the character
sequence as an atomic unit.

The idiomatic pattern for doing that kind of "item by item" processing in
Python is iteration (whether through the Python syntax and builtins, or
through the CPython C API).

Now, if we were designing a language from scratch today, there's a strong
case to be made that the *right* way to represent text is to have a
stream-like interface (e.g. StringIO, BytesIO) around an atomic type (e.g.
CodePoint, int). But we're not designing a language from scratch - we're
iterating on one with a 25-year history of design, development, and use.

There may also be a case to be made for introducing an AtomicStr type into
Python's data model that works like a normal string, but *doesn't* support
indexing, slicing, or iteration, and is instead an opaque blob of data that
nevertheless supports all the other usual string operations.
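
A rough sketch of both points, purely for illustration: the class names
(LegacySeq, NoIterSeq, AtomicStr) and the is_iterable helper are hypothetical
and not part of Python, and the handful of methods on AtomicStr is an
arbitrary selection rather than a proposed API.

```python
class LegacySeq:
    """Defines only __getitem__, so iter() falls back to indexing 0, 1, 2, ..."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, index):
        # An IndexError raised here is what ends the fallback iteration.
        return self._data[index]


class NoIterSeq(LegacySeq):
    """Still subscriptable, but explicitly refuses iteration."""

    def __iter__(self):
        raise TypeError(f"{type(self).__name__!r} object is not iterable")


class AtomicStr:
    """Hypothetical opaque text value: no indexing, slicing, or iteration."""

    __slots__ = ("_value",)

    def __init__(self, value):
        self._value = str(value)

    def __str__(self):
        return self._value

    def __repr__(self):
        return f"AtomicStr({self._value!r})"

    def __eq__(self, other):
        if isinstance(other, AtomicStr):
            return self._value == other._value
        return NotImplemented

    def __hash__(self):
        return hash(self._value)

    def __contains__(self, fragment):
        # Substring test, delegated to the wrapped str.
        return str(fragment) in self._value

    def upper(self):
        return AtomicStr(self._value.upper())

    def startswith(self, prefix):
        return self._value.startswith(str(prefix))


def is_iterable(obj):
    """Hypothetical helper: report whether iter(obj) succeeds."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True
```

With these definitions, list(LegacySeq("abc")) walks the indexes via
__getitem__, while NoIterSeq("abc")[0] still works even though iter() on it
raises TypeError. AtomicStr defines neither __getitem__ nor __iter__, so both
indexing and iteration fail with TypeError, yet comparison, hashing and
substring checks behave as they do for str. In this sketch AtomicStr is just
a restricted interface over an ordinary str.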
(Similar to the way that types.MappingProxyType lets you provide a read-only
view of an otherwise mutable mapping, and that collections.abc.KeysView,
ValuesView and ItemsView provide different interfaces for a common
underlying mapping.)

But changing the core text type itself to no longer be suitable for use in
text processing tasks? Not gonna happen :)

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia