[Python-ideas] Re: Incremental step on road to improving situation around iterable strings

Alex Hall Mon, 24 Feb 2020 08:24:02 -0800

> A library implemented in a confusing way is not an example of nothing
> wrong on Python strings. (I myself has made this stupid mistake many times
> and I cannot blame neither Python nor sqlite for being careless.)
> In my humble opinion, your example does not prove that iterable strings
> are faulty. They can be tricky in some occasions, I admit it... but there
> are many tricks in all programming languages especially for newbies
> (currently I am trying to learn Lisp... again).


In a sense we agree. Python strings are not wrong or faulty. I think both
sides of this thread are making good points, but it's ultimately a very
academic discussion. Strings blur the line between scalars and iterables.
Them being iterable is a bit weird sometimes and can make some code messier
but it's easy enough to deal with when you know what you're doing. That
kind of thing is not a good enough reason to make any drastic changes.

But as you say, they can be tricky, and that's a real problem worth paying
serious attention to. I don't understand your dismissal that there are many
tricks in all languages. Sure that's inevitable to a degree, but shouldn't
we try to make things less tricky where we can? Python strives to be easy
to use and easy to learn for beginners. Accidentally iterating over strings
has probably caused many hours of frustration and confusion. It probably
doesn't have that effect on anyone on this mailing list because we
understand Python deeply, but we need to consider the beginner's
perspective.
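
To make the beginner's trap concrete, here is a minimal sketch of the kind of
accident I mean. The function name `shout_all` is made up for illustration; the
point is only that passing a single string where a list of strings was
expected raises no error and quietly produces per-character results:

```python
def shout_all(names):
    """Return an upper-cased copy of every name in `names`."""
    return [name.upper() for name in names]


# Intended use: a list of names.
print(shout_all(["alice", "bob"]))  # ['ALICE', 'BOB']

# Accidental use: a bare string. Nothing fails; the string is simply
# iterated character by character.
print(shout_all("alice"))  # ['A', 'L', 'I', 'C', 'E']
```

An experienced user spots this immediately, but a beginner gets no hint about
what went wrong.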

> Actually, `in` means the same in strings, in sequences, in lists, etc.

No, it really doesn't. `x[start:end] in x` is generally only True for
strings, not any other collection. Quoting from
https://docs.python.org/3/reference/expressions.html#membership-test-operations
:

For container types such as list, tuple, set, frozenset, dict, or
collections.deque, the expression x in y is equivalent to
any(x is e or x == e for e in y).

For the string and bytes types, x in y is True if and only if *x* is a
substring of *y*.

For user-defined classes which do not define __contains__()
<https://docs.python.org/3/reference/datamodel.html#object.__contains__> but
do define __iter__()
<https://docs.python.org/3/reference/datamodel.html#object.__iter__>, x in y
is True if some value z, for which the expression x is z or x == z is
true, is produced while iterating over y.

Lastly, the old-style iteration protocol is tried: if a class defines
__getitem__()
<https://docs.python.org/3/reference/datamodel.html#object.__getitem__>, x
in y is True if and only if there is a non-negative integer index *i* such
that x is y[i] or x == y[i], and no lower integer index raises the
IndexError exception.

Strings and bytes clearly stick out as behaving differently from every
built-in container type, and they deviate from the default implementation in
terms of both __iter__ and __getitem__.
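
A quick check of the difference, as a minimal sketch (the variable names are
arbitrary):

```python
s = "abcdef"
lst = list(s)   # ['a', 'b', 'c', 'd', 'e', 'f']
b = b"abcdef"

# For str and bytes, `in` is a substring test, so a slice is "in" the whole.
print(s[1:3] in s)      # True
print(b[1:3] in b)      # True

# For a list, `in` compares the left operand against each element, and
# no element of lst equals the list ['b', 'c'].
print(lst[1:3] in lst)  # False
```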

And that's fine! The behaviour is very useful. It would be sad if `c in
string` was only true if `c` was a single character. My point is that
the protocols and magic methods in Python aren't always in perfectly
consistent harmony. Remember that I was responding to this:

> Conceptually, we should be able to reason that every object that
> supports indexing should be iterable, without adding a special case
> exception "...except for str".

We already have a special case exactly like that and it's a good thing, so
it wouldn't be outrageous to add another.
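
For anyone unfamiliar with where "indexable implies iterable" comes from, it
is the old-style protocol quoted from the docs above. A small sketch, using a
made-up class `Squares` that defines only `__getitem__`:

```python
class Squares:
    """Defines only __getitem__, neither __iter__ nor __contains__."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, i):
        # Raising IndexError is what tells the old-style protocol to stop.
        if not 0 <= i < self.n:
            raise IndexError(i)
        return i * i


sq = Squares(5)
print(list(sq))  # [0, 1, 4, 9, 16]  (iteration falls back to __getitem__)
print(9 in sq)   # True              (so does the membership test)
print(10 in sq)  # False
```

So for a class like this, `in` follows directly from indexing. For str it does
not, because str defines its own `__contains__` with substring semantics.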

> Are you implying that developers are wrong when they iterate over strings?

Roughly, though I think you might be hearing me wrong. There is lots of
existing code that correctly and intentionally iterates over strings. And
code that unintentionally does it probably doesn't live for long. But if
you took a random sample of all the times that someone has written new code
that iterates over a string, most of them would be mistakes, and the
developer was essentially 'wrong' in those instances.
In my case, since I can't think of when I've needed to iterate over a
string, I've probably been wrong at least 90% of the time.

> Does it matter in any case?

Yes, because it wastes people's time and energy debugging.

> Strings must be defined in Python in some way.

We can choose to define them differently.

> The implementation, the syntax, and the semantics of strings are coherent
> in Python.

They are not entirely coherent, as I have explained, and they do not have
to meet any particular standard of coherence.

> Ultimately, it does [not] matter how many people iterate on strings. That
> is not the question.

It matters a lot; I don't know why you assert otherwise.

> > And in the face of ambiguity, refuse the temptation to guess.
> > I do think it would be a pity if strings broke the tradition of
> > indexable implies iterable, but "A Foolish Consistency is the Hobgoblin
> > of Little Minds". The benefits in helping users when debugging would
> > outweigh the inconsistency and the minor inconvenience of adding a few
> > characters. Users who are expecting iteration to work because indexing
> > works will quickly get a helpful error message and fix their problem.
> > At the risk of overusing classic Python sayings, Explicit is better
> > than implicit.
> > However, we could get the benefit of making debugging easier without
> > having to actually break any existing code if we just raised a warning
> > whenever someone iterates over a string. It doesn't have to be a
> > deprecation warning and we don't need to ever actually make strings
> > non-iterable.

> I do not agree at all.

What do you not agree with? Do you think it's more than a minor
inconvenience to add ".chars()" here and there? Do you think that the
benefits to debugging would be minor? Do you think that the inconsistency
would significantly hurt users? I haven't seen an argument for any of these
and I don't know if anything else I said was debatable.
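
To show roughly what I have in mind, here is a sketch that emulates the
warning idea with a str subclass. Everything here is hypothetical: the class
name `WarnOnIterStr` is invented, `.chars()` just mirrors the method name
used in this thread, and an actual change would have to live in str itself
rather than in a subclass.

```python
import warnings


class WarnOnIterStr(str):
    """Hypothetical str that warns on implicit iteration."""

    def chars(self):
        # Explicit, intentional iteration: no warning.
        return str.__iter__(self)

    def __iter__(self):
        # Implicit iteration still works, but tells the user about it.
        warnings.warn(
            "iterating over a string; use .chars() if this is intended",
            stacklevel=2,
        )
        return str.__iter__(self)


name = WarnOnIterStr("abc")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    explicit = list(name.chars())  # no warning
    implicit = list(name)          # one warning, same result

print(explicit == implicit)  # True
print(len(caught))           # 1
```

Existing code keeps working either way; the only cost of being explicit is
the few extra characters of `.chars()`. Note that in this sketch, operations
such as slicing return a plain str, so the warning does not propagate to
derived strings.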

> It is not a question of right or wrong, better or worse. It is a
> question of being consistent.

Why would that be the question? Why is consistency more important than
"better or worse"? How can you make such a bold claim?

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/LMQCLVUU7CRFV5WPFHI7TTENJFDOG6X2/
Code of Conduct: http://python.org/psf/codeofconduct/
