Three times in the last week the devs where I work accidentally
introduced bugs into our code because of a mistake with case-insensitive
string comparisons. They managed to demonstrate three different failures:
    # 1
    a = something().upper()  # normalise string
    # ... much later on
    if a == b.lower(): ...

    # 2
    a = something().upper()
    # ... much later on
    if a == 'maildir': ...

    # 3
    a = something()  # unnormalised
    assert 'foo' in a
    # ... much later on
    pos = a.find('FOO')
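All three failures compare strings that were normalised in different ways, or not at all. A minimal sketch of the consistent version, assuming a hypothetical something() and made-up sample strings, folds both sides of every comparison with str.casefold:

```python
def something():
    # Hypothetical stand-in for the real call in the examples above.
    return "MailDir"

# 1: fold both operands the same way before comparing.
a = something().casefold()
b = "MAILDIR"
match_1 = (a == b.casefold())

# 2: fold the literal too, so it cannot drift from the normalisation in use.
match_2 = (a == "maildir".casefold())

# 3: fold both the haystack and the needle for `in` and find().
text = "some Foo here"          # made-up sample text
folded = text.casefold()
found = "foo".casefold() in folded
pos = folded.find("FOO".casefold())   # index into the folded string
```

For plain ASCII text the index into the folded string is also the index into the original; that is not guaranteed for characters whose folded form has a different length.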
Not every two-line function needs to be in the standard library, but I've
come to the conclusion that case-insensitive testing and searches should
be. I've made these mistakes myself at times, as I'm sure most people
have, and I'm tired of writing my own case-insensitive function over and
over again.
So I'd like to propose some additions to 3.7 or 3.8. If the feedback here
is positive, I'll take it to Python-Ideas for the negative feedback :-)
(1) Add a new string method, which performs a case-insensitive equality
test. Here is a potential implementation, written in pure Python:
    def equal(self, other):
        if self is other:
            return True
        if not isinstance(other, str):
            raise TypeError
        if len(self) != len(other):
            return False
        casefold = str.casefold
        for a, b in zip(self, other):
            if casefold(a) != casefold(b):
                return False
        return True
Alternatively: how about a === triple-equals operator to do the same
thing?
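One detail worth checking before settling on an implementation: the length test makes the per-character version stricter than folding the whole string, because some characters fold to more than one character (German "ß" folds to "ss"). The sketch below repeats the implementation above as a plain function and compares it with a whole-string fold; equal_whole is a hypothetical name used only for this comparison.

```python
def equal(self, other):
    # Per-character comparison, copied from the implementation above.
    if self is other:
        return True
    if not isinstance(other, str):
        raise TypeError
    if len(self) != len(other):
        return False
    casefold = str.casefold
    for a, b in zip(self, other):
        if casefold(a) != casefold(b):
            return False
    return True

def equal_whole(s, t):
    # Hypothetical variant: fold each string as a whole, then compare.
    return s.casefold() == t.casefold()

pairs = [("MailDir", "maildir"), ("Straße", "STRASSE")]
results = [(equal(s, t), equal_whole(s, t)) for s, t in pairs]
# results == [(True, True), (False, True)]
```

Which behaviour is wanted for pairs like the second one is a design choice the proposal would need to state explicitly.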
(2) Add keyword-only arguments to str.find and str.index:
casefold=False
which does nothing if false (the default), and switches to a case-
insensitive search if true.
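To make the intended behaviour concrete, here is a rough pure-Python sketch of what find(..., casefold=True) might return. The helper name find_casefold is hypothetical, and this is only one possible reading of the proposal, not a suggested implementation: it is quadratic, and it only considers matches that start on a character boundary of the original string.

```python
def find_casefold(haystack, needle, start=0):
    # Hypothetical helper: return the lowest index in the ORIGINAL string
    # at which the folded needle matches, or -1 if there is no match.
    target = needle.casefold()
    for i in range(start, len(haystack) + 1):
        if haystack[i:].casefold().startswith(target):
            return i
    return -1

pos_1 = find_casefold("Hello WORLD", "world")   # 6
pos_2 = find_casefold("Straße", "SSE")          # 4, the index of "ß"
pos_3 = find_casefold("abc", "x")               # -1
```

Checking each start position against the original string avoids the trap of searching haystack.casefold() directly, where the returned index can be off whenever folding changes the string's length.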
Alternatives:
(i) Do nothing. The status quo wins a stalemate.
(ii) Instead of str.find or index, use a regular expression.
This is less discoverable (you need to know regular expressions) and
harder to get right than to just call a string method. Also, I expect
that invoking the re engine just for case insensitivity will be a lot
more expensive than a simple search need be.
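For comparison, a minimal sketch of the regular expression route, with a hypothetical helper name re_find. It shows two of the things that are easy to get wrong: the needle has to be escaped, and re.IGNORECASE is not the same as full case folding.

```python
import re

def re_find(haystack, needle):
    # re.escape is needed so that characters such as "$" or "." in the
    # needle are matched literally rather than treated as regex syntax.
    match = re.search(re.escape(needle), haystack, re.IGNORECASE)
    return match.start() if match else -1

pos_1 = re_find("price: $5.00 USD", "$5.00 usd")   # 7
pos_2 = re_find("abc", "x")                        # -1
pos_3 = re_find("Straße", "SS")                    # -1: "ß" does not match "SS"
```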
(iii) Not every two-line function needs to be in the standard library.
Just add this to the top of every module:
    def equal(s, t):
        return s.casefold() == t.casefold()
That's the status quo winning again. It's an annoyance. A small annoyance,
but multiplied by the sheer number of times it happens, it becomes a
large annoyance. I believe the annoyance factor of case-insensitive
comparisons outweighs the "two line function" objection.
And the two-line "equal" function doesn't solve the problem for find and
index, or for sets, dicts, list.index and the `in` operator either.
Unsolved problems:
This proposal doesn't help with sets and dicts, list.index and the `in`
operator either.
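For these cases the usual workaround today is to fold by hand at every point of use. A minimal sketch, with hypothetical helper and variable names:

```python
def contains(items, value):
    # Hypothetical stand-in for a case-insensitive `value in items`.
    folded = value.casefold()
    return any(item.casefold() == folded for item in items)

in_list = contains(["Foo", "Bar"], "BAR")
not_in_list = contains(["Foo", "Bar"], "baz")

# For dicts and sets, store folded keys and fold again on every lookup.
headers = {}
headers["Content-Type".casefold()] = "text/plain"
value = headers.get("CONTENT-TYPE".casefold())
```

Forgetting the fold on any one lookup reintroduces exactly the kind of bug described at the top.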
Thoughts?
--
Steven D'Aprano
“You are deluded if you think software engineers who can't write
operating systems or applications without security holes, can write
virtualization layers without security holes.” —Theo de Raadt