On Mon, Mar 23, 2020 at 7:06 PM Alex Hall <alex.moj...@gmail.com> wrote:
>
> I think I'm missing something, why is case insensitivity a mess?
>
Because there are many characters that case fold in strange ways. "ıIiİ".casefold() == 'ıiii̇', which means that lowercase dotless ı doesn't casefold to the same thing as uppercase dotless I does. Some characters case fold to strings of a different length, such as "ß", which casefolds to "ss". I haven't even tried what happens with combining characters vs combined characters.

And Unicode case folding is already a simplified version of reality; what actual humans expect can be even more complicated, such as (I think) German case folding rules being different for names and for book titles, and the way that umlauted letters are case folded.

On the other hand, this might actually mean it's *better* to have a dedicated case-insensitive-cut-prefix operation. It would be difficult to define in simple terms, but basically the returned string (if not identical to the original) should be the longest suffix of the original string such that, if it were appended to the prefix and the result case folded, you would get the same thing as the original string case folded. But there could be other definitions, just as complicated, and not necessarily more correct.

In any case, this can (and in my opinion should) be deferred for later. Start with the simple version that doesn't care about all these complexities, and then expand from there as the need is found.

ChrisA
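
For anyone who wants to poke at this, here is a rough sketch of the points above in Python. The name cut_prefix_casefold is made up for illustration; it is not an existing or proposed str method, and the brute-force scan simply follows the "longest suffix" definition literally rather than trying to be efficient. Treat it as one possible reading of that definition, not the right one.

```python
import unicodedata

# U+0131 (dotless i) folds to itself, "I" and "i" both fold to "i",
# and U+0130 (dotted capital I) folds to "i" followed by U+0307.
FOLDED_I = "\u0131Ii\u0130".casefold()

# Case folding can change the length of a string.
FOLDED_ESZETT = "\u00df".casefold()

# Case folding does not normalize: precomposed and combining forms
# stay different unless they are normalized (here with NFC) first.
PRECOMPOSED = "\u00e9"   # e with acute accent, one code point
COMBINING = "e\u0301"    # "e" followed by a combining acute accent


def cut_prefix_casefold(s: str, prefix: str) -> str:
    """Hypothetical helper following the definition in the post.

    Return the longest suffix ``rest`` of ``s`` such that
    ``(prefix + rest).casefold() == s.casefold()``.
    Return ``s`` unchanged if no such suffix exists.
    """
    target = s.casefold()
    # Try split points from left to right, so the first match
    # is the longest suffix. This is quadratic in len(s).
    for i in range(len(s) + 1):
        rest = s[i:]
        if (prefix + rest).casefold() == target:
            return rest
    return s
```

With this definition, cut_prefix_casefold("STRASSE 12", "stra\u00dfe") and cut_prefix_casefold("Stra\u00dfe 12", "STRASSE") both return " 12", even though prefix and string differ in length. A prefix that ends in the middle of a folded character, such as "stras" against "Stra\u00dfe", matches nothing and the string comes back unchanged, which is exactly the kind of case where another definition might be argued to be just as valid.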