Since this is Python 4000, where everything's made up and the points don't matter...
I think there shouldn't be a char type, and also strings shouldn't be iterable, or indexable by integers, or anything else that makes them appear to be tuples of code points.

Nothing good can come of decomposing strings into Unicode code points. The code point abstraction is practically as low level as the internal byte encoding of the strings. Only lexing libraries should look at strings at that level, and you should use a well written and tested lexing library, not a hacky hand-coded lexer.

Someone in this thread mentioned that they'd used ' '.join on a string in production code. Was the intent really to put a space between every pair of code points of an arbitrary string? Or did they know that only certain code points would appear in that string? A convenient way of splitting strings into more meaningful character units would make the actual intent clear in the code, and it would allow for runtime testing of the programmer's assumptions.

Explicit access to code points should be ugly – s.__codepoints__, maybe. And that should be a sequence of integers, not strings like "́".

> it’s probably worth at least considering making UTF-8 strings first-class
> objects. They can’t be randomly accessed,

They can be randomly accessed by abstract indices: objects that look similar to ints from C code, but that have no extractable integer value in Python code, so that they're independent of the underlying string representation. They can't be randomly accessed by code point index, but there's no reason you should ever want to randomly access a string by a code point index. It's a completely meaningless operation.
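To make the ' '.join point concrete, here is a minimal sketch in present-day Python. The helper names code_points and combining_clusters are made up for illustration and are not an existing or proposed API. The clustering is a deliberately crude stand-in for real grapheme segmentation (it only attaches combining marks to the preceding character, which is far short of what UAX #29 specifies), so treat it as an assumption-laden demo rather than something to ship.

```python
import unicodedata


def code_points(s):
    """Return the code points of s as integers (hypothetical helper)."""
    return [ord(c) for c in s]


def combining_clusters(s):
    """Group each combining mark with the character before it.

    Hypothetical helper: a rough approximation of "character units",
    not full Unicode grapheme cluster segmentation.
    """
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch) != 0:
            # Non-zero combining class: attach to the previous cluster.
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


s = "e\u0301a"  # 'e' + COMBINING ACUTE ACCENT + 'a'

# Joining over code points pushes a space between 'e' and its accent.
spaced_by_code_point = " ".join(s)

# Joining over clusters keeps the accent attached to its base letter.
spaced_by_cluster = " ".join(combining_clusters(s))

print(code_points(s))
print(repr(spaced_by_code_point))
print(repr(spaced_by_cluster))
```

With the explicit split, the code says which units the programmer meant, and an assertion on the result of combining_clusters could check that assumption at runtime.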