Since this is Python 4000, where everything's made up and the points
don't matter...

I think there shouldn't be a char type, and also strings shouldn't be
iterable, or indexable by integers, or anything else that makes them
appear to be tuples of code points.

Nothing good can come of decomposing strings into Unicode code points.
The code point abstraction is practically as low level as the internal
byte encoding of the strings. Only lexing libraries should look at
strings at that level, and you should use a well-written and tested
lexing library, not a hacky hand-coded lexer.
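To make that concrete in today's Python 3: a base letter plus a combining accent is two code points, and code point operations treat them as unrelated. Here is a small illustration using only the standard library. The example string is my choice, not something from the thread. `ascii()` is used so that the otherwise invisible accent shows up as an escape:

```python
import unicodedata

# "café" in decomposed form: "e" followed by U+0301 COMBINING ACUTE ACCENT
s = "cafe\u0301"

print(len(s))            # 5 code points, though a reader sees 4 characters
print(ascii(s[:4]))      # 'cafe': slicing by code point silently drops the accent
print(ascii(s[::-1]))    # '\u0301efac': the accent now has no base letter before it

# The NFC-composed spelling of the same text has a different length and indices
t = unicodedata.normalize("NFC", s)
print(ascii(t), len(t))  # 'caf\xe9' 4
print(s == t)            # False: same text, different code point sequences
```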

Someone in this thread mentioned that they'd used ' '.join on a string
in production code. Was the intent really to put a space between every
pair of code points of an arbitrary string? Or did they know that only
certain code points would appear in that string? A convenient way of
splitting strings into more meaningful character units would make the
actual intent clear in the code, and it would allow for runtime
testing of the programmer's assumptions.
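A rough sketch of the difference. The first part shows what ' '.join does to a decomposed "café". The second part is a hypothetical helper, `split_assuming_no_combining_marks`. It is not from the thread and does not do real grapheme segmentation, since the standard library has no grapheme-cluster splitter. All it does is make the programmer's assumption explicit in the name and check it at runtime:

```python
import unicodedata

s = "cafe\u0301"              # "café" with a decomposed é (e + U+0301)
spaced = " ".join(s)
# ascii() makes the invisible accent visible: it now follows a space, not the "e"
print(ascii(spaced))           # 'c a f e \u0301'


def split_assuming_no_combining_marks(text: str) -> list[str]:
    """Hypothetical helper: split into code points, but first check the
    assumption that none of them is a mark (Unicode category M*)."""
    for i, ch in enumerate(text):
        if unicodedata.category(ch).startswith("M"):
            raise ValueError(
                f"mark U+{ord(ch):04X} at code point {i} would be "
                "detached from its base character"
            )
    return list(text)


print(" ".join(split_assuming_no_combining_marks("ABC")))   # A B C
try:
    split_assuming_no_combining_marks(s)
except ValueError as exc:
    print("assumption violated:", exc)
```

If the assumption is wrong, the code fails loudly instead of silently producing "c a f e ́".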

Explicit access to code points should be ugly – s.__codepoints__,
maybe. And that should be a sequence of integers, not strings like
"́" (a lone combining accent).
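A mock-up of that idea in current Python, as a wrapper class. `Text` and its `__codepoints__` property are hypothetical. Dunder names are reserved for the language, which suits the "should be ugly" intent, but it means this is only an illustration. Setting `__iter__ = None` is the data model's documented way to mark a class as not iterable:

```python
class Text:
    """Hypothetical string wrapper: not iterable, not indexable by integers."""

    __slots__ = ("_s",)
    __iter__ = None  # iter(Text(...)) raises TypeError, with no __getitem__ fallback

    def __init__(self, s: str) -> None:
        self._s = s

    @property
    def __codepoints__(self) -> tuple[int, ...]:
        # Deliberately explicit access: integers, never one-code-point strings
        return tuple(ord(ch) for ch in self._s)

    def __str__(self) -> str:
        return self._s


t = Text("e\u0301")
print(t.__codepoints__)   # (101, 769)
try:
    iter(t)
except TypeError:
    print("Text is not iterable")
try:
    t[0]
except TypeError:
    print("Text is not indexable")
```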

>it’s probably worth at least considering making UTF-8 strings first-class 
>objects. They can’t be randomly accessed,

They can be randomly accessed by abstract indices: objects that look
like ints to C code, but that have no extractable integer
value in Python code, so that they're independent of the underlying
string representation.
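Here is a minimal sketch of abstract indices, assuming the string is stored as UTF-8 bytes. `Utf8Text`, `TextIndex`, `find`, `start`, `end` and `slice` are all hypothetical names. Each `TextIndex` privately holds a byte offset, so slicing is a constant-time offset lookup. However, it defines no `__int__` or `__index__`, so Python code can't treat it as a number. In pure Python a determined caller could still read the private attribute. A C implementation could hide it properly.

```python
from __future__ import annotations


class TextIndex:
    """Opaque position in one Utf8Text; exposes no integer value."""

    __slots__ = ("_owner", "_offset")

    def __init__(self, owner: Utf8Text, offset: int) -> None:
        self._owner = owner
        self._offset = offset  # private byte offset into the owner's UTF-8 buffer

    def __eq__(self, other: object) -> bool:
        return (isinstance(other, TextIndex)
                and other._owner is self._owner
                and other._offset == self._offset)

    def __hash__(self) -> int:
        return hash((id(self._owner), self._offset))


class Utf8Text:
    """Hypothetical UTF-8-backed string that hands out opaque indices."""

    def __init__(self, s: str) -> None:
        self._buf = s.encode("utf-8")

    def start(self) -> TextIndex:
        return TextIndex(self, 0)

    def end(self) -> TextIndex:
        return TextIndex(self, len(self._buf))

    def find(self, sub: str) -> TextIndex | None:
        # UTF-8 is self-synchronizing, so a byte-level match of a valid UTF-8
        # needle always starts and ends on code point boundaries.
        pos = self._buf.find(sub.encode("utf-8"))
        return None if pos < 0 else TextIndex(self, pos)

    def slice(self, i: TextIndex, j: TextIndex) -> str:
        # Offset lookup is O(1); only copying the result depends on its length.
        if i._owner is not self or j._owner is not self:
            raise ValueError("index belongs to a different string")
        return self._buf[i._offset:j._offset].decode("utf-8")


text = Utf8Text("na\u00efve caf\u00e9")   # "naïve café"
i = text.find("caf\u00e9")
print(ascii(text.slice(i, text.end())))    # 'caf\xe9'
try:
    int(i)
except TypeError:
    print("a TextIndex has no integer value")
```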

They can't be randomly accessed by code point index, but there's no
reason you should ever want to randomly access a string by a code
point index. It's a completely meaningless operation.
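For the record, the reason UTF-8 can't support random access by code point index is that it is variable-width. Each code point takes 1 to 4 bytes, so finding the n-th code point means scanning from the start. Here is a sketch with a hypothetical helper, `codepoint_offset`. It counts lead bytes, meaning every byte not of the form 0b10xxxxxx:

```python
def codepoint_offset(buf: bytes, n: int) -> int:
    """Byte offset of the n-th code point in valid UTF-8 `buf`: an O(n) scan."""
    count = 0
    for offset, byte in enumerate(buf):
        if (byte & 0xC0) != 0x80:    # lead byte: a new code point starts here
            if count == n:
                return offset
            count += 1
    if count == n:
        return len(buf)              # one past the last code point
    raise IndexError("code point index out of range")


buf = "a\u00e9\u20ac\U0001F600b".encode("utf-8")    # 1 + 2 + 3 + 4 + 1 bytes
print([codepoint_offset(buf, k) for k in range(6)])  # [0, 1, 3, 6, 10, 11]
```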
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/MDX4LXOWJQ2DXPIG27DJ3TVETSUSMSVW/
Code of Conduct: http://python.org/psf/codeofconduct/
