Ah. That makes a lot of sense, actually. Anyway, so then Latin1 strings are memcmp-able, and others are not. That's fine; I'll just add a check for that (I think there are already helper functions for this) and then have two special-case string functions. Thanks!
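
To make the idea concrete, here is a rough Python-level sketch of the check plus the two code paths. It is only an illustration, not the actual C patch: `is_latin1` and `latin1_key` are hypothetical names, and comparing the Latin-1 encoded bytes stands in for what memcmp would do on the raw 1-byte buffer. In C I would expect the check to be something along the lines of `PyUnicode_KIND(s) == PyUnicode_1BYTE_KIND`, but that is an assumption on my part.

```python
def is_latin1(s: str) -> bool:
    """True if every code point is below 256, i.e. fits in one byte."""
    return all(ord(ch) < 256 for ch in s)


def latin1_key(s: str) -> bytes:
    """Bytes whose lexicographic order matches the code point order of s.

    Latin-1 maps code point n to byte n, so comparing these bytes gives
    the same answer as comparing the strings themselves.
    """
    return s.encode("latin-1")


words = ["pear", "apple", "caf\u00e9", "cafe", "Zebra"]

if all(is_latin1(w) for w in words):
    # Special case: every string is 1 byte per code point.
    fast = sorted(words, key=latin1_key)
else:
    # General case: fall back to the ordinary string comparison.
    fast = sorted(words)

print(fast)
```

The point of the guard is that the byte-wise path is only taken when every string in the list passes the check; a single wider string sends the whole sort down the general path.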
On Wed, Oct 12, 2016 at 4:08 PM Alexander Belopolsky <[email protected]> wrote:

> On Wed, Oct 12, 2016 at 5:57 PM, Elliot Gorokhovsky <[email protected]> wrote:
>
> > On Wed, Oct 12, 2016 at 3:51 PM Nathaniel Smith <[email protected]> wrote:
> >
> > > But this isn't relevant to Python's str, because Python's str never uses
> > > UTF-8.
> >
> > Really? I thought in python 3, strings are all unicode... so what encoding
> > do they use, then?
>
> No encoding is used. The actual code points are stored as integers of the
> same size. If all code points are less than 256, they are stored as 8-bit
> integers (bytes). If some code points are greater or equal to 256 but less
> than 65536, they are stored as 16-bit integers and so on.
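
For reference, the rule quoted above can be written out as a small Python sketch. `storage_width` is a hypothetical helper, not something CPython exposes; it just applies the stated thresholds to the largest code point in the string, and I am assuming "and so on" means 4 bytes for everything at or above 65536.

```python
def storage_width(s: str) -> int:
    """Bytes per code point implied by the quoted rule for string s."""
    # The widest code point decides the width used for the whole string.
    top = max(map(ord, s), default=0)
    if top < 256:
        return 1  # 8-bit integers
    if top < 65536:
        return 2  # 16-bit integers
    return 4      # assumed: 32-bit integers for anything larger


for text in ["abc", "caf\u00e9", "\u0416\u0443\u043a", "\U0001F600"]:
    print(repr(text), storage_width(text))
```

Only strings for which this returns 1 would qualify for the byte-wise comparison discussed at the top of this message.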
_______________________________________________
Python-ideas mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
