On 07/12/2013 09:59 AM, Joshua Landau wrote:
> If you're interested, the basic of it is that strings now use a
> variable number of bytes to encode their values depending on whether
> values outside of the ASCII range and some other range are used, as an
> optimisation.

"Variable number of bytes" is a problematic way of saying it. UTF-8 is
a variable-number-of-bytes encoding scheme where each character takes
1 to 4 bytes, depending on the Unicode code point. As you can imagine,
this sort of encoding scheme would be very slow to do indexing and
slicing with (looking up a character at a certain position), because
you have to scan from the start of the string to find it. Python (as
of 3.3) uses fixed-width encoding schemes, so it preserves O(1) lookup
speeds, but Python will use 1, 2, or 4 bytes for every character in the
string, depending on the widest character the string has to hold. Just
in case the OP might have misunderstood what you are saying.

jmf sees the case where a string is promoted from one width to another,
and thinks that the brief slowdown in string operations to accomplish
this is a problem. In reality I have never seen anyone use the types of
string operations his pseudo-benchmarks use, and in general Python 3's
string behavior is pretty fast. And apparently much more correct than
if jmf's ideas of Unicode were implemented.
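
For the OP, here is a rough sketch of the difference between the
fixed-width internal storage and UTF-8. It assumes CPython 3.3 or
later; sys.getsizeof() reports implementation-specific numbers, and
bytes_per_char() is just a helper name I made up for this post, not
anything from the standard library.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate the storage used per character for a string made of `ch`.

    Comparing two lengths cancels out the fixed object header, leaving
    only the per-character cost. CPython-specific: other implementations
    may store strings differently.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


samples = [
    ("ASCII", "a"),
    ("Latin-1", "\u00e9"),        # e with acute accent
    ("BMP", "\u20ac"),            # euro sign
    ("astral", "\U0001F600"),     # an emoji outside the BMP
]

for label, ch in samples:
    internal = bytes_per_char(ch)
    utf8 = len(ch.encode("utf-8"))
    print(f"{label:8} internal bytes/char: {internal}  UTF-8 bytes: {utf8}")

# One wide character is enough to promote the whole string to a wider
# fixed width, which is the case jmf's benchmarks exercise.
narrow = "a" * 1000
promoted = narrow + "\u20ac"
print(sys.getsizeof(narrow), sys.getsizeof(promoted))
```

On a CPython 3.3+ build I would expect the internal widths to come out
as 1, 1, 2 and 4 bytes, while the UTF-8 lengths are 1, 2, 3 and 4. The
last line should show the promoted string taking roughly twice the
space of the narrow one, since every character in it is now stored in
2 bytes.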