On Tue, Jan 7, 2014 at 10:36 AM, Stephen J. Turnbull <step...@xemacs.org> wrote:
> Daniel Holth writes:
>
> > Isn't it true that if you have bytes > 127 or surrogate escapes then
> > encoding to latin1 is no longer as fast as memcpy?
>
> Be careful. As phrased, the question makes no sense. You don't "have
> bytes" when you are encoding, you have characters.
>
> If you mean "what happens when my str contains characters in the range
> 128-255?", the answer is encoding a str in 8-bit representation to
> latin1 is effectively memcpy. If you read in latin1, it's memcpy all
> the way (unless you combine it with a non-latin1 string, in which case
> you're in the cases below).
>
> If you mean "what happens when my str contains characters in the
> range > 255", you have to truncate 16-bit units to 8 bit units; no memcpy.
>
> Surrogates require >= 16 bits; no memcpy.
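
The three cases above can be poked at from the interpreter. This is only a rough sketch: `bytes_per_char` and `try_latin1` are hypothetical helpers made up for illustration, and the storage widths come from `sys.getsizeof`, so they reflect CPython 3.3+ (PEP 393) internals rather than anything the language guarantees. It shows which strings latin1 will accept and how wide their storage is; it does not measure whether the encode is actually a memcpy.

```python
import sys


def bytes_per_char(ch):
    """Estimate per-character storage for a string made only of `ch`.

    Hypothetical helper: it compares two string sizes, so the result is a
    CPython implementation detail, not a documented API.
    """
    return sys.getsizeof(ch * 1001) - sys.getsizeof(ch * 1000)


def try_latin1(s):
    """Return s encoded as latin-1, or None if some character does not fit."""
    try:
        return s.encode("latin-1")
    except UnicodeEncodeError:
        return None


narrow = "caf\xe9"   # every code point is <= 255
wide = "caf\u0101"   # contains a code point > 255
# Undecodable byte 0xE9 is smuggled in as the lone surrogate U+DCE9.
escaped = b"caf\xe9".decode("ascii", "surrogateescape")

print(bytes_per_char("\xe9"), bytes_per_char("\u0101"))
print(try_latin1(narrow))    # encodable: one byte per character
print(try_latin1(wide))      # None: latin-1 cannot represent U+0101
print(try_latin1(escaped))   # None: U+DCE9 is far above 255
print(escaped.encode("ascii", "surrogateescape"))  # the original bytes again
```

So a str holding a surrogate escape needs the wider representation and will not encode to latin1 without the matching error handler, which fits the "no memcpy" answer.
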

That is neat.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com