On Tue, Jan 7, 2014 at 10:36 AM, Stephen J. Turnbull <step...@xemacs.org> wrote:
> Daniel Holth writes:
>
>  > Isn't it true that if you have bytes > 127 or surrogate escapes then
>  > encoding to latin1 is no longer as fast as memcpy?
>
> Be careful.  As phrased, the question makes no sense.  You don't "have
> bytes" when you are encoding, you have characters.
>
> If you mean "what happens when my str contains characters in the range
> 128-255?", the answer is encoding a str in 8-bit representation to
> latin1 is effectively memcpy.  If you read in latin1, it's memcpy all
> the way (unless you combine it with a non-latin1 string, in which case
> you're in the cases below).
>
> If you mean "what happens when my str contains characters in the
> range > 255", you have to truncate 16-bit units to 8 bit units; no
> memcpy.
>
> Surrogates require >= 16 bits; no memcpy.

That is neat.
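
For anyone who wants to poke at the three cases, here is a minimal
sketch. The variable names and the bytes_per_char() helper are made up
for illustration, and the storage-width measurement relies on
sys.getsizeof(), so treat that part as a CPython implementation detail
(PEP 393) rather than a language guarantee.

```python
import sys

def bytes_per_char(ch):
    # Hypothetical helper: growth in object size when one more copy of
    # ch is stored. On CPython this reflects the internal width.
    return sys.getsizeof(ch * 2) - sys.getsizeof(ch)

latin_s = "caf\xe9"      # U+00E9 is in the range 128-255
wide_s = "caf\u0117"     # U+0117 is > 255
# Undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
surr_s = b"caf\xe9".decode("ascii", "surrogateescape")

# Case 1: every character fits in 8 bits, so latin1 encoding succeeds
# and the output bytes equal the code points one for one.
latin_bytes = latin_s.encode("latin-1")

# Case 2: a character > 255 has no latin1 byte; strict encoding fails.
try:
    wide_s.encode("latin-1")
    wide_failed = False
except UnicodeEncodeError:
    wide_failed = True

# Case 3: a surrogate also fails under strict encoding...
try:
    surr_s.encode("latin-1")
    surr_failed = False
except UnicodeEncodeError:
    surr_failed = True

# ...but the surrogateescape handler maps it back to the original byte.
surr_bytes = surr_s.encode("latin-1", "surrogateescape")

if __name__ == "__main__":
    for label, ch in [("128-255", "\xe9"), ("> 255", "\u0117"),
                      ("surrogate", "\udce9")]:
        print(label, bytes_per_char(ch), "byte(s) per character")
    print(latin_bytes, wide_failed, surr_failed, surr_bytes)
```

Note that in the "> 255" case a strict encode raises UnicodeEncodeError
rather than silently truncating; either way the codec has to inspect
each unit, which is why it cannot be a plain memcpy.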
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev