* Ralf Schmitt, 2012-03-09, 12:49:
fribidi_utf8_to_unicode consumes at most 3 bytes for a single Unicode
character, i.e. it does not handle Unicode characters above 0xffff.
Now that I've woken up, I finally understand what you meant here. :) Sorry
for the noise.
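The 3-byte limit follows from how UTF-8 packs bits: the lead byte of an n-byte sequence carries 7, 5, 4 or 3 payload bits and each continuation byte carries 6, so three bytes hold 4 + 6 + 6 = 16 bits, exactly enough for 0x0000..0xffff and nothing above. A minimal C sketch, following RFC 3629; the helper names utf8_payload_bits() and utf8_len() are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: payload bits carried by an n-byte UTF-8 sequence.
   The lead byte contributes 7, 5, 4 or 3 bits; each continuation byte 6. */
static int utf8_payload_bits(int n)
{
    static const int lead_bits[] = { 0, 7, 5, 4, 3 };
    assert(n >= 1 && n <= 4);
    return lead_bits[n] + 6 * (n - 1);
}

/* Hypothetical helper: bytes needed to encode a code point in UTF-8.
   Assumes cp is a valid Unicode scalar value (at most 0x10FFFF). */
static int utf8_len(uint32_t cp)
{
    if (cp <= 0x7F)   return 1;   /* 7 bits */
    if (cp <= 0x7FF)  return 2;   /* 11 bits */
    if (cp <= 0xFFFF) return 3;   /* 16 bits: the most a 3-byte decoder can produce */
    return 4;                     /* 21 bits, capped by Unicode at 0x10FFFF */
}
```

So utf8_len(0xFFFF) is 3 but utf8_len(0x1F600) is 4, which is the case a decoder that stops at 3 bytes can never produce.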
Here's the inner loop of "fribidi_utf8_to_unicode" from
fribidi-char-sets-utf8.c:
,----
| length = 0;
| while ((FriBidiStrIndex) (s - t) < len)
| {
| register unsigned char ch = *s;
| if (ch <= 0x7f) /* one byte */
| {
| *us++ = *s++;
| }
| else if (ch <= 0xdf) /* 2 byte */
| {
| *us++ = ((*s & 0x1f) << 6) + (*(s + 1) & 0x3f);
| s += 2;
| }
| else /* 3 byte */
| {
| *us++ =
| ((int) (*s & 0x0f) << 12) +
| ((*(s + 1) & 0x3f) << 6) + (*(s + 2) & 0x3f);
| s += 3;
| }
| length++;
| }
`----
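One way to see the missing 4-byte case concretely is to wrap the quoted loop in a function and feed it U+1F600 (bytes f0 9f 98 80). This is a rough sketch: the wrapper name quoted_utf8_to_unicode() and its signature are hypothetical, and the two typedefs are stand-ins assumed to match fribidi-types.h; only the loop body comes from the excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins (assumption): fribidi-types.h is believed to define these as
   uint32_t and int; declared here so the sketch compiles on its own. */
typedef uint32_t FriBidiChar;
typedef int FriBidiStrIndex;

/* Hypothetical wrapper around the quoted loop. Decodes `len` bytes of `ss`
   into `us` and returns the number of characters written. Only the loop
   body is taken from the quoted fribidi-char-sets-utf8.c excerpt. */
static FriBidiStrIndex
quoted_utf8_to_unicode (const char *ss, FriBidiStrIndex len, FriBidiChar *us)
{
  const unsigned char *s = (const unsigned char *) ss;
  const unsigned char *t = s;   /* start of input, as in the quoted code */
  FriBidiStrIndex length;

  assert (len >= 0);
  length = 0;
  while ((FriBidiStrIndex) (s - t) < len)
    {
      register unsigned char ch = *s;
      if (ch <= 0x7f)           /* one byte */
        {
          *us++ = *s++;
        }
      else if (ch <= 0xdf)      /* 2 byte: also taken by stray 0x80-0xbf */
        {
          *us++ = ((*s & 0x1f) << 6) + (*(s + 1) & 0x3f);
          s += 2;
        }
      else                      /* 3 byte: also taken by 4-byte leads 0xf0-0xf4 */
        {
          *us++ =
            ((int) (*s & 0x0f) << 12) +
            ((*(s + 1) & 0x3f) << 6) + (*(s + 2) & 0x3f);
          s += 3;
        }
      length++;
    }
  return length;
}
```

Given f0 9f 98 80 with len = 4, it returns two characters, U+07D8 and U+0000, and the second one is built from the byte just past len (here the string's NUL terminator). A 3-byte input such as e2 82 ac still decodes correctly to U+20AC.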
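Ugh. That's so broken... Every lead byte above 0xdf, including the 4-byte
lead bytes 0xf0-0xf4, is decoded as a 3-byte sequence, stray continuation
bytes (0x80-0xbf) are taken as 2-byte lead bytes, and near the end of the
input the multi-byte branches read up to two bytes past len.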
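For comparison, here's a rough sketch of a bounds-checked loop that also handles 4-byte sequences. It is not fribidi's code and not necessarily how upstream fixed it; the function name utf8_to_ucs4_checked() is made up, and mapping each malformed byte to U+FFFD is just one possible error policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFDu

/* Hypothetical bounds-checked decoder. Decodes 1- to 4-byte sequences per
   RFC 3629; a malformed, overlong, surrogate or truncated sequence becomes
   U+FFFD and decoding resumes at the next byte. Returns the number of code
   points written to `out`, which must have room for `len` entries. */
static size_t utf8_to_ucs4_checked(const unsigned char *s, size_t len, uint32_t *out)
{
    size_t i = 0, n = 0;

    assert(s != NULL || len == 0);
    assert(out != NULL || len == 0);
    while (i < len) {
        unsigned char ch = s[i];
        size_t need;        /* continuation bytes expected */
        uint32_t cp, min;   /* min: smallest code point valid for this length */

        if (ch <= 0x7F) { out[n++] = ch; i++; continue; }
        else if (ch >= 0xC2 && ch <= 0xDF) { need = 1; cp = ch & 0x1F; min = 0x80; }
        else if (ch >= 0xE0 && ch <= 0xEF) { need = 2; cp = ch & 0x0F; min = 0x800; }
        else if (ch >= 0xF0 && ch <= 0xF4) { need = 3; cp = ch & 0x07; min = 0x10000; }
        else { out[n++] = REPLACEMENT_CHAR; i++; continue; }  /* stray or invalid lead */

        /* Never look past the end of the input. */
        if (need > len - i - 1) { out[n++] = REPLACEMENT_CHAR; i++; continue; }

        size_t k;
        for (k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                break;                      /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (k <= need || cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            out[n++] = REPLACEMENT_CHAR;    /* bad continuation, overlong, out of range or surrogate */
            i++;
            continue;
        }
        out[n++] = cp;
        i += need + 1;
    }
    return n;
}
```

With this, f0 9f 98 80 decodes to the single code point U+1F600, while a truncated e2 82 yields two U+FFFD instead of reading past the buffer.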
--
Jakub Wilk
_______________________________________________
Python-modules-team mailing list
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/python-modules-team