Interesting.  Thanks for the explanation; that was bugging me.
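
For the archive, here is a minimal sketch of the same length table cut down to the modern range. It assumes RFC 3629, which limits UTF-8 to code points up to U+10FFFF and excludes the surrogate range, so the 5- and 6-byte cases never arise. The name utf8_length_rfc3629 is hypothetical and not part of Racket's source.

```c
#include <stdint.h>

/* Hypothetical helper, not from racket/src/char.c.
   Returns the number of bytes needed to encode `cp` as UTF-8 under
   RFC 3629, or 0 if `cp` is not an encodable Unicode scalar value. */
static int utf8_length_rfc3629(uint32_t cp)
{
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;                 /* surrogates are not scalar values */
  if (cp > 0x10FFFF)
    return 0;                 /* beyond the Unicode code space */
  if (cp < 0x80)
    return 1;                 /* ASCII */
  if (cp < 0x800)
    return 2;
  if (cp < 0x10000)
    return 3;
  return 4;                   /* 0x10000 .. 0x10FFFF */
}
```

Under that restriction the result is always between 1 and 4, which matches the expectation in the original question; the (integer-in 1 6) contract corresponds to the older, wider table quoted below.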

On Thu, May 3, 2018 at 3:37 PM, Shu-Hung You
<shu-hung....@eecs.northwestern.edu> wrote:
> Looks like the implementation of char-utf-8-length returns values
> fitting the "FSS-UTF (1992) / UTF-8 (1993)" table in
> https://en.wikipedia.org/wiki/UTF-8#History. Not sure what the
> standard UTF-8 encoding is.
>
> /* racket/src/char.c */
> static Scheme_Object *char_utf8_length (int argc, Scheme_Object *argv[])
> {
>   mzchar wc;
>   if (!SCHEME_CHARP(argv[0]))
>     scheme_wrong_contract("char-utf-8-length", "char?", 0, argc, argv);
>
>   wc = SCHEME_CHAR_VAL(argv[0]);
>   if (wc < 0x80) {
>     return scheme_make_integer(1);
>   } else if (wc < 0x800) {
>     return scheme_make_integer(2);
>   } else if (wc < 0x10000) {
>     return scheme_make_integer(3);
>   } else if (wc < 0x200000) {
>     return scheme_make_integer(4);
>   } else if (wc < 0x4000000) {
>     return scheme_make_integer(5);
>   } else {
>     return scheme_make_integer(6);
>   }
> }
>
>
> On Thu, May 3, 2018 at 2:12 PM, David Storrs <david.sto...@gmail.com> wrote:
>> I noticed this in the docs and it surprised me:
>>
>> (char-utf-8-length char) → (integer-in 1 6)
>>
>> UTF-8 characters are 1-4 bytes, so why isn't it (integer-in 1 4)?  I
>> feel like this is probably obvious but I'm not coming up with the
>> answer.
>>
>> --
>> You received this message because you are subscribed to the Google Groups 
>> "Racket Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to racket-users+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.

