Re: [racket-users] char-utf-8-length signature is surprising

2018-05-03 Thread David Storrs
Interesting.  Thanks for the explanation; that was bugging me.

On Thu, May 3, 2018 at 3:37 PM, Shu-Hung You wrote:
> Looks like the implementation of char-utf-8-length returns values
> fitting the "FSS-UTF (1992) / UTF-8 (1993)" table in
> https://en.wikipedia.org/wiki/UTF-8#History. Not sure what the
> standard UTF-8 encoding is.
>
> /* racket/src/char.c */
> static Scheme_Object *char_utf8_length (int argc, Scheme_Object *argv[])
> {
>   mzchar wc;
>   if (!SCHEME_CHARP(argv[0]))
>     scheme_wrong_contract("char-utf-8-length", "char?", 0, argc, argv);
>
>   wc = SCHEME_CHAR_VAL(argv[0]);
>   if (wc < 0x80) {
>     return scheme_make_integer(1);
>   } else if (wc < 0x800) {
>     return scheme_make_integer(2);
>   } else if (wc < 0x10000) {
>     return scheme_make_integer(3);
>   } else if (wc < 0x200000) {
>     return scheme_make_integer(4);
>   } else if (wc < 0x4000000) {
>     return scheme_make_integer(5);
>   } else {
>     return scheme_make_integer(6);
>   }
> }
>
>
> On Thu, May 3, 2018 at 2:12 PM, David Storrs wrote:
>> I noticed this in the docs and it surprised me:
>>
>> (char-utf-8-length char) → (integer-in 1 6)
>>
>> UTF-8 characters are 1-4 bytes, so why isn't it (integer-in 1 4)?  I
>> feel like this is probably obvious but I'm not coming up with the
>> answer.
>>
>> --
>> You received this message because you are subscribed to the Google Groups 
>> "Racket Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to racket-users+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.



Re: [racket-users] char-utf-8-length signature is surprising

2018-05-03 Thread Shu-Hung You
Looks like the implementation of char-utf-8-length returns values
fitting the "FSS-UTF (1992) / UTF-8 (1993)" table in
https://en.wikipedia.org/wiki/UTF-8#History. Not sure what the
standard UTF-8 encoding is.

/* racket/src/char.c */
static Scheme_Object *char_utf8_length (int argc, Scheme_Object *argv[])
{
  mzchar wc;
  if (!SCHEME_CHARP(argv[0]))
    scheme_wrong_contract("char-utf-8-length", "char?", 0, argc, argv);

  wc = SCHEME_CHAR_VAL(argv[0]);
  if (wc < 0x80) {
    return scheme_make_integer(1);
  } else if (wc < 0x800) {
    return scheme_make_integer(2);
  } else if (wc < 0x10000) {
    return scheme_make_integer(3);
  } else if (wc < 0x200000) {
    return scheme_make_integer(4);
  } else if (wc < 0x4000000) {
    return scheme_make_integer(5);
  } else {
    return scheme_make_integer(6);
  }
}
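
For comparison, here is a minimal standalone sketch of the same threshold
table next to the stricter limits of current UTF-8 (RFC 3629, which stops
at U+10FFFF and excludes surrogates). Both function names are hypothetical
and are not part of the Racket source; the second function returning 0 for
an unencodable value is just a convention chosen for this sketch.

```c
#include <stdint.h>

/* Hypothetical helper: byte length under the original 1-to-6-byte
   FSS-UTF (1992) / UTF-8 (1993) layout, mirroring the thresholds above. */
static int utf8_length_1993(uint32_t cp)
{
  if (cp < 0x80)      return 1;  /* up to 7 bits  */
  if (cp < 0x800)     return 2;  /* up to 11 bits */
  if (cp < 0x10000)   return 3;  /* up to 16 bits */
  if (cp < 0x200000)  return 4;  /* up to 21 bits */
  if (cp < 0x4000000) return 5;  /* up to 26 bits */
  return 6;                      /* up to 31 bits */
}

/* Hypothetical helper: byte length under RFC 3629, or 0 (this sketch's
   convention) when the value is not a Unicode scalar value. */
static int utf8_length_rfc3629(uint32_t cp)
{
  if (cp > 0x10FFFF) return 0;                  /* beyond the Unicode range */
  if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* surrogates are excluded  */
  return utf8_length_1993(cp);                  /* always 1 to 4 from here  */
}
```

Since a Racket char is a Unicode scalar value, its code point never exceeds
U+10FFFF, which falls in the 4-byte branch. So the 5 and 6 results look
unreachable for real chars even though the documented range is
(integer-in 1 6).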


On Thu, May 3, 2018 at 2:12 PM, David Storrs wrote:
> I noticed this in the docs and it surprised me:
>
> (char-utf-8-length char) → (integer-in 1 6)
>
> UTF-8 characters are 1-4 bytes, so why isn't it (integer-in 1 4)?  I
> feel like this is probably obvious but I'm not coming up with the
> answer.


[racket-users] char-utf-8-length signature is surprising

2018-05-03 Thread David Storrs
I noticed this in the docs and it surprised me:

(char-utf-8-length char) → (integer-in 1 6)

UTF-8 characters are 1-4 bytes, so why isn't it (integer-in 1 4)?  I
feel like this is probably obvious but I'm not coming up with the
answer.
