seems reasonable to me, though i'd be tempted to specifically mention utf8
--- there's nothing inherently "invalid" about surrogate pairs, other than
that you shouldn't see them in utf8. (though you will see them in
_modified_ utf8, so Java programmers might still meet them.)

anything > 0x10ffff seems legitimately just plain "invalid" though.
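
to make the distinction concrete, here's a rough sketch (not the patch itself) that reports the two cases separately instead of lumping both under "invalid". the names cp_classify() and cp_describe() and the message strings are made up for illustration and aren't anything in toybox:

```c
// Hypothetical helper, not part of toybox: classify a code point.
// Returns 0 for a Unicode scalar value, 1 for a UTF-16 surrogate
// (U+D800..U+DFFF), 2 for anything beyond U+10FFFF.
int cp_classify(unsigned wc)
{
  if (wc > 0x10ffff) return 2;
  if (wc >= 0xd800 && wc <= 0xdfff) return 1;

  return 0;
}

// Hypothetical labels; the wording is an assumption, pick whatever
// fits the existing output format.
const char *cp_describe(unsigned wc)
{
  static const char *msg[] = {
    "ok",
    "surrogate (invalid in utf8)",
    "invalid (beyond U+10FFFF)"
  };

  return msg[cp_classify(wc)];
}
```

the surrogate test covers the same range as the (wc>0xd7ff && wc<0xe000) check in the patch, just written with inclusive bounds.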

On Sat, May 15, 2021 at 10:21 AM Rob Landley <[email protected]> wrote:

> Elliott, is it worth testing for invalid unicode range in the display, ala:
>
> --- a/toys/other/ascii.c
> +++ b/toys/other/ascii.c
> @@ -44,7 +44,8 @@ static void codepoint(unsigned wc)
>    char *s = toybuf + sprintf(toybuf, "U+%04X : ", wc), *ss;
>    unsigned n, i;
>
> -  if (wc>31 && wc!=127) {
> +  if ((wc>0xd7ff && wc<0xe000) || wc>0x10ffff) s += sprintf(s, "invalid");
> +  else if (wc>31 && wc!=127) {
>      s += n = wctoutf8(ss = s, wc);
>      if (n>1) for (i = 0; i<n; i++) s += sprintf(s, " : %#02x"+2*!!i, *ss++);
>    } else s = memcpy(s, (wc==127) ? "DEL" : low+wc*3, 3)+3;
>
>
> Rob
>
_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net
