seems reasonable to me, though i'd be tempted to specifically mention utf8: there's nothing inherently "invalid" about surrogates, other than that you shouldn't see them in utf8. (though you will see them in _modified_ utf8, so Java programmers might still meet them.)
anything > 0x10ffff seems legitimately just plain "invalid" though.

On Sat, May 15, 2021 at 10:21 AM Rob Landley <[email protected]> wrote:
> Elliott, is it worth testing for invalid unicode range in the display, ala:
>
> --- a/toys/other/ascii.c
> +++ b/toys/other/ascii.c
> @@ -44,7 +44,8 @@ static void codepoint(unsigned wc)
>    char *s = toybuf + sprintf(toybuf, "U+%04X : ", wc), *ss;
>    unsigned n, i;
>
> -  if (wc>31 && wc!=127) {
> +  if ((wc>0xd7ff && wc<0xe000) || wc>0x10ffff) s += sprintf(s, "invalid");
> +  else if (wc>31 && wc!=127) {
>      s += n = wctoutf8(ss = s, wc);
>      if (n>1) for (i = 0; i<n; i++) s += sprintf(s, " : %#02x"+2*!!i, *ss++);
>    } else s = memcpy(s, (wc==127) ? "DEL" : low+wc*3, 3)+3;
>
> Rob
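
if the output were to distinguish the two cases, a rough sketch of the split might look like the following. every name here (cp_kind, classify_codepoint, codepoint_label) and both label strings are made up for illustration and are not existing toybox code; the ranges are the same ones the patch tests.

```c
// Hypothetical names throughout; none of this is existing toybox code.
enum cp_kind { CP_OK, CP_SURROGATE, CP_OUT_OF_RANGE };

// Classify a code point using the same ranges as the patch, but keeping
// surrogates (U+D800..U+DFFF) separate from values past U+10FFFF.
static enum cp_kind classify_codepoint(unsigned wc)
{
  if (wc>0x10ffff) return CP_OUT_OF_RANGE;
  if (wc>0xd7ff && wc<0xe000) return CP_SURROGATE;
  return CP_OK;
}

// Label to show instead of the utf8 bytes, or a null pointer when the
// code point is fine to encode. The label text is only a suggestion.
static const char *codepoint_label(unsigned wc)
{
  switch (classify_codepoint(wc)) {
    case CP_SURROGATE: return "surrogate (invalid in utf8)";
    case CP_OUT_OF_RANGE: return "invalid";
    default: return 0;
  }
}
```

codepoint() could then print codepoint_label(wc) when it is non-null and fall through to the existing wctoutf8() path otherwise, though the single-condition version in the patch is shorter if one label is enough.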
_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net
