On Thu, Apr 11, 2024 at 03:37, Jarno Mäkipää <jmaki...@gmail.com> wrote:
> there is a slight difference between wctoutf8 and wcrtomb: wcrtomb
> returns -1 if it's presented with an invalid char, or if the char is not
> representable in the current locale. I think wctoutf8 only returns positive
> integers.
wctoutf8 cannot fail because it encodes whatever value it is given as utf8,
including invalid Unicode code points.
This is another reason I asked if we could delegate the job of "Is this a valid
Unicode code point" to the other Unicode code. With utf8towc we are not reading
Unicode, we are reading utf8. If Unicode ever gets replaced, it’s not hard to
imagine a new or different encoding system representing itself with utf8 (a
very elegant, efficient way to represent this type of stuff). As long as there
isn’t a security problem with it, validating code points here only makes the
code less agnostic than it needs to be.
I remember from testing that if you pass the max unsigned int to wctoutf8, it
will write a single 0xff byte, which is actually invalid utf8 (the theoretical
max code point in utf8 is 2^31-1). This is a situation where bounds checking
seems sane. Maybe an "if (wc > 0x7fffffff) return -1" at the start of wctoutf8
would fix it?
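
A minimal sketch of that kind of bounds check, assuming an encoder that follows
the original 1 to 6 byte utf8 layout. The name sketch_wctoutf8 and its body are
hypothetical, written for illustration only, and are not toybox's actual
wctoutf8:

```c
// Hypothetical stand-in for a wctoutf8-style encoder (not toybox's code).
// Encodes wc with the original 1-6 byte utf8 layout, which tops out at 2^31-1.
// Returns the number of bytes written, or -1 if wc does not fit in 31 bits.
int sketch_wctoutf8(char *s, unsigned wc)
{
  unsigned char *out = (unsigned char *)s;
  int len, i;

  // Spell the limit out: "1<<31-1" parses as 1<<(31-1) in C, because
  // binary minus binds tighter than the shift operator.
  if (wc > 0x7fffffffU) return -1;

  // 7-bit values are a single byte.
  if (wc < 0x80) {
    out[0] = wc;
    return 1;
  }

  // A 2 byte sequence holds 11 bits, and each extra byte adds 5 more.
  for (len = 2; len < 6; len++) if (wc < (1U << (5*len + 1))) break;

  // Continuation bytes, filled last to first: 10xxxxxx.
  for (i = len - 1; i > 0; i--) {
    out[i] = 0x80 | (wc & 0x3f);
    wc >>= 6;
  }

  // Lead byte: len one-bits, a zero bit, then the remaining payload bits.
  out[0] = ((0xff00 >> len) | wc) & 0xff;

  return len;
}
```

With this sketch, 0x7fffffff still encodes (as 6 bytes), while anything larger
returns -1 instead of emitting a byte that can never appear in utf8.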
- Oliver Webb <aquahobby...@proton.me>
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net