On Thu, Apr 11, 2024 at 03:37, Jarno Mäkipää <jmaki...@gmail.com> wrote:

> there is a slight difference between wctoutf8 and wcrtomb: wcrtomb
> returns -1 if it's presented with an invalid char, or if the char is not
> representable in the current locale. I think wctoutf8 only returns positive
> integers.

wctoutf8 cannot fail because it writes even invalid Unicode code points as utf8.
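
For comparison, here is a minimal sketch of the wcrtomb failure mode described
in the quoted text. The helper name wcrtomb_len is made up for illustration and
is not part of toybox; it only wraps the standard wcrtomb() call and turns its
(size_t)-1 error return into a plain -1.

```c
#include <limits.h>
#include <string.h>
#include <wchar.h>

// Hypothetical helper (not from toybox): ask wcrtomb() to encode one wide
// character in the current locale. Returns the byte count, or -1 where
// wcrtomb() reports failure by returning (size_t)-1.
int wcrtomb_len(wchar_t wc)
{
  char buf[MB_LEN_MAX];
  mbstate_t state;
  size_t len;

  // Start from the initial conversion state on every call.
  memset(&state, 0, sizeof(state));
  len = wcrtomb(buf, wc, &state);

  return (len == (size_t)-1) ? -1 : (int)len;
}
```

In the default "C" locale I would expect wcrtomb_len(L'A') to give 1 and
wcrtomb_len(0x20ac) to give -1 on glibc, but the second result depends on the
libc and on the locale in effect.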

This is another reason I asked whether we could delegate the job of "Is this a
valid Unicode code point" to the other Unicode code. We are not reading Unicode
with utf8towc, we are reading utf8. If Unicode ever gets replaced, it's not hard
to imagine a new or different encoding system still representing itself with
utf8 (a very elegant, efficient way to represent this type of stuff). As long
as skipping the check isn't a security problem, validating code points in
utf8towc only makes the code less agnostic where it doesn't really need to be.

I remember from testing that if you pass the max unsigned int to wctoutf8, it
will write a single 0xff byte, which is actually invalid utf8 (the theoretical
max code point in utf8 is 2^31-1). This is a situation where bounds checking
seems sane. Maybe an "if (wc > 0x7fffffff) return -1" at the start of wctoutf8
would fix it?
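
To make the proposed check concrete, here is a rough sketch of an encoder with
the bound applied up front. It is a stand-in written for this message, not
toybox's wctoutf8: the name utf8_encode_checked, the unsigned char buffer, and
the 6 byte limit are all assumptions. The bound is spelled in hex because in C
"1<<31-1" parses as "1<<(31-1)", which is 2^30 rather than 2^31-1.

```c
// Hypothetical stand-in, not toybox's wctoutf8(): encode wc into out (which
// needs room for 6 bytes) using the original 1 to 6 byte utf8 layout.
// Returns the number of bytes written, or -1 if wc is above 2^31-1.
int utf8_encode_checked(unsigned wc, unsigned char *out)
{
  int len = 2, i;

  // The proposed bounds check: nothing above 2^31-1 fits in 6 bytes.
  if (wc > 0x7fffffffu) return -1;

  // ASCII is stored as a single byte.
  if (wc < 0x80) {
    *out = wc;
    return 1;
  }

  // A 2 byte sequence holds 11 bits and each extra byte adds 5 more.
  while (len < 6 && wc >= (1u << (5*len + 1))) len++;

  // Fill continuation bytes from the end, 6 bits at a time.
  for (i = len - 1; i > 0; i--) {
    out[i] = 0x80 | (wc & 0x3f);
    wc >>= 6;
  }

  // Lead byte: len high bits set, a zero bit, then the remaining bits of wc.
  out[0] = ((0xff << (8 - len)) & 0xff) | wc;

  return len;
}
```

With this sketch, 0x20ac encodes to the 3 bytes e2 82 ac, 0x7fffffff to a
6 byte sequence starting with fd, and anything from 0x80000000 up (including
the max unsigned int) returns -1 without writing to the buffer.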

- Oliver Webb <aquahobby...@proton.me>
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net
