On Thu, Sep 21, 2023 at 01:25:01PM +0200, Walter Alejandro Iglesias wrote:
> I corrected many of the things you pointed me, but not all.  The
> function I use to check utf8 is mine, I use it in a pair of little
> programs which I've *hardly* checked for memory leacks.  I know my
> function looks BIG :-), but I know for sure that it does the job.

We already have code in libc that does this; see the function
_citrus_utf8_ctype_mbrtowc in lib/libc/citrus/citrus_utf8.c.
Please use the libc interface if at all possible. It is best to have
just one place to fix when a UTF-8 parser bug is found.

There is also utf8_isvalid() in tmux's utf8.c, though you would have
to trim the tmux UTF-8 code down for your narrow use case.

Your implementation lacks proper bounds checking. It accesses s[i + 3]
based purely on the contents of the input string, without first
checking that i + 3 < len. Entering the while (i != len) loop with
i == len - 1 and a specially crafted input string can therefore read
beyond the end of the input.
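
To make the bounds-checking point concrete, here is a rough sketch of
the order of operations I mean. It is not your function and not the
libc parser; the names utf8_seqlen and utf8_valid are made up for this
mail. Going through libc remains preferable. The only point is that
the remaining length is checked before any byte after s[i] is read.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper for illustration only.
 * Returns the length (1..4) of the well-formed UTF-8 sequence starting
 * at s[i], or 0 if the bytes are malformed or the sequence would run
 * past len.
 */
size_t
utf8_seqlen(const unsigned char *s, size_t len, size_t i)
{
	size_t need, k;
	unsigned char c;

	assert(s != NULL);
	if (i >= len)
		return 0;

	c = s[i];
	if (c < 0x80)
		return 1;
	else if (c >= 0xc2 && c <= 0xdf)
		need = 2;
	else if (c >= 0xe0 && c <= 0xef)
		need = 3;
	else if (c >= 0xf0 && c <= 0xf4)
		need = 4;
	else
		return 0;	/* stray continuation or invalid lead byte */

	/* The bounds check: never look at s[i + k] unless it exists. */
	if (len - i < need)
		return 0;

	/* Every following byte must be a continuation byte (10xxxxxx). */
	for (k = 1; k < need; k++)
		if ((s[i + k] & 0xc0) != 0x80)
			return 0;

	/* Reject overlong forms, surrogates, and values above U+10FFFF. */
	if ((c == 0xe0 && s[i + 1] < 0xa0) ||
	    (c == 0xed && s[i + 1] > 0x9f) ||
	    (c == 0xf0 && s[i + 1] < 0x90) ||
	    (c == 0xf4 && s[i + 1] > 0x8f))
		return 0;

	return need;
}

/* Returns 1 if s[0..len-1] is well-formed UTF-8, 0 otherwise. */
int
utf8_valid(const unsigned char *s, size_t len)
{
	size_t i = 0, n;

	/* i < len rather than i != len, so i can never step over the end. */
	while (i < len) {
		if ((n = utf8_seqlen(s, len, i)) == 0)
			return 0;
		i += n;
	}
	return 1;
}
```

With this shape, a buffer that ends in a lone lead byte such as 0xf0 is
rejected by the length test, and s[i + 1] through s[i + 3] are never
read.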