On Thu, Sep 21, 2023 at 01:25:01PM +0200, Walter Alejandro Iglesias wrote:
> I corrected many of the things you pointed me, but not all.  The
> function I use to check utf8 is mine, I use it in a pair of little
> programs which I've *hardly* checked for memory leacks.  I know my
> function looks BIG :-), but I know for sure that it does the job.

We already have code in libc that does this, see the function
_citrus_utf8_ctype_mbrtowc in lib/libc/citrus/citrus_utf8.c.
Please use the libc interface if at all possible; it is best to
have just one place to fix when a UTF-8 parser bug is found.
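
As a rough sketch of what going through libc could look like: the
internal citrus function is normally reached via the standard
mbrtowc(3) interface once a UTF-8 LC_CTYPE locale is selected. The
helper name mb_isvalid() below is made up for illustration and is not
part of your patch or of libc; it assumes the caller has already
called setlocale(3) successfully.

```c
#include <locale.h>	/* setlocale(3), needed by the caller */
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: return 1 if the len bytes at s decode cleanly
 * with mbrtowc(3), 0 otherwise.  Assumes a UTF-8 LC_CTYPE locale is
 * already active; in any other locale the result reflects that
 * locale's encoding instead.
 */
int
mb_isvalid(const char *s, size_t len)
{
	mbstate_t mbs;
	wchar_t wc;
	size_t n;

	memset(&mbs, 0, sizeof(mbs));
	while (len > 0) {
		/* mbrtowc() never examines more than len bytes. */
		n = mbrtowc(&wc, s, len, &mbs);
		if (n == (size_t)-1 || n == (size_t)-2)
			return 0;	/* invalid or truncated sequence */
		if (n == 0)
			n = 1;		/* embedded NUL byte */
		s += n;
		len -= n;
	}
	return 1;
}
```

A caller would run something like setlocale(LC_CTYPE, "en_US.UTF-8")
once at startup, check that it did not return NULL, and then pass the
buffer and its length to the helper.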

There is also utf8_isvalid() in tmux's utf8.c, though you would
have to trim the tmux UTF-8 code down for your narrow use case.

Your implementation lacks proper bounds checking. It accesses
s[i + 3] based purely on the contents of the input string, without
first checking that i + 3 < len. Entering the while (i != len) loop
with i == len-1 and a specially crafted input string reads up to
three bytes past the end of the buffer.
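
To illustrate the kind of check I mean, here is a minimal sketch, not
a drop-in replacement. The names seq_len() and utf8_bounded() are
hypothetical. It only shows how to compare the announced sequence
length against the bytes remaining before touching any continuation
byte; it deliberately does not reject every invalid form (overlong
3- and 4-byte encodings and surrogates still pass), which is one more
reason to prefer the libc parser.

```c
#include <stddef.h>

/*
 * Hypothetical helper: number of bytes the UTF-8 lead byte c
 * announces, or 0 if c cannot start a sequence.
 */
static size_t
seq_len(unsigned char c)
{
	if (c < 0x80)
		return 1;
	if (c >= 0xc2 && c <= 0xdf)
		return 2;
	if (c >= 0xe0 && c <= 0xef)
		return 3;
	if (c >= 0xf0 && c <= 0xf4)
		return 4;
	return 0;
}

/* Return 1 if s[0..len-1] is structurally well formed, else 0. */
int
utf8_bounded(const unsigned char *s, size_t len)
{
	size_t i = 0, k, n;

	while (i < len) {
		n = seq_len(s[i]);
		/* Bounds check first: never index past s[len - 1]. */
		if (n == 0 || n > len - i)
			return 0;
		for (k = 1; k < n; k++)
			if ((s[i + k] & 0xc0) != 0x80)
				return 0;	/* not a continuation byte */
		i += n;
	}
	return 1;
}
```

Note that the loop condition is i < len rather than i != len, so an
index that ever overshoots len cannot keep the loop running.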
