Re: [Toybox] utf8towc(), stop being defective on null bytes

Rob Landley Mon, 08 Apr 2024 09:48:59 -0700

On 4/8/24 11:01, enh wrote:
>> > Returning length 0 means we hit a null terminator,
>>
>> Null bytes aren't always "terminators". You can embed null bytes into data
>> and still want to do utf8 processing with it.
> 
> that's questionable ... the desire to have ASCII NUL in utf-8
> sequences (without breaking the "utf-8 sequences are usable as c
> strings" property) is the main reason for the existence of "modified
> utf-8".


You don't need a conversion function to grab a NUL byte: you can just check
whether the byte is NUL.
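
A rough sketch of that direct check, for illustration only. decode_one() and
count_codepoints() are hypothetical names, not toybox functions; decode_one()
is assumed to follow the convention quoted above (length 0 for NUL), and it is
deliberately minimal (it does not reject overlong forms or surrogates):

```c
#include <stddef.h>

/* Hypothetical decoder, NOT toybox's utf8towc(). Decodes one sequence from
 * s[0..len) into *cp. Returns bytes consumed, 0 for a NUL byte, or -1 on
 * invalid or truncated input. */
static int decode_one(unsigned *cp, const unsigned char *s, size_t len)
{
  unsigned c;
  int n, i;

  if (!len) return -1;
  c = s[0];
  if (!c) { *cp = 0; return 0; }
  if (c < 0x80) { *cp = c; return 1; }
  if ((c & 0xe0) == 0xc0) { n = 2; c &= 0x1f; }
  else if ((c & 0xf0) == 0xe0) { n = 3; c &= 0x0f; }
  else if ((c & 0xf8) == 0xf0) { n = 4; c &= 0x07; }
  else return -1;
  if (len < (size_t)n) return -1;
  for (i = 1; i < n; i++) {
    if ((s[i] & 0xc0) != 0x80) return -1;
    c = (c << 6) | (s[i] & 0x3f);
  }
  *cp = c;

  return n;
}

/* Count code points in buf[0..len), treating NUL as ordinary data. */
size_t count_codepoints(const char *buf, size_t len, size_t *nuls)
{
  const unsigned char *s = (const unsigned char *)buf;
  size_t i = 0, count = 0;

  *nuls = 0;
  while (i < len) {
    unsigned cp;
    int n;

    if (!s[i]) {
      /* NUL: nothing to convert, so the decoder is never asked about it. */
      (*nuls)++;
      count++;
      i++;
      continue;
    }
    n = decode_one(&cp, s + i, len - i);
    i += (n > 0) ? (size_t)n : 1;  /* step over one byte of invalid input */
    count++;
  }

  return count;
}
```

Because the loop tests the byte itself, the decoder's 0 return never has to be
told apart from "consumed zero bytes".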

That value _is_ a special case, the enclosing loop can deal with it easily
enough (there's nothing to convert, it's a NUL byte, check directly). I've got
functions like regexec0() that work over a range instead of using a NUL, and
those have to deal with libc's regex stopping at NUL so the enclosing loop
advances past it and restarts.
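
The same "advance past the NUL and restart" shape, sketched with strstr()
standing in for libc's regex since both stop at NUL. This is not the actual
regexec0() code; count_in_range() is a hypothetical helper, and it assumes
buf[len] is readable and holds a NUL so strstr() always stops in bounds:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count non-overlapping occurrences of needle in
 * buf[0..len), where buf may contain embedded NUL bytes.
 * Assumption: buf[len] is '\0'. */
size_t count_in_range(const char *buf, size_t len, const char *needle)
{
  size_t count = 0, nlen = strlen(needle);
  const char *s = buf, *end = buf + len;

  if (!nlen) return 0;  /* an empty needle would never advance */

  while (s < end) {
    const char *hit = strstr(s, needle);  /* stops at the next NUL */

    if (hit) {
      count++;
      s = hit + nlen;        /* keep searching the same segment */
    } else {
      s += strlen(s) + 1;    /* advance past the NUL and restart */
    }
  }

  return count;
}
```

A plain strstr() on the same buffer would only ever see the first segment; the
enclosing loop is what makes the range version work.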

Rob
_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net
