On 4/8/24 11:01, enh wrote:
>> > Returning length 0 means we hit a null terminator,
>>
>> Null bytes aren't always "terminators". You can embed null bytes into data
>> and still want to do utf8 processing with it.
>
> that's questionable ... the desire to have ASCII NUL in utf-8
> sequences (without breaking the "utf-8 sequences are usable as c
> strings" property) is the main reason for the existence of "modified
> utf-8".

You don't need a conversion function to grab a NUL byte; you can just check
whether the byte is NUL. That value _is_ a special case, and the enclosing
loop can deal with it easily enough (there's nothing to convert, it's a NUL
byte, so check for it directly).

I've got functions like regexec0() that work over a range instead of
stopping at a NUL terminator, and those have to deal with libc's regex
stopping at NUL, so the enclosing loop advances past it and restarts.

Rob
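
For illustration, here is a rough sketch of the kind of enclosing loop I mean: it walks a buffer of known length, checks for NUL directly, and only calls the converter on everything else. Both decode_utf8() and count_codepoints() are hypothetical names made up for this example (this is not toybox's utf8towc() or any libc function), and the decoder is deliberately minimal: it doesn't reject overlong encodings or surrogates.

```c
#include <stddef.h>

// Hypothetical minimal UTF-8 decoder (NOT toybox's utf8towc()): stores the
// code point in *out and returns the number of bytes consumed, or -1 on an
// invalid or truncated sequence. The loop below never hands it a NUL byte.
static int decode_utf8(const unsigned char *s, size_t len, unsigned *out)
{
  int i, n;
  unsigned c;

  if (!len) return -1;
  c = s[0];
  if (c < 0x80) { *out = c; return 1; }
  else if ((c & 0xe0) == 0xc0) { n = 2; c &= 0x1f; }
  else if ((c & 0xf0) == 0xe0) { n = 3; c &= 0x0f; }
  else if ((c & 0xf8) == 0xf0) { n = 4; c &= 0x07; }
  else return -1;                      // stray continuation or invalid lead
  if ((size_t)n > len) return -1;      // sequence runs off the end
  for (i = 1; i < n; i++) {
    if ((s[i] & 0xc0) != 0x80) return -1;
    c = (c << 6) | (s[i] & 0x3f);
  }
  *out = c;
  return n;
}

// Count code points in a buffer of known length, embedded NULs included.
// Each invalid byte is counted as one unit and skipped.
static size_t count_codepoints(const char *buf, size_t len, size_t *nuls)
{
  const unsigned char *s = (const unsigned char *)buf;
  size_t i = 0, count = 0;
  unsigned cp;
  int n;

  *nuls = 0;
  while (i < len) {
    if (!s[i]) {           // NUL: nothing to convert, handle it right here
      (*nuls)++;
      count++;
      i++;
      continue;
    }
    n = decode_utf8(s + i, len - i, &cp);
    if (n < 0) n = 1;      // resynchronize after a bad byte
    count++;
    i += n;
  }
  return count;
}
```

The point is that the converter's return value never has to encode "this was a NUL": the caller already knows the length of the range, so NUL is just another byte it tests for before converting.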
