On Mon, May 7, 2018 at 5:11 AM Joshua Watt <jpewhac...@gmail.com> wrote:
> IMHO, if you are doing UTF-8 (which you should), you should *always*
> specify any offset in the string as a byte offset. I have a few
> reasons for this justification:
I agree with this as well. I thought some more about how to express my gut feeling on this matter in more technical terms.

UTF-8 is a byte (sequence) representation of Unicode code points. This suggests to me that an offset within a UTF-8-encoded string should also be given in bytes. Specifying the offset in Unicode code points mixes the abstraction of the code point with (one of) its representations as a byte sequence. This shows up in practice: an offset in code points cannot be applied to a UTF-8 string without first scanning the string, whereas a byte offset can be used directly.

Unicode code points do not buy us much either, since what we most likely want are grapheme clusters anyway (which, like any more advanced Unicode processing, should be handled by a specialised library):

http://utf8everywhere.org/#myth.strlen

Cheers,
Silvan

_______________________________________________
wayland-devel mailing list
wayland-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/wayland-devel