Yes, this is precisely my point - 'one or more'. The string-length of a string
with embedded invalid sequences is not guaranteed to be consistent, which seems
like a problem. Decoding up front to ensure all code points are valid - even
those in the undefined sequences - seems like a good idea to prevent secondary
> Question: if there is no translation at all, won't the invalid chars cause
> issues with things like string-length and string-copy procs? That is, since
> the number of octets can't be correctly translated to a number of glyphs,
> there will be some unpleasant side effects.
Converting a
Question: if there is no translation at all, won't the invalid chars cause
issues with things like string-length and string-copy procs? That is, since the
number of octets can't be correctly translated to a number of glyphs, there
will be some unpleasant side effects.
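
To make the octets-versus-length concern concrete, here is a minimal sketch in
Python. It is only an illustration: it uses Python's own "surrogateescape"
error handler (PEP 383), which maps each undecodable byte to a lone surrogate
in U+DC80 - U+DCFF, and is not the implementation under discussion here. The
sample bytes and variable names are made up for the example.

```python
# Illustration only: Python's "surrogateescape" handler (PEP 383), used as a
# stand-in for the idea of marking invalid bytes as U+DC80 - U+DCFF.

# Valid UTF-8 for "café ", then two bytes that can never occur in UTF-8, then "!".
raw = b"caf\xc3\xa9 \xff\xfe!"

# Each undecodable byte 0xNN becomes the single code point U+DC00 + 0xNN.
text = raw.decode("utf-8", errors="surrogateescape")

octets = len(raw)    # length counted in bytes
points = len(text)   # length counted in code points

# The code points that stand in for the invalid bytes.
escaped = [hex(ord(c)) for c in text if 0xDC80 <= ord(c) <= 0xDCFF]

# Encoding with the same handler restores the original bytes unchanged.
roundtrip = text.encode("utf-8", errors="surrogateescape")

print(octets, points, escaped, roundtrip == raw)
```

With a mapping like this the length is at least well-defined (one code point
per invalid byte), though it still counts code points rather than glyphs.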
-elf
On 27 November 2023
> From the unicode-transition page:
>
> > The strategy that I favor in the moment is to handle all string data
> > injected into the system transparently, the actual bytes are unchanged and
> > unexpected UTF-8 bytes are decoded and marked as a U+DC80 - U+DCFF (low,
> > trailing) UTF-16 surrogate