Module Name:    src
Committed By:   riastradh
Date:           Mon Aug 19 16:22:10 UTC 2024

Modified Files:
        src/lib/libc/locale: c32rtomb.c c32rtomb.h
        src/tests/lib/libc/locale: t_c16rtomb.c t_c8rtomb.c

Log Message:
c32rtomb(3): Use conversion state to handle shift sequences.

For conversion of Unicode scalar values to coding systems requiring
shift sequences, such as ISO-2022-JP, _citrus_iconv_convert will
always produce:

1. a shift sequence from the initial state to some nondefault state,
   like from US-ASCII to JIS X 0208
2. the encoding of the desired character
3. a shift sequence restoring the initial state

This is unnecessary if the output is already in the state needed to
encode the desired character.  For example, this method produces
seven bytes to encode each YEN SIGN in ISO-2022-JP -- and fourteen,
to encode two consecutive ones -- even though the shift sequence is
only three bytes long and once shifted YEN SIGN takes only one byte.

Instead, convert the Unicode scalar value to a locale-dependent wide
character and encode that, by composing

- _citrus_iconv_convert
  => gives us a multibyte encoding of the character from the initial
     state (and restoring the initial state afterward)

- mbrtowc with initial conversion state
  => gives us the single wide character representation

  XXX If combining characters are possible here, this may fail.

- wcrtomb with caller's conversion state
  => gives us a state-dependent multibyte encoding of the character

XXX Is there a cheaper way to convert from Unicode scalar value to
locale-dependent wide character?  It is not obvious to me from the
largely undocumented Citrus machinery, but it would obviously be
better than this somewhat circuitous Rube Goldberg contraption of
chained multibyte APIs.
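The last two steps of the composition above can be sketched in
standard C as follows.  This is a minimal illustration, not the code
in c32rtomb.c: the function name reencode_with_state is hypothetical,
and the input buffer mb stands in for whatever _citrus_iconv_convert
produced from the initial state (its interface is not shown here).
An incomplete input sequence is simply treated as an error.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: re-encode one character, given as a multibyte
 * sequence that starts in the initial shift state, using the caller's
 * conversion state *ps.  Returns the number of bytes stored in out,
 * or (size_t)-1 on failure.
 */
size_t
reencode_with_state(const char *mb, size_t mblen, char out[MB_LEN_MAX],
    mbstate_t *ps)
{
	mbstate_t init;
	wchar_t wc;
	size_t n;

	assert(mb != NULL && out != NULL && ps != NULL);

	/* A zeroed mbstate_t describes the initial conversion state. */
	memset(&init, 0, sizeof(init));

	/* Decode to a single locale-dependent wide character. */
	n = mbrtowc(&wc, mb, mblen, &init);
	if (n == (size_t)-1 || n == (size_t)-2)
		return (size_t)-1;

	/*
	 * Encode with the caller's state, so a shift sequence is
	 * emitted only if the output is not already shifted.
	 */
	return wcrtomb(out, wc, ps);
}
```

In a locale with a stateful encoding, calling this twice in a row
with the same *ps should emit the shift sequence at most once; in a
stateless locale it degenerates to a plain round trip.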

PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift
sequences


To generate a diff of this commit:
cvs rdiff -u -r1.3 -r1.4 src/lib/libc/locale/c32rtomb.c
cvs rdiff -u -r1.1 -r1.2 src/lib/libc/locale/c32rtomb.h
cvs rdiff -u -r1.5 -r1.6 src/tests/lib/libc/locale/t_c16rtomb.c
cvs rdiff -u -r1.6 -r1.7 src/tests/lib/libc/locale/t_c8rtomb.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
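
The byte counts quoted in the log message for YEN SIGN can be
reproduced with a toy model.  Everything below is hypothetical
illustration code, not libc: it models only two of the ISO-2022-JP
character sets, and it assumes the usual escape sequences ESC ( J
(JIS X 0201 Roman, where 0x5c is YEN SIGN) and ESC ( B (US-ASCII).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical two-charset model of ISO-2022-JP output. */
enum toy_charset { TOY_ASCII, TOY_JISX0201_ROMAN };

struct toy_out {
	unsigned char buf[32];
	size_t len;
	enum toy_charset cur;	/* charset currently shifted in */
};

static void
toy_put(struct toy_out *o, const char *src, size_t n)
{
	assert(o->len + n <= sizeof(o->buf));
	memcpy(o->buf + o->len, src, n);
	o->len += n;
}

/* Emit a 3-byte escape sequence only if the charset changes. */
static void
toy_shift(struct toy_out *o, enum toy_charset want)
{
	if (o->cur == want)
		return;
	toy_put(o, want == TOY_ASCII ? "\033(B" : "\033(J", 3);
	o->cur = want;
}

/* Old behaviour: shift in, encode, and always shift back out. */
void
toy_yen_wrapped(struct toy_out *o)
{
	toy_shift(o, TOY_JISX0201_ROMAN);
	toy_put(o, "\\", 1);	/* 0x5c: YEN SIGN in JIS X 0201 Roman */
	toy_shift(o, TOY_ASCII);
}

/* New behaviour: shift only when needed, and stay shifted. */
void
toy_yen_stateful(struct toy_out *o)
{
	toy_shift(o, TOY_JISX0201_ROMAN);
	toy_put(o, "\\", 1);
}
```

Starting from a zeroed struct toy_out (US-ASCII shifted in), two
calls to toy_yen_wrapped leave 14 bytes in the buffer, while two
calls to toy_yen_stateful leave 5; returning to US-ASCII at the end
of the string would add 3 more.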