Hi Stuart,

Stuart Henderson wrote on Tue, Jul 11, 2017 at 03:52:26PM +0100:
> On 2017/07/11 16:19, Ingo Schwarze wrote:

>> This decade feels like a strange point in time for degrading fortune
>> and calendar files by replacing UTF-8 characters with ASCII
>> transcriptions.  Maybe such games should call
>>
>>   setlocale(LC_CTYPE, "");
>>   char *loc = nl_langinfo(CODESET);
>>
>> and replace bytes that are not printable ASCII with question marks
>> when loc doesn't contain UTF-8?  Not sure.

> Given that we don't have
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html,
> that seems better to me than either indiscriminately printing UTF-8 to a
> terminal expecting ASCII, or quietly mangling output.
>
> But then, how far should one go? ls(1) can have the same problem with
> an incompatible terminal.

ls(1) already does that:

   $ cd /usr/src/bin/ls/Test/  # containing test files on my notebook
   $ ls
   a??c                            surr?????????
   bel?.                           test.txt
   b??r                            test_wctype
   cr???.                          test_wctype.c
   esc?[4munder                    testfile
   iv??????????????                tmp.txt
   long????????????????????????    wt?]0;rogue_title?.
   np??.                           xff?.
   sh?????????????????????
   $ export LC_CTYPE=en_US.UTF-8
   $ ls
   [ snip UTF-8-output because that doesn't belong in mail ]

Admittedly, what ls(1) does in /usr/src/bin/ls/utf8.c is slightly
more complicated:  It also validates, sanitizes, and columnates
UTF-8 characters in LC_CTYPE=en_US.UTF-8 mode.

For simpler cases like fortune(6) and calendar(6), no validation,
sanitization, or columnation is needed, so we get away without
mbtowc(3) and even without isu8cont().  Just isprint(3) is probably
enough for those, and even that is only needed when the locale is
not UTF-8.

Yours,
  Ingo
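
P.S.  To make the suggestion for fortune(6) and calendar(6) concrete,
here is a minimal sketch, not a tested patch.  The helper names
locale_is_utf8(), sanitize_ascii(), and sanitize_for_locale() are
hypothetical and not taken from the tree.  Two things are assumptions
of mine rather than part of the discussion above: newline and tab are
passed through unchanged so that multi-line text survives, and bytes
above 0x7f are rejected explicitly instead of relying on isprint(3)
alone.

```c
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/*
 * Hypothetical helper: return nonzero if the codeset of the current
 * locale mentions UTF-8.  Only meaningful after setlocale(LC_CTYPE, "").
 */
int
locale_is_utf8(void)
{
	const char *codeset = nl_langinfo(CODESET);

	return codeset != NULL && strstr(codeset, "UTF-8") != NULL;
}

/*
 * Hypothetical helper: replace, in place, every byte that is not
 * printable ASCII with a question mark.  Newline and tab are kept
 * (an assumption made here so that line structure is preserved).
 */
void
sanitize_ascii(char *s)
{
	unsigned char c;

	for (; *s != '\0'; s++) {
		c = (unsigned char)*s;
		if (c == '\n' || c == '\t')
			continue;
		if (c > 0x7f || !isprint(c))
			*s = '?';
	}
}

/*
 * Hypothetical wrapper: leave the text alone in a UTF-8 locale,
 * sanitize it otherwise.
 */
void
sanitize_for_locale(char *s)
{
	if (!locale_is_utf8())
		sanitize_ascii(s);
}
```

A program would call setlocale(LC_CTYPE, "") once at startup and then
pass each line through sanitize_for_locale() before printing it.  Note
that each byte of a multibyte character becomes its own question mark,
which agrees with the ls(1) output shown above.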