Hi Stuart,

Stuart Henderson wrote on Tue, Jul 11, 2017 at 03:52:26PM +0100:
> On 2017/07/11 16:19, Ingo Schwarze wrote:

>> This decade feels like a strange point in time for degrading fortune
>> and calendar files by replacing UTF-8 characters with ASCII
>> transcriptions.  Maybe such games should call
>> 
>>   setlocale(LC_CTYPE, "");
>>   char *loc = nl_langinfo(CODESET);
>> 
>> and replace bytes that are not printable ASCII with question marks
>> when loc doesn't contain UTF-8?  Not sure.

> Given that we don't have
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html,
> that seems better to me than either indiscriminately printing UTF-8 to a
> terminal expecting ASCII, or quietly mangling output.
> 
> But then, how far should one go? ls(1) can have the same problem with
> an incompatible terminal.

ls(1) already does that:

   $ cd /usr/src/bin/ls/Test/   # containing test files on my notebook
   $ ls
  a??c                                    surr?????????
  bel?.                                   test.txt
  b??r                                    test_wctype
  cr???.                                  test_wctype.c
  esc?[4munder                            testfile
  iv??????????????                        tmp.txt
  long????????????????????????            wt?]0;rogue_title?.
  np??.                                   xff?.
  sh?????????????????????
   $ export LC_CTYPE=en_US.UTF-8
   $ ls
  [ snip UTF-8-output because that doesn't belong in mail ]

Admittedly, what ls(1) does in /usr/src/bin/ls/utf8.c is slightly
more complicated:  It also validates, sanitizes, and columnates
UTF-8 characters in LC_CTYPE=en_US.UTF-8 mode.  For simpler cases
like fortune(6) and calendar(6), no validation, sanitization, or
columnation is needed, so we get away without mbtowc(3) and even
without isu8cont().  Just isprint(3) is probably enough for those,
and even that is only needed when the locale is not UTF-8.
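
To make the idea concrete, here is a rough, untested sketch of what
such a filter might look like.  The function names are made up for
illustration and are not taken from ls(1) or any existing game.  It
assumes that in a non-UTF-8 locale isprint(3) accepts printable ASCII
only, and it lets newline and tab through unchanged.

```c
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: nonzero if the user's locale uses UTF-8. */
static int
locale_is_utf8(void)
{
	setlocale(LC_CTYPE, "");
	return strcmp(nl_langinfo(CODESET), "UTF-8") == 0;
}

/*
 * Hypothetical helper: overwrite every byte that isprint(3) rejects
 * with a question mark, except for newline and tab.
 */
static void
sanitize_ascii(char *s)
{
	unsigned char *p;

	for (p = (unsigned char *)s; *p != '\0'; p++)
		if (!isprint(*p) && *p != '\n' && *p != '\t')
			*p = '?';
}

/*
 * Hypothetical entry point: print one line of a data file,
 * sanitizing it first unless the locale is UTF-8.
 */
void
print_line(char *line)
{
	static int utf8 = -1;	/* -1 means not yet determined */

	if (utf8 == -1)
		utf8 = locale_is_utf8();
	if (!utf8)
		sanitize_ascii(line);
	fputs(line, stdout);
}
```

A game would pass each line through print_line() right before output.
In UTF-8 mode the bytes go out untouched, which is only reasonable
if the data files are trusted to contain valid UTF-8.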

Yours,
  Ingo
