Hi, Ingo

• Ingo Schwarze [2023-09-20 13:55]:
Hi Kirill,

Kirill Miazine wrote on Wed, Sep 20, 2023 at 12:52:52PM +0200:

you may not even need -m, and instead inspect LC_CTYPE environment
variable and add appropriate headers for UTF-8. according to locale(1),
LC_CTYPE may be set to indicate UTF-8:

If the value of LC_CTYPE ends in ‘.UTF-8’, programs in the OpenBSD base
system ignore the beginning of it, treating for example zh_CN.UTF-8
exactly like en_US.UTF-8.

This is definitely very bad advice

I am sorry! I was thinking that a user who sets an appropriate LC_CTYPE is thereby telling programs that input and output are UTF-8, and that this could be used instead of a flag. As per locale(1), the presence of .UTF-8 in LC_CTYPE is an instruction to treat input and output as UTF-8 encoded text:

"The character encoding locale LC_CTYPE instructs programs which character encoding to assume for text input and to use for text output."

After all, LC_CTYPE=en_US.UTF-8 has to be set by the user and thus signals a preference to programs, so it wouldn't be unexpected or surprising to treat text as UTF-8 and also set the appropriate MIME headers. By setting LC_CTYPE=en_US.UTF-8, the user says, according to locale(1), that input text is UTF-8, and mail(1) would then have to make sure that the text is transmitted properly.
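
A minimal sketch of the kind of check described here, in Python purely for illustration (mail(1) itself is written in C). Both function names are hypothetical, nothing like this exists in mail(1), and the exact set of headers is an assumption about what "appropriate MIME headers" would mean:

```python
import os


def locale_requests_utf8(environ=None):
    """Return True if the effective character encoding locale ends in '.UTF-8'.

    Hypothetical helper: follows the usual precedence
    LC_ALL > LC_CTYPE > LANG and treats empty values as unset.
    """
    env = os.environ if environ is None else environ
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            # Only the suffix matters; zh_CN.UTF-8 and en_US.UTF-8
            # are treated alike.
            return value.endswith(".UTF-8")
    return False


def mime_headers(use_utf8):
    """Return the extra header lines to add (assumed set, not mail(1) behavior)."""
    if not use_utf8:
        return []
    return [
        "MIME-Version: 1.0",
        "Content-Type: text/plain; charset=utf-8",
        "Content-Transfer-Encoding: 8bit",
    ]
```

For example, `mime_headers(locale_requests_utf8())` would return the three headers when run with LC_CTYPE=en_US.UTF-8 and an empty list when run with LC_CTYPE=C.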

Whether the user uses an UTF-8 locale for their shell and terminal
has nothing to do with whether they want to send UTF-8 encoded
mail with MIME headers. For example, i'm using LC_CTYPE=en_US.UTF-8
for my shells and terminals most of the time, but i do not want the
low-level mail(1) MUA to suddenly start sending UTF-8 mail without
being specifically asked to.

My understanding of the purpose of LC_CTYPE was that by setting it, the user specifically asks programs to treat input as UTF-8, and the programs then have to handle encoding appropriately. So I wouldn't be surprised if mail(1) started sending UTF-8 mail with LC_CTYPE=en_US.UTF-8. In fact, I'd be happy if it did so.

I just checked - even though i'm using the higher-level mutt(1) MUA
most of the time and even though the shell i'm starting mutt(1) from
has LC_CTYPE=C.UTF-8 set on that particular machine, the last sixteen
mails i sent all contained the explicit header

   Content-Type: text/plain; charset=us-ascii

and intentionally so.  Yes, i do occasionally send UTF-8 mail on
purpose, mostly in highly technical messages that need to display
particular Unicode characters in addition to mentioning their
codepoints in the U+[XX]XXXX form, and rarely, sending UTF-8 happens
inadvertently because mutt(1) contains some weird autodetection logic -
but what you set your terminal to and what you use for sending mail
are clearly completely unrelated topics.

Mutt does indeed have logic to determine which character set a text can be converted into: it tries US-ASCII, then ISO-8859-1, and then UTF-8.
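
The fallback order above can be sketched roughly as follows. This is an illustration in Python, not mutt's actual code; the function name `pick_charset` is hypothetical, and the candidate list is simply the order mentioned above:

```python
def pick_charset(text, candidates=("us-ascii", "iso-8859-1", "utf-8")):
    """Return the first candidate charset that can encode all of `text`.

    Hypothetical helper illustrating the "try the narrowest charset
    first" idea; returns None if no candidate fits.
    """
    for charset in candidates:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            # Some character is not representable; try the next one.
            continue
        return charset
    return None
```

With this order, a message containing only ASCII is labelled us-ascii, and a single character outside ISO-8859-1 is enough to push the whole message to utf-8, which would explain UTF-8 being sent inadvertently.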

Yours,
   Ingo

