Hi, Ingo
• Ingo Schwarze [2023-09-20 13:55]:
Hi Kirill,
Kirill Miazine wrote on Wed, Sep 20, 2023 at 12:52:52PM +0200:
you may not even need -m, and could instead inspect the LC_CTYPE
environment variable and add appropriate headers for UTF-8. according to
locale(1), LC_CTYPE may be set to indicate UTF-8:
If the value of LC_CTYPE ends in ‘.UTF-8’, programs in the OpenBSD base
system ignore the beginning of it, treating for example zh_CN.UTF-8
exactly like en_US.UTF-8.
This is definitely very bad advice
I am sorry! I was thinking that a user who sets an appropriate LC_CTYPE
is thereby telling programs that input and output are UTF-8, and that
this could be used instead of a flag. As per locale(1), the presence
of .UTF-8 in LC_CTYPE is an instruction to treat input and output as
UTF-8 encoded text:
"The character encoding locale LC_CTYPE instructs programs which
character encoding to assume for text input and to use for text output."
After all, LC_CTYPE=en_US.UTF-8 has to be set by a user and thus signals
a preference to programs, and thus it wouldn't be unexpected or
surprising to treat text as UTF-8 and also set appropriate MIME-headers.
After all, by setting LC_CTYPE=en_US.UTF-8 -- according to locale(1) --
the user says that input text is UTF-8, and then mail(1) would have to
figure out how to make sure that the text is transmitted properly.
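
As a rough illustration of that idea (a sketch only, in Python rather
than C, and not mail(1) code): the helper names wants_utf8 and
content_type_for are hypothetical, and only LC_CTYPE is inspected, as
in the locale(1) excerpt above. A real implementation would presumably
also have to consider LC_ALL and LANG.

```python
import os


def wants_utf8(env=os.environ):
    """Return True if LC_CTYPE ends in '.UTF-8' (the rule quoted from locale(1)).

    Hypothetical helper: only LC_CTYPE is consulted; LC_ALL and LANG are ignored.
    """
    return env.get("LC_CTYPE", "").endswith(".UTF-8")


def content_type_for(env=os.environ):
    """Build the Content-Type header line implied by the locale (assumption)."""
    charset = "utf-8" if wants_utf8(env) else "us-ascii"
    return f"Content-Type: text/plain; charset={charset}"
```

With LC_CTYPE=zh_CN.UTF-8 or en_US.UTF-8 this yields a utf-8 header;
with LC_CTYPE unset or set to C it falls back to us-ascii.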
Whether the user uses a UTF-8 locale for their shell and terminal
has nothing to do with whether they want to send UTF-8 encoded
mail with MIME headers. For example, i'm using LC_CTYPE=en_US.UTF-8
for my shells and terminals most of the time, but i do not want the
low-level mail(1) MUA to suddenly start sending UTF-8 mail without
being specifically asked to.
My understanding of the purpose of LC_CTYPE was that by setting it, the user
specifically asks to treat input as UTF-8, and then the programs have to
handle encoding appropriately. So I wouldn't be surprised if mail(1)
started sending UTF-8 mail with LC_CTYPE=en_US.UTF-8. In fact, I'd be
happy if it did so.
I just checked - even though i'm using the higher-level mutt(1) MUA
most of the time and even though the shell i'm starting mutt(1) from
has LC_CTYPE=C.UTF-8 set on that particular machine, the last sixteen
mails i sent all contained the explicit header
Content-Type: text/plain; charset=us-ascii
and intentionally so. Yes, i do occasionally send UTF-8 mail on
purpose, mostly in highly technical messages that need to display
particular Unicode characters in addition to mentioning their
codepoints in the U+[XX]XXXX form, and rarely, sending UTF-8 happens
inadvertently because mutt(1) contains some weird autodetection logic -
but what you set your terminal to and what you use for sending mail
are clearly completely unrelated topics.
Mutt does indeed have logic to see which character set a text can be
converted into: it tries US-ASCII, then ISO-8859-1 and then UTF-8.
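
The fallback order described above can be sketched like this. It is an
illustration in Python, not mutt's actual code; the function name
pick_charset is made up, and the candidate list simply mirrors the
order given above.

```python
def pick_charset(text, candidates=("us-ascii", "iso-8859-1", "utf-8")):
    """Return the first charset in `candidates` that can encode `text`.

    Hypothetical helper: tries each candidate in order and skips any
    charset that cannot represent every character of the text.
    """
    for charset in candidates:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue  # this charset cannot represent the text, try the next
        return charset
    raise ValueError("no candidate charset can encode the text")
```

Under this scheme a plain ASCII body stays us-ascii regardless of the
locale, and UTF-8 is only chosen once the text contains characters that
neither US-ASCII nor ISO-8859-1 can represent.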
Yours,
Ingo