Hi,

I checked the following points:
* Even though RFC 2049 section 2, bullet point 1, only *requires* MIME-conformant MUAs to always write the header "MIME-Version: 1.0" (and mail(1) is most certainly not MIME-conformant), RFC 2049 section 2, bullet point 8, explicitly *recommends* that even non-MIME MUAs always set appropriate MIME headers. RFC 2046 section 4.1.2, paragraph 8, also "strongly" recommends the explicit inclusion of a "charset" parameter, even for us-ascii. Consequently, I believe that when sending a message in US-ASCII, mail(1) should include these headers:

    MIME-Version: 1.0
    Content-Transfer-Encoding: 7bit
    Content-Type: text/plain; charset=us-ascii

* Adding a "Content-Transfer-Encoding: ..." header is indeed required for sending UTF-8 messages, see RFC 2049 section 2, bullet point 2. "8bit" is one of the valid values that MUAs must support for receiving messages by default. Using it seems sane because it is the most likely to work with receiving MUAs that are not MIME-conformant, like our mail(1) itself. Nowadays, I think that is a bigger concern than MTAs that are not 8-bit clean, in particular when maintaining a low-level program like our mail(1). Consequently, I think 8bit is indeed better for our mail(1) than quoted-printable or base64.

* Adding "Content-Type: text/plain; charset=utf-8" is required by RFC 2049 section 2, bullet point 4 (for the simplest kind of UTF-8 encoded messages).

* The Content-Disposition: header is defined in RFC 2183, clearly optional, and not useful in single-part messages. Consequently, mail(1) should not write it.

So apart from writing the headers for us-ascii, I think you are almost there.

Given that the charset cannot be inferred from the environment, and that setting it per system or per user in a configuration file is also inadequate (it shouldn't be uncommon for users to sometimes send US-ASCII and sometimes UTF-8 mail), I think that a new option is indeed needed.
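To make the two header sets above concrete, here is a minimal sketch of a single code path emitting them for either charset. The helper name put_mime_headers() and its charset argument are hypothetical and not part of Walter's patch; the sketch only assumes that the charset is either "us-ascii" (the default) or "utf-8" (selected by the new option).

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch under review:
 * write the MIME headers for a single-part text/plain message.
 * charset is assumed to be either "us-ascii" (the default) or
 * "utf-8" (when the proposed option is given).
 * Returns 0 on success, -1 if writing to fo failed.
 */
static int
put_mime_headers(FILE *fo, const char *charset)
{
	/*
	 * US-ASCII text is 7-bit by definition; UTF-8 text is passed
	 * through unchanged, so it has to be labelled as 8bit.
	 */
	const char *cte = strcmp(charset, "us-ascii") == 0 ? "7bit" : "8bit";

	if (fprintf(fo, "MIME-Version: 1.0\n"
	    "Content-Type: text/plain; charset=%s\n"
	    "Content-Transfer-Encoding: %s\n", charset, cte) < 0)
		return -1;
	return 0;
}
```

Deriving the transfer encoding from the charset keeps the two cases from drifting apart, so the default case gets its headers from the same place as the UTF-8 case.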
Regarding the naming of the option, compatibility with POSIX

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/mailx.html

is paramount, which kills the tentative idea to use -u for "UTF-8", because -u already means "user". Compatibility with other mailx(1) implementations is also a consideration. See, for example,

https://linux.die.net/man/1/mail

where -m is indeed among the very few options still available. I would document it with a focus on a "multibyte character encoding" mnemonic. The "mime" mnemonic feels far too broad because MIME can be used for lots of other purposes besides specifying a character encoding.

The -m option is also free here:

https://man.freebsd.org/cgi/man.cgi?query=mail(1)
https://man.netbsd.org/mail.1
https://docs.oracle.com/cd/E88353_01/html/E37839/mailx-1.html
https://www.ibm.com/docs/en/aix/7.3?topic=m-mail-command-1

None of those appears to support command-line selection of the character set for sending mail, so I don't see any immediate logic clashes either.

The -m option does clash with this one:

https://www.sdaoden.eu/code-nail.html

But I think that clash can be disregarded.

The mailx(1) documented on die.net appears to be the Heirloom one. It does not have an option to select sending US-ASCII or UTF-8. Instead, it has a "sendcharsets" configuration variable. That's clearly overengineering, and even when hardcoding the equivalent of sendcharsets=utf-8, which is also the default, the behaviour is nasty because it silently switches to UTF-8 as soon as a non-ASCII character appears in the input. I think that, at least in interactive mode, explicit confirmation from the user would be required before sending UTF-8, writing dead.letter instead if the user declines, such that they can clean up the file and try again.
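Detecting the situation described above, input that stops being plain US-ASCII, only takes a scan for bytes with the high bit set. The sketch below uses the hypothetical name is_7bit_clean() and works on an in-memory buffer; how mail(1) actually holds the message body is not modelled here. A 0 result is where a confirmation prompt, or a refusal to send without the option, would have to come in.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch: return 1 if the len bytes
 * at buf are all 7-bit (0x00..0x7f), i.e. plain US-ASCII, and 0 as soon
 * as a byte with the high bit set is found.
 */
static int
is_7bit_clean(const unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (buf[i] & 0x80)
			return 0;
	return 1;
}
```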
That would certainly be more complicated than requiring an option up front, not only from the implementation perspective, but arguably also from the user perspective. So unless other developers think this should be fully automatic with confirmation rather than controlled by an option, I suggest staying with Walter's idea of using an option.

> Index: extern.h
> ===================================================================
> RCS file: /cvs/src/usr.bin/mail/extern.h,v
> retrieving revision 1.29
> diff -u -p -r1.29 extern.h
> --- extern.h	16 Sep 2018 02:38:57 -0000	1.29
> +++ extern.h	20 Sep 2023 10:44:41 -0000
> @@ -261,3 +261,4 @@ int writeback(FILE *);
>  extern char *__progname;
>  extern char *tmpdir;
>  extern const struct cmd *com;	/* command we are running */
> +extern char mime;	/* Add MIME headers */

Probably not the best mnemonic name.

> Index: mail.1
> ===================================================================
> RCS file: /cvs/src/usr.bin/mail/mail.1,v
> retrieving revision 1.83
> diff -u -p -r1.83 mail.1
> --- mail.1	31 Mar 2022 17:27:25 -0000	1.83
> +++ mail.1	20 Sep 2023 10:44:41 -0000
> @@ -40,7 +40,7 @@
>  .Sh SYNOPSIS
>  .Nm mail
>  .Bk -words
> -.Op Fl dEIinv
> +.Op Fl dEIimnv
>  .Op Fl b Ar list
>  .Op Fl c Ar list
>  .Op Fl r Ar from-addr
> @@ -106,6 +106,8 @@ on noisy phone lines.
>  .It Fl N
>  Inhibits initial display of message headers
>  when reading mail or editing a mail folder.
> +.It Fl m
> +Add MIME headers to send UTF-8 encoded messages.

Maybe s/messages/plain text messages/ ?

This should probably also say that the input text is expected to already be UTF-8 encoded, and that neither re-encoding of the input nor any content transfer encoding (such as quoted-printable or base64) takes place; the header "Content-Transfer-Encoding: 8bit" is simply set instead.

Should we also say, either here or below .Ss Sending mail, that mail is sent using the US-ASCII encoding by default, and that the input text is then expected to contain only 7-bit ascii(7) characters?
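The suggested wording assumes the input is already valid UTF-8, and the patch does not verify that. Purely to illustrate what "already UTF-8 encoded" means byte by byte, here is a sketch of a strict well-formedness check following RFC 3629, rejecting overlong forms, UTF-16 surrogates and code points above U+10FFFF. The name is_valid_utf8() is hypothetical; the patch contains no such check.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch: return 1 if the len bytes
 * at s form well-formed UTF-8 as defined by RFC 3629, 0 otherwise.
 * Overlong encodings, UTF-16 surrogates (U+D800..U+DFFF) and code
 * points above U+10FFFF are all rejected.
 */
static int
is_valid_utf8(const unsigned char *s, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = s[i];
		unsigned char lo = 0x80, hi = 0xbf;	/* allowed 2nd byte */
		size_t n, k;			/* n: number of continuation bytes */

		if (c < 0x80) {			/* ASCII */
			i++;
			continue;
		} else if (c >= 0xc2 && c <= 0xdf) {
			n = 1;
		} else if (c >= 0xe0 && c <= 0xef) {
			n = 2;
			if (c == 0xe0)
				lo = 0xa0;	/* overlong 3-byte form */
			else if (c == 0xed)
				hi = 0x9f;	/* surrogates */
		} else if (c >= 0xf0 && c <= 0xf4) {
			n = 3;
			if (c == 0xf0)
				lo = 0x90;	/* overlong 4-byte form */
			else if (c == 0xf4)
				hi = 0x8f;	/* beyond U+10FFFF */
		} else
			return 0;	/* 0x80..0xc1 or 0xf5..0xff as lead byte */

		if (len - i <= n)
			return 0;	/* sequence truncated at end of input */
		if (s[i + 1] < lo || s[i + 1] > hi)
			return 0;
		for (k = 2; k <= n; k++)
			if ((s[i + k] & 0xc0) != 0x80)
				return 0;
		i += n + 1;
	}
	return 1;
}
```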
Probably, when -m is not requested, mail(1) should refuse to send a message that is not 7-bit clean, which might require writing a dead.letter file. But that's clearly a topic for a different patch.

>  .It Fl n
>  Inhibits reading
>  .Pa /etc/mail.rc
> Index: main.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/mail/main.c,v
> retrieving revision 1.35
> diff -u -p -r1.35 main.c
> --- main.c	26 Jan 2021 18:21:47 -0000	1.35
> +++ main.c	20 Sep 2023 10:44:41 -0000
> @@ -79,6 +79,8 @@ int realscreenheight; /* the real scree
>  int uflag;	/* Are we in -u mode? */
>  sigset_t intset;	/* Signal set that is just SIGINT */
>  
> +char mime = 0;	/* Add MIME headers */
> +

Again, probably not the best mnemonic name.

>  /*
>   * The pointers for the string allocation routines,
>   * there are NSPACE independent areas.
> @@ -136,7 +138,7 @@ main(int argc, char **argv)
>  	smopts = NULL;
>  	fromaddr = NULL;
>  	subject = NULL;
> -	while ((i = getopt(argc, argv, "EINb:c:dfinr:s:u:v")) != -1) {
> +	while ((i = getopt(argc, argv, "EINb:c:dfimnr:s:u:v")) != -1) {
>  		switch (i) {
>  		case 'u':
>  			/*
> @@ -171,6 +173,10 @@ main(int argc, char **argv)
>  			 */
>  			subject = optarg;
>  			break;
> +		case 'm':
> +			/* Add MIME headers */
> +			mime = 1;
> +			break;
>  		case 'f':
>  			/*
>  			 * User is specifying file to "edit" with Mail,
> @@ -337,7 +343,7 @@ __dead void
>  usage(void)
>  {
>  
> -	fprintf(stderr, "usage: %s [-dEIinv] [-b list] [-c list] "
> +	fprintf(stderr, "usage: %s [-dEIimnv] [-b list] [-c list] "
>  	    "[-r from-addr] [-s subject] to-addr ...\n", __progname);
>  	fprintf(stderr, " %s [-dEIiNnv] -f [file]\n", __progname);
>  	fprintf(stderr, " %s [-dEIiNnv] [-u user]\n", __progname);
> Index: send.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/mail/send.c,v
> retrieving revision 1.26
> diff -u -p -r1.26 send.c
> --- send.c	8 Mar 2023 04:43:11 -0000	1.26
> +++ send.c	20 Sep 2023 10:44:41 -0000
> @@ -525,6 +525,8 @@ puthead(struct header *hp, FILE *fo, int
>  		fmt("To:", hp->h_to, fo, w&GCOMMA), gotcha++;
>  	if (hp->h_subject != NULL && w & GSUBJECT)
>  		fprintf(fo, "Subject: %s\n", hp->h_subject), gotcha++;
> +	if (mime)
> +		fprintf(fo, "MIME-Version: 1.0\nContent-Type: text/plain;
> charset=utf-8\nContent-Transfer-Encoding: 8bit\n"), gotcha++;
>  	if (hp->h_cc != NULL && w & GCC)
>  		fmt("Cc:", hp->h_cc, fo, w&GCOMMA), gotcha++;
>  	if (hp->h_bcc != NULL && w & GBCC)

This probably needs some output for the case without -m, too, and I would write the MIME headers after Cc: and Bcc:.

Yours,
Ingo