On Wed, Sep 20, 2023 at 05:30:08PM +0200, Ingo Schwarze wrote:
> Hi,
>
> i checked the following points:
>
>  * Even though RFC 2049 section 2 bullet point 1 only *requires*
>    MIME-conformant MUAs to always write the header "MIME-Version:
>    1.0" - and mail(1) is most certainly not MIME-conformant - RFC 2049
>    section 2 bullet point 8 explicitly *recommends* that even non-MIME
>    MUAs always set appropriate MIME headers.  RFC 2046 section 4.1.2
>    paragraph 8 also "strongly" recommends the explicit inclusion of a
>    "charset" parameter even for us-ascii.
>
>    Consequently, i believe that when sending a message in US-ASCII,
>    mail(1) should include these headers:
>
>      MIME-Version: 1.0
>      Content-Transfer-Encoding: 7bit
>      Content-Type: text/plain; charset=us-ascii

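A minimal sketch of how both header sets (US-ASCII and UTF-8) could come
from one place.  The helper format_mime_headers() and its utf8 flag are
hypothetical and do not exist in mail(1); the header order simply follows
the list quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of mail(1): format the three MIME
 * headers into buf.  A non-zero utf8 selects utf-8/8bit, otherwise
 * us-ascii/7bit.  Returns the snprintf(3) result, that is, the length
 * the complete text needs, or -1 on error.
 */
static int
format_mime_headers(char *buf, size_t len, int utf8)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len,
	    "MIME-Version: 1.0\n"
	    "Content-Transfer-Encoding: %s\n"
	    "Content-Type: text/plain; charset=%s\n",
	    utf8 ? "8bit" : "7bit",
	    utf8 ? "utf-8" : "us-ascii");
}
```

A caller would check that the return value is smaller than the buffer
size and then write buf to the output stream with fputs(3).
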
I already thought about adding this; it's what Mutt does by default.
But I thought Ingo was going to scold me for complicating things. :-)

>
>  * Adding a "Content-Transfer-Encoding: ..." header is indeed required
>    for sending UTF-8 messages, see RFC 2049 section 2 bullet point 2.
>    "8bit" is one of the valid values that MUAs must support for
>    receiving messages by default.
>    Using it seems sane because it is most likely to work with receiving
>    MUAs that are not MIME-conformant, like our mail(1) itself.
>    I think nowadays, that's a bigger concern than MTAs that are not
>    8-bit clean, in particular when maintaining a low-level program
>    like our mail(1).
>    Consequently, i think using 8bit is indeed better for our mail(1)
>    than quoted-printable or base64.

Well, this also saves you the conversion, especially with the subject,
which is tricky.

>
>  * Adding "Content-Type: text/plain; charset=utf-8" is required by
>    RFC 2049 section 2 bullet point 4 (for the simplest kind of UTF-8
>    encoded messages).
>
>  * The Content-Disposition: header is defined in RFC 2183, clearly
>    optional, and not useful in single-part messages.  Consequently,
>    mail(1) should not write it.

Yeah, I read that, that's why I didn't add that header.

>
> So apart from writing the headers for us-ascii, i think you are
> almost there.
>
> Given that the charset cannot be inferred from the environment
> and that setting it per-system or per-user in a configuration file
> is also inadequate - it shouldn't be uncommon for users to sometimes
> send US-ASCII and sometimes UTF-8 mail - i think that a new option
> is indeed needed.
>
> Regarding the naming of the option, compatibility with POSIX
>   https://pubs.opengroup.org/onlinepubs/9699919799/utilities/mailx.html
> is paramount, which kills the tentative idea to use -u for "UTF-8"
> because -u already means "user".
>
> Compatibility with other mailx(1) implementations is also a
> consideration.
> See, for example,
>   https://linux.die.net/man/1/mail
> and -m is indeed among the very few options still available over there.
> I would document it focussing on a "multibyte character encoding"
> mnemonic.  The "mime" mnemonic feels far too broad because MIME can
> be used for lots of other purposes besides specifying a character
> encoding.
>
> The -m option is also free here:
>   https://man.freebsd.org/cgi/man.cgi?query=mail(1)
>   https://man.netbsd.org/mail.1
>   https://docs.oracle.com/cd/E88353_01/html/E37839/mailx-1.html
>   https://www.ibm.com/docs/en/aix/7.3?topic=m-mail-command-1
> None of those appears to support command line selection of the
> character set for sending mail, so i don't see any immediate
> logic clashes either.
>
> The -m option does clash with this one:
>   https://www.sdaoden.eu/code-nail.html
> But i think dismissing Steffen Daode Nurpmeso as a lunatic is obviously
> the way to go.  Try to listen to that person and you will never get
> anything done.
>
> The mailx(1) documented on die.net appears to be the Heirloom one.
> It does not have an option to select sending US-ASCII or UTF-8.
> Instead, it has a "sendcharsets" configuration variable.  That's
> clearly overengineering, but even when hardcoding the equivalent of
>
>   sendcharsets=utf-8
>
> which is also the default, that's nasty because it silently switches to
> UTF-8 as soon as a non-ASCII character appears in the input.  I think
> at least in interactive mode, explicit confirmation from the user would
> be required to send UTF-8, instead writing dead.letter if the user
> rejects the request, such that they can clean up the file and try again.
>
> That would certainly be more complicated than requiring an option
> up front, not only from the implementation perspective, but arguably
> also from the user perspective.
> So unless other developers think this
> should be fully automatic with confirmation rather than controlled
> by an option, i suggest staying with Walter's idea of using an option.

Now I was investigating exactly that :-) (like Mutt also does): making
mail(1) automatically set the appropriate MIME headers when it detects
any UTF-8 characters in the body text.  So, you don't like this idea?

>
>
> > Index: extern.h
> > ===================================================================
> > RCS file: /cvs/src/usr.bin/mail/extern.h,v
> > retrieving revision 1.29
> > diff -u -p -r1.29 extern.h
> > --- extern.h	16 Sep 2018 02:38:57 -0000	1.29
> > +++ extern.h	20 Sep 2023 10:44:41 -0000
> > @@ -261,3 +261,4 @@ int writeback(FILE *);
> >  extern char *__progname;
> >  extern char *tmpdir;
> >  extern const struct cmd *com; /* command we are running */
> > +extern char mime; /* Add MIME headers */
>
> Likely not best mnemonic naming.
>
> > Index: mail.1
> > ===================================================================
> > RCS file: /cvs/src/usr.bin/mail/mail.1,v
> > retrieving revision 1.83
> > diff -u -p -r1.83 mail.1
> > --- mail.1	31 Mar 2022 17:27:25 -0000	1.83
> > +++ mail.1	20 Sep 2023 10:44:41 -0000
> > @@ -40,7 +40,7 @@
> >  .Sh SYNOPSIS
> >  .Nm mail
> >  .Bk -words
> > -.Op Fl dEIinv
> > +.Op Fl dEIimnv
> >  .Op Fl b Ar list
> >  .Op Fl c Ar list
> >  .Op Fl r Ar from-addr
> > @@ -106,6 +106,8 @@ on noisy phone lines.
> >  .It Fl N
> >  Inhibits initial display of message headers
> >  when reading mail or editing a mail folder.
> > +.It Fl m
> > +Add MIME headers to send UTF-8 encoded messages.
>
> Maybe s/messages/plain text messages/ ?
>
> This should probably also say that the input text is supposed to
> already be UTF-8 encoded and that neither a re-encoding of the input
> nor a Content-Transfer-Encoding takes place, instead simply setting
> Content-Transfer-Encoding: 8bit.
>
> Should we also say - either here or below .Ss Sending mail - that mail
> is sent using the US-ASCII encoding by default, and that the input text
> is supposed to only contain 7-bit ascii(7) characters in that case?
>
> Probably, when -m is not requested, mail(1) should refuse sending
> a message that is not 7-bit clean, which may possibly require writing
> a dead.letter file.  But that's clearly a topic for a different patch.
>
> >  .It Fl n
> >  Inhibits reading
> >  .Pa /etc/mail.rc
> > Index: main.c
> > ===================================================================
> > RCS file: /cvs/src/usr.bin/mail/main.c,v
> > retrieving revision 1.35
> > diff -u -p -r1.35 main.c
> > --- main.c	26 Jan 2021 18:21:47 -0000	1.35
> > +++ main.c	20 Sep 2023 10:44:41 -0000
> > @@ -79,6 +79,8 @@ int realscreenheight; /* the real scree
> >  int uflag; /* Are we in -u mode? */
> >  sigset_t intset; /* Signal set that is just SIGINT */
> >
> > +char mime = 0; /* Add MIME headers */
> > +
>
> Again, likely not the best mnemonic.
>
> >  /*
> >   * The pointers for the string allocation routines,
> >   * there are NSPACE independent areas.
> > @@ -136,7 +138,7 @@ main(int argc, char **argv)
> >  	smopts = NULL;
> >  	fromaddr = NULL;
> >  	subject = NULL;
> > -	while ((i = getopt(argc, argv, "EINb:c:dfinr:s:u:v")) != -1) {
> > +	while ((i = getopt(argc, argv, "EINb:c:dfimnr:s:u:v")) != -1) {
> >  		switch (i) {
> >  		case 'u':
> >  			/*
> > @@ -171,6 +173,10 @@ main(int argc, char **argv)
> >  			 */
> >  			subject = optarg;
> >  			break;
> > +		case 'm':
> > +			/* Add MIME headers */
> > +			mime = 1;
> > +			break;
> >  		case 'f':
> >  			/*
> >  			 * User is specifying file to "edit" with Mail,
> > @@ -337,7 +343,7 @@ __dead void
> >  usage(void)
> >  {
> >
> > -	fprintf(stderr, "usage: %s [-dEIinv] [-b list] [-c list] "
> > +	fprintf(stderr, "usage: %s [-dEIimnv] [-b list] [-c list] "
> >  	    "[-r from-addr] [-s subject] to-addr ...\n", __progname);
> >  	fprintf(stderr, " %s [-dEIiNnv] -f [file]\n", __progname);
> >  	fprintf(stderr, " %s [-dEIiNnv] [-u user]\n", __progname);
> > Index: send.c
> > ===================================================================
> > RCS file: /cvs/src/usr.bin/mail/send.c,v
> > retrieving revision 1.26
> > diff -u -p -r1.26 send.c
> > --- send.c	8 Mar 2023 04:43:11 -0000	1.26
> > +++ send.c	20 Sep 2023 10:44:41 -0000
> > @@ -525,6 +525,8 @@ puthead(struct header *hp, FILE *fo, int
> >  		fmt("To:", hp->h_to, fo, w&GCOMMA), gotcha++;
> >  	if (hp->h_subject != NULL && w & GSUBJECT)
> >  		fprintf(fo, "Subject: %s\n", hp->h_subject), gotcha++;
> > +	if (mime)
> > +		fprintf(fo, "MIME-Version: 1.0\nContent-Type: text/plain; charset=utf-8\nContent-Transfer-Encoding: 8bit\n"), gotcha++;
> >  	if (hp->h_cc != NULL && w & GCC)
> >  		fmt("Cc:", hp->h_cc, fo, w&GCOMMA), gotcha++;
> >  	if (hp->h_bcc != NULL && w & GBCC)
>
> This probably needs some output for the case without -m, too,
> and i would write the MIME headers after Cc: and Bcc:, i think.
>
> Yours,
>   Ingo

-- 
Walter
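
On the detection question discussed above (automatic selection versus
refusing non-ASCII input when -m is absent), both variants need the same
building block: a check whether the text is 7-bit clean.  A minimal
sketch, assuming a hypothetical is_7bit_clean() that is not part of
mail(1).  It only looks for bytes with the high bit set and makes no
attempt to verify that such bytes form well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of mail(1): return 1 if none of the
 * len bytes at s has the high bit set, 0 otherwise.
 */
static int
is_7bit_clean(const char *s, size_t len)
{
	size_t i;

	assert(s != NULL || len == 0);
	for (i = 0; i < len; i++)
		if ((unsigned char)s[i] & 0x80)
			return 0;
	return 1;
}
```

Without -m, a zero result would be the point at which to refuse sending;
in an automatic scheme it would select the UTF-8 headers instead.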