On Wed, Sep 20, 2023 at 05:30:08PM +0200, Ingo Schwarze wrote:
> Hi,
> 
> i checked the following points:
> 
>  * Even though RFC 2049 section 2 bullet point 1 only *requires*
>    MIME-conformant MUAs to always write the header "MIME-Version:
>    1.0" - and mail(1) is most certainly not MIME-conformant - RFC 2049
>    section 2 bullet point 8 explicitly *recommends* that even non-MIME
>    MUAs always set appropriate MIME headers.  RFC 2046 section 4.1.2
>    paragraph 8 also "strongly" recommends the explicit inclusion of a
>    "charset" parameter even for us-ascii.
> 
>    Consequently, i believe that when sending a message in US-ASCII,
>    mail(1) should include these headers:
> 
>    MIME-Version: 1.0
>    Content-Transfer-Encoding: 7bit
>    Content-Type: text/plain; charset=us-ascii

I already thought about adding this; it's what Mutt does by default. But
I thought Ingo was going to scold me for complicating things. :-)

> 
>  * Adding a "Content-Transfer-Encoding: ..." header is indeed required
>    for sending UTF-8 messages, see RFC 2049 section 2 bullet point 2.
>    "8bit" is one of the valid values that MUAs must support for
>    receiving messages by default.
>    Using it seems sane because it is most likely to work with receiving
>    MUAs that are not MIME-conformant, like our mail(1) itself.
>    I think nowadays, that's a bigger concern than MTAs that are not
>    8-bit clean, in particular when maintaining a low-level program
>    like our mail(1).
>    Consequently, i think using 8bit is indeed better for our mail(1)
>    than quoted-printable or base64.

Well, this also saves you the conversion, especially with the subject,
which is tricky.

> 
>  * Adding "Content-Type: text/plain; charset=utf-8" is required by
>    RFC 2049 section 2 bullet point 4 (for the simplest kind of UTF-8
>    encoded messages).
> 
>  * The Content-Disposition: header is defined in RFC 2183, clearly
>    optional, and not useful in single-part messages.  Consequently,
>    mail(1) should not write it.

Yeah, I read that, that's why I didn't add that header.


> 
> So apart from writing the headers for us-ascii, i think you are
> almost there.
> 
> Given that the charset cannot be inferred from the environment
> and that setting it per-system or per-user in a configuration file
> is also inadequate - it shouldn't be uncommon for users to sometimes
> send US-ASCII and sometimes UTF-8 mail - i think that a new option
> is indeed needed.
> 
> Regarding the naming of the option, compatibility with POSIX
>   https://pubs.opengroup.org/onlinepubs/9699919799/utilities/mailx.html
> is paramount, which kills the tentative idea to use -u for "UTF-8"
> because -u already means "user".
> 
> Compatibility with other mailx(1) implementations is also a
> consideration.  See, for example,
>   https://linux.die.net/man/1/mail
> and -m is indeed among the very few options still available over there.
> I would document it focussing on a "multibyte character encoding"
> mnemonic.  The "mime" mnemonic feels far too broad because MIME can
> be used for lots of other purposes besides specifying a character
> encoding.
> 
> The -m option is also free here:
>   https://man.freebsd.org/cgi/man.cgi?query=mail(1)
>   https://man.netbsd.org/mail.1
>   https://docs.oracle.com/cd/E88353_01/html/E37839/mailx-1.html
>   https://www.ibm.com/docs/en/aix/7.3?topic=m-mail-command-1
> None of those appears to support command line selection of the
> character set for sending mail, so i don't see any immediate
> logic clashes either.
> 
> The -m option does clash with this one:
>   https://www.sdaoden.eu/code-nail.html
> But i think dismissing Steffen Daode Nurpmeso as a lunatic is obviously
> the way to go.  Try to listen to that person and you will never get
> anything done.
> 
> The mailx(1) documented on die.net appears to be the Heirloom one.
> It does not have an option to select sending US-ASCII or UTF-8.
> Instead, it has a "sendcharsets" configuration variable.  That's
> clearly overengineering, but even when hardcoding the equivalent of
> 
>   sendcharsets=utf-8
> 
> which is also the default, that's nasty because it silently switches to
> UTF-8 as soon as a non-ASCII character appears in the input.  I think
> at least in interactive mode, explicit confirmation from the user would
> be required to send UTF-8, instead writing dead.letter if the user
> rejects the request, such that they can clean up the file and try again.
> 
> That would certainly be more complicated than requiring an option
> up front, not only from the implementation perspective, but arguably
> also from the user perspective.  So unless other developers think this
> should be fully automatic with confirmation rather than controlled
> by an option, i suggest staying with Walter's idea of using an option.

Now I was investigating exactly that :-) (like Mutt also does): to make
mail(1) automatically set the appropriate MIME headers when it detects
any UTF-8 characters in the body text.  So, you don't like this idea?
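
To make the idea concrete, here is a minimal sketch of the detection
step, not a patch.  The names has_8bit() and pick_charset() are
hypothetical and do not exist in mail(1).  It only checks whether the
text is 7-bit clean; it does not validate that the bytes are well-formed
UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of mail(1).
 * Return 1 if any byte in buf has the high bit set, meaning the
 * text is not 7-bit clean and cannot be labelled us-ascii.
 */
int
has_8bit(const char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i < len; i++)
		if ((unsigned char)buf[i] & 0x80)
			return 1;
	return 0;
}

/*
 * Hypothetical helper: pick the charset label for Content-Type.
 * Assumes any 8-bit input is already UTF-8 encoded.
 */
const char *
pick_charset(const char *buf, size_t len)
{
	return has_8bit(buf, len) ? "utf-8" : "us-ascii";
}
```

The same check would also cover the "refuse to send without -m" case:
if has_8bit() is true and the option was not given, the message could
go to dead.letter instead.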


> 
> 
> > Index: extern.h
> > ===================================================================
> > RCS file: /cvs/src/usr.bin/mail/extern.h,v
> > retrieving revision 1.29
> > diff -u -p -r1.29 extern.h
> > --- extern.h        16 Sep 2018 02:38:57 -0000      1.29
> > +++ extern.h        20 Sep 2023 10:44:41 -0000
> > @@ -261,3 +261,4 @@ int      writeback(FILE *);
> >  extern char *__progname;
> >  extern char *tmpdir;
> >  extern const struct cmd *com; /* command we are running */
> > +extern char mime; /* Add MIME headers */
> 
> Likely not best mnemonic naming.
> 
> > Index: mail.1
> > ===================================================================
> > RCS file: /cvs/src/usr.bin/mail/mail.1,v
> > retrieving revision 1.83
> > diff -u -p -r1.83 mail.1
> > --- mail.1  31 Mar 2022 17:27:25 -0000      1.83
> > +++ mail.1  20 Sep 2023 10:44:41 -0000
> > @@ -40,7 +40,7 @@
> >  .Sh SYNOPSIS
> >  .Nm mail
> >  .Bk -words
> > -.Op Fl dEIinv
> > +.Op Fl dEIimnv
> >  .Op Fl b Ar list
> >  .Op Fl c Ar list
> >  .Op Fl r Ar from-addr
> > @@ -106,6 +106,8 @@ on noisy phone lines.
> >  .It Fl N
> >  Inhibits initial display of message headers
> >  when reading mail or editing a mail folder.
> > +.It Fl m
> > +Add MIME headers to send UTF-8 encoded messages.
> 
> Maybe s/messages/plain text messages/ ?
> 
> This should probably also say that the input text is supposed to
> already be UTF-8 encoded and that neither a re-encoding of the input
> nor a Content-Transfer-Encoding takes place, instead simply setting
> Content-Transfer-Encoding: 8bit.
> 
> Should we also say - either here or below .Ss Sending mail - that mail
> is sent using the US-ASCII encoding by default, and that the input text
> is supposed to only contain 7-bit ascii(7) characters in that case?
> 
> Probably, when -m is not requested, mail(1) should refuse sending
> a message that is not 7-bit clean, which may possibly require writing
> a dead.letter file.  But that's clearly a topic for a different patch.
> 
> >  .It Fl n
> >  Inhibits reading
> >  .Pa /etc/mail.rc
> > Index: main.c
> > ===================================================================
> > RCS file: /cvs/src/usr.bin/mail/main.c,v
> > retrieving revision 1.35
> > diff -u -p -r1.35 main.c
> > --- main.c  26 Jan 2021 18:21:47 -0000      1.35
> > +++ main.c  20 Sep 2023 10:44:41 -0000
> > @@ -79,6 +79,8 @@ int       realscreenheight;               /* the real scree
> >  int        uflag;                          /* Are we in -u mode? */
> >  sigset_t intset;                   /* Signal set that is just SIGINT */
> >  
> > +char mime = 0;                             /* Add MIME headers */
> > +
> 
> Again, likely not the best mnemonic.
> 
> >  /*
> >   * The pointers for the string allocation routines,
> >   * there are NSPACE independent areas.
> > @@ -136,7 +138,7 @@ main(int argc, char **argv)
> >     smopts = NULL;
> >     fromaddr = NULL;
> >     subject = NULL;
> > -   while ((i = getopt(argc, argv, "EINb:c:dfinr:s:u:v")) != -1) {
> > +   while ((i = getopt(argc, argv, "EINb:c:dfimnr:s:u:v")) != -1) {
> >             switch (i) {
> >             case 'u':
> >                     /*
> > @@ -171,6 +173,10 @@ main(int argc, char **argv)
> >                      */
> >                     subject = optarg;
> >                     break;
> > +           case 'm':
> > +                   /* Add MIME headers */
> > +                   mime = 1;
> > +                   break;
> >             case 'f':
> >                     /*
> >                      * User is specifying file to "edit" with Mail,
> > @@ -337,7 +343,7 @@ __dead void
> >  usage(void)
> >  {
> >  
> > -   fprintf(stderr, "usage: %s [-dEIinv] [-b list] [-c list] "
> > +   fprintf(stderr, "usage: %s [-dEIimnv] [-b list] [-c list] "
> >         "[-r from-addr] [-s subject] to-addr ...\n", __progname);
> >     fprintf(stderr, "       %s [-dEIiNnv] -f [file]\n", __progname);
> >     fprintf(stderr, "       %s [-dEIiNnv] [-u user]\n", __progname);
> > Index: send.c
> > ===================================================================
> > RCS file: /cvs/src/usr.bin/mail/send.c,v
> > retrieving revision 1.26
> > diff -u -p -r1.26 send.c
> > --- send.c  8 Mar 2023 04:43:11 -0000       1.26
> > +++ send.c  20 Sep 2023 10:44:41 -0000
> > @@ -525,6 +525,8 @@ puthead(struct header *hp, FILE *fo, int
> >             fmt("To:", hp->h_to, fo, w&GCOMMA), gotcha++;
> >     if (hp->h_subject != NULL && w & GSUBJECT)
> >             fprintf(fo, "Subject: %s\n", hp->h_subject), gotcha++;
> > +   if (mime)
> > +           fprintf(fo, "MIME-Version: 1.0\nContent-Type: text/plain; charset=utf-8\nContent-Transfer-Encoding: 8bit\n"), gotcha++;
> >     if (hp->h_cc != NULL && w & GCC)
> >             fmt("Cc:", hp->h_cc, fo, w&GCOMMA), gotcha++;
> >     if (hp->h_bcc != NULL && w & GBCC)
> 
> This probably needs some output for the case without -m, too,
> and i would write the MIME headers after Cc: and Bcc:, i think.
> 
> Yours,
>   Ingo
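
A rough sketch of how both cases could be written, assuming a flag
that is non-zero for UTF-8 and zero for US-ASCII.  The helpers
mime_headers() and put_mime_headers() are hypothetical names, not
existing functions in send.c; the header values are the ones quoted
above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: header block for the chosen charset. */
const char *
mime_headers(int utf8)
{
	if (utf8)
		return "MIME-Version: 1.0\n"
		    "Content-Type: text/plain; charset=utf-8\n"
		    "Content-Transfer-Encoding: 8bit\n";
	return "MIME-Version: 1.0\n"
	    "Content-Type: text/plain; charset=us-ascii\n"
	    "Content-Transfer-Encoding: 7bit\n";
}

/* Write the headers to fo; return 0 on success, -1 on a short write. */
int
put_mime_headers(FILE *fo, int utf8)
{
	const char *h = mime_headers(utf8);
	size_t n = strlen(h);

	assert(fo != NULL);
	return fwrite(h, 1, n, fo) == n ? 0 : -1;
}
```

In puthead() this would be called once after the Cc: and Bcc: lines,
so the MIME headers always appear, with or without -m.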

-- 
Walter
