Hi,

i checked the following points:

 * Even though RFC 2049 section 2 bullet point 1 only *requires*
   MIME-conformant MUAs to always write the header "MIME-Version:
   1.0" - and mail(1) is most certainly not MIME-conformant - RFC 2049
   section 2 bullet point 8 explicitly *recommends* that even non-MIME
   MUAs always set appropriate MIME headers.  RFC 2046 section 4.1.2
   paragraph 8 also "strongly" recommends the explicit inclusion of a
   "charset" parameter even for us-ascii.

   Consequently, i believe that when sending a message in US-ASCII,
   mail(1) should include these headers:

   MIME-Version: 1.0
   Content-Transfer-Encoding: 7bit
   Content-Type: text/plain; charset=us-ascii

 * Adding a "Content-Transfer-Encoding: ..." header is indeed required
   for sending UTF-8 messages; see RFC 2049 section 2 bullet point 2.
   "8bit" is one of the valid values that MUAs must support for
   receiving messages by default.
   Using it seems sane because it is most likely to work with receiving
   MUAs that are not MIME-conformant, like our mail(1) itself.
   I think nowadays, that's a bigger concern than MTAs that are not
   8-bit clean, in particular when maintaining a low-level program
   like our mail(1).
   Consequently, i think using 8bit is indeed better for our mail(1)
   than quoted-printable or base64.

 * Adding "Content-Type: text/plain; charset=utf-8" is required by
   RFC 2049 section 2 bullet point 4 (for the simplest kind of UTF-8
   encoded messages).

 * The Content-Disposition: header is defined in RFC 2183, clearly
   optional, and not useful in single-part messages.  Consequently,
   mail(1) should not write it.

So apart from writing the headers for us-ascii, i think you are
almost there.

Given that the charset cannot be inferred from the environment
and that setting it per-system or per-user in a configuration file
is also inadequate - it shouldn't be uncommon for users to sometimes
send US-ASCII and sometimes UTF-8 mail - i think that a new option
is indeed needed.

Regarding the naming of the option, compatibility with POSIX
  https://pubs.opengroup.org/onlinepubs/9699919799/utilities/mailx.html
is paramount, which kills the tentative idea to use -u for "UTF-8"
because -u already means "user".

Compatibility with other mailx(1) implementations is also a
consideration.  See, for example,
  https://linux.die.net/man/1/mail
and -m is indeed among the very few options still available over there.
I would document it focussing on a "multibyte character encoding"
mnemonic.  The "mime" mnemonic feels far too broad because MIME can
be used for lots of other purposes besides specifying a character
encoding.

The -m option is also free here:
  https://man.freebsd.org/cgi/man.cgi?query=mail(1)
  https://man.netbsd.org/mail.1
  https://docs.oracle.com/cd/E88353_01/html/E37839/mailx-1.html
  https://www.ibm.com/docs/en/aix/7.3?topic=m-mail-command-1
None of those appears to support command line selection of the
character set for sending mail, so i don't see any immediate
logical clashes either.

The -m option does clash with this one:
  https://www.sdaoden.eu/code-nail.html
But i think dismissing Steffen Daode Nurpmeso as a lunatic is obviously
the way to go.  Try to listen to that person and you will never get
anything done.

The mailx(1) documented on die.net appears to be the Heirloom one.
It does not have an option to select sending US-ASCII or UTF-8.
Instead, it has a "sendcharsets" configuration variable.  That's
clearly overengineering, but even when hardcoding the equivalent of

  sendcharsets=utf-8

which is also the default, that's nasty because it silently switches to
UTF-8 as soon as a non-ASCII character appears in the input.  I think
that at least in interactive mode, explicit confirmation from the user
should be required before sending UTF-8, and that dead.letter should be
written if the user rejects the request, such that they can clean up
the file and try again.

That would certainly be more complicated than requiring an option
up front, not only from the implementation perspective, but arguably
also from the user perspective.  So unless other developers think this
should be fully automatic with confirmation rather than controlled
by an option, i suggest staying with Walter's idea of using an option.


> Index: extern.h
> ===================================================================
> RCS file: /cvs/src/usr.bin/mail/extern.h,v
> retrieving revision 1.29
> diff -u -p -r1.29 extern.h
> --- extern.h  16 Sep 2018 02:38:57 -0000      1.29
> +++ extern.h  20 Sep 2023 10:44:41 -0000
> @@ -261,3 +261,4 @@ int        writeback(FILE *);
>  extern char *__progname;
>  extern char *tmpdir;
>  extern const struct cmd *com; /* command we are running */
> +extern char mime; /* Add MIME headers */

Likely not the best mnemonic naming.

> Index: mail.1
> ===================================================================
> RCS file: /cvs/src/usr.bin/mail/mail.1,v
> retrieving revision 1.83
> diff -u -p -r1.83 mail.1
> --- mail.1    31 Mar 2022 17:27:25 -0000      1.83
> +++ mail.1    20 Sep 2023 10:44:41 -0000
> @@ -40,7 +40,7 @@
>  .Sh SYNOPSIS
>  .Nm mail
>  .Bk -words
> -.Op Fl dEIinv
> +.Op Fl dEIimnv
>  .Op Fl b Ar list
>  .Op Fl c Ar list
>  .Op Fl r Ar from-addr
> @@ -106,6 +106,8 @@ on noisy phone lines.
>  .It Fl N
>  Inhibits initial display of message headers
>  when reading mail or editing a mail folder.
> +.It Fl m
> +Add MIME headers to send UTF-8 encoded messages.

Maybe s/messages/plain text messages/ ?

This should probably also say that the input text is supposed to
already be UTF-8 encoded and that mail(1) neither re-encodes the input
nor applies any content transfer encoding to it, instead simply setting
Content-Transfer-Encoding: 8bit.

Should we also say - either here or below .Ss Sending mail - that mail
is sent using the US-ASCII encoding by default, and that the input text
is supposed to only contain 7-bit ascii(7) characters in that case?

Probably, when -m is not requested, mail(1) should refuse sending
a message that is not 7-bit clean, which may possibly require writing
a dead.letter file.  But that's clearly a topic for a different patch.
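
To illustrate what such a check might look like, here is a minimal
sketch.  The helper name is_7bit_clean() is hypothetical; nothing like
it exists in mail(1) or in the patch, and where it would be called
from is left open.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of mail(1) or of the patch:
 * return 1 if no byte in the buffer has the high bit set,
 * 0 otherwise.
 */
int
is_7bit_clean(const char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if ((unsigned char)buf[i] & 0x80)
			return 0;
	return 1;
}
```

A message body failing this test without -m would be a candidate for
being rejected rather than sent.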

>  .It Fl n
>  Inhibits reading
>  .Pa /etc/mail.rc
> Index: main.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/mail/main.c,v
> retrieving revision 1.35
> diff -u -p -r1.35 main.c
> --- main.c    26 Jan 2021 18:21:47 -0000      1.35
> +++ main.c    20 Sep 2023 10:44:41 -0000
> @@ -79,6 +79,8 @@ int realscreenheight;               /* the real scree
>  int  uflag;                          /* Are we in -u mode? */
>  sigset_t intset;                     /* Signal set that is just SIGINT */
>  
> +char mime = 0;                               /* Add MIME headers */
> +

Again, likely not the best mnemonic.

>  /*
>   * The pointers for the string allocation routines,
>   * there are NSPACE independent areas.
> @@ -136,7 +138,7 @@ main(int argc, char **argv)
>       smopts = NULL;
>       fromaddr = NULL;
>       subject = NULL;
> -     while ((i = getopt(argc, argv, "EINb:c:dfinr:s:u:v")) != -1) {
> +     while ((i = getopt(argc, argv, "EINb:c:dfimnr:s:u:v")) != -1) {
>               switch (i) {
>               case 'u':
>                       /*
> @@ -171,6 +173,10 @@ main(int argc, char **argv)
>                        */
>                       subject = optarg;
>                       break;
> +             case 'm':
> +                     /* Add MIME headers */
> +                     mime = 1;
> +                     break;
>               case 'f':
>                       /*
>                        * User is specifying file to "edit" with Mail,
> @@ -337,7 +343,7 @@ __dead void
>  usage(void)
>  {
>  
> -     fprintf(stderr, "usage: %s [-dEIinv] [-b list] [-c list] "
> +     fprintf(stderr, "usage: %s [-dEIimnv] [-b list] [-c list] "
>           "[-r from-addr] [-s subject] to-addr ...\n", __progname);
>       fprintf(stderr, "       %s [-dEIiNnv] -f [file]\n", __progname);
>       fprintf(stderr, "       %s [-dEIiNnv] [-u user]\n", __progname);
> Index: send.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/mail/send.c,v
> retrieving revision 1.26
> diff -u -p -r1.26 send.c
> --- send.c    8 Mar 2023 04:43:11 -0000       1.26
> +++ send.c    20 Sep 2023 10:44:41 -0000
> @@ -525,6 +525,8 @@ puthead(struct header *hp, FILE *fo, int
>               fmt("To:", hp->h_to, fo, w&GCOMMA), gotcha++;
>       if (hp->h_subject != NULL && w & GSUBJECT)
>               fprintf(fo, "Subject: %s\n", hp->h_subject), gotcha++;
> +     if (mime)
> +             fprintf(fo, "MIME-Version: 1.0\nContent-Type: text/plain; charset=utf-8\nContent-Transfer-Encoding: 8bit\n"), gotcha++;
>       if (hp->h_cc != NULL && w & GCC)
>               fmt("Cc:", hp->h_cc, fo, w&GCOMMA), gotcha++;
>       if (hp->h_bcc != NULL && w & GBCC)

This probably needs some output for the case without -m, too,
and i would write the MIME headers after Cc: and Bcc:, i think.
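
As a rough sketch of what i mean, assuming a separate helper (the name
put_mime_headers() and its signature are made up for illustration and
are not in the patch), both cases could be handled like this:

```c
#include <stdio.h>

/*
 * Hypothetical helper, not from the patch: write the MIME headers
 * for a single-part plain text message.  A nonzero utf8 argument
 * corresponds to the proposed -m option; zero is the US-ASCII default.
 */
void
put_mime_headers(FILE *fo, int utf8)
{
	fputs("MIME-Version: 1.0\n", fo);
	if (utf8) {
		fputs("Content-Type: text/plain; charset=utf-8\n", fo);
		fputs("Content-Transfer-Encoding: 8bit\n", fo);
	} else {
		fputs("Content-Type: text/plain; charset=us-ascii\n", fo);
		fputs("Content-Transfer-Encoding: 7bit\n", fo);
	}
}
```

In puthead(), something like this would be called after the Bcc:
line, passing the flag set by -m, and would count towards gotcha
like the other headers do.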

Yours,
  Ingo
