Hello,

Walter: I'm happy that you've been hacking on mail and at least in
principle I think what you're doing makes sense; however, let's try to
get one bit committed at a time.

Let's start with the MIME headers needed for sending UTF-8 messages.

I've gone through the various mails and I think this is where things
started to go off the rails.  Ingo provided some valuable feedback, and
I've updated your diff to address it.  Other additions, such as doing
checks on the content, adding other headers, etc., can be done as a
follow-up after this goes in, IMHO.

On 2023/09/20 17:30:08 +0200, Ingo Schwarze <schwa...@usta.de> wrote:
> Hi,
> 
> i checked the following points:
> 
>  * Even though RFC 2049 section 2 bullet point 1 only *requires*
>    MIME-conformant MUAs to always write the header "MIME-Version:
>    1.0" - and mail(1) is most certainly not MIME-conformant - RFC 2049
>    section 2 bullet point 8 explicitly *recommends* that even non-MIME
>    MUAs always set appropriate MIME headers.  RFC 2046 section 4.1.2
>    paragraph 8 also "strongly" recommends the explicit inclusion of a
>    "charset" parameter even for us-ascii.
> 
>    Consequently, i believe that when sending a message in US-ASCII,
>    mail(1) should include these headers:
> 
>    MIME-Version: 1.0
>    Content-Transfer-Encoding: 7bit
>    Content-Type: text/plain; charset=us-ascii
> 
>  * Adding a "Content-Transfer-Encoding: ..." header is indeed required
>    for sending UTF-8 messages, see  RFC 2049 section 2 bullet point 2.
>    "8bit" is one of the valid values that MUAs must support for
>    receiving messages by default.
>    Using it seems sane because it is most likely to work with receiving
>    MUAs that are not MIME-conformant, like our mail(1) itself.
>    I think nowadays, that's a bigger concern than MTAs that are not
>    8-bit clean, in particular when maintaining a low-level program
>    like our mail(1).
>    Consequently, i think using 8bit is indeed better for our mail(1)
>    than quoted-printable or base64.
> 
>  * Adding "Content-Type: text/plain; charset=utf-8" is required by
>    RFC 2049 section 2 bullet point 4 (for the simplest kind of UTF-8
>    encoded messages).
> 
>  * The Content-Disposition: header is defined in RFC 2183, clearly
>    optional, and not useful in single-part messages.  Consequently,
>    mail(1) should not write it.
> 
> So apart from writing the headers for us-ascii, i think you are
> almost there.
> 
> Given that the charset cannot be inferred from the environment
> and that setting it per-system or per-user in a configuration file
> is also inadequate - it shouldn't be uncommon for users to sometimes
> send US-ASCII and sometimes UTF-8 mail - i think that a new option
> is indeed needed.
> 
> Regarding the naming of the option, compatibility with POSIX
>   https://pubs.opengroup.org/onlinepubs/9699919799/utilities/mailx.html
> is paramount, which kills the tentative idea to use -u for "UTF-8"
> because -u already means "user".
> 
> Compatibility with other mailx(1) implementations is also a
> consideration.  See, for example,
>   https://linux.die.net/man/1/mail
> and -m is indeed among the very few options still available over there.
> I would document it focussing on a "multibyte character encoding"
> mnemonic.  The "mime" mnemonic feels far too broad because MIME can
> be used for lots of other purposes besides specifying a character
> encoding.
> 
> The -m option is also free here:
>   https://man.freebsd.org/cgi/man.cgi?query=mail(1)
>   https://man.netbsd.org/mail.1
>   https://docs.oracle.com/cd/E88353_01/html/E37839/mailx-1.html
>   https://www.ibm.com/docs/en/aix/7.3?topic=m-mail-command-1
> None of those appears to support command line selection of the
> character set for sending mail, so i don't see any immediate
> logic clashes either.
> 
> The -m option does clash with this one:
>   https://www.sdaoden.eu/code-nail.html
> But i think dismissing Steffen Daode Nurpmeso as a lunatic is obviously
> the way to go.  Try to listen to that person and you will never get
> anything done.
> 
> The mailx(1) documented on die.net appears to be the Heirloom one.
> It does not have an option to select sending US-ASCII or UTF-8.
> Instead, it has a "sendcharsets" configuration variable.  That's
> clearly overengineering, but even when hardcoding the equivalent of
> 
>   sendcharsets=utf-8
> 
> which is also the default, that's nasty because it silently switches to
> UTF-8 as soon as a non-ASCII character appears in the input.  I think
> at least in interactive mode, explicit confirmation from the user would
> be required to send UTF-8, instead writing dead.letter if the user
> rejects the request, such that they can clean up the file and try again.
> 
> That would certainly be more complicated than requiring an option
> up front, not only from the implementation perspective, but arguably
> also from the user perspective.  So unless other developers think this
> should be fully automatic with confirmation rather than controlled
> by an option, i suggest staying with Walter's idea of using an option.
> 
> 
> > Index: extern.h
> > ===================================================================
> > RCS file: /cvs/src/usr.bin/mail/extern.h,v
> > retrieving revision 1.29
> > diff -u -p -r1.29 extern.h
> > --- extern.h        16 Sep 2018 02:38:57 -0000      1.29
> > +++ extern.h        20 Sep 2023 10:44:41 -0000
> > @@ -261,3 +261,4 @@ int      writeback(FILE *);
> >  extern char *__progname;
> >  extern char *tmpdir;
> >  extern const struct cmd *com; /* command we are running */
> > +extern char mime; /* Add MIME headers */
> 
> Likely not best mnemonic naming.

I've changed this to "multibyte".

> > Index: mail.1
> > ===================================================================
> > RCS file: /cvs/src/usr.bin/mail/mail.1,v
> > retrieving revision 1.83
> > diff -u -p -r1.83 mail.1
> > --- mail.1  31 Mar 2022 17:27:25 -0000      1.83
> > +++ mail.1  20 Sep 2023 10:44:41 -0000
> > @@ -40,7 +40,7 @@
> >  .Sh SYNOPSIS
> >  .Nm mail
> >  .Bk -words
> > -.Op Fl dEIinv
> > +.Op Fl dEIimnv
> >  .Op Fl b Ar list
> >  .Op Fl c Ar list
> >  .Op Fl r Ar from-addr
> > @@ -106,6 +106,8 @@ on noisy phone lines.
> >  .It Fl N
> >  Inhibits initial display of message headers
> >  when reading mail or editing a mail folder.
> > +.It Fl m
> > +Add MIME headers to send UTF-8 encoded messages.
> 
> Maybe s/messages/plain text messages/ ?
> 
> This should probably also say that the input text is supposed to
> already be UTF-8 encoded and that neither a re-encoding of the input
> nor a Content-Transfer-Encoding takes place, instead simply setting
> Content-Transfer-Encoding: 8bit.

I've slightly expanded this line to mention that the input is supposed
to be already encoded.
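If we later want mail(1) to check that assumption rather than trust
it (one of the content checks I'd leave for a follow-up), it could look
roughly like the sketch below.  valid_utf8() is a hypothetical helper
that is not part of this diff; it only checks well-formedness per
RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF)
and does not re-encode anything.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of mail(1): return 1 if the len bytes
 * at s form well-formed UTF-8 (RFC 3629), 0 otherwise.
 */
int
valid_utf8(const unsigned char *s, size_t len)
{
	size_t i = 0, k, n;
	unsigned long cp, min;

	while (i < len) {
		if (s[i] < 0x80) {			/* plain ASCII */
			i++;
			continue;
		}
		if ((s[i] & 0xe0) == 0xc0) {		/* 2-byte sequence */
			n = 1;
			cp = s[i] & 0x1f;
			min = 0x80;
		} else if ((s[i] & 0xf0) == 0xe0) {	/* 3-byte sequence */
			n = 2;
			cp = s[i] & 0x0f;
			min = 0x800;
		} else if ((s[i] & 0xf8) == 0xf0) {	/* 4-byte sequence */
			n = 3;
			cp = s[i] & 0x07;
			min = 0x10000;
		} else		/* stray continuation byte or 0xf8-0xff */
			return 0;
		if (len - i - 1 < n)			/* truncated sequence */
			return 0;
		for (k = 1; k <= n; k++) {
			if ((s[i + k] & 0xc0) != 0x80)
				return 0;
			cp = (cp << 6) | (s[i + k] & 0x3f);
		}
		/* reject overlong forms, surrogates and values past U+10FFFF */
		if (cp < min || cp > 0x10ffff ||
		    (cp >= 0xd800 && cp <= 0xdfff))
			return 0;
		i += n + 1;
	}
	return 1;
}
```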

> Should we also say - either here or below .Ss Sending mail - that mail
> is sent using the US-ASCII encoding by default, and that the input text
> is supposed to only contain 7-bit ascii(7) characters in that case?

I think so.  I've added a line under .Ss Sending mail, but it could go
here as well.

> Probably, when -m is not requested, mail(1) should refuse sending
> a message that is not 7-bit clean, which may possibly require writing
> a dead.letter file.  But that's clearly a topic for a different patch.

Just out of curiosity, is sending UTF-8 mail such a bad default?  I'm
asking because being able to send UTF-8 encoded mail is what I would
humbly expect.

(our mail(1) can't deal with showing UTF-8 bodies yet, at least in
interactive mode... one issue at a time.)

Actually, we could also do the opposite and have (say) -M, or any
other free letter, make mail send the message as 7-bit US-ASCII.
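Whichever default we pick, refusing to send a message that is not
7-bit clean, as Ingo suggests, mostly boils down to scanning the body
for bytes with the high bit set.  A minimal sketch, assuming a
hypothetical is_7bit_clean() helper that takes the body as a FILE *
(neither exists in mail(1) today); note that it ignores the other
RFC 2045 constraints on 7bit data, such as line length and NUL bytes:

```c
#include <stdio.h>

/*
 * Hypothetical helper, not part of mail(1): scan fp from its current
 * position and return 1 if no byte has the high bit set, 0 otherwise.
 * The stream is rewound to the start afterwards so the caller can
 * still read it for sending (or for writing dead.letter).
 */
int
is_7bit_clean(FILE *fp)
{
	int c, clean = 1;

	while ((c = getc(fp)) != EOF) {
		if (c & 0x80) {
			clean = 0;
			break;
		}
	}
	rewind(fp);
	return clean;
}
```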

> [...]
> > ===================================================================
> > RCS file: /cvs/src/usr.bin/mail/send.c,v
> > retrieving revision 1.26
> > diff -u -p -r1.26 send.c
> > --- send.c  8 Mar 2023 04:43:11 -0000       1.26
> > +++ send.c  20 Sep 2023 10:44:41 -0000
> > @@ -525,6 +525,8 @@ puthead(struct header *hp, FILE *fo, int
> >             fmt("To:", hp->h_to, fo, w&GCOMMA), gotcha++;
> >     if (hp->h_subject != NULL && w & GSUBJECT)
> >             fprintf(fo, "Subject: %s\n", hp->h_subject), gotcha++;
> > +   if (mime)
> > +           fprintf(fo, "MIME-Version: 1.0\nContent-Type: text/plain; charset=utf-8\nContent-Transfer-Encoding: 8bit\n"), gotcha++;
> >     if (hp->h_cc != NULL && w & GCC)
> >             fmt("Cc:", hp->h_cc, fo, w&GCOMMA), gotcha++;
> >     if (hp->h_bcc != NULL && w & GBCC)
> 
> This probably needs some output for the case without -m, too,
> and i would write the MIME headers after Cc: and Bcc:, i think.

Done both.

I'm quite sure the changes to puthead are correct since it's only
called when producing the headers for an outgoing mail.  There's a bit
of churn since `gotcha' would always be set, but I can drop part of
that change.


Thanks,


diff a62878957e34caa3952801e84ba09678c0e6ba04 2d025d839f99dc09ee525c11a4ed09a0f3bbe7d0
commit - a62878957e34caa3952801e84ba09678c0e6ba04
commit + 2d025d839f99dc09ee525c11a4ed09a0f3bbe7d0
blob - 60a1088b83bdfb7a49e390fbd72dffc15e4a20fd
blob + f2d41c77811164641ddf8579905cf0eca56ed370
--- usr.bin/mail/extern.h
+++ usr.bin/mail/extern.h
@@ -261,3 +261,4 @@ int  writeback(FILE *);
 extern char *__progname;
 extern char *tmpdir;
 extern const struct cmd *com; /* command we are running */
+extern int multibyte; /* Add MIME headers */
blob - d712811f0cc119391c027c8634f334f5113a881c
blob + 71770fe5e09b9bb3bc2d9135374c169e57fff32e
--- usr.bin/mail/mail.1
+++ usr.bin/mail/mail.1
@@ -40,7 +40,7 @@
 .Sh SYNOPSIS
 .Nm mail
 .Bk -words
-.Op Fl dEIinv
+.Op Fl dEIimnv
 .Op Fl b Ar list
 .Op Fl c Ar list
 .Op Fl r Ar from-addr
@@ -103,6 +103,9 @@ This is
 particularly useful when using
 .Nm mail
 on noisy phone lines.
+.It Fl m
+Send a UTF-8 encoded plain text message.
+The input text is supposed to already be UTF-8 encoded.
 .It Fl N
 Inhibits initial display of message headers
 when reading mail or editing a mail folder.
@@ -159,6 +162,11 @@ your message, followed
 by a control-D
 .Pq Sq ^D
 at the beginning of a line.
+By default the input text is expected to only contain 7-bit
+.Xr ascii 7
+characters unless
+.Fl m
+is used.
 The section below,
 .Sx Replying to or originating mail ,
 describes some features of
blob - f802c07f9f0f30c17cbf5a187a9a6e35028173be
blob + 956681a99c1d1dda5102350f1c4f731097fb432d
--- usr.bin/mail/main.c
+++ usr.bin/mail/main.c
@@ -79,6 +79,8 @@ int   realscreenheight;               /* the real screen height */
 int    uflag;                          /* Are we in -u mode? */
 sigset_t intset;                       /* Signal set that is just SIGINT */
 
+int    multibyte;                      /* Add MIME headers */
+
 /*
  * The pointers for the string allocation routines,
  * there are NSPACE independent areas.
@@ -136,7 +138,7 @@ main(int argc, char **argv)
        smopts = NULL;
        fromaddr = NULL;
        subject = NULL;
-       while ((i = getopt(argc, argv, "EINb:c:dfinr:s:u:v")) != -1) {
+       while ((i = getopt(argc, argv, "EINb:c:dfimnr:s:u:v")) != -1) {
                switch (i) {
                case 'u':
                        /*
@@ -171,6 +173,9 @@ main(int argc, char **argv)
                         */
                        subject = optarg;
                        break;
+               case 'm':
+                       multibyte = 1;
+                       break;
                case 'f':
                        /*
                         * User is specifying file to "edit" with Mail,
@@ -336,8 +341,7 @@ setscreensize(void)
 __dead void
 usage(void)
 {
-
-       fprintf(stderr, "usage: %s [-dEIinv] [-b list] [-c list] "
+       fprintf(stderr, "usage: %s [-dEIimnv] [-b list] [-c list] "
            "[-r from-addr] [-s subject] to-addr ...\n", __progname);
        fprintf(stderr, "       %s [-dEIiNnv] -f [file]\n", __progname);
        fprintf(stderr, "       %s [-dEIiNnv] [-u user]\n", __progname);
blob - 9582675f9b851583f8487aae8ff1b82e70bf01d4
blob + aa0285bf1135bf0d2f59088a1d293ae28bd81b8a
--- usr.bin/mail/send.c
+++ usr.bin/mail/send.c
@@ -514,22 +514,27 @@ infix(struct header *hp, FILE *fi)
 int
 puthead(struct header *hp, FILE *fo, int w)
 {
-       int gotcha;
        char *from;
 
-       gotcha = 0;
        from = hp->h_from ? hp->h_from : value("from");
        if (from != NULL)
-               fprintf(fo, "From: %s\n", from), gotcha++;
+               fprintf(fo, "From: %s\n", from);
        if (hp->h_to != NULL && w & GTO)
-               fmt("To:", hp->h_to, fo, w&GCOMMA), gotcha++;
+               fmt("To:", hp->h_to, fo, w&GCOMMA);
        if (hp->h_subject != NULL && w & GSUBJECT)
-               fprintf(fo, "Subject: %s\n", hp->h_subject), gotcha++;
+               fprintf(fo, "Subject: %s\n", hp->h_subject);
        if (hp->h_cc != NULL && w & GCC)
-               fmt("Cc:", hp->h_cc, fo, w&GCOMMA), gotcha++;
+               fmt("Cc:", hp->h_cc, fo, w&GCOMMA);
        if (hp->h_bcc != NULL && w & GBCC)
-               fmt("Bcc:", hp->h_bcc, fo, w&GCOMMA), gotcha++;
-       if (gotcha && w & GNL)
+               fmt("Bcc:", hp->h_bcc, fo, w&GCOMMA);
+       fprintf(fo, "MIME-Version: 1.0\n");
+       if (multibyte)
+               fprintf(fo, "Content-Transfer-Encoding: 8bit\n"
+                   "Content-Type: text/plain; charset=utf-8\n");
+       else
+               fprintf(fo, "Content-Transfer-Encoding: 7bit\n"
+                   "Content-Type: text/plain; charset=us-ascii\n");
+       if (w & GNL)
                (void)putc('\n', fo);
        return(0);
 }
