Re: (Not-so) hypothetical question: What to do about NULs?

2023-03-12 Thread David Levine
I wrote:

> Ken wrote:
> > In terms of the networking code,
> > it looks like the right thing will happen when sending a NUL via
> > SMTP,
>
> Almost, but not quite.  I posted a possible fix but I'm still refining
> it.

Fix to post(8) committed:

commit 8f897f65fecbc668db777e2f4fabb23a08edf11b
Author: David Levine 
Date:   Sun Mar 12 10:28:39 2023 -0400

Enhanced post(8) to handle NULs in message body.

:100644 100644 6436734c 30b887af M  test/fakesmtp.c
:100755 100755 39915e71 fb2b167b M  test/post/test-post-basic
:100644 100644 820ed05b bf35b8a4 M  uip/post.c

David



Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-23 Thread Michael Richardson
Ken Hornstein  wrote:
> Right, but that's mostly because of the way multiline responses are
> handled in POP.  It's never "read X bytes", it's "read lines until you
> get a line that is just .\r\n".  With IMAP, it's "the next X bytes are
> the data you asked for".  So you're used to dealing with "lines" and
> that lends itself to C strings.

I assumed that the nuls we are concerned about are either inside the base64,
when it's a text/*, or are in the binary transfer when it's not base64.




Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-22 Thread Andy Bradford
Thus said Ken Hornstein on Wed, 22 Feb 2023 20:59:31 -0500:

> I had an inkling popular MTAs would DTRT.

Well, qmail's hardly "popular" these days, but Professor Bernstein had a
penchant for making string handling robust to avoid exploits, so he got
NUL handling as a benefit.

I run a minority MTA with a minority MUA (at least for as long as the email
cartel continues to permit legitimate email).

Andy




Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-22 Thread Ken Hornstein
>While POP's LIST does actually include the size of the message in bytes,
>that's prior  to any  CRLF mangling  that happens so  it cannot  be used
>reliably as a method for determining when to stop reading. Unfortunate.

Right, but that's mostly because of the way multiline responses are
handled in POP.  It's never "read X bytes", it's "read lines until you
get a line that is just .\r\n".  With IMAP, it's "the next X bytes are
the data you asked for".  So you're used to dealing with "lines" and
that lends itself to C strings.

>I notice however,  that some components of my  email infrastructure pass
>NULs through without problems and some do not. qmail successfully queued
>a message with  a NUL in both  the header and the body,  but other parts
>(e.g. recipient validation tools) did not fare as well, and of course we
>knew that inc would truncate (and it did because the lines with NUL were
>truncated).

I had an inkling popular MTAs would DTRT.

--Ken



Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-22 Thread Andy Bradford
Thus said Ken Hornstein on Tue, 21 Feb 2023 21:29:16 -0500:

> So you're told  "I am sending this many bytes  exactly", and you don't
> have to  deal with "lines", so  the implementations I've seen  tend to
> call read() (or  the equivalent) until they get the  correct number of
> bytes, and  because you're  not dealing with  "lines" you  don't treat
> them as C strings.

While POP's LIST does actually include the size of the message in bytes,
that's prior  to any  CRLF mangling  that happens so  it cannot  be used
reliably as a method for determining when to stop reading. Unfortunate.

I notice however,  that some components of my  email infrastructure pass
NULs through without problems and some do not. qmail successfully queued
a message with  a NUL in both  the header and the body,  but other parts
(e.g. recipient validation tools) did not fare as well, and of course we
knew that inc would truncate (and it did because the lines with NUL were
truncated).

I suspect that qmail worked for the most part because of stralloc:

http://cr.yp.to/lib/stralloc.html
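
For illustration, here is a minimal sketch of the general idea behind a
length-counted string. It is not qmail's stralloc API; the names lbuf,
lbuf_append and lbuf_free are made up for this example. The point is only
that the length travels with the data, so an embedded NUL is just another
byte.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-counted buffer; NOT the real stralloc API. */
struct lbuf {
    char  *s;    /* data, may contain NUL bytes */
    size_t len;  /* bytes in use */
    size_t cap;  /* bytes allocated */
};

/* Append n bytes from src.  Returns 0 on success, -1 on failure. */
static int lbuf_append(struct lbuf *b, const void *src, size_t n)
{
    if (n == 0)
        return 0;
    if (n > b->cap - b->len) {
        size_t ncap = b->cap ? b->cap : 64;
        while (ncap - b->len < n) {
            if (ncap > (size_t)-1 / 2)
                return -1;              /* size would overflow */
            ncap *= 2;
        }
        char *p = realloc(b->s, ncap);
        if (p == NULL)
            return -1;
        b->s = p;
        b->cap = ncap;
    }
    memcpy(b->s + b->len, src, n);      /* copies NULs like any byte */
    b->len += n;
    return 0;
}

/* Release the storage and reset the buffer to empty. */
static void lbuf_free(struct lbuf *b)
{
    free(b->s);
    b->s = NULL;
    b->len = b->cap = 0;
}
```

Start with `struct lbuf b = {0};` and always use `b.len`, never strlen(),
to find the end of the data.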

Andy




Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-22 Thread David Malone
> > I wonder if it would be better to use fwrite instead of write, to
> > avoid mixing stdio and Posix-style output? (It would also avoid an
> > unbuffered write of 1 byte.)

> Good point.  How about the attached?

Looks sensible to me!

David.



Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-21 Thread David Levine
Ken wrote:

> [David:]
> >I have received email with C-T-E set to binary.  While I don't think it
> >was needed, I haven't checked closely.
>
> Fascinating!  I am curious: who/what sent this to you?  Do you remember
> the MIME type?

The C-T-E: binary is in the message header.  There are two alternative
content parts, text/html and text/plain.  Both are encoded Q-P.  So
the C-T-E: binary is gratuitous.  (And mhfixmsg converts it to 8-bit.)

 msg part  type/subtype              size description
   0       multipart/alternative      26K
           boundary="--=_1648114734-702538-12126"
           charset="UTF-8"
     1     text/html                  16K
           disposition "inline"
     2     text/plain                9823
           disposition "inline"

The sender, freecycle.org, uses that C-T-E: binary often.  Maybe every
time.

> Well, I'm not SURE that's necessarily true.  As you point out, that's
> only true for the bodies of message fields.  And I see a lot of things
> in the code that assume the body of a message field is a valid C string,
> e.g. (mhparse.c):
>
>     /* if necessary, get rest of field */
>     while (state == FLDPLUS) {
>         bufsz = sizeof buf;
>         state = m_getfld2(&gst, name, buf, &bufsz);
>         vp = add (buf, vp); /* add to previous value */
>     }

That's in FLDPLUS, still in the header.

> In terms of the networking code,
> it looks like the right thing will happen when sending a NUL via
> SMTP,

Almost, but not quite.  I posted a possible fix but I'm still refining
it.

> It seems for message bodies we're
> in reasonable shape (unless you are RETRIEVING a message via POP), but
> if a NUL appears in the header somewhere all bets are off.

Yeah.  I'd be OK with replacing NULs with some legal
character(s).  I'm not sure that just squashing them is a good
idea.  I don't have a concrete example, but wonder if it could be
abused, say in a really messy URL.

David



Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-21 Thread Ken Hornstein
>When I  was poking around  in the POP code  I didn't notice  any special
>handling  of  NUL  bytes.  It's  possible  that  this  would  result  in
>truncation. If that's what we do now, I suspect it's alright to continue
>to do so; at  least until we find legitimate emails in  the wild that do
>not conform (again think 16M character lines).

Right, definitely the POP code doesn't handle this, and my quick check
suggests we're not the only ones.

However, it seems like a lot of IMAP implementations do better.  I think
that's due to the protocol; in POP when you retrieve a message it looks
like:

C: RETR 1
S: +OK
S: Line 1
S: Line 2
S: [...]
S: .

So you're THINKING in lines so you tend to read a "line" until you get a
line with the sentinel value (.\r\n).
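
A line-oriented reader does not have to mean C strings, as long as every
line carries its length. The sketch below assumes the caller already has a
line and its byte count (including the CRLF); the function names are
hypothetical and this is not nmh's POP code. It checks for the sentinel and
undoes POP's dot-stuffing without ever calling strlen().

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the line (with its CRLF) is exactly the ".\r\n" sentinel. */
static int pop_is_terminator(const char *line, size_t len)
{
    return len == 3 && memcmp(line, ".\r\n", 3) == 0;
}

/* Undo dot-stuffing: the server prepends '.' to any data line that
   starts with '.'.  Returns a pointer to the payload and updates *len.
   NULs inside the line are untouched because only len is consulted. */
static const char *pop_unstuff(const char *line, size_t *len)
{
    if (*len > 0 && line[0] == '.') {
        (*len)--;
        return line + 1;
    }
    return line;
}
```

Call pop_is_terminator() first; only lines that are not the sentinel should
be passed to pop_unstuff().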

IMAP, on the other hand, looks like:

C: a0001 FETCH 1 (RFC822)
S: * 1 FETCH (RFC822 {1024}
[... 1024 bytes of data follows ...]
S: )

So you're told "I am sending this many bytes exactly", and you don't
have to deal with "lines", so the implementations I've seen tend to call
read() (or the equivalent) until they get the correct number of bytes,
and because you're not dealing with "lines" you don't treat them as C
strings.  Of course, RFC 3501 explicitly says:

(3) The ASCII NUL character, %x00, MUST NOT be used at any
time.

But you're not supposed to send 16MB lines either!
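
The "read until you have the announced number of bytes" pattern can be
sketched roughly like this. It is a generic POSIX loop, not code from any
particular IMAP client, and read_exact is a name invented here.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Read exactly n bytes from fd into buf.
   Returns 0 on success, -1 on error or if EOF arrives early. */
static int read_exact(int fd, void *buf, size_t n)
{
    char *p = buf;

    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, try again */
            return -1;
        }
        if (r == 0)
            return -1;          /* peer closed before sending n bytes */
        p += r;
        n -= (size_t)r;
    }
    return 0;
}
```

Nothing in the loop looks at the contents of the data, so a NUL in the
middle of the literal is copied like any other byte.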

--Ken



Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-21 Thread Andy Bradford
Thus said Ken Hornstein on Tue, 21 Feb 2023 07:17:19 -0500:

> I'm sitting down to write or modify  nmh code. Right now we have a lot
> of code  that assumes NUL-terminated  C strings are safe  to represent
> email everywhere. My question is: is that a valid assumption?

I don't think  nmh should produce anything that contains  NUL bytes, but
whether or  not it should  accept such is  a different question  (as you
mention the 16 million byte line of text in an email message that I keep
getting from a certain sender that  cannot be bothered to follow the RFC
which clearly states  that base64 MIME data should be  78 characters and
clearly not longer than 998).

When I  was poking around  in the POP code  I didn't notice  any special
handling  of  NUL  bytes.  It's  possible  that  this  would  result  in
truncation. If that's what we do now, I suspect it's alright to continue
to do so; at  least until we find legitimate emails in  the wild that do
not conform (again think 16M character lines).

nmh's POP code has been silently truncating long lines (e.g. those
greater than 1023 bytes) for years, and crashing on lines that were
longer than 32,767 bytes. I only recently discovered this while trying
to figure out what to do with  a 16M character line. I went back through
old emails and sure  enough, I had a lot of  truncation. I never noticed
because most of them were in long lines of HTML that I don't ever bother
reading.
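
For comparison, a reader that neither truncates at a fixed size nor stops
at a NUL could look something like the following. This is only a sketch in
plain ISO C with a made-up name (read_long_line); it is not the fix that
went into nmh.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read one line of any length, including its '\n' if present.
   Grows *buf as needed.  Returns the number of bytes read, 0 at EOF,
   or (size_t)-1 if memory runs out. */
static size_t read_long_line(FILE *in, char **buf, size_t *cap)
{
    size_t len = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        if (len == *cap) {
            size_t ncap = *cap ? *cap * 2 : 128;
            char *p = realloc(*buf, ncap);
            if (p == NULL)
                return (size_t)-1;
            *buf = p;
            *cap = ncap;
        }
        (*buf)[len++] = (char)c;    /* a NUL is stored like any byte */
        if (c == '\n')
            break;
    }
    return len;
}
```

The caller starts with `char *buf = NULL; size_t cap = 0;`, reuses the
buffer across calls, and frees it at the end. The result is not
NUL-terminated, so the returned length is the only way to find the end.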

So I guess what I'm saying is, I think it's alright to continue to treat
messages as C-strings (until it isn't).

Andy




Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-21 Thread Andy Bradford
Thus said Ken Hornstein on Mon, 20 Feb 2023 21:11:48 -0500:

> Fascinating! I am curious: who/what sent this to you? Do you remember
> the MIME type?

0.11 % (percent) of my messages have Content-Transfer-Encoding of binary
at the beginning of the line somewhere in the message.

Here are the headers from one that  dates all the way back to 2001 (this
message does not appear to have any actual "binary" content in it).

---BEGIN
Content-Type: multipart/mixed; boundary="--=_154292612-6290-0"
Content-Transfer-Encoding: binary
MIME-Version: 1.0
X-Mailer: MIME-tools 5.41 (Entity 5.404)
From: "Jato Boa" 
Date: Xxx, 00 Xxx 2001 16:09:27 +0800

This is a multi-part message in MIME format...

=_154292612-6290-0
Content-Type: text/plain; charset="iso-8859-1"
Content-Disposition: inline

[ascii data]
=_154292612-6290-0
Content-Type: image/jpg; name="OosI Fric Ghesuf kurfIzKi chruzGi Awt.jpg"
Content-Disposition: attachment; filename="OosI Fric Ghesuf kurfIzKi chruzGi Awt.jpg"
Content-Transfer-Encoding: base64

[base64 data]
---END--


Also, I have quite a few from the Bugtraq mailing list that have a C-T-E
of binary. The headers indicate binary, but the rest of the body doesn't
seem to  imply it (doesn't  need it probably),  but then there  are some
like this:

https://seclists.org/bugtraq/2004/Aug/223

Here  are relevant  headers and  the  binary values  were replaced  with
:

---BEGIN
Content-Type: text/plain
Content-Disposition: inline
Content-Transfer-Encoding: binary
MIME-Version: 1.0
X-Mailer: MIME-tools 5.411 (Entity 5.404)
From: "Jrme" ATHIAS 
To: bugt...@securityfocus.com
Subject: First vulnerabilities in the SP2 - XP ?...
X-Spam-BMF-Status: No, hits=0.00 required=0.90



http://www.heise.de/security/artikel/50051

Regards,
Jrme ATHIAS

---END--

Today, I  think this message  would instead be quoted-printable  or some
other encoding.


Here's another  example from a  well known  online seller of  goods that
used messagelabs to send out customer order statuses:

---BEGIN
Content-Transfer-Encoding: binary
Content-Type: multipart/related; boundary="_--=_79242061420"
MIME-Version: 1.0
X-Mailer: MIME::Lite 3.01 (F2.72; A1.60; B2.21; Q2.21)
Date: Xxx, 00 Xxx  16:36:04 UT
From: [online store redacted]

This is a multi-part message in MIME format.

--_--=_79242061420
Content-Transfer-Encoding: binary
Content-Type: multipart/alternative; boundary="_--=_79242061421"
MIME-Version: 1.0
X-Mailer: MIME::Lite 3.01 (F2.72; A1.60; B2.21; Q2.21)
Date: Xxx, 00 Xxx  16:36:04 UT

This is a multi-part message in MIME format.

--_--=_79242061421
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain
MIME-Version: 1.0
X-Mailer: MIME::Lite 3.01 (F2.72; A1.60; B2.21; Q2.21)
Date: Xxx, 00 Xxx  16:36:04 UT

[quoted printable data]
---END--


Here's a more recent email from another online provider of services with
 replaced where binary value was found:

---BEGIN
Content-Type: text/html
Content-Disposition: inline
Content-Transfer-Encoding: binary
MIME-Version: 1.0
X-Mailer: MIME-tools 5.509 (Entity 5.509)

...

  
Copyright  2021 ...
---END--


Are these  bugs in  email client  implementations?

I've looked at a handful of the messages that I have which have a header
of C-T-E binary and the body of  the message is almost always some other
C-T-E (mostly  quoted-printable) or  non-binary. But sometimes  it seems
justified. Maybe they just throw the C-T-E  on there "just in case" as a
sloppy way of getting by?


> I guess  what I was hoping  for was a  consensus on what we  SHOULD do
> when we encounter  a NUL byte, because I haven't  heard that yet! Like
> what should the code do, precisely?

I'm not sure. Does anyone have any example of having received a NUL
byte in  an email? I'm  having a hard time  convincing grep to  look for
one.
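
If grep will not cooperate, a few lines of C will do the search. A small
sketch, assuming the message has already been read into memory with a known
length (count_nuls is a made-up name):

```c
#include <stddef.h>
#include <string.h>

/* Count the NUL bytes in buf[0..len). */
static size_t count_nuls(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    const unsigned char *end = p + len;
    size_t count = 0;

    /* memchr() takes an explicit length, so it can search for '\0'. */
    while (p < end && (p = memchr(p, '\0', (size_t)(end - p))) != NULL) {
        count++;
        p++;
    }
    return count;
}
```

Running this over each message file and reporting the ones with a nonzero
count would show whether any NULs have actually arrived.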

Andy




Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-21 Thread Michael Richardson
Ken Hornstein  wrote:
> I'm sitting down to write or modify nmh code.  Right now we have a lot
> of code that assumes NUL-terminated C strings are safe to represent
> email everywhere.  My question is: is that a valid assumption?  If
> we are making that assumption, fine, let's be explicit and if someone
> DOES encounter a NUL in modern email, we tell them to suck it.

I think that this is the minimum that we must do.

> If we all agree that is NOT a valid assumption, then fine, going forward
> we should eventually fix that, or target new APIs that fix that.  If

>> The IETF "modern SMTP" stuff John Klensin is working on (with others) might
>> want to talk to that: a lot of the ICANN UA stuff is a push for UTF-8 clean
>> across the board.

> I do not think this is relevant to this discussion, unless they are
> changing RFC 5322's position on NULs.

But, it seems like a question that IETF could clarify.

--
Michael Richardson. o O ( IPv6 IøT consulting )
   Sandelman Software Works Inc, Ottawa and Worldwide








Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-21 Thread Ken Hornstein
>> I do not think this is relevant to this discussion, unless they
>> are changing RFC 5322's position on NULs.
>
>But, it seems like a question that IETF could clarify.

I don't see how further clarification is necessary here?  I mean, a 16MB
single line in email is clearly a MUST NOT, but people send them anyway.

--Ken



Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-21 Thread Paul Fox
ken wrote:
 > I'm sitting down to write or modify nmh code.  Right now we have a lot
 > of code that assumes NUL-terminated C strings are safe to represent
 > email everywhere.  My question is: is that a valid assumption?  If
 > we are making that assumption, fine, let's be explicit and if someone
 > DOES encounter a NUL in modern email, we tell them to suck it.

It seems to me that, given the results of your skim of various other
mail recipients, it's clear that receiving NULs in mail is not a big
issue.  If receiving NULs were a big issue, or even, really, a small
issue, then the clients with far larger user bases than MH's would
have had to fix their code by now.  And they haven't.  (Your skim
wasn't comprehensive, but that says to me that there are likely more
potential breakages out there than you found -- not fewer.)

 > What I don't want is the current situation where we're kind of
 > half-assing it and it works because NULs are extremely uncommon (unless
 > we all agree that is fine).  So, I ask again: I encounter a NUL in

I personally vote for "that is fine".  If no one here has had issues
with NULs in mail, and the rest of the world seems to ignore the
problem, then I'd submit that it really isn't a problem.  The
wishy-washyness of the RFCs supports this.

Going forward we should try not to crash.  And we should try not to
truncate.  But then I'd say half-assing it is fine:  remove the NUL,
replace it with '@', whatever.  If it's never going to happen, then it
simply doesn't matter.
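
Since pseudocode was asked for, the two options could be sketched roughly
as follows. Both assume the data is held as a buffer plus a length, and both
function names are invented for this example.

```c
#include <stddef.h>

/* Replace every NUL in buf[0..len) with repl.
   Returns the number of bytes replaced; the length is unchanged. */
static size_t replace_nuls(char *buf, size_t len, char repl)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\0') {
            buf[i] = repl;
            n++;
        }
    }
    return n;
}

/* Remove every NUL from buf[0..len), compacting in place.
   Returns the new length. */
static size_t strip_nuls(char *buf, size_t len)
{
    size_t out = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] != '\0')
            buf[out++] = buf[i];
    }
    return out;
}
```

Replacing leaves a visible marker where the NUL was. Stripping joins the
bytes on either side of the NUL, which is the case David was uneasy about.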

paul
=--
paul fox, p...@foxharp.boston.ma.us (arlington, ma, where it's 34.7 degrees)




Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-21 Thread Ken Hornstein
>> if a NUL appears in the header somewhere all bets are off.
>
>I think it would be fascinating to understand how that happened. Depending
>on how the parse tree is done, it could be marginally bad, or catastrophic.
>
>I really would be amazed if this is seen in the wild. But it's a big
>network: maybe it's out there?

Sigh.  I don't really know if it has happened in the wild before (I will
presume that it has), but that's not really my point.  Let me try to
explain it again.

I'm sitting down to write or modify nmh code.  Right now we have a lot
of code that assumes NUL-terminated C strings are safe to represent
email everywhere.  My question is: is that a valid assumption?  If
we are making that assumption, fine, let's be explicit and if someone
DOES encounter a NUL in modern email, we tell them to suck it.

If we all agree that is NOT a valid assumption, then fine, going forward
we should eventually fix that, or target new APIs that fix that.  If
we agree that we should handle NULs in individual MIME parts but not
handle them in message headers, fine, let's make that explicit.  Then
that begs the question of what we SHOULD do when we encounter a NUL in
a message header.

What I don't want is the current situation where we're kind of
half-assing it and it works because NULs are extremely uncommon (unless
we all agree that is fine).  So, I ask again: I encounter a NUL in
an email.  What do I do, exactly?  Pseudocode is preferred in your
response.

>The IETF "modern SMTP" stuff John Klensin is working on (with others) might
>want to talk to that: a lot of the ICANN UA stuff is a push for UTF-8 clean
>across the board.

I do not think this is relevant to this discussion, unless they are
changing RFC 5322's position on NULs.

--Ken



Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-20 Thread George Michaelson
> if a NUL appears in the header somewhere all bets are off.

I think it would be fascinating to understand how that happened. Depending
on how the parse tree is done, it could be marginally bad, or catastrophic.

I really would be amazed if this is seen in the wild. But it's a big
network: maybe it's out there?

The IETF "modern SMTP" stuff John Klensin is working on (with others) might
want to talk to that: a lot of the ICANN UA stuff is a push for UTF-8 clean
across the board.

-G


Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-20 Thread Ken Hornstein
>I have received email with C-T-E set to binary.  While I don't think it
>was needed, I haven't checked closely.

Fascinating!  I am curious: who/what sent this to you?  Do you remember
the MIME type?

>> - Completely handle embedded NULs properly.  This is arguably the most
>>   correct option but would involve a lot of code changes.
>
>This might not be much of a lift.  m_getfld might handle NULs in bodies,
>and the MIME parser comes close to handling them as well.

Well, I'm not SURE that's necessarily true.  As you point out, that's
only true for the bodies of message fields.  And I see a lot of things
in the code that assume the body of a message field is a valid C string,
e.g. (mhparse.c):

    /* if necessary, get rest of field */
    while (state == FLDPLUS) {
        bufsz = sizeof buf;
        state = m_getfld2(&gst, name, buf, &bufsz);
        vp = add (buf, vp); /* add to previous value */
    }

Also a lot of things (like MIME parameter parsing, address parsing, etc
etc) assume C strings.  I agree that if you get a binary part it looks
like the right things will happen.  In terms of the networking code,
it looks like the right thing will happen when sending a NUL via
SMTP, but the POP code assumes that can't happen (as far as I can
tell, this was true even before I switched things to the unified
netsec code).

I guess what I was hoping for was a consensus on what we SHOULD do
when we encounter a NUL byte, because I haven't heard that yet!  Like
what should the code do, precisely?  It seems for message bodies we're
in reasonable shape (unless you are RETRIEVING a message via POP), but
if a NUL appears in the header somewhere all bets are off.

--Ken



Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-20 Thread David Levine
David Malone wrote:

> I wonder if it would be better to use fwrite instead of write, to
> avoid mixing stdio and Posix-style output? (It would also avoid an
> unbuffered write of 1 byte.)

Good point.  How about the attached?

David
diff --git a/uip/post.c b/uip/post.c
index 820ed05b..a58e19a1 100644
--- a/uip/post.c
+++ b/uip/post.c
@@ -660 +660,3 @@ main (int argc, char **argv)
-	case BODY: 
+	case BODY: {
+		size_t n;
+
@@ -664 +666,9 @@ main (int argc, char **argv)
-		fprintf (out, "\n%s", buf);
+		if (fwrite ("\n", 1, 1, out) != 1) {
+		adios ("write of newline between header and body", "failed");
+		}
+		/* Don't emit trailing NUL to avoid interfering with SMTP
+		   conversation. */
+		n =  bufsz >= 1 && buf[bufsz-1] == '\0' ? bufsz - 1 : bufsz;
+		if (fwrite (buf, 1, n, out) != n) {
+		adios ("write of body", "failed");
+		}
@@ -668 +678,3 @@ main (int argc, char **argv)
-		fputs (buf, out);
+		if (fwrite (buf, 1, bufsz, out) != (size_t) bufsz) {
+			adios ("continued write of body", "failed");
+		}
@@ -670,0 +683 @@ main (int argc, char **argv)
+	}


Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-20 Thread David Malone
I wonder if it would be better to use fwrite instead of write, to
avoid mixing stdio and Posix-style output? (It would also avoid an
unbuffered write of 1 byte.)
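
To spell out the ordering concern: stdio keeps data in its own buffer, so
bytes sent with write(2) can reach the file before earlier fprintf/fputs
output unless the stream is flushed first. A rough sketch of the safe order,
assuming a POSIX system (the function name is made up):

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() */
#include <stdio.h>
#include <unistd.h>

/* Write hdr through stdio, then len raw bytes of body through write(2).
   The fflush() pushes the buffered header out before the body is written.
   Returns 0 on success, -1 on any failure or short write. */
static int write_header_then_body(FILE *out, const char *hdr,
                                  const char *body, size_t len)
{
    if (fputs(hdr, out) == EOF || fflush(out) == EOF)
        return -1;
    return write(fileno(out), body, len) == (ssize_t)len ? 0 : -1;
}
```

Using fwrite() for the body instead keeps everything in one buffer and makes
the explicit flush unnecessary.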

David.



Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-20 Thread David Levine
I wrote:

> This might not be much of a lift.  m_getfld might handle NULs in bodies,
> and the MIME parser comes close to handling them as well.

m_getfld and the MIME parser do handle NULs.

post(8) doesn't.  I'd like to commit the attached patch.  It uses write(2)
instead of fprintf(3) and fputs(3).  All tests pass with it.  Any objection?
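
The reason fprintf("%s") and fputs() fall short is that both stop at the
first NUL, while a call that takes an explicit length does not. A small
stand-alone illustration (the helper names are invented, and this is not
the patch itself):

```c
#include <stdio.h>

/* Write buf as a C string: output stops at the first NUL byte. */
static int emit_cstring(FILE *out, const char *buf)
{
    return fputs(buf, out) == EOF ? -1 : 0;
}

/* Write exactly len bytes, NULs included. */
static int emit_counted(FILE *out, const char *buf, size_t len)
{
    return fwrite(buf, 1, len, out) == len ? 0 : -1;
}
```

Given the five bytes "ab\0cd", emit_cstring() writes two of them and
emit_counted() with len 5 writes all five.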

While SMTP servers still might not be able to handle NULs, I don't think
that nmh should block them.

> mhshow has a test-binary that says that it reads a null byte, but it's
> just a space.

That was incorrect.  printf(1) wrote out the NUL.  I'll add a comment to
the test to make that clearer.

David
diff --git a/uip/post.c b/uip/post.c
index 820ed05b..010bcf1e 100644
--- a/uip/post.c
+++ b/uip/post.c
@@ -340 +340 @@ main (int argc, char **argv)
-int state, compnum, dashstuff = 0, swnum;
+int state, compnum, dashstuff = 0, swnum, out_fd;
@@ -632 +632 @@ main (int argc, char **argv)
-	char *cp = m_mktemp2(NULL, invo_name, NULL, &out);
+	char *cp = m_mktemp2(NULL, invo_name, &out_fd, &out);
@@ -664 +664,5 @@ main (int argc, char **argv)
-		fprintf (out, "\n%s", buf);
+		write (out_fd, "\n", 1);
+		/* Don't emit trailing NUL to avoid interfering with SMTP
+		   conversation. */
+		write (out_fd, buf,
+		   bufsz >= 1 && buf[bufsz-1] == '\0' ? bufsz-1 : bufsz);
@@ -668 +672 @@ main (int argc, char **argv)
-		fputs (buf, out);
+		write (out_fd, buf, bufsz);
@@ -1240,0 +1245 @@ finish_headers (FILE *out)
+fflush (out);


Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-19 Thread David Levine
Ken wrote:

> But RFC 2045 says in §6.2:
>
>Thus there are no
>circumstances in which the "binary" Content-Transfer-Encoding is
>actually valid in Internet mail.

> Also, I am not aware of "binary" being used as a C-T-E at all.

I have received email with C-T-E set to binary.  While I don't think it
was needed, I haven't checked closely.

> - Completely handle embedded NULs properly.  This is arguably the most
>   correct option but would involve a lot of code changes.

This might not be much of a lift.  m_getfld might handle NULs in bodies,
and the MIME parser comes close to handling them as well.

mhshow has a test-binary that says that it reads a null byte, but it's
just a space.  That should be fixed, but I think that might reveal
(minor?) deficiencies elsewhere.

David



Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-18 Thread Steffen Nurpmeso
Ken Hornstein wrote in
 <20230219012125.2e48b1d7...@pb-smtp21.pobox.com>:
 |>Seems to me this is classification of attachment data, which will end up
 |>as octet-stream in that case.
 |
 |It's ... a little confusing!
 |
 |>For S-nail we more or less do what Heirloom mailx has done.
 |
 |Well, it seems that in the message lexer if you encounter a NUL you
 |just stop, from a_msg_scan():
 |
 |  cp = mslp->msl_cap->ca_arg.ca_str.s;
 |  if((c = *cp++) != '\0')
 | break;

That seems to come from a command argument parser, not mail
content.  Ah no, no no, wrong code :)
I can assure you that the email

  From reproducible_build Wed Oct  2 01:50:07 1996
  Date: Wed, 02 Oct 1996 01:50:07 +
  From: e...@am.ple
  Subject: s3
  MIME-Version: 1.0
  Content-Type: text/plain; charset=utf-8
  Content-Transfer-Encoding: quoted-printable
  Status: O

  Alo=00ha
  Boom.

is decoded (of course) and displayed with the NUL converted to the
Unicode graphical for NUL.
The same if i make it "binary" and put a real NUL in place of the
=00.
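
As a rough illustration of that kind of display mapping (not S-nail's
actual code; both function names are made up): Unicode has "control
picture" symbols, with U+2400 SYMBOL FOR NULL standing in for NUL and
U+2421 for DEL. Note that this sketch maps every control byte, whereas a
real display filter would let newline and tab through.

```c
#include <stddef.h>

/* Map a control byte to its Unicode control picture:
   0x00..0x1F -> U+2400..U+241F, 0x7F -> U+2421.
   Any other byte is returned unchanged. */
static unsigned control_picture(unsigned char c)
{
    if (c < 0x20)
        return 0x2400u | c;
    if (c == 0x7f)
        return 0x2421u;
    return c;
}

/* Encode a code point below U+10000 as UTF-8.
   Returns the number of bytes written to out (1 to 3). */
static size_t utf8_encode_bmp(unsigned cp, unsigned char out[3])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

For a NUL byte this produces U+2400, which encodes to the three UTF-8
bytes E2 90 80.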

 |It does look like to me that for IMAP and POP a NUL character is handled
 |properly.  But that doesn't answer the question, what do you THINK should

Uh i really had to look and try out whether binary data on the
input side of IMAP or POP3 properly handles embedded NULs.
I would assume yes.  (More or less.)

 |happen?  Should NULs be passed through?  You basically can't use C strings
 |anywhere if you want to handle embedded NULs.

That is true.

 |>The implementation is total crap. (longjmp codebase, data leaks,
 |>blocking I/O, all that (it was).)  All of these (mailbox read,
 |>content-transfer decoding, character set conversion, .. display
 |>preparation) should be "filters" with input and output plugged together,
 |>with internal buffers as necessary.  That is the v15 MIME and I/O layer
 |>rewrite that is not happening for nine years.
 |
 |Sigh, I know the feeling :-/

A nice Sunday is also not a bad thing.
Ciao,

--steffen
|
|Der Kragenbaer,The moon bear,
|der holt sich munter   he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-18 Thread Ken Hornstein
>Seems to me this is classification of attachment data, which will end up
>as octet-stream in that case.

It's ... a little confusing!

>For S-nail we more or less do what Heirloom mailx has done.

Well, it seems that in the message lexer if you encounter a NUL you
just stop, from a_msg_scan():

  cp = mslp->msl_cap->ca_arg.ca_str.s;
  if((c = *cp++) != '\0')
 break;

It does look like to me that for IMAP and POP a NUL character is handled
properly.  But that doesn't answer the question, what do you THINK should
happen?  Should NULs be passed through?  You basically can't use C strings
anywhere if you want to handle embedded NULs.

>The implementation is total crap. (longjmp codebase, data leaks,
>blocking I/O, all that (it was).)  All of these (mailbox read,
>content-transfer decoding, character set conversion, .. display
>preparation) should be "filters" with input and output plugged together,
>with internal buffers as necessary.  That is the v15 MIME and I/O layer
>rewrite that is not happening for nine years.

Sigh, I know the feeling :-/

--Ken



Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-18 Thread Steffen Nurpmeso
P.S.:

Congratulations on your new release btw.

I have written an OAuth helper in Python3 that supports OAuth for
GMail, Microsoft, Yandex:

  curl -u moon:mars --basic -O https://git.sdaoden.eu/browse/s-toolbox.git/plain/oauth-helper.py

It has a "manual" mode where it documents for GMail

  -- How to create a Google registration --

  Go to console.developers.google.com, and create a new project. The name
  doesn't matter and could be "mutt registration project".

   - Go to Library, choose Gmail API, and enable it
   - Hit left arrow icon to get back to console.developers.google.com
   - Choose OAuth Consent Screen
      - Choose Internal for an organizational G Suite
      - Choose External if that's your only choice
      - For Application Name, put for example "Mutt"
      - Under scopes, choose Add scope, scroll all the way down, enable the
        "https://mail.google.com/" scope
        [Note this only allows "internal" users; you get the same mail usage
        scope by selecting those gmail scopes without any lock symbol!
        Like this application verification is not needed, and "External" can
        be chosen.]
      - Fill out additional fields (application logo, etc) if you feel like it
        (will make the consent screen look nicer)

Maybe this helps!

--steffen
|
|Der Kragenbaer,The moon bear,
|der holt sich munter   he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-18 Thread Steffen Nurpmeso
Ken Hornstein wrote in
 <20230219001921.597ad1e0...@pb-smtp20.pobox.com>:
 ...
 |- mutt
 ...
 |[.]Internally mutt does
 |have an idea if the content contains a NUL (the CONTENT structure contains
 |a 'nulbin' member which contains the number of NUL bytes), but it's not
 |clear to me what happens when a NUL is encountered.

Seems to me this is classification of attachment data, which will
end up as octet-stream in that case.

For S-nail we more or less do what Heirloom mailx has done.
For classification purposes we switch to octet-stream.
For display purposes we happily display it after passing it
through some kind of makeprint.

  isuni = ((n_psonce & n_PSO_UNICODE) != 0);
  ...
     if(!iswprint(wc) && wc != '\n' /*&& wc != '\r' && wc != '\b'*/ &&
           wc != '\t'){
        if ((wc & ~S(wchar_t,037)) == 0)
           wc = isuni ? 0x2400 | wc : '?';
        else if(wc == 0177)
           wc = isuni ? 0x2421 : '?';
        else
           wc = isuni ? 0x2426 : '?';
     }else if(isuni){ /* TODO ctext */
        /* Need to filter out L-TO-R and R-TO-L marks TODO ctext */
        if(wc == 0x200E || wc == 0x200F || (wc >= 0x202A && wc <= 0x202E))
           continue;
        /* And some zero-width messes */
        if(wc == 0x00AD || (wc >= 0x200B && wc <= 0x200D))
           continue;
        /* Oh about the ISO C wide character interfaces, baby! */
        if(wc == 0xFEFF)
           continue;
     }

Or, without mb* and wc* sausage,

   {
      int c;
      while(inp < maxp){
         c = *inp++ & 0377;
         if(!su_cs_is_print(c) &&
               c != '\n' && c != '\r' && c != '\b' && c != '\t')
            c = '?';
         *outp++ = c;
      }
      out->l = in->l;
   }

This is even a regression from Heirloom mailx that Jörg
Schilling was very dissatisfied about, as the above only handles
ASCII printable regardless of the locale.  (My plan was to write
a CText library for Unicode handling, and it was quite progressed
with only about two months until decomposition and normalization
were implemented (Christmas 2014), when something very bad
happened.  Maybe i will do it someday.  Or simply do what OpenBSD
does and use perl's fantastic Unicode support to generate some
tables.)

The implementation is total crap.  (longjmp codebase, data leaks,
blocking I/O, all that (it was).)  All of these (mailbox read,
content-transfer decoding, character set conversion, .. display
preparation) should be "filters" with input and output plugged
together, with internal buffers as necessary.  That is the v15
MIME and I/O layer rewrite that is not happening for nine years.

--steffen
|
|Der Kragenbaer,The moon bear,
|der holt sich munter   he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



(Not-so) hypothetical question: What to do about NULs?

2023-02-18 Thread Ken Hornstein
I've been idly thinking about this for a while, and while the question
might be simple I think it gets at some larger meta-issues that we have
never really agreed on how to resolve properly.

My question is, simply: What should happen when nmh encounters a NUL
character (U+0000) in email?

The rules
---------

In theory, a NUL is never permitted in an email message.  RFC 5322 (the
latest incarnation of RFC 822) says in §4:

   Finally, certain characters that were formerly allowed in messages
   appear in this section.  The NUL character (ASCII value 0) was once
   allowed, but is no longer for compatibility reasons.

However, in §4.1 a NUL character is added to the BNF for obs-utext and
obs-body, so in THEORY you are supposed to handle that if you handle
obsolete messages.  §4 also says:

  Note: This section identifies syntactic forms that any
  implementation MUST reasonably interpret.  However, there are
  certainly Internet messages that do not conform to even the
  additional syntax given in this section.  The fact that a
  particular form does not appear in any section of this document is
  not justification for computer programs to crash or for malformed
  data to be irretrievably lost by any implementation.  It is up to
  the implementation to deal with messages robustly.

RFC 5322 punts some of the message syntax back to the MIME RFCs.
The "binary" content transfer encoding does allow any octet including
NUL characters.  But RFC 2045 says in §6.2:

   Mail transport for unencoded 8bit data is defined in RFC 1652.  As of
   the initial publication of this document, there are no standardized
   Internet mail transports for which it is legitimate to include
   unencoded binary data in mail bodies.  Thus there are no
   circumstances in which the "binary" Content-Transfer-Encoding is
   actually valid in Internet mail.  However, in the event that binary
   mail transport becomes a reality in Internet mail, or when MIME is
   used in conjunction with any other binary-capable mail transport
   mechanism, binary bodies must be labelled as such using this
   mechanism.

RFC 9051 (IMAP4rev2) says in §4.3.1:

   IMAP4rev2 is compatible with [I18N-HDRS]. As a result, the identified
   charset for header-field values with 8-bit content is UTF-8
   [UTF-8]. IMAP4rev2 implementations MUST accept and MAY transmit
   [UTF-8] text in quoted-strings as long as the string does not contain
   NUL, CR, or LF. This differs from IMAP4rev1 implementations.

   Although a BINARY content transfer encoding is defined, unencoded
   binary strings are not permitted, unless returned in a <literal8>
   in response to a BINARY.PEEK[<section-binary>]<<partial>> or
   BINARY[<section-binary>]<<partial>> FETCH data item. A "binary string"
   is any string with NUL characters. A string with an excessive amount
   of CTL characters MAY also be considered to be binary. Unless returned
   in response to BINARY.PEEK[...]/BINARY[...] FETCH, client and server
   implementations MUST encode binary data into a textual form, such as
   base64, before transmitting the data.

So it's ... a bit wishy-washy, but I think the case for NUL not being
valid is mostly okay.  IMAP, at least, says you can't send a NUL unless
you are getting a BINARY response with the special literal8 response
format (and BINARY is not defined in RFC 3501).
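
Whatever the policy ends up being, the first step is noticing that a
NUL is there at all, which C string functions cannot do.  A minimal
sketch, assuming the message is held in a buffer with a known length;
count_nuls is a hypothetical helper, not an existing nmh function:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not nmh code: count the NUL octets in
 * buf[0..len).  The length is passed explicitly because strlen()
 * and friends stop at the first NUL. */
static size_t
count_nuls(const char *buf, size_t len)
{
    size_t n = 0;
    const char *p = buf;
    const char *end = buf + len;

    /* memchr() takes a length, so it can search past embedded NULs. */
    while (p < end && (p = memchr(p, '\0', (size_t)(end - p))) != NULL) {
        n++;
        p++;
    }
    return n;
}
```

A non-zero result means the buffer cannot safely be passed on as a C
string; what to do then (reject, strip, replace, or treat as binary)
is the open question.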

Messages in the real world
--------------------------

While other rules seem to be violated with impunity (see: 16MB single
lines) I am not aware of bare NULs commonly being sent in email messages
today.  Also, I am not aware of "binary" being used as a C-T-E at all.
Now, I could be COMPLETELY wrong about this!  It would be interesting to
hear about use of the binary CTE or other occurrences of NUL characters
in the wild.

My impression is that if you are getting binary data, it is universally
encoded with base64; that is something everyone seems to be doing.  And
a NUL character doesn't seem to be valid in non-ASCII character sets
as anything other than a NUL.

How other mail programs deal with NULs
--------------------------------------

I was curious, so I took a look.  I tried to look at "modern" mail programs,
and by that I mean, "Seems to be kept up to date".  Which sadly excludes
Heirloom mailx, as it seems to have had its last release in 2005.  I am
open to hearing about what other mail programs do.

- fetchmail

Fetchmail unceremoniously just smashes any NUL characters it sees, so if
you are retrieving messages using fetchmail you never see any NUL
characters.  From transact.c:

/*
 * Smash out any NULs, they could wreak havoc later on.
 * Some network stacks seem to generate these at random,
 * especially (according to reports) at the beginning of the
 * first read.  NULs are illegal in RFC822 format.
 */

You might get a special header warning you that a message had an
embedded NUL, though.
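
For illustration, the "smash" approach amounts to something like the
following.  This is a sketch and not fetchmail's actual code;
strip_nuls and its removed out-parameter are hypothetical:

```c
#include <stddef.h>

/* Illustrative only; this is not fetchmail's code.  Removes every
 * NUL from buf[0..len) in place and returns the new length.  If
 * removed is not NULL it receives the number of octets dropped, so
 * the caller can decide whether to add a warning header. */
static size_t
strip_nuls(char *buf, size_t len, size_t *removed)
{
    size_t in, out = 0;

    for (in = 0; in < len; in++) {
        if (buf[in] != '\0')
            buf[out++] = buf[in];
    }
    if (removed != NULL)
        *removed = len - out;
    return out;
}
```

Note that this changes the message size, so it has to happen before
anything records byte counts or offsets for the message.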

- alpine

Internally alpine (which uses a lot of