Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-21 Thread David Levine
Ken wrote:

> [David:]
> >I have received email with C-T-E set to binary.  While I don't think it
> >was needed, I haven't checked closely.
>
> Fascinating!  I am curious: who/what sent this to you!  Do you remember
> the MIME type?

The C-T-E: binary is in the message header.  There are two alternative
content parts, text/html and text/plain.  Both are encoded Q-P.  So
the C-T-E: binary is gratuitous.  (And mhfixmsg converts it to 8-bit.)

 msg part  type/subtype              size description
   0       multipart/alternative      26K
             boundary="--=_1648114734-702538-12126"
             charset="UTF-8"
     1     text/html                  16K
             disposition "inline"
     2     text/plain                9823
             disposition "inline"

The sender, freecycle.org, uses that C-T-E: binary often.  Maybe every
time.

> Well, I'm not SURE that's necessarily true.  As you point out, that's
> only true for the bodies of message fields.  And I see a lot of things
> in the code that assume the body of a message field is a valid C string,
> e.g. (mhparse.c):
>
>     /* if necessary, get rest of field */
>     while (state == FLDPLUS) {
>         bufsz = sizeof buf;
>         state = m_getfld2(&gst, name, buf, &bufsz);
>         vp = add (buf, vp);  /* add to previous value */
>     }

That's in FLDPLUS, still in the header.

> In terms of the networking code,
> it looks like the right thing will happen when sending a NUL via
> SMTP,

Almost, but not quite.  I posted a possible fix but I'm still refining
it.

> It seems for message bodies we're
> in reasonable shape (unless you are RETRIEVING a message via POP), but
> if a NUL appears in the header somewhere all bets are off.

Yeah.  I'd be OK with replacing NULs with some legal
character(s).  I'm not sure that just squashing them is a good
idea.  I don't have a concrete example, but wonder if it could be
abused, say in a really messy URL.
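
As a rough sketch of the "replace rather than squash" idea, something
like the following would keep the length of the data unchanged and leave
a visible marker where each NUL was.  The name replace_nuls() and the
choice of substitute character are illustrative assumptions, not
existing nmh code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Replace every NUL byte in buf[0..len) with `subst` and return how many
 * bytes were changed.  The length is passed explicitly because strlen()
 * would stop at the first NUL.  Hypothetical helper, not part of nmh.
 */
size_t
replace_nuls(char *buf, size_t len, char subst)
{
    size_t i, changed = 0;

    assert(buf != NULL || len == 0);
    assert(subst != '\0');

    for (i = 0; i < len; i++) {
        if (buf[i] == '\0') {
            buf[i] = subst;
            changed++;
        }
    }
    return changed;
}
```

Because nothing is removed, two strings that differ only in where the
NULs sat stay different after the substitution, which is the property
squashing would lose.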

David



Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-21 Thread Ken Hornstein
>When I  was poking around  in the POP code  I didn't notice  any special
>handling  of  NUL  bytes.  It's  possible  that  this  would  result  in
>truncation. If that's what we do now, I suspect it's alright to continue
>to do so; at  least until we find legitimate emails in  the wild that do
>not conform (again think 16M character lines).

Right, definitely the POP code doesn't handle this, and my quick check
suggests we're not the only ones.

However, it seems like a lot of IMAP implementations do better.  I think
that's due to the protocol; in POP when you retrieve a message it looks
like:

C: RETR 1
S: +OK
S: Line 1
S: Line 2
S: [...]
S: .

So you're THINKING in lines so you tend to read a "line" until you get a
line with the sentinel value (.\r\n).
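
To make the failure mode concrete, here is a small sketch (hypothetical
helpers, not taken from any POP client) of the sentinel test and of how
much of a received line disappears once it is handled as a C string.
The caller is assumed to know how many bytes it actually read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* True if the line is exactly the POP end-of-message sentinel ".\r\n". */
int
is_pop_terminator(const char *line, size_t len)
{
    assert(line != NULL);
    return len == 3 && memcmp(line, ".\r\n", 3) == 0;
}

/*
 * Number of bytes that would be lost if line[0..len) were treated as a
 * NUL-terminated string: everything from the first NUL onward.
 */
size_t
bytes_lost_as_c_string(const char *line, size_t len)
{
    const char *nul;

    assert(line != NULL);
    nul = memchr(line, '\0', len);
    return nul ? len - (size_t)(nul - line) : 0;
}
```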

IMAP, on the other hand, looks like:

C: a0001 FETCH 1 (RFC822)
S: * 1 FETCH (RFC822 {1024}
[... 1024 bytes of data follows ...]
S: )

So you're told "I am sending this many bytes exactly", and you don't
have to deal with "lines", so the implementations I've seen tend to call
read() (or the equivalent) until they get the correct number of bytes,
and because you're not dealing with "lines" you don't treat them as C
strings.  Of course, RFC 3501 explicitly says:

(3) The ASCII NUL character, %x00, MUST NOT be used at any
time.

But you're not supposed to send 16MB lines either!
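
The count-driven style described above might look roughly like this.
It is only a sketch: read_exact() is a made-up name, and it uses stdio's
fread() where a real client would more likely call read() on a socket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Read exactly n bytes from fp into buf, looping over short reads.
 * Returns the number of bytes stored, which is less than n only on
 * end of input or error.  Bytes are never inspected, so NULs pass through.
 */
size_t
read_exact(FILE *fp, char *buf, size_t n)
{
    size_t got = 0;

    assert(fp != NULL && buf != NULL);

    while (got < n) {
        size_t r = fread(buf + got, 1, n - got, fp);
        if (r == 0)
            break;              /* EOF or error: caller sees a short count */
        got += r;
    }
    return got;
}
```

The caller keeps the returned count next to the buffer rather than
calling strlen() on it, which is why NULs survive in this style.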

--Ken



Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-21 Thread Andy Bradford
Thus said Ken Hornstein on Tue, 21 Feb 2023 07:17:19 -0500:

> I'm sitting down to write or modify  nmh code. Right now we have a lot
> of code  that assumes NUL-terminated  C strings are safe  to represent
> email everywhere. My question is: is that a valid assumption?

I don't think  nmh should produce anything that contains  NUL bytes, but
whether or  not it should  accept such is  a different question  (as you
mention the 16 million byte line of text in an email message that I keep
getting from a certain sender that  cannot be bothered to follow the RFC
which clearly states  that base64 MIME data should be  78 characters and
clearly not longer than 998).

When I  was poking around  in the POP code  I didn't notice  any special
handling  of  NUL  bytes.  It's  possible  that  this  would  result  in
truncation. If that's what we do now, I suspect it's alright to continue
to do so; at  least until we find legitimate emails in  the wild that do
not conform (again think 16M character lines).

nmh's  POP code  has been  silently  truncating long  lines (e.g.  those
greater  than 1023  bytes) for  years and  crashing on  lines that  were
longer than 32,767 bytes.  I  only recently discovered this while trying
to figure out what to do with  a 16M character line. I went back through
old emails and sure  enough, I had a lot of  truncation. I never noticed
because most of them were in long lines of HTML that I don't ever bother
reading.
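
For illustration, here is one way a fixed-size line buffer can at least
report truncation instead of hiding it.  This is a sketch under the
assumption that lines are read with fgets(); it is not the actual nmh
POP code, and read_line_checked() is a hypothetical name.  Note that it
still treats the line as a C string, so an embedded NUL would hide the
newline from strchr().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read one line into buf (capacity bufsz).  Sets *truncated to 1 when
 * part of the line did not fit, and discards that remainder so the next
 * call starts on a fresh line.  Returns 0 at end of input, 1 otherwise.
 */
int
read_line_checked(FILE *fp, char *buf, size_t bufsz, int *truncated)
{
    int c;

    assert(fp != NULL && buf != NULL && truncated != NULL && bufsz > 1);

    *truncated = 0;
    if (fgets(buf, (int)bufsz, fp) == NULL)
        return 0;

    if (strchr(buf, '\n') == NULL) {
        /* No newline stored: the buffer filled up or the input ended. */
        while ((c = getc(fp)) != EOF && c != '\n')
            *truncated = 1;     /* at least one byte did not fit */
    }
    return 1;
}
```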

So I guess what I'm saying is, I think it's alright to continue to treat
messages as C-strings (until it isn't).

Andy




Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-21 Thread Andy Bradford
Thus said Ken Hornstein on Mon, 20 Feb 2023 21:11:48 -0500:

> Fascinating! I am  curious: who/what sent this to you!  Do you remember
> the MIME type?

0.11 % (percent) of my messages have Content-Transfer-Encoding of binary
at the beginning of the line somewhere in the message.
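
A check of that kind could be sketched as below.  This is an
illustration of the matching rule only (field name at the start of the
line, value "binary", both case-insensitive); it is not the command
used to produce the figure above, and the function names are made up.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive prefix test; returns the bytes matched, or 0. */
static size_t
match_prefix_ci(const char *s, const char *prefix)
{
    size_t i;

    for (i = 0; prefix[i] != '\0'; i++)
        if (tolower((unsigned char)s[i]) != tolower((unsigned char)prefix[i]))
            return 0;
    return i;
}

/* True if the line is a Content-Transfer-Encoding field set to binary. */
int
is_cte_binary(const char *line)
{
    size_t n;

    assert(line != NULL);

    n = match_prefix_ci(line, "content-transfer-encoding:");
    if (n == 0)
        return 0;
    line += n;
    while (*line == ' ' || *line == '\t')
        line++;

    n = match_prefix_ci(line, "binary");
    if (n == 0)
        return 0;
    line += n;

    /* Only trailing whitespace or the line ending may follow. */
    while (*line == ' ' || *line == '\t' || *line == '\r' || *line == '\n')
        line++;
    return *line == '\0';
}
```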

Here are the headers from one that  dates all the way back to 2001 (this
message does not appear to have any actual "binary" content in it).

---BEGIN
Content-Type: multipart/mixed; boundary="--=_154292612-6290-0"
Content-Transfer-Encoding: binary
MIME-Version: 1.0
X-Mailer: MIME-tools 5.41 (Entity 5.404)
From: "Jato Boa" 
Date: Xxx, 00 Xxx 2001 16:09:27 +0800

This is a multi-part message in MIME format...

=_154292612-6290-0
Content-Type: text/plain; charset="iso-8859-1"
Content-Disposition: inline

[ascii data]
=_154292612-6290-0
Content-Type: image/jpg; name="OosI Fric Ghesuf kurfIzKi chruzGi Awt.jpg"
Content-Disposition: attachment; filename="OosI Fric Ghesuf kurfIzKi chruzGi 
Awt.jpg"
Content-Transfer-Encoding: base64

[base64 data]
---END--


Also, I have quite a few from the Bugtraq mailing list that have a C-T-E
of binary. The headers indicate binary, but the rest of the body doesn't
seem to  imply it (doesn't  need it probably),  but then there  are some
like this:

https://seclists.org/bugtraq/2004/Aug/223

Here  are relevant  headers and  the  binary values  were replaced  with
:

---BEGIN
Content-Type: text/plain
Content-Disposition: inline
Content-Transfer-Encoding: binary
MIME-Version: 1.0
X-Mailer: MIME-tools 5.411 (Entity 5.404)
From: "Jrme" ATHIAS 
To: bugt...@securityfocus.com
Subject: First vulnerabilities in the SP2 - XP ?...
X-Spam-BMF-Status: No, hits=0.00 required=0.90



http://www.heise.de/security/artikel/50051

Regards,
Jrme ATHIAS

---END--

Today, I  think this message  would instead be quoted-printable  or some
other encoding.


Here's another  example from a  well known  online seller of  goods that
used messagelabs to send out customer order statuses:

---BEGIN
Content-Transfer-Encoding: binary
Content-Type: multipart/related; boundary="_--=_79242061420"
MIME-Version: 1.0
X-Mailer: MIME::Lite 3.01 (F2.72; A1.60; B2.21; Q2.21)
Date: Xxx, 00 Xxx  16:36:04 UT
From: [online store redacted]

This is a multi-part message in MIME format.

--_--=_79242061420
Content-Transfer-Encoding: binary
Content-Type: multipart/alternative; boundary="_--=_79242061421"
MIME-Version: 1.0
X-Mailer: MIME::Lite 3.01 (F2.72; A1.60; B2.21; Q2.21)
Date: Xxx, 00 Xxx  16:36:04 UT

This is a multi-part message in MIME format.

--_--=_79242061421
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain
MIME-Version: 1.0
X-Mailer: MIME::Lite 3.01 (F2.72; A1.60; B2.21; Q2.21)
Date: Xxx, 00 Xxx  16:36:04 UT

[quoted printable data]
---END--


Here's a more recent email from another online provider of services with
 replaced where binary value was found:

---BEGIN
Content-Type: text/html
Content-Disposition: inline
Content-Transfer-Encoding: binary
MIME-Version: 1.0
X-Mailer: MIME-tools 5.509 (Entity 5.509)

...

  
Copyright  2021 ...
---END--


Are these  bugs in  email client  implementations?

I've looked at a handful of the messages that I have which have a header
of C-T-E binary and the body of  the message is almost always some other
C-T-E (mostly  quoted-printable) or  non-binary. But sometimes  it seems
justified. Maybe they just throw the C-T-E  on there "just in case" as a
sloppy way of getting by?


> I guess  what I was hoping  for was a  consensus on what we  SHOULD do
> when we encounter  a NUL byte, because I haven't  heard that yet! Like
> what should the code do, precisely?

I'm not  sure. Does anyone  have any example  of having received  a NUL
byte in  an email? I'm  having a hard time  convincing grep to  look for
one.
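
If grep will not cooperate, a few lines of C can do the search, since
fread() returns a byte count and never interprets the data.  The
following is a sketch with hypothetical names, not an existing nmh
tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Count the NUL bytes in buf[0..len). */
long
count_nuls_in(const char *buf, size_t len)
{
    size_t i;
    long count = 0;

    assert(buf != NULL || len == 0);

    for (i = 0; i < len; i++)
        if (buf[i] == '\0')
            count++;
    return count;
}

/* Count the NUL bytes in the rest of a stream, reading in chunks. */
long
count_nuls(FILE *fp)
{
    char chunk[4096];
    size_t n;
    long count = 0;

    assert(fp != NULL);

    while ((n = fread(chunk, 1, sizeof chunk, fp)) > 0)
        count += count_nuls_in(chunk, n);
    return count;
}
```

A small main() that opens each message file in "rb" mode and prints the
names with a non-zero count would turn this into a NUL finder for a
mail folder.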

Andy




Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-21 Thread Michael Richardson
Ken Hornstein wrote:
> I'm sitting down to write or modify nmh code.  Right now we have a lot
> of code that assumes NUL-terminated C strings are safe to represent
> email everywhere.  My question is: is that a valid assumption?  If
> we are making that assumption, fine, let's be explicit and if someone
> DOES encounter a NUL in modern email, we tell them to suck it.

I think that this is the minimum that we must do.

> If we all agree that is NOT a valid assumption, then fine, going forward
> we should eventually fix that, or target new APIs that fix that.  If

>> The IETF "modern SMTP" stuff John Klensin is working on (with others)
>> might want to talk to that: a lot of the ICANN UA stuff is a push for
>> UTF-8 clean across the board.

> I do not think this is relevant to this discussion, unless they are
> changing RFC 5322's position on NULs.

But, it seems like a question that IETF could clarify.

--
Michael Richardson. o O ( IPv6 IøT consulting )
   Sandelman Software Works Inc, Ottawa and Worldwide







Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-21 Thread Ken Hornstein
>> I do not think this is relevant to this discussion, unless they
>> are changing RFC 5322's position on NULs.
>
>But, it seems like a question that IETF could clarify.

I don't see how further clarification is necessary here?  I mean, a 16MB
single line in email is clearly a MUST NOT, but people send them anyway.

--Ken



Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-21 Thread Paul Fox
ken wrote:
 > I'm sitting down to write or modify nmh code.  Right now we have a lot
 > of code that assumes NUL-terminated C strings are safe to represent
 > email everywhere.  My question is: is that a valid assumption?  If
 > we are making that assumption, fine, let's be explicit and if someone
 > DOES encounter a NUL in modern email, we tell them to suck it.

It seems to me that, given the results of your skim of various other
mail recipients, it's clear that receiving NULs in mail is not a big
issue.  If receiving NULs were a big issue, or even, really, a small
issue, then the clients with far larger user bases than MH's would
have had to fix their code by now.  And they haven't.  (Your skim
wasn't comprehensive, but that says to me that there are likely more
potential breakages out there than you found -- not fewer.)

 > What I don't want is the current situation where we're kind of
 > half-assing it and it works because NULs are extremely uncommon (unless
 > we all agree that is fine).  So, I ask again: I encounter a NUL in

I personally vote for "that is fine".  If no one here has had issues
with NULs in mail, and the rest of the world seems to ignore the
problem, then I'd submit that it really isn't a problem.  The
wishy-washiness of the RFCs supports this.

Going forward we should try not to crash.  And we should try not to
truncate.  But then I'd say half-assing it is fine:  remove the NUL,
replace it with '@', whatever.  If it's never going to happen, then it
simply doesn't matter.

paul
=--
paul fox, p...@foxharp.boston.ma.us (arlington, ma, where it's 34.7 degrees)




Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-21 Thread Ken Hornstein
>> if a NUL appears in the header somewhere all bets are off.
>
>I think it would be fascinating to understand how that happened. Depending
>on how the parse tree is done, it could be marginally bad, or catastrophic.
>
>I really would be amazed if this is seen in the wild. But it's a big
>network: maybe it's out there?

Sigh.  I don't really know if it has happened in the wild before (I will
presume that it has), but that's not really my point.  Let me try to
explain it again.

I'm sitting down to write or modify nmh code.  Right now we have a lot
of code that assumes NUL-terminated C strings are safe to represent
email everywhere.  My question is: is that a valid assumption?  If
we are making that assumption, fine, let's be explicit and if someone
DOES encounter a NUL in modern email, we tell them to suck it.

If we all agree that is NOT a valid assumption, then fine, going forward
we should eventually fix that, or target new APIs that fix that.  If
we agree that we should handle NULs in individual MIME parts but not
handle them in message headers, fine, let's make that explicit.  Then
that begs the question of what we SHOULD do when we encounter a NUL in
a message header.

What I don't want is the current situation where we're kind of
half-assing it and it works because NULs are extremely uncommon (unless
we all agree that is fine).  So, I ask again: I encounter a NUL in
an email.  What do I do, exactly?  Pseudocode is preferred in your
response.
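
For the "new APIs" option, one possible shape, offered only as a
sketch, is a buffer that carries its own length so that NULs are just
data.  The names struct bytes and bytes_append() are hypothetical and
do not exist in nmh; they stand in for the role that the C-string
add() call plays in the mhparse.c excerpt quoted earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* A counted byte buffer: len is authoritative, no terminator is implied. */
struct bytes {
    char  *data;
    size_t len;
};

/*
 * Append n raw bytes (NULs included) to b.
 * Returns 0 on success, -1 if memory could not be allocated,
 * in which case b is left unchanged.
 */
int
bytes_append(struct bytes *b, const char *src, size_t n)
{
    char *p;

    assert(b != NULL && (src != NULL || n == 0));

    if (n == 0)
        return 0;
    p = realloc(b->data, b->len + n);
    if (p == NULL)
        return -1;
    memcpy(p + b->len, src, n);
    b->data = p;
    b->len += n;
    return 0;
}
```

A buffer starts out as { NULL, 0 } and is released with free(b.data).
Whether headers should use such a type, or should instead have NULs
replaced on input, is the policy question the thread leaves open.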

>The IETF "modern SMTP" stuff John Klensin is working on (with others) might
>want to talk to that: a lot of the ICANN UA stuff is a push for UTF-8 clean
>across the board.

I do not think this is relevant to this discussion, unless they are
changing RFC 5322's position on NULs.

--Ken