Re: (Not-so) hypothetical question: What to do about NULs?
Ken wrote:

> [David:]
> >I have received email with C-T-E set to binary. While I don't think it
> >was needed, I haven't checked closely.
>
> Fascinating! I am curious: who/what sent this to you? Do you remember
> the MIME type?

The C-T-E: binary is in the message header. There are two alternative
content parts, text/html and text/plain. Both are encoded Q-P. So the
C-T-E: binary is gratuitous. (And mhfixmsg converts it to 8-bit.)

 msg part  type/subtype              size description
   0       multipart/alternative      26K
             boundary="--=_1648114734-702538-12126"
             charset="UTF-8"
     1     text/html                  16K
             disposition "inline"
     2     text/plain                9823
             disposition "inline"

The sender, freecycle.org, uses that C-T-E: binary often. Maybe every time.

> Well, I'm not SURE that's necessarily true. As you point out, that's
> only true for the bodies of message fields. And I see a lot of things
> in the code that assume the body of a message field is a valid C string,
> e.g. (mhparse.c):
>
>     /* if necessary, get rest of field */
>     while (state == FLDPLUS) {
>         bufsz = sizeof buf;
>         state = m_getfld2(, name, buf, );
>         vp = add (buf, vp); /* add to previous value */
>     }

That's in FLDPLUS, still in the header.

> In terms of the networking code,
> it looks like the right thing will happen when sending a NUL via
> SMTP,

Almost, but not quite. I posted a possible fix but I'm still refining it.

> It seems for message bodies we're
> in reasonable shape (unless you are RETRIEVING a message via POP), but
> if a NUL appears in the header somewhere all bets are off.

Yeah. I'd be OK with replacing NULs with some legal character(s). I'm not
sure that just squashing them is a good idea. I don't have a concrete
example, but I wonder if it could be abused, say in a really messy URL.

David
Re: (Not-so) hypothetical question: What to do about NULs?
>When I was poking around in the POP code I didn't notice any special
>handling of NUL bytes. It's possible that this would result in
>truncation. If that's what we do now, I suspect it's alright to continue
>to do so; at least until we find legitimate emails in the wild that do
>not conform (again think 16M character lines).

Right, definitely the POP code doesn't handle this, and my quick check
suggests we're not the only ones. However, it seems like a lot of IMAP
implementations do better. I think that's due to the protocol; in POP,
when you retrieve a message it looks like:

    C: RETR 1
    S: +OK
    S: Line 1
    S: Line 2
    S: [...]
    S: .

So you're THINKING in lines, and you tend to read a "line" until you get
a line with the sentinel value (.\r\n). IMAP, on the other hand, looks
like:

    C: a0001 FETCH 1 (RFC822)
    S: * 1 FETCH (RFC822 {1024}
    [... 1024 bytes of data follows ...]
    S: )

So you're told "I am sending this many bytes exactly", and you don't
have to deal with "lines", so the implementations I've seen tend to call
read() (or the equivalent) until they get the correct number of bytes,
and because you're not dealing with "lines" you don't treat them as C
strings. Of course, RFC 3501 explicitly says:

    (3) The ASCII NUL character, %x00, MUST NOT be used at any time.

But you're not supposed to send 16MB lines either!

--Ken
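The counted-read approach described above for IMAP literals can be sketched roughly as follows. This is a minimal illustration, not code from nmh or any IMAP client; the name read_exact and its return convention are assumptions made for the example. The point is only that the data is handled as a byte count plus a buffer, so an embedded NUL is just another byte.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: read exactly `want` bytes from fd into buf, the
 * way a client might consume an IMAP literal such as {1024}.
 * The buffer is never treated as a C string, so NUL bytes survive.
 * Returns the number of bytes read (short only if EOF arrives early),
 * or -1 on a read error.
 */
ssize_t read_exact(int fd, char *buf, size_t want)
{
    size_t got = 0;

    assert(buf != NULL);
    while (got < want) {
        ssize_t n = read(fd, buf + got, want - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            return -1;
        }
        if (n == 0)
            break;              /* EOF before the full literal arrived */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

A caller would compare the return value against the announced literal size and treat a short count as a protocol error.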
Re: (Not-so) hypothetical question: What to do about NULs?
Thus said Ken Hornstein on Tue, 21 Feb 2023 07:17:19 -0500:

> I'm sitting down to write or modify nmh code. Right now we have a lot
> of code that assumes NUL-terminated C strings are safe to represent
> email everywhere. My question is: is that a valid assumption?

I don't think nmh should produce anything that contains NUL bytes, but
whether or not it should accept such is a different question (as you
mention, there is the 16 million byte line of text in an email message
that I keep getting from a certain sender that cannot be bothered to
follow the RFC, which clearly states that lines, base64 MIME data
included, should be at most 78 characters and must not be longer than
998).

When I was poking around in the POP code I didn't notice any special
handling of NUL bytes. It's possible that this would result in
truncation. If that's what we do now, I suspect it's alright to continue
to do so; at least until we find legitimate emails in the wild that do
not conform (again think 16M character lines).

nmh's POP code has been silently truncating long lines (those greater
than 1023 bytes) for years and crashing on lines longer than 32,767
bytes. I only recently discovered this while trying to figure out what
to do with a 16M character line. I went back through old emails and sure
enough, I had a lot of truncation. I never noticed because most of it
was in long lines of HTML that I don't ever bother reading.

So I guess what I'm saying is, I think it's alright to continue to treat
messages as C-strings (until it isn't).

Andy
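For comparison with the fixed-buffer truncation described above, here is a minimal sketch of a line reader that grows its buffer and reports the length explicitly. It is not the nmh POP code; read_long_line is a hypothetical helper, and the 1024-byte starting capacity is an arbitrary choice for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read one line of any length from fp, growing the
 * buffer as needed instead of truncating at a fixed size. The length is
 * returned through *len, so the caller does not need strlen() and an
 * embedded NUL does not cut the line short.
 * Returns a malloc'd buffer (caller frees), or NULL at EOF with no data
 * or on allocation failure.
 */
char *read_long_line(FILE *fp, size_t *len)
{
    size_t cap = 1024, n = 0;
    char *buf;
    int c;

    assert(fp != NULL && len != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while ((c = fgetc(fp)) != EOF) {
        if (n + 1 >= cap) {             /* keep room for the terminator */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[n++] = (char)c;
        if (c == '\n')
            break;                      /* end of this line */
    }
    if (n == 0) {                       /* EOF before any byte was read */
        free(buf);
        return NULL;
    }
    buf[n] = '\0';                      /* convenience terminator only */
    *len = n;
    return buf;
}
```

Whether unbounded growth is acceptable for a 16M character line is a separate policy question; a real implementation would probably want an upper limit.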
Re: (Not-so) hypothetical question: What to do about NULs?
Thus said Ken Hornstein on Mon, 20 Feb 2023 21:11:48 -0500:

> Fascinating! I am curious: who/what sent this to you? Do you remember
> the MIME type?

0.11% of my messages have Content-Transfer-Encoding of binary at the
beginning of a line somewhere in the message. Here are the headers from
one that dates all the way back to 2001 (this message does not appear to
have any actual "binary" content in it).

---BEGIN
Content-Type: multipart/mixed; boundary="--=_154292612-6290-0"
Content-Transfer-Encoding: binary
MIME-Version: 1.0
X-Mailer: MIME-tools 5.41 (Entity 5.404)
From: "Jato Boa"
Date: Xxx, 00 Xxx 2001 16:09:27 +0800

This is a multi-part message in MIME format...

=_154292612-6290-0
Content-Type: text/plain; charset="iso-8859-1"
Content-Disposition: inline

[ascii data]

=_154292612-6290-0
Content-Type: image/jpg; name="OosI Fric Ghesuf kurfIzKi chruzGi Awt.jpg"
Content-Disposition: attachment; filename="OosI Fric Ghesuf kurfIzKi chruzGi Awt.jpg"
Content-Transfer-Encoding: base64

[base64 data]
---END--

Also, I have quite a few from the Bugtraq mailing list that have a C-T-E
of binary. The headers indicate binary, but the rest of the body doesn't
seem to imply it (doesn't need it probably), but then there are some
like this:

https://seclists.org/bugtraq/2004/Aug/223

Here are relevant headers and the binary values were replaced with :

---BEGIN
Content-Type: text/plain
Content-Disposition: inline
Content-Transfer-Encoding: binary
MIME-Version: 1.0
X-Mailer: MIME-tools 5.411 (Entity 5.404)
From: "Jrme" ATHIAS
To: bugt...@securityfocus.com
Subject: First vulnerabilities in the SP2 - XP ?...
X-Spam-BMF-Status: No, hits=0.00 required=0.90

http://www.heise.de/security/artikel/50051

Regards,
Jrme ATHIAS
---END--

Today, I think this message would instead be quoted-printable or some
other encoding.

Here's another example from a well-known online seller of goods that
used messagelabs to send out customer order statuses:

---BEGIN
Content-Transfer-Encoding: binary
Content-Type: multipart/related; boundary="_--=_79242061420"
MIME-Version: 1.0
X-Mailer: MIME::Lite 3.01 (F2.72; A1.60; B2.21; Q2.21)
Date: Xxx, 00 Xxx 16:36:04 UT
From: [online store redacted]

This is a multi-part message in MIME format.

--_--=_79242061420
Content-Transfer-Encoding: binary
Content-Type: multipart/alternative; boundary="_--=_79242061421"
MIME-Version: 1.0
X-Mailer: MIME::Lite 3.01 (F2.72; A1.60; B2.21; Q2.21)
Date: Xxx, 00 Xxx 16:36:04 UT

This is a multi-part message in MIME format.

--_--=_79242061421
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain
MIME-Version: 1.0
X-Mailer: MIME::Lite 3.01 (F2.72; A1.60; B2.21; Q2.21)
Date: Xxx, 00 Xxx 16:36:04 UT

[quoted printable data]
---END--

Here's a more recent email from another online provider of services with
replaced where binary value was found:

---BEGIN
Content-Type: text/html
Content-Disposition: inline
Content-Transfer-Encoding: binary
MIME-Version: 1.0
X-Mailer: MIME-tools 5.509 (Entity 5.509)

... Copyright 2021 ...
---END--

Are these bugs in email client implementations? I've looked at a handful
of the messages that I have which have a header of C-T-E binary, and the
body of the message is almost always some other C-T-E (mostly
quoted-printable) or non-binary. But sometimes it seems justified. Maybe
they just throw the C-T-E on there "just in case" as a sloppy way of
getting by?

> I guess what I was hoping for was a consensus on what we SHOULD do
> when we encounter a NUL byte, because I haven't heard that yet! Like
> what should the code do, precisely?

I'm not sure. Does anyone have an example of having received a NUL byte
in an email? I'm having a hard time convincing grep to look for one.

Andy
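Since grep is awkward for this, one way to look for NUL bytes in stored messages is a small scanner built on fread() and memchr(). The sketch below is only an illustration; count_nuls is a hypothetical name, and wiring it up to walk a mail folder is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: count the NUL bytes in a stream. The data is
 * read in fixed-size chunks with fread(), which returns a byte count,
 * and searched with memchr(), which takes an explicit length, so
 * nothing here depends on C-string termination.
 */
long count_nuls(FILE *fp)
{
    char buf[4096];
    size_t n;
    long count = 0;

    assert(fp != NULL);
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        const char *p = buf;
        size_t left = n;
        const char *hit;

        while ((hit = memchr(p, '\0', left)) != NULL) {
            count++;
            left -= (size_t)(hit - p) + 1;  /* bytes remaining after the hit */
            p = hit + 1;
        }
    }
    return count;
}
```

A caller could open each message file in binary mode, pass it to count_nuls(), and print the file name whenever the result is nonzero.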
Re: (Not-so) hypothetical question: What to do about NULs?
Ken Hornstein wrote:

> I'm sitting down to write or modify nmh code. Right now we have a lot
> of code that assumes NUL-terminated C strings are safe to represent
> email everywhere. My question is: is that a valid assumption? If
> we are making that assumption, fine, let's be explicit and if someone
> DOES encounter a NUL in modern email, we tell them to suck it.

I think that this is the minimum that we must do.

> If we all agree that is NOT a valid assumption, then fine, going forward
> we should eventually fix that, or target new APIs that fix that. If

>> The IETF "modern SMTP" stuff John Klensin is working on (with others) might
>> want to talk to that: a lot of the ICANN UA stuff is a push for UTF-8 clean
>> across the board.

> I do not think this is relevant to this discussion, unless they are
> changing RFC 5322's position on NULs.

But, it seems like a question that IETF could clarify.

--
Michael Richardson. o O ( IPv6 IøT consulting )
Sandelman Software Works Inc, Ottawa and Worldwide
Re: (Not-so) hypothetical question: What to do about NULs?
>> I do not think this is relevant to this discussion, unless they
>> are changing RFC 5322's position on NULs.
>
>But, it seems like a question that IETF could clarify.

I don't see how further clarification is necessary here. I mean, a 16MB
single line in email is clearly a MUST NOT, but people send them anyway.

--Ken
Re: (Not-so) hypothetical question: What to do about NULs?
ken wrote:

> I'm sitting down to write or modify nmh code. Right now we have a lot
> of code that assumes NUL-terminated C strings are safe to represent
> email everywhere. My question is: is that a valid assumption? If
> we are making that assumption, fine, let's be explicit and if someone
> DOES encounter a NUL in modern email, we tell them to suck it.

It seems to me that, given the results of your skim of various other
mail recipients, it's clear that receiving NULs in mail is not a big
issue. If receiving NULs were a big issue, or even, really, a small
issue, then the clients with far larger user bases than MH's would have
had to fix their code by now. And they haven't. (Your skim wasn't
comprehensive, but that says to me that there are likely more potential
breakages out there than you found -- not fewer.)

> What I don't want is the current situation where we're kind of
> half-assing it and it works because NULs are extremely uncommon (unless
> we all agree that is fine). So, I ask again: I encounter a NUL in

I personally vote for "that is fine". If no one here has had issues with
NULs in mail, and the rest of the world seems to ignore the problem,
then I'd submit that it really isn't a problem. The wishy-washiness of
the RFCs supports this.

Going forward we should try not to crash. And we should try not to
truncate. But then I'd say half-assing it is fine: remove the NUL,
replace it with '@', whatever. If it's never going to happen, then it
simply doesn't matter.

paul
=--
paul fox, p...@foxharp.boston.ma.us (arlington, ma, where it's 34.7 degrees)
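The "replace it with '@'" option mentioned above could look something like the following. This is only a sketch under stated assumptions: replace_nuls is a hypothetical name, not an nmh function, and it presumes the caller already has the data as a buffer plus a length rather than as a C string.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: overwrite every NUL byte in buf[0..len) with the
 * substitute character `sub` (for example '@'). The length of the data
 * is unchanged, so nothing is truncated or squashed.
 * Returns the number of bytes that were replaced.
 */
size_t replace_nuls(char *buf, size_t len, char sub)
{
    size_t replaced = 0;
    size_t i;

    assert(buf != NULL || len == 0);
    assert(sub != '\0');        /* substituting NUL would be a no-op */

    for (i = 0; i < len; i++) {
        if (buf[i] == '\0') {
            buf[i] = sub;
            replaced++;
        }
    }
    return replaced;
}
```

Replacing rather than deleting keeps byte offsets stable; a nonzero return value also gives the caller a place to log that the message was altered.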
Re: (Not-so) hypothetical question: What to do about NULs?
>> if a NUL appears in the header somewhere all bets are off.
>
>I think it would be fascinating to understand how that happened. Depending
>on how the parse tree is done, it could be marginally bad, or catastrophic.
>
>I really would be amazed if this is seen in the wild. But it's a big
>network: maybe it's out there?

Sigh. I don't really know if it has happened in the wild before (I will
presume that it has), but that's not really my point. Let me try to
explain it again.

I'm sitting down to write or modify nmh code. Right now we have a lot
of code that assumes NUL-terminated C strings are safe to represent
email everywhere. My question is: is that a valid assumption? If
we are making that assumption, fine, let's be explicit and if someone
DOES encounter a NUL in modern email, we tell them to suck it.

If we all agree that is NOT a valid assumption, then fine, going forward
we should eventually fix that, or target new APIs that fix that. If
we agree that we should handle NULs in individual MIME parts but not
handle them in message headers, fine, let's make that explicit. Then
that raises the question of what we SHOULD do when we encounter a NUL in
a message header.

What I don't want is the current situation where we're kind of
half-assing it and it works because NULs are extremely uncommon (unless
we all agree that is fine). So, I ask again: I encounter a NUL in
an email. What do I do, exactly? Pseudocode is preferred in your
response.

>The IETF "modern SMTP" stuff John Klensin is working on (with others) might
>want to talk to that: a lot of the ICANN UA stuff is a push for UTF-8 clean
>across the board.

I do not think this is relevant to this discussion, unless they are
changing RFC 5322's position on NULs.

--Ken