On 9/19/19 5:46 PM, Gilles Chehade wrote:
> Hello,
>
> The RFC for SMTP states the following (section 2.3.8):
>
>    In addition, the appearance of "bare" "CR" or "LF" characters in text
>    (i.e., either without the other) has a long history of causing
>    problems in mail implementations and applications that use the mail
>    system as a tool. SMTP client implementations MUST NOT transmit
>    these characters except when they are intended as line terminators
>    and then MUST, as indicated above, transmit them only as a <CRLF>
>    sequence.
>
>
> As a result, OpenSMTPD rejects DATA containing <CR> with the following:
>
>    500 5.0.0 <CR> is only allowed before <LF>
>
> requiring that clients encode DATA if <CR> is part of it.
>
> My question is: are we too strict ?
>
> Not two MTA do the same thing. Some will leave '\r' in the body and then
> write it to the user mailbox or relay it. Other change it into a '\n' or
> skip it. The first ones take the risk of a MUA not handling '\r' well or
> an MTA rejecting later, the second ones break DKIM-signatures.
>
> The only good way to deal with this is to stick to the RFC ... BUT users
> then experience message rejections when using broken clients (semarie@'s
> printer is an example of one).
>
> So:
>
> a- do we leave '\r' in the body ?
> b- do we turn '\r' into '\n'
> c- do we keep strict behavior ?
> d- do we keep strict behavior + provide a knob for '\r' to work ?
>

To be blunt, tl;dr: put that simply, none of the options are suitable.

Let's start with two passages from RFC 5321, section 4.1.1.4:

   The receiver normally sends a 354 response to DATA, and then treats
   the lines (strings ending in <CRLF> sequences, as described in
   Section 2.3.7) following the command as mail data from the sender.
   This command causes the mail data to be appended to the mail data
   buffer. The mail data may contain any of the 128 ASCII character
   codes, although experience has indicated that use of control
   characters other than SP, HT, CR, and LF may cause problems and
   SHOULD be avoided when possible.

   The custom of accepting lines ending only in <LF>, as a concession to
   non-conforming behavior on the part of some UNIX systems, has proven
   to cause more interoperability problems than it solves, and SMTP
   server systems MUST NOT do this, even in the name of improved
   robustness. In particular, the sequence "<LF>.<LF>" (bare line
   feeds, without carriage returns) MUST NOT be treated as equivalent to
   <CRLF>.<CRLF> as the end of mail data indication.

In other words, if we want to be SMTP-strict we must be 7bit clean and
we must also not accept a bare '\n' as an end of line. This in turn
means that iobuf.c:177 is wrong in that it makes the '\r' optional.
That makes us lose information that is critical for DKIM signatures.

Additionally, the current implementation would also fail a dkimverify:
if someone sends a line without '\r', verification fails, because the
verifier can't know whether the original line ended in '\r' or not and
thus must assume there was one (in the current situation).

I understand that "dumb" filters built around basic shell tools want to
assume that lines end in '\n', but that's not 7bit clean and it breaks
mails that are valid but not unix-compatible.

One way to work around this would be to add a second command (let's
call it filter-dataline7bit) which sends the lines unaltered and can be
used by fully compatible tools to do proper 7bit-clean filtering.
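
To make the difference concrete, here is a rough Python sketch of the
two ways of splitting incoming data into lines. It is not the code in
iobuf.c; the function names split_lenient and split_strict and the
sample inputs are made up for illustration only.

```python
def split_lenient(data: bytes) -> list:
    """Split on LF and drop an optional preceding CR (hypothetical helper)."""
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # input ended with a terminator, so no partial line is left
    return [line[:-1] if line.endswith(b"\r") else line for line in lines]


def split_strict(data: bytes) -> list:
    """Split on CRLF only; a bare CR or LF stays inside the line untouched."""
    lines = data.split(b"\r\n")
    if lines and lines[-1] == b"":
        lines.pop()
    return lines


wire_crlf = b"line one\r\nline two\r\n"
wire_lf = b"line one\nline two\n"

# The lenient splitter returns the same lines for both inputs, so the
# information about how each line really ended is gone.
print(split_lenient(wire_crlf) == split_lenient(wire_lf))

# The strict splitter keeps them apart: the bare-LF input is not split at all.
print(split_strict(wire_crlf))
print(split_strict(wire_lf))
```

With the lenient variant a filter has no way to tell which of the two
inputs it was handed; with the strict variant it does.
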
If someone has a use case where they don't care whether the input is
7bit clean (because they know the input is always properly formatted),
they can use the current filter-dataline.

In conclusion: I don't care at all whether the input line contains a
'\r' (from a DKIM perspective), but I do care that my input is
consistent, that I know how my lines end, and that what I send back is
not altered afterwards.
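
For what it's worth, the DKIM side of this can be shown with a very
simplified sketch. It skips DKIM canonicalization entirely and only
demonstrates that a body hash depends on the exact line-ending bytes.
The helper body_hash and the sample message are assumptions for
illustration, not taken from any real filter.

```python
import base64
import hashlib


def body_hash(body: bytes) -> str:
    """Base64-encoded SHA-256 over the raw body bytes (no canonicalization)."""
    return base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")


# Assumption: a non-conforming sender hashed and sent bare-LF lines.
signed_wire = b"Hello\nWorld\n"

# What a filter sees once the terminators have been stripped for it.
lines = [b"Hello", b"World"]

# The verifier cannot know how the lines ended, so it has to assume CRLF.
rebuilt = b"".join(line + b"\r\n" for line in lines)

# Different bytes, different hash: verification fails.
print(body_hash(signed_wire) == body_hash(rebuilt))

# Had the lines been sent as CRLF, the rebuilt body would match.
print(body_hash(b"Hello\r\nWorld\r\n") == body_hash(rebuilt))
```
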
