Edit report at https://bugs.php.net/bug.php?id=62462&edit=1

 ID:                 62462
 User updated by:    c2h5oh at poczta dot fm
 Reported by:        c2h5oh at poczta dot fm
 Summary:            quoted_printable_encode splits line in the middle of UTF8 character
-Status:             Feedback
+Status:             Open
 Type:               Bug
 Package:            *Mail Related
 Operating System:   Linux
 PHP Version:        5.3.14
 Block user comment: N
 Private report:     N

 New Comment:

The important part is "quoted-printable data is generally assumed to be line-oriented". I don't think we can or should assume that nothing happens to line breaks in transport.

As for "if some clients/agents break the email text, we can't really be responsible for them and if they do that": the encoding produced after the patch is still valid, and at the same time it prevents the email text from being "broken".

I have asked one of the people who decided to use their own implementation of quoted_printable_encode in SwiftMailer, instead of the built-in function, to update this ticket with the reasoning behind that decision.


Previous Comments:
------------------------------------------------------------------------
[2012-07-16 06:41:33] s...@php.net

The quoted bug is a Thunderbird issue that produces a real line break instead of a soft break. Note the absence of = at the end of the line here:

=CE=B1=CE=B1=CE=B1=CE=B1=CE=B1=CE=B1=CE <------- ERROR

Clearly, this is a bug, since the encoder should not insert hard line breaks. PHP produces soft breaks, which is in full compliance with the standard.

I'm also not sure how the RFC quote is relevant: if some clients/agents break the email text, we can't really be responsible for them, and if they do that, they break QP encoding and something else, like base64, should be used. I don't see, however, why that makes PHP's way of encoding wrong.

Could you describe a scenario where the way PHP does things leads to something breaking that is not due to a bug in some other product? I also read the SwiftMailer report and couldn't find a description of any scenario where PHP's encoding is wrong.

------------------------------------------------------------------------
[2012-07-15 12:02:06] c2h5oh at poczta dot fm

From RFC 2045:

"Because quoted-printable data is generally assumed to be line-oriented, it is to be expected that the representation of the breaks between the lines of quoted-printable data may be altered in transport, in the same manner that plain text mail has always been altered in Internet mail when passing between systems with differing newline conventions."

We have no guarantee that it will be possible to merge the split character during decoding. This has caused problems before: https://bugzilla.mozilla.org/show_bug.cgi?id=684508

The problem is widespread enough that, for example, SwiftMailer doesn't use quoted_printable_encode but its own PHP implementation, which is more than an order of magnitude slower (https://github.com/swiftmailer/swiftmailer/issues/220).

76 characters per line is the upper limit; adding a soft line break earlier produces perfectly valid encoding that doesn't cause such problems.

------------------------------------------------------------------------
[2012-07-15 02:31:30] s...@php.net

Could you explain why soft line breaks are a problem? The software decoding the QP string should ignore the line breaks and reassemble the string in its original form. Quoting from https://en.wikipedia.org/wiki/Quoted-printable:

A soft line break consists of an "=" at the end of an encoded line, and does not appear as a line break in the decoded text.

So where does the corruption of the UTF-8 come from?

------------------------------------------------------------------------
[2012-07-02 11:42:51] c2h5oh at poczta dot fm

Description:
------------
quoted_printable_encode adds, among other things, soft line breaks if the line length is greater than 76 characters. If that 76th character happens to be in the middle of an encoded UTF8 character, then this character will be split across two lines, corrupting the encoded string.

Test script: --------------- <?php echo quoted_printable_encode('Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä '); Expected result: ---------------- =C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85= =C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85= =C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85= =C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85= =C4=85=C4=85=C4=85=C4=85=C4=85 Actual result: -------------- =C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4= =85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85= =C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4= =85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85= =C4=85=C4=85=C4=85 (compare ends of each line) ------------------------------------------------------------------------ -- Edit this bug report at https://bugs.php.net/bug.php?id=62462&edit=1
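
For illustration only, here is a minimal sketch of the behaviour requested in this report: a soft line break is inserted before a character whose encoded form would not fit, so the =XX triplets of one UTF-8 character always stay on the same line. This is not the patch attached to the ticket and not SwiftMailer's encoder; the function name qp_encode_utf8_safe and its parameters are hypothetical. It assumes PHP 7 or later and valid UTF-8 input without line breaks, and for simplicity it encodes every byte outside printable ASCII (including spaces) as =XX, which is still valid quoted-printable.

```php
<?php
// Hypothetical helper, not part of PHP: quoted-printable encoding that never
// places a soft line break inside a multi-byte UTF-8 character.
// Assumptions: $text is valid UTF-8 and contains no CR/LF. A complete encoder
// would also have to handle hard line breaks and trailing whitespace.
function qp_encode_utf8_safe(string $text, int $maxLineLength = 76): string
{
    $lines = [];
    $line = '';

    // Split into whole UTF-8 characters so each one is encoded as one unit.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

    foreach ($chars as $char) {
        // Encode the character byte by byte.
        $unit = '';
        foreach (str_split($char) as $byte) {
            $code = ord($byte);
            $isLiteral = $code >= 33 && $code <= 126 && $byte !== '=';
            $unit .= $isLiteral ? $byte : sprintf('=%02X', $code);
        }

        // Keep one column free for the "=" that marks a soft line break.
        if (strlen($line) + strlen($unit) > $maxLineLength - 1) {
            $lines[] = $line . '=';
            $line = '';
        }
        $line .= $unit;
    }

    $lines[] = $line;

    // Soft breaks are "=" followed by CRLF.
    return implode("\r\n", $lines);
}
```

Called as `echo qp_encode_utf8_safe(str_repeat('ą', 53));`, this should print lines of twelve complete =C4=85 pairs, as in the "Expected result" section, and quoted_printable_decode() should restore the original string from that output.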