Edit report at https://bugs.php.net/bug.php?id=62462&edit=1
ID:                 62462
User updated by:    c2h5oh at poczta dot fm
Reported by:        c2h5oh at poczta dot fm
Summary:            quoted_printable_encode splits line in the middle of UTF8 character
-Status:            Feedback
+Status:            Open
Type:               Bug
Package:            *Mail Related
Operating System:   Linux
PHP Version:        5.3.14
Block user comment: N
Private report:     N

New Comment:

From RFC 2045:
"Because quoted-printable data is generally assumed to be line-oriented, it is to be expected that the representation of the breaks between the lines of quoted-printable data may be altered in transport, in the same manner that plain text mail has always been altered in Internet mail when passing between systems with differing newline conventions."

We have no guarantee that a split character can be reassembled during decoding. This has caused problems before:
https://bugzilla.mozilla.org/show_bug.cgi?id=684508

The problem is widespread enough that SwiftMailer, for example, doesn't use quoted_printable_encode but its own PHP implementation, which is more than an order of magnitude slower ( https://github.com/swiftmailer/swiftmailer/issues/220 ).

76 characters per line is only the upper limit. Adding a soft line break earlier produces a perfectly valid encoding that doesn't cause such problems.

Previous Comments:
------------------------------------------------------------------------
[2012-07-15 02:31:30] s...@php.net

Could you explain why soft line breaks are a problem? The software decoding the QP string should ignore the line breaks and reassemble the string in its original form.

Quoting from https://en.wikipedia.org/wiki/Quoted-printable:
A soft line break consists of an "=" at the end of an encoded line, and does not appear as a line break in the decoded text.

So where does the corruption of the UTF-8 come from?

------------------------------------------------------------------------
[2012-07-02 11:42:51] c2h5oh at poczta dot fm

Description:
------------
quoted_printable_encode adds, among other things, soft line breaks if the line length is greater than 76 characters. If the 76th character happens to fall in the middle of an encoded UTF-8 character, that character is split across two lines, corrupting the encoded string.

Test script:
---------------
<?php
echo quoted_printable_encode('ąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąą');

Expected result:
----------------
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85

Actual result:
--------------
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=
=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=
=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85

(compare ends of each line)

------------------------------------------------------------------------
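
For illustration, the wrapping shown under "Expected result" could be produced by an encoder that only breaks between whole characters. The following is a minimal userland sketch, not a proposed patch: the function names qp_encode_utf8_safe and qp_encode_char are hypothetical, the input is assumed to be valid UTF-8 with CRLF hard line breaks, and one column is always reserved for the "=" of a soft line break.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): quoted-printable encoding that only
 * places soft line breaks between whole UTF-8 characters.
 * Assumes $input is valid UTF-8 and uses CRLF for hard line breaks; a bare
 * CR or LF is simply encoded as =0D or =0A.
 */
function qp_encode_utf8_safe(string $input, int $maxLineLength = 76): string
{
    $encodedLines = [];
    foreach (explode("\r\n", $input) as $textLine) {
        // Split into whole characters; fall back to bytes if not valid UTF-8.
        $chars = preg_split('//u', $textLine, -1, PREG_SPLIT_NO_EMPTY);
        if ($chars === false) {
            $chars = str_split($textLine);
        }
        $last = count($chars) - 1;
        $out = '';
        $line = '';
        foreach ($chars as $i => $char) {
            $token = qp_encode_char($char, $i === $last);
            // Reserve one column for the "=" of a soft line break, and break
            // before the token so its =XX groups stay on one line.
            if ($line !== '' && strlen($line) + strlen($token) > $maxLineLength - 1) {
                $out .= $line . "=\r\n";
                $line = '';
            }
            $line .= $token;
        }
        $encodedLines[] = $out . $line;
    }
    return implode("\r\n", $encodedLines);
}

/** Hypothetical helper: encode one whole character as a QP token. */
function qp_encode_char(string $char, bool $isLastInLine): string
{
    if (strlen($char) === 1) {
        $byte = ord($char);
        $isPrintable = $byte >= 33 && $byte <= 126 && $char !== '=';
        // Space and tab may stay literal unless they end the text line.
        $isInnerWhitespace = ($char === ' ' || $char === "\t") && !$isLastInLine;
        if ($isPrintable || $isInnerWhitespace) {
            return $char;
        }
    }
    // Encode every byte of the character as =XX (uppercase hex).
    $token = '';
    foreach (str_split($char) as $byte) {
        $token .= sprintf('=%02X', ord($byte));
    }
    return $token;
}

// Same input as the test script: 53 copies of U+0105 (bytes C4 85).
echo qp_encode_utf8_safe(str_repeat("\xC4\x85", 53)), "\r\n";
```

Because every line of this output decodes to complete characters on its own, it stays intact even if a transport or a line-by-line decoder handles each line separately.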