Edit report at https://bugs.php.net/bug.php?id=62462&edit=1

 ID:                 62462
 User updated by:    c2h5oh at poczta dot fm
 Reported by:        c2h5oh at poczta dot fm
 Summary:            quoted_printable_encode splits line in the middle of
                     UTF8 character
-Status:             Feedback
+Status:             Open
 Type:               Bug
 Package:            *Mail Related
 Operating System:   Linux
 PHP Version:        5.3.14
 Block user comment: N
 Private report:     N

 New Comment:

From RFC 2045:
"Because quoted-printable data is generally assumed to be line-
   oriented, it is to be expected that the representation of the breaks
   between the lines of quoted-printable data may be altered in
   transport, in the same manner that plain text mail has always been
   altered in Internet mail when passing between systems with differing
   newline conventions."

There is no guarantee that a character split across two lines can be
reassembled during decoding.

This has caused problems before:
https://bugzilla.mozilla.org/show_bug.cgi?id=684508
The problem is widespread enough that SwiftMailer, for example, doesn't use
quoted_printable_encode but its own PHP implementation, which is more than an
order of magnitude slower
( https://github.com/swiftmailer/swiftmailer/issues/220 ).

76 characters per line is only the upper limit. Adding a soft line break
earlier produces perfectly valid encoding that doesn't cause such problems.
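
A minimal sketch of the "break earlier" approach, assuming valid UTF-8 input.
The function name qp_encode_utf8_safe and its $maxLen parameter are
hypothetical and not part of PHP. As a simplification, it encodes every byte
other than printable ASCII 33-126 (excluding "=") as =XX, spaces and line
breaks included, which is valid but more verbose than necessary.

```php
<?php
// Hypothetical helper (not a PHP built-in): quoted-printable encoding that
// inserts soft line breaks only between complete UTF-8 characters.
function qp_encode_utf8_safe(string $input, int $maxLen = 76): string
{
    // Split into whole characters; preg_split returns false on invalid UTF-8.
    $chars = preg_split('//u', $input, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $out = '';
    $lineLen = 0;
    foreach ($chars as $char) {
        // Encode one character into a token that must stay on one line.
        $token = '';
        foreach (str_split($char) as $byte) {
            $code = ord($byte);
            $literal = $code >= 33 && $code <= 126 && $byte !== '=';
            $token .= $literal ? $byte : sprintf('=%02X', $code);
        }
        // Keep one column free for the "=" that marks a soft line break.
        if ($lineLen + strlen($token) > $maxLen - 1) {
            $out .= "=\r\n";
            $lineLen = 0;
        }
        $out .= $token;
        $lineLen += strlen($token);
    }
    return $out;
}

// 53 two-byte characters, as in the test script of this report.
echo qp_encode_utf8_safe(str_repeat("\xC4\x85", 53)), "\n";
```

For the input from the test script, this sketch breaks after every twelfth
=C4=85 pair, which is the layout given under "Expected result".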


Previous Comments:
------------------------------------------------------------------------
[2012-07-15 02:31:30] s...@php.net

Could you explain why soft line breaks are a problem? The software decoding the
QP string should ignore the line breaks and reassemble the string in its
original form. Quoting from https://en.wikipedia.org/wiki/Quoted-printable:

A soft line break consists of an "=" at the end of an encoded line, and does
not appear as a line break in the decoded text.

So where does the corruption of the UTF-8 come from?

------------------------------------------------------------------------
[2012-07-02 11:42:51] c2h5oh at poczta dot fm

Description:
------------
quoted_printable_encode adds, among other things, soft line breaks if the line
length is greater than 76 characters.
If that 76th character happens to fall in the middle of an encoded UTF-8
character, then this character will be split across two lines, corrupting the
encoded string.



Test script:
---------------
<?php
echo 
quoted_printable_encode('ąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąą');

Expected result:
----------------
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85

Actual result:
--------------
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=
=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=
=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85


(compare ends of each line)
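
The difference between the two results can also be checked mechanically. The
sketch below decodes each encoded line on its own and reports the lines whose
bytes are not valid UTF-8. The helper name qp_lines_splitting_utf8 is
hypothetical, and the input strings are rebuilt from the results quoted above
rather than taken from a live call, since the output of
quoted_printable_encode may differ between PHP versions. It only illustrates
what a line-by-line consumer would see; it is not a model of any particular
mail client.

```php
<?php
// Hypothetical helper (not a PHP built-in): returns the 1-based numbers of
// encoded lines whose decoded bytes are not valid UTF-8 on their own.
function qp_lines_splitting_utf8(string $encoded): array
{
    $bad = [];
    foreach (explode("\r\n", $encoded) as $i => $line) {
        // Remove the trailing "=" of a soft line break before decoding.
        if (substr($line, -1) === '=') {
            $line = substr($line, 0, -1);
        }
        $decoded = quoted_printable_decode($line);
        // preg_match with the /u modifier fails on malformed UTF-8.
        if (preg_match('//u', $decoded) !== 1) {
            $bad[] = $i + 1;
        }
    }
    return $bad;
}

$pair = '=C4=85';
// First two lines of the actual result reported for PHP 5.3.14.
$actual = str_repeat($pair, 12) . "=C4=\r\n" . '=85' . str_repeat($pair, 12);
// First two lines of the expected result.
$expected = str_repeat($pair, 12) . "=\r\n" . str_repeat($pair, 12);

var_dump(qp_lines_splitting_utf8($actual));   // both lines are flagged
var_dump(qp_lines_splitting_utf8($expected)); // no lines are flagged
```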


------------------------------------------------------------------------


