From: lunter at interia dot pl
Operating system: all
PHP version: 6CVS-2009-01-19 (CVS)
PHP Bug Type: Unicode Engine related
Bug description: PHP 6.0 decodes base64 UTF-8 data incorrectly
Description:
------------
Problem:
--------
PHP 6.0 decodes base64 UTF-8 data incorrectly.
If this report is bogus, please show how to decode 'zrEgKyDOsiA9IM6z' to the
(unicode)string 'α + β = γ'.
--------------------------------------------------------------------------------------------
PHP 6.0 example (example.php):
------------------------------
<?php
// unicode.semantics = off
// unicode.runtime_encoding = iso-8859-1
// unicode.script_encoding = utf-8
// unicode.output_encoding = utf-8
// unicode.from_error_mode = U_INVALID_SUBSTITUTE
// unicode.from_error_subst_char = 3f
$base64='zrEgKyDOsiA9IM6z'; // this is utf-8 base64-encoded text
$binary=base64_decode($base64); // binary utf-8 bytes
$text=unicode_decode($binary,'iso-8859-1'); // why is only iso-8859-x supported? where is the raw binary option?
// $text=bin2uni($binary); // needed
header('Content-Type: text/plain; charset=utf-8');
print($text); // SHOULD BE (utf-8): α + β = γ
?>
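As a sanity check on the input itself, the following sketch (not part of the original report; the variable names $hex and $isUtf8 are illustrative) decodes the same base64 string using only functions available in released PHP versions and inspects the raw bytes. It only suggests that the payload is well-formed UTF-8; it does not use the PHP 6 unicode_* functions.

```php
<?php
// Illustrative check, assuming a released PHP build with the bundled PCRE extension.
$base64 = 'zrEgKyDOsiA9IM6z';

// Strict mode returns false if the input contains non-base64 characters.
$binary = base64_decode($base64, true);

// Hex dump of the decoded bytes: alpha = ce b1, beta = ce b2, gamma = ce b3.
$hex = bin2hex($binary);

// preg_match() with the /u modifier returns 1 only for well-formed UTF-8 subjects.
$isUtf8 = preg_match('//u', $binary) === 1;

echo $hex, "\n";                                 // ceb1202b20ceb2203d20ceb3
echo strlen($binary), " bytes\n";                // 12 bytes
echo $isUtf8 ? "valid UTF-8\n" : "not UTF-8\n";  // valid UTF-8
```

The 12 bytes correspond to 9 characters, so any conversion that treats each byte as one iso-8859-1 character cannot produce 'α + β = γ'.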
--------------------------------------------------------------------------------------------
Solution:
---------
We cannot get a (unicode)string from a (binary)string that consists of UTF-8 bytes.
Consider: converting (unicode)<->(binary string of Unicode bytes) should never need
charset information.
C#: System.Text.Encoding.UTF8.GetString()
Decodes a sequence of bytes from the specified byte array into a string.
PHP equivalent needed: unicode bin2uni( binary $b )
Decodes a sequence of bytes from the specified binary string into a
Unicode string.
--------------------------------------------------------------------------------------------
C# working equivalent (example.ashx):
-------------------------------------
<%@ WebHandler Language="C#" Class="example_handler" %>
using System;
using System.Data;
using System.Web;
public class example_handler : IHttpHandler {
    public void ProcessRequest (HttpContext context) {
        string base64 = "zrEgKyDOsiA9IM6z"; // this is utf-8 base64-encoded text
        byte[] binary = Convert.FromBase64String(base64); // binary utf-8 bytes
        string text = System.Text.Encoding.UTF8.GetString(binary); // raw binary supported
        context.Response.ContentType = "text/plain; charset=utf-8";
        context.Response.Write(text); // very good (utf-8): α + β = γ
    }
    public bool IsReusable {
        get {
            return false;
        }
    }
}
Reproduce code:
---------------
above
Expected result:
----------------
above
Actual result:
--------------
above
--
Edit bug report at http://bugs.php.net/?id=47151&edit=1