ID: 45311
Comment by: bugs dot php dot net at jeka dot ru
Reported By: yar_helg at mail dot ru
Status: Open
Bug Type: mbstring related
Operating System: *
PHP Version: 5.2.6
New Comment:
Same problem here.
With mb_internal_encoding('UTF-8') set,
substr($string, 0, 2)
randomly returns either 2 bytes or 4.
The string is in UTF-8.
Linux 2.6.24-gentoo-r4 #2 SMP
PHP 5.2.6-pl2-gentoo
mbstring.func_overload = 0
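
For reference, a minimal sketch of the byte-versus-character distinction behind the "2 bytes or 4" observation. It assumes the mbstring extension is loaded; the sample string is hypothetical and written with byte escapes so the result does not depend on the source file's encoding. With mbstring.func_overload = 0, substr() is expected to count bytes, while mb_substr() counts characters. A 4-byte result from substr() would be consistent with character-based slicing, though nothing in this report confirms that as the cause.

```php
<?php
// Assumption: mbstring is available; this string is illustrative only.
mb_internal_encoding('UTF-8');

// Three Cyrillic letters, each encoded as 2 bytes in UTF-8 (6 bytes total).
$string = "\xD0\x9F\xD1\x80\xD0\xB8";

// Byte-based slice: the first 2 bytes, i.e. the first letter.
$bytes = substr($string, 0, 2);

// Character-based slice: the first 2 letters, i.e. 4 bytes.
$chars = mb_substr($string, 0, 2);

echo "strlen(substr)    = ".strlen($bytes)."\n";
echo "strlen(mb_substr) = ".strlen($chars)."\n";
```

On a correctly behaving build without function overloading, the first length should be 2 and the second 4.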
Previous Comments:
------------------------------------------------------------------------
[2008-06-19 07:53:31] yar_helg at mail dot ru
Description:
------------
When operating on binary data with substr() while mb_internal_encoding
is set to UTF-8 (but no function overloading is set), substr()
misbehaves: the call returns the wrong number of bytes.
P.S. emptyfile.xls, used in the example, is an empty MS Excel 2003
file. It can be downloaded at http://an-best.ru/empty_file.xls
(13.5 Kbytes).
P.P.S. The IDENTIFIER_OLE constant is taken from the
Spreadsheet_excel_reader class.
Reproduce code:
---------------
<?php
echo "function overload = ".ini_get('mbstring.func_overload')."<br />\n";
// Uncomment this for demonstration of wrong behaviour
//mb_internal_encoding('UTF-8');
echo "MB_INTERNAL_ENCODING =".mb_internal_encoding()."<br />\n";
define('IDENTIFIER_OLE',
    pack("CCCCCCCC",0xd0,0xcf,0x11,0xe0,0xa1,0xb1,0x1a,0xe1));
$data = file_get_contents(
    $_SERVER['DOCUMENT_ROOT'].'/substr_bug/emptyfile.xls');
echo "Data length = ".strlen($data)."<br />\n";
echo "First 8 symbols ==>".var_export(substr($data,0,8),1)."<== <br />\n";
echo "Compare result (substr(\$data,0,8)==IDENTIFIER_OLE) - "
    .var_export(substr($data,0,8)==IDENTIFIER_OLE,1)."<br />\n";
echo "Substring length (substr(\$data,0,8)) - "
    .strlen(substr($data,0,8))."<br />\n";
?>
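
A file-free variant of the check above may be easier to run, since it does not need emptyfile.xls. This is only a sketch under assumptions: the data is a hypothetical stand-in built from the same 8-byte signature followed by zero padding up to the reported 13824 bytes, which is not the real file's content, and the $report array is an illustrative name. It may or may not trigger the behaviour on an affected build.

```php
<?php
// The 8-byte signature used for IDENTIFIER_OLE in the report.
$signature = pack("CCCCCCCC", 0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1);

// Assumption: zero padding stands in for the rest of the file.
$data = $signature . str_repeat("\0", 13824 - 8);

// Same setting as the report; skipped if mbstring is not loaded.
if (function_exists('mb_internal_encoding')) {
    mb_internal_encoding('UTF-8');
}

// substr() should return exactly the first 8 bytes.
$head = substr($data, 0, 8);

$report = array(
    'data_length' => strlen($data),
    'head_length' => strlen($head),
    'matches'     => ($head === $signature),
);
var_export($report);
echo "\n";
```

If head_length is anything other than 8, or matches is false, the build shows the symptom described in this report.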
Expected result:
----------------
function overload = 0
MB_INTERNAL_ENCODING =ISO-8859-1
Data length = 13824
First 8 symbols ==>'поЮ║╠А'<==
Compare result (substr($data,0,8)==IDENTIFIER_OLE) - true
Substring length (substr($data,0,8)) - 8
Actual result:
--------------
// This result can be seen if mb_internal_encoding is set to UTF-8
function overload = 0
MB_INTERNAL_ENCODING =UTF-8
Data length = 13824
First 8 symbols ==>'поЮ║╠А' . "\0" . '' . "\0" . '' . "\0" . '' . "\0" . '' . "\0" . ''<==
Compare result (substr($data,0,8)==IDENTIFIER_OLE) - false
Substring length (substr($data,0,8)) - 13
------------------------------------------------------------------------