ID:               43294
 User updated by:  tallyce at gmail dot com
 Reported By:      tallyce at gmail dot com
-Status:           No Feedback
+Status:           Open
 Bug Type:         Strings related
 Operating System: Windows or Linux
 PHP Version:      5.2.5
 New Comment:

I've spent more time trying to work out what's happening, and am
convinced something is not right.

I've also found another character whose presence makes the whole string
disappear, and there may be others.

Using this reproduce code:

<?php echo htmlentities ('Test › †', ENT_COMPAT, 'UTF-8') . '<br />' .
preg_replace('/[^\x00-\x7F]/e', '"&#".ord("$0").";"', 'Test › †') .
'<br />' . htmlentities ('Test', ENT_COMPAT, 'UTF-8') . '<br />'; ?>

I get different results on machines running SUSE Linux/PHP 5.2.4, Ubuntu
Linux/PHP 5.2.3 and WinXP/PHP 5.2.5. Only the second gives the result I
would expect.





1. From a Linux machine terminal:

Firstly doing
less t.php
gives
<?php echo htmlentities ('Test 233 206', ENT_COMPAT, 'UTF-8') . '<br />' .
preg_replace('/[^\x00-\x7F]/e', '"&#".ord("$0").";"', 'Test 233 206') .
'<br />' . htmlentities ('Test', ENT_COMPAT, 'UTF-8') . '<br />'; ?>
with the 233 and 206 background-highlighted.


php -v
PHP 5.2.4 (cli) (built: Sep 12 2007 15:23:24)
Copyright (c) 1997-2007 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies

Test <br />Test &#155; &#134;<br />Test<br />
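One possible reading of (1), offered as a sketch rather than a diagnosis: the 233 and 206 that less highlights look like octal byte values, and octal 233 and 206 are decimal 155 and 134, the same numbers the preg_replace() part prints. Those are the single-byte Windows-1252 codes for › and †, which would mean t.php on this machine is not stored as UTF-8. The helper name is_well_formed_utf8() and the assumed file bytes below are illustrative only and do not come from the original script.

```php
<?php
// Hypothetical helper, not from the original t.php: true when $s is
// well-formed UTF-8. An empty pattern with the /u modifier only matches
// if PCRE accepts the subject as valid UTF-8.
function is_well_formed_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

$cp1252 = "Test \x9B \x86";                  // assumed bytes on machine 1
$utf8   = "Test \xE2\x80\xBA \xE2\x80\xA0";  // the same text as UTF-8

// Octal 233 and 206 are decimal 155 and 134.
printf("%d %d\n", octdec('233'), octdec('206'));      // 155 134

// The two non-ASCII bytes of the assumed single-byte file.
printf("%d %d\n", ord($cp1252[5]), ord($cp1252[7]));  // 155 134

// 0x9B and 0x86 are UTF-8 continuation bytes with no lead byte, so the
// single-byte version is not well-formed UTF-8.
var_dump(is_well_formed_utf8($cp1252)); // bool(false)
var_dump(is_well_formed_utf8($utf8));   // bool(true)
```

If this reading is right, the empty first field in the output above is htmlentities() rejecting the input as invalid UTF-8 rather than a problem with the dagger character itself.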




2. From the same machine but viewed in a web browser (Firefox
2.0.0.11/WinXP) at example.com/t.php (which is serving UTF-8 pages, as
confirmed by web-sniffer.net):

Test ? ?<br />Test &#155; &#134;<br />Test<br />

[The two symbols each appear as a ? in a diamond.]



3. On another machine, with the putty terminal set to UTF-8:

less t.php
gives:
<?php echo htmlentities ('Test › †', ENT_COMPAT, 'UTF-8') . '<br />' .
preg_replace('/[^\x00-\x7F]/e', '"&#".ord("$0").";"', 'Test › †') .
'<br />' . htmlentities ('Test', ENT_COMPAT, 'UTF-8') . '<br />'; ?>
exactly as first entered.

php -v
PHP 5.2.3-1ubuntu6.2 (cli) (built: Dec  3 2007 19:59:42)
Copyright (c) 1997-2007 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies

php t.php
Test &rsaquo; &dagger;<br />Test &#226;&#128;&#186;
&#226;&#128;&#160;<br />Test<br />
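The &#226;&#128;&#186; and &#226;&#128;&#160; in the middle field are consistent with the file being stored as UTF-8 here: without the /u modifier the pattern matches one byte at a time, so each three-byte character turns into three numeric references. The sketch below reproduces that with preg_replace_callback() instead of the /e modifier; byte_entities() is a hypothetical name, the closure needs PHP 5.3 or later, and the input bytes are assumptions about what each file contains.

```php
<?php
// Hypothetical helper for illustration; it mirrors the /e replacement in
// t.php without using the /e modifier.
function byte_entities($s)
{
    // No /u modifier: PCRE treats the subject as single bytes, so each
    // byte above 0x7F is matched and replaced on its own.
    return preg_replace_callback(
        '/[^\x00-\x7F]/',
        function ($m) {
            return '&#' . ord($m[0]) . ';';
        },
        $s
    );
}

// UTF-8 source file: › is E2 80 BA and † is E2 80 A0.
echo byte_entities("Test \xE2\x80\xBA \xE2\x80\xA0"), "\n";
// Test &#226;&#128;&#186; &#226;&#128;&#160;

// Single-byte source file (assumed Windows-1252): › is 9B and † is 86.
echo byte_entities("Test \x9B \x86"), "\n";
// Test &#155; &#134;
```

The second call gives the &#155; &#134; seen on the other two machines, which suggests the difference between the machines may lie in how t.php is saved rather than in the PHP version.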



4. Same machine as (3) but via web browser:

Test &rsaquo; &dagger;<br />Test &#226;&#128;&#186;
&#226;&#128;&#160;<br />Test<br />



5. On a Windows machine:

C:\Documents and Settings\username>php -v
PHP 5.2.5 (cli) (built: Nov  8 2007 23:18:51)
Copyright (c) 1997-2007 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies

H:\>php t.php
PHP Warning:  htmlentities(): Invalid multibyte sequence in argument in
H:\t.php on line 1
<br />Test &#155; &#134;<br />Test<br />



6. Same machine as (5) but via web browser:

<br />Test &#155; &#134;<br />Test<br />
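A possible workaround sketch for (5) and (6), assuming the file on the Windows machine holds Windows-1252 bytes (which the &#155; &#134; output hints at) and that the iconv extension is loaded: convert the string to UTF-8 before passing it to htmlentities(). The helper name entities_from_cp1252() is hypothetical.

```php
<?php
// Hypothetical helper: assumes the input bytes are Windows-1252 and that
// the iconv extension is available.
function entities_from_cp1252($raw)
{
    $utf8 = iconv('CP1252', 'UTF-8', $raw);
    return htmlentities($utf8, ENT_COMPAT, 'UTF-8');
}

$raw = "Test \x9B \x86"; // assumed content of t.php on the Windows machine

// 0x9B and 0x86 are not valid UTF-8 on their own, so the whole result is
// empty (PHP 5.2.5 also emits the warning quoted above).
var_dump(@htmlentities($raw, ENT_COMPAT, 'UTF-8')); // string(0) ""

echo entities_from_cp1252($raw), "\n"; // Test &rsaquo; &dagger;
```

Saving t.php itself as UTF-8 should have the same effect as the conversion, and would match the behaviour seen in (3) and (4).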


Previous Comments:
------------------------------------------------------------------------

[2007-12-18 01:00:01] php-bugs at lists dot php dot net

No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

------------------------------------------------------------------------

[2007-12-10 10:02:15] [EMAIL PROTECTED]

Correct output:

$ php t.php
Test &dagger;<br />Test


------------------------------------------------------------------------

[2007-12-10 10:01:49] [EMAIL PROTECTED]

Seems to work fine for me:

[EMAIL PROTECTED] ~]$ php t.php
Test &dagger;<br />Test[

Please try on command line.

------------------------------------------------------------------------

[2007-11-14 14:39:48] tallyce at gmail dot com

Description:
------------
When a string containing the † (dagger) symbol is passed to
htmlentities() with UTF-8 as the encoding, the whole string is
discarded and the result is blank.

This is definitely a change in PHP 5.2.5. Tested on both Windows and
Linux machines.

Reproduce code:
---------------
<?php echo htmlentities ('Test †', ENT_COMPAT, 'UTF-8') . '<br />' .
htmlentities ('Test', ENT_COMPAT, 'UTF-8'); ?>

Expected result:
----------------
Test †
Test


[This is indeed the result on PHP 5.2.4.]

Actual result:
--------------
Test



[Blank line at start]


------------------------------------------------------------------------


-- 
Edit this bug report at http://bugs.php.net/?id=43294&edit=1
