ID:               31649
 Updated by:       [EMAIL PROTECTED]
 Reported By:      james at gogo dot co dot nz
 Status:           Open
-Bug Type:         URL related
+Bug Type:         Feature/Change Request
 Operating System: All
 PHP Version:      4.3.10
 New Comment:

PHP doesn't support Unicode in a whole lot of places. Marking this as a
feature request instead.


Previous Comments:
------------------------------------------------------------------------

[2005-01-21 22:29:07] james at gogo dot co dot nz

Description:
------------
urldecode() does not understand the %uxxxx format for escaping Unicode
characters above 0xFF.

This is a very old bug, originally reported as bug #15027 and declared
bogus. I believe that was a mistake, for the following reasons.

In all modern browsers (including Mozilla), JavaScript's escape()
function uses %HH for Unicode code points of 0xFF or below, but %uHHHH
for code points above that.

From ECMA-262:
--------------
For characters whose Unicode encoding is 0xFF or less, a two-digit
escape sequence of the form %xx is used in accordance with RFC1738. For
characters whose Unicode encoding is greater than 0xFF, a four-digit
escape sequence of the form %uxxxx is used.
--------------

I believe this is a bug: PHP is unable to urldecode() valid escape()d
values from modern browsers when those strings contain Unicode
characters greater than 0xFF.

Declaring it not a bug because the format is defined by ECMA rather
than by the RFCs is a poor decision.



Reproduce code:
---------------
echo urldecode('%u2013');


Expected result:
----------------
A string containing the three bytes that encode the Unicode character
U+2013 (En Dash) in UTF-8, namely 0xE2 0x80 0x93.

Actual result:
--------------
The literal string "%u2013".
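
Possible workaround:
--------------------
Until urldecode() handles this itself, the expected behaviour can be
approximated in userland. The following is only a sketch under some
assumptions: the names urldecode_u() and codepoint_to_utf8() are
hypothetical (they are not part of PHP), the type hints need a much
newer PHP than the 4.3.10 reported above, and UTF-16 surrogate pairs
(two consecutive %uD8xx%uDCxx escapes) are not recombined. %XX escapes
and "+" are decoded the same way urldecode() decodes them, in a single
pass so that decoded output is never decoded a second time.

```php
<?php
// Hypothetical helper: encode one BMP code point (0x0000-0xFFFF) as UTF-8.
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xE0 | ($cp >> 12))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: like urldecode(), but also understands %uXXXX.
function urldecode_u(string $s): string
{
    return preg_replace_callback(
        '/%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})|\+/',
        function (array $m): string {
            if ($m[0] === '+') {
                return ' ';                      // same as urldecode()
            }
            if (isset($m[2]) && $m[2] !== '') {
                return chr(hexdec($m[2]));       // plain %XX: one raw byte
            }
            return codepoint_to_utf8(hexdec($m[1])); // %uXXXX: UTF-8 bytes
        },
        $s
    );
}

echo bin2hex(urldecode_u('%u2013')); // e28093
```

Note that escape() uses %XX for code points 0x80-0xFF as well, so a
caller that wants UTF-8 throughout would still have to convert those
bytes from ISO-8859-1; the sketch leaves them as urldecode() does.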


------------------------------------------------------------------------


-- 
Edit this bug report at http://bugs.php.net/?id=31649&edit=1