Edit report at https://bugs.php.net/bug.php?id=60412&edit=1

 ID:                 60412
 Updated by:         [email protected]
 Reported by:        mike dot squire at gmail dot com
-Summary:            UTF-8 functions doesn't respect unicode equivalence
+Summary:            UTF-8 functions doesn't respect unicode equivalence
                     - Need Normalization
-Status:             Open
+Status:             Analyzed
 Type:               Bug
-Package:            Unicode Engine related
+Package:            mbstring related
-Operating System:   OSX (though probably all)
+Operating System:   all
-PHP Version:        5.3.8
+PHP Version:        5.4SVN-2011-11-04 (SVN)
 Block user comment: N
 Private report:     N

 New Comment:

What you are looking for is normalization. The intl module provides it, but
mbstring does not.

I changed the bug type to feature request.
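
A minimal sketch of what normalization could look like for the reported case,
assuming the intl extension is loaded alongside mbstring. It uses intl's
Normalizer class to compose the sequence to NFC before converting; the
variable names are illustrative only and not part of the report.

```php
<?php
// Assumption: the intl extension is available (Normalizer is not part of mbstring).
$decomposed  = "\x6e\xcc\x83"; // U+006E followed by U+0303
$precomposed = "\xc3\xb1";     // U+00F1 encoded as UTF-8

// Compose to Normalization Form C so the pair becomes a single code point.
$nfc = Normalizer::normalize($decomposed, Normalizer::FORM_C);
var_dump($nfc === $precomposed);

// After normalization the conversion to ISO-8859-1 yields the single byte 0xF1.
$latin1 = mb_convert_encoding($nfc, "ISO-8859-1", "UTF-8");
var_dump($latin1 === "\xf1");
```

With NFC applied first, the converter sees one code point that has an
ISO-8859-1 equivalent instead of two separate ones.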


Previous Comments:
------------------------------------------------------------------------
[2011-11-29 22:17:42] mike dot squire at gmail dot com

Description:
------------
Quote from http://en.wikipedia.org/wiki/Unicode_equivalence:

"...the code point U+006E (the Latin lowercase 'n') followed by U+0303 (the 
combining tilde '◌̃') is defined by Unicode to be canonically equivalent to 
the single code point U+00F1 (the lowercase letter 'ñ' of the Spanish 
alphabet). Therefore, those sequences should be displayed in the same manner, 
should be treated in the same way by applications such as alphabetizing names 
or searching, and may be substituted for each other."

This may simply be a matter of documenting that the Unicode functions do not 
support Unicode equivalence (for completeness).

Test script:
---------------
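<?php
echo "Output recorded from a terminal interpreting UTF-8\n\n";

// Decomposed form: U+006E (n) followed by U+0303 (combining tilde).
var_dump("\x6e\xcc\x83");
// Precomposed form: U+00F1, converted from ISO-8859-1 to UTF-8.
var_dump(utf8_encode("\xf1"));

// Convert the decomposed form to ISO-8859-1 and compare with the byte for U+00F1.
var_dump(utf8_decode("\x6e\xcc\x83") == "\xf1");
var_dump(mb_convert_encoding("\x6e\xcc\x83", "ISO-8859-1", "UTF-8") == "\xf1");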


Expected result:
----------------
Output recorded from a terminal interpreting UTF-8

string(3) "ñ"
string(2) "ñ"
bool(true)
bool(true)

Actual result:
--------------
Output recorded from a terminal interpreting UTF-8

string(3) "ñ"
string(2) "ñ"
bool(false)
bool(false)
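
One way to see why both comparisons fail is to dump the raw bytes. This is a
sketch assuming mbstring is available; the variable names are illustrative,
and the precomposed string is written as a literal UTF-8 byte sequence rather
than produced with utf8_encode().

```php
<?php
$decomposed  = "\x6e\xcc\x83"; // U+006E followed by U+0303 (3 bytes in UTF-8)
$precomposed = "\xc3\xb1";     // U+00F1 (2 bytes in UTF-8)

// Both render the same in a terminal, but the byte sequences differ.
var_dump(bin2hex($decomposed), bin2hex($precomposed));
var_dump($decomposed === $precomposed);

// The converter maps code points one by one: "n" has an ISO-8859-1 byte,
// while U+0303 on its own has no ISO-8859-1 equivalent.
$latin1 = mb_convert_encoding($decomposed, "ISO-8859-1", "UTF-8");
var_dump(bin2hex($latin1));
```

The converted string therefore still begins with a plain "n" and cannot
equal the single byte "\xf1".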


------------------------------------------------------------------------


