On Oct 3, 2011, at 1:53 PM, Jerry B. Altzman wrote:

> Returning to my original obPHP (so that Hans doesn't get upset at me):  
> how do punycoded URLs and their Unicoded (or other-encoded) counterparts get 
> dealt with in real life PHP?  Who is dealing with them, and how well does 
> PHP+your underlying OS manage it?  Do you need to do environment-wrangling to 
> make encoding issues go away?  Tedd's original response "by wishing Microsoft 
> never existed" is glib but unhelpful. The homographic problem is also huge, 
> no doubt, but computers by and large aren't fooled by the difference between 
> А and A. (BTW the former is U+0410 'Cyrillic Capital Letter A', the latter is 
> U+0041 'Latin Capital Letter A'.  Depending on the font you use in your 
> reader, you may or may not see a difference between the two.)
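
The point in the quote about А versus A is easy to check: the two are distinct code points, so software compares them as different even when a font draws them identically. A minimal sketch in Python, used here only for illustration (the helper name describe() is hypothetical):

```python
import unicodedata

def describe(ch: str) -> str:
    # Code point in U+XXXX notation plus its official Unicode name
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

cyrillic_a = "\u0410"  # renders like "A" in many fonts
latin_a = "A"

print(describe(cyrillic_a))   # U+0410 CYRILLIC CAPITAL LETTER A
print(describe(latin_a))      # U+0041 LATIN CAPITAL LETTER A
print(cyrillic_a == latin_a)  # False

# In a host name the look-alike label needs an xn-- (Punycode) form,
# while the all-Latin name passes through unchanged
print("\u0410pple.com".encode("idna"))
print("Apple.com".encode("idna"))  # b'Apple.com'
```

The homograph risk is therefore a human problem: the program sees two different host names, but a reader may not.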

Punycode is the ASCII-compatible encoding used in URLs for IDNs (Internationalized 
Domain Names). PHP doesn't have to deal with it any more or less than with any 
other URL.
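
To illustrate why an IDN is just another URL once converted: a Unicode host name maps to an ASCII xn-- (Punycode) form and back. The sketch below uses Python's built-in "idna" codec (which follows the older IDNA 2003 rules) purely as an illustration; in PHP, the intl extension's idn_to_ascii() and idn_to_utf8() fill a similar role. The host bücher.example and the helper names are made up for the example.

```python
def to_ascii_host(host: str) -> str:
    """Convert a Unicode host name to its ASCII-compatible (xn--) form."""
    return host.encode("idna").decode("ascii")

def to_unicode_host(host: str) -> str:
    """Convert an ASCII-compatible host name back to Unicode."""
    return host.encode("ascii").decode("idna")

ascii_host = to_ascii_host("bücher.example")
print(ascii_host)                   # xn--bcher-kva.example
print(to_unicode_host(ascii_host))  # bücher.example
```

Once converted, the xn-- form is plain ASCII, so code that only stores or passes URLs along needs nothing special.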

For most browsers, entering a Punycode string is the only way to provide 
Unicode characters (code points) in a domain name. Only in the Safari browser 
can a user enter a string containing characters other than ASCII AND have the 
browser accept that string as a real URL and direct the user to the proper 
site. Other browsers, by contrast, convulse.

None of the above has anything to do with PHP.

The following is from memory and may be flawed, but should be close:

Now, PHP string management (the string functions) is a different story. If 
you use the standard built-in PHP string functions, such as strstr(), please 
realise that these routines treat a string as a sequence of single bytes, 
i.e., as if it were standard ASCII.
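
A quick way to see what "a sequence of single bytes" means in practice is to compare a byte count and a byte slice with their character-based equivalents. The Python sketch below assumes UTF-8 text and mimics PHP's strlen()/substr() by working on the encoded bytes, and mb_strlen()/mb_substr() by working on code points; the helper names are hypothetical.

```python
def byte_len(s: str) -> int:
    # Like PHP strlen(): counts bytes of the UTF-8 encoding
    return len(s.encode("utf-8"))

def char_len(s: str) -> int:
    # Like PHP mb_strlen(): counts code points
    return len(s)

def byte_prefix(s: str, n: int) -> bytes:
    # Like PHP substr($s, 0, $n): may cut a multibyte character in half
    return s.encode("utf-8")[:n]

word = "café"
print(byte_len(word), char_len(word))  # 5 4
print(byte_prefix(word, 4))            # b'caf\xc3' (only half of the é)
print(word[:4])                        # café
```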

If you are dealing with Unicode strings, they are handled differently, for 
example with mb_strstr() (the "mb_" prefix means multibyte, i.e., the 
multi-byte characters of which encoded Unicode strings are composed).
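
The mb_ functions count in characters rather than bytes, and the difference shows up most clearly in offsets: PHP's strpos() reports a byte position, while mb_strpos() reports a character position. A Python sketch of the same contrast, assuming UTF-8 text; the sample string and helper names are arbitrary, and Python returns -1 where PHP would return false.

```python
def byte_offset(haystack: str, needle: str) -> int:
    # Byte position of needle in the UTF-8 encoding (like strpos())
    return haystack.encode("utf-8").find(needle.encode("utf-8"))

def char_offset(haystack: str, needle: str) -> int:
    # Code-point position of needle (like mb_strpos())
    return haystack.find(needle)

text = "Привет, world"
print(byte_offset(text, "world"))  # 14 (each Cyrillic letter is 2 bytes)
print(char_offset(text, "world"))  # 8
```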

In other words, expanding the character set from standard ASCII to all of 
Unicode (ASCII is actually a subset of Unicode) required more information 
(up to four bytes per code point in UTF-8) to address each code point 
properly. As a result, special functions had to be created to deal with the 
extended set of characters (code points) that the Unicode database provides.
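
To make "more information per code point" concrete: in UTF-8, the usual encoding for Unicode text on the web, ASCII characters keep their original single-byte values, while other code points take two to four bytes. A small Python sketch; the sample characters and the helper name are arbitrary choices.

```python
def utf8_bytes(ch: str) -> bytes:
    # UTF-8 encoding of a single code point
    return ch.encode("utf-8")

# ASCII "A", Latin é, Cyrillic А, euro sign, and an emoji
samples = ["A", "\u00e9", "\u0410", "\u20ac", "\U0001F600"]

for ch in samples:
    encoded = utf8_bytes(ch)
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")

# Prints:
# U+0041 -> 1 byte(s): 41
# U+00E9 -> 2 byte(s): c3 a9
# U+0410 -> 2 byte(s): d0 90
# U+20AC -> 3 byte(s): e2 82 ac
# U+1F600 -> 4 byte(s): f0 9f 98 80
```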

HTH

tedd

_____________________
t...@sperling.com
http://sperling.com
_______________________________________________
New York PHP Users Group Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk

http://www.nyphp.org/Show-Participation
