On Oct 4, 2011, at 1:13 PM, John Campbell wrote:

>> I understand that, but I'm asking something like: if you type in •.com into
>> your browser, what's getting passed to the server behind the scenes?
> 
> The encoded string (xn--...).  It must be this way because the
> HTTP protocol requires the header to be completely US-ASCII.
> 
> It is best to think of punycode as just a browser address bar display hack.
> 
> -jc

It's not so much a hack as a way to send information that exceeds the 
capability of the medium.

The Internet's naming system is/was based upon a seven-bit character set and 
not 8 (or greater). As such, simple ASCII was used from the beginning. Domain 
names composed of ASCII characters posed no problems -- after all, they cover 
the English alphabet. However, when additional (non-English) characters were 
needed, there was no way to address them. After all, if you are held to only 
128 possible values (seven bits), how can you address the over 65,000 
characters found in Unicode?

So, circa 2000 the IDNS WG was established to create a method to use seven-bit 
addressing to accomplish more than what DNS was originally designed to do. 
Among the early candidate algorithms were RACE and AMC, finally followed by 
Punycode. These were simply algorithms that used a prefix (such as "xn--" as 
found in Punycode) to identify that the characters transmitted were to be 
decoded into Unicode code points. For example, in the string "xn--19g", the 
"xn--" prefix identifies the label as an IDNS domain and the "19g" is decoded 
by the Punycode algorithm to produce a square-root symbol.
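
To make the "xn--19g" example concrete, here is a minimal sketch in Python 
(not PHP, simply because its standard library ships a "punycode" codec). The 
helper names to_ace_label and from_ace_label are my own, and the sketch 
assumes a single label that needs no normalization, which a real IDNA 
implementation would do first.

```python
ACE_PREFIX = "xn--"  # marks a label as Punycode-encoded


def to_ace_label(label: str) -> str:
    """Return the ASCII form of one domain label (hypothetical helper)."""
    if label.isascii():
        return label  # plain ASCII labels are sent unchanged
    # The stdlib "punycode" codec produces only the part after the prefix.
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def from_ace_label(label: str) -> str:
    """Return the Unicode form of one domain label (hypothetical helper)."""
    if label.lower().startswith(ACE_PREFIX):
        encoded = label[len(ACE_PREFIX):]
        return encoded.encode("ascii").decode("punycode")
    return label


if __name__ == "__main__":
    print(to_ace_label("\u221a"))      # U+221A is the square-root symbol
    print(from_ace_label("xn--19g"))
```

The round trip only covers the encoding step; case folding and the other 
preparation rules are deliberately left out.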

I believe that browsers like Safari have the Punycode algorithm already 
built in and thus can make the translation "on the fly" between characters 
entered via the keyboard and what's transmitted via HTTP. Keep in mind that 
Punycode was never meant to be seen by the end user. All domain names were 
supposed to be seen in their native script.
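
As a rough illustration of that on-the-fly translation, the sketch below 
builds the Host header a client might send for a hostname typed with a 
non-ASCII character. It assumes Python's built-in "idna" codec, which follows 
the older IDNA 2003 rules; a browser's actual code path may differ. The 
function name host_header is hypothetical.

```python
def host_header(typed_hostname: str) -> bytes:
    """Build an HTTP Host header line for a hostname as the user typed it."""
    # The "idna" codec converts each non-ASCII label to its "xn--" form
    # and leaves ASCII labels alone, so the result is pure US-ASCII.
    ascii_host = typed_hostname.encode("idna")
    return b"Host: " + ascii_host + b"\r\n"


if __name__ == "__main__":
    # The user types a square-root symbol followed by ".com" ...
    print(host_header("\u221a.com"))
    # ... while an all-ASCII name passes through unchanged.
    print(host_header("example.com"))
```

The server only ever sees the ASCII form; turning it back into the native 
characters is purely a display matter on the client side.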

Now, my understanding of the specific process may be flawed, but my description 
should be generally correct. Please understand that I lurked on the IDNS WG at 
the time (circa 2000), but did not fully understand everything that was 
discussed. There were some very smart people on that WG.

Cheers,

tedd

_____________________
t...@sperling.com
http://sperling.com
_______________________________________________
New York PHP Users Group Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk

http://www.nyphp.org/Show-Participation