I am working on a CGI program and am having trouble when non-ASCII characters (accented Roman characters, for example) are submitted as form data.

Let's say that the following string is submitted as part of a POST query string:

Mü! (The second letter is a u with an umlaut, in case it doesn't survive the e-mail gauntlet.)

The string is encoded in the query string as:

M%FC%21

Since query strings are supposed to be %xx encoded as UTF-8, that looks fine so far (I checked <http://www.utf8-chartable.de/> and the hex codes match up).

So the data is sent to my program, and I capture it in a variable, say "s". If I then do the following:

t=decodeURLComponent(s)

...then t becomes:

M¸!     (The second character here is a free-standing cedilla.)

So while the exclamation point was restored, the umlauted 'u' was not. Also, the debugger lists t.encoding as nil.

I've tried both defining and converting the encoding of 's' beforehand, setting it to US-ASCII, but no joy. I've tried defining the encoding of 't' afterward, setting it to UTF-8, but no joy there either.

Is this a flaw in decodeURLComponent, or am I using it incorrectly? Is there a way to tell decodeURLComponent to interpret the %xx entities as UTF-8 values? (And what lookup table is it using, anyway... in UTF-8, the cedilla is %B8...?)
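
For reference, here is a minimal sketch of the same decoding done outside REALbasic, in Python, purely to see which bytes the %xx escapes stand for. It assumes Python's standard urllib.parse module. The helper try_decode and the variable names are made up for illustration, and picking Latin-1 and MacRoman as candidate encodings is a guess on my part, not something the browser or REALbasic reported.

```python
from urllib.parse import quote, unquote_to_bytes

submitted = "M%FC%21"  # the query-string fragment quoted above

# Undo the %xx escapes only; the result is raw bytes with no encoding attached.
raw = unquote_to_bytes(submitted)


def try_decode(data, encoding):
    """Hypothetical helper: decode data as `encoding`, or return None if invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Interpret the same bytes under three candidate encodings (guesses, see above).
as_latin1 = try_decode(raw, "latin-1")
as_macroman = try_decode(raw, "mac_roman")
as_utf8 = try_decode(raw, "utf-8")

# For comparison: how the intended string is escaped under each encoding.
utf8_escaped = quote("M\u00fc!", encoding="utf-8")
latin1_escaped = quote("M\u00fc!", encoding="latin-1")

print(raw, as_latin1, as_macroman, as_utf8)
print(utf8_escaped, latin1_escaped)
```

If this sketch is a fair stand-in, the byte FC reads as the umlauted 'u' under Latin-1 and as a cedilla under MacRoman, and the three bytes are not valid UTF-8 at all. That would suggest the form data is arriving as Latin-1 rather than UTF-8, and that the decoded string is being displayed as MacRoman because it carries no encoding.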

Thanks in advance!

***************************************************
Toby W. Rush - [EMAIL PROTECTED]
Instructor of Music Theory
PVA Webmaster & Technical Operations Manager
University of Northern Colorado
"Omnia voluntaria est."
***************************************************
