decodeURLComponent and non-ASCII characters

Toby Rush Thu, 05 Oct 2006 16:17:33 -0700

I am working on a CGI program and am having trouble when non-ASCII characters (such as accented roman characters, for example) are submitted as form data.

Let's say that the following string is submitted as part of a POST query string:

Mü! (The second letter is a u with an umlaut, in case it doesn't survive the e-mail gauntlet.)


The string is encoded in the query string as:

M%FC%21

Since query strings are supposed to be %xx encoded as UTF-8, that looks fine so far (I checked <http://www.utf8-chartable.de/> and the hex codes match up).
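
One way to sanity-check this outside REALbasic is to percent-encode the same string under both candidate encodings. The following is a minimal Python sketch; Python and its urllib.parse.quote function are not part of the original setup and are used here only for illustration. It suggests that %FC is what ISO-8859-1 (Latin-1) produces for the umlauted u, whereas UTF-8 would produce the two-byte sequence %C3%BC.

```python
from urllib.parse import quote

text = "M\u00fc!"  # "Mü!" with the u-umlaut written as an escape

# safe="" forces every reserved character, including "!", to be escaped.
as_utf8 = quote(text, safe="", encoding="utf-8")
as_latin1 = quote(text, safe="", encoding="latin-1")

print(as_utf8)    # M%C3%BC%21
print(as_latin1)  # M%FC%21
```

If this holds, FC is the Unicode code point of the character (U+00FC), which coincides with its Latin-1 byte value but not with its UTF-8 byte sequence.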

So the data is sent to my program, and I capture it in a variable, say "s". If I then do the following:


t=decodeURLComponent(s)

...then t becomes:

M¸!     (The second character here is a free-standing cedilla.)

So while the exclamation point was restored, the umlauted 'u' was not. Also, the debugger lists t.encoding as nil.

I've tried both defining and converting the encoding of 's' beforehand, setting it to US-ASCII, but no joy. I've tried defining the encoding of 't' afterward, setting it to UTF-8, but no joy there either.

Is this a flaw in decodeURLComponent, or am I using it incorrectly? Is there a way to tell decodeURLComponent to interpret the %xx entities as UTF-8 values? (And what lookup table is it using, anyway... in UTF-8, the cedilla is %B8...?)
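
A possible explanation for the cedilla, offered as a guess rather than a statement about how decodeURLComponent works internally: the function may simply return the raw bytes with no encoding attached (consistent with t.encoding being nil), and the byte FC is then displayed using the Mac's default MacRoman table. The Python sketch below is only an illustration; unquote_to_bytes and the codec names are Python's, not REALbasic's. It decodes the same bytes three ways.

```python
from urllib.parse import unquote_to_bytes

raw = unquote_to_bytes("M%FC%21")  # the three bytes 4D FC 21

as_macroman = raw.decode("mac_roman")  # 0xFC maps to the cedilla in MacRoman
as_latin1 = raw.decode("latin-1")      # 0xFC maps to u-umlaut in Latin-1

try:
    as_utf8 = raw.decode("utf-8")
except UnicodeDecodeError:
    as_utf8 = None  # a lone 0xFC byte is not valid UTF-8

print(as_macroman, as_latin1, as_utf8)
```

If that guess is right, defining the encoding of 't' as ISO Latin-1 rather than UTF-8 would be the thing to try.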


Thanks in advance!

***************************************************
Toby W. Rush - [EMAIL PROTECTED]
Instructor of Music Theory
PVA Webmaster & Technical Operations Manager
University of Northern Colorado
"Omnia voluntaria est."
***************************************************
