Thanks to Jay and Jonathan for their responses. It appears that I've
discovered a bug with decodeURLComponent, though I doubt myself a
little because I can't believe I'm the first one to run into it!
In my initial message, I noted that entering the following string
into a web form:
Mü! (second letter is umlauted 'u')
Causes the web browser to encode it as follows:
M%FC%21
This encoding turns out to be ISO-8859-1 (Latin-1) rather than UTF-8:
0xFC is 'ü' in Latin-1, whereas UTF-8 would encode it as %C3%BC
(compare http://www.utf8-chartable.de/). The result is consistent
across browsers (I tried it in six different ones including Safari,
IE and Firefox).
Running this string through decodeURLComponent, however, gives us this:
M¸! (second letter is a cedilla)
It appears that decodeURLComponent is assuming that the %xx values
are according to MacRoman (or SystemDefault), where 0xFC is the
cedilla, instead of the encoding the browser actually used.
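A quick way to check which encoding a percent-escaped byte belongs to
is to decode the same raw bytes under each candidate encoding. The
sketch below does that in Python (not REALbasic, purely because it is
easy to run anywhere). The variable names are mine, and nothing here
reflects how decodeURLComponent is actually implemented.

```python
from urllib.parse import quote, unquote_to_bytes

# Undo the percent-escapes only, leaving raw bytes with no text encoding applied.
raw = unquote_to_bytes("M%FC%21")

# Interpret the same three bytes under each candidate encoding.
as_latin1 = raw.decode("latin-1")
as_macroman = raw.decode("mac_roman")
as_utf8 = raw.decode("utf-8", errors="replace")  # 0xFC is not a valid UTF-8 byte

# For comparison: how the same text is escaped when it is sent as UTF-8.
utf8_escaped = quote("Mü!", safe="")

for label, value in [
    ("raw bytes", raw),
    ("as Latin-1", as_latin1),
    ("as MacRoman", as_macroman),
    ("as UTF-8", as_utf8),
    ("UTF-8 escaped", utf8_escaped),
]:
    print(f"{label:14} {value!r}")
```

Only the Latin-1 reading gives back the original text. The MacRoman
reading produces the cedilla, and the lone 0xFC byte cannot be decoded
as UTF-8 at all.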
So should I submit this as a bug?
Jon, your solution:
Try using DecodeURL by passing in the encoding parameter:
    s = DecodeURL( myString, Encodings.UTF8 )
This is equivalent to the following code too:
    s = DecodeURL( myString )
    s = DefineEncoding( s, Encodings.UTF8 )
... gives me a different character altogether for the second letter:
I don't know what it is, but its asc() value is 1048577.
Jay, your solution:
What if you tried ConvertURLString in the RB
examples: REAL Software\REALbasic
2006r4\Examples\Networking\Example Web Server
Also here is the code:
    Protected Function ConvertURLString(url as string) As string
      dim x as integer
      dim temp, encStr as string
      // convert the hex values in the url string
      temp = url
      do
        x = instr( temp, "%" ) // hex values start with '%'
        if x = 0 then // no encoding found
          exit
        end
        encStr = mid( temp, x+1, 2 )
        encStr = chr( val( "&h" + encStr ) ) // ****
        temp = left( temp, x - 1 ) + encStr + mid( temp, x + 3 )
      loop
      return temp
    End Function
This also decodes the bytes as MacRoman, but it can be fixed by
changing the marked line (****) to:
encStr = encodings.UTF8.chr( val( "&h" + encStr ) )
That's probably what I'll end up doing... but it's going to be a
speed hit, I'm guessing. Shouldn't decodeURLComponent do this, or at
least have a setting to indicate how the %xx entities are encoded?
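One way such a setting could look, and a way to avoid rebuilding the
string on every pass through the loop above: collect the raw bytes in
a single pass and apply the caller's chosen encoding once at the end.
This is only a sketch in Python. decode_percent and its encoding
parameter are hypothetical names of mine, not an existing REALbasic or
REAL Software API.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def decode_percent(url: str, encoding: str = "utf-8") -> str:
    """Hypothetical helper: undo %xx escapes, then decode the bytes once."""
    out = bytearray()
    i = 0
    while i < len(url):
        ch = url[i]
        pair = url[i + 1:i + 3]
        if ch == "%" and len(pair) == 2 and set(pair) <= HEX_DIGITS:
            # A well-formed escape: keep the raw byte, do not interpret it yet.
            out.append(int(pair, 16))
            i += 3
        else:
            # Literal character (assumed to be plain ASCII in a URL).
            out.extend(ch.encode(encoding, errors="replace"))
            i += 1
    # The encoding is applied exactly once, to the complete byte sequence.
    return out.decode(encoding, errors="replace")


print(decode_percent("M%FC%21", "latin-1"))
print(decode_percent("M%C3%BC%21", "utf-8"))
```

Because the bytes are only interpreted at the end, multi-byte UTF-8
sequences such as %C3%BC survive intact, and an escaped percent sign
(%25) is not decoded a second time.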
***************************************************
Toby W. Rush - [EMAIL PROTECTED]
Instructor of Music Theory
PVA Webmaster & Technical Operations Manager
University of Northern Colorado
"Omnia voluntaria est."
***************************************************