Thanks to Jay and Jonathan for their responses. It appears that I've discovered a bug with decodeURLComponent, though I doubt myself a little because I can't believe I'm the first one to run into it!

In my initial message, I noted that entering the following string into a web form:

Mü! (second letter is umlauted 'u')

Causes the web browser to encode it as follows:

M%FC%21

This is the ISO-8859-1 (Latin-1) encoding rather than UTF-8: in UTF-8 (http://www.utf8-chartable.de/), 'ü' would be the two bytes %C3%BC. The result is consistent across browsers (I tried it in six different ones, including Safari, IE and Firefox).

Running this string through decodeURLComponent, however, gives us this:

M¸! (second letter is a cedilla)

It appears that decodeURLComponent is assuming that the %xx values are MacRoman (or the system default encoding), where byte FC is a cedilla, instead of the encoding the browser actually used (ISO-8859-1 here; browsers generally encode form data in the page's character set).
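
One way to check which encoding a %xx byte belongs to is to label the same raw byte with different encodings and compare the results. This is only a minimal sketch; it assumes the byte in question is FC, as in the example above, and the variable name is illustrative:

```realbasic
Dim raw As String

// One raw byte, value &hFC, with no encoding attached yet
raw = ChrB(&hFC)

// The same byte read as ISO-8859-1 is a 'u' with an umlaut
MsgBox DefineEncoding(raw, Encodings.ISOLatin1)

// The same byte read as MacRoman is a cedilla
MsgBox DefineEncoding(raw, Encodings.MacRoman)
```

A lone FC byte is not a valid UTF-8 sequence, which would explain why labeling the decoded string as UTF-8 does not produce the expected character either.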

So should I submit this as a bug?

Jon, your solution:

Try using DecodeURL by passing in the encoding parameter:

s = DecodeURL( myString, Encodings.UTF8 )

This is equivalent to the following code too:

s = DecodeURL( myString )
s = DefineEncoding( s, Encodings.UTF8 )

... gives me a different character altogether for the second letter: I don't know what it is, but its Asc() value is 1048577.

Jay, your solution:

What if you tried ConvertURLString in the RB examples:
REAL Software\REALbasic 2006r4\Examples\Networking\Example Web Server

Also here is the code:

Protected Function ConvertURLString(url as string) As string
  dim x as integer
  dim temp,encStr as string

  // convert hex values out of the url string
  temp = url
  do
    x = instr( temp, "%" )// hex values start with '%'
    if x = 0 then// no encoding found
      exit
    end

    encStr = mid( temp, x+1, 2 )
    encStr = chr( val( "&h" + encStr ) ) // ****

    temp = left( temp, x - 1 ) + encStr + mid( temp, x + 3 )
  loop

  return temp
End Function

This also decodes the bytes as MacRoman, but it can be fixed by changing the marked line (****) to:

encStr = encodings.UTF8.chr( val( "&h" + encStr ) )

That's probably what I'll end up doing... but it's going to be a speed hit, I'm guessing. Shouldn't decodeURLComponent do this, or at least have a setting to indicate how the %xx entities are encoded?
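
For what it's worth, Encodings.UTF8.Chr works for a single byte like FC only because Unicode code points 0 through 255 coincide with ISO-8859-1; a genuine multi-byte UTF-8 sequence such as %C3%BC would come out as two separate characters. A more general approach might be to collect the raw bytes first and label the whole result afterwards. The following is only a sketch: DecodeURLAs is a hypothetical helper, not part of REALbasic, and it assumes every "%" in the input is followed by two valid hex digits.

```realbasic
Protected Function DecodeURLAs(url As String, enc As TextEncoding) As String
  // Hypothetical helper: turn each %xx into one raw byte, copy all other
  // bytes unchanged, then label the result with the caller's encoding.
  Dim mb As MemoryBlock
  Dim i, n, outLen As Integer

  n = LenB(url)
  If n = 0 Then
    Return ""
  End If

  mb = New MemoryBlock(n) // decoded output is never longer than the input
  i = 1
  While i <= n
    If MidB(url, i, 1) = "%" And i + 2 <= n Then
      // two hex digits follow the '%'
      mb.Byte(outLen) = Val("&h" + MidB(url, i + 1, 2))
      i = i + 3
    Else
      mb.Byte(outLen) = AscB(MidB(url, i, 1))
      i = i + 1
    End If
    outLen = outLen + 1
  Wend

  Return DefineEncoding(mb.StringValue(0, outLen), enc)
End Function
```

Under those assumptions, DecodeURLAs("M%FC%21", Encodings.ISOLatin1) should give back the string as typed, and the same function should handle UTF-8 input when passed Encodings.UTF8. It does not convert "+" to a space, which form data may also need.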

***************************************************
Toby W. Rush - [EMAIL PROTECTED]
Instructor of Music Theory
PVA Webmaster & Technical Operations Manager
University of Northern Colorado
"Omnia voluntaria est."
***************************************************
