On Oct 6, 2006, at 10:15 AM, Toby Rush wrote:

That's probably what I'll end up doing... but it's going to be a speed hit, I'm guessing. Shouldn't decodeURLComponent do this, or at least have a setting to indicate how the %xx entities are encoded?

I am sure that you are right and that it is a bug...

However, I ran into a bug with this before (on Linux), and it was a trivial exercise to write your own version that handles encodings properly. Just use MemoryBlocks (or the in-memory BinaryStream), copying byte values until you find a % character, and then convert the two hex digits that follow it into a single byte.
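
To make the byte-copy idea concrete, here is a rough sketch in Python rather than REALbasic, purely to illustrate the loop. The function name percent_decode is made up for this example, and the choice to pass malformed escapes (a % not followed by two hex digits) through unchanged is my assumption, not something any spec or the framework dictates. It works on raw bytes and never guesses at a text encoding.

```python
# Hypothetical helper: decodes %xx escapes byte by byte, with no text encoding assumed.
HEX_DIGITS = b"0123456789abcdefABCDEF"

def percent_decode(raw: bytes) -> bytes:
    out = bytearray()
    i = 0
    n = len(raw)
    while i < n:
        # 0x25 is the '%' character; it must be followed by exactly two hex digits.
        if (raw[i] == 0x25 and i + 2 < n
                and raw[i + 1] in HEX_DIGITS and raw[i + 2] in HEX_DIGITS):
            # Convert the two hex digits into a single byte value.
            out.append(int(raw[i + 1:i + 3].decode("ascii"), 16))
            i += 3
        else:
            # Ordinary byte (or a malformed escape): copy it through unchanged.
            out.append(raw[i])
            i += 1
    return bytes(out)
```

For example, percent_decode(b"caf%C3%A9") gives back the five bytes b"caf\xc3\xa9", which you can then tag as UTF-8 yourself. Note that this sketch leaves "+" alone; turning it into a space is a form-encoding convention you would have to add if you need it.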

Just remember that a generic URLComponent string is supposed to be encoding-free (undefined). Even though you looked up the proper way to encode UTF-8 text, there is no encoding tag to identify it as such. Therefore it is up to you to identify the text as UTF-8, which is pretty hard to do without a validator... unless you are in a closed-loop system where you know all data is being included as UTF-8. I would guess that web browsers send data in the encoding defined by the web page, but I wouldn't be surprised if some browsers are not UTF-8 aware.
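
As for the validator, one cheap approach is to attempt a strict UTF-8 decode and fall back to something else if it fails. Again a Python sketch, not REALbasic: the name tag_decoded_bytes and the Latin-1 fallback are assumptions for illustration, and you would pick whatever fallback fits your own data. Keep in mind that this is only a heuristic, since a short string in another encoding can happen to be valid UTF-8 too.

```python
from typing import Tuple

def tag_decoded_bytes(data: bytes, fallback: str = "latin-1") -> Tuple[str, str]:
    """Return (text, encoding_used) for bytes that carry no encoding tag.

    Hypothetical helper: tries strict UTF-8 first, then the assumed fallback.
    """
    try:
        # Strict decoding raises on any byte sequence that is not well-formed UTF-8.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: treat it as the fallback encoding (an assumption).
        return data.decode(fallback), fallback
```

With this, b"caf\xc3\xa9" comes back tagged as utf-8, while b"caf\xe9" fails the strict check and is treated as latin-1.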

