Re: decodeURLComponent and non-ASCII characters

Phil M Fri, 06 Oct 2006 11:09:19 -0700

On Oct 6, 2006, at 10:15 AM, Toby Rush wrote:

That's probably what I'll end up doing... but it's going to be a speed hit, I'm guessing. Shouldn't decodeURLComponent do this, or at least have a setting to indicate how the %xx entities are encoded?


I am sure that you are right and that it is a bug...

However, I encountered a bug with this before (on Linux) and it was a trivial exercise to write my own version which handles encodings properly. Just use MemoryBlocks (or the in-memory BinaryStream), copying byte values until you find a % character, and then convert the following two bytes (the hex digits) into a single byte.
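To make the byte-copying idea concrete, here is a rough sketch. It is written in Python rather than REALbasic purely for illustration, and the name percent_decode is made up for this example. It assumes that a % not followed by two hex digits should be copied through unchanged, and it does not turn + into a space.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"

def percent_decode(raw: bytes) -> bytes:
    """Hypothetical helper: decode %xx escapes byte by byte.

    The result is plain bytes with no encoding attached.
    """
    out = bytearray()
    i = 0
    while i < len(raw):
        # A '%' (0x25) followed by two hex digits collapses into one byte.
        if (
            raw[i] == 0x25
            and i + 2 < len(raw)
            and raw[i + 1] in HEX_DIGITS
            and raw[i + 2] in HEX_DIGITS
        ):
            out.append(int(raw[i + 1:i + 3].decode("ascii"), 16))
            i += 3
        else:
            # Assumption: anything else, including a malformed '%', is copied as-is.
            out.append(raw[i])
            i += 1
    return bytes(out)
```

The same loop maps onto a MemoryBlock or BinaryStream: read one byte, test for %, and either write it through or write the combined hex value.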

Just remember that a generic URLComponent string is supposed to be encoding-free (undefined). Even though you looked up the proper way to encode UTF-8 text, there is no encoding tag to identify it as such. Therefore it is up to you to identify the text as UTF-8, which is pretty hard to do without a validator... unless you are in a closed-loop system where you know all data is being included as UTF-8. I would guess that web browsers send data in the encoding defined by the web page, but I wouldn't be surprised if some browsers are not UTF-8 aware.
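As a sketch of what "identify it yourself" could look like once you have the decoded bytes: try a strict UTF-8 decode as the validator and fall back to something else if it fails. Again this is Python for illustration only; the function name and the ISO-8859-1 fallback are my assumptions, not anything decodeURLComponent does.

```python
def decode_component_bytes(data: bytes, fallback: str = "latin-1"):
    """Hypothetical helper: return (text, encoding_used) for decoded bytes."""
    try:
        # A strict decode doubles as a UTF-8 validator: invalid sequences raise.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 input is treated as ISO-8859-1 (every byte maps).
        return data.decode(fallback), fallback
```

Keep in mind this is only a guess. Some byte sequences are valid in both encodings, so outside a closed-loop system a successful UTF-8 decode does not prove the sender meant UTF-8.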



