No, that is exactly UTF-8.
I guess when you refer to Unicode, you mean UTF-16.
In UTF-8, every ASCII character still takes 1 byte.
You can still URL-decode them into UTF-8 bytes with the old function, and then convert the UTF-8 to UTF-16, which is what you need.
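
To make the two steps concrete, here is a minimal sketch in Python rather than servlet code. The function names decode_url_component and to_utf16 are made up for illustration, and it assumes the browser really did send UTF-8. Each %xx is first turned back into one raw byte, and only then are the bytes read as UTF-8, so any byte below 0x80 is always a single ASCII character on its own.

```python
from urllib.parse import unquote_to_bytes


def decode_url_component(escaped: str) -> str:
    """Undo %xx escaping, then interpret the resulting bytes as UTF-8.

    Hypothetical helper for illustration; assumes the sender used UTF-8.
    """
    raw = unquote_to_bytes(escaped)  # step 1: each %xx becomes one byte
    return raw.decode("utf-8")       # step 2: UTF-8 bytes become characters


def to_utf16(text: str) -> bytes:
    """Encode decoded text as big-endian UTF-16 without a byte order mark."""
    return text.encode("utf-16-be")


if __name__ == "__main__":
    # Both sequences from the question are plain ASCII, three characters each.
    print(repr(decode_url_component("%20%3A%22")))
    print(repr(decode_url_component("%3A%20%22")))

    # A Chinese character mixed with ASCII: U+4E2D is three bytes in UTF-8.
    text = decode_url_component("%E4%B8%AD%20a")
    print(repr(text), to_utf16(text).hex())
```

With this approach both %20%3A%22 and %3A%20%22 come out as three separate ASCII characters, because none of those bytes has the high bit set.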

From: Andr?John Mas <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: URLs and double byte characters (unicode)
Date: Sun, 22 Dec 2002 10:12:05 -0500

Hi,

I have tried searching for documentation on URLs and double-byte characters, and even searched this mailing list, but could find nothing concrete.

For me the issue has arisen because I am writing a servlet that allows for the browsing of a virtual directory structure that in certain cases has entries with Chinese names.

I have looked at some algorithms, but while they worked in the majority of cases, they failed in a few special cases:

- %20%3A%22 -- is this a space followed by one double byte character, or two single byte characters?
- %3A%20%22 -- single byte character, space, single byte character OR double byte character, single byte character OR single byte character, double byte character?

Using Mozilla I find that it encodes its UTF-8 URLs with a mixture of single byte and double byte characters. For example, a space will be represented as %20, any reserved ASCII character will use a single byte %xx value, but anything in Chinese will be defined using a double byte %xx%yy value.

This makes it very difficult to parse a URL. I would say that the problem is with Mozilla, but for me the real problem is the lack of any documentation on the issue. An RFC would be nice, so at least I know I am dealing with the same solution with all modern web browsers.

regards

Andre
