RE: Unicode in a URL

2001-04-27 Thread Mike Brown
I asserted, referring to section 4.2.2 of the XML spec: <!ENTITY greeting SYSTEM "http://somewhere/getgreeting?lang=es&name=C%C3%A9sar"> ]> The name Ce'sar is represented here as C%C3%A9sar in the UTF-8 based escaping, as per the XML requirement. You replied: What the XML spec (and all the

Re: Unicode in a URL

2001-04-26 Thread David Starner
On Thu, Apr 26, 2001 at 09:16:42AM -0700, Paul Deuter wrote: I am wondering if there isn't a need for the Unicode Spec to also dictate a way of encoding Unicode in an ASCII stream. Perhaps the %u is already that and I am just ignorant. Another alternative would be to use the U+

Re: Unicode in a URL

2001-04-26 Thread Markus Scherer
Paul Deuter wrote: I am wondering if there isn't a need for the Unicode Spec to also dictate a way of encoding Unicode in an ASCII stream. Perhaps How many more ways do we need? To be 8-bit-friendly, we have UTF-8. To get everything into ASCII characters, we have UTF-7. W3C specifies to use

RE: Unicode in a URL

2001-04-26 Thread Paul Deuter
Paul Deuter wrote: I am wondering if there isn't a need for the Unicode Spec to also dictate a way of encoding Unicode in an ASCII stream. Perhaps How many more ways do we need? To be 8-bit-friendly, we have UTF-8. To get everything into ASCII characters, we have

RE: Unicode in a URL

2001-04-26 Thread Carl W. Brown
can use UTF-8 URLs; otherwise they are invalid. Carl. Original message from Paul Deuter, sent Thursday, April 26, 2001 3:02 PM to Markus Scherer and the unicode list: Based on the responses, I guess my original

RE: Unicode in a URL

2001-04-26 Thread Mike Brown
W3C specifies to use %-encoded UTF-8 for URLs. I think that's an overstatement. Neither the W3C nor the IETF make such a specification. http://www.w3.org/TR/charmod/#sec-URIs contains many ambiguities, conflicts with XML and HTTP, and is not yet a recommendation.

Re: Unicode in a URL

2001-04-26 Thread Martin Duerst
At 11:28 01/04/26 -0700, Markus Scherer wrote: Paul Deuter wrote: I am wondering if there isn't a need for the Unicode Spec to also dictate a way of encoding Unicode in an ASCII stream. Perhaps How many more ways do we need? To be 8-bit-friendly, we have UTF-8. To get everything into ASCII

Re: Unicode in a URL

2001-04-26 Thread Martin Duerst
Hello Paul, At 19:41 01/04/25 -0700, Paul Deuter wrote: I am struggling to figure out the correct method for encoding Unicode characters in the query string portion of a URL. There is a W3C spec that says the Unicode character should be converted to UTF-8 and then each byte should be encoded as

RE: Unicode in a URL

2001-04-26 Thread Martin Duerst
At 15:02 01/04/26 -0700, Paul Deuter wrote: Based on the responses, I guess my original question/problem was not very well written. The %XX idea does not work because this is already in use by lots of software to encode many different character sets. So again we need something that identifies

RE: Unicode in a URL

2001-04-26 Thread Martin Duerst
Hello Mike, At 19:09 01/04/26 -0600, Mike Brown wrote: W3C specifies to use %-encoded UTF-8 for URLs. I think that's an overstatement. Neither the W3C nor the IETF make such a specification. True. Neither W3C nor IETF make such a general statement, because we can't just remove the about 10

RE: Unicode in a URL

2001-04-26 Thread Paul Deuter
Original message, sent Wednesday, April 25, 2001 9:20 PM to Paul Deuter, cc Unicode List: Actually, your first solution (the W3C recommendation

Unicode in a URL

2001-04-25 Thread Paul Deuter
I am struggling to figure out the correct method for encoding Unicode characters in the query string portion of a URL. There is a W3C spec that says the Unicode character should be converted to UTF-8 and then each byte should be encoded as %XX. From my experience however, browsers will encode
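
For readers following the thread, here is a minimal sketch of the two behaviours under discussion: escaping the UTF-8 bytes of a character as %XX, and escaping the bytes of some other character set the same way. It uses Python's standard urllib.parse module, which is an assumption of this note and not something the posters used. The name "César" is the one from Mike Brown's example, and Latin-1 is chosen only as an illustrative legacy encoding.

```python
from urllib.parse import quote, unquote

name = "César"

# Convert to UTF-8, then escape each non-ASCII byte as %XX.
utf8_escaped = quote(name, safe="", encoding="utf-8")

# Same %XX mechanism, but over Latin-1 bytes (illustrative legacy charset).
latin1_escaped = quote(name, safe="", encoding="latin-1")

print(utf8_escaped)    # C%C3%A9sar
print(latin1_escaped)  # C%E9sar

# Decoding only round-trips when the receiver assumes the sender's charset.
print(unquote(utf8_escaped, encoding="utf-8") == name)      # True
print(unquote(latin1_escaped, encoding="latin-1") == name)  # True

# Reading the Latin-1 form as UTF-8 does not recover the name; the %XX
# syntax itself carries no label saying which charset was used.
print(unquote(latin1_escaped, encoding="utf-8") == name)    # False
```

The last line is the ambiguity Paul Deuter raises: the escaped string alone does not say which encoding produced it, so both ends have to agree on one out of band.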