RE: Decimal Unicodepoints

2001-04-26 Thread Marco Cimarosti
David Starner wrote: Yes, I too wonder why the Web people would choose decimal for their Unicode references, [...] Probably because (one of) the most widespread browsers did not support hexadecimal entities until recently. _ Marco
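As an illustration of the two notations discussed in this message, here is a minimal Python sketch that writes one character as a decimal and as a hexadecimal numeric character reference. The helper names `decimal_ref` and `hex_ref` and the sample character are assumptions made for this example; they do not come from the thread.

```python
import html


def decimal_ref(ch: str) -> str:
    """Return the decimal numeric character reference for one character."""
    return f"&#{ord(ch)};"


def hex_ref(ch: str) -> str:
    """Return the hexadecimal numeric character reference for one character."""
    return f"&#x{ord(ch):X};"


sample = "\u20ac"  # EURO SIGN, chosen only as an illustration
dec = decimal_ref(sample)  # "&#8364;"
hexa = hex_ref(sample)     # "&#x20AC;"

# Both forms decode back to the same character.
assert html.unescape(dec) == html.unescape(hexa) == sample
```

The hexadecimal form lines up with the usual U+XXXX notation, which is presumably why the decimal convention drew comment.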

Re: three characters?

2001-04-26 Thread James Kass
Roozbeh Pournader wrote: ...U+29C8. But it's not a math symbol (from that document, it seems that Squared Square is a binary operator). It's a bullet. Unify? No one else has responded to the list yet about this. According to the 3.0 book, under Character Properties, 4.9 Mathematical

Re: On the possibility of guidance code points for the Private Use Area

2001-04-26 Thread William Overington
Peter Constable wrote: It seems to me that you are still missing the point I'm making. end quote Peter Constable then quoted part of a sentence that I had written. For example, in everyday use of the English language, if I write the word horse then you have a knowledge of what that word means

RE: On the possibility of guidance...

2001-04-26 Thread Roozbeh Pournader
On Thu, 26 Apr 2001, Marco Cimarosti wrote: As a second thought, using codes higher than 0x0FFF is even safer, because it also accounts for the fact that, theoretically, ISO 10646 uses 31 bits. But this was the Unicode mailing list ;) Of course, all this is only possible for

RE: On the possibility of guidance...

2001-04-26 Thread Roozbeh Pournader
On Thu, 26 Apr 2001, Marco Cimarosti wrote: A. Intentional private use for non-exchanged data [...] I agree that little or no coordination is needed for case A. If PUA codepoints remain totally internal to an application, there is going to be no interchange problem at all, as far as

RE: On the possibility of guidance...

2001-04-26 Thread Marco Cimarosti
Roozbeh Pournader wrote: On Thu, 26 Apr 2001, Marco Cimarosti wrote: A. Intentional private use for non-exchanged data [...] I have some objection. One should not use PUA codes for internal purposes [...] If only a few internal ones needed, use noncharacters like the ones in

RE: On the possibility of guidance...

2001-04-26 Thread Marco Cimarosti
Roozbeh Pournader wrote: Surrogating the noncharacters in the FDD0..FDEF range works for internally 16-bit apps. OK. But you won't implement Devanagari rendering with 32 glyphs... At the risk of being the victim of the first digital autodafé, I will add that codes DC00..DFFF (Low Surrogate)
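The figure of 32 follows from the size of the FDD0..FDEF range. A small Python sketch, assuming the standard definition of noncharacters (that range plus the last two code points of each of the 17 planes); the function name `is_noncharacter` is hypothetical:

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the Unicode noncharacter code points."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    in_fdd0_block = 0xFDD0 <= cp <= 0xFDEF
    plane_final = (cp & 0xFFFE) == 0xFFFE  # xxFFFE and xxFFFF in every plane
    return in_fdd0_block or plane_final


# The contiguous range mentioned in the message holds 32 code points.
fdd0_count = sum(is_noncharacter(cp) for cp in range(0xFDD0, 0xFDF0))
assert fdd0_count == 32
```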

RE: On the possibility of guidance...

2001-04-26 Thread Roozbeh Pournader
On Thu, 26 Apr 2001, Marco Cimarosti wrote: OK. But you won't implement Devanagari rendering with 32 glyphs... If you want to render Devanagari, please use some other mechanisms, not simple character streams. Change the internals to 32-bit, or keep other things with the codepoints. Yes,

RE: Unicode editing

2001-04-26 Thread Marco Cimarosti
Matitiahu Allouche (Mati) has prepared a document Guidelines of a Logical User Interface for Editing Bidirectional Text. It is being discussed at the SII. With his kind permission, I have placed it at http://www.qsm.co.il/Hebrew/logicUI22.htm Thanks for the link! Matitiahu's document is

RE: Unicode editing

2001-04-26 Thread Jonathan Rosenne
Marco Cimarosti wrote (Thursday, April 26, 2001 7:55 PM): Matitiahu Allouche (Mati) has prepared a document Guidelines of a Logical User Interface for Editing Bidirectional Text. It is being discussed at the SII. With his

Re: Unicode in a URL

2001-04-26 Thread David Starner
On Thu, Apr 26, 2001 at 09:16:42AM -0700, Paul Deuter wrote: I am wondering if there isn't a need for the Unicode Spec to also dictate a way of encoding Unicode in an ASCII stream. Perhaps the %u is already that and I am just ignorant. Another alternative would be to use the U+

Re: Unicode in a URL

2001-04-26 Thread Markus Scherer
Paul Deuter wrote: I am wondering if there isn't a need for the Unicode Spec to also dictate a way of encoding Unicode in an ASCII stream. Perhaps How many more ways do we need? To be 8-bit-friendly, we have UTF-8. To get everything into ASCII characters, we have UTF-7. W3C specifies to use

Re: Tags and the Private Use Area

2001-04-26 Thread Michael (michka) Kaplan
From: William Overington [EMAIL PROTECTED] I have updated my suggestion. Here is the latest version for discussion. Let's consider the fact that what you are looking for is summarized at the end of your message: I hope to gain fairly widespread agreement within the unicode user community. I

RE: Unicode in a URL

2001-04-26 Thread Paul Deuter
Based on the responses, I guess my original question/problem was not very well written. UTF-7 won't work because it cannot be distinguished from ASCII without something that identifies it as UTF-7. The %XX idea does not work because this is already in use by lots of software to encode many
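The ambiguity described here can be shown with a short Python sketch: the same %XX escapes decode to different text depending on which character set the receiver assumes. The sample character and variable names are assumptions for illustration only.

```python
from urllib.parse import quote, unquote

original = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, illustrative only

# The sender escapes the UTF-8 bytes of the character.
escaped = quote(original, safe="", encoding="utf-8")  # "%C3%A9"

# Nothing in the escaped string says which charset was used,
# so two receivers can reach different results.
as_utf8 = unquote(escaped, encoding="utf-8")
as_latin1 = unquote(escaped, encoding="latin-1")

assert as_utf8 == original
assert as_latin1 == "\u00c3\u00a9"  # two characters instead of one
```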

Re: On the possibility of guidance code points for the Private Use Area

2001-04-26 Thread Peter_Constable
On 04/26/2001 06:14:21 PM William Overington wrote: Peter Constable asks If I write chat, do you know what I mean?. Hmm, let me ponder! :-) Is it possible that you are referring to the answer that an Australian numismatist might give if asked what is the bird on the reverse of a British

Re: Tags and the Private Use Area

2001-04-26 Thread Peter_Constable
On 04/27/2001 03:23:36 AM unicode-bounce wrote: From: William Overington [EMAIL PROTECTED] I have updated my suggestion. Here is the latest version for discussion. Let's consider the fact that what you are looking for is summarized at the end of your message: I hope to gain fairly widespread

Pollard list set up

2001-04-26 Thread John H. Jenkins
Pollard has been smiled upon from the divine realm. There is now a mailing list set up to discuss it and its encoding in Unicode. Instructions follow: Your list is ready. [EMAIL PROTECTED]. People can subscribe by the usual method: send a blank message to [EMAIL PROTECTED] and say

RE: Unicode in a URL

2001-04-26 Thread Carl W. Brown
Paul, It sounds like you want URLs in UTF-8 and data in some code page. The HTTP header protocol only allows a single charset specification. Even if you pass UTF-8 URLs the browser should not handle them properly unless you also have the data in UTF-8 as well. If you send pages in UTF-8 you

Re: Tags and the Private Use Area

2001-04-26 Thread Kenneth Whistler
William Overington wrote: I have updated my suggestion. Here is the latest version for discussion. ... Specific protocols to use with such tagging can be devised. ... The suggestion is open for discussion and I hope to gain fairly widespread agreement within the unicode user community. And

RE: Unicode in a URL

2001-04-26 Thread Mike Brown
W3C specifies to use %-encoded UTF-8 for URLs. I think that's an overstatement. Neither the W3C nor the IETF make such a specification. http://www.w3.org/TR/charmod/#sec-URIs contains many ambiguities, conflicts with XML and HTTP, and is not yet a recommendation.

Re: Unicode in a URL

2001-04-26 Thread Martin Duerst
At 11:28 01/04/26 -0700, Markus Scherer wrote: Paul Deuter wrote: I am wondering if there isn't a need for the Unicode Spec to also dictate a way of encoding Unicode in an ASCII stream. Perhaps How many more ways do we need? To be 8-bit-friendly, we have UTF-8. To get everything into ASCII

Re: Unicode in a URL

2001-04-26 Thread Martin Duerst
Hello Paul, At 19:41 01/04/25 -0700, Paul Deuter wrote: I am struggling to figure out the correct method for encoding Unicode characters in the query string portion of a URL. There is a W3C spec that says the Unicode character should be converted to UTF-8 and then each byte should be encoded as
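The method described here (convert the character to UTF-8, then escape each byte) can be sketched in Python as follows. The helper name `percent_encode_utf8` and the sample character are assumptions for this example; the standard library's `urllib.parse.quote` is used only as a cross-check.

```python
from urllib.parse import quote


def percent_encode_utf8(text: str) -> str:
    """Escape every UTF-8 byte of text as %XX (illustrative helper)."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))


sample = "\u65e5"  # U+65E5, chosen only as an illustration
encoded = percent_encode_utf8(sample)  # "%E6%97%A5"

# For a non-ASCII character the standard library produces the same escapes.
assert encoded == quote(sample, safe="", encoding="utf-8")
```

Note that this helper escapes ASCII bytes as well, whereas `quote` leaves unreserved ASCII characters untouched; for a real query string the library function is the more practical choice.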

RE: Unicode in a URL

2001-04-26 Thread Martin Duerst
At 15:02 01/04/26 -0700, Paul Deuter wrote: Based on the responses, I guess my original question/problem was not very well written. The %XX idea does not work because this is already in use by lots of software to encode many different character sets. So again we need something that identifies

RE: Unicode in a URL

2001-04-26 Thread Martin Duerst
Hello Mike, At 19:09 01/04/26 -0600, Mike Brown wrote: W3C specifies to use %-encoded UTF-8 for URLs. I think that's an overstatement. Neither the W3C nor the IETF make such a specification. True. Neither W3C nor IETF make such a general statement, because we can't just remove the about 10

Re: On the possibility of guidance code points for the Private Use Area

2001-04-26 Thread William Overington
Wm Seán Glen asked: Couldn't one just embed the glyphs that aren't specified by Unicode along with the text? end quote Yes one could, in a file such as a Word document file where the format of the Word file can handle the embedding of illustrations. However, if one is using a plain unicode

Re: Tags and the Private Use Area

2001-04-26 Thread William Overington
I have updated my suggestion. Here is the latest version for discussion. Let there exist the idea that there is U+12 (PUA INTERPRETATION TAG) and a set of private use area tag characters (U+100020 U+10007F) all of which code points are in the upper private use area. May I suggest that

RE: Unicode in a URL

2001-04-26 Thread Paul Deuter
Thanks Addison. I appreciate that the UTF-8 solution is the right one. However we must acknowledge that this right solution does not appear to be implemented anywhere. And I have come to the conclusion that it also will not be. The reason is the one that you mentioned: because the %XX

RE: How will software source code represent 21 bit unicode characters?

2001-04-26 Thread addison
On Mon, 23 Apr 2001, Mike Brown wrote: A char corresponds to a Unicode value -- a UTF-16 code value, which could either represent a Unicode character or one half of a surrogate pair. In the latter case, it would take a sequence of two chars to make one Unicode character. It is my
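The "sequence of two chars" mentioned in the quoted text is a UTF-16 surrogate pair. A minimal Python sketch of the arithmetic, with hypothetical helper names `to_surrogate_pair` and `from_surrogate_pair` and a sample code point chosen only for illustration:

```python
from typing import Tuple


def to_surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point into (high, low) UTF-16 code units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Combine a (high, low) surrogate pair back into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1D11E)  # (0xD834, 0xDD1E)
assert from_surrogate_pair(high, low) == 0x1D11E

# Cross-check against Python's own UTF-16 encoder (big-endian, no BOM).
assert chr(0x1D11E).encode("utf-16-be") == bytes([0xD8, 0x34, 0xDD, 0x1E])
```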