RE: Playing with Unicode (was: Re: UTF-17)

2001-06-25 Thread Marco Cimarosti
Hallo. I am one of those who started this childish joke of introducing implausible UTF-... acronyms at nearly every post. I found that the joke is getting very fun but also that it may be starting to confuse people, so I feel compelled to quit joking for a moment and make clear which ones are

Re: Playing with Unicode (was: Re: UTF-17)

2001-06-25 Thread Otto Stolz
On 2001-06-23 at 14:40 EDT, [EMAIL PROTECTED] wrote: To keep well-meaning people from misinterpreting humorous UTF proposals as serious, while still allowing the levity to flow freely, I hereby propose that UTFs proposed in a non-serious light be indicated in lower-case letters

RE: Playing with Unicode (was: Re: UTF-17)

2001-06-25 Thread Marco Cimarosti
Otto Stolz wrote: Yet, I acknowledge the need to clearly mark humorous UTF propositions for the unsuspicious. Hence, I'd like to suggest enclosing their respective acronyms between \u202B and \u202C. This would be enough of a hint about the skewed nature of such suggestions while still

Re: How does Python Unicode treat surrogates?

2001-06-25 Thread Gaute B Strokkenes
[I'm cc:-ing the unicode list to make sure that I've gotten my terminology right, and to solicit comments On Mon, 25 Jun 2001, [EMAIL PROTECTED] wrote: Tim Peters wrote: [M.-A. Lemburg] ... 2. What to do when slicing of Unicode strings would break a surrogate pair ? To me a
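The slicing question above can be made concrete. A minimal sketch, assuming the string is held as a list of UTF-16 code units; `splits_pair` and `safe_slice` are hypothetical helpers, and moving a cut back by one unit is only one possible policy (raising an error is another):

```python
def is_high_surrogate(u: int) -> bool:
    return 0xD800 <= u <= 0xDBFF

def is_low_surrogate(u: int) -> bool:
    return 0xDC00 <= u <= 0xDFFF

def splits_pair(units: list[int], i: int) -> bool:
    """True if cutting `units` at index i separates a high surrogate
    from the low surrogate immediately after it."""
    return (0 < i < len(units)
            and is_high_surrogate(units[i - 1])
            and is_low_surrogate(units[i]))

def safe_slice(units: list[int], start: int, end: int) -> list[int]:
    """Slice by code-unit indices, moving any boundary that would land
    inside a surrogate pair back by one unit (hypothetical policy)."""
    if splits_pair(units, start):
        start -= 1
    if splits_pair(units, end):
        end -= 1
    return units[start:end]

# 'a', U+10000 (stored as D800 DC00), 'b'
units = [0x0061, 0xD800, 0xDC00, 0x0062]
print(splits_pair(units, 2))    # True
print(safe_slice(units, 0, 2))  # [97]
```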

Kanji tattoo

2001-06-25 Thread てんどうりゅうじ
For playing the dozens at a unicode convention: I wouldn't want *your* girlfriend. Why would I want a girl with so little personality she gets U+3005 on her arm? らんま ★じゅういっちゃん★ ×あかね

RE: Playing with Unicode (was: Re: UTF-17)

2001-06-25 Thread Elliotte Rusty Harold
At 11:13 AM +0200 6/25/01, Marco Cimarosti wrote: Hallo. I am one of those who started this childish joke of introducing implausible UTF-... acronyms at nearly every post. I found that the joke is getting very fun but also that it may be starting to confuse people, so I feel compelled to quit

Re: How does Python Unicode treat surrogates?

2001-06-25 Thread Mark Davis
You cannot interpret isolated UTF-16 surrogate code units as characters. For example, you can't interpret the sequence of D800 followed by 0061 as if it were some private use character (say, Klingon) followed by an 'a'. (For those unfamiliar with the terminology, see
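Mark Davis's rule can be illustrated with a strict decoder. This is a sketch, not any library's implementation; it assumes input as a list of UTF-16 code unit values, and `decode_utf16_units` is a hypothetical name. A high surrogate is combined only when a low surrogate follows; anything else in the surrogate range is rejected rather than treated as a character:

```python
def decode_utf16_units(units: list[int]) -> list[int]:
    """Decode UTF-16 code units to code points, rejecting isolated surrogates."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate is meaningful only when a low surrogate follows it.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
                i += 2
                continue
            raise ValueError(f"isolated high surrogate {u:04X} at index {i}")
        if 0xDC00 <= u <= 0xDFFF:
            raise ValueError(f"isolated low surrogate {u:04X} at index {i}")
        out.append(u)
        i += 1
    return out

print([hex(cp) for cp in decode_utf16_units([0xD83D, 0xDE00, 0x0061])])
# ['0x1f600', '0x61']
try:
    decode_utf16_units([0xD800, 0x0061])  # the D800 0061 sequence from the message
except ValueError as e:
    print(e)  # isolated high surrogate D800 at index 0
```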

Re: Playing with Unicode (was: Re: UTF-17)

2001-06-25 Thread Lars Marius Garshol
* Marco Cimarosti | | 1) UTF-8, UTF-16 and UTF-32 are the only three real EXISTING Unicode | Transformation Formats. They are official and part of the Unicode standard. * Elliotte Rusty Harold | | What about ISO-10646-UCS-2 and ISO-10646-UCS-4 as used in XML? Where | do they fit in? Are they

Re: UTF-17

2001-06-25 Thread Lars Marius Garshol
* Michael Everson | | Have you seen the really cool new Not the Roadmap page? (See | http://www.egt.ie/standards/iso10646/ucs-roadmap.html) Nushu isn't mentioned there. What is the status of that with regard to encoding it in Unicode? --Lars M.

Re: Playing with Unicode (was: Re: UTF-17)

2001-06-25 Thread DougEwell2
In a message dated 2001-06-25 2:24:36 Pacific Daylight Time, [EMAIL PROTECTED] writes: To avoid possible misunderstandings, such as regarding Doug's Unicode Compression Kludge as a duck, acronyms should continue being written in upper-case letters. I hadn't thought of that possibility,

Re: DUDE-8, a compression proposal

2001-06-25 Thread Markus Scherer
John Cowan wrote: 5. Emit all non-zero bytes. Do you mean omit leading zeroes and emit following bytes? You would not want to emit all but a middle byte, right? markus

FW: Arabic font Crisis!!!!!

2001-06-25 Thread Magda Danish (Unicode)
-Original Message- From: Basel Abu Khiran [mailto:[EMAIL PROTECTED]] Sent: Saturday, June 23, 2001 7:34 AM To: '[EMAIL PROTECTED]' Subject: font Crisis! Dear Sir. I would like to inquire about a certain issue. I have a font that I use to display the Qur'an. You know that

Re: How does Python Unicode treat surrogates?

2001-06-25 Thread Rick McGowan
Gaute B Strokkenes wrote... [I'm cc:-ing the unicode list to make sure that I've gotten my terminology right, and to solicit comments Interesting... I just started looking at Python the other day, once I discovered it has such nice built-in Unicode support. If Python is explicitly storing

Re: How does Python Unicode treat surrogates?

2001-06-25 Thread J M Sykes
Mark Davis said: In most people's experience, it is best to leave the low level interfaces with indices in terms of code units, then supply some utility routines that tell you information about code points. ... Anyone on the list interested in the treatment of UCS aka Unicode in programming
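The design quoted here (indices in code units, plus utility routines that report code points) can be sketched with one such utility. `code_point_at` is a hypothetical Python helper over a list of UTF-16 code units; its handling of unpaired surrogates is modeled on Java's `String.codePointAt`, which returns the surrogate value itself in that case:

```python
def code_point_at(units: list[int], i: int) -> int:
    """Return the code point starting at code-unit index i.

    The index stays in code units; only the result is a code point.
    If units[i] is not the start of a well-formed pair, the unit
    itself is returned (as Java's String.codePointAt does).
    """
    u = units[i]
    if 0xD800 <= u <= 0xDBFF and i + 1 < len(units):
        nxt = units[i + 1]
        if 0xDC00 <= nxt <= 0xDFFF:
            return 0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00)
    return u

units = [0x0061, 0xD83D, 0xDE00, 0x0062]  # 'a', U+1F600, 'b'
print(hex(code_point_at(units, 1)))  # 0x1f600
print(hex(code_point_at(units, 2)))  # 0xde00 (second half of the pair)
```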

Re: [I18n-sig] Re: How does Python Unicode treat surrogates?

2001-06-25 Thread Mark Davis
comments below. - Original Message - From: M.-A. Lemburg [EMAIL PROTECTED] To: Mark Davis [EMAIL PROTECTED] Cc: Gaute B Strokkenes [EMAIL PROTECTED]; Tim Peters [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Monday, June 25, 2001 09:46 Subject: Re: [I18n-sig] Re: How does

Re: How does Python Unicode treat surrogates?

2001-06-25 Thread Marcin 'Qrczak' Kowalczyk
On Mon, 25 Jun 2001 07:24:28 -0700, Mark Davis [EMAIL PROTECTED] wrote: In most people's experience, it is best to leave the low level interfaces with indices in terms of code units, then supply some utility routines that tell you information about code points. It's yet better to work on

Nushu (was Re: UTF-17)

2001-06-25 Thread Kenneth Whistler
Lars M. asked: * Michael Everson | | Have you seen the really cool new Not the Roadmap page? (See | http://www.egt.ie/standards/iso10646/ucs-roadmap.html) Nushu isn't mentioned there. What is the status of that with regard to encoding it in Unicode? It's up in the air. For those who

Re: How does Python Unicode treat surrogates?

2001-06-25 Thread Mark Davis
That is an interesting approach; one that basically amounts to some convenience functions. For example, instead of writing: myString.substring(myString.cpToIndex(3), myString.cpToIndex(5)); you could write: myString.substring(3, 5, myString.CODEPOINT); This hides some of the work, when
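The `cpToIndex` call in this example converts a code-point offset into a code-unit index. The names `cpToIndex` and `CODEPOINT` come from Mark Davis's illustration, not from a real library; `cp_to_index` and `substring_by_code_point` are hypothetical Python counterparts, assuming a list of UTF-16 code units:

```python
def cp_to_index(units: list[int], cp_offset: int) -> int:
    """Convert a code-point offset into a code-unit index."""
    i = 0
    for _ in range(cp_offset):
        if i >= len(units):
            raise IndexError("code-point offset past end of string")
        # A well-formed surrogate pair is one code point but two code units.
        if (0xD800 <= units[i] <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            i += 2
        else:
            i += 1
    return i

def substring_by_code_point(units: list[int], start_cp: int, end_cp: int) -> list[int]:
    # Counterpart of myString.substring(myString.cpToIndex(start), myString.cpToIndex(end))
    return units[cp_to_index(units, start_cp):cp_to_index(units, end_cp)]

units = [0x0061, 0xD83D, 0xDE00, 0x0062, 0x0063]  # 'a', U+1F600, 'b', 'c'
print([hex(u) for u in substring_by_code_point(units, 1, 3)])
# ['0xd83d', '0xde00', '0x62']
```

Wrapping the conversion inside `substring` hides the cost, but each call still walks the string from the start, which is one reason to keep the low-level indices in code units.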

Re: UTF-17

2001-06-25 Thread Michael Everson
What do you understand Nushu to be? -- Michael Everson

Re: Nushu (was Re: UTF-17)

2001-06-25 Thread Lars Marius Garshol
* Lars Marius Garshol | | Nushu isn't mentioned there. What is the status of that with regard to | encoding it in Unicode? * Kenneth Whistler | | It's up in the air. I can understand why that would be so, but shouldn't the roadmap say so? I would think it would be useful for it to do so. |

RE: Playing with Unicode (was: Re: UTF-17)

2001-06-25 Thread Yves Arrouye
A proposal needs a definition, though: UTF would mean Unicode Transformation Format utf would mean Unicode Terrible Farce untenable total figment? unable to focus? utf twisted form? YA

Re: FW: Arabic font Crisis!!!!!

2001-06-25 Thread John Hudson
From: Basel Abu Khiran [mailto:[EMAIL PROTECTED]] Dear Sir. I would like to inquire about a certain issue. I have a font that I use to display the Qur'an. You know that Arabic letters have special characters above or below them... now, after defining Unicode in a C program.

RE: UTF-17

2001-06-25 Thread Yves Arrouye
From: [EMAIL PROTECTED] Oh yeah, well, I can be more tongue-in-cheek than all of you. I've already implemented it. Quick, quick. Patent it and then open-source it. It will be unstoppable. YA

Re: Nushu (was Re: UTF-17)

2001-06-25 Thread Kenneth Whistler
Lars Marius Garshol asked: * Kenneth Whistler | | It's up in the air. I can understand why that would be so, but shouldn't the roadmap say so? I would think it would be useful for it to do so. Yes, I think it would be. | From what I have seen, there is some question whether Nushu

Re: DUDE-8, a compression proposal

2001-06-25 Thread John Cowan
Markus Scherer scripsit: John Cowan wrote: 5. Emit all non-zero bytes. Do you mean omit leading zeroes and emit following bytes? You would not want to emit all but a middle byte, right? Yes, of course *assumes paper bag* -- John Cowan [EMAIL
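The difference between the literal wording of step 5 and the clarified reading shows up when a zero byte sits in the middle of a value. This sketch covers only that one step and makes its own assumptions: a 32-bit value written big-endian, and a zero value still producing one byte. The actual DUDE-8 layout is not shown in the excerpt, and both function names are hypothetical:

```python
def emit_significant_bytes(value: int) -> bytes:
    """Clarified step 5: omit leading zero bytes, emit all following bytes
    (including zero bytes in the middle). Assumes 32-bit big-endian."""
    stripped = value.to_bytes(4, "big").lstrip(b"\x00")
    # Assumption: a zero value is still emitted as a single zero byte.
    return stripped or b"\x00"

def literal_reading(value: int) -> bytes:
    """The misreading Markus Scherer flagged: every zero byte is dropped."""
    return bytes(b for b in value.to_bytes(4, "big") if b != 0)

print(emit_significant_bytes(0x00120034).hex())  # 120034
print(literal_reading(0x00120034).hex())         # 1234
```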

Re: Nushu (was Re: UTF-17)

2001-06-25 Thread Michael Everson
At 22:27 +0200 2001-06-25, Lars Marius Garshol wrote: * Lars Marius Garshol | | Nushu isn't mentioned there. What is the status of that with regard to | encoding it in Unicode? * Kenneth Whistler | | It's up in the air. I can understand why that would be so, but shouldn't the roadmap say so? I

Re: Nushu (was Re: UTF-17)

2001-06-25 Thread Michael Everson
At 15:31 -0700 2001-06-25, Kenneth Whistler wrote: Thanks for the pointer. Michael Everson ought now to have enough information to put a reasonable entry in the Roadmap. It is clearly not ready for encoding yet, and sounds like it could have a numerosity from something like 600 characters

Re: Nushu (was Re: UTF-17)

2001-06-25 Thread Michael Everson
At 11:42 -0700 2001-06-25, Kenneth Whistler wrote: From what I have seen, there is some question whether Nushu should just be treated as a cipher of the existing Han characters. Or maybe it's just a dictionary. The analytic lists seem to consist of lists of glyphs, each equated to a standard

Re: How does Python Unicode treat surrogates?

2001-06-25 Thread Gaute B Strokkenes
On Mon, 25 Jun 2001, [EMAIL PROTECTED] wrote: MAL and Gaute, Can I please take the middle ground (and risk having both of you throw things at me? = Lone surrogates are not 'true Unicode char points in their own right' [MAL] -- they don't represent characters. I think you're misquoting

Re: How does Python Unicode treat surrogates?

2001-06-25 Thread DougEwell2
In a message dated 2001-06-25 20:19:18 Pacific Daylight Time, [EMAIL PROTECTED] writes: (For instance, I don't see how it would be possible to encode a sequence of unicode scalar values corresponding to a low and a high surrogate; if you tried to map this back then you would get a
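The excerpt is truncated, but one well-known form of the round-trip problem can be shown directly. In current Unicode terminology surrogate code points are not scalar values at all. If an encoder (hypothetically) let them through as single UTF-16 code units, the sequence D800, DC00 and the single character U+10000 would produce identical output, so no decoder could tell them apart. `naive_utf16` is an illustrative name, not a real API:

```python
def naive_utf16(values: list[int]) -> list[int]:
    """Encode code points as UTF-16 code units, but (deliberately, for
    illustration) pass surrogate-range values through unchanged."""
    out = []
    for v in values:
        if v >= 0x10000:
            v -= 0x10000
            out += [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]
        else:
            out.append(v)  # surrogate-range values are NOT rejected here
    return out

a = naive_utf16([0xD800, 0xDC00])  # two surrogate values
b = naive_utf16([0x10000])         # one supplementary character
print(a == b)  # True: the mapping cannot be inverted
```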