Hello.
I am one of those who started this childish joke of introducing implausible
UTF-... acronyms in nearly every post.
I found that the joke is getting to be a lot of fun, but also that it may be starting
to confuse people, so I feel compelled to quit joking for a moment and make
clear which ones are
On 2001-06-23 at 14:40 EDT, [EMAIL PROTECTED] wrote:
To keep well-meaning people from misinterpreting humorous UTF proposals as
serious, while still allowing the levity to flow freely, I hereby propose
that UTFs proposed in a non-serious light be indicated in lower-case letters
Otto Stolz wrote:
Yet, I acknowledge the need to clearly mark humorous UTF propositions
for the unsuspecting. Hence, I'd like to suggest enclosing their
respective acronyms between \u202B and \u202C. This would give enough
of a hint about the skewed nature of such suggestions while still
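For readers without the code charts at hand: U+202B and U+202C are invisible bidirectional formatting characters, RIGHT-TO-LEFT EMBEDDING and POP DIRECTIONAL FORMATTING. A minimal Python sketch of the proposed marking, using the standard `unicodedata` module to confirm what the two characters are; `mark_humorous` is a hypothetical helper name, not something from the thread.

```python
import unicodedata

RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING: opens a right-to-left embedding level
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING: closes the most recent embedding


def mark_humorous(acronym: str) -> str:
    """Hypothetical helper: wrap a joke acronym in RLE ... PDF."""
    return RLE + acronym + PDF


if __name__ == "__main__":
    for ch in (RLE, PDF):
        # unicodedata.bidirectional returns the character's bidi class
        print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))
    print(ascii(mark_humorous("UCK")))
```

Because both marks are invisible, a plain-text reader sees nothing unusual, and the letters of a Latin acronym keep their internal left-to-right order inside the embedding.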
[I'm cc:-ing the unicode list to make sure that I've gotten my
terminology right, and to solicit comments
On Mon, 25 Jun 2001, [EMAIL PROTECTED] wrote:
Tim Peters wrote:
[M.-A. Lemburg]
...
2. What to do when slicing a Unicode string would break
a surrogate pair?
To me a
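One possible answer to the slicing question, sketched in Python over an explicit list of UTF-16 code units (so it does not depend on how a given Python build stores strings internally): snap any boundary that lands between a high and a low surrogate back to the start of the pair. The helper names are hypothetical, indices are assumed non-negative, and snapping backwards is only one policy; raising an error or snapping forwards are equally defensible.

```python
import struct
from typing import List


def to_utf16_units(text: str) -> List[int]:
    """Return the UTF-16 code units of text (big-endian codec, so no BOM)."""
    data = text.encode("utf-16-be")
    return list(struct.unpack(f">{len(data) // 2}H", data))


def is_high(u: int) -> bool:
    return 0xD800 <= u <= 0xDBFF


def is_low(u: int) -> bool:
    return 0xDC00 <= u <= 0xDFFF


def safe_boundary(units: List[int], i: int) -> int:
    """Move a slice boundary back one unit if it falls inside a surrogate pair."""
    if 0 < i < len(units) and is_low(units[i]) and is_high(units[i - 1]):
        return i - 1
    return i


def safe_slice(units: List[int], start: int, stop: int) -> List[int]:
    """Slice code units without ever splitting a surrogate pair."""
    return units[safe_boundary(units, start):safe_boundary(units, stop)]


if __name__ == "__main__":
    units = to_utf16_units("a\U0001D11Eb")  # 'a', U+1D11E (two units), 'b'
    print([hex(u) for u in units])
    print([hex(u) for u in safe_slice(units, 0, 2)])  # stop=2 would split the pair
```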
For playing the dozens at a unicode convention:
I wouldn't want *your* girlfriend. Why would I want a girl with so little personality
she gets U+3005 on her arm?
らんま ★じゅういっちゃん★ (Ranma ★Jūicchan★)
  ×あかね (×Akane)
At 11:13 AM +0200 6/25/01, Marco Cimarosti wrote:
Hello.
I am one of those who started this childish joke of introducing implausible
UTF-... acronyms in nearly every post.
I found that the joke is getting to be a lot of fun, but also that it may be starting
to confuse people, so I feel compelled to quit
You cannot interpret isolated UTF-16 surrogate code units as characters. For
example, you can't interpret the sequence of D800 followed by 0061 as if it
were some private use character (say, Klingon) followed by an 'a'.
(For those unfamiliar with the terminology, see
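A minimal Python sketch of that rule, assuming the text is available as a list of 16-bit code units: a well-formed sequence pairs every high surrogate (D800 to DBFF) with an immediately following low surrogate (DC00 to DFFF), and anything else is flagged rather than treated as a character. Python's built-in strict UTF-16 decoder takes the same position and rejects D800 0061. The function names are hypothetical.

```python
from typing import List, Optional


def find_lone_surrogates(units: List[int]) -> List[int]:
    """Return the indices of UTF-16 code units that are unpaired surrogates."""
    lone = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # high surrogate: must be followed by a low one
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2  # well-formed pair, skip both units
                continue
            lone.append(i)
        elif 0xDC00 <= u <= 0xDFFF:  # low surrogate with no preceding high one
            lone.append(i)
        i += 1
    return lone


def strict_decode(data: bytes) -> Optional[str]:
    """Decode UTF-16BE strictly; return None if the data is ill-formed."""
    try:
        return data.decode("utf-16-be")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    print(find_lone_surrogates([0xD800, 0x0061]))  # the D800 0061 example
    print(strict_decode(b"\xd8\x00\x00\x61"))
```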
* Marco Cimarosti
|
| 1) UTF-8, UTF-16 and UTF-32 are the only three real EXISTING Unicode
| Transformation Formats. They are official and part of the Unicode standard.
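To make the comparison concrete, a small Python sketch (the helper name is hypothetical) that encodes the same characters in all three forms with Python's standard codecs. é (U+00E9) takes two bytes in UTF-8 and UTF-16 and four in UTF-32; a supplementary character such as U+1D11E takes four bytes in every form, appearing as a surrogate pair in UTF-16.

```python
def encodings_of(text: str) -> dict:
    """Hex dump of text in each of the three Unicode encoding forms.

    The -be codec names are used so Python does not prepend a byte order mark.
    """
    return {enc: text.encode(enc).hex() for enc in ("utf-8", "utf-16-be", "utf-32-be")}


if __name__ == "__main__":
    for sample in ("\u00e9", "\U0001D11E"):  # e-acute (BMP) and MUSICAL SYMBOL G CLEF
        print(f"U+{ord(sample):04X}", encodings_of(sample))
```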
* Elliotte Rusty Harold
|
| What about ISO-10646-UCS-2 and ISO-10646-UCS-4 as used in XML? Where
| do they fit in? Are they
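As a rough illustration only, under simplifying assumptions that are not taken from the thread: UCS-2 is commonly described as plain 16-bit code values with no surrogate mechanism, so it cannot represent anything beyond U+FFFF, while UCS-4 uses 32-bit values and, for valid Unicode scalar values, yields the same bytes as UTF-32. A big-endian Python sketch with hypothetical function names:

```python
import struct


def encode_ucs2(text: str) -> bytes:
    """Sketch of UCS-2 (big-endian): one 16-bit unit per code point, BMP only."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError(f"U+{cp:04X} is outside the BMP; UCS-2 cannot encode it")
        out += struct.pack(">H", cp)
    return bytes(out)


def encode_ucs4(text: str) -> bytes:
    """Sketch of UCS-4 (big-endian): one 32-bit unit per code point."""
    return b"".join(struct.pack(">I", ord(ch)) for ch in text)
```

The visible difference is the supplementary character: `encode_ucs2` refuses it, where UTF-16 would emit a surrogate pair.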
* Michael Everson
|
| Have you seen the really cool new Not the Roadmap page? (See
| http://www.egt.ie/standards/iso10646/ucs-roadmap.html)
Nushu isn't mentioned there. What is the status of that with regard to
encoding it in Unicode?
--Lars M.
In a message dated 2001-06-25 2:24:36 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
To avoid possible misunderstandings, such as regarding Doug's Unicode
Compression Kludge as a duck, acronyms should continue being written
in upper-case letters.
I hadn't thought of that possibility,
John Cowan wrote:
5. Emit all non-zero bytes.
Do you mean omit leading zeroes and emit following bytes? You would not want to emit
all but a middle byte, right?
markus
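Markus's objection is easy to see in a short Python sketch. The other numbered steps are not quoted here, so the two functions below (hypothetical names) only contrast the two readings of step 5 for a single code point, taken as four big-endian bytes.

```python
def drop_leading_zeros(cp: int) -> bytes:
    """Markus's reading: omit leading zero bytes, emit everything after them."""
    return cp.to_bytes(4, "big").lstrip(b"\x00")


def drop_all_zeros(cp: int) -> bytes:
    """Literal reading of step 5: emit only the non-zero bytes."""
    return bytes(b for b in cp.to_bytes(4, "big") if b != 0)


if __name__ == "__main__":
    for cp in (0x0001, 0x0100, 0x10000):
        print(f"U+{cp:04X}", drop_leading_zeros(cp).hex(), drop_all_zeros(cp).hex())
```

Under the literal reading, U+0100 and U+0001 both come out as the single byte 01, so a decoder could not tell them apart; dropping only the leading zeros keeps the interior 00 and avoids that collision.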
-Original Message-
From: Basel Abu Khiran [mailto:[EMAIL PROTECTED]]
Sent: Saturday, June 23, 2001 7:34 AM
To: '[EMAIL PROTECTED]'
Subject: font Crisis!
Dear Sir.
I would like to inquire about a certain issue.
I have a font that I use to display the Qur'an.
You know that
Gaute B Strokkenes wrote...
[I'm cc:-ing the unicode list to make sure that I've gotten my
terminology right, and to solicit comments
Interesting... I just started looking at Python the other day, once I
discovered it has such nice built-in Unicode support.
If Python is explicitly storing
Mark Davis said:
In most people's experience, it is best to leave the low level interfaces
with indices in terms of code units, then supply some utility routines that
tell you information about code points. ...
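A minimal sketch of that division of labour, in Python over a plain list of UTF-16 code units: indexing stays in code units, and two small utility routines (hypothetical names) answer code-point questions. Java later adopted a similar design with methods such as `String.codePointAt` and `String.codePointCount`.

```python
from typing import List


def code_point_at(units: List[int], i: int) -> int:
    """Code point starting at code-unit index i (a lone surrogate is returned as-is)."""
    u = units[i]
    if 0xD800 <= u <= 0xDBFF and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
        return 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
    return u


def code_point_count(units: List[int]) -> int:
    """Number of code points in a sequence of UTF-16 code units."""
    count = 0
    i = 0
    while i < len(units):
        # a supplementary code point occupies two units, everything else one
        i += 2 if code_point_at(units, i) > 0xFFFF else 1
        count += 1
    return count
```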
Anyone on the list interested in the treatment of UCS aka Unicode in
programming
comments below.
- Original Message -
From: M.-A. Lemburg [EMAIL PROTECTED]
To: Mark Davis [EMAIL PROTECTED]
Cc: Gaute B Strokkenes [EMAIL PROTECTED]; Tim Peters [EMAIL PROTECTED];
[EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Monday, June 25, 2001 09:46
Subject: Re: [I18n-sig] Re: How does
Mon, 25 Jun 2001 07:24:28 -0700, Mark Davis [EMAIL PROTECTED] writes:
In most people's experience, it is best to leave the low level interfaces
with indices in terms of code units, then supply some utility routines that
tell you information about code points.
It's yet better to work on
Lars M. asked:
* Michael Everson
|
| Have you seen the really cool new Not the Roadmap page? (See
| http://www.egt.ie/standards/iso10646/ucs-roadmap.html)
Nushu isn't mentioned there. What is the status of that with regard to
encoding it in Unicode?
It's up in the air.
For those who
That is an interesting approach; one that basically amounts to some
convenience functions. For example, instead of writing:
myString.substring(myString.cpToIndex(3), myString.cpToIndex(5));
you could write:
myString.substring(3, 5, myString.CODEPOINT);
This hides some of the work, when
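The convenience-function style can be sketched in Python over a list of UTF-16 code units. `cp_to_index` and `substring_by_code_point` are hypothetical stand-ins for the `cpToIndex` and `substring(..., CODEPOINT)` calls above, which are themselves illustrative rather than an existing API. Each conversion walks from the start of the string, so it costs time proportional to the offset; that is the work the convenience form hides.

```python
from typing import List


def cp_to_index(units: List[int], cp_index: int) -> int:
    """Code-unit index at which the cp_index-th code point starts.

    cp_index may equal the number of code points, giving the end index len(units).
    """
    i = 0
    for _ in range(cp_index):
        if i >= len(units):
            raise IndexError("code point index out of range")
        u = units[i]
        paired = (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                  and 0xDC00 <= units[i + 1] <= 0xDFFF)
        i += 2 if paired else 1
    return i


def substring_by_code_point(units: List[int], start: int, end: int) -> List[int]:
    """Rough equivalent of the proposed substring(start, end, CODEPOINT)."""
    return units[cp_to_index(units, start):cp_to_index(units, end)]
```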
What do you understand Nushu to be?
--
Michael Everson
* Lars Marius Garshol
|
| Nushu isn't mentioned there. What is the status of that with regard to
| encoding it in Unicode?
* Kenneth Whistler
|
| It's up in the air.
I can understand why that would be so, but shouldn't the roadmap say
so? I would think it would be useful for it to do so.
|
A proposal needs a definition, though:
UTF would mean Unicode Transformation Format
utf would mean Unicode Terrible Farce
untenable total figment?
unable to focus?
utf twisted form?
YA
From: Basel Abu Khiran [mailto:[EMAIL PROTECTED]]
Dear Sir.
I would like to inquire about a certain issue.
I have a font that I use to display the Qur'an.
You know that Arabic letters have special characters above or below
them... now, after defining Unicode in a C program.
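The question is cut off here, but the underlying model is worth a sketch: in Unicode, Arabic vowel marks such as FATHA (U+064E) and KASRA (U+0650) are separate combining characters (general category Mn) stored after the letter they sit on, and the font and rendering engine are responsible for positioning them. The Python below, with a hypothetical helper name, groups each base letter with the marks that follow it; it is a simplification, not full grapheme segmentation.

```python
import unicodedata
from typing import List


def group_marks(text: str) -> List[str]:
    """Attach each combining mark (general category M*) to the preceding character."""
    clusters: List[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


if __name__ == "__main__":
    word = "\u0628\u064E\u062A\u0650"  # BEH + FATHA, TEH + KASRA
    for cluster in group_marks(word):
        print([unicodedata.name(c) for c in cluster])
```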
From: [EMAIL PROTECTED]
Oh yeah, well, I can be more tongue-in-cheek than all of you. I've already
implemented it.
Quick, quick. Patent it and then open-source it. It will be unstoppable.
YA
Lars Marius Garshol asked:
* Kenneth Whistler
|
| It's up in the air.
I can understand why that would be so, but shouldn't the roadmap say
so? I would think it would be useful for it to do so.
Yes, I think it would be.
| From what I have seen, there is some question whether Nushu
Markus Scherer wrote:
John Cowan wrote:
5. Emit all non-zero bytes.
Do you mean omit leading zeroes and emit following bytes? You would not want to
emit all but a middle byte, right?
Yes, of course *assumes paper bag*
--
John Cowan [EMAIL
At 22:27 +0200 2001-06-25, Lars Marius Garshol wrote:
* Lars Marius Garshol
|
| Nushu isn't mentioned there. What is the status of that with regard to
| encoding it in Unicode?
* Kenneth Whistler
|
| It's up in the air.
I can understand why that would be so, but shouldn't the roadmap say
so? I
At 15:31 -0700 2001-06-25, Kenneth Whistler wrote:
Thanks for the pointer. Michael Everson ought now to have enough information
to put a reasonable entry in the Roadmap. It is clearly not yet ready for
encoding, and it sounds like it could have a numerosity
from something like 600 characters
At 11:42 -0700 2001-06-25, Kenneth Whistler wrote:
From what I have seen, there is some question whether Nushu should
just be treated as a cipher of the existing Han characters.
Or maybe it's just a dictionary.
The analytic lists seem to consist of lists of glyphs, each equated to
a standard
On Mon, 25 Jun 2001, [EMAIL PROTECTED] wrote:
MAL and Gaute,
Can I please take the middle ground (and risk having both of you
throw things at me)?
= Lone surrogates are not 'true Unicode char points
in their own right' [MAL] -- they don't represent characters.
I think you're misquoting
In a message dated 2001-06-25 20:19:18 Pacific Daylight Time, [EMAIL PROTECTED]
writes:
(For instance, I
don't see how it would be possible to encode a sequence of unicode
scalar values corresponding to a low and a high surrogate; if you
tried to map this back then you would get a
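The excerpt breaks off mid-sentence, so the exact argument is not recoverable, but a closely related ambiguity is easy to demonstrate. If surrogate code points were allowed as scalar values, a UTF-16 encoder would have to pass them through unchanged, and then the sequence D800 DC00 would come out identical to the encoding of U+10000, so no decoder could recover which was meant. A Python sketch with a hypothetical, deliberately non-conformant encoder:

```python
from typing import List


def naive_utf16(code_points: List[int]) -> List[int]:
    """UTF-16 encoder that (non-conformantly) lets surrogate values through."""
    units: List[int] = []
    for cp in code_points:
        if cp > 0xFFFF:
            v = cp - 0x10000
            units += [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]
        else:
            units.append(cp)  # surrogate values D800..DFFF pass through unchecked
    return units


if __name__ == "__main__":
    print([hex(u) for u in naive_utf16([0x10000])])
    print([hex(u) for u in naive_utf16([0xD800, 0xDC00])])
```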