At 11:02 AM 10/26/03 +1100, Simon Butcher wrote:
Hi!
snip
I was taught at school that the double-bar form was used when Australia
switched to decimal currency in 1966, and that it was incorrect to write
the single-bar form when referring to Australian dollars.
It would be interesting if
At 02:08 PM 10/25/03 -0700, Doug Ewell wrote:
So, in effect the UNICODE character names attempt to be
a unified transliteration scheme for all languages? Are these
principles laid down somewhere or is this more informal?
The Unicode character names attempt to be (a) unique and (b) reasonably
At 09:30 PM 10/26/03 -0800, Doug Ewell wrote:
I can't speak for the whole of the last two centuries, but certainly
current American bills and coins do not use either symbol. The bills
in common use say ONE DOLLAR, FIVE DOLLARS, TEN DOLLARS, and TWENTY
DOLLARS; the coins say ONE CENT, FIVE
At 09:35 PM 10/27/03 -0800, Doug Ewell wrote:
That said, I can try to improve my use of real Unicode punctuation on
these lists, if I have time to paste it in (since my keyboard doesn't
support it).
Please don't.
I remember being told by someone a few years back that I
should limit my use of
At 05:52 AM 11/20/2003, Philippe Verdy wrote:
We need a comprehensive new technical report that lists all the exceptions
to the general category system, as these line-breaking or word-breaking or
grapheme cluster breaking properties are orthogonal to the basic GC system
and to the combining class
At 05:44 AM 11/19/2003, Philippe Verdy wrote:
However, a couple of paragraphs up, the definition for No-Break
Space says:
U+00A0 [No-Break Space] behaves like the following coded
character sequence: U+FEFF [Zero Width No-Break Space] +
U+0020 [Space] + U+FEFF [Zero Width No-Break Space].
- Original Message -
From: Frank Yung-Fong Tang [EMAIL PROTECTED]
UTF-16                 6,634,430 bytes
UTF-8                  7,637,601 bytes
SCSU                   6,414,319 bytes
BOCU-1                 5,897,258 bytes
Legacy encoding (*)    5,477,432 bytes
(*) KS C 5601, KS X 1001, or EUC-KR
What is the size
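The size comparison above can be reproduced in miniature. The following is only a sketch: the sample string is a made-up stand-in for the Korean corpus behind the quoted figures, and SCSU and BOCU-1 are left out because the Python standard library has no codecs for them.

```python
# Hypothetical sample text; the corpus measured in the message is not available.
sample = "대한민국 서울"  # six Hangul syllables and one ASCII space


def encoded_sizes(text):
    """Return the byte length of `text` in three encodings."""
    return {
        "UTF-16": len(text.encode("utf-16-le")),  # little-endian, no BOM
        "UTF-8": len(text.encode("utf-8")),
        "EUC-KR": len(text.encode("euc_kr")),
    }


sizes = encoded_sizes(sample)
for name, size in sizes.items():
    print(f"{name:8} {size} bytes")
```

Each Hangul syllable takes two bytes in UTF-16 and EUC-KR but three in UTF-8, while the ASCII space takes one byte in UTF-8 and EUC-KR and two in UTF-16. That is consistent with the ordering in the table: legacy encoding smallest, UTF-8 largest.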
Another rule which isn't written into Unicode but I like (don't know if
Everson and Whistler and others will), is the font clarity rule. Given a font
minus one character, I should be able to predict what that character will look
like. If I have a Sütterlin font or a Fraktur font, I know what
At 04:08 PM 1/8/2004, D. Starner wrote:
Otto Stolz [EMAIL PROTECTED] wrote:
Gerd Schumacher wrote:
The long s [...] has been abandoned from the Roman alphabet in Germany
in the mid of the 19th century.
You mean the 20th century, don't you?
I have a facsimile reprint of the 1914 issue of
At 09:23 PM 1/18/2004, [EMAIL PROTECTED] wrote:
Seriously, it's my understanding that implementation guidelines
for Mongolian script and Unicode are still being worked out.
You are correct. A group of experts is currently working out a definite
description of how Mongolian should work.
All the
Just a few comments on Andrew's note:
At 06:43 AM 1/19/2004, Andrew C. West wrote:
An analogy for those not familiar with the Mongolian script is the much beloved
long s, which is a positional glyph variant of the ordinary letter s for some
languages at some periods of time. The long s does not
At 04:12 PM 2/9/2004, Kenneth Whistler wrote:
That leaves item A. And it is mostly a matter of determining
what is the best mechanism for getting people to know how
they should spell the metegs with the minimum of confusion.
Putting something in the Unicode Standard might be appropriate,
or there
At 01:20 PM 2/7/2004, Laurentiu Iancu wrote:
I noticed that a new combining character, U+1DC2 Combining Snake Below, has
been added. Just out of curiosity, what were the reasons why this character
was allocated at this code point rather than, for instance, U+0358, the last
free position in the
At 12:11 PM 2/24/2004, Kenneth Whistler wrote:
Think of variation selection as being more appropriate when
what we are talking about are for most purposes simply
*free variants* for presentation -- either is equally correct
to most people under most circumstances -- but where for
particular
At 12:07 PM 3/16/2004, Antoine Leca wrote:
(For example, old German in Fraktur typeface has been decided to be
just a different font, but the same Latin letters as we know today)
Like U+017F? ;-)
A little known fact is that the long s cannot be implemented as your typical
context-based glyph
At 10:34 AM 3/18/2004, Michael Everson wrote:
I think the ANARCHY SIGN is perfectly good, but I think it is a glyph
variant of an existing character.
Just as 2117 and 24C5 are similar, but unrelated the *ANARCHY SIGN is not
the same as 24B6.
A./
At 08:27 AM 3/18/2004, Jon Wilson wrote:
Hi folks,
I believe there is a character missing from the standard. I would like to
apply to have it included, but I am a typography and Unicode novice, so I
require some assistance with the application process.
The character in question is a variant of
At 04:18 PM 3/18/2004, Mike Ayers wrote:
Note that in *that* rendition of the anarchy symbol, the
crossbar on the A does *not* touch the circle on either
edge, but it may just be that the renderer was a little
short of black paint.
I find
At 09:48 AM 3/19/2004, Mike Ayers wrote:
In less than half an hour of looking at printed samples, I've been able to
locate two instances of the symbol replacing the letter A in a word. If
that's not use in text, I don't know what is.
That is use in text as a glyph variant, which is,
At 07:13 AM 3/19/2004, Marion Gunn wrote:
Ar 15:33 + 2004/03/18, scríobh Arcane Jill:
This probably is going to sound like a really dumb question, but ... Is
the BMP being saved for something?
...
Arcane Jill
There are never any dumb questions, Jill, only dumb answers.
And some of the latter
At 02:26 AM 3/21/2004, Philippe Verdy wrote:
Look into Wingdings and Dingbats code blocks,
**
Philippe, this is a new low in sloppy inaccuracy even for you.
WingDings is a name of a series of fonts shipped by MS.
They contain many symbols not found in Unicode. There is no
At 02:55 PM 3/23/2004, Thomas Kuehne wrote:
Is somebody already using a PUA assignment for vertical text direction
controls?
from http://www.unicode.org/faq/bidi.html#1
[...] the choice of vertical layout is usually treated as a
formatting style; therefore, the Unicode Standard does not define
At 06:09 PM 3/23/2004, Thomas Kuehne wrote:
Am Mittwoch 24 März 2004 00:09 schrieb Asmus Freytag:
Is somebody already using a PUA assignment for vertical text
direction controls?
I think the idea was that these don't belong in plain text.
Markup languages have had vertical layout controls
At 12:14 PM 3/24/2004, Mike Ayers wrote:
Does anyone know of a good program for examining fonts? What I
am looking for is some way to, given a font, find out both the glyphs
contained and the code points (bad term?) at which those glyphs are
situated. Ability to read hinting/shaping
At 02:58 PM 3/24/2004, Thomas Kuehne wrote:
Am 2004-03-23 20:23 schrieb Asmus Freytag:
I don't think I know of a scenario where it is critical for a
resource limited device to display the kinds of texts you list
below.
Reading the font data and processing it into a display representation
poses
At 01:33 PM 3/26/2004, Jim Allan wrote:
Arcane Jill posted:
(A) A proposed character will be rejected if its glyph is identical in
appearance to that of an extant glyph, regardless of its semantic
meaning,
Obviously not.
Unicode encodes characters not glyphs. That particular glyphs of one
At 02:03 PM 3/26/2004, Ernest Cline wrote:
[Original Message]
From: Asmus Freytag [EMAIL PROTECTED]
There are millions of fonts out there with variations of the zodiac. Font
shifting would seem to be the correct answer to implement glyph
variations
there. (A wrong font will ruin the mood
At 05:32 PM 3/26/2004, John Cowan wrote:
Asmus Freytag scripsit:
Another drawback is the fact that
too few systems handle any variation selectors gracefully.
Well, at least they should be easy to handle in fonts: add the selectors
to the font as invisible characters, and then create mandatory
John,
Look at UTR#20 and at UAX#9 (the 4.0.1 version is due out shortly).
Taken together they suggest that the non-plain text way is to keep such
text direction overrides out of band (i.e. in markup) and to apply the
bidi algorithm segment by segment in a marked up file.
If you export to plain
At 05:47 PM 3/27/2004, John Cowan wrote:
Asmus Freytag scripsit:
This can be tricky, esp. when the user doesn't know a VS is present
and the font used to view the data doesn't have an alternate glyph.
Well, surely it'll turn into the black blob, or the reversed question
mark, or whatever
At 09:46 AM 3/28/2004, Philippe Verdy wrote:
It was like the US telecommunications act which set fines for transmitting
its set of proscribed words including in programs that were designed to
filter the words out of text.
Does this list really exist? Seriously, there's no word that can be
At 07:53 PM 3/27/2004, [EMAIL PROTECTED] wrote:
What does the collation standard say to do with unassigned codepoints
anyhow?
Variation selectors are not unassigned characters.
But, they might be regarded as such by any application predating VSs. And,
likewise for any VS sequences approved
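The distinction drawn here, between an assigned variation selector and a truly unassigned code point, can be checked against whatever version of the Unicode Character Database a given runtime ships. A minimal sketch using Python's `unicodedata` module follows; the helper name `is_assigned` is made up for illustration, and the result for any code point depends on the database version bundled with the interpreter, which is exactly the versioning concern raised above.

```python
import unicodedata

VS1 = "\uFE00"  # VARIATION SELECTOR-1


def is_assigned(ch):
    """True unless the bundled character database reports category Cn.

    Cn covers unassigned code points and noncharacters. An older database
    would report Cn for characters added after its release.
    """
    return unicodedata.category(ch) != "Cn"


print(unicodedata.name(VS1), unicodedata.category(VS1))
print("assigned:", is_assigned(VS1))
```

This says nothing about how a collation implementation should weight the character; it only shows that the answer to "is this assigned?" is a property of the data version an application was built against.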
Date: Sun, 28 Mar 2004 15:26:12 -0800
To: Philippe Verdy [EMAIL PROTECTED]
From: Asmus Freytag [EMAIL PROTECTED]
Subject: Re: [OT] proscribed words... (was:What is the principle?)
At 02:46 PM 3/28/2004, Philippe Verdy wrote:
From: Asmus Freytag [EMAIL PROTECTED]
Does this list really exist
At 12:19 PM 3/29/2004, Ernest Cline wrote:
[Original Message]
From: Peter Kirk [EMAIL PROTECTED]
On 29/03/2004 06:56, John Cowan wrote:
Peter Kirk scripsit:
Using NBSP rather than SPACE has several advantages, and has long
been specified in Unicode, although not widely implemented. It
At 04:28 PM 3/29/2004, Kenneth Whistler wrote:
I will say again as I have said before - but the above (and what I
snipped) is extra evidence for it - that what is broke ... is
the rule that the isolated (generally spacing) form of a combining mark
should be formed by SPACE or NBSP followed by
At 09:37 AM 4/1/2004, you wrote:
[EMAIL PROTECTED] wrote:
The cedi sign should be of the size of the dollar sign ($) or the euro sign
(EUR). The site you provided is using the cent sign. The Ghana web site uses a
better version of the cent sign for the cedi. See
At 12:34 PM 4/2/2004, Kenneth Whistler wrote:
But by all means, make the proposal to the UTC if fixing this
inconsistency seems important and there is some argument to
be made for it.
I might add that 'merely' fixing an apparent inconsistency
cannot be enough of a rationale for making this change.
At 11:44 AM 4/2/2004, Kenneth Whistler wrote:
Rick said:
We also learn from the bird stamps web site cited later that the
government of Ghana is extremely inconsistent about their images and usage
of their own currency sign. I.e., they apparently don't have a standard for
it.
So, I don't
Somebody wrote:
non-breaking and non-stretching are presentational properties, not
semantic ones. They don't change the meaning of the space: it's still
just a space, not a hyphen or the letter g. They don't affect
non-visual media; we don't break lines in spoken speech. Louis XVI
is
At 01:29 PM 4/7/2004, Richard Cook wrote:
On Wed, 7 Apr 2004, Peter Constable wrote:
They were encoded that way some while before they were accepted in
Unicode. Also, until Unicode 4.1 is published, there is a possibility
that codepoints may change.
I see. I assumed the codepoint assignments
At 09:11 PM 4/7/2004, Tobias Stamm wrote:
Greetings to all standardisers!
I'm new here so forgive me my stupidity.
I just have one little question to which I didn't find the answer in the
whole homepage:
What is the standard of the characters names?
You are looking for the character naming
At 10:49 PM 4/7/2004, Peter Constable wrote:
, and the length it reports
is the number of code units, not the number of characters or graphemes in
the string.
True; that is documented.
However, that's very common; many APIs relating to UTF-8 would report
the number of bytes, not the number of
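The difference between code units and characters is easy to see on a short string. This sketch does not model the API under discussion, which the excerpt does not name; it only counts the same three-character string in three ways.

```python
# 'a', 'é' (U+00E9), and MATHEMATICAL SCRIPT CAPITAL Q (U+1D4AC)
text = "a\u00e9\U0001D4AC"

code_points = len(text)                           # Python counts code points
utf8_units = len(text.encode("utf-8"))            # bytes: 1 + 2 + 4
utf16_units = len(text.encode("utf-16-le")) // 2  # 16-bit units: 1 + 1 + 2

print(code_points, utf8_units, utf16_units)
```

An API that reports length in UTF-8 bytes or UTF-16 code units would return 7 or 4 for this string, not 3. None of these figures is a grapheme count; that needs separate segmentation logic.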
James,
this is the kind of thing that you should report via
our error reporting form. Here on the open list, it's
liable to get lost (no-one owns excerpting issues from
this forum).
The contact form can be found on our home page under
contact us.
A./
At 12:03 PM 4/8/2004, [EMAIL PROTECTED]
At 03:31 PM 4/15/2004, Peter Kirk wrote:
[PA] Isn't this the one that should be used in dictionaries ?
See http://www.unicode.org/unicode/standard/reports/tr14/tr14-6.html
Why are you guys citing the 1999 (!) version of this TR?
It's 2004, Unicode 4.0.1 has been published and we are up to
At 12:26 AM 4/16/2004, Alexandros Diamantidis wrote:
* Philippe Verdy [EMAIL PROTECTED] [2004-04-16 01:22]:
U+0387 GREEK ANO TELEIA
wrong form? it's a small square, and is the greek semicolon, and is then
separating words.
U+0387 is canonically equivalent to U+00B7. About its shape, whether
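The canonical equivalence mentioned here can be observed directly: U+0387 has a singleton canonical decomposition, so any normalization form replaces it with U+00B7. A short sketch with Python's `unicodedata`:

```python
import unicodedata

ANO_TELEIA = "\u0387"  # GREEK ANO TELEIA
MIDDLE_DOT = "\u00B7"  # MIDDLE DOT

# The decomposition field lists the code point(s) the character maps to.
mapping = unicodedata.decomposition(ANO_TELEIA)
normalized = unicodedata.normalize("NFC", ANO_TELEIA)

print(mapping)
print(normalized == MIDDLE_DOT)
```

One consequence is that text passed through NFC or NFD no longer contains U+0387 at all, so any glyph distinction between the two would not survive normalization.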
At 06:16 PM 4/15/2004, Philippe Verdy wrote:
The other reason is that the middle-dot, being a punctuation, would be likely to
have extra spacing on both sides, which would make it inappropriate for
rendering Catalan words. Also such punctuation would probably forbid kerning of
the middle-dot
At 01:54 PM 4/17/2004, Michael Everson wrote:
The samples Asmus sent suggest to me that a school of typographers made a
set of bad decisions, even if they were really famous and got paid lots of
money and their fonts are widely shipped!
In all charity, Michael, your opinion seems to be mainly
At 08:42 AM 4/19/2004, Theo Veenker wrote:
Hi,
Until now I always downloaded the latest version of the UCD
and worked with that. Now I want to download the UCD files for
4.0.0 again. I know it is all in
http://www.unicode.org/Public/4.0-Update/, but in
http://www.unicode.org/ucd/ I read this:
At 03:49 PM 4/19/2004, Kenneth Whistler wrote:
The Unicode Standard is not prescriptive about rendering, beyond the
basics required to simply ensure correct mapping of textual content
into streams of characters. If one font vendor wants to have a raised
glyph for the MIDDLE DOT and another wants
At 10:44 AM 4/22/2004, Frank Yung-Fong Tang wrote:
I saw the announcement of publishing
ISO/IEC 10646: 2003, Information technology --
Universal Multiple-Octet Coded Character Set (UCS)
From
http://anubis.dkuug.dk/jtc1/sc2/open/02n3729.htm
I expect there are no differences from Unicode 4.0,
At 11:13 AM 4/23/2004, Philippe Verdy wrote:
On Fri, 23 Apr 2004 12:12:57 -0400, Edward H. Trager [EMAIL PROTECTED]
said:
2 -- doing everything from regular windows gui tools, which have been
Unicode-friendly since forever.
Maybe on Windows based on newer NT kernels only (NT4, 2000, XP, 2003,
At 08:59 AM 4/23/2004, [EMAIL PROTECTED] wrote:
I'm surfacing an issue from [EMAIL PROTECTED] because it may have
wider applicability.
Currently, it's the rule that variation selector characters can't be
applied to combining characters. This is sensible in the case of true
diacritical marks: if
At 06:30 AM 4/24/2004, Peter Constable replied to Peter Kirk:
problems do arise if there is more than one combining character
between the base character and the VS and they are not in canonical
order. But this is a marginal case which can be avoided by ensuring that
canonical order is always
At 04:00 PM 4/24/2004, Peter Kirk wrote:
There are tons of problems once one adds in other combining marks
being applied to the character as well, because then under normalization,
unless the mark you were applying the variation selector to is of
combining class 0, you can't assure that the
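The normalization problem described here can be made concrete. The sequence below is hypothetical, since variation selectors are not defined for combining marks; it only shows what canonical reordering does when a selector (combining class 0) follows a mark with a non-zero combining class.

```python
import unicodedata

ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, combining class 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, combining class 220
VS1 = "\uFE00"        # VARIATION SELECTOR-1, combining class 0

# Two marks out of canonical order: NFD swaps them (220 sorts before 230).
reordered_plain = unicodedata.normalize("NFD", "q" + ACUTE + DOT_BELOW)

# Hypothetical: a selector meant to modify the dot below.
# The two marks still swap, but the selector stays where it was.
reordered_vs = unicodedata.normalize("NFD", "q" + ACUTE + DOT_BELOW + VS1)

print([f"U+{ord(c):04X}" for c in reordered_vs])
```

After normalization the selector follows the acute accent rather than the dot below it was typed after, which is the failure the quoted passage warns about.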
At 05:04 PM 4/24/2004, Mark Davis quoted a message by Frank:
I know this is a little bit off-topic for Unicode, just like the one
about locale. Maybe I should move this to the W3C i18n mailing list
Now that the common locale data repository is hosted by The Unicode
Consortium, it may no longer be as
At 05:33 PM 4/24/2004, Ernest Cline wrote:
There are problems. Suppose, we define a new variation selector that
will stay with the preceding mark under normalization.
Now consider what happens when implementations conforming to
a standard of Unicode that does not know about the new character
At 07:59 PM 4/24/2004, Michael Everson wrote:
At 19:37 -0700 2004-04-24, Doug Ewell wrote:
Michael Everson everson at evertype dot com wrote:
I would appreciate it if there were a [EMAIL PROTECTED] list for
these discussions.
There is, [EMAIL PROTECTED], and I apologize for burdening this list
At 07:09 PM 5/12/2004, Chris Jacobs wrote:
The Unicode Standard is a plain text standard, *not* a
text layout standard. See Section 2.9 of The Unicode
Standard, Version 4.0 for what the standard has to say
on this.
The extent of directional layout required of a *plain text*
standard is the
At 11:21 AM 5/13/2004, Francois Yergeau wrote:
Peter Constable a écrit :
A language is an attribute of content, and a language ID is used for
declaration of that attribute.
A locale is an operational mode of software processes, and a locale
ID is used in APIs to set or determine that mode.
At 01:42 PM 5/8/2004, [EMAIL PROTECTED] wrote:
The above merely to illustrate that experts in any persuasion
seldom agree on everything; if they did -- they couldn't be
contentious.
Becker's law:
For every expert there's an equal and opposite expert.
A./
At 04:36 AM 5/7/2004, Patrick Andries wrote:
Doug Ewell a écrit :
It's clear to me that the reason my colleague and I can read this font
is not that we have any special knowledge of both scripts, but because
it's a stylistic variant of Latin.
And thus he cannot read a Vietnamese text in
At 06:17 AM 5/3/2004, [EMAIL PROTECTED] wrote:
Unicode considers such combinations of letters to be presentation forms
of letters which are already covered in the Unicode Standard. Although
for the Yoruba language, the gb digraph is treated as a single letter,
for computer encoding it is a string
While I continue to be convinced that the 22 character repertoire of shapes
contained in the proposal is indeed well-known, as asserted by the
submitter, I am far less certain now that it would constitute progress to
encode these as characters.
I would want to see a lot more in terms of
At 10:25 AM 5/2/2004, Michael Everson wrote:
Do you really think it necessary that the proposal be a thesis reprising a
hundred years of script analysis?
I think what's desirable is something of a summary that applies this
analysis in a way that it can be related to the research. A thesis would
At 09:20 AM 5/2/2004, Michael Everson wrote:
At 03:28 -0800 2004-05-02, D. Starner wrote:
My site certainly does not consider Gaelic to be a separate script
from Latin.
Did you remove Latg and Latf from the scripts standard? Which is exactly
on-point to my message--it is useful to distinguish
At 11:59 PM 5/4/2004, R.C. Bakhuizen van den Brink [Rein] wrote:
How well does low-budget Eudora support Unicode?
Mixed.
It allows messages to be sent for viewing in a browser window; that enables
both Unicode support and additional HTML support not found in the Eudora
viewer. I use that feature
At 01:11 PM 5/17/2004, [EMAIL PROTECTED] wrote:
Michael Everson scripsit:
Or shouldn't simply Unicode deprecate script IDs in favor of ISO-15924
codes?
This doesn't make any sense.
I believe the suggestion is to drop the long-form Unicode script codes
currently used for the Script property in
At 03:34 PM 5/21/2004, Dean Snyder wrote:
Doug Ewell wrote at 3:07 PM on Friday, May 21, 2004:
Dean Snyder dean dot snyder at jhu dot edu wrote:
... And since Japanese and Fraktur are not separately encoded just
because there would be lots of people who would use such an encoding,
why would
At 12:19 PM 5/25/2004, Dean Snyder wrote:
Archaic Greek exhibits variable glyph stance, that is, glyphs can be
flipped horizontally or even vertically, usually dependent upon the
direction of the writing stream.
How should variable glyph stance for the same characters in the same
script be dealt
At 01:18 PM 5/25/2004, Mark Davis wrote:
The events in question happened in the Very Archaic
Unicode Era (1987-88), before 'document repositories'
etc, were invented.
In other words, before Unicode had moved from a concept
to an organized standardization activity.
A./
At 05:10 PM 5/25/2004, Mark Davis wrote:
I don't think the fold to base is as useful as some other information. For
those characters with a canonical decomposition, the decomposition carries
more information, since you can combine it with a remove combining marks
folding to get the folding
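The combination described here, canonical decomposition followed by a "remove combining marks" folding, can be sketched as below. The function name is made up for illustration and this is not the folding data from any technical report, only the general technique.

```python
import unicodedata


def remove_combining_marks(text):
    """Decompose canonically, then drop every mark (categories Mn, Mc, Me)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )


print(remove_combining_marks("Café"))
print(remove_combining_marks("żółć"))
```

Characters that have no canonical decomposition pass through unchanged. In the second example the Polish ł survives while ż, ó and ć lose their marks, so a decomposition-based folding on its own covers only part of what users think of as diacritics.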
At 11:06 PM 5/25/2004, Doug Ewell wrote:
But then Dean responded:
So, you are saying there are glyph streams in German Fraktur that
fluent, native Germans would have trouble reading.
I consider myself moderately native, definitely of German origin, and
arguably somewhat fluent in reading
At 05:14 AM 6/3/2004, Peter Kirk wrote:
My first thought is that a variant script selector might be defined, which
applies until cancelled or overridden, on the analogy of RLO...PDF. But I
guess others will object to this. Does anyone have any other suggestions
for how Unicode can support
At 02:21 PM 6/4/2004, Peter Kirk wrote:
There is no consensus that this Phoenician proposal is necessary.
I am revisiting this one because I realise now that Ken has been somewhat
economical with the truth here. There ARE cases in which entire alphabets
have been given compatibility
Peter Kirk noted:
The link from http://evertype.com/formal.html remains broken.
This information is irrelevant on the list, and as Everson has
unsubscribed, won't reach him. If you care about helping him improve his
site, you might provide such information privately.
A./
At 10:17 AM 6/5/2004, Peter Kirk wrote:
Unicode has defined a mechanism for dealing with the situation, variation
selectors. If this mechanism is not appropriate in this particular case,
let the UTC come up with another mechanism to meet the user requirement.
To define a new set of abstract
At 03:44 AM 6/7/2004, Peter Kirk wrote:
On 06/06/2004 14:38, Patrick Durusau wrote:
The reason I pointed out that Semitic scholars had reached their view
long prior to Unicode was to point out that they were not following the
character/glyph model of the Unicode standard.
I don't claim that
At 09:11 PM 6/9/2004, Ernest Cline wrote:
[Original Message]
From: Michael Everson [EMAIL PROTECTED]
Practice your tongue-twisting.
Proposal to add Bantu phonetic click characters to the UCS
http://www.evertype.com/standards/iso10646/pdf/n2790-clicks.pdf
Why wouldn't U+1D4AC MATHEMATICAL
Any notation for a highly specialized subject would always tend to suffer
from a very small number of participants. This is not a-priori a reason to
force this notation into private use. One of our goals in this direction
would be to enable publishers to support online editions of a large
It was understood that the mathematical symbols were not to be used in
language text.
What was understood is that if you need a run of text in a script font you
wouldn't use these characters, but would use markup. But if you needed an
isolated, out of context shape, where the font style has
At 07:46 AM 6/10/2004, John Cowan wrote:
To represent the text as originally written, I need a digital representation
for each of the characters in it. Since all I want to do is reprint
the book -- I don't need to use the unusual characters in interchange --
the PUA and a commissioned font seem
Jun 2004 22:42:30 -0700, Asmus Freytag
[EMAIL PROTECTED] wrote (in reply to a different message):
The mathematical script capital Q has no formal case mapping to
mathematical script small q as case transformation would normally
change the meaning of a mathematical text.
Ah, here's the difference
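The absence of a case mapping for the mathematical script letters can be checked in any runtime that implements the Unicode case mappings. A brief sketch in Python:

```python
import unicodedata

SCRIPT_Q = "\U0001D4AC"  # MATHEMATICAL SCRIPT CAPITAL Q

name = unicodedata.name(SCRIPT_Q)
lowered = SCRIPT_Q.lower()                            # no mapping, so unchanged
plain = unicodedata.normalize("NFKC", SCRIPT_Q)       # compatibility fold to 'Q'

print(name)
print(lowered == SCRIPT_Q, plain)
```

Lowercasing leaves the character untouched, whereas the compatibility decomposition maps it to an ordinary Latin capital Q, which does have a lowercase. Normalizing with NFKC before case conversion would therefore change what a formula says.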
At 12:08 PM 6/10/2004, Peter Constable wrote:
From: Asmus Freytag [mailto:[EMAIL PROTECTED]
Any notation for a highly specialized subject would always tend to suffer
from a very small number of participants. This is not a-priori a reason to
force this notation into private use.
Just to clarify
At 07:00 AM 6/10/2004, John Cowan wrote:
(LATIN LETTER OWL, indeed.)
This is an interesting symbol as a fairly similar symbol is used in Japan
to annotate phone numbers - if I correctly understand those that have a
taped message or automated response system.
We don't have a symbol for the
At 03:47 AM 6/10/2004, Michael Everson wrote:
At 00:11 -0400 2004-06-10, Ernest Cline wrote:
[Original Message]
From: Michael Everson [EMAIL PROTECTED]
Practice your tongue-twisting.
Proposal to add Bantu phonetic click characters to the UCS
At 01:04 PM 6/10/2004, Peter Constable wrote:
That doesn't mean that we stop asking all the hard questions, but that we
allow a presumption of usefulness for characters that were in demonstrated
use over some time and by several authors.
But it is precisely that status that is called into
At 12:08 PM 6/10/2004, Michael Everson wrote:
At 11:53 -0700 2004-06-10, Asmus Freytag wrote:
It was understood that the mathematical symbols were not to be used in
language text.
What was understood is that if you need a run of text in a script font
you wouldn't use these characters, but would
At 07:41 PM 6/10/2004, Kenneth Whistler wrote:
Yes, it's a scare claim. It is trying to bludgeon the committee
I think the verb in question is inappropriate for the occasion and
for this e-mail exchange. Especially when used in the context of
imputing intention of your opponent which is always a
At 01:11 PM 6/24/2004, Rick McGowan wrote:
Or do I need to write a short proposal asking to
change line 08?
There is no need to request any change to it.
I want to suggest that Babylonian vowels should also
be considered for BMP insertion.
Then you should write a proposal for them.
To amplify
At 02:01 PM 6/30/2004, Patrick Andries wrote:
If a few citations of author-specific characters are sufficient
for encoding I have a few more characters to propose
Note : I don't know which I really prefer (encode this kind of rare
characters or not).
I prefer to see proposals for
At 01:21 AM 7/1/2004, Peter Kirk wrote:
On 30/06/2004 16:35, Kenneth Whistler wrote:
the versions in the main Greek and Coptic block (or has it been
officially renamed just Greek?)
No, the block name won't be changed, in part because changing
block names is another destabilization in the
At 03:31 PM 7/1/2004, busmanus wrote:
Can you give a link to these normalization rules?
Just check the unicode home page.
A./
At 10:19 PM 2/19/2004, [EMAIL PROTECTED] wrote:
The Arabic data in question is for place names in a
mapping product. So far, we have only received one
complaint and it was a missed two element ligature from
Arabic Presentation Forms B. Does this mean that
the ligatures in Arabic Presentation
Sorry, stale mail alert.
A./
At 08:33 PM 7/9/2004, John Cowan wrote:
I have just reviewed this list and found it odd that Hebrew presentation
forms are included but Arabic ones are not.
The specification actually called only for Latin, Greek, and Cyrillic;
I added Hebrew pour la lagniappe. If someone wants to add Arabic, I
At 01:02 AM 7/10/2004, Marcin 'Qrczak' Kowalczyk wrote:
But there are cases when I would prefer to fold Polish diacritics in
searches.
It's basically every case when you are not sure that all stored data is
using diacritics,
Or when you are unsure how it is spelled, for example, looking up a
or transliteration standards
latin-arabic
W liście z pią, 09-07-2004, godz. 19:34 -0700, Asmus Freytag napisał:
o-slash, can be analyzed as o and slash, even though that's not done
canonically in Unicode. Allowing users outside Scandinavia to perform
fuzzy searches for words with this character
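The point about o-slash can be illustrated as follows: the character has no canonical decomposition, so a fuzzy search that wants to match it against plain o needs an explicit mapping. The table `EXTRA_FOLDS` and the function `fold_for_search` below are hypothetical names with illustrative contents, not data from any Unicode report.

```python
import unicodedata

O_SLASH = "\u00F8"  # LATIN SMALL LETTER O WITH STROKE

# An empty string means the character database lists no decomposition.
print(repr(unicodedata.decomposition(O_SLASH)))

# Hypothetical supplementary folding table for search only.
EXTRA_FOLDS = str.maketrans({"ø": "o", "Ø": "O"})


def fold_for_search(text):
    """Apply the explicit table; characters not in it are left untouched."""
    return text.translate(EXTRA_FOLDS)


print(fold_for_search("Øresund"))
```

Because the mapping is a design choice rather than a canonical equivalence, it is kept in a separate table that an application can enable only for search and never for stored text.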
At 11:02 AM 7/13/2004, Peter Kirk wrote:
I was surprised to see that WG2 has accepted a proposal made by the US
National Body to use CGJ to distinguish between Umlaut and Tréma in German
bibliographic data.
You raise some interesting questions. However, note that the purpose of CGJ
is intended
Thank you for reviewing this.
DiacriticFolding (unlike AccentFolding) is selective about which combining
marks it removes for which base character. I wonder whether that's truly
intended, or whether it could be replaced by a combination of
AccentFolding
OtherDiacriticFolding
where AccentFolding
At 11:15 PM 7/17/2004, John Cowan wrote:
I agree that in the TR#30 context, the Right Thing is to remove the
character pair mappings altogether, and all of the single-character
mappings that have canonical decompositions
In other words, in your opinion, the reasonable thing to do would be for