On 05/11/2003 19:59, Jony Rosenne wrote:
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Philippe Verdy
Sent: Thursday, November 06, 2003 3:46 AM
Is there an initiative in Israel related to the supported
glyphs and rendering features required to
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On
Behalf Of Peter Kirk
Sent: Thursday, November 06, 2003 3:34 AM
To: Jony Rosenne
Cc: 'Philippe Verdy'; [EMAIL PROTECTED]
Subject: Re: Merging combining classes, was: New contribution N2676
On 05/11/2003 19:59, Jony Rosenne wrote:
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On
Behalf Of Peter Kirk
But I am not sure that this get-out clause should
be applicable to a process which claims as its very essence to
support
correct positioning of nonspacing marks but actually supports only a
On 05/11/2003 15:13, Peter Constable wrote:
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On
Behalf Of Peter Kirk
But I am not sure that this get-out clause should
be applicable to a process which claims as its very essence to
support
correct
From: Peter Kirk [EMAIL PROTECTED]
It seems to me that the Unicode conformance clauses are so weak as to be
almost useless. An application can claim to conform to Unicode but
hardly do anything. A font can be sold, for example, as a Unicode Hebrew
font while successfully rendering only a very
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Philippe Verdy
Sent: Thursday, November 06, 2003 3:46 AM
Is there an initiative in Israel related to the supported
glyphs and rendering features required to support Hebrew,
as exists in
On 29/10/2003 15:07, John Cowan wrote:
Not necessarily. A process may check its input for normalization and
reject it if it is not normalized, and XML consumers are encouraged
(not required) to do so.
This looks to me like a clear breach of C9, at least of the derived
principle
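The check-and-reject behaviour John Cowan describes can be sketched with Python's standard `unicodedata` module (`is_normalized` needs Python 3.8+). The function name and the rejection policy are illustrative assumptions, not any standard API:

```python
import unicodedata

def accept_nfc_input(text: str) -> str:
    """Reject, rather than silently repair, input that is not in NFC,
    as XML consumers are encouraged (not required) to do."""
    if not unicodedata.is_normalized("NFC", text):
        raise ValueError("input is not NFC-normalized")
    return text

# Precomposed NFC text passes through unchanged...
accept_nfc_input("caf\u00e9")
# ...while the decomposed spelling of the same word is rejected.
try:
    accept_nfc_input("cafe\u0301")
except ValueError:
    pass
```

Whether such rejection breaches C9 is exactly the point under dispute in this thread; the code only shows the mechanism.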
- Original Message -
From: Jim Allan [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, October 30, 2003 4:48 PM
Subject: Re: Merging combining classes, was: New contribution N2676
I offered a suggestion on cedilla and combining undercomma:
/ It seems to me that Cedilla
Philippe Verdy posted:
I do think the opposite: one can fold all commas below to cedillas by
default,
and, in a Romanian or Latvian context, fold all cedillas below to commas
below.
I see no difference.
Folding either way will find all occurrences of cedilla or comma below.
The direction of
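The folding Jim Allan compares can be sketched in Python; the direction shown (comma below to cedilla) and the function name are illustrative choices, not part of any Unicode deliverable:

```python
import unicodedata

def fold_comma_to_cedilla(text: str) -> str:
    """Fold comma below to cedilla (the default direction suggested
    above); a Romanian/Latvian tailoring would map the other way."""
    # Decompose so precomposed letters expose their combining mark...
    nfd = unicodedata.normalize("NFD", text)
    # ...replace COMBINING COMMA BELOW (U+0326) with COMBINING CEDILLA
    # (U+0327), then recompose.
    return unicodedata.normalize("NFC", nfd.replace("\u0326", "\u0327"))

# U+0219 s-with-comma-below folds to U+015F s-with-cedilla,
# so a search for either spelling finds both once folded.
```

Folding either way through NFD like this catches precomposed and combining spellings alike, which is the point made above about both directions finding all occurrences.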
On 28/10/2003 20:01, Jim Allan wrote:
...
From _The Unicode Standard 4.0_, 3.11 at
http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf:
If combining characters have different combining classes--for
example, when one nonspacing mark is above a base character form and
another is below
A similar situation can be seen in the Latvian letter U+0123 LATIN
SMALL LETTER G WITH CEDILLA. In good Latvian typography, this
character
is always shown with a rotated comma over the g, rather than
a cedilla
below the g, because of the typographical design and layout issues
Peter Kirk wrote:
Rather, it defines that they do not. But since this is not true on any
reasonable intuitive definition of "interact typographically" (as we
have seen with Hebrew vowel points), this statement makes sense only as
a counterintuitive definition of "interact typographically".
Exactly.
Kent Karlsson posted:
COMBINING COMMA BELOW is not attached, even though cedilla is.
A turned comma above is not _attached_ above...
Correct. COMBINING COMMA BELOW belongs to combining class 220.
However, by Unicode specifications, both it and an attached lower cedilla
on _g_ may be rendered by an unattached turned comma above, which interacts
with characters not in their respective combining classes. And this
new
turned comma above of necessity would always be applied before normal
upper
From: Jim Allan [EMAIL PROTECTED]
Kent Karlsson posted:
COMBINING COMMA BELOW is not attached, even though cedilla is.
A turned comma above is not _attached_ above...
Correct. COMBINING COMMA BELOW belongs to combining class 220.
However by Unicode specifications both it and an attached
Jim Allan scripsit:
For example, it is crucial that the combining class of the cedilla be
lower than the combining class of the dot below, although their exact
values of 202 and 220 are not important for implementation.
This is not explained, but obviously the reason why it is crucial
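The claim about relative class values can be checked directly with Python's `unicodedata`: what matters is that 202 sorts before 220, not the values themselves.

```python
import unicodedata

# The classes themselves: cedilla attaches below (202),
# dot below stacks below (220).
assert unicodedata.combining("\u0327") == 202  # COMBINING CEDILLA
assert unicodedata.combining("\u0323") == 220  # COMBINING DOT BELOW

# Canonical ordering stable-sorts marks by combining class, so however
# the user typed them, the cedilla precedes the dot below after
# normalization:
assert (unicodedata.normalize("NFD", "g\u0323\u0327")
        == unicodedata.normalize("NFD", "g\u0327\u0323")
        == "g\u0327\u0323")
```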
- Original Message -
From: John Hudson [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: 'Jim Allan' [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Wednesday, October 29, 2003 6:15 PM
Subject: RE: Merging combining classes, was: New contribution N2676
At 04:04 AM 10/29/2003, Kent Karlsson wrote
At 12:33 PM 10/29/2003, Philippe Verdy wrote:
Even today, it is quite hard to find any Romanian or Latvian web page using
the new Unicode characters with a comma-below: even governmental sites use
the characters coded with the cedilla, and they accept that this comma
below is rendered
On 29/10/2003 11:53, John Cowan wrote:
... A
rendering engine is *not* entitled to misbehave if it receives a, dot-below,
cedilla and try to place the dot between the a glyph and the cedilla;
this is a direct consequence of the conformance requirement that processes
not distinguish (unless they
Language Analysis Systems, Inc. Unicode list reader scripsit:
It suggests that for many fonts,
U+0067 LATIN SMALL LETTER G + U+0327 COMBINING CEDILLA
and
U+0067 LATIN SMALL LETTER G + U+0312 COMBINING TURNED COMMA ABOVE
would have exactly the same rendering. Some applications would
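Identical rendering notwithstanding, the two sequences are not canonically equivalent: U+0123's canonical decomposition uses the cedilla, not the turned comma above, even though good Latvian typography draws the comma shape. A quick check in Python:

```python
import unicodedata

# U+0123 decomposes to g + COMBINING CEDILLA (U+0327)...
assert unicodedata.normalize("NFD", "\u0123") == "g\u0327"
# ...so the turned-comma-above spelling is a distinct, inequivalent
# sequence, even if a font draws both identically:
assert (unicodedata.normalize("NFD", "g\u0312")
        != unicodedata.normalize("NFD", "\u0123"))
# And NFC folds the cedilla spelling back to the precomposed letter:
assert unicodedata.normalize("NFC", "g\u0327") == "\u0123"
```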
Peter Kirk scripsit:
Is this actually a conformance requirement? I thought I understood the
following: A rendering engine which fails to render canonical
equivalents identically, or fails to render certain orders sensibly, is
not doing what the Unicode standard tells it that it must do.
On 29/10/2003 14:14, John Cowan wrote:
Peter Kirk scripsit:
Is this actually a conformance requirement? I thought I understood the
following: A rendering engine which fails to render canonical
equivalents identically, or fails to render certain orders sensibly, is
not doing what the
From: John Hudson [EMAIL PROTECTED]
All of these fonts already include the newer Romanian S/s and T/t
commaaccent characters and correct accent forms for the Latvian diacritics
(although the Arial comma accent is a bit too much like an unattached
cedilla).
I meant for Windows 9x/ME users, as a
Peter Kirk scripsit:
[A process] must
interpret a non-normalised variant in the same way as the normalised
form; and it cannot assume that the process presenting the data makes a
distinction between the normalised and non-normalised form and does not
reorder the data into an arbitrary
From: Jim Allan [EMAIL PROTECTED]
It seems to me that Cedilla/undercomma folding would be a useful
addition to Character Foldings at http://www.unicode.org/reports/tr30.
Excellent idea; however, it has to be tailored by language:
For example, Turkish and French (which almost always and
On 29/10/2003 15:07, John Cowan wrote:
Not necessarily. A process may check its input for normalization and
reject it if it is not normalized, and XML consumers are encouraged
(not required) to do so.
This looks to me like a clear breach of C9, at least of the derived
principle
no process
On 27/10/2003 16:39, Philippe Verdy wrote:
...
The backwards marking is not restricted to French accents in collation
level 2. You can use reverse ordering at any tailored level to fit other
needs, and you can also insert an extra collation level.
So I think that Mark is right here as it gives
On 27/10/2003 18:06, Philippe Verdy wrote:
From: Peter Kirk [EMAIL PROTECTED]
Thanks for the clarification. In principle we might be able to go a
little further: we could define both <c, CCO> and <CCO, c> as
canonically equivalent to <c> for all c in combining class zero. This
would have to be
Philippe Verdy wrote:
But we cannot define it within the UCD, only algorithmically, like for
Hangul syllables/jamos...
Note that the *arithmetic* specification of the Hangul Syllable
canonical decompositions is just a short way of specifying the
decompositions. They CAN be listed, in a way
On 28/10/2003 04:49, Kent Karlsson wrote:
Philippe Verdy wrote:
There's a counterexample with the position of the circumflex on the
lowercase t (I can't remember for which language it occurs,
sorry), which is
in some cases not the one that its combining class would
normally take.
There
Peter Kirk wrote:
Also, in the commonly used Hebrew *transliteration*, the same function
(fricative pronunciation) is indicated by a macron above g and p but
below b, d, k and t, for the same reason. It occurs only with these
letters (sometimes also written below h). There might be an argument
jim scripsit:
Unicode encodes U+1E20 and U+1E21 as combinations of upper- and lowercase
_g_ with macron. The forms have canonical decomposition to _g_ or _G_
followed by U+0304. This seems to rule out being able to consider a bar
above and a bar below as variants of the same character
On 28/10/2003 13:35, John Cowan wrote:
...
But Unicode specifications currently say nothing about the possibility
of moving under-diacritics to an over-character position for
typographical reasons except for combination of _g_ and cedilla.
Nothing needs to be said, because glyphs are
I commented on what I saw as a problem in changing the positions of
diacritics in rendering from that shown in the charts from above to
below or from below to above.
John Cowan responded:
True. But that doesn't mean that the glyph that a particular font uses
for
the sequence g, COMBINING
On 26/10/2003 12:51, Jony Rosenne wrote:
While the current combining classes may cause some difficulties for Biblical
scholars (and this isn't cut and dry yet - it isn't certain whether these
are Unicode problem, implementation problems, missing characters or
mis-identified characters), I have
On 26/10/2003 19:58, John Hudson wrote:
...
Functionally, inserting a CGJ here resolves the problem fine. I'm just
not convinced that CGJ is a good general solution to the normalisation
problem: it works, but it requires deliberate insertion in every place
where unwanted mark re-ordering may
I am on a business trip abroad with only limited e-mail access. I will
try to respond next week when I'm back home.
Jony
And we can
make font renderers accept this new encoding, by letting them recognize the
CCO.
- Original Message -
From: Peter Kirk [EMAIL PROTECTED]
To: John Hudson [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Monday, October 27, 2003 1:48 PM
Subject: Re: Merging combining classes
From: Peter Kirk [EMAIL PROTECTED]
So the logical order is
shin, sin/shin dot, dagesh, vowel, meteg.
But the canonical order is
shin, vowel, dagesh, meteg, sin/shin dot;
up to three (and in theory
more, at least in biblical Hebrew) other characters may appear between
the base letter and
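Peter Kirk's example can be reproduced with Python's `unicodedata`: the fixed-position Hebrew combining classes (patah = 17, dagesh = 21, meteg = 22, shin dot = 24) force canonical ordering to rearrange the logical sequence. Patah stands in here for "vowel" as an illustrative choice:

```python
import unicodedata

SHIN, SHIN_DOT, DAGESH, PATAH, METEG = (
    "\u05e9", "\u05c1", "\u05bc", "\u05b7", "\u05bd")

# Logical (phonetic) entry order: shin, shin dot, dagesh, vowel, meteg.
logical = SHIN + SHIN_DOT + DAGESH + PATAH + METEG

# Canonical ordering sorts the marks by combining class
# (17 < 21 < 22 < 24), pushing the shin dot to the end:
canonical = unicodedata.normalize("NFD", logical)
assert canonical == SHIN + PATAH + DAGESH + METEG + SHIN_DOT
```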
On 27/10/2003 06:54, Philippe Verdy wrote:
Thanks a lot for these precisions on Hebrew usages that need those
combining order overrides.
This demonstrates that this occurs relatively infrequently, and so
introducing an ignorable combining order override control makes sense,
without needing to add
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Mon, 2003 Oct 27 07:49
Subject: Re: Merging combining classes, was: New contribution N2676
On 27/10/2003 06:54, Philippe Verdy wrote:
Thanks a lot for these precisions on Hebrew usages that need those
combining order overrides.
This demonstrates
From: Peter Kirk [EMAIL PROTECTED]
I am not sure what you mean by further normalization steps for Hebrew.
Of course I don't mean that NF* algorithms must be changed. See below.
If this means that users will be expected to input Hebrew in this order,
perhaps with a keyboard driver which
From: Peter Kirk [EMAIL PROTECTED]
I don't see any difference between your proposed generic CCO and CGJ. As
you say, the same function may be needed in several scripts, including
perhaps IPA which uses complex diacritic stacking. So why not simply use
CGJ?
Why not effectively, but the
Philippe Verdy wrote:
This principle may help solve the ambiguities in all those affected
scripts
(maybe there are similar issues in the Latin script for Vietnamese,
which
would like to better fit the phonetics of words that may be
incorrectly
rendered by the currently required
From: Mark Davis [EMAIL PROTECTED]
the UTC decision:
[96-C20] Consensus: Add text to Unicode 4.0.1 which points out that
combining
grapheme joiner has the effect of preventing the canonical re-ordering of
combining marks during normalization. [L2/03-235, L2/03-236, L2/03-234]
[96-A72]
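The effect noted in [96-C20] is observable in any conformant normalizer: CGJ (U+034F) has combining class 0, so canonical reordering cannot move a mark across it. A small Python illustration (the g/dot/cedilla characters are arbitrary stand-ins):

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER, combining class 0

# Without CGJ, normalization reorders dot below (class 220) and
# cedilla (class 202):
assert unicodedata.normalize("NFD", "g\u0323\u0327") == "g\u0327\u0323"

# With CGJ between them, each mark sorts only within its own run of
# nonzero-class characters, so the typed order survives normalization:
assert (unicodedata.normalize("NFD", "g\u0323" + CGJ + "\u0327")
        == "g\u0323" + CGJ + "\u0327")
```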
From: Peter Constable [EMAIL PROTECTED]
There is no problem requiring a solution for combining marks used with
Latin script,* including IPA and Vietnamese, because all of the marks
that occupy a comparable space relative to the base have the same
combining class, meaning that normalization
On 27/10/2003 12:28, Mark Davis wrote:
Collation is very different, and already has mechanisms for dealing with
sequences. So no CGJ is needed there (except for case 2).
Mark
Mark, can you outline what these mechanisms are or point me to a
definition e.g. in a section of UTR #10? As I had
On 27/10/2003 10:31, Philippe Verdy wrote:
...
The bad thing is that there's no way to say that a superfluous
CGJ character can be safely removed if CC(char1) = CC(char2),
so that it will preserve the semantic of the encoded text even
though such filtered text would not be canonically
From: Peter Kirk [EMAIL PROTECTED]
On 27/10/2003 10:31, Philippe Verdy wrote:
...
The bad thing is that there's no way to say that a superfluous
CGJ character can be safely removed if CC(char1) = CC(char2),
so that it will preserve the semantic of the encoded text even
though such
So, all we can do is to define compatibility equivalence between:
c1, CCO, c2
and:
c1, c2
if and only if:
CC(c1) > CC(c2) > 0.
Oops! Of course, I really meant:
All we can do is to define compatibility equivalence (NFK*)
between:
c1, CCO, c2
and:
c1, c2
On 27/10/2003 16:16, Philippe Verdy wrote:
...
So, all we can do is to define compatibility equivalence between:
c1, CCO, c2
and:
c1, c2
if and only if:
CC(c1) > CC(c2) > 0.
This won't affect the NFC and NFD conversion algorithms, but it can affect
the NFKC and NFKD conversion algorithms.
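Canonical ordering swaps an adjacent pair exactly when CC(c1) > CC(c2) > 0, so a separator such as CGJ (or the proposed CCO) between c1 and c2 carries information only in that case. A hypothetical helper sketching this rule; the function name is invented for illustration:

```python
import unicodedata

def cgj_is_superfluous(c1: str, c2: str) -> bool:
    """True if a CGJ between c1 and c2 blocks nothing: the pair would
    not be reordered by canonical ordering anyway, so dropping the CGJ
    leaves the NFD result unchanged."""
    cc1, cc2 = unicodedata.combining(c1), unicodedata.combining(c2)
    return not (cc1 > cc2 > 0)

# Dot below (220) before cedilla (202) WOULD be reordered,
# so a CGJ there is meaningful:
assert not cgj_is_superfluous("\u0323", "\u0327")
# Cedilla before dot below is already in canonical order,
# so a CGJ there could be dropped without changing the NFD result:
assert cgj_is_superfluous("\u0327", "\u0323")
```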
From: Peter Kirk [EMAIL PROTECTED]
each possible individually as a contraction. The Logical_Order_Exception
property (see http://www.unicode.org/reports/tr10/ section 3.1.3) just
One bug report to note here:
UTS #10 contains references to several character properties,
pointing to
From: Peter Kirk [EMAIL PROTECTED]
On 27/10/2003 10:31, Philippe Verdy wrote:
...
The bad thing is that there's no way to say that a superfluous
CGJ character can be safely removed if CC(char1) = CC(char2),
so that it will preserve the semantic of the encoded text even
though such
From: Peter Kirk [EMAIL PROTECTED]
Thanks for the clarification. In principle we might be able to go a
little further: we could define both <c, CCO> and <CCO, c> as
canonically equivalent to <c> for all c in combining class zero. This
would have to be some kind of decomposition exception so that
Sent: Sunday, October 26, 2003 9:37 PM
To: Philippe Verdy
Cc: [EMAIL PROTECTED]
Subject: Re: Merging combining classes, was: New contribution N2676
On 25/10/2003 19:00, Philippe Verdy wrote:
From: Peter Kirk [EMAIL PROTECTED]
..
Of course, if the combining class values were
On Sunday, October 26, 2003 3:51 PM, Jony Rosenne wrote:
While the current combining classes may cause some difficulties for
Biblical scholars (and this isn't cut and dry yet - it isn't certain
whether these are Unicode problem, implementation problems,
missing characters or mis-identified
From: Peter Kirk [EMAIL PROTECTED]
I see the point, but I would think there was something seriously wrong
with a database setup which could change its ordering algorithm without
somehow declaring all existing indexes invalid.
Why would such a SQL engine do so, if what has changed is an
On 25/10/2003 19:00, Philippe Verdy wrote:
From: Peter Kirk [EMAIL PROTECTED]
I can see that there might be some problems in the changeover phase. But
these are basically the same problems as are present anyway, and at
least putting them into a changeover phase means that they go away
Jony Rosenne wrote:
While the current combining classes may cause some difficulties for Biblical
scholars (and this isn't cut and dry yet - it isn't certain whether these
are Unicode problem, implementation problems, missing characters or
mis-identified characters), I have yet to see a claimed
This is, in my opinion, a missing character.
Jony
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ted Hopp
Sent: Monday, October 27, 2003 12:53 AM
To: [EMAIL PROTECTED]
Subject: Re: Merging combining classes, was: New contribution N2676
Sent: Monday, October 27, 2003 2:07 AM
To: Jony Rosenne
Cc: [EMAIL PROTECTED]
Subject: Re: Merging combining classes, was: New contribution N2676
Jony Rosenne wrote:
While the current combining classes may cause some difficulties for
Biblical scholars (and this isn't cut and dry yet - it isn't
At 04:37 PM 10/26/2003, Jony Rosenne wrote:
There is nothing unusual about this. The only problem is that while the
Hiriq is between the Lamed and the Mem and belongs to the missing Yod, some
people insist that they see two vowels under the Lamed.
No, the problem is not the positioning of the
At 07:45 PM 10/26/2003, Mark E. Shoulson wrote:
I remembered there was a lot of discussion about this case, which is why I
brought it up. Can someone remind me why ZWNBSP would be Bad for
this? Wrong RTL coding? (possibly, but it's weak, isn't it) Wrongly
indicates a word-break? (this is
I remembered there was a lot of discussion about this case, which is why
I brought it up. Can someone remind me why ZWNBSP would be Bad for
this? Wrong RTL coding? (possibly, but it's weak, isn't it) Wrongly
indicates a word-break? (this is probably a problem.)
~mark
John Hudson wrote:
At
From: Peter Kirk [EMAIL PROTECTED]
I wonder if it would in fact be possible to merge certain adjacent
combining classes, as from a future numbered version N of the standard.
That would not affect the normalisation of existing text; text
normalised before version N would remain normalised in
On 25/10/2003 09:11, Philippe Verdy wrote:
From: Peter Kirk [EMAIL PROTECTED]
...
The problem would then be the interoperability of Unicode-compliant
systems using distinct versions of Unicode (for example between
XML processors, text editors, input methods, renderers, text
converters, full
Philippe Verdy wrote:
The problem with this solution is that stability is not guaranteed across
backward versions of Unicode: if a tool A implements the new version of
combining classes and normalizes its input, it will keep the relative
ordering of characters. If its output is injected into a
From: Peter Kirk [EMAIL PROTECTED]
I can see that there might be some problems in the changeover phase. But
these are basically the same problems as are present anyway, and at
least putting them into a changeover phase means that they go away
gradually instead of being standardised for ever,