Re: Merging combining classes, was: New contribution N2676

2003-11-06 Thread Peter Kirk
On 05/11/2003 19:59, Jony Rosenne wrote: -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Philippe Verdy Sent: Thursday, November 06, 2003 3:46 AM Is there an initiative in Israel related to the supported glyphs and rendering features required to

list etiquette (was RE: Merging combining classes, was: New contribution N2676)

2003-11-06 Thread Peter Constable
- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Peter Kirk Sent: Thursday, November 06, 2003 3:34 AM To: Jony Rosenne Cc: 'Philippe Verdy'; [EMAIL PROTECTED] Subject: Re: Merging combining classes, was: New contribution N2676 On 05/11/2003 19:59, Jony Rosenne wrote

RE: Merging combining classes, was: New contribution N2676

2003-11-05 Thread Peter Constable
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Peter Kirk But I am not sure that this get-out clause should be applicable to a process which claims as its very essence to support correct positioning of nonspacing marks but actually supports only a

Re: Merging combining classes, was: New contribution N2676

2003-11-05 Thread Peter Kirk
On 05/11/2003 15:13, Peter Constable wrote: -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Peter Kirk But I am not sure that this get-out clause should be applicable to a process which claims as its very essence to support correct

Re: Merging combining classes, was: New contribution N2676

2003-11-05 Thread Philippe Verdy
From: Peter Kirk [EMAIL PROTECTED] It seems to me that the Unicode conformance clauses are so weak as to be almost useless. An application can claim to conform to Unicode but hardly do anything. A font can be sold, for example, as a Unicode Hebrew font while successfully rendering only a very

RE: Merging combining classes, was: New contribution N2676

2003-11-05 Thread Jony Rosenne
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Philippe Verdy Sent: Thursday, November 06, 2003 3:46 AM Is there an initiative in Israel related to the supported glyphs and rendering features required to support Hebrew, like it exists in

Re: Merging combining classes, was: New contribution N2676

2003-10-30 Thread jon
On 29/10/2003 15:07, John Cowan wrote: Not necessarily. A process may check its input for normalization and reject it if it is not normalized, and XML consumers are encouraged (not required) to do so. This looks to me like a clear breach of C9, at least of the derived principle
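The behaviour John Cowan describes is a process that checks its input for normalization and rejects it instead of normalizing it. Below is a minimal Python sketch of such a gatekeeper. It assumes Python 3.8+ for `unicodedata.is_normalized` and uses NFC as the required form. The name `accept_if_nfc` is hypothetical. Whether rejecting like this is compatible with C9 is exactly what the thread goes on to dispute.

```python
import unicodedata

def accept_if_nfc(text: str) -> str:
    """Hypothetical gatekeeper: reject non-NFC input instead of silently normalizing it."""
    if not unicodedata.is_normalized("NFC", text):
        raise ValueError("input is not in Normalization Form C")
    return text

accept_if_nfc("\u00e9")        # precomposed e-acute: accepted
# accept_if_nfc("e\u0301")     # e + COMBINING ACUTE ACCENT: raises ValueError
```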

Re: Merging combining classes, was: New contribution N2676

2003-10-30 Thread Philippe Verdy
- Original Message - From: Jim Allan [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Thursday, October 30, 2003 4:48 PM Subject: Re: Merging combining classes, was: New contribution N2676 I offered a suggestion on cedilla and combining undercomma: / It seems to me that Cedilla

Re: Merging combining classes, was: New contribution N2676

2003-10-30 Thread Jim Allan
Philippe Verdy posted: I do think the opposite: one can fold all commas below to cedillas by default, and, in a Romanian or Latvian context, fold all cedillas below to commas below. I see no difference. Folding either way will find all occurrences of cedilla or comma below. The direction of

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Peter Kirk
On 28/10/2003 20:01, Jim Allan wrote: ... From _The Unicode Standard 4.0_, 3.11 at http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf: If combining characters have different combining classes--for example, when one nonspacing mark is above a base character form and another is below
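The quoted passage from section 3.11 is the basis of the Canonical Ordering Algorithm. Marks with different combining classes are treated as not interacting typographically, so they may be sorted. Marks with the same class keep their relative order. Below is a minimal Python sketch of that sort. It assumes the input is already fully decomposed, since this sketch does no decomposition. It reads class values from Python's `unicodedata`, which may reflect a newer Unicode version than 4.0. The function name `canonical_order` is hypothetical.

```python
import unicodedata

def canonical_order(decomposed: str) -> str:
    """Stable-sort each maximal run of non-zero combining class characters by class.

    Assumes `decomposed` is already in decomposed form; no decomposition is done here.
    """
    result, run = [], []
    for ch in decomposed:
        if unicodedata.combining(ch) == 0:
            # A starter ends the current run of marks; sorted() is stable,
            # so marks sharing a class keep their original relative order.
            result.extend(sorted(run, key=unicodedata.combining))
            run = []
            result.append(ch)
        else:
            run.append(ch)
    result.extend(sorted(run, key=unicodedata.combining))
    return "".join(result)

# a + DOT BELOW (220) + CEDILLA (202) becomes a + CEDILLA + DOT BELOW
print(canonical_order("a\u0323\u0327") == "a\u0327\u0323")
```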

RE: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Kent Karlsson
A similar situation can be seen in the Latvian letter U+0123 LATIN SMALL LETTER G WITH CEDILLA. In good Latvian typography, this character is always shown with a rotated comma over the g, rather than a cedilla below the g, because of the typographical design and layout issues

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Jim Allan
Peter Kirk wrote: Rather, it defines that they do not. But since this is not true on any reasonable intuitive definition of interact typographically (as we have seen with Hebrew vowel points), this statement makes sense only as a counterintuitive definition of interact typographically. Exactly.

RE: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Jim Allan
Kent Karlsson posted: COMBINING COMMA BELOW is not attached, even though cedilla is. A turned comma above is not _attached_ above... Correct. COMBINING COMMA BELOW belongs to combining class 220. However by Unicode specifications both it and an attached lower cedilla on _g_ may be rendered by

RE: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Language Analysis Systems, Inc. Unicode list reader
However by Unicode specifications both it and an attached lower cedilla on _g_ may be rendered by unattached turned comma above which interacts with characters not in their respective combining classes. And this new turned comma above of necessity would always be applied before normal upper

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Philippe Verdy
From: Jim Allan [EMAIL PROTECTED] Kent Karlsson posted: COMBINING COMMA BELOW is not attached, even though cedilla is. A turned comma above is not _attached_ above... Correct. COMBINING COMMA BELOW belongs to combining class 220. However by Unicode specifications both it and an attached

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread John Cowan
Jim Allan scripsit: For example, it is crucial that the combining class of the cedilla be lower than the combining class of the dot below, although their exact values of 202 and 220 are not important for implementation. This is not explained, but obviously the reason why it is crucial

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Philippe Verdy
- Original Message - From: John Hudson [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: 'Jim Allan' [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Wednesday, October 29, 2003 6:15 PM Subject: RE: Merging combining classes, was: New contribution N2676 At 04:04 AM 10/29/2003, Kent Karlsson wrote

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread John Hudson
At 12:33 PM 10/29/2003, Philippe Verdy wrote: Even today, it is quite hard to find any Romanian or Latvian web page using the new Unicode characters with a comma-below: even governmental sites use the characters coded with the cedilla, and they support that this comma below is rendered

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Peter Kirk
On 29/10/2003 11:53, John Cowan wrote: ... A rendering engine is *not* entitled to misbehave if it receives a, dot-below, cedilla and try to place the dot between the a glyph and the cedilla; this is a direct consequence of the conformance requirement that processes not distinguish (unless they
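John Cowan's point rests on canonical equivalence: "a, dot-below, cedilla" and "a, cedilla, dot-below" have identical NFD forms, so a conforming process may not treat them differently. Below is a small Python check, using `unicodedata` from the standard library. The helper `canonically_equivalent` is a hypothetical name for illustration.

```python
import unicodedata

def canonically_equivalent(x: str, y: str) -> bool:
    # Two strings are canonically equivalent exactly when their NFD forms match.
    return unicodedata.normalize("NFD", x) == unicodedata.normalize("NFD", y)

a_dot_cedilla = "a\u0323\u0327"   # a, COMBINING DOT BELOW (ccc 220), COMBINING CEDILLA (ccc 202)
a_cedilla_dot = "a\u0327\u0323"   # same marks in canonical order
print(canonically_equivalent(a_dot_cedilla, a_cedilla_dot))

# Marks of the SAME class (acute and grave, both 230) are not interchangeable:
print(canonically_equivalent("a\u0301\u0300", "a\u0300\u0301"))
```

The first comparison is equivalent because the two marks have different classes. The second is not, because the order of same-class marks is significant.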

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread John Cowan
Language Analysis Systems, Inc. Unicode list reader scripsit: It suggests that for many fonts, U+0067 LATIN SMALL LETTER G + U+0327 COMBINING CEDILLA and U+0067 LATIN SMALL LETTER G + U+0312 COMBINING TURNED COMMA ABOVE would have exactly the same rendering. Some applications would
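The two sequences in the quoted question are not canonically equivalent, even if a font draws them alike. U+0123 decomposes to g + U+0327 COMBINING CEDILLA (class 202). U+0312 COMBINING TURNED COMMA ABOVE is a different character with class 230. Any identical rendering is therefore a glyph choice made by the font, not an equivalence defined by normalization. The following check uses only Python's `unicodedata`.

```python
import unicodedata

g_cedilla = "\u0123"          # LATIN SMALL LETTER G WITH CEDILLA
g_turned_comma = "g\u0312"    # g + COMBINING TURNED COMMA ABOVE

print(unicodedata.normalize("NFD", g_cedilla) == "g\u0327")
print(unicodedata.normalize("NFD", g_turned_comma) == unicodedata.normalize("NFD", g_cedilla))
print(unicodedata.combining("\u0327"), unicodedata.combining("\u0312"))
```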

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread John Cowan
Peter Kirk scripsit: Is this actually a conformance requirement? I thought I understood the following: A rendering engine which fails to render canonical equivalents identically, or fails to render certain orders sensibly, is not doing what the Unicode standard tells it that it must do.

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Peter Kirk
On 29/10/2003 14:14, John Cowan wrote: Peter Kirk scripsit: Is this actually a conformance requirement? I thought I understood the following: A rendering engine which fails to render canonical equivalents identically, or fails to render certain orders sensibly, is not doing what the

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Philippe Verdy
From: John Hudson [EMAIL PROTECTED] All of these fonts already include the newer Romanian S/s and T/t commaaccent characters and correct accent forms for the Latvian diacritics (although the Arial comma accent is a bit too much like an unattached cedilla). I meant for Windows 9x/ME users, as a

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread John Cowan
Peter Kirk scripsit: [A process] must interpret a non-normalised variant in the same way as the normalised form; and it cannot assume that the process presenting the data makes a distinction between the normalised and non-normalised form and does not reorder the data into an arbitrary

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Philippe Verdy
From: Jim Allan [EMAIL PROTECTED] It seems to me that Cedilla/undercomma folding would be a useful addition to Character Foldings at http://www.unicode.org/reports/tr30. Excellent idea, however it has to be tailored by language: For example, Turkish and French (which almost always and
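The following is a minimal Python sketch of the language-tailored cedilla/comma-below folding being discussed. It is not the method of UTR #30. By default it folds comma below to cedilla, following Philippe Verdy's suggestion, and a flag switches to the Romanian/Latvian direction. The function name `fold_cedilla_comma` and the boolean `comma_context` are hypothetical. The sketch works in NFD so that precomposed letters such as U+0219 and U+015F are covered, then recomposes to NFC.

```python
import unicodedata

CEDILLA = "\u0327"       # COMBINING CEDILLA, class 202
COMMA_BELOW = "\u0326"   # COMBINING COMMA BELOW, class 220

def fold_cedilla_comma(text: str, comma_context: bool = False) -> str:
    """Hypothetical folding: comma below -> cedilla by default,
    cedilla -> comma below when comma_context is True (e.g. Romanian, Latvian)."""
    decomposed = unicodedata.normalize("NFD", text)
    if comma_context:
        folded = decomposed.replace(CEDILLA, COMMA_BELOW)
    else:
        folded = decomposed.replace(COMMA_BELOW, CEDILLA)
    # Re-normalizing also restores canonical order if the swapped mark's class changed.
    return unicodedata.normalize("NFC", folded)

print(fold_cedilla_comma("\u0219") == "\u015f")                      # s-comma folds to s-cedilla
print(fold_cedilla_comma("\u015f", comma_context=True) == "\u0219")  # and back in a Romanian context
```

As Jim Allan notes, either direction finds all occurrences when both the search key and the text are folded the same way.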

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Peter Kirk
On 29/10/2003 15:07, John Cowan wrote: Not necessarily. A process may check its input for normalization and reject it if it is not normalized, and XML consumers are encouraged (not required) to do so. This looks to me like a clear breach of C9, at least of the derived principle no process

Re: Merging combining classes, was: New contribution N2676

2003-10-28 Thread Peter Kirk
On 27/10/2003 16:39, Philippe Verdy wrote: ... The backwards marking is not restricted to French accents in collation level 2. You can use reverse ordering at any tailored level to fit other needs, and you can also insert an extra collation level. So I think that Mark is right here as it gives

Re: Merging combining classes, was: New contribution N2676

2003-10-28 Thread Peter Kirk
On 27/10/2003 18:06, Philippe Verdy wrote: From: Peter Kirk [EMAIL PROTECTED] Thanks for the clarification. In principle we might be able to go a little further: we could define both c, CCO and CCO, c as canonically equivalent to c for all c in combining class zero. This would have to be

RE: Merging combining classes, was: New contribution N2676

2003-10-28 Thread Kent Karlsson
Philippe Verdy wrote: But we cannot define it within the UCD, but algorithmically, like for Hangul syllables/jamos... Note that the *arithmetic* specification of the Hangul Syllable canonical decompositions is just a short way of specifying the decompositions. They CAN be listed, in a way
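Kent Karlsson's point can be checked directly. The arithmetic Hangul decomposition is only a compact way of writing a finite table of 11,172 entries. Below is a Python sketch of that arithmetic, using the standard Hangul constants from the Unicode Standard, with the full table regenerated and compared against `unicodedata.normalize("NFD", …)`. The function name `decompose_hangul` is hypothetical.

```python
import unicodedata

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588
S_COUNT = L_COUNT * N_COUNT   # 11172

def decompose_hangul(syllable: str) -> str:
    """Arithmetic canonical decomposition of one precomposed Hangul syllable."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return syllable                    # not a precomposed Hangul syllable
    l = L_BASE + s_index // N_COUNT
    v = V_BASE + (s_index % N_COUNT) // T_COUNT
    t = T_BASE + s_index % T_COUNT
    jamo = chr(l) + chr(v)
    if t != T_BASE:                        # T_BASE itself means "no trailing consonant"
        jamo += chr(t)
    return jamo

# The same decompositions, listed explicitly as a table:
table = {chr(c): decompose_hangul(chr(c)) for c in range(S_BASE, S_BASE + S_COUNT)}
print(len(table))
```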

Re: Merging combining classes, was: New contribution N2676

2003-10-28 Thread Peter Kirk
On 28/10/2003 04:49, Kent Karlsson wrote: Philippe Verdy wrote: There's a counter example with the position of the circumflex on the lowercase t (I can't remember for which language it occurs, sorry), which is in some cases not the one that its combining class would normally take. There

Re: Merging combining classes, was: New contribution N2676

2003-10-28 Thread jim
Peter Kirk wrote: Also, in the commonly used Hebrew *transliteration*, the same function (fricative pronunciation) is indicated by a macron above g and p but below b, d, k and t, for the same reason. It occurs only with these letters (sometimes also written below h). There might be an argument

Re: Merging combining classes, was: New contribution N2676

2003-10-28 Thread John Cowan
jim scripsit: Unicode encodes U+1E20 and U+1E21 as combinations of lower and uppercase _g_ with macron. The forms have canonical decomposition to _g_ or _G_ followed by U+0304. This seems to rule out being able to consider a bar above and a bar below as variants of the same character
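The decompositions mentioned can be confirmed with Python's `unicodedata`. U+1E20 and U+1E21 decompose to G/g + U+0304 COMBINING MACRON (class 230). A bar below would be U+0331 COMBINING MACRON BELOW (class 220), a different character. So g + U+0331 is not canonically equivalent to U+1E21.

```python
import unicodedata

for ch in ("\u1e20", "\u1e21"):
    print(f"U+{ord(ch):04X}", [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)])

# Macron above and macron below are distinct characters in distinct classes.
print(unicodedata.combining("\u0304"), unicodedata.combining("\u0331"))
print(unicodedata.normalize("NFD", "g\u0331") == unicodedata.normalize("NFD", "\u1e21"))
```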

Re: Merging combining classes, was: New contribution N2676

2003-10-28 Thread Peter Kirk
On 28/10/2003 13:35, John Cowan wrote: ... But Unicode specifications currently say nothing about the possibility of moving under-diacritics to an over-character position for typographical reasons except for combination of _g_ and cedilla. Nothing needs to be said, because glyphs are

Re: Merging combining classes, was: New contribution N2676

2003-10-28 Thread Jim Allan
I commented on what I saw as a problem in changing the positions of diacritics in rendering from that shown in the charts from above to below or from below to above. John Cowan responded: True. But that doesn't mean that the glyph that a particular font uses for the sequence g, COMBINING

Re: Merging combining classes, was: New contribution N2676

2003-10-27 Thread Peter Kirk
On 26/10/2003 12:51, Jony Rosenne wrote: While the current combining classes may cause some difficulties for Biblical scholars (and this isn't cut and dry yet - it isn't certain whether these are Unicode problem, implementation problems, missing characters or mis-identified characters), I have

Re: Merging combining classes, was: New contribution N2676

2003-10-27 Thread Peter Kirk
On 26/10/2003 19:58, John Hudson wrote: ... Functionally, inserting a CGJ here resolves the problem fine. I'm just not convinced that CGJ is a good general solution to the normalisation problem: it works, but it requires deliberate insertion in every place where unwanted mark re-ordering may

Re: Merging combining classes, was: New contribution N2676

2003-10-27 Thread rosennej
I am on a business trip abroad with only limited e-mail access. I will try to respond next week when I'm back home. Jony

Re: Merging combining classes, was: New contribution N2676

2003-10-27 Thread Philippe Verdy
. And we can make font renderers accept this new encoding, by letting them recognize the CCO. - Original Message - From: Peter Kirk [EMAIL PROTECTED] To: John Hudson [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Monday, October 27, 2003 1:48 PM Subject: Re: Merging combining classes

Re: Merging combining classes, was: New contribution N2676

2003-10-27 Thread Philippe Verdy
From: Peter Kirk [EMAIL PROTECTED] So the logical order is shin, sin/shin dot, dagesh, vowel, meteg. But the canonical order is shin, vowel, dagesh, meteg, sin/shin dot; up to three (and in theory more, at least in biblical Hebrew) other characters may appear between the base letter and
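The reordering Peter Kirk describes can be reproduced with Python's `unicodedata`. In the logical order, shin dot precedes dagesh, vowel and meteg. Normalization sorts the marks by class: qamats 18, dagesh 21, meteg 22, shin dot 24. The shin dot therefore ends up last. Qamats stands in here for "vowel"; the constant names are for illustration only.

```python
import unicodedata

SHIN, SHIN_DOT, DAGESH, QAMATS, METEG = "\u05e9", "\u05c1", "\u05bc", "\u05b8", "\u05bd"

logical = SHIN + SHIN_DOT + DAGESH + QAMATS + METEG
canonical = unicodedata.normalize("NFD", logical)

# Print each character of the normalized sequence with its combining class.
for ch in canonical:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ccc={unicodedata.combining(ch)}")
```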

Re: Merging combining classes, was: New contribution N2676

2003-10-27 Thread Peter Kirk
On 27/10/2003 06:54, Philippe Verdy wrote: Thanks a lot for these precisions on Hebrew usages that need those combining order overrides. This demonstrates that this occurs relatively infrequently, and so introducing an ignorable combining order override control makes sense, without needing to add

Re: Merging combining classes, was: New contribution N2676

2003-10-27 Thread Mark Davis
: [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Mon, 2003 Oct 27 07:49 Subject: Re: Merging combining classes, was: New contribution N2676 On 27/10/2003 06:54, Philippe Verdy wrote: Thanks a lot for these precisions on Hebrew usages that need those combining order overrides. This demonstrates

Re: Merging combining classes, was: New contribution N2676

2003-10-27 Thread Philippe Verdy
From: Peter Kirk [EMAIL PROTECTED] I am not sure what you mean by further normalization steps for Hebrew. Of course I don't mean that NF* algorithms must be changed. See below. If this means that users will be expected to input Hebrew in this order, perhaps with a keyboard driver which

Re: Merging combining classes, was: New contribution N2676

2003-10-27 Thread Philippe Verdy
From: Peter Kirk [EMAIL PROTECTED] I don't see any difference between your proposed generic CCO and CGJ. As you say, the same function may be needed in several scripts, including perhaps IPA which uses complex diacritic stacking. So why not simply use CGJ? Why not effectively, but the

RE: Merging combining classes, was: New contribution N2676

2003-10-27 Thread Peter Constable
Philippe Verdy wrote: This principle may help solve the ambiguities in all those affected scripts (maybe there are similar issues in the Latin script for Vietnamese, which would like to better fit the phonetics of words that may be incorrectly rendered by the currently required

Re: Merging combining classes, was: New contribution N2676

2003-10-27 Thread Philippe Verdy
From: Mark Davis [EMAIL PROTECTED] the UTC decision: [96-C20] Consensus: Add text to Unicode 4.0.1 which points out that combining grapheme joiner has the effect of preventing the canonical re-ordering of combining marks during normalization. [L2/03-235, L2/03-236, L2/03-234] [96-A72]
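The effect cited in the UTC decision follows from CGJ (U+034F) having combining class 0. It splits the run of marks, so canonical reordering cannot move marks across it. Below is a small Python illustration using a lamed carrying patah (class 17) followed by hiriq (class 14), the two-vowels case raised elsewhere in this thread. Without CGJ, NFD swaps the vowels; with CGJ, the order survives.

```python
import unicodedata

CGJ = "\u034f"                             # COMBINING GRAPHEME JOINER, class 0
LAMED, PATAH, HIRIQ = "\u05dc", "\u05b7", "\u05b4"

without_cgj = LAMED + PATAH + HIRIQ
with_cgj = LAMED + PATAH + CGJ + HIRIQ

print(unicodedata.normalize("NFD", without_cgj) == LAMED + HIRIQ + PATAH)  # reordered
print(unicodedata.normalize("NFD", with_cgj) == with_cgj)                  # order preserved
```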

Re: Merging combining classes, was: New contribution N2676

2003-10-27 Thread Philippe Verdy
From: Peter Constable [EMAIL PROTECTED] There is no problem requiring a solution for combining marks used with Latin script,* including IPA and Vietnamese, because all of the marks that occupy a comparable space relative to the base have the same combining class, meaning that normalization

Re: Merging combining classes, was: New contribution N2676

2003-10-27 Thread Peter Kirk
On 27/10/2003 12:28, Mark Davis wrote: Collation is very different, and already has mechanisms for dealing with sequences. So no CGJ is needed there (except for case 2). Mark Mark, can you outline what these mechanisms are or point me to a definition e.g. in a section of UTR #10? As I had

Re: Merging combining classes, was: New contribution N2676

2003-10-27 Thread Peter Kirk
On 27/10/2003 10:31, Philippe Verdy wrote: ... The bad thing is that there's no way to say that a superfluous CGJ character can be safely removed if CC(char1) = CC(char2), so that it will preserve the semantic of the encoded text even though such filtered text would not be canonically

Re: Merging combining classes, was: New contribution N2676

2003-10-27 Thread Philippe Verdy
From: Peter Kirk [EMAIL PROTECTED] On 27/10/2003 10:31, Philippe Verdy wrote: ... The bad thing is that there's no way to say that a superfluous CGJ character can be safely removed if CC(char1) = CC(char2), so that it will preserve the semantic of the encoded text even though such

Re: Merging combining classes, was: New contribution N2676

2003-10-27 Thread Philippe Verdy
So, all we can do is to define compatibility equivalence between: c1, CCO, c2 and: c1, c2 if and only if: CC(c1) CC(c2) 0. Oops! Of course, I really meant: All we can do is to define compatibility equivalence (NFK*) between: c1, CCO, c2 and: c1, c2

Re: Merging combining classes, was: New contribution N2676

2003-10-27 Thread Peter Kirk
On 27/10/2003 16:16, Philippe Verdy wrote: ... So, all we can do is to define compatibility equivalence between: c1, CCO, c2 and: c1, c2 if and only if: CC(c1) CC(c2) 0. This won't affect the NFC and NFD conversion algorithms, but it can affect the NFKC and NFKD conversion algorithms.

Re: Merging combining classes, was: New contribution N2676

2003-10-27 Thread Philippe Verdy
From: Peter Kirk [EMAIL PROTECTED] each possible individually as a contraction. The Logical_Order_Exception property (see http://www.unicode.org/reports/tr10/ section 3.1.3) just One bug report note here: The UTS#10 contains all references to several character properties, pointing to

Re: Merging combining classes, was: New contribution N2676

2003-10-27 Thread Philippe Verdy
From: Peter Kirk [EMAIL PROTECTED] On 27/10/2003 10:31, Philippe Verdy wrote: ... The bad thing is that there's no way to say that a superfluous CGJ character can be safely removed if CC(char1) = CC(char2), so that it will preserve the semantic of the encoded text even though such

Re: Merging combining classes, was: New contribution N2676

2003-10-27 Thread Philippe Verdy
From: Peter Kirk [EMAIL PROTECTED] Thanks for the clarification. In principle we might be able to go a little further: we could define both c, CCO and CCO, c as canonically equivalent to c for all c in combining class zero. This would have to be some kind of decomposition exception so that

RE: Merging combining classes, was: New contribution N2676

2003-10-26 Thread Jony Rosenne
Sent: Sunday, October 26, 2003 9:37 PM To: Philippe Verdy Cc: [EMAIL PROTECTED] Subject: Re: Merging combining classes, was: New contribution N2676 On 25/10/2003 19:00, Philippe Verdy wrote: From: Peter Kirk [EMAIL PROTECTED] .. Of course, if the combining class values were

Re: Merging combining classes, was: New contribution N2676

2003-10-26 Thread Ted Hopp
On Sunday, October 26, 2003 3:51 PM, Jony Rosenne wrote: While the current combining classes may cause some difficulties for Biblical scholars (and this isn't cut and dry yet - it isn't certain whether these are Unicode problem, implementation problems, missing characters or mis-identified

Re: Merging combining classes, was: New contribution N2676

2003-10-26 Thread Philippe Verdy
From: Peter Kirk [EMAIL PROTECTED] I see the point, but I would think there was something seriously wrong with a database setup which could change its ordering algorithm without somehow declaring all existing indexes invalid. Why would such a SQL engine do so, if what has changed is an

Re: Merging combining classes, was: New contribution N2676

2003-10-26 Thread Peter Kirk
On 25/10/2003 19:00, Philippe Verdy wrote: From: Peter Kirk [EMAIL PROTECTED] I can see that there might be some problems in the changeover phase. But these are basically the same problems as are present anyway, and at least putting them into a changeover phase means that they go away

Re: Merging combining classes, was: New contribution N2676

2003-10-26 Thread Mark E. Shoulson
Jony Rosenne wrote: While the current combining classes may cause some difficulties for Biblical scholars (and this isn't cut and dry yet - it isn't certain whether these are Unicode problem, implementation problems, missing characters or mis-identified characters), I have yet to see a claimed

RE: Merging combining classes, was: New contribution N2676

2003-10-26 Thread Jony Rosenne
This is, in my opinion, a missing character. Jony -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ted Hopp Sent: Monday, October 27, 2003 12:53 AM To: [EMAIL PROTECTED] Subject: Re: Merging combining classes, was: New contribution N2676

RE: Merging combining classes, was: New contribution N2676

2003-10-26 Thread Jony Rosenne
: Monday, October 27, 2003 2:07 AM To: Jony Rosenne Cc: [EMAIL PROTECTED] Subject: Re: Merging combining classes, was: New contribution N2676 Jony Rosenne wrote: While the current combining classes may cause some difficulties for Biblical scholars (and this isn't cut and dry yet - it isn't

RE: Merging combining classes, was: New contribution N2676

2003-10-26 Thread John Hudson
At 04:37 PM 10/26/2003, Jony Rosenne wrote: There is nothing unusual about this. The only problem is that while the Hiriq is between the Lamed and the Mem and belongs to the missing Yod, some people insist that they see two vowels under the Lamed. No, the problem is not the positioning of the

Re: Merging combining classes, was: New contribution N2676

2003-10-26 Thread John Hudson
At 07:45 PM 10/26/2003, Mark E. Shoulson wrote: I remembered there was a lot of discussion about this case, which is why I brought it up. Can someone remind me why ZWNBSP would be Bad for this? Wrong RTL coding? (possibly, but it's weak, isn't it) Wrongly indicates a word-break? (this is

Re: Merging combining classes, was: New contribution N2676

2003-10-26 Thread Mark E. Shoulson
I remembered there was a lot of discussion about this case, which is why I brought it up. Can someone remind me why ZWNBSP would be Bad for this? Wrong RTL coding? (possibly, but it's weak, isn't it) Wrongly indicates a word-break? (this is probably a problem.) ~mark John Hudson wrote: At

Re: Merging combining classes, was: New contribution N2676

2003-10-25 Thread Philippe Verdy
From: Peter Kirk [EMAIL PROTECTED] I wonder if it would in fact be possible to merge certain adjacent combining classes, as from a future numbered version N of the standard. That would not affect the normalisation of existing text; text normalised before version N would remain normalised in

Re: Merging combining classes, was: New contribution N2676

2003-10-25 Thread Peter Kirk
On 25/10/2003 09:11, Philippe Verdy wrote: From: Peter Kirk [EMAIL PROTECTED] ... The problem would then be the interoperability of Unicode-compliant systems using distinct versions of Unicode (for example between XML processors, text editors, input methods, renderers, text converters, full

Re: Merging combining classes, was: New contribution N2676

2003-10-25 Thread Stefan Persson
Philippe Verdy wrote: The problem with this solution is that stability is not guaranteed across backward versions of Unicode: if a tool A implements the new version of combining classes and normalizes its input, it will keep the relative ordering of characters. If its output is injected into a

Re: Merging combining classes, was: New contribution N2676

2003-10-25 Thread Philippe Verdy
From: Peter Kirk [EMAIL PROTECTED] I can see that there might be some problems in the changeover phase. But these are basically the same problems as are present anyway, and at least putting them into a changeover phase means that they go away gradually instead of being standardised for ever,