On 26 Mar 2012, at 13:35, Denis Jacquerye wrote:
So far the linguistic atlases I have seen extensively use this
combining letter mechanism, with diacritics changing the meaning of
the combining letter or of the base letter.
There are a whole lot of notations that could simply be base combining
letter + combining diacritics, but if you consider
Denis Jacquerye wrote [2012-03-26 13:35+0200]:
The fact [.] doesn't make it any saner.
The same could be said [.]
Denis Moyogo Jacquerye
Are you trying to say that extra tables and exact additional
knowledge besides UnicodeData.txt should not be necessary?
In the end you wanna make it a
On Mon, Mar 26, 2012 at 1:59 PM, Steven Atreju snatr...@googlemail.com wrote:
On Mon, Mar 12, 2012 at 12:18 AM, Doug Ewell d...@ewellic.org wrote:
Denis Moyogo Jacquerye wrote:
Stacked letters are also found in some Greek manuscripts.
See the page
http://www.archive.org/stream/revuearchologi27pariuoft#page/156/mode/1up
with some examples: Nu, omicron, omicron and
Denis Moyogo Jacquerye wrote:
Are these more examples that exist in only one or two sources, where
the main purpose of encoding them, or creating or enhancing the
combining mechanism, would be to talk about the sources? Or are they
in actual productive use?
The author states he has found
Stacked letters are also found in some Greek manuscripts.
See the page
http://www.archive.org/stream/revuearchologi27pariuoft#page/156/mode/1up
with some examples: Nu, omicron, omicron and Greek circumflex (tilde),
chi and Greek circumflex.
Would these also have to be represented by combining
On 11 Mar 2012, at 12:05, Denis Jacquerye wrote:
In other words, that circumflex is an epigraphic notation. This means
three distinct levels of analysis of the text: one for Chi, one for
the small letter above it noting something about the Chi, and another
for the circumflex noting something about the Chi itself.
This causes a major problem:
Note: for the choice 2 below, we currently have CGJ, but its role has
only been defined to allow some orthographic distinctions where the
ordering of diacritics is significant and does not match the canonical
ordering defined by the NFD form.
It is not intended to convey other semantic
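For readers who want to see what "the canonical ordering defined by the NFD form" does, here is a minimal Python sketch (an illustration added here, not code from the thread). It implements only the Canonical Ordering step, assuming the input is already decomposed: within each run of combining marks, characters are stably sorted by Canonical_Combining_Class, and any class-0 character, CGJ (U+034F) included, ends the run. Python's standard `unicodedata` module supplies the class values and a reference NFD for comparison.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, combining class 0


def canonical_order(decomposed: str) -> str:
    """Canonical Ordering on an already-decomposed string (sketch).

    Each maximal run of non-starters (combining class > 0) is stably
    sorted by combining class; starters (class 0, including CGJ) stay
    in place and act as barriers between runs.
    """
    out: list[str] = []
    run: list[str] = []
    for ch in decomposed:
        if unicodedata.combining(ch) == 0:
            # A starter closes the current run of marks.
            out.extend(sorted(run, key=unicodedata.combining))
            run.clear()
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)


# a + COMBINING ACUTE ACCENT (class 230) + COMBINING DOT BELOW (class 220)
plain = "a\u0301\u0323"
# The same marks with CGJ between them, to keep the acute stored first
blocked = "a\u0301" + CGJ + "\u0323"

print(canonical_order(plain) == "a\u0323\u0301")                      # reordered
print(canonical_order(blocked) == blocked)                            # CGJ blocks it
print(canonical_order(plain) == unicodedata.normalize("NFD", plain))  # matches NFD
```

The stable sort matters: two marks with the same class keep their relative order, which is why only marks with different classes can be "reordered" by normalization.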
Also I do think that this proposal would avoid having to encode many
new precomposed diacritics made of a diacritic letter and a
diacritic applying to it. We would just encode them using such
separator first, before the encoded diacritic letter, and the standard
combining diacritics.
With this
One example: say you want to encode an epigraphic C with CEDILLA
appearing as a letter above another one, you would encode:
- (1) the orthographic base letter (with its standard diacritics,
including CGJ if needed)
- (2) the new special combining character with combining class 0 that I propose.
Note also that you have already agreed to encode characters like
COMBINING LATIN SMALL LETTER U WITH DIAERESIS.
The bad thing is that if it is used without any separator, nothing will
clearly separate it from the orthographic level, so orthographic
checkers will choke on it. There's no clean way
Another example: suppose you want to represent the epigraphic notation
where there's a tie grouping several orthographic characters, for use
in texts discussing grammar. You can perfectly well use the special
combining character with class 0 that I propose to annotate:
- the first orthographic
On 6 March 2012 23:59, Asmus Freytag asm...@ix.netcom.com wrote:
Usually, stacked accents do not get smaller, but generally just change
position, so this would have to be an exception to general stacked accent
layout.
This is false. There have been a lot of cases where accents placed above
On 6 March 2012 23:59, Asmus Freytag asm...@ix.netcom.com wrote:
Now, I daresay that this effect could be reproduced with clever font
tables, but it doesn't change the fact that visually what you see is
indistinguishable
Note that given the date of the book, there was no consideration
On Monday, 5 March 2012 at 18:33, Denis Jacquerye wrote:
DJ ... The phonetic alphabet of Gilliéron and Rousselot used in the ''Atlas
DJ linguistique de la France''[1] and several other French dialectology
DJ texts use things like combining i with tilde, combining o with breve,
DJ combining o
On Mon, 5 Mar 2012 14:26:43 -0600 (CST)
Benjamin M Scarborough benjamin.scarboro...@utdallas.edu wrote:
Are you suggesting a LATIN SIGN VIRAMA?
The problem with LATIN SIGN COENG and LATIN SIGN INVERSE COENG is that
they are too late - there are characters around that should decompose to
contain
Speaking of U+17D2 KHMER SIGN COENG, what is a conforming renderer to
do if someone writes A ្B ? (U+0041 U+17D2 U+0042)
Leo
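Leo's sequence can be inspected with Python's standard `unicodedata` module. The sketch below (an illustration added here; `describe` is a helper name chosen for this example) lists each code point's name, General_Category and Canonical_Combining_Class, then checks that normalization leaves the string alone. U+17D2 is a nonspacing mark in class 9 (the virama class), and normalization has no notion of which script a mark "belongs" to, so A + COENG + B is a well-formed string; how it should look on screen is a rendering question the character data does not settle.

```python
import unicodedata


def describe(seq: str) -> list[tuple[str, str, str, int]]:
    """Return (code point, name, general category, combining class) per char."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"),
         unicodedata.category(ch), unicodedata.combining(ch))
        for ch in seq
    ]


# LATIN CAPITAL LETTER A, KHMER SIGN COENG, LATIN CAPITAL LETTER B
seq = "\u0041\u17D2\u0042"

for row in describe(seq):
    print(*row)

# A single class-9 mark between two starters has nothing to reorder
# against, and there is nothing to compose, so both forms are identity.
for form in ("NFC", "NFD"):
    assert unicodedata.normalize(form, seq) == seq
```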
On 3/6/12, Richard Wordingham richard.wording...@ntlworld.com wrote:
Subject: Re: Combining latin small letters with diacritics
On 3/6/12, Doug Ewell d...@ewellic.org wrote:
Speaking of U+17D2 KHMER SIGN COENG, what is a conforming renderer to
do if someone writes A្B ? (U+0041 U+17D2 U+0042)
Roll its eyes?
I guess :), but how should it look on the screen?
Leo
On 3/6/2012 2:34 PM, Leo Broukhis wrote:
Just the way your
On 3/6/2012 1:57 AM, Karl Pentzlin wrote:
Regarding e.g. the "combining œ with breve" as shown on p.24 9th line
(see attached scan), this seems to be an "intermediate sound" "u + œ",
to which the breve is applied as a whole (which means, not
surprisingly, «voyelle
On 3/6/12, Ken Whistler k...@sybase.com wrote:
On 3/6/2012 3:19 PM, Leo Broukhis wrote:
Thank you, Ken!
What about Grapheme_Extend class characters placed out of context? It
would be nice to see a dotted box in cases like AׁB
(U+0041 U+05C1 HEBREW POINT SHIN DOT U+0042)
Leo
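Leo's suggestion of flagging out-of-context marks can be sketched as a display-side filter. This is a hypothetical policy, not something the standard prescribes: the `EXPECTED_BASE` table of mark ranges and the base letters they expect is an assumption written only for illustration, and General_Category Mn/Me is used as a rough stand-in for Grapheme_Extend, which Python's standard library does not expose. A flagged mark gets U+25CC DOTTED CIRCLE inserted before it, the same convention the Unicode code charts use to show combining marks in isolation.

```python
import unicodedata
from typing import Optional

DOTTED_CIRCLE = "\u25CC"

# Hypothetical policy table (illustrative assumption, not from the Unicode
# Standard): (first mark, last mark, first base, last base) code points.
EXPECTED_BASE = [
    (0x0591, 0x05C7, 0x05D0, 0x05EA),  # Hebrew points -> Hebrew letters
    (0x17B6, 0x17D3, 0x1780, 0x17B3),  # Khmer signs -> Khmer letters
]


def _is_mark(ch: str) -> bool:
    # Mn/Me only approximate Grapheme_Extend; the real property differs.
    return unicodedata.category(ch) in ("Mn", "Me")


def _base_ok(mark: str, base: Optional[str]) -> bool:
    if base is None:  # mark at the start of the text: no base at all
        return False
    m = ord(mark)
    for lo, hi, base_lo, base_hi in EXPECTED_BASE:
        if lo <= m <= hi:
            return base_lo <= ord(base) <= base_hi
    return True  # marks not covered by the table are accepted on any base


def flag_stray_marks(text: str) -> str:
    """Insert U+25CC before each mark whose preceding base looks wrong."""
    out: list[str] = []
    base: Optional[str] = None
    for ch in text:
        if _is_mark(ch):
            if not _base_ok(ch, base):
                out.append(DOTTED_CIRCLE)
        else:
            base = ch  # remember the last non-mark as the current base
        out.append(ch)
    return "".join(out)


print(flag_stray_marks("A\u05C1B"))      # Latin A + SHIN DOT: flagged
print(flag_stray_marks("\u05E9\u05C1"))  # SHIN + SHIN DOT: left as is
```

The filter only changes what is displayed; the stored text, and therefore normalization and comparison, stay untouched.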
On 3/6/12, Ken Whistler k...@sybase.com wrote:
I see. I was under the impression that the renderer must
You can do that if you wish. This is part of the standard. Look at the
existing canonical decomposition mappings in the UCD (or just look at
them in the charts which display them). Note that this will not make
any difference to any conforming Unicode process.
For example you can freely
On Mon, Mar 5, 2012 at 7:29 PM, Philippe Verdy verd...@wanadoo.fr wrote:
On 5 March 2012 18:33, Denis Jacquerye moy...@gmail.com wrote:
[1] pp.19-24
http://www.archive.org/stream/atlaslinguistnot00gilluoft#page/18/mode/2up
I note an interesting character in your page: the « open g » used to
denote the « g dur français » ("French hard g") shown in the middle of the page on the
My question really is whether they could not be seen as
<comb>a</comb><comb>diaeresis</comb>, etc., where the shape of
<comb>diaeresis</comb> is contextual.
Sorry, I did not understand the question.
Anyway, I don't see the exact problem you may find in this case. There
are other stacked diacritics in this
On 5 Mar 2012, at 18:48, Denis Jacquerye wrote:
My question really is whether they could not be seen as
<comb>a</comb><comb>diaeresis</comb>, etc., where the shape of
<comb>diaeresis</comb> is contextual.
No, because both the combining-a and the combining-diaeresis are bound to the
base letter; the
So what do you propose?
- Encoding the new precomposed pairs as a new combining character
(there may be a lot of candidate pairs to encode, especially in the
Latin script),
- or encoding a variation of the existing diacritic to mean that they
are bound to a first-level diacritic (here a
Note that the first alternative is the one used in the DAM for
encoding a separate COMBINING LATIN SMALL LETTER A/O/U WITH DIAERESIS.
But the document cited by Denis gives a much more productive way that
allows stacking any kind of letter with its diacritics. There won't
be enough space in the
On Mon, Mar 5, 2012 at 19:09, Michael Everson wrote:
No, because both the combining-a and the combining-diaeresis are bound to the
base letter; the combining diaeresis is not bound to the combining-a.
Just like the proposed U+1ABB COMBINING PARENTHESIS ABOVE will be bound to the
base letter,
On 3/5/2012 11:44 AM, Philippe Verdy wrote:
So what do you propose?
It doesn't matter what *Michael* proposes at this point. These have already
been approved by both the UTC and WG2 and are currently in DAM ballot.
- Encoding the new precomposed pairs as a new combining character
(there may
On 5 March 2012 21:17, Benjamin M Scarborough
benjamin.scarboro...@utdallas.edu wrote:
On Mon, Mar 5, 2012 at 20:56, Philippe Verdy wrote:
But the document cited by Denis gives a much more productive way that
allows stacking any kind of letter with its diacritics. There won't
be enough space in the BMP for such Latin supplements.
Then put them in the SMP. Or is SMP still a
On 3/5/2012 11:56 AM, Philippe Verdy wrote:
Note that the first alternative is the one used in the DAM for
encoding a separate COMBINING LATIN SMALL LETTER A/O/U WITH DIAERESIS
Correct.
But the document cited by Denis gives a much more productive way that
allows stacking any kind of letter
On 3/5/2012 12:17 PM, Benjamin M Scarborough wrote:
You are so attached to keeping the existing encoding model
unchanged, that now you are going to prepare for LOTS of additions of
combining Latin characters with diacritics... The BMP won't be enough,
the SMP will fill up too, and there will be enormous problems for font
creators (or
On 3/5/2012 12:51 PM, Philippe Verdy wrote:
You are so attached to keeping the existing encoding model
unchanged,
Yep. That's why I work on *standards*, after all.
that now you are going to prepare for LOTS of additions of
combining Latin characters with diacritics... The BMP won't be
On 5 March 2012 21:32, Ken Whistler k...@sybase.com wrote:
On 5 Mar 2012, at 21:01, Ken Whistler wrote:
In the meantime, if the French dialectologists wish to come to the table, as
the German dialectologists did, the committees can examine the data and
everybody can work out together the best means of representing it in Unicode.
Indeed so.
Michael
On 5 March 2012 21:50, Ken Whistler k...@sybase.com wrote:
On Mon, Mar 5, 2012 at 9:17 PM, Ken Whistler k...@sybase.com wrote:
By the way, Philippe, this horse is already long out of the barn. See U+1DD7
COMBINING LATIN SMALL LETTER C WITH CEDILLA, which is already a
published part of the standard.
Focusing just on the three new characters with
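The distinction Ken draws can be checked against the UCD with Python's standard `unicodedata` module (a sketch added for illustration; results come from whatever UCD version the interpreter bundles, though both characters predate this thread). U+00E7 has a canonical decomposition to c + U+0327 COMBINING CEDILLA, so NFD splits it; U+1DD7 has no decomposition mapping at all and passes through NFD as a single code point.

```python
import unicodedata

# Precomposed c-cedilla versus the atomic combining c-cedilla.
for ch in ("\u00E7", "\u1DD7"):
    nfd = unicodedata.normalize("NFD", ch)
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}",
        f"decomposition={unicodedata.decomposition(ch) or '(none)'}",
        "NFD=" + " ".join(f"U+{ord(c):04X}" for c in nfd),
    )
```

Because U+1DD7 never decomposes, no normalization form can turn it into a combining c followed by a combining cedilla, which is what lets such characters be added without disturbing existing canonical equivalences.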
On 3/5/2012 2:01 PM, Denis Jacquerye wrote:
Wouldn't CGJ be useful in some way in cases like that of the cedilla
or the light centralization stroke U+1AB9?
Base character + combining letter + CGJ + combining cedilla would be
clear: the cedilla would not be moved.
How is that simpler than Base
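One concrete sense in which "the cedilla would not be moved" can be tested is normalization. In this sketch (illustrative only; the base letter o and U+0363 COMBINING LATIN SMALL LETTER A are arbitrary stand-ins for "base character" and "combining letter"), the combining Latin letter has combining class 230 and the cedilla 202, so NFD moves the cedilla ahead of the combining letter in storage order; with CGJ (class 0) between them, the typed order survives. Whether a renderer then attaches the cedilla to the combining letter rather than to the base is a separate font-level matter that normalization does not decide.

```python
import unicodedata

CGJ = "\u034F"      # COMBINING GRAPHEME JOINER (class 0)
COMB_A = "\u0363"   # COMBINING LATIN SMALL LETTER A (class 230)
CEDILLA = "\u0327"  # COMBINING CEDILLA (class 202)

without_cgj = "o" + COMB_A + CEDILLA
with_cgj = "o" + COMB_A + CGJ + CEDILLA

for label, s in (("without CGJ:", without_cgj), ("with CGJ:", with_cgj)):
    nfd = unicodedata.normalize("NFD", s)
    print(label, " ".join(f"U+{ord(c):04X}" for c in nfd))
# without CGJ: U+006F U+0327 U+0363
# with CGJ: U+006F U+0363 U+034F U+0327
```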
On Mon, Mar 5, 2012 at 7:49 PM, Philippe Verdy verd...@wanadoo.fr wrote:
On Mon, Mar 5, 2012 at 11:19 PM, Ken Whistler k...@sybase.com wrote:
On 3/5/2012 2:32 PM, Denis Jacquerye wrote:
I guess it's less messy than other situations. I just couldn't help
wondering why combining letters with diacritics are being encoded but
letters with diacritics are out of the question.
Because the combining ones are *not* decomposed, and hence don't