Re: Armenian Mijaket (Armenian colon)
On 12/5/2017 1:32 PM, Ken Whistler via Unicode wrote:
Asmus,
On 12/5/2017 12:35 PM, Asmus Freytag via Unicode wrote:
I don't know the history of this particular "unification"
Here are some clues to guide further research on the history.
The annotation in question was added to a draft of the NamesList.txt file for Unicode 4.1 on October 7, 2003. The annotation was not yet in the Unicode 4.0 charts, published in April, 2003. That should narrow down the search for everybody.
I can't find specific mention of this in the UTC minutes from the relevant 2003 window. But I strongly suspect that the catalyst for the change was the discussion that took place regarding PRI #12 re terminal punctuation:
http://www.unicode.org/review/pr-12.html
That document, at least, does mention "Armenian" and U+2024, although not in the same breath. That PRI was discussed and closed at UTC #96, on August 25, 2003:
http://www.unicode.org/L2/L2003/03240.htm
I don't find any particular mention of U+2024 in my own notes from that meeting, so I suspect the proximal cause for the change to the annotation for U+2024 on October 7 will have to be dug out of an email archive at some point.
--Ken
Thanks, Ken. Looking in the e-mail trail, I find it relevant that the concerns raised here were already present in 2003.
John Cowan scripsit: (emphasis added)
Of course with proportional fonts this character would display at least (and preferably) a single dot. Any use of this character that assumes it is a symbol consisting in a single dot aligned on the baseline seems to abuse the semantic of this character, which is not a punctuation, but really a styling character used instead of an "invisible" thin space.
The larger concern was that implementations should be free to implement strings of leader characters in a way that makes sense for the intended purpose and is not constrained by the design constraints for other, utterly unrelated use. In particular, he likened the leader characters to "styled whitespace".
In the early years, much emphasis was placed on unifying punctuation marks based on similarity of appearance, sometimes even ignoring the non-ink part of the design (e.g. side bearings, even asymmetric ones). More recently, the UTC seems to have moved away from this towards a more nuanced approach. I believe there's a possibility that the unification with the Armenian punctuation was not as well considered as it may have appeared in 2003, and that it might better fit the current approach to construe the properties of the leader characters as confined more exclusively to their intended purpose. A./
Re: Armenian Mijaket (Armenian colon)
Asmus,
On 12/5/2017 12:35 PM, Asmus Freytag via Unicode wrote:
I don't know the history of this particular "unification"
Here are some clues to guide further research on the history.
The annotation in question was added to a draft of the NamesList.txt file for Unicode 4.1 on October 7, 2003. The annotation was not yet in the Unicode 4.0 charts, published in April, 2003. That should narrow down the search for everybody.
I can't find specific mention of this in the UTC minutes from the relevant 2003 window. But I strongly suspect that the catalyst for the change was the discussion that took place regarding PRI #12 re terminal punctuation:
http://www.unicode.org/review/pr-12.html
That document, at least, does mention "Armenian" and U+2024, although not in the same breath. That PRI was discussed and closed at UTC #96, on August 25, 2003:
http://www.unicode.org/L2/L2003/03240.htm
I don't find any particular mention of U+2024 in my own notes from that meeting, so I suspect the proximal cause for the change to the annotation for U+2024 on October 7 will have to be dug out of an email archive at some point.
--Ken
Re: Armenian Mijaket (Armenian colon)
In fact I would also remove the misleading (non-normative) note in NamesList.txt suggesting the use of the ONE DOT LEADER. It is just one of the possible fallbacks, but it has the wrong properties for encoding plain text. It is only useful as a rendering fallback, and not even useful for that, because almost no font maps this character: leader dots are preferably rendered another way, by drawing a dotted line. Only some text renderers may use the leader dot, when they need to transform a leader space into a dotted line and need a glyph for that; but note that they will also need to control the spacing and margins, and will probably always put it on the baseline like a regular full stop.
A better fallback is the middle dot (but with an additional thin space around it). Still, for the semantics, we should not have to use such rendering fallbacks for composing plain text. Imagine what we want to enter in a database of texts, or in translation engines that don't know, and should not have to worry, about the fonts, font styles or metrics. Here we need a clear semantic distinction for the mijaket: a colon or semicolon articulating two phrases in the same sentence, or placed at the end of an introductory sentence followed by one value or a list of Armenian words, itself terminated by an Armenian full stop, U+0589.
You'll note that on Wikipedia, the ArmSCII table at the top of the page was composed and rendered (in LaTeX) with the middle dot, which is clearly distinguished from the ASCII full stop and the Armenian full stop. You will find no mention there of the ONE DOT LEADER.
This is especially important because today Armenian is written using either "modern" (ASCII) punctuation (as in English, with colons, semicolons, and full stops) or traditional punctuation. It cannot be predicted in which context (modern/ASCII or traditional) the translated texts will be used, so we have an ambiguity about how to translate and represent colons/semicolons and full stops.
The Armenian full stop is clearly encoded. The Armenian [semi]colon is not, and we only have fallbacks. So we need the mijaket encoded, and U+0588 (unallocated, and just before the distinctive Armenian full stop at U+0589) is the best place. Even in the Unicode representative chart, you'll note that the characters are slanted, including the punctuation, and the dots become ovals. Various Armenian texts use square dots (apparently drawn as a small, nearly vertical stroke with a pencil or pen).
This will leave renderers free to choose how to render the two Armenian punctuation marks (traditional or modern) and will preserve the semantics of the text without conflicting with other rendering options: the leaders in TOCs or tabular data, which may eventually use U+2024 with some rare fonts specific to the rendering engine and its own typographical engine, if it ever needs a font for those glyphs. Even in that case, these internal fonts will not need to be Unicode-encoded; they will just be a collection of glyphs for the rendering effects and styles the engine wants to support.
For now the immediate real need is for fully translating interfaces in applications and allowing them to support either a "modern" style (English/ASCII punctuation) or a "traditional" style. No fallback characters should be encoded in these texts, so that no confusion arises if one ever uses both the real Armenian full stop (two dots) and a fallback for the distinctive missing mijaket (a single dot, which also has to be distinguished from leaders, decimal separators in numbers, and abbreviation dots). The newly encoded mijaket may include a note suggesting the use of the MIDDLE DOT as a preferable fallback.
2017-12-05 21:35 GMT+01:00 Asmus Freytag via Unicode:
> On 12/5/2017 11:28 AM, Philippe Verdy via Unicode wrote:
>
> U+2024 is not supported in any fonts I have loaded. A websearch of mijaket gives nothing.
> U+2024 is used as a "leader dot", and does not match the expected metrics (it is certainly not a mijaket, it should be more like U+0589, i.e. as a bold parallelogram, and not a thin leader dot).
>
> Leader dots are NOT used as real punctuation, they are presentational, for example in TOC (table of contents), where they are aligned in arbitrarily long rows.
>
> The note in http://www.unicode.org/charts/PDF/U2000.pdf is absolutely not normative and in fact it is wrong in my opinion.
>
> The mijaket (Armenian colon) should be encoded (preferably at U+0588 in the Armenian block) as it also has to be distinguished from leader dots in Armenian TOC, exactly like the vertsaket was distinguished at U+0589.
>
> Well, unless someone (you?) writes a proposal to that effect
>
> (I don't know the history of this particular "unification" but on the face of it would share your concern that unifying something with a very specific functionality and metrics, leader dots, with ordinary script-specific punctuation is not helpful - unless it can be shown that
Re: Armenian Mijaket (Armenian colon)
On 12/5/2017 11:28 AM, Philippe Verdy via Unicode wrote:
U+2024 is not supported in any fonts I have loaded. A websearch of mijaket gives nothing.
U+2024 is used as a "leader dot", and does not match the expected metrics (it is certainly not a mijaket, it should be more like U+0589, i.e. as a bold parallelogram, and not a thin leader dot).
Leader dots are NOT used as real punctuation, they are presentational, for example in TOC (table of contents), where they are aligned in arbitrarily long rows.
The note in http://www.unicode.org/charts/PDF/U2000.pdf is absolutely not normative and in fact it is wrong in my opinion.
The mijaket (Armenian colon) should be encoded (preferably at U+0588 in the Armenian block) as it also has to be distinguished from leader dots in Armenian TOC, exactly like the vertsaket was distinguished at U+0589.
Well, unless someone (you?) writes a proposal to that effect
(I don't know the history of this particular "unification" but on the face of it would share your concern that unifying something with a very specific functionality and metrics, leader dots, with ordinary script-specific punctuation is not helpful - unless it can be shown that this unification is widely supported in practice. However, if your claim that 2024 is unsupported is correct, that would strengthen the case for reconsidering this; however the case would have to be made in a formal proposal first).
A./
2017-12-05 19:59 GMT+01:00 S. Gilles:
On 2017-12-05T18:44:05+0100, Philippe Verdy via Unicode wrote:
> The Armenian script has its own distinctive punctuation (vertsaket) for the standard full stop at end of sentence (whose glyph looks very much like the Basic Latin/ASCII colon, however slightly more bold and slanted and whose dots are rectangular). It is encoded at U+0589. And used in traditional texts instead of the "modern" full stop.
>
> But Armenian also has its own distinctive punctuation (mijaket) for the introductory colon between two phrases of the same sentence (whose glyph looks very much like the Basic Latin/ASCII full stop). It is not encoded and I don't like using the ASCII full stop where it causes confusion.
>
> Where is the Armenian distinctive mijaket? Shouldn't it be encoded at U+0588?
Off-list because I generally don't know what I'm talking about, but grepping NamesList.txt for ‘mijaket’ gives U+2024. If this isn't what you're looking for, my apologies.
--
S. Gilles
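For readers who want to repeat the NamesList.txt search mentioned above without grep, here is a minimal sketch. It assumes the usual NamesList layout (a hex code point, a TAB, then the character name, with annotation lines indented by a TAB). The embedded sample, including the wording of its annotations, is an illustrative assumption and not a verbatim copy of the published file, and `find_entries` is a hypothetical helper name.

```python
import re

# Illustrative excerpt in the NamesList.txt layout. The annotation wording
# is an assumption made for this sketch, not copied from the real file.
SAMPLE = (
    "0589\tARMENIAN FULL STOP\n"
    "\t= vertsaket\n"
    "2024\tONE DOT LEADER\n"
    "\t* also used as an Armenian semicolon (mijaket)\n"
    "2025\tTWO DOT LEADER\n"
)

# An entry line starts with 4 to 6 hex digits followed by a TAB.
ENTRY = re.compile(r"^([0-9A-F]{4,6})\t(.+)$")


def find_entries(text, term):
    """Return the code points (hex strings) whose entry mentions `term`."""
    hits = []
    current = None  # code point of the entry we are currently inside
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            current = match.group(1)
        # Case-insensitive search over both name lines and annotation lines.
        if current and term.lower() in line.lower() and current not in hits:
            hits.append(current)
    return hits


if __name__ == "__main__":
    print(find_entries(SAMPLE, "mijaket"))
```

To search a real copy of the file, pass its contents instead of `SAMPLE`, for example `find_entries(open("NamesList.txt", encoding="utf-8").read(), "mijaket")`, assuming the file has been downloaded locally.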
Re: Armenian Mijaket (Armenian colon)
Note that "Noto Sans Armenian" does not even map U+2024 (I doubt it is accepted as a real replacement for the missing Armenian mijaket, which plays a role similar to a Latin semicolon or colon), though it does map the hyphen at U+2010. But U+0589 (the Armenian vertsaket, the Armenian full stop that looks like a ":" Latin colon) is mapped.
My opinion is that the one dot leader has only been used by some sources that don't need to render tabular data or TOCs. The sources needing these traditional distinctions are probably religious texts, and clearly they don't even look like the representative glyph in the Unicode PDF. "Noto Sans Armenian" is designed for modern use on displays, and even there we'll need a better distinction and better metrics than going with a possible "Noto Sans" mapping of the leader dot at U+2024 (which still does not exist).
In fact leaders are better represented another way than by repeating this character: leaders are essentially parsed in arbitrary lengths, like a tabulation whitespace, and so the leader dot is not semantically suitable at all as a mijaket. It is as if we wanted to replace ASCII full stops, colons and semicolons in English by SPACE or TAB; in Armenian this just causes havoc.
2017-12-05 20:28 GMT+01:00 Philippe Verdy:
> U+2024 is not supported in any fonts I have loaded. A websearch of mijaket gives nothing.
> U+2024 is used as a "leader dot", and does not match the expected metrics (it is certainly not a mijaket, it should be more like U+0589, i.e. as a bold parallelogram, and not a thin leader dot).
>
> Leader dots are NOT used as real punctuation, they are presentational, for example in TOC (table of contents), where they are aligned in arbitrarily long rows.
>
> The note in http://www.unicode.org/charts/PDF/U2000.pdf is absolutely not normative and in fact it is wrong in my opinion.
>
> The mijaket (Armenian colon) should be encoded (preferably at U+0588 in the Armenian block) as it also has to be distinguished from leader dots in Armenian TOC, exactly like the vertsaket was distinguished at U+0589.
>
> 2017-12-05 19:59 GMT+01:00 S. Gilles:
>
>> On 2017-12-05T18:44:05+0100, Philippe Verdy via Unicode wrote:
>> > The Armenian script has its own distinctive punctuation (vertsaket) for the standard full stop at end of sentence (whose glyph looks very much like the Basic Latin/ASCII colon, however slightly more bold and slanted and whose dots are rectangular). It is encoded at U+0589. And used in traditional texts instead of the "modern" full stop.
>> >
>> > But Armenian also has its own distinctive punctuation (mijaket) for the introductory colon between two phrases of the same sentence (whose glyph looks very much like the Basic Latin/ASCII full stop). It is not encoded and I don't like using the ASCII full stop where it causes confusion.
>> >
>> > Where is the Armenian distinctive mijaket? Shouldn't it be encoded at U+0588?
>>
>> Off-list because I generally don't know what I'm talking about, but grepping NamesList.txt for ‘mijaket’ gives U+2024. If this isn't what you're looking for, my apologies.
>>
>> --
>> S. Gilles
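One concrete property that bears on the suitability argument, though it is not named in the message above: U+2024 carries a compatibility decomposition to U+002E, so text that passes through NFKC normalization would have a mijaket written as U+2024 folded into an ASCII full stop, while U+0589 is left alone. The following is a minimal sketch using Python's standard `unicodedata` module; `survives_nfkc` is a hypothetical helper name introduced for illustration.

```python
import unicodedata

ONE_DOT_LEADER = "\u2024"
ARMENIAN_FULL_STOP = "\u0589"


def survives_nfkc(ch):
    """Return True if NFKC normalization leaves the character unchanged."""
    return unicodedata.normalize("NFKC", ch) == ch


if __name__ == "__main__":
    for ch in (ONE_DOT_LEADER, ARMENIAN_FULL_STOP):
        folded = unicodedata.normalize("NFKC", ch)
        targets = " ".join(f"U+{ord(c):04X}" for c in folded)
        # decomposition() returns an empty string when there is no mapping.
        mapping = unicodedata.decomposition(ch) or "(none)"
        print(f"U+{ord(ch):04X}  NFKC -> {targets}  decomposition: {mapping}")
```

Any pipeline that applies NFKC (search indexing or identifier matching, for instance) would therefore lose the distinction between a U+2024 mijaket and an ordinary full stop.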
Re: Armenian Mijaket (Armenian colon)
U+2024 is not supported in any fonts I have loaded. A web search for mijaket gives nothing.
U+2024 is used as a "leader dot", and does not match the expected metrics (it is certainly not a mijaket, which should look more like U+0589, i.e. a bold parallelogram, and not a thin leader dot).
Leader dots are NOT used as real punctuation; they are presentational, for example in a TOC (table of contents), where they are aligned in arbitrarily long rows.
The note in http://www.unicode.org/charts/PDF/U2000.pdf is absolutely not normative and in fact it is wrong in my opinion.
The mijaket (Armenian colon) should be encoded (preferably at U+0588 in the Armenian block) as it also has to be distinguished from leader dots in Armenian TOCs, exactly like the vertsaket was distinguished at U+0589.
2017-12-05 19:59 GMT+01:00 S. Gilles:
> On 2017-12-05T18:44:05+0100, Philippe Verdy via Unicode wrote:
> > The Armenian script has its own distinctive punctuation (vertsaket) for the standard full stop at end of sentence (whose glyph looks very much like the Basic Latin/ASCII colon, however slightly more bold and slanted and whose dots are rectangular). It is encoded at U+0589. And used in traditional texts instead of the "modern" full stop.
> >
> > But Armenian also has its own distinctive punctuation (mijaket) for the introductory colon between two phrases of the same sentence (whose glyph looks very much like the Basic Latin/ASCII full stop). It is not encoded and I don't like using the ASCII full stop where it causes confusion.
> >
> > Where is the Armenian distinctive mijaket? Shouldn't it be encoded at U+0588?
>
> Off-list because I generally don't know what I'm talking about, but grepping NamesList.txt for ‘mijaket’ gives U+2024. If this isn't what you're looking for, my apologies.
>
> --
> S. Gilles
Armenian Mijaket (Armenian colon)
The Armenian script has its own distinctive punctuation (vertsaket) for the standard full stop at the end of a sentence (its glyph looks very much like the Basic Latin/ASCII colon, though slightly bolder and slanted, with rectangular dots). It is encoded at U+0589 and is used in traditional texts instead of the "modern" full stop.
But Armenian also has its own distinctive punctuation (mijaket) for the introductory colon between two phrases of the same sentence (its glyph looks very much like the Basic Latin/ASCII full stop). It is not encoded, and I don't like using the ASCII full stop where it causes confusion.
Where is the distinctive Armenian mijaket? Shouldn't it be encoded at U+0588?
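To make the look-alike problem described above concrete, the following minimal sketch lists the character names and general categories of the dots and colons that come up in this thread, using Python's standard `unicodedata` module. The labels in `CANDIDATES` are informal descriptions taken from the discussion, and `describe` is a hypothetical helper name; the output depends on the Unicode version bundled with the Python build.

```python
import unicodedata

# Code points that come up in the discussion. The labels are informal,
# taken from the thread, and are not official Unicode terminology.
CANDIDATES = {
    0x002E: "ASCII full stop",
    0x003A: "ASCII colon",
    0x00B7: "middle dot (suggested fallback)",
    0x0589: "Armenian full stop (vertsaket)",
    0x2024: "one dot leader (annotated as mijaket)",
}


def describe(cp):
    """Return (character name, general category) for a code point."""
    ch = chr(cp)
    # name() raises ValueError for unnamed code points unless given a default.
    return unicodedata.name(ch, "<no name>"), unicodedata.category(ch)


if __name__ == "__main__":
    for cp, label in CANDIDATES.items():
        name, category = describe(cp)
        print(f"U+{cp:04X}  {category}  {name:<20}  {label}")
```

All five characters report the same general category (Po, other punctuation), so that property alone cannot tell a sentence-internal mijaket from a full stop or a leader; the distinction has to come from which code point is stored in the text.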