Re: Armenian Mijaket (Armenian colon)

2017-12-05 Thread Asmus Freytag via Unicode

  
  
On 12/5/2017 1:32 PM, Ken Whistler via
  Unicode wrote:

Asmus,
  
  
  
  On 12/5/2017 12:35 PM, Asmus Freytag via Unicode wrote:
  
  I don't know the history of this particular "unification"

  
  
  Here are some clues to guide further research on the history.
  
  
  The annotation in question was added to a draft of the
  NamesList.txt file for Unicode 4.1 on October 7, 2003.
  
  
  The annotation was not yet in the Unicode 4.0 charts, published in
  April, 2003.
  
  
  That should narrow down the search for everybody. I can't find
  specific mention of this in the UTC minutes from the relevant 2003
  window.
  
  
  But I strongly suspect that the catalyst for the change was the
  discussion that took place regarding PRI #12 re terminal
  punctuation:
  
  
  http://www.unicode.org/review/pr-12.html
  
  
  That document, at least, does mention "Armenian" and U+2024,
  although not in the same breath. That PRI was discussed and closed
  at UTC #96, on August 25, 2003:
  
  
  http://www.unicode.org/L2/L2003/03240.htm
  
  
  I don't find any particular mention of U+2024 in my own notes from
  that meeting, so I suspect the proximal cause for the change to
  the annotation for U+2024 on October 7 will have to be dug out of
  an email archive at some point.
  
  
  --Ken
  
  
  
  
  

Thanks, Ken.
Looking through the e-mail trail, I find it relevant
that the concerns raised here were already present in 2003.
John Cowan scripsit: (emphasis added)
  

  
  Of course with proportional fonts this character would display at
  least (and preferably) a single dot. Any use of this character that
  assumes it is a symbol consisting in a single dot aligned on the
  baseline seems to abuse the semantic of this character, which is not
  a punctuation, but really a styling character used instead of an
  "invisible" thin space.


The larger concern was that implementations should be free to
implement strings of leader characters in a way that makes sense for
the intended purpose and is not constrained by the design constraints
for other, utterly unrelated use. In particular, he likened the leader
characters to "styled whitespace".

In the early years there was much emphasis placed on unifying
punctuation marks based on similarity of appearance, sometimes even
ignoring the non-ink part of the design (e.g. side bearings, even
asymmetric ones). More recently, the UTC seems to have moved away from
this towards a more nuanced approach.

I believe there's a potential that the unification with the Armenian
punctuation is not as well-considered as it may have appeared in 2003,
and that it might better fit the current approach to construe the
properties of the leader characters as confined more exclusively to
their intended purpose.
A./
  
  



Re: Armenian Mijaket (Armenian colon)

2017-12-05 Thread Ken Whistler via Unicode

Asmus,


On 12/5/2017 12:35 PM, Asmus Freytag via Unicode wrote:

I don't know the history of this particular "unification"


Here are some clues to guide further research on the history.

The annotation in question was added to a draft of the NamesList.txt 
file for Unicode 4.1 on October 7, 2003.


The annotation was not yet in the Unicode 4.0 charts, published in 
April, 2003.


That should narrow down the search for everybody. I can't find specific 
mention of this in the UTC minutes from the relevant 2003 window.


But I strongly suspect that the catalyst for the change was the 
discussion that took place regarding PRI #12 re terminal punctuation:


http://www.unicode.org/review/pr-12.html

That document, at least, does mention "Armenian" and U+2024, although 
not in the same breath. That PRI was discussed and closed at UTC #96, on 
August 25, 2003:


http://www.unicode.org/L2/L2003/03240.htm

I don't find any particular mention of U+2024 in my own notes from that 
meeting, so I suspect the proximal cause for the change to the 
annotation for U+2024 on October 7 will have to be dug out of an email 
archive at some point.


--Ken





Re: Armenian Mijaket (Armenian colon)

2017-12-05 Thread Philippe Verdy via Unicode
In fact I would also remove the misleading (non-normative) note in
NamesList.txt suggesting the use of the ONE DOT LEADER. It is just one of
the possible fallbacks, but it has the wrong properties for encoding plain
text. It is only useful as a rendering fallback, and not even very useful
for that, because almost no font maps this character: leader dots are
preferably rendered another way, by drawing a dotted line. Some text
renderers may use the leader dot only when they need to transform a leader
space into a dotted line and need a glyph for that, but note that they will
also need to control the spacing and margins, and will probably always put
it on the baseline like regular full stops.
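
As a rough illustration of the "wrong properties" point, here is a minimal
Python sketch using only the standard unicodedata module. It reports the
name, general category and NFKC form of the dots mentioned in this thread.
The helper name describe is made up for this example, and the exact names
returned depend on the Unicode version bundled with the Python build.

```python
import unicodedata

# Code points discussed in this thread.
CANDIDATES = {
    "leader dot": "\u2024",
    "Armenian full stop": "\u0589",
    "middle dot": "\u00b7",
    "ASCII full stop": "\u002e",
}

def describe(ch):
    """Return (name, general category, NFKC form) for one character.

    `describe` is a hypothetical helper written for this sketch only.
    """
    name = unicodedata.name(ch, "<no name in this Unicode version>")
    return name, unicodedata.category(ch), unicodedata.normalize("NFKC", ch)

for label, ch in CANDIDATES.items():
    name, cat, nfkc = describe(ch)
    folded = " ".join(f"U+{ord(c):04X}" for c in nfkc)
    print(f"U+{ord(ch):04X} {label}: {name}, gc={cat}, NFKC -> {folded}")
```

U+2024 carries a compatibility decomposition to U+002E, so any pipeline
that applies NFKC (or NFKD) folds it into the ordinary full stop, and a
mijaket stored as U+2024 would lose its distinction there. U+0589 and
U+00B7 are left unchanged by the same normalization.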

A better fallback is the middle dot (but with additional thin space around
it). Still, for the semantics, we should not have to use such rendering
fallbacks for composing plain text. Imagine what we want to enter in a
database of texts, or in translation engines that don't know and should not
have to worry about fonts, font styles or metrics: here we need a clear
semantic distinction of the mijaket (a colon or semicolon articulating two
phrases in the same sentence, or at the end of an introductory sentence
followed by one value or a list of Armenian words, itself terminated by an
Armenian full stop, U+0589).

You'll note that on Wikipedia, the ArmSCII table at the top of the page was
composed and rendered (in LaTeX) with the middle dot, and it is clearly
distinguished from the ASCII full stop and the Armenian full stop. You will
find no mention there of the ONE DOT LEADER.

This is especially important because today Armenian will be written using
either "modern" (ASCII) punctuation (as in English, with colons,
semicolons, and full stops) or traditional punctuation. And it cannot be
predicted in which context the translated texts will be used (modern/ASCII
or traditional), so we have an ambiguity about how to translate and
represent colons/semicolons and full stops.

The Armenian full stop is clearly encoded. The Armenian [semi]colon is not,
and we only have fallbacks. So we need the mijaket to be encoded, and U+0588
(unallocated and just before the distinctive U+0589 Armenian full stop) is
the best place. Even in the Unicode representative chart, you'll note that
the characters are slanted, including the punctuation, and the dots become
ovals. Various Armenian texts use square dots (apparently drawn as a small
nearly vertical stroke with a pencil or pen).

This will leave the renderers choosing how to render the two Armenian
punctuations (either traditional or modern) and will preserve the
semantics of the text without conflicting with other rendering options (for
the leaders in TOCs or tabular data, which may eventually use U+2024 with
some rare fonts specific to the rendering engine and its own typographical
engine, if it ever needs a font for its glyphs; but even in that case these
internal fonts will not need to be Unicode encoded, they will just be a
collection of glyphs for the intended rendering effect and the styles it
wants to support).

For now the immediate real need is for fully translating interfaces in
applications and allowing them to support either a "modern" style
(English/ASCII punctuation) or a "traditional" style. No fallback characters
should be encoded in these texts, so that no confusion will arise if ever
one uses both the real Armenian full stop (two dots) and a fallback for the
distinctive missing mijaket (single dot, to be distinguished also from
leaders and from decimal separators in numbers or abbreviation dots). The
newly encoded mijaket may include a note suggesting the use of the MIDDLE
DOT as a preferable fallback.


2017-12-05 21:35 GMT+01:00 Asmus Freytag via Unicode :

> On 12/5/2017 11:28 AM, Philippe Verdy via Unicode wrote:
>
> U+2024 is not supported in any fonts I have loaded. A websearch of mijaket
> gives nothing.
> U+2024 is used as a "leader dot", and does not match the expected metrics
> (it is certainly not a mijaket, it should be more like U+0589, i.e. as a
> bold parallelogram, and not a thin leader dot).
>
> Leader dots are NOT used as real punctuation, they are presentational, for
> example in TOC (table of contents), where they are aligned in arbitrarily
> long rows.
>
> The note in http://www.unicode.org/charts/PDF/U2000.pdf is absolutely not
> normative and in fact it is wrong in my opinion.
>
> The mijaket (Armenian colon) should be encoded (preferably at U+0588 in
> the Armenian block) as it also has to be distinguished from leader dots in
> Armenian TOC, exactly like the vertsaket was distinguished at U+0589.
>
>
> Well, unless someone (you?) writes a proposal to that effect
>
> (I don't know the history of this particular "unification" but on the face
> of it would share your concern that unifying something with a very specific
> functionality and metrics, leader dots, with ordinary script-specific
> punctuation is not helpful - unless it can be shown that 

Re: Armenian Mijaket (Armenian colon)

2017-12-05 Thread Asmus Freytag via Unicode

  
  
On 12/5/2017 11:28 AM, Philippe Verdy
  via Unicode wrote:


  U+2024 is not supported in any fonts I have loaded.
  A websearch of mijaket gives nothing.
  U+2024 is used as a "leader dot", and does not match the
  expected metrics (it is certainly not a mijaket, it should be
  more like U+0589, i.e. as a bold parallelogram, and not a thin
  leader dot).


Leader dots are NOT used as real punctuation, they are
  presentational, for example in TOC (table of contents), where
  they are aligned in arbitrarily long rows.


The note in http://www.unicode.org/charts/PDF/U2000.pdf
  is absolutely not normative and in fact it is wrong in my
  opinion.


The mijaket (Armenian colon) should be encoded (preferably
  at U+0588 in the Armenian block) as it also has to be
  distinguished from leader dots in Armenian TOC, exactly like
  the vertsaket was distinguished at U+0589.
  


Well, unless someone (you?) writes a proposal to that effect

(I don't know the history of this particular "unification", but on
the face of it I would share your concern that unifying something with
a very specific functionality and metrics, leader dots, with
ordinary script-specific punctuation is not helpful, unless it can
be shown that this unification is widely supported in practice.
If your claim that U+2024 is unsupported is correct, that
would strengthen the case for reconsidering this; however, the case
would have to be made in a formal proposal first).

A./


  


  
  
2017-12-05 19:59 GMT+01:00 S. Gilles :
  

  On 2017-12-05T18:44:05+0100, Philippe Verdy via Unicode wrote:
> The Armenian script has its own distinctive punctuation (vertsaket) for the
> standard full stop at end of sentence (whose glyph looks very much like the
> Basic Latin/ASCII colon, however slightly more bold and slanted and whose
> dots are rectangular). It is encoded at U+0589. And used in traditional
> texts instead of the "modern" full stop.
>
> But Armenian also has its own distinctive punctuation (mijaket) for the
> introductory colon between two phrases of the same sentence (whose glyph
> looks very much like the Basic Latin/ASCII full stop). It is not encoded
> and I don't like using the ASCII full stop where it causes confusion.
>
> Where is the Armenian distinctive mijaket? Shouldn't it be encoded at
> U+0588?

  

Off-list because I generally don't know what I'm talking about, but
grepping NamesList.txt for ‘mijaket’ gives U+2024. If this isn't
what you're looking for, my apologies.

--
S. Gilles
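
The lookup described above can be approximated in Python. This is only a
sketch: the function name find_in_nameslist is made up for this example,
and the SAMPLE lines are written by hand in the NamesList.txt layout (a
code point, a tab and a name, followed by tab-indented annotation lines)
rather than copied verbatim from the file.

```python
import re

# A character entry starts with a hex code point, a tab, then the name.
ENTRY = re.compile(r"^([0-9A-F]{4,6})\t(.+)$")

def find_in_nameslist(lines, term):
    """Yield (code point, name, matching line) for each hit of `term`.

    Hypothetical helper: matches the term in a character's name line or
    in any tab-indented annotation line that follows it.
    """
    current = None
    term = term.lower()
    for raw in lines:
        line = raw.rstrip("\n")
        m = ENTRY.match(line)
        if m:
            current = (m.group(1), m.group(2))
        elif not line.startswith("\t"):
            # Block headers and file comments belong to no character.
            current = None
            continue
        if current and term in line.lower():
            yield current[0], current[1], line.strip()

# Hand-written sample in the NamesList.txt layout (illustrative wording).
SAMPLE = [
    "@@\t2000\tGeneral Punctuation\t206F\n",
    "2024\tONE DOT LEADER\n",
    "\t* also used as an Armenian semicolon (mijaket)\n",
    "0589\tARMENIAN FULL STOP\n",
    "\t= vertsaket\n",
]

for cp, name, hit in find_in_nameslist(SAMPLE, "mijaket"):
    print(f"U+{cp} {name}: {hit}")
```

To run it against a real copy of the file, pass an open file object instead
of SAMPLE; the path and encoding are up to the reader.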
  


  



  



Re: Armenian Mijaket (Armenian colon)

2017-12-05 Thread Philippe Verdy via Unicode
Note that "Noto Sans Armenian" does not even map U+2024 (I doubt it is
accepted as a real replacement for the missing Armenian mijaket, which plays
a role similar to a Latin semicolon or colon); it does map the hyphen at
U+2010. But U+0589 (the Armenian "vertsaket", the Armenian full stop that
looks like a ":" Latin colon) is mapped.

My opinion is that the one dot leader has only been used by some sources
that don't need to render tabular data or TOCs. The sources needing these
traditional distinctions are probably religious texts, and clearly they
don't even look like the representative glyph in the Unicode PDF. "Noto
Sans Armenian" is designed for modern use on display, and even there we'll
need a better distinction and better metrics than going with a possible
"Noto Sans" mapping of the leader dot at U+2024 (which still does not
exist). In fact leaders are better represented another way than by
repeating this character: leaders are essentially parsed in arbitrary
lengths like a tabulation whitespace, so the leader dot is not semantically
suitable at all as a mijaket. It is as if we wanted to replace ASCII full
stops, colons and semicolons in English by SPACE or TAB: in Armenian this
just causes havoc.


2017-12-05 20:28 GMT+01:00 Philippe Verdy :

> U+2024 is not supported in any fonts I have loaded. A websearch of mijaket
> gives nothing.
> U+2024 is used as a "leader dot", and does not match the expected metrics
> (it is certainly not a mijaket, it should be more like U+0589, i.e. as a
> bold parallelogram, and not a thin leader dot).
>
> Leader dots are NOT used as real punctuation, they are presentational, for
> example in TOC (table of contents), where they are aligned in arbitrarily
> long rows.
>
> The note in http://www.unicode.org/charts/PDF/U2000.pdf is absolutely not
> normative and in fact it is wrong in my opinion.
>
> The mijaket (Armenian colon) should be encoded (preferably at U+0588 in
> the Armenian block) as it also has to be distinguished from leader dots in
> Armenian TOC, exactly like the vertsaket was distinguished at U+0589.
>
>
> 2017-12-05 19:59 GMT+01:00 S. Gilles :
>
>> On 2017-12-05T18:44:05+0100, Philippe Verdy via Unicode wrote:
>> > The Armenian script has its own distinctive punctuation (vertsaket) for the
>> > standard full stop at end of sentence (whose glyph looks very much like the
>> > Basic Latin/ASCII colon, however slightly more bold and slanted and whose
>> > dots are rectangular). It is encoded at U+0589. And used in traditional
>> > texts instead of the "modern" full stop.
>> >
>> > But Armenian also has its own distinctive punctuation (mijaket) for the
>> > introductory colon between two phrases of the same sentence (whose glyph
>> > looks very much like the Basic Latin/ASCII full stop). It is not encoded
>> > and I don't like using the ASCII full stop where it causes confusion.
>> >
>> > Where is the Armenian distinctive mijaket? Shouldn't it be encoded at
>> > U+0588?
>>
>> Off-list because I generally don't know what I'm talking about, but
>> grepping NamesList.txt for ‘mijaket’ gives U+2024. If this isn't
>> what you're looking for, my apologies.
>>
>> --
>> S. Gilles
>>
>
>


Re: Armenian Mijaket (Armenian colon)

2017-12-05 Thread Philippe Verdy via Unicode
U+2024 is not supported in any fonts I have loaded. A websearch of mijaket
gives nothing.
U+2024 is used as a "leader dot", and does not match the expected metrics
(it is certainly not a mijaket, it should be more like U+0589, i.e. as a
bold parallelogram, and not a thin leader dot).

Leader dots are NOT used as real punctuation, they are presentational, for
example in TOC (table of contents), where they are aligned in arbitrarily
long rows.

The note in http://www.unicode.org/charts/PDF/U2000.pdf is absolutely not
normative and in fact it is wrong in my opinion.

The mijaket (Armenian colon) should be encoded (preferably at U+0588 in the
Armenian block) as it also has to be distinguished from leader dots in
Armenian TOC, exactly like the vertsaket was distinguished at U+0589.


2017-12-05 19:59 GMT+01:00 S. Gilles :

> On 2017-12-05T18:44:05+0100, Philippe Verdy via Unicode wrote:
> > The Armenian script has its own distinctive punctuation (vertsaket) for the
> > standard full stop at end of sentence (whose glyph looks very much like the
> > Basic Latin/ASCII colon, however slightly more bold and slanted and whose
> > dots are rectangular). It is encoded at U+0589. And used in traditional
> > texts instead of the "modern" full stop.
> >
> > But Armenian also has its own distinctive punctuation (mijaket) for the
> > introductory colon between two phrases of the same sentence (whose glyph
> > looks very much like the Basic Latin/ASCII full stop). It is not encoded
> > and I don't like using the ASCII full stop where it causes confusion.
> >
> > Where is the Armenian distinctive mijaket? Shouldn't it be encoded at
> > U+0588?
>
> Off-list because I generally don't know what I'm talking about, but
> grepping NamesList.txt for ‘mijaket’ gives U+2024. If this isn't
> what you're looking for, my apologies.
>
> --
> S. Gilles
>


Armenian Mijaket (Armenian colon)

2017-12-05 Thread Philippe Verdy via Unicode
The Armenian script has its own distinctive punctuation (vertsaket) for the
standard full stop at end of sentence (whose glyph looks very much like the
Basic Latin/ASCII colon, however slightly more bold and slanted and whose
dots are rectangular). It is encoded at U+0589. And used in traditional
texts instead of the "modern" full stop.

But Armenian also has its own distinctive punctuation (mijaket) for the
introductory colon between two phrases of the same sentence (whose glyph
looks very much like the Basic Latin/ASCII full stop). It is not encoded
and I don't like using the ASCII full stop where it causes confusion.

Where is the Armenian distinctive mijaket? Shouldn't it be encoded at
U+0588?