Re: 0027, 02BC, 2019, or a new character?

2018-02-22 Thread Martin J. Dürst via Unicode

On 2018/02/21 12:15, Michael Everson via Unicode wrote:

I absolutely disagree. There’s a whole lot of related languages out there, and 
the speakers share some things in common. Orthographic harmonization between 
these languages can ONLY help any speaker of one to access information in any 
of the others. That expands people’s worlds. That would be a good goal.


It's definitely a good goal. But it's not rocket science to learn the 
different orthographies. If the languages are similar, then different 
orthographies are just a minor nuisance. As an example, German and Dutch 
also have different orthographies, but that's really a very minor issue 
when learning one language from the other even though these languages 
are very close.


Regards,   Martin.


Re: Suggestions?

2018-02-22 Thread via Unicode

On 22.02.2018 05:01, David Starner via Unicode wrote:

On Wed, Feb 21, 2018 at 7:55 AM Jeb Eldridge via Unicode
 wrote:


Where can I post suggestions and feedback for Unicode?


Here is as good as any place. There are specific places for a few
specific things, but likely if you do have something that's likely to
get changed, you'll need the help of someone here to get through the
process. It is a quarter-century old technical standard embedded in
most electronics, so I would temper any expectations for major
changes; it works the way it works because that's the way previous
versions worked, and nobody is interested in the trouble changing
them would involve.



Yes and no. This list is for informal discussion, so someone unsure 
about things may start here, but posting on this list does not count as 
formal feedback or suggestions to Unicode. So by all means post some of 
your ideas here and learn more.


Regards
John Knightley







Re: Bidi edge cases in Hangul and Indic

2018-02-22 Thread Ken Whistler via Unicode

David,


On 2/22/2018 7:21 PM, David Corbett via Unicode wrote:

My confusion stems from Unicode’s online bidi utility.


That bidi utility has known defects in it. It is not yet conformant with 
changes to UBA 6.3, let alone later changes to UBA. And the mapping of 
memory position to display position in that utility does not take into 
account complex mapping that has to occur in the layout engines and 
fonts in real applications.


--Ken


Re: Bidi edge cases in Hangul and Indic

2018-02-22 Thread David Corbett via Unicode
On Thu, Feb 22, 2018 at 6:32 PM, Ken Whistler wrote:

>
> If you override the normal left-to-right ordering with bidi override
> controls, then the layout order is reversed, but what is actually laid out
> is those two glyphs. So you just reverse the order of the two syllables for
> display, in either case.
>

My confusion stems from Unicode’s online bidi utility. Compare
https://unicode.org/cldr/utility/bidi.jsp?a=%E2%80%AE%EB%B3%B4%EA%B8%B0
(NFC) to
https://unicode.org/cldr/utility/bidi.jsp?a=%E2%80%AE%E1%84%87%E1%85%A9%E1%84%80%E1%85%B5
(NFD). Concatenating each one’s characters in reordered display position
order produces canonically different results.

Here is a more practical example. A sequence of an emoji modifier base and
an emoji modifier in an RTL run will be display-reordered such that the
modifier is to the left of the base. Clearly, the right thing is to not reorder
them, because they should ligate to form a single glyph. Contrast this with
“fl” in an RTL run, which will be display-reordered to “lf”: it would be
wrong to apply the previous rationale here just because “fl” may have a
single glyph.

It sounds like the UBA doesn’t specify how to reorder the glyphs of the
characters within a level run. That’s about what I expected. I was just
worried it might require an easily implemented but wrong order, so thanks
for the reassurance.
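
To make the contrast concrete, here is a minimal Python sketch. It is not part of the UBA and not how any real layout engine works; the helper names are hypothetical, and it makes the simplifying assumption that an emoji modifier (U+1F3FB..U+1F3FF) always attaches to whatever character precedes it, without checking that the preceding character is really a modifier base.

```python
def is_emoji_modifier(ch):
    # Fitzpatrick skin tone modifiers occupy U+1F3FB..U+1F3FF.
    return 0x1F3FB <= ord(ch) <= 0x1F3FF


def reverse_keeping_modifiers(s):
    """Reverse a run for RTL display, keeping each modifier after its base.

    Simplification (an assumption of this sketch): a modifier is glued to
    the preceding character whatever that character is.
    """
    units = []
    for ch in s:
        if units and is_emoji_modifier(ch):
            units[-1] += ch   # keep base + modifier as one unit
        else:
            units.append(ch)
    return "".join(reversed(units))


thumbs = "\U0001F44D\U0001F3FD"   # thumbs up + medium skin tone modifier
run = "a" + thumbs + "b"

naive = run[::-1]                          # modifier ends up before the base
grouped = reverse_keeping_modifiers(run)   # base + modifier stay together
```

With this grouping, "fl" is still reordered to "lf", because neither letter is an emoji modifier; only the base and modifier pair is kept in logical order.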


Re: Bidi edge cases in Hangul and Indic

2018-02-22 Thread Ken Whistler via Unicode



On 2/22/2018 11:39 AM, David Corbett via Unicode wrote:
For example, after a right-to-left override, the Hangul string 보기 
(“bogi”) becomes 기보 (“gibo”) in visual order. However, its NFD form is 
reordered by jamo instead of by syllable; that is, it looks like “igob”. 


Nope. *tilt* The UBA reorders the display order in layout -- not the 
underlying string.


"bogi" is the sequence <1107, 1169, 1100, 1175> in NFD or <BCF4, AE30> 
in NFC.


Because of canonical equivalence, for display of the NFD string, the 
sequence <1107,1169> needs to be mapped onto the same *glyph* as BCF4, 
and the sequence <1100,1175> onto the same *glyph* as AE30.


If you override the normal left-to-right ordering with bidi override 
controls, then the layout order is reversed, but what is actually laid 
out is those two glyphs. So you just reverse the order of the two 
syllables for display, in either case.


You could force display of "igob", but only if you had inserted some 
character in between the conjoining jamos that was preventing their 
equivalence to the syllables, anyway.
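
The code points above can be checked with Python's standard `unicodedata` module. The sketch below is only an illustration of the grouping argument: it reverses logical syllables in a string, whereas a real renderer reverses glyphs during layout. The helper names `code_points` and `reverse_by_syllable` are hypothetical.

```python
import unicodedata

nfc = "\uBCF4\uAE30"                      # "bogi" as two precomposed syllables
nfd = unicodedata.normalize("NFD", nfc)   # the same text as conjoining jamos


def code_points(s):
    """Return the code points of s as uppercase hex strings."""
    return [f"{ord(c):04X}" for c in s]


def reverse_by_syllable(s):
    """Reverse a Hangul string syllable by syllable.

    Composing to NFC first makes each syllable a single code point, so
    NFC and NFD input give the same result.
    """
    return "".join(reversed(unicodedata.normalize("NFC", s)))


print(code_points(nfc))   # ['BCF4', 'AE30']
print(code_points(nfd))   # ['1107', '1169', '1100', '1175']

by_jamo = nfd[::-1]       # naive per-code-point reversal, the "igob" case
```

Both `reverse_by_syllable(nfc)` and `reverse_by_syllable(nfd)` give <AE30, BCF4>, that is "gibo", while the naive `by_jamo` reversal is not canonically equivalent to it.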


I don’t think it is the intent of the algorithm that canonically 
equivalent strings display so very differently, but I can’t find any 
explicit guidance. What should a UBA-conformant renderer do?


The right thing. ;-)

--Ken



Bidi edge cases in Hangul and Indic

2018-02-22 Thread David Corbett via Unicode
Although the Unicode Bidirectional Algorithm clearly defines how to reorder
characters in memory, I don’t understand precisely what it means to display
one character after another once they’ve been reordered; specifically, when
bidi reordering changes the number of user-perceived characters.

For example, after a right-to-left override, the Hangul string 보기 (“bogi”)
becomes 기보 (“gibo”) in visual order. However, its NFD form is reordered by
jamo instead of by syllable; that is, it looks like “igob”. I don’t think
it is the intent of the algorithm that canonically equivalent strings
display so very differently, but I can’t find any explicit guidance. What
should a UBA-conformant renderer do?

Another unclear case is Indic clusters. षिक् is unambiguously two clusters,
but after an RLO, and after following rule L3 to put combining marks after
their bases, it looks like one cluster: क्षि. If Devanagari were actually
written right-to-left, I would expect it to stay as two clusters: क्‌षि.
Does the UBA prefer one rendering over the other, or is this outside its
scope?


Re: Coloured Characters

2018-02-22 Thread William_J_G Overington via Unicode
Richard Wordingham wrote:

> 'Foreground' and 'background' are the only externally defined colours. 
> There's no ability to explicitly choose, say 'text stroked sable and dotted 
> gules'.  Instead, it's 'text stroked sable and dotted proper', with a choice 
> of palettes to define 'proper'.

External selection of decoration colours would theoretically be possible; I do 
not know how difficult this would be to implement.

I remember posting about that somewhere some years ago but I cannot find it at 
the moment.

The following thread now mentions that possibility and also has, from 2014, an 
idea of how to have shading from one colour to another.

https://forum.high-logic.com/viewtopic.php?f=37&t=5024

In that thread, on 7 June 2014, I wrote as follows.

quote

The standardization process has a rule that if someone (individual or company) 
puts forward a proposal for standardization, then that person has to agree to 
provide a working demonstration.

I put forward some ideas for how to extend the COLR/CPAL model so as to provide 
colour shading of glyphs as well as the existing solid colour.

Yet I could not formally propose them for standardization as I do not have the 
facilities to provide a working demonstration.

end quote

So the ideas are there and maybe they could be implemented, though alas I 
cannot implement them myself.

William Overington

Thursday 22 February 2018





Re: metric for block coverage

2018-02-22 Thread Dreiheller, Albrecht via Unicode
Thanks a lot.

If I understand it right, these are examples in Sanskrit language using Tamil 
script?
More precisely, my question is whether there are examples in (today's) Tamil 
language using Danda or Double Danda.

I tried to detect these characters in Tamil's Wikipedia texts, but I didn't 
find any.
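
A search of this kind can be sketched in a few lines of Python. This is only an illustration, not the method actually used for the Wikipedia check: the function name and the sample string are hypothetical, and it assumes the dandas in question are the shared characters U+0964 and U+0965 encoded in the Devanagari block.

```python
DANDA = "\u0964"          # DEVANAGARI DANDA
DOUBLE_DANDA = "\u0965"   # DEVANAGARI DOUBLE DANDA


def is_tamil(ch):
    # The Tamil block is U+0B80..U+0BFF.
    return "\u0B80" <= ch <= "\u0BFF"


def count_dandas(text):
    """Count dandas and Tamil-block code points in a piece of text."""
    return {
        "danda": text.count(DANDA),
        "double_danda": text.count(DOUBLE_DANDA),
        "tamil_code_points": sum(1 for ch in text if is_tamil(ch)),
    }


# Made-up sample: the word "Tamil" in Tamil script, followed by a danda,
# then the same word followed by a double danda.
word = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"
sample = word + " " + DANDA + " " + word + " " + DOUBLE_DANDA
counts = count_dandas(sample)
```

Comparing the danda counts with the number of Tamil-block code points gives a rough sense of how often the characters occur in a corpus.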

Albrecht

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Richard 
Wordingham via Unicode
Sent: Tuesday, 20 February 2018 21:13
To: unicode@unicode.org
Subject: Re: metric for block coverage

On Tue, 20 Feb 2018 15:13:16 +
"Dreiheller, Albrecht via Unicode"  wrote:

> Could someone please supply an example (web link ...) for usage of
> danda / double danda in Tamil? Thanks, Albrecht

Take your pick from http://www.prapatti.com/slokas/slokasbyname.html .
Do they meet your requirements, or do you perhaps want text in the
Tamil language as opposed to PDFs of Sanskrit in Tamil script?  I found
the likes of my example by googling for 'Tamil Shloka' without quotes.

Richard.



Re: Coloured Characters

2018-02-22 Thread Richard Wordingham via Unicode
On Thu, 22 Feb 2018 10:55:23 + (GMT)
William_J_G Overington  wrote:

> Richard Wordingham wrote:
> 
> > 'Foreground' and 'background' are the only externally defined
> > colours. There's no ability to explicitly choose, say 'text stroked
> > sable and dotted gules'.  Instead, it's 'text stroked sable and
> > dotted proper', with a choice of palettes to define 'proper'.  

> External selection of decoration colours would theoretically be
> possible; I do not know how difficult this would be to implement.

The problem lies in changing existing interfaces.  I can only speak
with any real knowledge for the OpenType COLR/CPAL method.

The change would be a major pain in programming languages with
obligatory (even if implicit) typing.  At present, foreground and
background need to be specified (if only by default) and passed into
the painting routines. You now want to expand the foreground argument
into a list of colours - or possibly a callback routine.

The next issue is what is to happen when the list provided is too
short.  Without suitable handling, this may cause problems with fonts
that already work in applications that at one interface level know
nothing about colour fonts.  For example, the HTML code that I have been
using with my font knows nothing about colour fonts as such.  To get
colour with my web page, I just select a coloured font.

The final issue that springs to mind is that the COLR table of OpenType
allows for 65,535 different colours in glyphs; 0xFFFF is the only
reserved colour ID.  It represents the foreground colour.  If there is
only one palette in the font, 0xFFFE can be a legitimate
user-defined colour ID.  I wouldn't be surprised if such an
assignment survived the transition from a proof-of-principle font to
a released font.
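
The interplay between the reserved ID and a too-short colour list can be sketched as follows. This is a hypothetical illustration, not the API of any real rendering library: `resolve_colour`, its parameters, and the choice to fall back to the foreground colour when the list is too short are all assumptions made for the example.

```python
FOREGROUND_ID = 0xFFFF   # reserved COLR colour ID: use the text foreground


def resolve_colour(colour_id, colours, foreground):
    """Map a COLR colour ID to a concrete colour.

    colours is the list of externally supplied colours; foreground is the
    current text colour. Falling back to the foreground when the list is
    too short is an assumption of this sketch, not a rule from OpenType.
    """
    if colour_id == FOREGROUND_ID:
        return foreground
    if 0 <= colour_id < len(colours):
        return colours[colour_id]
    return foreground   # list too short: assumed fallback policy


sable = (0, 0, 0)
gules = (255, 0, 0)
supplied = [gules]       # caller supplied only one decoration colour

stroke = resolve_colour(FOREGROUND_ID, supplied, sable)   # foreground
dots = resolve_colour(0, supplied, sable)                 # first supplied colour
missing = resolve_colour(0xFFFE, supplied, sable)         # valid ID, list too short
```

Note that 0xFFFE goes through the ordinary lookup path: only 0xFFFF is special-cased, which is why a font using 0xFFFE as a real colour ID needs the short-list case to be handled somehow.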

A less painful method for interfaces might be the selection of
palettes by name.  However, there are rather more possible colour
combinations than can be accommodated in an sfnt name table, so an
approximation algorithm would be required.  It would also make the CPAL
tables larger and much more difficult to generate.  There are also 30
unassigned bits left in the palette's type attribute.

Of course, Unicode is not constrained by what is currently available,
and as an entity is interested at most in what is feasible rather than
the precise mechanisms.  Several full members, though, will care about
precise mechanisms.

Richard.


Re: IDC's versus Egyptian format controls

2018-02-22 Thread James Kass via Unicode
Martin J. Dürst wrote:

> Is it only me or did you get some of this data wrong?

Yes, sorry.  There's an offset.  I copy/pasted data from an archive
which apparently predates the formal release of Ext C, and IIRC there
was some shifting.  Unfortunately the font I used to view the data
matches the data, and so is also incorrect.