Re: [HarfBuzz] Ligatures

2020-05-26 Thread Khaled Hosny



> On May 24, 2020, at 6:34 PM, Eli Zaretskii  wrote:
> 
>> From: Khaled Hosny 
>> Date: Sun, 24 May 2020 18:00:45 +0200
>> Cc: harfbuzz@lists.freedesktop.org
>> 
> 
>> This, for example, ensures that HarfBuzz can do basic Arabic-like shaping 
>> across item boundaries, e.g. if you break items in the middle of an Arabic 
>> word (due to a font change, for example), you still get the 
>> initial/medial/final forms across the boundary as appropriate. It also 
>> ensures that a combining mark at the start of a paragraph is put on a 
>> dotted circle, as it otherwise has no base.
>> 
>> If this is not possible, then you can try to pass enough context, like 
>> reaching back and forward to the first character that is not a combining 
>> mark. This may or may not be enough.
>> 
>> Shaping space-delimited words is orthogonal to that; context should 
>> always be provided.
> 
> So this sounds like passing a physical line that ends in a newline
> should be good enough?  Or are there issues that cross newlines as
> well?

It should be enough.
> 
> And what is a "paragraph" in this context?

The same as in UAX#9.

>> Some fonts do have OpenType lookups that interact with space (e.g. kerning 
>> pairs involving space, or even substitutions involving space), so shaping 
>> words independently will give suboptimal results. You can use the HarfBuzz 
>> API to find out if the font has OpenType layout rules involving space, or 
>> decide to live with this limitation.
> 
> Which API provides this information?

https://harfbuzz.github.io/harfbuzz-hb-ot-layout.html#hb-ot-layout-lookup-collect-glyphs

But it requires some understanding of how OpenType lookups are structured. 
Checking how Firefox uses it might help.

Regards,
Khaled
___
HarfBuzz mailing list
HarfBuzz@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/harfbuzz


Re: [HarfBuzz] Ligatures

2020-05-25 Thread Eli Zaretskii
> Date: Sun, 24 May 2020 20:27:26 +0100
> From: Richard Wordingham 
> Cc: harfbuzz@lists.freedesktop.org
> 
> It seems to me that Emacs knows what script a cluster is in; perhaps
> it just hasn't united the concepts.

It's a kind of coincidence: different scripts almost always require
different fonts, and Emacs only composes characters displayed in the
same font.

> Users may have written some weird clustering combinations, and I can
> imagine some weird combinations in the Private Use Areas.  I should
> investigate.

Don't expect anything about PUA, Emacs doesn't assign any useful
properties to them.

> > That's a feature (you can disable it with disable-point-adjustment).
> 
> Is this documented in info, or does one have to trawl the code to find
> out what it does?

Every variable in Emacs has a doc string, and you can search them with
several apropos commands.  We don't describe every obscure variable in
the manual; there are too many of them.

> It seems that Emacs needs several levels of movement
> - by codepoints, by grapheme cluster, by akshara (will be the same as
> grapheme cluster in many cases) and by HarfBuzz cluster, or whatever
> is used to make access into lam-alif impossible.

I have no idea which one Emacs uses, not in these terms.  All I can
say is that, in HarfBuzz terms, we get the number of "elements" from
hb_buffer_get_length, and then index the arrays returned by
hb_buffer_get_glyph_infos.  Each "element" thus indexed is a separate
"thing" for display purposes, and Emacs by default won't let you
"enter" such a "thing", it will move across it in its entirety in one
go.
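A minimal sketch of how such indivisible "things" can be derived, assuming the `cluster` field of each `hb_glyph_info_t` returned by `hb_buffer_get_glyph_infos` has been copied into a plain list, and that cluster values are codepoint indices (as when text is added with `hb_buffer_add_utf32` or `hb_buffer_add_codepoints`) and monotonic in glyph order (HarfBuzz's default cluster levels). Note that several glyphs may share one cluster value, so the unit a cursor jumps over is the cluster rather than the glyph. The function names are hypothetical.

```python
from typing import List, Tuple


def cluster_spans(clusters: List[int], text_len: int) -> List[Tuple[int, int]]:
    """Group glyph cluster values into [start, end) codepoint ranges.

    Assumes monotonic cluster values, so each cluster runs from its own
    value up to the next larger cluster value (or the end of the text).
    Works for LTR (increasing) and RTL (decreasing) glyph order alike.
    """
    starts = sorted(set(clusters))  # several glyphs may share one cluster
    spans = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else text_len
        spans.append((start, end))
    return spans


def cursor_stops(clusters: List[int], text_len: int) -> List[int]:
    """Positions where point may rest if it is never allowed inside a cluster."""
    return [start for start, _ in cluster_spans(clusters, text_len)] + [text_len]
```

For "office" shaped with an "ffi" ligature, the glyphs would carry clusters `[0, 1, 4, 5]`, and `cursor_stops([0, 1, 4, 5], 6)` returns `[0, 1, 4, 5, 6]`: point can never land between the two f's or before the i.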


Re: [HarfBuzz] Ligatures

2020-05-24 Thread Richard Wordingham
On Sun, 24 May 2020 17:18:27 +0300
Eli Zaretskii  wrote:

> > Date: Sat, 23 May 2020 21:42:24 +0100
> > From: Richard Wordingham 

> > > As for different scripts: if the character codepoints are the
> > > same, Emacs currently assigns each character to a single script.  

> > I'll need to dig deeper.  Composition of both 'a' and Greek alpha
> > with an acute accent works, which suggests that the problem isn't
> > there for characters with a script property of 'inherited'.  

> Emacs currently leaves it up to HarfBuzz to guess the script, as it
> doesn't yet have the necessary smarts.

I thought the issue lay within Emacs.  HarfBuzz has been fairly
civilised about combining marks in the 'wrong' script run.  If I put
Thai marks in what is basically a Tai Tham script run, it seems to
treat them properly.  I do such a strange thing because the marks have
been borrowed into Tai Tham, but not yet encoded.  I was told I
couldn't do this in Emacs 24.

It seems to me that Emacs knows what script a cluster is in; perhaps
it just hasn't united the concepts.  Users may have written some weird
clustering combinations, and I can imagine some weird combinations in
the Private Use Areas.  I should investigate.

> > The behaviour in 27.05 is almost the same as for 24.4, but the
> > breaking in item (1) is automatically repaired.

> > Pressing the 'delete' key still deletes a single character, but maybe
> > that's because it's mapped to tpu-delete-current-char.  

It's OK, it's still working with emacs -q.  That means one can easily
replace the initial character of a cluster.

> If you press DEL (or Backspace), it will delete a single codepoint.

That only deletes the final cluster.

> > So, what's not working in Arabic is that one can't move the cursor
> > through ligatures.  
> 
> That's a feature (you can disable it with disable-point-adjustment).

Is this documented in info, or does one have to trawl the code to find
out what it does?  It seems that Emacs needs several levels of movement
- by codepoints, by grapheme cluster, by akshara (will be the same as
grapheme cluster in many cases) and by HarfBuzz cluster, or whatever
is used to make access into lam-alif impossible. Visible motion by
akshara is the minimum requirement for English, so that stepping
through 'ffi' will visibly advance the cursor.  LibreOffice Writer aims
to provide visible cursor motion at the grapheme cluster level, so one
can use the cursor to step through the consonants in an akshara.

By codepoint is useful for editing complex aksharas; it is even more
useful if the cursor acts like a cluster terminator, but that is
probably a matter of personal taste.  It will also be useful for
editing narrow phonetic transcriptions, which can be quite heavy on
diacritics.

By grapheme cluster (at least, by default grapheme cluster) is the level
encouraged by Unicode, and will give you letter-by-letter control even
if you're editing Sanskrit in an Indian script.  For Arabic, European
and Hebrew scripts, this is the same as akshara level.

By akshara is the current default movement level for most Indian scripts
in Emacs.  It is also the level at which most Hindi speakers
claim to operate.  (I get the impression, however, that a lot of
Indians do their fine level editing of complicated text in
transliteration!)

By HarfBuzz cluster takes you to the level where HarfBuzz will easily
give you cursor positions.  Now occasionally HarfBuzz's actual clusters
won't combine whole grapheme clusters or aksharas.  For example, Thai
vowels could be roughly placed for Thai without taking account of
the previous letters, just as on typewriters, and one can even handle
Thai tone marks like that.  It's possible that in these cases, HarfBuzz
will not form clusters.  How you handle these cases is up to you.  I
would make 'by HarfBuzz cluster' the coarsest.

I don't think motion by HarfBuzz cluster is useful - perhaps you know
of a use.
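To make the difference between the codepoint and grapheme-cluster levels concrete, here is a small sketch. Python's standard library has no UAX #29 segmenter, so the grapheme function is only an approximation: it treats nonspacing (Mn) and enclosing (Me) marks as extending the preceding character, and ignores spacing marks, ZWJ sequences, Hangul and Indic conjuncts, so it says nothing about aksharas. Both function names are hypothetical.

```python
import unicodedata
from typing import List


def codepoint_stops(text: str) -> List[int]:
    """Every codepoint boundary is a legal cursor position."""
    return list(range(len(text) + 1))


def _is_extending(ch: str) -> bool:
    # Rough stand-in for Grapheme_Cluster_Break=Extend: nonspacing and
    # enclosing marks only (spacing marks, ZWJ etc. are not handled).
    return unicodedata.category(ch) in ("Mn", "Me")


def approx_grapheme_stops(text: str) -> List[int]:
    """Cursor positions that never separate a base from its following marks."""
    stops = [0]
    for i in range(1, len(text)):
        if not _is_extending(text[i]):
            stops.append(i)
    if text:
        stops.append(len(text))
    return stops
```

For `"e\u0301a"` (e, combining acute, a), `codepoint_stops` gives `[0, 1, 2, 3]` while `approx_grapheme_stops` gives `[0, 2, 3]`; the HarfBuzz-cluster level would come from the shaper's output instead of from the text alone.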

Richard.


Re: [HarfBuzz] Ligatures

2020-05-24 Thread Eli Zaretskii
> From: Khaled Hosny 
> Date: Sun, 24 May 2020 18:00:45 +0200
> Cc: harfbuzz@lists.freedesktop.org
> 
> In general the safest is to pass the whole paragraph of text and the start 
> and length of each item (item being a run with same font, direction, script, 
> and language).

I was talking about text that has a single font, direction, script,
and language.

> This, for example, ensures that HarfBuzz can do basic Arabic-like shaping 
> across item boundaries, e.g. if you break items in the middle of an Arabic 
> word (due to a font change, for example), you still get the 
> initial/medial/final forms across the boundary as appropriate. It also 
> ensures that a combining mark at the start of a paragraph is put on a 
> dotted circle, as it otherwise has no base.
> 
> If this is not possible, then you can try to pass enough context, like 
> reaching back and forward to the first character that is not a combining 
> mark. This may or may not be enough.
> 
> Shaping space-delimited words is orthogonal to that; context should always 
> be provided.

So this sounds like passing a physical line that ends in a newline
should be good enough?  Or are there issues that cross newlines as
well?

And what is a "paragraph" in this context?

> Some fonts do have OpenType lookups that interact with space (e.g. kerning 
> pairs involving space, or even substitutions involving space), so shaping 
> words independently will give suboptimal results. You can use the HarfBuzz 
> API to find out if the font has OpenType layout rules involving space, or 
> decide to live with this limitation.

Which API provides this information?

Thanks.


Re: [HarfBuzz] Ligatures

2020-05-24 Thread Khaled Hosny



> On May 24, 2020, at 5:41 PM, Eli Zaretskii  wrote:
> 
>>> I almost understand (and agree), sans one part: the "arbitrary parts"
>>> of what you wrote.  If we want to produce a ligature out of "ffi", the
>>> shaper will get "ffi" and nothing more.  Which part here is arbitrary?
>> 
>> Sending "ffi" alone is an arbitrary decision. The font might have kerning 
>> between "ffi" and what comes before and after it, but you won't get it. The 
>> font might not have a ligature for "ffi" at all, but use kerning instead, 
>> so you will get kerning between "ffi" glyphs and not other glyphs which is 
>> arbitrary. It might be a cursive font that changes glyph shapes based on 
>> surrounding glyphs, and you will get that for "ffi" and not elsewhere which 
>> is arbitrary.
>> 
>> That is just plain wrong, there is no way around it.
> 
> So, to make sure I understand the correct solution: you are saying
> that all the text to be displayed should go through the shaper, is
> that right?
> 
> If so, how large should be the chunks of text to be passed to the
> shaper in any one call, in order to have a correct result?  Would it
> be enough to pass whitespace-separated words one by one? or do we need
> to send entire physical lines (up to the terminating newline
> character)? or maybe an entire paragraph?  What is the recommendation
> here?

In general the safest is to pass the whole paragraph of text and the start and 
length of each item (item being a run with same font, direction, script, and 
language).
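As a minimal sketch of the "whole paragraph plus start and length of each item" approach, the helper below splits a paragraph into items, given one key per codepoint describing its (font, direction, script, language), and returns `(offset, length)` pairs. In HarfBuzz these would correspond to the `item_offset` and `item_length` arguments of `hb_buffer_add_utf32` or `hb_buffer_add_codepoints`, which count codepoints; with `hb_buffer_add_utf8` they would have to be byte offsets instead. The key values and the function name are hypothetical, and computing the keys (font fallback, bidi, script itemization) is out of scope here.

```python
from itertools import groupby
from typing import Hashable, List, Sequence, Tuple


def itemize(keys: Sequence[Hashable]) -> List[Tuple[int, int]]:
    """Split a paragraph into maximal runs of equal keys.

    keys[i] describes codepoint i, e.g. (font, direction, script, language).
    Returns (item_offset, item_length) pairs covering the whole paragraph.
    """
    items = []
    offset = 0
    for _, run in groupby(keys):
        length = sum(1 for _ in run)
        items.append((offset, length))
        offset += length
    return items
```

Each pair would then be shaped with the full paragraph in the buffer, so the shaper can see the text on either side of the item.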

This, for example, ensures that HarfBuzz can do basic Arabic-like shaping 
across item boundaries, e.g. if you break items in the middle of an Arabic word 
(due to a font change, for example), you still get the initial/medial/final forms 
across the boundary as appropriate. It also ensures that a combining mark at the 
start of a paragraph is put on a dotted circle, as it otherwise has no base.
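A toy model of why context matters for Arabic: the function below assigns isolated/initial/medial/final forms from each letter's neighbours, looking at the whole text but reporting forms only for one item. The joining classes are a tiny hard-coded subset; real shapers use the Unicode Joining_Type data, skip transparent marks and handle ZWJ/ZWNJ, none of which is modelled here. `positional_forms` is a hypothetical name, not a HarfBuzz API.

```python
from typing import List, Optional

# Tiny subset of Arabic joining classes, for illustration only.
DUAL_JOINING = set("\u0628\u062A\u0633\u0644\u0645\u0646\u0647\u064A")  # beh, teh, seen, lam, meem, noon, heh, yeh
RIGHT_JOINING = set("\u0627\u062F\u0631\u0648")  # alef, dal, reh, waw

FORMS = {(False, False): "isol", (False, True): "init",
         (True, True): "medi", (True, False): "fina"}


def positional_forms(text: str, start: int = 0, end: Optional[int] = None) -> List[str]:
    """Forms for text[start:end], using neighbours from the whole text."""
    if end is None:
        end = len(text)

    def joins_forward(i: int) -> bool:
        # Letter i connects to the letter after it (in logical order).
        return (i + 1 < len(text)
                and text[i] in DUAL_JOINING
                and (text[i + 1] in DUAL_JOINING or text[i + 1] in RIGHT_JOINING))

    forms = []
    for i in range(start, end):
        if text[i] not in DUAL_JOINING and text[i] not in RIGHT_JOINING:
            forms.append("none")  # not a joining letter in this toy table
            continue
        joined_to_prev = i > 0 and joins_forward(i - 1)
        forms.append(FORMS[(joined_to_prev, joins_forward(i))])
    return forms
```

With `word = "\u0628\u062A\u0628"` (beh, teh, beh) split into items `[0, 2)` and `[2, 3)` by a font change, shaping each item with the paragraph as context gives `["init", "medi"]` and `["fina"]`, while shaping `word[0:2]` and `word[2:3]` in isolation gives `["init", "fina"]` and `["isol"]`: the broken joining that Khaled describes.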

If this is not possible, then you can try to pass enough context, like reaching 
back and forward to the first character that is not a combining mark. This may or 
may not be enough.
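The "pass enough context" fallback can be sketched as follows: widen an item backwards and forwards until a character that is not a combining mark is reached (and include it), then give the shaper the widened range as text and the original range as the item. Treating general categories Mn, Mc and Me as "combining marks" is an assumption of this sketch, and `context_range` is a hypothetical name.

```python
import unicodedata
from typing import Tuple


def _is_mark(ch: str) -> bool:
    # Assumption: general categories Mn, Mc and Me count as combining marks.
    return unicodedata.category(ch).startswith("M")


def context_range(text: str, start: int, end: int) -> Tuple[int, int]:
    """Widen [start, end) to reach the nearest non-mark on each side."""
    ctx_start = start
    while ctx_start > 0:
        ctx_start -= 1
        if not _is_mark(text[ctx_start]):
            break  # found a base (or other non-mark) before the item
    ctx_end = end
    while ctx_end < len(text):
        ctx_end += 1
        if not _is_mark(text[ctx_end - 1]):
            break  # found a non-mark after the item
    return ctx_start, ctx_end
```

The item could then be added with `text[ctx_start:ctx_end]` as the buffer text, `start - ctx_start` as `item_offset` and `end - start` as `item_length` (for example via `hb_buffer_add_codepoints`). For `"xa\u0301\u0302b\u0303y"`, an item holding only the two marks, `[2, 4)`, widens to `(1, 5)`, which brings their base `a` back into view.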

Shaping space-delimited words is orthogonal to that; context should always be 
provided.

Some fonts do have OpenType lookups that interact with space (e.g. kerning 
pairs involving space, or even substitutions involving space), so shaping words 
independently will give suboptimal results. You can use the HarfBuzz API to find 
out if the font has OpenType layout rules involving space, or decide to live with 
this limitation. Firefox does this check because it wants to cache individually 
shaped words when possible, and Chrome used to do that too, but I think they now 
make sure to retain enough information to avoid unnecessary reshaping, so such a 
word cache is not needed.
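A rough sketch of the kind of word cache described above, assuming the caller has already determined (for example with `hb_ot_layout_lookup_collect_glyphs`) whether any of the font's lookups involve the space glyph. The shaping callback and the glyph tuple type are hypothetical placeholders; the point is only the decision about when a cached per-word result may be reused. A real cache would also need to be keyed on font, size, features, script and direction, which is why one cache instance per font is assumed here.

```python
import re
from typing import Callable, Dict, List, Tuple

Glyphs = List[Tuple[int, int]]  # placeholder: (glyph_id, advance) pairs


class WordCache:
    """Reuse per-word shaping results only if the font ignores spaces."""

    def __init__(self, shape: Callable[[str], Glyphs], font_uses_space: bool):
        self._shape = shape                    # hypothetical shaping callback
        self._font_uses_space = font_uses_space
        self._cache: Dict[str, Glyphs] = {}
        self.shape_calls = 0                   # counts real shaping calls

    def _call_shaper(self, text: str) -> Glyphs:
        self.shape_calls += 1
        return self._shape(text)

    def shape_line(self, line: str) -> Glyphs:
        if self._font_uses_space:
            # Lookups involve space, so words are not independent: shape it all.
            return self._call_shaper(line)
        out: Glyphs = []
        for piece in re.split(r"( )", line):   # words and single spaces
            if not piece:
                continue
            if piece not in self._cache:
                self._cache[piece] = self._call_shaper(piece)
            out.extend(self._cache[piece])
        return out
```

With a font whose lookups ignore space, `"ab ab ab"` needs only two shaping calls (for `"ab"` and `" "`); with a space-sensitive font the whole line is shaped once, uncached.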

Regards,
Khaled


Re: [HarfBuzz] Ligatures

2020-05-24 Thread Eli Zaretskii
> > I almost understand (and agree), sans one part: the "arbitrary parts"
> > of what you wrote.  If we want to produce a ligature out of "ffi", the
> > shaper will get "ffi" and nothing more.  Which part here is arbitrary?
> 
> Sending "ffi" alone is an arbitrary decision. The font might have kerning 
> between "ffi" and what comes before and after it, but you won't get it. The 
> font might not have a ligature for "ffi" at all, but use kerning instead, 
> so you will get kerning between "ffi" glyphs and not other glyphs which is 
> arbitrary. It might be a cursive font that changes glyph shapes based on 
> surrounding glyphs, and you will get that for "ffi" and not elsewhere which 
> is arbitrary.
> 
> That is just plain wrong, there is no way around it.

So, to make sure I understand the correct solution: you are saying
that all the text to be displayed should go through the shaper, is
that right?

If so, how large should be the chunks of text to be passed to the
shaper in any one call, in order to have a correct result?  Would it
be enough to pass whitespace-separated words one by one? or do we need
to send entire physical lines (up to the terminating newline
character)? or maybe an entire paragraph?  What is the recommendation
here?

Thanks.


Re: [HarfBuzz] Ligatures

2020-05-24 Thread Eli Zaretskii
> Date: Sat, 23 May 2020 21:42:24 +0100
> From: Richard Wordingham 
> 
> > As for different scripts: if the character codepoints are the same,
> > Emacs currently assigns each character to a single script.
> 
> I'll need to dig deeper.  Composition of both 'a' and Greek alpha with
> an acute accent works, which suggests that the problem isn't there for
> characters with a script property of 'inherited'.

Emacs currently leaves it up to HarfBuzz to guess the script, as it
doesn't yet have the necessary smarts.

> > Emacs 24.4 is very old, and doesn't use HarfBuzz.  Please try Emacs 27
> > instead, it has several bugs in this area fixed, and will use HarfBuzz
> > if available at build time.
> 
> The behaviour in 27.05 is almost the same as for 24.4, but the
> breaking in item (1) is automatically repaired.  The process seems slow
> - I can see the glyph become final and then revert back to being
> medial.  I'm puzzled by not being able to step into lam-alif but being
> able to step through a series 'beh's.  The step into command for
> advancing codepoint by codepoint semiworks.  The cluster shaping
> doesn't break at the cursor - Handa gave me a C code fix so I could
> achieve that - but the number of steps into to pass through a cluster
> matches the number of codepoints.
> 
> Pressing the 'delete' key still deletes a single character, but maybe
> that's because it's mapped to tpu-delete-current-char.

If you press DEL (or Backspace), it will delete a single codepoint.

> So, what's not working in Arabic is that one can't move the cursor
> through ligatures.

That's a feature (you can disable it with disable-point-adjustment).

The rest of your observations seem to be too Emacs-specific to discuss
here.  You are welcome to submit an Emacs bug report if you think
something isn't working as it should, or would like to discuss
Emacs-specific details.

Thanks.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Richard Wordingham
On Sat, 23 May 2020 19:45:17 +0300
Eli Zaretskii  wrote:

> > Date: Sat, 23 May 2020 16:54:51 +0100
> > From: Richard Wordingham 
> > Cc: harfbuzz@lists.freedesktop.org
> >   
> > > Emacs supports more than one rule for each composable sequence of
> > > characters.  
> > 
> > That doesn't help when the rules give conflicting divisions into
> > clusters, which is the case with Tai Tham.  
> 
> The assumption is that either the rules can be arranged in an order
> that allows to use the first matching rule, or, failing that, that you
> write your own composing function that implements whatever logic
> that's required to select the right rule.

That choice needs to be tied to the choice of font - or for Tai Tham you use
my hack technique. However, it's not as bad as it could be.  There's
something strange going on in Tai Tham even at Emacs 27.05. I can have
two aksharas interacting for shaping, but it takes two 'ordinary' key
advances to pass through it, apparently implying that there are two
clusters. Clusters for cursor advancement and clusters for shaping seem
to be controlled independently!

From the dotted-circle insertion logic, Emacs 27.05 on my machine
definitely looks as though it's using some form of HarfBuzz.
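For context on that dotted circle: as far as can be read from HarfBuzz's source, U+25CC DOTTED CIRCLE is inserted only when the buffer is flagged as beginning of text (`HB_BUFFER_FLAG_BOT`), no pre-context was supplied, the first character is a combining mark, and the font has a glyph for U+25CC. When an item is added with a nonzero `item_offset`, the preceding text becomes pre-context, which is one reason passing the whole paragraph matters. The function below is a simplified Python model of that rule under those assumptions, not HarfBuzz code; its name and parameters are hypothetical.

```python
import unicodedata


def inserts_dotted_circle(item: str, beginning_of_text: bool,
                          pre_context: str = "", font_has_25cc: bool = True) -> bool:
    """Simplified model of when a shaper prepends U+25CC to a buffer."""
    if not beginning_of_text or pre_context or not item or not font_has_25cc:
        return False
    # General categories Mn, Mc and Me all count as combining marks here.
    return unicodedata.category(item[0]).startswith("M")
```

So a lone `"\u0301"` at the start of a paragraph gets a dotted circle, but the same mark shaped with its preceding base supplied as context does not.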

> > The Devanagari rule only covers the Vedic marks in the Devanagari
> > block, the 'stress signs' according to the comments.  Can rules
> > essentially for different scripts now share combining marks?  The
> > newer Vedic marks were supposed to be available to at least all
> > Indian Indic scripts.  
> 
> I don't know enough about this to make sure I even understand the
> question, let alone can provide an answer.  One thing I can say is
> that the regexp pattern in a rule can specify different context (the
> surrounding characters) even if the character that triggers the rule
> is the same.  Failing that, I guess the solution will again be the
> function that produces the composition.
> 
> As for different scripts: if the character codepoints are the same,
> Emacs currently assigns each character to a single script.

I'll need to dig deeper.  Composition of both 'a' and Greek alpha with
an acute accent works, which suggests that the problem isn't there for
characters with a script property of 'inherited'.

> > > Does Emacs indeed fail to wrap Arabic text?  can you show an
> > > example?  
> > 
> > Character level wrapping still almost works down at Emacs 24.4, but
> > I don't know that it wasn't broken in later enhancements.  There
> > are three features that make me think Emacs 24.4 might be different
> > to the current state of affairs:
> > 
> > (1) Clicking into the text breaks text before the cursor, but not
> > after it.
> > (2) I can't step into lam-alif the way I step into Indic clusters.
> > (3) Lam-alif isn't broken by line wrap.  
> 
> Emacs 24.4 is very old, and doesn't use HarfBuzz.  Please try Emacs 27
> instead, it has several bugs in this area fixed, and will use HarfBuzz
> if available at build time.

The behaviour in 27.05 is almost the same as for 24.4, but the
breaking in item (1) is automatically repaired.  The process seems slow
- I can see the glyph become final and then revert back to being
medial.  I'm puzzled by not being able to step into lam-alif but being
able to step through a series of 'beh's.  The step-into command for
advancing codepoint by codepoint semi-works.  The cluster shaping
doesn't break at the cursor - Handa gave me a C code fix so I could
achieve that - but the number of step-into commands needed to pass
through a cluster matches the number of codepoints.

Pressing the 'delete' key still deletes a single character, but maybe
that's because it's mapped to tpu-delete-current-char.

So, what's not working in Arabic is that one can't move the cursor
through ligatures.  It seems one can advance point through them
using a step-into command (dead reckoning is a useful fallback), but one
loses visual feedback.  Apart from that important matter, it looks as
though Arabic in Emacs already has the behaviours needed for shaping
Latin words.  Stepping into ligatures is enabled by evaluating "(setq
disable-point-adjustment t)".


Richard.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Simon Cozens

On 23/05/2020 08:44, Eli Zaretskii wrote:
> Thanks.  Since (b) is not really feasible without redesigning the
> entire Emacs display engine (for which I see no volunteers lining up
> any time soon), I guess we will have to use some more-or-less
> reasonable and somewhat unreliable heuristics by supporting only some
> ligatures that are known in advance.

Travelling further in the wrong direction is always an option, but don't 
expect it to get you closer to the right destination.

Full text shaping is the only way to get this right. Everything else is 
a hack, and piling hacks on top of hacks is just storing up maintenance 
problems for yourself.

I know that's hard to hear for a volunteer project where nobody really 
wants to invest the effort in this complicated niche stuff, but 
honestly, you're probably better off doing *nothing* than doing this.



Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> Cc: harfbuzz@lists.freedesktop.org
> From: Simon Cozens 
> Date: Sat, 23 May 2020 20:14:16 +0100
> 
> On 23/05/2020 08:44, Eli Zaretskii wrote:
> > Thanks.  Since (b) is not really feasible without redesigning the
> > entire Emacs display engine (for which I see no volunteers lining up
> > any time soon), I guess we will have to use some more-or-less
> > reasonable and somewhat unreliable heuristics by supporting only some
> > ligatures that are known in advance.
> 
> Travelling further in the wrong direction is always an option, but don't 
> expect it to get you closer to the right destination.

I don't think this is an adequate analogy.  What Emacs does is an
approximation to what should be done.  The approximation falls short
of the target, that's true, and might even produce clearly incorrect
results in some cases (although I've yet to see such cases, and I've
been using Emacs to edit non-ASCII text for 20 years).  But it is still
an approximation, so it is not really "the wrong direction" (which you
seem to interpret as 180 degrees off, otherwise even going in the
wrong direction might bring me closer to the destination, right?).


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> Date: Sat, 23 May 2020 20:06:32 +0100
> From: Richard Wordingham 
> 
> There are three different tools for producing what looks like an "ffi"
> ligature:
> 
> 1) Make a ligature
> 2) Contextual substitution
> 3) A mix of contextual substitution and kerning.
> 
> A font that uses the first will produce a ligature for Emacs.
> 
> A font that uses contextual substitution will not work - you will just
> see the 3 unligated characters with their default glyphs.
> 
> A font that uses a mix of contextual substitution and kerning will
> likewise fail.  However, it is possible that you might get the "ff"
> ligature and a normal 'i', or a normal 'f' and an "fi" ligature.
> 
> From the point of view of someone who expects full shaping, what result
> you get will be arbitrary, depending on how the font designer has
> marshalled his tools.

I understand.  Still, the result looks reasonably good in most cases,
especially in an editor whose main purpose is to edit programs, and
which doesn't pretend to produce typographical accuracy.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> From: Khaled Hosny 
> Date: Sat, 23 May 2020 20:54:15 +0200
> Cc: harfbuzz@lists.freedesktop.org
> 
> > We pass to the shaper the part of text that matches the regexps you
> > can see at the end of misc-lang.el, then display the glyphs the shaper
> > returns.  The above description is a high-level overview; there are
> > many details that I cannot describe in a short message.  For example,
> > for Arabic, when we get back the grapheme clusters, we lay them out,
> > then skip to the end of the text that we passed to the shaper.
> 
> You mean this:
> https://repo.or.cz/emacs.git/blob/HEAD:/lisp/language/misc-lang.el#l78
> 
> I’m not sure how to read it, but it seems to be missing the entire Arabic 
> Extended-A and Arabic Mathematical Alphabetic Symbols blocks. I’m also not 
> sure how it would handle using combining marks from other blocks with Arabic 
> text (say putting U+20D6 over an Arabic letter).

If you can suggest improvements to those patterns, please do, and
thanks.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Richard Wordingham
On Sat, 23 May 2020 21:26:00 +0300
Eli Zaretskii  wrote:

> > From: Khaled Hosny 
> > Date: Sat, 23 May 2020 20:09:50 +0200
> > Cc: harfbuzz@lists.freedesktop.org
> > 
> > Overall, if you can’t send the whole text (words are the absolute
> > minimum, but this has its issues as well), don’t just send
> > arbitrary parts of it as the result will be some inconsistent
> > mess.  
> 
> I almost understand (and agree), sans one part: the "arbitrary parts"
> of what you wrote.  If we want to produce a ligature out of "ffi", the
> shaper will get "ffi" and nothing more.  Which part here is arbitrary?

There are three different tools for producing what looks like an "ffi"
ligature:

1) Make a ligature
2) Contextual substitution
3) A mix of contextual substitution and kerning.

A font that uses the first will produce a ligature for Emacs.

A font that uses contextual substitution will not work - you will just
see the 3 unligated characters with their default glyphs.

A font that uses a mix of contextual substitution and kerning will
likewise fail.  However, it is possible that you might get the "ff"
ligature and a normal 'i', or a normal 'f' and an "fi" ligature.

From the point of view of someone who expects full shaping, what result
you get will be arbitrary, depending on how the font designer has
marshalled his tools.

Richard.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> From: Khaled Hosny 
> Date: Sat, 23 May 2020 20:40:44 +0200
> Cc: harfbuzz@lists.freedesktop.org
> 
> Sending “ffi” alone is an arbitrary decision. The font might have kerning 
> between “ffi” and what comes before and after it, but you won’t get it. The 
> font might not have a ligature for “ffi” at all, but use kerning instead, so 
> you will get kerning between “ffi” glyphs and not other glyphs which is 
> arbitrary. It might be a cursive font that changes glyph shapes based on 
> surrounding glyphs, and you will get that for “ffi” and not elsewhere which 
> is arbitrary.
> 
> That is just plain wrong, there is no way around it.

OK, thanks.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Khaled Hosny


> On May 23, 2020, at 8:34 PM, Eli Zaretskii  wrote:
> 
>> From: Khaled Hosny 
>> Date: Sat, 23 May 2020 20:18:33 +0200
>> Cc: harfbuzz@lists.freedesktop.org
>> 
>>> The Emacs display engine examines the text to be displayed and laid
>>> out one character at a time, and makes layout decisions after each
>>> character or grapheme cluster it lays out.  Its design is therefore
>>> fundamentally incompatible with shaping large substrings of buffer
>>> text at once.  We do support that for short sequences of characters,
>>> which seems to work well enough for complex shaping (a.k.a. "character
>>> compositions") of scripts that require that, but we still do that one
>>> grapheme cluster at a time.  
>> 
>> That wouldn’t work for Arabic. You can’t shape Arabic one grapheme cluster 
>> at a time (or any other text actually, but the brokenness in Arabic will be 
>> immediately obvious), so I’m almost certain that is not exactly how Arabic is 
>> handled in Emacs right now.
> 
> We pass to the shaper the part of text that matches the regexps you
> can see at the end of misc-lang.el, then display the glyphs the shaper
> returns.  The above description is a high-level overview; there are
> many details that I cannot describe in a short message.  For example,
> for Arabic, when we get back the grapheme clusters, we lay them out,
> then skip to the end of the text that we passed to the shaper.

You mean this:
https://repo.or.cz/emacs.git/blob/HEAD:/lisp/language/misc-lang.el#l78

I’m not sure how to read it, but it seems to be missing the entire Arabic 
Extended-A and Arabic Mathematical Alphabetic Symbols blocks. I’m also not sure 
how it would handle using combining marks from other blocks with Arabic text 
(say putting U+20D6 over an Arabic letter).
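To illustrate the coverage question, here is a hypothetical pattern (not the one in misc-lang.el) that starts a run at a character from the Arabic, Arabic Supplement, Arabic Extended-A, Arabic Presentation Forms-A/B or Arabic Mathematical Alphabetic Symbols blocks, and continues it through further Arabic characters or combining marks from the generic Combining Diacritical Marks (U+0300..U+036F) and Combining Diacritical Marks for Symbols (U+20D0..U+20FF) blocks, which covers the U+20D6 example. Which foreign marks an editor should accept is a policy question this sketch does not settle.

```python
import re

ARABIC = (
    "\u0600-\u06FF"              # Arabic
    "\u0750-\u077F"              # Arabic Supplement
    "\u08A0-\u08FF"              # Arabic Extended-A
    "\uFB50-\uFDFF"              # Arabic Presentation Forms-A
    "\uFE70-\uFEFF"              # Arabic Presentation Forms-B
    "\U0001EE00-\U0001EEFF"      # Arabic Mathematical Alphabetic Symbols
)
FOREIGN_MARKS = (
    "\u0300-\u036F"              # Combining Diacritical Marks
    "\u20D0-\u20FF"              # Combining Diacritical Marks for Symbols
)

# A run must start with an Arabic character; foreign marks may only follow.
ARABIC_RUN = re.compile(f"[{ARABIC}][{ARABIC}{FOREIGN_MARKS}]*")
```

For example, `ARABIC_RUN.match("\u0628\u20D6\u0627 x").group()` is `"\u0628\u20D6\u0627"` (beh, U+20D6, alef), while a run cannot start with U+20D6 on its own.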

What happens if one edits a file that contains only Arabic text, and why can’t 
that (whatever it is) be extended to any text?

>>> The character composition is implemented
>>> in Lisp, which is called by the display engine, and which then calls
>>> back into C to invoke the shaper.  This implementation is meant to
>>> allow a great deal of control on what should be composed and how.  But
>>> it is also relatively slow, which is another reason why doing that for
>>> all the text to be laid out is impractical: it slows down redisplay to
>>> the degree that it becomes annoying to users.
>> 
>> Having more control should not be at the price of doing things wrong.
> 
> No one said it should, that's just how things are.
> 
>> The whole composition concept of Emacs does not make any sense to me, all 
>> text is “composed”. You can have a special mode that would disable shaping 
>> for specific purposes (opening huge log files, wanting to see raw text with 
>> no bidi or shaping, etc), but this can be done in cooperation with HarfBuzz 
>> and not by bypassing it entirely.
> 
> We are talking about a piece of software designed 21 years ago.  I
> realize that it makes no sense to you, but that's what we have, and
> will probably have for the next 10 years or so.  We must make the most
> out of what we have.

So nearly as old as the first release of OpenOffice (not counting its 
StarOffice days). Anyway, bad decisions about text layout are quite rampant in 
software (old and new) and need to be fixed, but that is not my call.



Re: [HarfBuzz] Ligatures

2020-05-23 Thread Khaled Hosny


> On May 23, 2020, at 8:26 PM, Eli Zaretskii  wrote:
> 
>> From: Khaled Hosny 
>> Date: Sat, 23 May 2020 20:09:50 +0200
>> Cc: harfbuzz@lists.freedesktop.org
>> 
>> Overall, if you can’t send the whole text (words are the absolute minimum, 
>> but this has its issues as well), don’t just send arbitrary parts of it as 
>> the result will be some inconsistent mess.
> 
> I almost understand (and agree), sans one part: the "arbitrary parts"
> of what you wrote.  If we want to produce a ligature out of "ffi", the
> shaper will get "ffi" and nothing more.  Which part here is arbitrary?

Sending “ffi” alone is an arbitrary decision. The font might have kerning 
between “ffi” and what comes before and after it, but you won’t get it. The 
font might not have a ligature for “ffi” at all, but use kerning instead, so 
you will get kerning between “ffi” glyphs and not other glyphs which is 
arbitrary. It might be a cursive font that changes glyph shapes based on 
surrounding glyphs, and you will get that for “ffi” and not elsewhere which is 
arbitrary.

That is just plain wrong, there is no way around it.
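A toy model makes the point concrete. The sketch below is plain Python, not HarfBuzz: the advance widths and kerning pairs are invented, and `shape_run` is a hypothetical stand-in for a shaper that only sees the text it is given. Shaping "office" as one run kerns every adjacent pair, while sending "ffi" as its own run silently drops the o/f and i/c adjustments at the run boundaries.

```python
# Invented metrics (font units) for illustration only; not taken from any real font.
ADVANCES = {"o": 550, "f": 320, "i": 280, "c": 500, "e": 520}
KERN_PAIRS = {("o", "f"): -30, ("f", "f"): -40, ("f", "i"): -25, ("i", "c"): -10}


def shape_run(run):
    """Hypothetical shaper: lays out one run, kerning only pairs inside it.

    Returns (glyph x origins relative to the run start, total run width)."""
    xs, x = [], 0
    for i, ch in enumerate(run):
        xs.append(x)
        x += ADVANCES[ch]
        if i + 1 < len(run):
            x += KERN_PAIRS.get((ch, run[i + 1]), 0)
    return xs, x


def layout(runs):
    """Place independently shaped runs one after another."""
    positions, pen = [], 0
    for run in runs:
        xs, width = shape_run(run)
        positions.extend(pen + x for x in xs)
        pen += width
    return positions, pen


whole, whole_width = layout(["office"])
split, split_width = layout(["o", "ffi", "ce"])
print(whole_width, split_width)  # 2385 2425: the two boundary kerns are lost
```

The same word ends up 40 units wider when only "ffi" is shaped, and the gap appears exactly where the run was cut.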


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> From: Khaled Hosny 
> Date: Sat, 23 May 2020 20:18:33 +0200
> Cc: harfbuzz@lists.freedesktop.org
> 
> > The Emacs display engine examines the text to be displayed and laid
> > out one character at a time, and makes layout decisions after each
> > character or grapheme cluster it lays out.  Its design is therefore
> > fundamentally incompatible with shaping large substrings of buffer
> > text at once.  We do support that for short sequences of characters,
> > which seems to work well enough for complex shaping (a.k.a. "character
> > compositions") of scripts that require that, but we still do that one
> > grapheme cluster at a time.  
> 
> That wouldn’t work for Arabic. You can’t shape Arabic one grapheme cluster at 
> a time (or any other text actually, but the brokenness in Arabic will be 
> immediately obvious), so I’m most certain that is not exactly how Arabic is 
> handled in Emacs right now.

We pass to the shaper the part of text that matches the regexps you
can see at the end of misc-lang.el, then display the glyphs the shaper
returns.  The above description is a high-level overview; there are
many details that I cannot describe in a short message.  For example,
for Arabic, when we get back the grapheme clusters, we lay them out,
then skip to the end of the text that we passed to the shaper.
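For readers unfamiliar with Emacs internals, the flow described above (match a regexp, hand the whole match to the shaper, lay out what comes back, then continue after the match) can be modelled roughly as below. This is a hypothetical Python sketch: the pattern is a stand-in covering the basic Arabic block, not the actual regexps in misc-lang.el, and in Emacs the real work is split between C and Lisp.

```python
import re

# Stand-in pattern (not the one in misc-lang.el): runs of characters from the
# Arabic block U+0600..U+06FF are treated as "composable".
COMPOSABLE = re.compile("[\u0600-\u06ff]+")


def segment_for_display(text):
    """Split text into ('shape', run) pieces, each handed to the shaper in one
    call, and ('char', c) pieces laid out one character at a time."""
    pieces, pos = [], 0
    while pos < len(text):
        m = COMPOSABLE.match(text, pos)
        if m:
            pieces.append(("shape", m.group()))
            pos = m.end()  # skip to the end of the text passed to the shaper
        else:
            pieces.append(("char", text[pos]))
            pos += 1
    return pieces


print(segment_for_display("ab \u0633\u0644\u0627\u0645 c"))
```

Everything outside a match is laid out character by character, which is why kerning or ligatures between, say, two Latin letters never reach the shaper in this model.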

> > The character composition is implemented
> > in Lisp, which is called by the display engine, and which then calls
> > back into C to invoke the shaper.  This implementation is meant to
> > allow a great deal of control on what should be composed and how.  But
> > it is also relatively slow, which is another reason why doing that for
> > all the text to be laid out is impractical: it slows down redisplay to
> > the degree that it becomes annoying to users.
> 
> Having more control should not be at the price of doing things wrong.

No one said it should, that's just how things are.

> The whole composition concept of Emacs does not make any sense to me, all 
> text is “composed”. You can have a special mode that would disable shaping 
> for specific purposes (opening huge log files, wanting to see raw text with 
> no bidi or shaping, etc), but this can be done in cooperation with HarfBuzz 
> and not by bypassing it entirely.

We are talking about a piece of software designed 21 years ago.  I
realize that it makes no sense to you, but that's what we have, and
will probably have for the next 10 years or so.  We must make the most
out of what we have.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> From: Khaled Hosny 
> Date: Sat, 23 May 2020 20:09:50 +0200
> Cc: harfbuzz@lists.freedesktop.org
> 
> Overall, if you can’t send the whole text (words are the absolute minimum, 
> but this has its issues as well), don’t just send arbitrary parts of it as 
> the result will be some inconsistent mess.

I almost understand (and agree), sans one part: the "arbitrary parts"
of what you wrote.  If we want to produce a ligature out of "ffi", the
shaper will get "ffi" and nothing more.  Which part here is arbitrary?

Thanks.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Khaled Hosny


> On May 23, 2020, at 10:35 AM, Eli Zaretskii  wrote:
> 
>> From: Khaled Hosny 
>> Date: Sat, 23 May 2020 09:59:15 +0200
>> Cc: harfbuzz@lists.freedesktop.org
>> 
>> Also either Emacs is currently treating text that it enables shaping for as 
>> second-class citizens where limitations/degraded performance is acceptable 
>> (which is really really bad)
> 
> Could you tell more about which limitations and degraded performance
> you had in mind?  I'm not sure we have this, but cannot tell without
> understanding the issues.

I have no idea. I’m just guessing why you think the Emacs display engine can’t 
handle all text like it handles Arabic. Either it does not handle Arabic 
correctly, or it can handle all text like it handles Arabic.

>> or “redesigning the entire Emacs display engine” is not really needed as you 
>> can just declare all text as text that needs to be shaped and be done with 
>> it.
> 
> The Emacs display engine examines the text to be displayed and laid
> out one character at a time, and makes layout decisions after each
> character or grapheme cluster it lays out.  Its design is therefore
> fundamentally incompatible with shaping large substrings of buffer
> text at once.  We do support that for short sequences of characters,
> which seems to work well enough for complex shaping (a.k.a. "character
> compositions") of scripts that require that, but we still do that one
> grapheme cluster at a time.  


That wouldn’t work for Arabic. You can’t shape Arabic one grapheme cluster at a 
time (or any other text actually, but the brokenness in Arabic will be 
immediately obvious), so I’m most certain that is not exactly how Arabic is 
handled in Emacs right now.
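A much-simplified model of Arabic cursive joining shows why per-cluster shaping breaks down: a letter's form (isolated, initial, medial, final) depends on its neighbours, so a shaper that sees one cluster at a time can only ever produce isolated forms. The joining data below is an assumption for the sketch (a handful of right-joining letters, everything else dual-joining, no combining marks, no lam-alef ligature); real shapers use the Unicode joining-type data.

```python
# Simplified joining data (assumption): these letters join only to the previous
# letter (alef, dal, thal, reh, zain, waw); all other letters are dual-joining.
RIGHT_JOINING = set("\u0627\u062f\u0630\u0631\u0632\u0648")


def joining_forms(word):
    """Return the contextual form of each letter in an Arabic word (no marks)."""
    forms = []
    for i, ch in enumerate(word):
        # Connects backwards if there is a previous letter that connects forwards.
        joins_prev = i > 0 and word[i - 1] not in RIGHT_JOINING
        # Connects forwards if this letter is dual-joining and a next letter exists.
        joins_next = i + 1 < len(word) and ch not in RIGHT_JOINING
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms


word = "\u0628\u064a\u062a"  # beh, yeh, teh
print(joining_forms(word))                  # ['initial', 'medial', 'final']
print([joining_forms(c)[0] for c in word])  # ['isolated', 'isolated', 'isolated']
```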

> The character composition is implemented
> in Lisp, which is called by the display engine, and which then calls
> back into C to invoke the shaper.  This implementation is meant to
> allow a great deal of control on what should be composed and how.  But
> it is also relatively slow, which is another reason why doing that for
> all the text to be laid out is impractical: it slows down redisplay to
> the degree that it becomes annoying to users.

Having more control should not be at the price of doing things wrong. The whole 
composition concept of Emacs does not make any sense to me, all text is 
“composed”. You can have a special mode that would disable shaping for specific 
purposes (opening huge log files, wanting to see raw text with no bidi or 
shaping, etc), but this can be done in cooperation with HarfBuzz and not by 
bypassing it entirely.

Regards,
Khaled


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Khaled Hosny


> On May 23, 2020, at 10:25 AM, Eli Zaretskii  wrote:
> 
>> From: Khaled Hosny 
>> Date: Sat, 23 May 2020 09:51:21 +0200
>> Cc: harfbuzz@lists.freedesktop.org
>> 
>>> Thanks.  Since (b) is not really feasible without redesigning the
>>> entire Emacs display engine (for which I see no volunteers lining up
>>> any time soon), I guess we will have to use some more-or-less
>>> reasonable and somewhat unreliable heuristics by supporting only some
>>> ligatures that are known in advance.
>> 
>> What are you going to do about kerning, or mark positioning? Partially 
>> kerning arbitrary glyphs (because the substring matches some regular 
>> expression) is worse than not kerning at all.
> 
> I don't think I understand the question.  How is kerning related to
> the issue at hand?

Kerning is part of text layout. You are only considering ligatures, but they
are a small part of text layout, and your proposal does not seem to consider
anything other than ligatures, which is an arbitrary division and does not make
much sense to me. Some fonts provide ligatures to fix f-collisions, others fix
them with contextual alternates, and others fix them with kerning. Your proposed
solution does not address this. Also, when you pass certain text to the layout
engine, you get everything the font provides, not just ligatures, so you would
end up kerning certain letter combinations (the ones you send to the layout
engine) and not others, which is inconsistent and ugly.

Overall, if you can’t send the whole text (words are the absolute minimum, but 
this has its issues as well), don’t just send arbitrary parts of it as the 
result will be some inconsistent mess.

Regards,
Khaled



Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> Date: Sat, 23 May 2020 16:54:51 +0100
> From: Richard Wordingham 
> Cc: harfbuzz@lists.freedesktop.org
> 
> > Emacs supports more than one rule for each composable sequence of
> > characters.
> 
> That doesn't help when the rules give conflicting divisions into
> clusters, which is the case with Tai Tham.

The assumption is that either the rules can be arranged in an order
that allows using the first matching rule, or, failing that, that you
write your own composing function that implements whatever logic
that's required to select the right rule.
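The two strategies mentioned here (order the rules so the first match wins, or fall back to a custom composing function) can be sketched as follows. This is illustrative Python, not the Emacs `composition-function-table` machinery; the rule names and the fallback function are hypothetical.

```python
import re


def compose_at(text, pos, rules, fallback):
    """Try (pattern, name) rules in order at `pos`; the first match wins.

    Returns (name, end) for the composed span, or whatever `fallback`
    returns when no rule matches (None meaning "do not compose")."""
    for pattern, name in rules:
        m = pattern.match(text, pos)
        if m:
            return name, m.end()
    return fallback(text, pos)


def no_composition(text, pos):
    """Hypothetical fallback; a real one could apply arbitrary logic here."""
    return None


# Order matters: the longer "ffi" rule must come before the generic f-rule.
RULES = [(re.compile("ffi"), "ffi-ligature"), (re.compile("f[fil]"), "f-ligature")]

print(compose_at("office", 1, RULES, no_composition))        # ('ffi-ligature', 4)
print(compose_at("office", 1, RULES[::-1], no_composition))  # ('f-ligature', 3)
```

Reversing the rule order changes which span gets composed, which is exactly the situation where conflicting rules need either careful ordering or custom logic in the fallback.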

> The Devanagari rule only covers the Vedic marks in the Devanagari block,
> the 'stress signs' according to the comments.  Can rules essentially
> for different scripts now share combining marks?  The newer Vedic marks
> were supposed to be available to at least all Indian Indic scripts.

I don't know enough about this to make sure I even understand the
question, let alone provide an answer.  One thing I can say is
that the regexp pattern in a rule can specify different context (the
surrounding characters) even if the character that triggers the rule
is the same.  Failing that, I guess the solution will again be the
function that produces the composition.

As for different scripts: if the character codepoints are the same,
Emacs currently assigns each character to a single script.

> > Does Emacs indeed fail to wrap Arabic text?  Can you show an example?
> 
> Character level wrapping still almost works down at Emacs 24.4, but I
> don't know that it wasn't broken in later enhancements.  There are three
> features that make me think Emacs 24.4 might be different to the
> current state of affairs:
> 
> (1) Clicking into the text breaks text before the cursor, but not after
> it.
> (2) I can't step into lam-alif the way I step into Indic clusters.
> (3) Lam-alif isn't broken by line wrap.

Emacs 24.4 is very old, and doesn't use HarfBuzz.  Please try Emacs 27
instead, it has several bugs in this area fixed, and will use HarfBuzz
if available at build time.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> Date: Sat, 23 May 2020 16:33:12 +0100
> From: Richard Wordingham 
> 
> On Sat, 23 May 2020 11:25:38 +0300
> Eli Zaretskii  wrote:
> 
> > > From: Khaled Hosny 
> > > Date: Sat, 23 May 2020 09:51:21 +0200
> > > Cc: harfbuzz@lists.freedesktop.org
> > > What are you going to do about kerning, or mark positioning?
> > > Partially kerning arbitrary glyphs (because the substring matches
> > > some regular expression) is worse than not kerning at all.  
> > 
> > I don't think I understand the question.  How is kerning related to
> > the issue at hand?  I'm not an expert on typesetting text (so maybe I
> > don't even understand what exactly is meant by "kerning" in this
> > context), so please tell more details about this.
> 
> The simplest way of laying out proportionally spaced text is to have a
> fixed glyph-dependent distance ('advance width') from the 'origin' of a
> glyph to the origin of the next glyph and simply lay them out in a
> sequence, like movable type. However, if one chooses widths suitable
> for the sequences 'AM' and 'MV', then there may be an unsightly gap in
> the middle of 'AV'. Kerning is basically the process of adjusting those
> gaps.  Kerning is done by the shaper.  To do it, it needs the
> whole sequence of characters.

Ah, okay, thanks.  Then yes, Emacs just uses the advance width that we
get from the metrics of each glyph.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Richard Wordingham
On Sat, 23 May 2020 17:22:58 +0300
Eli Zaretskii  wrote:

> > Date: Sat, 23 May 2020 14:51:53 +0100
> > From: Richard Wordingham 
> >   
> > > > They may of course have more than one set of such rules, with
> > > > the rule sets defining different sets of sequences.
> > > 
> > > Who are "they" in this context?  
> > 
> > Devanagari and Tai Tham are two examples I am aware of.  
> 
> Emacs supports more than one rule for each composable sequence of
> characters.

That doesn't help when the rules give conflicting divisions into
clusters, which is the case with Tai Tham.

On the other hand, for the Devanagari scripts, the rules can store
alternatives which some renderers would consider ill-formed, or be
more sensibly treated as 2 clusters.  
 
> > Devanagari has different rules for positioning of Vedic marks
> > between fonts using the script tags dev and dev2 for it on one hand
> > and the unofficial script tag dev3, which follows the USE rules for
> > character ordering.  For tag dev, Microsoft says that <consonant,
> > virama, candrabindu, consonant> is one cluster; others, including
> > Unicode, say it's two.  Candrabindu in the middle and candrabindu
> > at the end mean different things; the former nasalises a consonant,
> > while the latter nasalises a vowel.  The visual distinction exists,
> > at least when half-forms are used.  
> 
> See the rules set up near the end of indian.el in Emacs.  If they
> don't cover what you describe, we can add more.

The Devanagari rule only covers the Vedic marks in the Devanagari block,
the 'stress signs' according to the comments.  Can rules essentially
for different scripts now share combining marks?  The newer Vedic marks
were supposed to be available to at least all Indian Indic scripts.

> > > If a font requires special shaping for any sequence of any number
> > > of 26 (or maybe 52) ASCII letters, then the Emacs display engine
> > > will need to be redesigned.  So this extreme possibility doesn't
> > > bother me.  
> > 
> > In general, they do require it.  But how is this worse than handling
> > Arabic?  
> 
> I don't know.  Maybe it isn't.  Or maybe the slowdown while displaying
> ASCII and moving the cursor through it will be unbearable.
> 
> > Is the problem that you want to keep the option of line
> > wrapping splitting words for ASCII, but are not bothered for Arabic
> > or other human languages?  
> 
> Does Emacs indeed fail to wrap Arabic text?  Can you show an example?

Character level wrapping still almost works down at Emacs 24.4, but I
don't know that it wasn't broken in later enhancements.  There are three
features that make me think Emacs 24.4 might be different to the
current state of affairs:

(1) Clicking into the text breaks text before the cursor, but not after
it.
(2) I can't step into lam-alif the way I step into Indic clusters.
(3) Lam-alif isn't broken by line wrap.

> > I think you mean that Emacs would store the position of components
> > by an index that was the sequence of characters, not the glyph ID.
> > That would also deal with precomposed characters - it would be the
> > character sequence that mattered, and for cursor movement and
> > rendering, the canonically equivalent sequence(s) and the
> > precomposed character would remain distinct.  
> 
> Sorry, I don't follow: what do you mean by "store"?  Emacs stores the
> rules used to compose characters, and it stores the results of the
> compositions already done by applying those rules, as part of
> displaying some chunk of text.  Which one of these did you have in
> mind?

Neither.  I thought from the Emacs developers' discussion that you were
hoping to store the locations of the character boundaries within
ligatures. 
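One common way to track character boundaries inside ligatures, without keying anything on glyph IDs, is to work from the shaper's cluster values: in HarfBuzz's default cluster level each output glyph carries the offset of the first character it came from. The sketch below is a hypothetical Python helper that assumes left-to-right text with increasing cluster values. The equal-split caret heuristic is also an assumption; fonts can supply real ligature carets in their GDEF table.

```python
def glyph_char_spans(clusters, text_len):
    """Map each glyph to the [start, end) character range it covers.

    `clusters` holds one value per glyph, in logical (LTR) order: the index
    of the first character the glyph came from."""
    spans = []
    for i, start in enumerate(clusters):
        # The span ends where the next, different cluster value begins.
        end = next((c for c in clusters[i + 1:] if c != start), text_len)
        spans.append((start, end))
    return spans


def caret_offsets(advance, span):
    """Heuristic: split a glyph's advance equally between the characters it
    covers and return the x offsets of the internal character boundaries."""
    n = span[1] - span[0]
    return [advance * k / n for k in range(1, n)]


# "office" shaped with an ffi ligature: glyphs o, ffi, c, e (invented example).
spans = glyph_char_spans([0, 1, 4, 5], len("office"))
print(spans)                          # [(0, 1), (1, 4), (4, 5), (5, 6)]
print(caret_offsets(900, spans[1]))   # [300.0, 600.0]
```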

Richard.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Richard Wordingham
On Sat, 23 May 2020 11:25:38 +0300
Eli Zaretskii  wrote:

> > From: Khaled Hosny 
> > Date: Sat, 23 May 2020 09:51:21 +0200
> > Cc: harfbuzz@lists.freedesktop.org
> > What are you going to do about kerning, or mark positioning?
> > Partially kerning arbitrary glyphs (because the substring matches
> > some regular expression) is worse than not kerning at all.  
> 
> I don't think I understand the question.  How is kerning related to
> the issue at hand?  I'm not an expert on typesetting text (so maybe I
> don't even understand what exactly is meant by "kerning" in this
> context), so please tell more details about this.

The simplest way of laying out proportionally spaced text is to have a
fixed glyph-dependent distance ('advance width') from the 'origin' of a
glyph to the origin of the next glyph and simply lay them out in a
sequence, like movable type. However, if one chooses widths suitable
for the sequences 'AM' and 'MV', then there may be an unsightly gap in
the middle of 'AV'. Kerning is basically the process of adjusting those
gaps.  Kerning is done by the shaper.  To do it, it needs the
whole sequence of characters.
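This description translates almost directly into code. The sketch below (Python, with invented advance widths and one invented kerning pair) lays glyphs out movable-type style and then applies a pair adjustment, which is essentially what a shaper's kerning step adds on top of plain advance widths. In a real font the advances come from the hmtx table and the pair values from GPOS (or the older kern table).

```python
# Invented metrics in font units, for illustration only.
ADVANCES = {"A": 660, "M": 830, "V": 640}
KERNING = {("A", "V"): -80, ("V", "A"): -80}


def pen_positions(text, kern=True):
    """Return the x origin of each glyph, optionally applying pair kerning."""
    xs, x, prev = [], 0, None
    for ch in text:
        if kern and prev is not None:
            x += KERNING.get((prev, ch), 0)  # adjust the gap before this glyph
        xs.append(x)
        x += ADVANCES[ch]
        prev = ch
    return xs


print(pen_positions("AV", kern=False))  # [0, 660]: advance widths only
print(pen_positions("AV"))              # [0, 580]: 'V' pulled under 'A'
```

The adjustment depends on the pair, so it can only be applied when both glyphs are in the sequence handed to the shaper.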

To a first approximation, mark positioning is handled by passing the
whole clusters to the shaper, and suitable regular expressions will
handle this.  However, sometimes clusters will interact.  Microsoft had
an example in the OpenType specification of the handling of the sequence
Wö  with a comparatively huge 'W'.  In this
example, the umlaut would be lowered to get out of the way of the 'W'.
To do this, the shaper has to be presented with "W" and "ö" as part of
the same sequence.
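A deliberately tiny model of that example: the height of the diaeresis depends on a glyph in the previous cluster, so it only comes out right when both clusters reach the shaper together. The heights and the "overhanging" glyph set are invented; real fonts express this kind of rule with contextual GPOS lookups.

```python
DEFAULT_MARK_Y = 600   # invented: usual height of the diaeresis over 'o'
LOWERED_MARK_Y = 480   # invented: height used after an overhanging capital
OVERHANGING = {"W"}    # glyphs assumed to overhang the following glyph


def diaeresis_height(text, i):
    """Height for the combining diaeresis at text[i], whose base is text[i - 1],
    using whatever context the caller happened to pass in."""
    base = i - 1
    if base >= 1 and text[base - 1] in OVERHANGING:
        return LOWERED_MARK_Y
    return DEFAULT_MARK_Y


print(diaeresis_height("Wo\u0308", 2))  # 480: 'W' is visible as context
print(diaeresis_height("o\u0308", 1))   # 600: cluster shaped on its own
```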

Richard.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> Date: Sat, 23 May 2020 14:51:53 +0100
> From: Richard Wordingham 
> 
> > > They may of course have more than one set of such rules, with the
> > > rule sets defining different sets of sequences.  
> > 
> > Who are "they" in this context?
> 
> Devanagari and Tai Tham are two examples I am aware of.

Emacs supports more than one rule for each composable sequence of
characters.

> Devanagari has different rules for positioning of Vedic marks between
> fonts using the script tags dev and dev2 for it on one hand and the
> unofficial script tag dev3, which follows the USE rules for character
> ordering.  For tag dev, Microsoft says that <consonant, virama,
> candrabindu, consonant> is one cluster; others, including Unicode, say
> it's two.  Candrabindu in the middle and candrabindu at the end mean
> different things; the former nasalises a consonant, while the latter
> nasalises a vowel.  The visual distinction exists, at least when
> half-forms are used.

See the rules set up near the end of indian.el in Emacs.  If they
don't cover what you describe, we can add more.

> > I'm not talking about Arabic.  Emacs has a set of regular expressions
> > for sequences of Arabic characters that need shaping, misc-lang.el in
> > Emacs.  If the set is incomplete, we can augment it.
> 
> That regular expression treats every Arabic word as in need of shaping. 
> 
> > If a font requires special shaping for any sequence of any number of
> > 26 (or maybe 52) ASCII letters, then the Emacs display engine will
> > need to be redesigned.  So this extreme possibility doesn't bother me.
> 
> In general, they do require it.  But how is this worse than handling
> Arabic?

I don't know.  Maybe it isn't.  Or maybe the slowdown while displaying
ASCII and moving the cursor through it will be unbearable.

> Is the problem that you want to keep the option of line
> wrapping splitting words for ASCII, but are not bothered for Arabic or
> other human languages?

Does Emacs indeed fail to wrap Arabic text?  Can you show an example?

> > > How would you handle the possibility that all three of <æ>, 
> > > and  might be rendered by the same glyph, although they
> > > are comprised of 1, 2 and 3 characters respectively?  
> > 
> > By using a composition rule that matches both  and .
> > The rules are regexp-based, and expressing the above as a regexp is
> > simple.  Once a sequence of characters matches the regexp, Emacs calls
> > the shaper (hb_shape etc.) to produce the font glyphs for the
> > sequence, and displays the glyphs that the shaper returns.
> 
> I think you mean that Emacs would store the position of components by
> an index that was the sequence of characters, not the glyph ID.  That
> would also deal with precomposed characters - it would be the character
> sequence that mattered, and for cursor movement and rendering,
> the canonically equivalent sequence(s) and the precomposed character
> would remain distinct.

Sorry, I don't follow: what do you mean by "store"?  Emacs stores the
rules used to compose characters, and it stores the results of the
compositions already done by applying those rules, as part of
displaying some chunk of text.  Which one of these did you have in
mind?


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Richard Wordingham
On Sat, 23 May 2020 09:09:48 +0300
Eli Zaretskii  wrote:

> > Date: Fri, 22 May 2020 22:22:49 +0100
> > From: Richard Wordingham 
> >   
> > > The current support for producing ligatures works in the same way
> > > as complex text shaping for scripts that require that, like
> > > Arabic and Khmer: the sequences of characters that can be
> > > displayed as ligatures are identified in advance with suitable
> > > regular expressions, and the display engine then passes these
> > > sequences to hb_shape to produce the ligatures.
> > > 
> > > This works well for scripts that require complex shaping, because
> > > such scripts generally have well-defined rules for the sequences
> > > of codepoints that need shaping.  
> > 
> > They may of course have more than one set of such rules, with the
> > rule sets defining different sets of sequences.  
> 
> Who are "they" in this context?

Devanagari and Tai Tham are two examples I am aware of.

Devanagari has different rules for positioning of Vedic marks between
fonts using the script tags dev and dev2 for it on one hand and the
unofficial script tag dev3, which follows the USE rules for character
ordering.  For tag dev, Microsoft says that <consonant, virama,
candrabindu, consonant> is one cluster; others, including Unicode, say
it's two.  Candrabindu in the middle and candrabindu at the end mean
different things; the former nasalises a consonant, while the latter
nasalises a vowel.  The visual distinction exists, at least when
half-forms are used.

Tai Tham has an issue with the mark U+1A58 TAI THAM SIGN MAI KANG LAI.
It is, at least formally, a non-spacing mark.  It occurs at the
juncture of two syllables in the same word.  Modern, printed Tai Khuen
happily treats it as syllable-final.  In more traditional styles, it
starts syllables, going above the first consonant, and so to the right
of a vowel mark reordered to the left hand side of the syllable.  Some
fonts seem to just let it hang over the start of the next syllable,
taking pot luck with what's there.  That gives two different syllable
structures.

As I supported the style found in a certain dictionary, it sometimes
belongs with the syllable before, and sometimes with the syllable after
it.  I therefore ended up defining the sequences to be shaped as a
sequence of one or more syllables joined together by U+1A58.
Fortunately, normal cursor motion is controlled by a different
definition.  (I'm still using Emacs 24.4 with the restoration of
interactive commands forward-char-intrusive and backward-char-intrusive
and their interface within the C code.)
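The grouping described above (shape one or more syllables as a single unit whenever U+1A58 sits at the junction) could be sketched like this. The input is assumed to be already split into syllables, which is the hard part and is not attempted here; Latin letters stand in for Tai Tham syllables to keep the example readable.

```python
MAI_KANG_LAI = "\u1a58"  # TAI THAM SIGN MAI KANG LAI


def shaping_units(syllables):
    """Merge consecutive syllables into one shaping unit whenever U+1A58 sits
    at their junction, whichever side the segmenter attached it to."""
    units = []
    for syl in syllables:
        if units and (units[-1].endswith(MAI_KANG_LAI) or syl.startswith(MAI_KANG_LAI)):
            units[-1] += syl  # the shaper gets both syllables together
        else:
            units.append(syl)
    return units


print(shaping_units(["a", "b" + MAI_KANG_LAI, "c", "d"]))
```

Because the merged unit contains both neighbouring syllables, the font (via the shaper) rather than the segmenter decides which syllable the mark visually belongs to.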

> I understand that the number of combinations is theoretically
> unbounded.  I'm asking if it is also unbounded in practice.  That is,
> do font designers add ligatures for arbitrary combinations of
> characters, regardless of some reasonable set of requirements?  For
> example, is the set of ligatures of Latin characters shown here:
> 
>   https://en.wikipedia.org/wiki/Orthographic_ligature#Latin_alphabet
> 
> reasonably complete, or should I expect any number of other arbitrary
> combinations of Latin characters popping up in fonts?  And if the
> latter, then what is the purpose of providing such arbitrary
> ligatures?

Doesn't the existence of ligatures for 'Eisenhower' and 'Chamberlain'
provide enough of an answer?

If you claim to support handwriting fonts, then you can expect others -
'sh', 'tt' and 'ing' are fairly obvious ones.  You may also find
ligatures being used to sort out kerning issues.

One problem I've observed with computer fonts is that the spacing of
glyphs in a string is not consistent.  This appears to be due to the
way the positioning of the glyphs is rounded.  The problem can be bad
enough that the designer ends up fixing the problem by combining them
into a single glyph, which formally is a ligature.  I've not noticed
this in ASCII fonts, but then I haven't looked hard at them.
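The rounding effect is easy to reproduce with numbers. Assuming a glyph whose advance is 7.4 pixels at some size, rounding each glyph's exact pen position gives visibly uneven gaps, while rounding the advance once gives even gaps but lets the line drift from its true width. These are two illustrative strategies, not a description of what any particular rasteriser does.

```python
def round_each_origin(advance, count):
    """Round every exact (fractional) pen position independently."""
    return [round(advance * k) for k in range(count)]


def round_advance_once(advance, count):
    """Round the advance first, then accumulate the rounded value."""
    step = round(advance)
    return [step * k for k in range(count)]


def gaps(origins):
    """Distances between consecutive glyph origins."""
    return [b - a for a, b in zip(origins, origins[1:])]


print(gaps(round_each_origin(7.4, 5)))  # [7, 8, 7, 8]: uneven spacing
print(round_advance_once(7.4, 5))       # [0, 7, 14, 21, 28]; exact last origin is 29.6
```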

The 'tt' ligature can arise because the two t's are crossed by a
single stroke.  Crossing the 't' in 'lt' might be handled by a special
't' glyph, or one might just form an 'lt' ligature.  The ending 'ing'
is common enough that I unconsciously developed an abbreviated way of
writing it.

> I'm not talking about Arabic.  Emacs has a set of regular expressions
> for sequences of Arabic characters that need shaping, misc-lang.el in
> Emacs.  If the set is incomplete, we can augment it.

That regular expression treats every Arabic word as in need of shaping. 

> If a font requires special shaping for any sequence of any number of
> 26 (or maybe 52) ASCII letters, then the Emacs display engine will
> need to be redesigned.  So this extreme possibility doesn't bother me.

In general, they do require it.  But how is this worse than handling
Arabic?  Is the problem that you want to keep the option of line
wrapping splitting words for ASCII, but are not bothered for Arabic or
other human languages?  ASCII does not satisfyingly suffice for
English.

> > How would you handle the possibility that all three of <æ>, 
> 

Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> From: Khaled Hosny 
> Date: Sat, 23 May 2020 09:59:15 +0200
> Cc: harfbuzz@lists.freedesktop.org
> 
> Also either Emacs is currently treating text that it enables shaping for as 
> second-class citizens where limitations/degraded performance is acceptable 
> (which is really really bad)

Could you tell more about which limitations and degraded performance
you had in mind?  I'm not sure we have this, but cannot tell without
understanding the issues.

> or “redesigning the entire Emacs display engine” is not really needed as you 
> can just declare all text as text that needs to be shaped and be done with it.

The Emacs display engine examines the text to be displayed and laid
out one character at a time, and makes layout decisions after each
character or grapheme cluster it lays out.  Its design is therefore
fundamentally incompatible with shaping large substrings of buffer
text at once.  We do support that for short sequences of characters,
which seems to work well enough for complex shaping (a.k.a. "character
compositions") of scripts that require that, but we still do that one
grapheme cluster at a time.  The character composition is implemented
in Lisp, which is called by the display engine, and which then calls
back into C to invoke the shaper.  This implementation is meant to
allow a great deal of control on what should be composed and how.  But
it is also relatively slow, which is another reason why doing that for
all the text to be laid out is impractical: it slows down redisplay to
the degree that it becomes annoying to users.

That is why solving these problems in the way that you suggest
requires a complete rewrite of the Emacs display code.  It simply
cannot currently support what you expect.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> From: Khaled Hosny 
> Date: Sat, 23 May 2020 09:51:21 +0200
> Cc: harfbuzz@lists.freedesktop.org
> 
> > Thanks.  Since (b) is not really feasible without redesigning the
> > entire Emacs display engine (for which I see no volunteers lining up
> > any time soon), I guess we will have to use some more-or-less
> > reasonable and somewhat unreliable heuristics by supporting only some
> > ligatures that are known in advance.
> 
> What are you going to do about kerning, or mark positioning? Partially 
> kerning arbitrary glyphs (because the substring matches some regular 
> expression) is worse than not kerning at all.

I don't think I understand the question.  How is kerning related to
the issue at hand?  I'm not an expert on typesetting text (so maybe I
don't even understand what exactly is meant by "kerning" in this
context), so please tell more details about this.

Thanks.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Khaled Hosny


> On May 23, 2020, at 9:51 AM, Khaled Hosny  wrote:
> 
> 
> 
>> On May 23, 2020, at 9:44 AM, Eli Zaretskii  wrote:
>> 
>>> From: Khaled Hosny 
>>> Date: Sat, 23 May 2020 08:36:10 +0200
>>> Cc: harfbuzz@lists.freedesktop.org
>>> 
>>>>   The only way of
>>>> doing this right, I'm told, is to either (a) query the font to get the
>>>> list of all the ligatures it supports, or (b) assume any combination
>>>> of characters can produce a ligature, and therefore we need to pass
>>>> all the characters intended for display through hb_shape.  The latter
>>>> in particular is in stark contrast to how the current Emacs display
>>>> code is designed and implemented.
>>> 
>>> (a) is not realistically possible as doing it properly has pretty much the 
>>> same cost as shaping the text. So your only reliable option is (b).
>> 
>> Thanks.  Since (b) is not really feasible without redesigning the
>> entire Emacs display engine (for which I see no volunteers lining up
>> any time soon), I guess we will have to use some more-or-less
>> reasonable and somewhat unreliable heuristics by supporting only some
>> ligatures that are known in advance.
> 
> What are you going to do about kerning, or mark positioning? Partially 
> kerning arbitrary glyphs (because the substring matches some regular 
> expression) is worse than not kerning at all.

Also either Emacs is currently treating text that it enables shaping for as 
second-class citizens where limitations/degraded performance is acceptable 
(which is really really bad), or “redesigning the entire Emacs display engine” 
is not really needed as you can just declare all text as text that needs to be 
shaped and be done with it.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Khaled Hosny



> On May 23, 2020, at 9:44 AM, Eli Zaretskii  wrote:
> 
>> From: Khaled Hosny 
>> Date: Sat, 23 May 2020 08:36:10 +0200
>> Cc: harfbuzz@lists.freedesktop.org
>> 
>>>   The only way of
>>> doing this right, I'm told, is to either (a) query the font to get the
>>> list of all the ligatures it supports, or (b) assume any combination
>>> of characters can produce a ligature, and therefore we need to pass
>>> all the characters intended for display through hb_shape.  The latter
>>> in particular is in stark contrast to how the current Emacs display
>>> code is designed and implemented.
>> 
>> (a) is not realistically possible as doing it properly has pretty much the 
>> same cost as shaping the text. So your only reliable option is (b).
> 
> Thanks.  Since (b) is not really feasible without redesigning the
> entire Emacs display engine (for which I see no volunteers lining up
> any time soon), I guess we will have to use some more-or-less
> reasonable and somewhat unreliable heuristics by supporting only some
> ligatures that are known in advance.

What are you going to do about kerning, or mark positioning? Partially kerning
arbitrary glyphs (because the substring matches some regular expression) is
worse than not kerning at all.



Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> From: Khaled Hosny 
> Date: Sat, 23 May 2020 08:36:10 +0200
> Cc: harfbuzz@lists.freedesktop.org
> 
> >The only way of
> > doing this right, I'm told, is to either (a) query the font to get the
> > list of all the ligatures it supports, or (b) assume any combination
> > of characters can produce a ligature, and therefore we need to pass
> > all the characters intended for display through hb_shape.  The latter
> > in particular is in stark contrast to how the current Emacs display
> > code is designed and implemented.
> 
> (a) is not realistically possible as doing it properly has pretty much the 
> same cost as shaping the text. So your only reliable option is (b).

Thanks.  Since (b) is not really feasible without redesigning the
entire Emacs display engine (for which I see no volunteers lining up
any time soon), I guess we will have to use some more-or-less
reasonable and somewhat unreliable heuristics by supporting only some
ligatures that are known in advance.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Khaled Hosny



> On May 22, 2020, at 9:32 PM, Eli Zaretskii  wrote:
> 
> Hi,
> 
> This is a bit off-topic, but I thought it could be appropriate to ask
> here, since we have here some of the best experts on this subject.
> 
> We are discussing support for ligatures in Emacs, specifically when
> using HarfBuzz as the shaping engine.  See the discussion from
> 
>  https://lists.gnu.org/archive/html/emacs-devel/2020-05/msg02493.html
> 
> The current support for producing ligatures works in the same way as
> complex text shaping for scripts that require that, like Arabic and
> Khmer: the sequences of characters that can be displayed as ligatures
> are identified in advance with suitable regular expressions, and the
> display engine then passes these sequences to hb_shape to produce the
> ligatures.
> 
> This works well for scripts that require complex shaping, because such
> scripts generally have well-defined rules for the sequences of
> codepoints that need shaping.  My original thoughts were that
> ligatures could be supported in the same way, based on the assumption
> that the list of possible ligatures is finite and can be stored in a
> suitable data structure in advance.

I might be stating the obvious, but what Emacs is doing reflects a very outdated
view of text layout. The schism between so-called complex text and simple text
does not actually exist. There are script-specific shaping rules that layout
engines know and apply, and there are additional/complementary rules provided by
the font that layout engines also apply.

For all that applications care about, they have text with certain properties and
fonts, they hand them to the layout engine, and they get back positioned glyphs.
Any attempt to second-guess the layout engine and classify the text into parts
that need or do not need shaping is futile.

Fonts can, and do, provide any number of arbitrary glyph interactions (not just
ligatures), and the only reliable way to know about them is to shape the text
and check the output.

I think I already said this before, but Emacs should indiscriminately give all
the text to HarfBuzz (or any other text layout engine it additionally supports)
and give up on trying to pre-classify text, which is what pretty much any other
sensible application is doing already. There are many ways to solve potential
performance issues that do not involve compromising on the text layout.
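To make "shape and check the output" concrete, here is a minimal C sketch of the checking half. It assumes the text was added to the buffer as UTF-32 (for example with hb_buffer_add_utf32()), so each glyph's cluster value is a code point index, and that the run is left-to-right, so cluster values never decrease. The glyph_info struct is a hypothetical stand-in for the two hb_glyph_info_t fields used, so the sketch runs without linking HarfBuzz. A cluster that covers more than one character means the font or shaper made those characters interact (a ligature, but also, say, a base letter merged with a combining mark), which is exactly what a pre-classification pass cannot predict.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the hb_glyph_info_t fields used here; with real
   HarfBuzz these would come from hb_buffer_get_glyph_infos(). */
typedef struct {
    uint32_t codepoint; /* after hb_shape() this holds the glyph ID */
    uint32_t cluster;   /* index of the first input character of the cluster */
} glyph_info;

/* Walk left-to-right shaped output cluster by cluster.  text_len is the number
   of input characters.  Returns how many clusters span more than one input
   character and, if starts is non-NULL, stores up to max_starts of their
   starting indices. */
size_t find_multichar_clusters(const glyph_info *info, size_t glyph_count,
                               uint32_t text_len,
                               uint32_t *starts, size_t max_starts)
{
    size_t found = 0;
    size_t i = 0;
    while (i < glyph_count) {
        uint32_t start = info[i].cluster;
        size_t j = i;
        while (j < glyph_count && info[j].cluster == start)
            j++; /* skip the other glyphs belonging to this cluster */
        /* The cluster ends where the next one starts, or at the end of text. */
        uint32_t end = (j < glyph_count) ? info[j].cluster : text_len;
        if (end - start > 1) {
            if (starts && found < max_starts)
                starts[found] = start;
            found++;
        }
        i = j;
    }
    return found;
}
```

For "office" shaped with an ffi ligature, the glyphs carry clusters 0, 1, 4 and 5, and the function reports one multi-character cluster starting at index 1.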

> However, I'm being told that this assumption is false, and that each
> font defines ligatures from any number of arbitrary combinations of
> characters, and therefore the exhaustive list of the ligatures is in
> practice infinite and cannot be provided in advance.

That is true.

>The only way of
> doing this right, I'm told, is to either (a) query the font to get the
> list of all the ligatures it supports, or (b) assume any combination
> of characters can produce a ligature, and therefore we need to pass
> all the characters intended for display through hb_shape.  The latter
> in particular is in stark contrast to how the current Emacs display
> code is designed and implemented.

(a) is not realistically possible as doing it properly has pretty much the same 
cost as shaping the text. So your only reliable option is (b).

Regards,
Khaled


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> Date: Fri, 22 May 2020 22:22:49 +0100
> From: Richard Wordingham 
> 
> > The current support for producing ligatures works in the same way as
> > complex text shaping for scripts that require that, like Arabic and
> > Khmer: the sequences of characters that can be displayed as ligatures
> > are identified in advance with suitable regular expressions, and the
> > display engine then passes these sequences to hb_shape to produce the
> > ligatures.
> > 
> > This works well for scripts that require complex shaping, because such
> > scripts generally have well-defined rules for the sequences of
> > codepoints that need shaping.
> 
> They may of course have more than one set of such rules, with the rule
> sets defining different sets of sequences.

Who are "they" in this context?

> > However, I'm being told that this assumption is false, and that each
> > font defines ligatures from any number of arbitrary combinations of
> > characters, and therefore the exhaustive list of the ligatures is in
> > practice infinite and cannot be provided in advance.
> 
> This arbitrariness is true.  Over the set of all credible fonts for a
> given character repertoire, the number of ligating combinations is
> unbounded.

I understand that the number of combinations is theoretically
unbounded.  I'm asking if it is also unbounded in practice.  That is,
do font designers add ligatures for arbitrary combinations of
characters, regardless of some reasonable set of requirements?  For
example, is the set of ligatures of Latin characters shown here:

  https://en.wikipedia.org/wiki/Orthographic_ligature#Latin_alphabet

reasonably complete, or should I expect any number of other arbitrary
combinations of Latin characters popping up in fonts?  And if the
latter, then what is the purpose of providing such arbitrary
ligatures?

> > To be specific, I'm talking about 2 kinds of ligatures:
> > 
> >   . ligatures made of Latin characters, like "ffi" and "Th"
> >   . ligatures produced from symbols, like "==>" that is
> > converted into ⟹

Yes, these are the only cases that I'm asking here about.  I'm not
asking about shaping complex scripts such as Arabic, where this
problem doesn't exist AFAIK.

> Have you addressed the cursive scripts yet, such as Arabic?  At its
> simplest, most consonants have four shapes, initial, medial, final and
> isolated, and roughly speaking the shape used depends on the adjacent
> spacing characters.  For the most part, Emacs would have to pass whole
> words into HarfBuzz for shaping.  In some of the more advanced fonts,
> the vowel marks in a word may also affect the shape of the consonant
> skeleton.  And of course, sometimes the Arabic script prefers to join
> letters vertically, as well as having a few straightforward ligatures.

I'm not talking about Arabic.  Emacs has a set of regular expressions
for sequences of Arabic characters that need shaping, in misc-lang.el.
If the set is incomplete, we can augment it.

> A cursive Latin script font may behave in the same way, with the shape
> of letters depending on what precedes and follows them.  With a small
> enough character repertoire, there might be no ligatures, but your
> rendering logic would fail miserably.

If a font requires special shaping for any sequence of any number of
26 (or maybe 52) ASCII letters, then the Emacs display engine will
need to be redesigned.  So this extreme possibility doesn't bother me.

> How would you handle the possibility that all three of <æ>,  and
>  might be rendered by the same glyph, although they are
> comprised of 1, 2 and 3 characters respectively?

By using a composition rule that matches both  and .
The rules are regexp-based, and expressing the above as a regexp is
simple.  Once a sequence of characters matches the regexp, Emacs calls
the shaper (hb_shape etc.) to produce the font glyphs for the
sequence, and displays the glyphs that the shaper returns.
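For comparison, a toy C version of the pre-classification step might look like the following. Emacs's real rules are regexps written in Lisp; this hypothetical sketch uses a fixed table of literal sequences instead, and it is only meant to illustrate the limitation raised elsewhere in this thread: a kerning pair such as "Te", or any ligature missing from the table, never reaches the shaper at all.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical list of sequences "known in advance" to form ligatures.
   Anything the font does outside this list (other ligatures, kerning,
   contextual alternates) is invisible to this approach. */
static const char *const known_ligatures[] = {
    "ffi", "ffl", "ff", "fi", "fl", "Th", "==>", "->"
};

/* Return the length of the longest known sequence starting at text[pos],
   or 0 if none matches.  pos must not exceed strlen(text).  A display
   engine using this scheme would send only matched runs to the shaper. */
size_t match_known_ligature(const char *text, size_t pos)
{
    size_t best = 0;
    size_t n = sizeof known_ligatures / sizeof known_ligatures[0];
    for (size_t k = 0; k < n; k++) {
        size_t len = strlen(known_ligatures[k]);
        if (len > best && strncmp(text + pos, known_ligatures[k], len) == 0)
            best = len;
    }
    return best;
}
```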

> And if Emacs is not imposing a normalisation, then all the
> precomposed characters in Unicode might have been entered as one or
> as more than one character?

If you are talking about composition with combining characters, Emacs
already has the rules to compose them as described above.  You can try
this in your Emacs: insert a, then U+0301 COMBINING ACUTE ACCENT, and
you should see them composed into a single glyph (provided that you
use a suitable font).

But I'm not asking about character composition in general, I'm asking
specifically about ligatures of ASCII characters, without any
non-ASCII codepoints or combining accents.


Re: [HarfBuzz] Ligatures

2020-05-22 Thread Richard Wordingham
On Fri, 22 May 2020 22:32:04 +0300
Eli Zaretskii  wrote:

> Can someone please tell what are the recommended practices regarding
> these ligatures?  Is the set of possible ligatures indeed infinite and
> impossible to know in advance?  And does HarfBuzz have APIs to query a
> font about the ligatures it supports?

hb_ot_layout_get_ligature_carets() is liable to be garbage in, garbage
out.  While the cursor positions were included in OTL fonts to assist
cursor placement, it obviously fails when the components are stacked
vertically. Microsoft gave up on it and, if I remember the informal
statement correctly, just divides it up evenly between the characters
or grapheme clusters.  Many OpenType fonts don't populate the relevant
section of the GDEF table. And, of course, one has real trouble when
one glyph can come from different numbers of components.

LibreOffice takes (or took) a different approach, and uses the width of
the characters logically before the insertion point.  It's rather
disconcerting when the cursor jumps backwards as one steps through the
string.  It could happen with the Latin script string "a͡i", for the
'double' inverted breve should shorten when the second letter is 'i'.
One can get the effect in Indic scripts because of spacing viramas.

Richard.


Re: [HarfBuzz] Ligatures

2020-05-22 Thread Richard Wordingham
On Fri, 22 May 2020 22:32:04 +0300
Eli Zaretskii  wrote:

> Hi,
> 
> This is a bit off-topic, but I thought it could be appropriate to ask
> here, since we have here some of the best experts on this subject.
> 
> We are discussing support for ligatures in Emacs, specifically when
> using HarfBuzz as the shaping engine.  See the discussion from
> 
>   https://lists.gnu.org/archive/html/emacs-devel/2020-05/msg02493.html
> 
> The current support for producing ligatures works in the same way as
> complex text shaping for scripts that require that, like Arabic and
> Khmer: the sequences of characters that can be displayed as ligatures
> are identified in advance with suitable regular expressions, and the
> display engine then passes these sequences to hb_shape to produce the
> ligatures.
> 
> This works well for scripts that require complex shaping, because such
> scripts generally have well-defined rules for the sequences of
> codepoints that need shaping.

They may of course have more than one set of such rules, with the rule
sets defining different sets of sequences.

> My original thoughts were that
> ligatures could be supported in the same way, based on the assumption
> that the list of possible ligatures is finite and can be stored in a
> suitable data structure in advance.

At one level, this is true for any individual font, for it cannot have
more than 65,536 glyphs.

> However, I'm being told that this assumption is false, and that each
> font defines ligatures from any number of arbitrary combinations of
> characters, and therefore the exhaustive list of the ligatures is in
> practice infinite and cannot be provided in advance.

This arbitrariness is true.  Over the set of all credible fonts for a
given character repertoire, the number of ligating combinations is
unbounded.

> The only way of
> doing this right, I'm told, is to either (a) query the font to get the
> list of all the ligatures it supports, or (b) assume any combination
> of characters can produce a ligature, and therefore we need to pass
> all the characters intended for display through hb_shape.  The latter
> in particular is in stark contrast to how the current Emacs display
> code is designed and implemented.

> To be specific, I'm talking about 2 kinds of ligatures:
> 
>   . ligatures made of Latin characters, like "ffi" and "Th"
>   . ligatures produced from symbols, like "==>" that is
> converted into ⟹
> 
> Can someone please tell what are the recommended practices regarding
> these ligatures?  Is the set of possible ligatures indeed infinite and
> impossible to know in advance?  And does HarfBuzz have APIs to query a
> font about the ligatures it supports?

Have you addressed the cursive scripts yet, such as Arabic?  At its
simplest, most consonants have four shapes, initial, medial, final and
isolated, and roughly speaking the shape used depends on the adjacent
spacing characters.  For the most part, Emacs would have to pass whole
words into HarfBuzz for shaping.  In some of the more advanced fonts,
the vowel marks in a word may also affect the shape of the consonant
skeleton.  And of course, sometimes the Arabic script prefers to join
letters vertically, as well as having a few straightforward ligatures.
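A simplified model of the joining logic described above shows why context across run boundaries matters. This sketch assumes a reduced set of joining types (dual-joining, right-joining, non-joining, transparent) loosely modelled on Unicode's, and it ignores join-causing characters such as ZWJ and tatweel; the enum names are invented for the example, and a real shaper is considerably more involved.

```c
#include <stddef.h>

/* Reduced joining types: non-joining, right-joining, dual-joining, transparent. */
typedef enum { JT_U, JT_R, JT_D, JT_T } joining_type;
typedef enum { FORM_ISOL, FORM_INIT, FORM_MEDI, FORM_FINA, FORM_NONE } joining_form;

/* Can a character of this type join to the character after it (logical order)? */
static int joins_following(joining_type t) { return t == JT_D; }
/* Can it join to the character before it? */
static int joins_preceding(joining_type t) { return t == JT_D || t == JT_R; }

/* Compute a form for each character in logical order.  Transparent
   characters (e.g. vowel marks) are skipped when looking for neighbours
   and are themselves given FORM_NONE. */
void compute_forms(const joining_type *jt, size_t n, joining_form *out)
{
    for (size_t i = 0; i < n; i++) {
        if (jt[i] == JT_T) { out[i] = FORM_NONE; continue; }
        size_t p = i, q = i + 1;
        int prev_joins = 0, next_joins = 0;
        while (p > 0 && jt[p - 1] == JT_T) p--;   /* nearest non-mark before */
        if (p > 0)
            prev_joins = joins_following(jt[p - 1]) && joins_preceding(jt[i]);
        while (q < n && jt[q] == JT_T) q++;       /* nearest non-mark after */
        if (q < n)
            next_joins = joins_following(jt[i]) && joins_preceding(jt[q]);
        out[i] = prev_joins ? (next_joins ? FORM_MEDI : FORM_FINA)
                            : (next_joins ? FORM_INIT : FORM_ISOL);
    }
}
```

Three dual-joining letters come out as initial, medial, final. Passing only the first two (as happens if a run is cut after the second letter) turns the second letter into a final form instead of a medial one, which is the cross-boundary breakage described above.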

A cursive Latin script font may behave in the same way, with the shape
of letters depending on what precedes and follows them.  With a small
enough character repertoire, there might be no ligatures, but your
rendering logic would fail miserably.

How would you handle the possibility that all three of <æ>,  and
 might be rendered by the same glyph, although they are
comprised of 1, 2 and 3 characters respectively?  And if Emacs is not
imposing a normalisation, then all the precomposed characters in
Unicode might have been entered as one or as more than one character? 

Richard.


Re: [HarfBuzz] Ligatures and color changes

2013-05-21 Thread Khaled Hosny
On Tue, Feb 19, 2013 at 05:53:04PM -0500, Behdad Esfahbod wrote:
> On 02/19/2013 05:47 PM, Khaled Hosny wrote:
> > On Tue, Feb 19, 2013 at 05:34:40PM -0500, Behdad Esfahbod wrote:
> > > As for *where* to cut the ligature, here's what you need:
> > >
> > >   * Count the number of cursor positions *inside* the ligature.  For the fi
> > > ligature it's one.  And we have one cursor position before the ligature, so in
> > > this case we need to cut it in two pieces,
> > >
> > >   * The common heuristic then is to cut the advance width of the ligature
> > > (well, cluster really) into two equal pieces.  If you want to be fancy, you
> > > can call hb_ot_layout_get_ligature_carets(), and if the number of carets
> > > matches what you expect (1 in this case I believe?), you can use the returned
> > > caret positions instead of equally dividing the ligature.  I haven't seen
> > > anyone implementing this though, as it gives very marginal improvements over
> > > the heuristic.
> >
> > It can make quite a difference with some Arabic ligatures, but few
> > fonts implement it because few (no?) engines support it :)
>
> Correct.  Maybe you can give me a font... ;)

OK, here is one :)
https://github.com/khaledhosny/sahl-naskh

> Note that BTW, a similar issue exists when kerning text.  Most fonts implement
> kerning by adjusting the advance width of the first glyph.  What this means
> however is that for a pair like Te, if the e moves way under the T,
> essentially we will get a very narrow selection width for the T, and
> unchanged width for the e.  That's less than ideal.
>
> In HarfBuzz we split the kerning half-and-half for old-style TrueType kern
> pairs.  But don't do something like that for GPOS kerning since, well, with
> GPOS the font designer has full control on what to do.  Maybe we should do the
> same for GPOS kerning tables that only have adjustment for the first glyph and
> not the second?  Dunno.  May be a nice improvement.  What do others think?

I think it would be a good idea, since that is the majority of LTR
kerning anyway.

Regards,
Khaled


Re: [HarfBuzz] Ligatures and color changes

2013-02-20 Thread Jonathan Kew

On 19/2/13 23:35, Lóránt Pintér wrote:

> Using ZWNJ would be a great way of fixing it. And indeed it splits the
> ligature all right, but it also destroys the kerning. Here's a little
> test I did with HarfBuzz @ e0486fc1affd3796fb8f664e2e7fc208f1d2106c:
> (the font has an fi ligature, and has some kerning between F and ,)
>
> Splitting the ligature with the ZWNJ works:
>
> Shaped fif\u200Ci with features: [  ]
> Glyph: #682, x_advance: 652
> Glyph: #71, x_advance: 380
> Glyph: #3, x_advance: 0
> Glyph: #74, x_advance: 292
>
> Kerning across the ZWNJ does not:
>
> Shaped F,F\u200C, with features: [  ]
> Glyph: #39, x_advance: 483
> Glyph: #13, x_advance: 242
> Glyph: #39, x_advance: 563
> Glyph: #3, x_advance: 0
> Glyph: #13, x_advance: 242
>
> Is it possible that doing this will be supported later in HarfBuzz?



Hmm - is that font using a legacy 'kern' table, or a GPOS 'kern' 
feature? For the former, I can understand that kerning would break, as 
it's a naïve glyph-pair lookup, but for the latter, I thought we should 
now ignore the ZWNJ.


JK



Re: [HarfBuzz] Ligatures and color changes

2013-02-20 Thread Lóránt Pintér
I'm trying this with a font that has the kerning data in the kern table as 
well as in the GPOS table. (Is that possible?)  

Can you show me a font that this feature works with?  

--  
Lóránt Pintér
Developer at Prezi (http://prezi.com)



On Wednesday, February 20, 2013 at 10:26 AM, Jonathan Kew wrote:

> On 19/2/13 23:35, Lóránt Pintér wrote:
> > Using ZWNJ would be a great way of fixing it. And indeed it splits the
> > ligature all right, but it also destroys the kerning. Here's a little
> > test I did with HarfBuzz @ e0486fc1affd3796fb8f664e2e7fc208f1d2106c:
> > (the font has an fi ligature, and has some kerning between F and ,)
> >
> > Splitting the ligature with the ZWNJ works:
> >
> > Shaped fif\u200Ci with features: [ ]
> > Glyph: #682, x_advance: 652
> > Glyph: #71, x_advance: 380
> > Glyph: #3, x_advance: 0
> > Glyph: #74, x_advance: 292
> >
> > Kerning across the ZWNJ does not:
> >
> > Shaped F,F\u200C, with features: [ ]
> > Glyph: #39, x_advance: 483
> > Glyph: #13, x_advance: 242
> > Glyph: #39, x_advance: 563
> > Glyph: #3, x_advance: 0
> > Glyph: #13, x_advance: 242
> >
> > Is it possible that doing this will be supported later in HarfBuzz?
>
> Hmm - is that font using a legacy 'kern' table, or a GPOS 'kern'
> feature? For the former, I can understand that kerning would break, as
> it's a naïve glyph-pair lookup, but for the latter, I thought we should
> now ignore the ZWNJ.
>
> JK



Re: [HarfBuzz] Ligatures and color changes

2013-02-20 Thread Jonathan Kew

On 20/2/13 12:24, Lóránt Pintér wrote:

> I'm trying this with a font that has the kerning data in the kern
> table as well as in the GPOS table. (Is that possible?)
>
> Can you show me a font that this feature works with?



I thought we'd seen it work last week, but now it doesn't seem to. :( 
Behdad, did we miss something here? It seems to me like the skippy_iter 
in PairPosFormat1::apply doesn't know what it's looking for...


JK








Re: [HarfBuzz] Ligatures and color changes

2013-02-19 Thread Behdad Esfahbod
Hi Lóránt,

On 02/19/2013 12:20 PM, Lóránt Pintér wrote:
> Hi,
>
> I have a problem with half-colored ligatures, like (5) mfim in the image:

Right.  That's one of the harder issues of text rendering.

> I figured out two ways to do this, but neither is good enough:
>
>   * I can shape each color range separately, but then I lose the kerning
> between them, breaking (6)

Yes.  Best to not do this.

>   * I can tell HarfBuzz to disable ligatures for the last of the first
> character of each color range, but then it breaks (2) or (3) and (4).

Right.  This is a limitation of HarfBuzz currently, that you can't turn off a
pair-wise feature on one pair only, since changing the liga bit on one
character affects it in both directions.

I haven't been able to find a satisfactory fix for this yet.  I'll think about
it.


> Is there maybe a way to tell HarfBuzz to ignore ligatures if they span that
> color boundary? Or is there maybe a way to (quickly) assess if liga would be
> applied to a range of characters?

We don't have a good answer for this right now.  The way I want to eventually
fix this in Pango is different: it is to paint the ligature glyph half in each
color.  I think you can do the same using canvas.  Just use a gradient with
a sharp color switch for the ligature.  It's a royal pain, but I think that's
the most desirable rendering.  I may be wrong.

As for *where* to cut the ligature, here's what you need:

  * Count the number of cursor positions *inside* the ligature.  For the fi
ligature it's one.  And we have one cursor position before the ligature, so in
this case we need to cut it in two pieces,

  * The common heuristic then is to cut the advance width of the ligature
(well, cluster really) into two equal pieces.  If you want to be fancy, you
can call hb_ot_layout_get_ligature_carets(), and if the number of carets
matches what you expect (1 in this case I believe?), you can use the returned
caret positions instead of equally dividing the ligature.  I haven't seen
anyone implementing this though, as it gives very marginal improvements over
the heuristic.
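Both options above fit in one small helper. This is a sketch assuming a left-to-right ligature and positions in font units relative to the glyph origin; font_carets stands for whatever caret list the application obtained from the font (for instance via hb_ot_layout_get_ligature_carets()), and the helper itself is hypothetical, not a HarfBuzz API.

```c
#include <stddef.h>

/* Compute the caret positions *inside* a ligature glyph.  num_components is
   the number of cursor positions the ligature stands for (2 for "fi"), so
   there are num_components - 1 inside carets.  If the font supplied exactly
   that many carets, use them; otherwise divide the advance into equal
   pieces.  Returns the number of carets written to out. */
unsigned ligature_carets(int advance, unsigned num_components,
                         const int *font_carets, unsigned num_font_carets,
                         int *out)
{
    if (num_components < 2)
        return 0; /* nothing inside the glyph to place */
    unsigned inside = num_components - 1;
    for (unsigned k = 0; k < inside; k++) {
        if (font_carets && num_font_carets == inside)
            out[k] = font_carets[k]; /* trust the font's caret data */
        else
            out[k] = advance * (int)(k + 1) / (int)num_components; /* equal split */
    }
    return inside;
}
```

For the fi ligature (two components) with an advance of 652 and no usable font data, this yields a single caret at 326.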

Hope that helps,
behdad

> Thanks.
>
> --
>
> *Lóránt Pintér*
>
> Developer at Prezi http://prezi.com
 
 
 
 

-- 
behdad
http://behdad.org/


Re: [HarfBuzz] Ligatures and color changes

2013-02-19 Thread Khaled Hosny
On Tue, Feb 19, 2013 at 05:34:40PM -0500, Behdad Esfahbod wrote:
> > Is there maybe a way to tell HarfBuzz to ignore ligatures if they span that
> > color boundary? Or is there maybe a way to (quickly) assess if liga would be
> > applied to a range of characters?
>
> We don't have a good answer for this right now.  The way I want to eventually
> fix this in Pango is different: it is to paint the ligature glyph half in each
> color.  I think you can do the same using canvas.  Just use a gradient with
> a sharp color switch for the ligature.  It's a royal pain, but I think that's
> the most desirable rendering.  I may be wrong.

Gecko is (was?) using clipping to partially draw each part of the
ligature:
http://robert.ocallahan.org/2006/10/partial-ligatures_24.html
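Given caret positions, the clipping approach amounts to drawing the same ligature glyph once per color, each time clipped to a horizontal band. The sketch below only computes those bands; the struct and function names are hypothetical, colors are opaque integers, and a real renderer would widen the outermost bands to cover ink that overhangs the advance. As Jonathan Kew points out elsewhere in this thread, the result looks reasonable for simple Latin ligatures but odd for cursive or vertically stacked ones.

```c
#include <stddef.h>

typedef struct { int x0, x1; unsigned color; } clip_segment;

/* A ligature drawn at pen position x with the given advance covers
   num_components characters; carets holds the num_components - 1 inside
   caret positions relative to the glyph origin, and colors one color per
   component.  Writes one [x0, x1) band per run of equally colored
   components (adjacent equal colors are merged so the glyph is drawn as
   few times as possible) and returns the number of bands. */
size_t ligature_clip_segments(int x, int advance, const int *carets,
                              const unsigned *colors, size_t num_components,
                              clip_segment *out)
{
    size_t n = 0;
    for (size_t k = 0; k < num_components; k++) {
        int left = x + (k == 0 ? 0 : carets[k - 1]);
        int right = x + (k + 1 == num_components ? advance : carets[k]);
        if (n > 0 && out[n - 1].color == colors[k]) {
            out[n - 1].x1 = right; /* same color: extend the previous band */
        } else {
            out[n].x0 = left;
            out[n].x1 = right;
            out[n].color = colors[k];
            n++;
        }
    }
    return n;
}
```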

> As for *where* to cut the ligature, here's what you need:
>
>   * Count the number of cursor positions *inside* the ligature.  For the fi
> ligature it's one.  And we have one cursor position before the ligature, so in
> this case we need to cut it in two pieces,
>
>   * The common heuristic then is to cut the advance width of the ligature
> (well, cluster really) into two equal pieces.  If you want to be fancy, you
> can call hb_ot_layout_get_ligature_carets(), and if the number of carets
> matches what you expect (1 in this case I believe?), you can use the returned
> caret positions instead of equally dividing the ligature.  I haven't seen
> anyone implementing this though, as it gives very marginal improvements over
> the heuristic.

It can make quite a difference with some Arabic ligatures, but few
fonts implement it because few (no?) engines support it :)

Regards,
Khaled


Re: [HarfBuzz] Ligatures and color changes

2013-02-19 Thread Behdad Esfahbod
On 02/19/2013 05:47 PM, Khaled Hosny wrote:
> On Tue, Feb 19, 2013 at 05:34:40PM -0500, Behdad Esfahbod wrote:
> > > Is there maybe a way to tell HarfBuzz to ignore ligatures if they span that
> > > color boundary? Or is there maybe a way to (quickly) assess if liga would be
> > > applied to a range of characters?
> >
> > We don't have a good answer for this right now.  The way I want to eventually
> > fix this in Pango is different: it is to paint the ligature glyph half in each
> > color.  I think you can do the same using canvas.  Just use a gradient with
> > a sharp color switch for the ligature.  It's a royal pain, but I think that's
> > the most desirable rendering.  I may be wrong.
>
> Gecko is (was?) using clipping to partially draw each part of the
> ligature:
> http://robert.ocallahan.org/2006/10/partial-ligatures_24.html

Thanks for the pointer.  Yes, that is also what GTK+ used to do (that code got
rewritten so I don't know what it does now) for selection, but not for color
attributes.  Note that attributes like underline are also affected in the same
way.


> > As for *where* to cut the ligature, here's what you need:
> >
> >   * Count the number of cursor positions *inside* the ligature.  For the fi
> > ligature it's one.  And we have one cursor position before the ligature, so in
> > this case we need to cut it in two pieces,
> >
> >   * The common heuristic then is to cut the advance width of the ligature
> > (well, cluster really) into two equal pieces.  If you want to be fancy, you
> > can call hb_ot_layout_get_ligature_carets(), and if the number of carets
> > matches what you expect (1 in this case I believe?), you can use the returned
> > caret positions instead of equally dividing the ligature.  I haven't seen
> > anyone implementing this though, as it gives very marginal improvements over
> > the heuristic.
>
> It can make quite a difference with some Arabic ligatures, but few
> fonts implement it because few (no?) engines support it :)

Correct.  Maybe you can give me a font... ;)

Note that BTW, a similar issue exists when kerning text.  Most fonts implement
kerning by adjusting the advance width of the first glyph.  What this means
however is that for a pair like Te, if the e moves way under the T,
essentially we will get a very narrow selection width for the T, and
unchanged width for the e.  That's less than ideal.

In HarfBuzz we split the kerning half-and-half for old-style TrueType kern
pairs.  But don't do something like that for GPOS kerning since, well, with
GPOS the font designer has full control on what to do.  Maybe we should do the
same for GPOS kerning tables that only have adjustment for the first glyph and
not the second?  Dunno.  May be a nice improvement.  What do others think?
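The half-and-half split mentioned above for legacy kern pairs can be sketched as follows. This illustrates the idea only and is not HarfBuzz's actual code; glyph_pos is a hypothetical stand-in for the x_advance and x_offset fields of hb_glyph_position_t. The second glyph's offset compensates for the smaller first advance, so both the drawing position of the second glyph and the pen position after the pair are unchanged, while the selection width is shared between the two glyphs.

```c
/* Hypothetical stand-in for the hb_glyph_position_t fields used here. */
typedef struct { int x_advance; int x_offset; } glyph_pos;

/* Apply a pair-kerning value between glyph a and the following glyph b by
   splitting it between them instead of adding all of it to a's advance. */
void apply_split_kern(glyph_pos *a, glyph_pos *b, int kern)
{
    int kern1 = kern / 2;     /* half goes to the first glyph's advance */
    int kern2 = kern - kern1; /* the remainder goes to the second glyph */
    a->x_advance += kern1;
    b->x_advance += kern2;    /* keeps the pen position after the pair */
    b->x_offset  += kern2;    /* draws b where full kerning on a would put it */
}
```

For a "Te" pair with advances 600 and 500 and a kern of -120, T keeps an advance of 540 instead of 480, e gets an advance of 440 with an offset of -60, and e is still drawn 480 units after T's origin.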

behdad

> Regards,
> Khaled
 

-- 
behdad
http://behdad.org/


Re: [HarfBuzz] Ligatures and color changes

2013-02-19 Thread Jonathan Kew

On 19/2/13 22:34, Behdad Esfahbod wrote:

> Hi Lóránt,
>
> On 02/19/2013 12:20 PM, Lóránt Pintér wrote:
> > Hi,
> >
> > I have a problem with half-colored ligatures, like (5) mfim in the image:
>
> Right.  That's one of the harder issues of text rendering.
>
> > I figured out two ways to do this, but neither is good enough:
> >
> >   * I can shape each color range separately, but then I lose the kerning
> > between them, breaking (6)
>
> Yes.  Best to not do this.
>
> >   * I can tell HarfBuzz to disable ligatures for the last of the first
> > character of each color range, but then it breaks (2) or (3) and (4).
>
> Right.  This is a limitation of HarfBuzz currently, that you can't turn off a
> pair-wise feature on one pair only, since changing the liga bit on one
> character affects it in both directions.
>
> I haven't been able to find a satisfactory fix for this yet.  I'll think about
> it.


In general, I don't think it's clear exactly how these sorts of edge
cases ought to work.


Suppose you have a glyph sequence A X B, and the 'liga' feature is 
enabled for A and B, but not for X; but further suppose that X is a mark 
glyph, the liga lookup ignores marks, and there's an AB ligature. Should 
it be applied here?


Another possible approach to disabling ligatures at the color change - 
given that harfbuzz doesn't know anything about color, that must be 
something that your application is maintaining - might be to insert a 
ZWNJ character at that position in the text. With the latest harfbuzz 
code, I believe kerning would still apply correctly across this, but it 
should prevent the ligature.
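The ZWNJ approach requires the application to splice U+200C into the text at each color boundary before shaping. Below is a minimal C sketch, assuming UTF-8 text and break positions given as ascending byte offsets that fall on character boundaries; insert_zwnj is a hypothetical helper. Each inserted character shifts every later cluster value by three bytes, so the application has to map clusters back to its original offsets. Also, as the test results reported in this thread show, kerning across the ZWNJ may not survive with every font or HarfBuzz version.

```c
#include <stdlib.h>
#include <string.h>

/* Insert U+200C ZERO WIDTH NON-JOINER (UTF-8: E2 80 8C) at each byte offset
   listed in breaks[] (ascending, on character boundaries).  Returns a newly
   allocated NUL-terminated string that the caller must free, or NULL if the
   allocation fails. */
char *insert_zwnj(const char *text, const size_t *breaks, size_t num_breaks)
{
    static const char zwnj[] = "\xE2\x80\x8C";
    size_t len = strlen(text);
    char *out = malloc(len + num_breaks * 3 + 1);
    if (!out)
        return NULL;
    size_t src = 0, dst = 0;
    for (size_t k = 0; k < num_breaks; k++) {
        size_t chunk = breaks[k] - src;  /* copy the text up to the break */
        memcpy(out + dst, text + src, chunk);
        dst += chunk;
        src = breaks[k];
        memcpy(out + dst, zwnj, 3);      /* then the ZWNJ itself */
        dst += 3;
    }
    memcpy(out + dst, text + src, len - src + 1); /* the rest, plus the NUL */
    return out;
}
```

With a single break at offset 3, "fifi" becomes "fif\u200Ci", the string used in the test reported in this thread.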



> > Is there maybe a way to tell HarfBuzz to ignore ligatures if they span that
> > color boundary? Or is there maybe a way to (quickly) assess if liga would be
> > applied to a range of characters?
>
> We don't have a good answer for this right now.  The way I want to eventually
> fix this in Pango is different: it is to paint the ligature glyph half in each
> color.  I think you can do the same using canvas.  Just use a gradient with
> a sharp color switch for the ligature.  It's a royal pain, but I think that's
> the most desirable rendering.  I may be wrong.


It's a reasonable rendering for typical Latin ligatures in simple text 
fonts. It doesn't work so well for more cursive cases. E.g. using this 
approach to color the middle f of Zapfino's ffi will look rather 
weird, as will coloring the parts of Arabic lam-meem-hah in a font with 
stacked ligature forms.




As for *where* to cut the ligature, here's what you need:

   * Count the number of cursor positions *inside* the ligature.  For the fi
ligature it's one.  And we have one cursor position before the ligature, so in
this case we need to cut it in two pieces,

   * The common heuristic then is to cut the advance width of the ligature
(well, cluster really) into two equal pieces.  If you want to be fancy, you
can call hb_ot_layout_get_ligature_carets(), and if the number of carets
matches what you expect (1 in this case I believe?), you can use the returned
caret positions instead of equally dividing the ligature.  I haven't seen
anyone implementing this though, as it gives very marginal improvements over
the heuristic.
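The two steps above can be sketched in Python as follows. This is an illustration, not Pango's code. It assumes you already know the ligature's advance width and how many cursor positions fall inside it. The optional `carets` argument is a list of caret offsets from the glyph origin, such as those `hb_ot_layout_get_ligature_carets()` can return. The gradient helper produces (offset, color) pairs in which adjacent colors share an offset, which is the usual way to express a sharp color switch. With an HTML canvas, for instance, each pair could be passed to `addColorStop()` on a linear gradient spanning the ligature's advance. Function names are hypothetical.

```python
def ligature_cut_positions(advance, positions_inside, carets=None):
    """Return the x offsets, relative to the ligature's origin, at which to
    split it into positions_inside + 1 pieces.

    If `carets` (e.g. values obtained from hb_ot_layout_get_ligature_carets())
    has exactly the expected count, use them; otherwise fall back to the
    common heuristic of dividing the advance width into equal parts.
    """
    if carets is not None and len(carets) == positions_inside:
        return sorted(carets)
    pieces = positions_inside + 1
    return [advance * k / pieces for k in range(1, pieces)]


def hard_gradient_stops(advance, cuts, colors):
    """Turn cut positions into (offset, color) gradient stops in [0, 1].

    Each color gets one stop at each edge of its piece. Adjacent colors share
    an offset, so the gradient switches sharply at every cut instead of
    blending.
    """
    if len(colors) != len(cuts) + 1:
        raise ValueError("need exactly one more color than cut positions")
    edges = [0.0] + [cut / advance for cut in cuts] + [1.0]
    stops = []
    for i, color in enumerate(colors):
        stops.append((edges[i], color))
        stops.append((edges[i + 1], color))
    return stops
```

For example, a 652-unit fi ligature with one cursor position inside it is cut at 326 units by the equal-split heuristic. The resulting stops are at offsets 0.0, 0.5, 0.5 and 1.0.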


Particularly as I suspect that relatively few fonts actually have GDEF 
tables that define ligature-caret positions with any more care than 
simply dividing up the advance width into equal parts.


JK



Re: [HarfBuzz] Ligatures and color changes

2013-02-19 Thread Lóránt Pintér
Using ZWNJ would be a great way of fixing it. And indeed it splits the ligature 
all right, but it also destroys the kerning. Here's a little test I did with 
HarfBuzz @ e0486fc1affd3796fb8f664e2e7fc208f1d2106c (the font has an fi 
ligature and some kerning between F and the comma):

Splitting the ligature with the ZWNJ works:

Shaped fif\u200Ci with features: [  ]
Glyph: #682, x_advance: 652
Glyph: #71, x_advance: 380
Glyph: #3, x_advance: 0
Glyph: #74, x_advance: 292

Kerning across the ZWNJ does not:

Shaped F,F\u200C, with features: [  ]
Glyph: #39, x_advance: 483
Glyph: #13, x_advance: 242
Glyph: #39, x_advance: 563
Glyph: #3, x_advance: 0
Glyph: #13, x_advance: 242
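The size of the lost kerning can be read off the second dump. Glyph #39 (F) has an advance of 483 directly before the comma but 563 before the ZWNJ. Assume 563 is F's unkerned advance and the kern was applied to the first glyph's x_advance. Then an 80-unit kern disappeared. Below is a small Python sketch that parses the test-output format used in this message (the test program's own format, not a HarfBuzz tool's) and computes that difference. The helper names are hypothetical.

```python
import re

# Matches lines such as "Glyph: #39, x_advance: 483" as printed by the test
# program in this message.
GLYPH_LINE = re.compile(r"Glyph: #(\d+), x_advance: (-?\d+)")


def parse_dump(text):
    """Return a list of (glyph_id, x_advance) pairs in output order."""
    return [(int(gid), int(adv)) for gid, adv in GLYPH_LINE.findall(text)]


def advance_spread(text, glyph_id):
    """Largest minus smallest x_advance seen for one glyph id.

    If the same glyph occurs once kerned and once unkerned, this is the size
    of the kerning adjustment that went missing. Raises ValueError if the
    glyph id does not occur in the dump.
    """
    advances = [adv for gid, adv in parse_dump(text) if gid == glyph_id]
    return max(advances) - min(advances)


DUMP = """\
Glyph: #39, x_advance: 483
Glyph: #13, x_advance: 242
Glyph: #39, x_advance: 563
Glyph: #3, x_advance: 0
Glyph: #13, x_advance: 242
"""

print(advance_spread(DUMP, 39))  # 80: the F/comma kern lost across the ZWNJ
```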


Is it possible that this will be supported in HarfBuzz later?

--  
Lóránt Pintér
Developer at Prezi (http://prezi.com)



On Wednesday, February 20, 2013 at 12:09 AM, Jonathan Kew wrote:

> Another possible approach to disabling ligatures at the color change -
> given that harfbuzz doesn't know anything about color, that must be
> something that your application is maintaining - might be to insert a
> ZWNJ character at that position in the text. With the latest harfbuzz
> code, I believe kerning would still apply correctly across this, but it
> should prevent the ligature.

