Re: [HarfBuzz] Ligatures

2020-05-23 Thread Richard Wordingham
On Sat, 23 May 2020 19:45:17 +0300
Eli Zaretskii wrote:

> > Date: Sat, 23 May 2020 16:54:51 +0100
> > From: Richard Wordingham 
> > Cc: harfbuzz@lists.freedesktop.org
> >   
> > > Emacs supports more than one rule for each composable sequence of
> > > characters.  
> > 
> > That doesn't help when the rules give conflicting divisions into
> > clusters, which is the case with Tai Tham.  
> 
> The assumption is that either the rules can be arranged in an order
> that allows using the first matching rule, or, failing that, that you
> write your own composing function that implements whatever logic
> that's required to select the right rule.

That choice needs to be tied to the choice of font, or for Tai Tham you use
my hack technique.  However, it's not as bad as it could be.  There's
something strange going on in Tai Tham even at Emacs 27.05.  I can have
two aksharas interacting for shaping, but it takes two 'ordinary' key
advances to pass through them, apparently implying that there are two
clusters.  Clusters for cursor advancement and clusters for shaping seem
to be controlled independently!

From the dotted-circle insertion logic, Emacs 27.05 on my machine
definitely looks as though it's using some form of HarfBuzz.

> > The Devanagari rule only covers the Vedic marks in the Devanagari
> > block, the 'stress signs' according to the comments.  Can rules
> > essentially for different scripts now share combining marks?  The
> > newer Vedic marks were supposed to be available to at least all
> > Indian Indic scripts.  
> 
> I don't know enough about this to make sure I even understand the
> question, let alone can provide an answer.  One thing I can say is
> that the regexp pattern in a rule can specify different context (the
> surrounding characters) even if the character that triggers the rule
> is the same.  Failing that, I guess the solution will again be the
> function that produces the composition.
> 
> As for different scripts: if the character codepoints are the same,
> Emacs currently assigns each character to a single script.

I'll need to dig deeper.  Composition of both 'a' and Greek alpha with
an acute accent works, which suggests that the problem isn't there for
characters with a script property of 'inherited'.

> > > Does Emacs indeed fail to wrap Arabic text?  can you show an
> > > example?  
> > 
> > Character level wrapping still almost works down at Emacs 24.4, but
> > I don't know that it wasn't broken in later enhancements.  There
> > are three features that make me think Emacs 24.4 might be different
> > to the current state of affairs:
> > 
> > (1) Clicking into the text breaks text before the cursor, but not
> > after it.
> > (2) I can't step into lam-alif the way I step into Indic clusters.
> > (3) Lam-alif isn't broken by line wrap.  
> 
> Emacs 24.4 is very old, and doesn't use HarfBuzz.  Please try Emacs 27
> instead, it has several bugs in this area fixed, and will use HarfBuzz
> if available at build time.

The behaviour in 27.05 is almost the same as in 24.4, but the
breaking in item (1) is automatically repaired.  The process seems slow:
I can see the glyph become final and then revert to being
medial.  I'm puzzled by not being able to step into lam-alif while being
able to step through a series of 'beh's.  The step-into command for
advancing codepoint by codepoint semi-works.  The cluster shaping
doesn't break at the cursor (Handa gave me a C code fix so I could
achieve that), but the number of step-into commands needed to pass
through a cluster matches the number of codepoints.

Pressing the 'delete' key still deletes a single character, but that may
be because it's mapped to tpu-delete-current-char.

So, what's not working in Arabic is that one can't move the cursor
through ligatures.  One can advance point through them using a
step-into command (dead reckoning is a useful fallback), but one
loses visual feedback.  Apart from that important matter, it looks as
though Arabic in Emacs already has the behaviours needed for shaping
Latin words.  Stepping into clusters is enabled by the command "(setq
disable-point-adjustment t)".
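One way an editor could give visual feedback inside a ligature is to derive cursor positions from the shaper's cluster values. The following is a minimal sketch, assuming the shaper returns (cluster, x_advance) pairs in the style of HarfBuzz's `hb_glyph_info_t.cluster` and `hb_glyph_position_t.x_advance`, left-to-right text only, and that the font supplies no ligature caret positions (OpenType GDEF LigCaretList). Splitting a ligature's width evenly among its characters is a common fallback heuristic, not necessarily what Emacs does. The function name `caret_positions` is hypothetical.

```python
def caret_positions(num_chars, glyphs):
    """Estimate an x position for the cursor before each character.

    glyphs: (cluster, x_advance) pairs in left-to-right visual order, where
    cluster is the index of the first character the glyph belongs to
    (HarfBuzz-style cluster values).  A ligature is one glyph whose cluster
    covers several characters; its width is split evenly among them.
    Assumes LTR text and that the first cluster starts at index 0.
    """
    # Total advance of each cluster (a cluster may contain several glyphs).
    cluster_width = {}
    for cluster, advance in glyphs:
        cluster_width[cluster] = cluster_width.get(cluster, 0) + advance

    starts = sorted(cluster_width)
    carets = [0.0] * (num_chars + 1)
    x = 0.0
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else num_chars
        width = cluster_width[start]
        count = end - start
        for k in range(count):
            # Evenly divide the cluster's width among its characters.
            carets[start + k] = x + width * k / count
        x += width
    carets[num_chars] = x  # position after the last character
    return carets
```

For "office" shaped with an "ffi" ligature (glyphs o, ffi, c, e with hypothetical advances 500, 900, 450, 480), `caret_positions(6, [(0, 500), (1, 900), (4, 450), (5, 480)])` places the cursor at 500, 800 and 1100 inside the ligature instead of jumping over it.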


Richard.
___
HarfBuzz mailing list
HarfBuzz@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/harfbuzz


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Simon Cozens

On 23/05/2020 08:44, Eli Zaretskii wrote:

> Thanks.  Since (b) is not really feasible without redesigning the
> entire Emacs display engine (for which I see no volunteers lining up
> any time soon), I guess we will have to use some more-or-less
> reasonable and somewhat unreliable heuristics by supporting only some
> ligatures that are known in advance.


Travelling further in the wrong direction is always an option, but don't 
expect it to get you closer to the right destination.


Full text shaping is the only way to get this right. Everything else is 
a hack, and piling hacks on top of hacks is just storing maintenance 
problems up for yourself.


I know that's hard to hear for a volunteer project where nobody really
wants to invest the effort in this complicated niche stuff, but
honestly, you're probably better off doing *nothing* than doing this.



Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> Cc: harfbuzz@lists.freedesktop.org
> From: Simon Cozens 
> Date: Sat, 23 May 2020 20:14:16 +0100
> 
> On 23/05/2020 08:44, Eli Zaretskii wrote:
> > Thanks.  Since (b) is not really feasible without redesigning the
> > entire Emacs display engine (for which I see no volunteers lining up
> > any time soon), I guess we will have to use some more-or-less
> > reasonable and somewhat unreliable heuristics by supporting only some
> > ligatures that are known in advance.
> 
> Travelling further in the wrong direction is always an option, but don't 
> expect it to get you closer to the right destination.

I don't think this is an adequate analogy.  What Emacs does is an
approximation to what should be done.  The approximation falls short
of the target, that's true, and might even produce clearly incorrect
results in some cases (although I've yet to see such cases, and I've
been using Emacs to edit non-ASCII text for 20 years).  But it is still
an approximation, so it is not really "the wrong direction" (which you
seem to interpret as 180 degrees off, otherwise even going in the
wrong direction might bring me closer to the destination, right?).


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> Date: Sat, 23 May 2020 20:06:32 +0100
> From: Richard Wordingham 
> 
> There are three different tools for producing what looks like an "ffi"
> ligature:
> 
> 1) Make a ligature
> 2) Contextual substitution
> 3) A mix of contextual substitution and kerning.
> 
> A font that uses the first will produce a ligature for Emacs.
> 
> A font that uses contextual substitution will not work - you will just
> see the 3 unligated characters with their default glyphs.
> 
> A font that uses a mix of contextual substitution and kerning will
> likewise fail.  However, it is possible that you might get the "ff"
> ligature and a normal 'i', or a normal 'f' and an "fi" ligature.
> 
> From the point of view of someone who expects full shaping, what result
> you get will be arbitrary, depending on how the font designer has
> marshalled his tools.

I understand.  Still, the result looks reasonably good in most cases,
especially in an editor whose main purpose is to edit programs, and
which doesn't pretend to produce typographical accuracy.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> From: Khaled Hosny 
> Date: Sat, 23 May 2020 20:54:15 +0200
> Cc: harfbuzz@lists.freedesktop.org
> 
> > We pass to the shaper the part of text that matches the regexps you
> > can see at the end of misc-lang.el, then display the glyphs the shaper
> > returns.  The above description is a high-level overview; there are
> > many details that I cannot describe in a short message.  For example,
> > for Arabic, when we get back the grapheme clusters, we lay them out,
> > then skip to the end of the text that we passed to the shaper.
> 
> You mean this:
> https://repo.or.cz/emacs.git/blob/HEAD:/lisp/language/misc-lang.el#l78
> 
> I’m not sure how to read it, but it seems to be missing the entire Arabic
> Extended-A and Arabic Mathematical Alphabetic Symbols blocks. I’m also not
> sure how it would handle using combining marks from other blocks with Arabic 
> text (say putting U+20D6 over an Arabic letter).

If you can suggest improvements to those patterns, please do, and
thanks.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Richard Wordingham
On Sat, 23 May 2020 21:26:00 +0300
Eli Zaretskii wrote:

> > From: Khaled Hosny 
> > Date: Sat, 23 May 2020 20:09:50 +0200
> > Cc: harfbuzz@lists.freedesktop.org
> > 
> > Overall, if you can’t send the whole text (words are the absolute
> > minimum, but this has its issues as well), don’t just send
> > arbitrary parts of it as the result will be some inconsistent
> > mess.  
> 
> I almost understand (and agree), sans one part: the "arbitrary parts"
> of what you wrote.  If we want to produce a ligature out of "ffi", the
> shaper will get "ffi" and nothing more.  Which part here is arbitrary?

There are three different tools for producing what looks like an "ffi"
ligature:

1) Make a ligature
2) Contextual substitution
3) A mix of contextual substitution and kerning.

A font that uses the first will produce a ligature for Emacs.

A font that uses contextual substitution will not work - you will just
see the 3 unligated characters with their default glyphs.

A font that uses a mix of contextual substitution and kerning will
likewise fail.  However, it is possible that you might get the "ff"
ligature and a normal 'i', or a normal 'f' and an "fi" ligature.

From the point of view of someone who expects full shaping, what result
you get will be arbitrary, depending on how the font designer has
marshalled his tools.
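To make the three tools concrete, here is a toy model with hypothetical glyph names ("f_f_i", "f.alt") and a hypothetical kern value; it does not model how Emacs decides what text to send to the shaper. All three aim at the same look, but only the first collapses the characters into a single glyph, and the contextual tools do nothing when an 'f' reaches the shaper without its neighbour.

```python
def shape_with_ligature(text):
    """Tool 1: a ligature substitution replaces "ffi" with one glyph.

    Returns (glyph_name, x_adjustment) pairs.
    """
    glyphs, i = [], 0
    while i < len(text):
        if text.startswith("ffi", i):
            glyphs.append(("f_f_i", 0))
            i += 3
        else:
            glyphs.append((text[i], 0))
            i += 1
    return glyphs


def shape_with_contextual(text, kern=0):
    """Tools 2 and 3: an 'f' followed by 'f' or 'i' becomes a short-hooked
    alternate glyph; a non-zero kern also tightens the pair (tool 3)."""
    glyphs = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == "f" and nxt in ("f", "i"):
            glyphs.append(("f.alt", kern))
        else:
            glyphs.append((ch, 0))
    return glyphs
```

`shape_with_ligature("ffi")` yields one glyph, while `shape_with_contextual("ffi", kern=-40)` yields three glyphs that merely look joined. `shape_with_contextual("f")` returns the default 'f', because the context the font relies on is missing.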

Richard.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> From: Khaled Hosny 
> Date: Sat, 23 May 2020 20:40:44 +0200
> Cc: harfbuzz@lists.freedesktop.org
> 
> Sending “ffi” alone is an arbitrary decision. The font might have kerning 
> between “ffi” and what comes before and after it, but you won’t get it. The 
> font might not have a ligature for “ffi” at all, but use kerning instead, so
> you will get kerning between “ffi” glyphs and not other glyphs which is 
> arbitrary. It might be a cursive font that changes glyph shapes based on 
> surrounding glyphs, and you will get that for “ffi” and not elsewhere which 
> is arbitrary.
> 
> That is just plain wrong, there is no way around it.

OK, thanks.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Khaled Hosny


> On May 23, 2020, at 8:34 PM, Eli Zaretskii wrote:
> 
>> From: Khaled Hosny 
>> Date: Sat, 23 May 2020 20:18:33 +0200
>> Cc: harfbuzz@lists.freedesktop.org
>> 
>>> The Emacs display engine examines the text to be displayed and laid
>>> out one character at a time, and makes layout decisions after each
>>> character or grapheme cluster it lays out.  Its design is therefore
>>> fundamentally incompatible with shaping large substrings of buffer
>>> text at once.  We do support that for short sequences of characters,
>>> which seems to work well enough for complex shaping (a.k.a. "character
>>> compositions") of scripts that require that, but we still do that one
>>> grapheme cluster at a time.  
>> 
>> That wouldn’t work for Arabic. You can’t shape Arabic one grapheme cluster 
>> at a time (or any other text actually, but the brokenness in Arabic will be 
>> immediately obvious), so I’m most certain that is not exactly how Arabic is 
>> handled in Emacs right now.
> 
> We pass to the shaper the part of text that matches the regexps you
> can see at the end of misc-lang.el, then display the glyphs the shaper
> returns.  The above description is a high-level overview; there are
> many details that I cannot describe in a short message.  For example,
> for Arabic, when we get back the grapheme clusters, we lay them out,
> then skip to the end of the text that we passed to the shaper.

You mean this:
https://repo.or.cz/emacs.git/blob/HEAD:/lisp/language/misc-lang.el#l78

I’m not sure how to read it, but it seems to be missing the entire Arabic
Extended-A and Arabic Mathematical Alphabetic Symbols blocks. I’m also not sure
how it would handle using combining marks from other blocks with Arabic text 
(say putting U+20D6 over an Arabic letter).
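The general technique described above (find the substrings that match a per-script pattern and hand each one to the shaper as a unit) can be sketched roughly as follows. The character class is hypothetical and is NOT the pattern in misc-lang.el; it simply adds the blocks mentioned here and lets marks from Combining Diacritical Marks for Symbols (U+20D0 to U+20FF, which includes U+20D6) follow an Arabic base so they reach the shaper together with it.

```python
import re

# Hypothetical character class (not the misc-lang.el pattern).  Block ranges:
# Arabic U+0600-06FF, Arabic Supplement U+0750-077F, Arabic Extended-A
# U+08A0-08FF, Presentation Forms-A U+FB50-FDFF, Presentation Forms-B
# U+FE70-FEFF, Arabic Mathematical Alphabetic Symbols U+1EE00-1EEFF.
ARABIC = ("\u0600-\u06FF\u0750-\u077F\u08A0-\u08FF"
          "\uFB50-\uFDFF\uFE70-\uFEFF\U0001EE00-\U0001EEFF")
# Combining Diacritical Marks for Symbols, e.g. U+20D6, allowed after a base.
SYMBOL_MARKS = "\u20D0-\u20FF"
ARABIC_RUN = re.compile(f"[{ARABIC}][{ARABIC}{SYMBOL_MARKS}]*")


def shaping_runs(text):
    """Split text into (start, end, needs_shaping) runs."""
    runs, pos = [], 0
    for m in ARABIC_RUN.finditer(text):
        if m.start() > pos:
            runs.append((pos, m.start(), False))
        runs.append((m.start(), m.end(), True))
        pos = m.end()
    if pos < len(text):
        runs.append((pos, len(text), False))
    return runs
```

With this pattern, `shaping_runs("ab \u0628\u0628\u20D6 c")` keeps the two behs and the U+20D6 mark in one run. Note that a space ends a run, so each Arabic word would be shaped separately.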

What happens if one edits a file that contains only Arabic text, and why can't
that (whatever it is) be extended to any text?

>>> The character composition is implemented
>>> in Lisp, which is called by the display engine, and which then calls
>>> back into C to invoke the shaper.  This implementation is meant to
>>> allow a great deal of control on what should be composed and how.  But
>>> it is also relatively slow, which is another reason why doing that for
>>> all the text to be laid out is impractical: it slows down redisplay to
>>> the degree that it becomes annoying to users.
>> 
>> Having more control should not be at the price of doing things wrong.
> 
> No one said it should, that's just how things are.
> 
>> The whole composition concept of Emacs does not make any sense to me, all 
>> text is “composed”. You can have a special mode that would disable shaping 
>> for specific purposes (opening huge log files, wanting to see raw text with 
>> no bidi or shaping, etc), but this can be done in cooperation with HarfBuzz 
>> and not by bypassing it entirely.
> 
> We are talking about a piece of software designed 21 years ago.  I
> realize that it makes no sense to you, but that's what we have, and
> will probably have for the next 10 years or so.  We must make the most
> out of what we have.

So nearly as old as the first release of OpenOffice (not counting its
StarOffice days). Anyway, bad decisions about text layout are quite rampant in
software (old and new) and need to be fixed, but that is not my call.



Re: [HarfBuzz] Ligatures

2020-05-23 Thread Khaled Hosny


> On May 23, 2020, at 8:26 PM, Eli Zaretskii wrote:
> 
>> From: Khaled Hosny 
>> Date: Sat, 23 May 2020 20:09:50 +0200
>> Cc: harfbuzz@lists.freedesktop.org
>> 
>> Overall, if you can’t send the whole text (words are the absolute minimum, 
>> but this has its issues as well), don’t just send arbitrary parts of it as 
>> the result will be some inconsistent mess.
> 
> I almost understand (and agree), sans one part: the "arbitrary parts"
> of what you wrote.  If we want to produce a ligature out of "ffi", the
> shaper will get "ffi" and nothing more.  Which part here is arbitrary?

Sending “ffi” alone is an arbitrary decision. The font might have kerning 
between “ffi” and what comes before and after it, but you won’t get it. The 
font might not have a ligature for “ffi” at all, but use kerning instead, so
you will get kerning between “ffi” glyphs and not other glyphs which is 
arbitrary. It might be a cursive font that changes glyph shapes based on 
surrounding glyphs, and you will get that for “ffi” and not elsewhere which is 
arbitrary.

That is just plain wrong, there is no way around it.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> From: Khaled Hosny 
> Date: Sat, 23 May 2020 20:18:33 +0200
> Cc: harfbuzz@lists.freedesktop.org
> 
> > The Emacs display engine examines the text to be displayed and laid
> > out one character at a time, and makes layout decisions after each
> > character or grapheme cluster it lays out.  Its design is therefore
> > fundamentally incompatible with shaping large substrings of buffer
> > text at once.  We do support that for short sequences of characters,
> > which seems to work well enough for complex shaping (a.k.a. "character
> > compositions") of scripts that require that, but we still do that one
> > grapheme cluster at a time.  
> 
> That wouldn’t work for Arabic. You can’t shape Arabic one grapheme cluster at 
> a time (or any other text actually, but the brokenness in Arabic will be 
> immediately obvious), so I’m most certain that is not exactly how Arabic is 
> handled in Emacs right now.

We pass to the shaper the part of text that matches the regexps you
can see at the end of misc-lang.el, then display the glyphs the shaper
returns.  The above description is a high-level overview; there are
many details that I cannot describe in a short message.  For example,
for Arabic, when we get back the grapheme clusters, we lay them out,
then skip to the end of the text that we passed to the shaper.

> > The character composition is implemented
> > in Lisp, which is called by the display engine, and which then calls
> > back into C to invoke the shaper.  This implementation is meant to
> > allow a great deal of control on what should be composed and how.  But
> > it is also relatively slow, which is another reason why doing that for
> > all the text to be laid out is impractical: it slows down redisplay to
> > the degree that it becomes annoying to users.
> 
> Having more control should not be at the price of doing things wrong.

No one said it should, that's just how things are.

> The whole composition concept of Emacs does not make any sense to me, all 
> text is “composed”. You can have a special mode that would disable shaping 
> for specific purposes (opening huge log files, wanting to see raw text with 
> no bidi or shaping, etc), but this can be done in cooperation with HarfBuzz 
> and not by bypassing it entirely.

We are talking about a piece of software designed 21 years ago.  I
realize that it makes no sense to you, but that's what we have, and
will probably have for the next 10 years or so.  We must make the most
out of what we have.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> From: Khaled Hosny 
> Date: Sat, 23 May 2020 20:09:50 +0200
> Cc: harfbuzz@lists.freedesktop.org
> 
> Overall, if you can’t send the whole text (words are the absolute minimum, 
> but this has its issues as well), don’t just send arbitrary parts of it as 
> the result will be some inconsistent mess.

I almost understand (and agree), sans one part: the "arbitrary parts"
of what you wrote.  If we want to produce a ligature out of "ffi", the
shaper will get "ffi" and nothing more.  Which part here is arbitrary?

Thanks.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Khaled Hosny


> On May 23, 2020, at 10:35 AM, Eli Zaretskii wrote:
> 
>> From: Khaled Hosny 
>> Date: Sat, 23 May 2020 09:59:15 +0200
>> Cc: harfbuzz@lists.freedesktop.org
>> 
>> Also either Emacs is currently treating text that it enables shaping for as 
>> second-class citizens where limitations/degraded performance is acceptable 
>> (which is really really bad)
> 
> Could you tell more about which limitations and degraded performance
> you had in mind?  I'm not sure we have this, but cannot tell without
> understanding the issues.

I have no idea. I’m just guessing why you think the Emacs display engine can’t 
handle all text like it handles Arabic. Either it does not handle Arabic 
correctly, or it can handle all text like it handles Arabic.

>> or “redesigning the entire Emacs display engine” is not really needed as you 
>> can just declare all text as text that needs to be shaped and be done with 
>> it.
> 
> The Emacs display engine examines the text to be displayed and laid
> out one character at a time, and makes layout decisions after each
> character or grapheme cluster it lays out.  Its design is therefore
> fundamentally incompatible with shaping large substrings of buffer
> text at once.  We do support that for short sequences of characters,
> which seems to work well enough for complex shaping (a.k.a. "character
> compositions") of scripts that require that, but we still do that one
> grapheme cluster at a time.  


That wouldn’t work for Arabic. You can’t shape Arabic one grapheme cluster at a 
time (or any other text actually, but the brokenness in Arabic will be 
immediately obvious), so I’m most certain that is not exactly how Arabic is 
handled in Emacs right now.
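Why Arabic cannot be shaped one grapheme cluster at a time can be seen from how joining forms are chosen: each letter's form depends on its neighbours. This is a simplified sketch, assuming a hand-picked joining table for six letters (real shapers use the Joining_Type data in Unicode's ArabicShaping.txt) and ignoring transparent marks and the lam-alef ligature. The form names are the OpenType feature tags isol, init, medi and fina.

```python
# Simplified joining data for a few letters (an assumption for this sketch).
DUAL_JOINING = {"\u0628", "\u0644", "\u0645"}   # beh, lam, meem
RIGHT_JOINING = {"\u0627", "\u062F", "\u0631"}  # alef, dal, reh
JOINING = DUAL_JOINING | RIGHT_JOINING


def joining_forms(text):
    """Return the form (isol/init/medi/fina) chosen for each letter."""
    forms = []
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else None
        nxt = text[i + 1] if i + 1 < len(text) else None
        # Joins the preceding letter only if that letter can join forward.
        joins_prev = ch in JOINING and prev in DUAL_JOINING
        # Joins the following letter only if this letter can join forward.
        joins_next = ch in DUAL_JOINING and nxt in JOINING
        if joins_prev and joins_next:
            forms.append("medi")
        elif joins_prev:
            forms.append("fina")
        elif joins_next:
            forms.append("init")
        else:
            forms.append("isol")
    return forms
```

Three behs shaped together give init, medi, fina; shaping each beh on its own gives isol three times, which is the immediately visible breakage. Beh, alef, beh gives init, fina, isol, because alef does not join to the letter after it.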

> The character composition is implemented
> in Lisp, which is called by the display engine, and which then calls
> back into C to invoke the shaper.  This implementation is meant to
> allow a great deal of control on what should be composed and how.  But
> it is also relatively slow, which is another reason why doing that for
> all the text to be laid out is impractical: it slows down redisplay to
> the degree that it becomes annoying to users.

Having more control should not be at the price of doing things wrong. The whole 
composition concept of Emacs does not make any sense to me, all text is 
“composed”. You can have a special mode that would disable shaping for specific 
purposes (opening huge log files, wanting to see raw text with no bidi or 
shaping, etc), but this can be done in cooperation with HarfBuzz and not by 
bypassing it entirely.

Regards,
Khaled


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Khaled Hosny


> On May 23, 2020, at 10:25 AM, Eli Zaretskii wrote:
> 
>> From: Khaled Hosny 
>> Date: Sat, 23 May 2020 09:51:21 +0200
>> Cc: harfbuzz@lists.freedesktop.org
>> 
>>> Thanks.  Since (b) is not really feasible without redesigning the
>>> entire Emacs display engine (for which I see no volunteers lining up
>>> any time soon), I guess we will have to use some more-or-less
>>> reasonable and somewhat unreliable heuristics by supporting only some
>>> ligatures that are known in advance.
>> 
>> What are you going to do about kerning, or mark positioning? Partially 
>> kerning arbitrary glyphs (because the sub string match some regular 
>> expression) is worse than not kerning at all.
> 
> I don't think I understand the question.  How is kerning related to
> the issue at hand?

Kerning is part of text layout. You are only considering ligatures, but they
are a small part of text layout, and your proposal does not seem to consider
anything other than ligatures, which is an arbitrary division and doesn't make
much sense to me. Some fonts provide ligatures to fix f-collisions, others fix
them with contextual alternates, and others fix them with kerning. Your proposed
solution does not address this. Also, when you pass certain text to the layout
engine, you get everything the font provides, not just ligatures, so you would
end up kerning certain letter combinations (those you send to the layout engine)
and not others, which is inconsistent and ugly.

Overall, if you can’t send the whole text (words are the absolute minimum, but 
this has its issues as well), don’t just send arbitrary parts of it as the 
result will be some inconsistent mess.
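The inconsistency described here can be shown with a toy layout in which only the pairs inside a regexp match reach the "shaper". The advance widths, kern pairs and the pattern "AV" are hypothetical; the point is that the same font pair ends up kerned in one place and not in the next.

```python
import re

# Hypothetical font data, in font units.
ADVANCE = {"A": 600, "V": 600}
KERN = {("A", "V"): -80, ("V", "A"): -80}


def pen_positions(text, pair_is_shaped):
    """x origin of each glyph; pair_is_shaped(i) says whether the pair
    (text[i], text[i + 1]) was seen by the shaper and so gets kerned."""
    xs, x = [], 0
    for i, ch in enumerate(text):
        xs.append(x)
        x += ADVANCE[ch]
        if i + 1 < len(text) and pair_is_shaped(i):
            x += KERN.get((ch, text[i + 1]), 0)
    return xs


def shape_whole(text):
    """Every pair is visible to the shaper."""
    return pen_positions(text, lambda i: True)


def shape_matches_only(text, pattern):
    """Only pairs lying inside a single regexp match reach the shaper."""
    spans = [m.span() for m in re.finditer(pattern, text)]

    def inside(i):
        return any(s <= i and i + 1 < e for s, e in spans)

    return pen_positions(text, inside)
```

For "AVA", `shape_whole` gives origins 0, 520, 1040 (both gaps tightened), while `shape_matches_only("AVA", "AV")` gives 0, 520, 1120: the "AV" pair is kerned but the identical-looking "VA" pair is not.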

Regards,
Khaled



Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> Date: Sat, 23 May 2020 16:54:51 +0100
> From: Richard Wordingham 
> Cc: harfbuzz@lists.freedesktop.org
> 
> > Emacs supports more than one rule for each composable sequence of
> > characters.
> 
> That doesn't help when the rules give conflicting divisions into
> clusters, which is the case with Tai Tham.

The assumption is that either the rules can be arranged in an order
that allows using the first matching rule, or, failing that, that you
write your own composing function that implements whatever logic
that's required to select the right rule.
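The "first matching rule" approach, and why rule order matters when rules disagree about cluster boundaries, can be sketched with a toy model. The letters and patterns below are hypothetical stand-ins (not Tai Tham or Devanagari data), and this is not the actual Emacs composition data structure; it only shows that the same text divides differently depending on which rule is tried first.

```python
import re

# Hypothetical rule sets: consonants [kg], vowels [aei].
LONG_FIRST = [re.compile("[kg][aei][kg]"), re.compile("[kg][aei]?")]
SHORT_ONLY = [re.compile("[kg][aei]?")]


def compose(text, rules):
    """Split text into clusters, using the first rule that matches at each
    position; characters no rule matches become single-character clusters."""
    clusters, pos = [], 0
    while pos < len(text):
        for pattern in rules:
            m = pattern.match(text, pos)
            if m and m.end() > pos:
                clusters.append(m.group())
                pos = m.end()
                break
        else:
            # No rule matched here: emit the character on its own.
            clusters.append(text[pos])
            pos += 1
    return clusters
```

`compose("kaka", LONG_FIRST)` gives "kak" and "a", while `compose("kaka", SHORT_ONLY)` gives "ka" and "ka". When a script genuinely needs both divisions, no single ordering is right, which is where a custom composing function comes in.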

> The Devanagari rule only covers the Vedic marks in the Devanagari block,
> the 'stress signs' according to the comments.  Can rules essentially
> for different scripts now share combining marks?  The newer Vedic marks
> were supposed to be available to at least all Indian Indic scripts.

I don't know enough about this to make sure I even understand the
question, let alone can provide an answer.  One thing I can say is
that the regexp pattern in a rule can specify different context (the
surrounding characters) even if the character that triggers the rule
is the same.  Failing that, I guess the solution will again be the
function that produces the composition.

As for different scripts: if the character codepoints are the same,
Emacs currently assigns each character to a single script.

> > Does Emacs indeed fail to wrap Arabic text?  can you show an example?
> 
> Character level wrapping still almost works down at Emacs 24.4, but I
> don't know that it wasn't broken in later enhancements.  There are three
> features that make me think Emacs 24.4 might be different to the
> current state of affairs:
> 
> (1) Clicking into the text breaks text before the cursor, but not after
> it.
> (2) I can't step into lam-alif the way I step into Indic clusters.
> (3) Lam-alif isn't broken by line wrap.

Emacs 24.4 is very old, and doesn't use HarfBuzz.  Please try Emacs 27
instead, it has several bugs in this area fixed, and will use HarfBuzz
if available at build time.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> Date: Sat, 23 May 2020 16:33:12 +0100
> From: Richard Wordingham 
> 
> On Sat, 23 May 2020 11:25:38 +0300
> Eli Zaretskii wrote:
> 
> > > From: Khaled Hosny 
> > > Date: Sat, 23 May 2020 09:51:21 +0200
> > > Cc: harfbuzz@lists.freedesktop.org
> > > What are you going to do about kerning, or mark positioning?
> > > Partially kerning arbitrary glyphs (because the sub string match
> > > some regular expression) is worse than not kerning at all.  
> > 
> > I don't think I understand the question.  How is kerning related to
> > the issue at hand?  I'm not an expert on typesetting text (so maybe I
> > don't even understand what exactly is meant by "kerning" in this
> > context), so please tell more details about this.
> 
> The simplest way of laying out proportionally spaced text is to have a
> fixed glyph-dependent distance ('advance width') from the 'origin' of a
> glyph to the origin of the next glyph and simply lay them out in a
> sequence, like movable type. However, if one chooses widths suitable
> for the sequences 'AM' and 'MV', then there may be an unsightly gap in
> the middle of 'AV'. Kerning is basically the process of adjusting those
> gaps.  Kerning is done by the shaper.  To do it, it needs the
> whole sequence of characters.

Ah, okay, thanks.  Then yes, Emacs just uses the advance width that we
get from the metrics of each glyph.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Richard Wordingham
On Sat, 23 May 2020 17:22:58 +0300
Eli Zaretskii wrote:

> > Date: Sat, 23 May 2020 14:51:53 +0100
> > From: Richard Wordingham 
> >   
> > > > They may of course have more than one set of such rules, with
> > > > the rule sets defining different sets of sequences.
> > > 
> > > Who are "they" in this context?  
> > 
> > Devanagari and Tai Tham are two examples I am aware of.  
> 
> Emacs supports more than one rule for each composable sequence of
> characters.

That doesn't help when the rules give conflicting divisions into
clusters, which is the case with Tai Tham.

On the other hand, for the Devanagari scripts, the rules can store
alternatives which some renderers would consider ill-formed, or which
would be more sensibly treated as 2 clusters.
 
> > Devanagari has different rules for positioning of Vedic marks
> > between fonts using the script tags dev and dev2 for it on one hand
> > and the unofficial script tag dev3, which follows the USE rules for
> > character ordering.  For tag dev, Microsoft says that <consonant,
> > virama, candrabindu, consonant> is one cluster; others, including
> > Unicode, say it's two.  Candrabindu in the middle and candrabindu
> > at the end mean different things; the former nasalises a consonant,
> > while the latter nasalises a vowel.  The visual distinction exists,
> > at least when half-forms are used.  
> 
> See the rules set up near the end of indian.el in Emacs.  If they
> don't cover what you describe, we can add more.

The Devanagari rule only covers the Vedic marks in the Devanagari block,
the 'stress signs' according to the comments.  Can rules essentially
for different scripts now share combining marks?  The newer Vedic marks
were supposed to be available to at least all Indian Indic scripts.

> > > If a font requires special shaping for any sequence of any number
> > > of 26 (or maybe 52) ASCII letters, then the Emacs display engine
> > > will need to be redesigned.  So this extreme possibility doesn't
> > > bother me.  
> > 
> > In general, they do require it.  But how is this worse than handling
> > Arabic?  
> 
> I don't know.  Maybe it isn't.  Or maybe the slowdown while displaying
> ASCII and moving the cursor through it will be unbearable.
> 
> > Is the problem that you want to keep the option of line
> > wrapping splitting words for ASCII, but are not bothered for Arabic
> > or other human languages?  
> 
> Does Emacs indeed fail to wrap Arabic text?  can you show an example?

Character level wrapping still almost works down at Emacs 24.4, but I
don't know that it wasn't broken in later enhancements.  There are three
features that make me think Emacs 24.4 might be different to the
current state of affairs:

(1) Clicking into the text breaks text before the cursor, but not after
it.
(2) I can't step into lam-alif the way I step into Indic clusters.
(3) Lam-alif isn't broken by line wrap.

> > I think you mean that Emacs would store the position of components
> > by an index that was the sequence of characters, not the glyph ID.
> > That would also deal with precomposed characters - it would be the
> > character sequence that mattered, and for cursor movement and
> > rendering, the canonically equivalent sequence(s) and the
> > precomposed character would remain distinct.  
> 
> Sorry, I don't follow: what do you mean by "store"?  Emacs stores the
> rules used to compose characters, and it stores the results of the
> compositions already done by applying those rules, as part of
> displaying some chunk of text.  Which one of these did you have in
> mind?

Neither.  I thought from the Emacs developers' discussion that you were
hoping to store the locations of the character boundaries within
ligatures. 

Richard.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Richard Wordingham
On Sat, 23 May 2020 11:25:38 +0300
Eli Zaretskii wrote:

> > From: Khaled Hosny 
> > Date: Sat, 23 May 2020 09:51:21 +0200
> > Cc: harfbuzz@lists.freedesktop.org
> > What are you going to do about kerning, or mark positioning?
> > Partially kerning arbitrary glyphs (because the substring matches
> > some regular expression) is worse than not kerning at all.  
> 
> I don't think I understand the question.  How is kerning related to
> the issue at hand?  I'm not an expert on typesetting text (so maybe I
> don't even understand what exactly is meant by "kerning" in this
> context), so please tell more details about this.

The simplest way of laying out proportionally spaced text is to have a
fixed glyph-dependent distance ('advance width') from the 'origin' of a
glyph to the origin of the next glyph and simply lay them out in a
sequence, like movable type. However, if one chooses widths suitable
for the sequences 'AM' and 'MV', then there may be an unsightly gap in
the middle of 'AV'. Kerning is basically the process of adjusting those
gaps.  Kerning is done by the shaper.  To do it, it needs the
whole sequence of characters.
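A toy model may make the point concrete.  The sketch below assumes
made-up advance widths and a single made-up kerning pair ('AV'); it is
not HarfBuzz's API, and real shapers read such adjustments from the
font's GPOS or kern tables.  It lays out a run by advance widths plus
pair kerning, and also shows what happens when the same text is split
into separately shaped pieces.

```python
# Hypothetical advance widths in font units (not from any real font).
ADVANCE = {"A": 600, "M": 700, "V": 600}
# Hypothetical pair kerning: a negative value pulls the second glyph closer.
KERN = {("A", "V"): -80}


def layout(run):
    """Return (origins, total_width) for `run`, applying pair kerning
    only between neighbours inside the same run."""
    x = 0
    origins = []
    prev = None
    for ch in run:
        if prev is not None:
            x += KERN.get((prev, ch), 0)  # adjust the gap before this glyph
        origins.append(x)
        x += ADVANCE[ch]
        prev = ch
    return origins, x


def layout_segments(segments):
    """Shape each segment on its own and place the results side by side,
    as happens when only some substrings are passed to the shaper."""
    x = 0
    origins = []
    for seg in segments:
        seg_origins, width = layout(seg)
        origins.extend(x + o for o in seg_origins)
        x += width
    return origins, x
```

Shaped as one run, layout("AV") places 'V' at 520; shaped as the two
segments "A" and "V", it lands at 600 and the kern is silently lost.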

To a first approximation, mark positioning is handled by passing the
whole clusters to the shaper, and suitable regular expressions will
handle this.  However, sometimes clusters will interact.  Microsoft had
an example in the OpenType specification of the handling of the sequence
Wö  with a comparatively huge 'W'.  In this
example, the umlaut would be lowered to get out of the way of the 'W'.
To do this, the shaper has to be presented with "W" and "ö" as part of
the same sequence.
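The cluster interaction can be illustrated with an invented rule.  In
this sketch the metrics, the threshold and the rule itself are all
hypothetical, and 'ö' is modelled simply as a base 'o' carrying a
diaeresis; a real font would typically express such an adjustment as a
contextual GPOS lookup.

```python
# Hypothetical metrics in font units, for illustration only.
BASE_TOP = {"W": 900, "o": 500}  # height of the top of each base letter
MARK_GAP = 100                   # default gap between base top and diaeresis
TALL_NEIGHBOUR = 800             # made-up threshold for an overhanging neighbour
LOWERING = 150                   # made-up amount by which to lower the mark


def diaeresis_y(run, index):
    """Vertical position of a diaeresis on run[index].  The neighbour to
    the left is consulted only if it belongs to the same shaping run."""
    y = BASE_TOP[run[index]] + MARK_GAP
    if index > 0 and BASE_TOP[run[index - 1]] > TALL_NEIGHBOUR:
        y -= LOWERING  # get out of the way of the tall preceding glyph
    return y
```

With "Wo" shaped together, diaeresis_y("Wo", 1) gives 450; with the 'ö'
cluster shaped on its own, diaeresis_y("o", 0) gives 600, so the
umlaut collides with the 'W'.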

Richard.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> Date: Sat, 23 May 2020 14:51:53 +0100
> From: Richard Wordingham 
> 
> > > They may of course have more than one set of such rules, with the
> > > rule sets defining different sets of sequences.  
> > 
> > Who are "they" in this context?
> 
> Devanagari and Tai Tham are two examples I am aware of.

Emacs supports more than one rule for each composable sequence of
characters.

> Devanagari has different rules for positioning of Vedic marks between
> fonts using the script tags dev and dev2 for it on one hand and the
> unofficial script tag dev3, which follows the USE rules for character
> ordering.  For tag dev, Microsoft says that  candrabindu, consonant> is one cluster; others, including Unicode, say
> it's two.  Candrabindu in the middle and candrabindu at the end mean
> different things; the former nasalises a consonant, while the latter
> nasalises a vowel.  The visual distinction exists, at least when
> half-forms are used.

See the rules set up near the end of indian.el in Emacs.  If they
don't cover what you describe, we can add more.

> > I'm not talking about Arabic.  Emacs has a set of regular expressions
> > for sequences of Arabic characters that need shaping, misc-lang.el in
> > Emacs.  If the set is incomplete, we can augment it.
> 
> That regular expression treats every Arabic word as in need of shaping. 
> 
> > If a font requires special shaping for any sequence of any number of
> > 26 (or maybe 52) ASCII letters, then the Emacs display engine will
> > need to be redesigned.  So this extreme possibility doesn't bother me.
> 
> In general, they do require it.  But how is this worse than handling
> Arabic?

I don't know.  Maybe it isn't.  Or maybe the slowdown while displaying
ASCII and moving the cursor through it will be unbearable.

> Is the problem that you want to keep the option of line
> wrapping splitting words for ASCII, but are not bothered for Arabic or
> other human languages?

Does Emacs indeed fail to wrap Arabic text?  Can you show an example?

> > > How would you handle the possibility that all three of <æ>, 
> > and  might be rendered by the same glyph, although they
> > > are comprised of 1, 2 and 3 characters respectively?  
> > 
> > By using a composition rule that matches both  and .
> > The rules are regexp-based, and expressing the above as a regexp is
> > simple.  Once a sequence of characters matches the regexp, Emacs calls
> > the shaper (hb_shape etc.) to produce the font glyphs for the
> > sequence, and displays the glyphs that the shaper returns.
> 
> I think you mean that Emacs would store the position of components by
> an index that was the sequence of characters, not the glyph ID.  That
> would also deal with precomposed characters - it would be the character
> sequence that mattered, and for cursor movement and rendering,
> the canonically equivalent sequence(s) and the precomposed character
> would remain distinct.

Sorry, I don't follow: what do you mean by "store"?  Emacs stores the
rules used to compose characters, and it stores the results of the
compositions already done by applying those rules, as part of
displaying some chunk of text.  Which one of these did you have in
mind?


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Richard Wordingham
On Sat, 23 May 2020 09:09:48 +0300
Eli Zaretskii  wrote:

> > Date: Fri, 22 May 2020 22:22:49 +0100
> > From: Richard Wordingham 
> >   
> > > The current support for producing ligatures works in the same way
> > > as complex text shaping for scripts that require that, like
> > > Arabic and Khmer: the sequences of characters that can be
> > > displayed as ligatures are identified in advance with suitable
> > > regular expressions, and the display engine then passes these
> > > sequences to hb_shape to produce the ligatures.
> > > 
> > > This works well for scripts that require complex shaping, because
> > > such scripts generally have well-defined rules for the sequences
> > > of codepoints that need shaping.  
> > 
> > They may of course have more than one set of such rules, with the
> > rule sets defining different sets of sequences.  
> 
> Who are "they" in this context?

Devanagari and Tai Tham are two examples I am aware of.

Devanagari has different rules for positioning of Vedic marks between
fonts using the script tags dev and dev2 for it on one hand and the
unofficial script tag dev3, which follows the USE rules for character
ordering.  For tag dev, Microsoft says that  is one cluster; others, including Unicode, say
it's two.  Candrabindu in the middle and candrabindu at the end mean
different things; the former nasalises a consonant, while the latter
nasalises a vowel.  The visual distinction exists, at least when
half-forms are used.

Tai Tham has an issue with the mark U+1A58 TAI THAM SIGN MAI KANG LAI.
It is, at least formally, a non-spacing mark.  It occurs at the
juncture of two syllables in the same word.  Modern, printed Tai Khuen
happily treats it as syllable-final.  In more traditional styles, it
starts syllables, going above the first consonant, and so to the right
of a vowel mark reordered to the left hand side of the syllable.  Some
fonts seem to just let it hang over the start of the next syllable,
taking pot luck with what's there.  That gives two different syllable
structures.

Because I supported the style found in a certain dictionary, in which
it sometimes belongs with the syllable before and sometimes with the
syllable after, I ended up defining the sequences to be shaped as
sequences of one or more syllables joined together by U+1A58.
Fortunately, normal cursor motion is controlled by a different
definition.  (I'm still using Emacs 24.4 with the restoration of
interactive commands forward-char-intrusive and backward-char-intrusive
and their interface within the C code.)
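A shaping unit of "one or more syllables joined by U+1A58" might be
expressed as a regular expression roughly like the one below.  The
syllable pattern is a deliberately crude, hypothetical approximation (a
base letter followed by marks or by SAKOT plus a stacked consonant); it
is not Richard's actual rule set and not a correct Tai Tham syllable
grammar.

```python
import re

# Crude approximation of a Tai Tham base letter (consonants and
# independent vowels, U+1A20..U+1A54).
BASE = "[\u1A20-\u1A54]"
# Dependent signs, excluding U+1A58 MAI KANG LAI and U+1A60 SAKOT,
# which are handled explicitly below.
MARK = "[\u1A55-\u1A57\u1A59-\u1A5E\u1A61-\u1A7C\u1A7F]"
# A syllable: a base, then marks or SAKOT + another base (stacked consonant).
SYLLABLE = f"{BASE}(?:{MARK}|\u1A60{BASE})*"
MAI_KANG_LAI = "\u1A58"

# One or more syllables joined by MAI KANG LAI, plus an optional trailing
# one, so the sign is never split from whichever syllable it attaches to.
SHAPING_RUN = re.compile(
    f"{SYLLABLE}(?:{MAI_KANG_LAI}{SYLLABLE})*{MAI_KANG_LAI}?")


def shaping_runs(text):
    """Split Tai Tham text into the runs that would be passed to the shaper."""
    return SHAPING_RUN.findall(text)
```

Under these assumptions, two syllables joined by U+1A58 come out as one
run, while a following syllable without it starts a new run.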

> I understand that the number of combinations is theoretically
> unbounded.  I'm asking if it is also unbounded in practice.  That is,
> do font designers add ligatures for arbitrary combinations of
> characters, regardless of some reasonable set of requirements?  For
> example, is the set of ligatures of Latin characters shown here:
> 
>   https://en.wikipedia.org/wiki/Orthographic_ligature#Latin_alphabet
> 
> reasonably complete, or should I expect any number of other arbitrary
> combinations of Latin characters popping up in fonts?  And if the
> latter, then what is the purpose of providing such arbitrary
> ligatures?

Doesn't the existence of ligatures for 'Eisenhower' and 'Chamberlain'
provide enough of an answer?

If you claim to support handwriting fonts, then you can expect others -
'sh', 'tt' and 'ing' are fairly obvious ones.  You may also find
ligatures being used to sort out kerning issues.

One problem I've observed with computer fonts is that the spacing of
glyphs in a string is not consistent.  This appears to be due to the
way the positioning of the glyphs is rounded.  The problem can be bad
enough that the designer ends up fixing the problem by combining them
into a single glyph, which formally is a ligature.  I've not noticed
this in ASCII fonts, but then I haven't looked hard at them.

The 'tt' ligature can arise because the two t's are crossed by a
single stroke.  Crossing the 't' in 'lt' might be handled by a special
't' glyph, or one might just form an 'lt' ligature.  The ending 'ing'
is common enough that I unconsciously developed an abbreviated way of
writing it.
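Ligatures of this kind can be modelled, very roughly, as a table-driven
substitution over the text.  The table below is made up, and the sketch
tries the longest candidate first; real OpenType ligature substitution
(GSUB lookup type 4) works on glyph IDs, with the font itself specifying
the order in which candidate ligatures are tried.

```python
# Hypothetical ligature table: character sequence -> ligature glyph name.
LIGATURES = {"ing": "i_n_g", "tt": "t_t", "lt": "l_t", "sh": "s_h"}
MAX_LEN = max(len(k) for k in LIGATURES)


def substitute(text):
    """Greedy left-to-right, longest-match-first ligature substitution."""
    glyphs = []
    i = 0
    while i < len(text):
        # Try candidate lengths from longest to 2 characters.
        for n in range(min(MAX_LEN, len(text) - i), 1, -1):
            lig = LIGATURES.get(text[i:i + n])
            if lig is not None:
                glyphs.append(lig)
                i += n
                break
        else:
            # No ligature starts here: keep the character as its own glyph.
            glyphs.append(text[i])
            i += 1
    return glyphs
```

For example, substitute("setting") yields ['s', 'e', 't_t', 'i_n_g'].
Since the table lives in the font, only the font knows which sequences
may be replaced.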

> I'm not talking about Arabic.  Emacs has a set of regular expressions
> for sequences of Arabic characters that need shaping, misc-lang.el in
> Emacs.  If the set is incomplete, we can augment it.

That regular expression treats every Arabic word as in need of shaping. 

> If a font requires special shaping for any sequence of any number of
> 26 (or maybe 52) ASCII letters, then the Emacs display engine will
> need to be redesigned.  So this extreme possibility doesn't bother me.

In general, they do require it.  But how is this worse than handling
Arabic?  Is the problem that you want to keep the option of line
wrapping splitting words for ASCII, but are not bothered for Arabic or
other human languages?  ASCII does not really suffice for
English.

> > How would you handle the possibility that all three of <æ>, 
> >

Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> From: Khaled Hosny 
> Date: Sat, 23 May 2020 09:59:15 +0200
> Cc: harfbuzz@lists.freedesktop.org
> 
> Also either Emacs is currently treating text that it enables shaping for as 
> second-class citizens where limitations/degraded performance is acceptable 
> (which is really really bad)

Could you tell more about which limitations and degraded performance
you had in mind?  I'm not sure we have this, but cannot tell without
understanding the issues.

> or “redesigning the entire Emacs display engine” is not really needed as you 
> can just declare all text as text that needs to be shaped and be done with it.

The Emacs display engine examines the text to be displayed and laid
out one character at a time, and makes layout decisions after each
character or grapheme cluster it lays out.  Its design is therefore
fundamentally incompatible with shaping large substrings of buffer
text at once.  We do support that for short sequences of characters,
which seems to work well enough for complex shaping (a.k.a. "character
compositions") of scripts that require that, but we still do that one
grapheme cluster at a time.  The character composition is implemented
in Lisp, which is called by the display engine, and which then calls
back into C to invoke the shaper.  This implementation is meant to
allow a great deal of control over what should be composed and how.  But
it is also relatively slow, which is another reason why doing that for
all the text to be laid out is impractical: it slows down redisplay to
the degree that it becomes annoying to users.

That is why solving these problems in the way that you suggest
requires a complete rewrite of the Emacs display code.  It simply
cannot currently support what you expect.
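The per-character loop described above can be modelled roughly as
follows: at each position, check whether a composition rule triggered
by the current character matches there; if so, hand the matched run to
the shaper as one unit, otherwise lay the character out on its own.
The rule table and the `shape` callback are hypothetical stand-ins (in
Emacs the real rules live in `composition-function-table`, and shaping
goes through Lisp), so this sketches the control flow only.

```python
import re
from typing import Callable, Dict, List, Tuple


def layout_line(text: str,
                rules: Dict[str, re.Pattern],
                shape: Callable[[str], List[str]]) -> List[Tuple[str, List[str]]]:
    """Walk `text` one character at a time.  When a rule triggered by the
    current character matches at this position, pass the whole match to
    `shape` and emit it as one unit; otherwise emit the character alone."""
    units = []
    i = 0
    while i < len(text):
        pattern = rules.get(text[i])
        m = pattern.match(text, i) if pattern is not None else None
        if m is not None and m.end() > i:
            run = m.group(0)
            units.append((run, shape(run)))  # shaped as one composition
            i = m.end()
        else:
            units.append((text[i], [text[i]]))  # plain single glyph
            i += 1
    return units
```

With rules = {"f": re.compile("f[fil]")}, "office" is laid out as 'o',
a shaped 'ff' unit, then 'i', 'c' and 'e'.  Each call to `shape` sees
only the matched run, never its neighbours.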


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> From: Khaled Hosny 
> Date: Sat, 23 May 2020 09:51:21 +0200
> Cc: harfbuzz@lists.freedesktop.org
> 
> > Thanks.  Since (b) is not really feasible without redesigning the
> > entire Emacs display engine (for which I see no volunteers lining up
> > any time soon), I guess we will have to use some more-or-less
> > reasonable and somewhat unreliable heuristics by supporting only some
> > ligatures that are known in advance.
> 
> What are you going to do about kerning, or mark positioning? Partially 
> kerning arbitrary glyphs (because the substring matches some regular 
> expression) is worse than not kerning at all.

I don't think I understand the question.  How is kerning related to
the issue at hand?  I'm not an expert on typesetting text (so maybe I
don't even understand what exactly is meant by "kerning" in this
context), so please tell more details about this.

Thanks.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Khaled Hosny


> On May 23, 2020, at 9:51 AM, Khaled Hosny  wrote:
> 
> 
> 
>> On May 23, 2020, at 9:44 AM, Eli Zaretskii  wrote:
>> 
>>> From: Khaled Hosny 
>>> Date: Sat, 23 May 2020 08:36:10 +0200
>>> Cc: harfbuzz@lists.freedesktop.org
>>> 
>>>> The only way of
>>>> doing this right, I'm told, is to either (a) query the font to get the
>>>> list of all the ligatures it supports, or (b) assume any combination
>>>> of characters can produce a ligature, and therefore we need to pass
>>>> all the characters intended for display through hb_shape.  The latter
>>>> in particular is in stark contrast to how the current Emacs display
>>>> code is designed and implemented.
>>> 
>>> (a) is not realistically possible as doing it properly has pretty much the 
>>> same cost as shaping the text. So your only reliable option is (b).
>> 
>> Thanks.  Since (b) is not really feasible without redesigning the
>> entire Emacs display engine (for which I see no volunteers lining up
>> any time soon), I guess we will have to use some more-or-less
>> reasonable and somewhat unreliable heuristics by supporting only some
>> ligatures that are known in advance.
> 
> What are you going to do about kerning, or mark positioning? Partially 
> kerning arbitrary glyphs (because the substring matches some regular 
> expression) is worse than not kerning at all.

Also either Emacs is currently treating text that it enables shaping for as 
second-class citizens where limitations/degraded performance is acceptable 
(which is really really bad), or “redesigning the entire Emacs display engine” 
is not really needed as you can just declare all text as text that needs to be 
shaped and be done with it.


Re: [HarfBuzz] Ligatures

2020-05-23 Thread Khaled Hosny



> On May 23, 2020, at 9:44 AM, Eli Zaretskii  wrote:
> 
>> From: Khaled Hosny 
>> Date: Sat, 23 May 2020 08:36:10 +0200
>> Cc: harfbuzz@lists.freedesktop.org
>> 
>>>   The only way of
>>> doing this right, I'm told, is to either (a) query the font to get the
>>> list of all the ligatures it supports, or (b) assume any combination
>>> of characters can produce a ligature, and therefore we need to pass
>>> all the characters intended for display through hb_shape.  The latter
>>> in particular is in stark contrast to how the current Emacs display
>>> code is designed and implemented.
>> 
>> (a) is not realistically possible as doing it properly has pretty much the 
>> same cost as shaping the text. So your only reliable option is (b).
> 
> Thanks.  Since (b) is not really feasible without redesigning the
> entire Emacs display engine (for which I see no volunteers lining up
> any time soon), I guess we will have to use some more-or-less
> reasonable and somewhat unreliable heuristics by supporting only some
> ligatures that are known in advance.

What are you going to do about kerning, or mark positioning? Partially kerning 
arbitrary glyphs (because the substring matches some regular expression) is 
worse than not kerning at all.



Re: [HarfBuzz] Ligatures

2020-05-23 Thread Eli Zaretskii
> From: Khaled Hosny 
> Date: Sat, 23 May 2020 08:36:10 +0200
> Cc: harfbuzz@lists.freedesktop.org
> 
> >The only way of
> > doing this right, I'm told, is to either (a) query the font to get the
> > list of all the ligatures it supports, or (b) assume any combination
> > of characters can produce a ligature, and therefore we need to pass
> > all the characters intended for display through hb_shape.  The latter
> > in particular is in stark contrast to how the current Emacs display
> > code is designed and implemented.
> 
> (a) is not realistically possible as doing it properly has pretty much the 
> same cost as shaping the text. So your only reliable option is (b).

Thanks.  Since (b) is not really feasible without redesigning the
entire Emacs display engine (for which I see no volunteers lining up
any time soon), I guess we will have to use some more-or-less
reasonable and somewhat unreliable heuristics by supporting only some
ligatures that are known in advance.