Re: [HarfBuzz] Ligatures
On Sat, 23 May 2020 19:45:17 +0300 Eli Zaretskii wrote:

> > Date: Sat, 23 May 2020 16:54:51 +0100
> > From: Richard Wordingham
> > Cc: harfbuzz@lists.freedesktop.org
> >
> > > Emacs supports more than one rule for each composable sequence of
> > > characters.
> >
> > That doesn't help when the rules give conflicting divisions into
> > clusters, which is the case with Tai Tham.
>
> The assumption is that either the rules can be arranged in an order
> that allows using the first matching rule, or, failing that, that you
> write your own composing function that implements whatever logic is
> required to select the right rule.

That choice needs to be tied to the choice of font - or for Tai Tham you
use my hack technique. However, it's not as bad as it could be. There's
something strange going on in Tai Tham even at Emacs 27.05. I can have
two aksharas interacting for shaping, but it takes two 'ordinary' key
advances to pass through it, apparently implying that there are two
clusters. Clusters for cursor advancement and clusters for shaping seem
to be controlled independently!

From the dotted circle insertion logic, Emacs 27.05 on my machine
definitely looks as though it's using some form of HarfBuzz.

> > The Devanagari rule only covers the Vedic marks in the Devanagari
> > block, the 'stress signs' according to the comments. Can rules
> > essentially for different scripts now share combining marks? The
> > newer Vedic marks were supposed to be available to at least all
> > Indian Indic scripts.
>
> I don't know enough about this to make sure I even understand the
> question, let alone can provide an answer. One thing I can say is
> that the regexp pattern in a rule can specify different context (the
> surrounding characters) even if the character that triggers the rule
> is the same. Failing that, I guess the solution will again be the
> function that produces the composition.
>
> As for different scripts: if the character codepoints are the same,
> Emacs currently assigns each character to a single script.

I'll need to dig deeper. Composition of both 'a' and Greek alpha with
an acute accent works, which suggests that the problem isn't there for
characters with a script property of 'inherited'.

> > > Does Emacs indeed fail to wrap Arabic text? Can you show an
> > > example?
> >
> > Character level wrapping still almost works down at Emacs 24.4, but
> > I don't know that it wasn't broken in later enhancements. There
> > are three features that make me think Emacs 24.4 might be different
> > to the current state of affairs:
> >
> > (1) Clicking into the text breaks text before the cursor, but not
> > after it.
> > (2) I can't step into lam-alif the way I step into Indic clusters.
> > (3) Lam-alif isn't broken by line wrap.
>
> Emacs 24.4 is very old, and doesn't use HarfBuzz. Please try Emacs 27
> instead, it has several bugs in this area fixed, and will use HarfBuzz
> if available at build time.

The behaviour in 27.05 is almost the same as for 24.4, but the breaking
in item (1) is automatically repaired. The process seems slow - I can
see the glyph become final and then revert back to being medial. I'm
puzzled by not being able to step into lam-alif but being able to step
through a series of 'beh's.

The step-into command for advancing codepoint by codepoint semi-works.
The cluster shaping doesn't break at the cursor - Handa gave me a C
code fix so I could achieve that - but the number of step-into commands
needed to pass through a cluster matches the number of codepoints.
Pressing the 'delete' key still deletes a single character, but maybe
that's because it's mapped to tpu-delete-current-char.

So, what's not working in Arabic is that one can't move the cursor
through ligatures. It seems one can advance point through them using a
step-into command (dead reckoning is a useful fallback), but one loses
visual feedback.

Apart from that important matter, it looks as though Arabic in Emacs
already has the behaviours needed for shaping Latin words. The stepping
into is enabled by the command "(setq disable-point-adjustment t)".

Richard.
___
HarfBuzz mailing list
HarfBuzz@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/harfbuzz
Re: [HarfBuzz] Ligatures
On 23/05/2020 08:44, Eli Zaretskii wrote:
> Thanks. Since (b) is not really feasible without redesigning the
> entire Emacs display engine (for which I see no volunteers lining up
> any time soon), I guess we will have to use some more-or-less
> reasonable and somewhat unreliable heuristics by supporting only some
> ligatures that are known in advance.

Travelling further in the wrong direction is always an option, but don't
expect it to get you closer to the right destination.

Full text shaping is the only way to get this right. Everything else is
a hack, and piling hacks on top of hacks is just storing maintenance
problems up for yourself. I know that's hard to hear for a volunteer
project where nobody really wants to invest the effort in this
complicated niche stuff, but honestly, you're probably better doing
*nothing* than doing this.
Re: [HarfBuzz] Ligatures
> Cc: harfbuzz@lists.freedesktop.org
> From: Simon Cozens
> Date: Sat, 23 May 2020 20:14:16 +0100
>
> On 23/05/2020 08:44, Eli Zaretskii wrote:
> > Thanks. Since (b) is not really feasible without redesigning the
> > entire Emacs display engine (for which I see no volunteers lining up
> > any time soon), I guess we will have to use some more-or-less
> > reasonable and somewhat unreliable heuristics by supporting only some
> > ligatures that are known in advance.
>
> Travelling further in the wrong direction is always an option, but don't
> expect it to get you closer to the right destination.

I don't think this is an adequate analogy. What Emacs does is an
approximation to what should be done. The approximation falls short of
the target, that's true, and might even produce clearly incorrect
results in some cases (although I've yet to see such cases, and I've
been using Emacs to edit non-ASCII text for 20 years). But it is still
an approximation, so it is not really "the wrong direction" (which you
seem to interpret as 180 degrees off, otherwise even going in the wrong
direction might bring me closer to the destination, right?).
Re: [HarfBuzz] Ligatures
> Date: Sat, 23 May 2020 20:06:32 +0100
> From: Richard Wordingham
>
> There are three different tools for producing what looks like an "ffi"
> ligature:
>
> 1) Make a ligature
> 2) Contextual substitution
> 3) A mix of contextual substitution and kerning.
>
> A font that uses the first will produce a ligature for Emacs.
>
> A font that uses contextual substitution will not work - you will just
> see the 3 unligated characters with their default glyphs.
>
> A font that uses a mix of contextual substitution and kerning will
> likewise fail. However, it is possible that you might get the "ff"
> ligature and a normal 'i', or a normal 'f' and an "fi" ligature.
>
> From the point of view of someone who expects full shaping, what result
> you get will be arbitrary, depending on how the font designer has
> marshalled his tools.

I understand. Still, the result looks reasonably good in most cases,
especially in an editor whose main purpose is to edit programs, and
which doesn't pretend to produce typographical accuracy.
Re: [HarfBuzz] Ligatures
> From: Khaled Hosny
> Date: Sat, 23 May 2020 20:54:15 +0200
> Cc: harfbuzz@lists.freedesktop.org
>
> > We pass to the shaper the part of text that matches the regexps you
> > can see at the end of misc-lang.el, then display the glyphs the shaper
> > returns. The above description is a high-level overview; there are
> > many details that I cannot describe in a short message. For example,
> > for Arabic, when we get back the grapheme clusters, we lay them out,
> > then skip to the end of the text that we passed to the shaper.
>
> You mean this:
> https://repo.or.cz/emacs.git/blob/HEAD:/lisp/language/misc-lang.el#l78
>
> I’m not sure how to read it, but it seems to be missing the entire Arabic
> Extended-A and Arabic Mathematical Alphabetic Symbols blocks. I’m also not
> sure how it would handle using combining marks from other blocks with Arabic
> text (say putting U+20D6 over an Arabic letter).

If you can suggest improvements to those patterns, please do, and thanks.
Re: [HarfBuzz] Ligatures
On Sat, 23 May 2020 21:26:00 +0300 Eli Zaretskii wrote:

> > From: Khaled Hosny
> > Date: Sat, 23 May 2020 20:09:50 +0200
> > Cc: harfbuzz@lists.freedesktop.org
> >
> > Overall, if you can’t send the whole text (words are the absolute
> > minimum, but this has its issues as well), don’t just send
> > arbitrary parts of it as the result will be some inconsistent
> > mess.
>
> I almost understand (and agree), sans one part: the "arbitrary parts"
> of what you wrote. If we want to produce a ligature out of "ffi", the
> shaper will get "ffi" and nothing more. Which part here is arbitrary?

There are three different tools for producing what looks like an "ffi"
ligature:

1) Make a ligature
2) Contextual substitution
3) A mix of contextual substitution and kerning.

A font that uses the first will produce a ligature for Emacs.

A font that uses contextual substitution will not work - you will just
see the 3 unligated characters with their default glyphs.

A font that uses a mix of contextual substitution and kerning will
likewise fail. However, it is possible that you might get the "ff"
ligature and a normal 'i', or a normal 'f' and an "fi" ligature.

From the point of view of someone who expects full shaping, what result
you get will be arbitrary, depending on how the font designer has
marshalled his tools.

Richard.
Re: [HarfBuzz] Ligatures
> From: Khaled Hosny
> Date: Sat, 23 May 2020 20:40:44 +0200
> Cc: harfbuzz@lists.freedesktop.org
>
> Sending “ffi” alone is an arbitrary decision. The font might have kerning
> between “ffi” and what comes before and after it, but you won’t get it. The
> font might not have a ligature for “ffi” at all, but use kerning instead, so
> you will get kerning between “ffi” glyphs and not other glyphs, which is
> arbitrary. It might be a cursive font that changes glyph shapes based on
> surrounding glyphs, and you will get that for “ffi” and not elsewhere, which
> is arbitrary.
>
> That is just plain wrong, there is no way around it.

OK, thanks.
Re: [HarfBuzz] Ligatures
> On May 23, 2020, at 8:34 PM, Eli Zaretskii wrote:
>
>> From: Khaled Hosny
>> Date: Sat, 23 May 2020 20:18:33 +0200
>> Cc: harfbuzz@lists.freedesktop.org
>>
>>> The Emacs display engine examines the text to be displayed and laid
>>> out one character at a time, and makes layout decisions after each
>>> character or grapheme cluster it lays out. Its design is therefore
>>> fundamentally incompatible with shaping large substrings of buffer
>>> text at once. We do support that for short sequences of characters,
>>> which seems to work well enough for complex shaping (a.k.a. "character
>>> compositions") of scripts that require that, but we still do that one
>>> grapheme cluster at a time.
>>
>> That wouldn’t work for Arabic. You can’t shape Arabic one grapheme cluster
>> at a time (or any other text actually, but the brokenness in Arabic will be
>> immediately obvious), so I’m most certain that is not exactly how Arabic is
>> handled in Emacs right now.
>
> We pass to the shaper the part of text that matches the regexps you
> can see at the end of misc-lang.el, then display the glyphs the shaper
> returns. The above description is a high-level overview; there are
> many details that I cannot describe in a short message. For example,
> for Arabic, when we get back the grapheme clusters, we lay them out,
> then skip to the end of the text that we passed to the shaper.

You mean this:
https://repo.or.cz/emacs.git/blob/HEAD:/lisp/language/misc-lang.el#l78

I’m not sure how to read it, but it seems to be missing the entire
Arabic Extended-A and Arabic Mathematical Alphabetic Symbols blocks.
I’m also not sure how it would handle using combining marks from other
blocks with Arabic text (say putting U+20D6 over an Arabic letter).

What happens if one edits a file that contains only Arabic text, and
why can’t that (whatever it is) be extended to any text?

>>> The character composition is implemented
>>> in Lisp, which is called by the display engine, and which then calls
>>> back into C to invoke the shaper. This implementation is meant to
>>> allow a great deal of control on what should be composed and how. But
>>> it is also relatively slow, which is another reason why doing that for
>>> all the text to be laid out is impractical: it slows down redisplay to
>>> the degree that it becomes annoying to users.
>>
>> Having more control should not be at the price of doing things wrong.
>
> No one said it should, that's just how things are.
>
>> The whole composition concept of Emacs does not make any sense to me, all
>> text is “composed”. You can have a special mode that would disable shaping
>> for specific purposes (opening huge log files, wanting to see raw text with
>> no bidi or shaping, etc), but this can be done in cooperation with HarfBuzz
>> and not by bypassing it entirely.
>
> We are talking about a piece of software designed 21 years ago. I
> realize that it makes no sense to you, but that's what we have, and
> will probably have for the next 10 years or so. We must make the most
> out of what we have.

So nearly as old as the first release of OpenOffice (not counting its
StarOffice days). Anyway, bad decisions about text layout are quite
rampant in software (old and new) and need to be fixed, but that is not
my call.
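[Editorial illustration] The coverage concern raised in this message can be checked mechanically. The sketch below is only a rough illustration: the pattern is a hypothetical stand-in for whatever misc-lang.el actually contains (it may well differ), and the sample characters are chosen only to represent the blocks named above.

```python
import re
import unicodedata

# Hypothetical stand-in for the composition pattern under discussion.
# This is an assumption; the real regexps live in misc-lang.el.
PATTERN = re.compile("[\u0600-\u074F\u200C\u200D]+")

# One representative code point per block mentioned in the message.
SAMPLES = {
    "Arabic (U+0628)": "\u0628",
    "Arabic Extended-A (U+08A0)": "\u08A0",
    "Arabic Mathematical Alphabetic Symbols (U+1EE00)": "\U0001EE00",
    "combining mark from another block (U+20D6)": "\u20D6",
}

def coverage(pattern, samples):
    """Map each sample label to whether the pattern matches that character."""
    return {label: bool(pattern.fullmatch(ch)) for label, ch in samples.items()}

def is_combining(ch):
    """True if the character has a non-zero canonical combining class."""
    return unicodedata.combining(ch) != 0

for label, covered in coverage(PATTERN, SAMPLES).items():
    print(f"{label}: {'covered' if covered else 'NOT covered'}")
```

With this stand-in pattern, only the basic Arabic sample is matched, so a mark such as U+20D6 following an Arabic letter would fall outside the run handed to the shaper.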
Re: [HarfBuzz] Ligatures
> On May 23, 2020, at 8:26 PM, Eli Zaretskii wrote:
>
>> From: Khaled Hosny
>> Date: Sat, 23 May 2020 20:09:50 +0200
>> Cc: harfbuzz@lists.freedesktop.org
>>
>> Overall, if you can’t send the whole text (words are the absolute minimum,
>> but this has its issues as well), don’t just send arbitrary parts of it as
>> the result will be some inconsistent mess.
>
> I almost understand (and agree), sans one part: the "arbitrary parts"
> of what you wrote. If we want to produce a ligature out of "ffi", the
> shaper will get "ffi" and nothing more. Which part here is arbitrary?

Sending “ffi” alone is an arbitrary decision. The font might have
kerning between “ffi” and what comes before and after it, but you won’t
get it. The font might not have a ligature for “ffi” at all, but use
kerning instead, so you will get kerning between “ffi” glyphs and not
other glyphs, which is arbitrary. It might be a cursive font that
changes glyph shapes based on surrounding glyphs, and you will get that
for “ffi” and not elsewhere, which is arbitrary.

That is just plain wrong, there is no way around it.
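[Editorial illustration] The point about kerning at the edges of "ffi" can be shown with a toy model. This is a minimal sketch, not HarfBuzz: the `shape` function, the glyph advances and the kerning pairs are all invented for illustration.

```python
# Toy font data; every number here is hypothetical.
ADVANCE = {"o": 500, "f": 300, "i": 250, "c": 450, "e": 480, "ffi": 780}
KERN = {("o", "ffi"): -40, ("ffi", "c"): -30}

def shape(text):
    """Toy shaper: form the 'ffi' ligature, then kern adjacent glyphs.

    Returns (glyph list, total width)."""
    glyphs, i = [], 0
    while i < len(text):
        if text.startswith("ffi", i):
            glyphs.append("ffi")
            i += 3
        else:
            glyphs.append(text[i])
            i += 1
    width = sum(ADVANCE[g] for g in glyphs)
    width += sum(KERN.get(pair, 0) for pair in zip(glyphs, glyphs[1:]))
    return glyphs, width

def shape_piecewise(text, piece="ffi"):
    """Send only the first `piece` to the shaper; use raw advances elsewhere."""
    before, found, after = text.partition(piece)
    width = sum(ADVANCE[c] for c in before) + sum(ADVANCE[c] for c in after)
    if found:
        width += shape(found)[1]
    return width

print(shape("office"))            # whole word: ligature plus edge kerning
print(shape_piecewise("office"))  # ligature only: edge kerning is lost
```

In this model both approaches produce the ligature, but only whole-word shaping sees the pairs on either side of it, so the two widths differ.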
Re: [HarfBuzz] Ligatures
> From: Khaled Hosny
> Date: Sat, 23 May 2020 20:18:33 +0200
> Cc: harfbuzz@lists.freedesktop.org
>
> > The Emacs display engine examines the text to be displayed and laid
> > out one character at a time, and makes layout decisions after each
> > character or grapheme cluster it lays out. Its design is therefore
> > fundamentally incompatible with shaping large substrings of buffer
> > text at once. We do support that for short sequences of characters,
> > which seems to work well enough for complex shaping (a.k.a. "character
> > compositions") of scripts that require that, but we still do that one
> > grapheme cluster at a time.
>
> That wouldn’t work for Arabic. You can’t shape Arabic one grapheme cluster at
> a time (or any other text actually, but the brokenness in Arabic will be
> immediately obvious), so I’m most certain that is not exactly how Arabic is
> handled in Emacs right now.

We pass to the shaper the part of text that matches the regexps you can
see at the end of misc-lang.el, then display the glyphs the shaper
returns. The above description is a high-level overview; there are many
details that I cannot describe in a short message. For example, for
Arabic, when we get back the grapheme clusters, we lay them out, then
skip to the end of the text that we passed to the shaper.

> > The character composition is implemented
> > in Lisp, which is called by the display engine, and which then calls
> > back into C to invoke the shaper. This implementation is meant to
> > allow a great deal of control on what should be composed and how. But
> > it is also relatively slow, which is another reason why doing that for
> > all the text to be laid out is impractical: it slows down redisplay to
> > the degree that it becomes annoying to users.
>
> Having more control should not be at the price of doing things wrong.

No one said it should, that's just how things are.

> The whole composition concept of Emacs does not make any sense to me, all
> text is “composed”. You can have a special mode that would disable shaping
> for specific purposes (opening huge log files, wanting to see raw text with
> no bidi or shaping, etc), but this can be done in cooperation with HarfBuzz
> and not by bypassing it entirely.

We are talking about a piece of software designed 21 years ago. I
realize that it makes no sense to you, but that's what we have, and will
probably have for the next 10 years or so. We must make the most out of
what we have.
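[Editorial illustration] The flow described in this message (find the text matching a regexp, hand that run to the shaper, then skip to the end of the run) can be sketched as follows. The pattern is a hypothetical stand-in rather than the one in misc-lang.el, and no real shaper is called; the function only reports which pieces would be sent for shaping.

```python
import re

# Hypothetical pattern (an assumption): a run of characters from the basic
# Arabic blocks plus ZWNJ/ZWJ. The actual regexps may differ.
ARABIC_RUN = re.compile("[\u0600-\u074F\u200C\u200D]+")

def segment(text):
    """Split text into ordered (needs_shaping, substring) pieces."""
    pieces, pos = [], 0
    for m in ARABIC_RUN.finditer(text):
        if m.start() > pos:
            # Text before the run is laid out without the shaper.
            pieces.append((False, text[pos:m.start()]))
        # The whole matching run would be passed to the shaper at once...
        pieces.append((True, m.group()))
        # ...and layout resumes after the end of that run.
        pos = m.end()
    if pos < len(text):
        pieces.append((False, text[pos:]))
    return pieces

for shaped, piece in segment("ab \u0644\u0627 cd"):
    print("shape" if shaped else "plain", repr(piece))
```

Under this model, whether a stretch of text is shaped as a unit depends entirely on what the pattern matches, which is why the choice of pattern matters so much in this thread.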
Re: [HarfBuzz] Ligatures
> From: Khaled Hosny
> Date: Sat, 23 May 2020 20:09:50 +0200
> Cc: harfbuzz@lists.freedesktop.org
>
> Overall, if you can’t send the whole text (words are the absolute minimum,
> but this has its issues as well), don’t just send arbitrary parts of it as
> the result will be some inconsistent mess.

I almost understand (and agree), sans one part: the "arbitrary parts" of
what you wrote. If we want to produce a ligature out of "ffi", the
shaper will get "ffi" and nothing more. Which part here is arbitrary?

Thanks.
Re: [HarfBuzz] Ligatures
> On May 23, 2020, at 10:35 AM, Eli Zaretskii wrote:
>
>> From: Khaled Hosny
>> Date: Sat, 23 May 2020 09:59:15 +0200
>> Cc: harfbuzz@lists.freedesktop.org
>>
>> Also either Emacs is currently treating text that it enables shaping for as
>> second-class citizens where limitations/degraded performance is acceptable
>> (which is really really bad)
>
> Could you tell more about which limitations and degraded performance
> you had in mind? I'm not sure we have this, but cannot tell without
> understanding the issues.

I have no idea. I’m just guessing why you think the Emacs display engine
can’t handle all text like it handles Arabic. Either it does not handle
Arabic correctly, or it can handle all text like it handles Arabic.

>> or “redesigning the entire Emacs display engine” is not really needed as you
>> can just declare all text as text that needs to be shaped and be done with
>> it.
>
> The Emacs display engine examines the text to be displayed and laid
> out one character at a time, and makes layout decisions after each
> character or grapheme cluster it lays out. Its design is therefore
> fundamentally incompatible with shaping large substrings of buffer
> text at once. We do support that for short sequences of characters,
> which seems to work well enough for complex shaping (a.k.a. "character
> compositions") of scripts that require that, but we still do that one
> grapheme cluster at a time.

That wouldn’t work for Arabic. You can’t shape Arabic one grapheme
cluster at a time (or any other text actually, but the brokenness in
Arabic will be immediately obvious), so I’m most certain that is not
exactly how Arabic is handled in Emacs right now.

> The character composition is implemented
> in Lisp, which is called by the display engine, and which then calls
> back into C to invoke the shaper. This implementation is meant to
> allow a great deal of control on what should be composed and how. But
> it is also relatively slow, which is another reason why doing that for
> all the text to be laid out is impractical: it slows down redisplay to
> the degree that it becomes annoying to users.

Having more control should not be at the price of doing things wrong.

The whole composition concept of Emacs does not make any sense to me,
all text is “composed”. You can have a special mode that would disable
shaping for specific purposes (opening huge log files, wanting to see
raw text with no bidi or shaping, etc), but this can be done in
cooperation with HarfBuzz and not by bypassing it entirely.

Regards,
Khaled
Re: [HarfBuzz] Ligatures
> On May 23, 2020, at 10:25 AM, Eli Zaretskii wrote:
>
>> From: Khaled Hosny
>> Date: Sat, 23 May 2020 09:51:21 +0200
>> Cc: harfbuzz@lists.freedesktop.org
>>
>>> Thanks. Since (b) is not really feasible without redesigning the
>>> entire Emacs display engine (for which I see no volunteers lining up
>>> any time soon), I guess we will have to use some more-or-less
>>> reasonable and somewhat unreliable heuristics by supporting only some
>>> ligatures that are known in advance.
>>
>> What are you going to do about kerning, or mark positioning? Partially
>> kerning arbitrary glyphs (because the substring matches some regular
>> expression) is worse than not kerning at all.
>
> I don't think I understand the question. How is kerning related to
> the issue at hand?

Kerning is part of text layout. You are only considering ligatures, but
they are a small part of text layout, and your proposal does not seem to
consider anything other than ligatures, which is an arbitrary division
and does not make much sense to me. Some fonts provide ligatures to fix
f-collisions, others fix them with contextual alternates, and others
fix them with kerning. Your proposed solution does not address this.

Also, when you pass certain text to the layout engine, you get
everything the font provides, not just ligatures, so you would end up
kerning certain letter combinations (those that you send to the layout
engine) and not others, which is inconsistent and ugly.

Overall, if you can’t send the whole text (words are the absolute
minimum, but this has its issues as well), don’t just send arbitrary
parts of it as the result will be some inconsistent mess.

Regards,
Khaled
Re: [HarfBuzz] Ligatures
> Date: Sat, 23 May 2020 16:54:51 +0100
> From: Richard Wordingham
> Cc: harfbuzz@lists.freedesktop.org
>
> > Emacs supports more than one rule for each composable sequence of
> > characters.
>
> That doesn't help when the rules give conflicting divisions into
> clusters, which is the case with Tai Tham.

The assumption is that either the rules can be arranged in an order that
allows using the first matching rule, or, failing that, that you write
your own composing function that implements whatever logic is required
to select the right rule.

> The Devanagari rule only covers the Vedic marks in the Devanagari block,
> the 'stress signs' according to the comments. Can rules essentially
> for different scripts now share combining marks? The newer Vedic marks
> were supposed to be available to at least all Indian Indic scripts.

I don't know enough about this to make sure I even understand the
question, let alone can provide an answer. One thing I can say is that
the regexp pattern in a rule can specify different context (the
surrounding characters) even if the character that triggers the rule is
the same. Failing that, I guess the solution will again be the function
that produces the composition.

As for different scripts: if the character codepoints are the same,
Emacs currently assigns each character to a single script.

> > Does Emacs indeed fail to wrap Arabic text? Can you show an example?
>
> Character level wrapping still almost works down at Emacs 24.4, but I
> don't know that it wasn't broken in later enhancements. There are three
> features that make me think Emacs 24.4 might be different to the
> current state of affairs:
>
> (1) Clicking into the text breaks text before the cursor, but not after
> it.
> (2) I can't step into lam-alif the way I step into Indic clusters.
> (3) Lam-alif isn't broken by line wrap.

Emacs 24.4 is very old, and doesn't use HarfBuzz. Please try Emacs 27
instead, it has several bugs in this area fixed, and will use HarfBuzz
if available at build time.
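[Editorial illustration] The "first matching rule" assumption from this message can be sketched with an ordered rule list. Everything here is hypothetical: the rule names and the Latin-letter patterns are placeholders, not Emacs composition rules, and serve only to show how rule order changes the division into clusters.

```python
import re

# Ordered rules as (name, pattern) pairs; the first rule that matches at
# the current position wins. Names and patterns are placeholders.
RULES = [
    ("three-char cluster", re.compile("abc")),
    ("two-char cluster", re.compile("ab")),
    ("single char", re.compile("[a-z]")),
]

def first_match(text, pos, rules):
    """Return (rule name, matched text) for the first rule matching at pos."""
    for name, pattern in rules:
        m = pattern.match(text, pos)
        if m:
            return name, m.group()
    return None

def clusters(text, rules=RULES):
    """Divide text into clusters using first-match-wins rule selection."""
    out, pos = [], 0
    while pos < len(text):
        hit = first_match(text, pos, rules)
        chunk = hit[1] if hit else text[pos]  # unmatched chars stand alone
        out.append(chunk)
        pos += len(chunk)
    return out

print(clusters("abcab"))                         # longest rules listed first
print(clusters("abcab", list(reversed(RULES))))  # same rules, other order
```

When two rules disagree about where a cluster ends and no single ordering is right for every input, an ordered list like this cannot express the choice, which is the situation the custom composing function is meant to cover.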
Re: [HarfBuzz] Ligatures
> Date: Sat, 23 May 2020 16:33:12 +0100
> From: Richard Wordingham
>
> On Sat, 23 May 2020 11:25:38 +0300
> Eli Zaretskii wrote:
>
> > > From: Khaled Hosny
> > > Date: Sat, 23 May 2020 09:51:21 +0200
> > > Cc: harfbuzz@lists.freedesktop.org
> > > What are you going to do about kerning, or mark positioning?
> > > Partially kerning arbitrary glyphs (because the sub string match
> > > some regular expression) is worse than not kerning at all.
> >
> > I don't think I understand the question. How is kerning related to
> > the issue at hand? I'm not an expert on typesetting text (so maybe I
> > don't even understand what exactly is meant by "kerning" in this
> > context), so please tell more details about this.
>
> The simplest way of laying out proportionally spaced text is to have a
> fixed glyph-dependent distance ('advance width') from the 'origin' of a
> glyph to the origin of the next glyph and simply lay them out in a
> sequence, like movable type. However, if one chooses widths suitable
> for the sequences 'AM' and 'MV', then there may be an unsightly gap in
> the middle of 'AV'. Kerning is basically the process of adjusting those
> gaps. Kerning is done by the shaper. To do it, it needs the
> whole sequence of characters.

Ah, okay, thanks. Then yes, Emacs just uses the advance width that we
get from the metrics of each glyph.
Re: [HarfBuzz] Ligatures
On Sat, 23 May 2020 17:22:58 +0300 Eli Zaretskii wrote:

> > Date: Sat, 23 May 2020 14:51:53 +0100
> > From: Richard Wordingham
> >
> > > > They may of course have more than one set of such rules, with
> > > > the rule sets defining different sets of sequences.
> > >
> > > Who are "they" in this context?
> >
> > Devanagari and Tai Tham are two examples I am aware of.
>
> Emacs supports more than one rule for each composable sequence of
> characters.

That doesn't help when the rules give conflicting divisions into
clusters, which is the case with Tai Tham. On the other hand, for the
Devanagari scripts, the rules can store alternatives which some
renderers would consider ill-formed, or which would be more sensibly
treated as 2 clusters.

> > Devanagari has different rules for positioning of Vedic marks
> > between fonts using the script tags dev and dev2 for it on one hand
> > and the unofficial script tag dev3, which follows the USE rules for
> > character ordering. For tag dev, Microsoft says that <consonant,
> > virama, candrabindu, consonant> is one cluster; others, including
> > Unicode, say it's two. Candrabindu in the middle and candrabindu
> > at the end mean different things; the former nasalises a consonant,
> > while the latter nasalises a vowel. The visual distinction exists,
> > at least when half-forms are used.
>
> See the rules set up near the end of indian.el in Emacs. If they
> don't cover what you describe, we can add more.

The Devanagari rule only covers the Vedic marks in the Devanagari block,
the 'stress signs' according to the comments. Can rules essentially for
different scripts now share combining marks? The newer Vedic marks were
supposed to be available to at least all Indian Indic scripts.

> > > If a font requires special shaping for any sequence of any number
> > > of 26 (or maybe 52) ASCII letters, then the Emacs display engine
> > > will need to be redesigned. So this extreme possibility doesn't
> > > bother me.
> >
> > In general, they do require it. But how is this worse than handling
> > Arabic?
>
> I don't know. Maybe it isn't. Or maybe the slowdown while displaying
> ASCII and moving the cursor through it will be unbearable.
>
> > Is the problem that you want to keep the option of line
> > wrapping splitting words for ASCII, but are not bothered for Arabic
> > or other human languages?
>
> Does Emacs indeed fail to wrap Arabic text? Can you show an example?

Character level wrapping still almost works down at Emacs 24.4, but I
don't know that it wasn't broken in later enhancements. There are three
features that make me think Emacs 24.4 might be different to the current
state of affairs:

(1) Clicking into the text breaks text before the cursor, but not after
it.
(2) I can't step into lam-alif the way I step into Indic clusters.
(3) Lam-alif isn't broken by line wrap.

> > I think you mean that Emacs would store the position of components
> > by an index that was the sequence of characters, not the glyph ID.
> > That would also deal with precomposed characters - it would be the
> > character sequence that mattered, and for cursor movement and
> > rendering, the canonically equivalent sequence(s) and the
> > precomposed character would remain distinct.
>
> Sorry, I don't follow: what do you mean by "store"? Emacs stores the
> rules used to compose characters, and it stores the results of the
> compositions already done by applying those rules, as part of
> displaying some chunk of text. Which one of these did you have in
> mind?

Neither. I thought from the Emacs developers' discussion that you were
hoping to store the locations of the character boundaries within
ligatures.

Richard.
Re: [HarfBuzz] Ligatures
On Sat, 23 May 2020 11:25:38 +0300 Eli Zaretskii wrote:

> > From: Khaled Hosny
> > Date: Sat, 23 May 2020 09:51:21 +0200
> > Cc: harfbuzz@lists.freedesktop.org
> > What are you going to do about kerning, or mark positioning?
> > Partially kerning arbitrary glyphs (because the sub string match
> > some regular expression) is worse than not kerning at all.
>
> I don't think I understand the question. How is kerning related to
> the issue at hand? I'm not an expert on typesetting text (so maybe I
> don't even understand what exactly is meant by "kerning" in this
> context), so please tell more details about this.

The simplest way of laying out proportionally spaced text is to have a
fixed glyph-dependent distance ('advance width') from the 'origin' of a
glyph to the origin of the next glyph and simply lay them out in a
sequence, like movable type. However, if one chooses widths suitable
for the sequences 'AM' and 'MV', then there may be an unsightly gap in
the middle of 'AV'. Kerning is basically the process of adjusting those
gaps. Kerning is done by the shaper. To do it, it needs the whole
sequence of characters.

To a first approximation, mark positioning is handled by passing the
whole clusters to the shaper, and suitable regular expressions will
handle this. However, sometimes clusters will interact. Microsoft had
an example in the OpenType specification of the handling of the sequence
Wö with a comparatively huge 'W'. In this example, the umlaut would be
lowered to get out of the way of the 'W'. To do this, the shaper has to
be presented with "W" and "ö" as part of the same sequence.

Richard.
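[Editorial illustration] The advance-width layout and the 'AV' kerning adjustment described in this message can be sketched in a few lines. The metrics and the kerning value are invented for illustration and do not come from any real font.

```python
# Toy glyph metrics in font units; all numbers are hypothetical.
ADVANCE = {"A": 700, "M": 900, "V": 700}
KERN = {("A", "V"): -120}  # pull V towards A to close the gap

def layout(text, use_kerning):
    """Return (x origin of each glyph, total line width)."""
    origins, pen = [], 0
    for i, ch in enumerate(text):
        if use_kerning and i > 0:
            # The adjustment depends on the pair, so the previous glyph
            # must be known: the shaper needs the whole sequence.
            pen += KERN.get((text[i - 1], ch), 0)
        origins.append(pen)
        pen += ADVANCE[ch]
    return origins, pen

print(layout("AV", use_kerning=False))  # advance widths only
print(layout("AV", use_kerning=True))   # with the pair adjustment
```

Laying out 'A' and 'V' separately corresponds to the first call; the pair adjustment only becomes available when both glyphs are handed over together.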
Re: [HarfBuzz] Ligatures
> Date: Sat, 23 May 2020 14:51:53 +0100
> From: Richard Wordingham
>
> > > They may of course have more than one set of such rules, with the
> > > rule sets defining different sets of sequences.
> >
> > Who are "they" in this context?
>
> Devanagari and Tai Tham are two examples I am aware of.

Emacs supports more than one rule for each composable sequence of
characters.

> Devanagari has different rules for positioning of Vedic marks between
> fonts using the script tags dev and dev2 for it on one hand and the
> unofficial script tag dev3, which follows the USE rules for character
> ordering. For tag dev, Microsoft says that
> candrabindu, consonant> is one cluster; others, including Unicode, say
> it's two. Candrabindu in the middle and candrabindu at the end mean
> different things; the former nasalises a consonant, while the latter
> nasalises a vowel. The visual distinction exists, at least when
> half-forms are used.

See the rules set up near the end of indian.el in Emacs. If they
don't cover what you describe, we can add more.

> > I'm not talking about Arabic. Emacs has a set of regular expressions
> > for sequences of Arabic characters that need shaping, misc-lang.el in
> > Emacs. If the set is incomplete, we can augment it.
>
> That regular expression treats every Arabic word as in need of shaping.
>
> > If a font requires special shaping for any sequence of any number of
> > 26 (or maybe 52) ASCII letters, then the Emacs display engine will
> > need to be redesigned. So this extreme possibility doesn't bother me.
>
> In general, they do require it. But how is this worse than handling
> Arabic?

I don't know. Maybe it isn't. Or maybe the slowdown while displaying
ASCII and moving the cursor through it will be unbearable.

> Is the problem that you want to keep the option of line
> wrapping splitting words for ASCII, but are not bothered for Arabic or
> other human languages?

Does Emacs indeed fail to wrap Arabic text? Can you show an example?
> > > How would you handle the possibility that all three of <æ>,
> > > and might be rendered by the same glyph, although they
> > > are comprised of 1, 2 and 3 characters respectively?
> >
> > By using a composition rule that matches both and .
> > The rules are regexp-based, and expressing the above as a regexp is
> > simple. Once a sequence of characters matches the regexp, Emacs calls
> > the shaper (hb_shape etc.) to produce the font glyphs for the
> > sequence, and displays the glyphs that the shaper returns.
>
> I think you mean that Emacs would store the position of components by
> an index that was the sequence of characters, not the glyph ID. That
> would also deal with precomposed characters - it would be the character
> sequence that mattered, and for cursor movement and rendering,
> the canonically equivalent sequence(s) and the precomposed character
> would remain distinct.

Sorry, I don't follow: what do you mean by "store"? Emacs stores the
rules used to compose characters, and it stores the results of the
compositions already done by applying those rules, as part of
displaying some chunk of text. Which one of these did you have in
mind?
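The rule-then-shaper flow quoted above (a regexp picks out a sequence, the matched sequence is handed to the shaper, and the returned glyphs are displayed) might be sketched roughly as follows. This is not Emacs's composition code: the rule, the `compose` helper, the glyph-name strings and the `fake_shape` stand-in for the shaper call are all assumptions made up for the illustration.

```python
import re

# Hypothetical rule: a few Latin ligature candidates, longest alternatives first.
RULE = re.compile(r"ffi|ffl|ff|fi|fl")


def fake_shape(seq):
    """Stand-in for the real shaper call; returns one invented glyph name."""
    return ["lig_" + seq]


def compose(text, rule=RULE, shape=fake_shape):
    """Turn text into a list of glyph names.

    Substrings matching `rule` are passed to `shape` as a unit; every
    other character is mapped to a single glyph without any shaping.
    """
    glyphs = []
    pos = 0
    for m in rule.finditer(text):
        glyphs.extend("chr_" + c for c in text[pos:m.start()])
        glyphs.extend(shape(m.group()))
        pos = m.end()
    glyphs.extend("chr_" + c for c in text[pos:])
    return glyphs
```

For example, `compose("office")` sends only "ffi" to the stand-in shaper; the surrounding letters never reach it, so anything a font might do with them is not applied.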
Re: [HarfBuzz] Ligatures
On Sat, 23 May 2020 09:09:48 +0300 Eli Zaretskii wrote:

> > Date: Fri, 22 May 2020 22:22:49 +0100
> > From: Richard Wordingham
> >
> > > The current support for producing ligatures works in the same way
> > > as complex text shaping for scripts that require that, like
> > > Arabic and Khmer: the sequences of characters that can be
> > > displayed as ligatures are identified in advance with suitable
> > > regular expressions, and the display engine then passes these
> > > sequences to hb_shape to produce the ligatures.
> > >
> > > This works well for scripts that require complex shaping, because
> > > such scripts generally have well-defined rules for the sequences
> > > of codepoints that need shaping.
> >
> > They may of course have more than one set of such rules, with the
> > rule sets defining different sets of sequences.
>
> Who are "they" in this context?

Devanagari and Tai Tham are two examples I am aware of.

Devanagari has different rules for positioning of Vedic marks between
fonts using the script tags dev and dev2 for it on one hand and the
unofficial script tag dev3, which follows the USE rules for character
ordering. For tag dev, Microsoft says that is one cluster; others,
including Unicode, say it's two. Candrabindu in the middle and
candrabindu at the end mean different things; the former nasalises a
consonant, while the latter nasalises a vowel. The visual distinction
exists, at least when half-forms are used.

Tai Tham has an issue with the mark U+1A58 TAI THAM SIGN MAI KANG LAI.
It is, at least formally, a non-spacing mark. It occurs at the
juncture of two syllables in the same words. Modern, printed Tai Khuen
happily treats it as syllable-final. In more traditional styles, it
starts syllables, going above the first consonant, and so to the right
of a vowel mark reordered to the left hand side of the syllable. Some
fonts seem to just let it hang over the start of the next syllable,
taking pot luck with what's there.
That gives two different syllable structures. As I supported the style
found in a certain dictionary, it sometimes belongs with the syllable
before, and sometimes with the syllable after it. I therefore ended up
defining the sequences to be shaped as a sequence of one or more
syllables joined together by U+1A58. Fortunately, normal cursor motion
is controlled by a different definition. (I'm still using Emacs 24.4
with the restoration of interactive commands forward-char-intrusive and
backward-char-intrusive and their interface within the C code.)

> I understand that the number of combinations is theoretically
> unbounded. I'm asking if it is also unbounded in practice. That is,
> do font designers add ligatures for arbitrary combinations of
> characters, regardless of some reasonable set of requirements? For
> example, is the set of ligatures of Latin characters shown here:
>
>   https://en.wikipedia.org/wiki/Orthographic_ligature#Latin_alphabet
>
> reasonably complete, or should I expect any number of other arbitrary
> combinations of Latin characters popping up in fonts? And if the
> latter, then what is the purpose of providing such arbitrary
> ligatures?

Doesn't the existence of ligatures for 'Eisenhower' and 'Chamberlain'
provide enough of an answer?

If you claim to support handwriting fonts, then you can expect others -
'sh', 'tt' and 'ing' are fairly obvious ones.

You may also find ligatures being used to sort out kerning issues. One
problem I've observed with computer fonts is that the spacing of glyphs
in a string is not consistent. This appears to be due to the way the
positioning of the glyphs is rounded. The problem can be bad enough
that the designer ends up fixing the problem by combining them into a
single glyph, which formally is a ligature. I've not noticed this in
ASCII fonts, but then I haven't looked hard at them.

The 'tt' ligature can arise because the two t's are crossed by a single
stroke.
Crossing the 't' in 'lt' might be handled by a special 't' glyph, or
one might just form an 'lt' ligature. The ending 'ing' is common
enough that I unconsciously developed an abbreviated way of writing it.

> I'm not talking about Arabic. Emacs has a set of regular expressions
> for sequences of Arabic characters that need shaping, misc-lang.el in
> Emacs. If the set is incomplete, we can augment it.

That regular expression treats every Arabic word as in need of shaping.

> If a font requires special shaping for any sequence of any number of
> 26 (or maybe 52) ASCII letters, then the Emacs display engine will
> need to be redesigned. So this extreme possibility doesn't bother me.

In general, they do require it. But how is this worse than handling
Arabic? Is the problem that you want to keep the option of line
wrapping splitting words for ASCII, but are not bothered for Arabic or
other human languages? ASCII does not satisfyingly suffice for
English.

> > How would you handle the possibility that all three of <æ>,
> >
Re: [HarfBuzz] Ligatures
> From: Khaled Hosny
> Date: Sat, 23 May 2020 09:59:15 +0200
> Cc: harfbuzz@lists.freedesktop.org
>
> Also either Emacs is currently treating text that it enables shaping for as
> second-class citizens where limitations/degraded performance is acceptable
> (which is really really bad)

Could you tell more about which limitations and degraded performance
you had in mind? I'm not sure we have this, but cannot tell without
understanding the issues.

> or “redesigning the entire Emacs display engine” is not really needed as you
> can just declare all text as text that needs to be shaped and be done with it.

The Emacs display engine examines the text to be displayed and laid
out one character at a time, and makes layout decisions after each
character or grapheme cluster it lays out. Its design is therefore
fundamentally incompatible with shaping large substrings of buffer
text at once. We do support that for short sequences of characters,
which seems to work well enough for complex shaping (a.k.a. "character
compositions") of scripts that require that, but we still do that one
grapheme cluster at a time.

The character composition is implemented in Lisp, which is called by
the display engine, and which then calls back into C to invoke the
shaper. This implementation is meant to allow a great deal of control
on what should be composed and how. But it is also relatively slow,
which is another reason why doing that for all the text to be laid out
is impractical: it slows down redisplay to the degree that it becomes
annoying to users.

That is why solving these problems in the way that you suggest
requires a complete rewrite of the Emacs display code. It simply
cannot currently support what you expect.
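The "one character or grapheme cluster at a time" style of layout described above can be pictured with a very small sketch. It is only a picture of the general technique (greedy placement with a decision after each unit), not the Emacs display engine; `wrap_greedy`, `width_of` and the widths used are hypothetical.

```python
def wrap_greedy(units, width_of, line_width):
    """Place units one at a time, deciding about a line break after each.

    `units` is a sequence of already-delimited display units (single
    characters or composed clusters); `width_of` returns the width of
    one unit. A unit is never split, and nothing after the current unit
    is consulted when the break decision is made.
    """
    lines = [[]]
    used = 0
    for unit in units:
        w = width_of(unit)
        if lines[-1] and used + w > line_width:
            # The unit does not fit on the current line: start a new one.
            lines.append([])
            used = 0
        lines[-1].append(unit)
        used += w
    return lines
```

In this model a composed unit such as "fi" moves to the next line whole, and the width of each unit has to be known at the moment it is placed, which is why the units have to be delimited before layout reaches them.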
Re: [HarfBuzz] Ligatures
> From: Khaled Hosny
> Date: Sat, 23 May 2020 09:51:21 +0200
> Cc: harfbuzz@lists.freedesktop.org
>
> > Thanks. Since (b) is not really feasible without redesigning the
> > entire Emacs display engine (for which I see no volunteers lining up
> > any time soon), I guess we will have to use some more-or-less
> > reasonable and somewhat unreliable heuristics by supporting only some
> > ligatures that are known in advance.
>
> What are you going to do about kerning, or mark positioning? Partially
> kerning arbitrary glyphs (because the sub string match some regular
> expression) is worse than not kerning at all.

I don't think I understand the question. How is kerning related to
the issue at hand? I'm not an expert on typesetting text (so maybe I
don't even understand what exactly is meant by "kerning" in this
context), so please tell more details about this.

Thanks.
Re: [HarfBuzz] Ligatures
> On May 23, 2020, at 9:51 AM, Khaled Hosny wrote:
>
>> On May 23, 2020, at 9:44 AM, Eli Zaretskii wrote:
>>
>>> From: Khaled Hosny
>>> Date: Sat, 23 May 2020 08:36:10 +0200
>>> Cc: harfbuzz@lists.freedesktop.org
>>>
>>>> The only way of doing this right, I'm told, is to either (a) query
>>>> the font to get the list of all the ligatures it supports, or (b)
>>>> assume any combination of characters can produce a ligature, and
>>>> therefore we need to pass all the characters intended for display
>>>> through hb_shape. The latter in particular is in stark contrast to
>>>> how the current Emacs display code is designed and implemented.
>>>
>>> (a) is not realistically possible as doing it properly has pretty much the
>>> same cost as shaping the text. So your only reliable option is (b).
>>
>> Thanks. Since (b) is not really feasible without redesigning the
>> entire Emacs display engine (for which I see no volunteers lining up
>> any time soon), I guess we will have to use some more-or-less
>> reasonable and somewhat unreliable heuristics by supporting only some
>> ligatures that are known in advance.
>
> What are you going to do about kerning, or mark positioning? Partially
> kerning arbitrary glyphs (because the sub string match some regular
> expression) is worse than not kerning at all.

Also either Emacs is currently treating text that it enables shaping
for as second-class citizens where limitations/degraded performance is
acceptable (which is really really bad), or “redesigning the entire
Emacs display engine” is not really needed as you can just declare all
text as text that needs to be shaped and be done with it.
Re: [HarfBuzz] Ligatures
> On May 23, 2020, at 9:44 AM, Eli Zaretskii wrote:
>
>> From: Khaled Hosny
>> Date: Sat, 23 May 2020 08:36:10 +0200
>> Cc: harfbuzz@lists.freedesktop.org
>>
>>> The only way of
>>> doing this right, I'm told, is to either (a) query the font to get the
>>> list of all the ligatures it supports, or (b) assume any combination
>>> of characters can produce a ligature, and therefore we need to pass
>>> all the characters intended for display through hb_shape. The latter
>>> in particular is in stark contrast to how the current Emacs display
>>> code is designed and implemented.
>>
>> (a) is not realistically possible as doing it properly has pretty much the
>> same cost as shaping the text. So your only reliable option is (b).
>
> Thanks. Since (b) is not really feasible without redesigning the
> entire Emacs display engine (for which I see no volunteers lining up
> any time soon), I guess we will have to use some more-or-less
> reasonable and somewhat unreliable heuristics by supporting only some
> ligatures that are known in advance.

What are you going to do about kerning, or mark positioning? Partially
kerning arbitrary glyphs (because the sub string match some regular
expression) is worse than not kerning at all.
Re: [HarfBuzz] Ligatures
> From: Khaled Hosny
> Date: Sat, 23 May 2020 08:36:10 +0200
> Cc: harfbuzz@lists.freedesktop.org
>
> > The only way of
> > doing this right, I'm told, is to either (a) query the font to get the
> > list of all the ligatures it supports, or (b) assume any combination
> > of characters can produce a ligature, and therefore we need to pass
> > all the characters intended for display through hb_shape. The latter
> > in particular is in stark contrast to how the current Emacs display
> > code is designed and implemented.
>
> (a) is not realistically possible as doing it properly has pretty much the
> same cost as shaping the text. So your only reliable option is (b).

Thanks. Since (b) is not really feasible without redesigning the
entire Emacs display engine (for which I see no volunteers lining up
any time soon), I guess we will have to use some more-or-less
reasonable and somewhat unreliable heuristics by supporting only some
ligatures that are known in advance.
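The difference between option (b) and the "ligatures known in advance" heuristic can be illustrated with a toy model. Nothing here is HarfBuzz or Emacs code: the ligature table of the imaginary font, the glyph names, and the helpers `shape` and `display_with_known_list` are assumptions invented for the sketch.

```python
# Toy "font": the ligatures this imaginary font happens to provide.
FONT_LIGATURES = {"fi": "f_i", "tt": "t_t", "ing": "i_n_g"}


def shape(run):
    """Toy shaper: greedy longest-match ligature substitution over one run."""
    glyphs = []
    i = 0
    while i < len(run):
        for length in (3, 2):
            piece = run[i:i + length]
            if len(piece) == length and piece in FONT_LIGATURES:
                glyphs.append(FONT_LIGATURES[piece])
                i += length
                break
        else:
            # No ligature starts here: emit the character as its own glyph.
            glyphs.append(run[i])
            i += 1
    return glyphs


def display_with_known_list(text, known):
    """Heuristic: only substrings on a list fixed in advance reach the shaper."""
    glyphs = []
    i = 0
    candidates = sorted(known, key=len, reverse=True)
    while i < len(text):
        hit = next((k for k in candidates if text.startswith(k, i)), None)
        if hit:
            glyphs.extend(shape(hit))
            i += len(hit)
        else:
            glyphs.append(text[i])
            i += 1
    return glyphs
```

Passing the whole word through the toy shaper, `shape("fitting")`, finds all three ligatures of the imaginary font, whereas `display_with_known_list("fitting", {"fi"})` finds only the one that was anticipated. The heuristic is exactly as good as the list, which is the unreliability being conceded above.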