Re: [HarfBuzz] Ligatures
> On May 24, 2020, at 6:34 PM, Eli Zaretskii wrote:
>
>> From: Khaled Hosny
>> Date: Sun, 24 May 2020 18:00:45 +0200
>> Cc: harfbuzz@lists.freedesktop.org
>>
>> This, for example, ensures that HarfBuzz can do basic Arabic-like shaping
>> across item boundaries e.g. if you break items in the middle of an Arabic
>> word (due to font change, for example), you still get the
>> initial/medial/final forms across the boundary as appropriate. Or to put a
>> combining mark at the start of a paragraph on a dotted circle as it
>> otherwise has no base.
>>
>> If this is not possible, then you can try to pass enough context, like reach
>> back and forward to first character that is not a combining mark. This may
>> or may not be enough.
>>
>> Shaping space-delimited words is orthogonal to that, context is better be
>> always provided.
>
> So this sounds like passing a physical line that ends in a newline
> should be good enough? Or are there issues that cross newlines as
> well?

It should be enough.

> And what is a "paragraph" in this context?

The same as in UAX#9.

>> Some fonts do have OpenType lookups that interact with space (e.g. kerning
>> pairs involving space, or even substitutions involving space), so shaping
>> words independently will give suboptimal result. You can use HarfBuzz API to
>> find out if the font has OpenType layout rules involving space, or decide to
>> live with this limitation.
>
> Which API provides this information?

https://harfbuzz.github.io/harfbuzz-hb-ot-layout.html#hb-ot-layout-lookup-collect-glyphs

But it requires some understanding of how OpenType lookups are structured. Checking how Firefox uses it might help.

Regards,
Khaled
___
HarfBuzz mailing list
HarfBuzz@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/harfbuzz
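For readers unsure what "paragraph as in UAX#9" means in practice: UAX#9 splits text at characters whose Bidi_Class is B (paragraph separator), and keeps the separator with the preceding paragraph. A minimal Python sketch of that split follows. It is only an illustration, not HarfBuzz or Emacs code; the function name `split_paragraphs` is made up for the example, and treating CR LF as a single separator is how I read the UAX#9 rule.

```python
import unicodedata

def split_paragraphs(text):
    """Split text at Bidi_Class B characters, keeping each separator
    attached to the paragraph it terminates."""
    paragraphs = []
    start = 0
    i = 0
    n = len(text)
    while i < n:
        if unicodedata.bidirectional(text[i]) == "B":
            end = i + 1
            # Treat CR LF as one separator rather than two.
            if text[i] == "\r" and end < n and text[end] == "\n":
                end += 1
            paragraphs.append(text[start:end])
            start = end
            i = end
        else:
            i += 1
    # Trailing text without a final separator is still a paragraph.
    if start < n:
        paragraphs.append(text[start:])
    return paragraphs
```

Each returned string is then a candidate unit to hand to the shaper together with the offsets of the items inside it.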
Re: [HarfBuzz] Ligatures
> Date: Sun, 24 May 2020 20:27:26 +0100
> From: Richard Wordingham
> Cc: harfbuzz@lists.freedesktop.org
>
> It seems to me that Emacs knows what script a cluster is in; perhaps
> it just hasn't united the concepts.

It's a kind of coincidence: different scripts almost always require different fonts, and Emacs only composes characters displayed in the same font.

> Users may have written some weird clustering combinations, and I can
> imagine some weird combinations in the Private Use Areas. I should
> investigate.

Don't expect anything about PUA, Emacs doesn't assign any useful properties to them.

> > That's a feature (you can disable it with disable-point-adjustment).
>
> Is this documented in info, or does one have to trawl the code to find
> out what it does?

Every variable in Emacs has a doc string, and you can search them with several apropos commands. We don't describe in the manual every obscure variable, there are too many of them.

> It seems that Emacs needs several levels of movement
> - by codepoints, by grapheme cluster, by akshara (will be the same as
> grapheme cluster in many cases) and by HarfBuzz cluster, or whatever
> is used to make access into lam-alif impossible.

I have no idea which one Emacs uses, not in these terms. All I can say is that, in HarfBuzz terms, we get the number of "elements" from hb_buffer_get_length, and then index the arrays returned by hb_buffer_get_glyph_infos. Each "element" thus indexed is a separate "thing" for display purposes, and Emacs by default won't let you "enter" such a "thing", it will move across it in its entirety in one go.
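To make the "move across it in one go" behaviour concrete, here is a small Python sketch of how cursor stops could be derived from the `cluster` values found in the glyph infos. It is an assumption-laden illustration, not what Emacs actually does: the cluster list is hand-written sample data rather than real shaper output, the cluster values are assumed to be character indices into the shaped text, and the names `caret_stops` and `next_stop` are invented for the example.

```python
def caret_stops(clusters, text_length):
    """Return the sorted character positions at which a cursor may rest,
    given one cluster value per output glyph."""
    stops = set(clusters)      # glyphs sharing a cluster collapse to one stop
    stops.add(text_length)     # the end of the text is always a stop
    return sorted(stops)

def next_stop(stops, pos):
    """Return the first stop strictly after pos, or the last stop."""
    for stop in stops:
        if stop > pos:
            return stop
    return stops[-1]

# Hypothetical data: three characters (lam, alif, beh) shaped into two
# glyphs, a lam-alif ligature with cluster 0 and a beh with cluster 2.
stops = caret_stops([0, 2], 3)
```

With that sample data the cursor moves from 0 straight to 2, so position 1 (between lam and alif) is never visited. Using a set means the order in which the glyphs are reported does not matter.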
Re: [HarfBuzz] Ligatures
On Sun, 24 May 2020 17:18:27 +0300 Eli Zaretskii wrote:

> > Date: Sat, 23 May 2020 21:42:24 +0100
> > From: Richard Wordingham
>
> > > As for different scripts: if the character codepoints are the
> > > same, Emacs currently assigns each character to a single script.
>
> > I'll need to dig deeper. Composition of both 'a' and Greek alpha
> > with an acute accent works, which suggests that the problem isn't
> > there for characters with a script property of 'inherited'.
>
> Emacs currently leaves it up to HarfBuzz to guess the script, as it
> doesn't yet have the necessary smarts.

I thought the issue lay within Emacs. HarfBuzz has been fairly civilised about combining marks in the 'wrong' script run. If I put Thai marks in what is basically a Tai Tham script run, it seems to treat them properly. I do such a strange thing because the marks have been borrowed into Tai Tham, but not yet encoded. I was told I couldn't do this in Emacs 24.

It seems to me that Emacs knows what script a cluster is in; perhaps it just hasn't united the concepts. Users may have written some weird clustering combinations, and I can imagine some weird combinations in the Private Use Areas. I should investigate.

> > The behaviour in 27.05 is almost the same as for 24.4, but the
> > breaking in item (1) is automatically repaired.
>
> > Pressing the 'delete' key still deletes a single character, but that
> > may be because it's mapped to tpu-delete-current-char.

It's OK, it's still working with emacs -q. That means one can easily replace the initial character of a cluster.

> If you press DEL (or Backspace), it will delete a single codepoint.

That only deletes the final cluster.

> > So, what's not working in Arabic is that one can't move the cursor
> > through ligatures.
>
> That's a feature (you can disable it with disable-point-adjustment).

Is this documented in info, or does one have to trawl the code to find out what it does?
It seems that Emacs needs several levels of movement - by codepoints, by grapheme cluster, by akshara (will be the same as grapheme cluster in many cases) and by HarfBuzz cluster, or whatever is used to make access into lam-alif impossible. Visible motion by akshara is the minimum requirement for English, so that stepping through 'ffi' will visibly advance the cursor. LibreOffice Writer aims to provide visible cursor motion at the grapheme cluster level, so one can use the cursor to step through the consonants in an akshara.

By codepoint is useful for editing complex aksharas; it is even more useful if the cursor acts like a cluster terminator, but that is probably a matter of personal taste. It will also be useful for editing narrow phonetic transcriptions, which can be quite heavy on diacritics.

By grapheme cluster (at least, by default grapheme cluster) is the level encouraged by Unicode, and will give you letter-by-letter control even if you're editing Sanskrit in an Indian script. For Arabic, European and Hebrew scripts, this is the same as akshara level.

By akshara is the current default movement level for most Indian scripts in Emacs. It is also the level at which most Hindi speakers claim to operate. (I get the impression, however, that a lot of Indians do their fine level editing of complicated text in transliteration!)

By HarfBuzz cluster takes you to the level where HarfBuzz will easily give you cursor positions. Now occasionally HarfBuzz's actual clusters won't combine whole grapheme clusters or aksharas. For example, Thai vowels could be roughly placed for Thai without taking account of the previous letters, just as on typewriters, and one can even handle Thai tone marks like that. It's possible that in these cases, HarfBuzz will not form clusters. How you handle these cases is up to you. I would make 'by HarfBuzz cluster' the coarsest. I don't think motion by HarfBuzz cluster is useful - perhaps you know of a use.

Richard.
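As a rough illustration of how two of these movement levels differ, the Python sketch below lists the cursor stops for motion by codepoint and for motion by a very simplified cluster (a base character plus any following combining marks). The simplified rule is an assumption for the sake of the example; it is not the UAX#29 default grapheme cluster algorithm and knows nothing about aksharas or ligatures. The function names are invented.

```python
import unicodedata

def codepoint_stops(text):
    """Every position between codepoints is a stop."""
    return list(range(len(text) + 1))

def simple_cluster_stops(text):
    """Stops only before characters that are not combining marks
    (general category Mn, Mc or Me), plus the end of the text."""
    stops = [
        i for i, ch in enumerate(text)
        if i == 0 or not unicodedata.category(ch).startswith("M")
    ]
    stops.append(len(text))
    return stops

sample = "e\u0301x"   # 'e', COMBINING ACUTE ACCENT, 'x'
```

For `sample`, motion by codepoint visits every position, while the simplified cluster motion skips the position between 'e' and its accent.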
Re: [HarfBuzz] Ligatures
> From: Khaled Hosny
> Date: Sun, 24 May 2020 18:00:45 +0200
> Cc: harfbuzz@lists.freedesktop.org
>
> In general the safest is to pass the whole paragraph of text and the start
> and length of each item (item being a run with same font, direction, script,
> and language).

I was talking about text that has a single font, direction, script, and language.

> This, for example, ensures that HarfBuzz can do basic Arabic-like shaping
> across item boundaries e.g. if you break items in the middle of an Arabic
> word (due to font change, for example), you still get the
> initial/medial/final forms across the boundary as appropriate. Or to put a
> combining mark at the start of a paragraph on a dotted circle as it otherwise
> has no base.
>
> If this is not possible, then you can try to pass enough context, like reach
> back and forward to first character that is not a combining mark. This may or
> may not be enough.
>
> Shaping space-delimited words is orthogonal to that, context is better be
> always provided.

So this sounds like passing a physical line that ends in a newline should be good enough? Or are there issues that cross newlines as well?

And what is a "paragraph" in this context?

> Some fonts do have OpenType lookups that interact with space (e.g. kerning
> pairs involving space, or even substitutions involving space), so shaping
> words independently will give suboptimal result. You can use HarfBuzz API to
> find out if the font has OpenType layout rules involving space, or decide to
> live with this limitation.

Which API provides this information?

Thanks.
Re: [HarfBuzz] Ligatures
> On May 24, 2020, at 5:41 PM, Eli Zaretskii wrote: > >>> I almost understand (and agree), sans one part: the "arbitrary parts" >>> of what you wrote. If we want to produce a ligature out of "ffi", the >>> shaper will get "fii" and nothing more. Which part here is arbitrary? >> >> Sending "ffi" alone is an arbitrary decision. The font might have kerning >> between "ffi" and what comes before and after it, but you won't get it. The >> font might not have a ligature for "ffi" at all, but using kerning instead, >> so you will get kerning between "ffi" glyphs and not other glyphs which is >> arbitrary. It might be a cursive font that changes glyph shapes based on >> surrounding glyphs, and you will get that for "ffi" and not elsewhere which >> is arbitrary. >> >> That is just plain wrong, there is no way around it. > > So, to make sure I understand the correct solution: you are saying > that all the text to be displayed should go through the shaper, is > that right? > > If so, how large should be the chunks of text to be passed to the > shaper in any one call, in order to have a correct result? Would it > be enough to pass whitespace-separated words one by one? or do we need > to send entire physical lines (up to the terminating newline > character)? or maybe an entire paragraph? What is the recommendation > here? In general the safest is to pass the whole paragraph of text and the start and length of each item (item being a run with same font, direction, script, and language). This, for example, ensures that HarfBuzz can do basic Arabic-like shaping across item boundaries e.g. if you break items in the middle of an Arabic word (due to font change, for example), you still get the initial/medial/final forms across the boundary as appropriate. Or to put a combining mark at the start of a paragraph on a dotted circle as it otherwise has no base. 
If this is not possible, then you can try to pass enough context, like reaching back and forward to the first character that is not a combining mark. This may or may not be enough.

Shaping space-delimited words is orthogonal to that; context had better always be provided.

Some fonts do have OpenType lookups that interact with space (e.g. kerning pairs involving space, or even substitutions involving space), so shaping words independently will give a suboptimal result. You can use the HarfBuzz API to find out if the font has OpenType layout rules involving space, or decide to live with this limitation. Firefox does this check as it wants to cache individually shaped words when possible, and Chrome used to do that too, but I think they now make sure to retain enough information to avoid unnecessary reshaping, so such a word cache is not needed.

Regards,
Khaled
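The fallback mentioned above (reach back and forward to the first character that is not a combining mark) could be sketched roughly as follows. This is only a Python illustration, not HarfBuzz or Emacs code. The name `extend_context` and the use of the Unicode general categories Mn/Mc/Me as the test for "combining mark" are assumptions made for the example.

```python
import unicodedata

def is_mark(ch):
    # Assumption: "combining mark" means general category Mn, Mc or Me.
    return unicodedata.category(ch).startswith("M")

def extend_context(text, start, end):
    """Widen the item [start, end) on both sides, up to and including
    the first character that is not a combining mark."""
    ctx_start = start
    while ctx_start > 0:
        ctx_start -= 1
        if not is_mark(text[ctx_start]):
            break
    ctx_end = end
    while ctx_end < len(text):
        ch = text[ctx_end]
        ctx_end += 1
        if not is_mark(ch):
            break
    return ctx_start, ctx_end

sample = "a\u0301bc\u0302d"   # a + acute, b, c + circumflex, d
```

The returned range is the text one would pass as context, with the original item's offset and length located inside it. As noted above, this much context may or may not be enough for a given font.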
Re: [HarfBuzz] Ligatures
> > I almost understand (and agree), sans one part: the "arbitrary parts"
> > of what you wrote. If we want to produce a ligature out of "ffi", the
> > shaper will get "ffi" and nothing more. Which part here is arbitrary?
>
> Sending "ffi" alone is an arbitrary decision. The font might have kerning
> between "ffi" and what comes before and after it, but you won't get it. The
> font might not have a ligature for "ffi" at all, but using kerning instead,
> so you will get kerning between "ffi" glyphs and not other glyphs which is
> arbitrary. It might be a cursive font that changes glyph shapes based on
> surrounding glyphs, and you will get that for "ffi" and not elsewhere which
> is arbitrary.
>
> That is just plain wrong, there is no way around it.

So, to make sure I understand the correct solution: you are saying that all the text to be displayed should go through the shaper, is that right?

If so, how large should the chunks of text passed to the shaper in any one call be, in order to have a correct result? Would it be enough to pass whitespace-separated words one by one? Or do we need to send entire physical lines (up to the terminating newline character)? Or maybe an entire paragraph? What is the recommendation here?

Thanks.
Re: [HarfBuzz] Ligatures
> Date: Sat, 23 May 2020 21:42:24 +0100
> From: Richard Wordingham
>
> > As for different scripts: if the character codepoints are the same,
> > Emacs currently assigns each character to a single script.
>
> I'll need to dig deeper. Composition of both 'a' and Greek alpha with
> an acute accent works, which suggest that the problem isn't there for
> characters with a script property of 'inherited'.

Emacs currently leaves it up to HarfBuzz to guess the script, as it doesn't yet have the necessary smarts.

> > Emacs 24.4 is very old, and doesn't use HarfBuzz. Please try Emacs 27
> > instead, it has several bugs in this area fixed, and will use HarfBuzz
> > if available at build time.
>
> The behaviour in 27.05 is the almost the same as for 24.4, but the
> breaking in item (1) is automatically repaired. The process seems slow
> - I can see the glyph become final and then revert back to being
> medial. I'm puzzled by not being able to step into lam-alif but being
> able to step through a series 'beh's. The step into command for
> advancing codepoint by codepoint semiworks. The cluster shaping
> doesn't break at the cursor - Handa gave me a C code fix so I could
> achieve that - but the number of steps into to pass through a cluster
> matches the number of codepoints.
>
> Pressing the 'delete' key still deletes a single character, but may be
> that because it's mapped to tpu-delete-current-char.

If you press DEL (or Backspace), it will delete a single codepoint.

> So, what's not working in Arabic is that one can't move the cursor
> through ligatures.

That's a feature (you can disable it with disable-point-adjustment).

The rest of your observations seem to be too Emacs-specific to discuss here. You are welcome to submit an Emacs bug report if you think something isn't working as it should, or would like to discuss Emacs-specific details.

Thanks.
Re: [HarfBuzz] Ligatures
On Sat, 23 May 2020 19:45:17 +0300 Eli Zaretskii wrote:

> > Date: Sat, 23 May 2020 16:54:51 +0100
> > From: Richard Wordingham
> > Cc: harfbuzz@lists.freedesktop.org
> >
> > > Emacs supports more than one rule for each composable sequence of
> > > characters.
> >
> > That doesn't help when the rules give conflicting divisions into
> > clusters, which is the case with Tai Tham.
>
> The assumption is that either the rules can be arranged in an order
> that allows to use the first matching rule, or, failing that, that you
> write your own composing function that implements whatever logic
> that's required to select the right rule.

That choice needs to be tied to the choice of font - or for Tai Tham you use my hack technique. However, it's not as bad as it could be.

There's something strange going on in Tai Tham even at Emacs 27.05. I can have two aksharas interacting for shaping, but it takes two 'ordinary' key advances to pass through them, apparently implying that there are two clusters. Clusters for cursor advancement and clusters for shaping seem to be controlled independently!

From the dotted circle insertion logic, Emacs 27.05 on my machine definitely looks as though it's using some form of HarfBuzz.

> > The Devanagari rule only covers the Vedic marks in the Devanagari
> > block, the 'stress signs' according to the comments. Can rules
> > essentially for different scripts now share combining marks? The
> > newer Vedic marks were supposed to be available to at least all
> > Indian Indic scripts.
>
> I don't know enough about this to make sure I even understand the
> question, let alone can provide an answer. One thing I can say is
> that the regexp pattern in a rule can specify different context (the
> surrounding characters) even if the character that triggers the rule
> is the same. Failing that, I guess the solution will again be the
> function that produces the composition.
>
> As for different scripts: if the character codepoints are the same,
> Emacs currently assigns each character to a single script.

I'll need to dig deeper. Composition of both 'a' and Greek alpha with an acute accent works, which suggests that the problem isn't there for characters with a script property of 'inherited'.

> > > Does Emacs indeed fail to wrap Arabic text? can you show an
> > > example?
> >
> > Character level wrapping still almost works down at Emacs 24.4, but
> > I don't know that it wasn't broken in later enhancements. There
> > are three features that make me think Emacs 24.4 might be different
> > to the current state of affairs:
> >
> > (1) Clicking into the text breaks text before the cursor, but not
> > after it.
> > (2) I can't step into lam-alif the way I step into Indic clusters.
> > (3) Lam-alif isn't broken by line wrap.
>
> Emacs 24.4 is very old, and doesn't use HarfBuzz. Please try Emacs 27
> instead, it has several bugs in this area fixed, and will use HarfBuzz
> if available at build time.

The behaviour in 27.05 is almost the same as for 24.4, but the breaking in item (1) is automatically repaired. The process seems slow - I can see the glyph become final and then revert back to being medial. I'm puzzled by not being able to step into lam-alif but being able to step through a series of 'beh's. The step-into command for advancing codepoint by codepoint semi-works. The cluster shaping doesn't break at the cursor - Handa gave me a C code fix so I could achieve that - but the number of step-into commands needed to pass through a cluster matches the number of codepoints.

Pressing the 'delete' key still deletes a single character, but that may be because it's mapped to tpu-delete-current-char.

So, what's not working in Arabic is that one can't move the cursor through ligatures. It seems one can advance point through them using a step-into command (dead reckoning is a useful fallback), but one loses visual feedback.
But for that important matter, it looks as though Arabic in Emacs already has the behaviours needed for shaping Latin words.

The stepping into is enabled by the command "(setq disable-point-adjustment t)".

Richard.
Re: [HarfBuzz] Ligatures
On 23/05/2020 08:44, Eli Zaretskii wrote:
> Thanks. Since (b) is not really feasible without redesigning the
> entire Emacs display engine (for which I see no volunteers lining up
> any time soon), I guess we will have to use some more-or-less
> reasonable and somewhat unreliable heuristics by supporting only some
> ligatures that are known in advance.

Travelling further in the wrong direction is always an option, but don't expect it to get you closer to the right destination.

Full text shaping is the only way to get this right. Everything else is a hack, and piling hacks on top of hacks is just storing maintenance problems up for yourself. I know that's hard to hear for a volunteer project where nobody really wants to invest the effort in this complicated niche stuff, but honestly, you're probably better doing *nothing* than doing this.
Re: [HarfBuzz] Ligatures
> Cc: harfbuzz@lists.freedesktop.org
> From: Simon Cozens
> Date: Sat, 23 May 2020 20:14:16 +0100
>
> On 23/05/2020 08:44, Eli Zaretskii wrote:
> > Thanks. Since (b) is not really feasible without redesigning the
> > entire Emacs display engine (for which I see no volunteers lining up
> > any time soon), I guess we will have to use some more-or-less
> > reasonable and somewhat unreliable heuristics by supporting only some
> > ligatures that are known in advance.
>
> Travelling further in the wrong direction is always an option, but don't
> expect it to get you closer to the right destination.

I don't think this is an adequate analogy. What Emacs does is an approximation to what should be done. The approximation falls short of the target, that's true, and might even produce clearly incorrect results in some cases (although I've yet to see such cases, and I've been using Emacs for editing non-ASCII text for 20 years). But it is still an approximation, so it is not really "the wrong direction" (which you seem to interpret as 180 degrees off, otherwise even going in the wrong direction might bring me closer to the destination, right?).
Re: [HarfBuzz] Ligatures
> Date: Sat, 23 May 2020 20:06:32 +0100
> From: Richard Wordingham
>
> There are three different tools for producing what looks like an "ffi"
> ligature:
>
> 1) Make a ligature
> 2) Contextual substitution
> 3) A mix of contextual substitution and kerning.
>
> A font that uses the first will produce a ligature for Emacs.
>
> A font that uses contextual substitution will not work - you will just
> see the 3 unligated characters with their default glyphs.
>
> A font that uses a mix of contextual substitution and kerning will
> likewise fail. However, it is possible that you might get the "ff"
> ligature and a normal 'i', or a normal 'f' and an "fi" ligature.
>
> From the point of view of someone who expects full shaping, what result
> you get will be arbitrary, depending on how the font designer has
> marshalled his tools.

I understand. Still, the result looks reasonably good in most cases, especially in an editor whose main purpose is to edit programs, and which doesn't pretend to produce typographical accuracy.
Re: [HarfBuzz] Ligatures
> From: Khaled Hosny
> Date: Sat, 23 May 2020 20:54:15 +0200
> Cc: harfbuzz@lists.freedesktop.org
>
> > We pass to the shaper the part of text that matches the regexps you
> > can see at the end of misc-lang.el, then display the glyphs the shaper
> > returns. The above description is a high-level overview; there are
> > many details that I cannot describe in a short message. For example,
> > for Arabic, when we get back the grapheme clusters, we lay them out,
> > then skip to the end of the text that we passed to the shaper.
>
> You mean this:
> https://repo.or.cz/emacs.git/blob/HEAD:/lisp/language/misc-lang.el#l78
>
> I’m not sure how can I read it, but it seems to be missing the entire Arabic
> Extended-A and Arabic Mathematical Alphabetic Symbols blocks. I’m not also
> sure how it would handle using combining marks from other blocks with Arabic
> text (say putting U+20D6 over an Arabic letter).

If you can suggest improvements to those patterns, please do, and thanks.
Re: [HarfBuzz] Ligatures
On Sat, 23 May 2020 21:26:00 +0300 Eli Zaretskii wrote:

> > From: Khaled Hosny
> > Date: Sat, 23 May 2020 20:09:50 +0200
> > Cc: harfbuzz@lists.freedesktop.org
> >
> > Overall, if you can’t send the whole text (words are the absolute
> > minimum, but this has its issues as well), don’t just send
> > arbitrary parts of it as the result will be some inconsistent
> > mess.
>
> I almost understand (and agree), sans one part: the "arbitrary parts"
> of what you wrote. If we want to produce a ligature out of "ffi", the
> shaper will get "ffi" and nothing more. Which part here is arbitrary?

There are three different tools for producing what looks like an "ffi" ligature:

1) Make a ligature
2) Contextual substitution
3) A mix of contextual substitution and kerning.

A font that uses the first will produce a ligature for Emacs.

A font that uses contextual substitution will not work - you will just see the 3 unligated characters with their default glyphs.

A font that uses a mix of contextual substitution and kerning will likewise fail. However, it is possible that you might get the "ff" ligature and a normal 'i', or a normal 'f' and an "fi" ligature.

From the point of view of someone who expects full shaping, what result you get will be arbitrary, depending on how the font designer has marshalled his tools.

Richard.
Re: [HarfBuzz] Ligatures
> From: Khaled Hosny
> Date: Sat, 23 May 2020 20:40:44 +0200
> Cc: harfbuzz@lists.freedesktop.org
>
> Sending “ffi” alone is an arbitrary decision. The font might have kerning
> between “ffi” and what comes before and after it, but you won’t get it. The
> font might not have a ligature for “ffi” at all, but using kerning instead, so
> you will get kerning between “ffi” glyphs and not other glyphs which is
> arbitrary. It might be a cursive font that changes glyph shapes based on
> surrounding glyphs, and you will get that for “ffi” and not elsewhere which
> is arbitrary.
>
> That is just plain wrong, there is no way around it.

OK, thanks.
Re: [HarfBuzz] Ligatures
> On May 23, 2020, at 8:34 PM, Eli Zaretskii wrote: > >> From: Khaled Hosny >> Date: Sat, 23 May 2020 20:18:33 +0200 >> Cc: harfbuzz@lists.freedesktop.org >> >>> The Emacs display engine examines the text to be displayed and laid >>> out one character at a time, and makes layout decisions after each >>> character or grapheme cluster it lays out. Its design is therefore >>> fundamentally incompatible with shaping large substrings of buffer >>> text at once. We do support that for short sequences of characters, >>> which seems to work well enough for complex shaping (a.k.a. "character >>> compositions") of scripts that require that, but we still do that one >>> grapheme cluster at a time. >> >> That wouldn’t work for Arabic. You can’t shape Arabic one grapheme cluster >> at a time (or any other text actually, but the brokenness in Arabic will be >> immediately obvious), so I’m most certain that is not exactly how Arabic is >> handled in Emacs right now. > > We pass to the shaper the part of text that matches the regexps you > can see at the end of misc-lang.el, then display the glyphs the shaper > returns. The above description is a high-level overview; there are > many details that I cannot describe in a short message. For example, > for Arabic, when we get back the grapheme clusters, we lay them out, > then skip to the end of the text that we passed to the shaper. You mean this: https://repo.or.cz/emacs.git/blob/HEAD:/lisp/language/misc-lang.el#l78 I’m not sure how can I read it, but it seems to be missing the entire Arabic Extended-A and Arabic Mathematical Alphabetic Symbols blocks. I’m not also sure how it would handle using combining marks from other blocks with Arabic text (say putting U+20D6 over an Arabic letter). What happens if one edits a file that contains only Arabic text, and why that (whatever it is ) can’t be extended to any text? 
>>> The character composition is implemented >>> in Lisp, which is called by the display engine, and which then calls >>> back into C to invoke the shaper. This implementation is meant to >>> allow a great deal of control on what should be composed and how. But >>> it is also relatively slow, which is another reason why doing that for >>> all the text to be laid out is impractical: it slows down redisplay to >>> the degree that it becomes annoying to users. >> >> Having more control should not be at the price of doing things wrong. > > No one said it should, that's just how things are. > >> The whole composition concept of Emacs does not make any sense to me, all >> text is “composed”. You can have a special mode that would disable shaping >> for specific purposes (opening huge log files, wanting to see raw text with >> no bidi or shaping, etc), but this can be done in cooperation with HarfBuzz >> and not by bypassing it entirely. > > We are talking about a piece of software designed 21 years ago. I > realize that it makes no sense to you, but that's what we have, and > will probably have for the next 10 years or so. We must make the most > out of what we have. So nearly as old as the first release of OpenOffice (not counting its StarOffice days). Anyway bad decisions about text layout is quite rampant in software (old and new) and need to be fixed, but that is not my call. ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
Re: [HarfBuzz] Ligatures
> On May 23, 2020, at 8:26 PM, Eli Zaretskii wrote:
>
>> From: Khaled Hosny
>> Date: Sat, 23 May 2020 20:09:50 +0200
>> Cc: harfbuzz@lists.freedesktop.org
>>
>> Overall, if you can’t send the whole text (words are the absolute minimum,
>> but this has its issues as well), don’t just send arbitrary parts of it as
>> the result will be some inconsistent mess.
>
> I almost understand (and agree), sans one part: the "arbitrary parts"
> of what you wrote. If we want to produce a ligature out of "ffi", the
> shaper will get "ffi" and nothing more. Which part here is arbitrary?

Sending “ffi” alone is an arbitrary decision. The font might have kerning between “ffi” and what comes before and after it, but you won’t get it. The font might not have a ligature for “ffi” at all, and use kerning instead, so you will get kerning between the “ffi” glyphs and not other glyphs, which is arbitrary. It might be a cursive font that changes glyph shapes based on surrounding glyphs, and you will get that for “ffi” and not elsewhere, which is arbitrary.

That is just plain wrong, there is no way around it.
Re: [HarfBuzz] Ligatures
> From: Khaled Hosny > Date: Sat, 23 May 2020 20:18:33 +0200 > Cc: harfbuzz@lists.freedesktop.org > > > The Emacs display engine examines the text to be displayed and laid > > out one character at a time, and makes layout decisions after each > > character or grapheme cluster it lays out. Its design is therefore > > fundamentally incompatible with shaping large substrings of buffer > > text at once. We do support that for short sequences of characters, > > which seems to work well enough for complex shaping (a.k.a. "character > > compositions") of scripts that require that, but we still do that one > > grapheme cluster at a time. > > That wouldn’t work for Arabic. You can’t shape Arabic one grapheme cluster at > a time (or any other text actually, but the brokenness in Arabic will be > immediately obvious), so I’m most certain that is not exactly how Arabic is > handled in Emacs right now. We pass to the shaper the part of text that matches the regexps you can see at the end of misc-lang.el, then display the glyphs the shaper returns. The above description is a high-level overview; there are many details that I cannot describe in a short message. For example, for Arabic, when we get back the grapheme clusters, we lay them out, then skip to the end of the text that we passed to the shaper. > > The character composition is implemented > > in Lisp, which is called by the display engine, and which then calls > > back into C to invoke the shaper. This implementation is meant to > > allow a great deal of control on what should be composed and how. But > > it is also relatively slow, which is another reason why doing that for > > all the text to be laid out is impractical: it slows down redisplay to > > the degree that it becomes annoying to users. > > Having more control should not be at the price of doing things wrong. No one said it should, that's just how things are. > The whole composition concept of Emacs does not make any sense to me, all > text is “composed”. 
You can have a special mode that would disable shaping > for specific purposes (opening huge log files, wanting to see raw text with > no bidi or shaping, etc), but this can be done in cooperation with HarfBuzz > and not by bypassing it entirely. We are talking about a piece of software designed 21 years ago. I realize that it makes no sense to you, but that's what we have, and will probably have for the next 10 years or so. We must make the most out of what we have.
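The loop described in this message (match a regexp, pass the match to the shaper, lay out the returned clusters, skip to the end of the matched text) can be sketched roughly as follows. This is a toy in Python, not Emacs Lisp or C; the pattern `COMPOSE`, the function `toy_shape`, and the glyph names are all hypothetical stand-ins, and the real rules at the end of misc-lang.el are more involved.

```python
import re

# Hypothetical composition rule: any run of Arabic letters U+0621..U+064A.
COMPOSE = re.compile(r"[\u0621-\u064A]+")

def toy_shape(run):
    """Stand-in for the shaper: one cluster per character, except that
    lam followed by alef comes back as a single cluster."""
    clusters, i = [], 0
    while i < len(run):
        if run[i] == "\u0644" and i + 1 < len(run) and run[i + 1] == "\u0627":
            clusters.append(("lam_alef", i, i + 2))
            i += 2
        else:
            clusters.append((f"u{ord(run[i]):04X}", i, i + 1))
            i += 1
    return clusters

def layout(text):
    """Walk the text one character at a time; when the rule matches at the
    current position, shape the whole match and resume after it."""
    out, pos = [], 0
    while pos < len(text):
        m = COMPOSE.match(text, pos)
        if m:
            for name, start, end in toy_shape(m.group()):
                out.append((name, pos + start, pos + end))
            pos = m.end()  # skip to the end of the text passed to the shaper
        else:
            out.append((f"u{ord(text[pos]):04X}", pos, pos + 1))
            pos += 1
    return out
```

Each tuple is (glyph name, start, end) in character offsets, so a cluster that covers several characters is visible as a span wider than one.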
Re: [HarfBuzz] Ligatures
> From: Khaled Hosny > Date: Sat, 23 May 2020 20:09:50 +0200 > Cc: harfbuzz@lists.freedesktop.org > > Overall, if you can’t send the whole text (words are the absolute minimum, > but this has its issues as well), don’t just send arbitrary parts of it as > the result will be some inconsistent mess. I almost understand (and agree), sans one part: the "arbitrary parts" of what you wrote. If we want to produce a ligature out of "ffi", the shaper will get "ffi" and nothing more. Which part here is arbitrary? Thanks.
Re: [HarfBuzz] Ligatures
> On May 23, 2020, at 10:35 AM, Eli Zaretskii wrote: > >> From: Khaled Hosny >> Date: Sat, 23 May 2020 09:59:15 +0200 >> Cc: harfbuzz@lists.freedesktop.org >> >> Also either Emacs is currently treating text that it enables shaping for as >> second-class citizens where limitations/degraded performance is acceptable >> (which is really really bad) > > Could you tell more about which limitations and degraded performance > you had in mind? I'm not sure we have this, but cannot tell without > understanding the issues. I have no idea. I’m just guessing why you think the Emacs display engine can’t handle all text like it handles Arabic. Either it does not handle Arabic correctly, or it can handle all text like it handles Arabic. >> or “redesigning the entire Emacs display engine” is not really needed as you >> can just declare all text as text that needs to be shaped and be done with >> it. > > The Emacs display engine examines the text to be displayed and laid > out one character at a time, and makes layout decisions after each > character or grapheme cluster it lays out. Its design is therefore > fundamentally incompatible with shaping large substrings of buffer > text at once. We do support that for short sequences of characters, > which seems to work well enough for complex shaping (a.k.a. "character > compositions") of scripts that require that, but we still do that one > grapheme cluster at a time. That wouldn’t work for Arabic. You can’t shape Arabic one grapheme cluster at a time (or any other text actually, but the brokenness in Arabic will be immediately obvious), so I’m most certain that is not exactly how Arabic is handled in Emacs right now. > The character composition is implemented > in Lisp, which is called by the display engine, and which then calls > back into C to invoke the shaper. This implementation is meant to > allow a great deal of control on what should be composed and how. 
But > it is also relatively slow, which is another reason why doing that for > all the text to be laid out is impractical: it slows down redisplay to > the degree that it becomes annoying to users. Having more control should not be at the price of doing things wrong. The whole composition concept of Emacs does not make any sense to me, all text is “composed”. You can have a special mode that would disable shaping for specific purposes (opening huge log files, wanting to see raw text with no bidi or shaping, etc), but this can be done in cooperation with HarfBuzz and not by bypassing it entirely. Regards, Khaled ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
Re: [HarfBuzz] Ligatures
> On May 23, 2020, at 10:25 AM, Eli Zaretskii wrote: > >> From: Khaled Hosny >> Date: Sat, 23 May 2020 09:51:21 +0200 >> Cc: harfbuzz@lists.freedesktop.org >> >>> Thanks. Since (b) is not really feasible without redesigning the >>> entire Emacs display engine (for which I see no volunteers lining up >>> any time soon), I guess we will have to use some more-or-less >>> reasonable and somewhat unreliable heuristics by supporting only some >>> ligatures that are known in advance. >> >> What are you going to do about kerning, or mark positioning? Partially >> kerning arbitrary glyphs (because the sub string match some regular >> expression) is worse than not kerning at all. > > I don't think I understand the question. How is kerning related to > the issue at hand? Kerning is part of text layout. You are only considering ligatures, but they are a small part of text layout, and your proposal does not seem to consider anything other than ligatures, which is an arbitrary division and does not make much sense to me. Some fonts provide ligatures to fix f-collisions, others fix them with contextual alternates, and others fix them with kerning. Your proposed solution does not address this. Also, when you pass certain text to the layout engine, you get everything the font provides, not just ligatures, so you would end up kerning certain letter combinations (those that you send to the layout engine) and not others, which is inconsistent and ugly. Overall, if you can’t send the whole text (words are the absolute minimum, but this has its issues as well), don’t just send arbitrary parts of it as the result will be some inconsistent mess. Regards, Khaled
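The inconsistency described here can be illustrated with a toy model. Everything in this Python sketch is an assumption made for illustration: the uniform advance width, the kerning pairs, the pre-selected pattern, and the `shape` function are hypothetical and do not reflect any real font or the HarfBuzz API.

```python
import re

ADVANCE = 600                               # hypothetical advance width, font units
KERN = {("T", "h"): -50, ("T", "o"): -50}   # hypothetical kerning pairs in the font
LIGATURE_RULE = re.compile(r"Th|fi")        # hypothetical pre-selected sequences

def shape(run):
    """Toy shaper: per-glyph advances, with pair kerning folded into the
    advance of the first glyph of each pair."""
    adv = []
    for i, ch in enumerate(run):
        nxt = run[i + 1] if i + 1 < len(run) else None
        adv.append(ADVANCE + KERN.get((ch, nxt), 0))
    return adv

def shape_matches_only(text):
    """Shape only the substrings matching LIGATURE_RULE; every other
    character gets its raw advance width, with no kerning applied."""
    adv, pos = [], 0
    for m in LIGATURE_RULE.finditer(text):
        adv += [ADVANCE] * (m.start() - pos)
        adv += shape(m.group())
        pos = m.end()
    adv += [ADVANCE] * (len(text) - pos)
    return adv

text = "The Tower"
whole = shape(text)                 # both "Th" and "To" are kerned
partial = shape_matches_only(text)  # only "Th" is kerned, "To" is not
```

In this toy font "Th" matches the pattern only because someone expected a ligature there, yet it ends up kerned, while the equally kerned pair "To" never reaches the shaper.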
Re: [HarfBuzz] Ligatures
> Date: Sat, 23 May 2020 16:54:51 +0100 > From: Richard Wordingham > Cc: harfbuzz@lists.freedesktop.org > > > Emacs supports more than one rule for each composable sequence of > > characters. > > That doesn't help when the rules give conflicting divisions into > clusters, which is the case with Tai Tham. The assumption is that either the rules can be arranged in an order that allows using the first matching rule, or, failing that, that you write your own composing function that implements whatever logic is required to select the right rule. > The Devanagari rule only covers the Vedic marks in the Devanagari block, > the 'stress signs' according to the comments. Can rules essentially > for different scripts now share combining marks? The newer Vedic marks > were supposed to be available to at least all Indian Indic scripts. I don't know enough about this to make sure I even understand the question, let alone provide an answer. One thing I can say is that the regexp pattern in a rule can specify different context (the surrounding characters) even if the character that triggers the rule is the same. Failing that, I guess the solution will again be the function that produces the composition. As for different scripts: if the character codepoints are the same, Emacs currently assigns each character to a single script. > > Does Emacs indeed fail to wrap Arabic text? can you show an example? > > Character level wrapping still almost works down at Emacs 24.4, but I > don't know that it wasn't broken in later enhancements. There are three > features that make me think Emacs 24.4 might be different to the > current state of affairs: > > (1) Clicking into the text breaks text before the cursor, but not after > it. > (2) I can't step into lam-alif the way I step into Indic clusters. > (3) Lam-alif isn't broken by line wrap. Emacs 24.4 is very old, and doesn't use HarfBuzz. Please try Emacs 27 instead; it has several bugs in this area fixed, and will use HarfBuzz if available at build time.
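The "first matching rule" arrangement, with patterns that carry different context for the same triggering character, might look like the following Python sketch. The rule list, the rule names, and the (pattern, preceding-character count, name) layout are assumptions chosen for illustration; they are only loosely modelled on how Emacs composition rules are described in this thread.

```python
import re

# Hypothetical rules for the trigger character "x". Order matters: the more
# specific, context-carrying rules come before the fallback.
RULES = [
    (re.compile(r"ax"), 1, "with-preceding-a"),         # "x" preceded by "a"
    (re.compile(r"x[bc]"), 0, "with-following-b-or-c"),  # "x" followed by b or c
    (re.compile(r"x"), 0, "fallback"),
]

def first_matching_rule(text, pos, rules):
    """Return (name, start, end) for the first rule that matches around
    text[pos], or None if no rule applies."""
    for pattern, prev_chars, name in rules:
        start = pos - prev_chars      # the pattern may begin before the trigger
        if start < 0:
            continue
        m = pattern.match(text, start)
        if m and m.end() > pos:       # the match must cover the trigger itself
            return name, start, m.end()
    return None
```

When no ordering of the rules gives the right division into clusters, as in the Tai Tham case raised above, the selection logic would have to live in a custom composing function instead of in the rule order.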
Re: [HarfBuzz] Ligatures
> Date: Sat, 23 May 2020 16:33:12 +0100 > From: Richard Wordingham > > On Sat, 23 May 2020 11:25:38 +0300 > Eli Zaretskii wrote: > > > > From: Khaled Hosny > > > Date: Sat, 23 May 2020 09:51:21 +0200 > > > Cc: harfbuzz@lists.freedesktop.org > > > What are you going to do about kerning, or mark positioning? > > > Partially kerning arbitrary glyphs (because the sub string match > > > some regular expression) is worse than not kerning at all. > > > > I don't think I understand the question. How is kerning related to > > the issue at hand? I'm not an expert on typesetting text (so maybe I > > don't even understand what exactly is meant by "kerning" in this > > context), so please tell more details about this. > > The simplest way of laying out proportionally spaced text is to have a > fixed glyph-dependent distance ('advance width') from the 'origin' of a > glyph to the origin of the next glyph and simply lay them out in a > sequence, like movable type. However, if one chooses widths suitable > for the sequences 'AM' and 'MV', then there may be an unsightly gap in > the middle of 'AV'. Kerning is basically the process of adjusting those > gaps. Kerning is done by the shaper. To do it, it needs the > whole sequence of characters. Ah, okay, thanks. Then yes, Emacs just uses the advance width that we get from the metrics of each glyph. ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
Re: [HarfBuzz] Ligatures
On Sat, 23 May 2020 17:22:58 +0300 Eli Zaretskii wrote: > > Date: Sat, 23 May 2020 14:51:53 +0100 > > From: Richard Wordingham > > > > > > They may of course have more than one set of such rules, with > > > > the rule sets defining different sets of sequences. > > > > > > Who are "they" in this context? > > > > Devanagari and Tai Tham are two examples I am aware of. > > Emacs supports more than one rule for each composable sequence of > characters. That doesn't help when the rules give conflicting divisions into clusters, which is the case with Tai Tham. On the other hand, for the Devanagari scripts, the rules can store alternatives which some renderers would consider ill-formed, or be more sensibly treated as 2 clusters. > > Devanagari has different rules for positioning of Vedic marks > > between fonts using the script tags dev and dev2 for it on one hand > > and the unofficial script tag dev3, which follows the USE rules for > > character ordering. For tag dev, Microsoft says that > virama, candrabindu, consonant> is one cluster; others, including > > Unicode, say it's two. Candrabindu in the middle and candrabindu > > at the end mean different things; the former nasalises a consonant, > > while the latter nasalises a vowel. The visual distinction exists, > > at least when half-forms are used. > > See the rules set up near the end of indian.el in Emacs. If they > don't cover what you describe, we can add more. The Devanagari rule only covers the Vedic marks in the Devanagari block, the 'stress signs' according to the comments. Can rules essentially for different scripts now share combining marks? The newer Vedic marks were supposed to be available to at least all Indian Indic scripts. > > > If a font requires special shaping for any sequence of any number > > > of 26 (or maybe 52) ASCII letters, then the Emacs display engine > > > will need to be redesigned. So this extreme possibility doesn't > > > bother me. > > > > In general, they do require it. 
But how is this worse than handling > > Arabic? > > I don't know. Maybe it isn't. Or maybe the slowdown while displaying > ASCII and moving the cursor through it will be unbearable. > > > Is the problem that you want to keep the option of line > > wrapping splitting words for ASCII, but are not bothered for Arabic > > or other human languages? > > Does Emacs indeed fail to wrap Arabic text? can you show an example? Character level wrapping still almost works down at Emacs 24.4, but I don't know that it wasn't broken in later enhancements. There are three features that make me think Emacs 24.4 might be different to the current state of affairs: (1) Clicking into the text breaks text before the cursor, but not after it. (2) I can't step into lam-alif the way I step into Indic clusters. (3) Lam-alif isn't broken by line wrap. > > I think you mean that Emacs would store the position of components > > by an index that was the sequence of characters, not the glyph ID. > > That would also deal with precomposed characters - it would be the > > character sequence that mattered, and for cursor movement and > > rendering, the canonically equivalent sequence(s) and the > > precomposed character would remain distinct. > > Sorry, I don't follow: what do you mean by "store"? Emacs stores the > rules used to compose characters, and it stores the results of the > compositions already done by applying those rules, as part of > displaying some chunk of text. Which one of these did you have in > mind? Neither. I thought from the Emacs developers' discussion that you were hoping to store the locations of the character boundaries within ligatures. Richard. ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
Re: [HarfBuzz] Ligatures
On Sat, 23 May 2020 11:25:38 +0300 Eli Zaretskii wrote: > > From: Khaled Hosny > > Date: Sat, 23 May 2020 09:51:21 +0200 > > Cc: harfbuzz@lists.freedesktop.org > > What are you going to do about kerning, or mark positioning? > > Partially kerning arbitrary glyphs (because the sub string match > > some regular expression) is worse than not kerning at all. > > I don't think I understand the question. How is kerning related to > the issue at hand? I'm not an expert on typesetting text (so maybe I > don't even understand what exactly is meant by "kerning" in this > context), so please tell more details about this. The simplest way of laying out proportionally spaced text is to have a fixed glyph-dependent distance ('advance width') from the 'origin' of a glyph to the origin of the next glyph and simply lay them out in a sequence, like movable type. However, if one chooses widths suitable for the sequences 'AM' and 'MV', then there may be an unsightly gap in the middle of 'AV'. Kerning is basically the process of adjusting those gaps. Kerning is done by the shaper. To do it, it needs the whole sequence of characters. To a first approximation, mark positioning is handled by passing the whole clusters to the shaper, and suitable regular expressions will handle this. However, sometimes clusters will interact. Microsoft had an example in the OpenType specification of the handling of the sequence Wö with a comparatively huge 'W'. In this example, the umlaut would be lowered to get out of the way of the 'W'. To do this, the shaper has to be presented with "W" and "ö" as part of the same sequence. Richard. ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
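The description of advance widths and kerning above can be made concrete with a small Python sketch. The widths and the kerning value are invented numbers in arbitrary font units, not taken from any real font, and the function name `origins` is hypothetical.

```python
# Hypothetical advance widths and one kerning pair, in font units.
ADVANCE = {"A": 700, "M": 900, "V": 700}
KERN = {("A", "V"): -120}

def origins(text, kern=None):
    """Return the x origin of each glyph when laid out left to right.
    Without a kerning table this is plain 'movable type' layout."""
    kern = kern or {}
    xs, x = [], 0
    for i, ch in enumerate(text):
        if i > 0:
            # Adjust the gap between the previous glyph and this one.
            x += kern.get((text[i - 1], ch), 0)
        xs.append(x)
        x += ADVANCE[ch]
    return xs
```

With these numbers, `origins("AV")` places the V at 700, while `origins("AV", KERN)` pulls it in to 580; "AM" is unaffected because the table has no entry for that pair. The adjustment depends on both neighbours, which is why the shaper needs to see the whole sequence.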
Re: [HarfBuzz] Ligatures
> Date: Sat, 23 May 2020 14:51:53 +0100 > From: Richard Wordingham > > > > They may of course have more than one set of such rules, with the > > > rule sets defining different sets of sequences. > > > > Who are "they" in this context? > > Devanagari and Tai Tham are two examples I am aware of. Emacs supports more than one rule for each composable sequence of characters. > Devanagari has different rules for positioning of Vedic marks between > fonts using the script tags dev and dev2 for it on one hand and the > unofficial script tag dev3, which follows the USE rules for character > ordering. For tag dev, Microsoft says that candrabindu, consonant> is one cluster; others, including Unicode, say > it's two. Candrabindu in the middle and candrabindu at the end mean > different things; the former nasalises a consonant, while the latter > nasalises a vowel. The visual distinction exists, at least when > half-forms are used. See the rules set up near the end of indian.el in Emacs. If they don't cover what you describe, we can add more. > > I'm not talking about Arabic. Emacs has a set of regular expressions > > for sequences of Arabic characters that need shaping, misc-lang.el in > > Emacs. If the set is incomplete, we can augment it. > > That regular expression treats every Arabic word as in need of shaping. > > > If a font requires special shaping for any sequence of any number of > > 26 (or maybe 52) ASCII letters, then the Emacs display engine will > > need to be redesigned. So this extreme possibility doesn't bother me. > > In general, they do require it. But how is this worse than handling > Arabic? I don't know. Maybe it isn't. Or maybe the slowdown while displaying ASCII and moving the cursor through it will be unbearable. > Is the problem that you want to keep the option of line > wrapping splitting words for ASCII, but are not bothered for Arabic or > other human languages? Does Emacs indeed fail to wrap Arabic text? can you show an example? 
> > > How would you handle the possibility that all three of <æ>, > > > and might be rendered by the same glyph, althouɡh they > > > are comprised of 1, 2 and 3 characters respectively? > > > > By using a composition rule that matches both and . > > The rules are regexp-based, and expressing the above as a regexp is > > simple. Once a sequence of characters matches the regexp, Emacs calls > > the shaper (hb_shape etc.) to produce the font glyphs for the > > sequence, and displays the glyphs that the shaper returns. > > I think you mean that Emacs would store the position of components by > an index that was the sequence of characters, not the glyph ID. That > would also deal with precomposed characters - it would be the character > sequence that mattered, and for cursor movement and rendering, > the canonically equivalent sequence(s) and the precomposed character > would remain distinct. Sorry, I don't follow: what do you mean by "store"? Emacs stores the rules used to compose characters, and it stores the results of the compositions already done by applying those rules, as part of displaying some chunk of text. Which one of these did you have in mind? ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
Re: [HarfBuzz] Ligatures
On Sat, 23 May 2020 09:09:48 +0300 Eli Zaretskii wrote: > > Date: Fri, 22 May 2020 22:22:49 +0100 > > From: Richard Wordingham > > > > > The current support for producing ligatures works in the same way > > > as complex text shaping for scripts that require that, like > > > Arabic and Khmer: the sequences of characters that can be > > > displayed as ligatures are identified in advance with suitable > > > regular expressions, and the display engine then passes these > > > sequences to hb_shape to produce the ligatures. > > > > > > This works well for scripts that require complex shaping, because > > > such scripts generally have well-defined rules for the sequences > > > of codepoints that need shaping. > > > > They may of course have more than one set of such rules, with the > > rule sets defining different sets of sequences. > > Who are "they" in this context? Devanagari and Tai Tham are two examples I am aware of. Devanagari has different rules for positioning of Vedic marks between fonts using the script tags dev and dev2 for it on one hand and the unofficial script tag dev3, which follows the USE rules for character ordering. For tag dev, Microsoft says that is one cluster; others, including Unicode, say it's two. Candrabindu in the middle and candrabindu at the end mean different things; the former nasalises a consonant, while the latter nasalises a vowel. The visual distinction exists, at least when half-forms are used. Tai Tham has an issue with the mark U+1A58 TAI THAM SIGN MAI KANG LAI. It is, at least formally, a non-spacing mark. It occurs at the juncture of two syllables in the same words. Modern, printed Tai Khuen happily treats it as syllable-final. In more traditional styles, it starts syllables, going above the first consonant, and so to the right of a vowel mark reordered to the left hand side of the syllable. Some fonts seem to just let it hang over the start of the next syllable, taking pot luck with what's there. 
That gives two different syllable structures. As I supported the style found in a certain dictionary, it sometimes belongs with the syllable before, and sometimes with the syllable after it. I therefore ended up defining the sequences to be shaped as a sequence of one or more syllables joined together by U+1A58. Fortunately, normal cursor motion is controlled by a different definition. (I'm still using Emacs 24.4 with the restoration of interactive commands forward-char-intrusive and backward-char-intrusive and their interface within the C code.) > I understand that the number of combinations is theoretically > unbounded. I'm asking if it is also unbounded in practice. That is, > do font designers add ligatures for arbitrary combinations of > characters, regardless of some reasonable set of requirements? For > example, is the set of ligatures of Latin characters shown here: > > https://en.wikipedia.org/wiki/Orthographic_ligature#Latin_alphabet > > reasonably complete, or should I expect any number of other arbitrary > combinations of Latin characters popping up in fonts? And if the > latter, then what is the purpose of providing such arbitrary > ligatures? Doesn't the existence of ligatures for 'Eisenhower' and 'Chamberlain' provide enough of an answer? If you claim to support handwriting fonts, then you can expect others - 'sh', 'tt' and 'ing' are fairly obvious ones. You may also find ligatures being used to sort out kerning issues. One problem I've observed with computer fonts is that the spacing of glyphs in a string is not consistent. This appears to be due to the way the positioning of the glyphs is rounded. The problem can be bad enough that the designer ends up fixing the problem by combining them into a single glyph, which formally is a ligature. I've not noticed this in ASCII fonts, but then I haven't looked hard at them. The 'tt' ligature can arise because the two t's are crossed by a single stroke.
Crossing the 't' in 'lt' might be handled by a special 't' glyph, or one might just form an 'lt' ligature. The ending 'ing' is common enough that I unconsciously developed an abbreviated way of writing it. > I'm not talking about Arabic. Emacs has a set of regular expressions > for sequences of Arabic characters that need shaping, misc-lang.el in > Emacs. If the set is incomplete, we can augment it. That regular expression treats every Arabic word as in need of shaping. > If a font requires special shaping for any sequence of any number of > 26 (or maybe 52) ASCII letters, then the Emacs display engine will > need to be redesigned. So this extreme possibility doesn't bother me. In general, they do require it. But how is this worse than handling Arabic? Is the problem that you want to keep the option of line wrapping splitting words for ASCII, but are not bothered for Arabic or other human languages? ASCII does not satisfyingly suffice for English. > > How would you handle the possibility that all three of <æ>, >
Re: [HarfBuzz] Ligatures
> From: Khaled Hosny > Date: Sat, 23 May 2020 09:59:15 +0200 > Cc: harfbuzz@lists.freedesktop.org > > Also either Emacs is currently treating text that it enables shaping for as > second-class citizens where limitations/degraded performance is acceptable > (which is really really bad) Could you tell more about which limitations and degraded performance you had in mind? I'm not sure we have this, but cannot tell without understanding the issues. > or “redesigning the entire Emacs display engine” is not really needed as you > can just declare all text as text that needs to be shaped and be done with it. The Emacs display engine examines the text to be displayed and laid out one character at a time, and makes layout decisions after each character or grapheme cluster it lays out. Its design is therefore fundamentally incompatible with shaping large substrings of buffer text at once. We do support that for short sequences of characters, which seems to work well enough for complex shaping (a.k.a. "character compositions") of scripts that require that, but we still do that one grapheme cluster at a time. The character composition is implemented in Lisp, which is called by the display engine, and which then calls back into C to invoke the shaper. This implementation is meant to allow a great deal of control on what should be composed and how. But it is also relatively slow, which is another reason why doing that for all the text to be laid out is impractical: it slows down redisplay to the degree that it becomes annoying to users. That is why solving these problems in the way that you suggest requires a complete rewrite of the Emacs display code. It simply cannot currently support what you expect. ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
Re: [HarfBuzz] Ligatures
> From: Khaled Hosny > Date: Sat, 23 May 2020 09:51:21 +0200 > Cc: harfbuzz@lists.freedesktop.org > > > Thanks. Since (b) is not really feasible without redesigning the > > entire Emacs display engine (for which I see no volunteers lining up > > any time soon), I guess we will have to use some more-or-less > > reasonable and somewhat unreliable heuristics by supporting only some > > ligatures that are known in advance. > > What are you going to do about kerning, or mark positioning? Partially > kerning arbitrary glyphs (because the sub string match some regular > expression) is worse than not kerning at all. I don't think I understand the question. How is kerning related to the issue at hand? I'm not an expert on typesetting text (so maybe I don't even understand what exactly is meant by "kerning" in this context), so please tell more details about this. Thanks. ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
Re: [HarfBuzz] Ligatures
> On May 23, 2020, at 9:51 AM, Khaled Hosny wrote: > > > >> On May 23, 2020, at 9:44 AM, Eli Zaretskii wrote: >> >>> From: Khaled Hosny >>> Date: Sat, 23 May 2020 08:36:10 +0200 >>> Cc: harfbuzz@lists.freedesktop.org >>> The only way of doing this right, I'm told, is to either (a) query the font to get the list of all the ligatures it supports, or (b) assume any combination of characters can produce a ligature, and therefore we need to pass all the characters intended for display through hb_shape. The latter in particular is in stark contrast to how the current Emacs display code is designed and implemented. >>> >>> (a) is not realistically possible as doing it properly has pretty much the >>> same cost as shaping the text. So your only reliable option is (b). >> >> Thanks. Since (b) is not really feasible without redesigning the >> entire Emacs display engine (for which I see no volunteers lining up >> any time soon), I guess we will have to use some more-or-less >> reasonable and somewhat unreliable heuristics by supporting only some >> ligatures that are known in advance. > > What are you going to do about kerning, or mark positioning? Partially > kerning arbitrary glyphs (because the sub string match some regular > expression) is worse than not kerning at all. Also either Emacs is currently treating text that it enables shaping for as second-class citizens where limitations/degraded performance is acceptable (which is really really bad), or “redesigning the entire Emacs display engine” is not really needed as you can just declare all text as text that needs to be shaped and be done with it. ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
Re: [HarfBuzz] Ligatures
> On May 23, 2020, at 9:44 AM, Eli Zaretskii wrote: > >> From: Khaled Hosny >> Date: Sat, 23 May 2020 08:36:10 +0200 >> Cc: harfbuzz@lists.freedesktop.org >> >>> The only way of >>> doing this right, I'm told, is to either (a) query the font to get the >>> list of all the ligatures it supports, or (b) assume any combination >>> of characters can produce a ligature, and therefore we need to pass >>> all the characters intended for display through hb_shape. The latter >>> in particular is in stark contrast to how the current Emacs display >>> code is designed and implemented. >> >> (a) is not realistically possible as doing it properly has pretty much the >> same cost as shaping the text. So your only reliable option is (b). > > Thanks. Since (b) is not really feasible without redesigning the > entire Emacs display engine (for which I see no volunteers lining up > any time soon), I guess we will have to use some more-or-less > reasonable and somewhat unreliable heuristics by supporting only some > ligatures that are known in advance. What are you going to do about kerning, or mark positioning? Partially kerning arbitrary glyphs (because the sub string match some regular expression) is worse than not kerning at all. ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
Re: [HarfBuzz] Ligatures
> From: Khaled Hosny > Date: Sat, 23 May 2020 08:36:10 +0200 > Cc: harfbuzz@lists.freedesktop.org > > >The only way of > > doing this right, I'm told, is to either (a) query the font to get the > > list of all the ligatures it supports, or (b) assume any combination > > of characters can produce a ligature, and therefore we need to pass > > all the characters intended for display through hb_shape. The latter > > in particular is in stark contrast to how the current Emacs display > > code is designed and implemented. > > (a) is not realistically possible as doing it properly has pretty much the > same cost as shaping the text. So your only reliable option is (b). Thanks. Since (b) is not really feasible without redesigning the entire Emacs display engine (for which I see no volunteers lining up any time soon), I guess we will have to use some more-or-less reasonable and somewhat unreliable heuristics by supporting only some ligatures that are known in advance. ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
Re: [HarfBuzz] Ligatures
> On May 22, 2020, at 9:32 PM, Eli Zaretskii wrote: > > Hi, > > This is a bit off-topic, but I thought it could be appropriate to ask > here, since we have here some of the best experts on this subject. > > We are discussing support for ligatures in Emacs, specifically when > using HarfBuzz as the shaping engine. See the discussion from > > https://lists.gnu.org/archive/html/emacs-devel/2020-05/msg02493.html > > The current support for producing ligatures works in the same way as > complex text shaping for scripts that require that, like Arabic and > Khmer: the sequences of characters that can be displayed as ligatures > are identified in advance with suitable regular expressions, and the > display engine then passes these sequences to hb_shape to produce the > ligatures. > > This works well for scripts that require complex shaping, because such > scripts generally have well-defined rules for the sequences of > codepoints that need shaping. My original thoughts were that > ligatures could be supported in the same way, based on the assumption > that the list of possible ligatures is finite and can be stored in a > suitable data structure in advance. I might be stating the obvious, but what Emacs is doing reflects a very outdated view of text layout. The schism between so-called complex text and simple text does not actually exist. There are script-specific shaping rules that layout engines know and apply, and there are additional/complementary rules provided by the font that layout engines also apply. As far as applications are concerned, they have text with certain properties and fonts, and they hand them to the layout engine and get back positioned glyphs. Any attempt to second guess the layout engine and classify the text into parts that need or do not need shaping is futile. Fonts can, and do, provide any number of arbitrary glyph interactions (not just ligatures), and the only reliable way to know that is to shape and check the output.
I think I already said this before, but Emacs should indiscriminately give all the text to HarfBuzz (or any other text layout engine it additionally supports) and give up on trying to pre-classify text, and is what pretty much any other sensible application is doing already. There are many ways to solve potential performance issues that does not involve compromising on the text layout. > However, I'm being told that this assumption is false, and that each > font defines ligatures from any number of arbitrary combinations of > characters, and therefore the exhaustive list of the ligatures is in > practice infinite and cannot be provided in advance. That is true. >The only way of > doing this right, I'm told, is to either (a) query the font to get the > list of all the ligatures it supports, or (b) assume any combination > of characters can produce a ligature, and therefore we need to pass > all the characters intended for display through hb_shape. The latter > in particular is in stark contrast to how the current Emacs display > code is designed and implemented. (a) is not realistically possible as doing it properly has pretty much the same cost as shaping the text. So your only reliable option is (b). Regards, Khaled ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
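
The "shape and check the output" advice in the message above can be illustrated with a minimal Python sketch. It is not HarfBuzz code: it assumes the shaper has already returned one cluster value per glyph (HarfBuzz reports these in hb_glyph_info_t.cluster), that the values ascend as in left-to-right text, and that each value is the index of the first character of its cluster. The helper name multi_char_clusters and the sample data are made up for illustration. A cluster covering several characters may be a ligature or a base with marks, so the result only says that the glyphs interacted.

```python
def multi_char_clusters(text_len, glyph_clusters):
    """Return (start, end) character ranges of clusters spanning >1 character.

    glyph_clusters: cluster value of each output glyph, ascending (LTR);
    each value is assumed to be the index of the cluster's first character.
    """
    starts = sorted(set(glyph_clusters))
    ranges = []
    for i, start in enumerate(starts):
        # A cluster ends where the next one starts, or at the end of the text.
        end = starts[i + 1] if i + 1 < len(starts) else text_len
        if end - start > 1:
            ranges.append((start, end))
    return ranges


# Hypothetical result for "office" shaped with an "ffi" ligature:
# glyphs o, ffi, c, e carry clusters 0, 1, 4, 5.
print(multi_char_clusters(6, [0, 1, 4, 5]))
```

The check runs after shaping, so it does not avoid the cost of shaping; it only shows how an application can learn from the output which characters were merged.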
Re: [HarfBuzz] Ligatures
> Date: Fri, 22 May 2020 22:22:49 +0100
> From: Richard Wordingham
>
> > The current support for producing ligatures works in the same way as complex text shaping for scripts that require that, like Arabic and Khmer: the sequences of characters that can be displayed as ligatures are identified in advance with suitable regular expressions, and the display engine then passes these sequences to hb_shape to produce the ligatures.
> >
> > This works well for scripts that require complex shaping, because such scripts generally have well-defined rules for the sequences of codepoints that need shaping.
>
> They may of course have more than one set of such rules, with the rule sets defining different sets of sequences.

Who are "they" in this context?

> > However, I'm being told that this assumption is false, and that each font defines ligatures from any number of arbitrary combinations of characters, and therefore the exhaustive list of the ligatures is in practice infinite and cannot be provided in advance.
>
> This arbitrariness is true. Over the set of all credible fonts for a given character repertoire, the number of ligating combinations is unbounded.

I understand that the number of combinations is theoretically unbounded. I'm asking if it is also unbounded in practice. That is, do font designers add ligatures for arbitrary combinations of characters, regardless of some reasonable set of requirements? For example, is the set of ligatures of Latin characters shown here:

https://en.wikipedia.org/wiki/Orthographic_ligature#Latin_alphabet

reasonably complete, or should I expect any number of other arbitrary combinations of Latin characters popping up in fonts? And if the latter, then what is the purpose of providing such arbitrary ligatures?

> > To be specific, I'm talking about 2 kinds of ligatures:
> >
> > . ligatures made of Latin characters, like "ffi" and "Th"
> > . ligatures produced from symbols, like "==>" that is converted into ⟹

Yes, these are the only cases that I'm asking here about. I'm not asking about shaping complex scripts such as Arabic, where this problem doesn't exist AFAIK.

> Have you addressed the cursive scripts yet, such as Arabic? At its simplest, most consonants have four shapes, initial, medial, final and isolated, and roughly speaking the shape used depends on the adjacent spacing characters. For the most part, Emacs would have to pass whole words into HarfBuzz for shaping. In some of the more advanced fonts, the vowel marks in a word may also affect the shape of the consonant skeleton. And of course, sometimes the Arabic script prefers to join letters vertically, as well as having a few straightforward ligatures.

I'm not talking about Arabic. Emacs has a set of regular expressions for sequences of Arabic characters that need shaping; see misc-lang.el in Emacs. If the set is incomplete, we can augment it.

> A cursive Latin script font may behave in the same way, with the shape of letters depending on what precedes and follows them. With a small enough character repertoire, there might be no ligatures, but your rendering logic would fail miserably.

If a font requires special shaping for any sequence of any number of 26 (or maybe 52) ASCII letters, then the Emacs display engine will need to be redesigned. So this extreme possibility doesn't bother me.

> How would you handle the possibility that all three of <æ>, and might be rendered by the same glyph, although they are comprised of 1, 2 and 3 characters respectively?

By using a composition rule that matches both and . The rules are regexp-based, and expressing the above as a regexp is simple. Once a sequence of characters matches the regexp, Emacs calls the shaper (hb_shape etc.) to produce the font glyphs for the sequence, and displays the glyphs that the shaper returns.

> And if Emacs is not imposing a normalisation, then all the precomposed characters in Unicode might have been entered as one or as more than one character?

If you are talking about composition with combining characters, Emacs already has the rules to compose them as described above. You can try this in your Emacs: insert a, then U+0301 COMBINING ACUTE ACCENT, and you should see them composed into a single glyph (provided that you use a suitable font). But I'm not asking about character composition in general, I'm asking specifically about ligatures of ASCII characters, without any non-ASCII codepoints or combining accents.
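
The regexp-based approach described in the message above (match known sequences first, then hand only those runs to the shaper) can be sketched roughly as follows. This is Python rather than Emacs Lisp, and it is not Emacs's implementation: the rule set LIGATURE_RE and the function composition_candidates are hypothetical names chosen for illustration, and the list of sequences is an arbitrary sample.

```python
import re

# Hypothetical rule set of sequences known in advance. Longer alternatives
# come first so that "ffi" is preferred over "ff" or "fi".
LIGATURE_RE = re.compile(r"ffi|ffl|ff|fi|fl|Th|==>")


def composition_candidates(text):
    """Return (start, end, substring) for each run to hand to the shaper."""
    return [(m.start(), m.end(), m.group()) for m in LIGATURE_RE.finditer(text)]


print(composition_candidates("The office ==> file"))
```

Anything the font ligates that is not in the rule set is never passed to the shaper, which is the limitation the thread is discussing.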
Re: [HarfBuzz] Ligatures
On Fri, 22 May 2020 22:32:04 +0300 Eli Zaretskii wrote:

> Can someone please tell what are the recommended practices regarding these ligatures? Is the set of possible ligatures indeed infinite and impossible to know in advance? And does HarfBuzz have APIs to query a font about the ligatures it supports?

hb_ot_layout_get_ligature_carets() is liable to be garbage in, garbage out. While the cursor positions were included in OTL fonts to assist cursor placement, the scheme obviously fails when the components are stacked vertically. Microsoft gave up on it and, if I remember the informal statement correctly, just divides the ligature up evenly between the characters or grapheme clusters. Many OpenType fonts don't populate the relevant section of the GDEF table. And, of course, one has real trouble when one glyph can come from different numbers of components.

LibreOffice takes (or took) a different approach, and uses the width of the characters logically before the insertion point. It's rather disconcerting when the cursor jumps backwards as one steps through the string. It could happen with the Latin script string "a͡i", for the 'double' inverted breve should shorten when the second letter is 'i'. One can get the effect in Indic scripts because of spacing viramas.

Richard.
Re: [HarfBuzz] Ligatures
On Fri, 22 May 2020 22:32:04 +0300 Eli Zaretskii wrote:

> Hi,
>
> This is a bit off-topic, but I thought it could be appropriate to ask here, since we have here some of the best experts on this subject.
>
> We are discussing support for ligatures in Emacs, specifically when using HarfBuzz as the shaping engine. See the discussion from
>
> https://lists.gnu.org/archive/html/emacs-devel/2020-05/msg02493.html
>
> The current support for producing ligatures works in the same way as complex text shaping for scripts that require that, like Arabic and Khmer: the sequences of characters that can be displayed as ligatures are identified in advance with suitable regular expressions, and the display engine then passes these sequences to hb_shape to produce the ligatures.
>
> This works well for scripts that require complex shaping, because such scripts generally have well-defined rules for the sequences of codepoints that need shaping.

They may of course have more than one set of such rules, with the rule sets defining different sets of sequences.

> My original thoughts were that ligatures could be supported in the same way, based on the assumption that the list of possible ligatures is finite and can be stored in a suitable data structure in advance.

At one level, this is true for any individual font, for it cannot have more than 65,536 glyphs.

> However, I'm being told that this assumption is false, and that each font defines ligatures from any number of arbitrary combinations of characters, and therefore the exhaustive list of the ligatures is in practice infinite and cannot be provided in advance.

This arbitrariness is true. Over the set of all credible fonts for a given character repertoire, the number of ligating combinations is unbounded.

> The only way of doing this right, I'm told, is to either (a) query the font to get the list of all the ligatures it supports, or (b) assume any combination of characters can produce a ligature, and therefore we need to pass all the characters intended for display through hb_shape. The latter in particular is in stark contrast to how the current Emacs display code is designed and implemented.
>
> To be specific, I'm talking about 2 kinds of ligatures:
>
> . ligatures made of Latin characters, like "ffi" and "Th"
> . ligatures produced from symbols, like "==>" that is converted into ⟹
>
> Can someone please tell what are the recommended practices regarding these ligatures? Is the set of possible ligatures indeed infinite and impossible to know in advance? And does HarfBuzz have APIs to query a font about the ligatures it supports?

Have you addressed the cursive scripts yet, such as Arabic? At its simplest, most consonants have four shapes, initial, medial, final and isolated, and roughly speaking the shape used depends on the adjacent spacing characters. For the most part, Emacs would have to pass whole words into HarfBuzz for shaping. In some of the more advanced fonts, the vowel marks in a word may also affect the shape of the consonant skeleton. And of course, sometimes the Arabic script prefers to join letters vertically, as well as having a few straightforward ligatures.

A cursive Latin script font may behave in the same way, with the shape of letters depending on what precedes and follows them. With a small enough character repertoire, there might be no ligatures, but your rendering logic would fail miserably.

How would you handle the possibility that all three of <æ>, and might be rendered by the same glyph, although they are comprised of 1, 2 and 3 characters respectively? And if Emacs is not imposing a normalisation, then all the precomposed characters in Unicode might have been entered as one or as more than one character?

Richard.
Re: [HarfBuzz] Ligatures and color changes
On Tue, Feb 19, 2013 at 05:53:04PM -0500, Behdad Esfahbod wrote:
> On 02/19/2013 05:47 PM, Khaled Hosny wrote:
> > On Tue, Feb 19, 2013 at 05:34:40PM -0500, Behdad Esfahbod wrote:
> > > As for *where* to cut the ligature, here's what you need:
> > >
> > > * Count the number of cursor positions *inside* the ligature. For the fi ligature it's one. And we have one cursor position before the ligature, so in this case we need to cut it in two pieces,
> > >
> > > * The common heuristic then is to cut the advance width of the ligature (well, cluster really) into two equal pieces.
> > >
> > > If you want to be fancy, you can call hb_ot_layout_get_ligature_carets(), and if the number of carets matches what you expect (1 in this case I believe?), you can use the returned caret positions instead of equally dividing the ligature. I haven't seen anyone implementing this though, as it gives very marginal improvements over the heuristic.
> >
> > It can make quite some difference with some Arabic ligatures, but few fonts implement it because few (no?) engines support it :)
>
> Correct. Maybe you can give me a font... ;)

OK, here is one :)

https://github.com/khaledhosny/sahl-naskh

> Note that BTW, a similar issue exists when kerning text. Most fonts implement kerning by adjusting the advance width of the first glyph. What this means however is that for a pair like Te, if the e moves way under the T, essentially we will get a very narrow selection width for the T, and unchanged width for the e. That's less than ideal. In HarfBuzz we split the kerning half-and-half for old-style TrueType kern pairs. But don't do something like that for GPOS kerning since, well, with GPOS the font designer has full control on what to do.
>
> Maybe we should do the same for GPOS kerning tables that only have adjustment for the first glyph and not the second? Dunno. Maybe a nice improvement. What do others think?

I think it would be a good idea, that is the majority of LTR kerning anyway.

Regards,
Khaled
Re: [HarfBuzz] Ligatures and color changes
On 19/2/13 23:35, Lóránt Pintér wrote:
> Using ZWNJ would be a great way of fixing it. And indeed it splits the ligature all right, but it also destroys the kerning. Here's a little test I did with HarfBuzz @ e0486fc1affd3796fb8f664e2e7fc208f1d2106c (the font has an fi ligature, and has some kerning between F and ,):
>
> Splitting the ligature with the ZWNJ works:
>
> Shaped fif\u200Ci with features: [ ]
> Glyph: #682, x_advance: 652
> Glyph: #71, x_advance: 380
> Glyph: #3, x_advance: 0
> Glyph: #74, x_advance: 292
>
> Kerning across the ZWNJ does not:
>
> Shaped F,F\u200C, with features: [ ]
> Glyph: #39, x_advance: 483
> Glyph: #13, x_advance: 242
> Glyph: #39, x_advance: 563
> Glyph: #3, x_advance: 0
> Glyph: #13, x_advance: 242
>
> Is it possible that doing this will be supported later in HarfBuzz?

Hmm - is that font using a legacy 'kern' table, or a GPOS 'kern' feature? For the former, I can understand that kerning would break, as it's a naïve glyph-pair lookup, but for the latter, I thought we should now ignore the ZWNJ.

JK
Re: [HarfBuzz] Ligatures and color changes
I'm trying this with a font that has the kerning data in the kern table as well as in the GPOS table. (Is that possible?) Can you show me a font that this feature works with?

--
Lóránt Pintér
Developer at Prezi (http://prezi.com)

On Wednesday, February 20, 2013 at 10:26 AM, Jonathan Kew wrote:
> On 19/2/13 23:35, Lóránt Pintér wrote:
> > Using ZWNJ would be a great way of fixing it. And indeed it splits the ligature all right, but it also destroys the kerning. Here's a little test I did with HarfBuzz @ e0486fc1affd3796fb8f664e2e7fc208f1d2106c (the font has an fi ligature, and has some kerning between F and ,):
> >
> > Splitting the ligature with the ZWNJ works:
> >
> > Shaped fif\u200Ci with features: [ ]
> > Glyph: #682, x_advance: 652
> > Glyph: #71, x_advance: 380
> > Glyph: #3, x_advance: 0
> > Glyph: #74, x_advance: 292
> >
> > Kerning across the ZWNJ does not:
> >
> > Shaped F,F\u200C, with features: [ ]
> > Glyph: #39, x_advance: 483
> > Glyph: #13, x_advance: 242
> > Glyph: #39, x_advance: 563
> > Glyph: #3, x_advance: 0
> > Glyph: #13, x_advance: 242
> >
> > Is it possible that doing this will be supported later in HarfBuzz?
>
> Hmm - is that font using a legacy 'kern' table, or a GPOS 'kern' feature? For the former, I can understand that kerning would break, as it's a naïve glyph-pair lookup, but for the latter, I thought we should now ignore the ZWNJ.
>
> JK
Re: [HarfBuzz] Ligatures and color changes
On 20/2/13 12:24, Lóránt Pintér wrote:
> I'm trying this with a font that has the kerning data in the kern table as well as in the GPOS table. (Is that possible?) Can you show me a font that this feature works with?

I thought we'd seen it work last week, but now it doesn't seem to. :(

Behdad, did we miss something here? It seems to me like the skippy_iter in PairPosFormat1::apply doesn't know what it's looking for...

JK

> --
> *Lóránt Pintér*
> Developer at Prezi http://prezi.com
>
> On Wednesday, February 20, 2013 at 10:26 AM, Jonathan Kew wrote:
> > On 19/2/13 23:35, Lóránt Pintér wrote:
> > > Using ZWNJ would be a great way of fixing it. And indeed it splits the ligature all right, but it also destroys the kerning. Here's a little test I did with HarfBuzz @ e0486fc1affd3796fb8f664e2e7fc208f1d2106c (the font has an fi ligature, and has some kerning between F and ,):
> > >
> > > Splitting the ligature with the ZWNJ works:
> > >
> > > Shaped fif\u200Ci with features: [ ]
> > > Glyph: #682, x_advance: 652
> > > Glyph: #71, x_advance: 380
> > > Glyph: #3, x_advance: 0
> > > Glyph: #74, x_advance: 292
> > >
> > > Kerning across the ZWNJ does not:
> > >
> > > Shaped F,F\u200C, with features: [ ]
> > > Glyph: #39, x_advance: 483
> > > Glyph: #13, x_advance: 242
> > > Glyph: #39, x_advance: 563
> > > Glyph: #3, x_advance: 0
> > > Glyph: #13, x_advance: 242
> > >
> > > Is it possible that doing this will be supported later in HarfBuzz?
> >
> > Hmm - is that font using a legacy 'kern' table, or a GPOS 'kern' feature? For the former, I can understand that kerning would break, as it's a naïve glyph-pair lookup, but for the latter, I thought we should now ignore the ZWNJ.
> >
> > JK
Re: [HarfBuzz] Ligatures and color changes
Hi Lóránt,

On 02/19/2013 12:20 PM, Lóránt Pintér wrote:
> Hi,
>
> I have a problem with half-colored ligatures, like (5) mfim in the image:

Right. That's one of the harder issues of text rendering.

> I figured out two ways to do this, but neither is good enough:
>
> * I can shape each color range separately, but then I lose the kerning between them, breaking (6)

Yes. Best to not do this.

> * I can tell HarfBuzz to disable ligatures for the last or the first character of each color range, but then it breaks (2) or (3) and (4).

Right. This is a limitation of HarfBuzz currently, that you can't turn off a pair-wise feature on one pair only, since changing the liga bit on one character affects it in both directions. I haven't been able to find a satisfactory fix for this yet. I'll think about it.

> Is there maybe a way to tell HarfBuzz to ignore ligatures if they span that color boundary? Or is there maybe a way to (quickly) assess if liga would be applied to a range of characters?

We don't have a good answer for this right now. The way I want to eventually fix this in Pango is different: it is to paint the ligature glyph half in each color. I think you can do the same using canvas. Just use a gradient with a sharp color switch for the ligature. It's a royal pain, but I think that's the most desirable rendering. I may be wrong.

As for *where* to cut the ligature, here's what you need:

* Count the number of cursor positions *inside* the ligature. For the fi ligature it's one. And we have one cursor position before the ligature, so in this case we need to cut it in two pieces,

* The common heuristic then is to cut the advance width of the ligature (well, cluster really) into two equal pieces.

If you want to be fancy, you can call hb_ot_layout_get_ligature_carets(), and if the number of carets matches what you expect (1 in this case I believe?), you can use the returned caret positions instead of equally dividing the ligature. I haven't seen anyone implementing this though, as it gives very marginal improvements over the heuristic.

Hope that helps,
behdad

> Thanks.
>
> --
> *Lóránt Pintér*
> Developer at Prezi http://prezi.com

--
behdad
http://behdad.org/
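
The cutting heuristic described in the message above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not HarfBuzz or Pango code: caret_positions is a hypothetical helper, font_carets stands for whatever caret offsets the application obtained for the ligature glyph (for example from hb_ot_layout_get_ligature_carets()), and the advance of 652 units is sample data.

```python
def caret_positions(cluster_advance, n_chars, font_carets=None):
    """Return x offsets of the cursor positions inside a ligature cluster.

    A cluster of n_chars characters has n_chars - 1 interior positions.
    font_carets: caret offsets supplied by the font, if any. They are used
    only when their count matches; otherwise the advance is split equally.
    """
    expected = n_chars - 1
    if font_carets is not None and len(font_carets) == expected:
        return sorted(font_carets)
    return [cluster_advance * i / n_chars for i in range(1, n_chars)]


print(caret_positions(652, 2))         # equal split of an "fi" ligature
print(caret_positions(652, 2, [300]))  # matching font caret is used
print(caret_positions(900, 3, [300]))  # count mismatch, equal split
```

Falling back to the equal split whenever the caret count does not match keeps the result usable for fonts that leave the GDEF caret data empty or incomplete.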
Re: [HarfBuzz] Ligatures and color changes
On Tue, Feb 19, 2013 at 05:34:40PM -0500, Behdad Esfahbod wrote:
> > Is there maybe a way to tell HarfBuzz to ignore ligatures if they span that color boundary? Or is there maybe a way to (quickly) assess if liga would be applied to a range of characters?
>
> We don't have a good answer for this right now. The way I want to eventually fix this in Pango is different: it is to paint the ligature glyph half in each color. I think you can do the same using canvas. Just use a gradient with a sharp color switch for the ligature. It's a royal pain, but I think that's the most desirable rendering. I may be wrong.

Gecko is (was?) using clipping to partially draw each part of the ligature:

http://robert.ocallahan.org/2006/10/partial-ligatures_24.html

> As for *where* to cut the ligature, here's what you need:
>
> * Count the number of cursor positions *inside* the ligature. For the fi ligature it's one. And we have one cursor position before the ligature, so in this case we need to cut it in two pieces,
>
> * The common heuristic then is to cut the advance width of the ligature (well, cluster really) into two equal pieces.
>
> If you want to be fancy, you can call hb_ot_layout_get_ligature_carets(), and if the number of carets matches what you expect (1 in this case I believe?), you can use the returned caret positions instead of equally dividing the ligature. I haven't seen anyone implementing this though, as it gives very marginal improvements over the heuristic.

It can make quite some difference with some Arabic ligatures, but few fonts implement it because few (no?) engines support it :)

Regards,
Khaled
Re: [HarfBuzz] Ligatures and color changes
On 02/19/2013 05:47 PM, Khaled Hosny wrote:
> On Tue, Feb 19, 2013 at 05:34:40PM -0500, Behdad Esfahbod wrote:
> > > Is there maybe a way to tell HarfBuzz to ignore ligatures if they span that color boundary? Or is there maybe a way to (quickly) assess if liga would be applied to a range of characters?
> >
> > We don't have a good answer for this right now. The way I want to eventually fix this in Pango is different: it is to paint the ligature glyph half in each color. I think you can do the same using canvas. Just use a gradient with a sharp color switch for the ligature. It's a royal pain, but I think that's the most desirable rendering. I may be wrong.
>
> Gecko is (was?) using clipping to partially draw each part of the ligature:
>
> http://robert.ocallahan.org/2006/10/partial-ligatures_24.html

Thanks for the pointer. Yes, that is also what GTK+ used to do (that code got rewritten so I don't know what it does now) for selection, but not for color attributes. Note that attributes like underline are also affected in the same way.

> > As for *where* to cut the ligature, here's what you need:
> >
> > * Count the number of cursor positions *inside* the ligature. For the fi ligature it's one. And we have one cursor position before the ligature, so in this case we need to cut it in two pieces,
> >
> > * The common heuristic then is to cut the advance width of the ligature (well, cluster really) into two equal pieces.
> >
> > If you want to be fancy, you can call hb_ot_layout_get_ligature_carets(), and if the number of carets matches what you expect (1 in this case I believe?), you can use the returned caret positions instead of equally dividing the ligature. I haven't seen anyone implementing this though, as it gives very marginal improvements over the heuristic.
>
> It can make quite some difference with some Arabic ligatures, but few fonts implement it because few (no?) engines support it :)

Correct. Maybe you can give me a font... ;)

Note that BTW, a similar issue exists when kerning text. Most fonts implement kerning by adjusting the advance width of the first glyph. What this means however is that for a pair like Te, if the e moves way under the T, essentially we will get a very narrow selection width for the T, and unchanged width for the e. That's less than ideal. In HarfBuzz we split the kerning half-and-half for old-style TrueType kern pairs. But we don't do something like that for GPOS kerning since, well, with GPOS the font designer has full control on what to do.

Maybe we should do the same for GPOS kerning tables that only have adjustment for the first glyph and not the second? Dunno. Maybe a nice improvement. What do others think?

behdad

> Regards,
> Khaled

--
behdad
http://behdad.org/
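
To make the half-and-half idea above concrete, here is a small Python sketch comparing the two ways of applying a kern value to a pair such as "Te". It is an illustration under assumptions, not HarfBuzz source: GlyphPos, apply_kern_first_only and apply_kern_split are hypothetical names, and the advances (600 and 500 units) and the kern value (-100) are made-up sample data.

```python
from dataclasses import dataclass


@dataclass
class GlyphPos:
    x_advance: int
    x_offset: int = 0


def apply_kern_first_only(first, second, kern):
    """Put the whole kern on the first glyph's advance."""
    first.x_advance += kern


def apply_kern_split(first, second, kern):
    """Share the kern between both glyphs without moving any ink."""
    half1 = kern // 2
    half2 = kern - half1
    first.x_advance += half1
    # The second glyph gives up half2 of its advance and is shifted by the
    # same amount, so it is drawn exactly where the first-only variant puts it.
    second.x_advance += half2
    second.x_offset += half2


t1, e1 = GlyphPos(600), GlyphPos(500)
apply_kern_first_only(t1, e1, -100)
t2, e2 = GlyphPos(600), GlyphPos(500)
apply_kern_split(t2, e2, -100)

print(t1, e1)  # T is narrowed by the full kern, e is unchanged
print(t2, e2)  # both glyphs are narrowed by half the kern
```

In both variants the "e" is drawn at x = 500 and the pair advances by 1000 units in total; only the per-glyph widths used for selection and cursor placement differ.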
Re: [HarfBuzz] Ligatures and color changes
On 19/2/13 22:34, Behdad Esfahbod wrote:
> Hi Lóránt,
>
> On 02/19/2013 12:20 PM, Lóránt Pintér wrote:
> > Hi,
> >
> > I have a problem with half-colored ligatures, like (5) mfim in the image:
>
> Right. That's one of the harder issues of text rendering.
>
> > I figured out two ways to do this, but neither is good enough:
> >
> > * I can shape each color range separately, but then I lose the kerning between them, breaking (6)
>
> Yes. Best to not do this.
>
> > * I can tell HarfBuzz to disable ligatures for the last or the first character of each color range, but then it breaks (2) or (3) and (4).
>
> Right. This is a limitation of HarfBuzz currently, that you can't turn off a pair-wise feature on one pair only, since changing the liga bit on one character affects it in both directions. I haven't been able to find a satisfactory fix for this yet. I'll think about it.

In general, I don't think it's clear exactly how these sorts of edge cases ought to work. Suppose you have a glyph sequence A X B, and the 'liga' feature is enabled for A and B, but not for X; but further suppose that X is a mark glyph, the liga lookup ignores marks, and there's an AB ligature. Should it be applied here?

Another possible approach to disabling ligatures at the color change - given that harfbuzz doesn't know anything about color, that must be something that your application is maintaining - might be to insert a ZWNJ character at that position in the text. With the latest harfbuzz code, I believe kerning would still apply correctly across this, but it should prevent the ligature.

> > Is there maybe a way to tell HarfBuzz to ignore ligatures if they span that color boundary? Or is there maybe a way to (quickly) assess if liga would be applied to a range of characters?
>
> We don't have a good answer for this right now. The way I want to eventually fix this in Pango is different: it is to paint the ligature glyph half in each color. I think you can do the same using canvas. Just use a gradient with a sharp color switch for the ligature. It's a royal pain, but I think that's the most desirable rendering. I may be wrong.

It's a reasonable rendering for typical Latin ligatures in simple text fonts. It doesn't work so well for more cursive cases. E.g. using this approach to color the middle f of Zapfino's ffi will look rather weird, as will coloring the parts of Arabic lam-meem-hah in a font with stacked ligature forms.

> As for *where* to cut the ligature, here's what you need:
>
> * Count the number of cursor positions *inside* the ligature. For the fi ligature it's one. And we have one cursor position before the ligature, so in this case we need to cut it in two pieces,
>
> * The common heuristic then is to cut the advance width of the ligature (well, cluster really) into two equal pieces.
>
> If you want to be fancy, you can call hb_ot_layout_get_ligature_carets(), and if the number of carets matches what you expect (1 in this case I believe?), you can use the returned caret positions instead of equally dividing the ligature. I haven't seen anyone implementing this though, as it gives very marginal improvements over the heuristic.

Particularly as I suspect that relatively few fonts actually have GDEF tables that define ligature-caret positions with any more care than simply dividing up the advance width into equal parts.

JK
Re: [HarfBuzz] Ligatures and color changes
Using ZWNJ would be a great way of fixing it. And indeed it splits the ligature all right, but it also destroys the kerning. Here's a little test I did with HarfBuzz @ e0486fc1affd3796fb8f664e2e7fc208f1d2106c (the font has an fi ligature, and has some kerning between F and ,):

Splitting the ligature with the ZWNJ works:

Shaped fif\u200Ci with features: [ ]
Glyph: #682, x_advance: 652
Glyph: #71, x_advance: 380
Glyph: #3, x_advance: 0
Glyph: #74, x_advance: 292

Kerning across the ZWNJ does not:

Shaped F,F\u200C, with features: [ ]
Glyph: #39, x_advance: 483
Glyph: #13, x_advance: 242
Glyph: #39, x_advance: 563
Glyph: #3, x_advance: 0
Glyph: #13, x_advance: 242

Is it possible that doing this will be supported later in HarfBuzz?

--
Lóránt Pintér
Developer at Prezi (http://prezi.com)

On Wednesday, February 20, 2013 at 12:09 AM, Jonathan Kew wrote:
> Another possible approach to disabling ligatures at the color change - given that harfbuzz doesn't know anything about color, that must be something that your application is maintaining - might be to insert a ZWNJ character at that position in the text. With the latest harfbuzz code, I believe kerning would still apply correctly across this, but it should prevent the ligature.
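
For reference, the ZWNJ approach discussed in this message amounts to a small preprocessing step before shaping. The Python sketch below is illustrative only: insert_zwnj is a hypothetical helper, and the boundary list is assumed to hold the character indices at which the application's own color runs start. Whether kerning survives across the inserted ZWNJ depends on the HarfBuzz version and on the font, as the test output above shows.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def insert_zwnj(text, boundaries):
    """Insert a ZWNJ before each character index where the color changes."""
    cuts = set(boundaries)
    out = []
    for i, ch in enumerate(text):
        # No ZWNJ is needed before the very first character.
        if i in cuts and i > 0:
            out.append(ZWNJ)
        out.append(ch)
    return "".join(out)


shaped_input = insert_zwnj("fifi", [3])
print(shaped_input.encode("unicode_escape").decode())
```

Because the inserted characters shift every later index, an application using this approach has to map the cluster values returned by the shaper back to positions in its original text.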