Re: [HarfBuzz] A problem in thai shaper
On 18/4/12 02:22, Behdad Esfahbod wrote:
> On 04/17/2012 06:47 PM, Khaled Hosny wrote:
>> On Tue, Apr 17, 2012 at 05:10:37AM +0200, Khaled Hosny wrote:
>>> On Mon, Apr 16, 2012 at 09:08:49PM -0400, Behdad Esfahbod wrote:
>>>>> Problem 2:
>>>>>
>>>>> When there is no consonant exist, the dotted circle should be inserted
>>>>> as base character. The logic should be the first step for the shaping
>>>>> engine to find the invalid combing marks. Refer to
>>>>> http://www.microsoft.com/typography/otfntdev/thaiot/shaping.aspx#comb
>>>>
>>>> Right. We do not handle invalid combining marks yet. That's something I
>>>> want to do at some point but it's not high priority.
>>>
>>> I don't know about Thai, but the handling of "invalid" Arabic combining
>>> marks in Uniscribe is completely brain dead and a real PITA and I'd
>>> really like not to see HarfBuzz going there, a shaping engine is not a
>>> spell checker and should not enforce any input pattern.
>>>
>>> http://www.microsoft.com/typography/OpenType%20Dev/arabic/shaping.mspx#invalid
>>
>> Incidentally, I came across this Typophile post, which is one example of
>> why this "invalid" mark handling is not really a good idea:
>> http://typophile.com/node/92130
>
> Interesting. I'm undecided about this as of now.

Just adding my vote in favor of Khaled's position. The shaping engine should
not attempt to enforce rules such as "only one vowel mark on each consonant"
or "nukta cannot apply to vowels" (IIRC, MS may have relented on that one) or
"the vowel mark must precede the tone mark", etc. That's the role of a
(language-specific) spell-checker.

JK
___
HarfBuzz mailing list
HarfBuzz@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/harfbuzz
Re: [HarfBuzz] harfbuzz-ng: Branch 'master' - 6 commits
> commit a5f1834f57ea3fb254f5c7d372747de316fcc8f1
> Author: Behdad Esfahbod
> Date:   Mon Apr 16 15:55:13 2012 -0400
>
>     Apply 'liga' for vertical writing mode too
>
>     Apparently that's what Kazuraki uses to form vertical ligatures,
>     which suggests that it's what Adobe does.
>
> diff --git a/src/hb-ot-shape.cc b/src/hb-ot-shape.cc
> index d21559c..66b1461 100644
> --- a/src/hb-ot-shape.cc
> +++ b/src/hb-ot-shape.cc
> @@ -35,6 +35,7 @@
>
>  hb_tag_t common_features[] = {
>    HB_TAG('c','c','m','p'),
> +  HB_TAG('l','i','g','a'),
>    HB_TAG('l','o','c','l'),
>    HB_TAG('m','a','r','k'),
>    HB_TAG('m','k','m','k'),
> @@ -46,7 +47,6 @@ hb_tag_t horizontal_features[] = {
>    HB_TAG('c','l','i','g'),
>    HB_TAG('c','u','r','s'),
>    HB_TAG('k','e','r','n'),
> -  HB_TAG('l','i','g','a'),
>  };

Just a note here that this will be problematic when rendering upright
Latin text in vertical mode. Frankly, I don't think there's a clear,
consistent design model for OpenType features in the vertical case.
There's a need to distinguish vertical ligatures from horizontal ones
more clearly; having 'f' and 'i' ligate in the upright vertical case
doesn't make sense. Kazuraki relies on the 'vert' feature to
disambiguate horizontal and vertical ligatures, but this won't work for
fonts not designed with the vertical case in mind.

So I think this change will need tweaking in the future, once there's a
clearer definition of the OpenType feature model for vertical text.

Cheers,

John Daggett
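For readers following the diff, here is a minimal, self-contained sketch of what moving 'liga' between the two arrays changes. It is not HarfBuzz code: `Tag`, `make_tag`, `Direction`, `collect_features` and `has_feature` are hypothetical stand-ins, the arrays list only the entries visible in the quoted hunks, and the rule "common features apply in every direction, horizontal features only in horizontal text" is an assumption read off the array names and the commit message.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for hb_tag_t / HB_TAG so the sketch compiles alone.
using Tag = std::uint32_t;

constexpr Tag make_tag(char a, char b, char c, char d) {
    return (static_cast<Tag>(static_cast<std::uint8_t>(a)) << 24) |
           (static_cast<Tag>(static_cast<std::uint8_t>(b)) << 16) |
           (static_cast<Tag>(static_cast<std::uint8_t>(c)) << 8) |
            static_cast<Tag>(static_cast<std::uint8_t>(d));
}

enum class Direction { Horizontal, Vertical };

// State after the quoted commit; only entries visible in the hunks are listed.
const std::vector<Tag> common_features = {
    make_tag('c','c','m','p'), make_tag('l','i','g','a'),
    make_tag('l','o','c','l'), make_tag('m','a','r','k'),
    make_tag('m','k','m','k'),
};
const std::vector<Tag> horizontal_features = {
    make_tag('c','l','i','g'), make_tag('c','u','r','s'),
    make_tag('k','e','r','n'),
};

// Assumed selection rule: common features always, horizontal ones only
// when the text direction is horizontal.
std::vector<Tag> collect_features(Direction dir) {
    std::vector<Tag> out = common_features;
    if (dir == Direction::Horizontal)
        out.insert(out.end(), horizontal_features.begin(),
                   horizontal_features.end());
    return out;
}

bool has_feature(const std::vector<Tag>& features, Tag tag) {
    return std::find(features.begin(), features.end(), tag) != features.end();
}
```

Under these assumptions, `collect_features(Direction::Vertical)` now contains 'liga' but still not 'kern', which is exactly the case John's note is concerned about: upright Latin 'f' + 'i' in a vertical run would ligate.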
Re: [HarfBuzz] A problem in thai shaper
Dear Behdad,

> >> I don't know about Thai, but the handling of "invalid" Arabic combining
> >> marks in Uniscribe is completely brain dead and a real PITA and I'd
> >> really like not to see HarfBuzz going there, a shaping engine is not a
> >> spell checker and should not enforce any input pattern.
> >>
> >> http://www.microsoft.com/typography/OpenType%20Dev/arabic/shaping.mspx#invalid
> >
> > Incidentally, I came across this Typophile post, which is one example of
> > why this "invalid" mark handling is not really a good idea:
> > http://typophile.com/node/92130
>
> Interesting. I'm undecided about this as of now.

I agree that shaping should not be used to constrain what might be valid
sequences. But I think a shaping engine can be used to mark (think dotted
circle) sequences that are structurally invalid. By these I mean sequences
that would not otherwise show any visual difference from a valid sequence.
For example, diacritics in the wrong order (not covered by normalization)
that show no visual difference (e.g. upper diacritic preceding lower when
both have 0 combining order). Such validity will be script specific but not
language specific. The aim here is not to limit spellings but to ensure
matchable sequences.

In addition, a shaping engine is not designed to ensure that the lowest
common denominator font for a script can handle anything thrown at it.

Yours,
Martin
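To make the "think dotted circle" idea concrete, here is a minimal sketch of the simplest case raised in this thread (a Thai combining mark with no consonant to sit on). It illustrates the behaviour under debate, not anything HarfBuzz does according to these messages. The function names are hypothetical, the code point ranges are hard-coded from the Unicode Thai block so the example needs no Unicode library, and the rule "a mark is attached only if it follows a consonant or another attached mark" is a deliberate simplification.

```cpp
#include <vector>

// Thai combining marks: U+0E31, U+0E34..U+0E3A, U+0E47..U+0E4E.
bool is_thai_mark(char32_t c) {
    return c == 0x0E31 || (c >= 0x0E34 && c <= 0x0E3A) ||
           (c >= 0x0E47 && c <= 0x0E4E);
}

// Thai consonants: U+0E01 (KO KAI) .. U+0E2E (HO NOKHUK).
bool is_thai_consonant(char32_t c) {
    return c >= 0x0E01 && c <= 0x0E2E;
}

constexpr char32_t kDottedCircle = 0x25CC;

// Hypothetical helper: insert U+25CC before any run of marks that does not
// follow a consonant, so the orphaned marks get a visible base.
std::vector<char32_t> insert_dotted_circles(const std::vector<char32_t>& in) {
    std::vector<char32_t> out;
    bool attached = false;  // true while the current position has a base
    for (char32_t c : in) {
        if (is_thai_mark(c)) {
            if (!attached) {
                out.push_back(kDottedCircle);
                attached = true;  // following marks stack on the circle
            }
        } else {
            attached = is_thai_consonant(c);
        }
        out.push_back(c);
    }
    return out;
}
```

With this sketch, the input U+0E34 U+0E48 gains a leading U+25CC, while U+0E01 U+0E34 U+0E48 passes through unchanged. Martin's ordering example (upper before lower diacritic) would need a further script-specific check that is not attempted here.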
Re: [HarfBuzz] On hb_shape_plan() and other API for 1.0
On 04/12/2012 07:55 AM, Jonathan Kew wrote:
> On 12/4/12 02:47, Behdad Esfahbod wrote:
>> As a crude test, I profiled the Indic shaping, and am conjecturing that about
>> 10 to 20 percent of the time can be saved pre-planning the shaping process.
>> My testing showed no measurable saving for skipping the sanitizing process.
>>
>> Maybe both can wait (and not block a 1.0 release) since neither one seems to
>> be hugely effective.
>
> A saving of 10-20% sounds pretty worthwhile to me - and if 1.0 is supposed to
> provide a long-term stable API, then perhaps this should be done sooner rather
> than later.

So, I tested this, and it looks like for short strings we get a 25% or better
improvement. Implementing it correctly takes some refactoring though, so I'll
do that when I get the time.

b

> Otherwise, there'll be a strong temptation to rev the API again shortly after
> 1.0 in order to achieve this performance boost.
>
> JK
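As a rough illustration of why pre-planning helps most on short strings, here is a minimal caching sketch: the per-call setup is done once per (script, direction) pair and reused afterwards. Everything in it is hypothetical. `ShapePlan`, `PlanCache` and the cache key are invented for the example and are not the hb_shape_plan() API under discussion, whose shape is not specified in this thread.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical result of the expensive "planning" step (choosing a shaper,
// compiling feature lookups, and so on). Only the inputs are recorded here.
struct ShapePlan {
    std::string script;
    bool vertical;
};

// Hypothetical cache: build a plan the first time a (script, direction)
// pair is seen and hand back the stored plan on every later request.
class PlanCache {
public:
    const ShapePlan& get(const std::string& script, bool vertical) {
        const auto key = std::make_pair(script, vertical);
        auto it = plans_.find(key);
        if (it == plans_.end()) {
            ++builds_;  // counts how often the expensive step actually ran
            it = plans_.emplace(key, ShapePlan{script, vertical}).first;
        }
        return it->second;
    }

    int builds() const { return builds_; }

private:
    std::map<std::pair<std::string, bool>, ShapePlan> plans_;
    int builds_ = 0;
};
```

Shaping many short Thai runs through one `PlanCache` would trigger a single build, so the fixed planning cost is no longer paid on every call; that is the effect the 25% figure above is attributed to.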
Re: [HarfBuzz] A problem in thai shaper
On 04/17/2012 06:47 PM, Khaled Hosny wrote:
> On Tue, Apr 17, 2012 at 05:10:37AM +0200, Khaled Hosny wrote:
>> On Mon, Apr 16, 2012 at 09:08:49PM -0400, Behdad Esfahbod wrote:
>>>> Problem 2:
>>>>
>>>> When there is no consonant exist, the dotted circle should be inserted
>>>> as base character. The logic should be the first step for the shaping
>>>> engine to find the invalid combing marks. Refer to
>>>> http://www.microsoft.com/typography/otfntdev/thaiot/shaping.aspx#comb
>>>
>>> Right. We do not handle invalid combining marks yet. That's something I
>>> want to do at some point but it's not high priority.
>>
>> I don't know about Thai, but the handling of "invalid" Arabic combining
>> marks in Uniscribe is completely brain dead and a real PITA and I'd
>> really like not to see HarfBuzz going there, a shaping engine is not a
>> spell checker and should not enforce any input pattern.
>>
>> http://www.microsoft.com/typography/OpenType%20Dev/arabic/shaping.mspx#invalid
>
> Incidentally, I came across this Typophile post, which is one example of
> why this "invalid" mark handling is not really a good idea:
> http://typophile.com/node/92130

Interesting. I'm undecided about this as of now.

behdad

> Regards,
> Khaled
Re: [HarfBuzz] A problem in thai shaper
On Tue, Apr 17, 2012 at 05:10:37AM +0200, Khaled Hosny wrote:
> On Mon, Apr 16, 2012 at 09:08:49PM -0400, Behdad Esfahbod wrote:
> > > Problem 2:
> > >
> > > When there is no consonant exist, the dotted circle should be inserted
> > > as base character. The logic should be the first step for the shaping
> > > engine to find the invalid combing marks. Refer to
> > > http://www.microsoft.com/typography/otfntdev/thaiot/shaping.aspx#comb
> >
> > Right. We do not handle invalid combining marks yet. That's something I
> > want to do at some point but it's not high priority.
>
> I don't know about Thai, but the handling of "invalid" Arabic combining
> marks in Uniscribe is completely brain dead and a real PITA and I'd
> really like not to see HarfBuzz going there, a shaping engine is not a
> spell checker and should not enforce any input pattern.
>
> http://www.microsoft.com/typography/OpenType%20Dev/arabic/shaping.mspx#invalid

Incidentally, I came across this Typophile post, which is one example of
why this "invalid" mark handling is not really a good idea:
http://typophile.com/node/92130

Regards,
Khaled
Re: [HarfBuzz] A problem in thai shaper
Your HarfBuzz build probably doesn't have glib, and you are not providing any
Unicode functions, so cluster formation fails. I shall make HB warn boldly if
that happens.

behdad

On 04/17/2012 01:27 PM, datao zhang wrote:
> Hi behdad:
>
> Thanks your comments.
>
> I have recheck the cluster value, but i found these values are still (0,1,2).
> I don't why you can get (0,0,0).
>
> I test it use the following code written by myself:
>
>
> unsigned int uchar[3]
> for(int i = 0; i < 3; i++)
> hb_buffer_add(buffer, uchar[i],1,i);
> hb_buffer_set_direction(mBuffer, HB_DIRECTION_LTR);
> hb_buffer_set_script(mBuffer, HB_SCRIPT_THAI);
> hb_shape(mFont, mBuffer, NULL, 0);
>
>
> After hb_shape(), i see cluster[0] :0 ; cluster[1]: 1; cluster[2]: 2
>
> Do you have any comments? whether i make mistake?
>
> Maybe I use wrong concept, I know the cluster in harfbuzz not used for line
> break, but i think, as same as the indic, the syllable should have the same
> cluster for thai, isn't it?
>
> Br,
> Dean
>
>> Date: Tue, 17 Apr 2012 10:28:15 -0400
>> From: beh...@behdad.org
>> To: dataozh...@hotmail.com
>> Subject: Re: [HarfBuzz] A problem in thai shaper
>>
>> On 04/17/2012 10:26 AM, Behdad Esfahbod wrote:
>> > On 04/17/2012 08:01 AM, datao zhang wrote:
>> >> Hi:
>> >> For Problem 1:
>> >> Example: if I pass the "0x0E01,0x0E34,0x0E48", the intput clusters
>> >> (0,1,2), after shape, the output cluster should be (0,0,0) because the
>> >> syllable can't be broken when line break. But, currently, I find the
>> >> output clusetrs are still (0,1,2).
>> >
>> > First, note that HarfBuzz clusters are not supposed to be used for things
>> > like linebreaking and cursor positioning. So (0,1,2) is totally fine if
>> > there are three separate glyphs representing those characters. And
>> > (0,1,2) is exactly what Uniscribe returns.
>>
>> Err, my bad. Both HarfBuzz and Uniscribe return (0,0,0) for the sequence, so
>> I don't think there's anything to fix here.
>>
>> b
>>
>> > HarfBuzz however returns (0,0,0) for that sequence.
>> > How where you testing? I'm leaning towards trying to match Uniscribe here.
>> > The finer-grained the cluster values are, the better cursor positioning
>> > can be built on top of HarfBuzz.
>> >
>> > behdad
>> >
>> >> Br,
>> >> Dean
>> >>
>> >>> Date: Mon, 16 Apr 2012 21:08:49 -0400
>> >>> From: beh...@behdad.org
>> >>> To: dataozh...@hotmail.com
>> >>> CC: harfbuzz@lists.freedesktop.org
>> >>> Subject: Re: [HarfBuzz] A problem in thai shaper
>> >>>
>> >>> Hi,
>> >>>
>> >>> Thanks for the email. My comments inline.
>> >>>
>> >>> On 04/13/2012 09:41 AM, datao zhang wrote
>> >>>> So I think for the new Thai shaper, the valid composition of
>> >>>> “consonant [1 mandatory]+ diacritic vowel [1 optional] + tone mark
>> >>>> [1 optional] “ should be set as same cluster.
>> >>>
>> >>> I would guess that our generic layer will already take care of this
>> >>> based on canonical combining categories? Do you have a test case that
>> >>> you want to see improved?
>> >>>
>> >>>> Problem 2:
>> >>>>
>> >>>> When there is no consonant exist, the dotted circle should be inserted
>> >>>> as base character. The logic should be the first step for the shaping
>> >>>> engine to find the invalid combing marks. Refer to
>> >>>> http://www.microsoft.com/typography/otfntdev/thaiot/shaping.aspx#comb
>> >>>
>> >>> Right. We do not handle invalid combining marks yet. That's something I
>> >>> want to do at some point but it's not high priority.
>> >>>
>> >>> behdad
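The difference between (0,1,2) and (0,0,0) in this exchange can be pictured with a small stand-alone sketch. It does not call HarfBuzz: `form_clusters` is a hypothetical simplification of cluster formation in which a combining mark takes the cluster value of the character before it, and the "is this a mark?" question is delegated to a callback. Passing a callback that knows nothing (`no_unicode_data`) is meant as an analogy for a build with no Unicode functions, which is only the probable cause suggested above, not a confirmed one.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

using MarkPredicate = std::function<bool(char32_t)>;

// Thai combining marks: U+0E31, U+0E34..U+0E3A, U+0E47..U+0E4E.
bool is_thai_mark(char32_t c) {
    return c == 0x0E31 || (c >= 0x0E34 && c <= 0x0E3A) ||
           (c >= 0x0E47 && c <= 0x0E4E);
}

// Analogy for "no Unicode functions provided": nothing is ever a mark.
bool no_unicode_data(char32_t) { return false; }

// Hypothetical simplification: each character starts with its own index as
// cluster value; a mark is merged into the cluster of the preceding one.
std::vector<unsigned> form_clusters(const std::vector<char32_t>& text,
                                    const MarkPredicate& is_mark) {
    std::vector<unsigned> clusters(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        const bool joins_previous = i > 0 && is_mark(text[i]);
        clusters[i] = joins_previous ? clusters[i - 1]
                                     : static_cast<unsigned>(i);
    }
    return clusters;
}
```

For the sequence from the report, U+0E01 U+0E34 U+0E48, `form_clusters(text, is_thai_mark)` gives (0,0,0), while `form_clusters(text, no_unicode_data)` leaves the input values (0,1,2) untouched, matching the two results seen in the thread.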
[HarfBuzz] harfbuzz-ng: Branch 'master'
 src/hb-graphite2.cc |    3 +++
 1 file changed, 3 insertions(+)

New commits:
commit 3cde23664fbbe9cd2ac1b8fd5eb2ea288309cc9c
Author: Behdad Esfahbod
Date:   Tue Apr 17 11:44:49 2012 -0400

    Minor note re Graphite

diff --git a/src/hb-graphite2.cc b/src/hb-graphite2.cc
index cdf55f1..fa07ae9 100644
--- a/src/hb-graphite2.cc
+++ b/src/hb-graphite2.cc
@@ -221,6 +221,9 @@ _hb_graphite_shape (hb_font_t *font,

   buffer->guess_properties ();

+  /* XXX We do a hell of a lot of stuff just to figure out this font
+   * is not graphite! Shouldn't do. */
+
   hb_gr_font_data_t *data = _hb_gr_font_get_data (font);
   if (!data->grface) return FALSE;
[HarfBuzz] harfbuzz-ng: Branch 'master' - 2 commits
 src/hb-graphite2.cc                                  |    1 +
 test/shaping/texts/in-tree/shaper-thai/misc/misc.txt |    1 +
 2 files changed, 2 insertions(+)

New commits:
commit 4dc2449d92308f8dd366142831c0b85bd30ea5a9
Author: Behdad Esfahbod
Date:   Tue Apr 17 11:39:48 2012 -0400

    Fix leak in graphite

diff --git a/src/hb-graphite2.cc b/src/hb-graphite2.cc
index 64f22f7..cdf55f1 100644
--- a/src/hb-graphite2.cc
+++ b/src/hb-graphite2.cc
@@ -130,6 +130,7 @@ static void _hb_gr_font_data_destroy (void *data)
   hb_gr_font_data_t *f = (hb_gr_font_data_t *) data;

   gr_font_destroy (f->grfont);
+  free (f);
 }

 static hb_user_data_key_t hb_gr_data_key;

commit 0290bbf8611aa881daed907f22256a431250c90a
Author: Behdad Esfahbod
Date:   Tue Apr 17 10:28:21 2012 -0400

    Add another Thai test

diff --git a/test/shaping/texts/in-tree/shaper-thai/misc/misc.txt b/test/shaping/texts/in-tree/shaper-thai/misc/misc.txt
index fc2dba9..51a47af 100644
--- a/test/shaping/texts/in-tree/shaper-thai/misc/misc.txt
+++ b/test/shaping/texts/in-tree/shaper-thai/misc/misc.txt
@@ -3,3 +3,4 @@
 à¸à¹à¹à¸²
 à¸à¸³
 ำ
+กิ่