[Harfbuzz-indic] Lemongrass HarfBuzz Hackfest, end of day 1
Hi everyone,

I just pushed the commits from the first day of hacking. There were meetings and other things going on, so it was a fairly short day, but we got a solid five hours of hacking on Khmer. We used the Daun Penh font shipping with Windows 7 for testing.

At the beginning of the day, 85% of the tests from our Wikipedia test data were failing. I.e., everything was broken for Khmer. That was no surprise, since major parts of the Indic shaper did not recognize Khmer at all. Five hours and 20 commits later, the failures are now under 3%.

In particular, we fixed a GDEF/GSUB issue (which has implications for other scripts too), made the shaper recognize Khmer Ro, reorder pre-base reordering characters, and recognize Khmer register shifters and similar signs, and made other tweaks here and there.

There's more to be done tomorrow: change the syllable machine to recognize post-matra subjoined consonants, etc. Then we will attack Bengali again, addressing Ra Phala / Ya Phala, which seem to constitute the majority of failures right now. Then we'll take another look at Malayalam (which already improved substantially as a result of implementing pre-base Ra reordering).

More updates tomorrow. In the meantime, the GSUB fix may have addressed the issue Khaled recently reported. Khaled, it would be nice if you could check that. It would also be nice if others could give Khmer a try with other fonts and report back.

Cheers,
behdad

On 07/13/2012 02:34 PM, Behdad Esfahbod wrote:
> Hi,
>
> Just a heads-up that Jonathan and I will be hacking on HarfBuzz all week next week in the Toronto Mozilla office. My plan / goal for the week is to further streamline the Indic scripts: finishing Bengali, Tamil, etc., then moving to Malayalam (pre-base reordering Ra, etc.), then moving on to implementing the Khmer coeng model and other Khmer-specific features, which should also resolve Tai Tham among others.
>
> If there's something specific that you want to see fixed, now is the time to raise it.
>
> We will be on IRC on #harfbuzz on freenode, but I wouldn't say we'll be hugely responsive.
>
> Cheers,
> behdad

___
HarfBuzz-Indic mailing list
HarfBuzz-Indic@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/harfbuzz-indic
Re: [Harfbuzz-indic] Lemongrass HarfBuzz Hackfest, end of day 1
On 07/16/2012 10:38 PM, Behdad Esfahbod wrote:
> On 07/16/2012 10:28 PM, Behdad Esfahbod wrote:
>> More updates tomorrow. In the meantime, the GSUB fix may have addressed the issue Khaled recently reported. Khaled, it would be nice if you could check that.
>
> Ok, this is NOT fixed. I'm looking into it.
>
> b

Fixed now.

b
[HarfBuzz] Error when shaping Telugu text U+0C15,U+0C4D,U+0C30
Hi,

When I test U+0C15, U+0C4D, U+0C30 for Telugu script, the result differs between the Windows shaping engine and HarfBuzz. Please refer to the attachments. I think the Win7 picture shows the expected result.

Br,
Dean

attachment: Win7.png
attachment: Harfbuzz.png

___
HarfBuzz mailing list
HarfBuzz@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/harfbuzz
Re: [HarfBuzz] HEH GOAL WITH HAMZA ABOVE(0x06C2) error in arabic
Hi Khaled:

Thanks for your comments. It is difficult to make a choice, but we cannot add any glyphs to any fonts ourselves. If the font designers do not provide the medial glyph for U+06C2, we cannot do anything about it. If we follow the Unicode standard and treat the character as dual-joining, we just have to accept that the rendering fails, and that's it. So I have decided to deviate from Unicode and do as Microsoft does, and as a vast number of font vendors do along with Microsoft: treat U+06C2 as single-joining, and everybody is (seemingly) happy.

Br,
Dean

Date: Thu, 12 Jul 2012 17:17:13 +0200
From: khaledho...@eglug.org
To: dataozh...@hotmail.com
CC: jfkth...@googlemail.com; harfbuzz@lists.freedesktop.org
Subject: Re: [HarfBuzz] HEH GOAL WITH HAMZA ABOVE(0x06C2) error in arabic

As Jonathan pointed out, this input makes no sense if you want the medial heh to be in its final form, as it just relies on the peculiarities of certain fonts/rendering engines and is against the Unicode standard. The more reliable Unicode way is to insert a ZWNJ (U+200C) after the medial heh to force its final form, and the isolated form for the last one.

Regards,
Khaled

On Thu, Jul 12, 2012 at 10:38:39PM +0800, datao zhang wrote:
> Hi JK:
>
> Thanks for your reply. But in Windows 7, U+0628, U+06C2, U+06C2 and U+0628, U+06C1, U+0654, U+06C2 are rendered with different glyphs; refer to the attachment. I use the msuighur.ttf font file from Win 7 to display this text, and there is no medial glyph for U+06C2 in the font file. It seems the Win7 shaping engine treats U+06C2 as single-joining. I also checked another, open source font file, DroidSansArabic.ttf, and it has no medial glyph for U+06C2 either. So it seems most fonts do not include a medial glyph for U+06C2. I think it is better to treat U+06C2 as single-joining, the same as the Win7 shaping engine.
>
> Br,
> Dean
>
> Date: Thu, 12 Jul 2012 08:45:31 +0100
> From: jfkth...@googlemail.com
> To: harfbuzz@lists.freedesktop.org
> Subject: Re: [HarfBuzz] HEH GOAL WITH HAMZA ABOVE(0x06C2) error in arabic
>
> On 12/7/12 04:12, datao zhang wrote:
>> Hi Behdad:
>>
>> When I try to draw the text U+0628, U+06C2, U+06C2, U+0020 with HarfBuzz, I find it is rendered differently from Windows 7. Please see the attached file. After checking hb-ot-shape-complex-arabic-table.hh, I found that the U+06C2 joining type is JOINING_TYPE_D (double joining). I don't know why Unicode specifies it as double-joining; it really does not make any sense. At least when rendering the text, it should be treated as single-joining.
>
> U+06C2 is classified as double-joining because it has a canonical decomposition to U+06C1, U+0654, and U+06C1 is double-joining. If U+06C2 were right-joining, we'd have a situation where equivalent text sequences would have different joining behaviors, so that a canonical normalization process might unexpectedly alter the rendering.
>
> While I agree that the various HEH-related characters in Unicode are a confusing mess, the various compatibility and stability requirements, as well as the different usage patterns needed for various languages and regions, make it difficult to see how we could fix them.
>
> The broken-looking result you're getting from HarfBuzz happens because the font used does not have a 'medi' glyph for the U+06C2 character. This is a font bug.
>
> To reliably produce the Windows 7 result shown in your attachment, I believe the correct text sequence would be U+0628, U+06C2, U+200C, U+06C2.
>
> JK
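The joining behaviour argued about in this thread can be illustrated with a small model. The following is a minimal sketch, not HarfBuzz's Arabic shaper: the `Joining` enum, `joining_type` and `positional_forms` are hypothetical names, the table covers only U+0628 and U+06C2, and transparent marks such as U+0654 are not modelled. It shows why the ZWNJ sequence and the "treat U+06C2 as single-joining" approach select the same positional forms for the two HEH letters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified joining classes; transparent marks are not modelled.
enum class Joining { None, Right, Dual };

// Hypothetical mini-table covering only the characters from this thread.
// `heh_goal_hamza` selects how U+06C2 is classified, so that both
// behaviours discussed above can be compared.
static Joining joining_type (char32_t cp, Joining heh_goal_hamza)
{
  switch (cp) {
    case 0x0628: return Joining::Dual;    // BEH
    case 0x06C2: return heh_goal_hamza;   // HEH GOAL WITH HAMZA ABOVE
    default:     return Joining::None;    // space, ZWNJ, everything else
  }
}

// Returns one of "isol", "init", "medi", "fina" per character, in logical order.
std::vector<std::string>
positional_forms (const std::u32string &text, Joining heh_goal_hamza)
{
  std::vector<std::string> forms;
  for (std::size_t i = 0; i < text.size (); i++) {
    Joining cur = joining_type (text[i], heh_goal_hamza);
    // A letter connects backwards if it joins at all and the previous
    // letter is dual-joining (only dual-joining letters connect forwards).
    bool joins_prev = i > 0 && cur != Joining::None &&
                      joining_type (text[i - 1], heh_goal_hamza) == Joining::Dual;
    // A letter connects forwards if it is dual-joining and the next
    // letter joins at all.
    bool joins_next = i + 1 < text.size () && cur == Joining::Dual &&
                      joining_type (text[i + 1], heh_goal_hamza) != Joining::None;
    forms.push_back (joins_prev && joins_next ? "medi" :
                     joins_prev ? "fina" :
                     joins_next ? "init" : "isol");
  }
  assert (forms.size () == text.size ());
  return forms;
}
```

With U+06C2 classified as dual-joining, the middle letter of U+0628, U+06C2, U+06C2 asks for a 'medi' glyph, which the fonts mentioned above do not provide. Inserting U+200C, or classifying U+06C2 as right-joining, yields 'fina' followed by 'isol' for the two HEH letters.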
Re: [HarfBuzz] MarkAttachmentType lookup flag has no effect
Humm. Can you send screenshots? I get the exact same output from both fonts, using both the OT and Uniscribe shapers.

behdad

On 07/15/2012 04:01 AM, Khaled Hosny wrote:
> It seems that the (older) MarkAttachmentType lookup flag is not applied by HarfBuzz, while the (newer) UseMarkFilteringSet flag works fine. Here is the same font, once using MarkAttachmentType and once using UseMarkFilteringSet, with the same mark glyph class in both cases (and it is the only one used). If the flag is applied, the final dots in the words تختة, تخنة, تخئة, تخثة and تخٹة should be raised whether vowel marks are used or not.
>
> http://khaledhosny.org/files/tmp/hussaini-nastaleeq_MarkAttachmentType.ttf
> http://khaledhosny.org/files/tmp/hussaini-nastaleeq_UseMarkFilteringSet.ttf
>
> Regards,
> Khaled
Re: [HarfBuzz] MarkAttachmentType lookup flag has no effect
Ah, sorry. My bad. I thought the sequences you listed were supposed to reproduce the problem by themselves. I'll test with marks added.

On the positive side, Jonathan and I found a GDEF/GSUB issue and fixed it today, which I'm fairly sure was causing this. Commits (and the day 1 report) coming soon.

behdad

On 07/16/2012 01:37 PM, Khaled Hosny wrote:
> Attached are two screenshots: the first line with no vowel marks used and the second line with vowel marks used (the placement of the vowel marks themselves is off, but that is because the glyphs lack proper anchors).
>
> Regards,
> Khaled
>
> On Mon, Jul 16, 2012 at 11:13:01AM -0400, Behdad Esfahbod wrote:
>> Humm. Can you send screenshots? I get the exact same output from both fonts, using both the OT and Uniscribe shapers.
>>
>> behdad
>>
>> On 07/15/2012 04:01 AM, Khaled Hosny wrote:
>>> It seems that the (older) MarkAttachmentType lookup flag is not applied by HarfBuzz, while the (newer) UseMarkFilteringSet flag works fine. Here is the same font, once using MarkAttachmentType and once using UseMarkFilteringSet, with the same mark glyph class in both cases (and it is the only one used). If the flag is applied, the final dots in the words تختة, تخنة, تخئة, تخثة and تخٹة should be raised whether vowel marks are used or not.
>>>
>>> http://khaledhosny.org/files/tmp/hussaini-nastaleeq_MarkAttachmentType.ttf
>>> http://khaledhosny.org/files/tmp/hussaini-nastaleeq_UseMarkFilteringSet.ttf
>>>
>>> Regards,
>>> Khaled
[HarfBuzz] harfbuzz-ng: Branch 'master' - 22 commits
 src/hb-ot-layout-gsub-table.hh           |    2
 src/hb-ot-layout-gsubgpos-private.hh     |   20 ++-
 src/hb-ot-shape-complex-arabic.cc        |    6
 src/hb-ot-shape-complex-indic-machine.rl |    3
 src/hb-ot-shape-complex-indic-private.hh |   13 +-
 src/hb-ot-shape-complex-indic.cc         |  199 ---
 src/hb-ot-shape-complex-misc.cc          |   18 ++
 src/hb-ot-shape-complex-private.hh       |   30
 src/hb-ot-shape.cc                       |    2
 src/hb-unicode.cc                        |   22 +++
 10 files changed, 236 insertions(+), 79 deletions(-)

New commits:
commit af92b4cc90e4184d5bdd8037c551ed482700114f
Author: Behdad Esfahbod beh...@behdad.org
Date:   Mon Jul 16 20:31:24 2012 -0400

    [Indic] Disable 'kern' in Uniscribe bug compatibility mode

    Uniscribe does not apply 'kern' in the Indic module. Some of the Khmer
    fonts they ship have small adjustments in the 'kern' table. Disable
    'kern' in the Indic module under Uniscribe bug compatibility mode.

    Fixes some 10% of the Khmer failures. Remains under 3% (excluding
    dotted-circle ones).

diff --git a/src/hb-ot-shape-complex-indic.cc b/src/hb-ot-shape-complex-indic.cc
index f8be98e..19bb75c 100644
--- a/src/hb-ot-shape-complex-indic.cc
+++ b/src/hb-ot-shape-complex-indic.cc
@@ -221,6 +221,9 @@ void
 _hb_ot_shape_complex_override_features_indic (hb_ot_map_builder_t *map,
                                               const hb_segment_properties_t *props HB_UNUSED)
 {
+  /* Uniscribe does not apply 'kern'. */
+  if (indic_options ().uniscribe_bug_compatible)
+    map->add_feature (HB_TAG('k','e','r','n'), 0, true);
 }


commit d96838ef951ce6170eb2dc576ebcba2262cf7008
Author: Behdad Esfahbod beh...@behdad.org
Date:   Mon Jul 16 20:26:57 2012 -0400

    Allow complex shapers overriding common features

    In a new callback... Currently unused by all complex shapers.

diff --git a/src/hb-ot-shape-complex-arabic.cc b/src/hb-ot-shape-complex-arabic.cc
index 54460f0..75f5fe9 100644
--- a/src/hb-ot-shape-complex-arabic.cc
+++ b/src/hb-ot-shape-complex-arabic.cc
@@ -199,6 +199,12 @@ _hb_ot_shape_complex_collect_features_arabic (hb_ot_map_builder_t *map,
   map->add_bool_feature (HB_TAG('c','s','w','h'));
 }

+void
+_hb_ot_shape_complex_override_features_arabic (hb_ot_map_builder_t *map,
+                                               const hb_segment_properties_t *props)
+{
+}
+
 hb_ot_shape_normalization_mode_t
 _hb_ot_shape_complex_normalization_preference_arabic (void)
 {
diff --git a/src/hb-ot-shape-complex-indic.cc b/src/hb-ot-shape-complex-indic.cc
index d9087d6..f8be98e 100644
--- a/src/hb-ot-shape-complex-indic.cc
+++ b/src/hb-ot-shape-complex-indic.cc
@@ -217,6 +217,12 @@ _hb_ot_shape_complex_collect_features_indic (hb_ot_map_builder_t *map,
   }
 }

+void
+_hb_ot_shape_complex_override_features_indic (hb_ot_map_builder_t *map,
+                                              const hb_segment_properties_t *props HB_UNUSED)
+{
+}
+

 hb_ot_shape_normalization_mode_t
 _hb_ot_shape_complex_normalization_preference_indic (void)
diff --git a/src/hb-ot-shape-complex-misc.cc b/src/hb-ot-shape-complex-misc.cc
index 52fbd6d..3cea734 100644
--- a/src/hb-ot-shape-complex-misc.cc
+++ b/src/hb-ot-shape-complex-misc.cc
@@ -42,6 +42,12 @@ _hb_ot_shape_complex_collect_features_default (hb_ot_map_builder_t *map HB_UNUSE
 {
 }

+void
+_hb_ot_shape_complex_override_features_default (hb_ot_map_builder_t *map HB_UNUSED,
+                                                const hb_segment_properties_t *props HB_UNUSED)
+{
+}
+
 hb_ot_shape_normalization_mode_t
 _hb_ot_shape_complex_normalization_preference_default (void)
 {
@@ -74,6 +80,12 @@ _hb_ot_shape_complex_collect_features_hangul (hb_ot_map_builder_t *map,
     map->add_bool_feature (hangul_features[i]);
 }

+void
+_hb_ot_shape_complex_override_features_hangul (hb_ot_map_builder_t *map,
+                                               const hb_segment_properties_t *props HB_UNUSED)
+{
+}
+
 hb_ot_shape_normalization_mode_t
 _hb_ot_shape_complex_normalization_preference_hangul (void)
 {
@@ -97,6 +109,12 @@ _hb_ot_shape_complex_collect_features_thai (hb_ot_map_builder_t *map HB_UNUSED,
 {
 }

+void
+_hb_ot_shape_complex_override_features_thai (hb_ot_map_builder_t *map HB_UNUSED,
+                                             const hb_segment_properties_t *props HB_UNUSED)
+{
+}
+
 hb_ot_shape_normalization_mode_t
 _hb_ot_shape_complex_normalization_preference_thai (void)
 {
diff --git a/src/hb-ot-shape-complex-private.hh b/src/hb-ot-shape-complex-private.hh
index d2f7959..7f74e34 100644
--- a/src/hb-ot-shape-complex-private.hh
+++ b/src/hb-ot-shape-complex-private.hh
@@ -249,6 +249,36 @@ hb_ot_shape_complex_collect_features (hb_ot_complex_shaper_t shaper,


 /*
+ * override_features()
+ *
+ * Called during shape_plan().
+ *
+ * Shapers should use map to override features and add callbacks after
+ * common features are added.
+ */
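The idea behind the override_features() callback, common features first and then a per-shaper override pass, can be sketched with a toy feature map. This is a minimal illustration under stated assumptions and not the real hb_ot_map_builder_t: `ToyMapBuilder`, `collect_common_features`, `override_features_indic_like` and `plan_features` are hypothetical names, and the rule that a later add_feature() call for the same tag replaces the earlier value is an assumption of the toy.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using Tag = std::uint32_t;

// Packs four ASCII characters into an OpenType-style tag.
constexpr Tag make_tag (char a, char b, char c, char d)
{
  return (Tag (std::uint8_t (a)) << 24) | (Tag (std::uint8_t (b)) << 16) |
         (Tag (std::uint8_t (c)) << 8)  |  Tag (std::uint8_t (d));
}

constexpr Tag KERN = make_tag ('k','e','r','n');
constexpr Tag LIGA = make_tag ('l','i','g','a');

// Hypothetical stand-in for a feature map builder.
struct ToyMapBuilder
{
  std::map<Tag, std::uint32_t> values;

  // Assumption of this toy: a later call for the same tag wins.
  void add_feature (Tag tag, std::uint32_t value) { values[tag] = value; }
};

// Features every shaper gets by default (illustrative selection).
void collect_common_features (ToyMapBuilder &map)
{
  map.add_feature (KERN, 1);
  map.add_feature (LIGA, 1);
}

// Runs after the common features were added, so it can switch them off.
void override_features_indic_like (ToyMapBuilder &map, bool uniscribe_bug_compatible)
{
  if (uniscribe_bug_compatible)
    map.add_feature (KERN, 0);
}

ToyMapBuilder plan_features (bool uniscribe_bug_compatible)
{
  ToyMapBuilder map;
  collect_common_features (map);
  override_features_indic_like (map, uniscribe_bug_compatible);
  assert (map.values.count (KERN) == 1);
  return map;
}
```

Calling `plan_features (true)` leaves 'kern' with value 0 while 'liga' stays enabled. Running the override pass last is what lets a shaper veto a default without the common code knowing about Indic specifics.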
Re: [HarfBuzz] Lemongrass HarfBuzz Hackfest, end of day 1
On 07/16/2012 10:28 PM, Behdad Esfahbod wrote:
> More updates tomorrow. In the meantime, the GSUB fix may have addressed the issue Khaled recently reported. Khaled, it would be nice if you could check that.

Ok, this is NOT fixed. I'm looking into it.

b
[HarfBuzz] harfbuzz-ng: Branch 'master' - 3 commits
 src/hb-ot-layout.cc                                                                      |    4 ++--
 test/shaping/texts/in-tree/shaper-arabic/script-arabic/misc/diacritics/MANIFEST          |    1 +
 test/shaping/texts/in-tree/shaper-arabic/script-arabic/misc/diacritics/mark-skipping.txt |   10 ++
 3 files changed, 13 insertions(+), 2 deletions(-)

New commits:
commit 559f70667891a3ceeffb36f40de38a4f85868945
Author: Behdad Esfahbod beh...@behdad.org
Date:   Mon Jul 16 22:43:17 2012 -0400

    Fix MarkAttachmentType matching

    Fixes issue reported by Khaled Hosny with his Hussaini Nastaleeq font
    and sequences like those added in the previous commit.

diff --git a/src/hb-ot-layout.cc b/src/hb-ot-layout.cc
index 7b48fa6..10811d0 100644
--- a/src/hb-ot-layout.cc
+++ b/src/hb-ot-layout.cc
@@ -123,7 +123,7 @@ _hb_ot_layout_match_properties_mark (hb_face_t *face,
    * ignore marks of attachment type different than
    * the attachment type specified.
    */
-  if (lookup_props & LookupFlag::MarkAttachmentType && glyph_props & LookupFlag::MarkAttachmentType)
+  if (lookup_props & LookupFlag::MarkAttachmentType)
     return (lookup_props & LookupFlag::MarkAttachmentType) == (glyph_props & LookupFlag::MarkAttachmentType);

   return true;

commit 6de103547e4a7fb34c833861713ea373cd912261
Author: Behdad Esfahbod beh...@behdad.org
Date:   Mon Jul 16 22:46:06 2012 -0400

    [test/arabic] Add Arabic tests for mark skipping

    Expose a bug with Khaled's Hussaini Nastaleeq font.

diff --git a/test/shaping/texts/in-tree/shaper-arabic/script-arabic/misc/diacritics/MANIFEST b/test/shaping/texts/in-tree/shaper-arabic/script-arabic/misc/diacritics/MANIFEST
index df0e4b5..242b2a1 100644
--- a/test/shaping/texts/in-tree/shaper-arabic/script-arabic/misc/diacritics/MANIFEST
+++ b/test/shaping/texts/in-tree/shaper-arabic/script-arabic/misc/diacritics/MANIFEST
@@ -3,3 +3,4 @@ language-arabic.txt
 language-persian.txt
 language-urdu.txt
 ligature-diacritics.txt
+mark-skipping.txt
diff --git a/test/shaping/texts/in-tree/shaper-arabic/script-arabic/misc/diacritics/mark-skipping.txt b/test/shaping/texts/in-tree/shaper-arabic/script-arabic/misc/diacritics/mark-skipping.txt
new file mode 100644
index 000..038c921
--- /dev/null
+++ b/test/shaping/texts/in-tree/shaper-arabic/script-arabic/misc/diacritics/mark-skipping.txt
@@ -0,0 +1,10 @@
+تختة
+تخنة
+تخئة
+تخثة
+تخٹة
+تختÙØ©
+تخÙÙØ©
+تخئÙØ©
+تخثÙØ©
+تخٹÙØ©

commit ad4494759fa8bfd2497800c24fa414075ed1aa61
Author: Behdad Esfahbod beh...@behdad.org
Date:   Mon Jul 16 22:40:21 2012 -0400

    Minor

diff --git a/src/hb-ot-layout.cc b/src/hb-ot-layout.cc
index 7a613b2..7b48fa6 100644
--- a/src/hb-ot-layout.cc
+++ b/src/hb-ot-layout.cc
@@ -173,7 +173,7 @@ _hb_ot_layout_skip_mark (hb_face_t*face,
   if (property_out)
     *property_out = property;

-  /* If it's a mark, skip it we don't accept it. */
+  /* If it's a mark, skip it if we don't accept it. */
   if (unlikely (property & HB_OT_LAYOUT_GLYPH_CLASS_MARK))
     return !_hb_ot_layout_match_properties (face, ginfo->codepoint, property, lookup_props);

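The effect of the MarkAttachmentType fix can be reproduced in isolation. The sketch below is a simplified stand-alone model, not the HarfBuzz source: the function names and the `with_mark_attachment_class` helper are hypothetical, and it assumes, as the diff suggests, that the glyph properties carry the mark attachment class in the same high byte (mask 0xFF00) that the OpenType LookupFlag uses.

```cpp
#include <cassert>
#include <cstdint>

// OpenType LookupFlag: the high byte holds the MarkAttachmentType.
constexpr std::uint16_t kMarkAttachmentTypeMask = 0xFF00;

// Hypothetical helper: stores a mark attachment class in the high byte.
std::uint16_t with_mark_attachment_class (std::uint16_t props, unsigned klass)
{
  assert (klass <= 0xFF);
  return std::uint16_t ((props & 0x00FF) | (klass << 8));
}

// Behaviour before the fix: a mark with attachment class 0 was accepted
// even when the lookup asked for a specific class.
bool match_mark_before_fix (std::uint16_t lookup_props, std::uint16_t glyph_props)
{
  if ((lookup_props & kMarkAttachmentTypeMask) && (glyph_props & kMarkAttachmentTypeMask))
    return (lookup_props & kMarkAttachmentTypeMask) == (glyph_props & kMarkAttachmentTypeMask);
  return true;
}

// Behaviour after the fix: if the lookup names a class, only marks of
// exactly that class are accepted; all other marks are skipped.
bool match_mark_after_fix (std::uint16_t lookup_props, std::uint16_t glyph_props)
{
  if (lookup_props & kMarkAttachmentTypeMask)
    return (lookup_props & kMarkAttachmentTypeMask) == (glyph_props & kMarkAttachmentTypeMask);
  return true;
}
```

A lookup requesting class 1 and a mark with no attachment class is the case that changes: the old check accepts the mark, so it is not skipped, while the fixed check rejects it. That is consistent with the vowel-mark sequences in mark-skipping.txt exposing the bug.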
Re: [HarfBuzz] Lemongrass HarfBuzz Hackfest, end of day 1
On Mon, Jul 16, 2012 at 10:47:15PM -0400, Behdad Esfahbod wrote:
> On 07/16/2012 10:38 PM, Behdad Esfahbod wrote:
>> On 07/16/2012 10:28 PM, Behdad Esfahbod wrote:
>>> More updates tomorrow. In the meantime, the GSUB fix may have addressed the issue Khaled recently reported. Khaled, it would be nice if you could check that.
>>
>> Ok, this is NOT fixed. I'm looking into it.
>
> Fixed now.

Great! I hope you will look into the 'Multiple substitution and mark positioning' issue we discussed last month.

Regards,
Khaled