Re: [HarfBuzz] A problem in thai shaper

2012-04-16 Thread Khaled Hosny
On Mon, Apr 16, 2012 at 09:08:49PM -0400, Behdad Esfahbod wrote:
> > Problem 2:
> > 
> > When no consonant is present, the dotted circle should be inserted as the
> > base character.  Finding the invalid combining marks should be the first
> > step of the shaping engine. Refer to
> > http://www.microsoft.com/typography/otfntdev/thaiot/shaping.aspx#comb
> 
> Right.  We do not handle invalid combining marks yet.  That's something I want
> to do at some point but it's not high priority.

I don't know about Thai, but the handling of "invalid" Arabic combining
marks in Uniscribe is completely brain dead and a real PITA, and I'd
really like not to see HarfBuzz go there. A shaping engine is not a
spell checker and should not enforce any input pattern.

http://www.microsoft.com/typography/OpenType%20Dev/arabic/shaping.mspx#invalid

Regards,
 Khaled
___
HarfBuzz mailing list
HarfBuzz@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/harfbuzz


Re: [HarfBuzz] A problem in thai shaper

2012-04-16 Thread Behdad Esfahbod
Hi,

Thanks for the email. My comments inline.

On 04/13/2012 09:41 AM, datao zhang wrote:
> So I think for the new Thai shaper, a valid composition of “consonant [1
> mandatory] + diacritic vowel [1 optional] + tone mark [1 optional]” should be
> placed in the same cluster.

I would guess that our generic layer will already take care of this based on
canonical combining categories?  Do you have a test case that you want to see
improved?


> Problem 2:
> 
> When no consonant is present, the dotted circle should be inserted as the
> base character.  Finding the invalid combining marks should be the first
> step of the shaping engine. Refer to
> http://www.microsoft.com/typography/otfntdev/thaiot/shaping.aspx#comb

Right.  We do not handle invalid combining marks yet.  That's something I want
to do at some point but it's not high priority.
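
For reference, the behaviour requested above could be sketched roughly as
follows. This is a minimal illustration only, not HarfBuzz code and not the
full Microsoft algorithm: the function names are hypothetical, the Thai ranges
are a simplification (consonants U+0E01..U+0E2E, nonspacing marks U+0E31,
U+0E34..U+0E3A, U+0E47..U+0E4E), and a mark is treated as "invalid" only when
no consonant precedes it in the current run.

```cpp
#include <vector>

// Hypothetical helpers; the ranges are a simplification for illustration.
static bool is_thai_consonant(char32_t u) { return u >= 0x0E01 && u <= 0x0E2E; }
static bool is_thai_mark(char32_t u) {
  return u == 0x0E31 || (u >= 0x0E34 && u <= 0x0E3A) ||
         (u >= 0x0E47 && u <= 0x0E4E);
}

// Insert U+25CC DOTTED CIRCLE before any run of Thai marks that has no
// consonant to attach to.  Valid sequences pass through unchanged.
std::vector<char32_t> insert_dotted_circles(const std::vector<char32_t>& in) {
  std::vector<char32_t> out;
  bool have_base = false;  // true while marks may attach to something
  for (char32_t u : in) {
    if (is_thai_mark(u)) {
      if (!have_base) {
        out.push_back(0x25CC);  // supply a placeholder base
        have_base = true;       // later marks in this run share it
      }
    } else {
      have_base = is_thai_consonant(u);
    }
    out.push_back(u);
  }
  return out;
}
```

With this sketch a lone U+0E34 becomes U+25CC U+0E34, while U+0E01 U+0E34
U+0E48 is left as is.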

behdad


Re: [HarfBuzz] Problem in complex indic

2012-04-16 Thread Behdad Esfahbod
Thanks Dean.  Fixed.

behdad

On 04/15/2012 08:03 AM, datao zhang wrote:
> Hi:
>  
> A problem with finding the vowel syllable:
>  
> If the Indic shaper of HarfBuzz follows the Microsoft OT specification, then
> the following rule in “hb-ot-shape-complex-indic-machine.rl” should be
> changed:
>  
> vowel_syllable =  (Ra H)? V N? (z.H.c | ZWJ.c)? matra_group* syllable_tail %(found_vowel_syllable);
>  
> =>
>  
> vowel_syllable =  (Ra H)? V N? (z?.H.c | ZWJ.c)? matra_group* syllable_tail %(found_vowel_syllable);
>  
> Please refer to
> http://www.microsoft.com/typography/otfntdev/devanot/shaping.aspx
>  
> Br,
> Dean
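
The effect of the one-character change can be illustrated with an ordinary
regular expression. This is only a sketch under stated assumptions, not the
Ragel machine itself: each category is encoded as one hypothetical letter
(R = Ra, H = halant, V = vowel, N = nukta, c = consonant, j = ZWJ, n = ZWNJ),
z is assumed to mean "ZWJ or ZWNJ", and matra_group* and syllable_tail are
left out for brevity.

```cpp
#include <regex>
#include <string>

// Returns true if the whole category string matches the (simplified)
// vowel_syllable rule.  `patched` selects the rule with the optional joiner.
bool matches_vowel_syllable(const std::string& cats, bool patched) {
  // Old rule: (Ra H)? V N? (z.H.c  | ZWJ.c)?
  static const std::regex old_rule("(RH)?VN?([jn]Hc|jc)?");
  // New rule: (Ra H)? V N? (z?.H.c | ZWJ.c)?
  static const std::regex new_rule("(RH)?VN?([jn]?Hc|jc)?");
  return std::regex_match(cats, patched ? new_rule : old_rule);
}
```

Under these assumptions, a vowel followed directly by halant and consonant
("VHc") is rejected by the old rule and accepted by the new one, while the
joiner form ("VjHc") is accepted by both.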


[HarfBuzz] harfbuzz-ng: Branch 'master' - 2 commits

2012-04-16 Thread Behdad Esfahbod
 src/hb-ot-shape-complex-indic-machine.rl |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

New commits:
commit 9ceca3aeb14cc096f5f87660cf7351bc35073084
Author: Behdad Esfahbod 
Date:   Mon Apr 16 21:05:51 2012 -0400

Fix ragel regexp in vowel-based syllable

As reported by datao zhang on the mailing list.

diff --git a/src/hb-ot-shape-complex-indic-machine.rl b/src/hb-ot-shape-complex-indic-machine.rl
index 417880b..6406c24 100644
--- a/src/hb-ot-shape-complex-indic-machine.rl
+++ b/src/hb-ot-shape-complex-indic-machine.rl
@@ -67,7 +67,7 @@ action found_non_indic { found_non_indic (map, buffer, mask_array, last, p); }
 action next_syllable { buffer->merge_clusters (last, p); last = p; }
 
 consonant_syllable =   (c.N? (H.z?|z.H))* c.N? A? (H.z? | matra_group*)? syllable_tail %(found_consonant_syllable);
-vowel_syllable =   (Ra H)? V N? (z.H.c | ZWJ.c)? matra_group* syllable_tail %(found_vowel_syllable);
+vowel_syllable =   (Ra H)? V N? (z?.H.c | ZWJ.c)? matra_group* syllable_tail %(found_vowel_syllable);
 standalone_cluster =   (Ra H)? NBSP N? (z? H c)? matra_group* syllable_tail %(found_standalone_cluster);
 non_indic = X %(found_non_indic);
 
commit b870afcd1b436614af95db6dc297e54c8f03f0cd
Author: Behdad Esfahbod 
Date:   Mon Apr 16 21:05:11 2012 -0400

Rewrite ragel expression to better match the one on MS spec

https://www.microsoft.com/typography/otfntdev/devanot/shaping.aspx

diff --git a/src/hb-ot-shape-complex-indic-machine.rl b/src/hb-ot-shape-complex-indic-machine.rl
index 7af23c1..417880b 100644
--- a/src/hb-ot-shape-complex-indic-machine.rl
+++ b/src/hb-ot-shape-complex-indic-machine.rl
@@ -66,7 +66,7 @@ action found_non_indic { found_non_indic (map, buffer, mask_array, last, p); }
 
 action next_syllable { buffer->merge_clusters (last, p); last = p; }
 
-consonant_syllable =   (c.N? (z.H|H.z?))* c.N? A? (H.z? | matra_group*)? syllable_tail %(found_consonant_syllable);
+consonant_syllable =   (c.N? (H.z?|z.H))* c.N? A? (H.z? | matra_group*)? syllable_tail %(found_consonant_syllable);
 vowel_syllable =   (Ra H)? V N? (z.H.c | ZWJ.c)? matra_group* syllable_tail %(found_vowel_syllable);
 standalone_cluster =   (Ra H)? NBSP N? (z? H c)? matra_group* syllable_tail %(found_standalone_cluster);
 non_indic = X %(found_non_indic);


[HarfBuzz] harfbuzz-ng: Branch 'master' - 6 commits

2012-04-16 Thread Behdad Esfahbod
 src/hb-ot-shape.cc                                                                     |    2 
 src/hb-private.hh                                                                      |    8 +
 test/shaping/texts/in-tree/shaper-default/MANIFEST                                     |    1 
 test/shaping/texts/in-tree/shaper-default/script-japanese/MANIFEST                     |    1 
 test/shaping/texts/in-tree/shaper-default/script-japanese/misc/MANIFEST                |    2 
 test/shaping/texts/in-tree/shaper-default/script-japanese/misc/kazuraki-liga-lines.txt |    8 +
 test/shaping/texts/in-tree/shaper-default/script-japanese/misc/kazuraki-liga.txt       |   53 ++
 util/hb-shape.cc                                                                       |    8 -
 util/hb-view.hh                                                                        |    2 
 util/helper-cairo.cc                                                                   |   22 +++-
 util/helper-cairo.hh                                                                   |    3 
 util/options.cc                                                                        |   19 ++-
 util/options.hh                                                                        |   26 +++-
 util/view-cairo.cc                                                                     |   15 +-
 util/view-cairo.hh                                                                     |    3 
 15 files changed, 139 insertions(+), 34 deletions(-)

New commits:
commit 95cefdf96efe43a44133aa8a186155cf4e63e2b7
Author: Behdad Esfahbod 
Date:   Mon Apr 16 18:08:20 2012 -0400

Add --utf8-clusters

Also fix cairo cluster generation.
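
As a rough illustration of why the cluster fix needs the flag: when cluster
values count characters, the difference between two cluster values has to be
converted to a byte length before it can be used as a cairo cluster's
num_bytes, whereas with UTF-8 (byte) cluster values the difference already is
a byte length. The sketch below uses plain C++ instead of
g_utf8_offset_to_pointer and assumes well-formed UTF-8; the function names are
hypothetical and not part of the utilities.

```cpp
#include <cstddef>
#include <string>

// Number of bytes occupied by the first n_chars code points of `utf8`.
// Assumes well-formed UTF-8; stops at the end of the string.
std::size_t utf8_chars_to_bytes(const std::string& utf8, std::size_t n_chars) {
  std::size_t i = 0;
  while (n_chars > 0 && i < utf8.size()) {
    ++i;  // consume the lead byte
    // consume continuation bytes (bit pattern 10xxxxxx)
    while (i < utf8.size() &&
           (static_cast<unsigned char>(utf8[i]) & 0xC0) == 0x80)
      ++i;
    --n_chars;
  }
  return i;
}

// Byte length of a cluster that spans `delta` cluster units, starting at the
// beginning of `rest` (the not-yet-consumed part of the line).
std::size_t cluster_span_bytes(const std::string& rest, std::size_t delta,
                               bool utf8_clusters) {
  return utf8_clusters ? delta                       // delta is already bytes
                       : utf8_chars_to_bytes(rest, delta);  // chars -> bytes
}
```

For a line starting with a three-byte Thai letter, a character-based delta of
1 and a byte-based delta of 3 both yield a span of 3 bytes.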

diff --git a/util/hb-shape.cc b/util/hb-shape.cc
index a76a778..b22bc1f 100644
--- a/util/hb-shape.cc
+++ b/util/hb-shape.cc
@@ -36,7 +36,8 @@ struct output_buffer_t : output_options_t, format_options_t
   void init (const font_options_t *font_opts);
   void consume_line (hb_buffer_t  *buffer,
 const char   *text,
-unsigned int  text_len);
+unsigned int  text_len,
+hb_bool_t utf8_clusters);
   void finish (const font_options_t *font_opts);
 
   protected:
@@ -57,11 +58,12 @@ output_buffer_t::init (const font_options_t *font_opts)
 void
 output_buffer_t::consume_line (hb_buffer_t  *buffer,
   const char   *text,
-  unsigned int  text_len)
+  unsigned int  text_len,
+  hb_bool_t utf8_clusters)
 {
   line_no++;
   g_string_set_size (gs, 0);
-  serialize_line (buffer, line_no, text, text_len, font, gs);
+  serialize_line (buffer, line_no, text, text_len, font, utf8_clusters, gs);
   fprintf (fp, "%s", gs->str);
 }
 
diff --git a/util/hb-view.hh b/util/hb-view.hh
index 68a5dd8..66d955b 100644
--- a/util/hb-view.hh
+++ b/util/hb-view.hh
@@ -65,7 +65,7 @@ struct hb_view_t
 buffer))
fail (FALSE, "All shapers failed");
 
-  output.consume_line (buffer, text, text_len);
+  output.consume_line (buffer, text, text_len, shaper.utf8_clusters);
 }
 hb_buffer_destroy (buffer);
 
diff --git a/util/helper-cairo.cc b/util/helper-cairo.cc
index abb8c15..9374d9e 100644
--- a/util/helper-cairo.cc
+++ b/util/helper-cairo.cc
@@ -301,7 +301,8 @@ helper_cairo_line_from_buffer (helper_cairo_line_t *l,
   hb_buffer_t *buffer,
   const char  *text,
   unsigned int text_len,
-  double   scale)
+  double   scale,
+  hb_bool_t    utf8_clusters)
 {
   memset (l, 0, sizeof (*l));
 
@@ -349,27 +350,38 @@ helper_cairo_line_from_buffer (helper_cairo_line_t *l,
 hb_bool_t backward = HB_DIRECTION_IS_BACKWARD (hb_buffer_get_direction (buffer));
 l->cluster_flags = backward ? CAIRO_TEXT_CLUSTER_FLAG_BACKWARD : (cairo_text_cluster_flags_t) 0;
 unsigned int cluster = 0;
+const char *start = l->utf8, *end = start;
 l->clusters[cluster].num_glyphs++;
 if (backward) {
   for (i = l->num_glyphs - 2; i >= 0; i--) {
if (hb_glyph[i].cluster != hb_glyph[i+1].cluster) {
  g_assert (hb_glyph[i].cluster > hb_glyph[i+1].cluster);
- l->clusters[cluster].num_bytes += hb_glyph[i].cluster - hb_glyph[i+1].cluster;
+ if (utf8_clusters)
+   end = start + hb_glyph[i].cluster - hb_glyph[i+1].cluster;
+ else
+   end = g_utf8_offset_to_pointer (start, hb_glyph[i].cluster - hb_glyph[i+1].cluster);
+ l->clusters[cluster].num_bytes = end - start;
+ start = end;
  cluster++;
}
l->clusters[cluster].num_glyphs++;
   }
-  l->clusters[cluster].num_bytes += text_len - hb_glyph[0].clus