Re: [HarfBuzz] A problem in thai shaper

2012-04-17 Thread Jonathan Kew

On 18/4/12 02:22, Behdad Esfahbod wrote:

> On 04/17/2012 06:47 PM, Khaled Hosny wrote:
>> On Tue, Apr 17, 2012 at 05:10:37AM +0200, Khaled Hosny wrote:
>>> On Mon, Apr 16, 2012 at 09:08:49PM -0400, Behdad Esfahbod wrote:
>>>>> Problem 2:
>>>>>
>>>>> When no consonant is present, a dotted circle should be inserted as the
>>>>> base character.  Finding the invalid combining marks should be the first
>>>>> step of the shaping engine.  Refer to
>>>>> http://www.microsoft.com/typography/otfntdev/thaiot/shaping.aspx#comb
>>>>
>>>> Right.  We do not handle invalid combining marks yet.  That's something I want
>>>> to do at some point but it's not high priority.
>>>
>>> I don't know about Thai, but the handling of "invalid" Arabic combining
>>> marks in Uniscribe is completely brain dead and a real PITA and I'd
>>> really like not to see HarfBuzz going there, a shaping engine is not a
>>> spell checker and should not enforce any input pattern.
>>>
>>> http://www.microsoft.com/typography/OpenType%20Dev/arabic/shaping.mspx#invalid
>>
>> Incidentally, I came across this Typophile post, which is one example of
>> why this "invalid" mark handling is not really a good idea:
>> http://typophile.com/node/92130
>
> Interesting.  I'm undecided about this as of now.


Just adding my vote in favor of Khaled's position.

The shaping engine should not attempt to enforce rules such as "only one 
vowel mark on each consonant" or "nukta cannot apply to vowels" (IIRC, 
MS may have relented on that one) or "the vowel mark must precede the 
tone mark", etc. That's the role of a (language-specific) spell-checker.


JK
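
For readers unfamiliar with the behavior being debated: the Thai rule in Problem 2 amounts to inserting U+25CC DOTTED CIRCLE in front of a combining mark that has nothing to attach to. The C sketch below assumes a deliberately simplified model in which only the Thai consonants U+0E01..U+0E2E count as bases; `insert_dotted_circles` is a hypothetical name, not HarfBuzz or Uniscribe code.

```c
#include <stddef.h>
#include <stdint.h>

#define DOTTED_CIRCLE 0x25CCu

/* Thai nonspacing marks: MAI HAN-AKAT, SARA I..PHINTHU,
 * and MAITAIKHU..YAMAKKAN. */
static int is_thai_mark (uint32_t u)
{
  return u == 0x0E31u ||
         (u >= 0x0E34u && u <= 0x0E3Au) ||
         (u >= 0x0E47u && u <= 0x0E4Eu);
}

/* Simplification (assumption): only KO KAI..HO NOKHUK act as bases. */
static int is_thai_consonant (uint32_t u)
{
  return u >= 0x0E01u && u <= 0x0E2Eu;
}

/* Hypothetical helper: copies n codepoints from in to out, inserting a
 * dotted circle before any Thai mark that does not follow a consonant,
 * another mark, or an existing dotted circle.  out must have room for
 * 2 * n codepoints.  Returns the number of codepoints written. */
size_t insert_dotted_circles (const uint32_t *in, size_t n, uint32_t *out)
{
  size_t len = 0;
  for (size_t i = 0; i < n; i++) {
    uint32_t u = in[i];
    if (is_thai_mark (u)) {
      uint32_t prev = len ? out[len - 1] : 0;
      if (!is_thai_consonant (prev) && !is_thai_mark (prev) &&
          prev != DOTTED_CIRCLE)
        out[len++] = DOTTED_CIRCLE;
    }
    out[len++] = u;
  }
  return len;
}
```

With input U+0E34 U+0E48 (SARA I, MAI EK, no consonant) the output is U+25CC U+0E34 U+0E48; the sequence U+0E01 U+0E34 U+0E48 passes through unchanged. Whether an engine should do this at all is exactly what the thread disputes.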
___
HarfBuzz mailing list
HarfBuzz@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/harfbuzz


Re: [HarfBuzz] harfbuzz-ng: Branch 'master' - 6 commits

2012-04-17 Thread John Daggett
> commit a5f1834f57ea3fb254f5c7d372747de316fcc8f1
> Author: Behdad Esfahbod 
> Date:   Mon Apr 16 15:55:13 2012 -0400
> 
> Apply 'liga' for vertical writing mode too
> 
> Apparently that's what Kazuraki uses to form vertical ligatures,
> which suggests that it's what Adobe does.
> 
> diff --git a/src/hb-ot-shape.cc b/src/hb-ot-shape.cc
> index d21559c..66b1461 100644
> --- a/src/hb-ot-shape.cc
> +++ b/src/hb-ot-shape.cc
> @@ -35,6 +35,7 @@
>  
>  hb_tag_t common_features[] = {
>    HB_TAG('c','c','m','p'),
> +  HB_TAG('l','i','g','a'),
>    HB_TAG('l','o','c','l'),
>    HB_TAG('m','a','r','k'),
>    HB_TAG('m','k','m','k'),
> @@ -46,7 +47,6 @@ hb_tag_t horizontal_features[] = {
>    HB_TAG('c','l','i','g'),
>    HB_TAG('c','u','r','s'),
>    HB_TAG('k','e','r','n'),
> -  HB_TAG('l','i','g','a'),
>  };

Just a note here that this will be problematic when rendering upright
Latin text in vertical mode.  Frankly, I don't think there's a clear,
consistent design model for OpenType features in the vertical case.
There's clearly a need to distinguish vertical ligatures from
horizontal ones; having 'f' and 'i' ligate in the upright vertical
case doesn't make sense.  Kazuraki relies on the 'vert'
feature to disambiguate horizontal and vertical ligatures but this
won't work for fonts not designed with the vertical case in mind.

So I think this change will need tweaking in the future, once there's
a clearer definition of the OpenType feature model for vertical text.

Cheers,

John Daggett
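
To make the consequence of the commit concrete, here is a hypothetical C sketch of how per-direction default-feature arrays like those in the diff could be combined. The lists are abbreviated to the tags visible in the diff plus 'vert', and `collect_default_features` is an illustrative name, not HarfBuzz's code. With 'liga' moved into the common list, it lands in the vertical set too, which is what lets 'f' and 'i' ligate in upright Latin text.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t tag_t;
#define TAG(a, b, c, d) \
  ((tag_t) (((uint32_t) (a) << 24) | ((uint32_t) (b) << 16) | \
            ((uint32_t) (c) << 8) | (uint32_t) (d)))
#define COUNT(arr) (sizeof (arr) / sizeof ((arr)[0]))

/* Abbreviated lists: tags visible in the diff, plus 'vert'. */
static const tag_t common_features[] = {
  TAG ('c','c','m','p'), TAG ('l','i','g','a'), /* 'liga' is now common */
  TAG ('l','o','c','l'), TAG ('m','a','r','k'), TAG ('m','k','m','k'),
};
static const tag_t horizontal_features[] = {
  TAG ('c','l','i','g'), TAG ('c','u','r','s'), TAG ('k','e','r','n'),
};
static const tag_t vertical_features[] = { TAG ('v','e','r','t') };

/* Writes the default features for one direction into out (room for at
 * least 8 tags) and returns how many were written. */
size_t collect_default_features (int vertical, tag_t *out)
{
  size_t n = 0;
  for (size_t i = 0; i < COUNT (common_features); i++)
    out[n++] = common_features[i];
  const tag_t *extra = vertical ? vertical_features : horizontal_features;
  size_t extra_len = vertical ? COUNT (vertical_features)
                              : COUNT (horizontal_features);
  for (size_t i = 0; i < extra_len; i++)
    out[n++] = extra[i];
  return n;
}

/* Returns 1 if tag occurs among the first n entries of features. */
int has_feature (const tag_t *features, size_t n, tag_t tag)
{
  for (size_t i = 0; i < n; i++)
    if (features[i] == tag)
      return 1;
  return 0;
}
```

In this sketch the vertical set contains 'liga' and 'vert' but not 'kern'; a fix along the lines John suggests would need a vertical-specific way to decide which ligatures apply.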


Re: [HarfBuzz] A problem in thai shaper

2012-04-17 Thread Martin Hosken
Dear Behdad,

> >> I don't know about Thai, but the handling of "invalid" Arabic combining
> >> marks in Uniscribe is completely brain dead and a real PITA and I'd
> >> really like not to see HarfBuzz going there, a shaping engine is not a
> >> spell checker and should not enforce any input pattern.
> >>
> >> http://www.microsoft.com/typography/OpenType%20Dev/arabic/shaping.mspx#invalid
> > 
> > Incidentally, I came across this Typophile post, which is one example of
> > why this "invalid" mark handling is not really a good idea:
> > http://typophile.com/node/92130
> 
> Interesting.  I'm undecided about this as of now.

I agree that shaping should not be used to constrain what might be valid 
sequences. But I think a shaping engine can be used to mark (think dotted 
circle) sequences that are structurally invalid. By these I mean sequences that 
would not otherwise show any visual difference from a valid sequence. For 
example, diacritics in the wrong order (not covered by normalization) that show
no visual difference (e.g. an upper diacritic preceding a lower one when both
have canonical combining class 0). Such validity will be script specific but not
language specific. The aim here is not to limit spellings but to ensure matchable
sequences.

In addition, a shaping engine is not designed to ensure that the lowest common 
denominator font for a script can handle anything thrown at it.

Yours,
Martin
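
Martin's "structurally invalid" category can be made concrete for Thai. The tone marks U+0E48..U+0E4B have canonical combining class 107, while MAI HAN-AKAT and the above vowels (U+0E31, U+0E34..U+0E37) have class 0, so normalization never reorders a tone mark typed before an above vowel, even though a font may draw both orders identically. The hypothetical check below flags that order; it is a sketch of the idea, not a rule taken from HarfBuzz, Uniscribe, or any specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Thai tone marks MAI EK..MAI CHATTAWA (canonical combining class 107). */
static int is_thai_tone_mark (uint32_t u)
{
  return u >= 0x0E48u && u <= 0x0E4Bu;
}

/* MAI HAN-AKAT and SARA I..SARA UEE: above-base marks with combining
 * class 0, so normalization cannot move them past a tone mark. */
static int is_thai_above_vowel (uint32_t u)
{
  return u == 0x0E31u || (u >= 0x0E34u && u <= 0x0E37u);
}

/* Hypothetical check: returns the index of the first above vowel that
 * directly follows a tone mark, or -1 if there is none.  A shaper taking
 * this approach could mark that position with a dotted circle. */
long find_tone_before_vowel (const uint32_t *text, size_t n)
{
  for (size_t i = 1; i < n; i++)
    if (is_thai_tone_mark (text[i - 1]) && is_thai_above_vowel (text[i]))
      return (long) i;
  return -1;
}
```

For U+0E01 U+0E34 U+0E48 it returns -1; for the swapped U+0E01 U+0E48 U+0E34 it returns 2. Note that this flags a spelling Jonathan lists as something an engine should not police, which is the crux of the disagreement.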


Re: [HarfBuzz] On hb_shape_plan() and other API for 1.0

2012-04-17 Thread Behdad Esfahbod
On 04/12/2012 07:55 AM, Jonathan Kew wrote:
> On 12/4/12 02:47, Behdad Esfahbod wrote:
>> As a crude test, I profiled the Indic shaping, and am conjecturing that about
>> 10 to 20 percent of the time can be saved by pre-planning the shaping process.
>> My testing showed no measurable saving for skipping the sanitizing process.
>>
>> Maybe both can wait (and not block a 1.0 release) since neither one seems to
>> be hugely effective.
> 
> A saving of 10-20% sounds pretty worthwhile to me - and if 1.0 is supposed to
> provide a long-term stable API, then perhaps this should be done sooner rather
> than later.

So, I tested this, and it looks like for short strings we get a 25% or better
improvement.  Implementing it correctly takes some refactoring, though, so I'll
do that when I get the time.

b
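
The idea behind pre-planning is that work depending only on the segment properties (font, script, direction, features) is done once and reused across hb_shape() calls, which would explain why short strings, where that fixed cost dominates, gain the most. A minimal, hypothetical C sketch of such a cache follows; `plan_key_t`, `get_plan` and the round-robin eviction are assumptions for illustration, not the API that was being designed here.

```c
#include <stddef.h>
#include <stdint.h>

/* Properties a plan depends on (abbreviated; a real key would also
 * include the requested user features). */
typedef struct {
  const void *face;
  uint32_t    script;    /* e.g. an ISO 15924 tag such as 'Thai' */
  int         direction;
} plan_key_t;

typedef struct {
  plan_key_t key;
  int        valid;
  /* Feature masks, lookup lists, etc. would be stored here. */
} plan_t;

#define PLAN_CACHE_SIZE 8

static plan_t   plan_cache[PLAN_CACHE_SIZE];
static size_t   next_slot;
static unsigned plans_compiled; /* how often the expensive path ran */

static int key_equal (const plan_key_t *a, const plan_key_t *b)
{
  return a->face == b->face && a->script == b->script &&
         a->direction == b->direction;
}

/* Returns a cached plan for key, building one only on a cache miss. */
const plan_t *get_plan (plan_key_t key)
{
  for (size_t i = 0; i < PLAN_CACHE_SIZE; i++)
    if (plan_cache[i].valid && key_equal (&plan_cache[i].key, &key))
      return &plan_cache[i];

  /* Cache miss: this is where features would be collected and lookups
   * resolved against the font.  Slots are reused round-robin. */
  plans_compiled++;
  plan_t *slot = &plan_cache[next_slot];
  next_slot = (next_slot + 1) % PLAN_CACHE_SIZE;
  slot->key = key;
  slot->valid = 1;
  return slot;
}
```

Shaping the same kind of segment twice runs the expensive path once; only a new combination of properties triggers another planning pass.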

> Otherwise, there'll be a strong temptation to rev the API again shortly after
> 1.0 in order to achieve this performance boost.
> 
> JK


Re: [HarfBuzz] A problem in thai shaper

2012-04-17 Thread Behdad Esfahbod
On 04/17/2012 06:47 PM, Khaled Hosny wrote:
> On Tue, Apr 17, 2012 at 05:10:37AM +0200, Khaled Hosny wrote:
>> On Mon, Apr 16, 2012 at 09:08:49PM -0400, Behdad Esfahbod wrote:
>>>> Problem 2:
>>>>
>>>> When no consonant is present, a dotted circle should be inserted as the
>>>> base character.  Finding the invalid combining marks should be the first
>>>> step of the shaping engine.  Refer to
>>>> http://www.microsoft.com/typography/otfntdev/thaiot/shaping.aspx#comb
>>>
>>> Right.  We do not handle invalid combining marks yet.  That's something I want
>>> to do at some point but it's not high priority.
>>
>> I don't know about Thai, but the handling of "invalid" Arabic combining
>> marks in Uniscribe is completely brain dead and a real PITA and I'd
>> really like not to see HarfBuzz going there, a shaping engine is not a
>> spell checker and should not enforce any input pattern.
>>
>> http://www.microsoft.com/typography/OpenType%20Dev/arabic/shaping.mspx#invalid
> 
> Incidentally, I came across this Typophile post, which is one example of
> why this "invalid" mark handling is not really a good idea:
> http://typophile.com/node/92130

Interesting.  I'm undecided about this as of now.

behdad

> Regards,
>  Khaled
> 


Re: [HarfBuzz] A problem in thai shaper

2012-04-17 Thread Khaled Hosny
On Tue, Apr 17, 2012 at 05:10:37AM +0200, Khaled Hosny wrote:
> On Mon, Apr 16, 2012 at 09:08:49PM -0400, Behdad Esfahbod wrote:
> > > Problem 2:
> > > 
> > > When no consonant is present, a dotted circle should be inserted as the
> > > base character.  Finding the invalid combining marks should be the first
> > > step of the shaping engine.  Refer to
> > > http://www.microsoft.com/typography/otfntdev/thaiot/shaping.aspx#comb
> > 
> > Right.  We do not handle invalid combining marks yet.  That's something I want
> > to do at some point but it's not high priority.
> 
> I don't know about Thai, but the handling of "invalid" Arabic combining
> marks in Uniscribe is completely brain dead and a real PITA and I'd
> really like not to see HarfBuzz going there, a shaping engine is not a
> spell checker and should not enforce any input pattern.
> 
> http://www.microsoft.com/typography/OpenType%20Dev/arabic/shaping.mspx#invalid

Incidentally, I came across this Typophile post, which is one example of
why this "invalid" mark handling is not really a good idea:
http://typophile.com/node/92130

Regards,
 Khaled


Re: [HarfBuzz] A problem in thai shaper

2012-04-17 Thread Behdad Esfahbod
Your HarfBuzz build probably doesn't have glib, and you are not providing any
Unicode functions, so cluster formation fails.  I shall make HB warn boldly if
that happens.

behdad
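
The diagnosis above can be illustrated with a simplified sketch: each mark is merged into the cluster of the character before it, and deciding what counts as a mark requires Unicode data. If that lookup is missing (modeled here by a callback that never reports a mark), clusters stay (0,1,2), which is what Dean observed. The names and the hard-coded Thai mark range standing in for a general-category lookup are assumptions; this is not HarfBuzz's actual cluster-formation code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a Unicode general-category query. */
typedef int (*is_mark_func_t) (uint32_t u);

/* Hard-coded Thai nonspacing marks, in place of real Unicode data. */
int thai_is_mark (uint32_t u)
{
  return u == 0x0E31u || (u >= 0x0E34u && u <= 0x0E3Au) ||
         (u >= 0x0E47u && u <= 0x0E4Eu);
}

/* Models a build with no Unicode functions: nothing is a mark. */
int nil_is_mark (uint32_t u)
{
  (void) u;
  return 0;
}

/* Gives each mark the cluster value of the character before it, so a
 * base and its marks end up sharing one cluster. */
void form_clusters (const uint32_t *text, unsigned int *cluster, size_t n,
                    is_mark_func_t is_mark)
{
  for (size_t i = 1; i < n; i++)
    if (is_mark (text[i]))
      cluster[i] = cluster[i - 1];
}
```

With `thai_is_mark`, U+0E01 U+0E34 U+0E48 with input clusters (0,1,2) comes out as (0,0,0); with `nil_is_mark` it stays (0,1,2).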

On 04/17/2012 01:27 PM, datao zhang wrote:
> Hi behdad:
>  
> Thanks for your comments.
>  
> I rechecked the cluster values, but I found they are still (0,1,2).
> I don't know why you get (0,0,0).
>  
> I tested it using the following code, which I wrote myself:
>  
> 
> hb_codepoint_t uchar[3] = {0x0E01, 0x0E34, 0x0E48};
> for (unsigned int i = 0; i < 3; i++)
>   hb_buffer_add (mBuffer, uchar[i], 1, i);  /* mask 1, cluster i */
> hb_buffer_set_direction (mBuffer, HB_DIRECTION_LTR);
> hb_buffer_set_script (mBuffer, HB_SCRIPT_THAI);
> hb_shape (mFont, mBuffer, NULL, 0);
> 
>  
> After hb_shape(), I see cluster[0]: 0; cluster[1]: 1; cluster[2]: 2
>  
> Do you have any comments?  Did I make a mistake?
>  
> Maybe I have the wrong concept.  I know that clusters in HarfBuzz are not
> meant for line breaking, but I think that, as with Indic, a Thai syllable
> should have the same cluster value, shouldn't it?
>  
> Br,
> Dean
>  
>> Date: Tue, 17 Apr 2012 10:28:15 -0400
>> From: beh...@behdad.org
>> To: dataozh...@hotmail.com
>> Subject: Re: [HarfBuzz] A problem in thai shaper
>>
>> On 04/17/2012 10:26 AM, Behdad Esfahbod wrote:
>> > On 04/17/2012 08:01 AM, datao zhang wrote:
>> >> Hi:
>> >> For Problem 1:
>> >> Example: if I pass "0x0E01,0x0E34,0x0E48", the input clusters are
>> >> (0,1,2); after shaping, the output clusters should be (0,0,0) because the
>> >> syllable must not be split at a line break. But currently I find the
>> >> output clusters are still (0,1,2).
>> >
>> > First, note that HarfBuzz clusters are not supposed to be used for things like
>> > linebreaking and cursor positioning. So (0,1,2) is totally fine if there are
>> > three separate glyphs representing those characters. And (0,1,2) is exactly
>> > what Uniscribe returns.
>>
>> Err, my bad. Both HarfBuzz and Uniscribe return (0,0,0) for the sequence, so
>> I don't think there's anything to fix here.
>>
>> b
>>
>> > HarfBuzz however returns (0,0,0) for that sequence.
>> > How were you testing? I'm leaning towards trying to match Uniscribe here.
>> > The finer-grained the cluster values are, the better cursor positioning can be
>> > built on top of HarfBuzz.
>> >
>> > behdad
>> >
>> >> Br,
>> >> Dean
>> >>
>> >>> Date: Mon, 16 Apr 2012 21:08:49 -0400
>> >>> From: beh...@behdad.org
>> >>> To: dataozh...@hotmail.com
>> >>> CC: harfbuzz@lists.freedesktop.org
>> >>> Subject: Re: [HarfBuzz] A problem in thai shaper
>> >>>
>> >>> Hi,
>> >>>
>> >>> Thanks for the email. My comments inline.
>> >>>
>> >>> On 04/13/2012 09:41 AM, datao zhang wrote:
>> >>>> So I think for the new Thai shaper, the valid composition of “consonant
>> >>>> [1 mandatory] + diacritic vowel [1 optional] + tone mark [1 optional]”
>> >>>> should be set as the same cluster.
>> >>>
>> >>> I would guess that our generic layer will already take care of this based on
>> >>> canonical combining categories? Do you have a test case that you want to see
>> >>> improved?
>> >>> improved?
>> >>>
>> >>>
>> >>>> Problem 2:
>> >>>>
>> >>>> When no consonant is present, a dotted circle should be inserted as the
>> >>>> base character. Finding the invalid combining marks should be the first
>> >>>> step of the shaping engine. Refer to
>> >>>> http://www.microsoft.com/typography/otfntdev/thaiot/shaping.aspx#comb
>> >>>
>> >>> Right. We do not handle invalid combining marks yet. That's something I want
>> >>> to do at some point but it's not high priority.
>> >>>
>> >>> behdad


[HarfBuzz] harfbuzz-ng: Branch 'master'

2012-04-17 Thread Behdad Esfahbod
 src/hb-graphite2.cc |3 +++
 1 file changed, 3 insertions(+)

New commits:
commit 3cde23664fbbe9cd2ac1b8fd5eb2ea288309cc9c
Author: Behdad Esfahbod 
Date:   Tue Apr 17 11:44:49 2012 -0400

Minor note re Graphite

diff --git a/src/hb-graphite2.cc b/src/hb-graphite2.cc
index cdf55f1..fa07ae9 100644
--- a/src/hb-graphite2.cc
+++ b/src/hb-graphite2.cc
@@ -221,6 +221,9 @@ _hb_graphite_shape (hb_font_t  *font,
 
   buffer->guess_properties ();
 
+  /* XXX We do a hell of a lot of stuff just to figure out this font
+   * is not graphite!  Shouldn't do. */
+
   hb_gr_font_data_t *data = _hb_gr_font_get_data (font);
   if (!data->grface) return FALSE;
 


[HarfBuzz] harfbuzz-ng: Branch 'master' - 2 commits

2012-04-17 Thread Behdad Esfahbod
 src/hb-graphite2.cc  |1 +
 test/shaping/texts/in-tree/shaper-thai/misc/misc.txt |1 +
 2 files changed, 2 insertions(+)

New commits:
commit 4dc2449d92308f8dd366142831c0b85bd30ea5a9
Author: Behdad Esfahbod 
Date:   Tue Apr 17 11:39:48 2012 -0400

Fix leak in graphite

diff --git a/src/hb-graphite2.cc b/src/hb-graphite2.cc
index 64f22f7..cdf55f1 100644
--- a/src/hb-graphite2.cc
+++ b/src/hb-graphite2.cc
@@ -130,6 +130,7 @@ static void _hb_gr_font_data_destroy (void *data)
   hb_gr_font_data_t *f = (hb_gr_font_data_t *) data;
 
   gr_font_destroy (f->grfont);
+  free (f);
 }
 
 static hb_user_data_key_t hb_gr_data_key;
commit 0290bbf8611aa881daed907f22256a431250c90a
Author: Behdad Esfahbod 
Date:   Tue Apr 17 10:28:21 2012 -0400

Add another Thai test

diff --git a/test/shaping/texts/in-tree/shaper-thai/misc/misc.txt b/test/shaping/texts/in-tree/shaper-thai/misc/misc.txt
index fc2dba9..51a47af 100644
--- a/test/shaping/texts/in-tree/shaper-thai/misc/misc.txt
+++ b/test/shaping/texts/in-tree/shaper-thai/misc/misc.txt
@@ -3,3 +3,4 @@
 ด๋ํา
 ดำ
 ำ
+กิ่