Re: Progress update on adjustment database

2023-11-03 Thread Craig White
The final branch now has everything it needs!
I would really like you, or someone else who has worked a lot with
FreeType, to give it a code review.

Thank you.

On Mon, Oct 30, 2023 at 2:26 AM Werner LEMBERG  wrote:

>
> > There was an intentional change to src/base/ftobjs.c.  I made the
> > function that finds the best unicode charmap public so that I could
> > use it from the autofitter.
>
> OK, I see.  I was confused because the `FT_BASE_TAG` is missing for
> this function in `ftobjs.c`.
>
>
> Werner
>


Re: Progress update on adjustment database

2023-10-30 Thread Werner LEMBERG


> There was an intentional change to src/base/ftobjs.c.  I made the
> function that finds the best unicode charmap public so that I could
> use it from the autofitter.

OK, I see.  I was confused because the `FT_BASE_TAG` is missing for
this function in `ftobjs.c`.


Werner



Re: Progress update on adjustment database

2023-10-29 Thread Craig White
> And no accidental changes to `freetype/config/ftoption.h` and
> `src/base/ftobjs.c`, please :-)

There was an intentional change to src/base/ftobjs.c.  I made the function
that finds the best unicode charmap public so that I could use it from the
autofitter.

On Fri, Oct 27, 2023 at 4:26 AM Werner LEMBERG  wrote:

>
> > I have added and tested type 3 lookup handling, and also added the
> > comments you asked for.
>
> Thanks.  Minor nits that I've seen while skimming through:
>
>   s/indicies/indices/
>   s/flatenning/flattening/
>
> > I will begin on the final branch if there's nothing else to be done.
>
> OK.  Please rebase to master before you start.
>
> And no accidental changes to `freetype/config/ftoption.h` and
> `src/base/ftobjs.c`, please :-)
>
>
> Werner
>


Re: Progress update on adjustment database

2023-10-27 Thread Werner LEMBERG


> I have added and tested type 3 lookup handling, and also added the
> comments you asked for.

Thanks.  Minor nits that I've seen while skimming through:

  s/indicies/indices/
  s/flatenning/flattening/

> I will begin on the final branch if there's nothing else to be done.

OK.  Please rebase to master before you start.

And no accidental changes to `freetype/config/ftoption.h` and
`src/base/ftobjs.c`, please :-)


Werner



Re: Progress update on adjustment database

2023-10-26 Thread Craig White
I have added and tested type 3 lookup handling, and also added the comments
you asked for.
I will begin on the final branch if there's nothing else to be done.

Thanks for the help!

On Mon, Oct 23, 2023 at 12:54 PM Werner LEMBERG  wrote:

>
> > I need some help figuring out how to handle the type 3 lookups.
> > I need to do 2 things:
> > - Figure out which features contain type 3 lookups
> > - Determine the number of variations the feature contains
>
> Simply add all glyphs a type 3 lookup provides to the list of glyphs.
> No further special handling is necessary.
>
> > For the second one, this function seems relevant:
> >
> > https://harfbuzz.github.io/harfbuzz-hb-ot-layout.html#hb-ot-layout-feature-with-variations-get-lookups
> > But this returns a list of lookups if you already know the variation
> > index, when I want to know the range of possible variation indices.
>
> This function is not relevant – it's about variation fonts, see
>
>
> https://learn.microsoft.com/en-us/typography/opentype/spec/chapter2#device-and-variationindex-tables
>
> > hb_ot_layout_lookup_get_glyph_alternates
> > also looks useful and could partially solve the problem.
>
> This is the right function, I think.
>
> > With this function, I can handle the type 3 lookup cases in
> > isolation, only finding the glyphs directly resulting from the
> > feature and no further transformations.
>
> Sounds sufficient to me.
>
> > Also, can I have some advice on testing the code?  How should I make
> > these changes bulletproof?
>
> Alas, we don't have a testing framework for such issues.  However, as
> soon as your code lands in 'master', the Chromium people and other
> parties run their fuzzers on FreeType, which usually unveils memory
> leaks or segfaults quite soon.  They also do intensive comparison of
> graphic images; however, I don't know whether they use the auto-hinter
> for that.
>
>
> Werner
>


Re: Progress update on adjustment database

2023-10-23 Thread Werner LEMBERG

> I need some help figuring out how to handle the type 3 lookups.
> I need to do 2 things:
> - Figure out which features contain type 3 lookups
> - Determine the number of variations the feature contains

Simply add all glyphs a type 3 lookup provides to the list of glyphs.
No further special handling is necessary.

> For the second one, this function seems relevant:
> https://harfbuzz.github.io/harfbuzz-hb-ot-layout.html#hb-ot-layout-feature-with-variations-get-lookups
> But this returns a list of lookups if you already know the variation
> index, when I want to know the range of possible variation indices.

This function is not relevant – it's about variation fonts, see

  
https://learn.microsoft.com/en-us/typography/opentype/spec/chapter2#device-and-variationindex-tables

> hb_ot_layout_lookup_get_glyph_alternates
> also looks useful and could partially solve the problem.

This is the right function, I think.

> With this function, I can handle the type 3 lookup cases in
> isolation, only finding the glyphs directly resulting from the
> feature and no further transformations.

Sounds sufficient to me.

> Also, can I have some advice on testing the code?  How should I make
> these changes bulletproof?

Alas, we don't have a testing framework for such issues.  However, as
soon as your code lands in 'master', the Chromium people and other
parties run their fuzzers on FreeType, which usually unveils memory
leaks or segfaults quite soon.  They also do intensive comparison of
graphic images; however, I don't know whether they use the auto-hinter
for that.


Werner


Re: Progress update on adjustment database

2023-10-21 Thread Craig White
> OK.  Minor nit: Please avoid overlong git commit messages (i.e., not
> longer than 78 characters).  And there should be an empty line after
> the first line to help tools like `gitk` to properly display git
> commits.  [Overlong lines should be avoided in the C code, too, both
> for comments and code.]

> Excellent!  I think it would also be beneficial if you could mention
> your findings in either a git comment or in the code itself, together
> with some real-world examples of such quirks (i.e., font name, font
> version, glyph name, reason why it fails, etc., etc.)

Will do, thanks.

I need some help figuring out how to handle the type 3 lookups.
I need to do 2 things:
- Figure out which features contain type 3 lookups
- Determine the number of variations the feature contains

For the second one, this function seems relevant:
https://harfbuzz.github.io/harfbuzz-hb-ot-layout.html#hb-ot-layout-feature-with-variations-get-lookups
But this returns a list of lookups if you already know the variation index,
when I want to know the range of possible variation indices.

hb_ot_layout_lookup_get_glyph_alternates
also looks useful and could partially solve the problem.  With this
function, I can handle the type 3 lookup cases in isolation, only finding
the glyphs directly resulting from the feature and no further
transformations.

Also, can I have some advice on testing the code?  How should I make these
changes bulletproof?

On Mon, Oct 16, 2023 at 11:58 PM Werner LEMBERG  wrote:

>
> >> I simply noticed that it's possible for 2 characters to map to the
> >> same glyph, which means that a glyph would map to 2 characters.  I
> >> don't have any examples in mind for when this would actually
> >> happen.  I was planning on either ignoring the situation to let it
> >> be resolved arbitrarily, or removing both entries.
> >
> > The situation is resolved arbitrarily for now.
>
> OK.  Minor nit: Please avoid overlong git commit messages (i.e., not
> longer than 78 characters).  And there should be an empty line after
> the first line to help tools like `gitk` to properly display git
> commits.  [Overlong lines should be avoided in the C code, too, both
> for comments and code.]
>
> > Also: what else needs to be done for the project to be complete and
> > ready to become a part of freetype?  The remaining tasks I can think
> > of are:
> >
> > - Fill in, or find someone to fill in the rest of the adjustment
> >   database.
>
> This is certainly helpful.  However, it doesn't need to be complete
> right now, but it should cover a good share of languages that use the
> Latin script.  BTW, please add a comment to the `adjustment_database`
> array, explaining the format.
>
> > - properly address the 'salt' table and similar cases in the glyph
> >   alternative finding algorithm.
> > - Test everything more thoroughly.
>
> Sounds good, thanks.  I would also ask you to produce a 'final' GSoC
> tree with cleaned-up commit messages, as mentioned in other e-mails on
> this list to other GSoC participants.
>
> > At this point, I know that the segment removal + vertical stretch is
> > definitely the best approach, and the latest commit applies that to
> > all the characters with tildes rather than a comparison of
> > approaches.  I previously thought that it caused some regressions,
> > but I now know that the examples I had were just preexisting quirks
> > in either the font or the autohinter.
>
> Excellent!  I think it would also be beneficial if you could mention
> your findings in either a git comment or in the code itself, together
> with some real-world examples of such quirks (i.e., font name, font
> version, glyph name, reason why it fails, etc., etc.)
>
>
> Werner
>


Re: Progress update on adjustment database

2023-10-16 Thread Werner LEMBERG


>> I simply noticed that it's possible for 2 characters to map to the
>> same glyph, which means that a glyph would map to 2 characters.  I
>> don't have any examples in mind for when this would actually
>> happen.  I was planning on either ignoring the situation to let it
>> be resolved arbitrarily, or removing both entries.
> 
> The situation is resolved arbitrarily for now.

OK.  Minor nit: Please avoid overlong git commit messages (i.e., not
longer than 78 characters).  And there should be an empty line after
the first line to help tools like `gitk` to properly display git
commits.  [Overlong lines should be avoided in the C code, too, both
for comments and code.]

> Also: what else needs to be done for the project to be complete and
> ready to become a part of freetype?  The remaining tasks I can think
> of are:
>
> - Fill in, or find someone to fill in the rest of the adjustment
>   database.

This is certainly helpful.  However, it doesn't need to be complete
right now, but it should cover a good share of languages that use the
Latin script.  BTW, please add a comment to the `adjustment_database`
array, explaining the format.

> - properly address the 'salt' table and similar cases in the glyph
>   alternative finding algorithm.
> - Test everything more thoroughly.

Sounds good, thanks.  I would also ask you to produce a 'final' GSoC
tree with cleaned-up commit messages, as mentioned in other e-mails on
this list to other GSoC participants.

> At this point, I know that the segment removal + vertical stretch is
> definitely the best approach, and the latest commit applies that to
> all the characters with tildes rather than a comparison of
> approaches.  I previously thought that it caused some regressions,
> but I now know that the examples I had were just preexisting quirks
> in either the font or the autohinter.

Excellent!  I think it would also be beneficial if you could mention
your findings in either a git comment or in the code itself, together
with some real-world examples of such quirks (i.e., font name, font
version, glyph name, reason why it fails, etc., etc.)


Werner



Re: Progress update on adjustment database

2023-10-14 Thread Craig White
> Perhaps the following?
> (1) If glyph A is in the 'cmap' table, and glyph B is not, prefer
> glyph A.
> (2) If one glyph needs X lookups and another glyph needs Y, and X < Y,
> prefer glyph X.
> I'm not sure whether (2) makes sense, though.
> Can you give one or more examples for such cases?

To repeat what I said in an email that I accidentally didn't send to the
mailing list:
> I simply noticed that it's possible for 2 characters to map to the same
> glyph, which means that a glyph would map to 2 characters.  I don't have
> any examples in mind for when this would actually happen.  I was planning
> on either ignoring the situation to let it be resolved arbitrarily, or
> removing both entries.

The situation is resolved arbitrarily for now.  I want to have a reason
for a rule before committing to one for such cases.  (1) seems to
generally make sense because the cmap table is more "direct".

Also: what else needs to be done for the project to be complete and ready
to become a part of freetype?  The remaining tasks I can think of are:
- Fill in, or find someone to fill in, the rest of the adjustment database.
- Properly address the 'salt' table and similar cases in the glyph
  alternative finding algorithm.
- Test everything more thoroughly.

At this point, I know that the segment removal + vertical stretch is
definitely the best approach, and the latest commit applies that to all the
characters with tildes rather than a comparison of approaches.  I
previously thought that it caused some regressions, but I now know that the
examples I had were just preexisting quirks in either the font or the
autohinter.

> To help people understand the non-trivial algorithm I
> suggest that you add a big comment that shows it working step by step
> for an example font, using a reduced set of features and glyphs.
This comment has been added.

On Tue, Oct 3, 2023 at 7:04 AM Werner LEMBERG  wrote:

> > > OK.  I think it is a bad side effect of the current auto-hinting
> > > algorithm that there are different approaches.
> >
> > I just want to clarify: you understood that the reason I used
> > different approaches for each letter was to compare the approaches?
> > My intent is to use one of those approaches as a universal algorithm
> > for all characters with tildes.  So every character would just have
> > a boolean flag for whether to apply tilde hinting or not.
>
> Ah, I was confused, sorry – I thought that you got such varying
> results for a single algorithm.  Sometimes it happens (not taking your
> current work into account) that blue zones affect the hinting of
> accents in a bad way, and I thought these were such cases.
>
> > Also, did you see my question about a glyph mapping to multiple
> > characters?
>
> I missed it, sorry again.  You write:
>
> > It's possible that 2 characters in the adjustment database could map
> > to the same glyph, which will create 2 entries in the reverse
> > character map with the same glyph as a key.  In this case, the
> > character that glyph maps to is decided arbitrarily based on which
> > one the binary search chooses and which order qsort puts them in.
> > What should be done in these cases?
>
> Perhaps the following?
>
> (1) If glyph A is in the 'cmap' table, and glyph B is not, prefer
> glyph A.
>
> (2) If one glyph needs X lookups and another glyph needs Y, and X < Y,
> prefer glyph X.
>
> I'm not sure whether (2) makes sense, though.
>
> Can you give one or more examples for such cases?
>
>
> Werner
>


Re: Progress update on adjustment database

2023-10-03 Thread Werner LEMBERG
> > OK.  I think it is a bad side effect of the current auto-hinting
> > algorithm that there are different approaches.
> 
> I just want to clarify: you understood that the reason I used
> different approaches for each letter was to compare the approaches?
> My intent is to use one of those approaches as a universal algorithm
> for all characters with tildes.  So every character would just have
> a boolean flag for whether to apply tilde hinting or not.

Ah, I was confused, sorry – I thought that you got such varying
results for a single algorithm.  Sometimes it happens (not taking your
current work into account) that blue zones affect the hinting of
accents in a bad way, and I thought these were such cases.

> Also, did you see my question about a glyph mapping to multiple
> characters?

I missed it, sorry again.  You write:

> It's possible that 2 characters in the adjustment database could map
> to the same glyph, which will create 2 entries in the reverse
> character map with the same glyph as a key.  In this case, the
> character that glyph maps to is decided arbitrarily based on which
> one the binary search chooses and which order qsort puts them in.
> What should be done in these cases?

Perhaps the following?

(1) If glyph A is in the 'cmap' table, and glyph B is not, prefer
glyph A.

(2) If one glyph needs X lookups and another glyph needs Y, and X < Y,
prefer glyph X.

I'm not sure whether (2) makes sense, though.

Can you give one or more examples for such cases?
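Rule (1), with rule (2) as a fallback, can be expressed as a single sort key.  A toy Python sketch of the idea follows; the entry fields `via_cmap` and `n_lookups` are illustrative, not FreeType code:

```python
def pick_entry(entries):
    """Pick one (char, glyph) entry among duplicates for the same glyph.

    Rule (1): an entry whose mapping comes straight from 'cmap' wins.
    Rule (2): otherwise, the entry needing fewer lookups wins.
    """
    # False sorts before True, so via_cmap entries come first;
    # ties are broken by the lookup count.
    return min(entries, key=lambda e: (not e["via_cmap"], e["n_lookups"]))

# Two characters mapping to the same glyph 42; 'b' is direct via cmap.
entries = [
    {"char": "a", "glyph": 42, "via_cmap": False, "n_lookups": 2},
    {"char": "b", "glyph": 42, "via_cmap": True,  "n_lookups": 1},
]
print(pick_entry(entries)["char"])  # 'b': its mapping is direct via cmap
```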


Werner


Re: Progress update on adjustment database

2023-09-30 Thread Craig White
> OK.  I think it is a bad side effect of the current auto-hinting
> algorithm that there are different approaches.

I just want to clarify: you understood that the reason I used different
approaches for each letter was to compare the approaches?  My intent is to
use one of those approaches as a universal algorithm for all characters
with tildes.  So every character would just have a boolean flag for whether
to apply tilde hinting or not.

> Looks good.  To help people understand the non-trivial algorithm I
> suggest that you add a big comment that shows it working step by step
> for an example font, using a reduced set of features and glyphs.

Will do!

Also, did you see my question about a glyph mapping to multiple characters?

On Sat, Sep 30, 2023 at 2:40 AM Werner LEMBERG  wrote:

>
> > Thanks.  Have you meanwhile found an explanation why o-tilde
> > > looks so bad for Times New Roman at 16ppem?
> >
> > All 4 letters in each row have a different approach:
> >
> > õ: vertical stretch, no segment removal
> > ñ: no vertical stretch, segment removal
> > ã: vertical stretch and segment removal
> > all other tildes: no changes applied
>
> OK.  I think it is a bad side effect of the current auto-hinting
> algorithm that there are different approaches.  However, using the
> adjustment database I wonder whether the knowledge of the character
> topology can help improve the situation.  In other words, do you see a
> possibility to 'decouple' the (vertical) hinting of the tilde from the
> base glyph hinting by checking a flag in the database?  For this
> purpose, a 'tilde' could be defined as the contour that lies higher
> than the ascender of small letters – this implies that you need
> another flag or enumeration to refer to small letter, uppercase
> letters, etc.
>
> As an example, the database information for glyph 'o tilde' could be
>
>   * lowercase character
> >   * hint contour(s) higher than the lowercase ascender height
> separately
>   * stretch tilde vertically
>
> > I implemented the algorithm for all glyph variants!  The version I
> > used is different from what I wrote originally to fix some errors.
>
> Looks good.  To help people understand the non-trivial algorithm I
> suggest that you add a big comment that shows it working step by step
> for an example font, using a reduced set of features and glyphs.
>
> > I've only tried it on a pretty simple case so far, so I'll need to
> > assemble a more complex test font or two.
>
> A feature-rich (and freely available) font family is 'Libertinus', for
> example.
>
>
> Werner
>


Re: Progress update on adjustment database

2023-09-30 Thread Werner LEMBERG

> > Thanks.  Have you meanwhile found an explanation why o-tilde
> > looks so bad for Times New Roman at 16ppem?
>
> All 4 letters in each row have a different approach:
>
> õ: vertical stretch, no segment removal
> ñ: no vertical stretch, segment removal
> ã: vertical stretch and segment removal
> all other tildes: no changes applied

OK.  I think it is a bad side effect of the current auto-hinting
algorithm that there are different approaches.  However, using the
adjustment database I wonder whether the knowledge of the character
topology can help improve the situation.  In other words, do you see a
possibility to 'decouple' the (vertical) hinting of the tilde from the
base glyph hinting by checking a flag in the database?  For this
purpose, a 'tilde' could be defined as the contour that lies higher
than the ascender of small letters – this implies that you need
another flag or enumeration to refer to small letter, uppercase
letters, etc.

As an example, the database information for glyph 'o tilde' could be

  * lowercase character
  * hint contour(s) higher than the lowercase ascender height
separately
  * stretch tilde vertically

> I implemented the algorithm for all glyph variants!  The version I
> used is different from what I wrote originally to fix some errors.

Looks good.  To help people understand the non-trivial algorithm I
suggest that you add a big comment that shows it working step by step
for an example font, using a reduced set of features and glyphs.

> I've only tried it on a pretty simple case so far, so I'll need to
> assemble a more complex test font or two.

A feature-rich (and freely available) font family is 'Libertinus', for
example.


Werner


Re: Progress update on adjustment database

2023-09-27 Thread Craig White
I have a question:
It's possible that 2 characters in the adjustment database could map to the
same glyph, which will create 2 entries in the reverse character map with
the same glyph as a key.  In this case, the character that glyph maps to is
decided arbitrarily based on which one the binary search chooses and which
order qsort puts them in.
What should be done in these cases?
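The ambiguity shows up in a small Python model of the reverse map, with a sorted array plus binary search standing in for the C qsort/bsearch pair (the data is made up):

```python
import bisect

def reverse_lookup(table, glyph):
    """Return the character mapped to `glyph`, or None.

    `table` is a list of (glyph, char) pairs sorted by glyph.  When two
    entries share the same glyph key, which one binary search lands on
    depends only on how they were ordered during sorting.
    """
    keys = [g for g, _ in table]
    i = bisect.bisect_left(keys, glyph)
    if i < len(table) and table[i][0] == glyph:
        return table[i][1]
    return None

# 'A' and 'B' both map to glyph 7 -> two entries share the key.
table = sorted([(7, "A"), (7, "B"), (3, "C")])
print(reverse_lookup(table, 7))  # 'A' or 'B', depending on sort order
```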

On Tue, Sep 26, 2023 at 11:17 PM Craig White  wrote:

> > Thanks.  Have you meanwhile found an explanation why o-tilde looks
> > so bad for Times New Roman at 16ppem?
>
> All 4 letters in each row have a different approach:
>
> õ: vertical stretch, no segment removal
> ñ: no vertical stretch, segment removal
> ã: vertical stretch and segment removal
> all other tildes: no changes applied
>
> Actually, I tried the o tilde character again with no adjustments and it
> looked the same.  In this case, the vertical stretch wasn't enough to fix
> the issue.
>
> > Sounds good.  Unfortunately, I'm a bit short of time right now; I'll
> > think about your algorithm within the next few days.  However, please
> > proceed anyway!
> I implemented the algorithm for all glyph variants!  The version I used is
> different from what I wrote originally to fix some errors.  Here's the
> current version:
> result is now a global set of glyphs instead of an argument to the
> function.  It initially starts empty.
> fs is a global set of features, also initially empty.
> All other definitions are the same.
>
> func all_glyphs(codepoint c)
> {
>   result = result ∪ lookup(c, fs)
>   foreach (feature f ∈ (features - fs))  // for all features not already in fs
>   {
>     new_glyphs = lookup(c, fs ∪ f) - result
>     if (new_glyphs != ∅)
>     {
>       result = result ∪ new_glyphs
>       fs = fs ∪ f
>       all_glyphs(c)
>       fs = fs - f
>     }
>   }
> }
> I've only tried it on a pretty simple case so far, so I'll need to
> assemble a more complex test font or two.
>
> On Mon, Sep 18, 2023, 4:03 AM Werner LEMBERG  wrote:
>
>>
>> > Is testing all these combinations really necessary?
>>
>> I don't know :-) I just wanted to point out that feature combinations
>> have to be considered.
>>
>> > [...] My intuition says very few of these combinations actually
>> > matter.
>>
>> Yes, I agree.
>>
>> > I wrote some pseudocode for a different approach that I believe
>> > accomplishes the same thing, while being more efficient and hopefully
>> > removing the need to constrain the set of features considered: [...]
>>
>> Sounds good.  Unfortunately, I'm a bit short of time right now; I'll
>> think about your algorithm within the next few days.  However, please
>> proceed anyway!
>>
>> > I attached some pictures of the tilde unflattening approaches.
>>
>> Thanks.  Have you meanwhile found an explanation why o-tilde looks
>> so bad for Times New Roman at 16ppem?
>>
>> > I chose sizes that showcase the differences between the approaches,
>> > and also committed my current code if you would like to try it
>> > yourself.
>>
>> Will try if I find some time.
>>
>>
>> Werner
>>
>


Re: Progress update on adjustment database

2023-09-26 Thread Craig White
> Thanks.  Have you meanwhile found an explanation why o-tilde looks
> so bad for Times New Roman at 16ppem?

All 4 letters in each row have a different approach:

õ: vertical stretch, no segment removal
ñ: no vertical stretch, segment removal
ã: vertical stretch and segment removal
all other tildes: no changes applied

Actually, I tried the o tilde character again with no adjustments and it
looked the same.  In this case, the vertical stretch wasn't enough to fix
the issue.

> Sounds good.  Unfortunately, I'm a bit short of time right now; I'll
> think about your algorithm within the next few days.  However, please
> proceed anyway!
I implemented the algorithm for all glyph variants!  The version I used is
different from what I wrote originally to fix some errors.  Here's the
current version:
result is now a global set of glyphs instead of an argument to the
function.  It initially starts empty.
fs is a global set of features, also initially empty.
All other definitions are the same.

func all_glyphs(codepoint c)
{
  result = result ∪ lookup(c, fs)
  foreach (feature f ∈ (features - fs))  // for all features not already in fs
  {
    new_glyphs = lookup(c, fs ∪ f) - result
    if (new_glyphs != ∅)
    {
      result = result ∪ new_glyphs
      fs = fs ∪ f
      all_glyphs(c)
      fs = fs - f
    }
  }
}
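To see how the recursion explores feature combinations, here is a minimal runnable Python translation of the pseudocode above.  The substitution table and the fixed-point `lookup()` are made-up stand-ins for the HarfBuzz shaping step, purely for illustration:

```python
# Mock substitution data: (glyph, feature) -> glyph produced when that
# feature is enabled.  "a.sc" + "ss01" models a chained substitution.
SUBSTITUTIONS = {
    ("a", "smcp"): "a.sc",
    ("a", "ss01"): "a.alt",
    ("a.sc", "ss01"): "a.sc.alt",
}

def lookup(c, fs):
    """All glyphs that c can become with exactly the features in fs on."""
    glyphs = {c}
    changed = True
    while changed:                      # apply features until a fixed point
        changed = False
        for g in list(glyphs):
            for f in fs:
                out = SUBSTITUTIONS.get((g, f))
                if out and out not in glyphs:
                    glyphs.add(out)
                    changed = True
    return glyphs

FEATURES = {"smcp", "ss01"}
result = set()   # global result set, initially empty
fs = set()       # global feature set, initially empty

def all_glyphs(c):
    global fs
    result.update(lookup(c, frozenset(fs)))
    for f in FEATURES - fs:             # features not already in fs
        new_glyphs = lookup(c, frozenset(fs | {f})) - result
        if new_glyphs:                  # f contributed something new: recurse
            result.update(new_glyphs)
            fs = fs | {f}
            all_glyphs(c)
            fs = fs - {f}

all_glyphs("a")
print(sorted(result))  # ['a', 'a.alt', 'a.sc', 'a.sc.alt']
```

Note how the chained form `a.sc.alt` is only found once the recursion enables 'smcp' and 'ss01' together, which is exactly what the feature-set recursion buys over trying each feature alone.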
I've only tried it on a pretty simple case so far, so I'll need to assemble
a more complex test font or two.

On Mon, Sep 18, 2023, 4:03 AM Werner LEMBERG  wrote:

>
> > Is testing all these combinations really necessary?
>
> I don't know :-) I just wanted to point out that feature combinations
> have to be considered.
>
> > [...] My intuition says very few of these combinations actually
> > matter.
>
> Yes, I agree.
>
> > I wrote some pseudocode for a different approach that I believe
> > accomplishes the same thing, while being more efficient and hopefully
> > removing the need to constrain the set of features considered: [...]
>
> Sounds good.  Unfortunately, I'm a bit short of time right now; I'll
> think about your algorithm within the next few days.  However, please
> proceed anyway!
>
> > I attached some pictures of the tilde unflattening approaches.
>
> > Thanks.  Have you meanwhile found an explanation why o-tilde looks
> so bad for Times New Roman at 16ppem?
>
> > I chose sizes that showcase the differences between the approaches,
> > and also committed my current code if you would like to try it
> > yourself.
>
> Will try if I find some time.
>
>
> Werner
>


Re: Progress update on adjustment database

2023-09-18 Thread Werner LEMBERG


> Is testing all these combinations really necessary?

I don't know :-) I just wanted to point out that feature combinations
have to be considered.

> [...] My intuition says very few of these combinations actually
> matter.

Yes, I agree.

> I wrote some pseudocode for a different approach that I believe
> accomplishes the same thing, while being more efficient and hopefully
> removing the need to constrain the set of features considered: [...]

Sounds good.  Unfortunately, I'm a bit short of time right now; I'll
think about your algorithm within the next few days.  However, please
proceed anyway!

> I attached some pictures of the tilde unflattening approaches.

Thanks.  Have you meanwhile found an explanation why o-tilde looks
so bad for Times New Roman at 16ppem?

> I chose sizes that showcase the differences between the approaches,
> and also committed my current code if you would like to try it
> yourself.

Will try if I find some time.


Werner



Re: Progress update on adjustment database

2023-09-12 Thread Werner LEMBERG


> So, if my understanding is correct, hb_ot_shape_glyphs_closure will
> take an input character or characters and tell me all the glyphs
> that it gets transformed into, as well as the final form.

Yes.

> I'm not sure about this interpretation, because the documentation
> uses the term "Transitive closure", which I'm not familiar with.

Indeed, it's a bit unfortunate that the documentation is not more
verbose.

> As for iterating through auto-hinter styles, do you mean that I
> should get a list of features and try each one for the 'features'
> parameter?

Yes, you should try each one, and all combinations of them.  However,
the number of features that are of interest (at least for latin
scripts) is small, which means that the number of iterations doesn't
become very large; see macro `META_STYLE_LATIN` in file `afstyles.h`
for a list.

> Also, I wanted to share my progress in the tilde unflattening.
> [...]

This sounds very promising, thanks!

> The segment removal should be part of the solution, but the question
> is to what extent the vertical stretch should be part of the
> solution.

My gut feeling says that both are needed.  I hope that you find
constraints that work reliably for a large bunch of (common) fonts.

> To try to answer this, I tested on a bunch of fonts.
> [...]

Please also post some images.


Werner



Re: Progress update on adjustment database

2023-09-10 Thread Craig White
So, if my understanding is correct, hb_ot_shape_glyphs_closure will take an
input character or characters and tell me all the glyphs that it gets
transformed into, as well as the final form.  I'm not sure about this
interpretation, because the documentation uses the term "Transitive
closure", which I'm not familiar with.
As for iterating through auto-hinter styles, do you mean that I should get
a list of features and try each one for the 'features' parameter?
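For what it's worth, "transitive closure" here just means: keep applying the substitutions to their own outputs until nothing new appears, so the result contains every intermediate and final form.  A toy sketch with made-up substitution data (not the HarfBuzz API):

```python
def glyphs_closure(start, subst):
    """All glyphs reachable from `start` by repeatedly applying `subst`.

    `subst` maps a glyph to the set of glyphs it can be substituted
    with; chains of substitutions are followed to a fixed point.
    """
    seen = {start}
    frontier = [start]
    while frontier:
        g = frontier.pop()
        for out in subst.get(g, ()):
            if out not in seen:      # follow chained substitutions
                seen.add(out)
                frontier.append(out)
    return seen

# 'f' -> 'f.alt' -> 'f.alt.2': the closure contains all three forms.
subst = {"f": {"f.alt"}, "f.alt": {"f.alt.2"}}
print(sorted(glyphs_closure("f", subst)))  # ['f', 'f.alt', 'f.alt.2']
```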

Also, I wanted to share my progress in the tilde unflattening.  You
suggested earlier that I fix the algorithm by removing the segments
containing points of the tilde from their edges.  This worked remarkably
well.  In fact, I found that I often got better results with only this
change.  The problem with the vertical stretch was that while it was able
to fix the tilde in many cases, it caused the size to be inconsistent.  The
tilde would be 2 pixels wide at one ppem, then increase to 3 when
downsizing, when this issue wasn't present before (this happens both with
and without the segment removal).  The segment removal is simpler and
should be preferred if it solves the problem adequately.  The drawback of
segment removal is that it causes the tildes to have blurrier outlines, and
it doesn't fix some of the cases that it would if it was combined with the
vertical stretch.
The segment removal should be part of the solution, but the question is to
what extent the vertical stretch should be part of the solution.  To try to
answer this, I tested on a bunch of fonts.
To aid testing, different characters use different versions of the
algorithm:
õ: vertical stretch, no segment removal
ñ: no vertical stretch, segment removal
ã: vertical stretch and segment removal
all other tildes: no changes applied

Results:
Liberation Sans Regular: Segment removal delays flattening, but adding
vertical stretch delays it even further.
Times New Roman: All approaches had weird behavior at exactly 16 ppem.
With only the stretch, the tilde flattened at this size.  With both
approaches with segment removal, the tilde stretched to 3 pixels tall and
looked very out of place.  This wasn't an issue with any non-adjusted
letters with tildes (indicating it didn't originate from the font).  In all
cases, the tilde was not flat but very blurry at all sizes where the glyph
was legible.
Noto Sans Regular, suggested by Brad: both approaches with segment removal
fix the tilde down to around 13 ppem.  With the vertical-stretch-only mode,
this font still showed flat tildes at sizes as high as 26 ppem.
Calibri Regular: Didn't have a flat tilde issue to begin with.  Both
approaches using segment removal had no size anomalies.

On Sat, Sep 2, 2023 at 1:59 AM Werner LEMBERG  wrote:

>
> > I discovered there was an issue when I tried using a test font I created
> > with the following as one of its lookups (in ttx format):
> > 
> > 
> > 
> > 
> > 
> >   
> >   
> > 
> >   
> >
> > There is one lookup here, but 2 substitutions.  My program needs to
> > iterate through each substitution individually, but the function
> > hb_ot_layout_lookup_collect_glyphs only returns 2 unordered sets
> > representing all the input and output characters for this lookup.
> > How can I get one substitution at a time?
>
> Indeed, you can't use `hb_ot_layout_lookup_collect_glyphs` for
> that.  However, given that you actually want to map input character
> codes to output glyph indices, what about using
> `hb_ot_shape_glyphs_closure` on single input characters, iterating
> over the auto-hinter 'styles'?  If you get a single output glyph,
> everything's fine.  If you get more than a single output glyph this
> essentially means that two or more lookups have been applied in
> succession, but you only have to take care of the glyph that is part
> of the 'style'.
>
> Note that this is untested on my side – I was just searching in the
> HarfBuzz API.
>
> Behdad, if you have a better idea, please chime in :-)
>
>
> Werner
>


Re: Progress update on adjustment database

2023-09-02 Thread Werner LEMBERG

> I discovered there was an issue when I tried using a test font I created
> with the following as one of its lookups (in ttx format):
> 
> 
> 
> 
> 
>   
>   
> 
>   
> 
> There is one lookup here, but 2 substitutions.  My program needs to
> iterate through each substitution individually, but the function
> hb_ot_layout_lookup_collect_glyphs only returns 2 unordered sets
> representing all the input and output characters for this lookup.
> How can I get one substitution at a time?

Indeed, you can't use `hb_ot_layout_lookup_collect_glyphs` for
that.  However, given that you actually want to map input character
codes to output glyph indices, what about using
`hb_ot_shape_glyphs_closure` on single input characters, iterating
over the auto-hinter 'styles'?  If you get a single output glyph,
everything's fine.  If you get more than a single output glyph this
essentially means that two or more lookups have been applied in
succession, but you only have to take care of the glyph that is part
of the 'style'.

Note that this is untested on my side – I was just searching in the
HarfBuzz API.

Behdad, if you have a better idea, please chime in :-)


Werner


Re: Progress update on adjustment database

2023-08-31 Thread Craig White
I discovered there was an issue when I tried using a test font I created
with the following as one of its lookups (in ttx format):





  
  

  

There is one lookup here, but 2 substitutions.  My program needs to iterate
through each substitution individually, but the
function hb_ot_layout_lookup_collect_glyphs only returns 2 unordered sets
representing all the input and output characters for this lookup.  How can
I get one substitution at a time?

On Tue, Aug 29, 2023 at 3:13 PM Werner LEMBERG  wrote:

>
> > As I was testing my attempt at supporting GSUB lookups, I found that
> > hb_ot_layout_lookup_collect_glyphs is actually not what I need,
> > because I assumed that each lookup contains exactly one
> > substitution, when it actually may contain multiple.  What I really
> > need is a way to get each individual substitution.  How do I do
> > this?
>
> It's not completely clear to me what you exactly need, please give an
> example.
>
> Behdad, any idea whether HarfBuzz can help here?  Otherwise it is
> probably necessary to parse the GSUB table by ourselves, which is
> something I would like to avoid...
>
>
> Werner
>


Re: Progress update on adjustment database

2023-08-29 Thread Werner LEMBERG


> As I was testing my attempt at supporting GSUB lookups, I found that
> hb_ot_layout_lookup_collect_glyphs is actually not what I need,
> because I assumed that each lookup contains exactly one
> substitution, when it actually may contain multiple.  What I really
> need is a way to get each individual substitution.  How do I do
> this?

It's not completely clear to me what you exactly need, please give an
example.

Behdad, any idea whether HarfBuzz can help here?  Otherwise it is
probably necessary to parse the GSUB table by ourselves, which is
something I would like to avoid...


Werner



Re: Progress update on adjustment database

2023-08-28 Thread Craig White
As I was testing my attempt at supporting GSUB lookups, I found that
hb_ot_layout_lookup_collect_glyphs is actually not what I need, because I
assumed that each lookup contains exactly one substitution, when it
actually may contain multiple.  What I really need is a way to get each
individual substitution.  How do I do this?


Re: Progress update on adjustment database

2023-08-12 Thread Hin-Tak Leung
 On Saturday, 12 August 2023 at 17:17:47 BST, Werner LEMBERG  
wrote:
 
 
 
> > It is one of those things I never remember - winding rule of
> > truetype and postscript are different, one is even-odd, the other is
> > non-zero.  [...]

> You are on the completely wrong track here.  What I'm talking about is
> strictly local and not (directly) related to having two different
> contours.  For example, to recognize stems and serifs, the contour
> must be the same.  In case you are not aware of how the auto-hinter
> collects 'segments' and 'edges', please read

>   https://www.tug.org/TUGboat/tb24-3/lemberg.pdf

> which still gives a good overview (in spite of its age and some changed
> details here and there).
Argh, sorry :-).


 
  

Re: Progress update on adjustment database

2023-08-12 Thread Werner LEMBERG


> It is one of those things I never remember - winding rule of
> truetype and postscript are different, one is even-odd, the other is
> non-zero.  [...]

You are on the completely wrong track here.  What I'm talking about is
strictly local and not (directly) related to having two different
contours.  For example, to recognize stems and serifs, the contour
must be the same.  In case you are not aware of how the auto-hinter
collects 'segments' and 'edges', please read

  https://www.tug.org/TUGboat/tb24-3/lemberg.pdf

which still gives a good overview (in spite of its age and some changed
details here and there).


Werner



Re: Progress update on adjustment database

2023-08-12 Thread Hin-Tak Leung
 It is one of those things I never remember - the winding rules of TrueType and 
PostScript are different: one is even-odd, the other is non-zero.  Both concern 
how many times a ray from an "interior" (inked) point out to infinity crosses 
the glyph contour.
Anyway, viewed from a small gap between two "inked" portions, the two contours 
on either side must be running in opposite directions.
There is a simple case where two contours run in the same direction: think of 
a single connected contour that is like two circles except that it crosses 
itself at one point (say, the top: you draw anti-clockwise from 12 o'clock, 
move inward to draw a second circle when you reach the top again, and finally 
move outward to rejoin the original).
I think under both winding rules the region between the inner and outer part 
is inked (viewed from that area locally, you see contours running in 
parallel), because the contour goes around you "once".  Whether the inner 
circle is inked depends on whether you are talking PostScript/CFF or TrueType: 
one inks it, the other does not.  (That region is enclosed by the contour 
globally "twice" - hence the difference between even-odd and non-zero.)  So 
this self-intersecting contour is drawn either as mostly an 'o' shape or as a 
solidly inked circle.
Granted, self-intersecting contours are rare, but they are legal.  Anyway, at 
the kind of small gap we have been discussing, which we want to preserve 
during hinting, if you are sitting at that spot you see contours running in 
opposite directions around you.  If you see contours running in the same 
direction, you are probably in an inked part instead, like the region between 
the two loops of the self-intersecting bi-circle I just described.
On Saturday, 12 August 2023 at 06:43:14 BST, Craig White 
 wrote:  
 
 I'm still missing something.  Why would the direction of the contour matter 
if, in either case, it's the same set of points?
On Fri, Aug 11, 2023 at 6:52 AM Werner LEMBERG  wrote:


> You said that for an i - like shape:
>> Both contours have the same direction.
> 
> What kind of problems does this rule protect against?

Sorry, this was sloppily formulated.  It's about the *local* direction
of contours, that is, whether a horizontal contour segment goes from
left to right or from right to left.  For the 'i' stem and the 'i'
dot, both contours must have the same direction globally, but locally,
at the dividing space, the corresponding lower and upper segments must
have the opposite directions.


    Werner

  

Re: Progress update on adjustment database

2023-08-12 Thread Werner LEMBERG


> I'm still missing something.  Why would the direction of the contour
> matter if, in either case, it's the same set of points?

It doesn't directly matter for your code.  However, FreeType's
auto-hinter handles these cases differently (namely, to detect stems),
which might in turn influence the adjustments you have to manage.
It's just a heads-up.


Werner



Re: Progress update on adjustment database

2023-08-11 Thread Craig White
I'm still missing something.  Why would the direction of the contour matter
if, in either case, it's the same set of points?

On Fri, Aug 11, 2023 at 6:52 AM Werner LEMBERG  wrote:

>
> > You said that for an i - like shape:
> >> Both contours have the same direction.
> >
> > What kind of problems does this rule protect against?
>
> Sorry, this was sloppily formulated.  It's about the *local* direction
> of contours, that is, whether a horizontal contour segment goes from
> left to right or from right to left.  For the 'i' stem and the 'i'
> dot, both contours must have the same direction globally, but locally,
> at the dividing space, the corresponding lower and upper segments must
> have the opposite directions.
>
>
> Werner
>


Re: Progress update on adjustment database

2023-08-11 Thread Werner LEMBERG


> You said that for an i - like shape:
>> Both contours have the same direction.
> 
> What kind of problems does this rule protect against?

Sorry, this was sloppily formulated.  It's about the *local* direction
of contours, that is, whether a horizontal contour segment goes from
left to right or from right to left.  For the 'i' stem and the 'i'
dot, both contours must have the same direction globally, but locally,
at the dividing space, the corresponding lower and upper segments must
have the opposite directions.


Werner



Re: Progress update on adjustment database

2023-08-11 Thread Craig White
I have a question about your suggestion earlier for a validation rule for
an i - like shape
You said that for an i - like shape:
> Both contours have the same direction.

What kind of problems does this rule protect against?

On Tue, Aug 8, 2023 at 1:45 PM Craig White  wrote:

> > Right now I'm abroad (actually, in New York :-) – it will take some
> > time until I can do such tests, sorry.
>
> That's ok.  I didn't mean to imply that I was expecting you to do the
> tests.  I just said that in case you tried the code anyway.
>
> On Mon, Aug 7, 2023 at 8:59 AM Werner LEMBERG  wrote:
>
>>
>> > If you're testing this yourself, keep in mind that the adjustment
>> > will only be applied to n with tilde.  Any other letter with a tilde
>> > can be used as a control group.
>>
>> Right now I'm abroad (actually, in New York :-) – it will take some
>> time until I can do such tests, sorry.
>>
>>
>> Werner
>>
>


Re: Progress update on adjustment database

2023-08-07 Thread Werner LEMBERG

> If you're testing this yourself, keep in mind that the adjustment
> will only be applied to n with tilde.  Any other letter with a tilde
> can be used as a control group.

Right now I'm abroad (actually, in New York :-) – it will take some
time until I can do such tests, sorry.


Werner


Re: Progress update on adjustment database

2023-08-03 Thread Werner LEMBERG


> [...] I have confirmed by commenting/uncommenting steps of the
> hinting process that af_glyph_hints_align_edge_points is the
> function that snaps the tilde back to flat, so if my understanding
> is correct, the points at the top and bottom of the tilde are
> forming edges and are being rounded towards each other, causing the
> tilde to remain flat.

Yes, it seems so.  In case you haven't read it already, this article
gives a nice overview, but please be aware that it is 20(!) years old
and thus partially outdated with respect to some details:

  https://www.tug.org/TUGboat/tb24-3/lemberg.pdf

> How should I proceed?

Perhaps you can try to remove the affected segments from the
corresponding edges.  If that fails there is still the possibility to
apply the vertical distortion afterwards...


Werner



Re: Progress update on adjustment database

2023-08-02 Thread Craig White
It's pushed.  If you're testing this yourself, keep in mind that the
adjustment will only be applied to n with tilde.  Any other letter with a
tilde can be used as a control group.

The commit also has the vertical adjustment mode that pushes the bottom
contour down, which I haven't found characters to test on yet.

On Wed, Aug 2, 2023 at 6:29 PM Craig White  wrote:

> Thanks for your help.  I fixed the issue by marking all on-curve points
> that I moved as touched (and letting IUP do the rest), setting oy = y, and
> applying the stretch to fy as well.  I also had some calculation errors in
> the height measurement the debug prints used and the algorithm itself.
>
> The next problem is that I relied on an assumption that if the original
> height of the tilde was about 2 pixels tall, the grid-fitted position would
> probably be 2 pixels.  My method estimates the width of the tilde by
> looking for the points at the tips of the tilde curves and measuring their
> distance to the bounding box (see the attached image for a visualization),
> then adding 1 pixel and scaling the contour vertically, anchored at its
> lowest point, until it is at least that tall.
>
> Testing with Liberation Sans, the sizes that had flat tildes before still
> have flat tildes.  I can add more than 1 pixel to the measurement to
> unflatten them, but that causes unnecessarily tall tildes at other sizes.
> I have confirmed by commenting/uncommenting steps of the hinting process
> that af_glyph_hints_align_edge_points is the function that snaps the tilde
> back to flat, so if my understanding is correct, the points at the top and
> bottom of the tilde are forming edges and are being rounded towards each
> other, causing the tilde to remain flat.  How should I proceed?
>
> I'll also push the code I have after removing the dead code and debug
> prints that have built up.
>
> On Mon, Jul 31, 2023 at 6:23 PM Werner LEMBERG  wrote:
>
>>
>> > I have an algorithm I'm testing for the tilde unflattening.  I went
>> > with doing it before all steps, because it worked better with my
>> > idea, but the function af_glyph_hints_align_weak_points is undoing
>> > my changes.
>>
>> Hmm.  Have you checked in the OpenType specification how the IUP
>> instruction works?
>>
>>
>> https://learn.microsoft.com/en-us/typography/opentype/spec/tt_instructions#interpolate-untouched-points-through-the-outline
>>
>> Have you 'touched' the points you move (i.e., setting
>> `AF_FLAG_TOUCH_Y`) so that `af_glyph_hints_align_weak_points` doesn't
>> move them again?
>>
>>
>> Werner
>>
>


Re: Progress update on adjustment database

2023-08-02 Thread Craig White
Thanks for your help.  I fixed the issue by marking all on-curve points
that I moved as touched (and letting IUP do the rest), setting oy = y, and
applying the stretch to fy as well.  I also had some calculation errors in
the height measurement the debug prints used and the algorithm itself.

The next problem is that I relied on an assumption that if the original
height of the tilde was about 2 pixels tall, the grid-fitted position would
probably be 2 pixels.  My method estimates the width of the tilde by
looking for the points at the tips of the tilde curves and measuring their
distance to the bounding box (see the attached image for a visualization),
then adding 1 pixel and scaling the contour vertically, anchored at its
lowest point, until it is at least that tall.

Testing with Liberation Sans, the sizes that had flat tildes before still
have flat tildes.  I can add more than 1 pixel to the measurement to
unflatten them, but that causes unnecessarily tall tildes at other sizes.
I have confirmed by commenting/uncommenting steps of the hinting process
that af_glyph_hints_align_edge_points is the function that snaps the tilde
back to flat, so if my understanding is correct, the points at the top and
bottom of the tilde are forming edges and are being rounded towards each
other, causing the tilde to remain flat.  How should I proceed?

I'll also push the code I have after removing the dead code and debug
prints that have built up.

On Mon, Jul 31, 2023 at 6:23 PM Werner LEMBERG  wrote:

>
> > I have an algorithm I'm testing for the tilde unflattening.  I went
> > with doing it before all steps, because it worked better with my
> > idea, but the function af_glyph_hints_align_weak_points is undoing
> > my changes.
>
> Hmm.  Have you checked in the OpenType specification how the IUP
> instruction works?
>
>
> https://learn.microsoft.com/en-us/typography/opentype/spec/tt_instructions#interpolate-untouched-points-through-the-outline
>
> Have you 'touched' the points you move (i.e., setting
> `AF_FLAG_TOUCH_Y`) so that `af_glyph_hints_align_weak_points` doesn't
> move them again?
>
>
> Werner
>


Re: Progress update on adjustment database

2023-07-31 Thread Werner LEMBERG


> I have an algorithm I'm testing for the tilde unflattening.  I went
> with doing it before all steps, because it worked better with my
> idea, but the function af_glyph_hints_align_weak_points is undoing
> my changes.

Hmm.  Have you checked in the OpenType specification how the IUP
instruction works?

  
https://learn.microsoft.com/en-us/typography/opentype/spec/tt_instructions#interpolate-untouched-points-through-the-outline

Have you 'touched' the points you move (i.e., setting
`AF_FLAG_TOUCH_Y`) so that `af_glyph_hints_align_weak_points` doesn't
move them again?


Werner



Re: Progress update on adjustment database

2023-07-31 Thread Craig White
I have an algorithm I'm testing for the tilde unflattening.  I went with
doing it before all steps, because it worked better with my idea, but the
function af_glyph_hints_align_weak_points is undoing my changes.
I can tell by measuring the height of the tilde after each step of the
latin hinting process.
Here's an example from one test:
before tilde unflattening: 109 units
after: 124 units
after af_latin_hint_edges: 124 units
after af_glyph_hints_align_edge_points: 146 units
after af_glyph_hints_align_strong_points: 146 units
after af_glyph_hints_align_weak_points: 59 units

Even setting oy = y for all the points doesn't stop it, nor does making an
extreme change like raising all points by 200.  How do I fix this?

On Mon, Jul 31, 2023 at 12:51 AM Werner LEMBERG  wrote:

> > > Just take the extrema of *all* points – for fonts this should be
> > > good enough, because it is standard to have points at the curve
> > > extrema, thus making the bounding box of the curve identical to
> > > the bounding box of the points (both on and off points).
> >
> > Ok, thanks.  This is exactly what I needed.  I was already trying
> > this, but I thought it was wrong because of off points.
>
> For your information, you might also look at FreeType's function
> `FT_Glyph_Get_CBox`, which handles the generic case.
>
> > What I mean is: my code allows the adjustment for i to be applied to
> > glyphs with more than 2 contours to also adjust accented o and a, so
> > the rules you suggested would reject around half of the characters
> > that are currently being adjusted this way.  I think your rules can
> > still be enforced by treating the contour to be moved as "A" and all
> > other contours collectively as "B".
>
> Yeah, I was especially referring to glyph 'i', since here the problem
> is most visible.
>
> >> [...] I ask you to have this in mind to find a solution that can be
> >> easily extended to cover this situation, too (for example, by using
> >> an extendable structure instead of a plain variable).
> >
> > In this case, do you mean that instead of making a codepoint a key
> > for the database directly, I should wrap it in a struct so that
> > other kinds of keys can be added?
>
> Something like this, yes.  Just try to be generic, and think of
> possible extensions.
>
>
>Werner
>


Re: Progress update on adjustment database

2023-07-30 Thread Werner LEMBERG
> > Just take the extrema of *all* points – for fonts this should be
> > good enough, because it is standard to have points at the curve
> > extrema, thus making the bounding box of the curve identical to
> > the bounding box of the points (both on and off points).
> 
> Ok, thanks.  This is exactly what I needed.  I was already trying
> this, but I thought it was wrong because of off points.

For your information, you might also look at FreeType's function
`FT_Glyph_Get_CBox`, which handles the generic case.

> What I mean is: my code allows the adjustment for i to be applied to
> glyphs with more than 2 contours to also adjust accented o and a, so
> the rules you suggested would reject around half of the characters
> that are currently being adjusted this way.  I think your rules can
> still be enforced by treating the contour to be moved as "A" and all
> other contours collectively as "B".

Yeah, I was especially referring to glyph 'i', since here the problem
is most visible.

>> [...] I ask you to have this in mind to find a solution that can be
>> easily extended to cover this situation, too (for example, by using
>> an extendable structure instead of a plain variable).
> 
> In this case, do you mean that instead of making a codepoint a key
> for the database directly, I should wrap it in a struct so that
> other kinds of keys can be added?

Something like this, yes.  Just try to be generic, and think of
possible extensions.


   Werner


Re: Progress update on adjustment database

2023-07-30 Thread Craig White
> What exactly do you mean with 'find'?  The algorithm?  Just take the
extrema of *all* points – for fonts this should be good enough,
because it is standard to have points at the curve extrema, thus
making the bounding box of the curve identical to the bounding box of
the points (both on and off points).

Ok, thanks.  This is exactly what I needed.  I was already trying this, but
I thought it was wrong because of off points.

> What exact rule are you referring to?  My rule #3
> doesn't seem to fit what you are talking about...

Sorry for being unclear.
What I mean is: my code allows the adjustment for i to be applied to glyphs
with more than 2 contours to also adjust accented o and a, so the rules you
suggested would reject around half of the characters that are currently
being adjusted this way.  I think your rules can still be enforced by
treating the contour to be moved as "A" and all other contours collectively
as "B".

> No, I'm not, but I ask you to have this in mind to find a solution
> that can be easily extended to cover this situation, too (for example,
> by using an extendable structure instead of a plain variable).

In this case, do you mean that instead of making a codepoint a key for the
database directly, I should wrap it in a struct so that other kinds of keys
can be added?
That sounds like a good solution.

On Sun, Jul 30, 2023 at 4:14 AM Werner LEMBERG  wrote:

>
> Hello Craig,
>
>
> again sorry for the late reply.
>
> > During this time, I realized that knowing the bounding box of the
> > tilde contour would help a lot.  In fact, the logic for the other
> > vertical separation adjustments assumes that it can get the bounding
> > box by taking the minimum/maximum coordinates of all the points, but
> > this doesn't work because of off-points, which I didn't consider at
> > the time.  How do you find this bounding box?
>
> What exactly do you mean with 'find'?  The algorithm?  Just take the
> extrema of *all* points – for fonts this should be good enough,
> because it is standard to have points at the curve extrema, thus
> making the bounding box of the curve identical to the bounding box of
> the points (both on and off points).
>
> > I should note that, in your example, check #3 is too restrictive.
> > The logic allows for the bottom shape that needs to be separated to
> > be made up of any number of contours, which allows it to work for
> > characters with more complex shapes.
>
> What exact rule are you referring to?  My rule #3 was
>
>   (3) All points of A are lower than all points of B (or vice versa).
>
> which doesn't seem to fit what you are talking about...
>
> > I want to clarify: are you adding glyph names in the database as a
> > requirement for the project?
>
> No, I'm not, but I ask you to have this in mind to find a solution
> that can be easily extended to cover this situation, too (for example,
> by using an extendable structure instead of a plain variable).
>
>
> Werner
>


Re: Progress update on adjustment database

2023-07-30 Thread Werner LEMBERG

Hello Craig,


again sorry for the late reply.

> During this time, I realized that knowing the bounding box of the
> tilde contour would help a lot.  In fact, the logic for the other
> vertical separation adjustments assumes that it can get the bounding
> box by taking the minimum/maximum coordinates of all the points, but
> this doesn't work because of off-points, which I didn't consider at
> the time.  How do you find this bounding box?

What exactly do you mean with 'find'?  The algorithm?  Just take the
extrema of *all* points – for fonts this should be good enough,
because it is standard to have points at the curve extrema, thus
making the bounding box of the curve identical to the bounding box of
the points (both on and off points).

> I should note that, in your example, check #3 is too restrictive.
> The logic allows for the bottom shape that needs to be separated to
> be made up of any number of contours, which allows it to work for
> characters with more complex shapes.

What exact rule are you referring to?  My rule #3 was

  (3) All points of A are lower than all points of B (or vice versa).

which doesn't seem to fit what you are talking about...

> I want to clarify: are you adding glyph names in the database as a
> requirement for the project?

No, I'm not, but I ask you to have this in mind to find a solution
that can be easily extended to cover this situation, too (for example,
by using an extendable structure instead of a plain variable).


Werner


Re: Progress update on adjustment database

2023-07-25 Thread Craig White
Hey, I've been working on the tilde correction over the last few days.
During this time, I realized that knowing the bounding box of the tilde
contour would help a lot.  In fact, the logic for the other vertical
separation adjustments assumes that it can get the bounding box by taking
the minimum/maximum coordinates of all the points, but this doesn't work
because of off-points, which I didn't consider at the time.  How do you
find this bounding box?

Also, thanks for clarifying what your intended solution is for the GSUB
handling.  I should note that, in your example, check #3 is too
restrictive.  The logic allows for the bottom shape that needs to be
separated to be made up of any number of contours, which allows it to work
for characters with more complex shapes.
I want to clarify: are you adding glyph names in the database as a
requirement for the project?

On Fri, Jul 21, 2023 at 4:52 AM Werner LEMBERG  wrote:

>
> >> Many-to-one: accent marks, e.g. umlauts
> >> One-glyph-to-many-unicode-characters: ligatures, e.g. "ff", "fi".
> >
> > I don’t see how these two cases are different. An accented glyph like
> > ⟨ɑ̃⟩ is made up of two Unicode characters, ɑ+◌̃ (U+0251 U+0303);
> > similarly a ligated glyph like ⟨fi⟩ is made up of the two Unicode
> > characters f+i (U+0066 U+0069).
>
> The first case is usually handled in the GPOS table; the auto-hinter
> can only ignore this because the rendering of the components doesn't
> happen together but in succession.  See my other mail that gives both
> a many-to-one and a one-to-many example that might be part of GSUB.
>
>
> Werner
>


Re: Progress update on adjustment database

2023-07-21 Thread Werner LEMBERG

>> Many-to-one: accent marks, e.g. umlauts
>> One-glyph-to-many-unicode-characters: ligatures, e.g. "ff", "fi".
> 
> I don’t see how these two cases are different. An accented glyph like
> ⟨ɑ̃⟩ is made up of two Unicode characters, ɑ+◌̃ (U+0251 U+0303);
> similarly a ligated glyph like ⟨fi⟩ is made up of the two Unicode
> characters f+i (U+0066 U+0069).

The first case is usually handled in the GPOS table; the auto-hinter
can only ignore this because the rendering of the components doesn't
happen together but in succession.  See my other mail that gives both
a many-to-one and a one-to-many example that might be part of GSUB.


Werner


Re: Progress update on adjustment database

2023-07-21 Thread Brad Neimann



Many-to-one: accent marks, e.g. umlauts
One-glyph-to-many-unicode-characters: ligatures, e.g. "ff", "fi".


I don’t see how these two cases are different. An accented glyph like 
⟨ɑ̃⟩ is made up of two Unicode characters, ɑ+◌̃ (U+0251 U+0303); 
similarly a ligated glyph like ⟨fi⟩ is made up of the two Unicode 
characters f+i (U+0066 U+0069).


Regards,
Brad


Re: Progress update on adjustment database

2023-07-21 Thread Werner LEMBERG

>> A mapping from input Unicode characters to glyph indices based on
>> the GSUB + cmap tables and not on cmap alone.
>
> Right now, the only cases where the GSUB table is helpful that I am
> aware of, for the purposes of this project, are for handling glyph
> alternates and combining characters.  Those would be one-to-one
> mappings and many-to-one mappings, respectively.

Yes.

> Would this general solution involve other kinds of GSUB mappings?

Yes, but it won't affect your database format, which always contains
single Unicode input characters (or glyph names, see below).

> If so, it opens up edge cases such as: if a glyph on the "many" side
> of a many-to-one mapping needs a vertical separation adjustment,
> does the resulting glyph need it too?  This could be answered
> quickly by looking at the specific characters involved, but how
> would I answer this question in general?
>
> Even sticking to just many-to-one and one-to-one mappings, the
> adjustment database must make assumptions specific to the characters
> it's dealing with.

Yes, I think some kind of topological tests are needed.  For example,
the following assumptions should be checked for an 'i'-like shape:

(1) There are two contours A and B.
(2) Both contours have the same direction.
(3) All points of A are lower than all points of B (or vice versa).
(4) There is some horizontal overlap between A and B.

> In the case of combining characters, a separate database table is
> required because the existing table is a list of unicode characters
> and the actions that should be applied to them, while a glyph
> resulting from a combining character might not be a unicode
> character.

I don't think you need another database.  Let's discuss an 'fi'
ligature example.

* Analyzing the GSUB table shows that it is the result of two input
  characters, 'f' and 'i'.
* 'f' doesn't occur in your database and can be thus ignored if it is
  about to be queried.
* 'i' appears in your database, so check whether glyph 'fi' satisfies
  the constraints for an 'i'-like shape (as described above): No, it
  doesn't, since it either is a single contour, failing condition (1),
  or if there are two contours, failing condition (3).

It would be beneficial if your database could accept also glyph names
in addition to Unicode input character codes – glyph names, if
present, follow the Adobe Glyph List conventions today, which means
that they are standardized to a certain extent.  For example, there
might be an entry for glyph name 'fi' that ensures that on the right
side of the glyph there is vertical separation between the bottom and
top shape (i.e., the merged 'i').
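A possible shape for such dual-keyed entries, keyed on either a Unicode
code point or an AGL-style glyph name (the `DbEntry` type, the sample
entries, and the action strings are all made up for this sketch; the real
database format is up to Craig):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct
{
  unsigned long  codepoint;   /* 0 if the entry is name-keyed    */
  const char*    glyph_name;  /* NULL if keyed by code point     */
  const char*    action;      /* illustrative action identifier  */
} DbEntry;

static const DbEntry  db[] =
{
  { 0x0069, NULL, "ensure-vertical-separation" },  /* 'i'           */
  { 0,      "fi", "ensure-vertical-separation" },  /* 'fi' ligature */
};

/* Look up an entry by code point (if non-zero) or glyph name
   (if non-NULL); return its action, or NULL if not found. */
static const char*
db_lookup( unsigned long  code,
           const char*    name )
{
  size_t  i;

  for ( i = 0; i < sizeof ( db ) / sizeof ( db[0] ); i++ )
  {
    if ( code && db[i].codepoint == code )
      return db[i].action;
    if ( name && db[i].glyph_name         &&
         strcmp( db[i].glyph_name, name ) == 0 )
      return db[i].action;
  }
  return NULL;
}
```

With this, the 'fi' glyph would be found via `db_lookup( 0, "fi" )` even
though it has no single Unicode code point of its own.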

Another example, this time a one-to-many mapping: Let's assume that
glyph U+2173, SMALL ROMAN NUMERAL FOUR, gets mapped in the GSUB table
to two glyphs, 'v.numeral' and 'i.numeral' (but 'i' is not mapped to
'i.numeral' or vice versa because of incompatible horizontal metrics).
Let's further assume that glyph 'i.numeral' is about to be queried.

* According to the GSUB table, 'i.numeral' is part of U+2173.
* U+2173 is in your database, and the constraints for considering it
  are the same as for the 'i' entry, which are met here.

> As for the tilde correction, I'll try doing it after the grid
> fitting like you recommended.

OK.


Werner


Re: Progress update on adjustment database

2023-07-21 Thread Hin-Tak Leung
 On Friday, 21 July 2023 at 07:28:07 BST, Craig White  wrote:

> ...Those would be one-to-one mappings and many-to-one mappings, respectively.
>  Would this general solution involve other kinds of GSUB mappings?  ...

> Even sticking to just many-to-one and one-to-one mappings, ...

Having too exclusively a Western European language background might be a
blind spot...  I haven't followed the conversations too closely, but the
3rd scenario you try to ignore has a very common name: ligature, i.e.,
special glyphs for combinations like "fi" and "ff".  To recap, common
usages:

One-to-one: alternates
Many-to-one: accent marks, e.g. umlauts
One-glyph-to-many-unicode-characters: ligatures, e.g. "ff", "fi".

Re: Progress update on adjustment database

2023-07-21 Thread Craig White
> Well, there are no 'unclear goals': the general solution is *exactly*
> what I was talking about all the time, and what you need for any entry
> in the adjustment database: A mapping from input Unicode characters to
> glyph indices based on the GSUB + cmap tables and not on cmap alone.

Right now, the only cases where the GSUB table is helpful that I am aware
of, for the purposes of this project, are for handling glyph alternates and
combining characters.  Those would be one-to-one mappings and many-to-one
mappings, respectively.  Would this general solution involve other kinds of
GSUB mappings?  If so, it opens up edge cases such as: if a glyph on the
"many" side of a many-to-one mapping needs a vertical
separation adjustment, does the resulting glyph need it too?  This could be
answered quickly by looking at the specific characters involved, but how
would I answer this question in general?
Even sticking to just many-to-one and one-to-one mappings, the adjustment
database must make assumptions specific to the characters it's dealing
with.  In the case of combining characters, a separate database table is
required because the existing table is a list of Unicode characters and the
actions that should be applied to them, while a glyph resulting from a
combining character might not be a Unicode character.  Even if I assumed it
was, listing all characters possibly resulting from a combining character
is inefficient.  Instead, only a table with a few entries is needed: the
combining character's codepoint and what action should be applied.  This is
something I started on before this conversation, and this is an example of
how the use case affects the structure of the database.
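Such a table might look roughly like this (the type names, the
`ADJUST_*` actions, and the sample entries are purely illustrative):

```c
#include <assert.h>
#include <stddef.h>

typedef enum
{
  ADJUST_NONE,
  ADJUST_ENSURE_VSEP   /* keep mark vertically separated from base */
} AdjustAction;

typedef struct
{
  unsigned long  codepoint;   /* the *combining* character */
  AdjustAction   action;      /* applied to composed glyphs */
} CombiningEntry;

static const CombiningEntry  combining_table[] =
{
  { 0x0303, ADJUST_ENSURE_VSEP },  /* COMBINING TILDE     */
  { 0x0308, ADJUST_ENSURE_VSEP },  /* COMBINING DIAERESIS */
};

/* Return the action for a combining character, or ADJUST_NONE. */
static AdjustAction
combining_action( unsigned long  codepoint )
{
  size_t  i;

  for ( i = 0;
        i < sizeof ( combining_table ) / sizeof ( combining_table[0] );
        i++ )
    if ( combining_table[i].codepoint == codepoint )
      return combining_table[i].action;
  return ADJUST_NONE;
}
```

The point is that the table is keyed on the handful of combining
characters, not on the open-ended set of glyphs that can result from them.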

Without knowing what future use cases should be easier to implement because
of a generic solution, I don't know what flavor of generic is required.

As for the tilde correction, I'll try doing it after the grid fitting like
you recommended.


Re: Progress update on adjustment database

2023-07-20 Thread Werner LEMBERG


> Right now, I don't understand the needs well enough to know what to
> generalize or to test whether the general solution works well
> enough.  I'd rather start with specific use cases rather than a
> general solution with unclear goals.

Well, there are no 'unclear goals': the general solution is *exactly*
what I was talking about all the time, and what you need for any entry
in the adjustment database: A mapping from input Unicode characters to
glyph indices based on the GSUB + cmap tables and not on cmap alone.

>> Right now, I favor the latter: It should be a last-minute action,
>> similar to TrueType's `DELTAP[123]` bytecode instructions.
> 
> I disagree with doing the adjustment after grid fitting because in
> this case, grid fitting is a destructive action.  Doing it after
> would require taking a flat line and adding the wiggle back in,

Yes, but you have all the available data because you can access the
original glyph shape.  In other words, you can exactly control which
points to move.

> possibly in a way that doesn't match the font.

At the resolutions we are talking about this absolutely doesn't
matter, I think.  Essentially you have to make

  xxxx

appear as

   x x
  x x

> It sounds easier to prevent that from happening in the first place.

OK, give it a try.  Rounding information is available also, so this
might work as well.


Werner



Re: Progress update on adjustment database

2023-07-20 Thread Craig White
> Probably yes, but who knows.  It would be nice to have a generic
> solution that completely covers the whole situation, and we never have
> to think about it again.

Right now, I don't understand the needs well enough to know what to
generalize or to test whether the general solution works well enough.  I'd
rather start with specific use cases rather than a general solution with
unclear goals.

> This leads to the basic question: Shall the correction be applied
> before or after the grid fitting?  Right now, I favor the latter: It
> should be a last-minute action, similar to TrueType's `DELTAP[123]`
> bytecode instructions.

I disagree with doing the adjustment after grid fitting because in this
case, grid fitting is a destructive action.  Doing it after would require
taking a flat line and adding the wiggle back in, possibly in a way that
doesn't match the font.  It sounds easier to prevent that from happening in
the first place.

On Thu, Jul 20, 2023 at 1:02 PM Werner LEMBERG  wrote:

>
> > Since hinting glyphs that are descendants of combining characters
> > will help few fonts, what other ways does the database need to use
> > the GSUB table?  The only other use case I'm aware of are one to one
> > substitutions providing alternate forms of a glyph.
>
> Probably yes, but who knows.  It would be nice to have a generic
> solution that completely covers the whole situation, and we never have
> to think about it again.
>
> > As for the tilde un-flattening, the approach I'm thinking of is to
> > force the tilde to be at least 2 pixels tall before grid fitting
> > begins.  Would this ever cause the tilde to be 3 pixels because of
> > rounding?
>
> This leads to the basic question: Shall the correction be applied
> before or after the grid fitting?  Right now, I favor the latter: It
> should be a last-minute action, similar to TrueType's `DELTAP[123]`
> bytecode instructions.
>
>
> https://learn.microsoft.com/en-us/typography/opentype/spec/tt_instructions#managing-exceptions
>
> In other words, if a tilde character's wiggle (not the whole tilde's
> vertical size!) is detected to be only 1px high, the shape should be
> aggressively distorted vertically to make the wiggle span two pixels.
> To do this, some code has to be written to detect the inflection
> and extremum points of the upper and lower wiggle of the outline; only
> the extrema are then to be moved vertically.
>
>
> Werner
>


Re: Progress update on adjustment database

2023-07-20 Thread Werner LEMBERG


> Since hinting glyphs that are descendants of combining characters
> will help few fonts, what other ways does the database need to use
> the GSUB table?  The only other use case I'm aware of are one to one
> substitutions providing alternate forms of a glyph.

Probably yes, but who knows.  It would be nice to have a generic
solution that completely covers the whole situation, and we never have
to think about it again.

> As for the tilde un-flattening, the approach I'm thinking of is to
> force the tilde to be at least 2 pixels tall before grid fitting
> begins.  Would this ever cause the tilde to be 3 pixels because of
> rounding?

This leads to the basic question: Shall the correction be applied
before or after the grid fitting?  Right now, I favor the latter: It
should be a last-minute action, similar to TrueType's `DELTAP[123]`
bytecode instructions.

  
https://learn.microsoft.com/en-us/typography/opentype/spec/tt_instructions#managing-exceptions

In other words, if a tilde character's wiggle (not the whole tilde's
vertical size!) is detected to be only 1px high, the shape should be
aggressively distorted vertically to make the wiggle span two pixels.
To do this, some code has to be written to detect the inflection
and extremum points of the upper and lower wiggle of the outline; only
the extrema are then to be moved vertically.
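As a rough sketch of the "aggressive vertical distortion" (operating on a
sampled mid-line of the tilde rather than real outline points; the
function and constant names are hypothetical, and coordinates are in 26.6
fixed-point, so two pixels are 128 units):

```c
#include <assert.h>

#define MIN_WIGGLE  128   /* two pixels in 26.6 fixed-point */

/* Push the local extrema of a sampled wiggle away from its
   mid-height until the wiggle spans at least MIN_WIGGLE units.
   Interior points only; end points stay fixed. */
static void
expand_wiggle( long*  y,
               int    n )
{
  long  lo = y[0], hi = y[0], mid;
  int   i;

  for ( i = 1; i < n; i++ )
  {
    if ( y[i] < lo )  lo = y[i];
    if ( y[i] > hi )  hi = y[i];
  }

  if ( hi - lo >= MIN_WIGGLE )
    return;                       /* already tall enough */

  mid = ( hi + lo ) / 2;

  for ( i = 1; i < n - 1; i++ )
  {
    if ( y[i] >= y[i - 1] && y[i] >= y[i + 1] )       /* local maximum */
      y[i] = mid + MIN_WIGGLE / 2;
    else if ( y[i] <= y[i - 1] && y[i] <= y[i + 1] )  /* local minimum */
      y[i] = mid - MIN_WIGGLE / 2;
  }
}

/* Demo: a one-pixel (64-unit) wiggle gets stretched to two pixels. */
static long
demo_wiggle_height( void )
{
  long  y[5] = { 0, 32, 0, -32, 0 };
  long  lo, hi;
  int   i;

  expand_wiggle( y, 5 );

  lo = hi = y[0];
  for ( i = 1; i < 5; i++ )
  {
    if ( y[i] < lo )  lo = y[i];
    if ( y[i] > hi )  hi = y[i];
  }
  return hi - lo;
}
```

Real code would first locate the inflection and extremum points on the
actual contours, as described above, and move only those points.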


Werner



Re: Progress update on adjustment database

2023-07-20 Thread Craig White
Thanks!  This even answers some questions I was thinking about, but hadn't
asked.
I was wondering why I couldn't find any GSUB entries for combining
characters.  In one font I dumped with ttx, there were entries doing the
opposite: mapping 'aacute' -> 'a' + 'acute'.

Since hinting glyphs that are descendants of combining characters will help
few fonts, what other ways does the database need to use the GSUB table?
The only other use case I'm aware of is one-to-one substitutions providing
alternate forms of a glyph.

As for the tilde un-flattening, the approach I'm thinking of is to force
the tilde to be at least 2 pixels tall before grid fitting begins.  Would
this ever cause the tilde to be 3 pixels because of rounding?

On Thu, Jul 20, 2023 at 3:21 AM Werner LEMBERG  wrote:

>
> > The next thing I'm doing for the adjustment database is making
> > combining characters work.  Currently, only precomposed characters
> > will be adjusted.  If my understanding is correct, this would mean
> > finding any lookups that map a character + combining character onto
> > a glyph, then applying the appropriate adjustments to that glyph.
>
> Yes.  I suggest that you use the `ttx` decompiler from fonttools and
> analyse the contents of a GSUB table of your favourite font.
>
>   https://pypi.org/project/fonttools/
>
> At the same time, use the `ftview` FreeType demo program with an
> appropriate `FT2_DEBUG` setting so that you can see what the current
> HarfBuzz code does for the given font.  Examples:
>
> ```
> ttx -t GSUB arial.ttf
> FT2_DEBUG="afshaper:7 afglobal:7 -v" \
>   ftview -l 2 -kq arial.ttf &> arial.log
> ```
>
> Option `-l 2` selects 'light' hinting (i.e., auto-hinting), `-kq`
> emulates the 'q' keypress (i.e., quitting immediately).  See appended
> files for `arial.ttf` version 7.00.
>
> In `arial.log`, the information coming from the 'afshaper' component
> tells you the affected GSUB lookups; this helps poking around in the
> XML data as produced by `ttx`.  The 'afglobal' information tells you
> the glyph indices covering a given script and feature (start with
> 'latn_dflt').
>
> You might also try a font editor of your choice (for example,
> FontForge, menu entry 'View->Show ATT') to further analyze how the
> GSUB data is constructed, and to get some visual feeling on what's
> going on.
>
> > Right now, I'm trying to figure out what features I need to look
> > inside to find these lookups.  Should I just search all features?
>
> Yes, I think so.  Since the auto-hinter is agnostic to the script and
> the used language, you have to have all information in advance.
>
> > After that, I'm going to tackle the tilde-flattening issue, and any
> > other similar marks that are getting flattened.
>
> Note that in most fonts you won't find any GSUB data for common
> combinations like 'a' + 'acute' -> 'aacute'.  Usually, such stuff gets
> handled by the GPOS table, i.e., instead of mapping two glyphs to
> another single one, the accent gets moved to a better position.  In
> this case, the glyphs are rendered separately, *outside of FreeType's
> scope*.  This means that we can't do anything on the auto-hinter side
> to optimize the distance between the base and the accent glyph (see
> also the comment in file `afshaper.c` starting at line 308, and this
> nice article
> https://learn.microsoft.com/en-us/typography/develop/processing-part1).
>
> It thus probably makes sense to do the tilde stuff first.
>
>
> Werner
>


Re: Progress update on adjustment database

2023-07-20 Thread Werner LEMBERG

> The next thing I'm doing for the adjustment database is making
> combining characters work.  Currently, only precomposed characters
> will be adjusted.  If my understanding is correct, this would mean
> finding any lookups that map a character + combining character onto
> a glyph, then applying the appropriate adjustments to that glyph.

Yes.  I suggest that you use the `ttx` decompiler from fonttools and
analyse the contents of a GSUB table of your favourite font.

  https://pypi.org/project/fonttools/

At the same time, use the `ftview` FreeType demo program with an
appropriate `FT2_DEBUG` setting so that you can see what the current
HarfBuzz code does for the given font.  Examples:

```
ttx -t GSUB arial.ttf
FT2_DEBUG="afshaper:7 afglobal:7 -v" \
  ftview -l 2 -kq arial.ttf &> arial.log
```

Option `-l 2` selects 'light' hinting (i.e., auto-hinting), `-kq`
emulates the 'q' keypress (i.e., quitting immediately).  See appended
files for `arial.ttf` version 7.00.

In `arial.log`, the information coming from the 'afshaper' component
tells you the affected GSUB lookups; this helps poking around in the
XML data as produced by `ttx`.  The 'afglobal' information tells you
the glyph indices covering a given script and feature (start with
'latn_dflt').

You might also try a font editor of your choice (for example,
FontForge, menu entry 'View->Show ATT') to further analyze how the
GSUB data is constructed, and to get some visual feeling on what's
going on.

> Right now, I'm trying to figure out what features I need to look
> inside to find these lookups.  Should I just search all features?

Yes, I think so.  Since the auto-hinter is agnostic to the script and
the used language, you have to have all information in advance.

> After that, I'm going to tackle the tilde-flattening issue, and any
> other similar marks that are getting flattened.

Note that in most fonts you won't find any GSUB data for common
combinations like 'a' + 'acute' -> 'aacute'.  Usually, such stuff gets
handled by the GPOS table, i.e., instead of mapping two glyphs to
another single one, the accent gets moved to a better position.  In
this case, the glyphs are rendered separately, *outside of FreeType's
scope*.  This means that we can't do anything on the auto-hinter side
to optimize the distance between the base and the accent glyph (see
also the comment in file `afshaper.c` starting at line 308, and this
nice article
https://learn.microsoft.com/en-us/typography/develop/processing-part1).

It thus probably makes sense to do the tilde stuff first.


Werner


arial-7.00.ttx.xz
Description: Binary data


arial-7.00.log.xz
Description: Binary data


Progress update on adjustment database

2023-07-19 Thread Craig White
The next thing I'm doing for the adjustment database is making combining
characters work.  Currently, only precomposed characters will be adjusted.
If my understanding is correct, this would mean finding any lookups that
map a character + combining character onto a glyph, then applying the
appropriate adjustments to that glyph.
Right now, I'm trying to figure out what features I need to look inside to
find these lookups.  Should I just search all features?

After that, I'm going to tackle the tilde-flattening issue, and any other
similar marks that are getting flattened.