[ft-devel] ttfautohint: fun with assembler code

2011-04-29 Thread Werner LEMBERG

I had hoped to show working bytecode stuff at the beginning of
May, but unfortunately things are a bit more complicated: While the
library has already been extended to support generated bytecode, I
first have to define some auxiliary bytecode functions to support
hinting similar to FreeType's autohinter.

On the other hand, I now have an outline of how things will work: I will
run the autohinter for a set of sizes; the results are then hardcoded as
bytecode.  Contrary to the autohinter, TrueType instructions can only
handle integer values for PPEM sizes; this greatly reduces the number of
configurations to test for.

Some weeks ago I adapted the autohinter logging stuff to standard
FreeType tracing (at level 5); now you can actually see the
`high-level hints' the autohinter is going to apply with the usual
tracing methods as outlined in FreeType's `DEBUG' documentation file.
As an example, here are the messages for vertical hinting of glyph `R'
from pala.ttf at 10ppem:

  BLUE: edge 0 (opos=0.00) snapped to (0.00), was (0.00)
  LINK: edge 1 (opos=0.30) linked to (0.30), dist was 0.30, now 0.30
  BLUE: edge 4 (opos=7.34) snapped to (8.00), was (7.34)
  LINK: edge 3 (opos=6.97) linked to (7.62), dist was -0.38, now -0.38
  SERIF_LINK1: edge 2 (opos=3.95) snapped to (4.31) from 1 (opos=0.30)

I'll skip a detailed explanation here; you might read the comments in
file `afhints.h' for more.

The most important fact which can be seen from the above output is
that `LINK' commands don't snap to pixel borders.  This is
intentional.  Except for blue zones where either the top or the bottom
line of a stem gets aligned with a pixel border (the width of small
stems gets increased if necessary), the autohinter aligns the *center*
of stems, applying some rounding voodoo to the stem width.  To imitate
this behaviour, I'm diving right now into the world of writing
bytecode assembler code.

Here is an example.  The following code is taken from
`af_latin_compute_stem_width', transformed into some pseudo code which
can be converted more easily to bytecode:

  Function: compute_stem_width

    in: width
        is_serif
        is_round
    out: new_width
    CVT: std_width

    dist = ABS(width)

    if is_serif
       && dist < 3*64:
      return width
    else if is_round:
      if dist < 80:
        dist = 64
    else:
      dist = MAX(56, dist)

    delta = ABS(dist - std_width)

    if delta < 40:
      dist = MAX(48, std_width)
      goto End

    if dist < 3*64:
      delta = dist
      dist = FLOOR(dist)
      delta = delta - dist

      if delta < 10:
        dist = dist + delta
      else if delta < 32:
        dist = dist + 10
      else if delta < 54:
        dist = dist + 54
      else:
        dist = dist + delta
    else:
      dist = ROUND(dist)

  End:
    if width < 0:
      dist = -dist
    return dist

This corresponds to the following (still untested) bytecode:

  // In the comments below, the top of the stack (`s:')
  // is the rightmost element.

  // Function 0: compute_stem_width
  //
  // in: width
  //     is_serif
  //     is_round
  // out: new_width
  // CVT: std_width

  0xB0, // PUSHB_1
  0x00, //   0
  0x2C, // FDEF

  0x20, // DUP
  0x64, // ABS -- s: is_round is_serif width dist

  0x20, // DUP
  0xB0, // PUSHB_1
  0xC0, //   3*64
  0x50, // LT -- (dist < 3*64)

  0xB0, // PUSHB_1
  0x04, //   4
  0x26, // MINDEX -- s: is_round width dist (dist<3*64) is_serif

  0x5A, // AND -- (is_serif && dist < 3*64)
  0x58, // IF -- s: is_round width dist
  0x21, //   POP
  0x23, //   SWAP
  0x21, //   POP -- s: width

  0x1B, // ELSE
  0x8A, //   ROLL -- s: width dist is_round
  0x58, //   IF -- s: width dist
  0x20, //     DUP
  0xB0, //     PUSHB_1
  0x50, //       80
  0x50, //     LT -- (dist < 80)
  0x58, //     IF -- s: width dist
  0x21, //       POP
  0xB0, //       PUSHB_1
  0x40, //         64 -- dist = 64
  0x59, //     EIF

  0x1B, //   ELSE
  0xB0, //     PUSHB_1
  0x38, //       56
  0x8B, //     MAX -- dist = max(56, dist)
  0x59, //   EIF

  0x20, //   DUP -- s: width dist dist
  0xB0, //   PUSHB_1
  %c,   //     index of std_width
  0x45, //   RCVT
  0x61, //   SUB
  0x64, //   ABS -- s: width dist delta

  0xB0, //   PUSHB_1
  0x28, //     40
  0x50, //   LT -- (delta < 40)
  0x58, //   IF -- s: width dist
  0x21, //     POP
  0xB1, //     PUSHB_2
  0x30, //       48
  %c,   //       index of std_width
  0x45, //     RCVT
  0x8B, //     MAX -- dist = max(48, std_width)

  0x1B, //   ELSE
  0x20, //     DUP -- s: width dist dist
  0xB0, //     PUSHB_1
  0xC0, //       3*64
  0x50, //     LT -- (dist < 3*64)
  0x58, //     IF
  0x20, //       DUP -- s: width delta dist
  0x66, //       FLOOR -- dist = FLOOR(dist)
  0x20, //       DUP -- s: width delta dist dist
  0x8A, //       ROLL
  0x8A, //       ROLL -- s: width dist delta dist
  0x61, //       SUB -- delta = delta - dist

  0x20, //       DUP -- s: width dist delta delta
  0xB0, //       PUSHB_1
  0x0A, //         10
  0x50, //       LT -- (delta <

Re: [ft-devel] benchmark of sfnt checksum recalculation

2011-04-29 Thread mpsuzuki
Here is a 2nd test case, checking ca. 1900 fonts (as distributed in Debian
GNU/Linux). The cumulative latency is small again, but the average
latency to recalculate the checksum is 0.84 microsec, longer
than in the 1st test case.

0.60 microsec is spent seeking to the table (tt_face_goto_table()), and
0.23 microsec is spent on the mathematical work (tt_synth_sfnt_checksum()).
Even if I wrote a faster assembly version of the mathematical work,
I could not reduce the latency to half. Reducing the number of
recalculations would be more important.

Compared with the number of fonts that include cvt/fpgm/prep tables
(or the number of calls to the tricky font checker, 1731), the number of
calls to the checksum calculator (tt_get_sfnt_checksum(), 44) is small.
Comparing the table length before comparing the table checksum (as in
the current implementation) is effective in reducing the number of
checksum recalculations.

TESTCASE 2)
===

Environment is as follows:
  CPU:  Centrino Duo
  RAM:  2GB
  FreeType2 configuration:
    env \
    CFLAGS="-g3 -ggdb -p -pg -fkeep-inline-functions -DFT_DEBUG_MEMORY=7" \
    ./configure --disable-shared

Fonts:
  1916 fonts (including non-TrueType) in attached list.
  1127/1916 fonts have cvt  table
  1019/1916 fonts have fpgm table
  1071/1916 fonts have prep table

Profile result
--
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 26.37      0.24     0.24    20873    11.50    11.50  FT_Stream_ReadULong
 10.99      0.34     0.10   106868     0.94     0.98  FT_Stream_ReadFields
  8.79      0.42     0.08     1791    44.67    45.50  tt_face_load_hmtx
  7.69      0.49     0.07  1091109     0.06     0.06  FT_Stream_GetUShort
  5.49      0.54     0.05  1858500     0.03     0.03  FT_Stream_GetULong
  4.40      0.58     0.04   239967     0.17     1.27  Load_SBit_Range
  4.40      0.62     0.04     3124    12.80    12.80  tt_cmap4_validate
  4.40      0.66     0.04     1731    23.11    23.74  tt_face_load_kern
  4.40      0.70     0.04      182   219.78   219.78  tt_cmap12_validate
  3.30      0.73     0.03   538286     0.06     0.06  FT_Stream_EnterFrame
  3.30      0.76     0.03     1731    17.33   253.10  tt_face_load_eblc
  2.20      0.78     0.02    34692     0.58     0.60  tt_face_goto_table
  1.65      0.80     0.01   583348     0.03     0.03  ft_mem_free
  1.10      0.81     0.01   531236     0.02     0.02  FT_Stream_ExitFrame
  1.10      0.81     0.01   277428     0.04     0.04  FT_Stream_Seek
  1.10      0.82     0.01   263558     0.04     0.04  ft_mem_alloc
  1.10      0.83     0.01   247736     0.04     0.08  ft_mem_realloc
  1.10      0.84     0.01    22503     0.44     0.44  ft_service_list_lookup
  1.10      0.85     0.01     6925     1.44     1.44  FT_Get_Module
  1.10      0.86     0.01     1731     5.78     6.22  cid_get_interface
  1.10      0.88     0.01     1731     5.78    52.33  tt_face_build_cmaps
  1.10      0.89     0.01     1731     5.78    15.29  tt_face_free_eblc
  1.10      0.90     0.01     1731     5.78     5.78  tt_face_free_ps_names
  0.55      0.90     0.01   263559     0.02     0.02  ft_free
  0.55      0.91     0.01     1731     2.89     2.89  ft_close_stream_by_munmap

[snip]

  0.00      0.91     0.00     1731     0.00     0.02  tt_check_trickyness
  0.00      0.91     0.00     1731     0.00     0.00  tt_check_trickyness_family
  0.00      0.91     0.00     1731     0.00     0.02  tt_check_trickyness_sfnt_ids

[snip]

  0.00      0.91     0.00       44     0.00     0.84  tt_get_sfnt_checksum
  0.00      0.91     0.00       44     0.00     0.23  tt_synth_sfnt_checksum
  0.00      0.91     0.00       35     0.00     0.00  tt_cmap2_validate

___
Freetype-devel mailing list
Freetype-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/freetype-devel


[ft-devel] benchmark of sfnt checksum recalculation

2011-04-29 Thread mpsuzuki
On Thu, 28 Apr 2011 19:07:22 +0900
mpsuz...@hiroshima-u.ac.jp wrote:
>Anyway, my assumption might be wrong. I should check the
>cost of checksum recalculation by some benchmarks...

I have now started a preliminary benchmark of the
extra latency incurred when we ignore the predefined
checksums and calculate them ourselves.

It seems that the extra latency of checksum recalculation
in FT_New_Face() is not fatal. The share of time spent
loading and validating the hmtx and cmap tables is far larger.

However, the latency depends on the size of the
cvt/fpgm/prep tables, so a more detailed discussion is needed.

Regards,
mpsuzuki


TESTCASE 1)
===

Environment is as follows:
  CPU:  Centrino Duo
  RAM:  2GB
  FreeType2 configuration:
    env \
    CFLAGS="-g3 -ggdb -p -pg -fkeep-inline-functions -DFT_DEBUG_MEMORY=7" \
    ./configure --disable-shared
  Fonts:
    Arphic (Taiwanese) TrueType fonts in Debian package
      bkai00mp.ttf, bsmi00lp.ttf,
      gbsn00lp.ttf, gkai00mp.ttf,
      ukai.ttc, uming.ttc
  Benchmark:
    repeat FT_New_Face() and FT_Done_Face() by sample program (attached)
    all sample fonts are opened/closed 1000 times.

Profile result
--
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 27.08      5.24     5.24    12000     0.44     0.44  tt_face_load_hmtx
 18.76      8.87     3.63     1392     0.00     0.00  FT_Stream_ReadFields
  9.56     10.72     1.85 55404000     0.00     0.00  FT_Stream_GetUShort
  7.86     12.24     1.52     8000     0.19     0.19  tt_cmap4_validate
  6.30     13.46     1.22 33448000     0.00     0.00  FT_Stream_GetULong
  6.10     14.64     1.18 27927000     0.00     0.00  FT_Stream_EnterFrame
  4.86     15.58     0.94     6000     0.16     1.75  tt_face_load_eblc
  4.19     16.39     0.81     4000     0.20     0.20  tt_cmap12_validate
  3.98     17.16     0.77 13552000     0.00     0.00  FT_Stream_ReadULong
  2.43     17.63     0.47     6000     0.08     0.11  tt_face_free_eblc
  1.65     17.95     0.32 13839000     0.00     0.00  Load_SBit_Range
  1.55     18.25     0.30 27907000     0.00     0.00  FT_Stream_ExitFrame
  1.40     18.52     0.27 13992000     0.00     0.00  FT_Stream_Seek
  0.88     18.69     0.17 28088086     0.00     0.00  ft_mem_free
  0.67     18.82     0.13 13534000     0.00     0.00  Load_SBit_Const_Metrics

[snip]

  0.00     19.35     0.00    12000     0.00     0.00  tt_face_load_hhea
  0.00     19.35     0.00     8000     0.00     0.00  tt_get_sfnt_checksum
  0.00     19.35     0.00     8000     0.00     0.00  tt_synth_sfnt_checksum
  0.00     19.35     0.00     6000     0.00     0.00  tt_check_trickyness
  0.00     19.35     0.00     6000     0.00     0.00  tt_check_trickyness_family
  0.00     19.35     0.00     6000     0.00     0.00  tt_check_trickyness_sfnt_ids

[snip]
#include <stdio.h>   /* fprintf(), printf() */
#include <stdlib.h>  /* exit() */

#include <ft2build.h>
#include FT_FREETYPE_H
#include FT_SYSTEM_H

int main( int     argc,
          char**  argv )
{
  FT_Error    error;
  FT_Library  library;
  FT_Face     face;
  int         i;


  error = FT_Init_FreeType( &library );
  if ( error )
    exit( -2 );

  if ( argc < 2 )
  {
    fprintf( stderr, "1 argument is required\n" );
    exit( -3 );
  }

  /* open and close the first face of every font given on the command line */
  for ( i = 1; i < argc ; i++ )
  {
    error = FT_New_Face( library, argv[i], 0, &face );
    if ( error )
    {
      printf( "cannot open any face from %s\n", argv[i] );
    }
    else
    {
      printf( "opened a face successfully from %s\n", argv[i] );
      error = FT_Done_Face( face );
      if ( error )
        exit( -7 );
    }
  }

  error = FT_Done_FreeType( library );
  if ( error )
    exit( -8 );


  exit( 0 );
}