[ft-devel] ttfautohint: fun with assembler code
I had hoped to show working bytecode at the beginning of May, but unfortunately things are a bit more complicated: while the library has already been extended to support generated bytecode, I first have to define some auxiliary bytecode functions to support hinting similar to FreeType's autohinter.

On the other hand, I now have an outline of how things will work: I will run the autohinter for a set of sizes, and the results are then hardcoded as bytecode. Contrary to the autohinter, TrueType instructions can only handle integer values for PPEM sizes; this greatly reduces the number of configurations to test.

Some weeks ago I adapted the autohinter logging stuff to standard FreeType tracing (at level 5); now you can actually see the `high-level hints' the autohinter is going to apply with the usual tracing methods as outlined in FreeType's `DEBUG' documentation file. As an example, here are the messages for vertical hinting of glyph `R' from pala.ttf at 10ppem:

  BLUE: edge 0 (opos=0.00) snapped to (0.00), was (0.00)
  LINK: edge 1 (opos=0.30) linked to (0.30), dist was 0.30, now 0.30
  BLUE: edge 4 (opos=7.34) snapped to (8.00), was (7.34)
  LINK: edge 3 (opos=6.97) linked to (7.62), dist was -0.38, now -0.38
  SERIF_LINK1: edge 2 (opos=3.95) snapped to (4.31) from 1 (opos=0.30)

I'll skip a detailed explanation here; you might read the comments in file `afhints.h' for more. The most important fact which can be seen from the above output is that `LINK' commands don't snap to pixel borders. This is intentional. Except for blue zones, where either the top or the bottom line of a stem gets aligned with a pixel border (the width of small stems gets increased if necessary), the autohinter aligns the *center* of stems, applying some rounding voodoo to the stem width.

To imitate this behaviour, I'm right now diving into the world of writing bytecode assembler code. Here is an example.
The following code is taken from `af_latin_compute_stem_width', transformed into some pseudo code which can be converted to bytecode more easily:

  Function: compute_stem_width

    in: width
        is_serif
        is_round
    out: new_width
    CVT: std_width

    dist = ABS(width)

    if is_serif
       && dist < 3*64:
      return width
    else if is_round:
      if dist < 80
        dist = 64
    else:
      dist = MIN(56, dist)

    delta = ABS(dist - std_width)

    if delta < 40:
      dist = MIN(48, std_width)
      goto End

    if dist < 3*64:
      delta = dist
      dist = FLOOR(dist)
      delta = delta - dist

      if delta < 10:
        dist = dist + delta
      else if delta < 32:
        dist = dist + 10
      else if delta < 54:
        dist = dist + 54
      else
        dist = dist + delta
    else
      dist = ROUND(dist)

  End:
    if width < 0:
      dist = -dist
    return dist

This corresponds to the following (still untested) bytecode:

  // In the comments below, the top of the stack (`s:')
  // is the rightmost element.

  // Function 0: compute_stem_width
  //
  // in: width
  //     is_serif
  //     is_round
  // out: new_width
  // CVT: std_width

  0xB0, // PUSHB_1
  0x00, // 0
  0x2C, // FDEF

  0x20, // DUP
  0x64, // ABS -- s: is_round is_serif width dist

  0x20, // DUP
  0xB0, // PUSHB_1
  0xC0, // 3*64
  0x50, // LT -- (dist < 3*64)

  0xB0, // PUSHB_1
  0x04, // 4
  0x26, // MINDEX -- s: is_round width dist (dist<3*64) is_serif
  0x5A, // AND -- (is_serif && dist < 3*64)

  0x58, // IF -- s: is_round width dist
  0x21, // POP
  0x23, // SWAP
  0x21, // POP -- s: width

  0x1B, // ELSE
  0x8A, // ROLL -- s: width dist is_round
  0x58, // IF -- s: width dist
  0x20, // DUP
  0xB0, // PUSHB_1
  0x50, // 80
  0x50, // LT -- (dist < 80)
  0x58, // IF -- s: width dist
  0x21, // POP
  0xB0, // PUSHB_1
  0x40, // 64 -- dist = 64
  0x59, // EIF

  0x1B, // ELSE
  0xB0, // PUSHB_1
  0x38, // 56
  0x8C, // MIN -- dist = min(56, dist)
  0x59, // EIF

  0x20, // DUP -- s: width dist dist
  0xB0, // PUSHB_1
  %c,   // index of std_width
  0x45, // RCVT
  0x61, // SUB
  0x64, // ABS -- s: width dist delta

  0xB0, // PUSHB_1
  0x28, // 40
  0x50, // LT -- (delta < 40)
  0x58, // IF -- s: width dist
  0x21, // POP
  0xB1, // PUSHB_2
  0x30, // 48
  %c,   // index of std_width
  0x45, // RCVT
  0x8C, // MIN -- dist = min(48, std_width)

  0x1B, // ELSE
  0x20, // DUP -- s: width dist dist
  0xB0, // PUSHB_1
  0xC0, // 3*64
  0x50, // LT -- (dist < 3*64)
  0x58, // IF
  0x20, // DUP -- s: width delta dist
  0x66, // FLOOR -- dist = FLOOR(dist)
  0x20, // DUP -- s: width delta dist dist
  0x8A, // ROLL
  0x8A, // ROLL -- s: width dist delta dist
  0x61, // SUB -- delta = delta - dist

  0x20, // DUP -- s: width dist delta delta
  0xB0, // PUSHB_1
  0x0A, // 10
  0x50, // LT -- (delta <
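
For checking the bytecode against something easier to run, the pseudo code above can be transcribed into plain C. This is only a sketch under several assumptions: values are 26.6 fixed-point (64 units per pixel), FLOOR and ROUND are taken to mean rounding down and to nearest on the pixel grid, `std_width' is passed as a parameter instead of being read from the CVT, and the helper names (`f26dot6_floor', `f26dot6_round', `min_long') are made up for this illustration. It keeps both MIN calls exactly as posted; whether they should clamp from below instead is worth checking against `af_latin_compute_stem_width' before relying on the results.

```c
#include <stdlib.h>

/* Hypothetical 26.6 fixed-point helpers (64 units per pixel). */
static long f26dot6_floor(long x) { return x & ~63L; }
static long f26dot6_round(long x) { return (x + 32) & ~63L; }
static long min_long(long a, long b) { return a < b ? a : b; }

/* Literal transcription of the pseudo code from the mail.
 * std_width stands in for the CVT entry (an assumption of this sketch). */
static long compute_stem_width(long width, int is_serif, int is_round,
                               long std_width)
{
    long dist = labs(width);
    long delta;

    if (is_serif && dist < 3 * 64)
        return width;
    else if (is_round) {
        if (dist < 80)
            dist = 64;
    } else
        dist = min_long(56, dist);      /* MIN as written in the mail */

    delta = labs(dist - std_width);

    if (delta < 40) {
        dist = min_long(48, std_width); /* MIN as written in the mail */
        goto End;
    }

    if (dist < 3 * 64) {
        delta = dist;
        dist  = f26dot6_floor(dist);
        delta = delta - dist;           /* fractional part, 0..63 */

        if (delta < 10)
            dist = dist + delta;
        else if (delta < 32)
            dist = dist + 10;
        else if (delta < 54)
            dist = dist + 54;
        else
            dist = dist + delta;
    } else
        dist = f26dot6_round(dist);

End:
    if (width < 0)
        dist = -dist;
    return dist;
}
```

Feeding the same inputs to this function and to the bytecode function (once it runs) gives a cheap way to spot stack-handling mistakes in the assembler version.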
Re: [ft-devel] benchmark of sfnt checksum recalculation
Here is the 2nd test case, checking ca. 1900 fonts (distributed in Debian GNU/Linux).

The cumulative latency is small again, but the average latency to recalculate the checksum is 0.84 microsec, longer than in the 1st test case. 0.60 microsec is spent seeking to the table (tt_face_goto_table()), and 0.23 microsec is spent on the arithmetic (tt_synth_sfnt_checksum()). Since the seek dominates, even if I wrote a faster assembly version of the arithmetic, I could not halve the latency. Reducing the number of recalculations is what matters.

Compared with the number of fonts that include cvt/fpgm/prep tables (or the number of calls to the tricky font checker, 1731), the number of calls to the checksum calculator (tt_get_sfnt_checksum(), 44) is small. The table length comparison (in the current implementation) before the table checksum comparison is effective in reducing the number of checksum recalculations.

TESTCASE 2)
===

Environment is as follows:

  CPU: Centrino Duo
  RAM: 2GB
  FreeType2 configuration:
    env \
      CFLAGS="-g3 -ggdb -p -pg -fkeep-inline-functions -DFT_DEBUG_MEMORY=7" \
      ./configure --disable-shared

Fonts: 1916 fonts (including non-TrueType) in attached list.

  1127/1916 fonts have cvt table
  1019/1916 fonts have fpgm table
  1071/1916 fonts have prep table

Profile result
--

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time    seconds  seconds    calls  us/call  us/call  name
 26.37      0.24     0.24    20873    11.50    11.50  FT_Stream_ReadULong
 10.99      0.34     0.10   106868     0.94     0.98  FT_Stream_ReadFields
  8.79      0.42     0.08     1791    44.67    45.50  tt_face_load_hmtx
  7.69      0.49     0.07  1091109     0.06     0.06  FT_Stream_GetUShort
  5.49      0.54     0.05  1858500     0.03     0.03  FT_Stream_GetULong
  4.40      0.58     0.04   239967     0.17     1.27  Load_SBit_Range
  4.40      0.62     0.04     3124    12.80    12.80  tt_cmap4_validate
  4.40      0.66     0.04     1731    23.11    23.74  tt_face_load_kern
  4.40      0.70     0.04      182   219.78   219.78  tt_cmap12_validate
  3.30      0.73     0.03   538286     0.06     0.06  FT_Stream_EnterFrame
  3.30      0.76     0.03     1731    17.33   253.10  tt_face_load_eblc
  2.20      0.78     0.02    34692     0.58     0.60  tt_face_goto_table
  1.65      0.80     0.01   583348     0.03     0.03  ft_mem_free
  1.10      0.81     0.01   531236     0.02     0.02  FT_Stream_ExitFrame
  1.10      0.81     0.01   277428     0.04     0.04  FT_Stream_Seek
  1.10      0.82     0.01   263558     0.04     0.04  ft_mem_alloc
  1.10      0.83     0.01   247736     0.04     0.08  ft_mem_realloc
  1.10      0.84     0.01    22503     0.44     0.44  ft_service_list_lookup
  1.10      0.85     0.01     6925     1.44     1.44  FT_Get_Module
  1.10      0.86     0.01     1731     5.78     6.22  cid_get_interface
  1.10      0.88     0.01     1731     5.78    52.33  tt_face_build_cmaps
  1.10      0.89     0.01     1731     5.78    15.29  tt_face_free_eblc
  1.10      0.90     0.01     1731     5.78     5.78  tt_face_free_ps_names
  0.55      0.90     0.01   263559     0.02     0.02  ft_free
  0.55      0.91     0.01     1731     2.89     2.89  ft_close_stream_by_munmap
[snip]
  0.00      0.91     0.00     1731     0.00     0.02  tt_check_trickyness
  0.00      0.91     0.00     1731     0.00     0.00  tt_check_trickyness_family
  0.00      0.91     0.00     1731     0.00     0.02  tt_check_trickyness_sfnt_ids
[snip]
  0.00      0.91     0.00       44     0.00     0.84  tt_get_sfnt_checksum
  0.00      0.91     0.00       44     0.00     0.23  tt_synth_sfnt_checksum
  0.00      0.91     0.00       35     0.00     0.00  tt_cmap2_validate

___
Freetype-devel mailing list
Freetype-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/freetype-devel
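
The idea in this mail of comparing the table length before recalculating the checksum can be sketched as below. This is not FreeType's code: it assumes the table is already in memory (so the seek cost measured above does not appear), uses the standard sfnt rule of summing big-endian 32-bit words modulo 2^32 with a zero-padded tail, and the names `sfnt_table_checksum', `known_table', and `table_matches' are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum the table as big-endian 32-bit words, modulo 2^32.
 * A trailing partial word is treated as if padded with zero bytes. */
static uint32_t sfnt_table_checksum(const unsigned char *data, size_t length)
{
    uint32_t sum = 0;
    size_t   i   = 0;
    int      shift;

    /* Full 4-byte words. */
    for (; i + 4 <= length; i += 4)
        sum += ((uint32_t)data[i]     << 24) |
               ((uint32_t)data[i + 1] << 16) |
               ((uint32_t)data[i + 2] <<  8) |
                (uint32_t)data[i + 3];

    /* Remaining 1..3 bytes, placed in the high-order positions. */
    for (shift = 24; i < length; i++, shift -= 8)
        sum += (uint32_t)data[i] << shift;

    return sum;
}

/* Hypothetical record describing one known table of a tricky font. */
struct known_table {
    uint32_t length;
    uint32_t checksum;
};

/* Cheap test first: only tables whose length matches pay for the
 * checksum pass over the data. Returns 1 on a match, 0 otherwise. */
static int table_matches(const unsigned char *data, size_t length,
                         const struct known_table *known)
{
    if (length != known->length)
        return 0;
    return sfnt_table_checksum(data, length) == known->checksum;
}
```

With a filter like this, the arithmetic runs only for the few tables that pass the length test, which is consistent with the small call count of tt_get_sfnt_checksum() (44) next to the 1731 calls of the tricky font checker.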
[ft-devel] benchmark of sfnt checksum recalculation
On Thu, 28 Apr 2011 19:07:22 +0900 mpsuz...@hiroshima-u.ac.jp wrote:
>Anyway, my assumption might be wrong. I should check the
>cost of checksum recalculation by some benchmarks...

I have now started a preliminary benchmark of the extra latency incurred when we ignore the predefined checksums and calculate them ourselves.

It seems that the extra latency of checksum recalculation in FT_New_Face() is not fatal. The share of time spent loading/validating the hmtx and cmap tables is far larger. However, the latency depends on the size of the cvt/fpgm/prep tables, so a more detailed discussion is needed.

Regards,
mpsuzuki

TESTCASE 1)
===

Environment is as follows:

  CPU: Centrino Duo
  RAM: 2GB
  FreeType2 configuration:
    env \
      CFLAGS="-g3 -ggdb -p -pg -fkeep-inline-functions -DFT_DEBUG_MEMORY=7" \
      ./configure --disable-shared

Fonts: Arphic (Taiwanese) TrueType fonts in Debian package

  bkai00mp.ttf, bsmi00lp.ttf, gbsn00lp.ttf, gkai00mp.ttf, ukai.ttc, uming.ttc

Benchmark: repeat FT_New_Face() and FT_Done_Face() with the sample program (attached); all sample fonts are opened/closed 1000 times.

Profile result
--

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time    seconds  seconds    calls  ms/call  ms/call  name
 27.08      5.24     5.24    12000     0.44     0.44  tt_face_load_hmtx
 18.76      8.87     3.63     1392     0.00     0.00  FT_Stream_ReadFields
  9.56     10.72     1.85 55404000     0.00     0.00  FT_Stream_GetUShort
  7.86     12.24     1.52     8000     0.19     0.19  tt_cmap4_validate
  6.30     13.46     1.22 33448000     0.00     0.00  FT_Stream_GetULong
  6.10     14.64     1.18 27927000     0.00     0.00  FT_Stream_EnterFrame
  4.86     15.58     0.94     6000     0.16     1.75  tt_face_load_eblc
  4.19     16.39     0.81     4000     0.20     0.20  tt_cmap12_validate
  3.98     17.16     0.77 13552000     0.00     0.00  FT_Stream_ReadULong
  2.43     17.63     0.47     6000     0.08     0.11  tt_face_free_eblc
  1.65     17.95     0.32 13839000     0.00     0.00  Load_SBit_Range
  1.55     18.25     0.30 27907000     0.00     0.00  FT_Stream_ExitFrame
  1.40     18.52     0.27 13992000     0.00     0.00  FT_Stream_Seek
  0.88     18.69     0.17 28088086     0.00     0.00  ft_mem_free
  0.67     18.82     0.13 13534000     0.00     0.00  Load_SBit_Const_Metrics
[snip]
  0.00     19.35     0.00    12000     0.00     0.00  tt_face_load_hhea
  0.00     19.35     0.00     8000     0.00     0.00  tt_get_sfnt_checksum
  0.00     19.35     0.00     8000     0.00     0.00  tt_synth_sfnt_checksum
  0.00     19.35     0.00     6000     0.00     0.00  tt_check_trickyness
  0.00     19.35     0.00     6000     0.00     0.00  tt_check_trickyness_family
  0.00     19.35     0.00     6000     0.00     0.00  tt_check_trickyness_sfnt_ids
[snip]

  /* The header names inside <...> were lost in the archive; */
  /* the three below are the ones this program needs.        */
  #include <stdio.h>     /* fprintf(), printf() */
  #include <stdlib.h>    /* exit() */
  #include <ft2build.h>  /* defines FT_FREETYPE_H and FT_SYSTEM_H */
  #include FT_FREETYPE_H
  #include FT_SYSTEM_H


  int
  main( int     argc,
        char**  argv )
  {
    FT_Error    error;
    FT_Library  library;
    FT_Face     face;
    int         i;


    error = FT_Init_FreeType( &library );
    if ( error )
      exit( -2 );

    if ( argc < 2 )
    {
      fprintf( stderr, "1 argument is required\n" );
      exit( -3 );
    }

    /* open and close the first face of each font file given */
    for ( i = 1; i < argc ; i++ )
    {
      error = FT_New_Face( library, argv[i], 0, &face );
      if ( error )
      {
        printf( "cannot open any face from %s\n", argv[i] );
      }
      else
      {
        printf( "opened a face successfully from %s\n", argv[i] );
        error = FT_Done_Face( face );
        if ( error )
          exit( -7 );
      }
    }

    error = FT_Done_FreeType( library );
    if ( error )
      exit( -8 );

    exit( 0 );
  }