[ft] Compressing CJK Fonts effectively
Hi, I am working on an embedded project which requires CJK TrueType fonts. Since it is an embedded system, I want to save as much disk space as possible. The fonts we are using seem to compress to about 70% of their original size, which would save around 13 MB on the ROM image. I noticed the post from 2002 by David Turner, when he completed the gzip support, talking about not wanting to encourage stupid choices (like gzip-compressed TrueType fonts). What is the current best practice for managing large CJK fonts in a storage-efficient manner? Matthew ___ Freetype mailing list Freetype@nongnu.org http://lists.nongnu.org/mailman/listinfo/freetype
Re: [ft] Compressing CJK Fonts effectively
It's an interesting topic for CJK people, but hard to give a quick answer. "Embedded system" is too broad to guess at the system's limitations in comparison with a desktop, laptop, netbook or OLPC; please describe more :-)

# small ROM but large RAM?
# is consuming CPU better than consuming ROM/RAM?
# speed of rendering?
# etc. etc.

Recent generic lossless compression technologies, like rzip, would show better compression results, but will consume more memory and CPU power.

In addition, do you want to compress the TrueType data losslessly? Or don't you mind losing some information from the TrueType font (and inventing a new TrueType-incompatible font format)? For example, Agfa Monotype has a technology called MicroType Express to compress TrueType fonts by restricting the rendering system and removing some information that is non-essential for that system. The technology was adopted by ISO MPEG-4 and Microsoft's Embedded OpenType, and it seems that recent Windows Mobile uses it for bundled CJK fonts. However, the technology is patented; FreeType2 cannot support it until its expiration or a change of the license.

W3C seems to be working on a new font format for web documents (where file size reduction would be an important theme, although memory/CPU consumption would not be so critical); there is a possibility that you can find some interesting technology there.

Matthew Tippett wrote (2010/09/08 1:30):
  [...]
Re: [ft] Compressing CJK Fonts effectively
Ah, I forgot to mention the font program size and the expected resolution. Michiel, thank you for the reminder.

In CJK ideographs, the number of control points needed to draw the Bezier/spline curves is apparently greater than in Latin script, and so is the number of operators connecting the control points. For patterns of repeated control-point declarations and drawing operators (like zig-zag lines), CFF can reduce the number of drawing operators in comparison with the PostScript Type 1 font format.

In the early days of CJK TrueType, low-end fonts for the CJK market used a 256x256 matrix to define the control points (high-end PostScript fonts used a 1000x1000 matrix). The number of control points was also reduced, so when glyphs designed on the 256x256 grid were rasterized at higher resolutions, their shapes looked poor. I guess that if your embedded system has no feature for generating printing data (like PDF), such an approach (reducing the number of control points, simplifying Bezier/spline curves to straight lines) won't be harmful.

Regards, mpsuzuki

Michiel Kamermans wrote (2010/09/08 2:27):

  Matthew Tippett wrote:
    [...]

  First check if those fonts exist as OpenType with PostScript Type 2 outlines (CFF). If so, done: use those; that's about as small as the font is going to get. CFF glyphs are defined in terms of both normal vector instructions and glyph subroutines, so features shared between glyphs (and in CJK fonts that can be tens of thousands of features) can be stored as subroutines, removing a bucketload of bytes from the file size.

  If they don't exist as OT(CFF), then you can try running your font through Adobe's tx tool, which is freely available as part of the Font Developer Kit (http://www.adobe.com/devnet/opentype/afdko/). You will lose any hinting that's in the TrueType font, but the FDK also comes with an autohinter that is pretty much as good as hinting is going to get. Of course, this may also be illegal for the font you're using, so make sure you have the right to create derivatives before you go down that road.

  - Mike Pomax Kamermans nihongoresources.com
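Mike's point about subroutines is easy to quantify with a toy cost model (the function names and byte counts below are illustrative assumptions of mine, not CFF's actual charstring encoding):

```c
/* Toy model of CFF-style subroutinization savings.  A charstring
   fragment of frag_len bytes appearing in n_glyphs glyphs costs
   n_glyphs * frag_len bytes when stored inline, but only one shared
   copy plus a short call operator per glyph when factored into a
   subroutine.  All numbers are hypothetical. */

static long
inline_cost( long n_glyphs, long frag_len )
{
  /* every glyph carries its own copy of the fragment */
  return n_glyphs * frag_len;
}

static long
subr_cost( long n_glyphs, long frag_len, long call_len, long overhead )
{
  /* one shared copy, a per-glyph call operator, and fixed index overhead */
  return frag_len + overhead + n_glyphs * call_len;
}
```

For example, a 30-byte fragment shared by 5,000 glyphs costs 150,000 bytes inline, but 15,032 bytes (30 + 2 + 5000 * 3) when factored into a subroutine called with a 3-byte operator; with tens of thousands of shared strokes and radicals in a CJK font, this is where the "bucketload of bytes" goes.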
Re: [ft] Compressing CJK Fonts effectively
Hi, Have you tried the wqy-microhei fonts? They consume only 5.3 MB of disk space, and are used on Linux LiveCDs. URL: http://wenq.org/enindex.cgi?MicroHei%28en%29 Thanks, Peng Wu

On Wed, Sep 8, 2010 at 12:30 AM, Matthew Tippett tippe...@gmail.com wrote:
  [...]
[ft] Re: Shiny New Subpixel Hinting Patch
Also... there are a couple of other patches included in this one that can/would be disabled by default, like zipped fonts, etc. Another feature that I've enabled in this version of the patch is one that makes the emboldening function ignore the Y direction. In my opinion, it looks substantially better than the existing emboldening, albeit not technically correct. Try it out on Andale Mono, Lucida Console, or some other font that has no bold version.
Re: [ft-devel] FT_MulFix assembly
James Cloos cl...@jhcloos.com writes:

  The C version does away-from-zero rounding.

  MB Do you have test cases that show this? I tried using random inputs,
  MB but even up to billions of iterations, I can't seem to find a set of
  MB inputs where my function yields different results from yours.

  The C version saves the two signs, takes the absolute values, multiplies,
  scales and then sets the sign. When I tested, I used dd(1) to generate a
  quarter-gig file from urandom (I used a fixed file so that it would be
  reproducible), mmap(2)ed that to an int[], and went through it two at a
  time. The C and my initial asm versions produced different results
  whenever the second int was -1 (i.e. 0xffffffff) and the first matched
  (a > 0 && (a & 0xffff) == 0x8000) - in other words, multiplying
  something like 7.5 by -1/65536. An example of that test's output was:

    7AFA8000, FFFFFFFF, 8505, 8506, 0

  In that example, 8505 is what the C version generates.

Hm, are you sure that's not backwards? When I tried the git C version[*], as well as your most recent FT_MulFix_x86_64, it returned 0x8506... The following C version:

  typedef signed int  FT_Int;
  typedef signed long FT_Long;
  typedef signed long FT_Int64;  /* on x86-64 */

  FT_Long
  FT_MulFix_C_new( FT_Long a, FT_Long b )
  {
    return ( (FT_Int64)a * (FT_Int64)b + 0x8000 ) >> 16;
  }

... generates this code:

  imulq  %rsi, %rdi
  leaq   32768(%rdi), %rax
  sarq   $16, %rax

It seems to yield exactly the same results as the official C version[*], both for your test case:

  $ ./t 0x7AFA8000 0xFFFFFFFF
  0x7afa8000 x 0xffffffff =
      C: 0x7fff8506
  C_new: 0x7fff8506
    asm: 0x7fff8506

... and also for billions of random inputs. Is there something I'm missing...?

[*] Fetched from: http://git.savannah.gnu.org/cgit/freetype/freetype2.git/tree/src/base/ftcalc.c

Thanks,
-Miles
--
Liberty, n. One of imagination's most precious possessions.

___ Freetype-devel mailing list Freetype-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/freetype-devel
RE: [ft-devel] latest patch file for spline flattening
Graham, Here are the results of my performance testing. I was a bit surprised by the results. In gray_convert_glyph, the time is distributed as follows:

                    OLD    NEW
  render_line       20%    15%
  render_cubic      15%    33%
  render_scanline   14%    10%
  split_cubic        6%     9%

OLD is the pre-2.4.0 code; NEW is the latest patch from you. These percentages are the fraction of time spent in the specific function (excluding children). Including children, we have the following actual times per call for handling cubic curves:

                    OLD     NEW
  render_cubic      142us   220us

I wasn't expecting your new code to be slower. So I ran my trace code on it, with the following results:

                                     OLD     NEW
  average line segs per arc          13.5    11.3
  min line segs per arc                 2       1
  max line segs per arc                32     133
  average deviation per line seg     0.29    0.44
  min deviation per line seg            0       0
  max deviation per line seg         22.2    15.8

Some arcs are creating a very large number of line segments. I expect (though I haven't verified) that it is this that is causing the slow-down. Below is the data for one curve that gets broken down into many tiny line segments.
David %^ 4604,0 2080,0 40,2020 40,4496 40,4496 - 40,4436 40,4436 - 40,4379 40,4379 - 41,4321 41,4321 - 44,4264 44,4264 - 47,4206 47,4206 - 51,4149 51,4149 - 56,4092 56,4092 - 62,4036 62,4036 - 68,3979 68,3979 - 74,3922 74,3922 - 81,3865 81,3865 - 90,3811 90,3811 - 99,3754 99,3754 - 109,3700 109,3700 - 119,3645 119,3645 - 131,3591 131,3591 - 142,3535 142,3535 - 154,3481 154,3481 - 166,3427 166,3427 - 181,3373 181,3373 - 195,3319 195,3319 - 210,3265 210,3265 - 225,3211 225,3211 - 243,3160 243,3160 - 259,3106 259,3106 - 277,3055 277,3055 - 295,3002 295,3002 - 314,2951 314,2951 - 333,2900 333,2900 - 354,2849 354,2849 - 375,2798 375,2798 - 397,2748 397,2748 - 418,2697 418,2697 - 440,2646 440,2646 - 463,2595 463,2595 - 487,2547 487,2547 - 536,2450 536,2450 - 588,2354 588,2354 - 641,2258 641,2258 - 697,2165 697,2165 - 756,2073 756,2073 - 817,1984 817,1984 - 879,1894 879,1894 - 943,1807 943,1807 - 1009,1720 1009,1720 - 1079,1637 1079,1637 - 1149,1554 1149,1554 - 1222,1474 1222,1474 - 1297,1395 1297,1395 - 1375,1319 1375,1319 - 1452,1243 1452,1243 - 1533,1169 1533,1169 - 1614,1097 1614,1097 - 1698,1028 1698,1028 - 1782,959 1782,959 - 1869,894 1869,894 - 1958,830 1958,830 - 2049,769 2049,769 - 2140,708 2140,708 - 2233,651 2233,651 - 2328,595 2328,595 - 2425,543 2425,543 - 2522,491 2522,491 - 2570,467 2570,467 - 2621,443 2621,443 - 2671,419 2671,419 - 2722,397 2722,397 - 2773,375 2773,375 - 2825,354 2825,354 - 2876,332 2876,332 - 2927,311 2927,311 - 2978,290 2978,290 - 3031,272 3031,272 - 3082,253 3082,253 - 3136,235 3136,235 - 3190,217 3190,217 - 3244,202 3244,202 - 3297,184 3297,184 - 3351,169 3351,169 - 3405,154 3405,154 - 3460,140 3460,140 - 3514,126 3514,126 - 3570,114 3570,114 - 3625,102 3625,102 - 3682,91 3682,91 - 3736,79 3736,79 - 3793,69 3793,69 - 3849,59 3849,59 - 3906,50 3906,50 - 3963,41 3963,41 - 4020,34 4020,34 - 4077,28 4077,28 - 4136,22 4136,22 - 4193,16 4193,16 - 4251,11 4251,11 - 4308,7 4308,7 - 4368,4 4368,4 - 4425,1 4425,1 - 4485,0 4485,0 - 4544,0 4544,0 
- 4604,0
Re: [ft-devel] latest patch file for spline flattening
That is very interesting and very useful - in fact I think the more surprising a test is, the more useful it is. I'll have to look into your test case carefully as well. I might not be able to do it for a day or two, though. Where does your test data come from? Actual fonts, cooked-up data, or a mixture of both? Best regards, Graham

----- Original Message -----
From: David Bevan david.be...@pb.com
To: Graham Asher graham.as...@btinternet.com
Cc: freetype-devel freetype-devel@nongnu.org
Sent: Tuesday, 7 September, 2010 12:40:21
Subject: RE: [ft-devel] latest patch file for spline flattening
  [...]
RE: [ft-devel] latest patch file for spline flattening
After trying various other fonts, I settled on using a single large (14,000 glyphs; 800,000 Bezier curves) CID-keyed Type 1 font, which seemed to show pretty average behaviour. I'm working on an implementation of something like Hain's algorithm now. It'll be interesting to see how it compares. David %^

-----Original Message-----
From: GRAHAM ASHER [mailto:graham.as...@btinternet.com]
Sent: 07 September 2010 13:46
To: David Bevan
Cc: freetype-devel
Subject: Re: [ft-devel] latest patch file for spline flattening
  [...]
Re: [ft-devel] latest patch file for spline flattening
With 14,000 glyphs, I imagine that's a CJK font. I think there might be different characteristics from a typical Latin font. I think we also ought to try out a similar number of Latin glyphs, which could be done by rasterizing all the glyphs in a Latin font at varying sizes and rotations. In some ways a CJK font is probably a more stressful test, because strokes occur at a greater number of angles and sizes; but Latin characters should be part of the benchmark. I am not trying to foist more work on you; just musing. Graham

----- Original Message -----
From: David Bevan david.be...@pb.com
To: GRAHAM ASHER graham.as...@btinternet.com
Cc: freetype-devel freetype-devel@nongnu.org
Sent: Tuesday, 7 September, 2010 13:52:35
Subject: RE: [ft-devel] latest patch file for spline flattening
  [...]
RE: [ft-devel] latest patch file for spline flattening
I've now implemented something based on Hain's research and it seems to be measurably faster than previous FT approaches. I have used Hain's paper (now available from http://tinyurl.com/HainBez) to provide some sensible heuristics rather than implementing all his stuff in detail, and done so without using square roots or even any divisions. First, here are the trace results:

                                     OLD     NEW    HAIN
  average line segs per arc          13.5    11.3    2.1
  min line segs per arc                 2       1      1
  max line segs per arc                32     133     16
  average deviation per line seg     0.29    0.44    6.5
  min deviation per line seg            0       0      0
  max deviation per line seg         22.2    15.8   15.7

By using reasonably accurate heuristics when deciding whether to split the curve, we create 5.5x fewer line segments. This cuts down the number of calls to split_cubic and the number of iterations within render_cubic. And now the performance results. In gray_convert_glyph, the time is distributed as follows:

                    OLD    NEW    HAIN
  render_line       20%    15%    12%
  render_cubic      15%    33%     9%
  render_scanline   14%    10%    10%
  split_cubic        6%     9%     2%

The time spent in these functions has been significantly reduced as a fraction of processing time. Including children, we have the following actual times per call for handling cubic curves:

                    OLD     NEW     HAIN
  render_cubic      142us   220us   61us

render_cubic is now more than twice as fast as it has ever been. The effect of the speed-up is even measurable as a 5-10% speed-up of my font rasterisation program (which is reading and writing data on top of using FT to do the actual rendering). These tests are with the same Unicode font as before. I'll run some more tests with Latin-only fonts, though previous testing didn't show any significant performance differences between Latin and CJK. CJK glyphs just have more cubic Bezier curves on average, but a Bezier curve is a Bezier curve wherever it comes from. The code is below. I've tried to follow Werner's coding standards as far as I know what they are. Thanks.
David %^

  static void
  gray_render_cubic( RAS_ARG_ const FT_Vector*  control1,
                     const FT_Vector*  control2,
                     const FT_Vector*  to )
  {
    FT_Vector*  arc;


    arc      = ras.bez_stack;
    arc[0].x = UPSCALE( to->x );
    arc[0].y = UPSCALE( to->y );
    arc[1].x = UPSCALE( control2->x );
    arc[1].y = UPSCALE( control2->y );
    arc[2].x = UPSCALE( control1->x );
    arc[2].y = UPSCALE( control1->y );
    arc[3].x = ras.x;
    arc[3].y = ras.y;

    for (;;)
    {
      /* Check that the arc crosses the current band. */
      TPos  min, max, y;


      min = max = arc[0].y;
      y = arc[1].y;
      if ( y < min ) min = y;
      if ( y > max ) max = y;
      y = arc[2].y;
      if ( y < min ) min = y;
      if ( y > max ) max = y;
      y = arc[3].y;
      if ( y < min ) min = y;
      if ( y > max ) max = y;

      if ( TRUNC( min ) >= ras.max_ey || TRUNC( max ) < 0 )
        goto Draw;

      /* Decide whether to split or draw.                                */
      /* See Hain's paper at http://tinyurl.com/HainBez for more info.   */
      {
        TPos  dx, dy, L, dx1, dy1, dx2, dy2, s1, s2;


        /* dx and dy are x- and y-components of the P0-P3 chord vector */
        dx = arc[3].x - arc[0].x;
        dy = arc[3].y - arc[0].y;

        /* L is an (under)estimate of the Euclidean distance P0-P3 */
        L = ( 236 * FT_MAX( labs( dx ), labs( dy ) ) +
               97 * FT_MIN( labs( dx ), labs( dy ) ) ) >> 8;

        /* avoid possible arithmetic overflow below by splitting */
        if ( L > 32767 )
          goto Split;

        /* s1 is L * the perpendicular distance from P1 to the line P0-P3 */
        s1 = labs( dy * ( dx1 = arc[1].x - arc[0].x ) -
                   dx * ( dy1 = arc[1].y - arc[0].y ) );

        /* max deviation is at least (s1 / L) * sqrt(3)/6 (if v <= -1) */
        if ( s1 > L * (TPos)( FT_MAX_CURVE_DEVIATION / 0.288675 ) )
          goto Split;

        /* s2 is L * the perpendicular distance from P2 to the line P0-P3 */
        s2 = labs( dy * ( dx2 = arc[2].x - arc[0].x ) -
                   dx * ( dy2 = arc[2].y - arc[0].y ) );

        /* max deviation may be as much as (max(s1,s2)/L) * 3/4 (if v >= 1) */
        if ( FT_MAX( s1, s2 ) > L * (TPos)( FT_MAX_CURVE_DEVIATION / 0.75 ) )
          goto Split;

        /* if P1 or P2 is outside P0-P3, split */
        if ( dy * dy1 + dx * dx1 < 0                                       ||
             dy * dy2 + dx * dx2 < 0                                       ||
             dy * ( arc[3].y - arc[1].y ) + dx * ( arc[3].x - arc[1].x ) < 0 ||
             dy * ( arc[3].y - arc[2].y ) + dx * ( arc[3].x - arc[2].x ) < 0 )
          goto Split;

        /* no reason to split */
        goto Draw;
      }

    Split:
      gray_split_cubic( arc );
      arc += 3;
      continue;

    Draw:
      gray_render_line( RAS_VAR_ arc[0].x, arc[0].y );

      if ( arc == ras.bez_stack )
        return;

      arc -= 3;
    }
  }
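A quick standalone check of the length estimate used in the overflow guard above (the 236/97 constants are taken from the code; the test values are mine): the ( 236 * max + 97 * min ) >> 8 form approximates the Euclidean chord length sqrt(dx^2 + dy^2) from below, with only one shift and no sqrt or division.

```c
#include <stdlib.h>

/* Hain-patch chord-length estimate: roughly 0.922 * max + 0.379 * min
   of |dx| and |dy|.  Always a slight underestimate of the true
   Euclidean length sqrt(dx*dx + dy*dy), worst case about 8% low. */
static long
chord_length_estimate( long dx, long dy )
{
  long  ax = labs( dx );
  long  ay = labs( dy );
  long  hi = ax > ay ? ax : ay;
  long  lo = ax > ay ? ay : ax;

  return ( 236 * hi + 97 * lo ) >> 8;
}
```

For dx = 300, dy = 400 (true length 500) it returns 482, about 3.5% low. Since the estimate never exceeds the true length, the guard `if ( L > 32767 ) goto Split;` keeps the subsequent cross products s1 and s2 safely inside the 32-bit range.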
RE: [ft-devel] latest patch file for spline flattening
Here are some test results with Latin fonts (40 thousand curves from fonts at various point sizes). Trace results:

                                     CJK     CJK     CJK    LATIN   LATIN
                                     OLD     NEW     HAIN   NEW     HAIN
  average line segs per arc          13.5    11.3    2.1    30.9    6.1
  max line segs per arc                32     133     16     163     18
  average deviation per line seg     0.29    0.44    6.5    0.37    7.4
  max deviation per line seg         22.2    15.8   15.7     7.9   15.7

Performance results. In gray_convert_glyph, the time is distributed as follows:

                    CJK    CJK    CJK    LATIN  LATIN
                    OLD    NEW    HAIN   NEW    HAIN
  render_line       20%    15%    12%    14%    11%
  render_cubic      15%    33%     9%    34%    11%
  render_scanline   14%    10%    10%    10%    11%
  split_cubic        6%     9%     2%    10%     3%

Including children, we have the following actual times per call for handling cubic curves:

                    CJK     CJK     CJK    LATIN   LATIN
                    OLD     NEW     HAIN   NEW     HAIN
  render_cubic      142us   220us   61us   546us   176us

Conclusions: The performance improvement is as evident with Latin fonts as with CJK ones. However, on average, Bezier curves from Latin fonts require more flattening (6 segments versus 2 with the Hain implementation), so processing them takes longer. As Graham pointed out to me: the curves used in Latin and other Latin-like alphabets are very often used to navigate 90-degree corners; P0 and P1 lie on a grid line, and so do P2 and P3. This is very rarely true in Han characters. On the other hand, Latin glyphs contain fewer Bezier curves than CJK (6 versus 57 on average with my data). The upshot of both of these together is that the overall performance change is very similar (the CJK and Latin time-distribution figures are so similar they could be from the same test). David %^
Re: [ft-devel] FT_MulFix assembly
MB == Miles Bader mi...@gnu.org writes:

  MB Hm, are you sure that's not backwards? When I tried the git C version[*],
  MB as well as your most recent FT_MulFix_x86_64, it returned 0x8506...

Odd. Adding your algo to my test app, I get:

  7AFA8000, FFFFFFFF, 8505, 8505, 8506   # a, b, FT, JC, MB

I see that I have one small error in the C code in my app. FT has:

  c = (FT_Long)( ( (FT_Int64)a * b + 0x8000L ) >> 16 );

whereas I used:

  c = (int32_t)( ( (int64_t)a * b + 0x8000L ) >> 16 );

But changing the int32_t to long does not change the results. Yours is still always +1 compared to the C version whenever the first arg represents a positive value with fractional part == 1/2. Oddly, though, gcc now refuses to compile my asm, even though it did so before, complaining that it cannot guess what arg size to use for the imul. Weird. (The existing executables prove that it used to.) A simple way around that is to specify D and S as the constraints for a and b. (The rdi and rsi registers are where the x86_64 ABI puts the first two args passed to a function.)

The disassembly of the final version is:

  004006c0 <mf>:
    4006c0: 48 89 f8                mov    %rdi,%rax
    4006c3: 48 f7 ee                imul   %rsi
    4006c6: 48 01 d0                add    %rdx,%rax
    4006c9: 48 05 00 80 00 00       add    $0x8000,%rax
    4006cf: 48 c1 f8 10             sar    $0x10,%rax
    4006d3: c3                      retq

And I get this disassembly of yours:

  00400840 <miles>:
    400840: 48 63 c6                movslq %esi,%rax
    400843: 48 63 ff                movslq %edi,%rdi
    400846: 48 0f af c7             imul   %rdi,%rax
    40084a: 48 05 00 80 00 00       add    $0x8000,%rax
    400850: 48 c1 f8 10             sar    $0x10,%rax
    400854: c3                      retq

I also just added this version to my test app:

  int another (int32_t a, int32_t b)
  {
    long r = (long)a * (long)b;
    long s = r >> 31;
    return (r + s + 0x8000) >> 16;
  }

That results in:

  00400760 <another>:
    400760: 48 63 ff                movslq %edi,%rdi
    400763: 48 63 f6                movslq %esi,%rsi
    400766: 48 0f af f7             imul   %rdi,%rsi
    40076a: 48 89 f0                mov    %rsi,%rax
    40076d: 48 c1 f8 1f             sar    $0x1f,%rax
    400771: 48 8d 84 06 00 80 00    lea    0x8000(%rsi,%rax,1),%rax
    400778: 00
    400779: 48 c1 f8 10             sar    $0x10,%rax
    40077d: c3                      retq

Since FT's C version uses longs, though, this:

  int another (long a, long b)
  {
    long r = (long)a * (long)b;
    long s = r >> 31;
    return (r + s + 0x8000) >> 16;
  }

gives:

  00400760 <another>:
    400760: 48 0f af f7             imul   %rdi,%rsi
    400764: 48 89 f0                mov    %rsi,%rax
    400767: 48 c1 f8 1f             sar    $0x1f,%rax
    40076b: 48 8d 84 06 00 80 00    lea    0x8000(%rsi,%rax,1),%rax
    400772: 00
    400773: 48 c1 f8 10             sar    $0x10,%rax
    400777: c3                      retq

So it would seem that when compiling for any processor where FT_Long is the same as int64_t and where that fits into a single register, that last bit of C might be optimal, yes?

-JimC
--
James Cloos cl...@jhcloos.com OpenPGP: 1024D/ED7DAEA6
Re: [ft-devel] FT_MulFix assembly
James Cloos cl...@jhcloos.com writes:

  Since FT's C version uses longs, though, this:

  int another (long a, long b)
  {
    long r = (long)a * (long)b;
    long s = r >> 31;
    return (r + s + 0x8000) >> 16;
  }

That's not correct though, is it? The variable s should be the all-sign portion of the multiplication, but since the two inputs have 32 significant bits (never mind the types), the product will have 64 significant bits. So r >> 31 won't be all-sign; it'll be a bunch of ... other bits. :) However, changing the shift to 63:

  FT_Long
  FT_MulFix_C_new2( FT_Long a, FT_Long b )
  {
    FT_Int64  prod = (FT_Int64)a * (FT_Int64)b;
    FT_Int64  sign = prod >> 63;

    return ( prod + sign + 0x8000 ) >> 16;
  }

... does seem to yield correct results:

  $ ./t 0x7AFA8000 0xFFFFFFFF
  0x7afa8000 x 0xffffffff =
      C: 0x8505
  C_new: 0x8505
  C_nw2: 0x8505
  C_ano: 0x8505
    asm: 0x8505

C is the old C, C_new was my previous attempt, C_nw2 is the above FT_MulFix_C_new2 function, C_ano is the another function, and asm was your final asm version. [another yields misleadingly correct results in this case, because of the particular argument values given; in other cases, it gives incorrect results.]

-miles
--
Discriminate, v.i. To note the particulars in which one person or thing is, if possible, more objectionable than another.
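To make the thread's conclusion concrete, here is a small standalone model of the three rounding strategies discussed (the function names are mine, not FreeType's, and it assumes arithmetic right shift on signed values, as the thread itself does): the sign-saving form rounds halves away from zero, the naive 64-bit form rounds halves of negative products toward positive infinity, and the sign-shift trick restores away-from-zero behaviour without branches.

```c
#include <stdint.h>

/* Models the classic FT_MulFix C code: strip the signs, round the
   magnitude, then restore the sign (halves round away from zero). */
static int64_t
mulfix_sign_save( int64_t a, int64_t b )
{
  int      s = 1;
  int64_t  c;

  if ( a < 0 ) { a = -a; s = -s; }
  if ( b < 0 ) { b = -b; s = -s; }
  c = ( a * b + 0x8000 ) >> 16;
  return s > 0 ? c : -c;
}

/* One 64-bit multiply with unconditional rounding: halves of
   negative products round the "wrong" way (toward +infinity). */
static int64_t
mulfix_naive( int64_t a, int64_t b )
{
  return ( a * b + 0x8000 ) >> 16;
}

/* Branchless fix from the thread: add the product's sign word
   (all zeros or all ones) before the rounding constant. */
static int64_t
mulfix_sign_shift( int64_t a, int64_t b )
{
  int64_t  prod = a * b;
  int64_t  sign = prod >> 63;

  return ( prod + sign + 0x8000 ) >> 16;
}
```

On the thread's test case, 0x7AFA8000 times -1 (0xffffffff read as a signed 32-bit value), mulfix_sign_save and mulfix_sign_shift both give -0x7AFB, while mulfix_naive gives -0x7AFA: the off-by-one discussed above.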