Re: [ft-devel] Re: possible inefficiency in gray_render_cubic
here's what I think is a better version of the start of the function. [...] This is now in the git repository (with small fixes to make it compile). Thanks a lot! Please test. Werner ___ Freetype-devel mailing list Freetype-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/freetype-devel
Re: [ft-devel] Re: possible inefficiency in gray_render_cubic
You're welcome - I see that you had to get rid of a check on the return value of gray_render_line, which I forgot about. That is there because I use a modified version of FreeType that can be compiled as pure .NET intermediate language code; and to do that I had to get rid of setjmp and longjmp; incidentally, with no performance penalty and little extra complexity. Best regards, Graham Werner LEMBERG wrote: here's what I think is a better version of the start of the function. [...] This is now in the git repository (with small fixes to make it compile). Thanks a lot! Please test. Werner ___ Freetype-devel mailing list Freetype-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/freetype-devel
Re: [ft-devel] Re: possible inefficiency in gray_render_cubic
[...] I use a modified version of FreeType that can be compiled as pure .NET intermediate language code; and to do that I had to get rid of setjmp and longjmp; incidentally, with no performance penalty and little extra complexity. Is this something we should consider in general? I mean, are there benefits for removing setjmp/longjmp except portability to other programming languages? Werner ___ Freetype-devel mailing list Freetype-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/freetype-devel
Re: [ft-devel] Re: possible inefficiency in gray_render_cubic
No, I don't think it's worth worrying about this right now. I worked on this because I wanted to create a version of my CartoType library as a portable .NET assembly, and in fact I managed to compile CartoType as C++/CLI code, which is a tribute to its robustness; but to do that I had to remove setjmp/longjmp and change a few symbols which conflicted with new keywords, like 'generic'. It was an interesting exercise, but proved to be a dead end (for the moment) because C++/CLI is not supported for Windows Mobile, and 'portable' .NET assemblies seem not in fact to be portable. Doing this showed that setjmp and longjmp are not necessary for FreeType; and perhaps they should be removed in future; but there's no compelling reason to do so right now. Graham Werner LEMBERG wrote: [...] I use a modified version of FreeType that can be compiled as pure .NET intermediate language code; and to do that I had to get rid of setjmp and longjmp; incidentally, with no performance penalty and little extra complexity. Is this something we should consider in general? I mean, are there benefits for removing setjmp/longjmp except portability to other programming languages? Werner ___ Freetype-devel mailing list Freetype-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/freetype-devel
Re: [ft-devel] Re: possible inefficiency in gray_render_cubic
On 06/10/2010 05:17 AM, Graham Asher wrote: Doing this showed that setjmp and longjmp are not necessary for FreeType; Indeed. They are only used for error-handling in the validator modules. behdad ___ Freetype-devel mailing list Freetype-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/freetype-devel
Re: [ft-devel] Re: possible inefficiency in gray_render_cubic
On 06/10/2010 05:17 AM, Graham Asher wrote: Doing this showed that setjmp and longjmp are not necessary for FreeType; Indeed. They are only used for error-handling in the validator modules. behdad ___ Freetype-devel mailing list Freetype-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/freetype-devel
Re: [ft-devel] Re: possible inefficiency in gray_render_cubic
Unfortunately not only there but in the rasterizer, if memory serves. They were not in any place where I could get rid of them easily by disabling certain features. Graham Behdad Esfahbod wrote: On 06/10/2010 05:17 AM, Graham Asher wrote: Doing this showed that setjmp and longjmp are not necessary for FreeType; Indeed. They are only used for error-handling in the validator modules. behdad ___ Freetype-devel mailing list Freetype-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/freetype-devel
Re: [ft-devel] Re: possible inefficiency in gray_render_cubic
This is something Ken can probably check: He has a large database for Ghostscript to compare rendering results, and artifacts introduced by the simplified shorthand calculation would easily show up, I think. I'd like to be able to, but at the moment GS doesn't use FreeType for anti-aliased rendering, [...] OK. Thanks anyway. Werner ___ Freetype-devel mailing list Freetype-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/freetype-devel
Re: [ft-devel] Re: possible inefficiency in gray_render_cubic
Calculate the midpoint and compare it with start and end: [...] For me, this looks good. Thanks for working on this. I am sceptical about the need to calculate both da and db. Perhaps db only will suffice. I hope that David or Werner can comment. This is something Ken can probably check: He has a large database for Ghostscript to compare rendering results, and artifacts introduced by the simplified shorthand calculation would easily show up, I think. Below is a patch according to your data (hopefully, I've understood you correctly). Please check and test. Werner == --- ftgrays.c.orig 2009-07-31 18:45:19.0 +0200 +++ ftgrays.c 2010-06-08 09:56:23.0 +0200 @@ -1007,45 +1007,40 @@ const FT_Vector* control2, const FT_Vector* to ) { -TPosdx, dy, da, db; +TPosdx, dy; +TPosmid_x, mid_y; int top, level; int*levels; FT_Vector* arc; -dx = DOWNSCALE( ras.x ) + to-x - ( control1-x 1 ); -if ( dx 0 ) - dx = -dx; -dy = DOWNSCALE( ras.y ) + to-y - ( control1-y 1 ); -if ( dy 0 ) - dy = -dy; -if ( dx dy ) - dx = dy; -da = dx; +/* Calculate midpoint and compare it with start and end. */ +mid_x = ( DOWNSCALE( ras.x ) + to-x + + 3 * ( control1-x + control2-x ) ) / 8; +mid_y = ( DOWNSCALE( ras.y ) + to-y + + 3 * ( control1-y + control2-y ) ) / 8; -dx = DOWNSCALE( ras.x ) + to-x - 3 * ( control1-x + control2-x ); +dx = DOWNSCALE( ras.x ) + to-x - ( mid_x 1 ); if ( dx 0 ) dx = -dx; -dy = DOWNSCALE( ras.y ) + to-y - 3 * ( control1-x + control2-y ); +dy = DOWNSCALE( ras.y ) + to-y - ( mid_y 1 ); if ( dy 0 ) dy = -dy; if ( dx dy ) dx = dy; -db = dx; +/* Check whether an approximation with straight lines is sufficient. */ level = 1; -da= da / ras.cubic_level; -db= db / ras.conic_level; -while ( da 0 || db 0 ) +dx= dx / ras.conic_level; +while ( dx 0 ) { - da = 2; - db = 3; + dx = 3; level++; } if ( level = 1 ) { - TPos to_x, to_y, mid_x, mid_y; + TPos to_x, to_y; to_x = UPSCALE( to-x ); @@ -1104,7 +1099,7 @@ Draw: { -TPos to_x, to_y, mid_x, mid_y; +TPos to_x, to_y; to_x = arc[0].x; ___ Freetype-devel mailing list Freetype-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/freetype-devel
Re: [ft-devel] Re: possible inefficiency in gray_render_cubic
At 10:04 08/06/2010 +0200, Werner LEMBERG wrote: I am sceptical about the need to calculate both da and db. Perhaps db only will suffice. I hope that David or Werner can comment. This is something Ken can probably check: He has a large database for Ghostscript to compare rendering results, and artifacts introduced by the simplified shorthand calculation would easily show up, I think. I'd like to be able to, but at the moment GS doesn't use FreeType for anti-aliased rendering, hich is when this would get used I think ? Currently GS renders text to create a bitmap mask, and fills the mask with a colour (in PostScript colour can mean something complicated, like a pattern) It's on the schedule to replace the Ghostscript (crude) anti-aliasing with the FreeType anti-aliasing, but probably after we get the PCL and XPS interpreters to use FT (at the moment only the PS and PDF ones do). If I'm wrong please let me know and I'll be happy to put a patch through our regression testing. Ken ___ Freetype-devel mailing list Freetype-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/freetype-devel
Re: [ft-devel] Re: possible inefficiency in gray_render_cubic
Werner, I think your patch is slightly wrong because you use the same values for mid_x and mid_y for both the comparison and for drawing; however, one set if values needs to go through DownScale() and the other through UpScale(); that's why I calculate separately in my hastily-prepared patch. I'll see if I can come up with something better. Graham Werner LEMBERG wrote: Calculate the midpoint and compare it with start and end: [...] For me, this looks good. Thanks for working on this. I am sceptical about the need to calculate both da and db. Perhaps db only will suffice. I hope that David or Werner can comment. This is something Ken can probably check: He has a large database for Ghostscript to compare rendering results, and artifacts introduced by the simplified shorthand calculation would easily show up, I think. Below is a patch according to your data (hopefully, I've understood you correctly). Please check and test. Werner == --- ftgrays.c.orig 2009-07-31 18:45:19.0 +0200 +++ ftgrays.c 2010-06-08 09:56:23.0 +0200 @@ -1007,45 +1007,40 @@ const FT_Vector* control2, const FT_Vector* to ) { -TPosdx, dy, da, db; +TPosdx, dy; +TPosmid_x, mid_y; int top, level; int*levels; FT_Vector* arc; -dx = DOWNSCALE( ras.x ) + to-x - ( control1-x 1 ); -if ( dx 0 ) - dx = -dx; -dy = DOWNSCALE( ras.y ) + to-y - ( control1-y 1 ); -if ( dy 0 ) - dy = -dy; -if ( dx dy ) - dx = dy; -da = dx; +/* Calculate midpoint and compare it with start and end. */ +mid_x = ( DOWNSCALE( ras.x ) + to-x + + 3 * ( control1-x + control2-x ) ) / 8; +mid_y = ( DOWNSCALE( ras.y ) + to-y + + 3 * ( control1-y + control2-y ) ) / 8; -dx = DOWNSCALE( ras.x ) + to-x - 3 * ( control1-x + control2-x ); +dx = DOWNSCALE( ras.x ) + to-x - ( mid_x 1 ); if ( dx 0 ) dx = -dx; -dy = DOWNSCALE( ras.y ) + to-y - 3 * ( control1-x + control2-y ); +dy = DOWNSCALE( ras.y ) + to-y - ( mid_y 1 ); if ( dy 0 ) dy = -dy; if ( dx dy ) dx = dy; -db = dx; +/* Check whether an approximation with straight lines is sufficient. */ level = 1; -da= da / ras.cubic_level; -db= db / ras.conic_level; -while ( da 0 || db 0 ) +dx= dx / ras.conic_level; +while ( dx 0 ) { - da = 2; - db = 3; + dx = 3; level++; } if ( level = 1 ) { - TPos to_x, to_y, mid_x, mid_y; + TPos to_x, to_y; to_x = UPSCALE( to-x ); @@ -1104,7 +1099,7 @@ Draw: { -TPos to_x, to_y, mid_x, mid_y; +TPos to_x, to_y; to_x = arc[0].x; ___ Freetype-devel mailing list Freetype-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/freetype-devel
Re: [ft-devel] Re: possible inefficiency in gray_render_cubic
Another question arises: why do we divide by ras.conic_level when we are dealing with cubic splines? Surely it should be dx= dx / ras.cubic_level; or (better - we can use assignment operators in FreeType, I think ;-): dx/= ras.cubic_level; Graham Werner LEMBERG wrote: Calculate the midpoint and compare it with start and end: [...] For me, this looks good. Thanks for working on this. I am sceptical about the need to calculate both da and db. Perhaps db only will suffice. I hope that David or Werner can comment. This is something Ken can probably check: He has a large database for Ghostscript to compare rendering results, and artifacts introduced by the simplified shorthand calculation would easily show up, I think. Below is a patch according to your data (hopefully, I've understood you correctly). Please check and test. Werner == --- ftgrays.c.orig 2009-07-31 18:45:19.0 +0200 +++ ftgrays.c 2010-06-08 09:56:23.0 +0200 @@ -1007,45 +1007,40 @@ const FT_Vector* control2, const FT_Vector* to ) { -TPosdx, dy, da, db; +TPosdx, dy; +TPosmid_x, mid_y; int top, level; int*levels; FT_Vector* arc; -dx = DOWNSCALE( ras.x ) + to-x - ( control1-x 1 ); -if ( dx 0 ) - dx = -dx; -dy = DOWNSCALE( ras.y ) + to-y - ( control1-y 1 ); -if ( dy 0 ) - dy = -dy; -if ( dx dy ) - dx = dy; -da = dx; +/* Calculate midpoint and compare it with start and end. */ +mid_x = ( DOWNSCALE( ras.x ) + to-x + + 3 * ( control1-x + control2-x ) ) / 8; +mid_y = ( DOWNSCALE( ras.y ) + to-y + + 3 * ( control1-y + control2-y ) ) / 8; -dx = DOWNSCALE( ras.x ) + to-x - 3 * ( control1-x + control2-x ); +dx = DOWNSCALE( ras.x ) + to-x - ( mid_x 1 ); if ( dx 0 ) dx = -dx; -dy = DOWNSCALE( ras.y ) + to-y - 3 * ( control1-x + control2-y ); +dy = DOWNSCALE( ras.y ) + to-y - ( mid_y 1 ); if ( dy 0 ) dy = -dy; if ( dx dy ) dx = dy; -db = dx; +/* Check whether an approximation with straight lines is sufficient. */ level = 1; -da= da / ras.cubic_level; -db= db / ras.conic_level; -while ( da 0 || db 0 ) +dx= dx / ras.conic_level; +while ( dx 0 ) { - da = 2; - db = 3; + dx = 3; level++; } if ( level = 1 ) { - TPos to_x, to_y, mid_x, mid_y; + TPos to_x, to_y; to_x = UPSCALE( to-x ); @@ -1104,7 +1099,7 @@ Draw: { -TPos to_x, to_y, mid_x, mid_y; +TPos to_x, to_y; to_x = arc[0].x; ___ Freetype-devel mailing list Freetype-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/freetype-devel
Re: [ft-devel] Re: possible inefficiency in gray_render_cubic
Werner, here's what I think is a better version of the start of the function. I've combined a few declarations and initialisations; probably against your house style, so please adjust as appropriate if you don't like it. Important changes: (i) dx is now divided by ras.cubic_level; (ii) mid_x and mid_y are recalculated only if necessary. Graham static int gray_render_cubic( RAS_ARG_ FT_Vector* control1, FT_Vector* control2, FT_Vector* to ) { int top, level; int*levels; FT_Vector* arc; int error = 0; int mid_x = (DOWNSCALE( ras.x ) + to-x + 3 * (control1-x + control2-x)) / 8; int mid_y = (DOWNSCALE( ras.y ) + to-y + 3 * (control1-y + control2-y)) / 8; TPos dx = DOWNSCALE( ras.x ) + to-x - ( mid_x 1 ); TPos dy = DOWNSCALE( ras.y ) + to-y - ( mid_y 1 ); if ( dx 0 ) dx = -dx; if ( dy 0 ) dy = -dy; if ( dx dy ) dx = dy; level = 1; dx /= ras.cubic_level; while ( dx 0 ) { dx = 3; level++; } if ( level = 1 ) { TPos to_x, to_y; to_x = UPSCALE( to-x ); to_y = UPSCALE( to-y ); /* Recalculation of midpoint is needed only if UPSCALE and DOWNSCALE have any effect. */ #if (PIXEL_BITS != 6) mid_x = ( ras.x + to_x + 3 * UPSCALE( control1-x + control2-x ) ) / 8; mid_y = ( ras.y + to_y + 3 * UPSCALE( control1-y + control2-y ) ) / 8; #endif error = gray_render_line( RAS_VAR_ mid_x, mid_y ); if (!error) error = gray_render_line( RAS_VAR_ to_x, to_y ); return error; } Werner LEMBERG wrote: Calculate the midpoint and compare it with start and end: [...] For me, this looks good. Thanks for working on this. I am sceptical about the need to calculate both da and db. Perhaps db only will suffice. I hope that David or Werner can comment. This is something Ken can probably check: He has a large database for Ghostscript to compare rendering results, and artifacts introduced by the simplified shorthand calculation would easily show up, I think. Below is a patch according to your data (hopefully, I've understood you correctly). Please check and test. Werner == --- ftgrays.c.orig 2009-07-31 18:45:19.0 +0200 +++ ftgrays.c 2010-06-08 09:56:23.0 +0200 @@ -1007,45 +1007,40 @@ const FT_Vector* control2, const FT_Vector* to ) { -TPosdx, dy, da, db; +TPosdx, dy; +TPosmid_x, mid_y; int top, level; int*levels; FT_Vector* arc; -dx = DOWNSCALE( ras.x ) + to-x - ( control1-x 1 ); -if ( dx 0 ) - dx = -dx; -dy = DOWNSCALE( ras.y ) + to-y - ( control1-y 1 ); -if ( dy 0 ) - dy = -dy; -if ( dx dy ) - dx = dy; -da = dx; +/* Calculate midpoint and compare it with start and end. */ +mid_x = ( DOWNSCALE( ras.x ) + to-x + + 3 * ( control1-x + control2-x ) ) / 8; +mid_y = ( DOWNSCALE( ras.y ) + to-y + + 3 * ( control1-y + control2-y ) ) / 8; -dx = DOWNSCALE( ras.x ) + to-x - 3 * ( control1-x + control2-x ); +dx = DOWNSCALE( ras.x ) + to-x - ( mid_x 1 ); if ( dx 0 ) dx = -dx; -dy = DOWNSCALE( ras.y ) + to-y - 3 * ( control1-x + control2-y ); +dy = DOWNSCALE( ras.y ) + to-y - ( mid_y 1 ); if ( dy 0 ) dy = -dy; if ( dx dy ) dx = dy; -db = dx; +/* Check whether an approximation with straight lines is sufficient. */ level = 1; -da= da / ras.cubic_level; -db= db / ras.conic_level; -while ( da 0 || db 0 ) +dx= dx / ras.conic_level; +while ( dx 0 ) { - da = 2; - db = 3; + dx = 3; level++; } if ( level = 1 ) { - TPos to_x, to_y, mid_x, mid_y; + TPos to_x, to_y; to_x = UPSCALE( to-x ); @@ -1104,7 +1099,7 @@ Draw: { -TPos to_x, to_y, mid_x, mid_y; +TPos to_x, to_y; to_x = arc[0].x; ___ Freetype-devel mailing list Freetype-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/freetype-devel
[ft-devel] Re: possible inefficiency in gray_render_cubic
Correction (sorry, but this stuff gets confusing). Section 6 should read: 6. A POSSIBLE BETTER FIX Calculate the midpoint and compare it with start and end: { int mid_x = (DOWNSCALE( ras.x ) + to-x + 3 * (control1-x + control2-x)) / 8; int mid_y = (DOWNSCALE( ras.y ) + to-y + 3 * (control1-y + control2-y)) / 8; dx = DOWNSCALE( ras.x ) + to-x - ( mid_x 1 ); if ( dx 0 ) dx = -dx; dy = DOWNSCALE( ras.y ) + to-y - ( mid_y 1 ); if ( dy 0 ) dy = -dy; if ( dx dy ) dx = dy; db = dx; } without those multiplications by 3, which crept in again. Graham Graham Asher wrote: I have discovered a possible inefficiency in gray_render_cubic in ftsmooth.c. Removing it (by means of what I admit is a hack based on a limited understanding of the code) gives a vast speedup with no apparent loss of quality. The error is an incorrect method of calculating whether to use a short cut when all the points of a cubic spline are close together. There is also an obvious typo (see 4A below) in the current code. This is based on the current state of the code in the Git repository. Here's my analysis so far. 1. LOOP PASSES The number of passes through the main loop is controlled largely by the variable 'level'. Reducing that number increases the speed of the function. If 'level' is no greater than one there are no passes through the loop. Instead, there's a short cut that draws a straight line from the start to the mid point of the curve and another from the mid point to the end. 2. MEANING OF 'level' It's evident from the nature of the short-cut code that 'level' is related to the maximum distance between the start, end and control points. If all the points are very close together, there is no point in generating a full curve, especially as the units used are normally 64ths of pixels; it's not very meaningful to create a curve inside a pixel, and probably won't affect its grey level, as compared with a straight line. Thus 'level' ought to be 0 or 1 if the points are very close together. 3. EXAMPLE This example is taken from my CartoType map rendering code. Coordinates: start = 16272, 24478, control1 = 16276, 24461, control2 = 16266, 24443, end = 16249, 24439 Calculated value of da after 'da = dx': 31 Calculated value of db after 'db = dx': 97795 Calculated value of 'level' (with ras.cubic_level == 64 and ras.conic_level = 128): 5. Passes through the main loop: 16. 4. ANOMALOUS VALUE OF 'db' Let's look at the value of 'db' again: 97795. This large value comes from the code dx = DOWNSCALE( ras.x ) + to-x - 3 * ( control1-x + control2-x ); if ( dx 0 ) dx = -dx; dy = DOWNSCALE( ras.y ) + to-y - 3 * ( control1-x + control2-y ); if ( dy 0 ) dy = -dy; if ( dx dy ) dx = dy; db = dx; which is where I think the problem is. The idea seems to be to get the maximum distance of the midpoint from the start and end points, but the calculation is wrong: it adds two values (the start and end coordinates) then subtracts six coordinates (the control points, multiplied by three). Thus, if all 4 coordinates (x or y) are identical, the answer is 2x - 6x, or -4x. In our case we get a value for 'db' of ~100,000 because that is 4 times the y coordinate value; they are all ~25000. (4A. AN OBVIOUS BUG: the code 'control1-x + control2-y' above should obviously be 'control1-y + control2-y'. That needs to be fixed if my suggestions here are rejected.) 5. A SIMPLE HACK My initial fix was to simply remove the multiplication by 3, which was probably cut and pasted from the later correct calculation of the curve's midpoint. Thus we have: dx = DOWNSCALE( ras.x ) + to-x - ( control1-x + control2-x ); if ( dx 0 ) dx = -dx; dy = DOWNSCALE( ras.y ) + to-y - ( control1-y + control2-y ); if ( dy 0 ) dy = -dy; if ( dx dy ) dx = dy; db = dx; which, as mentioned above, seems to fix the trouble, without apparent bad consequences. 6. A POSSIBLE BETTER FIX Calculate the midpoint and compare it with start and end: { int mid_x = (DOWNSCALE( ras.x ) + to-x + 3 * (control1-x + control2-x)) / 8; int mid_y = (DOWNSCALE( ras.y ) + to-y + 3 * (control1-y + control2-y)) / 8; dx = DOWNSCALE( ras.x ) + to-x - 3 * ( mid_x 1 ); if ( dx 0 ) dx = -dx; dy = DOWNSCALE( ras.y ) + to-y - 3 * ( mid_y 1 ); if ( dy 0 ) dy = -dy; if ( dx dy ) dx = dy; db = dx; } 7. FURTHER WORK: WHY DO WE NEED da AND db? I am sceptical about the need to calculate both da and db. Perhaps db only will suffice. I hope that David or Werner can comment. Best regards, Graham Asher CartoType Ltd ___ Freetype-devel mailing list Freetype-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/freetype-devel