Re: [ft-devel] FT_MulFix assembly

2010-09-07 Thread Miles Bader
James Cloos cl...@jhcloos.com writes:
> The C version does away-from-zero rounding.
>
> MB> Do you have test cases that show this?  I tried using random inputs,
> MB> but even up to billions of iterations, I can't seem to find a set of
> MB> inputs where my function yields different results from yours.
>
> The C version saves the two signs, takes the absolute values,
> multiplies, scales and then sets the sign.
>
> When I tested, I used dd(1) to generate a quarter-gig file from urandom
> (I used a fixed file so that it would be reproducible), mmap(2)ed that
> to an int[], and went through two at a time.  The C and my initial asm
> versions produced different results whenever the second int was -1 (ie
> 0xFFFFFFFF) and the first matched: (a > 0 && (a & 0xFFFF == 0x8000)).
>
> In other words, multiplying something like 7.5 by -1/65536.
>
> An example of that test's output was:
>
>   7AFA8000, FFFFFFFF, FFFF8505, FFFF8506, 0
>
> In that example, FFFF8505 is what the C version generates.

Hm, are you sure that's not backwards?  When I tried the git C version[*],
as well as your most recent FT_MulFix_x86_64, it returned 0x8506...


The following C version:

   typedef signed int FT_Int;
   typedef signed long FT_Long;
   typedef signed long FT_Int64; /* on x86-64 */

   FT_Long
   FT_MulFix_C_new( FT_Long  a,
                    FT_Long  b )
   {
     return (((FT_Int64)a * (FT_Int64)b) + 0x8000) >> 16;
   }

... generates this code:

        imulq   %rsi, %rdi
        leaq    32768(%rdi), %rax
        sarq    $16, %rax


It seems to yield exactly the same results as the official C version[*],
both for your test case:

   $ ./t 0x7AFA8000 0x
   0x7afa8000 x 0x =
   C: 0x7fff8506
   C_new: 0x7fff8506
 asm: 0x7fff8506

... and also for billions of random inputs.


Is there something I'm missing...?

[*] Fetched from:
http://git.savannah.gnu.org/cgit/freetype/freetype2.git/tree/src/base/ftcalc.c


Thanks,

-Miles

-- 
Liberty, n. One of imagination's most precious possessions.


___
Freetype-devel mailing list
Freetype-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/freetype-devel


RE: [ft-devel] latest patch file for spline flattening

2010-09-07 Thread David Bevan

Graham,

Here are the results of my performance testing. I was a bit surprised by the 
results.

In gray_convert_glyph, the time is distributed as follows:

                  OLD    NEW
render_line       20%    15%
render_cubic      15%    33%
render_scanline   14%    10%
split_cubic        6%     9%

OLD is the pre-2.4.0 code; NEW is the latest patch from you.
These percentages are the fraction of time spent in the specific function 
(excluding children).

Including children, we have the following actual times per call for handling 
cubic curves:

                  OLD    NEW
render_cubic      142us  220us

I wasn't expecting your new code to be slower. So I ran my trace code on it 
with the following results:

                                 OLD    NEW
average line segs per arc        13.5   11.3
min line segs per arc             2      1
max line segs per arc            32    133

average deviation per line seg    0.29   0.44
min deviation per line seg        0      0
max deviation per line seg       22.2   15.8


Some arcs are creating a very large number of line segments. I expect (though I 
haven't verified) that it is this that is causing the slow-down.

Below is the data for one curve that gets broken down into many tiny line 
segments.

David %^


4604,0  2080,0  40,2020  40,4496

  40,4496 - 40,4436
  40,4436 - 40,4379
  40,4379 - 41,4321
  41,4321 - 44,4264
  44,4264 - 47,4206
  47,4206 - 51,4149
  51,4149 - 56,4092
  56,4092 - 62,4036
  62,4036 - 68,3979
  68,3979 - 74,3922
  74,3922 - 81,3865
  81,3865 - 90,3811
  90,3811 - 99,3754
  99,3754 - 109,3700
  109,3700 - 119,3645
  119,3645 - 131,3591
  131,3591 - 142,3535
  142,3535 - 154,3481
  154,3481 - 166,3427
  166,3427 - 181,3373
  181,3373 - 195,3319
  195,3319 - 210,3265
  210,3265 - 225,3211
  225,3211 - 243,3160
  243,3160 - 259,3106
  259,3106 - 277,3055
  277,3055 - 295,3002
  295,3002 - 314,2951
  314,2951 - 333,2900
  333,2900 - 354,2849
  354,2849 - 375,2798
  375,2798 - 397,2748
  397,2748 - 418,2697
  418,2697 - 440,2646
  440,2646 - 463,2595
  463,2595 - 487,2547
  487,2547 - 536,2450
  536,2450 - 588,2354
  588,2354 - 641,2258
  641,2258 - 697,2165
  697,2165 - 756,2073
  756,2073 - 817,1984
  817,1984 - 879,1894
  879,1894 - 943,1807
  943,1807 - 1009,1720
  1009,1720 - 1079,1637
  1079,1637 - 1149,1554
  1149,1554 - 1222,1474
  1222,1474 - 1297,1395
  1297,1395 - 1375,1319
  1375,1319 - 1452,1243
  1452,1243 - 1533,1169
  1533,1169 - 1614,1097
  1614,1097 - 1698,1028
  1698,1028 - 1782,959
  1782,959 - 1869,894
  1869,894 - 1958,830
  1958,830 - 2049,769
  2049,769 - 2140,708
  2140,708 - 2233,651
  2233,651 - 2328,595
  2328,595 - 2425,543
  2425,543 - 2522,491
  2522,491 - 2570,467
  2570,467 - 2621,443
  2621,443 - 2671,419
  2671,419 - 2722,397
  2722,397 - 2773,375
  2773,375 - 2825,354
  2825,354 - 2876,332
  2876,332 - 2927,311
  2927,311 - 2978,290
  2978,290 - 3031,272
  3031,272 - 3082,253
  3082,253 - 3136,235
  3136,235 - 3190,217
  3190,217 - 3244,202
  3244,202 - 3297,184
  3297,184 - 3351,169
  3351,169 - 3405,154
  3405,154 - 3460,140
  3460,140 - 3514,126
  3514,126 - 3570,114
  3570,114 - 3625,102
  3625,102 - 3682,91
  3682,91 - 3736,79
  3736,79 - 3793,69
  3793,69 - 3849,59
  3849,59 - 3906,50
  3906,50 - 3963,41
  3963,41 - 4020,34
  4020,34 - 4077,28
  4077,28 - 4136,22
  4136,22 - 4193,16
  4193,16 - 4251,11
  4251,11 - 4308,7
  4308,7 - 4368,4
  4368,4 - 4425,1
  4425,1 - 4485,0
  4485,0 - 4544,0
  4544,0 - 4604,0




Re: [ft-devel] latest patch file for spline flattening

2010-09-07 Thread GRAHAM ASHER
That is very interesting and very useful - in fact I think the more surprising 
a test is, the more useful it is. I'll have to look into your test case 
carefully as well. I might not be able to do it for a day or two, though.

Where does your test data come from? Actual fonts, cooked up data, or a mixture 
of both?

Best regards,

Graham






RE: [ft-devel] latest patch file for spline flattening

2010-09-07 Thread David Bevan

After trying various other fonts, I settled on using a single large (14,000 
glyphs; 800,000 Bezier curves) CID-keyed Type 1 font, which seemed to show 
pretty average behaviour.

I'm working on an implementation of something like Hain's algorithm now.

It'll be interesting to see how it compares.

David %^




Re: [ft-devel] latest patch file for spline flattening

2010-09-07 Thread GRAHAM ASHER
With 14,000 glyphs, I imagine that's a CJK font. I think there might be 
different characteristics from a typical Latin font. I think we also ought to 
try out a similar number of Latin glyphs, which could be done by rasterizing 
all the glyphs in a Latin font at varying sizes and rotations. In some ways a 
CJK font is probably a more stressful test, because strokes occur at a greater 
number of angles and sizes; but Latin characters should be part of the 
benchmark. I am not trying to foist more work on you; just musing.

Graham




RE: [ft-devel] latest patch file for spline flattening

2010-09-07 Thread David Bevan

I've now implemented something based on Hain's research and it seems to be 
measurably faster than previous FT approaches. I have used Hain's paper (now 
available from http://tinyurl.com/HainBez) to provide some sensible heuristics 
rather than implement all his stuff in detail, and done so without using square 
roots or even any divisions.

First, here are the trace results:

                                 OLD    NEW    HAIN
average line segs per arc        13.5   11.3    2.1
min line segs per arc             2      1      1
max line segs per arc            32    133     16

average deviation per line seg    0.29   0.44   6.5
min deviation per line seg        0      0      0
max deviation per line seg       22.2   15.8   15.7

By using reasonably accurate heuristics when deciding whether to split the 
curve, we create 5.5 x fewer line segments. This cuts down the number of calls 
to split_cubic and the number of iterations within render_cubic.
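
As a rough illustration of how such heuristics can avoid square roots and
divisions, here is a minimal sketch of the two integer quantities the code
later in this message relies on: a weighted max/min estimate of the chord
length (using the 236/256 and 97/256 weights from that code) and a cross
product that equals the chord length times the perpendicular distance of a
control point from the chord. The function names are hypothetical and chosen
only for this example; this is not the patch itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: square-root-free estimate of hypot(dx, dy).   */
/* With weights 236/256 and 97/256 the result never exceeds the true  */
/* Euclidean length, so it is an underestimate.                       */
static long chord_estimate(long dx, long dy)
{
    long ax = labs(dx);
    long ay = labs(dy);
    long hi = ax > ay ? ax : ay;
    long lo = ax > ay ? ay : ax;

    return (236 * hi + 97 * lo) >> 8;
}

/* Hypothetical helper: |cross(P3 - P0, P1 - P0)|, i.e. the chord     */
/* length multiplied by the distance of P1 from the line P0-P3.       */
/* Comparing this against (length estimate * tolerance) avoids any    */
/* division.                                                          */
static long chord_times_distance(long dx, long dy, long dx1, long dy1)
{
    return labs(dy * dx1 - dx * dy1);
}

/* Worked values for a few simple vectors. */
static void check_chord_example(void)
{
    assert(chord_estimate(3, 4) == 4);          /* true length 5   */
    assert(chord_estimate(100, 0) == 92);       /* true length 100 */
    assert(chord_estimate(-300, 400) == 482);   /* true length 500 */

    /* chord (10,0), control point offset (5,3): distance 3, chord 10 */
    assert(chord_times_distance(10, 0, 5, 3) == 30);
}
```

Because the estimate is always a little short, a test of the form
"product > estimate * limit" errs on the side of splitting slightly more
often than an exact length would.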

And now the performance results:

In gray_convert_glyph, the time is distributed as follows:

                  OLD    NEW    HAIN
render_line       20%    15%    12%
render_cubic      15%    33%     9%
render_scanline   14%    10%    10%
split_cubic        6%     9%     2%

The time spent in these functions has been significantly reduced as a fraction 
of processing time.

Including children, we have the following actual times per call for handling 
cubic curves:

                  OLD    NEW    HAIN
render_cubic      142us  220us  61us

render_cubic is now more than twice as fast as it ever has been.

The effect of the speed-up is even measurable as a 5-10% speed-up of my font 
rasterisation program (which is reading and writing data on top of using FT to 
do the actual rendering).


These tests are with the same Unicode font as before. I'll run some more tests 
with Latin-only fonts, though previous testing didn't show any significant 
performance differences between Latin and CJK. CJK glyphs just have more cubic 
Bezier curves on average, but a Bezier curve is a Bezier curve wherever it 
comes from.


The code is below. I've tried to follow Werner's coding standards as far as I 
know what they are.

Thanks.

David %^



  static void
  gray_render_cubic( RAS_ARG_ const FT_Vector*  control1,
                              const FT_Vector*  control2,
                              const FT_Vector*  to )
  {
    FT_Vector*  arc;


    arc      = ras.bez_stack;
    arc[0].x = UPSCALE( to->x );
    arc[0].y = UPSCALE( to->y );
    arc[1].x = UPSCALE( control2->x );
    arc[1].y = UPSCALE( control2->y );
    arc[2].x = UPSCALE( control1->x );
    arc[2].y = UPSCALE( control1->y );
    arc[3].x = ras.x;
    arc[3].y = ras.y;

    for (;;)
    {
      /* Check that the arc crosses the current band. */
      TPos  min, max, y;


      min = max = arc[0].y;
      y = arc[1].y;
      if ( y < min ) min = y;
      if ( y > max ) max = y;
      y = arc[2].y;
      if ( y < min ) min = y;
      if ( y > max ) max = y;
      y = arc[3].y;
      if ( y < min ) min = y;
      if ( y > max ) max = y;
      if ( TRUNC( min ) >= ras.max_ey || TRUNC( max ) < 0 )
        goto Draw;

      /* Decide whether to split or draw */
      /* See Hain's paper at http://tinyurl.com/HainBez for more info */
      {
        TPos  dx, dy, L, dx1, dy1, dx2, dy2, s1, s2;


        /* dx and dy are x- and y- components of the P0-P3 chord vector */
        dx = arc[3].x - arc[0].x;
        dy = arc[3].y - arc[0].y;

        /* L is an (under)estimate of the Euclidean distance P0-P3 */
        L = (  236 * FT_MAX(labs(dx), labs(dy))
             +  97 * FT_MIN(labs(dx), labs(dy))) >> 8;

        /* avoid possible arithmetic overflow below by splitting */
        if (L > 32767)
          goto Split;

        /* s1 is L * the perpendicular distance from P1 to the line P0-P3 */
        s1 = labs(  dy * (dx1 = arc[1].x - arc[0].x)
                  - dx * (dy1 = arc[1].y - arc[0].y));

        /* max deviation is at least (s1 / L) * sqrt(3)/6 (if v = -1) */
        if (s1 > L * (TPos)(FT_MAX_CURVE_DEVIATION / 0.288675))
          goto Split;

        /* s2 is L * the perpendicular distance from P2 to the line P0-P3 */
        s2 = labs(  dy * (dx2 = arc[2].x - arc[0].x)
                  - dx * (dy2 = arc[2].y - arc[0].y));

        /* max deviation may be as much as (max(s1,s2)/L) * 3/4 (if v = 1) */
        if (FT_MAX(s1, s2) > L * (TPos)(FT_MAX_CURVE_DEVIATION / 0.75))
          goto Split;

        /* if P1 or P2 is outside P0-P3, split */
        if (   dy * dy1 + dx * dx1 < 0
            || dy * dy2 + dx * dx2 < 0
            || dy * (arc[3].y - arc[1].y) + dx * (arc[3].x - arc[1].x) < 0
            || dy * (arc[3].y - arc[2].y) + dx * (arc[3].x - arc[2].x) < 0
           )
          goto Split;

        /* no reason to split */
        goto Draw;
      }

    Split:

      gray_split_cubic( arc );
      arc += 3;
      continue;

    Draw:

      gray_render_line( RAS_VAR_ arc[0].x, arc[0].y );

      if (arc == ras.bez_stack)
        return;

arc -= 

RE: [ft-devel] latest patch file for spline flattening

2010-09-07 Thread David Bevan

Here are some test results with Latin fonts (40 thousand curves from fonts at 
various point sizes).

Trace results:
  
                                 CJK    CJK    CJK    LATIN  LATIN
                                 OLD    NEW    HAIN   NEW    HAIN
average line segs per arc        13.5   11.3    2.1   30.9    6.1
max line segs per arc            32    133     16    163     18

average deviation per line seg    0.29   0.44   6.5    0.37   7.4
max deviation per line seg       22.2   15.8   15.7    7.9   15.7

Performance results:

In gray_convert_glyph, the time is distributed as follows:

                  CJK    CJK    CJK    LATIN  LATIN
                  OLD    NEW    HAIN   NEW    HAIN
render_line       20%    15%    12%    14%    11%
render_cubic      15%    33%     9%    34%    11%
render_scanline   14%    10%    10%    10%    11%
split_cubic        6%     9%     2%    10%     3%

Including children, we have the following actual times per call for handling 
cubic curves:

                  CJK    CJK    CJK    LATIN  LATIN
                  OLD    NEW    HAIN   NEW    HAIN
render_cubic      142us  220us  61us   546us  176us


Conclusions:

The performance improvement is as evident with Latin fonts as with CJK ones.

However, on average Bezier curves from Latin fonts require more flattening (6 
segments versus 2 with the Hain implementation), so processing them takes 
longer. As Graham pointed out to me: The curves used in Latin and other 
Latin-like alphabets are very often used to navigate 90-degree corners; P0 and 
P1 lie on a grid line, and so do P2 and P3. This is very rarely true in Han 
characters.

On the other hand, Latin glyphs contain fewer Bezier curves than CJK (6 versus 
57 on average with my data).

The upshot of both of these together is that the performance change is very 
similar (the CJK and Latin time distribution figures are so similar they could 
be from the same test).

David %^




Re: [ft-devel] FT_MulFix assembly

2010-09-07 Thread James Cloos
"MB" == Miles Bader mi...@gnu.org writes:

MB> Hm, are you sure that's not backwards?  When I tried the git C version[*],
MB> as well as your most recent FT_MulFix_x86_64, it returned 0x8506...


Odd.  Adding your algo to my test app, I get:

  7AFA8000, FFFFFFFF, FFFF8505, FFFF8505, FFFF8506
  #a      , b       , FT      , JC      , MB

I see that I have one small error in the C code in my app.

FT has:

    c = (FT_Long)( ( (FT_Int64)a * b + 0x8000L ) >> 16 );

whereas I used:

    c = (int32_t)(((int64_t)a*b + 0x8000L) >> 16);

But changing the int32_t to long does not change the results.

Yours still is always +1 compared to the C, whenever the first arg
represents a positive value with fractional part == 1/2.
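
To make the difference concrete, here is a minimal sketch of the two rounding
behaviours under discussion. It is not the FreeType source: the function
names are made up for this example, the arguments are restricted to 32 bits,
and it assumes that >> on a negative int64_t is an arithmetic shift (which is
implementation-defined in C, but is what gcc and clang do).

```c
#include <assert.h>
#include <stdint.h>

/* Sign-magnitude multiply, as described for the C version: save the  */
/* signs, multiply the absolute values, scale, then restore the sign. */
/* Halfway cases round away from zero.                                */
static int32_t mulfix_away_from_zero(int32_t a, int32_t b)
{
    int64_t ua = a;
    int64_t ub = b;
    int     negative = 0;
    int64_t c;

    if (ua < 0) { ua = -ua; negative = !negative; }
    if (ub < 0) { ub = -ub; negative = !negative; }

    c = (ua * ub + 0x8000) >> 16;
    return (int32_t)(negative ? -c : c);
}

/* Single add-and-shift on the signed product. Halfway cases round    */
/* toward +infinity, so negative halves come out one higher.          */
/* Assumes an arithmetic right shift for negative values.             */
static int32_t mulfix_half_up(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b + 0x8000) >> 16);
}

/* Worked check for a = 0x7AFA8000, b = -1 (the case quoted above). */
static void check_rounding_example(void)
{
    /* -31483 is 0xFFFF8505 as a 32-bit pattern */
    assert(mulfix_away_from_zero(0x7AFA8000, -1) == -31483);
    /* -31482 is 0xFFFF8506 as a 32-bit pattern */
    assert(mulfix_half_up(0x7AFA8000, -1) == -31482);

    /* 1.0 * 1.0 == 1.0 in 16.16 for both variants */
    assert(mulfix_away_from_zero(0x10000, 0x10000) == 0x10000);
    assert(mulfix_half_up(0x10000, 0x10000) == 0x10000);
}
```

The two functions agree except when the exact product is negative and its
fractional part is exactly one half, which is why random testing finds so few
differing inputs.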

Oddly, though, gcc now refuses to compile my asm, even though it did do
so before, complaining that it cannot guess what arg size to use for the
imul.  Weird.  (The existing executables prove that it used to.)
A simple way around that is to specify "D" and "S" as the constraints
for a and b.  (The rdi and rsi registers are where the x86_64 ABI puts
the first two args which are passed to a function.)

The disassembly of the final version is:

00000000004006c0 <mf>:
  4006c0:   48 89 f8                mov    %rdi,%rax
  4006c3:   48 f7 ee                imul   %rsi
  4006c6:   48 01 d0                add    %rdx,%rax
  4006c9:   48 05 00 80 00 00       add    $0x8000,%rax
  4006cf:   48 c1 f8 10             sar    $0x10,%rax
  4006d3:   c3                      retq

And I get this disassembly of yours:

0000000000400840 <miles>:
  400840:   48 63 c6                movslq %esi,%rax
  400843:   48 63 ff                movslq %edi,%rdi
  400846:   48 0f af c7             imul   %rdi,%rax
  40084a:   48 05 00 80 00 00       add    $0x8000,%rax
  400850:   48 c1 f8 10             sar    $0x10,%rax
  400854:   c3                      retq

I also just added this version to my test app:

int another (int32_t a, int32_t b) {
    long r = (long)a * (long)b;
    long s = r >> 31;
    return (r + s + 0x8000) >> 16;
}

That results in:

0000000000400760 <another>:
  400760:   48 63 ff                movslq %edi,%rdi
  400763:   48 63 f6                movslq %esi,%rsi
  400766:   48 0f af f7             imul   %rdi,%rsi
  40076a:   48 89 f0                mov    %rsi,%rax
  40076d:   48 c1 f8 1f             sar    $0x1f,%rax
  400771:   48 8d 84 06 00 80 00    lea    0x8000(%rsi,%rax,1),%rax
  400778:   00
  400779:   48 c1 f8 10             sar    $0x10,%rax
  40077d:   c3                      retq

Since FT's C version uses longs, though, this:

int another (long a, long b) {
    long r = (long)a * (long)b;
    long s = r >> 31;
    return (r + s + 0x8000) >> 16;
}

gives:

0000000000400760 <another>:
  400760:   48 0f af f7             imul   %rdi,%rsi
  400764:   48 89 f0                mov    %rsi,%rax
  400767:   48 c1 f8 1f             sar    $0x1f,%rax
  40076b:   48 8d 84 06 00 80 00    lea    0x8000(%rsi,%rax,1),%rax
  400772:   00
  400773:   48 c1 f8 10             sar    $0x10,%rax
  400777:   c3                      retq

So it would seem that when compiling for any processor where FT_Long is
the same as int64_t and where that fits into a single register, then
that last bit of C might be optimal, yes?

-JimC
-- 
James Cloos cl...@jhcloos.com OpenPGP: 1024D/ED7DAEA6



Re: [ft-devel] FT_MulFix assembly

2010-09-07 Thread Miles Bader
James Cloos cl...@jhcloos.com writes:
> Since FT's C version uses longs, though, this:
>
> int another (long a, long b) {
>     long r = (long)a * (long)b;
>     long s = r >> 31;
>     return (r + s + 0x8000) >> 16;
> }

That's not correct though, is it?  The variable s should be the all-sign
portion of the multiplication, but since the two inputs have 32
significant bits (never mind the types), the product will have 64
significant bits.  So r >> 31 won't be all-sign, it'll be a bunch of
... other bits. :)

However, changing the shift to 63:

   FT_Long
   FT_MulFix_C_new2( FT_Long  a,
                     FT_Long  b )
   {
     FT_Int64 prod = (FT_Int64)a * (FT_Int64)b;
     FT_Int64 sign = prod >> 63;
     return ((prod + sign + 0x8000) >> 16);
   }

... does seem to yield correct results:

   $ ./t 0x7AFA8000 0x
   0x7afa8000 x 0x =
   C: 0x8505
   C_new: 0x8505
   C_nw2: 0x8505
   C_ano: 0x8505
 asm: 0x8505

"C" is the old C, "C_new" was my previous attempt, "C_nw2" is the above
FT_MulFix_C_new2 function, "C_ano" is the "another" function, and
"asm" was your final asm version.

["another" yields misleadingly correct results in this case, because of
the particular argument values given; in other cases, it gives
incorrect results.]
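
For anyone who wants to see such a case, here is a small sketch comparing the
31-bit and 63-bit shifts. The function names are invented for this example,
the types are plain int64_t rather than the FreeType typedefs, and it assumes
an arithmetic right shift for negative values (implementation-defined in C,
but what gcc does).

```c
#include <assert.h>
#include <stdint.h>

/* Variant with the 31-bit shift: s picks up product bits, not just   */
/* the sign, once the product exceeds 31 significant bits.            */
static int64_t mulfix_shift31(int64_t a, int64_t b)
{
    int64_t r = a * b;
    int64_t s = r >> 31;

    return (r + s + 0x8000) >> 16;
}

/* Variant with the 63-bit shift: sign is 0 for a non-negative        */
/* product and -1 for a negative one (arithmetic shift assumed).      */
static int64_t mulfix_shift63(int64_t a, int64_t b)
{
    int64_t prod = a * b;
    int64_t sign = prod >> 63;

    return (prod + sign + 0x8000) >> 16;
}

static void check_shift_example(void)
{
    /* Small product: r >> 31 happens to be -1, so both agree. */
    assert(mulfix_shift31(0x7AFA8000, -1) == -31483);
    assert(mulfix_shift63(0x7AFA8000, -1) == -31483);

    /* 32767.0 * 1.0 in 16.16: the 31-bit shift adds 0xFFFE of stray  */
    /* product bits, which pushes the result up by one.               */
    assert(mulfix_shift63(0x7FFF0000, 0x10000) == 0x7FFF0000);
    assert(mulfix_shift31(0x7FFF0000, 0x10000) == 0x7FFF0001);
}
```

Multiplying by exactly 1.0 is a convenient check here, since any result other
than the unchanged first argument is clearly wrong.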

-miles

-- 
Discriminate, v.i. To note the particulars in which one person or thing is,
if possible, more objectionable than another.

