Re: [ft-devel] FT_MulFix assembly

2010-09-06 Thread Graham Asher
Have you done an ARM version? Forgive my inattentiveness if you've already announced one. It just struck me that this sort of optimisation is even more necessary on mobile devices.

Graham

James Cloos wrote:
> The final result for amd64 looks like:
> static __inline__ long FT_MulFix_x86_64(

Re: [ft-devel] freetype.com

2010-09-06 Thread Miles Bader
phil song phils...@techtrex.com writes:
> Excuse me, I don't understand why we need to tell the slashdot.org people?

It's just a quick way to generate a lot of publicity about monotype's dirty tricks...

-Miles

--
We are all lying in the gutter, but some of us are looking at the stars.

Re: [ft-devel] FT_MulFix assembly

2010-09-06 Thread Miles Bader
James Cloos cl...@jhcloos.com writes:

    __asm__ __volatile__ (
        "movq  %1, %%rax\n"
        "imul  %2\n"
        "addq  %%rdx, %%rax\n"
        "addq  $0x8000, %%rax\n"
        "sarq  $16, %%rax\n"
        : "=a"(result)
        : "g"(a), "g"(b)
        : "rdx" );

The above code has a latency of 1+5+1+1+1 = 10
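
For readers comparing the asm with C, here is a minimal portable sketch of what the five instructions appear to compute. It assumes both operands fit in 32 bits (so the high half of the one-operand imul product, left in %rdx, is just the sign word, 0 or -1) and that the result fits in 32 bits. The names floor_shift16 and mulfix_asm_like are made up for illustration and are not FreeType functions.

```c
#include <stdint.h>

/* Floor division by 65536, which is what `sarq $16` does, written
   without relying on right shifts of negative signed values. */
static int64_t floor_shift16(int64_t v)
{
    int64_t q = v / 65536;      /* C division truncates toward zero */
    if (v % 65536 < 0)          /* negative remainder: step down to the floor */
        q -= 1;
    return q;
}

/* Hypothetical C rendering of the asm sequence, one step per instruction. */
static int32_t mulfix_asm_like(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b;         /* imul: full product */
    p += (p < 0) ? -1 : 0;              /* addq %rdx, %rax: add the sign word */
    p += 0x8000;                        /* addq $0x8000, %rax */
    return (int32_t)floor_shift16(p);   /* sarq $16, %rax */
}
```

Subtracting one from a negative product before adding 0x8000 makes exact ties round away from zero, so mulfix_asm_like(-1, 0x8000) gives -1 while mulfix_asm_like(1, 0x8000) gives 1.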

Re: [ft-devel] FT_MulFix assembly

2010-09-06 Thread Miles Bader
Incidentally, you wrote:
> The assembly generated by the C code is 45 lines and 158 octets long, contains six conditional jumps, three each of explicit compares and tests, and still benchmarks are just as fast. Out-of-order processing wins out over hand-coded asm. :-/

... but when I follow

Re: [ft-devel] FT_MulFix assembly

2010-09-06 Thread Miles Bader
Miles Bader mi...@gnu.org writes:

The compiler generates the following assembly:

    mov     %esi, %eax
    mov     %edi, %edi
    imulq   %rdi, %rax
    addq    $32768, %rax
    shrq    $16, %rax

The movs there are obviously a bit silly (compiler bug?), but that output seems

[ft-devel] latest patch file for spline flattening

2010-09-06 Thread Graham Asher
Here's a new version of my spline flattening patch. (I would like to be able to push this to the git repository but am having authentication problems; Werner has been helping me, but no success so far, probably because of my ineptitude in these matters.) The nub of the latest change is that

RE: [ft-devel] latest patch file for spline flattening

2010-09-06 Thread David Bevan
Graham,

That's looking much closer to what I would have thought we needed; only splitting the curve when required.

However, your fast heuristic can be very inaccurate. Consider

    P0: 0,0
    P1: 5,5
    P2: 95,5
    P3: 100,0

The max deviation is 3.75 (0.75 * 5 since Hain's v == 1), but your
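
To check the 3.75 figure for these control points, here is a brute-force sketch that samples the cubic and measures its distance from the line through P0 and P3. It is only a reference calculation against which a heuristic could be compared; it is neither the patch's heuristic nor Hain's method, and all names in it are illustrative. It returns the squared distance so that no math library call is needed (3.75 squared is 14.0625).

```c
typedef struct { double x, y; } Pt;

/* Point on a cubic Bezier at parameter t, using the Bernstein form. */
static Pt cubic_at(const Pt p[4], double t)
{
    double mt = 1.0 - t;
    double b0 = mt * mt * mt;
    double b1 = 3.0 * mt * mt * t;
    double b2 = 3.0 * mt * t * t;
    double b3 = t * t * t;
    Pt r;
    r.x = b0 * p[0].x + b1 * p[1].x + b2 * p[2].x + b3 * p[3].x;
    r.y = b0 * p[0].y + b1 * p[1].y + b2 * p[2].y + b3 * p[3].y;
    return r;
}

/* Largest squared distance from the curve to the line through P0 and P3,
   estimated at steps + 1 equally spaced parameter values.
   Assumes P0 != P3 and steps > 0. */
static double max_chord_deviation_sq(const Pt p[4], int steps)
{
    double cx = p[3].x - p[0].x;
    double cy = p[3].y - p[0].y;
    double len2 = cx * cx + cy * cy;
    double best = 0.0;
    for (int i = 0; i <= steps; i++) {
        Pt q = cubic_at(p, (double)i / steps);
        /* The cross product gives the distance to the chord scaled by its length. */
        double cross = cx * (q.y - p[0].y) - cy * (q.x - p[0].x);
        double d2 = cross * cross / len2;
        if (d2 > best)
            best = d2;
    }
    return best;
}

/* The control points from the example above. */
static const Pt example[4] = { {0, 0}, {5, 5}, {95, 5}, {100, 0} };
```

Calling max_chord_deviation_sq(example, 1000) samples t = 0.5, where the curve is furthest from the chord, and yields 14.0625.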

Re: [ft-devel] latest patch file for spline flattening

2010-09-06 Thread Graham Asher
David, in fact I coded up and tested a different version using an accurate calculation of the control point deviation, but it was slower than the version I am proposing. I'll try your version; and I would be grateful if you could also do some benchmarking, because obviously we are not trying

RE: [ft-devel] latest patch file for spline flattening

2010-09-06 Thread David Bevan
I'll do that. I wonder how much of the cost of FT_Load_Glyph is actually spent in gray_render_cubic and how much impact reducing the number of line segments has on later phases of the rendering process. I'm also trying to see if I can come up with a heuristic that is both fast (i.e. simple

Re: [ft-devel] FT_MulFix assembly

2010-09-06 Thread James Cloos
GA == Graham Asher graham.as...@btinternet.com writes:

GA> Have you done an ARM version? Forgive my inattentiveness if you've
GA> already announced one. It just struck me that this sort of
GA> optimisation is even more necessary on mobile devices.

I386, arm and arm-thumb versions were already

Re: [ft-devel] FT_MulFix assembly

2010-09-06 Thread James Cloos
MB == Miles Bader mi...@gnu.org writes:

MB> The compiler generates the following assembly:

MB>     mov     %esi, %eax
MB>     mov     %edi, %edi
MB>     imulq   %rdi, %rax
MB>     addq    $32768, %rax
MB>     shrq    $16, %rax

That does not match the C code though; it rounds negative values wrong.
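
A small sketch of the kind of rounding difference at issue, under the assumption that the shorter sequence amounts to "add 0x8000, then shift right by 16" on a signed product. This is not the C code from the thread, which is not reproduced here; round_floor_half_up and round_half_away are illustrative names.

```c
#include <stdint.h>

/* Add half, then take the floor: exact ties go toward +infinity. */
static int64_t round_floor_half_up(int64_t p)
{
    int64_t v = p + 0x8000;
    int64_t q = v / 65536;      /* truncates toward zero */
    if (v % 65536 < 0)          /* fix up to a true floor for negative v */
        q -= 1;
    return q;
}

/* Round the magnitude, then restore the sign: exact ties go away from zero. */
static int64_t round_half_away(int64_t p)
{
    int64_t m = (p < 0) ? -p : p;
    int64_t q = (m + 0x8000) / 65536;
    return (p < 0) ? -q : q;
}
```

The two rules agree everywhere except on negative products that lie exactly halfway between two results: for p = -0x8000 the first returns 0 and the second returns -1.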

Re: [ft-devel] FT_MulFix assembly

2010-09-06 Thread Miles Bader
On Tue, Sep 7, 2010 at 4:28 AM, James Cloos cl...@jhcloos.com wrote:
> MB == Miles Bader mi...@gnu.org writes:
>
> MB> The compiler generates the following assembly:
>
> MB>     mov     %esi, %eax
> MB>     mov     %edi, %edi
> MB>     imulq   %rdi, %rax
> MB>     addq    $32768, %rax
> MB>     shrq    $16, %rax

Re: [ft-devel] FT_MulFix assembly

2010-09-06 Thread James Cloos
MB == Miles Bader mi...@gnu.org writes:

The C version does away-from-zero rounding.

MB> Do you have test cases that show this? I tried using random inputs,
MB> but even up to billions of iterations, I can't seem to find a set of
MB> inputs where my function yields different results from yours.