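My first cut at FT_MulFix_x86_64() is:

static __inline__ FT_Int32
FT_MulFix_x86_64( FT_Int32  a,
                  FT_Int32  b )
{
  register FT_Int32  r;

  __asm__ __volatile__ (
    "movslq %%edx, %%rdx\n"    /* sign-extend b to 64 bits            */
    "cltq\n"                   /* sign-extend a: eax -> rax           */
    "imul   %%rdx\n"           /* rdx:rax = a * b                     */
    "addq   %%rdx, %%rax\n"    /* rdx is 0 or -1: subtract 1 if < 0   */
    "addq   $0x8000, %%rax\n"  /* add 0.5 in 16.16                    */
    "sarq   $16, %%rax\n"      /* scale back to 16.16                 */
    : "=a"(r), "=d"(b)         /* rdx is overwritten, so declare it   */
    : "a"(a), "d"(b)
    : "cc" );                  /* the adds and imul change the flags  */

  return r;
}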
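For anyone reviewing the algorithm, here is a minimal sketch of the same arithmetic in portable C, with int64_t standing in for FT_Int64 (roughly the cast alternative mentioned below). It sits next to a sign-and-magnitude reference and a small randomized comparison in the spirit of the Monte Carlo test. All function names are hypothetical. This is not the test that was actually run, and the sign-trick version assumes `>>` on a negative int64_t is an arithmetic shift, as it is with GCC and Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference (hypothetical name): round the magnitude, then restore the
 * sign, i.e. the sign-and-magnitude approach of the generic C code.
 * Returns the full 64-bit result so no narrowing is involved. */
static int64_t mulfix_ref(int32_t a, int32_t b)
{
    int64_t  p   = (int64_t)a * b;
    int      neg = p < 0;
    uint64_t m   = neg ? (uint64_t)0 - (uint64_t)p : (uint64_t)p;

    m = (m + 0x8000u) >> 16;          /* round half up on the magnitude */
    return neg ? -(int64_t)m : (int64_t)m;
}

/* Sign-trick version (hypothetical name): the same steps as the asm.
 * ASSUMPTION: >> on a negative int64_t is an arithmetic shift (true for
 * GCC and Clang; the C standard leaves it implementation-defined). */
static int64_t mulfix_signtrick(int32_t a, int32_t b)
{
    int64_t p  = (int64_t)a * b;      /* movslq/cltq + imul             */
    int64_t hi = p < 0 ? -1 : 0;      /* what imul leaves in rdx        */

    return (p + hi + 0x8000) >> 16;   /* addq rdx; addq $0x8000; sarq   */
}

/* Tiny xorshift64 PRNG so the test is deterministic and covers all
 * 32 bits (rand() may provide as few as 15). */
static uint64_t rng_state = 0x9E3779B97F4A7C15u;

static uint32_t next_u32(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return (uint32_t)(rng_state >> 32);
}

/* Random 16.16 value with a random magnitude, so small products are
 * exercised as well as huge ones. */
static int32_t random_fixed(void)
{
    unsigned sh = next_u32() & 31;
    int64_t  v  = (int64_t)(next_u32() >> sh) - ((int64_t)1 << (31 - sh));

    return (int32_t)v;                /* always within int32_t range    */
}

/* Compare both versions on edge cases (including exact halves) plus
 * `trials` random pairs; returns the number of disagreements. */
static long count_mismatches(long trials)
{
    static const int32_t edge[] = {
        0, 1, -1, 0x7FFF, -0x7FFF, 0x8000, -0x8000, 0x10000, -0x10000,
        0x18000, -0x18000, INT32_MAX, INT32_MIN
    };
    const size_t n = sizeof edge / sizeof edge[0];
    long bad = 0;

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (mulfix_ref(edge[i], edge[j]) != mulfix_signtrick(edge[i], edge[j]))
                bad++;

    for (long k = 0; k < trials; k++) {
        int32_t a = random_fixed();
        int32_t b = random_fixed();
        if (mulfix_ref(a, b) != mulfix_signtrick(a, b))
            bad++;
    }
    return bad;
}
```

With an arithmetic right shift, count_mismatches() should return 0 for any trial count: for a negative product p, flooring (p - 1 + 0x8000) / 65536 gives exactly the negated rounded magnitude. To exercise the asm itself on x86-64, the same loop could call FT_MulFix_x86_64 and compare it against mulfix_ref on inputs whose result fits in 32 bits.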
It passes a Monte Carlo test comparing its results to both the C code and the i386 assembly.

The logic is simple. The first two instructions sign-extend the two values to 64 bits. The multiply then puts the least significant 64 bits of the product in rax and the most significant 64 bits in rdx. Because both factors started out as 32-bit values, the product's magnitude is at most 2^62, so rdx holds nothing but sign bits: 0 if the product is >= 0, else -1. Adding rdx to rax therefore subtracts 1 from negative products, which serves the same purpose as the ecx value in the i386 version: it makes the rounding symmetric around zero (halves round away from zero), just like the C code.

An alternative might be to cast the source values to (FT_Int64), but I doubt the compiler would generate any better code than the movslq and cltq instructions.

I still have to finish the patch, but I thought I'd offer the algorithm for review in case anyone wants to look it over.

-JimC
--
James Cloos <cl...@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6

_______________________________________________
Freetype-devel mailing list
Freetype-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/freetype-devel