Re: [ft-devel] FT_MulFix assembly

James Cloos Sun, 08 Aug 2010 13:40:21 -0700

My first cut at FT_MulFix_x86_64() is:

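static __inline__ FT_Int32
FT_MulFix_x86_64 (FT_Int32 a, FT_Int32 b) {
    register FT_Int32 r;
    __asm__ __volatile__ (
        "movslq %%edx, %%rdx\n"     /* sign-extend b to 64 bits */
        "cltq\n"                    /* sign-extend a (eax) into rax */
        "imul  %%rdx\n"             /* rdx:rax = rax * rdx */
        "addq  %%rdx, %%rax\n"      /* rdx is 0 or -1 here */
        "addq  $0x8000, %%rax\n"    /* add one half (16.16) */
        "sarq  $16, %%rax\n"        /* drop the 16 fraction bits */
        : "=a"(r), "+d"(b)          /* rdx is overwritten, so b is in/out */
        : "a"(a)
        : "cc");                    /* the arithmetic changes the flags */
    return r;
}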


It passes a Monte Carlo test comparing its results to the C code and to
the i386 assembly.

The logic is simple.  The first two instructions sign-extend the two
values to 64 bits.  The multiply puts the least significant 64 bits of
the product in rax and the most significant 64 bits in rdx.  Because the
values started out as 32-bit, rdx is guaranteed to hold only sign bits:
zero if the product is >= 0, else -1.  Adding the resulting rdx to rax
serves the same purpose as the ecx value in the i386 version: it makes
the rounding symmetric around zero, just like the C code.
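
To make the rounding step concrete, here is a minimal C sketch of only
the last three instructions, applied to a product that is already 64
bits wide.  The function names are made up for illustration and are not
part of FreeType.  The sketch assumes that >> on a negative int64_t is
an arithmetic shift, which is implementation-defined in C but is what
GCC does on x86-64.

```c
#include <assert.h>
#include <stdint.h>

/* Models: addq %rdx,%rax ; addq $0x8000,%rax ; sarq $16,%rax.
   hi stands in for the value imul leaves in rdx: 0 or -1. */
static int64_t
round16_biased (int64_t p)
{
    int64_t hi = p < 0 ? -1 : 0;
    return (p + hi + 0x8000) >> 16;
}

/* Same sequence without the rdx term, for comparison. */
static int64_t
round16_plain (int64_t p)
{
    return (p + 0x8000) >> 16;
}

/* Round the magnitude, then restore the sign. */
static int64_t
round16_signmag (int64_t p)
{
    int64_t q = ((p < 0 ? -p : p) + 0x8000) >> 16;
    return p < 0 ? -q : q;
}

/* Exact half-way products are where the variants can differ. */
static void
round16_spot_check (void)
{
    assert (round16_biased (0x8000) == 1);
    assert (round16_biased (-0x8000) == -1);
    assert (round16_signmag (-0x8000) == -1);
    assert (round16_plain (-0x8000) == 0);    /* not symmetric */
    assert (round16_biased (-0x7FFF) == 0);
    assert (round16_biased (-0x18000) == -2);
}
```

With the rdx term, a product of exactly minus one half rounds to -1,
matching the sign-magnitude form.  Without it, the same product rounds
to 0 while plus one half rounds to 1.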

An alternative might be to cast the source values to (FT_Int64), but I
doubt that the compiler would generate any better code than the explicit
movslq and cltq.
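
For reference, the cast-based alternative could look roughly like the
sketch below, together with a small randomized comparison against a
sign-magnitude reference.  Everything here is an assumption made for
illustration: the typedefs are stand-ins for the FreeType ones, and
FT_MulFix_cast64, mulfix_reference, next_random and count_mismatches
are hypothetical names.  It also assumes an arithmetic right shift on
negative values and wrapping narrowing conversions, as with GCC on
x86-64.  It makes no claim about the code a compiler generates for it.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t FT_Int32;   /* stand-ins for the FreeType typedefs */
typedef int64_t FT_Int64;

/* Cast-based variant using the same bias as the assembly. */
static FT_Int32
FT_MulFix_cast64 (FT_Int32 a, FT_Int32 b)
{
    FT_Int64 p = (FT_Int64)a * (FT_Int64)b;

    p += (p >> 63) + 0x8000;    /* p >> 63 is 0 or -1, like rdx */
    return (FT_Int32)(p >> 16);
}

/* Reference: round the magnitude, then restore the sign. */
static FT_Int32
mulfix_reference (FT_Int32 a, FT_Int32 b)
{
    FT_Int64 p = (FT_Int64)a * (FT_Int64)b;
    FT_Int64 q = ((p < 0 ? -p : p) + 0x8000) >> 16;

    return (FT_Int32)(p < 0 ? -q : q);
}

/* xorshift32 generator, so runs are repeatable for a given seed. */
static uint32_t
next_random (uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Count disagreements over n random input pairs. */
static unsigned long
count_mismatches (unsigned long n, uint32_t seed)
{
    unsigned long bad = 0;
    uint32_t state = seed;

    assert (seed != 0);         /* xorshift32 gets stuck at zero */
    for (unsigned long i = 0; i < n; i++) {
        FT_Int32 a = (FT_Int32)next_random (&state);
        FT_Int32 b = (FT_Int32)next_random (&state);

        if (FT_MulFix_cast64 (a, b) != mulfix_reference (a, b))
            bad++;
    }
    return bad;
}
```

Calling count_mismatches (1000000UL, 2463534242u) runs one million
random pairs.  A nonzero return value would mean the two roundings
disagree somewhere.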

I still have to finish the patch, but I thought I'd offer the algorithm
for review, in case anyone wants to look it over.

-JimC
-- 
James Cloos <cl...@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6

_______________________________________________
Freetype-devel mailing list
Freetype-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/freetype-devel
