Werner: Miles' version is shorter, is only wrong by one ulp and only
when the product overflows and is negative. My variation,
called another() above, fixes that slight difference.
Which would you prefer, if anything?
I tend to prefer the faster one.
James Cloos cl...@jhcloos.com writes:
The C version does away-from-zero rounding.
MB Do you have test cases that show this? I tried using random inputs,
MB but even up to billions of iterations, I can't seem to find a set of
MB inputs where my function yields different results from yours.
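A directed sweep can complement random testing here. The sketch below is only an illustration: `mulfix_ref` is a reference written from the thread's description of away-from-zero rounding (it is not FreeType's source), `mulfix_halfup` is a hypothetical stand-in for a plain add-and-shift (it is not Miles' function), and `count_mismatches` is a made-up helper name. It assumes the compiler performs an arithmetic right shift on negative values, as GCC does.

```c
#include <stdint.h>

typedef int32_t (*mulfix_fn)(int32_t, int32_t);

/* Reference: round the 16.16 product half away from zero. */
int32_t mulfix_ref(int32_t a, int32_t b)
{
    int64_t p   = (int64_t)a * (int64_t)b;
    int64_t mag = p < 0 ? -p : p;            /* magnitude of the product */
    int64_t q   = (mag + 0x8000) >> 16;      /* round the magnitude      */
    return (int32_t)(p < 0 ? -q : q);        /* restore the sign         */
}

/* Stand-in candidate: add the bias and shift, with no sign handling.
   Assumes >> on a negative int64_t is an arithmetic shift. */
int32_t mulfix_halfup(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;
    return (int32_t)((p + 0x8000) >> 16);
}

/* Count the values of a in [lo, hi] on which f and g disagree for a fixed b.
   The caller must keep hi below INT32_MAX so the loop counter cannot overflow. */
long count_mismatches(mulfix_fn f, mulfix_fn g,
                      int32_t b, int32_t lo, int32_t hi)
{
    long n = 0;
    for (int32_t a = lo; a <= hi; a++)
        if (f(a, b) != g(a, b))
            n++;
    return n;
}
```

With b = 0x8000 every odd a lands exactly half-way between two results, so `count_mismatches(mulfix_ref, mulfix_halfup, 0x8000, -100, 100)` reports 50 disagreements, one for each negative odd a; with b = 0x10000 there are no half-way cases and the count is 0.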
MB == Miles Bader mi...@gnu.org writes:
MB Hm, are you sure that's not backwards? When I tried the git C version[*],
MB as well as your most recent FT_MulFix_x86_64, it returned 0x8506...
Odd. Adding your algo to my test app, I get:
7AFA8000, , 8505, 8505, 8506
James Cloos cl...@jhcloos.com writes:
Since FT's C version uses longs, though, this:
int another (long a, long b) {
    long r = (long)a * (long)b;
    long s = r >> 31;
    return (r + s + 0x8000) >> 16;
}
That's not correct though, is it? The variable s should be the all
sign portion of the
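One way to read the objection is that the correction term should be nothing but the sign of the product: 0 when it is non-negative and -1 when it is negative. The sketch below makes that assumption explicit by shifting a 64-bit product right by 63. The names `mulfix_ref` and `mulfix_signfix` are hypothetical, the reference is written from the thread's description of away-from-zero rounding rather than copied from FreeType, and the code relies on arithmetic right shift of negative values, which the C standard leaves implementation-defined.

```c
#include <stdint.h>

/* Reference: round the 16.16 product half away from zero. */
int32_t mulfix_ref(int32_t a, int32_t b)
{
    int64_t p   = (int64_t)a * (int64_t)b;
    int64_t mag = p < 0 ? -p : p;
    int64_t q   = (mag + 0x8000) >> 16;
    return (int32_t)(p < 0 ? -q : q);
}

/* Branch-free variant: subtract 1 from negative products before adding
   the rounding bias, so that exact half-way cases round away from zero. */
int32_t mulfix_signfix(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;
    int64_t s = p >> 63;     /* 0 if p >= 0, -1 if p < 0 (arithmetic shift assumed) */
    return (int32_t)((p + s + 0x8000) >> 16);
}
```

For a negative product p = -n, the branch-free form computes floor((0x7FFF - n) / 65536), which equals -floor((n + 0x8000) / 65536), the value the reference returns. So the two should agree whenever the result fits in 32 bits.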
Have you done an ARM version? Forgive my inattentiveness if you've
already announced one. It just struck me that this sort of optimisation
is even more necessary on mobile devices.
Graham
James Cloos wrote:
The final result for amd64 looks like:
static __inline__ long
FT_MulFix_x86_64(
James Cloos cl...@jhcloos.com writes:
__asm__ __volatile__ (
  "movq %1, %%rax\n"
  "imul %2\n"
  "addq %%rdx, %%rax\n"
  "addq $0x8000, %%rax\n"
  "sarq $16, %%rax\n"
  : "=a"(result)
  : "g"(a), "g"(b)
  : "rdx" );
The above code has a latency of 1+5+1+1+1 = 9
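For readers who do not read AT&T syntax, the quoted sequence can be approximated in C as follows. This is a sketch, not the patch itself: `mulfix_like_asm` is a hypothetical name, `__int128` is a GCC/Clang extension available on 64-bit targets, and the final conversion and shift rely on two's-complement wraparound and arithmetic right shift, both of which GCC provides.

```c
#include <stdint.h>

/* Step-by-step C rendering of the five instructions quoted above. */
int64_t mulfix_like_asm(int64_t a, int64_t b)
{
    __int128 wide = (__int128)a * (__int128)b;  /* imul: rdx:rax = a * b */
    uint64_t lo   = (uint64_t)wide;             /* rax: low 64 bits      */
    uint64_t hi   = (uint64_t)(wide >> 64);     /* rdx: high 64 bits     */

    /* addq %rdx, %rax: when the product fits in 64 bits, hi is either 0
       or all ones, so this adds 0 or -1 (modulo 2^64). */
    lo += hi;

    /* addq $0x8000, %rax: rounding bias of one half in 16.16. */
    lo += 0x8000;

    /* sarq $16, %rax: arithmetic shift back to 16.16. */
    return (int64_t)lo >> 16;
}
```

As an example, `mulfix_like_asm(-1, 0x8000)` yields -1 and `mulfix_like_asm(3, 0x8000)` yields 2, so both half-way cases round away from zero.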
Incidentally, you wrote:
The assembly generated by the C code is 45 lines and 158 octets long,
contains six conditional jumps, three each of explicit compares and
tests, and still benchmarks are just as fast. Out-of-order processing
wins out over hand-coded asm. :-/
... but when I follow
Miles Bader mi...@gnu.org writes:
The compiler generates the following assembly:
mov %esi, %eax
mov %edi, %edi
imulq %rdi, %rax
addq $32768, %rax
shrq $16, %rax
The movs there are obviously a bit silly (compiler bug?), but that
output seems
GA == Graham Asher graham.as...@btinternet.com writes:
GA Have you done an ARM version? Forgive my inattentiveness if you've
GA already announced one. It just struck me that this sort of
GA optimisation is even more necessary on mobile devices.
I386, arm and arm-thumb versions were already
MB == Miles Bader mi...@gnu.org writes:
MB The compiler generates the following assembly:
MB mov %esi, %eax
MB mov %edi, %edi
MB imulq %rdi, %rax
MB addq $32768, %rax
MB shrq $16, %rax
That does not match the C code though; it rounds negative values wrong.
On Tue, Sep 7, 2010 at 4:28 AM, James Cloos cl...@jhcloos.com wrote:
MB == Miles Bader mi...@gnu.org writes:
MB The compiler generates the following assembly:
MB mov %esi, %eax
MB mov %edi, %edi
MB imulq %rdi, %rax
MB addq $32768, %rax
MB shrq $16, %rax
MB == Miles Bader mi...@gnu.org writes:
The C version does away-from-zero rounding.
MB Do you have test cases that show this? I tried using random inputs,
MB but even up to billions of iterations, I can't seem to find a set of
MB inputs where my function yields different results from yours.
The final result for amd64 looks like:
static __inline__ long
FT_MulFix_x86_64( long a,
                  long b )
{
  register long result;
  __asm__ __volatile__ (
    "movq %1, %%rax\n"
    "imul %2\n"
    "addq %%rdx, %%rax\n"
    "addq $0x8000, %%rax\n"
    "sarq $16,
I have to finish the patch, but I thought I'd offer the algorithm
for review, if anyone wants to.
I haven't enough knowledge to comment, but thanks for working on it!
Werner
___
Freetype-devel mailing list
Freetype-devel@nongnu.org
My first cut at FT_MulFix_x86_64() is:
static __inline__ FT_Int32
FT_MulFix_x86_64 (FT_Int32 a, FT_Int32 b) {
    register FT_Int32 r;
    __asm__ __volatile__ (
        "movslq %%edx, %%rdx\n"
        "cltq\n"
        "imul %%rdx\n"
        "addq %%rdx, %%rax\n"
        "addq $0x8000, %%rax\n"
I see implementations for ia32 and arm; would other platforms
benefit from assembly implementations of MulFix?
As usual: patches are highly welcomed.
Werner