Have you done an ARM version? Forgive my inattentiveness if you've
already announced one. It just struck me that this sort of optimisation
is even more necessary on mobile devices.
Graham
James Cloos wrote:
The final result for amd64 looks like:
static __inline__ long
FT_MulFix_x86_64(
phil song phils...@techtrex.com writes:
Excuse me, I don't understand why we need to tell the slashdot.org people?
It's just a quick way to generate a lot of publicity about monotype's
dirty tricks...
-Miles
--
We are all lying in the gutter, but some of us are looking at the stars.
James Cloos cl...@jhcloos.com writes:
__asm__ __volatile__ (
  "movq %1, %%rax\n"
  "imul %2\n"
  "addq %%rdx, %%rax\n"
  "addq $0x8000, %%rax\n"
  "sarq $16, %%rax\n"
  : "=a"(result)
  : "g"(a), "g"(b)
  : "rdx" );
The above code has a latency of 1+5+1+1+1 = 10
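For readers who do not read AT&T-syntax assembly, the five instructions can be paraphrased in C roughly as below. This is only a sketch: the names `mulfix_sketch` and `mulfix_sketch_examples` are made up for illustration, the operands are assumed to be 16.16 values whose product fits in 64 bits, and `>>` on a negative `int64_t` is assumed to be an arithmetic shift (true of GCC and Clang, but implementation-defined in C).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C paraphrase of the asm above; not FreeType's actual code.
   Assumes a*b fits in 64 bits and that >> on negative values is arithmetic. */
static int64_t mulfix_sketch(int64_t a, int64_t b)
{
    int64_t p = a * b;       /* movq + imul: low 64 bits of the product      */
    p += (p < 0) ? -1 : 0;   /* addq %rdx,%rax: with no overflow, the high   */
                             /* half in rdx is -1 if p is negative, else 0   */
    p += 0x8000;             /* addq $0x8000: add one half in 16.16 terms    */
    return p >> 16;          /* sarq $16: arithmetic shift back to 16.16     */
}

/* A few worked cases, with 0x10000 representing 1.0. */
static void mulfix_sketch_examples(void)
{
    assert(mulfix_sketch(0x10000, 0x10000) == 0x10000);    /* 1.0 * 1.0      */
    assert(mulfix_sketch(1, 0x8000) == 1);                 /* +tie rounds up */
    assert(mulfix_sketch(-1, 0x8000) == -1);               /* -tie rounds down */
    assert(mulfix_sketch(-0x10000, 0x18000) == -0x18000);  /* -1.0 * 1.5     */
}
```

The subtraction of one for negative products is what makes exact ties round away from zero in both directions, rather than always upward.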
Incidentally, you wrote:
The assembly generated by the C code is 45 lines and 158 octets long,
contains six conditional jumps, three each of explicit compares and
tests, and still benchmarks are just as fast. Out-of-order processing
wins out over hand-coded asm. :-/
... but when I follow
Miles Bader mi...@gnu.org writes:
The compiler generates the following assembly:
mov %esi, %eax
mov %edi, %edi
imulq %rdi, %rax
addq $32768, %rax
shrq $16, %rax
The movs there are obviously a bit silly (compiler bug?), but that
output seems
Here's a new version of my spline flattening patch. (I would like to be
able to push this to the git repository but am having authentication
problems; Werner has been helping me, but no success so far, probably
because of my ineptitude in these matters.)
The nub of the latest change is that
Graham,
That's looking much closer to what I would have thought we needed; only
splitting the curve when required. However, your fast heuristic can be very
inaccurate.
Consider
P0: 0,0
P1: 5,5
P2: 95,5
P3: 100,0
The max deviation is 3.75 (0.75 * 5 since Hain's v == 1), but your
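The 3.75 figure for the control points listed above can be checked numerically. The sketch below is a brute-force check, not the flattening heuristic under discussion; the function names are hypothetical. It relies on P0 and P3 both lying on the x-axis, so the distance from the chord is simply |y(t)|.

```c
#include <assert.h>

/* y-coordinate of a cubic Bezier with control ordinates y0..y3, t in [0,1]. */
static double cubic_y(double y0, double y1, double y2, double y3, double t)
{
    double u = 1.0 - t;
    return u*u*u*y0 + 3.0*u*u*t*y1 + 3.0*u*t*t*y2 + t*t*t*y3;
}

/* Brute-force maximum of |y(t)| over samples+1 evenly spaced parameters.
   This equals the distance from the chord only when the chord is the x-axis. */
static double max_abs_y(double y0, double y1, double y2, double y3, int samples)
{
    double best = 0.0;
    for (int i = 0; i <= samples; i++) {
        double y = cubic_y(y0, y1, y2, y3, (double)i / samples);
        if (y < 0.0) y = -y;
        if (y > best) best = y;
    }
    return best;
}

/* The example above: ordinates 0, 5, 5, 0. */
static void check_example(void)
{
    double d = max_abs_y(0.0, 5.0, 5.0, 0.0, 1000);
    assert(d > 3.7499 && d < 3.7501);
}
```

For these ordinates y(t) reduces to 15 t (1 - t), which peaks at t = 0.5 with the value 3.75.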
David,
in fact I coded up and tested a different version using an accurate
calculation of the control point deviation, but it was slower than the
version I am proposing. I'll try your version; and I would be grateful
if you could also do some benchmarking, because obviously we are not
trying
I'll do that. I wonder how much of the cost of FT_Load_Glyph is actually spent
in gray_render_cubic and how much impact reducing the number of line segments
has on later phases of the rendering process.
I'm also trying to see if I can come up with a heuristic that is both fast
(i.e. simple
GA == Graham Asher graham.as...@btinternet.com writes:
GA> Have you done an ARM version? Forgive my inattentiveness if you've
GA> already announced one. It just struck me that this sort of
GA> optimisation is even more necessary on mobile devices.
I386, arm and arm-thumb versions were already
MB == Miles Bader mi...@gnu.org writes:
MB> The compiler generates the following assembly:
MB> mov %esi, %eax
MB> mov %edi, %edi
MB> imulq %rdi, %rax
MB> addq $32768, %rax
MB> shrq $16, %rax
That does not match the C code though; it rounds negative values wrong.
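To illustrate how an add-and-shift can disagree with away-from-zero rounding, here is a small sketch. It is not claimed to be either of the functions being compared in this thread; both names are hypothetical, and the product is assumed to fit in 64 bits. The second function uses floor division so that the result does not depend on how the compiler shifts negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Round half away from zero: work on the magnitude, then restore the sign. */
static int64_t mul_away_from_zero(int64_t a, int64_t b)
{
    int64_t p = a * b;                      /* assumes no 64-bit overflow */
    int neg = p < 0;
    uint64_t m = neg ? (uint64_t)(-p) : (uint64_t)p;
    m = (m + 0x8000) >> 16;
    return neg ? -(int64_t)m : (int64_t)m;
}

/* Plain "add 0x8000, then floor-divide by 65536": ties round toward +infinity. */
static int64_t mul_add_shift(int64_t a, int64_t b)
{
    int64_t p = a * b + 0x8000;
    int64_t q = p / 65536;
    if (p % 65536 < 0)
        q--;                                /* turn truncation into floor */
    return q;
}

/* The two agree except on exact ties with a negative product. */
static void rounding_examples(void)
{
    assert(mul_away_from_zero(1, 0x8000) == 1);
    assert(mul_add_shift(1, 0x8000) == 1);
    assert(mul_away_from_zero(-1, 0x8001) == -1);
    assert(mul_add_shift(-1, 0x8001) == -1);
    assert(mul_away_from_zero(-1, 0x8000) == -1);   /* tie: away from zero */
    assert(mul_add_shift(-1, 0x8000) == 0);         /* tie: toward +inf    */
}
```

A test harness looking for differences would therefore need inputs whose product has low 16 bits equal to 0x8000 and a negative sign.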
On Tue, Sep 7, 2010 at 4:28 AM, James Cloos cl...@jhcloos.com wrote:
MB == Miles Bader mi...@gnu.org writes:
MB> The compiler generates the following assembly:
MB> mov %esi, %eax
MB> mov %edi, %edi
MB> imulq %rdi, %rax
MB> addq $32768, %rax
MB> shrq $16, %rax
MB == Miles Bader mi...@gnu.org writes:
The C version does away-from-zero rounding.
MB> Do you have test cases that show this? I tried using random inputs,
MB> but even up to billions of iterations, I can't seem to find a set of
MB> inputs where my function yields different results from yours.