Re: [Pixman] [PATCH] sse2: Using MMX and SSE 4.1

Jeff Muizelaar Wed, 09 May 2012 10:48:07 -0700

On 2012-05-09, at 12:57 PM, Søren Sandmann wrote:

> Matt Turner <[email protected]> writes:
> 
>> I started porting my src_8888_0565 MMX function to SSE2, and in the
>> process started thinking about using SSE3+. The useful instructions
>> added post SSE2 that I see are
>>      SSE3:   lddqu - for unaligned loads across cache lines
> 
> I don't really understand that instruction. Isn't it identical to
> movdqu?  Or is the idea that lddqu is faster than movdqu for cache line
> splits, but slower for plain old, non-cache split unaligned loads?


"The instructions movdqu, movups, movupd and lddqu are all able to read 
unaligned vectors. lddqu is faster than the alternatives on P4E and PM 
processors, but requires the SSE3 instruction set. The unaligned read 
instructions are relatively slow on older processors, but faster on Nehalem, 
Sandy Bridge and on future AMD and Intel processors."

From http://www.agner.org/optimize/optimizing_assembly.pdf
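
For what it's worth, here is a minimal sketch of how the two loads look from C intrinsics. It only shows that they are interchangeable in terms of the result; it says nothing about speed, which depends on the processor as described in the quote above. The helper names (load_unaligned_sse2, load_unaligned_sse3, loads_agree) are made up for illustration and are not pixman functions, and I'm assuming GCC or Clang on x86-64 so that the target attribute can enable SSE3 for just those functions.

```c
#include <stdint.h>
#include <string.h>
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128 (movdqu) */
#include <pmmintrin.h>  /* SSE3: _mm_lddqu_si128 (lddqu)  */

/* Hypothetical helper: unaligned 16-byte load using movdqu (SSE2 only). */
static __m128i load_unaligned_sse2(const uint8_t *p)
{
    return _mm_loadu_si128((const __m128i *)p);
}

/* Hypothetical helper: the same load using lddqu. The target attribute
 * (GCC/Clang specific, an assumption of this sketch) enables SSE3 for
 * this function without building the whole file with -msse3. */
__attribute__((target("sse3")))
static __m128i load_unaligned_sse3(const uint8_t *p)
{
    return _mm_lddqu_si128((const __m128i *)p);
}

/* Returns 1 when both instructions load the same 16 bytes from p.
 * p may have any alignment; 16 bytes starting at p must be readable. */
__attribute__((target("sse3")))
static int loads_agree(const uint8_t *p)
{
    uint8_t a[16], b[16];

    _mm_storeu_si128((__m128i *)a, load_unaligned_sse2(p));
    _mm_storeu_si128((__m128i *)b, load_unaligned_sse3(p));
    return memcmp(a, b, 16) == 0;
}
```

Since the two are drop-in replacements for each other, a fast path could pick between them at runtime based on CPU detection, though whether that is worth it would need measuring on the processors we care about.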

-Jeff
_______________________________________________
Pixman mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/pixman
