Re: Help with Hand-Optimized Assembly

2012-03-28 Thread Bill Woessner
On Jan 13, 4:59 am, Terje Mathisen terje.mathisen at tmsw.no@giganews.com wrote: I'll second James' suggestion about SSE2! I'm open to using SSE2. The only reason I used x87 is that I started with the assembly code that g++ generated. By default, it generates x87 instructions. But I'm

Re: Help with Hand-Optimized Assembly

2012-03-28 Thread Terje Mathisen
James Van Buskirk wrote: Bill Woessner woess...@nospicedham.gmail.com wrote in message news:67ddafac-ae03-4ef1-b156-5488e8b80...@i26g2000vbt.googlegroups.com... This compiles, runs and produces the correct answers. But I have a few issues with it: 1) If I declare this function inline, it

Re: Help with Hand-Optimized Assembly

2012-03-28 Thread Terje Mathisen
sfuerst wrote: There is a straight-forward algorithm using the fact that only one of the bounds can be crossed... Something like this: (Inputs in %xmm0, and %xmm1, output in %xmm0) subsd %xmm1,%xmm0 movsd plusM_PI(%rip), %xmm1 movsd minusM_PI(%rip), %xmm2 cmpgtsd %xmm0, %xmm1 cmpltsd %xmm0,
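The "only one of the bounds can be crossed" observation can be sketched in plain C. This is a minimal illustration, not code from the thread: the function name angle_delta is hypothetical, and it assumes both inputs already lie in [-pi, pi], so the raw difference is in [-2*pi, 2*pi] and at most one correction is ever needed.

```c
#include <math.h>

/* Hypothetical helper (not from the thread): returns th1 - th2 wrapped
 * into [-pi, pi].
 * Assumption: th1 and th2 are each already in [-pi, pi], so the raw
 * difference lies in [-2*pi, 2*pi] and can cross at most one bound. */
static double angle_delta(double th1, double th2)
{
    const double pi = 3.14159265358979323846;
    double d = th1 - th2;

    if (d > pi)
        d -= 2.0 * pi;   /* crossed the upper bound: pull back one turn */
    else if (d < -pi)
        d += 2.0 * pi;   /* crossed the lower bound: push forward one turn */

    return d;
}
```

With the same input assumption, an SSE2 version would replace the two branches with compare masks and a masked add, which appears to be the direction the quoted assembly takes.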

Re: Help with Hand-Optimized Assembly

2012-03-28 Thread Terje Mathisen
Tim Roberts wrote: Terje Mathisen terje.mathisen at tmsw.no@giganews.com wrote: Inline C isn't too hard to write: Have you tried this code? inline double delta(double th1, th2) { static double pi = 3.14159265357989; static double zero_or_twopi[2] = {0, 3.14159265357989*2};
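The quoted snippet is cut off, so the following is not a reconstruction of it. It is only one way a two-entry {0, 2*pi} table could be used to avoid explicit branches, assuming both inputs are already in [-pi, pi]. The name angle_delta_table is hypothetical, and both parameters are given an explicit double type.

```c
#include <math.h>

/* Hypothetical sketch (not the code from the thread): wraps th1 - th2
 * into [-pi, pi] by indexing a {0, 2*pi} table with comparison results.
 * Assumption: th1 and th2 are each already in [-pi, pi]. */
static double angle_delta_table(double th1, double th2)
{
    static const double pi = 3.14159265358979323846;
    static const double zero_or_twopi[2] = { 0.0, 6.28318530717958647692 };
    double d = th1 - th2;

    /* A C comparison yields 0 or 1, which selects the table entry. */
    d -= zero_or_twopi[d > pi];
    d += zero_or_twopi[d < -pi];

    return d;
}
```

Whether the compiler turns the table lookups into branch-free code depends on the target and optimization level, so the generated assembly is worth checking.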

Re: Help with Hand-Optimized Assembly

2012-03-28 Thread io_x
Terje Mathisen terje.mathisen at tmsw.no@giganews.com wrote in message news:iqe6u8-os52@ntp6.tmsw.no... inline double delta(double th1, th2) why not 2 loops [one for normalize delta pi and one for -pi]? difPimPi(double x, y) x-=y |?#.2 .1: x=-pi#.3|x+=2*pi|#.1 .2: x

Re: Help with Hand-Optimized Assembly

2012-03-28 Thread James Van Buskirk
Terje Mathisen terje.mathisen at tmsw.no@giganews.com wrote in message news:5gh6u8-3062@ntp6.tmsw.no... sfuerst wrote: There is a straight-forward algorithm using the fact that only one of the bounds can be crossed... Something like this: (Inputs in %xmm0, and %xmm1, output in %xmm0)

Re: Help with Hand-Optimized Assembly

2012-03-28 Thread Terje Mathisen
Bill Woessner wrote: On Jan 13, 4:59 am, Terje Mathisen terje.mathisen at tmsw.no@giganews.com wrote: I'll second James' suggestion about SSE2! I'm open to using SSE2. The only reason I used x87 is that I started with the assembly code that g++ generated. By default, it generates x87

Re: Help with Hand-Optimized Assembly

2012-03-28 Thread James Van Buskirk
Bill Woessner woess...@nospicedham.gmail.com wrote in message news:67ddafac-ae03-4ef1-b156-5488e8b80...@i26g2000vbt.googlegroups.com... This compiles, runs and produces the correct answers. But I have a few issues with it: 1) If I declare this function inline, it gives me garbage (like