Re: arm "neon"

2013-02-21 Thread Richard Henderson
On 2013-02-21 06:28, Torbjorn Granlund wrote:
> I'd advise strongly against that. Creating hard-to-trigger carry propagation bugs is not unlikely when playing with these primitives, and addmul_N.c will be much better at finding these, and will also shorten your development cycle (fast results, no

Re: arm "neon"

2013-02-21 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes:
  One could hope so. I'm still a bit skeptical about mul and accumulate, at least for addmul_2, since it seems like a good idea to schedule the multiplications far in advance.
Multiply-accumulate can have the drawback of putting multiplication in the cri

Re: arm "neon"

2013-02-21 Thread Niels Möller
Torbjorn Granlund writes:
> I suspect that some scheduling just might improve performance by a large
> factor...
One could hope so. I'm still a bit skeptical about mul and accumulate, at least for addmul_2, since it seems like a good idea to schedule the multiplications far in advance.
> How do yo

Re: arm "neon"

2013-02-21 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes:
  Found the vmlal instruction now. Makes for a cute loop,
  .Loop:  vld1.32   l01[1], [vp]!
          vld1.32   {u00[]}, [up]!
          vaddl.u32 q1, l01, c01
          vmlal.u32 q1, u00, v01    C q1 overlaps with c01 a

Re: arm "neon"

2013-02-21 Thread Niels Möller
Torbjorn Granlund writes:
> IIRC, there is an almost parallel set of SIMD multiply-accumulate
> insns.
Found the vmlal instruction now. Makes for a cute loop,
.Loop:  vld1.32   l01[1], [vp]!
        vld1.32   {u00[]}, [up]!
        vaddl.u32 q1, l01, c01
        vmlal.u
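
For readers following the loop quoted above, here is a minimal portable C sketch of what an addmul_1-style routine computes. It is not the NEON code from the thread, only a plain reference under the assumption of 32-bit limbs, and the name ref_addmul_1 is hypothetical. The point it illustrates is the bound that makes a widening add followed by a widening multiply-accumulate workable: a 32x32-bit product plus one 32-bit limb plus one 32-bit carry is at most 2^64 - 1, so a 64-bit accumulator cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine (not from the thread), assuming 32-bit limbs.
   Computes rp[0..n-1] += up[0..n-1] * v and returns the carry-out limb. */
uint32_t ref_addmul_1(uint32_t *rp, const uint32_t *up, size_t n, uint32_t v)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* (2^32-1)^2 + 2*(2^32-1) == 2^64-1, so t never overflows 64 bits. */
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        rp[i] = (uint32_t)t;            /* low half is the result limb */
        carry = (uint32_t)(t >> 32);    /* high half propagates to the next limb */
    }
    return carry;
}
```

All-ones operands are a natural way to exercise the carry path: with rp = {0xFFFFFFFF, 0xFFFFFFFF}, up = {0xFFFFFFFF, 0xFFFFFFFF} and v = 0xFFFFFFFF, the routine leaves rp = {0, 0xFFFFFFFF} and returns 0xFFFFFFFF.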