On 2013-02-21 06:28, Torbjorn Granlund wrote:
I'd advise strongly against that. It is not unlikely that you will create
hard-to-trigger carry-propagation bugs when playing with these primitives,
and addmul_N.c will be much better at finding them. It will also shorten
your development cycle (fast results, no
ni...@lysator.liu.se (Niels Möller) writes:
One could hope so. I'm still a bit skeptical of multiply-and-accumulate, at
least for addmul_2, since it seems like a good idea to schedule the
multiplications far in advance.
Multiply-accumulate can have the drawback of putting the multiplication in the
cri
Torbjorn Granlund writes:
> I suspect that some scheduling just might improve performance by a large
> factor...
> How do yo
ni...@lysator.liu.se (Niels Möller) writes:
Found the vmlal instruction now. Makes for a cute loop:

.Loop:
        vld1.32   l01[1], [vp]!
        vld1.32   {u00[]}, [up]!
        vaddl.u32 q1, l01, c01
        vmlal.u32 q1, u00, v01    C q1 overlaps with c01 a
Torbjorn Granlund writes:
> IIRC, there is an almost parallel set of SIMD multiply-accumulate
> insns.