On Wed, 7 Sep 2011, Mathieu Bouchard wrote:

On Wed, 7 Sep 2011, Bill Gribble wrote:

So far iteration on plain floats seems to be the best I can come up with, but HADDPS is tantalizingly close to what I want to do. Any hints?

I once thought that, using associativity and commutativity, you could speed things up like this :

(f0+f1+f2+f3)+(f4+f5+f6+f7)+...

can be rearranged as :

(f0+f4+...)+(f1+f5+...)+(f2+f6+...)+(f3+f7+...)
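A minimal sketch of that rearrangement in plain C, in case it helps to see it spelled out. The name sum_by_lanes is made up for illustration, and the four accumulators only stand in for the four floats of one SSE register; no actual intrinsics are used here. Note that float addition is not exactly associative, so the reordered sum can differ from the sequential one by rounding.

```c
#include <stddef.h>

/* Fold (total sum) of n floats using four independent lane accumulators.
   Hypothetical helper: lane[k] collects f[k], f[k+4], f[k+8], ...
   which is the grouping (f0+f4+...)+(f1+f5+...)+(f2+f6+...)+(f3+f7+...). */
float sum_by_lanes(const float *f, size_t n)
{
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;

    /* Each iteration is one "vertical" add of four consecutive floats. */
    for (; i + 4 <= n; i += 4) {
        lane[0] += f[i];
        lane[1] += f[i + 1];
        lane[2] += f[i + 2];
        lane[3] += f[i + 3];
    }

    /* A single "horizontal" add at the very end combines the lanes. */
    float total = (lane[0] + lane[1]) + (lane[2] + lane[3]);

    /* Leftover elements when n is not a multiple of 4. */
    for (; i < n; i++)
        total += f[i];

    return total;
}
```

The point of the grouping is that the four lane sums do not depend on each other, so only the last step needs a horizontal add.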

But what I said does not apply to your case, because you want a scan, whereas I didn't read carefully and assumed a fold.
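To make the difference concrete, here is a plain scalar scan as a sketch (running_sum is a made-up name, not something from Pd). A fold returns only the final total, whereas a scan has to write every partial sum, and each output depends on the one before it, which is why the lane regrouping above does not carry over directly.

```c
#include <stddef.h>

/* Inclusive scan (running sum): out[i] = in[0] + in[1] + ... + in[i].
   Hypothetical helper; this is the straightforward iteration on plain floats. */
void running_sum(const float *in, float *out, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += in[i];   /* depends on the previous iteration's acc */
        out[i] = acc;   /* every intermediate sum is part of the result */
    }
}
```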

I don't know how to optimise a scan.

 _______________________________________________________________________
| Mathieu Bouchard ---- tél: +1.514.383.3801 ---- Villeray, Montréal, QC