Sure, I understand. Let me quickly explain the matrix-vector
behavior you've observed: I don't know your experimental setting, but
for a 1k x 1k matrix-vector multiplication, the small input (8MB) likely
fits into the L3 cache. If you increased the data size to, say, 8GB (where you
actually read from
Thanks for doing this, Deron. Cleaning up the code base will make it much
easier to maintain and fix issues as they arise.
-Mike
--
Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry
> On Nov 30, 2016, at 11:56 AM, Deron Eriksson w