Chris Marshall <[email protected]> wrote:

> What do you mean doing the inner product in parallel?
> 
> The above was just the operation counts.

My bad. I thought you were telling me that *memory* was O(N**3), so I
completely misunderstood. I think I understand now. The inner product
*reads* values from memory O(N**3) times, but the memory footprint is
still O(N**2).

Your email said "O(N**3) memory ops" and my monkey brain saw "O(N**3)
memory". :-P


> The good news is that this is what caches are for, which is why
> things aren't so bad for smaller matrices.  This kind of cache
> optimization is what's needed to speed up PDL's matrix multiply.
> Since you have O(N**3) computations over only O(N**2) data, for
> large matrix multiplies you can almost completely hide the memory
> access cost---if it is implemented to do that...
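For the record, here is a minimal sketch of the kind of cache blocking
being described. This is not PDL's actual code; the matrix size N and
block size B are made-up illustrative values. The blocked version does
the same O(N**3) multiply-adds, but reuses each BxB tile while it is
still cache-resident instead of streaming the whole row/column through
memory every time:

```c
#include <string.h>

#define N 64   /* matrix dimension (illustrative, not PDL's) */
#define B 16   /* block size: three BxB double tiles should fit in cache */

/* Naive triple loop: O(N^3) multiply-adds, and each b[k][j] read
 * strides through memory with little reuse between iterations. */
static void matmul_naive(const double a[N][N], const double b[N][N],
                         double c[N][N]) {
    memset(c, 0, sizeof(double) * N * N);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
}

/* Blocked version: identical operation count, but the inner three
 * loops work on BxB tiles, so each tile of a and b is reused B times
 * from cache before being evicted. */
static void matmul_blocked(const double a[N][N], const double b[N][N],
                           double c[N][N]) {
    memset(c, 0, sizeof(double) * N * N);
    for (int ii = 0; ii < N; ii += B)
        for (int kk = 0; kk < N; kk += B)
            for (int jj = 0; jj < N; jj += B)
                for (int i = ii; i < ii + B; i++)
                    for (int k = kk; k < kk + B; k++)
                        for (int j = jj; j < jj + B; j++)
                            c[i][j] += a[i][k] * b[k][j];
}
```

Both produce the same result (up to floating-point summation order);
only the memory traffic pattern changes, which is where the speedup
for large matrices would come from.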

Yeah, I think I understand now. Is the high-performance implementation
hard to do? I don't really have a sense of whether we are talking about
a weekend project or a major overhaul.

Daniel.

_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
