Chris Marshall <[email protected]> wrote: > What do you mean doing the inner product in parallel? > > The above was just the operation counts.
My bad. I thought you were telling me that *memory* was O(N**3), so I
completely misunderstood. I think I understand now: the inner product is
*reading* values from memory O(N**3) times, but memory usage is still
O(N**2). Your email said "O(N**3) memory ops" and my monkey brain saw
"O(N**3) memory". :-P

> The good news is that is what caches are for which is why things
> aren't so bad for smaller matrices. This type of optimization
> is what is needed to be done to speed up PDL's matrix multiply.
> Since you have O(N**3) computations and O(N**2) memory accesses,
> for large matrix multiplies you can completely hide the memory
> access cost---if it is implemented to do that...

Yeah, I think I understand now. Is the high-performance implementation
hard to do? I don't really have a sense of whether we are talking about
a weekend project or a major overhaul.

Daniel.
_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
