On 5/7/10 8:55 AM, Craig DeForest wrote:
>
> On May 7, 2010, at 3:27 AM, Daniel Carrera wrote:
>>> The good news is that this is what caches are for, which is why
>>> things aren't so bad for smaller matrices. This type of
>>> optimization is what needs to be done to speed up PDL's matrix
>>> multiply. Since you have O(N**3) computations and O(N**2) memory
>>> accesses, for large matrix multiplies you can completely hide the
>>> memory access cost---if it is implemented to do that...
>>
>> Yeah, I think I understand now. Is the high-performance
>> implementation hard to do? I don't really have a sense of whether
>> we are talking about a weekend project or a major overhaul.
>
> It could be a weekend project for someone smart. Doing it "right"
> would require an overhaul of the PP code generator itself, which is
> pretty hairy - but making matrix multiplication, in particular, use
> tiling to optimize cache usage could probably be done with stupid
> index tricks.
Can I suggest that, before any sort of optimization work, we get a
solid set of performance benchmarks (a la Daniel's micro-benchmarks,
perhaps?), so that we can be sure that any such work actually does
improve things (and, just as importantly, doesn't slow things down
elsewhere).

I still occasionally have nightmares from my time digging around in
PDL::PP when I added the bad-value code, and then again when I tried
to make it a bit more self-describing ;-)

Doug

--
-------------------------------------------------------------------
Doug Burke               | http://hea-www.harvard.edu/~dburke/
Harvard-Smithsonian      | Email: [email protected]
Center for Astrophysics  | Phone: (617) 496 7853
60 Garden Street MS-2    | Fax: (617) 495 7356
Cambridge, MA 02138      | Office: B-440
-------------------------------------------------------------------

_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
