I was curious to try a benchmark, but I don't have a test problem with these
block sizes handy. Are other block sizes planned? Does anyone have benchmarks
against the current (S)BAIJ implementations (with software prefetch)? I've
seen the HPCA paper by Guo and Gropp, but I think that work was done before
BAIJ had software prefetch, and perhaps also with a version of BSTRM that did
not use software prefetch, so I wonder how they compare now. Also, how does it
perform with multiple processes per socket on Intel and AMD?