On a multicore node, you may not get a very good speedup if the memory
bandwidth is heavily shared between all the cores. I believe this is
what the PETSc developers explain here:

http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers

If you have a multi-socket multicore node, my advice would be to keep
one MPI process on each socket and then use a multithreaded BLAS (like
GotoBLAS) inside each socket to keep the cores busy during the BLAS
operations.

Hope this helps,
Desire
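
P.S. If you want to check whether the node is already bandwidth-bound,
one quick test is to time a STREAM-style triad loop with an increasing
number of threads. Below is a minimal sketch of my own (it is not from
the PETSc FAQ); if the reported rate stops growing after a few threads,
extra MPI processes on the same node will starve for memory in the same
way:

/* triad.c -- quick check of how memory bandwidth scales with threads.
 * My own sketch, not from the PETSc FAQ.
 * Build:  gcc -std=c99 -O2 -fopenmp triad.c -o triad
 * Run:    OMP_NUM_THREADS=1 ./triad ; OMP_NUM_THREADS=2 ./triad ; ...
 */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void)
{
    long n = 20 * 1000 * 1000;   /* 3 x 160 MB, well beyond any cache */
    double *a = malloc(n * sizeof(double));
    double *b = malloc(n * sizeof(double));
    double *c = malloc(n * sizeof(double));
    long i;

    /* parallel first touch, so pages spread across the sockets' memory */
    #pragma omp parallel for
    for (i = 0; i < n; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    double t = omp_get_wtime();
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        a[i] = b[i] + 3.0 * c[i];             /* STREAM-style triad */
    t = omp_get_wtime() - t;

    /* the triad streams 3 doubles (24 bytes) per iteration */
    printf("%d thread(s): %.1f GB/s\n",
           omp_get_max_threads(), 24.0 * n / t / 1e9);

    free(a); free(b); free(c);
    return 0;
}

The parallel initialization loop matters on NUMA nodes: first touch
decides which socket's memory each page lands in.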
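
Here is also a rough sketch of the hybrid layout I mean, with one
threaded dgemm per MPI rank. The file name, matrix size, and link line
are only placeholders for illustration; with GotoBLAS the number of
threads per rank is usually set through the GOTO_NUM_THREADS (or
OMP_NUM_THREADS) environment variable, and pinning each rank to its own
socket is done with your MPI launcher's binding options:

/* hybrid.c -- sketch of the one-rank-per-socket layout (my example,
 * not Scott's code).  Each MPI process runs a threaded dgemm; the
 * BLAS threads are what keep the rest of the socket busy.
 * Build (link line depends on your install):
 *   mpicc -std=c99 -O2 hybrid.c -o hybrid -lgoto2 -lpthread -lm
 * Run, e.g. 2 ranks with 4 BLAS threads each:
 *   GOTO_NUM_THREADS=4 mpiexec -n 2 ./hybrid
 * (how the variable reaches the ranks depends on your MPI launcher) */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <cblas.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    int n = 2000;                     /* per-rank block size, arbitrary */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double *A = malloc((size_t)n * n * sizeof(double));
    double *B = malloc((size_t)n * n * sizeof(double));
    double *C = malloc((size_t)n * n * sizeof(double));
    for (long i = 0; i < (long)n * n; i++) {
        A[i] = 1.0; B[i] = 2.0; C[i] = 0.0;
    }

    double t = MPI_Wtime();
    /* each rank does its own threaded C = A*B on local data */
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, A, n, B, n, 0.0, C, n);
    t = MPI_Wtime() - t;

    printf("rank %d/%d: dgemm took %.2f s (%.1f GFlop/s)\n",
           rank, nprocs, t, 2.0 * n * n * (double)n / t / 1e9);

    free(A); free(B); free(C);
    MPI_Finalize();
    return 0;
}

With 2 ranks and 4 threads each on a two-socket quad-core node, all 8
cores are busy during the dgemm, while only 2 processes per node
compete during the message-passing phases.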
On 04/01/2011 03:43 PM, Ormiston, Scott J. wrote:
> I am just starting to try superlu_dist to get a direct solver that
> runs in parallel with PETSc.
>
> My first tests (with ex15f) show that it takes longer and longer as
> the number of cores increases. For example, 4 cores takes 8 times
> longer than 2 cores, and 8 cores takes 25 times longer than 4 cores.
> Obviously I expected a speed-up; has anyone else seen this behaviour
> with superlu_dist? If not, what could be going wrong here?
>
> Scott Ormiston
