On 03/17/12 03:33 PM, Gerard Gorman wrote:
> Hi
>
> We have profiled on Cray compute nodes with two 16-core AMD Opteron
> 2.3GHz Interlagos processors, using the same matrix but this time with
> -ksp_type cg and -pc_type jacobi. Attached are the logs with the 32 MPI
> processes and the 32 OpenMP threads tests.
>
> Most of the time is in stage 2. As seen previously, MatMult is
> performing well, but the overall performance in KSPSolve drops for
> OpenMP. I have attached a plot of the (hybrid mpi+openmp time)/(pure
> openmp) where all 32 cores are always used. What the graph shows is that
> we are always getting better performance in MatMult for pure OpenMP but
> there is something additional in KSPSolve that degrades the OpenMP
> performance.
>
> So far we have profiled with oprofile measuring the event
> CPU_CLK_UNHALTED, but this has not shown up the bottleneck. So more
> digging is required.
>
> Any suggestions/comments gratefully received.

Why are you using gcc? (I'm biased, but it's a serious question.)

Did you post your CFLAGS and FFLAGS? PathScale is happy to work with you on this.
Best,
./C
