"C. Bergström" emailed the following on 17/03/12 08:55:
> On 03/17/12 03:33 PM, Gerard Gorman wrote:
>> Hi
>>
>> We have profiled on Cray compute nodes with two 16-core AMD Opteron
>> 2.3GHz Interlagos processors, using the same matrix but this time with
>> -ksp_type cg and -pc_type jacobi. Attached are the logs with the 32 MPI
>> processes and the 32 OpenMP threads tests.
>>
>> Most of the time is in stage 2. As seen previously, MatMult is
>> performing well, but the overall performance in KSPSolve drops for
>> OpenMP. I have attached a plot of the (hybrid mpi+openmp time)/(pure
>> openmp) where all 32 cores are always used. What the graph shows is that
>> we are always getting better performance in MatMult for pure OpenMP but
>> there is something additional in KSPSolve that degrades the OpenMP
>> performance.
>>
>> So far we have profiled with oprofile measuring the event
>> CPU_CLK_UNHALTED, but this has not shown up the bottleneck. So more
>> digging is required.
>>
>> Any suggestions/comments gratefully received.
> Why are you using gcc? (I'm biased, but serious question) Did you
> post your CFLAGS and FFLAGS? PathScale is happy to work with you on
> this.
>
> Best,
>
> ./C

Thanks for your help - I'll give PathScale a go this evening. I used GCC just because it gave me the least trouble building in the Cray env. As it was, I still had to compile my own valgrind installation, and I was given the magic scrub_headers script to remove some troublesome headers generated by the PETSc config.

I didn't set any additional flags - I just configured with --with-debugging=0

Cheers
Gerard
