Hi Rodrigo,

These are interesting results. It looks like your speedup topped out at about 2, which suggests you might have been running into cache capacity/conflict problems. Did you do any further analysis of why you weren't able to get better performance?
A

On Fri, Nov 12, 2010 at 8:26 AM, Rodrigo R. Paz <rodrigop at intec.unl.edu.ar> wrote:
> Hi all,
> find attached a plot with some results (speedup) that we have obtained some
> time ago with some hacks we introduced to petsc in order to be used on
> hybrid archs using openmp.
> The tests were done in a set of 6 Xeon nodes with 8 cores each. Results are
> for the MatMult op in KSP in the context of the solution of
> advection-diffusion-reaction eqs by means of SUPG stabilized FEM.
>
> Rodrigo
>
> --
> Rodrigo Paz
> National Council for Scientific Research CONICET
> CIMEC-INTEC-CONICET-UNL.
> Güemes 3450. 3000, Santa Fe, Argentina.
> Tel/Fax: +54-342-4511594, Fax: +54-342-4511169
>
>
> On Thu, Nov 11, 2010 at 10:34 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>
>> On Nov 11, 2010, at 7:22 PM, Jed Brown wrote:
>>
>> > On Fri, Nov 12, 2010 at 02:18, Barry Smith <bsmith at mcs.anl.gov> wrote:
>> > How do you get adaptive load balancing (across the cores inside a
>> > process) if you have the OpenMP compiler decide the partitioning/parallelism?
>> > This was Bill's point on why not to use OpenMP. For example, if you give
>> > each
>> > core the same amount of work up front they will not finish at the same
>> > time, so you have wasted cycles.
>> >
>> > Hmm, I think this issue is largely subordinate to the memory locality
>> > (for the sort of work we usually care about), but OpenMP could be more
>> > dynamic about distributing work.  I.e. this could be an OpenMP
>> > implementation or tuning issue, but I don't see it as a fundamental
>> > disadvantage of that programming model.  I could be wrong.
>>
>>   You are probably right, your previous explanation was better.  Here is
>> something related that Bill and I discussed: static load balance has lower
>> overhead while dynamic has more overhead. Static load balancing, however, will
>> end up with some imbalance. Thus one could do an upfront static load
>> balancing of most of the data, then when the first cores run out of their
>> static work they do the rest of the work with dynamic balancing.
>>
>>   Barry
>>
>> >
>> > Jed
>> >
>
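For what it's worth, the static-then-dynamic scheme Barry describes maps fairly directly onto standard OpenMP loop scheduling. This is only a sketch, not PETSc code: `hybrid_apply`, `work_item`, `static_frac` and `chunk` are hypothetical names, and `work_item` stands in for whatever the real per-item work is (e.g. one matrix row). The first loop uses `schedule(static) nowait`, so a thread that finishes its static block skips the barrier and goes straight to the second loop, where the leftover items are handed out on demand with `schedule(dynamic, chunk)`.

```c
#include <stddef.h>

/* Hypothetical per-item work; a stand-in for e.g. one row of a MatMult. */
static double work_item(double x) { return x * x; }

/*
 * Hybrid static/dynamic schedule (sketch):
 *   - the first n_static items are split evenly across threads up front
 *     (schedule(static): low overhead, no shared work counter);
 *   - 'nowait' removes the barrier after that loop, so a thread that runs
 *     out of static work moves on to the remainder immediately;
 *   - the remaining items are handed out 'chunk' at a time on demand
 *     (schedule(dynamic)), absorbing the imbalance left by the static part.
 * Compiled without OpenMP support the pragmas are ignored and both loops
 * simply run serially, producing the same result.
 */
void hybrid_apply(const double *x, double *y, long n,
                  double static_frac, int chunk)
{
    if (static_frac < 0.0) static_frac = 0.0;
    if (static_frac > 1.0) static_frac = 1.0;
    if (chunk < 1) chunk = 1;
    long n_static = (long)(static_frac * (double)n);

    #pragma omp parallel
    {
        /* Static part: contiguous blocks, one per thread, no barrier at the end. */
        #pragma omp for schedule(static) nowait
        for (long i = 0; i < n_static; i++)
            y[i] = work_item(x[i]);

        /* Dynamic part: threads grab chunks of the remainder as they free up. */
        #pragma omp for schedule(dynamic, chunk)
        for (long i = n_static; i < n; i++)
            y[i] = work_item(x[i]);
    }
}
```

For example, `hybrid_apply(x, y, n, 0.9, 64)` would split 90% of the items statically and distribute the last 10% in chunks of 64; build with an OpenMP flag (e.g. `-fopenmp` for GCC) to actually get threads. OpenMP's built-in `schedule(guided)` is a related option that starts with large chunks and shrinks them, but it still goes through the shared scheduler for every chunk rather than only for the tail.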
