Thank you! This has been extremely useful in figuring out a plan of action.
On Mon, Sep 15, 2014 at 9:08 PM, Barry Smith <[email protected]> wrote:
>
>    Based on the streams speedups below, it looks like a single core can
> utilize roughly 1/2 of the memory bandwidth, leaving all the other cores
> only 1/2 of the bandwidth to share, so you can expect at best a speedup
> of roughly 2 on this machine with traditional PETSc sparse solvers.
>
>    To add insult to injury, it appears that the processes are not being
> assigned to physical cores very well either. Under the best circumstances
> on this system one would like to see a speedup of about 2 when running
> with two processes, but it actually delivers only 1.23, and the speedup
> of 2 only occurs with 5 processes. I attribute this to the MPI or OS not
> assigning the second MPI process to the "best" core for memory bandwidth.
> It should likely assign the second MPI process to the 2nd CPU, but
> instead it is assigning it also to the first CPU, and only when it gets
> to the 5th MPI process does the second CPU get utilized.
>
>    You can look at the documentation for your MPI's process affinity to
> see if you can force the 2nd MPI process onto the second CPU.
>
>    Barry
>
> np  speedup
>  1  1.0
>  2  1.23
>  3  1.3
>  4  1.75
>  5  2.18
>
>  6  1.22
>  7  2.3
>  8  1.22
>  9  2.01
> 10  1.19
> 11  1.93
> 12  1.93
> 13  1.73
> 14  2.17
> 15  1.99
> 16  2.08
> 17  2.16
> 18  1.47
> 19  1.95
> 20  2.09
> 21  1.9
> 22  1.96
> 23  1.92
> 24  2.02
> 25  1.96
> 26  1.89
> 27  1.93
> 28  1.97
> 29  1.96
> 30  1.93
> 31  2.16
> 32  2.12
> Estimation of possible
>
> On Sep 15, 2014, at 1:42 PM, Katy Ghantous <[email protected]> wrote:
>
> > Matt, thanks! I will look into that and find other ways to make the
> > computation faster.
> >
> > Barry, the benchmark reports up to a speedup of 2, but says 1 node at
> > the end. Either way, I was expecting a higher speedup. Is 2 the limit
> > for two CPUs despite the multiple cores?
> >
> > Please let me know if the attached file is what you are asking for.
> > Thank you!
> >
> > On Mon, Sep 15, 2014 at 8:23 PM, Barry Smith <[email protected]> wrote:
> >
> >    Please send the output from running
> >
> >      make streams NPMAX=32
> >
> >    in the PETSc root directory.
> >
> >    Barry
> >
> >    My guess is that it reports "one node" just because it uses the
> > "hostname" to distinguish nodes, and though your machine has two CPUs,
> > from the point of view of the OS it has only a single hostname and
> > hence reports just one "node".
> >
> > On Sep 15, 2014, at 12:45 PM, Katy Ghantous <[email protected]> wrote:
> >
> > > Hi,
> > > I am using DMDA to run TS in parallel to solve a set of N equations.
> > > I am calling DMDAGetCorners in the RHSFunction, with the stencil
> > > width set to 2, to solve a set of coupled ODEs on 30 cores.
> > > The machine has 32 cores (2 physical CPUs, each with 2x8 cores, at
> > > 3.4 GHz per core).
> > > However, mpiexec with more than one core shows no speedup.
> > > Also, at the configure/test stage for PETSc on that machine, there
> > > was no speedup and it reported only one node.
> > > Is there something wrong with how I configured PETSc, or is the
> > > approach inappropriate for the machine?
> > > I am not sure what files (or sections of the code) you would need
> > > to be able to answer my question.
> > >
> > > Thank you!
> > >
> > > <scaling.log>
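For the affinity problem Barry describes, the usual fix is to map and bind
ranks by socket at launch time. A sketch, assuming Open MPI 1.8+ or MPICH's
Hydra mpiexec (flag spellings vary by MPI and version; "./app" is a
placeholder for the actual executable; check "mpiexec --help" on your
installation):

    # Open MPI: place ranks round-robin across sockets, pin each to a core,
    # so rank 1 lands on the second CPU instead of sharing the first
    mpiexec --map-by socket --bind-to core -n 2 ./app

    # Open MPI: print where each rank was actually bound
    mpiexec --map-by socket --bind-to core --report-bindings -n 2 ./app

    # MPICH (Hydra launcher): the analogous options
    mpiexec -map-by socket -bind-to socket -n 2 ./app

hwloc's lstopo is also handy for seeing the socket/core layout the OS
reports, which helps confirm whether the second socket is being used.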
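For context on the DMDA/TS pattern Katy describes, here is a minimal sketch
of a 1D DMDA whose RHSFunction walks the local block via DMDAGetCorners.
This is a generic illustration, not the poster's actual code: the coupling
term, grid size, and periodic boundary are placeholders, error checking is
omitted for brevity, and DMSetFromOptions/DMSetUp are required only on
newer PETSc releases.

    #include <petscts.h>
    #include <petscdmda.h>

    /* Placeholder coupling: f_i = u_{i-1} - 2 u_i + u_{i+1}.
       A stencil width of 2 (as in the original post) would allow terms
       up to u_{i-2}/u_{i+2}; this example only needs width 1. */
    static PetscErrorCode RHSFunction(TS ts, PetscReal t, Vec U, Vec F, void *ctx)
    {
      DM                 da;
      Vec                Ulocal;
      const PetscScalar *u;
      PetscScalar       *f;
      PetscInt           i, xs, xm;

      PetscFunctionBeginUser;
      TSGetDM(ts, &da);
      /* Scatter ghost values so u[i-1] and u[i+1] are valid at the ends
         of this process's block */
      DMGetLocalVector(da, &Ulocal);
      DMGlobalToLocalBegin(da, U, INSERT_VALUES, Ulocal);
      DMGlobalToLocalEnd(da, U, INSERT_VALUES, Ulocal);

      DMDAVecGetArrayRead(da, Ulocal, (void *)&u);
      DMDAVecGetArray(da, F, &f);
      /* xs..xs+xm-1 is this process's portion, in global indices */
      DMDAGetCorners(da, &xs, NULL, NULL, &xm, NULL, NULL);
      for (i = xs; i < xs + xm; i++) f[i] = u[i - 1] - 2.0 * u[i] + u[i + 1];
      DMDAVecRestoreArrayRead(da, Ulocal, (void *)&u);
      DMDAVecRestoreArray(da, F, &f);
      DMRestoreLocalVector(da, &Ulocal);
      PetscFunctionReturn(0);
    }

    int main(int argc, char **argv)
    {
      TS  ts;
      DM  da;
      Vec U;

      PetscInitialize(&argc, &argv, NULL, NULL);
      /* 1D periodic grid: 1024 points, 1 dof per point, stencil width 2 */
      DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_PERIODIC, 1024, 1, 2, NULL, &da);
      DMSetFromOptions(da);
      DMSetUp(da); /* needed on PETSc >= 3.8; omit on older releases */
      DMCreateGlobalVector(da, &U);
      VecSet(U, 1.0);

      TSCreate(PETSC_COMM_WORLD, &ts);
      TSSetDM(ts, da);
      TSSetRHSFunction(ts, NULL, RHSFunction, NULL);
      TSSetFromOptions(ts); /* choose method/steps at run time,
                               e.g. -ts_type rk -ts_max_steps 100 */
      TSSolve(ts, U);

      VecDestroy(&U);
      TSDestroy(&ts);
      DMDestroy(&da);
      PetscFinalize();
      return 0;
    }

Run with, e.g., "mpiexec -n 2 ./app -ts_type rk -ts_max_steps 100". Note
that this pattern is memory-bandwidth bound, which is why the streams
numbers above cap the achievable speedup regardless of core count.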
