On Wed, Sep 29, 2010 at 14:34, Moinier, Pierre (UK) <Pierre.Moinier at baesystems.com> wrote:

> Jed,
>
> Thanks for your help and thanks also to all of the others who have replied!
> I made some progress and wrote a new code that runs in parallel. However the
> results seem to show that the time required to solve the linear systems is
> the same whether I use 1, 2 or 4 processors... Surely I am missing something.
> I copied the code below. For info, I run the executable as: ./test -ksp_type
> cg -ksp_rtol 1.e-6 -pc_type none

How big is the matrix (dimensions and number of nonzeros)? Run with -log_summary and send the output.

This problem is mostly memory-bandwidth limited, and a single core can saturate most of the memory bus for a whole socket on most architectures.

If you are interested in time to solution, you almost certainly want to use a preconditioner. Sometimes these do more work per byte, so you may be able to see more speedup without adding sockets.

Jed
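
As a rough illustration of the memory-bandwidth argument above, here is a minimal Python sketch that estimates a lower bound on the time for one sparse matrix-vector product, the dominant kernel in unpreconditioned CG. It assumes a CSR-like layout with 8-byte values and 4-byte indices and the best case where each vector entry is moved only once. The function name, the matrix size, and the 10 GB/s bandwidth figure are all hypothetical and are not taken from the code or machine discussed in this thread.

```python
def spmv_time_lower_bound(nnz, nrows, bandwidth_bytes_per_s,
                          value_bytes=8, index_bytes=4):
    """Bandwidth-only lower bound (seconds) for one CSR matrix-vector product.

    Assumes every matrix entry is streamed from memory once and that the
    input and output vectors are each moved once (best case for caching).
    """
    matrix_bytes = nnz * (value_bytes + index_bytes)  # values + column indices
    rowptr_bytes = (nrows + 1) * index_bytes          # CSR row pointers
    vector_bytes = 2 * nrows * value_bytes            # read x, write y
    total_bytes = matrix_bytes + rowptr_bytes + vector_bytes
    return total_bytes / bandwidth_bytes_per_s


# Hypothetical problem: 1e6 rows, 7 nonzeros per row, 10 GB/s memory bus.
nrows = 1_000_000
nnz = 7 * nrows
bandwidth = 10e9

t = spmv_time_lower_bound(nnz, nrows, bandwidth)
flops = 2 * nnz  # one multiply and one add per nonzero
print(f"lower bound per matvec: {t:.4e} s")
print(f"implied flop rate ceiling: {flops / t / 1e9:.2f} GF/s")
```

The bound depends only on the bytes moved and the bandwidth, not on the number of cores. Under these assumptions, if one core already draws most of the socket's bandwidth, running on 2 or 4 cores of the same socket leaves the bound essentially unchanged, which would be consistent with the flat timings reported.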
