Hi, I have seen multiple emails about this on the mailing list, and I'm afraid this question may already have been answered, but I'm not quite sure!
I have objects in my code that are harder to parallelize using MPI, so my strategy so far has been to handle them serially, with each process holding a copy of the whole thing. This object holds my grid generation/information etc., so the work only needs to be done once at the beginning (no moving mesh for NOW). As a result I don't care much about speed, since it's nothing compared to the overall solution time. However, I do care about the memory this object consumes, which can limit my problem size.

So I had the following idea the other day: is it possible (and a good idea) to parallelize the grid generation using OpenMP, so that each node (as opposed to each core) shares one copy of the data structure? This could save me a lot, since memory on a node is shared among its cores (e.g. 32 GB/node vs. 2 GB/core on Ranger).

What I'm not quite sure about is how the job is scheduled when running the code via mpirun -n Np. Should Np be the total number of cores or the number of nodes? If I use, say, Np = 16 processes on one node (which has 16 cores), MPI runs 16 copies of the code on that single node. How does OpenMP then figure out how to fork? Does it fork 16 threads per MPI process = 256 threads total, or is it smart enough to fork 16 threads per node = 1 thread/core = 16 threads? I'm a bit confused about how the job is scheduled when MPI and OpenMP are mixed. Do I make any sense at all?!

Thanks
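To make my own question concrete, here is roughly the launch setup I'm imagining (this is an untested sketch; the flag for placing one process per node is a guess based on Open MPI's mpirun, and other MPI stacks use different pinning/mapping options):

```shell
# Sketch (untested): 4 nodes, one MPI process per node, and OpenMP
# forking 16 threads per process to fill each node's 16 cores.
export OMP_NUM_THREADS=16           # threads per MPI *process*, not per node
mpirun -np 4 -npernode 1 ./solver   # -npernode is an Open MPI flag (assumption);
                                    # check the local stack/batch system docs

# My understanding: total threads = Np x OMP_NUM_THREADS, because each MPI
# process forks its own OpenMP team and OpenMP knows nothing about MPI's
# layout. So 16 processes on one 16-core node with OMP_NUM_THREADS=16
# would really mean 256 threads -- is that right?
```

If that is correct, then the answer to my own setup would be to run one MPI process per node and let OpenMP provide the intra-node parallelism, but I'd appreciate confirmation.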
