On Wed, 28 Jul 2010 09:20:46 -0500, "Ravi Kannan" <rxk at cfdrc.com> wrote:
> Can anyone tell, whether PETSc uses (or has) OPENMP provision.

You are free to use OpenMP in user code (like function evaluation and
Jacobian assembly). There is some basic support in PCOPENMP.

> In addition, how does it compare with parallel systems with multi-core
> architecture. Based on some literature search, my understanding is
> that MPI has poor scalability in multi-core systems, due to the
> Ethernet switch.

Modern MPI implementations don't use Ethernet for self-sends; instead
they map shared memory between the processes. Even when they talk over
TCP, it's on the kernel loopback device, so it never goes to the
Ethernet device, but copying through the kernel is more expensive. They
can also use RDMA provided by the HCA on, e.g., an InfiniBand device to
do the sends and receives without a context switch. I've heard some
people observe this being faster than mapping shared memory for certain
problems, but it usually isn't.

NUMA is an important complication: the mapping of physical pages (which
your program doesn't directly control) is crucial to performance. This
happens automatically with MPI (through affinity settings), but needs
some careful tuning with OpenMP. In particular, it is very easy to fault
pages on a different socket from where you later use them, causing
mysterious slowdowns by a factor at least as large as the number of
sockets.

For problems that admit domain-decomposition strategies, it's not clear
that OpenMP is generally faster than MPI. The reliable memory
performance that you get from MPI should not be underestimated. Like
anything, it's problem and hardware dependent.

Jed
