On Tue, 3 Mar 2009, Carlo Nervi wrote:

CN> Axel, I've heard opposite opinions on openmp and mpich2.
Do yourself a favor and try installing OpenMPI instead of MPICH2 first. It performs as well or better, and is _much_ less clumsy to use (at least the way we use it in our group).

CN> The recent Openmp (i've heard) is now efficient and fast.

Yes, the compilers may incur less overhead when using OpenMP, but OpenMP is only fast if the directives are programmed well and the code is rewritten to make good use of them. Last time I checked, there were no OpenMP directives in Q-E, so setting the -openmp flag will have no significant effect.

In general, OpenMP is not a good way to multi-thread a code. Running your own thread management, with a pool of threads to delegate work to, is _much_ more efficient. The reason OpenMP is so attractive is that you can add _some_ parallelism to your code with very little effort: just sprinkle a few OpenMP directives in the right places and you get a moderate speedup. A very good deal for the effort. But that cannot come close to the efficiency of the MPI distributed-data parallelism built into plane-wave codes like Q-E. ...and by the time you have exploited that parallelism to the maximum, there is so little work left per task that OpenMP does not help much anymore.

To give you some numbers: using a different, very well OpenMP-parallelized plane-wave code (including OpenMP FFT and OpenMP BLAS/LAPACK), I managed to get about 80% of the efficiency of the MPI parallelization on a single 2x dual-core node, using a sizable input. Running across two of those nodes, the efficiency of OpenMP versus MPI (i.e. running 2 MPI tasks x 4 threads vs. 8 MPI tasks) dropped to 60%, and for anything larger there was next to no gain, or even a slowdown.

As was said before: -ipo is a waste of your time. All it does is make the code run slower and make you wait longer until it is linked; even worse, code quite often gets miscompiled because of -ipo.
All these "advanced" compiler features (like IPO, PGO, and SSE/MMX vectorization) work best on small test cases; for any larger, more complex code, they add overhead about as often as not. The compiler does not really know which parts of a code are executed a lot and which are not, so it may put a lot of effort into optimizing the wrong parts, and, as I wrote before, aggressive optimization carries a high risk of miscompiled (over-optimized) code.

CN> However, I trust the experience of the developers :-).

You should never do that. As a scientist you are obliged to never take anything for granted and to convince yourself. I may just be a "compiler terrorist" trying to sabotage sales for the compiler vendors. ;-)

Cheers,
   Axel.

CN> Thanks,
CN> Carlo
CN> _______________________________________________
CN> Pw_forum mailing list
CN> Pw_forum at pwscf.org
CN> http://www.democritos.it/mailman/listinfo/pw_forum

-- 
=======================================================================
Axel Kohlmeyer   akohlmey at cmm.chem.upenn.edu   http://www.cmm.upenn.edu
Center for Molecular Modeling -- University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.
