On Wed, 14 Nov 2007, Dr Brent Walker wrote:

BW> Hi all,

hi brent,

BW> Does anyone know whether QE has been used successfully with the IBM
BW> ESSL libraries on the Blue Gene L architecture (running linux)? Google
BW> unfortunately hasn't provided me with much in relation to this.

yep. i've managed to compile and run QE on a BG/L.

BW> I have spent some time trying to get QE (well really PWscf) running on
BW> such a machine and am at the stage of deciding whether to persevere or
BW> just give up and use locally compiled versions of fftw and lapack/blas
BW> (following say the provided "Make.bgl" file, which seems to work fine
BW> for me).

due to the (lack of) features in the BG/L cpu, you may actually get
reasonable performance with regular BLAS/LAPACK. you can try the
"double hummer" libraries, but then you are limited to coprocessor
mode. this is probably needed anyway, because the limitations to jobs
on BG/L are very hard. there is no local storage, so pw.x, with its
default setting of storing wavefunctions as files, does not scale
well. you'll have to use the (experimental) feature of storing those
in memory, but with 512MB/node there is not much memory available.
on top of that, the cpus on BG/L are very slow, so you need to
parallelize across a large number of cpus to get decent performance.

in my view, for a code like pw.x it is currently not worth the hassle.
your chances with cp.x are much better, but then again, you are
limited by the supported feature set of cp.x.

altogether, you have to keep in mind that BG/L is mainly a machine
to get a good ranking in the top500 and thus please administrators,
politicians and generally people who are not using it. from the user's
perspective it is a constant struggle and a PITA. if i had the choice,
i'd rather skip the top500 placement and get a machine that is usable.

the majority of QE jobs are run on rather small clusters, so to
run well on those machines is where most of the effort goes.

BW> Is this worth pursuing or should I just file it in the "too hard"
BW> basket for the time being? If people think there is some hope that I
BW> can get this to work, I'll provide more details (make.sys, etc.).
BW>
BW> Thanks very much for any information/thoughts/anecdotes on this!

well, i've been struggling a lot with finding _any_ project that
runs well on a BG/L that does not run better on a cray xt3/xt4
or even a reasonably well laid out PC cluster with DDR infiniband.

my best results so far were with classical MD using LAMMPS on
systems that have no coulomb interactions. there i am scaling out
on the BG/L at half the performance of the scaleout timing on a
cray xt3. for most codes, particularly plane wave pseudopotential
DFT, the difference is about a factor of 10.

so before putting in more effort, it might be worth discussing
what kind of calculations you intend to run and how much cpu time
across how many nodes you have at your disposal.

cheers,
   axel.

BW>
BW> Brent.
BW>
BW> PS. I have noted AK's comment "good luck (you'll be needing it)"
BW> regarding compilation of QE on BG/L on 31 Aug, which of course doesn't
BW> bode well!
BW>
BW>

-- 
=======================================================================
Axel Kohlmeyer   akohlmey at cmm.chem.upenn.edu   http://www.cmm.upenn.edu
   Center for Molecular Modeling   --   University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.
