On 20 Jan 2014, at 14:23, Yosef Zlochower <[email protected]> wrote:
> On 01/20/2014 08:06 AM, Ian Hinder wrote:
>> On 20 Jan 2014, at 06:14, James Healy <[email protected]> wrote:
>>
>>> Hello all,
>>>
>>> On Thursday morning, I pulled a fresh checkout of the newest version of
>>> the Einstein Toolkit (ET_2013_11) to use with RIT's LazEv code. I
>>> compiled it on Stampede using the current stampede.cfg located in
>>> simfactory/mdb/optionlists, which uses Intel MPI version 4.1.0.030 and
>>> the Intel compilers version 13.1.1.163 (enabled through a module load).
>>> I submitted a short job which I had run previously with ET_2013_05. The
>>> results come out the same. However, the run speed as reported in
>>> Carpet::physical_time_per_hour is poor. It starts off well,
>>> approximately the same as with the previous build, but over 24 hours of
>>> evolution it drops to as low as half that speed. On recovery from
>>> checkpoint, the speed is even worse, dropping to below 1/4 of the
>>> original run speed.
>>>
>>> So, I tried using the previous stampede.cfg included in the ET_2013_05
>>> branch of simfactory, the same one I used to compile my ET_2013_05
>>> build. This config file uses the same version of Intel MPI but
>>> different Intel compilers (version 13.0.2.146). The run speed shows the
>>> same trends as when using the newer config file.
>>
>> Hi Jim,
>>
>> I'm quite confused by this problem report. I guess that you mean the
>> following:
>>
>> - You get the slowdown with the current ET_2013_11 release.
>> - You don't get the slowdown with the ET_2013_05 release.
>> - You do get the slowdown if you use the current ET_2013_11 release with
>>   the ET_2013_05 stampede.cfg.
>>
>> Is that correct?
>>
>> I consider Intel MPI to be unusable on Stampede, and that it always has
>> been. I used to get random crashes, hangs and slowdowns. I also
>> experienced similar problems with Intel MPI on SuperMUC. For any serious
>> work, I have always used MVAPICH2 on Stampede. In the current ET trunk,
>> Intel MPI has been replaced with MVAPICH2.
>> I would try the current trunk and see if this fixes your problems. You
>> can also use just the Stampede files from the current trunk with the
>> ET_2013_11 release (make sure you use the ones listed in stampede.ini).
>
> Interesting. I haven't been able to get a run to work with MVAPICH2
> because of an issue with the runs dying during checkpoint. Which config
> file are you using (modules loaded, etc.)? How much RAM per node do your
> production runs typically use?

I'm using exactly the default simfactory config from the current trunk, so
you can see the modules etc. there. Checkpointing (and recovery) works
fine. I usually aim for something like 75% memory usage for production
runs.

>> We didn't change the MPI version before the release, as that would have
>> been quite an invasive change at that point. However, I would consider
>> backporting this, after suitable discussion.
>>
>> Of course, your problem might be unrelated to the version of MPI. I am
>> running perfectly fine on Stampede with the current trunk (MVAPICH2);
>> runs have a consistent speed and retain this speed after recovery.

-- 
Ian Hinder
http://numrel.aei.mpg.de/people/hinder
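[Editor's note: a minimal sketch of the backport Ian suggests, i.e. copying
the Stampede machine files from a trunk simfactory checkout into a release
tree. The directory layout and file names below are assumptions for
illustration only; check stampede.ini in your own trunk checkout for the
real list of files it references. The sketch stages dummy trees first so
the copy steps are concrete.]

```shell
set -e

# Assumed locations of a trunk simfactory checkout and an ET_2013_11
# release checkout; adjust to your own paths.
TRUNK=trunk/simfactory/mdb
RELEASE=release/simfactory/mdb

# Stage dummy trees with stand-in files (in real use these already exist).
mkdir -p "$TRUNK/machines" "$TRUNK/optionlists" \
         "$RELEASE/machines" "$RELEASE/optionlists"
printf '[stampede]\n'        > "$TRUNK/machines/stampede.ini"
printf '# MVAPICH2 options\n' > "$TRUNK/optionlists/stampede.cfg"

# The actual backport: overwrite the release's Stampede files with the
# trunk versions (repeat for any submit/run scripts stampede.ini names).
cp "$TRUNK/machines/stampede.ini"    "$RELEASE/machines/"
cp "$TRUNK/optionlists/stampede.cfg" "$RELEASE/optionlists/"

ls "$RELEASE/optionlists"
```

After copying, rebuild the configuration so the new optionlist takes
effect.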
_______________________________________________
Users mailing list
[email protected]
http://lists.einsteintoolkit.org/mailman/listinfo/users
