On 20 Jan 2014, at 14:23, Yosef Zlochower <[email protected]> wrote:

> On 01/20/2014 08:06 AM, Ian Hinder wrote:
>> On 20 Jan 2014, at 06:14, James Healy <[email protected]> wrote:
>> 
>>> Hello all,
>>> 
>>> On Thursday morning, I pulled a fresh checkout of the newest version of
>>> the Einstein Toolkit (ET_2013_11) to use with RIT's LazEv code. I
>>> compiled it on stampede using the current stampede.cfg located in
>>> simfactory/mdb/optionlists which uses Intel MPI version 4.1.0.030 and
>>> the intel compilers version 13.1.1.163 (enabled through a module load).
>>> I submitted a short job which I ran previously with ET_2013_05.  The
>>> results come out the same.  However, the run speed as reported in
>>> Carpet::physical_time_per_hour is poor. It starts off well,
>>> approximately the same as with the previous build, but over 24 hours of
>>> evolution drops to as low as half the original speed. On recovery from
>>> checkpoint, the speed is even worse, dropping to below 1/4 of the
>>> original run speed.
>>> 
>>> So, I tried using the previous stampede.cfg included in the ET_2013_05
>>> branch of simfactory, the same one I used to compile my ET_2013_05
>>> build.  This cfgfile uses the same version of IMPI but different Intel
>>> compilers (version 13.0.2.146). The run speed shows the same trends as
>>> when using the newer config file.
>> Hi Jim,
>> 
>> I'm quite confused by this problem report.  I guess you mean the
>> following:
>> 
>> - You get the slowdown with the current ET_2013_11 release
>> - You don't get the slowdown with the ET_2013_05 release
>> - You do get the slowdown if you use the current ET_2013_11 release with the 
>> ET_2013_05 stampede.cfg
>> 
>> Is that correct?
>> 
>> I consider Intel MPI unusable on Stampede, and it always has been.  I
>> used to get random crashes, hangs, and slowdowns.  I also
>> experienced similar problems with Intel MPI on SuperMUC.  For any serious 
>> work, I have always used MVAPICH2 on Stampede.  In the current ET trunk 
>> Intel MPI has been replaced with MVAPICH2.  I would try the current trunk 
>> and see if this fixes your problems.  You can also use just the stampede 
>> files from the current trunk with the ET_2013_11 release (make sure you use 
>> the ones listed in stampede.ini).
> Interesting. I haven't been able to get a run to work with MVAPICH2 because
> of an issue with the runs dying during checkpoint. Which config file are you
> using (modules loaded, etc.)? How much RAM per node do your production runs
> typically use?

I'm using exactly the default simfactory config from the current trunk, so you
can see the modules etc. there.  Checkpointing (and recovery) works fine.  I
usually aim for something like 75% memory usage for production runs.
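For reference, the "use the trunk's Stampede files with a release checkout"
approach I mentioned above can be sketched roughly as follows. This is only an
illustrative outline, not a tested recipe: the paths assume simfactory's
standard layout, and the trunk checkout location (`~/ET_trunk`) and release
checkout location (`~/ET_2013_11`) are hypothetical placeholders.

```shell
# Hypothetical sketch: copy the Stampede machine files from a trunk
# checkout into a release tree, then rebuild. Check which files are
# actually referenced in simfactory/mdb/machines/stampede.ini and copy
# all of them; the three below are the usual suspects.
cd ~/ET_2013_11
cp ~/ET_trunk/simfactory/mdb/machines/stampede.ini    simfactory/mdb/machines/
cp ~/ET_trunk/simfactory/mdb/optionlists/stampede.cfg simfactory/mdb/optionlists/
cp ~/ET_trunk/simfactory/mdb/runscripts/stampede.run  simfactory/mdb/runscripts/

# Rebuild so the new option list (MVAPICH2 instead of Intel MPI) takes effect.
./simfactory/bin/sim build --machine stampede
```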


> 
>> We didn't change the MPI version before the release, as that would have been 
>> quite an invasive change at that point.  However, I would consider 
>> backporting this, after suitable discussion.
>> 
>> Of course, your problem might be unrelated to the version of MPI.  I am 
>> running perfectly fine on stampede with the current trunk (MVAPICH2); runs 
>> have a consistent speed and retain this speed after recovery.

-- 
Ian Hinder
http://numrel.aei.mpg.de/people/hinder

_______________________________________________
Users mailing list
[email protected]
http://lists.einsteintoolkit.org/mailman/listinfo/users
