Glad to hear you found/fixed the problem!
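
For anyone who hits the same symptom later: one way to see whether a setting like TaskAffinity is narrowing the CPUs each rank may use is to have every process print its allowed CPU list. Below is a minimal sketch, assuming Linux compute nodes with Python 3 available. The helper names are made up for illustration and are not part of Slurm or OMPI. The only OMPI-specific piece is the OMPI_COMM_WORLD_RANK environment variable that mpirun sets for each process.

```python
import os
import socket


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,8' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def allowed_cpus(status_path="/proc/self/status"):
    """Return the CPUs this process may run on, or [] if that cannot be read."""
    try:
        with open(status_path) as status:
            for line in status:
                if line.startswith("Cpus_allowed_list:"):
                    return parse_cpu_list(line.split(":", 1)[1])
    except OSError:
        pass  # not Linux, or /proc is unavailable
    return []


if __name__ == "__main__":
    # mpirun exports OMPI_COMM_WORLD_RANK; "?" means we were not started by it.
    rank = os.environ.get("OMPI_COMM_WORLD_RANK", "?")
    print(f"{socket.gethostname()} rank={rank} cpus={allowed_cpus()}")
```

Saved under a hypothetical name such as show_affinity.py and started from the job script with "mpirun python3 show_affinity.py", it prints one line per rank. If several ranks on the same node report the same very small CPU list, overlapping binding would be one plausible explanation for the slowdown, though that was not confirmed in this thread.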

> On Feb 6, 2015, at 2:44 PM, Peter A Ruprecht <[email protected]> 
> wrote:
> 
> 
> Thanks to everyone who responded.
> 
> It appears that the issue was not the version of Slurm, but rather that we
> had set TaskAffinity=yes in cgroup.conf at the same time we installed the
> new version.
> 
> Applications that were using OpenMPI version 1.6 and prior were in many
> cases showing dramatically slower run times.  I incorrectly wrote earlier
> that v1.8 was also affected; in fact it seems to have been OK.
> 
> I don't have a good environment for testing this further at the moment,
> unfortunately, but since we backed out the change the users are happy
> again.
> 
> Thanks again,
> Peter
> 
> On 2/6/15, 6:49 AM, "Ralph Castain" <[email protected]> wrote:
> 
>> 
>> If you are launching via mpirun, then you won't be using either version
>> of PMI - OMPI has its own internal daemons that handle the launch and
>> wireup.
>> 
>> It's odd that it happens across OMPI versions as there exist significant
>> differences between them. Is the speed difference associated with non-MPI
>> jobs as well? In other words, if you execute "mpirun hostname", does it
>> also take an inordinate amount of time?
>> 
>> If not, then the other possibility is that you are falling back on TCP
>> instead of IB, or that something is preventing the use of shared memory
>> as a transport for procs on the same node.
>> 
>> 
>>> On Feb 5, 2015, at 5:02 PM, Peter A Ruprecht
>>> <[email protected]> wrote:
>>> 
>>> 
>>> Answering two questions at one time:
>>> 
>>> I am pretty sure we are not using PMI2.
>>> 
>>> Jobs are launched via "sbatch job_script" where the script contains
>>> "mpirun ./executable_file".  There appear to be issues with at least
>>> OMPI 1.6.4 and 1.8.X.
>>> 
>>> Thanks
>>> Peter
>>> 
>>> On 2/5/15, 5:39 PM, "Ralph Castain" <[email protected]> wrote:
>>> 
>>>> 
>>>> And are you launching via mpirun or directly with srun <myapp>? What
>>>> OMPI version are you using?
>>>> 
>>>> 
>>>>> On Feb 5, 2015, at 3:32 PM, Chris Samuel <[email protected]>
>>>>> wrote:
>>>>> 
>>>>> 
>>>>> On Thu, 5 Feb 2015 03:27:25 PM Peter A Ruprecht wrote:
>>>>> 
>>>>>> I ask because some of our users have started reporting a 10x increase
>>>>>> in run-times of OpenMPI jobs since we upgraded to 14.11.3 from 14.03.
>>>>>> It's possible there is some other problem going on in our cluster, but
>>>>>> all of our hardware checks including Infiniband diagnostics look pretty
>>>>>> clean.
>>>>> 
>>>>> Are you using PMI2?
>>>>> 
>>>>> cheers,
>>>>> Chris
>>>>> -- 
>>>>> Christopher Samuel        Senior Systems Administrator
>>>>> VLSCI - Victorian Life Sciences Computation Initiative
>>>>> Email: [email protected] Phone: +61 (0)3 903 55545
>>>>> http://www.vlsci.org.au/      http://twitter.com/vlsci
