Hi Reuti

I appreciate your help on this thread - I confess I'm puzzled by it. As you 
know, OMPI doesn't use SGE to launch the individual processes, nor does SGE 
even know they exist. All SGE is used for is to launch the OMPI daemons 
(orteds). This is done as a single qrsh call, so won't all the daemons wind up 
being executed against the same queue regardless of how many queues exist in 
the system?
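
For illustration, that launch is roughly of this shape (the hostname and the
daemon's arguments here are placeholders, and the exact flags vary by OMPI
version):

    qrsh -inherit -nostdin -V <hostname> orted <daemon options>

SGE accounts that qrsh under whichever queue it picks; the MPI processes
themselves never pass through SGE.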

Given that the daemons then fork/exec the MPI processes (outside of qrsh), I 
would think they would inherit that nice setting as well, and so all the procs 
will be running at the same nice level too.
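
One way to verify this on a compute node (assuming a Linux ps; purely
illustrative):

    # nice level of the daemon itself
    ps -o pid,ni,comm -p "$(pgrep -d, orted)"
    # nice levels of everything the daemon forked (should match the daemon's)
    ps -o pid,ni,comm --ppid "$(pgrep -d, orted)"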

As for TMPDIR, we don't forward that unless specifically directed to do so, 
which I didn't see on their cmd line.
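
That is, forwarding only happens on explicit request, along the lines of
(process count and binary name are placeholders):

    # explicitly export TMPDIR to the launched processes via Open MPI's -x flag
    mpirun -x TMPDIR -np 16 ./a.out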


On Mar 14, 2012, at 2:33 AM, Reuti wrote:

> Hi,
> 
> Am 14.03.2012 um 04:02 schrieb Joshua Baker-LePain:
> 
>> On Tue, 13 Mar 2012 at 5:31pm, Ralph Castain wrote
>> 
>>> FWIW: I have a Centos6 system myself, and I have no problems running OMPI 
>>> on it (1.4 or 1.5). I can try building it the same way you do and see what 
>>> happens.
>> 
>> I can run as many threads as I like on a single system with no problems, 
>> even if those threads are running at different nice levels.
> 
> How do they get different nice levels - do you renice them? I would assume 
> that they all start at the same level as the parent. In the test program you 
> posted there are no threads.
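> 
> (For reference, renicing an already-running process would be something like
> 
>     renice -n 10 -p <pid>
> 
> with <pid> a placeholder for the process in question.)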
> 
> 
>> The problem seems to arise when I'm both a) running across multiple machines 
>> and b) running threads at differing nice levels (which often happens as a 
>> result of our queueing setup).
> 
> This sounds like you are getting slots from different queues assigned to one 
> and the same job. My experience: don't do it unless you need it. The problem 
> is that SGE can't decide, in its `qrsh -inherit ...` call, which queue is 
> the correct one for that particular call. As a result, all calls to a slave 
> machine can end up in one and the same queue. Although this is not correct, 
> it won't oversubscribe the node, as the overall slot count is usually 
> limited already; it's more a matter of the names SGE sets in the job's 
> environment:
> 
> https://arc.liv.ac.uk/trac/SGE/ticket/813
> 
> As a result, the $TMPDIR set by SGE can differ between the master of the 
> parallel job and a slave, since the queue name is part of $TMPDIR. When a 
> wrong $TMPDIR is set on a node (by Open MPI's forwarding?), strange things 
> can happen, depending on the application.
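> 
> For illustration (the job id and queue names are made up; SGE builds
> $TMPDIR as <tmpdir>/<job_id>.<task_id>.<queue name>), the master and a
> slave of the same job might then see:
> 
>     master: TMPDIR=/tmp/1234.1.all.q
>     slave:  TMPDIR=/tmp/1234.1.extra.q
> 
> and an application expecting the same scratch path on every node will
> misbehave.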
> 
> Do you see the same behavior if you stay in one and the same queue across 
> the machines? If you want to limit the PEs available to the user in your 
> setup, you could request a PE by wildcard; once a PE is selected, SGE will 
> stay within that PE. Attaching each PE to exactly one queue then avoids 
> mixing slots from different queues (PE orte1 => all.q, PE orte2 => extra.q, 
> and you request orte*).
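> 
> A minimal sketch of that setup (PE and queue names from the example
> above; the slot request is made up):
> 
>     # create one PE per queue
>     qconf -ap orte1
>     qconf -ap orte2
>     # attach each PE to exactly one queue (pe_list in the queue config)
>     qconf -mq all.q      # set pe_list orte1
>     qconf -mq extra.q    # set pe_list orte2
>     # request the PE by wildcard; SGE picks one and stays inside it
>     qsub -pe "orte*" 16 job.sh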
> 
> -- Reuti
> 
> 
>> I can't guarantee that the problem *never* happens when I run across 
>> multiple machines with all the threads un-niced, but I haven't been able to 
>> reproduce that at will like I can for the other case.
>> 
>> -- 
>> Joshua Baker-LePain
>> QB3 Shared Cluster Sysadmin
>> UCSF

