Re: [gridengine users] jsv and MPI core bind questions

Daniel Gruber Tue, 26 Mar 2013 10:10:43 -0700

On 26.03.2013 at 17:10, Reuti wrote:

> Hi,
> 
> On 26.03.2013 at 12:17, Arnau Bria wrote:
> 
>> I'm migrating a bash jsv script to perl and adding some
>> modifications, but I have some doubts:
>> 
>> 1) jsv_correct vs jsv_accept. From man:
>> 
>> If the result_type is ACCEPTED the job will be accepted as it was
>> initially submitted by the end user. All param_commands and
>> env_commands which might have been sent before the
>> result_command are ignored in this case. The result_type CORRECT
>> indicates that the job should be accepted after all modifications sent
>> via param_commands and env_commands are applied to the job
>> 
>> But if I make modifications (I'm using jsv_sub_add_param) and then call
>> jsv_accept, the job is modified and submitted. So why is jsv_correct
>> needed? What could happen if I accept instead of correct?
> 
> I would say it's a bug that the changes made to the job are committed.
> They should be ignored.


As far as I remember, there is only one implementation that sends the
differences only in the case of jsv_correct. It might be the Java JSV.
I agree, it would be easier for JSV implementers as well as for users
to have just one statement.
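
To make the documented difference concrete, here is a toy model in
Python of what the man page text quoted above describes. It is not the
real JSV library: the function name and the parameter keys are made up
for illustration, and, as noted in this thread, actual implementations
may apply the changes even on accept.

```python
def apply_jsv_result(submitted, changes, result_type):
    """Toy model of the documented result semantics (hypothetical helper).

    submitted   -- job parameters as sent by the user
    changes     -- modifications the JSV sent before the result
    result_type -- "ACCEPTED" or "CORRECT"
    """
    if result_type == "ACCEPTED":
        # Documented behaviour: earlier modifications are ignored.
        return dict(submitted)
    if result_type == "CORRECT":
        # Documented behaviour: modifications are applied to the job.
        merged = dict(submitted)
        merged.update(changes)
        return merged
    raise ValueError(f"unhandled result_type: {result_type!r}")


# Illustrative values only.
submitted = {"name": "myjob"}
changes = {"project": "physics"}
```

Under this model, accepting returns the job unchanged and correcting
returns it with the extra parameter, which is why a portable JSV should
finish with jsv_correct whenever it has modified anything.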

> 
> 
>> 2) core binding. I have it configured for serial and smp jobs, but
>> which is the correct strategy and configuration for mpi jobs?
>> Is linear going to span jobs across different host sockets?
> 
> AFAICS the request is applied on all machines which you get granted for
> the job, i.e. it is applied per `qrsh -inherit ...` in addition to being
> set for the job script itself. This is hard to handle with a round-robin
> allocation, as you don't know in advance whether you get just one slot
> per machine or more. Maybe the best would be to use it with a fixed
> allocation rule only.

Yes, linear spans across sockets, but it tries to allocate cores on one
socket first. Basically, it chooses the socket with the most free cores
and fills it up, then it moves on to the second socket, and so on. It is
something like "packing" jobs close to shared cache levels. In Univa
Grid Engine the binding is no longer applied per qrsh -inherit call (as
it is in SGE 6.2u5); it is now a per-host request, because core
management was moved in 8.1.0 from the execd level into the scheduler
itself. The scheduler has a global view of the used resources.
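
The packing order described above can be sketched in a few lines of
Python. This is only an illustration of the idea, not the scheduler
code: the function name, the data layout (socket id mapped to a list of
free core ids on one host) and the tie-breaking by lower socket id are
my assumptions.

```python
def linear_pick(free_cores, amount):
    """Pick `amount` cores on one host, most-free socket first.

    free_cores -- dict: socket id -> list of free core ids (assumed layout)
    Returns a list of (socket, core) pairs, or None if the host does not
    have enough free cores.
    """
    if sum(len(cores) for cores in free_cores.values()) < amount:
        return None
    picked = []
    # Visit sockets with the most free cores first; ties go to the lower id.
    order = sorted(free_cores, key=lambda s: (-len(free_cores[s]), s))
    for socket in order:
        for core in sorted(free_cores[socket]):
            if len(picked) == amount:
                return picked
            picked.append((socket, core))
    return picked


# Socket 1 has more free cores, so it is filled before socket 0 is touched.
free = {0: [2, 3], 1: [0, 1, 2, 3]}
```

With the sample above, a request for five cores takes all four cores of
socket 1 and then one core of socket 0.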

When requesting linear binding in a JSV you need to request
"linear_automatic", since "linear" corresponds to something like
"qsub -binding linear:2:0,0", while "linear_automatic" corresponds to
the more common "qsub -binding linear:2".
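
As a rough sketch of that mapping, the following Python helper turns the
argument part of a linear binding request into the values a JSV would
set. The helper itself is hypothetical, and the key names
(binding_strategy, binding_amount, binding_socket, binding_core) are
what I would expect from jsv(1); please check them against the man page
of your version. An optional leading binding type (env, pe, set) is not
handled here.

```python
def jsv_linear_binding(binding_arg):
    """Map 'linear:<n>' or 'linear:<n>:<socket>,<core>' to JSV-style values.

    Hypothetical helper; key names are assumptions to verify in jsv(1).
    """
    parts = binding_arg.split(":")
    if parts[0] != "linear" or len(parts) not in (2, 3):
        raise ValueError(f"not a linear binding request: {binding_arg!r}")
    amount = int(parts[1])
    if len(parts) == 2:
        # No start position given: this is the "linear_automatic" case.
        return {"binding_strategy": "linear_automatic",
                "binding_amount": amount}
    # Explicit start position: this is what plain "linear" means in a JSV.
    socket, core = (int(x) for x in parts[2].split(","))
    return {"binding_strategy": "linear",
            "binding_amount": amount,
            "binding_socket": socket,
            "binding_core": core}
```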

If you are using OpenMPI you can also generate a rankfile from the PE
hostfile and delegate the core selection to OpenMPI. But in SGE you get
the same core selection on each host, hence the jobs must run host
exclusively, which is no real advantage. In Univa Grid Engine you don't
have this limitation anymore, again because the scheduler selects cores
with a global view.
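
A minimal sketch of such a conversion in Python follows. It assumes the
job was submitted with "-binding pe ..." so that the fourth column of
the PE hostfile carries the selected cores as colon-separated
<socket>,<core> pairs, and it writes one OpenMPI rankfile line per slot.
The function name and the sample host and queue names are made up.

```python
def rankfile_from_pe_hostfile(text):
    """Build OpenMPI rankfile lines from PE hostfile content.

    Assumed input line format: <host> <slots> <queue> <socket,core:...>
    Lines without usable binding information are skipped.
    """
    lines = []
    rank = 0
    for raw in text.splitlines():
        fields = raw.split()
        if len(fields) < 4:
            continue
        host, slots, binding = fields[0], int(fields[1]), fields[3]
        # Keep only well-formed "socket,core" entries.
        pairs = [p.split(",") for p in binding.split(":") if p.count(",") == 1]
        for socket, core in pairs[:slots]:
            lines.append(f"rank {rank}={host} slot={socket}:{core}")
            rank += 1
    return "\n".join(lines)


# Made-up sample content of a PE hostfile.
sample = (
    "node01 2 all.q@node01 0,0:0,1\n"
    "node02 1 all.q@node02 1,3\n"
)
```

The resulting text can be written to a file and passed to mpirun with
its rankfile option (-rf in the OpenMPI versions I have used).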

Maybe this is interesting for you:
http://www.gridengine.eu/grid-engine-internals/119-boosting-openmpi-performance-with-rankfiles-core-binding-and-univa-grid-engine

Daniel


> 
> -- Reuti
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

