On 18.02.2015 at 13:13, Kevin Taylor <[email protected]> wrote:
> 
> 
> I have several groups of machines that have infiniband on them and due to 
> history and physical locations, these groups of machines have individual 
> infiniband domains. 
> 
> What I've done right now (not in production) is create a boolean complex for 
> 'ib' and identify all of the nodes that contain infiniband. I've also created 
> a string complex called 'ibdomain' that has a name to uniquely identify which 
> systems connect to each other with IB. 
> 
> Is there a way that a user could just ask for 'ib' when submitting a parallel 
> job (I don't care where it goes as long as it has infiniband), and have the 
> grid engine tell the job the value of 'ibdomain'? Or keep the job within 
> systems on the same ibdomain?

You can request one particular domain by name with a plain string request, 
e.g. `qsub -l ibdomain=section2 ...`. But this may not be what you are looking 
for, as you can't use a wildcard here (at least not with the effect of staying 
inside one domain).

Instead of a complex which receives a certain value, it's easier to define one 
PE per domain, all sharing a common prefix and differing in a suffix, and then 
request `qsub -pe "ib*" 16 ...` for any of the IB domains. Once a PE is 
selected, only slots belonging to this PE will be used for this job.
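
A minimal sketch of how the per-domain PEs could be prepared. The PE names 
`ib1` and `ib2`, the slot count and the allocation rule are assumptions chosen 
for illustration, not values from this thread; adjust them to your setup. The 
script only writes the definition files; loading them with `qconf -Ap` is left 
as a commented-out step.

```shell
#!/bin/sh
# Write one PE definition file per IB domain (names and values are assumptions).
write_pe_file() {
    # $1 = PE name, $2 = output directory
    cat > "$2/$1.pe" <<EOF
pe_name            $1
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    \$fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE
EOF
}

outdir=$(mktemp -d)
for pe in ib1 ib2; do
    write_pe_file "$pe" "$outdir"
    # To register the PE with the cluster, run as a manager:
    # qconf -Ap "$outdir/$pe.pe"
done
echo "PE definitions written to $outdir"
```

Afterwards `qconf -spl` should list the new PEs, and a submission like 
`qsub -pe "ib*" 16 job.sh` (with `job.sh` standing in for your own script) can 
match either of them.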

You can attach the PEs to a single queue by defining a list of PEs for 
individual nodes or per hostgroup (which might shorten the line).

$ qconf -sq all.q
...
pe_list               make,[@ibhosts1=ib1 make],[@ibhosts2=ib2 make smp]

(The default is used only for machines not listed later in the list; it is 
not added to the other entries automatically.)
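
As for telling the job which domain it ended up in: Grid Engine exports the 
name of the granted PE to a parallel job in the `PE` environment variable (next 
to `NSLOTS` and `PE_HOSTFILE`). With one PE per domain attached as in the 
`pe_list` above, a job script could derive the domain from it. This is only a 
sketch; the `ib<suffix>` naming scheme and the helper name `ib_domain_from_pe` 
are assumptions.

```shell
#!/bin/sh
# Job-script fragment: find out which IB domain the scheduler picked.

# Map a PE name such as "ib2" to its domain suffix "2".
# Returns non-zero for PE names outside the assumed ib<suffix> scheme.
ib_domain_from_pe() {
    case "$1" in
        ib?*) printf '%s\n' "${1#ib}" ;;
        *)    return 1 ;;
    esac
}

# $PE is set by Grid Engine for parallel jobs; it is empty elsewhere.
if domain=$(ib_domain_from_pe "${PE:-}"); then
    echo "granted PE $PE, IB domain $domain, ${NSLOTS:-unknown} slots"
else
    echo "not running under an ib* PE"
fi
```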

-- Reuti
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
