Hi,

On 21.02.2012 at 20:20, Txema Heredia Genestar wrote:

> Hello all,
>
> I am having some problems running threaded jobs in SGE 6.1u4. In our cluster,
> h_vmem is defined as a consumable attribute in all nodes. It is mandatory,
> all jobs must request it, with a default value of 6Gb. That constraint leads
> any "parallel" job sent to the cluster to try to reserve a lot of memory
> (h_vmem * slots). This is ok for most parallel processes (MPI and such).
> But, sometimes, we need to run "threaded" jobs, where all threads share a chunk
> of memory (everything on a single node). This leads to situations where I
> need to send an 8-threaded job that requires, say, 10 Gb of memory, but it
> cannot be scheduled because no node can handle an 80Gb request. When a memory
> request cannot be fulfilled, the typical message of "cannot run in PE "smp"
> because it only offers N slots" appears in qstat (where N is the maximum
> number of slots I would be able to use given the requested h_vmem size).
>
> This is the parallel environment I am trying to use:
>
> # qconf -sp smp
> pe_name smp
> slots 9999
> user_lists test_users
> xuser_lists NONE
> start_proc_args /bin/true
> stop_proc_args /bin/true
> allocation_rule $fill_up

For SMP mode you will need $pe_slots here, unless you additionally request exactly one node in the submission command. I assume that before you simply got more than one node.

==

The answer from Bob, changing the complex h_vmem to JOB, would help for this type of job, but not if you also have MPI jobs in the cluster. I had an RFE for introducing this on a PE level:

https://arc.liv.ac.uk/trac/SGE/ticket/197

To cite from the issue: "Therefore I wrote, that an entry in the PE would still be advantageous: h_vmem can only be JOBS or YES"

==

For now: you could adjust the memory request in a JSV depending on the requested PE, but for this you need 6.2 IIRC.
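
As long as h_vmem stays a per-slot consumable, the scheduler reserves h_vmem * slots, so one manual workaround is to divide the total you need by the slot count before submitting. A minimal sketch of that arithmetic in Python follows; the helper names `per_slot_h_vmem_mb` and `qsub_resource_args` are made up for illustration and are not part of SGE. It also assumes the configured per-slot limit does not cause problems for a single multi-threaded process on your installation, which you should verify.

```python
def per_slot_h_vmem_mb(total_mb: int, slots: int) -> int:
    """Return the per-slot h_vmem in MB so that value * slots >= total_mb.

    Hypothetical helper: with a per-slot consumable, the scheduler
    reserves (per-slot request) * slots, so we divide and round up.
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    if total_mb < 1:
        raise ValueError("total_mb must be at least 1")
    # Integer ceiling division, avoids floating point rounding.
    return -(-total_mb // slots)


def qsub_resource_args(pe: str, slots: int, total_mb: int) -> list[str]:
    """Build the -pe / -l part of a qsub command line (illustrative only)."""
    per_slot = per_slot_h_vmem_mb(total_mb, slots)
    return ["-pe", pe, str(slots), "-l", f"h_vmem={per_slot}M"]


if __name__ == "__main__":
    # The 8-threaded job from the question, needing 10 GB in total.
    print(" ".join(["qsub"] + qsub_resource_args("smp", 8, 10 * 1024)))
```

For the example in the question this yields a request of 1280M per slot instead of 10G per slot, so the reservation adds up to 10 GB rather than 80 GB.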

-- Reuti

> control_slaves FALSE
> job_is_first_task FALSE
> urgency_slots min
>
> The most annoying part of all this is that this behaviour is not consistent:
> This morning I've been able to run a 6-threaded job requesting 10Gb of memory
> in a 48Gb node. But, in the afternoon, the same job using the very same
> command in the same node could not be run.
>
> Does anyone have any suggestion on how to deal with this?
>
> Thanks in advance,
>
> Txema
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
