On 31.10.2012 at 23:28, Dave Love wrote:

> Reuti writes:
>
>> On 30.10.2012 at 06:54, Oren Mustaki wrote:
>>
>>> I found a solution in
>>> http://www.gridengine.info/2006/02/14/grouping-jobs-to-nodes-via-wildcard-pes/
>>>
>>> The following is more suitable for my needs:
>>>
>>> create a host group for each subcluster – group1 group2
>>> create pe for each subcluster fillup_group1 fillup_group2
>>> associate the pes to queue
>>> create resource quota rule
>>> limit pes {fillup_group1} hosts {!group1} to slots = 0
>>> limit pes {fillup_group2} hosts {!group2} to slots = 0
>>
>> Nice variant.
>>
>> But you can get different queues this way on one and the same host. In
>> case you use `qrsh -inherit ...`, then also the name of $TMPDIR will
>> be different for each process if they run in different queues (this
>> might be irrelevant of course - depends on the application).
>
> I'd meant to investigate changing that, but let it slip. Is there any
> reason to have the queue there? If anything, I think it should be the
> cell. The code says
>
> /* make tmpdir only when this is the first task that gets started
> in this queue instance. QU_job_slots_used holds actual number of used
> slots for this job in the queue */
>
> but doesn't indicate why the queue matters as far as I can tell.
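
To make the $TMPDIR point above concrete, here is a minimal sketch in Python. It assumes the scratch directory is named <base>/<job_id>.<task_id>.<queue>, which matches the 1234.1.serial01.q pattern mentioned in this thread; the helper `scratch_dir`, the base path /tmp and the queue names fillup_group1.q and fillup_group2.q are hypothetical and not taken from the SGE source.

```python
import posixpath


def scratch_dir(base, job_id, task_id, queue=None):
    """Hypothetical helper: build a scratch directory name.

    With a queue the name is <base>/<job_id>.<task_id>.<queue> (assumed
    current scheme); without one it is just <base>/<job_id>.<task_id>.
    """
    name = f"{job_id}.{task_id}"
    if queue is not None:
        name += f".{queue}"
    return posixpath.join(base, name)


# Two `qrsh -inherit` tasks of the same job on one host, landing in
# different queues (queue names are made up for illustration).
queues = ("fillup_group1.q", "fillup_group2.q")

with_queue = {scratch_dir("/tmp", 1234, 1, q) for q in queues}
without_queue = {scratch_dir("/tmp", 1234, 1) for _ in queues}

# Number of distinct directories under each naming scheme.
print(len(with_queue), len(without_queue))  # prints: 2 1
```

With the queue in the name, the two tasks see two different directories on the same host; without it, they would share one, because the job/task-id part is identical.
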
Note: there is also issue https://arc.liv.ac.uk/trac/SGE/ticket/813, where two `qrsh -inherit` calls to the same exechost end up in the wrong queues. That would also be solved by such a change, as the desired queue can't be selected right now.

(Only if you wanted exactly one unique $TMPDIR per `qrsh -inherit` with a slot count of 1 in each queue would you be out of luck. But for now this can't be guaranteed anyway. OTOH: it could be a feature to enforce some kind of disk quota inside $TMPDIR; in that case you would want the correct one for each `qrsh -inherit` call, and the -q option should be implemented.)

Before changing this: I wonder what the intention was >12 years ago in including the name of the queue, as the job/task-id is already unique? I'm not sure whether it was already in DQS. In SGE 5.3 there were no cluster queues (i.e. one queue definition per exechost...), and because of this the number of the exechost was often included in the name of the queue, like 1234.1.serial01.q for a serial queue on node01.

-- Reuti
