Reuti <[email protected]> writes:
> Note: there is also issue https://arc.liv.ac.uk/trac/SGE/ticket/813, where two
> `qrsh -inherit` calls to the same exechost end up in the wrong queues. That
> would be solved as well, since the desired queue can't be selected right now.
Looking at the code, I don't actually understand how you get
inconsistent TMPDIRs, as the name seems to be derived from the master
queue name in the calls to sge_make_tmpdir.
> (You would only be out of luck if you wanted exactly one unique $TMPDIR per
> `qrsh -inherit` with a slot count of 1 in each queue. But for now that can't
> be guaranteed anyway. OTOH, it could be a feature: if some kind of disk quota
> is enforced inside $TMPDIR, you would want the correct one for each
> `qrsh -inherit` call, and the -q option should then be implemented.)
Maybe, though that seems quite obscure and less important than problems
caused by the current implementation, even if I'm now confused how they
arise...
> Before changing this: I wonder what the intention was, more than 12 years
> ago, in including the queue name, as the job/task ID is already unique.
Yes, that's what I mean. I'm inclined to change it anyway if there's no
obvious reason. (The id is only unique in a given cell, and you could
currently have trouble from multiple cells with job ids of similar
sizes, though I doubt that's at all common.)
> I'm not sure whether it was already in DQS. In SGE 5.3 there were no
> cluster queues (i.e. one queue definition per exechost...), and because of
> this the number of the exechost was often included in the queue name,
> giving a $TMPDIR like 1234.1.serial01.q for a serial queue on node01.
I'm not sure it helps, but here is dqs_make_tmpdir:
/* Note could have multiple instantiations of same job, */
/* on same machine, under same queue */
sprintf(str,"%s/%d.%s.%d",qconf->tmpdir,job->job_number,qconf->qname,me.pid);
cf. sge_make_tmpdir:
/* Note could have multiple instantiations of same job, */
/* on same machine, under same queue */
snprintf(tmpdir, ltmpdir, "%s/"sge_u32"."sge_u32".%s", t, jobid,
jataskid, lGetString(qep, QU_qname));
--
Community Grid Engine: http://arc.liv.ac.uk/SGE/
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users