Hello distinguished forum members,

We recently needed to submit jobs in which qsub requests both the requestable variable "hostname" and a parallel environment. For example, submitting an 'xterm' job:

    $SGE_ROOT/bin/lx-amd64/qsub -V -cwd -b y -l hostname=host_in_grid -pe somePe 1 xterm

This kind of request makes the scheduler behave strangely. The submission ends up in one of the following states:

1. The xterm job opens as expected.
2. There is a very long delay, and then the xterm opens.
3. The job enters the 'qw' state with errors similar to the following:

    cannot run because it exceeds limit "/////" in rule "some_rule/1"
    cannot run in PE "somePe" because it only offers 0 slots

In all of these cases "host_in_grid" has enough free slots, and the quota rule "some_rule" is not related in any way to the consumable/requestable variable in the job submission request. If we remove the "some_rule" quota from the SGE quotas, the error picks another rule and again states that its limit was exceeded.

NOTE: the somePe parallel environment has enough free slots; it is initially defined with 999 slots.

Basically, these "cannot run" messages do not reflect the real reason why the job cannot run, since all conditions are actually met. This is very confusing. Why does this happen?

We also found a workaround that ALWAYS works: dropping the requestable variable "hostname" and selecting the host with -q instead:

    $SGE_ROOT/bin/lx-amd64/qsub -V -cwd -b y -q host_in_grid -pe testpe 1 xterm

Any ideas why this strange behavior occurs? Is it some kind of bug? How can it be resolved?

We appreciate your help.

Thanks,
Yuri Burmachenko | Sr. Engineer | IT | Mellanox Technologies Ltd.
Work: +972 74 7236386 | Cell +972 54 7542188 | Fax: +972 4 959 3245
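
For anyone trying to reproduce the report above, the scheduler's view of the pending job, the PE, the quota rule, and the host could be inspected with the standard Grid Engine client commands. This is only a sketch: it prints the commands rather than running them, the helper name diag_cmds and the job ID 12345 are hypothetical, and the host, PE, and rule names are the placeholders used in the post. Note that qstat -j only reports scheduling messages when schedd_job_info is enabled in the scheduler configuration.

```shell
#!/bin/sh
# Print (do not execute) the diagnostic commands for one pending job.
# Arguments: job id, host name, PE name, resource quota rule set name.
diag_cmds() {
    job_id=$1
    host=$2
    pe=$3
    rule=$4
    echo "qstat -j $job_id"    # scheduling info for the pending job
    echo "qconf -sp $pe"       # parallel environment definition (slots, allocation rule)
    echo "qconf -srqs $rule"   # resource quota rule set named in the message
    echo "qquota -h $host"     # current quota usage filtered by host
    echo "qhost -q -h $host"   # queue instances and slot usage on the host
}

# Placeholder values: 12345 is a hypothetical job ID.
diag_cmds 12345 host_in_grid somePe some_rule
```

Running the printed commands on a submit host while the job sits in 'qw' would show whether the quota and PE slot counts the scheduler reports match the configured values.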
