Hello,

I set up a test queue to test new prolog/epilog scripts, and I am seeing some strange behavior when I submit a PE job to this queue: the job either never gets scheduled or waits for a very long time. I tried several PEs with allocation rules of '1', '2', and '4', all to no avail. Submitting a job without a PE makes it run immediately. I am using SGE 2.6u5.

Checking why it is not running:

    $ qalter -w v 7301747
    ...
    Job 7301747 cannot run because it exceeds limit "ilya/////" in rule "limit_slots_for_users/1"
    Job 7301747 cannot run in PE "pe_1" because it only offers 0 slots
    verification: no suitable queues

    $ qconf -sp pe_1
    pe_name            pe_1
    slots              9999999
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    startmpi.sh $pe_hostfile
    stop_proc_args     stopmpi.sh $pe_hostfile
    allocation_rule    1
    control_slaves     TRUE
    job_is_first_task  TRUE
    urgency_slots      min
    accounting_summary FALSE

    $ qconf -srqs limit_slots_for_users
    {
       name         limit_slots_for_users
       description  "limit the number of simultaneous slots any user can use"
       enabled      TRUE
       limit        users {*} to slots=800
    }

And finally,

    $ qstat
    job-ID  prior   name   user  state submit/start at     queue  slots ja-task-ID
    -------------------------------------------------------------------------------
    7301584 0.60051 sleep  ilya  qw    04/20/2018 18:29:26        4
    7301747 0.50051 sleep  ilya  qw    04/20/2018 18:36:23        1

So I am not running anything at the moment. If I submit a job with the same PE to a production queue, it will get scheduled. A job that I left hanging last night finally got scheduled after 7-8 hours.

The test queue is as follows:

    $ qconf -sq test_gpu.q
    qname                 test_gpu.q
    hostlist              @gpu
    seq_no                0
    load_thresholds       np_load_avg=1.75
    suspend_thresholds    NONE
    nsuspend              1
    suspend_interval      00:05:00
    priority              0
    min_cpu_interval      00:05:00
    processors            UNDEFINED
    qtype                 BATCH INTERACTIVE
    ckpt_list             NONE
    pe_list               make pe_1 pe_2 pe_3 pe_4 pe_slots
    rerun                 TRUE
    slots                 4
    tmpdir                /data
    shell                 /bin/sh
    prolog                sgeg...@prolog.sh
    epilog                sgeg...@epilog.sh
    shell_start_mode      unix_behavior
    starter_method        NONE
    suspend_method        NONE
    resume_method         NONE
    terminate_method      custom_kill -p $job_pid -j $job_id
    notify                00:00:60
    owner_list            NONE
    user_lists            system.g
    xuser_lists           NONE
    subordinate_list      NONE
    complex_values        NONE
    projects              NONE
    xprojects             NONE
    calendar              NONE
    initial_state         default
    s_rt                  INFINITY
    h_rt                  INFINITY
    s_cpu                 INFINITY
    h_cpu                 INFINITY
    s_fsize               INFINITY
    h_fsize               INFINITY
    s_data                INFINITY
    h_data                INFINITY
    s_stack               INFINITY
    h_stack               INFINITY
    s_core                1G
    h_core                INFINITY
    s_rss                 INFINITY
    h_rss                 INFINITY
    s_vmem                INFINITY
    h_vmem                INFINITY

Any suggestions?

Thank you,
Ilya.