Hi,

OS: Ubuntu 10.04 64-bit
OGS/GE 2011.11p1 (linux-x64)

Intermittently, for no apparent reason, jobs fail and then succeed when rerun.
On a daily basis I see the below errors in the qmaster messages log.

11/12/2012 11:24:25|event_|n1|E|no event client known with id 138 to process acknowledgements
11/12/2012 11:24:25|event_|n1|E|no event client known with id 21 to process acknowledgements
11/12/2012 11:24:25|event_|n1|E|no event client known with id 141 to process acknowledgements
11/12/2012 11:24:25|worker|n1|E|no event client known with id 16 to shutdown

I have played with many of the parameters (see below) with no joy.

Any suggestions to troubleshoot and resolve would be appreciated. Please let me know if you require additional configs or logs.

Thanks
Laurence

---------------------------------------------------------------------------------------------------------
algorithm                         default
schedule_interval                 00:00:01
maxujobs                          0
queue_sort_method                 load
job_load_adjustments              np_load_avg=0.50,load_avg=0.50
load_adjustment_decay_time        0:7:30
load_formula                      load_avg-num_proc
schedd_job_info                   false
flush_submit_sec                  4
flush_finish_sec                  4
params                            none
reprioritize_interval             0:0:0
halftime                          168
usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
compensation_factor               5.000000
weight_user                       0.250000
weight_project                    0.250000
weight_department                 0.250000
weight_job                        0.250000
weight_tickets_functional         50000
weight_tickets_share              0
share_override_tickets            TRUE
share_functional_shares           TRUE
max_functional_jobs_to_schedule   1250
report_pjob_tickets               TRUE
max_pending_tasks_per_job         500
halflife_decay_list               none
policy_hierarchy                  OFS
weight_ticket                     0.010000
weight_waiting_time               0.000000
weight_deadline                   3600000.000000
weight_urgency                    0.100000
weight_priority                   1.000000
max_reservation                   0
default_duration                  INFINITY
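
P.S. As a possible first troubleshooting step, the "no event client known" errors quoted above could be tallied per client id to see whether the same ids keep recurring. Below is a minimal sketch in Python. It assumes every line in the messages file follows the pipe-separated layout seen in the excerpt (timestamp|thread|host|level|message). The function name `tally_unknown_clients` and the `SAMPLE` list are made up for illustration and are not part of Grid Engine.

```python
import re
from collections import Counter

# Assumed layout, inferred only from the quoted excerpt:
#   MM/DD/YYYY HH:MM:SS|thread|host|level|message
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})"
    r"\|(?P<thread>[^|]*)\|(?P<host>[^|]*)\|(?P<level>[^|]*)\|(?P<msg>.*)$"
)
ID_RE = re.compile(r"no event client known with id (\d+)")

# The four lines quoted in this post, used as sample input.
SAMPLE = [
    "11/12/2012 11:24:25|event_|n1|E|no event client known with id 138 to process acknowledgements",
    "11/12/2012 11:24:25|event_|n1|E|no event client known with id 21 to process acknowledgements",
    "11/12/2012 11:24:25|event_|n1|E|no event client known with id 141 to process acknowledgements",
    "11/12/2012 11:24:25|worker|n1|E|no event client known with id 16 to shutdown",
]


def tally_unknown_clients(lines):
    """Count 'no event client known' error lines per event client id."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("level") != "E":
            continue  # skip malformed lines and non-error entries
        hit = ID_RE.search(m.group("msg"))
        if hit:
            counts[int(hit.group(1))] += 1
    return counts


if __name__ == "__main__":
    # To scan a real log, replace SAMPLE with an open file object.
    for client_id, n in sorted(tally_unknown_clients(SAMPLE).items()):
        print(f"event client id {client_id}: {n} error(s)")
```

Passing an open file handle for the qmaster messages file in place of `SAMPLE` would give counts over the whole log. The timestamps of the recurring ids could then be compared against the times of the failed jobs.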
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users