Re: [gridengine users] Grid Engine Sluggish

2019-01-28 Thread Skylar Thompson
Another thing to check is that schedd_job_info is disabled - leaving this enabled can cause high CPU and memory load on the qmaster with large jobs or lots of reservations. On Sat, Jan 26, 2019 at 01:16:58PM -0500, Daniel Povey wrote: > Check if there are any huge jobs in the queue. Sometimes
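
A minimal sketch of how one might check that setting, assuming the scheduler configuration is available as "name value" lines in the form printed by `qconf -ssconf`. The helper name `sched_param` and the sample text are illustrative assumptions, not output from the poster's cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one scheduler parameter from
# "name value" text on stdin (the layout used by `qconf -ssconf`).
sched_param() {
    awk -v key="$1" '$1 == key { print $2; exit }'
}

# Illustrative sample only; NOT captured from a real cluster.
sample='schedule_interval 0:0:15
schedd_job_info true
max_reservation 100'

value=$(printf '%s\n' "$sample" | sched_param schedd_job_info)
if [ "$value" != "false" ]; then
    echo "schedd_job_info is '$value'; consider setting it to false with qconf -msconf"
fi

# On a live cluster (assumes qconf is on PATH):
#   qconf -ssconf | sched_param schedd_job_info
```

Reading the value from stdin keeps the helper usable both on saved configuration dumps and on live `qconf -ssconf` output.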

Re: [gridengine users] Grid Engine Sluggish

2019-01-26 Thread Joseph Farran
Very tempting, but the user would have my head :-) In my previous post I think I may have found the issue. I turned down max_reservation from 100 to 10 and sge_qmaster seems to be taking a breather and not staying at 100% CPU utilization. Time will tell. Thank you all, Joseph On 1/26/2019

Re: [gridengine users] Grid Engine Sluggish

2019-01-26 Thread Joseph Farran
Hi Ian. I have BLCR checkpoint implemented, and even though the context files are large, the FS keeps up. I've been playing with adjusting the scheduler configuration values (qconf -msconf) and I think I just hit gold. I set max_reservation really low: max_reservation 10. And this is the first time sge_qmaster
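
For anyone wanting to make the same change without the interactive editor, here is a rough sketch. It assumes the scheduler configuration is dumped with `qconf -ssconf`, edited as text, and loaded back with `qconf -Msconf`. The helper name `set_sched_param` and the temporary file paths are hypothetical, and the value 10 is simply the one reported in this thread.

```shell
#!/bin/sh
# Hypothetical helper: rewrite one "name value" line in scheduler
# configuration text read from stdin; all other lines pass through unchanged.
set_sched_param() {
    awk -v key="$1" -v val="$2" '
        $1 == key { print key " " val; next }
        { print }
    '
}

# On a live cluster (assumes qconf is on PATH and you have manager rights):
#   qconf -ssconf > /tmp/sched.conf
#   set_sched_param max_reservation 10 < /tmp/sched.conf > /tmp/sched.new
#   qconf -Msconf /tmp/sched.new
```

Keeping the dumped file around also gives a copy of the old settings to restore if the change does not help.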

Re: [gridengine users] Grid Engine Sluggish

2019-01-26 Thread Ian Kaufman
IO issues? NFS server providing data and possibly jobs running over NFS shares as opposed to running on local disk? On Sat, Jan 26, 2019, 11:23 AM Joseph Farran wrote: > Hi Daniel. > > Yes I do have large job-arrays around 7k tasks BUT I have had larger job > arrays of 500k without seeing this kind of

Re: [gridengine users] Grid Engine Sluggish

2019-01-26 Thread Daniel Povey
It may depend on specific features of those large job arrays. You could try deleting them and see if the problem disappears. On Sat, Jan 26, 2019 at 2:23 PM Joseph Farran wrote: > Hi Daniel. > > Yes I do have large job-arrays around 7k tasks BUT I have had larger job > arrays of 500k without

Re: [gridengine users] Grid Engine Sluggish

2019-01-26 Thread Joseph Farran
Hi Daniel. Yes I do have large job-arrays around 7k tasks BUT I have had larger job arrays of 500k without seeing this kind of slowdown. Joseph On 1/26/2019 10:16 AM, Daniel Povey wrote: Check if there are any huge jobs in the queue. Sometimes very large task ranges, or large numbers of

Re: [gridengine users] Grid Engine Sluggish

2019-01-26 Thread Joseph Farran
Hi Reuti. Yes - several times with no success. Joseph On 1/26/2019 4:03 AM, Reuti wrote: Hi, On 26.01.2019 at 10:20, Joseph Farran wrote: Hi. Our Grid Engine is running very sluggish all of a sudden. sge_qmaster stays at 100%

Re: [gridengine users] Grid Engine Sluggish

2019-01-26 Thread Daniel Povey
Check if there are any huge jobs in the queue. Sometimes very large task ranges, or large numbers of jobs, can make it slow. On Sat, Jan 26, 2019 at 7:05 AM Reuti wrote: > Hi, > > > On 26.01.2019 at 10:20, Joseph Farran wrote: > > > > Hi. > > Our Grid Engine is running very sluggish all of a

Re: [gridengine users] Grid Engine Sluggish

2019-01-26 Thread Reuti
Hi, > On 26.01.2019 at 10:20, Joseph Farran wrote: > > Hi. > Our Grid Engine is running very sluggish all of a sudden. sge_qmaster stays > at 100% all the time where it used to be 100% for a few seconds every 30 > seconds or so. > I ran the qping command but not sure how to read it. Any

[gridengine users] Grid Engine Sluggish

2019-01-26 Thread Joseph Farran
Hi. Our Grid Engine is running very sluggish all of a sudden. sge_qmaster stays at 100% all the time where it used to be 100% for a few seconds every 30 seconds or so. I ran the qping command but am not sure how to read it. Any helpful insight is much appreciated. qping -i 5 -info hpc-s 6444