Re: [gridengine users] sge_schedd exhausts all memory

Joshua Baker-LePain Tue, 25 Oct 2011 10:32:17 -0700

On Tue, 25 Oct 2011 at 6:12pm, SLIM H.A. wrote:

> After using GridEngine 6.1u6 for more than a year, a problem has cropped
> up suddenly with the scheduler. The scheduler rapidly uses all the
> available memory in the system and can ultimately crash the server.
> If I stop qmaster, wait until top shows normal memory usage and
> restart it, all memory is immediately claimed by sge_schedd. I have
> tried setting params profile=1 with qconf -msconf to monitor the
> scheduler message file; the output after restarting qmaster is below. I
> cannot see anything relevant, but maybe someone else has better insight.
>
> Does anyone know another way to investigate this "memory leak"?
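Rather than watching top by eye, the scheduler's resident memory can be sampled directly. A minimal sketch, assuming a Linux qmaster host where /proc/<pid>/comm and the VmRSS line of /proc/<pid>/status are available; the function names are hypothetical:

```python
from pathlib import Path
from typing import Dict, Optional


def parse_vmrss_kb(status_text: str) -> Optional[int]:
    """Return the VmRSS value (resident memory, in kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:\t  123456 kB"; the number is the second field.
            return int(line.split()[1])
    return None  # kernel threads and zombies have no VmRSS line


def rss_by_process_name(name: str, proc_root: str = "/proc") -> Dict[int, int]:
    """Map PID -> resident memory in kB for every process whose comm equals name."""
    result: Dict[int, int] = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            if (entry / "comm").read_text().strip() != name:
                continue
            rss = parse_vmrss_kb((entry / "status").read_text())
        except OSError:
            continue  # the process exited or is not readable; skip it
        if rss is not None:
            result[int(entry.name)] = rss
    return result
```

Calling `rss_by_process_name("sge_schedd")` every few seconds right after restarting qmaster gives a time series of the scheduler's growth, which is easier to line up against job submissions or hold/release actions than a top screen.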

I recently dealt with a similar problem on 6.1u3. I tracked it down to a single job: a 50,000-task array job with a very poorly written job script which clocked in at over 32MB. Putting a hold on that job settled SGE back into sane amounts of memory usage. I then gently encouraged the user to rewrite the job script.
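The 32MB job script suggests one cheap check: look for unusually large job scripts. A minimal sketch, assuming the scripts can be read as files from some directory on the qmaster host (where your installation keeps them is an assumption to verify), with an arbitrary, hypothetical 1 MiB threshold:

```python
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple

# Hypothetical cutoff; the script in the incident above was over 32MB.
THRESHOLD_BYTES = 1 << 20  # 1 MiB


def script_sizes(script_dir: str) -> Iterator[Tuple[str, int]]:
    """Yield (file name, size in bytes) for each regular file in script_dir."""
    for path in Path(script_dir).iterdir():
        if path.is_file():
            yield path.name, path.stat().st_size


def oversized(entries: Iterable[Tuple[str, int]],
              threshold: int = THRESHOLD_BYTES) -> List[Tuple[str, int]]:
    """Return the entries larger than threshold, largest first."""
    big = [(name, size) for name, size in entries if size > threshold]
    return sorted(big, key=lambda entry: entry[1], reverse=True)
```

Feeding `script_sizes(...)` for that directory into `oversized` lists the biggest scripts first; any hit is a good first candidate for a hold.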

One way to track down which job(s) is/are causing the issue is to put a hold on all queued jobs, then release the holds in batches and track down the errant job(s).
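The hold-and-release hunt can be sketched as a small search. This is a simulation, not a wrapper around SGE: `misbehaves` is a hypothetical stand-in for "run qrls on these jobs and see whether sge_schedd's memory takes off", and testing the members of a bad batch one at a time is one reasonable way to finish the search, not necessarily how it was done here.

```python
from typing import Callable, List, Sequence


def find_errant_jobs(job_ids: Sequence[str], batch_size: int,
                     misbehaves: Callable[[List[str]], bool]) -> List[str]:
    """Return the jobs that must stay on hold.

    All jobs start held. Holds are released batch_size jobs at a time;
    misbehaves(released) reports whether the scheduler blows up with that
    set of jobs released (hypothetical check standing in for qrls + watching memory).
    """
    released: List[str] = []
    errant: List[str] = []
    for start in range(0, len(job_ids), batch_size):
        batch = list(job_ids[start:start + batch_size])
        if not misbehaves(released + batch):
            released.extend(batch)  # the whole batch is safe
            continue
        # The batch contains at least one bad job: re-hold it and test members individually.
        for job in batch:
            if misbehaves(released + [job]):
                errant.append(job)  # keep this one on hold
            else:
                released.append(job)
    return errant
```

Fixed-size batches mirror the advice above; splitting a bad batch in halves (bisection) instead of testing its jobs one by one would need fewer checks when batches are large.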


Good luck.

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
