Re: [gridengine users] SGE rolling over MAX_SEQNUM, peculiar things happened

Daniel Povey Fri, 18 Oct 2019 10:33:34 -0700

Normally, restarting the qmaster (e.g. systemctl restart gridengine-qmaster)
is a routine, harmless operation that is invisible to users, apart from
`qstat` being temporarily unavailable.
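
For the duplicate JID entries in accounting described in the quoted message,
here is a minimal sketch of one way to tell the reused JIDs apart. It assumes
the colon-separated layout documented in accounting(5), where job_number is
the 6th field and submission_time the 9th; the function name is made up for
illustration, and the file location in the usage comment
($SGE_ROOT/$SGE_CELL/common/accounting) is the usual default and may differ on
your install. A JID that appears with more than one submission time has been
reused, so monitoring could key on the (job_number, submission_time) pair
instead of the JID alone.

```python
from collections import defaultdict

# 0-based field positions, assumed from accounting(5):
# qname:hostname:group:owner:job_name:job_number:account:priority:submission_time:...
JOB_NUMBER = 5
SUBMISSION_TIME = 8


def find_reused_jids(lines):
    """Map each JID seen with more than one submission time to those times."""
    seen = defaultdict(set)
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split(":")
        if len(fields) <= SUBMISSION_TIME:
            continue  # truncated or malformed record
        try:
            submitted = int(fields[SUBMISSION_TIME])
        except ValueError:
            continue  # non-numeric timestamp, ignore the record
        seen[fields[JOB_NUMBER]].add(submitted)
    # Array tasks and rescheduled runs share one submission time, so they
    # collapse to a single entry and are not reported as reuse.
    return {jid: sorted(times) for jid, times in seen.items() if len(times) > 1}


# Usage (path is an assumption, adjust to your cell):
# with open("/opt/sge/default/common/accounting") as fh:
#     for jid, times in find_reused_jids(fh).items():
#         print(jid, times)
```

This only reads the accounting file; it does not touch jobseqnum or the
qmaster spool.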


On Fri, Oct 18, 2019 at 8:35 AM WALLIS Michael <mike.wal...@ed.ac.uk> wrote:

> Hi folks,
>
> Our instance of (quite old, 2011.11p1_155) SGE rolled over 10,000,000 jobs
> at the start of the month, and then started again at 1 as expected. About
> ten days later we restarted the qmaster a few times (it was segfaulting;
> originally we thought that a user was using newer qstat binaries to query
> an old qmaster) with the JID nearing 20k, only after each of the restarts
> the JID started at about 1100, not the number we were expecting. Because of
> this there are duplicate JID entries in accounting, and it's causing a bit
> of a problem for people who monitor for failed jobs.
>
> Because of the nature of the workload the currently-running JIDs are now
> all over the place, with some JIDs in the queue still in the 9,99n,nnn
> range and some in four figures. If we need to restart the qmaster again,
> will the jobseqnum file be overwritten with the largest JID still in the
> queue (as suggested in
> http://arc.liv.ac.uk/pipermail/gridengine-users/2010-January/028661.html)?
>
> I'm aware that this is an old version of SGE and we're in the middle of
> transitioning to a much newer one, but this is a bit of an issue while
> we're still shifting workloads over.
>
> Thanks,
> Mike
> --
> Mike Wallis x503305
> University of Edinburgh, Research Services,
> Argyle House, 3 Lady Lawson Street,
> Edinburgh, EH3 9DR
>
>
> The University of Edinburgh is a charitable body, registered in Scotland,
> with registration number SC005336.
>
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users
>
