Might be worth adding that the 8.1.6 qmaster grabs 99% of the CPU within a
couple of minutes of starting up, so there's something screwy going on with
it.

On 24 February 2014 12:24, Kevin Buckley
<kevin.buckley.ecs.vuw.ac...@gmail.com> wrote:
> Hi there,
>
> have recently been seeing issues on our School's SGE grid where the
> qmaster (8.0.0) became unresponsive after around a day's uptime,
> showing a message akin to
>
> 02/21/2014 11:40:00|worker|mymaster|E|not enough memory to allocate
> 1048576 bytes in init_packbuffer
>
> with clients seeing the usual "gdi timeout" messages when attempting
> qstat/qsub etc.
>
> After the qmaster was brought up to 8.1.6, the unresponsiveness started
> kicking in almost immediately, although no discernible memory-related
> messages appear in the log.
>
> A qping of the master suggests that the older version displays
>
> status:                   1
>
> with all the threads in an E state, whilst with the 8.1.6 master, the
> process entered
>
> status:                   2
>
> fairly quickly after a restart.
>
> Despite the apparent mismatch between the i386 master and x86_64 execds,
> the system here has only just started to misbehave, though perhaps we've
> just been "lucky".
>
> Whilst I doubt anyone else out there will have such a system as ours, if
> anyone has any suggestions for debugging such issues at a deeper level
> than a basic qping, which is what most of the postings a web search
> unearthed seem to suggest, I'd be delighted to hear of them.
>
> Kevin M. Buckley
>
> eScience Consultant
> School of Engineering and Computer Science
> Victoria University of Wellington
> New Zealand
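One step beyond qping is to pull the error entries out of the qmaster's messages file and see which threads they come from. Below is a minimal sketch, assuming each log line follows the `date time|thread|host|level|message` layout seen in the quoted init_packbuffer error (that message is wrapped across two lines in the mail but is a single line in the file). The field names and the functions parse_line, errors and count_by_thread are my own hypothetical labels, not anything SGE provides.

```python
import sys
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional


class LogEntry(NamedTuple):
    # Hypothetical field names for the assumed pipe-delimited layout.
    timestamp: str  # e.g. "02/21/2014 11:40:00"
    thread: str     # e.g. "worker"
    host: str       # e.g. "mymaster"
    level: str      # single letter, e.g. "E"
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one line on '|' at most 4 times, so a '|' inside the message survives."""
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None  # not in the assumed layout; skip it
    return LogEntry(*parts)


def errors(lines: Iterable[str]) -> List[LogEntry]:
    """Keep only entries whose level field is 'E'."""
    out = []
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry.level == "E":
            out.append(entry)
    return out


def count_by_thread(entries: Iterable[LogEntry]) -> Counter:
    """Count entries per thread name (worker, listener, ...)."""
    return Counter(e.thread for e in entries)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python sge_errors.py /path/to/qmaster/messages
    with open(sys.argv[1]) as fh:
        errs = errors(fh)
    for e in errs:
        print(e.timestamp, e.thread, e.message)
    print(count_by_thread(errs))
```

Run against the messages file from a hung period, a pile-up of E entries on one thread (such as the worker thread in the quoted error) would at least narrow down where to look next.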
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users