Hi,

On 24.02.2014 at 00:24, Kevin Buckley wrote:

> have recently been seeing issues on our School's SGE grid where the
> qmaster (8.0.0) became unresponsive after around a day's uptime
> showing a message akin to
> 
> 02/21/2014 11:40:00|worker|mymaster|E|not enough memory to allocate 1048576 bytes in init_packbuffer
> 
> with clients seeing the usual "gdi timeout" messages when attempting
> qstat/qsub etc.
> 
> After the qmaster was brought up to 8.1.6, the unresponsiveness
> started kicking in almost immediately, although no discernible
> memory issue logging appears.
> 
> A qping of the master suggests that the older version displays
> 
> status:                   1
> 
> with all the threads in an E state, whilst with the 8.1.6 master, the process
> entered
> 
> status:                   2
> 
> fairly quickly after a restart.
> 
> Despite any apparent mismatch between the i386 master and x86_64 execds,
> the system here has only just started to misbehave - though perhaps we've just
> been "lucky".

You mean the execds were already 8.1.6 before, and now the qmaster was updated 
to this version?

A different architecture between qmaster and execds is no problem and is
supported (i.e. SGE has supported heterogeneous clusters for as long as I can
remember).

-- Reuti


> Whilst I doubt anyone else out there will have such a system as ours, if
> anyone has any suggestions as to debugging such issues, to a deeper level
> than a basic qping, which is what most of the postings a web search
> unearthed seem to suggest, I'd be delighted to hear of them.
> 
> Kevin M. Buckley
> 
> eScience Consultant
> School of Engineering and Computer Science
> Victoria University of Wellington
> New Zealand
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users


