Hi,

On 24.02.2014 at 00:24, Kevin Buckley wrote:

> have recently been seeing issues on our School's SGE grid where
> the qmaster (8.0.0) became unresponsive after around a day's uptime,
> showing a message akin to
>
> 02/21/2014 11:40:00|worker|mymaster|E|not enough memory to allocate 1048576 bytes in init_packbuffer
>
> with clients seeing the usual "gdi timeout" messages when attempting
> qstat/qsub etc.
>
> After the qmaster was brought up to 8.1.6, the unresponsiveness
> started kicking in almost immediately, although no discernible
> memory issue logging appears.
>
> A qping of the master suggests that the older version displays
>
> status: 1
>
> with all the threads in an E state, whilst with the 8.1.6 master, the
> process entered
>
> status: 2
>
> fairly quickly after a restart.
>
> Despite any apparent mismatch between the i386 master and x86_64 execds,
> the system here has only just started to misbehave - though perhaps
> we've just been "lucky".

You mean the execds were already at 8.1.6 before, and now the qmaster has been updated to this version?

A different architecture between qmaster and execds is no problem and is supported (i.e. SGE has supported heterogeneous clusters for as long as I can remember).

-- Reuti

> Whilst I doubt anyone else out there will have such a system as ours, if
> anyone has any suggestions as to debugging such issues, to a deeper level
> than a basic qping, which is what most of the postings a web search
> unearthed seem to suggest, I'd be delighted to hear of them.
>
> Kevin M. Buckley
>
> eScience Consultant
> School of Engineering and Computer Science
> Victoria University of Wellington
> New Zealand
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users