On 09/01/12 10:40, Simon Templer wrote:
Hi,

we had some issues with time shifts in the SOGo web interface, so I wanted to check if this is resolved with the current release. So I upgraded to 1.3.11 (using the Ubuntu packages, running Ubuntu 11.10).
Hi

I am seeing something pretty similar to what you have seen.

ie

Jan 06 09:55:59 sogod [8054]: [ERROR]<0x0x24c66b0[WOWatchDogChild]>  FAILURE 
receiving status for child 8059
Jan 06 09:55:59 sogod [8054]: [ERROR]<0x0x24c66b0[WOWatchDogChild]>    
socket:<NGActiveSocket[0x0x25bf230]: mode=rw address=(nil) 
connectedTo=<0x0x2458bb0[NGLocalSocketAddress]: /tmp/_ngsocket_0x1f76_0x22f7$
Jan 06 09:55:59 sogod [8054]: [ERROR]<0x0x24c66b0[WOWatchDogChild]>    
exception:<NGSocketTimedOutException: 0x25723b0>  NAME:NGSocketTimedOutException 
REASON:the socket was shutdown INFO:{errno = 11; error $
Jan 06 09:55:59 sogod [8054]:<0x0x24c66b0[WOWatchDogChild]>  sending terminate 
signal to pid 8059
Jan 06 09:55:59 sogod [8054]: [ERROR]<0x0x24c66b0[WOWatchDogChild]>  FAILURE 
notifying child 8059
Jan 06 09:55:59 sogod [8054]:<0x0x24c66b0[WOWatchDogChild]>  sending terminate 
signal to pid 8059
Jan 06 09:55:59 sogod [8059]: [ERROR]<0x0x25e59b0[WOHttpAdaptor]>  http server 
caught:<NGCouldNotAcceptException: 0x28c85e0>  NAME:NGCouldNotAcceptException 
REASON:Could not accept: not a valid socket descri$
Jan 06 09:55:59 sogod [8054]:<0x0x24c66b0[WOWatchDogChild]>  child 8059 exited
Jan 06 09:55:59 sogod [8054]:<0x0x25a5f40[WOWatchDog]>  child spawned with pid 
8802
Jan 06 09:56:59 sogod [8802]:<0x0x22f4750[SOGoCache]>  Cache cleanup interval 
set every 300.000000 seconds
I too am running 1.3.11 on Ubuntu 10.04LTS with MySQL as my SQLserver.

After the errors above I got more of the same nature every few mins with new PIDs being created for the child processes. Although the server as a whole ran ok with Sogo during this time, come 8am the next morning when the master sogo process will attempt to stop all sogo processes and cleanly restart new ones, this time it couldnt stop and restart all the processes.

Jan 07 07:58:56 sogod [14519]: version 1.3.11 (build @vizzini.inverse.ca 201112122226) -- starting Jan 07 07:58:56 sogod [14519]: vmem size check enabled: shutting down app when vmem > 512 MB Jan 07 07:58:56 sogod [14519]: <0x0x1a861d0[SOGoProductLoader]> SOGo products loaded from '/usr/lib/GNUstep/SOGo': Jan 07 07:58:56 sogod [14519]: <0x0x1a861d0[SOGoProductLoader]> ContactsUI.SOGo, AdministrationUI.SOGo, SchedulerUI.SOGo, MailPartViewers.SOGo, Mailer.SOGo, PreferencesUI.SOGo, CommonUI.SOGo, Appointments.$ Jan 07 07:58:57 sogod [14519]: [WARN] <0x0x1be9f40[WOWatchDog]> listening socket: attempt 1 failed Jan 07 07:58:58 sogod [14519]: [WARN] <0x0x1be9f40[WOWatchDog]> listening socket: attempt 2 failed Jan 07 07:58:59 sogod [14519]: [WARN] <0x0x1be9f40[WOWatchDog]> listening socket: attempt 3 failed Jan 07 07:59:00 sogod [14519]: [WARN] <0x0x1be9f40[WOWatchDog]> listening socket: attempt 4 failed Jan 07 07:59:01 sogod [14519]: [WARN] <0x0x1be9f40[WOWatchDog]> listening socket: attempt 5 failed Jan 07 07:59:02 sogod [14519]: [ERROR] <0x0x1be9f40[WOWatchDog]> unable to listen on specified port, check that no other process is already using it

I have already increased my vmem size from 300 to 512MB and thought that this had solved the issue, but it seems not. There weren't any exceptional requests just before the server errors started, and the whole server wasn't under any great load at the time.

As there is another mailing post with a similar issue again using 1.3.11 http://www.mail-archive.com/[email protected]/msg07840.html, am just wondering if there is a link to something that has changed. We had no server errors really at all using 1.3.5 up to 1.3.9, with good reliability but now keep getting them.

I have WOWorkers set to 5 in my main /etc config with only 30 users, so believe easily have enough ram allocated and WOWorker processes available.

--
Kind Regards,

Julian



After the upgrade, when launching up the web interface in the browser, it will receive no response - not even the login page will be displayed. Looking at the sogo log there is much activity:

Jan 09 11:13:40 sogod [1586]: version 1.3.11 (build @vizzini.inverse.ca 201112121816) -- starting Jan 09 11:13:40 sogod [1586]: vmem size check enabled: shutting down app when vmem > 384 MB Jan 09 11:13:41 sogod [1586]: <0x0x9a75d80[SOGoProductLoader]> SOGo products loaded from '/usr/lib/GNUstep/SOGo': Jan 09 11:13:41 sogod [1586]: <0x0x9a75d80[SOGoProductLoader]> Contacts.SOGo, Mailer.SOGo, MailerUI.SOGo, PreferencesUI.SOGo, AdministrationUI.SOGo, CommonUI.SOGo, Appointments.SOGo, SchedulerUI.SOGo, MainUI.SOGo, ContactsUI.SOGo, MailPartViewers.SOGo Jan 09 11:13:41 sogod [1586]: <0x0x9a79ea8[WOWatchDog]> listening on *:20000 Jan 09 11:13:41 sogod [1586]: <0x0x9a79ea8[WOWatchDog]> watchdog process pid: 1586 Jan 09 11:13:41 sogod [1586]: <0x0xb744aa00[WOWatchDogChild]> watchdog request timeout set to 10 minutes Jan 09 11:13:41 sogod [1586]: <0x0x9a79ea8[WOWatchDog]> preparing 1 children Jan 09 11:13:41 sogod [1586]: <0x0x9a79ea8[WOWatchDog]> child spawned with pid 1658 Jan 09 11:15:12 sogod [1586]: [ERROR] <0x0x9b08e58[WOWatchDogChild]> FAILURE receiving status for child 1658 Jan 09 11:15:12 sogod [1586]: [ERROR] <0x0x9b08e58[WOWatchDogChild]> socket: <NGActiveSocket[0x0x99a4c98]: mode=rw address=(null) connectedTo=<0x0x9a8dc08[NGLocalSocketAddress]: /tmp/_ngsocket_0x632_0x997e668_000> receive-timeout=1.000s> Jan 09 11:15:12 sogod [1586]: [ERROR] <0x0x9b08e58[WOWatchDogChild]> exception: <NGSocketTimedOutException: 0x9af9cc0> NAME:NGSocketTimedOutException REASON:the socket was shutdown INFO:{errno = 11; error = "Resource temporarily unavailable"; stream = "{object = <NGActiveSocket[0x0x99a4c98]: mode=rw address=(null) connectedTo=<0x0x9a8dc08[NGLocalSocketAddress]: /tmp/_ngsocket_0x632_0x997e668_000> receive-timeout=1.000s>;}"; } Jan 09 11:15:12 sogod [1586]: <0x0x9b08e58[WOWatchDogChild]> sending terminate signal to pid 1658 Jan 09 11:15:12 sogod [1586]: [ERROR] <0x0x9b08e58[WOWatchDogChild]> FAILURE notifying child 1658 Jan 09 11:15:12 sogod [1586]: <0x0x9b08e58[WOWatchDogChild]> sending terminate signal to pid 1658 2012-01-09 11:15:12.081 sogod[1658] autorelease called without pool for object (0x9a039f8) of class NSIntNumber in thread <NSThread: 0x997e668> Jan 09 11:15:12 sogod [1586]: <0x0x9b08e58[WOWatchDogChild]> child 1658 exited

--- The following part then is repeated (endlessly) every few seconds (with rising child pids) ---

Jan 09 11:15:12 sogod [1586]: <0x0x9a79ea8[WOWatchDog]> child spawned with pid 1765 Jan 09 11:15:14 sogod [1586]: [ERROR] <0x0x9b08e58[WOWatchDogChild]> FAILURE receiving status for child 1765 Jan 09 11:15:14 sogod [1586]: [ERROR] <0x0x9b08e58[WOWatchDogChild]> socket: <NGActiveSocket[0x0x9b62890]: mode=rw address=(null) connectedTo=<0x0x9a8dc08[NGLocalSocketAddress]: /tmp/_ngsocket_0x632_0x997e668_000> receive-timeout=1.000s> Jan 09 11:15:14 sogod [1586]: [ERROR] <0x0x9b08e58[WOWatchDogChild]> exception: <NGSocketTimedOutException: 0x9af9cc0> NAME:NGSocketTimedOutException REASON:the socket was shutdown INFO:{errno = 11; error = "Resource temporarily unavailable"; stream = "{object = <NGActiveSocket[0x0x9b62890]: mode=rw address=(null) connectedTo=<0x0x9a8dc08[NGLocalSocketAddress]: /tmp/_ngsocket_0x632_0x997e668_000> receive-timeout=1.000s>;}"; } Jan 09 11:15:14 sogod [1586]: <0x0x9b08e58[WOWatchDogChild]> sending terminate signal to pid 1765 Jan 09 11:15:14 sogod [1586]: [ERROR] <0x0x9b08e58[WOWatchDogChild]> FAILURE notifying child 1765 Jan 09 11:15:14 sogod [1586]: <0x0x9b08e58[WOWatchDogChild]> sending terminate signal to pid 1765 Jan 09 11:15:14 sogod [1586]: <0x0x9b08e58[WOWatchDogChild]> child 1765 exited Jan 09 11:15:14 sogod [1586]: <0x0x9b08e58[WOWatchDogChild]> avoiding to respawn child before 2012-01-09 11:15:17 +0100

From the log output I have no idea what could go wrong. It would be great if someone had a hint on what the problem might be or where to look for more insight into what goes wrong.

Regards,
Simon


--
[email protected]
https://inverse.ca/sogo/lists

Reply via email to