I recently upgrade from 1.3.12 to 1.3.15 on debian stable (with no
problems during the upgrade). About 20 hours laser, SOGo crashed,
leaving a dead process that I had to manually kill (-9) the sogod
process and restart sogo. The logs show:
localhost - - [18/May/2012:08:50:41 GMT] "GET
/SOGo/so/user/Mail/0/folderDrafts/newDraft1337352640-1//edit HTTP/1.1"
200 357811/0 0.728 - - 0
localhost - - [18/May/2012:08:50:49 GMT] "POST
/SOGo/so/user/Mail//0/folderINBOX/uids HTTP/1.1" 200 3441/44 0.269 - - 0
localhost - - [18/May/2012:08:50:49 GMT] "POST
/SOGo/so/user/Mail//0/folderINBOX/unseenCount HTTP/1.1" 200 13/0 0.246 - - 0
May 18 08:52:02 sogod [26185]:
<<0x01CF07C0[SOGoContactGCSFolder]:personal>>D fetched 1243 records.
2012-05-18 08:52:02.647 sogod[26185] WARNING(-[NSNull(misc) length]):
called NSNull -length (returns 0) !!!
C: connect to <0x0x2809ba0[NGInternetSocketAddress]: host=localhost
not-filled>
S: <SMTP-Reply: code=220 line='mail.example.com ESMTP Postfix (Debian/GNU)'>
C: EHLO localhost
S: <SMTP-Reply: code=250 line='DSN'>
S: pipelining extension supported.
S: size extension supported.
C: MAIL FROM:<[email protected]>
C: RCPT TO:<[email protected]>
C: DATA
S: <SMTP-Reply: code=354 line='End data with <CR><LF>.<CR><LF>'>
C: data(45503 bytes) ..
S: <SMTP-Reply: code=250 line='2.0.0 Ok: queued as B242F49B2E'>
C: QUIT
S: <SMTP-Reply: code=221 line='2.0.0 Bye'>
May 18 08:53:02 sogod [26167]: [WARN] <0x0x16409b0[WOWatchDogChild]> pid
26185 has been hanging in the same request for 1 minutes
May 18 08:54:02 sogod [26167]: [WARN] <0x0x16409b0[WOWatchDogChild]> pid
26185 has been hanging in the same request for 2 minutes
May 18 08:55:02 sogod [26167]: [WARN] <0x0x16409b0[WOWatchDogChild]> pid
26185 has been hanging in the same request for 3 minutes
May 18 08:56:02 sogod [26167]: [WARN] <0x0x16409b0[WOWatchDogChild]> pid
26185 has been hanging in the same request for 4 minutes
May 18 08:57:02 sogod [26167]: [WARN] <0x0x16409b0[WOWatchDogChild]> pid
26185 has been hanging in the same request for 5 minutes
May 18 08:58:02 sogod [26167]: [WARN] <0x0x16409b0[WOWatchDogChild]> pid
26185 has been hanging in the same request for 6 minutes
May 18 08:59:02 sogod [26167]: [WARN] <0x0x16409b0[WOWatchDogChild]> pid
26185 has been hanging in the same request for 7 minutes
May 18 09:00:02 sogod [26167]: [WARN] <0x0x16409b0[WOWatchDogChild]> pid
26185 has been hanging in the same request for 8 minutes
May 18 09:01:02 sogod [26167]: [WARN] <0x0x16409b0[WOWatchDogChild]> pid
26185 has been hanging in the same request for 9 minutes
May 18 09:02:02 sogod [26167]: [WARN] <0x0x16409b0[WOWatchDogChild]>
safety belt -- sending KILL signal to pid 26185
May 18 09:02:02 sogod [26167]: <0x0x16409b0[WOWatchDogChild]> child
26185 exited
May 18 09:02:02 sogod [26167]: <0x0x16409b0[WOWatchDogChild]>
(terminated due to signal 9)
May 18 09:02:02 sogod [26167]: <0x0x16ecda0[WOWatchDog]> child spawned
with pid 25729
2012-05-18 09:02:02.265 sogod[25729] Note: Using UTF-8 as URL encoding
in NGExtensions.
May 18 09:02:02 sogod [25729]: <0x0x1914070[SOGoCache]> Cache cleanup
interval set every 300.000000 seconds
May 18 09:02:02 sogod [25729]: <0x0x1914070[SOGoCache]> Using host(s)
'localhost' as server(s)
2012-05-18 09:02:02.291 sogod[25729] Note(SoObject): SoDebugKeyLookup is
enabled!
2012-05-18 09:02:02.291 sogod[25729] Note(SoObject): SoDebugBaseURL is
enabled!
2012-05-18 09:02:02.291 sogod[25729] Note(SoObject): relative base URLs
are enabled.
2012-05-18 09:02:02.296 sogod[25729] WARNING: IMAP4 connection pooling
is disabled!
2012-05-18 09:02:02.333 sogod[25729] Note(NGImap4Connection): using '/'
as the IMAP4 folder separator.
May 18 09:03:02 sogod [26167]: [WARN] <0x0x16409b0[WOWatchDogChild]> pid
25729 has been hanging in the same request for 1 minutes
May 18 09:04:02 sogod [26167]: [WARN] <0x0x16409b0[WOWatchDogChild]> pid
25729 has been hanging in the same request for 2 minutes
that loop of a 10 minute timeout and then sending the kill signal
happened a few times until I tried to restart sogo with
"/etc/init.d/sogo restart" which then gave me:
May 18 09:27:02 sogod [26167]: [WARN] <0x0x16409b0[WOWatchDogChild]> pid
26562 has been hanging in the same request for 5 minutes
May 18 09:28:02 sogod [26167]: [WARN] <0x0x16409b0[WOWatchDogChild]> pid
26562 has been hanging in the same request for 6 minutes
May 18 09:29:02 sogod [26167]: [WARN] <0x0x16409b0[WOWatchDogChild]> pid
26562 has been hanging in the same request for 7 minutes
May 18 09:30:02 sogod [26167]: [WARN] <0x0x16409b0[WOWatchDogChild]> pid
26562 has been hanging in the same request for 8 minutes
May 18 09:30:27 sogod [26167]: <0x0x16ecda0[WOWatchDog]> Terminating
with SIGINT or SIGTERM
May 18 09:30:27 sogod [26167]: <0x0x16409b0[WOWatchDogChild]> sending
terminate signal to pid 26562
May 18 09:30:28 sogod [26167]: <0x0x16ecda0[WOWatchDog]> Terminating
with SIGINT or SIGTERM
May 18 09:30:28 sogod [26167]: <0x0x16ecda0[WOWatchDog]> Terminating
with SIGINT or SIGTERM
May 18 09:30:28 sogod [26167]: <0x0x16ecda0[WOWatchDog]> Terminating
with SIGINT or SIGTERM
May 18 09:30:28 sogod [26167]: <0x0x16ecda0[WOWatchDog]> Terminating
with SIGINT or SIGTERM
May 18 09:30:28 sogod [26167]: <0x0x16ecda0[WOWatchDog]> Terminating
with SIGINT or SIGTERM
May 18 09:30:28 sogod [26167]: <0x0x16ecda0[WOWatchDog]> Terminating
with SIGINT or SIGTERM
May 18 09:30:28 sogod [26167]: <0x0x16ecda0[WOWatchDog]> Terminating
with SIGINT or SIGTERM
May 18 09:30:28 sogod [26167]: <0x0x16ecda0[WOWatchDog]> Terminating
with SIGINT or SIGTERM
May 18 09:30:28 sogod [26167]: <0x0x16ecda0[WOWatchDog]> Terminating
with SIGINT or SIGTERM
May 18 09:30:28 sogod [26167]: <0x0x16ecda0[WOWatchDog]> Terminating
with SIGINT or SIGTERM
and that repeated until I killed sogod.
Could this be related to the sending email attachments bug in 1.3.15?
Anything I can do to help debug what is going on? Any ideas why the
watchdog child restart doesn't clear the problem? After I restarted
sogo, everything is working for now...
Thanks,
Ben
--
[email protected]
https://inverse.ca/sogo/lists