Hello Robert,

your vserver reports a disappeared COMMUNIC semid. What is that for?

For each connection a vserver is started. The vserver uses two semaphores for communication: one per direction, each synchronizing use of the communication memory segment shared with the SAP DB kernel.

In an established connection, the vserver reads packets from the socket and puts their content into the shared memory. Then it triggers the semaphore used by the corresponding user kernel thread (UKT) inside SAP DB. After that it waits on the communication-channel-specific semaphore until the SAP DB kernel triggers it.
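To make the two-semaphore handshake in this message more concrete, here is a minimal sketch in Python. It is not SAP DB code: threads stand in for the vserver process and the UKT, a plain object stands in for the shared memory segment, and all names (`Channel`, `kernel_ukt`, `vserver_roundtrip`, `demo`) are made up for illustration. It only shows the order of events: write the request, trigger one semaphore, wait on the other, read the reply.

```python
import threading


class Channel:
    """Hypothetical stand-in for one connection's shared memory and semaphores."""

    def __init__(self):
        self.request = None  # "request section" of the shared memory
        self.reply = None    # "reply section" of the shared memory
        self.to_kernel = threading.Semaphore(0)   # vserver -> kernel (UKT)
        self.to_vserver = threading.Semaphore(0)  # kernel -> vserver


def kernel_ukt(chan, handle, n_requests):
    """Kernel side: wait for a request, do the work, publish the reply."""
    for _ in range(n_requests):
        chan.to_kernel.acquire()            # sleep until the vserver triggers us
        chan.reply = handle(chan.request)   # fill the reply section
        chan.to_vserver.release()           # wake up the vserver


def vserver_roundtrip(chan, packet):
    """vserver side: one packet in from the socket, one reply back out."""
    chan.request = packet       # copy the packet content into shared memory
    chan.to_kernel.release()    # trigger the semaphore the UKT waits on
    chan.to_vserver.acquire()   # wait on the channel-specific semaphore
    return chan.reply           # this would now be written to the socket


def demo():
    chan = Channel()
    packets = [b"select 1", b"select 2"]
    kernel = threading.Thread(
        target=kernel_ukt,
        args=(chan, lambda req: b"reply to " + req, len(packets)),
    )
    kernel.start()
    replies = [vserver_roundtrip(chan, p) for p in packets]
    kernel.join()
    return replies


if __name__ == "__main__":
    print(demo())
```

In the real setup the two sides are separate processes and the semaphores are System V IPC objects identified by a semid, which is why removing them underneath a running vserver produces the errors in your vserver.prot.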
The SAP DB kernel reads the request section of the shared memory and, after its work is done, puts its reply into the reply section. Once the reply is available, the kernel uses the communication-channel-specific semaphore to wake up the vserver. The vserver, triggered by the semaphore, then transfers the content of the reply section to the socket and waits for the next request.

If your link is still valid, I can see from the knldiag that the SAP DB kernel was killed by a 'Signal 9'. How can such a signal be recognized?

Immediately after it is started, the SAP DB kernel forks a child process and watches over its status. After the child process has terminated, this 'watchdog' reopens the knldiag, writes the 'famous last words' and flushes the content of the shared memory trace buffers.

Signal 9 is the KILL signal, which cannot be caught or ignored. Whoever was friendly enough to send that signal to the SAP DB kernel thereby triggered the 'cleanup' procedure in the watchdog. This cleanup uses the 'tag file' directories, which are found in /usr/spool/sql/ipc/db:dbname and /usr/spool/sql/ipc/us:dbname. In these directories all shared memory segments and semaphores that belong to the SAP DB instance 'dbname' are found. The cleanup code uses the directory entries to remove the semaphores and shared memory segments for all clients (including all vservers connected to SAP DB) and for the SAP DB kernel itself.

In your case this cleanup code removed the IPC semaphores with 'ipcrm()' system calls, so the vserver reported them as 'disappeared'. If it then uses those disappeared semaphores, it ends up with 'Invalid argument'. So what you see is a follow-up of a 'Signal 9'.

Best regards
jrg

P.S.: It would be nice if you could upload your knldiag of the second crash you recently had. The way you did with the first crash helped a lot for this analysis.
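For reference, regarding how a parent process can recognize a 'Signal 9' in its child: the sketch below uses the standard POSIX wait status, read in Python via `os.waitpid`, `os.WIFSIGNALED` and `os.WTERMSIG`. Whether the SAP DB watchdog does it exactly this way is an assumption on my part, and the function names `describe_status` and `demo` are invented for the example.

```python
import os
import signal
import time


def describe_status(status):
    """Turn a raw wait status into a short 'famous last words' style message."""
    if os.WIFSIGNALED(status):
        return "killed by signal %d" % os.WTERMSIG(status)
    if os.WIFEXITED(status):
        return "exited with code %d" % os.WEXITSTATUS(status)
    return "unknown status %d" % status


def demo():
    pid = os.fork()
    if pid == 0:
        # Child: stand-in for the kernel process; idles until it is killed.
        try:
            time.sleep(60)
        finally:
            os._exit(0)
    # Parent: plays both the unfriendly sender and the watchdog.
    os.kill(pid, signal.SIGKILL)    # SIGKILL cannot be caught or ignored
    _, status = os.waitpid(pid, 0)  # blocks until the child has terminated
    return describe_status(status)


if __name__ == "__main__":
    print(demo())
```

Because the child cannot react to SIGKILL itself, the parent is the only place where the cause of death can still be written to a log, which matches the role of the watchdog described above.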
The excerpt you sent us is not enough, since all I know now is that you ran out of memory :-(

> -----Original Message-----
> From: Robert Krüger [mailto:[EMAIL PROTECTED]]
> Sent: Friday, 27 September 2002 14:46
> To: Hoffmeister, Joerg; [EMAIL PROTECTED]
> Subject: Re: strange crashes (ERR -11987)
>
> OK, one more try without a large attachment:
>
> I put an archive containing the kernel dumps for the crash on our website.
> I hope this helps to shed some light on the issue.
>
> http://www.signal7.de/knldiag_and_rtedump.tgz
>
> the crash occurred at 2002-09-24 13:04
>
> thanks in advance,
>
> robert
>
> On Tuesday 24 September 2002 17:33, Hoffmeister, Joerg wrote:
> > Robert:
> >
> > the vserver.prot in this case might show only follow-up effects of a dead
> > kernel. More information about what might have caused the crashes can be
> > retrieved out of knldiag.err and knldiag/knldiag.old (if it is overwritten
> > already). Examine those files for the point of time of the crashes or even
> > append them to a mail to this list.
> >
> > Jörg Hoffmeister
> > SAP AG, SAP Labs Berlin
> >
> > -----Original Message-----
> > From: Robert Krüger [mailto:[EMAIL PROTECTED]]
> > Sent: Tuesday, 24 September 2002 13:23
> > To: [EMAIL PROTECTED]
> > Subject: strange crashes (ERR -11987)
> >
> > hi,
> >
> > we've had 2 strange crashes on one of our production systems with the
> > following log in vserver.prot:
> >
> > 2002-09-24 13:04:00 15417 ERR -11987 COMMUNIC semid 21254185 disappeared!
> > 2002-09-24 13:04:00 19150 ERR -11987 COMMUNIC semid 15268376 disappeared!
> > 2002-09-24 13:04:00 19151 ERR -11987 COMMUNIC semid 15268383 disappeared!
> > 2002-09-24 13:04:00 19152 ERR -11987 COMMUNIC semid 15268384 disappeared!
> > 2002-09-24 13:04:00 19146 ERR -11987 COMMUNIC semid 15268375 disappeared!
> > 2002-09-24 13:04:00 19144 ERR -11987 COMMUNIC semid 15268372 disappeared!
> > 2002-09-24 13:04:03 19085 ERR -11987 COMMUNIC semctl (setval 26351624) error: Invalid argument
> > 2002-09-24 13:04:08 19160 ERR -11987 COMMUNIC semctl (setval 26351624) error: Invalid argument
> > 2002-09-24 13:04:10 3615 ERR -11987 COMMUNIC semctl (setval 26351624) error: Invalid argument
> > 2002-09-24 13:05:04 14796 ERR -11987 COMMUNIC semctl (setval 26351625) error: Invalid argument
> > 2002-09-24 13:05:56 15376 ERR -11987 COMMUNIC semctl (setval 26351625) error: Invalid argument
> >
> > Server version is 7.3.0.21 on linux (suse 7.1). which diagnostic
> > information must i provide so someone can take an educated guess regarding
> > what's happening.
> >
> > thanks in advance,
> >
> > robert
> >
> > _______________________________________________
> > sapdb.general mailing list
> > [EMAIL PROTECTED]
> > http://listserv.sap.com/mailman/listinfo/sapdb.general
