Hello Robert,
Your vserver reports that a COMMUNIC semid has disappeared. What is that semaphore for?
A vserver is started for each connection, and it uses two semaphores for communication:
one for each direction, to synchronize use of the communication shared memory segment
with the SAP DB kernel. In an established connection, the vserver receives packets from
the socket and puts their content into the shared memory segment. Then it signals the
semaphore used by the corresponding user kernel thread (UKT) inside SAP DB, and waits
on the semaphore specific to this communication channel until the SAP DB kernel
signals it.

The SAP DB kernel reads the request section of the shared memory and, when its work
is done, puts its reply into the reply section. Once the reply is available, the
kernel signals the communication channel semaphore to wake up the vserver.

Woken up by the semaphore, the vserver transfers the content of the reply section to
the socket and waits for the next request.

If your link is still valid, the knldiag shows that the SAP DB kernel was killed by
'Signal 9'. How can such a signal be recognized? Immediately after it is started, the
SAP DB kernel forks a child process and watches over its status. After the child
process has terminated, this 'watchdog' reopens the knldiag, writes its 'famous last
words', and flushes the contents of the shared memory trace buffers. Signal 9 is the
KILL signal, which cannot be caught or ignored. Whoever was kind enough to send that
signal to the SAP DB kernel thereby triggered the 'cleanup' procedure in the watchdog.
This cleanup uses the 'tag file' directories /usr/spool/sql/ipc/db:dbname and
/usr/spool/sql/ipc/us:dbname, which list all shared memory segments and semaphores
belonging to the SAP DB instance 'dbname'. The cleanup code uses these directory
entries to remove the semaphores and shared memory segments of all clients (including
all vservers connected to SAP DB) and of the SAP DB kernel itself. In your case, the
cleanup removed the IPC semaphores (via semctl() with IPC_RMID, which is what 'ipcrm'
does), so the vserver reported them as 'disappeared'. When the vserver then tries to
use those removed semaphores, the call fails with 'Invalid argument'. So what you see
is a follow-up effect of the 'Signal 9'.

Best regards
Jörg

P.S.: It would be nice if you could also upload the knldiag of the second crash you
recently had, the way you did for the first crash; that helped a lot with this
analysis. The excerpt you sent us is not enough, since all I can tell from it is that
you ran out of memory :-(

> -----Original Message-----
> From: Robert Krüger [mailto:[EMAIL PROTECTED]]
> Sent: Friday, 27 September 2002 14:46
> To: Hoffmeister, Joerg; [EMAIL PROTECTED]
> Subject: Re: strange crashes (ERR -11987)
> 
> 
> 
> OK, one more try without a large attachment:
> 
> I put an archive containing the kernel dumps for the crash on our website.
> I hope this helps to shed some light on the issue.
> 
> http://www.signal7.de/knldiag_and_rtedump.tgz
> 
> The crash occurred at 2002-09-24 13:04.
> 
> thanks in advance,
> 
> robert
> 
> On Tuesday 24 September 2002 17:33, Hoffmeister, Joerg wrote:
> > Robert:
> >
> > In this case, the vserver.prot might show only follow-up effects of a kernel
> > that died. More information about what might have caused the crashes can be
> > retrieved from knldiag.err and knldiag (or knldiag.old, if knldiag has already
> > been overwritten). Examine those files around the time of the crashes, or
> > attach them to a mail to this list.
> >
> > Jörg Hoffmeister
> > SAP AG, SAP Labs Berlin
> >
> > -----Original Message-----
> > From: Robert Krüger [mailto:[EMAIL PROTECTED]]
> > Sent: Tuesday, 24 September 2002 13:23
> > To: [EMAIL PROTECTED]
> > Subject: strange crashes (ERR -11987)
> >
> >
> >
> > hi,
> >
> > we've had 2 strange crashes on one of our production systems with the
> > following log in vserver.prot:
> >
> > 2002-09-24 13:04:00 15417 ERR -11987 COMMUNIC semid 21254185 disappeared!
> > 2002-09-24 13:04:00 19150 ERR -11987 COMMUNIC semid 15268376 disappeared!
> > 2002-09-24 13:04:00 19151 ERR -11987 COMMUNIC semid 15268383 disappeared!
> > 2002-09-24 13:04:00 19152 ERR -11987 COMMUNIC semid 15268384 disappeared!
> > 2002-09-24 13:04:00 19146 ERR -11987 COMMUNIC semid 15268375 disappeared!
> > 2002-09-24 13:04:00 19144 ERR -11987 COMMUNIC semid 15268372 disappeared!
> > 2002-09-24 13:04:03 19085 ERR -11987 COMMUNIC semctl (setval 26351624) error: Invalid argument
> > 2002-09-24 13:04:08 19160 ERR -11987 COMMUNIC semctl (setval 26351624) error: Invalid argument
> > 2002-09-24 13:04:10  3615 ERR -11987 COMMUNIC semctl (setval 26351624) error: Invalid argument
> > 2002-09-24 13:05:04 14796 ERR -11987 COMMUNIC semctl (setval 26351625) error: Invalid argument
> > 2002-09-24 13:05:56 15376 ERR -11987 COMMUNIC semctl (setval 26351625) error: Invalid argument
> >
> > Server version is 7.3.0.21 on Linux (SuSE 7.1). Which diagnostic
> > information must I provide so that someone can take an educated guess about
> > what's happening?
> >
> > thanks in advance,
> >
> > robert
> >
> >
> >
> >
> >
> > _______________________________________________
> > sapdb.general mailing list
> > [EMAIL PROTECTED]
> > http://listserv.sap.com/mailman/listinfo/sapdb.general
> 