Munder Albert (CI/ISE) wrote:
Hi everybody,
regarding our TCP/TLS stability problems, we have now decided to test with Kamailio 1.5.1. Nevertheless, it would be interesting to know whether there is a chance to get rid of these problems. Is anybody using TLS?
Used modules: SNMP, MySQL
Summary of problems
Errors may be related to the following log file entries:

Jun 17 09:01:27 si-…. /usr/local/sbin/openser[13921]: WARNING:core:send2child: no free tcp receiver, connection passed to the leastbusy one (6)

That means that all of the TCP workers currently have connections assigned to them. It does not mean that the worker processes are really busy.
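
To illustrate what the warning describes, here is a minimal sketch of least-busy dispatch in C. It assumes each TCP worker keeps a count of the connections assigned to it; `struct tcp_worker` and `pick_worker` are hypothetical names for illustration, not the actual openser code, and the thread does not confirm what the number in parentheses in the log line counts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-worker bookkeeping; not openser's real structure. */
struct tcp_worker {
    int busy;   /* number of connections currently assigned to this worker */
};

/* Return the index of the worker with the fewest assigned connections.
 * *all_busy is set to 1 when even that worker already holds at least one
 * connection (the situation the warning reports), otherwise to 0. */
static size_t pick_worker(const struct tcp_worker *w, size_t n, int *all_busy)
{
    size_t best = 0;

    assert(w != NULL && n > 0 && all_busy != NULL);
    for (size_t i = 1; i < n; i++) {
        if (w[i].busy < w[best].busy)
            best = i;
    }
    *all_busy = (w[best].busy > 0);
    return best;
}
```

In this sketch a worker counts as "busy" as soon as one connection is assigned to it, even if that connection is idle, which is the distinction made above.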


Jun 17 08:54:52 si-…. /usr/local/sbin/openser[13921]: ERROR:core:tcpconn_new: shared memory allocation failure
Jun 17 08:54:52 si-… /usr/local/sbin/openser[13921]: ERROR:core:handle_new_connect: tcpconn_new failed, closing socket

You are running out of shared memory. Either you allocate too much, or there is a memory leak somewhere.

Please debug according to the following howto:
http://www.kamailio.org/dokuwiki/doku.php/troubleshooting:memory


And a few of these also (7613 times):

Jun 17 08:57:24 si-… /usr/local/sbin/openser[13880]: ERROR:core:tls_accept: some error in SSL:

Jun 17 08:57:24 si-… /usr/local/sbin/openser[13880]: ERROR:core:tls_print_errstack: error:1409C041:SSL routines:SSL3_SETUP_BUFFERS:malloc failure

OpenSSL is running out of memory. OpenSSL does not use openser's memory manager; it uses the standard OS malloc.

Maybe there are so many TCP/TLS connections that you run out of memory? Strange.


*shared memory consumption*
shared memory is continuously increasing (set to 1024)

What do you mean by "continuously increasing"? Openser's memory manager allocates the memory for shared memory during startup. During runtime, the size of openser's shared memory stays constant.

If you nevertheless see memory usage growing, it must come from the standard OS malloc, which is used by other libraries (e.g. openssl, libxml, mysqlclient, ...)

In this case, either there is a bug in the library itself or openser uses the library in a wrong way.
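
As a rough illustration of why a pool sized once at startup ends in "allocation failure" messages rather than in a growing pool, here is a minimal sketch in C. `struct pool`, `pool_init` and `pool_alloc` are hypothetical names, plain malloc stands in for the shared-memory mapping, and freeing and reuse of blocks (which openser's real memory manager does) is deliberately not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_ALIGN 16u   /* assumed alignment for returned blocks */

/* Hypothetical fixed-size pool; its size never changes after pool_init(). */
struct pool {
    unsigned char *base;
    size_t size;   /* total bytes reserved at startup */
    size_t used;   /* bytes handed out so far */
};

/* Reserve the whole pool once. malloc is only a stand-in here. */
static int pool_init(struct pool *p, size_t size)
{
    p->base = malloc(size);
    p->size = size;
    p->used = 0;
    return p->base != NULL ? 0 : -1;
}

/* Hand out n bytes from the pool, or NULL once the pool is exhausted.
 * The pool itself never grows; only 'used' moves towards 'size'. */
static void *pool_alloc(struct pool *p, size_t n)
{
    size_t remaining, need;

    assert(p != NULL && p->base != NULL);
    remaining = p->size - p->used;
    if (n > remaining)
        return NULL;
    need = (n + POOL_ALIGN - 1) / POOL_ALIGN * POOL_ALIGN;
    if (need > remaining)
        return NULL;   /* the "allocation failure" case */
    p->used += need;
    return p->base + p->used - need;
}
```

Memory that a library obtains through the OS malloc never passes through such a pool, so it shows up as process growth instead.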

regards
Klaus



PKG_MEM is 1 MB
*high CPU load for some openser processes*
Normally, after some days, a small number of the openser processes show a high CPU load (50-90%).
It looks like an endless loop and requires a restart of openser.
There may be an endless loop in

pass_fd.c

again:
    ret = sendmsg(unix_socket, &msg, 0);
    if (ret < 0) {
        /* interrupted by a signal: retry, with no upper bound */
        if (errno == EINTR) goto again;
        LM_CRIT("sendmsg failed on %d: %s\n", unix_socket, strerror(errno));
    }

any comments on that?

Best regards
*Albert Munder*
Robert Bosch GmbH
IT Systems Engineering (CI/ISE)
Postfach 30 02 20
70442 Stuttgart
GERMANY
www.bosch.com
Tel. +49 711 811-40562
Fax +49 711 811-5113333
albert.mun...@de.bosch.com

------------------------------------------------------------------------
*From:* Henning Westerholt [mailto:henning.westerh...@1und1.de]
*Sent:* Tuesday, 30 June 2009 17:25
*To:* users@lists.kamailio.org
*Cc:* Munder Albert (CI/ISE)
*Subject:* Re: [Kamailio-Users] OpenSER stability problems in pilot project

On Tuesday, 30 June 2009, Munder Albert (CI/ISE) wrote:
 > [..]
 > We are running OpenSER in a pilot project and
 > unfortunately have some stability problems.


Hello Albert,


 > * Approx. 5000 subscriber accounts
 > * Approx. 1200 simultaneously registered users
 > * Signalling encrypted with TLS
 > * Media data encrypted with SRTP
 > * Clients: softphones and hardphones
 > * Re-registration time for clients: 3600 sec


I don't have that much experience with TCP, but I don't think that these numbers should be a problem in a setup like this.


 > OpenSER configuration
 > * Works as stateful SIP proxy
 > * MySQL database
 > * Version 1.3.4-TLS
 > * Tcp_children: 100 --> is it recommended to increase this number?


That is quite a lot of children, but OK.


 > * Udp_children: 20
 > * Tcp_connection_timeout: 3600
 > * Shared memory:
 >   * -m 512 when error occurred
 >   * Now set to 1024


How much PKG_MEM do you use? The default value?


 > Problems
 > * Shared memory consumption
 > Shared memory usage is permanently increasing (about 50 MB per day)
 > Application already crashed twice


This could be a memory leak. What modules do you use? And do you use any proprietary modules? You could use the memory debugging to investigate this further: http://www.kamailio.org/dokuwiki/doku.php/troubleshooting:memory


 > First messages were these, repeated thousands of times (5915 times):
 > Jun 17 08:54:52 si-.... /usr/local/sbin/openser[13921]: ERROR:core:tcpconn_new: shared memory allocation failure
 > Jun 17 08:54:52 si-... /usr/local/sbin/openser[13921]: ERROR:core:handle_new_connect: tcpconn_new failed, closing socket
 > And a few of these also (7613 times):
 > Jun 17 08:57:24 si-... /usr/local/sbin/openser[13880]: ERROR:core:tls_accept: some error in SSL:
 > Jun 17 08:57:24 si-... /usr/local/sbin/openser[13880]: ERROR:core:tls_print_errstack: error:1409C041:SSL routines:SSL3_SETUP_BUFFERS:malloc failure


These are caused by insufficient memory conditions. I can't comment on the TCP and TLS errors. But before really starting to investigate this problem, would it be possible for you to test with a more recent version, e.g. Kamailio 1.5.1?


 > * TCP errors, lost SIP messages
 >
 > Examples from error messages:
 > 14,100 times in log file from 17.06.09
 > Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_blocking_connect: poll error: flags 18
 > Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_blocking_connect: failed to retrieve SO_ERROR (111) Connection refused
 > Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcpconn_connect: tcp_blocking_connect failed
 > Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_send: connect failed
 > Jun 17 04:03:15 si-.. /usr/local/sbin/openser[13863]: ERROR:tm:msg_send: tcp_send failed
 > Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:tm:t_forward_nonack: sending request failed
 >
 > Appears at least 20,000 times; on the day of the last shared memory
 > errors, it was 225,794 times in the log file (note that the number in
 > parentheses is usually 1 or 2, but on that day it reached 6):
 > Jun 17 09:01:27 si-.... /usr/local/sbin/openser[13921]: WARNING:core:send2child: no free tcp receiver, connection passed to the leastbusy one (6)
 > Jun 17 09:01:27 si-... /usr/local/sbin/openser[13921]: WARNING:core:send2child: no free tcp receiver, connection passed to the leastbusy one (5)
 >
 > * Certificate validation problems
 > TCP traffic is currently significantly increased by some (approx. 70)
 > clients which failed to validate the TLS certificate. Registration is
 > repeated every 5 sec.
 >
 > About 30 thousand per day (on that day, it was 37,162 times in the log)
 > Jun 17 04:03:10 si-024lc008 /usr/local/sbin/openser[13801]: ERROR:core:tls_accept: some error in SSL:
 > Jun 17 04:03:10 si-024lc008 /usr/local/sbin/openser[13801]: ERROR:core:tls_print_errstack: error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca


Best regards,


Henning


------------------------------------------------------------------------

_______________________________________________
Kamailio (OpenSER) - Users mailing list
Users@lists.kamailio.org
http://lists.kamailio.org/cgi-bin/mailman/listinfo/users
http://lists.openser-project.org/cgi-bin/mailman/listinfo/users
