Maybe you are hitting the max connections? How many connections are there when it starts to show those errors?
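A quick back-of-the-envelope check may help here. The sketch below is only an estimate, not how OpenSIPS accounts for connections internally: it assumes each worker process keeps one sync connection plus up to its async pool size (my reading of the "current: 1 + 8" in your log), and the function names and sample numbers are made up for illustration. Compare the result with whatever max_connections the database servers actually allow.

```python
def worst_case_connections(children: int, async_pool_size: int) -> int:
    """Upper bound on DB connections opened by one OpenSIPS instance.

    Assumption: every worker holds 1 sync connection and may open up to
    async_pool_size additional async connections.
    """
    return children * (1 + async_pool_size)


def headroom(children: int, async_pool_size: int, max_connections: int) -> int:
    """Connections left on the server; negative means the limit can be exceeded."""
    return max_connections - worst_case_connections(children, async_pool_size)


if __name__ == "__main__":
    # Hypothetical values: 16 workers, async pool of 10, server limit of 151.
    print(worst_case_connections(16, 10))
    print(headroom(16, 10, 151))
```

With more than one OpenSIPS box behind the same pool the totals add up, so the limit can be reached well before any single box looks busy.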
On Fri, 5 Jun 2020 at 01:06, Calvin Ellison <[email protected]> wrote:
>
> > A) Is the LRN database located locally on the OpenSIPs box or is it
> > remote?
>
> We are using an F5 BIG-IP to proxy a pool of database servers.
> Opensips is showing two connection-related errors:
>
> Jun 4 10:41:48 TC-521 /usr/sbin/opensips[12318]: ERROR:db_mysql:db_mysql_connect: driver error(2013): Lost connection to MySQL server at 'reading authorization packet', system error: 110
> Jun 4 10:41:48 TC-521 /usr/sbin/opensips[12318]: ERROR:db_mysql:db_mysql_new_connection: initial connect failed
> Jun 4 10:41:48 TC-521 /usr/sbin/opensips[12318]: ERROR:core:db_init_async: failed to open new DB connection on mysql://XXXX:[email protected]:0/
> Jun 4 10:41:48 TC-521 /usr/sbin/opensips[12318]: INFO:db_mysql:db_mysql_async_raw_query: Failed to open new connection (current: 1 + 8). Running in sync mode!
> Jun 4 10:41:48 TC-521 /usr/sbin/opensips[12318]: INFO:db_mysql:switch_state_to_disconnected: disconnect event for 0x7f8903f16d10
> Jun 4 10:41:48 TC-521 /usr/sbin/opensips[12318]: INFO:db_mysql:reset_all_statements: resetting all statements on connection: (0x7f8903f16bb0) 0x7f8903f16d10
> Jun 4 10:41:48 TC-521 /usr/sbin/opensips[12318]: INFO:db_mysql:connect_with_retry: re-connected successful for 0x7f8903f16d10
>
> Jun 4 10:44:29 TC-521 /usr/sbin/opensips[12342]: ERROR:db_mysql:db_mysql_connect: driver error(2003): Can't connect to MySQL server on '10.0.5.38' (110)
> Jun 4 10:44:29 TC-521 /usr/sbin/opensips[12342]: ERROR:db_mysql:db_mysql_new_connection: initial connect failed
> Jun 4 10:44:29 TC-521 /usr/sbin/opensips[12342]: ERROR:core:db_init_async: failed to open new DB connection on mysql://XXXX:[email protected]:0/
> Jun 4 10:44:29 TC-521 /usr/sbin/opensips[12342]: INFO:db_mysql:db_mysql_async_raw_query: Failed to open new connection (current: 1 + 10). Running in sync mode!
> Jun 4 10:44:29 TC-521 /usr/sbin/opensips[12342]: INFO:db_mysql:switch_state_to_disconnected: disconnect event for 0x7f8903f16d10
> Jun 4 10:44:29 TC-521 /usr/sbin/opensips[12342]: INFO:db_mysql:reset_all_statements: resetting all statements on connection: (0x7f8903f16bb0) 0x7f8903f16d10
> Jun 4 10:44:29 TC-521 /usr/sbin/opensips[12342]: INFO:db_mysql:connect_with_retry: re-connected successful for 0x7f8903f16d10
>
> MariaDB is also showing an error from its perspective:
>
> 2020-06-04 23:40:27 64783 [Warning] Aborted connection 64783 to db: 'unconnected' user: 'anonymous' host: '8.38.42.13' (Got timeout reading communication packets)
>
> > B) Have you tried only doing sync database queries? Async introduces
> > some overhead, and I'm not sure if it causes extra database connections to
> > be created. When using sync there is a connection per child process that
> > stays up.
>
> Using synchronous mode appeared to be causing context switching issues
> under heavy load. We specifically moved to async for this reason and
> that appeared to reduce the CPU load dramatically. From the docs:
>
> "Using the asynchronous, "suspend-resume" logic instead of forking a
> large number of processes in order to scale also has the advantage of
> optimizing system resource usage, increasing its maximal throughput.
> By requiring less processes to complete the same amount of work in the
> same amount of time, process context switching is minimized and
> overall CPU usage is improved. Less processes will also eat up less
> system memory."
>
> I've been tweaking each of the configuration settings I've mentioned,
> but without any clear path forward. Would 3.x provide any solutions?
>
> Is it possible to have too many children or timer partitions, and
> starve opensips with context switches? Would that cause connection
> issues?
>
> > C) Does the database have enough memory to contain the LRN and DNC
> > datasets fully in memory?
> > The extra latency for the non-cache hits sent to
> > the database may stack up if the database has to hit disk.
>
> DB says query response time is like 0.001s and doesn't show any sign
> of strain. I'm not personally familiar with the TokuDB engine, but I'm
> led to believe the entire dataset is in memory. I have two DBAs triple
> checking things. It's possible we're hitting a max connections or open
> files limit that's set too low. Sometimes our peak hours include
> spikes as well.
>
> > D) How many child processes are you using now? If you are hitting 100%
> > you may need to increase them.
>
> Only one hits 100% initially, then they topple over after that. This
> seems to be related to the intermittent database connection errors.
> We'll see what raising the max connections and ulimits on the server
> does. I've also backed off on children and increased the async
> connection pool size to result in the same number of total maximum
> connections. Presumably this will reduce context switches and timer
> delays.
>
> > E) Are your memcached processes using heavy cpu? If you are caching
> > multiple lists, I've found it helps to use unique memcached instance per
> > list.
>
> All of the various SIP dips are the same db stored procedure with many
> fields in the response. Those fields are cached as a CSV string, so
> any cached dip can be used by any other kind of dip. The same call is
> likely to use multiple dips, so we should only hit the DB once per
> call regardless of how many different dips we apply.
>
> > F) Look for memory related log messages. If the memory starts getting
> > exhausted you will see defrag messages. This will chew up available
> > computation cycles.
>
> Both opensips servers and the database have plenty of free memory. How
> do I know how much shared and process memory to use? I see warnings
> about the reactor size shrinking to a percentage of the process memory
> but have no idea what that implies.
>
> _______________________________________________
> Users mailing list
> [email protected]
> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>
--
Regards,
David Villasmil
email: [email protected]
phone: +34669448337
