Hello, good that it was sorted out and nothing unexpected inside kamailio locking code...
Cheers, Daniel On 12.01.18 16:38, Moritz Maisel wrote: > Hi Daniel, > > thanks for pointing out the details about the lock structure. After > some investigation towards locking issues, we found out that the root > cause was an external system that blocked kamailio's connection > attempts which subsequently lead to blocked threads. Sorry for the > noise and have a nice weekend! > > Regards, > Moritz > > > On Fri, Jan 12, 2018 at 1:50 PM, Daniel-Constantin Mierla > <[email protected] <mailto:[email protected]>> wrote: > > Hello, > > > On 12.01.18 11:48, Sebastian Damm wrote: >> Hi Daniel, >> >> unless there is a silent exit from some module (those extra >> routes just write some information into redis), I can't find any >> other drop and exits than the one in the route posted. >> >> We have added some logging around the lock functions, and >> although we are missing some log lines for some packets, making >> it look like some locks don't get released, we do see that >> another PID can successfully lock the same key later. >> >> There is no other lock/unlock in the config. >> >> Is there a way to debug the lock array? > attaching with gdb and looking at the values is always an option. > Otherwise, I do not recall anything for this purpose written in > kamailio. > >> One guess could be that the lock array is full, although we >> already have the lock size set to 10. > > Lock array cannot become "full", if you thought like that if there > are 10 lock operations with different keys on an array of 10 locks. > > Actually many string keys can use same lock/mutex -- a hashing > function is performed over the key value results in an integer > value which then is made modulo number of locks and the lock at > that index is chosen from the array. So if you do lock("abc") and > then you don't release it, then lock("xyz") can get blocked. Also, > doing unlock("xyz") without lock("xyz") is releasing the > lock("abc") -- note that "abc" and "xyz" are just random example > to explain what can happen, but does not mean it happens for these > keys. > > Cheers, > Daniel > > >> >> Regards, >> Sebastian >> >> On Fri, Jan 12, 2018 at 11:08 AM, Daniel-Constantin Mierla >> <[email protected] <mailto:[email protected]>> wrote: >> >> Hello, >> >> have you checked that any of the route blocks used in between >> lock() and unlock() don't do an exit/drop (also not return 0)? >> >> Do you have other lock() / unlock() in the config? >> >> Cheers, >> Daniel >> >> >> On 12.01.18 08:49, Sebastian Damm wrote: >>> Hi, >>> >>> this morning, two Kamailio servers suddenly stopped working >>> after having worked without a problem for months. Both are >>> running 5.0.2 on Debian Jessie. Those systems only analyze >>> mirrored traffic and write some information to a RabbitMQ. >>> >>> I tried restarting, but after a few seconds they get stuck >>> the same way as before. Then I attached a gdb to some of the >>> UDP listeners, and they all look pretty much the same. >>> >>> >>> ===========8< process 1 ================= >>> (gdb) bt >>> #0 0x00007f404ca435b9 in syscall () from >>> /lib/x86_64-linux-gnu/libc.so.6 >>> #1 0x00007f4049f41260 in futex_get (lock=0x7f4041bccd00) at >>> ../../core/parser/../mem/../futexlock.h:108 >>> #2 0x00007f4049f477d7 in cfg_lock_helper >>> (lkey=0x7fff3c048650, mode=0) at cfgutils.c:667 >>> #3 0x00007f4049f481ed in w_cfg_lock_wrapper >>> (msg=0x7f404ba17f18, key=0x7f404ba15488, mode=0) at >>> cfgutils.c:712 >>> #4 0x00007f4049f4823c in w_cfg_lock (msg=0x7f404ba17f18, >>> key=0x7f404ba15488 "\210a\241K@\177", s2=0x0) at cfgutils.c:717 >>> #5 0x000000000045cd85 in do_action (h=0x7fff3c049410, >>> a=0x7f404b9d5170, msg=0x7f404ba17f18) at core/action.c:1060 >>> #6 0x0000000000469afa in run_actions (h=0x7fff3c049410, >>> a=0x7f404b9d5170, msg=0x7f404ba17f18) at core/action.c:1552 >>> #7 0x000000000045958d in do_action (h=0x7fff3c049410, >>> a=0x7f404b9ce558, msg=0x7f404ba17f18) at core/action.c:678 >>> #8 0x0000000000469afa in run_actions (h=0x7fff3c049410, >>> a=0x7f404b9cdf70, msg=0x7f404ba17f18) at core/action.c:1552 >>> #9 0x000000000046a2bd in run_top_route (a=0x7f404b9cdf70, >>> msg=0x7f404ba17f18, c=0x0) at core/action.c:1641 >>> #10 0x0000000000580033 in receive_msg ( >>> buf=0xa393f9 <buf+121> "REGISTER sip:domain >>> SIP/2.0\r\nVia: SIP/2.0/UDP >>> >>> 1.2.3.4;branch=z9hG4bKfc98.c762d7151f110c7eb71fc7d4e0648f1f.0\r\nVia: >>> SIP/2.0/UDP >>> 192.168.0.3:5060;rport=38020;received=62.30.8.128;branch=z9hG4"..., >>> len=799, rcv_info=0x7fff3c049810) at core/receive.c:264 >>> #11 0x00007f40483efcdf in parsing_hepv3_message >>> (buf=0xa39380 <buf> "HEP3\003\230", len=920) at hep.c:499 >>> #12 0x00007f40483ee264 in hepv3_received (buf=0xa39380 <buf> >>> "HEP3\003\230", len=920, ri=0x7fff3c049a50) at hep.c:231 >>> #13 0x00007f40483ec9cb in hep_msg_received >>> (data=0x7fff3c049a30) at hep.c:85 >>> #14 0x000000000049e4e1 in sr_event_exec (type=7, >>> data=0x7fff3c049a30) at core/events.c:263 >>> #15 0x000000000048731a in udp_rcv_loop () at >>> core/udp_server.c:466 >>> #16 0x0000000000422d08 in main_loop () at main.c:1623 >>> #17 0x000000000042a408 in main (argc=13, >>> argv=0x7fff3c049ef8) at main.c:2643 >>> (gdb) quit >>> ===========8< process 1 ================= >>> >>> >>> ===========8< process 2 ================= >>> (gdb) bt >>> #0 0x00007f404ca435b9 in syscall () from >>> /lib/x86_64-linux-gnu/libc.so.6 >>> #1 0x00007f4049f41260 in futex_get (lock=0x7f4041bccd00) at >>> ../../core/parser/../mem/../futexlock.h:108 >>> #2 0x00007f4049f477d7 in cfg_lock_helper >>> (lkey=0x7fff3c048650, mode=0) at cfgutils.c:667 >>> #3 0x00007f4049f481ed in w_cfg_lock_wrapper >>> (msg=0x7f404ba17f18, key=0x7f404ba15488, mode=0) at >>> cfgutils.c:712 >>> #4 0x00007f4049f4823c in w_cfg_lock (msg=0x7f404ba17f18, >>> key=0x7f404ba15488 "\210a\241K@\177", s2=0x0) at cfgutils.c:717 >>> #5 0x000000000045cd85 in do_action (h=0x7fff3c049600, >>> a=0x7f404b9d5170, msg=0x7f404ba17f18) at core/action.c:1060 >>> #6 0x0000000000469afa in run_actions (h=0x7fff3c049600, >>> a=0x7f404b9d5170, msg=0x7f404ba17f18) at core/action.c:1552 >>> #7 0x000000000045958d in do_action (h=0x7fff3c049600, >>> a=0x7f404b9cf920, msg=0x7f404ba17f18) at core/action.c:678 >>> #8 0x0000000000469afa in run_actions (h=0x7fff3c049600, >>> a=0x7f404b9ce9c0, msg=0x7f404ba17f18) at core/action.c:1552 >>> #9 0x000000000046a2bd in run_top_route (a=0x7f404b9ce9c0, >>> msg=0x7f404ba17f18, c=0x7fff3c049600) at core/action.c:1641 >>> #10 0x0000000000580a69 in receive_msg ( >>> buf=0xa393f9 <buf+121> "SIP/2.0 200 OK\r\nVia: >>> SIP/2.0/UDP >>> >>> 172.20.40.8;branch=z9hG4bKa69c.5cca5a54c4d270c515eeb4bbb5e0bb44.0\r\nVia: >>> SIP/2.0/UDP 2.3.4.5:5060 <http://2.3.4.5:5060>\r\nContact: >>> <sip:[email protected]:59669;transport=UDP>\r\nTo: "..., >>> len=506, rcv_info=0x7fff3c049810) at core/receive.c:327 >>> #11 0x00007f40483efcdf in parsing_hepv3_message >>> (buf=0xa39380 <buf> "HEP3\002s", len=627) at hep.c:499 >>> #12 0x00007f40483ee264 in hepv3_received (buf=0xa39380 <buf> >>> "HEP3\002s", len=627, ri=0x7fff3c049a50) at hep.c:231 >>> #13 0x00007f40483ec9cb in hep_msg_received >>> (data=0x7fff3c049a30) at hep.c:85 >>> #14 0x000000000049e4e1 in sr_event_exec (type=7, >>> data=0x7fff3c049a30) at core/events.c:263 >>> #15 0x000000000048731a in udp_rcv_loop () at >>> core/udp_server.c:466 >>> #16 0x0000000000422d08 in main_loop () at main.c:1623 >>> #17 0x000000000042a408 in main (argc=13, >>> argv=0x7fff3c049ef8) at main.c:2643 >>> (gdb) quit >>> ===========8< process 2 ================= >>> >>> Looks to me as if Kamailio gets stuck trying to get a lock >>> for this packet. >>> >>> The config file when handling those packets looks like this: >>> >>> request_route { >>> route(initialize_variables); >>> route(foo); >>> } >>> >>> onreply_route { >>> route(initialize_variables); >>> route(foo); >>> } >>> >>> route[initialize_variables] { >>> $vn(bar) = $null; >>> $vn(baz) = $null; >>> $vn(barbaz) = $null; >>> $vn(foobar) = $null; >>> >>> } >>> >>> route[foo] { >>> lock("$ci"); >>> xlog("L_DBG", "Obtained lock, calling lua...\n"); >>> if(!lua_run("handle_packet")) { >>> xlog("L_ERR", "SCRIPT: failed to execute lua >>> function!\n"); >>> } >>> >>> if ($vn(bar) != $null) { >>> route(mediaIpToRedis); >>> } >>> >>> if ($vn(baz) != $null) { >>> route(channelInfoToRedis); >>> } >>> >>> if ($vn(barbaz) != $null) { >>> route(sendToQueue); >>> } >>> xlog("L_DBG", "All finished, releasing lock...\n"); >>> unlock("$ci"); >>> xlog("L_DBG", "Released lock...\n"); >>> >>> # update stats >>> if ($vn(foobar) != $null) { >>> update_stat("$vn(foobar)", "+1"); >>> } >>> >>> drop; >>> exit; >>> } >>> >>> Is there any race condition I am missing? Until this >>> morning, it ran without problems and only every 2 minutes or >>> so, 4 packets were sent to the RabbitMQ. So the system >>> throws away nearly 100% of the traffic (which is around 20 >>> Mbit). >>> >>> In the log file, there are no errors at all, no aborted lua >>> scripts or whatsoever. >>> >>> Does anybody have a hint for me? Thanks in advance. >>> >>> Regards >>> Sebastian >>> >>> >>> _______________________________________________ >>> Kamailio (SER) - Users Mailing List >>> [email protected] <mailto:[email protected]> >>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users >>> <https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users> >> >> -- >> Daniel-Constantin Mierla >> www.twitter.com/miconda <http://www.twitter.com/miconda> -- >> www.linkedin.com/in/miconda <http://www.linkedin.com/in/miconda> >> Kamailio Advanced Training - March 5-7, 2018, Berlin - >> www.asipto.com <http://www.asipto.com> >> Kamailio World Conference - May 14-16, 2018 - www.kamailioworld.com >> <http://www.kamailioworld.com> >> >> >> >> >> >> >> _______________________________________________ >> Kamailio (SER) - Users Mailing List >> [email protected] <mailto:[email protected]> >> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users >> <https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users> > > -- > Daniel-Constantin Mierla > www.twitter.com/miconda <http://www.twitter.com/miconda> -- > www.linkedin.com/in/miconda <http://www.linkedin.com/in/miconda> > Kamailio Advanced Training - March 5-7, 2018, Berlin - www.asipto.com > <http://www.asipto.com> > Kamailio World Conference - May 14-16, 2018 - www.kamailioworld.com > <http://www.kamailioworld.com> > > > _______________________________________________ > Kamailio (SER) - Users Mailing List > [email protected] <mailto:[email protected]> > https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users > <https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users> > > > > > -- > sipgate GmbH - Gladbacher Str. 74 - 40219 Düsseldorf > HRB Düsseldorf 39841 - Geschäftsführer: Thilo Salmon, Tim Mois > Steuernummer: 106/5724/7147, Umsatzsteuer-ID: DE219349391 > > www.sipgate.de <http://www.sipgate.de> - www.sipgate.co.uk > <http://www.sipgate.co.uk> -- Daniel-Constantin Mierla www.twitter.com/miconda -- www.linkedin.com/in/miconda Kamailio Advanced Training - March 5-7, 2018, Berlin - www.asipto.com Kamailio World Conference - May 14-16, 2018 - www.kamailioworld.com
_______________________________________________ Kamailio (SER) - Users Mailing List [email protected] https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
