Hello,

I pushed two patches to prevent the crash, even when the module is not used as expected in the config.
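
For context, the backtrace quoted further down shows send_reply() evaluating reason->s[reason->len-1] with code=0 and reason = {s = 0x0, len = 0}, apparently because the job callback (empty_peer_callback) left the response unset. Below is a minimal sketch of that failure mode and of the kind of guard that avoids it. It is not the code from the commits: the str struct is only a stand-in for Kamailio's counted string, and reply_is_sendable() and reason_ends_with_nul() are hypothetical helpers made up for this illustration.

```c
#include <stddef.h>

/* Stand-in for Kamailio's counted string (pointer plus length). */
typedef struct {
    char *s;
    int len;
} str;

/*
 * Hypothetical guard: decide whether a callback result is complete
 * enough to be handed to a reply function. A zero code or an empty
 * reason means the callback did not fill in a response.
 */
static int reply_is_sendable(int code, const str *reason)
{
    return code > 0
        && reason != NULL
        && reason->s != NULL
        && reason->len > 0;
}

/*
 * Same test as the expression at sl.c:276 in the backtrace, but only
 * evaluated after the pointer and length have been validated, so that
 * s[len - 1] can never index through a NULL pointer.
 */
static int reason_ends_with_nul(const str *reason)
{
    if (reason == NULL || reason->s == NULL || reason->len <= 0)
        return 0;
    return reason->s[reason->len - 1] == '\0';
}
```

With the values from the core (code 0, reason {NULL, 0}), reply_is_sendable() returns 0, so in this sketch the reply call would be skipped instead of reaching the NULL dereference.
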

Charles: can you check and see if both make sense?

The one in the worker_loop() function is to prevent the crash:

  * https://github.com/kamailio/kamailio/commit/a675ab88fefac75145a7d563fee0431458630529

This should be backported if all goes fine with it.

The second one, in empty_peer_callback(), is to generate a 202 Accepted response; otherwise, in such cases, the sender will do retransmissions:

  * https://github.com/kamailio/kamailio/commit/7f618c2d855ac268df905eb3d6e18733c8773047

But maybe it was on purpose not to send a response (i.e., to allow sending the response from config); in that case it can be reverted.

Cheers,
Daniel

On 24.04.20 20:57, Charles Chance wrote:
> Hi,
>
> Did you try the config snippet I provided?
>
> Basically dmq_handle_message() must be called if the message is not
> your own, otherwise the node discovery/health check will not work and
> you will see nodes disappearing as you described.
>
> Here it is again:
>
> if(is_method("KDMQ")){
>
>     if($rU =~ "userOnline"){
>         //user came online in cluster, resume transactions if-any suspended
>         $avp(remoteUser) = $rb;
>     } else {
>         dmq_handle_message();
>     }
> }
>
> Notice that we check for your own/custom message first, then call
> handle message if not matched.
>
> Let me know if it works.
>
> Cheers,
>
> Charles
>
>
> On Fri, 24 Apr 2020 at 19:52, SamyGo <[email protected]> wrote:
>
> Yes,
> I did read all (past 3+ years) of his replies specific to DMQ and DMQ
> USRLOC, and only one matched the exact description and there was no
> resolution to it.
> GitHub open+closed issues for DMQ didn't have anything similar
> either. Could it be something I'm doing wrong!?
>
> Additional info: One of the servers is directly on a public IP and
> the other one is behind NAT. Another test setup where it is consistently
> reproducible is two servers behind NAT (AWS).
> Here are the mod params. Only usrloc sync is done via DMQ and no
> other module is using DMQ.
>
> listen=udp:LocalIP:5060 advertise PublicIP:5060
>
> modparam("dmq","server_address", DMQ_LOCAL_SERVER)
> modparam("dmq", "notification_address", DMQ_REMOTE_SERVER)
> modparam("dmq", "multi_notify", 0) //1 for DNS SRV
> modparam("dmq", "num_workers", 10)
> modparam("dmq", "ping_interval", 60)
>
> modparam("dmq_usrloc", "enable", 1)
> modparam("dmq_usrloc", "sync", 1)
> modparam("dmq_usrloc", "batch_size", 4000)
> modparam("dmq_usrloc", "batch_usleep", 1000)
> modparam("dmq_usrloc", "usrloc_domain", "location")
>
> Where: DMQ_REMOTE_SERVER = sip:PublicIP2:5060
>
> GDB info as requested:
>
> Core was generated by `/usr/local/sbin/kamailio -w /tmp/kamailio -P /var/run/kamailio/kamailio.pid -f'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x00007f248c4cef15 in send_reply (msg=0x7f2469f88d40, code=0, reason=0x7ffd775e3ab8) at sl.c:276
> 276 if(reason->s[reason->len-1]=='\0') {
> (gdb)
> (gdb)
> (gdb) frame 0
> #0 0x00007f248c4cef15 in send_reply (msg=0x7f2469f88d40, code=0, reason=0x7ffd775e3ab8) at sl.c:276
> 276 if(reason->s[reason->len-1]=='\0') {
> (gdb) p *reason
> $1 = {s = 0x0, len = 0}
> (gdb)
> (gdb) frame 1
> #1 0x00007f24656c6549 in worker_loop (id=2) at worker.c:129
> 129 if(slb.freply(current_job->msg, peer_response.resp_code,
> (gdb) p *worker
> $3 = {queue = 0x7f2469f240a8, jobs_processed = 5, lock = {val = 2}, pid = 935}
> (gdb)
> (gdb)
> (gdb) p *current_job
> $6 = {f = 0x7f24656d6d8d <empty_peer_callback>, msg = 0x7f2469f88d40, orig_peer = 0x7f2469f6ed50, next = 0x0, prev = 0x0}
> (gdb)
>
>
> On Fri, Apr 24, 2020 at 1:30 PM Daniel-Constantin Mierla
> <[email protected]> wrote:
>
> Hello,
>
> have you tried the suggestion from Charles in the other
> response? It can help figuring out where the problem resides.
>
> Now, from the C point of view, I would need the following output
> from gdb of the core file:
>
> frame 0
> p *reason
>
> frame 1
> p *worker
> p *current_job
>
> I would also need to know the modparams for dmq and other
> dmq_* modules, plus the list of modules for which you enabled
> dmq (e.g., htable, dialog, presence, ...).
>
> Cheers,
> Daniel
>
> On 24.04.20 18:10, SamyGo wrote:
>> Oops, apologies, missed that:
>>
>> version: kamailio 5.3.3 (x86_64/linux) 44ccb9-dirty
>> flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS,
>> DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC,
>> Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX,
>> FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER,
>> USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
>> ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144,
>> MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
>> poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
>> id: 44ccb9 -dirty
>> compiled on 17:04:55 Apr 17 2020 with gcc 4.9.2
>>
>> Tried this with versions 5.0, 5.2, and now 5.3: same situation.
>>
>> Thank you for looking into this,
>> Sammy
>>
>> On Fri, Apr 24, 2020 at 2:33 AM Daniel-Constantin Mierla
>> <[email protected]> wrote:
>>
>> Hello,
>>
>> you have to provide the version of kamailio for each
>> reported kamailio issue, otherwise it is hard to match with
>> the source code. Use 'kamailio -v' to get version details.
>>
>> Cheers,
>> Daniel
>>
>> On 23.04.20 23:36, SamyGo wrote:
>>> Hi,
>>>
>>> Is there a way to broadcast KDMQ to the cluster but not
>>> expect a reply back!? As far as I've read the source
>>> code, dmq_bcast_message is exactly like dmq_send_message
>>> in that it expects a callback to be executed on
>>> response, i.e. it expects a reply.
>>>
>>> So, the situation I'm facing is that I'm broadcasting a message
>>> to the cluster and I do not want a reply back. The following
>>> two options result in a crash & core dump.
>>>
>>> 1 - If my script doesn't respond back, by use of
>>> dmq_handle_message, it marks the destined servers as
>>> "inactive" and stops the usrloc sync process, which
>>> isn't desirable.
>>> 2 - If I respond back with dmq_handle_message, it
>>> crashes the Kamailio which just received this
>>> broadcast message.
>>>
>>> Here is how it's done in the script:
>>>
>>> *Broadcasting a message to the cluster:*
>>> dmq_bcast_message("userOnline", "$fu", "text/plain");
>>>
>>> *Receiving and handling a broadcast message:*
>>> route[DMQ_HANDLE] {
>>>     if(!(is_method("KDMQ") || $rm == "KDMQ")) return;
>>>
>>>     if(is_method("KDMQ") || $rm == "KDMQ"){
>>>         if($rU =~ "userOnline"){
>>>             //user came online in cluster, resume transactions if-any suspended
>>>             $avp(remoteUser) = $rb;
>>>         }
>>>         dmq_handle_message();
>>>         exit;
>>>     }
>>> }
>>>
>>> *Related log lines:*
>>> Apr 23 21:15:48 kamailio[916]: ALERT: <script>: [da2c1-2f499] ------ DMQ_HANDLE: UserOnline Event Received ------
>>> Apr 23 21:15:48 kamailio[916]: DEBUG: dmq [message.c:53]: ki_dmq_handle_message_rc(): dmq_handle_message [KDMQ sip:[email protected]:5060]
>>> Apr 23 21:15:48 kamailio[916]: DEBUG: dmq [message.c:66]: ki_dmq_handle_message_rc(): dmq_handle_message peer found: userOnline
>>> Apr 23 21:15:48 kamailio[916]: DEBUG: <core> [core/receive.c:437]: receive_msg(): request-route executed in: 401461 usec
>>> Apr 23 21:15:48 kamailio[935]: DEBUG: dmq [worker.c:87]: worker_loop(): dmq_worker [2 935] lock acquired
>>> and crash/segfault..
>>>
>>> Core dump: https://pastebin.com/S7ekCPfF
>>>
>>> Any help or pointers to solve this would be really
>>> appreciated.
>>>
>>> Best Regards,
>>> Sammy
>>>
>>> _______________________________________________
>>> Kamailio (SER) - Users Mailing List
>>> [email protected]
>>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
>>
>> --
>> Daniel-Constantin Mierla -- www.asipto.com
>> www.twitter.com/miconda -- www.linkedin.com/in/miconda
>>
>
> --
> *Charles Chance*
> Managing Director
>
> t. 0330 120 1200 m. 07932 063 891
>
> Sipcentric Ltd. Company registered in England & Wales no.
> 7365592. Registered office: Faraday Wharf, Innovation Birmingham
> Campus, Holt Street, Birmingham Science Park, Birmingham B7 4BB.

--
Daniel-Constantin Mierla -- www.asipto.com
www.twitter.com/miconda -- www.linkedin.com/in/miconda
