Some screenshots to illustrate what I mean:

![image](https://user-images.githubusercontent.com/16212586/42674283-f2d6c2da-8623-11e8-8df4-cf5b9859efd6.png)

Same graph, with a Y-axe limited to 500:

![image](https://user-images.githubusercontent.com/16212586/42674318-112dc756-8624-11e8-8c06-c2fc345d08cf.png)

Same as above, but shortening the time frame and with Y-limit of 100:

![image](https://user-images.githubusercontent.com/16212586/42674387-586dfa28-8624-11e8-956d-f9b1805883da.png)

I see a type of pattern going further back in time is the following:

- all is working
- something happens (*see below note*)
- dialogs number reported goes to 0 or close to 0 for a bit
- dialogs report the 9223372036854776000
- things start working again?



And, looking in the logs, at the ~time the spikes happen, I can see this in the 
logs:

```
root@kamailio2:~# cat /var/log/kamailio/kamailio.log-20180712 | grep -v NOTICE 
| grep -v "core/tcp_"
...
Jul 12 02:54:24 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'orcr9cRwLT' and tags 'AV-f9MK~v' 
'tetjZeg24aZeS'
Jul 12 02:54:45 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'IRO6Y5Rw53' and tags 'cIaxfaScL' 
'9jBBFcc1NaUNQ'
Jul 12 02:54:48 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'fw3hjesWFN' and tags '-3xLiat-a' 
't1r1KrNrH0BcQ'
Jul 12 02:54:50 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'w1rHrdpK2f' and tags 'HoJgX22ya' 
'4XmtD99FgyyDa'
Jul 12 02:54:58 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'mrnlQ8OYuX' and tags 'yum6HTlKK' 
'766FZ05HaQXmj'
Jul 12 02:55:02 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'lrbwFVn7Q8' and tags 'xOubxsLmV' 
'77Q419N7eHy6N'
Jul 12 02:55:03 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID '14yaUN~xGG' and tags '6nSXFwBSb' 
'mp1U2N57Ncg3p'
Jul 12 02:55:05 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'hgI~lJdBJ3' and tags 'FzXjp4-YJ' 
'a2jt4Hrv1H0cN'
Jul 12 02:55:09 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'oBof16H0~A' and tags 'ZepyE21EK' 
'3ZXHF271aU7pH'
Jul 12 02:55:13 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'XIvos7QACD' and tags 'UPsNWynuM' 
'X3jF108jcX89N'
Jul 12 02:55:13 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'ygaFfVdLn0' and tags '3Qbx8dkvK' 
'0yUyZrX83aSpa'
Jul 12 02:55:18 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'm9G~N0LU9d' and tags 'wQ~cwqKHN' 
'cNp0aHaS0XD4F'
Jul 12 02:55:20 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'OUaCSKP2PZ' and tags 'W2-o~Knwc' 
'epjHQN6yQj91e'
Jul 12 02:55:22 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'wUtsFit9wE' and tags 'CmPHrLXm9' 
'KKpSm5Fv718mD'
Jul 12 02:55:24 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'i5FZlDb2Jt' and tags 'c7AO4enu~' 
'423vDyyBrU4cQ'
Jul 12 02:55:26 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'XLKRf6e5Xq' and tags 'rSTom2iS3' 
'020pQS21HyQUK'
Jul 12 02:55:29 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'nfBiavwuHu' and tags 'P3cHBXiea' 
'NQ2ZBN8BZNSBF'
Jul 12 02:55:31 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID '699iA2AdSO' and tags 'eZSvyxAYC' 
'tj76025mK04Qe'
Jul 12 02:55:31 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'XUuh-7CmGt' and tags 'GyhSzl-c7' 
'r0mNXc4DSerjQ'
Jul 12 02:55:33 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'oMau5CTVYF' and tags '42SPhgBwh' 
'XQeBjgrpNUFpp'
Jul 12 02:55:34 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'l0nKr0oj5c' and tags '2ZzNMM0b8' 
'B6p7a3U8H0c9j'
Jul 12 02:55:35 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'Kl49oLgGHV' and tags 'E5E5P9rFX' 
'NB9ZFyjFN2cvD'
Jul 12 02:55:35 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'i2lEbuiAFm' and tags 'TDExnc7Bx' 
'Urv4SBvp5gyQQ'
Jul 12 02:55:39 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'S5k252ptQ7' and tags 'VfPxDw0Bw' 
'7p13arg6BmDZD'
Jul 12 02:55:39 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'fjqooHvA~I' and tags 'NbHBVvU9w' 
'XZc6SDF87vB1B'
Jul 12 02:55:40 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID '7b~UWpjiY2' and tags 'eVOoK-UjU' 
'QXUHKmmpFmS1m'
Jul 12 02:55:41 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'eOSkqIsjd0' and tags '8chfVD~av' 
'Zv170QZ4SmQ2N'
Jul 12 02:55:42 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'FgI7PLr3km' and tags 'HomQTapK0' 
'HvevN50X0NjSr'
Jul 12 02:55:42 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID '9tKCteAnPq' and tags '9QSRb9F3r' 
'0yyS6HUX3QB2r'
Jul 12 02:55:44 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'w0mRMEBDPQ' and tags 'mqY5bERQ1' 
'yQXyarSrcyymg'
Jul 12 02:55:46 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'AP1gTeQl6U' and tags 'YUz~HwPjE' 
'eX0mZc7eUm37a'
Jul 12 02:55:46 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'lhxI8T1rXr' and tags 'yb92ex~AK' 
'SyZHS3m48eN8F'
Jul 12 02:55:49 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID '2RqFuZyUmj' and tags 'owLepp0dU' 
'7v1cjy0N0BDej'
Jul 12 02:55:49 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'DkuPmsRFZz' and tags '~HNqKE7E2' 
'Z90vN6SXFDvUD'
Jul 12 02:55:50 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'Zdz48Xh~co' and tags 'hQ4p6Ad8a' 
'F6pjtetXrg7er'
Jul 12 02:55:50 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'CNoQG-88vC' and tags 'Y-2W95k9F' 
'0eZg37BpN419c'
Jul 12 02:55:52 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'IVtYwdhMVV' and tags 'tpw7O0Hxz' 
'p0UrDgSFvyFya'
...
root@kamailio2:~#
```

That seemed weird, all from the same process (`27795`) and almost at the same 
time?

So I wanted to check randomly for one of the calls, if I take for example the 
first one (`orcr9cRwLT`) I can't find anything in the logs:

```
root@kamailio2:~# grep orcr9cRwLT /var/log/kamailio/kamailio.log-20180712
Jul 12 02:54:24 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'orcr9cRwLT' and tags 'AV-f9MK~v' 
'tetjZeg24aZeS'
root@kamailio2:~#
```

The interesting thing is:


```
root@kamailio1:~# grep orcr9cRwLT /var/log/kamailio/kamailio.log-20180712
Jul 11 14:54:24 kamailio1 sbc[29467]: NOTICE: <script>: New request - M=INVITE 
F=sip:5cfafa09ac2a45428cff23271e150...@my.domain.com 
T=sip:inbo...@sip.mydomain.com R=sip:inbo...@sip.mydomain.com:50814 
IP=99.203.1.225:31849 UA='HIDDEN_USER_AGENT' ID=orcr9cRwLT
Jul 11 14:54:24 kamailio1 sbc[29467]: NOTICE: <script>: TLS information: 
SNI=sip.mydomain.com Ver=TLSv1.2 Cipher=AES256-GCM-SHA384 Bits=256 - M=INVITE 
R=sip:inbo...@sip.mydomain.com:50814 ID=orcr9cRwLT
Jul 11 14:54:24 kamailio1 sbc[29467]: NOTICE: <script>: NAT detected - M=INVITE 
R=sip:inbo...@sip.mydomain.com:50814 ID=orcr9cRwLT
Jul 11 14:54:24 kamailio1 sbc[29467]: NOTICE: <script>: Setting updated contact 
- M=INVITE ID=orcr9cRwLT
Jul 11 14:54:24 kamailio1 sbc[29467]: NOTICE: <script>: Authentication 
successful - M=INVITE R=sip:inbo...@sip.mydomain.com:50814 ID=orcr9cRwLT
Jul 11 14:54:24 kamailio1 sbc[29467]: NOTICE: <script>: Selected dispatcher 
node: sip:A.B.C.D:5060 - M=INVITE R=sip:inbound@A.B.C.D:5060 ID=orcr9cRwLT
Jul 11 14:54:24 kamailio1 sbc[29467]: NOTICE: <script>: Forwarding request - 
M=INVITE R=sip:inbound@A.B.C.D:5060 ID=orcr9cRwLT
Jul 11 14:54:24 kamailio1 sbc[29436]: NOTICE: <script>: Reply - M=INVITE S=100 
trying -- your call is important to us ID=orcr9cRwLT
Jul 11 14:54:24 kamailio1 sbc[29430]: NOTICE: <script>: Reply - M=INVITE S=200 
OK ID=orcr9cRwLT
Jul 11 14:54:24 kamailio1 sbc[29433]: NOTICE: <script>: Reply - M=INVITE S=200 
OK ID=orcr9cRwLT
Jul 11 14:54:24 kamailio1 sbc[29467]: NOTICE: <script>: Setting updated contact 
- M=ACK ID=orcr9cRwLT
Jul 11 14:54:24 kamailio1 sbc[29467]: NOTICE: <script>: In-dialog request from 
App - M=ACK R=sip:inbound@E.F.G.H:6061;transport=udp ID=orcr9cRwLT
Jul 11 14:54:24 kamailio1 sbc[29467]: NOTICE: <script>: Forwarding request - 
M=ACK R=sip:inbound@E.F.G.H:6061;transport=udp ID=orcr9cRwLT
Jul 11 14:54:24 kamailio1 sbc[29467]: NOTICE: <script>: Setting updated contact 
- M=ACK ID=orcr9cRwLT
Jul 11 14:54:24 kamailio1 sbc[29467]: NOTICE: <script>: In-dialog request from 
App - M=ACK R=sip:inbound@E.F.G.H:6061;transport=udp ID=orcr9cRwLT
Jul 11 14:54:24 kamailio1 sbc[29467]: NOTICE: <script>: Forwarding request - 
M=ACK R=sip:inbound@E.F.G.H:6061;transport=udp ID=orcr9cRwLT
root@kamailio1:~#
```


So the call, in fact was started through `kamailio1` (this is a cluster of 2 
nodes, `kamailio1` and `kamailio2`, and they replicate dialog information using 
KDMQ).

The error msgs in `kamailio2` appear exactly 12hr after the call started in 
`kamailio1`.

I'm starting to wonder if this can have anything to do with KDMQ replication? 

When we moved from v5.0.2 to v5.1.4, we also enabled KMDQ for dialog module 
(which in v5.0.2 we were not using).

I hope this info helps narrow down the issue.

My next steps unless you guys give me other suggestions:

a) Enable debug logs and wait for issue to reproduce, maybe we see something 
there.
b) Disable KDMQ replication on dialog module and see if the issue is solved or 
continues.

I don't like either of them, but I rather start with the most helpful of the 
two.





 

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-404738386
_______________________________________________
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev

Reply via email to