Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
Closing with the solution of rpc "dlg.stats_active". -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/1591#issuecomment-471442731___ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
Closed #1591. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/1591#event-2193117634___ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
I just verified and removing DB_URL and DB_MODE from dialog modparams has the same behavior. In order to keep the thread in sync with the topic, I have created #1692 to track the metric not being updated. As for *this* issue we can either close it or continue troubleshooting on why the dialog counter from `stats.get_statistics` goes below 0. @miconda what do you think? -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/1591#issuecomment-433486369___ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
Hi @miconda, sorry for the delay, it's taken me a while until I have been able to test this.. I can confirm so far that `dlg.stats_active` works correctly, it's in sync between both nodes all the time, so any changes to the dialogs are reflected there instantly on both nodes. But, something must be wrong with `stats.get_statistics`: Active K node: ``` root@sbc01:/etc/kamailio# kamcmd stats.get_statistics all | grep dialog dialog:active_dialogs = 30 dialog:early_dialogs = 2 dialog:expired_dialogs = 14 dialog:failed_dialogs = 12176 dialog:processed_dialogs = 44907 root@sbc01:/etc/kamailio# ``` Backup K node (newly restarted): ``` root@sbc02:/etc/kamailio# kamcmd stats.get_statistics all | grep dialog dialog:active_dialogs = 27 dialog:early_dialogs = 0 dialog:expired_dialogs = 0 dialog:failed_dialogs = 0 dialog:processed_dialogs = 0 root@sbc02:/etc/kamailio# ``` that `27` seems to me like an "initial sync" from the peer node, but that number won't change after that (which makes sense as this node is a backup and doesnt' handle any traffic unless there is a failover). Does this make sense? Thanks! -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/1591#issuecomment-432767538___ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
This is only for the statistics. The rpc command was not touched. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/1591#issuecomment-420531298___ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
Hi @miconda , I can absolutely test this. Would your patch apply to return values for `dlg.stats_active` or to the dialog entries for `stats.get_statistics` ?? Or maybe to both? :) -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/1591#issuecomment-420415071___ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
I pushed a patch (see above) to update stats for active and early dialogs on dmq operations. Can you give it a try and see if all ok with it. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/1591#issuecomment-420299616___ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
The rpc command `dlg.stats_active` goes over the internal hash table of dialog module and counts the dialogs that are in active state, so what ever is there, local or replicated dialogs. Counters are exposed to inconsistency if adding/removing is triggered by events that can appear many times, so maybe the statistic related to active calls should be changed to a callback counting in a similar manner as the rpc command. There are other statistics that have a callback function behind instead of a counter. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/1591#issuecomment-413202450___ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
Hi @charlesrchance, thanks for the time to look into this. What you say totally matches the "weird" behaviors I was seeing. You really gave me a lot of clarity understanding what is going on behind the scenes. To answer your questions: 1. Yes, I have database but my idea was to move away from it for each thing I handle using DMQ. This also matches chronology of events, I have been restarting nodes here and there, I wasn't "losing" dialogs, but making them "orphaned" instead without knowing. 2. So there are (at least) a couple different options here: 2a) stat counters would have all dialogs (local + replicated), which in reality is the true value, as from kamailio's perspective, those dialogs are there and they are active aren't they? 2b) stat counters would only have local dialogs, so if you want to have full dialog count you would have to stack the values from all nodes. Now there is also a `dlg.stats_active` that gives you current dialogs (not sure if it also shows replicated ones) so maybe we can benefit from both options, a counter that includes both local and replicated dialogs, and a different counter (dlg.stats_active?) that has only the current-local-count of dialogs. Since I fully restarted both SBCs (which in fact, that was clearing the orphaned dialogs) I haven't seen a spike (negative counter). At this point I don't know what the best options is, I do understand perfectly the problem though. @miconda what do you think about all this? -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/1591#issuecomment-409303111___ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
Having made some tests around this, whilst I have not yet been able to reproduce the negative counter issue, I do think there needs to be some further thought around dialog replication. First thing to note - stats are not affected by replicated dialogs, so I don't think DMQ is _directly_ responsible for the negative counting. However - and this may be _indirectly_ related - if a node is restarted, any dialogs owned by it at the time will be forever 'stuck'. This is owing to the fact that in its current implementation, the dialog owner is responsible for triggering update/removal across the rest of the cluster. If the owner no longer exists - or has been restarted and has no idea that it was once the owner of some/all of the dialogs it receives in its initial re-sync - then this link is broken permanently. It is further compounded by the fact these orphaned dialogs never (in my tests, anyway) timeout. I need to spend some more time on the DMQ side, since this is the first time I have looked at it properly. In the meantime, @joelsdc, regarding your issue: 1. Do you have database enabled alongside DMQ replication or was it only for testing? I suspect this is where the recent 38 'expired' dialogs came from - conversely, the earlier 'bad' dialogs you mentioned were likely a result of the owning node having been restarted (these would not have been included in the 'expired' counter). 2. Are you expecting the stats counters (the original ones, not the new 'dlg.stats_active') to reflect all dialogs across the cluster or just those handled directly by the local instance? -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/1591#issuecomment-409205552___ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
More info: On another cluster, same setup..., during this troubleshooting I disabled DMQ and enabled MySQL for dialog replication, I also left one node outside of rotation to see replication behavior. Well, with 0 traffic I can see this: ``` root@sbc01:~# kamctl rpc dlg.stats_active { "jsonrpc": "2.0", "result": { "starting": 0, "connecting": 0, "answering": 0, "ongoing": 0, "all": 0 }, "id": 2634 } root@sbc01:~# ``` Which is correct, but...: ``` root@sbc01:~# kamctl rpc stats.fetch all | grep dialog "dialog.active_dialogs": "38", "dialog.early_dialogs": "0", "dialog.expired_dialogs": "38", "dialog.failed_dialogs": "0", "dialog.processed_dialogs": "1", root@sbc01:~# ``` I wonder if it's correct that those 38 `expired` dialogs still count towards the `active` counter? Could that have something to do? Maybe those non-existent-expired-dialogs because they are somewhere in the "active" counter, they get replicated only when DMQ is enabled? So far I have clear that: 1- For sure there is a scenario where counters go below 0. 2- It only happens (so far) when DMQ is enabled 3- In my initial look, I found some logs about dialogs created on node1 were timed-out on node2 (see previous posts) 4- Now I see a discrepancy on active vs expired dialogs on a node with 0 active dialogs. I feel we are slowly narrowing down what the possible problem can be, but I still don't have a clear picture what causes what. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/1591#issuecomment-408690103___ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
Could it have something to do with "old-non-expired-non-removed" dialogs? So today I applied latest @charlesrchance patch, and what I normally do (so I don't lose dialogs) is: 1- Restart one node 2- Wait for DMQ replication dialog sync 3- Restart other node Right after restarting both nodes in that sequence, I already could see values close to ULONG_MAX, so I did a `watch` every second and I could see the values move around. I decided to compare the dialogs to see if I could find anything interesting, what I found is that there were a lot of dialogs with the `init_ts` from like 11th and 13th July and being today 28th, those dialogs should not be there. So I wonder if those "bad" dialogs could be the cause of the counters going below 0. Maybe some timer to remove those dialogs didn't happen? and because of the way I perform restarts to not lose dialogs, they were constantly there replicated backwards and forwards... That said, I did a full restart (both nodes at the same time, losing dialogs) and since then the metrics are looking good. I hope this info helps. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/1591#issuecomment-408629258___ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
Thanks! Let me know if you need any more info -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/1591#issuecomment-406000670___ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
The values are closed to ULONG_MAX for 64bit, so likely some dialog end events are counted many times (likely due to dmq replication) and the value goes below 0, turning it to max value. For reference, ranges for number types: * https://en.wikibooks.org/wiki/C_Programming/limits.h In the past days I was rather busy with other stuff, I will try to get back and analyze the code asap. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/1591#issuecomment-405880927___ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
I don't know if it helps, but I have one server constantly reporting incorrectly: ``` root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active" "dialog:active_dialogs = 18446744073709551568", root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active" "dialog:active_dialogs = 18446744073709551567", root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active" "dialog:active_dialogs = 18446744073709551567", root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active" "dialog:active_dialogs = 18446744073709551566", root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active" "dialog:active_dialogs = 18446744073709551566", root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active" "dialog:active_dialogs = 18446744073709551566", root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active" "dialog:active_dialogs = 18446744073709551566", root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active" "dialog:active_dialogs = 18446744073709551564", root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active" "dialog:active_dialogs = 18446744073709551564", root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active" "dialog:active_dialogs = 18446744073709551563", root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active" "dialog:active_dialogs = 18446744073709551562", root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active" "dialog:active_dialogs = 18446744073709551565", root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active" "dialog:active_dialogs = 18446744073709551567", root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active" "dialog:active_dialogs = 18446744073709551568", root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active" "dialog:active_dialogs = 18446744073709551569", root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active" "dialog:active_dialogs = 18446744073709551569", root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active" "dialog:active_dialogs = 18446744073709551570", root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active" "dialog:active_dialogs = 18446744073709551570", root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active" "dialog:active_dialogs = 18446744073709551570", root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active" "dialog:active_dialogs = 18446744073709551569", root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active" "dialog:active_dialogs = 18446744073709551569", root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active" "dialog:active_dialogs = 18446744073709551570", root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active" "dialog:active_dialogs = 18446744073709551571", root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active" "dialog:active_dialogs = 18446744073709551572", root@kamailio1:~# ``` -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/1591#issuecomment-405637294___ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
Hi Daniel, This is what I can get: ``` root@kamailio2:/var/tmp# gdb /usr/sbin/kamailio -ex "bt full" --batch core.kamailio.1334.13cn4.1531505030 > core.kamailio.1334.13cn4.1531505030-bt_full.txt 66 ../../core/parser/../mem/../atomic/atomic_common.h: No such file or directory. root@kamailio2:/var/tmp# gdb /usr/sbin/kamailio -ex "bt full" --batch core.kamailio.1335.13cn4.1531505030 > core.kamailio.1335.13cn4.1531505030-bt_full.txt 66 ../../core/parser/../mem/../atomic/atomic_common.h: No such file or directory. root@kamailio2:/var/tmp# gdb /usr/sbin/kamailio -ex "bt full" --batch core.kamailio.1336.13cn4.1531505030 > core.kamailio.1336.13cn4.1531505030-bt_full.txt 66 ../../core/parser/../mem/../atomic/atomic_common.h: No such file or directory. root@kamailio2:/var/tmp# gdb /usr/sbin/kamailio -ex "bt full" --batch core.kamailio.1337.13cn4.1531505030 > core.kamailio.1337.13cn4.1531505030-bt_full.txt 66 ../../core/parser/../mem/../atomic/atomic_common.h: No such file or directory. root@kamailio2:/var/tmp# gdb /usr/sbin/kamailio -ex "bt full" --batch core.kamailio.1338.13cn4.1531505030 > core.kamailio.1338.13cn4.1531505030-bt_full.txt 66 ../../core/parser/../mem/../atomic/atomic_common.h: No such file or directory. root@kamailio2:/var/tmp# gdb /usr/sbin/kamailio -ex "bt full" --batch core.kamailio.1339.13cn4.1531505030 > core.kamailio.1339.13cn4.1531505030-bt_full.txt 66 ../../core/parser/../mem/../atomic/atomic_common.h: No such file or directory. root@kamailio2:/var/tmp# gdb /usr/sbin/kamailio -ex "bt full" --batch core.kamailio.24376.13cn4.1531504766 > ocre.kamailio.24376.13cn4.1531504766-bt_full.txt 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. root@kamailio2:/var/tmp# gdb /usr/sbin/kamailio -ex "bt full" --batch core.kamailio.28420.13cn4.1531504860 > ocre.kamailio.28420.13cn4.1531504860-bt_full.txt BFD: Warning: /var/tmp/core.kamailio.28420.13cn4.1531504860 is truncated: expected core file size >= 8607031296, found: 2081087488. Cannot access memory at address 0x7fc79b2dc148 Cannot access memory at address 0x7fc79b2dc140 Failed to read a valid object file image from memory. root@kamailio2:/var/tmp# gdb /usr/sbin/kamailio -ex "bt full" --batch core.kamailio.28421.13cn4.1531504860 > ocre.kamailio.28421.13cn4.1531504860-bt_full.txt BFD: Warning: /var/tmp/core.kamailio.28421.13cn4.1531504860 is truncated: expected core file size >= 8607031296, found: 2136645632. Cannot access memory at address 0x7fc79b2dc148 Cannot access memory at address 0x7fc79b2dc140 Failed to read a valid object file image from memory. root@kamailio2:/var/tmp# gdb /usr/sbin/kamailio -ex "bt full" --batch core.kamailio.28422.13cn4.1531504860 > ocre.kamailio.28422.13cn4.1531504860-bt_full.txt BFD: Warning: /var/tmp/core.kamailio.28422.13cn4.1531504860 is truncated: expected core file size >= 8607031296, found: 2128039936. Cannot access memory at address 0x7fc79b2dc148 Cannot access memory at address 0x7fc79b2dc140 Failed to read a valid object file image from memory. root@kamailio2:/var/tmp# gdb /usr/sbin/kamailio -ex "bt full" --batch core.kamailio.28423.13cn4.1531504860 > ocre.kamailio.28423.13cn4.1531504860-bt_full.txt BFD: Warning: /var/tmp/core.kamailio.28423.13cn4.1531504860 is truncated: expected core file size >= 8607031296, found: 2174545920. Cannot access memory at address 0x7fc79b2dc148 Cannot access memory at address 0x7fc79b2dc140 Failed to read a valid object file image from memory. root@kamailio2:/var/tmp# ``` [core.kamailio.1334.13cn4.1531505030-bt_full.txt](https://github.com/kamailio/kamailio/files/2193740/core.kamailio.1334.13cn4.1531505030-bt_full.txt) [core.kamailio.1335.13cn4.1531505030-bt_full.txt](https://github.com/kamailio/kamailio/files/2193741/core.kamailio.1335.13cn4.1531505030-bt_full.txt) [core.kamailio.1336.13cn4.1531505030-bt_full.txt](https://github.com/kamailio/kamailio/files/2193742/core.kamailio.1336.13cn4.1531505030-bt_full.txt) [core.kamailio.1337.13cn4.1531505030-bt_full.txt](https://github.com/kamailio/kamailio/files/2193743/core.kamailio.1337.13cn4.1531505030-bt_full.txt) [core.kamailio.1338.13cn4.1531505030-bt_full.txt](https://github.com/kamailio/kamailio/files/2193744/core.kamailio.1338.13cn4.1531505030-bt_full.txt) [core.kamailio.1339.13cn4.1531505030-bt_full.txt](https://github.com/kamailio/kamailio/files/2193745/core.kamailio.1339.13cn4.1531505030-bt_full.txt) [ocre.kamailio.24376.13cn4.1531504766-bt_full.txt](https://github.com/kamailio/kamailio/files/2193746/ocre.kamailio.24376.13cn4.1531504766-bt_full.txt) [ocre.kamailio.28420.13cn4.1531504860-bt_full.txt](https://github.com/kamailio/kamailio/files/2193747/ocre.kamailio.28420.13cn4.1531504860-bt_full.txt) [ocre.kamailio.28421.13cn4.1531504860-bt_full.txt](https://github.com/kamailio/kamailio/files/2193748/ocre.kamailio.28421.13cn4.1531504860-bt_full.txt)
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
Hi Daniel, On my initial test in prod, I can see already discrepancies: ``` root@kamailio2:~# kamctl rpc dlg.stats_active { "jsonrpc": "2.0", "result": { "starting": 4, "connecting": 55, "answering": 2, "ongoing": 253, "all": 314 }, "id": 23360 } root@kamailio2:~# kamctl rpc stats.get_statistics all | grep dialog "dialog:active_dialogs = 18446744073709551604", "dialog:early_dialogs = 15", "dialog:expired_dialogs = 0", "dialog:failed_dialogs = 120", "dialog:processed_dialogs = 340", root@kamailio2:~# ``` Now we can go past 9223372036854776000 ??? I don't understand that. That was right after restarting (green active dialogs, yellow early dialogs): ![image](https://user-images.githubusercontent.com/16212586/42706316-2926ed4e-868b-11e8-8f62-08bddc849d58.png) Here is multiple runs of the stats.get_statistics: ``` root@kamailio2:~# kamctl rpc stats.get_statistics all | grep dialog "dialog:active_dialogs = 18446744073709551614", "dialog:early_dialogs = 8", "dialog:expired_dialogs = 0", "dialog:failed_dialogs = 315", "dialog:processed_dialogs = 863", root@kamailio2:~# kamctl rpc stats.get_statistics all | grep dialog "dialog:active_dialogs = 18446744073709551614", "dialog:early_dialogs = 8", "dialog:expired_dialogs = 0", "dialog:failed_dialogs = 315", "dialog:processed_dialogs = 864", root@kamailio2:~# kamctl rpc stats.get_statistics all | grep dialog "dialog:active_dialogs = 18446744073709551614", "dialog:early_dialogs = 8", "dialog:expired_dialogs = 0", "dialog:failed_dialogs = 315", "dialog:processed_dialogs = 864", root@kamailio2:~# kamctl rpc stats.get_statistics all | grep dialog "dialog:active_dialogs = 18446744073709551613", "dialog:early_dialogs = 10", "dialog:expired_dialogs = 0", "dialog:failed_dialogs = 320", "dialog:processed_dialogs = 866", root@kamailio2:~# kamctl rpc stats.get_statistics all | grep dialog "dialog:active_dialogs = 0", "dialog:early_dialogs = 8", "dialog:expired_dialogs = 0", "dialog:failed_dialogs = 322", "dialog:processed_dialogs = 870", root@kamailio2:~# kamctl rpc stats.get_statistics all | grep dialog "dialog:active_dialogs = 1", "dialog:early_dialogs = 9", "dialog:expired_dialogs = 0", "dialog:failed_dialogs = 323", "dialog:processed_dialogs = 873", root@kamailio2:~# kamctl rpc stats.get_statistics all | grep dialog "dialog:active_dialogs = 18446744073709551615", "dialog:early_dialogs = 9", "dialog:expired_dialogs = 0", "dialog:failed_dialogs = 324", "dialog:processed_dialogs = 875", root@kamailio2:~# kamctl rpc stats.get_statistics all | grep dialog "dialog:active_dialogs = 18446744073709551615", "dialog:early_dialogs = 10", "dialog:expired_dialogs = 0", "dialog:failed_dialogs = 325", "dialog:processed_dialogs = 876", root@kamailio2:~# ``` I tried to have a script gathering metrics: ``` root@kamailio2:~# cat dialogs.sh #!/bin/bash while true; do DIALOGS=`kamctl rpc dlg.list | grep h_entry | sort | wc -l` METRICS=`kamctl rpc stats.get_statistics all | grep -e "dialog:active" -e "dialog:early" | tr -d " " | tr -d "\n"` NEWMETRICS=`kamctl rpc dlg.stats_active | jq -c .result` echo "`date +'%Y-%m-%d %H:%M:%S.%N'` - dlg.list:$DIALOGS - stats.get_statistics:$METRICS - dlg.stats_active:$NEWMETRICS" sleep 1 done root@kamailio2:~# ``` But, after a little kamailio "starts slowing down" until it dies (had to completely reboot the server). ``` root@kamailio2:~# ./dialogs.sh 2018-07-13 10:57:14.204158716 - dlg.list:419 - stats.get_statistics:"dialog:active_dialogs=13","dialog:early_dialogs=12", - dlg.stats_active:{"starting":5,"connecting":62,"answering":7,"ongoing":288,"all":362} 2018-07-13 10:57:15.437165579 - dlg.list:422 - stats.get_statistics:"dialog:active_dialogs=15","dialog:early_dialogs=11", - dlg.stats_active:{"starting":4,"connecting":59,"answering":10,"ongoing":289,"all":362} 2018-07-13 10:57:16.678432892 - dlg.list:423 - stats.get_statistics:"dialog:active_dialogs=14","dialog:early_dialogs=9", - dlg.stats_active:{"starting":2,"connecting":58,"answering":9,"ongoing":289,"all":358} 2018-07-13 10:57:17.915458776 - dlg.list:422 - stats.get_statistics:"dialog:active_dialogs=13","dialog:early_dialogs=9", - dlg.stats_active:{"starting":4,"connecting":60,"answering":7,"ongoing":289,"all":360} 2018-07-13 10:57:19.149467641 - dlg.list:420 - stats.get_statistics:"dialog:active_dialogs=13","dialog:early_dialogs=8", - dlg.stats_active:{"starting":3,"connecting":60,"answering":5,"ongoing":290,"all":358} 2018-07-13 10:57:20.386457513 - dlg.list:415 - stats.get_statistics:"dialog:active_dialogs=14","dialog:early_dialogs=7", - dlg.stats_active:{"starting":4,"connecting":59,"answering":5,"ongoing":290,"all":358} 2018-07-13 10:57:21.627241266 - dlg.list:412 - stats.get_statistics:"dialog:active_dialogs=15","dialog:early_dialogs=7", -
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
I'm building v5.1.4 packages with that commit cherry-picked. I will deploy as soon as I have run a few tests in lab. I have noticed that the issue happens only on low traffic, so it matches your theory of the value going "below 0"... I was thinking if running a script every second gathering on both nodes: ``` kamctl rpc dlg.stats_active kamctl rpc dlg.list | grep h_entry | sort | wc -l kamctl rpc stats.get_statistics all | grep -e "dialog:active" -e "dialog:early" | tr -d " " | tr -d "\n" ``` I will also enable debug logs on `kamailio2` node (has slightly less traffic than `kamailio1`) maybe that helps too. Thank you! I'll report back results tomorrow. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/1591#issuecomment-404894491___ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
As an alternative to get the stats of active dialogs, I added an rpc command that scans the internal dialog hash table: * https://github.com/kamailio/kamailio/commit/ebb149066690f7d96f45e1639e0c5ca9616bbbe0 It is slightly slower, given the scan in real time, that the stats using the core counters, but should be fast enough not to affect the performances. It can be used to compare the results eventually and detect when an anomaly in the counter based stats happen. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/1591#issuecomment-404883239___ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
It may be that the value is decremented more than the active calls and "goes below 0", but the counter is an 'unsigned long' value, so from 0 jumps to maximum value. It may be related to DMQ replication. Based on feedback, so far all was pretty accurate with the stats for active dialogs in an instance without DMQ. So this path worth checking. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/1591#issuecomment-404755185___ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
We have another cluster of two kamailio for internal traffic, those are also on v5.1.4 but don't have KDMQ enabled yet, and their graphs look perfect. So I'm really starting to think this can be DMQ related. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/1591#issuecomment-404740630___ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
Some screenshots to illustrate what I mean: ![image](https://user-images.githubusercontent.com/16212586/42674283-f2d6c2da-8623-11e8-8df4-cf5b9859efd6.png) Same graph, with a Y-axe limited to 500: ![image](https://user-images.githubusercontent.com/16212586/42674318-112dc756-8624-11e8-8c06-c2fc345d08cf.png) Same as above, but shortening the time frame and with Y-limit of 100: ![image](https://user-images.githubusercontent.com/16212586/42674387-586dfa28-8624-11e8-956d-f9b1805883da.png) I see a type of pattern going further back in time is the following: - all is working - something happens (*see below note*) - dialogs number reported goes to 0 or close to 0 for a bit - dialogs report the 9223372036854776000 - things start working again? And, looking in the logs, at the ~time the spikes happen, I can see this in the logs: ``` root@kamailio2:~# cat /var/log/kamailio/kamailio.log-20180712 | grep -v NOTICE | grep -v "core/tcp_" ... Jul 12 02:54:24 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: dlg_ontimeout(): timeout for dlg with CallID 'orcr9cRwLT' and tags 'AV-f9MK~v' 'tetjZeg24aZeS' Jul 12 02:54:45 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: dlg_ontimeout(): timeout for dlg with CallID 'IRO6Y5Rw53' and tags 'cIaxfaScL' '9jBBFcc1NaUNQ' Jul 12 02:54:48 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: dlg_ontimeout(): timeout for dlg with CallID 'fw3hjesWFN' and tags '-3xLiat-a' 't1r1KrNrH0BcQ' Jul 12 02:54:50 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: dlg_ontimeout(): timeout for dlg with CallID 'w1rHrdpK2f' and tags 'HoJgX22ya' '4XmtD99FgyyDa' Jul 12 02:54:58 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: dlg_ontimeout(): timeout for dlg with CallID 'mrnlQ8OYuX' and tags 'yum6HTlKK' '766FZ05HaQXmj' Jul 12 02:55:02 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: dlg_ontimeout(): timeout for dlg with CallID 'lrbwFVn7Q8' and tags 'xOubxsLmV' '77Q419N7eHy6N' Jul 12 02:55:03 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: dlg_ontimeout(): timeout for dlg with CallID '14yaUN~xGG' and tags '6nSXFwBSb' 'mp1U2N57Ncg3p' Jul 12 02:55:05 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: dlg_ontimeout(): timeout for dlg with CallID 'hgI~lJdBJ3' and tags 'FzXjp4-YJ' 'a2jt4Hrv1H0cN' Jul 12 02:55:09 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: dlg_ontimeout(): timeout for dlg with CallID 'oBof16H0~A' and tags 'ZepyE21EK' '3ZXHF271aU7pH' Jul 12 02:55:13 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: dlg_ontimeout(): timeout for dlg with CallID 'XIvos7QACD' and tags 'UPsNWynuM' 'X3jF108jcX89N' Jul 12 02:55:13 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: dlg_ontimeout(): timeout for dlg with CallID 'ygaFfVdLn0' and tags '3Qbx8dkvK' '0yUyZrX83aSpa' Jul 12 02:55:18 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: dlg_ontimeout(): timeout for dlg with CallID 'm9G~N0LU9d' and tags 'wQ~cwqKHN' 'cNp0aHaS0XD4F' Jul 12 02:55:20 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: dlg_ontimeout(): timeout for dlg with CallID 'OUaCSKP2PZ' and tags 'W2-o~Knwc' 'epjHQN6yQj91e' Jul 12 02:55:22 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: dlg_ontimeout(): timeout for dlg with CallID 'wUtsFit9wE' and tags 'CmPHrLXm9' 'KKpSm5Fv718mD' Jul 12 02:55:24 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: dlg_ontimeout(): timeout for dlg with CallID 'i5FZlDb2Jt' and tags 'c7AO4enu~' '423vDyyBrU4cQ' Jul 12 02:55:26 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: dlg_ontimeout(): timeout for dlg with CallID 'XLKRf6e5Xq' and tags 'rSTom2iS3' '020pQS21HyQUK' Jul 12 02:55:29 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: dlg_ontimeout(): timeout for dlg with CallID 'nfBiavwuHu' and tags 'P3cHBXiea' 'NQ2ZBN8BZNSBF' Jul 12 02:55:31 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: dlg_ontimeout(): timeout for dlg with CallID '699iA2AdSO' and tags 'eZSvyxAYC' 'tj76025mK04Qe' Jul 12 02:55:31 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: dlg_ontimeout(): timeout for dlg with CallID 'XUuh-7CmGt' and tags 'GyhSzl-c7' 'r0mNXc4DSerjQ' Jul 12 02:55:33 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: dlg_ontimeout(): timeout for dlg with CallID 'oMau5CTVYF' and tags '42SPhgBwh' 'XQeBjgrpNUFpp' Jul 12 02:55:34 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: dlg_ontimeout(): timeout for dlg with CallID 'l0nKr0oj5c' and tags '2ZzNMM0b8' 'B6p7a3U8H0c9j' Jul 12 02:55:35 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: dlg_ontimeout(): timeout for dlg with CallID 'Kl49oLgGHV' and tags 'E5E5P9rFX' 'NB9ZFyjFN2cvD' Jul 12 02:55:35 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: dlg_ontimeout(): timeout for dlg with CallID 'i2lEbuiAFm' and tags 'TDExnc7Bx' 'Urv4SBvp5gyQQ' Jul 12 02:55:39 kamailio2 sbc[27795]:
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
@jchavanton: I will try to have a look @miconda: it reports 9223372036854776000 and after X time it goes back to good values. (where X can be sometimes on the very next request, sometimes after couple minutes, but haven't seen above that). But to your point: yes, it goes back to good values by itself. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/1591#issuecomment-404660178___ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
To clarify: some wrong value is returned, but afterwards it comes back to good values? Is it always the same wrong value? -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/1591#issuecomment-404581656___ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
Hi Joel, There was recently a little fix on initializing the counter from database. But this fix it is not in 5.1.4 anyway since the release was create June 5 and the fix was merged Could you try to isolate where the regression was introduced by building different version doing a checkout somewhere in the middle ``` git log --pretty=oneline 5.0.2...5.1.4 ``` -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/1591#issuecomment-404581159___ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
I updated the issue as I think this might not be related to KEX module and more with DIALOG module. As I'm not sure, I'm not specifying in the topic. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/1591#issuecomment-404555603___ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev