Closing, with the solution being the rpc `dlg.stats_active`.
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-471442731
Closed #1591.
Kamailio (SER) - Development Mailing List
I just verified, and removing DB_URL and DB_MODE from the dialog modparams results in the
same behavior. To keep the thread in sync with the topic, I have
created #1692 to track the metric not being updated. As for *this* issue, we can
either close it or continue troubleshooting why the dialog
Hi @miconda, sorry for the delay, it has taken me a while to be able
to test this.
I can confirm so far that `dlg.stats_active` works correctly, it's in sync
between both nodes all the time, so any changes to the dialogs are reflected
there instantly on both nodes.
But, something
This is only for the statistics.
The rpc command was not touched.
Hi @miconda , I can absolutely test this.
Would your patch apply to the return values of `dlg.stats_active` or to the dialog
entries from `stats.get_statistics`? Or maybe to both? :)
I pushed a patch (see above) to update the stats for active and early dialogs on
dmq operations. Can you give it a try and see if all is OK with it?
The rpc command `dlg.stats_active` iterates over the internal hash table of the dialog
module and counts the dialogs that are in active state, so whatever is there,
local or replicated dialogs alike.
Counters are exposed to inconsistency if adding/removing is triggered by events
that can occur many times,
Hi @charlesrchance, thanks for the time to look into this.
What you say totally matches the "weird" behaviors I was seeing. You really
gave me a lot of clarity about what is going on behind the scenes.
To answer your questions:
1. Yes, I have a database, but my idea was to move away from
Having made some tests around this, whilst I have not yet been able to
reproduce the negative counter issue, I do think there needs to be some further
thought around dialog replication.
First thing to note - stats are not affected by replicated dialogs, so I don't
think DMQ is _directly_
More info:
On another cluster with the same setup..., during this troubleshooting I disabled DMQ
and enabled MySQL for dialog replication; I also left one node out of
rotation to observe the replication behavior.
Well, with zero traffic I can see this:
```
root@sbc01:~# kamctl rpc dlg.stats_active
{
```
Could it have something to do with "old-non-expired-non-removed" dialogs?
So today I applied the latest patch from @charlesrchance, and what I normally do (so I
don't lose dialogs) is:
1- Restart one node
2- Wait for DMQ replication dialog sync
3- Restart other node
Right after restarting both nodes in
Thanks! Let me know if you need any more info
The values are close to ULONG_MAX for 64-bit, so likely some dialog end events
are counted many times (likely due to dmq replication) and the value goes below
0, wrapping around to the maximum value.
For reference, ranges for number types:
* https://en.wikibooks.org/wiki/C_Programming/limits.h
In the
I don't know if it helps, but I have one server constantly reporting
incorrectly:
```
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551568",
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e
```
Hi Daniel,
This is what I can get:
```
root@kamailio2:/var/tmp# gdb /usr/sbin/kamailio -ex "bt full" --batch core.kamailio.1334.13cn4.1531505030 > core.kamailio.1334.13cn4.1531505030-bt_full.txt
66  ../../core/parser/../mem/../atomic/atomic_common.h: No such file or directory.
```
Hi Daniel,
On my initial test in prod, I can see already discrepancies:
```
root@kamailio2:~# kamctl rpc dlg.stats_active
{
  "jsonrpc": "2.0",
  "result": {
    "starting": 4,
    "connecting": 55,
    "answering": 2,
    "ongoing": 253,
    "all": 314
  },
  "id": 23360
}
```
I'm building v5.1.4 packages with that commit cherry-picked. I will deploy as
soon as I have run a few tests in the lab. I have noticed that the issue happens
only on low traffic, so it matches your theory of the value going "below 0"...
I was thinking if running a script every second gathering on
As an alternative to get the stats of active dialogs, I added an rpc command
that scans the internal dialog hash table:
*
https://github.com/kamailio/kamailio/commit/ebb149066690f7d96f45e1639e0c5ca9616bbbe0
It is slightly slower, given the scan in real time, than the stats using the
core
It may be that the value is decremented more times than there are active calls and "goes
below 0", but the counter is an 'unsigned long' value, so from 0 it jumps to the
maximum value. It may be related to DMQ replication. Based on feedback, so far
all was pretty accurate with the stats for active dialogs in an
We have another cluster of two Kamailio nodes for internal traffic; those are also on
v5.1.4 but don't have KDMQ enabled yet, and their graphs look perfect. So I'm
really starting to think this can be DMQ related.
Some screenshots to illustrate what I mean:
![image](https://user-images.githubusercontent.com/16212586/42674283-f2d6c2da-8623-11e8-8df4-cf5b9859efd6.png)
Same graph, with the Y-axis limited to 500:
@jchavanton: I will try to have a look
@miconda: it reports 9223372036854776000 and after X time it goes back to good
values (where X can sometimes be the very next request, sometimes a
couple of minutes, but I haven't seen more than that). But to your point: yes, it goes
back to good values.
To clarify: some wrong value is returned, but afterwards it comes back to good
values? Is it always the same wrong value?
Hi Joel,
There was recently a little fix on initializing the counter from the database.
But this fix is not in 5.1.4 anyway, since the release was created June 5 and
the fix was merged
Could you try to isolate where the regression was introduced
by building different versions, doing a checkout
I updated the issue, as I think this might not be related to the KEX module but
rather to the DIALOG module. As I'm not sure, I'm not specifying it in the topic.