Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2019-03-11 Thread Daniel-Constantin Mierla
Closing with the solution of rpc "dlg.stats_active".

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-471442731___
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev


Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2019-03-11 Thread Daniel-Constantin Mierla
Closed #1591.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#event-2193117634___
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev


Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2018-10-26 Thread Joel Serrano
I just verified and removing DB_URL and DB_MODE from dialog modparams has the 
same behavior. In order to keep the thread in sync with the topic, I have 
created #1692 to track the metric not being updated. As for *this* issue we can 
either close it or continue troubleshooting on why the dialog counter from 
`stats.get_statistics` goes below 0.

@miconda what do you think?




-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-433486369___
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev


Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2018-10-24 Thread Joel Serrano
Hi @miconda, sorry for the delay, it's taken me a while until I have been able 
to test this..

I can confirm so far that `dlg.stats_active` works correctly, it's in sync 
between both nodes all the time, so any changes to the dialogs are reflected 
there instantly on both nodes. 

But, something must be wrong with `stats.get_statistics`:

Active K node:

```
root@sbc01:/etc/kamailio# kamcmd stats.get_statistics all | grep dialog
dialog:active_dialogs = 30
dialog:early_dialogs = 2
dialog:expired_dialogs = 14
dialog:failed_dialogs = 12176
dialog:processed_dialogs = 44907
root@sbc01:/etc/kamailio#
```

Backup K node (newly restarted):

```
root@sbc02:/etc/kamailio# kamcmd stats.get_statistics all | grep dialog
dialog:active_dialogs = 27
dialog:early_dialogs = 0
dialog:expired_dialogs = 0
dialog:failed_dialogs = 0
dialog:processed_dialogs = 0
root@sbc02:/etc/kamailio#
```

that `27` seems to me like an "initial sync" from the peer node, but that 
number won't change after that (which makes sense as this node is a backup and 
doesnt' handle any traffic unless there is a failover).

Does this make sense?

Thanks!



-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-432767538___
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev


Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2018-09-12 Thread Daniel-Constantin Mierla
This is only for the statistics.

The rpc command was not touched.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-420531298___
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev


Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2018-09-11 Thread Joel Serrano
Hi @miconda , I can absolutely test this.

Would your patch apply to return values for `dlg.stats_active` or to the dialog 
entries for `stats.get_statistics` ?? Or maybe to both? :)



-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-420415071___
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev


Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2018-09-11 Thread Daniel-Constantin Mierla
I pushed a patch (see above) to update stats for active and early dialogs on 
dmq operations. Can you give it a try and see if all ok with it.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-420299616___
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev


Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2018-08-15 Thread Daniel-Constantin Mierla
The rpc command `dlg.stats_active` goes over the internal hash table of dialog 
module and counts the dialogs that are in active state, so what ever is there, 
local or replicated dialogs.

Counters are exposed to inconsistency if adding/removing is triggered by events 
that can appear many times, so maybe the statistic related to active calls 
should be changed to a callback counting in a similar manner as the rpc 
command. There are other statistics that have a callback function behind 
instead of a counter.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-413202450___
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev


Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2018-07-31 Thread Joel Serrano
Hi @charlesrchance, thanks for the time to look into this.

What you say totally matches the "weird" behaviors I was seeing. You really 
gave me a lot of clarity understanding what is going on behind the scenes.

To answer your questions:

1. Yes, I have database but my idea was to move away from it for each thing I 
handle using DMQ. This also matches chronology of events, I have been 
restarting nodes here and there, I wasn't "losing" dialogs, but making them 
"orphaned" instead without knowing.

2. So there are (at least) a couple different options here:
2a) stat counters would have all dialogs (local + replicated), which in reality 
is the true value, as from kamailio's perspective, those dialogs are there and 
they are active aren't they?
2b) stat counters would only have local dialogs, so if you want to have full 
dialog count you would have to stack the values from all nodes.

Now there is also a `dlg.stats_active` that gives you current dialogs (not sure 
if it also shows replicated ones)  so maybe we can benefit from both options, a 
counter that includes both local and replicated dialogs, and a different 
counter (dlg.stats_active?) that has only the current-local-count of dialogs.

Since I fully restarted both SBCs (which in fact, that was clearing the 
orphaned dialogs) I haven't seen a spike (negative counter).

At this point I don't know what the best options is, I do understand perfectly 
the problem though.

@miconda what do you think about all this?







-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-409303111___
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev


Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2018-07-31 Thread Charles Chance
Having made some tests around this, whilst I have not yet been able to 
reproduce the negative counter issue, I do think there needs to be some further 
thought around dialog replication.

First thing to note - stats are not affected by replicated dialogs, so I don't 
think DMQ is _directly_ responsible for the negative counting. However - and 
this may be _indirectly_ related - if a node is restarted, any dialogs owned by 
it at the time will be forever 'stuck'. This is owing to the fact that in its 
current implementation, the dialog owner is responsible for triggering 
update/removal across the rest of the cluster. If the owner no longer exists - 
or has been restarted and has no idea that it was once the owner of some/all of 
the dialogs it receives in its initial re-sync - then this link is broken 
permanently. It is further compounded by the fact these orphaned dialogs never 
(in my tests, anyway) timeout.

I need to spend some more time on the DMQ side, since this is the first time I 
have looked at it properly. In the meantime, @joelsdc, regarding your issue:

1. Do you have database enabled alongside DMQ replication or was it only for 
testing? I suspect this is where the recent 38 'expired' dialogs came from - 
conversely, the earlier 'bad' dialogs you mentioned were likely a result of the 
owning node having been restarted (these would not have been included in the 
'expired' counter).

2. Are you expecting the stats counters (the original ones, not the new 
'dlg.stats_active') to reflect all dialogs across the cluster or just those 
handled directly by the local instance?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-409205552___
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev


Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2018-07-29 Thread Joel Serrano
More info:

On another cluster, same setup..., during this troubleshooting I disabled DMQ 
and enabled MySQL for dialog replication, I also left one node outside of 
rotation to see replication behavior.

Well, with 0 traffic I can see this:

```
root@sbc01:~# kamctl rpc dlg.stats_active
{
  "jsonrpc":  "2.0",
  "result": {
"starting": 0,
"connecting": 0,
"answering":  0,
"ongoing":  0,
"all":  0
  },
  "id": 2634
}
root@sbc01:~#
```

Which is correct, but...:

```
root@sbc01:~# kamctl rpc stats.fetch all | grep dialog
"dialog.active_dialogs":  "38",
"dialog.early_dialogs": "0",
"dialog.expired_dialogs": "38",
"dialog.failed_dialogs":  "0",
"dialog.processed_dialogs": "1",
root@sbc01:~#
```

I wonder if it's correct that those 38 `expired` dialogs still count towards 
the `active` counter? Could that have something to do? Maybe those 
non-existent-expired-dialogs because they are somewhere in the "active" 
counter, they get replicated only when DMQ is enabled?

So far I have clear that:

1- For sure there is a scenario where counters go below 0.
2- It only happens (so far) when DMQ is enabled
3- In my initial look, I found some logs about dialogs created on node1 were 
timed-out on node2 (see previous posts)
4- Now I see a discrepancy on active vs expired dialogs on a node with 0 active 
dialogs.

I feel we are slowly narrowing down what the possible problem can be, but I 
still don't have a clear picture what causes what.


-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-408690103___
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev


Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2018-07-28 Thread Joel Serrano
Could it have something to do with "old-non-expired-non-removed" dialogs?

So today I applied latest @charlesrchance patch, and what I normally do (so I 
don't lose dialogs) is:

1- Restart one node
2- Wait for DMQ replication dialog sync
3- Restart other node

Right after restarting both nodes in that sequence, I already could see values 
close to ULONG_MAX, so I did a `watch` every second and I could see the values 
move around.

I decided to compare the dialogs to see if I could find anything interesting, 
what I found is that there were a lot of dialogs with the `init_ts` from like 
11th and 13th July and being today 28th, those dialogs should not be there.

So I wonder if those "bad" dialogs could be the cause of the counters going 
below 0.

Maybe some timer to remove those dialogs didn't happen? and because of the way 
I perform restarts to not lose dialogs, they were constantly there replicated 
backwards and forwards...

That said, I did a full restart (both nodes at the same time, losing dialogs) 
and since then the metrics are looking good.

I hope this info helps.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-408629258___
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev


Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2018-07-18 Thread Joel Serrano
Thanks! Let me know if you need any more info

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-406000670___
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev


Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2018-07-18 Thread Daniel-Constantin Mierla
The values are closed to ULONG_MAX for 64bit, so likely some dialog end events 
are counted many times (likely due to dmq replication) and the value goes below 
0, turning it to max value.

For reference, ranges for number types:
  * https://en.wikibooks.org/wiki/C_Programming/limits.h

In the past days I was rather busy with other stuff, I will try to get back and 
analyze the code asap.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-405880927___
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev


Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2018-07-17 Thread Joel Serrano
I don't know if it helps, but I have one server constantly reporting 
incorrectly:

```
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551568",
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551567",
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551567",
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551566",
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551566",
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551566",
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551566",
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551564",
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551564",
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551563",
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551562",
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551565",
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551567",
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551568",
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551569",
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551569",
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551570",
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551570",
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551570",
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551569",
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551569",
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551570",
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551571",
root@kamailio1:~# kamctl rpc stats.get_statistics all | grep -e "dialog:active"
"dialog:active_dialogs = 18446744073709551572",
root@kamailio1:~#
``` 

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-405637294___
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev


Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2018-07-13 Thread Joel Serrano
Hi Daniel, 

This is what I can get:

```
root@kamailio2:/var/tmp# gdb /usr/sbin/kamailio -ex "bt full" --batch 
core.kamailio.1334.13cn4.1531505030 > 
core.kamailio.1334.13cn4.1531505030-bt_full.txt
66  ../../core/parser/../mem/../atomic/atomic_common.h: No such file or 
directory.
root@kamailio2:/var/tmp# gdb /usr/sbin/kamailio -ex "bt full" --batch 
core.kamailio.1335.13cn4.1531505030 > 
core.kamailio.1335.13cn4.1531505030-bt_full.txt
66  ../../core/parser/../mem/../atomic/atomic_common.h: No such file or 
directory.
root@kamailio2:/var/tmp# gdb /usr/sbin/kamailio -ex "bt full" --batch 
core.kamailio.1336.13cn4.1531505030 > 
core.kamailio.1336.13cn4.1531505030-bt_full.txt
66  ../../core/parser/../mem/../atomic/atomic_common.h: No such file or 
directory.
root@kamailio2:/var/tmp# gdb /usr/sbin/kamailio -ex "bt full" --batch 
core.kamailio.1337.13cn4.1531505030 > 
core.kamailio.1337.13cn4.1531505030-bt_full.txt
66  ../../core/parser/../mem/../atomic/atomic_common.h: No such file or 
directory.
root@kamailio2:/var/tmp# gdb /usr/sbin/kamailio -ex "bt full" --batch 
core.kamailio.1338.13cn4.1531505030 > 
core.kamailio.1338.13cn4.1531505030-bt_full.txt
66  ../../core/parser/../mem/../atomic/atomic_common.h: No such file or 
directory.
root@kamailio2:/var/tmp# gdb /usr/sbin/kamailio -ex "bt full" --batch 
core.kamailio.1339.13cn4.1531505030 > 
core.kamailio.1339.13cn4.1531505030-bt_full.txt
66  ../../core/parser/../mem/../atomic/atomic_common.h: No such file or 
directory.
root@kamailio2:/var/tmp# gdb /usr/sbin/kamailio -ex "bt full" --batch 
core.kamailio.24376.13cn4.1531504766 > 
ocre.kamailio.24376.13cn4.1531504766-bt_full.txt
51  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
root@kamailio2:/var/tmp# gdb /usr/sbin/kamailio -ex "bt full" --batch 
core.kamailio.28420.13cn4.1531504860 > 
ocre.kamailio.28420.13cn4.1531504860-bt_full.txt
BFD: Warning: /var/tmp/core.kamailio.28420.13cn4.1531504860 is truncated: 
expected core file size >= 8607031296, found: 2081087488.
Cannot access memory at address 0x7fc79b2dc148
Cannot access memory at address 0x7fc79b2dc140
Failed to read a valid object file image from memory.
root@kamailio2:/var/tmp# gdb /usr/sbin/kamailio -ex "bt full" --batch 
core.kamailio.28421.13cn4.1531504860 > 
ocre.kamailio.28421.13cn4.1531504860-bt_full.txt
BFD: Warning: /var/tmp/core.kamailio.28421.13cn4.1531504860 is truncated: 
expected core file size >= 8607031296, found: 2136645632.
Cannot access memory at address 0x7fc79b2dc148
Cannot access memory at address 0x7fc79b2dc140
Failed to read a valid object file image from memory.
root@kamailio2:/var/tmp# gdb /usr/sbin/kamailio -ex "bt full" --batch 
core.kamailio.28422.13cn4.1531504860 > 
ocre.kamailio.28422.13cn4.1531504860-bt_full.txt
BFD: Warning: /var/tmp/core.kamailio.28422.13cn4.1531504860 is truncated: 
expected core file size >= 8607031296, found: 2128039936.
Cannot access memory at address 0x7fc79b2dc148
Cannot access memory at address 0x7fc79b2dc140
Failed to read a valid object file image from memory.
root@kamailio2:/var/tmp# gdb /usr/sbin/kamailio -ex "bt full" --batch 
core.kamailio.28423.13cn4.1531504860 > 
ocre.kamailio.28423.13cn4.1531504860-bt_full.txt
BFD: Warning: /var/tmp/core.kamailio.28423.13cn4.1531504860 is truncated: 
expected core file size >= 8607031296, found: 2174545920.
Cannot access memory at address 0x7fc79b2dc148
Cannot access memory at address 0x7fc79b2dc140
Failed to read a valid object file image from memory.
root@kamailio2:/var/tmp#
```

[core.kamailio.1334.13cn4.1531505030-bt_full.txt](https://github.com/kamailio/kamailio/files/2193740/core.kamailio.1334.13cn4.1531505030-bt_full.txt)
[core.kamailio.1335.13cn4.1531505030-bt_full.txt](https://github.com/kamailio/kamailio/files/2193741/core.kamailio.1335.13cn4.1531505030-bt_full.txt)
[core.kamailio.1336.13cn4.1531505030-bt_full.txt](https://github.com/kamailio/kamailio/files/2193742/core.kamailio.1336.13cn4.1531505030-bt_full.txt)
[core.kamailio.1337.13cn4.1531505030-bt_full.txt](https://github.com/kamailio/kamailio/files/2193743/core.kamailio.1337.13cn4.1531505030-bt_full.txt)
[core.kamailio.1338.13cn4.1531505030-bt_full.txt](https://github.com/kamailio/kamailio/files/2193744/core.kamailio.1338.13cn4.1531505030-bt_full.txt)
[core.kamailio.1339.13cn4.1531505030-bt_full.txt](https://github.com/kamailio/kamailio/files/2193745/core.kamailio.1339.13cn4.1531505030-bt_full.txt)
[ocre.kamailio.24376.13cn4.1531504766-bt_full.txt](https://github.com/kamailio/kamailio/files/2193746/ocre.kamailio.24376.13cn4.1531504766-bt_full.txt)
[ocre.kamailio.28420.13cn4.1531504860-bt_full.txt](https://github.com/kamailio/kamailio/files/2193747/ocre.kamailio.28420.13cn4.1531504860-bt_full.txt)
[ocre.kamailio.28421.13cn4.1531504860-bt_full.txt](https://github.com/kamailio/kamailio/files/2193748/ocre.kamailio.28421.13cn4.1531504860-bt_full.txt)

Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2018-07-13 Thread Joel Serrano
Hi Daniel, 

On my initial test in prod, I can see already discrepancies:

```
root@kamailio2:~# kamctl rpc dlg.stats_active
{
  "jsonrpc":  "2.0",
  "result": {
"starting": 4,
"connecting": 55,
"answering":  2,
"ongoing":  253,
"all":  314
  },
  "id": 23360
}
root@kamailio2:~# kamctl rpc stats.get_statistics all | grep dialog
"dialog:active_dialogs = 18446744073709551604",
"dialog:early_dialogs = 15",
"dialog:expired_dialogs = 0",
"dialog:failed_dialogs = 120",
"dialog:processed_dialogs = 340",
root@kamailio2:~#
```

Now we can go past 9223372036854776000 ??? I don't understand that.

That was right after restarting (green active dialogs, yellow early dialogs):

![image](https://user-images.githubusercontent.com/16212586/42706316-2926ed4e-868b-11e8-8f62-08bddc849d58.png)


Here is multiple runs of the stats.get_statistics:

```
root@kamailio2:~# kamctl rpc stats.get_statistics all | grep dialog
"dialog:active_dialogs = 18446744073709551614",
"dialog:early_dialogs = 8",
"dialog:expired_dialogs = 0",
"dialog:failed_dialogs = 315",
"dialog:processed_dialogs = 863",
root@kamailio2:~# kamctl rpc stats.get_statistics all | grep dialog
"dialog:active_dialogs = 18446744073709551614",
"dialog:early_dialogs = 8",
"dialog:expired_dialogs = 0",
"dialog:failed_dialogs = 315",
"dialog:processed_dialogs = 864",
root@kamailio2:~# kamctl rpc stats.get_statistics all | grep dialog
"dialog:active_dialogs = 18446744073709551614",
"dialog:early_dialogs = 8",
"dialog:expired_dialogs = 0",
"dialog:failed_dialogs = 315",
"dialog:processed_dialogs = 864",
root@kamailio2:~# kamctl rpc stats.get_statistics all | grep dialog
"dialog:active_dialogs = 18446744073709551613",
"dialog:early_dialogs = 10",
"dialog:expired_dialogs = 0",
"dialog:failed_dialogs = 320",
"dialog:processed_dialogs = 866",
root@kamailio2:~# kamctl rpc stats.get_statistics all | grep dialog
"dialog:active_dialogs = 0",
"dialog:early_dialogs = 8",
"dialog:expired_dialogs = 0",
"dialog:failed_dialogs = 322",
"dialog:processed_dialogs = 870",
root@kamailio2:~# kamctl rpc stats.get_statistics all | grep dialog
"dialog:active_dialogs = 1",
"dialog:early_dialogs = 9",
"dialog:expired_dialogs = 0",
"dialog:failed_dialogs = 323",
"dialog:processed_dialogs = 873",
root@kamailio2:~# kamctl rpc stats.get_statistics all | grep dialog
"dialog:active_dialogs = 18446744073709551615",
"dialog:early_dialogs = 9",
"dialog:expired_dialogs = 0",
"dialog:failed_dialogs = 324",
"dialog:processed_dialogs = 875",
root@kamailio2:~# kamctl rpc stats.get_statistics all | grep dialog
"dialog:active_dialogs = 18446744073709551615",
"dialog:early_dialogs = 10",
"dialog:expired_dialogs = 0",
"dialog:failed_dialogs = 325",
"dialog:processed_dialogs = 876",
root@kamailio2:~#
```

I tried to have a script gathering metrics:

```
root@kamailio2:~# cat dialogs.sh
#!/bin/bash

while true; do
DIALOGS=`kamctl rpc dlg.list | grep h_entry | sort  | wc -l`
METRICS=`kamctl rpc stats.get_statistics all | grep -e "dialog:active" -e 
"dialog:early" | tr -d " " | tr -d "\n"`
NEWMETRICS=`kamctl rpc dlg.stats_active | jq -c .result`
echo "`date +'%Y-%m-%d %H:%M:%S.%N'` - dlg.list:$DIALOGS - 
stats.get_statistics:$METRICS - dlg.stats_active:$NEWMETRICS"
sleep 1
done
root@kamailio2:~#
```

But, after a little kamailio "starts slowing down" until it dies (had to 
completely reboot the server).

```
root@kamailio2:~# ./dialogs.sh
2018-07-13 10:57:14.204158716 - dlg.list:419 - 
stats.get_statistics:"dialog:active_dialogs=13","dialog:early_dialogs=12", - 
dlg.stats_active:{"starting":5,"connecting":62,"answering":7,"ongoing":288,"all":362}
2018-07-13 10:57:15.437165579 - dlg.list:422 - 
stats.get_statistics:"dialog:active_dialogs=15","dialog:early_dialogs=11", - 
dlg.stats_active:{"starting":4,"connecting":59,"answering":10,"ongoing":289,"all":362}
2018-07-13 10:57:16.678432892 - dlg.list:423 - 
stats.get_statistics:"dialog:active_dialogs=14","dialog:early_dialogs=9", - 
dlg.stats_active:{"starting":2,"connecting":58,"answering":9,"ongoing":289,"all":358}
2018-07-13 10:57:17.915458776 - dlg.list:422 - 
stats.get_statistics:"dialog:active_dialogs=13","dialog:early_dialogs=9", - 
dlg.stats_active:{"starting":4,"connecting":60,"answering":7,"ongoing":289,"all":360}
2018-07-13 10:57:19.149467641 - dlg.list:420 - 
stats.get_statistics:"dialog:active_dialogs=13","dialog:early_dialogs=8", - 
dlg.stats_active:{"starting":3,"connecting":60,"answering":5,"ongoing":290,"all":358}
2018-07-13 10:57:20.386457513 - dlg.list:415 - 
stats.get_statistics:"dialog:active_dialogs=14","dialog:early_dialogs=7", - 
dlg.stats_active:{"starting":4,"connecting":59,"answering":5,"ongoing":290,"all":358}
2018-07-13 10:57:21.627241266 - dlg.list:412 - 
stats.get_statistics:"dialog:active_dialogs=15","dialog:early_dialogs=7", - 

Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2018-07-13 Thread Joel Serrano
I'm building v5.1.4 packages with that commit cherry-picked. I will deploy as 
soon as I have run a few tests in lab. I have noticed that the issue happens 
only on low traffic, so it matches your theory of the value going "below 0"...

I was thinking if running a script every second gathering on both nodes:

```
kamctl rpc dlg.stats_active
kamctl rpc dlg.list | grep h_entry | sort  | wc -l
kamctl rpc stats.get_statistics all | grep -e "dialog:active" -e "dialog:early" 
| tr -d " " | tr -d "\n"
```

I will also enable debug logs on `kamailio2` node (has slightly less traffic 
than `kamailio1`) maybe that helps too.

Thank you! I'll report back results tomorrow.



-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-404894491___
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev


Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2018-07-13 Thread Daniel-Constantin Mierla
As an alternative to get the stats of active dialogs, I added an rpc command 
that scans the internal dialog hash table:

  * 
https://github.com/kamailio/kamailio/commit/ebb149066690f7d96f45e1639e0c5ca9616bbbe0

It is slightly slower, given the scan in real time, that the stats using the 
core counters, but should be fast enough not to affect the performances.

It can be used to compare the results eventually and detect when an anomaly in 
the counter based stats happen.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-404883239___
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev


Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2018-07-13 Thread Daniel-Constantin Mierla
It may be that the value is decremented more than the active calls and "goes 
below 0", but the counter is an 'unsigned long' value, so from 0 jumps to 
maximum value. It may be related to DMQ replication. Based on feedback, so far 
all was pretty accurate with the stats for active dialogs in an instance 
without DMQ. So this path worth checking.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-404755185___
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev


Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2018-07-13 Thread Joel Serrano
We have another cluster of two kamailio for internal traffic, those are also on 
v5.1.4 but don't have KDMQ enabled yet, and their graphs look perfect. So I'm 
really starting to think this can be DMQ related.


-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-404740630___
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev


Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2018-07-13 Thread Joel Serrano
Some screenshots to illustrate what I mean:

![image](https://user-images.githubusercontent.com/16212586/42674283-f2d6c2da-8623-11e8-8df4-cf5b9859efd6.png)

Same graph, with a Y-axe limited to 500:

![image](https://user-images.githubusercontent.com/16212586/42674318-112dc756-8624-11e8-8c06-c2fc345d08cf.png)

Same as above, but shortening the time frame and with Y-limit of 100:

![image](https://user-images.githubusercontent.com/16212586/42674387-586dfa28-8624-11e8-956d-f9b1805883da.png)

I see a type of pattern going further back in time is the following:

- all is working
- something happens (*see below note*)
- dialogs number reported goes to 0 or close to 0 for a bit
- dialogs report the 9223372036854776000
- things start working again?



And, looking in the logs, at the ~time the spikes happen, I can see this in the 
logs:

```
root@kamailio2:~# cat /var/log/kamailio/kamailio.log-20180712 | grep -v NOTICE 
| grep -v "core/tcp_"
...
Jul 12 02:54:24 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'orcr9cRwLT' and tags 'AV-f9MK~v' 
'tetjZeg24aZeS'
Jul 12 02:54:45 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'IRO6Y5Rw53' and tags 'cIaxfaScL' 
'9jBBFcc1NaUNQ'
Jul 12 02:54:48 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'fw3hjesWFN' and tags '-3xLiat-a' 
't1r1KrNrH0BcQ'
Jul 12 02:54:50 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'w1rHrdpK2f' and tags 'HoJgX22ya' 
'4XmtD99FgyyDa'
Jul 12 02:54:58 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'mrnlQ8OYuX' and tags 'yum6HTlKK' 
'766FZ05HaQXmj'
Jul 12 02:55:02 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'lrbwFVn7Q8' and tags 'xOubxsLmV' 
'77Q419N7eHy6N'
Jul 12 02:55:03 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID '14yaUN~xGG' and tags '6nSXFwBSb' 
'mp1U2N57Ncg3p'
Jul 12 02:55:05 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'hgI~lJdBJ3' and tags 'FzXjp4-YJ' 
'a2jt4Hrv1H0cN'
Jul 12 02:55:09 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'oBof16H0~A' and tags 'ZepyE21EK' 
'3ZXHF271aU7pH'
Jul 12 02:55:13 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'XIvos7QACD' and tags 'UPsNWynuM' 
'X3jF108jcX89N'
Jul 12 02:55:13 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'ygaFfVdLn0' and tags '3Qbx8dkvK' 
'0yUyZrX83aSpa'
Jul 12 02:55:18 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'm9G~N0LU9d' and tags 'wQ~cwqKHN' 
'cNp0aHaS0XD4F'
Jul 12 02:55:20 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'OUaCSKP2PZ' and tags 'W2-o~Knwc' 
'epjHQN6yQj91e'
Jul 12 02:55:22 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'wUtsFit9wE' and tags 'CmPHrLXm9' 
'KKpSm5Fv718mD'
Jul 12 02:55:24 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'i5FZlDb2Jt' and tags 'c7AO4enu~' 
'423vDyyBrU4cQ'
Jul 12 02:55:26 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'XLKRf6e5Xq' and tags 'rSTom2iS3' 
'020pQS21HyQUK'
Jul 12 02:55:29 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'nfBiavwuHu' and tags 'P3cHBXiea' 
'NQ2ZBN8BZNSBF'
Jul 12 02:55:31 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID '699iA2AdSO' and tags 'eZSvyxAYC' 
'tj76025mK04Qe'
Jul 12 02:55:31 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'XUuh-7CmGt' and tags 'GyhSzl-c7' 
'r0mNXc4DSerjQ'
Jul 12 02:55:33 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'oMau5CTVYF' and tags '42SPhgBwh' 
'XQeBjgrpNUFpp'
Jul 12 02:55:34 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'l0nKr0oj5c' and tags '2ZzNMM0b8' 
'B6p7a3U8H0c9j'
Jul 12 02:55:35 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'Kl49oLgGHV' and tags 'E5E5P9rFX' 
'NB9ZFyjFN2cvD'
Jul 12 02:55:35 kamailio2 sbc[27795]: WARNING: dialog [dlg_handlers.c:1596]: 
dlg_ontimeout(): timeout for dlg with CallID 'i2lEbuiAFm' and tags 'TDExnc7Bx' 
'Urv4SBvp5gyQQ'
Jul 12 02:55:39 kamailio2 sbc[27795]: 

Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2018-07-12 Thread Joel Serrano
@jchavanton: I will try to have a look 

@miconda: it reports 9223372036854776000  and after X time it goes back to good 
values. (where X can be sometimes on the very next request, sometimes after 
couple minutes, but haven't seen above that). But to your point: yes, it goes 
back to good values by itself.





-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-404660178___
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev


Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2018-07-12 Thread Daniel-Constantin Mierla
To clarify: some wrong value is returned, but afterwards it comes back to good 
values? Is it always the same wrong value?


-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-404581656___
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev


Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2018-07-12 Thread Julien Chavanton
Hi Joel, 
There was recently a little fix on initializing the counter from database.
But this fix it is not in 5.1.4 anyway since the release was create June 5 and 
the fix was merged  

Could you try to isolate where the regression was introduced
by building different version doing a checkout somewhere in the middle
```
git log --pretty=oneline 5.0.2...5.1.4
```

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-404581159___
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev


Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)

2018-07-12 Thread Joel Serrano
I updated the issue as I think this might not be related to KEX module and more 
with DIALOG module. As I'm not sure, I'm not specifying in the topic.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-404555603___
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev