Re: [SR-Users] qm_find_free() Free fragment not found, called from xcap_server, no more pkg

2019-09-23 Thread Cody Herzog
Excellent.

Thanks again, Henning.

From: Henning Westerholt [mailto:h...@skalatan.de]
Sent: Sunday, September 22, 2019 3:58 AM
To: Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org>; Cody Herzog <cher...@intouchhealth.com>
Subject: Re: [SR-Users] qm_find_free() Free fragment not found, called from 
xcap_server, no more pkg


Hello Cody,

the joining of free fragments is only done when memory is freed. The additional 
code that is executed is very small, so there will be no (observable) 
performance impact.

Cheers,

Henning
On 21.09.19 at 03:59, Cody Herzog wrote:
Thanks very much, Henning.

We will experiment with that option.

I wonder if there will be any performance impact with mem_join enabled.

We've still had no luck reproducing the issue, so we will keep trying different 
long-duration load tests to see if we can trigger it.

Thanks again.

From: Henning Westerholt [mailto:h...@skalatan.de]
Sent: Thursday, September 19, 2019 2:22 PM
To: Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org>; Cody Herzog <cher...@intouchhealth.com>
Subject: Re: [SR-Users] qm_find_free() Free fragment not found, called from 
xcap_server, no more pkg


Hello Cody,

you only need to enable the mem_join parameter; no compile-time switch is 
necessary (anymore).

Cheers,

Henning
On 19.09.19 at 23:15, Cody Herzog wrote:
Thanks for the quick and useful response.

Increasing the package memory seems like a simple and safe thing to do. Also, 
we are indeed planning to upgrade soon.

Regarding 'mem_join', based on the documentation, I assume we will have to 
compile with MEM_JOIN_FREE in order for it to work. Is that correct?

We are still hoping to be able to reproduce this issue in our dev environment, 
so that we can prove our changes are helping. Can anyone think of a way we 
might be able to intentionally cause some memory fragmentation? Is there a 
particular type of access pattern, or a series of operations we can repeat many 
times to induce fragmentation?

We've been running heavy SIPp load tests for many days which exercise the 
following:

-Registration
-Subscription to large RLS contact lists
-Subscription to presence of many contacts

So far, we haven't been able to reproduce.

Thanks again.

From: Daniel-Constantin Mierla [mailto:mico...@gmail.com]
Sent: Wednesday, September 18, 2019 11:32 PM
To: Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org>; Cody Herzog <cher...@intouchhealth.com>
Subject: Re: [SR-Users] qm_find_free() Free fragment not found, called from 
xcap_server, no more pkg


Hello,

first, I would recommend upgrading to the latest release in the v5.1.x series; 
5.1.0 was the first one in that series, and many issues have been fixed in the 
5.1 branch since then that will ensure a smoother run in the long term.

The reported issue is likely related to memory fragmentation; try setting the 
global parameter:

mem_join=1

As the instance needs to deal with large chunks for XCAP documents, I would 
also suggest increasing the pkg memory pool a bit via the -M command line 
parameter -- I see it is 8MB now; make it 12 or 16.

Cheers,
Daniel
On 19.09.19 01:07, Cody Herzog wrote:
Hello.

Recently, for the first time, we experienced an apparent memory issue in our 
production environment. Here's an example of the relevant log messages:

Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2302]: ERROR:  
[core/mem/q_malloc.c:286]: qm_find_free(): qm_find_free(0x7fd8bbcfc010, 
134648); Free fragment not found!
Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2302]: ERROR:  
[core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7fd8bbcfc010, 134648) 
called from xcap_server: xcap_server.c: ki_xcaps_put(549), module: xcap_server; 
Free fragment not found!
Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2302]: ERROR: xcap_server 
[xcap_server.c:552]: ki_xcaps_put(): no more pkg
Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2252]: ERROR: app_perl 
[kamailioxs.xs:487]: XS_Kamailio_log(): 500 Server error

The failed operation was the XCAP server module trying to generate a very large 
RLS contact list for a user. The issue only impacted users with very large 
lists, as though a large contiguous block of memory could not be found, whereas 
other smaller allocations continued to work fine. We believe the requested 
allocation was around 112 KB in size.
The server had been up for 14 days. We were able to work around the issue 
temporarily by just restarting the Kamailio service. It's unusual, because our 
production server is often up for months, and we've never seen this issue 
before. The load on production is increasing slowly due to an increased 
concurrent user count, so that might be related.

Before restarting the service on production, we captured the output of the 
following commands:

kamctl stats shmem
kamcmd mod.stats all shm
kamcmd pkg.stats
kamcmd mod.stats all pkg

Here's the shared mem output:

{
  "jsonrpc":  "2.0",
  "result": [
"shmem:fragments = 27240",

Re: [SR-Users] qm_find_free() Free fragment not found, called from xcap_server, no more pkg

2019-09-22 Thread Henning Westerholt
Hello Cody,

the joining of free fragments is only done when memory is freed. The additional 
code that is executed is very small, so there will be no (observable) 
performance impact.
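
If you want to see the effect on fragmentation under load, you can simply 
compare the fragment counters before and after enabling it, e.g.:

kamcmd pkg.stats | grep total_frags      # per-process pkg fragments
kamctl stats shmem | grep fragments      # shared memory fragments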

Cheers,

Henning

On 21.09.19 at 03:59, Cody Herzog wrote:
Thanks very much, Henning.

We will experiment with that option.

I wonder if there will be any performance impact with mem_join enabled.

We've still had no luck reproducing the issue, so we will keep trying different 
long-duration load tests to see if we can trigger it.

Thanks again.

From: Henning Westerholt [mailto:h...@skalatan.de]
Sent: Thursday, September 19, 2019 2:22 PM
To: Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org>; Cody Herzog <cher...@intouchhealth.com>
Subject: Re: [SR-Users] qm_find_free() Free fragment not found, called from 
xcap_server, no more pkg


Hello Cody,

you only need to enable the mem_join parameter; no compile-time switch is 
necessary (anymore).

Cheers,

Henning
On 19.09.19 at 23:15, Cody Herzog wrote:
Thanks for the quick and useful response.

Increasing the package memory seems like a simple and safe thing to do. Also, 
we are indeed planning to upgrade soon.

Regarding 'mem_join', based on the documentation, I assume we will have to 
compile with MEM_JOIN_FREE in order for it to work. Is that correct?

We are still hoping to be able to reproduce this issue in our dev environment, 
so that we can prove our changes are helping. Can anyone think of a way we 
might be able to intentionally cause some memory fragmentation? Is there a 
particular type of access pattern, or a series of operations we can repeat many 
times to induce fragmentation?

We’ve been running heavy SIPp load tests for many days which exercise the 
following:

-Registration
-Subscription to large RLS contact lists
-Subscription to presence of many contacts

So far, we haven't been able to reproduce.

Thanks again.

From: Daniel-Constantin Mierla [mailto:mico...@gmail.com]
Sent: Wednesday, September 18, 2019 11:32 PM
To: Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org>; Cody Herzog <cher...@intouchhealth.com>
Subject: Re: [SR-Users] qm_find_free() Free fragment not found, called from 
xcap_server, no more pkg


Hello,

first, I would recommend upgrading to the latest release in the v5.1.x series; 
5.1.0 was the first one in that series, and many issues have been fixed in the 
5.1 branch since then that will ensure a smoother run in the long term.

The reported issue is likely related to memory fragmentation; try setting the 
global parameter:

mem_join=1

As the instance needs to deal with large chunks for XCAP documents, I would 
also suggest increasing the pkg memory pool a bit via the -M command line 
parameter -- I see it is 8MB now; make it 12 or 16.

Cheers,
Daniel
On 19.09.19 01:07, Cody Herzog wrote:
Hello.

Recently, for the first time, we experienced an apparent memory issue in our 
production environment. Here's an example of the relevant log messages:

Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2302]: ERROR:  
[core/mem/q_malloc.c:286]: qm_find_free(): qm_find_free(0x7fd8bbcfc010, 
134648); Free fragment not found!
Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2302]: ERROR:  
[core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7fd8bbcfc010, 134648) 
called from xcap_server: xcap_server.c: ki_xcaps_put(549), module: xcap_server; 
Free fragment not found!
Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2302]: ERROR: xcap_server 
[xcap_server.c:552]: ki_xcaps_put(): no more pkg
Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2252]: ERROR: app_perl 
[kamailioxs.xs:487]: XS_Kamailio_log(): 500 Server error

The failed operation was the XCAP server module trying to generate a very large 
RLS contact list for a user. The issue only impacted users with very large 
lists, as though a large contiguous block of memory could not be found, whereas 
other smaller allocations continued to work fine. We believe the requested 
allocation was around 112 KB in size.
The server had been up for 14 days. We were able to work around the issue 
temporarily by just restarting the Kamailio service. It's unusual, because our 
production server is often up for months, and we've never seen this issue 
before. The load on production is increasing slowly due to an increased 
concurrent user count, so that might be related.

Before restarting the service on production, we captured the output of the 
following commands:

kamctl stats shmem
kamcmd mod.stats all shm
kamcmd pkg.stats
kamcmd mod.stats all pkg

Here's the shared mem output:

{
  "jsonrpc":  "2.0",
  "result": [
"shmem:fragments = 27240",
"shmem:free_size = 447203296",
"shmem:max_used_size = 116175576",
"shmem:real_used_size = 89667616",
"shmem:total_size = 536870912",
"shmem:used_size = 68824240"
  ],
  "id": 6934
}

Here's the pkg output for the p

Re: [SR-Users] qm_find_free() Free fragment not found, called from xcap_server, no more pkg

2019-09-20 Thread Cody Herzog
Thanks very much, Henning.

We will experiment with that option.

I wonder if there will be any performance impact with mem_join enabled.

We've still had no luck reproducing the issue, so we will keep trying different 
long-duration load tests to see if we can trigger it.

Thanks again.

From: Henning Westerholt [mailto:h...@skalatan.de]
Sent: Thursday, September 19, 2019 2:22 PM
To: Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org>; Cody Herzog <cher...@intouchhealth.com>
Subject: Re: [SR-Users] qm_find_free() Free fragment not found, called from 
xcap_server, no more pkg


Hello Cody,

you only need to enable the mem_join parameter; no compile-time switch is 
necessary (anymore).

Cheers,

Henning
On 19.09.19 at 23:15, Cody Herzog wrote:
Thanks for the quick and useful response.

Increasing the package memory seems like a simple and safe thing to do. Also, 
we are indeed planning to upgrade soon.

Regarding 'mem_join', based on the documentation, I assume we will have to 
compile with MEM_JOIN_FREE in order for it to work. Is that correct?

We are still hoping to be able to reproduce this issue in our dev environment, 
so that we can prove our changes are helping. Can anyone think of a way we 
might be able to intentionally cause some memory fragmentation? Is there a 
particular type of access pattern, or a series of operations we can repeat many 
times to induce fragmentation?

We've been running heavy SIPp load tests for many days which exercise the 
following:

-Registration
-Subscription to large RLS contact lists
-Subscription to presence of many contacts

So far, we haven't been able to reproduce.

Thanks again.

From: Daniel-Constantin Mierla [mailto:mico...@gmail.com]
Sent: Wednesday, September 18, 2019 11:32 PM
To: Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org>; Cody Herzog <cher...@intouchhealth.com>
Subject: Re: [SR-Users] qm_find_free() Free fragment not found, called from 
xcap_server, no more pkg


Hello,

first, I would recommend upgrading to the latest release in the v5.1.x series; 
5.1.0 was the first one in that series, and many issues have been fixed in the 
5.1 branch since then that will ensure a smoother run in the long term.

The reported issue is likely related to memory fragmentation; try setting the 
global parameter:

mem_join=1

As the instance needs to deal with large chunks for XCAP documents, I would 
also suggest increasing the pkg memory pool a bit via the -M command line 
parameter -- I see it is 8MB now; make it 12 or 16.

Cheers,
Daniel
On 19.09.19 01:07, Cody Herzog wrote:
Hello.

Recently, for the first time, we experienced an apparent memory issue in our 
production environment. Here's an example of the relevant log messages:

Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2302]: ERROR:  
[core/mem/q_malloc.c:286]: qm_find_free(): qm_find_free(0x7fd8bbcfc010, 
134648); Free fragment not found!
Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2302]: ERROR:  
[core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7fd8bbcfc010, 134648) 
called from xcap_server: xcap_server.c: ki_xcaps_put(549), module: xcap_server; 
Free fragment not found!
Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2302]: ERROR: xcap_server 
[xcap_server.c:552]: ki_xcaps_put(): no more pkg
Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2252]: ERROR: app_perl 
[kamailioxs.xs:487]: XS_Kamailio_log(): 500 Server error

The failed operation was the XCAP server module trying to generate a very large 
RLS contact list for a user. The issue only impacted users with very large 
lists, as though a large contiguous block of memory could not be found, whereas 
other smaller allocations continued to work fine. We believe the requested 
allocation was around 112 KB in size.
The server had been up for 14 days. We were able to work around the issue 
temporarily by just restarting the Kamailio service. It's unusual, because our 
production server is often up for months, and we've never seen this issue 
before. The load on production is increasing slowly due to an increased 
concurrent user count, so that might be related.

Before restarting the service on production, we captured the output of the 
following commands:

kamctl stats shmem
kamcmd mod.stats all shm
kamcmd pkg.stats
kamcmd mod.stats all pkg

Here's the shared mem output:

{
  "jsonrpc":  "2.0",
  "result": [
"shmem:fragments = 27240",
"shmem:free_size = 447203296",
"shmem:max_used_size = 116175576",
"shmem:real_used_size = 89667616",
"shmem:total_size = 536870912",
"shmem:used_size = 68824240"
  ],
  "id": 6934
}

Here's the pkg output for the particular PID which was throwing the errors:

{
entry: 34
pid: 2302
rank: 14
used: 2415864
free: 4949688
real_used: 3438920
total_size: 8388608
total_frags: 1951
}

We didn't see a

Re: [SR-Users] qm_find_free() Free fragment not found, called from xcap_server, no more pkg

2019-09-19 Thread Henning Westerholt
Hello Cody,

you only need to enable the mem_join parameter; no compile-time switch is 
necessary (anymore).
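
With a stock build it is just the core parameter in the global section of 
kamailio.cfg, e.g.:

# no MEM_JOIN_FREE compile define is needed on recent versions
mem_join=1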

Cheers,

Henning

On 19.09.19 at 23:15, Cody Herzog wrote:
Thanks for the quick and useful response.

Increasing the package memory seems like a simple and safe thing to do. Also, 
we are indeed planning to upgrade soon.

Regarding 'mem_join', based on the documentation, I assume we will have to 
compile with MEM_JOIN_FREE in order for it to work. Is that correct?

We are still hoping to be able to reproduce this issue in our dev environment, 
so that we can prove our changes are helping. Can anyone think of a way we 
might be able to intentionally cause some memory fragmentation? Is there a 
particular type of access pattern, or a series of operations we can repeat many 
times to induce fragmentation?

We’ve been running heavy SIPp load tests for many days which exercise the 
following:

-Registration
-Subscription to large RLS contact lists
-Subscription to presence of many contacts

So far, we haven't been able to reproduce.

Thanks again.

From: Daniel-Constantin Mierla [mailto:mico...@gmail.com]
Sent: Wednesday, September 18, 2019 11:32 PM
To: Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org>; Cody Herzog <cher...@intouchhealth.com>
Subject: Re: [SR-Users] qm_find_free() Free fragment not found, called from 
xcap_server, no more pkg


Hello,

first, I would recommend upgrading to the latest release in the v5.1.x series; 
5.1.0 was the first one in that series, and many issues have been fixed in the 
5.1 branch since then that will ensure a smoother run in the long term.

The reported issue is likely related to memory fragmentation; try setting the 
global parameter:

mem_join=1

As the instance needs to deal with large chunks for XCAP documents, I would 
also suggest increasing the pkg memory pool a bit via the -M command line 
parameter -- I see it is 8MB now; make it 12 or 16.

Cheers,
Daniel
On 19.09.19 01:07, Cody Herzog wrote:
Hello.

Recently, for the first time, we experienced an apparent memory issue in our 
production environment. Here's an example of the relevant log messages:

Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2302]: ERROR:  
[core/mem/q_malloc.c:286]: qm_find_free(): qm_find_free(0x7fd8bbcfc010, 
134648); Free fragment not found!
Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2302]: ERROR:  
[core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7fd8bbcfc010, 134648) 
called from xcap_server: xcap_server.c: ki_xcaps_put(549), module: xcap_server; 
Free fragment not found!
Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2302]: ERROR: xcap_server 
[xcap_server.c:552]: ki_xcaps_put(): no more pkg
Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2252]: ERROR: app_perl 
[kamailioxs.xs:487]: XS_Kamailio_log(): 500 Server error

The failed operation was the XCAP server module trying to generate a very large 
RLS contact list for a user. The issue only impacted users with very large 
lists, as though a large contiguous block of memory could not be found, whereas 
other smaller allocations continued to work fine. We believe the requested 
allocation was around 112 KB in size.
The server had been up for 14 days. We were able to work around the issue 
temporarily by just restarting the Kamailio service. It's unusual, because our 
production server is often up for months, and we've never seen this issue 
before. The load on production is increasing slowly due to an increased 
concurrent user count, so that might be related.

Before restarting the service on production, we captured the output of the 
following commands:

kamctl stats shmem
kamcmd mod.stats all shm
kamcmd pkg.stats
kamcmd mod.stats all pkg

Here's the shared mem output:

{
  "jsonrpc":  "2.0",
  "result": [
"shmem:fragments = 27240",
"shmem:free_size = 447203296",
"shmem:max_used_size = 116175576",
"shmem:real_used_size = 89667616",
"shmem:total_size = 536870912",
"shmem:used_size = 68824240"
  ],
  "id": 6934
}

Here's the pkg output for the particular PID which was throwing the errors:

{
entry: 34
pid: 2302
rank: 14
used: 2415864
free: 4949688
real_used: 3438920
total_size: 8388608
total_frags: 1951
}

We didn't see anything obvious in the stats output which explains the issue.

We've been trying to reproduce the issue in a dev environment using simulated 
load higher than production, running for many days, but so far we've had no 
luck. We've been monitoring memory stats over time, but we don't see any 
obvious leaks or issues.

We've searched various past threads, but didn't find any obvious answers. Here 
are some of the documents and the threads we've been reading:

https://www.kamailio.org/wiki/tutorials/troubleshooting/memory
https://www.kamailio.org

Re: [SR-Users] qm_find_free() Free fragment not found, called from xcap_server, no more pkg

2019-09-19 Thread Cody Herzog
Thanks for the quick and useful response.

Increasing the package memory seems like a simple and safe thing to do. Also, 
we are indeed planning to upgrade soon.

Regarding 'mem_join', based on the documentation, I assume we will have to 
compile with MEM_JOIN_FREE in order for it to work. Is that correct?

We are still hoping to be able to reproduce this issue in our dev environment, 
so that we can prove our changes are helping. Can anyone think of a way we 
might be able to intentionally cause some memory fragmentation? Is there a 
particular type of access pattern, or a series of operations we can repeat many 
times to induce fragmentation?

We've been running heavy SIPp load tests for many days which exercise the 
following:

-Registration
-Subscription to large RLS contact lists
-Subscription to presence of many contacts

So far, we haven't been able to reproduce.
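
In case it helps, something like the following is what we've been using to 
sample memory stats during these runs (just a rough sketch; the interval and 
output file are arbitrary):

while true; do
    date >> mem-stats.log
    kamctl stats shmem >> mem-stats.log    # watch shmem:fragments over time
    kamcmd pkg.stats >> mem-stats.log      # watch per-process total_frags and free
    sleep 60
done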

Thanks again.

From: Daniel-Constantin Mierla [mailto:mico...@gmail.com]
Sent: Wednesday, September 18, 2019 11:32 PM
To: Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org>; Cody Herzog <cher...@intouchhealth.com>
Subject: Re: [SR-Users] qm_find_free() Free fragment not found, called from 
xcap_server, no more pkg


Hello,

first, I would recommend upgrading to the latest release in the v5.1.x series; 
5.1.0 was the first one in that series, and many issues have been fixed in the 
5.1 branch since then that will ensure a smoother run in the long term.

The reported issue is likely related to memory fragmentation; try setting the 
global parameter:

mem_join=1

As the instance needs to deal with large chunks for XCAP documents, I would 
also suggest increasing the pkg memory pool a bit via the -M command line 
parameter -- I see it is 8MB now; make it 12 or 16.

Cheers,
Daniel
On 19.09.19 01:07, Cody Herzog wrote:
Hello.

Recently, for the first time, we experienced an apparent memory issue in our 
production environment. Here's an example of the relevant log messages:

Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2302]: ERROR:  
[core/mem/q_malloc.c:286]: qm_find_free(): qm_find_free(0x7fd8bbcfc010, 
134648); Free fragment not found!
Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2302]: ERROR:  
[core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7fd8bbcfc010, 134648) 
called from xcap_server: xcap_server.c: ki_xcaps_put(549), module: xcap_server; 
Free fragment not found!
Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2302]: ERROR: xcap_server 
[xcap_server.c:552]: ki_xcaps_put(): no more pkg
Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2252]: ERROR: app_perl 
[kamailioxs.xs:487]: XS_Kamailio_log(): 500 Server error

The failed operation was the XCAP server module trying to generate a very large 
RLS contact list for a user. The issue only impacted users with very large 
lists, as though a large contiguous block of memory could not be found, whereas 
other smaller allocations continued to work fine. We believe the requested 
allocation was around 112 KB in size.
The server had been up for 14 days. We were able to work around the issue 
temporarily by just restarting the Kamailio service. It's unusual, because our 
production server is often up for months, and we've never seen this issue 
before. The load on production is increasing slowly due to an increased 
concurrent user count, so that might be related.

Before restarting the service on production, we captured the output of the 
following commands:

kamctl stats shmem
kamcmd mod.stats all shm
kamcmd pkg.stats
kamcmd mod.stats all pkg

Here's the shared mem output:

{
  "jsonrpc":  "2.0",
  "result": [
"shmem:fragments = 27240",
"shmem:free_size = 447203296",
"shmem:max_used_size = 116175576",
"shmem:real_used_size = 89667616",
"shmem:total_size = 536870912",
"shmem:used_size = 68824240"
  ],
  "id": 6934
}

Here's the pkg output for the particular PID which was throwing the errors:

{
entry: 34
pid: 2302
rank: 14
used: 2415864
free: 4949688
real_used: 3438920
total_size: 8388608
total_frags: 1951
}

We didn't see anything obvious in the stats output which explains the issue.

We've been trying to reproduce the issue in a dev environment using simulated 
load higher than production, running for many days, but so far we've had no 
luck. We've been monitoring memory stats over time, but we don't see any 
obvious leaks or issues.

We've searched various past threads, but didn't find any obvious answers. Here 
are some of the documents and the threads we've been reading:

https://www.kamailio.org/wiki/tutorials/troubleshooting/memory
https://www.kamailio.org/wiki/cookbooks/3.3.x/core#mem_join
https://sr-users.sip-router.narkive.com/3TEDs3ga/tcp-free-fragment-not-found
https://lists.kamailio.org/pipermail/sr-users/2012-June/073552.html
https://lists.kamailio.org/pipermail/sr-users/2017-February/096132.html

Re: [SR-Users] qm_find_free() Free fragment not found, called from xcap_server, no more pkg

2019-09-19 Thread Daniel-Constantin Mierla
Hello,

first, I would recommend upgrading to the latest release in the v5.1.x
series; 5.1.0 was the first one in that series, and many issues have been
fixed in the 5.1 branch since then that will ensure a smoother run in the
long term.

The reported issue is likely related to memory fragmentation; try setting
the global parameter:

mem_join=1

As the instance needs to deal with large chunks for XCAP documents, I
would also suggest increasing the pkg memory pool a bit via the -M
command line parameter -- I see it is 8MB now; make it 12 or 16.
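
For reference, a minimal sketch of the two changes (the 16MB value and the
config path are just examples, adjust them to your setup):

# kamailio.cfg -- global parameters section
mem_join=1    # join adjacent free fragments when memory is freed

# start with a 16MB pkg memory pool per process instead of the default 8MB
/usr/local/sbin/kamailio -M 16 -f /usr/local/etc/kamailio/kamailio.cfg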

Cheers,
Daniel

On 19.09.19 01:07, Cody Herzog wrote:
>
> Hello.
>
>  
>
> Recently, for the first time, we experienced an apparent memory issue
> in our production environment. Here's an example of the relevant log
> messages:
>
>  
>
> Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2302]: ERROR: 
> [core/mem/q_malloc.c:286]: qm_find_free():
> qm_find_free(0x7fd8bbcfc010, 134648); Free fragment not found!
>
> Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2302]: ERROR: 
> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7fd8bbcfc010,
> 134648) called from xcap_server: xcap_server.c: ki_xcaps_put(549),
> module: xcap_server; Free fragment not found!
>
> Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2302]: ERROR:
> xcap_server [xcap_server.c:552]: ki_xcaps_put(): no more pkg
>
> Sep 13 18:55:22 SIPCOMM /usr/local/sbin/kamailio[2252]: ERROR:
> app_perl [kamailioxs.xs:487]: XS_Kamailio_log(): 500 Server error
>
>  
>
> The failed operation was the XCAP server module trying to generate a
> very large RLS contact list for a user. The issue only impacted users
> with very large lists, as though a large contiguous block of memory
> could not be found, whereas other smaller allocations continued to
> work fine. We believe the requested allocation was around 112 KB in size.
>
> The server had been up for 14 days. We were able to work around the
> issue temporarily by just restarting the Kamailio service. It's
> unusual, because our production server is often up for months, and
> we've never seen this issue before. The load on production is
> increasing slowly due to an increased concurrent user count, so that
> might be related.
>
>  
>
> Before restarting the service on production, we captured the output of
> the following commands:
>
>  
>
> kamctl stats shmem
>
> kamcmd mod.stats all shm
>
> kamcmd pkg.stats
>
> kamcmd mod.stats all pkg
>
>  
>
> Here's the shared mem output:
>
>  
>
> {
>
>   "jsonrpc":  "2.0",
>
>   "result": [
>
>     "shmem:fragments = 27240",
>
>     "shmem:free_size = 447203296",
>
>     "shmem:max_used_size = 116175576",
>
>     "shmem:real_used_size = 89667616",
>
>     "shmem:total_size = 536870912",
>
>     "shmem:used_size = 68824240"
>
>   ],
>
>   "id": 6934
>
> }
>
>  
>
> Here's the pkg output for the particular PID which was throwing the
> errors:
>
>  
>
> {
>
>     entry: 34
>
>     pid: 2302
>
>     rank: 14
>
>     used: 2415864
>
>     free: 4949688
>
>     real_used: 3438920
>
>     total_size: 8388608
>
>     total_frags: 1951
>
> }
>
>  
>
> We didn't see anything obvious in the stats output which explains the
> issue.
>
>  
>
> We've been trying to reproduce the issue in a dev environment using
> simulated load higher than production, running for many days, but so
> far we've had no luck. We've been monitoring memory stats over time,
> but we don't see any obvious leaks or issues.
>
>  
>
> We've searched various past threads, but didn't find any obvious
> answers. Here are some of the documents and the threads we've been
> reading:
>
>  
>
> https://www.kamailio.org/wiki/tutorials/troubleshooting/memory
>
> https://www.kamailio.org/wiki/cookbooks/3.3.x/core#mem_join
>
> https://sr-users.sip-router.narkive.com/3TEDs3ga/tcp-free-fragment-not-found
>
> https://lists.kamailio.org/pipermail/sr-users/2012-June/073552.html
>
> https://lists.kamailio.org/pipermail/sr-users/2017-February/096132.html
>
> https://lists.kamailio.org/pipermail/sr-users/2017-September/098607.html
>
> https://github.com/kamailio/kamailio/issues/1001
>
> https://lists.kamailio.org/pipermail/sr-users/2016-April/092592.html
>
> https://lists.kamailio.org/pipermail/sr-users/2010-July/064832.html
>
>  
>
> Regarding our Kamailio version and build options, here's the output of
> 'kamailio -v':
>
>  
>
> 
>
> version: kamailio 5.1.0 (x86_64/linux)
>
> flags: STATS: Off, USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS,
> DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC,
> Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX,
> FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR,
> USE_DST_BLACKLIST, HAVE_RESOLV_RES
>
> ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16,
> MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
>
> poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
>
> id: unknown
>
> compiled on 20:07:31 Jan  4 2018 with