Re: [OpenSIPS-Users] Potential memory leak in siptrace module

2018-02-01 Thread Steve Brisson
Hi,

In answer to your questions...

1.
$ opensips -V
version: opensips 2.3.2 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, F_MALLOC, 
FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, 
MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
main.c compiled on 15:38:40 Nov 24 2017 with gcc 5.4.0

I built from commit 9e497ea4ab03c67cb073bdf0dccef2b16bff80e2. I've since diff'd 
with the HEAD and I haven't seen any changes in the siptrace code that would 
have helped my particular issue.

2.
The call to sip_trace is in the main request route branch and it gets called 
for all requests.
input parameters:
sip_trace("tid", "d", "sip");

module config:
modparam("siptrace", "trace_on", 0)
modparam("siptrace", "trace_id", 
"[tid]uri=mysql://db-user:db-pass@db-host/db-name;table=sip_trace;")



From: Bogdan-Andrei Iancu [mailto:bog...@opensips.org]
Sent: Thursday, February 1, 2018 10:54 AM
To: OpenSIPS users mailling list <users@lists.opensips.org>; Steve Brisson 
<steve.bris...@librestream.com>
Subject: Re: [OpenSIPS-Users] Potential memory leak in siptrace module

Hi Steve,

Thank you for your report. Could you provide the exact revision of your 
OpenSIPS (opensips -V) ?

Also what requests are you calling the sip_trace() for (INVITES ? PUBLISH? 
all?) and what are the input paramters ?

Thanks,


Bogdan-Andrei Iancu



OpenSIPS Founder and Developer

  http://www.opensips-solutions.com

OpenSIPS Summit 2018

  http://www.opensips.org/events/Summit-2018Amsterdam
On 01/31/2018 07:48 PM, Steve Brisson wrote:
I've found that when I call the sip_trace method, in the main request routing 
block, the shmem used_size and fragments continually increases until the 
process crashes from an out of memory condition.

By enabling memory alloc debugging, I confirmed that the leaked allocation is 
the shm_malloc of a trace_info_t structure in the sip_trace_w method 
[siptrace.c:1539]. This piece of memory is supposed to be freed in the dialog 
terminated callback but because of the logic in the sip_trace_w method, 
trace_flags gets modified to TRACE_TRANSACTION and the dialog callbacks are 
never registered.

I haven't worked out the details of a proper fix yet. My workaround has been to 
remove the siptrace module from my config.


OpenSIPS config details:

opensips 2.3.2 (x86_64/linux)

#siptrace module config
modparam("siptrace", "trace_on", 0)
modparam("siptrace", "trace_id", 
"[tid]uri=mysql://db-user:db-pass@db-host/db-name;table=sip_trace;")

#siptrace method call in main request branch
sip_trace("tid", "d", "sip");




___

Users mailing list

Users@lists.opensips.org<mailto:Users@lists.opensips.org>

http://lists.opensips.org/cgi-bin/mailman/listinfo/users

___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


Re: [OpenSIPS-Users] tcp_async timeouts confusion

2018-01-31 Thread Steve Brisson
Sorry for the late response.

Unfortunately I no longer have call access to the vcs server that experienced 
this error and the issue seemed timing dependent on that particular call path. 
I will update the bug with DBG logs if I am able to recreate the problem by 
some other means.

Thanks for the help.

From: Users [mailto:users-boun...@lists.opensips.org] On Behalf Of Liviu Chircu
Sent: Friday, January 19, 2018 7:16 AM
To: users@lists.opensips.org
Subject: Re: [OpenSIPS-Users] tcp_async timeouts confusion


Hi Steve,

I've opened a GitHub issue for this report so we can better keep track of it 
[1], and will likely be next on my priority list. I will try to find some time 
to attempt to reproduce it asap. In the meanwhile, would you be as kind as to 
supply some debug logs while the async TCP reads are piling up? You may ship 
them to li...@opensips.org<mailto:li...@opensips.org>.

Best regards,

[1]: https://github.com/OpenSIPS/opensips/issues/1259

Liviu Chircu

OpenSIPS Developer

http://www.opensips-solutions.com
On 18.01.2018 18:35, Steve Brisson wrote:
Hi, just wondering if this issue is still being considered.

Thanks,
steve

From: Users [mailto:users-boun...@lists.opensips.org] On Behalf Of Steve Brisson
Sent: Thursday, January 11, 2018 11:56 AM
To: OpenSIPS users mailling list 
<users@lists.opensips.org><mailto:users@lists.opensips.org>
Subject: Re: [OpenSIPS-Users] tcp_async timeouts confusion

TCP scenario:
Yes that is exactly correct. If I have tcp_connect_timeout=3000 and tcp_async=1 
configured then calls to the vcs fail. If I then make the single change to 
disable tcp_async, calls to my vcs endpoints will work.

TLS scenario:
Thanks for the clarification, makes sense. Unfortunately the online 
documentation doesn't describe the parameters and lists an incorrect scaling 
factor (it's defined in milliseconds, not seconds).

steve

From: Users [mailto:users-boun...@lists.opensips.org] On Behalf Of Liviu Chircu
Sent: Thursday, January 11, 2018 3:39 AM
To: users@lists.opensips.org<mailto:users@lists.opensips.org>
Subject: Re: [OpenSIPS-Users] tcp_async timeouts confusion


Answers below,


On 09.01.2018 22:25, Steve Brisson wrote:

*** Using TCP ***

After the invite is sent to the vcs, tcpdump at the opensips server showed 100, 
180, and 200 OK responses from the vcs arriving and ACK'd correctly at the 
opensips server. The 100 response arrived 185ms after the invite is sent. But, 
I don't see these responses in the branch's onreply_route, the global 
onreply_route, or in the log at DBG level. netstat -t shows the connection with 
the data in the recv-q that never reaches 0. This implies to me that opensips 
is not polling that connection correctly for recv data.

If I disable tcp_async then the call is completed successfully. So in the case 
that works, I have tcp_connect_timeout=3000 and tcp_async=0.
Just to confirm the scenario: So the signaling is broken with 
"tcp_connect_timeout=3000 and tcp_async=1" (reply routes do not get triggered / 
recv-q keeps on growing), however once you switch to "tcp_async=0", everything 
is back to working?

*** Using TLS ***

Running tcpdump, I see the opensips server send a Client Hello then a FIN 
packet 100ms later. The vcs responds with a Server Hello 200ms after the Client 
Hello and this gets RST.

To workaround this case, I set tls_handshake_timeout=3000 and 
tls_send_timeout=1000. Maybe this is the correct behavior, I'm still not 100% 
sure how the tls parameters function.
This time, it's behaving as expected. Maybe there should be a diagram somewhere 
with how these parameters work together. For example, each TLS connection will 
roughly follow the below steps along with their corresponding parameterized 
timeouts:

1. TCP connect (tcp_connect_timeout)
2. TLS connect/accept handshake (tls_handshake_timeout)
3. TLS write (tls_send_timeout)
4. TLS write (tls_send_timeout)



Liviu Chircu

OpenSIPS Developer

http://www.opensips-solutions.com




___

Users mailing list

Users@lists.opensips.org<mailto:Users@lists.opensips.org>

http://lists.opensips.org/cgi-bin/mailman/listinfo/users

___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


[OpenSIPS-Users] Potential memory leak in siptrace module

2018-01-31 Thread Steve Brisson
I've found that when I call the sip_trace method, in the main request routing 
block, the shmem used_size and fragments continually increases until the 
process crashes from an out of memory condition.

By enabling memory alloc debugging, I confirmed that the leaked allocation is 
the shm_malloc of a trace_info_t structure in the sip_trace_w method 
[siptrace.c:1539]. This piece of memory is supposed to be freed in the dialog 
terminated callback but because of the logic in the sip_trace_w method, 
trace_flags gets modified to TRACE_TRANSACTION and the dialog callbacks are 
never registered.

I haven't worked out the details of a proper fix yet. My workaround has been to 
remove the siptrace module from my config.


OpenSIPS config details:

opensips 2.3.2 (x86_64/linux)

#siptrace module config
modparam("siptrace", "trace_on", 0)
modparam("siptrace", "trace_id", 
"[tid]uri=mysql://db-user:db-pass@db-host/db-name;table=sip_trace;")

#siptrace method call in main request branch
sip_trace("tid", "d", "sip");
___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


Re: [OpenSIPS-Users] tcp_async timeouts confusion

2018-01-18 Thread Steve Brisson
Hi, just wondering if this issue is still being considered.

Thanks,
steve

From: Users [mailto:users-boun...@lists.opensips.org] On Behalf Of Steve Brisson
Sent: Thursday, January 11, 2018 11:56 AM
To: OpenSIPS users mailling list <users@lists.opensips.org>
Subject: Re: [OpenSIPS-Users] tcp_async timeouts confusion

TCP scenario:
Yes that is exactly correct. If I have tcp_connect_timeout=3000 and tcp_async=1 
configured then calls to the vcs fail. If I then make the single change to 
disable tcp_async, calls to my vcs endpoints will work.

TLS scenario:
Thanks for the clarification, makes sense. Unfortunately the online 
documentation doesn't describe the parameters and lists an incorrect scaling 
factor (it's defined in milliseconds, not seconds).

steve

From: Users [mailto:users-boun...@lists.opensips.org] On Behalf Of Liviu Chircu
Sent: Thursday, January 11, 2018 3:39 AM
To: users@lists.opensips.org<mailto:users@lists.opensips.org>
Subject: Re: [OpenSIPS-Users] tcp_async timeouts confusion


Answers below,


On 09.01.2018 22:25, Steve Brisson wrote:

*** Using TCP ***

After the invite is sent to the vcs, tcpdump at the opensips server showed 100, 
180, and 200 OK responses from the vcs arriving and ACK'd correctly at the 
opensips server. The 100 response arrived 185ms after the invite is sent. But, 
I don't see these responses in the branch's onreply_route, the global 
onreply_route, or in the log at DBG level. netstat -t shows the connection with 
the data in the recv-q that never reaches 0. This implies to me that opensips 
is not polling that connection correctly for recv data.

If I disable tcp_async then the call is completed successfully. So in the case 
that works, I have tcp_connect_timeout=3000 and tcp_async=0.
Just to confirm the scenario: So the signaling is broken with 
"tcp_connect_timeout=3000 and tcp_async=1" (reply routes do not get triggered / 
recv-q keeps on growing), however once you switch to "tcp_async=0", everything 
is back to working?

*** Using TLS ***

Running tcpdump, I see the opensips server send a Client Hello then a FIN 
packet 100ms later. The vcs responds with a Server Hello 200ms after the Client 
Hello and this gets RST.

To workaround this case, I set tls_handshake_timeout=3000 and 
tls_send_timeout=1000. Maybe this is the correct behavior, I'm still not 100% 
sure how the tls parameters function.
This time, it's behaving as expected. Maybe there should be a diagram somewhere 
with how these parameters work together. For example, each TLS connection will 
roughly follow the below steps along with their corresponding parameterized 
timeouts:

1. TCP connect (tcp_connect_timeout)
2. TLS connect/accept handshake (tls_handshake_timeout)
3. TLS write (tls_send_timeout)
4. TLS write (tls_send_timeout)


Liviu Chircu

OpenSIPS Developer

http://www.opensips-solutions.com
___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


Re: [OpenSIPS-Users] tcp_async timeouts confusion

2018-01-09 Thread Steve Brisson
Thanks for the explanation! That matches what I was expecting to see so I think 
there is an issue worth examining here.

*** Server Details ***

opensips -V

version: opensips 2.3.2 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, F_MALLOC, 
FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, 
MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
main.c compiled on 15:38:40 Nov 24 2017 with gcc 5.4.0

My main problem scenario is establishing a connection between an endpoint local 
to opensips, and an endpoint registered to a cisco vcs, using either TCP or 
TLS. Both transports fail in slightly different ways I will describe below. The 
opensips server is running on an AWS server so I have the advertised_address 
and listen aliases set to deal with IP translation. The only tcp timeout I had 
configured was tcp_connect_timeout=3000.

*** Using TCP ***

After the invite is sent to the vcs, tcpdump at the opensips server showed 100, 
180, and 200 OK responses from the vcs arriving and ACK'd correctly at the 
opensips server. The 100 response arrived 185ms after the invite is sent. But, 
I don't see these responses in the branch's onreply_route, the global 
onreply_route, or in the log at DBG level. netstat -t shows the connection with 
the data in the recv-q that never reaches 0. This implies to me that opensips 
is not polling that connection correctly for recv data.

If I disable tcp_async then the call is completed successfully. So in the case 
that works, I have tcp_connect_timeout=3000 and tcp_async=0.

*** Using TLS ***

Running tcpdump, I see the opensips server send a Client Hello then a FIN 
packet 100ms later. The vcs responds with a Server Hello 200ms after the Client 
Hello and this gets RST.

To workaround this case, I set tls_handshake_timeout=3000 and 
tls_send_timeout=1000. Maybe this is the correct behavior, I'm still not 100% 
sure how the tls parameters function.

*** Conclusion ***
In both the TCP and TLS cases it seems like the tcp_connect_timeout isn't being 
used as expected.

So to workaround this, I went from having only tcp_connect_timeout=3000 to:
tcp_connect_timeout=3000
modparam("proto_tcp", "tcp_async", 0)
modparam("proto_tcp", "tcp_send_timeout", 1000)
modparam("tls_mgm", "tls_handshake_timeout", 3000)
modparam("tls_mgm", "tls_send_timeout", 1000)


Please let me know if I am unclear in my description of the issue. There is a 
lot of details to go through.

Thanks again for the quick response.

From: Users [mailto:users-boun...@lists.opensips.org] On Behalf Of Liviu Chircu
Sent: Tuesday, January 9, 2018 3:20 AM
To: users@lists.opensips.org
Subject: Re: [OpenSIPS-Users] tcp_async timeouts confusion


Hi, Steve!

The transport layer was heavily refactored roughly three years ago, see [1], 
[2] and [3] for the relevant commits which, indeed, bumped the default connect 
timeout down a lot, to a much lower value (10s -> 100ms). Although 100ms might 
seem unnecessary (it's async! let it sleep as long as it wants!), keep in mind 
that the TLS support isn't async at all, yet it will also make use of the same, 
default "tcp_connect_timeout" - a 10s default here is quite bad for high 
traffic volume TLS proxies which often need to open up lots of TCP/TLS 
connections.

All in all, the "tcp_connect_timeout" should not get ignored at all. The 
"tcp_async_local_connect_timeout" [4] is the first one that hits, after which 
the connect waiting will be performed by a non-TCP worker, up to 
"tcp_connect_timeout" milliseconds. If it doesn't behave like this, let us 
know, and we'll look into it more.

Best regards,

[1]: https://github.com/OpenSIPS/opensips/commit/b343ca1c
[2]: https://github.com/OpenSIPS/opensips/commit/78c84620
[3]: https://github.com/OpenSIPS/opensips/commit/11aedc6
[4]: http://www.opensips.org/html/docs/modules/2.4.x/proto_tcp.html#idp5544000

Liviu Chircu

OpenSIPS Developer

http://www.opensips-solutions.com
On 08.01.2018 22:38, Steve Brisson wrote:
Hi,

I've run into some issues related to tcp_async and tcp/tls timeouts since 
upgrading opensips from v1.8 to v2.3.

Based on my v1.8 config, I had the tcp_connect_timeout set to 3 secs but this 
gets ignored in v2.3 because tcp_async is enabled by default. As a result, 
calls made from a local opensips endpoint to a remote registered endpoint 
(through a cisco vcs) were failing. I then noticed that the tcp/tls timeouts 
were aggressively reduced from 10-30s to 100ms by default with the tcp_async 
feature.

My main questions are:
- How is the tcp_async feature supposed to function if the 
tcp_async_local_connect_timeout expires? The code seems to imply that the 
socket gets put onto a tcp main thread and handled.
- 100ms seems pretty short as a default for these timeouts, especially tls. 

[OpenSIPS-Users] tcp_async timeouts confusion

2018-01-08 Thread Steve Brisson
Hi,

I've run into some issues related to tcp_async and tcp/tls timeouts since 
upgrading opensips from v1.8 to v2.3.

Based on my v1.8 config, I had the tcp_connect_timeout set to 3 secs but this 
gets ignored in v2.3 because tcp_async is enabled by default. As a result, 
calls made from a local opensips endpoint to a remote registered endpoint 
(through a cisco vcs) were failing. I then noticed that the tcp/tls timeouts 
were aggressively reduced from 10-30s to 100ms by default with the tcp_async 
feature.

My main questions are:
- How is the tcp_async feature supposed to function if the 
tcp_async_local_connect_timeout expires? The code seems to imply that the 
socket gets put onto a tcp main thread and handled.
- 100ms seems pretty short as a default for these timeouts, especially tls. 
Does a timeout result in the sip request getting cancelled or is there still 
some processing that can occur after because of handling on the tcp main thread.

In short, I'm a confused about what the tcp_async feature does and how the 
timeouts should be set. Any explanations would be greatly appreciated.

Thanks for your time.

steve
___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users