Re: [OpenSIPS-Users] UDP workers consume 100% CPU

2019-08-14 Thread Vitalii Aleksandrov

Absolute agree. No doubts it's the same issue.

I suspect you are facing an issue similar to this one[1], so I think 
we should move the discussion on that ticket. So please send us your 
updates over there.


[1] https://github.com/OpenSIPS/opensips/issues/1767

Best regards,
Răzvan

On 8/14/19 12:58 PM, Vitalii Aleksandrov wrote:

Ok, I'll check when finish load testing with libssl-1.0.



Hi, Vitalii!

Just to double check you are using the right overwritten functions, 
can you add some debugging in the tls_mgm code, recompile, and run 
the test again? I'm thinking to add something like the snippet below 
on line 1525[1]:

LM_INFO("initialized mutex over OpenSIPS\n");

[1] 
https://github.com/OpenSIPS/opensips/blob/master/modules/tls_mgm/tls_mgm.c#L1525 



Best regards,
Răzvan

On 8/12/19 7:20 PM, Vitalii Aleksandrov wrote:
Checked the tls_mgm.c code and found that I wasn't right. I've just 
downgraded libssl and restarted opensips without rebuilding it. 
Will correctly rebuild opensips with libssl-1.0 dev files and retest.
The workaround with pthread_mutex_init() and shm_malloc() wrapping 
theoretically should work and Kamailio went the same way. Anyhow 
something definitely wrong with this "workaround" and libssl-1.1




Hi,

Downgraded libssl to 1.0.2n-1ubuntu5.3 available in repository and 
still can reproduce this bug. You were right that this is a 
deadlock in libssl. Found TCP worker process blocked in libcrypto 
and a bunch of processes burning CPU cycles in sched_yield() 
waiting for the lock. That's really strange that do completely 
different libssl versions have such a dangerous bug. Found a few 
opened issues related to libssl, but according to comments 
downgrading to libssl-1.0 helped to make it stable.



Hi,

I'm using libssl-1.1.1-1ubuntu2.1~18.04.4

Since I need SIP over TLS I can't just remove proto_tls. Will try 
to reproduce it again and collect more data.


On 07.08.19 13:18, Bogdan-Andrei Iancu wrote:

Hi Vitalii,

As the backtraces show, it is a deadlock in TLS - if you remove 
the proto_tls, you will not experience this issue anymore. The 
root problem is not in OpenSIPS itself but in the 
libssl/libcrypto - what versions are you using for these libs ?


Regards,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
  https://www.opensips-solutions.com
OpenSIPS Summit 2019
https://www.opensips.org/events/Summit-2019Amsterdam/

On 08/05/2019 09:13 PM, Vitalii Aleksandrov wrote:

Hi,

I have a test opensips 2.4.6 (last week's checkout from 2.4 
branch) and after short load testing it started to consume all 
CPU resources.


Here is what opensips floods to syslog:
Aug  3 13:25:01 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231430 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231530 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231630 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231730 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231830 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231930 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176232030 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:timer_ticker: 
timer task  already scheduled for 114023090 ms (now 
176232030 ms), it may overlap..


Where 4774 is: "Process::  ID=7 PID=4774 Type=timer"

Here is the TOP output for time consuming processes:
  PID USER PR  NI    VIRT    RES  SHR S %CPU 
%MEM TIME+ COMMAND
 4801 opensips  20   0  721040  28496  24240 R  52.9 0.7 
1817:08 opensips
 4802 opensips  20   0  721040  27724  23588 R  47.1 0.7 
1813:06 opensips
 4806 opensips  20   0  721040  27640  23532 R  47.1 0.7 
1814:41 opensips
 4833 opensips  20   0  721040  28848  24812 R  47.1 0.7 
1815:16 opensips
 4803 opensips  20   0  721044  27928  23732 R  41.2 0.7 
1817:09 opensips
 4804 opensips  20   0  721040  28188  23996 R  41.2 0.7 
1816:05 opensips
 4799 opensips  20   0  721040  28272  24172 R  35.3 0.7 
1813:29 opensips
 4800 opensips  20   0  721040  28340  24204 R  29.4 0.7 
1816:52 opensips
 4805 opensips  20   0  723256  29620  25300 R  29.4 0.8 
1819:28 opensips


And here is the "opensipsctl ps" for them:
Process::  ID=32 PID=4799 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=33 PID=4800 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=34 PID=4801 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=35 PID=4802 Type=SIP receiver udp:127.0.0.1:5070
P

Re: [OpenSIPS-Users] UDP workers consume 100% CPU

2019-08-14 Thread Răzvan Crainea
I suspect you are facing an issue similar to this one[1], so I think we 
should move the discussion on that ticket. So please send us your 
updates over there.


[1] https://github.com/OpenSIPS/opensips/issues/1767

Best regards,
Răzvan

On 8/14/19 12:58 PM, Vitalii Aleksandrov wrote:

Ok, I'll check when finish load testing with libssl-1.0.



Hi, Vitalii!

Just to double check you are using the right overwritten functions, 
can you add some debugging in the tls_mgm code, recompile, and run the 
test again? I'm thinking to add something like the snippet below on 
line 1525[1]:

LM_INFO("initialized mutex over OpenSIPS\n");

[1] 
https://github.com/OpenSIPS/opensips/blob/master/modules/tls_mgm/tls_mgm.c#L1525 



Best regards,
Răzvan

On 8/12/19 7:20 PM, Vitalii Aleksandrov wrote:
Checked the tls_mgm.c code and found that I wasn't right. I've just 
downgraded libssl and restarted opensips without rebuilding it. Will 
correctly rebuild opensips with libssl-1.0 dev files and retest.
The workaround with pthread_mutex_init() and shm_malloc() wrapping 
theoretically should work and Kamailio went the same way. Anyhow 
something definitely wrong with this "workaround" and libssl-1.1




Hi,

Downgraded libssl to 1.0.2n-1ubuntu5.3 available in repository and 
still can reproduce this bug. You were right that this is a deadlock 
in libssl. Found TCP worker process blocked in libcrypto and a bunch 
of processes burning CPU cycles in sched_yield() waiting for the 
lock. That's really strange that do completely different libssl 
versions have such a dangerous bug. Found a few opened issues 
related to libssl, but according to comments downgrading to 
libssl-1.0 helped to make it stable.



Hi,

I'm using libssl-1.1.1-1ubuntu2.1~18.04.4

Since I need SIP over TLS I can't just remove proto_tls. Will try 
to reproduce it again and collect more data.


On 07.08.19 13:18, Bogdan-Andrei Iancu wrote:

Hi Vitalii,

As the backtraces show, it is a deadlock in TLS - if you remove 
the proto_tls, you will not experience this issue anymore. The 
root problem is not in OpenSIPS itself but in the libssl/libcrypto 
- what versions are you using for these libs ?


Regards,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
  https://www.opensips-solutions.com
OpenSIPS Summit 2019
  https://www.opensips.org/events/Summit-2019Amsterdam/

On 08/05/2019 09:13 PM, Vitalii Aleksandrov wrote:

Hi,

I have a test opensips 2.4.6 (last week's checkout from 2.4 
branch) and after short load testing it started to consume all 
CPU resources.


Here is what opensips floods to syslog:
Aug  3 13:25:01 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231430 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231530 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231630 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231730 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231830 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231930 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176232030 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:timer_ticker: 
timer task  already scheduled for 114023090 ms (now 
176232030 ms), it may overlap..


Where 4774 is: "Process::  ID=7 PID=4774 Type=timer"

Here is the TOP output for time consuming processes:
  PID USER PR  NI    VIRT    RES  SHR    S %CPU 
%MEM TIME+ COMMAND
 4801 opensips  20   0  721040  28496  24240 R  52.9 0.7 1817:08 
opensips
 4802 opensips  20   0  721040  27724  23588 R  47.1 0.7 1813:06 
opensips
 4806 opensips  20   0  721040  27640  23532 R  47.1 0.7 1814:41 
opensips
 4833 opensips  20   0  721040  28848  24812 R  47.1 0.7 1815:16 
opensips
 4803 opensips  20   0  721044  27928  23732 R  41.2 0.7 1817:09 
opensips
 4804 opensips  20   0  721040  28188  23996 R  41.2 0.7 1816:05 
opensips
 4799 opensips  20   0  721040  28272  24172 R  35.3 0.7 1813:29 
opensips
 4800 opensips  20   0  721040  28340  24204 R  29.4 0.7 1816:52 
opensips
 4805 opensips  20   0  723256  29620  25300 R  29.4 0.8 1819:28 
opensips


And here is the "opensipsctl ps" for them:
Process::  ID=32 PID=4799 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=33 PID=4800 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=34 PID=4801 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=35 PID=4802 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=36 PID=4803 Type=SIP receiver u

Re: [OpenSIPS-Users] UDP workers consume 100% CPU

2019-08-14 Thread Vitalii Aleksandrov

Ok, I'll check when finish load testing with libssl-1.0.



Hi, Vitalii!

Just to double check you are using the right overwritten functions, 
can you add some debugging in the tls_mgm code, recompile, and run the 
test again? I'm thinking to add something like the snippet below on 
line 1525[1]:

LM_INFO("initialized mutex over OpenSIPS\n");

[1] 
https://github.com/OpenSIPS/opensips/blob/master/modules/tls_mgm/tls_mgm.c#L1525


Best regards,
Răzvan

On 8/12/19 7:20 PM, Vitalii Aleksandrov wrote:
Checked the tls_mgm.c code and found that I wasn't right. I've just 
downgraded libssl and restarted opensips without rebuilding it. Will 
correctly rebuild opensips with libssl-1.0 dev files and retest.
The workaround with pthread_mutex_init() and shm_malloc() wrapping 
theoretically should work and Kamailio went the same way. Anyhow 
something definitely wrong with this "workaround" and libssl-1.1




Hi,

Downgraded libssl to 1.0.2n-1ubuntu5.3 available in repository and 
still can reproduce this bug. You were right that this is a deadlock 
in libssl. Found TCP worker process blocked in libcrypto and a bunch 
of processes burning CPU cycles in sched_yield() waiting for the 
lock. That's really strange that do completely different libssl 
versions have such a dangerous bug. Found a few opened issues 
related to libssl, but according to comments downgrading to 
libssl-1.0 helped to make it stable.



Hi,

I'm using libssl-1.1.1-1ubuntu2.1~18.04.4

Since I need SIP over TLS I can't just remove proto_tls. Will try 
to reproduce it again and collect more data.


On 07.08.19 13:18, Bogdan-Andrei Iancu wrote:

Hi Vitalii,

As the backtraces show, it is a deadlock in TLS - if you remove 
the proto_tls, you will not experience this issue anymore. The 
root problem is not in OpenSIPS itself but in the libssl/libcrypto 
- what versions are you using for these libs ?


Regards,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
  https://www.opensips-solutions.com
OpenSIPS Summit 2019
  https://www.opensips.org/events/Summit-2019Amsterdam/

On 08/05/2019 09:13 PM, Vitalii Aleksandrov wrote:

Hi,

I have a test opensips 2.4.6 (last week's checkout from 2.4 
branch) and after short load testing it started to consume all 
CPU resources.


Here is what opensips floods to syslog:
Aug  3 13:25:01 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231430 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231530 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231630 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231730 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231830 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231930 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176232030 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:timer_ticker: 
timer task  already scheduled for 114023090 ms (now 
176232030 ms), it may overlap..


Where 4774 is: "Process::  ID=7 PID=4774 Type=timer"

Here is the TOP output for time consuming processes:
  PID USER PR  NI    VIRT    RES  SHR    S %CPU 
%MEM TIME+ COMMAND
 4801 opensips  20   0  721040  28496  24240 R  52.9 0.7 1817:08 
opensips
 4802 opensips  20   0  721040  27724  23588 R  47.1 0.7 1813:06 
opensips
 4806 opensips  20   0  721040  27640  23532 R  47.1 0.7 1814:41 
opensips
 4833 opensips  20   0  721040  28848  24812 R  47.1 0.7 1815:16 
opensips
 4803 opensips  20   0  721044  27928  23732 R  41.2 0.7 1817:09 
opensips
 4804 opensips  20   0  721040  28188  23996 R  41.2 0.7 1816:05 
opensips
 4799 opensips  20   0  721040  28272  24172 R  35.3 0.7 1813:29 
opensips
 4800 opensips  20   0  721040  28340  24204 R  29.4 0.7 1816:52 
opensips
 4805 opensips  20   0  723256  29620  25300 R  29.4 0.8 1819:28 
opensips


And here is the "opensipsctl ps" for them:
Process::  ID=32 PID=4799 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=33 PID=4800 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=34 PID=4801 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=35 PID=4802 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=36 PID=4803 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=37 PID=4804 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=38 PID=4805 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=39 PID=4806 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=66 PID=4833 Type=TCP receiver

"strace" output for those burning pr

Re: [OpenSIPS-Users] UDP workers consume 100% CPU

2019-08-14 Thread Răzvan Crainea

Hi, Vitalii!

Just to double check you are using the right overwritten functions, can 
you add some debugging in the tls_mgm code, recompile, and run the test 
again? I'm thinking to add something like the snippet below on line 1525[1]:

LM_INFO("initialized mutex over OpenSIPS\n");

[1] 
https://github.com/OpenSIPS/opensips/blob/master/modules/tls_mgm/tls_mgm.c#L1525


Best regards,
Răzvan

On 8/12/19 7:20 PM, Vitalii Aleksandrov wrote:
Checked the tls_mgm.c code and found that I wasn't right. I've just 
downgraded libssl and restarted opensips without rebuilding it. Will 
correctly rebuild opensips with libssl-1.0 dev files and retest.
The workaround with pthread_mutex_init() and shm_malloc() wrapping 
theoretically should work and Kamailio went the same way. Anyhow 
something definitely wrong with this "workaround" and libssl-1.1




Hi,

Downgraded libssl to 1.0.2n-1ubuntu5.3 available in repository and 
still can reproduce this bug. You were right that this is a deadlock 
in libssl. Found TCP worker process blocked in libcrypto and a bunch 
of processes burning CPU cycles in sched_yield() waiting for the lock. 
That's really strange that do completely different libssl versions 
have such a dangerous bug. Found a few opened issues related to 
libssl, but according to comments downgrading to libssl-1.0 helped to 
make it stable.



Hi,

I'm using libssl-1.1.1-1ubuntu2.1~18.04.4

Since I need SIP over TLS I can't just remove proto_tls. Will try to 
reproduce it again and collect more data.


On 07.08.19 13:18, Bogdan-Andrei Iancu wrote:

Hi Vitalii,

As the backtraces show, it is a deadlock in TLS - if you remove the 
proto_tls, you will not experience this issue anymore. The root 
problem is not in OpenSIPS itself but in the libssl/libcrypto - what 
versions are you using for these libs ?


Regards,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
  https://www.opensips-solutions.com
OpenSIPS Summit 2019
  https://www.opensips.org/events/Summit-2019Amsterdam/

On 08/05/2019 09:13 PM, Vitalii Aleksandrov wrote:

Hi,

I have a test opensips 2.4.6 (last week's checkout from 2.4 branch) 
and after short load testing it started to consume all CPU resources.


Here is what opensips floods to syslog:
Aug  3 13:25:01 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231430 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231530 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231630 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231730 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231830 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231930 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176232030 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:timer_ticker: timer 
task  already scheduled for 114023090 ms (now 176232030 
ms), it may overlap..


Where 4774 is: "Process::  ID=7 PID=4774 Type=timer"

Here is the TOP output for time consuming processes:
  PID USER PR  NI    VIRT    RES  SHR    S  %CPU 
%MEM TIME+ COMMAND
 4801 opensips  20   0  721040  28496  24240 R  52.9  0.7 1817:08 
opensips
 4802 opensips  20   0  721040  27724  23588 R  47.1  0.7 1813:06 
opensips
 4806 opensips  20   0  721040  27640  23532 R  47.1  0.7 1814:41 
opensips
 4833 opensips  20   0  721040  28848  24812 R  47.1  0.7 1815:16 
opensips
 4803 opensips  20   0  721044  27928  23732 R  41.2  0.7 1817:09 
opensips
 4804 opensips  20   0  721040  28188  23996 R  41.2  0.7 1816:05 
opensips
 4799 opensips  20   0  721040  28272  24172 R  35.3  0.7 1813:29 
opensips
 4800 opensips  20   0  721040  28340  24204 R  29.4  0.7 1816:52 
opensips
 4805 opensips  20   0  723256  29620  25300 R  29.4  0.8 1819:28 
opensips


And here is the "opensipsctl ps" for them:
Process::  ID=32 PID=4799 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=33 PID=4800 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=34 PID=4801 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=35 PID=4802 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=36 PID=4803 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=37 PID=4804 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=38 PID=4805 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=39 PID=4806 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=66 PID=4833 Type=TCP receiver

"strace" output for those burning processes just floods with:
11:07:04.042794 sched_yiel

Re: [OpenSIPS-Users] UDP workers consume 100% CPU

2019-08-12 Thread Vitalii Aleksandrov
Checked the tls_mgm.c code and found that I wasn't right. I've just 
downgraded libssl and restarted opensips without rebuilding it. Will 
correctly rebuild opensips with libssl-1.0 dev files and retest.
The workaround with pthread_mutex_init() and shm_malloc() wrapping 
theoretically should work and Kamailio went the same way. Anyhow 
something definitely wrong with this "workaround" and libssl-1.1




Hi,

Downgraded libssl to 1.0.2n-1ubuntu5.3 available in repository and 
still can reproduce this bug. You were right that this is a deadlock 
in libssl. Found TCP worker process blocked in libcrypto and a bunch 
of processes burning CPU cycles in sched_yield() waiting for the lock. 
That's really strange that do completely different libssl versions 
have such a dangerous bug. Found a few opened issues related to 
libssl, but according to comments downgrading to libssl-1.0 helped to 
make it stable.



Hi,

I'm using libssl-1.1.1-1ubuntu2.1~18.04.4

Since I need SIP over TLS I can't just remove proto_tls. Will try to 
reproduce it again and collect more data.


On 07.08.19 13:18, Bogdan-Andrei Iancu wrote:

Hi Vitalii,

As the backtraces show, it is a deadlock in TLS - if you remove the 
proto_tls, you will not experience this issue anymore. The root 
problem is not in OpenSIPS itself but in the libssl/libcrypto - what 
versions are you using for these libs ?


Regards,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
  https://www.opensips-solutions.com
OpenSIPS Summit 2019
  https://www.opensips.org/events/Summit-2019Amsterdam/

On 08/05/2019 09:13 PM, Vitalii Aleksandrov wrote:

Hi,

I have a test opensips 2.4.6 (last week's checkout from 2.4 branch) 
and after short load testing it started to consume all CPU resources.


Here is what opensips floods to syslog:
Aug  3 13:25:01 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231430 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231530 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231630 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231730 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231830 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231930 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176232030 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:timer_ticker: timer 
task  already scheduled for 114023090 ms (now 176232030 
ms), it may overlap..


Where 4774 is: "Process::  ID=7 PID=4774 Type=timer"

Here is the TOP output for time consuming processes:
  PID USER PR  NI    VIRT    RES  SHR    S  %CPU 
%MEM TIME+ COMMAND
 4801 opensips  20   0  721040  28496  24240 R  52.9  0.7 1817:08 
opensips
 4802 opensips  20   0  721040  27724  23588 R  47.1  0.7 1813:06 
opensips
 4806 opensips  20   0  721040  27640  23532 R  47.1  0.7 1814:41 
opensips
 4833 opensips  20   0  721040  28848  24812 R  47.1  0.7 1815:16 
opensips
 4803 opensips  20   0  721044  27928  23732 R  41.2  0.7 1817:09 
opensips
 4804 opensips  20   0  721040  28188  23996 R  41.2  0.7 1816:05 
opensips
 4799 opensips  20   0  721040  28272  24172 R  35.3  0.7 1813:29 
opensips
 4800 opensips  20   0  721040  28340  24204 R  29.4  0.7 1816:52 
opensips
 4805 opensips  20   0  723256  29620  25300 R  29.4  0.8 1819:28 
opensips


And here is the "opensipsctl ps" for them:
Process::  ID=32 PID=4799 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=33 PID=4800 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=34 PID=4801 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=35 PID=4802 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=36 PID=4803 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=37 PID=4804 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=38 PID=4805 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=39 PID=4806 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=66 PID=4833 Type=TCP receiver

"strace" output for those burning processes just floods with:
11:07:04.042794 sched_yield()   = 0
11:07:04.042869 sched_yield()   = 0
11:07:04.042943 sched_yield()   = 0
11:07:04.043016 sched_yield()   = 0
11:07:04.043090 sched_yield()   = 0
11:07:04.043164 sched_yield()   = 0

Unfortunately this opensips is striped and gdb output is limited.
GDB bt of TCP receiver:
(gdb) bt
#0  0x7f939211fe57 in sched_yield () at 
../sysdeps/unix/syscall-template.S:78
#1  0x7f936fffc14d

Re: [OpenSIPS-Users] UDP workers consume 100% CPU

2019-08-12 Thread Vitalii Aleksandrov

Hi,

Downgraded libssl to 1.0.2n-1ubuntu5.3 available in repository and still 
can reproduce this bug. You were right that this is a deadlock in 
libssl. Found TCP worker process blocked in libcrypto and a bunch of 
processes burning CPU cycles in sched_yield() waiting for the lock. 
That's really strange that do completely different libssl versions have 
such a dangerous bug. Found a few opened issues related to libssl, but 
according to comments downgrading to libssl-1.0 helped to make it stable.



Hi,

I'm using libssl-1.1.1-1ubuntu2.1~18.04.4

Since I need SIP over TLS I can't just remove proto_tls. Will try to 
reproduce it again and collect more data.


On 07.08.19 13:18, Bogdan-Andrei Iancu wrote:

Hi Vitalii,

As the backtraces show, it is a deadlock in TLS - if you remove the 
proto_tls, you will not experience this issue anymore. The root 
problem is not in OpenSIPS itself but in the libssl/libcrypto - what 
versions are you using for these libs ?


Regards,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
  https://www.opensips-solutions.com
OpenSIPS Summit 2019
  https://www.opensips.org/events/Summit-2019Amsterdam/

On 08/05/2019 09:13 PM, Vitalii Aleksandrov wrote:

Hi,

I have a test opensips 2.4.6 (last week's checkout from 2.4 branch) 
and after short load testing it started to consume all CPU resources.


Here is what opensips floods to syslog:
Aug  3 13:25:01 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231430 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231530 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231630 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231730 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231830 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176231930 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
utimer task  already scheduled for 114023190 ms (now 
176232030 ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:timer_ticker: timer 
task  already scheduled for 114023090 ms (now 176232030 
ms), it may overlap..


Where 4774 is: "Process::  ID=7 PID=4774 Type=timer"

Here is the TOP output for time consuming processes:
  PID USER PR  NI    VIRT    RES  SHR    S  %CPU 
%MEM TIME+ COMMAND
 4801 opensips  20   0  721040  28496  24240 R  52.9  0.7 1817:08 
opensips
 4802 opensips  20   0  721040  27724  23588 R  47.1  0.7 1813:06 
opensips
 4806 opensips  20   0  721040  27640  23532 R  47.1  0.7 1814:41 
opensips
 4833 opensips  20   0  721040  28848  24812 R  47.1  0.7 1815:16 
opensips
 4803 opensips  20   0  721044  27928  23732 R  41.2  0.7 1817:09 
opensips
 4804 opensips  20   0  721040  28188  23996 R  41.2  0.7 1816:05 
opensips
 4799 opensips  20   0  721040  28272  24172 R  35.3  0.7 1813:29 
opensips
 4800 opensips  20   0  721040  28340  24204 R  29.4  0.7 1816:52 
opensips
 4805 opensips  20   0  723256  29620  25300 R  29.4  0.8 1819:28 
opensips


And here is the "opensipsctl ps" for them:
Process::  ID=32 PID=4799 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=33 PID=4800 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=34 PID=4801 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=35 PID=4802 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=36 PID=4803 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=37 PID=4804 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=38 PID=4805 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=39 PID=4806 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=66 PID=4833 Type=TCP receiver

"strace" output for those burning processes just floods with:
11:07:04.042794 sched_yield()   = 0
11:07:04.042869 sched_yield()   = 0
11:07:04.042943 sched_yield()   = 0
11:07:04.043016 sched_yield()   = 0
11:07:04.043090 sched_yield()   = 0
11:07:04.043164 sched_yield()   = 0

Unfortunately this opensips is striped and gdb output is limited.
GDB bt of TCP receiver:
(gdb) bt
#0  0x7f939211fe57 in sched_yield () at 
../sysdeps/unix/syscall-template.S:78
#1  0x7f936fffc14d in ?? () from 
/usr/lib/x86_64-linux-gnu/opensips/modules/proto_tls.so
#2  0x7f93791698d5 in send_pr_buffer () from 
/usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
#3  0x7f93791626b8 in relay_reply () from 
/usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
#4  0x7f937919fe2a in timer_routine () from 
/usr/lib/x86_64-linux-gnu/opensips/modules/tm.so

#5  0x55e00034a58a in handle_timer_jo

Re: [OpenSIPS-Users] UDP workers consume 100% CPU

2019-08-07 Thread Vitalii Aleksandrov

Hi,

I'm using libssl-1.1.1-1ubuntu2.1~18.04.4

Since I need SIP over TLS I can't just remove proto_tls. Will try to 
reproduce it again and collect more data.


On 07.08.19 13:18, Bogdan-Andrei Iancu wrote:

Hi Vitalii,

As the backtraces show, it is a deadlock in TLS - if you remove the 
proto_tls, you will not experience this issue anymore. The root 
problem is not in OpenSIPS itself but in the libssl/libcrypto - what 
versions are you using for these libs ?


Regards,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
  https://www.opensips-solutions.com
OpenSIPS Summit 2019
  https://www.opensips.org/events/Summit-2019Amsterdam/

On 08/05/2019 09:13 PM, Vitalii Aleksandrov wrote:

Hi,

I have a test opensips 2.4.6 (last week's checkout from 2.4 branch) 
and after short load testing it started to consume all CPU resources.


Here is what opensips floods to syslog:
Aug  3 13:25:01 l /opensips[4774]: WARNING:core:utimer_ticker: utimer 
task  already scheduled for 114023190 ms (now 176231430 
ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer 
task  already scheduled for 114023190 ms (now 176231530 
ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer 
task  already scheduled for 114023190 ms (now 176231630 
ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer 
task  already scheduled for 114023190 ms (now 176231730 
ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer 
task  already scheduled for 114023190 ms (now 176231830 
ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer 
task  already scheduled for 114023190 ms (now 176231930 
ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer 
task  already scheduled for 114023190 ms (now 176232030 
ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:timer_ticker: timer 
task  already scheduled for 114023090 ms (now 176232030 
ms), it may overlap..


Where 4774 is: "Process::  ID=7 PID=4774 Type=timer"

Here is the TOP output for time consuming processes:
  PID USER PR  NI    VIRT    RES  SHR    S  %CPU %MEM 
TIME+ COMMAND
 4801 opensips  20   0  721040  28496  24240 R  52.9  0.7 1817:08 
opensips
 4802 opensips  20   0  721040  27724  23588 R  47.1  0.7 1813:06 
opensips
 4806 opensips  20   0  721040  27640  23532 R  47.1  0.7 1814:41 
opensips
 4833 opensips  20   0  721040  28848  24812 R  47.1  0.7 1815:16 
opensips
 4803 opensips  20   0  721044  27928  23732 R  41.2  0.7 1817:09 
opensips
 4804 opensips  20   0  721040  28188  23996 R  41.2  0.7 1816:05 
opensips
 4799 opensips  20   0  721040  28272  24172 R  35.3  0.7 1813:29 
opensips
 4800 opensips  20   0  721040  28340  24204 R  29.4  0.7 1816:52 
opensips
 4805 opensips  20   0  723256  29620  25300 R  29.4  0.8 1819:28 
opensips


And here is the "opensipsctl ps" for them:
Process::  ID=32 PID=4799 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=33 PID=4800 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=34 PID=4801 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=35 PID=4802 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=36 PID=4803 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=37 PID=4804 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=38 PID=4805 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=39 PID=4806 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=66 PID=4833 Type=TCP receiver

"strace" output for those burning processes just floods with:
11:07:04.042794 sched_yield()   = 0
11:07:04.042869 sched_yield()   = 0
11:07:04.042943 sched_yield()   = 0
11:07:04.043016 sched_yield()   = 0
11:07:04.043090 sched_yield()   = 0
11:07:04.043164 sched_yield()   = 0

Unfortunately this opensips is striped and gdb output is limited.
GDB bt of TCP receiver:
(gdb) bt
#0  0x7f939211fe57 in sched_yield () at 
../sysdeps/unix/syscall-template.S:78
#1  0x7f936fffc14d in ?? () from 
/usr/lib/x86_64-linux-gnu/opensips/modules/proto_tls.so
#2  0x7f93791698d5 in send_pr_buffer () from 
/usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
#3  0x7f93791626b8 in relay_reply () from 
/usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
#4  0x7f937919fe2a in timer_routine () from 
/usr/lib/x86_64-linux-gnu/opensips/modules/tm.so

#5  0x55e00034a58a in handle_timer_job ()
#6  0x55e00041ae2a in tcp_worker_proc_loop ()
#7  0x55e000427218 in tcp_start_processes ()
#8  0x55e0002e6837 in main ()

GDB bt of UDP receiver:
#0  0x7f939211fe57 in sched_yield () at 
../sysdeps/unix/syscall-template.S:78
#1  0x7f936fffc14d in ?? () from 
/usr/lib/x86_64-linux-gnu/opensips/modules/proto_tls.so
#2  0x7f93791698d5 in send_pr_buffer () from 
/usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
#3  0x7f93791626b8 in relay_reply () from 
/usr/lib/x86_64-linux

Re: [OpenSIPS-Users] UDP workers consume 100% CPU

2019-08-07 Thread Bogdan-Andrei Iancu

Hi Vitalii,

As the backtraces show, it is a deadlock in TLS - if you remove the 
proto_tls, you will not experience this issue anymore. The root problem 
is not in OpenSIPS itself but in the libssl/libcrypto - what versions 
are you using for these libs ?


Regards,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
  https://www.opensips-solutions.com
OpenSIPS Summit 2019
  https://www.opensips.org/events/Summit-2019Amsterdam/

On 08/05/2019 09:13 PM, Vitalii Aleksandrov wrote:

Hi,

I have a test opensips 2.4.6 (last week's checkout from 2.4 branch) 
and after short load testing it started to consume all CPU resources.


Here is what opensips floods to syslog:
Aug  3 13:25:01 l /opensips[4774]: WARNING:core:utimer_ticker: utimer 
task  already scheduled for 114023190 ms (now 176231430 
ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer 
task  already scheduled for 114023190 ms (now 176231530 
ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer 
task  already scheduled for 114023190 ms (now 176231630 
ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer 
task  already scheduled for 114023190 ms (now 176231730 
ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer 
task  already scheduled for 114023190 ms (now 176231830 
ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer 
task  already scheduled for 114023190 ms (now 176231930 
ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer 
task  already scheduled for 114023190 ms (now 176232030 
ms), it may overlap..
Aug  3 13:25:02 l /opensips[4774]: WARNING:core:timer_ticker: timer 
task  already scheduled for 114023090 ms (now 176232030 ms), 
it may overlap..


Where 4774 is: "Process::  ID=7 PID=4774 Type=timer"

Here is the TOP output for time consuming processes:
  PID USER PR  NIVIRTRES  SHRS  %CPU %MEM 
TIME+ COMMAND
 4801 opensips  20   0  721040  28496  24240 R  52.9  0.7 1817:08 
opensips
 4802 opensips  20   0  721040  27724  23588 R  47.1  0.7 1813:06 
opensips
 4806 opensips  20   0  721040  27640  23532 R  47.1  0.7 1814:41 
opensips
 4833 opensips  20   0  721040  28848  24812 R  47.1  0.7 1815:16 
opensips
 4803 opensips  20   0  721044  27928  23732 R  41.2  0.7 1817:09 
opensips
 4804 opensips  20   0  721040  28188  23996 R  41.2  0.7 1816:05 
opensips
 4799 opensips  20   0  721040  28272  24172 R  35.3  0.7 1813:29 
opensips
 4800 opensips  20   0  721040  28340  24204 R  29.4  0.7 1816:52 
opensips
 4805 opensips  20   0  723256  29620  25300 R  29.4  0.8 1819:28 
opensips


And here is the "opensipsctl ps" for them:
Process::  ID=32 PID=4799 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=33 PID=4800 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=34 PID=4801 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=35 PID=4802 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=36 PID=4803 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=37 PID=4804 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=38 PID=4805 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=39 PID=4806 Type=SIP receiver udp:127.0.0.1:5070
Process::  ID=66 PID=4833 Type=TCP receiver

"strace" output for those burning processes just floods with:
11:07:04.042794 sched_yield()   = 0
11:07:04.042869 sched_yield()   = 0
11:07:04.042943 sched_yield()   = 0
11:07:04.043016 sched_yield()   = 0
11:07:04.043090 sched_yield()   = 0
11:07:04.043164 sched_yield()   = 0

Unfortunately this opensips is striped and gdb output is limited.
GDB bt of TCP receiver:
(gdb) bt
#0  0x7f939211fe57 in sched_yield () at 
../sysdeps/unix/syscall-template.S:78
#1  0x7f936fffc14d in ?? () from 
/usr/lib/x86_64-linux-gnu/opensips/modules/proto_tls.so
#2  0x7f93791698d5 in send_pr_buffer () from 
/usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
#3  0x7f93791626b8 in relay_reply () from 
/usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
#4  0x7f937919fe2a in timer_routine () from 
/usr/lib/x86_64-linux-gnu/opensips/modules/tm.so

#5  0x55e00034a58a in handle_timer_job ()
#6  0x55e00041ae2a in tcp_worker_proc_loop ()
#7  0x55e000427218 in tcp_start_processes ()
#8  0x55e0002e6837 in main ()

GDB bt of UDP receiver:
#0  0x7f939211fe57 in sched_yield () at 
../sysdeps/unix/syscall-template.S:78
#1  0x7f936fffc14d in ?? () from 
/usr/lib/x86_64-linux-gnu/opensips/modules/proto_tls.so
#2  0x7f93791698d5 in send_pr_buffer () from 
/usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
#3  0x7f93791626b8 in relay_reply () from 
/usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
#4  0x7f9379165015 in reply_received () from 
/usr/lib/x86_64-linux-gnu/opensips/modules/tm.so

#5  0x55e000322fed in forward_reply ()
#6  0x55e000309d1d in receive_msg