Re: [OpenSIPS-Users] UDP workers consume 100% CPU
Absolute agree. No doubts it's the same issue. I suspect you are facing an issue similar to this one[1], so I think we should move the discussion on that ticket. So please send us your updates over there. [1] https://github.com/OpenSIPS/opensips/issues/1767 Best regards, Răzvan On 8/14/19 12:58 PM, Vitalii Aleksandrov wrote: Ok, I'll check when finish load testing with libssl-1.0. Hi, Vitalii! Just to double check you are using the right overwritten functions, can you add some debugging in the tls_mgm code, recompile, and run the test again? I'm thinking to add something like the snippet below on line 1525[1]: LM_INFO("initialized mutex over OpenSIPS\n"); [1] https://github.com/OpenSIPS/opensips/blob/master/modules/tls_mgm/tls_mgm.c#L1525 Best regards, Răzvan On 8/12/19 7:20 PM, Vitalii Aleksandrov wrote: Checked the tls_mgm.c code and found that I wasn't right. I've just downgraded libssl and restarted opensips without rebuilding it. Will correctly rebuild opensips with libssl-1.0 dev files and retest. The workaround with pthread_mutex_init() and shm_malloc() wrapping theoretically should work and Kamailio went the same way. Anyhow something definitely wrong with this "workaround" and libssl-1.1 Hi, Downgraded libssl to 1.0.2n-1ubuntu5.3 available in repository and still can reproduce this bug. You were right that this is a deadlock in libssl. Found TCP worker process blocked in libcrypto and a bunch of processes burning CPU cycles in sched_yield() waiting for the lock. That's really strange that do completely different libssl versions have such a dangerous bug. Found a few opened issues related to libssl, but according to comments downgrading to libssl-1.0 helped to make it stable. Hi, I'm using libssl-1.1.1-1ubuntu2.1~18.04.4 Since I need SIP over TLS I can't just remove proto_tls. Will try to reproduce it again and collect more data. On 07.08.19 13:18, Bogdan-Andrei Iancu wrote: Hi Vitalii, As the backtraces show, it is a deadlock in TLS - if you remove the proto_tls, you will not experience this issue anymore. The root problem is not in OpenSIPS itself but in the libssl/libcrypto - what versions are you using for these libs ? Regards, Bogdan-Andrei Iancu OpenSIPS Founder and Developer https://www.opensips-solutions.com OpenSIPS Summit 2019 https://www.opensips.org/events/Summit-2019Amsterdam/ On 08/05/2019 09:13 PM, Vitalii Aleksandrov wrote: Hi, I have a test opensips 2.4.6 (last week's checkout from 2.4 branch) and after short load testing it started to consume all CPU resources. Here is what opensips floods to syslog: Aug 3 13:25:01 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231430 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231530 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231630 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231730 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231830 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231930 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176232030 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:timer_ticker: timer task already scheduled for 114023090 ms (now 176232030 ms), it may overlap.. Where 4774 is: "Process:: ID=7 PID=4774 Type=timer" Here is the TOP output for time consuming processes: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 4801 opensips 20 0 721040 28496 24240 R 52.9 0.7 1817:08 opensips 4802 opensips 20 0 721040 27724 23588 R 47.1 0.7 1813:06 opensips 4806 opensips 20 0 721040 27640 23532 R 47.1 0.7 1814:41 opensips 4833 opensips 20 0 721040 28848 24812 R 47.1 0.7 1815:16 opensips 4803 opensips 20 0 721044 27928 23732 R 41.2 0.7 1817:09 opensips 4804 opensips 20 0 721040 28188 23996 R 41.2 0.7 1816:05 opensips 4799 opensips 20 0 721040 28272 24172 R 35.3 0.7 1813:29 opensips 4800 opensips 20 0 721040 28340 24204 R 29.4 0.7 1816:52 opensips 4805 opensips 20 0 723256 29620 25300 R 29.4 0.8 1819:28 opensips And here is the "opensipsctl ps" for them: Process:: ID=32 PID=4799 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=33 PID=4800 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=34 PID=4801 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=35 PID=4802 Type=SIP receiver udp:127.0.0.1:5070 P
Re: [OpenSIPS-Users] UDP workers consume 100% CPU
I suspect you are facing an issue similar to this one[1], so I think we should move the discussion on that ticket. So please send us your updates over there. [1] https://github.com/OpenSIPS/opensips/issues/1767 Best regards, Răzvan On 8/14/19 12:58 PM, Vitalii Aleksandrov wrote: Ok, I'll check when finish load testing with libssl-1.0. Hi, Vitalii! Just to double check you are using the right overwritten functions, can you add some debugging in the tls_mgm code, recompile, and run the test again? I'm thinking to add something like the snippet below on line 1525[1]: LM_INFO("initialized mutex over OpenSIPS\n"); [1] https://github.com/OpenSIPS/opensips/blob/master/modules/tls_mgm/tls_mgm.c#L1525 Best regards, Răzvan On 8/12/19 7:20 PM, Vitalii Aleksandrov wrote: Checked the tls_mgm.c code and found that I wasn't right. I've just downgraded libssl and restarted opensips without rebuilding it. Will correctly rebuild opensips with libssl-1.0 dev files and retest. The workaround with pthread_mutex_init() and shm_malloc() wrapping theoretically should work and Kamailio went the same way. Anyhow something definitely wrong with this "workaround" and libssl-1.1 Hi, Downgraded libssl to 1.0.2n-1ubuntu5.3 available in repository and still can reproduce this bug. You were right that this is a deadlock in libssl. Found TCP worker process blocked in libcrypto and a bunch of processes burning CPU cycles in sched_yield() waiting for the lock. That's really strange that do completely different libssl versions have such a dangerous bug. Found a few opened issues related to libssl, but according to comments downgrading to libssl-1.0 helped to make it stable. Hi, I'm using libssl-1.1.1-1ubuntu2.1~18.04.4 Since I need SIP over TLS I can't just remove proto_tls. Will try to reproduce it again and collect more data. On 07.08.19 13:18, Bogdan-Andrei Iancu wrote: Hi Vitalii, As the backtraces show, it is a deadlock in TLS - if you remove the proto_tls, you will not experience this issue anymore. The root problem is not in OpenSIPS itself but in the libssl/libcrypto - what versions are you using for these libs ? Regards, Bogdan-Andrei Iancu OpenSIPS Founder and Developer https://www.opensips-solutions.com OpenSIPS Summit 2019 https://www.opensips.org/events/Summit-2019Amsterdam/ On 08/05/2019 09:13 PM, Vitalii Aleksandrov wrote: Hi, I have a test opensips 2.4.6 (last week's checkout from 2.4 branch) and after short load testing it started to consume all CPU resources. Here is what opensips floods to syslog: Aug 3 13:25:01 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231430 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231530 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231630 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231730 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231830 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231930 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176232030 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:timer_ticker: timer task already scheduled for 114023090 ms (now 176232030 ms), it may overlap.. Where 4774 is: "Process:: ID=7 PID=4774 Type=timer" Here is the TOP output for time consuming processes: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 4801 opensips 20 0 721040 28496 24240 R 52.9 0.7 1817:08 opensips 4802 opensips 20 0 721040 27724 23588 R 47.1 0.7 1813:06 opensips 4806 opensips 20 0 721040 27640 23532 R 47.1 0.7 1814:41 opensips 4833 opensips 20 0 721040 28848 24812 R 47.1 0.7 1815:16 opensips 4803 opensips 20 0 721044 27928 23732 R 41.2 0.7 1817:09 opensips 4804 opensips 20 0 721040 28188 23996 R 41.2 0.7 1816:05 opensips 4799 opensips 20 0 721040 28272 24172 R 35.3 0.7 1813:29 opensips 4800 opensips 20 0 721040 28340 24204 R 29.4 0.7 1816:52 opensips 4805 opensips 20 0 723256 29620 25300 R 29.4 0.8 1819:28 opensips And here is the "opensipsctl ps" for them: Process:: ID=32 PID=4799 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=33 PID=4800 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=34 PID=4801 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=35 PID=4802 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=36 PID=4803 Type=SIP receiver u
Re: [OpenSIPS-Users] UDP workers consume 100% CPU
Ok, I'll check when finish load testing with libssl-1.0. Hi, Vitalii! Just to double check you are using the right overwritten functions, can you add some debugging in the tls_mgm code, recompile, and run the test again? I'm thinking to add something like the snippet below on line 1525[1]: LM_INFO("initialized mutex over OpenSIPS\n"); [1] https://github.com/OpenSIPS/opensips/blob/master/modules/tls_mgm/tls_mgm.c#L1525 Best regards, Răzvan On 8/12/19 7:20 PM, Vitalii Aleksandrov wrote: Checked the tls_mgm.c code and found that I wasn't right. I've just downgraded libssl and restarted opensips without rebuilding it. Will correctly rebuild opensips with libssl-1.0 dev files and retest. The workaround with pthread_mutex_init() and shm_malloc() wrapping theoretically should work and Kamailio went the same way. Anyhow something definitely wrong with this "workaround" and libssl-1.1 Hi, Downgraded libssl to 1.0.2n-1ubuntu5.3 available in repository and still can reproduce this bug. You were right that this is a deadlock in libssl. Found TCP worker process blocked in libcrypto and a bunch of processes burning CPU cycles in sched_yield() waiting for the lock. That's really strange that do completely different libssl versions have such a dangerous bug. Found a few opened issues related to libssl, but according to comments downgrading to libssl-1.0 helped to make it stable. Hi, I'm using libssl-1.1.1-1ubuntu2.1~18.04.4 Since I need SIP over TLS I can't just remove proto_tls. Will try to reproduce it again and collect more data. On 07.08.19 13:18, Bogdan-Andrei Iancu wrote: Hi Vitalii, As the backtraces show, it is a deadlock in TLS - if you remove the proto_tls, you will not experience this issue anymore. The root problem is not in OpenSIPS itself but in the libssl/libcrypto - what versions are you using for these libs ? Regards, Bogdan-Andrei Iancu OpenSIPS Founder and Developer https://www.opensips-solutions.com OpenSIPS Summit 2019 https://www.opensips.org/events/Summit-2019Amsterdam/ On 08/05/2019 09:13 PM, Vitalii Aleksandrov wrote: Hi, I have a test opensips 2.4.6 (last week's checkout from 2.4 branch) and after short load testing it started to consume all CPU resources. Here is what opensips floods to syslog: Aug 3 13:25:01 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231430 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231530 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231630 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231730 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231830 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231930 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176232030 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:timer_ticker: timer task already scheduled for 114023090 ms (now 176232030 ms), it may overlap.. Where 4774 is: "Process:: ID=7 PID=4774 Type=timer" Here is the TOP output for time consuming processes: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 4801 opensips 20 0 721040 28496 24240 R 52.9 0.7 1817:08 opensips 4802 opensips 20 0 721040 27724 23588 R 47.1 0.7 1813:06 opensips 4806 opensips 20 0 721040 27640 23532 R 47.1 0.7 1814:41 opensips 4833 opensips 20 0 721040 28848 24812 R 47.1 0.7 1815:16 opensips 4803 opensips 20 0 721044 27928 23732 R 41.2 0.7 1817:09 opensips 4804 opensips 20 0 721040 28188 23996 R 41.2 0.7 1816:05 opensips 4799 opensips 20 0 721040 28272 24172 R 35.3 0.7 1813:29 opensips 4800 opensips 20 0 721040 28340 24204 R 29.4 0.7 1816:52 opensips 4805 opensips 20 0 723256 29620 25300 R 29.4 0.8 1819:28 opensips And here is the "opensipsctl ps" for them: Process:: ID=32 PID=4799 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=33 PID=4800 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=34 PID=4801 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=35 PID=4802 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=36 PID=4803 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=37 PID=4804 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=38 PID=4805 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=39 PID=4806 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=66 PID=4833 Type=TCP receiver "strace" output for those burning pr
Re: [OpenSIPS-Users] UDP workers consume 100% CPU
Hi, Vitalii! Just to double check you are using the right overwritten functions, can you add some debugging in the tls_mgm code, recompile, and run the test again? I'm thinking to add something like the snippet below on line 1525[1]: LM_INFO("initialized mutex over OpenSIPS\n"); [1] https://github.com/OpenSIPS/opensips/blob/master/modules/tls_mgm/tls_mgm.c#L1525 Best regards, Răzvan On 8/12/19 7:20 PM, Vitalii Aleksandrov wrote: Checked the tls_mgm.c code and found that I wasn't right. I've just downgraded libssl and restarted opensips without rebuilding it. Will correctly rebuild opensips with libssl-1.0 dev files and retest. The workaround with pthread_mutex_init() and shm_malloc() wrapping theoretically should work and Kamailio went the same way. Anyhow something definitely wrong with this "workaround" and libssl-1.1 Hi, Downgraded libssl to 1.0.2n-1ubuntu5.3 available in repository and still can reproduce this bug. You were right that this is a deadlock in libssl. Found TCP worker process blocked in libcrypto and a bunch of processes burning CPU cycles in sched_yield() waiting for the lock. That's really strange that do completely different libssl versions have such a dangerous bug. Found a few opened issues related to libssl, but according to comments downgrading to libssl-1.0 helped to make it stable. Hi, I'm using libssl-1.1.1-1ubuntu2.1~18.04.4 Since I need SIP over TLS I can't just remove proto_tls. Will try to reproduce it again and collect more data. On 07.08.19 13:18, Bogdan-Andrei Iancu wrote: Hi Vitalii, As the backtraces show, it is a deadlock in TLS - if you remove the proto_tls, you will not experience this issue anymore. The root problem is not in OpenSIPS itself but in the libssl/libcrypto - what versions are you using for these libs ? Regards, Bogdan-Andrei Iancu OpenSIPS Founder and Developer https://www.opensips-solutions.com OpenSIPS Summit 2019 https://www.opensips.org/events/Summit-2019Amsterdam/ On 08/05/2019 09:13 PM, Vitalii Aleksandrov wrote: Hi, I have a test opensips 2.4.6 (last week's checkout from 2.4 branch) and after short load testing it started to consume all CPU resources. Here is what opensips floods to syslog: Aug 3 13:25:01 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231430 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231530 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231630 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231730 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231830 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231930 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176232030 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:timer_ticker: timer task already scheduled for 114023090 ms (now 176232030 ms), it may overlap.. Where 4774 is: "Process:: ID=7 PID=4774 Type=timer" Here is the TOP output for time consuming processes: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 4801 opensips 20 0 721040 28496 24240 R 52.9 0.7 1817:08 opensips 4802 opensips 20 0 721040 27724 23588 R 47.1 0.7 1813:06 opensips 4806 opensips 20 0 721040 27640 23532 R 47.1 0.7 1814:41 opensips 4833 opensips 20 0 721040 28848 24812 R 47.1 0.7 1815:16 opensips 4803 opensips 20 0 721044 27928 23732 R 41.2 0.7 1817:09 opensips 4804 opensips 20 0 721040 28188 23996 R 41.2 0.7 1816:05 opensips 4799 opensips 20 0 721040 28272 24172 R 35.3 0.7 1813:29 opensips 4800 opensips 20 0 721040 28340 24204 R 29.4 0.7 1816:52 opensips 4805 opensips 20 0 723256 29620 25300 R 29.4 0.8 1819:28 opensips And here is the "opensipsctl ps" for them: Process:: ID=32 PID=4799 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=33 PID=4800 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=34 PID=4801 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=35 PID=4802 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=36 PID=4803 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=37 PID=4804 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=38 PID=4805 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=39 PID=4806 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=66 PID=4833 Type=TCP receiver "strace" output for those burning processes just floods with: 11:07:04.042794 sched_yiel
Re: [OpenSIPS-Users] UDP workers consume 100% CPU
Checked the tls_mgm.c code and found that I wasn't right. I've just downgraded libssl and restarted opensips without rebuilding it. Will correctly rebuild opensips with libssl-1.0 dev files and retest. The workaround with pthread_mutex_init() and shm_malloc() wrapping theoretically should work and Kamailio went the same way. Anyhow something definitely wrong with this "workaround" and libssl-1.1 Hi, Downgraded libssl to 1.0.2n-1ubuntu5.3 available in repository and still can reproduce this bug. You were right that this is a deadlock in libssl. Found TCP worker process blocked in libcrypto and a bunch of processes burning CPU cycles in sched_yield() waiting for the lock. That's really strange that do completely different libssl versions have such a dangerous bug. Found a few opened issues related to libssl, but according to comments downgrading to libssl-1.0 helped to make it stable. Hi, I'm using libssl-1.1.1-1ubuntu2.1~18.04.4 Since I need SIP over TLS I can't just remove proto_tls. Will try to reproduce it again and collect more data. On 07.08.19 13:18, Bogdan-Andrei Iancu wrote: Hi Vitalii, As the backtraces show, it is a deadlock in TLS - if you remove the proto_tls, you will not experience this issue anymore. The root problem is not in OpenSIPS itself but in the libssl/libcrypto - what versions are you using for these libs ? Regards, Bogdan-Andrei Iancu OpenSIPS Founder and Developer https://www.opensips-solutions.com OpenSIPS Summit 2019 https://www.opensips.org/events/Summit-2019Amsterdam/ On 08/05/2019 09:13 PM, Vitalii Aleksandrov wrote: Hi, I have a test opensips 2.4.6 (last week's checkout from 2.4 branch) and after short load testing it started to consume all CPU resources. Here is what opensips floods to syslog: Aug 3 13:25:01 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231430 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231530 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231630 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231730 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231830 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231930 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176232030 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:timer_ticker: timer task already scheduled for 114023090 ms (now 176232030 ms), it may overlap.. Where 4774 is: "Process:: ID=7 PID=4774 Type=timer" Here is the TOP output for time consuming processes: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 4801 opensips 20 0 721040 28496 24240 R 52.9 0.7 1817:08 opensips 4802 opensips 20 0 721040 27724 23588 R 47.1 0.7 1813:06 opensips 4806 opensips 20 0 721040 27640 23532 R 47.1 0.7 1814:41 opensips 4833 opensips 20 0 721040 28848 24812 R 47.1 0.7 1815:16 opensips 4803 opensips 20 0 721044 27928 23732 R 41.2 0.7 1817:09 opensips 4804 opensips 20 0 721040 28188 23996 R 41.2 0.7 1816:05 opensips 4799 opensips 20 0 721040 28272 24172 R 35.3 0.7 1813:29 opensips 4800 opensips 20 0 721040 28340 24204 R 29.4 0.7 1816:52 opensips 4805 opensips 20 0 723256 29620 25300 R 29.4 0.8 1819:28 opensips And here is the "opensipsctl ps" for them: Process:: ID=32 PID=4799 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=33 PID=4800 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=34 PID=4801 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=35 PID=4802 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=36 PID=4803 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=37 PID=4804 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=38 PID=4805 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=39 PID=4806 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=66 PID=4833 Type=TCP receiver "strace" output for those burning processes just floods with: 11:07:04.042794 sched_yield() = 0 11:07:04.042869 sched_yield() = 0 11:07:04.042943 sched_yield() = 0 11:07:04.043016 sched_yield() = 0 11:07:04.043090 sched_yield() = 0 11:07:04.043164 sched_yield() = 0 Unfortunately this opensips is striped and gdb output is limited. GDB bt of TCP receiver: (gdb) bt #0 0x7f939211fe57 in sched_yield () at ../sysdeps/unix/syscall-template.S:78 #1 0x7f936fffc14d
Re: [OpenSIPS-Users] UDP workers consume 100% CPU
Hi, Downgraded libssl to 1.0.2n-1ubuntu5.3 available in repository and still can reproduce this bug. You were right that this is a deadlock in libssl. Found TCP worker process blocked in libcrypto and a bunch of processes burning CPU cycles in sched_yield() waiting for the lock. That's really strange that do completely different libssl versions have such a dangerous bug. Found a few opened issues related to libssl, but according to comments downgrading to libssl-1.0 helped to make it stable. Hi, I'm using libssl-1.1.1-1ubuntu2.1~18.04.4 Since I need SIP over TLS I can't just remove proto_tls. Will try to reproduce it again and collect more data. On 07.08.19 13:18, Bogdan-Andrei Iancu wrote: Hi Vitalii, As the backtraces show, it is a deadlock in TLS - if you remove the proto_tls, you will not experience this issue anymore. The root problem is not in OpenSIPS itself but in the libssl/libcrypto - what versions are you using for these libs ? Regards, Bogdan-Andrei Iancu OpenSIPS Founder and Developer https://www.opensips-solutions.com OpenSIPS Summit 2019 https://www.opensips.org/events/Summit-2019Amsterdam/ On 08/05/2019 09:13 PM, Vitalii Aleksandrov wrote: Hi, I have a test opensips 2.4.6 (last week's checkout from 2.4 branch) and after short load testing it started to consume all CPU resources. Here is what opensips floods to syslog: Aug 3 13:25:01 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231430 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231530 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231630 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231730 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231830 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231930 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176232030 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:timer_ticker: timer task already scheduled for 114023090 ms (now 176232030 ms), it may overlap.. Where 4774 is: "Process:: ID=7 PID=4774 Type=timer" Here is the TOP output for time consuming processes: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 4801 opensips 20 0 721040 28496 24240 R 52.9 0.7 1817:08 opensips 4802 opensips 20 0 721040 27724 23588 R 47.1 0.7 1813:06 opensips 4806 opensips 20 0 721040 27640 23532 R 47.1 0.7 1814:41 opensips 4833 opensips 20 0 721040 28848 24812 R 47.1 0.7 1815:16 opensips 4803 opensips 20 0 721044 27928 23732 R 41.2 0.7 1817:09 opensips 4804 opensips 20 0 721040 28188 23996 R 41.2 0.7 1816:05 opensips 4799 opensips 20 0 721040 28272 24172 R 35.3 0.7 1813:29 opensips 4800 opensips 20 0 721040 28340 24204 R 29.4 0.7 1816:52 opensips 4805 opensips 20 0 723256 29620 25300 R 29.4 0.8 1819:28 opensips And here is the "opensipsctl ps" for them: Process:: ID=32 PID=4799 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=33 PID=4800 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=34 PID=4801 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=35 PID=4802 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=36 PID=4803 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=37 PID=4804 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=38 PID=4805 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=39 PID=4806 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=66 PID=4833 Type=TCP receiver "strace" output for those burning processes just floods with: 11:07:04.042794 sched_yield() = 0 11:07:04.042869 sched_yield() = 0 11:07:04.042943 sched_yield() = 0 11:07:04.043016 sched_yield() = 0 11:07:04.043090 sched_yield() = 0 11:07:04.043164 sched_yield() = 0 Unfortunately this opensips is striped and gdb output is limited. GDB bt of TCP receiver: (gdb) bt #0 0x7f939211fe57 in sched_yield () at ../sysdeps/unix/syscall-template.S:78 #1 0x7f936fffc14d in ?? () from /usr/lib/x86_64-linux-gnu/opensips/modules/proto_tls.so #2 0x7f93791698d5 in send_pr_buffer () from /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so #3 0x7f93791626b8 in relay_reply () from /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so #4 0x7f937919fe2a in timer_routine () from /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so #5 0x55e00034a58a in handle_timer_jo
Re: [OpenSIPS-Users] UDP workers consume 100% CPU
Hi, I'm using libssl-1.1.1-1ubuntu2.1~18.04.4 Since I need SIP over TLS I can't just remove proto_tls. Will try to reproduce it again and collect more data. On 07.08.19 13:18, Bogdan-Andrei Iancu wrote: Hi Vitalii, As the backtraces show, it is a deadlock in TLS - if you remove the proto_tls, you will not experience this issue anymore. The root problem is not in OpenSIPS itself but in the libssl/libcrypto - what versions are you using for these libs ? Regards, Bogdan-Andrei Iancu OpenSIPS Founder and Developer https://www.opensips-solutions.com OpenSIPS Summit 2019 https://www.opensips.org/events/Summit-2019Amsterdam/ On 08/05/2019 09:13 PM, Vitalii Aleksandrov wrote: Hi, I have a test opensips 2.4.6 (last week's checkout from 2.4 branch) and after short load testing it started to consume all CPU resources. Here is what opensips floods to syslog: Aug 3 13:25:01 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231430 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231530 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231630 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231730 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231830 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231930 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176232030 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:timer_ticker: timer task already scheduled for 114023090 ms (now 176232030 ms), it may overlap.. Where 4774 is: "Process:: ID=7 PID=4774 Type=timer" Here is the TOP output for time consuming processes: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 4801 opensips 20 0 721040 28496 24240 R 52.9 0.7 1817:08 opensips 4802 opensips 20 0 721040 27724 23588 R 47.1 0.7 1813:06 opensips 4806 opensips 20 0 721040 27640 23532 R 47.1 0.7 1814:41 opensips 4833 opensips 20 0 721040 28848 24812 R 47.1 0.7 1815:16 opensips 4803 opensips 20 0 721044 27928 23732 R 41.2 0.7 1817:09 opensips 4804 opensips 20 0 721040 28188 23996 R 41.2 0.7 1816:05 opensips 4799 opensips 20 0 721040 28272 24172 R 35.3 0.7 1813:29 opensips 4800 opensips 20 0 721040 28340 24204 R 29.4 0.7 1816:52 opensips 4805 opensips 20 0 723256 29620 25300 R 29.4 0.8 1819:28 opensips And here is the "opensipsctl ps" for them: Process:: ID=32 PID=4799 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=33 PID=4800 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=34 PID=4801 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=35 PID=4802 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=36 PID=4803 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=37 PID=4804 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=38 PID=4805 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=39 PID=4806 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=66 PID=4833 Type=TCP receiver "strace" output for those burning processes just floods with: 11:07:04.042794 sched_yield() = 0 11:07:04.042869 sched_yield() = 0 11:07:04.042943 sched_yield() = 0 11:07:04.043016 sched_yield() = 0 11:07:04.043090 sched_yield() = 0 11:07:04.043164 sched_yield() = 0 Unfortunately this opensips is striped and gdb output is limited. GDB bt of TCP receiver: (gdb) bt #0 0x7f939211fe57 in sched_yield () at ../sysdeps/unix/syscall-template.S:78 #1 0x7f936fffc14d in ?? () from /usr/lib/x86_64-linux-gnu/opensips/modules/proto_tls.so #2 0x7f93791698d5 in send_pr_buffer () from /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so #3 0x7f93791626b8 in relay_reply () from /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so #4 0x7f937919fe2a in timer_routine () from /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so #5 0x55e00034a58a in handle_timer_job () #6 0x55e00041ae2a in tcp_worker_proc_loop () #7 0x55e000427218 in tcp_start_processes () #8 0x55e0002e6837 in main () GDB bt of UDP receiver: #0 0x7f939211fe57 in sched_yield () at ../sysdeps/unix/syscall-template.S:78 #1 0x7f936fffc14d in ?? () from /usr/lib/x86_64-linux-gnu/opensips/modules/proto_tls.so #2 0x7f93791698d5 in send_pr_buffer () from /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so #3 0x7f93791626b8 in relay_reply () from /usr/lib/x86_64-linux
Re: [OpenSIPS-Users] UDP workers consume 100% CPU
Hi Vitalii, As the backtraces show, it is a deadlock in TLS - if you remove the proto_tls, you will not experience this issue anymore. The root problem is not in OpenSIPS itself but in the libssl/libcrypto - what versions are you using for these libs ? Regards, Bogdan-Andrei Iancu OpenSIPS Founder and Developer https://www.opensips-solutions.com OpenSIPS Summit 2019 https://www.opensips.org/events/Summit-2019Amsterdam/ On 08/05/2019 09:13 PM, Vitalii Aleksandrov wrote: Hi, I have a test opensips 2.4.6 (last week's checkout from 2.4 branch) and after short load testing it started to consume all CPU resources. Here is what opensips floods to syslog: Aug 3 13:25:01 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231430 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231530 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231630 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231730 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231830 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176231930 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: utimer task already scheduled for 114023190 ms (now 176232030 ms), it may overlap.. Aug 3 13:25:02 l /opensips[4774]: WARNING:core:timer_ticker: timer task already scheduled for 114023090 ms (now 176232030 ms), it may overlap.. Where 4774 is: "Process:: ID=7 PID=4774 Type=timer" Here is the TOP output for time consuming processes: PID USER PR NIVIRTRES SHRS %CPU %MEM TIME+ COMMAND 4801 opensips 20 0 721040 28496 24240 R 52.9 0.7 1817:08 opensips 4802 opensips 20 0 721040 27724 23588 R 47.1 0.7 1813:06 opensips 4806 opensips 20 0 721040 27640 23532 R 47.1 0.7 1814:41 opensips 4833 opensips 20 0 721040 28848 24812 R 47.1 0.7 1815:16 opensips 4803 opensips 20 0 721044 27928 23732 R 41.2 0.7 1817:09 opensips 4804 opensips 20 0 721040 28188 23996 R 41.2 0.7 1816:05 opensips 4799 opensips 20 0 721040 28272 24172 R 35.3 0.7 1813:29 opensips 4800 opensips 20 0 721040 28340 24204 R 29.4 0.7 1816:52 opensips 4805 opensips 20 0 723256 29620 25300 R 29.4 0.8 1819:28 opensips And here is the "opensipsctl ps" for them: Process:: ID=32 PID=4799 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=33 PID=4800 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=34 PID=4801 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=35 PID=4802 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=36 PID=4803 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=37 PID=4804 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=38 PID=4805 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=39 PID=4806 Type=SIP receiver udp:127.0.0.1:5070 Process:: ID=66 PID=4833 Type=TCP receiver "strace" output for those burning processes just floods with: 11:07:04.042794 sched_yield() = 0 11:07:04.042869 sched_yield() = 0 11:07:04.042943 sched_yield() = 0 11:07:04.043016 sched_yield() = 0 11:07:04.043090 sched_yield() = 0 11:07:04.043164 sched_yield() = 0 Unfortunately this opensips is striped and gdb output is limited. GDB bt of TCP receiver: (gdb) bt #0 0x7f939211fe57 in sched_yield () at ../sysdeps/unix/syscall-template.S:78 #1 0x7f936fffc14d in ?? () from /usr/lib/x86_64-linux-gnu/opensips/modules/proto_tls.so #2 0x7f93791698d5 in send_pr_buffer () from /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so #3 0x7f93791626b8 in relay_reply () from /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so #4 0x7f937919fe2a in timer_routine () from /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so #5 0x55e00034a58a in handle_timer_job () #6 0x55e00041ae2a in tcp_worker_proc_loop () #7 0x55e000427218 in tcp_start_processes () #8 0x55e0002e6837 in main () GDB bt of UDP receiver: #0 0x7f939211fe57 in sched_yield () at ../sysdeps/unix/syscall-template.S:78 #1 0x7f936fffc14d in ?? () from /usr/lib/x86_64-linux-gnu/opensips/modules/proto_tls.so #2 0x7f93791698d5 in send_pr_buffer () from /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so #3 0x7f93791626b8 in relay_reply () from /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so #4 0x7f9379165015 in reply_received () from /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so #5 0x55e000322fed in forward_reply () #6 0x55e000309d1d in receive_msg