Re: [asterisk-users] Asterisk SIP deadlocks - update_provisional_keepalive
Hi everyone,

It appears that the same problem occurs in Asterisk version 1.8.10.1:

asterisk1*CLI> core show version
Asterisk 1.8.10.1~dfsg-1ubuntu1 built by buildd @ yellow on a x86_64 running Linux on 2012-04-24 12:47:04 UTC

SIP calls stop working after the following log message:

[Aug 22 13:56:44] WARNING[25246] chan_sip.c: Unable to cancel schedule ID 11251269. This is probably a bug (chan_sip.c: stop_session_timer, line 25844).

Best regards

2013/4/6 Duane Larson duane.lar...@gmail.com

Looks like version 11.3 did not fix my issue.

http://pastebin.com/gd291Bqz

On Thu, Apr 4, 2013 at 1:23 PM, Duane Larson duane.lar...@gmail.com wrote:

Thanks Jim. Searched through the change log for deadlock but nothing really stuck out. I'll upgrade to 11.3 and see if that makes a difference.

On Thu, Apr 4, 2013 at 10:59 AM, Jim Lucas li...@cmsws.com wrote:

On 04/03/2013 08:15 PM, Duane Larson wrote:

So it just happened again on both machines at the same time, and I was running debug on both servers. I am running OpenSIPS and load balancing between both servers, so I am guessing that when the INVITE was sent to the first server it was frozen for some reason, and then OpenSIPS sent the INVITE to the second server, which was also frozen/deadlocked because of the SIP message.

I noticed that on both servers the last log entry written while Asterisk was deadlocked was the following:

Asterisk version 11.0.1
[Apr 3 21:39:42] DEBUG[12984] res_timing_timerfd.c: Expected to acknowledge 1 ticks but got 11805 instead

Asterisk version 11.2.1
[Apr 3 21:39:50] DEBUG[1854] res_timing_timerfd.c: Expected to acknowledge 1 ticks but got 12423 instead

In my last email I posted the debug from the Asterisk server running 11.0.1. Here is the debug from the Asterisk server running 11.2.1:
http://pastebin.com/mbjSSAWM

This has to be a bug, right? I am thinking of opening an issue on the Asterisk JIRA system.

A number of deadlocks were fixed in the current release of 11.3. Please read the change log to see if any fit your issue.

http://downloads.asterisk.org/pub/telephony/asterisk/ChangeLog-11-current

On Wed, Apr 3, 2013 at 4:45 PM, Duane Larson duane.lar...@gmail.com wrote:

It just happened again on the 11.0.1 box and I was able to grab a debug. I am hoping someone can tell me if this is a bug or something wrong with my config.

gdb asterisk-bin/sbin/asterisk 29048

Go here for the debug output: http://pastebin.com/DGXx0BSk

On Tue, Apr 2, 2013 at 7:42 PM, Duane Larson duane.lar...@gmail.com wrote:

I am currently running two different versions of Asterisk:

11.0.1
11.2.1

I have noticed the bug occur on both servers. The issue is that when I try to dial a phone number, sometimes the call never goes out. I check the Asterisk server with ngrep and see that the SIP messages are making it to Asterisk, but Asterisk isn't responding. I run the following command

netstat -nap | grep 5060

and see that Asterisk has a lot queued under the Recv-Q column. It usually takes about 10 minutes before Asterisk becomes responsive again; alternatively, I can restart Asterisk before the 10 minutes are up and everything goes back to normal.

I see the following errors in the message logs.

On the 11.0.1 Asterisk server:
WARNING[23723][C-0010] chan_sip.c: Unable to cancel schedule ID 11473. This is probably a bug (chan_sip.c: update_provisional_keepalive, line 4406).

On the 11.2.1 Asterisk server:
WARNING[3493][C-001f] chan_sip.c: Unable to cancel schedule ID 30810. This is probably a bug (chan_sip.c: update_provisional_keepalive, line 4683).

When I look in chan_sip.c on both servers, I see that both warnings come from the same line of code:

AST_SCHED_DEL_UNREF(sched, pvt->provisional_keepalive_sched_id, dialog_unref(pvt, "when you delete the provisional_keepalive_sched_id, you should dec the refcount for the stored dialog ptr"));

What could be causing this? It seems to happen at least once a day.

--
*--*--*--*--*--* Duane *--*--*--*--*--*

--
Bandwidth and Colocation Provided by http://www.api-digital.com
New to Asterisk? Join us for a live introductory webinar every Thurs:
http://www.asterisk.org/hello

asterisk-users mailing list
To UNSUBSCRIBE or update options visit:
http://lists.digium.com/mailman/listinfo/asterisk-users

--
Jim Lucas
http://www.cmsws.com/
http://www.cmsws.com/examples/
Re: [asterisk-users] Asterisk SIP deadlocks - update_provisional_keepalive
Looks like version 11.3 did not fix my issue.

http://pastebin.com/gd291Bqz

--
*--*--*--*--*--* Duane *--*--*--*--*--*
Re: [asterisk-users] Asterisk SIP deadlocks - update_provisional_keepalive
A number of deadlocks were fixed in the current release of 11.3. Please read the change log to see if any fit your issue.

http://downloads.asterisk.org/pub/telephony/asterisk/ChangeLog-11-current

--
Jim Lucas
http://www.cmsws.com/
http://www.cmsws.com/examples/
Re: [asterisk-users] Asterisk SIP deadlocks - update_provisional_keepalive
Thanks Jim. Searched through the change log for deadlock but nothing really stuck out. I'll upgrade to 11.3 and see if that makes a difference.

--
*--*--*--*--*--* Duane *--*--*--*--*--*
Re: [asterisk-users] Asterisk SIP deadlocks - update_provisional_keepalive
It just happened again on the 11.0.1 box and I was able to grab a debug. I am hoping someone can tell me if this is a bug or something wrong with my config.

gdb asterisk-bin/sbin/asterisk 29048

Go here for the debug output: http://pastebin.com/DGXx0BSk

--
*--*--*--*--*--* Duane *--*--*--*--*--*
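For anyone wanting to capture the same kind of debug output non-interactively, here is a minimal sketch that attaches gdb in batch mode and dumps a backtrace of every thread. It assumes gdb is installed and that the caller is allowed to attach to the process; the function names `build_gdb_command` and `capture_backtrace` are made up for this example, and the binary path and PID are simply the ones from the message above.

```python
import subprocess
from typing import List


def build_gdb_command(binary: str, pid: int) -> List[str]:
    """Build a gdb invocation that attaches, prints all thread stacks, and exits."""
    # -batch makes gdb exit after running the -ex commands;
    # "thread apply all bt" prints a backtrace for every thread.
    return ["gdb", "-batch", "-ex", "thread apply all bt", binary, str(pid)]


def capture_backtrace(binary: str, pid: int, timeout: float = 60.0) -> str:
    """Run gdb against a live process and return its standard output."""
    result = subprocess.run(
        build_gdb_command(binary, pid),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout


# Example call (not executed here because it needs a running process):
# print(capture_backtrace("asterisk-bin/sbin/asterisk", 29048))
```

A dump of all threads taken while the process is hung is usually more useful for spotting a deadlock than the stack of a single thread, because it shows which threads are waiting on which locks.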
Re: [asterisk-users] Asterisk SIP deadlocks - update_provisional_keepalive
So it just happened again on both machines at the same time, and I was running debug on both servers. I am running OpenSIPS and load balancing between both servers, so I am guessing that when the INVITE was sent to the first server it was frozen for some reason, and then OpenSIPS sent the INVITE to the second server, which was also frozen/deadlocked because of the SIP message.

I noticed that on both servers the last log entry written while Asterisk was deadlocked was the following:

Asterisk version 11.0.1
[Apr 3 21:39:42] DEBUG[12984] res_timing_timerfd.c: Expected to acknowledge 1 ticks but got 11805 instead

Asterisk version 11.2.1
[Apr 3 21:39:50] DEBUG[1854] res_timing_timerfd.c: Expected to acknowledge 1 ticks but got 12423 instead

In my last email I posted the debug from the Asterisk server running 11.0.1. Here is the debug from the Asterisk server running 11.2.1:
http://pastebin.com/mbjSSAWM

This has to be a bug, right? I am thinking of opening an issue on the Asterisk JIRA system.

--
*--*--*--*--*--* Duane *--*--*--*--*--*
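To compare stalls across the two servers, the timerfd debug lines can be pulled out of the logs and lined up by timestamp. The following is a rough sketch only: the regular expression is written against the two log lines quoted in this message, and the names `TimerBacklog` and `timer_backlogs` are invented for the example. It does not try to turn tick counts into seconds, since the timer rate is not stated anywhere in this thread.

```python
import re
from typing import List, NamedTuple


class TimerBacklog(NamedTuple):
    stamp: str     # timestamp text as it appears in the log
    thread: int    # thread ID from DEBUG[...]
    expected: int  # ticks the module expected to acknowledge
    got: int       # ticks that had actually accumulated


# Pattern modelled on the res_timing_timerfd.c lines quoted in the thread.
TICK_RE = re.compile(
    r"\[(?P<stamp>[^\]]+)\]\s+DEBUG\[(?P<thread>\d+)\]\s+"
    r"res_timing_timerfd\.c:\s+Expected to acknowledge "
    r"(?P<expected>\d+) ticks but got (?P<got>\d+) instead"
)


def timer_backlogs(log_text: str) -> List[TimerBacklog]:
    """Return every timerfd backlog message found in the log text."""
    return [
        TimerBacklog(
            m.group("stamp"),
            int(m.group("thread")),
            int(m.group("expected")),
            int(m.group("got")),
        )
        for m in TICK_RE.finditer(log_text)
    ]


# The two lines below are the ones quoted in this thread.
SAMPLE_LOG = (
    "[Apr 3 21:39:42] DEBUG[12984] res_timing_timerfd.c: "
    "Expected to acknowledge 1 ticks but got 11805 instead\n"
    "[Apr 3 21:39:50] DEBUG[1854] res_timing_timerfd.c: "
    "Expected to acknowledge 1 ticks but got 12423 instead\n"
)

for entry in timer_backlogs(SAMPLE_LOG):
    print(entry.stamp, entry.thread, entry.got)
```

A large gap between the expected and actual tick counts suggests the thread servicing the timer was not running for a while, which is consistent with the freeze described above but does not by itself say what it was blocked on.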
[asterisk-users] Asterisk SIP deadlocks - update_provisional_keepalive
I am currently running two different versions of Asterisk:

11.0.1
11.2.1

I have noticed the bug occur on both servers. The issue is that when I try to dial a phone number, sometimes the call never goes out. I check the Asterisk server with ngrep and see that the SIP messages are making it to Asterisk, but Asterisk isn't responding. I run the following command

netstat -nap | grep 5060

and see that Asterisk has a lot queued under the Recv-Q column. It usually takes about 10 minutes before Asterisk becomes responsive again; alternatively, I can restart Asterisk before the 10 minutes are up and everything goes back to normal.

I see the following errors in the message logs.

On the 11.0.1 Asterisk server:
WARNING[23723][C-0010] chan_sip.c: Unable to cancel schedule ID 11473. This is probably a bug (chan_sip.c: update_provisional_keepalive, line 4406).

On the 11.2.1 Asterisk server:
WARNING[3493][C-001f] chan_sip.c: Unable to cancel schedule ID 30810. This is probably a bug (chan_sip.c: update_provisional_keepalive, line 4683).

When I look in chan_sip.c on both servers, I see that both warnings come from the same line of code:

AST_SCHED_DEL_UNREF(sched, pvt->provisional_keepalive_sched_id, dialog_unref(pvt, "when you delete the provisional_keepalive_sched_id, you should dec the refcount for the stored dialog ptr"));

What could be causing this? It seems to happen at least once a day.
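The Recv-Q check described above can be scripted so that a backlog on the SIP port is caught as it builds up. This is a minimal sketch, assuming the usual Linux net-tools column order for `netstat -nap` (Proto, Recv-Q, Send-Q, Local Address, ...). The sample output, its byte counts and PIDs, and the helper names `parse_netstat` and `backlogged` are all invented for illustration and do not come from the servers in this thread.

```python
from typing import List, NamedTuple


class SocketQueue(NamedTuple):
    proto: str
    recv_q: int         # bytes received by the kernel but not yet read by the process
    send_q: int
    local_address: str


def parse_netstat(output: str, port: int = 5060) -> List[SocketQueue]:
    """Extract queue sizes for sockets bound to the given local port."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a protocol name followed by two integer columns.
        if len(fields) < 4 or not fields[0].startswith(("tcp", "udp")):
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        local = fields[3]
        # Compare the port exactly; a plain "grep 5060" would also match 15060.
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        rows.append(SocketQueue(fields[0], int(fields[1]), int(fields[2]), local))
    return rows


def backlogged(rows: List[SocketQueue], threshold: int = 0) -> List[SocketQueue]:
    """Return sockets whose receive queue exceeds the threshold (in bytes)."""
    return [row for row in rows if row.recv_q > threshold]


# Hypothetical netstat output, made up for this example.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
udp   213312      0 0.0.0.0:5060            0.0.0.0:*                           29048/asterisk
tcp        0      0 0.0.0.0:5060            0.0.0.0:*               LISTEN      29048/asterisk
udp        0      0 0.0.0.0:5061            0.0.0.0:*                           1234/other
"""

for row in backlogged(parse_netstat(SAMPLE), threshold=1000):
    print(row.proto, row.local_address, row.recv_q)
```

A Recv-Q that stays large on the SIP socket means the kernel is holding packets that the process has not read, which matches the symptom of ngrep seeing traffic that Asterisk never answers. The threshold of 1000 bytes is an arbitrary choice for the example.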