Re: [asterisk-users] Asterisk rtp.conf stunaddr setting - what happens if there is an outage

2023-02-06 Thread Joshua C. Colp
On Mon, Feb 6, 2023 at 6:05 PM Dan Cropp wrote:

> A quick follow-up.
>
> I looked at other customers running 18.12.1 who reported problems at the
> exact same time as the AWS issue described below.
>
> We are seeing similar behavior.
> On these systems the third STUN attempt also times out, but we were able
> to answer the call because the SIP provider didn’t CANCEL it.
> Upstream of the service provider, however, the calls had been terminated.
> The result is a call from the SIP provider to Asterisk that’s live but has
> no caller, so it appears to be dead air.
>
> Does the res_rtp_asterisk stunaddr DNS TTL expiration mentioned in change
> ID I7955a046293f913ba121bbd82153b04439e3465f require dnsmgr to be enabled
> in dnsmgr.conf?
>

It doesn't use dnsmgr, so dnsmgr doesn't need to be enabled. If the TTL is
long, or the record is cached locally, then the old address could stick
around longer.

Fundamentally, though, is there a reason you're using STUN in the first
place? Can you not just configure the public IP address and not rely on an
external STUN server? rtp.conf has ice_host_candidates specifically for
situations like AWS.
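
As a rough illustration of that suggestion, the fragment below sketches what
a static one-to-one NAT setup without STUN might look like. The addresses
(10.0.0.10, 10.0.0.0/16, 203.0.113.10) and the transport section name are
placeholders, not values from this thread, and the pjsip.conf part only
applies if the systems really are on PJSIP. Check the sample config files
shipped with your Asterisk version before relying on any of it.

```ini
; rtp.conf (sketch; addresses are placeholders)
[general]
; stunaddr left unset so call setup does not wait on an external STUN server
;stunaddr=stun.example.com:3478

[ice_host_candidates]
; <local address> => <advertised address>
10.0.0.10 => 203.0.113.10

; pjsip.conf (sketch; section name and addresses are placeholders)
[transport-udp]
type=transport
protocol=udp
bind=0.0.0.0
local_net=10.0.0.0/16
external_media_address=203.0.113.10
external_signaling_address=203.0.113.10
```

With a fixed public address configured this way, media setup no longer
depends on an outside server being reachable when a call arrives.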

-- 
Joshua C. Colp
Asterisk Project Lead
Sangoma Technologies
Check us out at www.sangoma.com and www.asterisk.org
-- 
_
-- Bandwidth and Colocation Provided by http://www.api-digital.com --

Check out the new Asterisk community forum at: https://community.asterisk.org/

New to Asterisk? Start here:
  https://wiki.asterisk.org/wiki/display/AST/Getting+Started

asterisk-users mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-users

Re: [asterisk-users] Asterisk rtp.conf stunaddr setting - what happens if there is an outage

2023-02-06 Thread Dan Cropp
A quick follow-up.

I looked at other customers running 18.12.1 who reported problems at the exact 
same time as the AWS issue described below.

We are seeing similar behavior.
On these systems the third STUN attempt also times out, but we were able to 
answer the call because the SIP provider didn't CANCEL it.
Upstream of the service provider, however, the calls had been terminated.
The result is a call from the SIP provider to Asterisk that's live but has no 
caller, so it appears to be dead air.

Does the res_rtp_asterisk stunaddr DNS TTL expiration mentioned in change ID 
I7955a046293f913ba121bbd82153b04439e3465f require dnsmgr to be enabled in 
dnsmgr.conf?

Dan


From: Dan Cropp
Sent: Monday, February 6, 2023 2:06 PM
To: Asterisk Users Mailing List - Non-Commercial Discussion
Subject: Asterisk rtp.conf stunaddr setting - what happens if there is an outage

Over the weekend, AWS had an outage that affected several of our customers 
running there.

This customer is running Asterisk 16.23.0 (which has the STUN timeout crash 
fix).
From what I have been told, other customers running the newer Asterisk 18.12.1 
encountered similar issues.  (I haven't had a chance to verify this.)
All these customers should be running PJSIP, but I haven't had a chance to 
verify that either.


The logs show Asterisk was reporting problems communicating with the STUN 
address in rtp.conf:

[02/04 00:15:03.812] NOTICE[5943] stun.c: Attempt 1 to send STUN request to 'x.x.x.x' timed out.
[02/04 00:15:06.812] NOTICE[5943] stun.c: Attempt 2 to send STUN request to 'x.x.x.x' timed out.
[02/04 00:15:09.813] WARNING[5943] stun.c: Attempt 3 to send STUN request to 'x.x.x.x' timed out. Check that the server address is correct and reachable.

Until Asterisk was restarted, the same pattern kept repeating:

Asterisk receives an INVITE.
It immediately sends a 100 Trying.
7 seconds later, Asterisk receives a CANCEL from the SIP provider.
Half a second after that, Asterisk receives a second CANCEL.
A second later, Asterisk receives a third CANCEL.
After the third STUN request times out, Asterisk sends a 200 OK response 
for the CANCEL (CSeq: CANCEL).
This is followed by a 487 Request Terminated.
Then a second 200 OK response for the CANCEL.
Then a third 200 OK response for the CANCEL.
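
The timing in the log excerpt and the SIP sequence above can be lined up with 
a little arithmetic. The Python sketch below parses the three log timestamps 
and estimates how long call setup was waiting on STUN. It assumes that each 
attempt waits one full interval before logging its timeout and that the first 
attempt starts when the INVITE arrives; neither is confirmed in this thread. 
The year is also an assumption, since the log lines do not carry one.

```python
from datetime import datetime

ASSUMED_YEAR = 2023   # not present in the log lines; taken from the thread date
CANCEL_AFTER = 7.0    # seconds after the INVITE, from the sequence above

# Log lines from the excerpt above (address redacted as in the original).
LOG_LINES = [
    "[02/04 00:15:03.812] NOTICE[5943] stun.c: Attempt 1 to send STUN request to 'x.x.x.x' timed out.",
    "[02/04 00:15:06.812] NOTICE[5943] stun.c: Attempt 2 to send STUN request to 'x.x.x.x' timed out.",
    "[02/04 00:15:09.813] WARNING[5943] stun.c: Attempt 3 to send STUN request to 'x.x.x.x' timed out.",
]


def parse_timestamp(line: str) -> datetime:
    """Parse the leading '[MM/DD HH:MM:SS.mmm]' field of an Asterisk log line."""
    stamp = line[1:line.index("]")]
    return datetime.strptime(f"{ASSUMED_YEAR}/{stamp}", "%Y/%m/%d %H:%M:%S.%f")


def timeout_gaps(lines):
    """Seconds between consecutive 'timed out' messages."""
    times = [parse_timestamp(line) for line in lines]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


gaps = timeout_gaps(LOG_LINES)
per_attempt = sum(gaps) / len(gaps)
# Assumption: attempt 1 also waited one full interval before it was logged.
blocked_for = per_attempt * len(LOG_LINES)

print(f"gap between timeouts: {gaps}")
print(f"estimated wait on STUN: {blocked_for:.1f}s, first CANCEL after {CANCEL_AFTER:.0f}s")
```

Under those assumptions the wait on STUN outlasts the provider's patience, 
which would fit the observation that the CANCELs are only answered after the 
third timeout.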

We have an AMI connection.  At this point, we see the Newchannel event for 
this channel.
Asterisk then immediately sends various events for the channel, including 
Event: Hangup, indicating that the channel has ended.

63 ms later, Asterisk receives an ACK, which completes processing for that 
Call-ID.


This went on for over 8 hours.
When they restarted the Asterisk box, everything was fine.  I have been told 
that they had to restart each Asterisk instance we had running at AWS to clear 
the STUN request timeouts.  No calls/channels would work until that was done.

I wonder whether the STUN address lookup happens only once, and whether AWS 
DNS changed something during this outage/recovery.
Is there a recommendation on how to prevent this from happening?
Any thoughts?
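
One way to look at the lookup question from outside Asterisk is a small probe 
that resolves the STUN hostname afresh on every run and sends a single STUN 
Binding Request (RFC 5389). The Python sketch below is a monitoring aid 
only, not how Asterisk itself performs the request. The function names, the 
3-second timeout, and any hostname passed in are illustrative assumptions.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442   # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101


def build_binding_request(txid: bytes) -> bytes:
    """Build a 20-byte STUN header (type, length, cookie, transaction ID) with no attributes."""
    if len(txid) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    return struct.pack("!HHI12s", BINDING_REQUEST, 0, MAGIC_COOKIE, txid)


def is_matching_success(reply: bytes, txid: bytes) -> bool:
    """Return True if reply is a Binding Success Response carrying our transaction ID."""
    if len(reply) < 20:
        return False
    msg_type, _length, cookie, reply_txid = struct.unpack("!HHI12s", reply[:20])
    return msg_type == BINDING_SUCCESS and cookie == MAGIC_COOKIE and reply_txid == txid


def probe(host: str, port: int = 3478, timeout: float = 3.0) -> bool:
    """Resolve host now, send one Binding Request over UDP, and wait for a matching reply."""
    try:
        family, _, _, _, addr = socket.getaddrinfo(host, port, type=socket.SOCK_DGRAM)[0]
    except socket.gaierror:
        return False  # the name did not resolve at all
    txid = os.urandom(12)
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_binding_request(txid), addr)
            reply, _ = sock.recvfrom(2048)
        except OSError:  # covers timeouts and ICMP-reported errors
            return False
    return is_matching_success(reply, txid)
```

Calling probe() with the hostname from stunaddr while Asterisk is logging 
timeouts would show whether a fresh lookup reaches a working server. If the 
probe succeeds while Asterisk keeps failing, a stale address inside Asterisk 
becomes the more likely explanation.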


Dan

