On Jun 6, 2007, at 12:45 PM, Gilad Shaham wrote:




If anyone is really into solving this rare scenario, one possible
solution is to wait a bit longer after the CANCEL, for example
64*T1 + T4.


Unfortunately, this does not really solve the problem. Suppose we
have more than one proxy in a chain, each with this timer of 64*T1 + T4.
We still end up with a problem (since the first proxy's timer pops,
then the second, etc). It seems to me that we should set a timer for
64*T1 that causes a 408 to be forwarded back, and tear down the
transaction later (maybe another 64*T1 later). This way, we can ACK
the responses that might come in as a result of timers popping
downstream.



What I meant was that if the retransmission timer is 64*T1 and the
cancel timer is a bit larger (e.g., 64*T1 + T4) then this case would be
resolved because the retransmission would occur before the cancel,
giving enough time for a 408 to propagate. It's possible I missed
something, but I think the difference between the two suggestions is the
cancel timer: 64*T1 + T4 vs. 2*64*T1.
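
For concreteness, a quick sketch of the two cancel-timer values being compared. It assumes the RFC 3261 default values T1 = 500 ms and T4 = 5 s; the thread does not say which values are actually configured, so treat the numbers as illustrative only.

```python
# Assumed RFC 3261 defaults; a deployment may configure other values.
T1 = 0.5  # seconds, round-trip time estimate
T4 = 5.0  # seconds, maximum time a message remains in the network

# Suggestion A: a single cancel timer slightly longer than 64*T1.
cancel_timer_a = 64 * T1 + T4
# Suggestion B: send the 408 at 64*T1, tear down another 64*T1 later.
cancel_timer_b = 2 * 64 * T1

print(cancel_timer_a, cancel_timer_b)
```

With these assumed defaults the two values come to 37 and 64 seconds.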


Let's make sure we're on the same page; we might be, but let me describe my scenario in a little more detail. Let's say we have a _chain_ of proxies between the UAC and UAS, an INVITE has already gone through, and every element has sent a 100.

UAC -> P1 -> P2 -> P3 -> ... -> Pn -> UAS

Scenario 1:
Now, suppose UAC CANCELs. P1-Pn all set their CANCEL timer for 64*T1 + T4, one after the other, as the CANCEL is forwarded. And let's say UAS is out to cause trouble, and sends CANCEL/200, but no INVITE/487. What happens? It seems like P1's CANCEL timer will pop, causing teardown of its transaction with P2 and sending a 408 to UAC. Very soon after, P2's timer pops, causing teardown of the P2->P3 transaction and sending a 408 back to P1. But P1's transaction with P2 is already gone, so P1 never ACKs. This then repeats at each proxy in the chain, so they all end up retransmitting 408s upstream.

Scenario 2:
On the other hand, suppose each proxy sets two timers when it sends a CANCEL: one that will cause a 408 to be sent upstream on the Server Invite transaction (64*T1, let's call this Timer L), and another that will cause the Client Invite transaction to be torn down (64*T1 + X, let's call this Timer M). What happens after the CANCEL in this case? P1's Timer L pops, and it sends a 408 back to UAC. If X is sufficiently large, P2's Timer L will pop next, and it will send a 408 back to P1 _before_ Timer M pops on P1. We avoid the retransmit storm.
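
To make the difference between the two scenarios concrete, here is a rough sketch of the timing. It is a toy model, not an implementation: the function name, the fixed per-hop delay, and the assumption that no packets are lost are all made up for illustration. The timer values assume the RFC 3261 defaults T1 = 500 ms and T4 = 5 s.

```python
def stray_408s(n_proxies, hop_delay, timer_l, x):
    """Count proxies whose 408 reaches the upstream proxy after teardown.

    Hypothetical model: the CANCEL reaches proxy i (1-based) at
    i * hop_delay. Proxy i sends a 408 upstream when Timer L pops
    (cancel time + timer_l) and tears down its client transaction when
    Timer M pops (cancel time + timer_l + x). Using x = 0 gives the
    single-timer behaviour of scenario 1.
    """
    stray = 0
    # Proxy i receives the 408 from proxy i + 1; the last proxy gets
    # nothing because the UAS never sends a final response.
    for i in range(1, n_proxies):
        teardown_at = i * hop_delay + timer_l + x
        arrival_at = (i + 1) * hop_delay + timer_l + hop_delay
        if arrival_at >= teardown_at:
            stray += 1  # the transaction is gone, so nobody ACKs this 408
    return stray


T1, T4 = 0.5, 5.0  # assumed RFC 3261 defaults, in seconds

scenario_1 = stray_408s(5, 0.05, 64 * T1 + T4, 0)   # one timer
scenario_2 = stray_408s(5, 0.05, 64 * T1, 64 * T1)  # Timer L plus Timer M
print(scenario_1, scenario_2)
```

In this model, with five proxies, scenario 1 leaves four un-ACKed 408s (one per upstream hop) and scenario 2 leaves none. The 408 is caught whenever X exceeds twice the hop delay, which is why X does not obviously need to be large unless we also want to cover lost packets.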

Were you thinking of the technique described in scenario 1, or scenario 2?

At this point, we have the question of how large X needs to be. Setting X to 64*T1 would ensure that we'd stay around until all possible retransmissions have been sent (except maybe one, since the downstream element's retransmit timer starts slightly later, but that won't happen often). It might be overkill to wait this long, however. We want to set it to at least T1, but if we do this and the first 408 gets dropped, we end up with the full complement of retransmissions.
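
A rough sketch of this trade-off, counting how many copies of the downstream 408 arrive while our client transaction is still alive (that is, before Timer M pops). It assumes an unreliable transport and the RFC 3261 defaults T1 = 500 ms and T2 = 4 s, with final responses to the INVITE retransmitted on Timer G (starting at T1, doubling up to T2) until Timer H (64*T1) fires. The function names and the `round_trip` parameter are hypothetical.

```python
def retransmit_offsets(t1=0.5, t2=4.0):
    """Offsets, in seconds after the first send, at which the final
    response is retransmitted: Timer G starts at T1 and doubles up to
    T2, stopping when Timer H (64*T1) would fire."""
    offsets, elapsed, interval = [], 0.0, t1
    while elapsed + interval < 64 * t1:
        elapsed += interval
        offsets.append(elapsed)
        interval = min(2 * interval, t2)
    return offsets


def copies_caught(x, round_trip=0.1, t1=0.5, t2=4.0):
    """Number of copies of the downstream 408 that arrive before our
    Timer M pops, X seconds after our Timer L.

    Hypothetical model: the downstream Timer L pops round_trip / 2 after
    ours, and each copy takes another round_trip / 2 to reach us.
    """
    sends = [0.0] + retransmit_offsets(t1, t2)
    return sum(1 for s in sends if s + round_trip < x)


print(len(retransmit_offsets()))            # retransmissions after the first send
print(copies_caught(0.5))                   # X = T1
print(copies_caught(32.0))                  # X = 64*T1
print(copies_caught(32.0, round_trip=0.6))  # slow hop: the last copy is missed
```

Under these assumptions there are 10 retransmissions after the initial 408. With X = T1 only the initial copy can be ACKed, so a single drop leaves all 10 retransmissions un-ACKed; with X = 64*T1 all 11 copies are covered unless the hop is slow enough to push the last one past Timer M.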

Alternatively, we could just ACK all stray responses.

ACKing stray responses could have some issues:
* Not all error responses have Contact headers, and in that case it's
impossible to know the destination.
* Contact headers may differ from the original Request-URI, which
may cause the ACK of an error response to go through a different path.
* A possible source of DoS attacks (?)

        
Yeah, we would have to do something non-standard to make this work (like sending back to the source of the error). I don't think that this presents a DoS attack that is any worse than just bombarding the server with requests (since well-meaning elements would not worsen the problem). I still like the double-timer approach more though.

Gilad.

Best regards,
Byron Campen


_______________________________________________
Sip-implementors mailing list
Sip-implementors@cs.columbia.edu
https://lists.cs.columbia.edu/cucslists/listinfo/sip-implementors
