On Jun 6, 2007, at 12:45 PM, Gilad Shaham wrote:
If anyone is really into solving this rare scenario, one possible
solution is to wait a bit longer after the CANCEL, for example
64*T1 + T4.
Unfortunately, this does not really solve the problem. Suppose we
have more than one proxy in a chain, each with this timer of 64T1+T4.
We end up having a problem still (since the first proxy's timer pops,
then the second, etc). It seems to me that we should set a timer for
64T1 that causes a 408 to be forwarded back, and tear down the
transaction later (maybe another 64T1 later). This way, we can ACK
the responses that might come in as a result of timers popping
downstream.
What I meant was that if the retransmission timer is 64*T1 and the
cancel timer is a bit larger (e.g., 64*T1 + T4) then this case would
be resolved because the retransmission would occur before the cancel,
giving enough time for a 408 to propagate. It's possible I missed
something, but I think the difference between the 2 suggestions is the
cancel timer 64*T1+T4 vs 2*64*T1.
Let's make sure of whether we're on the same page or not; we might
be, but let me describe my scenario in a little more detail. Let's say
we have a _chain_ of proxies between the UAC and UAS, and an INVITE
has already gone through, and everything has sent a 100.
UAC -> P1 -> P2 -> P3 -> ... -> Pn -> UAS
Scenario 1:
Now, suppose UAC CANCELs. P1-Pn all set their CANCEL timer for
64T1+T4, one after the other, as the CANCEL is forwarded. And let's say
UAS is out to cause trouble, and sends CANCEL/200, but no INVITE/487.
What happens? Seems like P1's CANCEL timer will pop, causing teardown
of its transaction with P2, and sending a 408 to UAC. Very soon
after, P2's timer pops, causing teardown of the P2->P3 transaction,
and sends a 408 back to P1. But P1's transaction with P2 is already
gone, and so it never ACKs. This then repeats at each proxy in the
chain, so they are all retransmitting 408 upstream.
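To make the timing in scenario 1 concrete, here is a rough sketch of the
timeline. It is only a model under stated assumptions: every hop has the
same one-way delay (hop_delay), P1 receives the CANCEL at time 0, and T1
and T4 take the RFC 3261 default values. The function name scenario1 and
its parameters are made up for illustration.

```python
T1 = 0.5  # seconds, RFC 3261 default
T4 = 5.0  # seconds, RFC 3261 default


def scenario1(n_proxies, hop_delay, t1=T1, t4=T4):
    """Return the (1-based) proxies whose 408 arrives upstream too late.

    Assumption: proxy P_i forwards the CANCEL at (i - 1) * hop_delay and
    arms a single timer of 64*T1 + T4 at that moment. When the timer
    pops, the proxy tears down its client transaction and sends a 408
    upstream in the same instant.
    """
    cancel_timer = 64 * t1 + t4
    pop = {i: (i - 1) * hop_delay + cancel_timer
           for i in range(1, n_proxies + 1)}

    unacked = []
    for i in range(2, n_proxies + 1):
        arrival_upstream = pop[i] + hop_delay
        # P_(i-1) already tore down its client transaction at pop[i-1],
        # so a 408 arriving after that is a stray and never gets ACKed.
        if arrival_upstream > pop[i - 1]:
            unacked.append(i)
    return unacked


if __name__ == "__main__":
    print(scenario1(n_proxies=4, hop_delay=0.05))
```

With any positive hop delay, every proxy except P1 ends up in the list,
which is the retransmit storm described above (P1's own 408 goes to the
UAC, which is not modeled here).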
Scenario 2:
On the other hand, suppose each proxy sets two timers when they send
a CANCEL; one that will cause a 408 to be sent upstream on the Server
Invite transaction (64*T1, let's call this Timer L), and another that
will cause the Client Invite transaction to be torn down (64*T1+X,
let's call this Timer M). What happens after the CANCEL in this case?
P1's Timer L pops, and it sends a 408 back to UAC. If X is
sufficiently large, P2's Timer L will pop next, and it will send a
408 back to P1 _before_ Timer M pops on P1. We avoid the retransmit
storm.
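The same kind of sketch for scenario 2, with the same simplifying
assumptions (uniform one-way hop_delay, P1 receives the CANCEL at time 0,
no packet loss). Timer L and Timer M are the names used above; the
function name scenario2 and the parameter x are only illustrative.

```python
def scenario2(n_proxies, hop_delay, x, t1=0.5):
    """Return the (1-based) proxies whose first 408 arrives upstream too late.

    Assumption: proxy P_i arms both timers when it forwards the CANCEL
    at (i - 1) * hop_delay:
      Timer L = 64*T1      -> send 408 upstream
      Timer M = 64*T1 + x  -> tear down the client INVITE transaction
    """
    timer_l = 64 * t1
    timer_m = 64 * t1 + x
    armed = {i: (i - 1) * hop_delay for i in range(1, n_proxies + 1)}

    unacked = []
    for i in range(2, n_proxies + 1):
        arrival_upstream = armed[i] + timer_l + hop_delay
        upstream_teardown = armed[i - 1] + timer_m
        # The upstream proxy can only ACK while its client transaction
        # still exists, i.e. before its Timer M pops.
        if arrival_upstream >= upstream_teardown:
            unacked.append(i)
    return unacked


if __name__ == "__main__":
    print(scenario2(n_proxies=4, hop_delay=0.05, x=0.5))   # X > 2 * hop_delay
    print(scenario2(n_proxies=4, hop_delay=0.05, x=0.05))  # X too small
```

In this model the condition works out to X > 2 * hop_delay: one hop for
the CANCEL to reach the next proxy, one hop for its 408 to come back. A
dropped 408 is not modeled, so this is only a lower bound on X.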
Were you thinking of the technique described in scenario 1, or
scenario 2?
At this point, we have the question of how large X needs to be.
Setting X to 64*T1 would ensure that we'd stay around until all
possible retransmissions have been sent (except maybe one, since the
downstream element's retransmit timer starts slightly later, but that
won't happen often). It might be overkill to wait this long, however.
We want to set it to at least T1, but if we do this and the first 408
gets dropped, we end up with the full complement of retransmissions.
Alternately, we could just ACK all stray responses.
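For the sizing of X discussed above, a small sketch of how many 408
transmissions from the downstream proxy can reach us before Timer M
pops. It assumes an unreliable transport, where RFC 3261 has the INVITE
server transaction retransmit a non-2xx final response on Timer G
(starting at T1, doubling up to T2) until Timer H (64*T1) fires. It also
assumes a uniform one-way hop_delay. The function names are hypothetical.

```python
def response_retransmit_offsets(t1=0.5, t2=4.0):
    """Offsets (seconds after the first 408) of each retransmission.

    Timer G starts at T1 and doubles up to T2; retransmissions stop
    when Timer H (64*T1) fires.
    """
    offsets = []
    interval = t1
    t = interval
    timer_h = 64 * t1
    while t < timer_h:
        offsets.append(t)
        interval = min(2 * interval, t2)
        t += interval
    return offsets


def transmissions_covered(x, hop_delay, t1=0.5, t2=4.0):
    """Count downstream 408 transmissions that arrive before our Timer M.

    Times are relative to our own Timer L. The downstream Timer L pops
    one hop_delay after ours, and its 408 needs another hop_delay to
    get back, hence the 2 * hop_delay skew.
    """
    skew = 2 * hop_delay
    sends = [0.0] + response_retransmit_offsets(t1, t2)
    return sum(1 for s in sends if skew + s < x)


if __name__ == "__main__":
    print(response_retransmit_offsets())
    print(transmissions_covered(x=32.0, hop_delay=0.05))
    print(transmissions_covered(x=0.5, hop_delay=0.05))
```

With the default T1 and T2 this gives 11 transmissions in total (the
first 408 plus 10 retransmissions). With X = T1 only the first one can
arrive in time, so a single dropped 408 leaves all the retransmissions
un-ACKed. With X = 64*T1 all of them can arrive in time unless the
round trip pushes the last one past Timer M.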
ACKing stray responses is something that could have some issues:
* Not all error responses have contact headers and in that case it's
impossible to know the destination.
* Contact headers may be different than the original request URI which
may cause the ACK of an error response to go through a different path.
* A source of possible DoS attack (?)
Yeah, we would have to do something non-standard to make this work
(like sending back to the source of the error). I don't think that
this presents a DoS attack that is any worse than just bombarding the
server with requests (since well-meaning elements would not worsen
the problem). I still like the double-timer approach more though.
Gilad.
Best regards,
Byron Campen
_______________________________________________
Sip-implementors mailing list
Sip-implementors@cs.columbia.edu
https://lists.cs.columbia.edu/cucslists/listinfo/sip-implementors