The e2e timeout serves at least two purposes:
- Retrying if the request or response got blackholed somewhere, hoping
that the second attempt uses a different set of nodes (e.g., after the
overlay has stabilized after churn) and is luckier;
- Giving up when a response is unlikely, so that the calling
application doesn't hang forever.
The latter is more of a local API design issue; the former requires
specification advice, to avoid too many spurious retransmissions.
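A rough sketch of that separation (hypothetical helper names;
send_request() and wait_for_response() are assumed primitives, not
anything from the spec):

    import time

    # Separate the per-attempt retransmit timer (purpose 1) from the
    # overall give-up deadline (purpose 2).
    def e2e_request(send_request, wait_for_response,
                    attempt_timeout=3.0, overall_deadline=32.0):
        start = time.monotonic()
        while time.monotonic() - start < overall_deadline:
            send_request()  # a retry may traverse different overlay nodes
            remaining = overall_deadline - (time.monotonic() - start)
            response = wait_for_response(min(attempt_timeout, remaining))
            if response is not None:
                return response
        return None  # give up so the calling application doesn't hang forever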
Ping can help, but high variance in response time is likely in
"challenged" networks, so it may not be all *that* helpful. I don't
think we need to specify the precise algorithm, as this seems like a
good opportunity for implementor cleverness and deployment experience.
I suspect that trying to make the per-hop behavior as reliable as
possible and providing error feedback when a request can't be routed
further for some reason (e.g., a node finds that none of its next-hop
links are working due to a NAT issue) is likely to help more than
elaborate timer guessing.
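One rough sketch of that kind of feedback (hypothetical helpers;
link_up(), send(), and send_error() are assumptions): instead of
silently dropping a request it cannot forward, a node reports a
routing error so the sender learns of the failure sooner than any
timer would:

    # Return an explicit error instead of blackholing the request.
    def forward_or_fail(request, next_hops, send, send_error):
        for hop in next_hops:
            if hop.link_up():  # e.g., fails when NAT bindings have expired
                send(hop, request)
                return
        # no working next-hop link: tell the previous hop right away
        send_error(request.source, "unable to route further")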
Henning
(sitting on a hotel wireless LAN that went from almost useless to
lossless within a minute)
I agree. Perhaps the peer could perform some diagnostic actions to
gather measurements from the overlay and adjust the timeout value
based on that data. For example, a peer could use Ping or other
messages to calculate the RTT for a request, even when the response
arrives late, i.e., when the response can no longer be matched to its
request transaction because the timer has fired and the transaction
has been destroyed.
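As a rough sketch (hypothetical names; the data structures are
assumptions, not anything specified), a peer could keep the send
timestamp around after the transaction timer fires, so that even a
late response yields an RTT sample:

    import time

    send_times = {}    # transaction id -> send timestamp
    rtt_samples = []   # measured RTTs, including late responses

    def on_send(txn_id):
        send_times[txn_id] = time.monotonic()

    def on_response(txn_id):
        # works even if the transaction itself already timed out,
        # as long as the timestamp hasn't been garbage-collected yet
        sent = send_times.pop(txn_id, None)
        if sent is not None:
            rtt_samples.append(time.monotonic() - sent)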
Maintaining the maximum observed RTT may help in choosing a
reasonable timeout. However, in some cases the timeout value will be
limited by the upper-layer application; for instance, the user may
expect to get the information from the overlay within a fixed time.
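For instance (again a hypothetical sketch, with arbitrary headroom and
default values), the timeout could be derived from the largest RTT
seen but capped by the application's deadline:

    def choose_timeout(rtt_samples, app_deadline,
                       headroom=2.0, default=3.0):
        # with no samples yet, fall back to a conservative default;
        # otherwise allow some headroom over the worst RTT observed,
        # but never wait longer than the application is willing to
        if not rtt_samples:
            return min(default, app_deadline)
        return min(headroom * max(rtt_samples), app_deadline)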