The e2e timeout serves at least two purposes:

- Retrying if the request or response got blackholed somewhere, hoping that the second attempt uses a different set of nodes (e.g., after the overlay has stabilized after churn) and is luckier;

- Giving up when a response is unlikely, so that the calling application doesn't hang forever.

The latter is more of a local API design issue; the former requires specification advice to avoid too many spurious retransmissions.
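
To make the split concrete, here's a minimal sketch of such a retransmission loop in Python, assuming hypothetical send_request() / wait_for_response() primitives and illustrative timer values (the precise algorithm is left to implementations):

    MAX_ATTEMPTS = 3     # illustrative; retry in case the first path was blackholed
    TIMEOUT = 5.0        # illustrative per-attempt e2e timeout, in seconds

    def request_with_retry(msg, send_request, wait_for_response):
        """Retry on timeout, then give up so the caller doesn't hang forever."""
        for attempt in range(MAX_ATTEMPTS):
            send_request(msg)              # a later attempt may take a different
                                           # route after the overlay stabilizes
            resp = wait_for_response(msg.txn_id, TIMEOUT)
            if resp is not None:
                return resp
        raise TimeoutError("no response after %d attempts" % MAX_ATTEMPTS)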

The ping can help, but high variance of the response time is likely in "challenged" networks, so it may not be all *that* helpful. I don't think we need to specify the precise algorithm, as this seems like a good opportunity for implementor cleverness and deployment experience.
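
One well-known starting point for that cleverness is TCP's retransmission-timeout estimator (RFC 6298), which deliberately widens the timeout when variance is high; a sketch:

    class RtoEstimator:
        """SRTT/RTTVAR smoothing per RFC 6298; high RTT variance, as seen in
        "challenged" networks, directly inflates the computed timeout."""
        ALPHA, BETA, K = 1.0 / 8, 1.0 / 4, 4

        def __init__(self):
            self.srtt = None
            self.rttvar = None

        def sample(self, rtt):
            if self.srtt is None:
                self.srtt, self.rttvar = rtt, rtt / 2.0
            else:
                self.rttvar = ((1 - self.BETA) * self.rttvar
                               + self.BETA * abs(self.srtt - rtt))
                self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt

        def rto(self, floor=1.0, ceiling=60.0):
            if self.srtt is None:
                return 3.0               # RFC 6298 initial value
            return min(max(self.srtt + self.K * self.rttvar, floor), ceiling)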

I suspect that trying to make the per-hop behavior as reliable as possible and providing error feedback when a request can't be routed further for some reason (e.g., a node finds that none of its next-hop links are working due to a NAT issue) is likely to help more than elaborate timer guessing.
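
For illustration, a sketch of that per-hop error feedback, with hypothetical helpers standing in for the overlay's link-state and messaging machinery:

    def forward(msg, next_hops, send_to, send_error_upstream):
        """Forward via the first working link, or report failure immediately
        instead of leaving the originator's timer to guess what happened."""
        for hop in next_hops:              # candidate links, best first
            if hop.is_reachable():         # e.g., a recent keepalive succeeded
                send_to(hop, msg)
                return
        send_error_upstream(msg.txn_id,
                            "no usable next-hop (e.g., NAT issue)")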

Henning
(sitting on a hotel wireless LAN that went from almost useless to lossless within a minute)





I agree. Perhaps the peer could perform some diagnostic actions to gather measurements of the overlay, and adjust the timeout value based on that data. For example, a peer could use PING or other messages to calculate the RTT for a request, even when the response arrives late, i.e., when the response can no longer find its corresponding request transaction because the timer has fired and the transaction state has been destroyed.

Maintaining the maximum observed RTT may be helpful for choosing a reasonable timeout. However, in some cases the timeout value will be limited by the upper-layer application; for instance, the user may expect to get the information from the overlay within a fixed time.
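
As a sketch of that idea (names are illustrative): keep the send timestamp around after the transaction's timer fires, so even an orphan late response yields an RTT sample, and cap the resulting timeout at whatever the application will tolerate:

    class RttTracker:
        def __init__(self):
            self.sent_at = {}    # txn_id -> send time, kept past the timeout
            self.max_rtt = 0.0

        def on_send(self, txn_id, now):
            self.sent_at[txn_id] = now

        def on_response(self, txn_id, now):
            t0 = self.sent_at.pop(txn_id, None)
            if t0 is not None:   # counts even late, "orphan" responses
                self.max_rtt = max(self.max_rtt, now - t0)

        def timeout(self, app_deadline):
            # Maximum observed RTT, but never more than the fixed time the
            # upper-layer application is willing to wait.
            return min(self.max_rtt, app_deadline) if self.max_rtt else app_deadline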




