On Mon, 2014-01-13 at 23:22 -0600, Andrew Deason wrote:
> On Mon, 13 Jan 2014 12:32:12 -0500
> Jeffrey Hutzelman <[email protected]> wrote:
>
> > A worse situation arises when server A makes an RPC to server B, but the
> > best route from server B back to the original source address goes via a
> > different interface than the request came in on. In this situation, the
> > kernel will assign the wrong source address to server B's outgoing
> > reply, which may cause Rx on server A to drop it on the floor.
>
> But we ignore the source address when the multihoming bit is set in the
> epoch.

Unfortunately, this behavior has changed a few times. There are
actually several tests:

- On a client-mode connection, the source address is always ignored.
  This actually should have the effect of making small requests like
  votes always work. But for some reason it doesn't.

- Both the source address and port are ignored if the epoch multihome
  bit is set. This happens on both client- and server-mode connections,
  except that for a period of about 2 years starting in 2004, it
  happened on client-mode connections only.

So you're right; the exact scenario I described, where a packet is
dropped by the calling client due to a mismatch of the server's address,
shouldn't happen.

The practical effect of this is that it is possible for voting to work
fine, because that's a single-round-trip operation, while larger calls
such as transferring a database update fare not so well (or
consistently).

> But all processes (that use rxkad) set the multihoming bit. Unless you
> are talking about something else? I don't even see where a process would
> manually set or clear the multihoming bit, unless it manually set the rx
> epoch, and nobody does that. The 'switch' is always flipped (or always
> not flipped, I assume, if you go back far enough).

rx sets the multihome bit by default only in kernel mode. In user mode,
it is not set.

As it turns out, you're right -- the multihome bit is also set by rxkad,
not only for the current connection but for all future connections,
whenever a new connection is set up. That code has been there since
AFS 3.1, but I've never noticed it before in all that time.

This is rather significant, because it means that, except for that
two-year period 10 years ago, we should never have this sort of
multi-homing problem. Ever. And yet that clearly has not been the case.

Blargh. OK; sorry, Harald. It seems I can't explain what you've seen
after all.
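
The two matching tests listed above (source address ignored on client-mode connections; address and port both ignored when the epoch multihome bit is set) can be sketched roughly as follows. This is only an illustration of the rules as described in this message, not the actual Rx code. The struct, the enum and the function name `packet_matches_conn` are hypothetical. The sketch also assumes that the multihome flag is the high bit of the 32-bit epoch and that the port is still compared on client-mode connections. Connection-ID and security-index checks are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the multihome flag is the high bit of the 32-bit epoch. */
#define EPOCH_MULTIHOME_BIT 0x80000000u

/* Hypothetical types for illustration; not the real Rx structures. */
enum conn_type { CLIENT_CONNECTION, SERVER_CONNECTION };

struct conn {
    enum conn_type type;
    uint32_t epoch;
    uint32_t peer_host;   /* address the connection expects to talk to */
    uint16_t peer_port;
};

/*
 * Decide whether an incoming packet with the given epoch and source
 * address/port may be matched to connection c.
 */
static bool
packet_matches_conn(const struct conn *c, uint32_t epoch,
                    uint32_t src_host, uint16_t src_port)
{
    if (epoch != c->epoch)
        return false;

    /* Multihome bit set: ignore both source address and port. */
    if (c->epoch & EPOCH_MULTIHOME_BIT)
        return true;

    /* Client-mode connection: ignore the source address.
     * Assumption: the port must still match. */
    if (c->type == CLIENT_CONNECTION)
        return src_port == c->peer_port;

    /* Otherwise require an exact address and port match. */
    return src_host == c->peer_host && src_port == c->peer_port;
}
```

Under these assumptions, a reply arriving from an unexpected interface would be rejected only on a server-mode connection whose epoch lacks the multihome bit.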

--
Jeff
