On Mon, 2014-01-13 at 15:00 +0100, Harald Barth wrote:
> (1) I had an old NetInfo file with a wrong IP addr lying around. This
> did _not_ prevent the server to start nor to prevent sync completely.
> The protection server synced fine and the volume location server
> refused.

The NetInfo and NetRestrict files serve as filters on the actual set of
addresses found by enumerating interfaces. Mentioning an address that
the machine does not have has no effect.

> (2) I have a machine where the database server is known as X.Y.Z.43
> but the machine's primary IP is X.Y.Z.46. This seems to work well
> until something somewhere checks the source address of the traffic
> when sync is tried. Result: The protection server synced fine and the
> volume location server refused.

I'm not sure why your vlserver and ptserver are behaving differently,
unless they are different versions or you have some port-specific
filter or the like.

When multi-homed Ubik servers are used, the CellServDB used by the Ubik
servers must list each server exactly once. Further, each server's
CellServDB must use the same set of servers; it won't work to have a
server identified by one address in one copy of the file and a
different address in another copy. The CellServDB files used by clients
and fileservers can list every address for every server, though getting
a fileserver and Ubik server on the same machine not to use the same
CellServDB can be... challenging.

The way Ubik takes advantage of multi-homed servers is to dynamically
discover the additional addresses of each server. Whenever a server
starts, it exchanges addresses with each other server, or at least the
ones that are actually up. Once this is done, each of those servers is
able to contact the other using any of its addresses. However, only one
address is used at a time -- Ubik doesn't start trying a new address
for a multi-homed peer until the one it's been using stops working.

Like other over-the-network communication in AFS, Ubik server-to-server
communication is done using Rx. In particular, the voting protocol is
based on each "candidate" making an RPC to each other server; the vote
is encoded as the return value of that RPC.

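
To make the "one address at a time" behaviour concrete, here is a rough
Python sketch of how a server might track a multi-homed peer. This is
not Ubik's code: the class and method names are hypothetical, the
addresses come from the documentation range, and the simple round-robin
rotation on failure is an assumption made for illustration.

```python
class PeerAddresses:
    """Known addresses for one multi-homed peer (hypothetical model)."""

    def __init__(self, primary):
        # Start with the single address listed in CellServDB.
        self.addresses = [primary]
        self.active = 0  # index of the address currently in use

    def learn(self, extra):
        """Record addresses the peer reported in the startup exchange."""
        for addr in extra:
            if addr not in self.addresses:
                self.addresses.append(addr)

    def current(self):
        """Return the one address used for all traffic to this peer."""
        return self.addresses[self.active]

    def report_failure(self):
        """Move to the next known address once the active one fails.

        Assumption: plain round-robin; the real selection may differ.
        """
        self.active = (self.active + 1) % len(self.addresses)
        return self.current()


peer = PeerAddresses("192.0.2.43")
peer.learn(["192.0.2.46", "192.0.2.43"])  # the duplicate is ignored
print(peer.current())         # still the CellServDB address
print(peer.report_failure())  # switches only after a failure
```

The point of the model is that learning extra addresses changes nothing
by itself; the active address moves only when a failure is reported.
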
Because the vote travels back as the return value of the RPC, a server
has no opportunity to try sending its vote to multiple addresses; it
can only send one response, which necessarily goes to the address that
made the RPC. So, if you have a network condition which blocks traffic
between two servers in only one direction, voting will not work.
However, this normally will sort itself out, at least partially,
because the server making the Beacon RPC will see this as a timeout and
treat the other server as down.

A worse situation arises when server A makes an RPC to server B, but
the best route from server B back to the original source address goes
via a different interface than the request came in on. In this
situation, the kernel will assign the wrong source address to server
B's outgoing reply, which may cause Rx on server A to drop it on the
floor. This is the problem that -rxbind is designed to work around, at
the expense of the server not really being multi-homed, at least as far
as AFS is concerned.

Whether this problem arises depends on your network topology, but
generally, you will have problems any time server B has multiple
interfaces whose best route from A uses the same outgoing address. This
includes cases where one server has multiple addresses or interfaces on
the same subnet.

The sad truth is that in order to properly support multi-homed hosts,
Rx needs to be fixed so that it identifies all available interfaces,
binds a separate socket for each interface, and keeps track of which
interface each incoming connection belongs to, so that it can send
responses out the same interface. This approach is necessarily used by
all major UDP-based services (e.g. DNS, NTP, DHCP), as it is the only
way to ensure correct behavior on a multi-homed host.

-- Jeff

_______________________________________________
OpenAFS-info mailing list
https://lists.openafs.org/mailman/listinfo/openafs-info
