On 6/6/2025 10:44 AM, Martin Podworny wrote:
... we have just found and fixed the error by setting, for all fileserver, ptserver and
vlserver processes in /etc/openafs/BosConfig, the option "-rxmaxmtu 1260". That
was it.

The use of -rxmaxmtu 1260 on the servers forces the servers and their clients to construct RX/UDP/IP packets whose length at the IP layer is never larger than 1316 octets.

The underlying problem with many VPN implementations is that the VPN tunnel mtu is small compared to the most common network mtu of 1500 octets (which corresponds to rxmaxmtu = 1444).
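
The two pairs of numbers above are related by simple header arithmetic. A minimal sketch in Python, assuming IPv4 with no options (20 octets), a UDP header (8 octets) and an RX header (28 octets); the function names are illustrative only and are not part of OpenAFS:

```python
# Header sizes in octets. IPv4 is assumed to carry no options.
IPV4_HEADER = 20
UDP_HEADER = 8
RX_HEADER = 28
OVERHEAD = IPV4_HEADER + UDP_HEADER + RX_HEADER  # 56 octets

def ip_length_for_rxmaxmtu(rxmaxmtu: int) -> int:
    """Largest IP-layer packet length produced for a given -rxmaxmtu."""
    return rxmaxmtu + OVERHEAD

def rxmaxmtu_for_link_mtu(link_mtu: int) -> int:
    """Largest -rxmaxmtu whose packets fit in link_mtu without fragmentation."""
    return link_mtu - OVERHEAD

print(ip_length_for_rxmaxmtu(1260))  # 1316
print(rxmaxmtu_for_link_mtu(1500))   # 1444
```

The same subtraction applied to a tunnel's mtu gives the largest rxmaxmtu that should pass through that tunnel unfragmented, under the same header assumptions.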

OpenAFS clients and servers on Linux are configured to probe the mtu of the link. They do that by setting the Don't Fragment (DF) flag on each outgoing UDP packet. When a router or switch copies the packet from the incoming port to the outgoing port, if the DF flag is set and the packet cannot be sent without splitting it, the packet will be dropped. The Linux probing mechanism only works if every router/switch which drops a packet due to the DF flag sends an ICMP Packet Too Big to the originator stating that the packet was dropped and what the mtu of the next network segment is.

Upon receipt of an ICMP Packet Too Big, the originator updates its local routing tables, and the next time a UDP/IP packet is sent on that route, if it is larger than the discovered Path MTU, the packet will be fragmented locally, with each fragment sent with the DF flag set.

There are two problems with many VPN implementations. First, the tunnel exit endpoint enforces the DF flag but refuses to send ICMP packets. As a result, packets larger than the tunnel mtu received from a server are often dropped, and the sender's IP stack learns nothing about the path mtu and so cannot correct the problem on retransmission.
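
The difference between a path that reports drops and one that does not can be illustrated with a toy model. This is only a sketch: the Hop class, the function names and the 1400 octet tunnel mtu are hypothetical, per-fragment header overhead is ignored, and nothing here reflects actual kernel or OpenAFS code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Hop:
    mtu: int          # mtu of the segment this hop forwards onto
    sends_icmp: bool  # whether the hop reports DF drops to the originator

def send_df(packet_len: int, path: List[Hop]) -> Tuple[bool, Optional[int]]:
    """Send one packet with DF set along the path.

    Returns (delivered, learned_mtu). learned_mtu is the next-segment
    mtu reported via ICMP, or None if the drop was silent.
    """
    for hop in path:
        if packet_len > hop.mtu:
            return False, (hop.mtu if hop.sends_icmp else None)
    return True, None

def transmit(packet_len: int, path: List[Hop], attempts: int = 3) -> bool:
    """Retransmit, fragmenting locally to whatever path mtu was learned."""
    path_mtu: Optional[int] = None
    for _ in range(attempts):
        frag_len = packet_len if path_mtu is None else min(packet_len, path_mtu)
        delivered, learned = send_df(frag_len, path)
        if delivered:
            return True
        if learned is not None:
            path_mtu = learned  # the sender can now correct the problem
    return False  # every attempt was dropped without feedback

well_behaved = [Hop(1500, True), Hop(1400, True)]
silent_tunnel = [Hop(1500, True), Hop(1400, False)]

print(transmit(1500, well_behaved))   # True
print(transmit(1500, silent_tunnel))  # False
```

In the second case every retransmission has the same size as the first, so it is dropped in the same place, which is the behaviour described above for tunnel endpoints that refuse to send ICMP.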

Second, the VPN tunnel entry point on the client system is implemented as a virtual network interface (bridge or tunnel), and therefore packets which are too large are dropped locally when the DF flag is set because they exceed the virtual interface's mtu. In theory the application could identify which interface the packet will be sent on and restrict the size of the constructed packets to that interface's local mtu. However, there is no portable method of implementing this, and in practice it cannot be done without a dedicated socket for each pair of endpoints, as is done for TCP.

Forcing the server side rxmaxmtu to a small enough value prevents the construction of response DATA packets larger than will fit into the tunnel, and it advises clients to do the same. However, the advice is received in an ACK packet which won't be sent to the client until after the client's initial DATA packets have been constructed. If the initially constructed packets are larger than the tunnel's mtu, then they won't make it to the server. Avoiding the client side packet construction problem requires that -rxmaxmtu also be set on the client's afsd.

However, even with -rxmaxmtu specified on both the client and server, there remains the problem of the rxkad CHALLENGE/RESPONSE exchange where the RESPONSE packet includes a Kerberos service ticket which often includes Additional Data (such as Active Directory Group Membership information for the client principal).  RESPONSE packets larger than the tunnel mtu cannot be sent without fragmentation.   An end user whose issued afs/cell@REALM service ticket is too large to fit in the tunnel won't be able to authenticate to the cell.

An alternative to rxmaxmtu is to modify the VPN tunnel configuration to ignore the DF flag and permit all of the packets to be transmitted across the tunnel as fragments if the incoming packet is too large.

Jeffrey Altman
AuriStor, Inc.
