Re: [PATCH 1/3] [NET] Do pmtu check in transport layer
John Heffner wrote: Check the pmtu check at the transport layer (for UDP, ICMP and raw), and send a local error if socket is PMTUDISC_DO and packet is too big. This is actually a pure bugfix for ipv6. For ipv4, it allows us to do pmtu checks in the same way as for ipv6. diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index d096332..593acf7 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -822,7 +822,9 @@ int ip_append_data(struct sock *sk, fragheaderlen = sizeof(struct iphdr) + (opt ? opt-optlen : 0); maxfraglen = ((mtu - fragheaderlen) ~7) + fragheaderlen; - if (inet-cork.length + length 0x - fragheaderlen) { + if (inet-cork.length + length 0x - fragheaderlen || + (inet-pmtudisc = IP_PMTUDISC_DO + inet-cork.length + length mtu)) { ip_local_error(sk, EMSGSIZE, rt-rt_dst, inet-dport, mtu-exthdrlen); return -EMSGSIZE; } This makes ping report an incorrect MTU when IPsec is used since we're only accounting for the additional header_len, not the trailer_len (which is not easily changeable). Additionally it will report different MTUs for the first and following fragments when the socket is corked because only the first fragment includes the header_len. It also can't deal with things like NAT and routing by fwmark that change the route. The old behaviour was that we get an ICMP frag. required with the MTU of the final route, while this will always report the MTU of the initially chosen route. For all these reasons I think it should be reverted to the old behaviour. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] [NET] Do pmtu check in transport layer
Patrick McHardy wrote: John Heffner wrote: Check the pmtu check at the transport layer (for UDP, ICMP and raw), and send a local error if socket is PMTUDISC_DO and packet is too big. This is actually a pure bugfix for ipv6. For ipv4, it allows us to do pmtu checks in the same way as for ipv6. diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index d096332..593acf7 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -822,7 +822,9 @@ int ip_append_data(struct sock *sk, fragheaderlen = sizeof(struct iphdr) + (opt ? opt-optlen : 0); maxfraglen = ((mtu - fragheaderlen) ~7) + fragheaderlen; - if (inet-cork.length + length 0x - fragheaderlen) { + if (inet-cork.length + length 0x - fragheaderlen || + (inet-pmtudisc = IP_PMTUDISC_DO +inet-cork.length + length mtu)) { ip_local_error(sk, EMSGSIZE, rt-rt_dst, inet-dport, mtu-exthdrlen); return -EMSGSIZE; } This makes ping report an incorrect MTU when IPsec is used since we're only accounting for the additional header_len, not the trailer_len (which is not easily changeable). Additionally it will report different MTUs for the first and following fragments when the socket is corked because only the first fragment includes the header_len. It also can't deal with things like NAT and routing by fwmark that change the route. The old behaviour was that we get an ICMP frag. required with the MTU of the final route, while this will always report the MTU of the initially chosen route. For all these reasons I think it should be reverted to the old behaviour. You're right, this is no good. I think the other problems are fixable, but NAT really screws this. Unfortunately, there is still a real problem with ipv6, in that the output side does not generate a packet too big ICMP like ipv4. Also, it feels kind of undesirable be rely on local ICMP instead of direct error message delivery. I'll try to generate a new patch. Thanks, -John - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] [NET] Do pmtu check in transport layer
John Heffner wrote: Patrick McHardy wrote: This makes ping report an incorrect MTU when IPsec is used since we're only accounting for the additional header_len, not the trailer_len (which is not easily changeable). Additionally it will report different MTUs for the first and following fragments when the socket is corked because only the first fragment includes the header_len. It also can't deal with things like NAT and routing by fwmark that change the route. The old behaviour was that we get an ICMP frag. required with the MTU of the final route, while this will always report the MTU of the initially chosen route. For all these reasons I think it should be reverted to the old behaviour. You're right, this is no good. I think the other problems are fixable, but NAT really screws this. Routing by fwmark is also unfixable and IPsec is quite hard. Unfortunately, there is still a real problem with ipv6, in that the output side does not generate a packet too big ICMP like ipv4. Also, it feels kind of undesirable be rely on local ICMP instead of direct error message delivery. I'll try to generate a new patch. I think its necessary since at the transport layer we simply don't have all the information about whats going to happen to a packet. IPv6 now also supports routing by fwmark, so it has the same problem if it doesn't generate packet too big messages. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] [NET] Do pmtu check in transport layer
From: John Heffner [EMAIL PROTECTED] Date: Fri, 23 Mar 2007 20:06:44 -0400 Check the pmtu check at the transport layer (for UDP, ICMP and raw), and send a local error if socket is PMTUDISC_DO and packet is too big. This is actually a pure bugfix for ipv6. For ipv4, it allows us to do pmtu checks in the same way as for ipv6. Signed-off-by: John Heffner [EMAIL PROTECTED] Applied, thanks John. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] [NET] Do pmtu check in transport layer
Check the pmtu check at the transport layer (for UDP, ICMP and raw), and send a local error if socket is PMTUDISC_DO and packet is too big. This is actually a pure bugfix for ipv6. For ipv4, it allows us to do pmtu checks in the same way as for ipv6. Signed-off-by: John Heffner [EMAIL PROTECTED] --- net/ipv4/ip_output.c |4 +++- net/ipv4/raw.c|8 +--- net/ipv6/ip6_output.c | 11 ++- net/ipv6/raw.c|7 +-- 4 files changed, 19 insertions(+), 11 deletions(-) diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index d096332..593acf7 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -822,7 +822,9 @@ int ip_append_data(struct sock *sk, fragheaderlen = sizeof(struct iphdr) + (opt ? opt-optlen : 0); maxfraglen = ((mtu - fragheaderlen) ~7) + fragheaderlen; - if (inet-cork.length + length 0x - fragheaderlen) { + if (inet-cork.length + length 0x - fragheaderlen || + (inet-pmtudisc = IP_PMTUDISC_DO +inet-cork.length + length mtu)) { ip_local_error(sk, EMSGSIZE, rt-rt_dst, inet-dport, mtu-exthdrlen); return -EMSGSIZE; } diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index 87e9c16..f252f4e 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -271,10 +271,12 @@ static int raw_send_hdrinc(struct sock *sk, void *from, size_t length, struct iphdr *iph; struct sk_buff *skb; int err; + int mtu; - if (length rt-u.dst.dev-mtu) { - ip_local_error(sk, EMSGSIZE, rt-rt_dst, inet-dport, - rt-u.dst.dev-mtu); + mtu = inet-pmtudisc == IP_PMTUDISC_DO ? dst_mtu(rt-u.dst) : +rt-u.dst.dev-mtu; + if (length mtu) { + ip_local_error(sk, EMSGSIZE, rt-rt_dst, inet-dport, mtu); return -EMSGSIZE; } if (flagsMSG_PROBE) diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 3055169..711dfc3 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1044,11 +1044,12 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to, fragheaderlen = sizeof(struct ipv6hdr) + rt-u.dst.nfheader_len + (opt ? opt-opt_nflen : 0); maxfraglen = ((mtu - fragheaderlen) ~7) + fragheaderlen - sizeof(struct frag_hdr); - if (mtu = sizeof(struct ipv6hdr) + IPV6_MAXPLEN) { - if (inet-cork.length + length sizeof(struct ipv6hdr) + IPV6_MAXPLEN - fragheaderlen) { - ipv6_local_error(sk, EMSGSIZE, fl, mtu-exthdrlen); - return -EMSGSIZE; - } + if ((mtu = sizeof(struct ipv6hdr) + IPV6_MAXPLEN +inet-cork.length + length sizeof(struct ipv6hdr) + IPV6_MAXPLEN - fragheaderlen) || + (np-pmtudisc = IPV6_PMTUDISC_DO +inet-cork.length + length mtu)) { + ipv6_local_error(sk, EMSGSIZE, fl, mtu-exthdrlen); + return -EMSGSIZE; } /* diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index 306d5d8..75db277 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -556,9 +556,12 @@ static int rawv6_send_hdrinc(struct sock *sk, void *from, int length, struct sk_buff *skb; unsigned int hh_len; int err; + int mtu; - if (length rt-u.dst.dev-mtu) { - ipv6_local_error(sk, EMSGSIZE, fl, rt-u.dst.dev-mtu); + mtu = np-pmtudisc == IPV6_PMTUDISC_DO ? dst_mtu(rt-u.dst) : +rt-u.dst.dev-mtu; + if (length mtu) { + ipv6_local_error(sk, EMSGSIZE, fl, mtu); return -EMSGSIZE; } if (flagsMSG_PROBE) -- 1.5.0.2.gc260-dirty - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html