Re: [PATCH 1/3] [NET] Do pmtu check in transport layer

2007-04-09 Thread Patrick McHardy
John Heffner wrote:
 Check the pmtu check at the transport layer (for UDP, ICMP and raw), and
 send a local error if socket is PMTUDISC_DO and packet is too big.  This is
 actually a pure bugfix for ipv6.  For ipv4, it allows us to do pmtu checks
 in the same way as for ipv6.
 
 diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
 index d096332..593acf7 100644
 --- a/net/ipv4/ip_output.c
 +++ b/net/ipv4/ip_output.c
 @@ -822,7 +822,9 @@ int ip_append_data(struct sock *sk,
   fragheaderlen = sizeof(struct iphdr) + (opt ? opt-optlen : 0);
   maxfraglen = ((mtu - fragheaderlen)  ~7) + fragheaderlen;
  
 - if (inet-cork.length + length  0x - fragheaderlen) {
 + if (inet-cork.length + length  0x - fragheaderlen ||
 + (inet-pmtudisc = IP_PMTUDISC_DO 
 +  inet-cork.length + length  mtu)) {
   ip_local_error(sk, EMSGSIZE, rt-rt_dst, inet-dport, 
 mtu-exthdrlen);
   return -EMSGSIZE;
   }


This makes ping report an incorrect MTU when IPsec is used since we're
only accounting for the additional header_len, not the trailer_len
(which is not easily changeable). Additionally it will report different
MTUs for the first and following fragments when the socket is corked
because only the first fragment includes the header_len. It also can't
deal with things like NAT and routing by fwmark that change the route.
The old behaviour was that we get an ICMP frag. required with the MTU
of the final route, while this will always report the MTU of the
initially chosen route.

For all these reasons I think it should be reverted to the old
behaviour.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] [NET] Do pmtu check in transport layer

2007-04-09 Thread John Heffner

Patrick McHardy wrote:

John Heffner wrote:

Check the pmtu check at the transport layer (for UDP, ICMP and raw), and
send a local error if socket is PMTUDISC_DO and packet is too big.  This is
actually a pure bugfix for ipv6.  For ipv4, it allows us to do pmtu checks
in the same way as for ipv6.

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index d096332..593acf7 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -822,7 +822,9 @@ int ip_append_data(struct sock *sk,
fragheaderlen = sizeof(struct iphdr) + (opt ? opt-optlen : 0);
maxfraglen = ((mtu - fragheaderlen)  ~7) + fragheaderlen;
 
-	if (inet-cork.length + length  0x - fragheaderlen) {

+   if (inet-cork.length + length  0x - fragheaderlen ||
+   (inet-pmtudisc = IP_PMTUDISC_DO 
+inet-cork.length + length  mtu)) {
ip_local_error(sk, EMSGSIZE, rt-rt_dst, inet-dport, 
mtu-exthdrlen);
return -EMSGSIZE;
}



This makes ping report an incorrect MTU when IPsec is used since we're
only accounting for the additional header_len, not the trailer_len
(which is not easily changeable). Additionally it will report different
MTUs for the first and following fragments when the socket is corked
because only the first fragment includes the header_len. It also can't
deal with things like NAT and routing by fwmark that change the route.
The old behaviour was that we get an ICMP frag. required with the MTU
of the final route, while this will always report the MTU of the
initially chosen route.

For all these reasons I think it should be reverted to the old
behaviour.


You're right, this is no good.  I think the other problems are fixable, 
but NAT really screws this.


Unfortunately, there is still a real problem with ipv6, in that the 
output side does not generate a packet too big ICMP like ipv4.  Also, it 
feels kind of undesirable be rely on local ICMP instead of direct error 
message delivery.  I'll try to generate a new patch.


Thanks,
  -John
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] [NET] Do pmtu check in transport layer

2007-04-09 Thread Patrick McHardy
John Heffner wrote:
 Patrick McHardy wrote:
 
 This makes ping report an incorrect MTU when IPsec is used since we're
 only accounting for the additional header_len, not the trailer_len
 (which is not easily changeable). Additionally it will report different
 MTUs for the first and following fragments when the socket is corked
 because only the first fragment includes the header_len. It also can't
 deal with things like NAT and routing by fwmark that change the route.
 The old behaviour was that we get an ICMP frag. required with the MTU
 of the final route, while this will always report the MTU of the
 initially chosen route.

 For all these reasons I think it should be reverted to the old
 behaviour.
 
 
 You're right, this is no good.  I think the other problems are fixable,
 but NAT really screws this.


Routing by fwmark is also unfixable and IPsec is quite hard.

 Unfortunately, there is still a real problem with ipv6, in that the
 output side does not generate a packet too big ICMP like ipv4.  Also, it
 feels kind of undesirable be rely on local ICMP instead of direct error
 message delivery.  I'll try to generate a new patch.


I think its necessary since at the transport layer we simply don't
have all the information about whats going to happen to a packet.
IPv6 now also supports routing by fwmark, so it has the same problem
if it doesn't generate packet too big messages.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] [NET] Do pmtu check in transport layer

2007-03-24 Thread David Miller
From: John Heffner [EMAIL PROTECTED]
Date: Fri, 23 Mar 2007 20:06:44 -0400

 Check the pmtu check at the transport layer (for UDP, ICMP and raw), and
 send a local error if socket is PMTUDISC_DO and packet is too big.  This is
 actually a pure bugfix for ipv6.  For ipv4, it allows us to do pmtu checks
 in the same way as for ipv6.
 
 Signed-off-by: John Heffner [EMAIL PROTECTED]

Applied, thanks John.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] [NET] Do pmtu check in transport layer

2007-03-23 Thread John Heffner
Check the pmtu check at the transport layer (for UDP, ICMP and raw), and
send a local error if socket is PMTUDISC_DO and packet is too big.  This is
actually a pure bugfix for ipv6.  For ipv4, it allows us to do pmtu checks
in the same way as for ipv6.

Signed-off-by: John Heffner [EMAIL PROTECTED]
---
 net/ipv4/ip_output.c  |4 +++-
 net/ipv4/raw.c|8 +---
 net/ipv6/ip6_output.c |   11 ++-
 net/ipv6/raw.c|7 +--
 4 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index d096332..593acf7 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -822,7 +822,9 @@ int ip_append_data(struct sock *sk,
fragheaderlen = sizeof(struct iphdr) + (opt ? opt-optlen : 0);
maxfraglen = ((mtu - fragheaderlen)  ~7) + fragheaderlen;
 
-   if (inet-cork.length + length  0x - fragheaderlen) {
+   if (inet-cork.length + length  0x - fragheaderlen ||
+   (inet-pmtudisc = IP_PMTUDISC_DO 
+inet-cork.length + length  mtu)) {
ip_local_error(sk, EMSGSIZE, rt-rt_dst, inet-dport, 
mtu-exthdrlen);
return -EMSGSIZE;
}
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 87e9c16..f252f4e 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -271,10 +271,12 @@ static int raw_send_hdrinc(struct sock *sk, void *from, 
size_t length,
struct iphdr *iph;
struct sk_buff *skb;
int err;
+   int mtu;
 
-   if (length  rt-u.dst.dev-mtu) {
-   ip_local_error(sk, EMSGSIZE, rt-rt_dst, inet-dport,
-  rt-u.dst.dev-mtu);
+   mtu = inet-pmtudisc == IP_PMTUDISC_DO ? dst_mtu(rt-u.dst) :
+rt-u.dst.dev-mtu;
+   if (length  mtu) {
+   ip_local_error(sk, EMSGSIZE, rt-rt_dst, inet-dport, mtu);
return -EMSGSIZE;
}
if (flagsMSG_PROBE)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 3055169..711dfc3 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1044,11 +1044,12 @@ int ip6_append_data(struct sock *sk, int getfrag(void 
*from, char *to,
fragheaderlen = sizeof(struct ipv6hdr) + rt-u.dst.nfheader_len + (opt 
? opt-opt_nflen : 0);
maxfraglen = ((mtu - fragheaderlen)  ~7) + fragheaderlen - 
sizeof(struct frag_hdr);
 
-   if (mtu = sizeof(struct ipv6hdr) + IPV6_MAXPLEN) {
-   if (inet-cork.length + length  sizeof(struct ipv6hdr) + 
IPV6_MAXPLEN - fragheaderlen) {
-   ipv6_local_error(sk, EMSGSIZE, fl, mtu-exthdrlen);
-   return -EMSGSIZE;
-   }
+   if ((mtu = sizeof(struct ipv6hdr) + IPV6_MAXPLEN 
+inet-cork.length + length  sizeof(struct ipv6hdr) + IPV6_MAXPLEN 
- fragheaderlen) ||
+   (np-pmtudisc = IPV6_PMTUDISC_DO 
+inet-cork.length + length  mtu)) {
+   ipv6_local_error(sk, EMSGSIZE, fl, mtu-exthdrlen);
+   return -EMSGSIZE;
}
 
/*
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 306d5d8..75db277 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -556,9 +556,12 @@ static int rawv6_send_hdrinc(struct sock *sk, void *from, 
int length,
struct sk_buff *skb;
unsigned int hh_len;
int err;
+   int mtu;
 
-   if (length  rt-u.dst.dev-mtu) {
-   ipv6_local_error(sk, EMSGSIZE, fl, rt-u.dst.dev-mtu);
+   mtu = np-pmtudisc == IPV6_PMTUDISC_DO ? dst_mtu(rt-u.dst) :
+rt-u.dst.dev-mtu;
+   if (length  mtu) {
+   ipv6_local_error(sk, EMSGSIZE, fl, mtu);
return -EMSGSIZE;
}
if (flagsMSG_PROBE)
-- 
1.5.0.2.gc260-dirty

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html