Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
In article <[EMAIL PROTECTED]> (at Sun, 18 Dec 2005 23:59:35 +0100), Patrick McHardy <[EMAIL PROTECTED]> says: > YOSHIFUJI Hideaki wrote: : > > BTW, we're now using full of skb->cb > > (and we are even exceeding it w/ mobile-ipv6 extensions)... > > Not in mainline so far, so maybe we can fit your extensions > and my patches without the mobile extensions, that apparently > exceed the CB anyway, in there for now. Can I look at those > patches somewhere? BTW, other fields in the IP6CB seem to > store offsets in u16 fields, is this OK for nhoff too? I > thought with jumbo options I need to use a u32 field. Well, don't mind too much. I just wanted to note that we're about to exceed skb->cb. I will probably enlarge skb->cb if needed. And, yes, good point. I think nhoff should be u32 because it is critical. In theory, u32 is definitely needed for others as well... --yoshfuji - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
In article <[EMAIL PROTECTED]> (at Sun, 18 Dec 2005 15:27:01 +0100), Patrick McHardy <[EMAIL PROTECTED]> says: > YOSHIFUJI Hideaki wrote: > > In article <[EMAIL PROTECTED]> (at Tue, 22 Nov 2005 02:14:26 +0100), > > Patrick McHardy <[EMAIL PROTECTED]> says: > > > > > >>The easiest way would be to store nhoff somewhere in the skb and > >>use it to continue at the next header. But I still hope there is > >>a way without keeping data in the skb. > > > > > > We've coded up this. > > How about this patch instead? It eliminates the nhoffp argument > to IPv6 protocol handlers by storing it in the IP6CB, which allows > to call ip6_input_finish a second time and have it skip already > parsed headers and also gets rid of the manual hopopts skipping. The idea to store IP6CB itself seems sane to me. BTW, we're now using full of skb->cb (and we are even exceeding it w/ mobile-ipv6 extensions)... --yoshfuji - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
Herbert Xu wrote: On Sun, Dec 04, 2005 at 11:06:02PM +0100, Patrick McHardy wrote: If there is a DNAT in the way, this will jump to the very start of the stack. So if we have a hostile IPsec peer, and the DNAT rules are such that this can occur, then we could be in trouble (especially because policy/selector verification does not occur until all IPsec has been done so we can't check inner address validitiy at this point). We could return NET_XMIT_BYPASS from ip_xfrm_transport_hook(), although it looks a bit ugly to use NET_XMIT* on the input path. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
On Sun, Dec 04, 2005 at 11:06:02PM +0100, Patrick McHardy wrote: > > >I'm worried about this bit. This looks like it'll go back to the top > >of the IP stack with the existing call chain. So could grow as the > >number of transforms increase. > > Its not so bad. It adds ip_xfrm_transport_hook and > ip_local_deliver_finish to the call stack, but since two subsequent > transport mode SAs are always processed at once it can't take this > path again without calling netif_rx in between. If there is a DNAT in the way, this will jump to the very start of the stack. So if we have a hostile IPsec peer, and the DNAT rules are such that this can occur, then we could be in trouble (especially because policy/selector verification does not occur until all IPsec has been done so we can't check inner address validitiy at this point). > Besides the double counting, packets also appear on the packet sockets > after transport mode decapsulation with the original approach. For > IPv6 there's also the double-parsing of extension header issue. Having the packets appear twice on AF_PACKET is probably desirable :) I'll need to think about the double-parsing though. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
Herbert Xu wrote: On Sun, Nov 20, 2005 at 04:31:36PM +, Patrick McHardy wrote: @@ -145,7 +149,17 @@ int xfrm4_rcv_encap(struct sk_buff *skb, netif_rx(skb); return 0; } else { +#ifdef CONFIG_NETFILTER + __skb_push(skb, skb->data - skb->nh.raw); + skb->nh.iph->tot_len = htons(skb->len); + ip_send_check(skb->nh.iph); + + NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, skb->dev, NULL, + ip_xfrm_transport_hook); + return 0; +#else return -skb->nh.iph->protocol; +#endif I'm worried about this bit. This looks like it'll go back to the top of the IP stack with the existing call chain. So could grow as the number of transforms increase. Its not so bad. It adds ip_xfrm_transport_hook and ip_local_deliver_finish to the call stack, but since two subsequent transport mode SAs are always processed at once it can't take this path again without calling netif_rx in between. Perhaps we need to play a dst_input/netif_rx trick here. Actually, was there a problem with your original netif_rx approach apart from the issue with double counting? Besides the double counting, packets also appear on the packet sockets after transport mode decapsulation with the original approach. For IPv6 there's also the double-parsing of extension header issue. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
On Sun, Nov 20, 2005 at 04:31:36PM +, Patrick McHardy wrote: > > @@ -145,7 +149,17 @@ int xfrm4_rcv_encap(struct sk_buff *skb, > netif_rx(skb); > return 0; > } else { > +#ifdef CONFIG_NETFILTER > + __skb_push(skb, skb->data - skb->nh.raw); > + skb->nh.iph->tot_len = htons(skb->len); > + ip_send_check(skb->nh.iph); > + > + NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, skb->dev, NULL, > + ip_xfrm_transport_hook); > + return 0; > +#else > return -skb->nh.iph->protocol; > +#endif I'm worried about this bit. This looks like it'll go back to the top of the IP stack with the existing call chain. So could grow as the number of transforms increase. Perhaps we need to play a dst_input/netif_rx trick here. Actually, was there a problem with your original netif_rx approach apart from the issue with double counting? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
Hello. In article <[EMAIL PROTECTED]> (at Tue, 22 Nov 2005 02:14:26 +0100), Patrick McHardy <[EMAIL PROTECTED]> says: > The easiest way would be to store nhoff somewhere in the skb and > use it to continue at the next header. But I still hope there is > a way without keeping data in the skb. We've coded up this. Though we have still another issue (call chain issue) to resolve, we're getting closer to the goal. i.e. we should continue the loop for common case. Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]> Signed-off-by: Yasuyuki Kozawai <[EMAIL PROTECTED]> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 8c5d600..1101851 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -272,6 +272,9 @@ struct sk_buff { void(*destructor)(struct sk_buff *skb); #ifdef CONFIG_NETFILTER __u32 nfmark; +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) + unsigned intnf_nhoff; +#endif struct nf_conntrack *nfct; #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE) struct sk_buff *nfct_reasm; diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 44b979a..0531d0a 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -463,6 +463,8 @@ extern int ip6_dst_lookup(struct sock extern int ip6_output(struct sk_buff *skb); extern int ip6_forward(struct sk_buff *skb); extern int ip6_input(struct sk_buff *skb); +extern int ip6_input_finish2(struct sk_buff *skb, + unsigned int nhoff); extern int ip6_mc_input(struct sk_buff *skb); /* diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c index e84b3cd..cd0606a 100644 --- a/net/ipv6/ip6_input.c +++ b/net/ipv6/ip6_input.c @@ -134,31 +134,13 @@ out: * Deliver the packet to the host */ - -static inline int ip6_input_finish(struct sk_buff *skb) +int ip6_input_finish2(struct sk_buff *skb, unsigned int nhoff) { struct inet6_protocol *ipprot; struct sock *raw_sk; - unsigned int nhoff; - int nexthdr; + unsigned int nexthdr; u8 hash; - skb->h.raw = skb->nh.raw + sizeof(struct ipv6hdr); - - /* -* Parse extension headers -*/ - - nexthdr = skb->nh.ipv6h->nexthdr; - nhoff = offsetof(struct ipv6hdr, nexthdr); - - /* Skip hop-by-hop options, they are already parsed. */ - if (nexthdr == NEXTHDR_HOP) { - nhoff = sizeof(struct ipv6hdr); - nexthdr = skb->h.raw[0]; - skb->h.raw += (skb->h.raw[1]+1)<<3; - } - rcu_read_lock(); resubmit: if (!pskb_pull(skb, skb->h.raw - skb->data)) @@ -221,6 +203,26 @@ discard: return 0; } +static inline int ip6_input_finish(struct sk_buff *skb) +{ + unsigned int nhoff; + + skb->h.raw = skb->nh.raw + sizeof(struct ipv6hdr); + + /* +* Parse extension headers +*/ + + nhoff = offsetof(struct ipv6hdr, nexthdr); + + /* Skip hop-by-hop options, they are already parsed. */ + if (skb->nh.ipv6h->nexthdr == NEXTHDR_HOP) { + nhoff = sizeof(struct ipv6hdr); + skb->h.raw += (skb->h.raw[1]+1)<<3; + } + + return ip6_input_finish2(skb, nhoff); +} int ip6_input(struct sk_buff *skb) { diff --git a/net/ipv6/ipv6_syms.c b/net/ipv6/ipv6_syms.c index 1648278..6051783 100644 --- a/net/ipv6/ipv6_syms.c +++ b/net/ipv6/ipv6_syms.c @@ -15,6 +15,7 @@ EXPORT_SYMBOL(ndisc_mc_map); EXPORT_SYMBOL(register_inet6addr_notifier); EXPORT_SYMBOL(unregister_inet6addr_notifier); EXPORT_SYMBOL(ip6_route_output); +EXPORT_SYMBOL(ip6_input_finish2); EXPORT_SYMBOL(addrconf_lock); EXPORT_SYMBOL(ipv6_setsockopt); EXPORT_SYMBOL(ipv6_getsockopt); diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c index 9987416..2e3b28d 100644 --- a/net/ipv6/xfrm6_input.c +++ b/net/ipv6/xfrm6_input.c @@ -9,6 +9,7 @@ * IPv6 support */ +#include #include #include #include @@ -17,6 +18,7 @@ #include #include #include +#include #include static inline void ipip6_ecn_decapsulate(struct sk_buff *skb) @@ -28,6 +30,25 @@ static inline void ipip6_ecn_decapsulate IP6_ECN_set_ce(inner_iph); } +#ifdef CONFIG_NETFILTER +static inline int xfrm6_rcv_spi_finish2(struct sk_buff *skb) +{ + __skb_pull(skb, skb->h.raw - skb->nh.raw); + return ip6_input_finish2(skb, skb->nf_nhoff); +} + +static inline int xfrm6_rcv_spi_finish(struct sk_buff *skb) +{ + if (skb->dst == NULL) { + ip6_route_input(skb); + return dst_input(skb); + } + + return NF_HOOK(PF_INET6, NF_IP6_LOCAL_IN, skb, skb->dev, NULL, + xfrm6_rcv_spi_finish2); +} +#endif + int xfrm6_rcv_spi(struct sk_buff **pskb, unsigned int *
Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
Kazunori Miyazawa wrote: > Patrick McHardy wrote: > >>The problem is that netfilter hooks take ownership of the skb, so the >>caller can't touch it afterwards. It would be possible, but it would >>become very ugly. Can I assume that you would also be satisfied if >>the double-parsing of extension headers is fixed in some other way? > > > My concern is cost to look up routing table and parse extention headers twice. > I think the latter will be critical issue if some extention header makes > some state in the stack. The routing lookup is not done unless someone resets skb->dst (i.e. nfqueue). I'm looking into fixing the extension header problem, so I hope that will resolve all issues. > IMHO, we will call the loop part of ip6_input_finish to inject the skb > to upper layer directly in ip6_xfrm_transport_hook. > But it may make duplicate codes and raise other issues... The easiest way would be to store nhoff somewhere in the skb and use it to continue at the next header. But I still hope there is a way without keeping data in the skb. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
Patrick McHardy wrote: > Kazunori Miyazawa wrote: > >>Hello Patrick, >> >>I have a comment about the patch on the IPv6 input process. >>The kernel applied your patch will always call ip6_rcv_finish >>when enabling netfilter and receiving a esp packet so that >>it will always look up the routing table and parse >>extention headers twice a packet. >>It will probably cost. >> >>Your ip_xfrm_transport_hook is a good idea, I think. > > > Yes, not passing the packets through the entire stack seems like > the right thing to do. > > >>We could call ip6_rcv_finish if the netfilter changed the addresses >>or otherwise we can continue the loop to avoid the cost in a similar >>way because we can know the change with checking skb->dst. >> >>If you fix the point, your change will resolve our issues which were >>mentioned in the previous mail from Kozakai-san. > > > The problem is that netfilter hooks take ownership of the skb, so the > caller can't touch it afterwards. It would be possible, but it would > become very ugly. Can I assume that you would also be satisfied if > the double-parsing of extension headers is fixed in some other way? My concern is cost to look up routing table and parse extention headers twice. I think the latter will be critical issue if some extention header makes some state in the stack. IMHO, we will call the loop part of ip6_input_finish to inject the skb to upper layer directly in ip6_xfrm_transport_hook. But it may make duplicate codes and raise other issues... I'm also consider the issue. Thank you Patrick, -- Kazunori Miyazawa - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
David S. Miller wrote: I've read over Patrick's two most recent postings of these patches and I think they are generally sane and I cannot find any holes in them. Herbert brought up the legitimate concern about defragmentation, but I think that's a detail and does not take away from the structural soundness of Patrick's approach. I think we implicitly agreed on moving the POST_ROUTING hook before fragmentation and change the user-visible behaviour of the mangle POSTROUTING chain. At least neither Harald not Rusty objected to the patch :) - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
Yasuyuki KOZAKAI wrote: I don't see why it is confusing. Plain text packets are visible before encapsulation (and they have to be because we don't necessarily know if packets will be encapsulated at the time the hooks are called in case the policy lookup after NAT returns a policy), plain text packets are visible after decapsulation. With different hooks we can't have symetrical behaviour because of the case I mentioned above, and that would be confusing IMO. Well, what I worried about was just ease to use, not internal processing. I suspected that users can correctly configure IPsec and packet filtering. Just doing "iptables -P INPUT -j DROP; iptables -A INPUT -p esp -j ACCEPT" will drop all input packets and this is different behavior with current kernel, for example. So I just imagined many people would say "Why ?". But as you said in other mail, probably this is my needles fear, isn't this ? As you know, I'm basically worrier. :) :) I think its OK since in tunnel mode the decapsulated packets already need to be allowed by the ruleset, so I think its not too confusing to expect the same in transport mode. Then, why don't we make xfrm{4,6}_rcv_spi() return 0 (1 if IPv6) so that ip_local_deliver_finish()/ip6_input_finish() can continue to process headers if packets have not been mangled ? Is this difficult or impossible to implement ? I'm not sure I understand. Do you propose to check if the packet was mangled after the PRE_ROUTING hook (if it was not stolen or queued) and if not return directly to ip6_input_finish()? Yes. Where would the LOCAL_IN hook be called? Ah, indeed. But we can add it just before return directly to ip6_input_finish(). Please see my mail to Kazunori. The hooks take ownership of the skb, changing this would become pretty ugly because of queued or stolen packets. And it would still leave one path where packets are not directly returned, so I think the double-parsing should be handled otherwise. AFAICT statistics are not affected by my patches, except for the iptables counters. The double parsing of extension headers is indeed a problem I forgot about, it looks like we need to carry nhoff around if it can't be derived from the packet. Of cause that is one solution. I'm going to try that if I can't come up with something better. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
Kazunori Miyazawa wrote: > Hello Patrick, > > I have a comment about the patch on the IPv6 input process. > The kernel applied your patch will always call ip6_rcv_finish > when enabling netfilter and receiving a esp packet so that > it will always look up the routing table and parse > extention headers twice a packet. > It will probably cost. > > Your ip_xfrm_transport_hook is a good idea, I think. Yes, not passing the packets through the entire stack seems like the right thing to do. > We could call ip6_rcv_finish if the netfilter changed the addresses > or otherwise we can continue the loop to avoid the cost in a similar > way because we can know the change with checking skb->dst. > > If you fix the point, your change will resolve our issues which were > mentioned in the previous mail from Kozakai-san. The problem is that netfilter hooks take ownership of the skb, so the caller can't touch it afterwards. It would be possible, but it would become very ugly. Can I assume that you would also be satisfied if the double-parsing of extension headers is fixed in some other way? - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
Hi, From: Patrick McHardy <[EMAIL PROTECTED]> Date: Mon, 21 Nov 2005 07:52:36 +0100 > I don't see why it is confusing. Plain text packets are visible before > encapsulation (and they have to be because we don't necessarily know > if packets will be encapsulated at the time the hooks are called in > case the policy lookup after NAT returns a policy), plain text packets > are visible after decapsulation. With different hooks we can't have > symetrical behaviour because of the case I mentioned above, and that > would be confusing IMO. Well, what I worried about was just ease to use, not internal processing. I suspected that users can correctly configure IPsec and packet filtering. Just doing "iptables -P INPUT -j DROP; iptables -A INPUT -p esp -j ACCEPT" will drop all input packets and this is different behavior with current kernel, for example. So I just imagined many people would say "Why ?". But as you said in other mail, probably this is my needles fear, isn't this ? As you know, I'm basically worrier. :) > > And this can be said about IPv6 input path. If packets have not been mangled > > (this is ordinary case because ip6tables doesn't have neither NAT nor > > target module to mangle addresses directly), they don't have to re-route > > and don't have to re-visit ip6_input_finish(). > > > > In the other way, if their addresses have been mangled, it's necessary to > > re-route. I agree re-visiting ip6_input_finish() in this case. > > Same goes for ip6_input_finish as for ip_local_deliver_finish(), > the packet would continue its path there anyway. Do you actually > mean ip6_rcv_finish()? No. I mean ip6_input_finish(). calling ip6_input_finish() twice causes problem at processing of IPv6 extension headers. This is different point between IPv4 and IPv6. Please note that I don't mean IPv4 processing in your patch has problem. I think it will work. What I wanted to do was just avoiding double processing of extension headers and synchronizing IPv4/IPv6 behavior as possible. > > Then, why don't we make xfrm{4,6}_rcv_spi() return 0 (1 if IPv6) > > so that ip_local_deliver_finish()/ip6_input_finish() can continue to process > > headers if packets have not been mangled ? Is this difficult or impossible > > to > > implement ? > > I'm not sure I understand. Do you propose to check if the packet was > mangled after the PRE_ROUTING hook (if it was not stolen or queued) > and if not return directly to ip6_input_finish()? Yes. >Where would the > LOCAL_IN hook be called? Ah, indeed. But we can add it just before return directly to ip6_input_finish(). > > This can solve the issue about twice processing of statistics and IPv6 > > extension headers. Because ip_local_deliver_finish()/ip6_input_finish() can > > continue to process extension headers after ESP/AH in ordinary case. > > AFAICT statistics are not affected by my patches, except for the > iptables counters. The double parsing of extension headers is indeed > a problem I forgot about, it looks like we need to carry nhoff around > if it can't be derived from the packet. Of cause that is one solution. -- Yasuyuki Kozakai - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
Hello. In article <[EMAIL PROTECTED]> (at Mon, 21 Nov 2005 17:31:41 +0900), Kazunori Miyazawa <[EMAIL PROTECTED]> says: > Your ip_xfrm_transport_hook is a good idea, I think. > > We could call ip6_rcv_finish if the netfilter changed the addresses > or otherwise we can continue the loop to avoid the cost in a similar > way because we can know the change with checking skb->dst. Well, I agree. In article <[EMAIL PROTECTED]> (at Sun, 20 Nov 2005 17:31:36 +0100), Patrick McHardy <[EMAIL PROTECTED]> says: > diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c > index b93e7cd..3c39296 100644 > --- a/net/ipv4/netfilter.c > +++ b/net/ipv4/netfilter.c > @@ -105,6 +105,26 @@ int ip_dst_output(struct sk_buff *skb) > return dst_output(skb); > } > EXPORT_SYMBOL(ip_dst_output); > + > +/* > + * okfn for transport mode xfrm_input.c hook. Basically a copy of > + * ip_rcv_finish without statistics and option parsing. > + */ > +int ip_xfrm_transport_hook(struct sk_buff *skb) > +{ > + struct iphdr *iph = skb->nh.iph; > + > + if (likely(skb->dst == NULL)) { > + int err = ip_route_input(skb, iph->daddr, iph->saddr, iph->tos, > + skb->dev); > + if (unlikely(err)) > + goto drop; > + } > + return dst_input(skb); > +drop: > + kfree_skb(skb); > + return NET_RX_DROP; > +} > #endif /* CONFIG_XFRM */ > : > @@ -129,7 +133,16 @@ int xfrm6_rcv_spi(struct sk_buff **pskb, > netif_rx(skb); > return -1; > } else { > +#ifdef CONFIG_NETFILTER > + skb->nh.ipv6h->payload_len = htons(skb->len); > + __skb_push(skb, skb->data - skb->nh.raw); > + > + NF_HOOK(PF_INET6, NF_IP6_PRE_ROUTING, skb, skb->dev, NULL, > + ip6_rcv_finish); > + return -1; > +#else > return 1; > +#endif > } > Probably, we can do similarly for ipv6; e.g.: int ip6_xfrm_transport_hook(struct sk_buff *skb) { #if 0 /* We NEVER support NAT. :-) */ if (likely(skb->dst == NULL)) { int err = ip6_route_input() if (unlikely(err)) goto drop; } #endif __skb_pull(skb, skb->h.raw - skb->nh.raw); return NET_RX_SUCCESS; drop: kfree_skb(skb); return NET_RX_DROP; } : } else { #ifdef CONFIG_NETFILTER skb->nh.ipv6h->payload_len = htons(skb->len); skb->h.raw = skb->data; __skb_push(skb, skb->data - skb->nh.raw); if (NF_HOOK(PF_INET6, NF_IP6_PRE_ROUTING, skb, skb->dev, NULL, ip6_xfrm_transport_hook) == NET_RX_DROP) return -1; #endif return 1; } Then, we can continue parsing extension headers, I think. -- YOSHIFUJI Hideaki @ USAGI Project <[EMAIL PROTECTED]> GPG-FP : 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
Hello Patrick, I have a comment about the patch on the IPv6 input process. The kernel applied your patch will always call ip6_rcv_finish when enabling netfilter and receiving a esp packet so that it will always look up the routing table and parse extention headers twice a packet. It will probably cost. Your ip_xfrm_transport_hook is a good idea, I think. We could call ip6_rcv_finish if the netfilter changed the addresses or otherwise we can continue the loop to avoid the cost in a similar way because we can know the change with checking skb->dst. If you fix the point, your change will resolve our issues which were mentioned in the previous mail from Kozakai-san. Patrick McHardy wrote: > [IPV4/6]: Netfilter IPsec input hooks > > When the innermost transform uses transport mode the decapsulated packet > is not visible to netfilter. Pass the packet through the PRE_ROUTING and > LOCAL_IN hooks again before handing it to upper layer protocols to make > netfilter-visibility symetrical to the output path. > > Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]> > > --- > commit 08cf39d5d7d8b942431a6529daa3ab69ecfb34b5 > tree 6f2a1a85f915b1b6f89ad50cf3d8855f21a561b6 > parent b847425c693f43a63137d18e36e5c8cf0187c175 > author Patrick McHardy <[EMAIL PROTECTED]> Sat, 19 Nov 2005 21:50:22 +0100 > committer Patrick McHardy <[EMAIL PROTECTED]> Sat, 19 Nov 2005 21:50:22 +0100 > > include/linux/netfilter_ipv4.h |2 +- > include/net/ipv6.h |2 ++ > net/ipv4/netfilter.c | 20 > net/ipv4/xfrm4_input.c | 14 ++ > net/ipv6/ip6_input.c |2 +- > net/ipv6/xfrm6_input.c | 13 + > 6 files changed, 51 insertions(+), 2 deletions(-) > > diff --git a/include/linux/netfilter_ipv4.h b/include/linux/netfilter_ipv4.h > index fdc4a95..e9103fe 100644 > --- a/include/linux/netfilter_ipv4.h > +++ b/include/linux/netfilter_ipv4.h > @@ -79,7 +79,7 @@ enum nf_ip_hook_priorities { > > #ifdef __KERNEL__ > extern int ip_route_me_harder(struct sk_buff **pskb); > - > +extern int ip_xfrm_transport_hook(struct sk_buff *skb); > #endif /*__KERNEL__*/ > > #endif /*__LINUX_IP_NETFILTER_H*/ > diff --git a/include/net/ipv6.h b/include/net/ipv6.h > index 6addb4d..4fbfe43 100644 > --- a/include/net/ipv6.h > +++ b/include/net/ipv6.h > @@ -414,6 +414,8 @@ extern intipv6_rcv(struct sk_buff > *sk >struct packet_type *pt, >struct net_device *orig_dev); > > +extern int ip6_rcv_finish(struct sk_buff *skb); > + > /* > * upper-layer output functions > */ > diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c > index b93e7cd..3c39296 100644 > --- a/net/ipv4/netfilter.c > +++ b/net/ipv4/netfilter.c > @@ -105,6 +105,26 @@ int ip_dst_output(struct sk_buff *skb) > return dst_output(skb); > } > EXPORT_SYMBOL(ip_dst_output); > + > +/* > + * okfn for transport mode xfrm_input.c hook. Basically a copy of > + * ip_rcv_finish without statistics and option parsing. > + */ > +int ip_xfrm_transport_hook(struct sk_buff *skb) > +{ > + struct iphdr *iph = skb->nh.iph; > + > + if (likely(skb->dst == NULL)) { > + int err = ip_route_input(skb, iph->daddr, iph->saddr, iph->tos, > + skb->dev); > + if (unlikely(err)) > + goto drop; > + } > + return dst_input(skb); > +drop: > + kfree_skb(skb); > + return NET_RX_DROP; > +} > #endif /* CONFIG_XFRM */ > > /* > diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c > index 2d3849c..d90cd93 100644 > --- a/net/ipv4/xfrm4_input.c > +++ b/net/ipv4/xfrm4_input.c > @@ -11,6 +11,8 @@ > > #include > #include > +#include > +#include > #include > #include > #include > @@ -137,6 +139,8 @@ int xfrm4_rcv_encap(struct sk_buff *skb, > memcpy(skb->sp->x+skb->sp->len, xfrm_vec, xfrm_nr*sizeof(struct > sec_decap_state)); > skb->sp->len += xfrm_nr; > > + nf_reset(skb); > + > if (decaps) { > if (!(skb->dev->flags&IFF_LOOPBACK)) { > dst_release(skb->dst); > @@ -145,7 +149,17 @@ int xfrm4_rcv_encap(struct sk_buff *skb, > netif_rx(skb); > return 0; > } else { > +#ifdef CONFIG_NETFILTER > + __skb_push(skb, skb->data - skb->nh.raw); > + skb->nh.iph->tot_len = htons(skb->len); > + ip_send_check(skb->nh.iph); > + > + NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, skb->dev, NULL, > + ip_xfrm_transport_hook); > + return 0; > +#else > return -skb->nh.iph->protocol; > +#endif > } > > drop_unlock: > diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c > index 33d3b0e..e84b3cd 100644 > --- a/net/ipv6/ip6_input.c > +++ b/net/ipv6/ip6_input.c > @@ -48,7 +48,7 @@ > > > > -static inline int i
Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
David S. Miller <[EMAIL PROTECTED]> wrote: > > I've read over Patrick's two most recent postings of these patches > and I think they are generally sane and I cannot find any holes in > them. Herbert brought up the legitimate concern about defragmentation, > but I think that's a detail and does not take away from the structural > soundness of Patrick's approach. Yes I agree completely. The new IPsec stack has been around for three years and it's about time that we have proper netfilter support for it. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
From: Patrick McHardy <[EMAIL PROTECTED]> Date: Mon, 21 Nov 2005 07:52:36 +0100 > I don't see why it is confusing. Plain text packets are visible before > encapsulation (and they have to be because we don't necessarily know > if packets will be encapsulated at the time the hooks are called in > case the policy lookup after NAT returns a policy), plain text packets > are visible after decapsulation. With different hooks we can't have > symetrical behaviour because of the case I mentioned above, and that > would be confusing IMO. I think this is a very important point. I can see no serious argument against this behavior, especially for output. On input, there is an argument of paranoia about seeing plaintext packets, but administrator could do this anyways with tcpdump or custom kernel module if this system is the decapsulation point. I've read over Patrick's two most recent postings of these patches and I think they are generally sane and I cannot find any holes in them. Herbert brought up the legitimate concern about defragmentation, but I think that's a detail and does not take away from the structural soundness of Patrick's approach. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
Yasuyuki KOZAKAI wrote: At first, now I could agree to use same name for hooks before/after xfrm processing, if it's important to keep compatibility than to avoid difficulty to use. Even now I think it's confusing to pass packets before/after xfrm to same hook, and believe it's ideal to use different name for them. But people can configure correctly according to you, then my concern might be needless fear. I don't see why it is confusing. Plain text packets are visible before encapsulation (and they have to be because we don't necessarily know if packets will be encapsulated at the time the hooks are called in case the policy lookup after NAT returns a policy), plain text packets are visible after decapsulation. With different hooks we can't have symetrical behaviour because of the case I mentioned above, and that would be confusing IMO. Next is about re-visiting stack, I'm beginning to understand your patch. It looks natural to re-route after DNAT on input path of IPv4. That's really needed if packets have been mangled. But is there any reason that we have to allow packets to re-visit ip_local_deliver_finish() in the case that they have not been mangled at PRE_ROUTING ? I think no. This situation is like ip_nat_out(). My patches don't change when and if packets will reach ip_local_deliver_finish(), they just add a possibility for rerouting. Currently the transforms are called from ip_local_deliver_finish() and in transport mode the decapsulated packet continues its path in ip_local_deliver_finish(), with my patches they will go through two netfilter hooks and continue the exact same codepath, given that they are not NATed to a foreign destination IP on their way. And this can be said about IPv6 input path. If packets have not been mangled (this is ordinary case because ip6tables doesn't have neither NAT nor target module to mangle addresses directly), they don't have to re-route and don't have to re-visit ip6_input_finish(). In the other way, if their addresses have been mangled, it's necessary to re-route. I agree re-visiting ip6_input_finish() in this case. Same goes for ip6_input_finish as for ip_local_deliver_finish(), the packet would continue its path there anyway. Do you actually mean ip6_rcv_finish()? Then, why don't we make xfrm{4,6}_rcv_spi() return 0 (1 if IPv6) so that ip_local_deliver_finish()/ip6_input_finish() can continue to process headers if packets have not been mangled ? Is this difficult or impossible to implement ? I'm not sure I understand. Do you propose to check if the packet was mangled after the PRE_ROUTING hook (if it was not stolen or queued) and if not return directly to ip6_input_finish()? Where would the LOCAL_IN hook be called? This can solve the issue about twice processing of statistics and IPv6 extension headers. Because ip_local_deliver_finish()/ip6_input_finish() can continue to process extension headers after ESP/AH in ordinary case. AFAICT statistics are not affected by my patches, except for the iptables counters. The double parsing of extension headers is indeed a problem I forgot about, it looks like we need to carry nhoff around if it can't be derived from the packet. Regards Patrick - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
Hi, Patrick, From: Patrick McHardy <[EMAIL PROTECTED]> Date: Sun, 20 Nov 2005 17:31:36 +0100 > [IPV4/6]: Netfilter IPsec input hooks > > When the innermost transform uses transport mode the decapsulated packet > is not visible to netfilter. Pass the packet through the PRE_ROUTING and > LOCAL_IN hooks again before handing it to upper layer protocols to make > netfilter-visibility symetrical to the output path. > > Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]> At first, now I could agree to use same name for hooks before/after xfrm processing, if it's important to keep compatibility than to avoid difficulty to use. Even now I think it's confusing to pass packets before/after xfrm to same hook, and believe it's ideal to use different name for them. But people can configure correctly according to you, then my concern might be needless fear. Next is about re-visiting stack, I'm beginning to understand your patch. It looks natural to re-route after DNAT on input path of IPv4. That's really needed if packets have been mangled. But is there any reason that we have to allow packets to re-visit ip_local_deliver_finish() in the case that they have not been mangled at PRE_ROUTING ? I think no. This situation is like ip_nat_out(). And this can be said about IPv6 input path. If packets have not been mangled (this is ordinary case because ip6tables doesn't have neither NAT nor target module to mangle addresses directly), they don't have to re-route and don't have to re-visit ip6_input_finish(). In the other way, if their addresses have been mangled, it's necessary to re-route. I agree re-visiting ip6_input_finish() in this case. Then, why don't we make xfrm{4,6}_rcv_spi() return 0 (1 if IPv6) so that ip_local_deliver_finish()/ip6_input_finish() can continue to process headers if packets have not been mangled ? Is this difficult or impossible to implement ? This can solve the issue about twice processing of statistics and IPv6 extension headers. Because ip_local_deliver_finish()/ip6_input_finish() can continue to process extension headers after ESP/AH in ordinary case. In special case, if some codes mangle IPv6 addresses, that's the codes to take care of the possibility of re-visiting ip6_input_finish(). What do you think ? # Please note that these are just my opinions and other USAGI guys might have # other opinions. Regards, -- Yasuyuki Kozakai - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html