Re: IPv6/NDP/IPsec breakage in -current

2017-01-02 Thread Martin Pieuchot
On 06/12/16(Tue) 15:23, Alexander Bluhm wrote:
> On Mon, Nov 21, 2016 at 03:15:39PM +0100, Martin Pieuchot wrote:
> > naddy@ confirmed this diff fixes his tunnel mode setup, ok?
> 
> IPv6 neighbor discovery over IPsec does not work reliably.  It uses
> link-local, global and multicast addresses and depending on your
> flows and SA it either works or not.
> 
> The (sizeof(*hip6) < ln->ln_hold->m_len) check is wrong, so the
> code used saddr6 = NULL before.  With that and m_inject() it worked.
> We have a bug that was hiding a problem, but now the bug is not
> triggered anymore.
> 
> Then there is RFC 2461 written in this old IPv6 spirit.  If we have
> a bunch of address scopes, use the best matching address.  Then
> chances are high that autoconfiguration and featuritis work.
> 
> SEND brings the issue to a new x509 certificate problem level.
> 
> I am not sure what to do.
> 
> mpi@ suggests removing the IPv6 spirit, and by accident that fixes
> naddy@'s problem.  I am a bit reluctant to remove it.  It is not a
> proper fix and may trigger problems elsewhere.
> 
> In my IPv6+IPsec setup I use a separate unencrypted network for
> IKE, ESP and corresponding ND packets.
> 
> For transparent mode I have no better solution than excluding ICMP6
> from the flow.
> 
> At least we should put something into the ipsec.conf(5) man page.

This is still an issue and multiple diffs are floating around.  Could
we commit a fix?

> > > Index: netinet6/nd6_nbr.c
> > > ===
> > > RCS file: /cvs/src/sys/netinet6/nd6_nbr.c,v
> > > retrieving revision 1.110
> > > diff -u -p -r1.110 nd6_nbr.c
> > > --- netinet6/nd6_nbr.c  23 Aug 2016 11:03:10 -  1.110
> > > +++ netinet6/nd6_nbr.c  4 Nov 2016 09:02:47 -
> > > @@ -433,54 +433,23 @@ nd6_ns_output(struct ifnet *ifp, struct 
> > >   }
> > >   ip6->ip6_dst = dst_sa.sin6_addr;
> > >   if (!dad) {
> > > - /*
> > > -  * RFC2461 7.2.2:
> > > -  * "If the source address of the packet prompting the
> > > -  * solicitation is the same as one of the addresses assigned
> > > -  * to the outgoing interface, that address SHOULD be placed
> > > -  * in the IP Source Address of the outgoing solicitation.
> > > -  * Otherwise, any one of the addresses assigned to the
> > > -  * interface should be used."
> > > -  *
> > > -  * We use the source address for the prompting packet
> > > -  * (saddr6), if:
> > > -  * - saddr6 is given from the caller (by giving "ln"), and
> > > -  * - saddr6 belongs to the outgoing interface.
> > > -  * Otherwise, we perform the source address selection as usual.
> > > -  */
> > > - struct ip6_hdr *hip6;   /* hold ip6 */
> > > - struct in6_addr *saddr6;
> > > +  /* Perform source address selection. */
> > > + struct rtentry *rt;
> > >  
> > > - if (ln && ln->ln_hold) {
> > > - hip6 = mtod(ln->ln_hold, struct ip6_hdr *);
> > > - /* XXX pullup? */
> > > - if (sizeof(*hip6) < ln->ln_hold->m_len)
> > > - saddr6 = &hip6->ip6_src;
> > > - else
> > > - saddr6 = NULL;
> > > - } else
> > > - saddr6 = NULL;
> > > - if (saddr6 && in6ifa_ifpwithaddr(ifp, saddr6))
> > > - src_sa.sin6_addr = *saddr6;
> > > - else {
> > > - struct rtentry *rt;
> > > + rt = rtalloc(sin6tosa(&dst_sa), RT_RESOLVE,
> > > + m->m_pkthdr.ph_rtableid);
> > > + if (!rtisvalid(rt)) {
> > > + char addr[INET6_ADDRSTRLEN];
> > >  
> > > - rt = rtalloc(sin6tosa(&dst_sa), RT_RESOLVE,
> > > - m->m_pkthdr.ph_rtableid);
> > > - if (!rtisvalid(rt)) {
> > > - char addr[INET6_ADDRSTRLEN];
> > > -
> > > - nd6log((LOG_DEBUG,
> > > - "%s: source can't be determined: dst=%s\n",
> > > - __func__, inet_ntop(AF_INET6,
> > > - &dst_sa.sin6_addr, addr, sizeof(addr))));
> > > - rtfree(rt);
> > > - goto bad;
> > > - }
> > > - src_sa.sin6_addr =
> > > - ifatoia6(rt->rt_ifa)->ia_addr.sin6_addr;
> > > + nd6log((LOG_DEBUG,
> > > + "%s: source can't be determined: dst=%s\n",
> > > + __func__, inet_ntop(AF_INET6,
> > > + &dst_sa.sin6_addr, addr, sizeof(addr))));
> > >   rtfree(rt);
> > > + goto bad;
> > >   }
> > > + src_sa.sin6_addr = ifatoia6(rt->rt_ifa)->ia_addr.sin6_addr;
> > > + rtfree(rt);
> > >   } else {
> > >   /*
> > >* Source address for DAD packet must always 

Re: IPv6/NDP/IPsec breakage in -current

2016-12-06 Thread Alexander Bluhm
On Mon, Nov 21, 2016 at 03:15:39PM +0100, Martin Pieuchot wrote:
> naddy@ confirmed this diff fixes his tunnel mode setup, ok?

IPv6 neighbor discovery over IPsec does not work reliably.  It uses
link-local, global and multicast addresses and depending on your
flows and SA it either works or not.

The (sizeof(*hip6) < ln->ln_hold->m_len) check is wrong, so the
code used saddr6 = NULL before.  With that and m_inject() it worked.
We have a bug that was hiding a problem, but now the bug is not
triggered anymore.

Then there is RFC 2461 written in this old IPv6 spirit.  If we have
a bunch of address scopes, use the best matching address.  Then
chances are high that autoconfiguration and featuritis work.
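
The RFC 2461 7.2.2 rule quoted in the removed comment can be sketched in isolation as follows.  This is only a toy model under stated assumptions: `struct toy_in6` and `toy_pick_ns_source()` are hypothetical names, the interface is reduced to a flat address array, and "any one of the addresses" is taken to mean the first one, whereas the kernel falls back to a route lookup.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct in6_addr (16 bytes). */
struct toy_in6 {
	unsigned char b[16];
};

/*
 * Pick the source address for a neighbor solicitation.
 * If the source of the prompting packet is assigned to the outgoing
 * interface, use it; otherwise use any address of the interface
 * (here, by assumption, simply the first one).
 * Returns NULL when the interface has no address at all.
 */
const struct toy_in6 *
toy_pick_ns_source(const struct toy_in6 *ifaddrs, size_t n,
    const struct toy_in6 *prompt_src)
{
	size_t i;

	if (n == 0)
		return NULL;
	if (prompt_src != NULL) {
		for (i = 0; i < n; i++) {
			if (memcmp(&ifaddrs[i], prompt_src,
			    sizeof(*prompt_src)) == 0)
				return &ifaddrs[i];
		}
	}
	return &ifaddrs[0];
}
```

Passing NULL as `prompt_src` models the "saddr6 = NULL" case discussed in this thread: the prompting packet is ignored and the fallback choice is always used.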

SEND brings the issue to a new x509 certificate problem level.

I am not sure what to do.

mpi@ suggests removing the IPv6 spirit, and by accident that fixes
naddy@'s problem.  I am a bit reluctant to remove it.  It is not a
proper fix and may trigger problems elsewhere.

In my IPv6+IPsec setup I use a separate unencrypted network for
IKE, ESP and corresponding ND packets.

For transparent mode I have no better solution than excluding ICMP6
from the flow.

At least we should put something into the ipsec.conf(5) man page.

bluhm

> 
> > Index: netinet6/nd6_nbr.c
> > ===
> > RCS file: /cvs/src/sys/netinet6/nd6_nbr.c,v
> > retrieving revision 1.110
> > diff -u -p -r1.110 nd6_nbr.c
> > --- netinet6/nd6_nbr.c  23 Aug 2016 11:03:10 -  1.110
> > +++ netinet6/nd6_nbr.c  4 Nov 2016 09:02:47 -
> > @@ -433,54 +433,23 @@ nd6_ns_output(struct ifnet *ifp, struct 
> > }
> > ip6->ip6_dst = dst_sa.sin6_addr;
> > if (!dad) {
> > -   /*
> > -* RFC2461 7.2.2:
> > -* "If the source address of the packet prompting the
> > -* solicitation is the same as one of the addresses assigned
> > -* to the outgoing interface, that address SHOULD be placed
> > -* in the IP Source Address of the outgoing solicitation.
> > -* Otherwise, any one of the addresses assigned to the
> > -* interface should be used."
> > -*
> > -* We use the source address for the prompting packet
> > -* (saddr6), if:
> > -* - saddr6 is given from the caller (by giving "ln"), and
> > -* - saddr6 belongs to the outgoing interface.
> > -* Otherwise, we perform the source address selection as usual.
> > -*/
> > -   struct ip6_hdr *hip6;   /* hold ip6 */
> > -   struct in6_addr *saddr6;
> > +/* Perform source address selection. */
> > +   struct rtentry *rt;
> >  
> > -   if (ln && ln->ln_hold) {
> > -   hip6 = mtod(ln->ln_hold, struct ip6_hdr *);
> > -   /* XXX pullup? */
> > -   if (sizeof(*hip6) < ln->ln_hold->m_len)
> > -   saddr6 = &hip6->ip6_src;
> > -   else
> > -   saddr6 = NULL;
> > -   } else
> > -   saddr6 = NULL;
> > -   if (saddr6 && in6ifa_ifpwithaddr(ifp, saddr6))
> > -   src_sa.sin6_addr = *saddr6;
> > -   else {
> > -   struct rtentry *rt;
> > +   rt = rtalloc(sin6tosa(&dst_sa), RT_RESOLVE,
> > +   m->m_pkthdr.ph_rtableid);
> > +   if (!rtisvalid(rt)) {
> > +   char addr[INET6_ADDRSTRLEN];
> >  
> > -   rt = rtalloc(sin6tosa(&dst_sa), RT_RESOLVE,
> > -   m->m_pkthdr.ph_rtableid);
> > -   if (!rtisvalid(rt)) {
> > -   char addr[INET6_ADDRSTRLEN];
> > -
> > -   nd6log((LOG_DEBUG,
> > -   "%s: source can't be determined: dst=%s\n",
> > -   __func__, inet_ntop(AF_INET6,
> > -   &dst_sa.sin6_addr, addr, sizeof(addr))));
> > -   rtfree(rt);
> > -   goto bad;
> > -   }
> > -   src_sa.sin6_addr =
> > -   ifatoia6(rt->rt_ifa)->ia_addr.sin6_addr;
> > +   nd6log((LOG_DEBUG,
> > +   "%s: source can't be determined: dst=%s\n",
> > +   __func__, inet_ntop(AF_INET6,
> > +   &dst_sa.sin6_addr, addr, sizeof(addr))));
> > rtfree(rt);
> > +   goto bad;
> > }
> > +   src_sa.sin6_addr = ifatoia6(rt->rt_ifa)->ia_addr.sin6_addr;
> > +   rtfree(rt);
> > } else {
> > /*
> >  * Source address for DAD packet must always be IPv6
> > 



Re: IPv6/NDP/IPsec breakage in -current

2016-11-04 Thread Martin Pieuchot
On 02/11/16(Wed) 10:19, Martin Pieuchot wrote:
> On 25/10/16(Tue) 22:13, Markus Friedl wrote:
> > 
> > > > On 25.10.2016 at 17:13, Mike Belopuhov wrote:
> > > 
> > > 
> > > There are apparently some discussions in informational RFCs regarding
> > > this issue.  For instance https://tools.ietf.org/html/rfc3756 states:
> > > 
> > >   More specifically, the currently used key agreement protocol, IKE,
> > >   suffers from a chicken-and-egg problem [8]: one needs an IP address
> > >   to run IKE, IKE is needed to establish IPsec SAs, and IPsec SAs are
> > >   required to configure an IP address.
> > > 
> > > Which goes one step further: how to protect all ND in general, but is
> > > still applicable in our situation.  There were attempts to protect ND
> > > in an alternative way, e.g. SEND (https://tools.ietf.org/html/rfc3971).
> > > FreeBSD has picked up on it and has had a SoC project which seems to
> > > be integrated right now:
> > > 
> > >   https://wiki.freebsd.org/SOC2009AnaKukec
> > >   https://www.freebsd.org/cgi/man.cgi?query=send&sektion=4
> > > 
> > > 
> > > Would it be possible for us to disable the check and always set saddr6
> > > to NULL for now?
> > 
> > Fine w/me.
> > 
> > Or we could check if the packet has been IPsec encapsulated
> > and set saddr6 to NULL in this case.
> 
> Is this fixed?  Anything we're still waiting for?

So something like that?  FWIW I'm happy with fewer in6ifa_ifpwithaddr().

Index: netinet6/nd6_nbr.c
===
RCS file: /cvs/src/sys/netinet6/nd6_nbr.c,v
retrieving revision 1.110
diff -u -p -r1.110 nd6_nbr.c
--- netinet6/nd6_nbr.c  23 Aug 2016 11:03:10 -  1.110
+++ netinet6/nd6_nbr.c  4 Nov 2016 09:02:47 -
@@ -433,54 +433,23 @@ nd6_ns_output(struct ifnet *ifp, struct 
}
ip6->ip6_dst = dst_sa.sin6_addr;
if (!dad) {
-   /*
-* RFC2461 7.2.2:
-* "If the source address of the packet prompting the
-* solicitation is the same as one of the addresses assigned
-* to the outgoing interface, that address SHOULD be placed
-* in the IP Source Address of the outgoing solicitation.
-* Otherwise, any one of the addresses assigned to the
-* interface should be used."
-*
-* We use the source address for the prompting packet
-* (saddr6), if:
-* - saddr6 is given from the caller (by giving "ln"), and
-* - saddr6 belongs to the outgoing interface.
-* Otherwise, we perform the source address selection as usual.
-*/
-   struct ip6_hdr *hip6;   /* hold ip6 */
-   struct in6_addr *saddr6;
+/* Perform source address selection. */
+   struct rtentry *rt;
 
-   if (ln && ln->ln_hold) {
-   hip6 = mtod(ln->ln_hold, struct ip6_hdr *);
-   /* XXX pullup? */
-   if (sizeof(*hip6) < ln->ln_hold->m_len)
-   saddr6 = &hip6->ip6_src;
-   else
-   saddr6 = NULL;
-   } else
-   saddr6 = NULL;
-   if (saddr6 && in6ifa_ifpwithaddr(ifp, saddr6))
-   src_sa.sin6_addr = *saddr6;
-   else {
-   struct rtentry *rt;
+   rt = rtalloc(sin6tosa(&dst_sa), RT_RESOLVE,
+   m->m_pkthdr.ph_rtableid);
+   if (!rtisvalid(rt)) {
+   char addr[INET6_ADDRSTRLEN];
 
-   rt = rtalloc(sin6tosa(&dst_sa), RT_RESOLVE,
-   m->m_pkthdr.ph_rtableid);
-   if (!rtisvalid(rt)) {
-   char addr[INET6_ADDRSTRLEN];
-
-   nd6log((LOG_DEBUG,
-   "%s: source can't be determined: dst=%s\n",
-   __func__, inet_ntop(AF_INET6,
-   &dst_sa.sin6_addr, addr, sizeof(addr))));
-   rtfree(rt);
-   goto bad;
-   }
-   src_sa.sin6_addr =
-   ifatoia6(rt->rt_ifa)->ia_addr.sin6_addr;
+   nd6log((LOG_DEBUG,
+   "%s: source can't be determined: dst=%s\n",
+   __func__, inet_ntop(AF_INET6,
+   &dst_sa.sin6_addr, addr, sizeof(addr))));
rtfree(rt);
+   goto bad;
}
+   src_sa.sin6_addr = 

Re: IPv6/NDP/IPsec breakage in -current

2016-11-02 Thread Martin Pieuchot
On 25/10/16(Tue) 22:13, Markus Friedl wrote:
> 
> > On 25.10.2016 at 17:13, Mike Belopuhov wrote:
> > 
> > 
> > There are apparently some discussions in informational RFCs regarding
> > this issue.  For instance https://tools.ietf.org/html/rfc3756 states:
> > 
> >   More specifically, the currently used key agreement protocol, IKE,
> >   suffers from a chicken-and-egg problem [8]: one needs an IP address
> >   to run IKE, IKE is needed to establish IPsec SAs, and IPsec SAs are
> >   required to configure an IP address.
> > 
> > Which goes one step further: how to protect all ND in general, but is
> > still applicable in our situation.  There were attempts to protect ND
> > in an alternative way, e.g. SEND (https://tools.ietf.org/html/rfc3971).
> > FreeBSD has picked up on it and has had a SoC project which seems to
> > be integrated right now:
> > 
> >   https://wiki.freebsd.org/SOC2009AnaKukec
> >   https://www.freebsd.org/cgi/man.cgi?query=send&sektion=4
> > 
> > 
> > Would it be possible for us to disable the check and always set saddr6
> > to NULL for now?
> 
> Fine w/me.
> 
> Or we could check if the packet has been IPsec encapsulated
> and set saddr6 to NULL in this case.

Is this fixed?  Anything we're still waiting for?



Re: IPv6/NDP/IPsec breakage in -current

2016-10-25 Thread Markus Friedl

> On 25.10.2016 at 17:13, Mike Belopuhov wrote:
> 
> 
> There are apparently some discussions in informational RFCs regarding
> this issue.  For instance https://tools.ietf.org/html/rfc3756 states:
> 
>   More specifically, the currently used key agreement protocol, IKE,
>   suffers from a chicken-and-egg problem [8]: one needs an IP address
>   to run IKE, IKE is needed to establish IPsec SAs, and IPsec SAs are
>   required to configure an IP address.
> 
> Which goes one step further: how to protect all ND in general, but is
> still applicable in our situation.  There were attempts to protect ND
> in an alternative way, e.g. SEND (https://tools.ietf.org/html/rfc3971).
> FreeBSD has picked up on it and has had a SoC project which seems to
> be integrated right now:
> 
>   https://wiki.freebsd.org/SOC2009AnaKukec
>   https://www.freebsd.org/cgi/man.cgi?query=send&sektion=4
> 
> 
> Would it be possible for us to disable the check and always set saddr6
> to NULL for now?

Fine w/me.

Or we could check if the packet has been IPsec encapsulated
and set saddr6 to NULL in this case.

Re: IPv6/NDP/IPsec breakage in -current

2016-10-25 Thread Mike Belopuhov
On Thu, Oct 13, 2016 at 21:43 +0200, Markus Friedl wrote:
>
> > On 13.10.2016 at 13:06, Christian Weisgerber wrote:
> >
> >> After the second m_makespace():
> >>
> >>    +------+-----+  +------+  +--------+-----+
> >>    | IPv6 | ESP |  | IPv6 |  | ICMPv6 | ESP |
> >>    +------+-----+  +------+  +--------+-----+
> >>
> >> With m_inject(), it would instead be something like this:
> >>
> >>    +------++-----+  +------+  +--------
> >>    | IPv6 || ESP |  | IPv6 |  | ICMPv6  ...
> >>    +------++-----+  +------+  +--------
> >
> > Found it.  It's this snippet of nd6_ns_output() that handles those
> > mbuf chains differently:
> >
> >454 if (ln && ln->ln_hold) {
> >455 hip6 = mtod(ln->ln_hold, struct ip6_hdr *);
> >456 /* XXX pullup? */
> >457 if (sizeof(*hip6) < ln->ln_hold->m_len)
> >458 saddr6 = >ip6_src;
> >459 else
> >460 saddr6 = NULL;
> >461 } else
> >462 saddr6 = NULL;
> >
> > Did this only ever work by accident?
>
> ok, to get it right, the following is the difference:
>
> with m_inject() the first mbuf always contains the 40 byte ipv6 header
> while with m_makespace() it also contains the ESP header.
>
> so with m_inject() the ln_hold->m_len is 40 and since this is
> exactly the size of hip6, the code falls back to saddr6 = NULL.
>
> IMHO the code should use <= and not <:
>if (sizeof(*hip6) <=  ln->ln_hold->m_len)
> but then your example will also fail with the old m_inject() code.
>

I agree with the above.

> If this intended address selection is indeed correct then we
> need to figure out if a bypass flow for NDP is necessary, or
> if NDP should always bypass IPsec (but what about bringing NDP
> over IPsec?)
>

There are apparently some discussions in informational RFCs regarding
this issue.  For instance https://tools.ietf.org/html/rfc3756 states:

   More specifically, the currently used key agreement protocol, IKE,
   suffers from a chicken-and-egg problem [8]: one needs an IP address
   to run IKE, IKE is needed to establish IPsec SAs, and IPsec SAs are
   required to configure an IP address.

Which goes one step further: how to protect all ND in general, but is
still applicable in our situation.  There were attempts to protect ND
in an alternative way, e.g. SEND (https://tools.ietf.org/html/rfc3971).
FreeBSD has picked up on it and has had a SoC project which seems to
be integrated right now:

   https://wiki.freebsd.org/SOC2009AnaKukec
   https://www.freebsd.org/cgi/man.cgi?query=send&sektion=4

Would it be possible for us to disable the check and always set saddr6
to NULL for now?

> With IPv4 this problem does not exist, because ARP packets are not IP
> packets, so they are not matched by the IPsec flow.
>
> -m



Re: IPv6/NDP/IPsec breakage in -current

2016-10-19 Thread Christian Weisgerber
Alexander Bluhm:

> I also see issues with IPv6 and NDP, but no IPsec involved.  There
> are several other threads on bugs@ about broken IPv6.
> 
> It seems that sending neighbor solicitation retries for expired ND
> entries does not work.  The diff below helps in my case, although
> it is only a workaround and not MP safe.  It would be interesting
> to know whether it also affects your scenario.

It does not.

> --- netinet6/nd6.c  3 Oct 2016 12:33:21 -   1.193
> +++ netinet6/nd6.c  13 Oct 2016 21:47:25 -
> @@ -827,7 +827,7 @@ nd6_free(struct rtentry *rt, int gc)
>* caches, and disable the route entry not to be used in already
>* cached routes.
>*/
> - if (!ISSET(rt->rt_flags, RTF_STATIC|RTF_CACHED))
> + if (!ISSET(rt->rt_flags, RTF_STATIC))
>   rtdeletemsg(rt, ifp, ifp->if_rdomain);
>   splx(s);
>  

-- 
Christian "naddy" Weisgerber  na...@mips.inka.de



Re: IPv6/NDP/IPsec breakage in -current

2016-10-14 Thread Alexander Bluhm
On Thu, Oct 06, 2016 at 11:12:18PM +0200, Christian Weisgerber wrote:
> Something is very broken at the intersection of IPv6, NDP, and IPsec
> in -current.

I also see issues with IPv6 and NDP, but no IPsec involved.  There
are several other threads on bugs@ about broken IPv6.

It seems that sending neighbor solicitation retries for expired ND
entries does not work.  The diff below helps in my case, although
it is only a workaround and not MP safe.  It would be interesting
to know whether it also affects your scenario.

The RTF_CACHED code was introduced with this commit:

revision 1.190
date: 2016/08/22 16:01:52;  author: mpi;  state: Exp;  lines: +24 -6;  commitid: Jx7agqiuXqs8RRGd;
Make the ``rt_gwroute'' pointer of RTF_GATEWAY entries immutable.

This means that no protection is needed to guarantee that the next hop
route wont be modified by CPU1 while CPU0 is dereferencing it in a L2
resolution functions.

While here also fix an ``ifa'' leak resulting in RTF_GATEWAY being always
invalid.

dlg@ likes it, inputs and ok bluhm@


bluhm

Index: netinet6/nd6.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet6/nd6.c,v
retrieving revision 1.193
diff -u -p -r1.193 nd6.c
--- netinet6/nd6.c  3 Oct 2016 12:33:21 -   1.193
+++ netinet6/nd6.c  13 Oct 2016 21:47:25 -
@@ -827,7 +827,7 @@ nd6_free(struct rtentry *rt, int gc)
 * caches, and disable the route entry not to be used in already
 * cached routes.
 */
-   if (!ISSET(rt->rt_flags, RTF_STATIC|RTF_CACHED))
+   if (!ISSET(rt->rt_flags, RTF_STATIC))
rtdeletemsg(rt, ifp, ifp->if_rdomain);
splx(s);
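
The effect of dropping RTF_CACHED from the test can be checked with a small standalone model.  This is a sketch, not kernel code: it assumes ISSET(t, f) expands to ((t) & (f)), and the TOY_ flag values are placeholders rather than the real RTF_* constants.

```c
/* Assumed to mirror the kernel's ISSET(): true if ANY bit of f is set in t. */
#define TOY_ISSET(t, f)	((t) & (f))

/* Placeholder flag bits; the real RTF_STATIC/RTF_CACHED values differ. */
#define TOY_RTF_STATIC	0x1u
#define TOY_RTF_CACHED	0x2u

/* Condition before the workaround: skip deletion if either flag is set. */
int
toy_deletes_before(unsigned int flags)
{
	return !TOY_ISSET(flags, TOY_RTF_STATIC | TOY_RTF_CACHED);
}

/* Condition with the workaround: only static entries are kept. */
int
toy_deletes_after(unsigned int flags)
{
	return !TOY_ISSET(flags, TOY_RTF_STATIC);
}
```

The only input on which the two conditions disagree is an entry with the cached flag but not the static flag: it was kept before and is deleted with the workaround applied.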
 



Re: IPv6/NDP/IPsec breakage in -current

2016-10-13 Thread Markus Friedl

> On 13.10.2016 at 13:06, Christian Weisgerber wrote:
> 
>> After the second m_makespace():
>> 
>>    +------+-----+  +------+  +--------+-----+
>>    | IPv6 | ESP |  | IPv6 |  | ICMPv6 | ESP |
>>    +------+-----+  +------+  +--------+-----+
>> 
>> With m_inject(), it would instead be something like this:
>> 
>>    +------++-----+  +------+  +--------
>>    | IPv6 || ESP |  | IPv6 |  | ICMPv6  ...
>>    +------++-----+  +------+  +--------
> 
> Found it.  It's this snippet of nd6_ns_output() that handles those
> mbuf chains differently:
> 
>    if (ln && ln->ln_hold) {
>        hip6 = mtod(ln->ln_hold, struct ip6_hdr *);
>        /* XXX pullup? */
>        if (sizeof(*hip6) < ln->ln_hold->m_len)
>            saddr6 = &hip6->ip6_src;
>        else
>            saddr6 = NULL;
>    } else
>        saddr6 = NULL;
> 
> Did this only ever work by accident?

ok, to get it right, the following is the difference:

with m_inject() the first mbuf always contains the 40 byte ipv6 header
while with m_makespace() it also contains the ESP header.

so with m_inject() the ln_hold->m_len is 40 and since this is
exactly the size of hip6, the code falls back to saddr6 = NULL.

IMHO the code should use <= and not <:
   if (sizeof(*hip6) <=  ln->ln_hold->m_len)
but then your example will also fail with the old m_inject() code.
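
The boundary case can be reproduced outside the kernel.  The sketch below assumes a 40-byte header stand-in (`struct toy_ip6_hdr`, a hypothetical name) and compares the strict check with the suggested inclusive one; the lengths 40 and 64 are the first-mbuf lengths mentioned in this thread for m_inject() and m_makespace() respectively.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct ip6_hdr; only its 40-byte size matters here. */
struct toy_ip6_hdr {
	uint8_t bytes[40];
};

/* The comparison as written in nd6_ns_output(): strict less-than. */
int
toy_hdr_fits_strict(size_t m_len)
{
	return sizeof(struct toy_ip6_hdr) < m_len;
}

/* The comparison with the suggested <=. */
int
toy_hdr_fits_inclusive(size_t m_len)
{
	return sizeof(struct toy_ip6_hdr) <= m_len;
}
```

With m_len equal to 40 the strict form rejects a first mbuf that holds exactly one complete header, which is why the old m_inject() layout always ended up with saddr6 = NULL.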

If this intended address selection is indeed correct then we 
need to figure out if a bypass flow for NDP is necessary, or
if NDP should always bypass IPsec (but what about bringing NDP over IPsec?)

With IPv4 this problem does not exist, because ARP packets are not IP
packets, so they are not matched by the IPsec flow.
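
If NDP were to be exempted from a flow, something would first have to recognise ND messages.  The following is a hypothetical classifier only, not existing OpenBSD code and not a proposed fix: it relies on the ICMPv6 type numbers that RFC 4861 assigns to Neighbor Discovery (133 to 137), and the function and constant names are made up for illustration.

```c
#include <stdint.h>

/* ICMPv6 message types used by Neighbor Discovery (RFC 4861). */
enum {
	TOY_ND_ROUTER_SOLICIT	= 133,
	TOY_ND_ROUTER_ADVERT	= 134,
	TOY_ND_NEIGHBOR_SOLICIT	= 135,
	TOY_ND_NEIGHBOR_ADVERT	= 136,
	TOY_ND_REDIRECT		= 137
};

/* Return non-zero if the ICMPv6 type belongs to Neighbor Discovery. */
int
toy_icmp6_type_is_ndp(uint8_t type)
{
	return type >= TOY_ND_ROUTER_SOLICIT && type <= TOY_ND_REDIRECT;
}
```

An echo request (type 128) is not matched, so ordinary ping6 traffic would still be subject to the flow under this assumption.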

-m


Re: IPv6/NDP/IPsec breakage in -current

2016-10-12 Thread Mike Belopuhov
On Wed, Oct 12, 2016 at 18:00 +0200, Christian Weisgerber wrote:
> Mike Belopuhov:
> 
> > It's also not clear what's wrong with those broken NS/ND
> > packets that you receive.
> 
> Oct 12 17:30:10 bardioc /bsd: nd6_na_input: ND packet from non-neighbor
> Oct 12 17:30:12 bardioc last message repeated 2 times
> Oct 12 17:30:15 bardioc /bsd: nd6_ns_input: NS packet from non-neighbor
> Oct 12 17:30:15 bardioc /bsd: nd6_ns_input: src=2001:6f8:124a::4
> Oct 12 17:30:15 bardioc /bsd: nd6_ns_input: dst=2001:6f8:124a::2
> Oct 12 17:30:15 bardioc /bsd: nd6_ns_input: tgt=2001:6f8:124a::2
> Oct 12 17:30:16 bardioc /bsd: nd6_ns_input: NS packet from non-neighbor
> Oct 12 17:30:16 bardioc /bsd: nd6_ns_input: src=2001:6f8:124a::4
> Oct 12 17:30:16 bardioc /bsd: nd6_ns_input: dst=2001:6f8:124a::2
> Oct 12 17:30:16 bardioc /bsd: nd6_ns_input: tgt=2001:6f8:124a::2
> Oct 12 17:30:17 bardioc /bsd: nd6_ns_input: NS packet from non-neighbor
> Oct 12 17:30:17 bardioc /bsd: nd6_ns_input: src=2001:6f8:124a::4
> Oct 12 17:30:17 bardioc /bsd: nd6_ns_input: dst=2001:6f8:124a::2
> Oct 12 17:30:17 bardioc /bsd: nd6_ns_input: tgt=2001:6f8:124a::2
> 
> What seems to be wrong is that they are going through the tunnel.
>

But what does it have to do with m_makespace then?  It's called by
the esp_output, therefore the decision to apply the transformation
has been already made at this point.  Which SPD lookup succeeds
that shouldn't?

> > Do you get any ESP errors?
> 
> No.
> 
> > Could you please try the following diff.  Unfortunately,
> > it might produce too much output.  If you could narrow it
> > down to affected packets this would help a lot.
> 
> I can narrow it down to a single packet.  Here's the result from
> ping6 -c1 2001:6f8:124a::4 :
> 
> 2: PKTHDR (0xff00dfa9d200): len 64, total 152, leading(-) 72, trailing(+) 16
> 2: MBUF (0xff00dfa9db00): len 40, total 224, leading(-) 0, trailing(+) 184
> 2: MBUF (0xff00dfa9da00): len 32, total 224, leading(-) 72, trailing(+) 120
> ===
> 3: PKTHDR (0xff00dfa9d200): len 64, total 152, leading(-) 72, trailing(+) 16
> 3: MBUF (0xff00dfa9db00): len 40, total 224, leading(-) 0, trailing(+) 184
> 3: MBUF (0xff00dfa9da00): len 56, total 224, leading(-) 72, trailing(+) 96
> ===
> 
> That corresponds to the two m_makespace() calls in esp_output().
> 
> I already dug around earlier.  The input packet layout looks like
> this:
> 
>      mbuf           mbuf
> +------+------+  +--------+
> | IPv6 | IPv6 |  | ICMPv6 |
> +------+------+  +--------+
> 
> After the first m_makespace():
> 
> +------+-----+  +------+  +--------+
> | IPv6 | ESP |  | IPv6 |  | ICMPv6 |
> +------+-----+  +------+  +--------+
> 
> After the second m_makespace():
> 
> +------+-----+  +------+  +--------+-----+
> | IPv6 | ESP |  | IPv6 |  | ICMPv6 | ESP |
> +------+-----+  +------+  +--------+-----+
> 
> With m_inject(), it would instead be something like this:
> 
> +------++-----+  +------+  +--------
> | IPv6 || ESP |  | IPv6 |  | ICMPv6  ...
> +------++-----+  +------+  +--------
> 
>

Yes, I don't see anything wrong with this.  For some reason you
get the same addresses for your mbuf chain components.  Are they
statically allocated somehow?

> The corresponding traffic this ping6 triggers on the outgoing em0
> interface:
> 
> 17:30:10.192307 2001:6f8:124a::2 > ff02::1:ff00:4: icmp6: neighbor sol: who 
> has 2001:6f8:124a::4
> 17:30:10.192737 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 2 
> len 120
> 17:30:11.189521 2001:6f8:124a::2 > ff02::1:ff00:4: icmp6: neighbor sol: who 
> has 2001:6f8:124a::4
> 17:30:11.189738 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 3 
> len 120
> 17:30:12.189831 2001:6f8:124a::2 > ff02::1:ff00:4: icmp6: neighbor sol: who 
> has 2001:6f8:124a::4
> 17:30:12.190033 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 4 
> len 120
> 17:30:15.187608 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 5 
> len 120
> 17:30:16.188826 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 6 
> len 120
> 17:30:17.194123 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 7 
> len 120
> 
> And on enc0:
> 
> 17:30:10.141545 (authentic,confidential): SPI 0xdeadbeef: 2001:6f8:124a::2 > 
> 2001:6f8:124a::4: icmp6: echo request (encap)
> 17:30:10.192872 (authentic,confidential): SPI 0xbeefdead: 2001:6f8:124a::4 > 
> 2001:6f8:124a::2: icmp6: neighbor adv: tgt is 2001:6f8:124a::4 (encap)
> 17:30:11.189809 (authentic,confidential): SPI 0xbeefdead: 2001:6f8:124a::4 > 
> 2001:6f8:124a::2: icmp6: neighbor adv: tgt is 2001:6f8:124a::4 (encap)
> 17:30:12.190101 (authentic,confidential): SPI 0xbeefdead: 2001:6f8:124a::4 > 
> 2001:6f8:124a::2: icmp6: neighbor adv: tgt is 2001:6f8:124a::4 (encap)
> 17:30:15.187706 (authentic,confidential): SPI 

Re: IPv6/NDP/IPsec breakage in -current

2016-10-12 Thread Christian Weisgerber
Mike Belopuhov:

> It's also not clear what's wrong with those broken NS/ND
> packets that you receive.

Oct 12 17:30:10 bardioc /bsd: nd6_na_input: ND packet from non-neighbor
Oct 12 17:30:12 bardioc last message repeated 2 times
Oct 12 17:30:15 bardioc /bsd: nd6_ns_input: NS packet from non-neighbor
Oct 12 17:30:15 bardioc /bsd: nd6_ns_input: src=2001:6f8:124a::4
Oct 12 17:30:15 bardioc /bsd: nd6_ns_input: dst=2001:6f8:124a::2
Oct 12 17:30:15 bardioc /bsd: nd6_ns_input: tgt=2001:6f8:124a::2
Oct 12 17:30:16 bardioc /bsd: nd6_ns_input: NS packet from non-neighbor
Oct 12 17:30:16 bardioc /bsd: nd6_ns_input: src=2001:6f8:124a::4
Oct 12 17:30:16 bardioc /bsd: nd6_ns_input: dst=2001:6f8:124a::2
Oct 12 17:30:16 bardioc /bsd: nd6_ns_input: tgt=2001:6f8:124a::2
Oct 12 17:30:17 bardioc /bsd: nd6_ns_input: NS packet from non-neighbor
Oct 12 17:30:17 bardioc /bsd: nd6_ns_input: src=2001:6f8:124a::4
Oct 12 17:30:17 bardioc /bsd: nd6_ns_input: dst=2001:6f8:124a::2
Oct 12 17:30:17 bardioc /bsd: nd6_ns_input: tgt=2001:6f8:124a::2

What seems to be wrong is that they are going through the tunnel.

> Do you get any ESP errors?

No.

> Could you please try the following diff.  Unfortunately,
> it might produce too much output.  If you could narrow it
> down to affected packets this would help a lot.

I can narrow it down to a single packet.  Here's the result from
ping6 -c1 2001:6f8:124a::4 :

2: PKTHDR (0xff00dfa9d200): len 64, total 152, leading(-) 72, trailing(+) 16
2: MBUF (0xff00dfa9db00): len 40, total 224, leading(-) 0, trailing(+) 184
2: MBUF (0xff00dfa9da00): len 32, total 224, leading(-) 72, trailing(+) 120
===
3: PKTHDR (0xff00dfa9d200): len 64, total 152, leading(-) 72, trailing(+) 16
3: MBUF (0xff00dfa9db00): len 40, total 224, leading(-) 0, trailing(+) 184
3: MBUF (0xff00dfa9da00): len 56, total 224, leading(-) 72, trailing(+) 96
===

That corresponds to the two m_makespace() calls in esp_output().

I already dug around earlier.  The input packet layout looks like
this:

     mbuf           mbuf
+------+------+  +--------+
| IPv6 | IPv6 |  | ICMPv6 |
+------+------+  +--------+

After the first m_makespace():

+------+-----+  +------+  +--------+
| IPv6 | ESP |  | IPv6 |  | ICMPv6 |
+------+-----+  +------+  +--------+

After the second m_makespace():

+------+-----+  +------+  +--------+-----+
| IPv6 | ESP |  | IPv6 |  | ICMPv6 | ESP |
+------+-----+  +------+  +--------+-----+

With m_inject(), it would instead be something like this:

+------++-----+  +------+  +--------
| IPv6 || ESP |  | IPv6 |  | ICMPv6  ...
+------++-----+  +------+  +--------
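
For readers who want to play with these layouts, a chain can be modelled with a toy structure.  This is a sketch under assumptions: `struct toy_mbuf` and the helper names are hypothetical and carry only a length and a next pointer, nothing like the real struct mbuf.

```c
#include <stddef.h>

/* Toy stand-in for struct mbuf: just the fields a chain walk needs. */
struct toy_mbuf {
	int len;		/* bytes of data in this mbuf (m_len) */
	struct toy_mbuf *next;	/* next mbuf in the chain (m_next) */
};

/* Sum the per-mbuf lengths over the whole chain. */
int
toy_chain_len(const struct toy_mbuf *m)
{
	int total = 0;

	for (; m != NULL; m = m->next)
		total += m->len;
	return total;
}

/* Count the mbufs in the chain. */
int
toy_chain_count(const struct toy_mbuf *m)
{
	int n = 0;

	for (; m != NULL; m = m->next)
		n++;
	return n;
}
```

Building a three-element chain with the lengths 64, 40 and 56 from the debug output labelled "3:" above gives a count of 3 and a total of 160 bytes.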


The corresponding traffic this ping6 triggers on the outgoing em0
interface:

17:30:10.192307 2001:6f8:124a::2 > ff02::1:ff00:4: icmp6: neighbor sol: who has 2001:6f8:124a::4
17:30:10.192737 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 2 len 120
17:30:11.189521 2001:6f8:124a::2 > ff02::1:ff00:4: icmp6: neighbor sol: who has 2001:6f8:124a::4
17:30:11.189738 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 3 len 120
17:30:12.189831 2001:6f8:124a::2 > ff02::1:ff00:4: icmp6: neighbor sol: who has 2001:6f8:124a::4
17:30:12.190033 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 4 len 120
17:30:15.187608 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 5 len 120
17:30:16.188826 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 6 len 120
17:30:17.194123 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 7 len 120

And on enc0:

17:30:10.141545 (authentic,confidential): SPI 0xdeadbeef: 2001:6f8:124a::2 > 2001:6f8:124a::4: icmp6: echo request (encap)
17:30:10.192872 (authentic,confidential): SPI 0xbeefdead: 2001:6f8:124a::4 > 2001:6f8:124a::2: icmp6: neighbor adv: tgt is 2001:6f8:124a::4 (encap)
17:30:11.189809 (authentic,confidential): SPI 0xbeefdead: 2001:6f8:124a::4 > 2001:6f8:124a::2: icmp6: neighbor adv: tgt is 2001:6f8:124a::4 (encap)
17:30:12.190101 (authentic,confidential): SPI 0xbeefdead: 2001:6f8:124a::4 > 2001:6f8:124a::2: icmp6: neighbor adv: tgt is 2001:6f8:124a::4 (encap)
17:30:15.187706 (authentic,confidential): SPI 0xbeefdead: 2001:6f8:124a::4 > 2001:6f8:124a::2: icmp6: neighbor sol: who has 2001:6f8:124a::2 (encap)
17:30:16.188924 (authentic,confidential): SPI 0xbeefdead: 2001:6f8:124a::4 > 2001:6f8:124a::2: icmp6: neighbor sol: who has 2001:6f8:124a::2 (encap)
17:30:17.194226 (authentic,confidential): SPI 0xbeefdead: 2001:6f8:124a::4 > 2001:6f8:124a::2: icmp6: neighbor sol: who has 2001:6f8:124a::2 (encap)

-- 
Christian "naddy" Weisgerber  na...@mips.inka.de



Re: IPv6/NDP/IPsec breakage in -current

2016-10-11 Thread Mike Belopuhov
On Mon, Oct 10, 2016 at 21:13 +, Christian Weisgerber wrote:
> On 2016-10-09, Christian Weisgerber  wrote:
> 
> > Found by bisection.  The culprit is this commit:
> >
> > 
> > CVSROOT:/cvs
> > Module name:src
> > Changes by: mar...@cvs.openbsd.org  2016/09/13 13:56:55
> >
> > Modified files:
> > sys/kern   : uipc_mbuf.c 
> > sys/netinet: ip_ah.c ip_esp.c ip_ipcomp.c ipsec_output.c 
> > sys/sys: mbuf.h 
> > share/man/man9 : mbuf.9 
> >
> > Log message:
> > avoid extensive mbuf allocation for IPsec by replacing m_inject(4)
> > with m_makespace(4) from freebsd; ok mpi@, bluhm@, mikeb@, dlg@
> > 
> 
> I don't see anything wrong in there.  Maybe the problem is elsewhere
> and that change just triggers it.
> 
> Meanwhile, here's a less invasive "backout" that neuters m_makespace()
> so it produces the same mbuf chains as m_inject() did.  This makes
> the bug disappear.
> 

I can't find any immediate deficiencies in the m_makespace.
It's also not clear what's wrong with those broken NS/ND
packets that you receive.  Do you get any ESP errors?
We need to know what kind of chains are being affected. 

Could you please try the following diff.  Unfortunately,
it might produce too much output.  If you could narrow it
down to affected packets this would help a lot.

diff --git sys/kern/uipc_mbuf.c sys/kern/uipc_mbuf.c
index d6b248f..cf9c650 100644
--- sys/kern/uipc_mbuf.c
+++ sys/kern/uipc_mbuf.c
@@ -996,10 +996,34 @@ extpacket:
n->m_next = m->m_next;
m->m_next = NULL;
return (n);
 }
 
+static void
+m_hexdump(const char *where, struct mbuf *m)
+{
+   char *desc;
+   int len;
+
+   while (m != NULL) {
+   len = MLEN;
+   desc = "MBUF";
+   if (m->m_flags & M_EXT) {
+   len = m->m_ext.ext_size;
+   desc = "CLUSTER";
+   } else if (m->m_flags & M_PKTHDR) {
+   len = MHLEN;
+   desc = "PKTHDR";
+   }
+   printf("%s: %s (%p): len %d, total %d, leading(-) %d, "
+   "trailing(+) %d\n", where, desc, m, m->m_len, len,
+   m_leadingspace(m), m_trailingspace(m));
+   m = m->m_next;
+   }
+   printf("===\n");
+}
+
 /*
  * Make space for a new header of length hlen at skip bytes
  * into the packet.  When doing this we allocate new mbufs only
  * when absolutely necessary.  The mbuf where the new header
  * is to go is returned together with an offset into the mbuf.
@@ -1032,10 +1056,11 @@ m_makespace(struct mbuf *m0, int skip, int hlen, int *off)
if (skip)
memmove(m->m_data-hlen, m->m_data, skip);
m->m_data -= hlen;
m->m_len += hlen;
(*off) = skip;
+   m_hexdump("1", m0);
} else if (hlen > M_TRAILINGSPACE(m)) {
struct mbuf *n0, *n, **np;
int todo, len, done, alloc;
 
n0 = NULL;
@@ -1073,10 +1098,11 @@ m_makespace(struct mbuf *m0, int skip, int hlen, int *off)
*off = skip;
if (n0 != NULL) {
*np = m->m_next;
m->m_next = n0;
}
+   m_hexdump("2", m0);
}
else {
n = m_get(M_DONTWAIT, m->m_type);
if (n == NULL) {
m_freem(n0);
@@ -1105,10 +1131,11 @@ m_makespace(struct mbuf *m0, int skip, int hlen, int *off)
if (remain > 0)
memmove(mtod(m, caddr_t) + skip + hlen,
  mtod(m, caddr_t) + skip, remain);
m->m_len += hlen;
*off = skip;
+   m_hexdump("3", m0);
}
m0->m_pkthdr.len += hlen;   /* adjust packet length */
return m;
 }
 

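For anyone trying to match the "1", "2" and "3" tags in the debug output to code paths: below is a minimal user-space sketch of the branch selection, assuming only the conditions visible in the diffs in this thread. `struct fake_mbuf`, `enum ms_path` and `makespace_path()` are hypothetical names made up for illustration; they are not kernel APIs, and the sketch does not move any data.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the three quantities m_makespace()
 * inspects on the mbuf that contains offset "skip".
 */
struct fake_mbuf {
	int len;	/* bytes of data in the mbuf (m_len) */
	int leading;	/* free bytes before the data (M_LEADINGSPACE) */
	int trailing;	/* free bytes after the data (M_TRAILINGSPACE) */
};

/* Values chosen to match the tags used in the debug diff. */
enum ms_path {
	MS_SHIFT_FRONT = 1,	/* move the first "skip" bytes towards the front */
	MS_SPLIT = 2,		/* not enough room: tail goes to new mbufs */
	MS_SHIFT_BACK = 3	/* move the remainder towards the back */
};

/*
 * Return which branch would be taken for a header of hlen bytes
 * inserted at offset skip within this single mbuf.
 */
static enum ms_path
makespace_path(const struct fake_mbuf *m, int skip, int hlen)
{
	int remain;

	assert(skip >= 0 && skip <= m->len);
	remain = m->len - skip;		/* data to move */

	if (skip < remain && hlen <= m->leading)
		return MS_SHIFT_FRONT;
	if (hlen > m->trailing)
		return MS_SPLIT;
	return MS_SHIFT_BACK;
}
```

Note that in the debug diff the "2" line is only printed in the sub-branch where the header still fits into the original mbuf after the tail has been moved out; the sub-branch that calls m_get() for the header is not tagged in the hunks shown. The "#if 0" backout later in this thread forces every call down that m_get() sub-branch.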


Re: IPv6/NDP/IPsec breakage in -current

2016-10-10 Thread Christian Weisgerber
On 2016-10-09, Christian Weisgerber  wrote:

> Found by bisection.  The culprit is this commit:
>
> 
> CVSROOT:/cvs
> Module name:src
> Changes by: mar...@cvs.openbsd.org  2016/09/13 13:56:55
>
> Modified files:
> sys/kern   : uipc_mbuf.c 
> sys/netinet: ip_ah.c ip_esp.c ip_ipcomp.c ipsec_output.c 
> sys/sys: mbuf.h 
> share/man/man9 : mbuf.9 
>
> Log message:
> avoid extensive mbuf allocation for IPsec by replacing m_inject(4)
> with m_makespace(4) from freebsd; ok mpi@, bluhm@, mikeb@, dlg@
> 

I don't see anything wrong in there.  Maybe the problem is elsewhere
and that change just triggers it.

Meanwhile, here's a less invasive "backout" that neuters m_makespace()
so it produces the same mbuf chains as m_inject() did.  This makes
the bug disappear.

Index: uipc_mbuf.c
===================================================================
RCS file: /cvs/src/sys/kern/uipc_mbuf.c,v
retrieving revision 1.228
diff -u -p -r1.228 uipc_mbuf.c
--- uipc_mbuf.c 13 Sep 2016 19:56:55 -  1.228
+++ uipc_mbuf.c 10 Oct 2016 20:54:40 -
@@ -1062,13 +1062,16 @@ m_makespace(struct mbuf *m0, int skip, i
 * the contents of m as needed.
 */
remain = m->m_len - skip;   /* data to move */
+#if 0
if (skip < remain && hlen <= M_LEADINGSPACE(m)) {
if (skip)
memmove(m->m_data-hlen, m->m_data, skip);
m->m_data -= hlen;
m->m_len += hlen;
(*off) = skip;
-   } else if (hlen > M_TRAILINGSPACE(m)) {
+   } else if (hlen > M_TRAILINGSPACE(m))
+#endif
+   {
struct mbuf *n0, *n, **np;
int todo, len, done, alloc;
 
@@ -1102,6 +1105,7 @@ m_makespace(struct mbuf *m0, int skip, i
todo -= len;
}
 
+#if 0
if (hlen <= M_TRAILINGSPACE(m) + remain) {
m->m_len = skip + hlen;
*off = skip;
@@ -1109,8 +1113,9 @@ m_makespace(struct mbuf *m0, int skip, i
*np = m->m_next;
m->m_next = n0;
}
-   }
-   else {
+   } else
+#endif
+   {
n = m_get(M_DONTWAIT, m->m_type);
if (n == NULL) {
m_freem(n0);
@@ -1131,7 +1136,9 @@ m_makespace(struct mbuf *m0, int skip, i
m = n;  /* header is at front ... */
*off = 0;   /* ... of new mbuf */
}
-   } else {
+   }
+#if 0
+   else {
/*
 * Copy the remainder to the back of the mbuf
 * so there's space to write the new header.
@@ -1142,6 +1149,7 @@ m_makespace(struct mbuf *m0, int skip, i
m->m_len += hlen;
*off = skip;
}
+#endif
m0->m_pkthdr.len += hlen;   /* adjust packet length */
return m;
 }
-- 
Christian "naddy" Weisgerber  na...@mips.inka.de



Re: IPv6/NDP/IPsec breakage in -current

2016-10-08 Thread Christian Weisgerber
On 2016-10-06, Christian Weisgerber  wrote:

> Something is very broken at the intersection of IPv6, NDP, and IPsec
> in -current.

Found by bisection.  The culprit is this commit:


CVSROOT:/cvs
Module name:src
Changes by: mar...@cvs.openbsd.org  2016/09/13 13:56:55

Modified files:
sys/kern   : uipc_mbuf.c 
sys/netinet: ip_ah.c ip_esp.c ip_ipcomp.c ipsec_output.c 
sys/sys: mbuf.h 
share/man/man9 : mbuf.9 

Log message:
avoid extensive mbuf allocation for IPsec by replacing m_inject(4)
with m_makespace(4) from freebsd; ok mpi@, bluhm@, mikeb@, dlg@


Kudos to vgross@ for calling it a few days ago:

> If so, I think we are dealing with some kind of mbuf mangling.

A backout is somewhat involved due to further changes that have been
layered on top.  Backout patch below for completeness, but I don't
suggest that this be actually committed.

Index: sys/kern/uipc_mbuf.c
===================================================================
RCS file: /cvs/src/sys/kern/uipc_mbuf.c,v
retrieving revision 1.231
diff -u -p -r1.231 uipc_mbuf.c
--- sys/kern/uipc_mbuf.c	15 Sep 2016 02:00:16 -  1.231
+++ sys/kern/uipc_mbuf.c	9 Oct 2016 00:20:34 -
@@ -126,6 +126,7 @@ int max_hdr;/* largest link+protocol 
 struct mutex m_extref_mtx = MUTEX_INITIALIZER(IPL_NET);
 
 void   m_extfree(struct mbuf *);
+struct mbuf *m_copym0(struct mbuf *, int, int, int, int);
 void   nmbclust_update(void);
 void   m_zero(struct mbuf *);
 
@@ -567,7 +568,23 @@ m_prepend(struct mbuf *m, int len, int h
  * The wait parameter is a choice of M_WAIT/M_DONTWAIT from caller.
  */
 struct mbuf *
-m_copym(struct mbuf *m0, int off, int len, int wait)
+m_copym(struct mbuf *m, int off, int len, int wait)
+{
+   return m_copym0(m, off, len, wait, 0);  /* shallow copy on M_EXT */
+}
+
+/*
+ * m_copym2() is like m_copym(), except it COPIES cluster mbufs, instead
+ * of merely bumping the reference count.
+ */
+struct mbuf *
+m_copym2(struct mbuf *m, int off, int len, int wait)
+{
+   return m_copym0(m, off, len, wait, 1);  /* deep copy */
+}
+
+struct mbuf *
+m_copym0(struct mbuf *m0, int off, int len, int wait, int deep)
 {
struct mbuf *m, *n, **np;
struct mbuf *top;
@@ -600,9 +617,23 @@ m_copym(struct mbuf *m0, int off, int le
}
n->m_len = min(len, m->m_len - off);
if (m->m_flags & M_EXT) {
-   n->m_data = m->m_data + off;
-   n->m_ext = m->m_ext;
-   MCLADDREFERENCE(m, n);
+   if (!deep) {
+   n->m_data = m->m_data + off;
+   n->m_ext = m->m_ext;
+   MCLADDREFERENCE(m, n);
+   } else {
+   /*
+* we are unsure about the way m was allocated.
+* copy into multiple MCLBYTES cluster mbufs.
+*/
+   MCLGET(n, wait);
+   n->m_len = 0;
+   n->m_len = M_TRAILINGSPACE(n);
+   n->m_len = min(n->m_len, len);
+   n->m_len = min(n->m_len, m->m_len - off);
+   memcpy(mtod(n, caddr_t), mtod(m, caddr_t) + off,
+   n->m_len);
+   }
} else
memcpy(mtod(n, caddr_t), mtod(m, caddr_t) + off,
n->m_len);
@@ -933,6 +964,67 @@ m_getptr(struct mbuf *m, int loc, int *o
 }
 
 /*
+ * Inject a new mbuf chain of length siz in mbuf chain m0 at
+ * position len0. Returns a pointer to the first injected mbuf, or
+ * NULL on failure (m0 is left undisturbed). Note that if there is
+ * enough space for an object of size siz in the appropriate position,
+ * no memory will be allocated. Also, there will be no data movement in
+ * the first len0 bytes (pointers to that will remain valid).
+ *
+ * XXX It is assumed that siz is less than the size of an mbuf at the moment.
+ */
+struct mbuf *
+m_inject(struct mbuf *m0, int len0, int siz, int wait)
+{
+   struct mbuf *m, *n, *n2 = NULL, *n3;
+   unsigned len = len0, remain;
+
+   if ((siz >= MHLEN) || (len0 <= 0))
+   return (NULL);
+   for (m = m0; m && len > m->m_len; m = m->m_next)
+   len -= m->m_len;
+   if (m == NULL)
+   return (NULL);
+   remain = m->m_len - len;
+   if (remain == 0) {
+   if ((m->m_next) && (M_LEADINGSPACE(m->m_next) >= siz)) {
+   m->m_next->m_len += siz;
+   if (m0->m_flags &