uvm_fault when setting ddb breakpoint on armv7 -current

2020-12-15 Thread Vincent Gross
Hello,

I am investigating a usb issue on my imx6-based novena, and I tried to
set a breakpoint to inspect the backtrace when the issue occurs. The
problem is, when resuming execution out of ddb, I get a uvm_fault and
then the only way forward is to reboot the system.

Am I missing a step, or is it a bug?


-> Kernel config

$ diff -Naur /usr/src/sys/arch/armv7/conf/{GENERIC,USBDEBUG}

--- /usr/src/sys/arch/armv7/conf/GENERIC    Mon Dec 14 09:19:10 2020
+++ /usr/src/sys/arch/armv7/conf/USBDEBUG   Sun Dec 13 17:26:09 2020
@@ -26,6 +26,11 @@
 option USBVERBOSE
 option USER_PCICONF    # user-space PCI configuration
 
+option USB_DEBUG
+option UHUB_DEBUG
+option UMASS_DEBUG
+option EHCI_DEBUG
+
 config bsd swap generic

 # The main bus device


-> steps to reproduce over serial console

$ doas sysctl ddb.trigger=1
Stopped at  db_enter:   ldrb    r15, [r15, r15, ror r15]!
ddb> break umass_bbb_reset
ddb> c

uvm_fault(0xc08d1260, c0659000, 2, 0) -> e
Fatal kernel mode data abort: 'Permission fault (L1)'
trapframe: 0xd0ccfcf8
DFSR=080d, DFAR=c06595b8, spsr=2013
r0 =00ff, r1 =c06595b8, r2 =, r3 =0002
r4 =c08e5164, r5 =c06595b8, r6 =d0ccfd91, r7 =0003
r8 =c083ee30, r9 =0004, r10=c06595b8, r11=d0ccfd88
r12=000f, ssp=d0ccfd48, slr=1060, pc =c04d69c0

Stopped at  db_write_bytes+0x3ac:   strb    r0, [r5], #0x001
ddb> trace
db_write_bytes+0x3ac
rlv=0xc03973fc rfp=0xd0ccfda0
db_put_value+0x50
rlv=0xc0669cc0 rfp=0xd0ccfdb0
db_set_breakpoints+0x54
rlv=0xc072e670 rfp=0xd0ccfdd8
db_restart_at_pc+0x178
rlv=0xc06731c4 rfp=0xd0ccfe00
db_trap+0x14c
rlv=0xc04d6b18 rfp=0xd0ccfe20
db_trapper+0x88
rlv=0xc06f4734 rfp=0xd0ccfe50
undefinedinstruction+0x114
rlv=0xc05b5a68 rfp=0xd0ccfed8
$a.13
rlv=0xc04b1a18 rfp=0xd0ccff40
sys_sysctl+0x17c
rlv=0xc0427620 rfp=0xd0ccffa8
swi_handler+0x2e0
rlv=0xc05b5898 rfp=0xbffe1460


-> dmesg

Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2020 OpenBSD. All rights reserved.  https://www.OpenBSD.org

OpenBSD 6.8-current (USBDEBUG) #2: Mon Dec 14 10:18:02 CET 2020
dermi...@derrida.kilob.yt:/usr/src/sys/arch/armv7/compile/USBDEBUG
real mem  = 3933511680 (3751MB)
avail mem = 3847270400 (3669MB)
random: good seed from bootblocks
mainbus0 at root: Kosagi Novena Dual/Quad
cpu0 at mainbus0 mpidr 0: ARM Cortex-A9 r2p10
cpu0: 32KB 32b/line 4-way L1 VIPT I-cache, 32KB 32b/line 4-way L1 D-cache
cortex0 at mainbus0
amptimer0 at cortex0: tick rate 396000 KHz
armliicc0 at cortex0: rtl 7 waymask: 0x000f
ampintc0 at mainbus0 nirq 160, ncpu 4: "interrupt-controller"
simplebus0 at mainbus0: "soc"
"dma-apbh" at simplebus0 not configured
"gpu" at simplebus0 not configured
"gpu" at simplebus0 not configured
"hdmi" at simplebus0 not configured
"timer" at simplebus0 not configured
"l2-cache" at simplebus0 not configured
"pcie" at simplebus0 not configured
"pmu" at simplebus0 not configured
simplebus1 at simplebus0: "aips-bus"
imxccm0 at simplebus1
imxanatop0 at simplebus1
syscon0 at simplebus1: "snvs"
imxrtc0 at syscon0
imxsrc0 at simplebus1
syscon1 at simplebus1: "iomuxc-gpr"
imxiomuxc0 at simplebus1
simplebus2 at simplebus1: "spba-bus"
"ssi" at simplebus2 not configured
"asrc" at simplebus2 not configured
"vpu" at simplebus1 not configured
"pwm" at simplebus1 not configured
"gpt" at simplebus1 not configured
imxgpio0 at simplebus1
imxgpio1 at simplebus1
imxgpio2 at simplebus1
imxgpio3 at simplebus1
imxgpio4 at simplebus1
imxgpio5 at simplebus1
imxgpio6 at simplebus1
"kpp" at simplebus1 not configured
imxdog0 at simplebus1
imxtemp0 at simplebus1
"usbphy" at simplebus1 not configured
"usbphy" at simplebus1 not configured
imxgpc0 at simplebus1
"ldb" at simplebus1 not configured
"sdma" at simplebus1 not configured
simplebus3 at simplebus0: "aips-bus"
syscon2 at simplebus3: "ocotp"
"caam" at simplebus3 not configured
imxehci0 at simplebus3
usb0 at imxehci0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "i.MX EHCI root hub" rev 2.00/1.00 addr 1
uhub0: 1 port with 1 removable, self powered
imxehci1 at simplebus3
usb1 at imxehci1: USB revision 2.0
uhub1 at usb1 configuration 1 interface 0 "i.MX EHCI root hub" rev 2.00/1.00 addr 1
uhub1: 1 port with 1 removable, self powered
"usbmisc" at simplebus3 not configured
fec0 at simplebus3
fec0: address 00:1f:11:02:17:de
ukphy0 at fec0 phy 7: Generic IEEE 802.3u media interface, rev. 1: OUI 0x000885, model 0x0021
imxesdhc0 at simplebus3
imxesdhc0: 198 MHz base clock
sdmmc0 at imxesdhc0: 4-bit, sd high-speed, mmc high-speed, dma
imxesdhc1 at simplebus3
imxesdhc1: 198 MHz base clock
sdmmc1 at imxesdhc1: 4-bit, sd high-speed, mmc high-speed, dma
imxiic0 at simplebus3
iic0 at imxiic0
"sbs,sbs-battery" at iic0 addr 0xb not configured
"kosagi,senoko" at iic0 addr 0x20 not configured

Re: IP_SENDSRCADDR cmsg_len and dnsmasq

2018-07-16 Thread Vincent Gross
On Thu, 12 Jul 2018 19:54:26 +0200
Alexander Bluhm  wrote:

> 
> If it is a temporary problem, that will go away when the content
> of the socket buffer is sent away, we should block or return
> EWOULDBLOCK.  For a permanent problem return EMSGSIZE.  Non atomic
> operations can be split in smaller chunks, so there are no permanent
> problems.  Control messages are considered atomic.  AF_UNIX needs
> special treatment as file descriptor passing is difficult.  On top
> of that consider integer overflow.
> 
> revision 1.100
> date: 2012/04/24 16:35:08;  author: deraadt;  state: Exp;  lines:
> +13 -3; In sosend() for AF_UNIX control message sending, correctly
> calculate the size (internalized ones can be larger on some
> architectures) for fitting into the socket.  Avoid getting confused
> by sb_hiwat as well. This fixes a variety of issues where sendmsg()
> would fail to deliver a fd set or fail to wait; even leading to file
> leakage. Worked on this with claudio for about a week...
> 
> revision 1.145
> date: 2016/01/06 10:06:50;  author: stefan;  state: Exp;  lines:
> +27 -30; Prevent integer overflows in sosend() and soreceive() by
> converting min()+uiomovei() to ulmin()+uiomove() and re-arranging
> space computations in sosend(). The soreceive() part was also
> reported by Martin Natano. ok bluhm@ and also discussed with tedu@
> 
> So first of all we should split the AF_UNIX cases to keep it readable.
> And I don't want to change the AF_UNIX code as the commit message
> indicates that it was hard to develop the current solution.
> 
> From the bug reports it seems that we should check that the UDP
> packets and the IP_SENDSRCADDR fit into the socket buffer.  If not
> it is a permanent EMSGSIZE error.  So make sure that resid + clen
> <= so->so_snd.sb_hiwat.  Then write it the other way around to
> prevent signed integer overflow.
> 
> The result of these considerations is the diff below.  I have not
> tested it.  Does the original bug go away with it?  Some hackathons
> ago jeremy@ mentioned that the ruby test suite found a bug in this
> area.  So maybe we should try it.
> 
> Does this make sense?
> 
> bluhm
> 
> Index: kern/uipc_socket.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/kern/uipc_socket.c,v
> retrieving revision 1.225
> diff -u -p -r1.225 uipc_socket.c
> --- kern/uipc_socket.c  5 Jul 2018 14:45:07 -0000   1.225
> +++ kern/uipc_socket.c  12 Jul 2018 17:24:28 -0000
> @@ -462,10 +462,14 @@ restart:
>   space = sbspace(so, &so->so_snd);
>   if (flags & MSG_OOB)
>   space += 1024;
> - if ((atomic && resid > so->so_snd.sb_hiwat) ||
> - (so->so_proto->pr_domain->dom_family != AF_UNIX &&
> - clen > so->so_snd.sb_hiwat))
> - snderr(EMSGSIZE);
> + if (so->so_proto->pr_domain->dom_family == AF_UNIX) {
> + if (atomic && resid > so->so_snd.sb_hiwat)
> + snderr(EMSGSIZE);
> + } else {
> + if (clen > so->so_snd.sb_hiwat ||
> + (atomic && resid > so->so_snd.sb_hiwat - clen))
> + snderr(EMSGSIZE);
> + }
>   if (space < clen ||
>   (space - clen < resid &&
>   (atomic || space < so->so_snd.sb_lowat))) {

It is indeed much easier to parse. Kudos on spotting the potential
overflow. Ok vgross@

I have a regression test for this, based on Alexander Markert's code
with rework by mpi@. Do you want me to commit it right now?
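
For readers following along, the two points discussed above (permanent
versus temporary failure, and avoiding the overflow in resid + clen) can
be sketched in plain userland C. This is only a model of the non-AF_UNIX
branch, not the kernel code: send_check() and its parameters are made-up
names, all values are assumed to be non-negative, and the sb_lowat
handling of the real sosend() is left out.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/*
 * Hypothetical userland model of the non-AF_UNIX size checks.
 * resid: bytes of data, clen: bytes of control message,
 * hiwat: socket buffer high-water mark, space: bytes currently free.
 * Returns 0 if the send can proceed, EMSGSIZE if it can never fit,
 * EWOULDBLOCK if it only has to wait for the buffer to drain.
 */
int
send_check(long resid, long clen, long hiwat, long space, int atomic)
{
	/* The model only makes sense for non-negative sizes. */
	assert(resid >= 0 && clen >= 0 && hiwat >= 0 && space >= 0);

	/*
	 * Permanent failure.  Equivalent to resid + clen > hiwat, but
	 * written as a subtraction so the sum is never computed.
	 * clen <= hiwat is checked first, so hiwat - clen cannot go
	 * negative.
	 */
	if (clen > hiwat || (atomic && resid > hiwat - clen))
		return EMSGSIZE;

	/* Temporary failure: it would fit, just not right now. */
	if (space < clen || (atomic && space - clen < resid))
		return EWOULDBLOCK;

	return 0;
}
```

With a 200-byte buffer, send_check(190, 20, 200, 200, 1) gives EMSGSIZE
because 190 + 20 can never fit, while send_check(100, 20, 200, 50, 1)
gives EWOULDBLOCK because only the current free space is too small.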



IP_SENDSRCADDR cmsg_len and dnsmasq

2018-06-27 Thread Vincent Gross
So a while back Alexander Markert sent a bug report regarding sendmsg()
behaviour with IP_SENDSRCADDR :

https://marc.info/?l=openbsd-tech&m=149276833923905&w=2

This impacts our dnsmasq port :

https://marc.info/?l=openbsd-tech&m=149234052220818&w=2

Alexander Markert shows in the first thread the problematic code and
conditions.

To save you the trip back in time : sendfrom() returns EWOULDBLOCK (or
blocks if using blocking IO) when len(cmsg) + len(data) >
len(socket.buffer). The better behaviour would be to never block and
return EMSGSIZE.

The first diff fixes the kernel code; the second diff reverts
https://marc.info/?l=openbsd-ports-cvs&m=149233921320572&w=2 and fixes
a bad cmsg setup.
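
The cmsg fix boils down to CMSG_SPACE() versus CMSG_LEN():
msg_controllen covers the whole control buffer including padding, while
cmsg_len covers only the header plus payload. A minimal sketch, assuming
a helper named set_src_cmsg() that is not part of dnsmasq; the cmsg type
is passed in by the caller because IP_SENDSRCADDR is not defined on
every system.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: attach a single in_addr control message to msg,
 * using buf as storage.  buf must be suitably aligned for a struct
 * cmsghdr and at least CMSG_SPACE(sizeof(struct in_addr)) bytes long.
 * Returns 0 on success, -1 if buf is too small.
 */
int
set_src_cmsg(struct msghdr *msg, void *buf, size_t buflen, int cmsg_type,
    const struct in_addr *src)
{
	struct cmsghdr *cm;

	if (buflen < CMSG_SPACE(sizeof(*src)))
		return -1;
	memset(buf, 0, buflen);

	/* Buffer length includes the trailing padding: CMSG_SPACE(). */
	msg->msg_control = buf;
	msg->msg_controllen = CMSG_SPACE(sizeof(*src));

	cm = CMSG_FIRSTHDR(msg);
	assert(cm != NULL);

	/* Header length excludes the trailing padding: CMSG_LEN(). */
	cm->cmsg_len = CMSG_LEN(sizeof(*src));
	cm->cmsg_level = IPPROTO_IP;
	cm->cmsg_type = cmsg_type;
	memcpy(CMSG_DATA(cm), src, sizeof(*src));
	return 0;
}
```

On OpenBSD the caller would pass IP_SENDSRCADDR as cmsg_type and a
buffer declared inside a union with a struct cmsghdr member, as dnsmasq
already does, to get the alignment right.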

1) Can you confirm this fixes dnsmasq ? or whatever you used to trigger
  the bug ?

2) Ok ?


(apologies for the delay by the way :S )

-- 8<  8<  8< --

Index: sys/kern/uipc_socket.c
===================================================================
RCS file: /cvs/src/sys/kern/uipc_socket.c,v
retrieving revision 1.224
diff -u -p -r1.224 uipc_socket.c
--- sys/kern/uipc_socket.c  14 Jun 2018 08:46:09 -0000  1.224
+++ sys/kern/uipc_socket.c  26 Jun 2018 18:51:49 -0000
@@ -459,9 +459,10 @@ restart:
space = sbspace(so, &so->so_snd);
if (flags & MSG_OOB)
space += 1024;
-   if ((atomic && resid > so->so_snd.sb_hiwat) ||
+   if ((so->so_proto->pr_domain->dom_family == AF_UNIX &&
+   atomic && resid > so->so_snd.sb_hiwat) ||
(so->so_proto->pr_domain->dom_family != AF_UNIX &&
-   clen > so->so_snd.sb_hiwat))
+   clen + (atomic ? resid : 0) > so->so_snd.sb_hiwat))
snderr(EMSGSIZE);
if (space < clen ||
(space - clen < resid &&



-- 8<  8<  8< --


Index: net/dnsmasq/patches/patch-src_dnsmasq_c
===================================================================
RCS file: net/dnsmasq/patches/patch-src_dnsmasq_c
diff -N net/dnsmasq/patches/patch-src_dnsmasq_c
--- net/dnsmasq/patches/patch-src_dnsmasq_c  29 Mar 2018 19:42:51 -0000  1.5
+++ /dev/null   1 Jan 1970 00:00:00 -0000
@@ -1,16 +0,0 @@
-$OpenBSD: patch-src_dnsmasq_c,v 1.5 2018/03/29 19:42:51 ajacoutot Exp $
-
-Fails. Currently disabled pending investigation.
-
-Index: src/dnsmasq.c
---- src/dnsmasq.c.orig
-+++ src/dnsmasq.c
-@@ -149,7 +149,7 @@ int main (int argc, char **argv)
-   open("/dev/null", O_RDWR); 
- 
- #ifndef HAVE_LINUX_NETWORK
--#  if !(defined(IP_RECVDSTADDR) && defined(IP_RECVIF) && defined(IP_SENDSRCADDR))
-+#  if defined(__OpenBSD__) || !(defined(IP_RECVDSTADDR) && defined(IP_RECVIF) && defined(IP_SENDSRCADDR))
-   if (!option_bool(OPT_NOWILD))
- {
-   bind_fallback = 1;
Index: net/dnsmasq/patches/patch-src_forward_c
===================================================================
RCS file: /cvs/ports/net/dnsmasq/patches/patch-src_forward_c,v
retrieving revision 1.1
diff -u -p -r1.1 patch-src_forward_c
--- net/dnsmasq/patches/patch-src_forward_c  16 Apr 2017 10:40:07 -0000  1.1
+++ net/dnsmasq/patches/patch-src_forward_c  26 Jun 2018 18:48:17 -0000
@@ -1,24 +1,17 @@
-$OpenBSD: patch-src_forward_c,v 1.1 2017/04/16 10:40:07 sthen Exp $
+$OpenBSD$
 
-Fails. Currently disabled pending investigation.
+CMSG_SPACE() != CMSG_LEN()
 
---- src/forward.c.orig Sat Apr 15 22:36:04 2017
-+++ src/forward.c  Sat Apr 15 22:46:09 2017
-@@ -35,7 +35,7 @@ int send_from(int fd, int nowild, char *packet, size_t
- struct cmsghdr align; /* this ensures alignment */
- #if defined(HAVE_LINUX_NETWORK)
- char control[CMSG_SPACE(sizeof(struct in_pktinfo))];
--#elif defined(IP_SENDSRCADDR)
-+#elif !defined(__OpenBSD__) && defined(IP_SENDSRCADDR)
- char control[CMSG_SPACE(sizeof(struct in_addr))];
- #endif
- #ifdef HAVE_IPV6
-@@ -71,7 +71,7 @@ int send_from(int fd, int nowild, char *packet, size_t
- msg.msg_controllen = cmptr->cmsg_len = CMSG_LEN(sizeof(struct in_pktinfo));
- cmptr->cmsg_level = IPPROTO_IP;
+Index: src/forward.c
+--- src/forward.c.orig
++++ src/forward.c
+@@ -73,7 +73,8 @@ int send_from(int fd, int nowild, char *packet, size_t
  cmptr->cmsg_type = IP_PKTINFO;
--#elif defined(IP_SENDSRCADDR)
-+#elif !defined(__OpenBSD__) && defined(IP_SENDSRCADDR)
+ #elif defined(IP_SENDSRCADDR)
  memcpy(CMSG_DATA(cmptr), &(source->addr.addr4), sizeof(source->addr.addr4));
- msg.msg_controllen = cmptr->cmsg_len = CMSG_LEN(sizeof(struct in_addr));
+-msg.msg_controllen = cmptr->cmsg_len = CMSG_LEN(sizeof(struct in_addr));
++msg.msg_controllen = sizeof(control_u.control);
++cmptr->cmsg_len = CMSG_LEN(sizeof(struct in_addr));
  cmptr->cmsg_level = IPPROTO_IP;
+ cmptr->cmsg_type = IP_SENDSRCADDR;
+ #endif



sys/net/if.c, leftovers from r1.442

2016-12-01 Thread Vincent Gross
up is never set in ifioctl().

Ok ?

Index: net/if.c
===================================================================
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.463
diff -u -p -r1.463 if.c
--- net/if.c  28 Nov 2016 11:18:02 -0000  1.463
+++ net/if.c  1 Dec 2016 20:31:27 -0000
@@ -1688,7 +1688,6 @@ ifioctl(struct socket *so, u_long cmd, c
size_t bytesdone;
short oif_flags;
const char *label;
-   short up = 0;
 
switch (cmd) {
 
@@ -2046,12 +2045,6 @@ ifioctl(struct socket *so, u_long cmd, c
if (((oif_flags ^ ifp->if_flags) & IFF_UP) != 0)
microtime(&ifp->if_lastchange);
 
-   /* If we took down the IF, bring it back */
-   if (up) {
-   s = splnet();
-   if_up(ifp);
-   splx(s);
-   }
return (error);
 }
 



vxlan bug wrt IN6_ANY as source Was: Re: tweak in6_selectsrc()

2016-11-30 Thread Vincent Gross
On Tue, 29 Nov 2016 17:03:44 +0100
Martin Pieuchot  wrote:

> Diff below removes the 'struct route_in6' argument from
> in6_selectsrc().
> 
> It is only used by in6_pcbselsrc() so move the code there.  This
> reduces differences with IPv4 and help me to get rid of 'struct
> route*'.
> 
> ok?

Reads ok, not tested yet.

Your diff is interesting in that it helped me find a bug
in vxlan(4).

Build a tunnel like this:
$ doas ifconfig pair11 rdomain 11
$ doas ifconfig pair12 rdomain 12 patch pair11
$ doas ifconfig pair11 inet6 fd03::11/64 up
$ doas ifconfig pair12 inet6 fd03::12/64 up
$ doas ifconfig vxlan11 rdomain 11 tunneldomain 11 vnetid 10
$ doas ifconfig vxlan12 rdomain 12 tunneldomain 12 vnetid 10
$ doas ifconfig vxlan11 inet6 fd06::11/64 tunnel :: fd03::12 up
$ doas ifconfig vxlan12 inet6 fd06::12/64 tunnel :: fd03::11 up

Watch ping6 fail:
$ ping6 -V 11 fd06::12

Tweak the vxlans and see pings flow
$ doas ifconfig vxlan11 tunnel fd03::11 fd03::12
$ doas ifconfig vxlan12 tunnel fd03::12 fd03::11
$ ping6 -V 11 fd06::12


I think we should not allow empty source addresses at all, as they
make troubleshooting confusing. goda@ yasuoka@ reyk@:
what is your take on this?
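
A minimal userland sketch of the kind of check meant here, assuming a
helper named check_tunnel_src6() that does not exist in if_vxlan.c. The
real check would live in the vxlan(4) ioctl path and work on a struct
sockaddr_in6 rather than a string; only inet_pton() and
IN6_IS_ADDR_UNSPECIFIED() are real interfaces.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical validation helper: returns 0 if src is usable as a
 * tunnel source address, EINVAL if it does not parse as an IPv6
 * address or is the unspecified address (::).
 */
int
check_tunnel_src6(const char *src)
{
	struct in6_addr a;

	assert(src != NULL);

	if (inet_pton(AF_INET6, src, &a) != 1)
		return EINVAL;
	/* Reject "::" instead of silently picking a source later. */
	if (IN6_IS_ADDR_UNSPECIFIED(&a))
		return EINVAL;
	return 0;
}
```

With this, "tunnel :: fd03::12" would be refused up front, while
"tunnel fd03::11 fd03::12" would be accepted.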


> 
> Index: net/if_vxlan.c
> ===================================================================
> RCS file: /cvs/src/sys/net/if_vxlan.c,v
> retrieving revision 1.52
> diff -u -p -r1.52 if_vxlan.c
> --- net/if_vxlan.c  29 Nov 2016 10:09:57 -0000  1.52
> +++ net/if_vxlan.c  29 Nov 2016 15:52:41 -0000
> @@ -768,7 +768,7 @@ vxlan_encap6(struct ifnet *ifp, struct m
>   ip6->ip6_hlim = ip6_defhlim;
>  
>   if (IN6_IS_ADDR_UNSPECIFIED((src)->sin6_addr)) {
> - error = in6_selectsrc(, satosin6(dst), NULL, NULL,
> + error = in6_selectsrc(, satosin6(dst), NULL,
>   sc->sc_rdomain);
>   if (error != 0) {
>   m_freem(m);
> Index: netinet6/in6_src.c
> ===================================================================
> RCS file: /cvs/src/sys/netinet6/in6_src.c,v
> retrieving revision 1.80
> diff -u -p -r1.80 in6_src.c
> --- netinet6/in6_src.c  2 Sep 2016 13:53:44 -0000   1.80
> +++ netinet6/in6_src.c  29 Nov 2016 15:56:56 -0000
> @@ -99,7 +99,6 @@ in6_pcbselsrc(struct in6_addr **in6src, 
>   struct route_in6 *ro = &inp->inp_route6;
>   struct in6_addr *laddr = &inp->inp_laddr6;
>   u_int rtableid = inp->inp_rtableid;
> -
>   struct ifnet *ifp = NULL;
>   struct in6_addr *dst;
>   struct in6_ifaddr *ia6 = NULL;
> @@ -172,7 +171,55 @@ in6_pcbselsrc(struct in6_addr **in6src, 
>   return (0);
>   }
>  
> - return in6_selectsrc(in6src, dstsock, mopts, ro, rtableid);
> + error = in6_selectsrc(in6src, dstsock, mopts, rtableid);
> + if (error != EADDRNOTAVAIL)
> + return (error);
> +
> + /*
> +  * If route is known or can be allocated now,
> +  * our src addr is taken from the i/f, else punt.
> +  */
> + if (!rtisvalid(ro->ro_rt) || (ro->ro_tableid != rtableid) ||
> + !IN6_ARE_ADDR_EQUAL(&ro->ro_dst.sin6_addr, dst)) {
> + rtfree(ro->ro_rt);
> + ro->ro_rt = NULL;
> + }
> + if (ro->ro_rt == NULL) {
> + struct sockaddr_in6 *sa6;
> +
> + /* No route yet, so try to acquire one */
> + bzero(&ro->ro_dst, sizeof(struct sockaddr_in6));
> + ro->ro_tableid = rtableid;
> + sa6 = &ro->ro_dst;
> + sa6->sin6_family = AF_INET6;
> + sa6->sin6_len = sizeof(struct sockaddr_in6);
> + sa6->sin6_addr = *dst;
> + sa6->sin6_scope_id = dstsock->sin6_scope_id;
> + ro->ro_rt = rtalloc(sin6tosa(&ro->ro_dst),
> + RT_RESOLVE, ro->ro_tableid);
> + }
> +
> + /*
> +  * in_pcbconnect() checks out IFF_LOOPBACK to skip using
> +  * the address. But we don't know why it does so.
> +  * It is necessary to ensure the scope even for lo0
> +  * so doesn't check out IFF_LOOPBACK.
> +  */
> +
> + if (ro->ro_rt) {
> + ifp = if_get(ro->ro_rt->rt_ifidx);
> + if (ifp != NULL) {
> + ia6 = in6_ifawithscope(ifp, dst, rtableid);
> + if_put(ifp);
> + }
> + if (ia6 == NULL) /* xxx scope error ?*/
> + ia6 = ifatoia6(ro->ro_rt->rt_ifa);
> + }
> + if (ia6 == NULL)
> + return (EHOSTUNREACH);  /* no route */
> +
> + *in6src = &ia6->ia_addr.sin6_addr;
> + return (0);
>  }
>  
>  /*
> @@ -183,7 +230,7 @@ in6_pcbselsrc(struct in6_addr **in6src, 
>   */
>  int
>  in6_selectsrc(struct in6_addr **in6src, struct sockaddr_in6 *dstsock,
> -struct ip6_moptions *mopts, struct route_in6 *ro, u_int rtableid)
> +struct ip6_moptions *mopts, unsigned int rtableid)
>  {
>   struct ifnet *ifp = NULL;
>   struct in6_addr *dst;
> @@ -239,54 +286,6 @@ in6_selectsrc(struct in6_addr 

Re: add in6 multicast support to vxlan(4), take 4

2016-11-30 Thread Vincent Gross
On Tue, 29 Nov 2016 15:13:16 +0100
Alexander Bluhm <alexander.bl...@gmx.net> wrote:

> On Sat, Nov 05, 2016 at 12:41:39PM +0100, Vincent Gross wrote:
> > Updated diff, I reworked the logic to handle the if_get/if_put
> > dance in vxlan_multicast_join(), and fixed an uninitialized
> > variable.
> > 
> > Ok ?  
> 
> Some nits inline.

[snip]

About sleeping on malloc : better to err on the safe side with
M_NOWAIT.

About resolving the route : you are right, the cloning route is enough
to get the interface index. 

New diff with nits fixed :


Index: sys/net/if_vxlan.c
===================================================================
RCS file: /cvs/src/sys/net/if_vxlan.c,v
retrieving revision 1.52
diff -u -p -r1.52 if_vxlan.c
--- sys/net/if_vxlan.c  29 Nov 2016 10:09:57 -0000  1.52
+++ sys/net/if_vxlan.c  30 Nov 2016 22:23:01 -0000
@@ -47,6 +47,10 @@
 #include 
 #include 
 
+#ifdef INET6
+#include 
+#endif /* INET6 */
+
 #if NPF > 0
 #include 
 #endif
@@ -61,7 +65,14 @@ struct vxlan_softc {
struct arpcom    sc_ac;
struct ifmedia   sc_media;
 
-   struct ip_moptions   sc_imo;
+   union {
+   struct ip_moptions   u_imo;
+#ifdef INET6
+   struct ip6_moptions  u_im6o;
+#endif /* INET6 */
+   } sc_imu;
+#define sc_imo sc_imu.u_imo
+#define sc_im6o sc_imu.u_im6o
void    *sc_ahcookie;
void    *sc_lhcookie;
void    *sc_dhcookie;
@@ -129,10 +140,6 @@ vxlan_clone_create(struct if_clone *ifc,
M_DEVBUF, M_NOWAIT|M_ZERO)) == NULL)
return (ENOMEM);
 
-   sc->sc_imo.imo_membership = malloc(
-   (sizeof(struct in_multi *) * IP_MIN_MEMBERSHIPS), M_IPMOPTS,
-   M_WAITOK|M_ZERO);
-   sc->sc_imo.imo_max_memberships = IP_MIN_MEMBERSHIPS;
sc->sc_dstport = htons(VXLAN_PORT);
sc->sc_vnetid = VXLAN_VNI_UNSET;
 
@@ -190,7 +197,6 @@ vxlan_clone_destroy(struct ifnet *ifp)
ifmedia_delete_instance(&sc->sc_media, IFM_INST_ANY);
ether_ifdetach(ifp);
if_detach(ifp);
-   free(sc->sc_imo.imo_membership, M_IPMOPTS, 0);
free(sc, M_DEVBUF, sizeof(*sc));
 
return (0);
@@ -199,11 +205,35 @@ vxlan_clone_destroy(struct ifnet *ifp)
 void
 vxlan_multicast_cleanup(struct ifnet *ifp)
 {
-   struct vxlan_softc  *sc = (struct vxlan_softc *)ifp->if_softc;
-   struct ip_moptions  *imo = &sc->sc_imo;
-   struct ifnet    *mifp;
+   struct vxlan_softc   *sc = (struct vxlan_softc *)ifp->if_softc;
+   struct ip_moptions   *imo;
+   struct in_multi **imm;
+   struct ip6_moptions  *im6o;
+   struct in6_multi_mship   *im6m, *im6m_next;
+   struct ifnet *mifp = NULL;
+
+   switch (sc->sc_dst.ss_family) {
+   case AF_INET:
+   imo = &sc->sc_imo;
+   mifp = if_get(imo->imo_ifidx);
+   imm = imo->imo_membership;
+   while (imo->imo_num_memberships > 0)
+   in_delmulti(imm[--imo->imo_num_memberships]);
+   free(imm, M_IPMOPTS,
+   sizeof(struct in_multi *) * IP_MIN_MEMBERSHIPS);
+   break;
+#ifdef INET6
+   case AF_INET6:
+   im6o = &sc->sc_im6o;
+   mifp = if_get(im6o->im6o_ifidx);
+   LIST_FOREACH_SAFE(im6m, &im6o->im6o_memberships, i6mm_chain,
+   im6m_next)
+   in6_leavegroup(im6m);
+   break;
+#endif /* INET6 */
+   }
+   bzero(&sc->sc_imu, sizeof(sc->sc_imu));
 
-   mifp = if_get(imo->imo_ifidx);
if (mifp != NULL) {
if (sc->sc_ahcookie != NULL) {
hook_disestablish(mifp->if_addrhooks, sc->sc_ahcookie);
@@ -219,14 +249,9 @@ vxlan_multicast_cleanup(struct ifnet *if
sc->sc_dhcookie);
sc->sc_dhcookie = NULL;
}
-
-   if_put(mifp);
}
 
-   if (imo->imo_num_memberships > 0) {
-   in_delmulti(imo->imo_membership[--imo->imo_num_memberships]);
-   imo->imo_ifidx = 0;
-   }
+   if_put(mifp);
 }
 
 int
@@ -234,55 +259,141 @@ vxlan_multicast_join(struct ifnet *ifp, 
 struct sockaddr *dst)
 {
struct vxlan_softc  *sc = ifp->if_softc;
-   struct ip_moptions  *imo = &sc->sc_imo;
+   struct ip_moptions  *imo;
+   struct ip6_moptions *im6o;
+   struct in6_multi_mship  *im6m;
struct sockaddr_in  *src4, *dst4;
 #ifdef INET6
-   struct sockaddr_in6 *dst6;
+   struct sockaddr_in6 *src6, *dst6;
 #endif /* INET6 */
struct ifaddr   *ifa;
-   struct ifnet    *mifp;
+   struct ifnet    *mifp, *m6ifp = NULL;
+

Re: add in6 multicast support to vxlan(4), take 4

2016-11-28 Thread Vincent Gross
On Thu, 10 Nov 2016 22:16:55 +0100
Vincent Gross <vgr...@openbsd.org> wrote:

> On Sat, 5 Nov 2016 12:41:39 +0100
> Vincent Gross <vgr...@openbsd.org> wrote:
> 
> > Updated diff, I reworked the logic to handle the if_get/if_put dance
> > in vxlan_multicast_join(), and fixed an uninitialized variable.
> > 
> > Ok ?  
> 
> Anyone to comment or ok ? this blocks the submission of
> other changes on the network stack.

Come on ! Don't be shy !

http://quigon.bsws.de/papers/2015/asiabsdcon/mgp00042.html
http://quigon.bsws.de/papers/2015/asiabsdcon/mgp00043.html

> 
> > 
> > Index: net/if_vxlan.c
> > ===================================================================
> > RCS file: /cvs/src/sys/net/if_vxlan.c,v
> > retrieving revision 1.51
> > diff -u -p -r1.51 if_vxlan.c
> > --- net/if_vxlan.c  25 Oct 2016 16:31:08 -0000  1.51
> > +++ net/if_vxlan.c  5 Nov 2016 11:36:02 -0000
> > @@ -47,6 +47,8 @@
> >  #include 
> >  #include 
> >  
> > +#include 
> > +
> >  #if NPF > 0
> >  #include 
> >  #endif
> > @@ -61,7 +63,12 @@ struct vxlan_softc {
> > struct arpcom    sc_ac;
> > struct ifmedia   sc_media;
> >  
> > -   struct ip_moptions   sc_imo;
> > +   union {
> > +   struct ip_moptions   u_imo;
> > +   struct ip6_moptions  u_imo6;
> > +   } sc_imu;
> > +#define sc_imo sc_imu.u_imo
> > +#define sc_im6o sc_imu.u_imo6
> > void    *sc_ahcookie;
> > void    *sc_lhcookie;
> > void    *sc_dhcookie;
> > @@ -129,10 +136,6 @@ vxlan_clone_create(struct if_clone *ifc,
> > M_DEVBUF, M_NOWAIT|M_ZERO)) == NULL)
> > return (ENOMEM);
> >  
> > -   sc->sc_imo.imo_membership = malloc(
> > -   (sizeof(struct in_multi *) * IP_MIN_MEMBERSHIPS), M_IPMOPTS,
> > -   M_WAITOK|M_ZERO);
> > -   sc->sc_imo.imo_max_memberships = IP_MIN_MEMBERSHIPS;
> > sc->sc_dstport = htons(VXLAN_PORT);
> > sc->sc_vnetid = VXLAN_VNI_UNSET;
> >  
> > @@ -190,7 +193,6 @@ vxlan_clone_destroy(struct ifnet *ifp)
> > ifmedia_delete_instance(&sc->sc_media, IFM_INST_ANY);
> > ether_ifdetach(ifp);
> > if_detach(ifp);
> > -   free(sc->sc_imo.imo_membership, M_IPMOPTS, 0);
> > free(sc, M_DEVBUF, sizeof(*sc));
> >  
> > return (0);
> > @@ -199,11 +201,33 @@ vxlan_clone_destroy(struct ifnet *ifp)
> >  void
> >  vxlan_multicast_cleanup(struct ifnet *ifp)
> >  {
> > -   struct vxlan_softc  *sc = (struct vxlan_softc *)ifp->if_softc;
> > -   struct ip_moptions  *imo = &sc->sc_imo;
> > -   struct ifnet    *mifp;
> > +   struct vxlan_softc   *sc = (struct vxlan_softc *)ifp->if_softc;
> > +   struct ip_moptions   *imo;
> > +   struct in_multi **imm;
> > +   struct ip6_moptions  *im6o;
> > +   struct in6_multi_mship   *im6m, *im6m_next;
> > +   struct ifnet *mifp = NULL;
> > +
> > +   switch (sc->sc_dst.ss_family) {
> > +   case AF_INET:
> > +   imo = &sc->sc_imo;
> > +   mifp = if_get(imo->imo_ifidx);
> > +   imm = imo->imo_membership;
> > +   while (imo->imo_num_memberships > 0)
> > +   in_delmulti(imm[--imo->imo_num_memberships]);
> > +   free(imm, M_IPMOPTS,
> > +   sizeof(struct in_multi *) * imo->imo_num_memberships);
> > +   break;
> > +   case AF_INET6:
> > +   im6o = &sc->sc_im6o;
> > +   mifp = if_get(im6o->im6o_ifidx);
> > +   LIST_FOREACH_SAFE(im6m, &im6o->im6o_memberships, i6mm_chain,
> > +   im6m_next)
> > +   in6_leavegroup(im6m);
> > +   break;
> > +   }
> > +   bzero(&sc->sc_imu, sizeof(sc->sc_imu));
> >  
> > -   mifp = if_get(imo->imo_ifidx);
> > if (mifp != NULL) {
> > if (sc->sc_ahcookie != NULL) {
> > hook_disestablish(mifp->if_addrhooks, sc->sc_ahcookie);
> > @@ -219,14 +243,9 @@ vxlan_multicast_cleanup(struct ifnet *if
> > sc->sc_dhcookie);
> > sc->sc_dhcookie = NULL;
> > }
> > -
> > -   if_put(mifp);
> > }
> >  
> > -   if (imo->imo_num_memberships > 0) {
> > -   in_delmulti(imo->imo_membership[--imo->imo_num_memberships]);
> > -  

Re: add in6 multicast support to vxlan(4), take 4

2016-11-10 Thread Vincent Gross
On Sat, 5 Nov 2016 12:41:39 +0100
Vincent Gross <vgr...@openbsd.org> wrote:

> Updated diff, I reworked the logic to handle the if_get/if_put dance
> in vxlan_multicast_join(), and fixed an uninitialized variable.
> 
> Ok ?

Anyone to comment or ok ? this blocks the submission of
other changes on the network stack.

> 
> Index: net/if_vxlan.c
> ===================================================================
> RCS file: /cvs/src/sys/net/if_vxlan.c,v
> retrieving revision 1.51
> diff -u -p -r1.51 if_vxlan.c
> --- net/if_vxlan.c  25 Oct 2016 16:31:08 -0000  1.51
> +++ net/if_vxlan.c  5 Nov 2016 11:36:02 -0000
> @@ -47,6 +47,8 @@
>  #include 
>  #include 
>  
> +#include 
> +
>  #if NPF > 0
>  #include 
>  #endif
> @@ -61,7 +63,12 @@ struct vxlan_softc {
>   struct arpcom    sc_ac;
>   struct ifmedia   sc_media;
>  
> - struct ip_moptions   sc_imo;
> + union {
> + struct ip_moptions   u_imo;
> + struct ip6_moptions  u_imo6;
> + } sc_imu;
> +#define sc_imo   sc_imu.u_imo
> +#define sc_im6o  sc_imu.u_imo6
>   void    *sc_ahcookie;
>   void    *sc_lhcookie;
>   void    *sc_dhcookie;
> @@ -129,10 +136,6 @@ vxlan_clone_create(struct if_clone *ifc,
>   M_DEVBUF, M_NOWAIT|M_ZERO)) == NULL)
>   return (ENOMEM);
>  
> - sc->sc_imo.imo_membership = malloc(
> - (sizeof(struct in_multi *) * IP_MIN_MEMBERSHIPS), M_IPMOPTS,
> - M_WAITOK|M_ZERO);
> - sc->sc_imo.imo_max_memberships = IP_MIN_MEMBERSHIPS;
>   sc->sc_dstport = htons(VXLAN_PORT);
>   sc->sc_vnetid = VXLAN_VNI_UNSET;
>  
> @@ -190,7 +193,6 @@ vxlan_clone_destroy(struct ifnet *ifp)
>   ifmedia_delete_instance(&sc->sc_media, IFM_INST_ANY);
>   ether_ifdetach(ifp);
>   if_detach(ifp);
> - free(sc->sc_imo.imo_membership, M_IPMOPTS, 0);
>   free(sc, M_DEVBUF, sizeof(*sc));
>  
>   return (0);
> @@ -199,11 +201,33 @@ vxlan_clone_destroy(struct ifnet *ifp)
>  void
>  vxlan_multicast_cleanup(struct ifnet *ifp)
>  {
> - struct vxlan_softc  *sc = (struct vxlan_softc *)ifp->if_softc;
> - struct ip_moptions  *imo = &sc->sc_imo;
> - struct ifnet    *mifp;
> + struct vxlan_softc   *sc = (struct vxlan_softc *)ifp->if_softc;
> + struct ip_moptions   *imo;
> + struct in_multi **imm;
> + struct ip6_moptions  *im6o;
> + struct in6_multi_mship   *im6m, *im6m_next;
> + struct ifnet *mifp = NULL;
> +
> + switch (sc->sc_dst.ss_family) {
> + case AF_INET:
> + imo = &sc->sc_imo;
> + mifp = if_get(imo->imo_ifidx);
> + imm = imo->imo_membership;
> + while (imo->imo_num_memberships > 0)
> + in_delmulti(imm[--imo->imo_num_memberships]);
> + free(imm, M_IPMOPTS,
> + sizeof(struct in_multi *) * imo->imo_num_memberships);
> + break;
> + case AF_INET6:
> + im6o = &sc->sc_im6o;
> + mifp = if_get(im6o->im6o_ifidx);
> + LIST_FOREACH_SAFE(im6m, &im6o->im6o_memberships, i6mm_chain,
> + im6m_next)
> + in6_leavegroup(im6m);
> + break;
> + }
> + bzero(&sc->sc_imu, sizeof(sc->sc_imu));
>  
> - mifp = if_get(imo->imo_ifidx);
>   if (mifp != NULL) {
>   if (sc->sc_ahcookie != NULL) {
>   hook_disestablish(mifp->if_addrhooks, sc->sc_ahcookie);
> @@ -219,14 +243,9 @@ vxlan_multicast_cleanup(struct ifnet *if
>   sc->sc_dhcookie);
>   sc->sc_dhcookie = NULL;
>   }
> -
> - if_put(mifp);
>   }
>  
> - if (imo->imo_num_memberships > 0) {
> - in_delmulti(imo->imo_membership[--imo->imo_num_memberships]);
> - imo->imo_ifidx = 0;
> - }
> + if_put(mifp);
>  }
>  
>  int
> @@ -234,55 +253,141 @@ vxlan_multicast_join(struct ifnet *ifp, 
>  struct sockaddr *dst)
>  {
>   struct vxlan_softc  *sc = ifp->if_softc;
> - struct ip_moptions  *imo = &sc->sc_imo;
> + struct ip_moptions  *imo;
> + struct ip6_moptions *im6o;
> + struct in6_multi_mship  *im6m;
>   struct sockaddr_in  *src4, *dst4;
>  #ifdef INET6
> - struct sockaddr_in6 *dst6;
> + struct sockaddr_in6 *src6, *dst6;
>  #endif /* INET6 */
>   struct ifaddr   *ifa;
> - struct if

Re: [PATCH] iked: Bugfixes for IKE rekeying

2016-11-09 Thread Vincent Gross
On Wed, 9 Nov 2016 13:16:46 +
Thomas Klute  wrote:

> Hi tech@,
> 
> this patch contains fixes for two bugs that break IKE rekeying
> initiated by iked. Please review, and apply or let me know what has to
> be changed! Both bugs are fixed by initializing the respective
> structures of the new IKE SA (struct iked_sa *nsa in the
> ikev2_ike_sa_rekey function):

Thanks, we are looking into it.

> 
> For [1]: Copying the address information is required to send any
> request messages over the new IKE SA after rekeying, otherwise errors
> like the following happen because the IP addresses and ports remain
> initialized to zero:
> 
> ikev2_msg_send: INFORMATIONAL request from any to any msgid 1, 80 bytes
> ikev2_msg_send: sendtofrom: Invalid argument
> 
> For [2]: Setting the DH group based on the currently used one is
> necessary because iked proposes only the currently used transforms
> during IKE rekeying, so trying to use any other group for the DH
> exchange will fail even if it is preferred by local policy (see
> comment in the patch for details).
> 
> This patch includes and supersedes the one for only the first bug I
> sent yesterday.
> 
> Best regards,
> Thomas
> 
> [1] https://marc.info/?l=openbsd-bugs&m=147739504516767&w=2
> [2] https://marc.info/?l=openbsd-bugs&m=147747405806461&w=2
> 
> Index: src/sbin/iked/ikev2.c
> ===================================================================
> RCS file: /cvs/src/sbin/iked/ikev2.c,v
> retrieving revision 1.131
> diff -u -p -u -r1.131 ikev2.c
> --- src/sbin/iked/ikev2.c  2 Jun 2016 07:14:26 -0000  1.131
> +++ src/sbin/iked/ikev2.c  9 Nov 2016 13:12:32 -0000
> @@ -2658,6 +2658,18 @@ ikev2_ike_sa_rekey(struct iked *env, voi
>   goto done;
>   }
>  
> + /* Select the DH group ID based on the currently used
> +  * one. Otherwise the call to ikev2_sa_initiator below would
> +  * set it to the first DH transform in the policy, while the
> +  * SA payload contains only one proposal matching the
> +  * currently used transforms. If a different DH transform has
> +  * been negotiated this means KE payload and negotiated DH
> +  * transform cannot match, causing rekeying to fail. */
> + if ((nsa->sa_dhgroup = group_get(sa->sa_dhgroup->id)) == NULL) {
> + log_debug("%s: failed to initialize DH group", __func__);
> + goto done;
> + }
> +
>   if (ikev2_sa_initiator(env, nsa, sa, NULL)) {
>   log_debug("%s: failed to setup DH", __func__);
>   goto done;
> @@ -2665,6 +2677,13 @@ ikev2_ike_sa_rekey(struct iked *env, voi
>   sa_state(env, nsa, IKEV2_STATE_AUTH_SUCCESS);
>   nonce = nsa->sa_inonce;
>  
> + /* Copy local and peer address from the old SA */
> + if (sa_address(nsa, &nsa->sa_peer, &sa->sa_peer.addr) == -1 ||
> + sa_address(nsa, &nsa->sa_local, &sa->sa_local.addr) == -1) {
> + log_debug("%s: failed copy address data", __func__);
> + goto done;
> + }
> +
>   if ((e = ibuf_static()) == NULL)
>   goto done;
>  
> 



Re: Kill ifa_ifwithnet()

2016-11-07 Thread Vincent Gross
On Mon, 7 Nov 2016 08:59:53 +0100
Martin Pieuchot <m...@openbsd.org> wrote:

> On 04/11/16(Fri) 21:33, Vincent Gross wrote:
> > [...] 
> > Why are you killing Strict Source Route Record ? Just as you did
> > with rtredirect(), you can check whether RTF_GATEWAY is set and
> > send back an ICMP_UNREACH if so. Or did I miss something ?  
> 
> Like that?
> 
> Index: netinet/ip_input.c
> ===
> RCS file: /cvs/src/sys/netinet/ip_input.c,v
> retrieving revision 1.282
> diff -u -p -r1.282 ip_input.c
> --- netinet/ip_input.c 22 Sep 2016 10:12:25 -  1.282
> +++ netinet/ip_input.c 7 Nov 2016 07:59:02 -
> @@ -1117,37 +1117,20 @@ ip_dooptions(struct mbuf *m, struct ifne
>   ipaddr.sin_len = sizeof(ipaddr);
>   memcpy(&ipaddr.sin_addr, cp + off,
>   sizeof(ipaddr.sin_addr));
> - if (opt == IPOPT_SSRR) {
> - if ((ia = ifatoia(ifa_ifwithdstaddr(
> - sintosa(&ipaddr),
> - m->m_pkthdr.ph_rtableid))) == NULL)
> - ia = ifatoia(ifa_ifwithnet(
> - sintosa(&ipaddr),
> - m->m_pkthdr.ph_rtableid));
> - if (ia == NULL) {
> - type = ICMP_UNREACH;
> - code = ICMP_UNREACH_SRCFAIL;
> - goto bad;
> - }
> - memcpy(cp + off, &ia->ia_addr.sin_addr,
> - sizeof(struct in_addr));
> - cp[IPOPT_OFFSET] += sizeof(struct in_addr);
> - } else {
> - /* keep packet in the virtual instance */
> - rt = rtalloc(sintosa(&ipaddr), RT_RESOLVE,
> - rtableid);
> - if (!rtisvalid(rt)) {
> - type = ICMP_UNREACH;
> - code = ICMP_UNREACH_SRCFAIL;
> - rtfree(rt);
> - goto bad;
> - }
> - ia = ifatoia(rt->rt_ifa);
> - memcpy(cp + off, &ia->ia_addr.sin_addr,
> - sizeof(struct in_addr));
> + /* keep packet in the virtual instance */
> + rt = rtalloc(sintosa(&ipaddr), RT_RESOLVE, rtableid);
> + if (!rtisvalid(rt) || ((opt == IPOPT_SSRR) &&
> + ISSET(rt->rt_flags, RTF_GATEWAY))) {
> + type = ICMP_UNREACH;
> + code = ICMP_UNREACH_SRCFAIL;
>   rtfree(rt);
> - cp[IPOPT_OFFSET] += sizeof(struct in_addr);
> + goto bad;
>   }
> + ia = ifatoia(rt->rt_ifa);
> + memcpy(cp + off, &ia->ia_addr.sin_addr,
> + sizeof(struct in_addr));
> + rtfree(rt);
> + cp[IPOPT_OFFSET] += sizeof(struct in_addr);
>   ip->ip_dst = ipaddr.sin_addr;
>   /*
>    * Let ip_intr's mcast routing check handle mcast pkts

Ok vgross@
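
For readers following the thread, the condition in the quoted diff can be
sketched as a standalone predicate. This is only a userland illustration:
the x_-prefixed names and flag values are made up for the example and are
not the kernel's definitions. It assumes a strict source route (SSRR) next
hop must be directly reachable, so a route through a gateway is rejected,
while a loose source route (LSRR) is rejected only when no valid route
exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's route flags. */
#define X_RTF_UP      0x1
#define X_RTF_GATEWAY 0x2

/* Hypothetical stand-ins for the IP source-route option codes. */
enum x_srcroute_opt { X_OPT_LSRR, X_OPT_SSRR };

struct x_route {
	unsigned int flags;
};

/*
 * Return true when the next hop named in a source-route option has to be
 * rejected (the kernel would answer with an ICMP unreachable, source
 * route failed).  rt == NULL models a failed route lookup.
 */
bool
x_srcroute_reject(const struct x_route *rt, enum x_srcroute_opt opt)
{
	assert(opt == X_OPT_LSRR || opt == X_OPT_SSRR);

	if (rt == NULL || (rt->flags & X_RTF_UP) == 0)
		return true;	/* no usable route at all */
	if (opt == X_OPT_SSRR && (rt->flags & X_RTF_GATEWAY) != 0)
		return true;	/* strict: next hop must be on-link */
	return false;
}
```

With this model, a gateway route is acceptable for LSRR but not for SSRR,
which is the behaviour the single combined test in the diff is meant to
keep.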



add in6 multicast support to vxlan(4), take 4

2016-11-05 Thread Vincent Gross
Updated diff: I reworked the logic to handle the if_get/if_put dance in
vxlan_multicast_join(), and fixed an uninitialized variable.

Ok ?

Index: net/if_vxlan.c
===
RCS file: /cvs/src/sys/net/if_vxlan.c,v
retrieving revision 1.51
diff -u -p -r1.51 if_vxlan.c
--- net/if_vxlan.c  25 Oct 2016 16:31:08 -  1.51
+++ net/if_vxlan.c  5 Nov 2016 11:36:02 -
@@ -47,6 +47,8 @@
 #include 
 #include 
 
+#include 
+
 #if NPF > 0
 #include 
 #endif
@@ -61,7 +63,12 @@ struct vxlan_softc {
struct arpcom    sc_ac;
struct ifmedia   sc_media;
 
-   struct ip_moptions   sc_imo;
+   union {
+   struct ip_moptions   u_imo;
+   struct ip6_moptions  u_imo6;
+   } sc_imu;
+#define sc_imo sc_imu.u_imo
+#define sc_im6o sc_imu.u_imo6
void *sc_ahcookie;
void *sc_lhcookie;
void *sc_dhcookie;
@@ -129,10 +136,6 @@ vxlan_clone_create(struct if_clone *ifc,
M_DEVBUF, M_NOWAIT|M_ZERO)) == NULL)
return (ENOMEM);
 
-   sc->sc_imo.imo_membership = malloc(
-   (sizeof(struct in_multi *) * IP_MIN_MEMBERSHIPS), M_IPMOPTS,
-   M_WAITOK|M_ZERO);
-   sc->sc_imo.imo_max_memberships = IP_MIN_MEMBERSHIPS;
sc->sc_dstport = htons(VXLAN_PORT);
sc->sc_vnetid = VXLAN_VNI_UNSET;
 
@@ -190,7 +193,6 @@ vxlan_clone_destroy(struct ifnet *ifp)
ifmedia_delete_instance(&sc->sc_media, IFM_INST_ANY);
ether_ifdetach(ifp);
if_detach(ifp);
-   free(sc->sc_imo.imo_membership, M_IPMOPTS, 0);
free(sc, M_DEVBUF, sizeof(*sc));
 
return (0);
@@ -199,11 +201,33 @@ vxlan_clone_destroy(struct ifnet *ifp)
 void
 vxlan_multicast_cleanup(struct ifnet *ifp)
 {
-   struct vxlan_softc  *sc = (struct vxlan_softc *)ifp->if_softc;
-   struct ip_moptions  *imo = &sc->sc_imo;
-   struct ifnet        *mifp;
+   struct vxlan_softc   *sc = (struct vxlan_softc *)ifp->if_softc;
+   struct ip_moptions   *imo;
+   struct in_multi **imm;
+   struct ip6_moptions  *im6o;
+   struct in6_multi_mship   *im6m, *im6m_next;
+   struct ifnet *mifp = NULL;
+
+   switch (sc->sc_dst.ss_family) {
+   case AF_INET:
+   imo = &sc->sc_imo;
+   mifp = if_get(imo->imo_ifidx);
+   imm = imo->imo_membership;
+   while (imo->imo_num_memberships > 0)
+   in_delmulti(imm[--imo->imo_num_memberships]);
+   free(imm, M_IPMOPTS,
+   sizeof(struct in_multi *) * imo->imo_num_memberships);
+   break;
+   case AF_INET6:
+   im6o = &sc->sc_im6o;
+   mifp = if_get(im6o->im6o_ifidx);
+   LIST_FOREACH_SAFE(im6m, &im6o->im6o_memberships, i6mm_chain,
+   im6m_next)
+   in6_leavegroup(im6m);
+   break;
+   }
+   bzero(&sc->sc_imu, sizeof(sc->sc_imu));
 
-   mifp = if_get(imo->imo_ifidx);
if (mifp != NULL) {
if (sc->sc_ahcookie != NULL) {
hook_disestablish(mifp->if_addrhooks, sc->sc_ahcookie);
@@ -219,14 +243,9 @@ vxlan_multicast_cleanup(struct ifnet *if
sc->sc_dhcookie);
sc->sc_dhcookie = NULL;
}
-
-   if_put(mifp);
}
 
-   if (imo->imo_num_memberships > 0) {
-   in_delmulti(imo->imo_membership[--imo->imo_num_memberships]);
-   imo->imo_ifidx = 0;
-   }
+   if_put(mifp);
 }
 
 int
@@ -234,55 +253,141 @@ vxlan_multicast_join(struct ifnet *ifp, 
 struct sockaddr *dst)
 {
struct vxlan_softc  *sc = ifp->if_softc;
-   struct ip_moptions  *imo = &sc->sc_imo;
+   struct ip_moptions  *imo;
+   struct ip6_moptions *im6o;
+   struct in6_multi_mship  *im6m;
struct sockaddr_in  *src4, *dst4;
 #ifdef INET6
-   struct sockaddr_in6 *dst6;
+   struct sockaddr_in6 *src6, *dst6;
 #endif /* INET6 */
struct ifaddr   *ifa;
-   struct ifnet*mifp;
+   struct ifnet*mifp = NULL, *m6ifp = NULL;
+   struct rtentry  *rt;
+   int  error;
 
switch (dst->sa_family) {
case AF_INET:
dst4 = satosin(dst);
+   src4 = satosin(src);
if (!IN_MULTICAST(dst4->sin_addr.s_addr))
return (0);
+   if (src4->sin_addr.s_addr == INADDR_ANY ||
+   IN_MULTICAST(src4->sin_addr.s_addr))
+   return (EINVAL);
+   if ((ifa = ifa_ifwithaddr(src, sc->sc_rdomain)) == NULL ||
+   (mifp = ifa->ifa_ifp) == NULL ||
+   (mifp->if_flags & IFF_MULTICAST) == 0)
+

Re: Kill ifa_ifwithnet()

2016-11-04 Thread Vincent Gross
On Fri, 4 Nov 2016 12:01:58 +0100
Martin Pieuchot  wrote:

> Rather than trying to keep this old routing table like function alive
> by reimplementing rn_refines(), let's get rid of it.
> 
> ok?
> 
> Index: net/route.c
> ===
> RCS file: /cvs/src/sys/net/route.c,v
> retrieving revision 1.333
> diff -u -p -r1.333 route.c
> --- net/route.c   6 Oct 2016 19:09:08 -   1.333
> +++ net/route.c   4 Nov 2016 10:51:55 -
> @@ -550,11 +550,16 @@ rtredirect(struct sockaddr *dst, struct 
>   splsoftassert(IPL_SOFTNET);
>  
>   /* verify the gateway is directly reachable */
> - if ((ifa = ifa_ifwithnet(gateway, rdomain)) == NULL) {
> + rt = rtalloc(gateway, 0, rdomain);
> + if (!rtisvalid(rt) || ISSET(rt->rt_flags, RTF_GATEWAY)) {
> + rtfree(rt);
>   error = ENETUNREACH;
>   goto out;
>   }
> - ifidx = ifa->ifa_ifp->if_index;
> + ifidx = rt->rt_ifidx;
> + rtfree(rt);
> + rt = NULL;
> +
>   rt = rtable_lookup(rdomain, dst, NULL, NULL, RTP_ANY);
>   /*
>    * If the redirect isn't from our current router for this dst,
> Index: net/if.c
> ===
> RCS file: /cvs/src/sys/net/if.c,v
> retrieving revision 1.456
> diff -u -p -r1.456 if.c
> --- net/if.c  19 Oct 2016 02:05:49 -  1.456
> +++ net/if.c  4 Nov 2016 10:55:03 -
> @@ -1282,47 +1282,6 @@ ifa_ifwithdstaddr(struct sockaddr *addr,
>  }
>  
>  /*
> - * Find an interface on a specific network.  If many, choice
> - * is most specific found.
> - */
> -struct ifaddr *
> -ifa_ifwithnet(struct sockaddr *sa, u_int rtableid)
> -{
> - struct ifnet *ifp;
> - struct ifaddr *ifa, *ifa_maybe = NULL;
> - char *cplim, *addr_data = sa->sa_data;
> - u_int rdomain;
> -
> - KERNEL_ASSERT_LOCKED();
> - rdomain = rtable_l2(rtableid);
> - TAILQ_FOREACH(ifp, &ifnet, if_list) {
> - if (ifp->if_rdomain != rdomain)
> - continue;
> - TAILQ_FOREACH(ifa, &ifp->if_addrlist, ifa_list) {
> - char *cp, *cp2, *cp3;
> -
> - if (ifa->ifa_addr->sa_family != sa->sa_family ||
> - ifa->ifa_netmask == 0)
> - next: continue;
> - cp = addr_data;
> - cp2 = ifa->ifa_addr->sa_data;
> - cp3 = ifa->ifa_netmask->sa_data;
> - cplim = (char *)ifa->ifa_netmask +
> - ifa->ifa_netmask->sa_len;
> - while (cp3 < cplim)
> - if ((*cp++ ^ *cp2++) & *cp3++)
> - /* want to continue for() loop */
> - goto next;
> - if (ifa_maybe == 0 ||
> - rn_refines((caddr_t)ifa->ifa_netmask,
> - (caddr_t)ifa_maybe->ifa_netmask))
> - ifa_maybe = ifa;
> - }
> - }
> - return (ifa_maybe);
> -}
> -
> -/*
>   * Find an interface address specific to an interface best matching
>   * a given address.
>   */
> Index: net/if_var.h
> ===
> RCS file: /cvs/src/sys/net/if_var.h,v
> retrieving revision 1.75
> diff -u -p -r1.75 if_var.h
> --- net/if_var.h  4 Sep 2016 15:46:39 -   1.75
> +++ net/if_var.h  4 Nov 2016 10:54:55 -
> @@ -304,7 +304,6 @@ void  p2p_rtrequest(struct ifnet *, int, 
>  
>  struct   ifaddr *ifa_ifwithaddr(struct sockaddr *, u_int);
>  struct   ifaddr *ifa_ifwithdstaddr(struct sockaddr *, u_int);
> -struct   ifaddr *ifa_ifwithnet(struct sockaddr *, u_int);
>  struct   ifaddr *ifaof_ifpforaddr(struct sockaddr *, struct ifnet *);
>  void     ifafree(struct ifaddr *);
>  

Everything above is ok vgross@

> Index: netinet/ip_input.c
> ===
> RCS file: /cvs/src/sys/netinet/ip_input.c,v
> retrieving revision 1.282
> diff -u -p -r1.282 ip_input.c
> --- netinet/ip_input.c 22 Sep 2016 10:12:25 -  1.282
> +++ netinet/ip_input.c 4 Nov 2016 10:54:49 -
> @@ -1117,37 +1117,19 @@ ip_dooptions(struct mbuf *m, struct ifne
>   ipaddr.sin_len = sizeof(ipaddr);
>   memcpy(&ipaddr.sin_addr, cp + off,
>   sizeof(ipaddr.sin_addr));
> - if (opt == IPOPT_SSRR) {
> - if ((ia = ifatoia(ifa_ifwithdstaddr(
> - sintosa(&ipaddr),
> - m->m_pkthdr.ph_rtableid))) == NULL)
> - ia = ifatoia(ifa_ifwithnet(
> - sintosa(&ipaddr),
> - m->m_pkthdr.ph_rtableid));
> - if (ia == NULL) {
> -   

Re: add in6 multicast support to vxlan(4) ; question on mbufs

2016-11-01 Thread Vincent Gross
On Tue, 1 Nov 2016 18:51:08 +0100
Mike Belopuhov <m...@belopuhov.com> wrote:

> On 1 November 2016 at 18:23, Vincent Gross <vincent.gr...@kilob.yt>
> wrote:
> > On Tue, 4 Oct 2016 01:07:51 +0200
> > Vincent Gross <vgr...@openbsd.org> wrote:
> >  
> >> On Sat, 24 Sep 2016 10:58:10 +0200
> >> Vincent Gross <vgr...@openbsd.org> wrote:
> >>  
> >> > Hi,
> >> >  
> >> [snip]  
> >> >
> >> > Aside from the mbuf issue, is this Ok ?  
> >>
> >> I will come back to the mbuf stuff later.
> >>
> >> Diff rebased, ok anyone ?
> >>  
> >
> > New rebase, tested on amd64 and macppc, Ok ?
> >  
> 
> Why have you kept the m_adj with ETHER_ALIGN?
> 

Derp. Lost track while reading the mbuf thread.

New diff w/o the mbuf dance, again tested on amd64 and macppc.


Index: sys/net/if_vxlan.c
===
RCS file: /cvs/src/sys/net/if_vxlan.c,v
retrieving revision 1.51
diff -u -p -r1.51 if_vxlan.c
--- sys/net/if_vxlan.c  25 Oct 2016 16:31:08 -  1.51
+++ sys/net/if_vxlan.c  1 Nov 2016 21:58:24 -
@@ -47,6 +47,8 @@
 #include 
 #include 
 
+#include 
+
 #if NPF > 0
 #include 
 #endif
@@ -61,7 +63,12 @@ struct vxlan_softc {
struct arpcom    sc_ac;
struct ifmedia   sc_media;
 
-   struct ip_moptions   sc_imo;
+   union {
+   struct ip_moptions   u_imo;
+   struct ip6_moptions  u_imo6;
+   } sc_imu;
+#define sc_imo sc_imu.u_imo
+#define sc_im6o sc_imu.u_imo6
void *sc_ahcookie;
void *sc_lhcookie;
void *sc_dhcookie;
@@ -129,10 +136,6 @@ vxlan_clone_create(struct if_clone *ifc,
M_DEVBUF, M_NOWAIT|M_ZERO)) == NULL)
return (ENOMEM);
 
-   sc->sc_imo.imo_membership = malloc(
-   (sizeof(struct in_multi *) * IP_MIN_MEMBERSHIPS), M_IPMOPTS,
-   M_WAITOK|M_ZERO);
-   sc->sc_imo.imo_max_memberships = IP_MIN_MEMBERSHIPS;
sc->sc_dstport = htons(VXLAN_PORT);
sc->sc_vnetid = VXLAN_VNI_UNSET;
 
@@ -190,7 +193,6 @@ vxlan_clone_destroy(struct ifnet *ifp)
ifmedia_delete_instance(&sc->sc_media, IFM_INST_ANY);
ether_ifdetach(ifp);
if_detach(ifp);
-   free(sc->sc_imo.imo_membership, M_IPMOPTS, 0);
free(sc, M_DEVBUF, sizeof(*sc));
 
return (0);
@@ -199,11 +201,33 @@ vxlan_clone_destroy(struct ifnet *ifp)
 void
 vxlan_multicast_cleanup(struct ifnet *ifp)
 {
-   struct vxlan_softc  *sc = (struct vxlan_softc *)ifp->if_softc;
-   struct ip_moptions  *imo = &sc->sc_imo;
-   struct ifnet        *mifp;
+   struct vxlan_softc   *sc = (struct vxlan_softc *)ifp->if_softc;
+   struct ip_moptions   *imo;
+   struct in_multi **imm;
+   struct ip6_moptions  *im6o;
+   struct in6_multi_mship   *im6m, *im6m_next;
+   struct ifnet *mifp = NULL;
+
+   switch (sc->sc_dst.ss_family) {
+   case AF_INET:
+   imo = &sc->sc_imo;
+   mifp = if_get(imo->imo_ifidx);
+   imm = imo->imo_membership;
+   while (imo->imo_num_memberships > 0)
+   in_delmulti(imm[--imo->imo_num_memberships]);
+   free(imm, M_IPMOPTS,
+   sizeof(struct in_multi *) * imo->imo_num_memberships);
+   break;
+   case AF_INET6:
+   im6o = &sc->sc_im6o;
+   mifp = if_get(im6o->im6o_ifidx);
+   LIST_FOREACH_SAFE(im6m, &im6o->im6o_memberships, i6mm_chain,
+   im6m_next)
+   in6_leavegroup(im6m);
+   break;
+   }
+   bzero(&sc->sc_imu, sizeof(sc->sc_imu));
 
-   mifp = if_get(imo->imo_ifidx);
if (mifp != NULL) {
if (sc->sc_ahcookie != NULL) {
hook_disestablish(mifp->if_addrhooks, sc->sc_ahcookie);
@@ -219,14 +243,9 @@ vxlan_multicast_cleanup(struct ifnet *if
sc->sc_dhcookie);
sc->sc_dhcookie = NULL;
}
-
-   if_put(mifp);
}
 
-   if (imo->imo_num_memberships > 0) {
-   in_delmulti(imo->imo_membership[--imo->imo_num_memberships]);
-   imo->imo_ifidx = 0;
-   }
+   if_put(mifp);
 }
 
 int
@@ -234,55 +253,140 @@ vxlan_multicast_join(struct ifnet *ifp, 
 struct sockaddr *dst)
 {
struct vxlan_softc  *sc = ifp->if_softc;
-   struct ip_moptions  *imo = &sc->sc_imo;
+   struct ip_moptions  *imo;
+   struct ip6_moptions *im6o;
+   struct in6_multi_mship  *imm;
struct sockaddr_in  *src4, *dst4;
 #ifdef INET6
-   struct

Re: add in6 multicast support to vxlan(4) ; question on mbufs

2016-11-01 Thread Vincent Gross
On Tue, 4 Oct 2016 01:07:51 +0200
Vincent Gross <vgr...@openbsd.org> wrote:

> On Sat, 24 Sep 2016 10:58:10 +0200
> Vincent Gross <vgr...@openbsd.org> wrote:
> 
> > Hi,
> >   
> [snip]
> > 
> > Aside from the mbuf issue, is this Ok ?  
> 
> I will come back to the mbuf stuff later.
> 
> Diff rebased, ok anyone ?
> 

New rebase, tested on amd64 and macppc, Ok ?

Index: sys/net/if_vxlan.c
===
RCS file: /cvs/src/sys/net/if_vxlan.c,v
retrieving revision 1.51
diff -u -p -r1.51 if_vxlan.c
--- sys/net/if_vxlan.c  25 Oct 2016 16:31:08 -  1.51
+++ sys/net/if_vxlan.c  1 Nov 2016 17:18:15 -
@@ -47,6 +47,8 @@
 #include 
 #include 
 
+#include 
+
 #if NPF > 0
 #include 
 #endif
@@ -61,7 +63,12 @@ struct vxlan_softc {
struct arpcom    sc_ac;
struct ifmedia   sc_media;
 
-   struct ip_moptions   sc_imo;
+   union {
+   struct ip_moptions   u_imo;
+   struct ip6_moptions  u_imo6;
+   } sc_imu;
+#define sc_imo sc_imu.u_imo
+#define sc_im6o sc_imu.u_imo6
void *sc_ahcookie;
void *sc_lhcookie;
void *sc_dhcookie;
@@ -129,10 +136,6 @@ vxlan_clone_create(struct if_clone *ifc,
M_DEVBUF, M_NOWAIT|M_ZERO)) == NULL)
return (ENOMEM);
 
-   sc->sc_imo.imo_membership = malloc(
-   (sizeof(struct in_multi *) * IP_MIN_MEMBERSHIPS), M_IPMOPTS,
-   M_WAITOK|M_ZERO);
-   sc->sc_imo.imo_max_memberships = IP_MIN_MEMBERSHIPS;
sc->sc_dstport = htons(VXLAN_PORT);
sc->sc_vnetid = VXLAN_VNI_UNSET;
 
@@ -190,7 +193,6 @@ vxlan_clone_destroy(struct ifnet *ifp)
ifmedia_delete_instance(&sc->sc_media, IFM_INST_ANY);
ether_ifdetach(ifp);
if_detach(ifp);
-   free(sc->sc_imo.imo_membership, M_IPMOPTS, 0);
free(sc, M_DEVBUF, sizeof(*sc));
 
return (0);
@@ -199,11 +201,33 @@ vxlan_clone_destroy(struct ifnet *ifp)
 void
 vxlan_multicast_cleanup(struct ifnet *ifp)
 {
-   struct vxlan_softc  *sc = (struct vxlan_softc *)ifp->if_softc;
-   struct ip_moptions  *imo = &sc->sc_imo;
-   struct ifnet        *mifp;
+   struct vxlan_softc   *sc = (struct vxlan_softc *)ifp->if_softc;
+   struct ip_moptions   *imo;
+   struct in_multi **imm;
+   struct ip6_moptions  *im6o;
+   struct in6_multi_mship   *im6m, *im6m_next;
+   struct ifnet *mifp = NULL;
+
+   switch (sc->sc_dst.ss_family) {
+   case AF_INET:
+   imo = &sc->sc_imo;
+   mifp = if_get(imo->imo_ifidx);
+   imm = imo->imo_membership;
+   while (imo->imo_num_memberships > 0)
+   in_delmulti(imm[--imo->imo_num_memberships]);
+   free(imm, M_IPMOPTS,
+   sizeof(struct in_multi *) * imo->imo_num_memberships);
+   break;
+   case AF_INET6:
+   im6o = &sc->sc_im6o;
+   mifp = if_get(im6o->im6o_ifidx);
+   LIST_FOREACH_SAFE(im6m, &im6o->im6o_memberships, i6mm_chain,
+   im6m_next)
+   in6_leavegroup(im6m);
+   break;
+   }
+   bzero(&sc->sc_imu, sizeof(sc->sc_imu));
 
-   mifp = if_get(imo->imo_ifidx);
if (mifp != NULL) {
if (sc->sc_ahcookie != NULL) {
hook_disestablish(mifp->if_addrhooks, sc->sc_ahcookie);
@@ -219,14 +243,9 @@ vxlan_multicast_cleanup(struct ifnet *if
sc->sc_dhcookie);
sc->sc_dhcookie = NULL;
}
-
-   if_put(mifp);
}
 
-   if (imo->imo_num_memberships > 0) {
-   in_delmulti(imo->imo_membership[--imo->imo_num_memberships]);
-   imo->imo_ifidx = 0;
-   }
+   if_put(mifp);
 }
 
 int
@@ -234,55 +253,140 @@ vxlan_multicast_join(struct ifnet *ifp, 
 struct sockaddr *dst)
 {
struct vxlan_softc  *sc = ifp->if_softc;
-   struct ip_moptions  *imo = &sc->sc_imo;
+   struct ip_moptions  *imo;
+   struct ip6_moptions *im6o;
+   struct in6_multi_mship  *imm;
struct sockaddr_in  *src4, *dst4;
 #ifdef INET6
-   struct sockaddr_in6 *dst6;
+   struct sockaddr_in6 *src6, *dst6;
 #endif /* INET6 */
struct ifaddr   *ifa;
-   struct ifnet*mifp;
+   struct ifnet*mifp = NULL;
+   struct rtentry  *rt;
+   int  error;
 
switch (dst->sa_family) {
case AF_INET:
dst4 = satosin(dst);
+   src4 = satosin(src);
if (!IN_MULTICAST

Re: add in6 multicast support to vxlan(4) ; question on mbufs

2016-10-03 Thread Vincent Gross
On Sat, 24 Sep 2016 10:58:10 +0200
Vincent Gross <vgr...@openbsd.org> wrote:

> Hi,
> 
[snip]
> 
> Aside from the mbuf issue, is this Ok ?

I will come back to the mbuf stuff later.

Diff rebased, ok anyone ?

Index: net/if_vxlan.c
===
RCS file: /cvs/src/sys/net/if_vxlan.c,v
retrieving revision 1.48
diff -u -p -r1.48 if_vxlan.c
--- net/if_vxlan.c  30 Sep 2016 10:22:05 -  1.48
+++ net/if_vxlan.c  3 Oct 2016 23:12:42 -
@@ -47,6 +47,8 @@
 #include 
 #include 
 
+#include 
+
 #if NPF > 0
 #include 
 #endif
@@ -61,7 +63,12 @@ struct vxlan_softc {
struct arpcom    sc_ac;
struct ifmedia   sc_media;
 
-   struct ip_moptions   sc_imo;
+   union {
+   struct ip_moptions   u_imo;
+   struct ip6_moptions  u_imo6;
+   } sc_imu;
+#define sc_imo sc_imu.u_imo
+#define sc_im6o sc_imu.u_imo6
void *sc_ahcookie;
void *sc_lhcookie;
void *sc_dhcookie;
@@ -129,10 +136,6 @@ vxlan_clone_create(struct if_clone *ifc,
M_DEVBUF, M_NOWAIT|M_ZERO)) == NULL)
return (ENOMEM);
 
-   sc->sc_imo.imo_membership = malloc(
-   (sizeof(struct in_multi *) * IP_MIN_MEMBERSHIPS), M_IPMOPTS,
-   M_WAITOK|M_ZERO);
-   sc->sc_imo.imo_max_memberships = IP_MIN_MEMBERSHIPS;
sc->sc_dstport = htons(VXLAN_PORT);
sc->sc_vnetid = VXLAN_VNI_UNSET;
 
@@ -190,7 +193,6 @@ vxlan_clone_destroy(struct ifnet *ifp)
ifmedia_delete_instance(&sc->sc_media, IFM_INST_ANY);
ether_ifdetach(ifp);
if_detach(ifp);
-   free(sc->sc_imo.imo_membership, M_IPMOPTS, 0);
free(sc, M_DEVBUF, sizeof(*sc));
 
return (0);
@@ -199,11 +201,33 @@ vxlan_clone_destroy(struct ifnet *ifp)
 void
 vxlan_multicast_cleanup(struct ifnet *ifp)
 {
-   struct vxlan_softc  *sc = (struct vxlan_softc *)ifp->if_softc;
-   struct ip_moptions  *imo = &sc->sc_imo;
-   struct ifnet        *mifp;
+   struct vxlan_softc   *sc = (struct vxlan_softc *)ifp->if_softc;
+   struct ip_moptions   *imo;
+   struct in_multi **imm;
+   struct ip6_moptions  *im6o;
+   struct in6_multi_mship   *im6m, *im6m_next;
+   struct ifnet *mifp = NULL;
+
+   switch (sc->sc_dst.ss_family) {
+   case AF_INET:
+   imo = &sc->sc_imo;
+   mifp = if_get(imo->imo_ifidx);
+   imm = imo->imo_membership;
+   while (imo->imo_num_memberships > 0)
+   in_delmulti(imm[--imo->imo_num_memberships]);
+   free(imm, M_IPMOPTS,
+   sizeof(struct in_multi *) * imo->imo_num_memberships);
+   break;
+   case AF_INET6:
+   im6o = &sc->sc_im6o;
+   mifp = if_get(im6o->im6o_ifidx);
+   LIST_FOREACH_SAFE(im6m, &im6o->im6o_memberships, i6mm_chain,
+   im6m_next)
+   in6_leavegroup(im6m);
+   break;
+   }
+   bzero(&sc->sc_imu, sizeof(sc->sc_imu));
 
-   mifp = if_get(imo->imo_ifidx);
if (mifp != NULL) {
if (sc->sc_ahcookie != NULL) {
hook_disestablish(mifp->if_addrhooks, sc->sc_ahcookie);
@@ -219,14 +243,9 @@ vxlan_multicast_cleanup(struct ifnet *if
sc->sc_dhcookie);
sc->sc_dhcookie = NULL;
}
-
-   if_put(mifp);
}
 
-   if (imo->imo_num_memberships > 0) {
-   in_delmulti(imo->imo_membership[--imo->imo_num_memberships]);
-   imo->imo_ifidx = 0;
-   }
+   if_put(mifp);
 }
 
 int
@@ -234,47 +253,136 @@ vxlan_multicast_join(struct ifnet *ifp, 
 struct sockaddr *dst)
 {
struct vxlan_softc  *sc = ifp->if_softc;
-   struct ip_moptions  *imo = &sc->sc_imo;
+   struct ip_moptions  *imo;
+   struct ip6_moptions *im6o;
+   struct in6_multi_mship  *imm;
struct sockaddr_in  *src4, *dst4;
-   struct sockaddr_in6 *dst6;
+   struct sockaddr_in6 *src6, *dst6;
struct ifaddr   *ifa;
-   struct ifnet*mifp;
+   struct ifnet*mifp = NULL;
+   struct rtentry  *rt;
+   int  error;
 
-   if (dst->sa_family == AF_INET) {
+   switch (dst->sa_family) {
+   case AF_INET:
dst4 = satosin(dst);
+   src4 = satosin(src);
if (!IN_MULTICAST(dst4->sin_addr.s_addr))
return (0);
-   } else if (dst->sa_family == AF_INET6) {
+   if (src4->sin_addr.s_addr == INADDR_ANY ||
+   IN_MULTICAST(src4->sin_addr.s_addr))
+

Re: iked recvfromto flags

2016-09-26 Thread Vincent Gross
On Mon, 26 Sep 2016 18:33:43 +0200
j...@wxcvbn.org (Jeremie Courreges-Anglas) wrote:

> Don't ignore the "flags" argument passed to recvfromto.  Doesn't
> matter for now in iked (0 is passed), but this kind of code tends to
> be copied.
> 
> ok?
> 

ok vgross@

> 
> Index: util.c
> ===
> RCS file: /cvs/src/sbin/iked/util.c,v
> retrieving revision 1.31
> diff -u -p -p -u -r1.31 util.c
> --- util.c 4 Sep 2016 10:26:02 -   1.31
> +++ util.c 26 Sep 2016 16:32:56 -
> @@ -366,7 +366,7 @@ recvfromto(int s, void *buf, size_t len,
>   msg.msg_control = &cmsgbuf.buf;
>   msg.msg_controllen = sizeof(cmsgbuf.buf);
>  
> - if ((ret = recvmsg(s, &msg, 0)) == -1)
> + if ((ret = recvmsg(s, &msg, flags)) == -1)
>   return (-1);
>  
>   *fromlen = from->sa_len;
> 



add in6 multicast support to vxlan(4) ; question on mbufs

2016-09-24 Thread Vincent Gross
Hi,

As said in Subject:.

I would like to get comments on the m_adj/m_pullup dance at the end of
vxlan_lookup(). I do this because ether_input() accesses the Ethernet header
with mtod(), and under some conditions the mbuf being handled has its
first data chunk empty (mh_len == 0). What is the rule of thumb
regarding m_pullup/mtod use versus m_copydata ?
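
To make the question concrete, here is a toy userland model of the two
access styles. The struct and the x_-prefixed helpers are hypothetical
stand-ins, not the kernel's mbuf API. The sketch only assumes that an
mtod()-style pointer is safe when the first buffer already holds the
bytes contiguously, whereas an m_copydata()-style copy walks the chain
and therefore does not care about an empty first chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy buffer chain: a made-up stand-in for an mbuf chain. */
struct x_mbuf {
	struct x_mbuf *next;
	const unsigned char *data;
	size_t len;
};

/*
 * mtod()-style access: hand out a pointer into the first buffer, but
 * only if it holds at least `want` contiguous bytes; NULL otherwise.
 */
const unsigned char *
x_mtod_checked(const struct x_mbuf *m, size_t want)
{
	if (m == NULL || m->len < want)
		return NULL;
	return m->data;
}

/*
 * m_copydata()-style access: copy `len` bytes starting at chain offset
 * `off` into `dst`, skipping empty or fully consumed buffers.
 * Returns 0 on success, -1 if the chain is too short.
 */
int
x_copydata(const struct x_mbuf *m, size_t off, size_t len,
    unsigned char *dst)
{
	assert(dst != NULL);

	while (len > 0) {
		size_t n;

		if (m == NULL)
			return -1;
		if (off >= m->len) {	/* nothing to take from this one */
			off -= m->len;
			m = m->next;
			continue;
		}
		n = m->len - off;
		if (n > len)
			n = len;
		memcpy(dst, m->data + off, n);
		dst += n;
		len -= n;
		off = 0;
		m = m->next;
	}
	return 0;
}
```

In this model a chain whose first buffer has len == 0 makes
x_mtod_checked() fail while x_copydata() still succeeds, which is the
situation the m_pullup call is there to handle before mtod() is used.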

Aside from the mbuf issue, is this Ok ?

Index: net/if_vxlan.c
===
RCS file: /cvs/src/sys/net/if_vxlan.c,v
retrieving revision 1.44
diff -u -p -r1.44 if_vxlan.c
--- net/if_vxlan.c  4 Sep 2016 11:14:44 -   1.44
+++ net/if_vxlan.c  24 Sep 2016 08:37:22 -
@@ -47,6 +47,8 @@
 #include 
 #include 
 
+#include 
+
 #if NPF > 0
 #include 
 #endif
@@ -61,7 +63,12 @@ struct vxlan_softc {
struct arpcom    sc_ac;
struct ifmedia   sc_media;
 
-   struct ip_moptions   sc_imo;
+   union {
+   struct ip_moptions   u_imo;
+   struct ip6_moptions  u_imo6;
+   } sc_imu;
+#define sc_imo sc_imu.u_imo
+#define sc_im6o sc_imu.u_imo6
void *sc_ahcookie;
void *sc_lhcookie;
void *sc_dhcookie;
@@ -129,10 +136,6 @@ vxlan_clone_create(struct if_clone *ifc,
M_DEVBUF, M_NOWAIT|M_ZERO)) == NULL)
return (ENOMEM);
 
-   sc->sc_imo.imo_membership = malloc(
-   (sizeof(struct in_multi *) * IP_MIN_MEMBERSHIPS), M_IPMOPTS,
-   M_WAITOK|M_ZERO);
-   sc->sc_imo.imo_max_memberships = IP_MIN_MEMBERSHIPS;
sc->sc_dstport = htons(VXLAN_PORT);
sc->sc_vnetid = VXLAN_VNI_UNSET;
 
@@ -190,7 +193,6 @@ vxlan_clone_destroy(struct ifnet *ifp)
ifmedia_delete_instance(&sc->sc_media, IFM_INST_ANY);
ether_ifdetach(ifp);
if_detach(ifp);
-   free(sc->sc_imo.imo_membership, M_IPMOPTS, 0);
free(sc, M_DEVBUF, sizeof(*sc));
 
return (0);
@@ -199,11 +201,33 @@ vxlan_clone_destroy(struct ifnet *ifp)
 void
 vxlan_multicast_cleanup(struct ifnet *ifp)
 {
-   struct vxlan_softc  *sc = (struct vxlan_softc *)ifp->if_softc;
-   struct ip_moptions  *imo = &sc->sc_imo;
-   struct ifnet        *mifp;
+   struct vxlan_softc   *sc = (struct vxlan_softc *)ifp->if_softc;
+   struct ip_moptions   *imo;
+   struct in_multi **imm;
+   struct ip6_moptions  *im6o;
+   struct in6_multi_mship   *im6m, *im6m_next;
+   struct ifnet *mifp = NULL;
+
+   switch (sc->sc_dst.ss_family) {
+   case AF_INET:
+   imo = &sc->sc_imo;
+   mifp = if_get(imo->imo_ifidx);
+   imm = imo->imo_membership;
+   while (imo->imo_num_memberships > 0)
+   in_delmulti(imm[--imo->imo_num_memberships]);
+   free(imm, M_IPMOPTS,
+   sizeof(struct in_multi *) * imo->imo_num_memberships);
+   break;
+   case AF_INET6:
+   im6o = &sc->sc_im6o;
+   mifp = if_get(im6o->im6o_ifidx);
+   LIST_FOREACH_SAFE(im6m, &im6o->im6o_memberships, i6mm_chain,
+   im6m_next)
+   in6_leavegroup(im6m);
+   break;
+   }
+   bzero(&sc->sc_imu, sizeof(sc->sc_imu));
 
-   mifp = if_get(imo->imo_ifidx);
if (mifp != NULL) {
if (sc->sc_ahcookie != NULL) {
hook_disestablish(mifp->if_addrhooks, sc->sc_ahcookie);
@@ -219,14 +243,9 @@ vxlan_multicast_cleanup(struct ifnet *if
sc->sc_dhcookie);
sc->sc_dhcookie = NULL;
}
-
-   if_put(mifp);
}
 
-   if (imo->imo_num_memberships > 0) {
-   in_delmulti(imo->imo_membership[--imo->imo_num_memberships]);
-   imo->imo_ifidx = 0;
-   }
+   if_put(mifp);
 }
 
 int
@@ -234,47 +253,136 @@ vxlan_multicast_join(struct ifnet *ifp, 
 struct sockaddr *dst)
 {
struct vxlan_softc  *sc = ifp->if_softc;
-   struct ip_moptions  *imo = &sc->sc_imo;
+   struct ip_moptions  *imo;
+   struct ip6_moptions *im6o;
+   struct in6_multi_mship  *imm;
struct sockaddr_in  *src4, *dst4;
-   struct sockaddr_in6 *dst6;
+   struct sockaddr_in6 *src6, *dst6;
struct ifaddr   *ifa;
-   struct ifnet*mifp;
+   struct ifnet*mifp = NULL;
+   struct rtentry  *rt;
+   int  error;
 
-   if (dst->sa_family == AF_INET) {
+   switch (dst->sa_family) {
+   case AF_INET:
dst4 = satosin(dst);
+   src4 = satosin(src);
if (!IN_MULTICAST(dst4->sin_addr.s_addr))
return (0);
-   } else if (dst->sa_family == AF_INET6) {
+   if (src4->sin_addr.s_addr == INADDR_ANY 

Re: timeout_set_proc(9)

2016-09-16 Thread Vincent Gross
On Thu, 15 Sep 2016 16:29:45 +0200
Martin Pieuchot  wrote:

> After discussing with a few people about a new "timed task" API I came
> to the conclusion that mixing timeouts and tasks will result in:
> 
>   - always including a 'struct timeout' in a 'struct task', or the
> other way around
> or
>   
>   - introducing a new data structure, hence API.
> 
> Since I'd like to keep the change as small as possible when converting
> existing timeout_set(9), neither option seems a good fit.  So I decided
> to add a new kernel thread, curiously named "softclock", that will
> offer its stack to the poor timeout handlers that need one.
> 
> With this approach, converting a timeout is just a matter of doing:
> 
>   s/timeout_set/timeout_set_proc/
> 
> 
> Diff below includes the conversions I need for the "netlock".  I'm
> waiting for feedback and a better name to document the new function.
> 
> Comments?

Reads OK; I like the simple renaming.

The "softclock" thread name will be confusing: the timeouts are indeed
driven by the softclock interrupt, but the tasks have nothing to do
with softclock. Maybe "timeothread" ?

Will this new thread stay, or is it only to ease the transition to MP
networking ?
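
For what it's worth, this is how I read the proposal, as a userland
sketch. Everything here is an assumption made for illustration: the
x_-prefixed names, the flag and the single deferred queue are invented,
and no locking or real interrupt context is modelled. Timeouts set with
the plain setter run directly when they fire; timeouts set with the
_proc variant are only queued at that point and run later from a thread
that can lend them a stack.

```c
#include <assert.h>
#include <stddef.h>

#define X_TO_NEEDPROC 0x1	/* hypothetical "needs process context" flag */

struct x_timeout {
	void (*fn)(void *);
	void *arg;
	int flags;
	struct x_timeout *next;	/* link on the deferred queue */
};

/* Deferred queue shared by the "interrupt" and "thread" sides. */
struct x_timeout *x_deferred_head;
struct x_timeout **x_deferred_tail = &x_deferred_head;

void
x_timeout_set(struct x_timeout *to, void (*fn)(void *), void *arg)
{
	assert(to != NULL && fn != NULL);
	to->fn = fn;
	to->arg = arg;
	to->flags = 0;
	to->next = NULL;
}

/* Same as x_timeout_set(), but the handler wants a process context. */
void
x_timeout_set_proc(struct x_timeout *to, void (*fn)(void *), void *arg)
{
	x_timeout_set(to, fn, arg);
	to->flags |= X_TO_NEEDPROC;
}

/* "Interrupt" side: called when a timeout expires. */
void
x_softclock_fire(struct x_timeout *to)
{
	if (to->flags & X_TO_NEEDPROC) {
		to->next = NULL;
		*x_deferred_tail = to;
		x_deferred_tail = &to->next;
	} else
		to->fn(to->arg);
}

/* "Thread" side: run every deferred handler, return how many ran. */
int
x_softclock_thread_run(void)
{
	int n = 0;

	while (x_deferred_head != NULL) {
		struct x_timeout *to = x_deferred_head;

		x_deferred_head = to->next;
		if (x_deferred_head == NULL)
			x_deferred_tail = &x_deferred_head;
		to->next = NULL;
		to->fn(to->arg);
		n++;
	}
	return n;
}

/* Sample handler for experiments: increments the int behind arg. */
void
x_count(void *arg)
{
	++*(int *)arg;
}
```

In this model converting a handler really is just swapping
x_timeout_set() for x_timeout_set_proc(); the caller and the handler
signature stay the same.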

> 
> Index: net/if_pflow.c
> ===
> RCS file: /cvs/src/sys/net/if_pflow.c,v
> retrieving revision 1.61
> diff -u -p -r1.61 if_pflow.c
> --- net/if_pflow.c 29 Apr 2016 08:55:03 -  1.61
> +++ net/if_pflow.c 15 Sep 2016 14:19:10 -
> @@ -548,15 +548,16 @@ pflow_init_timeouts(struct pflow_softc *
>   if (timeout_initialized(&sc->sc_tmo_tmpl))
>   timeout_del(&sc->sc_tmo_tmpl);
>   if (!timeout_initialized(&sc->sc_tmo))
> - timeout_set(&sc->sc_tmo, pflow_timeout, sc);
> + timeout_set_proc(&sc->sc_tmo, pflow_timeout, sc);
>   break;
>   case PFLOW_PROTO_10:
>   if (!timeout_initialized(&sc->sc_tmo_tmpl))
> - timeout_set(&sc->sc_tmo_tmpl, pflow_timeout_tmpl, sc);
> + timeout_set_proc(&sc->sc_tmo_tmpl, pflow_timeout_tmpl,
> + sc);
>   if (!timeout_initialized(&sc->sc_tmo))
> - timeout_set(&sc->sc_tmo, pflow_timeout, sc);
> + timeout_set_proc(&sc->sc_tmo, pflow_timeout, sc);
>   if (!timeout_initialized(&sc->sc_tmo6))
> - timeout_set(&sc->sc_tmo6, pflow_timeout6, sc);
> + timeout_set_proc(&sc->sc_tmo6, pflow_timeout6, sc);
>  
>   timeout_add_sec(&sc->sc_tmo_tmpl, PFLOW_TMPL_TIMEOUT);
>   break;
> Index: net/if_pfsync.c
> ===
> RCS file: /cvs/src/sys/net/if_pfsync.c,v
> retrieving revision 1.231
> diff -u -p -r1.231 if_pfsync.c
> --- net/if_pfsync.c   15 Sep 2016 02:00:18 -  1.231
> +++ net/if_pfsync.c   15 Sep 2016 14:19:10 -
> @@ -328,9 +328,9 @@ pfsync_clone_create(struct if_clone *ifc
>   IFQ_SET_MAXLEN(&ifp->if_snd, IFQ_MAXLEN);
>   ifp->if_hdrlen = sizeof(struct pfsync_header);
>   ifp->if_mtu = ETHERMTU;
> - timeout_set(&sc->sc_tmo, pfsync_timeout, sc);
> - timeout_set(&sc->sc_bulk_tmo, pfsync_bulk_update, sc);
> - timeout_set(&sc->sc_bulkfail_tmo, pfsync_bulk_fail, sc);
> + timeout_set_proc(&sc->sc_tmo, pfsync_timeout, sc);
> + timeout_set_proc(&sc->sc_bulk_tmo, pfsync_bulk_update, sc);
> + timeout_set_proc(&sc->sc_bulkfail_tmo, pfsync_bulk_fail, sc);
>  
>   if_attach(ifp);
>   if_alloc_sadl(ifp);
> @@ -1723,7 +1723,7 @@ pfsync_defer(struct pf_state *st, struct
>   sc->sc_deferred++;
>   TAILQ_INSERT_TAIL(&sc->sc_deferrals, pd, pd_entry);
>  
> - timeout_set(&pd->pd_tmo, pfsync_defer_tmo, pd);
> + timeout_set_proc(&pd->pd_tmo, pfsync_defer_tmo, pd);
>   timeout_add_msec(&pd->pd_tmo, 20);
>  
>   schednetisr(NETISR_PFSYNC);
> Index: netinet/ip_carp.c
> ===
> RCS file: /cvs/src/sys/netinet/ip_carp.c,v
> retrieving revision 1.293
> diff -u -p -r1.293 ip_carp.c
> --- netinet/ip_carp.c 25 Jul 2016 16:44:04 -  1.293
> +++ netinet/ip_carp.c 15 Sep 2016 14:19:11 -
> @@ -831,9 +831,9 @@ carp_new_vhost(struct carp_softc *sc, in
>   vhe->vhid = vhid;
>   vhe->advskew = advskew;
>   vhe->state = INIT;
> - timeout_set(&vhe->ad_tmo, carp_send_ad, vhe);
> - timeout_set(&vhe->md_tmo, carp_master_down, vhe);
> - timeout_set(&vhe->md6_tmo, carp_master_down, vhe);
> + timeout_set_proc(&vhe->ad_tmo, carp_send_ad, vhe);
> + timeout_set_proc(&vhe->md_tmo, carp_master_down, vhe);
> + timeout_set_proc(&vhe->md6_tmo, carp_master_down, vhe);
>  
>   KERNEL_ASSERT_LOCKED(); /* touching carp_vhosts */
>  
> Index: netinet/tcp_timer.h
> ===
> RCS file: /cvs/src/sys/netinet/tcp_timer.h,v
> retrieving revision 1.13
> diff -u -p -r1.13 tcp_timer.h
> --- netinet/tcp_timer.h   6 Jul 2011 23:44:20 

Re: ip6_setpktopt: dead code & param

2016-09-13 Thread Vincent Gross
On Tue, 13 Sep 2016 14:19:24 +0200
j...@wxcvbn.org (Jeremie Courreges-Anglas) wrote:

> Since it has been introduced, ip6_setpktopt has only been called with
> (sticky=1, cmsg=0) or (sticky=0, cmsg=1).  Let's simplify this code.

Ok vgross@

> 
> 
> Index: ip6_output.c
> ===
> RCS file: /cvs/src/sys/netinet6/ip6_output.c,v
> retrieving revision 1.213
> diff -u -p -p -u -r1.213 ip6_output.c
> --- ip6_output.c  25 Aug 2016 12:30:16 -  1.213
> +++ ip6_output.c  13 Sep 2016 11:56:19 -
> @@ -119,8 +119,7 @@ struct ip6_exthdrs {
>  int ip6_pcbopt(int, u_char *, int, struct ip6_pktopts **, int, int);
>  int ip6_pcbopts(struct ip6_pktopts **, struct mbuf *, struct socket *);
>  int ip6_getpcbopt(struct ip6_pktopts *, int, struct mbuf **);
> -int ip6_setpktopt(int, u_char *, int, struct ip6_pktopts *, int, int,
> - int, int);
> +int ip6_setpktopt(int, u_char *, int, struct ip6_pktopts *, int, int, int);
>  int ip6_setmoptions(int, struct ip6_moptions **, struct mbuf *);
>  int ip6_getmoptions(int, struct ip6_moptions *, struct mbuf **);
>  int ip6_copyexthdr(struct mbuf **, caddr_t, int);
> @@ -1770,7 +1769,7 @@ ip6_pcbopt(int optname, u_char *buf, int
>   }
>   opt = *pktopt;
>  
> - return (ip6_setpktopt(optname, buf, len, opt, priv, 1, 0, uproto));
> + return (ip6_setpktopt(optname, buf, len, opt, priv, 1, uproto));
>  }
>  
>  int
> @@ -2352,7 +2351,7 @@ ip6_setpktopts(struct mbuf *control, str
>   return (EINVAL);
>   if (cm->cmsg_level == IPPROTO_IPV6) {
>   error = ip6_setpktopt(cm->cmsg_type, CMSG_DATA(cm),
> - cm->cmsg_len - CMSG_LEN(0), opt, priv, 0, 1, uproto);
> + cm->cmsg_len - CMSG_LEN(0), opt, priv, 0, uproto);
>   if (error)
>   return (error);
>   }
> @@ -2367,39 +2366,12 @@ ip6_setpktopts(struct mbuf *control, str
>  /*
>   * Set a particular packet option, as a sticky option or an
> ancillary data
>   * item.  "len" can be 0 only when it's a sticky option.
> - * We have 4 cases of combination of "sticky" and "cmsg":
> - * "sticky=0, cmsg=0": impossible
> - * "sticky=0, cmsg=1": RFC2292 or RFC3542 ancillary data
> - * "sticky=1, cmsg=0": RFC3542 socket option
> - * "sticky=1, cmsg=1": RFC2292 socket option
>   */
>  int
>  ip6_setpktopt(int optname, u_char *buf, int len, struct ip6_pktopts
> *opt,
> -int priv, int sticky, int cmsg, int uproto)
> +int priv, int sticky, int uproto)
>  {
>   int minmtupolicy;
> -
> - if (!sticky && !cmsg) {
> -#ifdef DIAGNOSTIC
> - printf("ip6_setpktopt: impossible case\n");
> -#endif
> - return (EINVAL);
> - }
> -
> - if (sticky && cmsg) {
> - switch (optname) {
> - case IPV6_PKTINFO:
> - case IPV6_HOPLIMIT:
> - case IPV6_HOPOPTS:
> - case IPV6_DSTOPTS:
> - case IPV6_RTHDRDSTOPTS:
> - case IPV6_RTHDR:
> - case IPV6_USE_MIN_MTU:
> - case IPV6_DONTFRAG:
> - case IPV6_TCLASS:
> - return (ENOPROTOOPT);
> - }
> - }
>  
>   switch (optname) {
>   case IPV6_PKTINFO:
> 



Re: rwsleep(9)

2016-09-13 Thread Vincent Gross
On Tue, 13 Sep 2016 10:08:13 +0200
Martin Pieuchot <m...@openbsd.org> wrote:

> On 12/09/16(Mon) 12:12, Vincent Gross wrote:
> > On Mon, 12 Sep 2016 10:49:03 +0200
> > Martin Pieuchot <m...@openbsd.org> wrote:
> >   
> > > I'd like to use a write lock to serialize accesses to ip_output().
> > > This will be used to guarantee that atomic code sections in the
> > > socket layer stay atomic when the input/forwarding path won't run
> > > under KERNEL_LOCK().
> > > 
> > > For such purpose I'll have to convert some tsleep(9) to an
> > > msleep(9)-like function operating on a write lock.  That's why I'd
> > > like to introduce rwsleep(9).  I did not bother exporting a read
> > > variant of this function since I don't need it for the moment.
> > > 
> > > ok?  
> > 
> > MP noob here :
> > 
> > tsleep() and msleep() check if they are called during
> > autoconfiguration or after a panic to let interrupts run. There is
> > no such check here. I get that rwsleep() during autoconf makes
> > little sense, but to err on the safe side maybe add some kind of
> > assert (if it is not too much of a pain) ? and what about panic,
> > shouldn't this be handled ?  
> 
> This is not a MP problem but an old BSD heritage.  I don't mind adding
> it.  But that's not a real solution to panic being broken with sleep
> or locks.
> 

"old BSD heritage" -> 'nuff said. No need to spread the rot then.

ok vgross@



Re: rwsleep(9)

2016-09-12 Thread Vincent Gross
On Mon, 12 Sep 2016 10:49:03 +0200
Martin Pieuchot wrote:

> I'd like to use a write lock to serialize accesses to ip_output().
> This will be used to guarantee that atomic code sections in the
> socket layer stay atomic when the input/forwarding path won't run
> under KERNEL_LOCK().
> 
> For such purpose I'll have to convert some tsleep(9) to an
> msleep(9)-like function operating on a write lock.  That's why I'd
> like to introduce rwsleep(9).  I did not bother exporting a read
> variant of this function since I don't need it for the moment.
> 
> ok?

MP noob here:

tsleep() and msleep() check whether they are called during
autoconfiguration or after a panic, and in that case they only give
interrupts a chance to run before returning. There is no such check
here. I get that calling rwsleep() during autoconf makes little sense,
but to err on the safe side, maybe add some kind of assert (if it is
not too much of a pain)? And what about panic, shouldn't that be
handled?
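
To make the question concrete, here is a user-space sketch of the kind
of guard I have in mind. It is only a model, not the kernel code: the
parameters cold and panicstr stand in for the kernel globals of the
same name, and sleep_guard() and the enum are hypothetical names made
up for illustration.

```c
#include <stddef.h>

/* What a sleep function should do once the guard has been evaluated. */
enum sleep_action {
	SLEEP_NORMALLY,		/* queue the process and switch away */
	SLEEP_RETURN_EARLY	/* let interrupts run, then return 0 */
};

/*
 * Hypothetical helper modelling the early-out test.
 * cold is nonzero during autoconfiguration, panicstr is non-NULL
 * once the system has panicked.
 */
static enum sleep_action
sleep_guard(int cold, const char *panicstr)
{
	if (cold || panicstr != NULL)
		return SLEEP_RETURN_EARLY;
	return SLEEP_NORMALLY;
}
```

An assert-based variant would instead refuse to run at all when the
guard returns SLEEP_RETURN_EARLY, which is what I was suggesting for
the autoconf case.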

> 
> Index: sys/kern/kern_synch.c
> ===
> RCS file: /cvs/src/sys/kern/kern_synch.c,v
> retrieving revision 1.134
> diff -u -p -r1.134 kern_synch.c
> --- sys/kern/kern_synch.c 3 Sep 2016 15:06:06 -
> 1.134 +++ sys/kern/kern_synch.c   12 Sep 2016 08:41:23 -
> @@ -226,6 +226,40 @@ msleep(const volatile void *ident, struc
>   return (error);
>  }
>  
> +/*
> + * Same as tsleep, but if we have a rwlock provided, then once we've
> + * entered the sleep queue we drop it. After sleeping we re-lock.
> + */
> +int
> +rwsleep(const volatile void *ident, struct rwlock *wl, int priority,
> +const char *wmesg, int timo)
> +{
> + struct sleep_state sls;
> + int error, error1;
> +
> + KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
> + rw_assert_wrlock(wl);
> +
> + sleep_setup(&sls, ident, priority, wmesg);
> + sleep_setup_timeout(&sls, timo);
> + sleep_setup_signal(&sls, priority);
> +
> + rw_exit_write(wl);
> +
> + sleep_finish(&sls, 1);
> + error1 = sleep_finish_timeout(&sls);
> + error = sleep_finish_signal(&sls);
> +
> + if ((priority & PNORELOCK) == 0)
> + rw_enter_write(wl);
> +
> + /* Signal errors are higher priority than timeouts. */
> + if (error == 0 && error1 != 0)
> + error = error1;
> +
> + return (error);
> +}
> +
>  void
>  sleep_setup(struct sleep_state *sls, const volatile void *ident, int
> prio, const char *wmesg)
> Index: sys/sys/systm.h
> ===
> RCS file: /cvs/src/sys/sys/systm.h,v
> retrieving revision 1.116
> diff -u -p -r1.116 systm.h
> --- sys/sys/systm.h   4 Sep 2016 09:22:29 -   1.116
> +++ sys/sys/systm.h   12 Sep 2016 08:35:26 -
> @@ -246,11 +246,13 @@ int sleep_finish_signal(struct sleep_sta
>  void sleep_queue_init(void);
>  
>  struct mutex;
> +struct rwlock;
>  voidwakeup_n(const volatile void *, int);
>  voidwakeup(const volatile void *);
>  #define wakeup_one(c) wakeup_n((c), 1)
>  int  tsleep(const volatile void *, int, const char *, int);
>  int  msleep(const volatile void *, struct mutex *, int,  const char*, int);
> +int rwsleep(const volatile void *, struct rwlock *, int, const char *, int);
>  void  yield(void);
>  
>  void wdog_register(int (*)(void *, int), void *);
> Index: share/man/man9/tsleep.9
> ===
> RCS file: /cvs/src/share/man/man9/tsleep.9,v
> retrieving revision 1.10
> diff -u -p -r1.10 tsleep.9
> --- share/man/man9/tsleep.9   14 Sep 2015 15:14:55 -
> 1.10 +++ share/man/man9/tsleep.9  12 Sep 2016 08:42:55 -
> @@ -34,6 +34,7 @@
>  .Sh NAME
>  .Nm tsleep ,
>  .Nm msleep ,
> +.Nm rwsleep ,
>  .Nm wakeup ,
>  .Nm wakeup_n ,
>  .Nm wakeup_one
> @@ -45,6 +46,8 @@
>  .Fn tsleep "void *ident" "int priority" "const char *wmesg" "int timo"
>  .Ft int
>  .Fn msleep "void *ident" "struct mutex *mtx" "int priority" "const char *wmesg" "int timo"
> +.Ft int
> +.Fn rwsleep "void *ident" "struct rwlock *wl" "int priority" "const char *wmesg" "int timo"
>  .Ft void
>  .Fn wakeup "void *ident"
>  .Ft void
> @@ -53,9 +56,10 @@
>  .Fn wakeup_one "void *ident"
>  .Sh DESCRIPTION
>  These functions implement voluntary context switching.
> -.Fn tsleep
> -and
> +.Fn tsleep ,
>  .Fn msleep
> +and
> +.Fn rwsleep
>  are used throughout the kernel whenever processing in the current
> context cannot continue for any of the following reasons:
>  .Bl -bullet -offset indent
> @@ -146,6 +150,22 @@ argument.
>  .El
>  .Pp
>  The
> +.Fn rwsleep
> +function behaves just like
> +.Fn tsleep ,
> +but takes an additional argument:
> +.Bl -tag -width priority
> +.It Fa wl
> +A write lock that will be unlocked when the process is safely
> +on the sleep queue.
> +The write lock will be relocked at the end of rwsleep unless the
> +.Dv PNORELOCK
> +flag is set in the
> +.Fa priority
> +argument.
> +.El
> +.Pp
> +The
>  .Fn wakeup
>  function will mark all processes 

in6_selectroute should never get AF_INET filled struct route *

2016-09-02 Thread Vincent Gross
in6_selectroute() checks whether the struct route it received contains
a valid route whose AF is not AF_INET6, "in case the cache is shared".
Well, is this cache shared or not?

There are only two ways to get to in6_selectroute():
1) in6_pcbselsrc() -> in6_selectif() -> in6_selectroute()
It is trivial to check that only inet6 is handled here, and that any
other AF is obviously an error.

2) ip6_output() -> in6_selectroute()
  a. If the struct route * arg of ip6_output() is NULL, then
ip6_output() zeroes a struct route on the stack. It will never
be valid, thus there is no need to check its AF.
  b. If the struct route * arg is not NULL, it is passed on to
in6_selectroute().

ip6_output() is called with a non-NULL struct route * in 5 places only:

netinet/tcp_output.c:1124:  error = ip6_output(m, tp->t_inpcb->inp_outputopts6,
netinet/tcp_output.c-1125-      &tp->t_inpcb->inp_route6,
netinet/tcp_output.c-1126-      0, NULL, tp->t_inpcb);

netinet/tcp_subr.c:399: ip6_output(m, tp ? tp->t_inpcb->inp_outputopts6 : NULL,
netinet/tcp_subr.c-400-     tp ? &tp->t_inpcb->inp_route6 : NULL,
netinet/tcp_subr.c-401-     0, NULL,
netinet/tcp_subr.c-402-     tp ? tp->t_inpcb : NULL);

netinet/tcp_input.c:4386:   error = ip6_output(m, NULL /*XXX*/, &sc->sc_route6, 0,
netinet/tcp_input.c-4387-       NULL, NULL);

netinet6/ip6_divert.c:167:  error = ip6_output(m, NULL, &inp->inp_route6,
netinet6/ip6_divert.c-168-      IP_ALLOWBROADCAST | IP_RAWOUTPUT, NULL, NULL);

netinet6/raw_ip6.c:457: error = ip6_output(m, optp, &in6p->inp_route6, flags,
netinet6/raw_ip6.c-458-     in6p->inp_moptions6, in6p);

Each time, the struct route is only used in an inet6 context.

I think it is safe to add this KASSERT() to in6_selectroute(). A few
other things can be tightened, they will be addressed later.

Ok ?
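
To spell out the intended logic outside the kernel, here is a small
user-space model. It is a sketch under assumptions: struct route6_model
and cached_route_usable() are hypothetical names, the valid flag stands
in for rtisvalid(ro->ro_rt), and assert() stands in for KASSERT().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical, simplified stand-in for the cached route. */
struct route6_model {
	bool			valid;	/* models rtisvalid(ro->ro_rt) */
	struct sockaddr_in6	dst;	/* models ro->ro_dst */
};

/* Return true if the cached route can be reused for dst. */
static bool
cached_route_usable(const struct route6_model *ro, const struct in6_addr *dst)
{
	if (!ro->valid)
		return false;

	/* A valid cached route reaching this point must be inet6. */
	assert(ro->dst.sin6_family == AF_INET6);

	return memcmp(&ro->dst.sin6_addr, dst, sizeof(*dst)) == 0;
}
```

The address family is no longer part of the "can I reuse this route"
decision, it becomes an invariant that is only checked on valid routes.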


Index: netinet6/in6_src.c
===
RCS file: /cvs/src/sys/netinet6/in6_src.c,v
retrieving revision 1.79
diff -u -p -r1.79 in6_src.c
--- netinet6/in6_src.c  4 Aug 2016 20:46:24 -   1.79
+++ netinet6/in6_src.c  2 Sep 2016 09:17:10 -
@@ -302,13 +302,13 @@ in6_selectroute(struct sockaddr_in6 *dst
 
/*
 * Use a cached route if it exists and is valid, else try to allocate
-* a new one.  Note that we should check the address family of the
-* cached destination, in case of sharing the cache with IPv4.
+* a new one.
 */
if (ro) {
+   if (rtisvalid(ro->ro_rt))
+   KASSERT(sin6tosa(&ro->ro_dst)->sa_family == AF_INET6);
if (!rtisvalid(ro->ro_rt) ||
-sin6tosa(&ro->ro_dst)->sa_family != AF_INET6 ||
-!IN6_ARE_ADDR_EQUAL(&ro->ro_dst.sin6_addr, dst)) {
+   !IN6_ARE_ADDR_EQUAL(&ro->ro_dst.sin6_addr, dst)) {
rtfree(ro->ro_rt);
ro->ro_rt = NULL;
}



Re: Let iked specify its source address when sending

2016-09-02 Thread Vincent Gross
Objections anyone ?

On Wed, 31 Aug 2016 15:57:45 +0200
Vincent Gross <vgr...@openbsd.org> wrote:

> On Wed, 31 Aug 2016 15:26:53 +0200
> Vincent Gross <vgr...@openbsd.org> wrote:
> 
> > On Thu, 11 Aug 2016 16:57:27 +0100
> > Stuart Henderson <s...@spacehopper.org> wrote:
> >   
> > > On 2016/06/27 13:00, Jérémie Courrèges-Anglas wrote:
> > [...]
> > > > 
> > > > I also gave my ok to vgross by IM.
> > > > 
> > > > I know that some concerns have been exposed privately, I was not
> > > > Cc'd, thus I have no idea what is the current status of that
> > > > discussion.  To the people concerned, please keep me / us
> > > > updated about that discussion and Cc us.  
> > > 
> > > > How are things looking with IP_SENDSRCADDR now, are there any
> > > remaining concerns that need fixing before it could be committed?
> > > (Also if anyone has a share-able diff to use this with iked it
> > > would be quite handy..)
> > > 
> > 
> > Tested locally with two iked on two distinct rdomains plus a bit of
> > LD_PRELOAD goop. Unfortunately I couldn't ping from one rdom to the
> > other, but I also have this problem without my patch, so I am
> > confident this ping problem is unrelated.
> > 
> > I would be very grateful if someone could test this.
> >  
> 
> Take two, unmangled version :
> 
> Index: sbin/iked/iked.h
> ===
> RCS file: /cvs/src/sbin/iked/iked.h,v
> retrieving revision 1.96
> diff -u -p -r1.96 iked.h
> --- sbin/iked/iked.h  1 Jun 2016 11:16:41 -   1.96
> +++ sbin/iked/iked.h  31 Aug 2016 13:19:10 -
> @@ -898,6 +898,8 @@ intsocket_setport(struct sockaddr *, i
>  int   socket_getaddr(int, struct sockaddr_storage *);
>  int   socket_bypass(int, struct sockaddr *);
>  int   udp_bind(struct sockaddr *, in_port_t);
> +ssize_t   sendtofrom(int, void *, size_t, int, struct sockaddr
> *,
> + socklen_t, struct sockaddr *, socklen_t);
>  ssize_t   recvfromto(int, void *, size_t, int, struct sockaddr
> *, socklen_t *, struct sockaddr *, socklen_t *);
>  const char *
> Index: sbin/iked/ikev2_msg.c
> ===
> RCS file: /cvs/src/sbin/iked/ikev2_msg.c,v
> retrieving revision 1.45
> diff -u -p -r1.45 ikev2_msg.c
> --- sbin/iked/ikev2_msg.c 19 Oct 2015 11:25:35 -
> 1.45 +++ sbin/iked/ikev2_msg.c31 Aug 2016 13:19:10 -
> @@ -319,9 +319,11 @@ ikev2_msg_send(struct iked *env, struct 
>   msg->msg_offset += sizeof(natt);
>   }
>  
> - if ((sendto(msg->msg_fd, ibuf_data(buf), ibuf_size(buf), 0,
> - (struct sockaddr *)&msg->msg_peer, msg->msg_peerlen)) == -1) {
> - log_warn("%s: sendto", __func__);
> + if (sendtofrom(msg->msg_fd, ibuf_data(buf), ibuf_size(buf), 0,
> + (struct sockaddr *)&msg->msg_peer, msg->msg_peerlen,
> + (struct sockaddr *)&msg->msg_local, msg->msg_locallen) <
> + ibuf_size(buf)) {
> + log_warn("%s: sendtofrom", __func__);
>   return (-1);
>   }
>  
> @@ -969,10 +971,12 @@ int
>  ikev2_msg_retransmit_response(struct iked *env, struct iked_sa *sa,
>  struct iked_message *msg)
>  {
> - if ((sendto(msg->msg_fd, ibuf_data(msg->msg_data),
> - ibuf_size(msg->msg_data), 0, (struct sockaddr *)&msg->msg_peer,
> - msg->msg_peerlen)) == -1) {
> - log_warn("%s: sendto", __func__);
> + if (sendtofrom(msg->msg_fd, ibuf_data(msg->msg_data),
> + ibuf_size(msg->msg_data), 0,
> + (struct sockaddr *)&msg->msg_peer, msg->msg_peerlen,
> + (struct sockaddr *)&msg->msg_local, msg->msg_locallen) <
> + ibuf_size(msg->msg_data)) {
> + log_warn("%s: sendtofrom", __func__);
>   return (-1);
>   }
>  
> @@ -996,11 +1000,12 @@ ikev2_msg_retransmit_timeout(struct iked
>   struct iked_sa  *sa = msg->msg_sa;
>  
>   if (msg->msg_tries < IKED_RETRANSMIT_TRIES) {
> - if ((sendto(msg->msg_fd, ibuf_data(msg->msg_data),
> + if (sendtofrom(msg->msg_fd, ibuf_data(msg->msg_data),
>   ibuf_size(msg->msg_data), 0,
> - (struct sockaddr *)&msg->msg_peer,
> - msg->msg_peerlen)) == -1) {
> - log_warn("%s: sendto", __func__);
> + (struct sockaddr *)&msg->msg_peer,
> msg->

Re: Drop IPSec traffic that should be encapsulated but is not

2016-09-01 Thread Vincent Gross
On Thu, 1 Sep 2016 18:02:14 +0200
Claer <cl...@claer.hammock.fr> wrote:

> Hello,
> 
> In some production systems, I'm still using an old patch to isakmpd
> for NAT-T.
> When negotiating SAs with ASA peers while OpenBSD is NATed, you run
> into issues during negotiation. The following discussions explain the
> issue:
> 
> http://openbsd.7691.n7.nabble.com/isakmpd-NAT-T-interoperability-td173004.html
> http://marc.info/?l=openbsd-tech&m=139140140105433&w=2
> 
> I think the patch is related to the parts of the code you are working
> on.
> 

Actually it is not. The issue you are referencing is in isakmpd,
whereas the diff below is in the OpenBSD kernel. Totally different
stuff. I do not plan to look at isakmpd at the moment, as it only
supports IKEv1, and its code is nearly twice the size of iked.

I do not have Cisco gear available to test, is this issue present when
opening NAT-T tunnels with iked ?

Cheers

> Would you mind looking at this issue also? :)
> 
> Thanks!
> 
> Claer
> 
> On Thu, Sep 01 2016 at 31:10, Vincent Gross wrote:
> 
> > Our IPSec stack rejects UDP-encapsulated traffic using a non
> > encapsulating SA, but not the other way around. This diff adds
> > the missing check and the corresponding stat counter.
> > 
> > Ok ?
> > 
> > Index: sys/netinet/ip_esp.h
> > ===
> > RCS file: /cvs/src/sys/netinet/ip_esp.h,v
> > retrieving revision 1.42
> > diff -u -p -r1.42 ip_esp.h
> > --- sys/netinet/ip_esp.h10 Jan 2010 12:43:07 -
> > 1.42 +++ sys/netinet/ip_esp.h   1 Sep 2016 08:24:15 -
> > @@ -62,6 +62,7 @@ struct espstat
> >  u_int32_t  esps_udpencin;  /* Input ESP-in-UDP packets */
> >  u_int32_t  esps_udpencout; /* Output ESP-in-UDP packets
> > */ u_int32_tesps_udpinval;  /* Invalid input ESP-in-UDP
> > packets */
> > +u_int32_t  esps_udpneeded; /* Trying to use a ESP-in-UDP
> > TDB */ };
> >  
> >  /*
> > Index: sys/netinet/ipsec_input.c
> > ===
> > RCS file: /cvs/src/sys/netinet/ipsec_input.c,v
> > retrieving revision 1.135
> > diff -u -p -r1.135 ipsec_input.c
> > --- sys/netinet/ipsec_input.c   10 Sep 2015 17:52:05
> > -   1.135 +++ sys/netinet/ipsec_input.c 1 Sep 2016
> > 08:24:16 - @@ -262,6 +262,16 @@ ipsec_common_input(struct mbuf
> > *m, int s return EINVAL;
> > }
> >  
> > +   if (!udpencap && (tdbp->tdb_flags & TDBF_UDPENCAP)) {
> > +   splx(s);
> > +   DPRINTF(("ipsec_common_input(): attempted to use udpencap "
> > +   "SA %s/%08x/%u\n", ipsp_address(&dst_address, buf,
> > +   sizeof(buf)), ntohl(spi), tdbp->tdb_sproto));
> > +   m_freem(m);
> > +   espstat.esps_udpneeded++;
> > +   return EINVAL;
> > +   }
> > +
> > if (tdbp->tdb_xform == NULL) {
> > splx(s);
> > DPRINTF(("ipsec_common_input(): attempted to use
> > uninitialized " Index: usr.bin/netstat/inet.c
> > ===
> > RCS file: /cvs/src/usr.bin/netstat/inet.c,v
> > retrieving revision 1.150
> > diff -u -p -r1.150 inet.c
> > --- usr.bin/netstat/inet.c  27 Aug 2016 04:13:43 -
> > 1.150 +++ usr.bin/netstat/inet.c1 Sep 2016 08:24:16 -
> > @@ -1073,6 +1073,7 @@ esp_stats(char *name)
> > p(esps_udpencin, "\t%u input UDP encapsulated ESP
> > packet%s\n"); p(esps_udpencout, "\t%u output UDP encapsulated ESP
> > packet%s\n"); p(esps_udpinval, "\t%u UDP packet%s for
> > non-encapsulating TDB received\n");
> > +   p(esps_udpneeded, "\t%u raw ESP packet%s for encapsulating
> > TDB received\n"); p(esps_ibytes, "\t%llu input byte%s\n");
> > p(esps_obytes, "\t%llu output byte%s\n");
> >  
> >   



NAT-on-enc on iked(8)

2016-09-01 Thread Vincent Gross
This diff adds the missing bits to support NAT-on-enc in iked(8).

See OUTGOING NETWORK ADDRESS TRANSLATION in iked.conf(5), and also
http://undeadly.org/cgi?action=article&sid=20090127205841.

Ok ?


diff --git sbin/iked/iked.h sbin/iked/iked.h
index aa40d70..dfa04ad 100644
--- sbin/iked/iked.h
+++ sbin/iked/iked.h
@@ -140,6 +140,7 @@ struct iked_flow {
struct iked_addr flow_src;
struct iked_addr flow_dst;
unsigned int flow_dir;  /* in/out */
+   struct iked_addr flow_prenat;
 
unsigned int flow_loaded;   /* pfkey done */
 
diff --git sbin/iked/parse.y sbin/iked/parse.y
index c93a978..e3e7c29 100644
--- sbin/iked/parse.y
+++ sbin/iked/parse.y
@@ -2418,7 +2418,7 @@ create_ike(char *name, int af, uint8_t ipproto, struct 
ipsec_hosts *hosts,
 {
char idstr[IKED_ID_SIZE];
unsigned int idtype = IKEV2_ID_NONE;
-   struct ipsec_addr_wrap  *ipa, *ipb;
+   struct ipsec_addr_wrap  *ipa, *ipb, *ippn;
struct iked_policy   pol;
struct iked_proposal prop[2];
unsigned int j;
@@ -2640,6 +2640,17 @@ create_ike(char *name, int af, uint8_t ipproto, struct 
ipsec_hosts *hosts,
flows[j].flow_dst.addr_net = ipb->netaddress;
flows[j].flow_dst.addr_port = hosts->dport;
 
+   ippn = ipa->srcnat;
+   if (ippn) {
+   memcpy(&flows[j].flow_prenat.addr, &ippn->address,
+   sizeof(ippn->address));
+   flows[j].flow_prenat.addr_af = ippn->af;
+   flows[j].flow_prenat.addr_mask = ippn->mask;
+   flows[j].flow_prenat.addr_net = ippn->netaddress;
+   } else {
+   flows[j].flow_prenat.addr_af = 0;
+   }
+
flows[j].flow_ipproto = ipproto;
 
pol.pol_nflows++;
diff --git sbin/iked/pfkey.c sbin/iked/pfkey.c
index 72c2d31..20ca4aa 100644
--- sbin/iked/pfkey.c
+++ sbin/iked/pfkey.c
@@ -173,6 +173,7 @@ int
 pfkey_flow(int sd, uint8_t satype, uint8_t action, struct iked_flow *flow)
 {
struct sadb_msg  smsg;
+   struct iked_addr*flow_src, *flow_dst;
struct sadb_address  sa_src, sa_dst, sa_local, sa_peer, sa_smask,
 sa_dmask;
struct sadb_protocol sa_flowtype, sa_protocol;
@@ -183,56 +184,75 @@ pfkey_flow(int sd, uint8_t satype, uint8_t action, struct 
iked_flow *flow)
 
sa_srcid = sa_dstid = NULL;
 
+   flow_src = &flow->flow_src;
+   flow_dst = &flow->flow_dst;
+
+   if (flow->flow_prenat.addr_af == flow_src->addr_af) {
+   switch (flow->flow_type) {
+   case SADB_X_FLOW_TYPE_USE:
+   flow_dst = &flow->flow_prenat;
+   break;
+   case SADB_X_FLOW_TYPE_REQUIRE:
+   flow_src = &flow->flow_prenat;
+   break;
+   case 0:
+   if (flow->flow_dir == IPSP_DIRECTION_IN)
+   flow_dst = &flow->flow_prenat;
+   else
+   flow_src = &flow->flow_prenat;
+   }
+   }
+
bzero(&ssrc, sizeof(ssrc));
bzero(&smask, sizeof(smask));
-   memcpy(&ssrc, &flow->flow_src.addr, sizeof(ssrc));
-   memcpy(&smask, &flow->flow_src.addr, sizeof(smask));
-   socket_af((struct sockaddr *)&ssrc, flow->flow_src.addr_port);
-   socket_af((struct sockaddr *)&smask, flow->flow_src.addr_port ?
+   memcpy(&ssrc, &flow_src->addr, sizeof(ssrc));
+   memcpy(&smask, &flow_src->addr, sizeof(smask));
+   socket_af((struct sockaddr *)&ssrc, flow_src->addr_port);
+   socket_af((struct sockaddr *)&smask, flow_src->addr_port ?
0xffff : 0);
 
-   switch (flow->flow_src.addr_af) {
+   switch (flow_src->addr_af) {
case AF_INET:
((struct sockaddr_in *)&smask)->sin_addr.s_addr =
-   prefixlen2mask(flow->flow_src.addr_net ?
-   flow->flow_src.addr_mask : 32);
+   prefixlen2mask(flow_src->addr_net ?
+   flow_src->addr_mask : 32);
break;
case AF_INET6:
-   prefixlen2mask6(flow->flow_src.addr_net ?
-   flow->flow_src.addr_mask : 128,
+   prefixlen2mask6(flow_src->addr_net ?
+   flow_src->addr_mask : 128,
(uint32_t *)((struct sockaddr_in6 *)
&smask)->sin6_addr.s6_addr);
break;
default:
log_warnx("%s: unsupported address family %d",
-   __func__, flow->flow_src.addr_af);
+   __func__, flow_src->addr_af);
return (-1);
}
smask.ss_len = ssrc.ss_len;
 
bzero(&sdst, sizeof(sdst));
bzero(&dmask, sizeof(dmask));
-   memcpy(&sdst, &flow->flow_dst.addr, sizeof(sdst));
-   memcpy(&dmask, 

Drop IPSec traffic that should be encapsulated but is not

2016-09-01 Thread Vincent Gross
Our IPsec stack rejects UDP-encapsulated traffic that matches a
non-encapsulating SA, but not the other way around: raw ESP traffic
matching a UDP-encapsulating SA is accepted. This diff adds the
missing check and the corresponding stat counter.

Ok ?
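
Put differently, the encapsulation of the packet and the encapsulation
flag of the SA have to agree. A user-space sketch of both checks
follows. The names are hypothetical, and MODEL_TDBF_UDPENCAP is an
arbitrary bit standing in for TDBF_UDPENCAP, not its real value.

```c
#include <stdbool.h>

/* Arbitrary stand-in bit for TDBF_UDPENCAP (assumption). */
#define MODEL_TDBF_UDPENCAP	0x1u

enum encap_verdict {
	ENCAP_OK,		/* packet and SA agree */
	ENCAP_UDP_UNEXPECTED,	/* existing check, counted in esps_udpinval */
	ENCAP_UDP_MISSING	/* new check, counted in esps_udpneeded */
};

/* Hypothetical helper: compare packet encapsulation with the SA flag. */
static enum encap_verdict
check_encap(bool udpencap, unsigned int tdb_flags)
{
	bool sa_wants_udp = (tdb_flags & MODEL_TDBF_UDPENCAP) != 0;

	if (udpencap && !sa_wants_udp)
		return ENCAP_UDP_UNEXPECTED;
	if (!udpencap && sa_wants_udp)
		return ENCAP_UDP_MISSING;
	return ENCAP_OK;
}
```

Only the second branch is new in this diff, the first one models the
check that is already in the tree.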

Index: sys/netinet/ip_esp.h
===
RCS file: /cvs/src/sys/netinet/ip_esp.h,v
retrieving revision 1.42
diff -u -p -r1.42 ip_esp.h
--- sys/netinet/ip_esp.h10 Jan 2010 12:43:07 -  1.42
+++ sys/netinet/ip_esp.h1 Sep 2016 08:24:15 -
@@ -62,6 +62,7 @@ struct espstat
 u_int32_t  esps_udpencin;  /* Input ESP-in-UDP packets */
 u_int32_t  esps_udpencout; /* Output ESP-in-UDP packets */
 u_int32_t  esps_udpinval;  /* Invalid input ESP-in-UDP packets */
+u_int32_t  esps_udpneeded; /* Trying to use a ESP-in-UDP TDB */
 };
 
 /*
Index: sys/netinet/ipsec_input.c
===
RCS file: /cvs/src/sys/netinet/ipsec_input.c,v
retrieving revision 1.135
diff -u -p -r1.135 ipsec_input.c
--- sys/netinet/ipsec_input.c   10 Sep 2015 17:52:05 -  1.135
+++ sys/netinet/ipsec_input.c   1 Sep 2016 08:24:16 -
@@ -262,6 +262,16 @@ ipsec_common_input(struct mbuf *m, int s
return EINVAL;
}
 
+   if (!udpencap && (tdbp->tdb_flags & TDBF_UDPENCAP)) {
+   splx(s);
+   DPRINTF(("ipsec_common_input(): attempted to use udpencap "
+   "SA %s/%08x/%u\n", ipsp_address(&dst_address, buf,
+   sizeof(buf)), ntohl(spi), tdbp->tdb_sproto));
+   m_freem(m);
+   espstat.esps_udpneeded++;
+   return EINVAL;
+   }
+
if (tdbp->tdb_xform == NULL) {
splx(s);
DPRINTF(("ipsec_common_input(): attempted to use uninitialized "
Index: usr.bin/netstat/inet.c
===
RCS file: /cvs/src/usr.bin/netstat/inet.c,v
retrieving revision 1.150
diff -u -p -r1.150 inet.c
--- usr.bin/netstat/inet.c  27 Aug 2016 04:13:43 -  1.150
+++ usr.bin/netstat/inet.c  1 Sep 2016 08:24:16 -
@@ -1073,6 +1073,7 @@ esp_stats(char *name)
p(esps_udpencin, "\t%u input UDP encapsulated ESP packet%s\n");
p(esps_udpencout, "\t%u output UDP encapsulated ESP packet%s\n");
p(esps_udpinval, "\t%u UDP packet%s for non-encapsulating TDB received\n");
+   p(esps_udpneeded, "\t%u raw ESP packet%s for encapsulating TDB received\n");
p(esps_ibytes, "\t%llu input byte%s\n");
p(esps_obytes, "\t%llu output byte%s\n");
 



Re: Let iked specify its source address when sending

2016-08-31 Thread Vincent Gross
On Wed, 31 Aug 2016 16:09:30 +0200
Reyk Floeter <r...@openbsd.org> wrote:

> On Wed, Aug 31, 2016 at 03:26:53PM +0200, Vincent Gross wrote:
> > On Thu, 11 Aug 2016 16:57:27 +0100
> > Stuart Henderson <s...@spacehopper.org> wrote:
> >   
> > > On 2016/06/27 13:00, Jérémie Courrèges-Anglas wrote:
> > [...]
> > > > 
> > > > I also gave my ok to vgross by IM.
> > > > 
> > > > I know that some concerns have been exposed privately, I was not
> > > > Cc'd, thus I have no idea what is the current status of that
> > > > discussion.  To the people concerned, please keep me / us
> > > > updated about that discussion and Cc us.
> > > 
> > > > How are things looking with IP_SENDSRCADDR now, are there any
> > > remaining concerns that need fixing before it could be committed?
> > > (Also if anyone has a share-able diff to use this with iked it
> > > would be quite handy..)
> > >   
> > 
> > Tested locally with two iked on two distinct rdomains plus a bit of
> > LD_PRELOAD goop. Unfortunately I couldn't ping from one rdom to the
> > other, but I also have this problem without my patch, so I am
> > confident this ping problem is unrelated.
> > 
> > I would be very grateful if someone could test this.
> >   
> 
> I don't know why you need LD_PRELOAD.
> 

I wanted to use different key sets for each iked instance.

> When testing iked with two different rdomains, you have to create an
> enc(4) device per rdomain, or no ipsec traffic will flow.  enc0 is for
> rdomain 0 only.
> 
> # ifconfig enc1 rdomain 1 up
> # ifconfig enc2 rdomain 2 up
> # route -T 1 exec iked -ddvvf /etc/iked.conf.1
> # route -T 2 exec iked -ddvvf /etc/iked.conf.2

Well, look at what you did: now I understand why my pings wouldn't
go through!

Hm, it turns out I can send ESP'ed data when using the non-default
address, but I can't receive yet.

This is my test bench:

-- A side:

$ doas ifconfig pair10 rdomain 10 10.124.0.10/24 up
$ doas ifconfig enc10 rdomain 10 up
$ doas ifconfig vether10 rdomain 10 10.123.10.1/24 up
$ doas route -T 10 add 10.123.11.0/24 10.124.0.11
$ cat iked.a.conf
ikev2 active esp from 10.123.10.0/24 to 10.124.0.11 \
  local 10.123.10.1 peer 10.124.0.11 \
  srcid a.test dstid b.test


"route -T 10 exec ipsecctl -s all" output :
FLOWS:
flow esp in from 10.124.0.11 to 10.123.10.0/24 peer 10.124.0.11 srcid FQDN/a.test dstid FQDN/b.test type use
flow esp out from 10.123.10.0/24 to 10.124.0.11 peer 10.124.0.11 srcid FQDN/a.test dstid FQDN/b.test type require
flow esp out from ::/0 to ::/0 type deny

SAD:
esp tunnel from 10.124.0.11 to 10.124.0.10 spi 0x2ebe4b1b auth hmac-sha2-256 enc aes-256
esp tunnel from 10.124.0.10 to 10.124.0.11 spi 0x3c4b29c3 auth hmac-sha2-256 enc aes-256



-- B side:

$ doas ifconfig pair11 rdomain 11 10.124.0.11/24 up
$ doas ifconfig enc11 rdomain 11 up
$ doas ifconfig vether11 rdomain 11 10.123.11.1/24 up
$ doas route -T 11 add 10.123.10.0/24 10.124.0.10
$ cat iked.b.conf
ikev2 active esp from 10.124.0.11 to 10.123.10.0/24 \
  local 10.124.0.11 peer 10.123.10.1 \
  srcid b.test dstid a.test


"route -T 11 exec ipsecctl -s all" output :
FLOWS:
flow esp in from 10.123.10.0/24 to 10.124.0.11 peer 10.124.0.10 srcid FQDN/b.test dstid FQDN/a.test type use
flow esp out from 10.124.0.11 to 10.123.10.0/24 peer 10.124.0.10 srcid FQDN/b.test dstid FQDN/a.test type require
flow esp out from ::/0 to ::/0 type deny

SAD:
esp tunnel from 10.124.0.11 to 10.124.0.10 spi 0x2ebe4b1b auth hmac-sha2-256 enc aes-256
esp tunnel from 10.124.0.10 to 10.124.0.11 spi 0x3c4b29c3 auth hmac-sha2-256 enc aes-256



-- The fun part:

run "tcpdump -ni pair10", then
"route -T 10 exec ping -I 10.123.10.1 10.124.0.11":

...
17:26:24.185391 esp 10.124.0.10 > 10.124.0.11 spi 0x3c4b29c3 seq 36 len 136
17:26:24.185797 10.124.0.11.4500 > 10.124.0.10.4500:udpencap: esp 10.124.0.11 > 10.124.0.10 spi 0x2ebe4b1b seq 36 len 136
17:26:25.190350 esp 10.124.0.10 > 10.124.0.11 spi 0x3c4b29c3 seq 37 len 136
17:26:25.190680 10.124.0.11.4500 > 10.124.0.10.4500:udpencap: esp 10.124.0.11 > 10.124.0.10 spi 0x2ebe4b1b seq 37 len 136
17:26:26.190344 esp 10.124.0.10 > 10.124.0.11 spi 0x3c4b29c3 seq 38 len 136
17:26:26.190701 10.124.0.11.4500 > 10.124.0.10.4500:udpencap: esp 10.124.0.11 > 10.124.0.10 spi 0x2ebe4b1b seq 38 len 136
...


The udpencap'd return traffic is not picked up by enc10, so your ping
replies are lost...



Re: Let iked specify its source address when sending

2016-08-31 Thread Vincent Gross
On Wed, 31 Aug 2016 15:26:53 +0200
Vincent Gross <vgr...@openbsd.org> wrote:

> On Thu, 11 Aug 2016 16:57:27 +0100
> Stuart Henderson <s...@spacehopper.org> wrote:
> 
> > On 2016/06/27 13:00, Jérémie Courrèges-Anglas wrote:  
> [...]  
> > > 
> > > I also gave my ok to vgross by IM.
> > > 
> > > I know that some concerns have been exposed privately, I was not
> > > Cc'd, thus I have no idea what is the current status of that
> > > discussion.  To the people concerned, please keep me / us updated
> > > about that discussion and Cc us.
> > 
> > How are things looking with IP_SENDSRCADDR now, are there any
> > remaining concerns that need fixing before it could be committed?
> > (Also if anyone has a share-able diff to use this with iked it
> > would be quite handy..)
> >   
> 
> Tested locally with two iked on two distinct rdomains plus a bit of
> LD_PRELOAD goop. Unfortunately I couldn't ping from one rdom to the
> other, but I also have this problem without my patch, so I am
> confident this ping problem is unrelated.
> 
> I would be very grateful if someone could test this.
>

Take two, unmangled version :

Index: sbin/iked/iked.h
===
RCS file: /cvs/src/sbin/iked/iked.h,v
retrieving revision 1.96
diff -u -p -r1.96 iked.h
--- sbin/iked/iked.h1 Jun 2016 11:16:41 -   1.96
+++ sbin/iked/iked.h31 Aug 2016 13:19:10 -
@@ -898,6 +898,8 @@ int  socket_setport(struct sockaddr *, i
 int socket_getaddr(int, struct sockaddr_storage *);
 int socket_bypass(int, struct sockaddr *);
 int udp_bind(struct sockaddr *, in_port_t);
+ssize_t sendtofrom(int, void *, size_t, int, struct sockaddr *,
+   socklen_t, struct sockaddr *, socklen_t);
 ssize_t recvfromto(int, void *, size_t, int, struct sockaddr *,
socklen_t *, struct sockaddr *, socklen_t *);
 const char *
Index: sbin/iked/ikev2_msg.c
===
RCS file: /cvs/src/sbin/iked/ikev2_msg.c,v
retrieving revision 1.45
diff -u -p -r1.45 ikev2_msg.c
--- sbin/iked/ikev2_msg.c   19 Oct 2015 11:25:35 -  1.45
+++ sbin/iked/ikev2_msg.c   31 Aug 2016 13:19:10 -
@@ -319,9 +319,11 @@ ikev2_msg_send(struct iked *env, struct 
msg->msg_offset += sizeof(natt);
}
 
-   if ((sendto(msg->msg_fd, ibuf_data(buf), ibuf_size(buf), 0,
-   (struct sockaddr *)&msg->msg_peer, msg->msg_peerlen)) == -1) {
-   log_warn("%s: sendto", __func__);
+   if (sendtofrom(msg->msg_fd, ibuf_data(buf), ibuf_size(buf), 0,
+   (struct sockaddr *)&msg->msg_peer, msg->msg_peerlen,
+   (struct sockaddr *)&msg->msg_local, msg->msg_locallen) <
+   ibuf_size(buf)) {
+   log_warn("%s: sendtofrom", __func__);
return (-1);
}
 
@@ -969,10 +971,12 @@ int
 ikev2_msg_retransmit_response(struct iked *env, struct iked_sa *sa,
 struct iked_message *msg)
 {
-   if ((sendto(msg->msg_fd, ibuf_data(msg->msg_data),
-   ibuf_size(msg->msg_data), 0, (struct sockaddr *)&msg->msg_peer,
-   msg->msg_peerlen)) == -1) {
-   log_warn("%s: sendto", __func__);
+   if (sendtofrom(msg->msg_fd, ibuf_data(msg->msg_data),
+   ibuf_size(msg->msg_data), 0,
+   (struct sockaddr *)&msg->msg_peer, msg->msg_peerlen,
+   (struct sockaddr *)&msg->msg_local, msg->msg_locallen) <
+   ibuf_size(msg->msg_data)) {
+   log_warn("%s: sendtofrom", __func__);
return (-1);
}
 
@@ -996,11 +1000,12 @@ ikev2_msg_retransmit_timeout(struct iked
struct iked_sa  *sa = msg->msg_sa;
 
if (msg->msg_tries < IKED_RETRANSMIT_TRIES) {
-   if ((sendto(msg->msg_fd, ibuf_data(msg->msg_data),
+   if (sendtofrom(msg->msg_fd, ibuf_data(msg->msg_data),
ibuf_size(msg->msg_data), 0,
-   (struct sockaddr *)&msg->msg_peer,
-   msg->msg_peerlen)) == -1) {
-   log_warn("%s: sendto", __func__);
+   (struct sockaddr *)&msg->msg_peer, msg->msg_peerlen,
+   (struct sockaddr *)&msg->msg_local, msg->msg_locallen) <
+   ibuf_size(msg->msg_data)) {
+   log_warn("%s: sendtofrom", __func__);
sa_free(env, sa);
return;
}
Index: sbin/iked/util.c
===
RCS file: /cvs/src/sbin/iked/util.c,v
retrieving revision 1.30
diff -u -p -r1.30 util.c
--- sbin/iked/util.c23 Nov 2015 19:28:34 - 

Let iked specify its source address when sending

2016-08-31 Thread Vincent Gross
On Thu, 11 Aug 2016 16:57:27 +0100
Stuart Henderson  wrote:

> On 2016/06/27 13:00, Jérémie Courrèges-Anglas wrote:
[...]  
> > 
> > I also gave my ok to vgross by IM.
> > 
> > I know that some concerns have been exposed privately, I was not
> > Cc'd, thus I have no idea what is the current status of that
> > discussion.  To the people concerned, please keep me / us updated
> > about that discussion and Cc us.  
> 
> How are things looking with IP_SENDSRCADDR now, are there any
> remaining concerns that need fixing before it could be committed?
> (Also if anyone has a share-able diff to use this with iked it
> would be quite handy..)
> 

Tested locally with two iked on two distinct rdomains plus a bit of
LD_PRELOAD goop. Unfortunately I couldn't ping from one rdom to the
other, but I also have this problem without my patch, so I am confident
this ping problem is unrelated.

I would be very grateful if someone could test this.
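
For anyone who wants to poke at this from userland: the body of sendtofrom() is cut off in this archive, so the following is not that function. It is only a minimal sketch of the ancillary-data part, under the assumption that the cmsg layout is the one documented in the ip.4 hunk of the kernel diff. The helper name attach_srcaddr() and the union srcaddr_cmsgbuf type are made up for illustration, and the fallback value 38 is the one proposed in this thread, defined here only so the sketch compiles on systems without IP_SENDSRCADDR.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Assumption: fall back to the value proposed in this thread so that the
 * sketch compiles on systems whose headers lack IP_SENDSRCADDR.
 */
#ifndef IP_SENDSRCADDR
#define IP_SENDSRCADDR 38
#endif

/* Control buffer for exactly one struct in_addr, aligned for a cmsghdr. */
union srcaddr_cmsgbuf {
	struct cmsghdr	hdr;
	unsigned char	buf[CMSG_SPACE(sizeof(struct in_addr))];
};

/*
 * Hypothetical helper: attach an IP_SENDSRCADDR control message carrying
 * `src` to `msg`.  The caller still fills msg_name and msg_iov and calls
 * sendmsg(2) itself.
 */
static void
attach_srcaddr(struct msghdr *msg, union srcaddr_cmsgbuf *cbuf,
    const struct in_addr *src)
{
	struct cmsghdr *cmsg;

	memset(cbuf, 0, sizeof(*cbuf));
	msg->msg_control = cbuf->buf;
	msg->msg_controllen = sizeof(cbuf->buf);

	cmsg = CMSG_FIRSTHDR(msg);
	assert(cmsg != NULL);
	cmsg->cmsg_len = CMSG_LEN(sizeof(struct in_addr));
	cmsg->cmsg_level = IPPROTO_IP;
	cmsg->cmsg_type = IP_SENDSRCADDR;
	memcpy(CMSG_DATA(cmsg), src, sizeof(*src));
}
```

The union is there to get a buffer that is both large enough (CMSG_SPACE) and correctly aligned for struct cmsghdr, which a plain char array does not guarantee.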


Index: sbin/iked/iked.h
===
RCS file: /cvs/src/sbin/iked/iked.h,v
retrieving revision 1.96
diff -u -p -r1.96 iked.h
--- sbin/iked/iked.h1 Jun 2016 11:16:41 -   1.96
+++ sbin/iked/iked.h31 Aug 2016 13:19:10 -
@@ -898,6 +898,8 @@ int  socket_setport(struct sockaddr *, i
 int socket_getaddr(int, struct sockaddr_storage *);
 int socket_bypass(int, struct sockaddr *);
 int udp_bind(struct sockaddr *, in_port_t);
+ssize_t sendtofrom(int, void *, size_t, int, struct sockaddr *,
+   socklen_t, struct sockaddr *, socklen_t);
 ssize_t recvfromto(int, void *, size_t, int, struct sockaddr *,
socklen_t *, struct sockaddr *, socklen_t *);
 const char *
Index: sbin/iked/ikev2_msg.c
===
RCS file: /cvs/src/sbin/iked/ikev2_msg.c,v
retrieving revision 1.45
diff -u -p -r1.45 ikev2_msg.c
--- sbin/iked/ikev2_msg.c   19 Oct 2015 11:25:35 -  1.45
+++ sbin/iked/ikev2_msg.c   31 Aug 2016 13:19:10 -
@@ -319,9 +319,11 @@ ikev2_msg_send(struct iked *env, struct 
msg->msg_offset += sizeof(natt);
}
 
-   if ((sendto(msg->msg_fd, ibuf_data(buf), ibuf_size(buf), 0,
-   (struct sockaddr *)&msg->msg_peer, msg->msg_peerlen)) == -1) {
-   log_warn("%s: sendto", __func__);
+   if (sendtofrom(msg->msg_fd, ibuf_data(buf), ibuf_size(buf), 0,
+   (struct sockaddr *)&msg->msg_peer, msg->msg_peerlen,
+   (struct sockaddr *)&msg->msg_local, msg->msg_locallen) <
+   ibuf_size(buf)) {
+   log_warn("%s: sendtofrom", __func__);
return (-1);
}
 
@@ -969,10 +971,12 @@ int
 ikev2_msg_retransmit_response(struct iked *env, struct iked_sa *sa,
 struct iked_message *msg)
 {
-   if ((sendto(msg->msg_fd, ibuf_data(msg->msg_data),
-   ibuf_size(msg->msg_data), 0, (struct sockaddr *)&msg->msg_peer,
-   msg->msg_peerlen)) == -1) {
-   log_warn("%s: sendto", __func__);
+   if (sendtofrom(msg->msg_fd, ibuf_data(msg->msg_data),
+   ibuf_size(msg->msg_data), 0,
+   (struct sockaddr *)&msg->msg_peer, msg->msg_peerlen,
+   (struct sockaddr *)&msg->msg_local, msg->msg_locallen) <
+   ibuf_size(msg->msg_data)) {
+   log_warn("%s: sendtofrom", __func__);
return (-1);
}
 
@@ -996,11 +1000,12 @@ ikev2_msg_retransmit_timeout(struct iked
struct iked_sa  *sa = msg->msg_sa;
 
if (msg->msg_tries < IKED_RETRANSMIT_TRIES) {
-   if ((sendto(msg->msg_fd, ibuf_data(msg->msg_data),
+   if (sendtofrom(msg->msg_fd, ibuf_data(msg->msg_data),
ibuf_size(msg->msg_data), 0,
-   (struct sockaddr *)&msg->msg_peer,
-   msg->msg_peerlen)) == -1) {
-   log_warn("%s: sendto", __func__);
+   (struct sockaddr *)&msg->msg_peer, msg->msg_peerlen,
+   (struct sockaddr *)&msg->msg_local, msg->msg_locallen) <
+   ibuf_size(msg->msg_data)) {
+   log_warn("%s: sendtofrom", __func__);
sa_free(env, sa);
return;
}
Index: sbin/iked/util.c
===
RCS file: /cvs/src/sbin/iked/util.c,v
retrieving revision 1.30
diff -u -p -r1.30 util.c
--- sbin/iked/util.c23 Nov 2015 19:28:34 -  1.30
+++ sbin/iked/util.c31 Aug 2016 13:19:10 -
@@ -287,6 +287,57 @@ sockaddr_cmp(struct sockaddr *a, struct 
 }
 
 ssize_t
+sendtofrom(int s, void *buf, size_t len, int flags, struct sockaddr *to,
+socklen_t tolen, struct sockaddr *from, socklen_t fromlen)
+{
+   struct iovec iov;
+   struct msghdr msg;
+   struct cmsghdr  *cmsg;
+   struct in6_pktinfo  *pkt6;
+   struct sockaddr_in  *in;
+   struct sockaddr_in6 

Re: IP_SENDSRCADDR [2/2] : add cmsg support

2016-08-16 Thread Vincent Gross
On Thu, 11 Aug 2016 16:57:27 +0100
Stuart Henderson <s...@spacehopper.org> wrote:

> On 2016/06/27 13:00, Jérémie Courrèges-Anglas wrote:
> > Stuart Henderson <s...@spacehopper.org> writes:
> >   
[...]
> > >
> > > Basically yes but one observation.  
> > 
> > I also gave my ok to vgross by IM.
> > 
> > I know that some concerns have been exposed privately, I was not
> > Cc'd, thus I have no idea what is the current status of that
> > discussion.  To the people concerned, please keep me / us updated
> > about that discussion and Cc us.  
> 
> How are things looking with IP_SENDSRCADDR now, are there any
> remaining concerns that need fixing before it could be committed?
> (Also if anyone has a share-able diff to use this with iked it
> would be quite handy..)
> 

I just committed the diff with fixes, enhancements and regression tests.
All manners of testing and feedback are welcome !

--
Vincent Gross



Re: split in6_selectsrc() for saner prototypes

2016-07-29 Thread Vincent Gross
On Wed, 20 Jul 2016 12:36:45 +0200
Vincent Gross <vgr...@openbsd.org> wrote:

> This is a completely mechanical diff to get rid of the 7-params
> madness in in6_selectsrc().
> 
> I also apply the same treatment to in_selectsrc() for consistency.
> 
> Ok?

... and of course I forgot to initialize a variable and broke all ipv6,
thanks to Heikko for reporting this on bugs@.

New diff below adds dst init in in6_selectsrc(), ok ?

Index: sys/netinet/in_pcb.c
===
RCS file: /cvs/src/sys/netinet/in_pcb.c,v
retrieving revision 1.212
diff -u -p -r1.212 in_pcb.c
--- sys/netinet/in_pcb.c22 Jul 2016 11:14:41 -  1.212
+++ sys/netinet/in_pcb.c29 Jul 2016 19:53:21 -
@@ -525,8 +525,7 @@ in_pcbconnect(struct inpcb *inp, struct 
if (sin->sin_port == 0)
return (EADDRNOTAVAIL);
 
-   error = in_selectsrc(&ina, sin, inp->inp_moptions, &inp->inp_route,
-   &inp->inp_laddr, inp->inp_rtableid);
+   error = in_pcbselsrc(&ina, sin, inp);
if (error)
return (error);
 
@@ -876,10 +875,14 @@ in_pcbrtentry(struct inpcb *inp)
  * an entry to the caller for later use.
  */
 int
-in_selectsrc(struct in_addr **insrc, struct sockaddr_in *sin,
-struct ip_moptions *mopts, struct route *ro, struct in_addr *laddr,
-u_int rtableid)
+in_pcbselsrc(struct in_addr **insrc, struct sockaddr_in *sin,
+struct inpcb *inp)
 {
+   struct ip_moptions *mopts = inp->inp_moptions;
+   struct route *ro = &inp->inp_route;
+   struct in_addr *laddr = &inp->inp_laddr;
+   u_int rtableid = inp->inp_rtableid;
+
struct sockaddr_in *sin2;
struct in_ifaddr *ia = NULL;
 
Index: sys/netinet/in_pcb.h
===
RCS file: /cvs/src/sys/netinet/in_pcb.h,v
retrieving revision 1.102
diff -u -p -r1.102 in_pcb.h
--- sys/netinet/in_pcb.h22 Jul 2016 11:14:41 -  1.102
+++ sys/netinet/in_pcb.h29 Jul 2016 19:53:21 -
@@ -289,8 +289,7 @@ void in_setpeeraddr(struct inpcb *, str
 void in_setsockaddr(struct inpcb *, struct mbuf *);
 int in_baddynamic(u_int16_t, u_int16_t);
 int in_rootonly(u_int16_t, u_int16_t);
-int in_selectsrc(struct in_addr **, struct sockaddr_in *,
-   struct ip_moptions *, struct route *, struct in_addr *, u_int);
+int in_pcbselsrc(struct in_addr **, struct sockaddr_in *, struct inpcb *);
 struct rtentry *
in_pcbrtentry(struct inpcb *);
 
Index: sys/netinet/udp_usrreq.c
===
RCS file: /cvs/src/sys/netinet/udp_usrreq.c,v
retrieving revision 1.216
diff -u -p -r1.216 udp_usrreq.c
--- sys/netinet/udp_usrreq.c22 Jul 2016 11:14:41 -  1.216
+++ sys/netinet/udp_usrreq.c29 Jul 2016 19:53:22 -
@@ -989,8 +989,7 @@ udp_output(struct inpcb *inp, struct mbu
goto release;
}
 
-   error = in_selectsrc(&laddr, sin, inp->inp_moptions,
-   &inp->inp_route, &inp->inp_laddr, inp->inp_rtableid);
+   error = in_pcbselsrc(&laddr, sin, inp);
if (error)
goto release;
 
Index: sys/netinet6/icmp6.c
===
RCS file: /cvs/src/sys/netinet6/icmp6.c,v
retrieving revision 1.188
diff -u -p -r1.188 icmp6.c
--- sys/netinet6/icmp6.c22 Jul 2016 11:14:41 -  1.188
+++ sys/netinet6/icmp6.c29 Jul 2016 19:53:22 -
@@ -1259,7 +1259,7 @@ icmp6_reflect(struct mbuf *m, size_t off
 * source address of the erroneous packet.
 */
bzero(&ro, sizeof(ro));
-   error = in6_selectsrc(&src, &sa6_src, NULL, NULL, &ro, NULL,
+   error = in6_selectsrc(&src, &sa6_src, NULL, &ro,
m->m_pkthdr.ph_rtableid);
if (ro.ro_rt)
rtfree(ro.ro_rt); /* XXX: we could use this */
Index: sys/netinet6/in6_pcb.c
===
RCS file: /cvs/src/sys/netinet6/in6_pcb.c,v
retrieving revision 1.95
diff -u -p -r1.95 in6_pcb.c
--- sys/netinet6/in6_pcb.c  22 Jul 2016 11:14:41 -  1.95
+++ sys/netinet6/in6_pcb.c  29 Jul 2016 19:53:22 -
@@ -281,9 +281,7 @@ in6_pcbconnect(struct inpcb *inp, struct
 * with the address specified by setsockopt(IPV6_PKTINFO).
 * Is it the intended behavior?
 */
-   error = in6_selectsrc(&in6a, sin6, inp->inp_outputopts6,
-   inp->inp_moptions6, &inp->inp_route6, &inp->inp_laddr6,
-   inp->inp_rtableid);
+   error = in6_pcbselsrc(&in6a, sin6, inp, inp->inp_outputopts6);
if (error)
return (error);
 
Index: sys/netinet6/in6_src.c
===
RCS file: /cvs/src/sys/netinet6/in6_src.c,v
retrieving revision 1

split in6_selectsrc() for saner prototypes

2016-07-20 Thread Vincent Gross
This is a completely mechanical diff to get rid of the 7-params madness
in in6_selectsrc().

I also apply the same treatment to in_selectsrc() for consistency.

Ok?
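
To make the shape of the change easier to see outside the kernel, here is a toy userland sketch of the same pattern: the selector takes the pcb and unpacks the fields it needs into locals. Everything in it (struct toy_pcb, toy_selectsrc_long(), toy_pcbselsrc(), the integer "addresses") is hypothetical and heavily simplified; it is not the kernel code. In the diff the unpacking is added at the top of the existing function body, while the sketch delegates to the long-prototype version only to show that behaviour is unchanged.

```c
#include <assert.h>

/* Hypothetical, heavily reduced stand-in for struct inpcb. */
struct toy_pcb {
	unsigned int	laddr;		/* bound local address, 0 if unbound */
	unsigned int	route_src;	/* source a cached route would give */
	unsigned int	rtableid;
};

/*
 * Before: every caller passes each pcb field separately.
 * Returns 0 and stores the chosen source in *src, or -1 if there is none.
 */
static int
toy_selectsrc_long(unsigned int *src, unsigned int dst,
    unsigned int laddr, unsigned int route_src, unsigned int rtableid)
{
	(void)dst;		/* unused in this toy model */
	(void)rtableid;
	if (laddr != 0) {
		*src = laddr;
		return (0);
	}
	if (route_src != 0) {
		*src = route_src;
		return (0);
	}
	return (-1);
}

/*
 * After: the function takes the pcb and unpacks it into locals first, so
 * the selection logic is untouched while the prototype shrinks.
 */
static int
toy_pcbselsrc(unsigned int *src, unsigned int dst, const struct toy_pcb *pcb)
{
	unsigned int laddr = pcb->laddr;
	unsigned int route_src = pcb->route_src;
	unsigned int rtableid = pcb->rtableid;

	return (toy_selectsrc_long(src, dst, laddr, route_src, rtableid));
}
```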

Index: sys/netinet/in_pcb.c
===
RCS file: /cvs/src/sys/netinet/in_pcb.c,v
retrieving revision 1.210
diff -u -p -r1.210 in_pcb.c
--- sys/netinet/in_pcb.c19 Jul 2016 14:49:46 -  1.210
+++ sys/netinet/in_pcb.c20 Jul 2016 10:21:17 -
@@ -525,8 +525,7 @@ in_pcbconnect(struct inpcb *inp, struct 
if (sin->sin_port == 0)
return (EADDRNOTAVAIL);
 
-   error = in_selectsrc(&ina, sin, inp->inp_moptions, &inp->inp_route,
-   &inp->inp_laddr, inp->inp_rtableid);
+   error = in_selpcbsrc(&ina, sin, inp);
if (error)
return (error);
 
@@ -876,10 +875,14 @@ in_pcbrtentry(struct inpcb *inp)
  * an entry to the caller for later use.
  */
 int
-in_selectsrc(struct in_addr **insrc, struct sockaddr_in *sin,
-struct ip_moptions *mopts, struct route *ro, struct in_addr *laddr,
-u_int rtableid)
+in_selpcbsrc(struct in_addr **insrc, struct sockaddr_in *sin,
+struct inpcb *inp)
 {
+   struct ip_moptions *mopts = inp->inp_moptions;
+   struct route *ro = &inp->inp_route;
+   struct in_addr *laddr = &inp->inp_laddr;
+   u_int rtableid = inp->inp_rtableid;
+
struct sockaddr_in *sin2;
struct in_ifaddr *ia = NULL;
 
Index: sys/netinet/in_pcb.h
===
RCS file: /cvs/src/sys/netinet/in_pcb.h,v
retrieving revision 1.100
diff -u -p -r1.100 in_pcb.h
--- sys/netinet/in_pcb.h27 Jun 2016 16:33:48 -  1.100
+++ sys/netinet/in_pcb.h20 Jul 2016 10:21:17 -
@@ -289,8 +289,7 @@ void in_setpeeraddr(struct inpcb *, str
 void in_setsockaddr(struct inpcb *, struct mbuf *);
 int in_baddynamic(u_int16_t, u_int16_t);
 int in_rootonly(u_int16_t, u_int16_t);
-int in_selectsrc(struct in_addr **, struct sockaddr_in *,
-   struct ip_moptions *, struct route *, struct in_addr *, u_int);
+int in_selpcbsrc(struct in_addr **, struct sockaddr_in *, struct inpcb *);
 struct rtentry *
in_pcbrtentry(struct inpcb *);
 
Index: sys/netinet/udp_usrreq.c
===
RCS file: /cvs/src/sys/netinet/udp_usrreq.c,v
retrieving revision 1.214
diff -u -p -r1.214 udp_usrreq.c
--- sys/netinet/udp_usrreq.c28 Jun 2016 11:22:53 -  1.214
+++ sys/netinet/udp_usrreq.c20 Jul 2016 10:21:19 -
@@ -989,8 +989,7 @@ udp_output(struct inpcb *inp, struct mbu
goto release;
}
 
-   error = in_selectsrc(&laddr, sin, inp->inp_moptions,
-   &inp->inp_route, &inp->inp_laddr, inp->inp_rtableid);
+   error = in_selpcbsrc(&laddr, sin, inp);
if (error)
goto release;
 
Index: sys/netinet6/icmp6.c
===
RCS file: /cvs/src/sys/netinet6/icmp6.c,v
retrieving revision 1.186
diff -u -p -r1.186 icmp6.c
--- sys/netinet6/icmp6.c5 Jul 2016 10:17:14 -   1.186
+++ sys/netinet6/icmp6.c20 Jul 2016 10:21:19 -
@@ -1259,7 +1259,7 @@ icmp6_reflect(struct mbuf *m, size_t off
 * source address of the erroneous packet.
 */
bzero(&ro, sizeof(ro));
-   error = in6_selectsrc(&src, &sa6_src, NULL, NULL, &ro, NULL,
+   error = in6_selectsrc(&src, &sa6_src, NULL, &ro,
m->m_pkthdr.ph_rtableid);
if (ro.ro_rt)
rtfree(ro.ro_rt); /* XXX: we could use this */
Index: sys/netinet6/in6_pcb.c
===
RCS file: /cvs/src/sys/netinet6/in6_pcb.c,v
retrieving revision 1.93
diff -u -p -r1.93 in6_pcb.c
--- sys/netinet6/in6_pcb.c  5 Jul 2016 10:17:14 -   1.93
+++ sys/netinet6/in6_pcb.c  20 Jul 2016 10:21:19 -
@@ -281,9 +281,7 @@ in6_pcbconnect(struct inpcb *inp, struct
 * with the address specified by setsockopt(IPV6_PKTINFO).
 * Is it the intended behavior?
 */
-   error = in6_selectsrc(&in6a, sin6, inp->inp_outputopts6,
-   inp->inp_moptions6, &inp->inp_route6, &inp->inp_laddr6,
-   inp->inp_rtableid);
+   error = in6_selpcbsrc(&in6a, sin6, inp, inp->inp_outputopts6);
if (error)
return (error);
 
Index: sys/netinet6/in6_src.c
===
RCS file: /cvs/src/sys/netinet6/in6_src.c,v
retrieving revision 1.76
diff -u -p -r1.76 in6_src.c
--- sys/netinet6/in6_src.c  5 Jul 2016 10:17:14 -   1.76
+++ sys/netinet6/in6_src.c  20 Jul 2016 10:21:19 -
@@ -88,15 +88,18 @@ int in6_selectif(struct sockaddr_in6 *, 
 
 /*
  * Return an IPv6 address, which is the most appropriate for a given
- * destination and user specified options.
- * If 

Re: IP_SENDSRCADDR [2/2] : add cmsg support

2016-06-15 Thread Vincent Gross
On Mon, 13 Jun 2016 16:49:01 +0200
Vincent Gross <vgr...@openbsd.org> wrote:
> 
> While validating source address inside selection functions is the
> right direction, I don't think it would be a good thing to extend
> further in_selectsrc() prototype. However it is easy to add a check
> while processing cmsg.
> 
> rev2 below. Ok ?
> 

rev3 below.

I fixed the line length, the useless bzero(), and also the wording in
ip.4

Ok ?
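
For reviewers, the source override rule in the last udp_output() hunk of the diff below can be read as the small decision function sketched here. This is a userland paraphrase with hypothetical names (enum src_action, classify_srcaddr()), with 0 standing in for INADDR_ANY. It models only that final block, not the earlier in_pcbaddrisavail() call made while parsing the cmsg.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum src_action {
	SRC_KEEP,		/* keep the source already chosen for the socket */
	SRC_OVERRIDE,		/* use the cmsg address, no addr+port check */
	SRC_OVERRIDE_CHECKED	/* use it after the addr+port availability check */
};

/*
 * cmsg_seen:  an IP_SENDSRCADDR cmsg was parsed (src_sin.sin_len > 0)
 * cmsg_addr:  address carried by the cmsg, 0 meaning INADDR_ANY
 * bound_addr: the socket's inp_laddr, 0 meaning the socket is unbound
 */
static enum src_action
classify_srcaddr(bool cmsg_seen, uint32_t cmsg_addr, uint32_t bound_addr)
{
	if (!cmsg_seen || cmsg_addr == 0 || cmsg_addr == bound_addr)
		return (SRC_KEEP);
	if (bound_addr != 0)
		return (SRC_OVERRIDE_CHECKED);
	return (SRC_OVERRIDE);
}
```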

Index: sys/netinet/in.h
===
RCS file: /cvs/src/sys/netinet/in.h,v
retrieving revision 1.115
diff -u -p -r1.115 in.h
--- sys/netinet/in.h20 Oct 2015 20:22:42 -  1.115
+++ sys/netinet/in.h15 Jun 2016 17:37:11 -
@@ -307,6 +307,7 @@ struct ip_opts {
 #define IP_RECVRTABLE  35   /* bool; receive rdomain w/dgram */
 #define IP_IPSECFLOWINFO   36   /* bool; IPsec flow info for dgram */
 #define IP_IPDEFTTL  37   /* int; IP TTL system default */
+#define IP_SENDSRCADDR 38   /* struct in_addr; source address to use */
 
 #define IP_RTABLE  0x1021  /* int; routing table, see SO_RTABLE */
 #define IP_DIVERTFL  0x1022  /* int; divert direction flag opt */
Index: sys/netinet/udp_usrreq.c
===
RCS file: /cvs/src/sys/netinet/udp_usrreq.c,v
retrieving revision 1.212
diff -u -p -r1.212 udp_usrreq.c
--- sys/netinet/udp_usrreq.c15 Jun 2016 16:06:35 -  1.212
+++ sys/netinet/udp_usrreq.c15 Jun 2016 17:37:11 -
@@ -888,6 +888,7 @@ udp_output(struct inpcb *inp, struct mbu
struct sockaddr_in *sin = NULL;
struct udpiphdr *ui;
u_int32_t ipsecflowinfo = 0;
+   struct sockaddr_in src_sin;
int len = m->m_pkthdr.len;
struct in_addr *laddr;
int error = 0;
@@ -906,6 +907,8 @@ udp_output(struct inpcb *inp, struct mbu
goto release;
}
 
+   memset(&src_sin, 0, sizeof(src_sin));
+
if (control) {
u_int clen;
struct cmsghdr *cm;
@@ -939,9 +942,20 @@ udp_output(struct inpcb *inp, struct mbu
cm->cmsg_level == IPPROTO_IP &&
cm->cmsg_type == IP_IPSECFLOWINFO) {
ipsecflowinfo = *(u_int32_t *)CMSG_DATA(cm);
-   break;
-   }
+   } else
 #endif
+   if (cm->cmsg_len == CMSG_LEN(sizeof(struct in_addr)) &&
+   cm->cmsg_level == IPPROTO_IP &&
+   cm->cmsg_type == IP_SENDSRCADDR) {
+   memcpy(&src_sin.sin_addr, CMSG_DATA(cm),
+   sizeof(struct in_addr));
+   src_sin.sin_family = AF_INET;
+   src_sin.sin_len = sizeof(src_sin);
+   /* no check on reuse when sin->sin_port == 0 */
+   if ((error = in_pcbaddrisavail(inp, &src_sin,
+   0, curproc)))
+   goto release;
+   }
clen -= CMSG_ALIGN(cm->cmsg_len);
cmsgs += CMSG_ALIGN(cm->cmsg_len);
} while (clen);
@@ -979,6 +993,17 @@ udp_output(struct inpcb *inp, struct mbu
splx(s);
if (error)
goto release;
+   }
+
+   if (src_sin.sin_len > 0 &&
+   src_sin.sin_addr.s_addr != INADDR_ANY &&
+   src_sin.sin_addr.s_addr != inp->inp_laddr.s_addr) {
+   src_sin.sin_port = inp->inp_lport;
+   if (inp->inp_laddr.s_addr != INADDR_ANY &&
+   (error =
+   in_pcbaddrisavail(inp, &src_sin, 0, curproc)))
+   goto release;
+   laddr = &src_sin.sin_addr;
}
} else {
if (inp->inp_faddr.s_addr == INADDR_ANY) {
Index: share/man/man4/ip.4
===
RCS file: /cvs/src/share/man/man4/ip.4,v
retrieving revision 1.38
diff -u -p -r1.38 ip.4
--- share/man/man4/ip.4 20 Oct 2015 22:08:19 -  1.38
+++ share/man/man4/ip.4 15 Jun 2016 17:37:12 -
@@ -290,6 +290,34 @@ cmsg_len = CMSG_LEN(sizeof(u_int))
 cmsg_level = IPPROTO_IP
 cmsg_type = IP_RECVRTABLE
 .Ed
+.Pp
+When sending on a
+.Dv SOCK_DGRAM
+socket with
+.Xr sendmsg 2
+, the source address to be used can be passed as ancillary data with a type code of
+.Dv IP_SENDSRCADDR .
+The
+.Va msg_control
+field in the
+.Vt msghdr
+structure should point to a buffer that contains a
+.Vt cmsghdr
+structure followed by the requested sourc

Re: IP_SENDSRCADDR [2/2] : add cmsg support

2016-06-13 Thread Vincent Gross
On Mon, 13 Jun 2016 19:57:15 +0200
Jeremie Courreges-Anglas <j...@wxcvbn.org> wrote:

> Vincent Gross <vgr...@openbsd.org> writes:
> 
> > Le Mon, 13 Jun 2016 07:35:16 +0200,
> > j...@wxcvbn.org (Jeremie Courreges-Anglas) a écrit :
> >  
> >> j...@wxcvbn.org (Jeremie Courreges-Anglas) writes:
> >>   
> >> > cc'ing sthen since he also has interest in IP_SENDSRCADDR
> >> >
> >> > Jeremie Courreges-Anglas <j...@wxcvbn.org> writes:
> >> >
> >> >> Vincent Gross <vgr...@openbsd.org> writes:
> >> >>
> >> >>> This diff adds support for IP_SENDSRCADDR cmsg on UDP sockets.
> >> >>> As for udp6_output(), we check that the source address+port is
> >> >>> available only if inp_laddr != *
> >> >>
> >> >> Your last IP_SENDSRCADDR diff didn't have that check, I think
> >> >> it is harmful.  If the socket is not bound then there is
> >> >> effectively no check performed by in_pcbaddrisavail(), thus I
> >> >> can use any random address. Other than this additional bypass
> >> >> check, your diff looks good to me.
> >> >>  
> > [...]  
> >> >>
> >> >> I haven't checked yet whether udp6_output is also affected.  If
> >> >> you folks already know that it isn't, please let me know.
> >> 
> >> The answer is "no", a few tests can't trigger the same problem.
> >> IIUC in6_selectsrc is responsible for rejection of non-local
> >> systems. Maybe we should take the same approach in netinet/, and
> >> extend in_selectsrc()?
> >> 
> >> --  
> >
> > While validating source address inside selection functions is the
> > right direction, I don't think it would be a good thing to extend
> > further in_selectsrc() prototype.  
> 
> I find it nice to have all the source address selection in one place.
> Or do you have another refactoring in mind?
> 

Uh, turns out I was operating on obsolete data. It would actually be
easy to shrink the in_selectsrc() prototype to
(int)(struct in_addr **, struct sockaddr_in *, struct inpcb *).
But this looks like a layering violation to me ... What do you think ?

$ grep -r in_selectsrc sys/net*
sys/netinet/in_pcb.c
sys/netinet/in_pcb.h
sys/netinet/udp_usrreq.c
$ cd sys/netinet
$ grep -A2 in_selectsrc
error = in_selectsrc(, sin, inp->inp_moptions,
&inp->inp_route, &inp->inp_laddr, inp->inp_rtableid);
if (error)
in_selectsrc(struct in_addr **insrc, struct sockaddr_in *sin,
struct ip_moptions *mopts, struct route *ro, struct in_addr *laddr,
u_int rtableid)
$ grep -A2 in_selectsrc udp_usrreq.c
error = in_selectsrc(&laddr, sin, inp->inp_moptions,
&inp->inp_route, &inp->inp_laddr, inp->inp_rtableid);
if (error)


> > However it is easy to add a check while
> > processing cmsg.
> >
> > rev2 below. Ok ?  
> 
> Nits below, looks fine otherwise.  The checks do detect addresses not
> configured on the system and overlaps of bound sockets.
> 
> >
> > diff --git a/share/man/man4/ip.4 b/share/man/man4/ip.4
> > index 111432b..154b0d1 100644
> > --- a/share/man/man4/ip.4
> > +++ b/share/man/man4/ip.4
> > @@ -290,6 +290,27 @@ cmsg_len = CMSG_LEN(sizeof(u_int))
> >  cmsg_level = IPPROTO_IP
> >  cmsg_type = IP_RECVRTABLE
> >  .Ed
> > +.Pp
> > +If the
> > +.Dv IP_SENDSRCADDR
> > +option is passed to a
> > +.Xr sendmsg 2
> > +call on a
> > +.Dv SOCK_DGRAM
> > +socket, the address passed along the
> > +.Vt cmsghdr
> > +structure will be used as the source of the outgoing
> > +.Tn UDP
> > +datagram.  The
> > +.Vt cmsghdr
> > +fields for
> > +.Xr sendmsg 2
> > +have the following values:  
> 
> I would have worded it "should have" here, since these are the values
> that the developer is supposed to pass.

Yes, I have to find a better wording for this part.

> 
> > +.Bd -literal -offset indent
> > +cmsg_len = CMSG_LEN(sizeof(struct in_addr))
> > +cmsg_level = IPPROTO_IP
> > +cmsg_type = IP_SENDSRCADDR
> > +.Ed
> >  .Ss "Multicast Options"
> >  .Tn IP
> >  multicasting is supported only on
> > diff --git a/sys/netinet/in.h b/sys/netinet/in.h
> > index adb1b30..bf8c95d 100644
> > --- a/sys/netinet/in.h
> > +++ b/sys/netinet/in.h
> > @@ -307,6 +307,7 @@ struct ip_opts {
> >  #define IP_RECVRTABLE  35   /* bool; receive rdomain
> > w/dgram */ #define IP_IPSECFLOWINFO 36   /* 

Re: IP_SENDSRCADDR [2/2] : add cmsg support

2016-06-13 Thread Vincent Gross
Le Mon, 13 Jun 2016 07:35:16 +0200,
j...@wxcvbn.org (Jérémie Courrèges-Anglas) a écrit :

> j...@wxcvbn.org (Jeremie Courreges-Anglas) writes:
> 
> > cc'ing sthen since he also has interest in IP_SENDSRCADDR
> >
> > Jeremie Courreges-Anglas <j...@wxcvbn.org> writes:
> >  
> >> Vincent Gross <vgr...@openbsd.org> writes:
> >>  
> >>> This diff adds support for IP_SENDSRCADDR cmsg on UDP sockets. As
> >>> for udp6_output(), we check that the source address+port is
> >>> available only if inp_laddr != *  
> >>
> >> Your last IP_SENDSRCADDR diff didn't have that check, I think it is
> >> harmful.  If the socket is not bound then there is effectively no
> >> check performed by in_pcbaddrisavail(), thus I can use any random
> >> address. Other than this additional bypass check, your diff looks
> >> good to me.
> >>
[...]
> >>
> >> I haven't checked yet whether udp6_output is also affected.  If you
> >> folks already know that it isn't, please let me know.  
> 
> The answer is "no", a few tests can't trigger the same problem.  IIUC
> in6_selectsrc is responsible for rejection of non-local systems.
> Maybe we should take the same approach in netinet/, and extend
> in_selectsrc()?
> 
> --

While validating source address inside selection functions is the right
direction, I don't think it would be a good thing to extend further
in_selectsrc() prototype. However it is easy to add a check while
processing cmsg.

rev2 below. Ok ?


diff --git a/share/man/man4/ip.4 b/share/man/man4/ip.4
index 111432b..154b0d1 100644
--- a/share/man/man4/ip.4
+++ b/share/man/man4/ip.4
@@ -290,6 +290,27 @@ cmsg_len = CMSG_LEN(sizeof(u_int))
 cmsg_level = IPPROTO_IP
 cmsg_type = IP_RECVRTABLE
 .Ed
+.Pp
+If the
+.Dv IP_SENDSRCADDR
+option is passed to a
+.Xr sendmsg 2
+call on a
+.Dv SOCK_DGRAM
+socket, the address passed along the
+.Vt cmsghdr
+structure will be used as the source of the outgoing
+.Tn UDP
+datagram.  The
+.Vt cmsghdr
+fields for
+.Xr sendmsg 2
+have the following values:
+.Bd -literal -offset indent
+cmsg_len = CMSG_LEN(sizeof(struct in_addr))
+cmsg_level = IPPROTO_IP
+cmsg_type = IP_SENDSRCADDR
+.Ed
 .Ss "Multicast Options"
 .Tn IP
 multicasting is supported only on
diff --git a/sys/netinet/in.h b/sys/netinet/in.h
index adb1b30..bf8c95d 100644
--- a/sys/netinet/in.h
+++ b/sys/netinet/in.h
@@ -307,6 +307,7 @@ struct ip_opts {
 #define IP_RECVRTABLE  35   /* bool; receive rdomain w/dgram */
 #define IP_IPSECFLOWINFO   36   /* bool; IPsec flow info for dgram */
 #define IP_IPDEFTTL  37   /* int; IP TTL system default */
+#define IP_SENDSRCADDR 38   /* struct in_addr; source address to use */
 
 #define IP_RTABLE  0x1021  /* int; routing table, see SO_RTABLE */
 #define IP_DIVERTFL  0x1022  /* int; divert direction flag opt */
diff --git a/sys/netinet/udp_usrreq.c b/sys/netinet/udp_usrreq.c
index 1feea11..401ed7a 100644
--- a/sys/netinet/udp_usrreq.c
+++ b/sys/netinet/udp_usrreq.c
@@ -888,6 +888,7 @@ udp_output(struct inpcb *inp, struct mbuf *m, struct mbuf *addr,
struct sockaddr_in *sin = NULL;
struct udpiphdr *ui;
u_int32_t ipsecflowinfo = 0;
+   struct sockaddr_in src_sin;
int len = m->m_pkthdr.len;
struct in_addr *laddr;
int error = 0;
@@ -906,6 +907,8 @@ udp_output(struct inpcb *inp, struct mbuf *m, struct mbuf *addr,
goto release;
}
 
+   memset(&src_sin, 0, sizeof(src_sin));
+
if (control) {
u_int clen;
struct cmsghdr *cm;
@@ -939,9 +942,20 @@ udp_output(struct inpcb *inp, struct mbuf *m, struct mbuf *addr,
cm->cmsg_level == IPPROTO_IP &&
cm->cmsg_type == IP_IPSECFLOWINFO) {
ipsecflowinfo = *(u_int32_t *)CMSG_DATA(cm);
-   break;
-   }
+   } else
 #endif
+   if (cm->cmsg_len == CMSG_LEN(sizeof(struct in_addr)) &&
+   cm->cmsg_level == IPPROTO_IP &&
+   cm->cmsg_type == IP_SENDSRCADDR) {
+   bzero(&src_sin, sizeof(src_sin));
+   memcpy(&src_sin.sin_addr, CMSG_DATA(cm),
+   sizeof(struct in_addr));
+   src_sin.sin_family = AF_INET;
+   src_sin.sin_len = sizeof(src_sin);
+   /* no check on reuse done when sin->sin_port == 0 */
+   if ((error = in_pcbaddrisavail(inp, &src_sin, 0, curproc)))
+   goto release;
+  

Re: IP_SENDSRCADDR [2/2] : add cmsg support

2016-06-12 Thread Vincent Gross
On Sun, 12 Jun 2016 15:29:32 +0200 (CEST)
Mark Kettenis <mark.kette...@xs4all.nl> wrote:

> > Date: Sun, 12 Jun 2016 14:59:55 +0200
> > From: Vincent Gross <vgr...@openbsd.org>
> > 
> > This diff adds support for IP_SENDSRCADDR cmsg on UDP sockets. As
> > for udp6_output(), we check that the source address+port is
> > available only if inp_laddr != *
> > 
> > Ok ?  
> 
> Why do we need this?  cmsg stuff is fragile, so we want to keep it
> as simple as possible.

In iked.conf(5), you can specify the local and remote addresses to use
for IKEv2 handshake. Let's say I have 192.0.2.1/25 on em0, and
192.0.2.129/25 on em1, and that I have a single udp socket bound to
0.0.0.0. I receive an IKEv2 message on em0, with 192.0.2.129 as
destination address, and a source address reachable only via em0.
If I reply with the receiving socket, in_selectsrc() will pick 192.0.2.1
as the reply source address, and the handshake will abort.

isakmpd(8) works around this by opening one socket per local address.
This means that we must either watch for RTM_NEWADDR and RTM_DELADDR,
or poll using getifaddrs(3), if we want to catch all changes.

This is one example; I remember other developers saying how they
would benefit from this, but I can't find those conversations again :P
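
To give an idea of what the polling side of the one-socket-per-address approach involves, here is a minimal sketch that asks getifaddrs(3) whether a given IPv4 address is currently configured. The helper name inet_addr_is_configured() is made up for illustration and this is not how isakmpd(8) does it. A daemon would have to rerun this kind of scan, or listen on a routing socket, every time addresses may have changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <ifaddrs.h>

/*
 * Hypothetical helper: report whether `addr` is currently configured on
 * any interface.  Returns false if getifaddrs(3) itself fails.
 */
static bool
inet_addr_is_configured(struct in_addr addr)
{
	struct ifaddrs *ifap, *ifa;
	bool found = false;

	if (getifaddrs(&ifap) == -1)
		return (false);
	for (ifa = ifap; ifa != NULL; ifa = ifa->ifa_next) {
		const struct sockaddr_in *sin;

		/* Skip interfaces without an address and non-IPv4 entries. */
		if (ifa->ifa_addr == NULL ||
		    ifa->ifa_addr->sa_family != AF_INET)
			continue;
		sin = (const struct sockaddr_in *)ifa->ifa_addr;
		if (sin->sin_addr.s_addr == addr.s_addr) {
			found = true;
			break;
		}
	}
	freeifaddrs(ifap);
	return (found);
}
```

With IP_SENDSRCADDR a single wildcard socket is enough and this bookkeeping goes away, which is the point of the diff.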

> 
> > diff --git a/share/man/man4/ip.4 b/share/man/man4/ip.4
> > index 111432b..154b0d1 100644
> > --- a/share/man/man4/ip.4
> > +++ b/share/man/man4/ip.4
> > @@ -290,6 +290,27 @@ cmsg_len = CMSG_LEN(sizeof(u_int))
> >  cmsg_level = IPPROTO_IP
> >  cmsg_type = IP_RECVRTABLE
> >  .Ed
> > +.Pp
> > +If the
> > +.Dv IP_SENDSRCADDR
> > +option is passed to a
> > +.Xr sendmsg 2
> > +call on a
> > +.Dv SOCK_DGRAM
> > +socket, the address passed along the
> > +.Vt cmsghdr
> > +structure will be used as the source of the outgoing
> > +.Tn UDP
> > +datagram.  The
> > +.Vt cmsghdr
> > +fields for
> > +.Xr sendmsg 2
> > +have the following values:
> > +.Bd -literal -offset indent
> > +cmsg_len = CMSG_LEN(sizeof(struct in_addr))
> > +cmsg_level = IPPROTO_IP
> > +cmsg_type = IP_SENDSRCADDR
> > +.Ed
> >  .Ss "Multicast Options"
> >  .Tn IP
> >  multicasting is supported only on
> > diff --git a/sys/netinet/in.h b/sys/netinet/in.h
> > index adb1b30..bf8c95d 100644
> > --- a/sys/netinet/in.h
> > +++ b/sys/netinet/in.h
> > @@ -307,6 +307,7 @@ struct ip_opts {
> >  #define IP_RECVRTABLE  35   /* bool; receive rdomain
> > w/dgram */ #define IP_IPSECFLOWINFO 36   /* bool; IPsec flow
> > info for dgram */ #define IP_IPDEFTTL   37   /* int;
> > IP TTL system default */ +#define IP_SENDSRCADDR
> > 38   /* struct in_addr; source address to use */ 
> >  #define IP_RTABLE  0x1021  /* int; routing
> > table, see SO_RTABLE */ #define IP_DIVERTFL
> > 0x1022  /* int; divert direction flag opt */ diff --git
> > a/sys/netinet/udp_usrreq.c b/sys/netinet/udp_usrreq.c index
> > 1feea11..35675b4 100644 --- a/sys/netinet/udp_usrreq.c
> > +++ b/sys/netinet/udp_usrreq.c
> > @@ -888,6 +888,7 @@ udp_output(struct inpcb *inp, struct mbuf *m,
> > struct mbuf *addr, struct sockaddr_in *sin = NULL;
> > struct udpiphdr *ui;
> > u_int32_t ipsecflowinfo = 0;
> > +   struct sockaddr_in src_sin;
> > int len = m->m_pkthdr.len;
> > struct in_addr *laddr;
> > int error = 0;
> > @@ -906,6 +907,8 @@ udp_output(struct inpcb *inp, struct mbuf *m,
> > struct mbuf *addr, goto release;
> > }
> >  
> > +   memset(&src_sin, 0, sizeof(src_sin));
> > +
> > if (control) {
> > u_int clen;
> > struct cmsghdr *cm;
> > @@ -939,9 +942,16 @@ udp_output(struct inpcb *inp, struct mbuf *m,
> > struct mbuf *addr, cm->cmsg_level == IPPROTO_IP &&
> > cm->cmsg_type == IP_IPSECFLOWINFO) {
> > ipsecflowinfo = *(u_int32_t
> > *)CMSG_DATA(cm);
> > -   break;
> > -   }
> > +   } else
> >  #endif
> > +   if (cm->cmsg_len == CMSG_LEN(sizeof(struct
> > in_addr)) &&
> > +   cm->cmsg_level == IPPROTO_IP &&
> > +   cm->cmsg_type == IP_SENDSRCADDR) {
> > +   memcpy(&src_sin.sin_addr,
> > CMSG_DATA(cm),
> > +   sizeof(struct in_addr));
> > +   src_sin.sin_family = AF_INET;
> > +   

Re: IP_SENDSRCADDR [1/2] : move cmsg handling code

2016-06-12 Thread Vincent Gross
On Sun, 12 Jun 2016 15:00:14 +0200
Vincent Gross <vgr...@openbsd.org> wrote:

Damn you autowrap ! get off my diff !

(thanks jca@ for spotting)

> This diff moves the cmsg handling code to the top of udp_output(). I
> split the whole IP_SENDSRCADDR thing in two chunks so that it's easier
> to audit.
> 
> ok ?
> 

diff --git a/sys/netinet/udp_usrreq.c b/sys/netinet/udp_usrreq.c
index 2db5998..1feea11 100644
--- a/sys/netinet/udp_usrreq.c
+++ b/sys/netinet/udp_usrreq.c
@@ -906,6 +906,47 @@ udp_output(struct inpcb *inp, struct mbuf *m, struct mbuf *addr,
goto release;
}
 
+   if (control) {
+   u_int clen;
+   struct cmsghdr *cm;
+   caddr_t cmsgs;
+
+   /*
+* XXX: Currently, we assume all the optional information is stored
+* in a single mbuf.
+*/
+   if (control->m_next) {
+   error = EINVAL;
+   goto release;
+   }
+
+   clen = control->m_len;
+   cmsgs = mtod(control, caddr_t);
+   do {
+   if (clen < CMSG_LEN(0)) {
+   error = EINVAL;
+   goto release;
+   }
+   cm = (struct cmsghdr *)cmsgs;
+   if (cm->cmsg_len < CMSG_LEN(0) ||
+   CMSG_ALIGN(cm->cmsg_len) > clen) {
+   error = EINVAL;
+   goto release;
+   }
+#ifdef IPSEC
+   if (ISSET(inp->inp_flags,INP_IPSECFLOWINFO) &&
+   cm->cmsg_len == CMSG_LEN(sizeof(ipsecflowinfo)) &&
+   cm->cmsg_level == IPPROTO_IP &&
+   cm->cmsg_type == IP_IPSECFLOWINFO) {
+   ipsecflowinfo = *(u_int32_t *)CMSG_DATA(cm);
+   break;
+   }
+#endif
+   clen -= CMSG_ALIGN(cm->cmsg_len);
+   cmsgs += CMSG_ALIGN(cm->cmsg_len);
+   } while (clen);
+   }
+
if (addr) {
sin = mtod(addr, struct sockaddr_in *);
 
@@ -947,45 +988,6 @@ udp_output(struct inpcb *inp, struct mbuf *m, struct mbuf *addr,
laddr = &inp->inp_laddr;
}
 
-#ifdef IPSEC
-   if (control && (inp->inp_flags & INP_IPSECFLOWINFO) != 0) {
-   u_int clen;
-   struct cmsghdr *cm;
-   caddr_t cmsgs;
-
-   /*
-* XXX: Currently, we assume all the optional information is stored
-* in a single mbuf.
-*/
-   if (control->m_next) {
-   error = EINVAL;
-   goto release;
-   }
-
-   clen = control->m_len;
-   cmsgs = mtod(control, caddr_t);
-   do {
-   if (clen < CMSG_LEN(0)) {
-   error = EINVAL;
-   goto release;
-   }
-   cm = (struct cmsghdr *)cmsgs;
-   if (cm->cmsg_len < CMSG_LEN(0) ||
-   CMSG_ALIGN(cm->cmsg_len) > clen) {
-   error = EINVAL;
-   goto release;
-   }
-   if (cm->cmsg_len == CMSG_LEN(sizeof(ipsecflowinfo)) &&
-   cm->cmsg_level == IPPROTO_IP &&
-   cm->cmsg_type == IP_IPSECFLOWINFO) {
-   ipsecflowinfo = *(u_int32_t *)CMSG_DATA(cm);
-   break;
-   }
-   clen -= CMSG_ALIGN(cm->cmsg_len);
-   cmsgs += CMSG_ALIGN(cm->cmsg_len);
-   } while (clen);
-   }
-#endif
/*
 * Calculate data length and get a mbuf
 * for UDP and IP headers.



IP_SENDSRCADDR [1/2] : move cmsg handling code

2016-06-12 Thread Vincent Gross
This diff moves the cmsg handling code to the top of udp_output(). I split
the whole IP_SENDSRCADDR thing in two chunks so that it's easier to
audit.

ok ?

diff --git a/sys/netinet/udp_usrreq.c b/sys/netinet/udp_usrreq.c
index 2db5998..1feea11 100644
--- a/sys/netinet/udp_usrreq.c
+++ b/sys/netinet/udp_usrreq.c
@@ -906,6 +906,47 @@ udp_output(struct inpcb *inp, struct mbuf *m, struct mbuf *addr,
goto release;
}
 
+   if (control) {
+   u_int clen;
+   struct cmsghdr *cm;
+   caddr_t cmsgs;
+
+   /*
+* XXX: Currently, we assume all the optional information is stored
+* in a single mbuf.
+*/
+   if (control->m_next) {
+   error = EINVAL;
+   goto release;
+   }
+
+   clen = control->m_len;
+   cmsgs = mtod(control, caddr_t);
+   do {
+   if (clen < CMSG_LEN(0)) {
+   error = EINVAL;
+   goto release;
+   }
+   cm = (struct cmsghdr *)cmsgs;
+   if (cm->cmsg_len < CMSG_LEN(0) ||
+   CMSG_ALIGN(cm->cmsg_len) > clen) {
+   error = EINVAL;
+   goto release;
+   }
+#ifdef IPSEC
+   if (ISSET(inp->inp_flags,INP_IPSECFLOWINFO) &&
+   cm->cmsg_len == CMSG_LEN(sizeof(ipsecflowinfo)) &&
+   cm->cmsg_level == IPPROTO_IP &&
+   cm->cmsg_type == IP_IPSECFLOWINFO) {
+   ipsecflowinfo = *(u_int32_t *)CMSG_DATA(cm);
+   break;
+   }
+#endif
+   clen -= CMSG_ALIGN(cm->cmsg_len);
+   cmsgs += CMSG_ALIGN(cm->cmsg_len);
+   } while (clen);
+   }
+
if (addr) {
sin = mtod(addr, struct sockaddr_in *);
 
@@ -947,45 +988,6 @@ udp_output(struct inpcb *inp, struct mbuf *m, struct mbuf *addr,
laddr = &inp->inp_laddr;
}
 
-#ifdef IPSEC
-   if (control && (inp->inp_flags & INP_IPSECFLOWINFO) != 0) {
-   u_int clen;
-   struct cmsghdr *cm;
-   caddr_t cmsgs;
-
-   /*
-* XXX: Currently, we assume all the optional information is stored
-* in a single mbuf.
-*/
-   if (control->m_next) {
-   error = EINVAL;
-   goto release;
-   }
-
-   clen = control->m_len;
-   cmsgs = mtod(control, caddr_t);
-   do {
-   if (clen < CMSG_LEN(0)) {
-   error = EINVAL;
-   goto release;
-   }
-   cm = (struct cmsghdr *)cmsgs;
-   if (cm->cmsg_len < CMSG_LEN(0) ||
-   CMSG_ALIGN(cm->cmsg_len) > clen) {
-   error = EINVAL;
-   goto release;
-   }
-   if (cm->cmsg_len == CMSG_LEN(sizeof(ipsecflowinfo)) &&
-   cm->cmsg_level == IPPROTO_IP &&
-   cm->cmsg_type == IP_IPSECFLOWINFO) {
-   ipsecflowinfo = *(u_int32_t *)CMSG_DATA(cm);
-   break;
-   }
-   clen -= CMSG_ALIGN(cm->cmsg_len);
-   cmsgs += CMSG_ALIGN(cm->cmsg_len);
-   } while (clen);
-   }
-#endif
/*
 * Calculate data length and get a mbuf
 * for UDP and IP headers.
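
For readers who want to try the validation logic outside the kernel, here
is a minimal userspace sketch of the same walk over a control buffer. It is
not part of the diff: find_cmsg() is a hypothetical helper, it uses a plain
while loop (so an empty buffer yields ENOENT instead of EINVAL), and it
relies on CMSG_ALIGN, which is a BSD/glibc extension and not POSIX.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper, modelled on the loop in the diff above.
 * Returns 0 and copies the payload when a message with the given
 * level, type and payload size is found, ENOENT when it is absent,
 * and EINVAL when the buffer is malformed.
 */
int
find_cmsg(const void *buf, size_t clen, int level, int type,
    void *out, size_t outlen)
{
	const char *cmsgs = buf;
	struct cmsghdr cm;

	while (clen > 0) {
		/* Not even room for a header: reject. */
		if (clen < CMSG_LEN(0))
			return EINVAL;
		/* Copy the header out to avoid unaligned access. */
		memcpy(&cm, cmsgs, sizeof(cm));
		/* The claimed length must cover a header and fit the buffer. */
		if (cm.cmsg_len < CMSG_LEN(0) ||
		    CMSG_ALIGN(cm.cmsg_len) > clen)
			return EINVAL;
		if (cm.cmsg_len == CMSG_LEN(outlen) &&
		    cm.cmsg_level == level && cm.cmsg_type == type) {
			/* The payload starts right after the aligned header. */
			memcpy(out, cmsgs + CMSG_LEN(0), outlen);
			return 0;
		}
		clen -= CMSG_ALIGN(cm.cmsg_len);
		cmsgs += CMSG_ALIGN(cm.cmsg_len);
	}
	return ENOENT;
}
```

As in the kernel loop, a message whose aligned length runs past the end of
the buffer is rejected, so callers should size each entry with CMSG_SPACE().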



IP_SENDSRCADDR [2/2] : add cmsg support

2016-06-12 Thread Vincent Gross
This diff adds support for the IP_SENDSRCADDR cmsg on UDP sockets. As in
udp6_output(), we check that the source address+port is available only
if inp_laddr != *.

Ok ?

diff --git a/share/man/man4/ip.4 b/share/man/man4/ip.4
index 111432b..154b0d1 100644
--- a/share/man/man4/ip.4
+++ b/share/man/man4/ip.4
@@ -290,6 +290,27 @@ cmsg_len = CMSG_LEN(sizeof(u_int))
 cmsg_level = IPPROTO_IP
 cmsg_type = IP_RECVRTABLE
 .Ed
+.Pp
+If the
+.Dv IP_SENDSRCADDR
+option is passed to a
+.Xr sendmsg 2
+call on a
+.Dv SOCK_DGRAM
+socket, the address passed along the
+.Vt cmsghdr
+structure will be used as the source of the outgoing
+.Tn UDP
+datagram.  The
+.Vt cmsghdr
+fields for
+.Xr sendmsg 2
+have the following values:
+.Bd -literal -offset indent
+cmsg_len = CMSG_LEN(sizeof(struct in_addr))
+cmsg_level = IPPROTO_IP
+cmsg_type = IP_SENDSRCADDR
+.Ed
 .Ss "Multicast Options"
 .Tn IP
 multicasting is supported only on
diff --git a/sys/netinet/in.h b/sys/netinet/in.h
index adb1b30..bf8c95d 100644
--- a/sys/netinet/in.h
+++ b/sys/netinet/in.h
@@ -307,6 +307,7 @@ struct ip_opts {
 #define IP_RECVRTABLE  35   /* bool; receive rdomain w/dgram */
 #define IP_IPSECFLOWINFO   36   /* bool; IPsec flow info for dgram */
 #define IP_IPDEFTTL37   /* int; IP TTL system default */
+#define IP_SENDSRCADDR 38   /* struct in_addr; source address to use */
 
 #define IP_RTABLE  0x1021  /* int; routing table, see SO_RTABLE */
 #define IP_DIVERTFL0x1022  /* int; divert direction flag opt */
diff --git a/sys/netinet/udp_usrreq.c b/sys/netinet/udp_usrreq.c
index 1feea11..35675b4 100644
--- a/sys/netinet/udp_usrreq.c
+++ b/sys/netinet/udp_usrreq.c
@@ -888,6 +888,7 @@ udp_output(struct inpcb *inp, struct mbuf *m, struct mbuf *addr,
struct sockaddr_in *sin = NULL;
struct udpiphdr *ui;
u_int32_t ipsecflowinfo = 0;
+   struct sockaddr_in src_sin;
int len = m->m_pkthdr.len;
struct in_addr *laddr;
int error = 0;
@@ -906,6 +907,8 @@ udp_output(struct inpcb *inp, struct mbuf *m, struct mbuf *addr,
goto release;
}
 
+   memset(&src_sin, 0, sizeof(src_sin));
+
if (control) {
u_int clen;
struct cmsghdr *cm;
@@ -939,9 +942,16 @@ udp_output(struct inpcb *inp, struct mbuf *m, struct mbuf *addr,
cm->cmsg_level == IPPROTO_IP &&
cm->cmsg_type == IP_IPSECFLOWINFO) {
ipsecflowinfo = *(u_int32_t *)CMSG_DATA(cm);
-   break;
-   }
+   } else
 #endif
+   if (cm->cmsg_len == CMSG_LEN(sizeof(struct in_addr)) &&
+   cm->cmsg_level == IPPROTO_IP &&
+   cm->cmsg_type == IP_SENDSRCADDR) {
+   memcpy(&src_sin.sin_addr, CMSG_DATA(cm),
+   sizeof(struct in_addr));
+   src_sin.sin_family = AF_INET;
+   src_sin.sin_len = sizeof(src_sin);
+   }
clen -= CMSG_ALIGN(cm->cmsg_len);
cmsgs += CMSG_ALIGN(cm->cmsg_len);
} while (clen);
@@ -980,6 +990,17 @@ udp_output(struct inpcb *inp, struct mbuf *m, struct mbuf *addr,
if (error)
goto release;
}
+
+   if (src_sin.sin_len > 0 &&
+   src_sin.sin_addr.s_addr != INADDR_ANY &&
+   src_sin.sin_addr.s_addr != inp->inp_laddr.s_addr) {
+   src_sin.sin_port = inp->inp_lport;
+   if (inp->inp_laddr.s_addr != INADDR_ANY &&
+   (error =
+   in_pcbaddrisavail(inp, &src_sin, 0, curproc)))
+   goto release;
+   laddr = &src_sin.sin_addr;
+   }
} else {
if (inp->inp_faddr.s_addr == INADDR_ANY) {
error = ENOTCONN;
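
From the userland side, the cmsghdr values documented in the ip.4 hunk
above could be filled in roughly as follows. This is only a sketch:
set_sendsrcaddr() is a hypothetical helper, and the fallback value 38 is
the one proposed in the in.h hunk of this diff, defined here only so the
sketch compiles on systems whose headers lack IP_SENDSRCADDR. The system
header's definition should always take precedence.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Assumption: value taken from the in.h hunk above; headers win if present. */
#ifndef IP_SENDSRCADDR
#define IP_SENDSRCADDR 38
#endif

/*
 * Hypothetical helper: attach one IP_SENDSRCADDR control message
 * carrying src to msg. cbuf must be aligned for struct cmsghdr and
 * hold at least CMSG_SPACE(sizeof(struct in_addr)) bytes.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int
set_sendsrcaddr(struct msghdr *msg, void *cbuf, size_t cbuflen,
    const struct in_addr *src)
{
	struct cmsghdr *cm;

	if (cbuflen < CMSG_SPACE(sizeof(*src)))
		return -1;
	memset(cbuf, 0, cbuflen);
	msg->msg_control = cbuf;
	msg->msg_controllen = CMSG_SPACE(sizeof(*src));

	cm = CMSG_FIRSTHDR(msg);
	cm->cmsg_len = CMSG_LEN(sizeof(*src));	/* must match the kernel check */
	cm->cmsg_level = IPPROTO_IP;
	cm->cmsg_type = IP_SENDSRCADDR;
	memcpy(CMSG_DATA(cm), src, sizeof(*src));
	return 0;
}
```

The caller would then fill in msg_name and msg_iov as usual and pass the
msghdr to sendmsg(2) on a SOCK_DGRAM socket.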



Re: Set prio when bypassing pf(4)

2016-06-08 Thread Vincent Gross
On Wed, 8 Jun 2016 15:12:23 +0200
Martin Pieuchot <m...@openbsd.org> wrote:

> On 07/06/16(Tue) 22:02, Stuart Henderson wrote:
> > On 2016/06/07 21:49, Vincent Gross wrote:  
> > > 
> > > It's how henning@ set things up when integrating the new queuing
> > > mechanism.
> > > http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/kern/uipc_mbuf.c#rev1.160
> > >  
> > > > Is there any use for this apart for vlan(4) interfaces?  
> > > 
> > > AFAICT, no.   
> 
> In this case I'd suggest to make this a vlan(4) specific
> configuration, is there a problem with that?

Actually, there is. Consider this setup:

# ifconfig vlan4 vlan 4 vlandev em0 up
# ifconfig vlan5 vlan 4 vlandev em1 up
# ifconfig trunk0 trunkproto failover trunkport vlan4 trunkport vlan5 up
# ifconfig trunk0 10.10.10.50/24

llprio in vlan4 or vlan5 is useless because they are not initiating
ARP requests, and adding lookups in trunk would be the Wrong Way.

This particular example might seem far-fetched, but I'm sure there are
plenty worse actually deployed.

[...]

> > > I don't think we should make a special case for vlan(4), this
> > > kind of detail do not belong to the arp(4) or bpf(4) layer.
> 
> Which kind detail are you talking about?  I still don't understand how
> the scope if this problem is beyond vlan(4).  Are you also interested
> in queue priority?
> 

What I meant is arp(4) and bpf(4) and maybe other protocols should not
need to concern themselves about the nature of the device they are
transmitting on.

This problem manifests itself only on vlan(4) so far, because of the
CoS field in the 802.1Q header. But the problem is broader than that
actually, as some network protocols completely bypass pf(4), and it
can have consequences as we go down into the network device stacking.



Re: Set prio when bypassing pf(4)

2016-06-07 Thread Vincent Gross
On Tue, 7 Jun 2016 10:48:22 +0200,
Martin Pieuchot <m...@openbsd.org> wrote:

> On 06/06/16(Mon) 23:52, Vincent Gross wrote:
> > On Mon, 6 Jun 2016 17:33:36 +0100
> > Stuart Henderson <s...@spacehopper.org> wrote:
> >   
> > > On 2016/06/06 16:15, Vincent Gross wrote:  
> > > > When sending ARP requests, or when writing to a bpf handle (as
> > > > when sending DHCP Discover), we bypass pf(4) so we have no way
> > > > to define the priority (m->m_pkthdr.pf.prio) of the outgoing
> > > > packets.  
> > [...]  
> > > > 
> > > > This diff adds
> > > > 1) an if_llprio field to struct ifnet
> > > 
> > > struct if_data.. this is used by enough ports that changing the
> > > abi  
> > [...]  
> > >   
> > > > diff --git a/sbin/ifconfig/ifconfig.8
> > > > b/sbin/ifconfig/ifconfig.8
> > > 
> > > BTW. patch warns about offsets if you apply this to -current.
> > >   
> > [...]  
> > > 
> > > Other than these points, it seems a useful thing to do, pppoe
> > > could use it too.
> > > 
> > > I wonder what these broken ISP devices are that require the
> > > priority field in the vlan frame header to be 0 (aka "prio 1")...
> > >   
> > 
> > r2 below. I moved if_llprio from if_data to struct ifnet, and went
> > from u_char to u_int8_t. I also added a bound check in ifioctl().
> > 
> > Comments ? ok ?  
> 
> Could you explain me why our default prio is 3?
> 

It's how henning@ set things up when integrating the new queuing mechanism.
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/kern/uipc_mbuf.c#rev1.160

> Is there any use for this apart for vlan(4) interfaces?

AFAICT, no. 

> Should it
> really be part of "struct ifnet" ?
> 

sthen@ pointed out that struct if_data was heavily used by our ports, and
that such a change would require a version bump. Now, I may have overlooked
a better place for it.

I don't think we should make a special case for vlan(4); this kind of detail
does not belong in the arp(4) or bpf(4) layer.

> I also find weird to see a field inside ``m_pkthdr.pf'' being used
> without pf(4).
> 
> > Index: sbin/ifconfig/ifconfig.8
> > ===
> > RCS file: /cvs/src/sbin/ifconfig/ifconfig.8,v
> > retrieving revision 1.267
> > diff -u -p -r1.267 ifconfig.8
> > --- sbin/ifconfig/ifconfig.8 6 Apr 2016 10:07:14 -   1.267
> > +++ sbin/ifconfig/ifconfig.8  6 Jun 2016 21:43:46 -
> > @@ -327,6 +327,10 @@ Disable special processing at the link l
> >  Change the link layer address (MAC address) of the interface.
> >  This should be specified as six colon-separated hex values, or can
> >  be chosen randomly.
> > +.It Cm llprio Ar prio
> > +Set the priority for link layer communications
> > +.Pf ( Xr arp 4 ,
> > +.Xr bpf 4 ) .
> >  .It Cm media Op Ar type
> >  Set the media type of the interface to
> >  .Ar type .
> > Index: sbin/ifconfig/ifconfig.c
> > ===
> > RCS file: /cvs/src/sbin/ifconfig/ifconfig.c,v
> > retrieving revision 1.322
> > diff -u -p -r1.322 ifconfig.c
> > --- sbin/ifconfig/ifconfig.c 3 May 2016 17:52:33 -   1.322
> > +++ sbin/ifconfig/ifconfig.c  6 Jun 2016 21:43:46 -
> > @@ -135,6 +135,7 @@ char name[IFNAMSIZ];
> >  intflags, xflags, setaddr, setipdst, doalias;
> >  u_long metric, mtu;
> >  intrdomainid;
> > +intllprio;
> >  intclearaddr, s;
> >  intnewaddr = 0;
> >  intaf = AF_INET;
> > @@ -157,6 +158,7 @@ voidaddaf(const char *, int);
> >  void   removeaf(const char *, int);
> >  void   setifbroadaddr(const char *, int);
> >  void   setifmtu(const char *, int);
> > +void   setifllprio(const char *, int);
> >  void   setifnwid(const char *, int);
> >  void   setifbssid(const char *, int);
> >  void   setifnwkey(const char *, int);
> > @@ -521,6 +523,7 @@ const structcmd {
> > { "instance",   NEXTARG,A_MEDIAINST,setmediainst },
> > { "inst",   NEXTARG,A_MEDIAINST,setmediainst },
> > { "lladdr", NEXTARG,0,  setiflladdr },
> > +   { "llprio", NEXTARG,0,  setifllprio },
> > { NULL, /*src*/ 0,  0,  setifaddr },
> > { NULL, /*dst*/ 0,  0,

Re: Set prio when bypassing pf(4)

2016-06-06 Thread Vincent Gross
On Mon, 6 Jun 2016 17:33:36 +0100
Stuart Henderson <s...@spacehopper.org> wrote:

> On 2016/06/06 16:15, Vincent Gross wrote:
> > When sending ARP requests, or when writing to a bpf handle (as when
> > sending DHCP Discover), we bypass pf(4) so we have no way to define
> > the priority (m->m_pkthdr.pf.prio) of the outgoing packets.
[...]
> > 
> > This diff adds
> > 1) an if_llprio field to struct ifnet  
> 
> struct if_data.. this is used by enough ports that changing the abi
[...]
> 
> > diff --git a/sbin/ifconfig/ifconfig.8 b/sbin/ifconfig/ifconfig.8  
> 
> BTW. patch warns about offsets if you apply this to -current.
> 
[...]
> 
> Other than these points, it seems a useful thing to do, pppoe could
> use it too.
> 
> I wonder what these broken ISP devices are that require the
> priority field in the vlan frame header to be 0 (aka "prio 1")...
> 

r2 below. I moved if_llprio from if_data to struct ifnet, and went from
u_char to u_int8_t. I also added a bound check in ifioctl().

Comments ? ok ?
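
For illustration, here is a sketch of parsing and range-checking an llprio
argument in one place. It is not what the diff does: the diff parses with
strtonum(3) up to UCHAR_MAX in ifconfig and leaves the tighter check to
ifioctl(). parse_llprio() and LLPRIO_MAX are hypothetical names, the upper
bound of 7 is an assumption (eight priority levels), and strtol(3) is used
only because strtonum(3) is not available everywhere.

```c
#include <errno.h>
#include <stdlib.h>

#define LLPRIO_MAX 7	/* assumption: priorities run from 0 to 7 */

/*
 * Hypothetical helper: parse val as a decimal priority.
 * Returns 0 and stores the value in *prio on success,
 * EINVAL if val is not a number, ERANGE if it is out of bounds.
 */
int
parse_llprio(const char *val, unsigned char *prio)
{
	char *end;
	long n;

	errno = 0;
	n = strtol(val, &end, 10);
	/* Reject empty strings and trailing garbage. */
	if (end == val || *end != '\0')
		return EINVAL;
	if (errno == ERANGE || n < 0 || n > LLPRIO_MAX)
		return ERANGE;
	*prio = (unsigned char)n;
	return 0;
}
```

Checking in userland gives a friendlier error message, but the kernel
check is still needed since ioctl(2) callers are not limited to ifconfig.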

Index: sbin/ifconfig/ifconfig.8
===
RCS file: /cvs/src/sbin/ifconfig/ifconfig.8,v
retrieving revision 1.267
diff -u -p -r1.267 ifconfig.8
--- sbin/ifconfig/ifconfig.86 Apr 2016 10:07:14 -   1.267
+++ sbin/ifconfig/ifconfig.86 Jun 2016 21:43:46 -
@@ -327,6 +327,10 @@ Disable special processing at the link l
 Change the link layer address (MAC address) of the interface.
 This should be specified as six colon-separated hex values, or can
 be chosen randomly.
+.It Cm llprio Ar prio
+Set the priority for link layer communications
+.Pf ( Xr arp 4 ,
+.Xr bpf 4 ) .
 .It Cm media Op Ar type
 Set the media type of the interface to
 .Ar type .
Index: sbin/ifconfig/ifconfig.c
===
RCS file: /cvs/src/sbin/ifconfig/ifconfig.c,v
retrieving revision 1.322
diff -u -p -r1.322 ifconfig.c
--- sbin/ifconfig/ifconfig.c3 May 2016 17:52:33 -   1.322
+++ sbin/ifconfig/ifconfig.c6 Jun 2016 21:43:46 -
@@ -135,6 +135,7 @@ charname[IFNAMSIZ];
 intflags, xflags, setaddr, setipdst, doalias;
 u_long metric, mtu;
 intrdomainid;
+intllprio;
 intclearaddr, s;
 intnewaddr = 0;
 intaf = AF_INET;
@@ -157,6 +158,7 @@ voidaddaf(const char *, int);
 void   removeaf(const char *, int);
 void   setifbroadaddr(const char *, int);
 void   setifmtu(const char *, int);
+void   setifllprio(const char *, int);
 void   setifnwid(const char *, int);
 void   setifbssid(const char *, int);
 void   setifnwkey(const char *, int);
@@ -521,6 +523,7 @@ const structcmd {
{ "instance",   NEXTARG,A_MEDIAINST,setmediainst },
{ "inst",   NEXTARG,A_MEDIAINST,setmediainst },
{ "lladdr", NEXTARG,0,  setiflladdr },
+   { "llprio", NEXTARG,0,  setifllprio },
{ NULL, /*src*/ 0,  0,  setifaddr },
{ NULL, /*dst*/ 0,  0,  setifdstaddr },
{ NULL, /*illegal*/0,   0,  NULL },
@@ -854,6 +857,11 @@ getinfo(struct ifreq *ifr, int create)
else
rdomainid = ifr->ifr_rdomainid;
 #endif
+   if (ioctl(s, SIOCGIFLLPRIO, (caddr_t)ifr) < 0)
+   llprio = 0;
+   else
+   llprio = ifr->ifr_llprio;
+
return (0);
 }
 
@@ -1411,6 +1419,21 @@ setifmtu(const char *val, int d)
 
 /* ARGSUSED */
 void
+setifllprio(const char *val, int d)
+{
+   const char *errmsg = NULL;
+
+   (void) strlcpy(ifr.ifr_name, name, sizeof(ifr.ifr_name));
+
+   ifr.ifr_mtu = strtonum(val, 0, UCHAR_MAX, &errmsg);
+   if (errmsg)
+   errx(1, "mtu %s: %s", val, errmsg);
+   if (ioctl(s, SIOCSIFLLPRIO, (caddr_t)&ifr) < 0)
+   warn("SIOCSIFLLPRIO");
+}
+
+/* ARGSUSED */
+void
 setifgroup(const char *group_name, int dummy)
 {
struct ifgroupreq ifgr;
@@ -2894,6 +2917,7 @@ status(int link, struct sockaddr_dl *sdl
printf(" metric %lu", metric);
if (mtu)
printf(" mtu %lu", mtu);
+   printf(" llprio %lu", llprio);
putchar('\n');
 #ifndef SMALL
if (showcapsflag)
Index: sys/net/bpf.c
===
RCS file: /cvs/src/sys/net/bpf.c,v
retrieving revision 1.141
diff -u -p -r1.141 bpf.c
--- sys/net/bpf.c   18 May 2016 03:46:03 -  1.141
+++ sys/net/bpf.c   6 Jun 2016 21:43:48 -
@@ -561,6 +561,7 @@ bpfwrite(dev_t dev, struct uio *uio, int
}
 
m->m_pkthdr.ph_rtableid = ifp->if_rdomain;
+   m->m_pkthdr.pf.prio = ifp->if_llprio;
 
if (d->bd_hdrcmplt && dst.ss_fam

Set prio when bypassing pf(4)

2016-06-06 Thread Vincent Gross
When sending ARP requests, or when writing to a bpf handle (as when
sending DHCP Discover), we bypass pf(4) so we have no way to define
the priority (m->m_pkthdr.pf.prio) of the outgoing packets.

My ISP runs two vlans to separate the delivery of general-purpose
internet and TV/phone over fiber; on the internet vlan, any frame with
a priority different from 0 is dropped; because we use m_pkthdr.pf.prio
to define this priority, and the default priority IFQ_DEFPRIO == 3,
all of my ARP and DHCP frames are dropped when I use a stock OpenBSD
kernel.
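
To make the effect concrete, here is a small sketch of how a packet
priority ends up in the 802.1Q tag. It is an illustration and not code
from the tree: vlan_tci() is a hypothetical helper, and the swap of
priorities 0 and 1 is an assumption based on the remark elsewhere in this
thread that a vlan header priority of 0 corresponds to "prio 1". The tag
layout itself (3 priority bits above a 12-bit VLAN ID) is standard 802.1Q.

```c
#include <stdint.h>

#define VLAN_PRIO_SHIFT	13	/* priority sits in the top 3 bits of the TCI */
#define VLAN_VID_MASK	0x0fff	/* VLAN ID is the low 12 bits */

/*
 * Hypothetical helper: build the 16-bit tag control information
 * for VLAN vid from a packet priority in the range 0..7.
 */
uint16_t
vlan_tci(uint16_t vid, uint8_t prio)
{
	uint8_t pcp = prio & 0x7;

	/* Assumption: 0 and 1 trade places on the wire. */
	if (pcp <= 1)
		pcp = !pcp;
	return (uint16_t)((vid & VLAN_VID_MASK) | (pcp << VLAN_PRIO_SHIFT));
}
```

Under these assumptions the default priority of 3 yields a non-zero
priority field, which is what my ISP drops, while priority 1 yields a
priority field of 0.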

This diff adds
1) an if_llprio field to struct ifnet
2) the "llprio" keyword to ifconfig(8) and its manpage
3) code to init m_pkthdr.pf.prio from ifp->if_llprio when doing arp(4)
and bpf(4)

Don't forget to install the new headers before rebuilding ifconfig(8).

Comments ?


diff --git a/sbin/ifconfig/ifconfig.8 b/sbin/ifconfig/ifconfig.8
index c301a90..1f42e41 100644
--- a/sbin/ifconfig/ifconfig.8
+++ b/sbin/ifconfig/ifconfig.8
@@ -327,6 +327,10 @@ Disable special processing at the link level with the specified interface.
 Change the link layer address (MAC address) of the interface.
 This should be specified as six colon-separated hex values, or can
 be chosen randomly.
+.It Cm llprio Ar prio
+Set the priority for link layer communications
+.Pf ( Xr arp 4 ,
+.Xr bpf 4 ) .
 .It Cm media Op Ar type
 Set the media type of the interface to
 .Ar type .
diff --git a/sbin/ifconfig/ifconfig.c b/sbin/ifconfig/ifconfig.c
index c30ced5..c1e3594 100644
--- a/sbin/ifconfig/ifconfig.c
+++ b/sbin/ifconfig/ifconfig.c
@@ -135,6 +135,7 @@ charname[IFNAMSIZ];
 intflags, xflags, setaddr, setipdst, doalias;
 u_long metric, mtu;
 intrdomainid;
+intllprio;
 intclearaddr, s;
 intnewaddr = 0;
 intaf = AF_INET;
@@ -157,6 +158,7 @@ voidaddaf(const char *, int);
 void   removeaf(const char *, int);
 void   setifbroadaddr(const char *, int);
 void   setifmtu(const char *, int);
+void   setifllprio(const char *, int);
 void   setifnwid(const char *, int);
 void   setifbssid(const char *, int);
 void   setifnwkey(const char *, int);
@@ -521,6 +523,7 @@ const structcmd {
{ "instance",   NEXTARG,A_MEDIAINST,setmediainst },
{ "inst",   NEXTARG,A_MEDIAINST,setmediainst },
{ "lladdr", NEXTARG,0,  setiflladdr },
+   { "llprio", NEXTARG,0,  setifllprio },
{ NULL, /*src*/ 0,  0,  setifaddr },
{ NULL, /*dst*/ 0,  0,  setifdstaddr },
{ NULL, /*illegal*/0,   0,  NULL },
@@ -854,6 +857,11 @@ getinfo(struct ifreq *ifr, int create)
else
rdomainid = ifr->ifr_rdomainid;
 #endif
+   if (ioctl(s, SIOCGIFLLPRIO, (caddr_t)ifr) < 0)
+   llprio = 0;
+   else
+   llprio = ifr->ifr_llprio;
+
return (0);
 }
 
@@ -1411,6 +1419,21 @@ setifmtu(const char *val, int d)
 
 /* ARGSUSED */
 void
+setifllprio(const char *val, int d)
+{
+   const char *errmsg = NULL;
+
+   (void) strlcpy(ifr.ifr_name, name, sizeof(ifr.ifr_name));
+
+   ifr.ifr_mtu = strtonum(val, 0, UCHAR_MAX, &errmsg);
+   if (errmsg)
+   errx(1, "mtu %s: %s", val, errmsg);
+   if (ioctl(s, SIOCSIFLLPRIO, (caddr_t)&ifr) < 0)
+   warn("SIOCSIFLLPRIO");
+}
+
+/* ARGSUSED */
+void
 setifgroup(const char *group_name, int dummy)
 {
struct ifgroupreq ifgr;
@@ -2894,6 +2917,7 @@ status(int link, struct sockaddr_dl *sdl, int ls)
printf(" metric %lu", metric);
if (mtu)
printf(" mtu %lu", mtu);
+   printf(" llprio %lu", llprio);
putchar('\n');
 #ifndef SMALL
if (showcapsflag)
diff --git a/sys/net/bpf.c b/sys/net/bpf.c
index 31b6ed0..d2f1060 100644
--- a/sys/net/bpf.c
+++ b/sys/net/bpf.c
@@ -561,6 +561,7 @@ bpfwrite(dev_t dev, struct uio *uio, int ioflag)
}
 
m->m_pkthdr.ph_rtableid = ifp->if_rdomain;
+   m->m_pkthdr.pf.prio = ifp->if_llprio;
 
if (d->bd_hdrcmplt && dst.ss_family == AF_UNSPEC)
dst.ss_family = pseudo_AF_HDRCMPLT;
diff --git a/sys/net/if.c b/sys/net/if.c
index 9b53bf1..e155b77 100644
--- a/sys/net/if.c
+++ b/sys/net/if.c
@@ -536,6 +536,7 @@ if_attach_common(struct ifnet *ifp)
M_TEMP, M_WAITOK|M_ZERO);
ifp->if_linkstatetask = malloc(sizeof(*ifp->if_linkstatetask),
M_TEMP, M_WAITOK|M_ZERO);
+   ifp->if_llprio = IFQ_DEFPRIO;
 
SRPL_INIT(&ifp->if_inputs);
 }
@@ -1988,6 +1989,16 @@ ifioctl(struct socket *so, u_long cmd, caddr_t data, struct proc *p)
ifnewlladdr(ifp);
break;
 
+   case SIOCGIFLLPRIO:
+   ifr->ifr_llprio = ifp->if_llprio;
+   break;
+
+   case SIOCSIFLLPRIO:
+   if ((error = suser(p, 0)))
+   return (error);
+   ifp->if_llprio = ifr->ifr_llprio;
+ 

Re: ifa_ifwithroute() fix

2016-05-31 Thread Vincent Gross
On Tue, 31 May 2016 09:51:10 +0200
Martin Pieuchot  wrote:

> On 19/04/16(Tue) 10:43, Martin Pieuchot wrote:
> > Mart Tõnso reported [0] a weird case related to the use of
> > ifa_ifwithnet().
> > 
> > The problem is that ifa_ifwithroute() does not always use route
> > entries but the poor's man routing table: ifa_ifwithnet().  This is
> > misleading because one cannot understand why "# route add" is not
> > coherent with "# route get".
> > 
> > So I'd like to commit the diff below which always use the route
> > table unless an interface index is specified in the gateway.  Mart
> > Tõnso confirmed it fixes his issue.
> > 
> > ok?  
> 
> Anyone?

ok vgross@

> 
> > 
> > [0] https://marc.info/?l=openbsd-misc&m=146046751201006&w=2
> > 
> > 
> > Index: net/route.c
> > ===
> > RCS file: /cvs/src/sys/net/route.c,v
> > retrieving revision 1.298
> > diff -u -p -r1.298 route.c
> > --- net/route.c 26 Mar 2016 21:56:04 -  1.298
> > +++ net/route.c 13 Apr 2016 07:38:11 -
> > @@ -740,20 +740,16 @@ ifa_ifwithroute(int flags, struct sockad
> > ifa = ifaof_ifpforaddr(dst, ifp);
> > if_put(ifp);
> > } else {
> > -   ifa = ifa_ifwithnet(gateway, rtableid);
> > -   }
> > -   }
> > -   if (ifa == NULL) {
> > -   struct rtentry  *rt = rtalloc(gateway, 0,
> > rtableid);
> > -   /* The gateway must be local if the same address
> > family. */
> > -   if (!rtisvalid(rt) || ((rt->rt_flags &
> > RTF_GATEWAY) &&
> > -   rt_key(rt)->sa_family == dst->sa_family)) {
> > +   struct rtentry *rt;
> > +
> > +   rt = rtalloc(gateway, RT_RESOLVE,
> > rtableid);
> > +   if (rt != NULL)
> > +   ifa = rt->rt_ifa;
> > rtfree(rt);
> > -   return (NULL);
> > }
> > -   ifa = rt->rt_ifa;
> > -   rtfree(rt);
> > }
> > +   if (ifa == NULL)
> > +   return (NULL);
> > if (ifa->ifa_addr->sa_family != dst->sa_family) {
> > struct ifaddr   *oifa = ifa;
> > ifa = ifaof_ifpforaddr(dst, ifa->ifa_ifp);
> >   
> 



Preserve DiffServ when fragmenting ipv4

2016-05-04 Thread Vincent Gross
When fragmenting ipv4, we do not preserve the DiffServ/ToS field.
Here is how to observe this:

[obsd1](vlan10)  (vlan10)[obsd2](vlan20) --mtu600-- (vlan20)[obsd3]

root@obsd2 # sysctl net.inet.ip.forwarding=1
root@obsd2 # tcpdump -ni $VLAN20DEV

user@obsd3 $ nc -4ul 

root@obsd1 $ echo "pass set prio 1" | pfctl -f -
user@obsd1 $ perl -e 'print "a"x800' | nc -4u $OBSD3VLAN20 


tcpdump: listening on vio0, link-type EN10MB
11:34.26.588937 802.1Q vid 10 pri 0 10.10.0.10.45095 > 10.20.0.20.: udp 800
11:34.26.589121 802.1Q vid 20 pri 0 10.10.0.10.45095 > 10.20.0.20.: udp 800 (frag 26935:576@0+)
11:34.26.589152 802.1Q vid 20 pri 3 10.10.0.10 > 10.20.0.20: (frag 26935:232@576)



Diff below ensures the fragmented packets have the same priority.

Ok ?

diff --git a/sys/netinet/ip_output.c b/sys/netinet/ip_output.c
index d0b15f8..5921566 100644
--- a/sys/netinet/ip_output.c
+++ b/sys/netinet/ip_output.c
@@ -678,9 +678,10 @@ ip_fragment(struct mbuf *m, struct ifnet *ifp, u_long mtu)
m->m_data += max_linkhdr;
mhip = mtod(m, struct ip *);
*mhip = *ip;
-   /* we must inherit MCAST and BCAST flags and routing table */
+   /* we must inherit MCAST/BCAST flags, routing table and prio */
m->m_flags |= m0->m_flags & (M_MCAST|M_BCAST);
m->m_pkthdr.ph_rtableid = m0->m_pkthdr.ph_rtableid;
+   m->m_pkthdr.pf.prio = m0->m_pkthdr.pf.prio;
if (hlen > sizeof (struct ip)) {
mhlen = ip_optcopy(ip, mhip) + sizeof (struct ip);
mhip->ip_hl = mhlen >> 2;
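
The fragment sizes in the tcpdump output above (576 and 232 bytes) can be
reproduced with a little arithmetic: 800 bytes of UDP data plus the 8-byte
UDP header give an 808-byte IP payload, and with a 600-byte MTU and a
20-byte IP header every non-final fragment carries the largest multiple
of 8 that fits in 580 bytes. The sketch below is a standalone illustration
and not the kernel code; struct frag and ip_frag_plan() are hypothetical
names, and the prio field only mirrors the idea of the diff, namely that
each fragment inherits the priority of the original packet.

```c
#include <stddef.h>

struct frag {
	unsigned off;		/* offset of this fragment's data */
	unsigned len;		/* bytes of data carried */
	int more;		/* 1 if further fragments follow */
	unsigned char prio;	/* priority inherited from the original */
};

/*
 * Hypothetical helper: plan the fragments for an IP payload of
 * `payload` bytes on a link with the given MTU and IP header length.
 * Returns the number of fragments written to out, or -1 if the
 * packet cannot be split or out is too small.
 */
int
ip_frag_plan(unsigned payload, unsigned mtu, unsigned hlen,
    unsigned char prio, struct frag *out, size_t max)
{
	unsigned chunk, len, off = 0;
	size_t n = 0;

	if (mtu <= hlen)
		return -1;
	chunk = mtu - hlen;
	/* Only when splitting must non-final fragments be multiples of 8. */
	if (payload > chunk)
		chunk &= ~7u;
	if (chunk == 0)
		return -1;

	while (off < payload) {
		if (n == max)
			return -1;
		len = payload - off;
		if (len > chunk)
			len = chunk;
		out[n].off = off;
		out[n].len = len;
		out[n].more = (off + len < payload);
		out[n].prio = prio;	/* the point of the diff: keep it */
		n++;
		off += len;
	}
	return (int)n;
}
```

Calling ip_frag_plan(808, 600, 20, 1, f, 4) yields two fragments,
576 bytes at offset 0 and 232 bytes at offset 576, both with priority 1.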



Re: arm: new FDT-enabled mainbus

2016-05-01 Thread Vincent Gross
On Sun, 1 May 2016 13:27:29 +0200
Patrick Wildt  wrote:

> Hi,
> 
> I updated the diff with the feedback received.  This basically adds
> a tree-like topology by making mainbus FDT aware and implementing
> a simplebus that can span the tree's roots into more branches.
> 
> Next steps (and diffs) are implementing an FDT platform for armv7,
> similar to imx/omap/... and having the generic interrupt controller
> and timer attach to a simplebus/fdt bus.
> 
> Comments?

Legacy boot works fine on my novena laptop, but fdt boot fails. I lack
the expertise to say why, but it seems that cortex0 and its children were
skipped.

Thank you for pushing better arm support :)

Both outputs below:

## Booting kernel from Legacy Image at 1030 ...
   Image Name:   boot
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:4993156 Bytes = 4.8 MiB
   Load Address: 1030
   Entry Point:  1030
   Verifying Checksum ... OK
   Loading Kernel Image ... OK
Using machid 0x10ad from environment

Starting kernel ...


OpenBSD/imx booting ...
arg0 0x0 arg1 0x10ad arg2 0x1100
atag core flags 0 pagesize 0 rootdev 0
atag cmdline [sd0i:/bsd.imx.umg]
atag revision 00063012
atag mem start 0x1000 size 0xf000
bootfile: sd0i:/bsd.imx.umg
bootargs: 
memory size derived from u-boot
bootconf.mem[0].address = 1000 pages 983040/0xf000
Allocating page tables
freestart = 0x107c4000, free_pages = 981051 (0x000ef83b)
IRQ stack: p0x107f2000 v0xc07f2000
ABT stack: p0x107f3000 v0xc07f3000
UND stack: p0x107f4000 v0xc07f4000
SVC stack: p0x107f5000 v0xc07f5000
Creating L1 page table at 0x107c4000
Mapping kernel
Constructing L2 page tables
undefined page pmap [ using 715008 bytes of bsd ELF symbol table ]
board type: 4269
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2016 OpenBSD. All rights reserved.
http://www.OpenBSD.org

OpenBSD 5.9-current (GENERIC) #31: Fri Apr 29 22:42:20 CET 2016

dermi...@russell.kilb.yt:/home/dermiste/OpenBSD/srcsys/arch/armv7/compile/GENERIC
real mm  = 4026527744 (3839MB)
avail mem = 3940864000 (3758MB)
warning: no entropy supplied by boot lader
mainbus0 at root: no device tree
cortex0 at mainbus0
ampintc0 at cortex0 nirq 160
amptimer0 at cortex0: tick rate 396000 KHz
armliicc0 at cortex0: rtl 7 waymask: 0x000f
cpu0 at mainbus0: ARM Cortex A9 R2 rev 10 (ARMv7 core)
cpu0: DC enabled IC enabled WB disabled EABT branch prediction enabled
cpu0: 32KB(32b/l,4way) I-cache, 32KB(32b/l,4way) wr-back D-cache
imx0 at mainbus0: Kosagi Novena
imxccm0 at imx0: imx6 rev 1.2 CPU freq: 792 MHz
imxiomuxc0 at imx0
imxdog0 at imx0
imxocotp0 at imx0
imxuart0 at imx0 console
imxgpio0 at imx0
imxgpio1 at imx0
imxgpio2 at imx0
imxgpio3 at imx0
imxgpio4 at imx0
imxgpio5 at imx0
imxgpio6 at imx0
imxesdhc0 at imx0
sdmmc0 at imxesdhc0
imxesdhc1 at imx0
sdmmc1 at imxesdhc1
ehci0 at imx0
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "i.MX6 EHCI root hub" rev 2.00/1.00 addr 1
imxenet0 at imx0
imxenet0: address 00:1f:11:02:17:de
ukphy0 at imxenet0 phy 7: Generic IEEE 802.3u media interface, rev. 1: OUI 0x000885, model 0x0021
ahci0 at imx0 AHCI 1.3
ahci0: port 0: 3.0Gb/s
scsibus0 at ahci0: 32 targets
sd0 at scsibus0 targ 0 lun 0:  SCSI3 0/direct fixed naa.5002538da0003f36
sd0: 238475MB, 512 bytes/sector, 488397168 sectors, thin
scsibus1 at sdmmc1: 2 targets, initiator 0
sd1 at scsibus1 targ 1 lun 0:  SCSI2 0/direct fixed
sd1: 7600MB, 512 bytes/sector, 15564800 sectors
uhub1 at uhub0 port 1 "Genesys Logic USB2.0 Hub Charger" rev 2.00/1.97 addr 2
axe0 at uhub1 port 2 configuration 1 interface 0 "ASIX Electronics AX88772B" rev 2.00/0.01 addr 3
axe0: AX88772B, address 00:0e:c6:87:72:01
ukphy1 at axe0 phy 16: Generic IEEE 802.3u media interface, rev. 1: OUI 0x000ec6, model 0x0008
ugen0 at uhub1 port 3 "AsureWave product 0x3393" rev 1.10/0.01 addr 4
uhub2 at uhub1 port 4 "Genesys Logic USB2.0 Hub Charger" rev 2.00/1.97 addr 5
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
boot device: sd0
root on sd0a (46b8d5734c644ff3.a) swap on sd0b dump on sd0b




## Booting kernel from Legacy Image at 1030 ...
   Image Name:   boot
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:4993156 Bytes = 4.8 MiB
   Load Address: 1030
   Entry Point:  1030
   Verifying Checksum ... OK
## Flattened Device Tree blob at 1010
   Booting using the fdt blob at 0x1010
   Loading Kernel Image ... OK
   reserving fdt memory region: addr=1010 size=b000
   Using Device Tree in place at 1010, end 1010dfff
Using machid 0x10ad from environment

Starting kernel ...


OpenBSD/imx booting ...
arg0 0x0 arg1 0x10ad arg2 0x1010
Allocating page tables
freestart = 0x107c4000, free_pages = 981051 (0x000ef83b)
IRQ stack: p0x107f2000 v0xc07f2000
ABT stack: p0x107f3000 v0xc07f3000
UND stack: p0x107f4000 v0xc07f4000
SVC stack: p0x107f5000 

Simplify in_pcblookup()

2016-04-09 Thread Vincent Gross
in_pcblookup() is always called with *:0 for the remote side.
Remove the useless bits, shuffle the tests around and it's much
easier to audit.

Ok ?

Index: netinet/in_pcb.c
===
RCS file: /cvs/src/sys/netinet/in_pcb.c,v
retrieving revision 1.201
diff -u -p -r1.201 in_pcb.c
--- netinet/in_pcb.c8 Apr 2016 14:34:21 -   1.201
+++ netinet/in_pcb.c9 Apr 2016 09:42:07 -
@@ -415,14 +415,13 @@ in_pcbaddrisavail(struct inpcb *inp, str
struct inpcb *t;
 
if (so->so_euid) {
-   t = in_pcblookup(table, &zeroin_addr, 0,
-   &sin->sin_addr, lport, INPLOOKUP_WILDCARD,
-   inp->inp_rtableid);
+   t = in_pcblookup_local(table, &sin->sin_addr, lport,
+   INPLOOKUP_WILDCARD, inp->inp_rtableid);
if (t && (so->so_euid != t->inp_socket->so_euid))
return (EADDRINUSE);
}
-   t = in_pcblookup(table, &zeroin_addr, 0,
-   &sin->sin_addr, lport, wild, inp->inp_rtableid);
+   t = in_pcblookup_local(table, &sin->sin_addr, lport,
+   wild, inp->inp_rtableid);
if (t && (reuseport & t->inp_socket->so_options) == 0)
return (EADDRINUSE);
}
@@ -475,8 +474,8 @@ in_pcbpickport(u_int16_t *lport, void *l
candidate = lower;
localport = htons(candidate);
} while (in_baddynamic(localport, so->so_proto->pr_protocol) ||
-   in_pcblookup(table, &zeroin_addr, 0,
-   laddr, localport, wild, inp->inp_rtableid));
+   in_pcblookup_local(table, laddr, localport, wild,
+   inp->inp_rtableid));
*lport = localport;
 
return (0);
@@ -734,14 +733,14 @@ in_rtchange(struct inpcb *inp, int errno
 }
 
 struct inpcb *
-in_pcblookup(struct inpcbtable *table, void *faddrp, u_int fport_arg,
-void *laddrp, u_int lport_arg, int flags, u_int rdomain)
+in_pcblookup_local(struct inpcbtable *table, void *laddrp, u_int lport_arg,
+int flags, u_int rdomain)
 {
struct inpcb *inp, *match = NULL;
int matchwild = 3, wildcard;
-   u_int16_t fport = fport_arg, lport = lport_arg;
-   struct in_addr faddr = *(struct in_addr *)faddrp;
+   u_int16_t lport = lport_arg;
struct in_addr laddr = *(struct in_addr *)laddrp;
+   struct in6_addr *laddr6 = (struct in6_addr *)laddrp;
struct inpcbhead *head;
 
rdomain = rtable_l2(rdomain);   /* convert passed rtableid to rdomain */
@@ -753,60 +752,40 @@ in_pcblookup(struct inpcbtable *table, v
continue;
wildcard = 0;
 #ifdef INET6
-   if (flags & INPLOOKUP_IPV6) {
-   struct in6_addr *laddr6 = (struct in6_addr *)laddrp;
-   struct in6_addr *faddr6 = (struct in6_addr *)faddrp;
-
-   if (!(inp->inp_flags & INP_IPV6))
+   if (ISSET(flags, INPLOOKUP_IPV6)) {
+   if (!ISSET(inp->inp_flags, INP_IPV6))
continue;
 
-   if (!IN6_IS_ADDR_UNSPECIFIED(&inp->inp_laddr6)) {
-   if (IN6_IS_ADDR_UNSPECIFIED(laddr6))
-   wildcard++;
-   else if (!IN6_ARE_ADDR_EQUAL(&inp->inp_laddr6, laddr6))
-   continue;
-   } else {
-   if (!IN6_IS_ADDR_UNSPECIFIED(laddr6))
-   wildcard++;
-   }
+   if (!IN6_IS_ADDR_UNSPECIFIED(&inp->inp_faddr6))
+   wildcard++;
 
-   if (!IN6_IS_ADDR_UNSPECIFIED(&inp->inp_faddr6)) {
-   if (IN6_IS_ADDR_UNSPECIFIED(faddr6))
+   if (!IN6_ARE_ADDR_EQUAL(&inp->inp_laddr6, laddr6)) {
+   if (IN6_IS_ADDR_UNSPECIFIED(&inp->inp_laddr6) ||
+   IN6_IS_ADDR_UNSPECIFIED(laddr6))
wildcard++;
-   else if (!IN6_ARE_ADDR_EQUAL(&inp->inp_faddr6,
-   faddr6) || inp->inp_fport != fport)
+   else
continue;
-   } else {
-   if (!IN6_IS_ADDR_UNSPECIFIED(faddr6))
-   wildcard++;
}
+
} else
 #endif /* INET6 */
{
 #ifdef INET6
-   if (inp->inp_flags & INP_IPV6)
+   if (ISSET(inp->inp_flags, INP_IPV6))
continue;
 #endif /* INET6 */
 
-   if (inp->inp_faddr.s_addr != INADDR_ANY) {
-

Remove long-dead and confusing code on rip6_ctlinput()

2016-04-08 Thread Vincent Gross
When using a raw ip6 socket, one can connect(2) then send(2), or
just sendto(2). The code below tries to find the non-connected
raw ip6 socket corresponding to an incoming icmp6 message, to deliver
the failure. This code has been disabled ever since it was put
in-tree, justifiably so because it does a wildcard socket search
based on barely-checked external input.

Better remove it altogether and prevent future useless head-scratching.

Ok?

Index: netinet6/raw_ip6.c
===
RCS file: /cvs/src/sys/netinet6/raw_ip6.c,v
retrieving revision 1.89
diff -u -p -r1.89 raw_ip6.c
--- netinet6/raw_ip6.c  29 Mar 2016 11:57:51 -  1.89
+++ netinet6/raw_ip6.c  8 Apr 2016 17:55:24 -
@@ -285,21 +285,6 @@ rip6_ctlinput(int cmd, struct sockaddr *
 */
in6p = in6_pcbhashlookup(, >sin6_addr, 0,
_src->sin6_addr, 0, rdomain);
-#if 0
-   if (!in6p) {
-   /*
-* As the use of sendto(2) is fairly popular,
-* we may want to allow non-connected pcb too.
-* But it could be too weak against attacks...
-* We should at least check if the local
-* address (= s) is really ours.
-*/
-   in6p = in_pcblookup(, >sin6_addr, 0,
-   (struct in6_addr *)_src->sin6_addr, 0,
-   INPLOOKUP_WILDCARD | INPLOOKUP_IPV6,
-   rdomain);
-   }
-#endif
 
if (in6p && in6p->inp_ipv6.ip6_nxt &&
in6p->inp_ipv6.ip6_nxt == nxt)



new diff for reserved ports checks [2/2] Was: Re: move "privileged port" check out of in(6)_pcbaddrisavail()

2016-04-03 Thread Vincent Gross
On 03/31/16 14:07, Alexander Bluhm wrote:
> On Wed, Mar 30, 2016 at 10:44:14PM +0200, Vincent Gross wrote:
>> This diff moves the "are we binding to a privileged port while not being 
>> root ?"
>> check from in(6)_pcbaddrisavail() to in_pcbbind().
> 
>> --- sys/netinet/in_pcb.c 26 Mar 2016 21:56:04 -  1.198
>> +++ sys/netinet/in_pcb.c 30 Mar 2016 20:33:00 -
>> @@ -341,9 +341,14 @@ in_pcbbind(struct inpcb *inp, struct mbu
>>  }
>>  }
>>  
>> -if (lport == 0)
>> +if (lport == 0) {
>>  if ((error = in_pcbpickport(, wild, inp, p)))
>>  return (error);
>> +} else {
>> +if (ntohs(lport) < IPPORT_RESERVED &&
>> +(error = suser(p, 0)))
>> +return (EACCES);
>> +}
>>  inp->inp_lport = lport;
> 
> At this point inp has already been modified.  So when we bail out
> with EACCES here, we have a partially successful system call.
> 
> Move the assignments
> inp->inp_laddr6 = sin6->sin6_addr;
> inp->inp_laddr = sin->sin_addr;
> down after the return (EACCES).
> 
> Looks like that return (error) was wrong before.

diff --git a/sys/netinet/in_pcb.c b/sys/netinet/in_pcb.c
index 1ff0056..63b3357 100644
--- a/sys/netinet/in_pcb.c
+++ b/sys/netinet/in_pcb.c
@@ -343,9 +343,22 @@ in_pcbbind(struct inpcb *inp, struct mbuf *nam, struct 
proc *p)
}
}
 
-   if (lport == 0)
+   if (lport == 0) {
if ((error = in_pcbpickport(, laddr, wild, inp, p)))
return (error);
+   } else {
+   /*
+* Question:  Do we wish to continue the Berkeley
+* tradition of ports < IPPORT_RESERVED be only for
+* root?
+* Answer: For now yes, but IMHO, it should be REMOVED!
+* OUCH: One other thing, is there no better way of
+* finding a process for a socket instead of using
+* curproc?  (Marked with BSD's {in,}famous XXX ?
+*/
+   if (ntohs(lport) < IPPORT_RESERVED && (error = suser(p, 0)))
+   return (EACCES);
+   }
if (nam) {
switch (sotopf(so)) {
 #ifdef INET6
@@ -371,7 +384,6 @@ in_pcbaddrisavail(struct inpcb *inp, struct sockaddr_in 
*sin, int wild,
struct inpcbtable *table = inp->inp_table;
u_int16_t lport = sin->sin_port;
int reuseport = (so->so_options & SO_REUSEPORT);
-   int error;
 
if (IN_MULTICAST(sin->sin_addr.s_addr)) {
/*
@@ -411,10 +423,6 @@ in_pcbaddrisavail(struct inpcb *inp, struct sockaddr_in 
*sin, int wild,
if (lport) {
struct inpcb *t;
 
-   /* GROSS */
-   if (ntohs(lport) < IPPORT_RESERVED &&
-   (error = suser(p, 0)))
-   return (EACCES);
if (so->so_euid) {
t = in_pcblookup(table, _addr, 0,
>sin_addr, lport, INPLOOKUP_WILDCARD,
diff --git a/sys/netinet6/in6_pcb.c b/sys/netinet6/in6_pcb.c
index 4fde210..c11b936 100644
--- a/sys/netinet6/in6_pcb.c
+++ b/sys/netinet6/in6_pcb.c
@@ -158,7 +158,6 @@ in6_pcbaddrisavail(struct inpcb *inp, struct sockaddr_in6 
*sin6, int wild,
struct inpcbtable *table = inp->inp_table;
u_short lport = sin6->sin6_port;
int reuseport = (so->so_options & SO_REUSEPORT);
-   int error;
 
wild |= INPLOOKUP_IPV6;
/* KAME hack: embed scopeid */
@@ -217,17 +216,6 @@ in6_pcbaddrisavail(struct inpcb *inp, struct sockaddr_in6 
*sin6, int wild,
if (lport) {
struct inpcb *t;
 
-   /*
-* Question:  Do we wish to continue the Berkeley
-* tradition of ports < IPPORT_RESERVED be only for
-* root?
-* Answer: For now yes, but IMHO, it should be REMOVED!
-* OUCH: One other thing, is there no better way of
-* finding a process for a socket instead of using
-* curproc?  (Marked with BSD's {in,}famous XXX ?
-*/
-   if (ntohs(lport) < IPPORT_RESERVED && (error = suser(p, 0)))
-   return error;
if (so->so_euid) {
t = in_pcblookup(table,
(struct in_addr *)_addr, 0,



new diff for reserved ports checks [1/2] Was: Re: move "privileged port" check out of in(6)_pcbaddrisavail()

2016-04-03 Thread Vincent Gross
On 03/31/16 14:07, Alexander Bluhm wrote:
> On Wed, Mar 30, 2016 at 10:44:14PM +0200, Vincent Gross wrote:
>> This diff moves the "are we binding to a privileged port while not being 
>> root ?"
>> check from in(6)_pcbaddrisavail() to in_pcbbind().
> 
>> --- sys/netinet/in_pcb.c 26 Mar 2016 21:56:04 -  1.198
>> +++ sys/netinet/in_pcb.c 30 Mar 2016 20:33:00 -
>> @@ -341,9 +341,14 @@ in_pcbbind(struct inpcb *inp, struct mbu
>>  }
>>  }
>>  
>> -if (lport == 0)
>> +if (lport == 0) {
>>  if ((error = in_pcbpickport(, wild, inp, p)))
>>  return (error);
>> +} else {
>> +if (ntohs(lport) < IPPORT_RESERVED &&
>> +(error = suser(p, 0)))
>> +return (EACCES);
>> +}
>>  inp->inp_lport = lport;
> 
> At this point inp has already been modified.  So when we bail out
> with EACCES here, we have a partially successful system call.
> 
> Move the assignments
> inp->inp_laddr6 = sin6->sin6_addr;
> inp->inp_laddr = sin->sin_addr;
> down after the return (EACCES).
> 
> Looks like that return (error) was wrong before.

in_pcbpickport() needs the local address, so I extend the prototype and
keep a void * to the sin(6)_addr or zeroin46_addr. While at it, I set
the INPLOOKUP_IPV6 flag, which will be needed in in_pcbpickport().
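
A minimal userland sketch of the "check first, commit last" idea discussed
in this thread, not the kernel code. The candidate address is kept in a
local pointer and the pcb is only written once every check has passed, so a
call failing with EACCES leaves it untouched. struct toy_pcb, toy_bind(),
TOY_PORT_RESERVED and the fixed port 49152 are hypothetical names and values
made up for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PORT_RESERVED 1024	/* stand-in for IPPORT_RESERVED */

/* Hypothetical, much reduced stand-in for struct inpcb. */
struct toy_pcb {
	uint32_t laddr;		/* bound local address, 0 = unbound */
	uint16_t lport;		/* bound local port, host order, 0 = unbound */
};

/*
 * Bind pcb to addr:port.  Nothing is stored in *pcb until all checks
 * have succeeded.
 */
int
toy_bind(struct toy_pcb *pcb, uint32_t addr, uint16_t port, int is_root)
{
	const uint32_t *laddr = &addr;	/* candidate, not yet committed */

	assert(pcb != NULL);
	if (pcb->lport != 0)
		return (EINVAL);	/* already bound */

	if (port == 0)
		port = 49152;		/* toy replacement for a port picker */
	else if (port < TOY_PORT_RESERVED && !is_root)
		return (EACCES);	/* pcb is still unmodified here */

	/* Commit point: every check passed. */
	pcb->laddr = *laddr;
	pcb->lport = port;
	return (0);
}
```

With this ordering a rejected bind can simply be retried, since no field of
the pcb was changed by the failed attempt.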

Ok ?

Index: netinet/in_pcb.c
===
RCS file: /cvs/src/sys/netinet/in_pcb.c,v
retrieving revision 1.198
diff -u -p -r1.198 in_pcb.c
--- netinet/in_pcb.c26 Mar 2016 21:56:04 -  1.198
+++ netinet/in_pcb.c3 Apr 2016 19:16:37 -
@@ -286,6 +286,7 @@ in_pcbbind(struct inpcb *inp, struct mbu
struct socket *so = inp->inp_socket;
u_int16_t lport = 0;
int wild = 0;
+   void *laddr = _addr;
int error;
 
if (inp->inp_lport)
@@ -312,9 +313,10 @@ in_pcbbind(struct inpcb *inp, struct mbu
if (sin6->sin6_family != AF_INET6)
return (EAFNOSUPPORT);
 
+   wild |= INPLOOKUP_IPV6;
if ((error = in6_pcbaddrisavail(inp, sin6, wild, p)))
return (error);
-   inp->inp_laddr6 = sin6->sin6_addr;
+   laddr = >sin6_addr;
lport = sin6->sin6_port;
break;
}
@@ -332,7 +334,7 @@ in_pcbbind(struct inpcb *inp, struct mbu
 
if ((error = in_pcbaddrisavail(inp, sin, wild, p)))
return (error);
-   inp->inp_laddr = sin->sin_addr;
+   laddr = >sin_addr;
lport = sin->sin_port;
break;
}
@@ -342,8 +344,20 @@ in_pcbbind(struct inpcb *inp, struct mbu
}
 
if (lport == 0)
-   if ((error = in_pcbpickport(, wild, inp, p)))
+   if ((error = in_pcbpickport(, laddr, wild, inp, p)))
return (error);
+   if (nam) {
+   switch (sotopf(so)) {
+#ifdef INET6
+   case PF_INET6:
+   inp->inp_laddr6 = *(struct in6_addr *)laddr;
+   break;
+#endif
+   case PF_INET:
+   inp->inp_laddr = *(struct in_addr *)laddr;
+   break;
+   }
+   }
inp->inp_lport = lport;
in_pcbrehash(inp);
return (0);
@@ -418,12 +432,12 @@ in_pcbaddrisavail(struct inpcb *inp, str
 }
 
 int
-in_pcbpickport(u_int16_t *lport, int wild, struct inpcb *inp, struct proc *p)
+in_pcbpickport(u_int16_t *lport, void *laddr, int wild, struct inpcb *inp,
+struct proc *p)
 {
struct socket *so = inp->inp_socket;
struct inpcbtable *table = inp->inp_table;
u_int16_t first, last, lower, higher, candidate, localport;
-   void *laddr;
int count;
 
if (inp->inp_flags & INP_HIGHPORT) {
@@ -453,10 +467,6 @@ in_pcbpickport(u_int16_t *lport, int wil
 
count = higher - lower;
candidate = lower + arc4random_uniform(count);
-   if (sotopf(so) == PF_INET6)
-   laddr = >inp_laddr6;
-   else
-   laddr = >inp_laddr;
 
do {
if (count-- < 0)/* completely used? */
Index: netinet/in_pcb.h
===
RCS file: /cvs/src/sys/netinet/in_pcb.h,v
retrieving revision 1.96
diff -u -p -r1.96 in_pcb.h
--- netinet/in_pcb.h23 Mar 2016 15:50:36 -  1.96
+++ netinet/in_pcb.h3 Apr 2016 19:16:37 

move "privileged port" check out of in(6)_pcbaddrisavail()

2016-03-30 Thread Vincent Gross
Hello,

This diff moves the "are we binding to a privileged port while not being root ?"
check from in(6)_pcbaddrisavail() to in_pcbbind().

This way we have a cleaner separation between "is the resource available ?"
and "am I allowed to access the resource ?" (which may or may not get its own
function later).
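
To make the separation concrete, here is a small userland sketch under
stated assumptions: it is not the kernel code, and toy_port_is_avail(),
toy_port_is_allowed() and TOY_PORT_RESERVED are hypothetical names. Ports
are kept in network byte order, as sin_port is, hence the ntohs().

```c
#include <arpa/inet.h>	/* ntohs() */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PORT_RESERVED 1024	/* stand-in for IPPORT_RESERVED */

/* "Is the resource available ?": is lport free in the used[] list ? */
int
toy_port_is_avail(const uint16_t *used, size_t nused, uint16_t lport)
{
	size_t i;

	assert(used != NULL || nused == 0);
	for (i = 0; i < nused; i++)
		if (used[i] == lport)
			return (EADDRINUSE);
	return (0);
}

/* "Am I allowed to access the resource ?": low ports need root. */
int
toy_port_is_allowed(uint16_t lport, int is_root)
{
	if (ntohs(lport) < TOY_PORT_RESERVED && !is_root)
		return (EACCES);
	return (0);
}
```

A caller that only wants to know whether an address/port pair is free can
then call the first function alone, without tripping over the privilege
check.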

Also, it unbreaks naddy@'s iked setup (ikev2:sendmsg([::]:500) =>
in6_selectsrc() != in6p->inp_laddr6 => in6_pcbaddrisavail() => EPERM).

Ok ?

Index: sys/netinet/in_pcb.c
===
RCS file: /cvs/src/sys/netinet/in_pcb.c,v
retrieving revision 1.198
diff -u -p -r1.198 in_pcb.c
--- sys/netinet/in_pcb.c26 Mar 2016 21:56:04 -  1.198
+++ sys/netinet/in_pcb.c30 Mar 2016 20:33:00 -
@@ -341,9 +341,14 @@ in_pcbbind(struct inpcb *inp, struct mbu
}
}
 
-   if (lport == 0)
+   if (lport == 0) {
if ((error = in_pcbpickport(, wild, inp, p)))
return (error);
+   } else {
+   if (ntohs(lport) < IPPORT_RESERVED &&
+   (error = suser(p, 0)))
+   return (EACCES);
+   }
inp->inp_lport = lport;
in_pcbrehash(inp);
return (0);
@@ -357,7 +362,6 @@ in_pcbaddrisavail(struct inpcb *inp, str
struct inpcbtable *table = inp->inp_table;
u_int16_t lport = sin->sin_port;
int reuseport = (so->so_options & SO_REUSEPORT);
-   int error;
 
if (IN_MULTICAST(sin->sin_addr.s_addr)) {
/*
@@ -398,9 +402,6 @@ in_pcbaddrisavail(struct inpcb *inp, str
struct inpcb *t;
 
/* GROSS */
-   if (ntohs(lport) < IPPORT_RESERVED &&
-   (error = suser(p, 0)))
-   return (EACCES);
if (so->so_euid) {
t = in_pcblookup(table, _addr, 0,
>sin_addr, lport, INPLOOKUP_WILDCARD,
Index: sys/netinet6/in6_pcb.c
===
RCS file: /cvs/src/sys/netinet6/in6_pcb.c,v
retrieving revision 1.90
diff -u -p -r1.90 in6_pcb.c
--- sys/netinet6/in6_pcb.c  30 Mar 2016 13:02:22 -  1.90
+++ sys/netinet6/in6_pcb.c  30 Mar 2016 20:33:01 -
@@ -158,7 +158,6 @@ in6_pcbaddrisavail(struct inpcb *inp, st
struct inpcbtable *table = inp->inp_table;
u_short lport = sin6->sin6_port;
int reuseport = (so->so_options & SO_REUSEPORT);
-   int error;
 
wild |= INPLOOKUP_IPV6;
/* KAME hack: embed scopeid */
@@ -226,8 +225,6 @@ in6_pcbaddrisavail(struct inpcb *inp, st
 * finding a process for a socket instead of using
 * curproc?  (Marked with BSD's {in,}famous XXX ?
 */
-   if (ntohs(lport) < IPPORT_RESERVED && (error = suser(p, 0)))
-   return error;
if (so->so_euid) {
t = in_pcblookup(table,
(struct in_addr *)_addr, 0,



use fast lookup in in6_pcbconnect()

2016-03-23 Thread Vincent Gross
The current use of in_pcblookup() in in6_pcbconnect() is suboptimal :
all of the addresses and ports are defined, we are only interested in
exact matches, and its v4 cousin in_pcbconnect() already uses
in_pcbhashlookup().
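
For readers unfamiliar with the two lookup styles, a rough sketch of what an
exact-match hash lookup looks like when all four of foreign address, foreign
port, local address and local port are known. This is a toy model with
hypothetical names (toy_table, toy_hashlookup() and so on); the hash function
is an arbitrary choice and not the one used by the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NBUCKETS 16

struct toy_conn {
	uint32_t faddr, laddr;
	uint16_t fport, lport;
	struct toy_conn *next;	/* hash chain */
};

struct toy_table {
	struct toy_conn *bucket[TOY_NBUCKETS];
};

/* Arbitrary mixing of the 4-tuple into a bucket index. */
static unsigned int
toy_hash(uint32_t faddr, uint16_t fport, uint32_t laddr, uint16_t lport)
{
	return ((faddr ^ laddr ^ fport ^ ((uint32_t)lport << 16)) %
	    TOY_NBUCKETS);
}

void
toy_insert(struct toy_table *t, struct toy_conn *c)
{
	unsigned int h;

	assert(t != NULL && c != NULL);
	h = toy_hash(c->faddr, c->fport, c->laddr, c->lport);
	c->next = t->bucket[h];
	t->bucket[h] = c;
}

/* Only one chain is walked, and only exact 4-tuple matches are returned. */
struct toy_conn *
toy_hashlookup(struct toy_table *t, uint32_t faddr, uint16_t fport,
    uint32_t laddr, uint16_t lport)
{
	struct toy_conn *c;

	assert(t != NULL);
	c = t->bucket[toy_hash(faddr, fport, laddr, lport)];
	for (; c != NULL; c = c->next)
		if (c->faddr == faddr && c->fport == fport &&
		    c->laddr == laddr && c->lport == lport)
			return (c);
	return (NULL);
}
```

There is no wildcard scoring here at all, which is the point: when nothing
in the key is unspecified, a single bucket walk is enough.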

Ok ?

Index: sys/netinet6/in6_pcb.c
===
RCS file: /cvs/src/sys/netinet6/in6_pcb.c,v
retrieving revision 1.89
diff -u -p -r1.89 in6_pcb.c
--- sys/netinet6/in6_pcb.c  23 Mar 2016 15:50:36 -  1.89
+++ sys/netinet6/in6_pcb.c  23 Mar 2016 17:09:11 -
@@ -304,9 +304,9 @@ in6_pcbconnect(struct inpcb *inp, struct

inp->inp_ipv6.ip6_hlim = (u_int8_t)in6_selecthlim(inp);

-   if (in_pcblookup(inp->inp_table, >sin6_addr, sin6->sin6_port,
+   if (in6_pcbhashlookup(inp->inp_table, >sin6_addr, sin6->sin6_port,
IN6_IS_ADDR_UNSPECIFIED(>inp_laddr6) ? in6a : >inp_laddr6,
-   inp->inp_lport, INPLOOKUP_IPV6, inp->inp_rtableid)) {
+   inp->inp_lport, inp->inp_rtableid)) {
return (EADDRINUSE);
}



merge in_ and in6_pcbbind(), introduce in(6)_pcbaddrisavail()

2015-12-23 Thread Vincent Gross
in_pcbbind and in6_pcbbind have a lot in common, the only meaningful
differences are in the checks done to ensure a sockaddr is available.

This diff splits these checks into their own functions, and merges the
remaining code into one single function. Aside from being easier to read,
it also makes it very easy to check sockaddr availability without
actually binding.

Tested on my own laptop for the last ten days ; no regression observed
with regress/sys/netinet/in_pcbbind.

Ok ?

Index: netinet/in_pcb.c
===
RCS file: /cvs/src/sys/netinet/in_pcb.c,v
retrieving revision 1.195
diff -u -p -r1.195 in_pcb.c
--- netinet/in_pcb.c18 Dec 2015 22:25:16 -  1.195
+++ netinet/in_pcb.c23 Dec 2015 08:07:14 -
@@ -284,91 +284,129 @@ int
 in_pcbbind(struct inpcb *inp, struct mbuf *nam, struct proc *p)
 {
struct socket *so = inp->inp_socket;
-   struct inpcbtable *table = inp->inp_table;
-   struct sockaddr_in *sin;
u_int16_t lport = 0;
-   int wild = 0, reuseport = (so->so_options & SO_REUSEPORT);
+   int wild = 0;
int error;
 
-#ifdef INET6
-   if (sotopf(so) == PF_INET6)
-   return in6_pcbbind(inp, nam, p);
-#endif /* INET6 */
-
-   if (inp->inp_lport || inp->inp_laddr.s_addr != INADDR_ANY)
+   if (inp->inp_lport != 0)
return (EINVAL);
+
if ((so->so_options & (SO_REUSEADDR|SO_REUSEPORT)) == 0 &&
((so->so_proto->pr_flags & PR_CONNREQUIRED) == 0 ||
 (so->so_options & SO_ACCEPTCONN) == 0))
wild = INPLOOKUP_WILDCARD;
+
if (nam) {
-   sin = mtod(nam, struct sockaddr_in *);
-   if (nam->m_len != sizeof(*sin))
+   switch (sotopf(so)) {
+#ifdef INET6
+   case PF_INET6: {
+   struct sockaddr_in6 *sin6;
+
+   if (TAILQ_EMPTY(_ifaddr))
+   return (EADDRNOTAVAIL);
+   if (!IN6_IS_ADDR_UNSPECIFIED(>inp_laddr6))
+   return (EINVAL);
+
+   sin6 = mtod(nam, struct sockaddr_in6 *);
+   if (nam->m_len != sizeof(*sin6))
+   return (EINVAL);
+   if (sin6->sin6_family != AF_INET6)
+   return (EAFNOSUPPORT);
+   if ((error = in6_pcbaddrisavail(inp, sin6, wild, p)))
+   return (error);
+   inp->inp_laddr6 = sin6->sin6_addr;
+   lport = sin6->sin6_port;
+   break;
+   }
+#endif
+   case PF_INET: {
+   struct sockaddr_in *sin;
+
+   if (inp->inp_laddr.s_addr != INADDR_ANY)
+   return (EINVAL);
+
+   sin = mtod(nam, struct sockaddr_in *);
+   if (nam->m_len != sizeof(*sin))
+   return (EINVAL);
+   if (sin->sin_family != AF_INET)
+   return (EAFNOSUPPORT);
+   if ((error = in_pcbaddrisavail(inp, sin, wild, p)))
+   return (error);
+   inp->inp_laddr = sin->sin_addr;
+   lport = sin->sin_port;
+   break;
+   }
+   default:
return (EINVAL);
+   }
+   }
+
+   if (lport == 0)
+   if ((error = in_pcbpickport(, wild, inp, p)))
+   return (error);
+   inp->inp_lport = lport;
+   in_pcbrehash(inp);
+   return (0);
+}
+
+int
+in_pcbaddrisavail(struct inpcb *inp, struct sockaddr_in *sin, int wild,
+struct proc *p)
+{
+   struct socket *so = inp->inp_socket;
+   struct inpcbtable *table = inp->inp_table;
+   u_int16_t lport = sin->sin_port;
+   int reuseport = (so->so_options & SO_REUSEPORT);
+   int error;
 
-   if (sin->sin_family != AF_INET)
-   return (EAFNOSUPPORT);
+   if (IN_MULTICAST(sin->sin_addr.s_addr)) {
+   /*
+* Treat SO_REUSEADDR as SO_REUSEPORT for multicast;
+* allow complete duplication of binding if
+* SO_REUSEPORT is set, or if SO_REUSEADDR is set
+* and a multicast address is bound on both
+* new and duplicated sockets.
+*/
+   if (so->so_options & (SO_REUSEADDR|SO_REUSEPORT))
+   reuseport = SO_REUSEADDR|SO_REUSEPORT;
+   } else if (sin->sin_addr.s_addr != INADDR_ANY) {
+
+   if ((so->so_options & SO_BINDANY) == 0 ||
+   (so->so_type != SOCK_DGRAM) ||
+   (sin->sin_addr.s_addr != INADDR_BROADCAST &&
+!in_broadcast(sin->sin_addr, inp->inp_rtableid))) {
+  

Re: "Adding" the same IPv6 address twice

2015-12-21 Thread Vincent Gross
On 12/21/15 11:36, Martin Pieuchot wrote:
> Currently if you try to configure the same IPv6 address twice via the
> SIOCAIFADDR_IN6 ioctl(2) the kernel will return EEXIST and the address
> will be unset: 
> 
> # ifconfig vether0 inet6 2001::1
> # ifconfig vether0 inet6 2001::1 
> ifconfig: SIOCAIFADDR: File exists
> 
> Diff below fixes that by not inserting the local route if we're "just"
> updating an existing address.  sebastia@ confirmed it fixes his use
> case, so I'm looking for oks.
> 
> Index: netinet6/in6.c
> ===
> RCS file: /cvs/src/sys/netinet6/in6.c,v
> retrieving revision 1.181
> diff -u -p -r1.181 in6.c
> --- netinet6/in6.c3 Dec 2015 13:13:42 -   1.181
> +++ netinet6/in6.c18 Dec 2015 09:27:18 -
[...]
> @@ -454,12 +454,15 @@ in6_control(struct socket *so, u_long cm
>   return (EINVAL);
>   }
>  
> + if (ia6 == NULL)
> + newifaddr = 1;
> +
>   /*
>* Make the address tentative before joining multicast
>* addresses, so that corresponding MLD responses would
>* not have a tentative source address.
>*/
> - if ((ia6 == NULL) && in6if_do_dad(ifp))
> + if (newifaddr && in6if_do_dad(ifp))
>   ifra->ifra_flags |= IN6_IFF_TENTATIVE;
>  
>   /*
[...]
> @@ -489,6 +493,9 @@ in6_control(struct socket *so, u_long cm
>   /* Perform DAD, if needed. */
>   if (ia6->ia6_flags & IN6_IFF_TENTATIVE)
>   nd6_dad_start(>ia_ifa);
> +
> + if (!newifaddr)
> + break;
>  
>   plen = in6_mask2len(>ia_prefixmask.sin6_addr, NULL);
>   if ((ifp->if_flags & IFF_LOOPBACK) || plen == 128) {
> 


The "if (!newifaddr)" should be moved above the "if (IN6_IFF_TENTATIVE)" ;
this way the skipping of DAD when !newifaddr is more explicit.



Add SO_REUSEADDR when binding SO_REUSEPORT socket to multicast address

2015-12-09 Thread Vincent Gross
in_pcbbind and in6_pcbbind both extend SO_REUSEADDR for multicast
addresses so that it turns into a SO_REUSEPORT. But the check is done
in such a way that you cannot bind a SO_REUSEPORT-enabled socket to a
multicast address *after* you bound a SO_REUSEADDR-enabled socket to
the same address.

*But:* due to how the struct inpcb are handled, if you :
1) bind a SO_REUSEADDR-enabled socket to a multicast address,
2) then bind a SO_REUSEADDR|SO_REUSEPORT-enabled socket to the same address,
then you can now bind a SO_REUSEPORT-enabled socket to this address.

The regress test in regress/sys/netinet/in_pcbbind reproduces this behaviour
(be sure to get v1.2 for Makefile and runtest.c).

This diff allows a SO_REUSEPORT-only socket to be bound after a SO_REUSEADDR-only one.
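
The effect of the one-line change can be modelled in userland as below. This
is only a sketch: the TOY_SO_* bit values and all function names are
hypothetical, and the conflict rule "(reuseport & other socket's options)
== 0 means EADDRINUSE" is my reading of the bind path, so treat it as an
assumption rather than a statement about the kernel.

```c
#include <assert.h>

/* Toy bit flags, made up for this sketch. */
#define TOY_SO_REUSEADDR 0x0004
#define TOY_SO_REUSEPORT 0x0200
#define TOY_SO_MASK (TOY_SO_REUSEADDR | TOY_SO_REUSEPORT)

/* Before the diff: on multicast, only SO_REUSEADDR is widened to both. */
int
toy_reuse_old(int so_options, int is_multicast)
{
	int reuseport = so_options & TOY_SO_REUSEPORT;

	assert((so_options & ~TOY_SO_MASK) == 0); /* toy models two flags */
	if (is_multicast && (so_options & TOY_SO_REUSEADDR))
		reuseport = TOY_SO_MASK;
	return (reuseport);
}

/* After the diff: on multicast, either option is widened to both. */
int
toy_reuse_new(int so_options, int is_multicast)
{
	int reuseport = so_options & TOY_SO_REUSEPORT;

	assert((so_options & ~TOY_SO_MASK) == 0); /* toy models two flags */
	if (is_multicast && (so_options & TOY_SO_MASK))
		reuseport = TOY_SO_MASK;
	return (reuseport);
}

/* Assumed conflict rule: the bind fails unless the two masks share a bit. */
int
toy_bind_conflicts(int reuseport, int other_so_options)
{
	return ((reuseport & other_so_options) == 0);
}
```

In this model a SO_REUSEPORT-only socket conflicts with an already bound
SO_REUSEADDR-only socket under the old rule, and no longer does under the
new one.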

ok ?

Index: netinet/in_pcb.c
===
RCS file: /cvs/src/sys/netinet/in_pcb.c,v
retrieving revision 1.194
diff -u -p -r1.194 in_pcb.c
--- netinet/in_pcb.c3 Dec 2015 21:57:59 -   1.194
+++ netinet/in_pcb.c9 Dec 2015 15:22:16 -
@@ -318,7 +318,7 @@ in_pcbbind(struct inpcb *inp, struct mbu
 * and a multicast address is bound on both
 * new and duplicated sockets.
 */
-   if (so->so_options & SO_REUSEADDR)
+   if (so->so_options & (SO_REUSEADDR|SO_REUSEPORT))
reuseport = SO_REUSEADDR|SO_REUSEPORT;
} else if (sin->sin_addr.s_addr != INADDR_ANY) {
sin->sin_port = 0;  /* yech... */
Index: netinet6/in6_pcb.c
===
RCS file: /cvs/src/sys/netinet6/in6_pcb.c,v
retrieving revision 1.83
diff -u -p -r1.83 in6_pcb.c
--- netinet6/in6_pcb.c  2 Dec 2015 22:13:44 -   1.83
+++ netinet6/in6_pcb.c  9 Dec 2015 15:22:16 -
@@ -214,7 +214,7 @@ in6_pcbbind(struct inpcb *inp, struct mb
 * and a multicast address is bound on both
 * new and duplicated sockets.
 */
-   if (so->so_options & SO_REUSEADDR)
+   if (so->so_options & (SO_REUSEADDR|SO_REUSEPORT))
reuseport = SO_REUSEADDR | SO_REUSEPORT;
} else if (!IN6_IS_ADDR_UNSPECIFIED(>sin6_addr)) {
struct ifaddr *ifa = NULL;



Re: Do not pass NULL to rtdeletemsg()

2015-12-07 Thread Vincent Gross
On 12/07/15 14:57, Martin Pieuchot wrote:
> If the interface is gone that means you're dealing with a cached route
> so there's no need to try to remove it from the table.
> 
> Better be explicit and do that before calling rtdeletemsg() rather than
> inside.
> 
> ok?

ok vgross@

> 
> Index: netinet/ip_icmp.c
> ===
> RCS file: /cvs/src/sys/netinet/ip_icmp.c,v
> retrieving revision 1.150
> diff -u -p -r1.150 ip_icmp.c
> --- netinet/ip_icmp.c 3 Dec 2015 21:11:53 -   1.150
> +++ netinet/ip_icmp.c 7 Dec 2015 12:40:06 -
> @@ -1042,19 +1042,21 @@ icmp_mtudisc(struct icmp *icp, u_int rta
>  void
>  icmp_mtudisc_timeout(struct rtentry *rt, struct rttimer *r)
>  {
> - if (rt == NULL)
> - panic("icmp_mtudisc_timeout:  bad route to timeout");
> + struct ifnet *ifp;
> + int s;
>  
> - if ((rt->rt_flags & (RTF_DYNAMIC | RTF_HOST)) ==
> - (RTF_DYNAMIC | RTF_HOST)) {
> + ifp = if_get(rt->rt_ifidx);
> + if (ifp == NULL)
> + return;
> +
> + if ((rt->rt_flags & (RTF_DYNAMIC|RTF_HOST)) == (RTF_DYNAMIC|RTF_HOST)) {
>   void *(*ctlfunc)(int, struct sockaddr *, u_int, void *);
>   struct sockaddr_in sin;
> - int s;
>  
>   sin = *satosin(rt_key(rt));
>  
>   s = splsoftnet();
> - rtdeletemsg(rt, NULL, r->rtt_tableid);
> + rtdeletemsg(rt, ifp, r->rtt_tableid);
>  
>   /* Notify TCP layer of increased Path MTU estimate */
>   ctlfunc = inetsw[ip_protox[IPPROTO_TCP]].pr_ctlinput;
> @@ -1062,9 +1064,12 @@ icmp_mtudisc_timeout(struct rtentry *rt,
>   (*ctlfunc)(PRC_MTUINC, sintosa(),
>   r->rtt_tableid, NULL);
>   splx(s);
> - } else
> + } else {
>   if ((rt->rt_rmx.rmx_locks & RTV_MTU) == 0)
>   rt->rt_rmx.rmx_mtu = 0;
> + }
> +
> + if_put(ifp);
>  }
>  
>  /*
> @@ -1088,17 +1093,20 @@ icmp_ratelimit(const struct in_addr *dst
>  void
>  icmp_redirect_timeout(struct rtentry *rt, struct rttimer *r)
>  {
> - if (rt == NULL)
> - panic("icmp_redirect_timeout:  bad route to timeout");
> + struct ifnet *ifp;
> + int s;
>  
> - if ((rt->rt_flags & (RTF_DYNAMIC | RTF_HOST)) ==
> - (RTF_DYNAMIC | RTF_HOST)) {
> - int s;
> + ifp = if_get(rt->rt_ifidx);
> + if (ifp == NULL)
> + return;
>  
> + if ((rt->rt_flags & (RTF_DYNAMIC|RTF_HOST)) == (RTF_DYNAMIC|RTF_HOST)) {
>   s = splsoftnet();
> - rtdeletemsg(rt, NULL, r->rtt_tableid);
> + rtdeletemsg(rt, ifp, r->rtt_tableid);
>   splx(s);
>   }
> +
> + if_put(ifp);
>  }
>  
>  int
> Index: netinet6/icmp6.c
> ===
> RCS file: /cvs/src/sys/netinet6/icmp6.c,v
> retrieving revision 1.182
> diff -u -p -r1.182 icmp6.c
> --- netinet6/icmp6.c  3 Dec 2015 21:11:53 -   1.182
> +++ netinet6/icmp6.c  7 Dec 2015 12:39:28 -
> @@ -1952,34 +1952,42 @@ icmp6_mtudisc_clone(struct sockaddr *dst
>  void
>  icmp6_mtudisc_timeout(struct rtentry *rt, struct rttimer *r)
>  {
> - if (rt == NULL)
> - panic("icmp6_mtudisc_timeout: bad route to timeout");
> - if ((rt->rt_flags & (RTF_DYNAMIC | RTF_HOST)) ==
> - (RTF_DYNAMIC | RTF_HOST)) {
> - int s;
> + struct ifnet *ifp;
> + int s;
>  
> + ifp = if_get(rt->rt_ifidx);
> + if (ifp == NULL)
> + return;
> +
> + if ((rt->rt_flags & (RTF_DYNAMIC|RTF_HOST)) == (RTF_DYNAMIC|RTF_HOST)) {
>   s = splsoftnet();
> - rtdeletemsg(rt, NULL, r->rtt_tableid);
> + rtdeletemsg(rt, ifp, r->rtt_tableid);
>   splx(s);
>   } else {
>   if (!(rt->rt_rmx.rmx_locks & RTV_MTU))
>   rt->rt_rmx.rmx_mtu = 0;
>   }
> +
> + if_put(ifp);
>  }
>  
>  void
>  icmp6_redirect_timeout(struct rtentry *rt, struct rttimer *r)
>  {
> - if (rt == NULL)
> - panic("icmp6_redirect_timeout: bad route to timeout");
> - if ((rt->rt_flags & (RTF_GATEWAY | RTF_DYNAMIC | RTF_HOST)) ==
> - (RTF_GATEWAY | RTF_DYNAMIC | RTF_HOST)) {
> - int s;
> + struct ifnet *ifp;
> + int s;
>  
> + ifp = if_get(rt->rt_ifidx);
> + if (ifp == NULL)
> + return;
> +
> + if ((rt->rt_flags & (RTF_DYNAMIC|RTF_HOST)) == (RTF_DYNAMIC|RTF_HOST)) {
>   s = splsoftnet();
> - rtdeletemsg(rt, NULL, r->rtt_tableid);
> + rtdeletemsg(rt, ifp, r->rtt_tableid);
>   splx(s);
>   }
> +
> + if_put(ifp);
>  }
>  
>  int *icmpv6ctl_vars[ICMPV6CTL_MAXID] = ICMPV6CTL_VARS;
> 



simplify in6_selectsrc() logic

2015-12-05 Thread Vincent Gross
in6_selectsrc() uses two different rtalloc calls depending on whether the
destination address is multicast, but there is nothing to explain why.
I dug a bit and found this commit from itojun@ :

diff -u -r1.6 -r1.7
--- src/sys/netinet6/in6_src.c  2000/06/18 04:49:32 1.6
+++ src/sys/netinet6/in6_src.c  2000/06/18 17:02:59 1.7
@@ -244,7 +244,11 @@
ro->ro_dst.sin6_family = AF_INET6;
ro->ro_dst.sin6_len = sizeof(struct sockaddr_in6);
ro->ro_dst.sin6_addr = *dst;
-   if (!IN6_IS_ADDR_MULTICAST(dst)) {
+   ro->ro_dst.sin6_scope_id = dstsock->sin6_scope_id;
+   if (IN6_IS_ADDR_MULTICAST(dst)) {
+   ro->ro_rt = rtalloc1(&((struct route *)ro)
+->ro_dst, 0);
+   } else {
rtalloc((struct route *)ro);
}
}

Below are rtalloc() and rtalloc1() from sys/net/route.c r1.19, committed
on 05/21/2000 :

> void
> rtalloc(ro)
>   register struct route *ro;
> {
>   if (ro->ro_rt && ro->ro_rt->rt_ifp && (ro->ro_rt->rt_flags & RTF_UP))
>   return;  /* XXX */
>   ro->ro_rt = rtalloc1(>ro_dst, 1);
> }
> 
> struct rtentry *
> rtalloc1(dst, report)
>   register struct sockaddr *dst;
>   int report;
> {
[...]
>   /*
>* IP encapsulation does lots of lookups where we don't need nor want
>* the RTM_MISSes that would be generated.  It causes RTM_MISS storms
>* sent upward breaking user-level routing queries.
>*/
>   miss:   if (report && dst->sa_family != PF_KEY) {
>   bzero((caddr_t), sizeof(info));
>   info.rti_info[RTAX_DST] = dst;
>   rt_missmsg(msgtype, , 0, err);
>   }
>   }
>   splx(s);
>   return (newrt);
> }


So this if(MULTICAST) was introduced to prevent RTM_MISS storms when
looking up routes to multicast addresses ; apart from that, multicast and
unicast route lookups are the same.
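
As a toy model of what the "report" argument did in the old rtalloc1()
quoted above: the lookup result is identical either way, and the flag only
decides whether a failed lookup produces a miss message. Everything here
(toy_rtalloc1(), toy_miss_msgs, the one-entry "table") is hypothetical and
only meant to illustrate that point.

```c
#include <assert.h>
#include <stddef.h>

int toy_miss_msgs;		/* number of RTM_MISS-like messages "sent" */

/* Hypothetical one-entry routing table: only destination 1 has a route. */
const int toy_route_dst = 1;

/*
 * Look up dst.  The route found does not depend on `report`; the flag
 * only controls whether a miss is announced.
 */
const int *
toy_rtalloc1(int dst, int report)
{
	assert(report == 0 || report == 1);
	if (dst == toy_route_dst)
		return (&toy_route_dst);
	if (report)
		toy_miss_msgs++;
	return (NULL);
}
```

Calling it with report set to 0 for multicast destinations is how the 2000
commit kept the miss messages down, without changing which route is found.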

Also, rtalloc(foo, RT_RESOLVE, bar) and rtalloc_mpath(foo, NULL, bar) are both
equivalent to _rtalloc(foo, NULL, RT_RESOLVE, bar).

Let's remove this if(MULTICAST), it's just confusing.

ok ?

Index: sys/netinet6/in6_src.c
===
RCS file: /cvs/src/sys/netinet6/in6_src.c,v
retrieving revision 1.71
diff -u -p -r1.71 in6_src.c
--- sys/netinet6/in6_src.c  2 Dec 2015 13:29:26 -   1.71
+++ sys/netinet6/in6_src.c  5 Dec 2015 12:03:48 -
@@ -240,13 +240,8 @@ in6_selectsrc(struct in6_addr **in6src, 
sa6->sin6_len = sizeof(struct sockaddr_in6);
sa6->sin6_addr = *dst;
sa6->sin6_scope_id = dstsock->sin6_scope_id;
-   if (IN6_IS_ADDR_MULTICAST(dst)) {
-   ro->ro_rt = rtalloc(sin6tosa(>ro_dst),
-   RT_RESOLVE, ro->ro_tableid);
-   } else {
-   ro->ro_rt = rtalloc_mpath(sin6tosa(>ro_dst),
-   NULL, ro->ro_tableid);
-   }
+   ro->ro_rt = rtalloc(sin6tosa(>ro_dst),
+   RT_RESOLVE, ro->ro_tableid);
}
 
/*



Re: explicitly check broadcast addresses on some ifa_ifwithaddr() uses

2015-12-03 Thread Vincent Gross
On 12/02/15 20:06, Martin Pieuchot wrote:
> On 02/12/15(Wed) 16:18, Vincent Gross wrote:
>> When fed a broadcast address, ifa_ifwithaddr() returns the unicast ifa
>> whose broadcast address matches the input. This is used mainly to select
>> ifa, and there can be trouble when you have 2 ifas on the same range
>> (e.g. 10.0.0.1/24@em0 & 10.0.0.20/24@em1) :
>>
>> netinet/ip_mroute.c:814
>> net/route.c:785
>> netinet/ip_divert.c:143
>> net/if_vxlan.c:241
>>
>> There are also places where broadcast addresses should not be tolerated :
>>
>> netinet/ip_input.c:1061  broadcast address is not a module identifier
>> netinet/ip_input.c:1141  see above
>> netinet/ip_input.c:1197  see above
>> netinet6/*:  no broadcast in ipv6
>> net/route.c:562: gateway shall never be a broadcast addr
>> net/route.c:713: see above
>>
>> This diff removes broadcast matching from ifa_ifwithaddr, and
>> adds or rewrites checks where necessary.
>>
>> Comments ? Ok ?
> 
> Looks good to me.  Some nits below.

Nits applied.

Anyone else ?
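
For anyone skimming the thread, a userland sketch of the ambiguity using the
10.0.0.1/24@em0 and 10.0.0.20/24@em1 example quoted above. It is a toy model
with hypothetical names (struct toy_ifa, toy_ifwithaddr_old() and friends),
addresses are plain host-order integers, and toy_in_broadcast() is only
written in the spirit of in_broadcast(), not copied from it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_ifa {
	const char *ifname;
	uint32_t addr;		/* host order */
	uint32_t mask;
};

static uint32_t
toy_broadaddr(const struct toy_ifa *ifa)
{
	return (ifa->addr | ~ifa->mask);
}

/* Old behaviour: a broadcast address also matches, first hit wins. */
const struct toy_ifa *
toy_ifwithaddr_old(const struct toy_ifa *ifas, size_t n, uint32_t addr)
{
	size_t i;

	assert(ifas != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (ifas[i].addr == addr || toy_broadaddr(&ifas[i]) == addr)
			return (&ifas[i]);
	return (NULL);
}

/* New behaviour: only the configured unicast address matches. */
const struct toy_ifa *
toy_ifwithaddr_new(const struct toy_ifa *ifas, size_t n, uint32_t addr)
{
	size_t i;

	assert(ifas != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (ifas[i].addr == addr)
			return (&ifas[i]);
	return (NULL);
}

/* Explicit broadcast test for the callers that do want broadcasts. */
int
toy_in_broadcast(const struct toy_ifa *ifas, size_t n, uint32_t addr)
{
	size_t i;

	assert(ifas != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (toy_broadaddr(&ifas[i]) == addr)
			return (1);
	return (0);
}
```

With both interfaces on 10.0.0.0/24, the old lookup answers em0 for
10.0.0.255 purely because of list order; the new one answers NULL and leaves
the broadcast question to an explicit check.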


Index: sys/net/if.c
===
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.417
diff -u -p -r1.417 if.c
--- sys/net/if.c2 Dec 2015 16:35:52 -   1.417
+++ sys/net/if.c3 Dec 2015 07:59:53 -
@@ -1179,13 +1179,6 @@ ifa_ifwithaddr(struct sockaddr *addr, u_
 
if (equal(addr, ifa->ifa_addr))
return (ifa);
-
-   /* IPv6 doesn't have broadcast */
-   if ((ifp->if_flags & IFF_BROADCAST) &&
-   ifa->ifa_broadaddr &&
-   ifa->ifa_broadaddr->sa_len != 0 &&
-   equal(ifa->ifa_broadaddr, addr))
-   return (ifa);
}
}
return (NULL);
Index: sys/netinet/in_pcb.c
===
RCS file: /cvs/src/sys/netinet/in_pcb.c,v
retrieving revision 1.190
diff -u -p -r1.190 in_pcb.c
--- sys/netinet/in_pcb.c2 Dec 2015 22:13:44 -   1.190
+++ sys/netinet/in_pcb.c3 Dec 2015 07:59:53 -
@@ -332,14 +332,13 @@ in_pcbbind(struct inpcb *inp, struct mbu
 
ia = ifatoia(ifa_ifwithaddr(sintosa(sin),
inp->inp_rtableid));
-   if (ia == NULL)
-   return (EADDRNOTAVAIL);
 
/* SOCK_RAW does not use in_pcbbind() */
-   if (so->so_type != SOCK_DGRAM &&
-   sin->sin_addr.s_addr !=
-   ia->ia_addr.sin_addr.s_addr)
-   return (EADDRNOTAVAIL);
+   if (ia == NULL &&
+   (so->so_type != SOCK_DGRAM ||
+   !in_broadcast(sin->sin_addr,
+   inp->inp_rtableid)))
+   return (EADDRNOTAVAIL);
}
}
if (lport) {
@@ -353,7 +352,8 @@ in_pcbbind(struct inpcb *inp, struct mbu
t = in_pcblookup(table, _addr, 0,
>sin_addr, lport, INPLOOKUP_WILDCARD,
inp->inp_rtableid);
-   if (t && (so->so_euid != 
t->inp_socket->so_euid))
+   if (t &&
+   (so->so_euid != t->inp_socket->so_euid))
return (EADDRINUSE);
}
t = in_pcblookup(table, _addr, 0,
Index: sys/netinet/ip_output.c
===
RCS file: /cvs/src/sys/netinet/ip_output.c,v
retrieving revision 1.311
diff -u -p -r1.311 ip_output.c
--- sys/netinet/ip_output.c 2 Dec 2015 20:50:20 -   1.311
+++ sys/netinet/ip_output.c 3 Dec 2015 07:59:53 -
@@ -1368,13 +1368,12 @@ ip_setmoptions(int optname, struct ip_mo
sin.sin_family = AF_INET;
sin.sin_addr = addr;
ia = ifatoia(ifa_ifwithaddr(sintosa(), rtableid));
-   if (ia && in_hosteq(sin.sin_addr, ia->ia_addr.sin_addr))
-   ifp = ia->ia_ifp;
-   if (ifp == NULL || (ifp->if_flags & IFF_MULTICAST) == 0) {
+   if (ia == NULL ||
+   (ia->ia_ifp->if_flags & IFF_MULTICAST) == 0) {
  

Re: explicitly check broadcast addresses on some ifa_ifwithaddr() uses

2015-12-03 Thread Vincent Gross
On 12/03/15 10:21, Vincent Gross wrote:
> On 12/02/15 20:06, Martin Pieuchot wrote:
>> On 02/12/15(Wed) 16:18, Vincent Gross wrote:
>>> When fed a broadcast address, ifa_ifwithaddr() returns the unicast ifa
>>> whose broadcast address matches the input. This is used mainly to select
>>> ifa, and there can be trouble when you have 2 ifas on the same range
>>> (e.g. 10.0.0.1/24@em0 & 10.0.0.20/24@em1) :
>>>
>>> netinet/ip_mroute.c:814
>>> net/route.c:785
>>> netinet/ip_divert.c:143
>>> net/if_vxlan.c:241
>>>
>>> There are also places where broadcast addresses should not be tolerated :
>>>
>>> netinet/ip_input.c:1061  broadcast address is not a module identifier
>>> netinet/ip_input.c:1141  see above
>>> netinet/ip_input.c:1197  see above
>>> netinet6/*:  no broadcast in ipv6
>>> net/route.c:562: gateway shall never be a broadcast addr
>>> net/route.c:713: see above
>>>
>>> This diff removes broadcast matching from ifa_ifwithaddr, and
>>> adds or rewrites checks where necessary.
>>>
>>> Comments ? Ok ?
>>
>> Looks good to me.  Some nits below.
> 
> Nits applied.
> 
> Anyone else ?

bluhm@ spotted one case where in_broadcast was needed.

ok ?

Index: sys/net/if.c
===
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.418
diff -u -p -r1.418 if.c
--- sys/net/if.c3 Dec 2015 12:22:51 -   1.418
+++ sys/net/if.c3 Dec 2015 13:48:58 -
@@ -1220,13 +1220,6 @@ ifa_ifwithaddr(struct sockaddr *addr, u_
 
if (equal(addr, ifa->ifa_addr))
return (ifa);
-
-   /* IPv6 doesn't have broadcast */
-   if ((ifp->if_flags & IFF_BROADCAST) &&
-   ifa->ifa_broadaddr &&
-   ifa->ifa_broadaddr->sa_len != 0 &&
-   equal(ifa->ifa_broadaddr, addr))
-   return (ifa);
}
}
return (NULL);
Index: sys/net/route.c
===
RCS file: /cvs/src/sys/net/route.c,v
retrieving revision 1.283
diff -u -p -r1.283 route.c
--- sys/net/route.c 2 Dec 2015 16:49:58 -   1.283
+++ sys/net/route.c 3 Dec 2015 13:49:00 -
@@ -539,7 +539,9 @@ rtredirect(struct sockaddr *dst, struct 
 bcmp((caddr_t)(a1), (caddr_t)(a2), (a1)->sa_len) == 0)
if (rt != NULL && (!equal(src, rt->rt_gateway) || rt->rt_ifa != ifa))
error = EINVAL;
-   else if (ifa_ifwithaddr(gateway, rdomain) != NULL)
+   else if (ifa_ifwithaddr(gateway, rdomain) != NULL ||
+   (gateway->sa_family == AF_INET &&
+   in_broadcast(satosin(gateway)->sin_addr, rdomain)))
error = EHOSTUNREACH;
if (error)
goto done;
Index: sys/netinet/in_pcb.c
===
RCS file: /cvs/src/sys/netinet/in_pcb.c,v
retrieving revision 1.191
diff -u -p -r1.191 in_pcb.c
--- sys/netinet/in_pcb.c3 Dec 2015 09:49:15 -   1.191
+++ sys/netinet/in_pcb.c3 Dec 2015 13:49:00 -
@@ -332,14 +332,13 @@ in_pcbbind(struct inpcb *inp, struct mbu
 
ia = ifatoia(ifa_ifwithaddr(sintosa(sin),
inp->inp_rtableid));
-   if (ia == NULL)
-   return (EADDRNOTAVAIL);
 
/* SOCK_RAW does not use in_pcbbind() */
-   if (so->so_type != SOCK_DGRAM &&
-   sin->sin_addr.s_addr !=
-   ia->ia_addr.sin_addr.s_addr)
-   return (EADDRNOTAVAIL);
+   if (ia == NULL &&
+   (so->so_type != SOCK_DGRAM ||
+   !in_broadcast(sin->sin_addr,
+   inp->inp_rtableid)))
+   return (EADDRNOTAVAIL);
}
}
if (lport) {
@@ -353,7 +352,8 @@ in_pcbbind(struct inpcb *inp, struct mbu
t = in_pcblookup(table, &zeroin_addr, 0,
&sin->sin_addr, lport, INPLOOKUP_WILDCARD,
inp->inp_rtableid);
-   if (t && (so->so_euid != t->inp_socket->so_euid))
+   if (t &&
+

explicitly check broadcast addresses on some ifa_ifwithaddr() uses

2015-12-02 Thread Vincent Gross
When fed a broadcast address, ifa_ifwithaddr() returns the unicast ifa
whose broadcast address matches the input. This is used mainly to select
ifa, and there can be trouble when you have 2 ifas on the same range
(e.g. 10.0.0.1/24@em0 & 10.0.0.20/24@em1) :

netinet/ip_mroute.c:814
net/route.c:785
netinet/ip_divert.c:143
net/if_vxlan.c:241

There are also places where broadcast addresses should not be tolerated :

netinet/ip_input.c:1061  broadcast address is not a module identifier
netinet/ip_input.c:1141  see above
netinet/ip_input.c:1197  see above
netinet6/*:  no broadcast in ipv6
net/route.c:562: gateway shall never be a broadcast addr
net/route.c:713: see above

This diff removes broadcast matching from ifa_ifwithaddr, and
adds or rewrites checks where necessary.

Comments ? Ok ?
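
To make the ambiguity concrete, here is a small user-space sketch (not
kernel code). struct toy_ifa, lookup_with_bcast() and lookup_exact() are
hypothetical stand-ins for the old and new matching rules of
ifa_ifwithaddr(); addresses are plain host-order integers and the two
entries mirror the em0/em1 example above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct ifaddr. */
struct toy_ifa {
	const char *ifname;
	uint32_t addr;	/* unicast address, host byte order */
	uint32_t bcast;	/* directed broadcast address */
};

#define IP4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Two ifas on the same range: 10.0.0.1/24@em0 and 10.0.0.20/24@em1. */
static const struct toy_ifa toy_ifas[] = {
	{ "em0", IP4(10, 0, 0, 1),  IP4(10, 0, 0, 255) },
	{ "em1", IP4(10, 0, 0, 20), IP4(10, 0, 0, 255) },
};
#define NIFAS (sizeof(toy_ifas) / sizeof(toy_ifas[0]))

/* Old rule: a broadcast address matches the first ifa on that range. */
static const struct toy_ifa *
lookup_with_bcast(uint32_t addr)
{
	size_t i;

	for (i = 0; i < NIFAS; i++)
		if (toy_ifas[i].addr == addr || toy_ifas[i].bcast == addr)
			return &toy_ifas[i];
	return NULL;
}

/* New rule: only an exact unicast match returns an ifa. */
static const struct toy_ifa *
lookup_exact(uint32_t addr)
{
	size_t i;

	for (i = 0; i < NIFAS; i++)
		if (toy_ifas[i].addr == addr)
			return &toy_ifas[i];
	return NULL;
}
```

With the old rule 10.0.0.255 always resolves to em0, whatever the caller
meant; with the new rule it resolves to nothing and the caller has to ask
for broadcast handling explicitly.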

Index: sys/net/if.c
===
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.416
diff -u -p -r1.416 if.c
--- sys/net/if.c2 Dec 2015 08:47:00 -   1.416
+++ sys/net/if.c2 Dec 2015 15:17:26 -
@@ -1178,13 +1178,6 @@ ifa_ifwithaddr(struct sockaddr *addr, u_
 
if (equal(addr, ifa->ifa_addr))
return (ifa);
-
-   /* IPv6 doesn't have broadcast */
-   if ((ifp->if_flags & IFF_BROADCAST) &&
-   ifa->ifa_broadaddr &&
-   ifa->ifa_broadaddr->sa_len != 0 &&
-   equal(ifa->ifa_broadaddr, addr))
-   return (ifa);
}
}
return (NULL);
Index: sys/netinet/in_pcb.c
===
RCS file: /cvs/src/sys/netinet/in_pcb.c,v
retrieving revision 1.188
diff -u -p -r1.188 in_pcb.c
--- sys/netinet/in_pcb.c30 Oct 2015 09:39:42 -  1.188
+++ sys/netinet/in_pcb.c2 Dec 2015 15:17:26 -
@@ -328,14 +328,12 @@ in_pcbbind(struct inpcb *inp, struct mbu
 
ia = ifatoia(ifa_ifwithaddr(sintosa(sin),
inp->inp_rtableid));
-   if (ia == NULL)
-   return (EADDRNOTAVAIL);
 
/* SOCK_RAW does not use in_pcbbind() */
-   if (so->so_type != SOCK_DGRAM &&
-   sin->sin_addr.s_addr !=
-   ia->ia_addr.sin_addr.s_addr)
-   return (EADDRNOTAVAIL);
+   if (ia == NULL &&
+   (so->so_type != SOCK_DGRAM ||
+   !in_broadcast(sin->sin_addr, inp->inp_rtableid)))
+   return (EADDRNOTAVAIL);
}
}
if (lport) {
Index: sys/netinet/ip_output.c
===
RCS file: /cvs/src/sys/netinet/ip_output.c,v
retrieving revision 1.310
diff -u -p -r1.310 ip_output.c
--- sys/netinet/ip_output.c 2 Dec 2015 13:29:26 -   1.310
+++ sys/netinet/ip_output.c 2 Dec 2015 15:17:27 -
@@ -1387,9 +1387,8 @@ ip_setmoptions(int optname, struct ip_mo
sin.sin_family = AF_INET;
sin.sin_addr = addr;
ia = ifatoia(ifa_ifwithaddr(sintosa(&sin), rtableid));
-   if (ia && in_hosteq(sin.sin_addr, ia->ia_addr.sin_addr))
-   ifp = ia->ia_ifp;
-   if (ifp == NULL || (ifp->if_flags & IFF_MULTICAST) == 0) {
+   if (ia == NULL || (ifp = ia->ia_ifp) == NULL ||
+   (ia->ia_ifp->if_flags & IFF_MULTICAST) == 0) {
error = EADDRNOTAVAIL;
break;
}
@@ -1561,12 +1560,11 @@ ip_setmoptions(int optname, struct ip_mo
sin.sin_family = AF_INET;
sin.sin_addr = mreq->imr_interface;
ia = ifatoia(ifa_ifwithaddr(sintosa(&sin), rtableid));
-   if (ia && in_hosteq(sin.sin_addr, ia->ia_addr.sin_addr))
-   ifp = ia->ia_ifp;
-   else {
+   if (ia == NULL) {
error = EADDRNOTAVAIL;
break;
}
+   ifp = ia->ia_ifp;
}
/*
 * Find the membership in the membership array.
Index: sys/netinet/raw_ip.c
===
RCS file: /cvs/src/sys/netinet/raw_ip.c,v
retrieving revision 1.84
diff -u -p -r1.84 raw_ip.c
--- sys/netinet/raw_ip.c28 Jul 2015 12:22:07 -  1.84
+++ sys/netinet/raw_ip.c2 Dec 2015 15:17:27 -
@@ -473,6 +473,7 @@ 

rewrite if_ifwithaddr() to use rtalloc(9)

2015-10-26 Thread Vincent Gross
regress/sys/net/rdomains still passes with this diff.

Ok ?
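
For readers who do not have the routing code in their head, the idea can
be sketched in user space as follows. Everything prefixed toy_ or TOY_ is
hypothetical and only models the shape of the change: look the address up
in a route table and accept the result only when the best route is
flagged as local.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_RTF_LOCAL	0x1	/* hypothetical flag, modelled on RTF_LOCAL */

/* Hypothetical, much-reduced route entry. */
struct toy_route {
	uint32_t dst;		/* destination, host byte order */
	uint32_t mask;		/* contiguous netmask */
	int flags;
	const char *ifa;	/* label of the owning address */
};

static const struct toy_route toy_rtable[] = {
	{ 0x0a000000, 0xffffff00, 0,             "em0-net"  }, /* 10.0.0.0/24 */
	{ 0x0a000001, 0xffffffff, TOY_RTF_LOCAL, "em0-addr" }, /* 10.0.0.1/32 */
};

/* Longest-prefix match over the toy table. */
static const struct toy_route *
toy_rtlookup(uint32_t addr)
{
	const struct toy_route *best = NULL;
	size_t i;

	for (i = 0; i < sizeof(toy_rtable) / sizeof(toy_rtable[0]); i++) {
		const struct toy_route *rt = &toy_rtable[i];

		if ((addr & rt->mask) != rt->dst)
			continue;
		if (best == NULL || rt->mask > best->mask)
			best = rt;
	}
	return best;
}

/* Only a local route counts as "this address is one of ours". */
static const char *
toy_ifwithaddr(uint32_t addr)
{
	const struct toy_route *rt = toy_rtlookup(addr);

	return (rt != NULL && (rt->flags & TOY_RTF_LOCAL)) ? rt->ifa : NULL;
}
```

10.0.0.2 still matches the connected /24 route in this model, but since
that route is not local the lookup returns NULL, which is the behaviour
the diff relies on.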

Index: net/if.c
===
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.398
diff -u -p -r1.398 if.c
--- net/if.c25 Oct 2015 21:58:04 -  1.398
+++ net/if.c26 Oct 2015 09:44:10 -
@@ -1143,31 +1143,19 @@ if_congested(void)
 struct ifaddr *
 ifa_ifwithaddr(struct sockaddr *addr, u_int rtableid)
 {
-   struct ifnet *ifp;
struct ifaddr *ifa;
+   struct rtentry *rt;
u_int rdomain;
 
+   /*
+* Local routes corresponding to ifas are in rdomain's
+* default rtable.
+*/
rdomain = rtable_l2(rtableid);
-   TAILQ_FOREACH(ifp, &ifnet, if_list) {
-   if (ifp->if_rdomain != rdomain)
-   continue;
-
-   TAILQ_FOREACH(ifa, &ifp->if_addrlist, ifa_list) {
-   if (ifa->ifa_addr->sa_family != addr->sa_family)
-   continue;
-
-   if (equal(addr, ifa->ifa_addr))
-   return (ifa);
-
-   /* IPv6 doesn't have broadcast */
-   if ((ifp->if_flags & IFF_BROADCAST) &&
-   ifa->ifa_broadaddr &&
-   ifa->ifa_broadaddr->sa_len != 0 &&
-   equal(ifa->ifa_broadaddr, addr))
-   return (ifa);
-   }
-   }
-   return (NULL);
+   rt = rtalloc(addr, 0, rdomain);
+   ifa = rt && (rt->rt_flags & RTF_LOCAL) ? rt->rt_ifa : NULL;
+   rtfree(rt);
+   return ifa;
 }
 
 /*



dedup in_pcbbind() port scan loop

2015-10-01 Thread Vincent Gross
Although the sysctls controlling the port range are labelled "port(hi)?first"
and "port(hi)?last", no ordering is enforced and you can have portfirst >
portlast. in_pcbbind() (and in6_pcbsetport()) work around this by duplicating
the loop looking for an available port.

This diff introduces temporary bounds and compares them to guarantee that
first <= last, thus allowing deduplication of the port scan loop.

Tested on my laptop with a narrow port range and heavy cheezburger browsing,
no fault detected. Deeper testing welcome.

Should I include in6_pcbsetport() changes right now or should ipv4 be
validated first ?
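
The principle, outside the kernel, looks roughly like the sketch below.
It is not the loop from the diff: pick_port(), port_free_fn and the toy_
predicates are hypothetical names, the random start is passed in as
offset instead of calling arc4random_uniform(), and the scan is written
as a plain modular walk.

```c
#include <stdint.h>

/* Hypothetical predicate: non-zero when the port can be used. */
typedef int (*port_free_fn)(uint16_t port);

/* Example predicates for experimenting with the sketch. */
static int toy_is_free(uint16_t port) { return port >= 1026; }
static int toy_none_free(uint16_t port) { (void)port; return 0; }

/*
 * Order the two bounds once, then scan [first, last] with a single loop.
 * Returns the chosen port, or -1 when every port in the range is taken.
 */
static int
pick_port(uint16_t bound_a, uint16_t bound_b, uint32_t offset,
    port_free_fn is_free)
{
	uint16_t first, last;
	uint32_t span, i, candidate;

	if (bound_a < bound_b) {
		first = bound_a;
		last = bound_b;
	} else {
		first = bound_b;
		last = bound_a;
	}
	/* first <= last */

	span = (uint32_t)last - first + 1;	/* ports in the range */
	offset %= span;				/* keep the start inside it */

	for (i = 0; i < span; i++) {
		candidate = first + (offset + i) % span;
		if (is_free((uint16_t)candidate))
			return (int)candidate;
	}
	return -1;
}
```

Calling pick_port(1030, 1024, 0, toy_is_free) gives the same answer as
pick_port(1024, 1030, 0, toy_is_free), which is the whole point: the
caller no longer cares which sysctl holds the larger value.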

--
Vincent Gross


Index: netinet/in_pcb.c
===
RCS file: /cvs/src/sys/netinet/in_pcb.c,v
retrieving revision 1.180
diff -u -p -r1.180 in_pcb.c
--- netinet/in_pcb.c22 Sep 2015 09:34:38 -  1.180
+++ netinet/in_pcb.c1 Oct 2015 09:47:16 -
@@ -360,67 +360,43 @@ in_pcbbind(struct inpcb *inp, struct mbu
inp->inp_laddr = sin->sin_addr;
}
if (lport == 0) {
-   u_int16_t first, last;
+   u_int16_t bound_a, bound_b, first, last;
int count;
 
if (inp->inp_flags & INP_HIGHPORT) {
-   first = ipport_hifirstauto; /* sysctl */
-   last = ipport_hilastauto;
+   bound_a = ipport_hifirstauto;   /* sysctl */
+   bound_b = ipport_hilastauto;
} else if (inp->inp_flags & INP_LOWPORT) {
if ((error = suser(p, 0)))
return (EACCES);
-   first = IPPORT_RESERVED-1; /* 1023 */
-   last = 600;/* not IPPORT_RESERVED/2 */
+   bound_a = IPPORT_RESERVED-1; /* 1023 */
+   bound_b = 600; /* not IPPORT_RESERVED/2 */
} else {
-   first = ipport_firstauto;   /* sysctl */
-   last  = ipport_lastauto;
+   bound_a = ipport_firstauto; /* sysctl */
+   bound_b = ipport_lastauto;
}
-
-   /*
-* Simple check to ensure all ports are not used up causing
-* a deadlock here.
-*
-* We split the two cases (up and down) so that the direction
-* is not being tested on each round of the loop.
-*/
-
-   if (first > last) {
-   /*
-* counting down
-*/
-   count = first - last;
-   if (count)
-   lastport = first - arc4random_uniform(count);
-
-   do {
-   if (count-- < 0)/* completely used? */
-   return (EADDRNOTAVAIL);
-   --lastport;
-   if (lastport > first || lastport < last)
-   lastport = first;
-   lport = htons(lastport);
-   } while (in_baddynamic(lastport, 
so->so_proto->pr_protocol) ||
-   in_pcblookup(table, &zeroin_addr, 0,
-   &inp->inp_laddr, lport, wild, inp->inp_rtableid));
+   if (bound_a < bound_b) {
+   first = bound_a;
+   last  = bound_b;
} else {
-   /*
-* counting up
-*/
-   count = last - first;
-   if (count)
-   lastport = first + arc4random_uniform(count);
-
-   do {
-   if (count-- < 0)/* completely used? */
-   return (EADDRNOTAVAIL);
-   ++lastport;
-   if (lastport < first || lastport > last)
-   lastport = first;
-   lport = htons(lastport);
-   } while (in_baddynamic(lastport, 
so->so_proto->pr_protocol) ||
-   in_pcblookup(table, &zeroin_addr, 0,
-   &inp->inp_laddr, lport, wild, inp->inp_rtableid));
+   first = bound_b;
+   last  = bound_a;
}
+   /* first <= last */
+
+   count = last - first;
+   lastport = first + arc4random_uniform(count);
+
+   do {
+   if (count-- < 0)/* completely used? */
+   return (EADDRNOTAVAIL);

Re: kill struct inpcbtable's inpt_lastport

2015-09-19 Thread Vincent Gross
On 09/18/15 23:39, David Hill wrote:
> On Fri, Sep 18, 2015 at 11:05:55PM +0200, Vincent Gross wrote:
>> On 09/18/15 15:18, David Hill wrote:
>>> Is this 'if (count)' statement needed?  We know first > last, so count
>>> will always be positive.  lastport will always be set.
>>
>>> if last == first, then the if statement will be false and lastport will
>>> be uninitialized, I believe.
>>>
>>
>> Both remarks are true, but I think it is better to keep a more extensive
>> refactoring in a separate diff, refactoring that shall get rid of this
>> yucky code duplication.
>>
> 
> Well, this code changes the current behavior.  I'd at least change
> lastport to be initialized to 0 to keep the behavior the same.  It was
> previously set to 0 with M_ZERO.
> 

Fixed. Ok ?
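
For anyone following along, the first == last case can be walked through
with a user-space model of the "counting up" branch. scan_up(),
port_busy_fn and the toy_ predicates are hypothetical names, rnd stands
in for the value arc4random_uniform(count) would return, and the caller
is assumed to pass first <= last.

```c
#include <stdint.h>

/* Hypothetical predicate: non-zero when the port is already in use. */
typedef int (*port_busy_fn)(uint16_t port);

static int toy_never_busy(uint16_t port) { (void)port; return 0; }
static int toy_always_busy(uint16_t port) { (void)port; return 1; }

/*
 * Model of the "counting up" branch; requires first <= last.
 * Returns the selected port, or -1 where the kernel code would
 * return EADDRNOTAVAIL.
 */
static int
scan_up(uint16_t first, uint16_t last, uint32_t rnd, port_busy_fn busy)
{
	uint16_t lastport = 0;	/* the initialisation under discussion */
	int count;

	count = last - first;
	if (count)
		lastport = (uint16_t)(first + rnd % (uint32_t)count);

	do {
		if (count-- < 0)	/* completely used? */
			return -1;
		++lastport;
		if (lastport < first || lastport > last)
			lastport = first;
	} while (busy(lastport));

	return lastport;
}
```

When first == last, count is 0, the assignment is skipped, and the loop
starts from lastport = 0: the increment gives 1, the range check resets
it to first, and that single port is tried exactly once before the
function gives up.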

Index: sys/netinet/in_pcb.c
===
RCS file: /cvs/src/sys/netinet/in_pcb.c,v
retrieving revision 1.179
diff -u -p -r1.179 in_pcb.c
--- sys/netinet/in_pcb.c11 Sep 2015 15:29:47 -  1.179
+++ sys/netinet/in_pcb.c19 Sep 2015 17:52:42 -
@@ -199,7 +199,6 @@ in_pcbinit(struct inpcbtable *table, int
&table->inpt_lhash);
if (table->inpt_lhashtbl == NULL)
panic("in_pcbinit: hashinit failed for lport");
-   table->inpt_lastport = 0;
table->inpt_count = 0;
arc4random_buf(&table->inpt_key, sizeof(table->inpt_key));
 }
@@ -281,8 +280,8 @@ in_pcbbind(struct inpcb *inp, struct mbu
 {
struct socket *so = inp->inp_socket;
struct inpcbtable *table = inp->inp_table;
-   u_int16_t *lastport = &inp->inp_table->inpt_lastport;
struct sockaddr_in *sin;
+   u_int16_t lastport = 0;
u_int16_t lport = 0;
int wild = 0, reuseport = (so->so_options & SO_REUSEPORT);
int error;
@@ -391,16 +390,16 @@ in_pcbbind(struct inpcb *inp, struct mbu
 */
count = first - last;
if (count)
-   *lastport = first - arc4random_uniform(count);
+   lastport = first - arc4random_uniform(count);
 
do {
if (count-- < 0)/* completely used? */
return (EADDRNOTAVAIL);
-   --*lastport;
-   if (*lastport > first || *lastport < last)
-   *lastport = first;
-   lport = htons(*lastport);
-   } while (in_baddynamic(*lastport, 
so->so_proto->pr_protocol) ||
+   --lastport;
+   if (lastport > first || lastport < last)
+   lastport = first;
+   lport = htons(lastport);
+   } while (in_baddynamic(lastport, 
so->so_proto->pr_protocol) ||
in_pcblookup(table, &zeroin_addr, 0,
&inp->inp_laddr, lport, wild, inp->inp_rtableid));
} else {
@@ -409,16 +408,16 @@ in_pcbbind(struct inpcb *inp, struct mbu
 */
count = last - first;
if (count)
-   *lastport = first + arc4random_uniform(count);
+   lastport = first + arc4random_uniform(count);
 
do {
if (count-- < 0)/* completely used? */
return (EADDRNOTAVAIL);
-   ++*lastport;
-   if (*lastport < first || *lastport > last)
-   *lastport = first;
-   lport = htons(*lastport);
-   } while (in_baddynamic(*lastport, 
so->so_proto->pr_protocol) ||
+   ++lastport;
+   if (lastport < first || lastport > last)
+   lastport = first;
+   lport = htons(lastport);
+   } while (in_baddynamic(lastport, 
so->so_proto->pr_protocol) ||
in_pcblookup(table, &zeroin_addr, 0,
&inp->inp_laddr, lport, wild, inp->inp_rtableid));
}
Index: sys/netinet/in_pcb.h
===
RCS file: /cvs/src/sys/netinet/in_pcb.h,v
retrieving revision 1.89
diff -u -p -r1.89 in_pcb.h
--- sys/netinet/in_pcb.h16 Apr 2015 19:24:13 -  1.89
+++ sys/netinet/in_pcb.h19 Sep 2015 17:52:42 -
@@ -152,7 +152,6 @@ struct inpcbta

Re: kill struct inpcbtable's inpt_lastport

2015-09-18 Thread Vincent Gross
On 09/18/15 15:18, David Hill wrote:
> Is this 'if (count)' statement needed?  We know first > last, so count
> will always be positive.  lastport will always be set.

> if last == first, then the if statement will be false and lastport will
> be uninitialized, I believe.
> 

Both remarks are true, but I think it is better to keep a more extensive
refactoring in a separate diff, refactoring that shall get rid of this
yucky code duplication.

--
Vincent Gross



Re: kill struct inpcbtable's inpt_lastport

2015-09-18 Thread Vincent Gross
On 09/13/15 11:49, Vincent Gross wrote:
> On 09/13/15 10:37, Claudio Jeker wrote:
>> On Sun, Sep 13, 2015 at 12:18:10AM +0200, Vincent Gross wrote:
>>> On 09/12/15 22:10, Claudio Jeker wrote:
>>>> On Sat, Sep 12, 2015 at 02:40:59PM +0200, Vincent Gross wrote:
>>>>> inpt_lastport is never read without being written before, and only
>>>>> in_pcbbind()
>>>>> and in6_pcbsetport() are using it. This diff removes inpt_lastport from
>>>>> struct inpcbtable and turns it into a local variable where it is used.
>>>>>
>>>>> Ok ?
>>>> Reads OK but can not be applied because something wrapped some lines.
>>>
>>
>> Lines are now fixed but now all the tabs got replaced by spaces. So the
>> thing still fails to apply.
>>
> 
> How about now ?
> 
> 
> Index: sys/netinet/in_pcb.c
> ===
> RCS file: /cvs/src/sys/netinet/in_pcb.c,v
> retrieving revision 1.179
> diff -u -p -r1.179 in_pcb.c
> --- sys/netinet/in_pcb.c  11 Sep 2015 15:29:47 -  1.179
> +++ sys/netinet/in_pcb.c  12 Sep 2015 12:22:03 -
> @@ -199,7 +199,6 @@ in_pcbinit(struct inpcbtable *table, int
>   &table->inpt_lhash);
>   if (table->inpt_lhashtbl == NULL)
>   panic("in_pcbinit: hashinit failed for lport");
> - table->inpt_lastport = 0;
>   table->inpt_count = 0;
>   arc4random_buf(&table->inpt_key, sizeof(table->inpt_key));
>  }
> @@ -281,9 +280,8 @@ in_pcbbind(struct inpcb *inp, struct mbu
>  {
>   struct socket *so = inp->inp_socket;
>   struct inpcbtable *table = inp->inp_table;
> - u_int16_t *lastport = &inp->inp_table->inpt_lastport;
>   struct sockaddr_in *sin;
> - u_int16_t lport = 0;
> + u_int16_t lastport, lport = 0;
>   int wild = 0, reuseport = (so->so_options & SO_REUSEPORT);
>   int error;
>  
> @@ -391,16 +389,16 @@ in_pcbbind(struct inpcb *inp, struct mbu
>*/
>   count = first - last;
>   if (count)
> - *lastport = first - arc4random_uniform(count);
> + lastport = first - arc4random_uniform(count);
>  
>   do {
>   if (count-- < 0)/* completely used? */
>   return (EADDRNOTAVAIL);
> - --*lastport;
> - if (*lastport > first || *lastport < last)
> - *lastport = first;
> - lport = htons(*lastport);
> - } while (in_baddynamic(*lastport, 
> so->so_proto->pr_protocol) ||
> + --lastport;
> + if (lastport > first || lastport < last)
> + lastport = first;
> + lport = htons(lastport);
> + } while (in_baddynamic(lastport, 
> so->so_proto->pr_protocol) ||
>   in_pcblookup(table, &zeroin_addr, 0,
>   &inp->inp_laddr, lport, wild, inp->inp_rtableid));
>   } else {
> @@ -409,16 +407,16 @@ in_pcbbind(struct inpcb *inp, struct mbu
>*/
>   count = last - first;
>   if (count)
> - *lastport = first + arc4random_uniform(count);
> + lastport = first + arc4random_uniform(count);
>  
>   do {
>   if (count-- < 0)/* completely used? */
>   return (EADDRNOTAVAIL);
> - ++*lastport;
> - if (*lastport < first || *lastport > last)
> - *lastport = first;
> - lport = htons(*lastport);
> - } while (in_baddynamic(*lastport, 
> so->so_proto->pr_protocol) ||
> + ++lastport;
> + if (lastport < first || lastport > last)
> + lastport = first;
> + lport = htons(lastport);
> + } while (in_baddynamic(lastport, 
> so->so_proto->pr_protocol) ||
>   in_pcblookup(table, &zeroin_addr, 0,
>   &inp->inp_laddr, lport, wild, inp->inp_rtableid));
>   }
> Index: sys/netinet/in_pcb.h
> ==

Re: kill struct inpcbtable's inpt_lastport

2015-09-13 Thread Vincent Gross
On 09/13/15 10:37, Claudio Jeker wrote:
> On Sun, Sep 13, 2015 at 12:18:10AM +0200, Vincent Gross wrote:
>> On 09/12/15 22:10, Claudio Jeker wrote:
>>> On Sat, Sep 12, 2015 at 02:40:59PM +0200, Vincent Gross wrote:
>>>> inpt_lastport is never read without being written before, and only
>>>> in_pcbbind()
>>>> and in6_pcbsetport() are using it. This diff removes inpt_lastport from
>>>> struct inpcbtable and turns it into a local variable where it is used.
>>>>
>>>> Ok ?
>>> Reads OK but can not be applied because something wrapped some lines.
>>
> 
> Lines are now fixed but now all the tabs got replaced by spaces. So the
> thing still fails to apply.
> 

How about now ?


Index: sys/netinet/in_pcb.c
===
RCS file: /cvs/src/sys/netinet/in_pcb.c,v
retrieving revision 1.179
diff -u -p -r1.179 in_pcb.c
--- sys/netinet/in_pcb.c11 Sep 2015 15:29:47 -  1.179
+++ sys/netinet/in_pcb.c12 Sep 2015 12:22:03 -
@@ -199,7 +199,6 @@ in_pcbinit(struct inpcbtable *table, int
&table->inpt_lhash);
if (table->inpt_lhashtbl == NULL)
panic("in_pcbinit: hashinit failed for lport");
-   table->inpt_lastport = 0;
table->inpt_count = 0;
arc4random_buf(&table->inpt_key, sizeof(table->inpt_key));
 }
@@ -281,9 +280,8 @@ in_pcbbind(struct inpcb *inp, struct mbu
 {
struct socket *so = inp->inp_socket;
struct inpcbtable *table = inp->inp_table;
-   u_int16_t *lastport = &inp->inp_table->inpt_lastport;
struct sockaddr_in *sin;
-   u_int16_t lport = 0;
+   u_int16_t lastport, lport = 0;
int wild = 0, reuseport = (so->so_options & SO_REUSEPORT);
int error;
 
@@ -391,16 +389,16 @@ in_pcbbind(struct inpcb *inp, struct mbu
 */
count = first - last;
if (count)
-   *lastport = first - arc4random_uniform(count);
+   lastport = first - arc4random_uniform(count);
 
do {
if (count-- < 0)/* completely used? */
return (EADDRNOTAVAIL);
-   --*lastport;
-   if (*lastport > first || *lastport < last)
-   *lastport = first;
-   lport = htons(*lastport);
-   } while (in_baddynamic(*lastport, 
so->so_proto->pr_protocol) ||
+   --lastport;
+   if (lastport > first || lastport < last)
+   lastport = first;
+   lport = htons(lastport);
+   } while (in_baddynamic(lastport, 
so->so_proto->pr_protocol) ||
in_pcblookup(table, &zeroin_addr, 0,
&inp->inp_laddr, lport, wild, inp->inp_rtableid));
} else {
@@ -409,16 +407,16 @@ in_pcbbind(struct inpcb *inp, struct mbu
 */
count = last - first;
if (count)
-   *lastport = first + arc4random_uniform(count);
+   lastport = first + arc4random_uniform(count);
 
do {
if (count-- < 0)/* completely used? */
return (EADDRNOTAVAIL);
-   ++*lastport;
-   if (*lastport < first || *lastport > last)
-   *lastport = first;
-   lport = htons(*lastport);
-   } while (in_baddynamic(*lastport, 
so->so_proto->pr_protocol) ||
+   ++lastport;
+   if (lastport < first || lastport > last)
+   lastport = first;
+   lport = htons(lastport);
+   } while (in_baddynamic(lastport, 
so->so_proto->pr_protocol) ||
in_pcblookup(table, &zeroin_addr, 0,
&inp->inp_laddr, lport, wild, inp->inp_rtableid));
}
Index: sys/netinet/in_pcb.h
===
RCS file: /cvs/src/sys/netinet/in_pcb.h,v
retrieving revision 1.89
diff -u -p -r1.89 in_pcb.h
--- sys/netinet/in_pcb.h16 Apr 2015 19:24:13 -  1.89
+++ sys/netinet/in_pcb.h12 Sep 2015 12:22:03 -
@@ -152,7 +152,6 @@ struct inpcbtable {
struct inpcbhead *inpt_hashtbl, *inpt_lhashtbl;
SIPHASH_KEY in

Re: kill struct inpcbtable's inpt_lastport

2015-09-12 Thread Vincent Gross
On 09/12/15 22:10, Claudio Jeker wrote:
> On Sat, Sep 12, 2015 at 02:40:59PM +0200, Vincent Gross wrote:
>> inpt_lastport is never read without being written before, and only
>> in_pcbbind()
>> and in6_pcbsetport() are using it. This diff removes inpt_lastport from
>> struct inpcbtable and turns it into a local variable where it is used.
>>
>> Ok ?
> Reads OK but can not be applied because something wrapped some lines.

Ok, thunderbird and I reached an agreement where we will keep our legs
and lines unbroken.


--
Vincent


Index: sys/netinet/in_pcb.c
===
RCS file: /cvs/src/sys/netinet/in_pcb.c,v
retrieving revision 1.179
diff -u -p -r1.179 in_pcb.c
--- sys/netinet/in_pcb.c11 Sep 2015 15:29:47 -  1.179
+++ sys/netinet/in_pcb.c12 Sep 2015 12:22:03 -
@@ -199,7 +199,6 @@ in_pcbinit(struct inpcbtable *table, int
&table->inpt_lhash);
if (table->inpt_lhashtbl == NULL)
panic("in_pcbinit: hashinit failed for lport");
-   table->inpt_lastport = 0;
table->inpt_count = 0;
arc4random_buf(&table->inpt_key, sizeof(table->inpt_key));
 }
@@ -281,9 +280,8 @@ in_pcbbind(struct inpcb *inp, struct mbu
 {
struct socket *so = inp->inp_socket;
struct inpcbtable *table = inp->inp_table;
-   u_int16_t *lastport = &inp->inp_table->inpt_lastport;
struct sockaddr_in *sin;
-   u_int16_t lport = 0;
+   u_int16_t lastport, lport = 0;
int wild = 0, reuseport = (so->so_options & SO_REUSEPORT);
int error;
 
@@ -391,16 +389,16 @@ in_pcbbind(struct inpcb *inp, struct mbu
 */
count = first - last;
if (count)
-   *lastport = first - arc4random_uniform(count);
+   lastport = first - arc4random_uniform(count);
 
do {
if (count-- < 0)/* completely used? */
return (EADDRNOTAVAIL);
-   --*lastport;
-   if (*lastport > first || *lastport < last)
-   *lastport = first;
-   lport = htons(*lastport);
-   } while (in_baddynamic(*lastport, 
so->so_proto->pr_protocol) ||
+   --lastport;
+   if (lastport > first || lastport < last)
+   lastport = first;
+   lport = htons(lastport);
+   } while (in_baddynamic(lastport, 
so->so_proto->pr_protocol) ||
in_pcblookup(table, &zeroin_addr, 0,
&inp->inp_laddr, lport, wild, inp->inp_rtableid));
} else {
@@ -409,16 +407,16 @@ in_pcbbind(struct inpcb *inp, struct mbu
 */
count = last - first;
if (count)
-   *lastport = first + arc4random_uniform(count);
+   lastport = first + arc4random_uniform(count);
 
do {
if (count-- < 0)/* completely used? */
return (EADDRNOTAVAIL);
-   ++*lastport;
-   if (*lastport < first || *lastport > last)
-   *lastport = first;
-   lport = htons(*lastport);
-   } while (in_baddynamic(*lastport, 
so->so_proto->pr_protocol) ||
+   ++lastport;
+   if (lastport < first || lastport > last)
+   lastport = first;
+   lport = htons(lastport);
+   } while (in_baddynamic(lastport, 
so->so_proto->pr_protocol) ||
in_pcblookup(table, &zeroin_addr, 0,
&inp->inp_laddr, lport, wild, inp->inp_rtableid));
}
Index: sys/netinet/in_pcb.h
===
RCS file: /cvs/src/sys/netinet/in_pcb.h,v
retrieving revision 1.89
diff -u -p -r1.89 in_pcb.h
--- sys/netinet/in_pcb.h16 Apr 2015 19:24:13 -  1.89
+++ sys/netinet/in_pcb.h12 Sep 2015 12:22:03 -
@@ -152,7 +152,6 @@ struct inpcbtable {
struct inpcbhead *inpt_hashtbl, *inpt_lhashtbl;
SIPHASH_KEY inpt_key;
u_long inpt_hash, inpt_lhash;
-   u_int16_t inpt_lastport;
int   inpt_count;
 };
 
Index: sys/netinet6/in6_pcb.c
===

kill struct inpcbtable's inpt_lastport

2015-09-12 Thread Vincent Gross
inpt_lastport is never read without being written before, and only
in_pcbbind() and in6_pcbsetport() are using it. This diff removes
inpt_lastport from struct inpcbtable and turns it into a local variable
where it is used.

Ok ?

--
Vincent

Index: sys/netinet/in_pcb.c
===
RCS file: /cvs/src/sys/netinet/in_pcb.c,v
retrieving revision 1.179
diff -u -p -r1.179 in_pcb.c
--- sys/netinet/in_pcb.c11 Sep 2015 15:29:47 -  1.179
+++ sys/netinet/in_pcb.c12 Sep 2015 12:22:03 -
@@ -199,7 +199,6 @@ in_pcbinit(struct inpcbtable *table, int
&table->inpt_lhash);
if (table->inpt_lhashtbl == NULL)
panic("in_pcbinit: hashinit failed for lport");
-   table->inpt_lastport = 0;
table->inpt_count = 0;
arc4random_buf(&table->inpt_key, sizeof(table->inpt_key));
 }
@@ -281,9 +280,8 @@ in_pcbbind(struct inpcb *inp, struct mbu
 {
struct socket *so = inp->inp_socket;
struct inpcbtable *table = inp->inp_table;
-   u_int16_t *lastport = &inp->inp_table->inpt_lastport;
struct sockaddr_in *sin;
-   u_int16_t lport = 0;
+   u_int16_t lastport, lport = 0;
int wild = 0, reuseport = (so->so_options & SO_REUSEPORT);
int error;
 
@@ -391,16 +389,16 @@ in_pcbbind(struct inpcb *inp, struct mbu
 */
count = first - last;
if (count)
-   *lastport = first -
arc4random_uniform(count);
+   lastport = first -
arc4random_uniform(count);
 
do {
if (count-- < 0)/* completely
used? */
return (EADDRNOTAVAIL);
-   --*lastport;
-   if (*lastport > first || *lastport < last)
-   *lastport = first;
-   lport = htons(*lastport);
-   } while (in_baddynamic(*lastport,
so->so_proto->pr_protocol) ||
+   --lastport;
+   if (lastport > first || lastport < last)
+   lastport = first;
+   lport = htons(lastport);
+   } while (in_baddynamic(lastport,
so->so_proto->pr_protocol) ||
in_pcblookup(table, &zeroin_addr, 0,
&inp->inp_laddr, lport, wild,
inp->inp_rtableid));
} else {
@@ -409,16 +407,16 @@ in_pcbbind(struct inpcb *inp, struct mbu
 */
count = last - first;
if (count)
-   *lastport = first +
arc4random_uniform(count);
+   lastport = first +
arc4random_uniform(count);
 
do {
if (count-- < 0)/* completely
used? */
return (EADDRNOTAVAIL);
-   ++*lastport;
-   if (*lastport < first || *lastport > last)
-   *lastport = first;
-   lport = htons(*lastport);
-   } while (in_baddynamic(*lastport,
so->so_proto->pr_protocol) ||
+   ++lastport;
+   if (lastport < first || lastport > last)
+   lastport = first;
+   lport = htons(lastport);
+   } while (in_baddynamic(lastport,
so->so_proto->pr_protocol) ||
in_pcblookup(table, &zeroin_addr, 0,
&inp->inp_laddr, lport, wild,
inp->inp_rtableid));
}
Index: sys/netinet/in_pcb.h
===
RCS file: /cvs/src/sys/netinet/in_pcb.h,v
retrieving revision 1.89
diff -u -p -r1.89 in_pcb.h
--- sys/netinet/in_pcb.h16 Apr 2015 19:24:13 -  1.89
+++ sys/netinet/in_pcb.h12 Sep 2015 12:22:03 -
@@ -152,7 +152,6 @@ struct inpcbtable {
struct inpcbhead *inpt_hashtbl, *inpt_lhashtbl;
SIPHASH_KEY inpt_key;
u_long inpt_hash, inpt_lhash;
-   u_int16_t inpt_lastport;
int   inpt_count;
 };
 
Index: sys/netinet6/in6_pcb.c
===
RCS file: /cvs/src/sys/netinet6/in6_pcb.c,v
retrieving revision 1.74
diff -u -p -r1.74 in6_pcb.c
--- sys/netinet6/in6_pcb.c  11 Sep 2015 15:29:47 -  1.74
+++ sys/netinet6/in6_pcb.c  12 Sep 2015 12:22:07 -
@@ -294,8 +294,7 @@ in6_pcbsetport(struct in6_addr *laddr, s
struct socket *so = inp->inp_socket;
struct inpcbtable *table = inp->inp_table;
u_int16_t first, last;
-   u_int16_t *lastport = 

PATCH: iked SA cleanup on shutdown

2015-05-02 Thread Vincent Gross
Hi folks,

this patch makes iked clean its SAs on shutdown: for each existing IKE
SA, all of its Child SAs will be removed from the kernel, and an IKE
DELETE notification payload will be sent to the peer.

Comments ?
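
The mechanism is small enough to sketch outside iked. All toy_ names
below are hypothetical and only model the shape of the change: the
shutdown callback now receives its privsep_proc, reaches the environment
through it, and walks the SA list with a saved successor so that freeing
the current element is safe. The counter stands in for sending the IKE
DELETE notification.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, much-reduced versions of the iked structures. */
struct toy_sa {
	struct toy_sa *next;
	int id;
};

struct toy_env {
	struct toy_sa *sas;	/* list of IKE SAs */
	int deletes_sent;	/* simulated DELETE notifications */
};

struct toy_proc {
	struct toy_env *p_env;
	void (*p_shutdown)(struct toy_proc *);	/* callback gets its proc */
};

/* Prepend a new SA; returns 0 on success, -1 on allocation failure. */
static int
toy_sa_add(struct toy_env *env, int id)
{
	struct toy_sa *sa = malloc(sizeof(*sa));

	if (sa == NULL)
		return -1;
	sa->id = id;
	sa->next = env->sas;
	env->sas = sa;
	return 0;
}

/* Shutdown hook: notify the peer for each SA, then release it. */
static void
toy_ikev2_shutdown(struct toy_proc *p)
{
	struct toy_env *env = p->p_env;
	struct toy_sa *sa, *tmpsa;

	for (sa = env->sas; sa != NULL; sa = tmpsa) {
		tmpsa = sa->next;	/* save successor before freeing sa */
		env->deletes_sent++;	/* stand-in for the DELETE payload */
		free(sa);
	}
	env->sas = NULL;
}

/* Generic shutdown path: call the hook, if any, with the proc itself. */
static void
toy_proc_shutdown(struct toy_proc *p)
{
	if (p->p_shutdown != NULL)
		(*p->p_shutdown)(p);
}
```

Passing the proc to the hook is what lets the hook find the SA tree
without a global; a hook taking no argument would have nothing to work
on.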

Cheers,

--
Vincent / dermiste


Index: iked.h
===
RCS file: /cvs/src/sbin/iked/iked.h,v
retrieving revision 1.84
diff -u -p -r1.84 iked.h
--- iked.h  26 Mar 2015 19:52:35 -  1.84
+++ iked.h  2 May 2015 17:11:34 -
@@ -549,7 +549,7 @@ struct privsep_proc {
const char  *p_chroot;
struct privsep  *p_ps;
struct iked *p_env;
-   void (*p_shutdown)(void);
+   void (*p_shutdown)(struct privsep_proc *);
u_int p_instance;
 };
 
@@ -744,6 +744,7 @@ pid_tikev1(struct privsep *, struct pr
 
 /* ikev2.c */
 pid_t   ikev2(struct privsep *, struct privsep_proc *);
+void ikev2_shutdown(struct privsep_proc *);
 void ikev2_recv(struct iked *, struct iked_message *);
 void ikev2_init_ike_sa(struct iked *, void *);
 int ikev2_sa_negotiate(struct iked_proposals *, struct iked_proposals *,
Index: ikev2.c
===
RCS file: /cvs/src/sbin/iked/ikev2.c,v
retrieving revision 1.120
diff -u -p -r1.120 ikev2.c
--- ikev2.c 26 Mar 2015 19:52:35 -  1.120
+++ ikev2.c 2 May 2015 17:11:39 -
@@ -136,7 +136,20 @@ static struct privsep_proc procs[] = {
 pid_t
 ikev2(struct privsep *ps, struct privsep_proc *p)
 {
+   p->p_shutdown = ikev2_shutdown;
return (proc_run(ps, p, procs, nitems(procs), NULL, NULL));
+}
+
+void
+ikev2_shutdown(struct privsep_proc *p)
+{
+   struct iked *env = p->p_env;
+   struct iked_sa  *sa, *tmpsa;
+
+   RB_FOREACH_SAFE(sa, iked_sas, &env->sc_sas, tmpsa) {
+   ikev2_ikesa_delete(env, sa, sa->sa_hdr.sh_initiator);
+   sa_free(env, sa);
+   }
 }
 
 int
Index: proc.c
===
RCS file: /cvs/src/sbin/iked/proc.c,v
retrieving revision 1.22
diff -u -p -r1.22 proc.c
--- proc.c  16 Jan 2015 06:39:58 -  1.22
+++ proc.c  2 May 2015 17:11:39 -
@@ -297,7 +297,7 @@ proc_shutdown(struct privsep_proc *p)
control_cleanup(&ps->ps_csock);
 
if (p->p_shutdown != NULL)
-   (*p->p_shutdown)();
+   (*p->p_shutdown)(p);
 
proc_close(ps);
 



PATCH: bring crypto(9) up to speed with crypto/cryptodev.h

2015-05-02 Thread Vincent Gross
Hi folks,

crypto(9) describes functions and constants that are not part of
crypto/cryptodev.h anymore (see 1.58 - 1.60); this patch fixes that.

Cheers,

--
Vincent / dermiste


Index: crypto.9
===
RCS file: /cvs/src/share/man/man9/crypto.9,v
retrieving revision 1.37
diff -u -p -r1.37 crypto.9
--- crypto.920 Aug 2014 11:23:42 -  1.37
+++ crypto.92 May 2015 20:02:31 -
@@ -28,21 +28,15 @@
 .Ft int
 .Fn crypto_register u_int32_t int * int (*)(u_int32_t *, struct cryptoini 
*) int (*)(u_int64_t) int (*)(struct cryptop *)
 .Ft int
-.Fn crypto_kregister u_int32_t int * int (*)(struct cryptkop *)
-.Ft int
 .Fn crypto_unregister u_int32_t int
 .Ft void
 .Fn crypto_done struct cryptop *
-.Ft void
-.Fn crypto_kdone struct cryptkop *
 .Ft int
 .Fn crypto_newsession u_int64_t * struct cryptoini * int
 .Ft int
 .Fn crypto_freesession u_int64_t
 .Ft int
 .Fn crypto_dispatch struct cryptop *
-.Ft int
-.Fn crypto_kdispatch struct cryptkop *
 .Ft struct cryptop *
 .Fn crypto_getreq int
 .Ft void
@@ -84,23 +78,6 @@ struct cryptop {
caddr_t crp_mac;
 };
 
-struct crparam {
-caddr_t crp_p;
-u_int   crp_nbits;
-};
-
-#define CRK_MAXPARAM8
-
-struct cryptkop {
-u_int  krp_op; /* ie. CRK_MOD_EXP or other */
-u_int  krp_status; /* return status */
-u_short krp_iparams; /* # of input parameters */
-u_short krp_oparams; /* # of output parameters */
-   u_int32_t  krp_hid;
-struct crparam krp_param[CRK_MAXPARAM];  /* kvm */
-int   (*krp_callback)(struct cryptkop *);
-struct cryptkop   *krp_next;
-};
 .Ed
 .Sh DESCRIPTION
 .Nm
@@ -119,11 +96,6 @@ descriptors that instruct the framework 
 with it) of the operations that should be applied on the data (more
 than one cryptographic operation can be requested).
 .Pp
-Keying operations are supported as well.
-Unlike the symmetric operators described above,
-these sessionless commands perform mathematical operations using
-input and output parameters.
-.Pp
 Since the consumers may not be associated with a process, drivers may
 not use
 .Xr tsleep 9 .
@@ -168,8 +140,6 @@ CRYPTO_CAST_CBC
 CRYPTO_MD5_HMAC
 CRYPTO_SHA1_HMAC
 CRYPTO_RIPEMD160_HMAC
-CRYPTO_MD5_KPDK
-CRYPTO_SHA1_KPDK
 CRYPTO_AES_CBC
 CRYPTO_AES_CTR
 CRYPTO_AES_XTS
@@ -391,37 +361,11 @@ callback routine to do the necessary cle
 opaque field in the
 .Fa cryptop
 structure.
-.Pp
-.Fn crypto_kdispatch
-is called to perform a keying operation.
-The various fields in the
-.Fa cryptkop
-structure are:
-.Bl -tag -width crp_alloctype
-.It Fa krp_op
-Operation code, such as CRK_MOD_EXP.
-.It Fa krp_status
-Return code.
-This errno-style variable indicates whether there were lower level reasons
-for operation failure.
-.It Fa krp_iparams
-Number of input parameters to the specified operation.
-Note that each operation has a (typically hardwired) number of such parameters.
-.It Fa krp_oparams
-Number of output parameters from the specified operation.
-Note that each operation has a (typically hardwired) number of such parameters.
-.It Fa krp_kvp
-An array of kernel memory blocks containing the parameters.
-.It Fa krp_hid
-Identifier specifying which low-level driver is being used.
-.It Fa krp_callback
-Callback called on completion of a keying operation.
 .El
 .Sh DRIVER-SIDE API
 The
 .Fn crypto_get_driverid ,
 .Fn crypto_register ,
-.Fn crypto_kregister ,
 .Fn crypto_unregister ,
 and
 .Fn crypto_done
@@ -465,7 +409,6 @@ The calling convention for the three dri
 int (*newsession) (u_int32_t *, struct cryptoini *);
 int (*freesession) (u_int64_t);
 int (*process) (struct cryptop *);
-int (*kprocess) (struct cryptkop *);
 .Ed
 .Pp
 On invocation, the first argument to
@@ -501,24 +444,8 @@ routine should invoke
 .Fn crypto_done .
 Session migration may be performed, as mentioned previously.
 .Pp
-The
-.Fn kprocess
-routine is invoked with a request to perform crypto key processing.
-This routine must not block, but should queue the request and return
-immediately.
-Upon processing the request, the callback routine should be invoked.
-In case of error, the error indication must be placed in the
-.Fa krp_status
-field of the
-.Fa cryptkop
-structure.
-When the request is completed, or an error is detected, the
-.Fn kprocess
-routine should invoke
-.Fn crypto_kdone .
 .Sh RETURN VALUES
 .Fn crypto_register ,
-.Fn crypto_kregister ,
 .Fn crypto_unregister ,
 .Fn crypto_newsession ,
 and



Re: PATCH: clarifying iked.conf man

2015-05-01 Thread Vincent Gross
On Mon, Apr 20, 2015 at 07:35:58PM +0059, Jason McIntyre wrote:
> On Wed, Apr 15, 2015 at 05:13:13PM +0200, Vincent Gross wrote:
> > Hello,
> >
> > iked.conf's man page is a bit fuzzy on how local and peer ip defaults
> > are set. This patch below attempts to fix that.
> >
>
> if you can specify one and have the other default to any, i agree we'd
> want to document it.
>
> for the rest, the diff essentially removes the information about when
> these options might be useful and needed. i'm less sure about that.
>
> i'd appreciate some feedback from a developer that the content is
> correct.
>
> i'm less inclined to rearrange the page this way without good reason.
>
> also note for future man diffs to start new sentences on new lines.
>

I took a second look at parse.y and found that it chokes on configs
like this one:

ikev2 active esp \
from 10.0.1.0/24 to 172.16.0.1 local 10.0.1.1 \
srcid 'client.lan' dstid 'gateway.lan'

To get this config to work you would need to add peer 172.16.0.1.

It would be more logical to default local to src and peer to dst when
there is only one traffic selector, and both to any otherwise.

The diff below changes how defaults are set for peer and local, and
reflects the change in iked.conf(5).
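
As a rough illustration of the proposed rule, here is a minimal
standalone sketch. It is not the parse.y code: struct addr and
pick_endpoint() are hypothetical names, and NULL stands in for "any".

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a parsed address list: one node per
 * address given in a "from" or "to" clause.
 */
struct addr {
	const char	*name;
	struct addr	*next;
};

/*
 * Pick an endpoint address. An explicit "local"/"peer" value wins;
 * otherwise fall back to the traffic selector, but only when there is
 * exactly one of them. A NULL result means "any".
 */
static const struct addr *
pick_endpoint(const struct addr *explicit_addr, const struct addr *selector)
{
	if (explicit_addr != NULL)
		return explicit_addr;
	if (selector != NULL && selector->next == NULL)
		return selector;
	return NULL;
}
```

Calling it once with (local, src) and once with (peer, dst) gives the
behaviour described above: each side is defaulted independently of the
other.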

Comments ? Suggestions ?

--- parse.y.orig	Fri May  1 15:10:51 2015
+++ parse.y	Fri May  1 17:08:51 2015
@@ -2482,25 +2482,21 @@
 	if (peers) {
 		if (peers->src)
 			ipa = peers->src;
+		else if (hosts->src && hosts->src->next == NULL)
+			ipa = hosts->src;
 		if (peers->dst)
 			ipb = peers->dst;
-		if (ipa == NULL && ipb == NULL) {
-			if (hosts->src && hosts->src->next == NULL)
-				ipa = hosts->src;
-			if (hosts->dst && hosts->dst->next == NULL)
-				ipb = hosts->dst;
-		}
+		else if (hosts->dst && hosts->dst->next == NULL)
+			ipb = hosts->dst;
 	}
 	if (ipa == NULL && ipb == NULL) {
 		yyerror("could not get local/peer specification");
 		return (-1);
 	}
-	if (pol.pol_flags & IKED_POLICY_ACTIVE) {
-		if (ipb == NULL || ipb->netaddress ||
-		    (ipa != NULL && ipa->netaddress)) {
-			yyerror("active mode requires local/peer address");
+	if ((pol.pol_flags & IKED_POLICY_ACTIVE) &&
+	    (ipb == NULL || ipb->netaddress)) {
+		yyerror("active mode requires peer host address");
 			return (-1);
-		}
 	}
 	if (ipa) {
 		memcpy(&pol.pol_local.addr, &ipa->address,

--- iked.conf.5	28 Feb 2015 21:51:57 -0000	1.38
+++ iked.conf.5	1 May 2015 15:12:44 -0000
@@ -341,16 +341,24 @@ this option is generally not needed.
 The
 .Ic peer
 parameter specifies the address or FQDN of the remote endpoint.
-For host-to-host connections where
+For single-traffic-selector host-to-host connections where
 .Ar dst
 is identical to
 .Ar remote ,
 this option is generally not needed as it will be set to
 .Ar dst
 automatically.
-If it is not specified or if the keyword
-.Ar any
-is given, the default peer is used.
+.Pp
+When the policy contains only one traffic selector,
+.Ic local
+and
+.Ic peer
+default values are
+.Ar src
+and
+.Ar dst
+respectively. Otherwise they both default to
+.Ar any .
 .It Xo
 .Ic ikesa
 .Ic auth Ar algorithm



PATCH: clarifying iked.conf man

2015-04-15 Thread Vincent Gross
Hello,

iked.conf's man page is a bit fuzzy on how local and peer ip defaults
are set. This patch below attempts to fix that.

Also, can you take a look at my previous nat-on-ipsec-on-iked patchset ?

see http://marc.info/?l=openbsd-tech&m=142662971007779&w=2

Cheers,


Index: iked.conf.5
===================================================================
RCS file: /cvs/src/sbin/iked/iked.conf.5,v
retrieving revision 1.38
diff -u -p -r1.38 iked.conf.5
--- iked.conf.5	28 Feb 2015 21:51:57 -0000	1.38
+++ iked.conf.5	15 Apr 2015 15:02:21 -0000
@@ -334,23 +334,21 @@ see the file
 .It Ic local Ar localip Ic peer Ar remote
 The
 .Ic local
-parameter specifies the address or FQDN of the local endpoint.
-Unless the gateway is multi-homed or uses address aliases,
-this option is generally not needed.
-.Pp
-The
+and
 .Ic peer
-parameter specifies the address or FQDN of the remote endpoint.
-For host-to-host connections where
+parameters specify the address or FQDN of the local and remote
+endpoints respectively.
+If neither are specified, their default values are equal to
+.Ar src
+and
 .Ar dst
-is identical to
-.Ar remote ,
-this option is generally not needed as it will be set to
-.Ar dst
-automatically.
-If it is not specified or if the keyword
-.Ar any
-is given, the default peer is used.
+for
+.Ar localip
+and
+.Ar remote
+respectively. When only one is specified, the other
+defaults to
+.Ar any .
 .It Xo
 .Ic ikesa
 .Ic auth Ar algorithm



Re: autoinstall(8) tweaks

2015-04-15 Thread Vincent Gross
On Wed, Apr 15, 2015 at 08:20:15AM +0900, Ryan McBride wrote:
> On Thu, Apr 09, 2015 at 04:27:17AM -0600, Theo de Raadt wrote:
> > > But it seems people are expected to build a custom bsd.rd if they
> > > want something different so I'll bow out of this conversation.
> >
> > No, the situation is that less than 1% of the user community
> > apparently have a secret usage case, but never manage to explain it.
> >
>
> I manage a bunch of OpenBSD proxies that I would like to be able to
> build from scratch using automated tools; everything is in place
> (ansible) except for the base OpenBSD install as I need a separate
> /var/squid partition to prevent cache / log disasters from filling /var;
> similar concerns would apply to many other data / log-heavy daemons.
>
> On other systems where I don't know how the data will grow, I typically
> configure them with something close to the auto layout, but a smaller
> /home, and leave the remaining disk empty. When I get a feel for what
> the data usage is in /var/daemon or /home or /usr/local, I can expand
> /home or create a new partition and migrate the data.

The default allocation is actually easy to rework right after a fresh
install, as /usr/src, /usr/obj and /home are at the end. Ssh as
root, kill /usr/src, /usr/obj and /home, optionally extend /usr/local,
and then repartition as you wish.

As for swap and /tmp, you can move /tmp to the end, at worst you will
lose 4G worth of disk space you can add to swap. And if you need more
than 2x RAM swap, you have bigger problems than partitioning.

> Other reasons to want non-auto partitioning include:
> - simpler dump/restore

Yeah, embrace failure is what all the cool kids do these days. Except
that this kind of non-management just sweeps problems under the rug so
they can mature into proper monsters ready to gnaw at your skull at the
worst possible moment.

> - moving certain parts of hier(7) onto a different device
>   (you can do this as a post-install task if they are empty, but
>   it becomes a pain if it's something that's part of base)
>
> A place where the latter can be quite useful is on a virtualised guest,
> where you can easily make one storage device persistent, and another
> ephemeral across reboots.

Which part of baseXX, compXX, manXX, gameXX, xbaseXX, xfontXX, xservXX
or xshareXX would fall under such a case ?

None.

> Yes, all of this can be done manually, but basically any place I would
> care to work at is moving towards complete automation of system installs
> (for *hack*Cloud*spit*, Continuous Delivery, DR, or just plain old
> laziness). It would be really nice if the OpenBSD installer would handle
> this in a sane fashion.

Do you want me to write an ansible playbook to run a handful of shell
commands over ssh ?

Cheers,

--
Vincent Gross



Re: PATCH: NAT on IPSec

2015-01-26 Thread Vincent Gross
On Thu, Jan 15, 2015 at 04:00:20PM +0100, Vincent Gross wrote:
> Hello folks,
>
> This patch brings nat capabilities into iked, the same way that mpf@ did
> with isakmpd about 6 years ago.
>
> Comments ?

bumpity bump bump.

Any comments on this ?

 

PATCH: NAT on IPSec

2015-01-15 Thread Vincent Gross
Hello folks,

This patch brings nat capabilites into iked, the same way that mpf@ did
with isakmpd about 6 years ago.
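
For readers skimming the diff, the core of the pfkey.c change can be
sketched in isolation. This is a simplified illustration, not the patch
itself: the types and the flow_selectors() helper are hypothetical, and
only the direction-based branch of the patch's switch is modelled.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for iked's types and constants. */
enum flow_dir { DIR_IN, DIR_OUT };

struct addr {
	const char	*text;
};

struct flow {
	struct addr	 src;
	struct addr	 dst;
	struct addr	 prenat;	/* pre-nat source */
	int		 usenat;
	enum flow_dir	 dir;
};

/*
 * Choose the selectors used when installing a flow. With NAT in use,
 * the pre-NAT address replaces the source of outgoing flows and the
 * destination of incoming ones; otherwise src and dst are used as-is.
 */
static void
flow_selectors(const struct flow *f, const struct addr **srcp,
    const struct addr **dstp)
{
	*srcp = &f->src;
	*dstp = &f->dst;
	if (!f->usenat)
		return;
	if (f->dir == DIR_IN)
		*dstp = &f->prenat;
	else
		*srcp = &f->prenat;
}
```

With the local iked.conf of the test setup, src would be 172.23.50.1,
dst 172.23.0.0/23 and prenat 0.0.0.0/0.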

Comments ?

Tested with the following setup, with icmp, udp and tcp:

 Local pf.conf:
table <homev4> { 172.23.0.0/23 }

set skip on lo

match out on enc0 from ! <homev4> to <homev4> nat-to 172.23.50.1

block return
pass
block return in on ! lo0 proto tcp to port 6000:6010

 Local iked.conf:
ikev2 active esp \
from 172.23.50.1 (0.0.0.0/0) to 172.23.0.0/23 peer 79.143.250.153 \
srcid 'spinoza.kilob.yt' dstid 'brouwer.kilob.yt'

 Local ip address:
ppp0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1500
	priority: 0
	groups: ppp egress
	inet 100.97.217.112 --> 10.64.64.64 netmask 0xff00

 Remote pf.conf:
[...]
pass on enc0
[...]

 Remote iked.conf:
ikev2 esp \
from 172.23.0.0/23 to 172.23.50.1 peer any \
srcid 'brouwer.kilob.yt' dstid 'spinoza.kilob.yt'




Index: iked.h
===================================================================
RCS file: /cvs/src/sbin/iked/iked.h,v
retrieving revision 1.82
diff -u -p -r1.82 iked.h
--- iked.h	18 Aug 2014 09:43:02 -0000	1.82
+++ iked.h	15 Jan 2015 13:54:46 -0000
@@ -139,6 +139,8 @@ struct iked_flow {
 	struct iked_addr		 flow_src;
 	struct iked_addr		 flow_dst;
 	u_int				 flow_dir;	/* in/out */
+	struct iked_addr		 flow_prenat;	/* pre-nat source */
+	u_int				 flow_usenat;
 
 	u_int				 flow_loaded;	/* pfkey done */
 
Index: parse.y
===================================================================
RCS file: /cvs/src/sbin/iked/parse.y,v
retrieving revision 1.43
diff -u -p -r1.43 parse.y
--- parse.y	12 Jan 2015 11:24:58 -0000	1.43
+++ parse.y	15 Jan 2015 13:54:47 -0000
@@ -2401,7 +2401,7 @@ create_ike(char *name, int af, u_int8_t 
 {
 	char			 idstr[IKED_ID_SIZE];
 	u_int			 idtype = IKEV2_ID_NONE;
-	struct ipsec_addr_wrap	*ipa, *ipb;
+	struct ipsec_addr_wrap	*ipa, *ipb, *ipn;
 	struct iked_policy	 pol;
 	struct iked_proposal	 prop[2];
 	u_int			 j;
@@ -2622,6 +2622,16 @@ create_ike(char *name, int af, u_int8_t 
 		flows[j].flow_dst.addr_mask = ipb->mask;
 		flows[j].flow_dst.addr_net = ipb->netaddress;
 		flows[j].flow_dst.addr_port = hosts->dport;
+
+		ipn = ipa->srcnat;
+		if (ipn) {
+			memcpy(&flows[j].flow_prenat.addr, &ipn->address,
+			    sizeof(ipn->address));
+			flows[j].flow_prenat.addr_af = ipn->af;
+			flows[j].flow_prenat.addr_mask = ipn->mask;
+			flows[j].flow_prenat.addr_net = ipn->netaddress;
+			flows[j].flow_usenat = 1;
+		}
 
 		flows[j].flow_ipproto = ipproto;
 
Index: pfkey.c
===================================================================
RCS file: /cvs/src/sbin/iked/pfkey.c,v
retrieving revision 1.40
diff -u -p -r1.40 pfkey.c
--- pfkey.c	29 Oct 2014 06:26:39 -0000	1.40
+++ pfkey.c	15 Jan 2015 13:54:47 -0000
@@ -180,6 +180,7 @@ int
 pfkey_flow(int sd, u_int8_t satype, u_int8_t action, struct iked_flow *flow)
 {
 	struct sadb_msg		 smsg;
+	struct iked_addr	*flow_src, *flow_dst;
 	struct sadb_address	 sa_src, sa_dst, sa_local, sa_peer, sa_smask,
 				 sa_dmask;
 	struct sadb_protocol	 sa_flowtype, sa_protocol;
@@ -192,58 +193,76 @@ pfkey_flow(int sd, u_int8_t satype, u_in
 	sport = dport = 0;
 	sa_srcid = sa_dstid = NULL;
 
+	flow_src = &flow->flow_src;
+	flow_dst = &flow->flow_dst;
+
+	if (flow->flow_usenat)
+		switch (flow->flow_type) {
+		case SADB_X_FLOW_TYPE_USE:
+			flow_dst = &flow->flow_prenat;
+			break;
+		case SADB_X_FLOW_TYPE_REQUIRE:
+			flow_src = &flow->flow_prenat;
+			break;
+		case 0:
+			if (flow->flow_dir == IPSP_DIRECTION_IN)
+				flow_dst = &flow->flow_prenat;
+			else
+				flow_src = &flow->flow_prenat;
+		}
+
 	bzero(&ssrc, sizeof(ssrc));
 	bzero(&smask, sizeof(smask));
-	memcpy(&ssrc, &flow->flow_src.addr, sizeof(ssrc));
-	memcpy(&smask, &flow->flow_src.addr, sizeof(smask));
-	if ((sport = flow->flow_src.addr_port) != 0)
+	memcpy(&ssrc, &flow_src->addr, sizeof(ssrc));
+	memcpy(&smask, &flow_src->addr, sizeof(smask));
+	if ((sport = flow_src->addr_port) != 0)
 		dport = 0xffff;
 	socket_af((struct sockaddr *)&ssrc, sport);
 	socket_af((struct sockaddr *)&smask, dport);
 
-	switch 

iked control process crash at startup

2014-11-25 Thread Vincent Gross
Hi tech@,

I've been using iked for some weeks to tunnel my laptop to home over 3G.
On Sunday I upgraded my laptop to the latest snapshot; the previous
upgrade was about 2 or 3 weeks ago. Since then iked crashes
intermittently at startup: sometimes it runs just fine and completes the
handshake, other times it crashes before even sending the first packet.

I ran ktrace -di /sbin/iked and kdump'd the resulting file. Of the 5
processes, 4 finished by calling exit(0) and one was terminated by a
SIGSEGV. As it is also the only one that touches /var/run/iked.sock,
it is the control process. I repeated the above ktrace 4 times and got
consistent results: SIGSEGV'd control process.

I'll keep the hunt going, but I am not sure how long this will take nor
how much time I'll have to spare, so here is the control process kdump.

Cheers,

--
Vincent


 17866 iked RET   fork 0
 17866 iked CALL  getpid()
 17866 iked RET   getpid 17866/0x45ca
 17866 iked CALL  setpgid(0,0x45ca)
 17866 iked RET   setpgid 0
 17866 iked CALL  socket(PF_LOCAL,0x1<SOCK_STREAM>,0)
 17866 iked RET   socket 15/0xf
 17866 iked CALL  unlink(0x631ceb)
 17866 iked NAMI  /var/run/iked.sock
 17866 iked RET   unlink 0
 17866 iked CALL  umask(0117<S_IXUSR|S_IXGRP|S_IROTH|S_IWOTH|S_IXOTH>)
 17866 iked RET   umask 18/0x12
 17866 iked CALL  bind(0xf,0x7f7c8660,0x6a)
 17866 iked STRU  struct sockaddr { AF_LOCAL, /var/run/iked.sock }
 17866 iked NAMI  /var/run/iked.sock
 17866 iked RET   bind 0
 17866 iked CALL  umask(022<S_IWGRP|S_IWOTH>)
 17866 iked RET   umask 79/0x4f
 17866 iked CALL  chmod(0x631ceb,0660<S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP>)
 17866 iked NAMI  /var/run/iked.sock
 17866 iked RET   chmod 0
 17866 iked CALL  fcntl(0xf,F_GETFL)
 17866 iked RET   fcntl 2
 17866 iked CALL  fcntl(0xf,F_SETFL,0x6<O_RDWR|O_NONBLOCK>)
 17866 iked RET   fcntl 0
 17866 iked CALL  chroot(0x631d39)
 17866 iked NAMI  /etc/iked/
 17866 iked RET   chroot 0
 17866 iked CALL  chdir(0x6669f2)
 17866 iked NAMI  /
 17866 iked RET   chdir 0
 17866 iked CALL  __sysctl(2.3,0x7f7c8640,0x7f7c8630,0,0)
 17866 iked RET   __sysctl 0
 17866 iked CALL  setgroups(0x1,0x9bddb4)
 17866 iked RET   setgroups 0
 17866 iked CALL  setresgid(0x65,0x65,0x65)
 17866 iked RET   setresgid 0
 17866 iked CALL  setresuid(0x65,0x65,0x65)
 17866 iked RET   setresuid 0
 17866 iked CALL  clock_gettime(CLOCK_MONOTONIC,0x7f7c86c0)
 17866 iked STRU  struct timespec { 150798.566033906 }
 17866 iked RET   clock_gettime 0
 17866 iked CALL  clock_gettime(CLOCK_MONOTONIC,0x7f7c8690)
 17866 iked STRU  struct timespec { 150798.566077766 }
 17866 iked RET   clock_gettime 0
 17866 iked CALL  issetugid()
 17866 iked RET   issetugid 0
 17866 iked CALL  kqueue()
 17866 iked RET   kqueue 16/0x10
 17866 iked CALL  getpid()
 17866 iked RET   getpid 17866/0x45ca
 17866 iked CALL  getentropy(0x7f7c8550,0x28)
 17866 iked RET   getentropy 0
 17866 iked CALL  issetugid()
 17866 iked RET   issetugid 0
 17866 iked CALL  kevent(0x10,0x7f7c8640,0x1,0,0,0x7f7c8660)
 17866 iked STRU  struct timespec { 0 }
 17866 iked RET   kevent 0
 17866 iked CALL  sigaction(SIGINT,0x7f7c85f0,0x2817fb150)
 17866 iked STRU  struct sigaction { handler=0x42f6f0, mask=~0, flags=0x2<SA_RESTART> }
 17866 iked STRU  struct sigaction { handler=SIG_DFL, mask=0, flags=0 }
 17866 iked RET   sigaction 0
 17866 iked CALL  kevent(0x10,0x7f7c8640,0x1,0,0,0x7f7c8660)
 17866 iked STRU  struct timespec { 0 }
 17866 iked RET   kevent 0
 17866 iked CALL  sigaction(SIGTERM,0x7f7c85f0,0x2817fcc30)
 17866 iked STRU  struct sigaction { handler=0x42f6f0, mask=~0, flags=0x2<SA_RESTART> }
 17866 iked STRU  struct sigaction { handler=SIG_DFL, mask=0, flags=0 }
 17866 iked RET   sigaction 0
 17866 iked CALL  kevent(0x10,0x7f7c8640,0x1,0,0,0x7f7c8660)
 17866 iked STRU  struct timespec { 0 }
 17866 iked RET   kevent 0
 17866 iked CALL  sigaction(SIGCHLD,0x7f7c85f0,0x2817fa980)
 17866 iked STRU  struct sigaction { handler=0x42f6f0, mask=~0, flags=0x2<SA_RESTART> }
 17866 iked STRU  struct sigaction { handler=SIG_DFL, mask=0, flags=0 }
 17866 iked RET   sigaction 0
 17866 iked CALL  kevent(0x10,0x7f7c8640,0x1,0,0,0x7f7c8660)
 17866 iked STRU  struct timespec { 0 }
 17866 iked RET   kevent 0
 17866 iked CALL  sigaction(SIGHUP,0x7f7c85f0,0x2817fc530)
 17866 iked STRU  struct sigaction { handler=0x42f6f0, mask=~0, flags=0x2<SA_RESTART> }
 17866 iked STRU  struct sigaction { handler=SIG_DFL, mask=0, flags=0 }
 17866 iked RET   sigaction 0
 17866 iked CALL  kevent(0x10,0x7f7c8640,0x1,0,0,0x7f7c8660)
 17866 iked STRU  struct timespec { 0 }
 17866 iked RET   kevent 0
 17866 iked CALL  

add DSA and ECDSA to relayd ca engine

2014-11-08 Thread Vincent Gross
Hi,

Two diffs below. The first moves ecdsa_method declaration from
ecs_locl.h to ecdsa.h, as ecs_locl.h is not installed in
/usr/include/openssl/.
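
The first diff relies on declaring the typedef before the structure
body, so that the method table can mention the signature type ahead of
its definition. A minimal standalone sketch of that pattern follows; the
SIG, sig_method and toy_sign names are made up for illustration and are
not part of libssl.

```c
#include <stddef.h>

/*
 * Forward declaration: the name can be used in prototypes and pointer
 * types before the structure body has been seen.
 */
typedef struct sig_st SIG;

/* A method table can now refer to SIG * ahead of its definition. */
struct sig_method {
	const char	*name;
	SIG		*(*do_sign)(const unsigned char *dgst, int dgst_len);
};

/* The full definition comes later, once the table has been declared. */
struct sig_st {
	int	r;
	int	s;
};

/* Toy signer used only to exercise the table; it signs nothing. */
static SIG *
toy_sign(const unsigned char *dgst, int dgst_len)
{
	static SIG sig;

	sig.r = dgst_len;
	sig.s = dgst_len > 0 ? dgst[0] : 0;
	return &sig;
}

static const struct sig_method toy_method = { "toy", toy_sign };
```

A caller goes through the table, e.g. toy_method.do_sign(digest, len),
which is the shape an engine uses when it overrides a method.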

The second one adds DSA and ECDSA capabilities to relayd ca engine, and
also checks that when using a DSA certificate, we have enabled EDH in
the relevant proto section. Requirements have been documented in
relayd.conf(5).

It works, but it surely needs some refactoring and style tweaks.

Comments ?

[Diff #1]

Index: ecdsa.h
===================================================================
RCS file: /cvs/src/lib/libssl/src/crypto/ecdsa/ecdsa.h,v
retrieving revision 1.2
diff -u -p -u -r1.2 ecdsa.h
--- ecdsa.h	12 Jun 2014 15:49:29 -0000	1.2
+++ ecdsa.h	8 Nov 2014 18:30:58 -0000
@@ -75,11 +75,36 @@
 extern "C" {
 #endif
 
-typedef struct ECDSA_SIG_st
-   {
+typedef struct ECDSA_SIG_st ECDSA_SIG;
+
+struct ecdsa_method {
+   const char *name;
+   ECDSA_SIG *(*ecdsa_do_sign)(const unsigned char *dgst, int dgst_len, 
+   const BIGNUM *inv, const BIGNUM *rp, EC_KEY *eckey);
+   int (*ecdsa_sign_setup)(EC_KEY *eckey, BN_CTX *ctx, BIGNUM **kinv, 
+   BIGNUM **r);
+   int (*ecdsa_do_verify)(const unsigned char *dgst, int dgst_len, 
+   const ECDSA_SIG *sig, EC_KEY *eckey);
+#if 0
+   int (*init)(EC_KEY *eckey);
+   int (*finish)(EC_KEY *eckey);
+#endif
+   int flags;
+   char *app_data;
+};
+
+/* If this flag is set the ECDSA method is FIPS compliant and can be used
+ * in FIPS mode. This is set in the validated module method. If an
+ * application sets this flag in its own methods it is its responsibility
+ * to ensure the result is compliant.
+ */
+
+#define ECDSA_FLAG_FIPS_METHOD  0x1
+
+struct ECDSA_SIG_st {
BIGNUM *r;
BIGNUM *s;
-   } ECDSA_SIG;
+};
 
 /** Allocates and initialize a ECDSA_SIG structure
  *  \return pointer to a ECDSA_SIG structure or NULL if an error occurred
Index: ecs_locl.h
===================================================================
RCS file: /cvs/src/lib/libssl/src/crypto/ecdsa/ecs_locl.h,v
retrieving revision 1.2
diff -u -p -u -r1.2 ecs_locl.h
--- ecs_locl.h	12 Jun 2014 15:49:29 -0000	1.2
+++ ecs_locl.h	8 Nov 2014 18:30:58 -0000
@@ -65,31 +65,6 @@
 extern "C" {
 #endif
 
-struct ecdsa_method 
-   {
-   const char *name;
-   ECDSA_SIG *(*ecdsa_do_sign)(const unsigned char *dgst, int dgst_len, 
-   const BIGNUM *inv, const BIGNUM *rp, EC_KEY *eckey);
-   int (*ecdsa_sign_setup)(EC_KEY *eckey, BN_CTX *ctx, BIGNUM **kinv, 
-   BIGNUM **r);
-   int (*ecdsa_do_verify)(const unsigned char *dgst, int dgst_len, 
-   const ECDSA_SIG *sig, EC_KEY *eckey);
-#if 0
-   int (*init)(EC_KEY *eckey);
-   int (*finish)(EC_KEY *eckey);
-#endif
-   int flags;
-   char *app_data;
-   };
-
-/* If this flag is set the ECDSA method is FIPS compliant and can be used
- * in FIPS mode. This is set in the validated module method. If an
- * application sets this flag in its own methods it is its responsibility
- * to ensure the result is compliant.
- */
-
-#define ECDSA_FLAG_FIPS_METHOD 0x1
-
 typedef struct ecdsa_data_st {
/* EC_KEY_METH_DATA part */
int (*init)(EC_KEY *);


[Diff #2]

Index: ca.c
===================================================================
RCS file: /cvs/src/usr.sbin/relayd/ca.c,v
retrieving revision 1.9
diff -u -p -u -r1.9 ca.c
--- ca.c	2 Oct 2014 19:16:31 -0000	1.9
+++ ca.c	8 Nov 2014 18:26:15 -0000
@@ -32,9 +32,13 @@
 #include <stdlib.h>
 #include <errno.h>
 
+#include <openssl/ossl_typ.h>
+#include <openssl/bn.h>
 #include <openssl/pem.h>
 #include <openssl/evp.h>
 #include <openssl/rsa.h>
+#include <openssl/dsa.h>
+#include <openssl/ecdsa.h>
 #include <openssl/engine.h>
 
 #include "relayd.h"
@@ -60,6 +64,30 @@ int   rsae_verify(int dtype, const u_char
 	    u_int, const RSA *);
 int	 rsae_keygen(RSA *, int, BIGNUM *, BN_GENCB *);
 
+
+DSA_SIG	*dsae_do_sign(const unsigned char *, int, DSA *);
+int	 dsae_sign_setup(DSA *, BN_CTX *, BIGNUM **, BIGNUM **);
+int	 dsae_do_verify(const unsigned char *, int, DSA_SIG *,
+	    DSA *);
+int	 dsae_mod_exp(DSA *, BIGNUM *, BIGNUM *, BIGNUM *,
+	    BIGNUM *, BIGNUM *, BIGNUM *, BN_CTX *,
+	    BN_MONT_CTX *);
+int	 dsae_bn_mod_exp(DSA *, BIGNUM *, BIGNUM *,
+	    const BIGNUM *, const BIGNUM *, BN_CTX *,
+	    BN_MONT_CTX *);
+int	 dsae_init(DSA *);
+int	 dsae_finish(DSA *);
+int	 dsae_paramgen(DSA *, int, const unsigned char *, int,
+	    int *, unsigned long *, BN_GENCB *);
+int	 dsae_keygen(DSA *);
+
+ECDSA_SIG	*ecdsae_do_sign(const unsigned char *, int ,
+	    const BIGNUM *, const BIGNUM *, EC_KEY *);
+int	 ecdsae_sign_setup(EC_KEY *, BN_CTX *, BIGNUM **,
+

Re: Request for Funding our Electricity

2014-01-15 Thread Vincent Gross
On Wed, Jan 15, 2014 at 06:25:53PM +0200, MJ wrote:
 
> I have long held the opinion that Theo is probably the best coder on this
> planet. That's not any sort of ass-kissing, either, it's my objective,
> unbiased opinion. And I know Henning personally, as in "live and worked
> together with him" - one hell of an expert.
>
> However, the dilemma that the project has found itself in now very clearly
> demonstrates that Theo is not a businessman and that there isn't any other
> businessman at the helm, either. Imagining that people will suddenly start to
> pay for something that they have constantly been getting for free is absurd -
> their belief is that somebody else will surely step up first or somebody will
> fork in the name of fame. No business on this planet is going to allocate
> budget to paying OpenBSD's electricity bills, let alone anything else,
> without 1) a detailed itemisation of the electrical bills, 2) a detailed
> justification of said line items, and 3) a satisfaction of their own business
> interest. It's just not sexy for a philanthropist to support a relatively
> unheard of operating system when cancer is still left uncured.

Define sexy. Some people will say it's having flash running full speed
on their web browser while streaming 3 youtube videos. For me it's being
able to trust my operating system to behave in a way that keeps me in
the loop and able to fix it.

As for the legalese, some people said "You'll never get anywhere without
a protocol number for CARP!", yet some ciscos support CARP nowadays.

 
> It's not good to be removing coders from their tasks; the project needs a
> businessman or two. One who will handle the corporate feature requests and
> charge dearly for them. Things like routing technology and high-speed packet
> forwarding - things that can replace the exorbitant costs of maintaining
> cisco routers. This is the key. With the FBSD 10GB wire speed packet
> forwarding incorporated, OpenBSD would be ready to challenge Cisco in a very
> serious way. Completely free as always, but with paid support for this edge
> cases that make life what it is.
>

I don't know what your background with corporate IT is, but my
experience is that most of the time what the suits are looking for is
the assurance that they will have resources to fix arising issues, or in
layman's terms, a tech support to yell at. I do not see OpenBSD providing
such support. However, there are quite a few companies that provide
such a service for their OpenBSD-based appliances.

Does that mean the OpenBSD roadmap should be based on what will sell with
these companies? The answer (which is no) has already been given many
times on misc@, and I will let Theo add another layer of p[ao]int if he
deems it necessary.

Lastly, you suggest having a businessman in the project. That is,
someone who gets a commit bit by doing something else than coding. It's
not even about what this says to the world or the example it sets. It is
just plain rude towards the developers. I am not downplaying the
skills of businessmen; but you simply can't just say that contributing
code the OpenBSD way is the same as selling the product, however tough
that may be.


This is not a race; this is about doing things right.

regards,

--
Vincent