Re: IPv6 DoS sysctl man page additions
On 19 April 2014 13:20, Loganaden Velvindron lo...@elandsys.com wrote: On Sat, Apr 19, 2014 at 04:04:30AM -0700, Loganaden Velvindron wrote: Hi All, I'm taking a short break from playing with pf statistics. There were 4 sysctls added from KAME, but the man pages weren't updated accordingly. (Adapted from the NetBSD man page changes) Feedback welcomed. Removed trailing spaces and use set to instead of set it to based on feedback from sthen@ OK mikeb
Re: iked + isakmpd on the same machine
On 22 April 2014 17:13, Philipp e1c1bac6253dc54a1e89ddc046585...@posteo.net wrote: It happened! A remote peer *requires* IKEv2 - and I've to do that on a machine running isakmpd with somewhat 25+ IKEv1 peers. First hurdle: I cannot bind iked to a certain (carp) IP-address. Mad workaround: start isakmpd (with Listen-on) first. Second hurdle: iked loads its SAs and eventually does this by creating a new empty SADB, effectivly killing all the SAs isakmpd loaded into the kernel before? Is there a diff sleeping out there for tackling the first hurdle? For the second one, I've to refrain from testing this in live further more. First to reconstruct my Frankenstein-Lab. Cheers for any thoughts beside mad, bro? :-) more like it's not supported and is not supposed to work. it's like running nginx and apache at the same time but worse since there are kernel tentacles involved as well (as you might have figured out already) that will likely prevent you from doing that on the same box but different ip addresses. cheers, mike
Re: iked + isakmpd on the same machine
On 24 April 2014 12:12, Philipp e1c1bac6253dc54a1e89ddc046585...@posteo.net wrote: Am 22.04.2014 17:28 schrieb Mike Belopuhov: more like it's not supported and is not supposed to work. not supposed as in 'not wanted'? not supposed. it's like running nginx and apache at the same time but Quite frankly: I'm doing that in some locations ;-) not on the same port (80) though. ikev2 and isakmp both use same udp ports (500 and 4500). worse since there are kernel tentacles involved as well (as you might have figured out already) that will likely That's somehow the main problem, the two daemons are not trying to share the pfkey2 ioctls outcome. i don't see it like that. So, I can wait til iked supports ikev1, too. there are no current plans to implement ikev1 support that i'm aware of. Using a different machine will be quite painful at the moment. Rock+hard place. prevent you from doing that on the same box but different ip addresses. Nevertheless I'd say that a Listen-on style directive for iked would a useful thing[tm], e.g. to default the srcid to that. perhaps. currently i believe srcid will default to local if specified. Cheers.
Re: Kill in_localaddr()
On 24 April 2014 16:41, Martin Pieuchot mpieuc...@nolizard.org wrote: in_localaddr() is used only once in our tree and only if the sysctl net.inet.ip.mtudisc is set to 0. It is used to optimize the size of the MSS if the forward address correspond to a host on one of our subnets. Since it's an optimization for a special case that's not enabled by default, I'd like to kill it to remove one usage of the global list of IPv4 addresses. While here get rid of the #ifdef RTV_MTU, it is here. ok? OK
Re: iked + isakmpd on the same machine
On 24 April 2014 20:25, Chris Cappuccio ch...@nmedia.net wrote: Mike Belopuhov [m...@belopuhov.com] wrote: more like it's not supported and is not supposed to work. it's like running nginx and apache at the same time hey, nginx and httpd run concurrently quite fine on different IP addresses, same box :) i meant using the same port numbers of course.
Re: iked + isakmpd on the same machine
On 24 April 2014 22:25, Alexander Hall alexan...@beard.se wrote: On 04/24/14 21:53, Stuart Henderson wrote: On 2014/04/24 20:30, Mike Belopuhov wrote: On 24 April 2014 20:25, Chris Cappuccio ch...@nmedia.net wrote: Mike Belopuhov [m...@belopuhov.com] wrote: more like it's not supported and is not supposed to work. it's like running nginx and apache at the same time hey, nginx and httpd run concurrently quite fine on different IP addresses, same box :) i meant using the same port numbers of course. they can do that fine too! :) just have one hand-off the relevant requests to the other. If they bind to separate IP addresses that is obviously not a problem, even for the same port numbers. yes. that's precisely what i meant: you can't bind to the same ipaddr:port pair twice. why do i have to chew it and spit it out for you. it was clear what i meant from the start.
snmpd: add backend from bgpd to support multiple routing tables
This adds ktable code from bgpd to fetch, store and perform lookups in multiple routing tables. Currently it doesn't do anything useful but it's a prerequisite for any future work in this direction. OK to get this in? diff --git usr.sbin/snmpd/kroute.c usr.sbin/snmpd/kroute.c index e157b25..e19f924 100644 --- usr.sbin/snmpd/kroute.c +++ usr.sbin/snmpd/kroute.c @@ -45,10 +45,13 @@ #include snmpd.h extern struct snmpd*env; +struct ktable **krt; +u_intkrt_size; + struct { struct event ks_ev; u_long ks_iflastchange; u_long ks_nroutes;/* 4 billions enough? */ int ks_fd; @@ -77,24 +80,32 @@ struct kif_node { intkroute_compare(struct kroute_node *, struct kroute_node *); intkroute6_compare(struct kroute6_node *, struct kroute6_node *); intkif_compare(struct kif_node *, struct kif_node *); -struct kroute_node *kroute_find(in_addr_t, u_int8_t, u_int8_t); +voidktable_init(void); +int ktable_new(u_int, u_int); +voidktable_free(u_int); +int ktable_exists(u_int, u_int *); +struct ktable *ktable_get(u_int); +int ktable_update(u_int); + +struct kroute_node *kroute_find(struct ktable *, in_addr_t, u_int8_t, + u_int8_t); struct kroute_node *kroute_matchgw(struct kroute_node *, struct sockaddr_in *); -int kroute_insert(struct kroute_node *); -int kroute_remove(struct kroute_node *); -voidkroute_clear(void); +int kroute_insert(struct ktable *, struct kroute_node *); +int kroute_remove(struct ktable *, struct kroute_node *); +voidkroute_clear(struct ktable *); -struct kroute6_node*kroute6_find(const struct in6_addr *, u_int8_t, -u_int8_t); +struct kroute6_node*kroute6_find(struct ktable *, const struct in6_addr *, + u_int8_t, u_int8_t); struct kroute6_node*kroute6_matchgw(struct kroute6_node *, struct sockaddr_in6 *); -int kroute6_insert(struct kroute6_node *); -int kroute6_remove(struct kroute6_node *); -voidkroute6_clear(void); +int kroute6_insert(struct ktable *, struct kroute6_node *); +int kroute6_remove(struct ktable *, struct kroute6_node *); +voidkroute6_clear(struct ktable *); struct kif_arp *karp_find(struct sockaddr *, u_short); int karp_insert(struct kif_node *, struct kif_arp *); int karp_remove(struct kif_node *, struct kif_arp *); @@ -121,23 +132,21 @@ void if_newaddr(u_short, struct sockaddr *, struct sockaddr *, struct sockaddr *); void if_deladdr(u_short, struct sockaddr *, struct sockaddr *, struct sockaddr *); void if_announce(void *); -intfetchtable(void); +intfetchtable(struct ktable *); intfetchifs(u_short); -intfetcharp(void); +intfetcharp(struct ktable *); void dispatch_rtmsg(int, short, void *); intrtmsg_process(char *, int); -intdispatch_rtmsg_addr(struct rt_msghdr *, +intdispatch_rtmsg_addr(struct ktable *, struct rt_msghdr *, struct sockaddr *[RTAX_MAX]); -RB_HEAD(kroute_tree, kroute_node) krt; RB_PROTOTYPE(kroute_tree, kroute_node, entry, kroute_compare) RB_GENERATE(kroute_tree, kroute_node, entry, kroute_compare) -RB_HEAD(kroute6_tree, kroute6_node)krt6; RB_PROTOTYPE(kroute6_tree, kroute6_node, entry, kroute6_compare) RB_GENERATE(kroute6_tree, kroute6_node, entry, kroute6_compare) RB_HEAD(kif_tree, kif_node)kit; RB_PROTOTYPE(kif_tree, kif_node, entry, kif_compare) @@ -149,10 +158,11 @@ RB_GENERATE(ka_tree, kif_addr, node, ka_compare) void kr_init(void) { int opt = 0, rcvbuf, default_rcvbuf; + unsigned inttid = RTABLE_ANY; socklen_t optlen; if ((kr_state.ks_ifd = socket(AF_INET, SOCK_DGRAM, 0)) == -1) fatal(kr_init: ioctl socket); @@ -179,31 +189,166 @@ kr_init(void) setsockopt(kr_state.ks_fd, SOL_SOCKET, SO_RCVBUF, rcvbuf, sizeof(rcvbuf)) == -1 errno == ENOBUFS; rcvbuf /= 2) ; /* nothing */ - RB_INIT(krt); - RB_INIT(krt6); + if (setsockopt(kr_state.ks_fd, AF_ROUTE, ROUTE_TABLEFILTER, tid, + sizeof(tid)) == -1) + log_warn(kr_init: setsockopt AF_ROUTE ROUTE_TABLEFILTER); + RB_INIT(kit); RB_INIT(kat); if
Re: data modified on freelist, tmpfs-related?
On 30 April 2014 15:55, Mark Kettenis mark.kette...@xs4all.nl wrote: Date: Wed, 30 Apr 2014 15:38:39 +0200 (CEST) From: Mark Kettenis mark.kette...@xs4all.nl Date: Wed, 30 Apr 2014 13:39:20 +0100 From: Stuart Henderson st...@openbsd.org Seen when running e2fsprogs regression tests with /tmp on tmpfs I'm not surprised; tmpfs contains some serious bugs. I recommend not using it until those are fixed. Which means, I'd like somebody else besides espie@ to comment on my uvm_aobj.c list manipulation hack diff. Diff made sense to me when I looked at it, but I would rather hide direct pointer access :/ Perhaps LIST_SWAP does a tiny bit more, but it's cleaner and perhaps can be useful in the future.
Re: data modified on freelist, tmpfs-related?
On 30 April 2014 16:55, Mark Kettenis mark.kette...@xs4all.nl wrote: From: Mike Belopuhov m...@belopuhov.com Date: Wed, 30 Apr 2014 16:00:45 +0200 On 30 April 2014 15:55, Mark Kettenis mark.kette...@xs4all.nl wrote: Date: Wed, 30 Apr 2014 15:38:39 +0200 (CEST) From: Mark Kettenis mark.kette...@xs4all.nl Date: Wed, 30 Apr 2014 13:39:20 +0100 From: Stuart Henderson st...@openbsd.org Seen when running e2fsprogs regression tests with /tmp on tmpfs I'm not surprised; tmpfs contains some serious bugs. I recommend not using it until those are fixed. Which means, I'd like somebody else besides espie@ to comment on my uvm_aobj.c list manipulation hack diff. Diff made sense to me when I looked at it, but I would rather hide direct pointer access :/ Perhaps LIST_SWAP does a tiny bit more, but it's cleaner and perhaps can be useful in the future. I'm not comfortable with introducing more sys/queue.h APIs. So perhaps we should just punt on the optimization and remove/insert all list items. Removing the trap comments that pedro set up... I agree.
Re: Annoying emacs variable in if_spppsubr.c
On 2 May 2014 12:09, Jérémie Courrèges-Anglas j...@wxcvbn.org wrote: This one is bugging me each time I start my Emacs session (because Emacs now asks confirmation for most variables). This one would be useful only with hilit19.el (obsolete) from editors/emacs21... if the size of the file was actually under 12 bytes (it's not anymore). ok to kill it? yes, please.
Re: m-m_pkthdr.rcvif and ip6_input()
On 12 May 2014 15:12, Martin Pieuchot mpieuc...@nolizard.org wrote: Like the previous diffs, it reduces the number of m-m_pkthdr.rcvif occurrences, this time in ip6_input(). Should be no functional change. Ok? OK
Re: Remove p2p loopback hack in nd6_rtrequest()
On Mon, May 12, 2014 at 12:48 +0200, Martin Pieuchot wrote: On 07/05/14(Wed) 12:46, Martin Pieuchot wrote: Diff below stops abusing nd6_rtrequest() for loopback interfaces, which means we can remove the special hack below and reduce the differences with arp_rtrequest(). This diff introduces two changes in the inet6 routing table, but they should not matter. The first one is that the gateway of the link-local entry for loopback interfaces is no longer a buggy link-layer address: -fe80::1%lo0link#6 UHL 00 - 4 lo0 +fe80::1%lo0fe80::1%lo0UHL 00 - 4 lo0 The second one is that every route to network associated to the loopback interface will now have the mtu of this interface: -ff02::/16 ::1UGRS 00 - 8 lo0 +ff02::/16 ::1UGRS 00 33192 8 lo0 Both changes are consistent with the IPv4 behavior, ok? Anyone? OK Index: netinet6/in6.c === RCS file: /home/ncvs/src/sys/netinet6/in6.c,v retrieving revision 1.136 diff -u -p -r1.136 in6.c --- netinet6/in6.c 5 May 2014 11:44:33 - 1.136 +++ netinet6/in6.c 7 May 2014 10:29:24 - @@ -1399,7 +1399,7 @@ in6_ifinit(struct ifnet *ifp, struct in6 /* Add ownaddr as loopback rtentry, if necessary (ex. on p2p link). */ if (newhost) { /* set the rtrequest function to create llinfo */ - if ((ifp-if_flags IFF_POINTOPOINT) == 0) + if ((ifp-if_flags (IFF_LOOPBACK | IFF_POINTOPOINT)) == 0) ia6-ia_ifa.ifa_rtrequest = nd6_rtrequest; rt_ifa_addloop((ia6-ia_ifa)); Index: netinet6/nd6.c === RCS file: /home/ncvs/src/sys/netinet6/nd6.c,v retrieving revision 1.116 diff -u -p -r1.116 nd6.c --- netinet6/nd6.c 7 May 2014 08:14:59 - 1.116 +++ netinet6/nd6.c 7 May 2014 10:29:24 - @@ -1059,20 +1059,14 @@ nd6_rtrequest(int req, struct rtentry *r #endif /* FALLTHROUGH */ case RTM_RESOLVE: - if ((ifp-if_flags (IFF_POINTOPOINT | IFF_LOOPBACK)) == 0) { - /* -* Address resolution isn't necessary for a point to -* point link, so we can skip this test for a p2p link. -*/ - if (gate-sa_family != AF_LINK || - gate-sa_len sizeof(null_sdl)) { - log(LOG_DEBUG, %s: bad gateway value: %s\n, - __func__, ifp-if_xname); - break; - } - SDL(gate)-sdl_type = ifp-if_type; - SDL(gate)-sdl_index = ifp-if_index; + if (gate-sa_family != AF_LINK || + gate-sa_len sizeof(null_sdl)) { + log(LOG_DEBUG, %s: bad gateway value: %s\n, + __func__, ifp-if_xname); + break; } + SDL(gate)-sdl_type = ifp-if_type; + SDL(gate)-sdl_index = ifp-if_index; if (ln != NULL) break; /* This happens on a route change */ /*
Re: snmpd: add backend from bgpd to support multiple routing tables
On Mon, Apr 28, 2014 at 14:20 +0200, Mike Belopuhov wrote: This adds ktable code from bgpd to fetch, store and perform lookups in multiple routing tables. Currently it doesn't do anything useful but it's a prerequisite for any future work in this direction. OK to get this in? Any objections? diff --git usr.sbin/snmpd/kroute.c usr.sbin/snmpd/kroute.c index e157b25..e19f924 100644 --- usr.sbin/snmpd/kroute.c +++ usr.sbin/snmpd/kroute.c @@ -45,10 +45,13 @@ #include snmpd.h extern struct snmpd *env; +struct ktable**krt; +u_int krt_size; + struct { struct event ks_ev; u_long ks_iflastchange; u_long ks_nroutes;/* 4 billions enough? */ int ks_fd; @@ -77,24 +80,32 @@ struct kif_node { int kroute_compare(struct kroute_node *, struct kroute_node *); int kroute6_compare(struct kroute6_node *, struct kroute6_node *); int kif_compare(struct kif_node *, struct kif_node *); -struct kroute_node *kroute_find(in_addr_t, u_int8_t, u_int8_t); +void ktable_init(void); +int ktable_new(u_int, u_int); +void ktable_free(u_int); +int ktable_exists(u_int, u_int *); +struct ktable*ktable_get(u_int); +int ktable_update(u_int); + +struct kroute_node *kroute_find(struct ktable *, in_addr_t, u_int8_t, + u_int8_t); struct kroute_node *kroute_matchgw(struct kroute_node *, struct sockaddr_in *); -int kroute_insert(struct kroute_node *); -int kroute_remove(struct kroute_node *); -void kroute_clear(void); +int kroute_insert(struct ktable *, struct kroute_node *); +int kroute_remove(struct ktable *, struct kroute_node *); +void kroute_clear(struct ktable *); -struct kroute6_node *kroute6_find(const struct in6_addr *, u_int8_t, - u_int8_t); +struct kroute6_node *kroute6_find(struct ktable *, const struct in6_addr *, + u_int8_t, u_int8_t); struct kroute6_node *kroute6_matchgw(struct kroute6_node *, struct sockaddr_in6 *); -int kroute6_insert(struct kroute6_node *); -int kroute6_remove(struct kroute6_node *); -void kroute6_clear(void); +int kroute6_insert(struct ktable *, struct kroute6_node *); +int kroute6_remove(struct ktable *, struct kroute6_node *); +void kroute6_clear(struct ktable *); struct kif_arp *karp_find(struct sockaddr *, u_short); int karp_insert(struct kif_node *, struct kif_arp *); int karp_remove(struct kif_node *, struct kif_arp *); @@ -121,23 +132,21 @@ voidif_newaddr(u_short, struct sockaddr *, struct sockaddr *, struct sockaddr *); void if_deladdr(u_short, struct sockaddr *, struct sockaddr *, struct sockaddr *); void if_announce(void *); -int fetchtable(void); +int fetchtable(struct ktable *); int fetchifs(u_short); -int fetcharp(void); +int fetcharp(struct ktable *); void dispatch_rtmsg(int, short, void *); int rtmsg_process(char *, int); -int dispatch_rtmsg_addr(struct rt_msghdr *, +int dispatch_rtmsg_addr(struct ktable *, struct rt_msghdr *, struct sockaddr *[RTAX_MAX]); -RB_HEAD(kroute_tree, kroute_node)krt; RB_PROTOTYPE(kroute_tree, kroute_node, entry, kroute_compare) RB_GENERATE(kroute_tree, kroute_node, entry, kroute_compare) -RB_HEAD(kroute6_tree, kroute6_node) krt6; RB_PROTOTYPE(kroute6_tree, kroute6_node, entry, kroute6_compare) RB_GENERATE(kroute6_tree, kroute6_node, entry, kroute6_compare) RB_HEAD(kif_tree, kif_node) kit; RB_PROTOTYPE(kif_tree, kif_node, entry, kif_compare) @@ -149,10 +158,11 @@ RB_GENERATE(ka_tree, kif_addr, node, ka_compare) void kr_init(void) { int opt = 0, rcvbuf, default_rcvbuf; + unsigned inttid = RTABLE_ANY; socklen_t optlen; if ((kr_state.ks_ifd = socket(AF_INET, SOCK_DGRAM, 0)) == -1) fatal(kr_init: ioctl socket); @@ -179,31 +189,166 @@ kr_init(void) setsockopt(kr_state.ks_fd, SOL_SOCKET, SO_RCVBUF, rcvbuf, sizeof(rcvbuf)) == -1 errno == ENOBUFS; rcvbuf /= 2) ; /* nothing */ - RB_INIT(krt); - RB_INIT(krt6); + if (setsockopt(kr_state.ks_fd, AF_ROUTE, ROUTE_TABLEFILTER, tid, + sizeof(tid)) == -1) + log_warn(kr_init: setsockopt AF_ROUTE
Re: MI MTU size for lo(4)
On 13 May 2014 15:45, Claudio Jeker cje...@diehard.n-r-g.com wrote: With KAME the MTU size of the loopback interface became strange and is actually dependend on the architecture. I see no point in all this just go back to the way it was long long long ago and just use 32k as the MTU. AFAIK all of this was only done to test large IPv6 packets but why MHLEN and MLEN were added is still beyond my imagination. In fact this comes from this commit in NetBSD originally: http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/net/if_loop.c#rev1.20 Comment says Add MHLEN + MLEN extra space to LOMTU for IP and transport headers. This doesn't make me understand why this is needed however. So unless this clues you in, I'm OK with setting it to 32k everywhere.
Re: MI MTU size for lo(4)
On 13 May 2014 16:05, Mike Belopuhov m...@belopuhov.com wrote: On 13 May 2014 15:45, Claudio Jeker cje...@diehard.n-r-g.com wrote: With KAME the MTU size of the loopback interface became strange and is actually dependend on the architecture. I see no point in all this just go back to the way it was long long long ago and just use 32k as the MTU. AFAIK all of this was only done to test large IPv6 packets but why MHLEN and MLEN were added is still beyond my imagination. In fact this comes from this commit in NetBSD originally: http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/net/if_loop.c#rev1.20 Comment says Add MHLEN + MLEN extra space to LOMTU for IP and transport headers. This doesn't make me understand why this is needed however. So unless this clues you in, I'm OK with setting it to 32k everywhere. And purely for your amusement: FreeBSD have decreased MTU from 64k to 16k in '95: http://svnweb.freebsd.org/base/head/sys/net/if_loop.c?r1=6875r2=6876 Perhaps (and I'm somewhat speculating here) because of TCP performance problems, like so: http://marc.info/?l=netbsd-current-usersm=102684186303106w=2 (fixed diff is in the next mail though)
Re: PRU_BIND in raw ip
On 28 May 2014 13:36, Martin Pieuchot mpieuc...@nolizard.org wrote: On 28/05/14(Wed) 09:30, Jérémie Courrèges-Anglas wrote: Martin Pieuchot mpieuc...@nolizard.org writes: Diff below replace in_iawithaddr() + in_broadcast() - ifa_ifwithaddr(), that does the same for IPv4 since broadcast addresses are added to the tree. This prevents listeners to bind on 255.255.255.255, something allowed with the current code. Thoughts? You're right, here's an updated diff that keeps this behavior. OK
Re: Remove a global variable in ip_input
On 4 June 2014 12:30, Martin Pieuchot mpieuc...@nolizard.org wrote: ok? sure
Re: in_pcbbind() and in_broadcast/in_iawithaddr
On 3 June 2014 09:18, Martin Pieuchot mpieuc...@nolizard.org wrote: On 02/06/14(Mon) 15:45, Martin Pieuchot wrote: This diff is similar to the one that has been committed to handle the SOCK_RAW binding. I'd like to stop using in_iawithaddr() *and* in_broadcast(). Since these functions are just doing an iteration on all the addresses present in the RB-tree (or equivalent), let's use ifa_ifwithaddr() instead. This diff should not introduce any behavior change concerning SOCK_DGRAM and binding to multicast addresses. As pointed out by jca@ this diff breaks on loopback. This is because the loopback IPv4 addresses are abusing the dstaddr field to be able to create a route to their address. Hopefully this hack can be removed once the local route diff is in, but in the meantime let's use the less intuitive but equivalent idiom: sin-sin_addr.s_addr != ia-ia_addr.s_addr Update diff below, is this one ok? OK
Re: pf anchor references
On Mon, Jun 02, 2014 at 17:51 +0200, Mike Belopuhov wrote: Hi, I've been chasing some bugs in the pfctl anchor code for a couple of weeks and I'm not astonished at how loose the handling is in general. Lot's of rules and checks are being violated by some code paths while honoured by others. The problem I have is that it's truly a rabbit's hole: almost every bug I hit is a feature in disguise. One thing that I particularly hate is called an anchor reference. A good example of this is a ruleset like this: anchor foo { block all } anchor foo If you believe it should be a syntax error let me disappoint you. The second 'anchor foo' is just a reference to the anchor named foo defined before: # echo -e 'anchor foo {\n block all\n}\nanchor foo' | pfctl -f - # pfctl -a '*' -sr anchor foo all { block drop all } anchor foo all { block drop all } # echo -e 'pass all' | pfctl -a 'foo' -f - # pfctl -a '*' -sr anchor foo all { pass all flags S/SA } anchor foo all { pass all flags S/SA } And to demonstrate a reference specified by an absolute path we can add anchor /foo inside foo: # echo -e 'anchor /foo' | pfctl -a 'foo' -f - This obviously gets us an endless loop if we decide to print out contents recursively (pfctl -a '*' -sr). Thankfully this doesn't affect ruleset evaluation in the kernel. The basic question I have is why do we need this dangerous and (at least in my eyes) pretty useless mechanism? Any opinions on this? Does anyone make any sensible use of it? Should we try to simplify this by removing ambiguous functionality? Cheers, Mike While I've got a few responses on tech supporting me, I should probably ask misc@ as well for additional scrutiny. And besides, I've come up with an example that might simplify ruleset creation for say multiple customer vlans or rdomains. Does anyone use it like that? This looks like an anchor template that we want to specify but not use directly in the main ruleset, but it plays somewhat nicely with quick anchors. (The ruleset below was tested and works correctly) Cheers, Mike block all anchor customer1 quick on rdomain 1 { anchor /allow-egress anchor /allow-ssh anchor /allow-icmp-echo } anchor customer2 quick on rdomain 2 { anchor /allow-egress anchor /allow-ssh } pass in quick on rdomain 0 proto tcp to (self) port ssh pass out quick on rdomain 0 # end of ruleset evaluation block quick anchor allow-ssh { pass in proto tcp to (self) port ssh } anchor allow-icmp-echo { pass in inet proto icmp to (self) icmp-type echoreq code 0 } anchor allow-egress { pass out }
Re: pfctl: stricter redirect specs
On Tue, Jun 24, 2014 at 15:07 +0200, Mike Belopuhov wrote: I have carefully tested that and do not expect any unrelated fallout. And for the reasons stated above I don't believe anyone is using this since it's largely error prone. and a regress chunk that avoids using such combination. note that this diff basically finishes previous attemts at modifying tests to match present syntax. diff --git regress/sbin/pfctl/pf48.in regress/sbin/pfctl/pf48.in index 540711a..a0dd143 100644 --- regress/sbin/pfctl/pf48.in +++ regress/sbin/pfctl/pf48.in @@ -1,12 +1,12 @@ table regress { 1.2.3.4 !5.6.7.8 10/8 lo0 } table regress.1 const { ::1 fe80::/64 } table regress.a { 1.2.3.4 !5.6.7.8 } { ::1 ::2 ::3 } file /dev/null const { 4.3.2.1 } -match out on lo0 from regress.1 to regress.2 nat-to lo0:0 +match out on lo0 inet from regress.1 to regress.2 nat-to lo0:0 match out on !lo0 inet from !regress.1 to regress.2 nat-to lo0:0 match in on lo0 inet6 from regress.1 to regress.2 rdr-to lo0:0 -match in on !lo0 from ! regress.1 to regress.2 rdr-to lo0:0 +match in on !lo0 inet6 from ! regress.1 to regress.2 rdr-to lo0:0 match in from { regress.1 !regress.2 } to any match out from any to { !regress.1, regress.2 } pass in from regress to any pass out from any to regress pass in from { regress.1 regress.2 } to any diff --git regress/sbin/pfctl/pf48.loaded regress/sbin/pfctl/pf48.loaded index 68b9854..cfdc8a2 100644 --- regress/sbin/pfctl/pf48.loaded +++ regress/sbin/pfctl/pf48.loaded @@ -1,7 +1,7 @@ -@0 match out on lo0 inet6 from regress.1:2 to regress.2:* nat-to ::1 - [ Skip steps: d=2 p=end da=4 sp=end dp=end ] +@0 match out on lo0 inet from regress.1:2 to regress.2:* nat-to 127.0.0.1 + [ Skip steps: d=2 f=2 p=end da=4 sp=end dp=end ] [ queue: qname= qid=0 pqname= pqid=0 ] [ Evaluations: 0 Packets: 0 Bytes: 0 States: 0 ] @1 match out on ! lo0 inet from ! regress.1:2 to regress.2:* nat-to 127.0.0.1 [ Skip steps: p=end da=4 sp=end dp=end ] [ queue: qname= qid=0 pqname= pqid=0 ] diff --git regress/sbin/pfctl/pf48.ok regress/sbin/pfctl/pf48.ok index 5f6e6ab..aeccfcf 100644 --- regress/sbin/pfctl/pf48.ok +++ regress/sbin/pfctl/pf48.ok @@ -1,9 +1,9 @@ table regress { 1.2.3.4 !5.6.7.8 10.0.0.0/8 ::1 fe80::1 127.0.0.1 } table regress.1 const { ::1 fe80::/64 } table regress.a const { 1.2.3.4 !5.6.7.8 ::1 ::2 ::3 } file /dev/null { 4.3.2.1 } -match out on lo0 inet6 from regress.1 to regress.2 nat-to ::1 +match out on lo0 inet from regress.1 to regress.2 nat-to 127.0.0.1 match out on ! lo0 inet from ! regress.1 to regress.2 nat-to 127.0.0.1 match in on lo0 inet6 from regress.1 to regress.2 rdr-to ::1 match in on ! lo0 inet6 from ! regress.1 to regress.2 rdr-to ::1 match in from regress.1 to any match in from ! regress.2 to any diff --git regress/sbin/pfctl/pf48.optimized regress/sbin/pfctl/pf48.optimized index 63309a7..90e2036 100644 --- regress/sbin/pfctl/pf48.optimized +++ regress/sbin/pfctl/pf48.optimized @@ -1,7 +1,7 @@ -@0 match out on lo0 inet6 from regress.1:2 to regress.2:* nat-to ::1 - [ Skip steps: d=2 p=end da=4 sp=end dp=end ] +@0 match out on lo0 inet from regress.1:2 to regress.2:* nat-to 127.0.0.1 + [ Skip steps: d=2 f=2 p=end da=4 sp=end dp=end ] [ queue: qname= qid=0 pqname= pqid=0 ] [ Evaluations: 0 Packets: 0 Bytes: 0 States: 0 ] @1 match out on ! lo0 inet from ! regress.1:2 to regress.2:* nat-to 127.0.0.1 [ Skip steps: p=end da=4 sp=end dp=end ] [ queue: qname= qid=0 pqname= pqid=0 ] diff --git regress/sbin/pfctl/pf98.in regress/sbin/pfctl/pf98.in index a18a491..a224f9e 100644 --- regress/sbin/pfctl/pf98.in +++ regress/sbin/pfctl/pf98.in @@ -1,4 +1,4 @@ # Test rule order processing should pass (require-order no longer required) pass in on lo100 all -match out on lo0 all nat-to lo0 +match out on lo0 inet6 all nat-to lo0
pfctl: better af-to translation specs handling
Hi, collapse_redirspec is one of my pet peeve since the af-to support. Unfortunately we didn't put much effort in making it work well back then, but now it is time for a change! Improving upon the last diff here's a collapsed version of the collapse_redirspec, so to speak. Instead of having two loops: first counts addresses, the second one performs an action, here's a merged version that has all the checks done once and in the right order. No changes in the regress test output (all tests pass) and the gain is that we no longer need to explicitely set the address family for af-to rules, it will be deduced in most of the cases. For example, previously you'd have to do: pass in inet af-to inet6 from ::1 While it's clear that you're translating IPv4 to the IPv6. Now it's possible to omit the 'inet' part: pass in af-to inet6 from ::1 Which is in line with how nat-to and rdr-to target addresses affect selection of the address family for the whole rule, i.e.: % echo 'pass in rdr-to ::1' | pfctl -vnf - pass in inet6 all flags S/SA rdr-to ::1 (Of course af-to rule still needs the 'inet6' part, but that will require some other changes as well...) OK? P.S. This diff moves chunks around quite a bit and might require applying to tell the difference. diff --git sbin/pfctl/parse.y sbin/pfctl/parse.y index ce80f8e..fabe210 100644 --- sbin/pfctl/parse.y +++ sbin/pfctl/parse.y @@ -2025,26 +2025,21 @@ filter_opt : USER uids { filter_opts.nat.rdr = $2; memcpy(filter_opts.nat.pool_opts, $3, sizeof(filter_opts.nat.pool_opts)); } | AFTO af FROM redirpool pool_opts { if (filter_opts.nat.rdr) { yyerror(cannot respecify af-to); YYERROR; } if ($2 == 0) { - yyerror(no address family specified); - YYERROR; - } - if ($4-host-af $4-host-af != $2) { - yyerror(af-to addresses must be in the - target address family); + yyerror(no target address family specified); YYERROR; } filter_opts.nat.af = $2; filter_opts.nat.rdr = $4; memcpy(filter_opts.nat.pool_opts, $5, sizeof(filter_opts.nat.pool_opts)); filter_opts.rdr.rdr = calloc(1, sizeof(struct redirection)); bzero(filter_opts.rdr.pool_opts, sizeof(filter_opts.rdr.pool_opts)); @@ -3482,21 +3477,21 @@ redirspec : host optweight{ n-weight = $2; } $$ = $1; } | '{' optnl redir_host_list '}' { $$ = $3; } ; redir_host_list: host optweight optnl { if ($1-addr.type != PF_ADDR_ADDRMASK) { free($1); - yyerror(only addresses can be listed for + yyerror(only addresses can be listed for redirection pools ); YYERROR; } if ($2 0) { struct node_host*n; for (n = $1; n != NULL; n = n-next) n-weight = $2; } $$ = $1; } @@ -4288,118 +4283,139 @@ expand_queue(char *qname, struct node_if *interfaces, struct queue_opts *opts) FREE_LIST(struct node_if, interfaces); return (0); } int collapse_redirspec(struct pf_pool *rpool, struct pf_rule *r, struct redirspec *rs, u_int8_t allow_if) { struct pf_opt_tbl *tbl = NULL; - struct node_host *h; + struct node_host *h, *hprev = NULL; struct pf_rule_addr ra; - int af = 0, i = 0; + int af = 0, naddr = 0; if (!rs || !rs-rdr || rs-rdr-host == NULL) { rpool-addr.type = PF_ADDR_NONE; return (0); } if (r-rule_flag PFRULE_AFTO) r-naf = rs-af; - /* count matching addresses */ for (h = rs-rdr-host; h != NULL; h = h-next) { - if (h-af) { - /* check that af-to target address has correct af */ - if (rs-af rs-af != h-af) { - yyerror(af mismatch in %s spec, - allow_if ? routing : translation); -
Re: ANONCVS MIRROR MAINTAINERS.. YOU NEED TO READ THIS!
On 26 June 2014 08:53, patrick keshishian sids...@boxsoft.com wrote: On Wed, Jun 25, 2014 at 10:01:06PM -0700, patrick keshishian wrote: On Thu, Jun 26, 2014 at 06:37:00AM +0200, Alexander Hall wrote: On 06/25/14 20:52, Bob Beck wrote: If you or someone you love runs an anoncvs server, they need to see this. As you know we recently added commitid support to cvs, and we had you update your cvsync binary. Unfortunately, the fix wasn't quite right. We ran into problems with the synching of commitid files. naddy managed to cook a correct fix for us. anoncvs.ca (the fanout machine) has been fixed - again. What you need to do is to update your cvsync to the latest one that was just committed into ports (cvsync-0.25.0pre0 or later). Remove your scanfile (if any). Once you do that, re-run cvsync and you should be back in business. Is it known whether this requires a fixed upstream cvsync first, or if it is ok to apply the fix and your repo will become fine whenever the upstream is fixed? Anyone with insight, please feel free to chime in. The commit message suggests, at least, the server must be updated: | Add full support for commitid and bump protocol version | Old clients will receive updates with commitid stripped out. It seems, running updated cvsync against an un-updated cvsyncd (I assume) results in failure: Connecting to anoncvs1.usa.openbsd.org port Connected to 149.20.54.217 port Running... Updating (collection openbsd-src/rcs) Updater(RCS): UPDATE(delta): error Updater(RCS): UPDATE: /cvs/anoncvs/cvs/obsd/src/lib/libc/string/timingsafe_bcmp.3,v Updater: RCS Error Socket Error: recv: 2 residue 2 Receiver Error: recv Mux(SEND) Error: not running: 0 DirScan: RCS Error Mux(SEND) Error: not running: 1 FileScan(RCS): UPDATE /cvs/anoncvs/cvs/obsd/src/sbin/pfctl/parse.y,v FileScan: RCS Error Failed removing offending files locally and restarting cvsync helps.
Re: PF Once rules are not removed from main anchor
On 21 June 2014 15:36, Alexandr Nedvedicky alexandr.nedvedi...@oracle.com wrote: Hello, I'm not sure it is the right place to submit patches. Let me know if there is better/more appropriate address for this. during our testing we've found the once rules are not removed, when used in main anchor. Correct. I've addedd the ruleset pointer check to prevent that shortly before the release since it caused panics... during debugging we found the rules in main anchor have member anchor set to NULL (pf_rule::anchor). This makes pf_purge_rule() function to bail out to early without removing the rule from ruleset. patch below fixed problem for us. However, this solution is not correct for us. Perhaps you have some other changes in your tree to make it work. Index: pf_ioctl.c === RCS file: /cvs/src/sys/net/pf_ioctl.c,v retrieving revision 1.272 diff -u -r1.272 pf_ioctl.c --- pf_ioctl.c 22 Apr 2014 14:41:03 - 1.272 +++ pf_ioctl.c 20 Jun 2014 14:26:22 - @@ -312,7 +312,7 @@ { u_int32_tnr; - if (ruleset == NULL || ruleset-anchor == NULL) + if (ruleset == NULL) return; ruleset-anchor is useless since nothing really checks for it if ruleset is NULL. pf_rm_rule(ruleset-rules.active.ptr, rule); @@ -325,7 +325,10 @@ ruleset-rules.active.ticket++; pf_calc_skip_steps(ruleset-rules.active.ptr); - pf_remove_if_empty_ruleset(ruleset); + + if (ruleset != pf_main_ruleset) { + pf_remove_if_empty_ruleset(ruleset); + } } You don't need to check ruleset against pf_main_ruleset since the first thing pf_remove_if_empty_ruleset does is bail if ruleset == pf_main_ruleset. This bit confused me quite a bit. What really is a problem for us is that when pf_purge_rule is called on a main ruleset from pf_test_rule the first argument (ruleset) is NULL and not pf_main_ruleset, which would be sensible. The only other user of it, AFAICT, is pflog_packet but it checks for ruleset-anchor before it does it's thing. I don't see any reason why we can't start our rule set iteration with 'ruleset' set to pf_main_ruleset (regress tests agree). OK? diff --git sys/net/pf.c sys/net/pf.c index 71f85d1..562901d 100644 --- sys/net/pf.c +++ sys/net/pf.c @@ -3163,10 +3163,11 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, struct pf_state **sm, } break; #endif /* INET6 */ } + ruleset = pf_main_ruleset; r = TAILQ_FIRST(pf_main_ruleset.rules.active.ptr); while (r != NULL) { r-evaluations++; PF_TEST_ATTRIB((pfi_kif_match(r-kif, pd-kif) == r-ifnot), r-skip[PF_SKIP_IFP].ptr); diff --git sys/net/pf_ioctl.c sys/net/pf_ioctl.c index 5ad762c..86e987f 100644 --- sys/net/pf_ioctl.c +++ sys/net/pf_ioctl.c @@ -308,24 +308,19 @@ pf_rm_rule(struct pf_rulequeue *rulequeue, struct pf_rule *rule) } void pf_purge_rule(struct pf_ruleset *ruleset, struct pf_rule *rule) { - u_int32_tnr; + u_int32_tnr = 0; - if (ruleset == NULL || ruleset-anchor == NULL) - return; + KASSERT(ruleset != NULL rule != NULL); pf_rm_rule(ruleset-rules.active.ptr, rule); ruleset-rules.active.rcount--; - - nr = 0; TAILQ_FOREACH(rule, ruleset-rules.active.ptr, entries) rule-nr = nr++; - ruleset-rules.active.ticket++; - pf_calc_skip_steps(ruleset-rules.active.ptr); pf_remove_if_empty_ruleset(ruleset); } u_int16_t
Re: [PATCH] rdomain support on rc.d
On 11 July 2014 10:29, Antoine Jacoutot ajacou...@bsdfrog.org wrote: On Thu, Jul 10, 2014 at 06:51:01PM +0200, Loďc BLOT wrote: Hello all, I use rdomains to split routing domains per company and also separate administration interfaces from routing interfaces on my routers (sshd, bacula, postfix and puppetd running on a dedicated rdomain) Actually there is a problem with rdomains, we need to modify /etc/rc.d scripts to add rdomain execution environment to the specified service. If rc.subr have support to rdomains, we can let the rc.d scripts clean. To resolve those rdomain issues, I created a patch and I added a new variable we could use on rc.conf(.local), ${_name}_rdomain. (This variable needs a signed integer and use an existing rdomain, this is checked by rc.subr. I want to contribute to OpenBSD and I give you this patch. If you have any suggestions to improve it, tell me. I don't use rdomain so someone knowledgeable should comment here. But it does look like a nice idea. having something like this would be really cool. in case you'll be tweaking the code, make sure that the route -T exec printf check is preserved. i would use true in this test however. as far as i can tell the daemon_rdomain bit that goes into the rc script is fine, however i'm not quite sure how can i start two daemons in different rdomains via rc.conf.local. looks like this diff doesn't handle this and allows only one instance in the ${_name}_rdomain rdomain. but sometimes you want multiple, say sshd in rdomain 0 and 1. daemon_rdomain flag allows me to go and create another rc.d/sshd-rdomain-1 script and stuff daemon_rdomain=1 in there. but then i'd have to add it to the pkg_scripts... this is a minor issue that i see. perhaps ${_name}_rdomain should list multiple values, like sshd_rdomain=0,1,2,3.
pf: fixup pf_step_into_anchor to save current anchor rule pointer (2)
Hi, This is a second diff and it makes sure that pf_step_into_anchor always saves a pointer to the rule that owns the anchor on the pf anchor stack. There's no reason why we should check for depth here. As a side effect this makes sure that the correct nested anchor gets it's counter bumped instead of the top most. For the save/restore symmetry pf_step_out_of_anchor is made to always restore previous value of the anchor rule. depth == 0 means what we a at the top (main ruleset). OK? diff --git sys/net/pf.c sys/net/pf.c index 9832e47..8708417 100644 --- sys/net/pf.c +++ sys/net/pf.c @@ -2666,11 +2666,11 @@ pf_step_into_anchor(int *depth, struct pf_ruleset **rs, if (*depth = sizeof(pf_anchor_stack) / sizeof(pf_anchor_stack[0])) { log(LOG_ERR, pf_step_into_anchor: stack overflow\n); *r = TAILQ_NEXT(*r, entries); return; - } else if (*depth == 0 a != NULL) + } else if (a != NULL) *a = *r; f = pf_anchor_stack + (*depth)++; f-rs = *rs; f-r = *r; if ((*r)-anchor_wildcard) { @@ -2711,10 +2711,12 @@ pf_step_out_of_anchor(int *depth, struct pf_ruleset **rs, } } (*depth)--; if (*depth == 0 a != NULL) *a = NULL; + else if (a != NULL) + *a = f-r; *rs = f-rs; if (*match *depth) { *match = *depth; if (f-r-quick) quick = 1;
pf: once for match rules?
Hi, Before I send a diff for pfctl to disable once on match rules, I've decided to try and see how much work is it to make it actually work. Turns out that I need to extend pf_rule_item by 3 pointers to track the match rule ruleset, anchor rule and the ruleset it belongs to. Here's what this means in practice. Consider a ruleset: block drop all match out log proto tcp to port 22 once anchor foo all { match out log proto tcp to port 22 once anchor bar all { match out log proto tcp to port 22 once pass out quick proto tcp to port 22 once } } Once we send a packet to port 22 the ruleset collapses to just: block drop all Thoughts? diff --git sys/net/pf.c sys/net/pf.c index 9f0e2d6..5679a40 100644 --- sys/net/pf.c +++ sys/net/pf.c @@ -3279,15 +3279,16 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, struct pf_state **sm, PR_NOWAIT)) == NULL) { REASON_SET(reason, PFRES_MEMORY); goto cleanup; } ri-r = r; + ri-ar = a; + ri-rs = ruleset; + ri-ars = aruleset; /* order is irrelevant */ SLIST_INSERT_HEAD(rules, ri, entry); pf_rule_to_actions(r, act); - if (r-rule_flag PFRULE_AFTO) - pd-naf = r-naf; if (pf_get_transaddr(r, pd, sns, nr) == -1) { REASON_SET(reason, PFRES_TRANSLATE); goto cleanup; } if (r-log || act.log PF_LOG_MATCHES) { @@ -3428,10 +3429,12 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, struct pf_state **sm, virtual_type, icmp_dir); } } else { while ((ri = SLIST_FIRST(rules))) { SLIST_REMOVE_HEAD(rules, entry); + if (ri-r-rule_flag PFRULE_ONCE) + pf_purge_rule(ri-rs, ri-r, ri-ars, ri-ar); pool_put(pf_rule_item_pl, ri); } } /* copy back packet headers if needed */ @@ -3454,10 +3457,23 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, struct pf_state **sm, } #endif if (r-rule_flag PFRULE_ONCE) pf_purge_rule(ruleset, r, aruleset, a); + if (*sm) { + SLIST_FOREACH(ri, (*sm)-match_rules, entry) { + if (ri-r-rule_flag PFRULE_ONCE) + /* +* We can be sure that pf_purge_rule won't +* pool_put the rule because when *sm != NULL +* STATE_INC_COUNTERS has increased states_cur. +* pf_rule_item's and rules will be g/c'ed by +* pf_free_state. +*/ + pf_purge_rule(ri-rs, ri-r, ri-ars, ri-ar); + } + } #if INET INET6 if (rewrite skw-af != sks-af) return (PF_AFRT); #endif /* INET INET6 */ diff --git sys/net/pfvar.h sys/net/pfvar.h index a0d94f7..49af7b4 100644 --- sys/net/pfvar.h +++ sys/net/pfvar.h @@ -691,10 +691,13 @@ struct pf_threshold { }; struct pf_rule_item { SLIST_ENTRY(pf_rule_item)entry; struct pf_rule *r; + struct pf_rule *ar; + struct pf_ruleset *rs; + struct pf_ruleset *ars; }; SLIST_HEAD(pf_rule_slist, pf_rule_item); enum pf_sn_types { PF_SN_NONE, PF_SN_NAT, PF_SN_RDR, PF_SN_ROUTE, PF_SN_MAX };
Re: pf: once for match rules?
On Tue, Jul 22, 2014 at 19:03 +0200, Mike Belopuhov wrote: Hi, Before I send a diff for pfctl to disable once on match rules, I've decided to try and see how much work is it to make it actually work. Turns out that I need to extend pf_rule_item by 3 pointers to track the match rule ruleset, anchor rule and the ruleset it belongs to. Here's what this means in practice. Consider a ruleset: block drop all match out log proto tcp to port 22 once anchor foo all { match out log proto tcp to port 22 once anchor bar all { match out log proto tcp to port 22 once pass out quick proto tcp to port 22 once } } Once we send a packet to port 22 the ruleset collapses to just: block drop all Thoughts? Henning thinks it's a bit of an overkill. Any other opinions? diff --git sys/net/pf.c sys/net/pf.c index 9f0e2d6..5679a40 100644 --- sys/net/pf.c +++ sys/net/pf.c @@ -3279,15 +3279,16 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, struct pf_state **sm, PR_NOWAIT)) == NULL) { REASON_SET(reason, PFRES_MEMORY); goto cleanup; } ri-r = r; + ri-ar = a; + ri-rs = ruleset; + ri-ars = aruleset; /* order is irrelevant */ SLIST_INSERT_HEAD(rules, ri, entry); pf_rule_to_actions(r, act); - if (r-rule_flag PFRULE_AFTO) - pd-naf = r-naf; if (pf_get_transaddr(r, pd, sns, nr) == -1) { REASON_SET(reason, PFRES_TRANSLATE); goto cleanup; } if (r-log || act.log PF_LOG_MATCHES) { @@ -3428,10 +3429,12 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, struct pf_state **sm, virtual_type, icmp_dir); } } else { while ((ri = SLIST_FIRST(rules))) { SLIST_REMOVE_HEAD(rules, entry); + if (ri-r-rule_flag PFRULE_ONCE) + pf_purge_rule(ri-rs, ri-r, ri-ars, ri-ar); pool_put(pf_rule_item_pl, ri); } } /* copy back packet headers if needed */ @@ -3454,10 +3457,23 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, struct pf_state **sm, } #endif if (r-rule_flag PFRULE_ONCE) pf_purge_rule(ruleset, r, aruleset, a); + if (*sm) { + SLIST_FOREACH(ri, (*sm)-match_rules, entry) { + if (ri-r-rule_flag PFRULE_ONCE) + /* + * We can be sure that pf_purge_rule won't + * pool_put the rule because when *sm != NULL + * STATE_INC_COUNTERS has increased states_cur. + * pf_rule_item's and rules will be g/c'ed by + * pf_free_state. + */ + pf_purge_rule(ri-rs, ri-r, ri-ars, ri-ar); + } + } #if INET INET6 if (rewrite skw-af != sks-af) return (PF_AFRT); #endif /* INET INET6 */ diff --git sys/net/pfvar.h sys/net/pfvar.h index a0d94f7..49af7b4 100644 --- sys/net/pfvar.h +++ sys/net/pfvar.h @@ -691,10 +691,13 @@ struct pf_threshold { }; struct pf_rule_item { SLIST_ENTRY(pf_rule_item)entry; struct pf_rule *r; + struct pf_rule *ar; + struct pf_ruleset *rs; + struct pf_ruleset *ars; }; SLIST_HEAD(pf_rule_slist, pf_rule_item); enum pf_sn_types { PF_SN_NONE, PF_SN_NAT, PF_SN_RDR, PF_SN_ROUTE, PF_SN_MAX };
Re: Kill MRT_{ADD,DEL}_BW_UPCALL
On 13 August 2014 10:56, Martin Pieuchot mpieuc...@nolizard.org wrote: Our multicast routing code is insert your adjective and for the most part unused. We discussed with claudio@ during t2k13 to rewrite only the parts that people currently use, any volunteer? In the meantime, I'd like to kill the obviously unused parts of it. So here's a first diff that remove the bandwidth monitoring interface. Nothing use it in base and a quick search on codesearch.debian.net reveals that only net/xorp picks it if it finds the defines. Ok? OK
Re: Fix pppoe(4) with rdomain != 0
OK On 13 August 2014 11:56, Martin Pieuchot mpieuc...@nolizard.org wrote: ok? Index: net/if_pppoe.c === RCS file: /home/ncvs/src/sys/net/if_pppoe.c,v retrieving revision 1.40 diff -u -p -r1.40 if_pppoe.c --- net/if_pppoe.c 12 Jul 2014 18:44:22 - 1.40 +++ net/if_pppoe.c 13 Aug 2014 09:56:16 - @@ -1398,6 +1398,9 @@ pppoe_send_padt(struct ifnet *outgoing_i memcpy(eh-ether_dhost, dest, ETHER_ADDR_LEN); m0-m_flags = ~(M_BCAST|M_MCAST); + /* encapsulated packet is forced into rdomain of physical interface */ + m0-m_pkthdr.ph_rtableid = outgoing_if-if_rdomain; + return (outgoing_if-if_output(outgoing_if, m0, dst, NULL)); }
[regress] convert aes testcase from /dev/crypto
in order to deprecate crypto(4) interface regress tests need to be converted. this aes test case actually uses ecb vectors, therefore no chaining is required and the code looks very simple. OK? diff --git regress/sys/crypto/aes/Makefile regress/sys/crypto/aes/Makefile index 459aedb..826d98c 100644 --- regress/sys/crypto/aes/Makefile +++ regress/sys/crypto/aes/Makefile @@ -1,9 +1,13 @@ # $OpenBSD: Makefile,v 1.2 2014/01/18 05:54:52 martynas Exp $ -PROG= aestest +DIR= ${.CURDIR}/../../../../sys + +CFLAGS+= -I${DIR} +PROG= aestest +SRCS= aestest.c CDIAGFLAGS=-Wall #CDIAGFLAGS+= -Werror CDIAGFLAGS+= -Wpointer-arith CDIAGFLAGS+= -Wno-uninitialized CDIAGFLAGS+= -Wstrict-prototypes @@ -12,9 +16,12 @@ CDIAGFLAGS+= -Wunused CDIAGFLAGS+= -Wsign-compare CDIAGFLAGS+= -Wshadow REGRESS_ROOT_TARGETS= run-regress-${PROG} +.PATH: ${DIR}/crypto +SRCS+= rijndael.c + run-regress-${PROG}: ${PROG} - ${SUDO} ./${PROG} ${.CURDIR}/vectors/*.txt + ./${PROG} ${.CURDIR}/vectors/*.txt .include bsd.regress.mk diff --git regress/sys/crypto/aes/aestest.c regress/sys/crypto/aes/aestest.c index 2437c38..720dbc1 100644 --- regress/sys/crypto/aes/aestest.c +++ regress/sys/crypto/aes/aestest.c @@ -24,117 +24,39 @@ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ /* - * Test crypto(4) AES with test vectors provided by Dr Brian Gladman: - * http://fp.gladman.plus.com/AES/ + * Test kernel AES implementation with test vectors provided by + * Dr Brian Gladman: http://fp.gladman.plus.com/AES/ */ -#include sys/types.h #include sys/param.h -#include sys/ioctl.h -#include sys/sysctl.h -#include crypto/cryptodev.h +#include crypto/rijndael.h #include err.h -#include fcntl.h #include stdio.h #include stdlib.h #include string.h #include unistd.h #include ctype.h static int -syscrypt(const unsigned char *key, size_t klen, const unsigned char *in, +docrypt(const unsigned char *key, size_t klen, const unsigned char *in, unsigned char *out, size_t len, int do_encrypt) { - struct session_op session; - struct crypt_op cryp; - int cryptodev_fd = -1, fd = -1; - u_char iv[32]; - - /* -* Kludge; the kernel doesn't support ECB encryption so we -* use a all-zero IV and encrypt a single block only, so the -* result should be the same. -*/ - bzero(iv, sizeof(iv)); - - if ((cryptodev_fd = open(/dev/crypto, O_RDWR, 0)) 0) { - warn(/dev/crypto); - goto err; - } - if (ioctl(cryptodev_fd, CRIOGET, fd) == -1) { - warn(CRIOGET failed); - goto err; - } - memset(session, 0, sizeof(session)); - session.cipher = CRYPTO_AES_CBC; - session.key = (caddr_t) key; - session.keylen = klen; - if (ioctl(fd, CIOCGSESSION, session) == -1) { - warn(CIOCGSESSION); - goto err; - } - memset(cryp, 0, sizeof(cryp)); - cryp.ses = session.ses; - cryp.op = do_encrypt ? COP_ENCRYPT : COP_DECRYPT; - cryp.flags = 0; - cryp.len = len; - cryp.src = (caddr_t) in; - cryp.dst = (caddr_t) out; - cryp.iv = (caddr_t) iv; - cryp.mac = 0; - if (ioctl(fd, CIOCCRYPT, cryp) == -1) { - warn(CIOCCRYPT); - goto err; - } - if (ioctl(fd, CIOCFSESSION, session.ses) == -1) { - warn(CIOCFSESSION); - goto err; - } - close(fd); - close(cryptodev_fd); - return (0); - -err: - if (fd != -1) - close(fd); - if (cryptodev_fd != -1) - close(cryptodev_fd); - return (-1); -} - -static int -getallowsoft(void) -{ - int mib[2], old; - size_t olen; - - olen = sizeof(old); - - mib[0] = CTL_KERN; - mib[1] = KERN_CRYPTODEVALLOWSOFT; - if (sysctl(mib, 2, old, olen, NULL, 0) 0) - err(1, sysctl failed); - - return old; -} - -static void -setallowsoft(int new) -{ - int mib[2], old; - size_t olen, nlen; - - olen = nlen = sizeof(new); - - mib[0] = CTL_KERN; - mib[1] = KERN_CRYPTODEVALLOWSOFT; - - if (sysctl(mib, 2, old, olen, new, nlen) 0) - err(1, sysctl failed); + rijndael_ctx ctx; + int error = 0; + + memset(ctx, 0, sizeof(ctx)); + error = rijndael_set_key(ctx, key, klen * 8); + if (error) + return -1; + if (do_encrypt) + rijndael_encrypt(ctx, in, out); + else + rijndael_decrypt(ctx, in, out); + return 0; } static int match(unsigned char *a, unsigned char *b, size_t len) { @@ -221,21 +143,21 @@ do_tests(const char *filename, int test_num, u_char *key, u_int keylen, { char result[32]; int fail = 0; /* Encrypt test */
[regress] convert aes-ctr test from /dev/crypto
this test is converted the same way jsing@ has recently converted an xts test by pulling in xform.c code. OK? diff --git regress/sys/crypto/aesctr/Makefile regress/sys/crypto/aesctr/Makefile index 31ae500..7310dbc 100644 --- regress/sys/crypto/aesctr/Makefile +++ regress/sys/crypto/aesctr/Makefile @@ -1,10 +1,29 @@ # $OpenBSD: Makefile,v 1.1 2005/05/25 05:47:53 markus Exp $ +DIR= ${.CURDIR}/../../../../sys + +CFLAGS+= -I${DIR} + PROG= aesctr +SRCS= aesctr.c + +CDIAGFLAGS=-Wall +CDIAGFLAGS+= -Werror +CDIAGFLAGS+= -Wpointer-arith +CDIAGFLAGS+= -Wno-uninitialized +CDIAGFLAGS+= -Wstrict-prototypes +CDIAGFLAGS+= -Wmissing-prototypes +CDIAGFLAGS+= -Wunused +CDIAGFLAGS+= -Wsign-compare +CDIAGFLAGS+= -Wshadow REGRESS_ROOT_TARGETS= run-regress-${PROG} +.PATH: ${DIR}/crypto +SRCS+= cast.c ecb_enc.c ecb3_enc.c gmac.c rijndael.c set_key.c +SRCS+= xform.c + run-regress-${PROG}: ${PROG} - ${SUDO} ./${PROG} + ./${PROG} .include bsd.regress.mk diff --git regress/sys/crypto/aesctr/aesctr.c regress/sys/crypto/aesctr/aesctr.c index 4cc1a6e..3a0b4d1 100644 --- regress/sys/crypto/aesctr/aesctr.c +++ regress/sys/crypto/aesctr/aesctr.c @@ -14,17 +14,13 @@ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. */ -#include sys/types.h #include sys/param.h -#include sys/ioctl.h -#include sys/sysctl.h -#include crypto/cryptodev.h +#include crypto/rijndael.h #include err.h -#include fcntl.h #include stdio.h #include stdlib.h #include string.h #include unistd.h #include limits.h @@ -128,92 +124,67 @@ struct { B4 07 DF 86 65 69 FD 07 F4 8C C0 B5 83 D6 07 1F /*1E C0 E6 B8*/, }, }; -static int -syscrypt(const unsigned char *key, size_t klen, const unsigned char *iv, -const unsigned char *in, unsigned char *out, size_t len, int encrypt) -{ - struct session_op session; - struct crypt_op cryp; - int cryptodev_fd = -1, fd = -1; +/* Stubs */ - if ((cryptodev_fd = open(/dev/crypto, O_RDWR, 0)) 0) { - warn(/dev/crypto); - goto err; - } - if (ioctl(cryptodev_fd, CRIOGET, fd) == -1) { - warn(CRIOGET failed); - goto err; - } - memset(session, 0, sizeof(session)); - session.cipher = CRYPTO_AES_CTR; - session.key = (caddr_t) key; - session.keylen = klen; - if (ioctl(fd, CIOCGSESSION, session) == -1) { - warn(CIOCGSESSION); - goto err; - } - memset(cryp, 0, sizeof(cryp)); - cryp.ses = session.ses; - cryp.op = encrypt ? COP_ENCRYPT : COP_DECRYPT; - cryp.flags = 0; - cryp.len = len; - cryp.src = (caddr_t) in; - cryp.dst = (caddr_t) out; - cryp.iv = (caddr_t) iv; - cryp.mac = 0; - if (ioctl(fd, CIOCCRYPT, cryp) == -1) { - warn(CIOCCRYPT); - goto err; - } - if (ioctl(fd, CIOCFSESSION, session.ses) == -1) { - warn(CIOCFSESSION); - goto err; - } - close(fd); - close(cryptodev_fd); - return (0); +u_int32_t deflate_global(u_int8_t *, u_int32_t, int, u_int8_t **); -err: - if (fd != -1) - close(fd); - if (cryptodev_fd != -1) - close(cryptodev_fd); - return (-1); +u_int32_t +deflate_global(u_int8_t *data, u_int32_t size, int comp, u_int8_t **out) +{ + return 0; } -static int -getallowsoft(void) +void explicit_bzero(void *, size_t); + +void +explicit_bzero(void *b, size_t len) { - int mib[2], old; - size_t olen; + bzero(b, len); +} - olen = sizeof(old); +/* Definitions from /sys/crypto/xform.c */ - mib[0] = CTL_KERN; - mib[1] = KERN_CRYPTODEVALLOWSOFT; - if (sysctl(mib, 2, old, olen, NULL, 0) 0) - err(1, sysctl failed); +#define AESCTR_NONCESIZE 4 +#define AESCTR_IVSIZE 8 +#define AESCTR_BLOCKSIZE 16 - return old; -} +struct aes_ctr_ctx { + u_int32_t ac_ek[4*(AES_MAXROUNDS + 1)]; + u_int8_tac_block[AESCTR_BLOCKSIZE]; + int ac_nr; +}; -static void -setallowsoft(int new) -{ - int mib[2], old; - size_t olen, nlen; +int aes_ctr_setkey(void *, u_int8_t *, int); +void aes_ctr_encrypt(caddr_t, u_int8_t *); +void aes_ctr_decrypt(caddr_t, u_int8_t *); +void aes_ctr_reinit(caddr_t, u_int8_t *); - olen = nlen = sizeof(new); +static int +docrypt(const unsigned char *key, size_t klen, const unsigned char *iv, +const unsigned char *in, unsigned char *out, size_t len, int encrypt) +{ + u_int8_t block[AESCTR_BLOCKSIZE]; + struct aes_ctr_ctx ctx; + int error = 0; + size_t i; - mib[0] = CTL_KERN; - mib[1] = KERN_CRYPTODEVALLOWSOFT; +
[regress] convert enc (3des) test from /dev/crypto
this one with a bit of cheating however (manual cbc implementation). OK? diff --git regress/sys/crypto/enc/Makefile regress/sys/crypto/enc/Makefile index cc29b32..8725f0c 100644 --- regress/sys/crypto/enc/Makefile +++ regress/sys/crypto/enc/Makefile @@ -1,12 +1,21 @@ # $OpenBSD: Makefile,v 1.5 2010/10/15 10:39:12 jsg Exp $ +DIR= ${.CURDIR}/../../../../sys + +CFLAGS+= -I${DIR} + PROG= des3 +SRCS= des3.c LDADD=-lcrypto DPADD=${LIBCRYPTO} REGRESS_ROOT_TARGETS= run-regress-${PROG} +.PATH: ${DIR}/crypto +SRCS+= cast.c ecb_enc.c ecb3_enc.c gmac.c rijndael.c set_key.c +SRCS+= xform.c + run-regress-${PROG}: ${PROG} - ${SUDO} ./${PROG} + ./${PROG} .include bsd.regress.mk diff --git regress/sys/crypto/enc/des3.c regress/sys/crypto/enc/des3.c index 024418d..fe67872 100644 --- regress/sys/crypto/enc/des3.c +++ regress/sys/crypto/enc/des3.c @@ -22,105 +22,73 @@ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ -#include sys/types.h #include sys/param.h -#include sys/ioctl.h -#include sys/sysctl.h -#include crypto/cryptodev.h #include openssl/des.h #include err.h #include fcntl.h #include stdio.h #include stdlib.h #include string.h #include unistd.h -static int -syscrypt(const unsigned char *key, size_t klen, const unsigned char *iv, -const unsigned char *in, unsigned char *out, size_t len, int encrypt) -{ - struct session_op session; - struct crypt_op cryp; - int cryptodev_fd = -1, fd = -1; - - if ((cryptodev_fd = open(/dev/crypto, O_RDWR, 0)) 0) { - warn(/dev/crypto); - goto err; - } - if (ioctl(cryptodev_fd, CRIOGET, fd) == -1) { - warn(CRIOGET failed); - goto err; - } - memset(session, 0, sizeof(session)); - session.cipher = CRYPTO_3DES_CBC; - session.key = (caddr_t) key; - session.keylen = klen; - if (ioctl(fd, CIOCGSESSION, session) == -1) { - warn(CIOCGSESSION); - goto err; - } - memset(cryp, 0, sizeof(cryp)); - cryp.ses = session.ses; - cryp.op = encrypt ? COP_ENCRYPT : COP_DECRYPT; - cryp.flags = 0; - cryp.len = len; - cryp.src = (caddr_t) in; - cryp.dst = (caddr_t) out; - cryp.iv = (caddr_t) iv; - cryp.mac = 0; - if (ioctl(fd, CIOCCRYPT, cryp) == -1) { - warn(CIOCCRYPT); - goto err; - } - if (ioctl(fd, CIOCFSESSION, session.ses) == -1) { - warn(CIOCFSESSION); - goto err; - } - close(fd); - close(cryptodev_fd); - return (0); +/* Stubs */ -err: - if (fd != -1) - close(fd); - if (cryptodev_fd != -1) - close(cryptodev_fd); - return (-1); -} +u_int32_t deflate_global(u_int8_t *, u_int32_t, int, u_int8_t **); -static int -getallowsoft(void) +u_int32_t +deflate_global(u_int8_t *data, u_int32_t size, int comp, u_int8_t **out) { - int mib[2], old; - size_t olen; - - olen = sizeof(old); - - mib[0] = CTL_KERN; - mib[1] = KERN_CRYPTODEVALLOWSOFT; - if (sysctl(mib, 2, old, olen, NULL, 0) 0) - err(1, sysctl failed); - - return old; + return 0; } -static void -setallowsoft(int new) +void explicit_bzero(void *, size_t); + +void +explicit_bzero(void *b, size_t len) { - int mib[2], old; - size_t olen, nlen; + bzero(b, len); +} - olen = nlen = sizeof(new); - mib[0] = CTL_KERN; - mib[1] = KERN_CRYPTODEVALLOWSOFT; +/* Simulate CBC mode */ - if (sysctl(mib, 2, old, olen, new, nlen) 0) - err(1, sysctl failed); +static int +docrypt(const unsigned char *key, size_t klen, const unsigned char *iv0, +const unsigned char *in, unsigned char *out, size_t len, int encrypt) +{ + u_int8_t block[8], iv[8], iv2[8], *ivp = iv, *nivp; + u_int8_t ctx[384]; + int i, j, error = 0; + + memcpy(iv, iv0, 8); + memset(ctx, 0, sizeof(ctx)); + error = des3_setkey(ctx, key, klen); + if (error) + return -1; + for (i = 0; i len / 8; i ++) { + bcopy(in, block, 8); + in += 8; + if (encrypt) { + for (j = 0; j 8; j++) + block[j] ^= ivp[j]; + des3_encrypt(ctx, block); + memcpy(ivp, block, 8); + } else { + nivp = ivp == iv ? iv2 : iv; + memcpy(nivp, block, 8); + des3_decrypt(ctx, block); + for (j = 0; j 8; j++) + block[j] ^= ivp[j]; + ivp = nivp; + } +
/dev/crypto removal (1/3): unlink pseudo device
first step is to unlink crypto(4) pseudo device from the architecture dependant character device tables and kernel config files. OK? diff --git sys/arch/alpha/alpha/conf.c sys/arch/alpha/alpha/conf.c index 83cea34..7d103af 100644 --- sys/arch/alpha/alpha/conf.c +++ sys/arch/alpha/alpha/conf.c @@ -190,11 +190,11 @@ struct cdevsw cdevsw[] = #endif cdev_bio_init(NBIO,bio),/* 53: ioctl tunnel */ cdev_notdef(), cdev_ptm_init(NPTY,ptm),/* 55: pseudo-tty ptm device */ cdev_hotplug_init(NHOTPLUG,hotplug), /* 56: devices hot plugging */ - cdev_crypto_init(NCRYPTO,crypto), /* 57: /dev/crypto */ + cdev_notdef(), /* 57: was: /dev/crypto */ cdev_bktr_init(NBKTR,bktr), /* 58: Bt848 video capture device */ cdev_radio_init(NRADIO,radio), /* 59: generic radio I/O */ cdev_mouse_init(NWSMUX, wsmux), /* 60: ws multiplexor */ cdev_vscsi_init(NVSCSI, vscsi), /* 61: vscsi */ cdev_notdef(), diff --git sys/arch/alpha/conf/GENERIC sys/arch/alpha/conf/GENERIC index 7b0c790..dbc8bdc 100644 --- sys/arch/alpha/conf/GENERIC +++ sys/arch/alpha/conf/GENERIC @@ -422,8 +422,7 @@ option ONEWIREVERBOSE owid* at onewire? # ID owsbm* at onewire? # Smart Battery Monitor owtemp* at onewire?# Temperature owctr* at onewire? # Counter device -pseudo-device crypto 1 pseudo-device hotplug 1 # devices hot plugging pseudo-device wsmux 2 # mouse keyboard multiplexor diff --git sys/arch/amd64/amd64/conf.c sys/arch/amd64/amd64/conf.c index 59d4c9b..5da5820 100644 --- sys/arch/amd64/amd64/conf.c +++ sys/arch/amd64/amd64/conf.c @@ -38,12 +38,10 @@ #include sys/tty.h #include sys/vnode.h #include machine/conf.h -#include inet.h - #include wd.h bdev_decl(wd); #include fdc.h #include fd.h bdev_decl(fd); @@ -254,11 +252,11 @@ struct cdevsw cdevsw[] = cdev_tty_init(NUCOM,ucom), /* 66: USB tty */ cdev_mouse_init(NWSKBD, wskbd), /* 67: keyboards */ cdev_mouse_init(NWSMOUSE, /* 68: mice */ wsmouse), cdev_mouse_init(NWSMUX, wsmux), /* 69: ws multiplexor */ - cdev_crypto_init(NCRYPTO,crypto), /* 70: /dev/crypto */ + cdev_notdef(), /* 70: was: /dev/crypto */ cdev_tty_init(NCZ,cztty), /* 71: Cyclades-Z serial port */ #ifdef USER_PCICONF cdev_pci_init(NPCI,pci),/* 72: PCI user */ #else cdev_notdef(), diff --git sys/arch/amd64/conf/GENERIC sys/arch/amd64/conf/GENERIC index 09fd692..3177dd8 100644 --- sys/arch/amd64/conf/GENERIC +++ sys/arch/amd64/conf/GENERIC @@ -600,11 +600,10 @@ pseudo-device pctr1 pseudo-device nvram 1 pseudo-device hotplug 1 # devices hot plugging # mouse keyboard multiplexor pseudo-devices pseudo-device wsmux 2 -pseudo-device crypto 1 # Virtio devices virtio*at pci? # Virtio PCI device vioblk*at virtio? # Virtio block device vio* at virtio? # Virtio network device diff --git sys/arch/arm/arm/conf.c sys/arch/arm/arm/conf.c index d381885..8f12732 100644 --- sys/arch/arm/arm/conf.c +++ sys/arch/arm/arm/conf.c @@ -53,12 +53,10 @@ #include sys/conf.h #include sys/vnode.h #include machine/conf.h -#include inet.h - /* * From this point, these need to be MI foo.h files. */ /* @@ -321,11 +319,11 @@ struct cdevsw cdevsw[] = { cdev_lkm_dummy(), /* 42: reserved */ cdev_lkm_dummy(), /* 43: reserved */ cdev_lkm_dummy(), /* 44: reserved */ cdev_lkm_dummy(), /* 45: reserved */ cdev_pf_init(NPF,pf), /* 46: packet filter */ - cdev_crypto_init(NCRYPTO,crypto), /* 47: /dev/crypto */ + cdev_notdef(), /* 47: was: /dev/crypto */ cdev_lkm_dummy(), /* 48: reserved */ cdev_lkm_dummy(), /* 49: reserved */ cdev_systrace_init(NSYSTRACE,systrace), /* 50: system call tracing */ cdev_notdef(), /* 51: reserved */ cdev_bio_init(NBIO,bio),/* 52: ioctl tunnel */ diff --git sys/arch/armish/conf/GENERIC sys/arch/armish/conf/GENERIC index e72ff41..6bf908c 100644 --- sys/arch/armish/conf/GENERIC +++ sys/arch/armish/conf/GENERIC @@ -195,7 +195,6 @@ owsbm* at onewire? # Smart Battery Monitor owtemp* at onewire?# Temperature owctr* at onewire? # Counter device # mouse keyboard multiplexor pseudo-devices pseudo-device wsmux 2 -pseudo-device crypto 1 pseudo-device hotplug 1 # devices hot plugging diff --git sys/arch/hppa/hppa/conf.c sys/arch/hppa/hppa/conf.c
/dev/crypto removal (2/3): remove kernel support
this removes /dev/crypto device interface and public key functionality that is only usable via /dev/crypto. OK? diff --git sys/conf/files sys/conf/files index 3941639..9af78cc 100644 --- sys/conf/files +++ sys/conf/files @@ -870,11 +870,10 @@ file crypto/blf.c (inet ipsec) | crypto | vnd file crypto/cast.c (inet ipsec) | crypto file crypto/ecb_enc.c (inet ipsec) | crypto file crypto/set_key.c (inet ipsec) | crypto file crypto/ecb3_enc.c (inet ipsec) | crypto file crypto/crypto.c (inet ipsec) | crypto -file crypto/cryptodev.c((inet ipsec) | crypto) needs-flag file crypto/criov.c(inet ipsec) | crypto file crypto/cryptosoft.c (inet ipsec) | crypto file crypto/xform.c(inet ipsec) | crypto file crypto/xform_ipcomp.c (inet ipsec) | crypto file crypto/arc4.c diff --git sys/crypto/crypto.c sys/crypto/crypto.c index f62752d..4a11ae3 100644 --- sys/crypto/crypto.c +++ sys/crypto/crypto.c @@ -280,39 +280,10 @@ crypto_get_driverid(u_int8_t flags) /* * Register a crypto driver. It should be called once for each algorithm * supported by the driver. */ int -crypto_kregister(u_int32_t driverid, int *kalg, -int (*kprocess)(struct cryptkop *)) -{ - int s, i; - - if (driverid = crypto_drivers_num || kalg == NULL || - crypto_drivers == NULL) - return EINVAL; - - s = splvm(); - - for (i = 0; i = CRK_ALGORITHM_MAX; i++) { - /* -* XXX Do some performance testing to determine -* placing. We probably need an auxiliary data -* structure that describes relative performances. -*/ - - crypto_drivers[driverid].cc_kalg[i] = kalg[i]; - } - - crypto_drivers[driverid].cc_kprocess = kprocess; - - splx(s); - return 0; -} - -/* Register a crypto driver. */ -int crypto_register(u_int32_t driverid, int *alg, int (*newses)(u_int32_t *, struct cryptoini *), int (*freeses)(u_int64_t), int (*process)(struct cryptop *)) { int s, i; @@ -410,71 +381,10 @@ crypto_dispatch(struct cryptop *crp) } return 0; } -int -crypto_kdispatch(struct cryptkop *krp) -{ - if (crypto_taskq) { - task_set(krp-krp_task, (void (*))crypto_kinvoke, krp, NULL); - task_add(crypto_taskq, krp-krp_task); - } else { - crypto_kinvoke(krp); - } - - return 0; -} - -/* - * Dispatch an asymmetric crypto request to the appropriate crypto devices. - */ -int -crypto_kinvoke(struct cryptkop *krp) -{ - extern int cryptodevallowsoft; - u_int32_t hid; - int error; - int s; - - /* Sanity checks. */ - if (krp == NULL || krp-krp_callback == NULL) - return (EINVAL); - - s = splvm(); - for (hid = 0; hid crypto_drivers_num; hid++) { - if ((crypto_drivers[hid].cc_flags CRYPTOCAP_F_SOFTWARE) - cryptodevallowsoft == 0) - continue; - if (crypto_drivers[hid].cc_kprocess == NULL) - continue; - if ((crypto_drivers[hid].cc_kalg[krp-krp_op] - CRYPTO_ALG_FLAG_SUPPORTED) == 0) - continue; - break; - } - - if (hid == crypto_drivers_num) { - krp-krp_status = ENODEV; - crypto_kdone(krp); - splx(s); - return (0); - } - - krp-krp_hid = hid; - - crypto_drivers[hid].cc_koperations++; - - error = crypto_drivers[hid].cc_kprocess(krp); - if (error) { - krp-krp_status = error; - crypto_kdone(krp); - } - splx(s); - return (0); -} - /* * Dispatch a crypto request to the appropriate crypto devices. */ int crypto_invoke(struct cryptop *crp) @@ -624,40 +534,5 @@ crypto_done(struct cryptop *crp) task_set(crp-crp_task, (void (*))crp-crp_callback, crp, NULL); task_add(crypto_taskq, crp-crp_task); } } - -/* - * Invoke the callback on behalf of the driver. - */ -void -crypto_kdone(struct cryptkop *krp) -{ - task_set(krp-krp_task, (void (*))krp-krp_callback, krp, NULL); - task_add(crypto_taskq, krp-krp_task); -} - -int -crypto_getfeat(int *featp) -{ - extern int cryptodevallowsoft, userasymcrypto; - int hid, kalg, feat = 0; - - if (userasymcrypto == 0) - goto out; - for (hid = 0; hid crypto_drivers_num; hid++) { - if ((crypto_drivers[hid].cc_flags CRYPTOCAP_F_SOFTWARE) - cryptodevallowsoft == 0) { - continue; - } - if
/dev/crypto removal (3/3): userland bits
please note that the commented out example usage in etc/MAKEDEV.common remains till someone feels the need to change it. OK? diff --git etc/MAKEDEV.common etc/MAKEDEV.common index bfcd943..b656d46 100644 --- etc/MAKEDEV.common +++ etc/MAKEDEV.common @@ -131,11 +131,10 @@ target(all, wd, 0, 1, 2, 3)dnl target(all, xd, 0, 1, 2, 3)dnl target(all, systrace)dnl target(all, pctr)dnl target(all, pctr0)dnl target(all, pf)dnl -twrget(all, cry, crypto)dnl target(all, apm)dnl target(all, acpi)dnl twrget(all, tth, ttyh, 0, 1)dnl target(all, ttyA, 0, 1)dnl target(all, ttyB, 0, 1, 2, 3, 4, 5)dnl @@ -446,12 +445,10 @@ __devitem(fdesc, fd, fd/* nodes, fd)dnl _mkdev(fdesc, fd, {-RMlist[${#RMlist[*]}]=;mkdir -p fd;rm -f n=0 while [ $n -lt 64 ];do M fd/$n c major_fdesc_c $n;n=Add($n, 1);done MKlist[${#MKlist[*]}]=;chmod 555 fd-})dnl __devitem(oppr, openprom,PROM settings,openprom)dnl _cdev(oppr, openprom, 70, 0)dnl -__devitem(cry, crypto, Hardware crypto access driver,crypto)dnl -_mkdev(cry, crypto, {-M crypto c major_cry_c-} 0)dnl __devitem(pf, pf*, Packet Filter)dnl _mkdev(pf, {-pf*-}, {-M pf c major_pf_c 0 600-})dnl __devitem(bpf, bpf*, Berkeley Packet Filter)dnl _mkdev(bpf, {-bpf*-}, {-M bpf$U c major_bpf_c $U 600-}, 600)dnl _mkdev(tun, {-tun*-}, {-M tun$U c major_tun_c $U 600-}, 600)dnl diff --git etc/etc.alpha/MAKEDEV etc/etc.alpha/MAKEDEV index 2dd9859..050aa4e 100644 --- etc/etc.alpha/MAKEDEV +++ etc/etc.alpha/MAKEDEV @@ -68,11 +68,10 @@ # Special purpose devices: # audio* Audio devices # bio ioctl tunnel pseudo-device # bktr* Video frame grabbers # bpf*Berkeley Packet Filter -# crypto Hardware crypto access driver # diskmap Disk mapper # fd fd/* nodes # fuseUserland Filesystem # hotplug devices hot plugging # lkm Loadable kernel modules interface @@ -324,14 +323,10 @@ fd) diskmap) M diskmap c 63 0 640 operator ;; -crypto) - M crypto c 57 0 - ;; - bpf*) M bpf$U c 11 $U 600 ;; bktr*) @@ -539,11 +534,11 @@ all) R local wscons pci0 pci1 pci2 pci3 uall rmidi0 rmidi1 rmidi2 R rmidi3 rmidi4 rmidi5 rmidi6 rmidi7 tuner0 radio0 speaker R video0 video1 uk0 random lpa0 lpa1 lpa2 lpt0 lpt1 lpt2 lkm R tty00 tty01 tty02 tty03 tty04 tty05 tty06 tty07 tty08 tty09 R tty0a tty0b ttyc0 ttyc1 ttyc2 ttyc3 ttyc4 ttyc5 ttyc6 ttyc7 - R ttyB0 ttyB1 ttyB2 ttyB3 ttyB4 ttyB5 crypto pf systrace wd0 + R ttyB0 ttyB1 ttyB2 ttyB3 ttyB4 ttyB5 pf systrace wd0 R wd1 wd2 wd3 std st0 st1 fd ;; wd*|sd*) case $i in diff --git etc/etc.amd64/MAKEDEV etc/etc.amd64/MAKEDEV index 2971021..4a0d05c 100644 --- etc/etc.amd64/MAKEDEV +++ etc/etc.amd64/MAKEDEV @@ -68,11 +68,10 @@ # apm Power Management Interface # audio* Audio devices # bio ioctl tunnel pseudo-device # bktr* Video frame grabbers # bpf*Berkeley Packet Filter -# crypto Hardware crypto access driver # diskmap Disk mapper # drm*Direct Rendering Manager # fd fd/* nodes # fuseUserland Filesystem # gpio* General Purpose Input/Output @@ -344,14 +343,10 @@ drm*) diskmap) M diskmap c 90 0 640 operator ;; -crypto) - M crypto c 70 0 - ;; - bpf*) M bpf$U c 23 $U 600 ;; bktr*) @@ -565,11 +560,11 @@ all) R wscons pci0 pci1 pci2 pci3 uall rmidi0 rmidi1 rmidi2 rmidi3 R rmidi4 rmidi5 rmidi6 rmidi7 tuner0 radio0 speaker video0 R video1 uk0 random lpa0 lpa1 lpa2 lpt0 lpt1 lpt2 lkm tty00 R tty01 tty02 tty03 tty04 tty05 tty06 tty07 tty08 tty09 tty0a R tty0b ttyc0 ttyc1 ttyc2 ttyc3 ttyc4 ttyc5 ttyc6 ttyc7 apm - R crypto pf pctr systrace wd0 wd1 wd2 wd3 std st0 st1 fd + R pf pctr systrace wd0 wd1 wd2 wd3 std st0 st1 fd ;; wd*|sd*) case $i in wd*) dodisk wd $U 0 3 $U 0;; diff --git etc/etc.armish/MAKEDEV etc/etc.armish/MAKEDEV index 7a3a21e..25fd4d4 100644 --- etc/etc.armish/MAKEDEV +++ etc/etc.armish/MAKEDEV @@ -62,11 +62,10 @@ # apm Power management device # audio* Audio devices # bio ioctl tunnel pseudo-device # bktr* Video frame grabbers # bpf*Berkeley Packet Filter -# crypto Hardware crypto access driver # diskmap Disk mapper # fd fd/* nodes # fuseUserland Filesystem # hotplug devices hot plugging # lkm Loadable kernel modules interface @@ -302,14 +301,10 @@ fd) diskmap) M diskmap c 102 0 640 operator ;; -crypto) - M crypto c 47 0 - ;; - bpf*) M bpf$U c 22 $U 600 ;; bktr*) @@ -481,11 +476,11 @@ all) R bpf5 bpf6 bpf7 bpf8 bpf9 pty0 diskmap vscsi0 ch0 audio0 R audio1 audio2 fuse pppx hotplug ptm local wscons pci0 pci1 R pci2 pci3 uall rmidi0 rmidi1 rmidi2 rmidi3 rmidi4 rmidi5
Re: /dev/crypto removal (2/3): remove kernel support
On Mon, Aug 18, 2014 at 16:03 +0200, Mike Belopuhov wrote: this removes /dev/crypto device interface and public key functionality that is only usable via /dev/crypto. OK? minor correction: preserve #ifdef _KERNEL in the cryptodev.h. while there are no userland programs including it atm leaving it for the symmetry with cryptosoft.c seems like a good idea. diff --git sys/conf/files sys/conf/files index 3941639..9af78cc 100644 --- sys/conf/files +++ sys/conf/files @@ -870,11 +870,10 @@ file crypto/blf.c (inet ipsec) | crypto | vnd file crypto/cast.c (inet ipsec) | crypto file crypto/ecb_enc.c (inet ipsec) | crypto file crypto/set_key.c (inet ipsec) | crypto file crypto/ecb3_enc.c (inet ipsec) | crypto file crypto/crypto.c (inet ipsec) | crypto -file crypto/cryptodev.c((inet ipsec) | crypto) needs-flag file crypto/criov.c(inet ipsec) | crypto file crypto/cryptosoft.c (inet ipsec) | crypto file crypto/xform.c(inet ipsec) | crypto file crypto/xform_ipcomp.c (inet ipsec) | crypto file crypto/arc4.c diff --git sys/crypto/crypto.c sys/crypto/crypto.c index f62752d..4a11ae3 100644 --- sys/crypto/crypto.c +++ sys/crypto/crypto.c @@ -280,39 +280,10 @@ crypto_get_driverid(u_int8_t flags) /* * Register a crypto driver. It should be called once for each algorithm * supported by the driver. */ int -crypto_kregister(u_int32_t driverid, int *kalg, -int (*kprocess)(struct cryptkop *)) -{ - int s, i; - - if (driverid = crypto_drivers_num || kalg == NULL || - crypto_drivers == NULL) - return EINVAL; - - s = splvm(); - - for (i = 0; i = CRK_ALGORITHM_MAX; i++) { - /* -* XXX Do some performance testing to determine -* placing. We probably need an auxiliary data -* structure that describes relative performances. -*/ - - crypto_drivers[driverid].cc_kalg[i] = kalg[i]; - } - - crypto_drivers[driverid].cc_kprocess = kprocess; - - splx(s); - return 0; -} - -/* Register a crypto driver. */ -int crypto_register(u_int32_t driverid, int *alg, int (*newses)(u_int32_t *, struct cryptoini *), int (*freeses)(u_int64_t), int (*process)(struct cryptop *)) { int s, i; @@ -410,71 +381,10 @@ crypto_dispatch(struct cryptop *crp) } return 0; } -int -crypto_kdispatch(struct cryptkop *krp) -{ - if (crypto_taskq) { - task_set(krp-krp_task, (void (*))crypto_kinvoke, krp, NULL); - task_add(crypto_taskq, krp-krp_task); - } else { - crypto_kinvoke(krp); - } - - return 0; -} - -/* - * Dispatch an asymmetric crypto request to the appropriate crypto devices. - */ -int -crypto_kinvoke(struct cryptkop *krp) -{ - extern int cryptodevallowsoft; - u_int32_t hid; - int error; - int s; - - /* Sanity checks. */ - if (krp == NULL || krp-krp_callback == NULL) - return (EINVAL); - - s = splvm(); - for (hid = 0; hid crypto_drivers_num; hid++) { - if ((crypto_drivers[hid].cc_flags CRYPTOCAP_F_SOFTWARE) - cryptodevallowsoft == 0) - continue; - if (crypto_drivers[hid].cc_kprocess == NULL) - continue; - if ((crypto_drivers[hid].cc_kalg[krp-krp_op] - CRYPTO_ALG_FLAG_SUPPORTED) == 0) - continue; - break; - } - - if (hid == crypto_drivers_num) { - krp-krp_status = ENODEV; - crypto_kdone(krp); - splx(s); - return (0); - } - - krp-krp_hid = hid; - - crypto_drivers[hid].cc_koperations++; - - error = crypto_drivers[hid].cc_kprocess(krp); - if (error) { - krp-krp_status = error; - crypto_kdone(krp); - } - splx(s); - return (0); -} - /* * Dispatch a crypto request to the appropriate crypto devices. */ int crypto_invoke(struct cryptop *crp) @@ -624,40 +534,5 @@ crypto_done(struct cryptop *crp) task_set(crp-crp_task, (void (*))crp-crp_callback, crp, NULL); task_add(crypto_taskq, crp-crp_task); } } - -/* - * Invoke the callback on behalf of the driver. - */ -void -crypto_kdone(struct cryptkop *krp) -{ - task_set(krp-krp_task, (void (*))krp-krp_callback, krp, NULL); - task_add(crypto_taskq, krp-krp_task); -} - -int -crypto_getfeat(int *featp) -{ - extern int cryptodevallowsoft, userasymcrypto; - int hid, kalg, feat = 0; - - if (userasymcrypto == 0) - goto out; - for (hid
/dev/crypto removal (3bis): DTYPE_CRYPTO
I don't know if we recycle them somehow, but just in case... diff --git sys/sys/file.h sys/sys/file.h index d98118e..64c0f31 100644 --- sys/sys/file.h +++ sys/sys/file.h @@ -67,11 +67,11 @@ struct file { short f_flag; /* see fcntl.h */ #defineDTYPE_VNODE 1 /* file */ #defineDTYPE_SOCKET2 /* communications endpoint */ #defineDTYPE_PIPE 3 /* pipe */ #defineDTYPE_KQUEUE4 /* event queue */ -#defineDTYPE_CRYPTO5 /* crypto */ +/* was define DTYPE_CRYPTO5 */ #defineDTYPE_SYSTRACE 6 /* system call tracing */ short f_type; /* descriptor type */ longf_count;/* reference count */ longf_msgcount; /* references from message queue */ struct ucred *f_cred; /* credentials associated with descriptor */
Re: pf: once for match rules?
On Tue, Aug 12, 2014 at 18:26 +0200, Mike Belopuhov wrote: On Tue, Jul 22, 2014 at 19:03 +0200, Mike Belopuhov wrote: Hi, Before I send a diff for pfctl to disable once on match rules, I've decided to try and see how much work is it to make it actually work. Turns out that I need to extend pf_rule_item by 3 pointers to track the match rule ruleset, anchor rule and the ruleset it belongs to. Here's what this means in practice. Consider a ruleset: block drop all match out log proto tcp to port 22 once anchor foo all { match out log proto tcp to port 22 once anchor bar all { match out log proto tcp to port 22 once pass out quick proto tcp to port 22 once } } Once we send a packet to port 22 the ruleset collapses to just: block drop all Thoughts? Henning thinks it's a bit of an overkill. Any other opinions? here we go then. OK? diff --git sbin/pfctl/parse.y sbin/pfctl/parse.y index c277b8d..61c2646 100644 --- sbin/pfctl/parse.y +++ sbin/pfctl/parse.y @@ -1488,12 +1488,18 @@ pfrule : action dir logquick interface af proto fromto if ($8.marker FOM_SETPRIO) { r.set_prio[0] = $8.set_prio[0]; r.set_prio[1] = $8.set_prio[1]; r.scrub_flags |= PFSTATE_SETPRIO; } - if ($8.marker FOM_ONCE) + if ($8.marker FOM_ONCE) { + if (r.action == PF_MATCH) { + yyerror(can't specify once for + match rules); + YYERROR; + } r.rule_flag |= PFRULE_ONCE; + } if ($8.marker FOM_AFTO) r.rule_flag |= PFRULE_AFTO; r.af = $5; if ($8.tag)
Re: let vlan(4) mtu be limited by the parents hardmtu instead of current mtu
On 27 August 2014 13:17, David Gwynne da...@gwynne.id.au wrote: On Tue, Aug 26, 2014 at 09:11:14PM -0400, Brad Smith wrote: On 20/08/14 8:03 PM, David Gwynne wrote: sthen@ says this is likely a bit optimistic. while most of our drivers unconditionally configure their max mru, there's some stupid ones that still interpret the configured mtu as a what the mru should be. dlg oce(4) and ix(4) need to be fixed. this might fix ix(4). anyone able to test? ix and oce (internally though) calculate various paramethers related to flow control, internal buffer sizes and so on based on the maximum frame size. therefore if i set max frame size to 16k but always use 1,5k all these calculations will be off. and by killing SIOCSIFMTU you make it impossible for the driver to readjust them. we have no idea at the moment how will this affect nic's behavior. pieces of such black magic such as the code under /* Hardware Packet Buffer Flow Control setup */ comment must be studied before making such changes. and as i've mentioned to dlg, ix diff also switches it to using 16k cluster pool no matter what you actually need. this can be changed, but needs testing (i'm doing it now).
Re: let vlan(4) mtu be limited by the parents hardmtu instead of current mtu
On 27 August 2014 13:23, David Gwynne da...@gwynne.id.au wrote: On Tue, Aug 26, 2014 at 09:11:14PM -0400, Brad Smith wrote: On 20/08/14 8:03 PM, David Gwynne wrote: sthen@ says this is likely a bit optimistic. while most of our drivers unconditionally configure their max mru, there's some stupid ones that still interpret the configured mtu as a what the mru should be. dlg oce(4) and ix(4) need to be fixed. this might fix oce. can anyone test? works fine here. i think we can go with this one. OK mikeb
Re: bge(4) Jumbo support for newer chipsets
On 27 August 2014 08:25, Brad Smith b...@comstyle.com wrote: Looking for some testing of the following diff to add Jumbo support for the BCM5714 / BCM5780 and BCM5717 / BCM5719 / BCM5720 / BCM57765 / BCM57766 chipsets. i have tested this on Broadcom BCM5719 rev 0x01, unknown BCM5719 (0x5719001), APE firmware NCSI 1.1.15.0 and Broadcom BCM5714 rev 0xa3, BCM5715 A3 (0x9003). it works, however i'm not strictly a fan of switching the cluster pool to larger one for 5714. wasting another 8k page (on sparc for example) for every rx cluster in 90% cases sounds kinda wrong to me. but ymmv. apart from that there's a deficiency in the diff itself. you probably want to change MCLBYTES in bge_rxrinfo to bge_rx_std_len otherwise statistics look wrong. i'm certainly OK with !5714 part of the diff.
Re: bge(4) Jumbo support for newer chipsets
On 28 August 2014 12:32, David Gwynne da...@gwynne.id.au wrote: On 28 Aug 2014, at 3:02 am, Mike Belopuhov m...@belopuhov.com wrote: On 27 August 2014 08:25, Brad Smith b...@comstyle.com wrote: Looking for some testing of the following diff to add Jumbo support for the BCM5714 / BCM5780 and BCM5717 / BCM5719 / BCM5720 / BCM57765 / BCM57766 chipsets. i have tested this on Broadcom BCM5719 rev 0x01, unknown BCM5719 (0x5719001), APE firmware NCSI 1.1.15.0 and Broadcom BCM5714 rev 0xa3, BCM5715 A3 (0x9003). it works, however i'm not strictly a fan of switching the cluster pool to larger one for 5714. wasting another 8k page (on sparc for example) for every rx cluster in 90% cases sounds kinda wrong to me. but ymmv. this is what MCLGETI was invented to solve though. comparing pre mclgeti to what this does: that doesn't make my point invalid though. a 5714 right now without jumbos would have 512 rings entries with 2048 bytes on each. 2048 * 512 is a 1024k of ram. if we bumped the std ring up to jumbos by default, 9216 * 512 would eat 4608k of ram. your calculation is a bit off. it's not 9216 * 512, in case of sparc64 it's 8k * 2 * 512 which is 8M. but my concern is different: you ask uvm to do more work for every cluster since now you need 2 consequent pages of memory for one cluster instead of just one that fits 8k/2k = 4 clusters. my boxes with bge with mclgeti generally sit around 40 clusters, but sometimes end up around 80. 80 * 9216 is 720k. we can have jumbos and still be ahead. if you compare the nics with split rings: 512 * 2048 + 256 * 9216 is ~3.3M. the same chip with mclgeti and only doing a 1500 byte workload would be 80 * 2048 + 17 * 9216, or 300k. apart from that there's a deficiency in the diff itself. you probably want to change MCLBYTES in bge_rxrinfo to bge_rx_std_len otherwise statistics look wrong. yeah. i have tested both 1500 and 9000 mtus on a 5714 and it is working well. as you say, 5719 seems to be fine too, but ive only tested it with mtu 1500. ill test 9k tomorrow. it needs tests on older chips too though.
Re: minphys woes
On 29 August 2014 11:26, Stefan Fritsch s...@sfritsch.de wrote: On Fri, 29 Aug 2014, Miod Vallat wrote: sc-sc_xfer_max is computed according to the host's capabilities. What I want to simulate with this diff is a host adapter that can only cope with transfers 64k == MAXPHYS. Back to your original problem, you might want to print the sc_link struct as well the scsi_adapter struct it points to, when you detect a transfer larger than MAXPHYS. It has likely been overriden or reset to NULL by mistake at some point. OK, I will try that. Will take a bit until I have time, though. But I have also read the code and I could not find a place in the path from bread() to the scsi adapter cmd function where the minphys function is called. correct me if i'm wrong, but what happens is that bread being a block read reads up to MAXBSIZE which is conveniently set to 64k and you can't create a filesystem with a larger block size. physio (raw device io) however doesn't go through bread and need to know how split the provided buffer in separate transactions hence minphys.
Re: newfs.8
On 29 August 2014 08:19, Jason McIntyre j...@kerhand.co.uk wrote: is this correct? i'm not a user myself, but the text states that special, for mount_mfs, is typically that of the primary swap area. how would you even define the swap area without a disklabel? jmc sort of yes. mount_mfs(8) says this: [...] The special file is only used to read the disk label which provides a set of configuration parameters for the memory based file system. The special file is typically that of the primary swap area, since that is where the file system will be backed up when free memory gets low and the memory supporting the file system has to be paged. If the keyword ``swap'' is used instead of a special file name, default configuration parameters will be used. (This option is useful when trying to use mount_mfs on a machine without any disks.) in reality it fakes up a disklabel and proceeds. number of XXX's around this code is also mildly amusing...
Re: newfs.8
On 29 August 2014 13:44, Jason McIntyre j...@kerhand.co.uk wrote: On Fri, Aug 29, 2014 at 01:39:57PM +0200, Mike Belopuhov wrote: On 29 August 2014 08:19, Jason McIntyre j...@kerhand.co.uk wrote: is this correct? i'm not a user myself, but the text states that special, for mount_mfs, is typically that of the primary swap area. how would you even define the swap area without a disklabel? jmc sort of yes. mount_mfs(8) says this: [...] The special file is only used to read the disk label which provides a set of configuration parameters for the memory based file system. The special file is typically that of the primary swap area, since that is where the file system will be backed up when free memory gets low and the memory supporting the file system has to be paged. If the keyword ``swap'' is used instead of a special file name, default configuration parameters will be used. (This option is useful when trying to use mount_mfs on a machine without any disks.) in reality it fakes up a disklabel and proceeds. number of XXX's around this code is also mildly amusing... so the diff posted was correct and should be committed? jmc yes, i believe so. OK mikeb
Re: reduce the number of missed PCB cache with tcpbench -su
Daniel, don't reply anything to Damien just yet. Can you please run a simple test on Monday. Try tcpbench -u -n 2 ip (as in multi- connection test) without your patch and then with the patch and see if behavior is changed. Thanks On 29 August 2014 18:01, Damien Miller d...@mindrot.org wrote: On Fri, 29 Aug 2014, Daniel Jakots wrote: Hi, When running tcpbench -su, a lot of them are counted as missed PCB cache. ... + n = recvfrom(fd, ptb-dummybuf, ptb-dummybuf_len, 0, + (struct sockaddr *)ss, slen); + if (n 0 connect(fd, (const struct sockaddr *)ss, slen)) + warn(fail to connect); What's the benefit of this? I've never seen an application do this; wouldn't it be better to improve the PCB cache so it caught this case (which seems the usual way UDP applications behave) instead? -d
Re: minphys woes
On 29 August 2014 22:39, Stefan Fritsch s...@sfritsch.de wrote: On Fri, 29 Aug 2014, Mike Belopuhov wrote: correct me if i'm wrong, but what happens is that bread being a block read reads up to MAXBSIZE which is conveniently set to 64k and you can't create a filesystem with a larger block size. physio (raw device io) however doesn't go through bread and need to know how split the provided buffer in separate transactions hence minphys. Yes, that seems to be what happens. But if every adapter needs to support transfers of MAXBSIZE == MAXPHYS anyway, there would be no need for the adapter to be able to override the default minphys function with its own. And adapters that only support smaller transfers would need to have logic in their driver to be able to split the transfer into smaller chunks. i believe that if you start digging you realise that (at least at some point) the least common denominator is (or was) 64k meaning that even the shittiest controller on vax can do 64k. which means that we don't have code for a filesystem or buffercache to probe the controller for a supported transfer size. I think it makes more sense to have that logic in one place to be used by all drivers. perhaps, but unless the filesystem will issue breads of larger blocks the only benefit would be physio itself which doesn't really justify the change.
Re: reduce the number of missed PCB cache with tcpbench -su
On 29 August 2014 18:01, Damien Miller d...@mindrot.org wrote: What's the benefit of this? This creates a UDP PCB per connection. Otherwise we always rely on matching the wildcard PCB. I've never seen an application do this; I doubt that. However, things like NTP or DNS servers usually expect requests from everyone so they don't do it. Now I have actually identified one use case that this diff breaks: multiple senders won't work. Which is kind of annoying. wouldn't it be better to improve the PCB cache so it caught this case (which seems the usual way UDP applications behave) instead? Definitely! Using something like a prefix tree (mickey advocates hash array mapped tries) would eliminate this second lookup and also solve problems with attacking PCB hash with collisions and resizing the hash table. -d
Re: minphys woes
On 1 September 2014 13:06, Stefan Fritsch s...@sfritsch.de wrote: On Mon, 1 Sep 2014, Mike Belopuhov wrote: On 29 August 2014 22:39, Stefan Fritsch s...@sfritsch.de wrote: Yes, that seems to be what happens. But if every adapter needs to support transfers of MAXBSIZE == MAXPHYS anyway, there would be no need for the adapter to be able to override the default minphys function with its own. And adapters that only support smaller transfers would need to have logic in their driver to be able to split the transfer into smaller chunks. i believe that if you start digging you realise that (at least at some point) the least common denominator is (or was) 64k meaning that even the shittiest controller on vax can do 64k. which means that we don't have code for a filesystem or buffercache to probe the controller for a supported transfer size. That's possible. But what is really strange is, why does openbsd then have an infrastructure to set different max transfer sizes for physio on a per-adapter basis? This makes no sense. Either the drivers have to support 64k transfers, in which case most of the minphys infrastructure is useless, or they don't have to. In the latter case the minphys infrastructure would need to be used in all code paths. i haven't found a controller that does less than MAXPHYS. perhaps they meant to improve the situation but stopped short. I think it makes more sense to have that logic in one place to be used by all drivers. perhaps, but unless the filesystem will issue breads of larger blocks the only benefit would be physio itself which doesn't really justify the change. You lost me there. Why should this depend on how many requests some filesystem makes with larger blocks? If there is the possibility that filesystems do such requests, someone needs to do the splitting of the requests. The question is who. The driver or the block layer. well, the filesystem doesn't issue requests for more than 64k at a time. newfs won't allow you to build a 128k block file filesystem. BTW, for my use case I don't actually want to limit the block size, but rather the number of DMA segments. But the most reasonable way to do that seemed to set minphys to max_segments * pagesize. If we change how these things work, we could take the number of DMA segments into account. can't help you with this one.
Re: bge(4) Jumbo support for newer chipsets
On 2 September 2014 03:54, Brad Smith b...@comstyle.com wrote: On Wed, Aug 27, 2014 at 02:25:27AM -0400, Brad Smith wrote: Looking for some testing of the following diff to add Jumbo support for the BCM5714 / BCM5780 and BCM5717 / BCM5719 / BCM5720 / BCM57765 / BCM57766 chipsets. Here is an updated diff with bge_rxrinfo() being fixed. get it in. OK mikeb
Re: splnet() and SIOCSIFADDR
On 3 September 2014 15:53, Martin Pieuchot mpieuc...@nolizard.org wrote: On 03/09/14(Wed) 15:25, Martin Pieuchot wrote: Drivers that need a splnet() protection inside their SIOCSIFADDR generally raise the spl level themselves, so we should not need to do that in in{6,}_ifinit(). One exception to this rule is, as always, carp(4)... So the diff below moves the spl dance inside carp's SIOCSIFADDR handler, it's a baby step, so we can take care of carp_set_addr{6,} later. mikeb@ pointed out that a spl dance was missing from the SIOCDIFADDR_IN6 ioctl, updated diff below also fixes that. i think we're ready to step on a carefully placed landmine. OK mikeb
Re: RTFREE - rtfree
On 8 October 2014 12:24, Martin Pieuchot mpieuc...@nolizard.org wrote: Diff below kills the macro and use the fonction instead since they are equivalent. It also replaces some 0 - NULL where it applies. It does not include the manpage bits, I'll deal with that afterward. I'd appreciate reviews and oks. although syn_cache_put doesn't really require nullification, i'm ok with the diff.
Re: improving OpenBSD's gmac.c...
On 8 October 2014 00:48, John-Mark Gurney j...@funkthat.com wrote: Christian Weisgerber wrote this message on Tue, Oct 07, 2014 at 23:08 +0200: John-Mark Gurney: So, as I was working on FreeBSD's implementation of gmac.c, I noticed that I was able to get a significant speed up by using a mask instead of an if branch in ghash_gfmul in gmac.c from OpenBSD... Add a mask var and replace the code between the comments update Z and update V w/: mask = !!(x[i 3] (1 (~i 7))); mask = ~(mask - 1); z[0] ^= v[0] mask; z[1] ^= v[1] mask; z[2] ^= v[2] mask; z[3] ^= v[3] mask; And you should see a nice performance increase... I tried this on a Soekris net6501-50 and the performance increase was around 1.3%. (I set up an ESP transport association with AES-128-GMAC and pushed UDP traffic with tcpbench over it.) Yeh, I benchmarked the raw algo in userland, not part of IPsec. I forget the resulting perf increase, but it was well more than 10-20%. A look at the generated amd64 assembly code shows that the change indeed removes a branch. What's pretty shocking is that this code mul = v[3] 1; ... v[0] = (v[0] 1) ^ (0xe100 * mul); is turned into an actual imul instruction by GCC. I used the same masking approach to get rid of the multiplication, but the improvement was minuscule (1%). Hmm. interesting... In my code I switched both to using the and operator... I also have code which switches the registers to 64bits so that on amd64, we make better uses of registers, and on i386, the compilers are good at breaking down the 64bit registers to 32bit w/o extra work... I also have an implementation of ghash that does a 4 bit lookup table version with the table split between cache lines in p4 at: https://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/opencrypto/sys/opencrypto/gfmult.cREV=4 I'll have to look at this, but haven't there been increasing misgivings about table implementations for GHASH because of timing attacks? Well, the code avoids one issue but introduces another issue... Each table lookup accesses all four lines (assuming 64 byte cache lines), so there isn't a problem there... The issue intrroduded is that the first 64 bits of a cache line are faster to access than the remaining bits... For more info, look at the NSS bug: https://bugzilla.mozilla.org/show_bug.cgi?id=868948 Though considering that the AES implementation that FreeBSD is still using is the standard table based AES, there are also timing issues there too... Yes, no excuse for opening up additional windows... I have thought about introducing an option of slow but secure and fast but insecure against local attackers... Since we are already on the fast but insecure against local attackers path, I figured I'll take the extra performance... As with all things, there is a trade off between speed and security... As most IPsec gateways don't have a local user trying to target the key, this is less of an issue... And even if they did, it'd probably be easier to use a local root exploit to get the key than try to mount a timing attack to extract a key that might change in an hour or a day... -- John-Mark Gurney Voice: +1 415 225 5579 All that I will do, has been done, All that I have, has not. Hi, I've talked to Theo and it looks like we'll be importing your GF2 multiplication library as is. I think we should concentrate on making table version of gmac.c work better. Cheers, Mike
Re: A system without interface?
On 14 October 2014 11:01, Martin Pieuchot mpieuc...@nolizard.org wrote: On 08/10/14(Wed) 14:29, Martin Pieuchot wrote: I'm looking after the uses of the global list of interface. These ones are pointless, you always have at least one interface on your system. Ok? Anyone? looks good to me. ok mikeb
Re: pool page colouring
On 28 October 2014 17:02, Ted Unangst t...@tedunangst.com wrote: On Tue, Oct 28, 2014 at 16:49, David Gwynne wrote: when i shuffled the locking in pools around, page colouring was left behind. page colouring is where you offset items within a page if you have enough slack space. the previous implementation simply incremented the colour so each new page got the next offset. i didnt do this because the page and its items are now initted outside the lock, so maintaining that curcolour iterator wasnt as easy. this sidesteps the curcolor maintenance by just having each page randomly pick a colour when it's set up. Would it make more sense to use the page address to pick the color? Does it actually still make sense to keep page coloring? Is there still benefit on modern hardware?
Re: pool page colouring
On 29 October 2014 22:52, Ted Unangst t...@tedunangst.com wrote: On Wed, Oct 29, 2014 at 07:25, David Gwynne wrote: i dunno. im fine with either removing colouring altogether or setting it from something else completely. i just want a decision to be made cos right now ph_color isnt set, which is a bug. there. i fixed it. so is there any performance difference?
Re: Multipath for HOST p2p routes
On 4 November 2014 12:51, Martin Pieuchot mpieuc...@nolizard.org wrote: How are we suppose to support configuration with multiple p2p interfaces pointing to the same destination address? Right now only one route to host is added. Diff below replaces a hack that move a host route from one p2p interface to another by always installing MPATH routes. It assumes that multiples p2p interfaces configured to the same destination address will have different local addresses. ok? After discussing it with Martin it sounds like a good idea. There's some semi-unrelated cleanup in in_ifinit that he has promised to commit separately. OK mikeb for both changes.
Re: Kill in_iawithaddr()
On 4 November 2014 12:52, Martin Pieuchot mpieuc...@nolizard.org wrote: This function is just a wrapper around ifa_ifwithaddr() and I'd prefer to have less function iterating over the global list of interfaces. ok? what's not immediately apparent is that it also makes sure that the address that ifa_ifwithaddr returns is not a broadcast address. oh god. ok for the cleanup
Re: network pool names
On 4 November 2014 13:23, Martin Pieuchot mpieuc...@nolizard.org wrote: Remove pl suffix, ok? ok with a syncache instead of syn
Re: pool page colouring
On 5 November 2014 01:12, Mike Belopuhov m...@belopuhov.com wrote: well, first of all, right now this is a rather theoretical gain. we need to test it to understand if it makes things easier. err. i meant to say go faster not easier.
Re: pool page colouring
On 5 November 2014 00:38, David Gwynne da...@gwynne.id.au wrote: On 30 Oct 2014, at 07:52, Ted Unangst t...@tedunangst.com wrote: On Wed, Oct 29, 2014 at 07:25, David Gwynne wrote: i dunno. im fine with either removing colouring altogether or setting it from something else completely. i just want a decision to be made cos right now ph_color isnt set, which is a bug. there. i fixed it. looks like we were both ignorant and wrong. mikeb@ points out this from the original slab paper: 4.1. Impact of Buffer Address Distribution on Cache Utilization The address distribution of mid-size buffers can affect the system’s overall cache utilization. In par- ticular, power-of-two allocators - where all buffers are 2 n bytes and are 2 n -byte aligned - are pes- simal.* Suppose, for example, that every inode (∼ 300 bytes) is assigned a 512-byte buffer, 512-byte aligned, and that only the first dozen fields of an inode (48 bytes) are frequently referenced. Then the majority of inode-related memory traffic will be at addresses between 0 and 47 modulo 512. Thus the cache lines near 512-byte boundaries will be heavily loaded while the rest lie fallow. In effect only 9% (48/512) of the cache will be usable by inodes. Fully-associative caches would not suffer this problem, but current hardware trends are toward simpler rather than more complex caches. 4.3. Slab Coloring The slab allocator incorporates a simple coloring scheme that distributes buffers evenly throughout the cache, resulting in excellent cache utilization and bus balance. The concept is simple: each time a new slab is created, the buffer addresses start at a slightly different offset (color) from the slab base (which is always page-aligned). For example, for a cache of 200-byte objects with 8-byte alignment, the first slab’s buffers would be at addresses 0, 200, 400, ... relative to the slab base. The next slab’s buffers would be at offsets 8, 208, 408, ... and so on. The maximum slab color is determined by the amount of unused space in the slab. we run on enough different machines that i think we should consider this. well, first of all, right now this is a rather theoretical gain. we need to test it to understand if it makes things easier. to see cache statistics we can use performance counters, however current pctr code might be a bit out of date. so the question is if we do bring colouring back, how do we calculate it? arc4random? mask bits off ph_magic? atomic_inc something in the pool? read a counter from the pool? shift bits off the page address? the way i read it is that you have a per-pool running value pr_color that you increment by the item alignment or native cache line size modulo space available for every page you are getting from uvm. however i can see that it might entail a problem by locating a page header (or was it page boundary? don't have the code at hand) using simple math.
Re: iked responds with esp over external ips.
On 4 November 2014 17:06, Martin Larsson martin.larss...@gmail.com wrote: Hello! I've setup a tunnel between OpenBSD 5.6 using iked and an openwrt router running strongswan. The tunnel works great with ping and other traffic but traffic between the two external ip's dies. This is a site-to-site connection and nothing fancy. iked.conf on OpenBSD. ikev2 esp from $10.11.12.0/24 to $194.168.4.0/24 peer $tcgw srcid sippan.se # ipsecctl -sa FLOWS: flow esp in from 192.168.4.0/24 to 10.11.12.0/24 peer 82.17.12.21 srcid FQDN/sippan.se dstid FQDN/sswan.sippan.se type use flow esp out from 10.11.12.0/24 to 192.168.4.0/24 peer 82.17.12.21 srcid FQDN/sippan.se dstid FQDN/sswan.sippan.se type require flow esp out from ::/0 to ::/0 type deny SAD: esp tunnel from 82.17.12.21 to 130.51.23.4 spi 0x67483925 auth hmac-sha1 enc aes esp tunnel from 130.51.23.4 to 82.17.12.21 spi 0xcf1f39d1 auth hmac-sha1 enc aes # netstat -nr Routing tables Internet: DestinationGatewayFlags Refs Use Mtu Prio Iface default130.51.23.4 UGS 10 30430256 - 8 em0 10/8 link#5 UC 10 - 4 vether0 10.11.12.13fe:e1:ba:d0:d6:1c UHLl 01 - 1 lo0 10.255.255.255 link#5 UHLc 3 570 - 4 vether0 82.17.12.21 130.51.23.4 UGHD 0 30430251 - L 56 em0 127/8 127.0.0.1 UGRS 00 32768 8 lo0 127.0.0.1 127.0.0.1 UH 16 32768 4 lo0 194.48.213.128/27 link#1 UC 10 - 4 em0 130.51.23.4 00:00:cd:19:95:16 UHLc 20 - 4 em0 130.51.23.4 00:02:b3:aa:cc:c3 HLl00 - 1 lo0 224/4 127.0.0.1 URS00 32768 8 lo0 Internet6: -removed, dont use it- Encap: Source Port DestinationPort Proto SA(Address/Proto/Type/Direction) 192.168.4/24 0 10.11.12/240 0 82.17.12.21/esp/use/in 10.11.12/240 192.168.4/24 0 0 82.17.12.21/esp/require/out default0 default 0 0 none/esp/deny/out # tcpdump on openbsd while trying to connect with ssh to the external ip of the OpenBSD host from the exernal ip of the other end. # tcpdump host 82.17.12.21 tcpdump: listening on em0, link-type EN10MB tcpdump: WARNING: compensating for unaligned libpcap packets 16:49:55.539903 egget.priv.lamest.se.54158 loller.sippan.se.ssh: S 2729317717:2729317717(0) win 8192 mss 1460,nop,wscale 8,nop,nop,sackOK (DF) 16:49:55.539932 loller.sippan.se.ssh egget.priv.lamest.se.54158: S 2317435827:2317435827(0) ack 2729317718 win 16384 mss 1240,nop,nop,sackOK,nop,wscale 3 16:49:55.545936 egget.priv.lamest.se.54158 loller.sippan.se.ssh: . ack 1 win 256 (DF) 16:49:55.553927 esp loller.sippan.se egget.priv.lamest.se spi 0xcf1f39d1 seq 190 len 100 16:50:01.553883 esp loller.sippan.se egget.priv.lamest.se spi 0xcf1f39d1 seq 191 len 100 16:50:05.977468 esp egget.priv.lamest.se loller.sippan.se spi 0x67483925 seq 127 len 84 (DF) 16:50:05.977519 esp loller.sippan.se egget.priv.lamest.se spi 0xcf1f39d1 seq 192 len 84 # tcpdump on enc0 while trying ssh and https tcpdump: listening on enc0, link-type ENC tcpdump: WARNING: compensating for unaligned libpcap packets 17:01:01.578622 (authentic,confidential): SPI 0xc31749f4: loller.sippan.se.ssh egget.priv.lamest.se.54158: R 2317435850:2317435850(0) ack 2729317718 win 0 (encap) 17:01:05.786123 (authentic,confidential): SPI 0xc31749f4: loller.sippan.se.ssh egget.priv.lamest.se.54792: P 3813334764:3813334785(21) ack 2711749548 win 2170 (encap) 17:01:05.968654 (authentic,confidential): SPI 0xc31749f4: loller.sippan.se.https egget.priv.lamest.se.54793: P 3540908942:3540909100(158) ack 1840586787 win 2170 (encap) 17:01:06.265543 (authentic,confidential): SPI 0xc31749f4: loller.sippan.se.https egget.priv.lamest.se.54793: . ack 1 win 2170 (encap) 17:01:06.876165 (authentic,confidential): SPI 0xc31749f4: loller.sippan.se.https egget.priv.lamest.se.54793: . ack 1 win 2170 (encap) 17:01:08.095189 (authentic,confidential): SPI 0xc31749f4: loller.sippan.se.https egget.priv.lamest.se.54793: . ack 1 win 2170 (encap) 17:01:10.459116 (authentic,confidential): SPI 0xc31749f4: loller.sippan.se.https egget.priv.lamest.se.54793: . ack 1 win 2170 (encap) So it appears that OpenBSD tries to send back traffic with ESP when it shouldn't. I'd also like to add that the exact same setup works with with isakmpd. Best regards Martin This is a known issue. The reason why it happens is not strictly documented. The change to the stack was introduced as part of the cleanup long time ago. Supposedly it's there to support TCP MD5 signatures, but I didn't verify that yet. There's this code in the
Re: iked responds with esp over external ips.
On 5 November 2014 13:28, Mike Belopuhov m...@belopuhov.com wrote: On 4 November 2014 17:06, Martin Larsson martin.larss...@gmail.com wrote: Hello! I've setup a tunnel between OpenBSD 5.6 using iked and an openwrt router running strongswan. The tunnel works great with ping and other traffic but traffic between the two external ip's dies. This is a site-to-site connection and nothing fancy. iked.conf on OpenBSD. ikev2 esp from $10.11.12.0/24 to $194.168.4.0/24 peer $tcgw srcid sippan.se # ipsecctl -sa FLOWS: flow esp in from 192.168.4.0/24 to 10.11.12.0/24 peer 82.17.12.21 srcid FQDN/sippan.se dstid FQDN/sswan.sippan.se type use flow esp out from 10.11.12.0/24 to 192.168.4.0/24 peer 82.17.12.21 srcid FQDN/sippan.se dstid FQDN/sswan.sippan.se type require flow esp out from ::/0 to ::/0 type deny SAD: esp tunnel from 82.17.12.21 to 130.51.23.4 spi 0x67483925 auth hmac-sha1 enc aes esp tunnel from 130.51.23.4 to 82.17.12.21 spi 0xcf1f39d1 auth hmac-sha1 enc aes # netstat -nr Routing tables Internet: DestinationGatewayFlags Refs Use Mtu Prio Iface default130.51.23.4 UGS 10 30430256 - 8 em0 10/8 link#5 UC 10 - 4 vether0 10.11.12.13fe:e1:ba:d0:d6:1c UHLl 01 - 1 lo0 10.255.255.255 link#5 UHLc 3 570 - 4 vether0 82.17.12.21 130.51.23.4 UGHD 0 30430251 - L 56 em0 127/8 127.0.0.1 UGRS 00 32768 8 lo0 127.0.0.1 127.0.0.1 UH 16 32768 4 lo0 194.48.213.128/27 link#1 UC 10 - 4 em0 130.51.23.4 00:00:cd:19:95:16 UHLc 20 - 4 em0 130.51.23.4 00:02:b3:aa:cc:c3 HLl00 - 1 lo0 224/4 127.0.0.1 URS00 32768 8 lo0 Internet6: -removed, dont use it- Encap: Source Port DestinationPort Proto SA(Address/Proto/Type/Direction) 192.168.4/24 0 10.11.12/240 0 82.17.12.21/esp/use/in 10.11.12/240 192.168.4/24 0 0 82.17.12.21/esp/require/out default0 default 0 0 none/esp/deny/out # tcpdump on openbsd while trying to connect with ssh to the external ip of the OpenBSD host from the exernal ip of the other end. # tcpdump host 82.17.12.21 tcpdump: listening on em0, link-type EN10MB tcpdump: WARNING: compensating for unaligned libpcap packets 16:49:55.539903 egget.priv.lamest.se.54158 loller.sippan.se.ssh: S 2729317717:2729317717(0) win 8192 mss 1460,nop,wscale 8,nop,nop,sackOK (DF) 16:49:55.539932 loller.sippan.se.ssh egget.priv.lamest.se.54158: S 2317435827:2317435827(0) ack 2729317718 win 16384 mss 1240,nop,nop,sackOK,nop,wscale 3 16:49:55.545936 egget.priv.lamest.se.54158 loller.sippan.se.ssh: . ack 1 win 256 (DF) 16:49:55.553927 esp loller.sippan.se egget.priv.lamest.se spi 0xcf1f39d1 seq 190 len 100 16:50:01.553883 esp loller.sippan.se egget.priv.lamest.se spi 0xcf1f39d1 seq 191 len 100 16:50:05.977468 esp egget.priv.lamest.se loller.sippan.se spi 0x67483925 seq 127 len 84 (DF) 16:50:05.977519 esp loller.sippan.se egget.priv.lamest.se spi 0xcf1f39d1 seq 192 len 84 There's another interesting behavior evident in this trace: first few (usually 1 or 2) packets (SYN-ACK from loller here) are not encrypted. That's because we update our PCB on input due to the presence of a matching SA in a very agile manner: first it doesn't really match, but we make sure next time it does (funny logic in the ipsp_spd_inp that coupled with the default IPsec policy).
Re: 5.6 Icmp6 checksum / pf
On Sun, Nov 09, 2014 at 10:17 +0100, Bastien Durel wrote: Hi, I recently upgraded to 5.6, and got problems with icmpv6 I have a gif tunnel for IPv6: [root@fremen root]# ifconfig gif0 gif0: flags=8051UP,POINTOPOINT,RUNNING,MULTICAST mtu 1280 description: Sixxs priority: 0 groups: gif egress tunnel: inet 78.194.94.73 - 212.100.184.146 inet6 fe80::200:24ff:fecf:42ac%gif0 - prefixlen 64 scopeid 0x10 inet6 2001:6f8:202:19c::2 - 2001:6f8:202:19c::1 prefixlen 128 When I initiate a ping from this interface, it works as intended: 08:59:38.376107 2001:6f8:202:19c::2 2001:41d0:8:91a::1: icmp6: echo request (id:392e seq:0) [icmp6 cksum ok] (len 16, hlim 64) 08:59:38.410385 2001:41d0:8:91a::1 2001:6f8:202:19c::2: icmp6: echo reply (id:392e seq:0) [icmp6 cksum ok] (len 16, hlim 57) 08:59:39.389383 2001:6f8:202:19c::2 2001:41d0:8:91a::1: icmp6: echo request (id:392e seq:1) [icmp6 cksum ok] (len 16, hlim 64) 08:59:39.421397 2001:41d0:8:91a::1 2001:6f8:202:19c::2: icmp6: echo reply (id:392e seq:1) [icmp6 cksum ok] (len 16, hlim 57) But when a ping from outside reached it, the echo reply is sent with a bad (0) checksum, and the packet is dropped by te other side: 09:40:28.852102 2001:41d0:8:91a::1 2001:6f8:202:19c::2: icmp6: echo request (id:6c10 seq:1) [icmp6 cksum ok] (len 64, hlim 57) 09:40:28.852251 2001:6f8:202:19c::2 2001:41d0:8:91a::1: icmp6: echo reply (id:6c10 seq:1) [bad icmp6 cksum 0! - 2d93] (len 64, hlim 64) 09:40:29.860327 2001:41d0:8:91a::1 2001:6f8:202:19c::2: icmp6: echo request (id:6c10 seq:2) [icmp6 cksum ok] (len 64, hlim 57) 09:40:29.860432 2001:6f8:202:19c::2 2001:41d0:8:91a::1: icmp6: echo reply (id:6c10 seq:2) [bad icmp6 cksum 0! - 8a71] (len 64, hlim 64) It works correctly with this pf rule disabled: pass in on gif0 reply-to ( gif0 2001:6f8:202:19c::1 ) keep state What's the correct way to handle correct return-path ? Regards, -- Bastien Durel hi, looks like it's caused by the forgotten in6_proto_cksum_out call in the pf_route6 after possible pf_map_addr/PF_ACPY manipulations. bastien, can you please verify that the diff below fixes your issue? diff --git sys/net/pf.c sys/net/pf.c index 979f44d..4c934db 100644 --- sys/net/pf.c +++ sys/net/pf.c @@ -5777,20 +5777,22 @@ pf_route6(struct mbuf **m, struct pf_rule *r, int dir, struct ifnet *oifp, goto bad; else if (m0 == NULL) goto done; if (m0-m_len sizeof(struct ip6_hdr)) { DPFPRINTF(LOG_ERR, pf_route6: m0-m_len sizeof(struct ip6_hdr)); goto bad; } } + in6_proto_cksum_out(m0, ifp); + /* * If the packet is too large for the outgoing interface, * send back an icmp6 error. */ if (IN6_IS_SCOPE_EMBED(dst-sin6_addr)) dst-sin6_addr.s6_addr16[1] = htons(ifp-if_index); if ((u_long)m0-m_pkthdr.len = ifp-if_mtu) { nd6_output(ifp, ifp, m0, dst, NULL); } else { in6_ifstat_inc(ifp, ifs6_in_toobig);
Re: improving OpenBSD's gmac.c...
On 10 October 2014 02:39, Damien Miller d...@mindrot.org wrote: On Thu, 9 Oct 2014, Christian Weisgerber wrote: John-Mark Gurney: I also have an implementation of ghash that does a 4 bit lookup table version with the table split between cache lines in p4 at: https://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/opencrypto/sys/opencrypto/gfmult.cREV=4 This also has a version with does 4 blocks at a time getting a further speed up... FWIW, I did a quick dirty merge of this into the OpenBSD tree and the speed of my test (net6501-50, tcpbench -u over esp aes-128-gmac) almost doubled. isn't this likely to make it more likely to be subject to timing attacks? then how is this different to our table based aes implementation? and it's the same C code as in openssl which also uses table based gcm implementation. what countermeasures can be applied to the table lookup code to fight these attacks?
Re: divert-to with sockets bound to any
On 19 June 2013 20:20, Reyk Floeter r...@openbsd.org wrote: On Wed, Jun 19, 2013 at 08:00:01PM +0200, Reyk Floeter wrote: OK? I forgot the in6_pcblookup_listen() case, updated diff below. Reyk it boils down to the pcb lookup magic as i thought; ok mikeb.
Stop calling IPsec and pf under splnet
Hi, As it was pointed out by dhill there are some rogue splnets in the tcp_input that shouldn't be there really. The only reason they're still there is to match overzealous splnets in bridge_ broadcast. bridge_ifenqueue is the only function call in there that requires splnet protection since it's dealing with send queues. Narrowing the range of the splnet protection allows us to remove all splnet protection of the IPsec SPD and TDB code. This as well removes the only pf_test call done under IPL_NET. Below are essentially two diffs that are rather hard to separate. I've tested the diff with the gif-to-ethernet IPsec bridge but some additional IPsec and bridge testing won't hurt. mpi@ has provided some feedback already, so I'm really looking for OK's on this. diff --git sys/net/if_bridge.c sys/net/if_bridge.c index 41d7b67..0ca2710 100644 --- sys/net/if_bridge.c +++ sys/net/if_bridge.c @@ -969,12 +969,10 @@ bridge_output(struct ifnet *ifp, struct mbuf *m, struct sockaddr *sa, return (ENOBUFS); } eh = mtod(m, struct ether_header *); dst = (struct ether_addr *)eh-ether_dhost[0]; - s = splnet(); - /* * If bridge is down, but original output interface is up, * go ahead and send out that interface. Otherwise the packet * is dropped below. */ @@ -1007,11 +1005,10 @@ bridge_output(struct ifnet *ifp, struct mbuf *m, struct sockaddr *sa, */ if ((mtag = m_tag_find(m, PACKET_TAG_IPSEC_OUT_CRYPTO_NEEDED, NULL)) != NULL) { ipsp_skipcrypto_unmark((struct tdb_ident *)(mtag + 1)); m_freem(m); - splx(s); return (0); } #endif /* IPSEC */ bridge_span(sc, NULL, m); @@ -1074,22 +1071,24 @@ bridge_output(struct ifnet *ifp, struct mbuf *m, struct sockaddr *sa, m1-m_pkthdr.len = len; } mc = m1; } + s = splnet(); error = bridge_ifenqueue(sc, dst_if, mc); + splx(s); if (error) continue; } if (!used) m_freem(m); - splx(s); return (0); } sendunicast: bridge_span(sc, NULL, m); + s = splnet(); if ((dst_if-if_flags IFF_RUNNING) == 0) { m_freem(m); splx(s); return (ENETDOWN); } @@ -1251,13 +1250,11 @@ bridgeintr_frame(struct bridge_softc *sc, struct mbuf *m) * If the packet is a multicast or broadcast OR if we don't * know any better, forward it to all interfaces. */ if ((m-m_flags (M_BCAST | M_MCAST)) || dst_if == NULL) { sc-sc_if.if_imcasts++; - s = splnet(); bridge_broadcast(sc, src_if, eh, m); - splx(s); return; } /* * At this point, we're dealing with a unicast frame going to a @@ -1496,13 +1493,11 @@ bridge_broadcast(struct bridge_softc *sc, struct ifnet *ifp, struct ether_header *eh, struct mbuf *m) { struct bridge_iflist *p; struct mbuf *mc; struct ifnet *dst_if; - int len, used = 0; - - splassert(IPL_NET); + int len, s, used = 0; TAILQ_FOREACH(p, sc-sc_iflist, next) { /* * Don't retransmit out of the same interface where * the packet was received from. @@ -1587,11 +1582,13 @@ bridge_broadcast(struct bridge_softc *sc, struct ifnet *ifp, len += ETHER_VLAN_ENCAP_LEN; #endif if ((len - ETHER_HDR_LEN) dst_if-if_mtu) bridge_fragment(sc, dst_if, eh, mc); else { + s = splnet(); bridge_ifenqueue(sc, dst_if, mc); + splx(s); } } if (!used) m_freem(m); @@ -1643,11 +1640,11 @@ bridge_span(struct bridge_softc *sc, struct ether_header *eh, struct mbuf *morig) { struct bridge_iflist *p; struct ifnet *ifp; struct mbuf *mc, *m; - int error; + int s, error; if (TAILQ_EMPTY(sc-sc_spanlist)) return; m = m_copym2(morig, 0, M_COPYALL, M_NOWAIT); @@ -1679,11 +1676,13 @@ bridge_span(struct bridge_softc *sc, struct ether_header *eh, if (mc == NULL) { sc-sc_if.if_oerrors++; continue; } + s = splnet(); error = bridge_ifenqueue(sc, ifp, mc); + splx(s); if (error)
ix(4) driver update to latest FreeBSD/Intel source code
Hi, The following diff updates most of the ix(4) driver to what FreeBSD and Intel have today. Most importantly it introduces support for the Ethernet flow control. Please test and report. OK's are welcome as well. http://theapt.org/~mike/ix.diff http://theapt.org/~mike/ix-w.diff (less whitespace changes)
Re: include netinet/in_var.h in dev
On 6 August 2013 03:54, Alexander Bluhm alexander.bl...@gmx.net wrote: Hi, For an upcoming change in in6_var.h I would like to minimize the impact. Most network drivers include netinet/in_var.h, but apparently they don't have to. Can we remove these includes? compiled on amd64 and i386 ok? bluhm should be fine. ok mikeb
Re: Insert new IPv4 addresses at only one place
On 7 August 2013 15:07, Martin Pieuchot mpieuc...@nolizard.org wrote: Diff below deduplicate and move the code adding a new address to the global list into in_ifinit(), there's no functional change. While here add a comment about why we always delete addresses from the tree during update. ok? OK mikeb
Re: Constify the null sockaddr in arp_rtrequest()
On 8 August 2013 12:35, Martin Pieuchot mpieuc...@nolizard.org wrote: arp_rtrequest() uses a default static sockaddr_dl which is only used read-only: it is copied by rt_setgate(). I'd like to constify this structure to make it clear no value can be trashed if code using it is run in parallel. Also remove another reference to the name of the variable to make it clear it is used only once. ok? ok mikeb
Re: tedu netatm
On 9 August 2013 09:36, Martin Pieuchot mpieuc...@nolizard.org wrote: It's me again :) With a freshly updated and tested diff to tedu netatm. I got no objection since I raised the issue 5 months ago [0], so I'm now looking for oks. [0] http://marc.info/?l=openbsd-techm=136335787207091w=2 i think it should go the way of doo doo.
ray(4) removal
to make mpi's life a tad easier and also lose some weight, i'd like to move rat(4) to the attic. mpi, kettenis, jsg and henning agree. i'll commit the diff if noone objects. henning has also suggested to remove the pre-wifi era cnw(4). if there's interest i can cook a diff for that as well, but it doesn't seem to get in the way at the moment. Index: arch/amd64/conf/GENERIC === RCS file: /cvs/openbsd/src/sys/arch/amd64/conf/GENERIC,v retrieving revision 1.343 diff -u -p -r1.343 GENERIC --- arch/amd64/conf/GENERIC 12 Aug 2013 04:11:52 - 1.343 +++ arch/amd64/conf/GENERIC 13 Aug 2013 13:10:57 - @@ -472,7 +472,6 @@ wi* at pcmcia? # WaveLAN IEEE 802.11DS an*at pci? # Aironet IEEE 802.11DS an*at pcmcia? # Aironet IEEE 802.11DS cnw* at pcmcia? # Xircom Netwave -ray* at pcmcia? # Raylink Aviator2.4/Pro 802.11FH iwi* at pci? # Intel PRO/Wireless 2200BG/2915ABG wpi* at pci? # Intel PRO/Wireless 3945ABG iwn* at pci? # Intel WiFi Link 4965/5000/1000/6000 Index: arch/amd64/conf/RAMDISK === RCS file: /cvs/openbsd/src/sys/arch/amd64/conf/RAMDISK,v retrieving revision 1.55 diff -u -p -r1.55 RAMDISK --- arch/amd64/conf/RAMDISK 15 Mar 2012 20:06:02 - 1.55 +++ arch/amd64/conf/RAMDISK 13 Aug 2013 13:10:57 - @@ -217,7 +217,6 @@ nfe*at pci? # NVIDIA nForce Ethernet #wi* at pcmcia? # WaveLAN IEEE 802.11DS #an* at pcmcia? # Aironet IEEE 802.11DS #cnw* at pcmcia? # Xircom Netwave -#ray* at pcmcia? # Raylink Aviator2.4/Pro 802.11FH # Media Independent Interface (mii) drivers #exphy*at mii? # 3Com internal PHYs Index: arch/amd64/conf/RAMDISK_CD === RCS file: /cvs/openbsd/src/sys/arch/amd64/conf/RAMDISK_CD,v retrieving revision 1.125 diff -u -p -r1.125 RAMDISK_CD --- arch/amd64/conf/RAMDISK_CD 5 Apr 2013 05:45:09 - 1.125 +++ arch/amd64/conf/RAMDISK_CD 13 Aug 2013 13:10:57 - @@ -291,7 +291,6 @@ wi* at pci? # WaveLAN IEEE 802.11DS wi*at pcmcia? # WaveLAN IEEE 802.11DS an*at pcmcia? # Aironet IEEE 802.11DS #cnw* at pcmcia? # Xircom Netwave -#ray* at pcmcia? # Raylink Aviator2.4/Pro 802.11FH ipw* at pci? # Intel PRO/Wireless 2100 iwi* at pci? # Intel PRO/Wireless 2200BG/2915ABG wpi* at pci? # Intel PRO/Wireless 3945ABG Index: arch/i386/conf/GENERIC === RCS file: /cvs/openbsd/src/sys/arch/i386/conf/GENERIC,v retrieving revision 1.748 diff -u -p -r1.748 GENERIC --- arch/i386/conf/GENERIC 12 Aug 2013 04:11:52 - 1.748 +++ arch/i386/conf/GENERIC 13 Aug 2013 13:10:57 - @@ -612,7 +612,6 @@ an* at pci? # Aironet IEEE 802.11DS an*at isapnp? # Aironet IEEE 802.11DS an*at pcmcia? # Aironet IEEE 802.11DS #cnw* at pcmcia? # Xircom Netwave -ray* at pcmcia? # Raylink Aviator2.4/Pro 802.11FH ipw* at pci? # Intel PRO/Wireless 2100 iwi* at pci? # Intel PRO/Wireless 2200BG/2915ABG wpi* at pci? # Intel PRO/Wireless 3945ABG Index: arch/i386/conf/RAMDISK === RCS file: /cvs/openbsd/src/sys/arch/i386/conf/RAMDISK,v retrieving revision 1.180 diff -u -p -r1.180 RAMDISK --- arch/i386/conf/RAMDISK 14 Jun 2013 01:00:24 - 1.180 +++ arch/i386/conf/RAMDISK 13 Aug 2013 13:10:57 - @@ -238,7 +238,6 @@ wi* at pci? # WaveLAN IEEE 802.11DS wi*at pcmcia? # WaveLAN IEEE 802.11DS #an* at pcmcia? # Aironet IEEE 802.11DS #cnw* at pcmcia? # Xircom Netwave -#ray* at pcmcia? # Raylink Aviator2.4/Pro 802.11FH # Media Independent Interface (mii) drivers exphy* at mii? # 3Com internal PHYs Index: arch/i386/conf/RAMDISKB === RCS file: /cvs/openbsd/src/sys/arch/i386/conf/RAMDISKB,v retrieving revision 1.121 diff -u -p -r1.121 RAMDISKB --- arch/i386/conf/RAMDISKB 15 Mar 2012 20:06:03 - 1.121 +++ arch/i386/conf/RAMDISKB 13 Aug 2013 13:10:57 - @@ -247,7 +247,6 @@ nfe*at pci? # NVIDIA nForce Ethernet #wi* at pcmcia? # WaveLAN IEEE 802.11DS
Re: rtsold ioctl sysctl
On Thu, Aug 15, 2013 at 00:39 +0200, Alexander Bluhm wrote: Hi, I would like to replace the obsolete ioctl(SIOCGDRLST_IN6) interface with sysctl(net.inet6.icmp6.nd6_drlist) in rtsold. Code copied from ndp. ok? looks good to me. OK
Re: remove obsolete nd6 ioctls
On 15 August 2013 17:34, Alexander Bluhm alexander.bl...@gmx.net wrote: Hi, After converting the last user of ioctl(SIOCGDRLST_IN6) to sysctl, I would like to remove dead kernel ioctl code. Is it save to just delete this? ok? bluhm if ports are fine with it, i'm fine as well (:
Re: threaded prof signals
On 16 August 2013 09:23, Ted Unangst t...@tedunangst.com wrote: Actually, here's my concern. There's only one timeout for the process. What happens when two threads are running on two CPUs? Is there a guarantee that cpu0 will both set and execute the timeout before cpu1 sets it, or is there a race where cpu1 will fail to send the signal to its thread because cpu0 has already processed the timeout? softclock and hence the timeout processing is done only by the cpu0 currently.
Re: Stop using static variables in ICMP
On 9 August 2013 11:04, Martin Pieuchot mpieuc...@nolizard.org wrote: This is the last episode from the first season of the serie, move your variables to the stack. Like in the previous episodes, this one will let us execute the various icmp functions in parallel without risk of trashing a value. It also reduces the difference with the icmp6 code adding a redirect route. ok? OK
Re: src/sbin/ifconfig: missing include
On 19 August 2013 12:52, David Coppa dco...@gmail.com wrote: This misses util.h: cc -O2 -pipe -fno-pie -Wall -DINET6 -c /usr/src/sbin/ifconfig/ifconfig.c /usr/src/sbin/ifconfig/ifconfig.c: In function 'setifwpakey': /usr/src/sbin/ifconfig/ifconfig.c:1759: warning: implicit declaration of function 'pkcs5_pbkdf2' OK? OK
bge: call if_link_state_change when state is actually different
hi, bge(4) is the last driver in the tree that is willing to call if_link_state_change whenever, while others do so only when the link state does change. there should be no real change in functionality. ok? diff --git sys/dev/pci/if_bge.c sys/dev/pci/if_bge.c index 5cd56e2..233ccab 100644 --- sys/dev/pci/if_bge.c +++ sys/dev/pci/if_bge.c @@ -4512,22 +4512,27 @@ bge_link_upd(struct bge_softc *sc) if (BGE_ASICREV(sc-bge_chipid) == BGE_ASICREV_BCM5704) BGE_CLRBIT(sc, BGE_MAC_MODE, BGE_MACMODE_TBI_SEND_CFGS); CSR_WRITE_4(sc, BGE_MAC_STS, 0x); status = CSR_READ_4(sc, BGE_MAC_MODE); - ifp-if_link_state = - (status BGE_MACMODE_HALF_DUPLEX) ? + link = (status BGE_MACMODE_HALF_DUPLEX) ? LINK_STATE_HALF_DUPLEX : LINK_STATE_FULL_DUPLEX; - if_link_state_change(ifp); ifp-if_baudrate = IF_Gbps(1); + if (ifp-if_link_state != link) { + ifp-if_link_state = link; + if_link_state_change(ifp); + } } } else if (BGE_STS_BIT(sc, BGE_STS_LINK)) { BGE_STS_CLRBIT(sc, BGE_STS_LINK); - ifp-if_link_state = LINK_STATE_DOWN; - if_link_state_change(ifp); + link = LINK_STATE_DOWN; ifp-if_baudrate = 0; + if (ifp-if_link_state != link) { + ifp-if_link_state = link; + if_link_state_change(ifp); + } } } else if (BGE_STS_BIT(sc, BGE_STS_AUTOPOLL)) { /* * Some broken BCM chips have BGE_STATFLAG_LINKSTATE_CHANGED bit * in status word always set. Workaround this bug by reading
defer routing table updates on link state changes
hi, in order to make our life a bit easier and prevent rogue accesses to the routing table from the hardware interrupt context violating all kinds of spl assumptions we would like if_link_state_change that is called by network device drivers in their interrupt service routines to defer its work to the process context or thereabouts. i did some testing here, but wouldn't mind if someone tries this diff in gre/vlan/ospf/anything-weird setups. making sure that hot-plugging/unplugging usb interfaces doesn't produce any undesirable effects would be superb as well. please note that a token (an interface index) is passed to the workq in order to make sure that if the interface would be gone by the time syswq goes around to run the task it would just fall through. ok? diff --git sys/net/if.c sys/net/if.c index 6dafd0d..5b6800a 100644 --- sys/net/if.c +++ sys/net/if.c @@ -79,10 +79,11 @@ #include sys/protosw.h #include sys/kernel.h #include sys/ioctl.h #include sys/domain.h #include sys/sysctl.h +#include sys/workq.h #include net/if.h #include net/if_dl.h #include net/if_media.h #include net/if_types.h @@ -151,10 +152,12 @@ int if_clone_list(struct if_clonereq *); struct if_clone*if_clone_lookup(const char *, int *); void if_congestion_clear(void *); intif_group_egress_build(void); +void if_link_state_change_task(void *, void *); + intifai_cmp(struct ifaddr_item *, struct ifaddr_item *); void ifa_item_insert(struct sockaddr *, struct ifaddr *, struct ifnet *); void ifa_item_remove(struct sockaddr *, struct ifaddr *, struct ifnet *); #ifndef SMALL_KERNEL void ifa_print_rb(void); @@ -1106,21 +1109,39 @@ if_up(struct ifnet *ifp) m_clinitifp(ifp); } /* - * Process a link state change. - * NOTE: must be called at splsoftnet or equivalent. + * Schedule a link state change task. */ void if_link_state_change(struct ifnet *ifp) { - rt_ifmsg(ifp); + /* try to put the routing table update task on syswq */ + workq_add_task(NULL, 0, if_link_state_change_task, + (void *)((unsigned long)ifp-if_index), NULL); +} + +/* + * Process a link state change. + */ +void +if_link_state_change_task(void *arg, void *unused) +{ + unsigned int index = (unsigned long)arg; + struct ifnet *ifp; + int s; + + s = splsoftnet(); + if ((ifp = if_get(index)) != NULL) { + rt_ifmsg(ifp); #ifndef SMALL_KERNEL - rt_if_track(ifp); + rt_if_track(ifp); #endif - dohooks(ifp-if_linkstatehooks, 0); + dohooks(ifp-if_linkstatehooks, 0); + } + splx(s); } /* * Handle interface watchdog timer routines. Called * from softclock, we decrement timers (if set) and
Re: defer routing table updates on link state changes
On 27 August 2013 13:39, Martin Pieuchot mpieuc...@nolizard.org wrote: I think that's the right approach but the current code generating interfaces indexes is too clever from my point of view, it tries to reuse the last index if possible. This could lead to some funny races if we detach and reattach an interface before the task get scheduled. So I'm ok with your diff if we avoid to reuse the last index too soon. Diff below does that. i'm ok with this change.
Re: Remove unused argument from *rtrequest()
On 27 August 2013 15:58, Martin Pieuchot mpieuc...@nolizard.org wrote: In order to define a proper API for our routine table, I'd like to turn the struct rt_addrinfo into a private type (ie: only used in route.c and rtsock.c). This type is used by a lost of code in our network stack to add or delete a route but also in the various *rtrequest() functions. However in these functions the argument is never used! So the diff below kills it. ok? ok
Re: defer routing table updates on link state changes
On Mon, Aug 26, 2013 at 13:36 +0200, Mike Belopuhov wrote: hi, in order to make our life a bit easier and prevent rogue accesses to the routing table from the hardware interrupt context violating all kinds of spl assumptions we would like if_link_state_change that is called by network device drivers in their interrupt service routines to defer its work to the process context or thereabouts. i did some testing here, but wouldn't mind if someone tries this diff in gre/vlan/ospf/anything-weird setups. making sure that hot-plugging/unplugging usb interfaces doesn't produce any undesirable effects would be superb as well. please note that a token (an interface index) is passed to the workq in order to make sure that if the interface would be gone by the time syswq goes around to run the task it would just fall through. ok? i've got an ok from mpi, anyone else would like to test and/or comment? diff --git sys/net/if.c sys/net/if.c index 6dafd0d..5b6800a 100644 --- sys/net/if.c +++ sys/net/if.c @@ -79,10 +79,11 @@ #include sys/protosw.h #include sys/kernel.h #include sys/ioctl.h #include sys/domain.h #include sys/sysctl.h +#include sys/workq.h #include net/if.h #include net/if_dl.h #include net/if_media.h #include net/if_types.h @@ -151,10 +152,12 @@ int if_clone_list(struct if_clonereq *); struct if_clone *if_clone_lookup(const char *, int *); void if_congestion_clear(void *); int if_group_egress_build(void); +void if_link_state_change_task(void *, void *); + int ifai_cmp(struct ifaddr_item *, struct ifaddr_item *); void ifa_item_insert(struct sockaddr *, struct ifaddr *, struct ifnet *); void ifa_item_remove(struct sockaddr *, struct ifaddr *, struct ifnet *); #ifndef SMALL_KERNEL void ifa_print_rb(void); @@ -1106,21 +1109,39 @@ if_up(struct ifnet *ifp) m_clinitifp(ifp); } /* - * Process a link state change. - * NOTE: must be called at splsoftnet or equivalent. + * Schedule a link state change task. */ void if_link_state_change(struct ifnet *ifp) { - rt_ifmsg(ifp); + /* try to put the routing table update task on syswq */ + workq_add_task(NULL, 0, if_link_state_change_task, + (void *)((unsigned long)ifp-if_index), NULL); +} + +/* + * Process a link state change. + */ +void +if_link_state_change_task(void *arg, void *unused) +{ + unsigned int index = (unsigned long)arg; + struct ifnet *ifp; + int s; + + s = splsoftnet(); + if ((ifp = if_get(index)) != NULL) { + rt_ifmsg(ifp); #ifndef SMALL_KERNEL - rt_if_track(ifp); + rt_if_track(ifp); #endif - dohooks(ifp-if_linkstatehooks, 0); + dohooks(ifp-if_linkstatehooks, 0); + } + splx(s); } /* * Handle interface watchdog timer routines. Called * from softclock, we decrement timers (if set) and
Re: ix(4): enable checksum offload
On 9 September 2013 21:48, Brad Smith b...@comstyle.com wrote: Here is a diff to enable the checksum offload support for ix(4). Looking for any testing. last time i checked this broke ospf traffic. please make sure at least ip/tcp, ip/udp, ip/icmp, ip/ip, ip/gre, ip/esp, ip/ah and ip/ospf work fine with this.
Re: em(4): enable checksum offload
On 9 September 2013 21:44, Brad Smith b...@comstyle.com wrote: Since I have been asked to send out these diffs again here is a diff to enable the checksum offload support for em(4). Looking for any testing. tx checksum offloading will not work on 75, 76, 80, i350.
Re: unknown products found in Dell Optiplex 9020
On 9 September 2013 22:46, STeve Andre' and...@msu.edu wrote: On 09/09/13 07:45, Paul de Weerd wrote: Found a couple of unknown Intel products in a Dell Optiplex 9020: vendor Intel, unknown product 0x153a (class network subclass ethernet, rev 0x04) at pci0 dev 25 function 0 not configured vendor Intel, unknown product 0x8c22 (class serial bus subclass SMBus, rev 0x04) at pci0 dev 31 function 3 not configured vendor Intel, unknown product 0x8c31 (class serial bus subclass USB, rev 0x04) at pci0 dev 20 function 0 not configured vendor Intel, unknown product 0x8c3a (class communications subclass miscellaneous, rev 0x04) at pci0 dev 22 function 0 not configured vendor Intel, unknown product 0x8c3d (class communications subclass serial, rev 0x04) at pci0 dev 22 function 3 not configured The included diff adds these to pcidevs, resulting in the following (none of these devices are supported by actual drivers yet): Intel C220 xHCI USB rev 0x04 at pci0 dev 20 function 0 not configured Intel C220 MEI controller rev 0x04 at pci0 dev 22 function 0 not configured Intel C220 KT controller rev 0x04 at pci0 dev 22 function 3 not configured Intel I217-LM rev 0x04 at pci0 dev 25 function 0 not configured The 217-LM is the Listening Monitor, by the NSA. Intel C220 SMBus controller rev 0x04 at pci0 dev 31 function 3 not configured Note that the diff below also adds I217-V, which got me confused while I was figuring out what I had in this machine. That part is not in the machine (as I understand it, it's basically a simplified version of the same NIC, lacking some enterprise features). Full dmesg after the diff. The 217-V was made by GCHQ. yes, these two come with an additional non-optional tx offload mode we don't support in our stack yet.
Re: defer routing table updates on link state changes
On 12 September 2013 17:18, Martin Pieuchot mpieuc...@nolizard.org wrote: FWIW it would be interesting to modify tun(4) so that it doesn't need to detach/reattach itself when switching between mode, this would allow us to stop reusing the last index. this definitely makes a lot of sense.
Re: defer routing table updates on link state changes
On 12 September 2013 17:31, Reyk Floeter r...@openbsd.org wrote: On Thu, Sep 12, 2013 at 05:18:39PM +0200, Martin Pieuchot wrote: For example, you have to query the IfIndex via SNMP to get further information, like the ifName or statistics, and most monitoring systems would save interface information based on the index - they would not recognize that tun0 with if_index 10 is the same interface (from an OpenBSD point of view) as tun0 with an if_index 11. It is not guaranteed in OpenBSD, but we shouldn't make it worse. All our interface drivers are associated to one index when they are attached and it does not change. The only thing is that tun(4) is special because internally when it switches from L3 to L2 or vice versa it detaches and reattaches itself. That's why this hack of reusing the last index is needed. It also matters if you create destroy and re-create any other cloner interface (vlan, ...). it makes no sense whatsoever, reyk. those indices can be easily stolen and nobody guarantees that if you create vlan10, vlan11, then destroy vlan10, create vlan12 and vlan10 that vlan10 will have the same index as before. in fact it might be a different interface created for a different purpose days after. who knows? if snmp client relies on this behavior, it's broken since we have never made any provisions regarding how we use those indices. But if I understand correctly what you're saying about querying the IfIndex, I find it even more dangerous to reuse the last index, because I can create a configuration where I change an interface while keeping the same index (with usb interfaces for example). I think that's nonsense. looks like you misunderstand the problem we're dealing with here. FWIW it would be interesting to modify tun(4) so that it doesn't need to detach/reattach itself when switching between mode, this would allow us to stop reusing the last index. Or you could simply rewrite tun(4)? Isn't there any other way to do what you want without stopping to reuse the index? SNMP simply expects that if_indexes are fairly static, linear, and without holes. Why should we change that in OpenBSD? Is there any security reason to randomize the indexes - No. Reyk or snmp can simply stop assuming things. if_index wasn't created for snmp in the first place.
Re: defer routing table updates on link state changes
On 12 September 2013 18:28, Mike Belopuhov m...@belopuhov.com wrote: On 12 September 2013 18:14, Reyk Floeter r...@openbsd.org wrote: On Thu, Sep 12, 2013 at 05:53:42PM +0200, Mike Belopuhov wrote: looks like you misunderstand the problem we're dealing with here. Sure, I do. You're trying to push one thing and you don't want to hear the concerns about a specific detail of it. with all respect, i think you don't. otherwise you wouldn't be asking the questions you're asking. we do hear your concerns, but since even before the change if_index was not static at all the way you seem to be implying snmp requires it, i don't see a situation drastically changing. if you create all the interfaces on startup or before you start snmpd and don't destroy/ re-create them nothing is changed. perhaps truly persistent snmp ifindexes could be implemented in the snmpd itself via a new configuration option, i.e. interface vlan1 index 1. you don't necessarily have to use the same ones system is using for it's own purposes.
Re: defer routing table updates on link state changes
On 12 September 2013 18:14, Reyk Floeter r...@openbsd.org wrote: On Thu, Sep 12, 2013 at 05:53:42PM +0200, Mike Belopuhov wrote: looks like you misunderstand the problem we're dealing with here. Sure, I do. You're trying to push one thing and you don't want to hear the concerns about a specific detail of it. with all respect, i think you don't. otherwise you wouldn't be asking the questions you're asking. we do hear your concerns, but since even before the change if_index was not static at all the way you seem to be implying snmp requires it, i don't see a situation drastically changing. if you create all the interfaces on startup or before you start snmpd and don't destroy/ re-create them nothing is changed. FWIW it would be interesting to modify tun(4) so that it doesn't need to detach/reattach itself when switching between mode, this would allow us to stop reusing the last index. Or you could simply rewrite tun(4)? Isn't there any other way to do what you want without stopping to reuse the index? SNMP simply expects that if_indexes are fairly static, linear, and without holes. Why should we change that in OpenBSD? Is there any security reason to randomize the indexes - No. Reyk or snmp can simply stop assuming things. if_index wasn't created for snmp in the first place. Of course, everyone else is wrong, let's change the world! IfIndex is used by SNMP since at least 1988 (RFC 1066) and many many tools have adopted it expecting this behaviour. Anyway, just go ahead and do the stuff. I don't care, it is not a big issue for snmpd. But I still don't see the point of changing the semantics instead of finding another way to do what you want. Unless there is a security issue or similar with if_indexes and changing it would actually improve something. Blah. Reyk no need to get upset. mpi's change does improve things. we want to make full use of if_index' initial design and use it as a reference to the interface in the mp network stack . it has nothing to do with a badly designed protocol from the eighties.
Re: defer routing table updates on link state changes
On 12 September 2013 19:07, Reyk Floeter r...@openbsd.org wrote: On Thu, Sep 12, 2013 at 06:59:13PM +0200, Mike Belopuhov wrote: Ok, let's stop this. I don't think you read what I replied before. I didn't say that we're static with if_indexes, just that we shouldn't make it worse. or implement persistent indices in the snmpd itself maybe? Maybe. But this would create another layer of abstraction. And if_index is not just used by SNMP. I give up, but please read my next comment below. Isn't there any other way to do what you want without stopping to reuse the index? SNMP simply expects that if_indexes are fairly static, linear, and without holes. Why should we change that in OpenBSD? Is there any security reason to randomize the indexes - No. or snmp can simply stop assuming things. if_index wasn't created for snmp in the first place. Actually, I think this assumption is wrong. I researched a little bit in BSD history: - RFC 1066 from August 1988 is one of the early SNMP RFC that mention IfIndex - 4.3BSD-Tahoe from June 1988 doesn't have if_index, I also didn't find in other early BSDs. - 4.3BSD-Reno from June 1990 does have it. You can even find a new comment /* XXX fast fix for SNMP, going away soon */ on top of if.h. So it seems that if_index was added _for_ SNMP. i believe this comment refers to the inclusion of sys/time.h. Yes, I know. But see: 1. 1988-06: 4.3BSD-Tahoe without if_index 2. 1988-08: SNMP 3. 1990-06: 4.3BSD-Reno with if_index, mentioning SNMP in if.h. I don't have the commit history, but this might indicate that there was some work to support SNMP and many new fields in struct ifnet have been added for SNMP. http://minnie.tuhs.org/cgi-bin/utree.pl?file1=4.3BSD-Reno/src/sys/net/if.hfile2=4.3BSD-Tahoe/usr/src/sys/net/if.hprint=1 reyk either way, we need to move forward on this. we want to use if_index for the purpose of looking up the interface w/o a pointer to the ifnet. should we implement additional indices for that or snmp problem will be dealt with? currently if you reuse an index and reuse the memory for the ifnet you may suddenly pick a new interface and perform an operation on it that might not necessarily make sense in the context.
Re: defer routing table updates on link state changes
On 12 September 2013 18:48, Reyk Floeter r...@openbsd.org wrote: On Thu, Sep 12, 2013 at 06:28:15PM +0200, Mike Belopuhov wrote: Sure, I do. You're trying to push one thing and you don't want to hear the concerns about a specific detail of it. with all respect, i think you don't. otherwise you wouldn't be asking the questions you're asking. we do hear your concerns, but since even before the change if_index was not static at all the way you seem to be implying snmp requires it, i don't see a situation drastically changing. if you create all the interfaces on startup or before you start snmpd and don't destroy/ re-create them nothing is changed. Ok, let's stop this. I don't think you read what I replied before. I didn't say that we're static with if_indexes, just that we shouldn't make it worse. or implement persistent indices in the snmpd itself maybe? I give up, but please read my next comment below. Isn't there any other way to do what you want without stopping to reuse the index? SNMP simply expects that if_indexes are fairly static, linear, and without holes. Why should we change that in OpenBSD? Is there any security reason to randomize the indexes - No. or snmp can simply stop assuming things. if_index wasn't created for snmp in the first place. Actually, I think this assumption is wrong. I researched a little bit in BSD history: - RFC 1066 from August 1988 is one of the early SNMP RFC that mention IfIndex - 4.3BSD-Tahoe from June 1988 doesn't have if_index, I also didn't find in other early BSDs. - 4.3BSD-Reno from June 1990 does have it. You can even find a new comment /* XXX fast fix for SNMP, going away soon */ on top of if.h. So it seems that if_index was added _for_ SNMP. i believe this comment refers to the inclusion of sys/time.h. Of course, everyone else is wrong, let's change the world! IfIndex is used by SNMP since at least 1988 (RFC 1066) and many many tools have adopted it expecting this behaviour. Anyway, just go ahead and do the stuff. I don't care, it is not a big issue for snmpd. But I still don't see the point of changing the semantics instead of finding another way to do what you want. Unless there is a security issue or similar with if_indexes and changing it would actually improve something. Blah. no need to get upset. mpi's change does improve things. we want to make full use of if_index' initial design and use it as a reference to the interface in the mp network stack . it has nothing to do with a badly designed protocol from the eighties. Oh, come on. SNMP is as badly designed as many other things that we're using every day. sure. Do you suggest to rm snmpd because the protocol is badly designed? no, i don't. i merely suggest that if_index should not be used if persistent ifindex'es are required.
Re: enc interface errno
On 27 September 2013 15:24, Alexander Bluhm alexander.bl...@gmx.net wrote: Hi, The error return codes for the enc interface seem quite inconsistent. Always return the appropriate errno. ok? bluhm OK
Re: 5.4 html Security Improvements section
On 9 October 2013 19:51, Alexey E. Suslikov alexey.susli...@gmail.com wrote: * Added AES-XTS support to aesni crypto(4) driver on amd64. Allows softraid(4) to benefit from the AES-NI instructions on newer Intel CPUs not at the moment, though.
Re: pf dropping window updates and acks
On Fri, Oct 11, 2013 at 12:09 +0200, Gerhard Roth wrote: In January bluhm@ introduced 'data_end' to pf.c:tcp_track_full(). Now this breaks the handling of non-data packets. They may be rejected because the SEQ_GEQ(src-seqhi, data_end) check fails. The patch below should fix this. Makes sense to me. OK mikeb Gerhard Index: sys/net/pf.c === RCS file: /cvs/src/sys/net/pf.c,v retrieving revision 1.841 diff -u -p -u -p -r1.841 pf.c --- sys/net/pf.c 9 Oct 2013 09:32:01 - 1.841 +++ sys/net/pf.c 11 Oct 2013 09:57:20 - @@ -3940,7 +3940,7 @@ pf_tcp_track_full(struct pf_pdesc *pd, s if (seq == end) { /* Ease sequencing restrictions on no data packets */ seq = src-seqlo; - end = seq; + data_end = end = seq; } ackskew = dst-seqlo - ack;
defer routing table updates on link state changes (again)
hi, since mpi's if_index diff is now in, this should probably go in as well. it has received some testing in the meantime. original description: in order to make our life a bit easier and prevent rogue accesses to the routing table from the hardware interrupt context violating all kinds of spl assumptions we would like if_link_state_change that is called by network device drivers in their interrupt service routines to defer its work to the process context or thereabouts. please note that a token (an interface index) is passed to the workq in order to make sure that if the interface would be gone by the time syswq goes around to run the task it would just fall through. ok? diff --git sys/net/if.c sys/net/if.c index 6dafd0d..5b6800a 100644 --- sys/net/if.c +++ sys/net/if.c @@ -79,10 +79,11 @@ #include sys/protosw.h #include sys/kernel.h #include sys/ioctl.h #include sys/domain.h #include sys/sysctl.h +#include sys/workq.h #include net/if.h #include net/if_dl.h #include net/if_media.h #include net/if_types.h @@ -151,10 +152,12 @@ int if_clone_list(struct if_clonereq *); struct if_clone*if_clone_lookup(const char *, int *); void if_congestion_clear(void *); intif_group_egress_build(void); +void if_link_state_change_task(void *, void *); + intifai_cmp(struct ifaddr_item *, struct ifaddr_item *); void ifa_item_insert(struct sockaddr *, struct ifaddr *, struct ifnet *); void ifa_item_remove(struct sockaddr *, struct ifaddr *, struct ifnet *); #ifndef SMALL_KERNEL void ifa_print_rb(void); @@ -1106,21 +1109,39 @@ if_up(struct ifnet *ifp) m_clinitifp(ifp); } /* - * Process a link state change. - * NOTE: must be called at splsoftnet or equivalent. + * Schedule a link state change task. */ void if_link_state_change(struct ifnet *ifp) { - rt_ifmsg(ifp); + /* try to put the routing table update task on syswq */ + workq_add_task(NULL, 0, if_link_state_change_task, + (void *)((unsigned long)ifp-if_index), NULL); +} + +/* + * Process a link state change. + */ +void +if_link_state_change_task(void *arg, void *unused) +{ + unsigned int index = (unsigned long)arg; + struct ifnet *ifp; + int s; + + s = splsoftnet(); + if ((ifp = if_get(index)) != NULL) { + rt_ifmsg(ifp); #ifndef SMALL_KERNEL - rt_if_track(ifp); + rt_if_track(ifp); #endif - dohooks(ifp-if_linkstatehooks, 0); + dohooks(ifp-if_linkstatehooks, 0); + } + splx(s); } /* * Handle interface watchdog timer routines. Called * from softclock, we decrement timers (if set) and
Re: Make bioctl(4) print cache policy
On 22 October 2013 15:22, Mark Kettenis mark.kette...@xs4all.nl wrote: Diff below makes bioctl(4) print the cache policy for that's currently in effect for RAID volumes. It only prints the state (WB for write-back, WT for write-through) if the RAID controller driver fills in the details in response to a BIOCVOL ioctl. This diff adds such support to mfi(4). # bioctl mfi0 Volume Status Size Device mfi0 0 Online72746008576 sd0 RAID1 WB 0 Online73407820800 1:0.0 noencl HITACHI HUS153073VLS300 A598 1 Online73407820800 1:1.0 noencl HITACHI HUS153073VLS300 A598 ok? mpii specs do not use terms write-back or write-through, they rather talk about write cache being enabled or disabled. they obviously mean write-back cache. i believe for the bioctl user it's more important to know if actual writes are deferred or not, so i would rather say Write Cache Enabled/Disabled or Write Cache On/Off or something similar rather then if it's WB or WT. oterwise people will have to look in the man page for what WB or WT stands for and then search wikipedia for what does it mean.. just my 2 cents.
convert crypto queue to the task(9) api
Tested on amd64 SP and MP, i386 SP so far. sparc64 MP test is in progress. I've also tested the crypto(4) interface (doesn't use queue) so softraid should work as well. ok? diff --git sys/crypto/crypto.c sys/crypto/crypto.c index 7df0c435..fbdcd97 100644 --- sys/crypto/crypto.c +++ sys/crypto/crypto.c @@ -34,11 +34,11 @@ struct cryptocap *crypto_drivers = NULL; int crypto_drivers_num = 0; struct pool cryptop_pool; struct pool cryptodesc_pool; -struct workq *crypto_workq; +struct taskq *crypto_taskq; /* * Create a new session. */ int @@ -414,26 +414,26 @@ crypto_dispatch(struct cryptop *crp) hid = (crp-crp_sid 32) 0x; if (hid crypto_drivers_num) crypto_drivers[hid].cc_queued++; splx(s); - if (crypto_workq) { - workq_queue_task(crypto_workq, crp-crp_wqt, 0, - (workq_fn)crypto_invoke, crp, NULL); + if (crypto_taskq) { + task_set(crp-crp_task, (void (*))crypto_invoke, crp, NULL); + task_add(crypto_taskq, crp-crp_task); } else { crypto_invoke(crp); } return 0; } int crypto_kdispatch(struct cryptkop *krp) { - if (crypto_workq) { - workq_queue_task(crypto_workq, krp-krp_wqt, 0, - (workq_fn)crypto_kinvoke, krp, NULL); + if (crypto_taskq) { + task_set(krp-krp_task, (void (*))crypto_kinvoke, krp, NULL); + task_add(crypto_taskq, krp-krp_task); } else { crypto_kinvoke(krp); } return 0; @@ -616,11 +616,11 @@ crypto_getreq(int num) } void crypto_init(void) { - crypto_workq = workq_create(crypto, 1, IPL_HIGH); + crypto_taskq = taskq_create(crypto, 1, IPL_HIGH); pool_init(cryptop_pool, sizeof(struct cryptop), 0, 0, 0, cryptop, NULL); pool_init(cryptodesc_pool, sizeof(struct cryptodesc), 0, 0, 0, cryptodesc, NULL); @@ -635,23 +635,24 @@ crypto_done(struct cryptop *crp) crp-crp_flags |= CRYPTO_F_DONE; if (crp-crp_flags CRYPTO_F_NOQUEUE) { /* not from the crypto queue, wakeup the userland process */ crp-crp_callback(crp); } else { - workq_queue_task(crypto_workq, crp-crp_wqt, 0, - (workq_fn)crp-crp_callback, crp, NULL); + task_set(crp-crp_task, (void (*))crp-crp_callback, + crp, NULL); + task_add(crypto_taskq, crp-crp_task); } } /* * Invoke the callback on behalf of the driver. */ void crypto_kdone(struct cryptkop *krp) { - workq_queue_task(crypto_workq, krp-krp_wqt, 0, - (workq_fn)krp-krp_callback, krp, NULL); + task_set(krp-krp_task, (void (*))krp-krp_callback, krp, NULL); + task_add(crypto_taskq, krp-krp_task); } int crypto_getfeat(int *featp) { diff --git sys/crypto/cryptodev.h sys/crypto/cryptodev.h index 4f87046..4f732f9 100644 --- sys/crypto/cryptodev.h +++ sys/crypto/cryptodev.h @@ -51,11 +51,11 @@ #ifndef _CRYPTO_CRYPTO_H_ #define _CRYPTO_CRYPTO_H_ #include sys/ioccom.h -#include sys/workq.h +#include sys/task.h /* Some initial values */ #define CRYPTO_DRIVERS_INITIAL 4 #define CRYPTO_DRIVERS_MAX 128 #define CRYPTO_SW_SESSIONS 32 @@ -158,11 +158,11 @@ struct cryptodesc { struct cryptodesc *crd_next; }; /* Structure describing complete operation */ struct cryptop { - struct workq_task crp_wqt; + struct task crp_task; u_int64_t crp_sid;/* Session ID */ int crp_ilen; /* Input data total length */ int crp_olen; /* Result total length */ int crp_alloctype; /* Type of buf to allocate if needed */ @@ -228,11 +228,11 @@ struct crypt_kop { #define CRF_DSA_SIGN (1 CRK_DSA_SIGN) #define CRF_DSA_VERIFY (1 CRK_DSA_VERIFY) #define CRF_DH_COMPUTE_KEY (1 CRK_DH_COMPUTE_KEY) struct cryptkop { - struct workq_task krp_wqt; + struct task krp_task; u_int krp_op; /* ie. CRK_MOD_EXP or other */ u_int krp_status; /* return status */ u_short krp_iparams;/* # of input parameters */ u_short krp_oparams;/* # of output parameters */
Re: convert crypto queue to the task(9) api
On Wed, Oct 30, 2013 at 14:58 +0100, Mike Belopuhov wrote: Tested on amd64 SP and MP, i386 SP so far. sparc64 MP test is in progress. I've also tested the crypto(4) interface (doesn't use queue) so softraid should work as well. sparc64 test is done. on a side note, that IPL_HIGH can be safely converted to IPL_VM since crypto code itself uses splvm (which is also rather pointless because hw crypto uses IPL_NET and mbuf allocation code will bump IPL itself anyways). i have just tested IPL_VM taskq on the aforementioned machines. ok? diff --git sys/crypto/crypto.c sys/crypto/crypto.c index 7df0c435..fbdcd97 100644 --- sys/crypto/crypto.c +++ sys/crypto/crypto.c @@ -34,11 +34,11 @@ struct cryptocap *crypto_drivers = NULL; int crypto_drivers_num = 0; struct pool cryptop_pool; struct pool cryptodesc_pool; -struct workq *crypto_workq; +struct taskq *crypto_taskq; /* * Create a new session. */ int @@ -414,26 +414,26 @@ crypto_dispatch(struct cryptop *crp) hid = (crp-crp_sid 32) 0x; if (hid crypto_drivers_num) crypto_drivers[hid].cc_queued++; splx(s); - if (crypto_workq) { - workq_queue_task(crypto_workq, crp-crp_wqt, 0, - (workq_fn)crypto_invoke, crp, NULL); + if (crypto_taskq) { + task_set(crp-crp_task, (void (*))crypto_invoke, crp, NULL); + task_add(crypto_taskq, crp-crp_task); } else { crypto_invoke(crp); } return 0; } int crypto_kdispatch(struct cryptkop *krp) { - if (crypto_workq) { - workq_queue_task(crypto_workq, krp-krp_wqt, 0, - (workq_fn)crypto_kinvoke, krp, NULL); + if (crypto_taskq) { + task_set(krp-krp_task, (void (*))crypto_kinvoke, krp, NULL); + task_add(crypto_taskq, krp-krp_task); } else { crypto_kinvoke(krp); } return 0; @@ -616,11 +616,11 @@ crypto_getreq(int num) } void crypto_init(void) { - crypto_workq = workq_create(crypto, 1, IPL_HIGH); + crypto_taskq = taskq_create(crypto, 1, IPL_HIGH); pool_init(cryptop_pool, sizeof(struct cryptop), 0, 0, 0, cryptop, NULL); pool_init(cryptodesc_pool, sizeof(struct cryptodesc), 0, 0, 0, cryptodesc, NULL); @@ -635,23 +635,24 @@ crypto_done(struct cryptop *crp) crp-crp_flags |= CRYPTO_F_DONE; if (crp-crp_flags CRYPTO_F_NOQUEUE) { /* not from the crypto queue, wakeup the userland process */ crp-crp_callback(crp); } else { - workq_queue_task(crypto_workq, crp-crp_wqt, 0, - (workq_fn)crp-crp_callback, crp, NULL); + task_set(crp-crp_task, (void (*))crp-crp_callback, + crp, NULL); + task_add(crypto_taskq, crp-crp_task); } } /* * Invoke the callback on behalf of the driver. */ void crypto_kdone(struct cryptkop *krp) { - workq_queue_task(crypto_workq, krp-krp_wqt, 0, - (workq_fn)krp-krp_callback, krp, NULL); + task_set(krp-krp_task, (void (*))krp-krp_callback, krp, NULL); + task_add(crypto_taskq, krp-krp_task); } int crypto_getfeat(int *featp) { diff --git sys/crypto/cryptodev.h sys/crypto/cryptodev.h index 4f87046..4f732f9 100644 --- sys/crypto/cryptodev.h +++ sys/crypto/cryptodev.h @@ -51,11 +51,11 @@ #ifndef _CRYPTO_CRYPTO_H_ #define _CRYPTO_CRYPTO_H_ #include sys/ioccom.h -#include sys/workq.h +#include sys/task.h /* Some initial values */ #define CRYPTO_DRIVERS_INITIAL 4 #define CRYPTO_DRIVERS_MAX 128 #define CRYPTO_SW_SESSIONS 32 @@ -158,11 +158,11 @@ struct cryptodesc { struct cryptodesc *crd_next; }; /* Structure describing complete operation */ struct cryptop { - struct workq_task crp_wqt; + struct task crp_task; u_int64_t crp_sid;/* Session ID */ int crp_ilen; /* Input data total length */ int crp_olen; /* Result total length */ int crp_alloctype; /* Type of buf to allocate if needed */ @@ -228,11 +228,11 @@ struct crypt_kop { #define CRF_DSA_SIGN (1 CRK_DSA_SIGN) #define CRF_DSA_VERIFY (1 CRK_DSA_VERIFY) #define CRF_DH_COMPUTE_KEY (1 CRK_DH_COMPUTE_KEY) struct cryptkop { - struct workq_task krp_wqt; + struct task krp_task; u_int krp_op; /* ie. CRK_MOD_EXP or other */ u_int krp_status; /* return status */ u_short krp_iparams;/* # of input parameters */ u_short krp_oparams;/* # of output parameters */
Re: IPv6 routing header type 0
On 14 November 2013 18:52, Henning Brauer lists-openbsdt...@bsws.de wrote: * Theo de Raadt dera...@cvs.openbsd.org [2013-11-14 18:47]: it is the status quo *right now* Look, you can't call something the status quo when a commit was made 1 month ago, to a REAL status quo that existed for 10 years when itojun made the change... and immediately after this recent commit we started arguying about the change. Go find out what status quo means. let's not get into this, leads us nowhere. It *was* being filtered, until a month ago. no news here - we had discussed that change at the time. pretty sure you were in the loop even. No, I was not in the loop before it was commited, and many others were not either. Apparently only you and mikeb were, and you did NOT push for more people to know about the change before it was commited. mikeb? bluhm worked on comitted that otoh. i'm still pretty damn sure you were Cc'd; won't dig for old mail just to prove it; don't see the point, doesn't change anything now anyway. we have discussed that with bluhm in berlin and initially i had the same opinion: leave the check in the stack, but he has convinced me that it's rather pf's job to do it. i'm not against bringing it back and his diff looks fine to me, esp. since it avoids double check that was there before. The non-pf RH0 filtering case is worthwhile. and here we disagree. -- Henning Brauer, h...@bsws.de, henn...@openbsd.org BS Web Services GmbH, http://bsws.de, Full-Service ISP Secure Hosting, Mail and DNS Services. Dedicated Servers, Root to Fully Managed Henning Brauer Consulting, http://henningbrauer.com/