Re: AVL tree
I test this implementation with 1000 entry inserted by random() ;) average seek time with random search of 1 is from 2 to 3 seconds On Fri, 20 May 2011 04:04:07 +0200 Ariane van der Steldt ari...@stack.nl wrote: On Thu, May 19, 2011 at 07:33:22PM +0200, Mike Belopuhov wrote: On Thu, May 19, 2011 at 7:25 PM, Michael Pounov mi...@aitbg.com wrote: Not see differences in results with performance from RB tree vs AVL, Which tree size did you use to test that? Which insert order (yes, this is relevant: with the right ordering you can hit the worst or best case of a tree algorithm). Did you test lookup time or update time or both (combined or split)? but right solution for problems when we have choice between algorithms. if you don't see any difference then what's the point in having them both available? AVL trees have a difference of max 1 between the longest and shortest path to a leaf, whereas RB-trees have at max the longest path be double the length of the shortest path. I.e. work case lookups require traversal of ln(size)/ln(2) elements for AVL trees, 2 * ln(size)/ln(2) elements for RB trees. However, RB trees are slightly more efficient with insertion/removal. The better balancing of AVL trees can make a big difference in lookup time compared to RB trees, for trees containing millions of items. how are you supposed to choose between them? Basically it's a trade off. Is lookup more important, or insert/remove? And ofcourse, if you still don't know, just take what strikes your fancy. :D But I'm mostly interested because I tend to use trees left, right and center and those macros lead to code bloat. If I can use a tree without macros, I'm happy. -- Ariane -- M.Punov - AITNET - Sofia/Bulgaria - Software Network Solutions (+359) 888 73 73 58;(+359) 2 402 4000
Re: AVL tree
On Fri, May 20, 2011 at 4:04 AM, Ariane van der Steldt ari...@stack.nl wrote: AVL trees have a difference of max 1 between the longest and shortest path to a leaf, whereas RB-trees have at max the longest path be double the length of the shortest path. I.e. work case lookups require traversal of ln(size)/ln(2) elements for AVL trees, 2 * ln(size)/ln(2) elements for RB trees. However, RB trees are slightly more efficient with insertion/removal. The better balancing of AVL trees can make a big difference in lookup time compared to RB trees, for trees containing millions of items. how are you supposed to choose between them? Basically it's a trade off. Is lookup more important, or insert/remove? And ofcourse, if you still don't know, just take what strikes your fancy. :D i know about the difference of lookup and insert/remove speed, but is there a significant difference in practice (in openbsd use cases)? But I'm mostly interested because I tend to use trees left, right and center and those macros lead to code bloat. If I can use a tree without macros, I'm happy. so maybe it makes sense to have a non macro implementation in libkern and use it in uvm or whenever needed? -- Ariane
Re: AVL tree
On Fri, 20 May 2011 05:01:41 +0200 Ariane van der Steldt ari...@stack.nl wrote: On Thu, May 19, 2011 at 08:28:09PM +0300, Michael Pounov wrote: Add AVL tree implementation and merge few RB tree related macros. If you have comments or any claims, please send me feedback and I will fix them. Inline patch:: --- /usr/src/sys/sys/tree.h Mon Mar 2 11:42:55 2009 +++ tree.h Thu May 19 20:16:36 2011 @@ -730,9 +730,367 @@ (x) != NULL; \ (x) = name##_RB_NEXT(x)) +#define RB_FOREACH_FROM(x, name, y) \ This macro modifies parameter y, which I do not expect from using it. Please keep y its original value. Also, I want to be able to use an rvalue for y. + for ((x) = (y); \ + ((x) != NULL) ((y) = name##_RB_NEXT(x), (x) != NULL);\ +(x) = (y)) + +#define RB_FOREACH_SAFE(x, name, head, y) \ (Again, don't modify the value of y.) + for ((x) = RB_MIN(name, head); \ + ((x) != NULL) ((y) = name##_RB_NEXT(x), (x) != NULL);\ +(x) = (y)) + Is there a specific reason your wrote the above 2 macros? The pattern where the for loop is written out is quite common and well understood by developers. #define RB_FOREACH_REVERSE(x, name, head) \ for ((x) = RB_MAX(name, head); \ (x) != NULL; \ (x) = name##_RB_PREV(x)) + +#define RB_FOREACH_REVERSE_FROM(x, name, y) \ + for ((x) = (y); \ + ((x) != NULL) ((y) = name##_RB_PREV(x), (x) != NULL);\ +(x) = (y)) + +#define RB_FOREACH_REVERSE_SAFE(x, name, head, y) \ + for ((x) = RB_MAX(name, head); \ + ((x) != NULL) ((y) = name##_RB_PREV(x), (x) != NULL);\ +(x) = (y)) + +/* Macros that define a avl tree */ +#define AVL_MAX_HEIGHT 42 No. Trees do not have a max height. On a modern amd64, I can create code which will be able to fit more than 2 ** 42 items in the tree (provided I had the physmem in my machine). +#define AVL_HEAD(name, type) \ +struct name { \ + struct type *avlh_root; /* root of the tree */ \ + int avlh_height;\ + long avlh_count;\ This tree implementation use height for ancestors temporary array because we have not a pointer to the parent. You may use all memory when set new height 2**height = all stored elements in tree! :) size_t, not long. KR guarantee it'll fit for every future version C. Why is there a count? For most trees in the system, we don't care how large they are. And we don't care for the height either. +} + +#define AVL_INITIALIZER(root) \ + { NULL, 0, 0 } + +#define AVL_INIT(root, height) do { \ + (root)-avlh_root = NULL; \ + (root)-avlh_height = !height ? AVL_MAX_HEIGHT : height;\ height? This is not the current height of the tree... Don't like it; if the items in the tree are constrained in some way, other code will have dealt with it. + (root)-avlh_count = 0; \ +} while (0) + +#define AVL_ENTRY(type) \ +struct { \ + struct type *avle_left; /* left element */ \ + struct type *avle_right;/* right element */ \ + int avle_height;/* node color */\ Comment copy-pasted from RB tree. +} + +#define AVL_LEFT(elm, field) (elm)-field.avle_left +#define AVL_RIGHT(elm, field) (elm)-field.avle_right +#define AVL_HEIGHT(elm, field) (elm)-field.avle_height +#define AVL_ROOT(head) (head)-avlh_root +#define AVL_EMPTY(head)(AVL_ROOT(head) == NULL) +#define AVL_ROOT_HEIGHT(head) (head)-avlh_height How about this instead: (AVL_EMPTY(head) ? 0 : AVL_HEIGHT(AVL_ROOT(head))) +#define AVL_ROOT_COUNT(head) (head)-avlh_count + +/* Generates prototypes and inline functions */ +#defineAVL_PROTOTYPE(name, type, field, cmp) \ + AVL_PROTOTYPE_INTERNAL(name, type, field, cmp,) +#define
rdr-to ::1
inet6 pf rules that rdr-to ::1 do not work currently. Matching packets just disappear and the counter packets that violated scope rules from a netstat -s -p ip6 gets incremented. It came up before on misc@: http://marc.info/?t=12680425912r=1w=2 The attached diff removes the check (let's call it check #1) that drops the packet. This is just to point out where the problem is because the BSD's have diverged here: FreeBSD _replaced_ check #1 with another check (#2) in this diff from 2004 (from kame): http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet6/ip6_input.c.diff?r1=1.68;r2=1.69 NetBSD _replaced_ check #1 it with something totally different (attributed to jinmei@kame) http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/netinet6/ip6_input.c?rev=1.81content-type=text/x-cvsweb-markup Check #2 was _added_ to OpenBSD in 2006 (attributed to jinmei@kame): http://www.openbsd.org/cgi-bin/cvsweb/src/sys/netinet6/ip6_input.c.diff?r1=1.72;r2=1.73 Basically, check #1 is gone in FreeBSD and NetBSD and the diff syncs us closer to FreeBSD. I'm unsure if it's the right thing to do though for a few reasons. The FreeBSD diff has a second part that I cannot yet tell is related or not. Or maybe the NetBSD diff could be better. And OpenBSD also seems to have other checks in this area. I'll spend some more time on this, but maybe there's an IPv6 guru that can lend a hand? :-) -- Cam Index: ip6_input.c === RCS file: /cvs/src/sys/netinet6/ip6_input.c,v retrieving revision 1.99 diff -u -r1.99 ip6_input.c --- ip6_input.c 3 Apr 2011 13:56:05 - 1.99 +++ ip6_input.c 20 May 2011 09:30:14 - @@ -270,7 +270,6 @@ in6_ifstat_inc(m-m_pkthdr.rcvif, ifs6_in_addrerr); goto bad; } - if (IN6_IS_ADDR_MC_INTFACELOCAL(ip6-ip6_dst) !(m-m_flags M_LOOP)) { /* @@ -340,19 +339,6 @@ ip6 = mtod(m, struct ip6_hdr *); srcrt = !IN6_ARE_ADDR_EQUAL(odst, ip6-ip6_dst); #endif - - if (IN6_IS_ADDR_LOOPBACK(ip6-ip6_src) || - IN6_IS_ADDR_LOOPBACK(ip6-ip6_dst)) { - if (m-m_pkthdr.rcvif-if_flags IFF_LOOPBACK) { - ours = 1; - deliverifp = m-m_pkthdr.rcvif; - goto hbhcheck; - } else { - ip6stat.ip6s_badscope++; - in6_ifstat_inc(m-m_pkthdr.rcvif, ifs6_in_addrerr); - goto bad; - } - } /* drop packets if interface ID portion is already filled */ if ((m-m_pkthdr.rcvif-if_flags IFF_LOOPBACK) == 0) {
Re: set skip on ifgroup
* Alexander Hall ha...@openbsd.org [2011-05-19 10:25]: On 05/18/11 23:31, Stuart Henderson wrote: set skip in PF has a slightly unexpected behaviour; rather than skipping by interface group, it matches on the non-numeric part of an interface name. I think the prefix match test is a common behaviour so I think you should keep that. no, that is just leftover. -- Henning Brauer, h...@bsws.de, henn...@openbsd.org BS Web Services, http://bsws.de Full-Service ISP - Secure Hosting, Mail and DNS Services Dedicated Servers, Rootservers, Application Hosting
Re: set skip on ifgroup
* Claudio Jeker cje...@diehard.n-r-g.com [2011-05-19 11:29]: There is a bigger problem with 'set skip on lo', it is only evaluated during load. So if you create a lo1 afterwards the set skip will not trigger. This is very annoying especially with qemu and tun interfaces. To be honest I'm not sure who will do a 'set skip on sis' or 'set skip on em'. Normaly you want to filter on your physical interfaces and not just skip them all. For pseudo-devices like lo, tun, vlan, etc. a group is created automatically. I think the very important bit is this: Hmmm, looking further, it seems ordinary rules only match on the interface name or group as well (in pfi_kif_match()), so maybe you're just plain right after all. :-) set skip is currently special and works in a not so expected way so it is better to make it work like all other users of interface names and people needing 'set skip on em' should add a 'group em' line to their hostname.em* files. spot on. -- Henning Brauer, h...@bsws.de, henn...@openbsd.org BS Web Services, http://bsws.de Full-Service ISP - Secure Hosting, Mail and DNS Services Dedicated Servers, Rootservers, Application Hosting
Re: set skip on ifgroup
* Stuart Henderson s...@spacehopper.org [2011-05-19 11:21]: Note that the default ruleset does include a 'set skip on lo' but that's fine since lo* interfaces are by default added to the lo group. If people get bitten by this change, they could either add an interface-name-matching group to each interface or we do that automatically, as is done for vlan's, lo's etc. What does anyone else think about this (making em0 a member of em automatically, etc.)? I don't think it's really all that important, there are usually better grouping criteria than what driver supports the device and people have to change their ruleset this release anyway. we have specifically decided to NOT do this. what prupose does a group of all interface of a certain driver have? I see zero for teh regular drivers. the ones where it makes sense are the clonables like ppp and for those we have automatic base class group, i. e. all tun interfaces end up in the tun group by default. -- Henning Brauer, h...@bsws.de, henn...@openbsd.org BS Web Services, http://bsws.de Full-Service ISP - Secure Hosting, Mail and DNS Services Dedicated Servers, Rootservers, Application Hosting
Re: set skip on ifgroup
* Claudio Jeker cje...@diehard.n-r-g.com [2011-05-19 13:20]: On Thu, May 19, 2011 at 10:49:59AM +0100, Stuart Henderson wrote: On 2011/05/19 11:26, Claudio Jeker wrote: There is a bigger problem with 'set skip on lo', it is only evaluated during load. So if you create a lo1 afterwards the set skip will not trigger. This is very annoying especially with qemu and tun interfaces. Right, I noticed this during testing, and this at least deserves a mention (independent of my other diff). Changing this behaviour could be a problem though, I think it would need to be checked before state lookup, and we don't want to walk the groups of all interfaces on the system for every packet. Since the PFI_IFLAG_SKIP is set on a kif it should be possible to inherit the flag from a group when an interface is created. This would have no impact on runtime execution time. I would really like to have set skip pick up interfaces at runtime because I get constantly burned by it. again, spot on. -- Henning Brauer, h...@bsws.de, henn...@openbsd.org BS Web Services, http://bsws.de Full-Service ISP - Secure Hosting, Mail and DNS Services Dedicated Servers, Rootservers, Application Hosting
Re: set skip on ifgroup
* Reyk Floeter r...@openbsd.org [2011-05-19 11:47]: On Thu, May 19, 2011 at 11:26:59AM +0200, Claudio Jeker wrote: To be honest I'm not sure who will do a 'set skip on sis' or 'set skip on em'. I would ;-) Sometimes you have machines with different types of physical interfaces where one type is used for internal stuff like a dedicated pfsync or management link, eg. set skip on em block in on ix0 ...or soekrises with vr(4) and a quad sis(4) in one box. you are abusing a coincidence, in this case, pretty much. but really, you can just put a group vr in hostname.vr* if you want this. or use a more descriptive group name to begin with :) I think the very important bit is this: Hmmm, looking further, it seems ordinary rules only match on the interface name or group as well (in pfi_kif_match()), so maybe you're just plain right after all. :-) set skip is currently special and works in a not so expected way so it is better to make it work like all other users of interface names and people needing 'set skip on em' should add a 'group em' line to their hostname.em* files. ifconfig em also works, so i think it would be less special and confusing if set skip on em would just work without extra config magic. I disagree. the ifconfig em is a backwards compat hack that doesn't make much sense at all. -- Henning Brauer, h...@bsws.de, henn...@openbsd.org BS Web Services, http://bsws.de Full-Service ISP - Secure Hosting, Mail and DNS Services Dedicated Servers, Rootservers, Application Hosting
Re: set skip on ifgroup
* Stuart Henderson s...@spacehopper.org [2011-05-19 11:50]: On 2011/05/19 11:26, Claudio Jeker wrote: There is a bigger problem with 'set skip on lo', it is only evaluated during load. So if you create a lo1 afterwards the set skip will not trigger. This is very annoying especially with qemu and tun interfaces. Right, I noticed this during testing, and this at least deserves a mention (independent of my other diff). Changing this behaviour could be a problem though, I think it would need to be checked before state lookup, and we don't want to walk the groups of all interfaces on the system for every packet. nah. we get calls from the interface subsystem when interfaces show up or go. just a few lines of code missing to deal with skip. -- Henning Brauer, h...@bsws.de, henn...@openbsd.org BS Web Services, http://bsws.de Full-Service ISP - Secure Hosting, Mail and DNS Services Dedicated Servers, Rootservers, Application Hosting
Split iked(8) policy creation imsg
Hello all. This patch splits off IMSG_CFG_POLICY into four messages: IMSG_CFG_POLICY_BEGIN IMSG_CFG_POLICY_PROPOSAL IMSG_CFG_POLICY_FLOW IMSG_CFG_POLICY_COMMIT Each new policy should start with IMSG_CFG_POLICY_BEGIN, then proceed any number of proposals and flows, and then commit policy. I decided not to use ticket system because I see no race sources. Now we can have long, long policies in your iked.conf. Previously we were limited by maximum imsg packet size (MAX_IMSGSIZE) which is 16 kilobytes currently. This patch complements previous one: http://marc.info/?l=openbsd-techm=130583217730436w=2 Those patches are totally independent and may be applied in any order and combination. Both patches were tested on i386. -- Best wishes, Vadim Zhukov A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing in e-mail? Index: ikev2.c === RCS file: /cvs/src/sbin/iked/ikev2.c,v retrieving revision 1.55 diff -u -p -r1.55 ikev2.c --- ikev2.c 9 May 2011 11:15:18 - 1.55 +++ ikev2.c 20 May 2011 09:48:06 - @@ -133,8 +133,14 @@ ikev2_dispatch_parent(int fd, struct pri return (config_getsocket(env, imsg, ikev2_msg_cb)); case IMSG_PFKEY_SOCKET: return (config_getpfkey(env, imsg)); - case IMSG_CFG_POLICY: - return (config_getpolicy(env, imsg)); + case IMSG_CFG_POLICY_BEGIN: + return (config_getpolicy_begin(env, imsg)); + case IMSG_CFG_POLICY_PROPOSAL: + return (config_getpolicy_proposal(env, imsg)); + case IMSG_CFG_POLICY_FLOW: + return (config_getpolicy_flow(env, imsg)); + case IMSG_CFG_POLICY_COMMIT: + return (config_getpolicy_commit(env, imsg)); case IMSG_CFG_USER: return (config_getuser(env, imsg)); case IMSG_COMPILE: Index: config.c === RCS file: /cvs/src/sbin/iked/config.c,v retrieving revision 1.12 diff -u -p -r1.12 config.c --- config.c9 May 2011 11:15:18 - 1.12 +++ config.c20 May 2011 09:48:06 - @@ -45,6 +45,10 @@ #include iked.h #include ikev2.h + +static struct iked_policy *newpol = NULL; + + struct iked_sa * config_new_sa(struct iked *env, int initiator) { @@ -584,122 +588,167 @@ config_setpolicy(struct iked *env, struc { struct iked_proposal*prop; struct iked_flow*flow; - struct iked_transform *xform; - size_t size, iovcnt, j, c = 0; struct iovec iov[IOV_MAX]; + unsigned int i; - iovcnt = 1; - size = sizeof(*pol); - TAILQ_FOREACH(prop, pol-pol_proposals, prop_entry) { - size += (prop-prop_nxforms * sizeof(*xform)) + - (sizeof(*prop)); - iovcnt += prop-prop_nxforms + 1; + if (env-sc_opts IKED_OPT_NOACTION) { + print_policy(pol); + return (0); } - size += pol-pol_nflows * sizeof(*flow); - iovcnt += pol-pol_nflows; - - if (iovcnt IOV_MAX) { - log_warn(%s: too many proposals/flows, __func__); + if (proc_compose_imsg(env, id, IMSG_CFG_POLICY_BEGIN, -1, + pol, sizeof(*pol)) == -1) return (-1); - } - - iov[c].iov_base = pol; - iov[c++].iov_len = sizeof(*pol); TAILQ_FOREACH(prop, pol-pol_proposals, prop_entry) { - iov[c].iov_base = prop; - iov[c++].iov_len = sizeof(*prop); - - for (j = 0; j prop-prop_nxforms; j++) { - xform = prop-prop_xforms + j; + iov[0].iov_base = prop; + iov[0].iov_len = sizeof(*prop); - iov[c].iov_base = xform; - iov[c++].iov_len = sizeof(*xform); + for (i = 1; i = prop-prop_nxforms; i++) { + if (i = IOV_MAX) { + log_warn(%s: too many proposals, __func__); + return (-1); + } + iov[i].iov_base = prop-prop_xforms + (i - 1); + iov[i].iov_len = sizeof(struct iked_transform); } + + if (proc_composev_imsg(env, id, IMSG_CFG_POLICY_PROPOSAL, -1, + iov, i) == -1) + return (-1); } RB_FOREACH(flow, iked_flows, pol-pol_flows) { - iov[c].iov_base = flow; - iov[c++].iov_len = sizeof(*flow); + if (proc_compose_imsg(env, id, IMSG_CFG_POLICY_FLOW, -1, + flow, sizeof(struct iked_flow)) == -1) + return (-1); } - if (env-sc_opts IKED_OPT_NOACTION) { -
Panic in sr_crypto_rw with kern.bufcachepercent = 75
Hi all, OpenBSD-current snapshot dated 16-May-2011: I get an always-reproducible panic with softraid crypto and kern.bufcachepercent = 75, when untarring a tarball of the complete source tree. The disk layout is the following, with softraid crypto for all but /: /dev/sd0a on / type ffs (local) /dev/sd2f on /home type ffs (local, nodev, nosuid) /dev/sd2e on /usr type ffs (local, nodev) /dev/sd2d on /var type ffs (local, nodev, nosuid) /dev/sd3i on /mnt type msdos (local) Here's the trace: panic: sr_crypto_rw: no crypto op Stopped at Debugger+0x4: popl %ebp RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC! IF RUNNING SMP, USE 'mach ddbcpu #' AND 'trace' ON OTHER PROCESSORS, TOO. DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION! ddb{0} trace Debugger(d08e41c2,dc777d78,d08b9e8f,dc777d78,0) at Debugger+0x4 panic(d08b9e8f,1,0,d1fcf488,d2016000) at panic+0x5d sr_crypto_rw(d1fcf488,0,74,d03e7cfa,d1fa7240) at sr_crypto_rw+0x93 sr_scsi_cmd(d6c75178,0,478b7e0,0,20) at sr_scsi_cmd+0x17f sdstart(d6c75178,d1fcf488,d2016c40,d1fd2c00,d1fd2c84) at sdstart+0x1b4 scsi_ioh_runqueue(d2016c2c,d1fd2c84,d6793b90,dc777e5c,d1fa7000) at scsi_ioh_runqueue+0x59 scsi_xsh_runqueue(d1fd2c00,d6793b90,0,d0578683,20) at scsi_xsh_runqueue+0x11d sdstrategy(d6793b90,4a007000,0,fb530b2,0) at sdstrategy+0x175 spec_strategy(dc777ec8,d08c4f07,dc777eec,1001b4,d6793b90) at spec_strategy+0x3d VOP_STRATEGY(d6793b90,1,4,d6793b90,d6849d7c) at VOP_STRATEGY+0x2c bwrite(d6793b90,d08c4323,dc777f2c,d041964e,d6793b90) at bwrite+0xcc VOP_BWRITE(d6793b90,d6df027c,dc777f5c,0,d6dd41dc) at VOP_BWRITE+0x29 ffs_sync(dc777f4c,0,d6dd41dc,d6e53000,3) at ffs_sync+0x126 VOP_FSYNC(d6dd41dc,d6e53000,3,d6df027c,d03d2f23) at VOP_FSYNC+0x35 sched_sync(d6df027c) at sched_sync+0xaf Bad frame pointer: 0xd0baae88 ddb{0} mach ddbcpu 1 Stopped at Debugger+0x4: popl %ebp RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC! IF RUNNING SMP, USE 'mach ddbcpu #' AND 'trace' ON OTHER PROCESSORS, TOO. DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION! ddb{1} trace Debugger(d1e9bc00,d083cb6c,0,296,d1e9bc34) at Debugger+0x4 i386_ipi_handler(b0,d03e0020,d09e,d1e90010,df3b0010) at i386_ipi_handler+0x5f Xintripi() at Xintripi+0x49 --- interrupt --- __mp_lock(d0a35ce4,7fff,dc751f18,d0202630,0) at __mp_lock+0x52 i386_softintlock(0,d03e0020,d0a3,10,d1e90010) at i386_softintlock+0x12 Xintrltimer() at Xintrltimer+0x50 --- interrupt --- cpu_idle_cycle(d1e9bc00) at cpu_idle_cycle+0xf Bad frame pointer: 0xd0baae48 ddb{1} ps PID PPID PGRP UID S FLAGS WAITCOMMAND 1177 10944 1177 1000 2 0x200 tar 10944 1 10944 1000 3 0x280 pause ksh 32405 1 32405 0 3 0x280 ttyin getty 22382 1 22382 0 3 0x280 ttyin getty 16249 1 16249 0 3 0x280 ttyin getty 4339 1 4339 0 3 0x280 ttyin getty 27948 1 27948 0 3 0x280 select cron 22709 1 22709 0 3 0x280 kqread apmd 3666 1 366699 3 0x280 pollaucat 27290 1 27290 0 3 0x280 select sshd 28487 1837 183774 3 0x280 bpf pflogd 1837 1 1837 0 3 0x280 netio pflogd 14290 22465 2246573 3 0x280 pollsyslogd 22465 1 22465 0 3 0x288 netio syslogd 11486 0 0 0 3 0x2100200 bored srdis 16 0 0 0 3 0x2100200 aiodonedaiodoned * 15 0 0 0 7 0x2100200 update 14 0 0 0 3 0x2100200 cleaner cleaner 13 0 0 0 30x100200 reaper reaper 12 0 0 0 2 0x2100200 pagedaemon 11 0 0 0 2 0x2100200 crypto 10 0 0 0 3 0x2100200 pftmpfpurge 9 0 0 0 3 0x2100200 usbtsk usbtask 8 0 0 0 3 0x2100200 usbatsk usbatsk 7 0 0 0 3 0x2100200 bored intelrel 6 0 0 0 3 0x2100200 acpi0 acpi0 5 0 0 0 7 0x40100200 idle1 4 0 0 0 3 0x2100200 bored syswq 3 0 0 0 3 0x40100200 idle0 2 0 0 0 3 0x2100200 kmalloc kmthread 1 0 1 0 3 0x280 waitinit 0 -1 0 0 3 0x2080200 scheduler swapper ddb{1} boot reboot softraid0: could not allocate metadata scratch area splassert: assertwaitok: want -1 have 1 With kern.bufcachepercent=70 the system seems stable (tested by untarring src.tar a lot of times). dmesg below... Cheers! David OpenBSD 4.9-current (GENERIC.MP) #113: Mon May 16 11:52:06 MDT
Re: Panic in sr_crypto_rw with kern.bufcachepercent = 75
On Fri, May 20, 2011 at 06:24 -0600, David Coppa wrote: Hi all, OpenBSD-current snapshot dated 16-May-2011: I get an always-reproducible panic with softraid crypto and kern.bufcachepercent = 75, when untarring a tarball of the complete source tree. The disk layout is the following, with softraid crypto for all but /: /dev/sd0a on / type ffs (local) /dev/sd2f on /home type ffs (local, nodev, nosuid) /dev/sd2e on /usr type ffs (local, nodev) /dev/sd2d on /var type ffs (local, nodev, nosuid) /dev/sd3i on /mnt type msdos (local) Here's the trace: panic: sr_crypto_rw: no crypto op hi, although i'm not sure that this is a best solution, i can't see why we should panic here. sr_scsi_cmd seems to cope with stuffups just fine. Index: dev/softraid_crypto.c === RCS file: /home/cvs/src/sys/dev/softraid_crypto.c,v retrieving revision 1.65 diff -u -p -r1.65 softraid_crypto.c --- dev/softraid_crypto.c 6 Apr 2011 03:14:51 - 1.65 +++ dev/softraid_crypto.c 20 May 2011 12:42:12 - @@ -1115,7 +1115,7 @@ sr_crypto_rw(struct sr_workunit *wu) if (wu-swu_xs-flags SCSI_DATA_OUT) { crp = sr_crypto_getcryptop(wu, 1); if (crp == NULL) - panic(sr_crypto_rw: no crypto op); + return (1); crp-crp_callback = sr_crypto_write; crp-crp_opaque = wu; s = splvm();
Re: Split iked(8) policy creation imsg
Hi! On Fri, May 20, 2011 at 03:54:03PM +0400, Vadim Zhukov wrote: This patch splits off IMSG_CFG_POLICY into four messages: IMSG_CFG_POLICY_BEGIN IMSG_CFG_POLICY_PROPOSAL IMSG_CFG_POLICY_FLOW IMSG_CFG_POLICY_COMMIT Each new policy should start with IMSG_CFG_POLICY_BEGIN, then proceed any number of proposals and flows, and then commit policy. I decided not to use ticket system because I see no race sources. Now we can have long, long policies in your iked.conf. Previously we were limited by maximum imsg packet size (MAX_IMSGSIZE) which is 16 kilobytes currently. Thanks for you diff. See comments below. It is appreciated and I can adjust it but you can see my comments for modifications below. reyk Index: ikev2.c === RCS file: /cvs/src/sbin/iked/ikev2.c,v retrieving revision 1.55 diff -u -p -r1.55 ikev2.c --- ikev2.c 9 May 2011 11:15:18 - 1.55 +++ ikev2.c 20 May 2011 09:48:06 - @@ -133,8 +133,14 @@ ikev2_dispatch_parent(int fd, struct pri return (config_getsocket(env, imsg, ikev2_msg_cb)); case IMSG_PFKEY_SOCKET: return (config_getpfkey(env, imsg)); - case IMSG_CFG_POLICY: - return (config_getpolicy(env, imsg)); + case IMSG_CFG_POLICY_BEGIN: + return (config_getpolicy_begin(env, imsg)); + case IMSG_CFG_POLICY_PROPOSAL: + return (config_getpolicy_proposal(env, imsg)); + case IMSG_CFG_POLICY_FLOW: + return (config_getpolicy_flow(env, imsg)); + case IMSG_CFG_POLICY_COMMIT: + return (config_getpolicy_commit(env, imsg)); The naming should be simplified just to IMSG_CFG_POLICY IMSG_CFG_PROPOSAL IMSG_CFG_FLOW same for the functions config_getpolicy config_getproposal config_getflow The commit step is not required because the existing IMSG_COMPILE finishes the policy configuration ones, calculates the skip steps and starts operation. See more comments below. There are some hints to make it fully async below (not reviewing the code in detail). case IMSG_CFG_USER: return (config_getuser(env, imsg)); case IMSG_COMPILE: Index: config.c === RCS file: /cvs/src/sbin/iked/config.c,v retrieving revision 1.12 diff -u -p -r1.12 config.c --- config.c 9 May 2011 11:15:18 - 1.12 +++ config.c 20 May 2011 09:48:06 - @@ -45,6 +45,10 @@ #include iked.h #include ikev2.h + +static struct iked_policy *newpol = NULL; + + Please don't use a global variable here. Just add the policy to the global list in config_getpolicy() and reference to the policy by id in the config_getproposal() and config_getflow() calls. struct iked_sa * config_new_sa(struct iked *env, int initiator) { @@ -584,122 +588,167 @@ config_setpolicy(struct iked *env, struc { struct iked_proposal*prop; struct iked_flow*flow; - struct iked_transform *xform; - size_t size, iovcnt, j, c = 0; struct iovec iov[IOV_MAX]; + unsigned int i; - iovcnt = 1; - size = sizeof(*pol); - TAILQ_FOREACH(prop, pol-pol_proposals, prop_entry) { - size += (prop-prop_nxforms * sizeof(*xform)) + - (sizeof(*prop)); - iovcnt += prop-prop_nxforms + 1; + if (env-sc_opts IKED_OPT_NOACTION) { + print_policy(pol); + return (0); } - size += pol-pol_nflows * sizeof(*flow); - iovcnt += pol-pol_nflows; - - if (iovcnt IOV_MAX) { - log_warn(%s: too many proposals/flows, __func__); + if (proc_compose_imsg(env, id, IMSG_CFG_POLICY_BEGIN, -1, + pol, sizeof(*pol)) == -1) return (-1); - } - - iov[c].iov_base = pol; - iov[c++].iov_len = sizeof(*pol); TAILQ_FOREACH(prop, pol-pol_proposals, prop_entry) { - iov[c].iov_base = prop; - iov[c++].iov_len = sizeof(*prop); - - for (j = 0; j prop-prop_nxforms; j++) { - xform = prop-prop_xforms + j; + iov[0].iov_base = prop; + iov[0].iov_len = sizeof(*prop); - iov[c].iov_base = xform; - iov[c++].iov_len = sizeof(*xform); + for (i = 1; i = prop-prop_nxforms; i++) { + if (i = IOV_MAX) { + log_warn(%s: too many proposals, __func__); + return (-1); + } + iov[i].iov_base = prop-prop_xforms + (i - 1); + iov[i].iov_len = sizeof(struct iked_transform); } + + if (proc_composev_imsg(env, id, IMSG_CFG_POLICY_PROPOSAL, -1, + iov, i) == -1) + return (-1); }
Re: Panic in sr_crypto_rw with kern.bufcachepercent = 75
On Fri, May 20, 2011 at 2:45 PM, Mike Belopuhov m...@crypt.org.ru wrote: On Fri, May 20, 2011 at 06:24 -0600, David Coppa wrote: Hi all, OpenBSD-current snapshot dated 16-May-2011: I get an always-reproducible panic with softraid crypto and kern.bufcachepercent = 75, when untarring a tarball of the complete source tree. The disk layout is the following, with softraid crypto for all but /: /dev/sd0a on / type ffs (local) /dev/sd2f on /home type ffs (local, nodev, nosuid) /dev/sd2e on /usr type ffs (local, nodev) /dev/sd2d on /var type ffs (local, nodev, nosuid) /dev/sd3i on /mnt type msdos (local) Here's the trace: panic: sr_crypto_rw: no crypto op hi, although i'm not sure that this is a best solution, i can't see why we should panic here. sr_scsi_cmd seems to cope with stuffups just fine. Index: dev/softraid_crypto.c === RCS file: /home/cvs/src/sys/dev/softraid_crypto.c,v retrieving revision 1.65 diff -u -p -r1.65 softraid_crypto.c --- dev/softraid_crypto.c 6 Apr 2011 03:14:51 - 1.65 +++ dev/softraid_crypto.c 20 May 2011 12:42:12 - @@ -1115,7 +1115,7 @@ sr_crypto_rw(struct sr_workunit *wu) if (wu-swu_xs-flags SCSI_DATA_OUT) { crp = sr_crypto_getcryptop(wu, 1); if (crp == NULL) - panic(sr_crypto_rw: no crypto op); + return (1); crp-crp_callback = sr_crypto_write; crp-crp_opaque = wu; s = splvm(); It now survives to several untarrings of src.tar with kern.bufcachepercent=90. cheers, David
Re: Allow ipsecctl-like grouping in iked(8)
hi, On Thu, May 19, 2011 at 11:06:44PM +0400, Vadim Zhukov wrote: This patch allows ipsecctl-like flow grouping along with current behavior. It allows to write many-to-many policies in a more compact way, see an example: ikev2 esp \ from { 1.2.3.4, 5.6.7.8 } to { 3.4.5.6, 4.5.6.7} \ from 7.8.9.0 to { 0.1.2.3, 2.3.4.5 } \ ... will create six flows: 1.2.3.4 - 3.4.5.6 1.2.3.4 - 4.5.6.7 5.6.7.8 - 3.4.5.6 5.6.7.8 - 4.5.6.7 7.8.9.0 - 0.1.2.3 7.8.9.0 - 2.3.4.5 Please note that you're still limited by MAX_IMSGSIZE, which is about 7 flows, depending on arch. This will be addressed in another patch. The difference between ipsec.conf and iked.conf and IKEv1 and IKEv2 is that an IKEv2 SA can have multiple flows, a.k.a. traffic selectors, in a single exchange while IKEv1 can only have one. So in ipsec.conf the code would expand it into multiple independent ike policies that would lead to multiple independent IKEv1 exchanges and SAs. With IKEv2 this would expand into a single IKEv2 SA with many traffic selectors that would have to fit into a single IKEv2 UDP packet (IKE_AUTH) which can lead to IP fragmentation and all kinds of problems. The current code does not prevent multiple traffic selectors as it is totally legal in IKEv2, but I would be careful with using macros to generate _many_ traffic selectors, or flows, in a policy. Maybe this is just a documentation issue or you need some additional length checking here to avoid/warn about excessive fragmentation (considering the 64k IPv4 limit as well). Reyk -- Best wishes, Vadim Zhukov A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing in e-mail? Index: iked.conf.5 === RCS file: /cvs/src/sbin/iked/iked.conf.5,v retrieving revision 1.13 diff -u -p -r1.13 iked.conf.5 --- iked.conf.5 21 Jan 2011 12:34:11 - 1.13 +++ iked.conf.5 19 May 2011 17:18:54 - @@ -318,6 +318,29 @@ For a list of all port name to number ma .Xr ipsecctl 8 , see the file .Pa /etc/services . +.Pp +Traffic selector can contain more than one address for both +.Ic from +and +.Ic to +clauses, for example: +.Bd -literal -offset indent +ikev2 esp \e + from { 172.16.0.0/24, 172.16.4.0/24, 172.16.9.0/24 } \e + to { 192.168.0.0/20, 10.0.0.0/8 } \e + peer 1.2.3.4 local 5.6.7.8 +.Ed +.Pp +In this case, six actual selectors will be created, from each +address in +.Ic from +part to each address in +.Ic to +part. +Commas between addresses are optional. +.Pp +You can combine multiple addresses and multiple traffic selectors +features in one statement. .It Ic local Ar localip Ic peer Ar remote The .Ic local @@ -821,9 +844,13 @@ and any other connection will be matched .Sq catch all policy. .Bd -literal -offset indent -ikev2 quick esp from 10.10.10.0/24 to 10.20.20.0/24 \e +ikev2 quick esp \e + from { 10.10.10.0/24, 10.11.10.0/24 } \e + to 10.20.20.0/24 \e peer 192.168.1.34 -ikev2 catch all esp from 10.0.1.0/24 to 10.0.2.0/24 \e +ikev2 catch all esp \e + from 10.0.1.0/24 to 10.0.2.0/24 \e + from 10.0.31.0/24 to 10.0.35.0/24 \e peer any ikev2 subnet esp from 10.0.3.0/24 to 10.0.4.0/24 \e peer 192.168.1.0/24 Index: parse.y === RCS file: /cvs/src/sbin/iked/parse.y,v retrieving revision 1.21 diff -u -p -r1.21 parse.y --- parse.y 18 Apr 2011 08:45:43 - 1.21 +++ parse.y 19 May 2011 17:18:54 - @@ -258,6 +258,8 @@ struct ipsec_hosts { struct ipsec_addr_wrap *dst; u_int16_tsport; u_int16_tdport; + struct ipsec_hosts *next; + struct ipsec_hosts *tail; }; struct ipsec_filters { @@ -293,6 +295,8 @@ intget_id_type(char *); u_int8_t x2i(unsigned char *); int parsekey(unsigned char *, size_t, struct iked_auth *); int parsekeyfile(char *, struct iked_auth *); +struct ipsec_hosts *expand_hosts(struct ipsec_addr_wrap *, u_int16_t, + struct ipsec_addr_wrap *, u_int16_t); struct ipsec_transforms *ipsec_transforms; struct ipsec_filters *ipsec_filters; @@ -345,7 +349,7 @@ typedef struct { %typev.number portval af %typev.peers peers %typev.anyhost anyhost -%typev.hosthost host_spec +%typev.hosthost host_list host_spec %typev.ids ids %typev.id id %typev.transforms transforms @@ -490,51 +494,20 @@ hosts_list : hosts { $$ = $1; } else if ($1
Re: Split iked(8) policy creation imsg
On 20 May 2011 c. 18:45:05 Reyk Floeter wrote: Hi! On Fri, May 20, 2011 at 03:54:03PM +0400, Vadim Zhukov wrote: This patch splits off IMSG_CFG_POLICY into four messages: IMSG_CFG_POLICY_BEGIN IMSG_CFG_POLICY_PROPOSAL IMSG_CFG_POLICY_FLOW IMSG_CFG_POLICY_COMMIT Each new policy should start with IMSG_CFG_POLICY_BEGIN, then proceed any number of proposals and flows, and then commit policy. I decided not to use ticket system because I see no race sources. Now we can have long, long policies in your iked.conf. Previously we were limited by maximum imsg packet size (MAX_IMSGSIZE) which is 16 kilobytes currently. Thanks for you diff. See comments below. It is appreciated and I can adjust it but you can see my comments for modifications below. Thank you for reviewing and commenting! I'll rework the diff soon and send new version. -- Best wishes, Vadim Zhukov A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing in e-mail?
Re: Panic in sr_crypto_rw with kern.bufcachepercent = 75
Can I get the orignal diff one more time? On Fri, May 20, 2011 at 04:51:02PM +0200, David Coppa wrote: On Fri, May 20, 2011 at 2:45 PM, Mike Belopuhov m...@crypt.org.ru wrote: On Fri, May 20, 2011 at 06:24 -0600, David Coppa wrote: Hi all, OpenBSD-current snapshot dated 16-May-2011: I get an always-reproducible panic with softraid crypto and kern.bufcachepercent = 75, when untarring a tarball of the complete source tree. The disk layout is the following, with softraid crypto for all but /: /dev/sd0a on / type ffs (local) /dev/sd2f on /home type ffs (local, nodev, nosuid) /dev/sd2e on /usr type ffs (local, nodev) /dev/sd2d on /var type ffs (local, nodev, nosuid) /dev/sd3i on /mnt type msdos (local) Here's the trace: panic: sr_crypto_rw: no crypto op hi, although i'm not sure that this is a best solution, i can't see why we should panic here. ?sr_scsi_cmd seems to cope with stuffups just fine. Index: dev/softraid_crypto.c === RCS file: /home/cvs/src/sys/dev/softraid_crypto.c,v retrieving revision 1.65 diff -u -p -r1.65 softraid_crypto.c --- dev/softraid_crypto.c ? ? ? 6 Apr 2011 03:14:51 - ? ? ? 1.65 +++ dev/softraid_crypto.c ? ? ? 20 May 2011 12:42:12 - @@ -1115,7 +1115,7 @@ sr_crypto_rw(struct sr_workunit *wu) ? ? ? ?if (wu-swu_xs-flags SCSI_DATA_OUT) { ? ? ? ? ? ? ? ?crp = sr_crypto_getcryptop(wu, 1); ? ? ? ? ? ? ? ?if (crp == NULL) - ? ? ? ? ? ? ? ? ? ? ? panic(sr_crypto_rw: no crypto op); + ? ? ? ? ? ? ? ? ? ? ? return (1); ? ? ? ? ? ? ? ?crp-crp_callback = sr_crypto_write; ? ? ? ? ? ? ? ?crp-crp_opaque = wu; ? ? ? ? ? ? ? ?s = splvm(); It now survives to several untarrings of src.tar with kern.bufcachepercent=90. cheers, David
Re: AVL tree
On Fri, May 20, 2011 at 11:23:47AM +0200, Mike Belopuhov wrote: On Fri, May 20, 2011 at 4:04 AM, Ariane van der Steldt ari...@stack.nl wrote: AVL trees have a difference of max 1 between the longest and shortest path to a leaf, whereas RB-trees have at max the longest path be double the length of the shortest path. I.e. work case lookups require traversal of ? ?ln(size)/ln(2) elements for AVL trees, 2 * ln(size)/ln(2) elements for RB trees. However, RB trees are slightly more efficient with insertion/removal. The better balancing of AVL trees can make a big difference in lookup time compared to RB trees, for trees containing millions of items. how are you supposed to choose between them? Basically it's a trade off. Is lookup more important, or insert/remove? And ofcourse, if you still don't know, just take what strikes your fancy. :D i know about the difference of lookup and insert/remove speed, but is there a significant difference in practice (in openbsd use cases)? I suspect that the PF state table is mostly lookups, therefor AVL would improve performance there. That said without having looked at the code though. For UVMs allocators, the case is harder to pin, since they use a lot of lookup and traversal (where AVL may benefit) while also requiring a lot of modifications (where RB may be better). The choice is a lot harder to pin down. This is without taking cache coherency on multiple CPUs into account, which may change the picture again (a write/rotate would lead to cache invalidation on all other cpus). But I'm mostly interested because I tend to use trees left, right and center and those macros lead to code bloat. If I can use a tree without macros, I'm happy. so maybe it makes sense to have a non macro implementation in libkern and use it in uvm or whenever needed? Or just in sys/kern. -- Ariane