Re: AVL tree

2011-05-20 Thread Michael Pounov
I test this implementation with 1000 entry inserted by random()
;)
average seek time with random search of 1 is from 2 to 3 seconds

On Fri, 20 May 2011 04:04:07 +0200
Ariane van der Steldt ari...@stack.nl wrote:

 On Thu, May 19, 2011 at 07:33:22PM +0200, Mike Belopuhov wrote:
  On Thu, May 19, 2011 at 7:25 PM, Michael Pounov mi...@aitbg.com wrote:
   Not see differences in results with performance from RB tree vs AVL,
 
 Which tree size did you use to test that? Which insert order (yes, this
 is relevant: with the right ordering you can hit the worst or best case
 of a tree algorithm). Did you test lookup time or update time or both
 (combined or split)?
 
but right solution for problems when we have choice between algorithms.
  
  
  if you don't see any difference then what's the point in having them
  both available?
 
 AVL trees have a difference of max 1 between the longest and shortest
 path to a leaf, whereas RB-trees have at max the longest path be double
 the length of the shortest path.
 I.e. work case lookups require traversal of
 ln(size)/ln(2) elements for AVL trees,
 2 * ln(size)/ln(2) elements for RB trees.
 
 However, RB trees are slightly more efficient with insertion/removal.
 
 The better balancing of AVL trees can make a big difference in lookup
 time compared to RB trees, for trees containing millions of items.
 
  how are you supposed to choose between them?
 
 Basically it's a trade off. Is lookup more important, or insert/remove?
 And ofcourse, if you still don't know, just take what strikes your
 fancy. :D
 
 But I'm mostly interested because I tend to use trees left, right and
 center and those macros lead to code bloat. If I can use a tree without
 macros, I'm happy.
 -- 
 Ariane
 


-- 

M.Punov
-
AITNET - Sofia/Bulgaria -
Software  Network Solutions
(+359) 888 73 73 58;(+359) 2 402 4000



Re: AVL tree

2011-05-20 Thread Mike Belopuhov
On Fri, May 20, 2011 at 4:04 AM, Ariane van der Steldt ari...@stack.nl
wrote:

 AVL trees have a difference of max 1 between the longest and shortest
 path to a leaf, whereas RB-trees have at max the longest path be double
 the length of the shortest path.
 I.e. work case lookups require traversal of
ln(size)/ln(2) elements for AVL trees,
 2 * ln(size)/ln(2) elements for RB trees.

 However, RB trees are slightly more efficient with insertion/removal.

 The better balancing of AVL trees can make a big difference in lookup
 time compared to RB trees, for trees containing millions of items.

 how are you supposed to choose between them?

 Basically it's a trade off. Is lookup more important, or insert/remove?
 And ofcourse, if you still don't know, just take what strikes your
 fancy. :D


i know about the difference of lookup and insert/remove speed, but
is there a significant difference in practice (in openbsd use cases)?

 But I'm mostly interested because I tend to use trees left, right and
 center and those macros lead to code bloat. If I can use a tree without
 macros, I'm happy.

so maybe it makes sense to have a non macro implementation in libkern
and use it in uvm or whenever needed?

 --
 Ariane



Re: AVL tree

2011-05-20 Thread Michael Pounov
On Fri, 20 May 2011 05:01:41 +0200
Ariane van der Steldt ari...@stack.nl wrote:

 On Thu, May 19, 2011 at 08:28:09PM +0300, Michael Pounov wrote:
  Add AVL tree implementation and merge few RB tree related macros.
  
  If you have comments or any claims, please send me feedback
  and I will fix them. 
  
   Inline patch::
  --- /usr/src/sys/sys/tree.h Mon Mar  2 11:42:55 2009
  +++ tree.h  Thu May 19 20:16:36 2011
  @@ -730,9 +730,367 @@
   (x) != NULL;   \
   (x) = name##_RB_NEXT(x))
   
  +#define RB_FOREACH_FROM(x, name, y)
  \
 
 This macro modifies parameter y, which I do not expect from using it.
 Please keep y its original value. Also, I want to be able to use an
 rvalue for y.
 
  +   for ((x) = (y); \
  +   ((x) != NULL)  ((y) = name##_RB_NEXT(x), (x) != NULL);\
  +(x) = (y))
  +
  +#define RB_FOREACH_SAFE(x, name, head, y)  \
 
 (Again, don't modify the value of y.)
 
  +   for ((x) = RB_MIN(name, head);  \
  +   ((x) != NULL)  ((y) = name##_RB_NEXT(x), (x) != NULL);\
  +(x) = (y))
  +
 
 Is there a specific reason your wrote the above 2 macros? The pattern
 where the for loop is written out is quite common and well understood by
 developers.
 
   #define RB_FOREACH_REVERSE(x, name, head)  \
  for ((x) = RB_MAX(name, head);  \
   (x) != NULL;   \
   (x) = name##_RB_PREV(x))
  +
  +#define RB_FOREACH_REVERSE_FROM(x, name, y)
  \
  +   for ((x) = (y); \
  +   ((x) != NULL)  ((y) = name##_RB_PREV(x), (x) != NULL);\
  +(x) = (y))
  +
  +#define RB_FOREACH_REVERSE_SAFE(x, name, head, y)  \
  +   for ((x) = RB_MAX(name, head);  \
  +   ((x) != NULL)  ((y) = name##_RB_PREV(x), (x) != NULL);\
  +(x) = (y))
  +
  +/* Macros that define a avl tree */
  +#define AVL_MAX_HEIGHT 42
 
 No. Trees do not have a max height. On a modern amd64, I can
 create code which will be able to fit more than 2 ** 42 items in the
 tree (provided I had the physmem in my machine).
 
  +#define AVL_HEAD(name, type)   
  \
  +struct name {  
  \
  +   struct type *avlh_root; /* root of the tree */  \
  +   int avlh_height;\
  +   long avlh_count;\
 

This tree implementation use height for ancestors temporary array
because we have not a pointer to the parent.

You may use all memory when set new height 2**height = all stored elements in 
tree!
:)

 size_t, not long. KR guarantee it'll fit for every future version C.
 
 Why is there a count? For most trees in the system, we don't care how
 large they are. And we don't care for the height either.
 
  +}
  +
  +#define AVL_INITIALIZER(root)  
  \
  +   { NULL, 0, 0 }
  +
  +#define AVL_INIT(root, height) do {
  \
  +   (root)-avlh_root = NULL;   \
  +   (root)-avlh_height = !height ? AVL_MAX_HEIGHT : height;\
 
 height? This is not the current height of the tree... Don't like it; if
 the items in the tree are constrained in some way, other code will have
 dealt with it.
 
  +   (root)-avlh_count = 0; \
  +} while (0)
  +
  +#define AVL_ENTRY(type)
  \
  +struct {   \
  +   struct type *avle_left; /* left element */  \
  +   struct type *avle_right;/* right element */ \
  +   int avle_height;/* node color */\
 
 Comment copy-pasted from RB tree.
 
  +}
  +
  +#define AVL_LEFT(elm, field)   (elm)-field.avle_left
  +#define AVL_RIGHT(elm, field)  (elm)-field.avle_right
  +#define AVL_HEIGHT(elm, field) (elm)-field.avle_height
  +#define AVL_ROOT(head) (head)-avlh_root
  +#define AVL_EMPTY(head)(AVL_ROOT(head) == NULL)
  +#define AVL_ROOT_HEIGHT(head)  (head)-avlh_height
 
 How about this instead:  (AVL_EMPTY(head) ? 0 : AVL_HEIGHT(AVL_ROOT(head)))
 
  +#define AVL_ROOT_COUNT(head)   (head)-avlh_count
  +
  +/* Generates prototypes and inline functions */
  +#defineAVL_PROTOTYPE(name, type, field, cmp)   
  \
  +   AVL_PROTOTYPE_INTERNAL(name, type, field, cmp,)
  +#define

rdr-to ::1

2011-05-20 Thread Camiel Dobbelaar
inet6 pf rules that rdr-to ::1 do not work currently.  Matching
packets just disappear and the counter packets that violated scope
rules from a netstat -s -p ip6 gets incremented.

It came up before on misc@:
http://marc.info/?t=12680425912r=1w=2

The attached diff removes the check (let's call it check #1) that drops
the packet.  This is just to point out where the problem is because the
BSD's have diverged here:

FreeBSD _replaced_ check #1 with another check (#2) in this diff from
2004 (from kame):
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet6/ip6_input.c.diff?r1=1.68;r2=1.69

NetBSD _replaced_ check #1 it with something totally different
(attributed to jinmei@kame)
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/netinet6/ip6_input.c?rev=1.81content-type=text/x-cvsweb-markup

Check #2 was _added_ to OpenBSD in 2006 (attributed to jinmei@kame):
http://www.openbsd.org/cgi-bin/cvsweb/src/sys/netinet6/ip6_input.c.diff?r1=1.72;r2=1.73

Basically, check #1 is gone in FreeBSD and NetBSD and the diff syncs us
closer to FreeBSD.

I'm unsure if it's the right thing to do though for a few reasons. The
FreeBSD diff has a second part that I cannot yet tell is related or not.
 Or maybe the NetBSD diff could be better.  And OpenBSD also seems to
have other checks in this area.

I'll spend some more time on this, but maybe there's an IPv6 guru that
can lend a hand?  :-)

--
Cam




Index: ip6_input.c
===
RCS file: /cvs/src/sys/netinet6/ip6_input.c,v
retrieving revision 1.99
diff -u -r1.99 ip6_input.c
--- ip6_input.c 3 Apr 2011 13:56:05 -   1.99
+++ ip6_input.c 20 May 2011 09:30:14 -
@@ -270,7 +270,6 @@
in6_ifstat_inc(m-m_pkthdr.rcvif, ifs6_in_addrerr);
goto bad;
}
-
if (IN6_IS_ADDR_MC_INTFACELOCAL(ip6-ip6_dst) 
!(m-m_flags  M_LOOP)) {
/*
@@ -340,19 +339,6 @@
ip6 = mtod(m, struct ip6_hdr *);
srcrt = !IN6_ARE_ADDR_EQUAL(odst, ip6-ip6_dst);
 #endif
-
-   if (IN6_IS_ADDR_LOOPBACK(ip6-ip6_src) ||
-   IN6_IS_ADDR_LOOPBACK(ip6-ip6_dst)) {
-   if (m-m_pkthdr.rcvif-if_flags  IFF_LOOPBACK) {
-   ours = 1;
-   deliverifp = m-m_pkthdr.rcvif;
-   goto hbhcheck;
-   } else {
-   ip6stat.ip6s_badscope++;
-   in6_ifstat_inc(m-m_pkthdr.rcvif, ifs6_in_addrerr);
-   goto bad;
-   }
-   }

/* drop packets if interface ID portion is already filled */
if ((m-m_pkthdr.rcvif-if_flags  IFF_LOOPBACK) == 0) {



Re: set skip on ifgroup

2011-05-20 Thread Henning Brauer
* Alexander Hall ha...@openbsd.org [2011-05-19 10:25]:
 On 05/18/11 23:31, Stuart Henderson wrote:
  set skip in PF has a slightly unexpected behaviour; rather
  than skipping by interface group, it matches on the non-numeric
  part of an interface name.
 
 I think the prefix match test is a common behaviour so I think you
 should keep that.

no, that is just leftover.

-- 
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting



Re: set skip on ifgroup

2011-05-20 Thread Henning Brauer
* Claudio Jeker cje...@diehard.n-r-g.com [2011-05-19 11:29]:
 There is a bigger problem with 'set skip on lo', it is only evaluated
 during load. So if you create a lo1 afterwards the set skip will not
 trigger. This is very annoying especially with qemu and tun interfaces.
 
 To be honest I'm not sure who will do a 'set skip on sis' or
 'set skip on em'. Normaly you want to filter on your physical interfaces
 and not just skip them all. For pseudo-devices like lo, tun, vlan, etc. a
 group is created automatically.
 
 I think the very important bit is this:
  Hmmm, looking further, it seems ordinary rules only match on the
  interface name or group as well (in pfi_kif_match()), so maybe
  you're just plain right after all. :-)
 
 set skip is currently special and works in a not so expected way so it is
 better to make it work like all other users of interface names and people
 needing 'set skip on em' should add a 'group em' line to their
 hostname.em* files.

spot on.

-- 
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting



Re: set skip on ifgroup

2011-05-20 Thread Henning Brauer
* Stuart Henderson s...@spacehopper.org [2011-05-19 11:21]:
  Note that the default ruleset does include a 'set skip on lo' but
  that's fine since lo* interfaces are by default added to the lo
  group.
 
  If people get bitten by this change, they could either add
  an interface-name-matching group to each interface or we do that
  automatically, as is done for vlan's, lo's etc.
 
 What does anyone else think about this (making em0 a member
 of em automatically, etc.)? I don't think it's really all that
 important, there are usually better grouping criteria than
 what driver supports the device and people have to change
 their ruleset this release anyway.

we have specifically decided to NOT do this. what prupose does a group
of all interface of a certain driver have? I see zero for teh regular
drivers. the ones where it makes sense are the clonables like ppp and
for those we have automatic base class group, i. e. all tun interfaces
end up in the tun group by default.

-- 
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting



Re: set skip on ifgroup

2011-05-20 Thread Henning Brauer
* Claudio Jeker cje...@diehard.n-r-g.com [2011-05-19 13:20]:
 On Thu, May 19, 2011 at 10:49:59AM +0100, Stuart Henderson wrote:
  On 2011/05/19 11:26, Claudio Jeker wrote:
   There is a bigger problem with 'set skip on lo', it is only evaluated
   during load. So if you create a lo1 afterwards the set skip will not
   trigger. This is very annoying especially with qemu and tun interfaces.
  
  Right, I noticed this during testing, and this at least deserves a
  mention (independent of my other diff).
  
  Changing this behaviour could be a problem though, I think it would
  need to be checked before state lookup, and we don't want to walk the
  groups of all interfaces on the system for every packet.
  
 
 Since the PFI_IFLAG_SKIP is set on a kif it should be possible to
 inherit the flag from a group when an interface is created.
 This would have no impact on runtime execution time.
 
 I would really like to have set skip pick up interfaces at runtime because
 I get constantly burned by it.

again, spot on.

-- 
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting



Re: set skip on ifgroup

2011-05-20 Thread Henning Brauer
* Reyk Floeter r...@openbsd.org [2011-05-19 11:47]:
 On Thu, May 19, 2011 at 11:26:59AM +0200, Claudio Jeker wrote:
  To be honest I'm not sure who will do a 'set skip on sis' or
  'set skip on em'.
 
 I would ;-)
 
 Sometimes you have machines with different types of physical
 interfaces where one type is used for internal stuff like a dedicated
 pfsync or management link, eg. 
 
 set skip on em
 block in on ix0
 
 ...or soekrises with vr(4) and a quad sis(4) in one box.

you are abusing a coincidence, in this case, pretty much.

but really, you can just put a group vr in hostname.vr* if you want
this. or use a more descriptive group name to begin with :)

  I think the very important bit is this:
   Hmmm, looking further, it seems ordinary rules only match on the
   interface name or group as well (in pfi_kif_match()), so maybe
   you're just plain right after all. :-)
  set skip is currently special and works in a not so expected way so it is
  better to make it work like all other users of interface names and people
  needing 'set skip on em' should add a 'group em' line to their
  hostname.em* files.
 ifconfig em also works, so i think it would be less special and
 confusing if set skip on em would just work without extra config
 magic.

I disagree.

the ifconfig em is a backwards compat hack that doesn't make much
sense at all.

-- 
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting



Re: set skip on ifgroup

2011-05-20 Thread Henning Brauer
* Stuart Henderson s...@spacehopper.org [2011-05-19 11:50]:
 On 2011/05/19 11:26, Claudio Jeker wrote:
  There is a bigger problem with 'set skip on lo', it is only evaluated
  during load. So if you create a lo1 afterwards the set skip will not
  trigger. This is very annoying especially with qemu and tun interfaces.
 
 Right, I noticed this during testing, and this at least deserves a
 mention (independent of my other diff).
 
 Changing this behaviour could be a problem though, I think it would
 need to be checked before state lookup, and we don't want to walk the
 groups of all interfaces on the system for every packet.

nah. we get calls from the interface subsystem when interfaces show up
or go. just a few lines of code missing to deal with skip.

-- 
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting



Split iked(8) policy creation imsg

2011-05-20 Thread Vadim Zhukov
Hello all.

This patch splits off IMSG_CFG_POLICY into four messages:

  IMSG_CFG_POLICY_BEGIN
  IMSG_CFG_POLICY_PROPOSAL
  IMSG_CFG_POLICY_FLOW
  IMSG_CFG_POLICY_COMMIT

Each new policy should start with IMSG_CFG_POLICY_BEGIN, then proceed
any number of proposals and flows, and then commit policy. I decided
not to use ticket system because I see no race sources.

Now we can have long, long policies in your iked.conf. Previously we were
limited by maximum imsg packet size (MAX_IMSGSIZE) which is 16 kilobytes
currently.

This patch complements previous one:
http://marc.info/?l=openbsd-techm=130583217730436w=2
Those patches are totally independent and may be applied in any order and
combination.

Both patches were tested on i386.

-- 
  Best wishes,
Vadim Zhukov

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?


Index: ikev2.c
===
RCS file: /cvs/src/sbin/iked/ikev2.c,v
retrieving revision 1.55
diff -u -p -r1.55 ikev2.c
--- ikev2.c 9 May 2011 11:15:18 -   1.55
+++ ikev2.c 20 May 2011 09:48:06 -
@@ -133,8 +133,14 @@ ikev2_dispatch_parent(int fd, struct pri
return (config_getsocket(env, imsg, ikev2_msg_cb));
case IMSG_PFKEY_SOCKET:
return (config_getpfkey(env, imsg));
-   case IMSG_CFG_POLICY:
-   return (config_getpolicy(env, imsg));
+   case IMSG_CFG_POLICY_BEGIN:
+   return (config_getpolicy_begin(env, imsg));
+   case IMSG_CFG_POLICY_PROPOSAL:
+   return (config_getpolicy_proposal(env, imsg));
+   case IMSG_CFG_POLICY_FLOW:
+   return (config_getpolicy_flow(env, imsg));
+   case IMSG_CFG_POLICY_COMMIT:
+   return (config_getpolicy_commit(env, imsg));
case IMSG_CFG_USER:
return (config_getuser(env, imsg));
case IMSG_COMPILE:
Index: config.c
===
RCS file: /cvs/src/sbin/iked/config.c,v
retrieving revision 1.12
diff -u -p -r1.12 config.c
--- config.c9 May 2011 11:15:18 -   1.12
+++ config.c20 May 2011 09:48:06 -
@@ -45,6 +45,10 @@
 #include iked.h
 #include ikev2.h
 
+
+static struct iked_policy *newpol = NULL;
+
+
 struct iked_sa *
 config_new_sa(struct iked *env, int initiator)
 {
@@ -584,122 +588,167 @@ config_setpolicy(struct iked *env, struc
 {
struct iked_proposal*prop;
struct iked_flow*flow;
-   struct iked_transform   *xform;
-   size_t   size, iovcnt, j, c = 0;
struct iovec iov[IOV_MAX];
+   unsigned int i;
 
-   iovcnt = 1;
-   size = sizeof(*pol);
-   TAILQ_FOREACH(prop, pol-pol_proposals, prop_entry) {
-   size += (prop-prop_nxforms * sizeof(*xform)) +
-   (sizeof(*prop));
-   iovcnt += prop-prop_nxforms + 1;
+   if (env-sc_opts  IKED_OPT_NOACTION) {
+   print_policy(pol);
+   return (0);
}
 
-   size += pol-pol_nflows * sizeof(*flow);
-   iovcnt += pol-pol_nflows;
-
-   if (iovcnt  IOV_MAX) {
-   log_warn(%s: too many proposals/flows, __func__);
+   if (proc_compose_imsg(env, id, IMSG_CFG_POLICY_BEGIN, -1,
+   pol, sizeof(*pol)) == -1)
return (-1);
-   }
-
-   iov[c].iov_base = pol;
-   iov[c++].iov_len = sizeof(*pol);
 
TAILQ_FOREACH(prop, pol-pol_proposals, prop_entry) {
-   iov[c].iov_base = prop;
-   iov[c++].iov_len = sizeof(*prop);
-
-   for (j = 0; j  prop-prop_nxforms; j++) {
-   xform = prop-prop_xforms + j;
+   iov[0].iov_base = prop;
+   iov[0].iov_len = sizeof(*prop);
 
-   iov[c].iov_base = xform;
-   iov[c++].iov_len = sizeof(*xform);
+   for (i = 1; i = prop-prop_nxforms; i++) {
+   if (i = IOV_MAX) {
+   log_warn(%s: too many proposals, __func__);
+   return (-1);
+   }
+   iov[i].iov_base = prop-prop_xforms + (i - 1);
+   iov[i].iov_len = sizeof(struct iked_transform);
}
+
+   if (proc_composev_imsg(env, id, IMSG_CFG_POLICY_PROPOSAL, -1,
+   iov, i) == -1)
+   return (-1);
}
 
RB_FOREACH(flow, iked_flows, pol-pol_flows) {
-   iov[c].iov_base = flow;
-   iov[c++].iov_len = sizeof(*flow);
+   if (proc_compose_imsg(env, id, IMSG_CFG_POLICY_FLOW, -1,
+   flow, sizeof(struct iked_flow)) == -1)
+   return (-1);
}
 
-   if (env-sc_opts  IKED_OPT_NOACTION) {
-   

Panic in sr_crypto_rw with kern.bufcachepercent = 75

2011-05-20 Thread David Coppa
Hi all,

OpenBSD-current snapshot dated 16-May-2011:
I get an always-reproducible panic with softraid crypto and
kern.bufcachepercent = 75, when untarring a tarball of the
complete source tree.

The disk layout is the following, with softraid crypto for
all but /:

/dev/sd0a on / type ffs (local)
/dev/sd2f on /home type ffs (local, nodev, nosuid)
/dev/sd2e on /usr type ffs (local, nodev)
/dev/sd2d on /var type ffs (local, nodev, nosuid)
/dev/sd3i on /mnt type msdos (local)

Here's the trace:


panic: sr_crypto_rw: no crypto op
Stopped at Debugger+0x4: popl %ebp
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
IF RUNNING SMP, USE 'mach ddbcpu #' AND 'trace' ON OTHER PROCESSORS, TOO.
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb{0} trace
Debugger(d08e41c2,dc777d78,d08b9e8f,dc777d78,0) at Debugger+0x4
panic(d08b9e8f,1,0,d1fcf488,d2016000) at panic+0x5d
sr_crypto_rw(d1fcf488,0,74,d03e7cfa,d1fa7240) at sr_crypto_rw+0x93
sr_scsi_cmd(d6c75178,0,478b7e0,0,20) at sr_scsi_cmd+0x17f
sdstart(d6c75178,d1fcf488,d2016c40,d1fd2c00,d1fd2c84) at sdstart+0x1b4
scsi_ioh_runqueue(d2016c2c,d1fd2c84,d6793b90,dc777e5c,d1fa7000) at 
scsi_ioh_runqueue+0x59
scsi_xsh_runqueue(d1fd2c00,d6793b90,0,d0578683,20) at scsi_xsh_runqueue+0x11d
sdstrategy(d6793b90,4a007000,0,fb530b2,0) at sdstrategy+0x175
spec_strategy(dc777ec8,d08c4f07,dc777eec,1001b4,d6793b90) at spec_strategy+0x3d
VOP_STRATEGY(d6793b90,1,4,d6793b90,d6849d7c) at VOP_STRATEGY+0x2c
bwrite(d6793b90,d08c4323,dc777f2c,d041964e,d6793b90) at bwrite+0xcc
VOP_BWRITE(d6793b90,d6df027c,dc777f5c,0,d6dd41dc) at VOP_BWRITE+0x29
ffs_sync(dc777f4c,0,d6dd41dc,d6e53000,3) at ffs_sync+0x126
VOP_FSYNC(d6dd41dc,d6e53000,3,d6df027c,d03d2f23) at VOP_FSYNC+0x35
sched_sync(d6df027c) at sched_sync+0xaf
Bad frame pointer: 0xd0baae88
ddb{0} mach ddbcpu 1
Stopped at Debugger+0x4: popl %ebp
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
IF RUNNING SMP, USE 'mach ddbcpu #' AND 'trace' ON OTHER PROCESSORS, TOO.
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb{1} trace
Debugger(d1e9bc00,d083cb6c,0,296,d1e9bc34) at Debugger+0x4
i386_ipi_handler(b0,d03e0020,d09e,d1e90010,df3b0010) at 
i386_ipi_handler+0x5f
Xintripi() at Xintripi+0x49
--- interrupt ---
__mp_lock(d0a35ce4,7fff,dc751f18,d0202630,0) at __mp_lock+0x52
i386_softintlock(0,d03e0020,d0a3,10,d1e90010) at i386_softintlock+0x12
Xintrltimer() at Xintrltimer+0x50
--- interrupt ---
cpu_idle_cycle(d1e9bc00) at cpu_idle_cycle+0xf
Bad frame pointer: 0xd0baae48
ddb{1} ps
   PID   PPID   PGRP   UID   S   FLAGS   WAITCOMMAND
  1177  10944   1177  1000   2   0x200   tar
 10944  1  10944  1000   3   0x280   pause   ksh
 32405  1  32405 0   3   0x280   ttyin   getty
 22382  1  22382 0   3   0x280   ttyin   getty
 16249  1  16249 0   3   0x280   ttyin   getty
  4339  1   4339 0   3   0x280   ttyin   getty
 27948  1  27948 0   3   0x280   select  cron
 22709  1  22709 0   3   0x280   kqread  apmd
  3666  1   366699   3   0x280   pollaucat
 27290  1  27290 0   3   0x280   select  sshd
 28487   1837   183774   3   0x280   bpf pflogd
  1837  1   1837 0   3   0x280   netio   pflogd
 14290  22465  2246573   3   0x280   pollsyslogd
 22465  1  22465 0   3   0x288   netio   syslogd
 11486  0  0 0   3   0x2100200   bored   srdis
16  0  0 0   3   0x2100200   aiodonedaiodoned
*   15  0  0 0   7   0x2100200   update
14  0  0 0   3   0x2100200   cleaner cleaner
13  0  0 0   30x100200   reaper  reaper
12  0  0 0   2   0x2100200   pagedaemon
11  0  0 0   2   0x2100200   crypto
10  0  0 0   3   0x2100200   pftmpfpurge
 9  0  0 0   3   0x2100200   usbtsk  usbtask
 8  0  0 0   3   0x2100200   usbatsk usbatsk
 7  0  0 0   3   0x2100200   bored   intelrel
 6  0  0 0   3   0x2100200   acpi0   acpi0
 5  0  0 0   7  0x40100200   idle1
 4  0  0 0   3   0x2100200   bored   syswq
 3  0  0 0   3  0x40100200   idle0
 2  0  0 0   3   0x2100200   kmalloc kmthread
 1  0  1 0   3   0x280   waitinit
 0 -1  0 0   3   0x2080200   scheduler   swapper
ddb{1} boot reboot
softraid0: could not allocate metadata scratch area
splassert: assertwaitok: want -1 have 1


With kern.bufcachepercent=70 the system seems stable
(tested by untarring src.tar a lot of times).

dmesg below...

Cheers!
David


OpenBSD 4.9-current (GENERIC.MP) #113: Mon May 16 11:52:06 MDT 

Re: Panic in sr_crypto_rw with kern.bufcachepercent = 75

2011-05-20 Thread Mike Belopuhov
On Fri, May 20, 2011 at 06:24 -0600, David Coppa wrote:
 Hi all,
 
 OpenBSD-current snapshot dated 16-May-2011:
 I get an always-reproducible panic with softraid crypto and
 kern.bufcachepercent = 75, when untarring a tarball of the
 complete source tree.
 
 The disk layout is the following, with softraid crypto for
 all but /:
 
 /dev/sd0a on / type ffs (local)
 /dev/sd2f on /home type ffs (local, nodev, nosuid)
 /dev/sd2e on /usr type ffs (local, nodev)
 /dev/sd2d on /var type ffs (local, nodev, nosuid)
 /dev/sd3i on /mnt type msdos (local)
 
 Here's the trace:
 
 
 panic: sr_crypto_rw: no crypto op

hi,

although i'm not sure that this is a best solution, i can't
see why we should panic here.  sr_scsi_cmd seems to cope with
stuffups just fine.

Index: dev/softraid_crypto.c
===
RCS file: /home/cvs/src/sys/dev/softraid_crypto.c,v
retrieving revision 1.65
diff -u -p -r1.65 softraid_crypto.c
--- dev/softraid_crypto.c   6 Apr 2011 03:14:51 -   1.65
+++ dev/softraid_crypto.c   20 May 2011 12:42:12 -
@@ -1115,7 +1115,7 @@ sr_crypto_rw(struct sr_workunit *wu)
if (wu-swu_xs-flags  SCSI_DATA_OUT) {
crp = sr_crypto_getcryptop(wu, 1);
if (crp == NULL)
-   panic(sr_crypto_rw: no crypto op);
+   return (1);
crp-crp_callback = sr_crypto_write;
crp-crp_opaque = wu;
s = splvm();



Re: Split iked(8) policy creation imsg

2011-05-20 Thread Reyk Floeter
Hi!

On Fri, May 20, 2011 at 03:54:03PM +0400, Vadim Zhukov wrote:
 This patch splits off IMSG_CFG_POLICY into four messages:
 
   IMSG_CFG_POLICY_BEGIN
   IMSG_CFG_POLICY_PROPOSAL
   IMSG_CFG_POLICY_FLOW
   IMSG_CFG_POLICY_COMMIT
 
 Each new policy should start with IMSG_CFG_POLICY_BEGIN, then proceed
 any number of proposals and flows, and then commit policy. I decided
 not to use ticket system because I see no race sources.
 
 Now we can have long, long policies in your iked.conf. Previously we were
 limited by maximum imsg packet size (MAX_IMSGSIZE) which is 16 kilobytes
 currently.
 

Thanks for you diff.  See comments below.  It is appreciated and I can
adjust it but you can see my comments for modifications below.

reyk

 Index: ikev2.c
 ===
 RCS file: /cvs/src/sbin/iked/ikev2.c,v
 retrieving revision 1.55
 diff -u -p -r1.55 ikev2.c
 --- ikev2.c   9 May 2011 11:15:18 -   1.55
 +++ ikev2.c   20 May 2011 09:48:06 -
 @@ -133,8 +133,14 @@ ikev2_dispatch_parent(int fd, struct pri
   return (config_getsocket(env, imsg, ikev2_msg_cb));
   case IMSG_PFKEY_SOCKET:
   return (config_getpfkey(env, imsg));
 - case IMSG_CFG_POLICY:
 - return (config_getpolicy(env, imsg));
 + case IMSG_CFG_POLICY_BEGIN:
 + return (config_getpolicy_begin(env, imsg));
 + case IMSG_CFG_POLICY_PROPOSAL:
 + return (config_getpolicy_proposal(env, imsg));
 + case IMSG_CFG_POLICY_FLOW:
 + return (config_getpolicy_flow(env, imsg));
 + case IMSG_CFG_POLICY_COMMIT:
 + return (config_getpolicy_commit(env, imsg));

The naming should be simplified just to

IMSG_CFG_POLICY
IMSG_CFG_PROPOSAL
IMSG_CFG_FLOW

same for the functions

config_getpolicy
config_getproposal
config_getflow

The commit step is not required because the existing IMSG_COMPILE
finishes the policy configuration ones, calculates the skip steps and
starts operation.

See more comments below. There are some hints to make it fully async
below (not reviewing the code in detail).

   case IMSG_CFG_USER:
   return (config_getuser(env, imsg));
   case IMSG_COMPILE:
 Index: config.c
 ===
 RCS file: /cvs/src/sbin/iked/config.c,v
 retrieving revision 1.12
 diff -u -p -r1.12 config.c
 --- config.c  9 May 2011 11:15:18 -   1.12
 +++ config.c  20 May 2011 09:48:06 -
 @@ -45,6 +45,10 @@
  #include iked.h
  #include ikev2.h
  
 +
 +static struct iked_policy *newpol = NULL;
 +
 +

Please don't use a global variable here.  Just add the policy to the
global list in config_getpolicy() and reference to the policy by id
in the config_getproposal() and config_getflow() calls.

  struct iked_sa *
  config_new_sa(struct iked *env, int initiator)
  {
 @@ -584,122 +588,167 @@ config_setpolicy(struct iked *env, struc
  {
   struct iked_proposal*prop;
   struct iked_flow*flow;
 - struct iked_transform   *xform;
 - size_t   size, iovcnt, j, c = 0;
   struct iovec iov[IOV_MAX];
 + unsigned int i;
  
 - iovcnt = 1;
 - size = sizeof(*pol);
 - TAILQ_FOREACH(prop, pol-pol_proposals, prop_entry) {
 - size += (prop-prop_nxforms * sizeof(*xform)) +
 - (sizeof(*prop));
 - iovcnt += prop-prop_nxforms + 1;
 + if (env-sc_opts  IKED_OPT_NOACTION) {
 + print_policy(pol);
 + return (0);
   }
  
 - size += pol-pol_nflows * sizeof(*flow);
 - iovcnt += pol-pol_nflows;
 -
 - if (iovcnt  IOV_MAX) {
 - log_warn(%s: too many proposals/flows, __func__);
 + if (proc_compose_imsg(env, id, IMSG_CFG_POLICY_BEGIN, -1,
 + pol, sizeof(*pol)) == -1)
   return (-1);
 - }
 -
 - iov[c].iov_base = pol;
 - iov[c++].iov_len = sizeof(*pol);
  
   TAILQ_FOREACH(prop, pol-pol_proposals, prop_entry) {
 - iov[c].iov_base = prop;
 - iov[c++].iov_len = sizeof(*prop);
 -
 - for (j = 0; j  prop-prop_nxforms; j++) {
 - xform = prop-prop_xforms + j;
 + iov[0].iov_base = prop;
 + iov[0].iov_len = sizeof(*prop);
  
 - iov[c].iov_base = xform;
 - iov[c++].iov_len = sizeof(*xform);
 + for (i = 1; i = prop-prop_nxforms; i++) {
 + if (i = IOV_MAX) {
 + log_warn(%s: too many proposals, __func__);
 + return (-1);
 + }
 + iov[i].iov_base = prop-prop_xforms + (i - 1);
 + iov[i].iov_len = sizeof(struct iked_transform);
   }
 +
 + if (proc_composev_imsg(env, id, IMSG_CFG_POLICY_PROPOSAL, -1,
 + iov, i) == -1)
 + return (-1);
   }
  
   

Re: Panic in sr_crypto_rw with kern.bufcachepercent = 75

2011-05-20 Thread David Coppa
On Fri, May 20, 2011 at 2:45 PM, Mike Belopuhov m...@crypt.org.ru wrote:
 On Fri, May 20, 2011 at 06:24 -0600, David Coppa wrote:
 Hi all,

 OpenBSD-current snapshot dated 16-May-2011:
 I get an always-reproducible panic with softraid crypto and
 kern.bufcachepercent = 75, when untarring a tarball of the
 complete source tree.

 The disk layout is the following, with softraid crypto for
 all but /:

 /dev/sd0a on / type ffs (local)
 /dev/sd2f on /home type ffs (local, nodev, nosuid)
 /dev/sd2e on /usr type ffs (local, nodev)
 /dev/sd2d on /var type ffs (local, nodev, nosuid)
 /dev/sd3i on /mnt type msdos (local)

 Here's the trace:


 panic: sr_crypto_rw: no crypto op

 hi,

 although i'm not sure that this is a best solution, i can't
 see why we should panic here.  sr_scsi_cmd seems to cope with
 stuffups just fine.

 Index: dev/softraid_crypto.c
 ===
 RCS file: /home/cvs/src/sys/dev/softraid_crypto.c,v
 retrieving revision 1.65
 diff -u -p -r1.65 softraid_crypto.c
 --- dev/softraid_crypto.c   6 Apr 2011 03:14:51 -   1.65
 +++ dev/softraid_crypto.c   20 May 2011 12:42:12 -
 @@ -1115,7 +1115,7 @@ sr_crypto_rw(struct sr_workunit *wu)
if (wu-swu_xs-flags  SCSI_DATA_OUT) {
crp = sr_crypto_getcryptop(wu, 1);
if (crp == NULL)
 -   panic(sr_crypto_rw: no crypto op);
 +   return (1);
crp-crp_callback = sr_crypto_write;
crp-crp_opaque = wu;
s = splvm();


It now survives to several untarrings of src.tar with
kern.bufcachepercent=90.

cheers,
David



Re: Allow ipsecctl-like grouping in iked(8)

2011-05-20 Thread Reyk Floeter
hi,

On Thu, May 19, 2011 at 11:06:44PM +0400, Vadim Zhukov wrote:
 This patch allows ipsecctl-like flow grouping along with current
 behavior. It allows to write many-to-many policies in a more
 compact way, see an example:
 
   ikev2 esp \
 from { 1.2.3.4, 5.6.7.8 } to { 3.4.5.6, 4.5.6.7} \
 from 7.8.9.0 to { 0.1.2.3, 2.3.4.5 } \
 ...
 
 will create six flows:
 
   1.2.3.4 - 3.4.5.6
   1.2.3.4 - 4.5.6.7
   5.6.7.8 - 3.4.5.6
   5.6.7.8 - 4.5.6.7
   7.8.9.0 - 0.1.2.3
   7.8.9.0 - 2.3.4.5
 
 Please note that you're still limited by MAX_IMSGSIZE, which is
 about 7 flows, depending on arch. This will be addressed in
 another patch.
 

The difference between ipsec.conf and iked.conf and IKEv1 and IKEv2 is
that an IKEv2 SA can have multiple flows, a.k.a. traffic selectors, in
a single exchange while IKEv1 can only have one.

So in ipsec.conf the code would expand it into multiple independent
ike policies that would lead to multiple independent IKEv1 exchanges
and SAs.

With IKEv2 this would expand into a single IKEv2 SA with many traffic
selectors that would have to fit into a single IKEv2 UDP packet
(IKE_AUTH) which can lead to IP fragmentation and all kinds of
problems.  The current code does not prevent multiple traffic
selectors as it is totally legal in IKEv2, but I would be careful with
using macros to generate _many_ traffic selectors, or flows, in a
policy.  Maybe this is just a documentation issue or you need some
additional length checking here to avoid/warn about excessive
fragmentation (considering the 64k IPv4 limit as well).

Reyk

 -- 
   Best wishes,
 Vadim Zhukov
 
 A: Because it messes up the order in which people normally read text.
 Q: Why is top-posting such a bad thing?
 A: Top-posting.
 Q: What is the most annoying thing in e-mail?
 
 
 Index: iked.conf.5
 ===
 RCS file: /cvs/src/sbin/iked/iked.conf.5,v
 retrieving revision 1.13
 diff -u -p -r1.13 iked.conf.5
 --- iked.conf.5   21 Jan 2011 12:34:11 -  1.13
 +++ iked.conf.5   19 May 2011 17:18:54 -
 @@ -318,6 +318,29 @@ For a list of all port name to number ma
  .Xr ipsecctl 8 ,
  see the file
  .Pa /etc/services .
 +.Pp
 +Traffic selector can contain more than one address for both
 +.Ic from
 +and
 +.Ic to
 +clauses, for example:
 +.Bd -literal -offset indent
 +ikev2 esp \e
 + from { 172.16.0.0/24, 172.16.4.0/24, 172.16.9.0/24 } \e
 + to   { 192.168.0.0/20, 10.0.0.0/8 } \e
 + peer 1.2.3.4 local 5.6.7.8
 +.Ed
 +.Pp
 +In this case, six actual selectors will be created, from each
 +address in
 +.Ic from
 +part to each address in
 +.Ic to
 +part.
 +Commas between addresses are optional.
 +.Pp
 +You can combine multiple addresses and multiple traffic selectors
 +features in one statement.
  .It Ic local Ar localip Ic peer Ar remote
  The
  .Ic local
 @@ -821,9 +844,13 @@ and any other connection will be matched
  .Sq catch all
  policy.
  .Bd -literal -offset indent
 -ikev2 quick esp from 10.10.10.0/24 to 10.20.20.0/24 \e
 +ikev2 quick esp \e
 + from { 10.10.10.0/24, 10.11.10.0/24 } \e
 + to 10.20.20.0/24 \e
   peer 192.168.1.34
 -ikev2 catch all esp from 10.0.1.0/24 to 10.0.2.0/24 \e
 +ikev2 catch all esp \e
 + from 10.0.1.0/24 to 10.0.2.0/24 \e
 + from 10.0.31.0/24 to 10.0.35.0/24 \e
   peer any
  ikev2 subnet esp from 10.0.3.0/24 to 10.0.4.0/24 \e
   peer 192.168.1.0/24
 Index: parse.y
 ===
 RCS file: /cvs/src/sbin/iked/parse.y,v
 retrieving revision 1.21
 diff -u -p -r1.21 parse.y
 --- parse.y   18 Apr 2011 08:45:43 -  1.21
 +++ parse.y   19 May 2011 17:18:54 -
 @@ -258,6 +258,8 @@ struct ipsec_hosts {
   struct ipsec_addr_wrap  *dst;
   u_int16_tsport;
   u_int16_tdport;
 + struct ipsec_hosts  *next;
 + struct ipsec_hosts  *tail;
  };
  
  struct ipsec_filters {
 @@ -293,6 +295,8 @@ intget_id_type(char *);
  u_int8_t  x2i(unsigned char *);
  int   parsekey(unsigned char *, size_t, struct iked_auth *);
  int   parsekeyfile(char *, struct iked_auth *);
 +struct ipsec_hosts   *expand_hosts(struct ipsec_addr_wrap *, u_int16_t,
 +  struct ipsec_addr_wrap *, u_int16_t);
  
  struct ipsec_transforms *ipsec_transforms;
  struct ipsec_filters *ipsec_filters;
 @@ -345,7 +349,7 @@ typedef struct {
  %typev.number  portval af
  %typev.peers   peers
  %typev.anyhost anyhost
 -%typev.hosthost host_spec
 +%typev.hosthost host_list host_spec
  %typev.ids ids
  %typev.id  id
  %typev.transforms  transforms
 @@ -490,51 +494,20 @@ hosts_list  : hosts { $$ = 
 $1; }
   else if ($1 

Re: Split iked(8) policy creation imsg

2011-05-20 Thread Vadim Zhukov
On 20 May 2011 c. 18:45:05 Reyk Floeter wrote:
 Hi!

 On Fri, May 20, 2011 at 03:54:03PM +0400, Vadim Zhukov wrote:
  This patch splits off IMSG_CFG_POLICY into four messages:
 
IMSG_CFG_POLICY_BEGIN
IMSG_CFG_POLICY_PROPOSAL
IMSG_CFG_POLICY_FLOW
IMSG_CFG_POLICY_COMMIT
 
  Each new policy should start with IMSG_CFG_POLICY_BEGIN, then
  proceed any number of proposals and flows, and then commit policy. I
  decided not to use ticket system because I see no race sources.
 
  Now we can have long, long policies in your iked.conf. Previously we
  were limited by maximum imsg packet size (MAX_IMSGSIZE) which is 16
  kilobytes currently.

 Thanks for you diff.  See comments below.  It is appreciated and I can
 adjust it but you can see my comments for modifications below.

Thank you for reviewing and commenting! I'll rework the diff soon and
send new version.

--
  Best wishes,
Vadim Zhukov

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?



Re: Panic in sr_crypto_rw with kern.bufcachepercent = 75

2011-05-20 Thread Marco Peereboom
Can I get the orignal diff one more time?

On Fri, May 20, 2011 at 04:51:02PM +0200, David Coppa wrote:
 On Fri, May 20, 2011 at 2:45 PM, Mike Belopuhov m...@crypt.org.ru wrote:
  On Fri, May 20, 2011 at 06:24 -0600, David Coppa wrote:
  Hi all,
 
  OpenBSD-current snapshot dated 16-May-2011:
  I get an always-reproducible panic with softraid crypto and
  kern.bufcachepercent = 75, when untarring a tarball of the
  complete source tree.
 
  The disk layout is the following, with softraid crypto for
  all but /:
 
  /dev/sd0a on / type ffs (local)
  /dev/sd2f on /home type ffs (local, nodev, nosuid)
  /dev/sd2e on /usr type ffs (local, nodev)
  /dev/sd2d on /var type ffs (local, nodev, nosuid)
  /dev/sd3i on /mnt type msdos (local)
 
  Here's the trace:
 
 
  panic: sr_crypto_rw: no crypto op
 
  hi,
 
  although i'm not sure that this is a best solution, i can't
  see why we should panic here. ?sr_scsi_cmd seems to cope with
  stuffups just fine.
 
  Index: dev/softraid_crypto.c
  ===
  RCS file: /home/cvs/src/sys/dev/softraid_crypto.c,v
  retrieving revision 1.65
  diff -u -p -r1.65 softraid_crypto.c
  --- dev/softraid_crypto.c ? ? ? 6 Apr 2011 03:14:51 - ? ? ? 1.65
  +++ dev/softraid_crypto.c ? ? ? 20 May 2011 12:42:12 -
  @@ -1115,7 +1115,7 @@ sr_crypto_rw(struct sr_workunit *wu)
  ? ? ? ?if (wu-swu_xs-flags  SCSI_DATA_OUT) {
  ? ? ? ? ? ? ? ?crp = sr_crypto_getcryptop(wu, 1);
  ? ? ? ? ? ? ? ?if (crp == NULL)
  - ? ? ? ? ? ? ? ? ? ? ? panic(sr_crypto_rw: no crypto op);
  + ? ? ? ? ? ? ? ? ? ? ? return (1);
  ? ? ? ? ? ? ? ?crp-crp_callback = sr_crypto_write;
  ? ? ? ? ? ? ? ?crp-crp_opaque = wu;
  ? ? ? ? ? ? ? ?s = splvm();
 
 
 It now survives to several untarrings of src.tar with kern.bufcachepercent=90.
 
 cheers,
 David



Re: AVL tree

2011-05-20 Thread Ariane van der Steldt
On Fri, May 20, 2011 at 11:23:47AM +0200, Mike Belopuhov wrote:
 On Fri, May 20, 2011 at 4:04 AM, Ariane van der Steldt ari...@stack.nl 
 wrote:
  AVL trees have a difference of max 1 between the longest and shortest
  path to a leaf, whereas RB-trees have at max the longest path be double
  the length of the shortest path.
  I.e. work case lookups require traversal of
  ? ?ln(size)/ln(2) elements for AVL trees,
  2 * ln(size)/ln(2) elements for RB trees.
 
  However, RB trees are slightly more efficient with insertion/removal.
 
  The better balancing of AVL trees can make a big difference in lookup
  time compared to RB trees, for trees containing millions of items.
 
  how are you supposed to choose between them?
 
  Basically it's a trade off. Is lookup more important, or insert/remove?
  And ofcourse, if you still don't know, just take what strikes your
  fancy. :D
 
 
 i know about the difference of lookup and insert/remove speed, but
 is there a significant difference in practice (in openbsd use cases)?

I suspect that the PF state table is mostly lookups, therefor AVL would
improve performance there. That said without having looked at the code
though.

For UVMs allocators, the case is harder to pin, since they use a lot of
lookup and traversal (where AVL may benefit) while also requiring a lot
of modifications (where RB may be better). The choice is a lot harder to
pin down.

This is without taking cache coherency on multiple CPUs into account,
which may change the picture again (a write/rotate would lead to cache
invalidation on all other cpus).

  But I'm mostly interested because I tend to use trees left, right and
  center and those macros lead to code bloat. If I can use a tree without
  macros, I'm happy.
 
 so maybe it makes sense to have a non macro implementation in libkern
 and use it in uvm or whenever needed?

Or just in sys/kern.
-- 
Ariane