Re: mbuf header bloat ?

2003-01-02 Thread Andrew Gallatin

Bosko Milekic writes:
 > 
 >   Yeah, this looks like the least-intrusive way to do it.  I'm okay with
 >   the patch.  I like the idea of using an EXT-type flag to mark the data
 >   buffer types using this method.  Thanks.

Thanks.. Committed.

 >   P.S.: Try not to use MEXTADD, if possible.  Use m_extadd() instead,
 >   which is the procedure-equivalent version.  MEXTADD is just provided
 >   for 'backwards-compatibility'.  It used to be a large ugly macro.

Oh, OK.  That's fine.  I'm happy to just use m_extadd().  Thanks for
the tip.

Drew

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2003-01-02 Thread Bosko Milekic

On Thu, Jan 02, 2003 at 03:47:46PM -0500, Andrew Gallatin wrote:
>  >   To be honest, I don't really like the idea but I don't see a better
>  >   solution.  Right now, ref counting for regular mbuf clusters works
>  >   fine and is pretty damn fast, but I don't know how I could make it
>  >   happen for other external buffer types other than the way I just
>  >   described.
> 
> There's a second can of worms too.  We don't want the mbuf system
> freeing externally maintained refcnt pointers.  So we need a type
> or flag to distinguish them.
> 
> I've appended a minimal impact patch that I came up with.  It just
> adds a new type (EXT_EXTREF) and leaves the m_extadd() api/abi pretty
> much unchanged.  Callers that want to manage their own refcnt memory
> call m_extadd() like this:
> 
>   mp->m_ext.ref_cnt =  &entry->ref_count;
>   MEXTADD(mp, (void *)entry->jbuf->buf, GM_JLEN,
>   gm_freebsd_ether_jfree, (void *)entry->jbuf, 
>   0, EXT_EXTREF);
> 
> 
> It seems to work just fine, and together with some patches from Alan
> Cox for kmem_malloc(), allows me to make my network driver MPSAFE.
> I'm still testing for other hidden Giant acquisitions or
> GIANT_REQUIRED() assertions in rare codepaths, but its been up for 15
> minutes now, and that's 14m 59sec  longer than usual ;)
> 
> Would you be OK with this or something like it?
> 
> Thanks,
> 
> Drew
[patch snipped]

  Yeah, this looks like the least-intrusive way to do it.  I'm okay with
  the patch.  I like the idea of using an EXT-type flag to mark the data
  buffer types using this method.  Thanks.

  P.S.: Try not to use MEXTADD, if possible.  Use m_extadd() instead,
  which is the procedure-equivalent version.  MEXTADD is just provided
  for 'backwards-compatibility'.  It used to be a large ugly macro.

-- 
Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2003-01-02 Thread Andrew Gallatin

Bosko Milekic writes:
 > 
 > On Thu, Jan 02, 2003 at 01:53:53PM -0500, Andrew Gallatin wrote:
 > > I'm just tuning up my driver now to catch up to the "recent" interface
 > > changes.  While there, I went to add a ref count for my driver managed
 > > M_EXT clusters.  However, m_extadd() does not take a parameter for
 > > assignment into mp->m_ext.ref_cnt Eg, I cannot call m_extadd() if I
 > > want to use my own refcounter.
 > > 
 > > Is there any chance this could be fixed?  O/w, I'll have to
 > > avoid calling m_extadd()..
 > 
 >   I dunno.  I hate the whole story behind the reference counters in the
 >   mbuf code and I don't know what the correct approach here would be.  A
 >   long long while back I think m_extadd (or its equivalent) did allow
 >   something like this.  Then, we changed it so that counters would be
 >   allocated by the mbuf code 'transparently' (i.e., so that the Rest Of
 >   The World didn't have to worry about it).  However, not long ago, I
 >   changed the way reference counters were allocated for mbuf clusters so
 >   that it would be faster.  Counters for other objects are allocated
 >   with malloc().  The only 'correct' (half-assed) solution I can see is
 >   to allow m_extadd() to take a 'refcount' argument (again?) - perhaps
 >   leave a backwards-compatible m_extadd() and add a m_addext() or
 >   something - and only call malloc() for the counter if refcnt_provided
 >   == NULL.
 > 
 >   To be honest, I don't really like the idea but I don't see a better
 >   solution.  Right now, ref counting for regular mbuf clusters works
 >   fine and is pretty damn fast, but I don't know how I could make it
 >   happen for other external buffer types other than the way I just
 >   described.

There's a second can of worms too.  We don't want the mbuf system
freeing externally maintained refcnt pointers.  So we need a type
or flag to distinguish them.

I've appended a minimal impact patch that I came up with.  It just
adds a new type (EXT_EXTREF) and leaves the m_extadd() api/abi pretty
much unchanged.  Callers that want to manage their own refcnt memory
call m_extadd() like this:

  mp->m_ext.ref_cnt =  &entry->ref_count;
  MEXTADD(mp, (void *)entry->jbuf->buf, GM_JLEN,
gm_freebsd_ether_jfree, (void *)entry->jbuf, 
0, EXT_EXTREF);


It seems to work just fine, and together with some patches from Alan
Cox for kmem_malloc(), allows me to make my network driver MPSAFE.
I'm still testing for other hidden Giant acquisitions or
GIANT_REQUIRED() assertions in rare codepaths, but its been up for 15
minutes now, and that's 14m 59sec  longer than usual ;)

Would you be OK with this or something like it?

Thanks,

Drew


Index: kern/subr_mbuf.c
===
RCS file: /home/ncvs/src/sys/kern/subr_mbuf.c,v
retrieving revision 1.34
diff -u -r1.34 subr_mbuf.c
--- kern/subr_mbuf.c27 Nov 2002 04:26:00 -  1.34
+++ kern/subr_mbuf.c2 Jan 2003 19:59:54 -
@@ -1062,7 +1062,8 @@
 (((uintptr_t)(cl) - (uintptr_t)cl_refcntmap) >> MCLSHIFT)
 
 #define_mext_dealloc_ref(m)\
-   free((m)->m_ext.ref_cnt, M_MBUF)
+   if ((m)->m_ext.ext_type != EXT_EXTREF)  \
+   free((m)->m_ext.ref_cnt, M_MBUF)
 
 /**
  * Internal routines.
@@ -1508,9 +1509,13 @@
 m_extadd(struct mbuf *mb, caddr_t buf, u_int size,
 void (*freef)(void *, void *), void *args, int flags, int type)
 {
+   u_int *ref_cnt = NULL;
 
-   _mext_init_ref(mb, ((type != EXT_CLUSTER) ?
-   NULL : &cl_refcntmap[cl2ref(mb->m_ext.ext_buf)]));
+   if (type == EXT_CLUSTER)
+   ref_cnt = &cl_refcntmap[cl2ref(mb->m_ext.ext_buf)];
+   else if (type == EXT_EXTREF)
+   ref_cnt = mb->m_ext.ref_cnt;
+   _mext_init_ref(mb, ref_cnt);
if (mb->m_ext.ref_cnt != NULL) {
mb->m_flags |= (M_EXT | flags);
mb->m_ext.ext_buf = buf;
Index: sys/mbuf.h
===
RCS file: /home/ncvs/src/sys/sys/mbuf.h,v
retrieving revision 1.109
diff -u -r1.109 mbuf.h
--- sys/mbuf.h  30 Dec 2002 20:22:40 -  1.109
+++ sys/mbuf.h  2 Jan 2003 19:40:20 -
@@ -173,6 +173,7 @@
 #defineEXT_NET_DRV 100 /* custom ext_buf provided by net driver(s) */
 #defineEXT_MOD_TYPE200 /* custom module's ext_buf type */
 #defineEXT_DISPOSABLE  300 /* can throw this buffer away w/page flipping 
*/
+#defineEXT_EXTREF  400 /* has externally maintained ref_cnt ptr*/
 
 /*
  * Flags copied when copying m_pkthdr.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2003-01-02 Thread Bosko Milekic

On Thu, Jan 02, 2003 at 01:53:53PM -0500, Andrew Gallatin wrote:
> I'm just tuning up my driver now to catch up to the "recent" interface
> changes.  While there, I went to add a ref count for my driver managed
> M_EXT clusters.  However, m_extadd() does not take a parameter for
> assignment into mp->m_ext.ref_cnt Eg, I cannot call m_extadd() if I
> want to use my own refcounter.
> 
> Is there any chance this could be fixed?  O/w, I'll have to
> avoid calling m_extadd()..

  I dunno.  I hate the whole story behind the reference counters in the
  mbuf code and I don't know what the correct approach here would be.  A
  long long while back I think m_extadd (or its equivalent) did allow
  something like this.  Then, we changed it so that counters would be
  allocated by the mbuf code 'transparently' (i.e., so that the Rest Of
  The World didn't have to worry about it).  However, not long ago, I
  changed the way reference counters were allocated for mbuf clusters so
  that it would be faster.  Counters for other objects are allocated
  with malloc().  The only 'correct' (half-assed) solution I can see is
  to allow m_extadd() to take a 'refcount' argument (again?) - perhaps
  leave a backwards-compatible m_extadd() and add a m_addext() or
  something - and only call malloc() for the counter if refcnt_provided
  == NULL.

  To be honest, I don't really like the idea but I don't see a better
  solution.  Right now, ref counting for regular mbuf clusters works
  fine and is pretty damn fast, but I don't know how I could make it
  happen for other external buffer types other than the way I just
  described.

> Thanks,
> 
> Drew

-- 
Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2003-01-02 Thread Andrew Gallatin

Bosko Milekic writes:
 > 
 > On Mon, Nov 25, 2002 at 08:13:46PM -0500, Andrew Gallatin wrote:
 > >  >   It is not out of date.  The code means:
 > >  > 
 > >  >   "If you've given me a counter then I'll use it otherwise I'll try to
 > >  >   allocate one with malloc()."
 > > 
 > > Ah, duh.  Thanks.  I'd better start providing one in my driver then..
 > 
 >   Again, if you're just using regular mbuf clusters (of the 2K variety)
 >   you don't need to because mb_alloc will do it for you.  If you're
 >   using a third-party buffer (e.g., jumbo buf) then you can if you want
 >   to but make sure it's not something that requires freeing afterwords
 >   (in other words, only do it if you do the exact same thing mb_alloc
 >   does for regular clusters because otherwise you'll have to worry
 >   about destructing the counter before freeing the mbuf).


I'm just tuning up my driver now to catch up to the "recent" interface
changes.  While there, I went to add a ref count for my driver managed
M_EXT clusters.  However, m_extadd() does not take a parameter for
assignment into mp->m_ext.ref_cnt Eg, I cannot call m_extadd() if I
want to use my own refcounter.

Is there any chance this could be fixed?  O/w, I'll have to
avoid calling m_extadd()..

Thanks,

Drew



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-28 Thread Sam Leffler
> On Mon, Nov 25, 2002 at 10:46:00AM -0800, Sam Leffler wrote:
> > > > I don't see this problem; looutput looks to do the right thing.
FWIW
> > I've
> > > > passed mbufs w/ mtags through the loopback interface.
> > >
> > > This refers specifically to the following code snippet:
> > >
> > > if (m && m->m_next != NULL && m->m_pkthdr.len < MCLBYTES) {
> > > struct mbuf *n;
> > >
> > > MGETHDR(n, M_DONTWAIT, MT_HEADER);
> > > if (!n)
> > > goto contiguousfail;
> > > MCLGET(n, M_DONTWAIT);
> > > if (! (n->m_flags & M_EXT)) {
> > > m_freem(n);
> > > goto contiguousfail;
> > > }
> > >
> > > m_copydata(m, 0, m->m_pkthdr.len, mtod(n, caddr_t));
> > > n->m_pkthdr = m->m_pkthdr;
> > > n->m_len = m->m_pkthdr.len;
> > > n->m_pkthdr.aux = m->m_pkthdr.aux;
> > > m->m_pkthdr.aux = (struct mbuf *)NULL;
> > > m_freem(m);
> > > m = n;
> > > }
> > >
> >
> > Something is wrong with your tree:
> >
> > ebb% grep aux ../sys/mbuf.h
> > ebb%
> >
> > The above code is correct in the repo as is the m_getcl code.
>
>   While the 'aux' part of the code was in fact removed, what Robert is
>   saying still applies.  What happens is that this code 'manually'
>   performs a mbuf to mbuf+cluster copy because of some pretty bogus
>   assumption in the KAME code that expects the data to be contiguous in
>   a single cluster.
>
>   So when the 'n' mbuf is allocated and the cluster with it then a
>   shallow copy of the packet header is done from mbuf 'm' (the old mbuf)
>   to the newly allocated mbuf 'n'.  What Robert is saying is that
>   this copy breaks his MAC label semantics which require a deeper copy
>   in this case and he is concerned that it may break the m_tag semantics
>   too, especially given the fact that the old mbuf 'm' is then freed
>   (doesn't this destroy the label?!).  If that's indeed the case, then
>   this copy remains bogus.
>

The real code (i.e. what was checked in when I replaced the aux mbufs w/
tags) doesn't do the above; it does the right thing; albeit using the binary
copy of the mbuf instead of using the appropriate macro.

Regardless Robert and I have been working through the details of revising
the system handling of pkthdr's.

Sam


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-27 Thread Robert Watson

On Wed, 27 Nov 2002, Julian Elischer wrote:

> On Wed, 27 Nov 2002, Robert Watson wrote:
> 
> > I'd like to continue to explore options for reducing the number of memory
> > allocations to extend storage on mbufs.  One idea I've been tossing around
> > is adopting Jeff Roberson's extension model used in struct proc and
> > related structures. 
> 
> I've been wondering about a couple of things..  1/ soemtiems I wonder if
> ALL mbufs should not be external mbufs. 
> 
> In other words, if the mbuf were always just a header and data was
> always stored on an external buffer it might actually simplify some
> code. It would then become possible that some tag space be allocated
> along with the mbuf header.. if MAC was in the system, then every mbuf
> would be allocated with a MAC tag by default.  Maybe as a single
> allocation. The UMA allocator's init()  capability gives us a lot of
> latitude in doing things like that.

These are all interesting possibilities.  One of the upshots of this
conversation is that it sounds like, while 5.x is stabilizing, we might
want to consider some side experimental work to evaluate the continuing
effectiveness of mbufs, and to experiment with alternatives.  Between
Bosko's new allocator and UMA, I think we're pretty set on optimizing for
the current mbuf model fo 5.x.  Finishing up and cleaning up the
fine-grained locking and measuring the impact of current changes should
keep us busy on the implementation side for a bit though.

There seem to be a number of parts of this problem -- how changes in
traffic, common interfaces, usage patterns, and memory allocators have
changed our requirements for a network subsystem. 

I have the suspicion that the traffic patterns leading to an mbuf model
will probably remain: most connections will remain assymetric in nature,
with most of the large frames in one direction representing a bulk data
transfer, and small frames in the other direction, representing the
acknowledgement and control stream.  TCP hasn't shown any signs of going
to a model where we send large selective acknowledgement frames covering
wide windows, which I suppose it might do at some point given the increase
in minimum frame size.  However, we have seen work to pack lots of small
packets into large frames for bulk delivery in routers to avoid the loss
of efficiency over medimums with large mininum frame sizes.

Maybe we can put together a working group to do some discussion and
experimentation.  This is an area where we might be able to approach
potential sponsors using FreeBSD for joint investment in network
performance improvement. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-27 Thread Julian Elischer


On Wed, 27 Nov 2002, Bosko Milekic wrote:

> 
> On Wed, Nov 27, 2002 at 12:51:27PM -0800, Julian Elischer wrote:
> > true.. if it has a 'size' argument it would do what I was thinkng
> > about..
> 
>   We actually do have that in the new m_getm().  If you do a m_getm() it
>   allows you to specify 'size' and it will allocate a packet header mbuf
>   for you with external storage and may even allocate more than one and
>   chain them together for you in one shot and without dropping the
>   per-CPU cache lock, if it can get away with it.  It does a 'best' fit
>   allocation so you effectively have your 'small,' 'big,' and 'real big'
>   scenario.

cool
I hadn't noticed that..

> 
> -- 
> Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]
> 
> 
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-current" in the body of the message
> 


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-27 Thread Bosko Milekic

On Wed, Nov 27, 2002 at 12:51:27PM -0800, Julian Elischer wrote:
> true.. if it has a 'size' argument it would do what I was thinkng
> about..

  We actually do have that in the new m_getm().  If you do a m_getm() it
  allows you to specify 'size' and it will allocate a packet header mbuf
  for you with external storage and may even allocate more than one and
  chain them together for you in one shot and without dropping the
  per-CPU cache lock, if it can get away with it.  It does a 'best' fit
  allocation so you effectively have your 'small,' 'big,' and 'real big'
  scenario.

-- 
Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-27 Thread Terry Lambert
Andrew Gallatin wrote:
> If you're a vendor of a device which inserts MAC mtags and needs
> options MAC, you put this code in your driver:
> 
> if (mbstat.m_mhlen != MHLEN) {
>printf("Please rebuild your kernel with 'options MAC'\n");
>goto atach_failed_no_mac;
> }
> 
> I've already got code like this in my driver to check that m_mclbytes
> and m_mlen is what I expect it to be, since people sometimes change
> them.


I think you are still not getting it, but it's not worth arguing
over.

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-27 Thread Andrew Gallatin

Terry Lambert writes:
 > Andrew Gallatin wrote:
 > > Terry Lambert writes:
 > >  > Andrew Gallatin wrote:
 > >  > > What I (as a 3rd party driver author working in a GNUish
 > 
 > "This is how I do it."
 > 
 > > <...>
 > > 
 > >  > > How is one supposed to build a 3rd party module these days?
 > 
 > "How are you supposed to do it?"
 > 
 > >  > One is not.  The vendor supplies only a binary.
 > > 
 > > Damn it Terry,  I AM the vendor.  Somtimes I wonder if you even read
 > > the articles you reply to.   I'm asking how the vendor (me) is
 > > supposed to build a binary module and I gave an example of how currently
 > > do it.
 > 
 > You're the vendor in the first statement, and a consumer in the
 > second.

I"m the vendor in both statements, no matter how you choose to misread
my words.

 > The topic of the post to which you were replying was third party
 > binary compatability.
 > 
 > The answer is that if the structures change, then there is no
 > binary compatability without source code, period.

Yes there is.  You can be binary compatable ONLY with a kernel with
options MAC.

 > It seemed to me that you were assuming access to the source code
 > for consumers of third party modules.
 > 
 > I think the issue that Robert is concerned about is MAC modules
 > that are provided by a third party to a consumer of FreeBSD and
 > the modules, and for which the structure changes and so on can
 > not be permitted.
 > 
 > This mnakes sense, because the MAC code is being developed under
 > a DARPA contract, and it's likely that the module source code and
 > the modules won't be available to the end users, let alone the
 > general public, without some kind of security clearance, and then
 > probably not then.

If you're a vendor of a device which inserts MAC mtags and needs
options MAC, you put this code in your driver:

if (mbstat.m_mhlen != MHLEN) {
   printf("Please rebuild your kernel with 'options MAC'\n");
   goto atach_failed_no_mac;
}

I've already got code like this in my driver to check that m_mclbytes
and m_mlen is what I expect it to be, since people sometimes change
them. 

Drew
   

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-27 Thread Julian Elischer


On Wed, 27 Nov 2002, Bosko Milekic wrote:

> 
> On Wed, Nov 27, 2002 at 11:56:33AM -0800, Julian Elischer wrote:
> > On Wed, 27 Nov 2002, Robert Watson wrote:
> > > I'd like to continue to explore options for reducing the number of memory
> > > allocations to extend storage on mbufs.  One idea I've been tossing around
> > > is adopting Jeff Roberson's extension model used in struct proc and
> > > related structures. 
> > 
> > I've been wondering about a couple of things..
> > 1/ soemtiems I wonder if ALL mbufs should not be external mbufs.
> > 
> > In other words, if the mbuf were always just a header and data was
> > always stored on an external buffer it might actually simplify some
> > code. It would then become possible that some tag space
> > be allocated along with the mbuf header.. if MAC was 
> > in the system, then every mbuf would be allocated with a MAC tag by
> > default.  Maybe as a single allocation. The UMA allocator's init()
> > capability gives us a lot of latitude in doing things like that.
> 
>   I don't see how that would simplify anything.  You would still need
>   two allocations for external storage because you need to offer
>   third-party code the possibility to provide its own external storage
>   type (think jumbo bufs or sendfile(2) or the zero-copy code).  You
>   don't really gain anything except for maybe potential space wastage
>   for very small packets by "always allocating an mbuf with external
>   storage" (you may only save a really quick and negligeable size
>   comparison, but that's it).

I was thinking of having a selection of sized external buffers.

small, medium, big..

really it was only a thought.

> 
>   As for non-third-party type external storage (your standard 2K mbuf
>   clusters) those can be allocated in one shot with an mbuf pre-attached
>   to it by the existing allocator anyway and an interface is provided to
>   do so (m_getcl(), iirc).

true.. if it has a 'size' argument it would do what I was thinkng
about..

>  
> --
> Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]
> 
> 


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-27 Thread Terry Lambert
Andrew Gallatin wrote:
> Terry Lambert writes:
>  > Andrew Gallatin wrote:
>  > > What I (as a 3rd party driver author working in a GNUish

"This is how I do it."

> <...>
> 
>  > > How is one supposed to build a 3rd party module these days?

"How are you supposed to do it?"

>  > One is not.  The vendor supplies only a binary.
> 
> Damn it Terry,  I AM the vendor.  Somtimes I wonder if you even read
> the articles you reply to.   I'm asking how the vendor (me) is
> supposed to build a binary module and I gave an example of how currently
> do it.

You're the vendor in the first statement, and a consumer in the
second.

The topic of the post to which you were replying was third party
binary compatability.

The answer is that if the structures change, then there is no
binary compatability without source code, period.

It seemed to me that you were assuming access to the source code
for consumers of third party modules.

I think the issue that Robert is concerned about is MAC modules
that are provided by a third party to a consumer of FreeBSD and
the modules, and for which the structure changes and so on can
not be permitted.

This mnakes sense, because the MAC code is being developed under
a DARPA contract, and it's likely that the module source code and
the modules won't be available to the end users, let alone the
general public, without some kind of security clearance, and then
probably not then.

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-27 Thread Bosko Milekic

On Wed, Nov 27, 2002 at 11:56:33AM -0800, Julian Elischer wrote:
> On Wed, 27 Nov 2002, Robert Watson wrote:
> > I'd like to continue to explore options for reducing the number of memory
> > allocations to extend storage on mbufs.  One idea I've been tossing around
> > is adopting Jeff Roberson's extension model used in struct proc and
> > related structures. 
> 
> I've been wondering about a couple of things..
> 1/ soemtiems I wonder if ALL mbufs should not be external mbufs.
> 
> In other words, if the mbuf were always just a header and data was
> always stored on an external buffer it might actually simplify some
> code. It would then become possible that some tag space
> be allocated along with the mbuf header.. if MAC was 
> in the system, then every mbuf would be allocated with a MAC tag by
> default.  Maybe as a single allocation. The UMA allocator's init()
> capability gives us a lot of latitude in doing things like that.

  I don't see how that would simplify anything.  You would still need
  two allocations for external storage because you need to offer
  third-party code the possibility to provide its own external storage
  type (think jumbo bufs or sendfile(2) or the zero-copy code).  You
  don't really gain anything except for maybe potential space wastage
  for very small packets by "always allocating an mbuf with external
  storage" (you may only save a really quick and negligeable size
  comparison, but that's it).

  As for non-third-party type external storage (your standard 2K mbuf
  clusters) those can be allocated in one shot with an mbuf pre-attached
  to it by the existing allocator anyway and an interface is provided to
  do so (m_getcl(), iirc).
 
--
Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-27 Thread Andrew Gallatin

Terry Lambert writes:
 > Andrew Gallatin wrote:
 > > What I (as a 3rd party driver author working in a GNUish

<...>

 > > How is one supposed to build a 3rd party module these days?
 > 
 > One is not.  The vendor supplies only a binary.

Damn it Terry,  I AM the vendor.  Somtimes I wonder if you even read
the articles you reply to.   I'm asking how the vendor (me) is
supposed to build a binary module and I gave an example of how currently
do it.

Drew


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-27 Thread Terry Lambert
Andrew Gallatin wrote:
> What I (as a 3rd party driver author working in a GNUish
> autoconf/gnumake environment) do is to require a user building from
> source to specify the location of a configured kernel tree where make
> depend has been run (defaulting to GENERIC).  I then pickup the
> various option and bus files out of that directory.  When I build binary
> modules, I build from source as a normal user (using a 4.1.1 system in
> a chroot).  Using an approach like this, a vendor could ship a MAC
> aware driver by picking up the options files from a MAC kernel build
> directory.

I believe he was talking about modules for which source code
is not available.


> How is one supposed to build a 3rd party module these days?

One is not.  The vendor supplies only a binary.


>  > I think you under-estimate the complexity of variably sized key kernel
>  > data structures.  mbuf.h is included all over the kernel, as well as in
>  > many user applications (although often for bogus reasons).  My proposed
>  > strategy is the following:
> 
> Bizzare.  I had no idea userland apps used mbuf.h.  That does indeed
> sound bogus.

On the contrary: it's a very clever thing to do.

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-27 Thread Julian Elischer


On Wed, 27 Nov 2002, Robert Watson wrote:

> 
> I'd like to continue to explore options for reducing the number of memory
> allocations to extend storage on mbufs.  One idea I've been tossing around
> is adopting Jeff Roberson's extension model used in struct proc and
> related structures. 

I've been wondering about a couple of things..
1/ soemtiems I wonder if ALL mbufs should not be external mbufs.

In other words, if the mbuf were always just a header and data was
always stored on an external buffer it might actually simplify some
code. It would then become possible that some tag space
be allocated along with the mbuf header.. if MAC was 
in the system, then every mbuf would be allocated with a MAC tag by
default.  Maybe as a single allocation. The UMA allocator's init()
capability gives us a lot of latitude in doing things like that.




To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-27 Thread Andrew Gallatin

Robert Watson writes:
 > 
 > Andrew,
 > 
 > Thanks for your patience as I finished some research and experimentation
 > regarding the options there.  Some more details below.
 > 
 > On Sat, 23 Nov 2002, Andrew Gallatin wrote:
 > 
 > > On the contrary, I think that if anything is going to be done, it must
 > > be done now, so as to not break binary network driver compatability like
 > > we did in 4.1.1 when the size of mbufs changed.  Otherwise, we're stuck
 > > with it until 6.0.
 > 
 > Per an on-going discussion on -arch, it seems there's a reasonable
 > concencus that the kernel driver ABI will not be frozen until 5.1, since
 > we need continued flexibility to mature the fine-grained locking, KSE, and
 > MAC technologies.  This will allow us some wiggle room in resolving these
 > sorts of issues. 

I agree.  

 > > As you eloquently state, there are a number of tradeoffs involved.  On a
 > > 64-bit platform, 99% of users are paying 40 bytes/pkt for something that
 > > they will never use.  On x86, 99.99% of users are paying 20 bytes/pkt
 > > for a feature they will never use.  At least a signifigant fraction of
 > > nics make use of csum offloading (xl, ti, bge, em, myri). 
 > > 
 > > I propose that we make struct label portion of the pkthdr compile-time
 > > conditional on MAC.  The assumption is that you will move the MAC label
 > > to an m_tag sometime after 5.0-RELEASE. 
 > 
 > For a variety of reasons, I'm averse to the notion of compile-time
 > components in the struct mbuf (and other) vital kernel structures.  One of
 > the design requirements for the MAC Framework was that it be possible for
 > third party vendors to distribute security modules that plug in without
 > necessarily being part of the FreeBSD build infrastructure.  While it is
 > true we currently require options MAC to be compiled into the kernel, we
 > don't require that you manually integrate module source into the kernel
 > source so that it builds as part of a kernel.  Due to the way that


I'm not at all certain that I understand this objection.  I agree its
a bit ugly and it was proposed as a last resort when I thought we'd be
freezing the ABI at 5.0.  So I'm not strongly advocating it, but I'm
very curious as to why you think things need to be manually integrated
for it to work.

What I (as a 3rd party driver author working in a GNUish
autoconf/gnumake environment) do is to require a user building from
source to specify the location of a configured kernel tree where make
depend has been run (defaulting to GENERIC).  I then pickup the
various option and bus files out of that directory.  When I build binary
modules, I build from source as a normal user (using a 4.1.1 system in
a chroot).  Using an approach like this, a vendor could ship a MAC
aware driver by picking up the options files from a MAC kernel build
directory.

How is one supposed to build a 3rd party module these days?

 > separately shipped modules build out of the context of a kernel
 > configuration, this would introduce substantial problems.  However, since
 > we believe that the kernel ABI will not be frozen until 5.1, if we have an
 > alternative place to put the label that doesn't expand the pkthdr, then we
 > can change it once we think the solution is ready. 

Agreed.


 > On the topic of m_tag: I've spent a few days working with m_tag now to see
 > if it can meet the needs of the MAC Framework.  My conclusion is that, in
 > the form it's currently in the tree, it cannot meet the requirements. 
 > However, I believe with a relatively straight forward set of
 > modifications, it can.  As such, the proposed 5.1 time frame for moving
 > the MAC Framework to using m_tag is realistic.  I'm currently exchanging
 > patches with Sam Leffler looking at how to tweak the various protocol
 > stacks to properly maintain m_tag chains on mbufs when mbufs are copied,
 > etc.  These problems largely stem from a failure to maintain the tag
 > chains on mbufs over some of the copy/... operations that occur.  The
 > result is that the MAC labels stored in mbufs are often discarded or lost,
 > and many packets float around the system without proper protection.  For
 > policies that rely on ubituitous labeling, this results in rapid assertion
 > failures (yes, we fail very closed :-).  I hope to post patches for these
 > changes in the next few days once I've had a perform more extensive
 > testing.  Sam and I are having an on-going conversation about whether it
 > would be safe to introduce some of these changes before 5.0.
 > 
 > There are some downsides to moving to m_tag for MAC labels.  One is that
 > it effectively doubles the number of memory allocations in the system for
 > every packet delivered through the system when running with MAC if we
 > maintain the current semantic that all packets are labeled.  This means
 > users will pay a higher cost for MAC even if they don't label packets,
 > which is unfortunate.  I'm currently exploring the impact -- my hope is
 > that

Re: mbuf header bloat ?

2002-11-27 Thread Robert Watson

Andrew,

Thanks for your patience as I finished some research and experimentation
regarding the options there.  Some more details below.

On Sat, 23 Nov 2002, Andrew Gallatin wrote:

> On the contrary, I think that if anything is going to be done, it must
> be done now, so as to not break binary network driver compatability like
> we did in 4.1.1 when the size of mbufs changed.  Otherwise, we're stuck
> with it until 6.0.

Per an on-going discussion on -arch, it seems there's a reasonable
concencus that the kernel driver ABI will not be frozen until 5.1, since
we need continued flexibility to mature the fine-grained locking, KSE, and
MAC technologies.  This will allow us some wiggle room in resolving these
sorts of issues. 

> As you eloquently state, there are a number of tradeoffs involved.  On a
> 64-bit platform, 99% of users are paying 40 bytes/pkt for something that
> they will never use.  On x86, 99.99% of users are paying 20 bytes/pkt
> for a feature they will never use.  At least a signifigant fraction of
> nics make use of csum offloading (xl, ti, bge, em, myri). 
> 
> I propose that we make struct label portion of the pkthdr compile-time
> conditional on MAC.  The assumption is that you will move the MAC label
> to an m_tag sometime after 5.0-RELEASE. 

For a variety of reasons, I'm averse to the notion of compile-time
components in the struct mbuf (and other) vital kernel structures.  One of
the design requirements for the MAC Framework was that it be possible for
third party vendors to distribute security modules that plug in without
necessarily being part of the FreeBSD build infrastructure.  While it is
true we currently require options MAC to be compiled into the kernel, we
don't require that you manually integrate module source into the kernel
source so that it builds as part of a kernel.  Due to the way that
separately shipped modules build out of the context of a kernel
configuration, this would introduce substantial problems.  However, since
we believe that the kernel ABI will not be frozen until 5.1, if we have an
alternative place to put the label that doesn't expand the pkthdr, then we
can change it once we think the solution is ready. 

On the topic of m_tag: I've spent a few days working with m_tag now to see
if it can meet the needs of the MAC Framework.  My conclusion is that, in
the form it's currently in the tree, it cannot meet the requirements. 
However, I believe with a relatively straight forward set of
modifications, it can.  As such, the proposed 5.1 time frame for moving
the MAC Framework to using m_tag is realistic.  I'm currently exchanging
patches with Sam Leffler looking at how to tweak the various protocol
stacks to properly maintain m_tag chains on mbufs when mbufs are copied,
etc.  These problems largely stem from a failure to maintain the tag
chains on mbufs over some of the copy/... operations that occur.  The
result is that the MAC labels stored in mbufs are often discarded or lost,
and many packets float around the system without proper protection.  For
policies that rely on ubituitous labeling, this results in rapid assertion
failures (yes, we fail very closed :-).  I hope to post patches for these
changes in the next few days once I've had a perform more extensive
testing.  Sam and I are having an on-going conversation about whether it
would be safe to introduce some of these changes before 5.0.

There are some downsides to moving to m_tag for MAC labels.  One is that
it effectively doubles the number of memory allocations in the system for
every packet delivered through the system when running with MAC if we
maintain the current semantic that all packets are labeled.  This means
users will pay a higher cost for MAC even if they don't label packets,
which is unfortunate.  I'm currently exploring the impact -- my hope is
that changes to the memory allocators since 4.x, such as the new mbuf
allocator and introduction of UMA, will largely mitigate that effect.  A
fair amount of interest has been expressed in supporting MAC in the
GENERIC kernel eventually: if and when that becomes the case, we may find
that the rationale for moving the label out of the mbuf is reversed.

> This will immediately reduce the size of mbufs for the vast majority of
> users, and will prevent a 4.1.1 like flag-day for 3rd party network
> driver vendors.  The only downside is that the few MAC users will not be
> able to use 3rd party binary network drivers until the MAC label is put
> into an m_tag.  This seems fair, as the only people inconvienced are the
> people who want the labels and they are motivated to move them to an
> m_tag.  But that's easy for me to say, since I don't run MAC, and I may
> be missing something big. 

I think you under-estimate the complexity of variably sized key kernel
data structures.  mbuf.h is included all over the kernel, as well as in
many user applications (although often for bogus reasons).  My proposed
strategy is the following:

(1) For 5.0, we ei

Re: mbuf header bloat ?

2002-11-26 Thread Bosko Milekic

On Mon, Nov 25, 2002 at 08:13:46PM -0500, Andrew Gallatin wrote:
>  >   It is not out of date.  The code means:
>  > 
>  >   "If you've given me a counter then I'll use it otherwise I'll try to
>  >   allocate one with malloc()."
> 
> Ah, duh.  Thanks.  I'd better start providing one in my driver then..

  Again, if you're just using regular mbuf clusters (of the 2K variety)
  you don't need to because mb_alloc will do it for you.  If you're
  using a third-party buffer (e.g., jumbo buf) then you can if you want
  to but make sure it's not something that requires freeing afterwords
  (in other words, only do it if you do the exact same thing mb_alloc
  does for regular clusters because otherwise you'll have to worry
  about destructing the counter before freeing the mbuf).

> Drew
 
-- 
Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-25 Thread Andrew Gallatin

Bosko Milekic writes:

 > > back.  We'll also need a kproc that can wake up every now and then to
 > > expand the pool if allocations at interrupt time failed.   Or do you
 > > already have a mechanism for that?
 > 
 >   The intended mechanism is the kproc and when the allocator was first
 >   designed and written, I had taken that into account although I have
<...>
 >   (I hope that came out OK).  As you can see, the concept is very simple
 >   and with the current infrastructure should be fairly easy to
 >   implement.  If you're wondering, I was going to do is as a 5.1 feature
 >   at some point because I've been so swamped with other things right now
 >   and so I do not foresee being able to do this in time for 5.0.

This looks ideal.  I'm looking forward to it.

 >   It is not out of date.  The code means:
 > 
 >   "If you've given me a counter then I'll use it otherwise I'll try to
 >   allocate one with malloc()."

Ah, duh.  Thanks.  I'd better start providing one in my driver then..

Drew

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-25 Thread Andrew Gallatin

Bosko Milekic writes:
 > 
 >   Well, first of all, I never call kmem_malloc() with any locks held so
 >   this argument about grabbing Giant while other locks are held is not
 >   applicable in my case.

Well, not for the mbuf allocator itself, but for its consumers.

 >   Given that you call the allocator with locks held, then you should
 >   only be doing so with M_DONTWAIT as an M_TRYWAIT call may result in a
 >   call to m_drain() which - if you are holding any locks - may lead to
 >   lock order reversals.  So given that you can only call the allocator
 >   with M_DONTWAIT for device drivers or any other code paths holding
 >   locks, then it would make sense to just make the M_DONTWAIT case never

Exactly.  I think we're now in violent agreement.  But in order for
this to work, we need your mbuf kproc to replenish the caches in case
of allocation failures.  

I'm happy to wait until 5.1 for decent performance, as long as I know
its coming.  Hopefully we'll have time to bring some of the more
popular drivers out from under Giant in time for 5.1.


 >   call the VM routines as well.  Either that, or make sure that a call
 >   to kmem_malloc() with M_DONTWAIT can do its job without requiring
 >   Giant (is this possible?)

The VM gurus are now aware of the problem and are working on it, but
its not currently possible.

Drew



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-25 Thread Terry Lambert
Bosko Milekic wrote:
[ ... memory allocator ... ]

FWIW: The Sequent Dynix allocator paper has been converted, and
is now available online:

Experience With an Efficient Parallel Kernel Memory Allocator
Paul E. McKenney, Jack Slingwine, Phil Krueger
Sequent Computer Systems, Inc.
Software Practice And Experience
http://citeseer.nj.nec.com/484408.html

This is the same reference that is in the books "UNIX Internals:
The New Frontiers" and "UNIX For Modern Architectures".

It's the reference I always give out, when locking in allocators
comes up, but now I have something other than a ratty photocopy
to give people.  8-).

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-25 Thread Bosko Milekic

On Mon, Nov 25, 2002 at 04:20:39PM -0500, Andrew Gallatin wrote:
> I'm not sure what you mean.  The problem is that Giant inhibits
> scaling on SMPs and I want to get all network drivers out from under
> it, not just mine.
> 
> Requiring Giant for mbuf allocations effectivly defeats your elegant,
> cleverly designed mbuf allocator with lock-free per-cpu caches, etc.
> And it introduces all kinds of windows for races and locking errors if
> SMP safe drivers must drop all their mutexes and grab Giant for mbuf
> allocations.
> 
> And you cannot grab Giant inside the mbuf allocation code itself
> because the mutex rules prohibit acquiring Giant while holding any
> other locks.  From mutex(9):
> 
>Giant
>  If Giant must be acquired, it must be acquired prior to acquiring other
>  mutexes.  Put another way: it is impossible to acquire Giant non-recur-
>  sively while holding another mutex.  It is possible to acquire other
>  mutexes while holding Giant, and it is possible to acquire Giant recur-
>  sively while holding other mutexes.

  Well, first of all, I never call kmem_malloc() with any locks held so
  this argument about grabbing Giant while other locks are held is not
  applicable in my case.

  Given that you call the allocator with locks held, then you should
  only be doing so with M_DONTWAIT as an M_TRYWAIT call may result in a
  call to m_drain() which - if you are holding any locks - may lead to
  lock order reversals.  So given that you can only call the allocator
  with M_DONTWAIT for device drivers or any other code paths holding
  locks, then it would make sense to just make the M_DONTWAIT case never
  call the VM routines as well.  Either that, or make sure that a call
  to kmem_malloc() with M_DONTWAIT can do its job without requiring
  Giant (is this possible?)
  
> Emperically, I vaguely remember marking my driver as SMP safe (with
> witness and invariants off, of course) provided something like a
> 30-40% performance increase on a dual 1GHz PIII system.  It was still
> not as fast as stable, but the current/stable performance difference
> was no longer embarrassing.
> 
> Drew
> 

-- 
Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-25 Thread Terry Lambert
Bosko Milekic wrote:
[ ... packet size distribution ... ]
>   I am equally curious about this.  One of the design assumptions for
>   mbufs and clusters, according to McKusick et al. (and I believe
>   another text which currently escapes me) is that packets are typically
>   either very small or fairly large.  Given the MAC label additions
>   (yes it would be nice if this was done using the m_tag interface but
>   at the very least one can say that they are implemented fairly
>   'consistently' despite the fact that they appear imposing to the
>   general mbuf structure), and the currently available data region in
>   the mbuf, it is absolutely necessary to know whether the assumption of
>   packet size distribution still holds before a decision is made on how
>   to modify the MAC label implementation - if at all.

In fact, it is even more useful to consider the idea of variable
sized mbufs.  The actual size you want is "whatever size is needed
for the incoming packets for the MTU of the sender".  Practically,
this means 8K (a compromise on the 9K "jumbograms" vs. page size),
1536 (512*3), etc..

I get concerned with all this decoration of mbufs (MAC vs. m_tag
vs. whatever) that people are doing, since this type of thing is
going to reduce overall capacity more than m_pullup(), etc..

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-25 Thread Terry Lambert
Bosko Milekic wrote:
>   This is not entirely true.  You can allocate an mbuf chain without
>   holding Giant if the caches are well populated - and they should be
>   in the common/general case.  You can in fact modify the allocator to
>   just not do a kmem_malloc() if called with M_DONTWAIT, but I don't
>   think you'd want to do this at this point.

In fact, one of the first changes I make in a kernel when I go
to do a networking product of any kind is to allocate the mbufs
in machdep.c out of physical RAM, and then pre-link them onto a
free-list, instead of using the standard (comparatively very
slow) mbuf allocator.


>   The gist of the argument boils down to the fact that network buffer
>   allocations have different requirements than general all-purpose
>   allocations (by design, the last time I checked), and that is why
>   an mbuf/cluster allocator exists.

Everything allocated at interrupt has pretty much the same
requirements.  The only real difference in mbuf's is that the
allocation failure cases are generally better handled than all
other allocation failure cases within the kernel (or people
would not have been beating up Jeff about a month ago for the
kmem_map space issue).

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-25 Thread Bosko Milekic

On Mon, Nov 25, 2002 at 10:46:00AM -0800, Sam Leffler wrote:
> > > I don't see this problem; looutput looks to do the right thing.  FWIW
> I've
> > > passed mbufs w/ mtags through the loopback interface.
> >
> > This refers specifically to the following code snippet:
> >
> > if (m && m->m_next != NULL && m->m_pkthdr.len < MCLBYTES) {
> > struct mbuf *n;
> >
> > MGETHDR(n, M_DONTWAIT, MT_HEADER);
> > if (!n)
> > goto contiguousfail;
> > MCLGET(n, M_DONTWAIT);
> > if (! (n->m_flags & M_EXT)) {
> > m_freem(n);
> > goto contiguousfail;
> > }
> >
> > m_copydata(m, 0, m->m_pkthdr.len, mtod(n, caddr_t));
> > n->m_pkthdr = m->m_pkthdr;
> > n->m_len = m->m_pkthdr.len;
> > n->m_pkthdr.aux = m->m_pkthdr.aux;
> > m->m_pkthdr.aux = (struct mbuf *)NULL;
> > m_freem(m);
> > m = n;
> > }
> >
> 
> Something is wrong with your tree:
> 
> ebb% grep aux ../sys/mbuf.h
> ebb%
> 
> The above code is correct in the repo as is the m_getcl code.

  While the 'aux' part of the code was in fact removed, what Robert is
  saying still applies.  What happens is that this code 'manually'
  performs a mbuf to mbuf+cluster copy because of some pretty bogus
  assumption in the KAME code that expects the data to be contiguous in
  a single cluster.

  So when the 'n' mbuf is allocated and the cluster with it then a
  shallow copy of the packet header is done from mbuf 'm' (the old mbuf)
  to the newly allocated mbuf 'n'.  What Robert is saying is that
  this copy breaks his MAC label semantics which require a deeper copy
  in this case and he is concerned that it may break the m_tag semantics
  too, especially given the fact that the old mbuf 'm' is then freed
  (doesn't this destroy the label?!).  If that's indeed the case, then
  this copy remains bogus.

> Sam

--
Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-25 Thread Poul-Henning Kamp
In message <[EMAIL PROTECTED]>, Andrew Gallatin
 writes:

>I'm not sure what you mean.  The problem is that Giant inhibits
>scaling on SMPs and I want to get all network drivers out from under
>it, not just mine.

Well, that is the 1' view.

Once you try to move code out from under Giant you will find that
the transition period also contains an increased risk of deadlocks
because you have to alternately interface with Giant-free and
Giant-protected code.

I found it somewhat problematic during GEOM development that it was
hard find or get a precise status on which bits of the kernel had
which Giant status.

As it is, I still have 37 source lines in sys/geom which fiddles
Giant one way or another, some of which are probably not needed
but I was unable to tell at the time and did the defensive thing.

In 20/20 hindsight, I think the very first step of SMPng we should
have gone through each and every non-static function in the kernel
and inserted an explicit assert on Giant, and then subsequently
removed them as we cleaned our way though the kernel.

Lets remember that next time :-)

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-25 Thread Andrew Gallatin

Bosko Milekic writes:
 >   Firstly, it should be noted that the behavior of calling kmem_malloc()
 >   when its caches are empty is an old property that has been carried
 >   over from the original allocator - in other words, it is not something
 >   that I arbitrarily introduced.

Certainly.   I'm not blaming you for that.

 >   With that said, we may want to think about changing it or at least
 >   introducing a third case that would modify the semantics so that Giant
 >   would never be called.  For instance, in addition to M_TRYWAIT and
 >   M_DONTWAIT, you could invent M_CACHEONLY that would only attempt a
 >   quick cache allocation.  However, I'm unsure as to the advantages this
 >   would bring - if any.  Why is it so much of a problem if your driver
 >   wants to grab Giant?  Are you afraid of lock order reversals somewhere
 >   down the line?  Perhaps there is a misconception that we can clear up
 >   here before any work is done.
 >  

I'm not sure what you mean.  The problem is that Giant inhibits
scaling on SMPs and I want to get all network drivers out from under
it, not just mine.

Requiring Giant for mbuf allocations effectivly defeats your elegant,
cleverly designed mbuf allocator with lock-free per-cpu caches, etc.
And it introduces all kinds of windows for races and locking errors if
SMP safe drivers must drop all their mutexes and grab Giant for mbuf
allocations.

And you cannot grab Giant inside the mbuf allocation code itself
because the mutex rules prohibit acquiring Giant while holding any
other locks.  From mutex(9):

   Giant
 If Giant must be acquired, it must be acquired prior to acquiring other
 mutexes.  Put another way: it is impossible to acquire Giant non-recur-
 sively while holding another mutex.  It is possible to acquire other
 mutexes while holding Giant, and it is possible to acquire Giant recur-
 sively while holding other mutexes.


Emperically, I vaguely remember marking my driver as SMP safe (with
witness and invariants off, of course) provided something like a
30-40% performance increase on a dual 1GHz PIII system.  It was still
not as fast as stable, but the current/stable performance difference
was no longer embarrassing.

Drew

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-25 Thread Bosko Milekic

On Mon, Nov 25, 2002 at 01:27:39PM -0500, Andrew Gallatin wrote:
> Bosko Milekic writes:
>  > On Sun, Nov 24, 2002 at 04:23:33PM -0500, Andrew Gallatin wrote:
>  > > If we're going to nitpick the mbuf system, a much, much worse problem
>  > > is that you cannot allocate an mbuf chain w/o holding Giant, which
>  > > stems from the mbuf system eventually calling kmem_malloc().  This
>  > > effectively prevents any network driver from being giant-free.  When
>  > > mbufs are low, mb_alloc() calls mb_pop_cont().  This, in turn, calls
>  > > kmem_malloc() which requires Giant...
>  > 
>  >   This is not entirely true.  You can allocate an mbuf chain without
>  >   holding Giant if the caches are well populated - and they should be
>  >   in the common/general case.  You can in fact modify the allocator to
>  >   just not do a kmem_malloc() if called with M_DONTWAIT, but I don't
>  >   think you'd want to do this at this point.
>
> Unfortunately, the common case is not good enough when figuring out
> locking for network drivers and the uncommon case forces all network
> drivers under Giant.  I was thinking about doing what you describe,
> but my misconception about ref counters being malloced was holding me

  Firstly, it should be noted that the behavior of calling kmem_malloc()
  when its caches are empty is an old property that has been carried
  over from the original allocator - in other words, it is not something
  that I arbitrarily introduced.

  With that said, we may want to think about changing it or at least
  introducing a third case that would modify the semantics so that Giant
  would never be called.  For instance, in addition to M_TRYWAIT and
  M_DONTWAIT, you could invent M_CACHEONLY that would only attempt a
  quick cache allocation.  However, I'm unsure as to the advantages this
  would bring - if any.  Why is it so much of a problem if your driver
  wants to grab Giant?  Are you afraid of lock order reversals somewhere
  down the line?  Perhaps there is a misconception that we can clear up
  here before any work is done.
 
> back.  We'll also need a kproc that can wake up every now and then to
> expand the pool if allocations at interrupt time failed.   Or do you
> already have a mechanism for that?

  The intended mechanism is the kproc and when the allocator was first
  designed and written, I had taken that into account although I have
  not implemented the kproc yet.  The idea was to have a kproc maintain
  a balance of the caches without interfering with the general
  allocation cases.  The way this would be accomplished is actually
  fairly elegant, although we would have to be careful when doing the
  implementation:

  (i) Have the kproc make sure that if we're under a low watermark, do a
   block allocation from the mbuf and/or cluster maps and populate the
   general global cache (thereby not interfering with simultaneous
   frees to the per-CPU caches).

  (ii) Have the kproc check if we're above a high watermark on the
   general global cache and if so free some pages back to VM without
   interfering with simultaneous allocations on the per-CPU caches.

  (iii) Have the kproc do some bookkeeping and auto-tune its high and
   low watermarks.

  The key words in all three goals above is "not interfering with
  simultaneous allocations/frees."  This is a big advantage and is one
  of the reasons that the mbuf allocator exists.  What it means is that
  you can muck with the VM and indirectly populate the mbuf and cluster
  caches without interfering with the common {mbuf,cluster} free and
  allocation case.  Graphically:

  [ VM ] <--> [ global cache ] <--> [ per-CPU caches ]
  ^  ^
  |  |
  [ only ][ common case allocations and ]
  [ done by  ][  frees  ]
  [ kproc (to]
  [ be written)  ]
  [ and when ]
  [ certain types of ]
  [ allocations fail ]  

  (I hope that came out OK).  As you can see, the concept is very simple
  and with the current infrastructure should be fairly easy to
  implement.  If you're wondering, I was going to do is as a 5.1 feature
  at some point because I've been so swamped with other things right now
  and so I do not foresee being able to do this in time for 5.0.

>  > > The mbuf system calls malloc in other ways too.  The first time you
>  > > use a cluster, m_ext.ext_ref_cnt is malloc()'ed, and malloc is called
>  > > when the mbuf map is expanded...   I assume malloc will eventually
>  > > call kmem_malloc(), leading to the same locking problems.
>  > 
>  >   Actually, that has been changed a while ago.  The ref_cnt for clusters
>  >   is no longer malloc()'d.  A contiguous space is used from which
>  >   cluster ref counters are "allocated" (just like in the old/original
>  >   design).  This modification was made a while ago as an optimisation.
> 
> Cool.  The following code confused me into ma

Re: mbuf header bloat ?

2002-11-25 Thread Sam Leffler
> > I don't see this problem; looutput looks to do the right thing.  FWIW
I've
> > passed mbufs w/ mtags through the loopback interface.
>
> This refers specifically to the following code snippet:
>
> if (m && m->m_next != NULL && m->m_pkthdr.len < MCLBYTES) {
> struct mbuf *n;
>
> MGETHDR(n, M_DONTWAIT, MT_HEADER);
> if (!n)
> goto contiguousfail;
> MCLGET(n, M_DONTWAIT);
> if (! (n->m_flags & M_EXT)) {
> m_freem(n);
> goto contiguousfail;
> }
>
> m_copydata(m, 0, m->m_pkthdr.len, mtod(n, caddr_t));
> n->m_pkthdr = m->m_pkthdr;
> n->m_len = m->m_pkthdr.len;
> n->m_pkthdr.aux = m->m_pkthdr.aux;
> m->m_pkthdr.aux = (struct mbuf *)NULL;
> m_freem(m);
> m = n;
> }
>

Something is wrong with your tree:

ebb% grep aux ../sys/mbuf.h
ebb%

The above code is correct in the repo as is the m_getcl code.

Sam


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-25 Thread Luigi Rizzo
On Mon, Nov 25, 2002 at 01:39:02PM -0500, Andrew Gallatin wrote:
...
>  > widely used (at list now) thus failing the two main important criteria
 

on a side note i notice my english is getting worse and worse, and
i keep making mistakes as the one above. Sorry about that.
Perhaps i should go back to italy :)

cheers
luigi

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-25 Thread Robert Watson

On Mon, 25 Nov 2002, Sam Leffler wrote:

> I don't see this problem; m_getcl appears to do the right thing. 

Hmm.  I see the SLIST initialization there also.  Maybe I'm thinking of
another function, I'll have to go check.  Sorry about that.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-25 Thread Robert Watson

On Mon, 25 Nov 2002, Sam Leffler wrote:

> As I explained to you; the handling of mtags mimics what was there for
> the aux mbufs.  I did this intentionally to avoid changes that might
> introduce subtle problems.  My intent was to cleanup this stuff after
> 5.0 releases by replacing the pkthdr copy macros with separate DUP+MOVE
> macros ala openbsd.  I did this in my original implemention but
> discarded it when I did the -current integration. 

And I agree this is the right direction to take this in once we are out of
the freeze.

> I don't believe it's necessary to overload the basic mtag structure but
> instead introduce a specific cookie that enables a more general
> mechanism that would be suitable for your needs. 

That sounds like a reasonable approach.

> > (3) Not all code generating packets properly initializes m_tag field.  The
> > one example I've come across so far is the use of m_getcl() to grab
> > mbufs with an attached cluster -- it never initializes the SLIST
> > properly, as far as I can tell.  Right now that's used in device
> > drivers, and also in the BPF packet generation code.  If the header is
> > zero'd, this may be OK due to an implicit proper initialization, but
> > this is concerning.  We need to do more work to normalize packet
> > handling.
> 
> I don't see this problem; m_getcl appears to do the right thing.
> 
> > (4) Code still exists that improperly hand-copies mbuf packet header data.
> > if_loop.c contains some particular bogus code that also triggers a
> > panic in the MAC code for the same reason.  If m_tag data ever passes
> > through if_loop and hits the re-alignment case introduced by KAME, the
> > system will panic when the tag data is free'd.  This code all needs to
> > be normalized, and proper semantics must be enforced.
> >
> 
> I don't see this problem; looutput looks to do the right thing.  FWIW I've
> passed mbufs w/ mtags through the loopback interface.

This refers specifically to the following code snippet:

if (m && m->m_next != NULL && m->m_pkthdr.len < MCLBYTES) {
struct mbuf *n;

MGETHDR(n, M_DONTWAIT, MT_HEADER);
if (!n)
goto contiguousfail;
MCLGET(n, M_DONTWAIT);
if (! (n->m_flags & M_EXT)) {
m_freem(n);
goto contiguousfail;
}

m_copydata(m, 0, m->m_pkthdr.len, mtod(n, caddr_t));
n->m_pkthdr = m->m_pkthdr;
n->m_len = m->m_pkthdr.len;
n->m_pkthdr.aux = m->m_pkthdr.aux;
m->m_pkthdr.aux = (struct mbuf *)NULL;
m_freem(m);
m = n;
}

In this scenario, the mbuf header for (n) is initialized to an empty m_tag
chain.  The direct assignment of (n)'s pkthdr data from (m) copies the
pointers from (m).  (m) is then freed, which causes the mbuf allocator to
go through and delete the m_tag chain on (m), freeing the individual
entries in the chain, which are now still referenced by (n).  You only
bump into this case if you trigger the conditional clause above.  In the
MAC code, this case results in a duplicate free() when (n) is later
released -- unless I'm mis-reading things (quite possible), the same
failure mode should exist for the m_tag code.  In my local tree, I have
this case disabled, and I've been trying to figure out what the right
solution is -- probably to move to using M_COPY_PKTHDR() and then doing
the fixup.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-25 Thread Andrew Gallatin

Luigi Rizzo writes:
 > The mbuf bloat has two aspects -- first it does have some cost to
 > initialize and reset all these extra fields (and it is bug prone --
 > witness is the missing cleanup in m_getcl(), because m-tags were
 > introduced after m_getcl() and probably it was forgotten); second,
 > a legitimate question might arise at some point on why some features
 > deserve to go there and others don't, and unfortunately the mac label
 > constitutes a very bad precedent because it is very large and not very
 > widely used (at list now) thus failing the two main important criteria
 > for selection what should be in and what should not.

Well said.

Drew

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-25 Thread Robert Watson

On Mon, 25 Nov 2002, Bosko Milekic wrote:

> On Mon, Nov 25, 2002 at 11:31:39AM -0500, Robert Watson wrote:
> > BTW, do you have any recent large-scale measurements of packet size
> > distribution?  In local tests and measurements, the additional 20 bytes on
> > i386 didn't bump the remaining mbuf data space sufficiently low to
> > substantially change the behavior of the stack.  However, I haven't done
> > measurements against the 64-bit variation.  In practice, a number of
> > network interfaces now seem to use clustered mbufs and not attempt to use
> > the in-mbuf storage space...  All my packet distribution measurements come
> > from a typical ISP environment, but may not match what is seen in
> > large-scale backbone environments.
> 
>   I am equally curious about this.  One of the design assumptions for
>   mbufs and clusters, according to McKusick et al. (and I believe
>   another text which currently escapes me) is that packets are typically
>   either very small or fairly large.  Given the MAC label additions
>   (yes it would be nice if this was done using the m_tag interface but
>   at the very least one can say that they are implemented fairly
>   'consistently' despite the fact that they appear imposing to the
>   general mbuf structure), and the currently available data region in
>   the mbuf, it is absolutely necessary to know whether the assumption of
>   packet size distribution still holds before a decision is made on how
>   to modify the MAC label implementation - if at all.

It's worth pointing out for those listening, and as I'm sure you're
already aware, m_tag was not available for use by the MAC Framework when
we did any of the design and implementation, and m_tags were committed to
the tree about three months after the MAC work.  I'm happy to look at
switching the mechanism used for MAC to m_tag, especially once m_tag is
mature enough to be used, but it wasn't a design consideration in our
first pass simply because it didn't exist :-).

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-25 Thread Sam Leffler
> (1) When packet headers are copied using m_copy_pkthdr(), different
> consumers have different expectations for what the resulting semantics
> are for m_tag data -- some want it duplicated, others want it moved.
> In practice, it is only ever moved, so consumers that expect
> duplication are in for a surprise.  We need to re-implement the packet
> header copying code so that it can generate a failure (because it
> involves allocation), and separate the duplicate and move abstractions
> to get clean semantics.  I exchanged some e-mail with Sam Leffler on
> the topic, and apparently OpenBSD has already made these changes, or
> similar ones, and we should do the same for 5.1.
>

As I explained to you; the handling of mtags mimics what was there for the
aux mbufs.  I did this intentionally to avoid changes that might introduce
subtle problems.  My intent was  to cleanup this stuff after 5.0 releases by
replacing the pkthdr copy macros with separate DUP+MOVE macros ala openbsd.
I did this in my original implemention but discarded it when I did
the -current integration.

> (2) m_tag's don't have a notion that the data carried in a tag is
> multi-dimmensional and may require special destructor behavior.  While
> it does centralize copying and free'ing of data, it handles this
> purely with bcopy, malloc, and free, which is not appropriate for use
> with MAC labels, since they may contain a variety of per-policy data
> that may require special handling (reference count management, etc).
> I tried putting tag-specific release/free/... code in the m_tag
> central routines, and it looks like that would work, although it
> eventually would lead to a lot of junk in the m_tag code.  We might
> want to consider m_tag entry free/copy pointers of some sort, but I'm
> not sure if we want to go there.  Adding the MAC stuff to the
> m_tag_{free,copy,...} calls won't break the ABI, whereas adding free
> and copy pointers to the tags themselves would.
>

I don't believe it's necessary to overload the basic mtag structure but
instead introduce a specific cookie that enables a more general mechanism
that would be suitable for your needs.

> (3) Not all code generating packets properly initializes m_tag field.  The
> one example I've come across so far is the use of m_getcl() to grab
> mbufs with an attached cluster -- it never initializes the SLIST
> properly, as far as I can tell.  Right now that's used in device
> drivers, and also in the BPF packet generation code.  If the header is
> zero'd, this may be OK due to an implicit proper initialization, but
> this is concerning.  We need to do more work to normalize packet
> handling.
>

I don't see this problem; m_getcl appears to do the right thing.

> (4) Code still exists that improperly hand-copies mbuf packet header data.
> if_loop.c contains some particular bogus code that also triggers a
> panic in the MAC code for the same reason.  If m_tag data ever passes
> through if_loop and hits the re-alignment case introduced by KAME, the
> system will panic when the tag data is free'd.  This code all needs to
> be normalized, and proper semantics must be enforced.
>

I don't see this problem; looutput looks to do the right thing.  FWIW I've
passed mbufs w/ mtags through the loopback interface.

<...other stuff omitted...>

Please contact me directly about the problems so we can resolve them
immediately.

Sam


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-25 Thread Andrew Gallatin

Bosko Milekic writes:
 > 
 > On Sun, Nov 24, 2002 at 04:23:33PM -0500, Andrew Gallatin wrote:
 > > If we're going to nitpick the mbuf system, a much, much worse problem
 > > is that you cannot allocate an mbuf chain w/o holding Giant, which
 > > stems from the mbuf system eventually calling kmem_malloc().  This
 > > effectively prevents any network driver from being giant-free.  When
 > > mbufs are low, mb_alloc() calls mb_pop_cont().  This, in turn, calls
 > > kmem_malloc() which requires Giant...
 > 
 >   This is not entirely true.  You can allocate an mbuf chain without
 >   holding Giant if the caches are well populated - and they should be
 >   in the common/general case.  You can in fact modify the allocator to
 >   just not do a kmem_malloc() if called with M_DONTWAIT, but I don't
 >   think you'd want to do this at this point.

Unfortunately, the common case is not good enough when figuring out
locking for network drivers and the uncommon case forces all network
drivers under Giant.  I was thinking about doing what you describe,
but my misconception about ref counters being malloced was holding me
back.  We'll also need a kproc that can wake up every now and then to
expand the pool if allocations at interrupt time failed.   Or do you
already have a mechanism for that?

 > > The mbuf system calls malloc in other ways too.  The first time you
 > > use a cluster, m_ext.ext_ref_cnt is malloc()'ed, and malloc is called
 > > when the mbuf map is expanded...   I assume malloc will eventually
 > > call kmem_malloc(), leading to the same locking problems.
 > 
 >   Actually, that has been changed a while ago.  The ref_cnt for clusters
 >   is no longer malloc()'d.  A contiguous space is used from which
 >   cluster ref counters are "allocated" (just like in the old/original
 >   design).  This modification was made a while ago as an optimisation.

Cool.  The following code confused me into making the above statement.  Is it
out of date? 

#define _mext_init_ref(m, ref) do { \
(m)->m_ext.ref_cnt = ((ref) == NULL) ?  \
malloc(sizeof(u_int), M_MBUF, M_NOWAIT) : (u_int *)(ref);   \
if ((m)->m_ext.ref_cnt != NULL) {   \
*((m)->m_ext.ref_cnt) = 0;  \
MEXT_ADD_REF((m));  \
}   \
} while (0)



Thanks,

Drew

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-25 Thread Bosko Milekic

On Mon, Nov 25, 2002 at 11:31:39AM -0500, Robert Watson wrote:
> BTW, do you have any recent large-scale measurements of packet size
> distribution?  In local tests and measurements, the additional 20 bytes on
> i386 didn't bump the remaining mbuf data space sufficiently low to
> substantially change the behavior of the stack.  However, I haven't done
> measurements against the 64-bit variation.  In practice, a number of
> network interfaces now seem to use clustered mbufs and not attempt to use
> the in-mbuf storage space...  All my packet distribution measurements come
> from a typical ISP environment, but may not match what is seen in
> large-scale backbone environments.

  I am equally curious about this.  One of the design assumptions for
  mbufs and clusters, according to McKusick et al. (and I believe
  another text which currently escapes me) is that packets are typically
  either very small or fairly large.  Given the MAC label additions
  (yes it would be nice if this was done using the m_tag interface but
  at the very least one can say that they are implemented fairly
  'consistently' despite the fact that they appear imposing to the
  general mbuf structure), and the currently available data region in
  the mbuf, it is absolutely necessary to know whether the assumption of
  packet size distribution still holds before a decision is made on how
  to modify the MAC label implementation - if at all.

> Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
> [EMAIL PROTECTED]  Network Associates Laboratories

-- 
Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-25 Thread Bosko Milekic

On Sun, Nov 24, 2002 at 01:33:29PM -0800, Julian Elischer wrote:
> 
> 
> On Sun, 24 Nov 2002, Andrew Gallatin wrote:
> 
> > 
> > If we're going to nitpick the mbuf system, a much, much worse problem
> > is that you cannot allocate an mbuf chain w/o holding Giant, which
> > stems from the mbuf system eventually calling kmem_malloc().  This
> > effectively prevents any network driver from being giant-free.  When
> > mbufs are low, mb_alloc() calls mb_pop_cont().  This, in turn, calls
> > kmem_malloc() which requires Giant...
> > 
> > The mbuf system calls malloc in other ways too.  The first time you
> > use a cluster, m_ext.ext_ref_cnt is malloc()'ed, and malloc is called
> > when the mbuf map is expanded...   I assume malloc will eventually
> > call kmem_malloc(), leading to the same locking problems.
> > 
> > I know that both tru64 and aix just malloc their mbufs.
> 
> I think we tied that and went back to a separate allocator, but I have
> no idea why..  maybe someone else can enlighten me..

  As I mentionned in a previous Email, it all has to do with what were
  considered to be the requirements of network buffer allocations and
  optimisations in that respect.

-- 
Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-25 Thread Bosko Milekic

On Sun, Nov 24, 2002 at 04:23:33PM -0500, Andrew Gallatin wrote:
> If we're going to nitpick the mbuf system, a much, much worse problem
> is that you cannot allocate an mbuf chain w/o holding Giant, which
> stems from the mbuf system eventually calling kmem_malloc().  This
> effectively prevents any network driver from being giant-free.  When
> mbufs are low, mb_alloc() calls mb_pop_cont().  This, in turn, calls
> kmem_malloc() which requires Giant...

  This is not entirely true.  You can allocate an mbuf chain without
  holding Giant if the caches are well populated - and they should be
  in the common/general case.  You can in fact modify the allocator to
  just not do a kmem_malloc() if called with M_DONTWAIT, but I don't
  think you'd want to do this at this point.

> The mbuf system calls malloc in other ways too.  The first time you
> use a cluster, m_ext.ext_ref_cnt is malloc()'ed, and malloc is called
> when the mbuf map is expanded...   I assume malloc will eventually
> call kmem_malloc(), leading to the same locking problems.

  Actually, that has been changed a while ago.  The ref_cnt for clusters
  is no longer malloc()'d.  A contiguous space is used from which
  cluster ref counters are "allocated" (just like in the old/original
  design).  This modification was made a while ago as an optimisation.

> I know that both tru64 and aix just malloc their mbufs.

  The main reason for which mbufs and clusters are allocated via a
  different allocator boils down to significant optimisations.  Some of
  these include grouped mbuf + cluster allocations with minimized
  lock-dropping/re-acquiring for grouped allocations (essentially
  "atomic allocations"), reduced lock contention, not being forced to
  call VM routines in the common free case (i.e., maybe optionally
  implement lazy freeing), reducing the number of function calls
  required to make a single buffer allocation, etc, etc, etc.

  The gist of the argument boils down to the fact that network buffer
  allocations have different requirements than general all-purpose
  allocations (by design, the last time I checked), and that is why
  an mbuf/cluster allocator exists.

  With that said, the mbuf/cluster allocator shares some important
  characteristics with the UMA implementation of malloc (if I remember
  UMA the last time I checked) and those have to do with, primarily,
  SMP-friendliness.

> Drew

-- 
Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-25 Thread Poul-Henning Kamp
In message <[EMAIL PROTECTED]>, Robe
rt Watson writes:
>
>On Sat, 23 Nov 2002, Andrew Gallatin wrote:
>
>> I propose that we make struct label portion of the pkthdr compile-time
>> conditional on MAC.  The assumption is that you will move the MAC label
>> to an m_tag sometime after 5.0-RELEASE. 

I object to this.  I spent time getting rid of variant sized structs
in -current, and we should not reintroduce them.

If you wonder why this is important, think:
kld-load
and 
forgetting the necessary opt_bla.h #includes.

Poul-Henning


-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-25 Thread Luigi Rizzo
On Mon, Nov 25, 2002 at 11:31:39AM -0500, Robert Watson wrote:
...
> This weekend I spent about six hours looking at what it would take to move
> MAC label data into m_tags.  While in theory it is a workable idea, it
> turns out our m_tag implementation is fairly far from being ready to
> handle something like this.  I ran into the following immediate problems:



yes, what you mention are certainly critical issues that need to
be dealt with. The need for special constructor/destructors might be
a bit troublesome to get right -- i.e. you'd want the extra info not
to take too much space in the m_tag_header, which in turn might
be solved by replacing the tag type/subtype with a pointer to an m_tag
descriptor which in turn contains all the relevant attributes for
the object. At which point we are recreating C++'s vtables ?
But this is not necessarily bad from the point of view of
efficiency, because i guess right now you have specific sections
of code to take care of allocation/deallocations which could be
moved to the mbuf handling routines ?

> BTW, do you have any recent large-scale measurements of packet size
> distribution?  In local tests and measurements, the additional 20 bytes on
> i386 didn't bump the remaining mbuf data space sufficiently low to
> substantially change the behavior of the stack.  However, I haven't done

individual mbufs are typically used in the output path for locally
originated data, where the code still tries to optimize for size.
E.g. if you writes to TCP sockets in small chunks, you might end
up getting chains of individual mbufs instead of merging them into
larger clusters (this is how, for example, Prafulla found out that
ip_output() ended up being called with a chain of 64 or so mbufs
for a single packet on an interface using jumbo frames!)

The mbuf bloat has two aspects -- first it does have some cost to
initialize and reset all these extra fields (and it is bug prone --
witness is the missing cleanup in m_getcl(), because m-tags were
introduced after m_getcl() and probably it was forgotten); second,
a legitimate question might arise at some point on why some features
deserve to go there and others don't, and unfortunately the mac label
constitutes a very bad precedent because it is very large and not very
widely used (at list now) thus failing the two main important criteria
for selection what should be in and what should not.

cheers
luigi
--+-
 Luigi RIZZO, [EMAIL PROTECTED]  . ICSI (on leave from Univ. di Pisa)
 http://www.iet.unipi.it/~luigi/  . 1947 Center St, Berkeley CA 94704
 Phone: (510) 666 2988
--+-


cheers
luigi

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-25 Thread Robert Watson

On Sat, 23 Nov 2002, Andrew Gallatin wrote:

> I propose that we make struct label portion of the pkthdr compile-time
> conditional on MAC.  The assumption is that you will move the MAC label
> to an m_tag sometime after 5.0-RELEASE. 

This weekend I spent about six hours looking at what it would take to move
MAC label data into m_tags.  While in theory it is a workable idea, it
turns out our m_tag implementation is fairly far from being ready to
handle something like this.  I ran into the following immediate problems:

(1) When packet headers are copied using m_copy_pkthdr(), different
consumers have different expectations for what the resulting semantics
are for m_tag data -- some want it duplicated, others want it moved.
In practice, it is only ever moved, so consumers that expect
duplication are in for a surprise.  We need to re-implement the packet
header copying code so that it can generate a failure (because it
involves allocation), and separate the duplicate and move abstractions
to get clean semantics.  I exchanged some e-mail with Sam Leffler on
the topic, and apparently OpenBSD has already made these changes, or
similar ones, and we should do the same for 5.1.

(2) m_tag's don't have a notion that the data carried in a tag is
multi-dimmensional and may require special destructor behavior.  While
it does centralize copying and free'ing of data, it handles this
purely with bcopy, malloc, and free, which is not appropriate for use
with MAC labels, since they may contain a variety of per-policy data
that may require special handling (reference count management, etc).
I tried putting tag-specific release/free/... code in the m_tag
central routines, and it looks like that would work, although it
eventually would lead to a lot of junk in the m_tag code.  We might
want to consider m_tag entry free/copy pointers of some sort, but I'm
not sure if we want to go there.  Adding the MAC stuff to the
m_tag_{free,copy,...} calls won't break the ABI, whereas adding free
and copy pointers to the tags themselves would.

(3) Not all code generating packets properly initializes m_tag field.  The
one example I've come across so far is the use of m_getcl() to grab
mbufs with an attached cluster -- it never initializes the SLIST
properly, as far as I can tell.  Right now that's used in device
drivers, and also in the BPF packet generation code.  If the header is
zero'd, this may be OK due to an implicit proper initialization, but
this is concerning.  We need to do more work to normalize packet
handling.

(4) Code still exists that improperly hand-copies mbuf packet header data.
if_loop.c contains some particular bogus code that also triggers a
panic in the MAC code for the same reason.  If m_tag data ever passes
through if_loop and hits the re-alignment case introduced by KAME, the
system will panic when the tag data is free'd.  This code all needs to
be normalized, and proper semantics must be enforced. 

> This will immediately reduce the size of mbufs for the vast majority of
> users, and will prevent a 4.1.1 like flag-day for 3rd party network
> driver vendors.  The only downside is that the few MAC users will not be
> able to use 3rd party binary network drivers until the MAC label is put
> into an m_tag.  This seems fair, as the only people inconvienced are the
> people who want the labels and they are motivated to move them to an
> m_tag.  But that's easy for me to say, since I don't run MAC, and I may
> be missing something big. 

In the past I have looked at adding conditionally-defined components to
struct mbuf and other key kernel data structures.  While the condition of
the tree is improving from this perspective due to better isolation of
user and kernel data structures, the result is still incredibly messy,
especially if you key the conditionally defined sections on a kernel
option.  mbuf.h is included in a number of userland applications -- some
expected, such as the ipfilter test framework, but others less expected --
such as BIND.  I'm very wary of the notion of adding conditionally defined
portions of struct mbuf on this (and other) bases.  I'll take a look at
whether many of the obvious foot-shooting scenarios still exist since I
last tried it.  Moving to m_tag looks like a reasonable long-term
strategy, but until the m_tag code is substantially more mature, it isn't
realistic.  Otherwise, I might have attempted to push through a change to
it now before RC1.

BTW, do you have any recent large-scale measurements of packet size
distribution?  In local tests and measurements, the additional 20 bytes on
i386 didn't bump the remaining mbuf data space sufficiently low to
substantially change the behavior of the stack.  However, I haven't done
measurements against the 64-bit variation.  In practice, a number of
network interfaces now seem to use clustered mbufs and not attempt

Re: mbuf header bloat ?

2002-11-24 Thread Julian Elischer


On Sun, 24 Nov 2002, Andrew Gallatin wrote:

> 
> If we're going to nitpick the mbuf system, a much, much worse problem
> is that you cannot allocate an mbuf chain w/o holding Giant, which
> stems from the mbuf system eventually calling kmem_malloc().  This
> effectively prevents any network driver from being giant-free.  When
> mbufs are low, mb_alloc() calls mb_pop_cont().  This, in turn, calls
> kmem_malloc() which requires Giant...
> 
> The mbuf system calls malloc in other ways too.  The first time you
> use a cluster, m_ext.ext_ref_cnt is malloc()'ed, and malloc is called
> when the mbuf map is expanded...   I assume malloc will eventually
> call kmem_malloc(), leading to the same locking problems.
> 
> I know that both tru64 and aix just malloc their mbufs.

I think we tied that and went back to a separate allocator, but I have
no idea why..  maybe someone else can enlighten me..


> 
> Drew
> 
> 
> 


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-24 Thread Andrew Gallatin

Julian Elischer writes:
 > 
 > 
 > On Sat, 23 Nov 2002, Andrew Gallatin wrote:
 > 
 > > 
 > > As you eloquently state, there are a number of tradeoffs involved.  On
 > > a 64-bit platform, 99% of users are paying 40 bytes/pkt for something
 > > that they will never use.  On x86, 99.99% of users are paying 20
 > > bytes/pkt for a feature they will never use.  At least a signifigant
 > > fraction of nics make use of csum offloading (xl, ti, bge, em, myri).
 > 
 > 
 > the downside to the TAG stuff is that you need to allocate a separate
 > tag storage, and that is a malloc.. which has certain characteristics
 > vs the mbuf allocator.  We have a special allocator for mbufs for a
 > reason. (I'm not sure how many of the original reasons still apply).
 > so it's worth looking at whether malloc is a suitable method of
 > allocating all that stuff before we take it out of the mbuf.
 > 

If we're going to nitpick the mbuf system, a much, much worse problem
is that you cannot allocate an mbuf chain w/o holding Giant, which
stems from the mbuf system eventually calling kmem_malloc().  This
effectively prevents any network driver from being giant-free.  When
mbufs are low, mb_alloc() calls mb_pop_cont().  This, in turn, calls
kmem_malloc() which requires Giant...

The mbuf system calls malloc in other ways too.  The first time you
use a cluster, m_ext.ext_ref_cnt is malloc()'ed, and malloc is called
when the mbuf map is expanded...   I assume malloc will eventually
call kmem_malloc(), leading to the same locking problems.

I know that both tru64 and aix just malloc their mbufs.

Drew



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-24 Thread Julian Elischer


On Sat, 23 Nov 2002, Andrew Gallatin wrote:

> 
> As you eloquently state, there are a number of tradeoffs involved.  On
> a 64-bit platform, 99% of users are paying 40 bytes/pkt for something
> that they will never use.  On x86, 99.99% of users are paying 20
> bytes/pkt for a feature they will never use.  At least a signifigant
> fraction of nics make use of csum offloading (xl, ti, bge, em, myri).


the downside to the TAG stuff is that you need to allocate a separate
tag storage, and that is a malloc.. which has certain characteristics
vs the mbuf allocator.  We have a special allocator for mbufs for a
reason. (I'm not sure how many of the original reasons still apply).
so it's worth looking at whether malloc is a suitable method of
allocating all that stuff before we take it out of the mbuf.



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-23 Thread Andrew Gallatin

Robert Watson writes:
 > 
 > On Thu, 21 Nov 2002, Luigi Rizzo wrote:

<...>

 > > The label is 5 ints, the pkthdr a total of 11 ints (and m_hdr takes
 > > another 6, for a total of 136 bytes of header info on 64-bit
 > > architectures). 
 > > 
 > > Of the pkthdr, only 3 fields (rcvif, len, tags) are of really general
 > > use, the rest being used only in certain cases and for very specific
 > > purposes (e.g. reassembly of fragments, or hw capabilities, or MAC). 
 > > 
 > > Now that Sam has done the excellent work of integrating packet tags to
 > > carry annotations around, i really believe that we should try to move
 > > out of the pkthdr all non-general fields, and move them to m_tags so we
 > > only pay the cost when needed and not in all cases.  Also this pays a
 > > lot in terms of ABI compatibility and extensibility.  I understand that
 > > for 5.0 it is a bit late to act, but i do hope that we can reconsider
 > > this issue for 5.1 and pull out of the pkthdr at least the MAC label,
 > > and possibly also the csum_* fields, much in the same way it has been
 > > done for VLAN labels. 

 > 
 > In the original MAC label design for mbufs, and up until very recently,
 > m_tag wasn't available, so that design didn't use it.  We traded off
 > various things, including measured packet lengths, and decided the 20
 > bytes was an acceptable cost given the available extension services.  I'm
 > certainly willing to re-consider that notion now that we have general
 > extensibility, and now that we have a good, SMP-safe slab allocator in
 > 5.0.  However, one thing to keep in mind is that in a MAC environment,
 > every packet header mbuf does have a valid label, and as a result,
 > inserting additional memory allocations for every packet handled can have
 > substantial cost.  I've had a number of requests to make "options MAC" a
 > default-shipped option: it's not ready yet (as an experimental feature),
 > but it may well be by 6.0 we are ready for that, assuming the performance
 > numbers are right.  In that situation, it could well be that it does make
 > sense to maintain the label data in the mbuf pkthdr, since it really will
 > be used for all pkthdr's.
 > 
 > There's a hard tradeoff here, and it applies to all of the data in the
 > packet header.  On the one hand, we want to keep the mbuf packet header
 > data small: any data there cuts into the space available for fast packet
 > storage without a cluster.  We also want to keep it protocol-independent,
 > since we use mbufs for all protocols, as well (in a number of situations) 
 > as a general purpose memory allocator for the network stack.  On the other
 > hand, we also want to use the highest performance configuration for the
 > common case, and the reality is that our current common case is IPv4
 > networking.  I'm not a big fan of performance hacks, but if we're reaching
 > a time when >50% of network cards provide support for IP checksum handling
 > on the card, paying a few bytes cost per mbuf header may be a definite
 > win.  As you suggest, this is probably a question we need to revisit once
 > 5.0 is out the door, because we really can't make changes like this right
 > now.



On the contrary, I think that if anything is going to be done, it
must be done now, so as to not break binary network driver
compatability like we did in 4.1.1 when the size of mbufs changed.
Otherwise, we're stuck with it until 6.0.

As you eloquently state, there are a number of tradeoffs involved.  On
a 64-bit platform, 99% of users are paying 40 bytes/pkt for something
that they will never use.  On x86, 99.99% of users are paying 20
bytes/pkt for a feature they will never use.  At least a signifigant
fraction of nics make use of csum offloading (xl, ti, bge, em, myri).

I propose that we make struct label portion of the pkthdr compile-time
conditional on MAC.  The assumption is that you will move the MAC
label to an m_tag sometime after 5.0-RELEASE.

This will immediately reduce the size of mbufs for the vast majority
of users, and will prevent a 4.1.1 like flag-day for 3rd party network
driver vendors.  The only downside is that the few MAC users will not
be able to use 3rd party binary network drivers until the MAC label is
put into an m_tag.  This seems fair, as the only people inconvienced
are the people who want the labels and they are motivated to move them
to an m_tag.  But that's easy for me to say, since I don't run MAC,
and I may be missing something big.

Drew





To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-22 Thread John Baldwin

On 22-Nov-2002 Robert Watson wrote:
> 
> On Thu, 21 Nov 2002, Luigi Rizzo wrote:
> 
>> [Bcc to -net because it is relevant there. This email has been triggered
>> by a private discussion i was having with other committers (who will
>> easily recognise themselves :) which suggested the possibility of adding
>> more fields to mbuf headers]
>> 
>> Just recently came up to my attention that we have the following code in
>> 
>> 
>> #define MAC_MAX_POLICIES4
>> struct label {
>> int l_flags;
>> union {
>> void*l_ptr;
>> long l_long;
>> }   l_perpolicy[MAC_MAX_POLICIES];
>> };
>> 
>> (what are l_perpolicy[], ints ? Could this be written a bit better ?)
>> and then in 
> 
> MAC labels provide general purpose security label storage across a variety
> of kernel objects.  Each MAC label is made up of a number of "slots" which
> are allocated to statically linked or dynamically loaded policies.  The
> union is used to allow policies to use their slot for either the purposes
> of an integer store, or as a pointer with the semantics they define.  Some
> policies simply use the long to represent their policy information,
> perhaps by itself (a partition number), or to key into an existing array. 
> Other policies use the pointer to point to shared reference-counted,
> static, or per-label data.  Probably the "long" should be changed to some
> other integer type that better matches the notion of pointer length.

intptr_t.

-- 

John Baldwin <[EMAIL PROTECTED]>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-22 Thread Robert Watson

On Thu, 21 Nov 2002, Luigi Rizzo wrote:

> [Bcc to -net because it is relevant there. This email has been triggered
> by a private discussion i was having with other committers (who will
> easily recognise themselves :) which suggested the possibility of adding
> more fields to mbuf headers]
> 
> Just recently came up to my attention that we have the following code in
> 
> 
> #define MAC_MAX_POLICIES4
> struct label {
> int l_flags;
> union {
> void*l_ptr;
> long l_long;
> }   l_perpolicy[MAC_MAX_POLICIES];
> };
> 
> (what are l_perpolicy[], ints ? Could this be written a bit better ?)
> and then in 

MAC labels provide general purpose security label storage across a variety
of kernel objects.  Each MAC label is made up of a number of "slots" which
are allocated to statically linked or dynamically loaded policies.  The
union is used to allow policies to use their slot for either the purposes
of an integer store, or as a pointer with the semantics they define.  Some
policies simply use the long to represent their policy information,
perhaps by itself (a partition number), or to key into an existing array. 
Other policies use the pointer to point to shared reference-counted,
static, or per-label data.  Probably the "long" should be changed to some
other integer type that better matches the notion of pointer length.

> struct pkthdr {
> struct  ifnet *rcvif;   /* rcv interface */
> int len;/* total packet length */
> /* variables for ip and tcp reassembly */
> void*header;/* pointer to packet header */
> /* variables for hardware checksum */
> int csum_flags; /* flags regarding checksum */
> int csum_data;  /* data field used by csum routines */
> SLIST_HEAD(packet_tags, m_tag) tags; /* list of packet tags */
> struct  label label;/* MAC label of data in packet */  
> };
>  
> The label is 5 ints, the pkthdr a total of 11 ints (and m_hdr takes
> another 6, for a total of 136 bytes of header info on 64-bit
> architectures). 
> 
> Of the pkthdr, only 3 fields (rcvif, len, tags) are of really general
> use, the rest being used only in certain cases and for very specific
> purposes (e.g. reassembly of fragments, or hw capabilities, or MAC). 
> 
> Now that Sam has done the excellent work of integrating packet tags to
> carry annotations around, i really believe that we should try to move
> out of the pkthdr all non-general fields, and move them to m_tags so we
> only pay the cost when needed and not in all cases.  Also this pays a
> lot in terms of ABI compatibility and extensibility.  I understand that
> for 5.0 it is a bit late to act, but i do hope that we can reconsider
> this issue for 5.1 and pull out of the pkthdr at least the MAC label,
> and possibly also the csum_* fields, much in the same way it has been
> done for VLAN labels. 

In the original MAC label design for mbufs, and up until very recently,
m_tag wasn't available, so that design didn't use it.  We traded off
various things, including measured packet lengths, and decided the 20
bytes was an acceptable cost given the available extension services.  I'm
certainly willing to re-consider that notion now that we have general
extensibility, and now that we have a good, SMP-safe slab allocator in
5.0.  However, one thing to keep in mind is that in a MAC environment,
every packet header mbuf does have a valid label, and as a result,
inserting additional memory allocations for every packet handled can have
substantial cost.  I've had a number of requests to make "options MAC" a
default-shipped option: it's not ready yet (as an experimental feature),
but it may well be by 6.0 we are ready for that, assuming the performance
numbers are right.  In that situation, it could well be that it does make
sense to maintain the label data in the mbuf pkthdr, since it really will
be used for all pkthdr's.

There's a hard tradeoff here, and it applies to all of the data in the
packet header.  On the one hand, we want to keep the mbuf packet header
data small: any data there cuts into the space available for fast packet
storage without a cluster.  We also want to keep it protocol-independent,
since we use mbufs for all protocols, as well (in a number of situations) 
as a general purpose memory allocator for the network stack.  On the other
hand, we also want to use the highest performance configuration for the
common case, and the reality is that our current common case is IPv4
networking.  I'm not a big fan of performance hacks, but if we're reaching
a time when >50% of network cards provide support for IP checksum handling
on the card, paying a few bytes cost per mbuf header may be a definite
win.  As you suggest, this is probably a question we need to revisit once
5.0 is out the door, because we really

Re: mbuf header bloat ?

2002-11-21 Thread Archie Cobbs
Luigi Rizzo wrote:
> Now that Sam has done the excellent work of integrating packet tags
> to carry annotations around, i really believe that we should try
> to move out of the pkthdr all non-general fields, and move them to
> m_tags so we only pay the cost when needed and not in all cases.
> Also this pays a lot in terms of ABI compatibility and extensibility.
> I understand that for 5.0 it is a bit late to act, but i do hope
> that we can reconsider this issue for 5.1 and pull out of the pkthdr
> at least the MAC label, and possibly also the csum_* fields, much
> in the same way it has been done for VLAN labels.

You got my vote :-)

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: mbuf header bloat ?

2002-11-21 Thread Andre Oppermann
Luigi Rizzo wrote:
> 
> [Bcc to -net because it is relevant there. This email has been
> triggered by a private discussion i was having with other committers
> (who will easily recognise themselves :) which suggested the
> possibility of adding more fields to mbuf headers]
> 
> Just recently came up to my attention that we have the following
> code in 
> 
> #define MAC_MAX_POLICIES4
> struct label {
> int l_flags;
> union {
> void*l_ptr;
> long l_long;
> }   l_perpolicy[MAC_MAX_POLICIES];
> };
> 
> (what are l_perpolicy[], ints ? Could this be written a bit better ?)
> and then in 
> 
> struct pkthdr {
> struct  ifnet *rcvif;   /* rcv interface */
> int len;/* total packet length */
> /* variables for ip and tcp reassembly */
> void*header;/* pointer to packet header */
> /* variables for hardware checksum */
> int csum_flags; /* flags regarding checksum */
> int csum_data;  /* data field used by csum routines */
> SLIST_HEAD(packet_tags, m_tag) tags; /* list of packet tags */
> struct  label label;/* MAC label of data in packet */
> };
> 
> The label is 5 ints, the pkthdr a total of 11 ints (and m_hdr takes
> another 6, for a total of 136 bytes of header info on 64-bit architectures).
> 
> Of the pkthdr, only 3 fields (rcvif, len, tags) are of really general
> use, the rest being used only in certain cases and for very specific
> purposes (e.g. reassembly of fragments, or hw capabilities, or MAC).
> 
> Now that Sam has done the excellent work of integrating packet tags
> to carry annotations around, i really believe that we should try
> to move out of the pkthdr all non-general fields, and move them to
> m_tags so we only pay the cost when needed and not in all cases.
> Also this pays a lot in terms of ABI compatibility and extensibility.
> I understand that for 5.0 it is a bit late to act, but i do hope
> that we can reconsider this issue for 5.1 and pull out of the pkthdr
> at least the MAC label, and possibly also the csum_* fields, much
> in the same way it has been done for VLAN labels.

Sounds good to me.

-- 
Andre

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message