Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-30 Thread Christoph Hellwig

On Thu, Jun 28, 2007 at 10:24:24PM -0700, Andrew Morton wrote:
> > Really, it would be great if we could treat kmalloc() objects
> > just like real pages.
> 
> From a high level, that seems like a bad idea.  kmalloc() gives you a
> virtual address and you really shouldn't be poking around at that memory's
> underlying page's pageframe metadata.
> 
> However we can of course do tasteless and weird things if the benefit is
> sufficient

Hey, when we had exactly that issue coming up with xfs/ext3 recovery over
iscsi/aoe you said it's fine :)  End result is that XFS got fixed and ext3
is still broken..

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-30 Thread Russell King

On Sat, Jun 30, 2007 at 12:11:38AM +0100, Alan Cox wrote:
> > DMA to or from memory should be done via the DMA mapping API.  If we're
> > DMAing to/from a limited range within a page, either we should be using
> > dma_map_single(), or dma_map_page() with an appropriate offset and size.
> 
> If those ranges overlap a cache line then the dma mapping API will not
> save your backside.

There's nothing much that the DMA API can do though.  Consider DMA
to a result buffer which is, eg, only 16 bytes in size.  So you get
passed a size of '16' to the DMA API.  What should you do at this
point?  BUG()?

What if you have 64 or 128 byte cache lines?

> > sizes, but they do happen.  We handle this on ARM by writing back
> > the overlapped lines and invalidating the rest before the DMA operation
> > commences, and hope that the overlapped lines aren't touched for the
> > duration of the DMA.)
> 
> The combination of "hope" and "DMA" isn't a good one for stable system
> design. In this situation we should be waving large red flags

I agree.

However, I don't think this is an issue for the DMA API to handle; it's
something that driver authors need to be aware of.  If they wish to do
a DMA to a kmalloc'd buffer or even a page, we could require that
offsets and sizes are cache line aligned.

However, remember that turning on slab debugging turns off cache line
alignment, so imposing such a requirement implies that the slab
debugging will break DMA, or driver authors also have to be aware of
that and do their own alignment internally, *or* we provide an allocator
which does unconditionally align.
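
To make the last option concrete, here is a minimal userspace sketch of
an allocator that always hands out whole cache lines.  It is not kernel
code: DMA_LINE_SIZE, dma_round_up(), dma_buf_alloc() and dma_buf_free()
are hypothetical names, the 64 byte line size is an assumption, and C11
aligned_alloc() stands in for whatever the kernel would really use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed L1 line size; real hardware may use 32, 64 or 128 bytes. */
#define DMA_LINE_SIZE 64u

/* Round len up to a whole number of cache lines. */
static size_t dma_round_up(size_t len)
{
    return (len + DMA_LINE_SIZE - 1) & ~(size_t)(DMA_LINE_SIZE - 1);
}

/*
 * Hypothetical helper: return a buffer that starts on a cache line
 * boundary and owns every line it touches, regardless of how the
 * general-purpose allocator has been configured.
 */
static void *dma_buf_alloc(size_t len)
{
    assert(len > 0);
    /* C11 aligned_alloc wants the size to be a multiple of the alignment. */
    return aligned_alloc(DMA_LINE_SIZE, dma_round_up(len));
}

static void dma_buf_free(void *buf)
{
    free(buf);
}
```

With this, a 16 byte result buffer occupies a full 64 byte line of its
own, so no unrelated data can share a line with it.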

-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-29 Thread Alan Cox

> DMA to or from memory should be done via the DMA mapping API.  If we're
> DMAing to/from a limited range within a page, either we should be using
> dma_map_single(), or dma_map_page() with an appropriate offset and size.

If those ranges overlap a cache line then the dma mapping API will not
save your backside.

On a system with a 32 byte cache granularity what happens if you get two
dma mapping calls for x and x+16?  Right now the thing that avoids this
occurring is that the allocators don't pack stuff in that hard so x+16
always belongs to the same driver and we can hope driver authors are
sensible 

> sizes, but they do happen.  We handle this on ARM by writing back
> the overlapped lines and invalidating the rest before the DMA operation
> commences, and hope that the overlapped lines aren't touched for the
> duration of the DMA.)

The combination of "hope" and "DMA" isn't a good one for stable system
design. In this situation we should be waving large red flags

Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-29 Thread Alan Cox

On Fri, 29 Jun 2007 13:45:29 -0700
Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Fri, 29 Jun 2007 13:16:57 +0100
> Alan Cox <[EMAIL PROTECTED]> wrote:
> 
> > > If those operations involve modifying that slab page's pageframe then what
> > > stops concurrent dma'ers from stomping on each other's changes?  As in:
> > > why aren't we already buggy?
> > 
> > Or DMA operations falling out with CPU operations in the same memory
> > area. Not all platforms have hardware consistency and some will blat the
> > entire page out of cache.
> 
> Is that just a performance problem, or can data be lost here?  It depends
> on the meaning of "blat": writeback?  invalidate?  More details, please.

Invalidate. Sorry, didn't realise they hadn't discovered that word down
under.

If you've got something packing objects in tight we are going
to have fun with cache handling simply because the CPU cache granularity
may mean that the invalidate also invalidates a few bytes on (ie a 12
byte object will invalidate 16 bytes of memory) and you've just removed
any CPU held changes in the start of the next object.
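
The arithmetic behind this can be sketched in plain C.  This is only an
illustration: line_start(), invalidated_bytes() and share_cache_line()
are hypothetical helpers, and the addresses used with them are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Start of the cache line containing addr; line must be a power of two. */
static uintptr_t line_start(uintptr_t addr, size_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0);
    return addr & ~(uintptr_t)(line - 1);
}

/* Bytes covered when [addr, addr + len) is invalidated line by line. */
static size_t invalidated_bytes(uintptr_t addr, size_t len, size_t line)
{
    uintptr_t first = line_start(addr, line);
    uintptr_t last = line_start(addr + len - 1, line);

    return (size_t)(last - first) + line;
}

/* True if two non-empty objects have at least one cache line in common. */
static bool share_cache_line(uintptr_t a, size_t a_len,
                             uintptr_t b, size_t b_len, size_t line)
{
    uintptr_t a_first = line_start(a, line);
    uintptr_t a_last = line_start(a + a_len - 1, line);
    uintptr_t b_first = line_start(b, line);
    uintptr_t b_last = line_start(b + b_len - 1, line);

    return a_first <= b_last && b_first <= a_last;
}
```

For a 12 byte object at a 16 byte aligned address, invalidated_bytes()
gives 16, and share_cache_line() reports that an object packed directly
behind it lives in the same line.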

Alan

Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-29 Thread Russell King

On Fri, Jun 29, 2007 at 01:45:29PM -0700, Andrew Morton wrote:
> On Fri, 29 Jun 2007 13:16:57 +0100
> Alan Cox <[EMAIL PROTECTED]> wrote:
> 
> > > If those operations involve modifying that slab page's pageframe then what
> > > stops concurrent dma'ers from stomping on each other's changes?  As in:
> > > why aren't we already buggy?
> > 
> > Or DMA operations falling out with CPU operations in the same memory
> > area. Not all platforms have hardware consistency and some will blat the
> > entire page out of cache.
> 
> Is that just a performance problem, or can data be lost here?  It depends
> on the meaning of "blat": writeback?  invalidate?  More details, please.
> 
> 
> I'm dyin here and nobody will talk to me.  If the kernel is already doing
> these things, why aren't we already buggy?  Is it because we don't actually
> modify the pageframes of these dma-to-from-kmalloced pages?  But we were
> thinking of doing so in the future?

I think people are getting too het up about this.

DMA to or from memory should be done via the DMA mapping API.  If we're
DMAing to/from a limited range within a page, either we should be using
dma_map_single(), or dma_map_page() with an appropriate offset and size.

Other cache flushing functions should not be called for DMA operations;
any cache handling required by non-coherent architectures should be done
by the DMA API only.

However, with non-coherent aliasing architectures (such as those with
aliasing VIPT or VIVT caches) there is an additional requirement on PIO
to page cache.  If the page we're writing data to has some cache lines
allocated to it, we potentially hit those cache lines and the data
doesn't hit the underlying page.  Later on, when we come to map the
page into userspace, the data may still be sitting in the cache lines
corresponding with the kernel's mapping.  Therefore, there is a
requirement to ensure that the cache state WRT the kernel's mapping is
the same irrespective of the method by which data ends up in the page.

That means that for these caches, the data PIO'd into the page must be
written back to the underlying page before the page is handed to
userspace.

The two are completely separate; it seems to me from the above discussion
that people are confusing the two scenarios, and mixing DMA with the PIO
cache handling.  Please don't, you'll only get more and more confused.

(Note: with the dma_map_* API, architectures have to be sensible when
they're passed offsets and sizes which aren't cacheline aligned.
Technically, it's buggy to ask for non-L1 line aligned offsets and
sizes, but they do happen.  We handle this on ARM by writing back
the overlapped lines and invalidating the rest before the DMA operation
commences, and hope that the overlapped lines aren't touched for the
duration of the DMA.)
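
As a rough illustration of that split (not the ARM implementation), the
following userspace sketch counts which lines of a buffer would be
written back and which would simply be invalidated.  The structure
dma_cache_plan and the function plan_dma_from_device() are hypothetical
names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct dma_cache_plan {
    size_t writeback_lines;   /* lines only partly inside the buffer */
    size_t invalidate_lines;  /* lines wholly inside the buffer */
};

/* Classify every cache line touched by [start, start + len). */
static struct dma_cache_plan plan_dma_from_device(uintptr_t start,
                                                  size_t len, size_t line)
{
    struct dma_cache_plan plan = { 0, 0 };
    uintptr_t end = start + len;
    uintptr_t mask = (uintptr_t)line - 1;
    uintptr_t addr;

    /* The line size must be a non-zero power of two. */
    assert(len > 0 && line != 0 && (line & mask) == 0);

    for (addr = start & ~mask; addr < end; addr += line) {
        if (addr < start || addr + line > end)
            plan.writeback_lines++;   /* shared with neighbouring data */
        else
            plan.invalidate_lines++;  /* owned entirely by the buffer */
    }
    return plan;
}
```

A buffer that is line aligned at both ends produces no write-back lines
at all, which is the case the "technically buggy" remark is asking
callers to aim for.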

-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-29 Thread Andrew Morton

On Fri, 29 Jun 2007 13:16:57 +0100
Alan Cox <[EMAIL PROTECTED]> wrote:

> > If those operations involve modifying that slab page's pageframe then what
> > stops concurrent dma'ers from stomping on each other's changes?  As in:
> > why aren't we already buggy?
> 
> Or DMA operations falling out with CPU operations in the same memory
> area. Not all platforms have hardware consistency and some will blat the
> entire page out of cache.

Is that just a performance problem, or can data be lost here?  It depends
on the meaning of "blat": writeback?  invalidate?  More details, please.

I'm dyin here and nobody will talk to me.  If the kernel is already doing
these things, why aren't we already buggy?  Is it because we don't actually
modify the pageframes of these dma-to-from-kmalloced pages?  But we were
thinking of doing so in the future?

Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-29 Thread Christoph Lameter

On Fri, 29 Jun 2007, Hugh Dickins wrote:

> I stand by my page_mapping patch, and the remark I made before,
> that page_mapping(page) is the correct place to check this.  What is
> page_mapping(page) for?  Precisely to return the struct address_space*
> from page->mapping when that's what's in there, and not when that field
> has been reused for something else.
> 
> So lines like
> > +   mapping = PageSlab(page) ? NULL : page_mapping(page);
> seem to miss the point.

They certainly point out to the reader that one can expect a slab page 
here where one may not expect one since flush_dcache_page is a page
cache function.

> I agree that the only clash found yet has been in flush_dcache_page,
> so some bytes and branches can indeed be saved by just doing the
> test in there.  Oh, but your VM_BUG_ON cancels out that saving.
> And if we were to try to save bytes and branches there, it's the
> synthetic swapper_space business (only required in a couple of
> places) I'd be wanting to cut out.

VM_BUG_ON is different from BUG_ON. BUG_ON is always checked. VM_BUG_ON 
depends on a debug config option.
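
A userspace imitation of the difference, for illustration only.  The
macros below mimic the shape of the kernel ones but are not copied from
any kernel tree; checks_evaluated and flush_check_demo() are
hypothetical, and CONFIG_DEBUG_VM is assumed to be the debug option in
question.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Counts how many conditions were actually tested at run time. */
static int checks_evaluated;

/* Always compiled in: the condition is evaluated on every call. */
#define BUG_ON(cond)                                                   \
    do {                                                               \
        checks_evaluated++;                                            \
        if (cond) {                                                    \
            fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__);     \
            abort();                                                   \
        }                                                              \
    } while (0)

/* Only costs anything when the debug option is set at build time. */
#ifdef CONFIG_DEBUG_VM
#define VM_BUG_ON(cond) BUG_ON(cond)
#else
#define VM_BUG_ON(cond) do { } while (0)
#endif

/* Returns the running total of checks that were really evaluated. */
static int flush_check_demo(int is_slab_page)
{
    VM_BUG_ON(is_slab_page < 0);  /* vanishes without CONFIG_DEBUG_VM */
    BUG_ON(is_slab_page > 1);     /* always tested */
    return checks_evaluated;
}
```

Built without -DCONFIG_DEBUG_VM, each call to flush_check_demo()
evaluates one check; built with it, two.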

> To me this all seems like a big fuss to excuse your surprise:

I am weirded out by the use of terms like "accusations" and "excuses" in 
these discussions. But if it makes you feel better

Others seemed to have encountered the same surprises before me. So it is 
better to point out these issues in the sources. There is the danger of 
other pagecache functions getting called for slab pages in the I/O 
layer and the check in page_mapping provides some protection from such 
changes. The checks/comments in the functions where we allow slab page use 
help the ones enhancing the code to keep these issues in mind and help 
them to not be surprised in turn.

Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-29 Thread Hugh Dickins

On Thu, 28 Jun 2007, Christoph Lameter wrote:

> I had a talk with James Bottomley last night and it seems that there is an
> established way of using the page structs of slab objects in the block
> layer. Drivers may use the DMA interfaces to issue control commands. In
> that case they may allocate a short structure via the slab allocator and
> put the control commands into that slab object.
> 
> The driver will then perform a sg_init_one() on the slab object.
> sg_init_one() calls sg_set_buf() which determines the page struct of a
> page. In this case sg_set_buf() will determine the page struct of a slab
> object. The dma layer may then perform operations on the "slab page". The
> block layer folks seem to have spent some time to make this work right.

Yes, I don't see why this comes as such a surprise and horror to you,
so much in need of dire WARNINGs.  kmalloc memory is not a different
kind of memory from what you get from the page allocators.

I stand by my page_mapping patch, and the remark I made before,
that page_mapping(page) is the correct place to check this.  What is
page_mapping(page) for?  Precisely to return the struct address_space*
from page->mapping when that's what's in there, and not when that field
has been reused for something else.

So lines like
> + mapping = PageSlab(page) ? NULL : page_mapping(page);
seem to miss the point.

I agree that the only clash found yet has been in flush_dcache_page,
so some bytes and branches can indeed be saved by just doing the
test in there.  Oh, but your VM_BUG_ON cancels out that saving.
And if we were to try to save bytes and branches there, it's the
synthetic swapper_space business (only required in a couple of
places) I'd be wanting to cut out.

To me this all seems like a big fuss to excuse your surprise:
so please don't expect an Ack from me; but if others prefer
this, I won't be Nacking.  (Though I'll probably whine about
it into eternity ;)

Hugh

Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-29 Thread Alan Cox

> If those operations involve modifying that slab page's pageframe then what
> stops concurrent dma'ers from stomping on each other's changes?  As in:
> why aren't we already buggy?

Or DMA operations falling out with CPU operations in the same memory
area. Not all platforms have hardware consistency and some will blat the
entire page out of cache.


Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-29 Thread David Miller

From: Christoph Lameter <[EMAIL PROTECTED]>
Date: Fri, 29 Jun 2007 00:00:39 -0700 (PDT)

> On Thu, 28 Jun 2007, David Miller wrote:
> 
> > Really, it would be great if we could treat kmalloc() objects
> > just like real pages.  Everything wants to do I/O on pages
> > but sometimes (like the networking) you have a kmalloc
> > chunk which is technically just a part of a page.
> > 
> > The fact that there is no easy way to make this work is
> > frustrating :-)
> 
> There is an easy way: Allocate a page and just use the first N bytes. You can
> specify the bytes to be used when putting the memory onto the scatter 
> gather list. This wastes memory but it works. You have real refcounting 
> since you got a real page.
> 
> How frequent are these objects?

Every single network packet.

Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-29 Thread Christoph Lameter

On Thu, 28 Jun 2007, David Miller wrote:

> Really, it would be great if we could treat kmalloc() objects
> just like real pages.  Everything wants to do I/O on pages
> but sometimes (like the networking) you have a kmalloc
> chunk which is technically just a part of a page.
> 
> The fact that there is no easy way to make this work is
> frustrating :-)

There is an easy way: Allocate a page and just use the first N bytes. You can
specify the bytes to be used when putting the memory onto the scatter 
gather list. This wastes memory but it works. You have real refcounting 
since you got a real page.
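
Roughly like this, as a userspace sketch rather than the real
scatterlist API.  struct sg_sketch and the sg_sketch_*() helpers are
hypothetical stand-ins, the 4096 byte page size is an assumption, and
aligned_alloc() plays the role of the page allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */

/* Simplified stand-in for a scatter-gather entry: page, offset, length. */
struct sg_sketch {
    void *page;
    unsigned int offset;
    unsigned int length;
};

/* Allocate one whole page and describe only its first n bytes. */
static int sg_sketch_alloc(struct sg_sketch *sg, unsigned int n)
{
    assert(n > 0 && n <= SKETCH_PAGE_SIZE);
    sg->page = aligned_alloc(SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);
    if (!sg->page)
        return -1;
    memset(sg->page, 0, SKETCH_PAGE_SIZE);
    sg->offset = 0;
    sg->length = n;
    return 0;
}

/* Bytes of the page that the transfer never uses. */
static unsigned int sg_sketch_wasted(const struct sg_sketch *sg)
{
    return SKETCH_PAGE_SIZE - sg->offset - sg->length;
}

static void sg_sketch_free(struct sg_sketch *sg)
{
    free(sg->page);
    sg->page = NULL;
}
```

For a 128 byte object, sg_sketch_wasted() reports 3968 unused bytes,
which is the memory cost mentioned above.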

How frequent are these objects?


Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-29 Thread Christoph Lameter

On Thu, 28 Jun 2007, Andrew Morton wrote:

> If those operations _don't_ involve modifying the pageframe (hopes this is
> true) then we're read-only and things become much easier?

This is true right now. We are way off topic ...


Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-29 Thread Christoph Lameter

On Thu, 28 Jun 2007, David Miller wrote:

> From: Andrew Morton <[EMAIL PROTECTED]>
> Date: Thu, 28 Jun 2007 22:24:24 -0700
> 
> > So what happens when two quite different threads of control are doing
> > IO against two hunks of kmalloced memory which happen to come from the same
> > page?  Either some (kernel-wide) locking is needed, or that pageframe needs
> > to be treated as readonly?
> 
> Or you put an atomic_t at the beginning or tail of every SLAB
> object.  It's a space cost not a runtime cost for the common
> case which is:

Hmmm... We could do something like

kmem_cache_get(slab, object)

and

kmem_cache_put(slab, object)

kmem_cache_get would disable allocations from the slab and increment a 
refcount.

kmem_cache_put would enable allocations again if the refcount reaches 
one.

The problem is that freeing an object may cause writes to the object. F.e.
poisoning will overwrite the object on free. SLUB will put its free 
pointer in the first words etc.
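
A minimal userspace sketch of the get/put idea, using C11 atomics.
struct slab_sketch and the slab_sketch_*() functions are hypothetical,
the object argument is left out, and the count is assumed to start at
one (the slab's own reference) so that "reaches one" means no I/O
references remain.  The flag and the count are updated separately here;
a real implementation would have to close that window, and this sketch
does nothing about the writes that freeing an object may cause.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-slab state; not a real kernel structure. */
struct slab_sketch {
    atomic_int refcount;        /* assumed to start at 1 */
    atomic_bool alloc_enabled;  /* may objects be allocated from this slab? */
};

static void slab_sketch_init(struct slab_sketch *s)
{
    atomic_init(&s->refcount, 1);
    atomic_init(&s->alloc_enabled, true);
}

/* Pin the slab for I/O: stop allocations from it and take a reference. */
static void slab_sketch_get(struct slab_sketch *s)
{
    atomic_store(&s->alloc_enabled, false);
    atomic_fetch_add(&s->refcount, 1);
}

/* Drop a reference; allow allocations again once only the base one is left. */
static void slab_sketch_put(struct slab_sketch *s)
{
    int before = atomic_fetch_sub(&s->refcount, 1);

    assert(before > 1);  /* a put without a matching get is a bug */
    if (before - 1 == 1)
        atomic_store(&s->alloc_enabled, true);
}
```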

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-29 Thread Christoph Lameter

On Thu, 28 Jun 2007, David Miller wrote:

 From: Andrew Morton [EMAIL PROTECTED]
 Date: Thu, 28 Jun 2007 22:24:24 -0700

  So what happens when two quite different threads of control are doing
  IO against two hunks of kmalloced memory which happen to come from the same
  page?  Either some (kernel-wide) locking is needed, or that pageframe needs
  to be treated as readonly?

 Or you put an atomic_t at the beginning or tail of every SLAB
 object.  It's a space cost not a runtime cost for the common
 case which is:

Hmmm... We could do something like

kmem_cache_get(slab, object)

and

kmem_cache_put(slab, object)

kmem_cache_get would disable allocations from the slab and increment a 
refcount.

kmem_cache_put would enable allocations again if the refcount reaches 
one.

The problem is that freeing an object may cause writes to the object. F.e.
poisoning will overwrite the object on free. SLUB will put its free 
pointer in the first words etc.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-29 Thread Christoph Lameter

On Thu, 28 Jun 2007, Andrew Morton wrote:

 If those operations _don't_ involve modifying the pageframe (hopes this is
 true) then we're read-only and things become much easier?

This is true right now. We are way off topic ...

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-29 Thread Christoph Lameter

On Thu, 28 Jun 2007, David Miller wrote:

 Really, it would be great if we could treat kmalloc() objects
 just like real pages.  Everything wants to do I/O on pages
 but sometimes (like the networking) you have a kmalloc
 chunk which is technically just a part of a page.
 
 The fact that there is no easy way to make this work is
 frustrating :-)

There is easy way: Allocate a page and just use the first N bytes. You can 
specify the bytes to be used when putting the memory onto the scatter 
gather list. This wastes memory but it works. You have real refcounting 
since you got a real page.

How frequent are these objects?

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-29 Thread David Miller

From: Christoph Lameter [EMAIL PROTECTED]
Date: Fri, 29 Jun 2007 00:00:39 -0700 (PDT)

 On Thu, 28 Jun 2007, David Miller wrote:

  Really, it would be great if we could treat kmalloc() objects
  just like real pages.  Everything wants to do I/O on pages
  but sometimes (like the networking) you have a kmalloc
  chunk which is technically just a part of a page.

  The fact that there is no easy way to make this work is
  frustrating :-)

 There is easy way: Allocate a page and just use the first N bytes. You can 
 specify the bytes to be used when putting the memory onto the scatter 
 gather list. This wastes memory but it works. You have real refcounting 
 since you got a real page.

 How frequent are these objects?

Every single network packet.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-29 Thread Alan Cox

 If those operations involve modifying that slab page's pageframe then what
 stops concurrent dma'ers from stomping on each other's changes?  As in:
 why aren't we already buggy?

Or DMA operations falling out with CPU operations in the same memory
area. Not all platforms have hardware consistency and some will blat the
entire page out of cache.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-29 Thread Hugh Dickins

On Thu, 28 Jun 2007, Christoph Lameter wrote:

 I had a talk with James Bottomley last night and it seems that there is an 
 established way of using the page structs of slab objects in the block 
 layer. Drivers may use the DMA interfaces to issue control commands. In 
 that case they may allocate a short structure via the slab allocator and 
 put the control commands into that slab object.
 
 The driver will then perform a sg_init_one() on the slab object. 
 sg_init_one() calls sg_set_buf() which determines the page struct of a 
 page. In this case sg_set_buf() will determine the page struct of a slab 
 object. The dma layer may then perform operations on the slab page. The 
 block layer folks seem to have spend some time to make this work right.

Yes, I don't see why this comes as such a surprise and horror to you,
so much in need of dire WARNINGs.  kmalloc memory is not a different
kind of memory from what you get from the page allocators.

I stand by my page_mapping patch, and the remark I made before,
that page_mapping(page) is the correct place to check this.  What is
page_mapping(page) for?  Precisely to return the struct address_space*
from page-mapping when that's what's in there, and not when that field
has been reused for something else.

So lines like
 + mapping = PageSlab(page) ? NULL : page_mapping(page);
seem to miss the point.

I agree that the only clash found yet has been in flush_dcache_page,
so some bytes and branches can indeed be saved by just doing the
test in there.  Oh, but your VM_BUG_ON cancels out that saving.
And if we were to try to save bytes and branches there, it's the
synthetic swapper_space business (only required in a couple of
places) I'd be wanting to cut out.

To me this all seems like a big fuss to excuse your surprise:
so please don't expect an Ack from me; but if others prefer
this, I won't be Nacking.  (Though I'll probably whine about
it into eternity ;)

Hugh
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-29 Thread Christoph Lameter

On Fri, 29 Jun 2007, Hugh Dickins wrote:

 I stand by my page_mapping patch, and the remark I made before,
 that page_mapping(page) is the correct place to check this.  What is
 page_mapping(page) for?  Precisely to return the struct address_space*
 from page-mapping when that's what's in there, and not when that field
 has been reused for something else.
 
 So lines like
  +   mapping = PageSlab(page) ? NULL : page_mapping(page);
 seem to miss the point.

They certainly point out to the reader that one can expect a slab page 
here where one may not expect one since flush_dcache_page is a page
cache function.

 I agree that the only clash found yet has been in flush_dcache_page,
 so some bytes and branches can indeed be saved by just doing the
 test in there.  Oh, but your VM_BUG_ON cancels out that saving.
 And if we were to try to save bytes and branches there, it's the
 synthetic swapper_space business (only required in a couple of
 places) I'd be wanting to cut out.

VM_BUG_ON is different from BUG_ON. BUG_ON is always checked. VM_BUG_ON 
depends on a debug config option.

 To me this all seems like a big fuss to excuse your surprise:

I am weirded out by the use of terms like accusations and excuses in 
these discussions. But if it makes you feel better

Others seemed to have encountered the same surprises before me. So it is 
better to point out these issues in the sources. There is the danger of 
other pagecache functions getting called for slab pages in the I/O 
layer and the check in page_mapping provides some protection from such 
changes. The checks/comments in the functions where we allow slab page use 
help the ones enhancing the code to keep these issues in mind and help 
them to not be surprised in turn.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-29 Thread Andrew Morton

On Fri, 29 Jun 2007 13:16:57 +0100
Alan Cox [EMAIL PROTECTED] wrote:

  If those operations involve modifying that slab page's pageframe then what
  stops concurrent dma'ers from stomping on each other's changes?  As in:
  why aren't we already buggy?
 
 Or DMA operations falling out with CPU operations in the same memory
 area. Not all platforms have hardware consistency and some will blat the
 entire page out of cache.

Is that just a performance problem, or can data be lost here?  It depends
on the meaning of blat: writeback?  invalidate?  More details, please.


I'm dyin here and nobody will talk to me.  If the kernel is already doing
these things, why aren't we already buggy?  Is it because we don't actually
modify the pageframes of these dma-to-from-kmalloced pages?  But we were
thinking of doing so in the future?

Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-29 Thread Russell King

On Fri, Jun 29, 2007 at 01:45:29PM -0700, Andrew Morton wrote:
> On Fri, 29 Jun 2007 13:16:57 +0100
> Alan Cox <[EMAIL PROTECTED]> wrote:
>
> > > If those operations involve modifying that slab page's pageframe then what
> > > stops concurrent dma'ers from stomping on each other's changes?  As in:
> > > why aren't we already buggy?
> >
> > Or DMA operations falling out with CPU operations in the same memory
> > area. Not all platforms have hardware consistency and some will blat the
> > entire page out of cache.
>
> Is that just a performance problem, or can data be lost here?  It depends
> on the meaning of blat: writeback?  invalidate?  More details, please.
>
>
> I'm dyin here and nobody will talk to me.  If the kernel is already doing
> these things, why aren't we already buggy?  Is it because we don't actually
> modify the pageframes of these dma-to-from-kmalloced pages?  But we were
> thinking of doing so in the future?

I think people are getting too het up about this.

DMA to or from memory should be done via the DMA mapping API.  If we're
DMAing to/from a limited range within a page, either we should be using
dma_map_single(), or dma_map_page() with an appropriate offset and size.

Other cache flushing functions should not be called for DMA operations;
any cache handling required by non-coherent architectures should be done
by the DMA API only.

However, with non-coherent aliasing architectures (such as those with
aliasing VIPT or VIVT caches) there is an additional requirement on PIO
to page cache.  If the page we're writing data to has some cache lines
allocated to it, we potentially hit those cache lines and the data
doesn't hit the underlying page.  Later on, when we come to map the
page into userspace, the data may still be sitting in the cache lines
corresponding with the kernel's mapping.  Therefore, there is a
requirement to ensure that the cache state WRT the kernel's mapping is
the same irrespective of the method by which data ends up in the page.

That means that for these caches, the data PIO'd into the page must be
written back to the underlying page before the page is handed to
userspace.

The two are completely separate; it seems to me from the above discussion
that people are confusing the two scenarios, and mixing DMA with the PIO
cache handling.  Please don't, you'll only get more and more confused.

(Note: with the dma_map_* API, architectures have to be sensible when
they're passed offsets and sizes which aren't cacheline aligned.
Technically, it's buggy to ask for non-L1 line aligned offsets and
sizes, but they do happen.  We handle this on ARM by writing back
the overlapped lines and invalidating the rest before the DMA operation
commences, and hope that the overlapped lines aren't touched for the
duration of the DMA.)
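
The split between "overlapped" and fully covered lines can be sketched in userspace C. This is a model under stated assumptions, not the ARM implementation: the 32-byte line size, struct dma_prep and prep_dma_from_device() are hypothetical, and the function only counts which lines would be written back versus invalidated for a buffer the device is about to write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_L1_LINE 32u	/* assumed L1 cache line size in bytes */

struct dma_prep {
	unsigned writeback;	/* lines only partly inside the buffer */
	unsigned invalidate;	/* lines entirely inside the buffer */
};

/*
 * Walk every cache line touched by [start, start + size).  A line that
 * also holds bytes outside the buffer may carry dirty data belonging to
 * someone else, so it is counted as "write back"; a line fully inside
 * the buffer is counted as "invalidate".
 */
static struct dma_prep prep_dma_from_device(uintptr_t start, size_t size)
{
	struct dma_prep p = { 0, 0 };
	uintptr_t end = start + size;
	uintptr_t line = start & ~(uintptr_t)(MODEL_L1_LINE - 1);

	assert(size > 0);

	for (; line < end; line += MODEL_L1_LINE) {
		if (line >= start && line + MODEL_L1_LINE <= end)
			p.invalidate++;
		else
			p.writeback++;
	}
	return p;
}
```

With these assumptions, a 64-byte buffer at 0x1000 needs two invalidates and no writeback, while the same buffer at 0x1010 has one fully covered line and two overlapped lines at its edges. Those two lines are the ones that must not be touched by the CPU while the DMA is in flight.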

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-29 Thread Alan Cox

On Fri, 29 Jun 2007 13:45:29 -0700
Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Fri, 29 Jun 2007 13:16:57 +0100
> Alan Cox <[EMAIL PROTECTED]> wrote:
>
> > > If those operations involve modifying that slab page's pageframe then what
> > > stops concurrent dma'ers from stomping on each other's changes?  As in:
> > > why aren't we already buggy?
> >
> > Or DMA operations falling out with CPU operations in the same memory
> > area. Not all platforms have hardware consistency and some will blat the
> > entire page out of cache.
>
> Is that just a performance problem, or can data be lost here?  It depends
> on the meaning of blat: writeback?  invalidate?  More details, please.

Invalidate. Sorry, didn't realise they hadn't discovered that word down
under.

If you've got something packing objects in tight we are going
to have fun with cache handling simply because the CPU cache granularity
may mean that the invalidate also invalidates a few bytes on (ie a 12
byte object will invalidate 16 bytes of memory) and you've just removed
any CPU held changes in the start of the next object.
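
The arithmetic behind the 12-byte example can be sketched as follows. The function name collateral_bytes() is hypothetical, and the line size is passed in rather than taken from any real platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of bytes outside the object that are hit when every cache line
 * overlapping [obj, obj + len) is invalidated.  'line' must be a power
 * of two.
 */
static size_t collateral_bytes(uintptr_t obj, size_t len, size_t line)
{
	uintptr_t first, last;

	assert(line != 0 && (line & (line - 1)) == 0);

	first = obj & ~(uintptr_t)(line - 1);			  /* round down */
	last = (obj + len + line - 1) & ~(uintptr_t)(line - 1);  /* round up */

	return (size_t)(last - first) - len;
}
```

For a line-aligned 12-byte object and 16-byte lines this gives 4: the first four bytes of whatever the allocator packed in next.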

Alan

Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-29 Thread Alan Cox

> DMA to or from memory should be done via the DMA mapping API.  If we're
> DMAing to/from a limited range within a page, either we should be using
> dma_map_single(), or dma_map_page() with an appropriate offset and size.

If those ranges overlap a cache line then the dma mapping API will not
save your backside.

On a system with a 32 byte cache granularity what happens if you get two
dma mapping calls for x and x+16. Right now the thing that avoids this
occurring is that the allocators don't pack stuff in that hard so x+16
always belongs to the same driver and we can hope driver authors are
sensible 
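
A minimal check for the x and x+16 case, as a userspace sketch. share_cache_line() is a hypothetical helper, not part of the DMA mapping API; it only answers whether two mapped ranges land on a common cache line for a given granularity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 if the ranges [a, a + a_len) and [b, b + b_len) touch at
 * least one common cache line of size 'line', 0 otherwise.
 */
static int share_cache_line(uintptr_t a, size_t a_len,
			    uintptr_t b, size_t b_len, size_t line)
{
	uintptr_t a_first, a_last, b_first, b_last;

	assert(a_len > 0 && b_len > 0 && line > 0);

	a_first = a / line;
	a_last = (a + a_len - 1) / line;
	b_first = b / line;
	b_last = (b + b_len - 1) / line;

	/* Two intervals of line indices overlap iff neither ends before the other starts. */
	return a_first <= b_last && b_first <= a_last;
}
```

Two 16-byte mappings at x and x+16 share a line when the granularity is 32 bytes, and do not when it is 16 bytes.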

> sizes, but they do happen.  We handle this on ARM by writing back
> the overlapped lines and invalidating the rest before the DMA operation
> commences, and hope that the overlapped lines aren't touched for the
> duration of the DMA.)

The combination of hope and DMA isn't a good one for stable system
design. In this situation we should be waving large red flags

Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-28 Thread Andrew Morton

On Thu, 28 Jun 2007 22:37:34 -0700 (PDT) David Miller <[EMAIL PROTECTED]> wrote:

> From: Andrew Morton <[EMAIL PROTECTED]>
> Date: Thu, 28 Jun 2007 22:24:24 -0700
> 
> > So what happens when two quite different threads of control are doing
> > IO against two hunks of kmalloced memory which happen to come from the same
> > page?  Either some (kernel-wide) locking is needed, or that pageframe needs
> > to be treated as readonly?
> 
> Or you put an atomic_t at the beginning or tail of every SLAB
> object.  It's a space cost not a runtime cost for the common
> case which is:
> 
>   smp_rmb();
>   if (atomic_read(&slab_obj->count) == 1)
>           really_free_it();
>   else if (atomic_dec_and_test(...))
> 
> Note I don't like this variant either. :)

but, but...  Christoph said 'The dma layer may then perform operations on
the slab page'.

If those operations involve modifying that slab page's pageframe then what
stops concurrent dma'ers from stomping on each other's changes?  As in:
why aren't we already buggy?

If those operations _don't_ involve modifying the pageframe (hopes this is
true) then we're read-only and things become much easier?

Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-28 Thread David Miller

From: Andrew Morton <[EMAIL PROTECTED]>
Date: Thu, 28 Jun 2007 22:24:24 -0700

> So what happens when two quite different threads of control are doing
> IO against two hunks of kmalloced memory which happen to come from the same
> page?  Either some (kernel-wide) locking is needed, or that pageframe needs
> to be treated as readonly?

Or you put an atomic_t at the beginning or tail of every SLAB
object.  It's a space cost not a runtime cost for the common
case which is:

    smp_rmb();
    if (atomic_read(&slab_obj->count) == 1)
            really_free_it();
    else if (atomic_dec_and_test(...))

Note I don't like this variant either. :)
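
To make the idea concrete, here is a userspace sketch using C11 atomics instead of the kernel's atomic_t. It is an illustration under assumptions, not a proposal for the slab allocators: struct counted_obj, obj_alloc(), obj_get() and obj_put() are hypothetical names, and malloc()/free() stand in for the allocator.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical object layout: a reference count stored with the object. */
struct counted_obj {
	atomic_int count;
	unsigned char payload[48];
};

/* Allocate an object holding a single reference, or return NULL. */
static struct counted_obj *obj_alloc(void)
{
	struct counted_obj *o = malloc(sizeof(*o));

	if (o)
		atomic_init(&o->count, 1);
	return o;
}

/* Take an extra reference, as an I/O path would before using the memory. */
static void obj_get(struct counted_obj *o)
{
	atomic_fetch_add(&o->count, 1);
}

/* Drop a reference.  Returns true if this call freed the object. */
static bool obj_put(struct counted_obj *o)
{
	/*
	 * Common case: the caller holds the only reference, so a plain
	 * load is enough and no atomic read-modify-write is needed.
	 */
	if (atomic_load_explicit(&o->count, memory_order_acquire) == 1) {
		free(o);
		return true;
	}

	/* Shared case: whoever drops the count to zero frees the object. */
	if (atomic_fetch_sub(&o->count, 1) == 1) {
		free(o);
		return true;
	}
	return false;
}
```

With this model a "kfree" while another holder still has a reference only drops the count, and the memory is released by the last obj_put(). The cost is the extra counter in every object, which is the space overhead mentioned above.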


Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-28 Thread Andrew Morton

On Thu, 28 Jun 2007 22:06:06 -0700 (PDT) David Miller <[EMAIL PROTECTED]> wrote:

> From: Christoph Lameter <[EMAIL PROTECTED]>
> Date: Thu, 28 Jun 2007 21:39:01 -0700 (PDT)
> 
> > H... Maybe we are creating more of a mess with this. Isn't there some 
> > other way to handle these objects?
> 
> That's where I was going with the silly idea to use another
> allocator :)
> 
> Really, it would be great if we could treat kmalloc() objects
> just like real pages.

From a high level, that seems like a bad idea.  kmalloc() gives you a
virtual address and you really shouldn't be poking around at that memory's
underlying page's pageframe metadata.

However we can of course do tasteless and weird things if the benefit is
sufficient

>  Everything wants to do I/O on pages
> but sometimes (like the networking) you have a kmalloc
> chunk which is technically just a part of a page.

hm.  So what happens when two quite different threads of control are doing
IO against two hunks of kmalloced memory which happen to come from the same
page?  Either some (kernel-wide) locking is needed, or that pageframe needs
to be treated as readonly?


Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-28 Thread David Miller

From: Christoph Lameter <[EMAIL PROTECTED]>
Date: Thu, 28 Jun 2007 21:39:01 -0700 (PDT)

> H... Maybe we are creating more of a mess with this. Isn't there some 
> other way to handle these objects?

That's where I was going with the silly idea to use another
allocator :)

Really, it would be great if we could treat kmalloc() objects
just like real pages.  Everything wants to do I/O on pages
but sometimes (like the networking) you have a kmalloc
chunk which is technically just a part of a page.

The fact that there is no easy way to make this work is
frustrating :-)

Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-28 Thread Christoph Lameter

On Thu, 28 Jun 2007, David Miller wrote:

> > You can get such a reference and then the slab page will be in limbo if 
> > all objects are freed until that reference is given up. The reference 
> > method is also used by kmem_cache_vacate() (but that is slab internal).
> 
> What about if someone kfree()'s that object meanwhile?

If that is the last object in the slab then the page is deslabified and 
it will stick around as a regular page until its refcount reaches zero.

There is slab functionality in SLUB that uses this trick to ensure that 
the slab does not go away.

> Can we bump the SLAB object count just like we can a page?

You can bump the slab page count. But you have to use virt_to_head_page() 
instead of virt_to_page() since slab pages may be compound pages (in both 
SLAB and SLUB). Incrementing the refcount of a tail page will cause an oops on
free.
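
The head-page rule can be modelled in userspace like this. It is a sketch only: struct model_page and the model_* helpers are hypothetical and merely mimic the relationship between virt_to_page() and virt_to_head_page() for a compound page; offsets into a flat page array stand in for kernel virtual addresses.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size */

/* Hypothetical stand-in for struct page. */
struct model_page {
	int refcount;
	struct model_page *head;	/* NULL for a head or order-0 page */
};

/* Mark pages [first, first + nr) as one compound page headed by 'first'. */
static void model_make_compound(struct model_page *map, size_t first, size_t nr)
{
	size_t i;

	assert(nr > 0);
	map[first].head = NULL;
	for (i = 1; i < nr; i++)
		map[first + i].head = &map[first];
}

/* Analogue of virt_to_page(): the page that directly contains the offset. */
static struct model_page *model_virt_to_page(struct model_page *map, size_t off)
{
	return &map[off / MODEL_PAGE_SIZE];
}

/* Analogue of virt_to_head_page(): always resolve to the compound head. */
static struct model_page *model_virt_to_head_page(struct model_page *map, size_t off)
{
	struct model_page *p = model_virt_to_page(map, off);

	return p->head ? p->head : p;
}
```

For an object that lives in the fourth page of an order-2 slab, model_virt_to_page() returns a tail page, while model_virt_to_head_page() returns the head, which is the only page whose refcount should be bumped.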

> That's basically what's happening in the stuff Jens is
> working on, he needs to grab a reference to a SLAB
> object just like one can a page.  Even if there is an
> intervening kfree (like a put_page()) the SLAB object is
> still live until all the references are put, and thus it
> can't get reallocated and given to another client by SLAB.

If you kfree the object then all slab allocators will feel free to 
immediately alloc the object for other purposes.

If you want to prohibit further allocations from a slab then I can likely 
give you a function call that removes a slab from the partial lists so 
that the slab will not be allocated from. But this will only work with 
SLUB.

At some point the slab will then need to be returned to the partial slab 
lists.

H... Maybe we are creating more of a mess with this. Isn't there some 
other way to handle these objects?


Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-28 Thread David Miller

From: Christoph Lameter <[EMAIL PROTECTED]>
Date: Thu, 28 Jun 2007 21:22:22 -0700 (PDT)

> On Thu, 28 Jun 2007, David Miller wrote:
> 
> > > Still a better solution would be to not use the slab allocator at all for 
> > > the objects that are used to send commands to the devices. These are not 
> > > permanent and grabbing a page from the pcp lists and putting it back is 
> > > likely as fast as performing a kmalloc.
> > 
> > Jens Axboe wants to get references to the page structs behind
> > kmalloc() allocated pages in his networking splice work.
> 
> You can get such a reference and then the slab page will be in limbo if 
> all objects are freed until that reference is given up. The reference 
> method is also used by kmem_cache_vacate() (but that is slab internal).

What about if someone kfree()'s that object meanwhile?

Can we bump the SLAB object count just like we can a page?
That's basically what's happening in the stuff Jens is
working on, he needs to grab a reference to a SLAB
object just like one can a page.  Even if there is an
intervening kfree (like a put_page()) the SLAB object is
still live until all the references are put, and thus it
can't get reallocated and given to another client by SLAB.

Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-28 Thread Christoph Lameter

On Thu, 28 Jun 2007, David Miller wrote:

> > Still a better solution would be to not use the slab allocator at all for 
> > the objects that are used to send commands to the devices. These are not 
> > permanent and grabbing a page from the pcp lists and putting it back is 
> > likely as fast as performing a kmalloc.
> 
> Jens Axboe wants to get references to the page structs behind
> kmalloc() allocated pages in his networking splice work.

You can get such a reference and then the slab page will be in limbo if 
all objects are freed until that reference is given up. The reference 
method is also used by kmem_cache_vacate() (but that is slab internal).

> We could make a special allocator in the networking that carves
> chunks out of pages but I'm sure you'll find that about as stupid
> and wasteful as I do :-)

Well then I guess this containment patch is as good as we can get.
It makes sure that things do not get out of hand.


Re: [PATCH] Containment measures for slab objects on scatter gather lists

2007-06-28 Thread David Miller

From: Christoph Lameter <[EMAIL PROTECTED]>
Date: Thu, 28 Jun 2007 21:01:36 -0700 (PDT)

> Modify the functions in the affected arches to check for PageSlab() and 
> use a NULL mapping if such a page is encountered. This may only be 
> necessary for parisc and arm since sparc64 and xtensa do not scan over 
> processes mapping a page but I have modified those two arches also for 
> correctness' sake since they use page_mapping() in flush_dcache_page().
> 
> Still a better solution would be to not use the slab allocator at all for 
> the objects that are used to send commands to the devices. These are not 
> permanent and grabbing a page from the pcp lists and putting it back is 
> likely as fast as performing a kmalloc.

Jens Axboe wants to get references to the page structs behind
kmalloc() allocated pages in his networking splice work.

We pass scatterlists around, but networking buffers are
composed of a kmalloc()'d data header area for packet headers
and some of the initial packet data, then a true scatterlist
of page/offset/len triplets.
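
As a rough picture of that layout, here is a simplified userspace model. It is not the real sk_buff: struct model_frag, struct model_pkt, MODEL_MAX_FRAGS and model_pkt_len() are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_FRAGS 8	/* assumed upper bound on fragments */

/* One page/offset/len triplet. */
struct model_frag {
	void *page;		/* stands in for a struct page pointer */
	unsigned int offset;	/* byte offset of the data within the page */
	unsigned int len;	/* number of data bytes in this fragment */
};

/* A buffer: kmalloc()-style linear header area plus paged fragments. */
struct model_pkt {
	unsigned char *head;	/* headers and some initial packet data */
	unsigned int head_len;
	unsigned int nr_frags;
	struct model_frag frags[MODEL_MAX_FRAGS];
};

/* Total payload: linear area plus every fragment. */
static unsigned int model_pkt_len(const struct model_pkt *p)
{
	unsigned int total = p->head_len;
	unsigned int i;

	assert(p->nr_frags <= MODEL_MAX_FRAGS);
	for (i = 0; i < p->nr_frags; i++)
		total += p->frags[i].len;
	return total;
}
```

The fragments already look like what splice wants. The linear 'head' area is the odd one out: it is just an address into a slab object, with no page/offset/len description of its own.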

Splice wants to work with pages everywhere.

This is a recurring theme, we should provide some kind of
solution to these issues instead of wishing they would go
away.

We could make a special allocator in the networking that carves
chunks out of pages but I'm sure you'll find that about as stupid
and wasteful as I do :-)

