Re: Bounce buffer deadlock

2001-06-30 Thread Marcelo Tosatti



On Sat, 30 Jun 2001, Steve Lord wrote:

> 
> > Yes. 2.4.6-pre8 fixes that (not sure if its up already). 
> 
> It is up.
> 
> > 
> > > If the fix is to avoid page_launder in these cases then the number of
> > > occurrences when an alloc_pages fails will go up. 
> > 
> > > I was attempting to come up with a way of making try_to_free_buffers
> > > fail on buffers which are being processed in the generic_make_request
> > > path by marking them, the problem is there is no single place to reset
> > > the state of a buffer so that try_to_free_buffers will wait for it.
> > > Doing it after the end of the loop in generic_make_request is race
> > > prone to say the least.
> > 
> > I really want to fix things like this in 2.5. (ie not avoid the deadlock
> > by completly avoiding physical IO, but avoid the deadlock by avoiding
> > physical IO on the "device" which is doing the allocation)
> > 
> > Could you send me your code ? No problem if it does not work at all :)
> > 
> 
> Well, the basic idea is simple, but I suspect the implementation might
> rapidly become historical in 2.5. Basically I added a new buffer state bit,
> although BH_Req looks like it could be cannibalized, no one appears to check
> for it (is it really dead code?). 

Dunno. Ask Jens :)

> Using a flag to skip buffers in try_to_free_buffers is easy:

I was thinking more about skipping the buffers which are "owned" by the
device which is doing the allocation. 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bounce buffer deadlock

2001-06-30 Thread Steve Lord

> 
> On Sat, 30 Jun 2001, Steve Lord wrote:
> >
> > OK, sounds reasonable, time to go download and merge again I guess!
> 
> For 2.4.7 or so, I'll make a backwards-compatibility define (ie make
> GFP_BUFFER be the same as the new GFP_NOIO, which is the historical
> behaviour and the anally safe value, if not very efficient), but I'm
> planning on releasing 2.4.6 without it, to try to flush out people who are
> able to take advantage of the new extended semantics out of the
> woodworks..
> 
>   Linus

Consider XFS flushed out (once I merge). This, for us, is the tricky part:


[EMAIL PROTECTED] said:
>> That allows us to do the best we can - still flushing out dirty
>> buffers when that's ok (like when a filesystem wants more memory), and
>> giving the allocator better control over exactly _what_ he objects to.

XFS introduces the concept of the low level flush of a buffer not always
being really a low level flush, since a delayed allocate buffer can end
up reentering the filesystem in order to create the true on disk allocation.
This in turn can cause a transaction and more memory allocations. The really
nasty case we were using GFP_BUFFER for is a memory allocation which is from
within a transaction, we cannot afford to have another transaction start out
of the bottom of memory allocation as it may require resources locked by
the transaction which is allocating memory.

Steve



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bounce buffer deadlock

2001-06-30 Thread Steve Lord


> Yes. 2.4.6-pre8 fixes that (not sure if its up already). 

It is up.

> 
> > If the fix is to avoid page_launder in these cases then the number of
> > occurrences when an alloc_pages fails will go up. 
> 
> > I was attempting to come up with a way of making try_to_free_buffers
> > fail on buffers which are being processed in the generic_make_request
> > path by marking them, the problem is there is no single place to reset
> > the state of a buffer so that try_to_free_buffers will wait for it.
> > Doing it after the end of the loop in generic_make_request is race
> > prone to say the least.
> 
> I really want to fix things like this in 2.5. (ie not avoid the deadlock
> by completly avoiding physical IO, but avoid the deadlock by avoiding
> physical IO on the "device" which is doing the allocation)
> 
> Could you send me your code ? No problem if it does not work at all :)
> 

Well, the basic idea is simple, but I suspect the implementation might
rapidly become historical in 2.5. Basically I added a new buffer state bit,
although BH_Req looks like it could be cannibalized, no one appears to check
for it (is it really dead code?). 

Using a flag to skip buffers in try_to_free_buffers is easy:


===
Index: linux/fs/buffer.c
===

--- /usr/tmp/TmpDir.3237-0/linux/fs/buffer.c_1.68   Sat Jun 30 12:56:29 2001
+++ linux/fs/buffer.c   Sat Jun 30 12:57:52 2001
@@ -2365,7 +2365,7 @@
 /*
  * Can the buffer be thrown out?
  */
-#define BUFFER_BUSY_BITS   ((1b_state 
& BUFFER_BUSY_BITS))
 
 /*
@@ -2430,7 +2430,11 @@
spin_unlock(_list[index].lock);
write_unlock(_table_lock);
spin_unlock(_list_lock);
-   if (wait) {
+   /* Buffers in the middle of generic_make_request processing cannot
+* be waited for, they may be allocating memory right now and be
+* locked by this thread.
+*/
+   if (wait && !buffer_clamped(tmp)) {
sync_page_buffers(bh, wait);
/* We waited synchronously, so we can free the buffers. */
if (wait > 1 && !loop) {

===
Index: linux/include/linux/fs.h
===

--- /usr/tmp/TmpDir.3237-0/linux/include/linux/fs.h_1.99Sat Jun 30 12:56:29 
2001
+++ linux/include/linux/fs.hSat Jun 30 07:05:37 2001
@@ -224,6 +224,8 @@
BH_Mapped,  /* 1 if the buffer has a disk mapping */
BH_New, /* 1 if the buffer is new and not yet written out */
BH_Protected,   /* 1 if the buffer is protected */
+   BH_Clamped, /* 1 if the buffer cannot be reclaimed
+* in it's current state */
BH_Delay,   /* 1 if the buffer is delayed allocate */
 
BH_PrivateStart,/* not a state bit, but the first bit available
@@ -286,6 +288,7 @@
 #define buffer_mapped(bh)  __buffer_state(bh,Mapped)
 #define buffer_new(bh) __buffer_state(bh,New)
 #define buffer_protected(bh)   __buffer_state(bh,Protected)
+#define buffer_clamped(bh) __buffer_state(bh,Clamped)
 #define buffer_delay(bh)   __buffer_state(bh,Delay)
 
 #define bh_offset(bh)  ((unsigned long)(bh)->b_data & ~PAGE_MASK)


The tricky part which I had not worked out how to do yet is to manage the
clearing of a state bit in all the correct places. You would have to set it
when the buffer got locked when I/O was about to start, it becomes clearable
after the last memory allocation during the I/O submission process. I do
not like the approach because there are so many ways a buffer can go
once you get into generic_make_request. At first I thought I could just
explicitly set and clear a flag around memory allocations like the bounce
buffer path. However, that can lead to AB BA deadlocks between multiple
threads submitting I/O requests. At this point I started to think I was
going to build an unmaintainable rats nest and decided I had not got
the correct answer.

I am not sure that an approach which avoids a specific device will fly either,
all the I/O can be on one device, and what does device mean when it comes
to md/lvm and request remapping?

Steve


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bounce buffer deadlock

2001-06-30 Thread Linus Torvalds


On Sat, 30 Jun 2001, Steve Lord wrote:
>
> OK, sounds reasonable, time to go download and merge again I guess!

For 2.4.7 or so, I'll make a backwards-compatibility define (ie make
GFP_BUFFER be the same as the new GFP_NOIO, which is the historical
behaviour and the anally safe value, if not very efficient), but I'm
planning on releasing 2.4.6 without it, to try to flush out people who are
able to take advantage of the new extended semantics out of the
woodworks..

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bounce buffer deadlock

2001-06-30 Thread Steve Lord

> 
> On Sat, 30 Jun 2001, Steve Lord wrote:
> >
> > It looks to me as if all memory allocations of type GFP_BUFFER which happen
> > in generic_make_request downwards can hit the same type of deadlock, so
> > bounce buffers, the request functions of the raid and lvm paths can all
> > end up in try_to_free_buffers on a buffer they themselves hold the lock on.
> 
> .. which is why GFP_BUFFER doesn't exist any more in the most recent
> pre-kernels (oops, this is pre8 only, not pre7 like I said in the previous
> email)
> 
> The problem is that GFP_BUFFER used to mean two things: "don't call
> low-level filesystem" and "don't do IO". Some of the pre-kernels starting
> to make it mean "don't call low-level FS" only. The later ones split up
> the semantics, so that the cases which care about FS deadlocks use
> "GFP_NOFS", and the cases that care about IO recursion use "GFP_NOIO", so
> that we don't overload the meaning of GFP_BUFFER.
> 
> That allows us to do the best we can - still flushing out dirty buffers
> when that's ok (like when a filesystem wants more memory), and giving the
> allocator better control over exactly _what_ he objects to.
> 
>   Linus

OK, sounds reasonable, time to go download and merge again I guess!

Steve



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bounce buffer deadlock

2001-06-30 Thread Linus Torvalds


On Sat, 30 Jun 2001, Steve Lord wrote:
>
> It looks to me as if all memory allocations of type GFP_BUFFER which happen
> in generic_make_request downwards can hit the same type of deadlock, so
> bounce buffers, the request functions of the raid and lvm paths can all
> end up in try_to_free_buffers on a buffer they themselves hold the lock on.

.. which is why GFP_BUFFER doesn't exist any more in the most recent
pre-kernels (oops, this is pre8 only, not pre7 like I said in the previous
email)

The problem is that GFP_BUFFER used to mean two things: "don't call
low-level filesystem" and "don't do IO". Some of the pre-kernels starting
to make it mean "don't call low-level FS" only. The later ones split up
the semantics, so that the cases which care about FS deadlocks use
"GFP_NOFS", and the cases that care about IO recursion use "GFP_NOIO", so
that we don't overload the meaning of GFP_BUFFER.

That allows us to do the best we can - still flushing out dirty buffers
when that's ok (like when a filesystem wants more memory), and giving the
allocator better control over exactly _what_ he objects to.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bounce buffer deadlock

2001-06-30 Thread Marcelo Tosatti



On Sat, 30 Jun 2001, Steve Lord wrote:

> > 
> > 
> > On Fri, 29 Jun 2001, Steve Lord wrote:
> > 
> > > 
> > > Has anyone else seen a hang like this:
> > > 
> > >   bdflush()
> > > flush_dirty_buffers()
> > >   ll_rw_block()
> > >   submit_bh(buffer X)
> > > generic_make_request()
> > >   __make_request()
> > >   create_bounce()
> > > alloc_bounce_page()
> > >   alloc_page()
> > > try_to_free_pages()
> > >   do_try_to_free_pages()
> > > page_launder()
> > >   try_to_free_buffers( , 2)  -- i.e. wait for buffers
> > > sync_page_buffers()
> > >   __wait_on_buffer(buffer X)
> > > 
> > > Where the buffer head X going in the top of the stack is the same as the on
> > e
> > > we wait on at the bottom.
> > > 
> > > There still seems to be nothing to prevent the try to free buffers from
> > > blocking on a buffer like this. Setting a flag on the buffer around the
> > > create_bounce call, and skipping it in the try_to_free_buffers path would
> > > be one approach to avoiding this.
> > 
> > Yes there is a bug: Linus is going to put a new fix soon. 
> > 
> > > I hit this in 2.4.6-pre6, and I don't see anything in the ac series to prot
> > ect
> > > against it.
> > 
> > Thats because the -ac series does not contain the __GFP_BUFFER/__GFP_IO
> > moditications which are in the -ac series.
> 
> It looks to me as if all memory allocations of type GFP_BUFFER which happen
> in generic_make_request downwards can hit the same type of deadlock,
> bounce buffers, the request functions of the raid and lvm paths can all
> end up in try_to_free_buffers on a buffer they themselves hold the lock on.

Yes. 2.4.6-pre8 fixes that (not sure if its up already). 

> If the fix is to avoid page_launder in these cases then the number of
> occurrences when an alloc_pages fails will go up. 

> I was attempting to come up with a way of making try_to_free_buffers
> fail on buffers which are being processed in the generic_make_request
> path by marking them, the problem is there is no single place to reset
> the state of a buffer so that try_to_free_buffers will wait for it.
> Doing it after the end of the loop in generic_make_request is race
> prone to say the least.

I really want to fix things like this in 2.5. (ie not avoid the deadlock
by completly avoiding physical IO, but avoid the deadlock by avoiding
physical IO on the "device" which is doing the allocation)

Could you send me your code ? No problem if it does not work at all :)


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bounce buffer deadlock

2001-06-30 Thread Steve Lord

> 
> 
> On Fri, 29 Jun 2001, Steve Lord wrote:
> 
> > 
> > Has anyone else seen a hang like this:
> > 
> >   bdflush()
> > flush_dirty_buffers()
> >   ll_rw_block()
> > submit_bh(buffer X)
> >   generic_make_request()
> > __make_request()
> > create_bounce()
> >   alloc_bounce_page()
> > alloc_page()
> >   try_to_free_pages()
> > do_try_to_free_pages()
> >   page_launder()
> > try_to_free_buffers( , 2)  -- i.e. wait for buffers
> >   sync_page_buffers()
> > __wait_on_buffer(buffer X)
> > 
> > Where the buffer head X going in the top of the stack is the same as the on
> e
> > we wait on at the bottom.
> > 
> > There still seems to be nothing to prevent the try to free buffers from
> > blocking on a buffer like this. Setting a flag on the buffer around the
> > create_bounce call, and skipping it in the try_to_free_buffers path would
> > be one approach to avoiding this.
> 
> Yes there is a bug: Linus is going to put a new fix soon. 
> 
> > I hit this in 2.4.6-pre6, and I don't see anything in the ac series to prot
> ect
> > against it.
> 
> Thats because the -ac series does not contain the __GFP_BUFFER/__GFP_IO
> moditications which are in the -ac series.

It looks to me as if all memory allocations of type GFP_BUFFER which happen
in generic_make_request downwards can hit the same type of deadlock, so
bounce buffers, the request functions of the raid and lvm paths can all
end up in try_to_free_buffers on a buffer they themselves hold the lock on.

If the fix is to avoid page_launder in these cases then the number of
occurrences when an alloc_pages fails will go up. I was attempting to
come up with a way of making try_to_free_buffers fail on buffers which
are being processed in the generic_make_request path by marking them,
the problem is there is no single place to reset the state of a buffer so
that  try_to_free_buffers will wait for it. Doing it after the end of the
loop in generic_make_request is race prone to say the least.

Steve



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bounce buffer deadlock

2001-06-30 Thread Linus Torvalds


On Fri, 29 Jun 2001, Steve Lord wrote:
>
> Has anyone else seen a hang like this:
>
>   bdflush()
> flush_dirty_buffers()
>   ll_rw_block()
>   submit_bh(buffer X)
> generic_make_request()
>   __make_request()
>   create_bounce()
> alloc_bounce_page()
>   alloc_page()
> try_to_free_pages()
>   do_try_to_free_pages()
> page_launder()
>   try_to_free_buffers( , 2)  -- i.e. wait for buffers
> sync_page_buffers()
>   __wait_on_buffer(buffer X)

Should be fixed in 2.4.6-pre8 (or pre7, for that matter).

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bounce buffer deadlock

2001-06-30 Thread Marcelo Tosatti



On Sat, 30 Jun 2001, Marcelo Tosatti wrote:

> 
> 
> On Fri, 29 Jun 2001, Steve Lord wrote:
> 
> > 
> > Has anyone else seen a hang like this:
> > 
> >   bdflush()
> > flush_dirty_buffers()
> >   ll_rw_block()
> > submit_bh(buffer X)
> >   generic_make_request()
> > __make_request()
> > create_bounce()
> >   alloc_bounce_page()
> > alloc_page()
> >   try_to_free_pages()
> > do_try_to_free_pages()
> >   page_launder()
> > try_to_free_buffers( , 2)  -- i.e. wait for buffers
> >   sync_page_buffers()
> > __wait_on_buffer(buffer X)
> > 
> > Where the buffer head X going in the top of the stack is the same as the one
> > we wait on at the bottom.
> > 
> > There still seems to be nothing to prevent the try to free buffers from
> > blocking on a buffer like this. Setting a flag on the buffer around the
> > create_bounce call, and skipping it in the try_to_free_buffers path would
> > be one approach to avoiding this.
> 
> Yes there is a bug: Linus is going to put a new fix soon. 
> 
> > I hit this in 2.4.6-pre6, and I don't see anything in the ac series to protect
> > against it.
> 
> Thats because the -ac series does not contain the __GFP_BUFFER/__GFP_IO
> moditications which are in the -ac series.
  ^^ 

I mean 2.4.6-pre, sorry.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bounce buffer deadlock

2001-06-30 Thread Marcelo Tosatti



On Fri, 29 Jun 2001, Steve Lord wrote:

> 
> Has anyone else seen a hang like this:
> 
>   bdflush()
> flush_dirty_buffers()
>   ll_rw_block()
>   submit_bh(buffer X)
> generic_make_request()
>   __make_request()
>   create_bounce()
> alloc_bounce_page()
>   alloc_page()
> try_to_free_pages()
>   do_try_to_free_pages()
> page_launder()
>   try_to_free_buffers( , 2)  -- i.e. wait for buffers
> sync_page_buffers()
>   __wait_on_buffer(buffer X)
> 
> Where the buffer head X going in the top of the stack is the same as the one
> we wait on at the bottom.
> 
> There still seems to be nothing to prevent the try to free buffers from
> blocking on a buffer like this. Setting a flag on the buffer around the
> create_bounce call, and skipping it in the try_to_free_buffers path would
> be one approach to avoiding this.

Yes there is a bug: Linus is going to put a new fix soon. 

> I hit this in 2.4.6-pre6, and I don't see anything in the ac series to protect
> against it.

Thats because the -ac series does not contain the __GFP_BUFFER/__GFP_IO
moditications which are in the -ac series.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bounce buffer deadlock

2001-06-30 Thread Marcelo Tosatti



On Fri, 29 Jun 2001, Steve Lord wrote:

 
 Has anyone else seen a hang like this:
 
   bdflush()
 flush_dirty_buffers()
   ll_rw_block()
   submit_bh(buffer X)
 generic_make_request()
   __make_request()
   create_bounce()
 alloc_bounce_page()
   alloc_page()
 try_to_free_pages()
   do_try_to_free_pages()
 page_launder()
   try_to_free_buffers( , 2)  -- i.e. wait for buffers
 sync_page_buffers()
   __wait_on_buffer(buffer X)
 
 Where the buffer head X going in the top of the stack is the same as the one
 we wait on at the bottom.
 
 There still seems to be nothing to prevent the try to free buffers from
 blocking on a buffer like this. Setting a flag on the buffer around the
 create_bounce call, and skipping it in the try_to_free_buffers path would
 be one approach to avoiding this.

Yes there is a bug: Linus is going to put a new fix soon. 

 I hit this in 2.4.6-pre6, and I don't see anything in the ac series to protect
 against it.

Thats because the -ac series does not contain the __GFP_BUFFER/__GFP_IO
moditications which are in the -ac series.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bounce buffer deadlock

2001-06-30 Thread Marcelo Tosatti



On Sat, 30 Jun 2001, Marcelo Tosatti wrote:

 
 
 On Fri, 29 Jun 2001, Steve Lord wrote:
 
  
  Has anyone else seen a hang like this:
  
bdflush()
  flush_dirty_buffers()
ll_rw_block()
  submit_bh(buffer X)
generic_make_request()
  __make_request()
  create_bounce()
alloc_bounce_page()
  alloc_page()
try_to_free_pages()
  do_try_to_free_pages()
page_launder()
  try_to_free_buffers( , 2)  -- i.e. wait for buffers
sync_page_buffers()
  __wait_on_buffer(buffer X)
  
  Where the buffer head X going in the top of the stack is the same as the one
  we wait on at the bottom.
  
  There still seems to be nothing to prevent the try to free buffers from
  blocking on a buffer like this. Setting a flag on the buffer around the
  create_bounce call, and skipping it in the try_to_free_buffers path would
  be one approach to avoiding this.
 
 Yes there is a bug: Linus is going to put a new fix soon. 
 
  I hit this in 2.4.6-pre6, and I don't see anything in the ac series to protect
  against it.
 
 Thats because the -ac series does not contain the __GFP_BUFFER/__GFP_IO
 moditications which are in the -ac series.
  ^^ 

I mean 2.4.6-pre, sorry.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bounce buffer deadlock

2001-06-30 Thread Steve Lord

 
 
 On Fri, 29 Jun 2001, Steve Lord wrote:
 
  
  Has anyone else seen a hang like this:
  
bdflush()
  flush_dirty_buffers()
ll_rw_block()
  submit_bh(buffer X)
generic_make_request()
  __make_request()
  create_bounce()
alloc_bounce_page()
  alloc_page()
try_to_free_pages()
  do_try_to_free_pages()
page_launder()
  try_to_free_buffers( , 2)  -- i.e. wait for buffers
sync_page_buffers()
  __wait_on_buffer(buffer X)
  
  Where the buffer head X going in the top of the stack is the same as the on
 e
  we wait on at the bottom.
  
  There still seems to be nothing to prevent the try to free buffers from
  blocking on a buffer like this. Setting a flag on the buffer around the
  create_bounce call, and skipping it in the try_to_free_buffers path would
  be one approach to avoiding this.
 
 Yes there is a bug: Linus is going to put a new fix soon. 
 
  I hit this in 2.4.6-pre6, and I don't see anything in the ac series to prot
 ect
  against it.
 
 Thats because the -ac series does not contain the __GFP_BUFFER/__GFP_IO
 moditications which are in the -ac series.

It looks to me as if all memory allocations of type GFP_BUFFER which happen
in generic_make_request downwards can hit the same type of deadlock, so
bounce buffers, the request functions of the raid and lvm paths can all
end up in try_to_free_buffers on a buffer they themselves hold the lock on.

If the fix is to avoid page_launder in these cases then the number of
occurrences when an alloc_pages fails will go up. I was attempting to
come up with a way of making try_to_free_buffers fail on buffers which
are being processed in the generic_make_request path by marking them,
the problem is there is no single place to reset the state of a buffer so
that  try_to_free_buffers will wait for it. Doing it after the end of the
loop in generic_make_request is race prone to say the least.

Steve



-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bounce buffer deadlock

2001-06-30 Thread Linus Torvalds


On Fri, 29 Jun 2001, Steve Lord wrote:

 Has anyone else seen a hang like this:

   bdflush()
 flush_dirty_buffers()
   ll_rw_block()
   submit_bh(buffer X)
 generic_make_request()
   __make_request()
   create_bounce()
 alloc_bounce_page()
   alloc_page()
 try_to_free_pages()
   do_try_to_free_pages()
 page_launder()
   try_to_free_buffers( , 2)  -- i.e. wait for buffers
 sync_page_buffers()
   __wait_on_buffer(buffer X)

Should be fixed in 2.4.6-pre8 (or pre7, for that matter).

Linus

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bounce buffer deadlock

2001-06-30 Thread Marcelo Tosatti



On Sat, 30 Jun 2001, Steve Lord wrote:

  
  
  On Fri, 29 Jun 2001, Steve Lord wrote:
  
   
   Has anyone else seen a hang like this:
   
 bdflush()
   flush_dirty_buffers()
 ll_rw_block()
 submit_bh(buffer X)
   generic_make_request()
 __make_request()
 create_bounce()
   alloc_bounce_page()
 alloc_page()
   try_to_free_pages()
 do_try_to_free_pages()
   page_launder()
 try_to_free_buffers( , 2)  -- i.e. wait for buffers
   sync_page_buffers()
 __wait_on_buffer(buffer X)
   
   Where the buffer head X going in the top of the stack is the same as the on
  e
   we wait on at the bottom.
   
   There still seems to be nothing to prevent the try to free buffers from
   blocking on a buffer like this. Setting a flag on the buffer around the
   create_bounce call, and skipping it in the try_to_free_buffers path would
   be one approach to avoiding this.
  
  Yes there is a bug: Linus is going to put a new fix soon. 
  
   I hit this in 2.4.6-pre6, and I don't see anything in the ac series to prot
  ect
   against it.
  
  Thats because the -ac series does not contain the __GFP_BUFFER/__GFP_IO
  moditications which are in the -ac series.
 
 It looks to me as if all memory allocations of type GFP_BUFFER which happen
 in generic_make_request downwards can hit the same type of deadlock,
 bounce buffers, the request functions of the raid and lvm paths can all
 end up in try_to_free_buffers on a buffer they themselves hold the lock on.

Yes. 2.4.6-pre8 fixes that (not sure if its up already). 

 If the fix is to avoid page_launder in these cases then the number of
 occurrences when an alloc_pages fails will go up. 

 I was attempting to come up with a way of making try_to_free_buffers
 fail on buffers which are being processed in the generic_make_request
 path by marking them, the problem is there is no single place to reset
 the state of a buffer so that try_to_free_buffers will wait for it.
 Doing it after the end of the loop in generic_make_request is race
 prone to say the least.

I really want to fix things like this in 2.5. (ie not avoid the deadlock
by completly avoiding physical IO, but avoid the deadlock by avoiding
physical IO on the device which is doing the allocation)

Could you send me your code ? No problem if it does not work at all :)


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bounce buffer deadlock

2001-06-30 Thread Linus Torvalds


On Sat, 30 Jun 2001, Steve Lord wrote:

 It looks to me as if all memory allocations of type GFP_BUFFER which happen
 in generic_make_request downwards can hit the same type of deadlock, so
 bounce buffers, the request functions of the raid and lvm paths can all
 end up in try_to_free_buffers on a buffer they themselves hold the lock on.

.. which is why GFP_BUFFER doesn't exist any more in the most recent
pre-kernels (oops, this is pre8 only, not pre7 like I said in the previous
email)

The problem is that GFP_BUFFER used to mean two things: don't call
low-level filesystem and don't do IO. Some of the pre-kernels starting
to make it mean don't call low-level FS only. The later ones split up
the semantics, so that the cases which care about FS deadlocks use
GFP_NOFS, and the cases that care about IO recursion use GFP_NOIO, so
that we don't overload the meaning of GFP_BUFFER.

That allows us to do the best we can - still flushing out dirty buffers
when that's ok (like when a filesystem wants more memory), and giving the
allocator better control over exactly _what_ he objects to.

Linus

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bounce buffer deadlock

2001-06-30 Thread Steve Lord

 
 On Sat, 30 Jun 2001, Steve Lord wrote:
 
  It looks to me as if all memory allocations of type GFP_BUFFER which happen
  in generic_make_request downwards can hit the same type of deadlock, so
  bounce buffers, the request functions of the raid and lvm paths can all
  end up in try_to_free_buffers on a buffer they themselves hold the lock on.
 
 .. which is why GFP_BUFFER doesn't exist any more in the most recent
 pre-kernels (oops, this is pre8 only, not pre7 like I said in the previous
 email)
 
 The problem is that GFP_BUFFER used to mean two things: don't call
 low-level filesystem and don't do IO. Some of the pre-kernels starting
 to make it mean don't call low-level FS only. The later ones split up
 the semantics, so that the cases which care about FS deadlocks use
 GFP_NOFS, and the cases that care about IO recursion use GFP_NOIO, so
 that we don't overload the meaning of GFP_BUFFER.
 
 That allows us to do the best we can - still flushing out dirty buffers
 when that's ok (like when a filesystem wants more memory), and giving the
 allocator better control over exactly _what_ he objects to.
 
   Linus

OK, sounds reasonable, time to go download and merge again I guess!

Steve



-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bounce buffer deadlock

2001-06-30 Thread Linus Torvalds


On Sat, 30 Jun 2001, Steve Lord wrote:

 OK, sounds reasonable, time to go download and merge again I guess!

For 2.4.7 or so, I'll make a backwards-compatibility define (ie make
GFP_BUFFER be the same as the new GFP_NOIO, which is the historical
behaviour and the anally safe value, if not very efficient), but I'm
planning on releasing 2.4.6 without it, to try to flush out people who are
able to take advantage of the new extended semantics out of the
woodworks..

Linus

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bounce buffer deadlock

2001-06-30 Thread Steve Lord


 Yes. 2.4.6-pre8 fixes that (not sure if its up already). 

It is up.

 
  If the fix is to avoid page_launder in these cases then the number of
  occurrences when an alloc_pages fails will go up. 
 
  I was attempting to come up with a way of making try_to_free_buffers
  fail on buffers which are being processed in the generic_make_request
  path by marking them, the problem is there is no single place to reset
  the state of a buffer so that try_to_free_buffers will wait for it.
  Doing it after the end of the loop in generic_make_request is race
  prone to say the least.
 
 I really want to fix things like this in 2.5. (ie not avoid the deadlock
 by completly avoiding physical IO, but avoid the deadlock by avoiding
 physical IO on the device which is doing the allocation)
 
 Could you send me your code ? No problem if it does not work at all :)
 

Well, the basic idea is simple, but I suspect the implementation might
rapidly become historical in 2.5. Basically I added a new buffer state bit,
although BH_Req looks like it could be cannibalized, no one appears to check
for it (is it really dead code?). 

Using a flag to skip buffers in try_to_free_buffers is easy:


===
Index: linux/fs/buffer.c
===

--- /usr/tmp/TmpDir.3237-0/linux/fs/buffer.c_1.68   Sat Jun 30 12:56:29 2001
+++ linux/fs/buffer.c   Sat Jun 30 12:57:52 2001
@@ -2365,7 +2365,7 @@
 /*
  * Can the buffer be thrown out?
  */
-#define BUFFER_BUSY_BITS   ((1BH_Dirty) | (1BH_Lock) | (1BH_Protected))
+#define BUFFER_BUSY_BITS   ((1BH_Dirty) | (1BH_Lock) | (1BH_Protected) | 
+(1BH_Clamped))
 #define buffer_busy(bh)(atomic_read((bh)-b_count) | ((bh)-b_state 
 BUFFER_BUSY_BITS))
 
 /*
@@ -2430,7 +2430,11 @@
spin_unlock(free_list[index].lock);
write_unlock(hash_table_lock);
spin_unlock(lru_list_lock);
-   if (wait) {
+   /* Buffers in the middle of generic_make_request processing cannot
+* be waited for, they may be allocating memory right now and be
+* locked by this thread.
+*/
+   if (wait  !buffer_clamped(tmp)) {
sync_page_buffers(bh, wait);
/* We waited synchronously, so we can free the buffers. */
if (wait  1  !loop) {

===
Index: linux/include/linux/fs.h
===

--- /usr/tmp/TmpDir.3237-0/linux/include/linux/fs.h_1.99Sat Jun 30 12:56:29 
2001
+++ linux/include/linux/fs.hSat Jun 30 07:05:37 2001
@@ -224,6 +224,8 @@
BH_Mapped,  /* 1 if the buffer has a disk mapping */
BH_New, /* 1 if the buffer is new and not yet written out */
BH_Protected,   /* 1 if the buffer is protected */
+   BH_Clamped, /* 1 if the buffer cannot be reclaimed
+* in it's current state */
BH_Delay,   /* 1 if the buffer is delayed allocate */
 
BH_PrivateStart,/* not a state bit, but the first bit available
@@ -286,6 +288,7 @@
 #define buffer_mapped(bh)  __buffer_state(bh,Mapped)
 #define buffer_new(bh) __buffer_state(bh,New)
 #define buffer_protected(bh)   __buffer_state(bh,Protected)
+#define buffer_clamped(bh) __buffer_state(bh,Clamped)
 #define buffer_delay(bh)   __buffer_state(bh,Delay)
 
 #define bh_offset(bh)  ((unsigned long)(bh)-b_data  ~PAGE_MASK)


The tricky part which I had not worked out how to do yet is to manage the
clearing of a state bit in all the correct places. You would have to set it
when the buffer got locked when I/O was about to start, it becomes clearable
after the last memory allocation during the I/O submission process. I do
not like the approach because there are so many ways a buffer can go
once you get into generic_make_request. At first I thought I could just
explicitly set and clear a flag around memory allocations like the bounce
buffer path. However, that can lead to AB BA deadlocks between multiple
threads submitting I/O requests. At this point I started to think I was
going to build an unmaintainable rats nest and decided I had not got
the correct answer.

I am not sure that an approach which avoids a specific device will fly either,
all the I/O can be on one device, and what does device mean when it comes
to md/lvm and request remapping?

Steve


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bounce buffer deadlock

2001-06-30 Thread Steve Lord

 
 On Sat, 30 Jun 2001, Steve Lord wrote:
 
  OK, sounds reasonable, time to go download and merge again I guess!
 
 For 2.4.7 or so, I'll make a backwards-compatibility define (ie make
 GFP_BUFFER be the same as the new GFP_NOIO, which is the historical
 behaviour and the anally safe value, if not very efficient), but I'm
 planning on releasing 2.4.6 without it, to try to flush out people who are
 able to take advantage of the new extended semantics out of the
 woodworks..
 
   Linus

Consider XFS flushed out (once I merge). This, for us, is the tricky part:


[EMAIL PROTECTED] said:
 That allows us to do the best we can - still flushing out dirty
 buffers when that's ok (like when a filesystem wants more memory), and
 giving the allocator better control over exactly _what_ he objects to.

XFS introduces the concept of the low level flush of a buffer not always
being really a low level flush, since a delayed allocate buffer can end
up reentering the filesystem in order to create the true on disk allocation.
This in turn can cause a transaction and more memory allocations. The really
nasty case we were using GFP_BUFFER for is a memory allocation which is from
within a transaction, we cannot afford to have another transaction start out
of the bottom of memory allocation as it may require resources locked by
the transaction which is allocating memory.

Steve



-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bounce buffer deadlock

2001-06-30 Thread Marcelo Tosatti



On Sat, 30 Jun 2001, Steve Lord wrote:

 
  Yes. 2.4.6-pre8 fixes that (not sure if its up already). 
 
 It is up.
 
  
   If the fix is to avoid page_launder in these cases then the number of
   occurrences when an alloc_pages fails will go up. 
  
   I was attempting to come up with a way of making try_to_free_buffers
   fail on buffers which are being processed in the generic_make_request
   path by marking them, the problem is there is no single place to reset
   the state of a buffer so that try_to_free_buffers will wait for it.
   Doing it after the end of the loop in generic_make_request is race
   prone to say the least.
  
  I really want to fix things like this in 2.5. (ie not avoid the deadlock
  by completly avoiding physical IO, but avoid the deadlock by avoiding
  physical IO on the device which is doing the allocation)
  
  Could you send me your code ? No problem if it does not work at all :)
  
 
 Well, the basic idea is simple, but I suspect the implementation might
 rapidly become historical in 2.5. Basically I added a new buffer state bit,
 although BH_Req looks like it could be cannibalized, no one appears to check
 for it (is it really dead code?). 

Dunno. Ask Jens :)

 Using a flag to skip buffers in try_to_free_buffers is easy:

I was thinking more about skipping the buffers which are owned by the
device which is doing the allocation. 

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bounce buffer deadlock

2001-06-29 Thread Alan Cox

> Has anyone else seen a hang like this:
> 
>   bdflush()
> flush_dirty_buffers()
>   ll_rw_block()
>   submit_bh(buffer X)
> generic_make_request()
>   __make_request()
>   create_bounce()
> alloc_bounce_page()
>   alloc_page()
> try_to_free_pages()
>   do_try_to_free_pages()
> page_launder()
>   try_to_free_buffers( , 2)  -- i.e. wait for buffers
> sync_page_buffers()
>   __wait_on_buffer(buffer X)

Whoops. We actually carefully set PF_MEMALLOC to avoid deadlocks here but if
the buffer is laundered ummm

Looks like page_launder needs to be more careful

> I hit this in 2.4.6-pre6, and I don't see anything in the ac series to protect
> against it.

Not this one no

Alan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Bounce buffer deadlock

2001-06-29 Thread Steve Lord


Has anyone else seen a hang like this:

  bdflush()
flush_dirty_buffers()
  ll_rw_block()
submit_bh(buffer X)
  generic_make_request()
__make_request()
create_bounce()
  alloc_bounce_page()
alloc_page()
  try_to_free_pages()
do_try_to_free_pages()
  page_launder()
try_to_free_buffers( , 2)  -- i.e. wait for buffers
  sync_page_buffers()
__wait_on_buffer(buffer X)

Where the buffer head X going in the top of the stack is the same as the one
we wait on at the bottom.

There still seems to be nothing to prevent the try to free buffers from
blocking on a buffer like this. Setting a flag on the buffer around the
create_bounce call, and skipping it in the try_to_free_buffers path would
be one approach to avoiding this.

I hit this in 2.4.6-pre6, and I don't see anything in the ac series to protect
against it.

Steve


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Bounce buffer deadlock

2001-06-29 Thread Steve Lord


Has anyone else seen a hang like this:

  bdflush()
flush_dirty_buffers()
  ll_rw_block()
submit_bh(buffer X)
  generic_make_request()
__make_request()
create_bounce()
  alloc_bounce_page()
alloc_page()
  try_to_free_pages()
do_try_to_free_pages()
  page_launder()
try_to_free_buffers( , 2)  -- i.e. wait for buffers
  sync_page_buffers()
__wait_on_buffer(buffer X)

Where the buffer head X going in the top of the stack is the same as the one
we wait on at the bottom.

There still seems to be nothing to prevent the try to free buffers from
blocking on a buffer like this. Setting a flag on the buffer around the
create_bounce call, and skipping it in the try_to_free_buffers path would
be one approach to avoiding this.

I hit this in 2.4.6-pre6, and I don't see anything in the ac series to protect
against it.

Steve


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bounce buffer deadlock

2001-06-29 Thread Alan Cox

 Has anyone else seen a hang like this:
 
   bdflush()
 flush_dirty_buffers()
   ll_rw_block()
   submit_bh(buffer X)
 generic_make_request()
   __make_request()
   create_bounce()
 alloc_bounce_page()
   alloc_page()
 try_to_free_pages()
   do_try_to_free_pages()
 page_launder()
   try_to_free_buffers( , 2)  -- i.e. wait for buffers
 sync_page_buffers()
   __wait_on_buffer(buffer X)

Whoops. We actually carefully set PF_MEMALLOC to avoid deadlocks here but if
the buffer is laundered ummm

Looks like page_launder needs to be more careful

 I hit this in 2.4.6-pre6, and I don't see anything in the ac series to protect
 against it.

Not this one no

Alan

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/