Re: Bounce buffer deadlock
On Sat, 30 Jun 2001, Steve Lord wrote:
> > Yes. 2.4.6-pre8 fixes that (not sure if its up already).
>
> It is up.
>
> > > If the fix is to avoid page_launder in these cases then the number of
> > > occurrences when an alloc_pages fails will go up.
> > >
> > > I was attempting to come up with a way of making try_to_free_buffers
> > > fail on buffers which are being processed in the generic_make_request
> > > path by marking them, the problem is there is no single place to reset
> > > the state of a buffer so that try_to_free_buffers will wait for it.
> > > Doing it after the end of the loop in generic_make_request is race
> > > prone to say the least.
> >
> > I really want to fix things like this in 2.5. (ie not avoid the deadlock
> > by completely avoiding physical IO, but avoid the deadlock by avoiding
> > physical IO on the "device" which is doing the allocation)
> >
> > Could you send me your code? No problem if it does not work at all :)
>
> Well, the basic idea is simple, but I suspect the implementation might
> rapidly become historical in 2.5. Basically I added a new buffer state
> bit, although BH_Req looks like it could be cannibalized; no one appears
> to check for it (is it really dead code?).

Dunno. Ask Jens :)

> Using a flag to skip buffers in try_to_free_buffers is easy:

I was thinking more about skipping the buffers which are "owned" by the
device which is doing the allocation.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Bounce buffer deadlock
> On Sat, 30 Jun 2001, Steve Lord wrote:
> >
> > OK, sounds reasonable, time to go download and merge again I guess!
>
> For 2.4.7 or so, I'll make a backwards-compatibility define (ie make
> GFP_BUFFER be the same as the new GFP_NOIO, which is the historical
> behaviour and the anally safe value, if not very efficient), but I'm
> planning on releasing 2.4.6 without it, to try to flush out people who
> are able to take advantage of the new extended semantics out of the
> woodworks..
>
> 		Linus

Consider XFS flushed out (once I merge). This, for us, is the tricky part:

[EMAIL PROTECTED] said:
>> That allows us to do the best we can - still flushing out dirty
>> buffers when that's ok (like when a filesystem wants more memory), and
>> giving the allocator better control over exactly _what_ he objects to.

XFS introduces the concept of the low-level flush of a buffer not always
being really a low-level flush, since a delayed-allocate buffer can end up
reentering the filesystem in order to create the true on-disk allocation.
This in turn can cause a transaction and more memory allocations.

The really nasty case we were using GFP_BUFFER for is a memory allocation
from within a transaction: we cannot afford to have another transaction
start out of the bottom of a memory allocation, as it may require resources
locked by the transaction which is allocating memory.

Steve
Re: Bounce buffer deadlock
> Yes. 2.4.6-pre8 fixes that (not sure if its up already).

It is up.

> > If the fix is to avoid page_launder in these cases then the number of
> > occurrences when an alloc_pages fails will go up.
> >
> > I was attempting to come up with a way of making try_to_free_buffers
> > fail on buffers which are being processed in the generic_make_request
> > path by marking them, the problem is there is no single place to reset
> > the state of a buffer so that try_to_free_buffers will wait for it.
> > Doing it after the end of the loop in generic_make_request is race
> > prone to say the least.
>
> I really want to fix things like this in 2.5. (ie not avoid the deadlock
> by completely avoiding physical IO, but avoid the deadlock by avoiding
> physical IO on the "device" which is doing the allocation)
>
> Could you send me your code? No problem if it does not work at all :)

Well, the basic idea is simple, but I suspect the implementation might
rapidly become historical in 2.5. Basically I added a new buffer state bit,
although BH_Req looks like it could be cannibalized; no one appears to
check for it (is it really dead code?).

Using a flag to skip buffers in try_to_free_buffers is easy:

===
Index: linux/fs/buffer.c
===
--- /usr/tmp/TmpDir.3237-0/linux/fs/buffer.c_1.68	Sat Jun 30 12:56:29 2001
+++ linux/fs/buffer.c	Sat Jun 30 12:57:52 2001
@@ -2365,7 +2365,7 @@
 /*
  * Can the buffer be thrown out?
  */
-#define BUFFER_BUSY_BITS	((1<<BH_Dirty) | (1<<BH_Lock) | (1<<BH_Protected))
+#define BUFFER_BUSY_BITS	((1<<BH_Dirty) | (1<<BH_Lock) | (1<<BH_Protected) | \
+				 (1<<BH_Clamped))
 #define buffer_busy(bh)	(atomic_read(&(bh)->b_count) | ((bh)->b_state & BUFFER_BUSY_BITS))

@@ -2430,7 +2430,11 @@
 		spin_unlock(&free_list[index].lock);
 		write_unlock(&hash_table_lock);
 		spin_unlock(&lru_list_lock);
-		if (wait) {
+		/* Buffers in the middle of generic_make_request processing
+		 * cannot be waited for, they may be allocating memory right
+		 * now and be locked by this thread.
+		 */
+		if (wait && !buffer_clamped(tmp)) {
 			sync_page_buffers(bh, wait);
 			/* We waited synchronously, so we can free the buffers. */
 			if (wait > 1 && !loop) {

===
Index: linux/include/linux/fs.h
===
--- /usr/tmp/TmpDir.3237-0/linux/include/linux/fs.h_1.99	Sat Jun 30 12:56:29 2001
+++ linux/include/linux/fs.h	Sat Jun 30 07:05:37 2001
@@ -224,6 +224,8 @@
 	BH_Mapped,	/* 1 if the buffer has a disk mapping */
 	BH_New,		/* 1 if the buffer is new and not yet written out */
 	BH_Protected,	/* 1 if the buffer is protected */
+	BH_Clamped,	/* 1 if the buffer cannot be reclaimed
+			 * in its current state */
 	BH_Delay,	/* 1 if the buffer is delayed allocate */

 	BH_PrivateStart,/* not a state bit, but the first bit available
@@ -286,6 +288,7 @@
 #define buffer_mapped(bh)	__buffer_state(bh,Mapped)
 #define buffer_new(bh)		__buffer_state(bh,New)
 #define buffer_protected(bh)	__buffer_state(bh,Protected)
+#define buffer_clamped(bh)	__buffer_state(bh,Clamped)
 #define buffer_delay(bh)	__buffer_state(bh,Delay)

 #define bh_offset(bh)		((unsigned long)(bh)->b_data & ~PAGE_MASK)

The tricky part which I had not worked out how to do yet is to manage the
clearing of the state bit in all the correct places. You would have to set
it when the buffer got locked as I/O was about to start; it becomes
clearable only after the last memory allocation during the I/O submission
process. I do not like the approach because there are so many ways a buffer
can go once you get into generic_make_request.

At first I thought I could just explicitly set and clear a flag around
memory allocations like the bounce buffer path. However, that can lead to
AB-BA deadlocks between multiple threads submitting I/O requests. At this
point I started to think I was going to build an unmaintainable rat's nest
and decided I had not got the correct answer.

I am not sure that an approach which avoids a specific device will fly
either: all the I/O can be on one device, and what does "device" mean when
it comes to md/lvm and request remapping?
Steve
Re: Bounce buffer deadlock
On Sat, 30 Jun 2001, Steve Lord wrote:
>
> OK, sounds reasonable, time to go download and merge again I guess!

For 2.4.7 or so, I'll make a backwards-compatibility define (ie make
GFP_BUFFER be the same as the new GFP_NOIO, which is the historical
behaviour and the anally safe value, if not very efficient), but I'm
planning on releasing 2.4.6 without it, to try to flush out people who are
able to take advantage of the new extended semantics out of the
woodworks..

		Linus
Re: Bounce buffer deadlock
> On Sat, 30 Jun 2001, Steve Lord wrote:
> >
> > It looks to me as if all memory allocations of type GFP_BUFFER which
> > happen in generic_make_request downwards can hit the same type of
> > deadlock, so bounce buffers, the request functions of the raid and lvm
> > paths can all end up in try_to_free_buffers on a buffer they themselves
> > hold the lock on.
>
> .. which is why GFP_BUFFER doesn't exist any more in the most recent
> pre-kernels (oops, this is pre8 only, not pre7 like I said in the
> previous email)
>
> The problem is that GFP_BUFFER used to mean two things: "don't call
> low-level filesystem" and "don't do IO". Some of the pre-kernels started
> to make it mean "don't call low-level FS" only. The later ones split up
> the semantics, so that the cases which care about FS deadlocks use
> "GFP_NOFS", and the cases that care about IO recursion use "GFP_NOIO",
> so that we don't overload the meaning of GFP_BUFFER.
>
> That allows us to do the best we can - still flushing out dirty buffers
> when that's ok (like when a filesystem wants more memory), and giving
> the allocator better control over exactly _what_ he objects to.
>
> 		Linus

OK, sounds reasonable, time to go download and merge again I guess!

Steve
Re: Bounce buffer deadlock
On Sat, 30 Jun 2001, Steve Lord wrote:
>
> It looks to me as if all memory allocations of type GFP_BUFFER which
> happen in generic_make_request downwards can hit the same type of
> deadlock, so bounce buffers, the request functions of the raid and lvm
> paths can all end up in try_to_free_buffers on a buffer they themselves
> hold the lock on.

.. which is why GFP_BUFFER doesn't exist any more in the most recent
pre-kernels (oops, this is pre8 only, not pre7 like I said in the previous
email).

The problem is that GFP_BUFFER used to mean two things: "don't call
low-level filesystem" and "don't do IO". Some of the pre-kernels started to
make it mean "don't call low-level FS" only. The later ones split up the
semantics, so that the cases which care about FS deadlocks use "GFP_NOFS",
and the cases that care about IO recursion use "GFP_NOIO", so that we don't
overload the meaning of GFP_BUFFER.

That allows us to do the best we can - still flushing out dirty buffers
when that's ok (like when a filesystem wants more memory), and giving the
allocator better control over exactly _what_ he objects to.

		Linus
Re: Bounce buffer deadlock
On Sat, 30 Jun 2001, Steve Lord wrote:
> > On Fri, 29 Jun 2001, Steve Lord wrote:
> > >
> > > Has anyone else seen a hang like this:
> > >
> > > bdflush()
> > >  flush_dirty_buffers()
> > >   ll_rw_block()
> > >    submit_bh(buffer X)
> > >     generic_make_request()
> > >      __make_request()
> > >       create_bounce()
> > >        alloc_bounce_page()
> > >         alloc_page()
> > >          try_to_free_pages()
> > >           do_try_to_free_pages()
> > >            page_launder()
> > >             try_to_free_buffers( , 2)  -- i.e. wait for buffers
> > >              sync_page_buffers()
> > >               __wait_on_buffer(buffer X)
> > >
> > > Where the buffer head X going in the top of the stack is the same as
> > > the one we wait on at the bottom.
> > >
> > > There still seems to be nothing to prevent the try to free buffers
> > > from blocking on a buffer like this. Setting a flag on the buffer
> > > around the create_bounce call, and skipping it in the
> > > try_to_free_buffers path would be one approach to avoiding this.
> >
> > Yes there is a bug: Linus is going to put a new fix soon.
> >
> > > I hit this in 2.4.6-pre6, and I don't see anything in the ac series
> > > to protect against it.
> >
> > Thats because the -ac series does not contain the
> > __GFP_BUFFER/__GFP_IO modifications which are in the -ac series.
>
> It looks to me as if all memory allocations of type GFP_BUFFER which
> happen in generic_make_request downwards can hit the same type of
> deadlock, so bounce buffers, the request functions of the raid and lvm
> paths can all end up in try_to_free_buffers on a buffer they themselves
> hold the lock on.

Yes. 2.4.6-pre8 fixes that (not sure if its up already).

> If the fix is to avoid page_launder in these cases then the number of
> occurrences when an alloc_pages fails will go up.
>
> I was attempting to come up with a way of making try_to_free_buffers
> fail on buffers which are being processed in the generic_make_request
> path by marking them, the problem is there is no single place to reset
> the state of a buffer so that try_to_free_buffers will wait for it.
> Doing it after the end of the loop in generic_make_request is race
> prone to say the least.

I really want to fix things like this in 2.5. (ie not avoid the deadlock by
completely avoiding physical IO, but avoid the deadlock by avoiding
physical IO on the "device" which is doing the allocation)

Could you send me your code? No problem if it does not work at all :)
Re: Bounce buffer deadlock
> On Fri, 29 Jun 2001, Steve Lord wrote:
> >
> > Has anyone else seen a hang like this:
> >
> > bdflush()
> >  flush_dirty_buffers()
> >   ll_rw_block()
> >    submit_bh(buffer X)
> >     generic_make_request()
> >      __make_request()
> >       create_bounce()
> >        alloc_bounce_page()
> >         alloc_page()
> >          try_to_free_pages()
> >           do_try_to_free_pages()
> >            page_launder()
> >             try_to_free_buffers( , 2)  -- i.e. wait for buffers
> >              sync_page_buffers()
> >               __wait_on_buffer(buffer X)
> >
> > Where the buffer head X going in the top of the stack is the same as
> > the one we wait on at the bottom.
> >
> > There still seems to be nothing to prevent the try to free buffers from
> > blocking on a buffer like this. Setting a flag on the buffer around the
> > create_bounce call, and skipping it in the try_to_free_buffers path
> > would be one approach to avoiding this.
>
> Yes there is a bug: Linus is going to put a new fix soon.
>
> > I hit this in 2.4.6-pre6, and I don't see anything in the ac series to
> > protect against it.
>
> Thats because the -ac series does not contain the __GFP_BUFFER/__GFP_IO
> modifications which are in the -ac series.

It looks to me as if all memory allocations of type GFP_BUFFER which happen
in generic_make_request downwards can hit the same type of deadlock, so
bounce buffers, the request functions of the raid and lvm paths can all end
up in try_to_free_buffers on a buffer they themselves hold the lock on.

If the fix is to avoid page_launder in these cases then the number of
occurrences when an alloc_pages fails will go up.

I was attempting to come up with a way of making try_to_free_buffers fail
on buffers which are being processed in the generic_make_request path by
marking them; the problem is there is no single place to reset the state of
a buffer so that try_to_free_buffers will wait for it. Doing it after the
end of the loop in generic_make_request is race prone, to say the least.

Steve
Re: Bounce buffer deadlock
On Fri, 29 Jun 2001, Steve Lord wrote:
>
> Has anyone else seen a hang like this:
>
> bdflush()
>  flush_dirty_buffers()
>   ll_rw_block()
>    submit_bh(buffer X)
>     generic_make_request()
>      __make_request()
>       create_bounce()
>        alloc_bounce_page()
>         alloc_page()
>          try_to_free_pages()
>           do_try_to_free_pages()
>            page_launder()
>             try_to_free_buffers( , 2)  -- i.e. wait for buffers
>              sync_page_buffers()
>               __wait_on_buffer(buffer X)

Should be fixed in 2.4.6-pre8 (or pre7, for that matter).

		Linus
Re: Bounce buffer deadlock
On Sat, 30 Jun 2001, Marcelo Tosatti wrote:
> On Fri, 29 Jun 2001, Steve Lord wrote:
> >
> > Has anyone else seen a hang like this:
> >
> > bdflush()
> >  flush_dirty_buffers()
> >   ll_rw_block()
> >    submit_bh(buffer X)
> >     generic_make_request()
> >      __make_request()
> >       create_bounce()
> >        alloc_bounce_page()
> >         alloc_page()
> >          try_to_free_pages()
> >           do_try_to_free_pages()
> >            page_launder()
> >             try_to_free_buffers( , 2)  -- i.e. wait for buffers
> >              sync_page_buffers()
> >               __wait_on_buffer(buffer X)
> >
> > Where the buffer head X going in the top of the stack is the same as
> > the one we wait on at the bottom.
> >
> > There still seems to be nothing to prevent the try to free buffers from
> > blocking on a buffer like this. Setting a flag on the buffer around the
> > create_bounce call, and skipping it in the try_to_free_buffers path
> > would be one approach to avoiding this.
>
> Yes there is a bug: Linus is going to put a new fix soon.
>
> > I hit this in 2.4.6-pre6, and I don't see anything in the ac series to
> > protect against it.
>
> Thats because the -ac series does not contain the __GFP_BUFFER/__GFP_IO
> modifications which are in the -ac series.
                                ^^^^^^^^^^

I mean 2.4.6-pre, sorry.
Re: Bounce buffer deadlock
On Fri, 29 Jun 2001, Steve Lord wrote:
>
> Has anyone else seen a hang like this:
>
> bdflush()
>  flush_dirty_buffers()
>   ll_rw_block()
>    submit_bh(buffer X)
>     generic_make_request()
>      __make_request()
>       create_bounce()
>        alloc_bounce_page()
>         alloc_page()
>          try_to_free_pages()
>           do_try_to_free_pages()
>            page_launder()
>             try_to_free_buffers( , 2)  -- i.e. wait for buffers
>              sync_page_buffers()
>               __wait_on_buffer(buffer X)
>
> Where the buffer head X going in the top of the stack is the same as the
> one we wait on at the bottom.
>
> There still seems to be nothing to prevent the try to free buffers from
> blocking on a buffer like this. Setting a flag on the buffer around the
> create_bounce call, and skipping it in the try_to_free_buffers path would
> be one approach to avoiding this.

Yes there is a bug: Linus is going to put a new fix soon.

> I hit this in 2.4.6-pre6, and I don't see anything in the ac series to
> protect against it.

Thats because the -ac series does not contain the __GFP_BUFFER/__GFP_IO
modifications which are in the -ac series.
Re: Bounce buffer deadlock
On Fri, 29 Jun 2001, Steve Lord wrote: Has anyone else seen a hang like this: bdflush() flush_dirty_buffers() ll_rw_block() submit_bh(buffer X) generic_make_request() __make_request() create_bounce() alloc_bounce_page() alloc_page() try_to_free_pages() do_try_to_free_pages() page_launder() try_to_free_buffers( , 2) -- i.e. wait for buffers sync_page_buffers() __wait_on_buffer(buffer X) Where the buffer head X going in the top of the stack is the same as the one we wait on at the bottom. There still seems to be nothing to prevent the try to free buffers from blocking on a buffer like this. Setting a flag on the buffer around the create_bounce call, and skipping it in the try_to_free_buffers path would be one approach to avoiding this. Yes there is a bug: Linus is going to put a new fix soon. I hit this in 2.4.6-pre6, and I don't see anything in the ac series to protect against it. Thats because the -ac series does not contain the __GFP_BUFFER/__GFP_IO moditications which are in the -ac series. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Bounce buffer deadlock
On Sat, 30 Jun 2001, Marcelo Tosatti wrote: On Fri, 29 Jun 2001, Steve Lord wrote: Has anyone else seen a hang like this: bdflush() flush_dirty_buffers() ll_rw_block() submit_bh(buffer X) generic_make_request() __make_request() create_bounce() alloc_bounce_page() alloc_page() try_to_free_pages() do_try_to_free_pages() page_launder() try_to_free_buffers( , 2) -- i.e. wait for buffers sync_page_buffers() __wait_on_buffer(buffer X) Where the buffer head X going in the top of the stack is the same as the one we wait on at the bottom. There still seems to be nothing to prevent the try to free buffers from blocking on a buffer like this. Setting a flag on the buffer around the create_bounce call, and skipping it in the try_to_free_buffers path would be one approach to avoiding this. Yes there is a bug: Linus is going to put a new fix soon. I hit this in 2.4.6-pre6, and I don't see anything in the ac series to protect against it. Thats because the -ac series does not contain the __GFP_BUFFER/__GFP_IO moditications which are in the -ac series. ^^ I mean 2.4.6-pre, sorry. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Bounce buffer deadlock
On Fri, 29 Jun 2001, Steve Lord wrote: Has anyone else seen a hang like this: bdflush() flush_dirty_buffers() ll_rw_block() submit_bh(buffer X) generic_make_request() __make_request() create_bounce() alloc_bounce_page() alloc_page() try_to_free_pages() do_try_to_free_pages() page_launder() try_to_free_buffers( , 2) -- i.e. wait for buffers sync_page_buffers() __wait_on_buffer(buffer X) Where the buffer head X going in the top of the stack is the same as the on e we wait on at the bottom. There still seems to be nothing to prevent the try to free buffers from blocking on a buffer like this. Setting a flag on the buffer around the create_bounce call, and skipping it in the try_to_free_buffers path would be one approach to avoiding this. Yes there is a bug: Linus is going to put a new fix soon. I hit this in 2.4.6-pre6, and I don't see anything in the ac series to prot ect against it. Thats because the -ac series does not contain the __GFP_BUFFER/__GFP_IO moditications which are in the -ac series. It looks to me as if all memory allocations of type GFP_BUFFER which happen in generic_make_request downwards can hit the same type of deadlock, so bounce buffers, the request functions of the raid and lvm paths can all end up in try_to_free_buffers on a buffer they themselves hold the lock on. If the fix is to avoid page_launder in these cases then the number of occurrences when an alloc_pages fails will go up. I was attempting to come up with a way of making try_to_free_buffers fail on buffers which are being processed in the generic_make_request path by marking them, the problem is there is no single place to reset the state of a buffer so that try_to_free_buffers will wait for it. Doing it after the end of the loop in generic_make_request is race prone to say the least. 
Steve - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Bounce buffer deadlock
On Fri, 29 Jun 2001, Steve Lord wrote: Has anyone else seen a hang like this: bdflush() flush_dirty_buffers() ll_rw_block() submit_bh(buffer X) generic_make_request() __make_request() create_bounce() alloc_bounce_page() alloc_page() try_to_free_pages() do_try_to_free_pages() page_launder() try_to_free_buffers( , 2) -- i.e. wait for buffers sync_page_buffers() __wait_on_buffer(buffer X) Should be fixed in 2.4.6-pre8 (or pre7, for that matter). Linus - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Bounce buffer deadlock
On Sat, 30 Jun 2001, Steve Lord wrote: On Fri, 29 Jun 2001, Steve Lord wrote: Has anyone else seen a hang like this: bdflush() flush_dirty_buffers() ll_rw_block() submit_bh(buffer X) generic_make_request() __make_request() create_bounce() alloc_bounce_page() alloc_page() try_to_free_pages() do_try_to_free_pages() page_launder() try_to_free_buffers( , 2) -- i.e. wait for buffers sync_page_buffers() __wait_on_buffer(buffer X) Where the buffer head X going in the top of the stack is the same as the on e we wait on at the bottom. There still seems to be nothing to prevent the try to free buffers from blocking on a buffer like this. Setting a flag on the buffer around the create_bounce call, and skipping it in the try_to_free_buffers path would be one approach to avoiding this. Yes there is a bug: Linus is going to put a new fix soon. I hit this in 2.4.6-pre6, and I don't see anything in the ac series to prot ect against it. Thats because the -ac series does not contain the __GFP_BUFFER/__GFP_IO moditications which are in the -ac series. It looks to me as if all memory allocations of type GFP_BUFFER which happen in generic_make_request downwards can hit the same type of deadlock, bounce buffers, the request functions of the raid and lvm paths can all end up in try_to_free_buffers on a buffer they themselves hold the lock on. Yes. 2.4.6-pre8 fixes that (not sure if its up already). If the fix is to avoid page_launder in these cases then the number of occurrences when an alloc_pages fails will go up. I was attempting to come up with a way of making try_to_free_buffers fail on buffers which are being processed in the generic_make_request path by marking them, the problem is there is no single place to reset the state of a buffer so that try_to_free_buffers will wait for it. Doing it after the end of the loop in generic_make_request is race prone to say the least. I really want to fix things like this in 2.5. 
(ie not avoid the deadlock by completly avoiding physical IO, but avoid the deadlock by avoiding physical IO on the device which is doing the allocation) Could you send me your code ? No problem if it does not work at all :) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Bounce buffer deadlock
On Sat, 30 Jun 2001, Steve Lord wrote: It looks to me as if all memory allocations of type GFP_BUFFER which happen in generic_make_request downwards can hit the same type of deadlock, so bounce buffers, the request functions of the raid and lvm paths can all end up in try_to_free_buffers on a buffer they themselves hold the lock on. .. which is why GFP_BUFFER doesn't exist any more in the most recent pre-kernels (oops, this is pre8 only, not pre7 like I said in the previous email) The problem is that GFP_BUFFER used to mean two things: don't call low-level filesystem and don't do IO. Some of the pre-kernels starting to make it mean don't call low-level FS only. The later ones split up the semantics, so that the cases which care about FS deadlocks use GFP_NOFS, and the cases that care about IO recursion use GFP_NOIO, so that we don't overload the meaning of GFP_BUFFER. That allows us to do the best we can - still flushing out dirty buffers when that's ok (like when a filesystem wants more memory), and giving the allocator better control over exactly _what_ he objects to. Linus - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Bounce buffer deadlock
On Sat, 30 Jun 2001, Steve Lord wrote: It looks to me as if all memory allocations of type GFP_BUFFER which happen in generic_make_request downwards can hit the same type of deadlock, so bounce buffers, the request functions of the raid and lvm paths can all end up in try_to_free_buffers on a buffer they themselves hold the lock on. .. which is why GFP_BUFFER doesn't exist any more in the most recent pre-kernels (oops, this is pre8 only, not pre7 like I said in the previous email) The problem is that GFP_BUFFER used to mean two things: don't call low-level filesystem and don't do IO. Some of the pre-kernels starting to make it mean don't call low-level FS only. The later ones split up the semantics, so that the cases which care about FS deadlocks use GFP_NOFS, and the cases that care about IO recursion use GFP_NOIO, so that we don't overload the meaning of GFP_BUFFER. That allows us to do the best we can - still flushing out dirty buffers when that's ok (like when a filesystem wants more memory), and giving the allocator better control over exactly _what_ he objects to. Linus OK, sounds reasonable, time to go download and merge again I guess! Steve - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Bounce buffer deadlock
On Sat, 30 Jun 2001, Steve Lord wrote: OK, sounds reasonable, time to go download and merge again I guess! For 2.4.7 or so, I'll make a backwards-compatibility define (ie make GFP_BUFFER be the same as the new GFP_NOIO, which is the historical behaviour and the anally safe value, if not very efficient), but I'm planning on releasing 2.4.6 without it, to try to flush out people who are able to take advantage of the new extended semantics out of the woodworks.. Linus - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Bounce buffer deadlock
> > Yes. 2.4.6-pre8 fixes that (not sure if its up already).
>
> It is up.
>
> > > If the fix is to avoid page_launder in these cases then the number of
> > > occurrences when an alloc_pages fails will go up.
> >
> > I was attempting to come up with a way of making try_to_free_buffers
> > fail on buffers which are being processed in the generic_make_request
> > path by marking them, the problem is there is no single place to reset
> > the state of a buffer so that try_to_free_buffers will wait for it.
> > Doing it after the end of the loop in generic_make_request is race
> > prone to say the least.
>
> I really want to fix things like this in 2.5. (ie not avoid the deadlock
> by completely avoiding physical IO, but avoid the deadlock by avoiding
> physical IO on the "device" which is doing the allocation)
>
> Could you send me your code? No problem if it does not work at all :)

Well, the basic idea is simple, but I suspect the implementation might
rapidly become historical in 2.5. Basically I added a new buffer state bit,
although BH_Req looks like it could be cannibalized; no one appears to
check for it (is it really dead code?).

Using a flag to skip buffers in try_to_free_buffers is easy:

===================================================================
Index: linux/fs/buffer.c
===================================================================
--- /usr/tmp/TmpDir.3237-0/linux/fs/buffer.c_1.68	Sat Jun 30 12:56:29 2001
+++ linux/fs/buffer.c	Sat Jun 30 12:57:52 2001
@@ -2365,7 +2365,7 @@
 /*
  * Can the buffer be thrown out?
  */
-#define BUFFER_BUSY_BITS	((1<<BH_Dirty) | (1<<BH_Lock) | (1<<BH_Protected))
+#define BUFFER_BUSY_BITS	((1<<BH_Dirty) | (1<<BH_Lock) | (1<<BH_Protected) | \
+				 (1<<BH_Clamped))
 #define buffer_busy(bh)	(atomic_read(&(bh)->b_count) | ((bh)->b_state & BUFFER_BUSY_BITS))
 
 /*
@@ -2430,7 +2430,11 @@
 			spin_unlock(&free_list[index].lock);
 			write_unlock(&hash_table_lock);
 			spin_unlock(&lru_list_lock);
-			if (wait) {
+			/* Buffers in the middle of generic_make_request
+			 * processing cannot be waited for, they may be
+			 * allocating memory right now and be locked by
+			 * this thread.
+			 */
+			if (wait && !buffer_clamped(tmp)) {
 				sync_page_buffers(bh, wait);
 				/* We waited synchronously, so we can free the buffers. */
 				if (wait > 1 && !loop) {
===================================================================
Index: linux/include/linux/fs.h
===================================================================
--- /usr/tmp/TmpDir.3237-0/linux/include/linux/fs.h_1.99	Sat Jun 30 12:56:29 2001
+++ linux/include/linux/fs.h	Sat Jun 30 07:05:37 2001
@@ -224,6 +224,8 @@
 	BH_Mapped,	/* 1 if the buffer has a disk mapping */
 	BH_New,		/* 1 if the buffer is new and not yet written out */
 	BH_Protected,	/* 1 if the buffer is protected */
+	BH_Clamped,	/* 1 if the buffer cannot be reclaimed
+			 * in its current state */
 	BH_Delay,	/* 1 if the buffer is delayed allocate */
 
 	BH_PrivateStart,/* not a state bit, but the first bit available
@@ -286,6 +288,7 @@
 #define buffer_mapped(bh)	__buffer_state(bh,Mapped)
 #define buffer_new(bh)		__buffer_state(bh,New)
 #define buffer_protected(bh)	__buffer_state(bh,Protected)
+#define buffer_clamped(bh)	__buffer_state(bh,Clamped)
 #define buffer_delay(bh)	__buffer_state(bh,Delay)
 
 #define bh_offset(bh)		((unsigned long)(bh)->b_data & ~PAGE_MASK)

The tricky part, which I had not worked out how to do yet, is to manage the
clearing of the state bit in all the correct places. You would have to set
it when the buffer got locked as I/O was about to start; it becomes
clearable only after the last memory allocation during the I/O submission
process. I do not like the approach because there are so many ways a buffer
can go once you get into generic_make_request.

At first I thought I could just explicitly set and clear a flag around
memory allocations like the bounce buffer path. However, that can lead to
AB-BA deadlocks between multiple threads submitting I/O requests. At this
point I started to think I was going to build an unmaintainable rats' nest
and decided I had not got the correct answer.

I am not sure that an approach which avoids a specific device will fly
either; all the I/O can be on one device, and what does "device" mean when
it comes to md/lvm and request remapping?

Steve
Re: Bounce buffer deadlock
> Has anyone else seen a hang like this:
>
>     bdflush()
>      flush_dirty_buffers()
>       ll_rw_block()
>        submit_bh(buffer X)
>         generic_make_request()
>          __make_request()
>           create_bounce()
>            alloc_bounce_page()
>             alloc_page()
>              try_to_free_pages()
>               do_try_to_free_pages()
>                page_launder()
>                 try_to_free_buffers( , 2)  -- i.e. wait for buffers
>                  sync_page_buffers()
>                   __wait_on_buffer(buffer X)

Whoops. We actually carefully set PF_MEMALLOC to avoid deadlocks here, but
if the buffer is laundered... ummm. Looks like page_launder needs to be
more careful.

> I hit this in 2.4.6-pre6, and I don't see anything in the ac series to
> protect against it.

Not this one, no.

Alan
Bounce buffer deadlock
Has anyone else seen a hang like this:

    bdflush()
     flush_dirty_buffers()
      ll_rw_block()
       submit_bh(buffer X)
        generic_make_request()
         __make_request()
          create_bounce()
           alloc_bounce_page()
            alloc_page()
             try_to_free_pages()
              do_try_to_free_pages()
               page_launder()
                try_to_free_buffers( , 2)  -- i.e. wait for buffers
                 sync_page_buffers()
                  __wait_on_buffer(buffer X)

Where the buffer head X going in at the top of the stack is the same as the
one we wait on at the bottom.

There still seems to be nothing to prevent try_to_free_buffers from
blocking on a buffer like this. Setting a flag on the buffer around the
create_bounce call, and skipping it in the try_to_free_buffers path, would
be one approach to avoiding this.

I hit this in 2.4.6-pre6, and I don't see anything in the ac series to
protect against it.

Steve