Re: [PATCH 1/2] VM throttling: Start writeback at dirty_writeback_start_ratio
On Tue, 10 Apr 2007 12:04:54 +0900 Tomoki Sekiyama <[EMAIL PROTECTED]> wrote:

> Hello Andrew,
> Thank you for your comments.
>
> Andrew Morton wrote:
> > On Tue, 03 Apr 2007 19:46:04 +0900
> > Tomoki Sekiyama <[EMAIL PROTECTED]> wrote:
> >> If % of Dirty+Writeback > `dirty_start_writeback_ratio', generators of
> >> dirty pages start writeback of dirty pages by themselves. At that time,
> >> these processes are not blocked in balance_dirty_pages(), but they may
> >> be blocked if the write-requests queue of the written disk is full
> >> (that is, the length of the queue > `nr_requests'). By this behavior,
> >> we can throttle only processes which write to the disks with heavy load,
> >> and can allow processes to write to the other disks without blocking.
> >>
> >> If % of Dirty+Writeback > `dirty_ratio', generators of dirty pages
> >> are throttled as current Linux does, not to fill up memory with dirty
> >> pages.
> >
> > Does this actually solve the problem? If the request queue is sufficiently
> > large (relative to the various dirty-memory thresholds) then I'd expect
> > that a heavy writer will be able to very quickly take the total
> > dirty+writeback memory up to dirty_ratio (which should be renamed
> > throttle_threshold, but it's too late for that).
> >
> > I suspect the reason why this patch was successful in your testing was
> > because dirty_start_writeback_ratio happens to exceed the size of the
> > disk request queues, so the heavy writer is getting stuck on disk
> > request queue exhaustion.
> >
> > But that won't work if we have a lot of processes writing to a lot of
> > disks, and it won't work if the request queue size is large, or if the
> > dirty-memory thresholds are small (relative to the request queue size).
> >
> > Do the patches still work after
> > `echo 1 > /sys/block/sda/queue/nr_requests'?
>
> As you pointed out, this patch has no effect if nr_requests is too large,
> because it distinguishes heavy disks depending on the length of the
> write-requests queue of each disk.
>
> This patch is for providing the system administrators with room to avoid
> the problem by adjusting parameters appropriately, rather than an
> automatic solution for every possible situation.
>
> Could you please tell me some situations in which we should set
> nr_requests that large?

It's probably not a sensible thing to do. But it's _possible_ to do, and the
fact that the kernel will again misbehave indicates an overall weakness in
our design.

And there are other ways in which this situation could occur:

- The request queue has a fixed size (it is not scaled according to the
  amount of memory in the machine). So if the machine is small enough
  (say, 64MB) then the problem can happen.

- The machine could have a large number of disks.

- The queue size of 128 is in units of "number of requests", but it is
  independent of the _size_ of those requests. If someone comes up with a
  driver which wants to use 16MB-sized requests, the problem will recur.

For all these sorts of reasons, we have learned that we should avoid any
dependence upon request queue exhaustion within the VM/VFS/etc.
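To make the small-machine case concrete, here is a back-of-the-envelope
sketch; the 512KB request size is an assumed typical value, not a number
from this thread:

#include <stdio.h>

int main(void)
{
	long nr_requests   = 128;               /* fixed per-queue default */
	long request_bytes = 512L * 1024;       /* assumed typical request size */
	long mem_bytes     = 64L * 1024 * 1024; /* the small 64MB machine above */
	long dirty_ratio   = 40;                /* default vm.dirty_ratio */

	printf("one full request queue:  %ld MB\n",
	       nr_requests * request_bytes >> 20);
	printf("dirty threshold on 64MB: %ld MB\n",
	       mem_bytes / 100 * dirty_ratio >> 20);
	return 0;
}

Under these assumptions a single full queue holds 64MB of in-flight
requests, well above the ~25MB throttle threshold of the 64MB machine, so
the queue alone can swallow the dirty limits before balance_dirty_pages()
ever blocks anyone.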
Re: [PATCH 1/2] VM throttling: Start writeback at dirty_writeback_start_ratio
Hello Andrew,
Thank you for your comments.

Andrew Morton wrote:
> On Tue, 03 Apr 2007 19:46:04 +0900
> Tomoki Sekiyama <[EMAIL PROTECTED]> wrote:
>> If % of Dirty+Writeback > `dirty_start_writeback_ratio', generators of
>> dirty pages start writeback of dirty pages by themselves. At that time,
>> these processes are not blocked in balance_dirty_pages(), but they may
>> be blocked if the write-requests queue of the written disk is full
>> (that is, the length of the queue > `nr_requests'). By this behavior,
>> we can throttle only processes which write to the disks with heavy load,
>> and can allow processes to write to the other disks without blocking.
>>
>> If % of Dirty+Writeback > `dirty_ratio', generators of dirty pages
>> are throttled as current Linux does, not to fill up memory with dirty
>> pages.
>
> Does this actually solve the problem? If the request queue is sufficiently
> large (relative to the various dirty-memory thresholds) then I'd expect
> that a heavy writer will be able to very quickly take the total
> dirty+writeback memory up to dirty_ratio (which should be renamed
> throttle_threshold, but it's too late for that).
>
> I suspect the reason why this patch was successful in your testing was
> because dirty_start_writeback_ratio happens to exceed the size of the
> disk request queues, so the heavy writer is getting stuck on disk
> request queue exhaustion.
>
> But that won't work if we have a lot of processes writing to a lot of
> disks, and it won't work if the request queue size is large, or if the
> dirty-memory thresholds are small (relative to the request queue size).
>
> Do the patches still work after
> `echo 1 > /sys/block/sda/queue/nr_requests'?

As you pointed out, this patch has no effect if nr_requests is too large,
because it distinguishes heavy disks depending on the length of the
write-requests queue of each disk.

This patch is for providing the system administrators with room to avoid
the problem by adjusting parameters appropriately, rather than an automatic
solution for every possible situation.

Could you please tell me some situations in which we should set nr_requests
that large?

Thanks,
--
Tomoki Sekiyama
Hitachi, Ltd., Systems Development Laboratory
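The sizing rule implied by this exchange can be sketched roughly as: the
tuning only works while the dirty headroom between the two watermarks
exceeds what the request queues can absorb. The helper below is purely
illustrative; neither the function nor its numbers come from the patch:

#include <stdio.h>

/* Does a heavy writer hit its disk's queue before the global limit? */
static int tuning_effective(long mem_mb, long dirty_ratio,
			    long start_writeback_ratio,
			    long nr_requests, long request_kb, long nr_disks)
{
	/* dirty headroom between the low and high watermarks */
	long slack_mb = mem_mb * (dirty_ratio - start_writeback_ratio) / 100;
	/* worst-case data parked in all request queues at once */
	long queues_mb = nr_disks * nr_requests * request_kb / 1024;

	return queues_mb < slack_mb;
}

int main(void)
{
	/* 2GB box, the patch's 40/35 defaults, one disk of 128 x 512KB */
	printf("%d\n", tuning_effective(2048, 40, 35, 128, 512, 1));
	/* same box with nr_requests raised to 8192: the patch is bypassed */
	printf("%d\n", tuning_effective(2048, 40, 35, 8192, 512, 1));
	return 0;
}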
Re: [PATCH 1/2] VM throttling: Start writeback at dirty_writeback_start_ratio
On Tue, 03 Apr 2007 19:46:04 +0900 Tomoki Sekiyama <[EMAIL PROTECTED]> wrote:

> This patchset is to avoid the problem that write(2) can be blocked for a
> long time if a system has several disks with different speed and is
> under heavy I/O pressure.
>
> -Description of the problem:
> While Dirty+Writeback pages get more than 40% (`dirty_ratio') of memory,
> generators of dirty pages are blocked in balance_dirty_pages() until
> they start writeback of a specific number (`write_chunk', typically 1536)
> of dirty pages on the disks they write to.
>
> Under this rule, if a process writes to a disk which has only a few
> (fewer than 1536) dirty pages, that process will be blocked until
> writeback of the other disks is completed and % of Dirty+Writeback goes
> below 40%.
>
> Thus, if a slow device (such as a USB disk) has many dirty pages, the
> processes which write small data to the other disks can be blocked for
> quite a long time.
>
> -Solution:
> This patch introduces a high/low-watermark algorithm in
> balance_dirty_pages() in order to throttle only the processes which
> write to disks with heavy load.
>
> This patch adds `dirty_start_writeback_ratio' for the low watermark,
> and modifies get_dirty_limits() to calculate and return the writeback
> starting level of dirty pages based on `dirty_start_writeback_ratio'.
>
> If % of Dirty+Writeback > `dirty_start_writeback_ratio', generators of
> dirty pages start writeback of dirty pages by themselves. At that time,
> these processes are not blocked in balance_dirty_pages(), but they may
> be blocked if the write-requests queue of the written disk is full
> (that is, the length of the queue > `nr_requests'). By this behavior,
> we can throttle only processes which write to the disks with heavy load,
> and can allow processes to write to the other disks without blocking.
>
> If % of Dirty+Writeback > `dirty_ratio', generators of dirty pages
> are throttled as current Linux does, not to fill up memory with dirty
> pages.

Does this actually solve the problem? If the request queue is sufficiently
large (relative to the various dirty-memory thresholds) then I'd expect
that a heavy writer will be able to very quickly take the total
dirty+writeback memory up to dirty_ratio (which should be renamed
throttle_threshold, but it's too late for that).

I suspect the reason why this patch was successful in your testing was
because dirty_start_writeback_ratio happens to exceed the size of the disk
request queues, so the heavy writer is getting stuck on disk request queue
exhaustion.

But that won't work if we have a lot of processes writing to a lot of
disks, and it won't work if the request queue size is large, or if the
dirty-memory thresholds are small (relative to the request queue size).

Do the patches still work after
`echo 1 > /sys/block/sda/queue/nr_requests'?
[PATCH 1/2] VM throttling: Start writeback at dirty_writeback_start_ratio
This patchset is to avoid the problem that write(2) can be blocked for a
long time if a system has several disks with different speed and is
under heavy I/O pressure.

-Description of the problem:
While Dirty+Writeback pages get more than 40% (`dirty_ratio') of memory,
generators of dirty pages are blocked in balance_dirty_pages() until
they start writeback of a specific number (`write_chunk', typically 1536)
of dirty pages on the disks they write to.

Under this rule, if a process writes to a disk which has only a few
(fewer than 1536) dirty pages, that process will be blocked until
writeback of the other disks is completed and % of Dirty+Writeback goes
below 40%.

Thus, if a slow device (such as a USB disk) has many dirty pages, the
processes which write small data to the other disks can be blocked for
quite a long time.

-Solution:
This patch introduces a high/low-watermark algorithm in
balance_dirty_pages() in order to throttle only the processes which
write to disks with heavy load.

This patch adds `dirty_start_writeback_ratio' for the low watermark,
and modifies get_dirty_limits() to calculate and return the writeback
starting level of dirty pages based on `dirty_start_writeback_ratio'.

If % of Dirty+Writeback > `dirty_start_writeback_ratio', generators of
dirty pages start writeback of dirty pages by themselves. At that time,
these processes are not blocked in balance_dirty_pages(), but they may
be blocked if the write-requests queue of the written disk is full
(that is, the length of the queue > `nr_requests'). By this behavior,
we can throttle only processes which write to the disks with heavy load,
and can allow processes to write to the other disks without blocking.

If % of Dirty+Writeback > `dirty_ratio', generators of dirty pages
are throttled as current Linux does, not to fill up memory with dirty
pages.

Thanks,

Signed-off-by: Tomoki Sekiyama <[EMAIL PROTECTED]>
---
 include/linux/writeback.h |    1 
 mm/page-writeback.c       |   52 ++++++++++++++++++++++++++++++----------
 2 files changed, 42 insertions(+), 11 deletions(-)

Index: linux-2.6.21-rc5-mm3-writeback/include/linux/writeback.h
===================================================================
--- linux-2.6.21-rc5-mm3-writeback.orig/include/linux/writeback.h
+++ linux-2.6.21-rc5-mm3-writeback/include/linux/writeback.h
@@ -94,6 +94,7 @@ static inline int laptop_spinned_down(vo
 
 /* These are exported to sysctl. */
 extern int dirty_background_ratio;
+extern int dirty_start_writeback_ratio;
 extern int vm_dirty_ratio;
 extern int dirty_writeback_interval;
 extern int dirty_expire_interval;

Index: linux-2.6.21-rc5-mm3-writeback/mm/page-writeback.c
===================================================================
--- linux-2.6.21-rc5-mm3-writeback.orig/mm/page-writeback.c
+++ linux-2.6.21-rc5-mm3-writeback/mm/page-writeback.c
@@ -72,6 +72,11 @@ int dirty_background_ratio = 10;
 /*
  * The generator of dirty data starts writeback at this percentage
  */
+int dirty_start_writeback_ratio = 35;
+
+/*
+ * The generator of dirty data is blocked at this percentage
+ */
 int vm_dirty_ratio = 40;
 
 /*
@@ -112,12 +117,16 @@ static void background_writeout(unsigned
  * performing lots of scanning.
  *
  * We only allow 1/2 of the currently-unmapped memory to be dirtied.
+ * `vm.dirty_ratio' is ignored if it is larger than that.
+ * In this case, `vm.dirty_start_writeback_ratio' is also decreased to keep
+ * writeback independent among disks.
  *
  * We don't permit the clamping level to fall below 5% - that is getting rather
  * excessive.
  *
- * We make sure that the background writeout level is below the adjusted
- * clamping level.
+ * We make sure that the active writeout level is below the adjusted clamping
+ * level, and that the background writeout level is below the active writeout
+ * level.
  */
 
 static unsigned long highmem_dirtyable_memory(unsigned long total)
@@ -158,13 +167,15 @@ static unsigned long determine_dirtyable
 }
 
 static void
-get_dirty_limits(long *pbackground, long *pdirty,
+get_dirty_limits(long *pbackground, long *pstart_writeback, long *pdirty,
 		 struct address_space *mapping)
 {
 	int background_ratio;		/* Percentages */
+	int start_writeback_ratio;
 	int dirty_ratio;
 	int unmapped_ratio;
 	long background;
+	long start_writeback;
 	long dirty;
 	unsigned long available_memory = determine_dirtyable_memory();
 	struct task_struct *tsk;
@@ -177,28 +188,40 @@ get_dirty_limits(long *pbackground, long
 	if (dirty_ratio > unmapped_ratio / 2)
 		dirty_ratio = unmapped_ratio / 2;
 
+	start_writeback_ratio = dirty_start_writeback_ratio;
+	if (start_writeback_ratio > dirty_ratio)
+		start_writeback_ratio = dirty_ratio;
+	start_writeback_ratio -= vm_dirty_ratio - dirty_ratio;
+
 	if (dirty_ratio < 5)
 		dirty_ratio = 5;
+	if (start_writeback_ratio
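Condensed, the two-watermark behavior described above works out to the
following decision per writer. This is an illustrative userspace sketch of
the described policy, not code from the patch, and the names in it are
hypothetical:

#include <stdio.h>

enum throttle_action {
	DO_NOTHING,      /* below both watermarks */
	START_WRITEBACK, /* low watermark crossed: the writer starts its own
	                  * writeback and blocks only if the target disk's
	                  * request queue (nr_requests) fills up */
	BLOCK_WRITER     /* high watermark crossed: throttle as before */
};

static enum throttle_action dirty_action(long dirty_writeback_pages,
					 long start_writeback_thresh,
					 long dirty_thresh)
{
	if (dirty_writeback_pages > dirty_thresh)
		return BLOCK_WRITER;
	if (dirty_writeback_pages > start_writeback_thresh)
		return START_WRITEBACK;
	return DO_NOTHING;
}

int main(void)
{
	/* thresholds for a hypothetical 100-page system with the patch's
	 * 35/40 defaults */
	printf("%d\n", dirty_action(30, 35, 40)); /* 0: unthrottled */
	printf("%d\n", dirty_action(37, 35, 40)); /* 1: writer writes back */
	printf("%d\n", dirty_action(45, 35, 40)); /* 2: writer blocks */
	return 0;
}

The middle band is the point of the patch: a process there keeps making
progress unless the specific disk it writes to is congested, which is what
confines the throttling to the heavily loaded disk.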