Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
Hi!

> > IMVHO every developer involved in memory-management (and indeed, any
> > software development; the authors of ntpd come to mind here) should
> > have a 386 with 4MB of RAM and some 16MB of swap. Nowadays I have the
> > luxury of a 486 with 8MB of RAM and 32MB of swap as a firewall, but it's
> > still a pain to work with.
>
> If you really want to have fun, remove all swap...

My handheld has 12MB RAM, no swap ;-), and that's a pretty big machine
for a handheld.

								Pavel

PS: Swapping on a flash disk is a bad idea, right?
--
I'm [EMAIL PROTECTED] "In my country we have almost anarchy and I don't
care." Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
Hi!

> > IMVHO every developer involved in memory-management (and indeed, any
> > software development; the authors of ntpd come to mind here) should
> > have a 386 with 4MB of RAM and some 16MB of swap. Nowadays I have the
> > luxury of a 486 with 8MB of RAM and 32MB of swap as a firewall, but it's
> > still a pain to work with.
>
> You're absolutely right. The smallest thing I'm testing with
> on a regular basis is my dual pentium machine, booted with
> mem=8m or mem=16m.
>
> Time to hunt around for a 386 or 486 which is limited to such
> a small amount of RAM ;)

Buy an Agenda handheld: 16MB flash, 8MB RAM, X, the size of a Palm. It is
definitely a sexier machine than the average 486. [Or get a Philips Velo 1,
if you want a keyboard ;-)]

								Pavel
--
The best software in life is free (not shareware)!		Pavel
GCM d? s-: !g p?:+ au- a--@ w+ v- C++@ UL+++ L++ N++ E++ W--- M- Y- R+
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Wed, May 23, 2001 at 05:51:50PM +, Scott Anderson wrote:
> David Weinehall wrote:
> > IMVHO every developer involved in memory-management (and indeed, any
> > software development; the authors of ntpd come to mind here) should
> > have a 386 with 4MB of RAM and some 16MB of swap. Nowadays I have the
> > luxury of a 486 with 8MB of RAM and 32MB of swap as a firewall, but it's
> > still a pain to work with.
>
> If you really want to have fun, remove all swap...

Oh, I've done some testing without swap too, mainly to test Rik's
oom-killer. Seemed to work pretty well. Can't say it was enjoyable,
though.

/David
  // David Weinehall <[EMAIL PROTECTED]>  /> Northern lights wander
 //  Project MCA Linux hacker            //  Dance across the winter sky
 \>  http://www.acc.umu.se/~tao/         /   Full colour fire
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Thu, 24 May 2001, Rik van Riel wrote:

> > > > OK.. let's forget about throughput for a moment and consider
> > > > those annoying reports of 0 order allocations failing :)
> > >
> > > Those are ok. All failing 0 order allocations are either
> > > atomic allocations or GFP_BUFFER allocations. I guess we
> > > should just remove the printk() ;)
> >
> > Hmm. The guy whose box locks up on him after a burst of these
> > probably doesn't think these failures are very OK ;-) I don't
> > think order 0 failing is cool at all.. ever.
>
> You may not think it's cool, but it's needed in order to
> prevent deadlocks. Just because an allocation cannot do
> disk IO or sleep, that's no reason to loop around like
> crazy in __alloc_pages() and hang the machine ... ;)

True, but if we have resources available, there's no excuse for a
failure. Well, yes there is: if the cost of that resource is higher
than the value of letting the allocation succeed. We have no data on
the value of success, but we do plan on consuming the reclaimable pool
(and must), so I still think turning these resources loose at strategic
moments is logically sound. (That doesn't mean there isn't a better
way.. it's just an easy way.)

I'd really like someone who has this problem to try the patch to see
if it does help. I don't have this darn problem myself, so I'm left
holding a bag of idle curiosity. ;-)

	Cheers,

	-Mike
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Thu, 24 May 2001, Mike Galbraith wrote:

> On Thu, 24 May 2001, Rik van Riel wrote:
> > On Thu, 24 May 2001, Mike Galbraith wrote:
> > > On Sun, 20 May 2001, Rik van Riel wrote:
> > > >
> > > > Remember that inactive_clean pages are always immediately
> > > > reclaimable by __alloc_pages(). If you measured a performance
> > > > difference by freeing pages in a different way, I'm pretty sure
> > > > it's a side effect of something else. What that something
> > > > else is I'm curious to find out, but I'm pretty convinced that
> > > > throwing away data early isn't the way to go.
> > >
> > > OK.. let's forget about throughput for a moment and consider
> > > those annoying reports of 0 order allocations failing :)
> >
> > Those are ok. All failing 0 order allocations are either
> > atomic allocations or GFP_BUFFER allocations. I guess we
> > should just remove the printk() ;)
>
> Hmm. The guy whose box locks up on him after a burst of these
> probably doesn't think these failures are very OK ;-) I don't
> think order 0 failing is cool at all.. ever.

You may not think it's cool, but it's needed in order to
prevent deadlocks. Just because an allocation cannot do
disk IO or sleep, that's no reason to loop around like
crazy in __alloc_pages() and hang the machine ... ;)

> A (long) while back, Linus specifically mentioned worrying
> about atomic allocation reliability.

That's a separate issue. That was, IIRC, about the failure of
atomic allocations causing packet loss on Linux routers and,
because of that, poor performance.

This is something we still need to look into, but basically this
problem is about too high latency and NOT about "pre-freeing" more
pages (like your patch attempts). If this problem is still an issue,
it's quite likely that the VM is holding locks for too long, so that
it cannot react fast enough to free up some inactive_clean pages.

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/		http://www.conectiva.com/
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Thu, 24 May 2001, Rik van Riel wrote:

> On Thu, 24 May 2001, Mike Galbraith wrote:
> > On Sun, 20 May 2001, Rik van Riel wrote:
> >
> > > Remember that inactive_clean pages are always immediately
> > > reclaimable by __alloc_pages(). If you measured a performance
> > > difference by freeing pages in a different way, I'm pretty sure
> > > it's a side effect of something else. What that something
> > > else is I'm curious to find out, but I'm pretty convinced that
> > > throwing away data early isn't the way to go.
> >
> > OK.. let's forget about throughput for a moment and consider
> > those annoying reports of 0 order allocations failing :)
>
> Those are ok. All failing 0 order allocations are either
> atomic allocations or GFP_BUFFER allocations. I guess we
> should just remove the printk() ;)

Hmm. The guy whose box locks up on him after a burst of these
probably doesn't think these failures are very OK ;-) I don't
think order 0 failing is cool at all.. ever.

A (long) while back, Linus specifically mentioned worrying
about atomic allocation reliability.

> > What do you think of the below (ignore the refill_inactive bit)
> > wrt allocator reliability under heavy stress? The thing does
> > kick in and pump up zones even if I set the 'blood donor' level
> > to pages_min.
>
> > -		unsigned long water_mark;
> > +		unsigned long water_mark = 1 << order;
>
> Makes no sense at all since water_mark gets assigned not 10
> lines below. ;)

That assignment was supposed to turn into +=.

> > +		if (direct_reclaim) {
> > +			int count;
> > +
> > +			/* If we're in bad shape.. */
> > +			if (z->free_pages < z->pages_low && z->inactive_clean_pages) {
>
> I'm not sure if we want to fill up the free list all the way
> to z->pages_low all the time, since "free memory is wasted
> memory".

Yes. I'm just thinking of the burst of allocations with no reclaim
possible.

> The reason the current scheme only triggers when we reach
> z->pages_min and then goes all the way up to z->pages_low
> is memory defragmentation. Since we'll be doing direct

Ah.

> reclaim for just about every allocation in the system, it
> only happens occasionally that we throw away all the
> inactive_clean pages between z->pages_min and z->pages_low.

This one has me puzzled. We're reluctant to release cleaned pages,
but at the same time, we reclaim if possible as soon as all zones
are below pages_high.

> > +				count = 4 * (1 << page_cluster);
> > +				/* reclaim a page for ourselves if we can afford to.. */
> > +				if (z->inactive_clean_pages > count)
> > +					page = reclaim_page(z);
> > +				if (z->inactive_clean_pages < 2 * count)
> > +					count = z->inactive_clean_pages / 2;
> > +			} else count = 0;
>
> What exactly is the reasoning behind this complex "count"
> stuff? Is there a good reason for not just refilling the
> free list up to the target or until the inactive_clean list
> is depleted ?

Well, yes. You didn't like the 50/50 split thingy I did before, so
I connected zones to a trickle-charger instead.

> > +			/*
> > +			 * and make a small donation to the reclaim challenged.
> > +			 *
> > +			 * We don't ever want a zone to reach the state where we
> > +			 * have nothing except reclaimable pages left.. not if
> > +			 * we can possibly do something to help prevent it.
> > +			 */
>
> This comment makes little sense

If not, then none of it does. This situation is the ONLY thing I was
worried about. free_pages + inactive_clean_pages > pages_min does
nothing about free_pages for those who can't reclaim, if most of that
is inactive_clean_pages. IFF it's possible to be critical on
free_pages and still have clean pages, it does make sense.

> > +		if (z->inactive_clean_pages - z->free_pages > z->pages_low
> > +				&& waitqueue_active(&kreclaimd_wait))
> > +			wake_up_interruptible(&kreclaimd_wait);
>
> This doesn't make any sense to me at all. Why wake up
> kreclaimd just because the difference between the number
> of inactive_clean pages and free pages is large ?

The thought was that you had to get there with direct_reclaim not
set. Nobody gave the zone a transfusion, but there is a blood supply.
If nobody gets around to refilling the zone, kreclaimd will.

> Didn't we determine in our last exchange of email that
> it would be a good thing under most loads to keep as much
> inactive_clean memory around as possible and not waste^Wfree
> memory early ?

So why do we reclaim if we're just below pages_high? The whole point
of this patch is to reclaim _less_ in the general case, but to do so
in a timely manner if we really need it.

> > -	/*
> > -
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Thu, 24 May 2001, Mike Galbraith wrote:

> On Sun, 20 May 2001, Rik van Riel wrote:
>
> > Remember that inactive_clean pages are always immediately
> > reclaimable by __alloc_pages(). If you measured a performance
> > difference by freeing pages in a different way, I'm pretty sure
> > it's a side effect of something else. What that something
> > else is I'm curious to find out, but I'm pretty convinced that
> > throwing away data early isn't the way to go.
>
> OK.. let's forget about throughput for a moment and consider
> those annoying reports of 0 order allocations failing :)

Those are ok. All failing 0 order allocations are either
atomic allocations or GFP_BUFFER allocations. I guess we
should just remove the printk() ;)

> What do you think of the below (ignore the refill_inactive bit)
> wrt allocator reliability under heavy stress? The thing does
> kick in and pump up zones even if I set the 'blood donor' level
> to pages_min.

> -		unsigned long water_mark;
> +		unsigned long water_mark = 1 << order;

Makes no sense at all since water_mark gets assigned not 10
lines below. ;)

> +		if (direct_reclaim) {
> +			int count;
> +
> +			/* If we're in bad shape.. */
> +			if (z->free_pages < z->pages_low && z->inactive_clean_pages) {

I'm not sure if we want to fill up the free list all the way
to z->pages_low all the time, since "free memory is wasted
memory".

The reason the current scheme only triggers when we reach
z->pages_min and then goes all the way up to z->pages_low
is memory defragmentation. Since we'll be doing direct
reclaim for just about every allocation in the system, it
only happens occasionally that we throw away all the
inactive_clean pages between z->pages_min and z->pages_low.

> +				count = 4 * (1 << page_cluster);
> +				/* reclaim a page for ourselves if we can afford to.. */
> +				if (z->inactive_clean_pages > count)
> +					page = reclaim_page(z);
> +				if (z->inactive_clean_pages < 2 * count)
> +					count = z->inactive_clean_pages / 2;
> +			} else count = 0;

What exactly is the reasoning behind this complex "count"
stuff? Is there a good reason for not just refilling the
free list up to the target, or until the inactive_clean list
is depleted?

> +			/*
> +			 * and make a small donation to the reclaim challenged.
> +			 *
> +			 * We don't ever want a zone to reach the state where we
> +			 * have nothing except reclaimable pages left.. not if
> +			 * we can possibly do something to help prevent it.
> +			 */

This comment makes little sense.

> +		if (z->inactive_clean_pages - z->free_pages > z->pages_low
> +				&& waitqueue_active(&kreclaimd_wait))
> +			wake_up_interruptible(&kreclaimd_wait);

This doesn't make any sense to me at all. Why wake up
kreclaimd just because the difference between the number
of inactive_clean pages and free pages is large?

Didn't we determine in our last exchange of email that
it would be a good thing under most loads to keep as much
inactive_clean memory around as possible and not waste^Wfree
memory early?

> -	/*
> -	 * First, see if we have any zones with lots of free memory.
> -	 *
> -	 * We allocate free memory first because it doesn't contain
> -	 * any data ... DUH!
> -	 */

We want to keep this. Suppose we have one zone which is half
filled with inactive_clean pages and one zone which has "too
many" free pages. Allocating from the first zone means we evict
some piece of, potentially useful, data from the cache;
allocating from the second zone means we can keep the data in
memory and only fill up a currently unused page.

> @@ -824,39 +824,17 @@
>  #define DEF_PRIORITY (6)
>  static int refill_inactive(unsigned int gfp_mask, int user)
>  {

I've heard all kinds of things about this part of the patch,
except an explanation of why and how it is supposed to work ;)

> @@ -976,8 +954,9 @@
>  	 * We go to sleep for one second, but if it's needed
>  	 * we'll be woken up earlier...
>  	 */
> -	if (!free_shortage() || !inactive_shortage()) {
> -		interruptible_sleep_on_timeout(&kswapd_wait, HZ);
> +	if (current->need_resched || !free_shortage() ||
> +			!inactive_shortage()) {
> +		interruptible_sleep_on_timeout(&kswapd_wait, HZ/10);

Makes sense. Integrated in my tree ;)

regards,

Rik
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, 20 May 2001, Rik van Riel wrote:

> Remember that inactive_clean pages are always immediately
> reclaimable by __alloc_pages(), if you measured a performance
> difference by freeing pages in a different way I'm pretty sure
> it's a side effect of something else. What that something
> else is I'm curious to find out, but I'm pretty convinced that
> throwing away data early isn't the way to go.

OK.. let's forget about throughput for a moment and consider
those annoying reports of 0 order allocations failing :)

What do you think of the below (ignore the refill_inactive bit)
wrt allocator reliability under heavy stress? The thing does
kick in and pump up zones even if I set the 'blood donor' level
to pages_min.

	-Mike

--- linux-2.4.5-pre3/mm/page_alloc.c.org	Mon May 21 10:35:06 2001
+++ linux-2.4.5-pre3/mm/page_alloc.c	Thu May 24 08:18:36 2001
@@ -224,10 +224,11 @@
 		unsigned long order, int limit, int direct_reclaim)
 {
 	zone_t **zone = zonelist->zones;
+	struct page *page = NULL;
 
 	for (;;) {
 		zone_t *z = *(zone++);
-		unsigned long water_mark;
+		unsigned long water_mark = 1 << order;
 
 		if (!z)
 			break;
@@ -249,18 +250,44 @@
 			case PAGES_HIGH:
 				water_mark = z->pages_high;
 		}
+		if (z->free_pages + z->inactive_clean_pages < water_mark)
+			continue;
 
-		if (z->free_pages + z->inactive_clean_pages > water_mark) {
-			struct page *page = NULL;
-			/* If possible, reclaim a page directly. */
-			if (direct_reclaim && z->free_pages < z->pages_min + 8)
+		if (direct_reclaim) {
+			int count;
+
+			/* If we're in bad shape.. */
+			if (z->free_pages < z->pages_low && z->inactive_clean_pages) {
+				count = 4 * (1 << page_cluster);
+				/* reclaim a page for ourselves if we can afford to.. */
+				if (z->inactive_clean_pages > count)
+					page = reclaim_page(z);
+				if (z->inactive_clean_pages < 2 * count)
+					count = z->inactive_clean_pages / 2;
+			} else count = 0;
+
+			/*
+			 * and make a small donation to the reclaim challenged.
+			 *
+			 * We don't ever want a zone to reach the state where we
+			 * have nothing except reclaimable pages left.. not if
+			 * we can possibly do something to help prevent it.
+			 */
+			while (count--) {
+				struct page *page;
 				page = reclaim_page(z);
-			/* If that fails, fall back to rmqueue. */
-			if (!page)
-				page = rmqueue(z, order);
-			if (page)
-				return page;
+				if (!page)
+					break;
+				__free_page(page);
+			}
 		}
+		if (!page)
+			page = rmqueue(z, order);
+		if (page)
+			return page;
+		if (z->inactive_clean_pages - z->free_pages > z->pages_low
+				&& waitqueue_active(&kreclaimd_wait))
+			wake_up_interruptible(&kreclaimd_wait);
 	}
 
 	/* Found nothing. */
@@ -314,29 +341,6 @@
 	wakeup_bdflush(0);
 
 try_again:
-	/*
-	 * First, see if we have any zones with lots of free memory.
-	 *
-	 * We allocate free memory first because it doesn't contain
-	 * any data ... DUH!
-	 */
-	zone = zonelist->zones;
-	for (;;) {
-		zone_t *z = *(zone++);
-		if (!z)
-			break;
-		if (!z->size)
-			BUG();
-
-		if (z->free_pages >= z->pages_low) {
-			page = rmqueue(z, order);
-			if (page)
-				return page;
-		} else if (z->free_pages < z->pages_min &&
-				waitqueue_active(&kreclaimd_wait)) {
-			wake_up_interruptible(&kreclaimd_wait);
-		}
-	}
 
 	/*
 	 * Try to allocate a page from a zone with a HIGH
--- linux-2.4.5-pre3/mm/vmscan.c.org	Thu May 17 16:44:23 2001
+++ linux-2.4.5-pre3/mm/vmscan.c	Thu May 24 08:05:21 2001
@@ -824,39 +824,17 @@
 #define DEF_PRIORITY (6)
 
 static int refill_inactive(unsigned int
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, 20 May 2001, Rik van Riel wrote: Remember that inactive_clean pages are always immediately reclaimable by __alloc_pages(), if you measured a performance difference by freeing pages in a different way I'm pretty sure it's a side effect of something else. What that something else is I'm curious to find out, but I'm pretty convinced that throwing away data early isn't the way to go. OK.. let's forget about throughput for a moment and consider those annoying reports of 0 order allocations failing :) What do you think of the below (ignore the refill_inactive bit) wrt allocator reliability under heavy stress? The thing does kick in and pump up zones even if I set the 'blood donor' level to pages_min. -Mike --- linux-2.4.5-pre3/mm/page_alloc.c.orgMon May 21 10:35:06 2001 +++ linux-2.4.5-pre3/mm/page_alloc.cThu May 24 08:18:36 2001 @@ -224,10 +224,11 @@ unsigned long order, int limit, int direct_reclaim) { zone_t **zone = zonelist-zones; + struct page *page = NULL; for (;;) { zone_t *z = *(zone++); - unsigned long water_mark; + unsigned long water_mark = 1 order; if (!z) break; @@ -249,18 +250,44 @@ case PAGES_HIGH: water_mark = z-pages_high; } + if (z-free_pages + z-inactive_clean_pages water_mark) + continue; - if (z-free_pages + z-inactive_clean_pages water_mark) { - struct page *page = NULL; - /* If possible, reclaim a page directly. */ - if (direct_reclaim z-free_pages z-pages_min + 8) + if (direct_reclaim) { + int count; + + /* If we're in bad shape.. */ + if (z-free_pages z-pages_low z-inactive_clean_pages) { + count = 4 * (1 page_cluster); + /* reclaim a page for ourselves if we can afford to.. +*/ + if (z-inactive_clean_pages count) + page = reclaim_page(z); + if (z-inactive_clean_pages 2 * count) + count = z-inactive_clean_pages / 2; + } else count = 0; + + /* +* and make a small donation to the reclaim challenged. +* +* We don't ever want a zone to reach the state where we +* have nothing except reclaimable pages left.. 
not if +* we can possibly do something to help prevent it. +*/ + while (count--) { + struct page *page; page = reclaim_page(z); - /* If that fails, fall back to rmqueue. */ - if (!page) - page = rmqueue(z, order); - if (page) - return page; + if (!page) + break; + __free_page(page); + } } + if (!page) + page = rmqueue(z, order); + if (page) + return page; + if (z-inactive_clean_pages - z-free_pages z-pages_low +waitqueue_active(kreclaimd_wait)) + wake_up_interruptible(kreclaimd_wait); } /* Found nothing. */ @@ -314,29 +341,6 @@ wakeup_bdflush(0); try_again: - /* -* First, see if we have any zones with lots of free memory. -* -* We allocate free memory first because it doesn't contain -* any data ... DUH! -*/ - zone = zonelist-zones; - for (;;) { - zone_t *z = *(zone++); - if (!z) - break; - if (!z-size) - BUG(); - - if (z-free_pages = z-pages_low) { - page = rmqueue(z, order); - if (page) - return page; - } else if (z-free_pages z-pages_min - waitqueue_active(kreclaimd_wait)) { - wake_up_interruptible(kreclaimd_wait); - } - } /* * Try to allocate a page from a zone with a HIGH --- linux-2.4.5-pre3/mm/vmscan.c.orgThu May 17 16:44:23 2001 +++ linux-2.4.5-pre3/mm/vmscan.cThu May 24 08:05:21 2001 @@ -824,39 +824,17 @@ #define DEF_PRIORITY (6) static int refill_inactive(unsigned int gfp_mask, int
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Thu, 24 May 2001, Mike Galbraith wrote: On Sun, 20 May 2001, Rik van Riel wrote: Remember that inactive_clean pages are always immediately reclaimable by __alloc_pages(), if you measured a performance difference by freeing pages in a different way I'm pretty sure it's a side effect of something else. What that something else is I'm curious to find out, but I'm pretty convinced that throwing away data early isn't the way to go. OK.. let's forget about throughput for a moment and consider those annoying reports of 0 order allocations failing :) Those are ok. All failing 0 order allocations are either atomic allocations or GFP_BUFFER allocations. I guess we should just remove the printk() ;) What do you think of the below (ignore the refill_inactive bit) wrt allocator reliability under heavy stress? The thing does kick in and pump up zones even if I set the 'blood donor' level to pages_min. - unsigned long water_mark; + unsigned long water_mark = 1 order; Makes no sense at all since water_mark gets assigned not 10 lines below. ;) + if (direct_reclaim) { + int count; + + /* If we're in bad shape.. */ + if (z-free_pages z-pages_low z-inactive_clean_pages) { I'm not sure if we want to fill up the free list all the way to z-pages_low all the time, since free memory is wasted memory. The reason the current scheme only triggers when we reach z-pages_min and then goes all the way up to z-pages_low is memory defragmentation. Since we'll be doing direct reclaim for just about every allocation in the system, it only happens occasionally that we throw away all the inactive_clean pages between z-pages_min and z-pages_low. + count = 4 * (1 page_cluster); + /* reclaim a page for ourselves if we can afford to.. */ + if (z-inactive_clean_pages count) + page = reclaim_page(z); + if (z-inactive_clean_pages 2 * count) + count = z-inactive_clean_pages / 2; + } else count = 0; What exactly is the reasoning behind this complex count stuff? 
Is there a good reason for not just refilling the free list up to the target or until the inactive_clean list is depleted ? + /* + * and make a small donation to the reclaim challenged. + * + * We don't ever want a zone to reach the state where we + * have nothing except reclaimable pages left.. not if + * we can possibly do something to help prevent it. + */ This comment makes little sense + if (z-inactive_clean_pages - z-free_pages z-pages_low + waitqueue_active(kreclaimd_wait)) + wake_up_interruptible(kreclaimd_wait); This doesn't make any sense to me at all. Why wake up kreclaimd just because the difference between the number of inactive_clean pages and free pages is large ? Didn't we determine in our last exchange of email that it would be a good thing under most loads to keep as much inactive_clean memory around as possible and not waste^Wfree memory early ? - /* - * First, see if we have any zones with lots of free memory. - * - * We allocate free memory first because it doesn't contain - * any data ... DUH! - */ We want to keep this. Suppose we have one zone which is half filled with inactive_clean pages and one zone which has too many free pages. Allocating from the first zone means we evict some piece of, potentially useful, data from the cache; allocating from the second zone means we can keep the data in memory and only fill up a currently unused page. @@ -824,39 +824,17 @@ #define DEF_PRIORITY (6) static int refill_inactive(unsigned int gfp_mask, int user) { I've heard all kinds of things about this part of the patch, except an explanation of why and how it is supposed to work ;) @@ -976,8 +954,9 @@ * We go to sleep for one second, but if it's needed * we'll be woken up earlier... */ - if (!free_shortage() || !inactive_shortage()) { - interruptible_sleep_on_timeout(kswapd_wait, HZ); + if (current-need_resched || !free_shortage() || + !inactive_shortage()) { + interruptible_sleep_on_timeout(kswapd_wait, HZ/10); Makes sense. 
Integrated in my tree ;) regards, Rik -- Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Thu, 24 May 2001, Rik van Riel wrote:

> On Thu, 24 May 2001, Mike Galbraith wrote:
> > On Sun, 20 May 2001, Rik van Riel wrote:
> > > Remember that inactive_clean pages are always immediately
> > > reclaimable by __alloc_pages(), if you measured a performance
> > > difference by freeing pages in a different way I'm pretty sure
> > > it's a side effect of something else.  What that something
> > > else is I'm curious to find out, but I'm pretty convinced that
> > > throwing away data early isn't the way to go.
> >
> > OK.. let's forget about throughput for a moment and consider
> > those annoying reports of 0 order allocations failing :)
>
> Those are ok.  All failing 0 order allocations are either atomic
> allocations or GFP_BUFFER allocations.  I guess we should just
> remove the printk() ;)

Hmm.  The guy who's box locks up on him after a burst of these
probably doesn't think these failures are very OK ;-)  I don't
think order 0 failing is cool at all.. ever.

A (long) while back, Linus specifically mentioned worrying about
atomic allocation reliability.

> > What do you think of the below (ignore the refill_inactive bit)
> > wrt allocator reliability under heavy stress?  The thing does
> > kick in and pump up zones even if I set the 'blood donor' level
> > to pages_min.
> >
> > -		unsigned long water_mark;
> > +		unsigned long water_mark = 1 << order;
>
> Makes no sense at all since water_mark gets assigned not 10
> lines below. ;)

That assignment was supposed to turn into +=.

> > +		if (direct_reclaim) {
> > +			int count;
> > +
> > +			/* If we're in bad shape.. */
> > +			if (z->free_pages < z->pages_low && z->inactive_clean_pages) {
>
> I'm not sure if we want to fill up the free list all the way to
> z->pages_low all the time, since free memory is wasted memory.

Yes.  I'm just thinking of the burst of allocations with no
reclaim possible.

> The reason the current scheme only triggers when we reach
> z->pages_min and then goes all the way up to z->pages_low is
> memory defragmentation.  Since we'll be doing direct

Ah.

> reclaim for just about every allocation in the system, it only
> happens occasionally that we throw away all the inactive_clean
> pages between z->pages_min and z->pages_low.

This one has me puzzled.  We're reluctant to release cleaned
pages, but at the same time, we reclaim if possible as soon as
all zones are below pages_high.

> > +				count = 4 * (1 << page_cluster);
> > +				/* reclaim a page for ourselves if we can afford to.. */
> > +				if (z->inactive_clean_pages > count)
> > +					page = reclaim_page(z);
> > +				if (z->inactive_clean_pages < 2 * count)
> > +					count = z->inactive_clean_pages / 2;
> > +			} else
> > +				count = 0;
>
> What exactly is the reasoning behind this complex count stuff?
> Is there a good reason for not just refilling the free list up
> to the target or until the inactive_clean list is depleted ?

Well, yes.  You didn't like the 50/50 split thingy I did before,
so I connected zones to a tricklecharger instead.

> > +			/*
> > +			 * and make a small donation to the reclaim challenged.
> > +			 *
> > +			 * We don't ever want a zone to reach the state where we
> > +			 * have nothing except reclaimable pages left.. not if
> > +			 * we can possibly do something to help prevent it.
> > +			 */
>
> This comment makes little sense

If not, then none of it does.  This situation is the ONLY thing
I was worried about.  free_pages + inactive_clean_pages < pages_min
does nothing about free_pages for those who can't reclaim if most
of that is inactive_clean_pages.  IFF it's possible to be critical
on free_pages and still have clean pages, it does make sense.

> > +			if (z->inactive_clean_pages - z->free_pages > z->pages_low &&
> > +			    waitqueue_active(&kreclaimd_wait))
> > +				wake_up_interruptible(&kreclaimd_wait);
>
> This doesn't make any sense to me at all.  Why wake up kreclaimd
> just because the difference between the number of inactive_clean
> pages and free pages is large ?

You had to get there with direct_reclaim not set was the thought.
Nobody gave the zone a transfusion, but there is a blood supply.
If nobody gets around to refilling the zone, kreclaimd will.

> Didn't we determine in our last exchange of email that it would
> be a good thing under most loads to keep as much inactive_clean
> memory around as possible and not waste^Wfree memory early ?
> So why do we reclaim if we're just below pages_high?

The whole point of this patch is to reclaim _less_ in the general
case, but to do so in a timely manner if we really need it.

> > -	/*
> > -	 * First, see if we have any zones with lots of free memory.
> > -	 *
> > -	 * We allocate free memory first because it doesn't
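The "tricklecharger" refill decision being debated above can be sketched as a small userspace model. To be clear, `struct zone_model` and `refill_count` are invented stand-ins for illustration, not the real 2.4 `struct zone` or the patch's exact code, and the reconstructed `<`/`<<` operators are a best guess at the garbled hunks:

```c
/*
 * Toy model of the count/"tricklecharger" logic from the patch above.
 * All names and thresholds are illustrative stand-ins, not real 2.4 mm
 * code: donate nothing unless the zone is short on free pages, and
 * never more than half of the reclaimable pool.
 */
#include <assert.h>

struct zone_model {
	long free_pages;
	long inactive_clean_pages;
	long pages_min, pages_low, pages_high;
};

static long refill_count(const struct zone_model *z, int page_cluster)
{
	long count = 0;

	if (z->free_pages < z->pages_low && z->inactive_clean_pages) {
		count = 4L * (1L << page_cluster);
		/* cap the donation at half the reclaimable pool */
		if (z->inactive_clean_pages < 2 * count)
			count = z->inactive_clean_pages / 2;
	}
	return count;
}
```

With `page_cluster` of 4 the full donation is 64 pages; a zone with only 100 clean pages donates 50, and a zone above `pages_low` donates nothing.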
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Thu, 24 May 2001, Mike Galbraith wrote:

> On Thu, 24 May 2001, Rik van Riel wrote:
> > On Thu, 24 May 2001, Mike Galbraith wrote:
> > > On Sun, 20 May 2001, Rik van Riel wrote:
> > > > Remember that inactive_clean pages are always immediately
> > > > reclaimable by __alloc_pages(), if you measured a performance
> > > > difference by freeing pages in a different way I'm pretty sure
> > > > it's a side effect of something else.  What that something
> > > > else is I'm curious to find out, but I'm pretty convinced that
> > > > throwing away data early isn't the way to go.
> > >
> > > OK.. let's forget about throughput for a moment and consider
> > > those annoying reports of 0 order allocations failing :)
> >
> > Those are ok.  All failing 0 order allocations are either atomic
> > allocations or GFP_BUFFER allocations.  I guess we should just
> > remove the printk() ;)
>
> Hmm.  The guy who's box locks up on him after a burst of these
> probably doesn't think these failures are very OK ;-)  I don't
> think order 0 failing is cool at all.. ever.

You may not think it's cool, but it's needed in order to prevent
deadlocks. Just because an allocation cannot do disk IO or sleep,
that's no reason to loop around like crazy in __alloc_pages() and
hang the machine ... ;)

> A (long) while back, Linus specifically mentioned worrying about
> atomic allocation reliability.

That's a separate issue.  That was, IIRC, about the failure of
atomic allocations causing packet loss on Linux routers and,
because of that, poor performance.

This is something we still need to look into, but basically this
problem is about too high latency and NOT about pre-freeing more
pages (like your patch attempts).

If this problem is still an issue, it's quite likely that the VM
is holding locks for too long so that it cannot react fast enough
to free up some inactive_clean pages.

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://www.conectiva.com/	http://distro.conectiva.com/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Thu, 24 May 2001, Rik van Riel wrote:

> > > > OK.. let's forget about throughput for a moment and consider
> > > > those annoying reports of 0 order allocations failing :)
> > >
> > > Those are ok.  All failing 0 order allocations are either atomic
> > > allocations or GFP_BUFFER allocations.  I guess we should just
> > > remove the printk() ;)
> >
> > Hmm.  The guy who's box locks up on him after a burst of these
> > probably doesn't think these failures are very OK ;-)  I don't
> > think order 0 failing is cool at all.. ever.
>
> You may not think it's cool, but it's needed in order to prevent
> deadlocks. Just because an allocation cannot do disk IO or sleep,
> that's no reason to loop around like crazy in __alloc_pages() and
> hang the machine ... ;)

True, but if we have resources available there's no excuse for a
failure.  Well, yes there is.  If the cost of that resource is
higher than the value of letting the allocation succeed.  We have
no data on the value of success, but we do plan on consuming the
reclaimable pool and do that (must), so I still think turning
these resources loose at strategic moments is logically sound.
(doesn't mean there's not a better way.. it's just an easy way)

I'd really like someone who has this problem to try the patch to
see if it does help.  I don't have this darn problem myself, so
I'm left holding a bag of idle curiosity. ;-)

	Cheers,

	-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
David Weinehall wrote: > IMVHO every developer involved in memory-management (and indeed, any > software development; the authors of ntpd comes in mind here) should > have a 386 with 4MB of RAM and some 16MB of swap. Nowadays I have the > luxury of a 486 with 8MB of RAM and 32MB of swap as a firewall, but it's > still a pain to work with. If you really want to have fun, remove all swap... Scott Anderson [EMAIL PROTECTED] MontaVista Software Inc. (408)328-9214 1237 East Arques Ave. http://www.mvista.com Sunnyvale, CA 94085 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
>Time to hunt around for a 386 or 486 which is limited to such >a small amount of RAM ;) I've got an old knackered 486DX/33 with 8Mb RAM (in 30-pin SIMMs, woohoo!), a flat CMOS battery, a 2Gb Maxtor HD that needs a low-level format every year, and no case. It isn't running anything right now... -- from: Jonathan "Chromatix" Morton mail: [EMAIL PROTECTED] (not for attachments) big-mail: [EMAIL PROTECTED] uni-mail: [EMAIL PROTECTED] The key to knowledge is not to rely on people to teach you it. Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/ -BEGIN GEEK CODE BLOCK- Version 3.12 GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*) -END GEEK CODE BLOCK- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Mon, 21 May 2001, David Weinehall wrote: > IMVHO every developer involved in memory-management (and indeed, any > software development; the authors of ntpd comes in mind here) should > have a 386 with 4MB of RAM and some 16MB of swap. Nowadays I have the > luxury of a 486 with 8MB of RAM and 32MB of swap as a firewall, but it's > still a pain to work with. You're absolutely right. The smallest thing I'm testing with on a regular basis is my dual pentium machine, booted with mem=8m or mem=16m. Time to hunt around for a 386 or 486 which is limited to such a small amount of RAM ;) cheers, Rik -- Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, May 20, 2001 at 11:54:09PM +0200, Pavel Machek wrote:
> Hi!
>
> > > You're right.  It should never dump too much data at once.  OTOH, if
> > > those cleaned pages are really old (front of reclaim list), there's no
> > > value in keeping them either.  Maybe there should be a slow bleed for
> > > mostly idle or lightly loaded conditions.
> >
> > If you don't think it's worthwhile keeping the oldest pages
> > in memory around, please hand me your excess DIMMS ;)
>
> Sorry, Rik, you can't have that that DIMM. You know, you are
> developing memory managment, and we can't have you having too much
> memory available ;-).

IMVHO every developer involved in memory-management (and indeed, any
software development; the authors of ntpd comes in mind here) should
have a 386 with 4MB of RAM and some 16MB of swap. Nowadays I have the
luxury of a 486 with 8MB of RAM and 32MB of swap as a firewall, but it's
still a pain to work with.


/David
  _                                                                 _
 // David Weinehall <[EMAIL PROTECTED]> /> Northern lights wander      \\
//  Project MCA Linux hacker        //  Dance across the winter sky //
\>  http://www.acc.umu.se/~tao/    //   Full colour fire           </
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
Hi! > > You're right. It should never dump too much data at once. OTOH, if > > those cleaned pages are really old (front of reclaim list), there's no > > value in keeping them either. Maybe there should be a slow bleed for > > mostly idle or lightly loaded conditions. > > If you don't think it's worthwhile keeping the oldest pages > in memory around, please hand me your excess DIMMS ;) Sorry, Rik, you can't have that that DIMM. You know, you are developing memory managment, and we can't have you having too much memory available ;-). Pavel -- I'm [EMAIL PROTECTED] "In my country we have almost anarchy and I don't care." Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
Hi, On Sun, May 20, 2001 at 07:04:31AM -0300, Rik van Riel wrote: > On Sun, 20 May 2001, Mike Galbraith wrote: > > > > Looking at the locking and trying to think SMP (grunt) though, I > > don't like the thought of taking two locks for each page until > > > 100%. The data in that block is toast anyway. A big hairy SMP > > box has to feel reclaim_page(). (they probably feel the zone lock > > too.. probably would like to allocate blocks) > > Indeed, but this is a separate problem. Doing per-CPU private > (small, 8-32 page?) free lists is probably a good idea Ingo already implemented that for Tux2. Cheers, Stephen - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, 20 May 2001, Marcelo Tosatti wrote:

> On Sat, 19 May 2001, Mike Galbraith wrote:
>
> > @@ -1054,7 +1033,7 @@
> >  		if (!zone->size)
> >  			continue;
> >
> > -		while (zone->free_pages < zone->pages_low) {
> > +		while (zone->free_pages < zone->inactive_clean_pages) {
> >  			struct page * page;
> >  			page = reclaim_page(zone);
> >  			if (!page)
>
> What you're trying to do with this change ?

Just ensuring that I never had a large supply of cleaned pages laying
around at a time when folks are in distress.  It also ensures that you
never donate your last reclaimable pages, but that wasn't the intent.
It was a stray thought that happened to produce measurable improvement.

	-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, 20 May 2001, Rik van Riel wrote: > On Sun, 20 May 2001, Mike Galbraith wrote: > > On 20 May 2001, Zlatko Calusic wrote: > > > > Also in all recent kernels, if the machine is swapping, swap cache > > > grows without limits and is hard to recycle, but then again that is > > > a known problem. > > > > This one bugs me. I do not see that and can't understand why. > > Could it be because we never free swap space and never > delete pages from the swap cache ? I sent a query to the list asking if a heavy load cleared it out, but got no replies. I figured about the only thing it could be is that under light load, reclaim isn't needed to cure and shortage. -Mike - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sat, 19 May 2001, Mike Galbraith wrote:

> @@ -1054,7 +1033,7 @@
>  		if (!zone->size)
>  			continue;
>
> -		while (zone->free_pages < zone->pages_low) {
> +		while (zone->free_pages < zone->inactive_clean_pages) {
>  			struct page * page;
>  			page = reclaim_page(zone);
>  			if (!page)

What you're trying to do with this change ?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, 20 May 2001, Mike Galbraith wrote: > On 20 May 2001, Zlatko Calusic wrote: > > Also in all recent kernels, if the machine is swapping, swap cache > > grows without limits and is hard to recycle, but then again that is > > a known problem. > > This one bugs me. I do not see that and can't understand why. Could it be because we never free swap space and never delete pages from the swap cache ? Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://distro.conectiva.com/ Send all your spam to [EMAIL PROTECTED] (spam digging piggy) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, 20 May 2001, Mike Galbraith wrote:

> > Also in all recent kernels, if the machine is swapping, swap cache
> > grows without limits and is hard to recycle, but then again that is
> > a known problem.
>
> This one bugs me.  I do not see that and can't understand why.

To throw away dirty and dead swapcache pages (it's done at swap
writepage()), page_launder() has to run into its second loop
(launder_loop = 1), meaning that a lot of clean cache has been thrown
out already.

We can "short circuit" these dead swapcache pages by cleaning them in
the first page_launder() loop.

Take a look at the writepage() patch I sent to Linus a few days ago.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
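The "short circuit" idea above can be modelled as a tiny decision function. This is only an illustrative sketch with invented names (`launder_action`, the flags, the enum); the real 2.4 page_launder() is considerably more involved:

```c
/*
 * Illustrative model of the page_launder() "short circuit": with the
 * writepage() change, dead swap-cache pages get cleaned in the first
 * pass instead of waiting for the launder_loop = 1 second pass.
 * Invented names; not the real 2.4 code.
 */
#include <assert.h>

enum launder_result { SKIP, WRITEPAGE, RECLAIM };

static enum launder_result launder_action(int launder_loop, int dirty,
					  int dead_swapcache)
{
	if (!dirty)
		return RECLAIM;		/* already clean: reclaim now */
	if (dead_swapcache || launder_loop)
		return WRITEPAGE;	/* clean it immediately */
	return SKIP;			/* defer to the second pass */
}
```

The point of the change is visible in the first-pass case: a dirty dead swap-cache page is written out even with `launder_loop == 0`, while ordinary dirty pages still wait for the second pass.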
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On 20 May 2001, Zlatko Calusic wrote: > Mike Galbraith <[EMAIL PROTECTED]> writes: > > > Hi, > > > > On Fri, 18 May 2001, Stephen C. Tweedie wrote: > > > > > That's the main problem with static parameters. The problem you are > > > trying to solve is fundamentally dynamic in most cases (which is also > > > why magic numbers tend to suck in the VM.) > > > > Magic numbers might be sucking some performance right now ;-) > > > [snip] > > I like your patch, it improves performance somewhat and makes things > more smooth and also code is simpler. Thanks for the feedback. Positive is nice.. as is negative. > Anyway, 2.4.5-pre3 is quite debalanced and it has even broken some > things that were working properly before. For instance, swapoff now > deadlocks the machine (even with your patch applied). I haven't run into that. > Unfortunately, I have failed to pinpoint the exact problem, but I'm > confident that kernel goes in some kind of loop (99% system time, just > before deadlock). Anybody has some guidelines how to debug kernel if > you're running X? Serial console and kdb or kgdb if you have two machines.. or uml? > Also in all recent kernels, if the machine is swapping, swap cache > grows without limits and is hard to recycle, but then again that is > a known problem. This one bugs me. I do not see that and can't understand why. -Mike - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, 20 May 2001, Ingo Oeser wrote:

> On Sun, May 20, 2001 at 05:29:49AM +0200, Mike Galbraith wrote:
> > I'm not sure why that helps.  I didn't put it in as a trick or
> > anything though.  I put it in because it didn't seem like a
> > good idea to ever have more cleaned pages than free pages at a
> > time when we're yammering for help.. so I did that and it helped.
>
> The rationale for this is easy: free pages is wasted memory,
> clean pages is hot, clean cache. The best state a cache can be in.

Sure.  Under low load, cache is great.  Under stress, keeping it is
not an option though ;-)  We're at or beyond capacity and moving at a
high delta V (people yammering for help).  If you can recognize and
kill the delta rapidly by dumping that which you are going to have to
dump anyway, you save time getting back on your feet.  (my guess as
to why dumping clean pages does measurably help in this case)

	-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
Mike Galbraith <[EMAIL PROTECTED]> writes: > Hi, > > On Fri, 18 May 2001, Stephen C. Tweedie wrote: > > > That's the main problem with static parameters. The problem you are > > trying to solve is fundamentally dynamic in most cases (which is also > > why magic numbers tend to suck in the VM.) > > Magic numbers might be sucking some performance right now ;-) > [snip] I like your patch, it improves performance somewhat and makes things more smooth and also code is simpler. Anyway, 2.4.5-pre3 is quite debalanced and it has even broken some things that were working properly before. For instance, swapoff now deadlocks the machine (even with your patch applied). Unfortunately, I have failed to pinpoint the exact problem, but I'm confident that kernel goes in some kind of loop (99% system time, just before deadlock). Anybody has some guidelines how to debug kernel if you're running X? Also in all recent kernels, if the machine is swapping, swap cache grows without limits and is hard to recycle, but then again that is a known problem. -- Zlatko - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, May 20, 2001 at 05:29:49AM +0200, Mike Galbraith wrote: > I'm not sure why that helps. I didn't put it in as a trick or > anything though. I put it in because it didn't seem like a > good idea to ever have more cleaned pages than free pages at a > time when we're yammering for help.. so I did that and it helped. The rationale for this is easy: free pages is wasted memory, clean pages is hot, clean cache. The best state a cache can be in. Regards Ingo Oeser -- To the systems programmer, users and applications serve only to provide a test load. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, 20 May 2001, Mike Galbraith wrote: > but ;-) > > Looking at the locking and trying to think SMP (grunt) though, I > don't like the thought of taking two locks for each page until > 100%. The data in that block is toast anyway. A big hairy SMP > box has to feel reclaim_page(). (they probably feel the zone lock > too.. probably would like to allocate blocks) Indeed, but this is a separate problem. Doing per-CPU private (small, 8-32 page?) free lists is probably a good idea, but I don't really think it's related to kreclaimd ;) regards, Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://distro.conectiva.com/ Send all your spam to [EMAIL PROTECTED] (spam digging piggy) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
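The per-CPU private free list Rik floats above ("small, 8-32 page?") might look roughly like the following. This is a hedged sketch under invented names (`pcp_list`, `pcp_alloc`, `pcp_free`); refilling from and spilling back to the locked zone free lists is deliberately left out:

```c
/*
 * Sketch of a small per-CPU private free list: common order-0
 * alloc/free hits this stack instead of the global zone lock.
 * Names invented for illustration; not real 2.4 kernel code.
 */
#include <assert.h>
#include <stddef.h>

#define PCP_CAPACITY 8	/* the "small, 8-32 page" list discussed */

struct pcp_list {
	int count;
	void *pages[PCP_CAPACITY];
};

/* Pop a page without touching the zone lock; NULL means "refill me". */
static void *pcp_alloc(struct pcp_list *p)
{
	return p->count ? p->pages[--p->count] : NULL;
}

/* Push a page back; returns 0 when full (page must go to the zone). */
static int pcp_free(struct pcp_list *p, void *page)
{
	if (p->count >= PCP_CAPACITY)
		return 0;
	p->pages[p->count++] = page;
	return 1;
}
```

Because each CPU owns its list exclusively, no locking is needed on the fast path; the zone lock is only taken on refill or spill, which is the whole appeal of the idea.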
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, 20 May 2001, Rik van Riel wrote: > On Sun, 20 May 2001, Mike Galbraith wrote: > > > You're right. It should never dump too much data at once. OTOH, if > > those cleaned pages are really old (front of reclaim list), there's no > > value in keeping them either. Maybe there should be a slow bleed for > > mostly idle or lightly loaded conditions. > > If you don't think it's worthwhile keeping the oldest pages > in memory around, please hand me your excess DIMMS ;) You're welcome to the data in any of them :) The hardware I keep. > Remember that inactive_clean pages are always immediately > reclaimable by __alloc_pages(), if you measured a performance > difference by freeing pages in a different way I'm pretty sure > it's a side effect of something else. What that something > else is I'm curious to find out, but I'm pretty convinced that > throwing away data early isn't the way to go. OK. I'm getting a little distracted by thinking about the locking and some latency comments I've heard various gurus make. I should probably stick to thinking about/measuring throughput.. much easier. but ;-) Looking at the locking and trying to think SMP (grunt) though, I don't like the thought of taking two locks for each page until kreclaimd gets a chance to run. One of those locks is the pagecache_lock, and that makes me think it'd be better to just reclaim a block if I have to reclaim at all. At that point, the chances of needing to lock the pagecache soon again are about 100%. The data in that block is toast anyway. A big hairy SMP box has to feel reclaim_page(). (they probably feel the zone lock too.. probably would like to allocate blocks) -Mike - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, 20 May 2001, Mike Galbraith wrote: > You're right. It should never dump too much data at once. OTOH, if > those cleaned pages are really old (front of reclaim list), there's no > value in keeping them either. Maybe there should be a slow bleed for > mostly idle or lightly loaded conditions. If you don't think it's worthwhile keeping the oldest pages in memory around, please hand me your excess DIMMS ;) Remember that inactive_clean pages are always immediately reclaimable by __alloc_pages(), if you measured a performance difference by freeing pages in a different way I'm pretty sure it's a side effect of something else. What that something else is I'm curious to find out, but I'm pretty convinced that throwing away data early isn't the way to go. regards, Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://distro.conectiva.com/ Send all your spam to [EMAIL PROTECTED] (spam digging piggy) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, 20 May 2001, Rik van Riel wrote: > On Sun, 20 May 2001, Mike Galbraith wrote: > > > > I'm not sure why that helps. I didn't put it in as a trick or > > anything though. I put it in because it didn't seem like a > > good idea to ever have more cleaned pages than free pages at a > > time when we're yammering for help.. so I did that and it helped. >^ > > Note that this is not the normal situation. Now think > about the amount of data you'd be blowing away from the > inactive_clean pages after a bit of background aging > has gone on on a lightly loaded system. Not Good(tm) You're right. It should never dump too much data at once. OTOH, if those cleaned pages are really old (front of reclaim list), there's no value in keeping them either. Maybe there should be a slow bleed for mostly idle or lightly loaded conditions. -Mike - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, 20 May 2001, Mike Galbraith wrote: > On Sat, 19 May 2001, Rik van Riel wrote: > > On Sat, 19 May 2001, Mike Galbraith wrote: > > > On Fri, 18 May 2001, Stephen C. Tweedie wrote: > > > > > > > That's the main problem with static parameters. The problem you are > > > > trying to solve is fundamentally dynamic in most cases (which is also > > > > why magic numbers tend to suck in the VM.) > > > > > > Magic numbers might be sucking some performance right now ;-) > > > > ... so you replace them with some others ... ;) > > I reused one of our base numbers to classify the severity of the > situation.. not the same as inventing new ones. (well, not quite > the same anyway.. half did come from the south fourty;) *nod* ;) (not that I'm saying this is bad ... it's just that I'd like to know why things work before looking at applying them) > > > (yes, the last hunk looks out of place wrt my text. > > > > It also looks kind of bogus and geared completely towards this > > particular workload ;) > > I'm not sure why that helps. I didn't put it in as a trick or > anything though. I put it in because it didn't seem like a > good idea to ever have more cleaned pages than free pages at a > time when we're yammering for help.. so I did that and it helped. ^ Note that this is not the normal situation. Now think about the amount of data you'd be blowing away from the inactive_clean pages after a bit of background aging has gone on on a lightly loaded system. Not Good(tm) regards, Rik -- Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, 20 May 2001, Rik van Riel wrote:

> On Sun, 20 May 2001, Mike Galbraith wrote:
>
> > I'm not sure why that helps. I didn't put it in as a trick or
> > anything though. I put it in because it didn't seem like a
> > good idea to ever have more cleaned pages than free pages at a
> > time when we're yammering for help.. so I did that and it helped.
>
> Note that this is not the normal situation. Now think about
> the amount of data you'd be blowing away from the inactive_clean
> pages after a bit of background aging has gone on on a lightly
> loaded system. Not Good(tm)

You're right. It should never dump too much data at once. OTOH,
if those cleaned pages are really old (front of reclaim list),
there's no value in keeping them either. Maybe there should be a
slow bleed for mostly idle or lightly loaded conditions.

	-Mike
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, 20 May 2001, Mike Galbraith wrote:

> You're right. It should never dump too much data at once. OTOH,
> if those cleaned pages are really old (front of reclaim list),
> there's no value in keeping them either. Maybe there should be a
> slow bleed for mostly idle or lightly loaded conditions.

If you don't think it's worthwhile keeping the oldest pages in
memory around, please hand me your excess DIMMs ;)

Remember that inactive_clean pages are always immediately
reclaimable by __alloc_pages(), so if you measured a performance
difference by freeing pages in a different way, I'm pretty sure
it's a side effect of something else. What that something else is
I'm curious to find out, but I'm pretty convinced that throwing
away data early isn't the way to go.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, 20 May 2001, Rik van Riel wrote:

> On Sun, 20 May 2001, Mike Galbraith wrote:
>
> > You're right. It should never dump too much data at once. OTOH,
> > if those cleaned pages are really old (front of reclaim list),
> > there's no value in keeping them either. Maybe there should be a
> > slow bleed for mostly idle or lightly loaded conditions.
>
> If you don't think it's worthwhile keeping the oldest pages in
> memory around, please hand me your excess DIMMs ;)

You're welcome to the data in any of them :) The hardware I keep.

> Remember that inactive_clean pages are always immediately
> reclaimable by __alloc_pages(), if you measured a performance
> difference by freeing pages in a different way I'm pretty sure
> it's a side effect of something else. What that something else
> is I'm curious to find out, but I'm pretty convinced that
> throwing away data early isn't the way to go.

OK. I'm getting a little distracted by thinking about the locking
and some latency comments I've heard various gurus make. I should
probably stick to thinking about/measuring throughput.. much easier.

but ;-) Looking at the locking and trying to think SMP (grunt)
though, I don't like the thought of taking two locks for each page
until kreclaimd gets a chance to run. One of those locks is the
pagecache_lock, and that makes me think it'd be better to just
reclaim a block if I have to reclaim at all. At that point, the
chances of needing to lock the pagecache soon again are about
100%. The data in that block is toast anyway. A big hairy SMP box
has to feel reclaim_page(). (they probably feel the zone lock
too.. probably would like to allocate blocks)

	-Mike
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, 20 May 2001, Mike Galbraith wrote:

> but ;-) Looking at the locking and trying to think SMP (grunt)
> though, I don't like the thought of taking two locks for each
> page until kreclaimd gets a chance to run. [...] The data in
> that block is toast anyway. A big hairy SMP box has to feel
> reclaim_page(). (they probably feel the zone lock too.. probably
> would like to allocate blocks)

Indeed, but this is a separate problem. Doing per-CPU private
(small, 8-32 page?) free lists is probably a good idea, but I
don't really think it's related to kreclaimd ;)

regards,

Rik
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, May 20, 2001 at 05:29:49AM +0200, Mike Galbraith wrote:

> I'm not sure why that helps. I didn't put it in as a trick or
> anything though. I put it in because it didn't seem like a
> good idea to ever have more cleaned pages than free pages at a
> time when we're yammering for help.. so I did that and it helped.

The rationale for this is easy: free pages are wasted memory;
clean pages are hot, clean cache. The best state a cache can be in.

Regards

Ingo Oeser
--
To the systems programmer,
users and applications serve only to provide a test load.
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
Mike Galbraith <[EMAIL PROTECTED]> writes:

> Hi,
>
> On Fri, 18 May 2001, Stephen C. Tweedie wrote:
>
> > That's the main problem with static parameters. The problem you are
> > trying to solve is fundamentally dynamic in most cases (which is also
> > why magic numbers tend to suck in the VM.)
>
> Magic numbers might be sucking some performance right now ;-)

[snip]

I like your patch; it improves performance somewhat, makes things
smoother, and the code is simpler too.

Anyway, 2.4.5-pre3 is quite unbalanced and has even broken some
things that were working properly before. For instance, swapoff
now deadlocks the machine (even with your patch applied).
Unfortunately, I have failed to pinpoint the exact problem, but
I'm confident the kernel goes into some kind of loop (99% system
time, just before the deadlock). Does anybody have guidelines on
how to debug the kernel if you're running X?

Also, in all recent kernels, if the machine is swapping, the swap
cache grows without limits and is hard to recycle, but then again
that is a known problem.
--
Zlatko
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, 20 May 2001, Ingo Oeser wrote:

> On Sun, May 20, 2001 at 05:29:49AM +0200, Mike Galbraith wrote:
>
> > I'm not sure why that helps. I didn't put it in as a trick or
> > anything though. I put it in because it didn't seem like a
> > good idea to ever have more cleaned pages than free pages at a
> > time when we're yammering for help.. so I did that and it helped.
>
> The rationale for this is easy: free pages are wasted memory,
> clean pages are hot, clean cache. The best state a cache can be in.

Sure. Under low load, cache is great. Under stress, keeping it is
not an option though ;-) We're at or beyond capacity and moving at
a high delta V (people yammering for help). If you can recognize
and kill the delta rapidly by dumping that which you are going to
have to dump anyway, you save time getting back on your feet. (my
guess as to why dumping clean pages does measurably help in this case)

	-Mike
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On 20 May 2001, Zlatko Calusic wrote:

> Mike Galbraith <[EMAIL PROTECTED]> writes:
>
> > Hi,
> >
> > On Fri, 18 May 2001, Stephen C. Tweedie wrote:
> >
> > > That's the main problem with static parameters. The problem you are
> > > trying to solve is fundamentally dynamic in most cases (which is also
> > > why magic numbers tend to suck in the VM.)
> >
> > Magic numbers might be sucking some performance right now ;-)
>
> [snip]
>
> I like your patch, it improves performance somewhat and makes
> things more smooth and also code is simpler.

Thanks for the feedback. Positive is nice.. as is negative.

> Anyway, 2.4.5-pre3 is quite debalanced and it has even broken some
> things that were working properly before. For instance, swapoff
> now deadlocks the machine (even with your patch applied).

I haven't run into that.

> Unfortunately, I have failed to pinpoint the exact problem, but
> I'm confident that kernel goes in some kind of loop (99% system
> time, just before deadlock). Anybody has some guidelines how to
> debug kernel if you're running X?

Serial console and kdb or kgdb if you have two machines.. or uml?

> Also in all recent kernels, if the machine is swapping, swap
> cache grows without limits and is hard to recycle, but then again
> that is a known problem.

This one bugs me. I do not see that and can't understand why.

	-Mike
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, 20 May 2001, Mike Galbraith wrote:

> > Also in all recent kernels, if the machine is swapping, swap
> > cache grows without limits and is hard to recycle, but then
> > again that is a known problem.
>
> This one bugs me. I do not see that and can't understand why.

To throw away dirty and dead swap-cache pages (this is done at
swap writepage()), page_launder() has to run into its second loop
(launder_loop = 1), meaning that a lot of clean cache has been
thrown out already.

We can short-circuit these dead swap-cache pages by cleaning them
in the first page_launder() loop. Take a look at the writepage()
patch I sent to Linus a few days ago.
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, 20 May 2001, Mike Galbraith wrote:

> On 20 May 2001, Zlatko Calusic wrote:
>
> > Also in all recent kernels, if the machine is swapping, swap
> > cache grows without limits and is hard to recycle, but then
> > again that is a known problem.
>
> This one bugs me. I do not see that and can't understand why.

Could it be because we never free swap space and never delete
pages from the swap cache?

Rik
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sat, 19 May 2001, Mike Galbraith wrote:

> @@ -1054,7 +1033,7 @@
>  		if (!zone->size)
>  			continue;
> -		while (zone->free_pages < zone->pages_low) {
> +		while (zone->free_pages < zone->inactive_clean_pages) {
>  			struct page * page;
>  			page = reclaim_page(zone);
>  			if (!page)

What are you trying to do with this change?
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, 20 May 2001, Rik van Riel wrote:

> On Sun, 20 May 2001, Mike Galbraith wrote:
> > On 20 May 2001, Zlatko Calusic wrote:
> >
> > > Also in all recent kernels, if the machine is swapping, swap
> > > cache grows without limits and is hard to recycle, but then
> > > again that is a known problem.
> >
> > This one bugs me. I do not see that and can't understand why.
>
> Could it be because we never free swap space and never delete
> pages from the swap cache?

I sent a query to the list asking if a heavy load cleared it out,
but got no replies. I figured about the only thing it could be is
that under light load, reclaim isn't needed to cure a shortage.

	-Mike
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, 20 May 2001, Marcelo Tosatti wrote:

> On Sat, 19 May 2001, Mike Galbraith wrote:
>
> > @@ -1054,7 +1033,7 @@
> >  		if (!zone->size)
> >  			continue;
> > -		while (zone->free_pages < zone->pages_low) {
> > +		while (zone->free_pages < zone->inactive_clean_pages) {
> >  			struct page * page;
> >  			page = reclaim_page(zone);
> >  			if (!page)
>
> What are you trying to do with this change?

Just ensuring that I never had a large supply of cleaned pages
laying around at a time when folks are in distress. It also
ensures that you never donate your last reclaimable pages, but
that wasn't the intent. It was a stray thought that happened to
produce measurable improvement.

	-Mike
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, 20 May 2001, Dieter Nützel wrote:

> > > Three back to back make -j 30 runs for three different kernels.
> > > Swap cache numbers are taken immediately after last completion.
> >
> > The performance increase is nice, though. Do you see similar
> > changes in different kinds of workloads ?
>
> If you have a patch against 2.4.4-ac11 I will do some tests with
> some (interactive) 3D apps.

I don't have an ac kernel resident atm, but since Alan merged here
very recently, it will probably go in ok. If not, just holler and
I'll download ac11 and make you a clean patch.

	-Mike
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sat, 19 May 2001, Rik van Riel wrote:

> On Sat, 19 May 2001, Mike Galbraith wrote:
> > On Fri, 18 May 2001, Stephen C. Tweedie wrote:
> >
> > > That's the main problem with static parameters. The problem you are
> > > trying to solve is fundamentally dynamic in most cases (which is also
> > > why magic numbers tend to suck in the VM.)
> >
> > Magic numbers might be sucking some performance right now ;-)
>
> ... so you replace them with some others ... ;)

I reused one of our base numbers to classify the severity of the
situation.. not the same as inventing new ones. (well, not quite
the same anyway.. half did come from the south forty ;)

> > Three back to back make -j 30 runs for three different kernels.
> > Swap cache numbers are taken immediately after last completion.
>
> The performance increase is nice, though. Do you see similar
> changes in different kinds of workloads ?

I don't have much to test with here, but I'll see if I can find
something. I'd rather see someone with a server load try it.

> > (yes, the last hunk looks out of place wrt my text.
>
> It also looks kind of bogus and geared completely towards this
> particular workload ;)

I'm not sure why that helps. I didn't put it in as a trick or
anything though. I put it in because it didn't seem like a
good idea to ever have more cleaned pages than free pages at a
time when we're yammering for help.. so I did that and it helped.

	-Mike
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
> > Three back to back make -j 30 runs for three different kernels.
> > Swap cache numbers are taken immediately after last completion.
>
> The performance increase is nice, though. Do you see similar
> changes in different kinds of workloads ?

If you have a patch against 2.4.4-ac11 I will do some tests with
some (interactive) 3D apps.

-Dieter
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sat, 19 May 2001, Mike Galbraith wrote:

> On Fri, 18 May 2001, Stephen C. Tweedie wrote:
>
> > That's the main problem with static parameters. The problem you are
> > trying to solve is fundamentally dynamic in most cases (which is also
> > why magic numbers tend to suck in the VM.)
>
> Magic numbers might be sucking some performance right now ;-)

... so you replace them with some others ... ;)

> Three back to back make -j 30 runs for three different kernels.
> Swap cache numbers are taken immediately after last completion.

The performance increase is nice, though. Do you see similar
changes in different kinds of workloads ?

> (yes, the last hunk looks out of place wrt my text.

It also looks kind of bogus and geared completely towards this
particular workload ;)

regards,

Rik