Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-26 Thread Pavel Machek

Hi!

> > IMVHO every developer involved in memory-management (and indeed, any
> > software development; the authors of ntpd come to mind here) should
> > have a 386 with 4MB of RAM and some 16MB of swap. Nowadays I have the
> > luxury of a 486 with 8MB of RAM and 32MB of swap as a firewall, but it's
> > still a pain to work with.
> 
> If you really want to have fun, remove all swap...

My handheld has 12MB of RAM, no swap ;-), and that's a pretty big
machine for a handheld.
Pavel
PS: Swapping on a flash disk is a bad idea, right?
-- 
I'm [EMAIL PROTECTED] "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-25 Thread Pavel Machek

Hi!

> > IMVHO every developer involved in memory-management (and indeed, any
> > software development; the authors of ntpd come to mind here) should
> > have a 386 with 4MB of RAM and some 16MB of swap. Nowadays I have the
> > luxury of a 486 with 8MB of RAM and 32MB of swap as a firewall, but it's
> > still a pain to work with.
> 
> You're absolutely right. The smallest thing I'm testing with
> on a regular basis is my dual pentium machine, booted with
> mem=8m or mem=16m.
> 
> Time to hunt around for a 386 or 486 which is limited to such
> a small amount of RAM ;)

Buy an Agenda handheld: 16MB flash, 8MB RAM, X, the size of a palm. It
is definitely a sexier machine than the average 486. [Or get a Philips
Velo 1, if you want a keyboard ;-)]
Pavel
-- 
The best software in life is free (not shareware)!  Pavel
GCM d? s-: !g p?:+ au- a--@ w+ v- C++@ UL+++ L++ N++ E++ W--- M- Y- R+
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-25 Thread David Weinehall

On Wed, May 23, 2001 at 05:51:50PM +, Scott Anderson wrote:
> David Weinehall wrote:
> > IMVHO every developer involved in memory-management (and indeed, any
> > software development; the authors of ntpd come to mind here) should
> > have a 386 with 4MB of RAM and some 16MB of swap. Nowadays I have the
> > luxury of a 486 with 8MB of RAM and 32MB of swap as a firewall, but it's
> > still a pain to work with.
> 
> If you really want to have fun, remove all swap...

Oh, I've done some testing without swap too, mainly to test Rik's
oom-killer. Seemed to work pretty well. Can't say it was enjoyable, though.


/David
  _ _
 // David Weinehall <[EMAIL PROTECTED]> /> Northern lights wander  \\
//  Project MCA Linux hacker//  Dance across the winter sky //
\>  http://www.acc.umu.se/~tao/ </   Full colour fire   \>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-24 Thread Mike Galbraith

On Thu, 24 May 2001, Rik van Riel wrote:

> > > > OK.. let's forget about throughput for a moment and consider
> > > > those annoying reports of 0 order allocations failing :)
> > >
> > > Those are ok.  All failing 0 order allocations are either
> > > atomic allocations or GFP_BUFFER allocations.  I guess we
> > > should just remove the printk()  ;)
> >
> > Hmm.  The guy whose box locks up on him after a burst of these
> > probably doesn't think these failures are very OK ;-)  I don't
> > think order 0 failing is cool at all.. ever.
>
> You may not think it's cool, but it's needed in order to
> prevent deadlocks. Just because an allocation cannot do
> disk IO or sleep, that's no reason to loop around like
> crazy in __alloc_pages() and hang the machine ... ;)

True, but if we have resources available there's no excuse for a
failure.  Well, yes there is: if the cost of that resource is
higher than the value of letting the allocation succeed.  We have
no data on the value of success, but we do plan on consuming the
reclaimable pool (we must), so I still think turning these
resources loose at strategic moments is logically sound.
(That doesn't mean there's no better way.. it's just an easy way.)

I'd really like someone who has this problem to try the patch to
see if it does help.  I don't have this darn problem myself, so
I'm left holding a bag of idle curiosity. ;-)

Cheers,

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-24 Thread Rik van Riel

On Thu, 24 May 2001, Mike Galbraith wrote:
> On Thu, 24 May 2001, Rik van Riel wrote:
> > On Thu, 24 May 2001, Mike Galbraith wrote:
> > > On Sun, 20 May 2001, Rik van Riel wrote:
> > >
> > > > Remember that inactive_clean pages are always immediately
> > > > reclaimable by __alloc_pages(), if you measured a performance
> > > > difference by freeing pages in a different way I'm pretty sure
> > > > it's a side effect of something else.  What that something
> > > > else is I'm curious to find out, but I'm pretty convinced that
> > > > throwing away data early isn't the way to go.
> > >
> > > OK.. let's forget about throughput for a moment and consider
> > > those annoying reports of 0 order allocations failing :)
> >
> > Those are ok.  All failing 0 order allocations are either
> > atomic allocations or GFP_BUFFER allocations.  I guess we
> > should just remove the printk()  ;)
>
> Hmm.  The guy whose box locks up on him after a burst of these
> probably doesn't think these failures are very OK ;-)  I don't
> think order 0 failing is cool at all.. ever.

You may not think it's cool, but it's needed in order to
prevent deadlocks. Just because an allocation cannot do
disk IO or sleep, that's no reason to loop around like
crazy in __alloc_pages() and hang the machine ... ;)

> A (long) while back, Linus specifically mentioned worrying
> about atomic allocation reliability.

That's a separate issue.  That was, IIRC, about the
failure of atomic allocations causing packet loss on
Linux routers and, because of that, poor performance.

This is something we still need to look into, but
basically this problem is about too high latency and
NOT about "pre-freeing" more pages (like your patch
attempts).  If this problem is still an issue, it's
quite likely that the VM is holding locks for too
long so that it cannot react fast enough to free up
some inactive_clean pages.

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-24 Thread Mike Galbraith

On Thu, 24 May 2001, Rik van Riel wrote:

> On Thu, 24 May 2001, Mike Galbraith wrote:
> > On Sun, 20 May 2001, Rik van Riel wrote:
> >
> > > Remember that inactive_clean pages are always immediately
> > > reclaimable by __alloc_pages(), if you measured a performance
> > > difference by freeing pages in a different way I'm pretty sure
> > > it's a side effect of something else.  What that something
> > > else is I'm curious to find out, but I'm pretty convinced that
> > > throwing away data early isn't the way to go.
> >
> > OK.. let's forget about throughput for a moment and consider
> > those annoying reports of 0 order allocations failing :)
>
> Those are ok.  All failing 0 order allocations are either
> atomic allocations or GFP_BUFFER allocations.  I guess we
> should just remove the printk()  ;)

Hmm.  The guy whose box locks up on him after a burst of these probably
doesn't think these failures are very OK ;-)  I don't think order 0
failing is cool at all.. ever.  A (long) while back, Linus specifically
mentioned worrying about atomic allocation reliability.

> > What do you think of the below (ignore the refill_inactive bit)
> > wrt allocator reliability under heavy stress?  The thing does
> > kick in and pump up zones even if I set the 'blood donor' level
> > to pages_min.
>
> > -   unsigned long water_mark;
> > +   unsigned long water_mark = 1 << order;
>
> Makes no sense at all since water_mark gets assigned not 10
> lines below.  ;)

That assignment was supposed to turn into +=.

> > +   if (direct_reclaim) {
> > +   int count;
> > +
> > +   /* If we're in bad shape.. */
> > +   if (z->free_pages < z->pages_low && z->inactive_clean_pages) {
>
> I'm not sure if we want to fill up the free list all the way
> to z->pages_low all the time, since "free memory is wasted
> memory".

Yes.  I'm just thinking of the burst of allocations with no reclaim
possible.

> The reason the current scheme only triggers when we reach
> z->pages_min and then goes all the way up to z->pages_low
> is memory defragmentation. Since we'll be doing direct

Ah.

> reclaim for just about every allocation in the system, it
> only happens occasionally that we throw away all the
> inactive_clean pages between z->pages_min and z->pages_low.

This one has me puzzled.  We're reluctant to release cleaned pages,
but at the same time, we reclaim if possible as soon as all zones
are below pages_high.

> > +   count = 4 * (1 << page_cluster);
> > +   /* reclaim a page for ourselves if we can afford to.. */
> > +   if (z->inactive_clean_pages > count)
> > +   page = reclaim_page(z);
> > +   if (z->inactive_clean_pages < 2 * count)
> > +   count = z->inactive_clean_pages / 2;
> > +   } else count = 0;
>
> What exactly is the reasoning behind this complex  "count"
> stuff? Is there a good reason for not just refilling the
> free list up to the target or until the inactive_clean list
> is depleted ?

Well, yes.  You didn't like the 50/50 split thingy I did before, so
I connected zones to a tricklecharger instead.

> > +   /*
> > +* and make a small donation to the reclaim challenged.
> > +*
> > +* We don't ever want a zone to reach the state where we
> > +* have nothing except reclaimable pages left.. not if
> > +* we can possibly do something to help prevent it.
> > +*/
>
> This comment makes little sense

If not, then none of it does.  This situation is the ONLY thing I
was worried about.  A check that free_pages + inactive_clean_pages >
pages_min does nothing for callers who can't reclaim, if most of
that headroom is inactive_clean_pages.  IFF it's possible to be
critical on free_pages and still have clean pages, it does make sense.

> > +   if (z->inactive_clean_pages - z->free_pages > z->pages_low
> > +   && waitqueue_active(&kreclaimd_wait))
> > +   wake_up_interruptible(&kreclaimd_wait);
>
> This doesn't make any sense to me at all.  Why wake up
> kreclaimd just because the difference between the number
> of inactive_clean pages and free pages is large ?

The thought was that you had to get there with direct_reclaim not set.
Nobody gave the zone a transfusion, but there is a blood supply.
If nobody gets around to refilling the zone, kreclaimd will.

> Didn't we determine in our last exchange of email that
> it would be a good thing under most loads to keep as much
> inactive_clean memory around as possible and not waste^Wfree
> memory early ?

So why do we reclaim if we're just below pages_high?  The whole
point of this patch is to reclaim _less_ in the general case, but
to do so in a timely manner if we really need it.

> > -   /*
> > -  

Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-24 Thread Rik van Riel

On Thu, 24 May 2001, Mike Galbraith wrote:
> On Sun, 20 May 2001, Rik van Riel wrote:
>
> > Remember that inactive_clean pages are always immediately
> > reclaimable by __alloc_pages(), if you measured a performance
> > difference by freeing pages in a different way I'm pretty sure
> > it's a side effect of something else.  What that something
> > else is I'm curious to find out, but I'm pretty convinced that
> > throwing away data early isn't the way to go.
>
> OK.. let's forget about throughput for a moment and consider
> those annoying reports of 0 order allocations failing :)

Those are ok.  All failing 0 order allocations are either
atomic allocations or GFP_BUFFER allocations.  I guess we
should just remove the printk()  ;)

> What do you think of the below (ignore the refill_inactive bit)
> wrt allocator reliability under heavy stress?  The thing does
> kick in and pump up zones even if I set the 'blood donor' level
> to pages_min.

> - unsigned long water_mark;
> + unsigned long water_mark = 1 << order;

Makes no sense at all since water_mark gets assigned not 10
lines below.  ;)


> + if (direct_reclaim) {
> + int count;
> +
> + /* If we're in bad shape.. */
> + if (z->free_pages < z->pages_low && z->inactive_clean_pages) {

I'm not sure if we want to fill up the free list all the way
to z->pages_low all the time, since "free memory is wasted
memory".

The reason the current scheme only triggers when we reach
z->pages_min and then goes all the way up to z->pages_low
is memory defragmentation. Since we'll be doing direct
reclaim for just about every allocation in the system, it
only happens occasionally that we throw away all the
inactive_clean pages between z->pages_min and z->pages_low.

> + count = 4 * (1 << page_cluster);
> + /* reclaim a page for ourselves if we can afford to.. */
> + if (z->inactive_clean_pages > count)
> + page = reclaim_page(z);
> + if (z->inactive_clean_pages < 2 * count)
> + count = z->inactive_clean_pages / 2;
> + } else count = 0;

What exactly is the reasoning behind this complex  "count"
stuff? Is there a good reason for not just refilling the
free list up to the target or until the inactive_clean list
is depleted ?

> + /*
> +  * and make a small donation to the reclaim challenged.
> +  *
> +  * We don't ever want a zone to reach the state where we
> +  * have nothing except reclaimable pages left.. not if
> +  * we can possibly do something to help prevent it.
> +  */

This comment makes little sense

> + if (z->inactive_clean_pages - z->free_pages > z->pages_low
> + && waitqueue_active(&kreclaimd_wait))
> + wake_up_interruptible(&kreclaimd_wait);

This doesn't make any sense to me at all.  Why wake up
kreclaimd just because the difference between the number
of inactive_clean pages and free pages is large ?

Didn't we determine in our last exchange of email that
it would be a good thing under most loads to keep as much
inactive_clean memory around as possible and not waste^Wfree
memory early ?

> - /*
> -  * First, see if we have any zones with lots of free memory.
> -  *
> -  * We allocate free memory first because it doesn't contain
> -  * any data ... DUH!
> -  */

We want to keep this.  Suppose we have one zone which is
half filled with inactive_clean pages and one zone which
has "too many" free pages.

Allocating from the first zone means we evict some piece
of, potentially useful, data from the cache; allocating
from the second zone means we can keep the data in memory
and only fill up a currently unused page.


> @@ -824,39 +824,17 @@
>  #define DEF_PRIORITY (6)
>  static int refill_inactive(unsigned int gfp_mask, int user)
>  {

I've heard all kinds of things about this part of the patch,
except an explanation of why and how it is supposed to work ;)


> @@ -976,8 +954,9 @@
>* We go to sleep for one second, but if it's needed
>* we'll be woken up earlier...
>*/
> - if (!free_shortage() || !inactive_shortage()) {
> - interruptible_sleep_on_timeout(&kswapd_wait, HZ);
> + if (current->need_resched || !free_shortage() ||
> + !inactive_shortage()) {
> + interruptible_sleep_on_timeout(&kswapd_wait, HZ/10);

Makes sense.  Integrated in my tree ;)


regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...


Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-24 Thread Mike Galbraith

On Sun, 20 May 2001, Rik van Riel wrote:

> Remember that inactive_clean pages are always immediately
> reclaimable by __alloc_pages(), if you measured a performance
> difference by freeing pages in a different way I'm pretty sure
> it's a side effect of something else.  What that something
> else is I'm curious to find out, but I'm pretty convinced that
> throwing away data early isn't the way to go.

OK.. let's forget about throughput for a moment and consider
those annoying reports of 0 order allocations failing :)

What do you think of the below (ignore the refill_inactive bit)
wrt allocator reliability under heavy stress?  The thing does
kick in and pump up zones even if I set the 'blood donor' level
to pages_min.

-Mike

--- linux-2.4.5-pre3/mm/page_alloc.c.orgMon May 21 10:35:06 2001
+++ linux-2.4.5-pre3/mm/page_alloc.cThu May 24 08:18:36 2001
@@ -224,10 +224,11 @@
unsigned long order, int limit, int direct_reclaim)
 {
zone_t **zone = zonelist->zones;
+   struct page *page = NULL;

for (;;) {
zone_t *z = *(zone++);
-   unsigned long water_mark;
+   unsigned long water_mark = 1 << order;

if (!z)
break;
@@ -249,18 +250,44 @@
case PAGES_HIGH:
water_mark = z->pages_high;
}
+   if (z->free_pages + z->inactive_clean_pages < water_mark)
+   continue;

-   if (z->free_pages + z->inactive_clean_pages > water_mark) {
-   struct page *page = NULL;
-   /* If possible, reclaim a page directly. */
-   if (direct_reclaim && z->free_pages < z->pages_min + 8)
+   if (direct_reclaim) {
+   int count;
+
+   /* If we're in bad shape.. */
+   if (z->free_pages < z->pages_low && z->inactive_clean_pages) {
+   count = 4 * (1 << page_cluster);
+   /* reclaim a page for ourselves if we can afford to.. */
+   if (z->inactive_clean_pages > count)
+   page = reclaim_page(z);
+   if (z->inactive_clean_pages < 2 * count)
+   count = z->inactive_clean_pages / 2;
+   } else count = 0;
+
+   /*
+* and make a small donation to the reclaim challenged.
+*
+* We don't ever want a zone to reach the state where we
+* have nothing except reclaimable pages left.. not if
+* we can possibly do something to help prevent it.
+*/
+   while (count--) {
+   struct page *page;
page = reclaim_page(z);
-   /* If that fails, fall back to rmqueue. */
-   if (!page)
-   page = rmqueue(z, order);
-   if (page)
-   return page;
+   if (!page)
+   break;
+   __free_page(page);
+   }
}
+   if (!page)
+   page = rmqueue(z, order);
+   if (page)
+   return page;
+   if (z->inactive_clean_pages - z->free_pages > z->pages_low
+   && waitqueue_active(&kreclaimd_wait))
+   wake_up_interruptible(&kreclaimd_wait);
}

/* Found nothing. */
@@ -314,29 +341,6 @@
wakeup_bdflush(0);

 try_again:
-   /*
-* First, see if we have any zones with lots of free memory.
-*
-* We allocate free memory first because it doesn't contain
-* any data ... DUH!
-*/
-   zone = zonelist->zones;
-   for (;;) {
-   zone_t *z = *(zone++);
-   if (!z)
-   break;
-   if (!z->size)
-   BUG();
-
-   if (z->free_pages >= z->pages_low) {
-   page = rmqueue(z, order);
-   if (page)
-   return page;
-   } else if (z->free_pages < z->pages_min &&
-   waitqueue_active(&kreclaimd_wait)) {
-   wake_up_interruptible(&kreclaimd_wait);
-   }
-   }

/*
 * Try to allocate a page from a zone with a HIGH
--- linux-2.4.5-pre3/mm/vmscan.c.orgThu May 17 16:44:23 2001
+++ linux-2.4.5-pre3/mm/vmscan.cThu May 24 08:05:21 2001
@@ -824,39 +824,17 @@
 #define DEF_PRIORITY (6)
 static int refill_inactive(unsigned int 

Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-24 Thread Mike Galbraith

On Sun, 20 May 2001, Rik van Riel wrote:

 Remember that inactive_clean pages are always immediately
 reclaimable by __alloc_pages(), if you measured a performance
 difference by freeing pages in a different way I'm pretty sure
 it's a side effect of something else.  What that something
 else is I'm curious to find out, but I'm pretty convinced that
 throwing away data early isn't the way to go.

OK.. let's forget about throughput for a moment and consider
those annoying reports of 0 order allocations failing :)

What do you think of the below (ignore the refill_inactive bit)
wrt allocator reliability under heavy stress?  The thing does
kick in and pump up zones even if I set the 'blood donor' level
to pages_min.

-Mike

--- linux-2.4.5-pre3/mm/page_alloc.c.orgMon May 21 10:35:06 2001
+++ linux-2.4.5-pre3/mm/page_alloc.cThu May 24 08:18:36 2001
@@ -224,10 +224,11 @@
unsigned long order, int limit, int direct_reclaim)
 {
zone_t **zone = zonelist-zones;
+   struct page *page = NULL;

for (;;) {
zone_t *z = *(zone++);
-   unsigned long water_mark;
+   unsigned long water_mark = 1  order;

if (!z)
break;
@@ -249,18 +250,44 @@
case PAGES_HIGH:
water_mark = z-pages_high;
}
+   if (z-free_pages + z-inactive_clean_pages  water_mark)
+   continue;

-   if (z-free_pages + z-inactive_clean_pages  water_mark) {
-   struct page *page = NULL;
-   /* If possible, reclaim a page directly. */
-   if (direct_reclaim  z-free_pages  z-pages_min + 8)
+   if (direct_reclaim) {
+   int count;
+
+   /* If we're in bad shape.. */
+   if (z-free_pages  z-pages_low  z-inactive_clean_pages) {
+   count = 4 * (1  page_cluster);
+   /* reclaim a page for ourselves if we can afford to.. 
+*/
+   if (z-inactive_clean_pages  count)
+   page = reclaim_page(z);
+   if (z-inactive_clean_pages  2 * count)
+   count = z-inactive_clean_pages / 2;
+   } else count = 0;
+
+   /*
+* and make a small donation to the reclaim challenged.
+*
+* We don't ever want a zone to reach the state where we
+* have nothing except reclaimable pages left.. not if
+* we can possibly do something to help prevent it.
+*/
+   while (count--) {
+   struct page *page;
page = reclaim_page(z);
-   /* If that fails, fall back to rmqueue. */
-   if (!page)
-   page = rmqueue(z, order);
-   if (page)
-   return page;
+   if (!page)
+   break;
+   __free_page(page);
+   }
}
+   if (!page)
+   page = rmqueue(z, order);
+   if (page)
+   return page;
+   if (z-inactive_clean_pages - z-free_pages  z-pages_low
+waitqueue_active(kreclaimd_wait))
+   wake_up_interruptible(kreclaimd_wait);
}

/* Found nothing. */
@@ -314,29 +341,6 @@
wakeup_bdflush(0);

 try_again:
-   /*
-* First, see if we have any zones with lots of free memory.
-*
-* We allocate free memory first because it doesn't contain
-* any data ... DUH!
-*/
-   zone = zonelist-zones;
-   for (;;) {
-   zone_t *z = *(zone++);
-   if (!z)
-   break;
-   if (!z-size)
-   BUG();
-
-   if (z-free_pages = z-pages_low) {
-   page = rmqueue(z, order);
-   if (page)
-   return page;
-   } else if (z-free_pages  z-pages_min 
-   waitqueue_active(kreclaimd_wait)) {
-   wake_up_interruptible(kreclaimd_wait);
-   }
-   }

/*
 * Try to allocate a page from a zone with a HIGH
--- linux-2.4.5-pre3/mm/vmscan.c.orgThu May 17 16:44:23 2001
+++ linux-2.4.5-pre3/mm/vmscan.cThu May 24 08:05:21 2001
@@ -824,39 +824,17 @@
 #define DEF_PRIORITY (6)
 static int refill_inactive(unsigned int gfp_mask, int 

Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-24 Thread Rik van Riel

On Thu, 24 May 2001, Mike Galbraith wrote:
 On Sun, 20 May 2001, Rik van Riel wrote:

  Remember that inactive_clean pages are always immediately
  reclaimable by __alloc_pages(), if you measured a performance
  difference by freeing pages in a different way I'm pretty sure
  it's a side effect of something else.  What that something
  else is I'm curious to find out, but I'm pretty convinced that
  throwing away data early isn't the way to go.

 OK.. let's forget about throughput for a moment and consider
 those annoying reports of 0 order allocations failing :)

Those are ok.  All failing 0 order allocations are either
atomic allocations or GFP_BUFFER allocations.  I guess we
should just remove the printk()  ;)

 What do you think of the below (ignore the refill_inactive bit)
 wrt allocator reliability under heavy stress?  The thing does
 kick in and pump up zones even if I set the 'blood donor' level
 to pages_min.

 - unsigned long water_mark;
 + unsigned long water_mark = 1  order;

Makes no sense at all since water_mark gets assigned not 10
lines below.  ;)


 + if (direct_reclaim) {
 + int count;
 +
 + /* If we're in bad shape.. */
 + if (z-free_pages  z-pages_low  z-inactive_clean_pages) {

I'm not sure if we want to fill up the free list all the way
to z-pages_low all the time, since free memory is wasted
memory.

The reason the current scheme only triggers when we reach
z-pages_min and then goes all the way up to z-pages_low
is memory defragmentation. Since we'll be doing direct
reclaim for just about every allocation in the system, it
only happens occasionally that we throw away all the
inactive_clean pages between z-pages_min and z-pages_low.

 + count = 4 * (1  page_cluster);
 + /* reclaim a page for ourselves if we can afford to.. 
*/
 + if (z-inactive_clean_pages  count)
 + page = reclaim_page(z);
 + if (z-inactive_clean_pages  2 * count)
 + count = z-inactive_clean_pages / 2;
 + } else count = 0;

What exactly is the reasoning behind this complex  count
stuff? Is there a good reason for not just refilling the
free list up to the target or until the inactive_clean list
is depleted ?

 + /*
 +  * and make a small donation to the reclaim challenged.
 +  *
 +  * We don't ever want a zone to reach the state where we
 +  * have nothing except reclaimable pages left.. not if
 +  * we can possibly do something to help prevent it.
 +  */

This comment makes little sense

 + if (z-inactive_clean_pages - z-free_pages  z-pages_low
 +  waitqueue_active(kreclaimd_wait))
 + wake_up_interruptible(kreclaimd_wait);

This doesn't make any sense to me at all.  Why wake up
kreclaimd just because the difference between the number
of inactive_clean pages and free pages is large ?

Didn't we determine in our last exchange of email that
it would be a good thing under most loads to keep as much
inactive_clean memory around as possible and not waste^Wfree
memory early ?

 - /*
 -  * First, see if we have any zones with lots of free memory.
 -  *
 -  * We allocate free memory first because it doesn't contain
 -  * any data ... DUH!
 -  */

We want to keep this.  Suppose we have one zone which is
half filled with inactive_clean pages and one zone which
has too many free pages.

Allocating from the first zone means we evict some piece
of, potentially useful, data from the cache; allocating
from the second zone means we can keep the data in memory
and only fill up a currently unused page.


 @@ -824,39 +824,17 @@
  #define DEF_PRIORITY (6)
  static int refill_inactive(unsigned int gfp_mask, int user)
  {

I've heard all kinds of things about this part of the patch,
except an explanation of why and how it is supposed to work ;)


 @@ -976,8 +954,9 @@
* We go to sleep for one second, but if it's needed
* we'll be woken up earlier...
*/
 - if (!free_shortage() || !inactive_shortage()) {
 - interruptible_sleep_on_timeout(kswapd_wait, HZ);
 + if (current-need_resched || !free_shortage() ||
 + !inactive_shortage()) {
 + interruptible_sleep_on_timeout(kswapd_wait, HZ/10);

Makes sense.  Integrated in my tree ;)


regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   

Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-24 Thread Mike Galbraith

On Thu, 24 May 2001, Rik van Riel wrote:

 On Thu, 24 May 2001, Mike Galbraith wrote:
  On Sun, 20 May 2001, Rik van Riel wrote:
 
   Remember that inactive_clean pages are always immediately
   reclaimable by __alloc_pages(), if you measured a performance
   difference by freeing pages in a different way I'm pretty sure
   it's a side effect of something else.  What that something
   else is I'm curious to find out, but I'm pretty convinced that
   throwing away data early isn't the way to go.
 
  OK.. let's forget about throughput for a moment and consider
  those annoying reports of 0 order allocations failing :)

 Those are ok.  All failing 0 order allocations are either
 atomic allocations or GFP_BUFFER allocations.  I guess we
 should just remove the printk()  ;)

Hmm.  The guy who's box locks up on him after a burst of these probably
doesn't think these failures are very OK ;-)  I don't think order 0
failing is cool at all.. ever.  A (long) while back, Linus specifically
mentioned worrying about atomic allocation reliability.

> > What do you think of the below (ignore the refill_inactive bit)
> > wrt allocator reliability under heavy stress?  The thing does
> > kick in and pump up zones even if I set the 'blood donor' level
> > to pages_min.
> >
> > -   unsigned long water_mark;
> > +   unsigned long water_mark = 1 << order;
>
> Makes no sense at all since water_mark gets assigned not 10
> lines below.  ;)

That assignment was supposed to turn into +=.

> > +   if (direct_reclaim) {
> > +   int count;
> > +
> > +   /* If we're in bad shape.. */
> > +   if (z->free_pages < z->pages_low && z->inactive_clean_pages) {
>
> I'm not sure if we want to fill up the free list all the way
> to z->pages_low all the time, since free memory is wasted
> memory.

Yes.  I'm just thinking of the burst of allocations with no reclaim
possible.

> The reason the current scheme only triggers when we reach
> z->pages_min and then goes all the way up to z->pages_low
> is memory defragmentation. Since we'll be doing direct

Ah.

> reclaim for just about every allocation in the system, it
> only happens occasionally that we throw away all the
> inactive_clean pages between z->pages_min and z->pages_low.

This one has me puzzled.  We're reluctant to release cleaned pages,
but at the same time, we reclaim if possible as soon as all zones
are below pages_high.

> > +   count = 4 * (1 << page_cluster);
> > +   /* reclaim a page for ourselves if we can afford to.. */
> > +   if (z->inactive_clean_pages > count)
> > +   page = reclaim_page(z);
> > +   if (z->inactive_clean_pages < 2 * count)
> > +   count = z->inactive_clean_pages / 2;
> > +   } else count = 0;
>
> What exactly is the reasoning behind this complex count
> stuff? Is there a good reason for not just refilling the
> free list up to the target or until the inactive_clean list
> is depleted ?

Well, yes.  You didn't like the 50/50 split thingy I did before, so
I connected zones to a tricklecharger instead.
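[Editor's note: a userspace sketch of the "tricklecharger" donation arithmetic from the hunk quoted above. The zone fields and `page_cluster` are stand-ins mirroring the 2.4-era names; this models the count logic only and is not kernel code.]

```c
#include <assert.h>

static unsigned long page_cluster = 4;  /* stand-in for the kernel tunable */

struct zone_sketch {
    unsigned long free_pages;
    unsigned long pages_low;
    unsigned long inactive_clean_pages;
};

/* How many clean pages a zone in bad shape donates back to its free list. */
static unsigned long donation_count(const struct zone_sketch *z)
{
    unsigned long count = 0;

    /* only zones below pages_low with something to reclaim qualify */
    if (z->free_pages < z->pages_low && z->inactive_clean_pages) {
        count = 4 * (1UL << page_cluster);
        /* trickle: never drain more than half of the clean pool */
        if (z->inactive_clean_pages < 2 * count)
            count = z->inactive_clean_pages / 2;
    }
    return count;
}
```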

> > +   /*
> > +    * and make a small donation to the reclaim challenged.
> > +    *
> > +    * We don't ever want a zone to reach the state where we
> > +    * have nothing except reclaimable pages left.. not if
> > +    * we can possibly do something to help prevent it.
> > +    */
>
> This comment makes little sense

If not, then none of it does.  This situation is the ONLY thing I
was worried about.  free_pages + inactive_clean_pages > pages_min
does nothing about free_pages for those who can't reclaim if most
of that is inactive_clean_pages. IFF it's possible to be critical
on free_pages and still have clean pages, it does make sense.

> > +   if (z->inactive_clean_pages - z->free_pages > z->pages_low
> > +       && waitqueue_active(kreclaimd_wait))
> > +   wake_up_interruptible(kreclaimd_wait);
>
> This doesn't make any sense to me at all.  Why wake up
> kreclaimd just because the difference between the number
> of inactive_clean pages and free pages is large ?

You had to get there with direct_reclaim not set was the thought.
Nobody gave the zone a transfusion, but there is a blood supply.
If nobody gets around to refilling the zone, kreclaimd will.

> Didn't we determine in our last exchange of email that
> it would be a good thing under most loads to keep as much
> inactive_clean memory around as possible and not waste^Wfree
> memory early ?

So why do we reclaim if we're just below pages_high?  The whole
point of this patch is to reclaim _less_ in the general case, but
to do so in a timely manner if we really need it.

> > -   /*
> > -    * First, see if we have any zones with lots of free memory.
> > -    *
> > -    * We allocate free memory first because it doesn't

Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-24 Thread Rik van Riel

On Thu, 24 May 2001, Mike Galbraith wrote:
> On Thu, 24 May 2001, Rik van Riel wrote:
> > On Thu, 24 May 2001, Mike Galbraith wrote:
> > > On Sun, 20 May 2001, Rik van Riel wrote:
> > >
> > > > Remember that inactive_clean pages are always immediately
> > > > reclaimable by __alloc_pages(), if you measured a performance
> > > > difference by freeing pages in a different way I'm pretty sure
> > > > it's a side effect of something else.  What that something
> > > > else is I'm curious to find out, but I'm pretty convinced that
> > > > throwing away data early isn't the way to go.
> > >
> > > OK.. let's forget about throughput for a moment and consider
> > > those annoying reports of 0 order allocations failing :)
> >
> > Those are ok.  All failing 0 order allocations are either
> > atomic allocations or GFP_BUFFER allocations.  I guess we
> > should just remove the printk()  ;)
>
> Hmm.  The guy who's box locks up on him after a burst of these
> probably doesn't think these failures are very OK ;-)  I don't
> think order 0 failing is cool at all.. ever.

You may not think it's cool, but it's needed in order to
prevent deadlocks. Just because an allocation cannot do
disk IO or sleep, that's no reason to loop around like
crazy in __alloc_pages() and hang the machine ... ;)

> A (long) while back, Linus specifically mentioned worrying
> about atomic allocation reliability.

That's a separate issue.  That was, IIRC, about the
failure of atomic allocations causing packet loss on
Linux routers and, because of that, poor performance.

This is something we still need to look into, but
basically this problem is about too high latency and
NOT about pre-freeing more pages (like your patch
attempts).  If this problem is still an issue, it's
quite likely that the VM is holding locks for too
long so that it cannot react fast enough to free up
some inactive_clean pages.
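[Editor's note: a sketch of the failure rule Rik describes, with an invented helper rather than the real __alloc_pages(): a caller that cannot sleep or do disk IO must get NULL back once the zone is exhausted, instead of the allocator looping forever and hanging the box.]

```c
#include <assert.h>
#include <stddef.h>

struct zone_model { unsigned long free_pages; unsigned long pages_min; };

static void *alloc_page_model(struct zone_model *z, int can_sleep)
{
    for (;;) {
        if (z->free_pages > z->pages_min / 2) {
            z->free_pages--;
            return (void *)1;   /* stand-in for a struct page */
        }
        if (!can_sleep)
            return NULL;        /* atomic/GFP_BUFFER: fail, don't hang */
        /* a sleeping caller would kick kswapd and retry; model that
         * as reclaim handing the zone one page back */
        z->free_pages++;
    }
}
```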

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-24 Thread Mike Galbraith

On Thu, 24 May 2001, Rik van Riel wrote:

> > > > OK.. let's forget about throughput for a moment and consider
> > > > those annoying reports of 0 order allocations failing :)
> > >
> > > Those are ok.  All failing 0 order allocations are either
> > > atomic allocations or GFP_BUFFER allocations.  I guess we
> > > should just remove the printk()  ;)
> >
> > Hmm.  The guy who's box locks up on him after a burst of these
> > probably doesn't think these failures are very OK ;-)  I don't
> > think order 0 failing is cool at all.. ever.
>
> You may not think it's cool, but it's needed in order to
> prevent deadlocks. Just because an allocation cannot do
> disk IO or sleep, that's no reason to loop around like
> crazy in __alloc_pages() and hang the machine ... ;)

True, but if we have resources available there's no excuse for a
failure.  Well, yes there is.  If the cost of that resource is
higher than the value of letting the allocation succeed.  We have
no data on the value of success, but we do plan on consuming the
reclaimable pool and do that (must), so I still think turning
these resources loose at strategic moments is logically sound.
(doesn't mean there's not a better way.. it's just an easy way)

I'd really like someone who has this problem to try the patch to
see if it does help.  I don't have this darn problem myself, so
I'm left holding a bag of idle curiosity. ;-)

Cheers,

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-23 Thread Scott Anderson

David Weinehall wrote:
> IMVHO every developer involved in memory-management (and indeed, any
> software development; the authors of ntpd comes in mind here) should
> have a 386 with 4MB of RAM and some 16MB of swap. Nowadays I have the
> luxury of a 486 with 8MB of RAM and 32MB of swap as a firewall, but it's
> still a pain to work with.

If you really want to have fun, remove all swap...

Scott Anderson
[EMAIL PROTECTED]   MontaVista Software Inc.
(408)328-9214   1237 East Arques Ave.
http://www.mvista.com   Sunnyvale, CA  94085
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-23 Thread Jonathan Morton

>Time to hunt around for a 386 or 486 which is limited to such
>a small amount of RAM ;)

I've got an old knackered 486DX/33 with 8Mb RAM (in 30-pin SIMMs, woohoo!),
a flat CMOS battery, a 2Gb Maxtor HD that needs a low-level format every
year, and no case.  It isn't running anything right now...

--
from: Jonathan "Chromatix" Morton
mail: [EMAIL PROTECTED]  (not for attachments)
big-mail: [EMAIL PROTECTED]
uni-mail: [EMAIL PROTECTED]

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-BEGIN GEEK CODE BLOCK-
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-END GEEK CODE BLOCK-


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-23 Thread Rik van Riel

On Mon, 21 May 2001, David Weinehall wrote:

> IMVHO every developer involved in memory-management (and indeed, any
> software development; the authors of ntpd comes in mind here) should
> have a 386 with 4MB of RAM and some 16MB of swap. Nowadays I have the
> luxury of a 486 with 8MB of RAM and 32MB of swap as a firewall, but it's
> still a pain to work with.

You're absolutely right. The smallest thing I'm testing with
on a regular basis is my dual pentium machine, booted with
mem=8m or mem=16m.

Time to hunt around for a 386 or 486 which is limited to such
a small amount of RAM ;)

cheers,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/












Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-21 Thread David Weinehall

On Sun, May 20, 2001 at 11:54:09PM +0200, Pavel Machek wrote:
> Hi!
> 
> > > You're right.  It should never dump too much data at once.  OTOH, if
> > > those cleaned pages are really old (front of reclaim list), there's no
> > > value in keeping them either.  Maybe there should be a slow bleed for
> > > mostly idle or lightly loaded conditions.
> > 
> > If you don't think it's worthwhile keeping the oldest pages
> > in memory around, please hand me your excess DIMMS ;)
> 
> Sorry, Rik, you can't have that that DIMM. You know, you are
> developing memory managment, and we can't have you having too much
> memory available ;-).

IMVHO every developer involved in memory-management (and indeed, any
software development; the authors of ntpd comes in mind here) should
have a 386 with 4MB of RAM and some 16MB of swap. Nowadays I have the
luxury of a 486 with 8MB of RAM and 32MB of swap as a firewall, but it's
still a pain to work with.


/David
  _ _
 // David Weinehall <[EMAIL PROTECTED]> /> Northern lights wander  \\
//  Project MCA Linux hacker//  Dance across the winter sky //
 \>  http://www.acc.umu.se/~tao/ </   Full colour fire           </
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-21 Thread Pavel Machek

Hi!

> > You're right.  It should never dump too much data at once.  OTOH, if
> > those cleaned pages are really old (front of reclaim list), there's no
> > value in keeping them either.  Maybe there should be a slow bleed for
> > mostly idle or lightly loaded conditions.
> 
> If you don't think it's worthwhile keeping the oldest pages
> in memory around, please hand me your excess DIMMS ;)

Sorry, Rik, you can't have that that DIMM. You know, you are
developing memory managment, and we can't have you having too much
memory available ;-).
  Pavel
-- 
I'm [EMAIL PROTECTED] "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-21 Thread Stephen C. Tweedie

Hi,

On Sun, May 20, 2001 at 07:04:31AM -0300, Rik van Riel wrote:
> On Sun, 20 May 2001, Mike Galbraith wrote:
> > 
> > Looking at the locking and trying to think SMP (grunt) though, I
> > don't like the thought of taking two locks for each page until
> 
> > 100%.  The data in that block is toast anyway.  A big hairy SMP
> > box has to feel reclaim_page(). (they probably feel the zone lock
> > too.. probably would like to allocate blocks)
> 
> Indeed, but this is a separate problem.  Doing per-CPU private
> (small, 8-32 page?) free lists is probably a good idea

Ingo already implemented that for Tux2.

Cheers,
 Stephen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/












Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-20 Thread Mike Galbraith

On Sun, 20 May 2001, Marcelo Tosatti wrote:

> On Sat, 19 May 2001, Mike Galbraith wrote:
>
> > @@ -1054,7 +1033,7 @@
> > if (!zone->size)
> > continue;
> >
> > -   while (zone->free_pages < zone->pages_low) {
> > +   while (zone->free_pages < zone->inactive_clean_pages) {
> > struct page * page;
> > page = reclaim_page(zone);
> > if (!page)
>
>
> What you're trying to do with this change ?

Just ensuring that I never had a large supply of cleaned pages laying
around at a time when folks are in distress.  It also ensures that you
never donate your last reclaimable pages, but that wasn't the intent.

It was a stray thought that happened to produce measurable improvement.
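[Editor's note: a toy model of the changed loop above, model only; the field names follow the hunk, nothing here is kernel code. reclaim_page() moves one clean page to the free list, and the new condition stops as soon as free_pages catches up with inactive_clean_pages, so the clean pool is split roughly 50/50 with the free list instead of being drained up to pages_low.]

```c
#include <assert.h>

struct zone_toy {
    unsigned long free_pages;
    unsigned long inactive_clean_pages;
};

static void refill_free_list(struct zone_toy *z)
{
    while (z->free_pages < z->inactive_clean_pages) {
        /* reclaim_page(): one inactive_clean page becomes a free page */
        z->inactive_clean_pages--;
        z->free_pages++;
    }
}
```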

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-20 Thread Mike Galbraith

On Sun, 20 May 2001, Rik van Riel wrote:

> On Sun, 20 May 2001, Mike Galbraith wrote:
> > On 20 May 2001, Zlatko Calusic wrote:
>
> > > Also in all recent kernels, if the machine is swapping, swap cache
> > > grows without limits and is hard to recycle, but then again that is
> > > a known problem.
> >
> > This one bugs me.  I do not see that and can't understand why.
>
> Could it be because we never free swap space and never
> delete pages from the swap cache ?

I sent a query to the list asking if a heavy load cleared it out,
but got no replies.  I figured about the only thing it could be
is that under light load, reclaim isn't needed to cure a shortage.

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-20 Thread Marcelo Tosatti



On Sat, 19 May 2001, Mike Galbraith wrote:

> @@ -1054,7 +1033,7 @@
>   if (!zone->size)
>   continue;
> 
> - while (zone->free_pages < zone->pages_low) {
> + while (zone->free_pages < zone->inactive_clean_pages) {
>   struct page * page;
>   page = reclaim_page(zone);
>   if (!page)


What you're trying to do with this change ? 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-20 Thread Rik van Riel

On Sun, 20 May 2001, Mike Galbraith wrote:
> On 20 May 2001, Zlatko Calusic wrote:

> > Also in all recent kernels, if the machine is swapping, swap cache
> > grows without limits and is hard to recycle, but then again that is
> > a known problem.
> 
> This one bugs me.  I do not see that and can't understand why.

Could it be because we never free swap space and never
delete pages from the swap cache ?

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-20 Thread Marcelo Tosatti



On Sun, 20 May 2001, Mike Galbraith wrote:

> > Also in all recent kernels, if the machine is swapping, swap cache
> > grows without limits and is hard to recycle, but then again that is
> > a known problem.
> 
> This one bugs me.  I do not see that and can't understand why.

To throw away dirty and dead swapcache pages (that's done at swap
writepage()), page_launder() has to run into its second loop
(launder_loop = 1), meaning that a lot of clean cache has been thrown
out already.

We can "short circuit" this dead swapcache pages by cleaning them in the
first page_launder() loop.

Take a look at the writepage() patch I sent to Linus a few days ago.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-20 Thread Mike Galbraith

On 20 May 2001, Zlatko Calusic wrote:

> Mike Galbraith <[EMAIL PROTECTED]> writes:
>
> > Hi,
> >
> > On Fri, 18 May 2001, Stephen C. Tweedie wrote:
> >
> > > That's the main problem with static parameters.  The problem you are
> > > trying to solve is fundamentally dynamic in most cases (which is also
> > > why magic numbers tend to suck in the VM.)
> >
> > Magic numbers might be sucking some performance right now ;-)
> >
> [snip]
>
> I like your patch, it improves performance somewhat and makes things
> more smooth and also code is simpler.

Thanks for the feedback.  Positive is nice.. as is negative.

> Anyway, 2.4.5-pre3 is quite debalanced and it has even broken some
> things that were working properly before. For instance, swapoff now
> deadlocks the machine (even with your patch applied).

I haven't run into that.

> Unfortunately, I have failed to pinpoint the exact problem, but I'm
> confident that kernel goes in some kind of loop (99% system time, just
> before deadlock). Anybody has some guidelines how to debug kernel if
> you're running X?

Serial console and kdb or kgdb if you have two machines.. or uml?

> Also in all recent kernels, if the machine is swapping, swap cache
> grows without limits and is hard to recycle, but then again that is
> a known problem.

This one bugs me.  I do not see that and can't understand why.

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-20 Thread Mike Galbraith

On Sun, 20 May 2001, Ingo Oeser wrote:

> On Sun, May 20, 2001 at 05:29:49AM +0200, Mike Galbraith wrote:
> > I'm not sure why that helps.  I didn't put it in as a trick or
> > anything though.  I put it in because it didn't seem like a
> > good idea to ever have more cleaned pages than free pages at a
> > time when we're yammering for help.. so I did that and it helped.
>
> The rationale for this is easy: free pages is wasted memory,
> clean pages is hot, clean cache. The best state a cache can be in.

Sure.  Under low load, cache is great.  Under stress, keeping it is
not an option though ;-)  We're at or beyond capacity and moving at
a high delda V (people yammering for help).  If you can recognize and
kill the delta rapidly by dumping that which you are going to have
to dump anyway, you save time getting back on your feet.  (my guess
as to why dumping clean pages does measurably help in this case)

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-20 Thread Zlatko Calusic

Mike Galbraith <[EMAIL PROTECTED]> writes:

> Hi,
> 
> On Fri, 18 May 2001, Stephen C. Tweedie wrote:
> 
> > That's the main problem with static parameters.  The problem you are
> > trying to solve is fundamentally dynamic in most cases (which is also
> > why magic numbers tend to suck in the VM.)
> 
> Magic numbers might be sucking some performance right now ;-)
> 
[snip]

I like your patch, it improves performance somewhat and makes things
more smooth and also code is simpler.

Anyway, 2.4.5-pre3 is quite debalanced and it has even broken some
things that were working properly before. For instance, swapoff now
deadlocks the machine (even with your patch applied).

Unfortunately, I have failed to pinpoint the exact problem, but I'm
confident that kernel goes in some kind of loop (99% system time, just
before deadlock). Anybody has some guidelines how to debug kernel if
you're running X?

Also in all recent kernels, if the machine is swapping, swap cache
grows without limits and is hard to recycle, but then again that is
a known problem.
-- 
Zlatko
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-20 Thread Ingo Oeser

On Sun, May 20, 2001 at 05:29:49AM +0200, Mike Galbraith wrote:
> I'm not sure why that helps.  I didn't put it in as a trick or
> anything though.  I put it in because it didn't seem like a
> good idea to ever have more cleaned pages than free pages at a
> time when we're yammering for help.. so I did that and it helped.

The rationale for this is easy: free pages is wasted memory,
clean pages is hot, clean cache. The best state a cache can be in.

Regards

Ingo Oeser
-- 
To the systems programmer,
users and applications serve only to provide a test load.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-20 Thread Rik van Riel

On Sun, 20 May 2001, Mike Galbraith wrote:

> but ;-)
> 
> Looking at the locking and trying to think SMP (grunt) though, I
> don't like the thought of taking two locks for each page until

> 100%.  The data in that block is toast anyway.  A big hairy SMP
> box has to feel reclaim_page(). (they probably feel the zone lock
> too.. probably would like to allocate blocks)

Indeed, but this is a separate problem.  Doing per-CPU private
(small, 8-32 page?) free lists is probably a good idea, but I
don't really think it's related to kreclaimd ;)
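[Editor's note: a hypothetical sketch of the per-CPU private free list idea (8-32 pages per CPU), not actual 2.4 code: a CPU frees and allocates from its own small list without taking the shared zone lock, falling back to the global allocator only when its list is empty or full.]

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4
#define PERCPU_MAX     16           /* within the 8-32 range discussed */

struct percpu_list {
    int nr;
    void *pages[PERCPU_MAX];
};

static struct percpu_list cpu_free_list[NR_CPUS_SKETCH];

/* Push a freed page onto this CPU's list; 0 means "list full, give the
 * page back to the shared pool instead". */
static int percpu_free(int cpu, void *page)
{
    struct percpu_list *l = &cpu_free_list[cpu];

    if (l->nr >= PERCPU_MAX)
        return 0;
    l->pages[l->nr++] = page;
    return 1;
}

/* Pop a page, or NULL if empty (caller falls back to the buddy lists). */
static void *percpu_alloc(int cpu)
{
    struct percpu_list *l = &cpu_free_list[cpu];

    return l->nr ? l->pages[--l->nr] : NULL;
}
```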

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-20 Thread Mike Galbraith

On Sun, 20 May 2001, Rik van Riel wrote:

> On Sun, 20 May 2001, Mike Galbraith wrote:
>
> > You're right.  It should never dump too much data at once.  OTOH, if
> > those cleaned pages are really old (front of reclaim list), there's no
> > value in keeping them either.  Maybe there should be a slow bleed for
> > mostly idle or lightly loaded conditions.
>
> If you don't think it's worthwhile keeping the oldest pages
> in memory around, please hand me your excess DIMMS ;)

You're welcome to the data in any of them :)  The hardware I keep.

> Remember that inactive_clean pages are always immediately
> reclaimable by __alloc_pages(), if you measured a performance
> difference by freeing pages in a different way I'm pretty sure
> it's a side effect of something else.  What that something
> else is I'm curious to find out, but I'm pretty convinced that
> throwing away data early isn't the way to go.

OK.  I'm getting a little distracted by thinking about the locking
and some latency comments I've heard various gurus make.  I should
probably stick to thinking about/measuring throughput.. much easier.

but ;-)

Looking at the locking and trying to think SMP (grunt) though, I
don't like the thought of taking two locks for each page until
kreclaimd gets a chance to run.  One of those locks is the
pagecache_lock, and that makes me think it'd be better to just
reclaim a block if I have to reclaim at all.  At that point, the
chances of needing to lock the pagecache soon again are about
100%.  The data in that block is toast anyway.  A big hairy SMP
box has to feel reclaim_page(). (they probably feel the zone lock
too.. probably would like to allocate blocks)

-Mike




Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-20 Thread Rik van Riel

On Sun, 20 May 2001, Mike Galbraith wrote:

> You're right.  It should never dump too much data at once.  OTOH, if
> those cleaned pages are really old (front of reclaim list), there's no
> value in keeping them either.  Maybe there should be a slow bleed for
> mostly idle or lightly loaded conditions.

If you don't think it's worthwhile keeping the oldest pages
in memory around, please hand me your excess DIMMS ;)

Remember that inactive_clean pages are always immediately
reclaimable by __alloc_pages(), if you measured a performance
difference by freeing pages in a different way I'm pretty sure
it's a side effect of something else.  What that something
else is I'm curious to find out, but I'm pretty convinced that
throwing away data early isn't the way to go.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)




Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-20 Thread Mike Galbraith

On Sun, 20 May 2001, Rik van Riel wrote:

> On Sun, 20 May 2001, Mike Galbraith wrote:
> >
> > I'm not sure why that helps.  I didn't put it in as a trick or
> > anything though.  I put it in because it didn't seem like a
> > good idea to ever have more cleaned pages than free pages at a
> > time when we're yammering for help.. so I did that and it helped.
>^
>
> Note that this is not the normal situation. Now think
> about the amount of data you'd be blowing away from the
> inactive_clean pages after a bit of background aging
> has gone on on a lightly loaded system.  Not Good(tm)

You're right.  It should never dump too much data at once.  OTOH, if
those cleaned pages are really old (front of reclaim list), there's no
value in keeping them either.  Maybe there should be a slow bleed for
mostly idle or lightly loaded conditions.

-Mike




Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-20 Thread Rik van Riel

On Sun, 20 May 2001, Mike Galbraith wrote:
> On Sat, 19 May 2001, Rik van Riel wrote:
> > On Sat, 19 May 2001, Mike Galbraith wrote:
> > > On Fri, 18 May 2001, Stephen C. Tweedie wrote:
> > >
> > > > That's the main problem with static parameters.  The problem you are
> > > > trying to solve is fundamentally dynamic in most cases (which is also
> > > > why magic numbers tend to suck in the VM.)
> > >
> > > Magic numbers might be sucking some performance right now ;-)
> >
> > ... so you replace them with some others ... ;)
>
> I reused one of our base numbers to classify the severity of the
> situation.. not the same as inventing new ones.  (well, not quite
> the same anyway.. half did come from the south forty;)

*nod* ;)

(not that I'm saying this is bad ... it's just that I'd
like to know why things work before looking at applying
them)

> > > (yes, the last hunk looks out of place wrt my text.
> >
> > It also looks kind of bogus and geared completely towards this
> > particular workload ;)
>
> I'm not sure why that helps.  I didn't put it in as a trick or
> anything though.  I put it in because it didn't seem like a
> good idea to ever have more cleaned pages than free pages at a
> time when we're yammering for help.. so I did that and it helped.
   ^

Note that this is not the normal situation. Now think
about the amount of data you'd be blowing away from the
inactive_clean pages after a bit of background aging
has gone on on a lightly loaded system.  Not Good(tm)

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com/







Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-20 Thread Ingo Oeser

On Sun, May 20, 2001 at 05:29:49AM +0200, Mike Galbraith wrote:
> I'm not sure why that helps.  I didn't put it in as a trick or
> anything though.  I put it in because it didn't seem like a
> good idea to ever have more cleaned pages than free pages at a
> time when we're yammering for help.. so I did that and it helped.

The rationale for this is easy: free pages is wasted memory,
clean pages is hot, clean cache. The best state a cache can be in.

Regards

Ingo Oeser
-- 
To the systems programmer,
users and applications serve only to provide a test load.



Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-20 Thread Zlatko Calusic

Mike Galbraith [EMAIL PROTECTED] writes:

> Hi,
>
> On Fri, 18 May 2001, Stephen C. Tweedie wrote:
>
> > That's the main problem with static parameters.  The problem you are
> > trying to solve is fundamentally dynamic in most cases (which is also
> > why magic numbers tend to suck in the VM.)
>
> Magic numbers might be sucking some performance right now ;-)
>
[snip]

I like your patch, it improves performance somewhat and makes things
more smooth and also code is simpler.

Anyway, 2.4.5-pre3 is quite unbalanced and it has even broken some
things that were working properly before. For instance, swapoff now
deadlocks the machine (even with your patch applied).

Unfortunately, I have failed to pinpoint the exact problem, but I'm
confident that the kernel goes into some kind of loop (99% system time,
just before the deadlock). Does anybody have guidelines on how to debug
the kernel if you're running X?

Also in all recent kernels, if the machine is swapping, swap cache
grows without limits and is hard to recycle, but then again that is
a known problem.
-- 
Zlatko



Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-20 Thread Mike Galbraith

On Sun, 20 May 2001, Ingo Oeser wrote:

> On Sun, May 20, 2001 at 05:29:49AM +0200, Mike Galbraith wrote:
> > I'm not sure why that helps.  I didn't put it in as a trick or
> > anything though.  I put it in because it didn't seem like a
> > good idea to ever have more cleaned pages than free pages at a
> > time when we're yammering for help.. so I did that and it helped.
>
> The rationale for this is easy: free pages is wasted memory,
> clean pages is hot, clean cache. The best state a cache can be in.

Sure.  Under low load, cache is great.  Under stress, keeping it is
not an option though ;-)  We're at or beyond capacity and moving at
a high delta V (people yammering for help).  If you can recognize and
kill the delta rapidly by dumping that which you are going to have
to dump anyway, you save time getting back on your feet.  (my guess
as to why dumping clean pages does measurably help in this case)

-Mike




Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-20 Thread Mike Galbraith

On 20 May 2001, Zlatko Calusic wrote:

> Mike Galbraith [EMAIL PROTECTED] writes:
>
> > Hi,
> >
> > On Fri, 18 May 2001, Stephen C. Tweedie wrote:
> >
> > > That's the main problem with static parameters.  The problem you are
> > > trying to solve is fundamentally dynamic in most cases (which is also
> > > why magic numbers tend to suck in the VM.)
> >
> > Magic numbers might be sucking some performance right now ;-)
> >
> [snip]
>
> I like your patch, it improves performance somewhat and makes things
> more smooth and also code is simpler.

Thanks for the feedback.  Positive is nice.. as is negative.

> Anyway, 2.4.5-pre3 is quite unbalanced and it has even broken some
> things that were working properly before. For instance, swapoff now
> deadlocks the machine (even with your patch applied).

I haven't run into that.

> Unfortunately, I have failed to pinpoint the exact problem, but I'm
> confident that the kernel goes into some kind of loop (99% system time,
> just before the deadlock). Does anybody have guidelines on how to debug
> the kernel if you're running X?

Serial console and kdb or kgdb if you have two machines.. or uml?

> Also in all recent kernels, if the machine is swapping, swap cache
> grows without limits and is hard to recycle, but then again that is
> a known problem.

This one bugs me.  I do not see that and can't understand why.

-Mike




Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-20 Thread Marcelo Tosatti



On Sun, 20 May 2001, Mike Galbraith wrote:

> > Also in all recent kernels, if the machine is swapping, swap cache
> > grows without limits and is hard to recycle, but then again that is
> > a known problem.
>
> This one bugs me.  I do not see that and can't understand why.

To throw away dirty and dead swapcache pages (that is done at swap
writepage()), page_launder() has to run into its second loop
(launder_loop = 1), meaning that a lot of clean cache has been thrown
out already.

We can short-circuit these dead swapcache pages by cleaning them in the
first page_launder() loop.

Take a look at the writepage() patch I sent to Linus a few days ago.




Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-20 Thread Rik van Riel

On Sun, 20 May 2001, Mike Galbraith wrote:
> On 20 May 2001, Zlatko Calusic wrote:
>
> > Also in all recent kernels, if the machine is swapping, swap cache
> > grows without limits and is hard to recycle, but then again that is
> > a known problem.
>
> This one bugs me.  I do not see that and can't understand why.

Could it be because we never free swap space and never
delete pages from the swap cache ?

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)




Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-20 Thread Marcelo Tosatti



On Sat, 19 May 2001, Mike Galbraith wrote:

> @@ -1054,7 +1033,7 @@
>   if (!zone->size)
>   continue;
>
> - while (zone->free_pages < zone->pages_low) {
> + while (zone->free_pages < zone->inactive_clean_pages) {
>   struct page * page;
>   page = reclaim_page(zone);
>   if (!page)

What are you trying to do with this change?




Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-20 Thread Mike Galbraith

On Sun, 20 May 2001, Rik van Riel wrote:

> On Sun, 20 May 2001, Mike Galbraith wrote:
> > On 20 May 2001, Zlatko Calusic wrote:
>
> > > Also in all recent kernels, if the machine is swapping, swap cache
> > > grows without limits and is hard to recycle, but then again that is
> > > a known problem.
> >
> > This one bugs me.  I do not see that and can't understand why.
>
> Could it be because we never free swap space and never
> delete pages from the swap cache ?

I sent a query to the list asking if a heavy load cleared it out,
but got no replies.  I figured about the only thing it could be
is that under light load, reclaim isn't needed to cure a shortage.

-Mike




Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-20 Thread Mike Galbraith

On Sun, 20 May 2001, Marcelo Tosatti wrote:

> On Sat, 19 May 2001, Mike Galbraith wrote:
>
> > @@ -1054,7 +1033,7 @@
> >   if (!zone->size)
> >   continue;
> >
> > - while (zone->free_pages < zone->pages_low) {
> > + while (zone->free_pages < zone->inactive_clean_pages) {
> >   struct page * page;
> >   page = reclaim_page(zone);
> >   if (!page)
>
> What are you trying to do with this change?

Just ensuring that I never had a large supply of cleaned pages lying
around at a time when folks are in distress.  It also ensures that you
never donate your last reclaimable pages, but that wasn't the intent.

It was a stray thought that happened to produce a measurable improvement.

-Mike




Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-19 Thread Mike Galbraith

On Sun, 20 May 2001, Dieter Nützel wrote:

> > > Three back to back make -j 30 runs for three different kernels.
> > > Swap cache numbers are taken immediately after last completion.
> >
> > The performance increase is nice, though.  Do you see similar
> > changes in different kinds of workloads ?
>
> If you have a patch against 2.4.4-ac11 I will do some tests with some
> (interactive) 3D apps.

I don't have an ac kernel resident atm, but since Alan merged here
very recently, it will probably go in ok.  If not, just holler and
I'll download ac11 and make you a clean patch.

-Mike




Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-19 Thread Mike Galbraith

On Sat, 19 May 2001, Rik van Riel wrote:

> On Sat, 19 May 2001, Mike Galbraith wrote:
> > On Fri, 18 May 2001, Stephen C. Tweedie wrote:
> >
> > > That's the main problem with static parameters.  The problem you are
> > > trying to solve is fundamentally dynamic in most cases (which is also
> > > why magic numbers tend to suck in the VM.)
> >
> > Magic numbers might be sucking some performance right now ;-)
>
> ... so you replace them with some others ... ;)

I reused one of our base numbers to classify the severity of the
situation.. not the same as inventing new ones.  (well, not quite
the same anyway.. half did come from the south forty;)

> > Three back to back make -j 30 runs for three different kernels.
> > Swap cache numbers are taken immediately after last completion.
>
> The performance increase is nice, though.  Do you see similar
> changes in different kinds of workloads ?

I don't have much to test with here, but I'll see if I can find
something. I'd rather see someone with a server load try it.

> > (yes, the last hunk looks out of place wrt my text.
>
> It also looks kind of bogus and geared completely towards this
> particular workload ;)

I'm not sure why that helps.  I didn't put it in as a trick or
anything though.  I put it in because it didn't seem like a
good idea to ever have more cleaned pages than free pages at a
time when we're yammering for help.. so I did that and it helped.

-Mike




Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-19 Thread Dieter Nützel

> > Three back to back make -j 30 runs for three different kernels.
> > Swap cache numbers are taken immediately after last completion.
>
> The performance increase is nice, though.  Do you see similar
> changes in different kinds of workloads ?

If you have a patch against 2.4.4-ac11 I will do some tests with some
(interactive) 3D apps.

-Dieter




Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-19 Thread Rik van Riel

On Sat, 19 May 2001, Mike Galbraith wrote:
> On Fri, 18 May 2001, Stephen C. Tweedie wrote:
> 
> > That's the main problem with static parameters.  The problem you are
> > trying to solve is fundamentally dynamic in most cases (which is also
> > why magic numbers tend to suck in the VM.)
> 
> Magic numbers might be sucking some performance right now ;-)

... so you replace them with some others ... ;)

> Three back to back make -j 30 runs for three different kernels.
> Swap cache numbers are taken immediately after last completion.

The performance increase is nice, though.  Do you see similar
changes in different kinds of workloads ?


> (yes, the last hunk looks out of place wrt my text.

It also looks kind of bogus and geared completely towards this
particular workload ;)

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)



