Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-12 Thread Rik van Riel

On Tue, 8 May 2001, David S. Miller wrote:

> So instead, you could test for the condition that prevents any
> possible forward progress, no?

        if (!order || free_shortage() > 0)
                goto try_again;

(which was the experimental patch I discussed with Marcelo)
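
[Editorial sketch] The test above can be modelled in user space as a small predicate. This is only an illustration: it assumes free_shortage() returns a positive count when the free pools are short, and should_retry() and its shortage parameter are hypothetical names, not kernel code.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the retry test. `shortage` stands in for the value
 * free_shortage() would return (assumed positive when pages are short).
 */
bool should_retry(unsigned int order, int shortage)
{
	/* Order-0 requests always retry; larger ones only while a shortage remains. */
	return !order || shortage > 0;
}
```

Under this reading, a non-zero order request stops looping once there is no shortage left for kswapd to work on.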

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-10 Thread Stephen C. Tweedie

Hi,

On Thu, May 10, 2001 at 03:49:05PM -0300, Marcelo Tosatti wrote:

> Back to the main discussion --- I guess we could make __GFP_FAIL (with
> __GFP_WAIT set :)) allocations actually fail if "try_to_free_pages()" does
> not make any progress (i.e. returns zero). But maybe that's a bit too
> extreme.

That would seem to be a reasonable interpretation of __GFP_FAIL +
__GFP_WAIT, yes.

--Stephen



Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-10 Thread Marcelo Tosatti



On Thu, 10 May 2001, Stephen C. Tweedie wrote:

> Hi,
> 
> On Thu, May 10, 2001 at 03:22:57PM -0300, Marcelo Tosatti wrote:
> 
> > Initially I thought about __GFP_FAIL to be used by writeout routines which
> > want to cluster pages until they can allocate memory without causing any
> > pressure to the system. Something like this: 
> > 
> > while ((page = alloc_page(GFP_FAIL)))
> >         add_page_to_cluster(page);
> > write_cluster();
> 
> Isn't that an orthogonal decision?  You can use __GFP_FAIL with or
> without __GFP_WAIT or __GFP_IO, whichever is appropriate.

Correct. 

Back to the main discussion --- I guess we could make __GFP_FAIL (with
__GFP_WAIT set :)) allocations actually fail if "try_to_free_pages()" does
not make any progress (i.e. returns zero). But maybe that's a bit too
extreme.

What do you think? 
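
[Editorial sketch] One way to read the proposal is as a predicate evaluated after each reclaim pass. The flag values and the name should_fail_allocation() below are hypothetical and exist only for this illustration; freed models the return value of try_to_free_pages(), with zero meaning no progress.

```c
#include <stdbool.h>

/* Hypothetical flag values for this sketch only; not the kernel's definitions. */
#define SK_GFP_WAIT 0x1u
#define SK_GFP_FAIL 0x2u

/*
 * Decide whether to give up after a reclaim pass.
 * `freed` models what try_to_free_pages() returned (0 = no progress).
 */
bool should_fail_allocation(unsigned int gfp_mask, int freed)
{
	bool may_wait = (gfp_mask & SK_GFP_WAIT) != 0;
	bool may_fail = (gfp_mask & SK_GFP_FAIL) != 0;

	/* Fail only when the caller allows it and reclaim achieved nothing. */
	return may_wait && may_fail && freed == 0;
}
```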




Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-10 Thread Stephen C. Tweedie

Hi,

On Thu, May 10, 2001 at 03:22:57PM -0300, Marcelo Tosatti wrote:

> Initially I thought about __GFP_FAIL to be used by writeout routines which
> want to cluster pages until they can allocate memory without causing any
> pressure to the system. Something like this: 
> 
> while ((page = alloc_page(GFP_FAIL)))
>   add_page_to_cluster(page);
> write_cluster(); 

Isn't that an orthogonal decision?  You can use __GFP_FAIL with or
without __GFP_WAIT or __GFP_IO, whichever is appropriate.

Cheers,
 Stephen



Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-10 Thread Marcelo Tosatti


On Thu, 10 May 2001, Stephen C. Tweedie wrote:

> Hi,
> 
> On Thu, May 10, 2001 at 01:43:46PM -0300, Marcelo Tosatti wrote:
> 
> > No. __GFP_FAIL can try to reclaim pages from inactive clean.
> > 
> > We just want to avoid __GFP_FAIL allocations from going to
> > try_to_free_pages().
> 
> Why?  __GFP_FAIL is only useful as an indication that the caller has
> some magic mechanism for coping with failure.  

Hum, not _only_. 

Initially I thought about __GFP_FAIL to be used by writeout routines which
want to cluster pages until they can allocate memory without causing any
pressure to the system. Something like this: 


        while ((page = alloc_page(GFP_FAIL)))
                add_page_to_cluster(page);

        write_cluster();

See?

> There's no other information passed, so a brief call to
> try_to_free_pages is quite appropriate.

This obviously depends on what we decide __GFP_FAIL will be used for.
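
[Editorial sketch] The clustering idea can be tried out in user space with a toy pool that hands out pages until the easy ones run out and then returns NULL instead of reclaiming. All names here (toy_pool, toy_alloc_page_may_fail, build_cluster) are hypothetical stand-ins, not kernel interfaces.

```c
#include <stddef.h>

#define POOL_PAGES 4

/* Toy pool: models pages that can be handed out without any reclaim. */
struct toy_pool {
	char pages[POOL_PAGES][64];
	size_t used;
};

/* Returns NULL rather than applying pressure once the easy pages are gone. */
void *toy_alloc_page_may_fail(struct toy_pool *pool)
{
	if (pool->used == POOL_PAGES)
		return NULL;
	return pool->pages[pool->used++];
}

/* Gather pages until allocation fails or `max` is reached; returns the count. */
size_t build_cluster(struct toy_pool *pool, void **cluster, size_t max)
{
	size_t n = 0;
	void *page;

	while (n < max && (page = toy_alloc_page_may_fail(pool)) != NULL)
		cluster[n++] = page;
	return n;
}
```

The caller would then write out whatever cluster it managed to build, however small.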






Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-10 Thread Stephen C. Tweedie

Hi,

On Thu, May 10, 2001 at 01:43:46PM -0300, Marcelo Tosatti wrote:

> No. __GFP_FAIL can try to reclaim pages from inactive clean.
> 
> We just want to avoid __GFP_FAIL allocations from going to
> try_to_free_pages().

Why?  __GFP_FAIL is only useful as an indication that the caller has
some magic mechanism for coping with failure.  There's no other
information passed, so a brief call to try_to_free_pages is quite
appropriate.

--Stephen



Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-10 Thread Marcelo Tosatti



On Thu, 10 May 2001, Mark Hemment wrote:

> 
> On Wed, 9 May 2001, Marcelo Tosatti wrote:
> > On Wed, 9 May 2001, Mark Hemment wrote:
> > >   Could introduce another allocation flag (__GFP_FAIL?) which is or'ed
> > > with a __GFP_WAIT to limit the looping?
> > 
> > __GFP_FAIL is in the -ac tree already and it is being used by the bounce
> > buffer allocation code. 
> 
> Thanks for the pointer.
> 
>   For non-zero order allocations, the test against __GFP_FAIL is a little
> too soon; it would be better after we've tried to reclaim pages from the
> inactive-clean list.  Any nasty side effects to this?

No. __GFP_FAIL can try to reclaim pages from inactive clean.

We just want to avoid __GFP_FAIL allocations from going to
try_to_free_pages().

>   Plus, the code still prevents PF_MEMALLOC processes from using the
> inactive-clean list for non-zero order allocations.  As the trend seems to
> be to make zero and non-zero allocations 'equivalent', shouldn't this
> restriction be lifted?

I don't see any problem about making non-zero allocations be able to
directly reclaim pages.





Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-10 Thread Mark Hemment


On Wed, 9 May 2001, Marcelo Tosatti wrote:
> On Wed, 9 May 2001, Mark Hemment wrote:
> >   Could introduce another allocation flag (__GFP_FAIL?) which is or'ed
> > with a __GFP_WAIT to limit the looping?
> 
> __GFP_FAIL is in the -ac tree already and it is being used by the bounce
> buffer allocation code. 

Thanks for the pointer.

  For non-zero order allocations, the test against __GFP_FAIL is a little
too soon; it would be better after we've tried to reclaim pages from the
inactive-clean list.  Any nasty side effects to this?

  Plus, the code still prevents PF_MEMALLOC processes from using the
inactive-clean list for non-zero order allocations.  As the trend seems to
be to make zero and non-zero allocations 'equivalent', shouldn't this
restriction be lifted?

Mark




Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-09 Thread Marcelo Tosatti



On Wed, 9 May 2001, Mark Hemment wrote:

> 
> On Tue, 8 May 2001, David S. Miller wrote: 
> > Actually, the change was made because it is illogical to try only
> > once on multi-order pages.  Especially because we depend upon order
> > 1 pages so much (every task struct allocated).  We depend upon them
> > even more so on sparc64 (certain kinds of page tables need to be
> > allocated as 1 order pages).
> > 
> > The old code failed _far_ too easily, it was unacceptable.
> > 
> > Why put some strange limit in there?  Whatever number you pick
> > is arbitrary, and I can probably piece together an allocation
> > state where the chosen limit is too small.
> 
>   Agreed, but some allocations of non-zero orders can fall back to other
> schemes (such as an emergency buffer, or using vmalloc for a temp
> buffer) and don't want to be trapped in __alloc_pages() for too long.
> 
>   Could introduce another allocation flag (__GFP_FAIL?) which is or'ed
> with a __GFP_WAIT to limit the looping?

__GFP_FAIL is in the -ac tree already and it is being used by the bounce
buffer allocation code. 






Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-09 Thread Mark Hemment


On Tue, 8 May 2001, David S. Miller wrote: 
> Actually, the change was made because it is illogical to try only
> once on multi-order pages.  Especially because we depend upon order
> 1 pages so much (every task struct allocated).  We depend upon them
> even more so on sparc64 (certain kinds of page tables need to be
> allocated as 1 order pages).
> 
> The old code failed _far_ too easily, it was unacceptable.
> 
> Why put some strange limit in there?  Whatever number you pick
> is arbitrary, and I can probably piece together an allocation
> state where the chosen limit is too small.

  Agreed, but some allocations of non-zero orders can fall back to other
schemes (such as an emergency buffer, or using vmalloc for a temp
buffer) and don't want to be trapped in __alloc_pages() for too long.

  Could introduce another allocation flag (__GFP_FAIL?) which is or'ed
with a __GFP_WAIT to limit the looping?

> So instead, you could test for the condition that prevents any
> possible forward progress, no?

  Yes, it is possible to trap when kswapd might not make any useful
progress for a failing non-zero ordered allocation, and to set a global
"force" flag (kswapd_force) to ensure it does something useful.
  For order-1 allocations, that would work.

  For order-2 (and above) it becomes much more difficult as the page
'reap' routines release/process pages based upon age and do not factor in
whether a page may/will buddy (now or in the near future).  This 'blind'
processing of pages can wipe a significant percentage of the page cache
when trying to build a buddy at a high order.

  Of course, no one should be doing really large order allocations and
expecting them to succeed.  But, if they are doing this, the allocation
should at least fail.
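
[Editorial sketch] Why age-based freeing does not reliably build higher-order blocks can be seen with a toy free map: half the pages may be free without a single aligned pair being free together. The helpers below are hypothetical illustrations of buddy arithmetic, not the kernel's allocator.

```c
#include <stdbool.h>
#include <stddef.h>

/* Index of the buddy of the block that starts at `pfn` with the given order. */
size_t buddy_of(size_t pfn, unsigned int order)
{
	return pfn ^ ((size_t)1 << order);
}

/*
 * True if the aligned block of 2^order pages starting at `pfn` is entirely free.
 * free_map[i] is nonzero when page i is free (a toy stand-in for real state).
 */
bool block_is_free(const unsigned char *free_map, size_t pfn, unsigned int order)
{
	size_t count = (size_t)1 << order;

	for (size_t i = 0; i < count; i++)
		if (!free_map[pfn + i])
			return false;
	return true;
}

/* True if any aligned block of the given order is free within npages pages. */
bool has_free_block(const unsigned char *free_map, size_t npages, unsigned int order)
{
	size_t step = (size_t)1 << order;

	for (size_t pfn = 0; pfn + step <= npages; pfn += step)
		if (block_is_free(free_map, pfn, order))
			return true;
	return false;
}
```

With every other page free, has_free_block() finds order-0 pages but no order-1 block, which is the situation where further 'blind' reclaim is needed before a buddy can form.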

Mark




Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-08 Thread David S. Miller


Marcelo Tosatti writes:
 > On Tue, 8 May 2001, Mark Hemment wrote:
 > >   Does anyone know why the 2.4.3pre6 change was made?
 > 
 > Because wakeup_bdflush(0) can wakeup bdflush _even_ if it does not have
 > any job to do (ie less than 30% dirty buffers in the default config).  

Actually, the change was made because it is illogical to try only
once on multi-order pages.  Especially because we depend upon order
1 pages so much (every task struct allocated).  We depend upon them
even more so on sparc64 (certain kinds of page tables need to be
allocated as 1 order pages).

The old code failed _far_ too easily, it was unacceptable.

Why put some strange limit in there?  Whatever number you pick
is arbitrary, and I can probably piece together an allocation
state where the chosen limit is too small.

So instead, you could test for the condition that prevents any
possible forward progress, no?

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-08 Thread Jens Axboe

On Tue, May 08 2001, Marcelo Tosatti wrote:
> >   The attached patch (against 2.4.5-pre1) fixes the looping symptom, by
> > adding a counter and looping only twice for non-zero order allocations.
> 
> Looks good. (actually Rik had a patch similar to this which fixed a real
> case with cdda2wav just like you described)

Not cdda2wav, I presume, but the optimization discussed here before that
wasn't really doable because of the vm behaviour when doing

do
        try to alloc some amount of contiguous pages
        if (ok)
                break

        lower number of pages wanted
while true

CDROMREADAUDIO stopped doing this and fell back to single cdda frame
size allocations because of these failures, even though it meant a huge
decrease in speed. cdda2wav will ask for iirc 16 frames at a time, the
current driver will try to do 8 first and then fall back to slower
extraction if allocations fail.
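
[Editorial sketch] The back-off loop described above might look like this in C. Halving the request on each failure is an assumption (the pseudocode only says "lower"), and try_alloc_fn, alloc_with_backoff() and the two sample callbacks are hypothetical names used purely for illustration.

```c
#include <stddef.h>

/* Hypothetical hook: returns nonzero if `frames` contiguous frames can be had. */
typedef int (*try_alloc_fn)(size_t frames);

/*
 * Start at `want` frames and halve the request after each failure.
 * Returns the number of frames obtained, or 0 if even one frame failed.
 */
size_t alloc_with_backoff(size_t want, try_alloc_fn try_alloc)
{
	size_t frames = want;

	while (frames > 0) {
		if (try_alloc(frames))
			return frames;
		frames /= 2;	/* assumed policy: halve on failure */
	}
	return 0;
}

/* Sample callback: pretends only blocks of up to 4 frames are available. */
int only_small_blocks(size_t frames)
{
	return frames <= 4;
}

/* Sample callback: pretends nothing can be allocated at all. */
int nothing_available(size_t frames)
{
	(void)frames;
	return 0;
}
```

This pattern only works if each failed attempt returns promptly, which is the point being made about the allocator's looping behaviour.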

-- 
Jens Axboe




Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-08 Thread Marcelo Tosatti



On Tue, 8 May 2001, Mark Hemment wrote:

> 
>   In 2.4.3pre6, code in page_alloc.c:__alloc_pages(), changed from;
> 
>   try_to_free_pages(gfp_mask);
>   wakeup_bdflush();
>   if (!order)
>           goto try_again;
> to
>   try_to_free_pages(gfp_mask);
>   wakeup_bdflush();
>   goto try_again;
> 
> 
>   This introduced the effect of a non-zero order, __GFP_WAIT allocation
> (without PF_MEMALLOC set), never returning failure.  The allocation keeps
> looping in __alloc_pages(), kicking kswapd, until the allocation succeeds.
> 
>   If there is plenty of memory in the free-pools and inactive-lists
> free_shortage() will return false, causing the state of these
> free-pools/inactive-lists not to be 'improved' by kswapd.
> 
>   If there is nothing else changing/improving the free-pools or
> inactive-lists, the allocation loops forever (kicking kswapd).
> 
>   Does anyone know why the 2.4.3pre6 change was made?

Because wakeup_bdflush(0) can wakeup bdflush _even_ if it does not have
any job to do (ie less than 30% dirty buffers in the default config).  

> 
>   The attached patch (against 2.4.5-pre1) fixes the looping symptom, by
> adding a counter and looping only twice for non-zero order allocations.

Looks good. (actually Rik had a patch similar to this which fixed a real
case with cdda2wav just like you described)




Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-08 Thread Alex Bligh - linux-kernel

>   The real fix is to measure fragmentation and the progress of kswapd, but
> that is too drastic for 2.4.x.

I suspect the real fix might, in general, be
a) to reduce use of kmalloc() etc. which gives
   physically contiguous memory, where virtually
   contiguous memory will do (and is, presumably,
   far easier to come by). (or perhaps add some
   flag to kmalloc to allocate out of virtual
   rather than physical memory).
b) to bias flush or swap out routines to create
   physically contiguous higher order blocks.
   Many heuristics will give you that ability.

Disclaimer: I haven't looked at this issue for years,
but Linux seems to fail on >4k allocations now, and
fragment memory far more, than it did on much smaller
systems doing lots of nasty (8k, thus 3 pages including
header) NFS stuff back in 94.
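
[Editorial sketch] The "8k, thus 3 pages including header" arithmetic, and the buddy order it implies, can be checked with a few lines of C. The 4096-byte page size is an assumption, and pages_needed() and order_for() are hypothetical helper names.

```c
#include <stddef.h>

#define SK_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Number of whole pages needed to hold `bytes`. */
size_t pages_needed(size_t bytes)
{
	return (bytes + SK_PAGE_SIZE - 1) / SK_PAGE_SIZE;
}

/* Smallest order such that 2^order pages cover `bytes`. */
unsigned int order_for(size_t bytes)
{
	unsigned int order = 0;

	while (((size_t)1 << order) < pages_needed(bytes))
		order++;
	return order;
}
```

An 8k payload plus any header spills into a third page, and a power-of-two allocator then has to find four contiguous pages (order 2) to satisfy it.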

--
Alex Bligh



[PATCH] allocation looping + kswapd CPU cycles

2001-05-08 Thread Mark Hemment


  In 2.4.3pre6, code in page_alloc.c:__alloc_pages(), changed from;

        try_to_free_pages(gfp_mask);
        wakeup_bdflush();
        if (!order)
                goto try_again;
to
        try_to_free_pages(gfp_mask);
        wakeup_bdflush();
        goto try_again;


  This introduced the effect of a non-zero order, __GFP_WAIT allocation
(without PF_MEMALLOC set), never returning failure.  The allocation keeps
looping in __alloc_pages(), kicking kswapd, until the allocation succeeds.

  If there is plenty of memory in the free-pools and inactive-lists
free_shortage() will return false, causing the state of these
free-pools/inactive-lists not to be 'improved' by kswapd.

  If there is nothing else changing/improving the free-pools or
inactive-lists, the allocation loops forever (kicking kswapd).

  Does anyone know why the 2.4.3pre6 change was made?

  The attached patch (against 2.4.5-pre1) fixes the looping symptom, by
adding a counter and looping only twice for non-zero order allocations.

  The real fix is to measure fragmentation and the progress of kswapd, but
that is too drastic for 2.4.x.
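
[Editorial sketch] The counter test used by the attached patch can be modelled in user space to see how many passes a permanently failing request makes. attempts_before_failure() is a hypothetical name, and max_attempts is a cap added only so that the order-0 case terminates in the simulation.

```c
/*
 * Count allocation passes when every pass fails.
 * Mirrors the patched test `if (!order || loop++ < 2) goto try_again;`.
 */
unsigned int attempts_before_failure(unsigned int order, unsigned int max_attempts)
{
	unsigned int attempts = 0;
	int loop = 0;

	for (;;) {
		attempts++;		/* one pass through the allocation path */
		if (attempts >= max_attempts)
			break;		/* simulation cap, not part of the patch */
		if (!order || loop++ < 2)
			continue;	/* goto try_again */
		break;			/* give up: the allocation fails */
	}
	return attempts;
}
```

In this model a non-zero order request makes the initial pass plus two retries before failing, while an order-0 request keeps retrying until the cap.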

Mark


diff -ur linux-2.4.5-pre1/mm/page_alloc.c markhe-2.4.5-pre1/mm/page_alloc.c
--- linux-2.4.5-pre1/mm/page_alloc.c    Fri Apr 27 22:18:08 2001
+++ markhe-2.4.5-pre1/mm/page_alloc.c   Tue May  8 13:42:12 2001
@@ -275,6 +275,7 @@
 {
        zone_t **zone;
        int direct_reclaim = 0;
+       int loop;
        unsigned int gfp_mask = zonelist->gfp_mask;
        struct page * page;
 
@@ -313,6 +314,7 @@
                        && nr_inactive_dirty_pages >= freepages.high)
                wakeup_bdflush(0);
 
+       loop = 0;
 try_again:
        /*
         * First, see if we have any zones with lots of free memory.
@@ -453,7 +455,8 @@
                if (gfp_mask & __GFP_WAIT) {
                        memory_pressure++;
                        try_to_free_pages(gfp_mask);
-                       goto try_again;
+                       if (!order || loop++ < 2)
+                               goto try_again;
                }
        }
 




[PATCH] allocation looping + kswapd CPU cycles

2001-05-08 Thread Mark Hemment


  In 2.4.3pre6, code in page_alloc.c:__alloc_pages(), changed from;

try_to_free_pages(gfp_mask);
wakeup_bdflush();
if (!order)
goto try_again;
to
try_to_free_pages(gfp_mask);
wakeup_bdflush();
goto try_again;


  This introduced the effect of a non-zero order, __GFP_WAIT allocation
(without PF_MEMALLOC set), never returning failure.  The allocation keeps
looping in __alloc_pages(), kicking kswapd, until the allocation succeeds.

  If there is plenty of memory in the free-pools and inactive-lists
free_shortage() will return false, causing the state of these
free-pools/inactive-lists not to be 'improved' by kswapd.

  If there is nothing else changing/improving the free-pools or
inactive-lists, the allocation loops forever (kicking kswapd).

  Does anyone know why the 2.4.3pre6 change was made?

  The attached patch (against 2.4.5-pre1) fixes the looping symptom, by
adding a counter and looping only twice for non-zero order allocations.

  The real fix is to measure fragmentation and the progress of kswapd, but
that is too drastic for 2.4.x.

Mark


diff -ur linux-2.4.5-pre1/mm/page_alloc.c markhe-2.4.5-pre1/mm/page_alloc.c
--- linux-2.4.5-pre1/mm/page_alloc.cFri Apr 27 22:18:08 2001
+++ markhe-2.4.5-pre1/mm/page_alloc.c   Tue May  8 13:42:12 2001
@@ -275,6 +275,7 @@
 {
zone_t **zone;
int direct_reclaim = 0;
+   int loop;
unsigned int gfp_mask = zonelist-gfp_mask;
struct page * page;
 
@@ -313,6 +314,7 @@
 nr_inactive_dirty_pages = freepages.high)
wakeup_bdflush(0);
 
+   loop = 0;
 try_again:
/*
 * First, see if we have any zones with lots of free memory.
@@ -453,7 +455,8 @@
if (gfp_mask & __GFP_WAIT) {
memory_pressure++;
try_to_free_pages(gfp_mask);
-   goto try_again;
+   if (!order || loop++ < 2)
+   goto try_again;
}
}
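
For readers who want to experiment with the retry logic outside the
kernel, here is a minimal user-space sketch of the bounded loop in the
patch above.  The struct fake_zone and the helpers fake_try_alloc() and
fake_reclaim() are hypothetical stand-ins invented for illustration (they
are not kernel interfaces); only the "!order || loop++ < 2" test is taken
from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a zone: a block of the wanted order appears
 * only after a given number of reclaim passes (-1 means never). */
struct fake_zone {
    int reclaims_needed;
    int reclaims_done;
};

/* Stand-in for one pass over the free lists. */
static bool fake_try_alloc(const struct fake_zone *z)
{
    return z->reclaims_needed >= 0 && z->reclaims_done >= z->reclaims_needed;
}

/* Stand-in for try_to_free_pages(): just counts the call. */
static void fake_reclaim(struct fake_zone *z)
{
    z->reclaims_done++;
}

/*
 * Mirrors the control flow of the patched retry path:
 * order-0 requests retry without limit, higher orders retry twice.
 * Returns true on success, false when the request gives up.
 * Note: an order-0 request against a zone that never yields a block
 * still loops forever, as in the original code.
 */
bool alloc_bounded(struct fake_zone *z, unsigned int order, bool can_wait)
{
    int loop = 0;

    assert(z != NULL);
    for (;;) {
        if (fake_try_alloc(z))
            return true;
        if (!can_wait)
            return false;
        fake_reclaim(z);
        if (order == 0 || loop++ < 2)
            continue;       /* the "goto try_again" case */
        return false;       /* non-zero order, retries exhausted */
    }
}
```

With a zone that never yields a block, a non-zero order request gives up
after three attempts (the first try plus two retries) instead of spinning.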
 




Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-08 Thread Alex Bligh - linux-kernel

>   The real fix is to measure fragmentation and the progress of kswapd, but
> that is too drastic for 2.4.x.

I suspect the real fix might, in general, be
a) to reduce use of kmalloc() etc. which gives
   physically contiguous memory, where virtually
   contiguous memory will do (and is, presumably,
   far easier to come by). (or perhaps add some
   flag to kmalloc to allocate out of virtual
   rather than physical memory).
b) to bias flush or swap out routines to create
   physically contiguous higher order blocks.
   Many heuristics will give you that ability.
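
One way to picture the problem with physically contiguous requests: the
total amount of free memory says little about whether a higher-order
block exists.  The user-space sketch below assumes a simple boolean page
map and naturally aligned blocks, as in a buddy-style allocator.  The
names count_free() and has_free_block() are hypothetical, not kernel
functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count free pages; free_map[i] is true when page i is free. */
size_t count_free(const bool *free_map, size_t npages)
{
    size_t n = 0;

    assert(free_map != NULL);
    for (size_t i = 0; i < npages; i++)
        if (free_map[i])
            n++;
    return n;
}

/*
 * Return true if some naturally aligned run of (1 << order) pages is
 * entirely free, which is what a buddy-style allocator needs in order
 * to satisfy a request of that order.
 */
bool has_free_block(const bool *free_map, size_t npages, unsigned int order)
{
    size_t block = (size_t)1 << order;

    assert(free_map != NULL);
    for (size_t start = 0; start + block <= npages; start += block) {
        size_t i = 0;

        while (i < block && free_map[start + i])
            i++;
        if (i == block)
            return true;
    }
    return false;
}
```

For a map that alternates free and used pages, count_free() reports half
of the pages as free, yet has_free_block() returns false for order 1.
That is the situation in which a virtually contiguous allocation would
still succeed while a physically contiguous one cannot.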

Disclaimer: I haven't looked at this issue for years,
but Linux seems to fail on 4k allocations now, and
fragment memory far more, than it did on much smaller
systems doing lots of nasty (8k, thus 3 pages including
header) NFS stuff back in 94.

--
Alex Bligh



Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-08 Thread Marcelo Tosatti



On Tue, 8 May 2001, Mark Hemment wrote:

 
>   In 2.4.3pre6, code in page_alloc.c:__alloc_pages(), changed from;
>
> 	try_to_free_pages(gfp_mask);
> 	wakeup_bdflush();
> 	if (!order)
> 		goto try_again;
> to
> 	try_to_free_pages(gfp_mask);
> 	wakeup_bdflush();
> 	goto try_again;
>
>
>   This introduced the effect of a non-zero order, __GFP_WAIT allocation
> (without PF_MEMALLOC set), never returning failure.  The allocation keeps
> looping in __alloc_pages(), kicking kswapd, until the allocation succeeds.
>
>   If there is plenty of memory in the free-pools and inactive-lists
> free_shortage() will return false, causing the state of these
> free-pools/inactive-lists not to be 'improved' by kswapd.
>
>   If there is nothing else changing/improving the free-pools or
> inactive-lists, the allocation loops forever (kicking kswapd).
>
>   Does anyone know why the 2.4.3pre6 change was made?

Because wakeup_bdflush(0) can wake up bdflush _even_ if it does not have
any job to do (i.e. less than 30% dirty buffers in the default config).

 
>   The attached patch (against 2.4.5-pre1) fixes the looping symptom, by
> adding a counter and looping only twice for non-zero order allocations.

Looks good. (actually Rik had a patch similar to this which fixed a real
case with cdda2wav just like you described)




Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-08 Thread Jens Axboe

On Tue, May 08 2001, Marcelo Tosatti wrote:
> >   The attached patch (against 2.4.5-pre1) fixes the looping symptom, by
> > adding a counter and looping only twice for non-zero order allocations.
>
> Looks good. (actually Rik had a patch similar to this which fixed a real
> case with cdda2wav just like you described)

Not cdda2wav, I presume, but the optimization discussed here before that
wasn't really doable because of the vm behaviour when doing

	do
		try to alloc some amount of contiguous pages
		if (ok)
			break

		lower number of pages wanted
	while true

CDROMREADAUDIO stopped doing this and fell back to single cdda frame
size allocations because of these failures, even though it meant a huge
decrease in speed. cdda2wav will ask for iirc 16 frames at a time; the
current driver will try to do 8 first and then fall back to slower
extraction if allocations fail.
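
The fallback loop described above might look roughly like this in C.
This is only a sketch: the allocator hook alloc_fn, the function
alloc_frames_with_fallback(), the test double small_only_alloc() and the
halving step are assumptions made for illustration (the pseudocode only
says the number of pages wanted is lowered), and it is not the actual
CDROMREADAUDIO code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CD_AUDIO_FRAME_BYTES 2352u  /* size of one raw CD audio frame */

/* Hypothetical allocator hook: malloc() or a test double. */
typedef void *(*alloc_fn)(size_t bytes);

/*
 * Ask for `want` frames in one buffer; on failure halve the request
 * and try again, down to a single frame.  Returns the buffer (or NULL
 * if even one frame could not be allocated) and stores the number of
 * frames actually obtained in *got.
 */
void *alloc_frames_with_fallback(alloc_fn alloc, size_t frame_size,
                                 size_t want, size_t *got)
{
    assert(alloc != NULL && got != NULL);
    assert(frame_size > 0 && want > 0);
    assert(want <= SIZE_MAX / frame_size);  /* no size overflow */

    for (size_t n = want; n > 0; n /= 2) {
        void *buf = alloc(n * frame_size);

        if (buf != NULL) {
            *got = n;
            return buf;
        }
    }
    *got = 0;
    return NULL;
}

/* Test double: pretends anything above 4 audio frames is unavailable. */
void *small_only_alloc(size_t bytes)
{
    if (bytes > (size_t)4 * CD_AUDIO_FRAME_BYTES)
        return NULL;
    return malloc(bytes);
}
```

Asking for 16 frames through small_only_alloc() fails at 16 and 8 and
succeeds at 4, so the caller still gets a multi-frame buffer instead of
dropping straight to single-frame reads.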

-- 
Jens Axboe




Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-08 Thread David S. Miller


Marcelo Tosatti writes:
 > On Tue, 8 May 2001, Mark Hemment wrote:
 > >   Does anyone know why the 2.4.3pre6 change was made?
 >
 > Because wakeup_bdflush(0) can wakeup bdflush _even_ if it does not have
 > any job to do (ie less than 30% dirty buffers in the default config).

Actually, the change was made because it is illogical to try only
once on multi-order pages.  Especially because we depend upon order-1
pages so much (every task struct allocated).  We depend upon them
even more so on sparc64 (certain kinds of page tables need to be
allocated as order-1 pages).

The old code failed _far_ too easily, it was unacceptable.

Why put some strange limit in there?  Whatever number you pick
is arbitrary, and I can probably piece together an allocation
state where the chosen limit is too small.

So instead, you could test for the condition that prevents any
possible forward progress, no?
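
A sketch of what such a test could look like, in user space.  It assumes
the reclaim step reports how many pages it freed, with zero meaning no
progress.  The names struct fake_mm, fake_reclaim(), fake_try_alloc() and
alloc_while_progress() are hypothetical and made up for this example;
none of this is kernel code.  The loop retries for as long as reclaim
makes progress and fails as soon as it does not, with no fixed retry
count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the memory state seen by one allocation. */
struct fake_mm {
    int productive_passes_left; /* reclaim passes that can still free pages */
    int passes_until_block;     /* productive passes needed before a block
                                   of the wanted order exists */
};

/* Stand-in for reclaim: returns pages freed, 0 when nothing is left. */
static int fake_reclaim(struct fake_mm *mm)
{
    if (mm->productive_passes_left <= 0)
        return 0;
    mm->productive_passes_left--;
    if (mm->passes_until_block > 0)
        mm->passes_until_block--;
    return 1;
}

/* Stand-in for one pass over the free lists. */
static bool fake_try_alloc(const struct fake_mm *mm)
{
    return mm->passes_until_block == 0;
}

/*
 * Retry while reclaim makes forward progress; give up once it cannot.
 * Stores the number of retries taken in *retries.  The loop always
 * terminates because productive_passes_left only decreases.
 */
bool alloc_while_progress(struct fake_mm *mm, int *retries)
{
    assert(mm != NULL && retries != NULL);
    *retries = 0;

    for (;;) {
        if (fake_try_alloc(mm))
            return true;
        if (fake_reclaim(mm) == 0)
            return false;   /* no forward progress is possible */
        (*retries)++;
    }
}
```

In this model a request that needs five productive reclaim passes
succeeds after five retries, where a fixed limit of two would have given
up too early, while a request that can make no progress at all fails
immediately instead of looping.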

Later,
David S. Miller
[EMAIL PROTECTED]