Re: OOM killer gripe (was Re: What still uses the block layer?)

2007-10-20 Thread Pavel Machek
Hi!

  Would an oom-kill-someone-now sysrq be of help, I wonder?
 
 *shrug* It might.  I was letting it run hoping it would complete itself when

sysrq-f, IIRC.

 it locked solid.  (The keyboard LEDs weren't flashing, so I don't _think_ it 
 panicked.  I was in X so I wouldn't have seen a message...)
 
 (To be honest, I can never remember how to trigger sysrq on a laptop 
 keyboard.  
 Presumably X won't intercept it the way it does alt-f1 and ctrl-alt-del...)

sysrq works even in X, and should be pressable on today's laptop
keyboards...
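
If the key combination is hard to hit, the same handlers can be poked from
a root shell instead (assuming CONFIG_MAGIC_SYSRQ is enabled):

    echo f > /proc/sysrq-trigger    # ask the kernel to oom-kill something now
    dmesg | tail                    # see what it picked
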
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: OOM killer gripe (was Re: What still uses the block layer?)

2007-10-20 Thread Pavel Machek
Hi!

 I suppose I should just configure suspending to a file instead of a
 swap partition, but I've just historically trusted suspend/resume to a
 swap partition much more than to a file.  Or maybe I should hack in a
 sysctl to prevent any swapping even though the swap partition is
 configured (so only suspend/resume will use it).

swapon -a; swsusp; swapoff -a?
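
Something along these lines, so swap is only enabled while the image is
written (a rough sketch; assumes the swap partition is in /etc/fstab and
that swsusp is driven by writing "disk" to /sys/power/state):

    swapon -a                      # enable swap just for the suspend image
    echo disk > /sys/power/state   # suspend to disk; execution resumes here
    swapoff -a                     # drop swap so normal operation never pages
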

Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: OOM killer gripe (was Re: What still uses the block layer?)

2007-10-18 Thread Rogier Wolff
On Tue, Oct 16, 2007 at 05:34:15PM +1000, Nick Piggin wrote:
  It's a hard call.  The I/O time for 1MB of contiguous disk data
  is about the I/O time of 512 bytes of contiguous disk data.
 
 And if you're thrashing, then by definition you need to throw
 out 1MB of your working set in order to read it in.

Right. But you need a differential hit rate of only a few percent on
that 1020 extra kb of data you swapped in versus the 1Mb of data you
swapped out for this to be advantageous.

By differential hit rate I mean the chance of getting a hit on
the 1Mb of data just paged in, minus the chance of getting a hit on
the 1Mb of data just paged out. 

With a little luck the 1Mb being paged out hasn't been used for
quite a while, while there is a hint that the 1Mb you're paging in
is active: one of its sub-pages just got a hit.
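
To put rough numbers on it (illustrative only; assume ~10ms per seek and
~40MB/s streaming transfer):

    4k  page-in:  ~10ms seek + ~0.1ms transfer  ~= 10ms
    1Mb page-in:  ~10ms seek + ~25ms  transfer  ~= 35ms

Every future major fault avoided saves another ~10ms seek, so the extra
~25ms pays for itself if only 2-3 of the 255 extra pages (about 1%) ever
get hit.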

So... IMHO, it would be useful to implement something that pages out
chunks of memory larger than a single hardware page. This would reduce
the size of the memory management tables (*), as well as improve disk
throughput if things DO come to paging.

This should of course be configurable. Some workloads are better off
with a virtual page size of 8k, some with 128k, some with 1M.

As far as I can see, the page-cluster parameter defines how many pages
are selected for page-out at a time. That improves page-out efficiency.
Improving page-in efficiency would also be useful: it is the other half
of the equation.

Roger. 


(*) If the kernel starts working with a 1Mb virtual page size, you
need a 256 times smaller mapping table between processes and memory or
swap. Of course, the hardware doesn't support this (actually, it does
for 1Mb virtual pages), so you'll have to create 256 page table
entries for the hardware instead of just one.



-- 
** [EMAIL PROTECTED] ** http://www.BitWizard.nl/ ** +31-15-2600998 **
**Delftechpark 26 2628 XH  Delft, The Netherlands. KVK: 27239233**
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement. 
Does it sit on the couch all day? Is it unemployed? Please be specific! 
Define 'it' and what it isn't doing. - Adapted from lxrbot FAQ


Re: OOM killer gripe (was Re: What still uses the block layer?)

2007-10-16 Thread David Newall

Nick Piggin wrote:
 On Monday 15 October 2007 19:52, Rob Landley wrote:
  On Monday 15 October 2007 8:37:44 am Nick Piggin wrote:
   You really shouldn't configure
   so much [swap] unless you do want the kernel to actually use it all, right?

  Two words: Software suspend.  I've actually been thinking of increasing
  it on the next install...

 Kernel doesn't know that you want to use it for suspend but not
 regular swapping, unfortunately.

Couldn't you mount swap before suspend and unmount it after resume?


Re: OOM killer gripe (was Re: What still uses the block layer?)

2007-10-16 Thread Rob Landley
On Monday 15 October 2007 11:38:33 pm Eric W. Biederman wrote:
  I don't follow your logic. We don't need SWAP > RAM in order to swap
  effectively, IMO.

 The steady state of a system that is heavily and usably swapping but
 not thrashing is that all of the pages in RAM are in the swap cache,
 at least that used to be the case.

Mind if I throw in some vague and questionable numbers? :)

I vaguely recall that my old 486 laptop with 16 megabytes of ram (circa 1998) 
used to be able to do 3 point something megabytes per second to/from disk, 
according to hdparm -t.  (That was with DMA enabled.)

This means that my old laptop, using sequential writes and not being bogged 
down by excessive seeking, could write its entire memory contents to disk and 
read it back in again in about 10 seconds total (5 write, 5 read).

My current laptop has 2 gigabytes of ram, and hdparm -t /dev/sda says:
  /dev/sda:
   Timing buffered disk reads:  116 MB in  3.01 seconds =  38.54 MB/sec

So that's a little over a factor of 10 speed improvement.  (Although I note 
that I got 30 megabytes/second off of an ATA/100 adapter in 2002, so it's 
barely any faster than it was 5 years ago.)

This means I can expect my current laptop to write out its memory in 50 
seconds (2000/40), and another 50 seconds to read it back in.

So 10 seconds to cycle through memory 10 years ago, vs a little under 2 
minutes today, on systems at roughly the same price point.  And that's 
limited by what the hardware is doing, assuming a _perfect_ linear read/write 
pattern with no seeks.

Oh, and my old 486 had its RAM maxed out.  This one can hold twice as much.  
And heavy seeking sucks more than it used to relative to sequential reads by 
something like a proportional amount (hence the rise of I/O elevators as a 
mitigation strategy), although I haven't got numbers for that handy.

  I don't know if there is a causal relationship there. I mean, I
  think it's been a long time since thrashing was ever a viable mode
  of operation, right?

 Right.  But swapping heavily has been a viable mode of operation,
 and the vast gap in disk random IO performance seems to have
 hurt significantly.

 To be very clear: I used to be able to run a problem at a little below
 full speed with the disk pegged with swap traffic, and I did this
 regularly when I started out with linux.

The problem is the gap is getting bigger.  The 486-75 laptop mentioned above 
had a 25 mhz 32 bit front side bus.  A quick google suggests my core 2 duo 
has a 667 mhz FSB and I'm guessing a 128 bit data path (two 64-bit channels).

I could boot up memtest86 and get actual benchmarks, but total handwaving for 
a moment, 25*32=800 and 667*128=85376, and the second divided by the first is 
over 100 times as big.  That concurs with the 16mhz-1733 mhz processor speed 
increase.

Factor of 10 disk speed increase, factor of 100 memory speed increase.  Disks 
speeds aren't keeping up with processor and memory increases.  Disk _sizes_ 
are, but speeds aren't.

  Maybe desktops just have less need for swapping now, so nobody sees
  it much until something goes _really_ bad. When I'm using my 256MB
  machine, unused stuff goes to swap.

 There is a bit of truth in the fact that there is less need for
 swapping now.  At the same time however swapping simply does not
 work well right now, and I'm not at all certain why.

Do the numbers above help?  It'll only get worse, unless some random new 
technology (maybe http://en.wikipedia.org/wiki/MRAM or something) swoops in 
to change everything, again.

  the disk for is very limited.   I wonder if we could figure out
  how to push and pull 1M or bigger chunks into and out of swap?
 
  Pulling in 1MB pages can really easily end up compounding the
  thrashing problem unless you're very sure a significant amount
  of it will be used.

 It's a hard call.  The I/O time for 1MB of contiguous disk data
 is about the I/O time of 512 bytes of contiguous disk data.

Hence the seek sucking even more now part. :(

I'm sure somebody will eventually write an OLS paper or something on the 
advisability of making swapping decisions with 4k granularity when disks 
really want bigger I/O transactions.  Maybe they already have, somewhere 
between:
http://kernel.org/doc/ols/2007/ols2007v1-pages-53-64.pdf
and
http://kernel.org/doc/ols/2007/ols2007v1-pages-277-284.pdf

Rob
-- 
One of my most productive days was throwing away 1000 lines of code.
  - Ken Thompson.


Re: OOM killer gripe (was Re: What still uses the block layer?)

2007-10-16 Thread Eric W. Biederman
Rob Landley [EMAIL PROTECTED] writes:

 On Monday 15 October 2007 11:38:33 pm Eric W. Biederman wrote:
  I don't follow your logic. We don't need SWAP > RAM in order to swap
  effectively, IMO.

 The steady state of a system that is heavily and usably swapping but
 not thrashing is that all of the pages in RAM are in the swap cache,
 at least that used to be the case.

 Mind if I throw in some vague and questionable numbers? :)

 I vaguely recall that my old 486 laptop with 16 megabytes of ram (circa 1998) 
 used to be able to do 3 point something megabytes per second to/from disk, 
 according to hdparm -t.  (That was with DMA enabled.)

 This means that my old laptop, using sequential writes and not being bogged 
 down by excessive seeking, could write its entire memory contents to disk and 
 read it back in again in about 10 seconds total (5 write, 5 read).

 My current laptop has 2 gigabytes of ram, and hdparm -t /dev/sda says:
   /dev/sda:
Timing buffered disk reads:  116 MB in  3.01 seconds =  38.54 MB/sec

 So that's a little over a factor of 10 speed improvement.  (Although I note 
 that I got 30 megabytes/second off of an ATA/100 adapter in 2002, so it's 
 barely any faster than it was 5 years ago.)

 This means I can expect my current laptop to write out its memory in 50 
 seconds (2000/40), and another 50 seconds to read it back in.

 So 10 seconds to cycle through memory 10 years ago, vs a little under 2 
 minutes today, on systems at roughly the same price point.  And that's 
 limited by what the hardware is doing, assuming a _perfect_ linear read/write 
 pattern with no seeks.

 Oh, and my old 486 had its RAM maxed out.  This one can hold twice as much.  
 And heavy seeking sucks more than it used to relative to sequential reads by 
 something like a proportional amount (hence the rise of I/O elevators as a 
 mitigation strategy), although I haven't got numbers for that handy.

  I don't know if there is a causal relationship there. I mean, I
  think it's been a long time since thrashing was ever a viable mode
  of operation, right?

 Right.  But swapping heavily has been a viable mode of operation,
 and the vast gap in disk random IO performance seems to have
 hurt significantly.

 To be very clear: I used to be able to run a problem at a little below
 full speed with the disk pegged with swap traffic, and I did this
 regularly when I started out with linux.

 The problem is the gap is getting bigger.  The 486-75 laptop mentioned above 
 had a 25 mhz 32 bit front side bus.  A quick google suggests my core 2 duo 
 has a 667 mhz FSB and I'm guessing a 128 bit data path (two 64-bit channels).

I'm pretty certain Intel's architecture only has a 64-bit front side bus.
Of course I'm used to seeing it clocked a bit higher.

 I could boot up memtest86 and get actual benchmarks, but total handwaving for 
 a moment, 25*32=800 and 667*128=85376, and the second divided by the first is 
 over 100 times as big.  That concurs with the 16mhz-1733 mhz processor speed 
 increase.

 Factor of 10 disk speed increase, factor of 100 memory speed increase.  Disks 
 speeds aren't keeping up with processor and memory increases.  Disk _sizes_ 
 are, but speeds aren't.

Exactly.

  Maybe desktops just have less need for swapping now, so nobody sees
  it much until something goes _really_ bad. When I'm using my 256MB
  machine, unused stuff goes to swap.

 There is a bit of truth in the fact that there is less need for
 swapping now.  At the same time however swapping simply does not
 work well right now, and I'm not at all certain why.

 Do the numbers above help?  It'll only get worse, unless some random new 
 technology (maybe http://en.wikipedia.org/wiki/MRAM or something) swoops in 
 to change everything, again.

Well it will be interesting to see what happens with NAND flash.  So far
it is pricey but you can easily make it faster than today's hard drives.
Capacity is still coming.

  the disk for is very limited.   I wonder if we could figure out
  how to push and pull 1M or bigger chunks into and out of swap?
 
  Pulling in 1MB pages can really easily end up compounding the
  thrashing problem unless you're very sure a significant amount
  of it will be used.

 It's a hard call.  The I/O time for 1MB of contiguous disk data
 is about the I/O time of 512 bytes of contiguous disk data.

 Hence the seek sucking even more now part. :(

 I'm sure somebody will eventually write an OLS paper or something on the 
 advisability of making swapping decisions with 4k granularity when disks 
 really want bigger I/O transactions.  Maybe they already have, somewhere 
 between:
 http://kernel.org/doc/ols/2007/ols2007v1-pages-53-64.pdf
 and
 http://kernel.org/doc/ols/2007/ols2007v1-pages-277-284.pdf

An interesting point.  What would really impress me is actually finding
a current work load that can productively swap after everything kernel
side is fixed up and optimized.   So far it seems like real swapping is so
painful 

Re: OOM killer gripe (was Re: What still uses the block layer?)

2007-10-16 Thread Alan Cox
 I'm sure somebody will eventually write an OLS paper or something on the 
 advisability of making swapping decisions with 4k granularity when disks 
 really want bigger I/O transactions. 

Funnily enough someone thought of that many years ago. They even added
and documented it, then they made it adjustable.

See the vm section of Documentation/filesystems/proc.txt

Alan


Re: OOM killer gripe (was Re: What still uses the block layer?)

2007-10-16 Thread Andrew Morton
On Mon, 15 Oct 2007 23:37:44 +1000
Nick Piggin [EMAIL PROTECTED] wrote:

 Would an oom-kill-someone-now sysrq be of help, I wonder?

Is already there: sysrq-f.


Re: OOM killer gripe (was Re: What still uses the block layer?)

2007-10-16 Thread Rob Landley
On Tuesday 16 October 2007 5:28:59 am Alan Cox wrote:
  I'm sure somebody will eventually write an OLS paper or something on the
  advisability of making swapping decisions with 4k granularity when disks
  really want bigger I/O transactions.

 Funnily enough someone thought of that many years ago. They even added
 and documented it, then they made it adjustable.

 See the vm section of Documentation/filesystems/proc.txt

I presume you refer to:

  page-cluster
  

  page-cluster controls the number of pages which are written to swap in
  a single attempt.  The swap I/O size.

  It is a logarithmic value - setting it to zero means 1 page, setting
  it to 1 means 2 pages, setting it to 2 means 4 pages, etc.

  The default value is three (eight pages at a time).  There may be some
  small benefits in tuning this to a different value if your workload is
  swap-intensive.

I didn't know that controlled whether the pages were contiguous (or written to 
contiguous locations in swap).  I thought it was just how many the VM tried 
to free at a time.
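
For reference, the knob is trivial to poke at, e.g.:

    cat /proc/sys/vm/page-cluster        # default 3, i.e. 8 pages per attempt
    echo 5 > /proc/sys/vm/page-cluster   # 32 pages, 128k per swap-out attempt

though whether a bigger value helps obviously depends on the workload.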

Still, worth a tweak.  Thanks.

 Alan

Rob
-- 
One of my most productive days was throwing away 1000 lines of code.
  - Ken Thompson.


OOM killer gripe (was Re: What still uses the block layer?)

2007-10-15 Thread Nick Piggin
On Monday 15 October 2007 18:04, Rob Landley wrote:
 On Sunday 14 October 2007 8:45:03 pm Theodore Tso wrote:

   excuse for conflating different categories of devices in the first
   place.
 
  See the thinkpad Ultrabay drive example above.

 Last week I drove my laptop so deep into swap (with a make -j on qemu)
 that after half an hour trying to repaint my kmail window, it locked solid.
 Again.  You'd think the oom killer would come to the rescue, but it didn't.
 Maybe Ubuntu disabled it.  I have _2_gigs_ of ram in this sucker, on a
 stock Ubuntu 7.04 install (with the upgrade all tab pressed a few times),
 and yet I managed to make it swap itself to death one more time.

 Virtual memory isn't perfect.  I've _always_ been able to come up with
 examples where it just doesn't work for me.  This doesn't mean VM
 overcommit should be abolished, because it's useful more often than not.

I hate to go completely offtopic here, but disks are so incredibly
slow when compared to RAM that there is really nothing the kernel
can do about this. Presumably the job will finish, given infinite
time.

How much swap do you have configured? You really shouldn't configure
so much unless you do want the kernel to actually use it all, right?
Because if we're not really conservative about OOM killing, then the
user who actually really did want to use all the swap they configured
gets angry when we kill their jobs without using it all.

Would an oom-kill-someone-now sysrq be of help, I wonder?


Re: OOM killer gripe (was Re: What still uses the block layer?)

2007-10-15 Thread Rob Landley
On Monday 15 October 2007 8:37:44 am Nick Piggin wrote:
  Virtual memory isn't perfect.  I've _always_ been able to come up with
  examples where it just doesn't work for me.  This doesn't mean VM
  overcommit should be abolished, because it's useful more often than not.

 I hate to go completely offtopic here, but disks are so incredibly
 slow when compared to RAM that there is really nothing the kernel
 can do about this.

I know.

 Presumably the job will finish, given infinite 
 time.

I gave it about half an hour, then it locked solid and stopped writing to the 
disk at all.  (I gave it another 5 minutes at that point, then held down the 
power button.)

Lost about 50 open konqueror tabs...

 How much swap do you have configured?

2 gigs, same as ram.
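
For what it's worth, that's easy to double-check:

    cat /proc/swaps   # configured swap devices and how much of each is in use
    free -m           # ram vs swap totals at a glance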

 You really shouldn't configure 
 so much unless you do want the kernel to actually use it all, right?

Two words: Software suspend.  I've actually been thinking of increasing it 
on the next install...

 Because if we're not really conservative about OOM killing, then the
 user who actually really did want to use all the swap they configured
 gets angry when we kill their jobs without using it all.

I tend to lower swappiness, and when that happens all sorts of stuff goes 
weird.  Software suspend used to say it can't free enough memory if I 
put swappiness at 0 (dunno if it still does).  This time the OOM killer never 
triggered before hard deadlock.  (I think I had it around 20 or 40 or some 
such.)
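
(That is, fiddling with /proc/sys/vm/swappiness, e.g.:

    cat /proc/sys/vm/swappiness          # kernel default is 60
    echo 20 > /proc/sys/vm/swappiness    # prefer reclaiming cache over swapping

or the equivalent vm.swappiness line in /etc/sysctl.conf.)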

 Would an oom-kill-someone-now sysrq be of help, I wonder?

*shrug* It might.  I was letting it run hoping it would complete itself when 
it locked solid.  (The keyboard LEDs weren't flashing, so I don't _think_ it 
panicked.  I was in X so I wouldn't have seen a message...)

(To be honest, I can never remember how to trigger sysrq on a laptop keyboard.  
Presumably X won't intercept it the way it does alt-f1 and ctrl-alt-del...)

Rob
-- 
One of my most productive days was throwing away 1000 lines of code.
  - Ken Thompson.


Re: OOM killer gripe (was Re: What still uses the block layer?)

2007-10-15 Thread Nick Piggin
On Monday 15 October 2007 19:52, Rob Landley wrote:
 On Monday 15 October 2007 8:37:44 am Nick Piggin wrote:
   Virtual memory isn't perfect.  I've _always_ been able to come up with
   examples where it just doesn't work for me.  This doesn't mean VM
   overcommit should be abolished, because it's useful more often than
   not.
 
  I hate to go completely offtopic here, but disks are so incredibly
  slow when compared to RAM that there is really nothing the kernel
  can do about this.

 I know.

  Presumably the job will finish, given infinite
  time.

 I gave it about half an hour, then it locked solid and stopped writing to
 the disk at all.  (I gave it another 5 minutes at that point, then held
 down the power button.)

Maybe it was a bug then. Hard to say without backtraces ;)


  You really shouldn't configure
  so much unless you do want the kernel to actually use it all, right?

 Two words: Software suspend.  I've actually been thinking of increasing
 it on the next install...

Kernel doesn't know that you want to use it for suspend but not
regular swapping, unfortunately.


  Because if we're not really conservative about OOM killing, then the
  user who actually really did want to use all the swap they configured
  gets angry when we kill their jobs without using it all.

 I tend to lower swappiness, and when that happens all sorts of stuff goes
 weird.  Software suspend used to say it can't free enough memory if I
 put swappiness at 0 (dunno if it still does).  This time the OOM killer
 never triggered before hard deadlock.  (I think I had it around 20 or 40 or
 some such.)

  Would an oom-kill-someone-now sysrq be of help, I wonder?

 *shrug* It might.  I was letting it run hoping it would complete itself
 when it locked solid.  (The keyboard LEDs weren't flashing, so I don't
 _think_ it panicked.  I was in X so I wouldn't have seen a message...)

If you can work out where things are spinning/sleeping when that happens,
along with sysrq+M data, then it could make for a useful bug report. Not
entirely helpful, but if it is a reproducible problem for you, then you
might be able to get that data from outside X.
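
For instance (a sketch, assuming magic sysrq is enabled), from a text
console or over ssh while it's wedged:

    echo t > /proc/sysrq-trigger   # dump task states/kernel stacks to the log
    echo m > /proc/sysrq-trigger   # dump memory usage info
    dmesg > /tmp/hang-report.txt   # grab it before the ring buffer wraps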


Re: OOM killer gripe (was Re: What still uses the block layer?)

2007-10-15 Thread Eric W. Biederman
Nick Piggin [EMAIL PROTECTED] writes:

 On Monday 15 October 2007 18:04, Rob Landley wrote:
 On Sunday 14 October 2007 8:45:03 pm Theodore Tso wrote:

   excuse for conflating different categories of devices in the first
   place.
 
  See the thinkpad Ultrabay drive example above.

 Last week I drove my laptop so deep into swap (with a make -j on qemu)
 that after half an hour trying to repaint my kmail window, it locked solid.
 Again.  You'd think the oom killer would come to the rescue, but it didn't.
 Maybe Ubuntu disabled it.  I have _2_gigs_ of ram in this sucker, on a
 stock Ubuntu 7.04 install (with the upgrade all tab pressed a few times),
 and yet I managed to make it swap itself to death one more time.

 Virtual memory isn't perfect.  I've _always_ been able to come up with
 examples where it just doesn't work for me.  This doesn't mean VM
 overcommit should be abolished, because it's useful more often than not.

 I hate to go completely offtopic here, but disks are so incredibly
 slow when compared to RAM that there is really nothing the kernel
 can do about this. Presumably the job will finish, given infinite
 time.

 How much swap do you have configured? You really shouldn't configure
 so much unless you do want the kernel to actually use it all, right?

No.

There are three basic swapping scenarios.
- Pushing unused data out of ram
- Swapping 
- Thrashing

To effectively swap you need SWAP > RAM because after a little while of
swapping all of your pages in RAM should be assigned a location in the
page cache.

I have not heard of many people swapping and not thrashing lately.
I think part of the problem is that we do random access to the swap
partition, which makes us seek limited.  And since the number of
seeks per unit time has been increasing at a linear or slower rate,
if we are doing random disk I/O then the amount we can use
the disk for is very limited.   I wonder if we could figure out
how to push and pull 1M or bigger chunks into and out of swap?

I don't know if swap has actually worked since vmscan stopped
going over the virtual addresses.

 Because if we're not really conservative about OOM killing, then the
 user who actually really did want to use all the swap they configured
 gets angry when we kill their jobs without using it all.

I totally agree. The fact that the OOM killer started is a sign that
the system was completely overwhelmed and nothing better could happen.

In this case my gut feel says limiting the total number of processes
would have been much more effective than anything at all to do with
swap. make -j reminds me of the classic fork bomb.
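
Something like the following (numbers picked out of the air) would at least
have kept the build from eating the whole machine:

    ulimit -u 256    # per-user process cap for this shell and its children
    make -j4         # or simply bound the job count instead of a bare -j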

 Would an oom-kill-someone-now sysrq be of help, I wonder?

Well we have SAK, which should kill everything on your current VT,
which should include X and all of its children.

Eric


Re: OOM killer gripe (was Re: What still uses the block layer?)

2007-10-15 Thread david

On Mon, 15 Oct 2007, Eric W. Biederman wrote:

 Nick Piggin [EMAIL PROTECTED] writes:

  How much swap do you have configured? You really shouldn't configure
  so much unless you do want the kernel to actually use it all, right?

 No.

 There are three basic swapping scenarios.
 - Pushing unused data out of ram
 - Swapping
 - Thrashing

 To effectively swap you need SWAP > RAM because after a little while of
 swapping all of your pages in RAM should be assigned a location in the
 page cache.

on some kernel versions you are correct about needing swap > ram, but on 
current versions you are not. the swap space gets allocated as needed, and 
re-used as needed (I don't know the mechanism of this, but I remember the 
last time this changed from vm=max(ram,swap) to vm=ram+swap)

 I have not heard of many people swapping and not thrashing lately.
 I think part of the problem is that we do random access to the swap
 partition, which makes us seek limited.  And since the number of
 seeks per unit time has been increasing at a linear or slower rate,
 if we are doing random disk I/O then the amount we can use
 the disk for is very limited.   I wonder if we could figure out
 how to push and pull 1M or bigger chunks into and out of swap?

it has been noted by many people that linux is very slow to pull things 
back into ram from swap, significantly slower than simple seek limiting 
would seem to account for.

David Lang


Re: OOM killer gripe (was Re: What still uses the block layer?)

2007-10-15 Thread Nick Piggin
On Tuesday 16 October 2007 13:55, Eric W. Biederman wrote:
 Nick Piggin [EMAIL PROTECTED] writes:

  How much swap do you have configured? You really shouldn't configure
  so much unless you do want the kernel to actually use it all, right?

 No.

 There are three basic swapping scenarios.
 - Pushing unused data out of ram
 - Swapping
 - Thrashing

 To effectively swap you need SWAP > RAM because after a little while of
 swapping all of your pages in RAM should be assigned a location in the
 page cache.

I don't follow your logic. We don't need SWAP > RAM in order to swap
effectively, IMO.


 I have not heard of many people swapping and not thrashing lately.
 I think part of the problem is that we do random access to the swap
 partition, which makes us seek limited.  And since the number of
 seeks per unit time has been increasing at a linear or slower rate,
 if we are doing random disk I/O then the amount we can use

I don't know if there is a causal relationship there. I mean, I
think it's been a long time since thrashing was ever a viable mode
of operation, right?

Maybe desktops just have less need for swapping now, so nobody sees
it much until something goes _really_ bad. When I'm using my 256MB
machine, unused stuff goes to swap.


 the disk for is very limited.   I wonder if we could figure out
 how to push and pull 1M or bigger chunks into and out of swap?

Pulling in 1MB pages can really easily end up compounding the
thrashing problem unless you're very sure a significant amount
of it will be used.


 I don't know if swap has actually worked since vmscan stopped
 going over the virtual addresses.

I do, and it does ;)


  Because if we're not really conservative about OOM killing, then the
  user who actually really did want to use all the swap they configured
  gets angry when we kill their jobs without using it all.

 I totally agree. The fact that the OOM killer started is a sign that
 the system was completely overwhelmed and nothing better could happen.

 In this case my gut feel says limiting the total number of processes
 would have been much more effective than anything at all to do with
 swap. make -j reminds me of the classic fork bomb.

Yep.


  Would an oom-kill-someone-now sysrq be of help, I wonder?

 Well we have SAK, which should kill everything on your current VT,
 which should include X and all of its children.

Which is exactly what you don't want to do if you've just forkbombed
yourself. I missed the fact that we now have a manual oom kill...


Re: OOM killer gripe (was Re: What still uses the block layer?)

2007-10-15 Thread Eric W. Biederman
[EMAIL PROTECTED] writes:


 on some kernel versions you are correct about needing swap > ram, but on
 current versions you are not. the swap space gets allocated as needed, and
 re-used as needed (I don't know the mechanism of this, but I remember the
 last time this changed from vm=max(ram,swap) to vm=ram+swap)

I don't think I can recall a linux kernel that required swap > ram.
However for serious swapping under linux having swap > ram was very
useful and pretty much a requirement for a workload that involved
swapping heavily (not thrashing).

 I have not heard of many people swapping and not thrashing lately.
 I think part of the problem is that we do random access to the swap
 partition which makes us seek limited.  And since the number of
 seeks per unit time has been increasing at a linear or slower rate
 that if we are doing random disk I/O then the amount we can use
 the disk for is very limited.   I wonder if we could figure out
 how to push and pull 1M or bigger chunks into and out of swap?

 it has been noted by many people that linux is very slow to pull things back
 into ram from swap, significantly slower than simple seek limiting would seem
 to account for.

Yes.  It may be the large amount of random access (my current guess)
or it may be something else.

I wonder if I should build an application with a configurable
data set and working set that can be used for swap testing.  I don't
think it would be very hard and it might help sort through some of
the swap performance problems.
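
Even a crude shell sketch gets partway there (all sizes made up -- pick them
relative to RAM; tmpfs pages are swappable, so whatever doesn't fit in RAM
ends up in swap and has to be faulted back in):

    mkdir -p /mnt/swaptest
    mount -t tmpfs -o size=4096m none /mnt/swaptest
    dd if=/dev/zero of=/mnt/swaptest/data bs=1M count=3000   # data set (> RAM)
    while true; do
        # working set; push count past RAM size to slide into thrashing
        dd if=/mnt/swaptest/data of=/dev/null bs=1M count=1536
    done

A real program dirtying anonymous memory with a tunable access pattern would
be more faithful, but even this is enough to sit and watch vmstat with.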

Eric





Re: OOM killer gripe (was Re: What still uses the block layer?)

2007-10-15 Thread Nick Piggin
On Tuesday 16 October 2007 14:38, Eric W. Biederman wrote:
 Nick Piggin [EMAIL PROTECTED] writes:
  On Tuesday 16 October 2007 13:55, Eric W. Biederman wrote:

  I don't follow your logic. We don't need SWAP > RAM in order to swap
  effectively, IMO.

 The steady state of a system that is heavily and usably swapping but
 not thrashing is that all of the pages in RAM are in the swap cache,
 at least that used to be the case.

Yeah, it works better in 2.6 (and, IIRC later 2.4 kernels).


  I don't know if there is a causal relationship there. I mean, I
  think it's been a long time since thrashing was ever a viable mode
  of operation, right?

 Right.  But swapping heavily has been a viable mode of operation,
 and the vast gap in disk random IO performance seems to have
 hurt significantly.

Or, just not improved as fast as everything else is improving.
There isn't too much the kernel can do about that. It just
relatively changes the point at which you'd consider swapping
heavily, right?


 To be very clear: I used to be able to run a problem at a little below
 full speed with the disk pegged with swap traffic, and I did this
 regularly when I started out with linux.

I can do this now. In make -jhuge tests for example, you can get
a 4GB, 4 core machine to max out a disk with swapping and still
have 0 idle time. Of course you can also go past that point and
your idle time comes up. That's not new though.


  Maybe desktops just have less need for swapping now, so nobody sees
  it much until something goes _really_ bad. When I'm using my 256MB
  machine, unused stuff goes to swap.

 There is a bit of truth in the fact that there is less need for
 swapping now.  At the same time however swapping simply does not
 work well right now, and I'm not at all certain why.

  the disk for is very limited.   I wonder if we could figure out
  how to push and pull 1M or bigger chunks into and out of swap?
 
  Pulling in 1MB pages can really easily end up compounding the
  thrashing problem unless you're very sure a significant amount
  of it will be used.

 It's a hard call.  The I/O time for 1MB of contiguous disk data
 is about the I/O time of 512 bytes of contiguous disk data.

And if you're thrashing, then by definition you need to throw
out 1MB of your working set in order to read it in.


  I don't know if swap has actually worked since vmscan stopped
  going over the virtual addresses.
 
  I do, and it does ;)

 Really?  Not just the pushing of unused stuff into swap.

We had several bugs and things that caused swapping performance
regressions vs 2.4 in earlyish 2.6. After those were fixed, we're
pretty competitive with 2.4 in some basic tests I was using. I
haven't run them for a fair while, so something might have broken
since then, I don't know.