Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-16 Thread Jim C. Nasby

On Thu, Jun 12, 2003 at 10:10:02PM -0400, Bruce Momjian wrote:
 Tom Lane wrote:
 It is bad to hang the system, but if it reports swap failure, at least
 the admin knows why it failed, rather than killing random processes.
 
I wonder if it might be better to suspend whatever process is trying to
allocate or write to too much memory. At least then you have some chance of
keeping the system up (obviously you'd need to leave some amount free so
you could log in to the box to fix things).
-- 
Jim C. Nasby (aka Decibel!)[EMAIL PROTECTED]
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-16 Thread Jim C. Nasby

On Fri, Jun 13, 2003 at 12:41:28PM -0400, Bruce Momjian wrote:
 Of course, if you exceed swap, your system hangs.
 
Are you sure? I ran out of swap once or came damn close, due to a cron
job gone amuck. My clue was starting to see lots of memory allocation
errors. After I fixed what was blocking all the backed-up cron jobs, the
machine ground to a crawl (mmm... system load of 400+ on a dual
PII-375), and X did crash (though I think that's because I tried
switching to a different virtual console), but the machine stayed up and
eventually worked through everything.

Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-15 Thread Shridhar Daithankar

On 14 Jun 2003 at 16:38, Andrew Dunstan wrote:
 Summary: don't take shortcuts looking for this - Read the Source, Luke. It's
 important not to give people false expectations. For now, I'm leaning in
 Tom's direction of advising people to avoid Linux for mission-critical
 situations that could run into an OOM.

While I agree that vanilla Linux does not handle the situation gracefully
enough, anybody running a mission-critical application should spec the machine
and the demands on it carefully enough. For certain, Linux won't start
doing OOM kills just because it started running low on buffer memory. (At least
I hope so.)

If one expects to throw an uncalculated amount of load on a mission-critical box,
till it reaches swap for every malloc in a strcpy, there are things that need to
be checked before which kernel/OS you are running.

And BTW, was that original comment for vanilla Linux or Linux in general? :-)


Bye
 Shridhar

--
Adore, v.:  To venerate expectantly.
        -- Ambrose Bierce, "The Devil's Dictionary"



Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-15 Thread Andrew Dunstan

Alan Cox has written to me thus:

 It got dropped for RH9 and some errata kernels because of clashes between
 the old stuff and the rmap vm and other weird RH patches

andrew

- Original Message - 
From: Andrew Dunstan [EMAIL PROTECTED]
To: Tom Lane [EMAIL PROTECTED]
Cc: Kurt Roeckx [EMAIL PROTECTED]; Matthew Kirkwood
[EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Saturday, June 14, 2003 5:39 PM
Subject: Re: [HACKERS] Pre-allocation of shared memory ...


 I know he does -  *but* I think it has probably been wiped out by accident
 somewhere along the line (like when they went to 2.4.20?)

 Here's what's in RH sources - tell me after you look that I am looking in
 the wrong place. (Or did RH get cute and decide to do this only for the AS
 product?)




Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-14 Thread Lamar Owen

On Friday 13 June 2003 15:29, Lamar Owen wrote:
 It is or was a Linux kernel problem.  The 2.2 kernel required double swap
 space, even though it wasn't well documented.  Early 2.4 kernels also
 required double swap space, and it was better documented.  Current Red Hat
 2.4 kernels, I'm not sure which VM system is in use.  The old VM certainly
 DID require double physical memory swap space.

After consulting with some kernel gurus, you can upgrade to a straight Alan 
Cox (-ac) kernel and turn off overcommits to cause it to fail the allocation 
instead of blowing processes out at random when the overcommit bites.
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11



Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-14 Thread Andrew Dunstan

The trouble with this advice is that if I am an SA wanting to run a DBMS
server, I will want to run a kernel supplied by a vendor, not an arbitrary
kernel released by a developer, even one as respected as Alan Cox.

andrew


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-14 Thread Andrew Dunstan


http://lwn.net/Articles/4628/ has this possibly useful info:

---
 So what is strict VM overcommit?  We introduce new overcommit policies
that attempt to never succeed an allocation that can not be fulfilled by
the backing store and consequently never OOM.  This is achieved through
strict accounting of the committed address space and a policy to
allow/refuse allocations based on that accounting.

In the strictest of modes, it should be impossible to allocate more
memory than available and impossible to OOM.  All memory failures should
be pushed down to the allocation routines -- malloc, mmap, etc.
--
But see also the discussion from July last year:
http://www.ussg.iu.edu/hypermail/linux/kernel/0207.2/index.html

A quick investigation of 2.4 releases on kernel.org appears to show this still
hasn't made it into mainline kernels. Apparently Alan did this work
originally because RH had customers using Oracle who were running into OOM
... Surprise!

I don't keep copies of old kernel sources around on my Linux machine, so I
don't know when it went into the RH kernel series - that at least would be
nice to know.

andrew


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-14 Thread Matthew Kirkwood

On Sat, 14 Jun 2003, Andrew Dunstan wrote:

 The trouble with this advice is that if I am an SA wanting to run a
 DBMS server, I will want to run a kernel supplied by a vendor, not an
 arbitrary kernel released by a developer, even one as respected as
 Alan Cox.

Like, say, Red Hat:

$ ls -l /proc/sys/vm/overcommit_memory
-rw-r--r--    1 root     root            0 Jun 14 18:58 /proc/sys/vm/overcommit_memory
$ uname -a
Linux stinky.hoopy.net 2.4.20-20.1.1995.2.2.nptl #1 Fri May 23 12:18:31 EDT 2003 i686 i686 i386 GNU/Linux

(This is a Rawhide kernel, but I think that control has been
in stock RH kernels for some time now.)

Matthew.



Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-14 Thread Kurt Roeckx

On Sat, Jun 14, 2003 at 08:32:40PM +0100, Matthew Kirkwood wrote:
 On Sat, 14 Jun 2003, Andrew Dunstan wrote:
 
  The trouble with this advice is that if I am an SA wanting to run a
  DBMS server, I will want to run a kernel supplied by a vendor, not an
  arbitrary kernel released by a developer, even one as respected as
  Alan Cox.
 
 Like, say, Red Hat:
 
 $ ls -l /proc/sys/vm/overcommit_memory
 -rw-r--r--    1 root     root            0 Jun 14 18:58 /proc/sys/vm/overcommit_memory
 $ uname -a
 Linux stinky.hoopy.net 2.4.20-20.1.1995.2.2.nptl #1 Fri May 23 12:18:31 EDT 2003 i686 i686 i386 GNU/Linux


I also got that /proc/sys/vm/overcommit_memory on a plain 2.4.21.


Kurt



Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-14 Thread Matthew Kirkwood

On Sat, 14 Jun 2003, Kurt Roeckx wrote:

  $ ls -l /proc/sys/vm/overcommit_memory
  -rw-r--r--    1 root     root            0 Jun 14 18:58 /proc/sys/vm/overcommit_memory
  $ uname -a
  Linux stinky.hoopy.net 2.4.20-20.1.1995.2.2.nptl #1 Fri May 23 12:18:31 EDT 2003 i686 i686 i386 GNU/Linux

 I also got that /proc/sys/vm/overcommit_memory on a plain 2.4.21.

This might also be interesting:

http://www.cs.helsinki.fi/linux/linux-kernel/2002-33/0826.html

I couldn't say how much of it is in the stock RH kernels,
or how successful the heuristic is.

Matthew.



Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-14 Thread Andrew Dunstan

Yes, but it's only a binary flag. Non-zero says "cheerfully overcommit" and
0 says "try not to overcommit", but there isn't a value that says "make sure
not to overcommit".

Have a look in mm/mmap.c in the plain 2.4.21 sources for evidence. There's
nothing like the Alan Cox patch.

IOW, simply the presence of /proc/sys/vm/overcommit_memory with a value set
to 0 doesn't guarantee you won't get an OOM kill, AFAICS.

I *know* the latest RH kernel docs *say* they have paranoid mode that
supposedly guarantees against OOM - it was me that pointed that out
originally :-). I just checked on the latest sources (today it's RH8, kernel
2.4.20-18.8) to be doubly sure, and can't see the patches. (That would be
really bad of RH, btw, if I'm correct - saying in your docs you support
something that you don't)

The proof, if any is needed, that the mainline kernel still does not have
this, is that it is still in Alan's patch set against 2.4.21, at
http://www.kernel.org/pub/linux/kernel/people/alan/linux-2.4/2.4.21/patch-2.4.21-ac1.gz

Summary: don't take shortcuts looking for this - Read the Source, Luke. It's
important not to give people false expectations. For now, I'm leaning in
Tom's direction of advising people to avoid Linux for mission-critical
situations that could run into an OOM.

cheers

andrew


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-14 Thread Tom Lane

Andrew Dunstan [EMAIL PROTECTED] writes:
 I *know* the latest RH kernel docs *say* they have paranoid mode that
 supposedly guarantees against OOM - it was me that pointed that out
 originally :-). I just checked on the latest sources (today it's RH8, kernel
 2.4.20-18.8) to be doubly sure, and can't see the patches.

I think you must be looking in the wrong place.  Red Hat's kernels have
included the mode 2/3 overcommit logic since RHL 7.3, according to
what I can find.  (Don't forget Alan Cox works for Red Hat ;-).)

But it is true that it's not in Linus' tree yet.  This may be because
there are still some loose ends.  The copy of the overcommit document
in my RHL 8.0 system lists some ToDo items down at the bottom:

To Do
-----
o   Account ptrace pages (this is hard)
o   Disable MAP_NORESERVE in mode 2/3
o   Account for shared anonymous mappings properly
    - right now we account them per instance

I have not installed RHL 9 yet --- is the ToDo list any shorter there?

regards, tom lane


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-14 Thread Tom Lane

Andrew Dunstan [EMAIL PROTECTED] writes:
 I *know* the latest RH kernel docs *say* they have paranoid mode that
 supposedly guarantees against OOM - it was me that pointed that out
 originally :-). I just checked on the latest sources (today it's RH8, kernel
 2.4.20-18.8) to be doubly sure, and can't see the patches. (That would be
 really bad of RH, btw, if I'm correct - saying in your docs you support
 something that you don't)

I tried a direct test on my RHL 8.0 box, and was able to prove that
indeed the overcommit 2/3 modes do something, though whether they work
exactly as documented is another question.

I wrote this silly little test program to get an approximate answer
about the largest amount a program could malloc:

#include <stdio.h>
#include <stdlib.h>

int
main (int argc, char **argv)
{
  size_t min = 1024;    /* assume this'd work */
  size_t max = -1;      /* = max unsigned */
  size_t sz;
  void *ptr;

  while ((max - min) >= 1024ul) {
    sz = (((unsigned long long) max) + ((unsigned long long) min)) / 2;
    ptr = malloc(sz);
    if (ptr) {
      free(ptr);
//      printf("malloc(%lu) succeeded\n", sz);
      min = sz;
    } else {
//      printf("malloc(%lu) failed\n", sz);
      max = sz;
    }
  }

  printf("Max malloc is %lu Kb\n", min / 1024);

  return 0;
}

and got these results:

[EMAIL PROTECTED] tmp]# echo 0 > /proc/sys/vm/overcommit_memory
[EMAIL PROTECTED] tmp]# ./alloc
Max malloc is 1489075 Kb
[EMAIL PROTECTED] tmp]# echo 1 > /proc/sys/vm/overcommit_memory
[EMAIL PROTECTED] tmp]# ./alloc
Max malloc is 2063159 Kb
[EMAIL PROTECTED] tmp]# echo 2 > /proc/sys/vm/overcommit_memory
[EMAIL PROTECTED] tmp]# ./alloc
Max malloc is 1101639 Kb
[EMAIL PROTECTED] tmp]# echo 3 > /proc/sys/vm/overcommit_memory
[EMAIL PROTECTED] tmp]# ./alloc
Max malloc is 974179 Kb

So it's definitely doing something.  /proc/meminfo shows

        total:    used:    free:  shared: buffers:  cached:
Mem:  261042176 160456704 100585472        0 72015872 63344640
Swap: 1077501952 44974080 1032527872
MemTotal:       254924 kB
MemFree:         98228 kB
MemShared:           0 kB
Buffers:         70328 kB
Cached:          59244 kB
SwapCached:       2616 kB
Active:         102532 kB
Inact_dirty:     11644 kB
Inact_clean:     21840 kB
Inact_target:    27200 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       254924 kB
LowFree:         98228 kB
SwapTotal:     1052248 kB
SwapFree:      1008328 kB
Committed_AS:    77164 kB

It does appear that the limit in mode 3 is not too far from where
you'd expect (SwapTotal - Committed_AS), and mode 2 allows about
128M more, which is correct since there's 256 M of RAM.

regards, tom lane


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-14 Thread Andrew Dunstan

I know he does -  *but* I think it has probably been wiped out by accident
somewhere along the line (like when they went to 2.4.20?)

Here's what's in RH sources - tell me after you look that I am looking in
the wrong place. (Or did RH get cute and decide to do this only for the AS
product?)

first, RH7.3/kernel 2.4.18-3 (patch present):


int vm_enough_memory(long pages, int charge)
{
    /* Stupid algorithm to decide if we have enough memory: while
     * simple, it hopefully works in most obvious cases.. Easy to
     * fool it, but this should catch most mistakes.
     *
     * 23/11/98 NJC: Somewhat less stupid version of algorithm,
     * which tries to do "TheRightThing".  Instead of using half of
     * (buffers+cache), use the minimum values.  Allow an extra 2%
     * of num_physpages for safety margin.
     *
     * 2002/02/26 Alan Cox: Added two new modes that do real accounting
     */
    unsigned long free, allowed;
    struct sysinfo i;

    if(charge)
        atomic_add(pages, &vm_committed_space);

    /* Sometimes we want to use more memory than we have. */
    if (sysctl_overcommit_memory == 1)
        return 1;
    if (sysctl_overcommit_memory == 0)
    {
        /* The page cache contains buffer pages these days.. */
        free = atomic_read(&page_cache_size);
        free += nr_free_pages();
        free += nr_swap_pages;

        /*
         * This double-counts: the nrpages are both in the page-cache
         * and in the swapper space. At the same time, this compensates
         * for the swap-space over-allocation (ie "nr_swap_pages" being
         * too small.
         */
        free += swapper_space.nrpages;

        /*
         * The code below doesn't account for free space in the inode
         * and dentry slab cache, slab cache fragmentation, inodes and
         * dentries which will become freeable under VM load, etc.
         * Lets just hope all these (complex) factors balance out...
         */
        free += (dentry_stat.nr_unused * sizeof(struct dentry)) >> PAGE_SHIFT;
        free += (inodes_stat.nr_unused * sizeof(struct inode)) >> PAGE_SHIFT;

        if(free > pages)
            return 1;
        atomic_sub(pages, &vm_committed_space);
        return 0;
    }
    allowed = total_swap_pages;

    if(sysctl_overcommit_memory == 2)
    {
        /* FIXME - need to add arch hooks to get the bits we need
           without the higher overhead crap */
        si_meminfo(&i);
        allowed += i.totalram >> 1;
    }
    if(atomic_read(&vm_committed_space) < allowed)
        return 1;
    if(charge)
        atomic_sub(pages, &vm_committed_space);
    return 0;

}
-
and here's what's in RH9/2.4.20-18 (patch absent):
--
int vm_enough_memory(long pages)
{
    /* Stupid algorithm to decide if we have enough memory: while
     * simple, it hopefully works in most obvious cases.. Easy to
     * fool it, but this should catch most mistakes.
     */
    /* 23/11/98 NJC: Somewhat less stupid version of algorithm,
     * which tries to do "TheRightThing".  Instead of using half of
     * (buffers+cache), use the minimum values.  Allow an extra 2%
     * of num_physpages for safety margin.
     */

    unsigned long free;

    /* Sometimes we want to use more memory than we have. */
    if (sysctl_overcommit_memory)
        return 1;

    /* The page cache contains buffer pages these days.. */
    free = atomic_read(&page_cache_size);
    free += nr_free_pages();
    free += nr_swap_pages;

    /*
     * This double-counts: the nrpages are both in the page-cache
     * and in the swapper space. At the same time, this compensates
     * for the swap-space over-allocation (ie "nr_swap_pages" being
     * too small.
     */
    free += swapper_space.nrpages;

    /*
     * The code below doesn't account for free space in the inode
     * and dentry slab cache, slab cache fragmentation, inodes and
     * dentries which will become freeable under VM load, etc.
     * Lets just hope all these (complex) factors balance out...
     */
    free += (dentry_stat.nr_unused * sizeof(struct dentry)) >> PAGE_SHIFT;
    free += (inodes_stat.nr_unused * sizeof(struct inode)) >> PAGE_SHIFT;

    return free > pages;
}


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-14 Thread Lamar Owen

On Saturday 14 June 2003 16:38, Andrew Dunstan wrote:
 IOW, simply the presence of /proc/sys/vm/overcommit_memory with a value set
 to 0 doesn't guarantee you won't get an OOM kill, AFAICS.

Right.  You need the value to be 2 or 3.  Which means you need Alan's patch to 
do that.

 I *know* the latest RH kernel docs *say* they have paranoid mode that
 supposedly guarantees against OOM - it was me that pointed that out
 originally :-). I just checked on the latest sources (today it's RH8,
 kernel 2.4.20-18.8) to be doubly sure, and can't see the patches. (That
 would be really bad of RH, btw, if I'm correct - saying in your docs you
 support something that you don't)

But note these two lines in the docs with 2.4.20-13.9 (RHL9 errata):
* This describes the overcommit management facility in the latest kernel
  tree (FIXME: actually it also describes the stuff that isnt yet done)

Pay double attention to the line that says FIXME.  IOW, they've documented 
stuff that might not be done!

You can try Red Hat's enterprise kernel, but you'll have to build it from 
source.  RHEL AS is available online as source RPMs.

Also understand that the official Red Hat kernel is very close to an Alan Cox 
kernel.  Also, if you really want to get down and dirty testing the kernel, a 
test suite is available to help with that, known as Cerberus.  Configs are 
available specifically tuned to stress-test kernels.  I think Cerberus is on 
Source Forge.

So, make sure you have a kernel that allows overcommit-accounting mode 2 to 
prevent kills on OOM.  Theoretically mode 2 will prevent the possibility of 
OOM completely.

If I read things right, if you have double swap space, mode 0 will not OOM 
nearly as quickly.

Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-13 Thread Shridhar Daithankar

On 12 Jun 2003 at 11:31, Bruce Momjian wrote:

 
 OK, doc patch attached and applied.  Improvements?

Can we point people to /usr/src/linux/doc...place where they can find more 
documentation  and if their kernel supports it or not.

Bye
 Shridhar

--
Zall's Laws: (1) Any time you get a mouthful of hot soup, the next thing you do
will be wrong. (2) How long a minute is, depends on which side of the bathroom
door you're on.



Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-13 Thread Jeroen T. Vermeulen

On Thu, Jun 12, 2003 at 07:22:14PM -0700, Ron Mayer wrote:
 
 I'm guessing any database backend (postgres, oracle)
 that wasn't part of a long-lived connection seems like 
 an especially attractive target to this algorithm.  

Yeah, IIRC it tries to pick daemons that can be restarted, or will be
restarted automatically, but may need a lot less memory after that.


Jeroen



Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-13 Thread Bruce Momjian

Shridhar Daithankar wrote:
 On 12 Jun 2003 at 11:31, Bruce Momjian wrote:
 
  
  OK, doc patch attached and applied.  Improvements?
 
 Can we point people to /usr/src/linux/doc...place where they can find more 
 documentation  and if their kernel supports it or not.

Yes, we could, but the name of the parameter seems enough.  They
certainly can look that up.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-13 Thread Patrick Welche

On Thu, Jun 12, 2003 at 10:10:02PM -0400, Bruce Momjian wrote:
 Tom Lane wrote:
  Bruce Momjian [EMAIL PROTECTED] writes:
   You have to love that swap + 1/2 ram option --- when you need four
   possible options, there is something wrong with your approach.  :-)
  
  I'm still wondering what the no overcommit handling option does,
  exactly.
 
 I assume it does no kills, and allows you to commit until you run out of
 swap and hang.  This might be the BSD 4.4 behavior, actually.

? I thought the idea of no overcommit was that your malloc fails with ENOMEM
if there isn't enough memory free for your whole request, rather than
gambling that other processes aren't actually using all of theirs right now
and have pages swapped out. I don't see where the hang comes in.

 It is bad to hang the system, but if it reports swap failure, at least
 the admin knows why it failed, rather than killing random processes.

Yes!

Patrick


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-13 Thread Bruce Momjian

Patrick Welche wrote:
 On Thu, Jun 12, 2003 at 10:10:02PM -0400, Bruce Momjian wrote:
  Tom Lane wrote:
   Bruce Momjian [EMAIL PROTECTED] writes:
    You have to love that swap + 1/2 ram option --- when you need four
    possible options, there is something wrong with your approach.  :-)
   
   I'm still wondering what the no overcommit handling option does,
   exactly.
  
  I assume it does no kills, and allows you to commit until you run out of
  swap and hang.  This might be the BSD 4.4 behavior, actually.
 
 ? I thought the idea of no overcommit was that your malloc fails ENOMEM
 if there isn't enough memory free for your whole request, rather than
 gambling that other processes aren't actually using all of theirs right now
 and have pages swapped out. I don't see where the hang comes in..

I think there are two important memory cases:

malloc() - should fail right away if it can't reserve the requested
memory;  assuming applications request memory they don't use just seems
dumb --- fix the bad apps.

fork() - this is the tricky one because you don't know at fork time who
is going to be sharing the data pages as read-only or doing an exec to
overlay a new process, and who is going to be modifying them and need a
private copy.

I think only the fork case is tricky.


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-13 Thread Jeroen T. Vermeulen

On Fri, Jun 13, 2003 at 09:25:49AM -0400, Bruce Momjian wrote:
 
 malloc() - should fail right away if it can't reserve the requested
 memory;  assuming application request memory they don't use just seems
 dumb --- fix the bad apps.
 
 fork() - this is the tricky one because you don't know at fork time who
 is going to be sharing the data pages as read-only or doing an exec to
 overlay a new process, and who is going to be modifying them and need a
 private copy.
 
 I think only the fork case is tricky.

But how do you tell that a malloc() can't get enough memory, once you've
had to overcommit on fork()s?  If a really large program did a regular
fork()/exec() and there wasn't enough free virtual memory to support
the full fork() just in case the program isn't going to exec(), then
*any* malloc() occurring between the two calls would have to fail.  That
may be better than random killing in theory, but the practical effect
would be close to that.

There are other complications as well, I'm sure.  If this were easy, we
probably wouldn't be discussing this problem now.


Jeroen



Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-13 Thread Josh Berkus

Tom, et al,

  Given that swap space is cheap, and that killing random processes is
  obviously bad, it's not apparent to me why people think this is not
  a good approach --- at least for high-reliability servers.  And Linux
  would definitely like to think of itself as a server-grade OS.

Regrettably, few of the GUI installers for Linux (SuSE or Red Hat, for 
example), include adequate swap space in their suggested disk formatting.  
Some versions of some distributions do not create a swap partition at all; 
others allocate only 130mb to this partition regardless of actual RAM.

So regardless of what they *should* be doing, there's thousands of Linux users 
out there with too little or no swap on disk ...

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-13 Thread Bruce Momjian

Josh Berkus wrote:
> Tom, et al,
>
> > Given that swap space is cheap, and that killing random processes is
> > obviously bad, it's not apparent to me why people think this is not
> > a good approach --- at least for high-reliability servers.  And Linux
> > would definitely like to think of itself as a server-grade OS.
>
> Regrettably, few of the GUI installers for Linux (SuSE or Red Hat, for
> example) include adequate swap space in their suggested disk formatting.
> Some versions of some distributions do not create a swap partition at all;
> others allocate only 130MB to this partition regardless of actual RAM.
>
> So regardless of what they *should* be doing, there are thousands of Linux
> users out there with too little or no swap on disk ...

Yes, I have seen that on BSDs too.  I am unsure if we need actual swap
backing store, or just sufficient RAM to allow fork expansion for dirty
pages.


-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-13 Thread Bruce Momjian

Lamar Owen wrote:
> On Friday 13 June 2003 11:55, Josh Berkus wrote:
> > Regrettably, few of the GUI installers for Linux (SuSE or Red Hat, for
> > example) include adequate swap space in their suggested disk formatting.
> > Some versions of some distributions do not create a swap partition at all;
> > others allocate only 130MB to this partition regardless of actual RAM.
>
> Incidentally, Red Hat as of about 7.0 began insisting on swap space at least
> as large as twice RAM size.  In my case on my 512MB RAM notebook, that meant
> it wanted 1GB swap.  If you upgrade your RAM you could get into trouble.  In
> that case, you create a swap file on one of your other partitions that the
> kernel can use.

Oh, that's interesting. I know the newer BSD releases got rid of the
large swap requirement, on the understanding that you usually aren't
going to be using it anyway.

What old BSD releases used to do was to allocate swap space as backing
_all_ RAM, even when it wasn't going to need it, while later releases
allocated swap only when it was needed, so it was only for cases
_exceeding_ RAM, so your virtual memory was now RAM _plus_ swap.

Of course, if you exceed swap, your system hangs.


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-13 Thread Nigel J. Andrews

On Fri, 13 Jun 2003, Lamar Owen wrote:

> On Friday 13 June 2003 11:55, Josh Berkus wrote:
> > Regrettably, few of the GUI installers for Linux (SuSE or Red Hat, for
> > example) include adequate swap space in their suggested disk formatting.
> > Some versions of some distributions do not create a swap partition at all;
> > others allocate only 130MB to this partition regardless of actual RAM.
>
> Incidentally, Red Hat as of about 7.0 began insisting on swap space at least
> as large as twice RAM size.  In my case on my 512MB RAM notebook, that meant
> it wanted 1GB swap.  If you upgrade your RAM you could get into trouble.  In
> that case, you create a swap file on one of your other partitions that the
> kernel can use.

I'm not sure I agree with this. To a large extent, in these days of cheap
memory, swap space is there to give you time to notice the excessive use of
it and repair the system, since you'd normally be running everything in RAM.

Using the old measure of twice physical memory for swap is excessive on a
decent system imo. I certainly would not allocate 1GB of swap! Well, okay, I
might if I've got a 16GB machine with the potential for an excessive
but transitory workload, or say a 4-8GB machine with a few very large memory
usage processes that can be started as part of the normal work load.

In short, imo these days swap is there to prevent valid processes dying for
lack of system memory and not to provide normal workspace for them.

Having said all that, I haven't read the start of this thread so I've probably
missed the reason for the complaint about lack of swap space, like a problem on
a small memory system.


-- 
Nigel J. Andrews



Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-13 Thread Bruce Momjian


I will say I do use swap sometimes when I am editing a huge image or
something --- there are peak times when it is required.


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-13 Thread Jeroen T. Vermeulen

On Fri, Jun 13, 2003 at 12:32:24PM -0400, Lamar Owen wrote:
>
> Incidentally, Red Hat as of about 7.0 began insisting on swap space at least
> as large as twice RAM size.  In my case on my 512MB RAM notebook, that meant
> it wanted 1GB swap.  If you upgrade your RAM you could get into trouble.  In
> that case, you create a swap file on one of your other partitions that the
> kernel can use.

RedHat's position may be influenced by the fact that, AFAIR, they use
the Rik van Riel virtual memory system which is inclusive--i.e., you need
at least as much swap as you have physical memory before you really have
any virtual memory at all.  This was fixed by the competing Andrea
Arcangeli system, which became standard for the Linux kernel around
2.4.10 or so.


Jeroen



Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-13 Thread Lamar Owen

On Friday 13 June 2003 12:46, Nigel J. Andrews wrote:
> On Fri, 13 Jun 2003, Lamar Owen wrote:
> > Incidentally, Red Hat as of about 7.0 began insisting on swap space at
> > least as large as twice RAM size.  In my case on my 512MB RAM notebook,
> > that meant it wanted 1GB swap.  If you upgrade your RAM you could get
> > into trouble.  In that case, you create a swap file on one of your other
> > partitions that the kernel can use.
>
> I'm not sure I agree with this. To a large extent, in these days of cheap
> memory, swap space is there to give you time to notice the excessive use of
> it and repair the system, since you'd normally be running everything in
> RAM.

It is or was a Linux kernel problem.  The 2.2 kernel required double swap
space, even though it wasn't well documented.  Early 2.4 kernels also
required double swap space, and it was better documented.  As for current
Red Hat 2.4 kernels, I'm not sure which VM system is in use.  The old VM
certainly DID require double physical memory swap space.

From a message I wrote in January of 2002:

On Tuesday 22 January 2002 03:48 pm, Jim Wilcoxson wrote:
> I should have said, we're running this way on 2.2.19, not 2.4   -J
>
> > Is this Linux requirement documented anywhere?  We're running 256MB
> > of swap on 1GB machines and have not had any problems.  But we don't
> > swap much either.

2.2 actually needs 2x swap, but the problems are worse with 2.4.  2.2 won't
die a horrible screaming death -- but 2.4 WILL DIE if you run out of swap in
the wrong way. As to documentation, I can't tell you how I found out about
it, as I'm under NDA from that source.

However, it is public information:  see http://lwn.net/2001/0607/kernel.php3
for some pointers.  Also see
http://www.geocrawler.com/archives/3/84/2001/5/0/5867356/
http://www.tuxedo.org/~esr/writings/ultimate-linux-box/configuration.html
and
http://www.ultraviolet.org/mail-archives/linux-kernel.2001/28831.html

And note that Red Hat Linux 7.1 and 7.2 will complain vociferously if you
create a swap partition smaller than 2x RAM during installation (anaconda).
What it doesn't do is complain when you upgrade RAM but don't upgrade your
swap.

Now, as to whether this is _still_ a requirement or not, I don't know.  Search 
the lkml (Linux Kernel Mailing List) for it.

However, understand that the Red Hat kernel is closer to an Alan Cox kernel 
than to a Linus kernel.  At least that was true up to 2.4.18; the Red Hat 
2.4.20 is very different, with NPTL and its ilk thrown in.
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
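
A rough way to check a running machine against the 2x rule discussed above
is to compare SwapTotal with MemTotal in /proc/meminfo.  The sketch below is
only an illustration in Python: the function names parse_meminfo and
swap_shortfall_kb are made up for this example, and the default factor of 2
is simply the rule of thumb from this thread, not something a given kernel
is known to require.

```python
def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to sizes in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name:   12345 kB"-style lines with a numeric value.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_shortfall_kb(meminfo_text, factor=2):
    """kB of swap missing to reach factor * RAM (0 if there is enough)."""
    info = parse_meminfo(meminfo_text)
    wanted = factor * info["MemTotal"]
    return max(0, wanted - info["SwapTotal"])


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            print("swap shortfall (kB):", swap_shortfall_kb(f.read()))
    except (OSError, KeyError):
        print("no usable /proc/meminfo on this system")
```

With the 1GB RAM / 256MB swap figures from the quoted message, the
shortfall against the 2x rule comes out as 1835008 kB (1.75GB).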



Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Hans-Jürgen Schönig

> Yeah, I see it in the Mandrake kernel.  But it's not in stock 2.4.19, so
> you can't assume everybody has it.

We had this problem on a recent version of good old Slackware.
I think we also had it on RedHat 8 or so.
Doing this kind of killing is definitely a bad habit. I thought it
had to do with something else, so my proposal for pre-allocation seems
to be pretty obsolete ;).

Thanks a lot.

	Hans

--
Cybertec Geschwinde u Schoenig
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/2952/30706; +43/664/233 90 75
www.cybertec.at, www.postgresql.at, kernel.cybertec.at



Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Andrew Dunstan


On this machine (RH9, kernel 2.4.20-18.9) the docs say (in
/usr/src/linux-2.4/Documentation/vm/overcommit-accounting ):

-----
The Linux kernel supports four overcommit handling modes

0   -   Heuristic overcommit handling. Obvious overcommits of
        address space are refused. Used for a typical system. It
        ensures a seriously wild allocation fails while allowing
        overcommit to reduce swap usage

1   -   No overcommit handling. Appropriate for some scientific
        applications

2   -   (NEW) strict overcommit. The total address space commit
        for the system is not permitted to exceed swap + half ram.
        In almost all situations this means a process will not be
        killed while accessing pages but only by malloc failures
        that are reported back by the kernel mmap/brk code.

3   -   (NEW) paranoid overcommit The total address space commit
        for the system is not permitted to exceed swap. The machine
        will never kill a process accessing pages it has mapped
        except due to a bug (ie report it!)
-----

So maybe

  sysctl -w vm.overcommit_memory=3

is what's needed? I guess you might pay a performance hit for doing that,
though.

andrew
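
To make the limits in the quoted documentation concrete, here is a small
Python sketch of the arithmetic for modes 2 and 3.  It assumes the four-mode
scheme exactly as quoted from this RH9 kernel's docs; the function name
commit_limit_mb is hypothetical, and other kernels may number or define the
modes differently, so check the documentation shipped with the kernel you
actually run.

```python
def commit_limit_mb(mode, ram_mb, swap_mb):
    """Address-space commit ceiling in MB for the modes quoted above.

    Returns None for modes 0 and 1, for which the quoted text does not
    give a fixed ceiling.
    """
    if mode == 2:
        # "strict overcommit": swap + half of RAM
        return swap_mb + ram_mb // 2
    if mode == 3:
        # "paranoid overcommit": swap only
        return swap_mb
    if mode in (0, 1):
        return None
    raise ValueError(f"unknown overcommit mode: {mode}")


if __name__ == "__main__":
    for mode in (2, 3):
        print(mode, commit_limit_mb(mode, ram_mb=512, swap_mb=1024))
```

For a machine with 512MB of RAM and 1GB of swap, that gives 1280MB under
mode 2 and 1024MB under mode 3, which is why the stricter modes only make
sense when swap has been sized generously.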


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Jon Lapham

Tom Lane wrote:
> Is this a Linux machine?  If so, the true explanation is probably (c):
> the kernel is kill 9'ing randomly-chosen database processes whenever
> it starts to feel low on memory.  I would suggest checking the
> postmaster log to determine the signal number the failed backends are
> dying with.  The client-side message does not give nearly enough info
> to debug such problems.
>
> AFAIK the only good way around this problem is to use another OS with a
> more rational design for handling low-memory situations.  No other Unix
> does anything remotely as brain-dead as what Linux does.  Or bug your
> favorite Linux kernel hacker to fix the kernel.

Tom-

Just curious.  What would a rationally designed OS do in an out of
memory situation?

It seems like from the discussions I've read about the subject there
really is no rational solution to this irrational problem.

Some solutions, such as "suspend process, write image to file" and
"increase swap space", assume available disk space, which is obviously
not guaranteed to be available.

--
-**-*-*---*-*---*-*---*-*-*-*---*-*---*-*-*-*-*---
 Jon Lapham  [EMAIL PROTECTED]  Rio de Janeiro, Brasil
 Work: Extracta Moléculas Naturais SA http://www.extracta.com.br/
 Web: http://www.jandr.org/
***-*--**---***---



Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Tom Lane

Jon Lapham [EMAIL PROTECTED] writes:
> Just curious.  What would a rationally designed OS do in an out of
> memory situation?

Fail malloc() requests.

The sysctl docs that Andrew Dunstan just provided give some insight into
the problem: the default behavior of Linux is to promise more virtual
memory than it can actually deliver.  That is, it allows malloc to
succeed even when it's not going to be able to actually provide the
address space when push comes to shove.  When called to stand and
deliver, the kernel has no way to report failure (other than perhaps a
software-induced SIGSEGV, which would hardly be an improvement).  So it
kills the process instead.  Unfortunately, the process that happens to
be in the line of fire at this point could be any process, not only the
one that made unreasonable memory demands.

This is perhaps an okay behavior for desktop systems being run by
people who are accustomed to Microsoft-like reliability.  But to make it
the default is brain-dead, and to make it the only available behavior
(as seems to have been true until very recently) defies belief.  The
setting now called paranoid overcommit is IMHO the *only* acceptable
one for any sort of server system.  With anything else, you risk having
critical userspace daemons killed through no fault of their own.

regards, tom lane


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Jon Lapham

Tom Lane wrote:
> [snip]
> The
> setting now called paranoid overcommit is IMHO the *only* acceptable
> one for any sort of server system.  With anything else, you risk having
> critical userspace daemons killed through no fault of their own.

Wow.  Thanks for the info.  I found the documentation you are referring
to in Documentation/vm/overcommit-accounting (on a stock RH9 machine).

It seems that the overcommit policy is set via the sysctl
`vm.overcommit_memory'.  So...

[EMAIL PROTECTED] src]# sysctl -a | grep -i overcommit
vm.overcommit_memory = 0

...the default seems to be Heuristic overcommit handling.  It seems
that what we want is vm.overcommit_memory = 3 for paranoid overcommit.

Thanks for getting to the bottom of this, Tom.  It *is* insane that the
default isn't paranoid overcommit.


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Bruce Momjian


What really kills [:-)] me is that they allocate memory assuming I will
not be using it all, then terminate the executable in an unrecoverable
way when I go to use the memory.

And, they make a judgement on users who don't want this by calling them
paranoid.

I will add something to the docs about this.


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
> What really kills [:-)] me is that they allocate memory assuming I will
> not be using it all, then terminate the executable in an unrecoverable
> way when I go to use the memory.

To be fair, I'm probably misstating things by referring to malloc().
The big problem probably comes from fork() with copy-on-write --- the
kernel has no good way to estimate how much of the shared address space
will eventually become private modified copies, but it can be forgiven
for wanting to make less than the worst-case assumption.

Still, if you are wanting to run a reliable server, I think worst-case
assumption is exactly what you want.  Swap space is cheap, and there's
no reason you shouldn't have enough swap to support the worst-case
situation.  If the swap area goes largely unused, that's fine.

The policy they're calling paranoid overcommit (don't allocate more
virtual memory than you have swap) is as far as I know the standard on
all Unixen other than Linux; certainly it's the traditional behavior.

regards, tom lane
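
As a back-of-the-envelope illustration of the worst-case assumption
described above, the following Python sketch adds up what would be needed if
every copy-on-write page inherited across fork() were eventually dirtied.
It is a deliberately simplified model, not how any kernel actually accounts
memory: the names worst_case_commit_mb and fits are invented here, and real
accounting also has to deal with stacks, page tables, memory allocated after
the fork, and so on.

```python
def worst_case_commit_mb(private_mb, children, shared_mb=0):
    """Worst-case demand in MB if every copy-on-write page gets dirtied.

    private_mb: writable private memory of the parent at fork() time
    children:   number of forked children
    shared_mb:  genuinely shared memory, counted once and never copied
    """
    # The parent plus each child may end up with its own private copy.
    return shared_mb + private_mb * (children + 1)


def fits(private_mb, children, backing_mb, shared_mb=0):
    """True if the worst case is covered by the available backing store."""
    return worst_case_commit_mb(private_mb, children, shared_mb) <= backing_mb


if __name__ == "__main__":
    need = worst_case_commit_mb(private_mb=20, children=50, shared_mb=64)
    print("worst case:", need, "MB")
```

With a 20MB parent, 50 children and 64MB of shared memory the worst case is
1084MB, even though in practice only a fraction of those pages would ever be
copied.  That gap is exactly what an overcommitting kernel is betting on.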


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Bruce Momjian


OK, doc patch attached and applied.  Improvements?

Index: doc/src/sgml/runtime.sgml
===
RCS file: /cvsroot/pgsql-server/doc/src/sgml/runtime.sgml,v
retrieving revision 1.184
diff -c -c -r1.184 runtime.sgml
*** doc/src/sgml/runtime.sgml   11 Jun 2003 22:13:21 -  1.184
--- doc/src/sgml/runtime.sgml   12 Jun 2003 15:29:45 -
***************
*** 2780,2785 ****
--- 2780,2795 ----
  <filename>/usr/src/linux/include/asm-<replaceable>xxx</>/shmparam.h</> and <filename>/usr/src/linux/include/linux/sem.h</>.
 </para>
+ 
+<para>
+ Linux has poor default memory overcommit behavior.  Rather than
+ failing if it can not reserve enough memory, it returns success, 
+ but later fails when the memory can't be mapped and terminates 
+ the application.  To prevent unpredictable process termination, use:
+ <programlisting>
+ sysctl -w vm.overcommit_memory=3
+ </programlisting>
+</para>
</listitem>
   </varlistentry>
  


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
> OK, doc patch attached and applied.  Improvements?

I think it would be worth spending another sentence to tell people
exactly what the symptom looks like, ie, backends dying with signal 9.

regards, tom lane
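
For readers who have not seen the symptom before, here is a small Python
sketch of what "dying with signal 9" looks like from a parent process's
point of view.  It is only an analogy for what the postmaster sees: the
child here is a throwaway Python process rather than a backend, the SIGKILL
is sent by the script itself as a stand-in for the kernel, and the helper
names describe_exit and run_and_kill_child are invented for this example.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable string."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative number.
        sig = -returncode
        return f"terminated by signal {sig} ({signal.Signals(sig).name})"
    return f"exited with status {returncode}"


def run_and_kill_child():
    """Start a sleeping child, SIGKILL it, and report how it ended."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    child.send_signal(signal.SIGKILL)  # stand-in for the kernel's kill
    child.wait()
    return describe_exit(child.returncode)


if __name__ == "__main__":
    print(run_and_kill_child())
```

The point is that the child gets no chance to clean up or report anything
itself; only the parent can see the signal number, which is why the server
log rather than the client-side message is the place to look.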


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Bruce Momjian


I have added the following sentence to the docs too:

Note, you will need enough swap space to cover all your memory
needs.

I still wish Linux would just fail the fork/malloc when memory is low,
rather than requiring swap for everything _or_ overcommitting.  I wonder
if making a unified buffer cache just made that too hard to do.


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Bruce Momjian


OK, new text is:

   <para>
Linux has poor default memory overcommit behavior.  Rather than
failing if it can not reserve enough memory, it returns success,
but later fails when the memory can't be mapped and terminates
the application with <literal>kill -9</>.  To prevent unpredictable
process termination, use:
<programlisting>
sysctl -w vm.overcommit_memory=3
</programlisting>
Note, you will need enough swap space to cover all your memory needs.
   </para>
  </listitem>
 </varlistentry>


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Andrew Dunstan


A couple of points:

. It is probably a good idea to do this via /etc/sysctl.conf, which will
be called earlyish by init scripts (on RH9 it is in the network startup
file, for some reason).

. The setting is not available on all kernel versions AFAIK. The admin needs
to check the docs. I have no idea when this went into the kernel, and no
time to spend finding out. Even if we knew, it might have gone into vendor
kernels at other odd times  - there are often times when the vendors are in
advance of the officially released kernels.

Andrew
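
Both points can be illustrated with a short Python sketch: it reads the
current mode from /proc/sys/vm/overcommit_memory (the file behind the
vm.overcommit_memory sysctl), returns None when the kernel does not expose
the setting or it cannot be read, and formats the line one would add to
/etc/sysctl.conf.  The function names are hypothetical, and whether a
particular value such as 3 is accepted still depends on the kernel in use.

```python
def parse_mode(text):
    """Parse the contents of the overcommit_memory file; None if invalid."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def read_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Return the current mode as an int, or None if it cannot be read."""
    try:
        with open(path) as f:
            return parse_mode(f.read())
    except OSError:
        # Missing file: not Linux, or a kernel without this setting.
        return None


def sysctl_conf_line(mode):
    """Line to place in /etc/sysctl.conf so the mode is applied at boot."""
    return f"vm.overcommit_memory = {mode}"


if __name__ == "__main__":
    print("current mode:", read_overcommit_mode())
    print("sysctl.conf entry:", sysctl_conf_line(3))
```

Checking for None before relying on the setting is a cheap way to handle
kernels that predate it.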



Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Bruce Momjian


Well, let's see what feedback we get.


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Greg Stark

Tom Lane [EMAIL PROTECTED] writes:

> The policy they're calling paranoid overcommit (don't allocate more
> virtual memory than you have swap) is as far as I know the standard on
> all Unixen other than Linux; certainly it's the traditional behavior.

Uhm, it's traditional for Unixen without extensive shared memory usage like
SunOS 4. But it's not nearly as standard as you say. 

In fact Linux wasn't the first major Unix to behave this way at all. As far as
I know, that honour belongs to AIX. Not coincidentally, one of the first
Unixen to have shared libraries. Hence the AIX invention of SIGDANGER which
told a process its death was imminent.

On AIX the heuristic was to kill the largest process in order to clear up the
most memory -- which had a nasty habit of picking the X server to kill, which
of course, well, it cleared up lots of memory... I think they fixed that by
changing the heuristic to kill the *second* biggest process.

I think you'll find this overcommit issue affects many if not most Unixen.
There's a bit of a vicious circle here: a lot of software now has the habit
of starting off by mallocing huge chunks of memory that it never needs
because, well, the machine has virtual memory so it doesn't cost anything.

-- 
greg


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Tom Lane

Greg Stark [EMAIL PROTECTED] writes:
 I think you'll find this overcommit issue affects many if not most Unixen.

I'm unconvinced, because I've only ever heard of the problem affecting
Postgres on Linux.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Bruce Momjian

Tom Lane wrote:
 Greg Stark [EMAIL PROTECTED] writes:
  I think you'll find this overcommit issue affects many if not most Unixen.
 
 I'm unconvinced, because I've only ever heard of the problem affecting
 Postgres on Linux.

What I don't understand is why they just don't start failing on
fork/malloc rather than killing things.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Jeroen T. Vermeulen

On Thu, Jun 12, 2003 at 08:08:28PM -0400, Bruce Momjian wrote:
  
  I'm unconvinced, because I've only ever heard of the problem affecting
  Postgres on Linux.
 
 What I don't understand is why they just don't start failing on
 fork/malloc rather than killing things.

I may be way off the mark here, falling into the middle of this as I am,
but it may be because the kernel overcommits the memory (which is sort of
logical in a way given the way fork() works).  That may mean that malloc()
thinks it gets more memory and returns a pointer, but the kernel hasn't
actually committed that address space yet and waits to see if it's ever
going to be needed.

Given the right allocation proportions, this may mean that in the end the
kernel has no way to handle a shortage gracefully by causing fork() or
allocations to fail.  I would assume it then goes through its alternatives
like scaling back its file cache--which it'd probably start to do before
a lot of swapping was needed, so not much to scrape out of that barrel.

After that, where do you go?  Try to find a reasonable process to shoot
in the head.  From what I heard, although I haven't kept current, a lot
of work went into selecting a reasonable process, so there will be some
determinism.  And if you have occasion to find out in the first place,
some determinism usually means suspiciously bad luck.


Jeroen


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Andrew Dunstan

I'm not saying you're wrong, but I also think it's true that typical Linux
usage patterns are rather different from those of other *nixen. Linux
started out being able to do a lot with a little, and is still often used
that way - with more functions crammed into boxes with less resources. When
I last worked in a data centre (a few years ago now, for one of the world's
largest companies) they had hundreds of AIX and HP-UX boxes, each well
resourced and each dedicated to exactly one function. I rarely see Linux
being used that way, and I often see it configured with lowish memory and
not nearly enough swap.

In any case, it seems to me we need to have someone check that setting the
vm.overcommit_memory to paranoid will actually stop the postmaster being
killed. I'd love to help but I'm up to my ears in stuff right now. If we
know that, we can save the philosophical stuff for another day :-)

cheers

andrew

- Original Message - 
From: Tom Lane [EMAIL PROTECTED]
To: Greg Stark [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Thursday, June 12, 2003 6:19 PM
Subject: Re: [HACKERS] Pre-allocation of shared memory ...


 Greg Stark [EMAIL PROTECTED] writes:
  I think you'll find this overcommit issue affects many if not most Unixen.

 I'm unconvinced, because I've only ever heard of the problem affecting
 Postgres on Linux.

 regards, tom lane


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Tom Lane

Jeroen T. Vermeulen [EMAIL PROTECTED] writes:
 Given the right allocation proportions, this may mean that in the end the
 kernel has no way to handle a shortage gracefully by causing fork() or
 allocations to fail.

Sure it does.  All you need is a conservative allocation policy: fork()
fails if it cannot reserve enough swap space to guarantee that the new
process could write over its entire address space.  Copy-on-write is
an optimization that reduces physical RAM usage, not virtual address
space or swap-space requirements.
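
A minimal sketch of the reservation policy described above, written as a toy
model rather than kernel code. The class name, page counts and pids are all
hypothetical; the only point is that the failure is reported at fork() time,
before any process has to be killed.

```python
class StrictCommitLedger:
    """Toy model of a no-overcommit policy: every page a process could
    write must have reserved backing store before the request succeeds."""

    def __init__(self, swap_pages):
        self.swap_pages = swap_pages
        self.reserved = {}  # pid -> pages reserved for that process

    def free_pages(self):
        return self.swap_pages - sum(self.reserved.values())

    def reserve(self, pid, pages):
        """Reserve backing store for pid, or fail up front (like ENOMEM)."""
        if pages > self.free_pages():
            raise MemoryError(f"cannot reserve {pages} pages for pid {pid}")
        self.reserved[pid] = self.reserved.get(pid, 0) + pages

    def fork(self, parent_pid, child_pid):
        """The child may dirty every page it shares copy-on-write with the
        parent, so reserve the parent's full size again before succeeding."""
        self.reserve(child_pid, self.reserved[parent_pid])


ledger = StrictCommitLedger(swap_pages=1000)
ledger.reserve(pid=1, pages=400)
ledger.fork(parent_pid=1, child_pid=2)      # 800 of 1000 pages now reserved
try:
    ledger.fork(parent_pid=1, child_pid=3)  # would need 1200 pages in total
    third_fork_ok = True
except MemoryError:
    third_fork_ok = False                   # refused cleanly, nothing killed
```

In this model copy-on-write changes nothing about the bookkeeping: the child
is charged for the parent's whole size even if it never writes a page.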

Given that swap space is cheap, and that killing random processes is
obviously bad, it's not apparent to me why people think this is not
a good approach --- at least for high-reliability servers.  And Linux
would definitely like to think of itself as a server-grade OS.

 After that, where do you go?  Try to find a reasonable process to shoot
 in the head.  From what I heard, although I haven't kept current, a lot
 of work went into selecting a reasonable process, so there will be some
 determinism.

Considering the frequency with which we hear of database backends
getting shot in the head, I'd say those heuristics need lots of work
yet.  I'll take a non-heuristic solution for any system I have to
administer, thanks.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Alvaro Herrera

On Thu, Jun 12, 2003 at 09:18:33PM -0400, Tom Lane wrote:

 Given that swap space is cheap, and that killing random processes is
 obviously bad, it's not apparent to me why people think this is not
 a good approach --- at least for high-reliability servers.  And Linux
 would definitely like to think of itself as a server-grade OS.

Well, it was a toy OS when conceived, that's for sure.  But it's getting
better.

 Considering the frequency with which we hear of database backends
 getting shot in the head, I'd say those heuristics need lots of work
 yet.

Previous versions were said to attempt to kill init.  You have to admit
there has been some progress.

But then there's the problem of people running database servers on
misconfigured machines.  They should know better than not setting enough
swap space, IMHO anyway.

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
And a voice from the chaos spoke to me and said,
"Smile and be happy, it could be worse."
And I smiled. And I was happy.
And it got worse.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Bruce Momjian

Tom Lane wrote:
 Jeroen T. Vermeulen [EMAIL PROTECTED] writes:
  Given the right allocation proportions, this may mean that in the end the
  kernel has no way to handle a shortage gracefully by causing fork() or
  allocations to fail.
 
 Sure it does.  All you need is a conservative allocation policy: fork()
 fails if it cannot reserve enough swap space to guarantee that the new
 process could write over its entire address space.  Copy-on-write is
 an optimization that reduces physical RAM usage, not virtual address
 space or swap-space requirements.
 
 Given that swap space is cheap, and that killing random processes is
 obviously bad, it's not apparent to me why people think this is not
 a good approach --- at least for high-reliability servers.  And Linux
 would definitely like to think of itself as a server-grade OS.

BSD used to require full swap behind all RAM.  I am not sure if that was
changed in BSD 4.4 or in later BSD/OS releases, but it is no longer
true.  I think now it can use RAM or swap as reserved backing store for
fork page modifications.  However, when the system runs out of swap, it
hangs!

  After that, where do you go?  Try to find a reasonable process to shoot
  in the head.  From what I heard, although I haven't kept current, a lot
  of work went into selecting a reasonable process, so there will be some
  determinism.
 
 Considering the frequency with which we hear of database backends
 getting shot in the head, I'd say those heuristics need lots of work
 yet.  I'll take a non-heuristic solution for any system I have to
 administer, thanks.

You have to love that swap + 1/2 ram option --- when you need four
possible options, there is something wrong with your approach.  :-)

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 You have to love that swap + 1/2 ram option --- when you need four
 possible options, there is something wrong with your approach.  :-)

I'm still wondering what the no overcommit handling option does,
exactly.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Greg Stark

Alvaro Herrera [EMAIL PROTECTED] writes:

 On Thu, Jun 12, 2003 at 09:18:33PM -0400, Tom Lane wrote:
 
  Given that swap space is cheap, and that killing random processes is
  obviously bad, it's not apparent to me why people think this is not
  a good approach --- at least for high-reliability servers.  And Linux
  would definitely like to think of itself as a server-grade OS.

Consider the case of huge processes trying to fork/exec to run ls. It might
seem kind of strange to be getting "Out of memory" errors from your java or
database engine when there are hundreds of megs free on the machine...

I suspect this was less of an issue in the days before copy on write because
vfork was more widely used/implemented. I'm not sure linux even implements
vfork other than just as a wrapper around fork. Even BSD ditched it a while
back though I think I saw that NetBSD reimplemented it since then.

 But then there's the problem of people running database servers on
 misconfigured machines.  They should know better than not setting enough
 swap space, IMHO anyway.

Well, I've seen DBAs say "Since I don't want the database swapping anyways,
I'll make really sure it doesn't swap by just not giving it any swap space --
that's why we bought so much RAM in the first place." It's not obvious that
you need swap to back memory the machine doesn't even report as being in
use...

-- 
greg


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Bruce Momjian

Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  You have to love that swap + 1/2 ram option --- when you need four
  possible options, there is something wrong with your approach.  :-)
 
 I'm still wondering what the no overcommit handling option does,
 exactly.

I assume it does no kills, and allows you to commit until you run out of
swap and hang.  This might be the BSD 4.4 behavior, actually.

It is bad to hang the system, but if it reports swap failure, at least
the admin knows why it failed, rather than killing random processes.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Bruce Momjian

Greg Stark wrote:
 I suspect this was less of an issue in the days before copy on write because
 vfork was more widely used/implemented. I'm not sure linux even implements
 vfork other than just as a wrapper around fork. Even BSD ditched it a while
 back though I think I saw that NetBSD reimplemented it since then.
 
  But then there's the problem of people running database servers on
  misconfigured machines.  They should know better than not setting enough
  swap space, IMHO anyway.
 
 Well, I've seen DBAs say "Since I don't want the database swapping anyways,
 I'll make really sure it doesn't swap by just not giving it any swap space --
 that's why we bought so much RAM in the first place." It's not obvious that
 you need swap to back memory the machine doesn't even report as being in
 use...

I see no reason RAM can't be used as backing store for possible
copy-on-write use.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Greg Stark

Bruce Momjian [EMAIL PROTECTED] writes:

 I see no reason RAM can't be used as backing store for possible
 copy-on-write use.

Depends on the scenario. For a database like postgres it would work fairly
well since that RAM is still available for filesystem buffers. For Oracle it
would suck because it's not available for Oracle to allocate to use for its
own buffers. And for a web server with an architecture like Apache it would
suck because it would mean being restricted to a much lower number of
processes than the machine could really handle.

  I'm still wondering what the no overcommit handling option does,
  exactly.
 
 I assume it does no kills, and allows you to commit until you run out of
 swap and hang.  This might be the BSD 4.4 behavior, actually.

I think it just makes fork/mmap/sbrk return an error if you run out of swap.
That makes the error appear most likely as malloc() returning null which most
applications don't handle anyways and the user sees the same behaviour:
programs crashing randomly.
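
For contrast, a small sketch of what handling the failure looks like instead
of crashing. It is in Python, where a failed allocation surfaces as
MemoryError rather than a null pointer; the function names and the fake
failing allocator are hypothetical and exist only so the error path can be
exercised without actually exhausting memory.

```python
def allocate_or_report(nbytes, allocator=bytearray):
    """Try to allocate nbytes; return (buffer, None) or (None, message)."""
    try:
        return allocator(nbytes), None
    except MemoryError:
        # The caller can now back off, free caches, or refuse the request.
        return None, f"out of memory allocating {nbytes} bytes"


def failing_allocator(nbytes):
    # Stand-in for an allocation refused by a strict-accounting kernel.
    raise MemoryError


buf, err = allocate_or_report(4096)
no_buf, no_err = allocate_or_report(4096, allocator=failing_allocator)
```

The second call returns an error message the program can act on, which is
the behaviour a server would want from a kernel that fails allocations
instead of killing processes.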

Of course that's not what high availability server software does but since
most users' big memory consumers these days seem to be their window manager
and its 3d animated window decorations...

-- 
greg


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Ron Mayer


Jeroen T. Vermeulen wrote:

After that, where do you go?  Try to find a reasonable process to shoot
in the head.  From what I heard, although I haven't kept current, a lot
of work went into selecting a reasonable process, so there will be some
determinism.

FWIW, you can browse the logic linux uses to choose 
which process to kill here:
  http://lxr.linux.no/source/mm/oom_kill.c

If I read that right, this calculates points for each process, where:
   points = vm_size_of_process
            / sqrt(cpu_time_it_ran)
            / sqrt(sqrt(clock_time_it_had))
            * 2 if the process was niced
            / 4 if the process ran as root
            / 4 if the process had hardware access
and whichever process has the most points dies.

I'm guessing any database backend (postgres, oracle)
that wasn't part of a long-lived connection seems like 
an especially attractive target to this algorithm.  
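
To make that concrete, here is a rough sketch of the scoring as summarized
above. It follows this summary, not the kernel source: the function name,
the time units (plain seconds) and the guard against dividing by less than
one are assumptions, and the example numbers are made up.

```python
import math


def oom_points(vm_size, cpu_seconds, run_seconds,
               niced=False, root=False, hw_access=False):
    """Score a process as described above; the highest score gets killed."""
    points = float(vm_size)
    # Long-running, CPU-heavy processes are considered more valuable.
    points /= max(1.0, math.sqrt(cpu_seconds))
    points /= max(1.0, math.sqrt(math.sqrt(run_seconds)))
    if niced:
        points *= 2   # niced processes are assumed to matter less
    if root:
        points /= 4   # root processes are protected
    if hw_access:
        points /= 4   # so are processes with direct hardware access
    return points


# A freshly started backend versus an old root-owned daemon of the same size.
fresh_backend = oom_points(vm_size=100_000, cpu_seconds=4, run_seconds=16)
old_daemon = oom_points(vm_size=100_000, cpu_seconds=400,
                        run_seconds=160_000, root=True)
```

With these made-up inputs the fresh backend scores 25000.0 and the old
daemon 62.5, so the short-lived backend is by far the likelier victim.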

(Though hopefully it's all moot now that Andrew / Tom
 found/recommended the paranoid overcommit option, which
 sure seems like the most sane thing for a server to me)

   Ron

PS: Oracle DBAs suffer from the same pain. 
  http://www.cs.helsinki.fi/linux/linux-kernel/2001-12/0098.html
  http://www.ussg.iu.edu/hypermail/linux/kernel/0103.3/0094.html



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Alvaro Herrera

On Thu, Jun 12, 2003 at 07:22:14PM -0700, Ron Mayer wrote:

 FWIW, you can browse the logic linux uses to choose 
 which process to kill here:
   http://lxr.linux.no/source/mm/oom_kill.c

Hey, this LXR thing is cool.  It'd be nice to have one of those for
Postgres.

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
Nature, so fragile, so exposed to death... and so alive

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-11 Thread Bruce Momjian


We already pre-allocate all shared memory and resources on postmaster
start.

---

Hans-Jürgen Schönig wrote:
 There is a problem which occurs from time to time and which is a bit 
 nasty in business environments.
 When the shared memory is eaten up by some application such as Apache 
 PostgreSQL will refuse to do what it should do because there is no 
 memory around. To many people this looks like a problem related to 
 stability. Also, it influences availability of the database itself.
 
 I was thinking of a solution which might help to get around this problem:
 If we had a flag to tell PostgreSQL that XXX Megs of shared memory 
 should be preallocated by PostgreSQL. The database would then be sure that 
 there is always enough memory around. The problem is that PostgreSQL would 
 have to care more about memory consumption.
 
 Of course, the best solution is to put PostgreSQL on a separate machine 
 but many people don't do it so we have to live with memory leaks caused 
 by other software (we have just seen a nasty one in mod_perl).
 
 Does it make sense?
 
   Regards,
 
   Hans
 
 
 -- 
 Cybertec Geschwinde u Schoenig
 Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
 Tel: +43/2952/30706; +43/664/233 90 75
 www.cybertec.at, www.postgresql.at, kernel.cybertec.at
 
 
 
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-11 Thread Tom Lane

Hans-Jürgen Schönig [EMAIL PROTECTED] writes:
 I have two explanations for the following behaviour:
 a. a bug
 b. not enough shared memory

 WARNING:  Message from PostgreSQL backend:
   The Postmaster has informed me that some other backend
   died abnormally and possibly corrupted shared memory.

Is this a Linux machine?  If so, the true explanation is probably (c):
the kernel is kill 9'ing randomly-chosen database processes whenever
it starts to feel low on memory.  I would suggest checking the
postmaster log to determine the signal number the failed backends are
dying with.  The client-side message does not give nearly enough info
to debug such problems.
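
One way to do that check, sketched in Python. The log wording matched here
("was terminated by signal N") and the sample lines are assumptions for
illustration; adjust the pattern to whatever your postmaster actually
writes.

```python
import re

# Assumed log wording; not taken from this thread.
TERMINATED = re.compile(r"\(pid (\d+)\) was terminated by signal (\d+)")


def killed_backends(log_lines):
    """Return (pid, signal) pairs for backends that died from a signal."""
    hits = []
    for line in log_lines:
        match = TERMINATED.search(line)
        if match:
            hits.append((int(match.group(1)), int(match.group(2))))
    return hits


# Made-up sample lines standing in for a real postmaster log.
sample = [
    "LOG:  database system is ready",
    "LOG:  server process (pid 4242) was terminated by signal 9",
]
found = killed_backends(sample)
```

Any entry with signal 9 that nobody sent by hand points toward the
kernel-kill explanation.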

There is also possibility (d): you have some bad RAM that is located in
an address range that doesn't get used until the machine is under full
load.  But if the backends are dying with signal 9 then I'll take the
kernel-kill theory.

AFAIK the only good way around this problem is to use another OS with a
more rational design for handling low-memory situations.  No other Unix
does anything remotely as brain-dead as what Linux does.  Or bug your
favorite Linux kernel hacker to fix the kernel.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-11 Thread Bruce Momjian

Tom Lane wrote:
 Hans-Jürgen Schönig [EMAIL PROTECTED] writes:
  I have two explanations for the following behaviour:
  a. a bug
  b. not enough shared memory
 
  WARNING:  Message from PostgreSQL backend:
  The Postmaster has informed me that some other backend
  died abnormally and possibly corrupted shared memory.
 
 Is this a Linux machine?  If so, the true explanation is probably (c):
 the kernel is kill 9'ing randomly-chosen database processes whenever
 it starts to feel low on memory.  I would suggest checking the
 postmaster log to determine the signal number the failed backends are
 dying with.  The client-side message does not give nearly enough info
 to debug such problems.
 
 There is also possibility (d): you have some bad RAM that is located in
 an address range that doesn't get used until the machine is under full
 load.  But if the backends are dying with signal 9 then I'll take the
 kernel-kill theory.
 
 AFAIK the only good way around this problem is to use another OS with a
 more rational design for handling low-memory situations.  No other Unix
 does anything remotely as brain-dead as what Linux does.  Or bug your
 favorite Linux kernel hacker to fix the kernel.

Is there no sysctl way to disable such kills?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-11 Thread Doug McNaught

Bruce Momjian [EMAIL PROTECTED] writes:

 Tom Lane wrote:
  AFAIK the only good way around this problem is to use another OS with a
  more rational design for handling low-memory situations.  No other Unix
  does anything remotely as brain-dead as what Linux does.  Or bug your
  favorite Linux kernel hacker to fix the kernel.
 
 Is there no sysctl way to disable such kills?

The -ac kernel patches from Alan Cox have a sysctl to control memory
overcommit--you can set it to track memory usage and fail allocations
when memory runs out, rather than the random kill behavior.  I'm not
sure whether those have made it into the stock kernel yet, but the
vendor kernels (such as Red Hat's) might have it too.
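
A quick way to see whether a given kernel exposes the setting at all,
sketched in Python. The /proc path is the usual location of the
vm.overcommit_memory sysctl; what each numeric value means differs between
kernel versions and patch sets, so the sketch only reports the raw number.

```python
from pathlib import Path


def read_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Return the kernel's overcommit setting as an int, or None if the
    sysctl is missing (non-Linux system or a kernel without the option)."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


mode = read_overcommit_mode()
print("vm.overcommit_memory =", mode)
```

A result of None means the knob is not there, in which case the only
options are a patched kernel or enough swap to stay out of trouble.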

-Doug

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-11 Thread Alvaro Herrera

On Wed, Jun 11, 2003 at 07:35:20PM -0400, Doug McNaught wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
 
  Is there no sysctl way to disable such kills?
 
 The -ac kernel patches from Alan Cox have a sysctl to control memory
 overcommit--you can set it to track memory usage and fail allocations
 when memory runs out, rather than the random kill behavior.  I'm not
 sure whether those have made it into the stock kernel yet, but the
 vendor kernels (such as Red Hat's) might have it too.

Yeah, I see it in the Mandrake kernel.  But it's not in stock 2.4.19, so
you can't assume everybody has it.

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
What do the years matter?  What really matters is realizing that,
in the end, the best age in life is being alive  (Mafalda)

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org
