Re: VM: dynamic swap remapping (patch)

2001-10-01 Thread Vladimir Dozen

ehlo.

 Well Joe seems to have provided a pretty interesting document on
 how it works in AIX, but I was wondering if they do anything wrt
 low/high watermarks like my idea.
 
 Basically you'd like to inform processes that the danger has been
 alleviated so that they can cautiously start accepting more work
 rather than freaking out and shutting out clients forever...


   Actually, most applications assume that everything is OK unless
   something tells them it is not. Regular OOM protection may be built
   as:

   void on_sigdanger(int)
   {
     throw std::runtime_error("out of memory");
   }
   ...

   while( there_are_more_requests )
   {
     try
     {
       do_some_work_eating_lot_of_memory();
     }
     catch(const std::exception& ex)
     {
       cerr << ex.what() << endl;
     }
   }

   I.e., we will keep executing user requests while we have them
   in our queue, but a request will fail with an exception if the
   system is out of memory. As soon as the system gets enough free space
   again, we will continue normal processing without any special handling
   on our side.

   It means that a signal opposite to SIGDANGER is rarely required, if it is
   required at all. You should be glad, it reduces the work to do. ;)

P.S. I know that throwing inside a signal handler is bad technique, but it works
  (and works better than setting a flag and testing it everywhere).
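
For comparison, here is a minimal sketch of the flag-based alternative the
P.S. mentions. It is not code from the patch. SIGDANGER does not exist on most
systems, so SIGUSR1 stands in for it here, and the function names are
hypothetical.

```cpp
#include <cassert>
#include <signal.h>

// Assumption: SIGDANGER is not defined on this system, so SIGUSR1 stands in.
#ifndef SIGDANGER
#define SIGDANGER SIGUSR1
#endif

// The only thing the handler touches: a flag that is safe to write
// from signal context.
static volatile sig_atomic_t g_low_memory = 0;

extern "C" void on_sigdanger(int) { g_low_memory = 1; }

// Hypothetical helper: install the handler, report success.
bool install_sigdanger_handler() {
    return signal(SIGDANGER, on_sigdanger) != SIG_ERR;
}

// Hypothetical helper: true if a warning arrived since the last call.
// A warning that lands between the test and the reset is folded into
// the one being reported.
bool low_memory_seen() {
    if (!g_low_memory) return false;
    g_low_memory = 0;
    return true;
}
```

The request loop would call low_memory_seen() at points where it is safe to
abandon work and throw from there, in normal context, instead of throwing from
the handler itself.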

dozen







To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message



Re: VM: dynamic swap remapping (patch)

2001-10-01 Thread Karsten W. Rohrbach

i got a (way old) ppc 604e, in the corner of my office.
it's a 74p, latest 4.3.3 patchlevel from one month ago or so installed.

i could arrange ssh access to the box if somebody cares, although i am
not available 24x7 for remote hands ;-)
there's nothing critical on it, the box got 128mb ram, so contact me 
off-list if you want to play around with it.

/k

Greg Lehey([EMAIL PROTECTED])@2001.10.01 13:19:51 +:
 On Sunday, 30 September 2001 at 14:55:58 -0500, Alfred Perlstein wrote:
  * Jos Backus [EMAIL PROTECTED] [010930 14:35] wrote:
  On Sun, Sep 30, 2001 at 02:23:26PM -0500, Alfred Perlstein wrote:
  * Jos Backus [EMAIL PROTECTED] [010930 12:55] wrote:
  AIX has SIGDANGER.
 
  Anyone care to tell me how it works in AIX?  If the interface is
  nice, cloning it would be kind of cool.
 
  I don't currently have access to an AIX system, but
 
  
http://as400bks.rochester.ibm.com/doc_link/en_US/a_doc_lib/aixbman/admnconc/pag_space_under.htm
 
  has some (useful) info.
 
  It sure does!
 
  I think I'm going to make a proposal on -arch about this, to be
  perfectly honest, AIX has a good implementation, I haven't read it
  all yet, but it doesn't look like it gives the applications a
  notification when the danger is gone, we'll have to figure that out,
  or I'll have to read more into this.
 
 If it's any help, I have an AIX box here.  It belongs to IBM, so I
 have to respect security issues, but I'll do what I can.
 
 Greg
 --
 See complete headers for address and phone numbers
 

-- 
 Gravity is an unforgiving motherfucker.
KR433/KR11-RIPE -- WebMonster Community Founder -- nGENn GmbH Senior Techie
http://www.webmonster.de/ -- ftp://ftp.webmonster.de/ -- http://www.ngenn.net/
karstenrohrbach.de -- alphangenn.net -- alphascene.org -- [EMAIL PROTECTED]
GnuPG 0x2964BF46 2001-03-15 42F9 9FFF 50D4 2F38 DBEE  DF22 3340 4F4E 2964 BF46
Please do not remove my address from To: and Cc: fields in mailing lists. 10x



Re: VM: dynamic swap remapping (patch)

2001-09-30 Thread Matt Dillon


:  Second, application not always grows to 1G, most of the time it keeps
:  as small as 500M ;). Why should we precommit 1G for 500M data? Doing
:  multi-mmap memory management is additional pain.

Why not?  Disk space is cheap.  For a problem like this I would simply
throw in two 30G+ hard drives and partition them with 16G of swap each,
giving me 32G of swap for the machine.  If you needed to do it cheaply
you could even use IDE, though personally I would use SCSI for 
reliability.  Depending on the amount of real memory in the machine
you might have to tweak a few kernel options (like matching NSWAP to
the actual number of swap devices), but basically it should just work.

Even using file-backed memory is fairly trivial.  You don't need to
do multi-mmap memory management or do any kernel tweaking.  Just
reserve 1G and use a single mmap() and file per process.

-Matt





Re: VM: dynamic swap remapping (patch)

2001-09-30 Thread Vladimir Dozen

ehlo.

 My suggestion, (but not my final say, i'm still open to ideas):
 
Implement a memory status signal to notify processes of changes
in the relative amount of system memory.
 
When memory reaches a low or high watermark, the signal is
broadcast to all running processes.
 
The default disposition will be to ignore the signal.
 
The signal will be named SIGMEMINFO.  (SIGXfoo means
'process has exceeded resource foo')

  Agreed. As for SIG_IGN, can anyone tell me -- can I force an
  existing application to use my signal handler? For example,
  by preloading some shared library? If so, I see no arguments
  against ignoring the signal by default.

The signal will pass via the siginfo struct information
such that the process can determine if the system has
just exceeded the low watermark (danger) or has reclaimed
down to the high watermark (enough free memory).

  Passing more info is always better. Agreed.
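
To illustrate how a process could read such information from siginfo, here is
a hedged sketch using the standard POSIX SA_SIGINFO mechanism. The proposal
does not define an encoding, so the MEM_LOW / MEM_OK values are invented for
illustration. SIGUSR1 stands in for SIGMEMINFO, and sigqueue() plays the role
of the kernel.

```cpp
#include <cassert>
#include <signal.h>
#include <unistd.h>

// Hypothetical payload values; nothing in the thread fixes an encoding.
enum MemEvent { MEM_NONE = 0, MEM_LOW = 1, MEM_OK = 2 };

static volatile sig_atomic_t g_last_event = MEM_NONE;

// Three-argument handler: the sender's payload arrives in si_value.
extern "C" void on_meminfo(int, siginfo_t* info, void*) {
    g_last_event = info->si_value.sival_int;
}

// Assumption: SIGUSR1 stands in for the proposed SIGMEMINFO.
bool install_meminfo_handler() {
    struct sigaction sa = {};
    sa.sa_sigaction = on_meminfo;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, nullptr) == 0;
}

// Simulates the kernel side by queueing the signal to ourselves.
bool simulate_meminfo(MemEvent ev) {
    union sigval v;
    v.sival_int = ev;
    return sigqueue(getpid(), SIGUSR1, v) == 0;
}

int last_mem_event() { return g_last_event; }
```

A real implementation would more likely use si_code or a dedicated field, but
the receiving side would look much the same.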

 a) over allocate swap a bit and set the low watermark carefully.
 b) do the following enhancement:
 
  Provide a system whereby you can swap to the filesystem without
  additional upcalls/syscalls from userspace, basically, provide
  some means of paging to the filesystem automatically.
 
   then, set your lowwater mark to the size of your swap partition,
   now your system will alert your processes and automatically swap
   _anyone_ to the filesystem.
 
 I really think that this would be more flexible and still allow
 you to achieve what you want... What do you think?

  I can't say anything until I get the details. Sorry, English is neither
  my native language nor one I use often, so I may easily miss important
  details, but here are my random comments:
  
  Initially, I was trying the same (I think) approach, but there were
  some problems. Some kernel functions refused to work with VM objects
  of processes other than curproc. I.e., it could be hard to work
  with bigproc inside the swap daemon, and the swap daemon is the only place
  where we can detect the OOM condition; that's why I used a signal to transfer
  control to user space, and then back into the kernel -- already in another
  process. Another reason to do it: to make all limits and quotas work
  automatically. Also, I did not want to keep the swap daemon busy too long.
  
  Also, what does "over allocate swap a bit" mean? How do we compute the value
  of that bit? At what moment should we preallocate? Should we repeat
  preallocation after getting SIGMEMINFO (himark)?

  Also, you cannot set the low mark to the size of the swap partition. To create
  file-based swap you need some memory (file operations require it).
  So, the low mark should be a bit lower (that's why I raised the value of
  nswap_lowat).

  Finally, if you want to over allocate swap for every process in the
  system, the whole swap can wind up consisting of only preallocations.
  Resource management is the role of the kernel. Any hard reservation
  interferes with that.

-- 
dozen @ home




Re: VM: dynamic swap remapping (patch)

2001-09-30 Thread Alfred Perlstein

* Matt Dillon [EMAIL PROTECTED] [010930 02:53] wrote:
 
 :  Second, application not always grows to 1G, most of the time it keeps
 :  as small as 500M ;). Why should we precommit 1G for 500M data? Doing
 :  multi-mmap memory management is additional pain.
 
 Why not?  Disk space is cheap.  For a problem like this I would simply
 throw in two 30G+ hard drives and partition them with 16G of swap each,
 giving me 32G of swap for the machine.  If you needed to do it cheaply
 you could even use IDE, though personally I would use SCSI for 
 reliability.  Depending on the amount of real memory in the machine
 you might have to tweak a few kernel options (like matching NSWAP to
 the actual number of swap devices), but basically it should just work.
 
 Even using file-backed memory is fairly trivial.  You don't need to
 do multi-mmap memory management or do any kernel tweaking.  Just
 reserve 1G and use a single mmap() and file per process.

What he needs is a system to inform him that things aren't looking
so good.  Check my email for what I think is a pretty good solution.

-- 
-Alfred Perlstein [[EMAIL PROTECTED]]
'Instead of asking why a piece of software is using 1970s technology,
start asking why software is ignoring 30 years of accumulated wisdom.'




Re: VM: dynamic swap remapping (patch)

2001-09-30 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], Matt Dillon writes:
:  Second, application not always grows to 1G, most of the time it keeps
:  as small as 500M ;). Why should we precommit 1G for 500M data? Doing
:  multi-mmap memory management is additional pain.

Even using file-backed memory is fairly trivial.  You don't need to
do multi-mmap memory management or do any kernel tweaking.  Just
reserve 1G and use a single mmap() and file per process.

I once had a patch to phkmalloc() which backed all malloc'ed VM with
hidden files in the user's homedir.  It was written to put the VM
usage under QUOTA control, but it had many useful side effects as well.

I can't seem to find it right now, but it is trivial to do: just
replace the sbrk(2) with mmap().  The only downside is the needed
file descriptor, which some shells don't like.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.




Re: VM: dynamic swap remapping (patch)

2001-09-30 Thread Rik van Riel

On Sat, 29 Sep 2001, Alfred Perlstein wrote:
 * Vladimir Dozen [EMAIL PROTECTED] [010929 14:38] wrote:

  P.S. Anyway, I do NOT insist my solution is better, and even that it
   is good for anything at all. It was fun for me to hack in BSD kernel,
   and it was interesting challenge, and I feel need to share results
   with others. At worst, I will recommend our customer to setup
   processing farm under FreeBSD with applied patch.

 I'm really impressed with the work you put into this, but it seems
 that you've tried to tackle two problems at the same time,

Indeed, the whole idea of swapping tasks to the filesystem
is nice, but having the task do this all by itself isn't a
good option for many people...

 My suggestion, (but not my final say, i'm still open to ideas):

Implement a memory status signal to notify processes of changes
in the relative amount of system memory.

When memory reaches a low or high watermark, the signal is
broadcast to all running processes.

The default disposition will be to ignore the signal.

The signal will be named SIGMEMINFO.  (SIGXfoo means
'process has exceeded resource foo')

That'd be SIGDANGER, right ?

 b) do the following enhancement:

  Provide a system whereby you can swap to the filesystem without
  additional upcalls/syscalls from userspace, basically, provide
  some means of paging to the filesystem automatically.

Sounds like a winner, when swap runs out a process gets
suspended onto the filesystem automatically and SIGDANGER
is sent out to give others a chance to clean themselves
up.

If enough space is freed, the suspended process can get
back into the system.

This should also preserve leaky applications while at the
same time leaving the system intact...

regards,

Rik
-- 
IA64: a worthy successor to i860.

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)






Re: VM: dynamic swap remapping (patch)

2001-09-30 Thread Vladimir Dozen

ehlo.

 :  Second, application not always grows to 1G, most of the time it keeps
 :  as small as 500M ;). Why should we precommit 1G for 500M data? Doing
 :  multi-mmap memory management is additional pain.
 
 Why not?  Disk space is cheap.

  Developer time is expensive. Someone already wrote good allocation
  routines, and they are inside libc. Reinventing the bicycle in every
  new large-scale application doesn't sound good to me.

 For a problem like this I would simply
 throw in two 30G+ hard drives and partition them with 16G of swap each,
 giving me 32G of swap for the machine.

  As was said here before, there are actually two problems: notification
  (avoiding silent kills) and getting more paging space. The second can
  be solved by adding swap space. The first cannot. As a developer, I'm
  more interested in the first. The current solution with killproc() is not
  acceptable.
  
  Just imagine OS documentation which says: "the OS may
  terminate a process at any point with no warning or notification". Would
  you like to use it? But this is exactly what FreeBSD does at OOM.

 Even using file-backed memory is fairly trivial.  You don't need to
 do multi-mmap memory management or do any kernel tweaking.  Just
 reserve 1G and use a single mmap() and file per process.

  As I already said, it is not trivial. It involves writing/adopting
  some allocation stuff. It means time and human resources, i.e. money.

-- 
dozen @ home




Re: VM: dynamic swap remapping (patch)

2001-09-30 Thread Alfred Perlstein

* Rik van Riel [EMAIL PROTECTED] [010930 04:12] wrote:
 On Sat, 29 Sep 2001, Alfred Perlstein wrote:
  * Vladimir Dozen [EMAIL PROTECTED] [010929 14:38] wrote:
 
   P.S. Anyway, I do NOT insist my solution is better, and even that it
is good for anything at all. It was fun for me to hack in BSD kernel,
and it was interesting challenge, and I feel need to share results
with others. At worst, I will recommend our customer to setup
processing farm under FreeBSD with applied patch.
 
  I'm really impressed with the work you put into this, but it seems
  that you've tried to tackle two problems at the same time,
 
 Indeed, the whole idea of swapping tasks to the filesystem
 is nice, but having the task do this all by itself isn't a
 good option for many people...
 
  My suggestion, (but not my final say, i'm still open to ideas):
 
 Implement a memory status signal to notify processes of changes
 in the relative amount of system memory.
 
 When memory reaches a low or high watermark, the signal is
 broadcast to all running processes.
 
 The default disposition will be to ignore the signal.
 
 The signal will be named SIGMEMINFO.  (SIGXfoo means
 'process has exceeded resource foo')
 
 That'd be SIGDANGER, right ?

Sort of.

 
  b) do the following enhancement:
 
   Provide a system whereby you can swap to the filesystem without
   additional upcalls/syscalls from userspace, basically, provide
   some means of paging to the filesystem automatically.
 
 Sounds like a winner, when swap runs out a process gets
 suspended onto the filesystem automatically and SIGDANGER
 is sent out to give others a chance to clean themselves
 up.

Well, no, the idea is to have a low and high watermark so that
flip-flopping on the boundary doesn't generate a lot of signals.

SIGDANGER is ok for a name, but slightly misleading because
I wanted to piggyback some info in the siginfo to tell processes
when the danger has passed.  Well ok, the name is ok, but
I do want an upcall when the situation is alleviated.
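
The two-watermark idea can be sketched as a small state machine. This is an
illustration only, not kernel code. It assumes the thresholds are expressed as
free swap in arbitrary units, with "danger" below the low mark and "all clear"
above the high mark, and the class and enum names are made up.

```cpp
#include <cassert>
#include <cstddef>

enum class Notice { None, Danger, AllClear };

// Hypothetical hysteresis tracker: one notice per crossing, and nothing
// while the value wanders between the two marks.
class WatermarkTracker {
public:
    WatermarkTracker(std::size_t low, std::size_t high)
        : low_(low), high_(high), in_danger_(false) {
        assert(low < high);
    }

    // Feed the current amount of free space; returns the notice to send.
    Notice update(std::size_t free_units) {
        if (!in_danger_ && free_units < low_) {
            in_danger_ = true;
            return Notice::Danger;
        }
        if (in_danger_ && free_units > high_) {
            in_danger_ = false;
            return Notice::AllClear;
        }
        return Notice::None;
    }

private:
    std::size_t low_;
    std::size_t high_;
    bool in_danger_;
};
```

With marks at 100 and 200, a sequence of 90, 110, 95 produces a single Danger
notice, and only a value above 200 produces AllClear.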

Let me also state that it may be wise to add heuristics to the
system to not SIGDANGER anything that is completely swapped
out or hasn't run in a long time; this would avoid a spike
in thrashing at the time of the broadcast.

 If enough space is freed, the suspended process can get
 back into the system.
 
 This should also preserve leaky applications while at the
 same time leaving the system intact...

Hopefully, having a SIGDANGER handler may also be an indication
to the kernel to give you a second chance before shooting at
you.  I know it could be used to subvert behavior to have another
naive program killed; however, that could be a tunable to give
those trying to do the right thing a second chance.

-- 
-Alfred Perlstein [[EMAIL PROTECTED]]
'Instead of asking why a piece of software is using 1970s technology,
start asking why software is ignoring 30 years of accumulated wisdom.'




Re: VM: dynamic swap remapping (patch)

2001-09-30 Thread Vladimir Dozen

ehlo.

 You're still thinking of the combined solution, just think of a
 system where all you have right now is the signals I mentioned.

  Yah, now I think I got it. Well, actually, signal(s) is all
  I need. The remapping was just a bonus. To be more precise,
  I need only one signal -- when the low mark is passed. Some other
  application might be interested in the second -- hi mark --
  signal, but mine isn't.

  SIGDANGER is the signal from Irix, AFAIR?

  So, how about accepting this name (just to not increase the entropy
  of the Universe) and sending it to all processes when nswap_lowat
  is reached?

  The only point -- I prefer to have the ability to set nswap_lowat
  via sysctl since I cannot predict what amount of memory can
  be consumed while freeing memory ;) (e.g., throwing an exception
  in C++ may eat memory due to creating the exception object; logging
  may eat memory also).

 Just think what happens if your filesystems are full and you run
 out of swap...

  The same that happens today -- killproc() will kill me.
  The situation doesn't become worse with remapping, it just
  ... mmm... prolongs.

-- 
dozen @ home




Re: VM: dynamic swap remapping (patch)

2001-09-30 Thread Alfred Perlstein

* Vladimir Dozen [EMAIL PROTECTED] [010930 04:41] wrote:
 ehlo.
 
  You're still thinking of the combined solution, just think of a
  system where all you have right now is the signals I mentioned.
 
   Yah, now I think I got it. Well, actually, signal(s) is all
   I need. The remapping was just a bonus. To be more precise,
   I need the only signal -- at low mark passed. Some other
   application might be interested in second -- hi mark --
   signal, but my doesn't. 
 
   SIGDANGER is the signal from Irix, AFAIR?
 
   So, how about to accept this name (just to not increase entropy
   of the Universe) and send it to all processes when nswap_lowat
   reached?
 
   The only point -- I prefer to have ability to set nswap_lowat
   via sysctl since I cannot predict what amount of memory can
   be consumed while freeing memory ;) (e.g., throwing exception
   in C++ may eat memory due to creating exception object; logging
   may eat memory also).

You want to submit a patch?  If not I can take a look at it,
but it's been a bit since I've looked at the vm system.

 
  Just think what happens if your filesystems are full and you run
  out of swap...
 
   The same that happens today -- killproc() will kill me.
   The situation doesn't becomes worse with remapping, it just
   ... mmm... prolonges.
 
 -- 
 dozen @ home

-- 
-Alfred Perlstein [[EMAIL PROTECTED]]
'Instead of asking why a piece of software is using 1970s technology,
start asking why software is ignoring 30 years of accumulated wisdom.'




Re: VM: dynamic swap remapping (patch)

2001-09-30 Thread Vladimir Dozen

ehlo.

 You want to submit a patch?  If not I can take a look at it,
 but it's been a bit since I've looked at the vm system.

  Except for sysctl, the patch is quite simple due to the fact
  that hysteresis is already implemented in swap_pager.c, something
  like:


diff vm/swap_pager.c vm.new/swap_pager.c
217a218,219
> struct proc* p;
> 
218a221,225
> /* warn all processes */
> for( p = allproc.lh_first; p != 0; p = p->p_list.le_next ) 
> {
>   psignal(p,SIGDANGER);
> }


diff sys/signal.h sys.new/signal.h
105a106,109
> #ifndef _POSIX_SOURCE
> #define SIGDANGER   32  /* close to out-of-memory */
> #endif
> 


diff kern/kern_sig.c kern.new/kern_sig.c
165a166
> SA_IGNORE   /* SIGDANGER */


-- 
dozen @ home




Re: VM: dynamic swap remapping (patch)

2001-09-30 Thread Vladimir Dozen

ehlo.

 
 diff vm/swap_pager.c vm.new/swap_pager.c
 217a218,219
  struct proc* p;

 218a221,225
  /* warn all processes */
  for( p = allproc.lh_first; p != 0; p = p->p_list.le_next ) 
  {
psignal(p,SIGDANGER);
  }
 

  Oops, it doesn't work. All processes died. Why?
  Should something be changed in libc?

-- 
dozen @ home




Re: VM: dynamic swap remapping (patch)

2001-09-30 Thread Alfred Perlstein

* Vladimir Dozen [EMAIL PROTECTED] [010930 06:16] wrote:
 ehlo.
 
  
  diff vm/swap_pager.c vm.new/swap_pager.c
  217a218,219
   struct proc* p;
 
  218a221,225
   /* warn all processes */
   for( p = allproc.lh_first; p != 0; p = p->p_list.le_next ) 
   {
 psignal(p,SIGDANGER);
   }
  
 
   Oops, it doesn't work. All processes died. Why?
   Something should be changed in libc?

I'll take a look at implementing it sometime this week.

I want to do the siginfo thing if possible.


-- 
-Alfred Perlstein [[EMAIL PROTECTED]]
'Instead of asking why a piece of software is using 1970s technology,
start asking why software is ignoring 30 years of accumulated wisdom.'




Re: VM: dynamic swap remapping (patch)

2001-09-30 Thread Jos Backus

On Sun, Sep 30, 2001 at 01:44:37PM +, Vladimir Dozen wrote:
   SIGDANGER is the signal from Irix, AFAIR?

AIX has SIGDANGER.

-- 
Jos Backus _/  _/_/_/Santa Clara, CA
  _/  _/   _/
 _/  _/_/_/ 
_/  _/  _/_/
[EMAIL PROTECTED] _/_/   _/_/_/use Std::Disclaimer;




Re: VM: dynamic swap remapping (patch)

2001-09-30 Thread Jos Backus

On Sun, Sep 30, 2001 at 02:23:26PM -0500, Alfred Perlstein wrote:
 * Jos Backus [EMAIL PROTECTED] [010930 12:55] wrote:
  AIX has SIGDANGER.
 
 Anyone care to tell me how it works in AIX?  If the interface is
 nice, cloning it would be kind of cool.

I don't currently have access to an AIX system, but

http://as400bks.rochester.ibm.com/doc_link/en_US/a_doc_lib/aixbman/admnconc/pag_space_under.htm

has some (useful) info.

-- 
Jos Backus _/  _/_/_/Santa Clara, CA
  _/  _/   _/
 _/  _/_/_/ 
_/  _/  _/_/
[EMAIL PROTECTED] _/_/   _/_/_/use Std::Disclaimer;




Re: VM: dynamic swap remapping (patch)

2001-09-30 Thread Alfred Perlstein

* Jos Backus [EMAIL PROTECTED] [010930 14:35] wrote:
 On Sun, Sep 30, 2001 at 02:23:26PM -0500, Alfred Perlstein wrote:
  * Jos Backus [EMAIL PROTECTED] [010930 12:55] wrote:
   AIX has SIGDANGER.
  
  Anyone care to tell me how it works in AIX?  If the interface is
  nice, cloning it would be kind of cool.
 
 I don't currently have access to an AIX system, but
 
 
http://as400bks.rochester.ibm.com/doc_link/en_US/a_doc_lib/aixbman/admnconc/pag_space_under.htm
 
 has some (useful) info.

It sure does!

I think I'm going to make a proposal on -arch about this.  To be perfectly
honest, AIX has a good implementation.  I haven't read it all yet, but
it doesn't look like it gives the applications a notification when
the danger is gone; we'll have to figure that out, or I'll have to
read more into this.

-- 
-Alfred Perlstein [[EMAIL PROTECTED]]
'Instead of asking why a piece of software is using 1970s technology,
start asking why software is ignoring 30 years of accumulated wisdom.'




Re: VM: dynamic swap remapping (patch)

2001-09-30 Thread Matt Dillon


:
:In message [EMAIL PROTECTED], Matt Dillon writes:
::  Second, application not always grows to 1G, most of the time it keeps
::  as small as 500M ;). Why should we precommit 1G for 500M data? Doing
::  multi-mmap memory management is additional pain.
:
:Even using file-backed memory is fairly trivial.  You don't need to
:do multi-mmap memory management or do any kernel tweaking.  Just
:reserve 1G and use a single mmap() and file per process.
:
:I once had a patch to phkmalloc() which backed all malloc'ed VM with
:hidden files in the users homedir.  It was written to put the VM
:usage under QUOTA control, but it had many useful side effects as well.
:
:I can't seem to find it right now, but it is trivial to do: just
:replace the sbrk(2) with mmap().  Only downside is the needed 
:filedescriptor which some shells don't like.
:
:-- 
:Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
:[EMAIL PROTECTED] | TCP/IP since RFC 956

I think the file descriptor problem can be solved easily... simply
open the file, mmap() the entire 1G segment for this special application,
and then close() the file.  Then have sbrk() just eat out of the mapped 
segment.  Alternatively sbrk() could open/mmap/close in large 1MB or 4MB
segments, again leaving no file descriptors dangling.
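
A rough user-space sketch of the open/mmap/close idea, for illustration only.
It is not phkmalloc code, and the FileArena class and its grow() method are
hypothetical names. ftruncate() is used to size the file purely to keep the
sketch short. How the backing file should be extended is a separate question.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical bump allocator over a single file-backed mapping.
class FileArena {
public:
    FileArena(const char* path, std::size_t size)
        : base_(MAP_FAILED), size_(size), used_(0) {
        assert(size > 0);
        int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (fd < 0) return;
        if (ftruncate(fd, static_cast<off_t>(size)) == 0) {
            // MAP_SHARED so dirty pages go back to the file, not to swap.
            base_ = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        }
        close(fd);  // the mapping stays valid after the descriptor is closed
    }

    ~FileArena() {
        if (base_ != MAP_FAILED) munmap(base_, size_);
    }

    FileArena(const FileArena&) = delete;
    FileArena& operator=(const FileArena&) = delete;

    bool ok() const { return base_ != MAP_FAILED; }

    // sbrk-like: hand out the next n bytes (rounded up to 16), or
    // nullptr when the segment is exhausted.
    void* grow(std::size_t n) {
        if (!ok()) return nullptr;
        std::size_t rounded = (n + 15) & ~static_cast<std::size_t>(15);
        if (rounded < n || rounded > size_ - used_) return nullptr;
        void* p = static_cast<char*>(base_) + used_;
        used_ += rounded;
        return p;
    }

private:
    void* base_;
    std::size_t size_;
    std::size_t used_;
};
```

Since the descriptor is closed right after mmap(), nothing is left dangling for
a shell or any other program to trip over.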

-Matt





Re: VM: dynamic swap remapping (patch)

2001-09-30 Thread Matt Dillon

: :Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
: :[EMAIL PROTECTED] | TCP/IP since RFC 956
: 
: I think the file descriptor problem can be solved easily... simply
: open the file, mmap() the entire 1G segment for this special application,
: and then close() the file.  Then have sbrk() just eats out of the mapped 
: segment.  Alternatively sbrk() could open/mmap/close in large 1MB or 4MB
: segments, again leaving no file descriptors dangling.
:
:Won't that cause fragmentation?  You're forgetting the need to 
:ftruncate or pre-zero the file unless that's been fixed.
:
:-- 
:-Alfred Perlstein [[EMAIL PROTECTED]]

You have to pre-zero the file.   You can do it in reasonably-sized
chunks (like 4M) without causing fragmentation.  You *CANNOT* use 
ftruncate() to extend the file - that will virtually guarantee massive
fragmentation.
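
Pre-zeroing in chunks might look roughly like the sketch below. The function
name and its parameters are hypothetical, and whether this really avoids
fragmentation depends on the filesystem. The code only shows the mechanics of
writing zeros in fixed-size pieces.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: append zeros to fd in chunk-sized writes until
// `total` bytes have been written. Assumes fd is positioned at the start
// of an empty file. Returns true on success.
bool prezero_file(int fd, std::size_t total, std::size_t chunk = 4u << 20) {
    if (chunk == 0) return false;
    std::vector<char> zeros(chunk < total ? chunk : total, 0);
    std::size_t done = 0;
    while (done < total) {
        std::size_t left = total - done;
        std::size_t want = left < zeros.size() ? left : zeros.size();
        ssize_t n = write(fd, zeros.data(), want);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted, retry
            return false;                  // e.g. disk full or quota hit
        }
        if (n == 0) return false;
        done += static_cast<std::size_t>(n);
    }
    return true;
}
```

A failed write here is also the point where a full filesystem or an exceeded
quota shows up, before the memory is ever touched.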

-Matt





Re: VM: dynamic swap remapping (patch)

2001-09-30 Thread Warner Losh

In message [EMAIL PROTECTED] Vladimir Dozen writes:
:   SIGDANGER is the signal from Irix, AFAIR?

AIX.

Warner




Re: VM: dynamic swap remapping (patch)

2001-09-30 Thread Greg Lehey

On Sunday, 30 September 2001 at 14:55:58 -0500, Alfred Perlstein wrote:
 * Jos Backus [EMAIL PROTECTED] [010930 14:35] wrote:
 On Sun, Sep 30, 2001 at 02:23:26PM -0500, Alfred Perlstein wrote:
 * Jos Backus [EMAIL PROTECTED] [010930 12:55] wrote:
 AIX has SIGDANGER.

 Anyone care to tell me how it works in AIX?  If the interface is
 nice, cloning it would be kind of cool.

 I don't currently have access to an AIX system, but

 
http://as400bks.rochester.ibm.com/doc_link/en_US/a_doc_lib/aixbman/admnconc/pag_space_under.htm

 has some (useful) info.

 It sure does!

 I think I'm going to make a proposal on -arch about this, to be
 perfectly honest, AIX has a good implementation, I haven't read it
 all yet, but it doesn't look like it gives the applications a
 notification when the danger is gone, we'll have to figure that out,
 or I'll have to read more into this.

If it's any help, I have an AIX box here.  It belongs to IBM, so I
have to respect security issues, but I'll do what I can.

Greg
--
See complete headers for address and phone numbers




Re: VM: dynamic swap remapping (patch)

2001-09-30 Thread Alfred Perlstein

* Greg Lehey [EMAIL PROTECTED] [010930 22:49] wrote:
 On Sunday, 30 September 2001 at 14:55:58 -0500, Alfred Perlstein wrote:
  * Jos Backus [EMAIL PROTECTED] [010930 14:35] wrote:
  On Sun, Sep 30, 2001 at 02:23:26PM -0500, Alfred Perlstein wrote:
  * Jos Backus [EMAIL PROTECTED] [010930 12:55] wrote:
  AIX has SIGDANGER.
 
  Anyone care to tell me how it works in AIX?  If the interface is
  nice, cloning it would be kind of cool.
 
  I don't currently have access to an AIX system, but
 
  
http://as400bks.rochester.ibm.com/doc_link/en_US/a_doc_lib/aixbman/admnconc/pag_space_under.htm
 
  has some (useful) info.
 
  It sure does!
 
  I think I'm going to make a proposal on -arch about this, to be
  perfectly honest, AIX has a good implementation, I haven't read it
  all yet, but it doesn't look like it gives the applications a
  notification when the danger is gone, we'll have to figure that out,
  or I'll have to read more into this.
 
 If it's any help, I have an AIX box here.  It belongs to IBM, so I
 have to respect security issues, but I'll do what I can.

Well Joe seems to have provided a pretty interesting document on
how it works in AIX, but I was wondering if they do anything wrt
low/high watermarks like my idea.

Basically you'd like to inform processes that the danger has been
alleviated so that they can cautiously start accepting more work
rather than freaking out and shutting out clients forever...

This might lead to a situation where SIGDANGER starts getting
sent informing that things are looking bleak, then processes
start freeing resources, they get the second SIGDANGER to let
them know that things are looking ok so they ramp up again and
the cycle repeats.  I guess that's not optimal, but I'd like FreeBSD
to let processes know that things are looking better so they can
go from scrooge mode to thrifty mode.

-- 
-Alfred Perlstein [[EMAIL PROTECTED]]
'Instead of asking why a piece of software is using 1970s technology,
start asking why software is ignoring 30 years of accumulated wisdom.'




Re: VM: dynamic swap remapping (patch)

2001-09-30 Thread Jos Backus

On Sun, Sep 30, 2001 at 11:41:14PM -0500, Alfred Perlstein wrote:
  If it's any help, I have an AIX box here.  It belongs to IBM, so I
  have to respect security issues, but I'll do what I can.

I seem to remember that one could set a watermark using the no command, but I
could be wrong. No AIX to verify this, maybe Greg can. The link below has some
info, too:

http://nscp.upenn.edu/aix4.3html/aixbman/prftungd/tunableaixparms.htm

-- 
Jos Backus _/  _/_/_/Santa Clara, CA
  _/  _/   _/
 _/  _/_/_/ 
_/  _/  _/_/
[EMAIL PROTECTED] _/_/   _/_/_/use Std::Disclaimer;




Re: VM: dynamic swap remapping (patch)

2001-09-30 Thread Terry Lambert

Alfred Perlstein wrote:
[ ... SIGDANGER ... ]
 Well Joe seems to have provided a pretty interesting document on
 how it works in AIX, but I was wondering if they do anything wrt
 low/high watermarks like my idea.
 
 Basically you'd like to inform processes that the danger has been
 alleviated so that they can cautiously start accepting more work
 rather than freaking out and shutting out clients forever...

The process is supposed to return unused memory to the system
when it gets the signal, if it can.

It's not supposed to shed all load until it gets the all clear
signal.

I don't know if there are any good books on Windows Internals,
but the Windows VM system does the same thing: it notifies all
kernel subsystems that they need to free up memory, if they can.
The VFAT32 IFS will basically return exactly one page out of
many thousands it is using for cache, when it gets the request
(it is implemented as a callback, which you must provide when
you register for VM services).


 This might lead to a cycle: SIGDANGER is sent to say that things are
 looking bleak, processes start freeing resources, a second SIGDANGER
 tells them that things are looking OK, they ramp up again, and the
 whole thing repeats. I guess that's not optimal, but I'd like FreeBSD
 to let processes know that things are looking better so they can
 go from scrooge mode to thrifty mode.

The idea is just to free resources, if you can, and to mark the
processes which are precious by whether or not they have a
signal handler.  A close reading of the other document posted
(it seemed to be the admin manual from the URL) will indicate
that the follow-on SIGKILL is not sent to the processes that have
a SIGDANGER handler registered.  Note that this does not mean
that your process won't be killed off as a result of a page not
present fault, so abusing the interface is not really tolerated
very well by the system.

I think signalling an all clear is really a bad idea; a soft
hysteresis loop is much less prone to pendulum swings than a
hard hysteresis loop (lesson #1 in the book Fuzzy Logic).

-- Terry




Re: VM: dynamic swap remapping (patch)

2001-09-29 Thread Karsten W. Rohrbach

Vladimir Dozen([EMAIL PROTECTED])@2001.09.29 15:59:41 +:
 ehlo.
 
   (Sorry for long pre-history, I believe it is necessary.)
 
   My current employer develops large CORBA-based data mining servers.
   They are usually run under HP-UX, but, following the current fashion
   to build processing farms, I was tasked with building a version for free
   unices. The initial platform was Linux, and the build itself went smoothly,
   but very soon we hit a problem: we use pthreads; to be more precise,
   we use a thread-per-client model. This means that at any given time we may
   be computing from one to a few tens of client sessions. Each session may eat
   as much as 1G of address space, and even more (actually, there are no
   limits except for hardware ones).

IIRC from the problems we had with a project some while ago, mm might
help. [http://www.engelschall.com/sw/mm/]

it wraps malloc() and friends into a neat api, including preallocation
in fs space (the features are somewhat os dependent) and fast shared
memory.

/k

-- 
 Did you know that there are 71.9 acres of nipple tissue in the U.S.?
KR433/KR11-RIPE -- WebMonster Community Founder -- nGENn GmbH Senior Techie
http://www.webmonster.de/ -- ftp://ftp.webmonster.de/ -- http://www.ngenn.net/
karstenrohrbach.de -- alphangenn.net -- alphascene.org -- [EMAIL PROTECTED]
GnuPG 0x2964BF46 2001-03-15 42F9 9FFF 50D4 2F38 DBEE  DF22 3340 4F4E 2964 BF46
Please do not remove my address from To: and Cc: fields in mailing lists. 10x



VM: dynamic swap remapping (patch)

2001-09-29 Thread Vladimir Dozen

ehlo.

  (Sorry for long pre-history, I believe it is necessary.)

  My current employer develops large CORBA-based data mining servers.
  They are usually run under HP-UX, but, following the current fashion
  to build processing farms, I was tasked with building a version for free
  unices. The initial platform was Linux, and the build itself went smoothly,
  but very soon we hit a problem: we use pthreads; to be more precise,
  we use a thread-per-client model. This means that at any given time we may
  be computing from one to a few tens of client sessions. Each session may eat
  as much as 1G of address space, and even more (actually, there are no
  limits except for hardware ones).

  The problem was how Linux (and FreeBSD, as we soon discovered) treats
  an out-of-memory (OOM) situation.
  
  Under HP-UX memory is precommitted (i.e., swap is reserved for every
  allocated page), so as soon as we get into OOM, malloc() returns NULL
  or operator new throws an exception, so we have the opportunity to
  unroll the stack, tell the client we cannot perform his request currently
  and, most important, are able to continue executing other clients' requests.

  Linux and FreeBSD simply killed our whole process, and we had no
  chance to learn that we were out of memory! All the data of all our clients
  (some of which had been processing for days) was lost. :(

  This is very unfriendly and, perhaps more important, this kind of interaction
  (absence of it, really) between OS and application reduces the chances of
  porting really large applications to FreeBSD, because no one
  can trust an OS that can simply trash user data with no warning.

  It seems to me that the OS must use any chance to continue execution of an
  application instead of killing it. I do think it is the Right Way.

  I have written a patch that modifies behaivour (have I spelled this
  word right? ;) of the VM when we are out of memory. Instead of killing
  the largest process, we remap parts of its address space onto temporary
  files (exactly as HP-UX does when swapping to a directory is turned on).
  Of course, we cannot do it when we are absolutely out of swap, so we do it
  a bit earlier, when the swap daemon finds that free swap has fallen below
  nswap_lowat.

  I called this patch OOM Keeper, as opposed to the OOM Killer used in
  Linux (yah, I prefer BSD).

  Here is the generic algorithm:

  1. The swap daemon finds vm_swap_size < nswap_lowat; it calls
     vm_oomkeeper_swap_almost_full();
  2. vm_oomkeeper_swap_almost_full() searches for the process having
     the largest vm_object of type OBJT_SWAP, and sends it a signal
     (proposed name: SIGXMEM).
  3. The process gets the signal, and calls a special syscall (proposed
     name: remap).
  4. (We are again in the kernel; this time curproc is our big process,
     in vm_oomkeeper_process.)
     While free swap blocks are lower than nswap_hiwat, we
     do the following:
       a) find the largest object of type OBJT_SWAP in the current process
       b) create a temporary file and unlink() it
       c) save the first 1M of the object into the file
       d) cut the first 1M of the map (here we can get free swap blocks)
       e) mmap the file onto the place where the data was before.

  If any of the above fails, then the old killproc() will trigger,
  so the system will still be able to drop buggy processes.
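
Steps b) to e) can be roughly imitated from userland, which may help
when reasoning about what the proposed remap syscall does. The sketch
below is not the patch itself (vm_oomkeeper.c is not shown in this
thread); remap_to_file() and CHUNK are hypothetical names, and it
assumes a page-aligned anonymous mapping and a writable temporary
directory.

```c
#define _DEFAULT_SOURCE          /* mkstemp() and MAP_ANONYMOUS on glibc */
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK (1024 * 1024)      /* 1M, as in steps c) and d) */

/*
 * Hypothetical userland analogue of steps b) to e): move the first `len`
 * bytes of the mapping at `addr` onto an unlinked temporary file.
 * `addr` must be page aligned; `path_template` is a mkstemp() template
 * such as "/tmp/remapXXXXXX" and is modified in place.
 * Returns 0 on success, -1 on error.
 */
int remap_to_file(void *addr, size_t len, char *path_template)
{
    int fd = mkstemp(path_template);     /* b) create a temporary file ... */
    if (fd < 0)
        return -1;
    unlink(path_template);               /*    ... and unlink() it */

    /* c) save the data into the file */
    const char *p = addr;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* d) + e) MAP_FIXED replaces the old pages with the file mapping */
    void *m = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd);                           /* the mapping keeps the file alive */
    return m == MAP_FAILED ? -1 : 0;
}
```

A caller would pass the start of a large anonymous region and CHUNK as
the length; the data stays readable at the same address, but is now
backed by the file rather than by swap.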
  
  Note: the process now has a chance to do something in an OOM situation.
  It can simply ignore the signal, and it will be killed soon. It can
  call remap(), and it will be remapped onto files -- this will
  slow things down, but will allow it to continue processing. It can
  free some space (e.g., by unmapping an anonymous mmap). Finally, it can
  save its current data and terminate, if none of the above is
  acceptable.
  
  Note also that ulimits and quotas are in effect, since the files
  are created under the process's credentials.
  
  This patch was tested on my home PC with 64M RAM and 64M swap; I was
  able to run processes with _committed_ address space up to 512M
  in various scenarios: large malloc then commit, small incremental
  mallocs with immediate commit, random commit, parallel run of
  two or three such memory eaters, etc. No doubt it requires
  additional testing.

  The patch lives entirely in a separate file, vm_oomkeeper.c, and
  it requires only a single intrusion point in the current code: adding
  a single line to swap_pager.c:swp_sizechk().

  But, to fully implement it, I have to add a new signal and a new
  syscall to the system. I do not want to go that far until I know
  whether my patch is acceptable to the FreeBSD team.

  To make it fully controllable it would also be useful to set 
  nswap_{hi,lo}wat via the sysctl interface. In any case, when using OOMK 
  these two should be raised about 4 to 8 times (from 400K to 2-4M).

  It would also be valuable if the default action for SIGXMEM were not
  SIG_IGN, but calling remap(). This requires patching libc. A special
  environment variable ($REMAPDIR) might be used to set the location of
  the temporary files.
  
  I can send the vm_oomkeeper.c by request (it is 12K long, and I
  do not want to 

Re: VM: dynamic swap remapping (patch)

2001-09-29 Thread Alfred Perlstein

* Vladimir Dozen [EMAIL PROTECTED] [010929 06:57] wrote:
 ehlo.
 
   (Sorry for long pre-history, I believe it is necessary.)
[snip]
   Comments?

Wow!  This is really awesome work you've done, perhaps you can put
the patch up on a URL someplace?  If not mail it to me in private
and I can put it up for people to see.  One thing though, I think
that this behaviour should be toggled via a sysctl, but I think I
can manage doing that for you.

One other question, why not just set an option to make FreeBSD not
overcommit?  I've always wanted the ability to turn off overcommit
for exactly the same reasons you do.

-- 
-Alfred Perlstein [[EMAIL PROTECTED]]
'Instead of asking why a piece of software is using 1970s technology,
start asking why software is ignoring 30 years of accumulated wisdom.'




Re: VM: dynamic swap remapping (patch)

2001-09-29 Thread Wilko Bulte

On Sat, Sep 29, 2001 at 07:10:24AM -0500, Alfred Perlstein wrote:
 * Vladimir Dozen [EMAIL PROTECTED] [010929 06:57] wrote:
  ehlo.
  
(Sorry for long pre-history, I believe it is necessary.)
 [snip]
Comments?
 
 Wow!  This is really awesome work you've done, perhaps you can put
 the patch up on a URL someplace?  If not mail it to me in private
 and I can put it up for people to see.  One thing though, I think
 that this behaviour should be toggled via a sysctl, but I think I
 can manage doing that for you.
 
 One other question, why not just set an option to make FreeBSD not
 overcommit?  I've always wanted the ability to turn off overcommit
 for exactly the same reasons you do.

FWIW: Tru64 has had this capability since day one. You can select
swap-overcommit mode by removing a symlink (/sbin/swapdefault -> /dev/foob)
where /dev/foob is the primary swap partition.

W/

-- 
|   / o / /_  _ email:  [EMAIL PROTECTED]
|/|/ / / /(  (_)  Bulte Arnhem, The Netherlands 




Re: VM: dynamic swap remapping (patch)

2001-09-29 Thread Matt Dillon

: overcommit?  I've always wanted the ability to turn off overcommit
: for exactly the same reasons you do.
:
:FWIW: Tru64 has had this capability since day one. You can select
:swap-overcommit mode by removing a symlink (/sbin/swapdefault -> /dev/foob)
:where /dev/foob is the primary swap partition.
:
:W/
:
:-- 
:|   / o / /_  _email:  [EMAIL PROTECTED]
:|/|/ / / /(  (_)  BulteArnhem, The Netherlands 

Well, the overcommit argument comes up once or twice a year.  Frankly
I don't see much of a point to it.  While it is true that you could 
implement a signal the plain fact of the matter is that having to deal
with the possibility in a program at the N points (generally hundreds of
points) where that program allocates memory, either directly or 
indirectly, virtually guarantees that you will introduce bugs into the
system.  You also cannot guarantee that your process will have time to
clean up prior to the system killing it, nor can you guarantee that all the
standard system utilities and daemons will be able to gracefully handle
the out of memory condition.  In other words, you could implement
the signal and even have the program use it, but you will still likely
leave gaping holes in the implementation that will result in lost data.

It is much easier to manage memory manually.  For example, if these
programs require 1G of independent memory to run it ought to be a
fairly simple matter to simply create a 1GB file for each process
(using dd rather than ftruncate() to create the file so the blocks are
preallocated), mmap() it using PROT_READ|PROT_WRITE, MAP_SHARED|MAP_NOSYNC,
and do your memory management out of that.  The memory space will be
backed by the file rather than by swap.  You get all the benefits of
the standard overcommit capabilities of the system as well as the
ability to pre-reserve the main workspace for the programs and you
automatically get persistent storage for the data.  Problem solved.
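
A minimal sketch of the file-backed approach described above, assuming
a trivial bump allocator is enough. struct arena, arena_open() and
arena_alloc() are hypothetical names, not an existing API. The file is
filled by writing zero blocks (the same effect as dd) instead of calling
ftruncate(), and MAP_NOSYNC is defined as 0 where the platform lacks it,
since it is a FreeBSD flag.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef MAP_NOSYNC
#define MAP_NOSYNC 0             /* FreeBSD flag; treated as a no-op elsewhere */
#endif

struct arena {                   /* hypothetical bump allocator state */
    char  *base;
    size_t size;
    size_t used;
};

/* Create `path`, fill it with `size` zero bytes, and map it shared. */
int arena_open(struct arena *a, const char *path, size_t size)
{
    static const char zeros[65536];
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Write real blocks, as dd would, rather than leaving a sparse file. */
    for (size_t done = 0; done < size; ) {
        size_t want = size - done < sizeof zeros ? size - done : sizeof zeros;
        ssize_t n = write(fd, zeros, want);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_NOSYNC, fd, 0);
    close(fd);                   /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->size = size;
    a->used = 0;
    return 0;
}

/* Hand out 16-byte aligned pieces; returns NULL when the file is used up. */
void *arena_alloc(struct arena *a, size_t n)
{
    if (n > a->size)
        return NULL;
    n = (n + 15) & ~(size_t)15;
    if (n > a->size - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += n;
    return p;
}
```

Running out of arena space shows up as a NULL return that the program
can handle. A real allocator would also need free(), locking for
threads, and a way to grow.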

-Matt





Re: VM: dynamic swap remapping (patch)

2001-09-29 Thread Rik van Riel

On Sat, 29 Sep 2001, Vladimir Dozen wrote:

   I have written a patch that modifies behaivour (have I spelled this
   word right? ;) of the VM when we are out of memory. Instead of killing
   the largest process, we remap parts of its address space onto temporary
   files (exactly as HP-UX does when swapping to a directory is turned on).

This is not instead of killing, this is just a way to
delay the killing of processes longer. Once your disk
is full you'd still run into the choice between a
deadlock and a kill...

It's an awesome way of delaying the out of memory
problem, though, because a suspended application won't
be able to allocate anything more, giving the system a
better chance to let the running apps run to completion.

Alternatively, the one leaky application is suspended
and the rest of the system continues to run without any
problems.

In short, I like it ;)

regards,

Rik
-- 
IA64: a worthy successor to i860.

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)





Re: VM: dynamic swap remapping (patch)

2001-09-29 Thread Alfred Perlstein

* Vladimir Dozen [EMAIL PROTECTED] [010929 14:38] wrote:

 P.S. Anyway, I do NOT insist that my solution is better, or even that it
  is good for anything at all. It was fun for me to hack on the BSD kernel,
  it was an interesting challenge, and I feel the need to share the results
  with others. At worst, I will recommend that our customer set up a
  processing farm under FreeBSD with the patch applied.

I'm really impressed with the work you put into this, but it seems
that you've tried to tackle two problems at the same time, and by
tying them together made it less flexible and possibly more error
prone.

My suggestion (but not my final say, I'm still open to ideas):

   Implement a memory status signal to notify processes of changes
   in the relative amount of system memory.

   When memory reaches a low or high watermark, the signal is
   broadcast to all running processes.

   The default disposition will be to ignore the signal.

   The signal will be named SIGMEMINFO.  (SIGXfoo means
   'process has exceeded resource foo')

   The signal will pass via the siginfo struct information
   such that the process can determine if the system has
   just exceeded the low watermark (danger) or has reclaimed
   down to the high watermark (enough free memory).

   This is just to provide processes with a warning to scale back
   consumption, exit, or release resources. The good part is that
   it's broadcast, and all interested parties will do something,
   hopefully the right thing.
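
The decision of when to broadcast could be sketched as a small state
machine, so that the signal is sent only when a watermark is crossed
and not on every check. This covers only the decision logic; signal
delivery and the siginfo payload are left out. struct mem_watch,
enum mem_event and mem_watch_update() are hypothetical names, and the
comparison directions (danger below the low mark, recovery above the
high mark) are an assumption modelled on how nswap_lowat and
nswap_hiwat are described earlier in the thread.

```c
#include <stddef.h>

/* What, if anything, should be broadcast after a free-memory check. */
enum mem_event { MEM_NONE, MEM_DANGER, MEM_RECOVERED };

struct mem_watch {
    size_t lowat;        /* free amount below which danger is reported */
    size_t hiwat;        /* free amount above which recovery is reported */
    int    in_danger;    /* current state, 0 or 1 */
};

/*
 * Feed the current amount of free memory; returns an event only when the
 * state changes. The gap between lowat and hiwat provides the hysteresis.
 */
enum mem_event mem_watch_update(struct mem_watch *w, size_t free_now)
{
    if (!w->in_danger && free_now < w->lowat) {
        w->in_danger = 1;
        return MEM_DANGER;
    }
    if (w->in_danger && free_now > w->hiwat) {
        w->in_danger = 0;
        return MEM_RECOVERED;
    }
    return MEM_NONE;
}
```

With lowat = 400 and hiwat = 1600, a drop to 300 reports MEM_DANGER
once, a climb back to 1000 reports nothing, and only a reading above
1600 reports MEM_RECOVERED.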


To achieve nearly the same effect as your patch, I would implement
the above low/high water mark notification, then either:

a) over allocate swap a bit and set the low watermark carefully.
b) do the following enhancement:

 Provide a system whereby you can swap to the filesystem without
 additional upcalls/syscalls from userspace, basically, provide
 some means of paging to the filesystem automatically.

  then, set your low watermark to the size of your swap partition,
  now your system will alert your processes and automatically swap
  _anyone_ to the filesystem.

I really think that this would be more flexible and still allow
you to achieve what you want... What do you think?

-- 
-Alfred Perlstein [[EMAIL PROTECTED]]
'Instead of asking why a piece of software is using 1970s technology,
start asking why software is ignoring 30 years of accumulated wisdom.'




Re: VM: dynamic swap remapping (patch)

2001-09-29 Thread Vladimir Dozen

ehlo.

 You also cannot guarantee that your process will have time to
 clean up prior to the system killing it, nor can you guarantee that all the
 standard system utilities and daemons will be able to gracefully handle
 the out of memory condition.  In other words, you could implement
 the signal and even have the program use it, but you will still likely
 leave gaping holes in the implementation that will result in lost data.

  Actually, things as I coded them are better suited precisely for poorly
  written daemons that never check the malloc() result. Precommit will just
  kill them as soon as malloc() returns NULL and they dereference it.
  killproc() will kill them too. Remapping will save them. Disk space
  is now large enough to let them live until root notices that
  they have grown too much and does something (kills them manually, probably ;).

 It is much easier to manage memory manually.  For example, if these
 programs require 1G of independent memory to run it ought to be a
 fairly simple matter to simply create a 1GB file for each process
 (using dd rather than ftruncate() to create the file so the blocks are
 preallocated), mmap() it using PROT_READ|PROT_WRITE,MAP_SHARED|MAP_NOSYNC,
 and do your memory management out of that.

  First of all, it is NOT easier. Doing our own memory management is not
  simple, especially with threads and SMP -- we have seen a 50% performance
  hit when two threads on two processors were doing intensive allocations
  (it was not FreeBSD, and these were kernel threads).
  
  Second, the application does not always grow to 1G; most of the time it
  stays as small as 500M ;). Why should we precommit 1G for 500M of data?
  Doing multi-mmap memory management is additional pain.
  
  Third, swapping to a device is faster, and, while we have enough swap, 
  I would prefer to swap there. Even a few percent matters for a 5-day
  computation.

 Problem solved.

  If I'm the developer -- probably, yes. What if I'm the system administrator,
  and have to run something large _and important_? The day I notice 
  that the monster is creating swap files I'll know I have to add RAM. I will
  have time, since it still works; it was not killed.

P.S. Anyway, I do NOT insist that my solution is better, or even that it
 is good for anything at all. It was fun for me to hack on the BSD kernel,
 it was an interesting challenge, and I feel the need to share the results
 with others. At worst, I will recommend that our customer set up a
 processing farm under FreeBSD with the patch applied.
 
-- 
dozen @ home
