On Friday 04 November 2005 13:10, Blaisorblade wrote:
> > If you've got a daemon running in the virtual system to hand back memory
> > to the host, then you don't need a tuneable.
>
> I think Jeff's idea was a daemon running on the host (not as root) to
> manage splitting of memory between UMLs (and possibly the host).

That's more configuration on the host that's not really needed.  Doesn't do my 
case any good.

> > What I was thinking is that if we get prezeroing infrastructure that can
> > use various prezeroing accelerators (as has been discussed but I don't
> > believe merged), then a logical prezeroing accelerator for UML would be
> > calling madvise on the host system.  This has the advantage of
> > automatically giving back to the host system any memory that's not in
> > use, but would require some way to tell kswapd or some such that keeping
> > around lots of prezeroed memory is preferable to keeping around lots of
> > page cache.
>
> Ah, ok, I see, but a tuneable to say this is almost useless for anything
> else I guess, so it won't even get coded.

If we get prezeroing, the tunable is useful.  If we haven't got prezeroing, 
this infrastructure probably won't get in.

> > I think it's
> > because the disk is so overwhelmed, and some things (like vim's .swp
> > file, and something similar in kmail's composer) do a gratuitous fsync...
>
> Yep, that's possible (running Gentoo, I often go to loads like 8-10,
> including a CPU-hog in the background, and things become a bit slow).

It's not load for me, it's disk bandwidth.  Every time it writes to the swap 
UBD, that data is scheduled for write-out.  So if it's thrashing the swap 
file, even though it's reading the data back in fairly quickly the data still 
gets written out to disk, again and again, each time it's touched.  Result: 
the disk I/O becomes a bottleneck and the disk is _PEGGED_ as long as the 
swap storm continues.

Now try to have anything else on the system that wants to access the disk.  
Yes, reads are prioritized but anything that does fsync and waits for a write 
(as vi and kmail's composer do, or anything _else_ that wants to swap out a 
page to free up some memory) winds up waiting somewhere around 15 seconds.  
(Yeah, 20 megs/second on linear reads.  Not quite so much on an endless 
series of small random seeks.)
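
To put a number on that stall, a small probe is enough.  This is a minimal
sketch, assuming Python 3 on the affected system; the function name and the
4 KiB payload are arbitrary choices for the example.  It writes a small block
and times only the fsync, which is the call vi and kmail's composer end up
waiting in.

```python
import os
import time


def timed_fsync(path, payload=b"x" * 4096):
    """Write payload to path and return the seconds spent inside fsync()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, payload)
        start = time.monotonic()
        os.fsync(fd)  # blocks until the data has been pushed to the device
        return time.monotonic() - start
    finally:
        os.close(fd)
```

Run it against a file on the same disk as the swap UBD's backing file while
the swap storm is going, and compare with the figure from an idle system.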

> However, I feel that really it's the simple "fork" which slows down like a
> crawl (and given that memory allocation will easily sleep waiting for some
> memory to be freed - i.e. to be freed or synced to disk, that's
> reasonable).

You don't have to fork to block in this swap storm.  The fact that my old 
Ubuntu's on a 2.6.10 kernel might have something to do with it, though...

> And, btw, Frag. Avoidance would help for that too...

If it ever goes in...

> > > However look at /proc/sys/vm/swappiness
> >
> > Setting swappiness to 0 triggers the OOM killer on 2.6.14 for a load that
> > completes with swappiness at 60.
>
> Yep, I see - it becomes so reluctant to swap that it prefers killing.
> Unintended, but at least a reasonable bug...

Triggering the OOM killer when you have _any_ writes pending is silly.  Wait 
and memory will free up.  And yet we do this all the time.  We trigger the 
OOM killer when there's still swap space, too.  I thought the point of a swap 
block device instead of a swap file was that there were no memory allocations 
needed to flush out memory.

The OOM killer is theoretically for when waiting won't help.  But the 
implementation doesn't seem to match that...
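
The condition argued for here (pending writes or free swap mean waiting could
still help) can at least be checked from userspace.  A minimal sketch,
assuming Python 3 and the Dirty, Writeback and SwapFree fields of
/proc/meminfo; the function names are invented for the example, and this is
not how the kernel decides to invoke the OOM killer.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}, values as printed."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def waiting_might_help(info):
    """True if writes are pending or swap space is still free."""
    return any(info.get(k, 0) > 0 for k in ("Dirty", "Writeback", "SwapFree"))


def current_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Calling waiting_might_help(current_meminfo()) just before a load gets killed
would show whether there was still something to wait for.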

> > I mentioned this on the list a little
> > while ago and some people asked for copies of my test script...
> >
> > > or use Con Kolivas's patches to find new tunable and policies.
> >
> > The daemon you mentioned is an alternative, but I'm not quite sure how
> > rapid the daemon's reaction is going to be to potential OOM situations
> > when something suddenly wants an extra 200 megs...
>
> The daemon will have to be designed and written, so we'll see... and we
> _could_ add a pre-OOM hook (it would be meaningful for Xen and any other
> virtualization tool)... to trigger a mconsole notification on the host and
> wait for any response from the daemon...
>
> At that point I become curious about "how much should the daemon give to
> the guest", and that would be policy configurable... but the policy file
> (which I already guess will be more complex than the daemon itself) would
> like some way to gather "how much memory it needs" information.
>
> We already started discussing on IRC with Jeff some ideas for estimating
> the past usage, but predicting the future one is more difficult.
>
> It's still possible to calculate the speed of new allocations, but not to
> know what's happening inside... the only possibility I see is to allow the
> notification to include the amount of needed memory (you can already do
> "echo something nice > /proc/notify", we now only need a client).
>
> But this allows untrusted users to DoS the host. Not fully though,
> since you can never hotplug memory which wasn't hot-unplugged first - i.e.
> you would boot your UML with mem=256m and then immediately hot-unplug
> most of it.

In theory, UML memory isn't all that much different from allocating normal 
user memory.  So a DoS shouldn't enter into it.
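
For reference, the bound quoted above (a guest can only get back what it
hot-unplugged after boot) is simple bookkeeping.  A minimal sketch in Python,
with a made-up class and method names; no such daemon exists yet, so none of
this reflects a real interface.

```python
class GuestMemoryLedger:
    """Hypothetical host-side accounting for one UML guest, in megabytes."""

    def __init__(self, boot_mb, initial_mb):
        if not 0 <= initial_mb <= boot_mb:
            raise ValueError("initial_mb must be between 0 and boot_mb")
        self.boot_mb = boot_mb
        # Whatever was hot-unplugged right after boot is all that can return.
        self.unplugged_mb = boot_mb - initial_mb

    def grant(self, requested_mb):
        """Return how much to hot-plug: never more than was hot-unplugged."""
        granted = max(0, min(requested_mb, self.unplugged_mb))
        self.unplugged_mb -= granted
        return granted

    def in_use_mb(self):
        """Memory currently plugged into the guest."""
        return self.boot_mb - self.unplugged_mb
```

With mem=256m and 64 megs left plugged in, a request for 200 megs after 100
have already been granted comes back as 92, and the guest never exceeds its
boot-time 256.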

I like the idea that a UML instance can figure out when it has extra memory 
it's not using, and hand it back to the host.  And for a UML to maintain any 
significant quantity of page cache that isn't tmpfs or ramfs is probably a 
bad idea (with the possible exception of NFS mounts).  With hostfs or UBD, 
the host should already have it cached, so keeping it in the guest too just 
means two copies in memory, and fetching it again is easy.  (Dentries I can 
see caching lots of.)

Hence a UML instance could indeed (in theory) have lots of actually free 
memory (aggressively reclaim page cache) that it could madvise back to the 
host.  And if it can do that, why does it need the daemon?
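
As a rough illustration of the madvise idea (not UML code): a minimal Python
sketch, assuming Linux and Python 3.8 or later, that dirties an anonymous
private mapping, hands the pages back with MADV_DONTNEED, and reads them
again.  The function name is made up for the example.  Note that it uses a
private anonymous mapping; a shared or file-backed mapping behaves differently
under MADV_DONTNEED, so this shows the principle rather than what UML would
need to do.

```python
import mmap


def release_and_reread(npages=4):
    """Dirty a private anonymous mapping, madvise it away, read it back.

    Returns the first byte before and after the madvise call.
    """
    length = npages * mmap.PAGESIZE
    m = mmap.mmap(-1, length,
                  flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                  prot=mmap.PROT_READ | mmap.PROT_WRITE)
    try:
        m[:] = b"\xaa" * length        # touch every page so it is really backed
        before = m[0]
        m.madvise(mmap.MADV_DONTNEED)  # tell the kernel it can drop the pages
        after = m[0]                   # next touch faults in a fresh zero page
        return before, after
    finally:
        m.close()
```

The pages come back zero-filled on the next touch, which is also why this
would double as a source of prezeroed memory.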

Rob


_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
