Jeff, some input for you - can you have a look?

On Friday 04 November 2005 21:41, Rob Landley wrote:
> On Friday 04 November 2005 13:10, Blaisorblade wrote:
> > > If you've got a daemon running in the virtual system to hand back
> > > memory to the host, then you don't need a tuneable.
> >
> > I think Jeff's idea was a daemon running on the host (not as root) to
> > manage splitting of memory between UMLs (and possibly the host).
>
> That's more configuration on the host that's not really needed.  Doesn't do
> my case any good.

We'll consider your case too, then... for your use the daemon could simply 
be a thread started by UML on the host. The idea (still very preliminary) 
was conceived for hosting providers, to my knowledge.

However, see below - we can directly madvise() a page when it's freed. In 
that case we'd need the guest to keep some free memory around - which can 
be done via one of Con Kolivas' VM patches on the guest.
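
To be concrete, the host-side half of that is tiny. A sketch (untested; 
os_drop_memory() is just a name I'm inventing for a wrapper in the 
os-Linux layer, and how MADV_DONTNEED interacts with the way UML maps its 
physical memory file still needs checking):

/*
 * Hypothetical host-side wrapper: hand a range of UML "physical" memory
 * back to the host.  After MADV_DONTNEED the host may reclaim the backing
 * pages; anonymous memory reads back as zeroes, file-backed mappings are
 * re-read from the file.
 */
#include <errno.h>
#include <sys/mman.h>

int os_drop_memory(void *addr, int length)
{
	if (madvise(addr, length, MADV_DONTNEED) < 0)
		return -errno;
	return 0;
}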

> > > What I was thinking is that if we get prezeroing infrastructure that
> > > can use various prezeroing accelerators (as has been discussed but I
> > > don't believe merged), then a logical prezeroing accelerator for UML
> > > would be calling madvise on the host system.  This has the advantage of
> > > automatically giving back to the host system any memory that's not in
> > > use, but would require some way to tell kswapd or some such that
> > > keeping around lots of prezeroed memory is preferable to keeping around
> > > lots of page cache.
> >
> > Ah, ok, I see - but a tuneable to say this would be almost useless for
> > anything else, I guess, so it won't even get coded.

> If we get prezeroing, the tunable is useful.  If we haven't got prezeroing,
> this infrastructure probably won't get in.

Hmm.... yep, prezeroing is useless if you don't keep some prezeroed memory 
around, right.... I answered in too much of a hurry.

Btw, indeed I had previously planned to use the existing arch_free_pages() 
hook to call madvise() (conditionally, I mean) on page freeing... true, you 
can't make sure the page isn't about to be reused, but if the page is 
_freed_ and you still want its contents kept, you lose _anyway_.

The biggest risk is madvise()ing a page uselessly, which hurts performance 
a bit - but in general we should still win by letting the host use more 
memory.
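
For the record, the guest-side hook would look roughly like this (untested; 
mainline spells the hook arch_free_page(), singular, under 
HAVE_ARCH_FREE_PAGE, and the order threshold is only a guess at where the 
syscall cost stops mattering):

/*
 * Sketch: give freed guest pages back to the host, but only for
 * multi-page blocks, so that a single 4K free doesn't cost a host
 * syscall every time.
 */
void arch_free_page(struct page *page, int order)
{
	if (order == 0)
		return;

	/*
	 * In UML the kernel mapping of a page is also a valid address in
	 * the host process, so we can madvise() the whole block directly
	 * through the wrapper sketched earlier.
	 */
	os_drop_memory(page_address(page), PAGE_SIZE << order);
}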

> > > I think it's
> > > because the disk is so overwhelmed, and some things (like vim's .swp
> > > file, and something similar in kmail's composer) do a gratuitous
> > > fsync...
> >
> > Yep, that's possible (running Gentoo, I often go to loads like 8-10,
> > including a CPU-hog in the background, and things become a bit slow).

> It's not load for me, it's disk bandwidth.  Every time it writes to the
> swap UBD, that data is scheduled for write-out.  So if it's thrashing the
> swap file, even though it's reading the data back in fairly quickly the
> data still gets written out to disk, again and again, each time it's
> touched.  Result: the disk I/O becomes a bottleneck and the disk is
> _PEGGED_ as long as the swap storm continues.

> Now try to have anything else on the system that wants to access the disk.
> Yes, reads are prioritized but anything that does fsync and waits for a
> write (as vi and kmail's composer do, or anything _else_ that wants to swap
> out a page to free up some memory) winds up waiting somewhere around 15
> seconds.

> (Yeah, 20 gigs/second on linear reads.  Not quite so much on an 
> endless series of small random seeks.)
:: LOL :: If you have such a laptop, then I'm writing this email via KMail 
on a Windows client through PuTTY and Cygwin/X from my laptop. Ah, sorry, 
I'm indeed doing that...

> > However, I feel that really it's the simple "fork" which slows to a
> > crawl (and given that memory allocation will easily sleep waiting for
> > some memory to be freed - i.e. freed or synced to disk - that's
> > reasonable).

>  You don't have to fork to block in this swap storm.

Sure, but at the console you really feel ls or free taking ages...

>  The fact that my old Ubuntu is on a 2.6.10 kernel might have something
> to do with it, though...

> > Yep, I see - it becomes so reluctant to swap that it prefers killing.
> > Unintended, but at least a reasonable bug...

> Triggering the OOM killer when you have _any_ writes pending is silly. 

Hey, I said "reasonable bug" - but don't forget the "bug" in that sentence. 
It's reasonable as opposed to "busybox install doesn't work in UML", which 
is an unreasonable bug.

> Wait and memory will free up.  And yet we do this all the time.  We trigger
> the OOM killer when there's still swap space, too.  I thought the point of
> a swap block device instead of a swap file was that there were no memory
> allocations needed to flush out memory.
>
> The oom killer is theoretically for when waiting won't help.  But the
> implementation doesn't seem to match that...

> > > The daemon you mentioned is an alternative, but I'm not quite sure how
> > > rapid the daemon's reaction is going to be to potential OOM situations
> > > when something suddenly wants an extra 200 megs...
> >
> > The daemon will have to be designed and written, so we'll see... and we
> > _could_ add a pre-OOM hook (it would be meaningful for Xen and any other
> > virtualization tool)... to trigger an mconsole notification on the host
> > and wait for any response from the daemon...
> >
> > At that point I become curious about "how much should the daemon give
> > to the guest" - that would be policy-configurable... but the policy
> > file (which I already guess will be more complex than the daemon
> > itself) would need some way to gather "how much memory it needs"
> > information.
> >
> > We already started discussing some ideas on IRC with Jeff for
> > estimating past usage, but predicting future usage is more difficult.
> >
> > It's still possible to calculate the speed of new allocations, but not
> > to know what's happening inside... the only possibility I see is to
> > allow the notification to include the amount of needed memory (you can
> > already do "echo something nice > /proc/notify", we now only need a
> > client).
> >
> > But this allows untrusted users to DoS the host. Not fully, though,
> > since you can never hotplug memory which wasn't hot-unplugged first -
> > i.e. you would boot your UML with mem=256m and then immediately
> > hot-unplug most of it.
>
> In theory, UML memory isn't all that much different from allocating normal
> user memory.  So a DOS shouldn't enter into it.
>
> I like the idea that a UML instance can figure out when it has extra memory
> it's not using, and hand it back to the host.  And for a UML to maintain
> any significant quantity of page cache that isn't tmpfs or ramfs is
> probably a bad idea (with the possible exception of NFS mounts).  With
> hostfs or UBD, the host should have it cached, so it's just two copies of
> memory, and fetching it again is easy.  (Dentries I can see caching lots
> of.)
>
> Hence a UML instance could indeed (in theory) have lots of actually free
> memory (by aggressively reclaiming page cache) that it could madvise back
> to the host.  And if it can do that, why does it need the daemon?
>
> Rob
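
PS: since I said above that "we now only need a client" - the skeleton of 
that daemon is trivial, the real work is the policy. A sketch (the socket 
path and the plain-text payload are made up, and the real mconsole notify 
packets carry a binary header that this ignores):

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

int main(void)
{
	struct sockaddr_un addr = { .sun_family = AF_UNIX };
	char buf[512];
	int fd = socket(AF_UNIX, SOCK_DGRAM, 0);

	if (fd < 0)
		return 1;
	strncpy(addr.sun_path, "/tmp/uml-notify", sizeof(addr.sun_path) - 1);
	unlink(addr.sun_path);
	if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0)
		return 1;

	for (;;) {
		ssize_t n = recv(fd, buf, sizeof(buf) - 1, 0);

		if (n <= 0)
			continue;
		buf[n] = '\0';
		/* Policy goes here: parse the amount the guest asks for
		 * and decide how much (if anything) to hotplug back. */
		printf("guest asked: %s\n", buf);
	}
}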

-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
