On Friday 04 November 2005 23:45, Jeff Dike wrote:
> > If we get prezeroing, the tunable is useful.  If we haven't got
> > prezeroing, this infrastructure probably won't get in.
>
> I'm not really convinced that prezeroing would be that useful, particularly
> through madvise.  The reason is that the normal case for a system is that
> it has no free memory because it's caching anything that might be useful.
> The one case I can think of where you all of a sudden have a lot of free
> memory that might not be used for a while is when a large process exits,
> and you get a lot of freed data, page tables, etc.  Then, we could possibly
> madvise that and stick it on a zeroed pages list.  Forgetting about the
> extra infrastructure needed to implement it, even that would be under
> constant threat.  Witness the occasional proposals to do pre-swapping
> - swapping in stuff before it's needed when you have some free memory
> for it.

For the specific case of User Mode Linux, the extra caching is largely a 
waste of time.  For UBD or hostfs, the host OS already has the data cached, so 
we're spending twice as much memory as necessary in hopes of avoiding a 
syscall and a copy.

A proposal floated on linux-kernel yesterday was for a zone for hugepages, 
something the kernel could put only anonymous and pagecache pages in.  If we 
had the option to keep page cache out of it as well, then we could specify at 
boot time, "I'm giving this UML instance mem=256 but I only want the default 
32M of that to be used by anything but anonymous pages.  When those are free, 
it's fine for them to be free; I _want_ them to be free."  The host can put 
that memory to good use in service of Konqueror and Kmail.

This would be useful to me.  If the sucker reaps all the page cache and 
dentries and _still_ runs out of memory for a kernel allocation, then yes 
I've misconfigured it.  But it had better reap all the page cache and 
dentries first...

> Looking at it another way, what this would basically be doing is
> moving page zeroing from userspace to kernel space, which is counter
> to the direction that things generally go.

Page zeroing is currently done in userspace?

Juggling memory is something that userspace has traditionally deeply sucked 
at.  Having to page in a daemon to make decisions in a low memory situation 
is unlikely to improve matters.

> > It's not load for me, it's disk bandwidth.  Every time it writes to the
> > swap UBD, that data is scheduled for write-out.  So if it's thrashing the
> > swap file, even though it's reading the data back in fairly quickly the
> > data still gets written out to disk, again and again, each time it's
> > touched.  Result: the disk I/O becomes a bottleneck and the disk is
> > _PEGGED_ as long as the swap storm continues.
>
> Do you understand exactly what's happening here?  Because I don't, and
> I wish someone could explain it.  UML shouldn't be able to bog down
> the host like that.  Its one-request-at-a-time pseudo-AIO shouldn't
> make that much IO happen that suddenly.  There are other things that
> do IO for a living (kernel builds, updatedb) and they don't seem to
> bog down the system like this.

Just some educated guesses.

1) Ubuntu is defaulting to the anticipatory I/O scheduler, which melts down 
under sufficiently heavy loads.  I should switch it to CFQ.
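For reference, the scheduler can be checked and switched at runtime through 
sysfs on a 2.6 kernel (the device name hda here is just an example; substitute 
your own disk):

```shell
# Show the schedulers this kernel offers for a disk; the active one
# is typically shown in brackets, e.g. "noop anticipatory deadline [cfq]".
cat /sys/block/hda/queue/scheduler

# Switch that disk to CFQ at runtime (needs root):
echo cfq > /sys/block/hda/queue/scheduler

# Or make it the default for all devices with a kernel boot parameter:
#   elevator=cfq
```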

2) Some applications (vim and kmail most noticeably) do an fsync(), and in 
situations with lots of disk activity an fsync can block for 30 seconds.  
(Why konqueror suffers from this is another question, but konqueror has 
always been vulnerable to low memory situations...)

Either way it's a host kernel problem.  UML is a more or less normal userspace 
app, and it shouldn't be able to bog down the system like this.

P.S.  You're mentioning loads the kernel guys have specifically optimized for.  
Kernel builds are what the kernel has been optimized for over the past 10 
years, but they're not a big I/O test.  The I/O they do is nicely localized by 
directory, sucking in entire files each time so readahead is never wasted, 
and a build is generally CPU-bound even on a fast machine.  As for updatedb, that's 
also not random seeks taking small chunks out of a file, and it's dominated 
by reads.  The anticipatory scheduler is effectively optimized for updatedb.  
Of course the first thing I do on any new system is kill cron...

Swapping has always been tougher to optimize for, and it turns out that even 
swapping-style access patterns originating in userspace and happening inside 
a file are still a bit of a pain.

I'll see if CFQ improves matters.  If nothing else, the ability of "nice 20 
mybuild" to actually affect disk I/O would be a _serious_ bonus...
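With CFQ in place, per-process I/O priorities also become available through 
ionice(1) from util-linux, which only works with CFQ (the build command below 
is hypothetical):

```shell
# Run a hypothetical build at both the lowest CPU and the lowest I/O
# priority; ionice class 3 ("idle") only gets disk time nobody else wants.
ionice -c3 nice -n 19 make -j2
```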

>     Jeff

Rob


_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
