On Saturday 05 November 2005 06:45, Jeff Dike wrote:
> On Fri, Nov 04, 2005 at 02:41:11PM -0600, Rob Landley wrote:
> > On Friday 04 November 2005 13:10, Blaisorblade wrote:
> > > > What I was thinking is that if we get prezeroing infrastructure that
> > > > can use various prezeroing accelerators (as has been discussed but I
> > > > don't believe merged), then a logical prezeroing accelerator for UML
> > > > would be calling madvise on the host system. This has the advantage
> > > > of automatically giving back to the host system any memory that's not
> > > > in use, but would require some way to tell kswapd or some such that
> > > > keeping around lots of prezeroed memory is preferable to keeping
> > > > around lots of page cache.
> > > Ah, ok, I see, but a tunable to say this is almost useless for
> > > anything else I guess, so it won't even get coded.

> > If we get prezeroing, the tunable is useful. If we haven't got
> > prezeroing, this infrastructure probably won't get in.

> I'm not really convinced that prezeroing would be that useful, particularly
> through madvise. The reason is that the normal case for a system is that
> it has no free memory because it's caching anything that might be useful.
> The one case I can think of where you all of a sudden have a lot of free
> memory that might not be used for a while is when a large process exits,
> and you get a lot of freed data, page tables, etc. Then, we could possibly
> madvise that and stick it on a zeroed pages list. Forgetting about the
> extra infrastructure needed to implement it, even that would be under
> constant threat. Witness the occasional proposals to do pre-swapping -
> swapping in stuff before it's needed when you have some free memory
> for it.

In fact I've proposed including (for now) another of Con's patches, which
gives some preference to free memory over page cache (to speed up page
allocation)... but I don't quite understand why none of Con's patches get
merged, even into -mm (not that I follow it closely)...

Also, using pre-zeroing accelerators would mean we need to keep some zeroed
memory at hand...

> Looking at it another way, what this would basically be doing is moving
> page zeroing from userspace to kernel space, which is counter to the
> direction that things generally go.

Nope - it's not reimplementing memset() as a syscall, which really would
move things from userspace to kernelspace. Instead, pre-zeroing moves the
existing kernelspace memset() to hardware. And relying on it doesn't seem
bad...

> > It's not load for me, it's disk bandwidth. Every time it writes to the
> > swap UBD, that data is scheduled for write-out.
> > So if it's thrashing the
> > swap file, even though it's reading the data back in fairly quickly, the
> > data still gets written out to disk, again and again, each time it's
> > touched. Result: the disk I/O becomes a bottleneck and the disk is
> > _PEGGED_ as long as the swap storm continues.

Sorry, but you're describing UML fsyncing to the swap file... that's a
misconfiguration! Disable CONFIG_*UBD*_SYNC and enable syncing per-device
with ubd0s= rather than ubd0= (see the --help output). The net effect is
the same, except you don't get synchronous swapping! Can you try it and
report _any_ difference?

> Do you understand exactly what's happening here? Because I don't, and
> I wish someone could explain it. UML shouldn't be able to bog down
> the host like that. Its one-request-at-a-time pseudo-AIO shouldn't
> make that much IO happen that suddenly. There are other things that
> do IO for a living (kernel builds, updatedb) and they don't seem to
> bog down the system like this.

Some ideas on this:

1) In my experience, even with the CFQ scheduler, updatedb _does_ do a lot
of harm to the system... (but I run a [EMAIL PROTECTED] CPU hog, so I may
not be considered totally trustworthy). Without CFQ it's a total pain (even
without CPU hogs)...

2) Also, fsync() is a bad idea here... the host elevator can either
prioritize only UML's writes over all other writes (which could be seen as
unfair and so wouldn't be implemented), or prioritize all writes, or do
nothing to speed up fsync() - and I guess some elevators prioritize writes
on fsync(), which bogs down the host.

3) Have you looked at C. Aker's ubd token limiter?

4) Also, remember that you mustn't count I/O but rather seeks (one seek can
cost about 10 ms or more on a laptop, i.e. the time for roughly 100 KB of
sequential I/O)...

* remember that the guest's ext3 (and recently reiserfs too) prides itself
on avoiding fragmentation by spreading files over the whole disk... (thus
making ubd0 _much_ sparser).
* and finally, remember that we usually run UML on sparse files, which is
an atypical workload that host filesystems aren't optimized against
fragmentation for... a sequential read inside UML could well become a
crazy seek storm on the host.

About this, a paper at OLS ("Virtualized GNU/Linux testing across
distros"?) talks about a "special I/O elevator setup" for UML; the authors
talked to you about some issues, and IIRC they may even work at Intel...

***) Jeff, what about talking to them and asking them to submit their code
to us, or at the very least their recipe?

-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel