Re: Break 2.4 VM in five easy steps
On Mon, Jun 11, 2001 at 04:04:45PM -0300, Rik van Riel wrote:
> On Mon, 11 Jun 2001, Maciej Zenczykowski wrote:
> > On Fri, 8 Jun 2001, Pavel Machek wrote:
> >
> > > That modulo is likely slower than dereference.
> > >
> > > > +	if (count % 256 == 0) {
> >
> > You are forgetting that this case should be converted to an and-255
> > or a plain byte reference by any optimizing compiler

You read too much into my choice - 256 is a random number ;)

> What matters is that this thing calls schedule() unconditionally
> every 256th time. Checking current->need_resched will only call
> schedule if it is needed ... not only that, but it will also
> call schedule FASTER if it is needed.

I will try this later today, but it seems right enough.
generic_file_write seems to do enough other work that a dereference
vs. an and-255 shouldn't be too bad...

Bernd Jendrissek

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Break 2.4 VM in five easy steps
On Mon, 11 Jun 2001, Maciej Zenczykowski wrote:
> On Fri, 8 Jun 2001, Pavel Machek wrote:
>
> > That modulo is likely slower than dereference.
> >
> > > +	if (count % 256 == 0) {
>
> You are forgetting that this case should be converted to an and-255
> or a plain byte reference by any optimizing compiler

Not relevant. What matters is that this thing calls schedule()
unconditionally every 256th time. Checking current->need_resched
will only call schedule if it is needed ... not only that, but it
will also call schedule FASTER if it is needed.

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://www.conectiva.com/
http://distro.conectiva.com/
Re: Break 2.4 VM in five easy steps
On Fri, 8 Jun 2001, Pavel Machek wrote:
> That modulo is likely slower than dereference.
>
> > +	if (count % 256 == 0) {

You are forgetting that this case should be converted to an and-255 or
a plain byte reference by any optimizing compiler - and gcc surely is
one. On x86 this code can be reduced to around 2 cycles (Pentium: mov,
or, jnz, with preceding code intertwined to cancel stalls and jnz being
likely in the code buffer)...

Maciek
Re: Break 2.4 VM in five easy steps
Hi!

> If this solves your problem, use it; if your name is Linus or Alan,
> ignore or do it right please.

Well, I guess you should use CONDITIONAL_SCHEDULE (if it is not defined
as a macro, do "if (current->need_resched) schedule()"). That modulo is
likely slower than a dereference.

> diff -u -r1.1 -r1.2
> --- linux-hack/mm/filemap.c	2001/06/06 21:16:28	1.1
> +++ linux-hack/mm/filemap.c	2001/06/07 08:57:52	1.2
> @@ -2599,6 +2599,11 @@
>  		char *kaddr;
>  		int deactivate = 1;
>  
> +		/* bernd-hack: give other processes a chance to run */
> +		if (count % 256 == 0) {
> +			schedule();
> +		}
> +
>  		/*
>  		 * Try to find the page in the cache. If it isn't there,
>  		 * allocate a free page.

--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.
Re: Break 2.4 VM in five easy steps
Hi!

> But if the page in memory is 'dirty', you can't be efficient with
> swapping *in* the page. The page on disk is invalid and should be
> released, or am I missing something?

Yes. You are missing fragmentation. This keeps it low.

								Pavel
Re: Break 2.4 VM in five easy steps
>I realize that assembly is platform-specific. Being
>that I use the IA32 class machine, that's what I
>would write for. Others who use other platforms could
>do the deed for their native language.

Meaning we'd still need a good C implementation anyway for the 75% of
platforms nobody's going to get around to writing an assembly
implementation for this year, so we might as well do that first, eh?

As for IA32 being everywhere, 16 bit 8086 was everywhere until 1990 or
so. And 64 bitness is right around the corner. (iTanic is a pointless
way of de-optimizing for memory bus bandwidth, which is your real
bottleneck and not whatever happens inside a chip you've clock
multiplied by a factor of 12 or more. But x86-64 looks seriously cool
if AMD would get off their rear and actually implement sledgehammer in
silicon within our lifetimes. And that's probably transmeta's way of
going 64 bit eventually too. (And that was obvious even BEFORE the
cross licensing agreement was announced.))

And interestingly, an assembly routine optimized for 386 assembly just
might get beaten by C code compiled for Athlon optimization. It's not
JUST "IA32". Memory management code probably has to know about the PAE
addressing extensions, different translation lookaside buffer versions,
and interacting with the wonderful wide world of DMA. Luckily in the
kernel we just don't do floating point (MMX/3DNow/whatever it was
they're so proud of in Pentium 4 whose acronym I've forgotten at the
moment. Not SLS, that was a linux distribution...)

If you're a dyed-in-the-wool assembly hacker, go help the GCC/EGCS
folks make a better compiler. They could use you. The kernel isn't the
place for assembly optimization.

>Being that most users are on the IA32 platform, I'm
>sure they wouldn't reject an assembly solution to
>this problem.

If it's unreadable to C hackers, so that nobody understands it, so that
it's black magic that positively invites subtle bugs from other code
that has to interface with it...
Yes they darn well WOULD reject it. Simplicity and clarity are actually
slightly MORE important than raw performance, since if you just wait
six months the midrange hardware gets 30% faster.

The ONLY assembly that's left in the kernel is the stuff that's
unavoidable, like boot sectors and the setup code that bootstraps the
first kernel init function in C, or perhaps the occasional driver
that's so amazingly timing dependent it's effectively real-time
programming at the nanosecond level. (And for most of those, they've
either faked a C solution or restricted the assembly to 5 lines in the
middle of a bunch of C code. Memo: this is the kind of thing where
profanity gets into kernel comments.) And of course there are a few
assembly macros for half-dozen line things like spinlocks that either
can't be done any other way or are real bottleneck cases where the cost
of the extra opacity (which is a major cost, that is definitely taken
into consideration) honestly is worth it.

> As for kernel acceptance, that's an
>issue for the political eggheads. Not my forte. :-)

The problem in this case is that an O(n^2) or worse algorithm is being
used. Converting it to assembly isn't going to fix something that gets
quadratically worse as memory grows; it just means that instead of
blowing up at 2 gigs it now blows up at 6 gigs. That's not a long term
solution. If eliminating 5 lines of assembly is a good thing, rewriting
an entire subsystem in assembly isn't going to happen. Trust us on this
one.

Rob
Re: VM Report was:Re: Break 2.4 VM in five easy steps
Mike Galbraith <[EMAIL PROTECTED]> writes:

> On Fri, 8 Jun 2001, John Stoffel wrote:
>
> > Mike> OK, riddle me this. If this test is a crummy test, just how is
> > Mike> it that I was able to warn Rik in advance that when 2.4.5 was
> > Mike> released, he should expect complaints? How did I _know_ that?
> > Mike> The answer is that I fiddle with Rik's code a lot, and I test
> > Mike> with this test because it tells me a lot. It may not tell you
> > Mike> anything, but it does me.
>
> > I never said it was a crummy test, please do not read more into my
> > words than was written. What I was trying to get across is that just
> > one test (such as a compile of the kernel) isn't perfect at showing
> > where the problems are with the VM sub-system.
>
> Hmm...
>
> Tobias> Could you please explain what is good about this test? I
> Tobias> understand that it will stress the VM, but will it do so in a
> Tobias> realistic and relevant way?
>
> > I agree, this isn't really a good test case. I'd rather see what
> > happens when you fire up a gimp session to edit an image which is
> > *almost* the size of RAM, or even just 50% the size of ram. Then how
> > does that affect your other processes that are running at the same
> > time?
>
> ...but anyway, yes, it is just one test from any number of possibles.

One great test that I'm using regularly to see what's goin' on is at
http://lxr.linux.no/. It is a cool utility to cross reference your
Linux kernel source tree, and in the meantime eat gobs of memory, do
lots of I/O, and burn many CPU cycles (all at the same time). An ideal
test, if you ask me, and if anybody has the time, it would be nice to
see different timing numbers when run on different kernels. Just make
sure you run it on the same kernel tree to make the results
reproducible. It has three passes, and the third one is the most
interesting one (use vmstat 1 to see why).
When run with a 64MB RAM configuration it would swap heavily, with
128MB somewhat, and at 192MB maybe not (depending on the other
applications running at the same time).

Try it, it is a nice utility, and a great test. :)

--
Zlatko
Re: Break 2.4 VM in five easy steps
On Sat, 9 Jun 2001, Rik van Riel wrote:

>> Why are half the people here trying to hide behind this diskspace
>> is cheap argument? If we rely on that, then Linux sucks shit.
>
>Never mind them, I haven't seen any of them contribute
>VM code, even ;)

Nor have I, but I think you guys working on it will get it cleaned up
eventually. What bugs me is people trying to pretend that it isn't
important to fix, or that spending money to get newer hardware is an
acceptable solution.

>OTOH, disk space _is_ cheap, so the other VM - performance
>related - VM bugs do have a somewhat higher priority at the
>moment.

Yes, it is cheap. It isn't always an acceptable workaround though, so
I'm glad you guys are working on it - even if we have to wait a bit. I
have faith in the system. ;o)

--
Mike A. Harris - Linux advocate - Open Source advocate
Opinions and viewpoints expressed are solely my own.
Re: Break 2.4 VM in five easy steps
On Wed, 6 Jun 2001, Mike A. Harris wrote:

> Why are half the people here trying to hide behind this diskspace
> is cheap argument? If we rely on that, then Linux sucks shit.

Never mind them, I haven't seen any of them contribute VM code, even ;)

OTOH, disk space _is_ cheap, so the other - performance related - VM
bugs do have a somewhat higher priority at the moment.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)
Re: Break 2.4 VM in five easy steps
On 6 Jun 2001, Eric W. Biederman wrote:

> Derek Glidden <[EMAIL PROTECTED]> writes:
>
> > The problem I reported is not that 2.4 uses huge amounts of swap but
> > that trying to recover that swap off of disk under 2.4 can leave the
> > machine in an entirely unresponsive state, while 2.2 handles
> > identical situations gracefully.
>
> The interesting thing from other reports is that it appears to be
> kswapd using up CPU resources.

This part is being worked on, expect a solution for this thing soon...

Rik
Re: Break 2.4 VM in five easy steps
On Wed, 6 Jun 2001, Derek Glidden wrote:

> Or are you saying that if someone is unhappy with a particular
> situation, they should just keep their mouth shut and accept it?

There are lots of options ...

1) wait until somebody fixes the problem
2) fix the problem yourself
3) start infinite flamewars and make developers so sick of the
   problem nobody wants to fix it
4) pay someone to fix the problem ;)

Rik
Re: Break 2.4 VM in five easy steps
On Wed, 6 Jun 2001, Sean Hunter wrote:

> A working VM would have several differences from what we have in my
> opinion, among which are:
>
> - It wouldn't require 8GB of swap on my large boxes
> - It wouldn't suffer from the "bounce buffer" bug on my large boxes
> - It wouldn't cause the disk drive on my laptop to be _constantly_
>   in use even when all I have done is spawned a shell session and
>   have no large apps or daemons running.
> - It wouldn't kill things saying it was OOM unless it was OOM.

I fully agree these problems need to be fixed. I just wish I had the
time to tackle all of them right now ;)

We should be close to getting the 3rd problem fixed, and the deadlock
problem with the bounce buffers seems to be fixed already. Getting
reclaiming of swap space and OOM fixed is a matter of time ... I hope
I'll have that time in the near future.

regards,

Rik
Re: VM Report was:Re: Break 2.4 VM in five easy steps
> reads the RTC device. The patched RTC driver can then
> measure the elapsed time between the interrupt and the
> read from userspace. Voila: latency.

Interesting, but I'm not sure there's much advantage over doing it
entirely in user-space with the normal /dev/rtc:

	http://brain.mcmaster.ca/~hahn/realfeel.c

It just prints out the raw time difference from when rtc should have
woken up the program. You can do your own histogram; for summary
purposes, something like stdev is probably best.
Re: Break 2.4 VM in five easy steps
On 6 Jun 2001, Miles Lane wrote:

>> Precisely. Saying 8x RAM doesn't change it either. Sometime
>> next week I'm going to purposefully put a new 60Gb disk in on a
>> separate controller as pure swap on top of 256Mb of RAM. My
>> guess is after bootup, and login, I'll have 48Gb of stuff in
>> swap "just in case".
>
>Mike and others, I am getting tired of your comments. Sheesh.

And I'm tired of having people tell me, or tell others, to buy a faster
computer or more RAM to work around a real technical problem. If a dual
1Ghz system with 1Gb of RAM and 60GB of disk space spread across 3 U160
drives is not a modern fast workstation, I don't know what is. My
300Mhz system, however, works on its own stuff, and doesn't need
upgrading.

>The various developers who actually work on the VM have already
>acknowledged the issues and are exploring fixes, including at
>least one patch that already exists.

Precisely, which underscores what I'm saying: the problem is
acknowledged, and being worked on by talented hackers who know what
they are doing - so why must people keep saying "get more disk space,
it is cheap" et al.? That is totally nonuseful advice in most cases.
Many have pointed out already, for example, how impossible that would
be in a 500 computer webserver farm.

>It seems clear that the uproar from the people who are having
>trouble with the new VM's handling of swap space have been
>heard and folks are going to fix these problems. It may not
>happen today or tomorrow, but soon. What the heck else do you
>want?

I agree with you. What I want, when someone talks about this stuff or
inquires about it, is for people to stop telling them that their
computer is out of date and they should upgrade it, as that is bogus
advice. "It worked fine yesterday, why should I upgrade" reigns
supreme.

>Making inflammatory remarks about the current situation does
>nothing to help get the problems fixed, it just wastes our time
>and bandwidth.
It's not like there is someone forcing you to read it, though.

>So please, if you have new facts that you want to offer that
>will help us characterize and understand these VM issues better
>or discover new problems, feel free to share them. But if you
>just want to rant, I, for one, would rather you didn't.

Point noted, however that isn't going to stop anyone from speaking
their personal opinion on things. Freedom of speech.

--
Mike A. Harris - Linux advocate - Open Source advocate
Opinions and viewpoints expressed are solely my own.
Re: VM Report was:Re: Break 2.4 VM in five easy steps
On Sat, 9 Jun 2001, Jonathan Morton wrote:

> >> On the subject of Mike Galbraith's kernel compilation test, how
> >> much physical RAM does he have for his machine, what type of CPU
> >> is it, and what (approximate) type of device does he use for swap?
> >
> >It's a PIII/500 with one ide disk.
>
> ...with how much RAM? That's the important bit.

Duh! :) I'm a dipstick. 128mb.

	-Mike
Re: VM Report was:Re: Break 2.4 VM in five easy steps
On Fri, 8 Jun 2001, Marcelo Tosatti wrote:

> On Fri, 8 Jun 2001, John Stoffel wrote:
>
> > More importantly, a *repeatable* set of tests is what is needed to
> > test the VM and get consistent results from run to run, so you can
> > see how your changes are impacting performance. The kernel compile
> > doesn't really have any one process grow to a large fraction of
> > memory, so dropping in a compile which *does* is a good thing.
>
> I agree with you.
>
> Mike, I'm sure you have noticed that the stock kernel gives much
> better results than mine or Jonathan's patch.

I noticed that Jonathan brought back waiting.. that (among others) made
me very interested.

> Now the stock kernel gives us crappy interactivity compared to my
> patch. (Note: my patch still does not give me the interactivity I
> want under high VM loads, but I hope to get there soon).

(And that's why) Among other things (yes, I do love throughput) I've
poked at the interactivity problem. I can't improve it anymore without
doing some strategic waiting :( I used to be able to help it a little
by doing a careful roll-up in scrub size as load builds.. trying to
smooth the transition from latency oriented to hammer down throughput.

> BTW, we are talking with the OSDL (http://www.osdlab.org) guys about
> a possibility to set up a test system which would run a variety of
> benchmarks to give us results for different kinds of workloads. If
> that ever happens, we'll probably get rid of most of these testing
> problems.

Excellent!

	-Mike
Re: VM Report was:Re: Break 2.4 VM in five easy steps
On Fri, 8 Jun 2001, Tobias Ringstrom wrote:

> On Fri, 8 Jun 2001, Mike Galbraith wrote:
> > On Fri, 8 Jun 2001, Tobias Ringstrom wrote:
> > > On Fri, 8 Jun 2001, Mike Galbraith wrote:
> > > > I gave this a shot at my favorite vm beater test (make -j30
> > > > bzImage) while testing some other stuff today.
> > >
> > > Could you please explain what is good about this test? I
> > > understand that it will stress the VM, but will it do so in a
> > > realistic and relevant way?
> >
> > Can you explain what is bad about this test? ;) It spins the same
> > VM wheels
>
> I think a load of ~30 is quite uncommon, and therefore it is unclear
> to me that it would be a test representative of most normal loads.

It's not supposed to be representative. It's supposed to take the box
rapidly (but not instantly) from idle through low->medium->high load
and maintain solid throughput.

> > as any other load does. What's the difference if I have a bunch of
> > httpd allocating or a bunch of cc1/as/ld? This load has a modest
> > cachable data set and is compute bound.. and above all gives very
> > repeatable results.
>
> Not a big difference. The difference I was thinking about is the
> difference between spawning lots of processes allocating, using and
> freeing lots of memory, compared to a case where you have a few
> processes touching a lot of already allocated pages in some pattern.
> I was wondering whether optimizing for your case would be good or bad
> for the other case. I know, I know, I should do more testing myself.
> And I should probably not ask you, since you really really like your
> test, and you will probably just say yes... ;-)

It's not a matter of optimizing for my case.. that would be horrible.
It's a matter of whether the vm is capable of rapid and correct
responses.

> At home, I'm running a couple of computers. One of them is a slow
> computer running Linux, serving mail, NFS, SMB, etc. I'm usually
> logged in on a couple of virtual consoles. On this machine, I do not
> mind if all shells, daemons and other idle processes are being
> swapped out in favor of disk cache for the NFS and SMB serving. In
> fact, that is a very good thing, and I want it that way.
>
> Another machine is my desktop machine. When using this machine, I
> really hate when my emacsen, browsers, xterms, etc are swapped out
> just to give me some stupid disk cache for my xmms or compilations.
> I do not care if a kernel compile is a little slower as long as my
> applications are snappy.
>
> How could Linux predict this? It is a matter of taste, IMHO.

I have no idea. It would be _wonderful_ if it could detect interactive
tasks and give them preferential treatment.

> > I use it to watch reaction to surge. I watch for the vm to build to
> > a solid maximum throughput without thrashing. That's the portion of
> > VM that I'm interested in, so that's what I test. Besides :) I
> > simply don't have the hardware to try to simulate hairy chested
> > server loads. There are lots of folks with hairy chested boxes..
> > they should test that stuff.
>
> Agreed. More testing is needed. Now if we would have those knobs and
> wheels to turn, we could perhaps also tune our systems to behave as
> we like, and submit that as well. Right now you need to be a kernel
> hacker, and see through all the magic with shm, mmap, a bunch of
> caches, page lists, etc. I'd give a lot for a nice picture (or state
> diagram) showing the lifetime of a page, but I have not found such a
> picture anywhere. Besides, the VM seems to change every new release
> anyway.
>
> > I've been repeating ~this test since 2.0 times, and have noticed a
> > 1:1 relationship. When I notice that my box is ~happy doing this
> > load test, I also notice very few VM gripes hitting the list.
>
> Ok, but as you say, we need more tests.
>
> > > Isn't the interesting case when you have a number of processes
> > > using lots of memory, but only a part of all that memory is being
> > > actively used, and that memory fits in RAM. In that case, the VM
> > > should make sure that the unused memory is swapped out. In RAM
> > > you should have the used memory, but also disk cache if there is
> > > any RAM left. Does the current VM handle this case fine yet?
> > > IMHO, this is the case most people care about. It is definitely
> > > the case I care about, at least. :-)
> >
> > The interesting case is _every_ case. Try seeing my particular test
> > as a simulation of a small classroom box with 30 students compiling
> > their assignments and it'll suddenly become quite realistic. You'll
> > notice by the numbers I post that I was very careful to not
> > overload the box in a ridiculous manner when selecting the total
> > size of the job.. it's just a heavily loaded box. This test does
> > not overload my IO resources, so it tests the VM's ability to
> > choose and move the right stuff at the right time to get the job
> > done with a minimum of additional overhead.

I did not understand th
Re: VM Report was:Re: Break 2.4 VM in five easy steps
>> On the subject of Mike Galbraith's kernel compilation test, how much
>> physical RAM does he have for his machine, what type of CPU is it,
>> and what (approximate) type of device does he use for swap? I'll see
>> if I can partially duplicate his results at this end. So far all my
>> tests have been done with a fast CPU - perhaps I should try the
>> P166/MMX or even try loading linux-pmac onto my 8100.
>
>It's a PIII/500 with one ide disk.

...with how much RAM? That's the important bit.

--
from:     Jonathan "Chromatix" Morton
mail:     [EMAIL PROTECTED]  (not for attachments)

The key to knowledge is not to rely on people to teach you it.

GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
Re: VM Report was:Re: Break 2.4 VM in five easy steps
On Sat, 9 Jun 2001, Jonathan Morton wrote:

> On the subject of Mike Galbraith's kernel compilation test, how much
> physical RAM does he have for his machine, what type of CPU is it,
> and what (approximate) type of device does he use for swap? I'll see
> if I can partially duplicate his results at this end. So far all my
> tests have been done with a fast CPU - perhaps I should try the
> P166/MMX or even try loading linux-pmac onto my 8100.

It's a PIII/500 with one ide disk.

	-Mike
Re: VM Report was:Re: Break 2.4 VM in five easy steps
On Fri, 8 Jun 2001, Mike Galbraith wrote:

> On Fri, 8 Jun 2001, John Stoffel wrote:
> > I agree, this isn't really a good test case. I'd rather see what
> > happens when you fire up a gimp session to edit an image which is
> > *almost* the size of RAM, or even just 50% the size of ram.
>
> OK, riddle me this. If this test is a crummy test, just how is it

Personally, I'd like to see BOTH of these tests, and many many more.
Preferably handed to the VM hackers as various colourful graphs that
allow even severely undercaffeinated hackers to see how things changed
for the good or the bad between kernel revisions.

cheers,

Rik
Re: VM Report was:Re: Break 2.4 VM in five easy steps
Jonathan Morton wrote:
>
> [ Re-entering discussion after too long a day and a long sleep... ]
>
> >> There is the problem in terms of some people want pure interactive
> >> performance, while others are looking for throughput over all
> >> else, but those are both extremes of the spectrum. Though I
> >> suspect raw throughput is the less wanted (in terms of numbers of
> >> systems) than keeping interactive response good during VM
> >> pressure.
> >
> >And this raises a very very important point: raw throughput wins
> >enterprise-like benchmarks, and the enterprise people are the ones
> >who pay most of the hackers here. (including me and Rik)
>
> Very true. As well as the fact that interactivity is much harder to
> measure. The question is, what is interactivity (from the kernel's
> perspective)? It usually means small(ish) processes with intermittent
> working-set and CPU requirements. These types of process can safely
> be swapped out when not immediately in use, but the kernel has to be
> able to page them in quite quickly when needed. Doing that under
> heavy load is very non-trivial.

For the low-latency stuff, latency can be defined as the worst-case
time to schedule a userspace process in response to an interrupt. That
metric is also appropriate in this case (latency equals interactivity),
although here you don't need to be so fanatical about the *worst
case*. A few scheduling blips here are less fatal.

I have tools to measure latency (aka interactivity). At

	http://www.uow.edu.au/~andrewm/linux/schedlat.html#downloads

there is a kernel patch called `rtc-debug' which causes the PC RTC to
generate a stream of interrupts. A user-space task called `amlat'
responds to those interrupts and reads the RTC device. The patched RTC
driver can then measure the elapsed time between the interrupt and the
read from userspace. Voila: latency.

When you close the RTC device (by killing amlat), the RTC driver will
print out a histogram of the latencies.
`amlat' at present gives itself SCHED_RR policy and runs under mlockall() - for your testing you'll need to delete those lines. So: simply apply rtc-debug, run `amlat' and kill it when you've finished the workload.

The challenge will be to relate the latency histogram to human-perceived interactivity. I'm not sure of the best way of doing that. Perhaps monitor the 90th percentile, and aim to keep it below 100 milliseconds. Also, `amlat' should do a bit of disk I/O as well.

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: VM Report was:Re: Break 2.4 VM in five easy steps
[ Re-entering discussion after too long a day and a long sleep... ]

>> There is the problem that some people want pure interactive
>> performance, while others are looking for throughput over all else,
>> but those are both extremes of the spectrum. Though I suspect
>> raw throughput is less wanted (in terms of numbers of systems)
>> than keeping interactive response good during VM pressure.
>
> And this raises a very very important point: raw throughput wins
> enterprise-like benchmarks, and the enterprise people are the ones who pay
> most of the hackers here. (including me and Rik)

Very true. As well as the fact that interactivity is much harder to measure. The question is, what is interactivity (from the kernel's perspective)? It usually means small(ish) processes with intermittent working-set and CPU requirements. These types of process can safely be swapped out when not immediately in use, but the kernel has to be able to page them in quite quickly when needed. Doing that under heavy load is very non-trivial.

It can also mean multimedia applications with a continuous (maybe small) working set, a continuous but not 100% CPU usage, and the special property that the user WILL notice if this process gets swapped out even briefly. mpg123 and XMMS fall into this category, and I sometimes tried running these alongside my compilation tests to see how they fared. I think I had it going fairly well towards the end, with mpg123 stuttering relatively rarely and briefly while VM load was high.

On the subject of Mike Galbraith's kernel compilation test, how much physical RAM does he have for his machine, what type of CPU is it, and what (approximate) type of device does he use for swap? I'll see if I can partially duplicate his results at this end. So far all my tests have been done with a fast CPU - perhaps I should try the P166/MMX or even try loading linux-pmac onto my 8100.
-- from: Jonathan "Chromatix" Morton mail: [EMAIL PROTECTED] (not for attachments) The key to knowledge is not to rely on people to teach you it. GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
Re: VM Report was:Re: Break 2.4 VM in five easy steps
On Fri, 8 Jun 2001, John Stoffel wrote:

> Marcelo> Now the stock kernel gives us crappy interactivity compared
> Marcelo> to my patch. (Note: my patch still does not give me the
> Marcelo> interactivity I want under high VM loads, but I hope to get
> Marcelo> there soon).
>
> This raises the important question, how can we objectively measure
> interactive response in the kernel and relate it to the user's
> perceived interactive response? If we could come up with some sort of
> testing system that would show us this, it would help a lot, since we
> could just have people run tests in a more automatic and repeatable
> manner.
>
> And I think it would also help us automatically tune the kernel, since
> it would have a knowledge of its own performance.
>
> There is the problem that some people want pure interactive
> performance, while others are looking for throughput over all else,
> but those are both extremes of the spectrum. Though I suspect
> raw throughput is less wanted (in terms of numbers of systems)
> than keeping interactive response good during VM pressure.

And this raises a very very important point: raw throughput wins enterprise-like benchmarks, and the enterprise people are the ones who pay most of the hackers here. (including me and Rik) We have to be careful about that.

> I have zero knowledge of how we could do this, but giving the kernel
> some counters, even if only for use during debugging runs, which would
> give us some objective feedback on performance would be a big win.
>
> Having people just send in reports of "I ran X,Y,Z and it was slow"
> doesn't help us, since it's so hard to re-create their environment so
> you can run tests against it.

Let's wait for some test system to be set up (e.g. the OSDL thing). Once that's done, I'm sure we will find out some way of doing it.

Well, good weekend for you too.
Re: VM Report was:Re: Break 2.4 VM in five easy steps
Marcelo> Now the stock kernel gives us crappy interactivity compared
Marcelo> to my patch. (Note: my patch still does not give me the
Marcelo> interactivity I want under high VM loads, but I hope to get
Marcelo> there soon).

This raises the important question: how can we objectively measure interactive response in the kernel and relate it to the user's perceived interactive response? If we could come up with some sort of testing system that would show us this, it would help a lot, since we could just have people run tests in a more automatic and repeatable manner.

And I think it would also help us automatically tune the kernel, since it would have a knowledge of its own performance.

There is the problem that some people want pure interactive performance, while others are looking for throughput over all else, but those are both extremes of the spectrum. Though I suspect raw throughput is less wanted (in terms of numbers of systems) than keeping interactive response good during VM pressure.

I have zero knowledge of how we could do this, but giving the kernel some counters, even if only for use during debugging runs, which would give us some objective feedback on performance would be a big win.

Having people just send in reports of "I ran X,Y,Z and it was slow" doesn't help us, since it's so hard to re-create their environment so you can run tests against it.

Anyway, enjoy the weekend all.

John

John Stoffel - Senior Unix Systems Administrator - Lucent Technologies
[EMAIL PROTECTED] - http://www.lucent.com - 978-952-7548
Re: VM Report was:Re: Break 2.4 VM in five easy steps
On Fri, 8 Jun 2001, John Stoffel wrote:

> Mike> OK, riddle me this. If this test is a crummy test, just how is
> Mike> it that I was able to warn Rik in advance that when 2.4.5 was
> Mike> released, he should expect complaints? How did I _know_ that?
> Mike> The answer is that I fiddle with Rik's code a lot, and I test
> Mike> with this test because it tells me a lot. It may not tell you
> Mike> anything, but it does me.
>
> I never said it was a crummy test, please do not read more into my
> words than was written. What I was trying to get across is that just
> one test (such as a compile of the kernel) isn't perfect at showing
> where the problems are with the VM sub-system.
>
> Jonathan Morton has been using another large compile to also test the
> sub-system, and it includes a compile which puts a large, single
> process pressure on the VM. I consider this to be a more
> representative test of how the VM deals with pressure.
>
> The kernel compile is an ok test of basic VM handling, but from what
> I've been hearing on linux-kernel and linux-mm is that the VM goes to
> crap when you have a mix of stuff running, and one (or more) processes
> starts up or grows much larger and starts impacting the system
> performance.
>
> I'm also not knocking your contributions to this discussion, so stop
> being so touchy. I was trying to contribute and say (albeit poorly)
> that a *mix* of tests is needed to test the VM.
>
> More importantly, a *repeatable* set of tests is what is needed to
> test the VM and get consistent results from run to run, so you can see
> how your changes are impacting performance. The kernel compile
> doesn't really have any one process grow to a large fraction of
> memory, so dropping in a compile which *does* is a good thing.

I agree with you. Mike, I'm sure you have noticed that the stock kernel gives much better results (in throughput) than mine or Jonathan's patch. But the stock kernel gives us crappy interactivity compared to my patch.
(Note: my patch still does not give me the interactivity I want under high VM loads, but I hope to get there soon).

BTW, we are talking with the OSDL (http://www.osdlab.org) guys about the possibility of setting up a test system which would run a variety of benchmarks to give us results for different kinds of workloads. If that ever happens, we'll probably get rid of most of these testing problems.
Re: VM Report was:Re: Break 2.4 VM in five easy steps
On Fri, 8 Jun 2001, Mike Galbraith wrote:

> On Fri, 8 Jun 2001, Tobias Ringstrom wrote:
> > On Fri, 8 Jun 2001, Mike Galbraith wrote:
> > > I gave this a shot at my favorite vm beater test (make -j30 bzImage)
> > > while testing some other stuff today.
> >
> > Could you please explain what is good about this test? I understand that
> > it will stress the VM, but will it do so in a realistic and relevant way?
>
> Can you explain what is bad about this test? ;) It spins the same VM wheels

I think a load of ~30 is quite uncommon, and therefore it is unclear to me that it would be a test that would be representative of most normal loads.

> as any other load does. What's the difference if I have a bunch of httpd
> allocating or a bunch of cc1/as/ld? This load has a modest cachable data
> set and is compute bound.. and above all gives very repeatable results.

Not a big difference. The difference I was thinking about is the difference between spawning lots of processes allocating, using and freeing lots of memory, compared to a case where you have a few processes touching a lot of already allocated pages in some pattern. I was wondering whether optimizing for your case would be good or bad for the other case. I know, I know, I should do more testing myself. And I should probably not ask you, since you really really like your test, and you will probably just say yes... ;-)

At home, I'm running a couple of computers. One of them is a slow computer running Linux, serving mail, NFS, SMB, etc. I'm usually logged in on a couple of virtual consoles. On this machine, I do not mind if all shells, daemons and other idle processes are being swapped out in favor of disk cache for the NFS and SMB serving. In fact, that is a very good thing, and I want it that way.

Another machine is my desktop machine. When using this machine, I really hate it when my emacsen, browsers, xterms, etc are swapped out just to give me some stupid disk cache for my xmms or compilations.
I do not care if a kernel compile is a little slower as long as my applications are snappy. How could Linux predict this? It is a matter of taste, IMHO.

> I use it to watch reaction to surge. I watch for the vm to build to a
> solid maximum throughput without thrashing. That's the portion of VM
> that I'm interested in, so that's what I test. Besides :) I simply don't
> have the hardware to try to simulate hairy chested server loads. There
> are lots of folks with hairy chested boxes.. they should test that stuff.

Agreed. More testing is needed. Now if we had those knobs and wheels to turn, we could perhaps also tune our systems to behave as we like, and submit that as well. Right now you need to be a kernel hacker, and see through all the magic with shm, mmap, a bunch of caches, page lists, etc. I'd give a lot for a nice picture (or state diagram) showing the lifetime of a page, but I have not found such a picture anywhere. Besides, the VM seems to change with every new release anyway.

> I've been repeating ~this test since 2.0 times, and have noticed a 1:1
> relationship. When I notice that my box is ~happy doing this load test,
> I also notice very few VM gripes hitting the list.

Ok, but as you say, we need more tests.

> > Isn't the interesting case when you have a number of processes using lots
> > of memory, but only a part of all that memory is being actively used, and
> > that memory fits in RAM. In that case, the VM should make sure that the
> > not used memory is swapped out. In RAM you should have the used memory,
> > but also disk cache if there is any RAM left. Does the current VM handle
> > this case fine yet? IMHO, this is the case most people care about. It is
> > definitely the case I care about, at least. :-)
>
> The interesting case is _every_ case. Try seeing my particular test as
> a simulation of a small classroom box with 30 students compiling their
> assignments and it'll suddenly become quite realistic.
> You'll notice by the numbers I post that I was very careful to not
> overload the box in a ridiculous manner when selecting the total size of
> the job.. it's just a heavily loaded box. This test does not overload my
> IO resources, so it tests the VM's ability to choose and move the right
> stuff at the right time to get the job done with a minimum of additional
> overhead.

I did not understand those numbers when I saw them the first time. Now, I must say that your test does not look as silly as it did before.

> The current VM handles things generally well imho, but has problems
> regulating itself under load. My test load hits the VM right in its
> weakest point (not _that_ weak, but..) by starting at zero and building
> rapidly to max.. and keeping it _right there_.
>
> > I'm not saying that it's a completely uninteresting case when your active
> > memory is bigger than your RAM of course, but perhaps there should be other
> > algorithms handling that case, such as putting some of the swapping
> > processes to sleep for some time, especially if you have lots of processes
> > competing for the memory.
Re: VM Report was:Re: Break 2.4 VM in five easy steps
On Fri, 8 Jun 2001, John Stoffel wrote:

> Mike> OK, riddle me this. If this test is a crummy test, just how is
> Mike> it that I was able to warn Rik in advance that when 2.4.5 was
> Mike> released, he should expect complaints? How did I _know_ that?
> Mike> The answer is that I fiddle with Rik's code a lot, and I test
> Mike> with this test because it tells me a lot. It may not tell you
> Mike> anything, but it does me.
>
> I never said it was a crummy test, please do not read more into my
> words than was written. What I was trying to get across is that just
> one test (such as a compile of the kernel) isn't perfect at showing
> where the problems are with the VM sub-system.

Hmm...

Tobias> Could you please explain what is good about this test? I
Tobias> understand that it will stress the VM, but will it do so in a
Tobias> realistic and relevant way?

I agree, this isn't really a good test case. I'd rather see what happens when you fire up a gimp session to edit an image which is *almost* the size of RAM, or even just 50% the size of ram. Then how does that affect your other processes that are running at the same time?

...but anyway, yes, it's just one test from any number of possibles.

> Jonathan Morton has been using another large compile to also test the
> sub-system, and it includes a compile which puts a large, single
> process pressure on the VM. I consider this to be a more
> representative test of how the VM deals with pressure.

What does 'more representative' mean, given that the VM must react to every situation it runs into?

> The kernel compile is an ok test of basic VM handling, but from what

Now we're communicating. I never said it was more than that ;-)

> I've been hearing on linux-kernel and linux-mm is that the VM goes to
> crap when you have a mix of stuff running, and one (or more) processes
> starts up or grows much larger and starts impacting the system
> performance.
> I'm also not knocking your contributions to this discussion, so stop
> being so touchy. I was trying to contribute and say (albeit poorly)
> that a *mix* of tests is needed to test the VM.

Yes, more people need to test. I don't need to do all of those other tests (I don't have the right toys); more people need to do repeatable tests.

> More importantly, a *repeatable* set of tests is what is needed to
> test the VM and get consistent results from run to run, so you can see
> how your changes are impacting performance. The kernel compile
> doesn't really have any one process grow to a large fraction of
> memory, so dropping in a compile which *does* is a good thing.

I know I'm only watching basic functionality. I'm watching basic functionality with one very consistent test run very consistently.

-Mike
Re: VM Report was:Re: Break 2.4 VM in five easy steps
Mike> OK, riddle me this. If this test is a crummy test, just how is Mike> it that I was able to warn Rik in advance that when 2.4.5 was Mike> released, he should expect complaints? How did I _know_ that? Mike> The answer is that I fiddle with Rik's code a lot, and I test Mike> with this test because it tells me a lot. It may not tell you Mike> anything, but it does me. I never said it was a crummy test, please do not read more into my words than was written. What I was trying to get across is that just one test (such as a compile of the kernel) isn't perfect at showing where the problems are with the VM sub-system. Jonathan Morton has been using another large compile to also test the sub-system, and it includes a compile which puts a large, single process pressure on the VM. I consider this to be a more representative test of how the VM deals with pressure. The kernel compile is an ok test of basic VM handling, but from what I've been hearing on linux-kernel and linux-mm is that the VM goes to crap when you have a mix of stuff running, and one (or more) processes starts up or grows much larger and starts impacting the system performance. I'm also not knocking your contributions to this discussion, so stop being so touchy. I was trying to contribute and say (albeit poorly) that a *mix* of tests is needed to test the VM. More importantly, a *repeatable* set of tests is what is needed to test the VM and get consistent results from run to run, so you can see how your changes are impacting performance. The kernel compile doesn't really have any one process grow to a large fraction of memory, so dropping in a compile which *does* is a good thing. 
John

John Stoffel - Senior Unix Systems Administrator - Lucent Technologies
[EMAIL PROTECTED] - http://www.lucent.com - 978-952-7548
Re: VM Report was:Re: Break 2.4 VM in five easy steps
On Fri, 8 Jun 2001, John Stoffel wrote:

> > "Tobias" == Tobias Ringstrom <[EMAIL PROTECTED]> writes:
>
> Tobias> On Fri, 8 Jun 2001, Mike Galbraith wrote:
>
> >> I gave this a shot at my favorite vm beater test (make -j30 bzImage)
> >> while testing some other stuff today.
>
> Tobias> Could you please explain what is good about this test? I
> Tobias> understand that it will stress the VM, but will it do so in a
> Tobias> realistic and relevant way?
>
> I agree, this isn't really a good test case. I'd rather see what
> happens when you fire up a gimp session to edit an image which is
> *almost* the size of RAM, or even just 50% the size of ram. Then how
> does that affect your other processes that are running at the same
> time?

OK, riddle me this. If this test is a crummy test, just how is it that I was able to warn Rik in advance that when 2.4.5 was released, he should expect complaints? How did I _know_ that? The answer is that I fiddle with Rik's code a lot, and I test with this test because it tells me a lot. It may not tell you anything, but it does me.

-Mike
Re: VM Report was:Re: Break 2.4 VM in five easy steps
On Fri, 8 Jun 2001, Tobias Ringstrom wrote:

> On Fri, 8 Jun 2001, Mike Galbraith wrote:
> > I gave this a shot at my favorite vm beater test (make -j30 bzImage)
> > while testing some other stuff today.
>
> Could you please explain what is good about this test? I understand that
> it will stress the VM, but will it do so in a realistic and relevant way?

Can you explain what is bad about this test? ;) It spins the same VM wheels as any other load does. What's the difference if I have a bunch of httpd allocating or a bunch of cc1/as/ld? This load has a modest cachable data set and is compute bound.. and above all gives very repeatable results.

I use it to watch reaction to surge. I watch for the vm to build to a solid maximum throughput without thrashing. That's the portion of VM that I'm interested in, so that's what I test. Besides :) I simply don't have the hardware to try to simulate hairy chested server loads. There are lots of folks with hairy chested boxes.. they should test that stuff.

I've been repeating ~this test since 2.0 times, and have noticed a 1:1 relationship. When I notice that my box is ~happy doing this load test, I also notice very few VM gripes hitting the list.

> Isn't the interesting case when you have a number of processes using lots
> of memory, but only a part of all that memory is being actively used, and
> that memory fits in RAM. In that case, the VM should make sure that the
> not used memory is swapped out. In RAM you should have the used memory,
> but also disk cache if there is any RAM left. Does the current VM handle
> this case fine yet? IMHO, this is the case most people care about. It is
> definitely the case I care about, at least. :-)

The interesting case is _every_ case. Try seeing my particular test as a simulation of a small classroom box with 30 students compiling their assignments and it'll suddenly become quite realistic.
You'll notice by the numbers I post that I was very careful not to overload the box in a ridiculous manner when selecting the total size of the job.. it's just a heavily loaded box. This test does not overload my IO resources, so it tests the VM's ability to choose and move the right stuff at the right time to get the job done with a minimum of additional overhead.

The current VM handles things generally well imho, but has problems regulating itself under load. My test load hits the VM right in its weakest point (not _that_ weak, but..) by starting at zero and building rapidly to max.. and keeping it _right there_.

> I'm not saying that it's a completely uninteresting case when your active
> memory is bigger than your RAM of course, but perhaps there should be other
> algorithms handling that case, such as putting some of the swapping
> processes to sleep for some time, especially if you have lots of processes
> competing for the memory. I may be wrong, but it seems to me that your
> testcase falls into this second category (also known as thrashing).

Thrashing? Let's look at some numbers. (not the ugly ones, the ~ok ones ;)

real    9m12.198s    (make -j 30 bzImage)
user    7m41.290s
sys     0m34.840s

user  : 0:07:47.69  76.8%   page in : 452632
nice  : 0:00:00.00   0.0%   page out: 399847
system: 0:01:17.08  12.7%   swap in :  75338
idle  : 0:01:03.97  10.5%   swap out:  88291

real    8m6.994s     (make bzImage)
user    7m34.350s
sys     0m26.550s

user  : 0:07:37.52  78.4%   page in :  90546
nice  : 0:00:00.00   0.0%   page out:  18164
system: 0:01:26.13  14.8%   swap in :      1
idle  : 0:00:39.69   6.8%   swap out:      0

...look at cpu utilization. One minute + tiny change to complete the large job vs the small (VM footprint) job. The box is not thrashing, it's working its little silicon butt off. What I'm testing is the VM's ability to handle load without thrashing so badly that it loses throughput bigtime or stalls itself.. its ability to regulate itself.
I consider a minute and a half to be ~acceptable, a minute to be good, and 30 seconds to be excellent. That's just my own little VM performance thermometer.

> And at last, a humble request: Every problem I've had with the VM has been
> that it either swapped out too many processes and used too much cache, or
> the other way around. I'd really enjoy a way to tune this behaviour, if
> possible.

Tunables aren't really practical in VM (imho). If there were a dozen knobs, you'd have to turn a dozen knobs a dozen times a day. VM has to be self-regulating.

In case you can't tell (by the length of this reply), I like my favorite little generic throughput test a LOT :-)

-Mike
Re: VM Report was:Re: Break 2.4 VM in five easy steps
> "Tobias" == Tobias Ringstrom <[EMAIL PROTECTED]> writes:

Tobias> On Fri, 8 Jun 2001, Mike Galbraith wrote:

>> I gave this a shot at my favorite vm beater test (make -j30 bzImage)
>> while testing some other stuff today.

Tobias> Could you please explain what is good about this test? I
Tobias> understand that it will stress the VM, but will it do so in a
Tobias> realistic and relevant way?

I agree, this isn't really a good test case. I'd rather see what happens when you fire up a gimp session to edit an image which is *almost* the size of RAM, or even just 50% the size of ram. Then how does that affect your other processes that are running at the same time?

This testing could even be automated with the script-foo stuff to get consistent results across runs, which is the prime requirement of any sort of testing.

On another issue, in swap.c we have two defines for buffer_mem and page_cache, but the first maxes out at 60%, while the cache maxes out at 75%. Shouldn't they both be lower numbers? Or at least equally sized? I've set my page_cache maximum to be 60; I'll be trying to test it over the weekend, but good weather will keep me outside doing other stuff...

Thanks,
John

John Stoffel - Senior Unix Systems Administrator - Lucent Technologies
[EMAIL PROTECTED] - http://www.lucent.com - 978-952-7548
Re: VM Report was:Re: Break 2.4 VM in five easy steps
On Fri, 8 Jun 2001, Mike Galbraith wrote:

> I gave this a shot at my favorite vm beater test (make -j30 bzImage)
> while testing some other stuff today.

Could you please explain what is good about this test? I understand that it will stress the VM, but will it do so in a realistic and relevant way?

Isn't the interesting case when you have a number of processes using lots of memory, but only a part of all that memory is being actively used, and that memory fits in RAM? In that case, the VM should make sure that the unused memory is swapped out. In RAM you should have the used memory, but also disk cache if there is any RAM left. Does the current VM handle this case fine yet? IMHO, this is the case most people care about. It is definitely the case I care about, at least. :-)

I'm not saying that it's a completely uninteresting case when your active memory is bigger than your RAM of course, but perhaps there should be other algorithms handling that case, such as putting some of the swapping processes to sleep for some time, especially if you have lots of processes competing for the memory. I may be wrong, but it seems to me that your testcase falls into this second category (also known as thrashing).

And at last, a humble request: Every problem I've had with the VM has been that it either swapped out too many processes and used too much cache, or the other way around. I'd really enjoy a way to tune this behaviour, if possible.

/Tobias
Re: VM Report was:Re: Break 2.4 VM in five easy steps
On Fri, 8 Jun 2001, Jonathan Morton wrote:

> http://www.chromatix.uklinux.net/linux-patches/vm-update-2.patch
>
> Try this. I can't guarantee it's SMP-safe yet (I'm leaving the gurus to
> that, but they haven't told me about any errors in the past hour so I'm
> assuming they aren't going to find anything glaringly wrong...), but you
> might like to see if your performance improves with it. It also fixes the
> OOM-killer bug, which you refer to above.
>
> Some measurements, from my own box (1GHz Athlon, 256Mb RAM):
>
> For the following benchmarks, physical memory availability was reduced
> according to the parameter in the left column. The benchmark is the
> wall-clock time taken to compile MySQL.
>
> mem=   2.4.5     earlier tweaks   now
> 48M    8m30s     6m30s            5m58s
> 32M    unknown   2h15m            12m34s
>
> The following was performed with all 256Mb RAM available. This is
> compilation of MySQL using make -j 15.
>
> kernel:     2.4.5   now
> time:       6m30s   6m15s
> peak swap:  190M    70M
>
> For the following test, the 256Mb swap partition on my IDE drive was
> disabled and replaced with a 1Gb swapfile on my Ultra160 SCSI drive. This
> is compilation of MySQL using make -j 20.
>
> kernel:     2.4.5   now
> time:       7m20s   6m30s
> peak swap:  370M    254M
>
> Draw your own conclusions. :)

(ok ;)

Hi,

I gave this a shot at my favorite vm beater test (make -j30 bzImage) while testing some other stuff today. Seven identical runs, six slightly different kernels plus yours.
real    11m23.522s   2.4.5.vm-update-2
user     7m59.170s
sys      0m37.030s
user  : 0:08:07.06  65.6%   page in : 642402
nice  : 0:00:00.00   0.0%   page out: 676820
system: 0:02:09.44  17.4%   swap in : 105965
idle  : 0:02:05.66  16.9%   swap out: 162603

real    10m9.512s    2.4.5.virgin
user     7m55.520s
sys      0m35.460s
user  : 0:08:02.66  72.2%   page in : 535186
nice  : 0:00:00.00   0.0%   page out: 377992
system: 0:01:37.78  14.6%   swap in :  99445
idle  : 0:01:28.14  13.2%   swap out:  81926

real    10m48.939s   2.4.5.virgin+reclaim.marcelo
user     7m54.960s
sys      0m36.240s
user  : 0:08:02.33  68.0%   page in : 566239
nice  : 0:00:00.00   0.0%   page out: 431874
system: 0:01:56.02  16.4%   swap in : 108633
idle  : 0:01:50.61  15.6%   swap out:  96415

real    9m54.466s    2.4.5.virgin+reclaim.mike (icky 'bleeder valve')
user     7m57.370s
sys      0m35.890s
user  : 0:08:04.74  74.1%   page in : 527678
nice  : 0:00:00.00   0.0%   page out: 405259
system: 0:01:12.01  11.0%   swap in :  98616
idle  : 0:01:37.47  14.9%   swap out:  91492

real    9m12.198s    2.4.5.tweak
user     7m41.290s
sys      0m34.840s
user  : 0:07:47.69  76.8%   page in : 452632
nice  : 0:00:00.00   0.0%   page out: 399847
system: 0:01:17.08  12.7%   swap in :  75338
idle  : 0:01:03.97  10.5%   swap out:  88291

real    9m41.563s    2.4.5.tweak+reclaim.marcelo
user     7m59.880s
sys      0m34.690s
user  : 0:08:07.22  73.4%   page in : 515433
nice  : 0:00:00.00   0.0%   page out: 545762
system: 0:01:35.34  14.4%   swap in :  88425
idle  : 0:01:21.11  12.2%   swap out: 125967

real    9m47.682s    2.4.5.tweak+reclaim.mike
user     8m2.190s
sys      0m34.550s
user  : 0:08:09.57  75.7%   page in : 513166
nice  : 0:00:00.00   0.0%   page out: 473539
system: 0:01:20.27  12.4%   swap in :  83127
idle  : 0:01:16.89  11.9%   swap out: 108886

Conclusion: your patch hits the cache too hard and pays through the nose for doing so.. at least under this heavyweight load it does.
Re: Break 2.4 VM in five easy steps
On Thu, Jun 07, 2001 at 03:38:35PM -0600, Brian D Heaton wrote:
> Maybe I'm missing something. I just tried this (with the 262144k/1
> and 128k/2048 params) and my results are within .1s of each other. This is
> without any special patches. Am I doing something wrong?

Oh, I don't mean the time elapsed; it's that nothing _else_ can happen
while dd is hogging the kernel.

> Oh yes -
>
> SMP - dual PIII866/133

Yes, this is what you are doing wrong ;) My hypothesis is that in your
case, one CPU gets pegged copying pages from /dev/zero into dd's buffer,
while the other CPU can do things like updating mouse cursors, running
setiathome, etc. What happens if you do *two* dd tortures with huge
buffers at the same time? And then, please don't happen to have a quad
box!

I don't know if my symptom (loss of interactivity on heavy writing) is
related to swapoff -a causing the same symptom on deeply-swapped boxes.

BTW keep in mind my 4-liner is based more on voodoo than on analysis.

Bernd Jendrissek
Re: Break 2.4 VM in five easy steps
On my everyday desktop workstation (PII 350) I have 64MB of RAM and use
300MB of swap, 150MB on each hard disk. After upgrading to 2.4, and
maintaining the same set of applications (KDE, Netscape & friends), the
machine's performance is _definitely_ much worse, in terms of both
responsiveness and throughput. Most applications just take much longer to
load, and once you've done something that required more memory for a while
(like compiling a kernel, opening a large JPEG in gimp, etc.) it takes a
long time to come back to normal. Strangely, with 2.4 the workstation just
feels as if someone stole the 64MB DIMM and put in a 16MB one!!

One thing I find strange is that with 2.4, if you run top or something
similar, you notice that the memory allocated for cache is almost always
more than half of total RAM. I don't remember seeing this with the 2.2
kernel series...

Anyway, I think there is something really broken with respect to the 2.4
VM. It is just NOT acceptable that when you run the same set of apps and
the same type of work and you upgrade your kernel, your hardware is no
longer up to the job, when it fitted perfectly well before. This is just
the MS way of solving problems.

Best regards

Claudio Martins

On Wed, Jun 06, 2001 at 06:58:39AM -0700, Gerhard Mack wrote:
>
> I have several boxes with 2x ram as swap and performance still sucks
> compared to 2.2.17.
>
Re: VM Report was:Re: Break 2.4 VM in five easy steps
At 12:29 am +0100 8/6/2001, Shane Nay wrote:
>(VM report at Marcelo Tosatti's request. He has mentioned that rather than
>complaining about the VM, people should mention what their experiences
>were. I have tried to do so in the way that he asked.)

>> By performance you mean interactivity or throughput?
>
>Interactivity. I don't have any throughput needs to speak of.
>
>I just ran a barrage of tests on my machine, and the smallest it would ever
>make the cache was 16M; it would prefer to kill processes rather than make
>the cache smaller than that.

http://www.chromatix.uklinux.net/linux-patches/vm-update-2.patch

Try this. I can't guarantee it's SMP-safe yet (I'm leaving the gurus to
that, but they haven't told me about any errors in the past hour so I'm
assuming they aren't going to find anything glaringly wrong...), but you
might like to see if your performance improves with it. It also fixes the
OOM-killer bug, which you refer to above.

Some measurements, from my own box (1GHz Athlon, 256Mb RAM):

For the following benchmarks, physical memory availability was reduced
according to the parameter in the left column. The benchmark is the
wall-clock time taken to compile MySQL.

mem=   2.4.5     earlier tweaks   now
48M    8m30s     6m30s            5m58s
32M    unknown   2h15m            12m34s

The following was performed with all 256Mb RAM available. This is
compilation of MySQL using make -j 15.

kernel:    2.4.5   now
time:      6m30s   6m15s
peak swap: 190M    70M

For the following test, the 256Mb swap partition on my IDE drive was
disabled and replaced with a 1Gb swapfile on my Ultra160 SCSI drive. This
is compilation of MySQL using make -j 20.

kernel:    2.4.5   now
time:      7m20s   6m30s
peak swap: 370M    254M

Draw your own conclusions. :)

--
from: Jonathan "Chromatix" Morton
mail: [EMAIL PROTECTED] (not for attachments)

The key to knowledge is not to rely on people to teach you it.

GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V?
PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
VM Report was:Re: Break 2.4 VM in five easy steps
(VM report at Marcelo Tosatti's request. He has mentioned that rather than
complaining about the VM, people should mention what their experiences
were. I have tried to do so in the way that he asked.)

> 1) Describe what you're running. (your workload)

A lot of daemons, all on a private network so there is no throughput load
on them. About 13 rxvt's, freeamp actively playing music at all times,
xemacs with 25 active buffers, a few instances of vi, opera, no "desktop
env", just windowmaker. (Though I have a few KDE2 apps open, and one or
two GTK based apps open, so lots of library code swapping in and out I
imagine.)

Now what I've noticed lately is this: with 2.4.2 my machine would lock
quite frequently when I was compiling code and had other apps that were
allocating memory. With 2.4.5 I haven't had that behaviour, but I've been
much lighter on my machine. (I was doing full toolchain builds with 2.4.2
when I had the real problems.) But processes were still running when the
machine would lock..., like the mp3 player was still playing, I noticed
one time.

With 2.4.5 (not -ac) I haven't had any deadlocks, but the system seems
very sluggish at acute moments. While doing absolutely nothing processor
intensive (I've been loading up top and ps'ing with regularity when this
happens, looking for kswapd going crazy), when I switch between workspaces
the refresh is much more sluggish on occasion, like I can watch windows
appear. Almost like a micro freeze really. (AMD T-Bird 1.333GHz 256MB-DDR)

> 2) Describe what you're feeling. (eg "interactivity is crap when I run
> this or that thing", etc)

Freeing memory takes *forever*, but I think that's a function of how I'm
allocating in this polygon rendering routine I'm working on. It literally
sucks up vast numbers of cycles and makes picogui totally unusable. But I
think this is unrelated to the kernel..., I think that's just because I
haven't implemented re-use in memory structures for the polygon routine.
(It's malloc/freeing massive numbers of small chunks of memory rather than
doing its own memory management, probably related to glibc memory
organization.)

Here's a vmstat line after 8 days of uptime and before contrived mem tests:

 procs                    memory     swap        io    system       cpu
 r  b  w  swpd  free  buff  cache   si  so   bi  bo   in  cs   us sy id
 1  0  0     0  3056  7856 121872    0   0    7   4   3716      1  0 40

> If we need more info than that I'll request in private.
>
> Also send these reports to the linux-mm list, so other VM hackers can also
> get those reports and we avoid traffic on lk.

> By performance you mean interactivity or throughput?

Interactivity. I don't have any throughput needs to speak of.

I just ran a barrage of tests on my machine, and the smallest it would
ever make the cache was 16M; it would prefer to kill processes rather than
make the cache smaller than that.

Contrived stressor program: (pseudo code)

fork(); fork(); fork(); fork(); //16 total processes
for (i=0;i

> Just do what I described above.

Done :).

Thanks,
Shane Nay.
Re: Break 2.4 VM in five easy steps
On Thu, 7 Jun 2001, Shane Nay wrote:

> On Thursday 07 June 2001 13:00, Marcelo Tosatti wrote:
> > On Thu, 7 Jun 2001, Shane Nay wrote:
> > > (Oh, BTW, I really appreciate the work that people have done on the VM,
> > > but folks that are just talking..., well, think clearly before you
> > > impact other people that are writing code.)
> >
> > If all the people talking were reporting results we would be really happy.
> >
> > Seriously, we really lack VM reports.
>
> Okay, I've had some problems with the VM on my machine; what is the most
> useful way to compile reports for you?

1) Describe what you're running. (your workload)

2) Describe what you're feeling. (eg "interactivity is crap when I run
this or that thing", etc)

If we need more info than that I'll request in private.

Also send these reports to the linux-mm list, so other VM hackers can also
get those reports and we avoid traffic on lk.

> I have modified the kernel for a few different ports fixing bugs, and
> device drivers, etc., but the VM is all Greek to me; I can just see
> that caching is hyper aggressive and doesn't look like it's going back
> to the pool..., which results in sluggish performance.

By performance you mean interactivity or throughput?

> Now I know from the work that I've done that anecdotal information is
> almost never even remotely useful.

If we need more info, we will request.

> Therefore is there any body of information that I can read up on to
> create a useful set of data points for you or other VM hackers to
> look at? (Or maybe some report in the past that you thought was
> especially useful?)

Just do what I described above.

Thanks
Re: Break 2.4 VM in five easy steps
"Eric W. Biederman" wrote: > LA Walsh <[EMAIL PROTECTED]> writes: > > > Now for whatever reason, since 2.4, I consistently use at least > > a few Mb of swap -- stands at 5Meg now. Weird -- but I notice things > > like nscd running 7 copies that take 72M. Seems like overkill for > > a laptop. > > So the question becomes why you are seeing an increased swap usage. > Currently there are two canidates in the 2.4.x code path. > > 1) Delayed swap deallocation, when a program exits after it >has gone into swap it's swap usage is not freed. Ouch. --- Double ouch. Swap is backing a non-existent program? > > > 2) Increased tenacity of swap caching. In particular in 2.2.x if a page >that was in the swap cache was written to the the page in the swap >space would be removed. In 2.4.x the location in swap space is >retained with the goal of getting more efficient swap-ins. But if the page in memory is 'dirty', you can't be efficient with swapping *in* the page. The page on disk is invalid and should be released, or am I missing something? > Neither of the known canidates from increasing the swap load applies > when you aren't swapping in the first place. They may aggrevate the > usage of swap when you are already swapping but they do not cause > swapping themselves. This is why the intial recommendation for > increased swap space size was made. If you are swapping we will use > more swap. > > However what pushes your laptop over the edge into swapping is an > entirely different question. And probably what should be solved. On my laptop, it is insignificant and to my knowledge has no measurable impact. It seems like there is always 3-5 Meg used in swap no matter what's running (or not) on the system. > > I think that is the point -- it was supported in 2.2, it is, IMO, > > a serious regression that it is not supported in 2.4. > > The problem with this general line of arguing is that it lumps a whole > bunch of real issues/regressions into one over all perception. 
> Since there are multiple reasons people are seeing problems, they need
> to be tracked down with specifics.
---
Uhhh, yeah, sorta -- it's addressing the statement that a "new requirement
of 2.4 is to have double the swap space". If everyone agrees that's a
problem, then yes, we can go into the specifics of what is causing or
contributing to the problem. It's getting past the attitude of some people
that 2xMem for swap is somehow "normal and acceptable -- deal with it".

In my case, it seems like 10Mb of swap would be all that would generally
be used (I don't think I've ever seen swap usage over 7Mb) on a 512M
system. To be told "oh, you're wrong, you *should* have 1Gig or you are
operating in an 'unsupported' or non-standard configuration" -- I find
that very user-unfriendly.

> The swapoff case comes down to dead swap pages in the swap cache,
> which greatly increases the number of swap pages and slows the system
> down; but since these pages are trivial to free we don't generate any
> I/O, so we don't wait for I/O and thus never enter the scheduler,
> making nothing else in the system runnable.
---
I haven't ever *noticed* this on my machine, but that could be because
there isn't much in swap to begin with? Could be I was just blissfully
ignorant of the time it took to do a swapoff. Hmmm, let's see... Just
tried it. I didn't get a total lockup, but cursor movement was definitely
jerky:

> time sudo swapoff -a
real    0m10.577s
user    0m0.000s
sys     0m9.430s

Looking at vmstat, the needed space was taken mostly out of the page cache
(86M->81.8M) and about 700K each out of free and buff.

> Your case is significantly different. I don't know if you are seeing
> any issues with swapping at all. With a 5M usage it may simply be
> totally unused pages being pushed out to the swap space.
---
Probably -- I guess the page cache and disk buffers put enough pressure to
push some things off to swap.

-linda
--
The above thoughts and writings are my own. They may have nothing to do
with the opinions of my employer. :-)
L A Walsh | Senior MTS, Trust Tech, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
Re: Break 2.4 VM in five easy steps
Uh, last I checked, on my Linux-based embedded device I didn't want to
swap to flash. Hmm.., now why was that..., oh, that's right, it's *much*
more expensive than memory, oh yes, and it actually gets FRIED when you
write to a block more than 100k times. Oh, what was that other thing...,
oh yes, and it's SOLDERED ON THE BOARD. Damn..., guess I just lost a grand
or so.

Seriously folks, Linux isn't just for big webservers...

Thanks,
Shane Nay.

(Oh, BTW, I really appreciate the work that people have done on the VM,
but folks that are just talking..., well, think clearly before you impact
other people that are writing code.)

On Wednesday 06 June 2001 02:57, Dr S.M. Huen wrote:
> On Wed, 6 Jun 2001, Sean Hunter wrote:
> > For large memory boxes, this is ridiculous. Should I have 8GB of swap?
>
> Do I understand you correctly?
> ECC grade SDRAM for your 8GB server costs £335 per GB as 512MB sticks even
> at today's silly prices (Crucial). Ultra160 SCSI costs £8.93/GB as 73GB
> drives.
>
> It will cost you 19x as much to put the RAM in as to put the
> developers' recommended amount of swap space in to back up that RAM. The
> developers gave their reasons for this design some time ago, and if the
> ONLY problem was that it required you to allocate more swap, why should
> it be a priority item to fix it for those that refuse to do so? By all
> means fix it urgently where it doesn't work when used as advised, but
> demanding priority for fixing a problem encountered when a user refuses
> to use it in the manner specified seems very unreasonable. If you can
> afford 4GB RAM, you certainly can afford 8GB swap.
Re: Break 2.4 VM in five easy steps
On Thursday 07 June 2001 13:00, Marcelo Tosatti wrote:
> On Thu, 7 Jun 2001, Shane Nay wrote:
> > (Oh, BTW, I really appreciate the work that people have done on the VM,
> > but folks that are just talking..., well, think clearly before you impact
> > other people that are writing code.)
>
> If all the people talking were reporting results we would be really happy.
>
> Seriously, we really lack VM reports.

Okay, I've had some problems with the VM on my machine; what is the most
useful way to compile reports for you? I have modified the kernel for a
few different ports fixing bugs, and device drivers, etc., but the VM is
all Greek to me. I can just see that caching is hyper aggressive and
doesn't look like it's going back to the pool..., which results in
sluggish performance.

Now I know from the work that I've done that anecdotal information is
almost never even remotely useful. Therefore is there any body of
information that I can read up on to create a useful set of data points
for you or other VM hackers to look at? (Or maybe some report in the past
that you thought was especially useful?)

Thank You,
Shane Nay.

(I have in the past had many problems with the VM on embedded machines as
well, but I'm not actively working on any right this second..., though my
Psion is sitting next to me begging for me to run some VM tests on it :)
Re: Break 2.4 VM in five easy steps
On Thu, 7 Jun 2001, Shane Nay wrote:

> (Oh, BTW, I really appreciate the work that people have done on the VM, but
> folks that are just talking..., well, think clearly before you impact other
> people that are writing code.)

If all the people talking were reporting results we would be really happy.

Seriously, we really lack VM reports.
Re: Break 2.4 VM in five easy steps
On 07 Jun 2001 11:49:47 -0400, Derek Glidden wrote:
> Miles Lane wrote:
> >
> > So please, if you have new facts that you want to offer that
> > will help us characterize and understand these VM issues better
> > or discover new problems, feel free to share them. But if you
> > just want to rant, I, for one, would rather you didn't.
>
> *sigh*
>
> Not to prolong an already pointless thread, but that really was the
> intent of my original message. I had figured out a specific way, with
> easy-to-follow steps, to make the VM misbehave under very certain
> conditions. I even offered to help figure out a solution in any way I
> could, considering I'm not familiar with kernel code.
>
> However, I guess this whole "too much swap" issue has a lot of people on
> edge and immediately assumed I was talking about this subject, without
> actually reading my original message.

Actually, I think your original message was useful. It has spurred a
reevaluation of some design assumptions implicit in the VM in the 2.4
series and has also surfaced some bugs.

It was not you who I felt was sending inflammatory remarks; it was the
folks who have been bellyaching about the current swap disk space
requirements without offering any new information to help developers
remedy the situation.

So, thanks for bringing the topic up. :-)

Cheers,
Miles
Re: Break 2.4 VM in five easy steps
On Thu, 7 Jun 2001, Mike Galbraith wrote:

> On 6 Jun 2001, Eric W. Biederman wrote:
>
> > Mike Galbraith <[EMAIL PROTECTED]> writes:
> >
> > > > If you could confirm this by calling swapoff sometime other than at
> > > > reboot time. That might help. Say by running top on the console.
> > >
> > > The thing goes comatose here too. SCHED_RR vmstat doesn't run, console
> > > switch is nogo...
> > >
> > > After running his memory hog, swapoff took 18 seconds. I hacked a
> > > bleeder valve for dead swap pages, and it dropped to 4 seconds.. still
> > > utterly comatose for those 4 seconds though.
> >
> > At the top of the while(1) loop in try_to_unuse what happens if you
> > put in:
> >
> >     if (need_resched) schedule();
> >
> > It should be outside all of the locks. It might just be a matter of
> > everything serializing on the SMP locks, and the kernel refusing to
> > preempt itself.
>
> That did it.

What about including this workaround in the kernel?
Re: Break 2.4 VM in five easy steps
On Thursday, 07 June 2001, at 09:23:42 +0200, Helge Hafting wrote:

> Derek Glidden wrote:
> >
> > Helge Hafting wrote:
> [...]
> The machine froze 10 seconds or so at the end of the minute, I can
> imagine that biting with bigger swap.

Same behavior here with a Pentium III 600, 128 MB RAM and 128 MB of swap.
I filled mem and swap with the infamous glob() "bug" (ls ../*/.. etc.),
ran swapoff, and the machine stayed very responsive except for the last
10-15 seconds before swapoff ended. Even scrolling complex pages with
Mozilla 0.9 worked smoothly :).

--
José Luis Domingo López
Linux Registered User #189436     Debian GNU/Linux Potato (P166 64 MB RAM)

jdomingo EN internautas PUNTO org  => Spam? Face the consequences.
jdomingo AT internautas DOT org    => Spam at your own risk
Re: Break 2.4 VM in five easy steps
Helge Hafting <[EMAIL PROTECTED]> writes:

> A problem with this is that normal paging-in is allowed to page other
> things out as well. But you can't have that when swap is about to
> be turned off. My guess is that swapoff functionality was perceived to
> be so seldom used that they didn't bother too much with scheduling
> or efficiency.

There is some truth in that. You aren't allowed to allocate new pages in
the swap space currently being removed, however. The current swapoff code
removes pages from the current swap space without breaking any sharing
between swap pages. Depending on your load this may be important. Fixing
swapoff to be more efficient while at the same time keeping sharing
between pages is tricky. That swapoff never sleeps, under loads that are
easy to trigger in 2.4, is a big bug.

> I don't have the same problem myself though. Shutting down with
> 30M or so in swap never takes unusual time on 2.4.x kernels here,
> with a 300MHz processor. I did a test while typing this letter,
> almost filling the 96M swap partition with 88M. swapoff
> took 1 minute at 100% cpu. This is long, but the machine was responsive
> most of that time. I.e. no worse than during a kernel compile.
> The machine froze 10 seconds or so at the end of the minute, I can
> imagine that biting with bigger swap.

O.k., so at some point you actually wait for I/O and other processes get a
chance to run. On the larger machines we never wait for I/O and thus never
schedule at all.

The problem is now understood. Now we just need to fix it.

Eric
Re: Break 2.4 VM in five easy steps
LA Walsh <[EMAIL PROTECTED]> writes:

> Now for whatever reason, since 2.4, I consistently use at least
> a few Mb of swap -- stands at 5Meg now. Weird -- but I notice things
> like nscd running 7 copies that take 72M. Seems like overkill for
> a laptop.

So the question becomes why you are seeing an increased swap usage.
Currently there are two candidates in the 2.4.x code path.

1) Delayed swap deallocation: when a program exits after it has gone into
   swap, its swap usage is not freed. Ouch.

2) Increased tenacity of swap caching. In particular, in 2.2.x if a page
   that was in the swap cache was written to, the page in the swap space
   would be removed. In 2.4.x the location in swap space is retained with
   the goal of getting more efficient swap-ins.

Neither of the known candidates for increasing the swap load applies when
you aren't swapping in the first place. They may aggravate the usage of
swap when you are already swapping, but they do not cause swapping
themselves. This is why the initial recommendation for increased swap
space size was made. If you are swapping we will use more swap.

However, what pushes your laptop over the edge into swapping is an
entirely different question. And probably what should be solved.

> I think that is the point -- it was supported in 2.2; it is, IMO,
> a serious regression that it is not supported in 2.4.

The problem with this general line of arguing is that it lumps a whole
bunch of real issues/regressions into one overall perception. Since there
are multiple reasons people are seeing problems, they need to be tracked
down with specifics.

The swapoff case comes down to dead swap pages in the swap cache, which
greatly increases the number of swap pages and slows the system down; but
since these pages are trivial to free we don't generate any I/O, so we
don't wait for I/O and thus never enter the scheduler, making nothing else
in the system runnable.

Your case is significantly different. I don't know if you are seeing any
issues with swapping at all. With a 5M usage it may simply be totally
unused pages being pushed out to the swap space.

Eric
Re: Break 2.4 VM in five easy steps
Miles Lane wrote:
>
> So please, if you have new facts that you want to offer that
> will help us characterize and understand these VM issues better
> or discover new problems, feel free to share them. But if you
> just want to rant, I, for one, would rather you didn't.

*sigh*

Not to prolong an already pointless thread, but that really was the intent
of my original message. I had figured out a specific way, with
easy-to-follow steps, to make the VM misbehave under very certain
conditions. I even offered to help figure out a solution in any way I
could, considering I'm not familiar with kernel code.

However, I guess this whole "too much swap" issue has a lot of people on
edge, and they immediately assumed I was talking about this subject
without actually reading my original message.

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
#!/usr/bin/perl -w
$_='while(read+STDIN,$_,2048){$a=29;$b=73;$c=142;$t=255;@t=map
{$_%16or$t^=$c^=($m=(11,10,116,100,11,122,20,100)[$_/16%8])&110;
$t^=(72,@z=(64,72,$a^=12*($_%16-2?0:$m&17)),$b^=$_%64?12:0,@z)
[$_%8]}(16..271);if((@a=unx"C*",$_)[20]&48){$h=5;$_=unxb24,join
"",@b=map{xB8,unxb8,chr($_^$a[--$h+84])}@ARGV;s/...$/1$&/;$d=
unxV,xb25,$_;$e=256|(ord$b[4])<<9|ord$b[3];$d=$d>>8^($f=$t&($d
>>12^$d>>4^$d^$d/8))<<17,$e=$e>>8^($t&($g=($q=$e>>14&7^$e)^$q*
8^$q<<6))<<9,$_=$t[$_]^(($h>>=8)+=$f+(~$g&$t))for@a[128..$#a]}
print+x"C*",@a}';s/x/pack+/g;eval

usage: qrpff 153 2 8 105 225 < /mnt/dvd/VOB_FILENAME \
  | extract_mpeg2 | mpeg2dec -

http://www.eff.org/  http://www.opendvd.org/
http://www.cs.cmu.edu/~dst/DeCSS/Gallery/
Re: Break 2.4 VM in five easy steps
On Thu, 7 Jun 2001, Bulent Abali wrote:

> I happened to see this one with a debugger attached to the serial port.
> The system was alive. I think I was watching the free page count and
> it was decreasing very slowly, maybe a couple of pages per second. The
> bigger the swap usage, the longer it takes to do swapoff. For example,
> if I had 1GB in the swap space then it would take maybe half an hour
> to shut down...

I took a ~300ms ktrace snapshot of the no-IO spot with 2.4.4.ikd..

 % TOTAL   TOTAL USECS   AVG/CALL   NCALLS
 0.0693%        208.54       0.40      517   c012d4b9 __free_pages
 0.0755%        227.34       1.01      224   c012cb67 __free_pages_ok
 ...
34.7195%     104515.15       0.95   110049   c012de73 unuse_vma
53.3435%     160578.37     303.55      529   c012dd38 __swap_free

Total entries: 131051   Total usecs: 301026.93   Idle: 0.00%

Andrew Morton could be right about that loop not being wonderful.

        -Mike
Re: Break 2.4 VM in five easy steps
"Eric W. Biederman" wrote: > There are cetain scenario's where you can't avoid virtual mem = > min(RAM,swap). Which is what I was trying to say, (bad formula). What > happens is that pages get referenced evenly enough and quickly enough > that you simply cannot reuse the on disk pages. Basically in the > worst case all of RAM is pretty much in flight doing I/O. This is > true of all paging systems. So, if I understand, you are talking about thrashing behavior where your active set is larger than physical ram. If that is the case then requiring 2X+ swap for "better" performance is reasonable. However, if your active set is truely larger than your physical memory on a consistant basis, in this day, the solution is usually "add more RAM". I may be wrong, but my belief is that with today's computers people are used to having enough memory to do their normal tasks and that swap is for "peak loads" that don't occur on a sustained basis. Of course I imagine that this is my belief as it is my own practice/view. I want to have considerably more memory than my normal working set. Swap on my laptop disk is *slow*. It's a low-power, low-RPM, slow seek rate all to conserve power (difference between spinning/off = 1W). So I have 50% of my phys mem on swap -- because I want to 'feel' it when I goto swap and start looking for memory hogs. For me, the pathological case is touching swap *at all*. So the idea of the entire active set being >=phys mem is already broken on my setup. Thus my expectation of swap only as 'warning'/'buffer' zone. Now for whatever reason, since 2.4, I consistently use at least a few Mb of swap -- stands at 5Meg now. Weird -- but I notice things like nscd running 7 copies that take 72M. Seems like overkill for a laptop. > However just because in the worst case virtual mem = min(RAM,swap), is > no reason other cases should use that much swap. 
If you are doing a > lot of swapping it is more efficient to plan on mem = min(RAM,swap) as > well, because frequently you can save on I/O operations by simply > reusing the existing swap page. --- Agreed. But planning your swap space for a worst case scenario that you never hit is wasteful. My worst case is using any swap. The system should be able to live with swap=1/2*phys in my situation. I don't think I'm unique in this respect. > It's a theoretical worst case and they all have it. In practice it is > very hard to find a work load where practically every page in the > system is close to the I/O point howerver. --- Well exactly the point. It was in such situations in some older systems that some programs were swapped out and temporarily made unavailable for running (they showed up in the 'w' space in vmstat). > Except for removing pages that aren't used paging with swap < RAM is > not useful. Simply removing pages that aren't in active use but might > possibly be used someday is a common case, so it is worth supporting. --- I think that is the point -- it was supported in 2.2, it is, IMO, a serious regression that it is not supported in 2.4. -linda -- The above thoughts and | They may have nothing to do with writings are my own. | the opinions of my employer. :-) L A Walsh| Senior MTS, Trust Tech., Core Linux, SGI [EMAIL PROTECTED] | Voice: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Break 2.4 VM in five easy steps
>> O.k. I think I'm ready to nominate the dead swap pages for the big
>> 2.4.x VM bug award. So we are burning cpu cycles in sys_swapoff
>> instead of being IO bound? Just wanting to understand this the cheap way :)
>
> There's no IO being done whatsoever (that I can see with only a blinky).
> I can fire up ktrace and find out exactly what's going on if that would
> be helpful. Eating the dead swap pages from the active page list prior
> to swapoff cures all but a short freeze. Eating the rest (few of those)
> might cure the rest, but I doubt it.
>
> -Mike

1) I second Mike's observation. swapoff, whether from the command line or during shutdown, just hangs there. No disk I/O is being done, as I could see from the blinkers. This is not an I/O-boundness issue; it is more like a deadlock. I happened to see this once with a debugger attached to the serial port, and the system was alive. I think I was watching the free page count, and it was decreasing very slowly, maybe a couple of pages per second. The bigger the swap usage, the longer it takes to do swapoff. For example, if I had 1GB in the swap space, then it would take maybe half an hour to shut down...

2) Now, why I would have 1GB in the swap space is another problem. Here is what I observe, and it doesn't make much sense to me. Let's say I have 1GB of memory and plenty of swap, and let's say there is a process a little less than 1GB in size. Suppose the system starts swapping because it is short a few megabytes of memory. Within *seconds* of swapping, I see the swap disk usage balloon to nearly 1GB. Nearly the entire memory moves into the page cache. If you run xosview you will know what I mean: memory usage suddenly turns from green to red :-). And I know for a fact that my disk cannot do 1GB per second :-). The SHARE column of the big process in "top" goes up by hundreds of megabytes.
So it appears to me that the MM is marking the whole process memory to be swapped out, probably reserving nearly 1GB in the swap space, and furthermore apparently moving the entire process's pages to the page cache. You would think that if you are short a few MB of memory, the MM would put a few MB worth of pages in the swap. But it wants to move entire processes into swap. When the 1GB process exits, the swap usage doesn't change (dead swap pages?). And shutdown or swapoff will take forever due to #1 above.

Bulent
Re: Break 2.4 VM in five easy steps
First things first: 1) Please Cc: me when responding, 2) apologies for dropping any References: headers, 3) sorry for bad formatting.

"Jeffrey W. Baker" wrote:
> On Tue, 5 Jun 2001, Derek Glidden wrote:
> > This isn't trying to test extreme low-memory pressure, just how the
> > system handles recovering from going somewhat into swap, which is a
> > real day-to-day problem for me, because I often run a couple of apps
> > that most of the time live in RAM, but during heavy computation runs
> > can go a couple hundred megs into swap for a few minutes at a time.
> > Whenever that happens, my machine always starts acting up afterwards,
> > so I started investigating and found some really strange stuff going on.

Has anyone else noticed the difference between

	dd if=/dev/zero of=bigfile bs=16384k count=1

and

	dd if=/dev/zero of=bigfile bs=8k count=2048

deleting 'bigfile' each time before use? (Those of you with lots of memory may (or may not!) want to try bs=262144k.)

Once, a few months ago, I thought I traced this to the loop at line ~2597 in linux/mm/filemap.c:generic_file_write:

2593            remove_suid(inode);
2594            inode->i_ctime = inode->i_mtime = CURRENT_TIME;
2595            mark_inode_dirty_sync(inode);
2596
2597            while (count) {
2598                    unsigned long index, offset;
2599                    char *kaddr;
2600                    int deactivate = 1;
...
2659
2660                    if (status < 0)
2661                            break;
2662            }
2663            *ppos = pos;
2664
2665            if (cached_page)

It appears to me that the process pseudo-spins (it *does* do useful work) in this loop for as long as there are pages available. BTW, while the big-bs dd is running, the disk is active. I assume that writes are indeed scheduled and start happening even while we're still dirtying pages?

Does this freezing effect occur on SMP machines too?
Oops, had access to one until this morning :( Would an SMP box still have a 'spare' cpu which isn't dirtying pages like crazy, and can therefore do things like updating mouse cursors, etc.?

Bernd Jendrissek

P.S. Here's my patch that cures this one symptom; it smells and looks ugly, I know, but at least my mouse cursor doesn't jump across the whole screen when I do the dd torture. I have no idea if this is right or not, whether I'm allowed to call schedule() inside generic_file_write or not, etc. And the '256' is just random - small enough to let the cursor move, but large enough to do work between schedule()s. If this solves your problem, use it; if your name is Linus or Alan, ignore it or do it right please.

diff -u -r1.1 -r1.2
--- linux-hack/mm/filemap.c	2001/06/06 21:16:28	1.1
+++ linux-hack/mm/filemap.c	2001/06/07 08:57:52	1.2
@@ -2599,6 +2599,11 @@
 		char *kaddr;
 		int deactivate = 1;
 
+		/* bernd-hack: give other processes a chance to run */
+		if (count % 256 == 0) {
+			schedule();
+		}
+
 		/*
 		 * Try to find the page in the cache. If it isn't there,
 		 * allocate a free page.
Re: Break 2.4 VM in five easy steps
Linus Torvalds <[EMAIL PROTECTED]> writes:
> On 7 Jun 2001, Eric W. Biederman wrote:
>
> No - I suspect that we're not actually doing all that much IO at all, and
> the real reason for the lock-up is just that the current algorithm is so
> bad that when it starts to act exponentially worse it really _is_ taking
> minutes of CPU time following pointers and generally not being very nice
> on the CPU cache etc..

Hmm. Unless I am mistaken, the complexity is O(SwapPages * VMSize), which is very bad, but nowhere near exponentially horrible.

> The bulk of the work is walking the process page tables thousands and
> thousands of times. Expensive.

Definitely. I played with following the page tables in a good way a while back, and even when you do it right the process is slow.

Is

	if (need_resched) {
		schedule();
	}

a good idiom to use when you know you have a loop that will take a long time? Even if we do this right, we should do our best to avoid starving other processes in the system.

Hmm. There is a nasty case with turning the walk inside out. When we read a page into RAM there could still be other users of that page that refer to the swap entry, so we cannot immediately remove the page from the swap cache -- unless we want to break sharing and increase the demands upon the virtual memory when we are shrinking it...

> > If this is going on I think we need to look at our delayed
> > deallocation policy a little more carefully.
>
> Agreed. I already talked in private with some people about just
> re-visiting the issue of the lazy de-allocation. It has nice properties,
> but it certainly appears as if the nasty cases just plain outweigh the
> advantages.

I'm trying to remember the advantages, besides not having to care that a page is a swap page in free_pte. If there really is some value in not handling the pages there (and I seem to recall something about pages under I/O), it might at least be worth putting the pages on their own LRU list.
So that kswapd can crunch through the list whenever it wakes up and give back a bunch of free pages.

Eric
Re: Break 2.4 VM in five easy steps
On 7 Jun 2001, Eric W. Biederman wrote:
> Mike Galbraith <[EMAIL PROTECTED]> writes:
>
> > On 7 Jun 2001, Eric W. Biederman wrote:
> >
> > > Does this improve the swapoff speed or just allow other programs to
> > > run at the same time? If it is still slow under that kind of load it
> > > would be interesting to know what is taking up all the time.
> > >
> > > If it is no longer slow a patch should be made and sent to Linus.
> >
> > No, it only cures the freeze. The other appears to be the slow code
> > pointed out by Andrew Morton being tickled by dead swap pages.
>
> O.k. I think I'm ready to nominate the dead swap pages for the big
> 2.4.x VM bug award. So we are burning cpu cycles in sys_swapoff
> instead of being IO bound? Just wanting to understand this the cheap way :)

There's no IO being done whatsoever (that I can see with only a blinky). I can fire up ktrace and find out exactly what's going on if that would be helpful. Eating the dead swap pages from the active page list prior to swapoff cures all but a short freeze. Eating the rest (few of those) might cure the rest, but I doubt it.

-Mike
Re: Break 2.4 VM in five easy steps
On 7 Jun 2001, Eric W. Biederman wrote:
> [EMAIL PROTECTED] (Linus Torvalds) writes:
> >
> > Somebody interested in trying the above add? And looking for other more
> > obvious bandaid fixes. It won't "fix" swapoff per se, but it might make
> > it bearable and bring it to the 2.2.x levels.
>
> A little bit. The one really bad behavior of not letting any other
> processes run seems to be fixed with an explicit:
>	if (need_resched) {
>		schedule();
>	}
>
> What I can't figure out is why this is necessary. Because we should
> be sleeping in alloc_pages if nowhere else.

No - I suspect that we're not actually doing all that much IO at all, and the real reason for the lock-up is just that the current algorithm is so bad that when it starts to act exponentially worse it really _is_ taking minutes of CPU time following pointers and generally not being very nice on the CPU cache etc..

The bulk of the work is walking the process page tables thousands and thousands of times. Expensive.

> If this is going on I think we need to look at our delayed
> deallocation policy a little more carefully.

Agreed. I already talked in private with some people about just re-visiting the issue of the lazy de-allocation. It has nice properties, but it certainly appears as if the nasty cases just plain outweigh the advantages.

Linus
Re: Break 2.4 VM in five easy steps
Mike Galbraith <[EMAIL PROTECTED]> writes:
> On 7 Jun 2001, Eric W. Biederman wrote:
>
> > Does this improve the swapoff speed or just allow other programs to
> > run at the same time? If it is still slow under that kind of load it
> > would be interesting to know what is taking up all the time.
> >
> > If it is no longer slow a patch should be made and sent to Linus.
>
> No, it only cures the freeze. The other appears to be the slow code
> pointed out by Andrew Morton being tickled by dead swap pages.

O.k. I think I'm ready to nominate the dead swap pages for the big 2.4.x VM bug award. So we are burning cpu cycles in sys_swapoff instead of being IO bound? Just wanting to understand this the cheap way :)

Eric
Re: Break 2.4 VM in five easy steps
[EMAIL PROTECTED] (Linus Torvalds) writes:
>
> Somebody interested in trying the above add? And looking for other more
> obvious bandaid fixes. It won't "fix" swapoff per se, but it might make
> it bearable and bring it to the 2.2.x levels.

A little bit. The one really bad behavior of not letting any other processes run seems to be fixed with an explicit:

	if (need_resched) {
		schedule();
	}

What I can't figure out is why this is necessary, because we should be sleeping in alloc_pages if nowhere else. I suppose if the bulk of our effort really is freeing dead swap cache pages, we can spin without sleeping and never let another process run, because we are busily recycling dead swap cache pages. Does this sound right?

If this is going on I think we need to look at our delayed deallocation policy a little more carefully. I suspect we should have code in kswapd actively removing these dead swap cache pages. After we get the latency improvements in exit, these pages do absolutely nothing for us except clog up the whole system and generally give the 2.4 VM a bad name.

Anyone care to check my analysis?

> Is anybody interested in making "swapoff()" better? Please speak up..

Interested. But finding the time...

Eric
Re: Break 2.4 VM in five easy steps
On 7 Jun 2001, Eric W. Biederman wrote:
> Mike Galbraith <[EMAIL PROTECTED]> writes:
>
> > On 6 Jun 2001, Eric W. Biederman wrote:
> >
> > > Mike Galbraith <[EMAIL PROTECTED]> writes:
> > >
> > > > > If you could confirm this by calling swapoff sometime other
> > > > > than at reboot time. That might help. Say by running top on
> > > > > the console.
> > > >
> > > > The thing goes comatose here too. SCHED_RR vmstat doesn't run,
> > > > console switch is nogo...
> > > >
> > > > After running his memory hog, swapoff took 18 seconds. I hacked
> > > > a bleeder valve for dead swap pages, and it dropped to 4
> > > > seconds.. still utterly comatose for those 4 seconds though.
> > >
> > > At the top of the while(1) loop in try_to_unuse what happens if
> > > you put in:
> > >	if (need_resched) schedule();
> > > It should be outside all of the locks. It might just be a matter
> > > of everything serializing on the SMP locks, and the kernel
> > > refusing to preempt itself.
> >
> > That did it.
>
> Does this improve the swapoff speed or just allow other programs to
> run at the same time? If it is still slow under that kind of load it
> would be interesting to know what is taking up all the time.
>
> If it is no longer slow a patch should be made and sent to Linus.

No, it only cures the freeze. The other appears to be the slow code pointed out by Andrew Morton being tickled by dead swap pages.

-Mike
Re: Break 2.4 VM in five easy steps
Derek Glidden wrote:
>
> Helge Hafting wrote:
> >
> > The drive is inactive because it isn't needed, the machine is
> > running loops on data in memory. And it is unresponsive because
> > nothing else is scheduled, maybe "swapoff" is easier to implement
>
> I don't quite get what you're saying. If the system becomes
> unresponsive because the VM swap recovery parts of the kernel are
> interfering with the kernel scheduler then that's also bad, because
> there absolutely *are* other processes that should be getting time,
> like the console windows/shells at which I'm logged in. If they
> aren't getting it specifically because the VM is preventing them from
> receiving execution time, then that's another bug.

Sure. The kernel doing a big job without scheduling anything is a problem.

> I'm not familiar enough with the swapping bits of the kernel code, so
> I could be totally wrong, but turning off a swap file/partition should
> just call the same parts of the VM subsystem that would normally try
> to recover swap space under memory pressure.

A problem with this is that normal paging-in is allowed to page other things out as well. But you can't have that when swap is about to be turned off. My guess is that swapoff functionality was perceived to be so seldom used that they didn't bother too much with scheduling or efficiency.

I don't have the same problem myself, though. Shutting down with 30M or so in swap never takes unusual time on 2.4.x kernels here, with a 300MHz processor. I did a test while typing this letter, almost filling the 96M swap partition with 88M. swapoff took 1 minute at 100% cpu. This is long, but the machine was responsive most of that time, i.e. no worse than during a kernel compile. The machine froze for 10 seconds or so at the end of the minute; I can imagine that biting with bigger swap.
Helge Hafting
Re: Break 2.4 VM in five easy steps
Mike Galbraith <[EMAIL PROTECTED]> writes:
> On 6 Jun 2001, Eric W. Biederman wrote:
>
> > Mike Galbraith <[EMAIL PROTECTED]> writes:
> >
> > > > If you could confirm this by calling swapoff sometime other than
> > > > at reboot time. That might help. Say by running top on the console.
> > >
> > > The thing goes comatose here too. SCHED_RR vmstat doesn't run,
> > > console switch is nogo...
> > >
> > > After running his memory hog, swapoff took 18 seconds. I hacked a
> > > bleeder valve for dead swap pages, and it dropped to 4 seconds..
> > > still utterly comatose for those 4 seconds though.
> >
> > At the top of the while(1) loop in try_to_unuse what happens if you
> > put in:
> >	if (need_resched) schedule();
> > It should be outside all of the locks. It might just be a matter of
> > everything serializing on the SMP locks, and the kernel refusing to
> > preempt itself.
>
> That did it.

Does this improve the swapoff speed or just allow other programs to run at the same time? If it is still slow under that kind of load it would be interesting to know what is taking up all the time.

If it is no longer slow a patch should be made and sent to Linus.

Eric
Re: Break 2.4 VM in five easy steps
LA Walsh <[EMAIL PROTECTED]> writes:
> "Eric W. Biederman" wrote:
> >
> > The hard rule will always be that to cover all pathological cases
> > swap must be greater than RAM. Because in the worst case all RAM
> > will be in the swap cache. That this is more than just the worst
> > case in 2.4 is problematic. I.e. in the worst case:
> >	Virtual Memory = RAM + (swap - RAM).
>
> Hmmm... so my 512M laptop only really has 256M? Um... I regularly run
> more than 256M of programs. I don't want it to swap -- it's a
> special, weird condition if I do start swapping. I don't want to
> waste 1G of HD (5%) for something I never want to use. IRIX runs just
> fine with swap < RAM.

In Irix, your Virtual Memory = RAM + swap.

> Seems like the Linux kernel requires more swap than other old OS's
> (SunOS3: virtual mem = min(mem, swap)). I *thought* I remember that
> restriction being lifted in SunOS4 when they upgraded the VM. Even
> though I worked there for 6 years, that was 6 years ago...

There are certain scenarios where you can't avoid virtual mem = min(RAM, swap). Which is what I was trying to say (bad formula). What happens is that pages get referenced evenly enough and quickly enough that you simply cannot reuse the on-disk pages. Basically, in the worst case all of RAM is pretty much in flight doing I/O. This is true of all paging systems.

However, just because in the worst case virtual mem = min(RAM, swap) is no reason other cases should use that much swap. If you are doing a lot of swapping it is more efficient to plan on mem = min(RAM, swap) as well, because frequently you can save on I/O operations by simply reusing the existing swap page.

> > You can't improve the worst case. We can improve the worst case
> > that many people are facing.
>
> Other OS's don't have this pathological 'worst case' scenario. Even
> my Windows [vm]box seems to operate fine with swap < RAM; virtual
> space closely approximates physical + disk memory.

It's a theoretical worst case and they all have it.
In practice it is very hard to find a workload where practically every page in the system is close to the I/O point, however.

Except for removing pages that aren't used, paging with swap < RAM is not useful. Simply removing pages that aren't in active use but might possibly be used someday is a common case, so it is worth supporting.

> > It's worth complaining about. It is also worth digging into and
> > finding out what the real problem is. I have a hunch that this
> > whole conversation on swap sizes being irritating is hiding the
> > real problem.
>
> Okay, admission of ignorance. When we speak of "swap space", is this
> term inclusive of both demand-paging space and
> swap-out-entire-programs space, or one or another?

Linux has no method to swap out an entire program, so when I speak of swapping I'm actually thinking of paging.

Eric
Re: Break 2.4 VM in five easy steps
On 6 Jun 2001, Eric W. Biederman wrote:
> Mike Galbraith <[EMAIL PROTECTED]> writes:
>
> > > If you could confirm this by calling swapoff sometime other than at
> > > reboot time. That might help. Say by running top on the console.
> >
> > The thing goes comatose here too. SCHED_RR vmstat doesn't run,
> > console switch is nogo...
> >
> > After running his memory hog, swapoff took 18 seconds. I hacked a
> > bleeder valve for dead swap pages, and it dropped to 4 seconds..
> > still utterly comatose for those 4 seconds though.
>
> At the top of the while(1) loop in try_to_unuse what happens if you
> put in:
>	if (need_resched) schedule();
> It should be outside all of the locks. It might just be a matter of
> everything serializing on the SMP locks, and the kernel refusing to
> preempt itself.

That did it.

-Mike
Re: Break 2.4 VM in five easy steps
On 06 Jun 2001 20:34:49 -0400, Mike A. Harris wrote:
> On Wed, 6 Jun 2001, Derek Glidden wrote:
>
> > > Derek> overwhelmed. On the system I'm using to write this, with
> > > Derek> 512MB of RAM and 512MB of swap, I run two copies of this
> > >
> > > Please see the following message on the kernel mailing list,
> > >
> > > 3086: Linus 2.4.0 notes are quite clear that you need at least
> > > twice RAM of swap
> > > Message-Id: <[EMAIL PROTECTED]>
> >
> > Yes, I'm aware of this.
> >
> > However, I still believe that my original problem report is a BUG.
> > No matter how much swap I have, or don't have, and how much is or
> > isn't being used, running "swapoff" and forcing the VM subsystem to
> > reclaim unused swap should NOT cause my machine to feign death for
> > several minutes.
> >
> > I can easily take 256MB out of this machine, and then I *will* have
> > twice as much swap as RAM and I can still cause the exact same
> > behaviour.
> >
> > It's a bug, and no number of times saying "You need twice as much
> > swap as RAM" will change that fact.
>
> Precisely. Saying 8x RAM doesn't change it either. Sometime next week
> I'm going to purposefully put a new 60Gb disk in on a separate
> controller as pure swap on top of 256Mb of RAM. My guess is after
> bootup and login, I'll have 48Gb of stuff in swap "just in case".

Mike and others, I am getting tired of your comments. Sheesh.

The various developers who actually work on the VM have already acknowledged the issues and are exploring fixes, including at least one patch that already exists. It seems clear that the uproar from the people who are having trouble with the new VM's handling of swap space has been heard, and folks are going to fix these problems. It may not happen today or tomorrow, but soon. What the heck else do you want?

Making inflammatory remarks about the current situation does nothing to help get the problems fixed; it just wastes our time and bandwidth.
So please, if you have new facts to offer that will help us characterize and understand these VM issues better, or discover new problems, feel free to share them. But if you just want to rant, I, for one, would rather you didn't.

Miles
Re: Break 2.4 VM in five easy steps
On Wed, 6 Jun 2001, Derek Glidden wrote:
> > Derek> overwhelmed. On the system I'm using to write this, with
> > Derek> 512MB of RAM and 512MB of swap, I run two copies of this
> >
> > Please see the following message on the kernel mailing list,
> >
> > 3086: Linus 2.4.0 notes are quite clear that you need at least twice
> > RAM of swap
> > Message-Id: <[EMAIL PROTECTED]>
>
> Yes, I'm aware of this.
>
> However, I still believe that my original problem report is a BUG. No
> matter how much swap I have, or don't have, and how much is or isn't
> being used, running "swapoff" and forcing the VM subsystem to reclaim
> unused swap should NOT cause my machine to feign death for several
> minutes.
>
> I can easily take 256MB out of this machine, and then I *will* have
> twice as much swap as RAM and I can still cause the exact same
> behaviour.
>
> It's a bug, and no number of times saying "You need twice as much swap
> as RAM" will change that fact.

Precisely. Saying 8x RAM doesn't change it either. Sometime next week I'm going to purposefully put a new 60Gb disk in on a separate controller as pure swap on top of 256Mb of RAM. My guess is after bootup and login, I'll have 48Gb of stuff in swap "just in case".

--
Mike A. Harris - Linux advocate - Open Source advocate
Opinions and viewpoints expressed are solely my own.
Re: Break 2.4 VM in five easy steps
On Wed, 6 Jun 2001, android wrote:
> associated with that mindset that made Microsoft such a [fill in the
> blank]. As for the 2.4 VM problem, what are you doing with your
> machine that's making it use up so much memory? I have several
> processes running on mine all the time, including a slew in X, and I
> have yet to see significant swap activity.

Try _compiling_ XFree86. Watch the machine nosedive.

--
Mike A. Harris - Linux advocate - Open Source advocate
Opinions and viewpoints expressed are solely my own.
Re: Break 2.4 VM in five easy steps
On Wed, 6 Jun 2001, Dr S.M. Huen wrote:
> > For large memory boxes, this is ridiculous. Should I have 8GB of
> > swap?
>
> Do I understand you correctly? ECC grade SDRAM for your 8GB server
> costs £335 per GB as 512MB sticks even at today's silly prices
> (Crucial). Ultra160 SCSI costs £8.93/GB as 73GB drives.

Linux is all about technical correctness and doing the job properly. It isn't about "there is a bug in the kernel, but that is ok because an 8Gb swapfile only costs $2". Why are half the people here trying to hide behind this "disk space is cheap" argument? If we rely on that, then Linux sucks.

The problem IMHO is widely acknowledged by those who matter as an official BUG, and that is that. It is also widely acknowledged by those who can fix the problem that it will be fixed in time. So technically speaking, the kernel has a widely known bug/misfeature, which is acknowledged by core kernel developers as needing fixing, and which will get fixed at some point. Saying it is a non-issue due to the cost of hardware resources is just plain Microsoft attitude and holds absolutely zero technical merit. It *IS* an issue, because it is making Linux suck and is causing REAL WORLD PROBLEMS. The "use 2x RAM" advice is nothing more than a bandaid workaround, so don't claim that it is the proper fix due to big wallet size.

I have 2.2 doing a software build that takes 40 minutes with 256Mb of RAM and 1G of swap. The same build on 2.4 takes 60 minutes. That is 4x RAM for swap. Lowering the swap down to 2x RAM makes no difference in the numbers; down at 1x RAM the 2.4 build slows down horrendously, and dropping the swap to 20Mb makes it die completely in 2.4. 2.4 is fine for a firewall or certain other applications, but regardless of the amount of swap, I'll take the 40-minute build using 2.2 over the 60-minute build using 2.4 any day.

This is the real world. And no, cost isn't an issue to me.
Putting another 80Gb drive in this box for swap isn't going to help the work get done any faster.

--
Mike A. Harris - Linux advocate - Open Source advocate
Opinions and viewpoints expressed are solely my own.
Re: Break 2.4 VM in five easy steps
[EMAIL PROTECTED] (Alexander Viro) wrote on 06.06.01 in <[EMAIL PROTECTED]>:
> On Wed, 6 Jun 2001, Sean Hunter wrote:
>
> > This is completely bogus. I am not saying that I can't afford the
> > swap. What I am saying is that it is completely broken to require
> > this amount of swap given the boundaries of efficient use.
>
> Funny. I can count many ways in which 4.3BSD, SunOS{3,4} and post-4.4
> BSD systems I've used were broken, but I've never thought that the
> swap == 2*RAM rule was one of them.

As a "will break without" rule, I'd consider a kernel with that property completely unsuitable for production use. I certainly don't remember thinking of it as more than a recommendation back when I used commercial Unices (SysV-something).

MfG Kai
Re: Break 2.4 VM in five easy steps
At 11:27 pm +0100 6/6/2001, android wrote:
> > > I'd be happy to write a new routine in assembly
> >
> > I sincerely hope you're joking.
> >
> > It's the algorithm that needs fixing, not the implementation of
> > that algorithm. Writing in assembler? Hope you're proficient at
> > writing in x86, PPC, 68k, MIPS (several varieties), ARM, SPARC, and
> > whatever other architectures we support these days. And you darn
> > well better hope every other kernel hacker is as proficient as
> > that, to be able to read it.
>
> As for the algorithm, I'm sure that whatever method is used to handle
> page swapping, it has to comply with the kernel's memory management
> scheme already in place. That's why I would need the details so that
> I wouldn't create more problems than already present.

Have you actually been following this thread? The algorithm has been discussed and at least one alternative brought forward.

--
from:	Jonathan "Chromatix" Morton
mail:	[EMAIL PROTECTED]  (not for attachments)

The key to knowledge is not to rely on people to teach you it.

GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
Re: Break 2.4 VM in five easy steps
On 06 Jun 2001 15:27:57 -0700, android wrote:

> >I sincerely hope you're joking.
>
> I realize that assembly is platform-specific. Being that I use the IA32
> class machine, that's what I would write for. Others who use other
> platforms could do the deed for their native language.

no, look at the code. it is not going to benefit from assembly (assuming you can even implement it cleanly in assembly). its basically an iteration of other function calls.

doing a new implementation in assembly for each platform is not feasible, anyhow. this is the sort of thing that needs to be uniform.

this really has nothing to do with the "iron" of the computer -- its a loop to check and free swap pages. assembly will not provide benefit.

-- 
Robert M. Love
[EMAIL PROTECTED]
[EMAIL PROTECTED]
Re: Break 2.4 VM in five easy steps
hi,

I have a problem with kswapd: it suddenly takes 98% CPU and crashes my server, I don't know why. I have a Linux 2.2.17 kernel, Debian distro.

if anyone can help me ... thx ;)

Antoine
Re: Break 2.4 VM in five easy steps
> >I'd be happy to write a new routine in assembly
>
> I sincerely hope you're joking.
>
> It's the algorithm that needs fixing, not the implementation of that
> algorithm. Writing in assembler? Hope you're proficient at writing in
> x86, PPC, 68k, MIPS (several varieties), ARM, SPARC, and whatever other
> architectures we support these days. And you darn well better hope every
> other kernel hacker is as proficient as that, to be able to read it.

I realize that assembly is platform-specific. Being that I use the IA32 class machine, that's what I would write for. Others who use other platforms could do the deed for their native language.

As for the algorithm, I'm sure that whatever method is used to handle page swapping, it has to comply with the kernel's memory management scheme already in place. That's why I would need the details so that I wouldn't create more problems than already present.

Being that most users are on the IA32 platform, I'm sure they wouldn't reject an assembly solution to this problem. As for kernel acceptance, that's an issue for the political eggheads. Not my forte. :-)

-- Ted
Re: Break 2.4 VM in five easy steps
>I'd be happy to write a new routine in assembly

I sincerely hope you're joking.

It's the algorithm that needs fixing, not the implementation of that algorithm. Writing in assembler? Hope you're proficient at writing in x86, PPC, 68k, MIPS (several varieties), ARM, SPARC, and whatever other architectures we support these days. And you darn well better hope every other kernel hacker is as proficient as that, to be able to read it.

IOW, no chance.

-- 
from: Jonathan "Chromatix" Morton
mail: [EMAIL PROTECTED] (not for attachments)

The key to knowledge is not to rely on people to teach you it.

GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
Re: Break 2.4 VM in five easy steps
"Eric W. Biederman" wrote:

> The hard rule will always be that to cover all pathological cases swap
> must be greater than RAM. Because in the worst case all RAM will be
> in the swap cache. That this is more than just the worst case in 2.4
> is problematic. I.e. In the worst case:
> Virtual Memory = RAM + (swap - RAM).

Hmmm... so my 512M laptop only really has 256M? Um... I regularly run more than 256M of programs. I don't want it to swap -- it's a special, weird condition if I do start swapping. I don't want to waste 1G of HD (5%) for something I never want to use. IRIX runs just fine with swap < RAM.

> You can't improve the worst case. We can improve the worst case that
> many people are facing.

---
Other OS's don't have this pathological 'worst case' scenario. Even my Windows [vm]box seems to operate fine with swap < RAM.

> It's worth complaining about. It is also worth digging into and find
> out what the real problem is. I have a hunch that this whole
> conversation on swap sizes being irritating is hiding the real
> problem.

---
Okay, admission of ignorance. When we speak of "swap space", is this term inclusive of both demand paging space and swap-out-entire-programs space, or one or another?

-linda

-- 
The above thoughts and      | They may have nothing to do with
writings are my own.        | the opinions of my employer. :-)
L A Walsh                   | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]           | Voice: (650) 933-5338
Re: Break 2.4 VM in five easy steps
>Is anybody interested in making "swapoff()" better? Please speak up..
>
>		Linus

I'd be happy to write a new routine in assembly, if I had a clue as to how the VM algorithm works in Linux. What should swapoff do if all physical memory is in use? How does the swapping algorithm balance against cache memory? Can someone point me to where I can find the exact details of the VM mechanism in Linux?

Thanks!

-- Ted
Re: Break 2.4 VM in five easy steps
In article <[EMAIL PROTECTED]>, Derek Glidden <[EMAIL PROTECTED]> wrote:
>
>After reading the messages to this list for the last couple of weeks and
>playing around on my machine, I'm convinced that the VM system in 2.4 is
>still severely broken.

Now, this may well be true, but what you actually demonstrated is that "swapoff()" is extremely (and I mean _EXTREMELY_) inefficient, to the point that it can certainly be called broken.

It got worse in 2.4.x not so much due to any generic VM worseness, as due to the fact that the much more persistent swap cache behaviour in 2.4.x just exposes the fundamental inefficiencies of "swapoff()" more clearly. I don't think the swapoff() algorithm itself has changed, it's just that the algorithm was always exponential, I think (and because of the persistent swap cache, the "n" in the algorithm became much bigger). So this is really a separate problem from the general VM balancing issues.

Go and look at the "try_to_unuse()" logic, and wince.

I'd love to have somebody look a bit more at swap-off. It may well be, for example, that swap-off does not correctly notice dead swap-pages at all - somebody should verify that it doesn't try to read in and "try_to_unuse()" dead swap entries. That would make the inefficiency show up even more clearly.

(Quick look gives the following: right now try_to_unuse() in mm/swapfile.c does something like

	lock_page(page);
	if (PageSwapCache(page))
		delete_from_swap_cache_nolock(page);
	UnlockPage(page);

	read_lock(&tasklist_lock);
	for_each_task(p)
		unuse_process(p->mm, entry, page);
	read_unlock(&tasklist_lock);
	shmem_unuse(entry, page);

	/* Now get rid of the extra reference to the
	   temporary page we've been using. */
	page_cache_release(page);

and we should trivially notice that if the page count is 1, it cannot be mapped in any process, so we should maybe add something like

	lock_page(page);
	if (PageSwapCache(page))
		delete_from_swap_cache_nolock(page);
	UnlockPage(page);

+	if (page_count(page) == 1)
+		goto nothing_to_do;

	read_lock(&tasklist_lock);
	for_each_task(p)
		unuse_process(p->mm, entry, page);
	read_unlock(&tasklist_lock);
	shmem_unuse(entry, page);
+
+ nothing_to_do:
	/* Now get rid of the extra reference to the
	   temporary page we've been using. */
	page_cache_release(page);

which should (assuming I got the page count thing right - I've obviously not tested the above change) make sure that we don't spend tons of time on dead swap pages.

Somebody interested in trying the above add? And looking for other more obvious bandaid fixes. It won't "fix" swapoff per se, but it might make it bearable and bring it to the 2.2.x levels.

The _real_ fix is to really make "swapoff()" work the other way around - go through each process and look for swap entries in the page tables _first_, and bring all entries for that device in sanely, and after everything is brought in just drop all the swap cache pages for that device. The current swapoff() thing is really a quick hack that has lived on since early 1992, with quick hacks to make it work with the big VM changes that have happened since.

That would make swapoff be O(n) in VM size (and you can easily do some further micro-optimizations at that time by avoiding shared mappings with backing store and other things that cannot have swap info involved).

Is anybody interested in making "swapoff()" better? Please speak up..

		Linus
Re: Break 2.4 VM in five easy steps
On Wednesday 06 June 2001 20:27, Eric W. Biederman wrote:

> The hard rule will always be that to cover all pathological cases
> swap must be greater than RAM. Because in the worst case all RAM
> will be in the swap cache.

Could you explain in very simple terms how the worst case comes about?

-- Daniel
Re: Break 2.4 VM in five easy steps
Mike Galbraith wrote:
>
> Can you try the patch below to see if it helps? If you watch
> with vmstat, you should see swap shrinking after your test.
> Let it shrink a while and then see how long swapoff takes.
> Under a normal load, it'll munch a handful of them at least
> once a second and keep them from getting annoying. (theory;)

Hi Mike,

I'll give that patch a spin this evening after work when I have time to patch and recompile the kernel.

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Re: Break 2.4 VM in five easy steps
Mike Galbraith <[EMAIL PROTECTED]> writes:

> On 6 Jun 2001, Eric W. Biederman wrote:
>
> > Derek Glidden <[EMAIL PROTECTED]> writes:
> >
> > > The problem I reported is not that 2.4 uses huge amounts of swap but
> > > that trying to recover that swap off of disk under 2.4 can leave the
> > > machine in an entirely unresponsive state, while 2.2 handles identical
> > > situations gracefully.
> >
> > The interesting thing from other reports is that it appears to be kswapd
> > using up CPU resources. Not the swapout code at all. So it appears
> > to be a fundamental VM issue. And calling swapoff is just a good way
> > to trigger it.
> >
> > If you could confirm this by calling swapoff sometime other than at
> > reboot time. That might help. Say by running top on the console.
>
> The thing goes comatose here too. SCHED_RR vmstat doesn't run, console
> switch is nogo...
>
> After running his memory hog, swapoff took 18 seconds. I hacked a
> bleeder valve for dead swap pages, and it dropped to 4 seconds.. still
> utterly comatose for those 4 seconds though.

At the top of the while(1) loop in try_to_unuse what happens if you put in:

	if (need_resched)
		schedule();

It should be outside all of the locks. It might just be a matter of everything serializing on the SMP locks, and the kernel refusing to preempt itself.

Eric
Re: Break 2.4 VM in five easy steps
"Eric W. Biederman" wrote:
>
> Derek Glidden <[EMAIL PROTECTED]> writes:
>
> > The problem I reported is not that 2.4 uses huge amounts of swap but
> > that trying to recover that swap off of disk under 2.4 can leave the
> > machine in an entirely unresponsive state, while 2.2 handles identical
> > situations gracefully.
>
> The interesting thing from other reports is that it appears to be kswapd
> using up CPU resources. Not the swapout code at all. So it appears
> to be a fundamental VM issue. And calling swapoff is just a good way
> to trigger it.
>
> If you could confirm this by calling swapoff sometime other than at
> reboot time. That might help. Say by running top on the console.

That's exactly what my original test was doing. I think it was Jeffrey Baker complaining about "swapoff" at reboot. See my original post that started this thread and follow the "five easy steps." :)

I'm sucking down a lot of swap, although not all that's available - which is something I am specifically trying to avoid, since I wanted to stress the VM/swap recovery procedure, not "out of RAM and swap" memory pressure - and then running 'swapoff' from an xterm or a console.

The problem with being able to see what's eating up CPU resources is that the whole machine stops responding for me to tell. Consoles stop updating, the X display freezes, keyboard input is locked out, etc. As far as anyone can tell, for several minutes, the whole machine is locked up. (Except, strangely enough, the machine will still respond to ping.) I've tried running 'top' to see what task is taking up all the CPU time, but the system hangs before it shows anything meaningful. I have been able to tell that it hits 100% "system" utilization very quickly though.

I did notice that the first thing sys_swapoff() does is call lock_kernel() ... so if sys_swapoff() takes a long time, I imagine things will get very unresponsive quickly. (But I'm not intimately familiar with the various kernel locks, so I don't know what granularity/atomicity/whatever lock_kernel() enforces.)

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
#!/usr/bin/perl -w
$_='while(read+STDIN,$_,2048){$a=29;$b=73;$c=142;$t=255;@t=map
{$_%16or$t^=$c^=($m=(11,10,116,100,11,122,20,100)[$_/16%8])&110;
$t^=(72,@z=(64,72,$a^=12*($_%16-2?0:$m&17)),$b^=$_%64?12:0,@z)
[$_%8]}(16..271);if((@a=unx"C*",$_)[20]&48){$h=5;$_=unxb24,join
"",@b=map{xB8,unxb8,chr($_^$a[--$h+84])}@ARGV;s/...$/1$&/;$d=
unxV,xb25,$_;$e=256|(ord$b[4])<<9|ord$b[3];$d=$d>>8^($f=$t&($d
>>12^$d>>4^$d^$d/8))<<17,$e=$e>>8^($t&($g=($q=$e>>14&7^$e)^$q*
8^$q<<6))<<9,$_=$t[$_]^(($h>>=8)+=$f+(~$g&$t))for@a[128..$#a]}
print+x"C*",@a}';s/x/pack+/g;eval
usage: qrpff 153 2 8 105 225 < /mnt/dvd/VOB_FILENAME \
 | extract_mpeg2 | mpeg2dec -
http://www.eff.org/ http://www.opendvd.org/
http://www.cs.cmu.edu/~dst/DeCSS/Gallery/
Re: Break 2.4 VM in five easy steps
On 6 Jun 2001, Eric W. Biederman wrote:

> Derek Glidden <[EMAIL PROTECTED]> writes:
> >
> > The problem I reported is not that 2.4 uses huge amounts of swap but
> > that trying to recover that swap off of disk under 2.4 can leave the
> > machine in an entirely unresponsive state, while 2.2 handles identical
> > situations gracefully.
> >
> The interesting thing from other reports is that it appears to be kswapd
> using up CPU resources. Not the swapout code at all. So it appears
> to be a fundamental VM issue. And calling swapoff is just a good way
> to trigger it.
>
> If you could confirm this by calling swapoff sometime other than at
> reboot time. That might help. Say by running top on the console.

The thing goes comatose here too. SCHED_RR vmstat doesn't run, console switch is nogo...

After running his memory hog, swapoff took 18 seconds. I hacked a bleeder valve for dead swap pages, and it dropped to 4 seconds.. still utterly comatose for those 4 seconds though.

	-Mike
Re: Break 2.4 VM in five easy steps
>Furthermore, I am not demanding anything, much less "priority fixing"
>for this bug. Its my personal opinion that this is the most critical bug
>in the 2.4 series, and if I had the time and skill, this is what I would
>be working on. Because I don't have the time and skill, I am perfectly
>happy to wait until those that do fix the problem. To say it isn't a
>problem because I can buy more disk is nonsense, and its that sort of
>thinking that leads to constant need to upgrade hardware in the
>proprietary OS world.
>
>Sean

This would reflect the Microsoft way of programming: if there's a bug in the system, don't fix it, but upgrade your hardware. Why do you think the requirements for Windows are so great? Most of their code is very inefficient. I'm sure they programmed their kernel in Visual Basic. The worst part is that they get paid to do this! I program in Linux because I don't want to be associated with that mindset that made Microsoft such a [fill in the blank].

As for the 2.4 VM problem, what are you doing with your machine that's making it use up so much memory? I have several processes running on mine all the time, including a slew in X, and I have yet to see significant swap activity.

-- Ted

P.S. My faithful Timex Sinclair from the 80's never had swap :-)
Re: Break 2.4 VM in five easy steps
On Tue, 5 Jun 2001, Derek Glidden wrote:

> After reading the messages to this list for the last couple of weeks and
> playing around on my machine, I'm convinced that the VM system in 2.4 is
> still severely broken. ...

Hi,

Can you try the patch below to see if it helps? If you watch with vmstat, you should see swap shrinking after your test. Let it shrink a while and then see how long swapoff takes. Under a normal load, it'll munch a handful of them at least once a second and keep them from getting annoying. (theory;)

	-Mike

--- linux-2.4.5.ac5/mm/vmscan.c.org	Sat Jun  2 07:37:16 2001
+++ linux-2.4.5.ac5/mm/vmscan.c	Wed Jun  6 18:29:02 2001
@@ -1005,6 +1005,53 @@
 	return ret;
 }
 
+int deadswap_reclaim(unsigned int priority)
+{
+	struct list_head * page_lru;
+	struct page * page;
+	int maxscan = nr_active_pages >> priority;
+	int nr_reclaim = 0;
+
+	/* Take the lock while messing with the list... */
+	spin_lock(&pagemap_lru_lock);
+	while (maxscan-- > 0 && (page_lru = active_list.prev) != &active_list) {
+		page = list_entry(page_lru, struct page, lru);
+
+		/* Wrong page on list?! (list corruption, should not happen) */
+		if (!PageActive(page)) {
+			printk("VM: refill_inactive, wrong page on list.\n");
+			list_del(page_lru);
+			nr_active_pages--;
+			continue;
+		}
+
+		if (PageSwapCache(page) &&
+		    (page_count(page) - !!page->buffers) == 1 &&
+		    swap_count(page) == 1) {
+			if (page->buffers || TryLockPage(page)) {
+				ClearPageReferenced(page);
+				ClearPageDirty(page);
+				page->age = 0;
+				deactivate_page_nolock(page);
+			} else {
+				page_cache_get(page);
+				spin_unlock(&pagemap_lru_lock);
+				delete_from_swap_cache_nolock(page);
+				spin_lock(&pagemap_lru_lock);
+				UnlockPage(page);
+				page_cache_release(page);
+			}
+			nr_reclaim++;
+			continue;
+		}
+		list_del(page_lru);
+		list_add(page_lru, &active_list);
+	}
+	spin_unlock(&pagemap_lru_lock);
+
+	return nr_reclaim;
+}
+
 DECLARE_WAIT_QUEUE_HEAD(kreclaimd_wait);
 /*
  * Kreclaimd will move pages from the inactive_clean list to the
@@ -1027,7 +1074,7 @@
 	 * We sleep until someone wakes us up from
 	 * page_alloc.c::__alloc_pages().
 	 */
-	interruptible_sleep_on(&kreclaimd_wait);
+	interruptible_sleep_on_timeout(&kreclaimd_wait, HZ);
 
 	/*
 	 * Move some pages from the inactive_clean lists to
@@ -1051,6 +1098,7 @@
 		}
 		pgdat = pgdat->node_next;
 	} while (pgdat);
+	deadswap_reclaim(4);
 	}
 }
Re: Break 2.4 VM in five easy steps
Derek Glidden <[EMAIL PROTECTED]> writes:

> The problem I reported is not that 2.4 uses huge amounts of swap but
> that trying to recover that swap off of disk under 2.4 can leave the
> machine in an entirely unresponsive state, while 2.2 handles identical
> situations gracefully.

The interesting thing from other reports is that it appears to be kswapd using up CPU resources. Not the swapout code at all. So it appears to be a fundamental VM issue. And calling swapoff is just a good way to trigger it.

If you could confirm this by calling swapoff sometime other than at reboot time. That might help. Say by running top on the console.

Eric
Re: Break 2.4 VM in five easy steps
On Wed, 06 Jun 2001, Dr S.M. Huen wrote:

> The whole screaming match is about whether a drastic degradation on using
> swap with less than the 2*RAM swap specified by the developers should lead
> one to conclude that a kernel is "broken".

I would argue that any system that performs substantially worse with swap == 1xRAM than a system with swap == 0xRAM is fundamentally broken.

It seems that with today's 2.4.x kernel, people running programs totalling LESS THAN their physical DRAM are having swap problems. They should not even be using 1 byte of swap.

The whole point of swapping pages is to give you more memory to execute programs. If I want to execute 140MB of programs+kernel on a system with 128MB of RAM, I should be able to do the job effectively with ANY amount of "total memory" exceeding 140MB - not some hokey 128MB RAM + 256MB swap just because the kernel is too fscked up to deal with a small swap file.

-- 
/***
 ** Mark Salisbury | Mercury Computer Systems
 ***/
Re: Break 2.4 VM in five easy steps
"Eric W. Biederman" wrote:
>
> > Or are you saying that if someone is unhappy with a particular
> > situation, they should just keep their mouth shut and accept it?
>
> It's worth complaining about. It is also worth digging into and find
> out what the real problem is. I have a hunch that this whole
> conversation on swap sizes being irritating is hiding the real
> problem.

I totally agree with this, and want to reiterate that the original problem I posted has /nothing/ to do with the "swap == 2*RAM" issue.

The problem I reported is not that 2.4 uses huge amounts of swap but that trying to recover that swap off of disk under 2.4 can leave the machine in an entirely unresponsive state, while 2.2 handles identical situations gracefully.

I'm annoyed by 2.4's "requirement" of too much swap, but I consider that less a bug and more a severe design flaw.

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
#!/usr/bin/perl -w
$_='while(read+STDIN,$_,2048){$a=29;$b=73;$c=142;$t=255;@t=map
{$_%16or$t^=$c^=($m=(11,10,116,100,11,122,20,100)[$_/16%8])&110;
$t^=(72,@z=(64,72,$a^=12*($_%16-2?0:$m&17)),$b^=$_%64?12:0,@z)
[$_%8]}(16..271);if((@a=unx"C*",$_)[20]&48){$h=5;$_=unxb24,join
"",@b=map{xB8,unxb8,chr($_^$a[--$h+84])}@ARGV;s/...$/1$&/;$d=
unxV,xb25,$_;$e=256|(ord$b[4])<<9|ord$b[3];$d=$d>>8^($f=$t&($d
>>12^$d>>4^$d^$d/8))<<17,$e=$e>>8^($t&($g=($q=$e>>14&7^$e)^$q*
8^$q<<6))<<9,$_=$t[$_]^(($h>>=8)+=$f+(~$g&$t))for@a[128..$#a]}
print+x"C*",@a}';s/x/pack+/g;eval
usage: qrpff 153 2 8 105 225 < /mnt/dvd/VOB_FILENAME \
 | extract_mpeg2 | mpeg2dec -
http://www.eff.org/ http://www.opendvd.org/
http://www.cs.cmu.edu/~dst/DeCSS/Gallery/
Re: Break 2.4 VM in five easy steps
On Wed, 6 Jun 2001, Kurt Roeckx wrote:

> On Wed, Jun 06, 2001 at 10:57:57AM +0100, Dr S.M. Huen wrote:
> > On Wed, 6 Jun 2001, Sean Hunter wrote:
> > >
> > > For large memory boxes, this is ridiculous. Should I have 8GB of swap?
> > >
> > Do I understand you correctly?
> > ECC grade SDRAM for your 8GB server costs £335 per GB as 512MB sticks even
> > at today's silly prices (Crucial). Ultra160 SCSI costs £8.93/GB as 73GB
> > drives.
>
> Maybe you really should reread the statements people made about
> this before.

I think you might do with a more careful quoting or reading of the thread yourself before casting such aspersions. I did not recommend swap use. I argued that it was not reasonable to reject a 2*RAM swap requirement on cost grounds. There are those who do not think this argument adequate because of grounds other than hardware cost (e.g. retrofitting existing farms, laptops with zillions of OSes etc.)

> That swap = 2 * RAM is just a guideline, you really should look
> at what applications you run, and how much memory they use. If you
> choose your RAM so that all applications can always be in memory
> at all times, there is no need for swap. If they can't be, the
> rule might help you.

I think the whole argument of the thread is against you here. It seems that if you do NOT provide 2*RAM you get into trouble much earlier than you expect (a few argue that even if you do, you get trouble). If it were just a guideline that gracefully degraded your performance the other lot wouldn't be screaming. The whole screaming match is about whether a drastic degradation on using swap with less than the 2*RAM swap specified by the developers should lead one to conclude that a kernel is "broken".

To conclude, this is not a hypothetical argument about whether to operate completely in core. There's not a person on LKML who doesn't know running in RAM is better than running swapping. It is one where users do swap but allocate a size smaller than that recommended and are adversely affected. It is about whether a kernel that reacts this way could be regarded as stable. Answe
Re: Break 2.4 VM in five easy steps
Derek Glidden <[EMAIL PROTECTED]> writes:

> John Alvord wrote:
> >
> > On Wed, 06 Jun 2001 11:31:28 -0400, Derek Glidden
> > <[EMAIL PROTECTED]> wrote:
> >
> > >I'm beginning to be amazed at the Linux VM hackers' attitudes regarding
> > >this problem. I expect this sort of behaviour from academics - ignoring
> > >real actual problems being reported by real actual people really and
> > >actually experiencing and reporting them because "technically" or
> > >"theoretically" they "shouldn't be an issue" or because "the literature
> > >[documentation] says otherwise" - but not from this group.
> >
> > There have been multiple comments that a fix for the problem is
> > forthcoming. Is there some reason you have to keep talking about it?
>
> Because there have been many more comments that "The rule for 2.4 is
> 'swap == 2*RAM' and that's the way it is" and "disk space is cheap -
> just add more" than there have been "this is going to be fixed", which is
> extremely discouraging and doesn't instill me with all sorts of
> confidence that this problem is being taken seriously.

The hard rule will always be that to cover all pathological cases swap must be greater than RAM, because in the worst case all RAM will be in the swap cache. That this is more than just the worst case in 2.4 is problematic. I.e. in the worst case:

	Virtual Memory = RAM + (swap - RAM)

You can't improve the worst case. We can improve the worst case that many people are facing.

> Or are you saying that if someone is unhappy with a particular
> situation, they should just keep their mouth shut and accept it?

It's worth complaining about. It is also worth digging into and finding out what the real problem is. I have a hunch that this whole conversation on swap sizes being irritating is hiding the real problem.

Eric
Re: Break 2.4 VM in five easy steps
On Wednesday, 06 June 2001, at 10:19:30 +0200, Xavier Bestel wrote:

> On 05 Jun 2001 23:19:08 -0400, Derek Glidden wrote:
> > On Wed, Jun 06, 2001 at 12:16:30PM +1000, Andrew Morton wrote:
> [...]
> Did you try to put twice as much swap as you have RAM ? (e.g. add a 512M
> swapfile to your box)

I'm not a kernel guru, nor can I even try to understand how an operating system's memory management is designed or behaves. But I have some questions and thoughts:

1. Is swap = 2xRAM a design issue, or just a recommendation to get best results based on the current VM subsystem's status?

2. Wouldn't performance drop quickly when the VM starts to swap processes/pages to disk, instead of keeping them in RAM? Maybe having a couple of GB worth of processes on disk is not very wise.

3. Shouldn't an ideal VM manage swap space as an extension of the system's RAM (of course, taking into account that RAM is much faster than HD, and nothing should be on swap if there is room enough in RAM)?

4. Wouldn't you say that "adding more swap" (maybe 2xRAM is a recommendation, maybe a temporary fix, maybe a design decision) is the M$-way of fixing things? If there is a _real_ need for more swap to get a well-behaving system, let's add swap. But we shouldn't hide inner design and/or implementation problems under the "cheap multigigabyte disks" argument.

5. AFAIK, kernel developers are well aware of current 2.4.x problems in some areas. I don't think insisting on certain problems without providing ideas, testing, or support, and limiting oneself to just blaming the authors, is the best way to go. Maybe kernel hackers are the most interested of all in fixing all these issues ASAP.

Just some thoughts from someone unable to write C code and help fix this mess ;).

-- 
José Luis Domingo López
Linux Registered User #189436 Debian GNU/Linux Potato (P166 64 MB RAM)

jdomingo EN internautas PUNTO org => Spam? Face the consequences
jdomingo AT internautas DOT org => Spam at your own risk
Re: Break 2.4 VM in five easy steps
On Wed, Jun 06, 2001 at 10:57:57AM +0100, Dr S.M. Huen wrote: > On Wed, 6 Jun 2001, Sean Hunter wrote: > > > > > For large memory boxes, this is ridiculous. Should I have 8GB of swap? > > > > Do I understand you correctly? > ECC grade SDRAM for your 8GB server costs £335 per GB as 512MB sticks even > at today's silly prices (Crucial). Ultra160 SCSI costs £8.93/GB as 73GB > drives. Maybe you really should reread the statements people made about this before. One of them was that if you're not using swap in 2.2, you won't need any in 2.4 either. 2.4 will merely use more swap in the cases where it does use some. It now works more like other UNIX variants, where the rule of thumb is swap = 2 * RAM. That swap = 2 * RAM is just a guideline; you really should look at what applications you run and how much memory they use. If you choose your RAM so that all applications can be in memory at all times, there is no need for swap. If they can't be, the rule might help you. I think someone said that the swap should be large enough to hold all the applications that are running, in case you want to use swap at all. Disk may be a lot cheaper than RAM, but it's also a lot slower. Kurt
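The sizing rule Kurt describes is simple arithmetic; as a hedged sketch (the 512 MB RAM figure is only an illustration, not a value from this thread):

```shell
#!/bin/sh
# Sketch of the "swap = 2 * RAM" guideline discussed above.
# ram_mb is a made-up example value, not a recommendation.
ram_mb=512
swap_mb=$((2 * ram_mb))
echo "RAM: ${ram_mb} MB -> suggested swap: ${swap_mb} MB"
```

As the thread keeps stressing, this is a rule of thumb: if the working set of your applications always fits in RAM, no swap is strictly required.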
Re: Break 2.4 VM in five easy steps
On Wed, Jun 06, 2001 at 06:48:32AM -0400, Alexander Viro wrote: > On Wed, 6 Jun 2001, Sean Hunter wrote: > > > This is completely bogus. I am not saying that I can't afford the swap. > > What I am saying is that it is completely broken to require this amount > > of swap given the boundaries of efficient use. > > Funny. I can count many ways in which 4.3BSD, SunOS{3,4} and post-4.4 BSD > systems I've used were broken, but I've never thought that swap==2*RAM rule > was one of them. > > Not that being more kind on swap would be a bad thing, but that rule for > amount of swap is pretty common. ISTR similar for (very old) SCO, so it's > not just BSD world. How are modern Missed'em'V variants in that respect, BTW? Although I don't have any swap-trouble myself, what I think most people are having problems with is not that Linux doesn't have the "you-dont-need-2xRAM-size-swap-if-you-swap-at-all feature", but that it lost it in 2.4. -- Linux 2.4.5-ac9 #5 Wed Jun 6 18:30:24 CEST 2001
Re: Break 2.4 VM in five easy steps
On Wed, 6 Jun 2001, Alexander Viro wrote: > On Wed, 6 Jun 2001, Sean Hunter wrote: > > > This is completely bogus. I am not saying that I can't afford the swap. > > What I am saying is that it is completely broken to require this amount > > of swap given the boundaries of efficient use. > > Funny. I can count many ways in which 4.3BSD, SunOS{3,4} and post-4.4 BSD > systems I've used were broken, but I've never thought that swap==2*RAM rule > was one of them. > > Not that being more kind on swap would be a bad thing, but that rule for > amount of swap is pretty common. ISTR similar for (very old) SCO, so it's > not just BSD world. How are modern Missed'em'V variants in that respect, BTW? frequently when building out a solaris web farm you have to just bite it and throw away half your disk for swap that will never be used. it's got pessimistic memory allocation by default. you can do something with mmap() to get an optimistic allocation, but i didn't trust making this change to apache when i was involved with a farm like this... i didn't want to be debugging any potential low memory problems. -dean
Re: Break 2.4 VM in five easy steps
On 6 Jun 2001, Eric W. Biederman wrote: > "Jeffrey W. Baker" <[EMAIL PROTECTED]> writes: > > > On Tue, 5 Jun 2001, Derek Glidden wrote: > > > > > > > > After reading the messages to this list for the last couple of weeks and > > > playing around on my machine, I'm convinced that the VM system in 2.4 is > > > still severely broken. > > > > > > This isn't trying to test extreme low-memory pressure, just how the > > > system handles recovering from going somewhat into swap, which is a real > > > day-to-day problem for me, because I often run a couple of apps that > > > most of the time live in RAM, but during heavy computation runs, can go > > > a couple hundred megs into swap for a few minutes at a time. Whenever > > > that happens, my machine always starts acting up afterwards, so I > > > started investigating and found some really strange stuff going on. > > > > I reboot each of my machines every week, to take them offline for > > intrusion detection. I use 2.4 because I need advanced features of > > iptables that ipchains lacks. Because the 2.4 VM is so broken, and > > because my machines are frequently deeply swapped, they can sometimes take > > over 30 minutes to shutdown. They hang of course when the shutdown rc > > script turns off the swap. The first few times this happened I assumed > > they were dead. > > Interesting. Is it constant disk I/O? Or constant CPU utilization. > In any case you should be able to comment that line out of your shutdown > rc script and be in perfectly good shape. Well I can't exactly run top(1) at shutdown time, but the disks aren't running at all. Either the system is using the CPUs, or it is blocked waiting for something to happen. You're right about swapoff, we removed it from our shutdown script.
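The workaround agreed on above (dropping the swapoff call from the shutdown script) might look like the following sketch; the script path and surrounding lines are hypothetical, since every distribution lays out its rc scripts differently:

```shell
#!/bin/sh
# Hypothetical excerpt from a shutdown rc script; real locations vary
# (e.g. /etc/rc.d/rc.0 or /etc/init.d/halt, depending on distribution).

# On a deeply-swapped 2.4 box, "swapoff -a" forces every swapped page
# back into RAM and can block for many minutes, so it is simply
# commented out; swap state is discarded at power-off anyway.
#swapoff -a

msg="skipping swapoff at shutdown"
echo "$msg"
```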
Re: Break 2.4 VM in five easy steps
On Wed, 6 Jun 2001, Dr S.M. Huen wrote: > If you can afford 4GB RAM, you certainly can afford 8GB swap. this is a completely crap argument. you should study the economics of managing a farm of thousands of machines some day. when you do this, you'll also learn to consider the power requirements (8W+ per 3.5" disk) which you need to bring to each rack, supply backup UPS/generator power for, and exhaust through your air conditioning for each of these useless swap disks. plus you'll also learn to consider the wages for the unlucky person who has to go around to every box in a farm, open it up, and install another disk. plus you'll learn that the time this person spent installing new disks wasn't spent installing new systems, which means you couldn't bring as many customers on line this month, which means you may not make revenue targets. plus you'll learn that every time you open a box that's been in production for a while, there's a small, but noticeable, chance that it won't reboot. so your normal monthly failure rate will go from the 2% range up to the 5% range. -dean
Re: Break 2.4 VM in five easy steps
Richard Gooch wrote: > > Daniel Phillips writes: > > On Wednesday 06 June 2001 10:54, Sean Hunter wrote: > > > > > > Did you try to put twice as much swap as you have RAM ? (e.g. add a > > > > 512M swapfile to your box) > > > > This is what Linus recommended for 2.4 (swap = 2 * RAM), saying > > > > that anything less won't do any good: 2.4 overallocates swap even > > > > if it doesn't use it all. So in your case you just have enough swap > > > > to map your RAM, and nothing to really swap your apps. > > > > > > For large memory boxes, this is ridiculous. Should I have 8GB of > > > swap? > > Sure. It's cheap. If you don't mind slumming it, go and buy a 20 GB > IDE drive for US$65. I know RAM has gotten a lot cheaper lately (US$66 > for a 512 MiB PC133 DIMM), but it's still far more expensive. If you > can afford 4 GiB of RAM, you can definitely afford 8 GiB of swap. For me, the problem is not the money. If I have a system that needs 4GB of RAM, it is highly unlikely that I would ever want to be running this machine with 8GB of swap active. However, I may be willing to tolerate 1GB of swapping before paging to disk slowed things down too much. This is the exact scenario I had when dealing with a large Sun machine running Oracle and some other stuff. Oracle was dedicated large amounts of RAM, but if I wanted to run a quick, memory-intensive program too (and at the moment performance wasn't all that big of a deal), then using some swap was OK. So, I too cast my vote for the 2*RAM requirement being odious and in need of fixing!! It could be a suggestion, but I would consider that if not following the suggestion caused more than a 10% slowdown, then things are still broken; optimally, it should work like 2.2 does (in other words, I don't notice, and don't particularly care, how much swap per RAM I need, just how much total RAM-like-stuff I need.)
Thanks, Ben -- Ben Greear <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear
Re: Break 2.4 VM in five easy steps
John Alvord wrote: > > On Wed, 06 Jun 2001 11:31:28 -0400, Derek Glidden > <[EMAIL PROTECTED]> wrote: > > > > >I'm beginning to be amazed at the Linux VM hackers' attitudes regarding > >this problem. I expect this sort of behaviour from academics - ignoring > >real actual problems being reported by real actual people really and > >actually experiencing and reporting them because "technically" or > >"theoretically" they "shouldn't be an issue" or because the "literature" > >[documentation] says otherwise - but not from this group. > > There have been multiple comments that a fix for the problem is > forthcoming. Is there some reason you have to keep talking about it? Because there have been many more comments that "The rule for 2.4 is 'swap == 2*RAM' and that's the way it is" and "disk space is cheap - just add more" than there have been "this is going to be fixed", which is extremely discouraging and doesn't instill me with all sorts of confidence that this problem is being taken seriously. Or are you saying that if someone is unhappy with a particular situation, they should just keep their mouth shut and accept it?
-- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- #!/usr/bin/perl -w $_='while(read+STDIN,$_,2048){$a=29;$b=73;$c=142;$t=255;@t=map {$_%16or$t^=$c^=($m=(11,10,116,100,11,122,20,100)[$_/16%8])&110; $t^=(72,@z=(64,72,$a^=12*($_%16-2?0:$m&17)),$b^=$_%64?12:0,@z) [$_%8]}(16..271);if((@a=unx"C*",$_)[20]&48){$h=5;$_=unxb24,join "",@b=map{xB8,unxb8,chr($_^$a[--$h+84])}@ARGV;s/...$/1$&/;$d= unxV,xb25,$_;$e=256|(ord$b[4])<<9|ord$b[3];$d=$d>>8^($f=$t&($d >>12^$d>>4^$d^$d/8))<<17,$e=$e>>8^($t&($g=($q=$e>>14&7^$e)^$q* 8^$q<<6))<<9,$_=$t[$_]^(($h>>=8)+=$f+(~$g&$t))for@a[128..$#a]} print+x"C*",@a}';s/x/pack+/g;eval usage: qrpff 153 2 8 105 225 < /mnt/dvd/VOB_FILENAME \ | extract_mpeg2 | mpeg2dec - http://www.eff.org/http://www.opendvd.org/ http://www.cs.cmu.edu/~dst/DeCSS/Gallery/
Re: Break 2.4 VM in five easy steps
On Wed, 06 Jun 2001 11:31:28 -0400, Derek Glidden <[EMAIL PROTECTED]> wrote: > >I'm beginning to be amazed at the Linux VM hackers' attitudes regarding >this problem. I expect this sort of behaviour from academics - ignoring >real actual problems being reported by real actual people really and >actually experiencing and reporting them because "technically" or >"theoretically" they "shouldn't be an issue" or because the "literature" >[documentation] says otherwise - but not from this group. There have been multiple comments that a fix for the problem is forthcoming. Is there some reason you have to keep talking about it? John alvord
Re: Break 2.4 VM in five easy steps
OK, Linus said that if I use swap, it should be at least twice as much as RAM. There will be much more discussion about it; to me this constraint is a very, very bad idea. Have you ever thought about diskless workstations? Swapping over a network sounds ugly. Nevertheless, my question is: what happens if I plan to use no swap? I have enough memory installed for my purposes, and every swapping operation can do only one thing: slow down the system. Is there a different behaviour if I completely disable swap? greetings Christian Bornträger
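On Christian's question: if the working set genuinely always fits in RAM, running with no swap at all is a workable configuration, since the kernel then simply has nowhere to page anonymous memory out to. A small sketch (using the /proc/meminfo field names, which should hold on 2.4 and later kernels) for checking whether a machine is actually swapless:

```shell
#!/bin/sh
# Sketch: report whether any swap is configured, via /proc/meminfo.
# SwapTotal is reported in kB; 0 means the machine runs swapless.
swap_total_kb=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
if [ "${swap_total_kb:-0}" -eq 0 ]; then
    echo "no swap configured"
else
    echo "swap configured: ${swap_total_kb} kB"
fi
```

On most systems `swapon -s` gives the same information per device; the /proc approach is used here only because it is easy to script.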
Re: Break 2.4 VM in five easy steps
> Funny. I can count many ways in which 4.3BSD, SunOS{3,4} and post-4.4 BSD > systems I've used were broken, but I've never thought that swap==2*RAM rule > was one of them. Yes, but Linux isn't 4.3BSD, SunOS or post-4.4 BSD. Not to mention, all the other OSes I've had experience using *don't* break severely if you don't follow the "swap==2*RAM" rule. Except Linux 2.4. > Not that being more kind on swap would be a bad thing, but that rule for > amount of swap is pretty common. ISTR similar for (very old) SCO, so it's > not just BSD world. How are modern Missed'em'V variants in that respect, BTW? Yes, but that has traditionally been one of the big BENEFITS of Linux and other UNIXes. As Sean Hunter said, "Virtual memory is one of the killer features of unix." Linux has *never* in the past REQUIRED me to follow that rule, which is a big reason I use it in so many places. Take an example mentioned by someone on the list already: a laptop. I have two laptops that run Linux. One has a 4GB disk, one has a 12GB disk. Both disks are VERY full of data and both machines get pretty heavy use. It's a fact that I just bumped one laptop (with 256MB of swap configured) from 128MB to 256MB of RAM. Does this mean that if I want to upgrade to the 2.4 kernel on that machine I now have to back up all that data, repartition the drive and restore everything, just so I can fastidiously follow the "swap == 2*RAM" rule, else the 2.4 VM subsystem will break? Bollocks, to quote yet another participant in this silly discussion. I'm beginning to be amazed at the Linux VM hackers' attitudes regarding this problem. I expect this sort of behaviour from academics - ignoring real actual problems being reported by real actual people really and actually experiencing and reporting them because "technically" or "theoretically" they "shouldn't be an issue", or because the "literature" [documentation] says otherwise - but not from this group.
-- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Re: Break 2.4 VM in five easy steps
Daniel Phillips writes: > On Wednesday 06 June 2001 10:54, Sean Hunter wrote: > > > > Did you try to put twice as much swap as you have RAM ? (e.g. add a > > > 512M swapfile to your box) > > > This is what Linus recommended for 2.4 (swap = 2 * RAM), saying > > > that anything less won't do any good: 2.4 overallocates swap even > > > if it doesn't use it all. So in your case you just have enough swap > > > to map your RAM, and nothing to really swap your apps. > > > > For large memory boxes, this is ridiculous. Should I have 8GB of > > swap? Sure. It's cheap. If you don't mind slumming it, go and buy a 20 GB IDE drive for US$65. I know RAM has gotten a lot cheaper lately (US$66 for a 512 MiB PC133 DIMM), but it's still far more expensive. If you can afford 4 GiB of RAM, you can definitely afford 8 GiB of swap. > And laptops with big memories and small disks. That's not that common, though. Usually you get far more disc than RAM on a laptop, just as with a desktop. Regards, Richard Permanent: [EMAIL PROTECTED] Current: [EMAIL PROTECTED]
Re: Break 2.4 VM in five easy steps
Helge Hafting wrote: > > The drive is inactive because it isn't needed, the machine is > running loops on data in memory. And it is unresponsive because > nothing else is scheduled, maybe "swapoff" is easier to implement I don't quite get what you're saying. If the system becomes unresponsive because the VM swap recovery parts of the kernel are interfering with the kernel scheduler then that's also bad, because there absolutely *are* other processes that should be getting time, like the console windows/shells at which I'm logged in. If they aren't getting it specifically because the VM is preventing them from receiving execution time, then that's another bug. > when processes cannot try to allocate more or touch pages > while it runs. "swapoff" isn't something you normally do often, > so it don't have to be nice. I'm not familiar enough with the swapping bits of the kernel code, so I could be totally wrong, but turning off a swap file/partition should just call the same parts of the VM subsystem that would normally try to recover swap space under memory pressure. Using "swapoff" to force this behaviour should just force it to happen manually rather than when memory pressure is high enough. Which means that if that's the normal behaviour of the VM subsystem when memory pressure gets high and it needs to recover unused pages from swap - i.e. the machine stops running - then that's still very broken behaviour, no matter what instigated the occurrence. > Still, I find it strange that swapoff should take much more time, > even if you can get 2.2 to have the same amount in swap. So do I. Hence the original report.
-- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-