On Thu, Oct 27, 2005 at 10:56:58PM +0200, Grzegorz Nosek wrote:
> Hello all,
>
> I have noticed a disturbing pattern on my smp systems with vserver
> patches (2.6.13.4 with vserver patches from gentoo). If the load
> average is quite high (in my situation it was about 70 in one context
> and about 30 in context 1), I experience random hard freezes. These
> are not (IMO) from thrashing etc., as the machine easily survives
> higher loads. At the console I can switch between VTs and that's about
> all I can do. No networking or anything.
>
> To keep it as clear as possible (too much blood in my caffeine
> stream): I have two SMP machines (a dual 1.8GHz Xeon and an AMD64 x2
> 3800+ in 32-bit mode). While experimenting with vserver guests on the
> Xeons I have often encountered oopses when the vserver was not shut
> down properly (due to issues with my initscripts). It looked like
> this:
>
> - vserver vXXX start
> (some errors from my scripts)
> ...
> - vserver vXXX stop
> (shutdown messages, hanging after 'Deconfiguring network interfaces')
>
> vwait waits and waits forever (I haven't patched it yet)
>
> after killing vwait I can no longer access the context (chcontext
> segfaults with a kernel oops - I should still have logs somewhere if
> you are interested)

yes, definitely, any _oops_ or _stack_ _trace_ issued while
a linux-vserver kernel is running _is_ interesting and should
be reported back to the linux-vserver kernel developers ...

> The stack traces apparently have null dereferences in an impossible
> place. The oops seems to happen in __create_vx_info, just after
> returning from __dealloc_vx_info. That line contains an instruction
> like mov %eax,%esi or something to this effect (not accessing memory
> at all).

this is something we fixed in devel recently (two weeks ago?)
as you said, you are using the gentoo (devel) branch, so
this might be related, but still not updated ...

> I also experienced occasional lockups under high load (a make -j100
> kernel build inside one vserver :) )

could be related too, we had a thread and discussion about
similar effects, once again, this only affects the devel
branch and is already fixed ...

> I have compiled the kernel again with vserver debugging and history
> logging (whatever it is called) and yesterday when I was shutting down
> a vserver vwait didn't exit too. So I killed it and wanted to
> chcontext into that vserver to invoke the kernel oops and have some
> more debugging info. The machine locked hard (under zero load). I was
> unable to recover any debugging info as it didn't hit syslog (will
> build something with network console soon probably).
>
> The AMD64 box was experiencing random lockups too, not related to
> shutting down vservers or anything like that, just when the load was a
> bit higher. I booted a uni-processor kernel and it seems to work OK so
> far.
>
> Has anybody experienced similar problems? I can run the boxes UP for
> now but I'd really need SMP before going into production.
>
> OK, enough of this babbling ;) I suspect that some part of vserver
> support is not SMP-safe in some way. Although I have no real debugging
> data, my gut feeling says it's some spinlock deadlock (and some deep
> bowels add that it might be inside the scheduler).
> I'll try to gather some more information (with a kernel with all
> possible debugging on and a network console).

finally here the fix(es) we did :) (they are in 2.1.0-rc4)

http://vserver.13thfloor.at/Experimental/delta-2.6.13.3-vs2.1.0-rc3-rc3.1.diff.bz2
http://vserver.13thfloor.at/Experimental/delta-2.6.13.3-vs2.1.0-rc3.1-rc3.2.diff.bz2
http://vserver.13thfloor.at/Experimental/delta-2.6.13.3-vs2.1.0-rc3.2-rc3.3.diff.bz2
http://vserver.13thfloor.at/Experimental/delta-2.6.13.3-vs2.1.0-rc3.3-rc3.4.diff.bz2

> If you need more information about my setup, feel free to ask.

as usual, output of testme.sh would be helpful ...
(as written on the testing page)

thanks,
Herbert

> Best regards,
> Grzegorz Nosek

_______________________________________________
Vserver mailing list
[email protected]
http://list.linux-vserver.org/mailman/listinfo/vserver
