Greetings oh UML gurus...

I'm trying to run about 50 UML guests on a dual-processor (each 
dual-core) Opteron system running 64-bit Ubuntu 6.06 LTS.

I have 4 GB of physical RAM in the Ubuntu box plus 3 GB of swap.  The 
host is running a vanilla 2.6.17.8 kernel (no skas3 patches, because I 
haven't managed to apply any of the skas3 patch sets successfully).

Each guest is configured with 48 MB of RAM and 256 MB of swap, and runs 
kernel 2.6.18-rc5 with a 64-bit Mandriva 2006 userland.

Guests are started with a line like this:

/home/uml/kernel64 ubda=/home/$user/root_fs ubdb=/home/$user/swap \
  eth0=daemon,08:00:07:26:c0:04,unix,/opt/uml/run/uml_switch.ctl mem=48M

I assign a fixed MAC so that each virtual machine can DHCP an address.
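
A minimal sketch of how one fixed MAC per guest might be derived. The 
helper name mac_for_guest and the scheme of varying only the last octet 
of the prefix shown above are assumptions for illustration, not a 
description of the actual setup:

```shell
# Hypothetical helper: build a fixed MAC for guest number N (0-255)
# by varying only the last octet of the prefix used in the example above.
mac_for_guest() {
    printf '08:00:07:26:c0:%02x\n' "$1"
}

# Guest number 4 gets the address used in the example command line.
mac_for_guest 4
```

With a fixed mapping like this, the DHCP server can hand the same 
address to the same guest on every boot.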

Here's my list of troubles:

The guest machines drop off the network regularly.  They may be 
pingable for a few minutes, but then dhclient seems to crash on them.  
Stopping and restarting them puts them back on the network for a 
while, but invariably the network dies again.

Guest machines crash at random.  They aren't sending me anything via 
syslog (I've configured them to log to the host machine), and they don't 
appear to log anything to STDOUT.  I can make a machine crash with the 
message in the subject line simply by working it too hard (e.g. log 
in, fire off a few dozen shells, add about 1000 users as fast as bash 
can go through a for loop).  My students, however (up to 25 working 
simultaneously on UML guests), are making the machines crash by doing 
nothing more than adding two or three users manually, looking at man 
pages, and typing 'ls'.

I've built a debugging kernel, successfully started it under gdb, and 
got it running.  Unfortunately, when I hit the message in the subject 
line, it blew me all the way out of gdb.

I have been experiencing these crashes with guest kernels 2.6.17.11, 
2.6.18-rc4, and 2.6.18-rc5.

I am stumped as to what I should do next.  I've had some rather creative 
suggestions from my students, but unfortunately none of them can be 
repeated in mixed company.

One thing that may be connected to the general network failure: when 
I'm starting these machines, I dare not start them any faster than one 
machine every 40 seconds.  At that rate, they successfully DHCP an 
address.  If I start them, say, every 10 seconds, most machines never 
get an address.  In fact, the DHCP server never sees their requests 
come across tap0.
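
The staggered startup described above could be sketched roughly as 
follows.  The function name start_guests, the DRY_RUN switch, the 
per-guest MAC numbering, and the user names alice and bob are all 
hypothetical; only the command line itself comes from the example 
earlier in this message:

```shell
# Hypothetical staggered launcher: start one guest per user, pausing
# DELAY seconds after each launch. With DRY_RUN=1 the command lines
# are only printed, nothing is started and no sleep happens.
start_guests() {
    delay=$1
    shift
    n=0
    for user in "$@"; do
        n=$((n + 1))
        # Assumed scheme: the last MAC octet is the guest's position.
        mac=$(printf '08:00:07:26:c0:%02x' "$n")
        cmd="/home/uml/kernel64 ubda=/home/$user/root_fs ubdb=/home/$user/swap eth0=daemon,$mac,unix,/opt/uml/run/uml_switch.ctl mem=48M"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$cmd"
        else
            # Unquoted on purpose so the string splits into arguments;
            # this only works while the paths contain no spaces.
            $cmd &
            sleep "$delay"
        fi
    done
}

# Print what would be run for two guests, 40 seconds apart.
DRY_RUN=1 start_guests 40 alice bob
```

Lowering the first argument from 40 to 10 reproduces the faster rate 
at which most guests fail to get an address.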

My math indicates that I should be able to fit 50 machines of 48 MB 
each in 4 GB of RAM, but I'm willing to constrain these machines 
further if that will help.
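
The arithmetic behind that estimate, as a quick sketch.  It assumes 
4 GB means 4096 MB, and it ignores the per-process overhead of each UML 
instance and the host's own memory use, so it is a best case rather 
than a measurement:

```shell
guests=50
guest_mem_mb=48
host_ram_mb=4096   # assumption: 4 GB counted as 4096 MB

# Memory needed if every guest touches all of its configured RAM.
total_mb=$((guests * guest_mem_mb))
headroom_mb=$((host_ram_mb - total_mb))

echo "guest RAM: ${total_mb} MB, headroom: ${headroom_mb} MB"
```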

I've disabled /lib/tls and /lib64/tls on both host and guest operating 
systems.

Any help is greatly appreciated.

Jeremy
