Jeff Dike wrote:
> On Sat, May 05, 2007 at 10:33:22PM +0200, Jan Ploski wrote:
>
>>I repeated the tests with linux-2.6.21-rc7-mm2 (didn't compile right
>>away, required a one-liner fix)
>
> What fix?

kernel/tlb.c:255 contains the statement:

log_info("total flush time - %Ld nsecs\n", end_time - start_time);

"end_time" (or maybe "start_time"?) was undefined, I just commented out
the line to get it to compile.

>>1) [Note: this has nothing to do with the previously reported problem,
>>but might be interesting anyway:] Compiling the UML kernel on SLES10
>>with its shipped gcc-4.1.0 (ignoring warnings during the compilation) is
>>a bad idea. With such a miscompiled kernel, 80% of my test cases hang at
>>random points.
>
> Hmmm, I've got 4.1.1 here with no problems and I haven't noticed
> problems with gcc recently, although I haven't kept track of what I
> had.

The version I have is gcc-4.1.0-28.4, it reports as gcc (GCC) 4.1.0
(SUSE Linux). This is on x86_64 architecture.

>>3) I noticed that the UML console output, which I redirected to a file
>>with > in my wrapper shell script, was being randomly truncated. As a
>>remedy, I changed "con0=fd:0,fd:1" to "con0=pts,fd:1" on the command
>>line. Now I'm getting the complete console output from each run
>>collected, which is good.
>
> One UML per log file, so they're not stepping on each other?

Yes. Each UML has its own working directory.

>>4) I am now experiencing random segmentation faults - for example, in 18
>>out of 842 UML instances in today's test. The root_fs is Debian stable,
>>so I wouldn't blame it. It also does not seem to be flaky hardware, as
>>the instances crash on different hardware nodes. In over half of the
>>faulty cases, fsck on boot will crash:
>
> This one I need to fix. This is with rc7-mm2 or 2.6.21-mm1? Can you
> point me to the filesystem you're booting?

2.6.21-mm1. The file system is the Debian root_fs downloaded from the
web site. I did an apt-get upgrade and installed a few packages. I'm
sending you a download link for the whole "experiment" with separate
email.

>>5) Something which I observed only once so far: the UML process does not
>>terminate, but instead starts consuming 100% CPU time. The captured
>>console output ends with "System halted." and does not differ from a
>>successful run.
>
> If that happens again, can you attach gdb to it and see where it is?

Ok. From my perspective this type of hang is not a big deal, as I can
have my wrapper script watch the log file and kill off an instance which
gets stuck at the end.

>>6) When running a UML instance to edit my root_fs (with all other
>>instances killed, of course) I get:
>>
>>F_SETLK failed, file already locked by pid 934
>>Failed to lock 'root_fs', err = 11
>>Failed to open 'root_fs', errno = 11
>>ubda: Can't open "root_fs": errno = 11
>>
>>with a subsequent Kernel panic. There is never any process with pid 934,
>>nor any other UML instance which could be the culprit.
>
> There is some UML process still hanging around - it may not be
> obviously UML, but it should be there. If not, then this is a host
> kernel bug, but I would put good money on there being a UML process
> that you're not noticing.

I can reproduce it any time in my current setup. ps auxwww|grep linux
shows no UML processes hanging around. I have also written a small test
program which just attempts to lock the file. In this program,
fcntl(fd, F_SETLK, &lock) fails with errno = Bad file descriptor.

Best regards -
Jan Ploski

_______________________________________________
User-mode-linux-user mailing list
User-mode-linux-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-user
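
A note on the "Bad file descriptor" result in item 6. The test program itself is not shown in this message, so the following is only an assumption about what may be going on. fcntl(2) documents EBADF for F_SETLK when the descriptor's open mode does not match the requested lock type, for example F_WRLCK on a descriptor opened with O_RDONLY. In that case the test would fail before it ever checks for a competing lock. Below is a minimal sketch of a lock test that separates the two cases and, on a real conflict (EAGAIN or EACCES), asks the kernel via F_GETLK which pid holds the lock. The helper name try_lock is hypothetical; this is not UML's locking code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Try to take a non-blocking whole-file lock on `path`.
 *   open_flags: O_RDONLY, O_WRONLY or O_RDWR
 *   lock_type:  F_RDLCK or F_WRLCK
 * Returns 0 on success, otherwise the errno of the call that failed.
 * try_lock is a hypothetical helper written for this illustration.
 */
int try_lock(const char *path, int open_flags, short lock_type)
{
    assert(path != NULL);

    int fd = open(path, open_flags);
    if (fd < 0) {
        int open_err = errno;
        fprintf(stderr, "open(%s): %s\n", path, strerror(open_err));
        return open_err;
    }

    struct flock lock;
    memset(&lock, 0, sizeof(lock));
    lock.l_type = lock_type;
    lock.l_whence = SEEK_SET;
    lock.l_start = 0;
    lock.l_len = 0;               /* 0 means "up to end of file" */

    int err = 0;
    if (fcntl(fd, F_SETLK, &lock) < 0) {
        err = errno;
        fprintf(stderr, "F_SETLK: %s\n", strerror(err));

        if (err == EAGAIN || err == EACCES) {
            /* Real conflict: ask the kernel who holds the lock. */
            struct flock holder = lock;
            if (fcntl(fd, F_GETLK, &holder) == 0 &&
                holder.l_type != F_UNLCK)
                fprintf(stderr, "conflicting lock held by pid %ld\n",
                        (long)holder.l_pid);
        }
    }

    /* Closing the descriptor drops any lock this process holds on the file. */
    close(fd);
    return err;
}
```

Calling try_lock("root_fs", O_RDONLY, F_WRLCK) is expected to return EBADF on Linux regardless of other lockers, while try_lock("root_fs", O_RDWR, F_WRLCK) should return 0 when nobody else holds a lock, or EAGAIN/EACCES plus the holder's pid when somebody does.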