Jeff Dike wrote:
> On Sat, May 05, 2007 at 10:33:22PM +0200, Jan Ploski wrote:
> 
>>I repeated the tests with linux-2.6.21-rc7-mm2 (didn't compile right 
>>away, required a one-liner fix) 
> 
> 
> What fix?

kernel/tlb.c:255 contains the statement:
log_info("total flush time - %Ld nsecs\n", end_time - start_time);

"end_time" (or maybe "start_time"?) was undefined, so I just commented 
out the line to get it to compile.

>>1) [Note: this has nothing to do with the previously reported problem, 
>>but might be interesting anyway:] Compiling the UML kernel on SLES10 
>>with its shipped gcc-4.1.0 (ignoring warnings during the compilation) is 
>>a bad idea. With such a miscompiled kernel, 80% of my test cases hang at 
>>random points.
> 
> 
> Hmmm, I've got 4.1.1 here with no problems and I haven't noticed
> problems with gcc recently, although I haven't kept track of what I
> had.

The version I have is gcc-4.1.0-28.4; it reports itself as
gcc (GCC) 4.1.0 (SUSE Linux). This is on the x86_64 architecture.

>>3) I noticed that the UML console output, which I redirected to a file 
>>with > in my wrapper shell script, was being randomly truncated. As a 
>>remedy, I changed "con0=fd:0,fd:1" to "con0=pts,fd:1" on the command 
>>line. Now I'm getting the complete console output from each run 
>>collected, which is good.
> 
> 
> One UML per log file, so they're not stepping on each other?

Yes. Each UML has its own working directory.

>>4) I am now experiencing random segmentation faults - for example, in 18 
>>out of 842 UML instances in today's test. The root_fs is Debian stable, 
>>so I wouldn't blame it. It also does not seem to be flaky hardware, as 
>>the instances crash on different hardware nodes. In over half of the 
>>faulty cases, fsck on boot will crash:
> 
> 
> This one I need to fix.  This is with rc7-mm2 or 2.6.21-mm1?  Can you
> point me to the filesystem you're booting?

2.6.21-mm1. The file system is the Debian root_fs downloaded from the 
web site. I did an apt-get upgrade and installed a few packages. I'm 
sending you a download link for the whole "experiment" in a separate email.

>>5) Something which I observed only once so far: the UML process does not 
>>terminate, but instead starts consuming 100% CPU time. The captured 
>>console output ends with "System halted." and does not differ from a 
>>successful run.
> 
> 
> If that happens again, can you attach gdb to it and see where it is?

Ok. From my perspective this type of hang is not a big deal, as I can 
have my wrapper script watch the log file and kill off an instance which 
gets stuck at the end.
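
As an illustration of the watchdog idea above, here is a minimal sketch in C. It is not the actual wrapper script; the function names log_contains() and kill_if_stuck_after_halt() and the grace period are hypothetical, and it assumes the wrapper knows the pid of the UML process and the path of its console log.

```c
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if any line of the log contains `marker`, 0 if not,
 * and -1 if the log cannot be opened.
 * Simplification: a marker split across the 1023-byte read boundary
 * of a very long line would be missed. */
int log_contains(const char *log_path, const char *marker)
{
    char line[1024];
    int found = 0;
    FILE *fp = fopen(log_path, "r");

    if (fp == NULL)
        return -1;
    while (!found && fgets(line, sizeof(line), fp) != NULL)
        found = (strstr(line, marker) != NULL);
    fclose(fp);
    return found;
}

/* If the log shows the halt message but the process still exists after
 * a grace period, send it SIGKILL.
 * Returns 1 if a kill was sent, 0 otherwise.
 * Assumption: killing this one pid is enough; a real wrapper may need
 * to clean up helper processes as well. */
int kill_if_stuck_after_halt(pid_t pid, const char *log_path,
                             unsigned int grace_seconds)
{
    if (log_contains(log_path, "System halted.") != 1)
        return 0;
    sleep(grace_seconds);
    /* Signal 0 only checks that the pid exists (an unreaped zombie
     * child also counts as existing). */
    if (kill(pid, 0) != 0)
        return 0;
    return kill(pid, SIGKILL) == 0;
}
```

A wrapper could call kill_if_stuck_after_halt() periodically for each running instance. The grace period gives an instance that is shutting down normally time to exit on its own.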

>>6) When running a UML instance to edit my root_fs (with all other 
>>instances killed, of course) I get:
>>
>>F_SETLK failed, file already locked by pid 934
>>Failed to lock 'root_fs', err = 11
>>Failed to open 'root_fs', errno = 11
>>ubda: Can't open "root_fs": errno = 11
>>
>>with a subsequent Kernel panic. There is never any process with pid 934, 
>>nor any other UML instance which could be the culprit. 
> 
> 
> There is some UML process still hanging around - it may not be
> obviously UML, but it should be there.  If not, then this is a host
> kernel bug, but I would put good money on there being a UML process
> that you're not noticing.

I can reproduce it any time in my current setup. "ps auxwww | grep linux" 
shows no UML processes hanging around. I have also written a small test 
program which just attempts to lock the file. In this program, fcntl(fd, 
F_SETLK, &lock) fails with errno = EBADF (Bad file descriptor).
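
For reference, a lock test of this kind might look like the sketch below. It is not the test program mentioned above, and the function name try_write_lock() is hypothetical. It tries a non-blocking write lock on the whole file and, if that fails, uses F_GETLK to ask which pid holds a conflicting lock. Note that POSIX specifies EBADF for F_SETLK when a write lock (F_WRLCK) is requested on a descriptor that is not open for writing, so the flags passed to open() matter. Whether that explains the result above is only a guess.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Try to take a non-blocking write lock on the whole file.
 * open_flags is passed to open(), e.g. O_RDWR or O_RDONLY.
 * Returns 0 on success, otherwise the errno of the failing call.
 * If holder is non-NULL, it receives the pid owning a conflicting
 * lock, or -1 if none was reported. */
int try_write_lock(const char *path, int open_flags, pid_t *holder)
{
    struct flock lock;
    int fd, err;

    if (holder != NULL)
        *holder = -1;

    fd = open(path, open_flags);
    if (fd < 0)
        return errno;

    memset(&lock, 0, sizeof(lock));
    lock.l_type = F_WRLCK;      /* exclusive (write) lock */
    lock.l_whence = SEEK_SET;
    lock.l_start = 0;
    lock.l_len = 0;             /* 0 means "to end of file" */

    if (fcntl(fd, F_SETLK, &lock) == 0) {
        close(fd);              /* closing the file releases the lock */
        return 0;
    }
    err = errno;                /* EAGAIN or EACCES: held by someone else */

    /* Ask the kernel who would block this lock. */
    lock.l_type = F_WRLCK;
    if (holder != NULL && fcntl(fd, F_GETLK, &lock) == 0 &&
        lock.l_type != F_UNLCK)
        *holder = lock.l_pid;

    close(fd);
    return err;
}
```

On Linux, errno 11 is EAGAIN, which is what F_SETLK returns when another process holds a conflicting lock. Calling try_write_lock("root_fs", O_RDWR, &pid) would distinguish that case from EBADF, which points at the descriptor itself rather than at a competing lock.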

Best regards -
Jan Ploski

_______________________________________________
User-mode-linux-user mailing list
User-mode-linux-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-user