* Peter Lieven (p...@kamp.de) wrote:
> Hi David,
>
> On 07.04.2015 at 10:43, Dr. David Alan Gilbert wrote:
> >>>> Any particular workload or reproducer?
> >>> Workload is almost zero. I try to figure out if there is a way to trigger
> >>> it.
> >>>
> >>> Maybe playing a role: Machine type is -M pc-1.2 and we set -kvmclock as
> >>> CPU flag since kvmclock seemed to be quite buggy in 2.6.16...
> >>>
> >>> Exact cmdline is:
> >>> /usr/bin/qemu-2.2.1 -enable-kvm -M pc-1.2 -nodefaults -netdev
> >>> type=tap,id=guest2,script=no,downscript=no,ifname=tap2 -device
> >>> e1000,netdev=guest2,mac=52:54:00:ff:00:65 -drive
> >>> format=raw,file=iscsi://172.21.200.53/iqn.2001-05.com.equallogic:4-52aed6-88a7e99a4-d9e00040fdc509a3-XXX-hd0/0,if=ide,cache=writeback,aio=native
> >>> -serial null -parallel null -m 1024 -smp
> >>> 2,sockets=1,cores=2,threads=1 -monitor tcp:0:4003,server,nowait -vnc :3
> >>> -qmp tcp:0:3003,server,nowait -name 'XXX' -boot order=c,once=dc,menu=off
> >>> -drive index=2,media=cdrom,if=ide,cache=unsafe,aio=native,readonly=on -k
> >>> de -incoming tcp:0:5003 -pidfile /var/run/qemu/vm-146.pid -mem-path
> >>> /hugepages -mem-prealloc -rtc base=utc -usb -usbdevice tablet -no-hpet
> >>> -vga cirrus -cpu qemu64,-kvmclock
> >>>
> >>> Exact kernel is:
> >>> 2.6.16.46-0.12-smp (I think this is SLES10 or sth.)
> >>>
> >>> The machine does not hang. It seems just I/O is hanging. So you can type
> >>> at the console or ping the system, but no longer log in.
> >>>
> >>> Thank you,
> >>> Peter
> >> Interesting observation: Migrating the vServer again seems to fix the
> >> problem (at least in one case I could test just now).
> >>
> >> 2.6.8-24-smp is also affected.
> > How often does it fail - you say 'sometimes' - is it a 1/10 or a 1/1000?
> It's more often than 1/10 I would say.

OK, that's not too bad - it's the 1/1000 that are really nasty to find.

In your setup, how easy would it be for you to try:
   with either 2.1 or current head?
   with a newer machine-type?
   without the cdrom?

Dave

> >
> > I'm not sure at what kernel version the switch is, but newer kernels use some
> > code shared with the newer SATA world (libata?) whereas older kernels had
> > separate IDE code, so the behaviour of the two can be quite different.
>
> That's a good point. I will check what the kernels have.
> I remember that there was sth like a problem with error handling in
> the old drivers? Paolo, you worked a lot on IDE lately. Do you remember?
>
> Thanks,
> Peter
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK