[Qemu-devel] [Bug 1716028] Re: qemu 2.10 locks images with no feature flag
In our multipath case, the initial open succeeds (it's not locked by anything else) and the second lock attempt fails, however, IIUC the fcntl structure[1] includes the locking pid, which should be our invocation of QEMU; Can we not check if the locking pid matches the current pid and not fail? This appears to be the original goal of protecting locked images from being manipulated by external processes. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1716028 Title: qemu 2.10 locks images with no feature flag Status in QEMU: New Status in qemu package in Ubuntu: New Bug description: 1) % lsb_release -rd Description: Ubuntu Artful Aardvark (development branch) Release: 17.10 2) % apt-cache policy qemu-system-x86 qemu-system-x86: Installed: 1:2.10~rc3+dfsg-0ubuntu1 Candidate: 1:2.10+dfsg-0ubuntu1 Version table: 1:2.10+dfsg-0ubuntu1 500 500 http://archive.ubuntu.com//ubuntu devel/main amd64 Packages *** 1:2.10~rc3+dfsg-0ubuntu1 100 100 /var/lib/dpkg/status 3) qemu locks image files with no way to discover this feature nor how to disable it 4) qemu provides a way to query if it supports image locking, and what the default value is, and how to disable the locking via cli qemu 2.10 now will lock image files and warn if an image is currently locked. This prevent qemu from running (and possibly corrupting said image). However, qemu does not provide any way to determine if a qemu binary actually has this capability. Normally behavior changing features are exposed via some change to the qemu help menu or QMP/QAPI output of capabilities. I believe this slipped through since libvirt already does image locking, but direct cli users will be caught by this change. In particular, we have a use-case where we simulate multipath disks by creating to disks which point to the same file which now breaks without adding the 'file.locking=off' to the -drive parameters; which is also completely undocumented and unexposed. Some parts of the cli like -device allow querying of settable options (qemu-system-x86 -device scsi_hd,?) but nothing equivalent exists for -drive parameters. ProblemType: Bug DistroRelease: Ubuntu 17.10 Package: qemu-system-x86 1:2.10~rc3+dfsg-0ubuntu1 ProcVersionSignature: Ubuntu 4.12.0-11.12-generic 4.12.5 Uname: Linux 4.12.0-11-generic x86_64 NonfreeKernelModules: zfs zunicode zavl zcommon znvpair ApportVersion: 2.20.6-0ubuntu7 Architecture: amd64 Date: Fri Sep 8 12:56:53 2017 JournalErrors: Hint: You are currently not seeing messages from other users and the system. Users in groups 'adm', 'systemd-journal' can see all messages. Pass -q to turn off this notice. -- Logs begin at Mon 2017-01-30 11:56:02 CST, end at Fri 2017-09-08 12:56:46 CDT. -- -- No entries -- KvmCmdLine: COMMAND STAT EUID RUID PID PPID %CPU COMMAND MachineType: HP ProLiant DL360 Gen9 ProcEnviron: TERM=xterm PATH=(custom, no user) XDG_RUNTIME_DIR= LANG=en_US.UTF-8 SHELL=/bin/bash ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.12.0-11-generic root=UUID=45354276-e0c0-4bf6-9083-f130b89411cc ro --- console=ttyS1,115200 SourcePackage: qemu UpgradeStatus: No upgrade log present (probably fresh install) dmi.bios.date: 03/05/2015 dmi.bios.vendor: HP dmi.bios.version: P89 dmi.chassis.type: 23 dmi.chassis.vendor: HP dmi.modalias: dmi:bvnHP:bvrP89:bd03/05/2015:svnHP:pnProLiantDL360Gen9:pvr:cvnHP:ct23:cvr: dmi.product.family: ProLiant dmi.product.name: ProLiant DL360 Gen9 dmi.sys.vendor: HP To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1716028/+subscriptions
[Qemu-devel] [Bug 1716028] Re: qemu 2.10 locks images with no feature flag
Kevin, Thanks for the information. A couple of points for feedback: 1) there doesn't appear to be a way to run qmp query-schema without spawning a qemu instance in -qmp mode and having a second client issue the query-schema; certainly a qemu-system-$arch -qmp-schema would be quite useful when examining feature availability. While I know the QMP/QAPI introspection is where most of the work has gone to describing how to interact with qemu it's quite cumbersome at best: searching for blockdev-add, find arg-type: 48, read arg-type-48, find list of 'variants', know that locking feature is part of 'file', find type: 207, see member 'locking' in list[A], which is of type 296, find type: 296, which is an enum of 'on', 'off', 'auto' A. Interesting enough, qapi says the default is None, however qemu certainly locks files by default which would seem to imply a mismatch between qapi defaults and qemu behavior. That's pretty heavy; Maybe that warning message qemu prints could provide some hints as to what a user could do (or refer to a manpage on locking?). 2) share-rw appears to be a blockdev parameter (I see it available via most block devices via qemu-system-$arch -device {scsi-hd,ide-hd,virtio- blk}? However there is no equivalent -blockdev for dumping the default options that a -blockdev parameter takes. The qmp-schema also does not include any information about 'share-rw' w.r.t what values are available that I could find after dumping the schema. Thanks smoser: #!/bin/sh qemu-system-x86_64 -S -nodefaults -nographic \ -serial none -monitor none -qmp stdio
Re: [Qemu-devel] [Bug 1305402] Re: libvirt fails to start VirtualMachines
Machine type changes may be related to: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1626070 There's a PPA in the bug with a fix for at least the wily machine type. On Thu, Sep 22, 2016 at 6:05 PM, Leo Ariaswrote: > This has just happened to me. For some reason, all my machines had > machine='pc-i440fx-wily'. > After an update in yakkety, they stopped working. > > $ qemu-system-x86_64 -enable-kvm -machine help | grep wily > > So I updated the machine xml to a supported machine as Charles > suggested, and they work again. > > Here's the note for my future self about how to do the update: > > $ virsh dumpxml $machine-name > /tmp/machine.xml > Edit the xml. > $ virsh define /tmp/machine.xml > > -- > You received this bug notification because you are subscribed to qemu in > Ubuntu. > https://bugs.launchpad.net/bugs/1305402 > > Title: > libvirt fails to start VirtualMachines > > To manage notifications about this bug go to: > https://bugs.launchpad.net/qemu/+bug/1305402/+subscriptions > -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1305402 Title: libvirt fails to start VirtualMachines Status in QEMU: Invalid Status in libvirt package in Ubuntu: Confirmed Status in qemu package in Ubuntu: Confirmed Bug description: I've created several kvm based machines using virtual machine manager. They have operated well over the last 4 days without issue. I did an apt-get upgrade, and qemu was in the list of updates. After upgrading, I am unable to start any of the provisioned virtual machines with the following error output virsh # start node2 error: Failed to start domain node2 error: internal error: process exited while connecting to monitor: qemu-system-x86_64: -machine trusty,accel=kvm,usb=off: Unsupported machine type Use -machine help to list supported machines! virsh # start node3 error: Failed to start domain node3 error: internal error: process exited while connecting to monitor: qemu-system-x86_64: -machine trusty,accel=kvm,usb=off: Unsupported machine type Use -machine help to list supported machines! $ dpkg -l | grep kvm ii qemu-kvm 2.0.0~rc1+dfsg-0ubuntu3 amd64QEMU Full virtualization on x86 hardware (transitional package) Log snippet from vm 'media' that was verified working, and fails to start after the upgrade. LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm-spice -name media -S -machine trusty,accel=kvm,usb=off -m 1548 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 60b20f7b-6d20-bcb3-f4fc-808a9b2fe0d0 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/media.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot menu=off,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/media.img,if=none,id=drive-virtio-disk0,format=raw -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/home/charles/iso/ubuntu-desktop-12.04.4-amd64.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:a0:69:d9,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:1 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 char device redirected to /dev/pts/13 (label charserial0) qemu: terminating on signal 15 from pid 31688 2014-04-10 03:36:54.593+: shutting down 2014-04-10 03:59:25.487+: starting up LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm-spice -name media -S -machine trusty,accel=kvm,usb=off -m 1548 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 60b20f7b-6d20-bcb3-f4fc-808a9b2fe0d0 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/media.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot menu=off,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/media.img,if=none,id=drive-virtio-disk0,format=raw -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/home/charles/iso/ubuntu-desktop-12.04.4-amd64.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device
[Qemu-devel] [Bug 1563887] Re: qemu-system-ppc64 freezes on starting image on ppc64le
As noted, Xenial kernel is not supporting POWER7 cpu. ** Changed in: linux (Ubuntu) Status: Confirmed => Invalid -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1563887 Title: qemu-system-ppc64 freezes on starting image on ppc64le Status in QEMU: Invalid Status in linux package in Ubuntu: Invalid Status in livecd-rootfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Invalid Bug description: qemu-system-ppc64 running on Ubuntu 16.04 beta-2 fails to start an image as part of the certification process. This on an IBM ppc64le in PowerVM mode running Ubuntu 16.04 beta-2 deployed by MAAS 1.9.1. There is no error output. ubuntu@alpine01:~/kvm$ qemu-system-ppc64 -m 256 -display none -nographic -net nic -net user,net=10.0.0.0/8,host=10.0.0.1,hostfwd=tcp::-:22 -machine pseries -drive file=xenial-server-cloudimg-ppc64el-disk1.img,if=virtio -drive file=seed.iso,if=virtio WARNING: Image format was not specified for 'seed.iso' and probing guessed raw. Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted. Specify the 'raw' format explicitly to remove the restrictions. SLOF ** QEMU Starting Build Date = Jan 29 2016 18:58:37 FW Version = buildd@ release 20151103 Press "s" to enter Open Firmware. Populating /vdevice methods Populating /vdevice/vty@7100 Populating /vdevice/nvram@7101 Populating /vdevice/l-lan@7102 Populating /vdevice/v-scsi@7103 SCSI: Looking for devices 8200 CD-ROM : "QEMU QEMU CD-ROM 2.5+" Populating /pci@8002000 00 1800 (D) : 1af4 1001virtio [ block ] 00 1000 (D) : 1af4 1001virtio [ block ] 00 0800 (D) : 106b 003fserial bus [ usb-ohci ] 00 (D) : 1234 qemu vga No NVRAM common partition, re-initializing... Installing QEMU fb Scanning USB OHCI: initializing USB Keyboard USB mouse No console specified using screen & keyboard Welcome to Open Firmware Copyright (c) 2004, 2011 IBM Corporation All rights reserved. This program and the accompanying materials are made available under the terms of the BSD License available at http://www.opensource.org/licenses/bsd-license.php Trying to load: from: /pci@8002000/scsi@3 ... E3404: Not a bootable device! Trying to load: from: /pci@8002000/scsi@2 ... Successfully loaded Linux ppc64le #31-Ubuntu SMP F ProblemType: Bug DistroRelease: Ubuntu 16.04 Package: qemu-system-ppc 1:2.5+dfsg-5ubuntu6 ProcVersionSignature: Ubuntu 4.4.0-16.32-generic 4.4.6 Uname: Linux 4.4.0-16-generic ppc64le ApportVersion: 2.20-0ubuntu3 Architecture: ppc64el Date: Wed Mar 30 14:10:01 2016 KvmCmdLine: COMMAND STAT EUID RUID PID PPID %CPU COMMAND kvm-irqfd-clean S< 0 0 1172 2 0.0 [kvm-irqfd-clean] qemu-nbdSsl 0 0 13467 1 0.0 qemu-nbd -c /dev/nbd0 xenial-server-cloudimg-ppc64el-disk1.img qemu-system-ppc Sl+ 1000 1000 18973 18896 101 qemu-system-ppc64 -m 256 -display none -nographic -net nic -net user,net=10.0.0.0/8,host=10.0.0.1,hostfwd=tcp::-:22 -machine pseries -drive file=xenial-server-cloudimg-ppc64el-disk1.img,if=virtio -drive file=seed.iso,if=virtio Lsusb: Error: command ['lsusb'] failed with exit code 1: ProcEnviron: TERM=xterm PATH=(custom, no user) LANG=en_US.UTF-8 SHELL=/bin/bash ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinux-4.4.0-16-generic root=UUID=92d820c8-ab25-497b-9b1e-f1435992bbf3 ro ProcLoadAvg: 1.08 0.94 0.58 2/616 19571 ProcLocks: 1: POSIX ADVISORY WRITE 886 00:13:381 0 EOF 2: POSIX ADVISORY WRITE 1339 00:13:528 0 EOF 3: FLOCK ADVISORY WRITE 1284 00:13:522 0 EOF 4: POSIX ADVISORY WRITE 2281 00:13:563 0 EOF 5: POSIX ADVISORY WRITE 1331 00:13:536 0 EOF ProcSwaps: Filename TypeSizeUsedPriority /swap.img file 8388544 0 -1 ProcVersion: Linux version 4.4.0-16-generic (buildd@bos01-ppc64el-001) (gcc version 5.3.1 20160320 (Ubuntu/Linaro/IBM 5.3.1-12ubuntu4) ) #32-Ubuntu SMP Thu Mar 24 22:31:14 UTC 2016 SourcePackage: qemu UpgradeStatus: No upgrade log present (probably fresh install) bootlist: /pci@8002011/pci1014,034A@0/sas/disk@4068402c40 /pci@8002018/ethernet@0:speed=auto,duplex=auto,csarch,000.000.000.000,,000.000.000.000,000.000.000.000,5,5,000.000.000.000,512 /pci@8002018/ethernet@0,1:speed=auto,duplex=auto,csarch,000.000.000.000,,000.000.000.000,000.000.000.000,5,5,000.000.000.000,512
[Qemu-devel] [Bug 1563887] Re: qemu-system-ppc64 freezes on starting image on ppc64le
I should also mention that the original command used will work if we specify which cpu type to use: qemu-system-ppc64 -m 256 \ -cpu POWER8 \ -display none -nographic \ -net nic -net user,net=10.0.0.0/8,host=10.0.0.1,hostfwd=tcp::-:22 \ -machine pseries \ -drive file=xenial-server-cloudimg-ppc64el-disk1.img,if=virtio \ -drive file=seed.iso,if=virtio -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1563887 Title: qemu-system-ppc64 freezes on starting image on ppc64le Status in QEMU: Confirmed Status in linux package in Ubuntu: Confirmed Status in livecd-rootfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Invalid Bug description: qemu-system-ppc64 running on Ubuntu 16.04 beta-2 fails to start an image as part of the certification process. This on an IBM ppc64le in PowerVM mode running Ubuntu 16.04 beta-2 deployed by MAAS 1.9.1. There is no error output. ubuntu@alpine01:~/kvm$ qemu-system-ppc64 -m 256 -display none -nographic -net nic -net user,net=10.0.0.0/8,host=10.0.0.1,hostfwd=tcp::-:22 -machine pseries -drive file=xenial-server-cloudimg-ppc64el-disk1.img,if=virtio -drive file=seed.iso,if=virtio WARNING: Image format was not specified for 'seed.iso' and probing guessed raw. Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted. Specify the 'raw' format explicitly to remove the restrictions. SLOF ** QEMU Starting Build Date = Jan 29 2016 18:58:37 FW Version = buildd@ release 20151103 Press "s" to enter Open Firmware. Populating /vdevice methods Populating /vdevice/vty@7100 Populating /vdevice/nvram@7101 Populating /vdevice/l-lan@7102 Populating /vdevice/v-scsi@7103 SCSI: Looking for devices 8200 CD-ROM : "QEMU QEMU CD-ROM 2.5+" Populating /pci@8002000 00 1800 (D) : 1af4 1001virtio [ block ] 00 1000 (D) : 1af4 1001virtio [ block ] 00 0800 (D) : 106b 003fserial bus [ usb-ohci ] 00 (D) : 1234 qemu vga No NVRAM common partition, re-initializing... Installing QEMU fb Scanning USB OHCI: initializing USB Keyboard USB mouse No console specified using screen & keyboard Welcome to Open Firmware Copyright (c) 2004, 2011 IBM Corporation All rights reserved. This program and the accompanying materials are made available under the terms of the BSD License available at http://www.opensource.org/licenses/bsd-license.php Trying to load: from: /pci@8002000/scsi@3 ... E3404: Not a bootable device! Trying to load: from: /pci@8002000/scsi@2 ... Successfully loaded Linux ppc64le #31-Ubuntu SMP F ProblemType: Bug DistroRelease: Ubuntu 16.04 Package: qemu-system-ppc 1:2.5+dfsg-5ubuntu6 ProcVersionSignature: Ubuntu 4.4.0-16.32-generic 4.4.6 Uname: Linux 4.4.0-16-generic ppc64le ApportVersion: 2.20-0ubuntu3 Architecture: ppc64el Date: Wed Mar 30 14:10:01 2016 KvmCmdLine: COMMAND STAT EUID RUID PID PPID %CPU COMMAND kvm-irqfd-clean S< 0 0 1172 2 0.0 [kvm-irqfd-clean] qemu-nbdSsl 0 0 13467 1 0.0 qemu-nbd -c /dev/nbd0 xenial-server-cloudimg-ppc64el-disk1.img qemu-system-ppc Sl+ 1000 1000 18973 18896 101 qemu-system-ppc64 -m 256 -display none -nographic -net nic -net user,net=10.0.0.0/8,host=10.0.0.1,hostfwd=tcp::-:22 -machine pseries -drive file=xenial-server-cloudimg-ppc64el-disk1.img,if=virtio -drive file=seed.iso,if=virtio Lsusb: Error: command ['lsusb'] failed with exit code 1: ProcEnviron: TERM=xterm PATH=(custom, no user) LANG=en_US.UTF-8 SHELL=/bin/bash ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinux-4.4.0-16-generic root=UUID=92d820c8-ab25-497b-9b1e-f1435992bbf3 ro ProcLoadAvg: 1.08 0.94 0.58 2/616 19571 ProcLocks: 1: POSIX ADVISORY WRITE 886 00:13:381 0 EOF 2: POSIX ADVISORY WRITE 1339 00:13:528 0 EOF 3: FLOCK ADVISORY WRITE 1284 00:13:522 0 EOF 4: POSIX ADVISORY WRITE 2281 00:13:563 0 EOF 5: POSIX ADVISORY WRITE 1331 00:13:536 0 EOF ProcSwaps: Filename TypeSizeUsedPriority /swap.img file 8388544 0 -1 ProcVersion: Linux version 4.4.0-16-generic (buildd@bos01-ppc64el-001) (gcc version 5.3.1 20160320 (Ubuntu/Linaro/IBM 5.3.1-12ubuntu4) ) #32-Ubuntu SMP Thu Mar 24 22:31:14 UTC 2016 SourcePackage: qemu UpgradeStatus: No upgrade log present (probably fresh install) bootlist: /pci@8002011/pci1014,034A@0/sas/disk@4068402c40
[Qemu-devel] [Bug 1563887] Re: qemu-system-ppc64 freezes on starting image on ppc64le
Here's an update. The Xenial kernel doesn;t like the emulated POWER7 cpu that the command line being used generates by default. processor : 0 cpu : POWER7 (raw), altivec supported clock : 1000.00MHz revision: 2.3 (pvr 003f 0203) timebase: 51200 platform: pSeries model : IBM pSeries (emulated by qemu) machine : CHRP IBM pSeries (emulated by qemu) We can boot a Wily image (kernel 4.2.0-35) just fine with the POWER7 cpu. When booting Xenial's kernel with POWER7 cpu, it produces a stacktrace during module load: [9.885165] Loaded X.509 cert 'Build time autogenerated kernel key: 6687eed33bf99302166296c3e5cafe31ef38ad41' [9.886507] zswap: loaded using pool lzo/zbud [9.916000] modprobe[74]: unhandled signal 4 at 3fffb5a4d03c nip 3fffb5a4d03c lr 3fffb5a25e24 code 30001 [9.925819] modprobe[76]: unhandled signal 4 at 3fff85b9d03c nip 3fff85b9d03c lr 3fff85b75e24 code 30001 [9.928401] Key type trusted registered [9.930762] modprobe[79]: unhandled signal 4 at 3fff7d05d03c nip 3fff7d05d03c lr 3fff7d035e24 code 30001 [9.933360] modprobe[80]: unhandled signal 4 at 3fff8820d03c nip 3fff8820d03c lr 3fff881e5e24 code 30001 [9.936240] modprobe[83]: unhandled signal 4 at 3fffb4fbd03c nip 3fffb4fbd03c lr 3fffb4f95e24 code 30001 [9.938873] modprobe[84]: unhandled signal 4 at 3fff92d4d03c nip 3fff92d4d03c lr 3fff92d25e24 code 30001 [9.940335] Key type encrypted registered [9.940461] AppArmor: AppArmor sha1 policy hashing enabled [9.941005] ima: No TPM chip found, activating TPM-bypass! [9.942985] evm: HMAC attrs: 0x1 [9.947081] hctosys: unable to open rtc device (rtc0) [9.987867] Freeing unused kernel memory: 6144K (c0ea - c14a) [9.991123] init[1]: unhandled signal 4 at 3fff8edfd03c nip 3fff8edfd03c lr 3fff8edd5e24 code 30001 [9.994581] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0004 [9.994581] [9.994889] CPU: 0 PID: 1 Comm: init Not tainted 4.4.0-18-generic #34-Ubuntu [9.995054] Call Trace: [9.995216] [c0001e4c3a50] [c0aed6fc] dump_stack+0xb0/0xf0 (unreliable) [9.995336] [c0001e4c3a90] [c0ae9930] panic+0x100/0x2c0 [9.995398] [c0001e4c3b20] [c00bd554] do_exit+0xc24/0xc30 [9.995443] [c0001e4c3be0] [c00bd644] do_group_exit+0x64/0x100 [9.995490] [c0001e4c3c20] [c00ceaac] get_signal+0x55c/0x7b0 [9.995534] [c0001e4c3d10] [c0017424] do_signal+0x54/0x2b0 [9.995578] [c0001e4c3e00] [c001787c] do_notify_resume+0xbc/0xd0 [9.995677] [c0001e4c3e30] [c0009838] ret_from_except_lite+0x64/0x68 [ 10.011069] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0004 [ 10.011069] When we use -enable-kvm, this bypasses the tcg POWER7 cpu, and uses the host cpu type (POWER8) which is why we can boot the Xenial kernel with KVM. We need to open a linux task to help track down that issue; also if someone is testing Xenial on POWER7 hardware, that may help determine if there is a lurking qemu tcg issue, though given that Wily kernels boot fine in tcg mode; it's more likely there's something that changed/broke in the kernels since 4.2.0-35. I'm marking the qemu task invalid, and will open the linux task. ** Changed in: qemu (Ubuntu) Status: Confirmed => Invalid -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1563887 Title: qemu-system-ppc64 freezes on starting image on ppc64le Status in QEMU: Confirmed Status in linux package in Ubuntu: Confirmed Status in livecd-rootfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Invalid Bug description: qemu-system-ppc64 running on Ubuntu 16.04 beta-2 fails to start an image as part of the certification process. This on an IBM ppc64le in PowerVM mode running Ubuntu 16.04 beta-2 deployed by MAAS 1.9.1. There is no error output. ubuntu@alpine01:~/kvm$ qemu-system-ppc64 -m 256 -display none -nographic -net nic -net user,net=10.0.0.0/8,host=10.0.0.1,hostfwd=tcp::-:22 -machine pseries -drive file=xenial-server-cloudimg-ppc64el-disk1.img,if=virtio -drive file=seed.iso,if=virtio WARNING: Image format was not specified for 'seed.iso' and probing guessed raw. Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted. Specify the 'raw' format explicitly to remove the restrictions. SLOF ** QEMU Starting Build Date = Jan 29 2016 18:58:37 FW Version = buildd@ release 20151103 Press "s" to enter Open Firmware. Populating /vdevice methods Populating
[Qemu-devel] [Bug 1465935] Re: kvm_irqchip_commit_routes: Assertion `ret == 0' failed
Thank you for taking the time to report this bug and helping to make Ubuntu better. Please execute the following command, as it will automatically gather debugging information, in a terminal: apport-collect 1465935 When reporting bugs in the future please use apport by using 'ubuntu- bug' and the name of the package affected. You can learn more about this functionality at https://wiki.ubuntu.com/ReportingBugs. ** Changed in: qemu (Ubuntu) Status: New = Incomplete -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1465935 Title: kvm_irqchip_commit_routes: Assertion `ret == 0' failed Status in QEMU: New Status in qemu package in Ubuntu: Incomplete Bug description: Several my QEMU instances crashed, and in the qemu log, I can see this assertion failure, qemu-system-x86_64: /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:984: kvm_irqchip_commit_routes: Assertion `ret == 0' failed. The QEMU version is 2.0.0, HV OS is ubuntu 12.04, kernel 3.2.0-38. Guest OS is RHEL 6.3. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1465935/+subscriptions
[Qemu-devel] [Bug 1465935] Re: kvm_irqchip_commit_routes: Assertion `ret == 0' failed
Have you be able to reproduce this issue on a wily host? What about a different guest? Or is only RHEL6.3 affected? -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1465935 Title: kvm_irqchip_commit_routes: Assertion `ret == 0' failed Status in QEMU: New Status in qemu package in Ubuntu: Incomplete Bug description: Several my QEMU instances crashed, and in the qemu log, I can see this assertion failure, qemu-system-x86_64: /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:984: kvm_irqchip_commit_routes: Assertion `ret == 0' failed. The QEMU version is 2.0.0, HV OS is ubuntu 12.04, kernel 3.2.0-38. Guest OS is RHEL 6.3. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1465935/+subscriptions
[Qemu-devel] [Bug 1465935] Re: kvm_irqchip_commit_routes: Assertion `ret == 0' failed
It appears that the latest version of the patch is here: http://lists.gnu.org/archive/html/qemu-devel/2015-01/msg00822.html However, this hasn't yet be accepted upstream. The most recent discussion requires the submitter to respond to the maintainers questions here: http://lists.gnu.org/archive/html/qemu-devel/2015-01/msg00623.html -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1465935 Title: kvm_irqchip_commit_routes: Assertion `ret == 0' failed Status in QEMU: New Status in qemu package in Ubuntu: Incomplete Bug description: Several my QEMU instances crashed, and in the qemu log, I can see this assertion failure, qemu-system-x86_64: /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:984: kvm_irqchip_commit_routes: Assertion `ret == 0' failed. The QEMU version is 2.0.0, HV OS is ubuntu 12.04, kernel 3.2.0-38. Guest OS is RHEL 6.3. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1465935/+subscriptions
[Qemu-devel] [Bug 551545] Re: PXE netboot not booting localboot from virtio-disk
This issue should be fixed in the qemu-kvm version included in precise. ** Changed in: qemu-kvm (Ubuntu) Status: Triaged = Fix Released -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/551545 Title: PXE netboot not booting localboot from virtio-disk Status in QEMU: New Status in qemu-kvm package in Ubuntu: Fix Released Status in qemu-kvm package in Fedora: Unknown Bug description: Binary package hint: qemu-kvm lsb_release -rd Description: Ubuntu lucid (development branch) Release: 10.04 apt-cache policy qemu-kvm qemu-kvm: Installiert: 0.12.3+noroms-0ubuntu3 Kandidat: 0.12.3+noroms-0ubuntu3 Versions-Tabelle: *** 0.12.3+noroms-0ubuntu3 0 500 http://intranet/ubuntu/ lucid/main Packages 100 /var/lib/dpkg/status Description of the problem: Starting a guest like this: vdekvm \ -m 256M \ -cpu host \ -smp 1 \ -name karmic \ -boot order=nc \ -drive file=/dev/vg01/test,if=virtio,boot=on,cache=none \ -net nic,vlan=0,macaddr=00:2f:8d:b6:cf:d0,model=virtio \ -net vde,vlan=0,sock=/var/run/vde2/vde0.ctl \ -watchdog i6300esb \ -vnc :0 \ -serial telnet:localhost:23,server,nowait \ -monitor tcp:127.0.0.1:12000,server,nowait \ -runas kvmguest On telnet localhost you can see that the following boot-menu appears: - Boot Menu - = local rescue It is loaded from this pxelinux.cfg/default file: SERIAL 0 9600n8 DISPLAY boot.txt TIMEOUT 120 DEFAULT local PROMPT 1 LABEL local localboot 0 LABEL rescue kernel lucid append initrd=lucid-initrd.gz rescue/enable=true -- quiet console=ttyS0,9600n8 After the timeout, the guest tries to boot, but fails and reloads the boot menu. This is an endless loop, until I break it or choose the rescue menu entry. I would expect that it boots from first virtio-disk ProblemType: Bug DistroRelease: Ubuntu 10.04 Package: qemu-kvm 0.12.3+noroms-0ubuntu3 ProcVersionSignature: Ubuntu 2.6.32-18.27-generic 2.6.32.10+drm33.1 Uname: Linux 2.6.32-18-generic x86_64 Architecture: amd64 Date: Tue Mar 30 11:40:59 2010 ExecutablePath: /usr/bin/qemu-system-x86_64 MachineType: MICRO-STAR INTERANTIONAL CO.,LTD MS-7368 ProcCmdLine: root=UUID=0d27271c-feaa-40d9-bbbd-baff4ca1d3cc rw init=/bin/bash ProcEnviron: LANG=de_DE.UTF-8 SHELL=/bin/bash SourcePackage: qemu-kvm dmi.bios.date: 10/31/2007 dmi.bios.vendor: American Megatrends Inc. dmi.bios.version: V1.5B2 dmi.board.asset.tag: To Be Filled By O.E.M. dmi.board.name: MS-7368 dmi.board.vendor: MICRO-STAR INTERANTIONAL CO.,LTD dmi.board.version: 1.0 dmi.chassis.asset.tag: To Be Filled By O.E.M. dmi.chassis.type: 3 dmi.chassis.vendor: To Be Filled By O.E.M. dmi.chassis.version: To Be Filled By O.E.M. dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrV1.5B2:bd10/31/2007:svnMICRO-STARINTERANTIONALCO.,LTD:pnMS-7368:pvr1.0:rvnMICRO-STARINTERANTIONALCO.,LTD:rnMS-7368:rvr1.0:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.: dmi.product.name: MS-7368 dmi.product.version: 1.0 dmi.sys.vendor: MICRO-STAR INTERANTIONAL CO.,LTD To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/551545/+subscriptions
[Qemu-devel] [Bug 1318830] Re: High CPU usage on windows virtual machine
Hi, Are you running ntp in the host or timekeeping in the guest? If so, is the guest clock close to what current time should be? Also, can you try adding this option: -global mc146818rtc.lost_tick_policy=slew -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1318830 Title: High CPU usage on windows virtual machine Status in QEMU: New Status in qemu package in Ubuntu: New Bug description: I got Ubuntu 14.04, with Qemu 2.0 and moved my windows VM to this new box, and made sure that what this article indicates was achieved https://www.kraxel.org/blog/2014/03/qemu-and-usb-tablet-cpu-consumtion/ I can attest that it works following the instructions, erasing the registry, etc. Unfortunately, with 4 cpus as below, I still see 60% CPU outside as shown by Top versus 0% CPU inside. My Kernel is 3.15.0-031500rc4-generic If some developer wants to log in and take a look, I am happy to help. The box is not in production and I take full responsibility. Until this is solved, KVM is not going to compete with Hyper-V or Vmware. Basically KVM is not suitable for the Enterprise as of yet. qemu-system-x86_64 -enable-kvm -name Production -S -machine pc- i440fx-2.0,accel=kvm,usb=off -cpu kvm64,+rdtscp,+pdpe1gb,+x2apic,+dca,+pcid,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme,hv_relaxed,hv_vapic,hv_spinlocks=0xfff -m 4024 -realtime mlock=off -smp 4,sockets=1,cores=4,threads=1 -uuid e8701c5c-b542-0199-fd2a-1047df24770e -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/Production.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -boot strict=on -device piix3-usb- uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/Production.img,if=none,id=drive-virtio- disk0,format=raw -device virtio-blk- pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio- disk0,bootindex=1 -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31 -device virtio-net- pci,netdev=hostnet0,id=net0,mac=00:16:3a:d2:cd:ea,bus=pci.0,addr=0x3 -netdev tap,fd=35,id=hostnet1,vhost=on,vhostfd=36 -device virtio-net- pci,netdev=hostnet1,id=net1,mac=52:54:00:70:fe:54,bus=pci.0,addr=0x4 -chardev pty,id=charserial0 -device isa- serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -device VGA,id=video0,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x5 -device hda- duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon- pci,id=balloon0,bus=pci.0,addr=0x7 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1318830/+subscriptions
Re: [Qemu-devel] [RFC] fix crashes with vmware driver
* Serge Hallyn serge.hal...@canonical.com [2012-03-02 15:13]: Hi, I don't know where the best place to catch this would be, but with vnc and vmware_vga it's possible to get set_bit called on a negative index, crashing qemu. See https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/918791 for details. This patch prevents that. It's possible this should be caught earlier, but this patch works for me. Signed-off-by: Serge Hallyn serge.hal...@canonical.com --- vmware_vga.c | 16 1 file changed, 16 insertions(+) Index: qemu-kvm-1.0+noroms/hw/vmware_vga.c === --- qemu-kvm-1.0+noroms.orig/hw/vmware_vga.c 2012-03-01 16:19:23.280571798 -0600 +++ qemu-kvm-1.0+noroms/hw/vmware_vga.c 2012-03-01 16:27:27.910975006 -0600 @@ -298,6 +298,22 @@ uint8_t *src; uint8_t *dst; +if (x 0) { +fprintf(stderr, %s: update x was 0 (%d, w %d)\n, +__FUNCTION__, x, w); +w += x; + if (w 0) + return; +x = 0; +} +if (y 0) { +fprintf(stderr, %s: update y was 0 (%d, h %d)\n, +__FUNCTION__, y, h); +h += y; + if (h 0) + return; +y = 0; +} Looks like it has mixed spaces and tabs. CODING_STYLE wants {} on all if's if (x + w s-width) { fprintf(stderr, %s: update width too large x: %d, w: %d\n, __FUNCTION__, x, w); -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH v5] qemu-ga: Add guest-network-info command
); + +if (interfaces) { +network_info = g_malloc0(sizeof(*network_info)); +network_info-interfaces = interfaces; +} + +return network_info; +} -- 1.7.3.4 -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH 2/4] Add cleanup function
* Eric Blake ebl...@redhat.com [2012-01-17 16:03]: On 01/16/2012 10:16 AM, Ryan Harper wrote: if test -z $1 -o -z $2; then echo Usage: $0 QEMU TEST1 [TEST2 ...] +cleanup exit 1 Is it worth using 'trap cleanup 0' to install the cleanup handler up front, instead of modifying all exit call sites? I thought about that, but it seemed to require switching to /bin/bash Not really. and I know Anthony had written the scripts carefully to be /bin/sh. POSIX requires /bin/sh to support 'trap cleanup 0', and I don't know of I was using trap cleanup SIGINT; which /bin/sh didn't like: (finalgravity) qemu-test % ./qemu-test ~/work/git/qemu/x86_64-softmmu/qemu-system-x86_64 tests/virtio-serial.sh trap: SIGINT: bad trap but with 0 instead, that seems to work. any counter-example shells that fail to do this. There are non-POSIX shells where installing a trap 0 handler from inside a function body invokes the handler upon exiting the function, instead of exiting the overall script, but even Solaris /bin/sh knows how to correctly handle a trap 0 handler installed outside of any function calls. https://www.gnu.org/software/autoconf/manual/autoconf.html#trap -- Eric Blake ebl...@redhat.com+1-919-301-3266 Libvirt virtualization library http://libvirt.org -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH 2/4] Add cleanup function
* Eric Blake ebl...@redhat.com [2012-01-13 17:18]: On 01/13/2012 03:05 PM, Ryan Harper wrote: Create a cleanup function and call it from all exits so we don't leave temp files and directories around since we change the name on each invocation. Also, no need to delete the files in the tmpdir, so just remove the tmpdir if it exists. Signed-off-by: Ryan Harper ry...@us.ibm.com --- qemu-test | 11 +-- 1 files changed, 9 insertions(+), 2 deletions(-) diff --git a/qemu-test b/qemu-test index cd102a7..71c1ba1 100755 --- a/qemu-test +++ b/qemu-test @@ -1,7 +1,14 @@ #!/bin/sh +cleanup() { +if test -n $tmpdir; then +rm -rf $tmpdir; +fi +} + if test -z $1 -o -z $2; then echo Usage: $0 QEMU TEST1 [TEST2 ...] +cleanup exit 1 Is it worth using 'trap cleanup 0' to install the cleanup handler up front, instead of modifying all exit call sites? I thought about that, but it seemed to require switching to /bin/bash and I know Anthony had written the scripts carefully to be /bin/sh. -- Eric Blake ebl...@redhat.com+1-919-301-3266 Libvirt virtualization library http://libvirt.org -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH 2/2] KVM: Use -cpu best as default on x86
* Alexander Graf ag...@suse.de [2012-01-08 17:53]: When running QEMU without -cpu parameter, the user usually wants a sane default. So far, we're using the qemu64/qemu32 CPU type, which basically means the maximum TCG can emulate. it also means we all maximum possible migration targets. Have you given any thought to migration with -cpu best? That's a really good default when using TCG, but when running with KVM we much rather want a default saying the maximum performance I can get. Fortunately we just added an option that gives us the best performance while still staying safe on the testability side of things: -cpu best. So all we need to do is make -cpu best the default when the user doesn't define any. This fixes a lot of subtile breakage in the GNU toolchain (libgmp) which hicks up on QEMU's non-existent CPU models. This patch also adds a new pc-1.1 machine type to keep backwards compatible with older versions of QEMU. Signed-off-by: Alexander Graf ag...@suse.de --- hw/pc_piix.c | 42 ++ 1 files changed, 34 insertions(+), 8 deletions(-) diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 00f525e..3d78ccb 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -79,7 +79,8 @@ static void pc_init1(MemoryRegion *system_memory, const char *initrd_filename, const char *cpu_model, int pci_enabled, - int kvmclock_enabled) + int kvmclock_enabled, + int may_cpu_best) { int i; ram_addr_t below_4g_mem_size, above_4g_mem_size; @@ -102,6 +103,9 @@ static void pc_init1(MemoryRegion *system_memory, MemoryRegion *rom_memory; DeviceState *dev; +if (!cpu_model kvm_enabled() may_cpu_best) { +cpu_model = best; +} pc_cpus_init(cpu_model); if (kvmclock_enabled) { @@ -263,7 +267,21 @@ static void pc_init_pci(ram_addr_t ram_size, get_system_io(), ram_size, boot_device, kernel_filename, kernel_cmdline, - initrd_filename, cpu_model, 1, 1); + initrd_filename, cpu_model, 1, 1, 1); +} + +static void pc_init_pci_oldcpu(ram_addr_t ram_size, + const char *boot_device, + const char *kernel_filename, + const char *kernel_cmdline, + const char *initrd_filename, + const char *cpu_model) +{ +pc_init1(get_system_memory(), + get_system_io(), + ram_size, boot_device, + kernel_filename, kernel_cmdline, + initrd_filename, cpu_model, 1, 1, 0); } static void pc_init_pci_no_kvmclock(ram_addr_t ram_size, @@ -277,7 +295,7 @@ static void pc_init_pci_no_kvmclock(ram_addr_t ram_size, get_system_io(), ram_size, boot_device, kernel_filename, kernel_cmdline, - initrd_filename, cpu_model, 1, 0); + initrd_filename, cpu_model, 1, 0, 0); } static void pc_init_isa(ram_addr_t ram_size, @@ -293,7 +311,7 @@ static void pc_init_isa(ram_addr_t ram_size, get_system_io(), ram_size, boot_device, kernel_filename, kernel_cmdline, - initrd_filename, cpu_model, 0, 1); + initrd_filename, cpu_model, 0, 1, 0); } #ifdef CONFIG_XEN @@ -314,8 +332,8 @@ static void pc_xen_hvm_init(ram_addr_t ram_size, } #endif -static QEMUMachine pc_machine_v1_0 = { -.name = pc-1.0, +static QEMUMachine pc_machine_v1_1 = { +.name = pc-1.1, .alias = pc, .desc = Standard PC, .init = pc_init_pci, @@ -323,17 +341,24 @@ static QEMUMachine pc_machine_v1_0 = { .is_default = 1, }; +static QEMUMachine pc_machine_v1_0 = { +.name = pc-1.0, +.desc = Standard PC, +.init = pc_init_pci_oldcpu, +.max_cpus = 255, +}; + static QEMUMachine pc_machine_v0_15 = { .name = pc-0.15, .desc = Standard PC, -.init = pc_init_pci, +.init = pc_init_pci_oldcpu, .max_cpus = 255, }; static QEMUMachine pc_machine_v0_14 = { .name = pc-0.14, .desc = Standard PC, -.init = pc_init_pci, +.init = pc_init_pci_oldcpu, .max_cpus = 255, .compat_props = (GlobalProperty[]) { { @@ -612,6 +637,7 @@ static QEMUMachine xenfv_machine = { static void pc_machine_init(void) { +qemu_register_machine(pc_machine_v1_1); qemu_register_machine(pc_machine_v1_0); qemu_register_machine(pc_machine_v0_15); qemu_register_machine(pc_machine_v0_14); -- 1.6.0.2 -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH 2/2] KVM: Use -cpu best as default on x86
* Alexander Graf ag...@suse.de [2012-01-16 13:37]: On 16.01.2012, at 20:30, Ryan Harper wrote: * Alexander Graf ag...@suse.de [2012-01-08 17:53]: When running QEMU without -cpu parameter, the user usually wants a sane default. So far, we're using the qemu64/qemu32 CPU type, which basically means the maximum TCG can emulate. it also means we all maximum possible migration targets. Have you given any thought to migration with -cpu best? If you have the same boxes in your cluster, migration just works. If you don't, you usually use a specific CPU model that is the least dominator between your boxes either way. Sure, but the idea behind -cpu best is to not have to figure that out; you had suggested that the qemu64/qemu32 were just related to TCG, and what I'm suggesting is that it's also the most compatible w.r.t migration. it sounds like if migration is a requirement, then -cpu best probably isn't something that would be used. I suppose I'm OK with that, or at least I don't have a better suggestion on how to carefully push up the capabilities without at some point breaking migration. The current kvm64 type is broken. Libgmp just abort()s when we pass it in. So anything is better than what we do today on AMD hosts :). I wonder if it breaks with Cyris cpus... other tools tend to do runtime detection (mplayer). Alex -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH 2/2] KVM: Use -cpu best as default on x86
* Alexander Graf ag...@suse.de [2012-01-16 13:52]: On 16.01.2012, at 20:46, Ryan Harper wrote: * Alexander Graf ag...@suse.de [2012-01-16 13:37]: On 16.01.2012, at 20:30, Ryan Harper wrote: * Alexander Graf ag...@suse.de [2012-01-08 17:53]: When running QEMU without -cpu parameter, the user usually wants a sane default. So far, we're using the qemu64/qemu32 CPU type, which basically means the maximum TCG can emulate. it also means we all maximum possible migration targets. Have you given any thought to migration with -cpu best? If you have the same boxes in your cluster, migration just works. If you don't, you usually use a specific CPU model that is the least dominator between your boxes either way. Sure, but the idea behind -cpu best is to not have to figure that out; you had suggested that the qemu64/qemu32 were just related to TCG, and what I'm suggesting is that it's also the most compatible w.r.t migration. The, the most compatible wrt migration is -cpu kvm64 / kvm32. it sounds like if migration is a requirement, then -cpu best probably isn't something that would be used. I suppose I'm OK with that, or at least I don't have a better suggestion on how to carefully push up the capabilities without at some point breaking migration. Yes, if you're interested in migration, then you're almost guaranteed to have a toolstack on top that has knowledge of your whole cluster and can do the least dominator detection over all of your nodes. On the QEMU level we don't know anything about other machines. The current kvm64 type is broken. Libgmp just abort()s when we pass it in. So anything is better than what we do today on AMD hosts :). I wonder if it breaks with Cyris cpus... other tools tend to do runtime detection (mplayer). It probably does :). But then again those don't do KVM, do they? not following; mplayer issues SSE2, 3 and 4 instructions to see what works to figure out how to optimize; it doesn't care if the cpu is called QEMU64 or Cyrus or AMD. I'm not saying that we can't do better than qemu64 w.r.t best cpu to select by default, but there are plenty of applications that want to optimize their code based on what's available, but this is done via code execution instead of string comparision. Alex -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH 2/2] KVM: Use -cpu best as default on x86
* Alexander Graf ag...@suse.de [2012-01-16 14:52]: On 16.01.2012, at 21:13, Ryan Harper wrote: * Alexander Graf ag...@suse.de [2012-01-16 13:52]: On 16.01.2012, at 20:46, Ryan Harper wrote: * Alexander Graf ag...@suse.de [2012-01-16 13:37]: On 16.01.2012, at 20:30, Ryan Harper wrote: * Alexander Graf ag...@suse.de [2012-01-08 17:53]: When running QEMU without -cpu parameter, the user usually wants a sane default. So far, we're using the qemu64/qemu32 CPU type, which basically means the maximum TCG can emulate. it also means we all maximum possible migration targets. Have you given any thought to migration with -cpu best? If you have the same boxes in your cluster, migration just works. If you don't, you usually use a specific CPU model that is the least dominator between your boxes either way. Sure, but the idea behind -cpu best is to not have to figure that out; you had suggested that the qemu64/qemu32 were just related to TCG, and what I'm suggesting is that it's also the most compatible w.r.t migration. The, the most compatible wrt migration is -cpu kvm64 / kvm32. it sounds like if migration is a requirement, then -cpu best probably isn't something that would be used. I suppose I'm OK with that, or at least I don't have a better suggestion on how to carefully push up the capabilities without at some point breaking migration. Yes, if you're interested in migration, then you're almost guaranteed to have a toolstack on top that has knowledge of your whole cluster and can do the least dominator detection over all of your nodes. On the QEMU level we don't know anything about other machines. The current kvm64 type is broken. Libgmp just abort()s when we pass it in. So anything is better than what we do today on AMD hosts :). I wonder if it breaks with Cyris cpus... other tools tend to do runtime detection (mplayer). It probably does :). But then again those don't do KVM, do they? not following; mplayer issues SSE2, 3 and 4 instructions to see what works to figure out how to optimize; it doesn't care if the cpu is called QEMU64 or Cyrus or AMD. I'm not saying that we can't do better than qemu64 w.r.t best cpu to select by default, but there are plenty of applications that want to optimize their code based on what's available, but this is done via code execution instead of string comparision. The problem with -cpu kvm64 is that we choose a family/model that doesn't exist in the real world, and then glue AuthenticAMD or GenuineIntel in the vendor string. Libgmp checks for existing CPUs, finds that this CPU doesn't match any real world IDs and abort()s. The problem is that there is not a single CPU on this planet in silicon that has the same model+family numbers, but exists in AuthenticAMD _and_ GenuineIntel flavors. We need to pass the host vendor in though, because the guest uses it to detect if it should execute SYSCALL or SYSENTER, because intel and amd screwed up heavily on that one. I forgot about this one. =( -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
[Qemu-devel] [PATCH] Add virtio-blk-drive-serial test
We can test out the virtio-blk drive serial number by generating and then reading it back via the file in sysfs. Signed-off-by: Ryan Harper ry...@us.ibm.com --- tests/virtio-blk-drive-serial.sh | 40 ++ 1 files changed, 40 insertions(+), 0 deletions(-) create mode 100755 tests/virtio-blk-drive-serial.sh diff --git a/tests/virtio-blk-drive-serial.sh b/tests/virtio-blk-drive-serial.sh new file mode 100755 index 000..0586f97 --- /dev/null +++ b/tests/virtio-blk-drive-serial.sh @@ -0,0 +1,40 @@ +#!/bin/sh + +serial=0123456789abcdefghi + +in_host() { +tmpdisk=$tmpdir/disk.img +qemu-img create -f qcow2 $tmpdisk 10G + +qemu -nographic -enable-kvm \ +-drive file=$tmpdisk,if=none,id=drive-virtio-disk0,format=raw,cache=none,serial=$serial \ +-device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +rc=$? + +rm $tmpdisk +return $rc +} + +in_guest() { +sysfspath=/sys/block/vda +if ! test -e $sysfspath; then +echo Device not visible! +return 1 +fi + +guest_serial=`cat $sysfspath/serial` + +if test $guest_serial != $serial; then +echo drive has wrong serial! +echo Expected '$serial', got '$guest_serial' +return 2 +fi + +return 0 +} + +if test $QEMU_TEST; then +in_host +else +in_guest +fi -- 1.7.6
[Qemu-devel] [PATCH 1/4] update get_file_size to not throw error on non-existant files
In some cases, when qemu is just starting up the logfile hasn't yet been created and will throw an error message into the output: ls: cannot access .tmp-3070/logfile-3070.log: No such file or directory if we check to see if the file is readable first then we know the file is present and we can extract the size. Signed-off-by: Ryan Harper ry...@us.ibm.com --- qemu-test |7 --- 1 files changed, 4 insertions(+), 3 deletions(-) diff --git a/qemu-test b/qemu-test index c6ea595..cd102a7 100755 --- a/qemu-test +++ b/qemu-test @@ -54,9 +54,10 @@ checkpid() { } get_file_size() { -ls -al $1 | cut -f5 -d' ' 2/dev/null -if test $? != 0; then - echo 0 +if test -r ${1} ; then +ls -al $1 | cut -f5 -d' ' +else +echo 0 fi } -- 1.7.6
[Qemu-devel] [PATCH 4/4] Apply consistent indentation
Inner blocks of if/for/while are all indented. Signed-off-by: Ryan Harper ry...@us.ibm.com --- qemu-test | 56 1 files changed, 28 insertions(+), 28 deletions(-) diff --git a/qemu-test b/qemu-test index 445ca6d..9750a3f 100755 --- a/qemu-test +++ b/qemu-test @@ -2,7 +2,7 @@ cleanup() { if test -n $tmpdir; then -rm -rf $tmpdir; +rm -rf $tmpdir; fi } @@ -29,9 +29,9 @@ if ! which qmp /dev/null 2/dev/null; then qmp=${QEMU_SRC}/QMP/qmp if ! test -x ${qmp}; then -echo Please set QEMU_SRC to set to a recent qemu.git tree -cleanup -exit 1 +echo Please set QEMU_SRC to set to a recent qemu.git tree +cleanup +exit 1 fi else qmp=`which qmp | head -1` @@ -63,9 +63,9 @@ checkpid() { get_file_size() { if test -r ${1} ; then -ls -al $1 | cut -f5 -d' ' +ls -al $1 | cut -f5 -d' ' else -echo 0 +echo 0 fi } @@ -104,11 +104,11 @@ choose() { target=$(($RANDOM % $#)) count=0 for i in $@; do -if test $count = $target; then -echo $i -return 0 -fi -count=$(($count + 1)) +if test $count = $target; then +echo $i +return 0 +fi +count=$(($count + 1)) done # not supposed to happen... @@ -117,7 +117,7 @@ choose() { choose_bool() { if test `choose yes no` = yes; then -return 0 +return 0 fi return 1 } @@ -135,34 +135,34 @@ start_qemu() { qemu_is_okay() { # it's stopped, that's not necessarly bad if ! checkpid $pid; then -return 1 +return 1 fi log_size=`get_file_size $tmplog` if test $last_log_size = $log_size; then -freeze_count=$(($freeze_count + 1)) +freeze_count=$(($freeze_count + 1)) else -freeze_count=0 -last_log_size=$log_size +freeze_count=0 +last_log_size=$log_size fi if test $freeze_count -gt $FREEZE_THRESHOLD checkpid $pid; then -qemu_pid=`cat $tmppid` -kill -9 $qemu_pid -echo Guest ($qemu_pid) has not had output in $FREEZE_THRESHOLD seconds! -return 2 +qemu_pid=`cat $tmppid` +kill -9 $qemu_pid +echo Guest ($qemu_pid) has not had output in $FREEZE_THRESHOLD seconds! +return 2 fi if ! qmp query-status /dev/null 2/dev/null checkpid $pid; then -qemu_pid=`cat $tmppid` - -kill -9 $qemu_pid -echo QEMU is hung! -return 3 +qemu_pid=`cat $tmppid` + +kill -9 $qemu_pid +echo QEMU is hung! +return 3 fi # it's stopped, that's not necessarly bad if ! checkpid $pid; then -return 1 +return 1 fi } @@ -170,7 +170,7 @@ get_qemu_status() { wait $pid rc=`cat $tmprc` if test $(($rc 1)) = 1; then -rc=$(($rc / 2)) +rc=$(($rc / 2)) fi return $rc } @@ -179,7 +179,7 @@ qemu() { start_qemu $@ while qemu_is_okay; do -sleep 1 +sleep 1 done get_qemu_status -- 1.7.6
[Qemu-devel] [PATCH 2/4] Add cleanup function
Create a cleanup function and call it from all exits so we don't leave temp files and directories around since we change the name on each invocation. Also, no need to delete the files in the tmpdir, so just remove the tmpdir if it exists. Signed-off-by: Ryan Harper ry...@us.ibm.com --- qemu-test | 11 +-- 1 files changed, 9 insertions(+), 2 deletions(-) diff --git a/qemu-test b/qemu-test index cd102a7..71c1ba1 100755 --- a/qemu-test +++ b/qemu-test @@ -1,7 +1,14 @@ #!/bin/sh +cleanup() { +if test -n $tmpdir; then +rm -rf $tmpdir; +fi +} + if test -z $1 -o -z $2; then echo Usage: $0 QEMU TEST1 [TEST2 ...] +cleanup exit 1 fi @@ -23,6 +30,7 @@ if ! which qmp /dev/null 2/dev/null; then if ! test -x ${qmp}; then echo Please set QEMU_SRC to set to a recent qemu.git tree +cleanup exit 1 fi else @@ -182,7 +190,6 @@ QEMU_TEST=1 . $1 rc=$? -rm -f $tmplog $tmppid $tmpqmp $tmpinitrd -rm -rf $tmpdir $tmprc +cleanup exit $rc -- 1.7.6
[Qemu-devel] [PATCH] qemu-test: qemu-test script cleanups
Ran qemu-test for the first time and found a few places to clean up the output. Did tab removal in qemu-test and then updated the indentation to be consistent. Signed-off-by: Ryan Harper ry...@us.ibm.com
[Qemu-devel] [PATCH 3/4] Remove tabs and replace with four spaces.
Signed-off-by: Ryan Harper ry...@us.ibm.com --- qemu-test | 62 ++-- 1 files changed, 31 insertions(+), 31 deletions(-) diff --git a/qemu-test b/qemu-test index 71c1ba1..445ca6d 100755 --- a/qemu-test +++ b/qemu-test @@ -29,9 +29,9 @@ if ! which qmp /dev/null 2/dev/null; then qmp=${QEMU_SRC}/QMP/qmp if ! test -x ${qmp}; then - echo Please set QEMU_SRC to set to a recent qemu.git tree +echo Please set QEMU_SRC to set to a recent qemu.git tree cleanup - exit 1 +exit 1 fi else qmp=`which qmp | head -1` @@ -74,13 +74,13 @@ qmp() { qmp_pid=$! count=0 while checkpid $qmp_pid; do - sleep 1 - count=$(($count + 1)) - if test $count -gt $QMP_TIMEOUT; then - echo $count, $QMP_TIMEOUT - kill -9 $qmp_pid - return 1 - fi +sleep 1 +count=$(($count + 1)) +if test $count -gt $QMP_TIMEOUT; then +echo $count, $QMP_TIMEOUT +kill -9 $qmp_pid +return 1 +fi done return 0 } @@ -104,11 +104,11 @@ choose() { target=$(($RANDOM % $#)) count=0 for i in $@; do - if test $count = $target; then - echo $i - return 0 - fi - count=$(($count + 1)) +if test $count = $target; then +echo $i +return 0 +fi +count=$(($count + 1)) done # not supposed to happen... @@ -117,7 +117,7 @@ choose() { choose_bool() { if test `choose yes no` = yes; then - return 0 +return 0 fi return 1 } @@ -135,34 +135,34 @@ start_qemu() { qemu_is_okay() { # it's stopped, that's not necessarly bad if ! checkpid $pid; then - return 1 +return 1 fi log_size=`get_file_size $tmplog` if test $last_log_size = $log_size; then - freeze_count=$(($freeze_count + 1)) +freeze_count=$(($freeze_count + 1)) else - freeze_count=0 - last_log_size=$log_size +freeze_count=0 +last_log_size=$log_size fi if test $freeze_count -gt $FREEZE_THRESHOLD checkpid $pid; then - qemu_pid=`cat $tmppid` - kill -9 $qemu_pid - echo Guest ($qemu_pid) has not had output in $FREEZE_THRESHOLD seconds! - return 2 +qemu_pid=`cat $tmppid` +kill -9 $qemu_pid +echo Guest ($qemu_pid) has not had output in $FREEZE_THRESHOLD seconds! +return 2 fi if ! qmp query-status /dev/null 2/dev/null checkpid $pid; then - qemu_pid=`cat $tmppid` - - kill -9 $qemu_pid - echo QEMU is hung! - return 3 +qemu_pid=`cat $tmppid` + +kill -9 $qemu_pid +echo QEMU is hung! +return 3 fi # it's stopped, that's not necessarly bad if ! checkpid $pid; then - return 1 +return 1 fi } @@ -170,7 +170,7 @@ get_qemu_status() { wait $pid rc=`cat $tmprc` if test $(($rc 1)) = 1; then - rc=$(($rc / 2)) +rc=$(($rc / 2)) fi return $rc } @@ -179,7 +179,7 @@ qemu() { start_qemu $@ while qemu_is_okay; do - sleep 1 +sleep 1 done get_qemu_status -- 1.7.6
Re: [Qemu-devel] Storage requirements for live migration
== Image Formats == Image formats are not safe to use with live migration. The reason is that QEMU caches data for image formats and does not have a mechanism to flush those caches. The following attempts to describe the issues with the various formats === QCOW2 === QCOW2 caches two forms of data, cluster metadata (L1/L2 data, refcount table, etc) and mutable header information (file size, snapshot entries, etc). This data needs to be discarded before after migration starts. before after? pick one =) -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH v10 1/3] block: add the command line support
--git a/blockdev.c b/blockdev.c index 0827bf7..faf8c56 100644 --- a/blockdev.c +++ b/blockdev.c @@ -235,6 +235,9 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi) int on_read_error, on_write_error; const char *devaddr; DriveInfo *dinfo; +BlockIOLimit io_limits; +bool bps_iol; +bool iops_iol; int snapshot = 0; int ret; @@ -353,6 +356,32 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi) } } +/* disk I/O throttling */ +io_limits.bps[BLOCK_IO_LIMIT_TOTAL] = + qemu_opt_get_number(opts, bps, 0); +io_limits.bps[BLOCK_IO_LIMIT_READ] = + qemu_opt_get_number(opts, bps_rd, 0); +io_limits.bps[BLOCK_IO_LIMIT_WRITE] = + qemu_opt_get_number(opts, bps_wr, 0); +io_limits.iops[BLOCK_IO_LIMIT_TOTAL] = + qemu_opt_get_number(opts, iops, 0); +io_limits.iops[BLOCK_IO_LIMIT_READ] = + qemu_opt_get_number(opts, iops_rd, 0); +io_limits.iops[BLOCK_IO_LIMIT_WRITE] = + qemu_opt_get_number(opts, iops_wr, 0); + +bps_iol = (io_limits.bps[BLOCK_IO_LIMIT_TOTAL] != 0) + ((io_limits.bps[BLOCK_IO_LIMIT_READ] != 0) + || (io_limits.bps[BLOCK_IO_LIMIT_WRITE] != 0)); +iops_iol = (io_limits.iops[BLOCK_IO_LIMIT_TOTAL] != 0) + ((io_limits.iops[BLOCK_IO_LIMIT_READ] != 0) + || (io_limits.iops[BLOCK_IO_LIMIT_WRITE] != 0)); +if (bps_iol || iops_iol) { +error_report(bps(iops) and bps_rd/bps_wr(iops_rd/iops_wr) + cannot be used at the same time); +return NULL; +} + How about extracting this section into the bdrv_set_io_limits() function or some other common setting function. This should be the same code that's used when modifying/setting these values via the monitor as well (minus how you obtain the input values - option parsings vs. monitor values). on_write_error = BLOCK_ERR_STOP_ENOSPC; if ((buf = qemu_opt_get(opts, werror)) != NULL) { if (type != IF_IDE type != IF_SCSI type != IF_VIRTIO type != IF_NONE) { @@ -460,6 +489,9 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi) bdrv_set_on_error(dinfo-bdrv, on_read_error, on_write_error); +/* disk I/O throttling */ +bdrv_set_io_limits(dinfo-bdrv, io_limits); + switch(type) { case IF_IDE: case IF_SCSI: diff --git a/qemu-config.c b/qemu-config.c index 597d7e1..1aa080f 100644 --- a/qemu-config.c +++ b/qemu-config.c @@ -85,6 +85,30 @@ static QemuOptsList qemu_drive_opts = { .name = readonly, .type = QEMU_OPT_BOOL, .help = open drive file as read-only, +},{ +.name = iops, +.type = QEMU_OPT_NUMBER, +.help = limit total I/O operations per second, +},{ +.name = iops_rd, +.type = QEMU_OPT_NUMBER, +.help = limit read operations per second, +},{ +.name = iops_wr, +.type = QEMU_OPT_NUMBER, +.help = limit write operations per second, +},{ +.name = bps, +.type = QEMU_OPT_NUMBER, +.help = limit total bytes per second, +},{ +.name = bps_rd, +.type = QEMU_OPT_NUMBER, +.help = limit read bytes per second, +},{ +.name = bps_wr, +.type = QEMU_OPT_NUMBER, +.help = limit write bytes per second, }, { /* end of list */ } }, diff --git a/qemu-options.hx b/qemu-options.hx index 681eaf1..25a7be7 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -136,6 +136,7 @@ DEF(drive, HAS_ARG, QEMU_OPTION_drive, [,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n [,serial=s][,addr=A][,id=name][,aio=threads|native]\n [,readonly=on|off]\n + [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w]]\n use 'file' as a drive image\n, QEMU_ARCH_ALL) STEXI @item -drive @var{option}[,@var{option}[,@var{option}[,...]]] -- 1.7.6 -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH 3/6] trace: Add trace events group implementation in the backend simple
* Mark Wu wu...@linux.vnet.ibm.com [2011-10-12 12:26]: Signed-off-by: Mark Wu wu...@linux.vnet.ibm.com --- trace/simple.c | 30 ++ trace/simple.h |7 +++ 2 files changed, 37 insertions(+), 0 deletions(-) diff --git a/trace/simple.c b/trace/simple.c index b639dda..7aa4c0b 100644 --- a/trace/simple.c +++ b/trace/simple.c @@ -321,6 +321,16 @@ void trace_print_events(FILE *stream, fprintf_function stream_printf) } } +void trace_print_groups(FILE *stream, fprintf_function stream_printf) +{ +unsigned int i; + +for (i = 0; i NR_TRACE_EVENT_GROUPS; i++) { +stream_printf(stream, %s [GROUP ID %u] : state %u\n, + trace_group_list[i].gp_name, i, + trace_group_list[i].state); You've got a mix of space and tabs in this file. Remove the tabs and adjust the spacing to CODING_STYLE rules. +} +} bool trace_event_set_state(const char *name, bool state) { unsigned int i; @@ -334,6 +344,26 @@ bool trace_event_set_state(const char *name, bool state) return false; } +bool trace_event_group_set_state(const char *gp_name, bool state) +{ +unsigned int i; +unsigned int j; +TraceEventGroup *group; + +for (i = 0; i NR_TRACE_EVENT_GROUPS; i++) { + group = trace_group_list[i]; here +if (!strcmp(group-gp_name, gp_name)) { +group-state = state; + + for (j = group-start; j = group-end; j++) { here +trace_list[j].state = state; +} +return true; +} +} +return false; +} + /* Helper function to create a thread with signals blocked. Use glib's * portable threads since QEMU abstractions cannot be used due to reentrancy in * the tracer. Also note the signal masking on POSIX hosts so that the thread diff --git a/trace/simple.h b/trace/simple.h index 466e75b..cf119f3 100644 --- a/trace/simple.h +++ b/trace/simple.h @@ -22,6 +22,13 @@ typedef struct { bool state; } TraceEvent; +typedef struct { +const char *gp_name; +bool state; +int start; +int end; +} TraceEventGroup; + void trace0(TraceEventID event); void trace1(TraceEventID event, uint64_t x1); void trace2(TraceEventID event, uint64_t x1, uint64_t x2); -- 1.7.1 -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH] vhost-net: cleanup host notifiers at last step
, VirtIODevice *vdev); int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev); void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev); +int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev); +void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev); #endif diff --git a/hw/vhost_net.c b/hw/vhost_net.c index a559812..950a6b8 100644 --- a/hw/vhost_net.c +++ b/hw/vhost_net.c @@ -139,16 +139,22 @@ int vhost_net_start(struct vhost_net *net, { struct vhost_vring_file file = { }; int r; + +net-dev.nvqs = 2; +net-dev.vqs = net-vqs; + +r = vhost_dev_enable_notifiers(net-dev, dev); +if (r 0) { +goto fail_notifiers; +} if (net-dev.acked_features (1 VIRTIO_NET_F_MRG_RXBUF)) { tap_set_vnet_hdr_len(net-vc, sizeof(struct virtio_net_hdr_mrg_rxbuf)); } -net-dev.nvqs = 2; -net-dev.vqs = net-vqs; r = vhost_dev_start(net-dev, dev); if (r 0) { -return r; +goto fail_start; } net-vc-info-poll(net-vc, false); @@ -173,6 +179,9 @@ fail: if (net-dev.acked_features (1 VIRTIO_NET_F_MRG_RXBUF)) { tap_set_vnet_hdr_len(net-vc, sizeof(struct virtio_net_hdr)); } +fail_start: +vhost_dev_disable_notifiers(net-dev, dev); +fail_notifiers: return r; } @@ -190,6 +199,7 @@ void vhost_net_stop(struct vhost_net *net, if (net-dev.acked_features (1 VIRTIO_NET_F_MRG_RXBUF)) { tap_set_vnet_hdr_len(net-vc, sizeof(struct virtio_net_hdr)); } +vhost_dev_disable_notifiers(net-dev, dev); } void vhost_net_cleanup(struct vhost_net *net) -- 1.7.5.53.gc233e -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH 0/3] better I/O accounting V2
* Christoph Hellwig h...@lst.de [2011-08-24 13:46]: On Mon, Aug 22, 2011 at 11:46:00AM -0500, Ryan Harper wrote: So, I believe this is how it's happening. we start accounting on a write which is turned into a multiwrite via virtio_blk_handle_write() which calls virtio_submit_multiwrite(). Then when the multiwrite completes, we invoke virtio_blk_rw_complete() on each part of the multiwrite. None of these requests have had their acct structure initialized since there was just *one* initial write. We could do a bdrv_acct_start() on each req, but that would break the concept of hiding the additional writes under the initial request. We do initialize all write fields correctly. Where we fail right now is for non-read/write requests. I can reproduce your issue by reading the serial attribute in sysfs - any chance Fedora does that during boot? Indeed. F15 has udev hooks to support /dev/disk/by-id for virtio-blk devices which will issue non-read/write request (get drive serial). The patch below should take care of these cases: Tested with this patch, F15 boots fine, mkfs.ext4 and dd all work as expected. Tested-by: Ryan Harper ry...@us.ibm.com Index: qemu/hw/virtio-blk.c === --- qemu.orig/hw/virtio-blk.c 2011-08-24 20:32:03.764503179 +0200 +++ qemu/hw/virtio-blk.c 2011-08-24 20:35:55.716579922 +0200 @@ -59,9 +59,6 @@ static void virtio_blk_req_complete(Virt stb_p(req-in-status, status); virtqueue_push(s-vq, req-elem, req-qiov.size + sizeof(*req-in)); virtio_notify(s-vdev, s-vq); - -bdrv_acct_done(s-bs, req-acct); -g_free(req); } static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error, @@ -83,6 +80,8 @@ static int virtio_blk_handle_rw_error(Vi vm_stop(VMSTOP_DISKFULL); } else { virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR); +bdrv_acct_done(s-bs, req-acct); +g_free(req); bdrv_mon_event(s-bs, BDRV_ACTION_REPORT, is_read); } @@ -102,6 +101,8 @@ static void virtio_blk_rw_complete(void } virtio_blk_req_complete(req, VIRTIO_BLK_S_OK); +bdrv_acct_done(req-dev-bs, req-acct); +g_free(req); } static void virtio_blk_flush_complete(void *opaque, int ret) @@ -115,6 +116,8 @@ static void virtio_blk_flush_complete(vo } virtio_blk_req_complete(req, VIRTIO_BLK_S_OK); +bdrv_acct_done(req-dev-bs, req-acct); +g_free(req); } static VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s) @@ -157,6 +160,7 @@ static void virtio_blk_handle_scsi(VirtI */ if (req-elem.out_num 2 || req-elem.in_num 3) { virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR); +g_free(req); return; } @@ -165,6 +169,7 @@ static void virtio_blk_handle_scsi(VirtI */ if (req-elem.out_num 2 req-elem.in_num 3) { virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP); +g_free(req); return; } @@ -231,11 +236,13 @@ static void virtio_blk_handle_scsi(VirtI stl_p(req-scsi-data_len, hdr.dxfer_len); virtio_blk_req_complete(req, status); +g_free(req); } #else static void virtio_blk_handle_scsi(VirtIOBlockReq *req) { virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP); +g_free(req); } #endif /* __linux__ */ @@ -378,6 +385,7 @@ static void virtio_blk_handle_request(Vi s-serial ? s-serial : , MIN(req-elem.in_sg[0].iov_len, VIRTIO_BLK_ID_BYTES)); virtio_blk_req_complete(req, VIRTIO_BLK_S_OK); +g_free(req); } else if (type VIRTIO_BLK_T_OUT) { qemu_iovec_init_external(req-qiov, req-elem.out_sg[1], req-elem.out_num - 1); -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH 2/2] [FYI] coverage test for Linux installs
* Anthony Liguori aligu...@us.ibm.com [2011-08-08 14:33]: This is a simple tool that I've been using for the past couple months to help with day-to-day testing of changes. It may seem like it's similar to kvm-autotest but it's actually quite different. Most noticably: - It's a coverage test instead of an acceptance test. Each time it runs it uses a slightly a semi-random configuration for the guest. This means that the more you run it, the more coverage you get. This is a good fit for maintainer testing because you get wide testing without having to wait for 48-hour test cycles. - It runs in the foreground as an unprivileged user. This means I can kick it off in a terminal while I go off and do something else, but still keep an eye on what's going on. - It works with TCG guests too and includes a TCG test case for the pseries Power machine. That said, it's *not* a replacement for KVM autotest. It is not a good tool for doing acceptance testing, like you would do to cut a new QEMU release. But perhaps there's some behavior that could be leveraged by KVM autotest in the future here. I'm not proposing this for tree inclusion right now. Just sharing a tool that I've found to be useful. I really just want the previous patch to go in so that I can stop carrying that patch privately. Right now, you need to setup an ISO directory to use the test tool. After copy-on-read lands in the tree, I plan on making it create CoR files backing to http so that no explicit setup is required. --- test-linux.c | 549 ++ 1 files changed, 549 insertions(+), 0 deletions(-) create mode 100644 test-linux.c I needed the following changes to the Makefile to build this.. diff --git a/Makefile b/Makefile index 8606849..b7df0e1 100644 --- a/Makefile +++ b/Makefile @@ -162,6 +162,7 @@ check-qlist: check-qlist.o qlist.o qint.o $(CHECK_PROG_DEPS) check-qfloat: check-qfloat.o qfloat.o $(CHECK_PROG_DEPS) check-qjson: check-qjson.o qfloat.o qint.o qdict.o qstring.o qlist.o qbool.o qjson.o json-streamer.o json-lexer.o js test-coroutine: test-coroutine.o qemu-timer-common.o async.o $(coroutine-obj-y) $(CHECK_PROG_DEPS) +test-linux: test-linux.o $(CHECK_PROG_DEPS) $(qapi-obj-y): $(GENERATED_HEADERS) qapi-dir := qapi-generated -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] Block layer roadmap on wiki
* Stefan Hajnoczi stefa...@gmail.com [2011-08-22 08:35]: At KVM Forum Kevin, Christoph, and I had an opportunity to get together for a Block Layer BoF. We went through the recent roadmap mailing list thread and touched on each proposed feature. Here is the block layer roadmap wiki page: http://wiki.qemu.org/BlockRoadmap Kevin: I have moved the runtime WCE toggling to QEMU 1.0 since you mentioned you want it for the next release. My main take-away from the BoF was that integrating support for host block devices and storage appliances will allow us to reduce the amount of effort spent on image formats. In order to make image formats support the desired features and performance we end up implementing much of the storage stack and file systems in userspace - code that is duplicated and cannot take advantage of the existing storage stack. +1 Storage management features are not just available in remote SAN and NAS appliances anymore. For local storage, btrfs has file-level clones and thin-dev is significantly improving LVM snapshots. Thin-dev is bringing a much more efficient and scalable snapshot model to LVM. This device-mapper feature will make LVM attractive for high performance I/O without giving up snapshot and clone features. It also supports cloning off block devices that are not in the pool (e.g. external storage, much like QEMU's backing files feature): https://github.com/jthornber/linux-2.6/tree/thin-dev This will not replace image formats overnight because image formats are still widely used and will continue to be a useful for transferring and sharing disk images. But focussing on the larger Any thoughts on how to make this easily usable for LVM? If there were an export/import to/from file to LVM? is that sufficient? Anything like this in existence? storage stack where either local LVM, btrfs, or storage appliances do the storage management means we exploit those options instead of implementing equivalent functionality ourselves. QEMU then runs with plain old raw in more cases. Stefan -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH 0/3] better I/O accounting V2
[cookie-type] += cookie-bytes; 240 bs-nr_ops[cookie-type]++; 241 bs-total_time_ns[cookie-type] += get_clock() - cookie-start_time_ns; 242 } 243 -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH 0/3] better I/O accounting V2
* Christoph Hellwig h...@lst.de [2011-08-22 10:16]: On Mon, Aug 22, 2011 at 09:59:16AM -0500, Ryan Harper wrote: * Christoph Hellwig h...@lst.de [2011-08-21 17:27]: changes since V1: - rebase to apply against the current qemu.git tree Testing on tip + this series with: ./configure --enable-debug --enable-kvm --enable-io-thread --target-list=x86_64-softmmu That is about the same config that I run. [New Thread 0x74f27700 (LWP 18033)] Program received signal SIGSEGV, Segmentation fault. 0x004200c1 in bdrv_acct_done (bs=0x12310b0, cookie=0x1c69f50) at /root/git/qemu/block_int.h:239 239 bs-nr_bytes[cookie-type] += cookie-bytes; I can't see how this can result in a segfault, unless we get a corrupted cookie. Can you print what cookie-type is from gdb? (gdb) frame 0 #0 0x004200c1 in bdrv_acct_done (bs=0x12310b0, cookie=0x1c68810) at /root/git/qemu/block_int.h:239 239 bs-nr_bytes[cookie-type] += cookie-bytes; (gdb) p *cookie $3 = {bytes = 72057589759737855, start_time_ns = 72057589759737855, type = 16777215} -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH 0/3] better I/O accounting V2
* Christoph Hellwig h...@lst.de [2011-08-22 10:37]: On Mon, Aug 22, 2011 at 10:29:11AM -0500, Ryan Harper wrote: (gdb) frame 0 #0 0x004200c1 in bdrv_acct_done (bs=0x12310b0, cookie=0x1c68810) at /root/git/qemu/block_int.h:239 239 bs-nr_bytes[cookie-type] += cookie-bytes; (gdb) p *cookie $3 = {bytes = 72057589759737855, start_time_ns = 72057589759737855, type = 16777215} So it is indeed corrupted. I'll try to figure out how that could have happened. So, I believe this is how it's happening. we start accounting on a write which is turned into a multiwrite via virtio_blk_handle_write() which calls virtio_submit_multiwrite(). Then when the multiwrite completes, we invoke virtio_blk_rw_complete() on each part of the multiwrite. None of these requests have had their acct structure initialized since there was just *one* initial write. We could do a bdrv_acct_start() on each req, but that would break the concept of hiding the additional writes under the initial request. So ensuring that the acct field is initialed when the request is allocated will fix the issue. With this patch, I don't see the crash anymore. Signed-off-by: Ryan Harper ry...@us.ibm.com diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c index 2660d1d..e746917 100644 --- a/hw/virtio-blk.c +++ b/hw/virtio-blk.c @@ -123,6 +123,7 @@ static VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s) req-dev = s; req-qiov.size = 0; req-next = NULL; +memset(req-acct, 0, sizeof(BlockAcctCookie)); return req; } -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] Block layer roadmap on wiki
* Stefan Hajnoczi stefa...@gmail.com [2011-08-22 15:32]: On Mon, Aug 22, 2011 at 8:04 PM, Anthony Liguori anth...@codemonkey.ws wrote: On 08/22/2011 08:34 AM, Stefan Hajnoczi wrote: At KVM Forum Kevin, Christoph, and I had an opportunity to get together for a Block Layer BoF. We went through the recent roadmap mailing list thread and touched on each proposed feature. Here is the block layer roadmap wiki page: http://wiki.qemu.org/BlockRoadmap Kevin: I have moved the runtime WCE toggling to QEMU 1.0 since you mentioned you want it for the next release. My main take-away from the BoF was that integrating support for host block devices and storage appliances will allow us to reduce the amount of effort spent on image formats. In order to make image formats support the desired features and performance we end up implementing much of the storage stack and file systems in userspace - code that is duplicated and cannot take advantage of the existing storage stack. The flip side is, tighter integration either makes features hard to consume or makes QEMU enter a space it currently hasn't. Many features require root privileges to configure and a system-wide scope. That's not QEMU today. QEMU itself should be about emulation and virtualization. Storage management needs to be done outside of QEMU. Today you can already take an LVM snapshot - it happens outside of QEMU. It's at the libvirt level where different storage systems get abstracted (LVM, directory, iSCSI, etc) and there is a single API/command set to invoke management functions. But even without libvirt you can do it yourself, and I think this separation makes sense so that QEMU can be focussed on running a single VM rather than managing storage. In addition, it makes QEMU tied to a specific platform (most likely Linux). QEMU will still work but certain features might not be available. For example, this is true today if you're using a storage appliance that does deduplication - that's a feature you're getting on top of the emulation/virtualization that QEMU does. But it doesn't tie QEMU to a particular platform. You could certainly rm -rf block/* and still be able to accomplish much of what's done today but it would be extremely painful to do in practice. We have to find a balance of not reinventing things and making sure that simple things are simple to do. We wouldn't rm -rf block/* because we still need qemu-nbd. It probably makes sense to keep what we have today. I'm talking more about a shift from writing our own image format to integrating existing storage support. I think this is a key point. While I do like the idea of keeping QEMU focused on single VM, I think we don't help ourselves by not consuming the hypervisor platform services and integrating/exploiting those features to make using QEMU easier. That said, it does mean that some things like system-wide config and privs are hard and aren't strictly virtualization issues, but that doesn't mean we can't integrate some sort of solution. That may require tighter integration and more focus on the higher up pieces in the stack to really enable this. Yes, exactly. Much of it shouldn't be inside QEMU. Stefan -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH v4 0/3] The intro for QEMU disk I/O limits
* Zhi Yong Wu wu...@linux.vnet.ibm.com [2011-08-01 01:30]: The main goal of the patch is to effectively cap the disk I/O speed or counts of one single VM.It is only one draft, so it unavoidably has some drawbacks, if you catch them, please let me know. The patch will mainly introduce one block I/O throttling algorithm, one timer and one block queue for each I/O limits enabled drive. When a block request is coming in, the throttling algorithm will check if its I/O rate or counts exceed the limits; if yes, then it will enqueue to the block queue; The timer will handle the I/O requests in it. Some available features follow as below: (1) global bps limit. -drive bps=xxxin bytes/s (2) only read bps limit -drive bps_rd=xxx in bytes/s (3) only write bps limit -drive bps_wr=xxx in bytes/s (4) global iops limit -drive iops=xxx in ios/s (5) only read iops limit -drive iops_rd=xxxin ios/s (6) only write iops limit -drive iops_wr=xxxin ios/s (7) the combination of some limits. -drive bps=xxx,iops=xxx Known Limitations: (1) #1 can not coexist with #2, #3 (2) #4 can not coexist with #5, #6 (3) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this senario. Zhi Yong Wu (3): v4: fix memory leaking based on ryan's feedback. It looks like the leak has been fixed, but I think we need to rework how the blk-queue is using the AIOPool. I'll reply to that patch. -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH v4 2/3] The support for block queue
* Zhi Yong Wu wu...@linux.vnet.ibm.com [2011-08-01 01:30]: +static AIOPool block_queue_pool = { +.aiocb_size = sizeof(struct BlockDriverAIOCB), +.cancel = qemu_block_queue_cancel, +}; + +static void qemu_block_queue_callback(void *opaque, int ret) +{ +BlockDriverAIOCB *acb = opaque; + +qemu_aio_release(acb); +} + So, here we really want to invoke the original commands callback, and then free the request here (via qemu_aio_release()). see below. +BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue, +BlockDriverState *bs, +BlockRequestHandler *handler, +int64_t sector_num, +QEMUIOVector *qiov, +int nb_sectors, +BlockDriverCompletionFunc *cb, +void *opaque) +{ +BlockIORequest *request; +BlockDriverAIOCB *acb; + +request = qemu_malloc(sizeof(BlockIORequest)); +request-bs = bs; +request-handler = handler; +request-sector_num = sector_num; +request-qiov = qiov; +request-nb_sectors = nb_sectors; +request-cb = cb; +request-opaque = opaque; + +QTAILQ_INSERT_TAIL(queue-requests, request, entry); + +acb = qemu_aio_get(block_queue_pool, bs, + qemu_block_queue_callback, opaque); +request-acb = acb; + +return acb; +} + +int qemu_block_queue_handler(BlockIORequest *request) +{ +int ret; +BlockDriverAIOCB *res; + +/* indicate this req is from block queue */ +request-bs-req_from_queue = true; + +res = request-handler(request-bs, request-sector_num, + request-qiov, request-nb_sectors, + request-cb, request-opaque); + +if (request-acb) { +qemu_block_queue_callback(request-acb, 0); +} + +ret = (res == NULL) ? 0 : 1; + +return ret; +} You don't want to malloc the BlockIORequest directly in _enqueue(), rather you want to allocate that from your AIOPool via qemu_aio_get(). As it is now, we're allocating two BlockIORequests (malloc and then a aio_get()). You'll need a BlockDriverAIOCB in the BlockIORequest structure. Then in your queue_handler, instead of passing the original read or write callback (request-cb), you want to hook the block_queue callback (qemu_block_queue_callback()), in that callback you can then invoke the request callback and then release the request. The request should be, only one malloc (via the pool which will re-use the memory instead of incuring a malloc on every request), and then you release the memory back to the pool once your request is complete, which you'll know after wiring up the block_queue callback to the completion of the request's handler. And then since we don't double allocate, you won't need to do the qemu_free(request) in block.c in block_timer... -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH v4 3/3] The support for queue timer and throttling algorithm
; +bs-io_disps.ios[BLOCK_IO_LIMIT_WRITE]++; +} +} + +if (bdrv_io_limits_enable(bs-io_limits)) { +bs-req_from_queue = false; } return ret; diff --git a/block.h b/block.h index 859d1d9..f0dac62 100644 --- a/block.h +++ b/block.h @@ -97,7 +97,6 @@ int bdrv_change_backing_file(BlockDriverState *bs, const char *backing_file, const char *backing_fmt); void bdrv_register(BlockDriver *bdrv); - typedef struct BdrvCheckResult { int corruptions; int leaks; diff --git a/block_int.h b/block_int.h index 1e265d2..1ca826b 100644 --- a/block_int.h +++ b/block_int.h @@ -27,10 +27,17 @@ #include block.h #include qemu-option.h #include qemu-queue.h +#include block/blk-queue.h #define BLOCK_FLAG_ENCRYPT 1 #define BLOCK_FLAG_COMPAT6 4 +#define BLOCK_IO_LIMIT_READ 0 +#define BLOCK_IO_LIMIT_WRITE1 +#define BLOCK_IO_LIMIT_TOTAL2 + +#define BLOCK_IO_SLICE_TIME 1 + #define BLOCK_OPT_SIZE size #define BLOCK_OPT_ENCRYPT encryption #define BLOCK_OPT_COMPAT6 compat6 @@ -46,6 +53,16 @@ typedef struct AIOPool { BlockDriverAIOCB *free_aiocb; } AIOPool; +typedef struct BlockIOLimit { +uint64_t bps[3]; +uint64_t iops[3]; +} BlockIOLimit; + +typedef struct BlockIODisp { +uint64_t bytes[2]; +uint64_t ios[2]; +} BlockIODisp; + struct BlockDriver { const char *format_name; int instance_size; @@ -175,6 +192,15 @@ struct BlockDriverState { void *sync_aiocb; +/* the time for latest disk I/O */ +int64_t slice_start[2]; +int64_t slice_end[2]; +BlockIOLimit io_limits; +BlockIODisp io_disps; +BlockQueue *block_queue; +QEMUTimer*block_timer; +bool req_from_queue; + /* I/O stats (display with info blockstats). */ uint64_t rd_bytes; uint64_t wr_bytes; @@ -222,6 +248,9 @@ void qemu_aio_release(void *p); void *qemu_blockalign(BlockDriverState *bs, size_t size); +void bdrv_set_io_limits(BlockDriverState *bs, +BlockIOLimit *io_limits); + #ifdef _WIN32 int is_windows_drive(const char *filename); #endif -- 1.7.2.3 -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH v3 0/2] The intro for QEMU disk I/O limits
* Zhi Yong Wu wu...@linux.vnet.ibm.com [2011-07-28 05:53]: The main goal of the patch is to effectively cap the disk I/O speed or counts of one single VM.It is only one draft, so it unavoidably has some drawbacks, if you catch them, please let me know. The patch will mainly introduce one block I/O throttling algorithm, one timer and one block queue for each I/O limits enabled drive. When a block request is coming in, the throttling algorithm will check if its I/O rate or counts exceed the limits; if yes, then it will enqueue to the block queue; The timer will periodically handle the I/O requests in it. Some available features follow as below: (1) global bps limit. -drive bps=xxxin bytes/s (2) only read bps limit -drive bps_rd=xxx in bytes/s (3) only write bps limit -drive bps_wr=xxx in bytes/s (4) global iops limit -drive iops=xxx in ios/s (5) only read iops limit -drive iops_rd=xxxin ios/s (6) only write iops limit -drive iops_wr=xxxin ios/s (7) the combination of some limits. -drive bps=xxx,iops=xxx Known Limitations: (1) #1 can not coexist with #2, #3 (2) #4 can not coexist with #5, #6 (3) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this senario. I don't yet have detailed info , but we've got a memory leak in the code. After running the VM with a 1MB r and w limit for 8 hours or so: -drive bps_rd=$((1*1024*1024)),bps_wr=$((1*1024*1024)) I've got my system swapping with 43G resident in memory: 9913 root 20 0 87.3g 43g 548 D 9.6 34.5 44:00.87 qemu-system-x86 would be worth looking through the code and maybe a valgrind run to catch the leak. -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] qemu - SCSI disk Device Model, Serial Number, and Firmware Version?
* Dave Seddon d...@seddon.ca [2011-06-07 02:28]: Greetings, Just wondering if it would be difficult to add the ability to define the SCSI disk Device Model, Serial Number, and Firmware Version. I've been using the '-device lsi' successfully to emulate the LSI controller, but now I want to emulate certain disks too. e.g. I've been using this: --- ... -drive if=none,id=disk00,file=/home/das/documents/qemu/disk00.img.qcow,media=disk,cache=writeback \ -device lsi \ -device scsi-disk,drive=disk00,bus=scsi.0 \ ... --- The reason this would be really cool is that tools like smartmontools seem to match on the Device Model, and the device-model QEMU hasn't made it into the list yet. What's the end goal here? Are you really passing an entire disk to the guest and want to have the guest monitor the device? If not, then I suggest just disabling smartmontools since it won't give you meaningful data anyhow. -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [RFC]QEMU disk I/O limits
* Daniel P. Berrange berra...@redhat.com [2011-05-31 09:25]: On Tue, May 31, 2011 at 10:10:37AM -0400, Vivek Goyal wrote: On Tue, May 31, 2011 at 02:56:46PM +0100, Daniel P. Berrange wrote: On Tue, May 31, 2011 at 09:45:37AM -0400, Vivek Goyal wrote: On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote: Hello, all, I have prepared to work on a feature called Disk I/O limits for qemu-kvm projeect. This feature will enable the user to cap disk I/O amount performed by a VM.It is important for some storage resources to be shared among multi-VMs. As you've known, if some of VMs are doing excessive disk I/O, they will hurt the performance of other VMs. Hi Zhiyong, Why not use kernel blkio controller for this and why reinvent the wheel and implement the feature again in qemu? The finest level of granularity offered by cgroups apply limits per QEMU process. So the blkio controller can't be used to apply controls directly to individual disks used by QEMU, only the VM as a whole. So are multiple VMs using same disk. Then put multiple VMs in same cgroup and apply the limit on that disk. Or if you want to put a system wide limit on a disk, then put all VMs in root cgroup and put limit on root cgroups. I fail to understand what's the exact requirement here. I thought the biggest use case was isolation one VM from other which might be sharing same device. Hence we were interested in putting per VM limit on disk and not a system wide limit on disk (independent of VM). No, it isn't about putting limits on a disk independant of a VM. It is about one VM having multiple disks, and wanting to set different policies for each of its virtual disks. eg qemu-kvm -drive file=/dev/sda1 -drive file=/dev/sdb3 and wanting to say that sda1 is limited to 10 MB/s, while sdb3 is limited to 50 MB/s. You can't do that kind of thing with cgroups, because it can only control the entire process, not individual resources within the process. yes, but with files: qemu-kvm -drive file=/path/to/local/vm/images -drive file=/path/to/shared/storage -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH v3 1/2] cdrom: Allow the TEST_UNIT_READY command after a cdrom change
* Amit Shah amit.s...@redhat.com [2011-04-07 03:53]: We restrict the commands that a guest can send us after a cdrom change event. The current list includes REQUEST_SENSE and INQUIRY commands. Guests can also issue TEST_UNIT_READY to inquire for the status, so allow this command as well. This also gets rid of one cause of the HSM violation errors in Linux guests. Those errors came up because we had the UNIT_ATTENTION event pending and we replied with an error message to a command that should be allowed in such a condition. The guest then did a soft reset to get to a sane state. Signed-off-by: Amit Shah amit.s...@redhat.com --- hw/ide/core.c |9 + 1 files changed, 5 insertions(+), 4 deletions(-) diff --git a/hw/ide/core.c b/hw/ide/core.c index 007a4ee..d55d804 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -1105,10 +1105,11 @@ static void ide_atapi_cmd(IDEState *s) /* If there's a UNIT_ATTENTION condition pending, only REQUEST_SENSE and INQUIRY commands are allowed to complete. */ Wasn't this comment going to get updated to be more generic (ie more than just the 3 commands below may be needed?) if (s-sense_key == SENSE_UNIT_ATTENTION - s-io_buffer[0] != GPCMD_REQUEST_SENSE - s-io_buffer[0] != GPCMD_INQUIRY) { - ide_atapi_cmd_check_status(s); - return; +s-io_buffer[0] != GPCMD_REQUEST_SENSE +s-io_buffer[0] != GPCMD_INQUIRY +s-io_buffer[0] != GPCMD_TEST_UNIT_READY) { +ide_atapi_cmd_check_status(s); +return; } switch(s-io_buffer[0]) { case GPCMD_TEST_UNIT_READY: -- 1.7.4 -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH v3] Do not delete BlockDriverState when deleting the drive
* Markus Armbruster arm...@redhat.com [2011-03-29 02:44]: Ryan Harper ry...@us.ibm.com writes: When removing a drive from the host-side via drive_del we currently have the following path: drive_del qemu_aio_flush() bdrv_close()// zaps bs-drv, which makes any subsequent I/O get // dropped. Works as designed drive_uninit() bdrv_delete() // frees the bs. Since the device is still connected to // bs, any subsequent I/O is a use-after-free. The value of bs-drv becomes unpredictable on free. As long as it remains null, I/O still gets dropped, however it could become non-null at any point after the free resulting SEGVs or other QEMU state corruption. To resolve this issue as simply as possible, we can chose to not actually delete the BlockDriverState pointer. Since bdrv_close() handles setting the drv pointer to NULL, we just need to remove the BlockDriverState from the QLIST that is used to enumerate the block devices. This is currently handled within bdrv_delete, so move this into its own function, bdrv_make_anon(). The result is that we can now invoke drive_del, this closes the file descriptors and sets BlockDriverState-drv to NULL which prevents futher IO to the device, and since we do not free BlockDriverState, we don't have to worry about the copy retained in the block devices. We also don't attempt to remove the qdev property since we are no longer deleting the BlockDriverState on drives with associated drives. This also allows for removing Drives with no devices associated either. Reported-by: Markus Armbruster arm...@redhat.com Signed-off-by: Ryan Harper ry...@us.ibm.com --- v2-v3 - Update drive_del use after free description - s/bdrv_remove/bdrv_make_anon/g - Don't remove qdev property since we don't delete bs any more - If (bs-peer) bdrv_make_anon else bdrv_delete to handle removing drives with no device. v1-v2 - NULL bs-device_name after removing from list to prevent second removal. block.c| 13 ++--- block.h|1 + blockdev.c | 25 - 3 files changed, 19 insertions(+), 20 deletions(-) diff --git a/block.c b/block.c index c8e2f97..6a5d3f2 100644 --- a/block.c +++ b/block.c @@ -697,14 +697,21 @@ void bdrv_close_all(void) } } +/* make a BlockDriverState anonymous by removing from bdrv_state list. + Also, NULL terminate the device_name to prevent double remove */ +void bdrv_make_anon(BlockDriverState *bs) +{ +if (bs-device_name[0] != '\0') { +QTAILQ_REMOVE(bdrv_states, bs, list); +} You lost +bs-device_name[0] = '\0'; since v2. Oops. Crap. +} + void bdrv_delete(BlockDriverState *bs) { assert(!bs-peer); /* remove from list, if necessary */ -if (bs-device_name[0] != '\0') { -QTAILQ_REMOVE(bdrv_states, bs, list); -} +bdrv_make_anon(bs); bdrv_close(bs); if (bs-file != NULL) { diff --git a/block.h b/block.h index 5d78fc0..52e9cad 100644 --- a/block.h +++ b/block.h @@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename, QEMUOptionParameter *options); int bdrv_create_file(const char* filename, QEMUOptionParameter *options); BlockDriverState *bdrv_new(const char *device_name); +void bdrv_make_anon(BlockDriverState *bs); void bdrv_delete(BlockDriverState *bs); int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags); int bdrv_open(BlockDriverState *bs, const char *filename, int flags, diff --git a/blockdev.c b/blockdev.c index ecf2252..2c0eb06 100644 --- a/blockdev.c +++ b/blockdev.c @@ -737,8 +737,6 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data) { const char *id = qdict_get_str(qdict, id); BlockDriverState *bs; -BlockDriverState **ptr; -Property *prop; bs = bdrv_find(id); if (!bs) { @@ -755,24 +753,17 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data) bdrv_flush(bs); bdrv_close(bs); -/* clean up guest state from pointing to host resource by - * finding and removing DeviceState drive property */ +/* if we have a device associated with this BlockDriverState (bs-peer) + * then we need to make the drive anonymous until the device + * can be removed. If this is a drive with no device backing + * then we can just get rid of the block driver state right here. + */ if (bs-peer) { -for (prop = bs-peer-info-props; prop prop-name; prop++) { -if (prop-info-type == PROP_TYPE_DRIVE) { -ptr = qdev_get_prop_ptr(bs-peer, prop); -if (*ptr == bs) { -bdrv_detach(bs, bs-peer); -*ptr = NULL
Re: [Qemu-devel] [PATCH v3] Do not delete BlockDriverState when deleting the drive
* Markus Armbruster arm...@redhat.com [2011-03-29 04:06]: Since you have to respin anyway, would you mind limiting commit message line length to 70-75 characters? Thanks. yep -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
[Qemu-devel] [PATCH v4] Do not delete BlockDriverState when deleting the drive
When removing a drive from the host-side via drive_del we currently have the following path: drive_del qemu_aio_flush() bdrv_close()// zaps bs-drv, which makes any subsequent I/O get // dropped. Works as designed drive_uninit() bdrv_delete() // frees the bs. Since the device is still connected to // bs, any subsequent I/O is a use-after-free. The value of bs-drv becomes unpredictable on free. As long as it remains null, I/O still gets dropped, however it could become non-null at any point after the free resulting SEGVs or other QEMU state corruption. To resolve this issue as simply as possible, we can chose to not actually delete the BlockDriverState pointer. Since bdrv_close() handles setting the drv pointer to NULL, we just need to remove the BlockDriverState from the QLIST that is used to enumerate the block devices. This is currently handled within bdrv_delete, so move this into its own function, bdrv_make_anon(). The result is that we can now invoke drive_del, this closes the file descriptors and sets BlockDriverState-drv to NULL which prevents futher IO to the device, and since we do not free BlockDriverState, we don't have to worry about the copy retained in the block devices. We also don't attempt to remove the qdev property since we are no longer deleting the BlockDriverState on drives with associated drives. This also allows for removing Drives with no devices associated either. Reported-by: Markus Armbruster arm...@redhat.com Signed-off-by: Ryan Harper ry...@us.ibm.com --- v3-v4 - add back missing nulling of device_name in v2 - use drive_uninit() when removing drive with no peer - align commit message on 72 char boundary v2-v3 - Update drive_del use after free description - s/bdrv_remove/bdrv_make_anon/g - Don't remove qdev property since we don't delete bs any more - If (bs-peer) bdrv_make_anon else bdrv_delete to handle removing drives with no device. v1-v2 - NULL bs-device_name after removing from list to prevent second removal. block.c| 14 +++--- block.h|1 + blockdev.c | 25 - 3 files changed, 20 insertions(+), 20 deletions(-) diff --git a/block.c b/block.c index c8e2f97..c93ec6d 100644 --- a/block.c +++ b/block.c @@ -697,14 +697,22 @@ void bdrv_close_all(void) } } +/* make a BlockDriverState anonymous by removing from bdrv_state list. + Also, NULL terminate the device_name to prevent double remove */ +void bdrv_make_anon(BlockDriverState *bs) +{ +if (bs-device_name[0] != '\0') { +QTAILQ_REMOVE(bdrv_states, bs, list); +} +bs-device_name[0] = '\0'; +} + void bdrv_delete(BlockDriverState *bs) { assert(!bs-peer); /* remove from list, if necessary */ -if (bs-device_name[0] != '\0') { -QTAILQ_REMOVE(bdrv_states, bs, list); -} +bdrv_make_anon(bs); bdrv_close(bs); if (bs-file != NULL) { diff --git a/block.h b/block.h index 5d78fc0..52e9cad 100644 --- a/block.h +++ b/block.h @@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename, QEMUOptionParameter *options); int bdrv_create_file(const char* filename, QEMUOptionParameter *options); BlockDriverState *bdrv_new(const char *device_name); +void bdrv_make_anon(BlockDriverState *bs); void bdrv_delete(BlockDriverState *bs); int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags); int bdrv_open(BlockDriverState *bs, const char *filename, int flags, diff --git a/blockdev.c b/blockdev.c index ecf2252..c810b16 100644 --- a/blockdev.c +++ b/blockdev.c @@ -737,8 +737,6 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data) { const char *id = qdict_get_str(qdict, id); BlockDriverState *bs; -BlockDriverState **ptr; -Property *prop; bs = bdrv_find(id); if (!bs) { @@ -755,24 +753,17 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data) bdrv_flush(bs); bdrv_close(bs); -/* clean up guest state from pointing to host resource by - * finding and removing DeviceState drive property */ +/* if we have a device associated with this BlockDriverState (bs-peer) + * then we need to make the drive anonymous until the device + * can be removed. If this is a drive with no device backing + * then we can just get rid of the block driver state right here. + */ if (bs-peer) { -for (prop = bs-peer-info-props; prop prop-name; prop++) { -if (prop-info-type == PROP_TYPE_DRIVE) { -ptr = qdev_get_prop_ptr(bs-peer, prop); -if (*ptr == bs) { -bdrv_detach(bs, bs-peer); -*ptr = NULL; -break; -} -} -} +bdrv_make_anon(bs); +} else { +drive_uninit(drive_get_by_blockdev(bs)); } -/* clean up host side */ -drive_uninit(drive_get_by_blockdev(bs)); - return 0
Re: [Qemu-devel] [PATCH v2] Do not delete BlockDriverState when deleting the drive
* Markus Armbruster arm...@redhat.com [2011-03-24 07:27]: Whoops, almost missed this. Best to cc: me to avoid that. It was sent directly to you: Sender: qemu-devel-bounces+ryanh=us.ibm@nongnu.org From: Ryan Harper ry...@us.ibm.com Subject: Re: [Qemu-devel] [PATCH v2] Do not delete BlockDriverState when deleting the drive Date: Tue, 22 Mar 2011 20:53:47 -0500 Message-ID: 20110323015347.ga20...@us.ibm.com User-Agent: Mutt/1.5.6+20040907i To: Markus Armbruster arm...@redhat.com Cc: Kevin Wolf kw...@redhat.com, Ryan Harper ry...@us.ibm.com, qemu-devel@nongnu.org Ryan Harper ry...@us.ibm.com writes: * Markus Armbruster arm...@redhat.com [2011-03-15 04:48]: Sorry for the long delay, I was out of action for a week. Ryan Harper ry...@us.ibm.com writes: When removing a drive from the host-side via drive_del we currently have the following path: drive_del qemu_aio_flush() bdrv_close() drive_uninit() bdrv_delete() When we bdrv_delete() we end up qemu_free() the BlockDriverState pointer however, the block devices retain a copy of this pointer, see hw/virtio-blk.c:virtio_blk_init() where we s-bs = conf-bs. We now have a use-after-free situation. If the guest continues to issue IO against the device, and we've reallocated the memory that the BlockDriverState pointed at, then we will fail the bs-drv checks in the various bdrv_ methods. we will fail the bs-drv checks is misleading, in my opinion. Here's what happens: 1. bdrv_close(bs) zaps bs-drv, which makes any subsequent I/O get dropped. Works as designed. 2. drive_uninit() frees the bs. Since the device is still connected to bs, any subsequent I/O is a use-after-free. The value of bs-drv becomes unpredictable on free. As long as it remains null, I/O still gets dropped. I/O crashes or worse once that changed. Could be right on free, could be much later. If you respin anyway, please clarify your description. Sure. I wasn't planning a new version, but I'll update and send anyhow as I didn't see it get included in pull from the block branch. To resolve this issue as simply as possible, we can chose to not actually delete the BlockDriverState pointer. Since bdrv_close() handles setting the drv pointer to NULL, we just need to remove the BlockDriverState from the QLIST that is used to enumerate the block devices. This is currently handled within bdrv_delete, so move this into it's own function, bdrv_remove(). Why do we remove the BlockDriverState from bdrv_states? Because we want drive_del make its *name* go away. Begs the question: is the code prepared for a BlockDriverState object that isn't on bdrv_states? Turns out we're in luck: bdrv_new() already creates such objects when the device_name is empty. This is used for internal BlockDriverStates such as COW backing files. Your code makes device_name empty when taking the object off bdrv_states, so we're good. Begs yet another question: how does the behavior of a BlockDriverState change when it's taken off bdrv_states, and is that the behavior we want? Changes: * bdrv_delete() no longer takes it off bdrv_states. Good. * bdrv_close_all(), bdrv_commit_all() and bdrv_flush_all() no longer cover it. Okay, because bdrv_close(), bdrv_commit() and bdrv_flush() do nothing anyway for closed BlockDriverStates. * info block and info blockstats no longer show it, because bdrv_info() and bdrv_info_stats() no longer see it. Okay. * bdrv_find(), bdrv_next(), bdrv_iterate() no longer see it. Impact? Please check their uses and report. 1664 block-migration.c block_load bs = bdrv_find(device_name); - no longer see it. This is fine since we can't migrate a block device that has been removed 2562 blockdev.c do_commit bs = bdrv_find(device); - do_commit won't see it in either when calling bdrv_commit_all() Fine as you mention above. If user specifies the device name we won't find it, that's OK because we can't commit data against a closed BlockDriverState. 3587 blockdev.c do_snapshot_blkdev bs = bdrv_find(device); - OK, cannot take a snapshot against a deleted BlockDriverState 4662 blockdev.c do_eject bs = bdrv_find(filename); - OK, cannot eject a deleted BlockDriverState; 5676 blockdev.c do_block_set_passwd bs = bdrv_find(qdict_get_str(qdict, device)); - OK, cannot set password a deleted BlockDriverState; 6701 blockdev.c do_change_block bs = bdrv_find(device); - OK, cannot change the file/device of a deleted BlockDriverState; 7732 blockdev.c do_drive_del bs = bdrv_find(id); - OK, cannot
[Qemu-devel] [PATCH v3] Do not delete BlockDriverState when deleting the drive
When removing a drive from the host-side via drive_del we currently have the following path: drive_del qemu_aio_flush() bdrv_close()// zaps bs-drv, which makes any subsequent I/O get // dropped. Works as designed drive_uninit() bdrv_delete() // frees the bs. Since the device is still connected to // bs, any subsequent I/O is a use-after-free. The value of bs-drv becomes unpredictable on free. As long as it remains null, I/O still gets dropped, however it could become non-null at any point after the free resulting SEGVs or other QEMU state corruption. To resolve this issue as simply as possible, we can chose to not actually delete the BlockDriverState pointer. Since bdrv_close() handles setting the drv pointer to NULL, we just need to remove the BlockDriverState from the QLIST that is used to enumerate the block devices. This is currently handled within bdrv_delete, so move this into its own function, bdrv_make_anon(). The result is that we can now invoke drive_del, this closes the file descriptors and sets BlockDriverState-drv to NULL which prevents futher IO to the device, and since we do not free BlockDriverState, we don't have to worry about the copy retained in the block devices. We also don't attempt to remove the qdev property since we are no longer deleting the BlockDriverState on drives with associated drives. This also allows for removing Drives with no devices associated either. Reported-by: Markus Armbruster arm...@redhat.com Signed-off-by: Ryan Harper ry...@us.ibm.com --- v2-v3 - Update drive_del use after free description - s/bdrv_remove/bdrv_make_anon/g - Don't remove qdev property since we don't delete bs any more - If (bs-peer) bdrv_make_anon else bdrv_delete to handle removing drives with no device. v1-v2 - NULL bs-device_name after removing from list to prevent second removal. block.c| 13 ++--- block.h|1 + blockdev.c | 25 - 3 files changed, 19 insertions(+), 20 deletions(-) diff --git a/block.c b/block.c index c8e2f97..6a5d3f2 100644 --- a/block.c +++ b/block.c @@ -697,14 +697,21 @@ void bdrv_close_all(void) } } +/* make a BlockDriverState anonymous by removing from bdrv_state list. + Also, NULL terminate the device_name to prevent double remove */ +void bdrv_make_anon(BlockDriverState *bs) +{ +if (bs-device_name[0] != '\0') { +QTAILQ_REMOVE(bdrv_states, bs, list); +} +} + void bdrv_delete(BlockDriverState *bs) { assert(!bs-peer); /* remove from list, if necessary */ -if (bs-device_name[0] != '\0') { -QTAILQ_REMOVE(bdrv_states, bs, list); -} +bdrv_make_anon(bs); bdrv_close(bs); if (bs-file != NULL) { diff --git a/block.h b/block.h index 5d78fc0..52e9cad 100644 --- a/block.h +++ b/block.h @@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename, QEMUOptionParameter *options); int bdrv_create_file(const char* filename, QEMUOptionParameter *options); BlockDriverState *bdrv_new(const char *device_name); +void bdrv_make_anon(BlockDriverState *bs); void bdrv_delete(BlockDriverState *bs); int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags); int bdrv_open(BlockDriverState *bs, const char *filename, int flags, diff --git a/blockdev.c b/blockdev.c index ecf2252..2c0eb06 100644 --- a/blockdev.c +++ b/blockdev.c @@ -737,8 +737,6 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data) { const char *id = qdict_get_str(qdict, id); BlockDriverState *bs; -BlockDriverState **ptr; -Property *prop; bs = bdrv_find(id); if (!bs) { @@ -755,24 +753,17 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data) bdrv_flush(bs); bdrv_close(bs); -/* clean up guest state from pointing to host resource by - * finding and removing DeviceState drive property */ +/* if we have a device associated with this BlockDriverState (bs-peer) + * then we need to make the drive anonymous until the device + * can be removed. If this is a drive with no device backing + * then we can just get rid of the block driver state right here. + */ if (bs-peer) { -for (prop = bs-peer-info-props; prop prop-name; prop++) { -if (prop-info-type == PROP_TYPE_DRIVE) { -ptr = qdev_get_prop_ptr(bs-peer, prop); -if (*ptr == bs) { -bdrv_detach(bs, bs-peer); -*ptr = NULL; -break; -} -} -} +bdrv_make_anon(bs); +} else { +bdrv_delete(bs); } -/* clean up host side */ -drive_uninit(drive_get_by_blockdev(bs)); - return 0; } -- 1.7.1 -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH v2] Do not delete BlockDriverState when deleting the drive
* Markus Armbruster arm...@redhat.com [2011-03-15 04:48]: Sorry for the long delay, I was out of action for a week. Ryan Harper ry...@us.ibm.com writes: When removing a drive from the host-side via drive_del we currently have the following path: drive_del qemu_aio_flush() bdrv_close() drive_uninit() bdrv_delete() When we bdrv_delete() we end up qemu_free() the BlockDriverState pointer however, the block devices retain a copy of this pointer, see hw/virtio-blk.c:virtio_blk_init() where we s-bs = conf-bs. We now have a use-after-free situation. If the guest continues to issue IO against the device, and we've reallocated the memory that the BlockDriverState pointed at, then we will fail the bs-drv checks in the various bdrv_ methods. we will fail the bs-drv checks is misleading, in my opinion. Here's what happens: 1. bdrv_close(bs) zaps bs-drv, which makes any subsequent I/O get dropped. Works as designed. 2. drive_uninit() frees the bs. Since the device is still connected to bs, any subsequent I/O is a use-after-free. The value of bs-drv becomes unpredictable on free. As long as it remains null, I/O still gets dropped. I/O crashes or worse once that changed. Could be right on free, could be much later. If you respin anyway, please clarify your description. Sure. I wasn't planning a new version, but I'll update and send anyhow as I didn't see it get included in pull from the block branch. To resolve this issue as simply as possible, we can chose to not actually delete the BlockDriverState pointer. Since bdrv_close() handles setting the drv pointer to NULL, we just need to remove the BlockDriverState from the QLIST that is used to enumerate the block devices. This is currently handled within bdrv_delete, so move this into it's own function, bdrv_remove(). Why do we remove the BlockDriverState from bdrv_states? Because we want drive_del make its *name* go away. Begs the question: is the code prepared for a BlockDriverState object that isn't on bdrv_states? Turns out we're in luck: bdrv_new() already creates such objects when the device_name is empty. This is used for internal BlockDriverStates such as COW backing files. Your code makes device_name empty when taking the object off bdrv_states, so we're good. Begs yet another question: how does the behavior of a BlockDriverState change when it's taken off bdrv_states, and is that the behavior we want? Changes: * bdrv_delete() no longer takes it off bdrv_states. Good. * bdrv_close_all(), bdrv_commit_all() and bdrv_flush_all() no longer cover it. Okay, because bdrv_close(), bdrv_commit() and bdrv_flush() do nothing anyway for closed BlockDriverStates. * info block and info blockstats no longer show it, because bdrv_info() and bdrv_info_stats() no longer see it. Okay. * bdrv_find(), bdrv_next(), bdrv_iterate() no longer see it. Impact? Please check their uses and report. 1664 block-migration.c block_load bs = bdrv_find(device_name); - no longer see it. This is fine since we can't migrate a block device that has been removed 2562 blockdev.c do_commit bs = bdrv_find(device); - do_commit won't see it in either when calling bdrv_commit_all() Fine as you mention above. If user specifies the device name we won't find it, that's OK because we can't commit data against a closed BlockDriverState. 3587 blockdev.c do_snapshot_blkdev bs = bdrv_find(device); - OK, cannot take a snapshot against a deleted BlockDriverState 4662 blockdev.c do_eject bs = bdrv_find(filename); - OK, cannot eject a deleted BlockDriverState; 5676 blockdev.c do_block_set_passwd bs = bdrv_find(qdict_get_str(qdict, device)); - OK, cannot set password a deleted BlockDriverState; 6701 blockdev.c do_change_block bs = bdrv_find(device); - OK, cannot change the file/device of a deleted BlockDriverState; 7732 blockdev.c do_drive_del bs = bdrv_find(id); - OK, cannot delete an already deleted Drive 8783 blockdev.c do_block_resize bs = bdrv_find(device); - OK, cannot resize a deleted Drive 9312 hw/qdev-properties.c parse_drive bs = bdrv_find(str); - Used when invoking qdev_prop_drive .parse method; parse method is invoked via qdev_device_add() which calls set_property() which invokes parse. AFAICT, this is OK since we won't be going down the device add path worrying about a deleted block device. The result is that we can now invoke drive_del, this closes the file descriptors and sets BlockDriverState-drv to NULL which prevents futher IO to the device, and since we do not free BlockDriverState, we don't have to worry about the copy retained in the block devices. Yep
[Qemu-devel] [PATCH v2] Do not delete BlockDriverState when deleting the drive
When removing a drive from the host-side via drive_del we currently have the following path: drive_del qemu_aio_flush() bdrv_close() drive_uninit() bdrv_delete() When we bdrv_delete() we end up qemu_free() the BlockDriverState pointer however, the block devices retain a copy of this pointer, see hw/virtio-blk.c:virtio_blk_init() where we s-bs = conf-bs. We now have a use-after-free situation. If the guest continues to issue IO against the device, and we've reallocated the memory that the BlockDriverState pointed at, then we will fail the bs-drv checks in the various bdrv_ methods. To resolve this issue as simply as possible, we can chose to not actually delete the BlockDriverState pointer. Since bdrv_close() handles setting the drv pointer to NULL, we just need to remove the BlockDriverState from the QLIST that is used to enumerate the block devices. This is currently handled within bdrv_delete, so move this into it's own function, bdrv_remove(). The result is that we can now invoke drive_del, this closes the file descriptors and sets BlockDriverState-drv to NULL which prevents futher IO to the device, and since we do not free BlockDriverState, we don't have to worry about the copy retained in the block devices. Reported-by: Marcus Armbruster arm...@redhat.com Signed-off-by: Ryan Harper ry...@us.ibm.com --- v1-v2 - NULL bs-device_name after removing from list to prevent second removal. block.c| 12 +--- block.h|1 + blockdev.c |2 +- 3 files changed, 11 insertions(+), 4 deletions(-) diff --git a/block.c b/block.c index 1544d81..0df9942 100644 --- a/block.c +++ b/block.c @@ -697,14 +697,20 @@ void bdrv_close_all(void) } } +void bdrv_remove(BlockDriverState *bs) +{ +if (bs-device_name[0] != '\0') { +QTAILQ_REMOVE(bdrv_states, bs, list); +} +bs-device_name[0] = '\0'; +} + void bdrv_delete(BlockDriverState *bs) { assert(!bs-peer); /* remove from list, if necessary */ -if (bs-device_name[0] != '\0') { -QTAILQ_REMOVE(bdrv_states, bs, list); -} +bdrv_remove(bs); bdrv_close(bs); if (bs-file != NULL) { diff --git a/block.h b/block.h index 5d78fc0..8447397 100644 --- a/block.h +++ b/block.h @@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename, QEMUOptionParameter *options); int bdrv_create_file(const char* filename, QEMUOptionParameter *options); BlockDriverState *bdrv_new(const char *device_name); +void bdrv_remove(BlockDriverState *bs); void bdrv_delete(BlockDriverState *bs); int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags); int bdrv_open(BlockDriverState *bs, const char *filename, int flags, diff --git a/blockdev.c b/blockdev.c index 0690cc8..1f57b50 100644 --- a/blockdev.c +++ b/blockdev.c @@ -760,7 +760,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data) } /* clean up host side */ -drive_uninit(drive_get_by_blockdev(bs)); +bdrv_remove(bs); return 0; } -- 1.7.1 -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
[Qemu-devel] [PATCH v2] Don't allow multiwrites against a block device without underlying medium
If the block device has been closed, we no longer have a medium to submit IO against, check for this before submitting io. This prevents a segfault further in the code where we dereference elements of the block driver. Signed-off-by: Ryan Harper ry...@us.ibm.com --- v1-v2: - move bs-drv check to top of function to match other bdrv_ functions - fill out reqs response with error code before returning -1. block.c |8 1 files changed, 8 insertions(+), 0 deletions(-) diff --git a/block.c b/block.c index f7d91a2..1544d81 100644 --- a/block.c +++ b/block.c @@ -2398,6 +2398,14 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, BlockRequest *reqs, int num_reqs) MultiwriteCB *mcb; int i; +/* don't submit writes if we don't have a medium */ +if (bs-drv == NULL) { +for (i = 0; i num_reqs; i++) { +reqs[i].error = -ENOMEDIUM; +} +return -1; +} + if (num_reqs == 0) { return 0; } -- 1.7.1 -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
[Qemu-devel] [PATCH] Do not delete BlockDriverState when deleting the drive
When removing a drive from the host-side via drive_del we currently have the following path: drive_del qemu_aio_flush() bdrv_close() drive_uninit() bdrv_delete() When we bdrv_delete() we end up qemu_free()'ing the BlockDriverState pointer however, the block devices retain a copy of this pointer, see hw/virtio-blk.c:virtio_blk_init() where we s-bs = conf-bs. We now have a use-after-free situation. If the guest continues to issue IO against the device, and we've reallocated the memory that the BlockDriverState pointed at, then we will fail the bs-drv checks in the various bdrv_ methods. To resolve this issue as simply as possible, we can chose to not actually delete the BlockDriverState pointer. Since bdrv_close() handles setting the drv pointer to NULL, we just need to remove the BlockDriverState from the QLIST that is used to enumerate the block devices. This is currently handled within bdrv_delete, so move this into it's own function, bdrv_remove(). The result is that we can now invoke drive_del, this closes the file descriptors and sets BlockDriverState-drv to NULL which prevents futher IO to the device, and since we do not free BlockDriverState, we don't have to worry about the copy retained in the block devices. This state will be deleted if the guest is asked and responds to a pci device removal request. Reported-by: Markus Armbruster arm...@redhat.com Signed-off-by: Ryan Harper ry...@us.ibm.com --- block.c| 11 --- block.h|1 + blockdev.c |2 +- 3 files changed, 10 insertions(+), 4 deletions(-) diff --git a/block.c b/block.c index f7d91a2..92dd3fe 100644 --- a/block.c +++ b/block.c @@ -697,14 +697,19 @@ void bdrv_close_all(void) } } +void bdrv_remove(BlockDriverState *bs) +{ +if (bs-device_name[0] != '\0') { +QTAILQ_REMOVE(bdrv_states, bs, list); +} +} + void bdrv_delete(BlockDriverState *bs) { assert(!bs-peer); /* remove from list, if necessary */ -if (bs-device_name[0] != '\0') { -QTAILQ_REMOVE(bdrv_states, bs, list); -} +bdrv_remove(bs); bdrv_close(bs); if (bs-file != NULL) { diff --git a/block.h b/block.h index 5d78fc0..8447397 100644 --- a/block.h +++ b/block.h @@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename, QEMUOptionParameter *options); int bdrv_create_file(const char* filename, QEMUOptionParameter *options); BlockDriverState *bdrv_new(const char *device_name); +void bdrv_remove(BlockDriverState *bs); void bdrv_delete(BlockDriverState *bs); int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags); int bdrv_open(BlockDriverState *bs, const char *filename, int flags, diff --git a/blockdev.c b/blockdev.c index 0690cc8..1f57b50 100644 --- a/blockdev.c +++ b/blockdev.c @@ -760,7 +760,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data) } /* clean up host side */ -drive_uninit(drive_get_by_blockdev(bs)); +bdrv_remove(bs); return 0; } -- 1.7.1 -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH] Fix ATA SMART and CHECK POWER MODE
*bus, uint32_t val) memset(s-io_buffer, 0, 0x200); s-io_buffer[0] = 0x01; /* smart struct version */ for (n=0; n30; n++) { - if (smart_attributes[n][0] == 0) + if (smart_attributes[n][0] == 0) CODING_STYLE if() break; - s-io_buffer[2+0+(n*12)] = smart_attributes[n][0]; - s-io_buffer[2+1+(n*12)] = smart_attributes[n][1]; - s-io_buffer[2+3+(n*12)] = smart_attributes[n][2]; - s-io_buffer[2+4+(n*12)] = smart_attributes[n][3]; + int i; + for(i = 0; i 12; i++) { + s-io_buffer[2+i+(n*12)] = smart_attributes[n][i]; + } } + whitespace s-io_buffer[362] = 0x02 | (s-smart_autosave?0x80:0x00); if (s-smart_selftest_count == 0) { - s-io_buffer[363] = 0; + s-io_buffer[363] = 0; } else { - s-io_buffer[363] = + s-io_buffer[363] = s-smart_selftest_data[3 + -(s-smart_selftest_count - 1) * -24]; +(s-smart_selftest_count - 1) * +24]; } s-io_buffer[364] = 0x20; s-io_buffer[365] = 0x01; @@ -2136,9 +2160,9 @@ void ide_exec_cmd(IDEBus *bus, uint32_t val) s-io_buffer[372] = 0x02; /* minutes for poll short test */ s-io_buffer[373] = 0x36; /* minutes for poll ext test */ s-io_buffer[374] = 0x01; /* minutes for poll conveyance */ - - for (n=0; n511; n++) - s-io_buffer[511] += s-io_buffer[n]; + for (n=0; n511; n++) { /* checksum */ + s-io_buffer[511] += s-io_buffer[n]; + } s-io_buffer[511] = 0x100 - s-io_buffer[511]; s-status = READY_STAT | SEEK_STAT; ide_transfer_start(s, s-io_buffer, 0x200, ide_transfer_stop); @@ -2147,32 +2171,31 @@ void ide_exec_cmd(IDEBus *bus, uint32_t val) case SMART_READ_LOG: switch (s-sector) { case 0x01: /* summary smart error log */ - memset(s-io_buffer, 0, 0x200); - s-io_buffer[0] = 0x01; - s-io_buffer[1] = 0x00; /* no error entries */ - s-io_buffer[452] = s-smart_errors 0xff; - s-io_buffer[453] = (s-smart_errors 0xff00) 8; - - for (n=0; n511; n++) + memset(s-io_buffer, 0, 0x200); + s-io_buffer[0] = 0x01; + s-io_buffer[1] = 0x00; /* no error entries */ + s-io_buffer[452] = s-smart_errors 0xff; + s-io_buffer[453] = (s-smart_errors 0xff00) 8; + for (n=0; n511; n++) brace s-io_buffer[511] += s-io_buffer[n]; - s-io_buffer[511] = 0x100 - s-io_buffer[511]; - break; + s-io_buffer[511] = 0x100 - s-io_buffer[511]; + break; case 0x06: /* smart self test log */ - memset(s-io_buffer, 0, 0x200); - s-io_buffer[0] = 0x01; - if (s-smart_selftest_count == 0) { + memset(s-io_buffer, 0, 0x200); + s-io_buffer[0] = 0x01; + if (s-smart_selftest_count == 0) { s-io_buffer[508] = 0; - } else { + } else { s-io_buffer[508] = s-smart_selftest_count; for (n=2; n506; n++) - s-io_buffer[n] = s-smart_selftest_data[n]; - } - for (n=0; n511; n++) + s-io_buffer[n] = s-smart_selftest_data[n]; + } + for (n=0; n511; n++) brace s-io_buffer[511] += s-io_buffer[n]; - s-io_buffer[511] = 0x100 - s-io_buffer[511]; - break; + s-io_buffer[511] = 0x100 - s-io_buffer[511]; + break; default: - goto abort_cmd; + goto abort_cmd; } s-status = READY_STAT | SEEK_STAT; ide_transfer_start(s, s-io_buffer, 0x200, ide_transfer_stop); This patchset looks a lot larger than it should since there is a lot of indentation movement, it would be good to see a version that just implemented the changes needed, which AFAICT are mainly the additional attributes and limits. -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH v3] Fix ATA SMART and CHECK POWER MODE
:0x00); if (s-smart_selftest_count == 0) { -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH 2/3] block: tell drivers about an image resize
* Christoph Hellwig h...@lst.de [2011-01-14 10:27]: Add a new size_changed flag in the BlockDriverState and call the change_cb callback to notify drivers about a size change. Signed-off-by: Christoph Hellwig h...@lst.de Index: qemu/block.c === --- qemu.orig/block.c 2011-01-14 17:05:49.527003363 +0100 +++ qemu/block.c 2011-01-14 17:07:23.206255143 +0100 @@ -1135,6 +1135,10 @@ int bdrv_truncate(BlockDriverState *bs, ret = drv-bdrv_truncate(bs, offset); if (ret == 0) { ret = refresh_total_sectors(bs, offset BDRV_SECTOR_BITS); +if (bs-change_cb) { +bs-size_changed = 1; +bs-change_cb(bs-change_opaque); +} Do we want to check to ensure the size_changed flag isn't set before doing a second resize event? I'm wondering if the truncate takes $longtime and user gets impatient and issues a second resize command. How should we respond? Ignore it? queue it up? Is there any other path that invokes bdrv_truncate? If so, do we want invoke the call back in those scenarios or only if the truncate is initiated from the monitor invocation? } return ret; } Index: qemu/block_int.h === --- qemu.orig/block_int.h 2011-01-14 17:05:49.537004411 +0100 +++ qemu/block_int.h 2011-01-14 17:06:02.539004271 +0100 @@ -167,6 +167,7 @@ struct BlockDriverState { char backing_format[16]; /* if non-zero and backing_file exists */ int is_temporary; int media_changed; +int size_changed; BlockDriverState *backing_hd; BlockDriverState *file; -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] Re: [PATCH] Fix segfault with ram_size 4095M without kvm
* Ryan Harper ry...@us.ibm.com [2011-01-04 09:49]: * Aurelien Jarno aurel...@aurel32.net [2010-12-25 16:37]: On Wed, Dec 08, 2010 at 04:27:45PM -0200, Luiz Capitulino wrote: On Wed, 08 Dec 2010 12:23:12 -0600 Anthony Liguori aligu...@linux.vnet.ibm.com wrote: On 12/08/2010 12:01 PM, Luiz Capitulino wrote: Currently, x86_64-softmmu qemu segfaults when trying to use 4095M memsize. This patch adds a simple check and error message (much like the 2047 limit on 32-bit hosts) on ram_size in the control path after we determine we're not using kvm Upstream qemu-kvm is affected if using the -no-kvm option; this patch address the segfault there as well. Signed-off-by: Ryan Harperry...@us.ibm.com Signed-off-by: Aurelien Jarnoaurel...@aurel32.net --- NOTE: this patch was applied in the v0.12.x branch, but it seems it got lost for master No, it was intentional. We should fix the segv, this is not a known limitation but rather a bug. A TCG bug, I presume? Do you have more details about this issue and how to reproduce it? At the time of the bug, it was something simple like: qemu-system-x86_64 -m 4097 -hda /dev/null we'd get an imediate segfault. As you say, I'm not seeing it now on current git; I'll see about bisecting to see if we did get a fix for the issue. I attempted to bisect, but there a couple commits around where the issue was fixed that broke git bisect =( That narrowed it down to about 5 commits to check. This the last git commit where I can reproduce the segfault with the above test case (qemu invocation). commit 0aef4261ac0ec9089ade0e3a92f986cb4ba7317e Author: Aurelien Jarno aurel...@aurel32.net Date: Thu Mar 11 21:29:42 2010 +0100 target-ppc: fix evsrwu and evsrws (second try) Signed-off-by: Aurelien Jarno aurel...@aurel32.net The next 4 commits don't compile so they are untest-able: commit 14f24e1465edc44b9b4d89fbbea66e06088154e1 - fails to build with: - ./configure --target-list=x86_64-softmmu make clean make /home/rharper/work/git/qemu/exec.c: In function 'phys_page_find_alloc': /home/rharper/work/git/qemu/exec.c:341: error: #error unsupported TARGET_PHYS_ADDR_SPACE_BITS /home/rharper/work/git/qemu/exec.c: In function 'phys_page_for_each': /home/rharper/work/git/qemu/exec.c:1670: error: #error unsupported TARGET_PHYS_ADDR_SPACE_BITS make[1]: *** [exec.o] Error 1 make: *** [subdir-x86_64-softmmu] Error 2 commit 7bc7b099dfa38a856b1bc892c0f9f3d6fe28e170 - fails to build with: - ./configure --target-list=x86_64-softmmu make clean make /home/rharper/work/git/qemu/exec.c: In function 'phys_page_find_alloc': /home/rharper/work/git/qemu/exec.c:341: error: #error unsupported TARGET_PHYS_ADDR_SPACE_BITS /home/rharper/work/git/qemu/exec.c: In function 'phys_page_for_each': /home/rharper/work/git/qemu/exec.c:1670: error: #error unsupported TARGET_PHYS_ADDR_SPACE_BITS make[1]: *** [exec.o] Error 1 make: *** [subdir-x86_64-softmmu] Error 2 commit b9f83121a13153536d886305414b540460c34508 - fails to build with: - ./configure --target-list=x86_64-softmmu make clean make /home/rharper/work/git/qemu/exec.c: In function 'phys_page_find_alloc': /home/rharper/work/git/qemu/exec.c:341: error: #error unsupported TARGET_PHYS_ADDR_SPACE_BITS /home/rharper/work/git/qemu/exec.c: In function 'phys_page_for_each': /home/rharper/work/git/qemu/exec.c:1670: error: #error unsupported TARGET_PHYS_ADDR_SPACE_BITS make[1]: *** [exec.o] Error 1 make: *** [subdir-x86_64-softmmu] Error 2 commit 5270589032f450ae7c3448730855aa18ff68ccff - fails to build with: - ./configure --target-list=x86_64-softmmu make clean make /home/rharper/work/git/qemu/exec.c: In function 'phys_page_find_alloc': /home/rharper/work/git/qemu/exec.c:341: error: #error unsupported TARGET_PHYS_ADDR_SPACE_BITS /home/rharper/work/git/qemu/exec.c: In function 'phys_page_for_each': /home/rharper/work/git/qemu/exec.c:1670: error: #error unsupported TARGET_PHYS_ADDR_SPACE_BITS make[1]: *** [exec.o] Error 1 make: *** [subdir-x86_64-softmmu] Error 2 And this commit compiles and the test case no longer segfaults. So I'd say things are fixed at this point. commit 5cd2c5b6ad75c46d40118ac67c0c09d4e7930a65 - compiles and issue is no longer present. - ./configure --target-list=x86_64-softmmu make clean make sudo x86_64-softmmu/qemu-syem-x86_64 -L pc-bios -hda /dev/null -m 4097 -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] Re: [PATCH] Fix segfault with ram_size 4095M without kvm
* Aurelien Jarno aurel...@aurel32.net [2010-12-25 16:37]: On Wed, Dec 08, 2010 at 04:27:45PM -0200, Luiz Capitulino wrote: On Wed, 08 Dec 2010 12:23:12 -0600 Anthony Liguori aligu...@linux.vnet.ibm.com wrote: On 12/08/2010 12:01 PM, Luiz Capitulino wrote: Currently, x86_64-softmmu qemu segfaults when trying to use 4095M memsize. This patch adds a simple check and error message (much like the 2047 limit on 32-bit hosts) on ram_size in the control path after we determine we're not using kvm Upstream qemu-kvm is affected if using the -no-kvm option; this patch address the segfault there as well. Signed-off-by: Ryan Harperry...@us.ibm.com Signed-off-by: Aurelien Jarnoaurel...@aurel32.net --- NOTE: this patch was applied in the v0.12.x branch, but it seems it got lost for master No, it was intentional. We should fix the segv, this is not a known limitation but rather a bug. A TCG bug, I presume? Do you have more details about this issue and how to reproduce it? At the time of the bug, it was something simple like: qemu-system-x86_64 -m 4097 -hda /dev/null we'd get an imediate segfault. As you say, I'm not seeing it now on current git; I'll see about bisecting to see if we did get a fix for the issue. Support for more than 4GB of memory has been added a few years ago, and I am not able to reproduce the problem anymore (I have booted a 64-bit guest with 6GB of RAM, and make sure the guest use the whole memory). I guess TCG itself is fine, but there might be a bug in the MMU emulation in some cases. I also noticed that now i386-softmmu has been artificially limited to 2047MB. Tthis configuration used to support up to 64GB of RAM (PAE) on 64-bit hosts. -- Aurelien Jarno GPG: 1024D/F1BCDB73 aurel...@aurel32.net http://www.aurel33.net -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
[Qemu-devel] [PATCH] blockdev: check dinfo ptr before using
If a user decides to punish a guest by revoking its block device via drive_del, and subsequently also attempts to remove the pci device backing it, and the device is using blockdev_auto_del() then we get a segfault when we attempt to access dinfo-auto_del.[1] The fix is to check if drive_get_by_blockdev() actually returns a valid dinfo pointer or not. 1. (qemu) pci_add auto storage file=images/test01.raw,if=virtio,id=block1,snapshot=on (qemu) drive_del block1 (qemu) pci_del 5 *segfault* Signed-off-by: Ryan Harper ry...@us.ibm.com -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com diff --git a/blockdev.c b/blockdev.c index f6ac439..3b3b82d 100644 --- a/blockdev.c +++ b/blockdev.c @@ -30,14 +30,16 @@ void blockdev_mark_auto_del(BlockDriverState *bs) { DriveInfo *dinfo = drive_get_by_blockdev(bs); -dinfo-auto_del = 1; +if (dinfo) { +dinfo-auto_del = 1; +} } void blockdev_auto_del(BlockDriverState *bs) { DriveInfo *dinfo = drive_get_by_blockdev(bs); -if (dinfo-auto_del) { +if (dinfo dinfo-auto_del) { drive_uninit(dinfo); } }
Re: [Qemu-devel] [PATCH 00/10] AHCI emulation support v2
* Alexander Graf ag...@suse.de [2010-11-18 12:49]: On 18.11.2010, at 14:26, Kevin Wolf kw...@redhat.com wrote: Hi Alex, Am 18.11.2010 04:27, schrieb Alexander Graf: This patch adds support for AHCI emulation. I have tested and verified it works in Linux, OpenBSD, Windows Vista and Windows 7. This AHCI emulation supports NCQ, so multiple read or write requests can be outstanding at the same time. The code is however not fully optimized yet. I'm fairly sure that there are low hanging performance fruits to be found still :). In my simple benchmarks I achieved about 2/3rd of virtio performance. Also, this AHCI emulation layer does not support legacy mode. So if you're using a disk with this emulation, you do not get it exposed using the legacy IDE interfaces. Another nitpick is CD-ROM support in Windows. Somehow it doesn't detect a CD-ROM drive attached to AHCI. At least it doesn't list it. To attach an AHCI disk to your VM, please use -drive file=...,if=sata This should do the trick for x86. On other platforms, you might need to add the ahci host controller using -device. This patch set is based on work done during the Google Summer of Code. I was mentoring a student, Roland Elek, who wrote most of the AHCI emulation code based on a patch from Chong Qiao. A bunch of other people were also involved, so everybody who I didn't mention - thanks a lot! I'm not completely sure about the relationship between the AHCI emulation and our existing IDE emulation. First thing I noticed is that AHCI wants to be independent and resides in hw/ instead of hw/ide/, but it still include ide/internal.h. Do you think it would make sense to move AHCI into hw/ide? Both ahci and ide implement ata. I guess the best thing to do would be to completely refactor all ide code into ata and pata code, then add ahci (sata) to the game. Estimated working time: ~1 month. :) As I would rather have something working we can base on in the tree, so whoever volunteers for the refactoring (hint!) knows how to design the interfaces, I am not sure how much is reasonable within this patch set. Moving the file to ide/ does sound like a good idea however. Then I believe that core.c is now a mixture of some generic ATA code (that is also used by SATA) and the Legacy IDE code. SATA doesn't seem to interact with the generic code through clean interfaces, but by accessing internal data structures and calls to somewhere in the middle of the existing IDE emultion. I think we should get a clean abstraction there and have a clean split between SATA, PATA and common code, with each of them sitting in its own file in hw/ide. I haven't reviewed the patches in detail but just had a quick look at them, so my impressions might be wrong. If so, please correct me. No, you're completely right. We're in a chicken and egg situation. We don't have ahci, but the ide code is ugly. We would probably do a bad job at refactoring the ata code if we don't know which interfaces to design for. So IMHO the only way we can really go is to implement sata, take the uglyness it adds to the already ugly ide code and use all of that for a working state we can start to refactor from. So yes, while I agree with your obversations, I do not believe we will get there until at least 0.16 or so. And I'd rather like to see a fast default block driver in gueast sooner than later :) Speaking of fast, do you have any numbers around ACHI vs IDE (not that I need any convincing that we can do better than IDE); just curious. Alex -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] Re: [PATCH 0/2] v8 Decouple block device removal from device removal
* Kevin Wolf kw...@redhat.com [2010-11-16 08:05]: Am 16.11.2010 14:51, schrieb Luiz Capitulino: On Fri, 12 Nov 2010 18:38:57 +0100 Kevin Wolf kw...@redhat.com wrote: Am 12.11.2010 18:07, schrieb Ryan Harper: details, details, v8 This patch series decouples the detachment of a block device from the removal of the backing pci-device. Removal of a hotplugged pci device requires the guest to respond before qemu tears down the block device. In some cases, the guest may not respond leaving the guest with continued access to the block device. Mgmt layer doesn't have a reliable method to force a disconnect. The new monitor command, drive_del, will revoke a guests access to the block device independently of the removal of the pci device. The first patch implements drive_del, the second patch implements the qmp version of the monitor command. Changes since v7: - Fixed up doc strings (delete - drive_del) Changes since v6: - Updated patch description - Dropped bdrv_unplug and inlined in drive_del - Explicitly invoke drive_uninit() Changes since v5: - Removed dangling pointers in guest and host state. This ensures things like info block no longer displays the deleted drive; though info pci will continue to display the pci device until the guest responds to the removal request. - Renamed drive_unplug - drive_del Changes since v4: - Droppped drive_get_by_id patch and use bdrv_find() instead - Added additional details about drive_unplug to hmp/qmp interface Changes since v3: - Moved QMP command for drive_unplug() to separate patch Changes since v2: - Added QMP command for drive_unplug() Changes since v1: - CodingStyle fixes - Added qemu_aio_flush() to bdrv_unplug() Signed-off-by: Ryan Harper ry...@us.ibm.com Thanks, applied both to the block branch. I guess the conclusion was that we don't want the new command in QMP? http://lists.gnu.org/archive/html/qemu-devel/2010-11/msg01084.html If you compare the time of these mails, Markus sent his mail only a few minutes after I had applied the patches and posted this. Ryan split the patch in two parts only to allow dropping the QMP part if we decided so, so I think he'll agree. I'm going to drop the second Indeed. patch from my queue again before I send a pull request. Kevin -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
[Qemu-devel] [PATCH 0/2] v7 Decouple block device removal from device removal
Once more dear friends, v7 This patch series decouples the detachment of a block device from the removal of the backing pci-device. Removal of a hotplugged pci device requires the guest to respond before qemu tears down the block device. In some cases, the guest may not respond leaving the guest with continued access to the block device. Mgmt layer doesn't have a reliable method to force a disconnect. The new monitor command, drive_del, will revoke a guests access to the block device independently of the removal of the pci device. The first patch implements drive_del, the second patch implements the qmp version of the monitor command. Changes since v6: - Updated patch description - Dropped bdrv_unplug and inlined in drive_del - Explicitly invoke drive_uninit() Changes since v5: - Removed dangling pointers in guest and host state. This ensures things like info block no longer displays the deleted drive; though info pci will continue to display the pci device until the guest responds to the removal request. - Renamed drive_unplug - drive_del Changes since v4: - Droppped drive_get_by_id patch and use bdrv_find() instead - Added additional details about drive_unplug to hmp/qmp interface Changes since v3: - Moved QMP command for drive_unplug() to separate patch Changes since v2: - Added QMP command for drive_unplug() Changes since v1: - CodingStyle fixes - Added qemu_aio_flush() to bdrv_unplug() Signed-off-by: Ryan Harper ry...@us.ibm.com
[Qemu-devel] [PATCH 1/2] Implement drive_del to decouple block removal from device removal
Currently device hotplug removal code is tied to device removal via ACPI. All pci devices that are removable via device_del() require the guest to respond to the request. In some cases the guest may not respond leaving the device still accessible to the guest. The management layer doesn't currently have a reliable way to revoke access to host resource in the presence of an uncooperative guest. This patch implements a new monitor command, drive_del, which provides an explicit command to revoke access to a host block device. drive_del first quiesces the block device (qemu_aio_flush; bdrv_flush() and bdrv_close()). This prevents further IO from being submitted against the host device. Finally, drive_del cleans up pointers between the drive object (host resource) and the device object (guest resource). Signed-off-by: Ryan Harper ry...@us.ibm.com --- blockdev.c | 39 +++ blockdev.h |1 + hmp-commands.hx | 18 ++ 3 files changed, 58 insertions(+), 0 deletions(-) diff --git a/blockdev.c b/blockdev.c index 6cb179a..f6ac439 100644 --- a/blockdev.c +++ b/blockdev.c @@ -14,6 +14,8 @@ #include qemu-option.h #include qemu-config.h #include sysemu.h +#include hw/qdev.h +#include block_int.h static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives); @@ -597,3 +599,40 @@ int do_change_block(Monitor *mon, const char *device, } return monitor_read_bdrv_key_start(mon, bs, NULL, NULL); } + +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data) +{ +const char *id = qdict_get_str(qdict, id); +BlockDriverState *bs; +BlockDriverState **ptr; +Property *prop; + +bs = bdrv_find(id); +if (!bs) { +qerror_report(QERR_DEVICE_NOT_FOUND, id); +return -1; +} + +/* quiesce block driver; prevent further io */ +qemu_aio_flush(); +bdrv_flush(bs); +bdrv_close(bs); + +/* clean up guest state from pointing to host resource by + * finding and removing DeviceState drive property */ +for (prop = bs-peer-info-props; prop prop-name; prop++) { +if (prop-info-type == PROP_TYPE_DRIVE) { +ptr = qdev_get_prop_ptr(bs-peer, prop); +if ((*ptr) == bs) { +bdrv_detach(bs, bs-peer); +*ptr = NULL; +break; +} +} +} + +/* clean up host side */ +drive_uninit(drive_get_by_blockdev(bs)); + +return 0; +} diff --git a/blockdev.h b/blockdev.h index 653affc..2a0559e 100644 --- a/blockdev.h +++ b/blockdev.h @@ -51,5 +51,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_change_block(Monitor *mon, const char *device, const char *filename, const char *fmt); +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data); #endif diff --git a/hmp-commands.hx b/hmp-commands.hx index e5585ba..d6dc18c 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -68,6 +68,24 @@ Eject a removable medium (use -f to force it). ETEXI { +.name = drive_del, +.args_type = id:s, +.params = device, +.help = remove host block device, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_drive_del, +}, + +STEXI +...@item delete @var{device} +...@findex delete +Remove host block device. The result is that guest generated IO is no longer +submitted against the host device underlying the disk. Once a drive has +been deleted, the QEMU Block layer returns -EIO which results in IO +errors in the guest for applications that are reading/writing to the device. +ETEXI + +{ .name = change, .args_type = device:B,target:F,arg:s?, .params = device filename [format], -- 1.6.3.3
[Qemu-devel] [PATCH 2/2] Add qmp version of drive_del
Signed-off-by: Ryan Harper ry...@us.ibm.com --- qmp-commands.hx | 29 + 1 files changed, 29 insertions(+), 0 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index 793cf1c..1e0d4e9 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -338,6 +338,35 @@ Example: EQMP { +.name = drive_del, +.args_type = id:s, +.params = device, +.help = remove host block device, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_drive_del, +}, + +SQMP +drive del +-- + +Remove host block device. The result is that guest generated IO is no longer +submitted against the host device underlying the disk. Once a drive has +been deleted, the QEMU Block layer returns -EIO which results in IO +errors in the guest for applications that are reading/writing to the device. + +Arguments: + +- id: the device's ID (json-string) + +Example: + +- { execute: drive_del, arguments: { id: drive-virtio-blk1 } } +- { return: {} } + +EQMP + +{ .name = cpu, .args_type = index:i, .params = index, -- 1.6.3.3
[Qemu-devel] Re: [PATCH 1/2] Implement drive_del to decouple block removal from device removal
* Kevin Wolf kw...@redhat.com [2010-11-12 10:43]: Am 12.11.2010 16:38, schrieb Ryan Harper: Currently device hotplug removal code is tied to device removal via ACPI. All pci devices that are removable via device_del() require the guest to respond to the request. In some cases the guest may not respond leaving the device still accessible to the guest. The management layer doesn't currently have a reliable way to revoke access to host resource in the presence of an uncooperative guest. This patch implements a new monitor command, drive_del, which provides an explicit command to revoke access to a host block device. drive_del first quiesces the block device (qemu_aio_flush; bdrv_flush() and bdrv_close()). This prevents further IO from being submitted against the host device. Finally, drive_del cleans up pointers between the drive object (host resource) and the device object (guest resource). Signed-off-by: Ryan Harper ry...@us.ibm.com --- blockdev.c | 39 +++ blockdev.h |1 + hmp-commands.hx | 18 ++ 3 files changed, 58 insertions(+), 0 deletions(-) diff --git a/blockdev.c b/blockdev.c index 6cb179a..f6ac439 100644 --- a/blockdev.c +++ b/blockdev.c @@ -14,6 +14,8 @@ #include qemu-option.h #include qemu-config.h #include sysemu.h +#include hw/qdev.h +#include block_int.h static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives); @@ -597,3 +599,40 @@ int do_change_block(Monitor *mon, const char *device, } return monitor_read_bdrv_key_start(mon, bs, NULL, NULL); } + +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data) +{ +const char *id = qdict_get_str(qdict, id); +BlockDriverState *bs; +BlockDriverState **ptr; +Property *prop; + +bs = bdrv_find(id); +if (!bs) { +qerror_report(QERR_DEVICE_NOT_FOUND, id); +return -1; +} + +/* quiesce block driver; prevent further io */ +qemu_aio_flush(); +bdrv_flush(bs); +bdrv_close(bs); + +/* clean up guest state from pointing to host resource by + * finding and removing DeviceState drive property */ +for (prop = bs-peer-info-props; prop prop-name; prop++) { +if (prop-info-type == PROP_TYPE_DRIVE) { +ptr = qdev_get_prop_ptr(bs-peer, prop); +if ((*ptr) == bs) { +bdrv_detach(bs, bs-peer); +*ptr = NULL; +break; +} +} +} + +/* clean up host side */ +drive_uninit(drive_get_by_blockdev(bs)); + +return 0; +} diff --git a/blockdev.h b/blockdev.h index 653affc..2a0559e 100644 --- a/blockdev.h +++ b/blockdev.h @@ -51,5 +51,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_change_block(Monitor *mon, const char *device, const char *filename, const char *fmt); +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data); #endif diff --git a/hmp-commands.hx b/hmp-commands.hx index e5585ba..d6dc18c 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -68,6 +68,24 @@ Eject a removable medium (use -f to force it). ETEXI { +.name = drive_del, +.args_type = id:s, +.params = device, +.help = remove host block device, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_drive_del, +}, + +STEXI +...@item delete @var{device} +...@findex delete I think this should be @item drive_del and @findex drive_del. Yep. Kevin -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
[Qemu-devel] [PATCH 2/2] Add qmp version of drive_del
Signed-off-by: Ryan Harper ry...@us.ibm.com --- qmp-commands.hx | 29 + 1 files changed, 29 insertions(+), 0 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index 793cf1c..1e0d4e9 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -338,6 +338,35 @@ Example: EQMP { +.name = drive_del, +.args_type = id:s, +.params = device, +.help = remove host block device, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_drive_del, +}, + +SQMP +drive del +-- + +Remove host block device. The result is that guest generated IO is no longer +submitted against the host device underlying the disk. Once a drive has +been deleted, the QEMU Block layer returns -EIO which results in IO +errors in the guest for applications that are reading/writing to the device. + +Arguments: + +- id: the device's ID (json-string) + +Example: + +- { execute: drive_del, arguments: { id: drive-virtio-blk1 } } +- { return: {} } + +EQMP + +{ .name = cpu, .args_type = index:i, .params = index, -- 1.6.3.3
[Qemu-devel] [PATCH 0/2] v8 Decouple block device removal from device removal
details, details, v8 This patch series decouples the detachment of a block device from the removal of the backing pci-device. Removal of a hotplugged pci device requires the guest to respond before qemu tears down the block device. In some cases, the guest may not respond leaving the guest with continued access to the block device. Mgmt layer doesn't have a reliable method to force a disconnect. The new monitor command, drive_del, will revoke a guests access to the block device independently of the removal of the pci device. The first patch implements drive_del, the second patch implements the qmp version of the monitor command. Changes since v7: - Fixed up doc strings (delete - drive_del) Changes since v6: - Updated patch description - Dropped bdrv_unplug and inlined in drive_del - Explicitly invoke drive_uninit() Changes since v5: - Removed dangling pointers in guest and host state. This ensures things like info block no longer displays the deleted drive; though info pci will continue to display the pci device until the guest responds to the removal request. - Renamed drive_unplug - drive_del Changes since v4: - Droppped drive_get_by_id patch and use bdrv_find() instead - Added additional details about drive_unplug to hmp/qmp interface Changes since v3: - Moved QMP command for drive_unplug() to separate patch Changes since v2: - Added QMP command for drive_unplug() Changes since v1: - CodingStyle fixes - Added qemu_aio_flush() to bdrv_unplug() Signed-off-by: Ryan Harper ry...@us.ibm.com
[Qemu-devel] [PATCH 1/2] Implement drive_del to decouple block removal from device removal
Currently device hotplug removal code is tied to device removal via ACPI. All pci devices that are removable via device_del() require the guest to respond to the request. In some cases the guest may not respond leaving the device still accessible to the guest. The management layer doesn't currently have a reliable way to revoke access to host resource in the presence of an uncooperative guest. This patch implements a new monitor command, drive_del, which provides an explicit command to revoke access to a host block device. drive_del first quiesces the block device (qemu_aio_flush; bdrv_flush() and bdrv_close()). This prevents further IO from being submitted against the host device. Finally, drive_del cleans up pointers between the drive object (host resource) and the device object (guest resource). Signed-off-by: Ryan Harper ry...@us.ibm.com --- blockdev.c | 39 +++ blockdev.h |1 + hmp-commands.hx | 18 ++ 3 files changed, 58 insertions(+), 0 deletions(-) diff --git a/blockdev.c b/blockdev.c index 6cb179a..f6ac439 100644 --- a/blockdev.c +++ b/blockdev.c @@ -14,6 +14,8 @@ #include qemu-option.h #include qemu-config.h #include sysemu.h +#include hw/qdev.h +#include block_int.h static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives); @@ -597,3 +599,40 @@ int do_change_block(Monitor *mon, const char *device, } return monitor_read_bdrv_key_start(mon, bs, NULL, NULL); } + +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data) +{ +const char *id = qdict_get_str(qdict, id); +BlockDriverState *bs; +BlockDriverState **ptr; +Property *prop; + +bs = bdrv_find(id); +if (!bs) { +qerror_report(QERR_DEVICE_NOT_FOUND, id); +return -1; +} + +/* quiesce block driver; prevent further io */ +qemu_aio_flush(); +bdrv_flush(bs); +bdrv_close(bs); + +/* clean up guest state from pointing to host resource by + * finding and removing DeviceState drive property */ +for (prop = bs-peer-info-props; prop prop-name; prop++) { +if (prop-info-type == PROP_TYPE_DRIVE) { +ptr = qdev_get_prop_ptr(bs-peer, prop); +if ((*ptr) == bs) { +bdrv_detach(bs, bs-peer); +*ptr = NULL; +break; +} +} +} + +/* clean up host side */ +drive_uninit(drive_get_by_blockdev(bs)); + +return 0; +} diff --git a/blockdev.h b/blockdev.h index 653affc..2a0559e 100644 --- a/blockdev.h +++ b/blockdev.h @@ -51,5 +51,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_change_block(Monitor *mon, const char *device, const char *filename, const char *fmt); +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data); #endif diff --git a/hmp-commands.hx b/hmp-commands.hx index e5585ba..c8b0206 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -68,6 +68,24 @@ Eject a removable medium (use -f to force it). ETEXI { +.name = drive_del, +.args_type = id:s, +.params = device, +.help = remove host block device, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_drive_del, +}, + +STEXI +...@item drive_del @var{device} +...@findex drive_del +Remove host block device. The result is that guest generated IO is no longer +submitted against the host device underlying the disk. Once a drive has +been deleted, the QEMU Block layer returns -EIO which results in IO +errors in the guest for applications that are reading/writing to the device. +ETEXI + +{ .name = change, .args_type = device:B,target:F,arg:s?, .params = device filename [format], -- 1.6.3.3
Re: [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()
* Markus Armbruster arm...@redhat.com [2010-11-11 04:48]: Ryan Harper ry...@us.ibm.com writes: * Markus Armbruster arm...@redhat.com [2010-11-10 11:40]: Ryan Harper ry...@us.ibm.com writes: * Markus Armbruster arm...@redhat.com [2010-11-10 06:48]: One real question, and a couple of nits. Ryan Harper ry...@us.ibm.com writes: Block hot unplug is racy since the guest is required to acknowlege the ACPI unplug event; this may not happen synchronously with the device removal command Well, I wouldn't call unplug racy. It just takes an unpredictable length of time, possibly forever. To make a race, you need to throw in a client assuming (incorrectly) that unplug is instantaneous, as described in your next paragraph. Moreover, all PCI unplug is that way, not just block. This series aims to close a gap where by mgmt applications that assume the block resource has been removed without confirming that the guest has acknowledged the removal may re-assign the underlying device to a second guest leading to data leakage. Yes, the incorrect assumption is a problem. But with that fixed (in the management application), we run right into the next problem: there is no way for the management application to reliably disconnect the guest from a block device. And that's the problem you're fixing. Yeah, that's the right way to word it; providing a method to forcibly disconnect the guest from the host device. This series introduces a new montor command to decouple asynchornous device Typos montor and asynchornous. You might want to use a spell checker :) Lines are a bit long. Recommend wrap at column 70. removal from restricting guest access to a block device. We do this by creating a new monitor command drive_del which maps to a bdrv_unplug() command which does a qemu_aio_flush; bdrv_flush() and bdrv_close(). Once complete, subsequent IO is rejected from the device and the guest will get IO errors but continue to function. In addition to preventing further IO, we clean up state pointers between host (BlockDriverState) and guest (DeviceInfo). A subsequent device removal command can be issued to remove the device, to which the guest may or maynot respond, but as long as the unplugged bit is set, no IO maynot is not a word. will be sumbitted. This suggests to drive_del before device_del, which makes the device goes through a broken device state on its way to unplug. If the guest accesses the device in that state, it gets I/O errors. Not nice. Instead, I'd recommend device_del, wait for the device to go away, drive_del on time out. If the guest reacts to the ACPI unplug promptly, it's never exposed to the broken device state. Note: if the drive_del fails because the device doesn't exist, we lost the race with the automatic destruction, which is harmless. Ignore that error. Honestly, other than describing what happens if you sever the connection when the guest isn't aware of it; I don't want to try to capture how the mgmt layer implements the removal. One may want to force the disconnect before attempting to remove the device; or the other way around; that's really the mgmt layer's call. Fair enough. Signed-off-by: Ryan Harper ry...@us.ibm.com --- block.c |7 +++ block.h |1 + blockdev.c | 36 blockdev.h |1 + hmp-commands.hx | 18 ++ 5 files changed, 63 insertions(+), 0 deletions(-) diff --git a/block.c b/block.c index 6b505fb..c76a796 100644 --- a/block.c +++ b/block.c @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable) } } +void bdrv_unplug(BlockDriverState *bs) +{ +qemu_aio_flush(); +bdrv_flush(bs); +bdrv_close(bs); +} + Unless we expect more users, I'd inline this into its only caller. Matter of taste. Works for me. int bdrv_is_removable(BlockDriverState *bs) { return bs-removable; diff --git a/block.h b/block.h index 78ecfac..581414c 100644 --- a/block.h +++ b/block.h @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, BlockErrorAction on_read_error, BlockErrorAction on_write_error); BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read); void bdrv_set_removable(BlockDriverState *bs, int removable); +void bdrv_unplug(BlockDriverState *bs); int bdrv_is_removable(BlockDriverState *bs); int bdrv_is_read_only(BlockDriverState *bs); int bdrv_is_sg(BlockDriverState *bs); diff --git a/blockdev.c b/blockdev.c
Re: [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()
* Markus Armbruster arm...@redhat.com [2010-11-10 06:48]: One real question, and a couple of nits. Ryan Harper ry...@us.ibm.com writes: Block hot unplug is racy since the guest is required to acknowlege the ACPI unplug event; this may not happen synchronously with the device removal command Well, I wouldn't call unplug racy. It just takes an unpredictable length of time, possibly forever. To make a race, you need to throw in a client assuming (incorrectly) that unplug is instantaneous, as described in your next paragraph. Moreover, all PCI unplug is that way, not just block. This series aims to close a gap where by mgmt applications that assume the block resource has been removed without confirming that the guest has acknowledged the removal may re-assign the underlying device to a second guest leading to data leakage. Yes, the incorrect assumption is a problem. But with that fixed (in the management application), we run right into the next problem: there is no way for the management application to reliably disconnect the guest from a block device. And that's the problem you're fixing. Yeah, that's the right way to word it; providing a method to forcibly disconnect the guest from the host device. This series introduces a new montor command to decouple asynchornous device Typos montor and asynchornous. You might want to use a spell checker :) Lines are a bit long. Recommend wrap at column 70. removal from restricting guest access to a block device. We do this by creating a new monitor command drive_del which maps to a bdrv_unplug() command which does a qemu_aio_flush; bdrv_flush() and bdrv_close(). Once complete, subsequent IO is rejected from the device and the guest will get IO errors but continue to function. In addition to preventing further IO, we clean up state pointers between host (BlockDriverState) and guest (DeviceInfo). A subsequent device removal command can be issued to remove the device, to which the guest may or maynot respond, but as long as the unplugged bit is set, no IO maynot is not a word. will be sumbitted. This suggests to drive_del before device_del, which makes the device goes through a broken device state on its way to unplug. If the guest accesses the device in that state, it gets I/O errors. Not nice. Instead, I'd recommend device_del, wait for the device to go away, drive_del on time out. If the guest reacts to the ACPI unplug promptly, it's never exposed to the broken device state. Note: if the drive_del fails because the device doesn't exist, we lost the race with the automatic destruction, which is harmless. Ignore that error. Honestly, other than describing what happens if you sever the connection when the guest isn't aware of it; I don't want to try to capture how the mgmt layer implements the removal. One may want to force the disconnect before attempting to remove the device; or the other way around; that's really the mgmt layer's call. Signed-off-by: Ryan Harper ry...@us.ibm.com --- block.c |7 +++ block.h |1 + blockdev.c | 36 blockdev.h |1 + hmp-commands.hx | 18 ++ 5 files changed, 63 insertions(+), 0 deletions(-) diff --git a/block.c b/block.c index 6b505fb..c76a796 100644 --- a/block.c +++ b/block.c @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable) } } +void bdrv_unplug(BlockDriverState *bs) +{ +qemu_aio_flush(); +bdrv_flush(bs); +bdrv_close(bs); +} + Unless we expect more users, I'd inline this into its only caller. Matter of taste. Works for me. int bdrv_is_removable(BlockDriverState *bs) { return bs-removable; diff --git a/block.h b/block.h index 78ecfac..581414c 100644 --- a/block.h +++ b/block.h @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, BlockErrorAction on_read_error, BlockErrorAction on_write_error); BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read); void bdrv_set_removable(BlockDriverState *bs, int removable); +void bdrv_unplug(BlockDriverState *bs); int bdrv_is_removable(BlockDriverState *bs); int bdrv_is_read_only(BlockDriverState *bs); int bdrv_is_sg(BlockDriverState *bs); diff --git a/blockdev.c b/blockdev.c index 6cb179a..ee8c2ec 100644 --- a/blockdev.c +++ b/blockdev.c @@ -14,6 +14,8 @@ #include qemu-option.h #include qemu-config.h #include sysemu.h +#include hw/qdev.h +#include block_int.h static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives); @@ -597,3 +599,37 @@ int do_change_block(Monitor *mon, const char *device, } return monitor_read_bdrv_key_start(mon, bs, NULL, NULL); } + +int do_drive_del(Monitor *mon, const
Re: [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()
* Markus Armbruster arm...@redhat.com [2010-11-10 11:40]: Ryan Harper ry...@us.ibm.com writes: * Markus Armbruster arm...@redhat.com [2010-11-10 06:48]: One real question, and a couple of nits. Ryan Harper ry...@us.ibm.com writes: Block hot unplug is racy since the guest is required to acknowlege the ACPI unplug event; this may not happen synchronously with the device removal command Well, I wouldn't call unplug racy. It just takes an unpredictable length of time, possibly forever. To make a race, you need to throw in a client assuming (incorrectly) that unplug is instantaneous, as described in your next paragraph. Moreover, all PCI unplug is that way, not just block. This series aims to close a gap where by mgmt applications that assume the block resource has been removed without confirming that the guest has acknowledged the removal may re-assign the underlying device to a second guest leading to data leakage. Yes, the incorrect assumption is a problem. But with that fixed (in the management application), we run right into the next problem: there is no way for the management application to reliably disconnect the guest from a block device. And that's the problem you're fixing. Yeah, that's the right way to word it; providing a method to forcibly disconnect the guest from the host device. This series introduces a new montor command to decouple asynchornous device Typos montor and asynchornous. You might want to use a spell checker :) Lines are a bit long. Recommend wrap at column 70. removal from restricting guest access to a block device. We do this by creating a new monitor command drive_del which maps to a bdrv_unplug() command which does a qemu_aio_flush; bdrv_flush() and bdrv_close(). Once complete, subsequent IO is rejected from the device and the guest will get IO errors but continue to function. In addition to preventing further IO, we clean up state pointers between host (BlockDriverState) and guest (DeviceInfo). A subsequent device removal command can be issued to remove the device, to which the guest may or maynot respond, but as long as the unplugged bit is set, no IO maynot is not a word. will be sumbitted. This suggests to drive_del before device_del, which makes the device goes through a broken device state on its way to unplug. If the guest accesses the device in that state, it gets I/O errors. Not nice. Instead, I'd recommend device_del, wait for the device to go away, drive_del on time out. If the guest reacts to the ACPI unplug promptly, it's never exposed to the broken device state. Note: if the drive_del fails because the device doesn't exist, we lost the race with the automatic destruction, which is harmless. Ignore that error. Honestly, other than describing what happens if you sever the connection when the guest isn't aware of it; I don't want to try to capture how the mgmt layer implements the removal. One may want to force the disconnect before attempting to remove the device; or the other way around; that's really the mgmt layer's call. Fair enough. Signed-off-by: Ryan Harper ry...@us.ibm.com --- block.c |7 +++ block.h |1 + blockdev.c | 36 blockdev.h |1 + hmp-commands.hx | 18 ++ 5 files changed, 63 insertions(+), 0 deletions(-) diff --git a/block.c b/block.c index 6b505fb..c76a796 100644 --- a/block.c +++ b/block.c @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable) } } +void bdrv_unplug(BlockDriverState *bs) +{ +qemu_aio_flush(); +bdrv_flush(bs); +bdrv_close(bs); +} + Unless we expect more users, I'd inline this into its only caller. Matter of taste. Works for me. int bdrv_is_removable(BlockDriverState *bs) { return bs-removable; diff --git a/block.h b/block.h index 78ecfac..581414c 100644 --- a/block.h +++ b/block.h @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, BlockErrorAction on_read_error, BlockErrorAction on_write_error); BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read); void bdrv_set_removable(BlockDriverState *bs, int removable); +void bdrv_unplug(BlockDriverState *bs); int bdrv_is_removable(BlockDriverState *bs); int bdrv_is_read_only(BlockDriverState *bs); int bdrv_is_sg(BlockDriverState *bs); diff --git a/blockdev.c b/blockdev.c index 6cb179a..ee8c2ec 100644 --- a/blockdev.c +++ b/blockdev.c @@ -14,6 +14,8 @@ #include qemu-option.h #include qemu-config.h #include sysemu.h +#include hw/qdev.h +#include block_int.h static
Re: [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()
* Markus Armbruster arm...@redhat.com [2010-11-10 11:40]: Ryan Harper ry...@us.ibm.com writes: * Markus Armbruster arm...@redhat.com [2010-11-10 06:48]: One real question, and a couple of nits. Ryan Harper ry...@us.ibm.com writes: Block hot unplug is racy since the guest is required to acknowlege the ACPI unplug event; this may not happen synchronously with the device removal command Well, I wouldn't call unplug racy. It just takes an unpredictable length of time, possibly forever. To make a race, you need to throw in a client assuming (incorrectly) that unplug is instantaneous, as described in your next paragraph. Moreover, all PCI unplug is that way, not just block. This series aims to close a gap where by mgmt applications that assume the block resource has been removed without confirming that the guest has acknowledged the removal may re-assign the underlying device to a second guest leading to data leakage. Yes, the incorrect assumption is a problem. But with that fixed (in the management application), we run right into the next problem: there is no way for the management application to reliably disconnect the guest from a block device. And that's the problem you're fixing. Yeah, that's the right way to word it; providing a method to forcibly disconnect the guest from the host device. This series introduces a new montor command to decouple asynchornous device Typos montor and asynchornous. You might want to use a spell checker :) Lines are a bit long. Recommend wrap at column 70. removal from restricting guest access to a block device. We do this by creating a new monitor command drive_del which maps to a bdrv_unplug() command which does a qemu_aio_flush; bdrv_flush() and bdrv_close(). Once complete, subsequent IO is rejected from the device and the guest will get IO errors but continue to function. In addition to preventing further IO, we clean up state pointers between host (BlockDriverState) and guest (DeviceInfo). A subsequent device removal command can be issued to remove the device, to which the guest may or maynot respond, but as long as the unplugged bit is set, no IO maynot is not a word. will be sumbitted. This suggests to drive_del before device_del, which makes the device goes through a broken device state on its way to unplug. If the guest accesses the device in that state, it gets I/O errors. Not nice. Instead, I'd recommend device_del, wait for the device to go away, drive_del on time out. If the guest reacts to the ACPI unplug promptly, it's never exposed to the broken device state. Note: if the drive_del fails because the device doesn't exist, we lost the race with the automatic destruction, which is harmless. Ignore that error. Honestly, other than describing what happens if you sever the connection when the guest isn't aware of it; I don't want to try to capture how the mgmt layer implements the removal. One may want to force the disconnect before attempting to remove the device; or the other way around; that's really the mgmt layer's call. Fair enough. Signed-off-by: Ryan Harper ry...@us.ibm.com --- block.c |7 +++ block.h |1 + blockdev.c | 36 blockdev.h |1 + hmp-commands.hx | 18 ++ 5 files changed, 63 insertions(+), 0 deletions(-) diff --git a/block.c b/block.c index 6b505fb..c76a796 100644 --- a/block.c +++ b/block.c @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable) } } +void bdrv_unplug(BlockDriverState *bs) +{ +qemu_aio_flush(); +bdrv_flush(bs); +bdrv_close(bs); +} + Unless we expect more users, I'd inline this into its only caller. Matter of taste. Works for me. int bdrv_is_removable(BlockDriverState *bs) { return bs-removable; diff --git a/block.h b/block.h index 78ecfac..581414c 100644 --- a/block.h +++ b/block.h @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, BlockErrorAction on_read_error, BlockErrorAction on_write_error); BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read); void bdrv_set_removable(BlockDriverState *bs, int removable); +void bdrv_unplug(BlockDriverState *bs); int bdrv_is_removable(BlockDriverState *bs); int bdrv_is_read_only(BlockDriverState *bs); int bdrv_is_sg(BlockDriverState *bs); diff --git a/blockdev.c b/blockdev.c index 6cb179a..ee8c2ec 100644 --- a/blockdev.c +++ b/blockdev.c @@ -14,6 +14,8 @@ #include qemu-option.h #include qemu-config.h #include sysemu.h +#include hw/qdev.h +#include block_int.h static
Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
* Markus Armbruster arm...@redhat.com [2010-11-08 06:04]: Michael S. Tsirkin m...@redhat.com writes: On Mon, Nov 08, 2010 at 11:32:01AM +0100, Markus Armbruster wrote: Ryan Harper ry...@us.ibm.com writes: * Markus Armbruster arm...@redhat.com [2010-11-06 04:19]: Ryan Harper ry...@us.ibm.com writes: * Markus Armbruster arm...@redhat.com [2010-11-05 11:11]: Ryan Harper ry...@us.ibm.com writes: * Markus Armbruster arm...@redhat.com [2010-11-05 08:28]: I'd be fine with any of these: 1. A new command device_disconnet ID (or similar name) to disconnect device ID from any host parts. Nice touch: you don't have to know about the device's host part(s) to disconnect it. But it might be more work than the other two. This is sort of what netdev_del() and drive_unplug() are today; we're just saying sever the connection of this device id. No, I have netdev_del as (3). All three options are sort of the same, just different commands with a common purpose. I'd like to rename drive_unplug() to blockdev_del() and call it done. I was looking at libvirt and the right call to netdev_del is already in-place; I'd just need to re-spin my block patch to call blockdev_del() after invoking device_del() to match what is done for net. Unless I'm missing something, you can't just rename: your unplug does not delete the host part. 2. New commands netdev_disconnect, drive_disconnect (or similar names) to disconnect a host part from a guest device. Like (1), except you have to point to the other end of the connection to cut it. What's the advantage here? We need an additional piece of info (host part) in addition to the device id? That's a disadvantage. Possible advantage: implementation could be slightly easier than (1), because you don't have to find the host parts. 3. A new command drive_del ID similar to existing netdev_del. This is (2) fused with delete. Conceptual wart: you can't disconnect and keep the host part around. Moreover, delete is slightly dangerous, because it renders any guest device still using the host part useless. Hrm, I thought that's what (1) is. No. With (1), the argument is a *device* ID, and we disconnect *all* host parts connected to this device (typically just one). With (3), the argument is a netdev/drive ID, and disconnect *this* host part from the peer device. Well, either (1) or (3); I'd like to rename drive_unplug() to blockdev_del() since they're similar function w.r.t removing access to the host resource. And we can invoke them in the same way from libvirt (after doing guest notification, remove access). I'd call it drive_del for now, to match drive_add. OK, drive_del() and as you mentioned, drive_unplug will take out the block driver, but doesn't remove the dinfo object; that ends up dying when we call the device destructor. I think for symmetry we'll want drive_del to remove the dinfo object as well. Exactly. a. bdrv_detach() to zap the pointer from bdrv to qdev b. zap the pointer from qdev to bdrv c. drive_uninit() to dispose of the host part a-c need to be done to match netdev_del symmetry? How hard of a req is this? Without (c), it's not a delete. And (c) without (b) leaves a dangling pointer. (c) without (a) fails an assertion in bdrv_delete(). Aside: (b) should probably be folded into bdrv_detach(). Step b could be awkward with (3), because you don't know device details. I guess you have to search device properties for a drive property pointing to bdrv. I like (1) because it puts that loop in the one place where it belongs: qdev core. (3) duplicates it in every HOSTDEV_del. Except for netdev_del, which is special because of VLANs. To avoid step b, you could try to keep the bdrv around in a special zombie state. Still have to free the dinfo, but can't use drive_uninit() for that then. If you think I'm overcomplicating this, feel free to prove me wrong with working code :) drive_unplug() works as-is today; so it does feel very combursome at this point. Other than the name change and agreement on how mgmt should invoke the command, it's been a long ride to get here. Sometimes it takes a tough man to make a tender chicken. I'll take my best shot at trying to clean up the other pointers and objects; though on one of my attempts when I took out the dinfo() object that didn't go so well; going to have to audit who uses dinfo and where and what they check before calling it to have
Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
* Michael S. Tsirkin m...@redhat.com [2010-11-08 10:57]: On Mon, Nov 08, 2010 at 08:02:50AM -0600, Ryan Harper wrote: * Markus Armbruster arm...@redhat.com [2010-11-08 06:04]: Michael S. Tsirkin m...@redhat.com writes: On Mon, Nov 08, 2010 at 11:32:01AM +0100, Markus Armbruster wrote: Ryan Harper ry...@us.ibm.com writes: * Markus Armbruster arm...@redhat.com [2010-11-06 04:19]: Ryan Harper ry...@us.ibm.com writes: * Markus Armbruster arm...@redhat.com [2010-11-05 11:11]: Ryan Harper ry...@us.ibm.com writes: * Markus Armbruster arm...@redhat.com [2010-11-05 08:28]: I'd be fine with any of these: 1. A new command device_disconnet ID (or similar name) to disconnect device ID from any host parts. Nice touch: you don't have to know about the device's host part(s) to disconnect it. But it might be more work than the other two. This is sort of what netdev_del() and drive_unplug() are today; we're just saying sever the connection of this device id. No, I have netdev_del as (3). All three options are sort of the same, just different commands with a common purpose. I'd like to rename drive_unplug() to blockdev_del() and call it done. I was looking at libvirt and the right call to netdev_del is already in-place; I'd just need to re-spin my block patch to call blockdev_del() after invoking device_del() to match what is done for net. Unless I'm missing something, you can't just rename: your unplug does not delete the host part. 2. New commands netdev_disconnect, drive_disconnect (or similar names) to disconnect a host part from a guest device. Like (1), except you have to point to the other end of the connection to cut it. What's the advantage here? We need an additional piece of info (host part) in addition to the device id? That's a disadvantage. Possible advantage: implementation could be slightly easier than (1), because you don't have to find the host parts. 3. A new command drive_del ID similar to existing netdev_del. This is (2) fused with delete. Conceptual wart: you can't disconnect and keep the host part around. Moreover, delete is slightly dangerous, because it renders any guest device still using the host part useless. Hrm, I thought that's what (1) is. No. With (1), the argument is a *device* ID, and we disconnect *all* host parts connected to this device (typically just one). With (3), the argument is a netdev/drive ID, and disconnect *this* host part from the peer device. Well, either (1) or (3); I'd like to rename drive_unplug() to blockdev_del() since they're similar function w.r.t removing access to the host resource. And we can invoke them in the same way from libvirt (after doing guest notification, remove access). I'd call it drive_del for now, to match drive_add. OK, drive_del() and as you mentioned, drive_unplug will take out the block driver, but doesn't remove the dinfo object; that ends up dying when we call the device destructor. I think for symmetry we'll want drive_del to remove the dinfo object as well. Exactly. a. bdrv_detach() to zap the pointer from bdrv to qdev b. zap the pointer from qdev to bdrv c. drive_uninit() to dispose of the host part a-c need to be done to match netdev_del symmetry? How hard of a req is this? Without (c), it's not a delete. And (c) without (b) leaves a dangling pointer. (c) without (a) fails an assertion in bdrv_delete(). Aside: (b) should probably be folded into bdrv_detach(). Step b could be awkward with (3), because you don't know device details. I guess you have to search device properties for a drive property pointing to bdrv. I like (1) because it puts that loop in the one place where it belongs: qdev core. (3) duplicates it in every HOSTDEV_del. Except for netdev_del, which is special because of VLANs. To avoid step b, you could try to keep the bdrv around in a special zombie state. Still have to free the dinfo, but can't use drive_uninit() for that then. If you think I'm overcomplicating this, feel free to prove me wrong with working code :) drive_unplug() works as-is today; so it does feel very combursome at this point
Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
* Daniel P. Berrange berra...@redhat.com [2010-11-08 11:05]: On Mon, Nov 08, 2010 at 06:56:02PM +0200, Michael S. Tsirkin wrote: On Mon, Nov 08, 2010 at 08:02:50AM -0600, Ryan Harper wrote: * Markus Armbruster arm...@redhat.com [2010-11-08 06:04]: Michael S. Tsirkin m...@redhat.com writes: Here's how the various objects are connected to each other: contains drivelist--- DriveInfo | | .bdrv | .id == .bdrv-device_name | contains V bdrv_states --- BlockDriverState | ^ .peer | | | | host part -|---|--- | | guest part | | property drive v | DeviceState To disconnect host from guest part, you need to cut both pointers. To delete the host part, you need to delete both objects, BlockDriverState and DriveInfo. If we remove DriveInfo, how can management later detect that guest part was deleted? Directly: check whether the qdev is gone. I don't know how to check that indirectly, via DriveInfo. If you want symmetry with netdev, it's possible to keep a shell of BlockDriverState/DriveInfo around (solving dangling pointer problems). netdev_del deletes the host network part: (qemu) info network Devices not on any VLAN: net.0: net=10.0.2.0, restricted=n peer=nic.0 nic.0: model=virtio-net-pci,macaddr=52:54:00:12:34:56 peer=net.0 (qemu) netdev_del net.0 (qemu) info network Devices not on any VLAN: nic.0: model=virtio-net-pci,macaddr=52:54:00:12:34:56 peer=net.0 It leaves around the VLAN object. Since qdev property points to that, it doesn't dangle. In my opinion, drive_del should make the drive vanish from info block, Yeah; that's the right thing to do here. Let me respin the patch with the name change and the additional work to fix up the pointers and ensure that we don't see the drive in info block. Daniel, I'd like your input here: can you live with device diappearing from info block and parsing qdev tree info to figure out whether device is really gone? We don't use info block for anything. Having to parse the full qdev tree to determine if a single device is gone seems rather tedious. It would be better if query-qdev took an optional argument, which is the name of the device to root the tree at. Then checking whether a device named 'foo' is gone just means running 'query-qdev foo' and seeing if that returns an error about the device not existing, then we know it has gone. No need to parse anything. Being able to query the qdev data for a single device, or sub-tree of devices seems useful in its own right. Since I'm not looking forward to parsing info block (easy) nor parsing all of qdev tree (much harder) I really like the query approach. That makes it easy to put a query in the netdev_del/drive_del commands to skip invoking them if the guest has already responded. Regards, Daniel -- |: Red Hat, Engineering, London-o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org-o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
[Qemu-devel] [PATCH 2/2] Add qmp version of drive_del
Signed-off-by: Ryan Harper ry...@us.ibm.com --- qmp-commands.hx | 29 + 1 files changed, 29 insertions(+), 0 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index 793cf1c..1e0d4e9 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -338,6 +338,35 @@ Example: EQMP { +.name = drive_del, +.args_type = id:s, +.params = device, +.help = remove host block device, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_drive_del, +}, + +SQMP +drive del +-- + +Remove host block device. The result is that guest generated IO is no longer +submitted against the host device underlying the disk. Once a drive has +been deleted, the QEMU Block layer returns -EIO which results in IO +errors in the guest for applications that are reading/writing to the device. + +Arguments: + +- id: the device's ID (json-string) + +Example: + +- { execute: drive_del, arguments: { id: drive-virtio-blk1 } } +- { return: {} } + +EQMP + +{ .name = cpu, .args_type = index:i, .params = index, -- 1.6.3.3
[Qemu-devel] [PATCH 0/2] v6 Decouple block device removal from device removal
After *much* discussion, here's version 6. This patch series decouples the detachment of a block device from the removal of the backing pci-device. Removal of a hotplugged pci device requires the guest to respond before qemu tears down the block device. In some cases, the guest may not respond leaving the guest with continued access to the block device. The new monitor command, drive_del, will revoke a guests access to the block device independently of the removal of the pci device. The first patch implements drive_del and bdrv_unplug, the second patch implements the qmp version of the monitor command. Changes since v5: - Removed dangling pointers in guest and host state. This ensures things like info block no longer displays the deleted drive; though info pci will continue to display the pci device until the guest responds to the removal request. - Renamed drive_unplug - drive_del Changes since v4: - Droppped drive_get_by_id patch and use bdrv_find() instead - Added additional details about drive_unplug to hmp/qmp interface Changes since v3: - Moved QMP command for drive_unplug() to separate patch Changes since v2: - Added QMP command for drive_unplug() Changes since v1: - CodingStyle fixes - Added qemu_aio_flush() to bdrv_unplug() Signed-off-by: Ryan Harper ry...@us.ibm.com
[Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()
Block hot unplug is racy since the guest is required to acknowlege the ACPI unplug event; this may not happen synchronously with the device removal command This series aims to close a gap where by mgmt applications that assume the block resource has been removed without confirming that the guest has acknowledged the removal may re-assign the underlying device to a second guest leading to data leakage. This series introduces a new montor command to decouple asynchornous device removal from restricting guest access to a block device. We do this by creating a new monitor command drive_del which maps to a bdrv_unplug() command which does a qemu_aio_flush; bdrv_flush() and bdrv_close(). Once complete, subsequent IO is rejected from the device and the guest will get IO errors but continue to function. In addition to preventing further IO, we clean up state pointers between host (BlockDriverState) and guest (DeviceInfo). A subsequent device removal command can be issued to remove the device, to which the guest may or maynot respond, but as long as the unplugged bit is set, no IO will be sumbitted. Signed-off-by: Ryan Harper ry...@us.ibm.com --- block.c |7 +++ block.h |1 + blockdev.c | 36 blockdev.h |1 + hmp-commands.hx | 18 ++ 5 files changed, 63 insertions(+), 0 deletions(-) diff --git a/block.c b/block.c index 6b505fb..c76a796 100644 --- a/block.c +++ b/block.c @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable) } } +void bdrv_unplug(BlockDriverState *bs) +{ +qemu_aio_flush(); +bdrv_flush(bs); +bdrv_close(bs); +} + int bdrv_is_removable(BlockDriverState *bs) { return bs-removable; diff --git a/block.h b/block.h index 78ecfac..581414c 100644 --- a/block.h +++ b/block.h @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, BlockErrorAction on_read_error, BlockErrorAction on_write_error); BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read); void bdrv_set_removable(BlockDriverState *bs, int removable); +void bdrv_unplug(BlockDriverState *bs); int bdrv_is_removable(BlockDriverState *bs); int bdrv_is_read_only(BlockDriverState *bs); int bdrv_is_sg(BlockDriverState *bs); diff --git a/blockdev.c b/blockdev.c index 6cb179a..ee8c2ec 100644 --- a/blockdev.c +++ b/blockdev.c @@ -14,6 +14,8 @@ #include qemu-option.h #include qemu-config.h #include sysemu.h +#include hw/qdev.h +#include block_int.h static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives); @@ -597,3 +599,37 @@ int do_change_block(Monitor *mon, const char *device, } return monitor_read_bdrv_key_start(mon, bs, NULL, NULL); } + +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data) +{ +const char *id = qdict_get_str(qdict, id); +BlockDriverState *bs; +Property *prop; + +bs = bdrv_find(id); +if (!bs) { +qerror_report(QERR_DEVICE_NOT_FOUND, id); +return -1; +} + +/* quiesce block driver; prevent further io */ +bdrv_unplug(bs); + +/* clean up guest state from pointing to host resource by + * finding and removing DeviceState drive property */ +for (prop = bs-peer-info-props; prop prop-name; prop++) { +if ((prop-info-type == PROP_TYPE_DRIVE) +(*(BlockDriverState **)qdev_get_prop_ptr(bs-peer, prop) == bs)) { +if (prop-info-free) { +prop-info-free(bs-peer, prop); +} +} +} + +/* clean up host state pointing to guest resource by removing + * pointers to guest device in the BlockDriverState */ +bdrv_delete(bs); + +return 0; +} + diff --git a/blockdev.h b/blockdev.h index 653affc..2a0559e 100644 --- a/blockdev.h +++ b/blockdev.h @@ -51,5 +51,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_change_block(Monitor *mon, const char *device, const char *filename, const char *fmt); +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data); #endif diff --git a/hmp-commands.hx b/hmp-commands.hx index e5585ba..d6dc18c 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -68,6 +68,24 @@ Eject a removable medium (use -f to force it). ETEXI { +.name = drive_del, +.args_type = id:s, +.params = device, +.help = remove host block device, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_drive_del, +}, + +STEXI +...@item delete @var{device} +...@findex delete +Remove host block device. The result is that guest generated IO is no longer +submitted against the host device underlying the disk. Once a drive has +been deleted, the QEMU Block layer returns -EIO which results in IO +errors in the guest for applications
Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
* Markus Armbruster arm...@redhat.com [2010-11-06 04:19]: Ryan Harper ry...@us.ibm.com writes: * Markus Armbruster arm...@redhat.com [2010-11-05 11:11]: Ryan Harper ry...@us.ibm.com writes: * Markus Armbruster arm...@redhat.com [2010-11-05 08:28]: I'd be fine with any of these: 1. A new command device_disconnet ID (or similar name) to disconnect device ID from any host parts. Nice touch: you don't have to know about the device's host part(s) to disconnect it. But it might be more work than the other two. This is sort of what netdev_del() and drive_unplug() are today; we're just saying sever the connection of this device id. No, I have netdev_del as (3). All three options are sort of the same, just different commands with a common purpose. I'd like to rename drive_unplug() to blockdev_del() and call it done. I was looking at libvirt and the right call to netdev_del is already in-place; I'd just need to re-spin my block patch to call blockdev_del() after invoking device_del() to match what is done for net. Unless I'm missing something, you can't just rename: your unplug does not delete the host part. 2. New commands netdev_disconnect, drive_disconnect (or similar names) to disconnect a host part from a guest device. Like (1), except you have to point to the other end of the connection to cut it. What's the advantage here? We need an additional piece of info (host part) in addition to the device id? That's a disadvantage. Possible advantage: implementation could be slightly easier than (1), because you don't have to find the host parts. 3. A new command drive_del ID similar to existing netdev_del. This is (2) fused with delete. Conceptual wart: you can't disconnect and keep the host part around. Moreover, delete is slightly dangerous, because it renders any guest device still using the host part useless. Hrm, I thought that's what (1) is. No. With (1), the argument is a *device* ID, and we disconnect *all* host parts connected to this device (typically just one). With (3), the argument is a netdev/drive ID, and disconnect *this* host part from the peer device. Well, either (1) or (3); I'd like to rename drive_unplug() to blockdev_del() since they're similar function w.r.t removing access to the host resource. And we can invoke them in the same way from libvirt (after doing guest notification, remove access). I'd call it drive_del for now, to match drive_add. OK, drive_del() and as you mentioned, drive_unplug will take out the block driver, but doesn't remove the dinfo object; that ends up dying when we call the device destructor. I think for symmetry we'll want drive_del to remove the dinfo object as well. Exactly. a. bdrv_detach() to zap the pointer from bdrv to qdev b. zap the pointer from qdev to bdrv c. drive_uninit() to dispose of the host part a-c need to be done to match netdev_del symmetry? How hard of a req is this? Step b could be awkward with (3), because you don't know device details. I guess you have to search device properties for a drive property pointing to bdrv. I like (1) because it puts that loop in the one place where it belongs: qdev core. (3) duplicates it in every HOSTDEV_del. Except for netdev_del, which is special because of VLANs. To avoid step b, you could try to keep the bdrv around in a special zombie state. Still have to free the dinfo, but can't use drive_uninit() for that then. If you think I'm overcomplicating this, feel free to prove me wrong with working code :) drive_unplug() works as-is today; so it does feel very combursome at this point. Other than the name change and agreement on how mgmt should invoke the command, it's been a long ride to get here. I'll take my best shot at trying to clean up the other pointers and objects; though on one of my attempts when I took out the dinfo() object that didn't go so well; going to have to audit who uses dinfo and where and what they check before calling it to have a proper cleanup that doesn't remove the whole device altogether. -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
* Michael S. Tsirkin m...@redhat.com [2010-11-05 09:18]: On Fri, Nov 05, 2010 at 02:27:49PM +0100, Markus Armbruster wrote: Ryan Harper ry...@us.ibm.com writes: * Michael S. Tsirkin m...@redhat.com [2010-11-03 16:46]: On Wed, Nov 03, 2010 at 03:59:29PM -0500, Ryan Harper wrote: * Michael S. Tsirkin m...@redhat.com [2010-11-03 13:03]: On Wed, Nov 03, 2010 at 12:29:10PM -0500, Ryan Harper wrote: * Markus Armbruster arm...@redhat.com [2010-11-03 11:42]: Ryan Harper ry...@us.ibm.com writes: * Michael S. Tsirkin m...@redhat.com [2010-11-03 02:22]: On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote: * Michael S. Tsirkin m...@redhat.com [2010-11-02 14:18]: On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote: I like the idea of disconnect; if part of the device_del method was to invoke a disconnect method, we could implement that for block, net, etc; I'd think we'd want to send the notification, then disconnect. Struggling with whether it's worth having some reasonable timeout between notification and disconnect. The problem with this is that it has no analog in real world. In real world, you can send some notifications to the guest, and you can remove the card. Tying them together is what created the problem in the first place. Timeouts can be implemented by management, maybe with a nice dialog being shown to the user. Very true. I'm fine with forcing a disconnect during the removal path prior to notification. Do we want a new disconnect method at the device level (pci)? or just use the existing removal callback and call that during the initial hotremov event? Not sure what you mean by that, but I don't see a device doing anything differently wrt surprise or ordered removal. So probably the existing callback should do. I don't think we need to talk about disconnect: since we decided we are emulating device removal, let's call it just that. Because current the removal process depends on the guest actually responding. What I'm suggesting is that, in Marcus's term, and what drive_unplug() implements, is to disconnect the host block device from the guest device to prevent any further access to it in the case the guest doesn't respond to the removal request made via ACPI. Very specifically, what we're suggesting instead of the drive_unplug() command so to complete the device removal operation without waiting for the guest to respond; that's what's going to happen if we invoke the response callback; it will appear as if the guest responded whether it did or not. What I was suggesting above was to instead of calling the callback for handing the guest response was to add a device function called disconnect which would remove any association of host resources from guest resources before we notified the guest. Thinking about it again I'm not sure this is useful, but if we're going to remove the device without the guests knowledge, I'm not sure how useful sending the removal requests via ACPI is in the first place. My feeling is that I'd like to have explicit control over the disconnect from host resources separate from the device removal *if* we're going to retain the guest notification. If we don't care to notify the guest, then we can just do device removal without notifying the guest and be done with it. I imagine management would typically want to do this: 1. notify guest 2. wait a bit 3. remove device Yes; but this argues for (1) being a separate command from (3) Yes. Long term I think we will want a way to do that. unless we require (3) to include (1) and (2) in the qemu implementation. Currently we implement: 1. device_del (attempt to remove device) 2. notify guest
Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
* Markus Armbruster arm...@redhat.com [2010-11-05 08:28]: Ryan Harper ry...@us.ibm.com writes: * Michael S. Tsirkin m...@redhat.com [2010-11-03 16:46]: On Wed, Nov 03, 2010 at 03:59:29PM -0500, Ryan Harper wrote: * Michael S. Tsirkin m...@redhat.com [2010-11-03 13:03]: On Wed, Nov 03, 2010 at 12:29:10PM -0500, Ryan Harper wrote: * Markus Armbruster arm...@redhat.com [2010-11-03 11:42]: Ryan Harper ry...@us.ibm.com writes: * Michael S. Tsirkin m...@redhat.com [2010-11-03 02:22]: On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote: * Michael S. Tsirkin m...@redhat.com [2010-11-02 14:18]: On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote: I like the idea of disconnect; if part of the device_del method was to invoke a disconnect method, we could implement that for block, net, etc; I'd think we'd want to send the notification, then disconnect. Struggling with whether it's worth having some reasonable timeout between notification and disconnect. The problem with this is that it has no analog in real world. In real world, you can send some notifications to the guest, and you can remove the card. Tying them together is what created the problem in the first place. Timeouts can be implemented by management, maybe with a nice dialog being shown to the user. Very true. I'm fine with forcing a disconnect during the removal path prior to notification. Do we want a new disconnect method at the device level (pci)? or just use the existing removal callback and call that during the initial hotremov event? Not sure what you mean by that, but I don't see a device doing anything differently wrt surprise or ordered removal. So probably the existing callback should do. I don't think we need to talk about disconnect: since we decided we are emulating device removal, let's call it just that. Because current the removal process depends on the guest actually responding. What I'm suggesting is that, in Marcus's term, and what drive_unplug() implements, is to disconnect the host block device from the guest device to prevent any further access to it in the case the guest doesn't respond to the removal request made via ACPI. Very specifically, what we're suggesting instead of the drive_unplug() command so to complete the device removal operation without waiting for the guest to respond; that's what's going to happen if we invoke the response callback; it will appear as if the guest responded whether it did or not. What I was suggesting above was to instead of calling the callback for handing the guest response was to add a device function called disconnect which would remove any association of host resources from guest resources before we notified the guest. Thinking about it again I'm not sure this is useful, but if we're going to remove the device without the guests knowledge, I'm not sure how useful sending the removal requests via ACPI is in the first place. My feeling is that I'd like to have explicit control over the disconnect from host resources separate from the device removal *if* we're going to retain the guest notification. If we don't care to notify the guest, then we can just do device removal without notifying the guest and be done with it. I imagine management would typically want to do this: 1. notify guest 2. wait a bit 3. remove device Yes; but this argues for (1) being a separate command from (3) Yes. Long term I think we will want a way to do that. unless we require (3) to include (1) and (2) in the qemu implementation. Currently we implement: 1. device_del (attempt to remove device) 2. notify guest 3. if guest responds, remove device 4. disconnect host resource from device on destruction With my drive_unplug patch we do: 1. disconnect host resource from device
Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
* Markus Armbruster arm...@redhat.com [2010-11-05 11:11]: Ryan Harper ry...@us.ibm.com writes: * Markus Armbruster arm...@redhat.com [2010-11-05 08:28]: I'd be fine with any of these: 1. A new command device_disconnet ID (or similar name) to disconnect device ID from any host parts. Nice touch: you don't have to know about the device's host part(s) to disconnect it. But it might be more work than the other two. This is sort of what netdev_del() and drive_unplug() are today; we're just saying sever the connection of this device id. No, I have netdev_del as (3). All three options are sort of the same, just different commands with a common purpose. I'd like to rename drive_unplug() to blockdev_del() and call it done. I was looking at libvirt and the right call to netdev_del is already in-place; I'd just need to re-spin my block patch to call blockdev_del() after invoking device_del() to match what is done for net. Unless I'm missing something, you can't just rename: your unplug does not delete the host part. 2. New commands netdev_disconnect, drive_disconnect (or similar names) to disconnect a host part from a guest device. Like (1), except you have to point to the other end of the connection to cut it. What's the advantage here? We need an additional piece of info (host part) in addition to the device id? That's a disadvantage. Possible advantage: implementation could be slightly easier than (1), because you don't have to find the host parts. 3. A new command drive_del ID similar to existing netdev_del. This is (2) fused with delete. Conceptual wart: you can't disconnect and keep the host part around. Moreover, delete is slightly dangerous, because it renders any guest device still using the host part useless. Hrm, I thought that's what (1) is. No. With (1), the argument is a *device* ID, and we disconnect *all* host parts connected to this device (typically just one). With (3), the argument is a netdev/drive ID, and disconnect *this* host part from the peer device. Well, either (1) or (3); I'd like to rename drive_unplug() to blockdev_del() since they're similar function w.r.t removing access to the host resource. And we can invoke them in the same way from libvirt (after doing guest notification, remove access). I'd call it drive_del for now, to match drive_add. OK, drive_del() and as you mentioned, drive_unplug will take out the block driver, but doesn't remove the dinfo object; that ends up dying when we call the device destructor. I think for symmetry we'll want drive_del to remove the dinfo object as well. -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
* Michael S. Tsirkin m...@redhat.com [2010-11-03 16:46]: On Wed, Nov 03, 2010 at 03:59:29PM -0500, Ryan Harper wrote: * Michael S. Tsirkin m...@redhat.com [2010-11-03 13:03]: On Wed, Nov 03, 2010 at 12:29:10PM -0500, Ryan Harper wrote: * Markus Armbruster arm...@redhat.com [2010-11-03 11:42]: Ryan Harper ry...@us.ibm.com writes: * Michael S. Tsirkin m...@redhat.com [2010-11-03 02:22]: On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote: * Michael S. Tsirkin m...@redhat.com [2010-11-02 14:18]: On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote: I like the idea of disconnect; if part of the device_del method was to invoke a disconnect method, we could implement that for block, net, etc; I'd think we'd want to send the notification, then disconnect. Struggling with whether it's worth having some reasonable timeout between notification and disconnect. The problem with this is that it has no analog in real world. In real world, you can send some notifications to the guest, and you can remove the card. Tying them together is what created the problem in the first place. Timeouts can be implemented by management, maybe with a nice dialog being shown to the user. Very true. I'm fine with forcing a disconnect during the removal path prior to notification. Do we want a new disconnect method at the device level (pci)? or just use the existing removal callback and call that during the initial hotremov event? Not sure what you mean by that, but I don't see a device doing anything differently wrt surprise or ordered removal. So probably the existing callback should do. I don't think we need to talk about disconnect: since we decided we are emulating device removal, let's call it just that. Because current the removal process depends on the guest actually responding. What I'm suggesting is that, in Marcus's term, and what drive_unplug() implements, is to disconnect the host block device from the guest device to prevent any further access to it in the case the guest doesn't respond to the removal request made via ACPI. Very specifically, what we're suggesting instead of the drive_unplug() command so to complete the device removal operation without waiting for the guest to respond; that's what's going to happen if we invoke the response callback; it will appear as if the guest responded whether it did or not. What I was suggesting above was to instead of calling the callback for handing the guest response was to add a device function called disconnect which would remove any association of host resources from guest resources before we notified the guest. Thinking about it again I'm not sure this is useful, but if we're going to remove the device without the guests knowledge, I'm not sure how useful sending the removal requests via ACPI is in the first place. My feeling is that I'd like to have explicit control over the disconnect from host resources separate from the device removal *if* we're going to retain the guest notification. If we don't care to notify the guest, then we can just do device removal without notifying the guest and be done with it. I imagine management would typically want to do this: 1. notify guest 2. wait a bit 3. remove device Yes; but this argues for (1) being a separate command from (3) Yes. Long term I think we will want a way to do that. unless we require (3) to include (1) and (2) in the qemu implementation. Currently we implement: 1. device_del (attempt to remove device) 2. notify guest 3. if guest responds, remove device 4. disconnect host resource from device on destruction With my drive_unplug patch we do: 1. disconnect host resource from device This is what drive_unplug does, right? Correct. 2. device_del (attempt to remove device) 3. notify guest 4. if guest responds, remove device I think we're
Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
* Michael S. Tsirkin m...@redhat.com [2010-11-03 02:22]: On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote: * Michael S. Tsirkin m...@redhat.com [2010-11-02 14:18]: On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote: I like the idea of disconnect; if part of the device_del method was to invoke a disconnect method, we could implement that for block, net, etc; I'd think we'd want to send the notification, then disconnect. Struggling with whether it's worth having some reasonable timeout between notification and disconnect. The problem with this is that it has no analog in real world. In real world, you can send some notifications to the guest, and you can remove the card. Tying them together is what created the problem in the first place. Timeouts can be implemented by management, maybe with a nice dialog being shown to the user. Very true. I'm fine with forcing a disconnect during the removal path prior to notification. Do we want a new disconnect method at the device level (pci)? or just use the existing removal callback and call that during the initial hotremov event? Not sure what you mean by that, but I don't see a device doing anything differently wrt surprise or ordered removal. So probably the existing callback should do. I don't think we need to talk about disconnect: since we decided we are emulating device removal, let's call it just that. Because current the removal process depends on the guest actually responding. What I'm suggesting is that, in Marcus's term, and what drive_unplug() implements, is to disconnect the host block device from the guest device to prevent any further access to it in the case the guest doesn't respond to the removal request made via ACPI. Very specifically, what we're suggesting instead of the drive_unplug() command so to complete the device removal operation without waiting for the guest to respond; that's what's going to happen if we invoke the response callback; it will appear as if the guest responded whether it did or not. What I was suggesting above was to instead of calling the callback for handing the guest response was to add a device function called disconnect which would remove any association of host resources from guest resources before we notified the guest. Thinking about it again I'm not sure this is useful, but if we're going to remove the device without the guests knowledge, I'm not sure how useful sending the removal requests via ACPI is in the first place. My feeling is that I'd like to have explicit control over the disconnect from host resources separate from the device removal *if* we're going to retain the guest notification. If we don't care to notify the guest, then we can just do device removal without notifying the guest and be done with it. I imagine management would typically want to do this: 1. notify guest 2. wait a bit 3. remove device Yes; but this argues for (1) being a separate command from (3) Yes. Long term I think we will want a way to do that. unless we require (3) to include (1) and (2) in the qemu implementation. Currently we implement: 1. device_del (attempt to remove device) 2. notify guest 3. if guest responds, remove device 4. disconnect host resource from device on destruction With my drive_unplug patch we do: 1. disconnect host resource from device This is what drive_unplug does, right? Correct. 2. device_del (attempt to remove device) 3. notify guest 4. if guest responds, remove device I think we're suggesting to instead do (if we keep disconnect as part of device_del) 1. device_del (attemp to remove device) 2. notify guest 3. invoke device destruction callback resulting in disconnect host resource from device 4. if guest responds, invoke device destruction path a second time. By response you mean eject? No, this is not what I was suggesting. I was really suggesting that your patch is fine :) Sorry about confusion. I don't mean eject; I mean responding to the ACPI event by writing a response to the PCI chipset which QEMU then in turn will invoke the qdev_unplug() path which ultimately kills the device and the Drive and BlockState objects. I was also saying that from what I hear, the pci express support will at some point need interfaces to - notify guest about device removal/addition - get eject from guest - remove device without talking to guest - add device without talking to guest - suppress device deletion on eject All this can be generic and can work through express configuration mechanisms or through acpi for pci
Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
* Markus Armbruster arm...@redhat.com [2010-11-03 11:42]: Ryan Harper ry...@us.ibm.com writes: * Michael S. Tsirkin m...@redhat.com [2010-11-03 02:22]: On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote: * Michael S. Tsirkin m...@redhat.com [2010-11-02 14:18]: On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote: I like the idea of disconnect; if part of the device_del method was to invoke a disconnect method, we could implement that for block, net, etc; I'd think we'd want to send the notification, then disconnect. Struggling with whether it's worth having some reasonable timeout between notification and disconnect. The problem with this is that it has no analog in real world. In real world, you can send some notifications to the guest, and you can remove the card. Tying them together is what created the problem in the first place. Timeouts can be implemented by management, maybe with a nice dialog being shown to the user. Very true. I'm fine with forcing a disconnect during the removal path prior to notification. Do we want a new disconnect method at the device level (pci)? or just use the existing removal callback and call that during the initial hotremov event? Not sure what you mean by that, but I don't see a device doing anything differently wrt surprise or ordered removal. So probably the existing callback should do. I don't think we need to talk about disconnect: since we decided we are emulating device removal, let's call it just that. Because current the removal process depends on the guest actually responding. What I'm suggesting is that, in Marcus's term, and what drive_unplug() implements, is to disconnect the host block device from the guest device to prevent any further access to it in the case the guest doesn't respond to the removal request made via ACPI. Very specifically, what we're suggesting instead of the drive_unplug() command so to complete the device removal operation without waiting for the guest to respond; that's what's going to happen if we invoke the response callback; it will appear as if the guest responded whether it did or not. What I was suggesting above was to instead of calling the callback for handing the guest response was to add a device function called disconnect which would remove any association of host resources from guest resources before we notified the guest. Thinking about it again I'm not sure this is useful, but if we're going to remove the device without the guests knowledge, I'm not sure how useful sending the removal requests via ACPI is in the first place. My feeling is that I'd like to have explicit control over the disconnect from host resources separate from the device removal *if* we're going to retain the guest notification. If we don't care to notify the guest, then we can just do device removal without notifying the guest and be done with it. I imagine management would typically want to do this: 1. notify guest 2. wait a bit 3. remove device Yes; but this argues for (1) being a separate command from (3) Yes. Long term I think we will want a way to do that. unless we require (3) to include (1) and (2) in the qemu implementation. Currently we implement: 1. device_del (attempt to remove device) 2. notify guest 3. if guest responds, remove device 4. disconnect host resource from device on destruction With my drive_unplug patch we do: 1. disconnect host resource from device This is what drive_unplug does, right? Correct. 2. device_del (attempt to remove device) 3. notify guest 4. if guest responds, remove device I think we're suggesting to instead do (if we keep disconnect as part of device_del) 1. device_del (attemp to remove device) 2. notify guest 3. invoke device destruction callback resulting in disconnect host resource from device 4. if guest responds, invoke device destruction path a second time. By response you mean eject? No, this is not what I was suggesting. I was really suggesting that your patch is fine :) Sorry about confusion. I don't mean eject; I mean responding to the ACPI event by writing a response to the PCI chipset which QEMU then in turn will invoke the qdev_unplug() path which ultimately kills the device and the Drive and BlockState objects. I was also saying that from what I hear, the pci express support will at some point need interfaces to - notify guest
Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
* Michael S. Tsirkin m...@redhat.com [2010-11-03 13:03]: On Wed, Nov 03, 2010 at 12:29:10PM -0500, Ryan Harper wrote: * Markus Armbruster arm...@redhat.com [2010-11-03 11:42]: Ryan Harper ry...@us.ibm.com writes: * Michael S. Tsirkin m...@redhat.com [2010-11-03 02:22]: On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote: * Michael S. Tsirkin m...@redhat.com [2010-11-02 14:18]: On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote: I like the idea of disconnect; if part of the device_del method was to invoke a disconnect method, we could implement that for block, net, etc; I'd think we'd want to send the notification, then disconnect. Struggling with whether it's worth having some reasonable timeout between notification and disconnect. The problem with this is that it has no analog in real world. In real world, you can send some notifications to the guest, and you can remove the card. Tying them together is what created the problem in the first place. Timeouts can be implemented by management, maybe with a nice dialog being shown to the user. Very true. I'm fine with forcing a disconnect during the removal path prior to notification. Do we want a new disconnect method at the device level (pci)? or just use the existing removal callback and call that during the initial hotremov event? Not sure what you mean by that, but I don't see a device doing anything differently wrt surprise or ordered removal. So probably the existing callback should do. I don't think we need to talk about disconnect: since we decided we are emulating device removal, let's call it just that. Because current the removal process depends on the guest actually responding. What I'm suggesting is that, in Marcus's term, and what drive_unplug() implements, is to disconnect the host block device from the guest device to prevent any further access to it in the case the guest doesn't respond to the removal request made via ACPI. Very specifically, what we're suggesting instead of the drive_unplug() command so to complete the device removal operation without waiting for the guest to respond; that's what's going to happen if we invoke the response callback; it will appear as if the guest responded whether it did or not. What I was suggesting above was to instead of calling the callback for handing the guest response was to add a device function called disconnect which would remove any association of host resources from guest resources before we notified the guest. Thinking about it again I'm not sure this is useful, but if we're going to remove the device without the guests knowledge, I'm not sure how useful sending the removal requests via ACPI is in the first place. My feeling is that I'd like to have explicit control over the disconnect from host resources separate from the device removal *if* we're going to retain the guest notification. If we don't care to notify the guest, then we can just do device removal without notifying the guest and be done with it. I imagine management would typically want to do this: 1. notify guest 2. wait a bit 3. remove device Yes; but this argues for (1) being a separate command from (3) Yes. Long term I think we will want a way to do that. unless we require (3) to include (1) and (2) in the qemu implementation. Currently we implement: 1. device_del (attempt to remove device) 2. notify guest 3. if guest responds, remove device 4. disconnect host resource from device on destruction With my drive_unplug patch we do: 1. disconnect host resource from device This is what drive_unplug does, right? Correct. 2. device_del (attempt to remove device) 3. notify guest 4. if guest responds, remove device I think we're suggesting to instead do (if we keep disconnect as part of device_del) 1. device_del (attemp to remove device) 2. notify guest 3. invoke device destruction callback resulting in disconnect host resource from device 4. if guest responds, invoke device destruction path a second time. By response you mean eject? No, this is not what I was suggesting
Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
* Markus Armbruster arm...@redhat.com [2010-11-02 04:40]: Ryan Harper ry...@us.ibm.com writes: * Markus Armbruster arm...@redhat.com [2010-10-29 11:11]: Ryan Harper ry...@us.ibm.com writes: * Markus Armbruster arm...@redhat.com [2010-10-29 09:13]: [Note cc: Michael] Ryan Harper ry...@us.ibm.com writes: If I understand your patch correctly, the difference between your drive_unplug and my blockdev_del is as follows: * drive_unplug forcefully severs the connection between the host part of the block device and its BlockDriverState. A shell of the host part remains, to be cleaned up later. You need forceful disconnect operation to be able to revoke access to an image whether the guest cooperates or not. Fair enough. * blockdev_del deletes a host part. My current version fails when the host part is in use. I patterned that after netdev_del, which used to work that way, until commit 2ffcb18d: Make netdev_del delete the netdev even when it's in use To hot-unplug guest and host part of a network device, you do: device_del NIC-ID netdev_del NETDEV-ID For PCI devices, device_del merely tells ACPI to unplug the device. The device goes away for real only after the guest processed the ACPI unplug event. You have to wait until then (e.g. by polling info pci) before you can unplug the netdev. Not good. Fix by removing the in use check from do_netdev_del(). Deleting a netdev while it's in use is safe; packets simply get routed to the bit bucket. Isn't this the very same problem that's behind your drive_unplug? Yes it is. I'd like to have some consistency among net, block and char device commands, i.e. a common set of operations that work the same for all of them. Can we agree on such a set? Yeah; the current trouble (or at least what I perceive to be trouble) is that in the case where the guest responds to device_del induced ACPI removal event; the current qdev code already does the host-side device tear down. Not sure if it is OK to do a blockdev_del() immediately after the device_del. What happens when we do: device_del ACPI to guest blockdev_del /* removes host-side device */ Fails in my tree, because the blockdev's still in use. See below. guest responds to ACPI qdev calls pci device removal code qemu attempts to destroy the associated host-side block That may just work today; and if not, it shouldn't be hard to fix up the code to check for NULLs I hate the automatic deletion of host part along with the guest part. device_del should undo device_add. {block,net,char}dev_{add,del} should be similarly paired. Agreed. In my blockdev branch, I keep the automatic delete only for backwards compatibility: if you create the drive with drive_add, it gets auto-deleted, but if you use blockdev_add, it stays around. But what to do about the case where we're doing drive_add and then a device_del() That's the urgent situation that needs to be resolved. What's the exact problem we need to solve urgently? Is it provide means to cut the connection to the host part immediately, even with an uncooperative guest? Yes, need to ensure that if the mgmt layer (libvirt) has done what it believes should have disassociated the host block device from the guest, we want to ensure that the host block device is no longer accessible from the guest. Does this need to be separate from device_del? no, it doesn't have to be. Honestly, I didn't see a clear way to do something like unplug early in the device_del because that's all pci device code which has no knowledge of host block devices; having it disconnect seemed like a layering violation. Even if your drive_unplug shouldn't fit in that set, we might want it as a stop-gap. Depends on how urgent the need for it is. Yet another special-purpose command to be deprecated later. The fix is urgent; but I'm willing to spin a couple patches if it helps get this into better shape. Can we agree on a common solution for block and net? That's why I cc'ed Michael. I didn't see a good way to have block behave the same as net; though I do agree that it would be good to have this be common, long term. If we can't make them behave 100% the same, then the next best thing is to offer a preferred way to do things that works similarly enough to let users ignore the differences. Possible preferred ways to revoke access to a host part: A. device_del Need to make device_del cut the connection right away instead of when the guest completes unplug. device_del changes behavior. Any problems with that? I don't think so; current mgmt consumers assume a cooperative guest and don't handle
Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
* Michael S. Tsirkin m...@redhat.com [2010-11-02 08:59]: On Tue, Nov 02, 2010 at 08:46:22AM -0500, Ryan Harper wrote: * Markus Armbruster arm...@redhat.com [2010-11-02 04:40]: I'd like to have some consistency among net, block and char device commands, i.e. a common set of operations that work the same for all of them. Can we agree on such a set? Yeah; the current trouble (or at least what I perceive to be trouble) is that in the case where the guest responds to device_del induced ACPI removal event; the current qdev code already does the host-side device tear down. Not sure if it is OK to do a blockdev_del() immediately after the device_del. What happens when we do: device_del ACPI to guest blockdev_del /* removes host-side device */ Fails in my tree, because the blockdev's still in use. See below. guest responds to ACPI qdev calls pci device removal code qemu attempts to destroy the associated host-side block That may just work today; and if not, it shouldn't be hard to fix up the code to check for NULLs I hate the automatic deletion of host part along with the guest part. device_del should undo device_add. {block,net,char}dev_{add,del} should be similarly paired. Agreed. In my blockdev branch, I keep the automatic delete only for backwards compatibility: if you create the drive with drive_add, it gets auto-deleted, but if you use blockdev_add, it stays around. But what to do about the case where we're doing drive_add and then a device_del() That's the urgent situation that needs to be resolved. What's the exact problem we need to solve urgently? Is it provide means to cut the connection to the host part immediately, even with an uncooperative guest? Yes, need to ensure that if the mgmt layer (libvirt) has done what it believes should have disassociated the host block device from the guest, we want to ensure that the host block device is no longer accessible from the guest. Does this need to be separate from device_del? no, it doesn't have to be. Honestly, I didn't see a clear way to do something like unplug early in the device_del because that's all pci device code which has no knowledge of host block devices; having it disconnect seemed like a layering violation. We invoke the cleanup callback, isn't that enough? Won't that look a bit strange? on device_del, call the cleanup callback first;, then notify the guest, if the guest responds, I suppose as long as the cleanup callback can handle being called a second time that'd work. I like the idea of disconnect; if part of the device_del method was to invoke a disconnect method, we could implement that for block, net, etc; I'd think we'd want to send the notification, then disconnect. Struggling with whether it's worth having some reasonable timeout between notification and disconnect. -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
* Michael S. Tsirkin m...@redhat.com [2010-11-02 10:56]: On Tue, Nov 02, 2010 at 09:22:01AM -0500, Ryan Harper wrote: * Michael S. Tsirkin m...@redhat.com [2010-11-02 08:59]: On Tue, Nov 02, 2010 at 08:46:22AM -0500, Ryan Harper wrote: * Markus Armbruster arm...@redhat.com [2010-11-02 04:40]: I'd like to have some consistency among net, block and char device commands, i.e. a common set of operations that work the same for all of them. Can we agree on such a set? Yeah; the current trouble (or at least what I perceive to be trouble) is that in the case where the guest responds to device_del induced ACPI removal event; the current qdev code already does the host-side device tear down. Not sure if it is OK to do a blockdev_del() immediately after the device_del. What happens when we do: device_del ACPI to guest blockdev_del /* removes host-side device */ Fails in my tree, because the blockdev's still in use. See below. guest responds to ACPI qdev calls pci device removal code qemu attempts to destroy the associated host-side block That may just work today; and if not, it shouldn't be hard to fix up the code to check for NULLs I hate the automatic deletion of host part along with the guest part. device_del should undo device_add. {block,net,char}dev_{add,del} should be similarly paired. Agreed. In my blockdev branch, I keep the automatic delete only for backwards compatibility: if you create the drive with drive_add, it gets auto-deleted, but if you use blockdev_add, it stays around. But what to do about the case where we're doing drive_add and then a device_del() That's the urgent situation that needs to be resolved. What's the exact problem we need to solve urgently? Is it provide means to cut the connection to the host part immediately, even with an uncooperative guest? Yes, need to ensure that if the mgmt layer (libvirt) has done what it believes should have disassociated the host block device from the guest, we want to ensure that the host block device is no longer accessible from the guest. Does this need to be separate from device_del? no, it doesn't have to be. Honestly, I didn't see a clear way to do something like unplug early in the device_del because that's all pci device code which has no knowledge of host block devices; having it disconnect seemed like a layering violation. We invoke the cleanup callback, isn't that enough? Won't that look a bit strange? on device_del, call the cleanup callback first;, then notify the guest, if the guest responds, I suppose as long as the cleanup callback can handle being called a second time that'd work. Well this is exactly what happens with surpise removal. If you yank a card out the slot, guest only gets notification afterwards. Right, though the card ripper can (in some systems) press the removal button which would send notification. I think I'm fine with not bothering to notify; this was mgmt interface driven anyhow so who ever is doing it should have already ensured they weren't using the device. I like the idea of disconnect; if part of the device_del method was to invoke a disconnect method, we could implement that for block, net, etc; I'd think we'd want to send the notification, then disconnect. Struggling with whether it's worth having some reasonable timeout between notification and disconnect. The problem with this is that it has no analog in real world. In real world, you can send some notifications to the guest, and you can remove the card. Tying them together is what created the problem in the first place. Timeouts can be implemented by management, maybe with a nice dialog being shown to the user. Very true. I'm fine with forcing a disconnect during the removal path prior to notification. Do we want a new disconnect method at the device level (pci)? or just use the existing removal callback and call that during the initial hotremov event? -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
I like the idea of disconnect; if part of the device_del method was to invoke a disconnect method, we could implement that for block, net, etc; I'd think we'd want to send the notification, then disconnect. Struggling with whether it's worth having some reasonable timeout between notification and disconnect. The problem with this is that it has no analog in real world. In real world, you can send some notifications to the guest, and you can remove the card. Tying them together is what created the problem in the first place. Timeouts can be implemented by management, maybe with a nice dialog being shown to the user. Very true. I'm fine with forcing a disconnect during the removal path prior to notification. Do we want a new disconnect method at the device level (pci)? or just use the existing removal callback and call that during the initial hotremov event? Not sure what you mean by that, but I don't see a device doing anything differently wrt surprise or ordered removal. So probably the existing callback should do. I don't think we need to talk about disconnect: since we decided we are emulating device removal, let's call it just that. Because current the removal process depends on the guest actually responding. What I'm suggesting is that, in Marcus's term, and what drive_unplug() implements, is to disconnect the host block device from the guest device to prevent any further access to it in the case the guest doesn't respond to the removal request made via ACPI. Very specifically, what we're suggesting instead of the drive_unplug() command so to complete the device removal operation without waiting for the guest to respond; that's what's going to happen if we invoke the response callback; it will appear as if the guest responded whether it did or not. What I was suggesting above was to instead of calling the callback for handing the guest response was to add a device function called disconnect which would remove any association of host resources from guest resources before we notified the guest. Thinking about it again I'm not sure this is useful, but if we're going to remove the device without the guests knowledge, I'm not sure how useful sending the removal requests via ACPI is in the first place. My feeling is that I'd like to have explicit control over the disconnect from host resources separate from the device removal *if* we're going to retain the guest notification. If we don't care to notify the guest, then we can just do device removal without notifying the guest and be done with it. -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
* Michael S. Tsirkin m...@redhat.com [2010-11-02 14:18]: On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote: I like the idea of disconnect; if part of the device_del method was to invoke a disconnect method, we could implement that for block, net, etc; I'd think we'd want to send the notification, then disconnect. Struggling with whether it's worth having some reasonable timeout between notification and disconnect. The problem with this is that it has no analog in real world. In real world, you can send some notifications to the guest, and you can remove the card. Tying them together is what created the problem in the first place. Timeouts can be implemented by management, maybe with a nice dialog being shown to the user. Very true. I'm fine with forcing a disconnect during the removal path prior to notification. Do we want a new disconnect method at the device level (pci)? or just use the existing removal callback and call that during the initial hotremov event? Not sure what you mean by that, but I don't see a device doing anything differently wrt surprise or ordered removal. So probably the existing callback should do. I don't think we need to talk about disconnect: since we decided we are emulating device removal, let's call it just that. Because current the removal process depends on the guest actually responding. What I'm suggesting is that, in Marcus's term, and what drive_unplug() implements, is to disconnect the host block device from the guest device to prevent any further access to it in the case the guest doesn't respond to the removal request made via ACPI. Very specifically, what we're suggesting instead of the drive_unplug() command so to complete the device removal operation without waiting for the guest to respond; that's what's going to happen if we invoke the response callback; it will appear as if the guest responded whether it did or not. What I was suggesting above was to instead of calling the callback for handing the guest response was to add a device function called disconnect which would remove any association of host resources from guest resources before we notified the guest. Thinking about it again I'm not sure this is useful, but if we're going to remove the device without the guests knowledge, I'm not sure how useful sending the removal requests via ACPI is in the first place. My feeling is that I'd like to have explicit control over the disconnect from host resources separate from the device removal *if* we're going to retain the guest notification. If we don't care to notify the guest, then we can just do device removal without notifying the guest and be done with it. I imagine management would typically want to do this: 1. notify guest 2. wait a bit 3. remove device Yes; but this argues for (1) being a separate command from (3) unless we require (3) to include (1) and (2) in the qemu implementation. Currently we implement: 1. device_del (attempt to remove device) 2. notify guest 3. if guest responds, remove device 4. disconnect host resource from device on destruction With my drive_unplug patch we do: 1. disconnect host resource from device 2. device_del (attempt to remove device) 3. notify guest 4. if guest responds, remove device I think we're suggesting to instead do (if we keep disconnect as part of device_del) 1. device_del (attemp to remove device) 2. notify guest 3. invoke device destruction callback resulting in disconnect host resource from device 4. if guest responds, invoke device destruction path a second time. -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()
* Markus Armbruster arm...@redhat.com [2010-10-29 09:08]: Ryan Harper ry...@us.ibm.com writes: Block hot unplug is racy since the guest is required to acknowlege the ACPI unplug event; this may not happen synchronously with the device removal command This series aims to close a gap where by mgmt applications that assume the block resource has been removed without confirming that the guest has acknowledged the removal may re-assign the underlying device to a second guest leading to data leakage. This series introduces a new montor command to decouple asynchornous device removal from restricting guest access to a block device. We do this by creating a new monitor command drive_unplug which maps to a bdrv_unplug() command which does a qemu_aio_flush; bdrv_flush() and bdrv_close(). Once complete, subsequent IO is rejected from the device and the guest will get IO errors but continue to function. A subsequent device removal command can be issued to remove the device, to which the guest may or maynot respond, but as long as the unplugged bit is set, no IO will be sumbitted. Changes since v1: - Added qemu_aio_flush() before bdrv_flush() to wait on pending io Signed-off-by: Ryan Harper ry...@us.ibm.com --- block.c |7 +++ block.h |1 + blockdev.c | 26 ++ blockdev.h |1 + hmp-commands.hx | 15 +++ 5 files changed, 50 insertions(+), 0 deletions(-) diff --git a/block.c b/block.c index a19374d..be47655 100644 --- a/block.c +++ b/block.c @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable) } } +void bdrv_unplug(BlockDriverState *bs) +{ +qemu_aio_flush(); +bdrv_flush(bs); +bdrv_close(bs); +} Stupid question: why doesn't bdrv_close() flush automatically? And why do we have to flush here, but not before other uses of bdrv_close(), such as eject_device()? + int bdrv_is_removable(BlockDriverState *bs) { return bs-removable; diff --git a/block.h b/block.h index 5f64380..732f63e 100644 --- a/block.h +++ b/block.h @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, BlockErrorAction on_read_error, BlockErrorAction on_write_error); BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read); void bdrv_set_removable(BlockDriverState *bs, int removable); +void bdrv_unplug(BlockDriverState *bs); int bdrv_is_removable(BlockDriverState *bs); int bdrv_is_read_only(BlockDriverState *bs); int bdrv_is_sg(BlockDriverState *bs); diff --git a/blockdev.c b/blockdev.c index 5fc3b9b..68eb329 100644 --- a/blockdev.c +++ b/blockdev.c @@ -610,3 +610,29 @@ int do_change_block(Monitor *mon, const char *device, } return monitor_read_bdrv_key_start(mon, bs, NULL, NULL); } + +int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data) +{ +DriveInfo *dinfo; +BlockDriverState *bs; +const char *id; + +if (!qdict_haskey(qdict, id)) { +qerror_report(QERR_MISSING_PARAMETER, id); +return -1; +} As Luiz pointed out, this check is redundant. + +id = qdict_get_str(qdict, id); +dinfo = drive_get_by_id(id); +if (!dinfo) { +qerror_report(QERR_DEVICE_NOT_FOUND, id); +return -1; +} + +/* mark block device unplugged */ +bs = dinfo-bdrv; +bdrv_unplug(bs); + +return 0; +} + What about: const char *id = qdict_get_str(qdict, id); BlockDriverState *bs; bs = bdrv_find(id); if (!bs) { qerror_report(QERR_DEVICE_NOT_FOUND, id); return -1; } bdrv_unplug(bs); return 0; Precedence: commit f8b6cc00 replaced uses of drive_get_by_id() by bdrv_find(). That works out nicely; and I can drop the drive_get_by_id() patch as well. Thanks. diff --git a/blockdev.h b/blockdev.h index 19c6915..ecb9ac8 100644 --- a/blockdev.h +++ b/blockdev.h @@ -52,5 +52,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_change_block(Monitor *mon, const char *device, const char *filename, const char *fmt); +int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data); #endif diff --git a/hmp-commands.hx b/hmp-commands.hx index 81999aa..7a32a2e 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -68,6 +68,21 @@ Eject a removable medium (use -f to force it). ETEXI { +.name = drive_unplug, +.args_type = id:s, +.params = device, +.help = unplug block device, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_drive_unplug
[Qemu-devel] [PATCH 0/2] v5 Decouple block device removal from device removal
This patch series decouples the detachment of a block device from the removal of the backing pci-device. Removal of a hotplugged pci device requires the guest to respond before qemu tears down the block device. In some cases, the guest may not respond leaving the guest with continued access to the block device. The new monitor command, drive_unplug, will revoke a guests access to the block device independently of the removal of the pci device. The first patch adds a new drive find method, the second patch implements the monitor command and block layer changes. Changes since v4: - Droppped drive_get_by_id patch and use bdrv_find() instead - Added additional details about drive_unplug to hmp/qmp interface Changes since v3: - Moved QMP command for drive_unplug() to separate patch Changes since v2: - Added QMP command for drive_unplug() Changes since v1: - CodingStyle fixes - Added qemu_aio_flush() to bdrv_unplug() Signed-off-by: Ryan Harper ry...@us.ibm.com
[Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_unplug()
Block hot unplug is racy since the guest is required to acknowlege the ACPI unplug event; this may not happen synchronously with the device removal command This series aims to close a gap where by mgmt applications that assume the block resource has been removed without confirming that the guest has acknowledged the removal may re-assign the underlying device to a second guest leading to data leakage. This series introduces a new montor command to decouple asynchornous device removal from restricting guest access to a block device. We do this by creating a new monitor command drive_unplug which maps to a bdrv_unplug() command which does a qemu_aio_flush; bdrv_flush() and bdrv_close(). Once complete, subsequent IO is rejected from the device and the guest will get IO errors but continue to function. A subsequent device removal command can be issued to remove the device, to which the guest may or maynot respond, but as long as the unplugged bit is set, no IO will be sumbitted. Signed-off-by: Ryan Harper ry...@us.ibm.com --- block.c |7 +++ block.h |1 + blockdev.c | 17 + blockdev.h |1 + hmp-commands.hx | 20 5 files changed, 46 insertions(+), 0 deletions(-) diff --git a/block.c b/block.c index 985d0b7..2320e9d 100644 --- a/block.c +++ b/block.c @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable) } } +void bdrv_unplug(BlockDriverState *bs) +{ +qemu_aio_flush(); +bdrv_flush(bs); +bdrv_close(bs); +} + int bdrv_is_removable(BlockDriverState *bs) { return bs-removable; diff --git a/block.h b/block.h index a4facf2..608fd83 100644 --- a/block.h +++ b/block.h @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, BlockErrorAction on_read_error, BlockErrorAction on_write_error); BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read); void bdrv_set_removable(BlockDriverState *bs, int removable); +void bdrv_unplug(BlockDriverState *bs); int bdrv_is_removable(BlockDriverState *bs); int bdrv_is_read_only(BlockDriverState *bs); int bdrv_is_sg(BlockDriverState *bs); diff --git a/blockdev.c b/blockdev.c index ff7602b..7bbdb65 100644 --- a/blockdev.c +++ b/blockdev.c @@ -597,3 +597,20 @@ int do_change_block(Monitor *mon, const char *device, } return monitor_read_bdrv_key_start(mon, bs, NULL, NULL); } + +int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data) +{ +const char *id = qdict_get_str(qdict, id); +BlockDriverState *bs; + +bs = bdrv_find(id); +if (!bs) { +qerror_report(QERR_DEVICE_NOT_FOUND, id); +return -1; +} + +bdrv_unplug(bs); + +return 0; +} + diff --git a/blockdev.h b/blockdev.h index 653affc..a454853 100644 --- a/blockdev.h +++ b/blockdev.h @@ -51,5 +51,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_change_block(Monitor *mon, const char *device, const char *filename, const char *fmt); +int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data); #endif diff --git a/hmp-commands.hx b/hmp-commands.hx index 81999aa..f6d3c85 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -68,6 +68,26 @@ Eject a removable medium (use -f to force it). ETEXI { +.name = drive_unplug, +.args_type = id:s, +.params = device, +.help = unplug block device, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_drive_unplug, +}, + +STEXI +...@item unplug @var{device} +...@findex unplug +Unplug block device. The result is that guest generated IO is no longer +submitted against the host device underlying the disk. Once a drive has +been unplugged, the QEMU Block layer returns -EIO which results in IO +errors in the guest for applications that are reading/writing to the device +when it is unplugged. Unplugged block devices can be safely deleted along with +the associated pci devices (if present). +ETEXI + +{ .name = change, .args_type = device:B,target:F,arg:s?, .params = device filename [format], -- 1.6.3.3
[Qemu-devel] [PATCH 2/2] Add qmp version of drive_unplug
Signed-off-by: Ryan Harper ry...@us.ibm.com --- qmp-commands.hx | 31 +++ 1 files changed, 31 insertions(+), 0 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index 793cf1c..a1f7b2f 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -338,6 +338,37 @@ Example: EQMP { +.name = drive_unplug, +.args_type = id:s, +.params = device, +.help = unplug block device, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_drive_unplug, +}, + +SQMP +drive unplug +-- + +Unplug a block device. The result is that guest generated IO is no longer +submitted against the host device underlying the disk. Once a drive has +been unplugged, the QEMU Block layer returns -EIO which results in IO +errors in the guest for applications that are reading/writing to the device +when it is unplugged. Unplugged block devices can be safely deleted along with +the associated pci devices (if present). + +Arguments: + +- id: the device's ID (json-string) + +Example: + +- { execute: drive_unplug, arguments: { id: drive-virtio-blk1 } } +- { return: {} } + +EQMP + +{ .name = cpu, .args_type = index:i, .params = index, -- 1.6.3.3
Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
* Markus Armbruster arm...@redhat.com [2010-10-29 09:13]: [Note cc: Michael] Ryan Harper ry...@us.ibm.com writes: This patch series decouples the detachment of a block device from the removal of the backing pci-device. Removal of a hotplugged pci device requires the guest to respond before qemu tears down the block device. In some cases, the guest may not respond leaving the guest with continued access to the block device. The new monitor command, drive_unplug, will revoke a guests access to the block device independently of the removal of the pci device. The first patch adds a new drive find method, the second patch implements the monitor command and block layer changes. Changes since v3: - Moved QMP command for drive_unplug() to separate patch Changes since v2: - Added QMP command for drive_unplug() Changes since v1: - CodingStyle fixes - Added qemu_aio_flush() to bdrv_unplug() Signed-off-by: Ryan Harper ry...@us.ibm.com If I understand your patch correctly, the difference between your drive_unplug and my blockdev_del is as follows: * drive_unplug forcefully severs the connection between the host part of the block device and its BlockDriverState. A shell of the host part remains, to be cleaned up later. You need forceful disconnect operation to be able to revoke access to an image whether the guest cooperates or not. Fair enough. * blockdev_del deletes a host part. My current version fails when the host part is in use. I patterned that after netdev_del, which used to work that way, until commit 2ffcb18d: Make netdev_del delete the netdev even when it's in use To hot-unplug guest and host part of a network device, you do: device_del NIC-ID netdev_del NETDEV-ID For PCI devices, device_del merely tells ACPI to unplug the device. The device goes away for real only after the guest processed the ACPI unplug event. You have to wait until then (e.g. by polling info pci) before you can unplug the netdev. Not good. Fix by removing the in use check from do_netdev_del(). Deleting a netdev while it's in use is safe; packets simply get routed to the bit bucket. Isn't this the very same problem that's behind your drive_unplug? Yes it is. I'd like to have some consistency among net, block and char device commands, i.e. a common set of operations that work the same for all of them. Can we agree on such a set? Yeah; the current trouble (or at least what I perceive to be trouble) is that in the case where the guest responds to device_del induced ACPI removal event; the current qdev code already does the host-side device tear down. Not sure if it is OK to do a blockdev_del() immediately after the device_del. What happens when we do: device_del ACPI to guest blockdev_del /* removes host-side device */ guest responds to ACPI qdev calls pci device removal code qemu attempts to destroy the associated host-side block That may just work today; and if not, it shouldn't be hard to fix up the code to check for NULLs Even if your drive_unplug shouldn't fit in that set, we might want it as a stop-gap. Depends on how urgent the need for it is. Yet another special-purpose command to be deprecated later. The fix is urgent; but I'm willing to spin a couple patches if it helps get this into better shape. -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
* Markus Armbruster arm...@redhat.com [2010-10-29 11:11]: Ryan Harper ry...@us.ibm.com writes: * Markus Armbruster arm...@redhat.com [2010-10-29 09:13]: [Note cc: Michael] Ryan Harper ry...@us.ibm.com writes: If I understand your patch correctly, the difference between your drive_unplug and my blockdev_del is as follows: * drive_unplug forcefully severs the connection between the host part of the block device and its BlockDriverState. A shell of the host part remains, to be cleaned up later. You need forceful disconnect operation to be able to revoke access to an image whether the guest cooperates or not. Fair enough. * blockdev_del deletes a host part. My current version fails when the host part is in use. I patterned that after netdev_del, which used to work that way, until commit 2ffcb18d: Make netdev_del delete the netdev even when it's in use To hot-unplug guest and host part of a network device, you do: device_del NIC-ID netdev_del NETDEV-ID For PCI devices, device_del merely tells ACPI to unplug the device. The device goes away for real only after the guest processed the ACPI unplug event. You have to wait until then (e.g. by polling info pci) before you can unplug the netdev. Not good. Fix by removing the in use check from do_netdev_del(). Deleting a netdev while it's in use is safe; packets simply get routed to the bit bucket. Isn't this the very same problem that's behind your drive_unplug? Yes it is. I'd like to have some consistency among net, block and char device commands, i.e. a common set of operations that work the same for all of them. Can we agree on such a set? Yeah; the current trouble (or at least what I perceive to be trouble) is that in the case where the guest responds to device_del induced ACPI removal event; the current qdev code already does the host-side device tear down. Not sure if it is OK to do a blockdev_del() immediately after the device_del. What happens when we do: device_del ACPI to guest blockdev_del /* removes host-side device */ Fails in my tree, because the blockdev's still in use. See below. guest responds to ACPI qdev calls pci device removal code qemu attempts to destroy the associated host-side block That may just work today; and if not, it shouldn't be hard to fix up the code to check for NULLs I hate the automatic deletion of host part along with the guest part. device_del should undo device_add. {block,net,char}dev_{add,del} should be similarly paired. Agreed. In my blockdev branch, I keep the automatic delete only for backwards compatibility: if you create the drive with drive_add, it gets auto-deleted, but if you use blockdev_add, it stays around. But what to do about the case where we're doing drive_add and then a device_del() That's the urgent situation that needs to be resolved. Even if your drive_unplug shouldn't fit in that set, we might want it as a stop-gap. Depends on how urgent the need for it is. Yet another special-purpose command to be deprecated later. The fix is urgent; but I'm willing to spin a couple patches if it helps get this into better shape. Can we agree on a common solution for block and net? That's why I cc'ed Michael. I didn't see a good way to have block behave the same as net; though I do agree that it would be good to have this be common, long term. Currently, we have two different ways: * The netdev way: del always succeeds How can it succeed if the host part is in use? If all device models are prepared to deal with a missing host part, we can delete it right away. Else, we need to replace it with a suitable zombie, which is auto-deleted when it goes out of use. Such zombies are not be visible elsewhere, in particular, the ID becomes available immediately. * The unplug way: del fails while in use, unplug always succeeds Feels a bit cleaner to me. But changing netdev_del might not be acceptable. Either way works for me as an user interface. But I'd rather not have both. Next, we need to consider how to integrate this with the automatic deletion of drives on qdev destruction. That's too late for unplug, we want that right in device_del. I'd leave the stupid automatic delete where it is now, in qdev destruction. The C API need unplug and delete separate for that. Regardless of the way we choose, we need to think very clearly on how exactly device models should behave when their host part is missing or a zombie, and how that behavior appears in the guest. For net, making it look exactly like a yanked out network cable would make sense to me. What about block? It seems to me that for block it's like cdrom with no disk, floppy with no media, hard disk that's gone bad. I think we we
[Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()
Block hot unplug is racy since the guest is required to acknowlege the ACPI unplug event; this may not happen synchronously with the device removal command This series aims to close a gap where by mgmt applications that assume the block resource has been removed without confirming that the guest has acknowledged the removal may re-assign the underlying device to a second guest leading to data leakage. This series introduces a new montor command to decouple asynchornous device removal from restricting guest access to a block device. We do this by creating a new monitor command drive_unplug which maps to a bdrv_unplug() command which does a qemu_aio_flush; bdrv_flush() and bdrv_close(). Once complete, subsequent IO is rejected from the device and the guest will get IO errors but continue to function. A subsequent device removal command can be issued to remove the device, to which the guest may or maynot respond, but as long as the unplugged bit is set, no IO will be sumbitted. Changes since v1: - Added qemu_aio_flush() before bdrv_flush() to wait on pending io Signed-off-by: Ryan Harper ry...@us.ibm.com --- block.c |7 +++ block.h |1 + blockdev.c | 26 ++ blockdev.h |1 + hmp-commands.hx | 15 +++ 5 files changed, 50 insertions(+), 0 deletions(-) diff --git a/block.c b/block.c index a19374d..be47655 100644 --- a/block.c +++ b/block.c @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable) } } +void bdrv_unplug(BlockDriverState *bs) +{ +qemu_aio_flush(); +bdrv_flush(bs); +bdrv_close(bs); +} + int bdrv_is_removable(BlockDriverState *bs) { return bs-removable; diff --git a/block.h b/block.h index 5f64380..732f63e 100644 --- a/block.h +++ b/block.h @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, BlockErrorAction on_read_error, BlockErrorAction on_write_error); BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read); void bdrv_set_removable(BlockDriverState *bs, int removable); +void bdrv_unplug(BlockDriverState *bs); int bdrv_is_removable(BlockDriverState *bs); int bdrv_is_read_only(BlockDriverState *bs); int bdrv_is_sg(BlockDriverState *bs); diff --git a/blockdev.c b/blockdev.c index 5fc3b9b..68eb329 100644 --- a/blockdev.c +++ b/blockdev.c @@ -610,3 +610,29 @@ int do_change_block(Monitor *mon, const char *device, } return monitor_read_bdrv_key_start(mon, bs, NULL, NULL); } + +int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data) +{ +DriveInfo *dinfo; +BlockDriverState *bs; +const char *id; + +if (!qdict_haskey(qdict, id)) { +qerror_report(QERR_MISSING_PARAMETER, id); +return -1; +} + +id = qdict_get_str(qdict, id); +dinfo = drive_get_by_id(id); +if (!dinfo) { +qerror_report(QERR_DEVICE_NOT_FOUND, id); +return -1; +} + +/* mark block device unplugged */ +bs = dinfo-bdrv; +bdrv_unplug(bs); + +return 0; +} + diff --git a/blockdev.h b/blockdev.h index 19c6915..ecb9ac8 100644 --- a/blockdev.h +++ b/blockdev.h @@ -52,5 +52,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_change_block(Monitor *mon, const char *device, const char *filename, const char *fmt); +int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data); #endif diff --git a/hmp-commands.hx b/hmp-commands.hx index 81999aa..7a32a2e 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -68,6 +68,21 @@ Eject a removable medium (use -f to force it). ETEXI { +.name = drive_unplug, +.args_type = id:s, +.params = device, +.help = unplug block device, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_drive_unplug, +}, + +STEXI +...@item unplug @var{device} +...@findex unplug +Unplug block device. +ETEXI + +{ .name = change, .args_type = device:B,target:F,arg:s?, .params = device filename [format], -- 1.6.3.3
[Qemu-devel] [PATCH 3/3] Add qmp version of drive_unplug
Signed-off-by: Ryan Harper ry...@us.ibm.com --- qmp-commands.hx | 26 ++ 1 files changed, 26 insertions(+), 0 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index 793cf1c..e8f3d4a 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -338,6 +338,32 @@ Example: EQMP { +.name = drive_unplug, +.args_type = id:s, +.params = device, +.help = unplug block device, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_drive_unplug, +}, + +SQMP +drive unplug +-- + +Unplug a block device. + +Arguments: + +- id: the device's ID (json-string) + +Example: + +- { execute: drive_unplug, arguments: { id: drive-virtio-blk1 } } +- { return: {} } + +EQMP + +{ .name = cpu, .args_type = index:i, .params = index, -- 1.6.3.3
[Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
This patch series decouples the detachment of a block device from the removal of the backing pci-device. Removal of a hotplugged pci device requires the guest to respond before qemu tears down the block device. In some cases, the guest may not respond leaving the guest with continued access to the block device. The new monitor command, drive_unplug, will revoke a guests access to the block device independently of the removal of the pci device. The first patch adds a new drive find method, the second patch implements the monitor command and block layer changes. Changes since v3: - Moved QMP command for drive_unplug() to separate patch Changes since v2: - Added QMP command for drive_unplug() Changes since v1: - CodingStyle fixes - Added qemu_aio_flush() to bdrv_unplug() Signed-off-by: Ryan Harper ry...@us.ibm.com
[Qemu-devel] [PATCH 1/3] v2 Add drive_get_by_id
Add a function to find a drive by id string. Changes since v1: -Coding Style fix Signed-off-by: Ryan Harper ry...@us.ibm.com --- blockdev.c | 13 + blockdev.h |1 + 2 files changed, 14 insertions(+), 0 deletions(-) diff --git a/blockdev.c b/blockdev.c index ff7602b..5fc3b9b 100644 --- a/blockdev.c +++ b/blockdev.c @@ -75,6 +75,19 @@ DriveInfo *drive_get(BlockInterfaceType type, int bus, int unit) return NULL; } +DriveInfo *drive_get_by_id(const char *id) +{ +DriveInfo *dinfo; + +QTAILQ_FOREACH(dinfo, drives, next) { +if (strcmp(id, dinfo-id)) { +continue; +} +return dinfo; +} +return NULL; +} + int drive_get_max_bus(BlockInterfaceType type) { int max_bus; diff --git a/blockdev.h b/blockdev.h index 653affc..19c6915 100644 --- a/blockdev.h +++ b/blockdev.h @@ -38,6 +38,7 @@ DriveInfo *drive_get(BlockInterfaceType type, int bus, int unit); int drive_get_max_bus(BlockInterfaceType type); void drive_uninit(DriveInfo *dinfo); DriveInfo *drive_get_by_blockdev(BlockDriverState *bs); +DriveInfo *drive_get_by_id(const char *id); QemuOpts *drive_add(const char *file, const char *fmt, ...) GCC_FMT_ATTR(2, 3); DriveInfo *drive_init(QemuOpts *arg, int default_to_scsi, int *fatal_error); -- 1.6.3.3