from:"Ryan Harper"

[Qemu-devel] [Bug 1716028] Re: qemu 2.10 locks images with no feature flag

2017-09-11 Thread Ryan Harper

In our multipath case, the initial open succeeds (it's not locked by
anything else) and the second lock attempt fails, however, IIUC the
fcntl structure[1] includes the locking pid, which should be our
invocation of QEMU;

Can we not check if the locking pid matches the current pid and not
fail?  This appears to be the original goal of protecting locked images
from being manipulated by external processes.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1716028

Title:
  qemu 2.10 locks images with no feature flag

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  New

Bug description:
  1) % lsb_release -rd
  Description:  Ubuntu Artful Aardvark (development branch)
  Release:  17.10

  2) % apt-cache policy qemu-system-x86
  qemu-system-x86:
Installed: 1:2.10~rc3+dfsg-0ubuntu1
Candidate: 1:2.10+dfsg-0ubuntu1
Version table:
   1:2.10+dfsg-0ubuntu1 500
  500 http://archive.ubuntu.com//ubuntu devel/main amd64 Packages
   *** 1:2.10~rc3+dfsg-0ubuntu1 100
  100 /var/lib/dpkg/status

  3) qemu locks image files with no way to discover this feature nor how
  to disable it

  4) qemu provides a way to query if it supports image locking, and what
  the default value is, and how to disable the locking via cli

  qemu 2.10 now will lock image files and warn if an image is currently
  locked.  This prevent qemu from running (and possibly corrupting said
  image).

  However, qemu does not provide any way to determine if a qemu binary
  actually has this capability.  Normally behavior changing features are
  exposed via some change to the qemu help menu or QMP/QAPI output of
  capabilities.

  I believe this slipped through since libvirt already does image
  locking, but direct cli users will be caught by this change.

  In particular, we have a use-case where we simulate multipath disks by
  creating to disks which point to the same file which now breaks
  without adding the 'file.locking=off' to the -drive parameters;  which
  is also completely undocumented and unexposed.

  Some parts of the cli like -device allow querying of settable options
  (qemu-system-x86 -device scsi_hd,?)  but nothing equivalent exists for
  -drive parameters.

  ProblemType: Bug
  DistroRelease: Ubuntu 17.10
  Package: qemu-system-x86 1:2.10~rc3+dfsg-0ubuntu1
  ProcVersionSignature: Ubuntu 4.12.0-11.12-generic 4.12.5
  Uname: Linux 4.12.0-11-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl zcommon znvpair
  ApportVersion: 2.20.6-0ubuntu7
  Architecture: amd64
  Date: Fri Sep  8 12:56:53 2017
  JournalErrors:
   Hint: You are currently not seeing messages from other users and the system.
 Users in groups 'adm', 'systemd-journal' can see all messages.
 Pass -q to turn off this notice.
   -- Logs begin at Mon 2017-01-30 11:56:02 CST, end at Fri 2017-09-08 12:56:46 
CDT. --
   -- No entries --
  KvmCmdLine: COMMAND STAT  EUID  RUID   PID  PPID %CPU COMMAND
  MachineType: HP ProLiant DL360 Gen9
  ProcEnviron:
   TERM=xterm
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.12.0-11-generic 
root=UUID=45354276-e0c0-4bf6-9083-f130b89411cc ro --- console=ttyS1,115200
  SourcePackage: qemu
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 03/05/2015
  dmi.bios.vendor: HP
  dmi.bios.version: P89
  dmi.chassis.type: 23
  dmi.chassis.vendor: HP
  dmi.modalias: 
dmi:bvnHP:bvrP89:bd03/05/2015:svnHP:pnProLiantDL360Gen9:pvr:cvnHP:ct23:cvr:
  dmi.product.family: ProLiant
  dmi.product.name: ProLiant DL360 Gen9
  dmi.sys.vendor: HP

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1716028/+subscriptions

[Qemu-devel] [Bug 1716028] Re: qemu 2.10 locks images with no feature flag

2017-09-11 Thread Ryan Harper

Kevin,

Thanks for the information.  A couple of points for feedback:

1) there doesn't appear to be a way to run qmp query-schema without
spawning a qemu instance in -qmp mode and having a second client issue
the query-schema;  certainly a qemu-system-$arch -qmp-schema would be
quite useful when examining feature availability.   While I know the
QMP/QAPI introspection is where most of the work has gone to describing
how to interact with qemu it's quite cumbersome at best:

searching for blockdev-add, find arg-type: 48, read arg-type-48, find
list of 'variants', know that locking feature is part of 'file', find
type: 207, see member 'locking' in list[A], which is of type 296, find
type: 296, which is an enum of 'on', 'off', 'auto'

A. Interesting enough, qapi says the default is None, however qemu
certainly locks files by default which would seem to imply a mismatch
between qapi defaults and qemu behavior.

That's pretty heavy;  Maybe that warning message qemu prints could
provide some hints as to what a user could do (or refer to a manpage on
locking?).

2) share-rw appears to be a blockdev parameter (I see it available via
most block devices via qemu-system-$arch -device {scsi-hd,ide-hd,virtio-
blk}? However there is no equivalent -blockdev for dumping the default
options that a -blockdev parameter takes.  The qmp-schema also does not
include any information about 'share-rw' w.r.t what values are available
that I could find after dumping the schema.

Thanks smoser:

#!/bin/sh
qemu-system-x86_64 -S -nodefaults -nographic \
   -serial none -monitor none -qmp stdio

Re: [Qemu-devel] [Bug 1305402] Re: libvirt fails to start VirtualMachines

2016-09-22 Thread Ryan Harper

Machine type changes may be related to:

https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1626070

There's a PPA in the bug with a fix for at least the wily machine type.

On Thu, Sep 22, 2016 at 6:05 PM, Leo Arias 
wrote:

> This has just happened to me. For some reason, all my machines had
> machine='pc-i440fx-wily'.
> After an update in yakkety, they stopped working.
>
> $ qemu-system-x86_64 -enable-kvm -machine help | grep wily
>
> So I updated the machine xml to a supported machine as Charles
> suggested, and they work again.
>
> Here's the note for my future self about how to do the update:
>
> $ virsh dumpxml $machine-name > /tmp/machine.xml
> Edit the xml.
> $ virsh define /tmp/machine.xml
>
> --
> You received this bug notification because you are subscribed to qemu in
> Ubuntu.
> https://bugs.launchpad.net/bugs/1305402
>
> Title:
>   libvirt fails to start VirtualMachines
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/qemu/+bug/1305402/+subscriptions
>

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1305402

Title:
  libvirt fails to start VirtualMachines

Status in QEMU:
  Invalid
Status in libvirt package in Ubuntu:
  Confirmed
Status in qemu package in Ubuntu:
  Confirmed

Bug description:
  I've created several kvm based machines using virtual machine manager.
  They have operated well over the last 4 days without issue. I did an
  apt-get upgrade, and qemu was in the list of updates.

  After upgrading, I am unable to start any of the provisioned virtual
  machines with the following error output

  virsh # start node2
  error: Failed to start domain node2
  error: internal error: process exited while connecting to monitor: 
qemu-system-x86_64: -machine trusty,accel=kvm,usb=off: Unsupported machine type
  Use -machine help to list supported machines!

  virsh # start node3
  error: Failed to start domain node3
  error: internal error: process exited while connecting to monitor: 
qemu-system-x86_64: -machine trusty,accel=kvm,usb=off: Unsupported machine type
  Use -machine help to list supported machines!

  $ dpkg -l | grep kvm
  ii  qemu-kvm 2.0.0~rc1+dfsg-0ubuntu3 
amd64QEMU Full virtualization on x86 hardware (transitional package)

  Log snippet from vm 'media' that was verified working, and fails to
  start after the upgrade.

  LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin 
QEMU_AUDIO_DRV=none /usr/bin/kvm-spice -name media -S -machine 
trusty,accel=kvm,usb=off -m 1548 -realtime mlock=off -smp 
1,sockets=1,cores=1,threads=1 -uuid 60b20f7b-6d20-bcb3-f4fc-808a9b2fe0d0 
-no-user-config -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/media.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown 
-boot menu=off,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 
-drive 
file=/var/lib/libvirt/images/media.img,if=none,id=drive-virtio-disk0,format=raw 
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -drive 
file=/home/charles/iso/ubuntu-desktop-12.04.4-amd64.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw
 -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev 
tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:a0:69:d9,bus=pci.0,addr=0x3 
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
-vnc 127.0.0.1:1 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
  char device redirected to /dev/pts/13 (label charserial0)
  qemu: terminating on signal 15 from pid 31688
  2014-04-10 03:36:54.593+: shutting down
  2014-04-10 03:59:25.487+: starting up
  LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin 
QEMU_AUDIO_DRV=none /usr/bin/kvm-spice -name media -S -machine 
trusty,accel=kvm,usb=off -m 1548 -realtime mlock=off -smp 
1,sockets=1,cores=1,threads=1 -uuid 60b20f7b-6d20-bcb3-f4fc-808a9b2fe0d0 
-no-user-config -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/media.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown 
-boot menu=off,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 
-drive 
file=/var/lib/libvirt/images/media.img,if=none,id=drive-virtio-disk0,format=raw 
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -drive 
file=/home/charles/iso/ubuntu-desktop-12.04.4-amd64.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw
 -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev 
tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device

[Qemu-devel] [Bug 1563887] Re: qemu-system-ppc64 freezes on starting image on ppc64le

2016-04-13 Thread Ryan Harper

As noted, Xenial kernel is not supporting POWER7 cpu.

** Changed in: linux (Ubuntu)
   Status: Confirmed => Invalid

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1563887

Title:
  qemu-system-ppc64 freezes on starting image on ppc64le

Status in QEMU:
  Invalid
Status in linux package in Ubuntu:
  Invalid
Status in livecd-rootfs package in Ubuntu:
  Invalid
Status in qemu package in Ubuntu:
  Invalid

Bug description:
  qemu-system-ppc64 running on Ubuntu 16.04 beta-2 fails to start an
  image as part of the certification process. This on an IBM ppc64le in
  PowerVM mode running Ubuntu 16.04 beta-2 deployed by MAAS 1.9.1. There
  is no error output.

  ubuntu@alpine01:~/kvm$ qemu-system-ppc64 -m 256 -display none -nographic -net 
nic -net user,net=10.0.0.0/8,host=10.0.0.1,hostfwd=tcp::-:22 -machine 
pseries -drive file=xenial-server-cloudimg-ppc64el-disk1.img,if=virtio -drive 
file=seed.iso,if=virtio
  WARNING: Image format was not specified for 'seed.iso' and probing guessed 
raw.
   Automatically detecting the format is dangerous for raw images, 
write operations on block 0 will be restricted.
   Specify the 'raw' format explicitly to remove the restrictions.

  SLOF **
  QEMU Starting
   Build Date = Jan 29 2016 18:58:37
   FW Version = buildd@ release 20151103
   Press "s" to enter Open Firmware.

  Populating /vdevice methods
  Populating /vdevice/vty@7100
  Populating /vdevice/nvram@7101
  Populating /vdevice/l-lan@7102
  Populating /vdevice/v-scsi@7103
     SCSI: Looking for devices
    8200 CD-ROM   : "QEMU QEMU CD-ROM  2.5+"
  Populating /pci@8002000
   00 1800 (D) : 1af4 1001virtio [ block ]
   00 1000 (D) : 1af4 1001virtio [ block ]
   00 0800 (D) : 106b 003fserial bus [ usb-ohci ]
   00  (D) : 1234 qemu vga
  No NVRAM common partition, re-initializing...
  Installing QEMU fb

  Scanning USB
    OHCI: initializing
  USB Keyboard
  USB mouse
  No console specified using screen & keyboard

    Welcome to Open Firmware

    Copyright (c) 2004, 2011 IBM Corporation All rights reserved.
    This program and the accompanying materials are made available
    under the terms of the BSD License available at
    http://www.opensource.org/licenses/bsd-license.php

  Trying to load:  from: /pci@8002000/scsi@3 ...
  E3404: Not a bootable device!
  Trying to load:  from: /pci@8002000/scsi@2 ...   Successfully loaded
  Linux ppc64le
  #31-Ubuntu SMP F

  ProblemType: Bug
  DistroRelease: Ubuntu 16.04
  Package: qemu-system-ppc 1:2.5+dfsg-5ubuntu6
  ProcVersionSignature: Ubuntu 4.4.0-16.32-generic 4.4.6
  Uname: Linux 4.4.0-16-generic ppc64le
  ApportVersion: 2.20-0ubuntu3
  Architecture: ppc64el
  Date: Wed Mar 30 14:10:01 2016
  KvmCmdLine:
   COMMAND STAT  EUID  RUID   PID  PPID %CPU COMMAND
   kvm-irqfd-clean S<   0 0  1172 2  0.0 [kvm-irqfd-clean]
   qemu-nbdSsl  0 0 13467 1  0.0 qemu-nbd -c /dev/nbd0 
xenial-server-cloudimg-ppc64el-disk1.img
   qemu-system-ppc Sl+   1000  1000 18973 18896  101 qemu-system-ppc64 -m 256 
-display none -nographic -net nic -net 
user,net=10.0.0.0/8,host=10.0.0.1,hostfwd=tcp::-:22 -machine pseries -drive 
file=xenial-server-cloudimg-ppc64el-disk1.img,if=virtio -drive 
file=seed.iso,if=virtio
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  ProcEnviron:
   TERM=xterm
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinux-4.4.0-16-generic 
root=UUID=92d820c8-ab25-497b-9b1e-f1435992bbf3 ro
  ProcLoadAvg: 1.08 0.94 0.58 2/616 19571
  ProcLocks:
   1: POSIX  ADVISORY  WRITE 886 00:13:381 0 EOF
   2: POSIX  ADVISORY  WRITE 1339 00:13:528 0 EOF
   3: FLOCK  ADVISORY  WRITE 1284 00:13:522 0 EOF
   4: POSIX  ADVISORY  WRITE 2281 00:13:563 0 EOF
   5: POSIX  ADVISORY  WRITE 1331 00:13:536 0 EOF
  ProcSwaps:
   Filename TypeSizeUsedPriority
   /swap.img   file 8388544 0   -1
  ProcVersion: Linux version 4.4.0-16-generic (buildd@bos01-ppc64el-001) (gcc 
version 5.3.1 20160320 (Ubuntu/Linaro/IBM 5.3.1-12ubuntu4) ) #32-Ubuntu SMP Thu 
Mar 24 22:31:14 UTC 2016
  SourcePackage: qemu
  UpgradeStatus: No upgrade log present (probably fresh install)
  bootlist:
   /pci@8002011/pci1014,034A@0/sas/disk@4068402c40
   
/pci@8002018/ethernet@0:speed=auto,duplex=auto,csarch,000.000.000.000,,000.000.000.000,000.000.000.000,5,5,000.000.000.000,512
   
/pci@8002018/ethernet@0,1:speed=auto,duplex=auto,csarch,000.000.000.000,,000.000.000.000,000.000.000.000,5,5,000.000.000.000,512

[Qemu-devel] [Bug 1563887] Re: qemu-system-ppc64 freezes on starting image on ppc64le

2016-04-13 Thread Ryan Harper

I should also mention that the original command used will work if we
specify which cpu type to use:

qemu-system-ppc64 -m 256 \ 
  -cpu POWER8 \
  -display none -nographic \
  -net nic -net user,net=10.0.0.0/8,host=10.0.0.1,hostfwd=tcp::-:22 \
  -machine pseries \
  -drive file=xenial-server-cloudimg-ppc64el-disk1.img,if=virtio \
  -drive file=seed.iso,if=virtio

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1563887

Title:
  qemu-system-ppc64 freezes on starting image on ppc64le

Status in QEMU:
  Confirmed
Status in linux package in Ubuntu:
  Confirmed
Status in livecd-rootfs package in Ubuntu:
  Invalid
Status in qemu package in Ubuntu:
  Invalid

Bug description:
  qemu-system-ppc64 running on Ubuntu 16.04 beta-2 fails to start an
  image as part of the certification process. This on an IBM ppc64le in
  PowerVM mode running Ubuntu 16.04 beta-2 deployed by MAAS 1.9.1. There
  is no error output.

  ubuntu@alpine01:~/kvm$ qemu-system-ppc64 -m 256 -display none -nographic -net 
nic -net user,net=10.0.0.0/8,host=10.0.0.1,hostfwd=tcp::-:22 -machine 
pseries -drive file=xenial-server-cloudimg-ppc64el-disk1.img,if=virtio -drive 
file=seed.iso,if=virtio
  WARNING: Image format was not specified for 'seed.iso' and probing guessed 
raw.
   Automatically detecting the format is dangerous for raw images, 
write operations on block 0 will be restricted.
   Specify the 'raw' format explicitly to remove the restrictions.

  SLOF **
  QEMU Starting
   Build Date = Jan 29 2016 18:58:37
   FW Version = buildd@ release 20151103
   Press "s" to enter Open Firmware.

  Populating /vdevice methods
  Populating /vdevice/vty@7100
  Populating /vdevice/nvram@7101
  Populating /vdevice/l-lan@7102
  Populating /vdevice/v-scsi@7103
     SCSI: Looking for devices
    8200 CD-ROM   : "QEMU QEMU CD-ROM  2.5+"
  Populating /pci@8002000
   00 1800 (D) : 1af4 1001virtio [ block ]
   00 1000 (D) : 1af4 1001virtio [ block ]
   00 0800 (D) : 106b 003fserial bus [ usb-ohci ]
   00  (D) : 1234 qemu vga
  No NVRAM common partition, re-initializing...
  Installing QEMU fb

  Scanning USB
    OHCI: initializing
  USB Keyboard
  USB mouse
  No console specified using screen & keyboard

    Welcome to Open Firmware

    Copyright (c) 2004, 2011 IBM Corporation All rights reserved.
    This program and the accompanying materials are made available
    under the terms of the BSD License available at
    http://www.opensource.org/licenses/bsd-license.php

  Trying to load:  from: /pci@8002000/scsi@3 ...
  E3404: Not a bootable device!
  Trying to load:  from: /pci@8002000/scsi@2 ...   Successfully loaded
  Linux ppc64le
  #31-Ubuntu SMP F

  ProblemType: Bug
  DistroRelease: Ubuntu 16.04
  Package: qemu-system-ppc 1:2.5+dfsg-5ubuntu6
  ProcVersionSignature: Ubuntu 4.4.0-16.32-generic 4.4.6
  Uname: Linux 4.4.0-16-generic ppc64le
  ApportVersion: 2.20-0ubuntu3
  Architecture: ppc64el
  Date: Wed Mar 30 14:10:01 2016
  KvmCmdLine:
   COMMAND STAT  EUID  RUID   PID  PPID %CPU COMMAND
   kvm-irqfd-clean S<   0 0  1172 2  0.0 [kvm-irqfd-clean]
   qemu-nbdSsl  0 0 13467 1  0.0 qemu-nbd -c /dev/nbd0 
xenial-server-cloudimg-ppc64el-disk1.img
   qemu-system-ppc Sl+   1000  1000 18973 18896  101 qemu-system-ppc64 -m 256 
-display none -nographic -net nic -net 
user,net=10.0.0.0/8,host=10.0.0.1,hostfwd=tcp::-:22 -machine pseries -drive 
file=xenial-server-cloudimg-ppc64el-disk1.img,if=virtio -drive 
file=seed.iso,if=virtio
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  ProcEnviron:
   TERM=xterm
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinux-4.4.0-16-generic 
root=UUID=92d820c8-ab25-497b-9b1e-f1435992bbf3 ro
  ProcLoadAvg: 1.08 0.94 0.58 2/616 19571
  ProcLocks:
   1: POSIX  ADVISORY  WRITE 886 00:13:381 0 EOF
   2: POSIX  ADVISORY  WRITE 1339 00:13:528 0 EOF
   3: FLOCK  ADVISORY  WRITE 1284 00:13:522 0 EOF
   4: POSIX  ADVISORY  WRITE 2281 00:13:563 0 EOF
   5: POSIX  ADVISORY  WRITE 1331 00:13:536 0 EOF
  ProcSwaps:
   Filename TypeSizeUsedPriority
   /swap.img   file 8388544 0   -1
  ProcVersion: Linux version 4.4.0-16-generic (buildd@bos01-ppc64el-001) (gcc 
version 5.3.1 20160320 (Ubuntu/Linaro/IBM 5.3.1-12ubuntu4) ) #32-Ubuntu SMP Thu 
Mar 24 22:31:14 UTC 2016
  SourcePackage: qemu
  UpgradeStatus: No upgrade log present (probably fresh install)
  bootlist:
   /pci@8002011/pci1014,034A@0/sas/disk@4068402c40

[Qemu-devel] [Bug 1563887] Re: qemu-system-ppc64 freezes on starting image on ppc64le

2016-04-12 Thread Ryan Harper

Here's an update.

The Xenial kernel doesn;t like the emulated POWER7 cpu that the command
line being used generates by default.

processor   : 0
cpu : POWER7 (raw), altivec supported
clock   : 1000.00MHz
revision: 2.3 (pvr 003f 0203)

timebase: 51200
platform: pSeries
model   : IBM pSeries (emulated by qemu)
machine : CHRP IBM pSeries (emulated by qemu)

We can boot a Wily image (kernel 4.2.0-35) just fine with the POWER7
cpu.


When booting Xenial's kernel with POWER7 cpu, it produces a stacktrace during 
module load:

[9.885165] Loaded X.509 cert 'Build time autogenerated kernel key: 
6687eed33bf99302166296c3e5cafe31ef38ad41'
[9.886507] zswap: loaded using pool lzo/zbud
[9.916000] modprobe[74]: unhandled signal 4 at 3fffb5a4d03c nip 
3fffb5a4d03c lr 3fffb5a25e24 code 30001
[9.925819] modprobe[76]: unhandled signal 4 at 3fff85b9d03c nip 
3fff85b9d03c lr 3fff85b75e24 code 30001
[9.928401] Key type trusted registered
[9.930762] modprobe[79]: unhandled signal 4 at 3fff7d05d03c nip 
3fff7d05d03c lr 3fff7d035e24 code 30001
[9.933360] modprobe[80]: unhandled signal 4 at 3fff8820d03c nip 
3fff8820d03c lr 3fff881e5e24 code 30001
[9.936240] modprobe[83]: unhandled signal 4 at 3fffb4fbd03c nip 
3fffb4fbd03c lr 3fffb4f95e24 code 30001
[9.938873] modprobe[84]: unhandled signal 4 at 3fff92d4d03c nip 
3fff92d4d03c lr 3fff92d25e24 code 30001
[9.940335] Key type encrypted registered
[9.940461] AppArmor: AppArmor sha1 policy hashing enabled
[9.941005] ima: No TPM chip found, activating TPM-bypass!
[9.942985] evm: HMAC attrs: 0x1
[9.947081] hctosys: unable to open rtc device (rtc0)
[9.987867] Freeing unused kernel memory: 6144K (c0ea - 
c14a)
[9.991123] init[1]: unhandled signal 4 at 3fff8edfd03c nip 
3fff8edfd03c lr 3fff8edd5e24 code 30001
[9.994581] Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x0004
[9.994581] 
[9.994889] CPU: 0 PID: 1 Comm: init Not tainted 4.4.0-18-generic #34-Ubuntu
[9.995054] Call Trace:
[9.995216] [c0001e4c3a50] [c0aed6fc] dump_stack+0xb0/0xf0 
(unreliable)
[9.995336] [c0001e4c3a90] [c0ae9930] panic+0x100/0x2c0
[9.995398] [c0001e4c3b20] [c00bd554] do_exit+0xc24/0xc30
[9.995443] [c0001e4c3be0] [c00bd644] do_group_exit+0x64/0x100
[9.995490] [c0001e4c3c20] [c00ceaac] get_signal+0x55c/0x7b0
[9.995534] [c0001e4c3d10] [c0017424] do_signal+0x54/0x2b0
[9.995578] [c0001e4c3e00] [c001787c] do_notify_resume+0xbc/0xd0
[9.995677] [c0001e4c3e30] [c0009838] 
ret_from_except_lite+0x64/0x68
[   10.011069] ---[ end Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x0004
[   10.011069] 


When we use -enable-kvm, this bypasses the tcg POWER7 cpu, and uses the host 
cpu type (POWER8) which is why we can boot the Xenial kernel with KVM.

We need to open a linux task to help track down that issue; also if
someone is testing Xenial on POWER7 hardware, that may help determine if
there is a lurking qemu tcg issue, though given that Wily kernels boot
fine in tcg mode; it's more likely there's something that changed/broke
in the kernels since 4.2.0-35.

I'm marking the qemu task invalid, and will open the linux task.


** Changed in: qemu (Ubuntu)
   Status: Confirmed => Invalid

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1563887

Title:
  qemu-system-ppc64 freezes on starting image on ppc64le

Status in QEMU:
  Confirmed
Status in linux package in Ubuntu:
  Confirmed
Status in livecd-rootfs package in Ubuntu:
  Invalid
Status in qemu package in Ubuntu:
  Invalid

Bug description:
  qemu-system-ppc64 running on Ubuntu 16.04 beta-2 fails to start an
  image as part of the certification process. This on an IBM ppc64le in
  PowerVM mode running Ubuntu 16.04 beta-2 deployed by MAAS 1.9.1. There
  is no error output.

  ubuntu@alpine01:~/kvm$ qemu-system-ppc64 -m 256 -display none -nographic -net 
nic -net user,net=10.0.0.0/8,host=10.0.0.1,hostfwd=tcp::-:22 -machine 
pseries -drive file=xenial-server-cloudimg-ppc64el-disk1.img,if=virtio -drive 
file=seed.iso,if=virtio
  WARNING: Image format was not specified for 'seed.iso' and probing guessed 
raw.
   Automatically detecting the format is dangerous for raw images, 
write operations on block 0 will be restricted.
   Specify the 'raw' format explicitly to remove the restrictions.

  SLOF **
  QEMU Starting
   Build Date = Jan 29 2016 18:58:37
   FW Version = buildd@ release 20151103
   Press "s" to enter Open Firmware.

  Populating /vdevice methods
  Populating

[Qemu-devel] [Bug 1465935] Re: kvm_irqchip_commit_routes: Assertion `ret == 0' failed

2015-06-29 Thread Ryan Harper

Thank you for taking the time to report this bug and helping to make
Ubuntu better. Please execute the following command, as it will
automatically gather debugging information, in a terminal:

apport-collect 1465935

When reporting bugs in the future please use apport by using 'ubuntu-
bug' and the name of the package affected. You can learn more about this
functionality at https://wiki.ubuntu.com/ReportingBugs.

** Changed in: qemu (Ubuntu)
   Status: New = Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1465935

Title:
  kvm_irqchip_commit_routes: Assertion `ret == 0' failed

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Incomplete

Bug description:
  Several my QEMU instances crashed, and in the  qemu log, I can see
  this assertion failure,

 qemu-system-x86_64: /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:984:
  kvm_irqchip_commit_routes: Assertion `ret == 0' failed.

  The QEMU version is 2.0.0, HV OS is ubuntu 12.04, kernel 3.2.0-38.
  Guest OS is RHEL 6.3.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1465935/+subscriptions

[Qemu-devel] [Bug 1465935] Re: kvm_irqchip_commit_routes: Assertion `ret == 0' failed

2015-06-29 Thread Ryan Harper

Have you be able to reproduce this issue on a wily host?  What about a
different guest?  Or is only RHEL6.3 affected?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1465935

Title:
  kvm_irqchip_commit_routes: Assertion `ret == 0' failed

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Incomplete

Bug description:
  Several my QEMU instances crashed, and in the  qemu log, I can see
  this assertion failure,

 qemu-system-x86_64: /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:984:
  kvm_irqchip_commit_routes: Assertion `ret == 0' failed.

  The QEMU version is 2.0.0, HV OS is ubuntu 12.04, kernel 3.2.0-38.
  Guest OS is RHEL 6.3.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1465935/+subscriptions

[Qemu-devel] [Bug 1465935] Re: kvm_irqchip_commit_routes: Assertion `ret == 0' failed

2015-06-29 Thread Ryan Harper

It appears that the latest version of the patch is here:

http://lists.gnu.org/archive/html/qemu-devel/2015-01/msg00822.html

However, this hasn't yet be accepted upstream.  The most recent
discussion requires the submitter to respond to the maintainers
questions here:

http://lists.gnu.org/archive/html/qemu-devel/2015-01/msg00623.html

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1465935

Title:
  kvm_irqchip_commit_routes: Assertion `ret == 0' failed

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Incomplete

Bug description:
  Several my QEMU instances crashed, and in the  qemu log, I can see
  this assertion failure,

 qemu-system-x86_64: /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:984:
  kvm_irqchip_commit_routes: Assertion `ret == 0' failed.

  The QEMU version is 2.0.0, HV OS is ubuntu 12.04, kernel 3.2.0-38.
  Guest OS is RHEL 6.3.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1465935/+subscriptions

[Qemu-devel] [Bug 551545] Re: PXE netboot not booting localboot from virtio-disk

2015-01-16 Thread Ryan Harper

This issue should be fixed in the qemu-kvm version included in precise.

** Changed in: qemu-kvm (Ubuntu)
   Status: Triaged = Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/551545

Title:
  PXE netboot not booting localboot from virtio-disk

Status in QEMU:
  New
Status in qemu-kvm package in Ubuntu:
  Fix Released
Status in qemu-kvm package in Fedora:
  Unknown

Bug description:
  Binary package hint: qemu-kvm

  lsb_release -rd
  Description:  Ubuntu lucid (development branch)
  Release:  10.04

  apt-cache policy qemu-kvm
  qemu-kvm:
Installiert: 0.12.3+noroms-0ubuntu3
Kandidat: 0.12.3+noroms-0ubuntu3
Versions-Tabelle:
   *** 0.12.3+noroms-0ubuntu3 0
  500 http://intranet/ubuntu/ lucid/main Packages
  100 /var/lib/dpkg/status

  Description of the problem:

  Starting a guest like this:

  vdekvm \
-m 256M \
-cpu host \
-smp 1 \
-name karmic \
-boot order=nc \
-drive file=/dev/vg01/test,if=virtio,boot=on,cache=none \
-net nic,vlan=0,macaddr=00:2f:8d:b6:cf:d0,model=virtio \
-net vde,vlan=0,sock=/var/run/vde2/vde0.ctl \
-watchdog i6300esb \
-vnc :0 \
-serial telnet:localhost:23,server,nowait \
-monitor tcp:127.0.0.1:12000,server,nowait \
-runas kvmguest

  On telnet localhost you can see that the following boot-menu
  appears:

  - Boot Menu -
  =

  local
  rescue

  It is loaded from this pxelinux.cfg/default file:

  SERIAL 0 9600n8

  DISPLAY boot.txt

  TIMEOUT 120
  DEFAULT local
  PROMPT 1

  LABEL local
localboot 0

  LABEL rescue
kernel lucid
append initrd=lucid-initrd.gz rescue/enable=true -- quiet 
console=ttyS0,9600n8

  
  After the timeout, the guest tries to boot, but fails and reloads the boot 
menu. This is an endless loop, until I break it or choose the rescue menu entry.

  I would expect that it boots from first virtio-disk

  ProblemType: Bug
  DistroRelease: Ubuntu 10.04
  Package: qemu-kvm 0.12.3+noroms-0ubuntu3
  ProcVersionSignature: Ubuntu 2.6.32-18.27-generic 2.6.32.10+drm33.1
  Uname: Linux 2.6.32-18-generic x86_64
  Architecture: amd64
  Date: Tue Mar 30 11:40:59 2010
  ExecutablePath: /usr/bin/qemu-system-x86_64
  MachineType: MICRO-STAR INTERANTIONAL CO.,LTD MS-7368
  ProcCmdLine: root=UUID=0d27271c-feaa-40d9-bbbd-baff4ca1d3cc rw init=/bin/bash
  ProcEnviron:
   LANG=de_DE.UTF-8
   SHELL=/bin/bash
  SourcePackage: qemu-kvm
  dmi.bios.date: 10/31/2007
  dmi.bios.vendor: American Megatrends Inc.
  dmi.bios.version: V1.5B2
  dmi.board.asset.tag: To Be Filled By O.E.M.
  dmi.board.name: MS-7368
  dmi.board.vendor: MICRO-STAR INTERANTIONAL CO.,LTD
  dmi.board.version: 1.0
  dmi.chassis.asset.tag: To Be Filled By O.E.M.
  dmi.chassis.type: 3
  dmi.chassis.vendor: To Be Filled By O.E.M.
  dmi.chassis.version: To Be Filled By O.E.M.
  dmi.modalias: 
dmi:bvnAmericanMegatrendsInc.:bvrV1.5B2:bd10/31/2007:svnMICRO-STARINTERANTIONALCO.,LTD:pnMS-7368:pvr1.0:rvnMICRO-STARINTERANTIONALCO.,LTD:rnMS-7368:rvr1.0:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
  dmi.product.name: MS-7368
  dmi.product.version: 1.0
  dmi.sys.vendor: MICRO-STAR INTERANTIONAL CO.,LTD

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/551545/+subscriptions

[Qemu-devel] [Bug 1318830] Re: High CPU usage on windows virtual machine

2015-01-16 Thread Ryan Harper

Hi,

Are you running ntp in the host or timekeeping in the guest?  If so, is
the guest clock close to what current time should be?

Also, can you try adding this option:

-global mc146818rtc.lost_tick_policy=slew

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1318830

Title:
  High CPU usage on windows virtual machine

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  New

Bug description:
  I got Ubuntu 14.04, with Qemu 2.0 and moved my windows VM to this new box, 
and made sure that what this article indicates was achieved
  https://www.kraxel.org/blog/2014/03/qemu-and-usb-tablet-cpu-consumtion/
  I can attest that it works following the instructions, erasing the registry, 
etc.
  Unfortunately, with 4 cpus as below, I still see 60% CPU outside as shown by 
Top versus 0% CPU inside. My Kernel is 3.15.0-031500rc4-generic
  If some developer wants to log in and take a look, I am happy to help. The 
box is not in production and I take full responsibility. Until this is solved, 
KVM is not going to compete with Hyper-V or Vmware.  Basically KVM is not 
suitable for the Enterprise as of yet.

  qemu-system-x86_64 -enable-kvm -name Production -S -machine pc-
  i440fx-2.0,accel=kvm,usb=off -cpu
  
kvm64,+rdtscp,+pdpe1gb,+x2apic,+dca,+pcid,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme,hv_relaxed,hv_vapic,hv_spinlocks=0xfff
  -m 4024 -realtime mlock=off -smp 4,sockets=1,cores=4,threads=1 -uuid
  e8701c5c-b542-0199-fd2a-1047df24770e -no-user-config -nodefaults
  -chardev
  
socket,id=charmonitor,path=/var/lib/libvirt/qemu/Production.monitor,server,nowait
  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime
  -no-shutdown -boot strict=on -device piix3-usb-
  uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
  file=/var/lib/libvirt/images/Production.img,if=none,id=drive-virtio-
  disk0,format=raw -device virtio-blk-
  pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-
  disk0,bootindex=1 -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31
  -device virtio-net-
  pci,netdev=hostnet0,id=net0,mac=00:16:3a:d2:cd:ea,bus=pci.0,addr=0x3
  -netdev tap,fd=35,id=hostnet1,vhost=on,vhostfd=36 -device virtio-net-
  pci,netdev=hostnet1,id=net1,mac=52:54:00:70:fe:54,bus=pci.0,addr=0x4
  -chardev pty,id=charserial0 -device isa-
  serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0
  -vnc 127.0.0.1:0 -device VGA,id=video0,bus=pci.0,addr=0x2 -device
  intel-hda,id=sound0,bus=pci.0,addr=0x5 -device hda-
  duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-
  pci,id=balloon0,bus=pci.0,addr=0x7

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1318830/+subscriptions

Re: [Qemu-devel] [RFC] fix crashes with vmware driver

2012-03-02 Thread Ryan Harper

* Serge Hallyn serge.hal...@canonical.com [2012-03-02 15:13]:
 Hi,
 
 I don't know where the best place to catch this would be, but
 with vnc and vmware_vga it's possible to get set_bit called on
 a negative index, crashing qemu.  See
 
 https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/918791
 
 for details.  This patch prevents that.  It's possible this
 should be caught earlier, but this patch works for me.
 
 Signed-off-by: Serge Hallyn serge.hal...@canonical.com
 ---
  vmware_vga.c |   16 
  1 file changed, 16 insertions(+)
 
 Index: qemu-kvm-1.0+noroms/hw/vmware_vga.c
 ===
 --- qemu-kvm-1.0+noroms.orig/hw/vmware_vga.c  2012-03-01 16:19:23.280571798 
 -0600
 +++ qemu-kvm-1.0+noroms/hw/vmware_vga.c   2012-03-01 16:27:27.910975006 
 -0600
 @@ -298,6 +298,22 @@
  uint8_t *src;
  uint8_t *dst;
 
 +if (x  0) {
 +fprintf(stderr, %s: update x was  0 (%d, w %d)\n,
 +__FUNCTION__, x, w);
 +w += x;
 + if (w  0)
 + return;
 +x = 0;
 +}
 +if (y  0) {
 +fprintf(stderr, %s: update y was  0 (%d, h %d)\n,
 +__FUNCTION__, y, h);
 +h += y;
 + if (h  0)
 + return;
 +y = 0;
 +}

Looks like it has mixed spaces and tabs.

CODING_STYLE wants {} on all if's 



  if (x + w  s-width) {
  fprintf(stderr, %s: update width too large x: %d, w: %d\n,
  __FUNCTION__, x, w);

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH v5] qemu-ga: Add guest-network-info command

2012-02-28 Thread Ryan Harper

);
 +
 +if (interfaces) {
 +network_info = g_malloc0(sizeof(*network_info));
 +network_info-interfaces = interfaces;
 +}
 +
 +return network_info;
 +}
 -- 
 1.7.3.4
 

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH 2/4] Add cleanup function

2012-01-20 Thread Ryan Harper

* Eric Blake ebl...@redhat.com [2012-01-17 16:03]:
 On 01/16/2012 10:16 AM, Ryan Harper wrote:
   if test -z $1 -o -z $2; then
   echo Usage: $0 QEMU TEST1 [TEST2 ...]
  +cleanup
   exit 1
 
  Is it worth using 'trap cleanup 0' to install the cleanup handler up
  front, instead of modifying all exit call sites?
  
  I thought about that, but it seemed to require switching to /bin/bash
 
 Not really.
 
  
  and I know Anthony had written the scripts carefully to be /bin/sh.
 
 POSIX requires /bin/sh to support 'trap cleanup 0', and I don't know of

I was using trap cleanup SIGINT; which /bin/sh didn't like:

(finalgravity) qemu-test % ./qemu-test 
~/work/git/qemu/x86_64-softmmu/qemu-system-x86_64 tests/virtio-serial.sh 
trap: SIGINT: bad trap

but with 0 instead, that seems to work.

 any counter-example shells that fail to do this.  There are non-POSIX
 shells where installing a trap 0 handler from inside a function body
 invokes the handler upon exiting the function, instead of exiting the
 overall script, but even Solaris /bin/sh knows how to correctly handle a
 trap 0 handler installed outside of any function calls.
 
 https://www.gnu.org/software/autoconf/manual/autoconf.html#trap
 
 -- 
 Eric Blake   ebl...@redhat.com+1-919-301-3266
 Libvirt virtualization library http://libvirt.org
 



-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH 2/4] Add cleanup function

2012-01-16 Thread Ryan Harper

* Eric Blake ebl...@redhat.com [2012-01-13 17:18]:
 On 01/13/2012 03:05 PM, Ryan Harper wrote:
  Create a cleanup function and call it from all exits so we don't leave
  temp files and directories around since we change the name on each 
  invocation.
  
  Also,  no need to delete the files in the tmpdir, so just remove the tmpdir
  if it exists.
  
  Signed-off-by: Ryan Harper ry...@us.ibm.com
  ---
   qemu-test |   11 +--
   1 files changed, 9 insertions(+), 2 deletions(-)
  
  diff --git a/qemu-test b/qemu-test
  index cd102a7..71c1ba1 100755
  --- a/qemu-test
  +++ b/qemu-test
  @@ -1,7 +1,14 @@
   #!/bin/sh
   
  +cleanup() {
  +if test -n $tmpdir; then
  +rm -rf $tmpdir;
  +fi
  +}
  +
   if test -z $1 -o -z $2; then
   echo Usage: $0 QEMU TEST1 [TEST2 ...]
  +cleanup
   exit 1
 
 Is it worth using 'trap cleanup 0' to install the cleanup handler up
 front, instead of modifying all exit call sites?

I thought about that, but it seemed to require switching to /bin/bash

and I know Anthony had written the scripts carefully to be /bin/sh.

 
 -- 
 Eric Blake   ebl...@redhat.com+1-919-301-3266
 Libvirt virtualization library http://libvirt.org
 



-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH 2/2] KVM: Use -cpu best as default on x86

2012-01-16 Thread Ryan Harper

* Alexander Graf ag...@suse.de [2012-01-08 17:53]:
 When running QEMU without -cpu parameter, the user usually wants a sane
 default. So far, we're using the qemu64/qemu32 CPU type, which basically
 means the maximum TCG can emulate.

it also means we all maximum possible migration targets.  Have you
given any thought to migration with -cpu best? 

 
 That's a really good default when using TCG, but when running with KVM
 we much rather want a default saying the maximum performance I can get.
 
 Fortunately we just added an option that gives us the best performance
 while still staying safe on the testability side of things: -cpu best.
 So all we need to do is make -cpu best the default when the user doesn't
 define any.
 
 This fixes a lot of subtile breakage in the GNU toolchain (libgmp) which
 hicks up on QEMU's non-existent CPU models.
 
 This patch also adds a new pc-1.1 machine type to keep backwards compatible
 with older versions of QEMU.
 
 Signed-off-by: Alexander Graf ag...@suse.de
 ---
  hw/pc_piix.c |   42 ++
  1 files changed, 34 insertions(+), 8 deletions(-)
 
 diff --git a/hw/pc_piix.c b/hw/pc_piix.c
 index 00f525e..3d78ccb 100644
 --- a/hw/pc_piix.c
 +++ b/hw/pc_piix.c
 @@ -79,7 +79,8 @@ static void pc_init1(MemoryRegion *system_memory,
   const char *initrd_filename,
   const char *cpu_model,
   int pci_enabled,
 - int kvmclock_enabled)
 + int kvmclock_enabled,
 + int may_cpu_best)
  {
  int i;
  ram_addr_t below_4g_mem_size, above_4g_mem_size;
 @@ -102,6 +103,9 @@ static void pc_init1(MemoryRegion *system_memory,
  MemoryRegion *rom_memory;
  DeviceState *dev;
 
 +if (!cpu_model  kvm_enabled()  may_cpu_best) {
 +cpu_model = best;
 +}
  pc_cpus_init(cpu_model);
 
  if (kvmclock_enabled) {
 @@ -263,7 +267,21 @@ static void pc_init_pci(ram_addr_t ram_size,
   get_system_io(),
   ram_size, boot_device,
   kernel_filename, kernel_cmdline,
 - initrd_filename, cpu_model, 1, 1);
 + initrd_filename, cpu_model, 1, 1, 1);
 +}
 +
 +static void pc_init_pci_oldcpu(ram_addr_t ram_size,
 +   const char *boot_device,
 +   const char *kernel_filename,
 +   const char *kernel_cmdline,
 +   const char *initrd_filename,
 +   const char *cpu_model)
 +{
 +pc_init1(get_system_memory(),
 + get_system_io(),
 + ram_size, boot_device,
 + kernel_filename, kernel_cmdline,
 + initrd_filename, cpu_model, 1, 1, 0);
  }
 
  static void pc_init_pci_no_kvmclock(ram_addr_t ram_size,
 @@ -277,7 +295,7 @@ static void pc_init_pci_no_kvmclock(ram_addr_t ram_size,
   get_system_io(),
   ram_size, boot_device,
   kernel_filename, kernel_cmdline,
 - initrd_filename, cpu_model, 1, 0);
 + initrd_filename, cpu_model, 1, 0, 0);
  }
 
  static void pc_init_isa(ram_addr_t ram_size,
 @@ -293,7 +311,7 @@ static void pc_init_isa(ram_addr_t ram_size,
   get_system_io(),
   ram_size, boot_device,
   kernel_filename, kernel_cmdline,
 - initrd_filename, cpu_model, 0, 1);
 + initrd_filename, cpu_model, 0, 1, 0);
  }
 
  #ifdef CONFIG_XEN
 @@ -314,8 +332,8 @@ static void pc_xen_hvm_init(ram_addr_t ram_size,
  }
  #endif
 
 -static QEMUMachine pc_machine_v1_0 = {
 -.name = pc-1.0,
 +static QEMUMachine pc_machine_v1_1 = {
 +.name = pc-1.1,
  .alias = pc,
  .desc = Standard PC,
  .init = pc_init_pci,
 @@ -323,17 +341,24 @@ static QEMUMachine pc_machine_v1_0 = {
  .is_default = 1,
  };
 
 +static QEMUMachine pc_machine_v1_0 = {
 +.name = pc-1.0,
 +.desc = Standard PC,
 +.init = pc_init_pci_oldcpu,
 +.max_cpus = 255,
 +};
 +
  static QEMUMachine pc_machine_v0_15 = {
  .name = pc-0.15,
  .desc = Standard PC,
 -.init = pc_init_pci,
 +.init = pc_init_pci_oldcpu,
  .max_cpus = 255,
  };
 
  static QEMUMachine pc_machine_v0_14 = {
  .name = pc-0.14,
  .desc = Standard PC,
 -.init = pc_init_pci,
 +.init = pc_init_pci_oldcpu,
  .max_cpus = 255,
  .compat_props = (GlobalProperty[]) {
  {
 @@ -612,6 +637,7 @@ static QEMUMachine xenfv_machine = {
 
  static void pc_machine_init(void)
  {
 +qemu_register_machine(pc_machine_v1_1);
  qemu_register_machine(pc_machine_v1_0);
  qemu_register_machine(pc_machine_v0_15);
  qemu_register_machine(pc_machine_v0_14);
 -- 
 1.6.0.2
 

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH 2/2] KVM: Use -cpu best as default on x86

2012-01-16 Thread Ryan Harper

* Alexander Graf ag...@suse.de [2012-01-16 13:37]:
 
 On 16.01.2012, at 20:30, Ryan Harper wrote:
 
  * Alexander Graf ag...@suse.de [2012-01-08 17:53]:
  When running QEMU without -cpu parameter, the user usually wants a sane
  default. So far, we're using the qemu64/qemu32 CPU type, which basically
  means the maximum TCG can emulate.
  
  it also means we all maximum possible migration targets.  Have you
  given any thought to migration with -cpu best? 
 
 If you have the same boxes in your cluster, migration just works. If
 you don't, you usually use a specific CPU model that is the least
 dominator between your boxes either way.

Sure, but the idea behind -cpu best is to not have to figure that out;
you had suggested that the qemu64/qemu32 were just related to TCG, and
what I'm suggesting is that it's also the most compatible w.r.t
migration.  

it sounds like if migration is a requirement, then -cpu best probably
isn't something that would be used.  I suppose I'm OK with that, or at
least I don't have a better suggestion on how to carefully push up the
capabilities without at some point breaking migration.

 
 The current kvm64 type is broken. Libgmp just abort()s when we pass it
 in. So anything is better than what we do today on AMD hosts :).

I wonder if it breaks with Cyris cpus... other tools tend to do runtime
detection (mplayer).


 
 
 Alex

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH 2/2] KVM: Use -cpu best as default on x86

2012-01-16 Thread Ryan Harper

* Alexander Graf ag...@suse.de [2012-01-16 13:52]:
 
 On 16.01.2012, at 20:46, Ryan Harper wrote:
 
  * Alexander Graf ag...@suse.de [2012-01-16 13:37]:
  
  On 16.01.2012, at 20:30, Ryan Harper wrote:
  
  * Alexander Graf ag...@suse.de [2012-01-08 17:53]:
  When running QEMU without -cpu parameter, the user usually wants a sane
  default. So far, we're using the qemu64/qemu32 CPU type, which basically
  means the maximum TCG can emulate.
  
  it also means we all maximum possible migration targets.  Have you
  given any thought to migration with -cpu best? 
  
  If you have the same boxes in your cluster, migration just works. If
  you don't, you usually use a specific CPU model that is the least
  dominator between your boxes either way.
  
  Sure, but the idea behind -cpu best is to not have to figure that out;
  you had suggested that the qemu64/qemu32 were just related to TCG, and
  what I'm suggesting is that it's also the most compatible w.r.t
  migration.  
 
 The, the most compatible wrt migration is -cpu kvm64 / kvm32.
 
  it sounds like if migration is a requirement, then -cpu best probably
  isn't something that would be used.  I suppose I'm OK with that, or at
  least I don't have a better suggestion on how to carefully push up the
  capabilities without at some point breaking migration.
 
 Yes, if you're interested in migration, then you're almost guaranteed to have 
 a toolstack on top that has knowledge of your whole cluster and can do the 
 least dominator detection over all of your nodes. On the QEMU level we don't 
 know anything about other machines.
 
  
  
  The current kvm64 type is broken. Libgmp just abort()s when we pass it
  in. So anything is better than what we do today on AMD hosts :).
  
  I wonder if it breaks with Cyris cpus... other tools tend to do runtime
  detection (mplayer).
 
 It probably does :). But then again those don't do KVM, do they?

not following; mplayer issues SSE2, 3 and 4 instructions to see what
works to figure out how to optimize; it doesn't care if the cpu is
called QEMU64 or Cyrus or AMD.  I'm not saying that we can't do better
than qemu64 w.r.t best cpu to select by default, but there are plenty of
applications that want to optimize their code based on what's available,
but this is done via code execution instead of string comparision.

 
 
 Alex
 

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH 2/2] KVM: Use -cpu best as default on x86

2012-01-16 Thread Ryan Harper

* Alexander Graf ag...@suse.de [2012-01-16 14:52]:
 
 On 16.01.2012, at 21:13, Ryan Harper wrote:
 
  * Alexander Graf ag...@suse.de [2012-01-16 13:52]:
  
  On 16.01.2012, at 20:46, Ryan Harper wrote:
  
  * Alexander Graf ag...@suse.de [2012-01-16 13:37]:
  
  On 16.01.2012, at 20:30, Ryan Harper wrote:
  
  * Alexander Graf ag...@suse.de [2012-01-08 17:53]:
  When running QEMU without -cpu parameter, the user usually wants a sane
  default. So far, we're using the qemu64/qemu32 CPU type, which 
  basically
  means the maximum TCG can emulate.
  
  it also means we all maximum possible migration targets.  Have you
  given any thought to migration with -cpu best? 
  
  If you have the same boxes in your cluster, migration just works. If
  you don't, you usually use a specific CPU model that is the least
  dominator between your boxes either way.
  
  Sure, but the idea behind -cpu best is to not have to figure that out;
  you had suggested that the qemu64/qemu32 were just related to TCG, and
  what I'm suggesting is that it's also the most compatible w.r.t
  migration.  
  
  The, the most compatible wrt migration is -cpu kvm64 / kvm32.
  
  it sounds like if migration is a requirement, then -cpu best probably
  isn't something that would be used.  I suppose I'm OK with that, or at
  least I don't have a better suggestion on how to carefully push up the
  capabilities without at some point breaking migration.
  
  Yes, if you're interested in migration, then you're almost guaranteed to 
  have a toolstack on top that has knowledge of your whole cluster and can 
  do the least dominator detection over all of your nodes. On the QEMU level 
  we don't know anything about other machines.
  
  
  
  The current kvm64 type is broken. Libgmp just abort()s when we pass it
  in. So anything is better than what we do today on AMD hosts :).
  
  I wonder if it breaks with Cyris cpus... other tools tend to do runtime
  detection (mplayer).
  
  It probably does :). But then again those don't do KVM, do they?
  
  not following; mplayer issues SSE2, 3 and 4 instructions to see what
  works to figure out how to optimize; it doesn't care if the cpu is
  called QEMU64 or Cyrus or AMD.  I'm not saying that we can't do better
  than qemu64 w.r.t best cpu to select by default, but there are plenty of
  applications that want to optimize their code based on what's available,
  but this is done via code execution instead of string comparision.
 
 The problem with -cpu kvm64 is that we choose a family/model that
 doesn't exist in the real world, and then glue AuthenticAMD or
 GenuineIntel in the vendor string. Libgmp checks for existing CPUs,
 finds that this CPU doesn't match any real world IDs and abort()s.
 
 The problem is that there is not a single CPU on this planet in
 silicon that has the same model+family numbers, but exists in
 AuthenticAMD _and_ GenuineIntel flavors. We need to pass the host
 vendor in though, because the guest uses it to detect if it should
 execute SYSCALL or SYSENTER, because intel and amd screwed up heavily
 on that one.

I forgot about this one.  =(


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

[Qemu-devel] [PATCH] Add virtio-blk-drive-serial test

2012-01-13 Thread Ryan Harper

We can test out the virtio-blk drive serial number by generating and then
reading it back via the file in sysfs.

Signed-off-by: Ryan Harper ry...@us.ibm.com
---
 tests/virtio-blk-drive-serial.sh |   40 ++
 1 files changed, 40 insertions(+), 0 deletions(-)
 create mode 100755 tests/virtio-blk-drive-serial.sh

diff --git a/tests/virtio-blk-drive-serial.sh b/tests/virtio-blk-drive-serial.sh
new file mode 100755
index 000..0586f97
--- /dev/null
+++ b/tests/virtio-blk-drive-serial.sh
@@ -0,0 +1,40 @@
+#!/bin/sh
+
+serial=0123456789abcdefghi
+
+in_host() {
+tmpdisk=$tmpdir/disk.img
+qemu-img create -f qcow2 $tmpdisk 10G
+
+qemu -nographic -enable-kvm \
+-drive 
file=$tmpdisk,if=none,id=drive-virtio-disk0,format=raw,cache=none,serial=$serial
 \
+-device 
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
+rc=$?
+
+rm $tmpdisk
+return $rc
+}
+
+in_guest() {
+sysfspath=/sys/block/vda
+if ! test -e $sysfspath; then
+echo Device not visible!
+return 1
+fi
+
+guest_serial=`cat $sysfspath/serial`
+
+if test $guest_serial != $serial; then
+echo drive has wrong serial!
+echo Expected '$serial', got '$guest_serial'
+return 2
+fi
+
+return 0
+}
+
+if test $QEMU_TEST; then
+in_host
+else
+in_guest
+fi
-- 
1.7.6

[Qemu-devel] [PATCH 1/4] update get_file_size to not throw error on non-existant files

2012-01-13 Thread Ryan Harper

In some cases, when qemu is just starting up the logfile hasn't yet
been created and will throw an error message into the output:

ls: cannot access .tmp-3070/logfile-3070.log: No such file or directory

if we check to see if the file is readable first then we know the file
is present and we can extract the size.

Signed-off-by: Ryan Harper ry...@us.ibm.com
---
 qemu-test |7 ---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/qemu-test b/qemu-test
index c6ea595..cd102a7 100755
--- a/qemu-test
+++ b/qemu-test
@@ -54,9 +54,10 @@ checkpid() {
 }
 
 get_file_size() {
-ls -al $1 | cut -f5 -d' ' 2/dev/null
-if test $? != 0; then
-   echo 0
+if test -r ${1} ; then
+ls -al $1 | cut -f5 -d' '
+else
+echo 0
 fi
 }
 
-- 
1.7.6

[Qemu-devel] [PATCH 4/4] Apply consistent indentation

2012-01-13 Thread Ryan Harper

Inner blocks of if/for/while are all indented.

Signed-off-by: Ryan Harper ry...@us.ibm.com
---
 qemu-test |   56 
 1 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/qemu-test b/qemu-test
index 445ca6d..9750a3f 100755
--- a/qemu-test
+++ b/qemu-test
@@ -2,7 +2,7 @@
 
 cleanup() {
 if test -n $tmpdir; then
-rm -rf $tmpdir;
+rm -rf $tmpdir;
 fi
 }
 
@@ -29,9 +29,9 @@ if ! which qmp /dev/null 2/dev/null; then
 qmp=${QEMU_SRC}/QMP/qmp
 
 if ! test -x ${qmp}; then
-echo Please set QEMU_SRC to set to a recent qemu.git tree
-cleanup
-exit 1
+echo Please set QEMU_SRC to set to a recent qemu.git tree
+cleanup
+exit 1
 fi
 else
 qmp=`which qmp | head -1`
@@ -63,9 +63,9 @@ checkpid() {
 
 get_file_size() {
 if test -r ${1} ; then
-ls -al $1 | cut -f5 -d' '
+ls -al $1 | cut -f5 -d' '
 else
-echo 0
+echo 0
 fi
 }
 
@@ -104,11 +104,11 @@ choose() {
 target=$(($RANDOM % $#))
 count=0
 for i in $@; do
-if test $count = $target; then
-echo $i
-return 0
-fi
-count=$(($count + 1))
+if test $count = $target; then
+echo $i
+return 0
+fi
+count=$(($count + 1))
 done
 
 # not supposed to happen...
@@ -117,7 +117,7 @@ choose() {
 
 choose_bool() {
 if test `choose yes no` = yes; then
-return 0
+return 0
 fi
 return 1
 }
@@ -135,34 +135,34 @@ start_qemu() {
 qemu_is_okay() {
 # it's stopped, that's not necessarly bad
 if ! checkpid $pid; then
-return 1
+return 1
 fi
  
 log_size=`get_file_size $tmplog`
 if test $last_log_size = $log_size; then
-freeze_count=$(($freeze_count + 1))
+freeze_count=$(($freeze_count + 1))
 else
-freeze_count=0
-last_log_size=$log_size
+freeze_count=0
+last_log_size=$log_size
 fi
 if test $freeze_count -gt $FREEZE_THRESHOLD  checkpid $pid; then
-qemu_pid=`cat $tmppid`
-kill -9 $qemu_pid
-echo Guest ($qemu_pid) has not had output in $FREEZE_THRESHOLD seconds!
-return 2
+qemu_pid=`cat $tmppid`
+kill -9 $qemu_pid
+echo Guest ($qemu_pid) has not had output in $FREEZE_THRESHOLD 
seconds!
+return 2
 fi
 
 if ! qmp query-status /dev/null 2/dev/null  checkpid $pid; then
-qemu_pid=`cat $tmppid`
-
-kill -9 $qemu_pid
-echo QEMU is hung!
-return 3
+qemu_pid=`cat $tmppid`
+
+kill -9 $qemu_pid
+echo QEMU is hung!
+return 3
 fi
 
 # it's stopped, that's not necessarly bad
 if ! checkpid $pid; then
-return 1
+return 1
 fi
 }
 
@@ -170,7 +170,7 @@ get_qemu_status() {
 wait $pid
 rc=`cat $tmprc`
 if test $(($rc  1)) = 1; then
-rc=$(($rc / 2))
+rc=$(($rc / 2))
 fi
 return $rc
 }
@@ -179,7 +179,7 @@ qemu() {
 start_qemu $@
 
 while qemu_is_okay; do
-sleep 1
+sleep 1
 done
 
 get_qemu_status
-- 
1.7.6

[Qemu-devel] [PATCH 2/4] Add cleanup function

2012-01-13 Thread Ryan Harper

Create a cleanup function and call it from all exits so we don't leave
temp files and directories around since we change the name on each invocation.

Also,  no need to delete the files in the tmpdir, so just remove the tmpdir
if it exists.

Signed-off-by: Ryan Harper ry...@us.ibm.com
---
 qemu-test |   11 +--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/qemu-test b/qemu-test
index cd102a7..71c1ba1 100755
--- a/qemu-test
+++ b/qemu-test
@@ -1,7 +1,14 @@
 #!/bin/sh
 
+cleanup() {
+if test -n $tmpdir; then
+rm -rf $tmpdir;
+fi
+}
+
 if test -z $1 -o -z $2; then
 echo Usage: $0 QEMU TEST1 [TEST2 ...]
+cleanup
 exit 1
 fi
 
@@ -23,6 +30,7 @@ if ! which qmp /dev/null 2/dev/null; then
 
 if ! test -x ${qmp}; then
echo Please set QEMU_SRC to set to a recent qemu.git tree
+cleanup
exit 1
 fi
 else
@@ -182,7 +190,6 @@ QEMU_TEST=1
 . $1
 rc=$?
 
-rm -f $tmplog $tmppid $tmpqmp $tmpinitrd
-rm -rf $tmpdir $tmprc
+cleanup
 
 exit $rc
-- 
1.7.6

[Qemu-devel] [PATCH] qemu-test: qemu-test script cleanups

2012-01-13 Thread Ryan Harper

Ran qemu-test for the first time and found a few places to clean up the output.
Did tab removal in qemu-test and then updated the indentation to be consistent.

Signed-off-by: Ryan Harper ry...@us.ibm.com

[Qemu-devel] [PATCH 3/4] Remove tabs and replace with four spaces.

2012-01-13 Thread Ryan Harper

Signed-off-by: Ryan Harper ry...@us.ibm.com
---
 qemu-test |   62 ++--
 1 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/qemu-test b/qemu-test
index 71c1ba1..445ca6d 100755
--- a/qemu-test
+++ b/qemu-test
@@ -29,9 +29,9 @@ if ! which qmp /dev/null 2/dev/null; then
 qmp=${QEMU_SRC}/QMP/qmp
 
 if ! test -x ${qmp}; then
-   echo Please set QEMU_SRC to set to a recent qemu.git tree
+echo Please set QEMU_SRC to set to a recent qemu.git tree
 cleanup
-   exit 1
+exit 1
 fi
 else
 qmp=`which qmp | head -1`
@@ -74,13 +74,13 @@ qmp() {
 qmp_pid=$!
 count=0
 while checkpid $qmp_pid; do
-   sleep 1
-   count=$(($count + 1))
-   if test $count -gt $QMP_TIMEOUT; then
-   echo $count, $QMP_TIMEOUT
-   kill -9 $qmp_pid
-   return 1
-   fi
+sleep 1
+count=$(($count + 1))
+if test $count -gt $QMP_TIMEOUT; then
+echo $count, $QMP_TIMEOUT
+kill -9 $qmp_pid
+return 1
+fi
 done
 return 0
 }
@@ -104,11 +104,11 @@ choose() {
 target=$(($RANDOM % $#))
 count=0
 for i in $@; do
-   if test $count = $target; then
-   echo $i
-   return 0
-   fi
-   count=$(($count + 1))
+if test $count = $target; then
+echo $i
+return 0
+fi
+count=$(($count + 1))
 done
 
 # not supposed to happen...
@@ -117,7 +117,7 @@ choose() {
 
 choose_bool() {
 if test `choose yes no` = yes; then
-   return 0
+return 0
 fi
 return 1
 }
@@ -135,34 +135,34 @@ start_qemu() {
 qemu_is_okay() {
 # it's stopped, that's not necessarly bad
 if ! checkpid $pid; then
-   return 1
+return 1
 fi
  
 log_size=`get_file_size $tmplog`
 if test $last_log_size = $log_size; then
-   freeze_count=$(($freeze_count + 1))
+freeze_count=$(($freeze_count + 1))
 else
-   freeze_count=0
-   last_log_size=$log_size
+freeze_count=0
+last_log_size=$log_size
 fi
 if test $freeze_count -gt $FREEZE_THRESHOLD  checkpid $pid; then
-   qemu_pid=`cat $tmppid`
-   kill -9 $qemu_pid
-   echo Guest ($qemu_pid) has not had output in $FREEZE_THRESHOLD 
seconds!
-   return 2
+qemu_pid=`cat $tmppid`
+kill -9 $qemu_pid
+echo Guest ($qemu_pid) has not had output in $FREEZE_THRESHOLD seconds!
+return 2
 fi
 
 if ! qmp query-status /dev/null 2/dev/null  checkpid $pid; then
-   qemu_pid=`cat $tmppid`
-   
-   kill -9 $qemu_pid
-   echo QEMU is hung!
-   return 3
+qemu_pid=`cat $tmppid`
+
+kill -9 $qemu_pid
+echo QEMU is hung!
+return 3
 fi
 
 # it's stopped, that's not necessarly bad
 if ! checkpid $pid; then
-   return 1
+return 1
 fi
 }
 
@@ -170,7 +170,7 @@ get_qemu_status() {
 wait $pid
 rc=`cat $tmprc`
 if test $(($rc  1)) = 1; then
-   rc=$(($rc / 2))
+rc=$(($rc / 2))
 fi
 return $rc
 }
@@ -179,7 +179,7 @@ qemu() {
 start_qemu $@
 
 while qemu_is_okay; do
-   sleep 1
+sleep 1
 done
 
 get_qemu_status
-- 
1.7.6

Re: [Qemu-devel] Storage requirements for live migration

2011-11-11 Thread Ryan Harper

 == Image Formats ==
 
 Image formats are not safe to use with live migration.  The reason is that 
 QEMU caches data for image formats and does not have a mechanism to flush 
 those caches.  The following attempts to describe the issues with the 
 various formats
 
 === QCOW2 ===
 
 QCOW2 caches two forms of data, cluster metadata (L1/L2 data, refcount 
 table, etc) and mutable header information (file size, snapshot entries, 
 etc).
 
 This data needs to be discarded before after migration starts.

before after?

pick one =)


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH v10 1/3] block: add the command line support

2011-11-01 Thread Ryan Harper

 --git a/blockdev.c b/blockdev.c
 index 0827bf7..faf8c56 100644
 --- a/blockdev.c
 +++ b/blockdev.c
 @@ -235,6 +235,9 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
  int on_read_error, on_write_error;
  const char *devaddr;
  DriveInfo *dinfo;
 +BlockIOLimit io_limits;
 +bool bps_iol;
 +bool iops_iol;
  int snapshot = 0;
  int ret;
 
 @@ -353,6 +356,32 @@ DriveInfo *drive_init(QemuOpts *opts, int 
 default_to_scsi)
  }
  }
 
 +/* disk I/O throttling */
 +io_limits.bps[BLOCK_IO_LIMIT_TOTAL]  =
 +   qemu_opt_get_number(opts, bps, 0);
 +io_limits.bps[BLOCK_IO_LIMIT_READ]   =
 +   qemu_opt_get_number(opts, bps_rd, 0);
 +io_limits.bps[BLOCK_IO_LIMIT_WRITE]  =
 +   qemu_opt_get_number(opts, bps_wr, 0);
 +io_limits.iops[BLOCK_IO_LIMIT_TOTAL] =
 +   qemu_opt_get_number(opts, iops, 0);
 +io_limits.iops[BLOCK_IO_LIMIT_READ]  =
 +   qemu_opt_get_number(opts, iops_rd, 0);
 +io_limits.iops[BLOCK_IO_LIMIT_WRITE] =
 +   qemu_opt_get_number(opts, iops_wr, 0);
 +
 +bps_iol  = (io_limits.bps[BLOCK_IO_LIMIT_TOTAL] != 0)
 +  ((io_limits.bps[BLOCK_IO_LIMIT_READ] != 0)
 + || (io_limits.bps[BLOCK_IO_LIMIT_WRITE] != 0));
 +iops_iol = (io_limits.iops[BLOCK_IO_LIMIT_TOTAL] != 0)
 +  ((io_limits.iops[BLOCK_IO_LIMIT_READ] != 0)
 + || (io_limits.iops[BLOCK_IO_LIMIT_WRITE] != 0));
 +if (bps_iol || iops_iol) {
 +error_report(bps(iops) and bps_rd/bps_wr(iops_rd/iops_wr)
 + cannot be used at the same time);
 +return NULL;
 +}
 +


How about extracting this section into the bdrv_set_io_limits() function or some
other common setting function.  This should be the same code that's used when
modifying/setting these values via the monitor as well (minus how you obtain the
input values - option parsings vs. monitor values).

  on_write_error = BLOCK_ERR_STOP_ENOSPC;
  if ((buf = qemu_opt_get(opts, werror)) != NULL) {
  if (type != IF_IDE  type != IF_SCSI  type != IF_VIRTIO  type 
 != IF_NONE) {
 @@ -460,6 +489,9 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
 
  bdrv_set_on_error(dinfo-bdrv, on_read_error, on_write_error);
 
 +/* disk I/O throttling */
 +bdrv_set_io_limits(dinfo-bdrv, io_limits);
 +
  switch(type) {
  case IF_IDE:
  case IF_SCSI:
 diff --git a/qemu-config.c b/qemu-config.c
 index 597d7e1..1aa080f 100644
 --- a/qemu-config.c
 +++ b/qemu-config.c
 @@ -85,6 +85,30 @@ static QemuOptsList qemu_drive_opts = {
  .name = readonly,
  .type = QEMU_OPT_BOOL,
  .help = open drive file as read-only,
 +},{
 +.name = iops,
 +.type = QEMU_OPT_NUMBER,
 +.help = limit total I/O operations per second,
 +},{
 +.name = iops_rd,
 +.type = QEMU_OPT_NUMBER,
 +.help = limit read operations per second,
 +},{
 +.name = iops_wr,
 +.type = QEMU_OPT_NUMBER,
 +.help = limit write operations per second,
 +},{
 +.name = bps,
 +.type = QEMU_OPT_NUMBER,
 +.help = limit total bytes per second,
 +},{
 +.name = bps_rd,
 +.type = QEMU_OPT_NUMBER,
 +.help = limit read bytes per second,
 +},{
 +.name = bps_wr,
 +.type = QEMU_OPT_NUMBER,
 +.help = limit write bytes per second,
  },
  { /* end of list */ }
  },
 diff --git a/qemu-options.hx b/qemu-options.hx
 index 681eaf1..25a7be7 100644
 --- a/qemu-options.hx
 +++ b/qemu-options.hx
 @@ -136,6 +136,7 @@ DEF(drive, HAS_ARG, QEMU_OPTION_drive,
 
 [,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n
 [,serial=s][,addr=A][,id=name][,aio=threads|native]\n
 [,readonly=on|off]\n
 +   
 [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w]]\n
  use 'file' as a drive image\n, QEMU_ARCH_ALL)
  STEXI
  @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
 -- 
 1.7.6
 

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH 3/6] trace: Add trace events group implementation in the backend simple

2011-10-12 Thread Ryan Harper

* Mark Wu wu...@linux.vnet.ibm.com [2011-10-12 12:26]:
 
 Signed-off-by: Mark Wu wu...@linux.vnet.ibm.com
 ---
  trace/simple.c |   30 ++
  trace/simple.h |7 +++
  2 files changed, 37 insertions(+), 0 deletions(-)
 
 diff --git a/trace/simple.c b/trace/simple.c
 index b639dda..7aa4c0b 100644
 --- a/trace/simple.c
 +++ b/trace/simple.c
 @@ -321,6 +321,16 @@ void trace_print_events(FILE *stream, fprintf_function 
 stream_printf)
  }
  }
 
 +void trace_print_groups(FILE *stream, fprintf_function stream_printf)
 +{
 +unsigned int i;
 +
 +for (i = 0; i  NR_TRACE_EVENT_GROUPS; i++) {
 +stream_printf(stream, %s [GROUP ID %u] : state %u\n,
 +  trace_group_list[i].gp_name, i,
 +   trace_group_list[i].state);

You've got a mix of space and tabs in this file.  Remove the tabs and
adjust the spacing to CODING_STYLE rules.

 +}
 +}
  bool trace_event_set_state(const char *name, bool state)
  {
  unsigned int i;
 @@ -334,6 +344,26 @@ bool trace_event_set_state(const char *name, bool state)
  return false;
  }
 
 +bool trace_event_group_set_state(const char *gp_name, bool state)
 +{
 +unsigned int i;
 +unsigned int j;
 +TraceEventGroup *group;
 +
 +for (i = 0; i  NR_TRACE_EVENT_GROUPS; i++) {
 + group = trace_group_list[i]; 

here

 +if (!strcmp(group-gp_name, gp_name)) {
 +group-state = state;
 +
 + for (j = group-start; j = group-end; j++) {

here

 +trace_list[j].state = state;
 +}
 +return true;
 +}
 +}
 +return false;
 +}
 +
  /* Helper function to create a thread with signals blocked.  Use glib's
   * portable threads since QEMU abstractions cannot be used due to reentrancy 
 in
   * the tracer.  Also note the signal masking on POSIX hosts so that the 
 thread
 diff --git a/trace/simple.h b/trace/simple.h
 index 466e75b..cf119f3 100644
 --- a/trace/simple.h
 +++ b/trace/simple.h
 @@ -22,6 +22,13 @@ typedef struct {
  bool state;
  } TraceEvent;
 
 +typedef struct {
 +const char *gp_name;
 +bool state;
 +int start;
 +int end;
 +} TraceEventGroup;
 +
  void trace0(TraceEventID event);
  void trace1(TraceEventID event, uint64_t x1);
  void trace2(TraceEventID event, uint64_t x1, uint64_t x2);
 -- 
 1.7.1
 

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH] vhost-net: cleanup host notifiers at last step

2011-08-26 Thread Ryan Harper

, VirtIODevice *vdev);
  int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
  void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
 +int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
 +void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
 
  #endif
 diff --git a/hw/vhost_net.c b/hw/vhost_net.c
 index a559812..950a6b8 100644
 --- a/hw/vhost_net.c
 +++ b/hw/vhost_net.c
 @@ -139,16 +139,22 @@ int vhost_net_start(struct vhost_net *net,
  {
  struct vhost_vring_file file = { };
  int r;
 +
 +net-dev.nvqs = 2;
 +net-dev.vqs = net-vqs;
 +
 +r = vhost_dev_enable_notifiers(net-dev, dev);
 +if (r  0) {
 +goto fail_notifiers;
 +}
  if (net-dev.acked_features  (1  VIRTIO_NET_F_MRG_RXBUF)) {
  tap_set_vnet_hdr_len(net-vc,
   sizeof(struct virtio_net_hdr_mrg_rxbuf));
  }
 
 -net-dev.nvqs = 2;
 -net-dev.vqs = net-vqs;
  r = vhost_dev_start(net-dev, dev);
  if (r  0) {
 -return r;
 +goto fail_start;
  }
 
  net-vc-info-poll(net-vc, false);
 @@ -173,6 +179,9 @@ fail:
  if (net-dev.acked_features  (1  VIRTIO_NET_F_MRG_RXBUF)) {
  tap_set_vnet_hdr_len(net-vc, sizeof(struct virtio_net_hdr));
  }
 +fail_start:
 +vhost_dev_disable_notifiers(net-dev, dev);
 +fail_notifiers:
  return r;
  }
 
 @@ -190,6 +199,7 @@ void vhost_net_stop(struct vhost_net *net,
  if (net-dev.acked_features  (1  VIRTIO_NET_F_MRG_RXBUF)) {
  tap_set_vnet_hdr_len(net-vc, sizeof(struct virtio_net_hdr));
  }
 +vhost_dev_disable_notifiers(net-dev, dev);
  }
 
  void vhost_net_cleanup(struct vhost_net *net)
 -- 
 1.7.5.53.gc233e

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH 0/3] better I/O accounting V2

2011-08-25 Thread Ryan Harper

* Christoph Hellwig h...@lst.de [2011-08-24 13:46]:
 On Mon, Aug 22, 2011 at 11:46:00AM -0500, Ryan Harper wrote:
  So, I believe this is how it's happening.
  
  we start accounting on a write which is turned into a multiwrite via
  virtio_blk_handle_write() which calls virtio_submit_multiwrite().
  
  Then when the multiwrite completes, we invoke virtio_blk_rw_complete()
  on each part of the multiwrite.  None of these requests have had their
  acct structure initialized since there was just *one* initial write.
  We could do a bdrv_acct_start() on each req, but that would break the
  concept of hiding the additional writes under the initial request.
 
 We do initialize all write fields correctly.  Where we fail right now
 is for non-read/write requests.  I can reproduce your issue by reading
 the serial attribute in sysfs - any chance Fedora does that during boot?

Indeed.  F15 has udev hooks to support /dev/disk/by-id for virtio-blk
devices which will issue non-read/write request (get drive serial).

 
 The patch below should take care of these cases:

Tested with this patch, F15 boots fine, mkfs.ext4 and dd all work as
expected.

Tested-by: Ryan Harper ry...@us.ibm.com

 
 
 Index: qemu/hw/virtio-blk.c
 ===
 --- qemu.orig/hw/virtio-blk.c 2011-08-24 20:32:03.764503179 +0200
 +++ qemu/hw/virtio-blk.c  2011-08-24 20:35:55.716579922 +0200
 @@ -59,9 +59,6 @@ static void virtio_blk_req_complete(Virt
  stb_p(req-in-status, status);
  virtqueue_push(s-vq, req-elem, req-qiov.size + sizeof(*req-in));
  virtio_notify(s-vdev, s-vq);
 -
 -bdrv_acct_done(s-bs, req-acct);
 -g_free(req);
  }
 
  static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error,
 @@ -83,6 +80,8 @@ static int virtio_blk_handle_rw_error(Vi
  vm_stop(VMSTOP_DISKFULL);
  } else {
  virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
 +bdrv_acct_done(s-bs, req-acct);
 +g_free(req);
  bdrv_mon_event(s-bs, BDRV_ACTION_REPORT, is_read);
  }
 
 @@ -102,6 +101,8 @@ static void virtio_blk_rw_complete(void
  }
 
  virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
 +bdrv_acct_done(req-dev-bs, req-acct);
 +g_free(req);
  }
 
  static void virtio_blk_flush_complete(void *opaque, int ret)
 @@ -115,6 +116,8 @@ static void virtio_blk_flush_complete(vo
  }
 
  virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
 +bdrv_acct_done(req-dev-bs, req-acct);
 +g_free(req);
  }
 
  static VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
 @@ -157,6 +160,7 @@ static void virtio_blk_handle_scsi(VirtI
   */
  if (req-elem.out_num  2 || req-elem.in_num  3) {
  virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
 +g_free(req);
  return;
  }
 
 @@ -165,6 +169,7 @@ static void virtio_blk_handle_scsi(VirtI
   */
  if (req-elem.out_num  2  req-elem.in_num  3) {
  virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
 +g_free(req);
  return;
  }
 
 @@ -231,11 +236,13 @@ static void virtio_blk_handle_scsi(VirtI
  stl_p(req-scsi-data_len, hdr.dxfer_len);
 
  virtio_blk_req_complete(req, status);
 +g_free(req);
  }
  #else
  static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
  {
  virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
 +g_free(req);
  }
  #endif /* __linux__ */
 
 @@ -378,6 +385,7 @@ static void virtio_blk_handle_request(Vi
  s-serial ? s-serial : ,
  MIN(req-elem.in_sg[0].iov_len, VIRTIO_BLK_ID_BYTES));
  virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
 +g_free(req);
  } else if (type  VIRTIO_BLK_T_OUT) {
  qemu_iovec_init_external(req-qiov, req-elem.out_sg[1],
   req-elem.out_num - 1);

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH 2/2] [FYI] coverage test for Linux installs

2011-08-23 Thread Ryan Harper

* Anthony Liguori aligu...@us.ibm.com [2011-08-08 14:33]:
 This is a simple tool that I've been using for the past couple months to help
 with day-to-day testing of changes.  It may seem like it's similar to
 kvm-autotest but it's actually quite different.  Most noticably:
 
  - It's a coverage test instead of an acceptance test.  Each time it runs it
uses a slightly a semi-random configuration for the guest.  This means that
the more you run it, the more coverage you get.  This is a good fit for
maintainer testing because you get wide testing without having to wait for
48-hour test cycles.
 
  - It runs in the foreground as an unprivileged user.  This means I can kick 
 it
off in a terminal while I go off and do something else, but still keep an
eye on what's going on.
 
  - It works with TCG guests too and includes a TCG test case for the pseries
Power machine.
 
 That said, it's *not* a replacement for KVM autotest.  It is not a good tool
 for doing acceptance testing, like you would do to cut a new QEMU release.  
 But
 perhaps there's some behavior that could be leveraged by KVM autotest in the
 future here.
 
 I'm not proposing this for tree inclusion right now.  Just sharing a tool that
 I've found to be useful.  I really just want the previous patch to go in so 
 that
 I can stop carrying that patch privately.
 
 Right now, you need to setup an ISO directory to use the test tool.  After
 copy-on-read lands in the tree, I plan on making it create CoR files backing 
 to
 http so that no explicit setup is required.
 ---
  test-linux.c |  549 
 ++
  1 files changed, 549 insertions(+), 0 deletions(-)
  create mode 100644 test-linux.c

I needed the following changes to the Makefile to build this..


diff --git a/Makefile b/Makefile
index 8606849..b7df0e1 100644
--- a/Makefile
+++ b/Makefile
@@ -162,6 +162,7 @@ check-qlist: check-qlist.o qlist.o qint.o $(CHECK_PROG_DEPS)
 check-qfloat: check-qfloat.o qfloat.o $(CHECK_PROG_DEPS)
 check-qjson: check-qjson.o qfloat.o qint.o qdict.o qstring.o qlist.o qbool.o 
qjson.o json-streamer.o json-lexer.o js
 test-coroutine: test-coroutine.o qemu-timer-common.o async.o 
$(coroutine-obj-y) $(CHECK_PROG_DEPS)
+test-linux: test-linux.o $(CHECK_PROG_DEPS)
 
 $(qapi-obj-y): $(GENERATED_HEADERS)
 qapi-dir := qapi-generated


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] Block layer roadmap on wiki

2011-08-22 Thread Ryan Harper

* Stefan Hajnoczi stefa...@gmail.com [2011-08-22 08:35]:
 At KVM Forum Kevin, Christoph, and I had an opportunity to get
 together for a Block Layer BoF.  We went through the recent roadmap
 mailing list thread and touched on each proposed feature.
 
 Here is the block layer roadmap wiki page:
 http://wiki.qemu.org/BlockRoadmap
 
 Kevin: I have moved the runtime WCE toggling to QEMU 1.0 since you
 mentioned you want it for the next release.
 
 My main take-away from the BoF was that integrating support for host
 block devices and storage appliances will allow us to reduce the
 amount of effort spent on image formats.  In order to make image
 formats support the desired features and performance we end up
 implementing much of the storage stack and file systems in userspace -
 code that is duplicated and cannot take advantage of the existing
 storage stack.

+1

 
 Storage management features are not just available in remote SAN and
 NAS appliances anymore.  For local storage, btrfs has file-level
 clones and thin-dev is significantly improving LVM snapshots.
 
 Thin-dev is bringing a much more efficient and scalable snapshot model
 to LVM.  This device-mapper feature will make LVM attractive for high
 performance I/O without giving up snapshot and clone features.  It
 also supports cloning off block devices that are not in the pool (e.g.
 external storage, much like QEMU's backing files feature):
 https://github.com/jthornber/linux-2.6/tree/thin-dev
 
 This will not replace image formats overnight because image formats
 are still widely used and will continue to be a useful for
 transferring and sharing disk images.  But focussing on the larger

Any thoughts on how to make this easily usable for LVM?  If there were
an export/import to/from file to LVM?  is that sufficient?  Anything
like this in existence?

 storage stack where either local LVM, btrfs, or storage appliances do
 the storage management means we exploit those options instead of
 implementing equivalent functionality ourselves.  QEMU then runs with
 plain old raw in more cases.
 
 Stefan

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH 0/3] better I/O accounting V2

2011-08-22 Thread Ryan Harper

[cookie-type] += cookie-bytes;
240 bs-nr_ops[cookie-type]++;
241 bs-total_time_ns[cookie-type] += get_clock() - 
cookie-start_time_ns;
242 }
243


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH 0/3] better I/O accounting V2

2011-08-22 Thread Ryan Harper

* Christoph Hellwig h...@lst.de [2011-08-22 10:16]:
 On Mon, Aug 22, 2011 at 09:59:16AM -0500, Ryan Harper wrote:
  * Christoph Hellwig h...@lst.de [2011-08-21 17:27]:
   changes since V1:
- rebase to apply against the current qemu.git tree
  
  
  Testing on tip + this series with:
  
  ./configure --enable-debug --enable-kvm --enable-io-thread 
  --target-list=x86_64-softmmu
 
 That is about the same config that I run.
 
  [New Thread 0x74f27700 (LWP 18033)]
  
  Program received signal SIGSEGV, Segmentation fault.
  0x004200c1 in bdrv_acct_done (bs=0x12310b0, cookie=0x1c69f50) at 
  /root/git/qemu/block_int.h:239
  239 bs-nr_bytes[cookie-type] += cookie-bytes;
 
 I can't see how this can result in a segfault, unless we get a corrupted
 cookie.  Can you print what cookie-type is from gdb?


(gdb) frame 0
#0  0x004200c1 in bdrv_acct_done (bs=0x12310b0, cookie=0x1c68810) at 
/root/git/qemu/block_int.h:239 239 bs-nr_bytes[cookie-type] += 
cookie-bytes;
(gdb) p *cookie
$3 = {bytes = 72057589759737855, start_time_ns = 72057589759737855, type = 
16777215}


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH 0/3] better I/O accounting V2

2011-08-22 Thread Ryan Harper

* Christoph Hellwig h...@lst.de [2011-08-22 10:37]:
 On Mon, Aug 22, 2011 at 10:29:11AM -0500, Ryan Harper wrote:
  (gdb) frame 0
  #0  0x004200c1 in bdrv_acct_done (bs=0x12310b0, cookie=0x1c68810) 
  at /root/git/qemu/block_int.h:239 239 bs-nr_bytes[cookie-type] += 
  cookie-bytes;
  (gdb) p *cookie
  $3 = {bytes = 72057589759737855, start_time_ns = 72057589759737855, type = 
  16777215}
 
 So it is indeed corrupted.  I'll try to figure out how that could have
 happened.

So, I believe this is how it's happening.

we start accounting on a write which is turned into a multiwrite via
virtio_blk_handle_write() which calls virtio_submit_multiwrite().

Then when the multiwrite completes, we invoke virtio_blk_rw_complete()
on each part of the multiwrite.  None of these requests have had their
acct structure initialized since there was just *one* initial write.
We could do a bdrv_acct_start() on each req, but that would break the
concept of hiding the additional writes under the initial request.

So ensuring that the acct field is initialed when the request is
allocated will fix the issue.

With this patch, I don't see the crash anymore.

Signed-off-by: Ryan Harper ry...@us.ibm.com


diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 2660d1d..e746917 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -123,6 +123,7 @@ static VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock 
*s)
 req-dev = s;
 req-qiov.size = 0;
 req-next = NULL;
+memset(req-acct, 0, sizeof(BlockAcctCookie));
 return req;
 }



-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] Block layer roadmap on wiki

2011-08-22 Thread Ryan Harper

* Stefan Hajnoczi stefa...@gmail.com [2011-08-22 15:32]:
 On Mon, Aug 22, 2011 at 8:04 PM, Anthony Liguori anth...@codemonkey.ws 
 wrote:
  On 08/22/2011 08:34 AM, Stefan Hajnoczi wrote:
 
  At KVM Forum Kevin, Christoph, and I had an opportunity to get
  together for a Block Layer BoF.  We went through the recent roadmap
  mailing list thread and touched on each proposed feature.
 
  Here is the block layer roadmap wiki page:
  http://wiki.qemu.org/BlockRoadmap
 
  Kevin: I have moved the runtime WCE toggling to QEMU 1.0 since you
  mentioned you want it for the next release.
 
  My main take-away from the BoF was that integrating support for host
  block devices and storage appliances will allow us to reduce the
  amount of effort spent on image formats.  In order to make image
  formats support the desired features and performance we end up
  implementing much of the storage stack and file systems in userspace -
  code that is duplicated and cannot take advantage of the existing
  storage stack.
 
  The flip side is, tighter integration either makes features hard to consume
  or makes QEMU enter a space it currently hasn't.  Many features require root
  privileges to configure and a system-wide scope.  That's not QEMU today.
 
 QEMU itself should be about emulation and virtualization.  Storage
 management needs to be done outside of QEMU.  Today you can already
 take an LVM snapshot - it happens outside of QEMU.  It's at the
 libvirt level where different storage systems get abstracted (LVM,
 directory, iSCSI, etc) and there is a single API/command set to invoke
 management functions.  But even without libvirt you can do it
 yourself, and I think this separation makes sense so that QEMU can be
 focussed on running a single VM rather than managing storage.
 
  In addition, it makes QEMU tied to a specific platform (most likely Linux).
 
 QEMU will still work but certain features might not be available.  For
 example, this is true today if you're using a storage appliance that
 does deduplication - that's a feature you're getting on top of the
 emulation/virtualization that QEMU does.  But it doesn't tie QEMU to a
 particular platform.
 
  You could certainly rm -rf block/* and still be able to accomplish much of
  what's done today but it would be extremely painful to do in practice.  We
  have to find a balance of not reinventing things and making sure that simple
  things are simple to do.
 
 We wouldn't rm -rf block/* because we still need qemu-nbd.  It
 probably makes sense to keep what we have today.  I'm talking more
 about a shift from writing our own image format to integrating
 existing storage support.

I think this is a key point.  While I do like the idea of keeping QEMU
focused on single VM, I think we don't help ourselves by not consuming
the hypervisor platform services and integrating/exploiting those
features to make using QEMU easier.

That said, it does mean that some things like system-wide config and
privs are hard and aren't strictly virtualization issues, but that
doesn't mean we can't integrate some sort of solution.

 
  That may require tighter integration and more focus on the higher up pieces
  in the stack to really enable this.
 
 Yes, exactly.  Much of it shouldn't be inside QEMU.
 
 Stefan

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH v4 0/3] The intro for QEMU disk I/O limits

2011-08-01 Thread Ryan Harper

* Zhi Yong Wu wu...@linux.vnet.ibm.com [2011-08-01 01:30]:
 The main goal of the patch is to effectively cap the disk I/O speed or counts 
 of one single VM.It is only one draft, so it unavoidably has some drawbacks, 
 if you catch them, please let me know.
 
 The patch will mainly introduce one block I/O throttling algorithm, one timer 
 and one block queue for each I/O limits enabled drive.
 
 When a block request is coming in, the throttling algorithm will check if its 
 I/O rate or counts exceed the limits; if yes, then it will enqueue to the 
 block queue; The timer will handle the I/O requests in it.
 
 Some available features follow as below:
 (1) global bps limit.
 -drive bps=xxxin bytes/s
 (2) only read bps limit
 -drive bps_rd=xxx in bytes/s
 (3) only write bps limit
 -drive bps_wr=xxx in bytes/s
 (4) global iops limit
 -drive iops=xxx   in ios/s
 (5) only read iops limit
 -drive iops_rd=xxxin ios/s
 (6) only write iops limit
 -drive iops_wr=xxxin ios/s
 (7) the combination of some limits.
 -drive bps=xxx,iops=xxx
 
 Known Limitations:
 (1) #1 can not coexist with #2, #3
 (2) #4 can not coexist with #5, #6
 (3) When bps/iops limits are specified to a small value such as 511 bytes/s, 
 this VM will hang up. We are considering how to handle this senario.
 
 
 Zhi Yong Wu (3):
   v4: fix memory leaking based on ryan's feedback. 

It looks like the leak has been fixed, but I think we need to rework how
the blk-queue is using the AIOPool.  I'll reply to that patch.



-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH v4 2/3] The support for block queue

2011-08-01 Thread Ryan Harper

* Zhi Yong Wu wu...@linux.vnet.ibm.com [2011-08-01 01:30]:

 +static AIOPool block_queue_pool = {
 +.aiocb_size = sizeof(struct BlockDriverAIOCB),
 +.cancel = qemu_block_queue_cancel,
 +};
 +
 +static void qemu_block_queue_callback(void *opaque, int ret)
 +{
 +BlockDriverAIOCB *acb = opaque;
 +
 +qemu_aio_release(acb);
 +}
 +

So, here we really want to invoke the original commands callback, and
then free the request here (via qemu_aio_release()).  see below.


 +BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
 +BlockDriverState *bs,
 +BlockRequestHandler *handler,
 +int64_t sector_num,
 +QEMUIOVector *qiov,
 +int nb_sectors,
 +BlockDriverCompletionFunc *cb,
 +void *opaque)
 +{
 +BlockIORequest *request;
 +BlockDriverAIOCB *acb;
 +
 +request = qemu_malloc(sizeof(BlockIORequest));
 +request-bs = bs;
 +request-handler = handler;
 +request-sector_num = sector_num;
 +request-qiov = qiov;
 +request-nb_sectors = nb_sectors;
 +request-cb = cb;
 +request-opaque = opaque;
 +
 +QTAILQ_INSERT_TAIL(queue-requests, request, entry);
 +
 +acb = qemu_aio_get(block_queue_pool, bs,
 +   qemu_block_queue_callback, opaque);
 +request-acb = acb;
 +
 +return acb;
 +}
 +
 +int qemu_block_queue_handler(BlockIORequest *request)
 +{
 +int ret;
 +BlockDriverAIOCB *res;
 +
 +/* indicate this req is from block queue */
 +request-bs-req_from_queue = true;
 +
 +res = request-handler(request-bs, request-sector_num,
 +   request-qiov, request-nb_sectors,
 +   request-cb, request-opaque);
 +
 +if (request-acb) {
 +qemu_block_queue_callback(request-acb, 0);
 +}
 +
 +ret = (res == NULL) ? 0 : 1;
 +
 +return ret;
 +}


You don't want to malloc the BlockIORequest directly in _enqueue(), rather
you want to allocate that from your AIOPool via qemu_aio_get().  As it
is now, we're allocating two BlockIORequests (malloc and then a
aio_get()).  You'll need a BlockDriverAIOCB in the BlockIORequest
structure.  Then in your queue_handler, instead of passing the original
read or write callback (request-cb), you want to hook the block_queue callback
(qemu_block_queue_callback()), in that callback you can then invoke the
request callback and then release the request.

The request should be, only one malloc (via the pool which will re-use
the memory instead of incuring a malloc on every request), and then you 
release the memory back to the pool once your request is complete, which
you'll know after wiring up the block_queue callback to the completion
of the request's handler.  And then since we don't double allocate, you
won't need to do the qemu_free(request) in block.c in block_timer...


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH v4 3/3] The support for queue timer and throttling algorithm

2011-08-01 Thread Ryan Harper

;
 +bs-io_disps.ios[BLOCK_IO_LIMIT_WRITE]++;
 +}
 +}
 +
 +if (bdrv_io_limits_enable(bs-io_limits)) {
 +bs-req_from_queue = false;
  }
 
  return ret;
 diff --git a/block.h b/block.h
 index 859d1d9..f0dac62 100644
 --- a/block.h
 +++ b/block.h
 @@ -97,7 +97,6 @@ int bdrv_change_backing_file(BlockDriverState *bs,
  const char *backing_file, const char *backing_fmt);
  void bdrv_register(BlockDriver *bdrv);
 
 -
  typedef struct BdrvCheckResult {
  int corruptions;
  int leaks;
 diff --git a/block_int.h b/block_int.h
 index 1e265d2..1ca826b 100644
 --- a/block_int.h
 +++ b/block_int.h
 @@ -27,10 +27,17 @@
  #include block.h
  #include qemu-option.h
  #include qemu-queue.h
 +#include block/blk-queue.h
 
  #define BLOCK_FLAG_ENCRYPT   1
  #define BLOCK_FLAG_COMPAT6   4
 
 +#define BLOCK_IO_LIMIT_READ 0
 +#define BLOCK_IO_LIMIT_WRITE1
 +#define BLOCK_IO_LIMIT_TOTAL2
 +
 +#define BLOCK_IO_SLICE_TIME 1
 +
  #define BLOCK_OPT_SIZE  size
  #define BLOCK_OPT_ENCRYPT   encryption
  #define BLOCK_OPT_COMPAT6   compat6
 @@ -46,6 +53,16 @@ typedef struct AIOPool {
  BlockDriverAIOCB *free_aiocb;
  } AIOPool;
 
 +typedef struct BlockIOLimit {
 +uint64_t bps[3];
 +uint64_t iops[3];
 +} BlockIOLimit;
 +
 +typedef struct BlockIODisp {
 +uint64_t bytes[2];
 +uint64_t ios[2];
 +} BlockIODisp;
 +
  struct BlockDriver {
  const char *format_name;
  int instance_size;
 @@ -175,6 +192,15 @@ struct BlockDriverState {
 
  void *sync_aiocb;
 
 +/* the time for latest disk I/O */
 +int64_t slice_start[2];
 +int64_t slice_end[2];
 +BlockIOLimit io_limits;
 +BlockIODisp  io_disps;
 +BlockQueue   *block_queue;
 +QEMUTimer*block_timer;
 +bool req_from_queue;
 +
  /* I/O stats (display with info blockstats). */
  uint64_t rd_bytes;
  uint64_t wr_bytes;
 @@ -222,6 +248,9 @@ void qemu_aio_release(void *p);
 
  void *qemu_blockalign(BlockDriverState *bs, size_t size);
 
 +void bdrv_set_io_limits(BlockDriverState *bs,
 +BlockIOLimit *io_limits);
 +
  #ifdef _WIN32
  int is_windows_drive(const char *filename);
  #endif
 -- 
 1.7.2.3
 

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH v3 0/2] The intro for QEMU disk I/O limits

2011-07-31 Thread Ryan Harper

* Zhi Yong Wu wu...@linux.vnet.ibm.com [2011-07-28 05:53]:
 The main goal of the patch is to effectively cap the disk I/O speed or counts 
 of one single VM.It is only one draft, so it unavoidably has some drawbacks, 
 if you catch them, please let me know.
 
 The patch will mainly introduce one block I/O throttling algorithm, one timer 
 and one block queue for each I/O limits enabled drive.
 
 When a block request is coming in, the throttling algorithm will check if its 
 I/O rate or counts exceed the limits; if yes, then it will enqueue to the 
 block queue; The timer will periodically handle the I/O requests in it.
 
 Some available features follow as below:
 (1) global bps limit.
 -drive bps=xxxin bytes/s
 (2) only read bps limit
 -drive bps_rd=xxx in bytes/s
 (3) only write bps limit
 -drive bps_wr=xxx in bytes/s
 (4) global iops limit
 -drive iops=xxx   in ios/s
 (5) only read iops limit
 -drive iops_rd=xxxin ios/s
 (6) only write iops limit
 -drive iops_wr=xxxin ios/s
 (7) the combination of some limits.
 -drive bps=xxx,iops=xxx
 
 Known Limitations:
 (1) #1 can not coexist with #2, #3
 (2) #4 can not coexist with #5, #6
 (3) When bps/iops limits are specified to a small value such as 511 bytes/s, 
 this VM will hang up. We are considering how to handle this senario.
 

I don't yet have detailed info , but we've got a memory leak in the
code.  After running the VM with a 1MB r and w limit for 8 hours or
so:

-drive bps_rd=$((1*1024*1024)),bps_wr=$((1*1024*1024))

I've got my system swapping with 43G resident in memory:

9913 root  20   0 87.3g  43g  548 D  9.6 34.5  44:00.87 qemu-system-x86


would be worth looking through the code and maybe a valgrind run to
catch the leak.


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] qemu - SCSI disk Device Model, Serial Number, and Firmware Version?

2011-07-11 Thread Ryan Harper

* Dave Seddon d...@seddon.ca [2011-06-07 02:28]:
 Greetings,
 
 Just wondering if it would be difficult to add the ability to define the
 SCSI disk Device Model, Serial Number, and Firmware Version.  I've
 been using the '-device lsi' successfully to emulate the LSI controller,
 but now I want to emulate certain disks too.
 
 e.g.  I've been using this:
 ---
 ...
   -drive 
 if=none,id=disk00,file=/home/das/documents/qemu/disk00.img.qcow,media=disk,cache=writeback
  \
   -device lsi \
   -device scsi-disk,drive=disk00,bus=scsi.0 \
 ...
 ---
 
 
 The reason this would be really cool is that tools like smartmontools
 seem to match on the Device Model, and the device-model QEMU hasn't
 made it into the list yet.
 

What's the end goal here?  Are you really passing an entire disk to the
guest and want to have the guest monitor the device? If not, then I
suggest just disabling smartmontools since it won't give you meaningful
data anyhow.



-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [RFC]QEMU disk I/O limits

2011-05-31 Thread Ryan Harper

* Daniel P. Berrange berra...@redhat.com [2011-05-31 09:25]:
 On Tue, May 31, 2011 at 10:10:37AM -0400, Vivek Goyal wrote:
  On Tue, May 31, 2011 at 02:56:46PM +0100, Daniel P. Berrange wrote:
   On Tue, May 31, 2011 at 09:45:37AM -0400, Vivek Goyal wrote:
On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote:
 Hello, all,
 
 I have prepared to work on a feature called Disk I/O limits for 
 qemu-kvm projeect.
 This feature will enable the user to cap disk I/O amount 
 performed by a VM.It is important for some storage resources to be 
 shared among multi-VMs. As you've known, if some of VMs are doing 
 excessive disk I/O, they will hurt the performance of other VMs.
 

Hi Zhiyong,

Why not use kernel blkio controller for this and why reinvent the wheel
and implement the feature again in qemu?
   
   The finest level of granularity offered by cgroups apply limits per QEMU
   process. So the blkio controller can't be used to apply controls directly
   to individual disks used by QEMU, only the VM as a whole.
  
  So are multiple VMs using same disk. Then put multiple VMs in same
  cgroup and apply the limit on that disk.
  
  Or if you want to put a system wide limit on a disk, then put all
  VMs in root cgroup and put limit on root cgroups.
  
  I fail to understand what's the exact requirement here. I thought
  the biggest use case was isolation one VM from other which might
  be sharing same device. Hence we were interested in putting 
  per VM limit on disk and not a system wide limit on disk (independent
  of VM).
 
 No, it isn't about putting limits on a disk independant of a VM. It is
 about one VM having multiple disks, and wanting to set different policies
 for each of its virtual disks. eg
 
   qemu-kvm -drive file=/dev/sda1 -drive file=/dev/sdb3
 
 and wanting to say that sda1 is limited to 10 MB/s, while sdb3 is
 limited to 50 MB/s.  You can't do that kind of thing with cgroups,
 because it can only control the entire process, not individual
 resources within the process.

yes, but with files:

qemu-kvm -drive file=/path/to/local/vm/images 
 -drive file=/path/to/shared/storage 


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH v3 1/2] cdrom: Allow the TEST_UNIT_READY command after a cdrom change

2011-04-08 Thread Ryan Harper

* Amit Shah amit.s...@redhat.com [2011-04-07 03:53]:
 We restrict the commands that a guest can send us after a cdrom change
 event.  The current list includes REQUEST_SENSE and INQUIRY commands.
 Guests can also issue TEST_UNIT_READY to inquire for the status, so
 allow this command as well.
 
 This also gets rid of one cause of the HSM violation errors in Linux
 guests.  Those errors came up because we had the UNIT_ATTENTION event
 pending and we replied with an error message to a command that should be
 allowed in such a condition.  The guest then did a soft reset to get to
 a sane state.
 
 Signed-off-by: Amit Shah amit.s...@redhat.com
 ---
  hw/ide/core.c |9 +
  1 files changed, 5 insertions(+), 4 deletions(-)
 
 diff --git a/hw/ide/core.c b/hw/ide/core.c
 index 007a4ee..d55d804 100644
 --- a/hw/ide/core.c
 +++ b/hw/ide/core.c
 @@ -1105,10 +1105,11 @@ static void ide_atapi_cmd(IDEState *s)
  /* If there's a UNIT_ATTENTION condition pending, only
 REQUEST_SENSE and INQUIRY commands are allowed to complete. */

Wasn't this comment going to get updated to be more generic (ie more
than just the 3 commands below may be needed?)

  if (s-sense_key == SENSE_UNIT_ATTENTION 
 - s-io_buffer[0] != GPCMD_REQUEST_SENSE 
 - s-io_buffer[0] != GPCMD_INQUIRY) {
 - ide_atapi_cmd_check_status(s);
 - return;
 +s-io_buffer[0] != GPCMD_REQUEST_SENSE 
 +s-io_buffer[0] != GPCMD_INQUIRY 
 +s-io_buffer[0] != GPCMD_TEST_UNIT_READY) {
 +ide_atapi_cmd_check_status(s);
 +return;
  }
  switch(s-io_buffer[0]) {
  case GPCMD_TEST_UNIT_READY:
 -- 
 1.7.4
 

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH v3] Do not delete BlockDriverState when deleting the drive

2011-03-29 Thread Ryan Harper

* Markus Armbruster arm...@redhat.com [2011-03-29 02:44]:
 Ryan Harper ry...@us.ibm.com writes:
 
  When removing a drive from the host-side via drive_del we currently have the
  following path:
 
  drive_del
  qemu_aio_flush()
  bdrv_close()// zaps bs-drv, which makes any subsequent I/O get
  // dropped.  Works as designed
  drive_uninit()
  bdrv_delete()   // frees the bs.  Since the device is still connected to
  // bs, any subsequent I/O is a use-after-free.
 
  The value of bs-drv becomes unpredictable on free.  As long as it
  remains null, I/O still gets dropped, however it could become non-null at 
  any
  point after the free resulting SEGVs or other QEMU state corruption.
 
  To resolve this issue as simply as possible, we can chose to not actually
  delete the BlockDriverState pointer.  Since bdrv_close() handles setting 
  the drv
  pointer to NULL, we just need to remove the BlockDriverState from the QLIST
  that is used to enumerate the block devices.  This is currently handled 
  within
  bdrv_delete, so move this into its own function, bdrv_make_anon().
 
  The result is that we can now invoke drive_del, this closes the file 
  descriptors
  and sets BlockDriverState-drv to NULL which prevents futher IO to the 
  device,
  and since we do not free BlockDriverState, we don't have to worry about the 
  copy
  retained in the block devices.
 
  We also don't attempt to remove the qdev property since we are no longer 
  deleting
  the BlockDriverState on drives with associated drives.  This also allows for
  removing Drives with no devices associated either.
 
  Reported-by: Markus Armbruster arm...@redhat.com
  Signed-off-by: Ryan Harper ry...@us.ibm.com
  ---
  v2-v3
   - Update drive_del use after free description
   - s/bdrv_remove/bdrv_make_anon/g
   - Don't remove qdev property since we don't delete bs any more
   - If (bs-peer) bdrv_make_anon else bdrv_delete to handle removing
 drives with no device.
  v1-v2
- NULL bs-device_name after removing from list to prevent
  second removal.
 
   block.c|   13 ++---
   block.h|1 +
   blockdev.c |   25 -
   3 files changed, 19 insertions(+), 20 deletions(-)
 
  diff --git a/block.c b/block.c
  index c8e2f97..6a5d3f2 100644
  --- a/block.c
  +++ b/block.c
  @@ -697,14 +697,21 @@ void bdrv_close_all(void)
   }
   }
   
  +/* make a BlockDriverState anonymous by removing from bdrv_state list.
  +   Also, NULL terminate the device_name to prevent double remove */
  +void bdrv_make_anon(BlockDriverState *bs)
  +{
  +if (bs-device_name[0] != '\0') {
  +QTAILQ_REMOVE(bdrv_states, bs, list);
  +}
 
 You lost
 
 +bs-device_name[0] = '\0';
 
 since v2.  Oops.

Crap.  

 
  +}
  +
   void bdrv_delete(BlockDriverState *bs)
   {
   assert(!bs-peer);
   
   /* remove from list, if necessary */
  -if (bs-device_name[0] != '\0') {
  -QTAILQ_REMOVE(bdrv_states, bs, list);
  -}
  +bdrv_make_anon(bs);
   
   bdrv_close(bs);
   if (bs-file != NULL) {
  diff --git a/block.h b/block.h
  index 5d78fc0..52e9cad 100644
  --- a/block.h
  +++ b/block.h
  @@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
   QEMUOptionParameter *options);
   int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
   BlockDriverState *bdrv_new(const char *device_name);
  +void bdrv_make_anon(BlockDriverState *bs);
   void bdrv_delete(BlockDriverState *bs);
   int bdrv_file_open(BlockDriverState **pbs, const char *filename, int 
  flags);
   int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
  diff --git a/blockdev.c b/blockdev.c
  index ecf2252..2c0eb06 100644
  --- a/blockdev.c
  +++ b/blockdev.c
  @@ -737,8 +737,6 @@ int do_drive_del(Monitor *mon, const QDict *qdict, 
  QObject **ret_data)
   {
   const char *id = qdict_get_str(qdict, id);
   BlockDriverState *bs;
  -BlockDriverState **ptr;
  -Property *prop;
   
   bs = bdrv_find(id);
   if (!bs) {
  @@ -755,24 +753,17 @@ int do_drive_del(Monitor *mon, const QDict *qdict, 
  QObject **ret_data)
   bdrv_flush(bs);
   bdrv_close(bs);
   
  -/* clean up guest state from pointing to host resource by
  - * finding and removing DeviceState drive property */
  +/* if we have a device associated with this BlockDriverState (bs-peer)
  + * then we need to make the drive anonymous until the device
  + * can be removed.  If this is a drive with no device backing
  + * then we can just get rid of the block driver state right here.
  + */
   if (bs-peer) {
  -for (prop = bs-peer-info-props; prop  prop-name; prop++) {
  -if (prop-info-type == PROP_TYPE_DRIVE) {
  -ptr = qdev_get_prop_ptr(bs-peer, prop);
  -if (*ptr == bs) {
  -bdrv_detach(bs, bs-peer);
  -*ptr = NULL

Re: [Qemu-devel] [PATCH v3] Do not delete BlockDriverState when deleting the drive

2011-03-29 Thread Ryan Harper

* Markus Armbruster arm...@redhat.com [2011-03-29 04:06]:
 Since you have to respin anyway, would you mind limiting commit message
 line length to 70-75 characters?  Thanks.

yep


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

[Qemu-devel] [PATCH v4] Do not delete BlockDriverState when deleting the drive

2011-03-29 Thread Ryan Harper

When removing a drive from the host-side via drive_del we currently have
the following path:

drive_del
qemu_aio_flush()
bdrv_close()// zaps bs-drv, which makes any subsequent I/O get
// dropped.  Works as designed
drive_uninit()
bdrv_delete()   // frees the bs.  Since the device is still connected to
// bs, any subsequent I/O is a use-after-free.

The value of bs-drv becomes unpredictable on free.  As long as it
remains null, I/O still gets dropped, however it could become non-null
at any point after the free resulting SEGVs or other QEMU state
corruption.

To resolve this issue as simply as possible, we can chose to not
actually delete the BlockDriverState pointer.  Since bdrv_close()
handles setting the drv pointer to NULL, we just need to remove the
BlockDriverState from the QLIST that is used to enumerate the block
devices.  This is currently handled within bdrv_delete, so move this
into its own function, bdrv_make_anon().

The result is that we can now invoke drive_del, this closes the file
descriptors and sets BlockDriverState-drv to NULL which prevents futher
IO to the device, and since we do not free BlockDriverState, we don't
have to worry about the copy retained in the block devices.

We also don't attempt to remove the qdev property since we are no longer
deleting the BlockDriverState on drives with associated drives.  This
also allows for removing Drives with no devices associated either.

Reported-by: Markus Armbruster arm...@redhat.com
Signed-off-by: Ryan Harper ry...@us.ibm.com
---
v3-v4
 - add back missing nulling of device_name in v2
 - use drive_uninit() when removing drive with no peer
 - align commit message on 72 char boundary
v2-v3
 - Update drive_del use after free description 
 - s/bdrv_remove/bdrv_make_anon/g
 - Don't remove qdev property since we don't delete bs any more
 - If (bs-peer) bdrv_make_anon else bdrv_delete to handle removing
   drives with no device.
v1-v2
  - NULL bs-device_name after removing from list to prevent
second removal.

 block.c|   14 +++---
 block.h|1 +
 blockdev.c |   25 -
 3 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/block.c b/block.c
index c8e2f97..c93ec6d 100644
--- a/block.c
+++ b/block.c
@@ -697,14 +697,22 @@ void bdrv_close_all(void)
 }
 }
 
+/* make a BlockDriverState anonymous by removing from bdrv_state list.
+   Also, NULL terminate the device_name to prevent double remove */
+void bdrv_make_anon(BlockDriverState *bs)
+{
+if (bs-device_name[0] != '\0') {
+QTAILQ_REMOVE(bdrv_states, bs, list);
+}
+bs-device_name[0] = '\0';
+}
+
 void bdrv_delete(BlockDriverState *bs)
 {
 assert(!bs-peer);
 
 /* remove from list, if necessary */
-if (bs-device_name[0] != '\0') {
-QTAILQ_REMOVE(bdrv_states, bs, list);
-}
+bdrv_make_anon(bs);
 
 bdrv_close(bs);
 if (bs-file != NULL) {
diff --git a/block.h b/block.h
index 5d78fc0..52e9cad 100644
--- a/block.h
+++ b/block.h
@@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
 QEMUOptionParameter *options);
 int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
 BlockDriverState *bdrv_new(const char *device_name);
+void bdrv_make_anon(BlockDriverState *bs);
 void bdrv_delete(BlockDriverState *bs);
 int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
 int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
diff --git a/blockdev.c b/blockdev.c
index ecf2252..c810b16 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -737,8 +737,6 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject 
**ret_data)
 {
 const char *id = qdict_get_str(qdict, id);
 BlockDriverState *bs;
-BlockDriverState **ptr;
-Property *prop;
 
 bs = bdrv_find(id);
 if (!bs) {
@@ -755,24 +753,17 @@ int do_drive_del(Monitor *mon, const QDict *qdict, 
QObject **ret_data)
 bdrv_flush(bs);
 bdrv_close(bs);
 
-/* clean up guest state from pointing to host resource by
- * finding and removing DeviceState drive property */
+/* if we have a device associated with this BlockDriverState (bs-peer)
+ * then we need to make the drive anonymous until the device
+ * can be removed.  If this is a drive with no device backing
+ * then we can just get rid of the block driver state right here.
+ */
 if (bs-peer) {
-for (prop = bs-peer-info-props; prop  prop-name; prop++) {
-if (prop-info-type == PROP_TYPE_DRIVE) {
-ptr = qdev_get_prop_ptr(bs-peer, prop);
-if (*ptr == bs) {
-bdrv_detach(bs, bs-peer);
-*ptr = NULL;
-break;
-}
-}
-}
+bdrv_make_anon(bs);
+} else {
+drive_uninit(drive_get_by_blockdev(bs));
 }
 
-/* clean up host side */
-drive_uninit(drive_get_by_blockdev(bs));
-
 return 0

Re: [Qemu-devel] [PATCH v2] Do not delete BlockDriverState when deleting the drive

2011-03-28 Thread Ryan Harper

* Markus Armbruster arm...@redhat.com [2011-03-24 07:27]:
 Whoops, almost missed this.  Best to cc: me to avoid that.

It was sent directly to you:

   Sender: qemu-devel-bounces+ryanh=us.ibm@nongnu.org
   From: Ryan Harper ry...@us.ibm.com
   Subject: Re: [Qemu-devel] [PATCH v2] Do not delete BlockDriverState when 
 deleting the drive
   Date: Tue, 22 Mar 2011 20:53:47 -0500
   Message-ID: 20110323015347.ga20...@us.ibm.com
   User-Agent: Mutt/1.5.6+20040907i
   To: Markus Armbruster arm...@redhat.com
   Cc: Kevin Wolf kw...@redhat.com, Ryan Harper ry...@us.ibm.com,
   qemu-devel@nongnu.org

 Ryan Harper ry...@us.ibm.com writes:

  * Markus Armbruster arm...@redhat.com [2011-03-15 04:48]:
  Sorry for the long delay, I was out of action for a week.

  Ryan Harper ry...@us.ibm.com writes:

   When removing a drive from the host-side via drive_del we currently have 
   the
   following path:

   drive_del
   qemu_aio_flush()
   bdrv_close()
   drive_uninit()
   bdrv_delete()

   When we bdrv_delete() we end up qemu_free() the BlockDriverState pointer
   however, the block devices retain a copy of this pointer, see
   hw/virtio-blk.c:virtio_blk_init() where we s-bs = conf-bs.

   We now have a use-after-free situation.  If the guest continues to issue 
   IO
   against the device, and we've reallocated the memory that the 
   BlockDriverState
   pointed at, then we will fail the bs-drv checks in the various bdrv_ 
   methods.

  we will fail the bs-drv checks is misleading, in my opinion.  Here's
  what happens:

  1. bdrv_close(bs) zaps bs-drv, which makes any subsequent I/O get
 dropped.  Works as designed.

  2. drive_uninit() frees the bs.  Since the device is still connected to
 bs, any subsequent I/O is a use-after-free.

 The value of bs-drv becomes unpredictable on free.  As long as it
 remains null, I/O still gets dropped.  I/O crashes or worse once that
 changed.  Could be right on free, could be much later.

  If you respin anyway, please clarify your description.

  Sure.  I wasn't planning a new version, but I'll update and send anyhow
  as I didn't see it get included in pull from the block branch.

   To resolve this issue as simply as possible, we can chose to not actually
   delete the BlockDriverState pointer.  Since bdrv_close() handles setting 
   the drv
   pointer to NULL, we just need to remove the BlockDriverState from the 
   QLIST
   that is used to enumerate the block devices.  This is currently handled 
   within
   bdrv_delete, so move this into it's own function, bdrv_remove().

  Why do we remove the BlockDriverState from bdrv_states?  Because we want
  drive_del make its *name* go away.

  Begs the question: is the code prepared for a BlockDriverState object
  that isn't on bdrv_states?  Turns out we're in luck: bdrv_new() already
  creates such objects when the device_name is empty.  This is used for
  internal BlockDriverStates such as COW backing files.  Your code makes
  device_name empty when taking the object off bdrv_states, so we're good.

  Begs yet another question: how does the behavior of a BlockDriverState
  change when it's taken off bdrv_states, and is that the behavior we
  want?  Changes:

  * bdrv_delete() no longer takes it off bdrv_states.  Good.

  * bdrv_close_all(), bdrv_commit_all() and bdrv_flush_all() no longer
cover it.  Okay, because bdrv_close(), bdrv_commit() and bdrv_flush()
do nothing anyway for closed BlockDriverStates.

  * info block and info blockstats no longer show it, because
bdrv_info() and bdrv_info_stats() no longer see it.  Okay.

  * bdrv_find(), bdrv_next(), bdrv_iterate() no longer see it.  Impact?
Please check their uses and report.

 1664  block-migration.c block_load
   bs = bdrv_find(device_name);

  - no longer see it.  This is fine since we can't migrate a block
  device that has been removed

 2562  blockdev.c do_commit
   bs = bdrv_find(device);

  - do_commit won't see it in either when calling bdrv_commit_all()
  Fine as you mention above.  If user specifies the device name
  we won't find it, that's OK because we can't commit data against
  a closed BlockDriverState.

 3587  blockdev.c do_snapshot_blkdev
   bs = bdrv_find(device);

 - OK, cannot take a snapshot against a deleted BlockDriverState

 4662  blockdev.c do_eject
   bs = bdrv_find(filename);

 - OK, cannot eject a deleted BlockDriverState; 

 5676  blockdev.c do_block_set_passwd
   bs = bdrv_find(qdict_get_str(qdict, device));

 - OK, cannot set password a deleted BlockDriverState; 

 6701  blockdev.c do_change_block
   bs = bdrv_find(device);

 - OK, cannot change the file/device of a deleted BlockDriverState; 

 7732  blockdev.c do_drive_del
   bs = bdrv_find(id);

 - OK, cannot

[Qemu-devel] [PATCH v3] Do not delete BlockDriverState when deleting the drive

2011-03-28 Thread Ryan Harper

When removing a drive from the host-side via drive_del we currently have the
following path:

drive_del
qemu_aio_flush()
bdrv_close()// zaps bs-drv, which makes any subsequent I/O get
// dropped.  Works as designed
drive_uninit()
bdrv_delete()   // frees the bs.  Since the device is still connected to
// bs, any subsequent I/O is a use-after-free.

The value of bs-drv becomes unpredictable on free.  As long as it
remains null, I/O still gets dropped, however it could become non-null at any
point after the free resulting SEGVs or other QEMU state corruption.

To resolve this issue as simply as possible, we can chose to not actually
delete the BlockDriverState pointer.  Since bdrv_close() handles setting the drv
pointer to NULL, we just need to remove the BlockDriverState from the QLIST
that is used to enumerate the block devices.  This is currently handled within
bdrv_delete, so move this into its own function, bdrv_make_anon().

The result is that we can now invoke drive_del, this closes the file descriptors
and sets BlockDriverState-drv to NULL which prevents futher IO to the device,
and since we do not free BlockDriverState, we don't have to worry about the copy
retained in the block devices.

We also don't attempt to remove the qdev property since we are no longer 
deleting
the BlockDriverState on drives with associated drives.  This also allows for
removing Drives with no devices associated either.

Reported-by: Markus Armbruster arm...@redhat.com
Signed-off-by: Ryan Harper ry...@us.ibm.com
---
v2-v3
 - Update drive_del use after free description
 - s/bdrv_remove/bdrv_make_anon/g
 - Don't remove qdev property since we don't delete bs any more
 - If (bs-peer) bdrv_make_anon else bdrv_delete to handle removing
   drives with no device.
v1-v2
  - NULL bs-device_name after removing from list to prevent
second removal.

 block.c|   13 ++---
 block.h|1 +
 blockdev.c |   25 -
 3 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/block.c b/block.c
index c8e2f97..6a5d3f2 100644
--- a/block.c
+++ b/block.c
@@ -697,14 +697,21 @@ void bdrv_close_all(void)
 }
 }
 
+/* make a BlockDriverState anonymous by removing from bdrv_state list.
+   Also, NULL terminate the device_name to prevent double remove */
+void bdrv_make_anon(BlockDriverState *bs)
+{
+if (bs-device_name[0] != '\0') {
+QTAILQ_REMOVE(bdrv_states, bs, list);
+}
+}
+
 void bdrv_delete(BlockDriverState *bs)
 {
 assert(!bs-peer);
 
 /* remove from list, if necessary */
-if (bs-device_name[0] != '\0') {
-QTAILQ_REMOVE(bdrv_states, bs, list);
-}
+bdrv_make_anon(bs);
 
 bdrv_close(bs);
 if (bs-file != NULL) {
diff --git a/block.h b/block.h
index 5d78fc0..52e9cad 100644
--- a/block.h
+++ b/block.h
@@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
 QEMUOptionParameter *options);
 int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
 BlockDriverState *bdrv_new(const char *device_name);
+void bdrv_make_anon(BlockDriverState *bs);
 void bdrv_delete(BlockDriverState *bs);
 int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
 int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
diff --git a/blockdev.c b/blockdev.c
index ecf2252..2c0eb06 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -737,8 +737,6 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject 
**ret_data)
 {
 const char *id = qdict_get_str(qdict, id);
 BlockDriverState *bs;
-BlockDriverState **ptr;
-Property *prop;
 
 bs = bdrv_find(id);
 if (!bs) {
@@ -755,24 +753,17 @@ int do_drive_del(Monitor *mon, const QDict *qdict, 
QObject **ret_data)
 bdrv_flush(bs);
 bdrv_close(bs);
 
-/* clean up guest state from pointing to host resource by
- * finding and removing DeviceState drive property */
+/* if we have a device associated with this BlockDriverState (bs-peer)
+ * then we need to make the drive anonymous until the device
+ * can be removed.  If this is a drive with no device backing
+ * then we can just get rid of the block driver state right here.
+ */
 if (bs-peer) {
-for (prop = bs-peer-info-props; prop  prop-name; prop++) {
-if (prop-info-type == PROP_TYPE_DRIVE) {
-ptr = qdev_get_prop_ptr(bs-peer, prop);
-if (*ptr == bs) {
-bdrv_detach(bs, bs-peer);
-*ptr = NULL;
-break;
-}
-}
-}
+bdrv_make_anon(bs);
+} else {
+bdrv_delete(bs);
 }
 
-/* clean up host side */
-drive_uninit(drive_get_by_blockdev(bs));
-
 return 0;
 }
 
-- 
1.7.1

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH v2] Do not delete BlockDriverState when deleting the drive

2011-03-22 Thread Ryan Harper

* Markus Armbruster arm...@redhat.com [2011-03-15 04:48]:
 Sorry for the long delay, I was out of action for a week.
 
 Ryan Harper ry...@us.ibm.com writes:
 
  When removing a drive from the host-side via drive_del we currently have the
  following path:
 
  drive_del
  qemu_aio_flush()
  bdrv_close()
  drive_uninit()
  bdrv_delete()
 
  When we bdrv_delete() we end up qemu_free() the BlockDriverState pointer
  however, the block devices retain a copy of this pointer, see
  hw/virtio-blk.c:virtio_blk_init() where we s-bs = conf-bs.
 
  We now have a use-after-free situation.  If the guest continues to issue IO
  against the device, and we've reallocated the memory that the 
  BlockDriverState
  pointed at, then we will fail the bs-drv checks in the various bdrv_ 
  methods.
 
 we will fail the bs-drv checks is misleading, in my opinion.  Here's
 what happens:
 
 1. bdrv_close(bs) zaps bs-drv, which makes any subsequent I/O get
dropped.  Works as designed.
 
 2. drive_uninit() frees the bs.  Since the device is still connected to
bs, any subsequent I/O is a use-after-free.
 
The value of bs-drv becomes unpredictable on free.  As long as it
remains null, I/O still gets dropped.  I/O crashes or worse once that
changed.  Could be right on free, could be much later.
 
 If you respin anyway, please clarify your description.

Sure.  I wasn't planning a new version, but I'll update and send anyhow
as I didn't see it get included in pull from the block branch.
 
  To resolve this issue as simply as possible, we can chose to not actually
  delete the BlockDriverState pointer.  Since bdrv_close() handles setting 
  the drv
  pointer to NULL, we just need to remove the BlockDriverState from the QLIST
  that is used to enumerate the block devices.  This is currently handled 
  within
  bdrv_delete, so move this into it's own function, bdrv_remove().
 
 Why do we remove the BlockDriverState from bdrv_states?  Because we want
 drive_del make its *name* go away.
 
 Begs the question: is the code prepared for a BlockDriverState object
 that isn't on bdrv_states?  Turns out we're in luck: bdrv_new() already
 creates such objects when the device_name is empty.  This is used for
 internal BlockDriverStates such as COW backing files.  Your code makes
 device_name empty when taking the object off bdrv_states, so we're good.
 
 Begs yet another question: how does the behavior of a BlockDriverState
 change when it's taken off bdrv_states, and is that the behavior we
 want?  Changes:
 
 * bdrv_delete() no longer takes it off bdrv_states.  Good.
 
 * bdrv_close_all(), bdrv_commit_all() and bdrv_flush_all() no longer
   cover it.  Okay, because bdrv_close(), bdrv_commit() and bdrv_flush()
   do nothing anyway for closed BlockDriverStates.
 
 * info block and info blockstats no longer show it, because
   bdrv_info() and bdrv_info_stats() no longer see it.  Okay.
 
 * bdrv_find(), bdrv_next(), bdrv_iterate() no longer see it.  Impact?
   Please check their uses and report.

   1664  block-migration.c block_load
 bs = bdrv_find(device_name);

- no longer see it.  This is fine since we can't migrate a block
device that has been removed

   2562  blockdev.c do_commit
 bs = bdrv_find(device);

- do_commit won't see it in either when calling bdrv_commit_all()
Fine as you mention above.  If user specifies the device name
we won't find it, that's OK because we can't commit data against
a closed BlockDriverState.

   3587  blockdev.c do_snapshot_blkdev
 bs = bdrv_find(device);

   - OK, cannot take a snapshot against a deleted BlockDriverState

   4662  blockdev.c do_eject
 bs = bdrv_find(filename);

   - OK, cannot eject a deleted BlockDriverState; 

   5676  blockdev.c do_block_set_passwd
 bs = bdrv_find(qdict_get_str(qdict, device));

   - OK, cannot set password a deleted BlockDriverState; 

   6701  blockdev.c do_change_block
 bs = bdrv_find(device);

   - OK, cannot change the file/device of a deleted BlockDriverState; 

   7732  blockdev.c do_drive_del
 bs = bdrv_find(id);

   - OK, cannot delete an already deleted Drive

   8783  blockdev.c do_block_resize
 bs = bdrv_find(device);

   - OK, cannot resize a deleted Drive

   9312  hw/qdev-properties.c parse_drive
 bs = bdrv_find(str);

   - Used when invoking qdev_prop_drive .parse method;  parse method is invoked 
via
qdev_device_add() which calls set_property() which invokes parse.  AFAICT, 
this is OK
since we won't be going down the device add path worrying about a
deleted block device.

 
  The result is that we can now invoke drive_del, this closes the file 
  descriptors
  and sets BlockDriverState-drv to NULL which prevents futher IO to the 
  device,
  and since we do not free BlockDriverState, we don't have to worry about the 
  copy
  retained in the block devices.
 
 Yep

[Qemu-devel] [PATCH v2] Do not delete BlockDriverState when deleting the drive

2011-03-07 Thread Ryan Harper

When removing a drive from the host-side via drive_del we currently have the
following path:

drive_del
qemu_aio_flush()
bdrv_close()
drive_uninit()
bdrv_delete()

When we bdrv_delete() we end up qemu_free() the BlockDriverState pointer
however, the block devices retain a copy of this pointer, see
hw/virtio-blk.c:virtio_blk_init() where we s-bs = conf-bs.

We now have a use-after-free situation.  If the guest continues to issue IO
against the device, and we've reallocated the memory that the BlockDriverState
pointed at, then we will fail the bs-drv checks in the various bdrv_ methods.

To resolve this issue as simply as possible, we can chose to not actually
delete the BlockDriverState pointer.  Since bdrv_close() handles setting the drv
pointer to NULL, we just need to remove the BlockDriverState from the QLIST
that is used to enumerate the block devices.  This is currently handled within
bdrv_delete, so move this into it's own function, bdrv_remove().

The result is that we can now invoke drive_del, this closes the file descriptors
and sets BlockDriverState-drv to NULL which prevents futher IO to the device,
and since we do not free BlockDriverState, we don't have to worry about the copy
retained in the block devices.

Reported-by: Marcus Armbruster arm...@redhat.com
Signed-off-by: Ryan Harper ry...@us.ibm.com
---
v1-v2
 - NULL bs-device_name after removing from list to prevent
   second removal.

 block.c|   12 +---
 block.h|1 +
 blockdev.c |2 +-
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index 1544d81..0df9942 100644
--- a/block.c
+++ b/block.c
@@ -697,14 +697,20 @@ void bdrv_close_all(void)
 }
 }
 
+void bdrv_remove(BlockDriverState *bs)
+{
+if (bs-device_name[0] != '\0') {
+QTAILQ_REMOVE(bdrv_states, bs, list);
+}
+bs-device_name[0] = '\0';
+}
+
 void bdrv_delete(BlockDriverState *bs)
 {
 assert(!bs-peer);
 
 /* remove from list, if necessary */
-if (bs-device_name[0] != '\0') {
-QTAILQ_REMOVE(bdrv_states, bs, list);
-}
+bdrv_remove(bs);
 
 bdrv_close(bs);
 if (bs-file != NULL) {
diff --git a/block.h b/block.h
index 5d78fc0..8447397 100644
--- a/block.h
+++ b/block.h
@@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
 QEMUOptionParameter *options);
 int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
 BlockDriverState *bdrv_new(const char *device_name);
+void bdrv_remove(BlockDriverState *bs);
 void bdrv_delete(BlockDriverState *bs);
 int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
 int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
diff --git a/blockdev.c b/blockdev.c
index 0690cc8..1f57b50 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -760,7 +760,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject 
**ret_data)
 }
 
 /* clean up host side */
-drive_uninit(drive_get_by_blockdev(bs));
+bdrv_remove(bs);
 
 return 0;
 }
-- 
1.7.1


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

[Qemu-devel] [PATCH v2] Don't allow multiwrites against a block device without underlying medium

2011-03-07 Thread Ryan Harper

If the block device has been closed, we no longer have a medium to submit
IO against, check for this before submitting io.  This prevents a segfault
further in the code where we dereference elements of the block driver.

Signed-off-by: Ryan Harper ry...@us.ibm.com
---
v1-v2:
- move bs-drv check to top of function to match other bdrv_
functions
- fill out reqs response with error code before
  returning -1.

 block.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index f7d91a2..1544d81 100644
--- a/block.c
+++ b/block.c
@@ -2398,6 +2398,14 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, 
BlockRequest *reqs, int num_reqs)
 MultiwriteCB *mcb;
 int i;
 
+/* don't submit writes if we don't have a medium */
+if (bs-drv == NULL) {
+for (i = 0; i  num_reqs; i++) {
+reqs[i].error = -ENOMEDIUM;
+}
+return -1;
+}
+
 if (num_reqs == 0) {
 return 0;
 }
-- 
1.7.1


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

[Qemu-devel] [PATCH] Do not delete BlockDriverState when deleting the drive

2011-03-02 Thread Ryan Harper

When removing a drive from the host-side via drive_del we currently have the
following path:

drive_del
qemu_aio_flush()
bdrv_close()
drive_uninit()
bdrv_delete()

When we bdrv_delete() we end up qemu_free()'ing the BlockDriverState pointer
however, the block devices retain a copy of this pointer, see
hw/virtio-blk.c:virtio_blk_init() where we s-bs = conf-bs.

We now have a use-after-free situation.  If the guest continues to issue IO
against the device, and we've reallocated the memory that the BlockDriverState
pointed at, then we will fail the bs-drv checks in the various bdrv_ methods.

To resolve this issue as simply as possible, we can chose to not actually
delete the BlockDriverState pointer.  Since bdrv_close() handles setting the drv
pointer to NULL, we just need to remove the BlockDriverState from the QLIST
that is used to enumerate the block devices.  This is currently handled within
bdrv_delete, so move this into it's own function, bdrv_remove().

The result is that we can now invoke drive_del, this closes the file descriptors
and sets BlockDriverState-drv to NULL which prevents futher IO to the device,
and since we do not free BlockDriverState, we don't have to worry about the copy
retained in the block devices.  This state will be deleted if the guest
is asked and responds to a pci device removal request.

Reported-by: Markus Armbruster arm...@redhat.com
Signed-off-by: Ryan Harper ry...@us.ibm.com
---
 block.c|   11 ---
 block.h|1 +
 blockdev.c |2 +-
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index f7d91a2..92dd3fe 100644
--- a/block.c
+++ b/block.c
@@ -697,14 +697,19 @@ void bdrv_close_all(void)
 }
 }
 
+void bdrv_remove(BlockDriverState *bs)
+{
+if (bs-device_name[0] != '\0') {
+QTAILQ_REMOVE(bdrv_states, bs, list);
+}
+}
+
 void bdrv_delete(BlockDriverState *bs)
 {
 assert(!bs-peer);
 
 /* remove from list, if necessary */
-if (bs-device_name[0] != '\0') {
-QTAILQ_REMOVE(bdrv_states, bs, list);
-}
+bdrv_remove(bs);
 
 bdrv_close(bs);
 if (bs-file != NULL) {
diff --git a/block.h b/block.h
index 5d78fc0..8447397 100644
--- a/block.h
+++ b/block.h
@@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
 QEMUOptionParameter *options);
 int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
 BlockDriverState *bdrv_new(const char *device_name);
+void bdrv_remove(BlockDriverState *bs);
 void bdrv_delete(BlockDriverState *bs);
 int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
 int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
diff --git a/blockdev.c b/blockdev.c
index 0690cc8..1f57b50 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -760,7 +760,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject 
**ret_data)
 }
 
 /* clean up host side */
-drive_uninit(drive_get_by_blockdev(bs));
+bdrv_remove(bs);
 
 return 0;
 }
-- 
1.7.1


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH] Fix ATA SMART and CHECK POWER MODE

2011-02-09 Thread Ryan Harper

 *bus, uint32_t val)
   memset(s-io_buffer, 0, 0x200);
   s-io_buffer[0] = 0x01; /* smart struct version */
   for (n=0; n30; n++) {
 - if (smart_attributes[n][0] == 0)
 + if (smart_attributes[n][0] == 0)

CODING_STYLE if()

   break;
 - s-io_buffer[2+0+(n*12)] = smart_attributes[n][0];
 - s-io_buffer[2+1+(n*12)] = smart_attributes[n][1];
 - s-io_buffer[2+3+(n*12)] = smart_attributes[n][2];
 - s-io_buffer[2+4+(n*12)] = smart_attributes[n][3];
 + int i;
 + for(i = 0; i  12; i++) {
 + s-io_buffer[2+i+(n*12)] = smart_attributes[n][i];
 + }
   }
 +

   whitespace

   s-io_buffer[362] = 0x02 | (s-smart_autosave?0x80:0x00);
   if (s-smart_selftest_count == 0) {
 - s-io_buffer[363] = 0;
 + s-io_buffer[363] = 0;
   } else {
 - s-io_buffer[363] =
 + s-io_buffer[363] =
   s-smart_selftest_data[3 + 
 -(s-smart_selftest_count - 1) *
 -24];
 +(s-smart_selftest_count - 1) *
 +24];
   }
   s-io_buffer[364] = 0x20; 
   s-io_buffer[365] = 0x01; 
 @@ -2136,9 +2160,9 @@ void ide_exec_cmd(IDEBus *bus, uint32_t val)
   s-io_buffer[372] = 0x02; /* minutes for poll short test */
   s-io_buffer[373] = 0x36; /* minutes for poll ext test */
   s-io_buffer[374] = 0x01; /* minutes for poll conveyance */
 -
 - for (n=0; n511; n++) 
 - s-io_buffer[511] += s-io_buffer[n];
 + for (n=0; n511; n++) { /* checksum */
 + s-io_buffer[511] += s-io_buffer[n];
 + }
   s-io_buffer[511] = 0x100 - s-io_buffer[511];
   s-status = READY_STAT | SEEK_STAT;
   ide_transfer_start(s, s-io_buffer, 0x200, ide_transfer_stop);
 @@ -2147,32 +2171,31 @@ void ide_exec_cmd(IDEBus *bus, uint32_t val)
   case SMART_READ_LOG:
   switch (s-sector) {
   case 0x01: /* summary smart error log */
 - memset(s-io_buffer, 0, 0x200);
 - s-io_buffer[0] = 0x01;
 - s-io_buffer[1] = 0x00; /* no error entries */
 - s-io_buffer[452] = s-smart_errors  0xff;
 - s-io_buffer[453] = (s-smart_errors  0xff00)  8;
 -
 - for (n=0; n511; n++)
 + memset(s-io_buffer, 0, 0x200);
 + s-io_buffer[0] = 0x01;
 + s-io_buffer[1] = 0x00; /* no error entries */
 + s-io_buffer[452] = s-smart_errors  0xff;
 + s-io_buffer[453] = (s-smart_errors  0xff00)  8;
 + for (n=0; n511; n++)

   brace

   s-io_buffer[511] += s-io_buffer[n];
 - s-io_buffer[511] = 0x100 - s-io_buffer[511];
 - break;
 + s-io_buffer[511] = 0x100 - s-io_buffer[511];
 + break;
   case 0x06: /* smart self test log */
 - memset(s-io_buffer, 0, 0x200);
 - s-io_buffer[0] = 0x01;
 - if (s-smart_selftest_count == 0) {
 + memset(s-io_buffer, 0, 0x200);
 + s-io_buffer[0] = 0x01;
 + if (s-smart_selftest_count == 0) {
   s-io_buffer[508] = 0;
 - } else {
 + } else {
   s-io_buffer[508] = s-smart_selftest_count;
   for (n=2; n506; n++) 
 - s-io_buffer[n] = s-smart_selftest_data[n];
 - }
 - for (n=0; n511; n++)
 + s-io_buffer[n] = s-smart_selftest_data[n];
 + }
 + for (n=0; n511; n++)

   brace

   s-io_buffer[511] += s-io_buffer[n];
 - s-io_buffer[511] = 0x100 - s-io_buffer[511];
 - break;
 + s-io_buffer[511] = 0x100 - s-io_buffer[511];
 + break;
   default:
 - goto abort_cmd;
 + goto abort_cmd;
   }
   s-status = READY_STAT | SEEK_STAT;
   ide_transfer_start(s, s-io_buffer, 0x200, ide_transfer_stop);
 

This patchset looks a lot larger than it should since there is a lot of
indentation movement, it would be good to see a version that just
implemented the changes needed, which AFAICT are mainly the additional
attributes and limits.

 
-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH v3] Fix ATA SMART and CHECK POWER MODE

2011-02-09 Thread Ryan Harper

:0x00);
   if (s-smart_selftest_count == 0) {
 
 

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH 2/3] block: tell drivers about an image resize

2011-01-14 Thread Ryan Harper

* Christoph Hellwig h...@lst.de [2011-01-14 10:27]:
 Add a new size_changed flag in the BlockDriverState and call the
 change_cb callback to notify drivers about a size change.
 
 Signed-off-by: Christoph Hellwig h...@lst.de
 
 Index: qemu/block.c
 ===
 --- qemu.orig/block.c 2011-01-14 17:05:49.527003363 +0100
 +++ qemu/block.c  2011-01-14 17:07:23.206255143 +0100
 @@ -1135,6 +1135,10 @@ int bdrv_truncate(BlockDriverState *bs,
  ret = drv-bdrv_truncate(bs, offset);
  if (ret == 0) {
  ret = refresh_total_sectors(bs, offset  BDRV_SECTOR_BITS);
 +if (bs-change_cb) {
 +bs-size_changed = 1;
 +bs-change_cb(bs-change_opaque);
 +}

Do we want to check to ensure the size_changed flag isn't set before
doing a second resize event?  I'm wondering if the truncate takes
$longtime and user gets impatient and issues a second resize command.
How should we respond?  Ignore it?  queue it up? 

Is there any other path that invokes bdrv_truncate?  If so, do we want
invoke the call back in those scenarios or only if the truncate is
initiated from the monitor invocation?

  }
  return ret;
  }
 Index: qemu/block_int.h
 ===
 --- qemu.orig/block_int.h 2011-01-14 17:05:49.537004411 +0100
 +++ qemu/block_int.h  2011-01-14 17:06:02.539004271 +0100
 @@ -167,6 +167,7 @@ struct BlockDriverState {
  char backing_format[16]; /* if non-zero and backing_file exists */
  int is_temporary;
  int media_changed;
 +int size_changed;
 
  BlockDriverState *backing_hd;
  BlockDriverState *file;

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] Re: [PATCH] Fix segfault with ram_size 4095M without kvm

2011-01-05 Thread Ryan Harper

* Ryan Harper ry...@us.ibm.com [2011-01-04 09:49]:
 * Aurelien Jarno aurel...@aurel32.net [2010-12-25 16:37]:
  On Wed, Dec 08, 2010 at 04:27:45PM -0200, Luiz Capitulino wrote:
   On Wed, 08 Dec 2010 12:23:12 -0600
   Anthony Liguori aligu...@linux.vnet.ibm.com wrote:
   
On 12/08/2010 12:01 PM, Luiz Capitulino wrote:
 Currently, x86_64-softmmu qemu segfaults when trying to use  4095M 
 memsize.
 This patch adds a simple check and error message (much like the 2047 
 limit on
 32-bit hosts) on ram_size in the control path after we determine we're
 not using kvm

 Upstream qemu-kvm is affected if using the -no-kvm option; this patch 
 address
 the segfault there as well.

 Signed-off-by: Ryan Harperry...@us.ibm.com
 Signed-off-by: Aurelien Jarnoaurel...@aurel32.net
 ---
 NOTE: this patch was applied in the v0.12.x branch, but it seems it 
 got
lost for master


No, it was intentional.  We should fix the segv, this is not a known 
limitation but rather a bug.
   
   A TCG bug, I presume?
   
  
  Do you have more details about this issue and how to reproduce it? 
 
 At the time of the bug, it was something simple like:
 
 qemu-system-x86_64 -m 4097 -hda /dev/null
 
 we'd get an imediate segfault.  As you say, I'm not seeing it now on
 current git; I'll see about bisecting to see if we did get a fix for the
 issue.

I attempted to bisect, but there a couple commits around where the issue
was fixed that broke git bisect =(  That narrowed it down to about 5
commits to check.

This the last git commit where I can reproduce the segfault with the
above test case (qemu invocation).

commit 0aef4261ac0ec9089ade0e3a92f986cb4ba7317e
Author: Aurelien Jarno aurel...@aurel32.net
Date:   Thu Mar 11 21:29:42 2010 +0100

target-ppc: fix evsrwu and evsrws (second try)

Signed-off-by: Aurelien Jarno aurel...@aurel32.net


The next 4 commits don't compile so they are untest-able:

commit 14f24e1465edc44b9b4d89fbbea66e06088154e1
 - fails to build with:
- ./configure --target-list=x86_64-softmmu  make clean  make
/home/rharper/work/git/qemu/exec.c: In function 'phys_page_find_alloc':
/home/rharper/work/git/qemu/exec.c:341: error: #error unsupported 
TARGET_PHYS_ADDR_SPACE_BITS
/home/rharper/work/git/qemu/exec.c: In function 'phys_page_for_each':
/home/rharper/work/git/qemu/exec.c:1670: error: #error unsupported 
TARGET_PHYS_ADDR_SPACE_BITS
make[1]: *** [exec.o] Error 1
make: *** [subdir-x86_64-softmmu] Error 2

commit 7bc7b099dfa38a856b1bc892c0f9f3d6fe28e170
 - fails to build with:
- ./configure --target-list=x86_64-softmmu  make clean  make
/home/rharper/work/git/qemu/exec.c: In function 'phys_page_find_alloc':
/home/rharper/work/git/qemu/exec.c:341: error: #error unsupported 
TARGET_PHYS_ADDR_SPACE_BITS
/home/rharper/work/git/qemu/exec.c: In function 'phys_page_for_each':
/home/rharper/work/git/qemu/exec.c:1670: error: #error unsupported 
TARGET_PHYS_ADDR_SPACE_BITS
make[1]: *** [exec.o] Error 1
make: *** [subdir-x86_64-softmmu] Error 2

commit b9f83121a13153536d886305414b540460c34508
 - fails to build with:
- ./configure --target-list=x86_64-softmmu  make clean  make
/home/rharper/work/git/qemu/exec.c: In function 'phys_page_find_alloc':
/home/rharper/work/git/qemu/exec.c:341: error: #error unsupported 
TARGET_PHYS_ADDR_SPACE_BITS
/home/rharper/work/git/qemu/exec.c: In function 'phys_page_for_each':
/home/rharper/work/git/qemu/exec.c:1670: error: #error unsupported 
TARGET_PHYS_ADDR_SPACE_BITS
make[1]: *** [exec.o] Error 1
make: *** [subdir-x86_64-softmmu] Error 2

commit 5270589032f450ae7c3448730855aa18ff68ccff
 - fails to build with:
- ./configure --target-list=x86_64-softmmu  make clean  make
/home/rharper/work/git/qemu/exec.c: In function 'phys_page_find_alloc':
/home/rharper/work/git/qemu/exec.c:341: error: #error unsupported 
TARGET_PHYS_ADDR_SPACE_BITS
/home/rharper/work/git/qemu/exec.c: In function 'phys_page_for_each':
/home/rharper/work/git/qemu/exec.c:1670: error: #error unsupported 
TARGET_PHYS_ADDR_SPACE_BITS
make[1]: *** [exec.o] Error 1
make: *** [subdir-x86_64-softmmu] Error 2


And this commit compiles and the test case no longer segfaults.  So I'd
say things are fixed at this point.

commit 5cd2c5b6ad75c46d40118ac67c0c09d4e7930a65
- compiles and issue is no longer present.
- ./configure --target-list=x86_64-softmmu  make clean  make 
sudo x86_64-softmmu/qemu-syem-x86_64 -L pc-bios -hda /dev/null -m 4097


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] Re: [PATCH] Fix segfault with ram_size 4095M without kvm

2011-01-04 Thread Ryan Harper

* Aurelien Jarno aurel...@aurel32.net [2010-12-25 16:37]:
 On Wed, Dec 08, 2010 at 04:27:45PM -0200, Luiz Capitulino wrote:
  On Wed, 08 Dec 2010 12:23:12 -0600
  Anthony Liguori aligu...@linux.vnet.ibm.com wrote:
  
   On 12/08/2010 12:01 PM, Luiz Capitulino wrote:
Currently, x86_64-softmmu qemu segfaults when trying to use  4095M 
memsize.
This patch adds a simple check and error message (much like the 2047 
limit on
32-bit hosts) on ram_size in the control path after we determine we're
not using kvm
   
Upstream qemu-kvm is affected if using the -no-kvm option; this patch 
address
the segfault there as well.
   
Signed-off-by: Ryan Harperry...@us.ibm.com
Signed-off-by: Aurelien Jarnoaurel...@aurel32.net
---
NOTE: this patch was applied in the v0.12.x branch, but it seems it got
   lost for master
   
   
   No, it was intentional.  We should fix the segv, this is not a known 
   limitation but rather a bug.
  
  A TCG bug, I presume?
  
 
 Do you have more details about this issue and how to reproduce it? 

At the time of the bug, it was something simple like:

qemu-system-x86_64 -m 4097 -hda /dev/null

we'd get an imediate segfault.  As you say, I'm not seeing it now on
current git; I'll see about bisecting to see if we did get a fix for the
issue.

 
 Support for more than 4GB of memory has been added a few years ago, 
 and I am not able to reproduce the problem anymore (I have booted a 
 64-bit guest with 6GB of RAM, and make sure the guest use the whole 
 memory). I guess TCG itself is fine, but there might be a bug in
 the MMU emulation in some cases.
 
 I also noticed that now i386-softmmu has been artificially limited to
 2047MB. Tthis configuration used to support up to 64GB of RAM (PAE)
 on 64-bit hosts.
 
 -- 
 Aurelien Jarno  GPG: 1024D/F1BCDB73
 aurel...@aurel32.net http://www.aurel33.net

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

[Qemu-devel] [PATCH] blockdev: check dinfo ptr before using

2010-12-09 Thread Ryan Harper

If a user decides to punish a guest by revoking its block device via
drive_del, and subsequently also attempts to remove the pci device
backing it, and the device is using blockdev_auto_del() then we get a
segfault when we attempt to access dinfo-auto_del.[1]

The fix is to check if drive_get_by_blockdev() actually returns a valid
dinfo pointer or not.

1. (qemu) pci_add auto storage 
file=images/test01.raw,if=virtio,id=block1,snapshot=on
   (qemu) drive_del block1
   (qemu) pci_del 5
   *segfault*

Signed-off-by: Ryan Harper ry...@us.ibm.com

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com


diff --git a/blockdev.c b/blockdev.c
index f6ac439..3b3b82d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -30,14 +30,16 @@ void blockdev_mark_auto_del(BlockDriverState *bs)
 {
 DriveInfo *dinfo = drive_get_by_blockdev(bs);
 
-dinfo-auto_del = 1;
+if (dinfo) {
+dinfo-auto_del = 1;
+}
 }
 
 void blockdev_auto_del(BlockDriverState *bs)
 {
 DriveInfo *dinfo = drive_get_by_blockdev(bs);
 
-if (dinfo-auto_del) {
+if (dinfo  dinfo-auto_del) {
 drive_uninit(dinfo);
 }
 }

Re: [Qemu-devel] [PATCH 00/10] AHCI emulation support v2

2010-11-18 Thread Ryan Harper

* Alexander Graf ag...@suse.de [2010-11-18 12:49]:
 
 On 18.11.2010, at 14:26, Kevin Wolf kw...@redhat.com wrote:
 
  Hi Alex,
  
  Am 18.11.2010 04:27, schrieb Alexander Graf:
  This patch adds support for AHCI emulation. I have tested and verified it 
  works
  in Linux, OpenBSD, Windows Vista and Windows 7. This AHCI emulation 
  supports
  NCQ, so multiple read or write requests can be outstanding at the same 
  time.
  
  The code is however not fully optimized yet. I'm fairly sure that there are
  low hanging performance fruits to be found still :). In my simple 
  benchmarks
  I achieved about 2/3rd of virtio performance.
  
  Also, this AHCI emulation layer does not support legacy mode. So if you're
  using a disk with this emulation, you do not get it exposed using the 
  legacy
  IDE interfaces.
  
  Another nitpick is CD-ROM support in Windows. Somehow it doesn't detect a
  CD-ROM drive attached to AHCI. At least it doesn't list it.
  
  To attach an AHCI disk to your VM, please use
  
   -drive file=...,if=sata
  
  This should do the trick for x86. On other platforms, you might need to add
  the ahci host controller using -device.
  
  
  This patch set is based on work done during the Google Summer of Code. I 
  was
  mentoring a student, Roland Elek, who wrote most of the AHCI emulation code
  based on a patch from Chong Qiao. A bunch of other people were also 
  involved,
  so everybody who I didn't mention - thanks a lot!
  
  I'm not completely sure about the relationship between the AHCI
  emulation and our existing IDE emulation. First thing I noticed is that
  AHCI wants to be independent and resides in hw/ instead of hw/ide/, but
  it still include ide/internal.h. Do you think it would make sense to
  move AHCI into hw/ide?
 
 Both ahci and ide implement ata. I guess the best thing to do would be to 
 completely refactor all ide code into ata and pata code, then add ahci (sata) 
 to the game. Estimated working time: ~1 month. :)
 
 As I would rather have something working we can base on in the tree, so 
 whoever volunteers for the refactoring (hint!) knows how to design the 
 interfaces, I am not sure how much is reasonable within this patch set.
 
 Moving the file to ide/ does sound like a good idea however.
 
  
  Then I believe that core.c is now a mixture of some generic ATA code
  (that is also used by SATA) and the Legacy IDE code. SATA doesn't seem
  to interact with the generic code through clean interfaces, but by
  accessing internal data structures and calls to somewhere in the middle
  of the existing IDE emultion. I think we should get a clean abstraction
  there and have a clean split between SATA, PATA and common code, with
  each of them sitting in its own file in hw/ide.
  
  I haven't reviewed the patches in detail but just had a quick look at
  them, so my impressions might be wrong. If so, please correct me.
 
 No, you're completely right. We're in a chicken and egg situation. We don't 
 have ahci, but the ide code is ugly. We would probably do a bad job at 
 refactoring the ata code if we don't know which interfaces to design for.
 
 So IMHO the only way we can really go is to implement sata, take the uglyness 
 it adds to the already ugly ide code and use all of that for a working state 
 we can start to refactor from.
 
 So yes, while I agree with your obversations, I do not believe we will
 get there until at least 0.16 or so. And I'd rather like to see a fast
 default block driver in gueast sooner than later :)
 

Speaking of fast, do you have any numbers around ACHI vs IDE (not that I
need any convincing that we can do better than IDE); just curious.

 
 Alex
 
  

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] Re: [PATCH 0/2] v8 Decouple block device removal from device removal

2010-11-16 Thread Ryan Harper

* Kevin Wolf kw...@redhat.com [2010-11-16 08:05]:
 Am 16.11.2010 14:51, schrieb Luiz Capitulino:
  On Fri, 12 Nov 2010 18:38:57 +0100
  Kevin Wolf kw...@redhat.com wrote:
  
  Am 12.11.2010 18:07, schrieb Ryan Harper:
  details, details, v8
 
  This patch series decouples the detachment of a block device from the
  removal of the backing pci-device.  Removal of a hotplugged pci device
  requires the guest to respond before qemu tears down the block device.
  In some cases, the guest may not respond leaving the guest with
  continued access to the block device.  Mgmt layer doesn't have a
  reliable method to force a disconnect.  
 
  The new monitor command, drive_del, will revoke a guests access to the
  block device independently of the removal of the pci device.
 
  The first patch implements drive_del, the second patch implements the
  qmp version of the monitor command.
 
  Changes since v7:
  - Fixed up doc strings (delete - drive_del)
  Changes since v6:
  - Updated patch description
  - Dropped bdrv_unplug and inlined in drive_del
  - Explicitly invoke drive_uninit()
  Changes since v5:
  - Removed dangling pointers in guest and host state.  This ensures things 
  like 
info block no longer displays the deleted drive; though info pci will
continue to display the pci device until the guest responds to the 
  removal
request.
  - Renamed drive_unplug - drive_del
  Changes since v4:
  - Droppped drive_get_by_id patch and use bdrv_find() instead
  - Added additional details about drive_unplug to hmp/qmp interface
 
  Changes since v3:
  - Moved QMP command for drive_unplug() to separate patch
 
  Changes since v2:
  - Added QMP command for drive_unplug()
 
  Changes since v1:
  - CodingStyle fixes
  - Added qemu_aio_flush() to bdrv_unplug()
 
  Signed-off-by: Ryan Harper ry...@us.ibm.com
 
  Thanks, applied both to the block branch.
  
  I guess the conclusion was that we don't want the new command
  in QMP?
  
http://lists.gnu.org/archive/html/qemu-devel/2010-11/msg01084.html
 
 If you compare the time of these mails, Markus sent his mail only a few
 minutes after I had applied the patches and posted this.
 
 Ryan split the patch in two parts only to allow dropping the QMP part if
 we decided so, so I think he'll agree. I'm going to drop the second

Indeed.

 patch from my queue again before I send a pull request.
 
 Kevin

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

[Qemu-devel] [PATCH 0/2] v7 Decouple block device removal from device removal

2010-11-12 Thread Ryan Harper

Once more dear friends, v7

This patch series decouples the detachment of a block device from the
removal of the backing pci-device.  Removal of a hotplugged pci device
requires the guest to respond before qemu tears down the block device.
In some cases, the guest may not respond leaving the guest with
continued access to the block device.  Mgmt layer doesn't have a
reliable method to force a disconnect.  

The new monitor command, drive_del, will revoke a guests access to the
block device independently of the removal of the pci device.

The first patch implements drive_del, the second patch implements the
qmp version of the monitor command.

Changes since v6:
- Updated patch description
- Dropped bdrv_unplug and inlined in drive_del
- Explicitly invoke drive_uninit()
Changes since v5:
- Removed dangling pointers in guest and host state.  This ensures things like 
  info block no longer displays the deleted drive; though info pci will
  continue to display the pci device until the guest responds to the removal
  request.
- Renamed drive_unplug - drive_del
Changes since v4:
- Droppped drive_get_by_id patch and use bdrv_find() instead
- Added additional details about drive_unplug to hmp/qmp interface

Changes since v3:
- Moved QMP command for drive_unplug() to separate patch

Changes since v2:
- Added QMP command for drive_unplug()

Changes since v1:
- CodingStyle fixes
- Added qemu_aio_flush() to bdrv_unplug()

Signed-off-by: Ryan Harper ry...@us.ibm.com

[Qemu-devel] [PATCH 1/2] Implement drive_del to decouple block removal from device removal

2010-11-12 Thread Ryan Harper

Currently device hotplug removal code is tied to device removal via
ACPI.  All pci devices that are removable via device_del() require the
guest to respond to the request.  In some cases the guest may not
respond leaving the device still accessible to the guest.  The management
layer doesn't currently have a reliable way to revoke access to host
resource in the presence of an uncooperative guest.

This patch implements a new monitor command, drive_del, which
provides an explicit command to revoke access to a host block device.

drive_del first quiesces the block device (qemu_aio_flush;
bdrv_flush() and bdrv_close()).  This prevents further IO from being
submitted against the host device.  Finally, drive_del cleans up
pointers between the drive object (host resource) and the device
object (guest resource).

Signed-off-by: Ryan Harper ry...@us.ibm.com
---
 blockdev.c  |   39 +++
 blockdev.h  |1 +
 hmp-commands.hx |   18 ++
 3 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 6cb179a..f6ac439 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -14,6 +14,8 @@
 #include qemu-option.h
 #include qemu-config.h
 #include sysemu.h
+#include hw/qdev.h
+#include block_int.h
 
 static QTAILQ_HEAD(drivelist, DriveInfo) drives = 
QTAILQ_HEAD_INITIALIZER(drives);
 
@@ -597,3 +599,40 @@ int do_change_block(Monitor *mon, const char *device,
 }
 return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
 }
+
+int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+const char *id = qdict_get_str(qdict, id);
+BlockDriverState *bs;
+BlockDriverState **ptr;
+Property *prop;
+
+bs = bdrv_find(id);
+if (!bs) {
+qerror_report(QERR_DEVICE_NOT_FOUND, id);
+return -1;
+}
+
+/* quiesce block driver; prevent further io */
+qemu_aio_flush();
+bdrv_flush(bs);
+bdrv_close(bs);
+
+/* clean up guest state from pointing to host resource by
+ * finding and removing DeviceState drive property */
+for (prop = bs-peer-info-props; prop  prop-name; prop++) {
+if (prop-info-type == PROP_TYPE_DRIVE) {
+ptr = qdev_get_prop_ptr(bs-peer, prop);
+if ((*ptr) == bs) {
+bdrv_detach(bs, bs-peer);
+*ptr = NULL;
+break;
+}
+}
+}
+
+/* clean up host side */
+drive_uninit(drive_get_by_blockdev(bs));
+
+return 0;
+}
diff --git a/blockdev.h b/blockdev.h
index 653affc..2a0559e 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -51,5 +51,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject 
**ret_data);
 int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_change_block(Monitor *mon, const char *device,
 const char *filename, const char *fmt);
+int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
 #endif
diff --git a/hmp-commands.hx b/hmp-commands.hx
index e5585ba..d6dc18c 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -68,6 +68,24 @@ Eject a removable medium (use -f to force it).
 ETEXI
 
 {
+.name   = drive_del,
+.args_type  = id:s,
+.params = device,
+.help   = remove host block device,
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_drive_del,
+},
+
+STEXI
+...@item delete @var{device}
+...@findex delete
+Remove host block device.  The result is that guest generated IO is no longer
+submitted against the host device underlying the disk.  Once a drive has
+been deleted, the QEMU Block layer returns -EIO which results in IO 
+errors in the guest for applications that are reading/writing to the device.
+ETEXI
+
+{
 .name   = change,
 .args_type  = device:B,target:F,arg:s?,
 .params = device filename [format],
-- 
1.6.3.3

[Qemu-devel] [PATCH 2/2] Add qmp version of drive_del

2010-11-12 Thread Ryan Harper

Signed-off-by: Ryan Harper ry...@us.ibm.com
---
 qmp-commands.hx |   29 +
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/qmp-commands.hx b/qmp-commands.hx
index 793cf1c..1e0d4e9 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -338,6 +338,35 @@ Example:
 EQMP
 
 {
+.name   = drive_del,
+.args_type  = id:s,
+.params = device,
+.help   = remove host block device,
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_drive_del,
+},
+
+SQMP
+drive del
+--
+
+Remove host block device.  The result is that guest generated IO is no longer
+submitted against the host device underlying the disk.  Once a drive has
+been deleted, the QEMU Block layer returns -EIO which results in IO 
+errors in the guest for applications that are reading/writing to the device.
+
+Arguments:
+
+- id: the device's ID (json-string)
+
+Example:
+
+- { execute: drive_del, arguments: { id: drive-virtio-blk1 } }
+- { return: {} }
+
+EQMP
+
+{
 .name   = cpu,
 .args_type  = index:i,
 .params = index,
-- 
1.6.3.3

[Qemu-devel] Re: [PATCH 1/2] Implement drive_del to decouple block removal from device removal

2010-11-12 Thread Ryan Harper

* Kevin Wolf kw...@redhat.com [2010-11-12 10:43]:
 Am 12.11.2010 16:38, schrieb Ryan Harper:
  Currently device hotplug removal code is tied to device removal via
  ACPI.  All pci devices that are removable via device_del() require the
  guest to respond to the request.  In some cases the guest may not
  respond leaving the device still accessible to the guest.  The management
  layer doesn't currently have a reliable way to revoke access to host
  resource in the presence of an uncooperative guest.
  
  This patch implements a new monitor command, drive_del, which
  provides an explicit command to revoke access to a host block device.
  
  drive_del first quiesces the block device (qemu_aio_flush;
  bdrv_flush() and bdrv_close()).  This prevents further IO from being
  submitted against the host device.  Finally, drive_del cleans up
  pointers between the drive object (host resource) and the device
  object (guest resource).
  
  Signed-off-by: Ryan Harper ry...@us.ibm.com
  ---
   blockdev.c  |   39 +++
   blockdev.h  |1 +
   hmp-commands.hx |   18 ++
   3 files changed, 58 insertions(+), 0 deletions(-)
  
  diff --git a/blockdev.c b/blockdev.c
  index 6cb179a..f6ac439 100644
  --- a/blockdev.c
  +++ b/blockdev.c
  @@ -14,6 +14,8 @@
   #include qemu-option.h
   #include qemu-config.h
   #include sysemu.h
  +#include hw/qdev.h
  +#include block_int.h
   
   static QTAILQ_HEAD(drivelist, DriveInfo) drives = 
  QTAILQ_HEAD_INITIALIZER(drives);
   
  @@ -597,3 +599,40 @@ int do_change_block(Monitor *mon, const char *device,
   }
   return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
   }
  +
  +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
  +{
  +const char *id = qdict_get_str(qdict, id);
  +BlockDriverState *bs;
  +BlockDriverState **ptr;
  +Property *prop;
  +
  +bs = bdrv_find(id);
  +if (!bs) {
  +qerror_report(QERR_DEVICE_NOT_FOUND, id);
  +return -1;
  +}
  +
  +/* quiesce block driver; prevent further io */
  +qemu_aio_flush();
  +bdrv_flush(bs);
  +bdrv_close(bs);
  +
  +/* clean up guest state from pointing to host resource by
  + * finding and removing DeviceState drive property */
  +for (prop = bs-peer-info-props; prop  prop-name; prop++) {
  +if (prop-info-type == PROP_TYPE_DRIVE) {
  +ptr = qdev_get_prop_ptr(bs-peer, prop);
  +if ((*ptr) == bs) {
  +bdrv_detach(bs, bs-peer);
  +*ptr = NULL;
  +break;
  +}
  +}
  +}
  +
  +/* clean up host side */
  +drive_uninit(drive_get_by_blockdev(bs));
  +
  +return 0;
  +}
  diff --git a/blockdev.h b/blockdev.h
  index 653affc..2a0559e 100644
  --- a/blockdev.h
  +++ b/blockdev.h
  @@ -51,5 +51,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject 
  **ret_data);
   int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject 
  **ret_data);
   int do_change_block(Monitor *mon, const char *device,
   const char *filename, const char *fmt);
  +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
   
   #endif
  diff --git a/hmp-commands.hx b/hmp-commands.hx
  index e5585ba..d6dc18c 100644
  --- a/hmp-commands.hx
  +++ b/hmp-commands.hx
  @@ -68,6 +68,24 @@ Eject a removable medium (use -f to force it).
   ETEXI
   
   {
  +.name   = drive_del,
  +.args_type  = id:s,
  +.params = device,
  +.help   = remove host block device,
  +.user_print = monitor_user_noop,
  +.mhandler.cmd_new = do_drive_del,
  +},
  +
  +STEXI
  +...@item delete @var{device}
  +...@findex delete
 
 I think this should be @item drive_del and @findex drive_del.

Yep.

 
 Kevin

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

[Qemu-devel] [PATCH 2/2] Add qmp version of drive_del

2010-11-12 Thread Ryan Harper

Signed-off-by: Ryan Harper ry...@us.ibm.com
---
 qmp-commands.hx |   29 +
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/qmp-commands.hx b/qmp-commands.hx
index 793cf1c..1e0d4e9 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -338,6 +338,35 @@ Example:
 EQMP
 
 {
+.name   = drive_del,
+.args_type  = id:s,
+.params = device,
+.help   = remove host block device,
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_drive_del,
+},
+
+SQMP
+drive del
+--
+
+Remove host block device.  The result is that guest generated IO is no longer
+submitted against the host device underlying the disk.  Once a drive has
+been deleted, the QEMU Block layer returns -EIO which results in IO 
+errors in the guest for applications that are reading/writing to the device.
+
+Arguments:
+
+- id: the device's ID (json-string)
+
+Example:
+
+- { execute: drive_del, arguments: { id: drive-virtio-blk1 } }
+- { return: {} }
+
+EQMP
+
+{
 .name   = cpu,
 .args_type  = index:i,
 .params = index,
-- 
1.6.3.3

[Qemu-devel] [PATCH 0/2] v8 Decouple block device removal from device removal

2010-11-12 Thread Ryan Harper

details, details, v8

This patch series decouples the detachment of a block device from the
removal of the backing pci-device.  Removal of a hotplugged pci device
requires the guest to respond before qemu tears down the block device.
In some cases, the guest may not respond leaving the guest with
continued access to the block device.  Mgmt layer doesn't have a
reliable method to force a disconnect.  

The new monitor command, drive_del, will revoke a guests access to the
block device independently of the removal of the pci device.

The first patch implements drive_del, the second patch implements the
qmp version of the monitor command.

Changes since v7:
- Fixed up doc strings (delete - drive_del)
Changes since v6:
- Updated patch description
- Dropped bdrv_unplug and inlined in drive_del
- Explicitly invoke drive_uninit()
Changes since v5:
- Removed dangling pointers in guest and host state.  This ensures things like 
  info block no longer displays the deleted drive; though info pci will
  continue to display the pci device until the guest responds to the removal
  request.
- Renamed drive_unplug - drive_del
Changes since v4:
- Droppped drive_get_by_id patch and use bdrv_find() instead
- Added additional details about drive_unplug to hmp/qmp interface

Changes since v3:
- Moved QMP command for drive_unplug() to separate patch

Changes since v2:
- Added QMP command for drive_unplug()

Changes since v1:
- CodingStyle fixes
- Added qemu_aio_flush() to bdrv_unplug()

Signed-off-by: Ryan Harper ry...@us.ibm.com

[Qemu-devel] [PATCH 1/2] Implement drive_del to decouple block removal from device removal

2010-11-12 Thread Ryan Harper

Currently device hotplug removal code is tied to device removal via
ACPI.  All pci devices that are removable via device_del() require the
guest to respond to the request.  In some cases the guest may not
respond leaving the device still accessible to the guest.  The management
layer doesn't currently have a reliable way to revoke access to host
resource in the presence of an uncooperative guest.

This patch implements a new monitor command, drive_del, which
provides an explicit command to revoke access to a host block device.

drive_del first quiesces the block device (qemu_aio_flush;
bdrv_flush() and bdrv_close()).  This prevents further IO from being
submitted against the host device.  Finally, drive_del cleans up
pointers between the drive object (host resource) and the device
object (guest resource).

Signed-off-by: Ryan Harper ry...@us.ibm.com
---
 blockdev.c  |   39 +++
 blockdev.h  |1 +
 hmp-commands.hx |   18 ++
 3 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 6cb179a..f6ac439 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -14,6 +14,8 @@
 #include qemu-option.h
 #include qemu-config.h
 #include sysemu.h
+#include hw/qdev.h
+#include block_int.h
 
 static QTAILQ_HEAD(drivelist, DriveInfo) drives = 
QTAILQ_HEAD_INITIALIZER(drives);
 
@@ -597,3 +599,40 @@ int do_change_block(Monitor *mon, const char *device,
 }
 return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
 }
+
+int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+const char *id = qdict_get_str(qdict, id);
+BlockDriverState *bs;
+BlockDriverState **ptr;
+Property *prop;
+
+bs = bdrv_find(id);
+if (!bs) {
+qerror_report(QERR_DEVICE_NOT_FOUND, id);
+return -1;
+}
+
+/* quiesce block driver; prevent further io */
+qemu_aio_flush();
+bdrv_flush(bs);
+bdrv_close(bs);
+
+/* clean up guest state from pointing to host resource by
+ * finding and removing DeviceState drive property */
+for (prop = bs-peer-info-props; prop  prop-name; prop++) {
+if (prop-info-type == PROP_TYPE_DRIVE) {
+ptr = qdev_get_prop_ptr(bs-peer, prop);
+if ((*ptr) == bs) {
+bdrv_detach(bs, bs-peer);
+*ptr = NULL;
+break;
+}
+}
+}
+
+/* clean up host side */
+drive_uninit(drive_get_by_blockdev(bs));
+
+return 0;
+}
diff --git a/blockdev.h b/blockdev.h
index 653affc..2a0559e 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -51,5 +51,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject 
**ret_data);
 int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_change_block(Monitor *mon, const char *device,
 const char *filename, const char *fmt);
+int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
 #endif
diff --git a/hmp-commands.hx b/hmp-commands.hx
index e5585ba..c8b0206 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -68,6 +68,24 @@ Eject a removable medium (use -f to force it).
 ETEXI
 
 {
+.name   = drive_del,
+.args_type  = id:s,
+.params = device,
+.help   = remove host block device,
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_drive_del,
+},
+
+STEXI
+...@item drive_del @var{device}
+...@findex drive_del
+Remove host block device.  The result is that guest generated IO is no longer
+submitted against the host device underlying the disk.  Once a drive has
+been deleted, the QEMU Block layer returns -EIO which results in IO 
+errors in the guest for applications that are reading/writing to the device.
+ETEXI
+
+{
 .name   = change,
 .args_type  = device:B,target:F,arg:s?,
 .params = device filename [format],
-- 
1.6.3.3

Re: [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()

2010-11-11 Thread Ryan Harper

* Markus Armbruster arm...@redhat.com [2010-11-11 04:48]:
 Ryan Harper ry...@us.ibm.com writes:
 
  * Markus Armbruster arm...@redhat.com [2010-11-10 11:40]:
  Ryan Harper ry...@us.ibm.com writes:
  
   * Markus Armbruster arm...@redhat.com [2010-11-10 06:48]:
   One real question, and a couple of nits.
   
   Ryan Harper ry...@us.ibm.com writes:
   
Block hot unplug is racy since the guest is required to acknowlege 
the ACPI
unplug event; this may not happen synchronously with the device 
removal command
   
   Well, I wouldn't call unplug racy.  It just takes an unpredictable
   length of time, possibly forever.  To make a race, you need to throw in
   a client assuming (incorrectly) that unplug is instantaneous, as
   described in your next paragraph.
   
   Moreover, all PCI unplug is that way, not just block.
   
This series aims to close a gap where by mgmt applications that 
assume the
block resource has been removed without confirming that the guest has
acknowledged the removal may re-assign the underlying device to a 
second guest
leading to data leakage.
   
   Yes, the incorrect assumption is a problem.  But with that fixed (in the
   management application), we run right into the next problem: there is no
   way for the management application to reliably disconnect the guest from
   a block device.  And that's the problem you're fixing.
  
   Yeah, that's the right way to word it; providing a method to forcibly
   disconnect the guest from the host device.
   
This series introduces a new montor command to decouple asynchornous 
device
   
   Typos montor and asynchornous.  You might want to use a spell
   checker :)
   
   Lines are a bit long.  Recommend wrap at column 70.
   
removal from restricting guest access to a block device.  We do this 
by creating
a new monitor command drive_del which maps to a bdrv_unplug() command 
which
does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, 
subsequent
IO is rejected from the device and the guest will get IO errors but 
continue to
function.  In addition to preventing further IO, we clean up state 
pointers
between host (BlockDriverState) and guest (DeviceInfo).
   
A subsequent device removal command can be issued to remove the 
device, to which
the guest may or maynot respond, but as long as the unplugged bit is 
set, no IO
   
   maynot is not a word.
   
will be sumbitted.
   
   This suggests to drive_del before device_del, which makes the device
   goes through a broken device state on its way to unplug.  If the guest
   accesses the device in that state, it gets I/O errors.  Not nice.
   
   Instead, I'd recommend device_del, wait for the device to go away,
   drive_del on time out.  If the guest reacts to the ACPI unplug promptly,
   it's never exposed to the broken device state.  Note: if the drive_del
   fails because the device doesn't exist, we lost the race with the
   automatic destruction, which is harmless.  Ignore that error.
  
   Honestly, other than describing what happens if you sever the connection
   when the guest isn't aware of it; I don't want to try to capture how the
   mgmt layer implements the removal.  
  
   One may want to force the disconnect before attempting to remove the
   device; or the other way around; that's really the mgmt layer's call.
  
  Fair enough.
  
Signed-off-by: Ryan Harper ry...@us.ibm.com
---
 block.c |7 +++
 block.h |1 +
 blockdev.c  |   36 
 blockdev.h  |1 +
 hmp-commands.hx |   18 ++
 5 files changed, 63 insertions(+), 0 deletions(-)
   
diff --git a/block.c b/block.c
index 6b505fb..c76a796 100644
--- a/block.c
+++ b/block.c
@@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, 
int removable)
 }
 }
 
+void bdrv_unplug(BlockDriverState *bs)
+{
+qemu_aio_flush();
+bdrv_flush(bs);
+bdrv_close(bs);
+}
+
   
   Unless we expect more users, I'd inline this into its only caller.
   Matter of taste.
  
   Works for me.
  
   
 int bdrv_is_removable(BlockDriverState *bs)
 {
 return bs-removable;
diff --git a/block.h b/block.h
index 78ecfac..581414c 100644
--- a/block.h
+++ b/block.h
@@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, 
BlockErrorAction on_read_error,
BlockErrorAction on_write_error);
 BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int 
is_read);
 void bdrv_set_removable(BlockDriverState *bs, int removable);
+void bdrv_unplug(BlockDriverState *bs);
 int bdrv_is_removable(BlockDriverState *bs);
 int bdrv_is_read_only(BlockDriverState *bs);
 int bdrv_is_sg(BlockDriverState *bs);
diff --git a/blockdev.c b/blockdev.c

Re: [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()

2010-11-10 Thread Ryan Harper

* Markus Armbruster arm...@redhat.com [2010-11-10 06:48]:
 One real question, and a couple of nits.
 
 Ryan Harper ry...@us.ibm.com writes:
 
  Block hot unplug is racy since the guest is required to acknowlege the ACPI
  unplug event; this may not happen synchronously with the device removal 
  command
 
 Well, I wouldn't call unplug racy.  It just takes an unpredictable
 length of time, possibly forever.  To make a race, you need to throw in
 a client assuming (incorrectly) that unplug is instantaneous, as
 described in your next paragraph.
 
 Moreover, all PCI unplug is that way, not just block.
 
  This series aims to close a gap where by mgmt applications that assume the
  block resource has been removed without confirming that the guest has
  acknowledged the removal may re-assign the underlying device to a second 
  guest
  leading to data leakage.
 
 Yes, the incorrect assumption is a problem.  But with that fixed (in the
 management application), we run right into the next problem: there is no
 way for the management application to reliably disconnect the guest from
 a block device.  And that's the problem you're fixing.

Yeah, that's the right way to word it; providing a method to forcibly
disconnect the guest from the host device.
 
  This series introduces a new montor command to decouple asynchornous device
 
 Typos montor and asynchornous.  You might want to use a spell
 checker :)
 
 Lines are a bit long.  Recommend wrap at column 70.
 
  removal from restricting guest access to a block device.  We do this by 
  creating
  a new monitor command drive_del which maps to a bdrv_unplug() command which
  does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, 
  subsequent
  IO is rejected from the device and the guest will get IO errors but 
  continue to
  function.  In addition to preventing further IO, we clean up state pointers
  between host (BlockDriverState) and guest (DeviceInfo).
 
  A subsequent device removal command can be issued to remove the device, to 
  which
  the guest may or maynot respond, but as long as the unplugged bit is set, 
  no IO
 
 maynot is not a word.
 
  will be sumbitted.
 
 This suggests to drive_del before device_del, which makes the device
 goes through a broken device state on its way to unplug.  If the guest
 accesses the device in that state, it gets I/O errors.  Not nice.
 
 Instead, I'd recommend device_del, wait for the device to go away,
 drive_del on time out.  If the guest reacts to the ACPI unplug promptly,
 it's never exposed to the broken device state.  Note: if the drive_del
 fails because the device doesn't exist, we lost the race with the
 automatic destruction, which is harmless.  Ignore that error.

Honestly, other than describing what happens if you sever the connection
when the guest isn't aware of it; I don't want to try to capture how the
mgmt layer implements the removal.  

One may want to force the disconnect before attempting to remove the
device; or the other way around; that's really the mgmt layer's call.

 
  Signed-off-by: Ryan Harper ry...@us.ibm.com
  ---
   block.c |7 +++
   block.h |1 +
   blockdev.c  |   36 
   blockdev.h  |1 +
   hmp-commands.hx |   18 ++
   5 files changed, 63 insertions(+), 0 deletions(-)
 
  diff --git a/block.c b/block.c
  index 6b505fb..c76a796 100644
  --- a/block.c
  +++ b/block.c
  @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int 
  removable)
   }
   }
   
  +void bdrv_unplug(BlockDriverState *bs)
  +{
  +qemu_aio_flush();
  +bdrv_flush(bs);
  +bdrv_close(bs);
  +}
  +
 
 Unless we expect more users, I'd inline this into its only caller.
 Matter of taste.

Works for me.

 
   int bdrv_is_removable(BlockDriverState *bs)
   {
   return bs-removable;
  diff --git a/block.h b/block.h
  index 78ecfac..581414c 100644
  --- a/block.h
  +++ b/block.h
  @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, 
  BlockErrorAction on_read_error,
  BlockErrorAction on_write_error);
   BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
   void bdrv_set_removable(BlockDriverState *bs, int removable);
  +void bdrv_unplug(BlockDriverState *bs);
   int bdrv_is_removable(BlockDriverState *bs);
   int bdrv_is_read_only(BlockDriverState *bs);
   int bdrv_is_sg(BlockDriverState *bs);
  diff --git a/blockdev.c b/blockdev.c
  index 6cb179a..ee8c2ec 100644
  --- a/blockdev.c
  +++ b/blockdev.c
  @@ -14,6 +14,8 @@
   #include qemu-option.h
   #include qemu-config.h
   #include sysemu.h
  +#include hw/qdev.h
  +#include block_int.h
   
   static QTAILQ_HEAD(drivelist, DriveInfo) drives = 
  QTAILQ_HEAD_INITIALIZER(drives);
   
  @@ -597,3 +599,37 @@ int do_change_block(Monitor *mon, const char *device,
   }
   return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
   }
  +
  +int do_drive_del(Monitor *mon, const

Re: [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()

2010-11-10 Thread Ryan Harper

* Markus Armbruster arm...@redhat.com [2010-11-10 11:40]:
 Ryan Harper ry...@us.ibm.com writes:
 
  * Markus Armbruster arm...@redhat.com [2010-11-10 06:48]:
  One real question, and a couple of nits.
  
  Ryan Harper ry...@us.ibm.com writes:
  
   Block hot unplug is racy since the guest is required to acknowlege the 
   ACPI
   unplug event; this may not happen synchronously with the device removal 
   command
  
  Well, I wouldn't call unplug racy.  It just takes an unpredictable
  length of time, possibly forever.  To make a race, you need to throw in
  a client assuming (incorrectly) that unplug is instantaneous, as
  described in your next paragraph.
  
  Moreover, all PCI unplug is that way, not just block.
  
   This series aims to close a gap where by mgmt applications that assume 
   the
   block resource has been removed without confirming that the guest has
   acknowledged the removal may re-assign the underlying device to a second 
   guest
   leading to data leakage.
  
  Yes, the incorrect assumption is a problem.  But with that fixed (in the
  management application), we run right into the next problem: there is no
  way for the management application to reliably disconnect the guest from
  a block device.  And that's the problem you're fixing.
 
  Yeah, that's the right way to word it; providing a method to forcibly
  disconnect the guest from the host device.
  
   This series introduces a new montor command to decouple asynchornous 
   device
  
  Typos montor and asynchornous.  You might want to use a spell
  checker :)
  
  Lines are a bit long.  Recommend wrap at column 70.
  
   removal from restricting guest access to a block device.  We do this by 
   creating
   a new monitor command drive_del which maps to a bdrv_unplug() command 
   which
   does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, 
   subsequent
   IO is rejected from the device and the guest will get IO errors but 
   continue to
   function.  In addition to preventing further IO, we clean up state 
   pointers
   between host (BlockDriverState) and guest (DeviceInfo).
  
   A subsequent device removal command can be issued to remove the device, 
   to which
   the guest may or maynot respond, but as long as the unplugged bit is 
   set, no IO
  
  maynot is not a word.
  
   will be sumbitted.
  
  This suggests to drive_del before device_del, which makes the device
  goes through a broken device state on its way to unplug.  If the guest
  accesses the device in that state, it gets I/O errors.  Not nice.
  
  Instead, I'd recommend device_del, wait for the device to go away,
  drive_del on time out.  If the guest reacts to the ACPI unplug promptly,
  it's never exposed to the broken device state.  Note: if the drive_del
  fails because the device doesn't exist, we lost the race with the
  automatic destruction, which is harmless.  Ignore that error.
 
  Honestly, other than describing what happens if you sever the connection
  when the guest isn't aware of it; I don't want to try to capture how the
  mgmt layer implements the removal.  
 
  One may want to force the disconnect before attempting to remove the
  device; or the other way around; that's really the mgmt layer's call.
 
 Fair enough.
 
   Signed-off-by: Ryan Harper ry...@us.ibm.com
   ---
block.c |7 +++
block.h |1 +
blockdev.c  |   36 
blockdev.h  |1 +
hmp-commands.hx |   18 ++
5 files changed, 63 insertions(+), 0 deletions(-)
  
   diff --git a/block.c b/block.c
   index 6b505fb..c76a796 100644
   --- a/block.c
   +++ b/block.c
   @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int 
   removable)
}
}

   +void bdrv_unplug(BlockDriverState *bs)
   +{
   +qemu_aio_flush();
   +bdrv_flush(bs);
   +bdrv_close(bs);
   +}
   +
  
  Unless we expect more users, I'd inline this into its only caller.
  Matter of taste.
 
  Works for me.
 
  
int bdrv_is_removable(BlockDriverState *bs)
{
return bs-removable;
   diff --git a/block.h b/block.h
   index 78ecfac..581414c 100644
   --- a/block.h
   +++ b/block.h
   @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, 
   BlockErrorAction on_read_error,
   BlockErrorAction on_write_error);
BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
void bdrv_set_removable(BlockDriverState *bs, int removable);
   +void bdrv_unplug(BlockDriverState *bs);
int bdrv_is_removable(BlockDriverState *bs);
int bdrv_is_read_only(BlockDriverState *bs);
int bdrv_is_sg(BlockDriverState *bs);
   diff --git a/blockdev.c b/blockdev.c
   index 6cb179a..ee8c2ec 100644
   --- a/blockdev.c
   +++ b/blockdev.c
   @@ -14,6 +14,8 @@
#include qemu-option.h
#include qemu-config.h
#include sysemu.h
   +#include hw/qdev.h
   +#include block_int.h

static

Re: [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()

2010-11-10 Thread Ryan Harper

* Markus Armbruster arm...@redhat.com [2010-11-10 11:40]:
 Ryan Harper ry...@us.ibm.com writes:
 
  * Markus Armbruster arm...@redhat.com [2010-11-10 06:48]:
  One real question, and a couple of nits.
  
  Ryan Harper ry...@us.ibm.com writes:
  
   Block hot unplug is racy since the guest is required to acknowlege the 
   ACPI
   unplug event; this may not happen synchronously with the device removal 
   command
  
  Well, I wouldn't call unplug racy.  It just takes an unpredictable
  length of time, possibly forever.  To make a race, you need to throw in
  a client assuming (incorrectly) that unplug is instantaneous, as
  described in your next paragraph.
  
  Moreover, all PCI unplug is that way, not just block.
  
   This series aims to close a gap where by mgmt applications that assume 
   the
   block resource has been removed without confirming that the guest has
   acknowledged the removal may re-assign the underlying device to a second 
   guest
   leading to data leakage.
  
  Yes, the incorrect assumption is a problem.  But with that fixed (in the
  management application), we run right into the next problem: there is no
  way for the management application to reliably disconnect the guest from
  a block device.  And that's the problem you're fixing.
 
  Yeah, that's the right way to word it; providing a method to forcibly
  disconnect the guest from the host device.
  
   This series introduces a new montor command to decouple asynchornous 
   device
  
  Typos montor and asynchornous.  You might want to use a spell
  checker :)
  
  Lines are a bit long.  Recommend wrap at column 70.
  
   removal from restricting guest access to a block device.  We do this by 
   creating
   a new monitor command drive_del which maps to a bdrv_unplug() command 
   which
   does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, 
   subsequent
   IO is rejected from the device and the guest will get IO errors but 
   continue to
   function.  In addition to preventing further IO, we clean up state 
   pointers
   between host (BlockDriverState) and guest (DeviceInfo).
  
   A subsequent device removal command can be issued to remove the device, 
   to which
   the guest may or maynot respond, but as long as the unplugged bit is 
   set, no IO
  
  maynot is not a word.
  
   will be sumbitted.
  
  This suggests to drive_del before device_del, which makes the device
  goes through a broken device state on its way to unplug.  If the guest
  accesses the device in that state, it gets I/O errors.  Not nice.
  
  Instead, I'd recommend device_del, wait for the device to go away,
  drive_del on time out.  If the guest reacts to the ACPI unplug promptly,
  it's never exposed to the broken device state.  Note: if the drive_del
  fails because the device doesn't exist, we lost the race with the
  automatic destruction, which is harmless.  Ignore that error.
 
  Honestly, other than describing what happens if you sever the connection
  when the guest isn't aware of it; I don't want to try to capture how the
  mgmt layer implements the removal.  
 
  One may want to force the disconnect before attempting to remove the
  device; or the other way around; that's really the mgmt layer's call.
 
 Fair enough.
 
   Signed-off-by: Ryan Harper ry...@us.ibm.com
   ---
block.c |7 +++
block.h |1 +
blockdev.c  |   36 
blockdev.h  |1 +
hmp-commands.hx |   18 ++
5 files changed, 63 insertions(+), 0 deletions(-)
  
   diff --git a/block.c b/block.c
   index 6b505fb..c76a796 100644
   --- a/block.c
   +++ b/block.c
   @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int 
   removable)
}
}

   +void bdrv_unplug(BlockDriverState *bs)
   +{
   +qemu_aio_flush();
   +bdrv_flush(bs);
   +bdrv_close(bs);
   +}
   +
  
  Unless we expect more users, I'd inline this into its only caller.
  Matter of taste.
 
  Works for me.
 
  
int bdrv_is_removable(BlockDriverState *bs)
{
return bs-removable;
   diff --git a/block.h b/block.h
   index 78ecfac..581414c 100644
   --- a/block.h
   +++ b/block.h
   @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, 
   BlockErrorAction on_read_error,
   BlockErrorAction on_write_error);
BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
void bdrv_set_removable(BlockDriverState *bs, int removable);
   +void bdrv_unplug(BlockDriverState *bs);
int bdrv_is_removable(BlockDriverState *bs);
int bdrv_is_read_only(BlockDriverState *bs);
int bdrv_is_sg(BlockDriverState *bs);
   diff --git a/blockdev.c b/blockdev.c
   index 6cb179a..ee8c2ec 100644
   --- a/blockdev.c
   +++ b/blockdev.c
   @@ -14,6 +14,8 @@
#include qemu-option.h
#include qemu-config.h
#include sysemu.h
   +#include hw/qdev.h
   +#include block_int.h

static

Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal

2010-11-08 Thread Ryan Harper

* Markus Armbruster arm...@redhat.com [2010-11-08 06:04]:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Mon, Nov 08, 2010 at 11:32:01AM +0100, Markus Armbruster wrote:
  Ryan Harper ry...@us.ibm.com writes:
  
   * Markus Armbruster arm...@redhat.com [2010-11-06 04:19]:
   Ryan Harper ry...@us.ibm.com writes:
   
* Markus Armbruster arm...@redhat.com [2010-11-05 11:11]:
Ryan Harper ry...@us.ibm.com writes:

 * Markus Armbruster arm...@redhat.com [2010-11-05 08:28]:
 I'd be fine with any of these:
 
 1. A new command device_disconnet ID (or similar name) to 
 disconnect
device ID from any host parts.  Nice touch: you don't have to 
 know
about the device's host part(s) to disconnect it.  But it 
 might be
more work than the other two.

 This is sort of what netdev_del() and drive_unplug() are today; 
 we're
 just saying sever the connection of this device id.   

No, I have netdev_del as (3).

All three options are sort of the same, just different commands 
with
a common purpose.

 I'd like to rename drive_unplug() to blockdev_del() and call it 
 done.  I
 was looking at libvirt and the right call to netdev_del is already
 in-place; I'd just need to re-spin my block patch to call 
 blockdev_del()
 after invoking device_del() to match what is done for net.

Unless I'm missing something, you can't just rename: your unplug does
not delete the host part.

 2. New commands netdev_disconnect, drive_disconnect (or similar 
 names)
to disconnect a host part from a guest device.  Like (1), 
 except you
have to point to the other end of the connection to cut it.

 What's the advantage here? We need an additional piece of info 
 (host
 part) in addition to the device id?

That's a disadvantage.

Possible advantage: implementation could be slightly easier than (1),
because you don't have to find the host parts.

 3. A new command drive_del ID similar to existing netdev_del.  
 This is
(2) fused with delete.  Conceptual wart: you can't disconnect 
 and
keep the host part around.  Moreover, delete is slightly 
 dangerous,
because it renders any guest device still using the host part
useless.

 Hrm, I thought that's what (1) is.

No.

With (1), the argument is a *device* ID, and we disconnect *all* host
parts connected to this device (typically just one).

With (3), the argument is a netdev/drive ID, and disconnect *this* 
host
part from the peer device.

 Well, either (1) or (3); I'd 
 like to
 rename drive_unplug() to blockdev_del() since they're similar 
 function
 w.r.t removing access to the host resource.  And we can invoke 
 them in
 the same way from libvirt (after doing guest notification, remove
 access).

I'd call it drive_del for now, to match drive_add.
   
OK, drive_del() and as you mentioned, drive_unplug will take out the
block driver, but doesn't remove the dinfo object; that ends up dying
when we call the device destructor.  I think for symmetry we'll want
drive_del to remove the dinfo object as well.
   
   Exactly.
   
   a. bdrv_detach() to zap the pointer from bdrv to qdev
   b. zap the pointer from qdev to bdrv
   c. drive_uninit() to dispose of the host part
  
   a-c need to be done to match netdev_del symmetry?  How hard of a req is
   this?
  
  Without (c), it's not a delete.  And (c) without (b) leaves a dangling
  pointer.  (c) without (a) fails an assertion in bdrv_delete().
  
  Aside: (b) should probably be folded into bdrv_detach().
  
   Step b could be awkward with (3), because you don't know device details.
   I guess you have to search device properties for a drive property
   pointing to bdrv.  I like (1) because it puts that loop in the one place
   where it belongs: qdev core.  (3) duplicates it in every HOSTDEV_del.
   Except for netdev_del, which is special because of VLANs.
   
   To avoid step b, you could try to keep the bdrv around in a special
   zombie state.  Still have to free the dinfo, but can't use
   drive_uninit() for that then.
   
   If you think I'm overcomplicating this, feel free to prove me wrong with
   working code :)
  
   drive_unplug() works as-is today; so it does feel very combursome at
   this point.  Other than the name change and agreement on how mgmt should
   invoke the command, it's been a long ride to get here.
  
  Sometimes it takes a tough man to make a tender chicken.
 
   I'll take my best shot at trying to clean up the other
   pointers and objects; though on one of my attempts when I took out the
   dinfo() object that didn't go so well; going to have to audit who uses
   dinfo and where and what they check before calling it to have

Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal

2010-11-08 Thread Ryan Harper

* Michael S. Tsirkin m...@redhat.com [2010-11-08 10:57]:
 On Mon, Nov 08, 2010 at 08:02:50AM -0600, Ryan Harper wrote:
  * Markus Armbruster arm...@redhat.com [2010-11-08 06:04]:
   Michael S. Tsirkin m...@redhat.com writes:
   
On Mon, Nov 08, 2010 at 11:32:01AM +0100, Markus Armbruster wrote:
Ryan Harper ry...@us.ibm.com writes:

 * Markus Armbruster arm...@redhat.com [2010-11-06 04:19]:
 Ryan Harper ry...@us.ibm.com writes:
 
  * Markus Armbruster arm...@redhat.com [2010-11-05 11:11]:
  Ryan Harper ry...@us.ibm.com writes:
  
   * Markus Armbruster arm...@redhat.com [2010-11-05 08:28]:
   I'd be fine with any of these:
   
   1. A new command device_disconnet ID (or similar name) to 
   disconnect
  device ID from any host parts.  Nice touch: you don't have 
   to know
  about the device's host part(s) to disconnect it.  But it 
   might be
  more work than the other two.
  
   This is sort of what netdev_del() and drive_unplug() are 
   today; we're
   just saying sever the connection of this device id.   
  
  No, I have netdev_del as (3).
  
  All three options are sort of the same, just different 
  commands with
  a common purpose.
  
   I'd like to rename drive_unplug() to blockdev_del() and call 
   it done.  I
   was looking at libvirt and the right call to netdev_del is 
   already
   in-place; I'd just need to re-spin my block patch to call 
   blockdev_del()
   after invoking device_del() to match what is done for net.
  
  Unless I'm missing something, you can't just rename: your unplug 
  does
  not delete the host part.
  
   2. New commands netdev_disconnect, drive_disconnect (or 
   similar names)
  to disconnect a host part from a guest device.  Like (1), 
   except you
  have to point to the other end of the connection to cut it.
  
   What's the advantage here? We need an additional piece of info 
   (host
   part) in addition to the device id?
  
  That's a disadvantage.
  
  Possible advantage: implementation could be slightly easier than 
  (1),
  because you don't have to find the host parts.
  
   3. A new command drive_del ID similar to existing 
   netdev_del.  This is
  (2) fused with delete.  Conceptual wart: you can't 
   disconnect and
  keep the host part around.  Moreover, delete is slightly 
   dangerous,
  because it renders any guest device still using the host 
   part
  useless.
  
   Hrm, I thought that's what (1) is.
  
  No.
  
  With (1), the argument is a *device* ID, and we disconnect *all* 
  host
  parts connected to this device (typically just one).
  
  With (3), the argument is a netdev/drive ID, and disconnect 
  *this* host
  part from the peer device.
  
   Well, either (1) or (3); 
   I'd like to
   rename drive_unplug() to blockdev_del() since they're similar 
   function
   w.r.t removing access to the host resource.  And we can invoke 
   them in
   the same way from libvirt (after doing guest notification, 
   remove
   access).
  
  I'd call it drive_del for now, to match drive_add.
 
  OK, drive_del() and as you mentioned, drive_unplug will take out 
  the
  block driver, but doesn't remove the dinfo object; that ends up 
  dying
  when we call the device destructor.  I think for symmetry we'll 
  want
  drive_del to remove the dinfo object as well.
 
 Exactly.
 
 a. bdrv_detach() to zap the pointer from bdrv to qdev
 b. zap the pointer from qdev to bdrv
 c. drive_uninit() to dispose of the host part

 a-c need to be done to match netdev_del symmetry?  How hard of a req 
 is
 this?

Without (c), it's not a delete.  And (c) without (b) leaves a dangling
pointer.  (c) without (a) fails an assertion in bdrv_delete().

Aside: (b) should probably be folded into bdrv_detach().

 Step b could be awkward with (3), because you don't know device 
 details.
 I guess you have to search device properties for a drive property
 pointing to bdrv.  I like (1) because it puts that loop in the one 
 place
 where it belongs: qdev core.  (3) duplicates it in every 
 HOSTDEV_del.
 Except for netdev_del, which is special because of VLANs.
 
 To avoid step b, you could try to keep the bdrv around in a special
 zombie state.  Still have to free the dinfo, but can't use
 drive_uninit() for that then.
 
 If you think I'm overcomplicating this, feel free to prove me wrong 
 with
 working code :)

 drive_unplug() works as-is today; so it does feel very combursome at
 this point

Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal

2010-11-08 Thread Ryan Harper

* Daniel P. Berrange berra...@redhat.com [2010-11-08 11:05]:
 On Mon, Nov 08, 2010 at 06:56:02PM +0200, Michael S. Tsirkin wrote:
  On Mon, Nov 08, 2010 at 08:02:50AM -0600, Ryan Harper wrote:
   * Markus Armbruster arm...@redhat.com [2010-11-08 06:04]:
Michael S. Tsirkin m...@redhat.com writes:
 Here's how the various objects are connected to each other:
 
contains
 drivelist--- DriveInfo
 |
 | .bdrv
 | .id == .bdrv-device_name
 |
contains V
 bdrv_states  --- BlockDriverState
  |   ^
.peer |   |
  |   |  host part
 -|---|---
  |   | guest part
  |   | property drive
  v   |
   DeviceState
 
 To disconnect host from guest part, you need to cut both pointers.  
 To
 delete the host part, you need to delete both objects, 
 BlockDriverState
 and DriveInfo.


 If we remove DriveInfo, how can management later detect that guest 
 part
 was deleted?

Directly: check whether the qdev is gone.

I don't know how to check that indirectly, via DriveInfo.

  If you want symmetry with netdev, it's possible to keep a
 shell of BlockDriverState/DriveInfo around (solving dangling pointer
 problems).

netdev_del deletes the host network part:

(qemu) info network
Devices not on any VLAN:
  net.0: net=10.0.2.0, restricted=n peer=nic.0
  nic.0: model=virtio-net-pci,macaddr=52:54:00:12:34:56 peer=net.0
(qemu) netdev_del net.0
(qemu) info network
Devices not on any VLAN:
  nic.0: model=virtio-net-pci,macaddr=52:54:00:12:34:56 peer=net.0

It leaves around the VLAN object.  Since qdev property points to that,
it doesn't dangle.

In my opinion, drive_del should make the drive vanish from info block,
   
   Yeah; that's the right thing to do here.  Let me respin the patch with
   the name change and the additional work to fix up the pointers and
   ensure that we don't see the drive in info block.
  
  Daniel, I'd like your input here: can you live with
  device diappearing from info block and parsing
  qdev tree info to figure out whether device is really gone?
 
 We don't use info block for anything. Having to parse the full qdev tree
 to determine if a single device is gone seems rather tedious. It would
 be better if query-qdev took an optional argument, which is the name
 of the device to root the tree at. Then checking whether a device
 named 'foo' is gone just means running 'query-qdev foo' and seeing if
 that returns an error about the device not existing, then we know it
 has gone. No need to parse anything. Being able to query the qdev data
 for a single device, or sub-tree of devices seems useful in its own
 right.

Since I'm not looking forward to parsing info block (easy) nor parsing
all of qdev tree (much harder) I really like the query approach.  

That makes it easy to put a query in the netdev_del/drive_del commands
to skip invoking them if the guest has already responded.

 
 Regards,
 Daniel
 -- 
 |: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
 |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
 |: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
 |: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

[Qemu-devel] [PATCH 2/2] Add qmp version of drive_del

2010-11-08 Thread Ryan Harper

Signed-off-by: Ryan Harper ry...@us.ibm.com
---
 qmp-commands.hx |   29 +
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/qmp-commands.hx b/qmp-commands.hx
index 793cf1c..1e0d4e9 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -338,6 +338,35 @@ Example:
 EQMP
 
 {
+.name   = drive_del,
+.args_type  = id:s,
+.params = device,
+.help   = remove host block device,
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_drive_del,
+},
+
+SQMP
+drive del
+--
+
+Remove host block device.  The result is that guest generated IO is no longer
+submitted against the host device underlying the disk.  Once a drive has
+been deleted, the QEMU Block layer returns -EIO which results in IO 
+errors in the guest for applications that are reading/writing to the device.
+
+Arguments:
+
+- id: the device's ID (json-string)
+
+Example:
+
+- { execute: drive_del, arguments: { id: drive-virtio-blk1 } }
+- { return: {} }
+
+EQMP
+
+{
 .name   = cpu,
 .args_type  = index:i,
 .params = index,
-- 
1.6.3.3

[Qemu-devel] [PATCH 0/2] v6 Decouple block device removal from device removal

2010-11-08 Thread Ryan Harper

After *much* discussion, here's version 6.

This patch series decouples the detachment of a block device from the removal
of the backing pci-device.  Removal of a hotplugged pci device requires the
guest to respond before qemu tears down the block device. In some cases, the
guest may not respond leaving the guest with continued access to the block
device.  

The new monitor command, drive_del, will revoke a guests access to the
block device independently of the removal of the pci device.

The first patch implements drive_del and bdrv_unplug, the second patch
implements the qmp version of the monitor command.

Changes since v5:
- Removed dangling pointers in guest and host state.  This ensures things like 
  info block no longer displays the deleted drive; though info pci will
  continue to display the pci device until the guest responds to the removal
  request.
- Renamed drive_unplug - drive_del
Changes since v4:
- Droppped drive_get_by_id patch and use bdrv_find() instead
- Added additional details about drive_unplug to hmp/qmp interface

Changes since v3:
- Moved QMP command for drive_unplug() to separate patch

Changes since v2:
- Added QMP command for drive_unplug()

Changes since v1:
- CodingStyle fixes
- Added qemu_aio_flush() to bdrv_unplug()

Signed-off-by: Ryan Harper ry...@us.ibm.com

[Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()

2010-11-08 Thread Ryan Harper

Block hot unplug is racy since the guest is required to acknowlege the ACPI
unplug event; this may not happen synchronously with the device removal command

This series aims to close a gap where by mgmt applications that assume the
block resource has been removed without confirming that the guest has
acknowledged the removal may re-assign the underlying device to a second guest
leading to data leakage.

This series introduces a new montor command to decouple asynchornous device
removal from restricting guest access to a block device.  We do this by creating
a new monitor command drive_del which maps to a bdrv_unplug() command which
does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, subsequent
IO is rejected from the device and the guest will get IO errors but continue to
function.  In addition to preventing further IO, we clean up state pointers
between host (BlockDriverState) and guest (DeviceInfo).

A subsequent device removal command can be issued to remove the device, to which
the guest may or maynot respond, but as long as the unplugged bit is set, no IO
will be sumbitted.

Signed-off-by: Ryan Harper ry...@us.ibm.com
---
 block.c |7 +++
 block.h |1 +
 blockdev.c  |   36 
 blockdev.h  |1 +
 hmp-commands.hx |   18 ++
 5 files changed, 63 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 6b505fb..c76a796 100644
--- a/block.c
+++ b/block.c
@@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int 
removable)
 }
 }
 
+void bdrv_unplug(BlockDriverState *bs)
+{
+qemu_aio_flush();
+bdrv_flush(bs);
+bdrv_close(bs);
+}
+
 int bdrv_is_removable(BlockDriverState *bs)
 {
 return bs-removable;
diff --git a/block.h b/block.h
index 78ecfac..581414c 100644
--- a/block.h
+++ b/block.h
@@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, 
BlockErrorAction on_read_error,
BlockErrorAction on_write_error);
 BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
 void bdrv_set_removable(BlockDriverState *bs, int removable);
+void bdrv_unplug(BlockDriverState *bs);
 int bdrv_is_removable(BlockDriverState *bs);
 int bdrv_is_read_only(BlockDriverState *bs);
 int bdrv_is_sg(BlockDriverState *bs);
diff --git a/blockdev.c b/blockdev.c
index 6cb179a..ee8c2ec 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -14,6 +14,8 @@
 #include qemu-option.h
 #include qemu-config.h
 #include sysemu.h
+#include hw/qdev.h
+#include block_int.h
 
 static QTAILQ_HEAD(drivelist, DriveInfo) drives = 
QTAILQ_HEAD_INITIALIZER(drives);
 
@@ -597,3 +599,37 @@ int do_change_block(Monitor *mon, const char *device,
 }
 return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
 }
+
+int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+const char *id = qdict_get_str(qdict, id);
+BlockDriverState *bs;
+Property *prop;
+
+bs = bdrv_find(id);
+if (!bs) {
+qerror_report(QERR_DEVICE_NOT_FOUND, id);
+return -1;
+}
+
+/* quiesce block driver; prevent further io */
+bdrv_unplug(bs);
+
+/* clean up guest state from pointing to host resource by
+ * finding and removing DeviceState drive property */
+for (prop = bs-peer-info-props; prop  prop-name; prop++) {
+if ((prop-info-type == PROP_TYPE_DRIVE)  
+(*(BlockDriverState **)qdev_get_prop_ptr(bs-peer, prop) == bs)) {
+if (prop-info-free) {
+prop-info-free(bs-peer, prop);
+}
+}
+}
+
+/* clean up host state pointing to guest resource by removing
+ * pointers to guest device in the BlockDriverState */
+bdrv_delete(bs);
+
+return 0;
+}
+ 
diff --git a/blockdev.h b/blockdev.h
index 653affc..2a0559e 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -51,5 +51,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject 
**ret_data);
 int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_change_block(Monitor *mon, const char *device,
 const char *filename, const char *fmt);
+int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
 #endif
diff --git a/hmp-commands.hx b/hmp-commands.hx
index e5585ba..d6dc18c 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -68,6 +68,24 @@ Eject a removable medium (use -f to force it).
 ETEXI
 
 {
+.name   = drive_del,
+.args_type  = id:s,
+.params = device,
+.help   = remove host block device,
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_drive_del,
+},
+
+STEXI
+...@item delete @var{device}
+...@findex delete
+Remove host block device.  The result is that guest generated IO is no longer
+submitted against the host device underlying the disk.  Once a drive has
+been deleted, the QEMU Block layer returns -EIO which results in IO 
+errors in the guest for applications

Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal

2010-11-07 Thread Ryan Harper

* Markus Armbruster arm...@redhat.com [2010-11-06 04:19]:
 Ryan Harper ry...@us.ibm.com writes:
 
  * Markus Armbruster arm...@redhat.com [2010-11-05 11:11]:
  Ryan Harper ry...@us.ibm.com writes:
  
   * Markus Armbruster arm...@redhat.com [2010-11-05 08:28]:
   I'd be fine with any of these:
   
   1. A new command device_disconnet ID (or similar name) to disconnect
  device ID from any host parts.  Nice touch: you don't have to know
  about the device's host part(s) to disconnect it.  But it might be
  more work than the other two.
  
   This is sort of what netdev_del() and drive_unplug() are today; we're
   just saying sever the connection of this device id.   
  
  No, I have netdev_del as (3).
  
  All three options are sort of the same, just different commands with
  a common purpose.
  
   I'd like to rename drive_unplug() to blockdev_del() and call it done.  I
   was looking at libvirt and the right call to netdev_del is already
   in-place; I'd just need to re-spin my block patch to call blockdev_del()
   after invoking device_del() to match what is done for net.
  
  Unless I'm missing something, you can't just rename: your unplug does
  not delete the host part.
  
   2. New commands netdev_disconnect, drive_disconnect (or similar names)
  to disconnect a host part from a guest device.  Like (1), except you
  have to point to the other end of the connection to cut it.
  
   What's the advantage here? We need an additional piece of info (host
   part) in addition to the device id?
  
  That's a disadvantage.
  
  Possible advantage: implementation could be slightly easier than (1),
  because you don't have to find the host parts.
  
   3. A new command drive_del ID similar to existing netdev_del.  This is
  (2) fused with delete.  Conceptual wart: you can't disconnect and
  keep the host part around.  Moreover, delete is slightly dangerous,
  because it renders any guest device still using the host part
  useless.
  
   Hrm, I thought that's what (1) is.
  
  No.
  
  With (1), the argument is a *device* ID, and we disconnect *all* host
  parts connected to this device (typically just one).
  
  With (3), the argument is a netdev/drive ID, and disconnect *this* host
  part from the peer device.
  
   Well, either (1) or (3); I'd like to
   rename drive_unplug() to blockdev_del() since they're similar function
   w.r.t removing access to the host resource.  And we can invoke them in
   the same way from libvirt (after doing guest notification, remove
   access).
  
  I'd call it drive_del for now, to match drive_add.
 
  OK, drive_del() and as you mentioned, drive_unplug will take out the
  block driver, but doesn't remove the dinfo object; that ends up dying
  when we call the device destructor.  I think for symmetry we'll want
  drive_del to remove the dinfo object as well.
 
 Exactly.
 
 a. bdrv_detach() to zap the pointer from bdrv to qdev
 b. zap the pointer from qdev to bdrv
 c. drive_uninit() to dispose of the host part

a-c need to be done to match netdev_del symmetry?  How hard of a req is
this?

 
 Step b could be awkward with (3), because you don't know device details.
 I guess you have to search device properties for a drive property
 pointing to bdrv.  I like (1) because it puts that loop in the one place
 where it belongs: qdev core.  (3) duplicates it in every HOSTDEV_del.
 Except for netdev_del, which is special because of VLANs.
 
 To avoid step b, you could try to keep the bdrv around in a special
 zombie state.  Still have to free the dinfo, but can't use
 drive_uninit() for that then.
 
 If you think I'm overcomplicating this, feel free to prove me wrong with
 working code :)

drive_unplug() works as-is today; so it does feel very combursome at
this point.  Other than the name change and agreement on how mgmt should
invoke the command, it's been a long ride to get here.

I'll take my best shot at trying to clean up the other
pointers and objects; though on one of my attempts when I took out the
dinfo() object that didn't go so well; going to have to audit who uses
dinfo and where and what they check before calling it to have a proper
cleanup that doesn't remove the whole device altogether.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal

2010-11-05 Thread Ryan Harper

* Michael S. Tsirkin m...@redhat.com [2010-11-05 09:18]:
 On Fri, Nov 05, 2010 at 02:27:49PM +0100, Markus Armbruster wrote:
  Ryan Harper ry...@us.ibm.com writes:
  
   * Michael S. Tsirkin m...@redhat.com [2010-11-03 16:46]:
   On Wed, Nov 03, 2010 at 03:59:29PM -0500, Ryan Harper wrote:
* Michael S. Tsirkin m...@redhat.com [2010-11-03 13:03]:
 On Wed, Nov 03, 2010 at 12:29:10PM -0500, Ryan Harper wrote:
  * Markus Armbruster arm...@redhat.com [2010-11-03 11:42]:
   Ryan Harper ry...@us.ibm.com writes:
   
* Michael S. Tsirkin m...@redhat.com [2010-11-03 02:22]:
On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote:
 * Michael S. Tsirkin m...@redhat.com [2010-11-02 14:18]:
  On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper 
  wrote:
   I like the idea of disconnect; if part of the 
   device_del method was to
   invoke a disconnect method, we could implement 
   that for block, net, etc;
   
   I'd think we'd want to send the notification, 
   then disconnect.
   Struggling with whether it's worth having some 
   reasonable timeout
   between notification and disconnect.  
  
  The problem with this is that it has no analog in 
  real world.
  In real world, you can send some notifications to 
  the guest, and you can
  remove the card.  Tying them together is what 
  created the problem in the
  first place.
  
  Timeouts can be implemented by management, maybe 
  with a nice dialog
  being shown to the user.
 
 Very true.  I'm fine with forcing a disconnect 
 during the removal path
 prior to notification.  Do we want a new disconnect 
 method at the device
 level (pci)? or just use the existing removal 
 callback and call that
 during the initial hotremov event?

Not sure what you mean by that, but I don't see a 
device doing anything
differently wrt surprise or ordered removal. So 
probably the existing
callback should do. I don't think we need to talk 
about disconnect:
since we decided we are emulating device removal, 
let's call it
just that.
   
   Because current the removal process depends on the 
   guest actually
   responding.  What I'm suggesting is that, in Marcus's 
   term, and what
   drive_unplug() implements, is to disconnect the host 
   block device from
   the guest device to prevent any further access to it in 
   the case the
   guest doesn't respond to the removal request made via 
   ACPI.
   
   Very specifically, what we're suggesting instead of the 
   drive_unplug()
   command so to complete the device removal operation 
   without waiting for
   the guest to respond; that's what's going to happen if 
   we invoke the
   response callback; it will appear as if the guest 
   responded whether it
   did or not.
   
   What I was suggesting above was to instead of calling 
   the callback for
   handing the guest response was to add a device function 
   called
   disconnect which would remove any association of host 
   resources from
   guest resources before we notified the guest.  Thinking 
   about it again
   I'm not sure this is useful, but if we're going to 
   remove the device
   without the guests knowledge, I'm not sure how useful 
   sending the
   removal requests via ACPI is in the first place.
   
   My feeling is that I'd like to have explicit control 
   over the disconnect
   from host resources separate from the device removal 
   *if* we're going to
   retain the guest notification.  If we don't care to 
   notify the guest,
   then we can just do device removal without notifying 
   the guest
   and be done with it.
  
  I imagine management would typically want to do this:
  1. notify guest
  2. wait a bit
  3. remove device
 
 Yes; but this argues for (1) being a separate command from 
 (3)

Yes. Long term I think we will want a way to do that.

 unless we
 require (3) to include (1) and (2) in the qemu 
 implementation.
 
 Currently we implement:
 
 1. device_del (attempt to remove device)
 2. notify guest

Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal

2010-11-05 Thread Ryan Harper

* Markus Armbruster arm...@redhat.com [2010-11-05 08:28]:
 Ryan Harper ry...@us.ibm.com writes:
 
  * Michael S. Tsirkin m...@redhat.com [2010-11-03 16:46]:
  On Wed, Nov 03, 2010 at 03:59:29PM -0500, Ryan Harper wrote:
   * Michael S. Tsirkin m...@redhat.com [2010-11-03 13:03]:
On Wed, Nov 03, 2010 at 12:29:10PM -0500, Ryan Harper wrote:
 * Markus Armbruster arm...@redhat.com [2010-11-03 11:42]:
  Ryan Harper ry...@us.ibm.com writes:
  
   * Michael S. Tsirkin m...@redhat.com [2010-11-03 02:22]:
   On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote:
* Michael S. Tsirkin m...@redhat.com [2010-11-02 14:18]:
 On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote:
  I like the idea of disconnect; if part of the 
  device_del method was to
  invoke a disconnect method, we could implement 
  that for block, net, etc;
  
  I'd think we'd want to send the notification, 
  then disconnect.
  Struggling with whether it's worth having some 
  reasonable timeout
  between notification and disconnect.  
 
 The problem with this is that it has no analog in 
 real world.
 In real world, you can send some notifications to 
 the guest, and you can
 remove the card.  Tying them together is what 
 created the problem in the
 first place.
 
 Timeouts can be implemented by management, maybe 
 with a nice dialog
 being shown to the user.

Very true.  I'm fine with forcing a disconnect during 
the removal path
prior to notification.  Do we want a new disconnect 
method at the device
level (pci)? or just use the existing removal 
callback and call that
during the initial hotremov event?
   
   Not sure what you mean by that, but I don't see a 
   device doing anything
   differently wrt surprise or ordered removal. So 
   probably the existing
   callback should do. I don't think we need to talk about 
   disconnect:
   since we decided we are emulating device removal, let's 
   call it
   just that.
  
  Because current the removal process depends on the 
  guest actually
  responding.  What I'm suggesting is that, in Marcus's 
  term, and what
  drive_unplug() implements, is to disconnect the host 
  block device from
  the guest device to prevent any further access to it in 
  the case the
  guest doesn't respond to the removal request made via 
  ACPI.
  
  Very specifically, what we're suggesting instead of the 
  drive_unplug()
  command so to complete the device removal operation 
  without waiting for
  the guest to respond; that's what's going to happen if we 
  invoke the
  response callback; it will appear as if the guest 
  responded whether it
  did or not.
  
  What I was suggesting above was to instead of calling the 
  callback for
  handing the guest response was to add a device function 
  called
  disconnect which would remove any association of host 
  resources from
  guest resources before we notified the guest.  Thinking 
  about it again
  I'm not sure this is useful, but if we're going to remove 
  the device
  without the guests knowledge, I'm not sure how useful 
  sending the
  removal requests via ACPI is in the first place.
  
  My feeling is that I'd like to have explicit control over 
  the disconnect
  from host resources separate from the device removal *if* 
  we're going to
  retain the guest notification.  If we don't care to 
  notify the guest,
  then we can just do device removal without notifying the 
  guest
  and be done with it.
 
 I imagine management would typically want to do this:
 1. notify guest
 2. wait a bit
 3. remove device

Yes; but this argues for (1) being a separate command from (3)
   
   Yes. Long term I think we will want a way to do that.
   
unless we
require (3) to include (1) and (2) in the qemu implementation.

Currently we implement:

1. device_del (attempt to remove device)
2. notify guest
3. if guest responds, remove device
4. disconnect host resource from device on destruction

With my drive_unplug patch we do:

1. disconnect host resource from device

Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal

2010-11-05 Thread Ryan Harper

* Markus Armbruster arm...@redhat.com [2010-11-05 11:11]:
 Ryan Harper ry...@us.ibm.com writes:
 
  * Markus Armbruster arm...@redhat.com [2010-11-05 08:28]:
  I'd be fine with any of these:
  
  1. A new command device_disconnet ID (or similar name) to disconnect
 device ID from any host parts.  Nice touch: you don't have to know
 about the device's host part(s) to disconnect it.  But it might be
 more work than the other two.
 
  This is sort of what netdev_del() and drive_unplug() are today; we're
  just saying sever the connection of this device id.   
 
 No, I have netdev_del as (3).
 
 All three options are sort of the same, just different commands with
 a common purpose.
 
  I'd like to rename drive_unplug() to blockdev_del() and call it done.  I
  was looking at libvirt and the right call to netdev_del is already
  in-place; I'd just need to re-spin my block patch to call blockdev_del()
  after invoking device_del() to match what is done for net.
 
 Unless I'm missing something, you can't just rename: your unplug does
 not delete the host part.
 
  2. New commands netdev_disconnect, drive_disconnect (or similar names)
 to disconnect a host part from a guest device.  Like (1), except you
 have to point to the other end of the connection to cut it.
 
  What's the advantage here? We need an additional piece of info (host
  part) in addition to the device id?
 
 That's a disadvantage.
 
 Possible advantage: implementation could be slightly easier than (1),
 because you don't have to find the host parts.
 
  3. A new command drive_del ID similar to existing netdev_del.  This is
 (2) fused with delete.  Conceptual wart: you can't disconnect and
 keep the host part around.  Moreover, delete is slightly dangerous,
 because it renders any guest device still using the host part
 useless.
 
  Hrm, I thought that's what (1) is.
 
 No.
 
 With (1), the argument is a *device* ID, and we disconnect *all* host
 parts connected to this device (typically just one).
 
 With (3), the argument is a netdev/drive ID, and disconnect *this* host
 part from the peer device.
 
  Well, either (1) or (3); I'd like to
  rename drive_unplug() to blockdev_del() since they're similar function
  w.r.t removing access to the host resource.  And we can invoke them in
  the same way from libvirt (after doing guest notification, remove
  access).
 
 I'd call it drive_del for now, to match drive_add.

OK, drive_del() and as you mentioned, drive_unplug will take out the
block driver, but doesn't remove the dinfo object; that ends up dying
when we call the device destructor.  I think for symmetry we'll want
drive_del to remove the dinfo object as well.


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal

2010-11-04 Thread Ryan Harper

* Michael S. Tsirkin m...@redhat.com [2010-11-03 16:46]:
 On Wed, Nov 03, 2010 at 03:59:29PM -0500, Ryan Harper wrote:
  * Michael S. Tsirkin m...@redhat.com [2010-11-03 13:03]:
   On Wed, Nov 03, 2010 at 12:29:10PM -0500, Ryan Harper wrote:
* Markus Armbruster arm...@redhat.com [2010-11-03 11:42]:
 Ryan Harper ry...@us.ibm.com writes:
 
  * Michael S. Tsirkin m...@redhat.com [2010-11-03 02:22]:
  On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote:
   * Michael S. Tsirkin m...@redhat.com [2010-11-02 14:18]:
On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote:
 I like the idea of disconnect; if part of the 
 device_del method was to
 invoke a disconnect method, we could implement that 
 for block, net, etc;
 
 I'd think we'd want to send the notification, then 
 disconnect.
 Struggling with whether it's worth having some 
 reasonable timeout
 between notification and disconnect.  

The problem with this is that it has no analog in real 
world.
In real world, you can send some notifications to the 
guest, and you can
remove the card.  Tying them together is what created 
the problem in the
first place.

Timeouts can be implemented by management, maybe with 
a nice dialog
being shown to the user.
   
   Very true.  I'm fine with forcing a disconnect during 
   the removal path
   prior to notification.  Do we want a new disconnect 
   method at the device
   level (pci)? or just use the existing removal callback 
   and call that
   during the initial hotremov event?
  
  Not sure what you mean by that, but I don't see a device 
  doing anything
  differently wrt surprise or ordered removal. So probably 
  the existing
  callback should do. I don't think we need to talk about 
  disconnect:
  since we decided we are emulating device removal, let's 
  call it
  just that.
 
 Because current the removal process depends on the guest 
 actually
 responding.  What I'm suggesting is that, in Marcus's term, 
 and what
 drive_unplug() implements, is to disconnect the host block 
 device from
 the guest device to prevent any further access to it in the 
 case the
 guest doesn't respond to the removal request made via ACPI.
 
 Very specifically, what we're suggesting instead of the 
 drive_unplug()
 command so to complete the device removal operation without 
 waiting for
 the guest to respond; that's what's going to happen if we 
 invoke the
 response callback; it will appear as if the guest responded 
 whether it
 did or not.
 
 What I was suggesting above was to instead of calling the 
 callback for
 handing the guest response was to add a device function 
 called
 disconnect which would remove any association of host 
 resources from
 guest resources before we notified the guest.  Thinking 
 about it again
 I'm not sure this is useful, but if we're going to remove 
 the device
 without the guests knowledge, I'm not sure how useful 
 sending the
 removal requests via ACPI is in the first place.
 
 My feeling is that I'd like to have explicit control over 
 the disconnect
 from host resources separate from the device removal *if* 
 we're going to
 retain the guest notification.  If we don't care to notify 
 the guest,
 then we can just do device removal without notifying the 
 guest
 and be done with it.

I imagine management would typically want to do this:
1. notify guest
2. wait a bit
3. remove device
   
   Yes; but this argues for (1) being a separate command from (3)
  
  Yes. Long term I think we will want a way to do that.
  
   unless we
   require (3) to include (1) and (2) in the qemu implementation.
   
   Currently we implement:
   
   1. device_del (attempt to remove device)
   2. notify guest
   3. if guest responds, remove device
   4. disconnect host resource from device on destruction
   
   With my drive_unplug patch we do:
   
   1. disconnect host resource from device
  
  This is what drive_unplug does, right?
 
  Correct.
 
  
   2. device_del (attempt to remove device)
   3. notify guest
   4. if guest responds, remove device
   
   I think we're

Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal

2010-11-03 Thread Ryan Harper

* Michael S. Tsirkin m...@redhat.com [2010-11-03 02:22]:
 On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote:
  * Michael S. Tsirkin m...@redhat.com [2010-11-02 14:18]:
   On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote:
I like the idea of disconnect; if part of the device_del method 
was to
invoke a disconnect method, we could implement that for block, 
net, etc;

I'd think we'd want to send the notification, then disconnect.
Struggling with whether it's worth having some reasonable 
timeout
between notification and disconnect.  
   
   The problem with this is that it has no analog in real world.
   In real world, you can send some notifications to the guest, and 
   you can
   remove the card.  Tying them together is what created the problem 
   in the
   first place.
   
   Timeouts can be implemented by management, maybe with a nice 
   dialog
   being shown to the user.
  
  Very true.  I'm fine with forcing a disconnect during the removal 
  path
  prior to notification.  Do we want a new disconnect method at the 
  device
  level (pci)? or just use the existing removal callback and call that
  during the initial hotremov event?
 
 Not sure what you mean by that, but I don't see a device doing 
 anything
 differently wrt surprise or ordered removal. So probably the existing
 callback should do. I don't think we need to talk about disconnect:
 since we decided we are emulating device removal, let's call it
 just that.

Because current the removal process depends on the guest actually
responding.  What I'm suggesting is that, in Marcus's term, and what
drive_unplug() implements, is to disconnect the host block device from
the guest device to prevent any further access to it in the case the
guest doesn't respond to the removal request made via ACPI.

Very specifically, what we're suggesting instead of the drive_unplug()
command so to complete the device removal operation without waiting for
the guest to respond; that's what's going to happen if we invoke the
response callback; it will appear as if the guest responded whether it
did or not.

What I was suggesting above was to instead of calling the callback for
handing the guest response was to add a device function called
disconnect which would remove any association of host resources from
guest resources before we notified the guest.  Thinking about it again
I'm not sure this is useful, but if we're going to remove the device
without the guests knowledge, I'm not sure how useful sending the
removal requests via ACPI is in the first place.

My feeling is that I'd like to have explicit control over the disconnect
from host resources separate from the device removal *if* we're going to
retain the guest notification.  If we don't care to notify the guest,
then we can just do device removal without notifying the guest
and be done with it.
   
   I imagine management would typically want to do this:
   1. notify guest
   2. wait a bit
   3. remove device
  
  Yes; but this argues for (1) being a separate command from (3)
 
 Yes. Long term I think we will want a way to do that.
 
  unless we
  require (3) to include (1) and (2) in the qemu implementation.
  
  Currently we implement:
  
  1. device_del (attempt to remove device)
  2. notify guest
  3. if guest responds, remove device
  4. disconnect host resource from device on destruction
  
  With my drive_unplug patch we do:
  
  1. disconnect host resource from device
 
 This is what drive_unplug does, right?

Correct.

 
  2. device_del (attempt to remove device)
  3. notify guest
  4. if guest responds, remove device
  
  I think we're suggesting to instead do (if we keep disconnect as part of
  device_del)
  
  1. device_del (attemp to remove device)
  2. notify guest
  3. invoke device destruction callback resulting in disconnect host resource 
  from device
  4. if guest responds, invoke device destruction path a second time.
 
 By response you mean eject?  No, this is not what I was suggesting.
 I was really suggesting that your patch is fine :)
 Sorry about confusion.

I don't mean eject; I mean responding to the ACPI event by writing a
response to the PCI chipset which QEMU then in turn will invoke the
qdev_unplug() path which ultimately kills the device and the Drive and
BlockState objects.

 
 I was also saying that from what I hear, the pci express support
 will at some point need interfaces to
 - notify guest about device removal/addition
 - get eject from guest
 - remove device without talking to guest
 - add device without talking to guest
 - suppress device deletion on eject
 
 All this can be generic and can work through express
 configuration mechanisms or through acpi for pci

Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal

2010-11-03 Thread Ryan Harper

* Markus Armbruster arm...@redhat.com [2010-11-03 11:42]:
 Ryan Harper ry...@us.ibm.com writes:
 
  * Michael S. Tsirkin m...@redhat.com [2010-11-03 02:22]:
  On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote:
   * Michael S. Tsirkin m...@redhat.com [2010-11-02 14:18]:
On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote:
 I like the idea of disconnect; if part of the device_del 
 method was to
 invoke a disconnect method, we could implement that for 
 block, net, etc;
 
 I'd think we'd want to send the notification, then 
 disconnect.
 Struggling with whether it's worth having some reasonable 
 timeout
 between notification and disconnect.  

The problem with this is that it has no analog in real world.
In real world, you can send some notifications to the guest, 
and you can
remove the card.  Tying them together is what created the 
problem in the
first place.

Timeouts can be implemented by management, maybe with a nice 
dialog
being shown to the user.
   
   Very true.  I'm fine with forcing a disconnect during the 
   removal path
   prior to notification.  Do we want a new disconnect method at 
   the device
   level (pci)? or just use the existing removal callback and call 
   that
   during the initial hotremov event?
  
  Not sure what you mean by that, but I don't see a device doing 
  anything
  differently wrt surprise or ordered removal. So probably the 
  existing
  callback should do. I don't think we need to talk about disconnect:
  since we decided we are emulating device removal, let's call it
  just that.
 
 Because current the removal process depends on the guest actually
 responding.  What I'm suggesting is that, in Marcus's term, and what
 drive_unplug() implements, is to disconnect the host block device 
 from
 the guest device to prevent any further access to it in the case the
 guest doesn't respond to the removal request made via ACPI.
 
 Very specifically, what we're suggesting instead of the 
 drive_unplug()
 command so to complete the device removal operation without waiting 
 for
 the guest to respond; that's what's going to happen if we invoke the
 response callback; it will appear as if the guest responded whether 
 it
 did or not.
 
 What I was suggesting above was to instead of calling the callback 
 for
 handing the guest response was to add a device function called
 disconnect which would remove any association of host resources from
 guest resources before we notified the guest.  Thinking about it 
 again
 I'm not sure this is useful, but if we're going to remove the device
 without the guests knowledge, I'm not sure how useful sending the
 removal requests via ACPI is in the first place.
 
 My feeling is that I'd like to have explicit control over the 
 disconnect
 from host resources separate from the device removal *if* we're 
 going to
 retain the guest notification.  If we don't care to notify the guest,
 then we can just do device removal without notifying the guest
 and be done with it.

I imagine management would typically want to do this:
1. notify guest
2. wait a bit
3. remove device
   
   Yes; but this argues for (1) being a separate command from (3)
  
  Yes. Long term I think we will want a way to do that.
  
   unless we
   require (3) to include (1) and (2) in the qemu implementation.
   
   Currently we implement:
   
   1. device_del (attempt to remove device)
   2. notify guest
   3. if guest responds, remove device
   4. disconnect host resource from device on destruction
   
   With my drive_unplug patch we do:
   
   1. disconnect host resource from device
  
  This is what drive_unplug does, right?
 
  Correct.
 
  
   2. device_del (attempt to remove device)
   3. notify guest
   4. if guest responds, remove device
   
   I think we're suggesting to instead do (if we keep disconnect as part of
   device_del)
   
   1. device_del (attemp to remove device)
   2. notify guest
   3. invoke device destruction callback resulting in disconnect host 
   resource from device
   4. if guest responds, invoke device destruction path a second time.
  
  By response you mean eject?  No, this is not what I was suggesting.
  I was really suggesting that your patch is fine :)
  Sorry about confusion.
 
  I don't mean eject; I mean responding to the ACPI event by writing a
  response to the PCI chipset which QEMU then in turn will invoke the
  qdev_unplug() path which ultimately kills the device and the Drive and
  BlockState objects.
 
  
  I was also saying that from what I hear, the pci express support
  will at some point need interfaces to
  - notify guest

Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal

2010-11-03 Thread Ryan Harper

* Michael S. Tsirkin m...@redhat.com [2010-11-03 13:03]:
 On Wed, Nov 03, 2010 at 12:29:10PM -0500, Ryan Harper wrote:
  * Markus Armbruster arm...@redhat.com [2010-11-03 11:42]:
   Ryan Harper ry...@us.ibm.com writes:
   
* Michael S. Tsirkin m...@redhat.com [2010-11-03 02:22]:
On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote:
 * Michael S. Tsirkin m...@redhat.com [2010-11-02 14:18]:
  On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote:
   I like the idea of disconnect; if part of the device_del 
   method was to
   invoke a disconnect method, we could implement that for 
   block, net, etc;
   
   I'd think we'd want to send the notification, then 
   disconnect.
   Struggling with whether it's worth having some 
   reasonable timeout
   between notification and disconnect.  
  
  The problem with this is that it has no analog in real 
  world.
  In real world, you can send some notifications to the 
  guest, and you can
  remove the card.  Tying them together is what created the 
  problem in the
  first place.
  
  Timeouts can be implemented by management, maybe with a 
  nice dialog
  being shown to the user.
 
 Very true.  I'm fine with forcing a disconnect during the 
 removal path
 prior to notification.  Do we want a new disconnect method 
 at the device
 level (pci)? or just use the existing removal callback and 
 call that
 during the initial hotremov event?

Not sure what you mean by that, but I don't see a device doing 
anything
differently wrt surprise or ordered removal. So probably the 
existing
callback should do. I don't think we need to talk about 
disconnect:
since we decided we are emulating device removal, let's call it
just that.
   
   Because current the removal process depends on the guest 
   actually
   responding.  What I'm suggesting is that, in Marcus's term, and 
   what
   drive_unplug() implements, is to disconnect the host block 
   device from
   the guest device to prevent any further access to it in the case 
   the
   guest doesn't respond to the removal request made via ACPI.
   
   Very specifically, what we're suggesting instead of the 
   drive_unplug()
   command so to complete the device removal operation without 
   waiting for
   the guest to respond; that's what's going to happen if we invoke 
   the
   response callback; it will appear as if the guest responded 
   whether it
   did or not.
   
   What I was suggesting above was to instead of calling the 
   callback for
   handing the guest response was to add a device function called
   disconnect which would remove any association of host resources 
   from
   guest resources before we notified the guest.  Thinking about it 
   again
   I'm not sure this is useful, but if we're going to remove the 
   device
   without the guests knowledge, I'm not sure how useful sending the
   removal requests via ACPI is in the first place.
   
   My feeling is that I'd like to have explicit control over the 
   disconnect
   from host resources separate from the device removal *if* we're 
   going to
   retain the guest notification.  If we don't care to notify the 
   guest,
   then we can just do device removal without notifying the guest
   and be done with it.
  
  I imagine management would typically want to do this:
  1. notify guest
  2. wait a bit
  3. remove device
 
 Yes; but this argues for (1) being a separate command from (3)

Yes. Long term I think we will want a way to do that.

 unless we
 require (3) to include (1) and (2) in the qemu implementation.
 
 Currently we implement:
 
 1. device_del (attempt to remove device)
 2. notify guest
 3. if guest responds, remove device
 4. disconnect host resource from device on destruction
 
 With my drive_unplug patch we do:
 
 1. disconnect host resource from device

This is what drive_unplug does, right?
   
Correct.
   

 2. device_del (attempt to remove device)
 3. notify guest
 4. if guest responds, remove device
 
 I think we're suggesting to instead do (if we keep disconnect as 
 part of
 device_del)
 
 1. device_del (attemp to remove device)
 2. notify guest
 3. invoke device destruction callback resulting in disconnect host 
 resource from device
 4. if guest responds, invoke device destruction path a second time.

By response you mean eject?  No, this is not what I was suggesting

Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal

2010-11-02 Thread Ryan Harper

* Markus Armbruster arm...@redhat.com [2010-11-02 04:40]:
 Ryan Harper ry...@us.ibm.com writes:
 
  * Markus Armbruster arm...@redhat.com [2010-10-29 11:11]:
  Ryan Harper ry...@us.ibm.com writes:
  
   * Markus Armbruster arm...@redhat.com [2010-10-29 09:13]:
   [Note cc: Michael]
   
   Ryan Harper ry...@us.ibm.com writes:
   
   
   If I understand your patch correctly, the difference between your
   drive_unplug and my blockdev_del is as follows:
   
   * drive_unplug forcefully severs the connection between the host part of
 the block device and its BlockDriverState.  A shell of the host part
 remains, to be cleaned up later.  You need forceful disconnect
 operation to be able to revoke access to an image whether the guest
 cooperates or not.  Fair enough.
   
   * blockdev_del deletes a host part.  My current version fails when the
 host part is in use.  I patterned that after netdev_del, which used to
 work that way, until commit 2ffcb18d:
   
   Make netdev_del delete the netdev even when it's in use
   
   To hot-unplug guest and host part of a network device, you do:
   
   device_del NIC-ID
   netdev_del NETDEV-ID
   
   For PCI devices, device_del merely tells ACPI to unplug the device.
   The device goes away for real only after the guest processed the 
   ACPI
   unplug event.
   
   You have to wait until then (e.g. by polling info pci) before you 
   can
   unplug the netdev.  Not good.
   
   Fix by removing the in use check from do_netdev_del().  Deleting a
   netdev while it's in use is safe; packets simply get routed to the 
   bit
   bucket.
   
 Isn't this the very same problem that's behind your drive_unplug?
  
   Yes it is.
  
   
   I'd like to have some consistency among net, block and char device
   commands, i.e. a common set of operations that work the same for all of
   them.  Can we agree on such a set?
  
   Yeah; the current trouble (or at least what I perceive to be trouble) is
   that in the case where the guest responds to device_del induced ACPI
   removal event; the current qdev code already does the host-side device
   tear down.  Not sure if it is OK to do a blockdev_del() immediately
   after the device_del.  What happens when we do:
  
   device_del
   ACPI to guest
   blockdev_del /* removes host-side device */
  
  Fails in my tree, because the blockdev's still in use.  See below.
  
   guest responds to ACPI
   qdev calls pci device removal code
   qemu attempts to destroy the associated host-side block
  
   That may just work today; and if not, it shouldn't be hard to fix up the
   code to check for NULLs
  
  I hate the automatic deletion of host part along with the guest part.
  device_del should undo device_add.  {block,net,char}dev_{add,del} should
  be similarly paired.
 
  Agreed.
  
  In my blockdev branch, I keep the automatic delete only for backwards
  compatibility: if you create the drive with drive_add, it gets
  auto-deleted, but if you use blockdev_add, it stays around.
 
  But what to do about the case where we're doing drive_add and then a
  device_del()  That's the urgent situation that needs to be resolved.
 
 What's the exact problem we need to solve urgently?
 
 Is it provide means to cut the connection to the host part immediately,
 even with an uncooperative guest?

Yes, need to ensure that if the mgmt layer (libvirt) has done what it
believes should have disassociated the host block device from the guest,
we want to ensure that the host block device is no longer accessible
from the guest.

 
 Does this need to be separate from device_del?

no, it doesn't have to be.  Honestly, I didn't see a clear way to do
something like unplug early in the device_del because that's all pci
device code which has no knowledge of host block devices; having it
disconnect seemed like a layering violation.


 
   Even if your drive_unplug shouldn't fit in that set, we might want it as
   a stop-gap.  Depends on how urgent the need for it is.  Yet another
   special-purpose command to be deprecated later.
  
   The fix is urgent; but I'm willing to spin a couple patches if it helps
   get this into better shape.
  
  Can we agree on a common solution for block and net?  That's why I cc'ed
  Michael.
 
  I didn't see a good way to have block behave the same as net; though I
  do agree that it would be good to have this be common, long term.
 
 If we can't make them behave 100% the same, then the next best thing is
 to offer a preferred way to do things that works similarly enough to let
 users ignore the differences.
 
 Possible preferred ways to revoke access to a host part:
 
 A. device_del
 
Need to make device_del cut the connection right away instead of when
the guest completes unplug.
 
device_del changes behavior.  Any problems with that?

I don't think so; current mgmt consumers assume a cooperative guest and
don't handle

Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal

2010-11-02 Thread Ryan Harper

* Michael S. Tsirkin m...@redhat.com [2010-11-02 08:59]:
 On Tue, Nov 02, 2010 at 08:46:22AM -0500, Ryan Harper wrote:
  * Markus Armbruster arm...@redhat.com [2010-11-02 04:40]:

 I'd like to have some consistency among net, block and char device
 commands, i.e. a common set of operations that work the same for 
 all of
 them.  Can we agree on such a set?

 Yeah; the current trouble (or at least what I perceive to be 
 trouble) is
 that in the case where the guest responds to device_del induced ACPI
 removal event; the current qdev code already does the host-side 
 device
 tear down.  Not sure if it is OK to do a blockdev_del() immediately
 after the device_del.  What happens when we do:

 device_del
 ACPI to guest
 blockdev_del /* removes host-side device */

Fails in my tree, because the blockdev's still in use.  See below.

 guest responds to ACPI
 qdev calls pci device removal code
 qemu attempts to destroy the associated host-side block

 That may just work today; and if not, it shouldn't be hard to fix up 
 the
 code to check for NULLs

I hate the automatic deletion of host part along with the guest part.
device_del should undo device_add.  {block,net,char}dev_{add,del} 
should
be similarly paired.
   
Agreed.

In my blockdev branch, I keep the automatic delete only for backwards
compatibility: if you create the drive with drive_add, it gets
auto-deleted, but if you use blockdev_add, it stays around.
   
But what to do about the case where we're doing drive_add and then a
device_del()  That's the urgent situation that needs to be resolved.
   
   What's the exact problem we need to solve urgently?
   
   Is it provide means to cut the connection to the host part immediately,
   even with an uncooperative guest?
  
  Yes, need to ensure that if the mgmt layer (libvirt) has done what it
  believes should have disassociated the host block device from the guest,
  we want to ensure that the host block device is no longer accessible
  from the guest.
  
   
   Does this need to be separate from device_del?
  
  no, it doesn't have to be.  Honestly, I didn't see a clear way to do
  something like unplug early in the device_del because that's all pci
  device code which has no knowledge of host block devices; having it
  disconnect seemed like a layering violation.
 
 We invoke the cleanup callback, isn't that enough?

Won't that look a bit strange?  on device_del, call the cleanup callback
first;, then notify the guest, if the guest responds, I suppose as long
as the cleanup callback can handle being called a second time that'd
work.

I like the idea of disconnect; if part of the device_del method was to
invoke a disconnect method, we could implement that for block, net, etc;

I'd think we'd want to send the notification, then disconnect.
Struggling with whether it's worth having some reasonable timeout
between notification and disconnect.  




-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal

2010-11-02 Thread Ryan Harper

* Michael S. Tsirkin m...@redhat.com [2010-11-02 10:56]:
 On Tue, Nov 02, 2010 at 09:22:01AM -0500, Ryan Harper wrote:
  * Michael S. Tsirkin m...@redhat.com [2010-11-02 08:59]:
   On Tue, Nov 02, 2010 at 08:46:22AM -0500, Ryan Harper wrote:
* Markus Armbruster arm...@redhat.com [2010-11-02 04:40]:
  
   I'd like to have some consistency among net, block and char 
   device
   commands, i.e. a common set of operations that work the same 
   for all of
   them.  Can we agree on such a set?
  
   Yeah; the current trouble (or at least what I perceive to be 
   trouble) is
   that in the case where the guest responds to device_del induced 
   ACPI
   removal event; the current qdev code already does the host-side 
   device
   tear down.  Not sure if it is OK to do a blockdev_del() 
   immediately
   after the device_del.  What happens when we do:
  
   device_del
   ACPI to guest
   blockdev_del /* removes host-side device */
  
  Fails in my tree, because the blockdev's still in use.  See below.
  
   guest responds to ACPI
   qdev calls pci device removal code
   qemu attempts to destroy the associated host-side block
  
   That may just work today; and if not, it shouldn't be hard to 
   fix up the
   code to check for NULLs
  
  I hate the automatic deletion of host part along with the guest 
  part.
  device_del should undo device_add.  {block,net,char}dev_{add,del} 
  should
  be similarly paired.
 
  Agreed.
  
  In my blockdev branch, I keep the automatic delete only for 
  backwards
  compatibility: if you create the drive with drive_add, it gets
  auto-deleted, but if you use blockdev_add, it stays around.
 
  But what to do about the case where we're doing drive_add and then a
  device_del()  That's the urgent situation that needs to be resolved.
 
 What's the exact problem we need to solve urgently?
 
 Is it provide means to cut the connection to the host part 
 immediately,
 even with an uncooperative guest?

Yes, need to ensure that if the mgmt layer (libvirt) has done what it
believes should have disassociated the host block device from the guest,
we want to ensure that the host block device is no longer accessible
from the guest.

 
 Does this need to be separate from device_del?

no, it doesn't have to be.  Honestly, I didn't see a clear way to do
something like unplug early in the device_del because that's all pci
device code which has no knowledge of host block devices; having it
disconnect seemed like a layering violation.
   
   We invoke the cleanup callback, isn't that enough?
  
  Won't that look a bit strange?  on device_del, call the cleanup callback
  first;, then notify the guest, if the guest responds, I suppose as long
  as the cleanup callback can handle being called a second time that'd
  work.
 
 Well this is exactly what happens with surpise removal.
 If you yank a card out the slot, guest only gets notification
 afterwards.

Right, though the card ripper can (in some systems) press the removal
button which would send notification.  I think I'm fine with not
bothering to notify; this was mgmt interface driven anyhow so who ever
is doing it should have already ensured they weren't using the device.

 
  I like the idea of disconnect; if part of the device_del method was to
  invoke a disconnect method, we could implement that for block, net, etc;
  
  I'd think we'd want to send the notification, then disconnect.
  Struggling with whether it's worth having some reasonable timeout
  between notification and disconnect.  
 
 The problem with this is that it has no analog in real world.
 In real world, you can send some notifications to the guest, and you can
 remove the card.  Tying them together is what created the problem in the
 first place.
 
 Timeouts can be implemented by management, maybe with a nice dialog
 being shown to the user.

Very true.  I'm fine with forcing a disconnect during the removal path
prior to notification.  Do we want a new disconnect method at the device
level (pci)? or just use the existing removal callback and call that
during the initial hotremov event?


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal

2010-11-02 Thread Ryan Harper

I like the idea of disconnect; if part of the device_del method was to
invoke a disconnect method, we could implement that for block, net, etc;

I'd think we'd want to send the notification, then disconnect.
Struggling with whether it's worth having some reasonable timeout
between notification and disconnect.  
   
   The problem with this is that it has no analog in real world.
   In real world, you can send some notifications to the guest, and you can
   remove the card.  Tying them together is what created the problem in the
   first place.
   
   Timeouts can be implemented by management, maybe with a nice dialog
   being shown to the user.
  
  Very true.  I'm fine with forcing a disconnect during the removal path
  prior to notification.  Do we want a new disconnect method at the device
  level (pci)? or just use the existing removal callback and call that
  during the initial hotremov event?
 
 Not sure what you mean by that, but I don't see a device doing anything
 differently wrt surprise or ordered removal. So probably the existing
 callback should do. I don't think we need to talk about disconnect:
 since we decided we are emulating device removal, let's call it
 just that.

Because current the removal process depends on the guest actually
responding.  What I'm suggesting is that, in Marcus's term, and what
drive_unplug() implements, is to disconnect the host block device from
the guest device to prevent any further access to it in the case the
guest doesn't respond to the removal request made via ACPI.

Very specifically, what we're suggesting instead of the drive_unplug()
command so to complete the device removal operation without waiting for
the guest to respond; that's what's going to happen if we invoke the
response callback; it will appear as if the guest responded whether it
did or not.

What I was suggesting above was to instead of calling the callback for
handing the guest response was to add a device function called
disconnect which would remove any association of host resources from
guest resources before we notified the guest.  Thinking about it again
I'm not sure this is useful, but if we're going to remove the device
without the guests knowledge, I'm not sure how useful sending the
removal requests via ACPI is in the first place.

My feeling is that I'd like to have explicit control over the disconnect
from host resources separate from the device removal *if* we're going to
retain the guest notification.  If we don't care to notify the guest,
then we can just do device removal without notifying the guest
and be done with it.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal

2010-11-02 Thread Ryan Harper

* Michael S. Tsirkin m...@redhat.com [2010-11-02 14:18]:
 On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote:
  I like the idea of disconnect; if part of the device_del method was 
  to
  invoke a disconnect method, we could implement that for block, net, 
  etc;
  
  I'd think we'd want to send the notification, then disconnect.
  Struggling with whether it's worth having some reasonable timeout
  between notification and disconnect.  
 
 The problem with this is that it has no analog in real world.
 In real world, you can send some notifications to the guest, and you 
 can
 remove the card.  Tying them together is what created the problem in 
 the
 first place.
 
 Timeouts can be implemented by management, maybe with a nice dialog
 being shown to the user.

Very true.  I'm fine with forcing a disconnect during the removal path
prior to notification.  Do we want a new disconnect method at the device
level (pci)? or just use the existing removal callback and call that
during the initial hotremov event?
   
   Not sure what you mean by that, but I don't see a device doing anything
   differently wrt surprise or ordered removal. So probably the existing
   callback should do. I don't think we need to talk about disconnect:
   since we decided we are emulating device removal, let's call it
   just that.
  
  Because current the removal process depends on the guest actually
  responding.  What I'm suggesting is that, in Marcus's term, and what
  drive_unplug() implements, is to disconnect the host block device from
  the guest device to prevent any further access to it in the case the
  guest doesn't respond to the removal request made via ACPI.
  
  Very specifically, what we're suggesting instead of the drive_unplug()
  command so to complete the device removal operation without waiting for
  the guest to respond; that's what's going to happen if we invoke the
  response callback; it will appear as if the guest responded whether it
  did or not.
  
  What I was suggesting above was to instead of calling the callback for
  handing the guest response was to add a device function called
  disconnect which would remove any association of host resources from
  guest resources before we notified the guest.  Thinking about it again
  I'm not sure this is useful, but if we're going to remove the device
  without the guests knowledge, I'm not sure how useful sending the
  removal requests via ACPI is in the first place.
  
  My feeling is that I'd like to have explicit control over the disconnect
  from host resources separate from the device removal *if* we're going to
  retain the guest notification.  If we don't care to notify the guest,
  then we can just do device removal without notifying the guest
  and be done with it.
 
 I imagine management would typically want to do this:
 1. notify guest
 2. wait a bit
 3. remove device

Yes; but this argues for (1) being a separate command from (3) unless we
require (3) to include (1) and (2) in the qemu implementation.

Currently we implement:

1. device_del (attempt to remove device)
2. notify guest
3. if guest responds, remove device
4. disconnect host resource from device on destruction

With my drive_unplug patch we do:

1. disconnect host resource from device
2. device_del (attempt to remove device)
3. notify guest
4. if guest responds, remove device

I think we're suggesting to instead do (if we keep disconnect as part of
device_del)

1. device_del (attemp to remove device)
2. notify guest
3. invoke device destruction callback resulting in disconnect host resource 
from device
4. if guest responds, invoke device destruction path a second time.



-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()

2010-11-01 Thread Ryan Harper

* Markus Armbruster arm...@redhat.com [2010-10-29 09:08]:
 Ryan Harper ry...@us.ibm.com writes:
 
  Block hot unplug is racy since the guest is required to acknowlege the ACPI
  unplug event; this may not happen synchronously with the device removal 
  command
 
  This series aims to close a gap where by mgmt applications that assume the
  block resource has been removed without confirming that the guest has
  acknowledged the removal may re-assign the underlying device to a second 
  guest
  leading to data leakage.
 
  This series introduces a new montor command to decouple asynchornous device
  removal from restricting guest access to a block device.  We do this by 
  creating
  a new monitor command drive_unplug which maps to a bdrv_unplug() command 
  which
  does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, 
  subsequent
  IO is rejected from the device and the guest will get IO errors but 
  continue to
  function.
 
  A subsequent device removal command can be issued to remove the device, to 
  which
  the guest may or maynot respond, but as long as the unplugged bit is set, 
  no IO
  will be sumbitted.
 
  Changes since v1:
  - Added qemu_aio_flush() before bdrv_flush() to wait on pending io
 
  Signed-off-by: Ryan Harper ry...@us.ibm.com
  ---
   block.c |7 +++
   block.h |1 +
   blockdev.c  |   26 ++
   blockdev.h  |1 +
   hmp-commands.hx |   15 +++
   5 files changed, 50 insertions(+), 0 deletions(-)
 
  diff --git a/block.c b/block.c
  index a19374d..be47655 100644
  --- a/block.c
  +++ b/block.c
  @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int 
  removable)
   }
   }
   
  +void bdrv_unplug(BlockDriverState *bs)
  +{
  +qemu_aio_flush();
  +bdrv_flush(bs);
  +bdrv_close(bs);
  +}
 
 Stupid question: why doesn't bdrv_close() flush automatically?
 
 And why do we have to flush here, but not before other uses of
 bdrv_close(), such as eject_device()?
 
  +
   int bdrv_is_removable(BlockDriverState *bs)
   {
   return bs-removable;
  diff --git a/block.h b/block.h
  index 5f64380..732f63e 100644
  --- a/block.h
  +++ b/block.h
  @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, 
  BlockErrorAction on_read_error,
  BlockErrorAction on_write_error);
   BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
   void bdrv_set_removable(BlockDriverState *bs, int removable);
  +void bdrv_unplug(BlockDriverState *bs);
   int bdrv_is_removable(BlockDriverState *bs);
   int bdrv_is_read_only(BlockDriverState *bs);
   int bdrv_is_sg(BlockDriverState *bs);
  diff --git a/blockdev.c b/blockdev.c
  index 5fc3b9b..68eb329 100644
  --- a/blockdev.c
  +++ b/blockdev.c
  @@ -610,3 +610,29 @@ int do_change_block(Monitor *mon, const char *device,
   }
   return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
   }
  +
  +int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data)
  +{
  +DriveInfo *dinfo;
  +BlockDriverState *bs;
  +const char *id;
  +
  +if (!qdict_haskey(qdict, id)) {
  +qerror_report(QERR_MISSING_PARAMETER, id);
  +return -1;
  +}
 
 As Luiz pointed out, this check is redundant.
 
  +
  +id = qdict_get_str(qdict, id);
  +dinfo = drive_get_by_id(id);
  +if (!dinfo) {
  +qerror_report(QERR_DEVICE_NOT_FOUND, id);
  +return -1;
  +}
  +
  +/* mark block device unplugged */
  +bs = dinfo-bdrv;
  +bdrv_unplug(bs);
  +
  +return 0;
  +}
  + 
 
 What about:
 
 const char *id = qdict_get_str(qdict, id);
 BlockDriverState *bs;
 
 bs = bdrv_find(id);
 if (!bs) {
 qerror_report(QERR_DEVICE_NOT_FOUND, id);
 return -1;
 }
 
 bdrv_unplug(bs);
 
 return 0;
 
 Precedence: commit f8b6cc00 replaced uses of drive_get_by_id() by
 bdrv_find().

That works out nicely; and I can drop the drive_get_by_id() patch as
well.  Thanks.

 
  diff --git a/blockdev.h b/blockdev.h
  index 19c6915..ecb9ac8 100644
  --- a/blockdev.h
  +++ b/blockdev.h
  @@ -52,5 +52,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject 
  **ret_data);
   int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject 
  **ret_data);
   int do_change_block(Monitor *mon, const char *device,
   const char *filename, const char *fmt);
  +int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data);
   
   #endif
  diff --git a/hmp-commands.hx b/hmp-commands.hx
  index 81999aa..7a32a2e 100644
  --- a/hmp-commands.hx
  +++ b/hmp-commands.hx
  @@ -68,6 +68,21 @@ Eject a removable medium (use -f to force it).
   ETEXI
   
   {
  +.name   = drive_unplug,
  +.args_type  = id:s,
  +.params = device,
  +.help   = unplug block device,
  +.user_print = monitor_user_noop,
  +.mhandler.cmd_new = do_drive_unplug

[Qemu-devel] [PATCH 0/2] v5 Decouple block device removal from device removal

2010-11-01 Thread Ryan Harper

This patch series decouples the detachment of a block device from the removal
of the backing pci-device.  Removal of a hotplugged pci device requires the
guest to respond before qemu tears down the block device. In some cases, the
guest may not respond leaving the guest with continued access to the block
device.  

The new monitor command, drive_unplug, will revoke a guests access to the
block device independently of the removal of the pci device.

The first patch adds a new drive find method, the second patch implements the
monitor command and block layer changes.

Changes since v4:
- Droppped drive_get_by_id patch and use bdrv_find() instead
- Added additional details about drive_unplug to hmp/qmp interface

Changes since v3:
- Moved QMP command for drive_unplug() to separate patch

Changes since v2:
- Added QMP command for drive_unplug()

Changes since v1:
- CodingStyle fixes
- Added qemu_aio_flush() to bdrv_unplug()

Signed-off-by: Ryan Harper ry...@us.ibm.com

[Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_unplug()

2010-11-01 Thread Ryan Harper

Block hot unplug is racy since the guest is required to acknowlege the ACPI
unplug event; this may not happen synchronously with the device removal command

This series aims to close a gap where by mgmt applications that assume the
block resource has been removed without confirming that the guest has
acknowledged the removal may re-assign the underlying device to a second guest
leading to data leakage.

This series introduces a new montor command to decouple asynchornous device
removal from restricting guest access to a block device.  We do this by creating
a new monitor command drive_unplug which maps to a bdrv_unplug() command which
does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, subsequent
IO is rejected from the device and the guest will get IO errors but continue to
function.

A subsequent device removal command can be issued to remove the device, to which
the guest may or maynot respond, but as long as the unplugged bit is set, no IO
will be sumbitted.

Signed-off-by: Ryan Harper ry...@us.ibm.com
---
 block.c |7 +++
 block.h |1 +
 blockdev.c  |   17 +
 blockdev.h  |1 +
 hmp-commands.hx |   20 
 5 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 985d0b7..2320e9d 100644
--- a/block.c
+++ b/block.c
@@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int 
removable)
 }
 }
 
+void bdrv_unplug(BlockDriverState *bs)
+{
+qemu_aio_flush();
+bdrv_flush(bs);
+bdrv_close(bs);
+}
+
 int bdrv_is_removable(BlockDriverState *bs)
 {
 return bs-removable;
diff --git a/block.h b/block.h
index a4facf2..608fd83 100644
--- a/block.h
+++ b/block.h
@@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, 
BlockErrorAction on_read_error,
BlockErrorAction on_write_error);
 BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
 void bdrv_set_removable(BlockDriverState *bs, int removable);
+void bdrv_unplug(BlockDriverState *bs);
 int bdrv_is_removable(BlockDriverState *bs);
 int bdrv_is_read_only(BlockDriverState *bs);
 int bdrv_is_sg(BlockDriverState *bs);
diff --git a/blockdev.c b/blockdev.c
index ff7602b..7bbdb65 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -597,3 +597,20 @@ int do_change_block(Monitor *mon, const char *device,
 }
 return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
 }
+
+int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+const char *id = qdict_get_str(qdict, id);
+BlockDriverState *bs;
+
+bs = bdrv_find(id);
+if (!bs) {
+qerror_report(QERR_DEVICE_NOT_FOUND, id);
+return -1;
+}
+
+bdrv_unplug(bs);
+
+return 0;
+}
+ 
diff --git a/blockdev.h b/blockdev.h
index 653affc..a454853 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -51,5 +51,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject 
**ret_data);
 int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_change_block(Monitor *mon, const char *device,
 const char *filename, const char *fmt);
+int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
 #endif
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 81999aa..f6d3c85 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -68,6 +68,26 @@ Eject a removable medium (use -f to force it).
 ETEXI
 
 {
+.name   = drive_unplug,
+.args_type  = id:s,
+.params = device,
+.help   = unplug block device,
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_drive_unplug,
+},
+
+STEXI
+...@item unplug @var{device}
+...@findex unplug
+Unplug block device.  The result is that guest generated IO is no longer
+submitted against the host device underlying the disk.  Once a drive has
+been unplugged, the QEMU Block layer returns -EIO which results in IO 
+errors in the guest for applications that are reading/writing to the device
+when it is unplugged.  Unplugged block devices can be safely deleted along with
+the associated pci devices (if present).
+ETEXI
+
+{
 .name   = change,
 .args_type  = device:B,target:F,arg:s?,
 .params = device filename [format],
-- 
1.6.3.3

[Qemu-devel] [PATCH 2/2] Add qmp version of drive_unplug

2010-11-01 Thread Ryan Harper

Signed-off-by: Ryan Harper ry...@us.ibm.com
---
 qmp-commands.hx |   31 +++
 1 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/qmp-commands.hx b/qmp-commands.hx
index 793cf1c..a1f7b2f 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -338,6 +338,37 @@ Example:
 EQMP
 
 {
+.name   = drive_unplug,
+.args_type  = id:s,
+.params = device,
+.help   = unplug block device,
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_drive_unplug,
+},
+
+SQMP
+drive unplug
+--
+
+Unplug a block device.  The result is that guest generated IO is no longer 
+submitted against the host device underlying the disk.  Once a drive has
+been unplugged, the QEMU Block layer returns -EIO which results in IO 
+errors in the guest for applications that are reading/writing to the device
+when it is unplugged.  Unplugged block devices can be safely deleted along 
with 
+the associated pci devices (if present).
+
+Arguments:
+
+- id: the device's ID (json-string)
+
+Example:
+
+- { execute: drive_unplug, arguments: { id: drive-virtio-blk1 } }
+- { return: {} }
+
+EQMP
+
+{
 .name   = cpu,
 .args_type  = index:i,
 .params = index,
-- 
1.6.3.3

Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal

2010-10-29 Thread Ryan Harper

* Markus Armbruster arm...@redhat.com [2010-10-29 09:13]:
 [Note cc: Michael]
 
 Ryan Harper ry...@us.ibm.com writes:
 
  This patch series decouples the detachment of a block device from the 
  removal
  of the backing pci-device.  Removal of a hotplugged pci device requires the
  guest to respond before qemu tears down the block device. In some cases, the
  guest may not respond leaving the guest with continued access to the block
  device.  
 
  The new monitor command, drive_unplug, will revoke a guests access to the
  block device independently of the removal of the pci device.
 
  The first patch adds a new drive find method, the second patch implements 
  the
  monitor command and block layer changes.
 
  Changes since v3:
  - Moved QMP command for drive_unplug() to separate patch
 
  Changes since v2:
  - Added QMP command for drive_unplug()
 
  Changes since v1:
  - CodingStyle fixes
  - Added qemu_aio_flush() to bdrv_unplug()
 
  Signed-off-by: Ryan Harper ry...@us.ibm.com
 
 If I understand your patch correctly, the difference between your
 drive_unplug and my blockdev_del is as follows:
 
 * drive_unplug forcefully severs the connection between the host part of
   the block device and its BlockDriverState.  A shell of the host part
   remains, to be cleaned up later.  You need forceful disconnect
   operation to be able to revoke access to an image whether the guest
   cooperates or not.  Fair enough.
 
 * blockdev_del deletes a host part.  My current version fails when the
   host part is in use.  I patterned that after netdev_del, which used to
   work that way, until commit 2ffcb18d:
 
 Make netdev_del delete the netdev even when it's in use
 
 To hot-unplug guest and host part of a network device, you do:
 
 device_del NIC-ID
 netdev_del NETDEV-ID
 
 For PCI devices, device_del merely tells ACPI to unplug the device.
 The device goes away for real only after the guest processed the ACPI
 unplug event.
 
 You have to wait until then (e.g. by polling info pci) before you can
 unplug the netdev.  Not good.
 
 Fix by removing the in use check from do_netdev_del().  Deleting a
 netdev while it's in use is safe; packets simply get routed to the bit
 bucket.
 
   Isn't this the very same problem that's behind your drive_unplug?

Yes it is.

 
 I'd like to have some consistency among net, block and char device
 commands, i.e. a common set of operations that work the same for all of
 them.  Can we agree on such a set?

Yeah; the current trouble (or at least what I perceive to be trouble) is
that in the case where the guest responds to device_del induced ACPI
removal event; the current qdev code already does the host-side device
tear down.  Not sure if it is OK to do a blockdev_del() immediately
after the device_del.  What happens when we do:

device_del
ACPI to guest
blockdev_del /* removes host-side device */
guest responds to ACPI
qdev calls pci device removal code
qemu attempts to destroy the associated host-side block

That may just work today; and if not, it shouldn't be hard to fix up the
code to check for NULLs

 
 Even if your drive_unplug shouldn't fit in that set, we might want it as
 a stop-gap.  Depends on how urgent the need for it is.  Yet another
 special-purpose command to be deprecated later.

The fix is urgent; but I'm willing to spin a couple patches if it helps
get this into better shape.


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal

2010-10-29 Thread Ryan Harper

* Markus Armbruster arm...@redhat.com [2010-10-29 11:11]:
 Ryan Harper ry...@us.ibm.com writes:
 
  * Markus Armbruster arm...@redhat.com [2010-10-29 09:13]:
  [Note cc: Michael]
  
  Ryan Harper ry...@us.ibm.com writes:
  
  
  If I understand your patch correctly, the difference between your
  drive_unplug and my blockdev_del is as follows:
  
  * drive_unplug forcefully severs the connection between the host part of
the block device and its BlockDriverState.  A shell of the host part
remains, to be cleaned up later.  You need forceful disconnect
operation to be able to revoke access to an image whether the guest
cooperates or not.  Fair enough.
  
  * blockdev_del deletes a host part.  My current version fails when the
host part is in use.  I patterned that after netdev_del, which used to
work that way, until commit 2ffcb18d:
  
  Make netdev_del delete the netdev even when it's in use
  
  To hot-unplug guest and host part of a network device, you do:
  
  device_del NIC-ID
  netdev_del NETDEV-ID
  
  For PCI devices, device_del merely tells ACPI to unplug the device.
  The device goes away for real only after the guest processed the ACPI
  unplug event.
  
  You have to wait until then (e.g. by polling info pci) before you can
  unplug the netdev.  Not good.
  
  Fix by removing the in use check from do_netdev_del().  Deleting a
  netdev while it's in use is safe; packets simply get routed to the bit
  bucket.
  
Isn't this the very same problem that's behind your drive_unplug?
 
  Yes it is.
 
  
  I'd like to have some consistency among net, block and char device
  commands, i.e. a common set of operations that work the same for all of
  them.  Can we agree on such a set?
 
  Yeah; the current trouble (or at least what I perceive to be trouble) is
  that in the case where the guest responds to device_del induced ACPI
  removal event; the current qdev code already does the host-side device
  tear down.  Not sure if it is OK to do a blockdev_del() immediately
  after the device_del.  What happens when we do:
 
  device_del
  ACPI to guest
  blockdev_del /* removes host-side device */
 
 Fails in my tree, because the blockdev's still in use.  See below.
 
  guest responds to ACPI
  qdev calls pci device removal code
  qemu attempts to destroy the associated host-side block
 
  That may just work today; and if not, it shouldn't be hard to fix up the
  code to check for NULLs
 
 I hate the automatic deletion of host part along with the guest part.
 device_del should undo device_add.  {block,net,char}dev_{add,del} should
 be similarly paired.

Agreed.
 
 In my blockdev branch, I keep the automatic delete only for backwards
 compatibility: if you create the drive with drive_add, it gets
 auto-deleted, but if you use blockdev_add, it stays around.

But what to do about the case where we're doing drive_add and then a
device_del()  That's the urgent situation that needs to be resolved.

 
  Even if your drive_unplug shouldn't fit in that set, we might want it as
  a stop-gap.  Depends on how urgent the need for it is.  Yet another
  special-purpose command to be deprecated later.
 
  The fix is urgent; but I'm willing to spin a couple patches if it helps
  get this into better shape.
 
 Can we agree on a common solution for block and net?  That's why I cc'ed
 Michael.

I didn't see a good way to have block behave the same as net; though I
do agree that it would be good to have this be common, long term.

 
 Currently, we have two different ways:
 
 * The netdev way: del always succeeds
 
   How can it succeed if the host part is in use?
 
   If all device models are prepared to deal with a missing host part, we
   can delete it right away.
 
   Else, we need to replace it with a suitable zombie, which is
   auto-deleted when it goes out of use.  Such zombies are not be visible
   elsewhere, in particular, the ID becomes available immediately.
 
 * The unplug way: del fails while in use, unplug always succeeds
 
   Feels a bit cleaner to me.  But changing netdev_del might not be
   acceptable.
 
 Either way works for me as an user interface.  But I'd rather not have
 both.
 
 Next, we need to consider how to integrate this with the automatic
 deletion of drives on qdev destruction.  That's too late for unplug, we
 want that right in device_del.  I'd leave the stupid automatic delete
 where it is now, in qdev destruction.  The C API need unplug and delete
 separate for that.
 
 
 Regardless of the way we choose, we need to think very clearly on how
 exactly device models should behave when their host part is missing or a
 zombie, and how that behavior appears in the guest.
 
 For net, making it look exactly like a yanked out network cable would
 make sense to me.
 
 What about block?

It seems to me that for block it's like cdrom with no disk, floppy with
no media, hard disk that's gone bad.  I think we we

[Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()

2010-10-25 Thread Ryan Harper

Block hot unplug is racy since the guest is required to acknowlege the ACPI
unplug event; this may not happen synchronously with the device removal command

This series aims to close a gap where by mgmt applications that assume the
block resource has been removed without confirming that the guest has
acknowledged the removal may re-assign the underlying device to a second guest
leading to data leakage.

This series introduces a new montor command to decouple asynchornous device
removal from restricting guest access to a block device.  We do this by creating
a new monitor command drive_unplug which maps to a bdrv_unplug() command which
does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, subsequent
IO is rejected from the device and the guest will get IO errors but continue to
function.

A subsequent device removal command can be issued to remove the device, to which
the guest may or maynot respond, but as long as the unplugged bit is set, no IO
will be sumbitted.

Changes since v1:
- Added qemu_aio_flush() before bdrv_flush() to wait on pending io

Signed-off-by: Ryan Harper ry...@us.ibm.com
---
 block.c |7 +++
 block.h |1 +
 blockdev.c  |   26 ++
 blockdev.h  |1 +
 hmp-commands.hx |   15 +++
 5 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index a19374d..be47655 100644
--- a/block.c
+++ b/block.c
@@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int 
removable)
 }
 }
 
+void bdrv_unplug(BlockDriverState *bs)
+{
+qemu_aio_flush();
+bdrv_flush(bs);
+bdrv_close(bs);
+}
+
 int bdrv_is_removable(BlockDriverState *bs)
 {
 return bs-removable;
diff --git a/block.h b/block.h
index 5f64380..732f63e 100644
--- a/block.h
+++ b/block.h
@@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, 
BlockErrorAction on_read_error,
BlockErrorAction on_write_error);
 BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
 void bdrv_set_removable(BlockDriverState *bs, int removable);
+void bdrv_unplug(BlockDriverState *bs);
 int bdrv_is_removable(BlockDriverState *bs);
 int bdrv_is_read_only(BlockDriverState *bs);
 int bdrv_is_sg(BlockDriverState *bs);
diff --git a/blockdev.c b/blockdev.c
index 5fc3b9b..68eb329 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -610,3 +610,29 @@ int do_change_block(Monitor *mon, const char *device,
 }
 return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
 }
+
+int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+DriveInfo *dinfo;
+BlockDriverState *bs;
+const char *id;
+
+if (!qdict_haskey(qdict, id)) {
+qerror_report(QERR_MISSING_PARAMETER, id);
+return -1;
+}
+
+id = qdict_get_str(qdict, id);
+dinfo = drive_get_by_id(id);
+if (!dinfo) {
+qerror_report(QERR_DEVICE_NOT_FOUND, id);
+return -1;
+}
+
+/* mark block device unplugged */
+bs = dinfo-bdrv;
+bdrv_unplug(bs);
+
+return 0;
+}
+ 
diff --git a/blockdev.h b/blockdev.h
index 19c6915..ecb9ac8 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -52,5 +52,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject 
**ret_data);
 int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_change_block(Monitor *mon, const char *device,
 const char *filename, const char *fmt);
+int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
 #endif
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 81999aa..7a32a2e 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -68,6 +68,21 @@ Eject a removable medium (use -f to force it).
 ETEXI
 
 {
+.name   = drive_unplug,
+.args_type  = id:s,
+.params = device,
+.help   = unplug block device,
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_drive_unplug,
+},
+
+STEXI
+...@item unplug @var{device}
+...@findex unplug
+Unplug block device.
+ETEXI
+
+{
 .name   = change,
 .args_type  = device:B,target:F,arg:s?,
 .params = device filename [format],
-- 
1.6.3.3

[Qemu-devel] [PATCH 3/3] Add qmp version of drive_unplug

2010-10-25 Thread Ryan Harper

Signed-off-by: Ryan Harper ry...@us.ibm.com
---
 qmp-commands.hx |   26 ++
 1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/qmp-commands.hx b/qmp-commands.hx
index 793cf1c..e8f3d4a 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -338,6 +338,32 @@ Example:
 EQMP
 
 {
+.name   = drive_unplug,
+.args_type  = id:s,
+.params = device,
+.help   = unplug block device,
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_drive_unplug,
+},
+
+SQMP
+drive unplug
+--
+
+Unplug a block device.
+
+Arguments:
+
+- id: the device's ID (json-string)
+
+Example:
+
+- { execute: drive_unplug, arguments: { id: drive-virtio-blk1 } }
+- { return: {} }
+
+EQMP
+
+{
 .name   = cpu,
 .args_type  = index:i,
 .params = index,
-- 
1.6.3.3

[Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal

2010-10-25 Thread Ryan Harper

This patch series decouples the detachment of a block device from the removal
of the backing pci-device.  Removal of a hotplugged pci device requires the
guest to respond before qemu tears down the block device. In some cases, the
guest may not respond leaving the guest with continued access to the block
device.  

The new monitor command, drive_unplug, will revoke a guests access to the
block device independently of the removal of the pci device.

The first patch adds a new drive find method, the second patch implements the
monitor command and block layer changes.

Changes since v3:
- Moved QMP command for drive_unplug() to separate patch

Changes since v2:
- Added QMP command for drive_unplug()

Changes since v1:
- CodingStyle fixes
- Added qemu_aio_flush() to bdrv_unplug()

Signed-off-by: Ryan Harper ry...@us.ibm.com

[Qemu-devel] [PATCH 1/3] v2 Add drive_get_by_id

2010-10-25 Thread Ryan Harper

Add a function to find a drive by id string.

Changes since v1:
-Coding Style fix

Signed-off-by: Ryan Harper ry...@us.ibm.com
---
 blockdev.c |   13 +
 blockdev.h |1 +
 2 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index ff7602b..5fc3b9b 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -75,6 +75,19 @@ DriveInfo *drive_get(BlockInterfaceType type, int bus, int 
unit)
 return NULL;
 }
 
+DriveInfo *drive_get_by_id(const char *id)
+{
+DriveInfo *dinfo;
+
+QTAILQ_FOREACH(dinfo, drives, next) {
+if (strcmp(id, dinfo-id)) {
+continue;
+}
+return dinfo;
+}
+return NULL;
+}
+
 int drive_get_max_bus(BlockInterfaceType type)
 {
 int max_bus;
diff --git a/blockdev.h b/blockdev.h
index 653affc..19c6915 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -38,6 +38,7 @@ DriveInfo *drive_get(BlockInterfaceType type, int bus, int 
unit);
 int drive_get_max_bus(BlockInterfaceType type);
 void drive_uninit(DriveInfo *dinfo);
 DriveInfo *drive_get_by_blockdev(BlockDriverState *bs);
+DriveInfo *drive_get_by_id(const char *id);
 
 QemuOpts *drive_add(const char *file, const char *fmt, ...) GCC_FMT_ATTR(2, 3);
 DriveInfo *drive_init(QemuOpts *arg, int default_to_scsi, int *fatal_error);
-- 
1.6.3.3

1 2 >

1 - 100 of 178 matches

Mail list logo