[Qemu-devel] Re: [PATCH] vga: Declare as little endian

2010-12-13 Thread Alexander Graf

On 11.12.2010, at 23:33, Blue Swirl wrote:

 This patch replaces explicit bswaps with endianness hints to the
 mmio layer.
 
 CC: Alexander Graf ag...@suse.de
 Signed-off-by: Blue Swirl blauwir...@gmail.com

Acked-by: Alexander Graf ag...@suse.de


Alex




[Qemu-devel] (no subject)

2010-12-13 Thread Ronnie Sahlberg

Please find a new block driver that, if libiscsi is present on the system,
links against this userspace client library and lets qemu access iSCSI
devices directly without exposing them to the host.

The library used is multiplatform and available from
git://github.com/sahlberg/libiscsi.git






[Qemu-devel] [PATCH] libiscsi

2010-12-13 Thread Ronnie Sahlberg
This patch adds a new block driver: block/iscsi.c
This driver interfaces with the multiplatform posix library
for iscsi initiator/client access to iscsi devices hosted at
git://github.com/sahlberg/libiscsi.git

The patch adds the driver to interface with the iscsi library.
It also updates the configure script to
* by default, probe whether libiscsi is available and, if so, build
  qemu against libiscsi.
* --enable-libiscsi
  Force a build against libiscsi. If libiscsi is not available
  the build will fail.
* --disable-libiscsi
  Do not link against libiscsi, even if it is available.

When linked with libiscsi, qemu gains support for accessing iSCSI resources
such as disks and CD-ROMs directly, without having to make the devices visible
to the host.

You can specify devices using an iSCSI URL of the form:
iscsi://host[:port]/target-iqn-name/lun

Example:
-drive file=iscsi://10.1.1.1:3260/iqn.ronnie.test/1

-cdrom iscsi://10.1.1.1:3260/iqn.ronnie.test/2
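
The URL form above can be split into its parts as in the following minimal sketch. It is not the parsing code from the patch: `struct iscsi_url` and `parse_iscsi_url()` are hypothetical names used only for illustration, and the fallback to port 3260 (the registered iSCSI port) when `:port` is omitted is an assumption about what the driver would do.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the pieces of an iscsi:// URL. */
struct iscsi_url {
    char host[256];
    int  port;          /* assumed default: 3260 when ":port" is omitted */
    char target[256];
    int  lun;
};

/* Parse "iscsi://host[:port]/target-iqn-name/lun".
 * Returns 0 on success, -1 on malformed input. */
static int parse_iscsi_url(const char *url, struct iscsi_url *out)
{
    static const char prefix[] = "iscsi://";
    const char *p, *slash, *colon, *last;
    char *end;
    size_t len;
    long val;

    assert(url != NULL && out != NULL);

    if (strncmp(url, prefix, sizeof(prefix) - 1) != 0) {
        return -1;
    }
    p = url + sizeof(prefix) - 1;

    /* host[:port] runs up to the first '/' */
    slash = strchr(p, '/');
    if (slash == NULL || slash == p) {
        return -1;
    }
    colon = memchr(p, ':', (size_t)(slash - p));
    len = (size_t)((colon ? colon : slash) - p);
    if (len == 0 || len >= sizeof(out->host)) {
        return -1;
    }
    memcpy(out->host, p, len);
    out->host[len] = '\0';

    out->port = 3260;
    if (colon != NULL) {
        val = strtol(colon + 1, &end, 10);
        if (end == colon + 1 || end != slash || val < 1 || val > 65535) {
            return -1;
        }
        out->port = (int)val;
    }

    /* the target name runs up to the last '/', the LUN follows it */
    p = slash + 1;
    last = strrchr(p, '/');
    if (last == NULL || last == p) {
        return -1;
    }
    len = (size_t)(last - p);
    if (len >= sizeof(out->target)) {
        return -1;
    }
    memcpy(out->target, p, len);
    out->target[len] = '\0';

    val = strtol(last + 1, &end, 10);
    if (end == last + 1 || *end != '\0' || val < 0 || val > INT_MAX) {
        return -1;
    }
    out->lun = (int)val;
    return 0;
}
```

Called on the first example URL, this sketch would yield host 10.1.1.1, port 3260, target iqn.ronnie.test and LUN 1. Bracketed IPv6 hosts are not handled.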

Signed-off-by: Ronnie Sahlberg ronniesahlb...@gmail.com
---
 Makefile.objs |2 +-
 block/iscsi.c |  528 +
 configure |   29 +++
 3 files changed, 558 insertions(+), 1 deletions(-)
 create mode 100644 block/iscsi.c

diff --git a/Makefile.objs b/Makefile.objs
index cebb945..81731c5 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -22,7 +22,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vv
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
 block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
-block-nested-$(CONFIG_POSIX) += raw-posix.o
+block-nested-$(CONFIG_POSIX) += raw-posix.o iscsi.o
 block-nested-$(CONFIG_CURL) += curl.o
 
 block-obj-y +=  $(addprefix block/, $(block-nested-y))
diff --git a/block/iscsi.c b/block/iscsi.c
new file mode 100644
index 000..fba5ee6
--- /dev/null
+++ b/block/iscsi.c
@@ -0,0 +1,528 @@
+/*
+ * QEMU Block driver for iSCSI images
+ *
+ * Copyright (c) 2010 Ronnie Sahlberg ronniesahlb...@gmail.com
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "config-host.h"
+#ifdef CONFIG_LIBISCSI
+
+#include <poll.h>
+#include "sysemu.h"
+#include "qemu-common.h"
+#include "qemu-error.h"
+#include "block_int.h"
+
+#include <iscsi/iscsi.h>
+#include <iscsi/scsi-lowlevel.h>
+
+
+typedef struct ISCSILUN {
+    struct iscsi_context *iscsi;
+    int lun;
+    int block_size;
+    unsigned long num_blocks;
+} ISCSILUN;
+
+typedef struct ISCSIAIOCB {
+    BlockDriverAIOCB common;
+    QEMUIOVector *qiov;
+    QEMUBH *bh;
+    ISCSILUN *iscsilun;
+    int canceled;
+    int status;
+    size_t read_size;
+} ISCSIAIOCB;
+
+struct iscsi_task {
+    ISCSILUN *iscsilun;
+    int status;
+    int complete;
+};
+
+static int
+iscsi_is_inserted(BlockDriverState *bs)
+{
+    ISCSILUN *iscsilun = bs->opaque;
+    struct iscsi_context *iscsi = iscsilun->iscsi;
+
+    return iscsi_is_logged_in(iscsi);
+}
+
+
+static void
+iscsi_aio_cancel(BlockDriverAIOCB *blockacb)
+{
+    ISCSIAIOCB *acb = (ISCSIAIOCB *)blockacb;
+
+    acb->status = -EIO;
+    acb->common.cb(acb->common.opaque, acb->status);
+    acb->canceled = 1;
+}
+
+static AIOPool iscsi_aio_pool = {
+    .aiocb_size = sizeof(ISCSIAIOCB),
+    .cancel = iscsi_aio_cancel,
+};
+
+
+static void iscsi_process_read(void *arg);
+static void iscsi_process_write(void *arg);
+
+static void
+iscsi_set_events(ISCSILUN *iscsilun)
+{
+    struct iscsi_context *iscsi = iscsilun->iscsi;
+
+    qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), iscsi_process_read,
+                           (iscsi_which_events(iscsi) & POLLOUT)
+                           ? iscsi_process_write : NULL,
+                           NULL, NULL, iscsilun);
+}
+
+static void
+iscsi_process_read(void *arg)
+{
+    ISCSILUN *iscsilun = arg;
+    struct iscsi_context *iscsi = iscsilun->iscsi;
+
+    iscsi_service(iscsi, POLLIN);
+

[Qemu-devel] Re: [PATCH] fix qruncom compilation problems

2010-12-13 Thread Paolo Bonzini

On 12/11/2010 03:42 PM, Stefano Bonifazi wrote:

Surely I do understand you! Your help has been very very useful and
appreciated already thank you! May you direct me to somebody who's working
on it? Some TCG guru who could understand immediately what's wrong?:)
I noticed, so far, that each question on this mailing list is answered
only by one QEMU developer, is that a sort of policy or just a coincidence?


It's a coincidence. :)

Paolo



[Qemu-devel] Re: [RFC][PATCH v5 08/21] virtagent: add agent_viewfile qmp/hmp command

2010-12-13 Thread Jes Sorensen
On 12/10/10 18:09, Michael Roth wrote:
 I think with strictly enforced size limits the major liability for
 viewfile is, as you mentioned, users using it to view binary data or
 carefully crafted files that can mess up or fool users/shells/programs
 interpreting monitor output.
 
 But plain-text does not include escape sequences, so it's completely
 reasonable that we'd scrape them. And I'm not sure if a (qemu) in the
 text is a potential liability. Would there be any other issues to consider?
 
 If we can guard against those things, do you agree it wouldn't be an
 inherently dangerous interface? Stateful, asynchronous RPCs like
 copyfile and exec are not really something I'd planned for the initial
 release. I think they'll take some time to get right, and a simple
 low-risk interface to cover what I'm fairly sure is the most common use
 case seems reasonable.

I am still wary of relying on strict limit enforcement. It is the sort
of thing that will eventually change without us noticing and we end up
with a security hole.

IMHO QEMU should not try to do these sorts of things, instead it should
provide the transport and control services. I don't think file viewing
belongs in QEMU at all. I would be a lot more comfortable if this was
implemented as a standalone monitor interface that connected to QEMU's
QMP interface. I could then use QMP to perform actions like copying the
file to /tmp and if viewing the file caused the monitor to lock up, we
wouldn't lose the guest. This could indeed be the start of an external
monitor :)

Cheers,
Jes



[Qemu-devel] Re: [SeaBIOS] seabios: acpi: add _RMV control method for PCI devices

2010-12-13 Thread Avi Kivity

On 12/13/2010 02:00 AM, Marcelo Tosatti wrote:

On Sat, Dec 11, 2010 at 09:39:30AM +0200, Avi Kivity wrote:
  On 12/08/2010 07:08 PM, Marcelo Tosatti wrote:
  Use _RMV method to indicate whether device can be removed.
  
  Data is retrieved from QEMU via I/O port 0xae0c.
  

  Where did this port come from?

It's the next available address after PCI EJ base, used
for QEMU-ACPI hotplug communication.

  What's the protocol?

ACPI reads the 32-bit field indicating the return value of the _RMV
method (which is used by Windows to decide removability). 1-bit per
slot.

More ports have to be registered if more buses are added.

  Maybe we should do this via fw_cfg.

I don't see a need for it? (yes, it might be possible, but i'm not
familiar enough with AML).


To avoid adding tons of undocumented I/O ports, and to allow 
discoverability (what happens with a new seabios on old qemu)?


We could do this in two ways: by adding a fwcfg client to the DSDT, or 
by copying the information to system memory, and referencing system 
memory from the DSDT.
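
The one-bit-per-slot encoding Marcelo describes for the 32-bit field can be sketched as follows. The thread only states "1-bit per slot"; the mapping of bit N to slot N and the meaning "set bit = removable" are assumptions made here for illustration, and the helper names are hypothetical rather than taken from SeaBIOS or QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_SLOTS_PER_BUS 32u  /* one bit per slot in the 32-bit field */

/* Assumption: bit N of the mask describes slot N, and 1 means "removable". */
static bool slot_is_removable(uint32_t rmv_mask, unsigned slot)
{
    assert(slot < EX_SLOTS_PER_BUS);
    return ((rmv_mask >> slot) & 1u) != 0;
}

/* Set or clear the bit for one slot and return the updated mask. */
static uint32_t set_slot_removable(uint32_t rmv_mask, unsigned slot,
                                   bool removable)
{
    assert(slot < EX_SLOTS_PER_BUS);
    if (removable) {
        return rmv_mask | (UINT32_C(1) << slot);
    }
    return rmv_mask & ~(UINT32_C(1) << slot);
}
```

A single 32-bit read covers one bus of 32 slots, which is why more ports would be needed if more buses were added.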


--
error compiling committee.c: too many arguments to function




[Qemu-devel] Re: [PATCH V2] qemu, kvm: Enable user space NMI injection for kvm guest

2010-12-13 Thread Lai Jiangshan
On 12/10/2010 04:41 PM, Jan Kiszka wrote:
 Am 10.12.2010 08:42, Lai Jiangshan wrote:

 Make use of the new KVM_NMI IOCTL to send NMIs into the KVM guest if the
 user space raised them. (example: qemu monitor's nmi command)

 Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
 ---
 diff --git a/configure b/configure
 index 2917874..f6f9362 100755
 --- a/configure
 +++ b/configure
 @@ -1646,6 +1646,9 @@ if test "$kvm" != "no" ; then
  #if !defined(KVM_CAP_DESTROY_MEMORY_REGION_WORKS)
  #error Missing KVM capability KVM_CAP_DESTROY_MEMORY_REGION_WORKS
  #endif
 +#if !defined(KVM_CAP_USER_NMI)
 +#error Missing KVM capability KVM_CAP_USER_NMI
 +#endif
  int main(void) { return 0; }
  EOF
    if test "$kerneldir" != "" ; then
 
 That's what I meant.
 
 We also have a runtime check for KVM_CAP_DESTROY_MEMORY_REGION_WORKS on
 kvm init, but IMHO adding the same for KVM_CAP_USER_NMI would be
 overkill. So...
 
 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index 7dfc357..755f8c9 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -1417,6 +1417,13 @@ int kvm_arch_get_registers(CPUState *env)
  
  int kvm_arch_pre_run(CPUState *env, struct kvm_run *run)
  {
 +    /* Inject NMI */
 +    if (env->interrupt_request & CPU_INTERRUPT_NMI) {
 +        env->interrupt_request &= ~CPU_INTERRUPT_NMI;
 +        DPRINTF("injected NMI\n");
 +        kvm_vcpu_ioctl(env, KVM_NMI);
 +    }
 +
      /* Try to inject an interrupt if the guest can accept it */
      if (run->ready_for_interrupt_injection &&
          (env->interrupt_request & CPU_INTERRUPT_HARD) &&
 
 Acked-by: Jan Kiszka jan.kis...@siemens.com
 

Hi, Avi

Could you apply this patch or give me any comments/suggestions?

Thanks,
Lai



[Qemu-devel] Re: [SeaBIOS] seabios: acpi: add _RMV control method for PCI devices

2010-12-13 Thread Gleb Natapov
On Mon, Dec 13, 2010 at 10:41:25AM +0200, Avi Kivity wrote:
 On 12/13/2010 02:00 AM, Marcelo Tosatti wrote:
 On Sat, Dec 11, 2010 at 09:39:30AM +0200, Avi Kivity wrote:
   On 12/08/2010 07:08 PM, Marcelo Tosatti wrote:
   Use _RMV method to indicate whether device can be removed.
   
   Data is retrieved from QEMU via I/O port 0xae0c.
   
 
   Where did this port come from?
 
 It's the next available address after PCI EJ base, used
 for QEMU-ACPI hotplug communication.
 
   What's the protocol?
 
 ACPI reads the 32-bit field indicating the return value of the _RMV
 method (which is used by Windows to decide removability). 1-bit per
 slot.
 
 More ports have to be registered if more buses are added.
 
   Maybe we should do this via fw_cfg.
 
 I don't see a need for it? (yes, it might be possible, but i'm not
 familiar enough with AML).
 
 To avoid adding tons of undocumented I/O ports, and to allow
 discoverability (what happens with a new seabios on old qemu)?
 
We already have our own mini pci hot-plug controller at io port 0xae00.
The patch just extends its functionality a bit. Logically this
functionality belongs there.

 We could do this in two ways: by adding a fwcfg client to the DSDT,
 or by copying the information to system memory, and referencing
 system memory from the DSDT.
 
This is even worse. It requires some fixed address to be shared between
DSDT and Seabios (or alternatively Seabios will have to generate this
part of DSDT dynamically).

--
Gleb.



[Qemu-devel] Re: [SeaBIOS] seabios: acpi: add _RMV control method for PCI devices

2010-12-13 Thread Avi Kivity

On 12/13/2010 10:49 AM, Gleb Natapov wrote:

On Mon, Dec 13, 2010 at 10:41:25AM +0200, Avi Kivity wrote:
  On 12/13/2010 02:00 AM, Marcelo Tosatti wrote:
  On Sat, Dec 11, 2010 at 09:39:30AM +0200, Avi Kivity wrote:
 On 12/08/2010 07:08 PM, Marcelo Tosatti wrote:
 Use _RMV method to indicate whether device can be removed.
 
 Data is retrieved from QEMU via I/O port 0xae0c.
 
  
 Where did this port come from?
  
  It's the next available address after PCI EJ base, used
  for QEMU-ACPI hotplug communication.
  
 What's the protocol?
  
  ACPI reads the 32-bit field indicating the return value of the _RMV
  method (which is used by Windows to decide removability). 1-bit per
  slot.
  
  More ports have to be registered if more buses are added.
  
 Maybe we should do this via fw_cfg.
  
  I don't see a need for it? (yes, it might be possible, but i'm not
  familiar enough with AML).

  To avoid adding tons of undocumented I/O ports, and to allow
  discoverability (what happens with a new seabios on old qemu)?

We already have our own mini pci hot-plug controller at io port 0xae00.
The patch just extends its functionality a bit. Logically this
functionality belongs there.


Well, at least it should be documented.

We could also deprecate the old port and use fwcfg for everything (try 
fwcfg, fall back to ae00).



  We could do this in two ways: by adding a fwcfg client to the DSDT,
  or by copying the information to system memory, and referencing
  system memory from the DSDT.

This is even worse. It requires some fixed address to be shared between
DSDT and Seabios (or alternatively Seabios will have to generate this
part of DSDT dynamically).



Could easily be something in the F segment.

--
error compiling committee.c: too many arguments to function




[Qemu-devel] Re: [SeaBIOS] seabios: acpi: add _RMV control method for PCI devices

2010-12-13 Thread Gleb Natapov
On Mon, Dec 13, 2010 at 10:53:07AM +0200, Avi Kivity wrote:
 On 12/13/2010 10:49 AM, Gleb Natapov wrote:
 On Mon, Dec 13, 2010 at 10:41:25AM +0200, Avi Kivity wrote:
   On 12/13/2010 02:00 AM, Marcelo Tosatti wrote:
   On Sat, Dec 11, 2010 at 09:39:30AM +0200, Avi Kivity wrote:
  On 12/08/2010 07:08 PM, Marcelo Tosatti wrote:
  Use _RMV method to indicate whether device can be removed.
  
  Data is retrieved from QEMU via I/O port 0xae0c.
  
   
  Where did this port come from?
   
   It's the next available address after PCI EJ base, used
   for QEMU-ACPI hotplug communication.
   
  What's the protocol?
   
   ACPI reads the 32-bit field indicating the return value of the _RMV
   method (which is used by Windows to decide removability). 1-bit per
   slot.
   
   More ports have to be registered if more buses are added.
   
  Maybe we should do this via fw_cfg.
   
   I don't see a need for it? (yes, it might be possible, but i'm not
   familiar enough with AML).
 
   To avoid adding tons of undocumented I/O ports, and to allow
   discoverability (what happens with a new seabios on old qemu)?
 
 We already have our own mini pci hot-plug controller at io port 0xae00.
 The patch just extends its functionality a bit. Logically this
 functionality belongs there.
 
 Well, at least it should be documented.
 
Agree.

 We could also deprecate the old port and use fwcfg for everything
 (try fwcfg, fall back to ae00).
 
fwcfg is designed to be simple, for easy use by firmware. It has two ports,
one for the index and another for the value, so its use is racy in a
multi-threaded SMP environment. DSDT code is executed in such an environment.
There is a lock facility in AML, but why complicate things?
 
   We could do this in two ways: by adding a fwcfg client to the DSDT,
   or by copying the information to system memory, and referencing
   system memory from the DSDT.
 
 This is even worse. It requires some fixed address to be shared between
 DSDT and Seabios (or alternatively Seabios will have to generate this
 part of DSDT dynamically).
 
 
 Could easily be something in the F segment.
 
Yes, but then we will have two magic values (fwcfg index + address
in F segment) instead of one (address of pci hot-plug controller).
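
The race Gleb describes with an index port plus a value port can be illustrated with a toy model. This is not fw_cfg code: `struct toy_cfg` and the functions below are hypothetical, and the two "CPUs" are simulated by ordering the calls by hand instead of using real threads.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a select/data register pair: one shared selector,
 * and the data read returns whatever entry is currently selected. */
struct toy_cfg {
    uint16_t selected;
    uint32_t values[4];
};

static void toy_select(struct toy_cfg *dev, uint16_t index)
{
    assert(index < 4);
    dev->selected = index;
}

static uint32_t toy_read(const struct toy_cfg *dev)
{
    return dev->values[dev->selected];
}

/* Unsafe interleaving: CPU A selects, CPU B selects, then A reads. */
static uint32_t racy_read_by_a(struct toy_cfg *dev,
                               uint16_t index_a, uint16_t index_b)
{
    toy_select(dev, index_a);   /* CPU A writes the index port */
    toy_select(dev, index_b);   /* CPU B overwrites the shared selector */
    return toy_read(dev);       /* CPU A now reads B's entry */
}

/* Serialized ordering: each select+read pair runs as one unit,
 * which is what holding a lock around the pair would enforce. */
static uint32_t serialized_read_by_a(struct toy_cfg *dev,
                                     uint16_t index_a, uint16_t index_b)
{
    uint32_t a;

    toy_select(dev, index_a);
    a = toy_read(dev);          /* CPU A finishes before B starts */
    toy_select(dev, index_b);
    (void)toy_read(dev);        /* CPU B's own access */
    return a;
}
```

With the entries set to 10, 20, 30 and 40, CPU A asking for index 1 while CPU B asks for index 2 gets 30 in the racy ordering and 20 in the serialized one.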

--
Gleb.



[Qemu-devel] Re: [SeaBIOS] seabios: acpi: add _RMV control method for PCI devices

2010-12-13 Thread Avi Kivity

On 12/13/2010 11:03 AM, Gleb Natapov wrote:

  We could also deprecate the old port and use fwcfg for everything
  (try fwcfg, fall back to ae00).

fwcfg is designed to be simple, for easy use by firmware. It has two ports,
one for the index and another for the value, so its use is racy in a
multi-threaded SMP environment. DSDT code is executed in such an environment.
There is a lock facility in AML, but why complicate things?


I prefer to remove complexity from interfaces and have it in the 
implementation instead.



 We could do this in two ways: by adding a fwcfg client to the DSDT,
 or by copying the information to system memory, and referencing
 system memory from the DSDT.
  
  This is even worse. It requires some fixed address to be shared between
  DSDT and Seabios (or alternatively Seabios will have to generate this
  part of DSDT dynamically).
  

  Could easily be something in the F segment.

Yes, but then we will have two magic values (fwcfg index + address
in F segment) instead of one (address of pci hot-plug controller).


The F segment address is internal to SeaBIOS; it isn't an external 
interface.


--
error compiling committee.c: too many arguments to function




[Qemu-devel] Re: [SeaBIOS] seabios: acpi: add _RMV control method for PCI devices

2010-12-13 Thread Gleb Natapov
On Mon, Dec 13, 2010 at 11:10:38AM +0200, Avi Kivity wrote:
 On 12/13/2010 11:03 AM, Gleb Natapov wrote:
   We could also deprecate the old port and use fwcfg for everything
   (try fwcfg, fall back to ae00).
 
 fwcfg is designed to be simple, for easy use by firmware. It has two ports,
 one for the index and another for the value, so its use is racy in a
 multi-threaded SMP environment. DSDT code is executed in such an environment.
 There is a lock facility in AML, but why complicate things?
 
 I prefer to remove complexity from interfaces and have it in the
 implementation instead.
I prefer whatever is simpler :) simpler == fewer bugs. And it is not as if
we are discussing a new interface here. You want to deprecate an existing
interface in favor of something that was not designed to handle the task.

 
  We could do this in two ways: by adding a fwcfg client to the DSDT,
  or by copying the information to system memory, and referencing
  system memory from the DSDT.
   
   This is even worse. It requires some fixed address to be shared between
   DSDT and Seabios (or alternatively Seabios will have to generate this
   part of DSDT dynamically).
   
 
   Could easily be something in the F segment.
 
 Yes, but then we will have two magic values (fwcfg index + address
 in F segment) instead of one (address of pci hot-plug controller).
 
 The F segment address is internal to SeaBIOS; it isn't an external
 interface.
 
Depends on how you define an external interface. It can be considered an
interface between OSPM and firmware. The next time the layout of the F segment
changes in SeaBIOS, will you remember to fix the DSDT too?

--
Gleb.



[Qemu-devel] [PATCH] qemu-io: Add discard command

2010-12-13 Thread Stefan Hajnoczi
discard [-Cq] off len -- discards a number of bytes at a specified
offset

 discards a range of bytes from the given offset

 Example:
 'discard 512 1k' - discards 1 kilobyte from 512 bytes into the file

 Discards a segment of the currently open file.
 -C, -- report statistics in a machine parsable format
 -q, -- quiet mode, do not show I/O statistics

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 qemu-io.c |   88 +
 1 files changed, 88 insertions(+), 0 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index ff353eb..9de5361 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -1394,6 +1394,93 @@ static const cmdinfo_t info_cmd = {
     .oneline    = "prints information about the current file",
 };
 
+static void
+discard_help(void)
+{
+    printf(
+"\n"
+" discards a range of bytes from the given offset\n"
+"\n"
+" Example:\n"
+" 'discard 512 1k' - discards 1 kilobyte from 512 bytes into the file\n"
+"\n"
+" Discards a segment of the currently open file.\n"
+" -C, -- report statistics in a machine parsable format\n"
+" -q, -- quiet mode, do not show I/O statistics\n"
+"\n");
+}
+
+static int discard_f(int argc, char **argv);
+
+static const cmdinfo_t discard_cmd = {
+    .name       = "discard",
+    .altname    = "d",
+    .cfunc      = discard_f,
+    .argmin     = 2,
+    .argmax     = -1,
+    .args       = "[-Cq] off len",
+    .oneline    = "discards a number of bytes at a specified offset",
+    .help       = discard_help,
+};
+
+static int
+discard_f(int argc, char **argv)
+{
+    struct timeval t1, t2;
+    int Cflag = 0, qflag = 0;
+    int c, ret;
+    int64_t offset;
+    int count;
+
+    while ((c = getopt(argc, argv, "Cq")) != EOF) {
+        switch (c) {
+        case 'C':
+            Cflag = 1;
+            break;
+        case 'q':
+            qflag = 1;
+            break;
+        default:
+            return command_usage(&discard_cmd);
+        }
+    }
+
+    if (optind != argc - 2) {
+        return command_usage(&discard_cmd);
+    }
+
+    offset = cvtnum(argv[optind]);
+    if (offset < 0) {
+        printf("non-numeric length argument -- %s\n", argv[optind]);
+        return 0;
+    }
+
+    optind++;
+    count = cvtnum(argv[optind]);
+    if (count < 0) {
+        printf("non-numeric length argument -- %s\n", argv[optind]);
+        return 0;
+    }
+
+    gettimeofday(&t1, NULL);
+    ret = bdrv_discard(bs, offset, count);
+    gettimeofday(&t2, NULL);
+
+    if (ret < 0) {
+        printf("discard failed: %s\n", strerror(-ret));
+        goto out;
+    }
+
+    /* Finally, report back -- -C gives a parsable format */
+    if (!qflag) {
+        t2 = tsub(t2, t1);
+        print_report("discard", &t2, offset, count, count, 1, Cflag);
+    }
+
+out:
+    return 0;
+}
+
 static int
 alloc_f(int argc, char **argv)
 {
@@ -1717,6 +1804,7 @@ int main(int argc, char **argv)
     add_command(&truncate_cmd);
     add_command(&length_cmd);
     add_command(&info_cmd);
+    add_command(&discard_cmd);
     add_command(&alloc_cmd);
     add_command(&map_cmd);
 
-- 
1.7.2.3
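
The help text's example, 'discard 512 1k', relies on size arguments that accept a unit suffix. The patch uses qemu-io's existing cvtnum() for this; the sketch below is not that function but a hypothetical stand-in, `parse_size()`, showing one way such suffix handling might work. Supporting only k, m and g with binary multiples is an assumption made here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse "512", "1k", "4M" or "2g" into a byte count.
 * Returns -1 on malformed or out-of-range input. */
static int64_t parse_size(const char *s)
{
    char *end;
    long long val;
    int shift = 0;

    assert(s != NULL);
    errno = 0;
    val = strtoll(s, &end, 10);
    if (end == s || errno != 0 || val < 0) {
        return -1;
    }

    switch (*end) {
    case '\0':
        return (int64_t)val;
    case 'k': case 'K':
        shift = 10;
        break;
    case 'm': case 'M':
        shift = 20;
        break;
    case 'g': case 'G':
        shift = 30;
        break;
    default:
        return -1;
    }

    /* reject trailing characters after the suffix, and overflow */
    if (end[1] != '\0' || val > (INT64_MAX >> shift)) {
        return -1;
    }
    return (int64_t)(val * (1LL << shift));
}
```

With this sketch, "512" parses to 512 and "1k" to 1024, matching the "1 kilobyte from 512 bytes into the file" wording of the help text.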




[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify

2010-12-13 Thread Stefan Hajnoczi
On Sun, Dec 12, 2010 at 9:09 PM, Michael S. Tsirkin m...@redhat.com wrote:
 On Sun, Dec 12, 2010 at 10:56:34PM +0200, Michael S. Tsirkin wrote:
 On Sun, Dec 12, 2010 at 10:42:28PM +0200, Michael S. Tsirkin wrote:
  On Sun, Dec 12, 2010 at 10:41:28PM +0200, Michael S. Tsirkin wrote:
   On Sun, Dec 12, 2010 at 03:02:04PM +, Stefan Hajnoczi wrote:
See below for the v5 changelog.
   
Due to lack of connectivity I am sending from GMail.  Git should retain my
stefa...@linux.vnet.ibm.com From address.

Virtqueue notify is currently handled synchronously in userspace virtio.  This
prevents the vcpu from executing guest code while hardware emulation code
handles the notify.

On systems that support KVM, the ioeventfd mechanism can be used to make
virtqueue notify a lightweight exit by deferring hardware emulation to the
iothread and allowing the VM to continue execution.  This model is similar to
how vhost receives virtqueue notifies.

The result of this change is improved performance for userspace virtio devices.
Virtio-blk throughput increases especially for multithreaded scenarios and
virtio-net transmit throughput increases substantially.
  
   Interestingly, I see decreased throughput for small message
   host to guest netperf runs.
  
   The command that I used was:
   netperf -H $vguest -- -m 200
  
   And the results are:
   - with ioeventfd=off
   TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.104 (11.0.0.104) port 0 AF_INET : demo
   Recv   Send    Send                          Utilization       Service Demand
   Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
   Size   Size    Size     Time     Throughput  local    remote   local   remote
   bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

    87380  16384    200    10.00      3035.48   15.50    99.30    6.695   2.680

   - with ioeventfd=on
   TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.104 (11.0.0.104) port 0 AF_INET : demo
   Recv   Send    Send                          Utilization       Service Demand
   Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
   Size   Size    Size     Time     Throughput  local    remote   local   remote
   bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

    87380  16384    200    10.00      1770.95   18.16    51.65    13.442  2.389
  
  
   Do you see this behaviour too?
 
  Just a note: this is with the patchset ported to qemu-kvm.

 And just another note: the trend is reversed for larger messages,
 e.g. with 1.5k messages ioeventfd=on outperforms ioeventfd=off.

 Another datapoint where I see a regression is with 4000 byte messages
 for guest to host traffic.

 ioeventfd=off
 set_up_server could not establish a listen endpoint for  port 12865 with family AF_UNSPEC
 TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.4 (11.0.0.4) port 0 AF_INET : demo
 Recv   Send    Send                          Utilization       Service Demand
 Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
 Size   Size    Size     Time     Throughput  local    remote   local   remote
 bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

  87380  16384   4000    10.00      7717.56   98.80    15.11    1.049   2.566

 ioeventfd=on
 TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.4 (11.0.0.4) port 0 AF_INET : demo
 Recv   Send    Send                          Utilization       Service Demand
 Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
 Size   Size    Size     Time     Throughput  local    remote   local   remote
 bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

  87380  16384   4000    10.00      3965.86   87.69    15.29    1.811   5.055
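
The size of the regressions quoted above can be worked out from the Throughput column. The helper below is only an arithmetic aid with a hypothetical name, `percent_change()`; it is not part of netperf or of the patch series.

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent.
 * A negative result means throughput dropped. */
static double percent_change(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

For the 200-byte host-to-guest run, percent_change(3035.48, 1770.95) comes to roughly -42%; for the 4000-byte guest-to-host run, percent_change(7717.56, 3965.86) comes to roughly -49%.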

Interesting.  I posted the following results in an earlier version of
this patch:

Sridhar Samudrala s...@us.ibm.com collected the following data for
virtio-net with 2.6.36-rc1 on the host and 2.6.34 on the guest.

Guest to Host TCP_STREAM throughput(Mb/sec)
-------------------------------------------
Msg Size  vhost-net  virtio-net  virtio-net/ioeventfd
   65536      12755        6430                  7590
   16384       8499        3084                  5764
    4096       4723        1578                  3659
Here we got a throughput improvement where you got a regression.  Your
virtio-net ioeventfd=off throughput is much higher than what we got
(different hardware and configuration, but still I didn't know that
virtio-net reaches 7 Gbit/s!).

I have focussed on the block side of things.  Any thoughts about the
virtio-net performance we're seeing?

 1024  1827 981  2060

Host to Guest TCP_STREAM throughput(Mb/sec)

[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify

2010-12-13 Thread Michael S. Tsirkin
On Mon, Dec 13, 2010 at 10:24:51AM +, Stefan Hajnoczi wrote:
 On Sun, Dec 12, 2010 at 9:09 PM, Michael S. Tsirkin m...@redhat.com wrote:
  On Sun, Dec 12, 2010 at 10:56:34PM +0200, Michael S. Tsirkin wrote:
  On Sun, Dec 12, 2010 at 10:42:28PM +0200, Michael S. Tsirkin wrote:
   On Sun, Dec 12, 2010 at 10:41:28PM +0200, Michael S. Tsirkin wrote:
On Sun, Dec 12, 2010 at 03:02:04PM +, Stefan Hajnoczi wrote:
 See below for the v5 changelog.

 Due to lack of connectivity I am sending from GMail.  Git should retain my
 stefa...@linux.vnet.ibm.com From address.

 Virtqueue notify is currently handled synchronously in userspace virtio.  This
 prevents the vcpu from executing guest code while hardware emulation code
 handles the notify.

 On systems that support KVM, the ioeventfd mechanism can be used to make
 virtqueue notify a lightweight exit by deferring hardware emulation to the
 iothread and allowing the VM to continue execution.  This model is similar to
 how vhost receives virtqueue notifies.

 The result of this change is improved performance for userspace virtio devices.
 Virtio-blk throughput increases especially for multithreaded scenarios and
 virtio-net transmit throughput increases substantially.
   
Interestingly, I see decreased throughput for small message
host to guest netperf runs.
   
The command that I used was:
netperf -H $vguest -- -m 200
   
And the results are:
- with ioeventfd=off
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.104 (11.0.0.104) port 0 AF_INET : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384    200    10.00      3035.48   15.50    99.30    6.695   2.680

- with ioeventfd=on
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.104 (11.0.0.104) port 0 AF_INET : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384    200    10.00      1770.95   18.16    51.65    13.442  2.389
   
   
Do you see this behaviour too?
  
   Just a note: this is with the patchset ported to qemu-kvm.
 
  And just another note: the trend is reversed for larger messages,
  e.g. with 1.5k messages ioeventfd=on outperforms ioeventfd=off.
 
  Another datapoint where I see a regression is with 4000 byte messages
  for guest to host traffic.
 
  ioeventfd=off
  set_up_server could not establish a listen endpoint for  port 12865 with family AF_UNSPEC
  TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.4 (11.0.0.4) port 0 AF_INET : demo
  Recv   Send    Send                          Utilization       Service Demand
  Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
  Size   Size    Size     Time     Throughput  local    remote   local   remote
  bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
 
   87380  16384   4000    10.00      7717.56   98.80    15.11    1.049   2.566
 
  ioeventfd=on
  TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.4 (11.0.0.4) port 0 AF_INET : demo
  Recv   Send    Send                          Utilization       Service Demand
  Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
  Size   Size    Size     Time     Throughput  local    remote   local   remote
  bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
 
   87380  16384   4000    10.00      3965.86   87.69    15.29    1.811   5.055
 
 Interesting.  I posted the following results in an earlier version of
 this patch:
 
 Sridhar Samudrala s...@us.ibm.com collected the following data for
 virtio-net with 2.6.36-rc1 on the host and 2.6.34 on the guest.
 
 Guest to Host TCP_STREAM throughput(Mb/sec)
 -------------------------------------------
 Msg Size  vhost-net  virtio-net  virtio-net/ioeventfd
    65536      12755        6430                  7590
    16384       8499        3084                  5764
     4096       4723        1578                  3659
 
 Here we got a throughput improvement where you got a regression.  Your
 virtio-net ioeventfd=off throughput is much higher than what we got
 (different hardware and configuration, but still I didn't know that
 virtio-net reaches 7 Gbit/s!).

Which qemu are you running? 

Re: [Qemu-devel] SCSI Command support over VirtIO Block device

2010-12-13 Thread अनुज
Hi

2010/12/13 Stefan Hajnoczi stefa...@gmail.com:

 On Dec 13, 2010 5:14 AM, अनुज anu...@gmail.com wrote:

 Hi

 I am trying to implement VirtIO support for a proprietary OS, and it
 would be great if I could process SCSI commands over a VirtIO block
 device.

 I tried to execute an INQUIRY command, but the status returned is UNSUPPORTED.
 If anyone could provide an example VirtIO SCSI command request structure for
 the INQUIRY command as per VirtIO spec Appendix D, that would be a great help.

 Also, this paragraph from VirtIO spec 0.8.9 is confusing to me:

 Historically, devices assumed that the fields type, ioprio and sector
 reside in a single, separate read-only buffer; the fields errors,
 data_len, sense_len and residual reside in a single, separate
 write-only buffer; the sense field in a separate write-only buffer of
 size 96 bytes, by itself; the fields errors, data_len, sense_len and
 residual in a single write-only buffer; and the status field is a
 separate readonly buffer of size 1 byte, by itself.

 Is the 1-byte status field buffer here read-only or write-only?

 Writeonly
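
For reference, the buffer layout discussed above can be sketched in C. This is only an illustration of how an INQUIRY request might be laid out, not code taken from qemu-kvm or from any guest driver. The struct and function names (blk_outhdr, blk_scsi_inhdr, inquiry_request, build_inquiry) are made up for this example, the 36-byte data buffer is the usual standard INQUIRY response size, and the field order follows my reading of the spec paragraph quoted above, with the status byte treated as device-writable as answered.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Request type for SCSI pass-through in virtio-blk, as I read the spec. */
#define VIRTIO_BLK_T_SCSI_CMD 2u
#define SCSI_OP_INQUIRY       0x12u
#define SENSE_BUF_LEN         96u

/* Device-readable header: type, ioprio and sector in one buffer. */
struct blk_outhdr {
    uint32_t type;
    uint32_t ioprio;
    uint64_t sector;
};

/* Device-writable result header: errors, data_len, sense_len, residual. */
struct blk_scsi_inhdr {
    uint32_t errors;
    uint32_t data_len;
    uint32_t sense_len;
    uint32_t residual;
};

/*
 * Hypothetical container: each member would be placed in its own
 * descriptor, in this order, when the request is queued.
 */
struct inquiry_request {
    struct blk_outhdr out;         /* device reads  */
    uint8_t cdb[6];                /* device reads  */
    uint8_t data[36];              /* device writes (INQUIRY response) */
    uint8_t sense[SENSE_BUF_LEN];  /* device writes */
    struct blk_scsi_inhdr in;      /* device writes */
    uint8_t status;                /* device writes, 1 byte by itself */
};

/* Fill in the driver-owned parts of a 6-byte INQUIRY request. */
static void build_inquiry(struct inquiry_request *req)
{
    assert(req != NULL);
    memset(req, 0, sizeof(*req));
    req->out.type = VIRTIO_BLK_T_SCSI_CMD;
    req->cdb[0] = SCSI_OP_INQUIRY;
    req->cdb[4] = (uint8_t)sizeof(req->data);  /* allocation length */
    req->status = 0xff;  /* sentinel so a device write is detectable */
}
```

After the device completes the request, a driver following this sketch would check status first and then in.errors and in.residual before trusting the contents of data.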


 I would like to know from which version qemu-kvm supports processing
 SCSI commands over a VirtIO block device as a backend.
 I checked the Host Feature fields and the VIRTIO_BLK_F_SCSI
 bit is set. I am using qemu-kvm version 0.12.3.

 Make sure you have a scsi-generic block device in qemu-kvm, not just a
 regular file or physical block device. Open /dev/sg.

Yes, I had given a file name instead of /dev/sg0. Now it works like a charm.

That means I can use a physical disk as a VirtIO disk in the guest OS, right?
So it is a kind of passthrough for a physical disk. But how can I
distinguish among the different physical disks attached to the host?

Is /dev/sg different for each physical disk?

However, I thought VirtIO SCSI device operations also worked for a virtual
disk (a regular file).


 Look at hw/virtio-blk.c in qemu-kvm for host implementation details.




Thanks for your help.


Regards
-- 
Anuj Aggarwal

 .''`.
: :Ⓐ :   # apt-get install hakuna-matata
`. `'`
   `-



[Qemu-devel] Re: [Spice-devel] RFC; usb redirection protocol

2010-12-13 Thread Gerd Hoffmann

Basic packet structure / communication
--------------------------------------

Each packet exchanged between the vm-host and the usb-host starts
with a usb_redir_header, followed by an optional command specific
header followed by optional additional data.

The usb_redir_header each packet starts with looks as follows:

struct usb_redir_header {
    uint32_t command;
    uint32_t length;
}


uint32_t id; ?  A reply would then carry the id of the request ...
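
To make the suggestion concrete, here is a rough C sketch of the header with the extra id field and of a reply echoing the request id. It is not part of the RFC. The little-endian wire encoding, the 12-byte wire size and the helper names (usb_redir_header_pack, usb_redir_header_unpack, usb_redir_make_reply) are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header from the RFC plus the suggested id field. */
struct usb_redir_header {
    uint32_t command;
    uint32_t length;
    uint32_t id;      /* suggested addition: matches replies to requests */
};

/* Assumed wire size: three 32-bit fields, no padding. */
#define USB_REDIR_HEADER_WIRE_SIZE 12

/* Store a 32-bit value in little-endian byte order (an assumption). */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Load a 32-bit little-endian value. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Serialize a header into buf; returns the number of bytes written. */
static size_t usb_redir_header_pack(const struct usb_redir_header *h,
                                    uint8_t *buf)
{
    assert(h != NULL && buf != NULL);
    put_le32(buf + 0, h->command);
    put_le32(buf + 4, h->length);
    put_le32(buf + 8, h->id);
    return USB_REDIR_HEADER_WIRE_SIZE;
}

/* Parse a header from buf. */
static void usb_redir_header_unpack(struct usb_redir_header *h,
                                    const uint8_t *buf)
{
    assert(h != NULL && buf != NULL);
    h->command = get_le32(buf + 0);
    h->length  = get_le32(buf + 4);
    h->id      = get_le32(buf + 8);
}

/* Build a reply header that carries the id of the request it answers. */
static struct usb_redir_header
usb_redir_make_reply(const struct usb_redir_header *req,
                     uint32_t command, uint32_t length)
{
    struct usb_redir_header reply;
    assert(req != NULL);
    reply.command = command;
    reply.length  = length;
    reply.id      = req->id;
    return reply;
}
```

With an id in every header, the vm-host can match replies to outstanding requests even when they arrive out of order, so it does not have to rely on the usb-host answering immediately.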


Given that everything is done over a potentially slow transport, in
practice differentiating between synchronous and asynchronous
commands may seem odd. The difference is how the usb-host will handle
them once received. For synchronous commands the usb-host will hand
the request over to the host os and then *wait* for a response. This
means that the vm-host is guaranteed to get an immediate response.
Whereas for asynchronous commands the usb-host hands the request
over to the host os with the request to let the usb-host process know
when the request is done.


Hmm.  Looks like you are planning for one tcp stream and one thread (on 
the usb-host side) for each usb device.  That will not work very well 
for usb-over-vnc because there is a single tcp stream only.  We could of 
course multiplex multiple logical usb connections over vnc, but even 
then blocking on the usb-host side looks bad as this could disrupt other 
usb devices forwarded over the same connection.



usb_redir_report_descriptor
---------------------------

usb_redir_header.command: usb_redir_report_descriptor
usb_redir_header.length: sizeof usb device descriptors

No command specific header.

The command specific additional data contains the entire descriptors
for the usb device.

A packet of this type is sent by the usb-host directly after the
hello packet; it contains the usb descriptor tables for the usb
device.


Device addressing isn't done at all in the protocol, i.e. there is a 
fixed device - connection relationship?



Please let me know what you think of this.


Do you know whether certain low-level usb ops can work with this?

Specifically iphone firmware flashing was mentioned on the list.

Also I remember that somewhere in the ehci (or xhci?) specs it was mentioned 
that with some devices it can be necessary to talk to them *before* a bus 
address is assigned ...


cheers,
  Gerd




[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify

2010-12-13 Thread Stefan Hajnoczi
Fresh results:

192.168.0.1 - host (runs netperf)
192.168.0.2 - guest (runs netserver)

host$ src/netperf -H 192.168.0.2 -- -m 200

ioeventfd=on
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
(192.168.0.2) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
 87380  16384    200    10.00    1759.25

ioeventfd=off
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
(192.168.0.2) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384    200    10.00    1757.15

The results vary approx +/- 3% between runs.

Invocation:
$ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img

I am running qemu.git with v5 patches, based off
36888c6335422f07bbc50bf3443a39f24b90c7c6.

Host:
1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
8 GB RAM
RHEL 6 host

Next I will try the patches on latest qemu-kvm.git

Stefan



[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify

2010-12-13 Thread Michael S. Tsirkin
On Mon, Dec 13, 2010 at 01:11:27PM +, Stefan Hajnoczi wrote:
 Fresh results:
 
 192.168.0.1 - host (runs netperf)
 192.168.0.2 - guest (runs netserver)
 
 host$ src/netperf -H 192.168.0.2 -- -m 200
 
 ioeventfd=on
 TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
 (192.168.0.2) port 0 AF_INET
 Recv   Send    Send
 Socket Socket  Message  Elapsed
 Size   Size    Size     Time     Throughput
 bytes  bytes   bytes    secs.    10^6bits/sec
  87380  16384    200    10.00    1759.25
 
 ioeventfd=off
 TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
 (192.168.0.2) port 0 AF_INET
 Recv   Send    Send
 Socket Socket  Message  Elapsed
 Size   Size    Size     Time     Throughput
 bytes  bytes   bytes    secs.    10^6bits/sec
 
  87380  16384    200    10.00    1757.15
 
 The results vary approx +/- 3% between runs.
 
 Invocation:
 $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
 type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
 virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
 if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img
 
 I am running qemu.git with v5 patches, based off
 36888c6335422f07bbc50bf3443a39f24b90c7c6.
 
 Host:
 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
 8 GB RAM
 RHEL 6 host
 
 Next I will try the patches on latest qemu-kvm.git
 
 Stefan

One interesting thing is that I put virtio-net earlier on
command line. Since iobus scan is linear for now, I wonder if this might
possibly matter.

-- 
MST



[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify

2010-12-13 Thread Michael S. Tsirkin
On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote:
 On Mon, Dec 13, 2010 at 01:11:27PM +, Stefan Hajnoczi wrote:
  Fresh results:
  
  192.168.0.1 - host (runs netperf)
  192.168.0.2 - guest (runs netserver)
  
  host$ src/netperf -H 192.168.0.2 -- -m 200
  
  ioeventfd=on
  TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
  (192.168.0.2) port 0 AF_INET
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec
   87380  16384    200    10.00    1759.25
  
  ioeventfd=off
  TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
  (192.168.0.2) port 0 AF_INET
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec
  
   87380  16384    200    10.00    1757.15
  
  The results vary approx +/- 3% between runs.
  
  Invocation:
  $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
  type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
  virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
  if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img
  
  I am running qemu.git with v5 patches, based off
  36888c6335422f07bbc50bf3443a39f24b90c7c6.
  
  Host:
  1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
  8 GB RAM
  RHEL 6 host
  
  Next I will try the patches on latest qemu-kvm.git
  
  Stefan
 
 One interesting thing is that I put virtio-net earlier on
 command line.

Sorry I mean I put it after disk, you put it before.

 Since iobus scan is linear for now, I wonder if this might
 possibly matter.
 
 -- 
 MST



[Qemu-devel] Re: [PATCH v2 1/2] Do not register kvmclock savevm section if kvmclock is disabled.

2010-12-13 Thread Glauber Costa
On Wed, 2010-12-08 at 17:31 -0200, Marcelo Tosatti wrote:
 On Tue, Dec 07, 2010 at 03:12:36PM -0200, Glauber Costa wrote:
  On Mon, 2010-12-06 at 19:04 -0200, Marcelo Tosatti wrote:
   On Mon, Dec 06, 2010 at 09:03:46AM -0500, Glauber Costa wrote:
    Usually nobody thinks about that scenario (me included and
    especially), but kvmclock can actually be disabled in the host.

    It happens in two scenarios:
     1. host too old.
     2. we passed -kvmclock to our -cpu parameter.

    In both cases, we should not register the kvmclock savevm section. This
    patch achieves that by registering this section only if kvmclock is
    actually currently enabled in cpuid.

    The only caveat is that we have to register the savevm section a little
    bit later, since we won't know the final kvmclock state before cpuid
    gets parsed.
   
   What is the problem of registering the section? Restoring the value if
   the host does not support it returns an error?
   
   Can't you ignore the error if kvmclock is not reported in cpuid, in the
   restore handler?
  
  We can change the restore handler, but not the restore handler of
  binaries that are already out there. The motivation here is precisely to
  address migration to hosts without kvmclock, so it's better to have
  a way to disable, than to count on the fact that the other side will be
  able to ignore it.
 
 OK. Can't you register conditionally on kvmclock cpuid bit at the end of
 kvm_arch_init_vcpu, in target-i386/kvm.c?

Haven't looked at it, but will today. Actually, tsc has (obviously) the
same problem and I plan to respin the patch today including a fix for it
as well.
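
The idea under discussion, registering the savevm section only when the kvmclock bit made it into cpuid, can be sketched roughly as follows. This is not the patch itself. The bit numbers are the KVM_FEATURE_CLOCKSOURCE values as I know them from linux/kvm_para.h, while maybe_register_kvmclock, register_fn and count_registration are hypothetical names standing in for whatever the real registration call ends up being.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit numbers in the KVM paravirt feature word (linux/kvm_para.h). */
#define KVM_FEATURE_CLOCKSOURCE  0
#define KVM_FEATURE_CLOCKSOURCE2 3

/* True if the final cpuid feature word advertises kvmclock to the guest. */
static bool kvmclock_enabled(uint32_t kvm_features)
{
    const uint32_t mask = (1u << KVM_FEATURE_CLOCKSOURCE) |
                          (1u << KVM_FEATURE_CLOCKSOURCE2);
    return (kvm_features & mask) != 0;
}

/* Hypothetical callback type standing in for the savevm registration. */
typedef void (*register_fn)(void *opaque);

/*
 * Call this once cpuid has been fully parsed (for example at the end of
 * vcpu init): the section is registered only if kvmclock is enabled, so
 * a destination without kvmclock never sees it in the migration stream.
 */
static bool maybe_register_kvmclock(uint32_t kvm_features,
                                    register_fn reg, void *opaque)
{
    assert(reg != NULL);
    if (!kvmclock_enabled(kvm_features)) {
        return false;
    }
    reg(opaque);
    return true;
}

/* Stand-in registration routine for experiments: counts its calls. */
static void count_registration(void *opaque)
{
    ++*(int *)opaque;
}
```

The same pattern would apply to the tsc section mentioned above, with a different feature test in place of kvmclock_enabled.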

Thanks!





[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify

2010-12-13 Thread Stefan Hajnoczi
Here are my results on qemu-kvm.git:

ioeventfd=on
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
(192.168.0.2) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384    200    10.00    1203.44

ioeventfd=off
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
(192.168.0.2) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384    200    10.00    1677.96

This is a 30% degradation that wasn't visible on qemu.git.
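
For anyone who wants to redo the arithmetic, the relative drop can be computed with a small helper such as the one below. relative_drop_percent is a name made up for this note. With the two throughput figures above (1677.96 with ioeventfd=off as the baseline, 1203.44 with ioeventfd=on) it gives roughly 28.3%, which is where the "30%" comes from.

```c
#include <assert.h>

/*
 * Percentage by which `measured` falls short of `baseline`.
 * A negative result means `measured` is higher than `baseline`.
 */
static double relative_drop_percent(double baseline, double measured)
{
    assert(baseline > 0.0);
    return (baseline - measured) / baseline * 100.0;
}
```

The same helper applied to the 4000-byte guest to host numbers reported earlier in the thread (7717.56 versus 3965.86) gives a drop of about 48.6%.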

Same host.  qemu-kvm.git with v5 patches based on
cb1983b8809d0e06a97384a40bad1194a32fc814.

Stefan



[Qemu-devel] [Bug 595117] Re: qemu-nbd slow and missing writeback cache option

2010-12-13 Thread Serge Hallyn
@Stephane,

did upstream ever accept your patch?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/595117

Title:
  qemu-nbd slow and missing writeback cache option

Status in QEMU:
  Invalid
Status in “qemu-kvm” package in Ubuntu:
  Expired

Bug description:
  Binary package hint: qemu-kvm

dpkg -l | grep qemu
ii  kvm              1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9  dummy transitional pacakge from kvm to qemu-
ii  qemu             0.12.3+noroms-0ubuntu9                      dummy transitional pacakge from qemu to qemu
ii  qemu-common      0.12.3+noroms-0ubuntu9                      qemu common functionality (bios, documentati
ii  qemu-kvm         0.12.3+noroms-0ubuntu9                      Full virtualization on i386 and amd64 hardwa
ii  qemu-kvm-extras  0.12.3+noroms-0ubuntu9                      fast processor emulator binaries for non-x86
ii  qemu-launcher    1.7.4-1ubuntu2                              GTK+ front-end to QEMU computer emulator
ii  qemuctl          0.2-2                                       controlling GUI for qemu

lucid amd64.

qemu-nbd is a lot slower when writing to disk than say nbd-server.

It appears this is because, by default, the disk image it serves is opened 
with O_SYNC. The --nocache option, unintuitively, makes matters a bit better 
because it causes the image to be opened with O_DIRECT instead of O_SYNC.

The qemu code allows an image to be opened without either of those flags, but 
unfortunately qemu-nbd doesn't have an option to do that (qemu doesn't allow 
the image to be opened with both O_SYNC and O_DIRECT though).

The default of qemu-img (of using O_SYNC) is not very sensible because the 
client (the kernel) uses write-back caches anyway (and qemu-nbd -d doesn't 
flush those, by the way). So if for instance qemu-nbd is killed, regardless of 
whether qemu-nbd uses O_SYNC, O_DIRECT or neither, the data in the image will 
not be consistent anyway, unless syncs are done by the client (like fsync on 
the nbd device or the sync mount option), and with qemu-nbd's O_SYNC mode, 
those syncs will be extremely slow.

Attached is a patch that adds a --cache={off,none,writethrough,writeback} 
option to qemu-nbd.

--cache=off is the same as --nocache (that is, it uses O_DIRECT). writethrough 
uses O_SYNC and is still the default, so this patch doesn't change the 
existing behaviour. writeback uses neither of those flags and is what this 
patch adds. The patch also does an fsync upon qemu-nbd -d to make sure data is 
flushed to the image before removing the nbd.
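
The mapping described above can be summarised in a few lines of C. This is a sketch of the idea only, not the attached patch. cache_mode_to_open_flags is a hypothetical helper name, and the fallback define exists only so the example still compiles where O_DIRECT is not exposed by the headers.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>

/* Fallback so the sketch builds where O_DIRECT is not visible. */
#ifndef O_DIRECT
#define O_DIRECT 0
#endif

/*
 * Map a --cache= mode name to the extra open(2) flags described above:
 *   off / none    -> O_DIRECT (same as --nocache)
 *   writethrough  -> O_SYNC   (the existing default)
 *   writeback     -> neither flag
 * Returns -1 for an unrecognised mode.
 */
static int cache_mode_to_open_flags(const char *mode)
{
    assert(mode != NULL);
    if (strcmp(mode, "off") == 0 || strcmp(mode, "none") == 0) {
        return O_DIRECT;
    }
    if (strcmp(mode, "writethrough") == 0) {
        return O_SYNC;
    }
    if (strcmp(mode, "writeback") == 0) {
        return 0;
    }
    return -1;
}
```

A caller would OR the returned value into the flags it already passes when opening the image, after rejecting -1 as a usage error.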

Consider this test scenario:

dd bs=1M count=100 of=a < /dev/null
qemu-nbd --cache=x -c /dev/nbd0 a
cp /dev/zero /dev/nbd0
time perl -MIO::Handle -e 'STDOUT->sync or die$!' 1<> /dev/nbd0

With cache=writethrough (the default), it takes over 10 minutes to write those 
100MB worth of zeroes. Running strace, we see the recvfroms and sendtos delayed 
by each 1 kB write(2) to disk (10 to 30 ms per write).

With cache=off, it takes about 30 seconds.

With cache=writeback, it takes about 3 seconds, which is similar to the 
performance you get with nbd-server.

Note that the cp command runs instantly as the data is buffered by the client 
(the kernel), and not sent to qemu-nbd until the fsync(2) is called.






[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify

2010-12-13 Thread Stefan Hajnoczi
On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin m...@redhat.com wrote:
 On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote:
 On Mon, Dec 13, 2010 at 01:11:27PM +, Stefan Hajnoczi wrote:
  Fresh results:
 
  192.168.0.1 - host (runs netperf)
  192.168.0.2 - guest (runs netserver)
 
  host$ src/netperf -H 192.168.0.2 -- -m 200
 
  ioeventfd=on
  TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
  (192.168.0.2) port 0 AF_INET
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec
   87380  16384    200    10.00    1759.25
 
  ioeventfd=off
  TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
  (192.168.0.2) port 0 AF_INET
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec
 
   87380  16384    200    10.00    1757.15
 
  The results vary approx +/- 3% between runs.
 
  Invocation:
  $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
  type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
  virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
  if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img
 
  I am running qemu.git with v5 patches, based off
  36888c6335422f07bbc50bf3443a39f24b90c7c6.
 
  Host:
  1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
  8 GB RAM
  RHEL 6 host
 
  Next I will try the patches on latest qemu-kvm.git
 
  Stefan

 One interesting thing is that I put virtio-net earlier on
 command line.

 Sorry I mean I put it after disk, you put it before.

I can't find a measurable difference when swapping -drive and -netdev.

Can you run the same test with vhost?  I assume it still outperforms
userspace virtio for small message sizes?  I'm interested because that
also uses ioeventfd.

I am wondering if the iothread differences between qemu.git and
qemu-kvm.git can explain the performance results we see.  In
particular, qemu.git produces the same (high) throughput whether
ioeventfd is on or off.

Stefan



Re: [Qemu-devel] [PATCH 1/5] block: add discard support

2010-12-13 Thread Christoph Hellwig
On Sat, Dec 11, 2010 at 12:50:20PM +, Paul Brook wrote:
  It's guest visible state, so it must not change due to migrations.  For
  the current implementation all values for it work anyway - if it's
  smaller than the block size we'll zero out the remainder of the block.
 
 That sounds wrong. Surely we should leave partial blocks untouched.

While zeroing them is not required for qemu, the general semantics of
the XFS ioctl require it.  It punches a hole, which means it makes the
new area equivalent to a hole created by truncating a file to a larger
size and then only writing at the larger offset.  The semantics for a
hole in all Unix filesystems is that we read back zeroes from them.
If we write into a sparse file at an offset that is not block aligned,
the zeroing of the partial block also happens.




Re: [Qemu-devel] [PATCH 0/7] add TRIM/UNMAP support, v3

2010-12-13 Thread Christoph Hellwig
On Sun, Dec 12, 2010 at 03:28:14PM +, Stefan Hajnoczi wrote:
 Do you have qemu-io support for discard?

Now that you wrote it we have the support :)

 Any hints on testing this?  A recent guest kernel and ext -o discard
 might exercise the code but I haven't tried yet.

Anything that submits a discard in the guest is fine.  The simplest things
to test are the various mkfs tools, as they do a whole device discard.
Also -o discard for various Linux filesystems works, as does Mark Lord's
wiper.sh script, or any Windows 7 installation.




Re: [Qemu-devel] [PATCH 5/6] [RFC] Emulation of Leon3.

2010-12-13 Thread Fabien Chouteau

On 12/11/2010 10:56 AM, Blue Swirl wrote:

On Tue, Dec 7, 2010 at 11:40 AM, Fabien Chouteau chout...@adacore.com wrote:

On 12/06/2010 06:53 PM, Blue Swirl wrote:


On Mon, Dec 6, 2010 at 9:26 AM, Fabien Chouteau chout...@adacore.com wrote:


Signed-off-by: Fabien Chouteau chout...@adacore.com
---
  Makefile.target          |    5 +-
  hw/leon3.c               |  310 ++
  target-sparc/cpu.h       |   10 ++
  target-sparc/helper.c    |    2 +-
  target-sparc/op_helper.c |   30 -
  5 files changed, 353 insertions(+), 4 deletions(-)

diff --git a/Makefile.target b/Makefile.target
index 2800f47..f40e04f 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -290,7 +290,10 @@ obj-sparc-y += cirrus_vga.o
  else
  obj-sparc-y = sun4m.o lance.o tcx.o sun4m_iommu.o slavio_intctl.o
  obj-sparc-y += slavio_timer.o slavio_misc.o sparc32_dma.o
-obj-sparc-y += cs4231.o eccmemctl.o sbi.o sun4c_intctl.o
+obj-sparc-y += cs4231.o eccmemctl.o sbi.o sun4c_intctl.o leon3.o
+
+# GRLIB
+obj-sparc-y += grlib_gptimer.o grlib_irqmp.o grlib_apbuart.o
  endif

  obj-arm-y = integratorcp.o versatilepb.o arm_pic.o arm_timer.o
diff --git a/hw/leon3.c b/hw/leon3.c
new file mode 100644
index 000..ba61081
--- /dev/null
+++ b/hw/leon3.c
@@ -0,0 +1,310 @@
+/*
+ * QEMU Leon3 System Emulator
+ *
+ * Copyright (c) 2010 AdaCore
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#include "hw.h"
+#include "qemu-timer.h"
+#include "qemu-char.h"
+#include "sysemu.h"
+#include "boards.h"
+#include "loader.h"
+#include "elf.h"
+
+#include "grlib.h"
+
+/* #define DEBUG_LEON3 */
+
+#ifdef DEBUG_LEON3
+#define DPRINTF(fmt, ...)                                       \
+    do { printf("Leon3: " fmt , ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...)
+#endif
+
+/* Default system clock.  */
+#define CPU_CLK (40 * 1000 * 1000)
+
+#define PROM_FILENAME        "u-boot.bin"
+
+#define MAX_PILS 16
+
+typedef struct Leon3State
+{
+    uint32_t cache_control;
+    uint32_t inst_cache_conf;
+    uint32_t data_cache_conf;
+
+    uint64_t entry; /* save kernel entry in case of reset */
+} Leon3State;
+
+Leon3State leon3_state;


Again global state, please refactor. Perhaps most of the cache
handling code belong to target-sparc/op_helper.c and this structure to
CPUSPARCState.


I will try to find a solution for that.
Is it OK to add some Leon3 specific stuff in the CPUSPARCState?


Yes, no problem. You can also drop the intermediate Leon3State
structure if there is no benefit.


+
+/* Cache control: emulate the behavior of cache control registers but without
+   any effect on the emulated CPU */
+
+#define CACHE_DISABLED 0x0
+#define CACHE_FROZEN   0x1
+#define CACHE_ENABLED  0x3
+
+/* Cache Control register fields */
+
+#define CACHE_CTRL_IF (1 <<  4)  /* Instruction Cache Freeze on Interrupt */
+#define CACHE_CTRL_DF (1 <<  5)  /* Data Cache Freeze on Interrupt */
+#define CACHE_CTRL_DP (1 << 14)  /* Data cache flush pending */
+#define CACHE_CTRL_IP (1 << 15)  /* Instruction cache flush pending */
+#define CACHE_CTRL_IB (1 << 16)  /* Instruction burst fetch */
+#define CACHE_CTRL_FI (1 << 21)  /* Flush Instruction cache (Write only) */
+#define CACHE_CTRL_FD (1 << 22)  /* Flush Data cache (Write only) */
+#define CACHE_CTRL_DS (1 << 23)  /* Data cache snoop enable */
+
+void leon3_cache_control_int(void)
+{
+    uint32_t state = 0;
+
+    if (leon3_state.cache_control & CACHE_CTRL_IF) {
+        /* Instruction cache state */
+        state = leon3_state.cache_control & 0x3;


Please add a new define CACHE_CTRL_xxx to replace 0x3.



Done.
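
For illustration only, the two review comments (no global state, a named mask instead of 0x3) might combine into something like the sketch below. It is not the respun patch. CACHE_STATE_MASK, the two shift macros, freeze_if_enabled and cache_control_on_interrupt are hypothetical names, and placing the data cache state in bits 3:2 is an assumption based on my reading of the LEON3 cache control register layout, since the quoted code is cut off before that line.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_DISABLED 0x0u
#define CACHE_FROZEN   0x1u
#define CACHE_ENABLED  0x3u

/* Hypothetical names replacing the bare 0x3 in the quoted patch. */
#define CACHE_STATE_MASK     0x3u
#define CACHE_CTRL_ICS_SHIFT 0  /* instruction cache state: bits 1:0 */
#define CACHE_CTRL_DCS_SHIFT 2  /* data cache state: bits 3:2 (assumed) */

#define CACHE_CTRL_IF (1u << 4)  /* Instruction Cache Freeze on Interrupt */
#define CACHE_CTRL_DF (1u << 5)  /* Data Cache Freeze on Interrupt */

/* If the 2-bit state field at `shift` is ENABLED, switch it to FROZEN. */
static uint32_t freeze_if_enabled(uint32_t cc, unsigned shift)
{
    uint32_t state;

    assert(shift < 31);
    state = (cc >> shift) & CACHE_STATE_MASK;
    if (state == CACHE_ENABLED) {
        cc &= ~(CACHE_STATE_MASK << shift);
        cc |= CACHE_FROZEN << shift;
    }
    return cc;
}

/*
 * Pure function of the register value: the caller keeps cache_control
 * wherever it likes (for example in the per-CPU state) instead of in a
 * global variable.
 */
static uint32_t cache_control_on_interrupt(uint32_t cc)
{
    if (cc & CACHE_CTRL_IF) {
        cc = freeze_if_enabled(cc, CACHE_CTRL_ICS_SHIFT);
    }
    if (cc & CACHE_CTRL_DF) {
        cc = freeze_if_enabled(cc, CACHE_CTRL_DCS_SHIFT);
    }
    return cc;
}
```

Because the helper takes and returns the register value, the same code can serve both caches and the structure holding cache_control can move into the CPU state without touching the logic.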


+        if (state == CACHE_ENABLED) {
+            state = CACHE_FROZEN;
+            DPRINTF("Instruction cache: freeze\n");
+        }
+
+        leon3_state.cache_control &= ~0x3;
+        leon3_state.cache_control |= state;
+    }
+
+    if (leon3_state.cache_control & CACHE_CTRL_DF) {
+        /* Data cache state */
+        state = 

[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify

2010-12-13 Thread Michael S. Tsirkin
On Mon, Dec 13, 2010 at 03:27:06PM +, Stefan Hajnoczi wrote:
 On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin m...@redhat.com wrote:
  On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote:
  On Mon, Dec 13, 2010 at 01:11:27PM +, Stefan Hajnoczi wrote:
   Fresh results:
  
   192.168.0.1 - host (runs netperf)
   192.168.0.2 - guest (runs netserver)
  
   host$ src/netperf -H 192.168.0.2 -- -m 200
  
   ioeventfd=on
   TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
   (192.168.0.2) port 0 AF_INET
   Recv   Send    Send
   Socket Socket  Message  Elapsed
   Size   Size    Size     Time     Throughput
   bytes  bytes   bytes    secs.    10^6bits/sec
    87380  16384    200    10.00    1759.25
  
   ioeventfd=off
   TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
   (192.168.0.2) port 0 AF_INET
   Recv   Send    Send
   Socket Socket  Message  Elapsed
   Size   Size    Size     Time     Throughput
   bytes  bytes   bytes    secs.    10^6bits/sec
  
    87380  16384    200    10.00    1757.15
  
   The results vary approx +/- 3% between runs.
  
   Invocation:
   $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
   type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
   virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
   if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img
  
   I am running qemu.git with v5 patches, based off
   36888c6335422f07bbc50bf3443a39f24b90c7c6.
  
   Host:
   1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
   8 GB RAM
   RHEL 6 host
  
   Next I will try the patches on latest qemu-kvm.git
  
   Stefan
 
  One interesting thing is that I put virtio-net earlier on
  command line.
 
  Sorry I mean I put it after disk, you put it before.
 
 I can't find a measurable difference when swapping -drive and -netdev.
 
 Can you run the same test with vhost?  I assume it still outperforms
 userspace virtio for small message sizes?  I'm interested because that
 also uses ioeventfd.

Seems to work same as ioeventfd.

 I am wondering if the iothread differences between qemu.git and
 qemu-kvm.git can explain the performance results we see.  In
 particular, qemu.git produces the same (high) throughput whether
 ioeventfd is on or off.
 
 Stefan



[Qemu-devel] Re: [PATCH 1/5] block: add discard support

2010-12-13 Thread Paolo Bonzini

On 12/10/2010 02:38 PM, Christoph Hellwig wrote:

if it's smaller than the block size we'll zero out the remainder of
the block.


I think it should fail at VM startup time, or even better do nothing at all.

When you write in the middle of an absent block, and a partially-zero 
block is created, this is not visible: a read cannot see the difference 
between all zeros because it's sparse and all zeros because it's zero.


If I ask you to (optionally) punch a 1kb hole but all you can do is 
punch a 2kb hole, I do care about the second kilobyte of data.  Since 
the hole punching of bdrv_discard is completely optional, it should not 
be done in this case.


Paolo



[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify

2010-12-13 Thread Michael S. Tsirkin
On Mon, Dec 13, 2010 at 03:27:06PM +, Stefan Hajnoczi wrote:
 On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin m...@redhat.com wrote:
  On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote:
  On Mon, Dec 13, 2010 at 01:11:27PM +, Stefan Hajnoczi wrote:
   Fresh results:
  
   192.168.0.1 - host (runs netperf)
   192.168.0.2 - guest (runs netserver)
  
   host$ src/netperf -H 192.168.0.2 -- -m 200
  
   ioeventfd=on
   TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
   (192.168.0.2) port 0 AF_INET
   Recv   Send    Send
   Socket Socket  Message  Elapsed
   Size   Size    Size     Time     Throughput
   bytes  bytes   bytes    secs.    10^6bits/sec
    87380  16384    200    10.00    1759.25
  
   ioeventfd=off
   TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
   (192.168.0.2) port 0 AF_INET
   Recv   Send    Send
   Socket Socket  Message  Elapsed
   Size   Size    Size     Time     Throughput
   bytes  bytes   bytes    secs.    10^6bits/sec
  
    87380  16384    200    10.00    1757.15
  
   The results vary approx +/- 3% between runs.
  
   Invocation:
   $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
   type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
   virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
   if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img
  
   I am running qemu.git with v5 patches, based off
   36888c6335422f07bbc50bf3443a39f24b90c7c6.
  
   Host:
   1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
   8 GB RAM
   RHEL 6 host
  
   Next I will try the patches on latest qemu-kvm.git
  
   Stefan
 
  One interesting thing is that I put virtio-net earlier on
  command line.
 
  Sorry I mean I put it after disk, you put it before.
 
 I can't find a measurable difference when swapping -drive and -netdev.

One other concern I have is that we are apparently using
ioeventfd for all VQs. E.g. for virtio-net we probably should not
use it for the control VQ - it's a waste of resources.

 Can you run the same test with vhost?  I assume it still outperforms
 userspace virtio for small message sizes?  I'm interested because that
 also uses ioeventfd.
 
 I am wondering if the iothread differences between qemu.git and
 qemu-kvm.git can explain the performance results we see.  In
 particular, qemu.git produces the same (high) throughput whether
 ioeventfd is on or off.
 
 Stefan



[Qemu-devel] Re: [PATCH 1/5] block: add discard support

2010-12-13 Thread Christoph Hellwig
On Mon, Dec 13, 2010 at 05:07:27PM +0100, Paolo Bonzini wrote:
 On 12/10/2010 02:38 PM, Christoph Hellwig wrote:
 if it's smaller than the block size we'll zero out the remainder of
 the block.
 
 I think it should fail at VM startup time, or even better do nothing at all.

What should fail?

 When you write in the middle of an absent block, and a partially-zero 
 block is created, this is not visible: a read cannot see the difference 
 between all zeros because it's sparse and all zeros because it's zero.

You can not see from a VM if a block is not allocated or zeroed.  Then
again we'll never create a fully zeroed block anyway unless we get
really stupid discard patterns from the guest OS.

 If I ask you to (optionally) punch a 1kb hole but all you can do is 
 punch a 2kb hole, I do care about the second kilobyte of data.  Since 
 the hole punching of bdrv_discard is completely optional, it should not 
 be done in this case.

Of course we do not discard the second KB in that case.  If you issue
a 1k UNRESVSP ioctl on a 2k block size XFS filesystem it will zero
exactly the 1k you specified, which is required for the semantics of the
ioctl.  Yes, it's not optimal, but qemu can't easily know what block
size the underlying filesystem has.




Re: [Qemu-devel] [PATCH 1/5] block: add discard support

2010-12-13 Thread Paul Brook
 On Sat, Dec 11, 2010 at 12:50:20PM +, Paul Brook wrote:
   It's guest visible state, so it must not change due to migrations.  For
   the current implementation all values for it work anyway - if it's
   smaller than the block size we'll zero out the remainder of the block.
  
  That sounds wrong. Surely we should leave partial blocks untouched.
 
 While zeroing them is not required for qemu, the general semantics of
 the XFS ioctl require it.  It punches a hole, which means it makes the
 new area equivalent to a hole created by truncating a file to a larger
 size and then only writing at the larger offset.  The semantics for a
 hole in all Unix filesystems is that we read back zeroes from them.
 If we write into a sparse file at an offset that is not block aligned,
 the zeroing of the partial block also happens.

Ah, so it was just inconsistent use of the term block.  When the erase 
region includes part of a block, we zero that part of the block and leave the 
rest of the block untouched.
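
To make the distinction concrete, here is a minimal sketch of how a discard
range could be split into a zeroed head, a run of whole blocks, and a zeroed
tail. The names split_discard and DiscardSplit are made up for illustration
and do not come from the patch; the real work happens inside the filesystem.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the patch: describes how a discard
 * request maps onto filesystem blocks of size block_size. */
typedef struct DiscardSplit {
    uint64_t head_zero;    /* bytes zeroed in a partial block at the start */
    uint64_t whole_blocks; /* complete blocks that can be deallocated */
    uint64_t tail_zero;    /* bytes zeroed in a partial block at the end */
} DiscardSplit;

DiscardSplit split_discard(uint64_t offset, uint64_t len, uint64_t block_size)
{
    DiscardSplit s = { 0, 0, 0 };
    uint64_t end = offset + len;
    /* First block boundary at or after offset */
    uint64_t first = (offset + block_size - 1) / block_size * block_size;
    /* Last block boundary at or before end */
    uint64_t last = end / block_size * block_size;

    if (first > last) {
        /* The range lies strictly inside one block: nothing is deallocated,
         * only the requested bytes are zeroed. */
        s.head_zero = len;
        return s;
    }
    s.head_zero = first - offset;
    s.whole_blocks = (last - first) / block_size;
    s.tail_zero = end - last;
    return s;
}
```

With a 2k block size, split_discard(0, 1024, 2048) reports no whole blocks and
1024 zeroed bytes, which matches the "zero exactly the 1k you specified" case;
the second kilobyte of the block is never touched.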

Paul



Re: [Qemu-devel] [PATCH 2/6] [RFC] Emulation of GRLIB IRQMP as defined in GRLIB IP Core User's Manual.

2010-12-13 Thread Fabien Chouteau

On 12/11/2010 11:31 AM, Blue Swirl wrote:

On Tue, Dec 7, 2010 at 10:43 AM, Fabien Chouteau <chout...@adacore.com> wrote:

On 12/06/2010 06:25 PM, Blue Swirl wrote:


On Mon, Dec 6, 2010 at 9:26 AM, Fabien Chouteau <chout...@adacore.com> wrote:


Signed-off-by: Fabien Chouteau <chout...@adacore.com>
---
  hw/grlib_irqmp.c |  416 ++
  1 files changed, 416 insertions(+), 0 deletions(-)

diff --git a/hw/grlib_irqmp.c b/hw/grlib_irqmp.c
new file mode 100644
index 000..69e1553
--- /dev/null
+++ b/hw/grlib_irqmp.c
@@ -0,0 +1,416 @@
+/*
+ * QEMU GRLIB IRQMP Emulator
+ *
+ * (Multiprocessor and extended interrupt not supported)
+ *
+ * Copyright (c) 2010 AdaCore
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "sysbus.h"
+#include "cpu.h"
+
+#include "grlib.h"
+
+/* #define DEBUG_IRQ */
+
+#ifdef DEBUG_IRQ
+#define DPRINTF(fmt, ...)   \
+    do { printf("IRQMP: " fmt , ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...)
+#endif
+
+#define IRQMP_MAX_CPU 16
+#define IRQMP_REG_SIZE 256  /* Size of memory mapped registers */
+
+/* Memory mapped register offsets */
+#define LEVEL_OFFSET     0x00
+#define PENDING_OFFSET   0x04
+#define FORCE0_OFFSET    0x08
+#define CLEAR_OFFSET     0x0C
+#define MP_STATUS_OFFSET 0x10
+#define BROADCAST_OFFSET 0x14
+#define MASK_OFFSET      0x40
+#define FORCE_OFFSET     0x80
+#define EXTENDED_OFFSET  0xC0
+
+typedef struct IRQMP
+{
+    SysBusDevice busdev;
+
+    CPUSPARCState *env;


Devices should never access CPUState directly. Instead, board level
should create CPU irqs and these should then be passed here.



This case is special, Leon3 is a System-On-Chip and some of the components
are very close to the processor.
IRQMP is not really a peripheral nor a part of the CPU, it's both...


It's not a special case, it could be easily implemented separately.
MMUs, FPUs or co-processors could be special even if they have been
implemented as separate chips with real hardware. But we are actually
not looking at the (historical or current) chip boundaries but more
like what makes sense from QEMU architecture point of view.


OK then, let's go back to your first comment: why can't a device access
CPUState directly? And why would leon3.c be a better place to do that?


--
Fabien Chouteau




[Qemu-devel] Re: [PATCH 1/5] block: add discard support

2010-12-13 Thread Paolo Bonzini

On 12/13/2010 05:15 PM, Christoph Hellwig wrote:

On Mon, Dec 13, 2010 at 05:07:27PM +0100, Paolo Bonzini wrote:

On 12/10/2010 02:38 PM, Christoph Hellwig wrote:

if it's smaller than the block size we'll zero out the remainder of
the block.


I think it should fail at VM startup time, or even better do nothing at all.


What should fail?


Nothing -- you wrote "if it's smaller than the block size we'll zero out
the remainder of the block", which I interpreted the wrong way, i.e. as
"XFS will round up the size to the remainder of the block and zero that
part out as well".


Thanks for the clarification.

Paolo



[Qemu-devel] [PATCH 1/4] Make vm_stop available for block layer

2010-12-13 Thread Kevin Wolf
blkqueue wants to stop the VM after an error has occurred, so we need to make
vm_stop available in common code. It now returns a boolean that tells if the VM
could be stopped, which is always true in qemu itself, and always false in the
tools.
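
As a rough illustration of why the return value matters, a caller could branch
on it as in the sketch below. The function handle_queue_error, the enum and the
two stub implementations are invented for this example; only the bool return
convention comes from this patch.

```c
#include <stdbool.h>

/* Stand-ins for the two vm_stop() flavours described above: the emulator
 * can always stop the VM, the tools never can. */
bool vm_stop_in_qemu(int reason) { (void)reason; return true; }
bool vm_stop_in_tool(int reason) { (void)reason; return false; }

typedef bool (*VMStopFn)(int reason);

enum ErrorAction {
    ERROR_ACTION_VM_STOPPED,   /* VM paused, the request can be retried later */
    ERROR_ACTION_FAIL_REQUEST, /* no VM to stop, report the error instead */
};

/* Hypothetical error path: try to stop the VM, otherwise fail the request. */
enum ErrorAction handle_queue_error(VMStopFn stop, int reason)
{
    if (stop(reason)) {
        return ERROR_ACTION_VM_STOPPED;
    }
    return ERROR_ACTION_FAIL_REQUEST;
}
```

In a tool such as qemu-img the second branch would always be taken, so common
code has to cope with the error itself there.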

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 cpus.c|8 +---
 qemu-common.h |3 +++
 qemu-tool.c   |5 +
 sysemu.h  |1 -
 4 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/cpus.c b/cpus.c
index 91a0fb1..8ec0ed6 100644
--- a/cpus.c
+++ b/cpus.c
@@ -310,9 +310,10 @@ void qemu_notify_event(void)
 void qemu_mutex_lock_iothread(void) {}
 void qemu_mutex_unlock_iothread(void) {}
 
-void vm_stop(int reason)
+bool vm_stop(int reason)
 {
 do_vm_stop(reason);
+return true;
 }
 
 #else /* CONFIG_IOTHREAD */
@@ -848,7 +849,7 @@ static void qemu_system_vmstop_request(int reason)
 qemu_notify_event();
 }
 
-void vm_stop(int reason)
+bool vm_stop(int reason)
 {
 QemuThread me;
 qemu_thread_self(&me);
@@ -863,9 +864,10 @@ void vm_stop(int reason)
 cpu_exit(cpu_single_env);
 cpu_single_env->stop = 1;
 }
-return;
+return true;
 }
 do_vm_stop(reason);
+return true;
 }
 
 #endif
diff --git a/qemu-common.h b/qemu-common.h
index de82c2e..cb077a0 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -115,6 +115,9 @@ static inline char *realpath(const char *path, char *resolved_path)
 
 #endif /* !defined(NEED_CPU_H) */
 
+/* VM state */
+bool vm_stop(int reason);
+
 /* bottom halves */
 typedef void QEMUBHFunc(void *opaque);
 
diff --git a/qemu-tool.c b/qemu-tool.c
index 392e1c9..3926435 100644
--- a/qemu-tool.c
+++ b/qemu-tool.c
@@ -111,3 +111,8 @@ int qemu_set_fd_handler2(int fd,
 {
 return 0;
 }
+
+bool vm_stop(int reason)
+{
+return false;
+}
diff --git a/sysemu.h b/sysemu.h
index b81a70e..77788f1 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -38,7 +38,6 @@ VMChangeStateEntry *qemu_add_vm_change_state_handler(VMChangeStateHandler *cb,
 void qemu_del_vm_change_state_handler(VMChangeStateEntry *e);
 
 void vm_start(void);
-void vm_stop(int reason);
 
 uint64_t ram_bytes_remaining(void);
 uint64_t ram_bytes_transferred(void);
-- 
1.7.2.3




[Qemu-devel] [PATCH 3/4] Test cases for block-queue

2010-12-13 Thread Kevin Wolf
Add some unit tests especially for the ordering and request merging in
block-queue.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 Makefile|1 +
 check-block-queue.c |  402 +++
 2 files changed, 403 insertions(+), 0 deletions(-)
 create mode 100644 check-block-queue.c

diff --git a/Makefile b/Makefile
index c80566c..3e60d7e 100644
--- a/Makefile
+++ b/Makefile
@@ -172,6 +172,7 @@ check-qdict: check-qdict.o qdict.o qfloat.o qint.o qstring.o qbool.o qlist.o $(C
 check-qlist: check-qlist.o qlist.o qint.o $(CHECK_PROG_DEPS)
 check-qfloat: check-qfloat.o qfloat.o $(CHECK_PROG_DEPS)
 check-qjson: check-qjson.o qfloat.o qint.o qdict.o qstring.o qlist.o qbool.o qjson.o json-streamer.o json-lexer.o json-parser.o $(CHECK_PROG_DEPS)
+check-block-queue: check-block-queue.o qemu-tool.o qemu-error.o $(oslib-obj-y) $(filter-out block-queue.o,$(block-obj-y)) $(qobject-obj-y) qemu-timer-common.o
 
 clean:
 # avoid old build problems by removing potentially incorrect old files
diff --git a/check-block-queue.c b/check-block-queue.c
new file mode 100644
index 000..b2d
--- /dev/null
+++ b/check-block-queue.c
@@ -0,0 +1,402 @@
+/*
+ * block-queue.c unit tests
+ *
+ * Copyright (c) 2010 Kevin Wolf kw...@redhat.com
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+/* We want to test some static functions, so just include the source file */
+#define RUN_TESTS
+#include "block-queue.c"
+
+#define CHECK_WRITE(req, _bq, _offset, _size, _buf, _section) \
+    do { \
+        assert(req != NULL); \
+        assert(req->type == REQ_TYPE_WRITE); \
+        assert(req->bq == _bq); \
+        assert(req->offset == _offset); \
+        assert(req->size == _size); \
+        assert(req->section == _section); \
+        assert(!memcmp(req->buf, _buf, _size)); \
+    } while(0)
+
+#define CHECK_BARRIER(req, _bq, _section) \
+    do { \
+        assert(req != NULL); \
+        assert(req->type == REQ_TYPE_BARRIER); \
+        assert(req->bq == _bq); \
+        assert(req->section == _section); \
+    } while(0)
+
+#define CHECK_READ(_context, _offset, _buf, _size, _cmpbuf) \
+    do { \
+        int ret; \
+        memset(buf, 0, 512); \
+        ret = blkqueue_pread(_context, _offset, _buf, _size); \
+        assert(ret == 0); \
+        assert(!memcmp(_cmpbuf, _buf, _size)); \
+    } while(0)
+
+#define QUEUE_WRITE(_context, _offset, _buf, _size, _pattern) \
+    do { \
+        int ret; \
+        memset(_buf, _pattern, _size); \
+        ret = blkqueue_pwrite(_context, _offset, _buf, _size); \
+        assert(ret == 0); \
+    } while(0)
+#define QUEUE_BARRIER(_context) \
+    do { \
+        int ret; \
+        ret = blkqueue_barrier(_context); \
+        assert(ret == 0); \
+    } while(0)
+
+#define POP_CHECK_WRITE(_bq, _offset, _buf, _size, _pattern, _section) \
+    do { \
+        BlockQueueRequest *req; \
+        memset(_buf, _pattern, _size); \
+        req = blkqueue_pop(_bq); \
+        CHECK_WRITE(req, _bq, _offset, _size, _buf, _section); \
+        blkqueue_free_request(req); \
+    } while(0)
+#define POP_CHECK_BARRIER(_bq, _section) \
+    do { \
+        BlockQueueRequest *req; \
+        req = blkqueue_pop(_bq); \
+        CHECK_BARRIER(req, _bq, _section); \
+        blkqueue_free_request(req); \
+    } while(0)
+
+static void  __attribute__((used)) dump_queue(BlockQueue *bq)
+{
+    BlockQueueRequest *req;
+
+    fprintf(stderr, "--- Queue dump ---\n");
+    QTAILQ_FOREACH(req, &bq->queue, link) {
+        fprintf(stderr, "[%d] ", req->section);
+        if (req->type == REQ_TYPE_WRITE) {
+            fprintf(stderr, "Write off=%5" PRId64 ", len=%5" PRId64 ", buf=%p\n",
+                req->offset, req->size, req->buf);
+        } else if (req->type == REQ_TYPE_BARRIER) {
+            fprintf(stderr, "Barrier\n");
+        } else {
+            fprintf(stderr, "Unknown type %d\n", req->type);
+        }
+    }
+}
+
+static void 

[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify

2010-12-13 Thread Stefan Hajnoczi
On Mon, Dec 13, 2010 at 4:12 PM, Michael S. Tsirkin m...@redhat.com wrote:
 On Mon, Dec 13, 2010 at 03:27:06PM +, Stefan Hajnoczi wrote:
 On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin m...@redhat.com wrote:
  On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote:
  On Mon, Dec 13, 2010 at 01:11:27PM +, Stefan Hajnoczi wrote:
   Fresh results:
  
   192.168.0.1 - host (runs netperf)
   192.168.0.2 - guest (runs netserver)
  
   host$ src/netperf -H 192.168.0.2 -- -m 200
  
   ioeventfd=on
   TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
   (192.168.0.2) port 0 AF_INET
   Recv   Send    Send
   Socket Socket  Message  Elapsed
   Size   Size    Size     Time     Throughput
   bytes  bytes   bytes    secs.    10^6bits/sec
    87380  16384    200    10.00    1759.25
  
   ioeventfd=off
   TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
   (192.168.0.2) port 0 AF_INET
   Recv   Send    Send
   Socket Socket  Message  Elapsed
   Size   Size    Size     Time     Throughput
   bytes  bytes   bytes    secs.    10^6bits/sec
  
    87380  16384    200    10.00    1757.15
  
   The results vary approx +/- 3% between runs.
  
   Invocation:
   $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
   type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
   virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
   if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img
  
   I am running qemu.git with v5 patches, based off
   36888c6335422f07bbc50bf3443a39f24b90c7c6.
  
   Host:
   1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
   8 GB RAM
   RHEL 6 host
  
   Next I will try the patches on latest qemu-kvm.git
  
   Stefan
 
  One interesting thing is that I put virtio-net earlier on
  command line.
 
  Sorry I mean I put it after disk, you put it before.

 I can't find a measurable difference when swapping -drive and -netdev.

 One other concern I have is that we are apparently using
 ioeventfd for all VQs. E.g. for virtio-net we probably should not
 use it for the control VQ - it's a waste of resources.

One option is a per-device (block, net, etc) bitmap that masks out
virtqueues.  Is that something you'd like to see?
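
A minimal sketch of what such a mask could look like, purely to make the idea
concrete. The function name, the mask parameter and the enum are hypothetical
and not part of the v5 patches; the queue numbering is only illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative virtqueue indices for a virtio-net device. */
enum { NET_VQ_RX = 0, NET_VQ_TX = 1, NET_VQ_CTRL = 2 };

/* Hypothetical per-device policy: a set bit in no_ioeventfd_mask means
 * "do not assign an ioeventfd to this virtqueue". */
bool vq_uses_ioeventfd(uint32_t no_ioeventfd_mask, unsigned vq_index)
{
    if (vq_index >= 32) {
        return false; /* outside the bitmap: fall back to the userspace path */
    }
    return (no_ioeventfd_mask & (UINT32_C(1) << vq_index)) == 0;
}
```

A virtio-net device would then pass something like (1 << NET_VQ_CTRL), and
optionally (1 << NET_VQ_RX) for the experiment mentioned below.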

I'm tempted to mask out the RX vq too and see how that affects the
qemu-kvm.git specific issue.

Stefan



[Qemu-devel] [PATCH 2/4] Add block-queue

2010-12-13 Thread Kevin Wolf
Instead of directly executing writes and fsyncs, queue them and execute them
asynchronously. What makes this interesting is that we can delay syncs and if
multiple syncs occur, we can merge them into one bdrv_flush.

A typical sequence in qcow2 (simple cluster allocation) looks like this:

1. Update refcount table
2. bdrv_flush
3. Update L2 entry

If we delay the operation and get three of these sequences queued before
actually executing, we end up with the following result, saving two syncs:

1. Update refcount table (req 1)
2. Update refcount table (req 2)
3. Update refcount table (req 3)
4. bdrv_flush
5. Update L2 entry (req 1)
6. Update L2 entry (req 2)
7. Update L2 entry (req 3)

This patch only commits a sync if either the guest has requested a flush or if
a certain number of requests is in the queue, so usually we batch more than
just three requests.
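
The saving in the example above can be sketched with a toy model. This is not
the block-queue implementation; Op, three_allocations and the two counting
functions are invented here, and the model assumes every write is tagged with
the number of barriers its own request issued before it (its "section").

```c
#include <stddef.h>

typedef enum { OP_WRITE, OP_BARRIER } OpType;

typedef struct Op {
    OpType type;
    unsigned section; /* for writes: barriers issued earlier by the same request */
} Op;

/* Three cluster allocations, each "refcount write, barrier, L2 write". */
const Op three_allocations[] = {
    { OP_WRITE, 0 }, { OP_BARRIER, 0 }, { OP_WRITE, 1 },
    { OP_WRITE, 0 }, { OP_BARRIER, 0 }, { OP_WRITE, 1 },
    { OP_WRITE, 0 }, { OP_BARRIER, 0 }, { OP_WRITE, 1 },
};

/* Executing every barrier immediately costs one flush per barrier. */
unsigned flushes_unbatched(const Op *ops, size_t n)
{
    unsigned flushes = 0;
    for (size_t i = 0; i < n; i++) {
        if (ops[i].type == OP_BARRIER) {
            flushes++;
        }
    }
    return flushes;
}

/* Grouping writes by section needs one flush between section k and k + 1,
 * so the flush count is the highest section number in use. */
unsigned flushes_batched(const Op *ops, size_t n)
{
    unsigned max_section = 0;
    for (size_t i = 0; i < n; i++) {
        if (ops[i].type == OP_WRITE && ops[i].section > max_section) {
            max_section = ops[i].section;
        }
    }
    return max_section;
}
```

For three_allocations the unbatched count is 3 and the batched count is 1,
i.e. the two saved syncs from the list above.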

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 Makefile.objs |2 +-
 block-queue.c |  875 +
 block-queue.h |   61 
 3 files changed, 937 insertions(+), 1 deletions(-)
 create mode 100644 block-queue.c
 create mode 100644 block-queue.h

diff --git a/Makefile.objs b/Makefile.objs
index 04625eb..7cb7dde 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -14,7 +14,7 @@ oslib-obj-$(CONFIG_POSIX) += oslib-posix.o
 # block-obj-y is code used by both qemu system emulation and qemu-img
 
 block-obj-y = cutils.o cache-utils.o qemu-malloc.o qemu-option.o module.o
-block-obj-y += nbd.o block.o aio.o aes.o qemu-config.o
+block-obj-y += nbd.o block.o aio.o aes.o qemu-config.o block-queue.o
 block-obj-$(CONFIG_POSIX) += posix-aio-compat.o
 block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
 
diff --git a/block-queue.c b/block-queue.c
new file mode 100644
index 000..448f20d
--- /dev/null
+++ b/block-queue.c
@@ -0,0 +1,875 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2010 Kevin Wolf kw...@redhat.com
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu-common.h"
+#include "qemu-queue.h"
+#include "block_int.h"
+#include "block-queue.h"
+#include "qemu-error.h"
+
+//#define BLKQUEUE_DEBUG
+
+#ifdef BLKQUEUE_DEBUG
+#define DPRINTF(fmt, ...) fprintf(stderr, fmt, ##__VA_ARGS__)
+#else
+#define DPRINTF(...) do {} while(0)
+#endif
+
+#define WRITEBACK_MODES (BDRV_O_NOCACHE | BDRV_O_CACHE_WB)
+
+enum blkqueue_req_type {
+    REQ_TYPE_WRITE,
+    REQ_TYPE_BARRIER,
+    REQ_TYPE_WAIT_FOR_COMPLETION,
+};
+
+typedef struct BlockQueueAIOCB {
+    BlockDriverAIOCB common;
+    QLIST_ENTRY(BlockQueueAIOCB) link;
+} BlockQueueAIOCB;
+
+typedef struct BlockQueueRequest {
+    enum blkqueue_req_type type;
+    BlockQueue* bq;
+
+    uint64_t    offset;
+    void*       buf;
+    uint64_t    size;
+    unsigned    section;
+    bool        in_flight;
+
+    struct iovec    iov;
+    QEMUIOVector    qiov;
+
+    QLIST_HEAD(, BlockQueueAIOCB) acbs;
+
+    QTAILQ_ENTRY(BlockQueueRequest) link;
+    QSIMPLEQ_ENTRY(BlockQueueRequest) link_section;
+} BlockQueueRequest;
+
+QTAILQ_HEAD(bq_queue_head, BlockQueueRequest);
+
+struct BlockQueue {
+    BlockDriverState*       bs;
+
+    int                     barriers_requested;
+    int                     barriers_submitted;
+    int                     queue_size;
+    int                     flushing;
+    int                     num_waiting_for_cb;
+
+    BlockQueueErrorHandler  error_handler;
+    void*                   error_opaque;
+    int                     error_ret;
+
+    int                     in_flight_num;
+    enum blkqueue_req_type  in_flight_type;
+
+    struct bq_queue_head    queue;
+    struct bq_queue_head    in_flight;
+
+    QSIMPLEQ_HEAD(, BlockQueueRequest) sections;
+};
+
+typedef int (*blkqueue_rw_fn)(BlockQueueContext *context, uint64_t offset,
+    void *buf, uint64_t size);
+typedef void (*blkqueue_handle_overlap)(void *new, void *old, size_t size);
+
+static void blkqueue_process_request(BlockQueue *bq);
+static 

[Qemu-devel] [PATCH 0/4] block-queue: Delay and batch metadata write

2010-12-13 Thread Kevin Wolf
Differences to RFC v3 include proper conversion of qcow2, addressing Stefan's
comments and fixing some error cases in which two write requests to the same
location might conflict.

Also worth noting is that bdrv_aio_pwrite is dropped. It was unsafe with
respect to multiple concurrent requests on the same sector and it's impossible
to safely emulate byte-wise access with bdrv_aio_readv/writev without
introducing yet another queue. Instead we fall back to synchronous bdrv_pwrite
now with unaligned requests in block-queue (they are rare).

Kevin Wolf (4):
  Make vm_stop available for block layer
  Add block-queue
  Test cases for block-queue
  qcow2: Use block-queue

 Makefile   |1 +
 Makefile.objs  |2 +-
 block-queue.c  |  875 
 block-queue.h  |   61 
 block/qcow2-cluster.c  |  139 +
 block/qcow2-refcount.c |  217 +++-
 block/qcow2-snapshot.c |  106 +--
 block/qcow2.c  |  144 +++-
 block/qcow2.h  |   33 ++-
 check-block-queue.c|  402 ++
 cpus.c |8 +-
 qemu-common.h  |3 +
 qemu-tool.c|5 +
 sysemu.h   |1 -
 14 files changed, 1793 insertions(+), 204 deletions(-)
 create mode 100644 block-queue.c
 create mode 100644 block-queue.h
 create mode 100644 check-block-queue.c

-- 
1.7.2.3




[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify

2010-12-13 Thread Stefan Hajnoczi
On Mon, Dec 13, 2010 at 4:00 PM, Michael S. Tsirkin m...@redhat.com wrote:
 On Mon, Dec 13, 2010 at 03:27:06PM +, Stefan Hajnoczi wrote:
 On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin m...@redhat.com wrote:
  On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote:
  On Mon, Dec 13, 2010 at 01:11:27PM +, Stefan Hajnoczi wrote:
   Fresh results:
  
   192.168.0.1 - host (runs netperf)
   192.168.0.2 - guest (runs netserver)
  
   host$ src/netperf -H 192.168.0.2 -- -m 200
  
   ioeventfd=on
   TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
   (192.168.0.2) port 0 AF_INET
   Recv   Send    Send
   Socket Socket  Message  Elapsed
   Size   Size    Size     Time     Throughput
   bytes  bytes   bytes    secs.    10^6bits/sec
    87380  16384    200    10.00    1759.25
  
   ioeventfd=off
   TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
   (192.168.0.2) port 0 AF_INET
   Recv   Send    Send
   Socket Socket  Message  Elapsed
   Size   Size    Size     Time     Throughput
   bytes  bytes   bytes    secs.    10^6bits/sec
  
    87380  16384    200    10.00    1757.15
  
   The results vary approx +/- 3% between runs.
  
   Invocation:
   $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
   type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
   virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
   if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img
  
   I am running qemu.git with v5 patches, based off
   36888c6335422f07bbc50bf3443a39f24b90c7c6.
  
   Host:
   1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
   8 GB RAM
   RHEL 6 host
  
   Next I will try the patches on latest qemu-kvm.git
  
   Stefan
 
  One interesting thing is that I put virtio-net earlier on
  command line.
 
  Sorry I mean I put it after disk, you put it before.

 I can't find a measurable difference when swapping -drive and -netdev.

 Can you run the same test with vhost?  I assume it still outperforms
 userspace virtio for small message sizes?  I'm interested because that
 also uses ioeventfd.

 Seems to work same as ioeventfd.

vhost performs the same as ioeventfd=on?  And that means slower than
ioeventfd=off?

Stefan



[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify

2010-12-13 Thread Michael S. Tsirkin
On Mon, Dec 13, 2010 at 04:29:58PM +, Stefan Hajnoczi wrote:
 On Mon, Dec 13, 2010 at 4:00 PM, Michael S. Tsirkin m...@redhat.com wrote:
  On Mon, Dec 13, 2010 at 03:27:06PM +, Stefan Hajnoczi wrote:
  On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin m...@redhat.com 
  wrote:
   On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote:
   On Mon, Dec 13, 2010 at 01:11:27PM +, Stefan Hajnoczi wrote:
Fresh results:
   
192.168.0.1 - host (runs netperf)
192.168.0.2 - guest (runs netserver)
   
host$ src/netperf -H 192.168.0.2 -- -m 200
   
ioeventfd=on
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
(192.168.0.2) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
 87380  16384    200    10.00    1759.25
   
ioeventfd=off
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
(192.168.0.2) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
   
 87380  16384    200    10.00    1757.15
   
The results vary approx +/- 3% between runs.
   
Invocation:
$ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img
   
I am running qemu.git with v5 patches, based off
36888c6335422f07bbc50bf3443a39f24b90c7c6.
   
Host:
1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
8 GB RAM
RHEL 6 host
   
Next I will try the patches on latest qemu-kvm.git
   
Stefan
  
   One interesting thing is that I put virtio-net earlier on
   command line.
  
   Sorry I mean I put it after disk, you put it before.
 
  I can't find a measurable difference when swapping -drive and -netdev.
 
  Can you run the same test with vhost?  I assume it still outperforms
  userspace virtio for small message sizes?  I'm interested because that
  also uses ioeventfd.
 
  Seems to work same as ioeventfd.
 
 vhost performs the same as ioeventfd=on?  And that means slower than
 ioeventfd=off?
 
 Stefan

Yes.

-- 
MST



Re: [Qemu-devel] [PATCH 6/6] [RFC] SPARCV8 asr17 register support.

2010-12-13 Thread Fabien Chouteau

On 12/11/2010 10:59 AM, Blue Swirl wrote:

On Tue, Dec 7, 2010 at 11:51 AM, Fabien Chouteau <chout...@adacore.com> wrote:

On 12/06/2010 07:01 PM, Blue Swirl wrote:


On Mon, Dec 6, 2010 at 9:26 AM, Fabien Chouteau <chout...@adacore.com> wrote:


Signed-off-by: Fabien Chouteau <chout...@adacore.com>
---
  hw/leon3.c   |6 ++
  target-sparc/cpu.h   |1 +
  target-sparc/machine.c   |2 ++
  target-sparc/translate.c |   10 ++
  4 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/hw/leon3.c b/hw/leon3.c
index ba61081..9605ce8 100644
--- a/hw/leon3.c
+++ b/hw/leon3.c
@@ -187,6 +187,12 @@ static void main_cpu_reset(void *opaque)
values */
 leon3_state.inst_cache_conf = 0x1022;
 leon3_state.data_cache_conf = 0x1822;
+
+/* Asr17 for Leon3 mono-processor */
+env->asr17 |= 0 << 28;  /* CPU id */
+env->asr17 |= 1 << 8;   /* SPARC V8 multiply and divide available */
+env->asr17 |= env->nwindows - 1; /* Number of implemented register windows */


This is constant...


  }

  static void leon3_generic_hw_init(ram_addr_t  ram_size,
diff --git a/target-sparc/cpu.h b/target-sparc/cpu.h
index 6020ffd..36d49fc 100644
--- a/target-sparc/cpu.h
+++ b/target-sparc/cpu.h
@@ -341,6 +341,7 @@ typedef struct CPUSPARCState {
   from PSR) */
  #if !defined(TARGET_SPARC64) || defined(TARGET_ABI32)
 uint32_t wim;  /* window invalid mask */
+uint32_t asr17;/* asr17 */


... so no new env fields are needed...


  #endif
 target_ulong tbr;  /* trap base register */
  #if !defined(TARGET_SPARC64)
diff --git a/target-sparc/machine.c b/target-sparc/machine.c
index 752e431..c530bd3 100644
--- a/target-sparc/machine.c
+++ b/target-sparc/machine.c
@@ -42,6 +42,7 @@ void cpu_save(QEMUFile *f, void *opaque)
 qemu_put_be32s(f, &env->pil_in);
  #ifndef TARGET_SPARC64
 qemu_put_be32s(f, &env->wim);
+qemu_put_be32s(f, &env->asr17);


... there's also nothing to save/load...


 /* MMU */
 for (i = 0; i < 32; i++)
 qemu_put_be32s(f, &env->mmuregs[i]);
@@ -138,6 +139,7 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
 qemu_get_be32s(f, &env->pil_in);
  #ifndef TARGET_SPARC64
 qemu_get_be32s(f, &env->wim);
+qemu_get_be32s(f, &env->asr17);
 /* MMU */
 for (i = 0; i < 32; i++)
 qemu_get_be32s(f, &env->mmuregs[i]);
diff --git a/target-sparc/translate.c b/target-sparc/translate.c
index 23f9519..65de614 100644
--- a/target-sparc/translate.c
+++ b/target-sparc/translate.c
@@ -58,6 +58,7 @@ static TCGv cpu_hintp, cpu_htba, cpu_hver, cpu_ssr, cpu_ver;
  static TCGv_i32 cpu_softint;
  #else
  static TCGv cpu_wim;
+static TCGv cpu_asr17;
  #endif
  /* local register indexes (only used inside old micro ops) */
  static TCGv cpu_tmp0;
@@ -2049,6 +2050,8 @@ static void disas_sparc_insn(DisasContext * dc)
 rs1 = GET_FIELD(insn, 13, 17);
 switch(rs1) {
 case 0: /* rdy */
+gen_movl_TN_reg(rd, cpu_y);
+break;
  #ifndef TARGET_SPARC64
 case 0x01 ... 0x0e: /* undefined in the SPARCv8
manual, rdy on the microSPARC
@@ -2058,6 +2061,11 @@ static void disas_sparc_insn(DisasContext * dc)
 case 0x10 ... 0x1f: /* implementation-dependent in the
SPARCv8 manual, rdy on the
microSPARC II */
+
+if (rs1 == 0x11) { /* Read %asr17 */
+gen_movl_TN_reg(rd, cpu_asr17);


Instead:
r_const = tcg_const_tl(asr constants  | dc->def->nwindows - 1);
gen_movl_TN_reg(rd, r_const);
tcg_temp_free(r_const);


OK for me, if it is acceptable to have this Leon3's specific behavior for
all the SPARC32 CPUs.


This will not affect other CPUs when you use CPU feature bits to make
the ASR only available to Leon3.
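
For reference, the constant being discussed could be computed as in the sketch
below. It assumes the shifts in the quoted hw/leon3.c hunk were "<< 28" for the
CPU id and "<< 8" for the multiply/divide bit (the archive mangled them), and
the field widths used for masking are my own assumption. leon3_asr17 is a
made-up name, not a function from the patch.

```c
#include <stdint.h>

/* Hypothetical helper: build the read-only %asr17 value from its fields. */
uint32_t leon3_asr17(uint32_t cpu_id, uint32_t nwindows)
{
    uint32_t v = 0;

    v |= (cpu_id & 0xf) << 28;      /* CPU id (assumed 4-bit field) */
    v |= UINT32_C(1) << 8;          /* SPARC V8 multiply and divide available */
    v |= (nwindows - 1) & 0x1f;     /* implemented register windows minus one */
    return v;
}
```

Since every input is fixed per CPU model, the value can be emitted as a
translation-time constant and needs no env field and no save/load support.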


OK, I will try that.

--
Fabien Chouteau




Re: [Qemu-devel] [PATCH 5/6] [RFC] Emulation of Leon3.

2010-12-13 Thread Fabien Chouteau

On 12/12/2010 03:41 PM, Andreas Färber wrote:

Am 06.12.2010 um 10:26 schrieb Fabien Chouteau:


diff --git a/Makefile.target b/Makefile.target
index 2800f47..f40e04f 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -290,7 +290,10 @@ obj-sparc-y += cirrus_vga.o
else
obj-sparc-y = sun4m.o lance.o tcx.o sun4m_iommu.o slavio_intctl.o
obj-sparc-y += slavio_timer.o slavio_misc.o sparc32_dma.o
-obj-sparc-y += cs4231.o eccmemctl.o sbi.o sun4c_intctl.o
+obj-sparc-y += cs4231.o eccmemctl.o sbi.o sun4c_intctl.o leon3.o



+
+# GRLIB
+obj-sparc-y += grlib_gptimer.o grlib_irqmp.o grlib_apbuart.o


Aren't these three candidates for Makefile.hw if, as I understood it, 
they are from some non-sparc-specific component library?


They are sparc specific, but non-leon3-specific.

--
Fabien Chouteau




[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate

2010-12-13 Thread Alex Williamson
On Mon, 2010-12-13 at 02:55 +0530, Juan Quintela wrote:
 Alex Williamson alex.william...@redhat.com wrote:
  On Sun, 2010-12-12 at 20:07 +0530, Juan Quintela wrote:
  Michael S. Tsirkin m...@redhat.com wrote:
   On Sun, Dec 12, 2010 at 05:23:39PM +0530, Juan Quintela wrote:
   Michael S. Tsirkin m...@redhat.com wrote:
On Thu, Dec 09, 2010 at 03:14:17PM -0700, Alex Williamson wrote:
  
How about we keep migrating the index for the benefit of
old versions, but ignore the value on load?
Something like the following:
   
   This was my 1st suggestion to Alex O:-)
  
   The difference here is that instead of sending garbage to the
   old version we send an actual index value.
  
   So, I am in.  He thinks this is bad for upstream, I don't think so (but
   I understand that it is a matter of opinion).
   
   Later, Juan.
  
   I think it makes sense to fix this for the stable branch,
   and I think we should try as hard as we can to avoid bumping up the
   version number there.
  
   For master we can bump the version number but it might be easier to
   just keep the code the same there.
  
  I think that your solution is better.  For older versions, it works as
  expected.  For new versions, problem is fixed.  Solution is not the
  purest, but you can say the same about upping the version for a state
  that has exactly the same length and fields O:-)
 
  I disagree, without bumping the version number, we can never guarantee
  the problem is behind us.
 
 we can, if we use the latest version.

And we determine we're using the latest version via the vmsd
version_id...

  We can always migrate to the bad version,
 
 That is the whole point.  Bumping the version makes this impossible.

Which seems like a good thing to me.  Yes, it sucks that a user may
upgrade a host, migrate a guest to it, and suddenly not be able to
migrate back to the original host.  On the other hand, isn't it better
that we don't allow a migration that could potentially risk the
integrity of the guest?  I think so.
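
In simplified form, the check that makes a version bump block backward
migration looks roughly like the sketch below. DeviceStateDesc, can_load and
the version numbers are hypothetical and chosen only for illustration; this is
not the vmstate code itself.

```c
#include <stdbool.h>

/* Simplified stand-in for a device's migration description. */
typedef struct DeviceStateDesc {
    int version_id;         /* newest format this binary writes and reads */
    int minimum_version_id; /* oldest format this binary still accepts */
} DeviceStateDesc;

/* Example numbers only: the old host knows up to version 4, the fixed host
 * writes version 5 but still accepts older streams. */
const DeviceStateDesc old_host = { 4, 3 };
const DeviceStateDesc new_host = { 5, 3 };

/* The destination refuses streams that are newer than it understands or
 * older than it is willing to accept. */
bool can_load(const DeviceStateDesc *dest, int incoming_version)
{
    return incoming_version <= dest->version_id &&
           incoming_version >= dest->minimum_version_id;
}
```

With these numbers old to new still loads, while new to old is rejected up
front instead of silently restoring a bad index.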

  which puts our users at risk.  The responsible behavior is to allow
  forward migrations and prevent migrations to a version with an issue
  known to compromise VM integrity.  Perhaps I feel more strongly about
  this because I actually had to debug this problem.  Obvious in
  retrospect, but a huge pain in the butt to get there.
 
 Obviously, my point of view is different, and is related with
 maintaining a stable migration ABI. So, ... I am also biased.
 
 We have to make a decision (in general, not just this case):
 - we are going to never bump the version:
   this gives a stable ABI, but bugs stay with us forever

This is impossible.

 - we are not ever going to pretend that we care
   this makes changes trivial, as we don't have to maintain
   backward compatibility.

That's a little dramatic.  If we can come up with a way to not bump the
version number, I'm all for it.  I haven't seen one so far.

 
 And that is it.  Basically anything in the middle doesn't matter.  If I
 have a machine definition, with only a single device that has bumped
 version, I can't migrate to the backwards one.

Sorry, it's for your own good.  AIUI, there is plenty of grey between
your criteria above.  Yes we should try to preserve the migration ABI.
However, we will hit bugs where that's impossible.  Then it's good to
have discussions like this and investigate whether we can safely make a
change without bumping the version_id.  IMHO, the integrity of the guest
is always more important than maintaining a static ABI.

 This is the reason why I am against the changes like this, if we are
 pretending that we are going to maintain the versions stable.
 
 Notice that there are (at least) two ways to look at this specific
 problem:
 - don't bump the version.
   * new -> new : works
   * old -> new : works
   * new -> old : works (at least as well as old -> old that existed
 before)

If it worked, I wouldn't be working on this bug ;)  Here are some
failure scenarios:

a)
   1. Boot guest with single rtl8139
   2. Hot add 2nd rtl8139
   3. Migrate guest
   4. Hot remove 2nd rtl8139
   Result: 1st NIC stops working, guest segfaults on reboot

Too complicated?  How about this:

b)
   1. Boot guest with 2 rtl8139 NICs
   2. Boot migration target with NICs listed in reverse order
   3. Migrate
   Result: NICs get swapped at reboot!!

Or how about:

c)
   1. Boot guest with e1000, rtl8139
   2. Boot migration target with rtl8139, e1000
   3. Migrate
   Result: rtl8139 now points at e1000 mmio space, fails on reboot,
e1000 fails if rtl8139 is removed

I don't think it's fair to call any of these working, and in fact, I
retract my patch that sets the mmio space to unassigned if the device is
hotplugged, since issues can clearly happen without hotplug involved.
The index the device uses depends entirely on instantiation ordering,
which is bound to cause confusing, hard to reproduce, and difficult to
debug issues.
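
To make scenario (c) concrete, here is a minimal sketch of why an index that depends on instantiation order cannot safely travel in the migration stream. The IoTable type and the io_register()/io_owner() helpers are hypothetical stand-ins invented for illustration; they are not QEMU's actual I/O memory registration code.

```c
#include <assert.h>
#include <string.h>

#define MAX_SLOTS 8

/* Hypothetical stand-in for a table of registered MMIO handlers. */
typedef struct {
    const char *owner[MAX_SLOTS];
    int used;
} IoTable;

/* Hand out the next free slot; the result depends only on call order. */
static int io_register(IoTable *t, const char *owner)
{
    assert(t->used < MAX_SLOTS);
    t->owner[t->used] = owner;
    return t->used++;
}

/* Look up which device a (possibly migrated) index refers to on this host. */
static const char *io_owner(const IoTable *t, int index)
{
    return (index >= 0 && index < t->used) ? t->owner[index] : NULL;
}
```

If the source registers e1000 and then rtl8139 while the target registers them in the opposite order, the index saved for the rtl8139 on the source resolves to the e1000's slot on the target, which is the mix-up described in (c).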

 - bump the version
   * 

[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate

2010-12-13 Thread Michael S. Tsirkin
On Mon, Dec 13, 2010 at 10:43:22AM -0700, Alex Williamson wrote:
 So, unfortunately, I stand by my original patch.

What about the one that put -1 in saved index for a hotplugged device?

-- 
MST



[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify

2010-12-13 Thread Stefan Hajnoczi
On Mon, Dec 13, 2010 at 4:28 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Mon, Dec 13, 2010 at 4:12 PM, Michael S. Tsirkin m...@redhat.com wrote:
 On Mon, Dec 13, 2010 at 03:27:06PM +, Stefan Hajnoczi wrote:
 On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin m...@redhat.com wrote:
  On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote:
  On Mon, Dec 13, 2010 at 01:11:27PM +, Stefan Hajnoczi wrote:
   Fresh results:
  
   192.168.0.1 - host (runs netperf)
   192.168.0.2 - guest (runs netserver)
  
   host$ src/netperf -H 192.168.0.2 -- -m 200
  
   ioeventfd=on
   TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
   (192.168.0.2) port 0 AF_INET
   Recv   Send    Send
   Socket Socket  Message  Elapsed
   Size   Size    Size     Time     Throughput
   bytes  bytes   bytes    secs.    10^6bits/sec
    87380  16384    200    10.00    1759.25
  
   ioeventfd=off
   TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
   (192.168.0.2) port 0 AF_INET
   Recv   Send    Send
   Socket Socket  Message  Elapsed
   Size   Size    Size     Time     Throughput
   bytes  bytes   bytes    secs.    10^6bits/sec
  
    87380  16384    200    10.00    1757.15
  
   The results vary approx +/- 3% between runs.
  
   Invocation:
   $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
   type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
   virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
   if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img
  
   I am running qemu.git with v5 patches, based off
   36888c6335422f07bbc50bf3443a39f24b90c7c6.
  
   Host:
   1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
   8 GB RAM
   RHEL 6 host
  
   Next I will try the patches on latest qemu-kvm.git
  
   Stefan
 
  One interesting thing is that I put virtio-net earlier on
  command line.
 
  Sorry I mean I put it after disk, you put it before.

 I can't find a measurable difference when swapping -drive and -netdev.

 One other concern I have is that we are apparently using
 ioeventfd for all VQs. E.g. for virtio-net we probably should not
 use it for the control VQ - it's a waste of resources.

 One option is a per-device (block, net, etc) bitmap that masks out
 virtqueues.  Is that something you'd like to see?

 I'm tempted to mask out the RX vq too and see how that affects the
 qemu-kvm.git specific issue.

As expected, the rx virtqueue is involved in the degradation.  I
enabled ioeventfd only for the TX virtqueue and got the same good
results as userspace virtio-net.

When I enable only the rx virtqueue, performance decreases as we've seen above.

Stefan
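
The per-device bitmap idea quoted above could look roughly like the sketch below. The queue numbering and the vq_uses_ioeventfd() helper are assumptions made for illustration only; nothing here is taken from the v5 patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical queue numbering for a virtio-net device, for illustration. */
enum { VQ_RX = 0, VQ_TX = 1, VQ_CTRL = 2 };

/*
 * Bit n set in masked_out means "do not use ioeventfd for virtqueue n";
 * such a queue would keep taking the ordinary userspace notify path.
 */
static bool vq_uses_ioeventfd(uint32_t masked_out, unsigned int vq)
{
    assert(vq < 32);
    return (masked_out & (UINT32_C(1) << vq)) == 0;
}
```

With a mask of (1 << VQ_RX) | (1 << VQ_CTRL), only the TX queue would get an ioeventfd, which corresponds to the TX-only configuration measured above.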



[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate

2010-12-13 Thread Alex Williamson
On Mon, 2010-12-13 at 19:50 +0200, Michael S. Tsirkin wrote:
 On Mon, Dec 13, 2010 at 10:43:22AM -0700, Alex Williamson wrote:
  So, unfortunately, I stand by my original patch.
 
 What about the one that put -1 in saved index for a hotplugged device?

There are still examples that don't work even without hotplug (example 2
and example 3 after the reboot).  That hack limits the damage, but still
leaves a latent bug for reboot and doesn't address the non-hotplug
scenarios.  So, I don't think it's worthwhile to pursue, and we
shouldn't pretend we can use it to avoid bumping the version_id.
Thanks,

Alex





[Qemu-devel] [PATCH 3/7] Add configure script and command line options for TPM interface.

2010-12-13 Thread Andreas Niederl
Signed-off-by: Andreas Niederl andreas.nied...@iaik.tugraz.at
---
 Makefile.objs   |3 +++
 configure   |9 +
 qemu-config.c   |   16 
 qemu-config.h   |1 +
 qemu-options.hx |6 ++
 vl.c|   29 +
 6 files changed, 64 insertions(+), 0 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index 7409919..444a41a 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -278,6 +278,9 @@ hw-obj-$(CONFIG_REALLY_VIRTFS) += virtio-9p-debug.o
 hw-obj-$(CONFIG_VIRTFS) += virtio-9p-local.o virtio-9p-xattr.o
 hw-obj-$(CONFIG_VIRTFS) += virtio-9p-xattr-user.o virtio-9p-posix-acl.o
 
+# TPM passthrough device
+hw-obj-$(CONFIG_TPM) += tpm_tis.o tpm_backend.o tpm_host_backend.o
+
 ##
 # libdis
 # NOTE: the disassembler code is only needed for debugging
diff --git a/configure b/configure
index 2917874..ca97825 100755
--- a/configure
+++ b/configure
@@ -332,6 +332,7 @@ zero_malloc=
 trace_backend=nop
 trace_file=trace
 spice=
+tpm=no
 
 # OS specific
 if check_define __linux__ ; then
@@ -472,6 +473,7 @@ Haiku)
   usb=linux
   if [ $cpu = i386 -o $cpu = x86_64 ] ; then
 audio_possible_drivers=$audio_possible_drivers fmod
+tpm=yes
   fi
 ;;
 esac
@@ -739,6 +741,8 @@ for opt do
   ;;
   --enable-vhost-net) vhost_net=yes
   ;;
+  --disable-tpm) tpm=no
+  ;;
   --*dir)
   ;;
   *) echo ERROR: unknown option $opt; show_help=yes
@@ -934,6 +938,7 @@ echo   --trace-file=NAMEFull PATH,NAME of file to 
store traces
 echoDefault:trace-pid
 echo   --disable-spice  disable spice
 echo   --enable-spice   enable spice
+echo   --disable-tpmdisable tpm passthrough device emulation
 echo 
 echo NOTE: The object files are built at the place where configure is 
launched
 exit 1
@@ -2354,6 +2359,7 @@ echo vhost-net support $vhost_net
 echo Trace backend $trace_backend
 echo Trace output file $trace_file-pid
 echo spice support $spice
+echo tpm support   $tpm
 
 if test $sdl_too_old = yes; then
 echo - Your SDL version is too old - please upgrade to have SDL support
@@ -2606,6 +2612,9 @@ fi
 if test $fdatasync = yes ; then
   echo CONFIG_FDATASYNC=y  $config_host_mak
 fi
+if test "$tpm" = "yes" ; then
+  echo "CONFIG_TPM=y" >> $config_host_mak
+fi
 if test $madvise = yes ; then
   echo CONFIG_MADVISE=y  $config_host_mak
 fi
diff --git a/qemu-config.c b/qemu-config.c
index 965fa46..b42483c 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -445,6 +445,22 @@ QemuOptsList qemu_option_rom_opts = {
 },
 };
 
+QemuOptsList qemu_tpm_opts = {
+    .name = "tpm",
+    .implied_opt_name = "type",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_tpm_opts.head),
+    .desc = {
+        {
+            .name = "type",
+            .type = QEMU_OPT_STRING,
+        },{
+            .name = "path",
+            .type = QEMU_OPT_STRING,
+        },
+        { /*End of list */ }
+    },
+};
+
 static QemuOptsList *vm_config_groups[32] = {
 qemu_drive_opts,
 qemu_chardev_opts,
diff --git a/qemu-config.h b/qemu-config.h
index 20d707f..eed9b3f 100644
--- a/qemu-config.h
+++ b/qemu-config.h
@@ -4,6 +4,7 @@
 extern QemuOptsList qemu_fsdev_opts;
 extern QemuOptsList qemu_virtfs_opts;
 extern QemuOptsList qemu_spice_opts;
+extern QemuOptsList qemu_tpm_opts;
 
 QemuOptsList *qemu_find_opts(const char *group);
 void qemu_add_opts(QemuOptsList *list);
diff --git a/qemu-options.hx b/qemu-options.hx
index 4d99a58..96cdb36 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2312,6 +2312,12 @@ STEXI
 Specify a trace file to log output traces to.
 ETEXI
 #endif
+#ifdef CONFIG_TPM
+DEF(tpm, HAS_ARG, QEMU_OPTION_tpm,
+-tpm host,id=id,path=path\n
+enable TPM support and forward commands to the given TPM 
device file\n,
+QEMU_ARCH_I386)
+#endif
 
 HXCOMM This is the last statement. Insert new options before this line!
 STEXI
diff --git a/vl.c b/vl.c
index cb0a3ec..fa29cbf 100644
--- a/vl.c
+++ b/vl.c
@@ -152,6 +152,9 @@ int main(int argc, char **argv)
 #ifdef CONFIG_VIRTFS
 #include fsdev/qemu-fsdev.h
 #endif
+#ifdef CONFIG_TPM
+#include "hw/tpm.h"
+#endif
 
 #include disas.h
 
@@ -1614,6 +1617,16 @@ static int fsdev_init_func(QemuOpts *opts, void *opaque)
 }
 #endif
 
+#ifdef CONFIG_TPM
+static int tpm_init_func(QemuOpts *opts, void *opaque)
+{
+int ret;
+ret = qemu_tpm_add(opts);
+
+return ret;
+}
+#endif
+
 static int mon_init_func(QemuOpts *opts, void *opaque)
 {
 CharDriverState *chr;
@@ -1944,6 +1957,10 @@ int main(int argc, char **argv, char **envp)
 tb_size = 0;
 autostart= 1;
 
+#ifdef CONFIG_TPM
+    qemu_add_opts(&qemu_tpm_opts);
+#endif
+
 /* first pass of option parsing */
 optind = 1;
 while (optind < argc) {
@@ -2438,6 +2455,13 @@ int main(int argc, char **argv, char **envp)
 qemu_free(arg_9p);
 break;
 }
+case QEMU_OPTION_tpm:
+   

[Qemu-devel] [PATCH 2/7] Add TPM host passthrough device backend.

2010-12-13 Thread Andreas Niederl
Threadlets are used for asynchronous I/O to the host TPM device because the
Linux TPM driver does not allow for non-blocking I/O.

This patch is based on the Threadlets patch series v12 posted on this list.

Signed-off-by: Andreas Niederl andreas.nied...@iaik.tugraz.at
---
 hw/tpm_backend.c  |1 +
 hw/tpm_host_backend.c |  219 +
 hw/tpm_int.h  |7 ++
 hw/tpm_tis.c  |3 -
 4 files changed, 227 insertions(+), 3 deletions(-)
 create mode 100644 hw/tpm_host_backend.c

diff --git a/hw/tpm_backend.c b/hw/tpm_backend.c
index a0bec7c..2d3b550 100644
--- a/hw/tpm_backend.c
+++ b/hw/tpm_backend.c
@@ -26,6 +26,7 @@ typedef struct {
 } TPMDriverTable;
 
 static const TPMDriverTable driver_table[] = {
+    { .name = "host", .open = qemu_tpm_host_open },
 };
 
 int qemu_tpm_add(QemuOpts *opts) {
diff --git a/hw/tpm_host_backend.c b/hw/tpm_host_backend.c
new file mode 100644
index 000..238b030
--- /dev/null
+++ b/hw/tpm_host_backend.c
@@ -0,0 +1,219 @@
+
+#include <errno.h>
+#include <signal.h>
+
+#include "qemu-common.h"
+#include "qemu-threadlets.h"
+
+#include "hw/tpm_int.h"
+
+
+#define STATUS_DONE        (1 << 1)
+#define STATUS_IN_PROGRESS (1 << 0)
+#define STATUS_IDLE         0
+
+typedef struct {
+    TPMDriver common;
+
+    ThreadletWork work;
+
+    uint8_t send_status;
+    uint8_t recv_status;
+
+    int32_t send_len;
+    int32_t recv_len;
+
+    int fd;
+} TPMHostDriver;
+
+static int tpm_host_send(TPMDriver *drv, uint8_t locty, uint32_t len)
+{
+    TPMHostDriver *hdrv = DO_UPCAST(TPMHostDriver, common, drv);
+    int n = 0;
+
+    drv->locty = locty;
+
+    switch (hdrv->send_status) {
+    case STATUS_IN_PROGRESS:
+        break;
+    case STATUS_IDLE:
+        hdrv->send_len = len;
+        hdrv->recv_len = TPM_MAX_PKT;
+        /* asynchronous send */
+        n = 1;
+        submit_work(&hdrv->work);
+        break;
+    case STATUS_DONE:
+        break;
+    default:
+        n = -1;
+        fprintf(stderr,
+                "tpm host backend: internal error on send status %d\n",
+                hdrv->send_status);
+        break;
+    }
+
+    return n;
+}
+
+static int tpm_host_recv(TPMDriver *drv, uint8_t locty, uint32_t len)
+{
+    TPMHostDriver *hdrv = DO_UPCAST(TPMHostDriver, common, drv);
+    int n = 0;
+
+    drv->locty = locty;
+
+    switch (hdrv->recv_status) {
+    case STATUS_IN_PROGRESS:
+        break;
+    case STATUS_IDLE:
+        break;
+    case STATUS_DONE:
+        hdrv->recv_status = STATUS_IDLE;
+        n = hdrv->recv_len;
+        break;
+    default:
+        n = -1;
+        fprintf(stderr,
+                "tpm host backend: internal error on recv status %d\n",
+                hdrv->recv_status);
+        break;
+    }
+
+    return n;
+}
+
+
+/* borrowed from qemu-char.c */
+static int unix_write(int fd, const uint8_t *buf, uint32_t len)
+{
+    int ret, len1;
+
+    len1 = len;
+    while (len1 > 0) {
+        ret = write(fd, buf, len1);
+        if (ret < 0) {
+            if (errno != EINTR && errno != EAGAIN)
+                return -1;
+        } else if (ret == 0) {
+            break;
+        } else {
+            buf  += ret;
+            len1 -= ret;
+        }
+    }
+    return len - len1;
+}
+
+static int unix_read(int fd, uint8_t *buf, uint32_t len)
+{
+    int ret, len1;
+    uint8_t *buf1;
+
+    len1 = len;
+    buf1 = buf;
+    while ((len1 > 0) && (ret = read(fd, buf1, len1)) != 0) {
+        if (ret < 0) {
+            if (errno != EINTR && errno != EAGAIN)
+                return -1;
+        } else {
+            buf1 += ret;
+            len1 -= ret;
+        }
+    }
+    return len - len1;
+}
+
+
+static void tpm_host_send_receive(ThreadletWork *work)
+{
+    TPMHostDriver *drv = container_of(work, TPMHostDriver, work);
+    TPMDriver *s   = &drv->common;
+    uint32_t  tpm_ret;
+    int ret;
+
+    drv->send_status = STATUS_IN_PROGRESS;
+
+    DSHOW_BUFF(s->buf, "To TPM");
+
+    ret = unix_write(drv->fd, s->buf, drv->send_len);
+
+    drv->send_len    = ret;
+    drv->send_status = STATUS_DONE;
+
+    if (ret < 0) {
+        fprintf(stderr, "Error: while transmitting data to host tpm"
+                ": %s (%i)\n",
+                strerror(errno), errno);
+        return;
+    }
+
+    drv->recv_status = STATUS_IN_PROGRESS;
+
+    ret = unix_read(drv->fd, s->buf, drv->recv_len);
+
+    drv->recv_len    = ret;
+    drv->recv_status = STATUS_DONE;
+    drv->send_status = STATUS_IDLE;
+
+    if (ret < 0) {
+        fprintf(stderr, "Error: while reading data from host tpm"
+                ": %s (%i)\n",
+                strerror(errno), errno);
+        return;
+    }
+
+    DSHOW_BUFF(s->buf, "From TPM");
+
+    tpm_ret = (s->buf[8])*256 + s->buf[9];
+    if (tpm_ret) {
+        DPRINTF("tpm command failed with error %d\n", tpm_ret);
+    } else {
+        DPRINTF("tpm command succeeded\n");
+    }
+}
+
+

[Qemu-devel] Re: [PATCH 3/7] Add configure script and command line options for TPM interface.

2010-12-13 Thread Andreas Niederl
On 12/13/2010 07:04 PM, Andreas Niederl wrote:
[...]

Sorry for the wrong patch count in the subject. Total number is 4.


Regards,
Andreas





[Qemu-devel] [PATCH 1/7] Add TPM 1.2 device interface

2010-12-13 Thread Andreas Niederl
This implementation is based on the TPM 1.2 interface for virtualized TPM
devices from the Xen-4.0.0 ioemu-qemu-xen fork.

A backend driver infrastructure is provided to be able to use different
device backends.

Signed-off-by: Andreas Niederl andreas.nied...@iaik.tugraz.at
---
 hw/tpm.h |6 +
 hw/tpm_backend.c |   63 +
 hw/tpm_int.h |   36 +++
 hw/tpm_tis.c |  711 ++
 4 files changed, 816 insertions(+), 0 deletions(-)
 create mode 100644 hw/tpm.h
 create mode 100644 hw/tpm_backend.c
 create mode 100644 hw/tpm_int.h
 create mode 100644 hw/tpm_tis.c

diff --git a/hw/tpm.h b/hw/tpm.h
new file mode 100644
index 000..844c95e
--- /dev/null
+++ b/hw/tpm.h
@@ -0,0 +1,6 @@
+#ifndef TPM_H
+#define TPM_H
+
+int qemu_tpm_add(QemuOpts *opts);
+
+#endif /* TPM_H */
diff --git a/hw/tpm_backend.c b/hw/tpm_backend.c
new file mode 100644
index 000..a0bec7c
--- /dev/null
+++ b/hw/tpm_backend.c
@@ -0,0 +1,63 @@
+
+#include "qemu-option.h"
+
+#include "hw/tpm.h"
+#include "hw/tpm_int.h"
+
+
+static QLIST_HEAD(, TPMDriver) tpm_drivers =
+    QLIST_HEAD_INITIALIZER(tpm_drivers);
+
+TPMDriver *tpm_get_driver(const char *id)
+{
+    TPMDriver *drv;
+    QLIST_FOREACH(drv, &tpm_drivers, list) {
+        if (!strcmp(drv->id, id)) {
+            return drv;
+        }
+    }
+    return NULL;
+}
+
+
+typedef struct {
+    const char *name;
+    TPMDriver *(*open)(QemuOpts *opts);
+} TPMDriverTable;
+
+static const TPMDriverTable driver_table[] = {
+};
+
+int qemu_tpm_add(QemuOpts *opts) {
+    TPMDriver *drv = NULL;
+    int i;
+
+    if (qemu_opts_id(opts) == NULL) {
+        fprintf(stderr, "tpm: no id specified\n");
+        return -1;
+    }
+
+    for (i = 0; i < ARRAY_SIZE(driver_table); i++) {
+        if (strcmp(driver_table[i].name, qemu_opt_get(opts, "type")) == 0) {
+            break;
+        }
+    }
+
+    if (i == ARRAY_SIZE(driver_table)) {
+        fprintf(stderr, "tpm: backend type %s not found\n",
+                qemu_opt_get(opts, "type"));
+        return -1;
+    }
+
+    drv = driver_table[i].open(opts);
+
+    if (drv == NULL) {
+        return -1;
+    }
+
+    drv->id = qemu_strdup(qemu_opts_id(opts));
+
+    QLIST_INSERT_HEAD(&tpm_drivers, drv, list);
+
+    return 0;
+}
diff --git a/hw/tpm_int.h b/hw/tpm_int.h
new file mode 100644
index 000..d52d7e2
--- /dev/null
+++ b/hw/tpm_int.h
@@ -0,0 +1,36 @@
+#ifndef TPM_INT_H
+#define TPM_INT_H
+
+
+#include <inttypes.h>
+#include "qemu-queue.h"
+#include "qemu-option.h"
+
+
+typedef struct TPMDriver TPMDriver;
+struct TPMDriver {
+    char *id;
+
+    uint8_t  locty;
+    uint8_t *buf;
+
+    int (*send)(TPMDriver *drv, uint8_t locty, uint32_t len);
+    int (*recv)(TPMDriver *drv, uint8_t locty, uint32_t len);
+
+    QLIST_ENTRY(TPMDriver) list;
+};
+
+TPMDriver *tpm_get_driver(const char *id);
+
+#define DEBUG_TPM
+#ifdef DEBUG_TPM
+void show_buff(unsigned char *buff, const char *string);
+#define DPRINTF(fmt, ...) \
+    fprintf(stderr, "tpm_tis: %s: " fmt, __FUNCTION__, ##__VA_ARGS__)
+#define DSHOW_BUFF(buf, info) show_buff(buf, info)
+#else
+#define DPRINTF(fmt, ...)
+#define DSHOW_BUFF(buf, info)
+#endif
+
+#endif /* TPM_INT_H */
diff --git a/hw/tpm_tis.c b/hw/tpm_tis.c
new file mode 100644
index 000..0cee917
--- /dev/null
+++ b/hw/tpm_tis.c
@@ -0,0 +1,711 @@
+/*
+ * tpm_tis.c - QEMU emulator for a 1.2 TPM with TIS interface
+ *
+ * Copyright (C) 2006 IBM Corporation
+ * Copyright (C) 2010 IAIK, Graz University of Technology
+ *
+ * Author: Stefan Berger stef...@us.ibm.com
+ * David Safford saff...@us.ibm.com
+ *
+ * Author: Andreas Niederl andreas.nied...@iaik.tugraz.at
+ * Pass through a TPM device rather than using the emulator
+ * Modified to use a separate thread for IO to/from TPM as the Linux
+ * TPM driver framework does not allow non-blocking IO
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ *
+ *
+ * Implementation of the TIS interface according to specs at
+ * https://www.trustedcomputinggroup.org/
+ *
+ */
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <string.h>
+
+#include "qemu-option.h"
+#include "qemu-config.h"
+#include "hw/hw.h"
+#include "hw/pc.h"
+#include "hw/pci.h"
+#include "hw/pci_ids.h"
+#include "qemu-timer.h"
+
+#include "hw/tpm_int.h"
+
+
+#define TPM_MAX_PKT4096
+#define TPM_MAX_PATH   4096
+
+#define TIS_ADDR_BASE 0xFED4
+
+/* tis registers */
+#define TPM_REG_ACCESS0x00
+#define TPM_REG_INT_ENABLE0x08
+#define TPM_REG_INT_VECTOR0x0c
+#define TPM_REG_INT_STATUS0x10
+#define TPM_REG_INTF_CAPABILITY   0x14
+#define TPM_REG_STS   0x18
+#define TPM_REG_DATA_FIFO 0x24
+#define TPM_REG_DID_VID   0xf00
+#define TPM_REG_RID   0xf04
+
+#define 

Re: [Qemu-devel] [PATCH 2/6] [RFC] Emulation of GRLIB IRQMP as defined in GRLIB IP Core User's Manual.

2010-12-13 Thread Blue Swirl
On Mon, Dec 13, 2010 at 4:23 PM, Fabien Chouteau chout...@adacore.com wrote:
 On 12/11/2010 11:31 AM, Blue Swirl wrote:

 On Tue, Dec 7, 2010 at 10:43 AM, Fabien Chouteauchout...@adacore.com
  wrote:

 On 12/06/2010 06:25 PM, Blue Swirl wrote:

 On Mon, Dec 6, 2010 at 9:26 AM, Fabien Chouteauchout...@adacore.com
  wrote:

 Signed-off-by: Fabien Chouteauchout...@adacore.com
 ---
  hw/grlib_irqmp.c |  416
 ++
  1 files changed, 416 insertions(+), 0 deletions(-)

 diff --git a/hw/grlib_irqmp.c b/hw/grlib_irqmp.c
 new file mode 100644
 index 000..69e1553
 --- /dev/null
 +++ b/hw/grlib_irqmp.c
 @@ -0,0 +1,416 @@
 +/*
 + * QEMU GRLIB IRQMP Emulator
 + *
 + * (Multiprocessor and extended interrupt not supported)
 + *
 + * Copyright (c) 2010 AdaCore
 + *
 + * Permission is hereby granted, free of charge, to any person
 obtaining
 a copy
 + * of this software and associated documentation files (the
 Software),
 to deal
 + * in the Software without restriction, including without limitation
 the
 rights
 + * to use, copy, modify, merge, publish, distribute, sublicense,
 and/or
 sell
 + * copies of the Software, and to permit persons to whom the Software
 is
 + * furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice shall be
 included in
 + * all copies or substantial portions of the Software.
 + *
 + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND,
 EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
 SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES
 OR
 OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 ARISING FROM,
 + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 DEALINGS IN
 + * THE SOFTWARE.
 + */
 +
 +#include sysbus.h
 +#include cpu.h
 +
 +#include grlib.h
 +
 +/* #define DEBUG_IRQ */
 +
 +#ifdef DEBUG_IRQ
 +#define DPRINTF(fmt, ...)                                       \
 +    do { printf(IRQMP:  fmt , ## __VA_ARGS__); } while (0)
 +#else
 +#define DPRINTF(fmt, ...)
 +#endif
 +
 +#define IRQMP_MAX_CPU 16
 +#define IRQMP_REG_SIZE 256      /* Size of memory mapped registers */
 +
 +/* Memory mapped register offsets */
 +#define LEVEL_OFFSET     0x00
 +#define PENDING_OFFSET   0x04
 +#define FORCE0_OFFSET    0x08
 +#define CLEAR_OFFSET     0x0C
 +#define MP_STATUS_OFFSET 0x10
 +#define BROADCAST_OFFSET 0x14
 +#define MASK_OFFSET      0x40
 +#define FORCE_OFFSET     0x80
 +#define EXTENDED_OFFSET  0xC0
 +
 +typedef struct IRQMP
 +{
 +    SysBusDevice busdev;
 +
 +    CPUSPARCState *env;

 Devices should never access CPUState directly. Instead, board level
 should create CPU irqs and these should then be passed here.


 This case is special, Leon3 is a System-On-Chip and some of the
 components
 are very close to the processor.
 IRQMP is not really a peripheral nor a part of the CPU, it's both...

 It's not a special case, it could be easily implemented separately.
 MMUs, FPUs or co-processors could be special even if they have been
 implemented as separate chips with real hardware. But we are actually
 not looking at the (historical or current) chip boundaries but more
 like what makes sense from QEMU architecture point of view.

 OK then, let's go back to your first comment: why can't a device access
 CPUState directly? And why would Leon3.c be a better place to do that?

Devices should mind their own business, not other devices' or
especially CPUs' businesses. The signals between devices should be
made with qemu_irq or bus style interfaces. Board case is different
because there we interface with QEMU host. Not all devices are very
clean yet.

This has been discussed a few times earlier, please see the list
archives if you really are interested.
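
As a rough illustration of the separation described above, the sketch below has an interrupt controller that knows only about an output line, while board-level glue decides what that line is connected to. IrqLine, IntController and FakeCpu are hypothetical types invented here; they imitate the idea behind qemu_irq but are not QEMU's API.

```c
#include <assert.h>

/* Hypothetical interrupt line: a callback plus an opaque target. */
typedef void (*irq_handler)(void *opaque, int level);

typedef struct {
    irq_handler handler;
    void *opaque;
} IrqLine;

static void irq_set(const IrqLine *line, int level)
{
    if (line->handler) {
        line->handler(line->opaque, level);
    }
}

/* The device model: it never sees any CPU state, only its output line. */
typedef struct {
    unsigned int pending;
    unsigned int mask;
    IrqLine out;
} IntController;

static void intc_raise(IntController *s, unsigned int n)
{
    assert(n < 32);
    s->pending |= 1u << n;
    /* Drive the output high only if an unmasked interrupt is pending. */
    irq_set(&s->out, (s->pending & s->mask) != 0);
}

/* Board-level glue: the only place that knows what the line is wired to. */
typedef struct {
    int irq_level;
} FakeCpu;

static void cpu_irq_handler(void *opaque, int level)
{
    ((FakeCpu *)opaque)->irq_level = level;
}
```

The board code would fill in the controller's out field with cpu_irq_handler and a pointer to the CPU, so the controller itself can be reused on a different board without changes.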



Re: [Qemu-devel] [PATCH 5/6] [RFC] Emulation of Leon3.

2010-12-13 Thread Blue Swirl
On Mon, Dec 13, 2010 at 3:51 PM, Fabien Chouteau chout...@adacore.com wrote:
 On 12/11/2010 10:56 AM, Blue Swirl wrote:

 On Tue, Dec 7, 2010 at 11:40 AM, Fabien Chouteauchout...@adacore.com
  wrote:

 On 12/06/2010 06:53 PM, Blue Swirl wrote:

 On Mon, Dec 6, 2010 at 9:26 AM, Fabien Chouteauchout...@adacore.com
  wrote:

 Signed-off-by: Fabien Chouteauchout...@adacore.com
 ---
  Makefile.target          |    5 +-
  hw/leon3.c               |  310
 ++
  target-sparc/cpu.h       |   10 ++
  target-sparc/helper.c    |    2 +-
  target-sparc/op_helper.c |   30 -
  5 files changed, 353 insertions(+), 4 deletions(-)

 diff --git a/Makefile.target b/Makefile.target
 index 2800f47..f40e04f 100644
 --- a/Makefile.target
 +++ b/Makefile.target
 @@ -290,7 +290,10 @@ obj-sparc-y += cirrus_vga.o
  else
  obj-sparc-y = sun4m.o lance.o tcx.o sun4m_iommu.o slavio_intctl.o
  obj-sparc-y += slavio_timer.o slavio_misc.o sparc32_dma.o
 -obj-sparc-y += cs4231.o eccmemctl.o sbi.o sun4c_intctl.o
 +obj-sparc-y += cs4231.o eccmemctl.o sbi.o sun4c_intctl.o leon3.o
 +
 +# GRLIB
 +obj-sparc-y += grlib_gptimer.o grlib_irqmp.o grlib_apbuart.o
  endif

  obj-arm-y = integratorcp.o versatilepb.o arm_pic.o arm_timer.o
 diff --git a/hw/leon3.c b/hw/leon3.c
 new file mode 100644
 index 000..ba61081
 --- /dev/null
 +++ b/hw/leon3.c
 @@ -0,0 +1,310 @@
 +/*
 + * QEMU Leon3 System Emulator
 + *
 + * Copyright (c) 2010 AdaCore
 + *
 + * Permission is hereby granted, free of charge, to any person
 obtaining
 a copy
 + * of this software and associated documentation files (the
 Software),
 to deal
 + * in the Software without restriction, including without limitation
 the
 rights
 + * to use, copy, modify, merge, publish, distribute, sublicense,
 and/or
 sell
 + * copies of the Software, and to permit persons to whom the Software
 is
 + * furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice shall be
 included in
 + * all copies or substantial portions of the Software.
 + *
 + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND,
 EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
 SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES
 OR
 OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 ARISING FROM,
 + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 DEALINGS IN
 + * THE SOFTWARE.
 + */
 +#include hw.h
 +#include qemu-timer.h
 +#include qemu-char.h
 +#include sysemu.h
 +#include boards.h
 +#include loader.h
 +#include elf.h
 +
 +#include grlib.h
 +
 +/* #define DEBUG_LEON3 */
 +
 +#ifdef DEBUG_LEON3
 +#define DPRINTF(fmt, ...)                                       \
 +    do { printf(Leon3:  fmt , ## __VA_ARGS__); } while (0)
 +#else
 +#define DPRINTF(fmt, ...)
 +#endif
 +
 +/* Default system clock.  */
 +#define CPU_CLK (40 * 1000 * 1000)
 +
 +#define PROM_FILENAME        u-boot.bin
 +
 +#define MAX_PILS 16
 +
 +typedef struct Leon3State
 +{
 +    uint32_t cache_control;
 +    uint32_t inst_cache_conf;
 +    uint32_t data_cache_conf;
 +
 +    uint64_t entry;             /* save kernel entry in case of reset
 */
 +} Leon3State;
 +
 +Leon3State leon3_state;

 Again global state, please refactor. Perhaps most of the cache
 handling code belong to target-sparc/op_helper.c and this structure to
 CPUSPARCState.

 I will try to find a solution for that.
 Is it OK to add some Leon3 specific stuff in the CPUSPARCState?

 Yes, no problem. You can also drop the intermediate Leon3State
 structure if there is no benefit.

 +
 +/* Cache control: emulate the behavior of cache control registers but
 without
 +   any effect on the emulated CPU */
 +
 +#define CACHE_DISABLED 0x0
 +#define CACHE_FROZEN   0x1
 +#define CACHE_ENABLED  0x3
 +
 +/* Cache Control register fields */
 +
 +#define CACHE_CTRL_IF (1 <<  4)  /* Instruction Cache Freeze on
 Interrupt */
 +#define CACHE_CTRL_DF (1 <<  5)  /* Data Cache Freeze on Interrupt
 */
 +#define CACHE_CTRL_DP (1 << 14)  /* Data cache flush pending */
 +#define CACHE_CTRL_IP (1 << 15)  /* Instruction cache flush pending
 */
 +#define CACHE_CTRL_IB (1 << 16)  /* Instruction burst fetch */
 +#define CACHE_CTRL_FI (1 << 21)  /* Flush Instruction cache (Write
 only)
 */
 +#define CACHE_CTRL_FD (1 << 22)  /* Flush Data cache (Write only) */
 +#define CACHE_CTRL_DS (1 << 23)  /* Data cache snoop enable */
 +
 +void leon3_cache_control_int(void)
 +{
 +    uint32_t state = 0;
 +
 +    if (leon3_state.cache_control & CACHE_CTRL_IF) {
 +        /* Instruction cache state */
 +        state = leon3_state.cache_control & 0x3;

 Please add a new define CACHE_CTRL_xxx to replace 0x3.


 Done.

 +        if (state == CACHE_ENABLED) {
 +            state = CACHE_FROZEN;
 +            DPRINTF(Instruction cache: freeze\n);
 +        

[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify

2010-12-13 Thread Michael S. Tsirkin
On Mon, Dec 13, 2010 at 05:57:28PM +, Stefan Hajnoczi wrote:
 On Mon, Dec 13, 2010 at 4:28 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
  On Mon, Dec 13, 2010 at 4:12 PM, Michael S. Tsirkin m...@redhat.com wrote:
  On Mon, Dec 13, 2010 at 03:27:06PM +, Stefan Hajnoczi wrote:
  On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin m...@redhat.com 
  wrote:
   On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote:
   On Mon, Dec 13, 2010 at 01:11:27PM +, Stefan Hajnoczi wrote:
Fresh results:
   
192.168.0.1 - host (runs netperf)
192.168.0.2 - guest (runs netserver)
   
host$ src/netperf -H 192.168.0.2 -- -m 200
   
ioeventfd=on
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
(192.168.0.2) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
 87380  16384    200    10.00    1759.25
   
ioeventfd=off
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
(192.168.0.2) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
   
 87380  16384    200    10.00    1757.15
   
The results vary approx +/- 3% between runs.
   
Invocation:
$ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img
   
I am running qemu.git with v5 patches, based off
36888c6335422f07bbc50bf3443a39f24b90c7c6.
   
Host:
1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
8 GB RAM
RHEL 6 host
   
Next I will try the patches on latest qemu-kvm.git
   
Stefan
  
   One interesting thing is that I put virtio-net earlier on
   command line.
  
   Sorry I mean I put it after disk, you put it before.
 
  I can't find a measurable difference when swapping -drive and -netdev.
 
  One other concern I have is that we are apparently using
  ioeventfd for all VQs. E.g. for virtio-net we probably should not
  use it for the control VQ - it's a waste of resources.
 
  One option is a per-device (block, net, etc) bitmap that masks out
  virtqueues.  Is that something you'd like to see?
 
  I'm tempted to mask out the RX vq too and see how that affects the
  qemu-kvm.git specific issue.
 
 As expected, the rx virtqueue is involved in the degradation.  I
 enabled ioeventfd only for the TX virtqueue and got the same good
 results as userspace virtio-net.
 
 When I enable only the rx virtqueue, performance decreases as we've seen above.
 
 Stefan

Interesting. In particular this implies something's wrong with the
queue: we should not normally be getting notifications from rx queue
at all. Is it running low on buffers? Does it help to increase the vq
size?  Any other explanation?

-- 
MST



[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate

2010-12-13 Thread Michael S. Tsirkin
On Mon, Dec 13, 2010 at 11:00:44AM -0700, Alex Williamson wrote:
 On Mon, 2010-12-13 at 19:50 +0200, Michael S. Tsirkin wrote:
  On Mon, Dec 13, 2010 at 10:43:22AM -0700, Alex Williamson wrote:
   So, unfortunately, I stand by my original patch.
  
  What about the one that put -1 in saved index for a hotplugged device?
 
 There are still examples that don't work even without hotplug (example 2
 and example 3 after the reboot).  That hack limits the damage, but still
 leaves a latent bug for reboot and doesn't address the non-hotplug
 scenarios.  So, I don't think it's worthwhile to pursue, and we
 shouldn't pretend we can use it to avoid bumping the version_id.
 Thanks,
 
 Alex

I guess when we bump it we tell users: migration is completely
borken to the old version, don't even try it.

Is there a way for libvirt to discover such incompatibilities
and avoid the migration?

-- 
MST



[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate

2010-12-13 Thread Alex Williamson
On Mon, 2010-12-13 at 20:54 +0200, Michael S. Tsirkin wrote:
 On Mon, Dec 13, 2010 at 11:00:44AM -0700, Alex Williamson wrote:
  On Mon, 2010-12-13 at 19:50 +0200, Michael S. Tsirkin wrote:
   On Mon, Dec 13, 2010 at 10:43:22AM -0700, Alex Williamson wrote:
So, unfortunately, I stand by my original patch.
   
   What about the one that put -1 in saved index for a hotplugged device?
  
  There are still examples that don't work even without hotplug (example 2
  and example 3 after the reboot).  That hack limits the damage, but still
  leaves a latent bug for reboot and doesn't address the non-hotplug
  scenarios.  So, I don't think it's worthwhile to pursue, and we
  shouldn't pretend we can use it to avoid bumping the version_id.
  Thanks,
  
  Alex
 
 I guess when we bump it we tell users: migration is completely
 borken to the old version, don't even try it.
 
 Is there a way for libvirt to discover such incompatibilities
 and avoid the migration?

I don't know if libvirt has a way to query this in advance.  If a
migration is attempted, the target will report:

savevm: unsupported version 5 for ':00:03.0/rtl8139' v4

And the source will continue running.  We waste plenty of bits getting
to that point, but hopefully libvirt understands that it failed.
Thanks,

Alex
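
For readers following along, the failure mode is roughly the check sketched below. This is a simplified stand-in, not the actual savevm code: DeviceSectionInfo and check_incoming_version() are hypothetical names, and the message format only approximates the one quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a device's migration description. */
typedef struct {
    const char *idstr;
    int version_id;         /* newest section version this binary loads */
    int minimum_version_id; /* oldest section version this binary loads */
} DeviceSectionInfo;

/* Reject a section whose version the target does not understand. */
static int check_incoming_version(const DeviceSectionInfo *dev, int incoming)
{
    if (incoming > dev->version_id || incoming < dev->minimum_version_id) {
        fprintf(stderr, "unsupported version %d for '%s' v%d\n",
                incoming, dev->idstr, dev->version_id);
        return -EINVAL;
    }
    return 0;
}
```

An old target that only knows v4 refuses a v5 section, so the incoming side fails while the source keeps running.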




[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate

2010-12-13 Thread Michael S. Tsirkin
On Mon, Dec 13, 2010 at 11:59:16AM -0700, Alex Williamson wrote:
 On Mon, 2010-12-13 at 20:54 +0200, Michael S. Tsirkin wrote:
  On Mon, Dec 13, 2010 at 11:00:44AM -0700, Alex Williamson wrote:
   On Mon, 2010-12-13 at 19:50 +0200, Michael S. Tsirkin wrote:
On Mon, Dec 13, 2010 at 10:43:22AM -0700, Alex Williamson wrote:
 So, unfortunately, I stand by my original patch.

What about the one that put -1 in saved index for a hotplugged device?
   
   There are still examples that don't work even without hotplug (example 2
   and example 3 after the reboot).  That hack limits the damage, but still
   leaves a latent bug for reboot and doesn't address the non-hotplug
   scenarios.  So, I don't think it's worthwhile to pursue, and we
   shouldn't pretend we can use it to avoid bumping the version_id.
   Thanks,
   
   Alex
  
  I guess when we bump it we tell users: migration is completely
  borken to the old version, don't even try it.
  
  Is there a way for libvirt to discover such incompatibilities
  and avoid the migration?
 
 I don't know if libvirt has a way to query this in advance.  If a
 migration is attempted, the target will report:
 
 savevm: unsupported version 5 for ':00:03.0/rtl8139' v4
 
 And the source will continue running.  We waste plenty of bits getting
 to that point,

Yes, this happens after all of memory has been migrated.

 but hopefully libvirt understands that it failed.
 Thanks,
 
 Alex



[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate

2010-12-13 Thread Alex Williamson
On Mon, 2010-12-13 at 21:06 +0200, Michael S. Tsirkin wrote:
 On Mon, Dec 13, 2010 at 11:59:16AM -0700, Alex Williamson wrote:
  On Mon, 2010-12-13 at 20:54 +0200, Michael S. Tsirkin wrote:
   On Mon, Dec 13, 2010 at 11:00:44AM -0700, Alex Williamson wrote:
On Mon, 2010-12-13 at 19:50 +0200, Michael S. Tsirkin wrote:
 On Mon, Dec 13, 2010 at 10:43:22AM -0700, Alex Williamson wrote:
  So, unfortunately, I stand by my original patch.
 
 What about the one that put -1 in saved index for a hotplugged device?

There are still examples that don't work even without hotplug (example 2
and example 3 after the reboot).  That hack limits the damage, but still
leaves a latent bug for reboot and doesn't address the non-hotplug
scenarios.  So, I don't think it's worthwhile to pursue, and we
shouldn't pretend we can use it to avoid bumping the version_id.
Thanks,

Alex
   
   I guess when we bump it we tell users: migration is completely
   borken to the old version, don't even try it.
   
   Is there a way for libvirt to discover such incompatibilities
   and avoid the migration?
  
  I don't know if libvirt has a way to query this in advance.  If a
  migration is attempted, the target will report:
  
  savevm: unsupported version 5 for ':00:03.0/rtl8139' v4
  
  And the source will continue running.  We waste plenty of bits getting
  to that point,
 
 Yes, this happens after all of memory has been migrated.

Better late than never :^\




Re: [Qemu-devel] Re: [PATCH] PCI: Bus number from the bridge, not the device

2010-12-13 Thread Alex Williamson
On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote:
 On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote:
  pcibus_dev_print() was erroneously retrieving the device bus
  number from the secondary bus number offset of the device
  instead of the bridge above the device.  This ends up landing
  in the 2nd byte of the 3rd BAR for devices, which thankfully
  is usually zero.  pcibus_get_dev_path() copied this code,
  inheriting the same bug.  pcibus_get_dev_path() is used for
  ramblock naming, so changing it can affect migration.  However,
  I've only seen this byte be non-zero for an assigned device,
  which can't migrate anyway, so hopefully we won't run into
  any issues.
  
  Signed-off-by: Alex Williamson alex.william...@redhat.com
 
 Good catch. Applied.

Um... submitted vs applied:

 PCI: Bus number from the bridge, not the device
 
@@ -6,20 +8,28 @@
 number from the secondary bus number offset of the device
 instead of the bridge above the device.  This ends of landing
 in the 2nd byte of the 3rd BAR for devices, which thankfully
-is usually zero.  pcibus_get_dev_path() copied this code,
+is usually zero.
+
+Note: pcibus_get_dev_path() copied this code,
 inheriting the same bug.  pcibus_get_dev_path() is used for
 ramblock naming, so changing it can effect migration.  However,
 I've only seen this byte be non-zero for an assigned device,
 which can't migrate anyway, so hopefully we won't run into
 any issues.
 
+This patch does not touch pcibus_get_dev_path, as
+bus number is guest assigned for nested buses,
+so using it for migration is broken anyway.
+Fix it properly later.
+
 Signed-off-by: Alex Williamson alex.william...@redhat.com
+Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
 diff --git a/hw/pci.c b/hw/pci.c
-index 6d0934d..15416dd 100644
+index 962886e..8f6fcf8 100644
 --- a/hw/pci.c
 +++ b/hw/pci.c
-@@ -1940,8 +1940,7 @@ static void pcibus_dev_print(Monitor *mon, DeviceState 
*dev, int indent)
+@@ -1806,8 +1806,7 @@ static void pcibus_dev_print(Monitor *mon, DeviceState 
*dev, int indent)
  
  monitor_printf(mon, "%*sclass %s, addr %02x:%02x.%x, "
 "pci id %04x:%04x (sub %04x:%04x)\n",
@@ -29,14 +39,3 @@
 PCI_SLOT(d->devfn), PCI_FUNC(d->devfn),
 pci_get_word(d->config + PCI_VENDOR_ID),
 pci_get_word(d->config + PCI_DEVICE_ID),
-@@ -1965,7 +1964,7 @@ static char *pcibus_get_dev_path(DeviceState *dev)
- char path[16];
- 
- snprintf(path, sizeof(path), "%04x:%02x:%02x.%x",
-- pci_find_domain(d->bus), d->config[PCI_SECONDARY_BUS],
-+ pci_find_domain(d->bus), pci_bus_num(d->bus),
-  PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
- 
- return strdup(path);
-
-

So the chunk that fixed the part that I was actually interested in got
dropped even though the existing code is clearly wrong.  Yes, we still
have issues with nested bridges (not that we have many of those), but
until the "Fix it properly later" part comes along, can we please
include the obvious bug fix?  Thanks,

Alex
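
To make the "2nd byte of the 3rd BAR" remark concrete, here is a small sketch of the offset arithmetic. The register offsets are the standard PCI config space ones; the two helper functions are made up for illustration.

```c
#include <assert.h>

#define PCI_BASE_ADDRESS_0 0x10 /* first BAR in a type 0 (device) header */
#define PCI_SECONDARY_BUS  0x19 /* only meaningful in a type 1 (bridge) header */
#define PCI_BAR_SIZE       4    /* each BAR is one 32-bit dword */
#define PCI_NUM_BARS       6

/* Zero-based BAR index that a config offset falls into, or -1. */
static int bar_index_for_offset(unsigned offset)
{
    if (offset < PCI_BASE_ADDRESS_0 ||
        offset >= PCI_BASE_ADDRESS_0 + PCI_NUM_BARS * PCI_BAR_SIZE) {
        return -1;
    }
    return (int)((offset - PCI_BASE_ADDRESS_0) / PCI_BAR_SIZE);
}

/* Zero-based byte position of the offset inside its BAR dword. */
static int byte_within_bar(unsigned offset)
{
    return (int)((offset - PCI_BASE_ADDRESS_0) % PCI_BAR_SIZE);
}
```

Offset 0x19 gives BAR index 2 and byte 1, i.e. the second byte of the third BAR, so reading PCI_SECONDARY_BUS from a non-bridge device returns whatever address bits happen to sit there.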




[Qemu-devel] [RESEND PATCH v3 0/2] Minimal RAM API support

2010-12-13 Thread Alex Williamson
No comments since v3, please apply.  Thanks,

Alex

v3:

 - Address review comments
 - pc registers all memory below 4G in one chunk

Let me know if there are any further issues.

v2:

 - Move to Makefile.objs
 - Move structures to memory.c and create a callback function
 - Fix memory leak

I haven't moved to the state parameter because there should only
be a single instance of this per VM.  The state parameter seems
like it would add complications in setup and function calling, but
maybe point me to an example if I'm off base.

v1:

For VFIO based device assignment, we need to know what guest memory
areas are actual RAM.  RAMBlocks have long since become a grab bag
of misc allocations, so aren't effective for this.  Anthony has had
a RAM API in mind for a while now that addresses this problem.  This
implements just enough of it so that we have an interface to get
actual guest memory physical addresses to setup the host IOMMU.  We
can continue building a full RAM API on top of this stub.

Anthony, feel free to add copyright to memory.c as it's based on
your initial implementation.  I had to add something since the file
in your branch just copies a header with Fabrice's copyright.

---

Alex Williamson (2):
  RAM API: Make use of it for x86 PC
  Minimal RAM API support


 Makefile.objs |1 +
 cpu-common.h  |2 +
 hw/pc.c   |9 ++---
 memory.c  |   97 +
 memory.h  |   44 ++
 5 files changed, 147 insertions(+), 6 deletions(-)
 create mode 100644 memory.c
 create mode 100644 memory.h
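
As a rough picture of how a consumer such as an IOMMU mapper might use the iteration interface, here is a self-contained sketch. It is not the patch itself: the types are plain uint64_t stand-ins for target_phys_addr_t and ram_addr_t, storage is a fixed array instead of a QLIST, overlap is reported with an error return rather than hw_error(), and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*ram_for_each_slot_fn)(void *opaque, uint64_t start_addr,
                                    uint64_t size, uint64_t phys_offset);

typedef struct {
    uint64_t start_addr;
    uint64_t size;
    uint64_t offset;
} Slot;

#define MAX_SLOTS 8
static Slot slots[MAX_SLOTS];
static size_t nr_slots;

/* Half-open interval overlap test for [a, a+alen) and [b, b+blen). */
static int overlaps(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
    return a < b + blen && b < a + alen;
}

/* Register a RAM region; refuse empty or overlapping regions. */
static int ram_register(uint64_t start_addr, uint64_t size, uint64_t offset)
{
    size_t i;

    if (!size || nr_slots == MAX_SLOTS) {
        return -1;
    }
    for (i = 0; i < nr_slots; i++) {
        if (overlaps(start_addr, size, slots[i].start_addr, slots[i].size)) {
            return -1;
        }
    }
    slots[nr_slots].start_addr = start_addr;
    slots[nr_slots].size = size;
    slots[nr_slots].offset = offset;
    nr_slots++;
    return 0;
}

/* Call fn() on each registered region; stop on non-zero return. */
static int ram_for_each_slot(void *opaque, ram_for_each_slot_fn fn)
{
    size_t i;

    for (i = 0; i < nr_slots; i++) {
        int ret = fn(opaque, slots[i].start_addr, slots[i].size,
                     slots[i].offset);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* Example callback: add up the size of all guest RAM regions. */
static int count_bytes(void *opaque, uint64_t start_addr, uint64_t size,
                       uint64_t phys_offset)
{
    (void)start_addr;
    (void)phys_offset;
    *(uint64_t *)opaque += size;
    return 0;
}
```

A real consumer would replace count_bytes() with a callback that programs the host IOMMU for each region.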



[Qemu-devel] [RESEND PATCH v3 1/2] Minimal RAM API support

2010-12-13 Thread Alex Williamson
This adds a minimum chunk of Anthony's RAM API support so that we
can identify actual VM RAM versus all the other things that make
use of qemu_ram_alloc.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 Makefile.objs |1 +
 cpu-common.h  |2 +
 memory.c  |   97 +
 memory.h  |   44 ++
 4 files changed, 144 insertions(+), 0 deletions(-)
 create mode 100644 memory.c
 create mode 100644 memory.h

diff --git a/Makefile.objs b/Makefile.objs
index cebb945..47f3c3a 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -172,6 +172,7 @@ hw-obj-y += pci.o pci_bridge.o msix.o msi.o
 hw-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o
 hw-obj-$(CONFIG_PCI) += ioh3420.o xio3130_upstream.o xio3130_downstream.o
 hw-obj-y += watchdog.o
+hw-obj-y += memory.o
 hw-obj-$(CONFIG_ISA_MMIO) += isa_mmio.o
 hw-obj-$(CONFIG_ECC) += ecc.o
 hw-obj-$(CONFIG_NAND) += nand.o
diff --git a/cpu-common.h b/cpu-common.h
index 6d4a898..f08f93b 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -29,6 +29,8 @@ enum device_endian {
 /* address in the RAM (different from a physical address) */
 typedef unsigned long ram_addr_t;
 
+#include "memory.h"
+
 /* memory API */
 
 typedef void CPUWriteMemoryFunc(void *opaque, target_phys_addr_t addr, 
uint32_t value);
diff --git a/memory.c b/memory.c
new file mode 100644
index 000..742776f
--- /dev/null
+++ b/memory.c
@@ -0,0 +1,97 @@
+/*
+ * RAM API
+ *
+ *  Copyright Red Hat, Inc. 2010
+ *
+ * Authors:
+ *  Alex Williamson alex.william...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+#include "memory.h"
+#include "range.h"
+
+typedef struct ram_slot {
+    target_phys_addr_t start_addr;
+    ram_addr_t size;
+    ram_addr_t offset;
+    QLIST_ENTRY(ram_slot) next;
+} ram_slot;
+
+static QLIST_HEAD(ram_slots, ram_slot) ram_slots =
+    QLIST_HEAD_INITIALIZER(ram_slots);
+
+static ram_slot *qemu_ram_find_slot(target_phys_addr_t start_addr,
+                                   ram_addr_t size)
+{
+    ram_slot *slot;
+
+    QLIST_FOREACH(slot, &ram_slots, next) {
+        if (slot->start_addr == start_addr && slot->size == size) {
+            return slot;
+        }
+
+        if (ranges_overlap(start_addr, size, slot->start_addr, slot->size)) {
+            hw_error("Ram range overlaps existing slot\n");
+        }
+    }
+
+    return NULL;
+}
+
+int qemu_ram_register(target_phys_addr_t start_addr, ram_addr_t size,
+                      ram_addr_t phys_offset)
+{
+    ram_slot *slot;
+
+    if (!size) {
+        return -EINVAL;
+    }
+
+    assert(!qemu_ram_find_slot(start_addr, size));
+
+    slot = qemu_mallocz(sizeof(ram_slot));
+
+    slot->start_addr = start_addr;
+    slot->size = size;
+    slot->offset = phys_offset;
+
+    QLIST_INSERT_HEAD(&ram_slots, slot, next);
+
+    cpu_register_physical_memory(slot->start_addr, slot->size, slot->offset);
+
+    return 0;
+}
+
+void qemu_ram_unregister(target_phys_addr_t start_addr, ram_addr_t size)
+{
+    ram_slot *slot;
+
+    if (!size) {
+        return;
+    }
+
+    slot = qemu_ram_find_slot(start_addr, size);
+    assert(slot != NULL);
+
+    QLIST_REMOVE(slot, next);
+    qemu_free(slot);
+    cpu_register_physical_memory(start_addr, size, IO_MEM_UNASSIGNED);
+
+    return;
+}
+
+int qemu_ram_for_each_slot(void *opaque, qemu_ram_for_each_slot_fn fn)
+{
+    ram_slot *slot;
+
+    QLIST_FOREACH(slot, &ram_slots, next) {
+        int ret = fn(opaque, slot->start_addr, slot->size, slot->offset);
+        if (ret) {
+            return ret;
+        }
+    }
+    return 0;
+}
diff --git a/memory.h b/memory.h
new file mode 100644
index 000..e7aa5cb
--- /dev/null
+++ b/memory.h
@@ -0,0 +1,44 @@
+#ifndef QEMU_MEMORY_H
+#define QEMU_MEMORY_H
+/*
+ * RAM API
+ *
+ *  Copyright Red Hat, Inc. 2010
+ *
+ * Authors:
+ *  Alex Williamson alex.william...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "cpu-common.h"
+
+typedef int (*qemu_ram_for_each_slot_fn)(void *opaque,
+                                         target_phys_addr_t start_addr,
+                                         ram_addr_t size,
+                                         ram_addr_t phys_offset);
+
+/**
+ * qemu_ram_register() : Register a region of guest physical memory
+ *
+ * The new region must not overlap an existing region.
+ */
+int qemu_ram_register(target_phys_addr_t start_addr, ram_addr_t size,
+                      ram_addr_t phys_offset);
+
+/**
+ * qemu_ram_unregister() : Unregister a region of guest physical memory
+ */
+void qemu_ram_unregister(target_phys_addr_t start_addr, ram_addr_t size);
+
+/**
+ * qemu_ram_for_each_slot() : Call fn() on each registered region
+ *
+ * Stop on non-zero return from fn().
+ */
+int qemu_ram_for_each_slot(void *opaque, 

[Qemu-devel] [RESEND PATCH v3 2/2] RAM API: Make use of it for x86 PC

2010-12-13 Thread Alex Williamson
Register the actual VM RAM using the new API

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 hw/pc.c |9 +++--
 1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index e1b2667..1554164 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -913,14 +913,11 @@ void pc_memory_init(ram_addr_t ram_size,
     /* allocate RAM */
     ram_addr = qemu_ram_alloc(NULL, "pc.ram",
                               below_4g_mem_size + above_4g_mem_size);
-    cpu_register_physical_memory(0, 0xa, ram_addr);
-    cpu_register_physical_memory(0x10,
-                 below_4g_mem_size - 0x10,
-                 ram_addr + 0x10);
+    qemu_ram_register(0, below_4g_mem_size, ram_addr);
 #if TARGET_PHYS_ADDR_BITS > 32
     if (above_4g_mem_size > 0) {
-        cpu_register_physical_memory(0x1ULL, above_4g_mem_size,
-                                     ram_addr + below_4g_mem_size);
+        qemu_ram_register(0x1ULL, above_4g_mem_size,
+                          ram_addr + below_4g_mem_size);
     }
 #endif
 




Re: [Qemu-devel] [PATCH] libiscsi

2010-12-13 Thread Blue Swirl
On Mon, Dec 13, 2010 at 8:05 AM, Ronnie Sahlberg
ronniesahlb...@gmail.com wrote:
 This patch adds a new block driver : block.iscsi.c
 This driver interfaces with the multiplatform posix library
 for iscsi initiator/client access to iscsi devices hosted at
 git://github.com/sahlberg/libiscsi.git

 The patch adds the driver to interface with the iscsi library.
 It also updated the configure script to
 * by default, probe is libiscsi is available and if so, build
  qemu against libiscsi.
 * --enable-libiscsi
  Force a build against libiscsi. If libiscsi is not available
  the build will fail.
 * --disable-libiscsi
  Do not link against libiscsi, even if it is available.

 When linked with libiscsi, qemu gains support to access iscsi resources
 such as disks and cdrom directly, without having to make the devices visible
 to the host.

 You can specify devices using a iscsi url of the form :
 iscsi://host[:port]/target-iqn-name/lun

 Example:
 -drive file=iscsi://10.1.1.1:3260/iqn.ronnie.test/1

 -cdrom iscsi://10.1.1.1:3260/iqn.ronnie.test/2

 Signed-off-by: Ronnie Sahlberg ronniesahlb...@gmail.com
 ---
  Makefile.objs |    2 +-
  block/iscsi.c |  528 
 +
  configure     |   29 +++
  3 files changed, 558 insertions(+), 1 deletions(-)
  create mode 100644 block/iscsi.c

 diff --git a/Makefile.objs b/Makefile.objs
 index cebb945..81731c5 100644
 --- a/Makefile.objs
 +++ b/Makefile.objs
 @@ -22,7 +22,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o 
 dmg.o bochs.o vpc.o vv
  block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
  block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
  block-nested-$(CONFIG_WIN32) += raw-win32.o
 -block-nested-$(CONFIG_POSIX) += raw-posix.o
 +block-nested-$(CONFIG_POSIX) += raw-posix.o iscsi.o

Please use CONFIG_ISCSI...

  block-nested-$(CONFIG_CURL) += curl.o

  block-obj-y +=  $(addprefix block/, $(block-nested-y))
 diff --git a/block/iscsi.c b/block/iscsi.c
 new file mode 100644
 index 000..fba5ee6
 --- /dev/null
 +++ b/block/iscsi.c
 @@ -0,0 +1,528 @@
 +/*
 + * QEMU Block driver for iSCSI images
 + *
 + * Copyright (c) 2010 Ronnie Sahlberg ronniesahlb...@gmail.com
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a 
 copy
 + * of this software and associated documentation files (the "Software"), to 
 deal
 + * in the Software without restriction, including without limitation the 
 rights
 + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 + * copies of the Software, and to permit persons to whom the Software is
 + * furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice shall be included in
 + * all copies or substantial portions of the Software.
 + *
 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
 FROM,
 + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 + * THE SOFTWARE.
 + */
 +
 +#include "config-host.h"
 +#ifdef CONFIG_LIBISCSI

... then this is not needed.

 +
 +#include <poll.h>
 +#include "sysemu.h"
 +#include "qemu-common.h"
 +#include "qemu-error.h"
 +#include "block_int.h"
 +
 +#include <iscsi/iscsi.h>
 +#include <iscsi/scsi-lowlevel.h>
 +
 +
 +typedef struct ISCSILUN {
 +    struct iscsi_context *iscsi;
 +    int lun;
 +    int block_size;
 +    unsigned long num_blocks;
 +} ISCSILUN;
 +
 +typedef struct ISCSIAIOCB {
 +    BlockDriverAIOCB common;
 +    QEMUIOVector *qiov;
 +    QEMUBH *bh;
 +    ISCSILUN *iscsilun;
 +    int canceled;
 +    int status;
 +    size_t read_size;
 +} ISCSIAIOCB;
 +
 +struct iscsi_task {
 +    ISCSILUN *iscsilun;
 +    int status;
 +    int complete;
 +};

Please see CODING_STYLE for struct naming and use of typedefs.

 +
 +static int
 +iscsi_is_inserted(BlockDriverState *bs)
 +{
 +    ISCSILUN *iscsilun = bs->opaque;
 +    struct iscsi_context *iscsi = iscsilun->iscsi;
 +
 +    return iscsi_is_logged_in(iscsi);
 +}
 +
 +
 +static void
 +iscsi_aio_cancel(BlockDriverAIOCB *blockacb)
 +{
 +    ISCSIAIOCB *acb = (ISCSIAIOCB *)blockacb;
 +
 +    acb->status = -EIO;
 +    acb->common.cb(acb->common.opaque, acb->status);
 +    acb->canceled = 1;
 +}
 +
 +static AIOPool iscsi_aio_pool = {
 +    .aiocb_size         = sizeof(ISCSIAIOCB),
 +    .cancel             = iscsi_aio_cancel,
 +};
 +
 +
 +static void iscsi_process_read(void *arg);
 +static void iscsi_process_write(void *arg);
 +
 +static void
 +iscsi_set_events(ISCSILUN *iscsilun)
 +{
 +    struct iscsi_context *iscsi = iscsilun->iscsi;
 +
 +    qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), iscsi_process_read,
 +    

Re: [Qemu-devel] [RESEND PATCH v3 1/2] Minimal RAM API support

2010-12-13 Thread Blue Swirl
On Mon, Dec 13, 2010 at 8:47 PM, Alex Williamson
alex.william...@redhat.com wrote:
 This adds a minimum chunk of Anthony's RAM API support so that we
 can identify actual VM RAM versus all the other things that make
 use of qemu_ram_alloc.

 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---

  Makefile.objs |    1 +
  cpu-common.h  |    2 +
  memory.c      |   97 
 +
  memory.h      |   44 ++
  4 files changed, 144 insertions(+), 0 deletions(-)
  create mode 100644 memory.c
  create mode 100644 memory.h

 diff --git a/Makefile.objs b/Makefile.objs
 index cebb945..47f3c3a 100644
 --- a/Makefile.objs
 +++ b/Makefile.objs
 @@ -172,6 +172,7 @@ hw-obj-y += pci.o pci_bridge.o msix.o msi.o
  hw-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o
  hw-obj-$(CONFIG_PCI) += ioh3420.o xio3130_upstream.o xio3130_downstream.o
  hw-obj-y += watchdog.o
 +hw-obj-y += memory.o
  hw-obj-$(CONFIG_ISA_MMIO) += isa_mmio.o
  hw-obj-$(CONFIG_ECC) += ecc.o
  hw-obj-$(CONFIG_NAND) += nand.o
 diff --git a/cpu-common.h b/cpu-common.h
 index 6d4a898..f08f93b 100644
 --- a/cpu-common.h
 +++ b/cpu-common.h
 @@ -29,6 +29,8 @@ enum device_endian {
  /* address in the RAM (different from a physical address) */
  typedef unsigned long ram_addr_t;

 +#include memory.h
 +
  /* memory API */

  typedef void CPUWriteMemoryFunc(void *opaque, target_phys_addr_t addr, 
 uint32_t value);
 diff --git a/memory.c b/memory.c
 new file mode 100644
 index 000..742776f
 --- /dev/null
 +++ b/memory.c
 @@ -0,0 +1,97 @@
 +/*
 + * RAM API
 + *
 + *  Copyright Red Hat, Inc. 2010
 + *
 + * Authors:
 + *  Alex Williamson alex.william...@redhat.com
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2.  See
 + * the COPYING file in the top-level directory.
 + *
 + */
 +#include "memory.h"
 +#include "range.h"
 +
 +typedef struct ram_slot {
 +    target_phys_addr_t start_addr;
 +    ram_addr_t size;
 +    ram_addr_t offset;
 +    QLIST_ENTRY(ram_slot) next;
 +} ram_slot;

Please see CODING_STYLE for structure naming.

 +
 +static QLIST_HEAD(ram_slots, ram_slot) ram_slots =
 +    QLIST_HEAD_INITIALIZER(ram_slots);
 +
 +static ram_slot *qemu_ram_find_slot(target_phys_addr_t start_addr,
 +                                   ram_addr_t size)
 +{
 +    ram_slot *slot;
 +
 +    QLIST_FOREACH(slot, &ram_slots, next) {
 +        if (slot->start_addr == start_addr && slot->size == size) {
 +            return slot;
 +        }
 +
 +        if (ranges_overlap(start_addr, size, slot->start_addr, slot->size)) {
 +            hw_error("Ram range overlaps existing slot\n");
 +        }
 +    }
 +
 +    return NULL;
 +}
 +
 +int qemu_ram_register(target_phys_addr_t start_addr, ram_addr_t size,
 +                      ram_addr_t phys_offset)
 +{
 +    ram_slot *slot;
 +
 +    if (!size) {
 +        return -EINVAL;
 +    }
 +
 +    assert(!qemu_ram_find_slot(start_addr, size));
 +
 +    slot = qemu_mallocz(sizeof(ram_slot));

Since you initialize every field by hand later, this could be qemu_malloc().

 +
 +    slot->start_addr = start_addr;
 +    slot->size = size;
 +    slot->offset = phys_offset;
 +
 +    QLIST_INSERT_HEAD(&ram_slots, slot, next);
 +
 +    cpu_register_physical_memory(slot->start_addr, slot->size, slot->offset);
 +
 +    return 0;
 +}
 +
 +void qemu_ram_unregister(target_phys_addr_t start_addr, ram_addr_t size)
 +{
 +    ram_slot *slot;
 +
 +    if (!size) {
 +        return;
 +    }
 +
 +    slot = qemu_ram_find_slot(start_addr, size);
 +    assert(slot != NULL);
 +
 +    QLIST_REMOVE(slot, next);
 +    qemu_free(slot);
 +    cpu_register_physical_memory(start_addr, size, IO_MEM_UNASSIGNED);
 +
 +    return;

Useless.



[Qemu-devel] [RESEND PATCH] exec: Implement qemu_ram_free_from_ptr()

2010-12-13 Thread Alex Williamson
Required for regions mapped via qemu_ram_alloc_from_ptr().  VFIO
and ivshmem will make use of this to remove mappings when devices
are hot unplugged.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

No comments on original patch.  Obvious missing function.  Cam has since
requested the same function for ivshmem.

 cpu-common.h |1 +
 exec.c   |   13 +
 2 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/cpu-common.h b/cpu-common.h
index 6d4a898..9b763d0 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -49,6 +49,7 @@ ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t 
addr);
 ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
 ram_addr_t size, void *host);
 ram_addr_t qemu_ram_alloc(DeviceState *dev, const char *name, ram_addr_t size);
+void qemu_ram_free_from_ptr(ram_addr_t addr);
 void qemu_ram_free(ram_addr_t addr);
 /* This should only be used for ram local to a device.  */
 void *qemu_get_ram_ptr(ram_addr_t addr);
diff --git a/exec.c b/exec.c
index a338495..eea7ea7 100644
--- a/exec.c
+++ b/exec.c
@@ -2875,6 +2875,19 @@ ram_addr_t qemu_ram_alloc(DeviceState *dev, const char 
*name, ram_addr_t size)
 return qemu_ram_alloc_from_ptr(dev, name, size, NULL);
 }
 
+void qemu_ram_free_from_ptr(ram_addr_t addr)
+{
+    RAMBlock *block;
+
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        if (addr == block->offset) {
+            QLIST_REMOVE(block, next);
+            qemu_free(block);
+            return;
+        }
+    }
+}
+
 void qemu_ram_free(ram_addr_t addr)
 {
 RAMBlock *block;
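
The point of the new function, as far as the patch shows, is that only the bookkeeping entry is dropped while the host memory handed in by the caller stays untouched. A standalone sketch of that pattern follows, using a plain singly linked list and hypothetical names (Block, block_add_from_ptr, block_free_from_ptr) rather than QEMU's RAMBlock and QLIST macros.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Block {
    unsigned long offset; /* key used to look the block up */
    void *host;           /* caller-owned memory, never freed here */
    struct Block *next;
} Block;

static Block *blocks;

/* Track caller-provided memory under the given offset. */
static Block *block_add_from_ptr(unsigned long offset, void *host)
{
    Block *b = malloc(sizeof(*b));

    if (!b) {
        return NULL;
    }
    b->offset = offset;
    b->host = host;
    b->next = blocks;
    blocks = b;
    return b;
}

/* Drop the tracking entry only; returns 1 if found, 0 otherwise. */
static int block_free_from_ptr(unsigned long offset)
{
    Block **pp;

    for (pp = &blocks; *pp; pp = &(*pp)->next) {
        if ((*pp)->offset == offset) {
            Block *b = *pp;

            *pp = b->next;
            free(b); /* b->host is left alone on purpose */
            return 1;
        }
    }
    return 0;
}
```

The caller that supplied the pointer remains responsible for unmapping or freeing the underlying memory after the entry is removed.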




[Qemu-devel] Re: [RESEND PATCH v3 1/2] Minimal RAM API support

2010-12-13 Thread Anthony Liguori

On 12/13/2010 02:47 PM, Alex Williamson wrote:

This adds a minimum chunk of Anthony's RAM API support so that we
can identify actual VM RAM versus all the other things that make
use of qemu_ram_alloc.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

  Makefile.objs |1 +
  cpu-common.h  |2 +
  memory.c  |   97 +
  memory.h  |   44 ++
  4 files changed, 144 insertions(+), 0 deletions(-)
  create mode 100644 memory.c
  create mode 100644 memory.h

diff --git a/Makefile.objs b/Makefile.objs
index cebb945..47f3c3a 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -172,6 +172,7 @@ hw-obj-y += pci.o pci_bridge.o msix.o msi.o
  hw-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o
  hw-obj-$(CONFIG_PCI) += ioh3420.o xio3130_upstream.o xio3130_downstream.o
  hw-obj-y += watchdog.o
+hw-obj-y += memory.o
  hw-obj-$(CONFIG_ISA_MMIO) += isa_mmio.o
  hw-obj-$(CONFIG_ECC) += ecc.o
  hw-obj-$(CONFIG_NAND) += nand.o
diff --git a/cpu-common.h b/cpu-common.h
index 6d4a898..f08f93b 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -29,6 +29,8 @@ enum device_endian {
  /* address in the RAM (different from a physical address) */
  typedef unsigned long ram_addr_t;

+#include memory.h
+
  /* memory API */

  typedef void CPUWriteMemoryFunc(void *opaque, target_phys_addr_t addr, 
uint32_t value);
diff --git a/memory.c b/memory.c
new file mode 100644
index 000..742776f
--- /dev/null
+++ b/memory.c
@@ -0,0 +1,97 @@
+/*
+ * RAM API
+ *
+ *  Copyright Red Hat, Inc. 2010
+ *
+ * Authors:
+ *  Alex Williamsonalex.william...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+#include "memory.h"
+#include "range.h"
+
+typedef struct ram_slot {
+target_phys_addr_t start_addr;
+ram_addr_t size;
+ram_addr_t offset;
+QLIST_ENTRY(ram_slot) next;
+} ram_slot;
+
+static QLIST_HEAD(ram_slots, ram_slot) ram_slots =
+QLIST_HEAD_INITIALIZER(ram_slots);
+
+static ram_slot *qemu_ram_find_slot(target_phys_addr_t start_addr,
+   ram_addr_t size)
+{
+ram_slot *slot;
+
+QLIST_FOREACH(slot, &ram_slots, next) {
+if (slot->start_addr == start_addr && slot->size == size) {
+return slot;
+}
+
+if (ranges_overlap(start_addr, size, slot->start_addr, slot->size)) {
+hw_error("Ram range overlaps existing slot\n");
+}
+}
+
+return NULL;
+}

   


CODING_STYLE.  RamSlot and drop the qemu_ prefix.


+int qemu_ram_register(target_phys_addr_t start_addr, ram_addr_t size,
+  ram_addr_t phys_offset)
+{
+ram_slot *slot;
+
+if (!size) {
+return -EINVAL;
+}
+
+assert(!qemu_ram_find_slot(start_addr, size));
+
+slot = qemu_mallocz(sizeof(ram_slot));
+
+slot->start_addr = start_addr;
+slot->size = size;
+slot->offset = phys_offset;
+
+QLIST_INSERT_HEAD(&ram_slots, slot, next);
+
+cpu_register_physical_memory(slot->start_addr, slot->size, slot->offset);
+
+return 0;
+}
+
+void qemu_ram_unregister(target_phys_addr_t start_addr, ram_addr_t size)
+{
+ram_slot *slot;
+
+if (!size) {
+return;
+}
+
+slot = qemu_ram_find_slot(start_addr, size);
+assert(slot != NULL);
+
+QLIST_REMOVE(slot, next);
+qemu_free(slot);
+cpu_register_physical_memory(start_addr, size, IO_MEM_UNASSIGNED);
+
+return;
+}
+
+int qemu_ram_for_each_slot(void *opaque, qemu_ram_for_each_slot_fn fn)
+{
+ram_slot *slot;
+
+QLIST_FOREACH(slot, &ram_slots, next) {
+int ret = fn(opaque, slot->start_addr, slot->size, slot->offset);
+if (ret) {
+return ret;
+}
+}
+return 0;
+}
diff --git a/memory.h b/memory.h
new file mode 100644
index 000..e7aa5cb
--- /dev/null
+++ b/memory.h
@@ -0,0 +1,44 @@
+#ifndef QEMU_MEMORY_H
+#define QEMU_MEMORY_H
+/*
+ * RAM API
+ *
+ *  Copyright Red Hat, Inc. 2010
+ *
+ * Authors:
+ *  Alex Williamsonalex.william...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "cpu-common.h"
+
+typedef int (*qemu_ram_for_each_slot_fn)(void *opaque,
+ target_phys_addr_t start_addr,
+ ram_addr_t size,
+ ram_addr_t phys_offset);
+
+/**
+ * qemu_ram_register() : Register a region of guest physical memory
+ *
+ * The new region must not overlap an existing region.
+ */
+int qemu_ram_register(target_phys_addr_t start_addr, ram_addr_t size,
+  ram_addr_t phys_offset);
+
+/**
+ * qemu_ram_unregister() : Unregister a region of guest physical memory
+ */
+void qemu_ram_unregister(target_phys_addr_t start_addr, ram_addr_t size);
+
+/**
+ * qemu_ram_for_each_slot() : Call fn() on each 

[Qemu-devel] [PATCH v4 0/2] Minimal RAM API support

2010-12-13 Thread Alex Williamson
Update per comments, Thanks,

Alex

v4:

 - ram_slot -> RamSlot (per CODING_STYLE)
 - drop qemu_ prefix from functions (per CODING_STYLE)
 - mallocz -> malloc
 - drop extraneous return from void function

v3:

 - Address review comments
 - pc registers all memory below 4G in one chunk

Let me know if there are any further issues.

v2:

 - Move to Makefile.objs
 - Move structures to memory.c and create a callback function
 - Fix memory leak

I haven't moved to the state parameter because there should only
be a single instance of this per VM.  The state parameter seems
like it would add complications in setup and function calling, but
maybe point me to an example if I'm off base.

v1:

For VFIO based device assignment, we need to know what guest memory
areas are actual RAM.  RAMBlocks have long since become a grab bag
of misc allocations, so aren't effective for this.  Anthony has had
a RAM API in mind for a while now that addresses this problem.  This
implements just enough of it so that we have an interface to get
actual guest memory physical addresses to setup the host IOMMU.  We
can continue building a full RAM API on top of this stub.

Anthony, feel free to add copyright to memory.c as it's based on
your initial implementation.  I had to add something since the file
in your branch just copies a header with Fabrice's copyright.

---

Alex Williamson (2):
  RAM API: Make use of it for x86 PC
  Minimal RAM API support


 Makefile.objs |1 +
 cpu-common.h  |2 +
 hw/pc.c   |9 ++---
 memory.c  |   94 +
 memory.h  |   44 +++
 5 files changed, 144 insertions(+), 6 deletions(-)
 create mode 100644 memory.c
 create mode 100644 memory.h



[Qemu-devel] [PATCH] RFC: delay pci_update_mappings for 64-bit BARs

2010-12-13 Thread Cam Macdonell
Do not call pci_update_mappings on the lower 32-bits of a 64-bit bar.  Wait for 
the upper 32 or else Qemu will try to map on just the lower 32 which is 
probably going to corrupt memory.

I was encountering crashes when mapping certain PCI region sizes.  The problem
turns out to be that pci_update_mappings is called before all 64 bits of the
BAR have been written.  For example, when mapping to 0x18000, the remapping
happened as soon as the lower 32 bits were written (mapping to 0x800), which
would overwrite something.

I'm not certain this is completely correct: I'm simply testing whether the
lower 4 bits equal the MEM_TYPE_64 flag.  The upper 32-bit address part can
take values like 0xff, which makes it tricky to test against.
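
The failure mode described above can be sketched in isolation.  This is a
minimal illustration, not QEMU code: the bar64_* names are hypothetical, the
address 0x180000000 is an arbitrary example above 4G, and unlike the patch
below the sketch masks only the two memory-type bits rather than the whole
low nibble (an assumption made so that prefetchable BARs are also covered).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Low four bits of a memory BAR are flags, not address bits. */
#define BAR_FLAG_MASK    0xfu
#define BAR_TYPE_MASK    0x6u   /* bits 2:1 select the memory type */
#define BAR_MEM_TYPE_64  0x4u   /* same value as PCI_BASE_ADDRESS_MEM_TYPE_64 */

/* Hypothetical model of one 64-bit BAR spanning two 32-bit registers. */
struct bar64_state {
    uint32_t low;          /* lower register, including flag bits */
    uint32_t high;         /* upper register */
    bool high_pending;     /* low half written, high half still stale */
};

/* Guest writes the lower register; returns true if remapping is safe now. */
static bool bar64_write_low(struct bar64_state *b, uint32_t val)
{
    assert(b != NULL);
    b->low = val;
    b->high_pending = (val & BAR_TYPE_MASK) == BAR_MEM_TYPE_64;
    return !b->high_pending;
}

/* Guest writes the upper register; the full address is now known. */
static bool bar64_write_high(struct bar64_state *b, uint32_t val)
{
    assert(b != NULL);
    b->high = val;
    b->high_pending = false;
    return true;
}

/* Address the BAR decodes to, given whatever has been written so far. */
static uint64_t bar64_address(const struct bar64_state *b)
{
    return ((uint64_t)b->high << 32) | (b->low & ~(uint32_t)BAR_FLAG_MASK);
}
```

Remapping after bar64_write_low() alone would use the stale upper half, which
is the situation the patch tries to avoid.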

Cam
---
 hw/pci.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 438c0d1..3b81792 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1000,6 +1000,9 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
 {
 int i, was_irq_disabled = pci_irq_disabled(d);
 uint32_t config_size = pci_config_size(d);
+int is_64 = 0;
+
+is_64 = ((val & 0xf) == PCI_BASE_ADDRESS_MEM_TYPE_64);
 
 for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) {
 uint8_t wmask = d->wmask[addr + i];
@@ -1008,7 +1011,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
 d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
 d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
 }
-if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
+if ((ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) && (!is_64)) ||
 ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
 ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
 range_covers_byte(addr, l, PCI_COMMAND))
-- 
1.7.0.4




[Qemu-devel] KVM call agenda for Dec 14

2010-12-13 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris



[Qemu-devel] [PATCH 04/11] ide: move transfer_start after variable modification

2010-12-13 Thread Alexander Graf
We hook into transfer_start and immediately call the end function
for ahci. This means that everything needs to be in place for the
end function when we start the transfer, so let's move the function
down to where all state is in place.
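
The ordering problem can be shown with a toy model.  This is only a sketch
under stated assumptions: struct xfer_state and the xfer_* functions are
hypothetical stand-ins, and xfer_end_hook() plays the role of an end function
that runs immediately from inside the start call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down transfer state. */
struct xfer_state {
    uint8_t io_buffer[64];
    int io_buffer_index;        /* next unread byte in io_buffer */
    int packet_transfer_size;   /* bytes left in the packet */
    const uint8_t *data_ptr;    /* window handed to the consumer */
    int seen_index;             /* index observed by the end hook */
    int seen_remaining;         /* remaining size observed by the end hook */
};

/* Stand-in for an end function that is invoked straight from start. */
static void xfer_end_hook(struct xfer_state *s)
{
    s->seen_index = s->io_buffer_index;
    s->seen_remaining = s->packet_transfer_size;
}

/* Start a transfer; the hook fires before this function returns. */
static void xfer_start(struct xfer_state *s, const uint8_t *buf, int size)
{
    (void)size;
    s->data_ptr = buf;
    xfer_end_hook(s);
}

/* Update the bookkeeping first, then start.  Because the index has already
 * advanced, size is subtracted again to point at the start of the chunk. */
static void xfer_reply_chunk(struct xfer_state *s, int size)
{
    assert(s != NULL && size >= 0 && size <= s->packet_transfer_size);
    s->packet_transfer_size -= size;
    s->io_buffer_index += size;
    xfer_start(s, s->io_buffer + s->io_buffer_index - size, size);
}
```

With this order the hook sees the already-advanced index and remaining size,
while the data pointer still refers to the chunk that was just prepared.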

Signed-off-by: Alexander Graf ag...@suse.de
---
 hw/ide/core.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 2d0ad56..04e463a 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -814,11 +814,11 @@ static void ide_atapi_cmd_reply_end(IDEState *s)
 size = s->cd_sector_size - s->io_buffer_index;
 if (size > s->elementary_transfer_size)
 size = s->elementary_transfer_size;
-ide_transfer_start(s, s->io_buffer + s->io_buffer_index,
-   size, ide_atapi_cmd_reply_end);
 s->packet_transfer_size -= size;
 s->elementary_transfer_size -= size;
 s->io_buffer_index += size;
+ide_transfer_start(s, s->io_buffer + s->io_buffer_index - size,
+   size, ide_atapi_cmd_reply_end);
 } else {
 /* a new transfer is needed */
 s->nsector = (s->nsector & ~7) | ATAPI_INT_REASON_IO;
@@ -843,11 +843,11 @@ static void ide_atapi_cmd_reply_end(IDEState *s)
 if (size > (s->cd_sector_size - s->io_buffer_index))
 size = (s->cd_sector_size - s->io_buffer_index);
 }
-ide_transfer_start(s, s->io_buffer + s->io_buffer_index,
-   size, ide_atapi_cmd_reply_end);
 s->packet_transfer_size -= size;
 s->elementary_transfer_size -= size;
 s->io_buffer_index += size;
+ide_transfer_start(s, s->io_buffer + s->io_buffer_index - size,
+   size, ide_atapi_cmd_reply_end);
 ide_set_irq(s->bus);
 #ifdef DEBUG_IDE_ATAPI
 printf("status=0x%x\n", s->status);
-- 
1.6.0.2




[Qemu-devel] [PATCH 01/11] ide: split ide command interpretation off

2010-12-13 Thread Alexander Graf
The ATA command interpretation code can be used for PATA and SATA
interfaces alike. So let's split it out into a separate function.
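
The shape of the split can be sketched as follows.  All names here (struct
toy_bus, toy_exec_cmd, toy_ioport_write, toy_frame_submit) are hypothetical;
the second front end merely stands for any caller that receives the command
byte by some route other than a port write, and reading it from byte 2 of a
frame is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down bus state. */
struct toy_bus {
    uint8_t feature;      /* one example task-file register */
    uint32_t last_cmd;    /* last command handed to the interpreter */
    int cmd_count;        /* number of commands interpreted */
};

/* Shared command interpreter: it does not care how the command arrived. */
static void toy_exec_cmd(struct toy_bus *bus, uint32_t val)
{
    bus->last_cmd = val;
    bus->cmd_count++;
}

/* Port-I/O front end: register 7 is the command register. */
static void toy_ioport_write(struct toy_bus *bus, uint32_t addr, uint32_t val)
{
    switch (addr & 7) {
    case 1:
        bus->feature = (uint8_t)val;
        break;
    case 7:
        toy_exec_cmd(bus, val);
        break;
    default:
        break;
    }
}

/* Second front end: takes the command byte out of a memory-resident frame. */
static void toy_frame_submit(struct toy_bus *bus, const uint8_t *frame)
{
    assert(frame != NULL);
    toy_exec_cmd(bus, frame[2]);
}
```

Both entry points end up in the same interpreter, which is the property the
patch is after.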

Signed-off-by: Alexander Graf ag...@suse.de

---

v6 -> v7:

  - use bus instead of opaque (stefanha)
---
 hw/ide/core.c |   20 ++--
 hw/ide/internal.h |2 ++
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 430350f..ac4ee71 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -1791,9 +1791,6 @@ static void ide_clear_hob(IDEBus *bus)
 void ide_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 {
 IDEBus *bus = opaque;
-IDEState *s;
-int n;
-int lba48 = 0;
 
 #ifdef DEBUG_IDE
 printf("IDE: write addr=0x%x val=0x%02x\n", addr, val);
@@ -1854,17 +1851,29 @@ void ide_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 default:
 case 7:
 /* command */
+ide_exec_cmd(bus, val);
+break;
+}
+}
+
+
+void ide_exec_cmd(IDEBus *bus, uint32_t val)
+{
+IDEState *s;
+int n;
+int lba48 = 0;
+
 #if defined(DEBUG_IDE)
 printf("ide: CMD=%02x\n", val);
 #endif
 s = idebus_active_if(bus);
 /* ignore commands to non existant slave */
 if (s != bus->ifs && !s->bs)
-break;
+return;
 
 /* Only DEVICE RESET is allowed while BSY or/and DRQ are set */
 if ((s->status & (BUSY_STAT|DRQ_STAT)) && val != WIN_DEVICE_RESET)
-break;
+return;
 
 switch(val) {
 case WIN_IDENTIFY:
@@ -2355,7 +2364,6 @@ void ide_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 ide_set_irq(s->bus);
 break;
 }
-}
 }
 
 uint32_t ide_ioport_read(void *opaque, uint32_t addr1)
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index 71af66f..029c76c 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -567,6 +567,8 @@ void ide_init2_with_non_qdev_drives(IDEBus *bus, DriveInfo *hd0,
 DriveInfo *hd1, qemu_irq irq);
 void ide_init_ioport(IDEBus *bus, int iobase, int iobase2);
 
+void ide_exec_cmd(IDEBus *bus, uint32_t val);
+
 /* hw/ide/qdev.c */
 void ide_bus_new(IDEBus *idebus, DeviceState *dev, int bus_id);
 IDEDevice *ide_create_drive(IDEBus *bus, int unit, DriveInfo *drive);
-- 
1.6.0.2




[Qemu-devel] [PATCH 07/11] pci: add ich9 pci id

2010-12-13 Thread Alexander Graf
We need a PCI ID for our new AHCI adapter. I just picked an ICH-9
because that's the one in the Q35 chipset.

This patch adds a PCI ID define for an ICH-9 AHCI adapter.

Signed-off-by: Alexander Graf ag...@suse.de

---

v3 -> v4:

  - add ICH7 instead of ICH7M (herbszt)

v4 -> v5:

  - rename to ICH7_AHCI_RAID (herbszt)

v6 -> v7:

  - use non-raid ich7 ahci (herbszt)

v8 -> v9:

  - use ICH9 instead of ICH7
---
 hw/pci.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/hw/pci.h b/hw/pci.h
index 89f7b76..7f02911 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -62,6 +62,7 @@
 /* Intel (0x8086) */
 #define PCI_DEVICE_ID_INTEL_82551IT  0x1209
 #define PCI_DEVICE_ID_INTEL_82557    0x1229
+#define PCI_DEVICE_ID_INTEL_82801IR  0x2922
 
 /* Red Hat / Qumranet (for QEMU) -- see pci-ids.txt */
 #define PCI_VENDOR_ID_REDHAT_QUMRANET0x1af4
-- 
1.6.0.2




[Qemu-devel] [PATCH 05/11] ide: add ncq identify data for ahci sata drives

2010-12-13 Thread Alexander Graf
From: Roland Elek elek.rol...@gmail.com

I modified ide_identify() to include the zero-based queue length
value in word 75, and set bit 8 in word 76 to signal NCQ support
in the identify data for AHCI SATA drives.
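
As a rough illustration of the two words involved, the following sketch fills
them into a raw 512-byte identify buffer.  The helper names put_le16_word and
fill_ncq_words are hypothetical and not part of the patch; only the word
numbers, the "queue length minus one" encoding and bit 8 come from the
description above.

```c
#include <assert.h>
#include <stdint.h>

#define IDENTIFY_WORDS 256   /* identify data is 256 little-endian 16-bit words */

/* Store a 16-bit value in little-endian order at the given word index. */
static void put_le16_word(uint8_t *identify, int word, uint16_t v)
{
    assert(word >= 0 && word < IDENTIFY_WORDS);
    identify[2 * word]     = (uint8_t)(v & 0xff);
    identify[2 * word + 1] = (uint8_t)(v >> 8);
}

/* Advertise NCQ only when the device actually has a command queue. */
static void fill_ncq_words(uint8_t *identify, int ncq_queues)
{
    if (ncq_queues > 0) {
        /* word 75: queue length, zero-based */
        put_le16_word(identify, 75, (uint16_t)(ncq_queues - 1));
        /* word 76, bit 8: NCQ supported */
        put_le16_word(identify, 76, (uint16_t)(1u << 8));
    }
}
```

A device with ncq_queues == 0 leaves both words untouched, so a plain IDE
drive keeps reporting no NCQ support.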

Signed-off-by: Roland Elek elek.rol...@gmail.com
---
 hw/ide/core.c |7 +++
 hw/ide/internal.h |2 ++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 04e463a..344b7b4 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -140,6 +140,13 @@ static void ide_identify(IDEState *s)
 put_le16(p + 66, 120);
 put_le16(p + 67, 120);
 put_le16(p + 68, 120);
+
+if (s->ncq_queues) {
+put_le16(p + 75, s->ncq_queues - 1);
+/* NCQ supported */
+put_le16(p + 76, (1 << 8));
+}
+
 put_le16(p + 80, 0xf0); /* ata3 - ata6 supported */
 put_le16(p + 81, 0x16); /* conforms to ata5 */
 /* 14=NOP supported, 5=WCACHE supported, 0=SMART supported */
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index aadb505..697c3b4 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -447,6 +447,8 @@ struct IDEState {
 int smart_errors;
 uint8_t smart_selftest_count;
 uint8_t *smart_selftest_data;
+/* AHCI */
+int ncq_queues;
 };
 
 struct IDEDMAOps {
-- 
1.6.0.2




[Qemu-devel] [PATCH 00/11] AHCI emulation support v9

2010-12-13 Thread Alexander Graf
This patch adds support for AHCI emulation. I have tested and verified it works
in Linux, OpenBSD, Windows Vista and Windows 7. This AHCI emulation supports
NCQ, so multiple read or write requests can be outstanding at the same time.

The code is not fully optimized yet, however. I'm fairly sure there is still
low-hanging performance fruit to be found :). In my simple benchmarks
I achieved about two thirds of virtio performance.

Also, this AHCI emulation layer does not support legacy mode. So if you're
using a disk with this emulation, you do not get it exposed using the legacy
IDE interfaces.

Another nitpick is CD-ROM support in Windows. Somehow it doesn't detect a
CD-ROM drive attached to AHCI. At least it doesn't list it.

To attach an AHCI disk to your VM, please use

  -drive id=disk,file=...,if=none -device ahci,id=ahci \
  -device ide-drive,drive=disk,bus=ahci.0

This patch set is based on work done during the Google Summer of Code. I was
mentoring a student, Roland Elek, who wrote most of the AHCI emulation code
based on a patch from Chong Qiao. A bunch of other people were also involved,
so everybody who I didn't mention - thanks a lot!

  git://repo.or.cz/qemu/ahci.git ahci

v1 -> v2:

  - rename IDEExtender to IDEBusOps and make a pointer (kraxel)
  - make dma hooks explicit by putting them into ops struct (stefanha)
  - use qdev buses (kraxel)
  - minor cleanups
  - dprintf overhaul
  - add reset function

v2 -> v3:

  - add msi support (kraxel)
  - use MIN macro (kraxel)
  - fix ncq with multiple ports
  - zap qdev properties (kraxel)
  - redesign legacy IF_SATA hooks (kraxel)
  - don't build ahci as part of target
  - move to ide/ (kwolf)

v3 -> v4:

  - prepare for endianness safety
  - add lspci dump (herbszt)
  - use ich7 instead of ich7m (herbszt)
  - fix lst+fis mapping (kraxel)
  - coding style (blue swirl)
  - explicit mmio setters/getters (blue swirl)
  - split pata code out to pata.c (kwolf)
  - only include config-devices.h in machine description (blue swirl)

v4 -> v5:

  - s/H2dNcqFis/NCQFrame/g (blue swirl)
  - redo -drive magic (blue swirl)
  - bump BAR to 4k
  - rename ICH7_AHCI to ICH7_AHCI_RAID (herbszt)
  - drop device config header (blue swirl)

v5 -> v6:
  - PCI config space fixes (isaku)
  - remove CONFIG_AHCI from x86 default configs (paul brook)
  - use snprintf (blue swirl)
  - add generic PCI config file (paul brook)
  - build ahci on all PCI platforms (paul brook)

v6 -> v7:

  - use bus instead of opaque (stefanha)
  - change naming in IDEBusOps (stefanha, kwolf)
  - rename IDEBusOps (stefanha)
  - improve interrupt injection
  - combine tfdata code paths
  - update tfdata more often
  - reset port registers on port reset
  - improve debug output
  - add feature variable from fis for some extended commands
  - always set feature to DMA for atapi
  - osx 10.5.0 works as of this version
  - use non-raid ich7 ahci (herbszt)
  - reflect normal ich7 in pci dump
  - stick to new IDEBusOps (stefanha, kwolf)
  - stefan's ahci comments

v7 -> v8:

  - rewrite ops as DMA offsplit framework
  - split bmdma stuff out to pci.c
  - generate tfdata on the fly
  - reimplement immediate dma rw
  - add safety net for busy engine
  - adjust ahci code for new DMA framework
  - move ide core+pci to pci.mak
  - add sebastian's config space patches

v8 -> v9:

  - make dma providers subclass of idedma (kwolf)
  - s/set_status/add_status/g (kwolf)
  - cancel and clear ncq queue on reset (stefanha)
  - clear ptr on map failure (stefanha)
  - potential NULL deref, unregister reset (stefanha)
  - add error reporting for ncq (stefanha)
  - replace hw_error with DPRINTF (stefanha)
  - move sg generation to sg users
  - fix off-by-one in sglist interpretation
  - make background engine work (queued commands)
  - use ICH9 instead of ICH7 (aliguori)
  - update to new APIs


Alexander Graf (9):
  ide: split ide command interpretation off
  ide: fix whitespace gap in ide_exec_cmd
  ide: Split out BMDMA code from ATA core
  ide: move transfer_start after variable modification
  pci: add storage class for sata
  pci: add ich9 pci id
  ahci: add ahci emulation
  config: move ide core and pci to pci.mak
  config: add ahci for pci capable machines

Roland Elek (1):
  ide: add ncq identify data for ahci sata drives

Sebastian Herbszt (1):
  ahci: set SATA Mode Select

 Makefile.objs|1 +
 default-configs/arm-softmmu.mak  |1 -
 default-configs/i386-softmmu.mak |3 -
 default-configs/mips-softmmu.mak |3 -
 default-configs/mips64-softmmu.mak   |3 -
 default-configs/mips64el-softmmu.mak |3 -
 default-configs/mipsel-softmmu.mak   |3 -
 default-configs/pci.mak  |4 +
 default-configs/ppc-softmmu.mak  |3 -
 default-configs/ppc64-softmmu.mak|3 -
 default-configs/ppcemb-softmmu.mak   |3 -
 default-configs/sh4-softmmu.mak  |1 -
 default-configs/sh4eb-softmmu.mak|1 -
 

[Qemu-devel] [PATCH 03/11] ide: Split out BMDMA code from ATA core

2010-12-13 Thread Alexander Graf
The ATA core is currently heavily intertwined with BMDMA code. Let's loosen
that a bit, so we can happily replace the DMA backend with different
implementations.
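
One common way to get this kind of replaceable backend in C is a table of
function pointers plus a base object that each backend embeds.  The sketch
below shows that pattern only; every toy_* name is hypothetical and the set of
hooks is an assumption, not the interface this patch defines.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dma;

/* Hooks the core calls; each backend supplies its own table. */
struct toy_dma_ops {
    int  (*start_transfer)(struct toy_dma *dma);
    void (*reset)(struct toy_dma *dma);
};

/* Base object: the only thing the core ever sees. */
struct toy_dma {
    const struct toy_dma_ops *ops;
};

/* A backend "subclasses" the base by embedding it as its first member. */
struct toy_bmdma {
    struct toy_dma dma;
    int started;
    int resets;
};

static int toy_bmdma_start(struct toy_dma *dma)
{
    /* Valid cast: dma is the first member of struct toy_bmdma. */
    struct toy_bmdma *bm = (struct toy_bmdma *)dma;
    bm->started++;
    return 0;
}

static void toy_bmdma_reset(struct toy_dma *dma)
{
    struct toy_bmdma *bm = (struct toy_bmdma *)dma;
    bm->resets++;
}

static const struct toy_dma_ops toy_bmdma_ops = {
    .start_transfer = toy_bmdma_start,
    .reset = toy_bmdma_reset,
};

static void toy_bmdma_init(struct toy_bmdma *bm)
{
    bm->dma.ops = &toy_bmdma_ops;
    bm->started = 0;
    bm->resets = 0;
}

/* Core code: dispatches through the ops table without knowing the backend. */
static int toy_core_start(struct toy_dma *dma)
{
    assert(dma != NULL && dma->ops != NULL);
    return dma->ops->start_transfer(dma);
}

static void toy_core_reset(struct toy_dma *dma)
{
    assert(dma != NULL && dma->ops != NULL);
    dma->ops->reset(dma);
}
```

A different backend would provide its own ops table and state struct, and the
core functions would stay unchanged.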

Signed-off-by: Alexander Graf ag...@suse.de

---

v7 -> v8:

  - rewrite as DMA ops

v8 -> v9:

  - fold in: split out irq setting
  - fold in: move header definitions out
  - make dma providers subclass of idedma (kwolf)
  - s/set_status/add_status/g (kwolf)
---
 hw/ide/cmd646.c   |7 +-
 hw/ide/core.c |  335 ++---
 hw/ide/internal.h |   69 +--
 hw/ide/pci.c  |  289 +-
 hw/ide/pci.h  |   30 +
 hw/ide/piix.c |7 +-
 hw/ide/via.c  |7 +-
 7 files changed, 446 insertions(+), 298 deletions(-)

diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index ea5d2dc..fde0617 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -167,9 +167,10 @@ static void bmdma_map(PCIDevice *pci_dev, int region_num,
 
 for(i = 0;i < 2; i++) {
 BMDMAState *bm = &d->bmdma[i];
-d->bus[i].bmdma = bm;
+bmdma_init(&d->bus[i], bm);
 bm->bus = d->bus+i;
-qemu_add_vm_change_state_handler(ide_dma_restart_cb, bm);
+qemu_add_vm_change_state_handler(d->bus[i].dma->ops->restart_cb,
+ &bm->dma);
 
 if (i == 0) {
 register_ioport_write(addr, 4, 1, bmdma_writeb_0, d);
@@ -218,7 +219,7 @@ static void cmd646_reset(void *opaque)
 
 for (i = 0; i < 2; i++) {
 ide_bus_reset(&d->bus[i]);
-ide_dma_reset(&d->bmdma[i]);
+d->bus[i].dma->ops->reset(&d->bmdma[i].dma);
 }
 }
 
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 5e2fcbd..2d0ad56 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -34,8 +34,6 @@
 
 #include <hw/ide/internal.h>
 
-#define IDE_PAGE_SIZE 4096
-
 static const int smart_attributes[][5] = {
 /* id,  flags, val, wrst, thrsh */
 { 0x01, 0x03, 0x64, 0x64, 0x06}, /* raw read */
@@ -61,11 +59,8 @@ static inline int media_is_cd(IDEState *s)
 return (media_present(s) && s->nb_sectors <= CD_MAX_SECTORS);
 }
 
-static void ide_dma_start(IDEState *s, BlockDriverCompletionFunc *dma_cb);
-static void ide_dma_restart(IDEState *s, int is_read);
 static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret);
 static int ide_handle_rw_error(IDEState *s, int error, int op);
-static void ide_flush_cache(IDEState *s);
 
 static void padstr(char *str, const char *src, int len)
 {
@@ -314,11 +309,11 @@ static inline void ide_abort_command(IDEState *s)
 }
 
 static inline void ide_dma_submit_check(IDEState *s,
-  BlockDriverCompletionFunc *dma_cb, BMDMAState *bm)
+  BlockDriverCompletionFunc *dma_cb)
 {
-if (bm->aiocb)
+if (s->bus->dma->aiocb)
 return;
-dma_cb(bm, -1);
+dma_cb(s, -1);
 }
 
 /* prepare data transfer and tell what to do after */
@@ -328,8 +323,10 @@ static void ide_transfer_start(IDEState *s, uint8_t *buf, int size,
 s->end_transfer_func = end_transfer_func;
 s->data_ptr = buf;
 s->data_end = buf + size;
-if (!(s->status & ERR_STAT))
+if (!(s->status & ERR_STAT)) {
 s->status |= DRQ_STAT;
+}
+s->bus->dma->ops->start_transfer(s->bus->dma);
 }
 
 static void ide_transfer_stop(IDEState *s)
@@ -394,7 +391,7 @@ static void ide_rw_error(IDEState *s) {
 ide_set_irq(s->bus);
 }
 
-static void ide_sector_read(IDEState *s)
+void ide_sector_read(IDEState *s)
 {
 int64_t sector_num;
 int ret, n;
@@ -427,58 +424,15 @@ static void ide_sector_read(IDEState *s)
 }
 }
 
-
-/* return 0 if buffer completed */
-static int dma_buf_prepare(BMDMAState *bm, int is_write)
-{
-IDEState *s = bmdma_active_if(bm);
-struct {
-uint32_t addr;
-uint32_t size;
-} prd;
-int l, len;
-
-qemu_sglist_init(&s->sg, s->nsector / (IDE_PAGE_SIZE / 512) + 1);
-s->io_buffer_size = 0;
-for(;;) {
-if (bm->cur_prd_len == 0) {
-/* end of table (with a fail safe of one page) */
-if (bm->cur_prd_last ||
-(bm->cur_addr - bm->addr) >= IDE_PAGE_SIZE)
-return s->io_buffer_size != 0;
-cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
-bm->cur_addr += 8;
-prd.addr = le32_to_cpu(prd.addr);
-prd.size = le32_to_cpu(prd.size);
-len = prd.size & 0xfffe;
-if (len == 0)
-len = 0x10000;
-bm->cur_prd_len = len;
-bm->cur_prd_addr = prd.addr;
-bm->cur_prd_last = (prd.size & 0x80000000);
-}
-l = bm->cur_prd_len;
-if (l > 0) {
-qemu_sglist_add(&s->sg, bm->cur_prd_addr, l);
-bm->cur_prd_addr += l;
-bm->cur_prd_len -= l;
-s->io_buffer_size += l;
-}
-}
-return 1;
-}
-
 static void dma_buf_commit(IDEState *s, int is_write)
 {
 qemu_sglist_destroy(&s->sg);
 }
 
-static void ide_dma_set_inactive(BMDMAState *bm)
+static void 

[Qemu-devel] [PATCH 10/11] config: add ahci for pci capable machines

2010-12-13 Thread Alexander Graf
This patch enables AHCI for all machines supporting PCI.

Signed-off-by: Alexander Graf ag...@suse.de
---
 default-configs/pci.mak |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/default-configs/pci.mak b/default-configs/pci.mak
index d700b3c..0471efb 100644
--- a/default-configs/pci.mak
+++ b/default-configs/pci.mak
@@ -13,3 +13,4 @@ CONFIG_E1000_PCI=y
 CONFIG_IDE_CORE=y
 CONFIG_IDE_QDEV=y
 CONFIG_IDE_PCI=y
+CONFIG_AHCI=y
-- 
1.6.0.2




[Qemu-devel] [PATCH 06/11] pci: add storage class for sata

2010-12-13 Thread Alexander Graf
This patch adds the storage sata class id.
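
For readers unfamiliar with the encoding, a 16-bit class id such as 0x0106
packs the base class in the upper byte and the subclass in the lower byte.
The sketch below decodes it and stores it at the standard config space
offsets; the toy_* helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CLASS_STORAGE_SATA 0x0106   /* value added by this patch */

/* Upper byte: base class (0x01 = mass storage). */
static uint8_t toy_class_base(uint16_t class_id)
{
    return (uint8_t)(class_id >> 8);
}

/* Lower byte: subclass (0x06 = SATA). */
static uint8_t toy_class_sub(uint16_t class_id)
{
    return (uint8_t)(class_id & 0xff);
}

/* Config space keeps the subclass at offset 0x0a and the base class at 0x0b. */
static void toy_set_class(uint8_t *config, uint16_t class_id)
{
    assert(config != NULL);
    config[0x0a] = toy_class_sub(class_id);
    config[0x0b] = toy_class_base(class_id);
}
```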

Signed-off-by: Alexander Graf ag...@suse.de
---
 hw/pci_ids.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/hw/pci_ids.h b/hw/pci_ids.h
index 82cba7e..ea3418c 100644
--- a/hw/pci_ids.h
+++ b/hw/pci_ids.h
@@ -15,6 +15,7 @@
 
 #define PCI_CLASS_STORAGE_SCSI   0x0100
 #define PCI_CLASS_STORAGE_IDE0x0101
+#define PCI_CLASS_STORAGE_SATA   0x0106
 #define PCI_CLASS_STORAGE_OTHER  0x0180
 
 #define PCI_CLASS_NETWORK_ETHERNET   0x0200
-- 
1.6.0.2




[Qemu-devel] [PATCH 02/11] ide: fix whitespace gap in ide_exec_cmd

2010-12-13 Thread Alexander Graf
Now that we have the function split out, we have to reindent it.
In order to increase the readability of the actual functional change,
this is split out.

Signed-off-by: Alexander Graf ag...@suse.de
---
 hw/ide/core.c |  734 
 1 files changed, 367 insertions(+), 367 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index ac4ee71..5e2fcbd 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -1864,423 +1864,423 @@ void ide_exec_cmd(IDEBus *bus, uint32_t val)
 int lba48 = 0;
 
 #if defined(DEBUG_IDE)
-printf("ide: CMD=%02x\n", val);
+printf("ide: CMD=%02x\n", val);
 #endif
-s = idebus_active_if(bus);
-/* ignore commands to non existant slave */
-if (s != bus->ifs && !s->bs)
-return;
+s = idebus_active_if(bus);
+/* ignore commands to non existant slave */
+if (s != bus->ifs && !s->bs)
+return;
 
-/* Only DEVICE RESET is allowed while BSY or/and DRQ are set */
-if ((s->status & (BUSY_STAT|DRQ_STAT)) && val != WIN_DEVICE_RESET)
-return;
+/* Only DEVICE RESET is allowed while BSY or/and DRQ are set */
+if ((s->status & (BUSY_STAT|DRQ_STAT)) && val != WIN_DEVICE_RESET)
+return;
 
-switch(val) {
-case WIN_IDENTIFY:
-if (s->bs && s->drive_kind != IDE_CD) {
-if (s->drive_kind != IDE_CFATA)
-ide_identify(s);
-else
-ide_cfata_identify(s);
-s->status = READY_STAT | SEEK_STAT;
-ide_transfer_start(s, s->io_buffer, 512, ide_transfer_stop);
-} else {
-if (s->drive_kind == IDE_CD) {
-ide_set_signature(s);
-}
-ide_abort_command(s);
-}
-ide_set_irq(s->bus);
-break;
-case WIN_SPECIFY:
-case WIN_RECAL:
-s->error = 0;
+switch(val) {
+case WIN_IDENTIFY:
+if (s->bs && s->drive_kind != IDE_CD) {
+if (s->drive_kind != IDE_CFATA)
+ide_identify(s);
+else
+ide_cfata_identify(s);
 s->status = READY_STAT | SEEK_STAT;
-ide_set_irq(s->bus);
-break;
-case WIN_SETMULT:
-if (s->drive_kind == IDE_CFATA && s->nsector == 0) {
-/* Disable Read and Write Multiple */
-s->mult_sectors = 0;
-s->status = READY_STAT | SEEK_STAT;
-} else if ((s->nsector & 0xff) != 0 &&
-((s->nsector & 0xff) > MAX_MULT_SECTORS ||
- (s->nsector & (s->nsector - 1)) != 0)) {
-ide_abort_command(s);
-} else {
-s->mult_sectors = s->nsector & 0xff;
-s->status = READY_STAT | SEEK_STAT;
+ide_transfer_start(s, s->io_buffer, 512, ide_transfer_stop);
+} else {
+if (s->drive_kind == IDE_CD) {
+ide_set_signature(s);
 }
-ide_set_irq(s->bus);
-break;
-case WIN_VERIFY_EXT:
-   lba48 = 1;
-case WIN_VERIFY:
-case WIN_VERIFY_ONCE:
-/* do sector number check ? */
-   ide_cmd_lba48_transform(s, lba48);
+ide_abort_command(s);
+}
+ide_set_irq(s->bus);
+break;
+case WIN_SPECIFY:
+case WIN_RECAL:
+s->error = 0;
+s->status = READY_STAT | SEEK_STAT;
+ide_set_irq(s->bus);
+break;
+case WIN_SETMULT:
+if (s->drive_kind == IDE_CFATA && s->nsector == 0) {
+/* Disable Read and Write Multiple */
+s->mult_sectors = 0;
 s->status = READY_STAT | SEEK_STAT;
-ide_set_irq(s->bus);
-break;
+} else if ((s->nsector & 0xff) != 0 &&
+((s->nsector & 0xff) > MAX_MULT_SECTORS ||
+ (s->nsector & (s->nsector - 1)) != 0)) {
+ide_abort_command(s);
+} else {
+s->mult_sectors = s->nsector & 0xff;
+s->status = READY_STAT | SEEK_STAT;
+}
+ide_set_irq(s->bus);
+break;
+case WIN_VERIFY_EXT:
+   lba48 = 1;
+case WIN_VERIFY:
+case WIN_VERIFY_ONCE:
+/* do sector number check ? */
+   ide_cmd_lba48_transform(s, lba48);
+s->status = READY_STAT | SEEK_STAT;
+ide_set_irq(s->bus);
+break;
 case WIN_READ_EXT:
-   lba48 = 1;
-case WIN_READ:
-case WIN_READ_ONCE:
-if (!s->bs)
-goto abort_cmd;
-   ide_cmd_lba48_transform(s, lba48);
-s->req_nb_sectors = 1;
-ide_sector_read(s);
-break;
+   lba48 = 1;
+case WIN_READ:
+case WIN_READ_ONCE:
+if (!s->bs)
+goto abort_cmd;
+   ide_cmd_lba48_transform(s, lba48);
+s->req_nb_sectors = 1;
+ide_sector_read(s);
+break;
 case WIN_WRITE_EXT:
-   lba48 = 1;
-case 

[Qemu-devel] [PATCH 11/11] ahci: set SATA Mode Select

2010-12-13 Thread Alexander Graf
From: Sebastian Herbszt herb...@gmx.de

Set SATA Mode Select to AHCI in the Address Map Register.
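
A small sketch of what the register write amounts to.  It assumes, as the
diff below does, that the Address Map Register sits at config offset 0x90,
and additionally assumes that SATA Mode Select is the two-bit field in bits
7:6 with the value 1 meaning AHCI; the TOY_* names and helpers are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ADDRESS_MAP_OFFSET 0x90   /* offset used in the diff below */
#define TOY_SMS_SHIFT          6      /* assumed position of SATA Mode Select */
#define TOY_SMS_MASK           0x3u   /* assumed two-bit field */
#define TOY_SMS_AHCI           1u     /* assumed encoding for AHCI mode */

/* Program the mode field to AHCI, leaving the other bits cleared. */
static void toy_set_ahci_mode(uint8_t *config)
{
    assert(config != NULL);
    config[TOY_ADDRESS_MAP_OFFSET] = (uint8_t)(TOY_SMS_AHCI << TOY_SMS_SHIFT);
}

/* Read the mode field back, as a guest driver might. */
static unsigned toy_get_sata_mode(const uint8_t *config)
{
    assert(config != NULL);
    return (config[TOY_ADDRESS_MAP_OFFSET] >> TOY_SMS_SHIFT) & TOY_SMS_MASK;
}
```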

Signed-off-by: Sebastian Herbszt herb...@gmx.de
---
 hw/ide/ahci.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index f937a92..8ae236a 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1473,6 +1473,9 @@ static int pci_ahci_init(PCIDevice *dev)
 d->card.config[PCI_LATENCY_TIMER]   = 0x00;  /* Latency timer */
 pci_config_set_interrupt_pin(d->card.config, 1);
 
+/* XXX Software should program this register */
+d->card.config[0x90]   = 1 << 6; /* Address Map Register - AHCI mode */
+
 qemu_register_reset(ahci_reset, d);
 
 /* XXX BAR size should be 1k, but that breaks, so bump it to 4k for now */
-- 
1.6.0.2




[Qemu-devel] [PATCH 09/11] config: move ide core and pci to pci.mak

2010-12-13 Thread Alexander Graf
Every device that can do PCI should also be able to do IDE. So let's move
the IDE definitions over to pci.mak.

Signed-off-by: Alexander Graf ag...@suse.de
---
 default-configs/arm-softmmu.mak  |1 -
 default-configs/i386-softmmu.mak |3 ---
 default-configs/mips-softmmu.mak |3 ---
 default-configs/mips64-softmmu.mak   |3 ---
 default-configs/mips64el-softmmu.mak |3 ---
 default-configs/mipsel-softmmu.mak   |3 ---
 default-configs/pci.mak  |3 +++
 default-configs/ppc-softmmu.mak  |3 ---
 default-configs/ppc64-softmmu.mak|3 ---
 default-configs/ppcemb-softmmu.mak   |3 ---
 default-configs/sh4-softmmu.mak  |1 -
 default-configs/sh4eb-softmmu.mak|1 -
 default-configs/sparc64-softmmu.mak  |3 ---
 default-configs/x86_64-softmmu.mak   |3 ---
 14 files changed, 3 insertions(+), 33 deletions(-)

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index ac48dc1..8d1174f 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -8,7 +8,6 @@ CONFIG_ECC=y
 CONFIG_SERIAL=y
 CONFIG_PTIMER=y
 CONFIG_SD=y
-CONFIG_IDE_CORE=y
 CONFIG_MAX7310=y
 CONFIG_WM8750=y
 CONFIG_TWL92230=y
diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index ce905d2..323fafb 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -13,9 +13,6 @@ CONFIG_FDC=y
 CONFIG_ACPI=y
 CONFIG_APM=y
 CONFIG_DMA=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_PIIX=y
 CONFIG_NE2000_ISA=y
diff --git a/default-configs/mips-softmmu.mak b/default-configs/mips-softmmu.mak
index 565e611..f524971 100644
--- a/default-configs/mips-softmmu.mak
+++ b/default-configs/mips-softmmu.mak
@@ -17,9 +17,6 @@ CONFIG_ACPI=y
 CONFIG_APM=y
 CONFIG_DMA=y
 CONFIG_PIIX4=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_PIIX=y
 CONFIG_NE2000_ISA=y
diff --git a/default-configs/mips64-softmmu.mak b/default-configs/mips64-softmmu.mak
index 03bd8eb..aeab6b2 100644
--- a/default-configs/mips64-softmmu.mak
+++ b/default-configs/mips64-softmmu.mak
@@ -17,9 +17,6 @@ CONFIG_ACPI=y
 CONFIG_APM=y
 CONFIG_DMA=y
 CONFIG_PIIX4=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_PIIX=y
 CONFIG_NE2000_ISA=y
diff --git a/default-configs/mips64el-softmmu.mak b/default-configs/mips64el-softmmu.mak
index 4661617..8e6511c 100644
--- a/default-configs/mips64el-softmmu.mak
+++ b/default-configs/mips64el-softmmu.mak
@@ -17,9 +17,6 @@ CONFIG_ACPI=y
 CONFIG_APM=y
 CONFIG_DMA=y
 CONFIG_PIIX4=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_PIIX=y
 CONFIG_IDE_VIA=y
diff --git a/default-configs/mipsel-softmmu.mak b/default-configs/mipsel-softmmu.mak
index 92fc473..a05ac25 100644
--- a/default-configs/mipsel-softmmu.mak
+++ b/default-configs/mipsel-softmmu.mak
@@ -17,9 +17,6 @@ CONFIG_ACPI=y
 CONFIG_APM=y
 CONFIG_DMA=y
 CONFIG_PIIX4=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_PIIX=y
 CONFIG_NE2000_ISA=y
diff --git a/default-configs/pci.mak b/default-configs/pci.mak
index c74a99f..d700b3c 100644
--- a/default-configs/pci.mak
+++ b/default-configs/pci.mak
@@ -10,3 +10,6 @@ CONFIG_PCNET_COMMON=y
 CONFIG_LSI_SCSI_PCI=y
 CONFIG_RTL8139_PCI=y
 CONFIG_E1000_PCI=y
+CONFIG_IDE_CORE=y
+CONFIG_IDE_QDEV=y
+CONFIG_IDE_PCI=y
diff --git a/default-configs/ppc-softmmu.mak b/default-configs/ppc-softmmu.mak
index f1cb99e..4563742 100644
--- a/default-configs/ppc-softmmu.mak
+++ b/default-configs/ppc-softmmu.mak
@@ -23,9 +23,6 @@ CONFIG_GRACKLE_PCI=y
 CONFIG_UNIN_PCI=y
 CONFIG_DEC_PCI=y
 CONFIG_PPCE500_PCI=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_CMD646=y
 CONFIG_IDE_MACIO=y
diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
index 83cbe97..d5073b3 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -23,9 +23,6 @@ CONFIG_GRACKLE_PCI=y
 CONFIG_UNIN_PCI=y
 CONFIG_DEC_PCI=y
 CONFIG_PPCE500_PCI=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_CMD646=y
 CONFIG_IDE_MACIO=y
diff --git a/default-configs/ppcemb-softmmu.mak b/default-configs/ppcemb-softmmu.mak
index 2b52d4a..9f0730c 100644
--- a/default-configs/ppcemb-softmmu.mak
+++ b/default-configs/ppcemb-softmmu.mak
@@ -23,9 +23,6 @@ CONFIG_GRACKLE_PCI=y
 CONFIG_UNIN_PCI=y
 CONFIG_DEC_PCI=y
 CONFIG_PPCE500_PCI=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_CMD646=y
 CONFIG_IDE_MACIO=y
diff --git a/default-configs/sh4-softmmu.mak b/default-configs/sh4-softmmu.mak
index 87247a4..5c69acc 100644
--- a/default-configs/sh4-softmmu.mak
+++ b/default-configs/sh4-softmmu.mak
@@ -3,6 +3,5 @@
 include pci.mak
 CONFIG_SERIAL=y
 CONFIG_PTIMER=y
-CONFIG_IDE_CORE=y
 CONFIG_PFLASH_CFI02=y
 CONFIG_ISA_MMIO=y

[Qemu-devel] [PATCH 08/11] ahci: add ahci emulation

2010-12-13 Thread Alexander Graf
This patch adds an emulation layer for an ICH-9 AHCI controller. For now
this controller does not do IDE legacy emulation. It is a pure AHCI controller.

Signed-off-by: Alexander Graf ag...@suse.de

---

v1 -> v2:

  - rename IDEExtender to IDEBusOps and make a pointer (kraxel)
  - make dma hooks explicit by putting them into ops struct (stefanha)
  - use qdev buses (kraxel)
  - minor cleanups
  - dprintf overhaul
  - add reset function

v2 -> v3:

  - add msi support (kraxel)
  - use MIN macro (kraxel)
  - fix ncq with multiple ports
  - zap qdev properties (kraxel)
  - redesign legacy IF_SATA hooks (kraxel)
  - don't build ahci as part of target
  - move to ide/ (kwolf)

v3 -> v4:

  - prepare for endianness safety
  - add lspci dump (herbszt)
  - use ich7 instead of ich7m (herbszt)
  - fix lst+fis mapping (kraxel)
  - coding style (blue swirl)
  - explicit mmio setters/getters (blue swirl)

v4 -> v5:

  - s/H2dNcqFis/NCQFrame/g (blue swirl)
  - redo -drive magic (blue swirl)
  - bump BAR to 4k
  - ahci.c: rename to ICH7_AHCI_RAID (herbszt)

v5 -> v6:
  - PCI config space fixes (isaku)
  - remove CONFIG_AHCI from default configs

v6 -> v7:

  - improve interrupt injection
  - combine tfdata code paths
  - update tfdata more often
  - reset port registers on port reset
  - improve debug output
  - add feature variable from fis for some extended commands
  - always set feature to DMA for atapi
  - osx 10.5.0 works as of this version
  - use non-raid ich7 ahci (herbszt)
  - reflect normal ich7 in pci dump
  - stick to new IDEBusOps (stefanha, kwolf)
  - ahci: stefan's ahci comments

v7 -> v8:
  - generate tfdata on the fly
  - reimplement immediate dma rw
  - add safety net for busy engine
  - adjust for new DMA interface

v8 -> v9:

  - ahci: set pci revision id to 0x02
  - make dma providers subclass of idedma (kwolf)
  - s/set_status/add_status/g (kwolf)
  - cancel and clear ncq queue on reset (stefanha)
  - clear ptr on map failure (stefanha)
  - potential NULL deref, unregister reset (stefanha)
  - add error reporting for ncq (stefanha)
  - replace hw_error with DPRINTF (stefanha)
  - move sg generation to sg users
  - fix off-by-one in sglist interpretation
  - make background engine work (queued commands)
  - use ICH9 instead of ICH7 (aliguori)
  - update to new APIs
---
 Makefile.objs |1 +
 hw/ide/ahci.c | 1524 +
 2 files changed, 1525 insertions(+), 0 deletions(-)
 create mode 100644 hw/ide/ahci.c

diff --git a/Makefile.objs b/Makefile.objs
index cebb945..2693088 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -241,6 +241,7 @@ hw-obj-$(CONFIG_IDE_PIIX) += ide/piix.o
 hw-obj-$(CONFIG_IDE_CMD646) += ide/cmd646.o
 hw-obj-$(CONFIG_IDE_MACIO) += ide/macio.o
 hw-obj-$(CONFIG_IDE_VIA) += ide/via.o
+hw-obj-$(CONFIG_AHCI) += ide/ahci.o
 
 # SCSI layer
 hw-obj-$(CONFIG_LSI_SCSI_PCI) += lsi53c895a.o
diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
new file mode 100644
index 0000000..f937a92
--- /dev/null
+++ b/hw/ide/ahci.c
@@ -0,0 +1,1524 @@
+/*
+ * QEMU AHCI Emulation
+ *
+ * Copyright (c) 2010 qiaoch...@loongson.cn
+ * Copyright (c) 2010 Roland Elek elek.rol...@gmail.com
+ * Copyright (c) 2010 Sebastian Herbszt herb...@gmx.de
+ * Copyright (c) 2010 Alexander Graf ag...@suse.de
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see http://www.gnu.org/licenses/.
+ *
+ *
+ * lspci dump of a ICH-9 real device in IDE mode (hopefully close enough):
+ *
+ * 00:1f.2 SATA controller [0106]: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller [8086:2922] (rev 02) (prog-if 01 [AHCI 1.0])
+ * Subsystem: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller [8086:2922]
+ * Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
+ * Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium TAbort- TAbort- MAbort- SERR- PERR- INTx-
+ * Latency: 0
+ * Interrupt: pin B routed to IRQ 222
+ * Region 0: I/O ports at d000 [size=8]
+ * Region 1: I/O ports at cc00 [size=4]
+ * Region 2: I/O ports at c880 [size=8]
+ * Region 3: I/O ports at c800 [size=4]
+ * Region 4: I/O ports at c480 [size=32]
+ * Region 5: Memory at febf9000 (32-bit, non-prefetchable) [size=2K]
+ * Capabilities: [80] 

[Qemu-devel] [PATCH, RFC 3/4] prep: Fix duplicate ISA IDE IRQ

2010-12-13 Thread Andreas Färber
Calling isa_ide_init() twice with the same IRQ 13 fails:

qemu: hardware error: isa irq 13 already assigned

Use a different IRQ (14) for the second one to avoid this.

Signed-off-by: Hervé Poussineau hpous...@reactos.org
Cc: Alexander Graf ag...@suse.de
Signed-off-by: Andreas Färber andreas.faer...@web.de
---
 hw/ppc_prep.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/ppc_prep.c b/hw/ppc_prep.c
index 3575dee..3073870 100644
--- a/hw/ppc_prep.c
+++ b/hw/ppc_prep.c
@@ -76,7 +76,7 @@ qemu_log_mask(CPU_LOG_IOPORT, fmt, ## __VA_ARGS__)
 /* Constants for devices init */
 static const int ide_iobase[2] = { 0x1f0, 0x170 };
 static const int ide_iobase2[2] = { 0x3f6, 0x376 };
-static const int ide_irq[2] = { 13, 13 };
+static const int ide_irq[2] = { 13, 14 };
 
 #define NE2000_NB_MAX 6
 
-- 
1.7.3
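
For readers following along: a minimal sketch of the kind of bookkeeping that produces the "already assigned" error above. The struct and function names are hypothetical and are not the ISA bus code in QEMU; the sketch only shows why claiming IRQ 13 twice fails while 13 followed by 14 succeeds.

```c
#include <assert.h>

#define SKETCH_ISA_NUM_IRQS 16

/* Hypothetical per-bus record of which ISA IRQ lines are already claimed. */
struct sketch_isa_bus {
    int assigned[SKETCH_ISA_NUM_IRQS];
};

/* Claim an IRQ line; returns 0 on success, -1 if it was already taken. */
static int sketch_claim_irq(struct sketch_isa_bus *bus, int irq)
{
    assert(irq >= 0 && irq < SKETCH_ISA_NUM_IRQS);
    if (bus->assigned[irq]) {
        return -1; /* the real code reports a hardware error at this point */
    }
    bus->assigned[irq] = 1;
    return 0;
}
```

With ide_irq[] = { 13, 13 } the second claim hits the -1 path; with { 13, 14 } both claims succeed.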




[Qemu-devel] [PATCH 1/4] prep: Remove bogus BIOS size check

2010-12-13 Thread Andreas Färber
r3480 added this check to account for the entry vector 0xfff00100 to be
available for CPUs that need it. Today however, the NIP is not yet
initialized at this point (zero), so the check always triggers.

Cc: Hervé Poussineau hpous...@reactos.org
Cc: Alexander Graf ag...@suse.de
Signed-off-by: Andreas Färber andreas.faer...@web.de
---
 hw/ppc_prep.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/hw/ppc_prep.c b/hw/ppc_prep.c
index 1492266..6b22122 100644
--- a/hw/ppc_prep.c
+++ b/hw/ppc_prep.c
@@ -600,9 +600,6 @@ static void ppc_prep_init (ram_addr_t ram_size,
     if (filename) {
         qemu_free(filename);
     }
-    if (env->nip < 0xFFF80000 && bios_size < 0x00100000) {
-        hw_error("PowerPC 601 / 620 / 970 need a 1MB BIOS\n");
-    }
 
     if (linux_boot) {
         kernel_base = KERNEL_LOAD_ADDR;
-- 
1.7.3




[Qemu-devel] [FYI 4/4] prep: Quickfix for ioport

2010-12-13 Thread Andreas Färber
Work around the following error:

qemu: hardware error: register_ioport_read: invalid opaque

Signed-off-by: Hervé Poussineau hpous...@reactos.org
Signed-off-by: Andreas Färber andreas.faer...@web.de
---
 hw/ppc_prep.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/hw/ppc_prep.c b/hw/ppc_prep.c
index 3073870..0c9183e 100644
--- a/hw/ppc_prep.c
+++ b/hw/ppc_prep.c
@@ -721,8 +721,10 @@ static void ppc_prep_init (ram_addr_t ram_size,
 register_ioport_read(0x398, 2, 1, PREP_io_read, sysctrl);
 register_ioport_write(0x398, 2, 1, PREP_io_write, sysctrl);
 /* System control ports */
+#if 0
 register_ioport_read(0x0092, 0x01, 1, PREP_io_800_readb, sysctrl);
 register_ioport_write(0x0092, 0x01, 1, PREP_io_800_writeb, sysctrl);
+#endif
 register_ioport_read(0x0800, 0x52, 1, PREP_io_800_readb, sysctrl);
 register_ioport_write(0x0800, 0x52, 1, PREP_io_800_writeb, sysctrl);
 /* PCI intack location */
-- 
1.7.3




[Qemu-devel] [PATCH 2/4] prep: Add ELF support

2010-12-13 Thread Andreas Färber
In order to switch from abandoned OpenHack'Ware to OpenBIOS firmware,
the PReP machine needs to be able to load an ELF BIOS.

ELF loading is adapted from ppc_newworld, the fallback mechanism from sun4m.

Note that since we must register the maximum amount of ROM before attempting
to load an ELF BIOS and since there is no cpu_unregister_physical_memory(),
raw BIOS files such as OHW may now be preceded by unused ROM memory.

Cc: Alexander Graf ag...@suse.de
Cc: Hervé Poussineau hpous...@reactos.org
Signed-off-by: Andreas Färber andreas.faer...@web.de
---
 hw/ppc_prep.c |   24 +++-
 1 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/hw/ppc_prep.c b/hw/ppc_prep.c
index 6b22122..3575dee 100644
--- a/hw/ppc_prep.c
+++ b/hw/ppc_prep.c
@@ -36,6 +36,7 @@
 #include "qemu-log.h"
 #include "ide.h"
 #include "loader.h"
+#include "elf.h"
 #include "mc146818rtc.h"
 #include "blockdev.h"
 
@@ -582,18 +583,23 @@ static void ppc_prep_init (ram_addr_t ram_size,
         bios_name = BIOS_FILENAME;
     filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
     if (filename) {
-        bios_size = get_image_size(filename);
+        cpu_register_physical_memory(0xfff00000, BIOS_SIZE,
+                                     bios_offset | IO_MEM_ROM);
+        bios_size = load_elf(filename, NULL, NULL, NULL,
+                             NULL, NULL, 1, ELF_MACHINE, 0);
+        if (bios_size < 0 || bios_size > BIOS_SIZE) {
+            bios_size = get_image_size(filename);
+            if (bios_size > 0 && bios_size <= BIOS_SIZE) {
+                target_phys_addr_t bios_addr;
+                bios_size = (bios_size + 0xfff) & ~0xfff;
+                bios_addr = (uint32_t)(-bios_size);
+                bios_size = load_image_targphys(filename, bios_addr,
+                                                bios_size);
+            }
+        }
     } else {
         bios_size = -1;
     }
-    if (bios_size > 0 && bios_size <= BIOS_SIZE) {
-        target_phys_addr_t bios_addr;
-        bios_size = (bios_size + 0xfff) & ~0xfff;
-        bios_addr = (uint32_t)(-bios_size);
-        cpu_register_physical_memory(bios_addr, bios_size,
-                                     bios_offset | IO_MEM_ROM);
-        bios_size = load_image_targphys(filename, bios_addr, bios_size);
-    }
     if (bios_size < 0 || bios_size > BIOS_SIZE) {
         hw_error("qemu: could not load PPC PREP bios '%s'\n", bios_name);
     }
-- 
1.7.3
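
A small standalone sketch of the size and address arithmetic used by the raw-image fallback in the patch above, for anyone who wants to check the numbers. The sketch_ names are hypothetical, and the 1 MiB window size is an assumption taken from the "1MB BIOS" wording in this series, not from the BIOS_SIZE definition itself.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 1 MiB ROM window (an assumption, see the note above). */
#define SKETCH_BIOS_SIZE 0x00100000

/* Round a raw image size up to the next 4 KiB boundary. */
static int64_t sketch_round_to_page(int64_t size)
{
    assert(size >= 0);
    return (size + 0xfff) & ~(int64_t)0xfff;
}

/* Load address chosen so the image ends exactly at the top of the
 * 32-bit physical address space. */
static uint32_t sketch_bios_addr(int64_t rounded_size)
{
    return (uint32_t)(-rounded_size);
}

/* Returns 1 when the ELF load result is unusable and the raw-image
 * path should be tried instead. */
static int sketch_needs_raw_fallback(int64_t elf_size)
{
    return elf_size < 0 || elf_size > SKETCH_BIOS_SIZE;
}
```

A full 1 MiB raw image lands at the start of the window, while a smaller one is loaded higher up. That is why smaller raw images may now be preceded by unused ROM, as the commit message notes.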




[Qemu-devel] [PATCH 0/4] ppc: Fix PReP emulation

2010-12-13 Thread Andreas Färber
Hello,

Based on an earlier attempt of mine to make OpenBIOS work with -M prep,
with kind support from Hervé Poussineau here's an initial stab at
fixing the long-broken PReP emulation and preparing migration from
abandoned OpenHack'Ware to OpenBIOS as default FOSS firmware.

In particular a number of hw_error()s are resolved, so that the BIOS
can be entered at all. It is not yet working in terms of serial and
VGA support etc.

This series is also available from:

git://repo.or.cz/qemu/afaerber.git prep-queue

Some more work-in-progress for the curious is on my prep branch [2].
The corresponding work-in-progress OpenBIOS changes are at [3].

Unfortunately the prep machine lacks documentation of what exactly it
tries to emulate. The plan thus is to merge emulation of a second, real
IBM 40p machine based on Hervé's work at [1], for use with original
binary firmware.

Also upcoming are new ppc_chrp machines, forked from ppc_newworld,
emulating the 970-based IBM JS20 (using Apple U3) [4] and possibly the
POWER5-based IntelliStation 285. These depend on the ongoing ppc64 port
of OpenBIOS to be completed though. This relates to PReP in that the
machine IDs will need to be coordinated.

Have fun,
Andreas

[1] git://repo.or.cz/qemu/hpoussin.git ppc
http://repo.or.cz/w/qemu/hpoussin.git/shortlog/refs/heads/ppc
[2] http://repo.or.cz/w/qemu/afaerber.git/shortlog/refs/heads/prep
[3] http://repo.or.cz/w/openbios/afaerber.git/shortlog/refs/heads/prep
[4] http://repo.or.cz/w/qemu/afaerber.git/shortlog/refs/heads/aix

Andreas Färber (4):
  prep: Remove bogus BIOS size check
  prep: Add ELF support
  prep: Fix duplicate ISA IDE IRQ
  prep: Quickfix for ioport

 hw/ppc_prep.c |   31 ++-
 1 files changed, 18 insertions(+), 13 deletions(-)

-- 
1.7.3




[Qemu-devel] Can any one help me?

2010-12-13 Thread 欧阳晓华
I am using qemu-0.13.0 and want to emulate a SPARC system.
I did the following:
1. qemu-img create solaris.img 10G
2. qemu-system-sparc -m 256 -hda solaris.img -boot d -cdrom sol-9-905-sparc.iso
3. qemu reported "Unhandled Exception 0X0007" and then "Stopping execution".
Does anyone know why this happened?
Thank you very much.
Best regards!


[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate

2010-12-13 Thread Michael S. Tsirkin
On Mon, Dec 13, 2010 at 12:15:08PM -0700, Alex Williamson wrote:
 On Mon, 2010-12-13 at 21:06 +0200, Michael S. Tsirkin wrote:
  On Mon, Dec 13, 2010 at 11:59:16AM -0700, Alex Williamson wrote:
   On Mon, 2010-12-13 at 20:54 +0200, Michael S. Tsirkin wrote:
On Mon, Dec 13, 2010 at 11:00:44AM -0700, Alex Williamson wrote:
 On Mon, 2010-12-13 at 19:50 +0200, Michael S. Tsirkin wrote:
  On Mon, Dec 13, 2010 at 10:43:22AM -0700, Alex Williamson wrote:
   So, unfortunately, I stand by my original patch.
  
  What about the one that put -1 in saved index for a hotplugged 
  device?
 
 There are still examples that don't work even without hotplug 
 (example 2
 and example 3 after the reboot).  That hack limits the damage, but 
 still
 leaves a latent bug for reboot and doesn't address the non-hotplug
 scenarios.  So, I don't think it's worthwhile to pursue, and we
 shouldn't pretend we can use it to avoid bumping the version_id.
 Thanks,
 
 Alex

I guess when we bump it we tell users: migration is completely
borken to the old version, don't even try it.

Is there a way for libvirt to discover such incompatibilities
and avoid the migration?
   
   I don't know if libvirt has a way to query this in advance.  If a
   migration is attempted, the target will report:
   
   savevm: unsupported version 5 for '0000:00:03.0/rtl8139' v4
   
   And the source will continue running.  We waste plenty of bits getting
   to that point,
  
  Yes, this happens after all of memory has been migrated.
 
 Better late than never :^\

One other question: can we do the same by creating a new (empty)
section? As was discussed in the past this is easier for
downstreams to cherry-pick.

-- 
MST



Re: [Qemu-devel] Re: [PATCH] PCI: Bus number from the bridge, not the device

2010-12-13 Thread Michael S. Tsirkin
On Mon, Dec 13, 2010 at 01:04:23PM -0700, Alex Williamson wrote:
 On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote:
  On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote:
   pcibus_dev_print() was erroneously retrieving the device bus
   number from the secondary bus number offset of the device
   instead of the bridge above the device.  This ends of landing
   in the 2nd byte of the 3rd BAR for devices, which thankfully
   is usually zero.  pcibus_get_dev_path() copied this code,
   inheriting the same bug.  pcibus_get_dev_path() is used for
   ramblock naming, so changing it can effect migration.  However,
   I've only seen this byte be non-zero for an assigned device,
   which can't migrate anyway, so hopefully we won't run into
   any issues.
   
   Signed-off-by: Alex Williamson alex.william...@redhat.com
  
  Good catch. Applied.
 
 Um... submitted vs applied:
 
  PCI: Bus number from the bridge, not the device
  
 @@ -6,20 +8,28 @@
  number from the secondary bus number offset of the device
  instead of the bridge above the device.  This ends of landing
  in the 2nd byte of the 3rd BAR for devices, which thankfully
 -is usually zero.  pcibus_get_dev_path() copied this code,
 +is usually zero.
 +
 +Note: pcibus_get_dev_path() copied this code,
  inheriting the same bug.  pcibus_get_dev_path() is used for
  ramblock naming, so changing it can effect migration.  However,
  I've only seen this byte be non-zero for an assigned device,
  which can't migrate anyway, so hopefully we won't run into
  any issues.
  
 +This patch does not touch pcibus_get_dev_path, as
 +bus number is guest assigned for nested buses,
 +so using it for migration is broken anyway.
 +Fix it properly later.
 +
  Signed-off-by: Alex Williamson alex.william...@redhat.com
 +Signed-off-by: Michael S. Tsirkin m...@redhat.com
  
  diff --git a/hw/pci.c b/hw/pci.c
 -index 6d0934d..15416dd 100644
 +index 962886e..8f6fcf8 100644
  --- a/hw/pci.c
  +++ b/hw/pci.c
 -@@ -1940,8 +1940,7 @@ static void pcibus_dev_print(Monitor *mon, DeviceState 
 *dev, int indent)
 +@@ -1806,8 +1806,7 @@ static void pcibus_dev_print(Monitor *mon, DeviceState 
 *dev, int indent)
   
   monitor_printf(mon, %*sclass %s, addr %02x:%02x.%x, 
  pci id %04x:%04x (sub %04x:%04x)\n,
 @@ -29,14 +39,3 @@
  PCI_SLOT(d-devfn), PCI_FUNC(d-devfn),
  pci_get_word(d-config + PCI_VENDOR_ID),
  pci_get_word(d-config + PCI_DEVICE_ID),
 -@@ -1965,7 +1964,7 @@ static char *pcibus_get_dev_path(DeviceState *dev)
 - char path[16];
 - 
 - snprintf(path, sizeof(path), %04x:%02x:%02x.%x,
 -- pci_find_domain(d-bus), d-config[PCI_SECONDARY_BUS],
 -+ pci_find_domain(d-bus), pci_bus_num(d-bus),
 -  PCI_SLOT(d-devfn), PCI_FUNC(d-devfn));
 - 
 - return strdup(path);
 -
 -
 
 So the chunk that fixed the part that I was actually interested in got
 dropped even though the existing code is clearly wrong.  Yes, we still
 have issues with nested bridges (not that we have many of those), but
 until the Fix it properly later part comes along, can we please
 include the obvious bug fix?  Thanks,
 
 Alex

We can stick 0 in there - would that help?  I would much rather not
create a version where we put the bus number there.

-- 
MST
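
To illustrate the "2nd byte of the 3rd BAR" remark quoted above: a sketch that maps a configuration-space offset onto the BAR area of a normal (type 0) device. The offsets 0x10 (first BAR) and 0x19 (secondary bus number on a bridge) come from the standard PCI header layout; the sketch_ names are hypothetical.

```c
#include <assert.h>

/* Offsets from the standard PCI configuration header layout. */
#define SKETCH_PCI_BASE_ADDRESS_0 0x10 /* first BAR of a type 0 header */
#define SKETCH_PCI_BAR_END        0x28 /* one past the sixth BAR */
#define SKETCH_PCI_SECONDARY_BUS  0x19 /* only meaningful on a bridge */

/* 0-based index of the BAR containing this offset, or -1 if the
 * offset is outside the BAR area. */
static int sketch_bar_index(unsigned offset)
{
    if (offset < SKETCH_PCI_BASE_ADDRESS_0 || offset >= SKETCH_PCI_BAR_END) {
        return -1;
    }
    return (int)((offset - SKETCH_PCI_BASE_ADDRESS_0) / 4);
}

/* 0-based byte position of this offset within its BAR. */
static int sketch_bar_byte(unsigned offset)
{
    assert(sketch_bar_index(offset) >= 0);
    return (int)((offset - SKETCH_PCI_BASE_ADDRESS_0) % 4);
}
```

Offset 0x19 gives BAR index 2 and byte 1, that is, the second byte of the third BAR, so reading d->config[PCI_SECONDARY_BUS] on a non-bridge device returns BAR contents rather than a bus number.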



Re: [Qemu-devel] Re: [PATCH] PCI: Bus number from the bridge, not the device

2010-12-13 Thread Alex Williamson
On Tue, 2010-12-14 at 06:46 +0200, Michael S. Tsirkin wrote:
 On Mon, Dec 13, 2010 at 01:04:23PM -0700, Alex Williamson wrote:
  On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote:
   On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote:
pcibus_dev_print() was erroneously retrieving the device bus
number from the secondary bus number offset of the device
instead of the bridge above the device.  This ends of landing
in the 2nd byte of the 3rd BAR for devices, which thankfully
is usually zero.  pcibus_get_dev_path() copied this code,
inheriting the same bug.  pcibus_get_dev_path() is used for
ramblock naming, so changing it can effect migration.  However,
I've only seen this byte be non-zero for an assigned device,
which can't migrate anyway, so hopefully we won't run into
any issues.

Signed-off-by: Alex Williamson alex.william...@redhat.com
   
   Good catch. Applied.
  
  Um... submitted vs applied:
  
   PCI: Bus number from the bridge, not the device
   
  @@ -6,20 +8,28 @@
   number from the secondary bus number offset of the device
   instead of the bridge above the device.  This ends of landing
   in the 2nd byte of the 3rd BAR for devices, which thankfully
  -is usually zero.  pcibus_get_dev_path() copied this code,
  +is usually zero.
  +
  +Note: pcibus_get_dev_path() copied this code,
   inheriting the same bug.  pcibus_get_dev_path() is used for
   ramblock naming, so changing it can effect migration.  However,
   I've only seen this byte be non-zero for an assigned device,
   which can't migrate anyway, so hopefully we won't run into
   any issues.
   
  +This patch does not touch pcibus_get_dev_path, as
  +bus number is guest assigned for nested buses,
  +so using it for migration is broken anyway.
  +Fix it properly later.
  +
   Signed-off-by: Alex Williamson alex.william...@redhat.com
  +Signed-off-by: Michael S. Tsirkin m...@redhat.com
   
   diff --git a/hw/pci.c b/hw/pci.c
  -index 6d0934d..15416dd 100644
  +index 962886e..8f6fcf8 100644
   --- a/hw/pci.c
   +++ b/hw/pci.c
  -@@ -1940,8 +1940,7 @@ static void pcibus_dev_print(Monitor *mon, 
  DeviceState *dev, int indent)
  +@@ -1806,8 +1806,7 @@ static void pcibus_dev_print(Monitor *mon, 
  DeviceState *dev, int indent)

monitor_printf(mon, %*sclass %s, addr %02x:%02x.%x, 
   pci id %04x:%04x (sub %04x:%04x)\n,
  @@ -29,14 +39,3 @@
   PCI_SLOT(d-devfn), PCI_FUNC(d-devfn),
   pci_get_word(d-config + PCI_VENDOR_ID),
   pci_get_word(d-config + PCI_DEVICE_ID),
  -@@ -1965,7 +1964,7 @@ static char *pcibus_get_dev_path(DeviceState *dev)
  - char path[16];
  - 
  - snprintf(path, sizeof(path), %04x:%02x:%02x.%x,
  -- pci_find_domain(d-bus), d-config[PCI_SECONDARY_BUS],
  -+ pci_find_domain(d-bus), pci_bus_num(d-bus),
  -  PCI_SLOT(d-devfn), PCI_FUNC(d-devfn));
  - 
  - return strdup(path);
  -
  -
  
  So the chunk that fixed the part that I was actually interested in got
  dropped even though the existing code is clearly wrong.  Yes, we still
  have issues with nested bridges (not that we have many of those), but
  until the Fix it properly later part comes along, can we please
  include the obvious bug fix?  Thanks,
  
  Alex
 
 We can stick 0 in there - would that help?  I would much rather not
 create a version where we put the bus number there.

Yep, 0 is good enough until we solve the nested bridge problem.  Thanks,

Alex




Re: [Qemu-devel] Re: [PATCH] PCI: Bus number from the bridge, not the device

2010-12-13 Thread Michael S. Tsirkin
On Mon, Dec 13, 2010 at 09:49:21PM -0700, Alex Williamson wrote:
 On Tue, 2010-12-14 at 06:46 +0200, Michael S. Tsirkin wrote:
  On Mon, Dec 13, 2010 at 01:04:23PM -0700, Alex Williamson wrote:
   On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote:
On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote:
 pcibus_dev_print() was erroneously retrieving the device bus
 number from the secondary bus number offset of the device
 instead of the bridge above the device.  This ends of landing
 in the 2nd byte of the 3rd BAR for devices, which thankfully
 is usually zero.  pcibus_get_dev_path() copied this code,
 inheriting the same bug.  pcibus_get_dev_path() is used for
 ramblock naming, so changing it can effect migration.  However,
 I've only seen this byte be non-zero for an assigned device,
 which can't migrate anyway, so hopefully we won't run into
 any issues.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com

Good catch. Applied.
   
   Um... submitted vs applied:
   
PCI: Bus number from the bridge, not the device

   @@ -6,20 +8,28 @@
number from the secondary bus number offset of the device
instead of the bridge above the device.  This ends of landing
in the 2nd byte of the 3rd BAR for devices, which thankfully
   -is usually zero.  pcibus_get_dev_path() copied this code,
   +is usually zero.
   +
   +Note: pcibus_get_dev_path() copied this code,
inheriting the same bug.  pcibus_get_dev_path() is used for
ramblock naming, so changing it can effect migration.  However,
I've only seen this byte be non-zero for an assigned device,
which can't migrate anyway, so hopefully we won't run into
any issues.

   +This patch does not touch pcibus_get_dev_path, as
   +bus number is guest assigned for nested buses,
   +so using it for migration is broken anyway.
   +Fix it properly later.
   +
Signed-off-by: Alex Williamson alex.william...@redhat.com
   +Signed-off-by: Michael S. Tsirkin m...@redhat.com

diff --git a/hw/pci.c b/hw/pci.c
   -index 6d0934d..15416dd 100644
   +index 962886e..8f6fcf8 100644
--- a/hw/pci.c
+++ b/hw/pci.c
   -@@ -1940,8 +1940,7 @@ static void pcibus_dev_print(Monitor *mon, 
   DeviceState *dev, int indent)
   +@@ -1806,8 +1806,7 @@ static void pcibus_dev_print(Monitor *mon, 
   DeviceState *dev, int indent)
 
 monitor_printf(mon, %*sclass %s, addr %02x:%02x.%x, 
pci id %04x:%04x (sub %04x:%04x)\n,
   @@ -29,14 +39,3 @@
PCI_SLOT(d-devfn), PCI_FUNC(d-devfn),
pci_get_word(d-config + PCI_VENDOR_ID),
pci_get_word(d-config + PCI_DEVICE_ID),
   -@@ -1965,7 +1964,7 @@ static char *pcibus_get_dev_path(DeviceState *dev)
   - char path[16];
   - 
   - snprintf(path, sizeof(path), %04x:%02x:%02x.%x,
   -- pci_find_domain(d-bus), d-config[PCI_SECONDARY_BUS],
   -+ pci_find_domain(d-bus), pci_bus_num(d-bus),
   -  PCI_SLOT(d-devfn), PCI_FUNC(d-devfn));
   - 
   - return strdup(path);
   -
   -
   
   So the chunk that fixed the part that I was actually interested in got
   dropped even though the existing code is clearly wrong.  Yes, we still
   have issues with nested bridges (not that we have many of those), but
   until the Fix it properly later part comes along, can we please
   include the obvious bug fix?  Thanks,
   
   Alex
  
  We can stick 0 in there - would that help?  I would much rather not
  create a version where we put the bus number there.
 
 Yep, 0 is good enough until we solve the nested bridge problem.  Thanks,
 
 Alex

I'm surprised you see that it matters in practice, but ok.
Like this?

diff --git a/hw/pci.c b/hw/pci.c
index 254647b..81231c5 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1952,7 +1952,10 @@ static char *pcibus_get_dev_path(DeviceState *dev)
     char path[16];
 
     snprintf(path, sizeof(path), "%04x:%02x:%02x.%x",
-             pci_find_domain(d->bus), d->config[PCI_SECONDARY_BUS],
+             pci_find_domain(d->bus),
+             0 /* TODO: need a persistent path for nested buses.
+                * Note: pci_bus_num(d->bus) is not right as it's guest
+                * assigned. */,
              PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
 
     return strdup(path);
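
For illustration only, a standalone sketch of the path string this change would produce, with the bus field fixed at 0. The helpers are hypothetical stand-ins for PCI_SLOT() and PCI_FUNC(), assuming the usual devfn packing (slot in bits 7..3, function in bits 2..0). The domain is passed in directly rather than looked up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for PCI_SLOT() and PCI_FUNC(). */
static unsigned sketch_slot(unsigned devfn) { return (devfn >> 3) & 0x1f; }
static unsigned sketch_func(unsigned devfn) { return devfn & 0x07; }

/* Format "dddd:bb:ss.f" with the bus field hard-coded to 0. */
static void sketch_dev_path(char *path, size_t len,
                            unsigned domain, unsigned devfn)
{
    assert(len >= 13); /* 12 characters plus the terminating NUL */
    snprintf(path, len, "%04x:%02x:%02x.%x",
             domain & 0xffff, 0u, sketch_slot(devfn), sketch_func(devfn));
}

/* Both sides of a migration must derive the same string. */
static int sketch_same_path(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

For domain 0, slot 3, function 0 this yields 0000:00:03.0 regardless of what the guest has written into the device's config space, which is the property the ramblock name needs.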



[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate

2010-12-13 Thread Alex Williamson
On Tue, 2010-12-14 at 06:43 +0200, Michael S. Tsirkin wrote:
 On Mon, Dec 13, 2010 at 12:15:08PM -0700, Alex Williamson wrote:
  On Mon, 2010-12-13 at 21:06 +0200, Michael S. Tsirkin wrote:
   On Mon, Dec 13, 2010 at 11:59:16AM -0700, Alex Williamson wrote:
On Mon, 2010-12-13 at 20:54 +0200, Michael S. Tsirkin wrote:
 On Mon, Dec 13, 2010 at 11:00:44AM -0700, Alex Williamson wrote:
  On Mon, 2010-12-13 at 19:50 +0200, Michael S. Tsirkin wrote:
   On Mon, Dec 13, 2010 at 10:43:22AM -0700, Alex Williamson wrote:
So, unfortunately, I stand by my original patch.
   
   What about the one that put -1 in saved index for a hotplugged 
   device?
  
  There are still examples that don't work even without hotplug 
  (example 2
  and example 3 after the reboot).  That hack limits the damage, but 
  still
  leaves a latent bug for reboot and doesn't address the non-hotplug
  scenarios.  So, I don't think it's worthwhile to pursue, and we
  shouldn't pretend we can use it to avoid bumping the version_id.
  Thanks,
  
  Alex
 
 I guess when we bump it we tell users: migration is completely
 borken to the old version, don't even try it.
 
 Is there a way for libvirt to discover such incompatibilities
 and avoid the migration?

I don't know if libvirt has a way to query this in advance.  If a
migration is attempted, the target will report:

savevm: unsupported version 5 for '0000:00:03.0/rtl8139' v4

And the source will continue running.  We waste plenty of bits getting
to that point,
   
   Yes, this happens after all of memory has been migrated.
  
  Better late than never :^\
 
 One other question: can we do the same by creating a new (empty)
 section? As was discussed in the past this is easier for
 downstreams to cherry-pick.

The only way I can think to do that would be to have a subsection that
is always included, but saves no data.  That would force a failure on
new-old migration, but I don't think it really matches the intended
purpose of subsections and feels like it's adding cruft for no gain.
Maybe I'm missing something.  Juan, is there any advantage to trapping
this in a subsection?  Thanks,

Alex
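
As a rough illustration of why bumping version_id makes a new-to-old migration fail cleanly: a sketch of a load-side version check. The function name and return convention are hypothetical; this is not the savevm code itself, only the comparison behind the "unsupported version 5 ... v4" message quoted above.

```c
#include <assert.h>

/* Hypothetical load-side check: accept an incoming section only if
 * the target knows its version. Returns 0 to accept, -1 to reject. */
static int sketch_check_version(int incoming_version, int supported_version)
{
    assert(supported_version > 0);
    if (incoming_version > supported_version) {
        return -1; /* e.g. source sends v5, target only supports v4 */
    }
    return 0;
}
```

An always-present empty subsection would trigger a similar rejection on an old target, only through an unknown section name instead of a version number.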




Re: [Qemu-devel] Re: [PATCH] PCI: Bus number from the bridge, not the device

2010-12-13 Thread Alex Williamson
On Tue, 2010-12-14 at 06:57 +0200, Michael S. Tsirkin wrote:
 On Mon, Dec 13, 2010 at 09:49:21PM -0700, Alex Williamson wrote:
  On Tue, 2010-12-14 at 06:46 +0200, Michael S. Tsirkin wrote:
   On Mon, Dec 13, 2010 at 01:04:23PM -0700, Alex Williamson wrote:
On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote:
 On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote:
  pcibus_dev_print() was erroneously retrieving the device bus
  number from the secondary bus number offset of the device
  instead of the bridge above the device.  This ends of landing
  in the 2nd byte of the 3rd BAR for devices, which thankfully
  is usually zero.  pcibus_get_dev_path() copied this code,
  inheriting the same bug.  pcibus_get_dev_path() is used for
  ramblock naming, so changing it can effect migration.  However,
  I've only seen this byte be non-zero for an assigned device,
  which can't migrate anyway, so hopefully we won't run into
  any issues.
  
  Signed-off-by: Alex Williamson alex.william...@redhat.com
 
 Good catch. Applied.

Um... submitted vs applied:

 PCI: Bus number from the bridge, not the device
 
@@ -6,20 +8,28 @@
 number from the secondary bus number offset of the device
 instead of the bridge above the device.  This ends of landing
 in the 2nd byte of the 3rd BAR for devices, which thankfully
-is usually zero.  pcibus_get_dev_path() copied this code,
+is usually zero.
+
+Note: pcibus_get_dev_path() copied this code,
 inheriting the same bug.  pcibus_get_dev_path() is used for
 ramblock naming, so changing it can effect migration.  However,
 I've only seen this byte be non-zero for an assigned device,
 which can't migrate anyway, so hopefully we won't run into
 any issues.
 
+This patch does not touch pcibus_get_dev_path, as
+bus number is guest assigned for nested buses,
+so using it for migration is broken anyway.
+Fix it properly later.
+
 Signed-off-by: Alex Williamson alex.william...@redhat.com
+Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
 diff --git a/hw/pci.c b/hw/pci.c
-index 6d0934d..15416dd 100644
+index 962886e..8f6fcf8 100644
 --- a/hw/pci.c
 +++ b/hw/pci.c
-@@ -1940,8 +1940,7 @@ static void pcibus_dev_print(Monitor *mon, 
DeviceState *dev, int indent)
+@@ -1806,8 +1806,7 @@ static void pcibus_dev_print(Monitor *mon, 
DeviceState *dev, int indent)
  
  monitor_printf(mon, %*sclass %s, addr %02x:%02x.%x, 
 pci id %04x:%04x (sub %04x:%04x)\n,
@@ -29,14 +39,3 @@
 PCI_SLOT(d-devfn), PCI_FUNC(d-devfn),
 pci_get_word(d-config + PCI_VENDOR_ID),
 pci_get_word(d-config + PCI_DEVICE_ID),
-@@ -1965,7 +1964,7 @@ static char *pcibus_get_dev_path(DeviceState 
*dev)
- char path[16];
- 
- snprintf(path, sizeof(path), %04x:%02x:%02x.%x,
-- pci_find_domain(d-bus), d-config[PCI_SECONDARY_BUS],
-+ pci_find_domain(d-bus), pci_bus_num(d-bus),
-  PCI_SLOT(d-devfn), PCI_FUNC(d-devfn));
- 
- return strdup(path);
-
-

So the chunk that fixed the part that I was actually interested in got
dropped even though the existing code is clearly wrong.  Yes, we still
have issues with nested bridges (not that we have many of those), but
until the Fix it properly later part comes along, can we please
include the obvious bug fix?  Thanks,

Alex
   
   We can stick 0 in there - would that help?  I would much rather not
   create a version where we put the bus number there.
  
  Yep, 0 is good enough until we solve the nested bridge problem.  Thanks,
  
  Alex
 
 I'm surprised you see that it matters in practice, but ok.
 Like this?

I've only ever seen config[PCI_SECONDARY_BUS] be non-zero for an
assigned device, so I'm pretty sure we're not going to hurt migration,
but the code is clearly wrong and I'd like to make sure we don't trip on
a migration failure for a minor device config space change.

 diff --git a/hw/pci.c b/hw/pci.c
 index 254647b..81231c5 100644
 --- a/hw/pci.c
 +++ b/hw/pci.c
 @@ -1952,7 +1952,10 @@ static char *pcibus_get_dev_path(DeviceState *dev)
      char path[16];
  
      snprintf(path, sizeof(path), "%04x:%02x:%02x.%x",
 -             pci_find_domain(d->bus), d->config[PCI_SECONDARY_BUS],
 +             pci_find_domain(d->bus),
 +             0 /* TODO: need a persistent path for nested buses.
 +                * Note: pci_bus_num(d->bus) is not right as it's guest
 +                * assigned. */,
               PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
  
      return strdup(path);

Sure, that's fine.

Acked-by: Alex Williamson alex.william...@redhat.com

Thanks,

Alex




[Qemu-devel] SMBIOS support in Qemu?

2010-12-13 Thread Anjali Kulkarni
Hi,

Which version of Qemu contains the Smbios code? If I have to get the code in my 
repo, is there any place I can get the complete set of patches?

Thanks
Anjali



Re: [Qemu-devel] SMBIOS support in Qemu?

2010-12-13 Thread Alex Williamson
On Mon, Dec 13, 2010 at 10:47 PM, Anjali Kulkarni anj...@juniper.net wrote:

 Hi,

 Which version of Qemu contains the Smbios code? If I have to get the code in 
 my repo, is there any place I can get the complete set of patches?

We've had SMBIOS support for a couple of years; it should be in any
recent release or distribution.  The SMBIOS tables are generated in
SeaBIOS in src/smbios.* [1].  Support for loading tables and fields
from qemu is in hw/smbios.* [2].

Alex

[1] http://www.seabios.org/Download
[2] http://wiki.qemu.org/Download