[Qemu-devel] Re: [PATCH] vga: Declare as little endian
On 11.12.2010, at 23:33, Blue Swirl wrote:
> This patch replaces explicit bswaps with endianness hints to the mmio layer.
>
> CC: Alexander Graf ag...@suse.de
> Signed-off-by: Blue Swirl blauwir...@gmail.com

Acked-by: Alexander Graf ag...@suse.de

Alex
[Qemu-devel] (no subject)
Please find a new block driver that, if libiscsi is present on the system, links with this userspace client library and lets qemu access iSCSI devices directly without exposing them to the host. The library is multiplatform and available from git://github.com/sahlberg/libiscsi.git
[Qemu-devel] [PATCH] libiscsi
This patch adds a new block driver: block/iscsi.c

This driver interfaces with the multiplatform POSIX library for iSCSI
initiator/client access to iSCSI devices hosted at
git://github.com/sahlberg/libiscsi.git

The patch adds the driver to interface with the iSCSI library.
It also updates the configure script to
* by default, probe whether libiscsi is available and, if so, build
  qemu against libiscsi.
* --enable-libiscsi
  Force a build against libiscsi. If libiscsi is not available
  the build will fail.
* --disable-libiscsi
  Do not link against libiscsi, even if it is available.

When linked with libiscsi, qemu gains support to access iSCSI resources
such as disks and cdroms directly, without having to make the devices
visible to the host.

You can specify devices using an iSCSI URL of the form:
iscsi://host[:port]/target-iqn-name/lun

Example:
-drive file=iscsi://10.1.1.1:3260/iqn.ronnie.test/1
-cdrom iscsi://10.1.1.1:3260/iqn.ronnie.test/2

Signed-off-by: Ronnie Sahlberg ronniesahlb...@gmail.com
---
 Makefile.objs |    2 +-
 block/iscsi.c |  528 +
 configure     |   29 +++
 3 files changed, 558 insertions(+), 1 deletions(-)
 create mode 100644 block/iscsi.c

diff --git a/Makefile.objs b/Makefile.objs
index cebb945..81731c5 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -22,7 +22,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vv
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
 block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
-block-nested-$(CONFIG_POSIX) += raw-posix.o
+block-nested-$(CONFIG_POSIX) += raw-posix.o iscsi.o
 block-nested-$(CONFIG_CURL) += curl.o

 block-obj-y += $(addprefix block/, $(block-nested-y))
diff --git a/block/iscsi.c b/block/iscsi.c
new file mode 100644
index 000..fba5ee6
--- /dev/null
+++ b/block/iscsi.c
@@ -0,0 +1,528 @@
+/*
+ * QEMU Block driver for iSCSI images
+ *
+ * Copyright (c) 2010 Ronnie Sahlberg ronniesahlb...@gmail.com
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "config-host.h"
+#ifdef CONFIG_LIBISCSI
+
+#include <poll.h>
+#include "sysemu.h"
+#include "qemu-common.h"
+#include "qemu-error.h"
+#include "block_int.h"
+
+#include <iscsi/iscsi.h>
+#include <iscsi/scsi-lowlevel.h>
+
+
+typedef struct ISCSILUN {
+    struct iscsi_context *iscsi;
+    int lun;
+    int block_size;
+    unsigned long num_blocks;
+} ISCSILUN;
+
+typedef struct ISCSIAIOCB {
+    BlockDriverAIOCB common;
+    QEMUIOVector *qiov;
+    QEMUBH *bh;
+    ISCSILUN *iscsilun;
+    int canceled;
+    int status;
+    size_t read_size;
+} ISCSIAIOCB;
+
+struct iscsi_task {
+    ISCSILUN *iscsilun;
+    int status;
+    int complete;
+};
+
+static int
+iscsi_is_inserted(BlockDriverState *bs)
+{
+    ISCSILUN *iscsilun = bs->opaque;
+    struct iscsi_context *iscsi = iscsilun->iscsi;
+
+    return iscsi_is_logged_in(iscsi);
+}
+
+
+static void
+iscsi_aio_cancel(BlockDriverAIOCB *blockacb)
+{
+    ISCSIAIOCB *acb = (ISCSIAIOCB *)blockacb;
+
+    acb->status = -EIO;
+    acb->common.cb(acb->common.opaque, acb->status);
+    acb->canceled = 1;
+}
+
+static AIOPool iscsi_aio_pool = {
+    .aiocb_size = sizeof(ISCSIAIOCB),
+    .cancel = iscsi_aio_cancel,
+};
+
+
+static void iscsi_process_read(void *arg);
+static void iscsi_process_write(void *arg);
+
+static void
+iscsi_set_events(ISCSILUN *iscsilun)
+{
+    struct iscsi_context *iscsi = iscsilun->iscsi;
+
+    qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), iscsi_process_read,
+                            (iscsi_which_events(iscsi)&POLLOUT)
+                            ?iscsi_process_write:NULL,
+                            NULL, NULL, iscsilun);
+}
+
+static void
+iscsi_process_read(void *arg)
+{
+    ISCSILUN *iscsilun = arg;
+    struct iscsi_context *iscsi = iscsilun->iscsi;
+
+    iscsi_service(iscsi, POLLIN);
+
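For readers who want to see how the iscsi://host[:port]/target-iqn-name/lun form from the patch description splits into its parts, here is a minimal sketch. It is not code from the patch and not a libiscsi API: `struct iscsi_url_parts` and `parse_iscsi_url()` are hypothetical names, the fallback to port 3260 when the port is omitted is an assumption, and IPv6 literals are not handled.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the URL pieces; not a libiscsi type. */
struct iscsi_url_parts {
    char host[256];
    int  port;          /* assumed to default to 3260 when omitted */
    char target[256];   /* target IQN name */
    int  lun;
};

/* Split iscsi://host[:port]/target-iqn-name/lun.
 * Returns 0 on success, -1 on a malformed URL. */
static int parse_iscsi_url(const char *url, struct iscsi_url_parts *out)
{
    static const char prefix[] = "iscsi://";
    const char *p, *slash1, *slash2, *colon;
    char *end;
    size_t len;
    long val;

    if (strncmp(url, prefix, sizeof(prefix) - 1) != 0) {
        return -1;
    }
    p = url + sizeof(prefix) - 1;

    /* The first '/' ends host[:port], the second ends the target name. */
    slash1 = strchr(p, '/');
    if (!slash1) {
        return -1;
    }
    slash2 = strchr(slash1 + 1, '/');
    if (!slash2) {
        return -1;
    }

    colon = memchr(p, ':', (size_t)(slash1 - p));
    len = (size_t)((colon ? colon : slash1) - p);
    if (len == 0 || len >= sizeof(out->host)) {
        return -1;
    }
    memcpy(out->host, p, len);
    out->host[len] = '\0';

    out->port = 3260;
    if (colon) {
        val = strtol(colon + 1, &end, 10);
        if (end == colon + 1 || end != slash1 || val <= 0 || val > 65535) {
            return -1;
        }
        out->port = (int)val;
    }

    len = (size_t)(slash2 - (slash1 + 1));
    if (len == 0 || len >= sizeof(out->target)) {
        return -1;
    }
    memcpy(out->target, slash1 + 1, len);
    out->target[len] = '\0';

    val = strtol(slash2 + 1, &end, 10);
    if (end == slash2 + 1 || *end != '\0' || val < 0 || val > INT_MAX) {
        return -1;
    }
    out->lun = (int)val;
    return 0;
}
```

With the example from the patch description, iscsi://10.1.1.1:3260/iqn.ronnie.test/1 would yield host 10.1.1.1, port 3260, target iqn.ronnie.test and LUN 1.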
[Qemu-devel] Re: [PATCH] fix qruncom compilation problems
On 12/11/2010 03:42 PM, Stefano Bonifazi wrote:
> Surely I do understand you! Your help has been very useful and much appreciated already, thank you! Could you direct me to somebody who's working on it? Some TCG guru who could understand immediately what's wrong? :)
> I have noticed so far that each question on this mailing list is answered by only one QEMU developer. Is that a sort of policy or just a coincidence?

It's a coincidence. :)

Paolo
[Qemu-devel] Re: [RFC][PATCH v5 08/21] virtagent: add agent_viewfile qmp/hmp command
On 12/10/10 18:09, Michael Roth wrote:
> I think with strictly enforced size limits the major liability for viewfile is, as you mentioned, users using it to view binary data or carefully crafted files that can mess up or fool users/shells/programs interpreting monitor output. But plain text does not include escape sequences, so it's completely reasonable that we'd scrape them. And I'm not sure if a (qemu) in the text is a potential liability. Would there be any other issues to consider? If we can guard against those things, do you agree it wouldn't be an inherently dangerous interface?
>
> Stateful, asynchronous RPCs like copyfile and exec are not really something I'd planned for the initial release. I think they'll take some time to get right, and a simple low-risk interface to cover what I'm fairly sure is the most common use case seems reasonable.

I am still wary of relying on strict limit enforcement. It is the sort of thing that will eventually change without us noticing, and we end up with a security hole.

IMHO QEMU should not try to do these sorts of things; instead it should provide the transport and control services. I don't think file viewing belongs in QEMU at all. I would be a lot more comfortable if this was implemented as a standalone monitor interface that connected to QEMU's QMP interface. I could then use QMP to perform actions like copying the file to /tmp, and if viewing the file caused the monitor to lock up, we wouldn't lose the guest.

This could indeed be the start of an external monitor :)

Cheers,
Jes
[Qemu-devel] Re: [SeaBIOS] seabios: acpi: add _RMV control method for PCI devices
On 12/13/2010 02:00 AM, Marcelo Tosatti wrote:
> On Sat, Dec 11, 2010 at 09:39:30AM +0200, Avi Kivity wrote:
>> On 12/08/2010 07:08 PM, Marcelo Tosatti wrote:
>>> Use _RMV method to indicate whether device can be removed. Data is retrieved from QEMU via I/O port 0xae0c.
>>
>> Where did this port come from?
>
> It's the next available address after PCI EJ base, used for QEMU-ACPI hotplug communication.
>
>> What's the protocol?
>
> ACPI reads the 32-bit field indicating the return value of the _RMV method (which is used by Windows to decide removability). 1-bit per slot. More ports have to be registered if more buses are added.
>
>> Maybe we should do this via fw_cfg.
>
> I don't see a need for it? (yes, it might be possible, but I'm not familiar enough with AML).

To avoid adding tons of undocumented I/O ports, and to allow discoverability (what happens with a new seabios on old qemu)?

We could do this in two ways: by adding a fwcfg client to the DSDT, or by copying the information to system memory, and referencing system memory from the DSDT.

--
error compiling committee.c: too many arguments to function
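As a rough illustration of the "32-bit field, 1 bit per slot" protocol described above, the sketch below decodes such a value. It is not SeaBIOS or QEMU code: the function names are hypothetical, and the mapping of bit n to slot n with a set bit meaning "removable" is an assumption that the thread does not spell out.

```c
#include <assert.h>
#include <stdint.h>

/* I/O port named in the thread; reading it is left out of this sketch. */
#define SKETCH_PCI_RMV_PORT 0xae0c

/* Assumption: bit n describes slot n, and a set bit means _RMV returns 1
 * (device can be removed). Slots past 31 would need another port. */
static int sketch_slot_is_removable(uint32_t rmv_bits, unsigned slot)
{
    if (slot >= 32) {
        return 0;
    }
    return (int)((rmv_bits >> slot) & 1u);
}

/* Count how many of the 32 slots are reported as removable. */
static unsigned sketch_count_removable(uint32_t rmv_bits)
{
    unsigned n = 0;

    while (rmv_bits) {
        n += rmv_bits & 1u;
        rmv_bits >>= 1;
    }
    return n;
}
```

One bus fits in a single 32-bit read under this layout, which matches the remark that more ports are needed once more buses are added.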
[Qemu-devel] Re: [PATCH V2] qemu, kvm: Enable user space NMI injection for kvm guest
On 12/10/2010 04:41 PM, Jan Kiszka wrote:
> Am 10.12.2010 08:42, Lai Jiangshan wrote:
>> Make use of the new KVM_NMI IOCTL to send NMIs into the KVM guest if the user space raised them. (example: qemu monitor's nmi command)
>>
>> Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
>> ---
>> diff --git a/configure b/configure
>> index 2917874..f6f9362 100755
>> --- a/configure
>> +++ b/configure
>> @@ -1646,6 +1646,9 @@ if test "$kvm" != "no" ; then
>>  #if !defined(KVM_CAP_DESTROY_MEMORY_REGION_WORKS)
>>  #error Missing KVM capability KVM_CAP_DESTROY_MEMORY_REGION_WORKS
>>  #endif
>> +#if !defined(KVM_CAP_USER_NMI)
>> +#error Missing KVM capability KVM_CAP_USER_NMI
>> +#endif
>>  int main(void) { return 0; }
>>  EOF
>>    if test "$kerneldir" != "" ; then
>
> That's what I meant.
>
> We also have a runtime check for KVM_CAP_DESTROY_MEMORY_REGION_WORKS on kvm init, but IMHO adding the same for KVM_CAP_USER_NMI would be overkill. So...
>
>> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
>> index 7dfc357..755f8c9 100644
>> --- a/target-i386/kvm.c
>> +++ b/target-i386/kvm.c
>> @@ -1417,6 +1417,13 @@ int kvm_arch_get_registers(CPUState *env)
>>
>>  int kvm_arch_pre_run(CPUState *env, struct kvm_run *run)
>>  {
>> +    /* Inject NMI */
>> +    if (env->interrupt_request & CPU_INTERRUPT_NMI) {
>> +        env->interrupt_request &= ~CPU_INTERRUPT_NMI;
>> +        DPRINTF("injected NMI\n");
>> +        kvm_vcpu_ioctl(env, KVM_NMI);
>> +    }
>> +
>>      /* Try to inject an interrupt if the guest can accept it */
>>      if (run->ready_for_interrupt_injection &&
>>          (env->interrupt_request & CPU_INTERRUPT_HARD) &&
>
> Acked-by: Jan Kiszka jan.kis...@siemens.com

Hi, Avi

Could you apply this patch or give me any comments or suggestions?

Thanks,
Lai
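The core of the quoted hunk is a test-and-clear on a pending-interrupt bit before each guest entry. The sketch below shows that pattern in isolation so it can be compiled outside QEMU. Everything in it is a stand-in: `struct sketch_cpu`, `SKETCH_INTERRUPT_NMI` (with an arbitrary bit value) and `sketch_inject_nmi()` are hypothetical, and the counter only takes the place of the real `kvm_vcpu_ioctl(env, KVM_NMI)` call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit value; the real CPU_INTERRUPT_NMI lives in QEMU headers. */
#define SKETCH_INTERRUPT_NMI 0x200u

/* Minimal stand-in for the per-vcpu state touched by the patch. */
struct sketch_cpu {
    uint32_t interrupt_request;  /* pending interrupt bits */
    int nmi_injected;            /* counts simulated KVM_NMI ioctls */
};

/* Stands in for kvm_vcpu_ioctl(env, KVM_NMI). */
static void sketch_inject_nmi(struct sketch_cpu *cpu)
{
    cpu->nmi_injected++;
}

/* Mirrors the shape of the added code: if an NMI is pending, clear the
 * request bit first, then hand the NMI to the kernel exactly once. */
static void sketch_pre_run(struct sketch_cpu *cpu)
{
    if (cpu->interrupt_request & SKETCH_INTERRUPT_NMI) {
        cpu->interrupt_request &= ~SKETCH_INTERRUPT_NMI;
        sketch_inject_nmi(cpu);
    }
}
```

Clearing the bit before injecting means a second pass through the pre-run hook does not inject the same NMI again, while other pending bits are left untouched.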
[Qemu-devel] Re: [SeaBIOS] seabios: acpi: add _RMV control method for PCI devices
On Mon, Dec 13, 2010 at 10:41:25AM +0200, Avi Kivity wrote:
> On 12/13/2010 02:00 AM, Marcelo Tosatti wrote:
>> On Sat, Dec 11, 2010 at 09:39:30AM +0200, Avi Kivity wrote:
>>> On 12/08/2010 07:08 PM, Marcelo Tosatti wrote:
>>>> Use _RMV method to indicate whether device can be removed. Data is retrieved from QEMU via I/O port 0xae0c.
>>>
>>> Where did this port come from?
>>
>> It's the next available address after PCI EJ base, used for QEMU-ACPI hotplug communication.
>>
>>> What's the protocol?
>>
>> ACPI reads the 32-bit field indicating the return value of the _RMV method (which is used by Windows to decide removability). 1-bit per slot. More ports have to be registered if more buses are added.
>>
>>> Maybe we should do this via fw_cfg.
>>
>> I don't see a need for it? (yes, it might be possible, but I'm not familiar enough with AML).
>
> To avoid adding tons of undocumented I/O ports, and to allow discoverability (what happens with a new seabios on old qemu)?

We already have our own mini PCI hot-plug controller at I/O port 0xae00. The patch just extends its functionality a bit. Logically this functionality belongs there.

> We could do this in two ways: by adding a fwcfg client to the DSDT, or by copying the information to system memory, and referencing system memory from the DSDT.

This is even worse. It requires some fixed address to be shared between DSDT and SeaBIOS (or alternatively SeaBIOS will have to generate this part of the DSDT dynamically).

--
Gleb.
[Qemu-devel] Re: [SeaBIOS] seabios: acpi: add _RMV control method for PCI devices
On 12/13/2010 10:49 AM, Gleb Natapov wrote:
> On Mon, Dec 13, 2010 at 10:41:25AM +0200, Avi Kivity wrote:
>> On 12/13/2010 02:00 AM, Marcelo Tosatti wrote:
>>> On Sat, Dec 11, 2010 at 09:39:30AM +0200, Avi Kivity wrote:
>>>> On 12/08/2010 07:08 PM, Marcelo Tosatti wrote:
>>>>> Use _RMV method to indicate whether device can be removed. Data is retrieved from QEMU via I/O port 0xae0c.
>>>>
>>>> Where did this port come from?
>>>
>>> It's the next available address after PCI EJ base, used for QEMU-ACPI hotplug communication.
>>>
>>>> What's the protocol?
>>>
>>> ACPI reads the 32-bit field indicating the return value of the _RMV method (which is used by Windows to decide removability). 1-bit per slot. More ports have to be registered if more buses are added.
>>>
>>>> Maybe we should do this via fw_cfg.
>>>
>>> I don't see a need for it? (yes, it might be possible, but I'm not familiar enough with AML).
>>
>> To avoid adding tons of undocumented I/O ports, and to allow discoverability (what happens with a new seabios on old qemu)?
>
> We already have our own mini PCI hot-plug controller at I/O port 0xae00. The patch just extends its functionality a bit. Logically this functionality belongs there.

Well, at least it should be documented.

We could also deprecate the old port and use fwcfg for everything (try fwcfg, fall back to ae00).

>> We could do this in two ways: by adding a fwcfg client to the DSDT, or by copying the information to system memory, and referencing system memory from the DSDT.
>
> This is even worse. It requires some fixed address to be shared between DSDT and SeaBIOS (or alternatively SeaBIOS will have to generate this part of the DSDT dynamically).

Could easily be something in the F segment.

--
error compiling committee.c: too many arguments to function
[Qemu-devel] Re: [SeaBIOS] seabios: acpi: add _RMV control method for PCI devices
On Mon, Dec 13, 2010 at 10:53:07AM +0200, Avi Kivity wrote:
> On 12/13/2010 10:49 AM, Gleb Natapov wrote:
>> On Mon, Dec 13, 2010 at 10:41:25AM +0200, Avi Kivity wrote:
>>> On 12/13/2010 02:00 AM, Marcelo Tosatti wrote:
>>>> On Sat, Dec 11, 2010 at 09:39:30AM +0200, Avi Kivity wrote:
>>>>> On 12/08/2010 07:08 PM, Marcelo Tosatti wrote:
>>>>>> Use _RMV method to indicate whether device can be removed. Data is retrieved from QEMU via I/O port 0xae0c.
>>>>>
>>>>> Where did this port come from?
>>>>
>>>> It's the next available address after PCI EJ base, used for QEMU-ACPI hotplug communication.
>>>>
>>>>> What's the protocol?
>>>>
>>>> ACPI reads the 32-bit field indicating the return value of the _RMV method (which is used by Windows to decide removability). 1-bit per slot. More ports have to be registered if more buses are added.
>>>>
>>>>> Maybe we should do this via fw_cfg.
>>>>
>>>> I don't see a need for it? (yes, it might be possible, but I'm not familiar enough with AML).
>>>
>>> To avoid adding tons of undocumented I/O ports, and to allow discoverability (what happens with a new seabios on old qemu)?
>>
>> We already have our own mini PCI hot-plug controller at I/O port 0xae00. The patch just extends its functionality a bit. Logically this functionality belongs there.
>
> Well, at least it should be documented.

Agree.

> We could also deprecate the old port and use fwcfg for everything (try fwcfg, fall back to ae00).

fwcfg is designed to be simple, for easy use by firmware. It has two ports, one for the index and another for the value, so its use is racy in a multi-threaded SMP environment. DSDT code is executed in such an environment. There is a lock facility in AML, but why complicate things?

>>> We could do this in two ways: by adding a fwcfg client to the DSDT, or by copying the information to system memory, and referencing system memory from the DSDT.
>>
>> This is even worse. It requires some fixed address to be shared between DSDT and SeaBIOS (or alternatively SeaBIOS will have to generate this part of the DSDT dynamically).
>
> Could easily be something in the F segment.

Yes, but then we will have two magic values (fwcfg index + address in F segment) instead of one (address of PCI hot-plug controller).

--
Gleb.
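The race described above comes from the selector being shared state: one port chooses the item, the other reads it, and nothing ties the two accesses together. The sketch below replays one bad interleaving deterministically in a single thread. It is a toy model with hypothetical names (`struct sketch_fwcfg`, `sketch_select()`, `sketch_read_byte()`), not QEMU's fw_cfg implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an index/value port pair; not QEMU's fw_cfg. */
struct sketch_fwcfg {
    uint16_t selected;        /* last index written to the selector port */
    size_t offset;            /* read position inside the selected item */
    const uint8_t *items[2];  /* item payloads, indexed by selector */
};

/* Writing the selector port picks an item and rewinds the read position. */
static void sketch_select(struct sketch_fwcfg *dev, uint16_t index)
{
    dev->selected = index;
    dev->offset = 0;
}

/* Reading the data port returns the next byte of whatever is selected now. */
static uint8_t sketch_read_byte(struct sketch_fwcfg *dev)
{
    return dev->items[dev->selected][dev->offset++];
}

/* Replays one bad interleaving: CPU A selects item 0, CPU B selects item 1
 * before A gets to read, then A reads. Returns the byte A ends up with. */
static uint8_t sketch_racy_read(struct sketch_fwcfg *dev)
{
    sketch_select(dev, 0);         /* CPU A: wants item 0 */
    sketch_select(dev, 1);         /* CPU B: slips in its own select */
    return sketch_read_byte(dev);  /* CPU A: reads, but gets item 1's data */
}
```

Holding a lock around the select-plus-read pair would remove this interleaving; that is the AML lock the message mentions and would rather avoid.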
[Qemu-devel] Re: [SeaBIOS] seabios: acpi: add _RMV control method for PCI devices
On 12/13/2010 11:03 AM, Gleb Natapov wrote:
>> We could also deprecate the old port and use fwcfg for everything (try fwcfg, fall back to ae00).
>
> fwcfg is designed to be simple, for easy use by firmware. It has two ports, one for the index and another for the value, so its use is racy in a multi-threaded SMP environment. DSDT code is executed in such an environment. There is a lock facility in AML, but why complicate things?

I prefer to remove complexity from interfaces and have it in the implementation instead.

>>>> We could do this in two ways: by adding a fwcfg client to the DSDT, or by copying the information to system memory, and referencing system memory from the DSDT.
>>>
>>> This is even worse. It requires some fixed address to be shared between DSDT and SeaBIOS (or alternatively SeaBIOS will have to generate this part of the DSDT dynamically).
>>
>> Could easily be something in the F segment.
>
> Yes, but then we will have two magic values (fwcfg index + address in F segment) instead of one (address of PCI hot-plug controller).

The F segment address is internal to SeaBIOS; it isn't an external interface.

--
error compiling committee.c: too many arguments to function
[Qemu-devel] Re: [SeaBIOS] seabios: acpi: add _RMV control method for PCI devices
On Mon, Dec 13, 2010 at 11:10:38AM +0200, Avi Kivity wrote:
> On 12/13/2010 11:03 AM, Gleb Natapov wrote:
>>> We could also deprecate the old port and use fwcfg for everything (try fwcfg, fall back to ae00).
>>
>> fwcfg is designed to be simple, for easy use by firmware. It has two ports, one for the index and another for the value, so its use is racy in a multi-threaded SMP environment. DSDT code is executed in such an environment. There is a lock facility in AML, but why complicate things?
>
> I prefer to remove complexity from interfaces and have it in the implementation instead.

I prefer whatever is simpler :) simpler == fewer bugs. And it is not as if we are discussing a new interface here. You want to deprecate an existing interface in favor of something that was not designed to handle the task.

>>>>> We could do this in two ways: by adding a fwcfg client to the DSDT, or by copying the information to system memory, and referencing system memory from the DSDT.
>>>>
>>>> This is even worse. It requires some fixed address to be shared between DSDT and SeaBIOS (or alternatively SeaBIOS will have to generate this part of the DSDT dynamically).
>>>
>>> Could easily be something in the F segment.
>>
>> Yes, but then we will have two magic values (fwcfg index + address in F segment) instead of one (address of PCI hot-plug controller).
>
> The F segment address is internal to SeaBIOS; it isn't an external interface.

Depends on how you define an external interface. It can be considered an interface between OSPM and firmware. The next time the layout of the F segment changes in SeaBIOS, will you remember to fix the DSDT too?

--
Gleb.
[Qemu-devel] [PATCH] qemu-io: Add discard command
discard [-Cq] off len -- discards a number of bytes at a specified offset

 discards a range of bytes from the given offset

 Example:
 'discard 512 1k' - discards 1 kilobyte from 512 bytes into the file

 Discards a segment of the currently open file.
 -C, -- report statistics in a machine parsable format
 -q, -- quiet mode, do not show I/O statistics

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 qemu-io.c |   88 +
 1 files changed, 88 insertions(+), 0 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index ff353eb..9de5361 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -1394,6 +1394,93 @@ static const cmdinfo_t info_cmd = {
     .oneline    = "prints information about the current file",
 };

+static void
+discard_help(void)
+{
+    printf(
+"\n"
+" discards a range of bytes from the given offset\n"
+"\n"
+" Example:\n"
+" 'discard 512 1k' - discards 1 kilobyte from 512 bytes into the file\n"
+"\n"
+" Discards a segment of the currently open file.\n"
+" -C, -- report statistics in a machine parsable format\n"
+" -q, -- quiet mode, do not show I/O statistics\n"
+"\n");
+}
+
+static int discard_f(int argc, char **argv);
+
+static const cmdinfo_t discard_cmd = {
+    .name       = "discard",
+    .altname    = "d",
+    .cfunc      = discard_f,
+    .argmin     = 2,
+    .argmax     = -1,
+    .args       = "[-Cq] off len",
+    .oneline    = "discards a number of bytes at a specified offset",
+    .help       = discard_help,
+};
+
+static int
+discard_f(int argc, char **argv)
+{
+    struct timeval t1, t2;
+    int Cflag = 0, qflag = 0;
+    int c, ret;
+    int64_t offset;
+    int count;
+
+    while ((c = getopt(argc, argv, "Cq")) != EOF) {
+        switch (c) {
+        case 'C':
+            Cflag = 1;
+            break;
+        case 'q':
+            qflag = 1;
+            break;
+        default:
+            return command_usage(&discard_cmd);
+        }
+    }
+
+    if (optind != argc - 2) {
+        return command_usage(&discard_cmd);
+    }
+
+    offset = cvtnum(argv[optind]);
+    if (offset < 0) {
+        printf("non-numeric offset argument -- %s\n", argv[optind]);
+        return 0;
+    }
+
+    optind++;
+    count = cvtnum(argv[optind]);
+    if (count < 0) {
+        printf("non-numeric length argument -- %s\n", argv[optind]);
+        return 0;
+    }
+
+    gettimeofday(&t1, NULL);
+    ret = bdrv_discard(bs, offset, count);
+    gettimeofday(&t2, NULL);
+
+    if (ret < 0) {
+        printf("discard failed: %s\n", strerror(-ret));
+        goto out;
+    }
+
+    /* Finally, report back -- -C gives a parsable format */
+    if (!qflag) {
+        t2 = tsub(t2, t1);
+        print_report("discard", &t2, offset, count, count, 1, Cflag);
+    }
+
+out:
+    return 0;
+}
+
 static int
 alloc_f(int argc, char **argv)
 {
@@ -1717,6 +1804,7 @@ int main(int argc, char **argv)
     add_command(&truncate_cmd);
     add_command(&length_cmd);
     add_command(&info_cmd);
+    add_command(&discard_cmd);
     add_command(&alloc_cmd);
     add_command(&map_cmd);

--
1.7.2.3
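The example 'discard 512 1k' relies on the argument parser turning "1k" into a byte count and returning a negative value for input it cannot parse, which is what the `offset < 0` and `count < 0` checks test for. Below is a minimal sketch of such a parser. `sketch_parse_size()` is a hypothetical helper, not qemu-io's `cvtnum`; it assumes binary multiples (1k = 1024) and only the k, m and g suffixes.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a non-negative decimal number with an optional k/m/g suffix
 * (binary multiples assumed). Returns the byte count, or -1 on error. */
static int64_t sketch_parse_size(const char *s)
{
    char *end;
    long long val;
    long long mult = 1;

    errno = 0;
    val = strtoll(s, &end, 10);
    if (end == s || errno != 0 || val < 0) {
        return -1;
    }

    switch (*end) {
    case 'k': case 'K':
        mult = 1024LL;
        end++;
        break;
    case 'm': case 'M':
        mult = 1024LL * 1024;
        end++;
        break;
    case 'g': case 'G':
        mult = 1024LL * 1024 * 1024;
        end++;
        break;
    default:
        break;
    }

    /* Reject trailing garbage such as "1kb" and values that would overflow. */
    if (*end != '\0' || val > INT64_MAX / mult) {
        return -1;
    }
    return (int64_t)(val * mult);
}
```

Under these assumptions 'discard 512 1k' would cover bytes 512 through 1535 of the open file.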
[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
On Sun, Dec 12, 2010 at 9:09 PM, Michael S. Tsirkin m...@redhat.com wrote:
> On Sun, Dec 12, 2010 at 10:56:34PM +0200, Michael S. Tsirkin wrote:
>> On Sun, Dec 12, 2010 at 10:42:28PM +0200, Michael S. Tsirkin wrote:
>>> On Sun, Dec 12, 2010 at 10:41:28PM +0200, Michael S. Tsirkin wrote:
>>>> On Sun, Dec 12, 2010 at 03:02:04PM +, Stefan Hajnoczi wrote:
>>>>> See below for the v5 changelog.
>>>>>
>>>>> Due to lack of connectivity I am sending from GMail. Git should retain my stefa...@linux.vnet.ibm.com From address.
>>>>>
>>>>> Virtqueue notify is currently handled synchronously in userspace virtio. This prevents the vcpu from executing guest code while hardware emulation code handles the notify.
>>>>>
>>>>> On systems that support KVM, the ioeventfd mechanism can be used to make virtqueue notify a lightweight exit by deferring hardware emulation to the iothread and allowing the VM to continue execution. This model is similar to how vhost receives virtqueue notifies.
>>>>>
>>>>> The result of this change is improved performance for userspace virtio devices. Virtio-blk throughput increases especially for multithreaded scenarios and virtio-net transmit throughput increases substantially.
>>>>
>>>> Interestingly, I see decreased throughput for small message host to guest netperf runs.
>>>>
>>>> The command that I used was:
>>>> netperf -H $vguest -- -m 200
>>>>
>>>> And the results are:
>>>> - with ioeventfd=off
>>>> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.104 (11.0.0.104) port 0 AF_INET : demo
>>>> Recv   Send    Send                          Utilization       Service Demand
>>>> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
>>>> Size   Size    Size     Time     Throughput  local    remote   local   remote
>>>> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
>>>>
>>>>  87380  16384    200    10.00      3035.48   15.50    99.30    6.695   2.680
>>>>
>>>> - with ioeventfd=on
>>>> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.104 (11.0.0.104) port 0 AF_INET : demo
>>>> Recv   Send    Send                          Utilization       Service Demand
>>>> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
>>>> Size   Size    Size     Time     Throughput  local    remote   local   remote
>>>> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
>>>>
>>>>  87380  16384    200    10.00      1770.95   18.16    51.65    13.442  2.389
>>>>
>>>> Do you see this behaviour too?
>>>
>>> Just a note: this is with the patchset ported to qemu-kvm.
>>
>> And just another note: the trend is reversed for larger messages, e.g. with 1.5k messages ioeventfd=on outperforms ioeventfd=off.
>
> Another datapoint where I see a regression is with 4000 byte messages for guest to host traffic.
>
> ioeventfd=off
> set_up_server could not establish a listen endpoint for port 12865 with family AF_UNSPEC
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.4 (11.0.0.4) port 0 AF_INET : demo
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
>
>  87380  16384   4000    10.00      7717.56   98.80    15.11    1.049   2.566
>
> ioeventfd=on
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.4 (11.0.0.4) port 0 AF_INET : demo
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
>
>  87380  16384   4000    10.00      3965.86   87.69    15.29    1.811   5.055

Interesting. I posted the following results in an earlier version of this patch:

Sridhar Samudrala s...@us.ibm.com collected the following data for virtio-net with 2.6.36-rc1 on the host and 2.6.34 on the guest.

Guest to Host TCP_STREAM throughput(Mb/sec)
---
Msg Size   vhost-net   virtio-net   virtio-net/ioeventfd
65536      12755       6430         7590
16384      8499        3084         5764
4096       4723        1578         3659

Here we got a throughput improvement where you got a regression. Your virtio-net ioeventfd=off throughput is much higher than what we got (different hardware and configuration, but still I didn't know that virtio-net reaches 7 Gbit/s!).

I have focussed on the block side of things. Any thoughts about the virtio-net performance we're seeing?

1024       1827        981          2060

Host to Guest TCP_STREAM throughput(Mb/sec)
[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
On Mon, Dec 13, 2010 at 10:24:51AM +, Stefan Hajnoczi wrote: On Sun, Dec 12, 2010 at 9:09 PM, Michael S. Tsirkin m...@redhat.com wrote: On Sun, Dec 12, 2010 at 10:56:34PM +0200, Michael S. Tsirkin wrote: On Sun, Dec 12, 2010 at 10:42:28PM +0200, Michael S. Tsirkin wrote: On Sun, Dec 12, 2010 at 10:41:28PM +0200, Michael S. Tsirkin wrote: On Sun, Dec 12, 2010 at 03:02:04PM +, Stefan Hajnoczi wrote: See below for the v5 changelog. Due to lack of connectivity I am sending from GMail. Git should retain my stefa...@linux.vnet.ibm.com From address. Virtqueue notify is currently handled synchronously in userspace virtio. This prevents the vcpu from executing guest code while hardware emulation code handles the notify. On systems that support KVM, the ioeventfd mechanism can be used to make virtqueue notify a lightweight exit by deferring hardware emulation to the iothread and allowing the VM to continue execution. This model is similar to how vhost receives virtqueue notifies. The result of this change is improved performance for userspace virtio devices. Virtio-blk throughput increases especially for multithreaded scenarios and virtio-net transmit throughput increases substantially. Interestingly, I see decreased throughput for small message host to get netperf runs. The command that I used was: netperf -H $vguest -- -m 200 And the results are: - with ioeventfd=off TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.104 (11.0.0.104) port 0 AF_INET : demo Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 
10^6bits/s % S % S us/KB us/KB 87380 16384 200 10.00 3035.48 15.50 99.30 6.695 2.680 - with ioeventfd=on TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.104 (11.0.0.104) port 0 AF_INET : demo Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 87380 16384 200 10.00 1770.95 18.16 51.65 13.442 2.389 Do you see this behaviour too? Just a note: this is with the patchset ported to qemu-kvm. And just another note: the trend is reversed for larged messages, e.g. with 1.5k messages ioeventfd=on outputforms ioeventfd=off. Another datapoint where I see a regression is with 4000 byte messages for guest to host traffic. ioeventfd=off set_up_server could not establish a listen endpoint for port 12865 with family AF_UNSPEC TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.4 (11.0.0.4) port 0 AF_INET : demo Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 87380 16384 4000 10.00 7717.56 98.80 15.11 1.049 2.566 ioeventfd=on TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.4 (11.0.0.4) port 0 AF_INET : demo Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 87380 16384 4000 10.00 3965.86 87.69 15.29 1.811 5.055 Interesting. I posted the following results in an earlier version of this patch: Sridhar Samudrala s...@us.ibm.com collected the following data for virtio-net with 2.6.36-rc1 on the host and 2.6.34 on the guest. 
Guest to Host TCP_STREAM throughput(Mb/sec) --- Msg Size vhost-net virtio-net virtio-net/ioeventfd 65536 127556430 7590 16384 84993084 5764 4096 47231578 3659 Here we got a throughput improvement where you got a regression. Your virtio-net ioeventfd=off throughput is much higher than what we got (different hardware and configuration, but still I didn't know that virtio-net reaches 7 Gbit/s!). Which qemu are you running?
Re: [Qemu-devel] SCSI Command support over VirtIO Block device
Hi

2010/12/13 Stefan Hajnoczi stefa...@gmail.com:
> On Dec 13, 2010 5:14 AM, अनुज anu...@gmail.com wrote:
>> Hi
>>
>> I am trying to implement VirtIO support for a proprietary OS. It would be great if I were able to process SCSI commands over a VirtIO block device. I tried to execute the INQUIRY command but the status returned is UNSUPPORTED. If anyone could provide an example VirtIO SCSI command request structure for the INQUIRY command as per VirtIO spec Appendix D, it would be a great help.
>>
>> Also, this paragraph from VirtIO spec 0.8.9 is confusing for me:
>> Historically, devices assumed that the fields type, ioprio and sector reside in a single, separate read-only buffer; the fields errors, data_len, sense_len and residual reside in a single, separate write-only buffer; the sense field in a separate write-only buffer of size 96 bytes, by itself; the fields errors, data_len, sense_len and residual in a single write-only buffer; and the status field is a separate readonly buffer of size 1 byte, by itself.
>>
>> Here, is the 'status field of buffer size 1 byte' readonly or writeonly?
>
> Writeonly
>
>> I want to know from which version qemu-kvm supports processing of SCSI commands over a VirtIO block device as a backend. I did check the host feature fields, in which the VIRTIO_BLK_F_SCSI bit is set. I am using qemu-kvm version 0.12.3.
>
> Make sure you have a scsi-generic block device in qemu-kvm, not just a regular file or physical block device. Open /dev/sg.

Yes, I had given a file name instead of /dev/sg0. Now it's working like a charm. That means I can use a physical disk as a VirtIO disk in the guest OS, right? So it's a kind of passthrough for a physical disk. But how can I distinguish among different physical disks attached to the host? Is /dev/sg different for each physical disk? However, I thought VirtIO SCSI device operations were for virtual disks (a regular file) also.

> Look at hw/virtio-blk.c in qemu-kvm for host implementation details.
>
>> --
>> Anuj Aggarwal
>>
>>  .''`.
>> : :Ⓐ :   # apt-get install hakuna-matata
>> `. `'`
>>   `-

Thanks for your help.

Regards
--
Anuj Aggarwal

 .''`.
: :Ⓐ :   # apt-get install hakuna-matata
`. `'`
  `-
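Since the thread asks what an INQUIRY request might look like, here is a minimal sketch of the pieces named in the quoted spec paragraph. Treat it as an illustration under assumptions, not as the spec's or qemu-kvm's definition: the struct and function names are hypothetical, the request type value 2 for SCSI commands is my recollection of the virtio block header and should be checked against the spec, and the 6-byte CDB follows the standard SCSI INQUIRY layout (opcode 0x12, allocation length in bytes 3 and 4).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed request type for SCSI pass-through; verify against the spec. */
#define SKETCH_VIRTIO_BLK_T_SCSI_CMD 2u
#define SKETCH_SCSI_OP_INQUIRY       0x12u
#define SKETCH_SENSE_SIZE            96   /* sense buffer size from the quote */

/* Driver-filled header: the read-only buffer with type, ioprio and sector. */
struct sketch_blk_outhdr {
    uint32_t type;
    uint32_t ioprio;
    uint64_t sector;
};

/* Device-filled trailer: the write-only buffer with the four result fields. */
struct sketch_scsi_inhdr {
    uint32_t errors;
    uint32_t data_len;
    uint32_t sense_len;
    uint32_t residual;
};

/* Fill the header for a SCSI command request. */
static void sketch_init_scsi_outhdr(struct sketch_blk_outhdr *hdr)
{
    hdr->type = SKETCH_VIRTIO_BLK_T_SCSI_CMD;
    hdr->ioprio = 0;
    hdr->sector = 0;   /* not meaningful for a pass-through command */
}

/* Build a standard 6-byte INQUIRY CDB asking for alloc_len bytes. */
static void sketch_build_inquiry_cdb(uint8_t cdb[6], uint16_t alloc_len)
{
    memset(cdb, 0, 6);
    cdb[0] = SKETCH_SCSI_OP_INQUIRY;
    cdb[3] = (uint8_t)(alloc_len >> 8);    /* allocation length, MSB */
    cdb[4] = (uint8_t)(alloc_len & 0xff);  /* allocation length, LSB */
}
```

Following the quoted paragraph, the descriptor chain would then carry the header and the CDB as device-readable buffers, followed by device-writable buffers for the INQUIRY data (36 bytes is the usual minimum for standard INQUIRY data), the 96-byte sense area, the four result fields and the 1-byte status.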
[Qemu-devel] Re: [Spice-devel] RFC; usb redirection protocol
> Basic packet structure / communication
> --
>
> Each packet exchanged between the vm-host and the usb-host starts with a
> usb_redir_header, followed by an optional command specific header, followed
> by optional additional data.
>
> The usb_redir_header each packet starts with looks as follows:
>
> struct usb_redir_header {
>     uint32_t command;
>     uint32_t length;
> }

uint32_t id; ? A reply would then carry the id of the request ...

> Given that everything is done over a potentially slow transport, in practice
> differentiating between synchronous and asynchronous commands may seem odd.
> The difference is how the usb-host will handle them once received. For
> synchronous commands the usb-host will hand the request over to the host os
> and then *wait* for a response. This means that the vm-host is guaranteed to
> get an immediate response. Whereas for asynchronous commands the usb-host
> hands the request over to the host os with the request to let the usb-host
> process know when the request is done.

Hmm. Looks like you are planning for one tcp stream and one thread (on the usb-host side) for each usb device. That will not work very well for usb-over-vnc because there is a single tcp stream only. We could of course multiplex multiple logical usb connections over vnc, but even then blocking on the usb-host side looks bad as this could disrupt other usb devices forwarded over the same connection.

> usb_redir_report_descriptor
> ---
>
> usb_redir_header.command: usb_redir_report_descriptor
> usb_redir_header.length: sizeof usb device descriptors
>
> No command specific header.
>
> The command specific additional data contains the entire descriptors for the
> usb device.
>
> A packet of this type is sent by the usb-host directly after the hello
> packet; it contains the usb descriptor tables for the usb device.

Device addressing isn't done at all in the protocol, i.e. there is a fixed device - connection relationship?

> Please let me know what you think of this.

Do you know whether certain low-level usb ops can work with this? Specifically iphone firmware flashing was mentioned on the list. Also I remember somewhere in the ehci (or xhci?) specs it was mentioned that with some devices it can be needed to talk to them *before* a bus address is assigned ...

cheers,
  Gerd
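To make the id suggestion concrete, here is a minimal sketch of the header with the extra field, plus helpers to pack and unpack it. The RFC text quoted here does not fix a field order for id or a byte order on the wire; the 12-byte little-endian layout and all function names below are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header from the RFC plus the proposed "id" field.
 * Assumption: id is appended last and the wire format is little endian. */
struct usb_redir_header {
    uint32_t command;
    uint32_t length;   /* bytes that follow the header */
    uint32_t id;       /* hypothetical: a reply echoes the id of its request */
};

#define USB_REDIR_HEADER_WIRE_SIZE 12

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Serialize the header; returns the bytes written, or 0 if buf is too small. */
size_t usb_redir_header_pack(const struct usb_redir_header *h,
                             uint8_t *buf, size_t buflen)
{
    assert(h != NULL && buf != NULL);
    if (buflen < USB_REDIR_HEADER_WIRE_SIZE) {
        return 0;
    }
    put_le32(buf, h->command);
    put_le32(buf + 4, h->length);
    put_le32(buf + 8, h->id);
    return USB_REDIR_HEADER_WIRE_SIZE;
}

/* Parse a header; returns 0 on success, -1 if the buffer is too short. */
int usb_redir_header_unpack(struct usb_redir_header *h,
                            const uint8_t *buf, size_t buflen)
{
    assert(h != NULL && buf != NULL);
    if (buflen < USB_REDIR_HEADER_WIRE_SIZE) {
        return -1;
    }
    h->command = get_le32(buf);
    h->length = get_le32(buf + 4);
    h->id = get_le32(buf + 8);
    return 0;
}

/* A reply belongs to a request when it carries the same id. */
int usb_redir_reply_matches(const struct usb_redir_header *request,
                            const struct usb_redir_header *reply)
{
    return request->id == reply->id;
}
```

With an id in every header, replies to asynchronous commands can arrive in any order and several logical devices could share one stream, which is the vnc case raised above.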
[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
Fresh results:

192.168.0.1 - host (runs netperf)
192.168.0.2 - guest (runs netserver)

host$ src/netperf -H 192.168.0.2 -- -m 200

ioeventfd=on
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384    200    10.00    1759.25

ioeventfd=off
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384    200    10.00    1757.15

The results vary approx +/- 3% between runs.

Invocation:
$ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev type=tap,id=net0,ifname=tap0,script=no,downscript=no -device virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img

I am running qemu.git with v5 patches, based off 36888c6335422f07bbc50bf3443a39f24b90c7c6.

Host:
1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
8 GB RAM
RHEL 6 host

Next I will try the patches on latest qemu-kvm.git

Stefan
[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
On Mon, Dec 13, 2010 at 01:11:27PM +, Stefan Hajnoczi wrote: Fresh results: 192.168.0.1 - host (runs netperf) 192.168.0.2 - guest (runs netserver) host$ src/netperf -H 192.168.0.2 -- -m 200 ioeventfd=on TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1759.25 ioeventfd=off TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1757.15 The results vary approx +/- 3% between runs. Invocation: $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev type=tap,id=net0,ifname=tap0,script=no,downscript=no -device virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img I am running qemu.git with v5 patches, based off 36888c6335422f07bbc50bf3443a39f24b90c7c6. Host: 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz 8 GB RAM RHEL 6 host Next I will try the patches on latest qemu-kvm.git Stefan

One interesting thing is that I put virtio-net earlier on command line. Since iobus scan is linear for now, I wonder if this might possibly matter.

--
MST
[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 01:11:27PM +, Stefan Hajnoczi wrote: Fresh results: 192.168.0.1 - host (runs netperf) 192.168.0.2 - guest (runs netserver) host$ src/netperf -H 192.168.0.2 -- -m 200 ioeventfd=on TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1759.25 ioeventfd=off TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1757.15 The results vary approx +/- 3% between runs. Invocation: $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev type=tap,id=net0,ifname=tap0,script=no,downscript=no -device virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img I am running qemu.git with v5 patches, based off 36888c6335422f07bbc50bf3443a39f24b90c7c6. Host: 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz 8 GB RAM RHEL 6 host Next I will try the patches on latest qemu-kvm.git Stefan One interesting thing is that I put virtio-net earlier on command line.

Sorry, I mean I put it after the disk, you put it before.

Since iobus scan is linear for now, I wonder if this might possibly matter.

--
MST
[Qemu-devel] Re: [PATCH v2 1/2] Do not register kvmclock savevm section if kvmclock is disabled.
On Wed, 2010-12-08 at 17:31 -0200, Marcelo Tosatti wrote: On Tue, Dec 07, 2010 at 03:12:36PM -0200, Glauber Costa wrote: On Mon, 2010-12-06 at 19:04 -0200, Marcelo Tosatti wrote: On Mon, Dec 06, 2010 at 09:03:46AM -0500, Glauber Costa wrote:

Usually nobody thinks about that scenario (me included, and especially), but kvmclock can actually be disabled in the host. It happens in two scenarios: 1. host too old. 2. we passed -kvmclock to our -cpu parameter. In both cases, we should not register the kvmclock savevm section. This patch achieves that by registering this section only if kvmclock is actually currently enabled in cpuid. The only caveat is that we have to register the savevm section a little bit later, since we won't know the final kvmclock state before cpuid gets parsed.

What is the problem of registering the section? Restoring the value if the host does not support it returns an error? Can't you ignore the error if kvmclock is not reported in cpuid, in the restore handler?

We can change the restore handler, but not the restore handler of binaries that are already out there. The motivation here is precisely to address migration to hosts without kvmclock, so it's better to have a way to disable it than to count on the fact that the other side will be able to ignore it.

OK. Can't you register conditionally on the kvmclock cpuid bit at the end of kvm_arch_init_vcpu, in target-i386/kvm.c?

Haven't looked at it, but will today. Actually, tsc has (obviously) the same problem and I plan to respin the patch today including a fix for it as well. Thanks!
[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
Here are my results on qemu-kvm.git:

ioeventfd=on
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384    200    10.00    1203.44

ioeventfd=off
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384    200    10.00    1677.96

This is a 30% degradation that wasn't visible on qemu.git.

Same host. qemu-kvm.git with v5 patches based on cb1983b8809d0e06a97384a40bad1194a32fc814.

Stefan
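For reference, the degradation figure can be recomputed from the two throughput numbers in this message. The helper name below is made up for illustration; with 1677.96 (ioeventfd=off) as the baseline and 1203.44 (ioeventfd=on) as the measurement it gives roughly 28%, which the message rounds to 30%.

```c
#include <assert.h>

/* Relative throughput loss in percent, measured against a baseline run. */
double degradation_percent(double baseline, double measured)
{
    assert(baseline > 0.0);
    return (baseline - measured) / baseline * 100.0;
}
```

Given the +/- 3% run-to-run variation reported for qemu.git, a drop of this size is well outside the noise.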
[Qemu-devel] [Bug 595117] Re: qemu-nbd slow and missing writeback cache option
@Stephane, did upstream ever accept your patch? -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/595117 Title: qemu-nbd slow and missing writeback cache option Status in QEMU: Invalid Status in “qemu-kvm” package in Ubuntu: Expired Bug description: Binary package hint: qemu-kvm dpkg -l | grep qemu ii kvm 1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9dummy transitional pacakge from kvm to qemu- ii qemu 0.12.3+noroms-0ubuntu9 dummy transitional pacakge from qemu to qemu ii qemu-common 0.12.3+noroms-0ubuntu9 qemu common functionality (bios, documentati ii qemu-kvm 0.12.3+noroms-0ubuntu9 Full virtualization on i386 and amd64 hardwa ii qemu-kvm-extras 0.12.3+noroms-0ubuntu9 fast processor emulator binaries for non-x86 ii qemu-launcher1.7.4-1ubuntu2 GTK+ front-end to QEMU computer emulator ii qemuctl 0.2-2 controlling GUI for qemu lucid amd64. qemu-nbd is a lot slower when writing to disk than say nbd-server. It appears it is because by default the disk image it serves is open with O_SYNC. The --nocache option, unintuitively, makes matters a bit better because it causes the image to be open with O_DIRECT instead of O_SYNC. The qemu code allows an image to be open without any of those flags, but unfortunately qemu-nbd doesn't have the option to do that (qemu doesn't allow the image to be open with both O_SYNC and O_DIRECT though). The default of qemu-img (of using O_SYNC) is not very sensible because anyway, the client (the kernel) uses caches (write-back), (and qemu-nbd -d doesn't flush those by the way). So if for instance qemu-nbd is killed, regardless of whether qemu-nbd uses O_SYNC, O_DIRECT or not, the data in the image will not be consistent anyway, unless syncs are done by the client (like fsync on the nbd device or sync mount option), and with qemu-nbd's O_SYNC mode, those syncs will be extremely slow. 
Attached is a patch that adds a --cache={off,none,writethrough,writeback} option to qemu-nbd. --cache=off is the same as --nocache (that is, use O_DIRECT). writethrough uses O_SYNC and is still the default, so this patch doesn't change the existing functionality. writeback uses neither of those flags and is what this patch adds. The patch also does an fsync upon qemu-nbd -d to make sure data is flushed to the image before removing the nbd.

Consider this test scenario:

dd bs=1M count=100 of=a < /dev/null
qemu-nbd --cache=x -c /dev/nbd0 a
cp /dev/zero /dev/nbd0
time perl -MIO::Handle -e 'STDOUT->sync or die$!' 1<> /dev/nbd0

With cache=writethrough (the default), it takes over 10 minutes to write those 100MB worth of zeroes. Running strace, we see the recvfroms and sendtos delayed by each of the 1kb write(2)s to disk (10 to 30 ms per write).

With cache=off, it takes about 30 seconds.

With cache=writeback, it takes about 3 seconds, which is similar to the performance you get with nbd-server.

Note that the cp command runs instantly as the data is buffered by the client (the kernel), and not sent to qemu-nbd until the fsync(2) is called.
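The mapping from cache mode to open(2) flags described in the report can be sketched as follows. This is not the attached patch: the function name is hypothetical, and treating "none" as an alias of "off" is an assumption, since the report only spells out off, writethrough and writeback.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes O_DIRECT on Linux/glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <string.h>

#ifndef O_DIRECT
#define O_DIRECT 0    /* platform without O_DIRECT: fall back to buffered I/O */
#endif

/* Translate a --cache= value into open(2) flags.
 * Returns 0 on success, -1 for an unknown mode. */
int cache_mode_to_open_flags(const char *mode, int *flags)
{
    assert(mode != NULL && flags != NULL);
    *flags = O_RDWR;
    if (strcmp(mode, "off") == 0 || strcmp(mode, "none") == 0) {
        *flags |= O_DIRECT;   /* bypass the host page cache */
    } else if (strcmp(mode, "writethrough") == 0) {
        *flags |= O_SYNC;     /* every write waits for stable storage */
    } else if (strcmp(mode, "writeback") == 0) {
        /* neither flag: the host page cache absorbs the writes */
    } else {
        return -1;
    }
    return 0;
}
```

The timings in the report follow from this table: with O_SYNC each 1kb write waits for the disk, while in writeback mode data only has to reach the disk when the client issues a sync.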
[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 01:11:27PM +, Stefan Hajnoczi wrote: Fresh results: 192.168.0.1 - host (runs netperf) 192.168.0.2 - guest (runs netserver) host$ src/netperf -H 192.168.0.2 -- -m 200 ioeventfd=on TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1759.25 ioeventfd=off TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1757.15 The results vary approx +/- 3% between runs. Invocation: $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev type=tap,id=net0,ifname=tap0,script=no,downscript=no -device virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img I am running qemu.git with v5 patches, based off 36888c6335422f07bbc50bf3443a39f24b90c7c6. Host: 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz 8 GB RAM RHEL 6 host Next I will try the patches on latest qemu-kvm.git Stefan One interesting thing is that I put virtio-net earlier on command line. Sorry I mean I put it after disk, you put it before. I can't find a measurable difference when swapping -drive and -netdev. Can you run the same test with vhost? I assume it still outperforms userspace virtio for small message sizes? I'm interested because that also uses ioeventfd. I am wondering if the iothread differences between qemu.git and qemu-kvm.git can explain the performance results we see. In particular, qemu.git produces the same (high) throughput whether ioeventfd is on or off. Stefan
Re: [Qemu-devel] [PATCH 1/5] block: add discard support
On Sat, Dec 11, 2010 at 12:50:20PM +, Paul Brook wrote:

It's guest visible state, so it must not change due to migrations. For the current implementation all values for it work anyway - if it's smaller than the block size we'll zero out the remainder of the block.

That sounds wrong. Surely we should leave partial blocks untouched.

While zeroing them is not required for qemu, the general semantics of the XFS ioctl require it. It punches a hole, which means it makes the new area equivalent to a hole created by truncating a file to a larger size and then only writing at the larger offset. The semantics for a hole in all Unix filesystems is that we read back zeroes from them. If we write into a sparse file at an offset that is not block aligned, the zeroing of the partial block also happens.
Re: [Qemu-devel] [PATCH 0/7] add TRIM/UNMAP support, v3
On Sun, Dec 12, 2010 at 03:28:14PM +, Stefan Hajnoczi wrote:

Do you have qemu-io support for discard?

Now that you wrote it we have the support :)

Any hints on testing this? A recent guest kernel and ext -o discard might exercise the code but I haven't tried yet.

Anything that submits a discard in the guest is fine. The simplest things to test are the various mkfs tools, as they do a whole-device discard. Also -o discard for various Linux filesystems works, as does Mark Lord's wiper.sh script, or any Windows 7 installation.
Re: [Qemu-devel] [PATCH 5/6] [RFC] Emulation of Leon3.
On 12/11/2010 10:56 AM, Blue Swirl wrote: On Tue, Dec 7, 2010 at 11:40 AM, Fabien Chouteauchout...@adacore.com wrote: On 12/06/2010 06:53 PM, Blue Swirl wrote: On Mon, Dec 6, 2010 at 9:26 AM, Fabien Chouteauchout...@adacore.com wrote: Signed-off-by: Fabien Chouteauchout...@adacore.com --- Makefile.target |5 +- hw/leon3.c | 310 ++ target-sparc/cpu.h | 10 ++ target-sparc/helper.c|2 +- target-sparc/op_helper.c | 30 - 5 files changed, 353 insertions(+), 4 deletions(-) diff --git a/Makefile.target b/Makefile.target index 2800f47..f40e04f 100644 --- a/Makefile.target +++ b/Makefile.target @@ -290,7 +290,10 @@ obj-sparc-y += cirrus_vga.o else obj-sparc-y = sun4m.o lance.o tcx.o sun4m_iommu.o slavio_intctl.o obj-sparc-y += slavio_timer.o slavio_misc.o sparc32_dma.o -obj-sparc-y += cs4231.o eccmemctl.o sbi.o sun4c_intctl.o +obj-sparc-y += cs4231.o eccmemctl.o sbi.o sun4c_intctl.o leon3.o + +# GRLIB +obj-sparc-y += grlib_gptimer.o grlib_irqmp.o grlib_apbuart.o endif obj-arm-y = integratorcp.o versatilepb.o arm_pic.o arm_timer.o diff --git a/hw/leon3.c b/hw/leon3.c new file mode 100644 index 000..ba61081 --- /dev/null +++ b/hw/leon3.c @@ -0,0 +1,310 @@ +/* + * QEMU Leon3 System Emulator + * + * Copyright (c) 2010 AdaCore + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the Software), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. 
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#include "hw.h"
+#include "qemu-timer.h"
+#include "qemu-char.h"
+#include "sysemu.h"
+#include "boards.h"
+#include "loader.h"
+#include "elf.h"
+
+#include "grlib.h"
+
+/* #define DEBUG_LEON3 */
+
+#ifdef DEBUG_LEON3
+#define DPRINTF(fmt, ...) \
+    do { printf("Leon3: " fmt , ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...)
+#endif
+
+/* Default system clock. */
+#define CPU_CLK (40 * 1000 * 1000)
+
+#define PROM_FILENAME "u-boot.bin"
+
+#define MAX_PILS 16
+
+typedef struct Leon3State
+{
+    uint32_t cache_control;
+    uint32_t inst_cache_conf;
+    uint32_t data_cache_conf;
+
+    uint64_t entry; /* save kernel entry in case of reset */
+} Leon3State;
+
+Leon3State leon3_state;

Again global state, please refactor. Perhaps most of the cache handling code belongs in target-sparc/op_helper.c and this structure in CPUSPARCState.

I will try to find a solution for that. Is it OK to add some Leon3 specific stuff in the CPUSPARCState?

Yes, no problem. You can also drop the intermediate Leon3State structure if there is no benefit.
+
+/* Cache control: emulate the behavior of cache control registers but without
+   any effect on the emulated CPU */
+
+#define CACHE_DISABLED 0x0
+#define CACHE_FROZEN   0x1
+#define CACHE_ENABLED  0x3
+
+/* Cache Control register fields */
+
+#define CACHE_CTRL_IF (1 <<  4)  /* Instruction Cache Freeze on Interrupt */
+#define CACHE_CTRL_DF (1 <<  5)  /* Data Cache Freeze on Interrupt */
+#define CACHE_CTRL_DP (1 << 14)  /* Data cache flush pending */
+#define CACHE_CTRL_IP (1 << 15)  /* Instruction cache flush pending */
+#define CACHE_CTRL_IB (1 << 16)  /* Instruction burst fetch */
+#define CACHE_CTRL_FI (1 << 21)  /* Flush Instruction cache (Write only) */
+#define CACHE_CTRL_FD (1 << 22)  /* Flush Data cache (Write only) */
+#define CACHE_CTRL_DS (1 << 23)  /* Data cache snoop enable */
+
+void leon3_cache_control_int(void)
+{
+    uint32_t state = 0;
+
+    if (leon3_state.cache_control & CACHE_CTRL_IF) {
+        /* Instruction cache state */
+        state = leon3_state.cache_control & 0x3;

Please add a new define CACHE_CTRL_xxx to replace 0x3.

Done.

+        if (state == CACHE_ENABLED) {
+            state = CACHE_FROZEN;
+            DPRINTF("Instruction cache: freeze\n");
+        }
+
+        leon3_state.cache_control &= ~0x3;
+        leon3_state.cache_control |= state;
+    }
+
+    if (leon3_state.cache_control & CACHE_CTRL_DF) {
+        /* Data cache state */
+        state =
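The two review points on this patch (no global state, named masks instead of a bare 0x3) could look roughly like the sketch below. The function takes the register value as an argument and returns the updated value, so it can live wherever the state ends up. The mask names are hypothetical, and placing the data cache state in bits 3:2 is an assumption, because the data cache branch is cut off in the quoted patch.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_DISABLED 0x0u
#define CACHE_FROZEN   0x1u
#define CACHE_ENABLED  0x3u

#define CACHE_CTRL_IF (1u << 4)  /* Instruction Cache Freeze on Interrupt */
#define CACHE_CTRL_DF (1u << 5)  /* Data Cache Freeze on Interrupt */

/* Hypothetical names for the state fields. */
#define CACHE_CTRL_ICS_MASK  0x3u                           /* bits 1:0 */
#define CACHE_CTRL_DCS_SHIFT 2                              /* assumed  */
#define CACHE_CTRL_DCS_MASK  (0x3u << CACHE_CTRL_DCS_SHIFT) /* bits 3:2 */

/* Return the cache control value after an interrupt: an enabled cache
 * becomes frozen if its freeze-on-interrupt bit is set. */
uint32_t cache_control_on_interrupt(uint32_t ctrl)
{
    const uint32_t state_bits = CACHE_CTRL_ICS_MASK | CACHE_CTRL_DCS_MASK;
    const uint32_t in = ctrl;

    if ((ctrl & CACHE_CTRL_IF) &&
        (ctrl & CACHE_CTRL_ICS_MASK) == CACHE_ENABLED) {
        ctrl = (ctrl & ~CACHE_CTRL_ICS_MASK) | CACHE_FROZEN;
    }

    if ((ctrl & CACHE_CTRL_DF) &&
        ((ctrl & CACHE_CTRL_DCS_MASK) >> CACHE_CTRL_DCS_SHIFT) == CACHE_ENABLED) {
        ctrl = (ctrl & ~CACHE_CTRL_DCS_MASK) |
               (CACHE_FROZEN << CACHE_CTRL_DCS_SHIFT);
    }

    /* Only the two state fields may change; all other bits are preserved. */
    assert((ctrl & ~state_bits) == (in & ~state_bits));
    return ctrl;
}
```

A caller holding the register in the CPU state would then write something like env->cache_control = cache_control_on_interrupt(env->cache_control), where the field name is again only illustrative.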
[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
On Mon, Dec 13, 2010 at 03:27:06PM +, Stefan Hajnoczi wrote: On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 01:11:27PM +, Stefan Hajnoczi wrote: Fresh results: 192.168.0.1 - host (runs netperf) 192.168.0.2 - guest (runs netserver) host$ src/netperf -H 192.168.0.2 -- -m 200 ioeventfd=on TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1759.25 ioeventfd=off TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1757.15 The results vary approx +/- 3% between runs. Invocation: $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev type=tap,id=net0,ifname=tap0,script=no,downscript=no -device virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img I am running qemu.git with v5 patches, based off 36888c6335422f07bbc50bf3443a39f24b90c7c6. Host: 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz 8 GB RAM RHEL 6 host Next I will try the patches on latest qemu-kvm.git Stefan One interesting thing is that I put virtio-net earlier on command line. Sorry I mean I put it after disk, you put it before. I can't find a measurable difference when swapping -drive and -netdev. Can you run the same test with vhost? I assume it still outperforms userspace virtio for small message sizes? I'm interested because that also uses ioeventfd. Seems to work same as ioeventfd. I am wondering if the iothread differences between qemu.git and qemu-kvm.git can explain the performance results we see. 
In particular, qemu.git produces the same (high) throughput whether ioeventfd is on or off. Stefan
[Qemu-devel] Re: [PATCH 1/5] block: add discard support
On 12/10/2010 02:38 PM, Christoph Hellwig wrote: if it's smaller than the block size we'll zero out the remainder of the block. I think it should fail at VM startup time, or even better do nothing at all. When you write in the middle of an absent block, and a partially-zero block is created, this is not visible: a read cannot see the difference between all zeros because it's sparse and all zeros because it's zero. If I ask you to (optionally) punch a 1kb hole but all you can do is punch a 2kb hole, I do care about the second kilobyte of data. Since the hole punching of bdrv_discard is completely optional, it should not be done in this case. Paolo
[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
On Mon, Dec 13, 2010 at 03:27:06PM +, Stefan Hajnoczi wrote: On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 01:11:27PM +, Stefan Hajnoczi wrote: Fresh results: 192.168.0.1 - host (runs netperf) 192.168.0.2 - guest (runs netserver) host$ src/netperf -H 192.168.0.2 -- -m 200 ioeventfd=on TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1759.25 ioeventfd=off TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1757.15 The results vary approx +/- 3% between runs. Invocation: $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev type=tap,id=net0,ifname=tap0,script=no,downscript=no -device virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img I am running qemu.git with v5 patches, based off 36888c6335422f07bbc50bf3443a39f24b90c7c6. Host: 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz 8 GB RAM RHEL 6 host Next I will try the patches on latest qemu-kvm.git Stefan One interesting thing is that I put virtio-net earlier on command line. Sorry I mean I put it after disk, you put it before. I can't find a measurable difference when swapping -drive and -netdev. One other concern I have is that we are apparently using ioeventfd for all VQs. E.g. for virtio-net we probably should not use it for the control VQ - it's a waste of resources. Can you run the same test with vhost? I assume it still outperforms userspace virtio for small message sizes? I'm interested because that also uses ioeventfd. 
I am wondering if the iothread differences between qemu.git and qemu-kvm.git can explain the performance results we see. In particular, qemu.git produces the same (high) throughput whether ioeventfd is on or off. Stefan
[Qemu-devel] Re: [PATCH 1/5] block: add discard support
On Mon, Dec 13, 2010 at 05:07:27PM +0100, Paolo Bonzini wrote: On 12/10/2010 02:38 PM, Christoph Hellwig wrote: if it's smaller than the block size we'll zero out the remainder of the block. I think it should fail at VM startup time, or even better do nothing at all. What should fail? When you write in the middle of an absent block, and a partially-zero block is created, this is not visible: a read cannot see the difference between all zeros because it's sparse and all zeros because it's zero. You can not see from a VM if a block is not allocated or zeroed. Then again we'll never create a fully zeroed block anyway unless we get really stupid discard patterns from the guest OS. If I ask you to (optionally) punch a 1kb hole but all you can do is punch a 2kb hole, I do care about the second kilobyte of data. Since the hole punching of bdrv_discard is completely optional, it should not be done in this case. Of course we do not discard the second KB in that case. If you issue a 1k UNRSVSP ioctl on a 2k block size XFS filesystem it will zero exactly the 1k you specified, which is required for the semantics of the ioctl. Yes, it's not optimal, but qemu can't easily know what block size the underlying filesystem has.
Re: [Qemu-devel] [PATCH 1/5] block: add discard support
On Sat, Dec 11, 2010 at 12:50:20PM +, Paul Brook wrote:

It's guest visible state, so it must not change due to migrations. For the current implementation all values for it work anyway - if it's smaller than the block size we'll zero out the remainder of the block.

That sounds wrong. Surely we should leave partial blocks untouched.

While zeroing them is not required for qemu, the general semantics of the XFS ioctl require it. It punches a hole, which means it makes the new area equivalent to a hole created by truncating a file to a larger size and then only writing at the larger offset. The semantics for a hole in all Unix filesystems is that we read back zeroes from them. If we write into a sparse file at an offset that is not block aligned, the zeroing of the partial block also happens.

Ah, so it was just inconsistent use of the term block. When the erase region includes part of a block, we zero that part of the block and leave the rest of the block untouched.

Paul
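The behaviour the thread converges on can be written down as a small planning function: whole blocks inside the discarded range are deallocated, and the partial blocks at either end are zeroed only over the bytes that fall inside the range. This is a sketch of those semantics, not code from the patch series; the struct and function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* How a discard of [off, off + len) maps onto a filesystem with block size bs. */
struct discard_plan {
    uint64_t zero_head_off, zero_head_len;  /* partial first block: zeroed  */
    uint64_t free_off, free_len;            /* whole blocks: deallocated    */
    uint64_t zero_tail_off, zero_tail_len;  /* partial last block: zeroed   */
};

struct discard_plan plan_discard(uint64_t off, uint64_t len, uint64_t bs)
{
    struct discard_plan p = { 0, 0, 0, 0, 0, 0 };
    uint64_t end = off + len;
    uint64_t first, last;

    assert(bs > 0);
    first = (off + bs - 1) / bs * bs;  /* first block boundary at or after off */
    last = end / bs * bs;              /* last block boundary at or before end */

    if (first > last) {
        /* The range lies inside a single block: zero exactly those bytes. */
        p.zero_head_off = off;
        p.zero_head_len = len;
        return p;
    }

    p.zero_head_off = off;
    p.zero_head_len = first - off;
    p.free_off = first;
    p.free_len = last - first;
    p.zero_tail_off = last;
    p.zero_tail_len = end - last;
    return p;
}
```

For the 1k request on a 2k block size mentioned in the thread, the plan frees nothing and zeroes exactly 1k, so bytes outside the requested range are never touched.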
Re: [Qemu-devel] [PATCH 2/6] [RFC] Emulation of GRLIB IRQMP as defined in GRLIB IP Core User's Manual.
On 12/11/2010 11:31 AM, Blue Swirl wrote: On Tue, Dec 7, 2010 at 10:43 AM, Fabien Chouteauchout...@adacore.com wrote: On 12/06/2010 06:25 PM, Blue Swirl wrote: On Mon, Dec 6, 2010 at 9:26 AM, Fabien Chouteauchout...@adacore.com wrote: Signed-off-by: Fabien Chouteauchout...@adacore.com --- hw/grlib_irqmp.c | 416 ++ 1 files changed, 416 insertions(+), 0 deletions(-) diff --git a/hw/grlib_irqmp.c b/hw/grlib_irqmp.c new file mode 100644 index 000..69e1553 --- /dev/null +++ b/hw/grlib_irqmp.c @@ -0,0 +1,416 @@ +/* + * QEMU GRLIB IRQMP Emulator + * + * (Multiprocessor and extended interrupt not supported) + * + * Copyright (c) 2010 AdaCore + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the Software), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include sysbus.h +#include cpu.h + +#include grlib.h + +/* #define DEBUG_IRQ */ + +#ifdef DEBUG_IRQ +#define DPRINTF(fmt, ...) \ +do { printf(IRQMP: fmt , ## __VA_ARGS__); } while (0) +#else +#define DPRINTF(fmt, ...) 
+#endif
+
+#define IRQMP_MAX_CPU 16
+#define IRQMP_REG_SIZE 256 /* Size of memory mapped registers */
+
+/* Memory mapped register offsets */
+#define LEVEL_OFFSET     0x00
+#define PENDING_OFFSET   0x04
+#define FORCE0_OFFSET    0x08
+#define CLEAR_OFFSET     0x0C
+#define MP_STATUS_OFFSET 0x10
+#define BROADCAST_OFFSET 0x14
+#define MASK_OFFSET      0x40
+#define FORCE_OFFSET     0x80
+#define EXTENDED_OFFSET  0xC0
+
+typedef struct IRQMP
+{
+    SysBusDevice busdev;
+
+    CPUSPARCState *env;

Devices should never access CPUState directly. Instead, board level should create CPU irqs and these should then be passed here.

This case is special, Leon3 is a System-On-Chip and some of the components are very close to the processor. IRQMP is not really a peripheral nor a part of the CPU, it's both...

It's not a special case, it could be easily implemented separately. MMUs, FPUs or co-processors could be special even if they have been implemented as separate chips with real hardware. But we are actually not looking at the (historical or current) chip boundaries but more like what makes sense from QEMU architecture point of view.

OK then, let's go back to your first comment: why can't a device access CPUState directly? And why would Leon3.c be a better place to do that?

--
Fabien Chouteau
[Qemu-devel] Re: [PATCH 1/5] block: add discard support
On 12/13/2010 05:15 PM, Christoph Hellwig wrote: On Mon, Dec 13, 2010 at 05:07:27PM +0100, Paolo Bonzini wrote: On 12/10/2010 02:38 PM, Christoph Hellwig wrote: if it's smaller than the block size we'll zero out the remainder of the block. I think it should fail at VM startup time, or even better do nothing at all. What should fail?

Nothing -- you wrote "if it's smaller than the block size we'll zero out the remainder of the block", which I interpreted the wrong way, i.e. as "XFS will round up the size to the remainder of the block and zero that part out as well".

Thanks for the clarification.

Paolo
[Qemu-devel] [PATCH 1/4] Make vm_stop available for block layer
blkqueue wants to stop the VM after an error has occurred, so we need to make
vm_stop available in common code. It now returns a boolean that tells if the VM
could be stopped, which is always true in qemu itself, and always false in the
tools.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 cpus.c        |    8 +++++---
 qemu-common.h |    3 +++
 qemu-tool.c   |    5 +++++
 sysemu.h      |    1 -
 4 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/cpus.c b/cpus.c
index 91a0fb1..8ec0ed6 100644
--- a/cpus.c
+++ b/cpus.c
@@ -310,9 +310,10 @@ void qemu_notify_event(void)
 void qemu_mutex_lock_iothread(void) {}
 void qemu_mutex_unlock_iothread(void) {}

-void vm_stop(int reason)
+bool vm_stop(int reason)
 {
     do_vm_stop(reason);
+    return true;
 }

 #else /* CONFIG_IOTHREAD */
@@ -848,7 +849,7 @@ static void qemu_system_vmstop_request(int reason)
     qemu_notify_event();
 }

-void vm_stop(int reason)
+bool vm_stop(int reason)
 {
     QemuThread me;
     qemu_thread_self(&me);
@@ -863,9 +864,10 @@ void vm_stop(int reason)
             cpu_exit(cpu_single_env);
             cpu_single_env->stop = 1;
         }
-        return;
+        return true;
     }
     do_vm_stop(reason);
+    return true;
 }

 #endif
diff --git a/qemu-common.h b/qemu-common.h
index de82c2e..cb077a0 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -115,6 +115,9 @@ static inline char *realpath(const char *path, char *resolved_path)
 #endif /* !defined(NEED_CPU_H) */

+/* VM state */
+bool vm_stop(int reason);
+
 /* bottom halves */
 typedef void QEMUBHFunc(void *opaque);

diff --git a/qemu-tool.c b/qemu-tool.c
index 392e1c9..3926435 100644
--- a/qemu-tool.c
+++ b/qemu-tool.c
@@ -111,3 +111,8 @@ int qemu_set_fd_handler2(int fd,
 {
     return 0;
 }
+
+bool vm_stop(int reason)
+{
+    return false;
+}
diff --git a/sysemu.h b/sysemu.h
index b81a70e..77788f1 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -38,7 +38,6 @@ VMChangeStateEntry *qemu_add_vm_change_state_handler(VMChangeStateHandler *cb,
 void qemu_del_vm_change_state_handler(VMChangeStateEntry *e);

 void vm_start(void);
-void vm_stop(int reason);
 uint64_t ram_bytes_remaining(void);
 uint64_t ram_bytes_transferred(void);
--
1.7.2.3
[Qemu-devel] [PATCH 3/4] Test cases for block-queue
Add some unit tests especially for the ordering and request merging in
block-queue.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 Makefile            |    1 +
 check-block-queue.c |  402 +++
 2 files changed, 403 insertions(+), 0 deletions(-)
 create mode 100644 check-block-queue.c

diff --git a/Makefile b/Makefile
index c80566c..3e60d7e 100644
--- a/Makefile
+++ b/Makefile
@@ -172,6 +172,7 @@ check-qdict: check-qdict.o qdict.o qfloat.o qint.o qstring.o qbool.o qlist.o $(C
 check-qlist: check-qlist.o qlist.o qint.o $(CHECK_PROG_DEPS)
 check-qfloat: check-qfloat.o qfloat.o $(CHECK_PROG_DEPS)
 check-qjson: check-qjson.o qfloat.o qint.o qdict.o qstring.o qlist.o qbool.o qjson.o json-streamer.o json-lexer.o json-parser.o $(CHECK_PROG_DEPS)
+check-block-queue: check-block-queue.o qemu-tool.o qemu-error.o $(oslib-obj-y) $(filter-out block-queue.o,$(block-obj-y)) $(qobject-obj-y) qemu-timer-common.o

 clean:
 # avoid old build problems by removing potentially incorrect old files
diff --git a/check-block-queue.c b/check-block-queue.c
new file mode 100644
index 000..b2d
--- /dev/null
+++ b/check-block-queue.c
@@ -0,0 +1,402 @@
+/*
+ * block-queue.c unit tests
+ *
+ * Copyright (c) 2010 Kevin Wolf kw...@redhat.com
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+/* We want to test some static functions, so just include the source file */
+#define RUN_TESTS
+#include "block-queue.c"
+
+#define CHECK_WRITE(req, _bq, _offset, _size, _buf, _section) \
+    do { \
+        assert(req != NULL); \
+        assert(req->type == REQ_TYPE_WRITE); \
+        assert(req->bq == _bq); \
+        assert(req->offset == _offset); \
+        assert(req->size == _size); \
+        assert(req->section == _section); \
+        assert(!memcmp(req->buf, _buf, _size)); \
+    } while(0)
+
+#define CHECK_BARRIER(req, _bq, _section) \
+    do { \
+        assert(req != NULL); \
+        assert(req->type == REQ_TYPE_BARRIER); \
+        assert(req->bq == _bq); \
+        assert(req->section == _section); \
+    } while(0)
+
+#define CHECK_READ(_context, _offset, _buf, _size, _cmpbuf) \
+    do { \
+        int ret; \
+        memset(buf, 0, 512); \
+        ret = blkqueue_pread(_context, _offset, _buf, _size); \
+        assert(ret == 0); \
+        assert(!memcmp(_cmpbuf, _buf, _size)); \
+    } while(0)
+
+#define QUEUE_WRITE(_context, _offset, _buf, _size, _pattern) \
+    do { \
+        int ret; \
+        memset(_buf, _pattern, _size); \
+        ret = blkqueue_pwrite(_context, _offset, _buf, _size); \
+        assert(ret == 0); \
+    } while(0)
+#define QUEUE_BARRIER(_context) \
+    do { \
+        int ret; \
+        ret = blkqueue_barrier(_context); \
+        assert(ret == 0); \
+    } while(0)
+
+#define POP_CHECK_WRITE(_bq, _offset, _buf, _size, _pattern, _section) \
+    do { \
+        BlockQueueRequest *req; \
+        memset(_buf, _pattern, _size); \
+        req = blkqueue_pop(_bq); \
+        CHECK_WRITE(req, _bq, _offset, _size, _buf, _section); \
+        blkqueue_free_request(req); \
+    } while(0)
+#define POP_CHECK_BARRIER(_bq, _section) \
+    do { \
+        BlockQueueRequest *req; \
+        req = blkqueue_pop(_bq); \
+        CHECK_BARRIER(req, _bq, _section); \
+        blkqueue_free_request(req); \
+    } while(0)
+
+static void __attribute__((used)) dump_queue(BlockQueue *bq)
+{
+    BlockQueueRequest *req;
+
+    fprintf(stderr, "--- Queue dump ---\n");
+    QTAILQ_FOREACH(req, &bq->queue, link) {
+        fprintf(stderr, "[%d] ", req->section);
+        if (req->type == REQ_TYPE_WRITE) {
+            fprintf(stderr, "Write off=%5"PRId64", len=%5"PRId64", buf=%p\n",
+                req->offset, req->size, req->buf);
+        } else if (req->type == REQ_TYPE_BARRIER) {
+            fprintf(stderr, "Barrier\n");
+        } else {
+            fprintf(stderr, "Unknown type %d\n", req->type);
+        }
+    }
+}
+
+static void
[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
On Mon, Dec 13, 2010 at 4:12 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Dec 13, 2010 at 03:27:06PM +, Stefan Hajnoczi wrote: On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 01:11:27PM +, Stefan Hajnoczi wrote: Fresh results: 192.168.0.1 - host (runs netperf) 192.168.0.2 - guest (runs netserver) host$ src/netperf -H 192.168.0.2 -- -m 200 ioeventfd=on TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1759.25 ioeventfd=off TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1757.15 The results vary approx +/- 3% between runs. Invocation: $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev type=tap,id=net0,ifname=tap0,script=no,downscript=no -device virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img I am running qemu.git with v5 patches, based off 36888c6335422f07bbc50bf3443a39f24b90c7c6. Host: 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz 8 GB RAM RHEL 6 host Next I will try the patches on latest qemu-kvm.git Stefan One interesting thing is that I put virtio-net earlier on command line. Sorry I mean I put it after disk, you put it before. I can't find a measurable difference when swapping -drive and -netdev. One other concern I have is that we are apparently using ioeventfd for all VQs. E.g. for virtio-net we probably should not use it for the control VQ - it's a waste of resources. One option is a per-device (block, net, etc) bitmap that masks out virtqueues. Is that something you'd like to see? 
I'm tempted to mask out the RX vq too and see how that affects the qemu-kvm.git specific issue. Stefan
[Qemu-devel] [PATCH 2/4] Add block-queue
Instead of directly executing writes and fsyncs, queue them and execute them
asynchronously. What makes this interesting is that we can delay syncs and if
multiple syncs occur, we can merge them into one bdrv_flush.

A typical sequence in qcow2 (simple cluster allocation) looks like this:

1. Update refcount table
2. bdrv_flush
3. Update L2 entry

If we delay the operation and get three of these sequences queued before
actually executing, we end up with the following result, saving two syncs:

1. Update refcount table (req 1)
2. Update refcount table (req 2)
3. Update refcount table (req 3)
4. bdrv_flush
5. Update L2 entry (req 1)
6. Update L2 entry (req 2)
7. Update L2 entry (req 3)

This patch only commits a sync if either the guest has requested a flush or if
a certain number of requests is in the queue, so usually we batch more than
just three requests.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 Makefile.objs |    2 +-
 block-queue.c |  875 +
 block-queue.h |   61
 3 files changed, 937 insertions(+), 1 deletions(-)
 create mode 100644 block-queue.c
 create mode 100644 block-queue.h

diff --git a/Makefile.objs b/Makefile.objs
index 04625eb..7cb7dde 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -14,7 +14,7 @@ oslib-obj-$(CONFIG_POSIX) += oslib-posix.o
 # block-obj-y is code used by both qemu system emulation and qemu-img

 block-obj-y = cutils.o cache-utils.o qemu-malloc.o qemu-option.o module.o
-block-obj-y += nbd.o block.o aio.o aes.o qemu-config.o
+block-obj-y += nbd.o block.o aio.o aes.o qemu-config.o block-queue.o
 block-obj-$(CONFIG_POSIX) += posix-aio-compat.o
 block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o

diff --git a/block-queue.c b/block-queue.c
new file mode 100644
index 000..448f20d
--- /dev/null
+++ b/block-queue.c
@@ -0,0 +1,875 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2010 Kevin Wolf kw...@redhat.com
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu-common.h"
+#include "qemu-queue.h"
+#include "block_int.h"
+#include "block-queue.h"
+#include "qemu-error.h"
+
+//#define BLKQUEUE_DEBUG
+
+#ifdef BLKQUEUE_DEBUG
+#define DPRINTF(fmt, ...) fprintf(stderr, fmt, ##__VA_ARGS__)
+#else
+#define DPRINTF(...) do {} while(0)
+#endif
+
+#define WRITEBACK_MODES (BDRV_O_NOCACHE | BDRV_O_CACHE_WB)
+
+enum blkqueue_req_type {
+    REQ_TYPE_WRITE,
+    REQ_TYPE_BARRIER,
+    REQ_TYPE_WAIT_FOR_COMPLETION,
+};
+
+typedef struct BlockQueueAIOCB {
+    BlockDriverAIOCB common;
+    QLIST_ENTRY(BlockQueueAIOCB) link;
+} BlockQueueAIOCB;
+
+typedef struct BlockQueueRequest {
+    enum blkqueue_req_type type;
+    BlockQueue* bq;
+
+    uint64_t    offset;
+    void*       buf;
+    uint64_t    size;
+    unsigned    section;
+    bool        in_flight;
+
+    struct iovec    iov;
+    QEMUIOVector    qiov;
+
+    QLIST_HEAD(, BlockQueueAIOCB) acbs;
+
+    QTAILQ_ENTRY(BlockQueueRequest) link;
+    QSIMPLEQ_ENTRY(BlockQueueRequest) link_section;
+} BlockQueueRequest;
+
+QTAILQ_HEAD(bq_queue_head, BlockQueueRequest);
+
+struct BlockQueue {
+    BlockDriverState* bs;
+
+    int barriers_requested;
+    int barriers_submitted;
+    int queue_size;
+    int flushing;
+    int num_waiting_for_cb;
+
+    BlockQueueErrorHandler error_handler;
+    void* error_opaque;
+    int error_ret;
+
+    int in_flight_num;
+    enum blkqueue_req_type in_flight_type;
+
+    struct bq_queue_head queue;
+    struct bq_queue_head in_flight;
+
+    QSIMPLEQ_HEAD(, BlockQueueRequest) sections;
+};
+
+typedef int (*blkqueue_rw_fn)(BlockQueueContext *context, uint64_t offset,
+    void *buf, uint64_t size);
+typedef void (*blkqueue_handle_overlap)(void *new, void *old, size_t size);
+
+static void blkqueue_process_request(BlockQueue *bq);
+static
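The reordering described in the commit message of this patch (three "refcount update, flush, L2 update" sequences ending up with a single flush) can be modelled in a few lines. The following is a simplified sketch and not the code from block-queue.c: the `struct queue`, `struct context`, `queue_write` and `queue_barrier` names are assumptions, the queue is a fixed-size array, and no data is actually written. Each context remembers which section it is in; a write is placed before the barrier that closes its section, and a barrier is only appended if no barrier closes that section yet.

```c
#include <assert.h>
#include <stddef.h>

enum req_type { REQ_WRITE, REQ_BARRIER };

struct request {
    enum req_type type;
    unsigned section;   /* writes: section they belong to; barriers: section they close */
    int ctx_id;         /* which context queued it (-1 for barriers) */
};

#define MAX_REQS 64

struct queue {
    struct request reqs[MAX_REQS];
    size_t len;
    unsigned num_barriers;  /* barrier k closes section k */
};

struct context {
    struct queue *q;
    unsigned section;   /* section that new writes of this context go into */
    int id;
};

/* Insert r at index pos, shifting later entries back by one. */
static int insert_at(struct queue *q, size_t pos, struct request r)
{
    if (q->len >= MAX_REQS) {
        return -1;
    }
    for (size_t i = q->len; i > pos; i--) {
        q->reqs[i] = q->reqs[i - 1];
    }
    q->reqs[pos] = r;
    q->len++;
    return 0;
}

/* Queue a write in the context's current section: before the barrier that
 * closes this section if one is already queued, otherwise at the end. */
static int queue_write(struct context *ctx)
{
    assert(ctx != NULL && ctx->q != NULL);
    struct queue *q = ctx->q;
    size_t pos = q->len;

    for (size_t i = 0; i < q->len; i++) {
        if (q->reqs[i].type == REQ_BARRIER && q->reqs[i].section == ctx->section) {
            pos = i;
            break;
        }
    }
    struct request r = { REQ_WRITE, ctx->section, ctx->id };
    return insert_at(q, pos, r);
}

/* Request a barrier: reuse an existing barrier for this section if there is
 * one (this is the merge), otherwise append a new one. */
static int queue_barrier(struct context *ctx)
{
    assert(ctx != NULL && ctx->q != NULL);
    struct queue *q = ctx->q;

    if (ctx->section == q->num_barriers) {
        struct request r = { REQ_BARRIER, q->num_barriers, -1 };
        if (insert_at(q, q->len, r) < 0) {
            return -1;
        }
        q->num_barriers++;
    }
    ctx->section++;
    return 0;
}
```

Running "write, barrier, write" from three contexts against one queue leaves seven entries with a single barrier in the middle, which matches the seven-step list in the commit message.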
[Qemu-devel] [PATCH 0/4] block-queue: Delay and batch metadata write
Differences to RFC v3 include proper conversion of qcow2, addressing Stefan's
comments and fixing some error cases in which two write requests to the same
location might conflict.

Also worth noting is that bdrv_aio_pwrite is dropped. It was unsafe with
respect to multiple concurrent requests on the same sector and it's impossible
to safely emulate byte-wise access with bdrv_aio_readv/writev without
introducing yet another queue. Instead we fall back to synchronous bdrv_pwrite
now with unaligned requests in block-queue (they are rare).

Kevin Wolf (4):
  Make vm_stop available for block layer
  Add block-queue
  Test cases for block-queue
  qcow2: Use block-queue

 Makefile               |    1 +
 Makefile.objs          |    2 +-
 block-queue.c          |  875
 block-queue.h          |   61
 block/qcow2-cluster.c  |  139 +
 block/qcow2-refcount.c |  217 +++-
 block/qcow2-snapshot.c |  106 +--
 block/qcow2.c          |  144 +++-
 block/qcow2.h          |   33 ++-
 check-block-queue.c    |  402 ++
 cpus.c                 |    8 +-
 qemu-common.h          |    3 +
 qemu-tool.c            |    5 +
 sysemu.h               |    1 -
 14 files changed, 1793 insertions(+), 204 deletions(-)
 create mode 100644 block-queue.c
 create mode 100644 block-queue.h
 create mode 100644 check-block-queue.c

--
1.7.2.3
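The fallback mentioned in the cover letter (synchronous writes for unaligned requests) comes down to an alignment check before choosing the I/O path. A minimal sketch, assuming a 512-byte sector size; `SECTOR_SIZE`, `is_sector_aligned` and `choose_write_path` are hypothetical names and not taken from the series:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512u  /* assumption: sector-granular AIO interface */

enum io_path {
    IO_PATH_AIO,   /* sector-aligned: can go through the vectored AIO path */
    IO_PATH_SYNC,  /* unaligned: fall back to a synchronous byte-wise write */
};

/* True if both the start and the length fall on sector boundaries. */
static bool is_sector_aligned(uint64_t offset, uint64_t size)
{
    return (offset % SECTOR_SIZE == 0) && (size % SECTOR_SIZE == 0);
}

/* Pick the path for a queued write. */
static enum io_path choose_write_path(uint64_t offset, uint64_t size)
{
    assert(size > 0); /* a zero-length request is a caller bug in this sketch */
    return is_sector_aligned(offset, size) ? IO_PATH_AIO : IO_PATH_SYNC;
}
```

Metadata updates such as a single 8-byte table entry would take the synchronous path here, while whole-sector writes stay asynchronous.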
[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
On Mon, Dec 13, 2010 at 4:00 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Dec 13, 2010 at 03:27:06PM +, Stefan Hajnoczi wrote: On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 01:11:27PM +, Stefan Hajnoczi wrote: Fresh results: 192.168.0.1 - host (runs netperf) 192.168.0.2 - guest (runs netserver) host$ src/netperf -H 192.168.0.2 -- -m 200 ioeventfd=on TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1759.25 ioeventfd=off TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1757.15 The results vary approx +/- 3% between runs. Invocation: $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev type=tap,id=net0,ifname=tap0,script=no,downscript=no -device virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img I am running qemu.git with v5 patches, based off 36888c6335422f07bbc50bf3443a39f24b90c7c6. Host: 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz 8 GB RAM RHEL 6 host Next I will try the patches on latest qemu-kvm.git Stefan One interesting thing is that I put virtio-net earlier on command line. Sorry I mean I put it after disk, you put it before. I can't find a measurable difference when swapping -drive and -netdev. Can you run the same test with vhost? I assume it still outperforms userspace virtio for small message sizes? I'm interested because that also uses ioeventfd. Seems to work same as ioeventfd. vhost performs the same as ioeventfd=on? And that means slower than ioeventfd=off? Stefan
[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
On Mon, Dec 13, 2010 at 04:29:58PM +, Stefan Hajnoczi wrote: On Mon, Dec 13, 2010 at 4:00 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Dec 13, 2010 at 03:27:06PM +, Stefan Hajnoczi wrote: On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 01:11:27PM +, Stefan Hajnoczi wrote: Fresh results: 192.168.0.1 - host (runs netperf) 192.168.0.2 - guest (runs netserver) host$ src/netperf -H 192.168.0.2 -- -m 200 ioeventfd=on TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1759.25 ioeventfd=off TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1757.15 The results vary approx +/- 3% between runs. Invocation: $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev type=tap,id=net0,ifname=tap0,script=no,downscript=no -device virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img I am running qemu.git with v5 patches, based off 36888c6335422f07bbc50bf3443a39f24b90c7c6. Host: 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz 8 GB RAM RHEL 6 host Next I will try the patches on latest qemu-kvm.git Stefan One interesting thing is that I put virtio-net earlier on command line. Sorry I mean I put it after disk, you put it before. I can't find a measurable difference when swapping -drive and -netdev. Can you run the same test with vhost? I assume it still outperforms userspace virtio for small message sizes? I'm interested because that also uses ioeventfd. Seems to work same as ioeventfd. vhost performs the same as ioeventfd=on? 
And that means slower than ioeventfd=off? Stefan Yes. -- MST
Re: [Qemu-devel] [PATCH 6/6] [RFC] SPARCV8 asr17 register support.
On 12/11/2010 10:59 AM, Blue Swirl wrote: On Tue, Dec 7, 2010 at 11:51 AM, Fabien Chouteauchout...@adacore.com wrote: On 12/06/2010 07:01 PM, Blue Swirl wrote: On Mon, Dec 6, 2010 at 9:26 AM, Fabien Chouteauchout...@adacore.com wrote: Signed-off-by: Fabien Chouteauchout...@adacore.com --- hw/leon3.c |6 ++ target-sparc/cpu.h |1 + target-sparc/machine.c |2 ++ target-sparc/translate.c | 10 ++ 4 files changed, 19 insertions(+), 0 deletions(-) diff --git a/hw/leon3.c b/hw/leon3.c index ba61081..9605ce8 100644 --- a/hw/leon3.c +++ b/hw/leon3.c @@ -187,6 +187,12 @@ static void main_cpu_reset(void *opaque) values */ leon3_state.inst_cache_conf = 0x1022; leon3_state.data_cache_conf = 0x1822; + +/* Asr17 for Leon3 mono-processor */ +env-asr17= 028; /* CPU id */ +env-asr17= 18; /* SPARC V8 multiply and divide available */ +env-asr17= env-nwindows -1; /* Number of implemented registers + windows */ This is constant... } static void leon3_generic_hw_init(ram_addr_t ram_size, diff --git a/target-sparc/cpu.h b/target-sparc/cpu.h index 6020ffd..36d49fc 100644 --- a/target-sparc/cpu.h +++ b/target-sparc/cpu.h @@ -341,6 +341,7 @@ typedef struct CPUSPARCState { from PSR) */ #if !defined(TARGET_SPARC64) || defined(TARGET_ABI32) uint32_t wim; /* window invalid mask */ +uint32_t asr17;/* asr17 */ ... so no new env fields are needed... #endif target_ulong tbr; /* trap base register */ #if !defined(TARGET_SPARC64) diff --git a/target-sparc/machine.c b/target-sparc/machine.c index 752e431..c530bd3 100644 --- a/target-sparc/machine.c +++ b/target-sparc/machine.c @@ -42,6 +42,7 @@ void cpu_save(QEMUFile *f, void *opaque) qemu_put_be32s(f,env-pil_in); #ifndef TARGET_SPARC64 qemu_put_be32s(f,env-wim); +qemu_put_be32s(f,env-asr17); ... there's also nothing to save/load... 
/* MMU */ for (i = 0; i32; i++) qemu_put_be32s(f,env-mmuregs[i]); @@ -138,6 +139,7 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id) qemu_get_be32s(f,env-pil_in); #ifndef TARGET_SPARC64 qemu_get_be32s(f,env-wim); +qemu_get_be32s(f,env-asr17); /* MMU */ for (i = 0; i32; i++) qemu_get_be32s(f,env-mmuregs[i]); diff --git a/target-sparc/translate.c b/target-sparc/translate.c index 23f9519..65de614 100644 --- a/target-sparc/translate.c +++ b/target-sparc/translate.c @@ -58,6 +58,7 @@ static TCGv cpu_hintp, cpu_htba, cpu_hver, cpu_ssr, cpu_ver; static TCGv_i32 cpu_softint; #else static TCGv cpu_wim; +static TCGv cpu_asr17; #endif /* local register indexes (only used inside old micro ops) */ static TCGv cpu_tmp0; @@ -2049,6 +2050,8 @@ static void disas_sparc_insn(DisasContext * dc) rs1 = GET_FIELD(insn, 13, 17); switch(rs1) { case 0: /* rdy */ +gen_movl_TN_reg(rd, cpu_y); +break; #ifndef TARGET_SPARC64 case 0x01 ... 0x0e: /* undefined in the SPARCv8 manual, rdy on the microSPARC @@ -2058,6 +2061,11 @@ static void disas_sparc_insn(DisasContext * dc) case 0x10 ... 0x1f: /* implementation-dependent in the SPARCv8 manual, rdy on the microSPARC II */ + +if (rs1 == 0x11) { /* Read %asr17 */ +gen_movl_TN_reg(rd, cpu_asr17); Instead: r_const = tcg_const_tl(asr constants | dc-def-nwindows - 1); gen_movl_TN_reg(rd, r_const); tcg_temp_free(r_const); OK for me, if it is acceptable to have this Leon3's specific behavior for all the SPARC32 CPUs. This will not affect other CPUs when you use CPU feature bits to make the ASR only available to Leon3. OK, I will try that. -- Fabien Chouteau
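The suggestion in this thread is to build the %asr17 value as a constant from the CPU definition instead of storing it in the CPU state. A sketch of that computation follows. The shift operators in the quoted patch were lost in this archive, so the bit positions used here (CPU id from bit 28, the V8 multiply/divide flag at bit 8, window count minus one in the low bits) are an assumed reading of the constants, and `leon3_asr17` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Compose a Leon3-style %asr17 value.
 * Assumed layout: [31:28] CPU id, [8] SPARC V8 mul/div available,
 * [4:0] number of implemented register windows minus one. */
static uint32_t leon3_asr17(uint32_t cpu_id, unsigned nwindows)
{
    assert(nwindows >= 1 && nwindows <= 32);

    uint32_t value = 0;
    value |= (cpu_id & 0xfu) << 28;             /* CPU id */
    value |= UINT32_C(1) << 8;                  /* V8 multiply and divide available */
    value |= (uint32_t)(nwindows - 1) & 0x1fu;  /* register windows - 1 */
    return value;
}
```

Because the result depends only on values fixed at CPU definition time, nothing needs to be added to the saved machine state, which is the point made in the review.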
Re: [Qemu-devel] [PATCH 5/6] [RFC] Emulation of Leon3.
On 12/12/2010 03:41 PM, Andreas Färber wrote: Am 06.12.2010 um 10:26 schrieb Fabien Chouteau: diff --git a/Makefile.target b/Makefile.target index 2800f47..f40e04f 100644 --- a/Makefile.target +++ b/Makefile.target @@ -290,7 +290,10 @@ obj-sparc-y += cirrus_vga.o else obj-sparc-y = sun4m.o lance.o tcx.o sun4m_iommu.o slavio_intctl.o obj-sparc-y += slavio_timer.o slavio_misc.o sparc32_dma.o -obj-sparc-y += cs4231.o eccmemctl.o sbi.o sun4c_intctl.o +obj-sparc-y += cs4231.o eccmemctl.o sbi.o sun4c_intctl.o leon3.o + +# GRLIB +obj-sparc-y += grlib_gptimer.o grlib_irqmp.o grlib_apbuart.o Aren't these three candidates for Makefile.hw if, as I understood it, they are from some non-sparc-specific component library? They are sparc specific, but non-leon3-specific. -- Fabien Chouteau
[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate
On Mon, 2010-12-13 at 02:55 +0530, Juan Quintela wrote: Alex Williamson alex.william...@redhat.com wrote: On Sun, 2010-12-12 at 20:07 +0530, Juan Quintela wrote: Michael S. Tsirkin m...@redhat.com wrote: On Sun, Dec 12, 2010 at 05:23:39PM +0530, Juan Quintela wrote: Michael S. Tsirkin m...@redhat.com wrote: On Thu, Dec 09, 2010 at 03:14:17PM -0700, Alex Williamson wrote: How about we keep migrating the index for the benefit of old versions, but ignore the value on load? Something like the following: This was my 1st suggestion to Alex O:-) The difference here is that instead of sending garbage to the old version we send an actual index value. So, I am in. He thinks this is bad for upstream; I don't think so (but I understand that it is debatable). Later, Juan. I think it makes sense to fix this for the stable branch, and I think we should try as hard as we can to avoid bumping up the version number there. For master we can bump the version number but it might be easier to just keep the code the same there. I think that your solution is better. For older versions, it works as expected. For new versions, the problem is fixed. The solution is not the purest, but you can say the same about upping the version for a state that has exactly the same fields and length O:-) I disagree, without bumping the version number, we can never guarantee the problem is behind us. We can, if we use the latest version. And we determine we're using the latest version via the vmsd version_id... We can always migrate to the bad version, That is the whole point. Bumping the version makes this impossible. Which seems like a good thing to me. Yes, it sucks that a user may upgrade a host, migrate a guest to it, and suddenly not be able to migrate back to the original host. On the other hand, isn't it better that we don't allow a migration that could potentially risk the integrity of the guest? I think so. which puts our users at risk.
The responsible behavior is to allow forward migrations and prevent migrations to a version with an issue known to compromise VM integrity. Perhaps I feel more strongly about this because I actually had to debug this problem. Obvious in retrospect, but a huge pain in the butt to get there.

Obviously, my point of view is different, and is related to maintaining a stable migration ABI. So, ... I am also biased. We have to make a decision (in general, not just this case):

- we are never going to bump the version: this gives a stable ABI, but bugs stay with us forever

This is impossible.

- we are not ever going to pretend that we care: this makes changes trivial, as we don't have to maintain backward compatibility.

That's a little dramatic. If we can come up with a way to not bump the version number, I'm all for it. I haven't seen one so far.

And that is it. Basically anything in the middle doesn't matter. If I have a machine definition with only a single device that has a bumped version, I can't migrate to the older one.

Sorry, it's for your own good. AIUI, there is plenty of grey between your criteria above. Yes we should try to preserve the migration ABI. However, we will hit bugs where that's impossible. Then it's good to have discussions like this and investigate whether we can safely make a change without bumping the version_id. IMHO, the integrity of the guest is always more important than maintaining a static ABI.

This is the reason why I am against changes like this, if we are pretending that we are going to keep the versions stable. Notice that there are (at least) two ways to look at this specific problem:

- don't bump the version.
  * new -> new : works
  * old -> new : works
  * new -> old : works (at least as well as old -> old that existed before)

If it worked, I wouldn't be working on this bug ;) Here are some failure scenarios:

a) 1. Boot guest with single rtl8139
   2. Hot add 2nd rtl8139
   3. Migrate guest
   4. Hot remove 2nd rtl8139
   Result: 1st NIC stops working, guest segfaults on reboot

Too complicated? How about this:

b) 1. Boot guest with 2 rtl8139 NICs
   2. Boot migration target with NICs listed in reverse order
   3. Migrate
   Result: NICs get swapped at reboot!!

Or how about:

c) 1. Boot guest with e1000, rtl8139
   2. Boot migration target with rtl8139, e1000
   3. Migrate
   Result: rtl8139 now points at e1000 mmio space, fails on reboot, e1000 fails if rtl8139 is removed

I don't think it's fair to call any of these working, and in fact, I retract my patch that sets the mmio space to unassigned if the device is hotplugged, since issues can clearly happen without hotplug involved. The index the device uses depends entirely on instantiation ordering, which is bound to cause confusing, hard to reproduce, and difficult to debug issues.

- bump the version *
[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate
On Mon, Dec 13, 2010 at 10:43:22AM -0700, Alex Williamson wrote: So, unfortunately, I stand by my original patch. What about the one that put -1 in saved index for a hotplugged device? -- MST
[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
On Mon, Dec 13, 2010 at 4:28 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Mon, Dec 13, 2010 at 4:12 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Dec 13, 2010 at 03:27:06PM +, Stefan Hajnoczi wrote: On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 01:11:27PM +, Stefan Hajnoczi wrote: Fresh results: 192.168.0.1 - host (runs netperf) 192.168.0.2 - guest (runs netserver) host$ src/netperf -H 192.168.0.2 -- -m 200 ioeventfd=on TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1759.25 ioeventfd=off TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1757.15 The results vary approx +/- 3% between runs. Invocation: $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev type=tap,id=net0,ifname=tap0,script=no,downscript=no -device virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img I am running qemu.git with v5 patches, based off 36888c6335422f07bbc50bf3443a39f24b90c7c6. Host: 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz 8 GB RAM RHEL 6 host Next I will try the patches on latest qemu-kvm.git Stefan One interesting thing is that I put virtio-net earlier on command line. Sorry I mean I put it after disk, you put it before. I can't find a measurable difference when swapping -drive and -netdev. One other concern I have is that we are apparently using ioeventfd for all VQs. E.g. for virtio-net we probably should not use it for the control VQ - it's a waste of resources. 
One option is a per-device (block, net, etc) bitmap that masks out virtqueues. Is that something you'd like to see? I'm tempted to mask out the RX vq too and see how that affects the qemu-kvm.git specific issue. As expected, the rx virtqueue is involved in the degradation. I enabled ioeventfd only for the TX virtqueue and got the same good results as userspace virtio-net. When I enable only the rx virtqueue, performance decreases as we've seen above. Stefan
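The per-device bitmap idea floated in this message could look roughly like the following. It is a sketch only: `vq_mask_set` and `vq_uses_ioeventfd` are hypothetical helpers, not functions from the v5 patches, and the queue numbering assumes a single-queue virtio-net device (receive queue 0, transmit queue 1, control queue 2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Queue indices for a single-queue virtio-net device. */
enum { VQ_RX = 0, VQ_TX = 1, VQ_CTRL = 2 };

/* Return mask with the bit for virtqueue vq set. */
static uint32_t vq_mask_set(uint32_t mask, unsigned vq)
{
    assert(vq < 32);
    return mask | (UINT32_C(1) << vq);
}

/* True if ioeventfd should be used for virtqueue vq under this mask.
 * Queues beyond the mask width never use ioeventfd. */
static bool vq_uses_ioeventfd(uint32_t mask, unsigned vq)
{
    return vq < 32 && (mask & (UINT32_C(1) << vq)) != 0;
}
```

A virtio-net mask with only the TX bit set corresponds to the configuration reported above as performing as well as userspace virtio-net, while leaving RX and the control queue on the ordinary notify path.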
[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate
On Mon, 2010-12-13 at 19:50 +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 10:43:22AM -0700, Alex Williamson wrote: So, unfortunately, I stand by my original patch. What about the one that put -1 in saved index for a hotplugged device? There are still examples that don't work even without hotplug (example 2 and example 3 after the reboot). That hack limits the damage, but still leaves a latent bug for reboot and doesn't address the non-hotplug scenarios. So, I don't think it's worthwhile to pursue, and we shouldn't pretend we can use it to avoid bumping the version_id. Thanks, Alex
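The effect of bumping version_id that is argued over in this thread can be shown with a small model of the load-side check. This is a sketch and not QEMU's vmstate code: `struct vmsd_versions` and `can_load` are hypothetical, and the version numbers in the usage note are made up. The assumption is that a destination accepts a stream only if its version lies between the destination's minimum supported version and its own current version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The two version fields a device description carries in this model. */
struct vmsd_versions {
    int version_id;          /* version this build writes and understands */
    int minimum_version_id;  /* oldest incoming version still accepted */
};

/* True if a destination built with dst can load a stream of stream_version. */
static bool can_load(const struct vmsd_versions *dst, int stream_version)
{
    assert(dst != NULL);
    return stream_version >= dst->minimum_version_id &&
           stream_version <= dst->version_id;
}
```

With an old build at version 4 and a fixed build at version 5 (both accepting 3 and up), old to new still loads, while new to old is refused because the old build does not know version 5. That refusal is the "can't migrate back" cost discussed above, and also what keeps a guest from landing on the build with the known bug.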
[Qemu-devel] [PATCH 3/7] Add configure script and command line options for TPM interface.
Signed-off-by: Andreas Niederl andreas.nied...@iaik.tugraz.at --- Makefile.objs |3 +++ configure |9 + qemu-config.c | 16 qemu-config.h |1 + qemu-options.hx |6 ++ vl.c| 29 + 6 files changed, 64 insertions(+), 0 deletions(-) diff --git a/Makefile.objs b/Makefile.objs index 7409919..444a41a 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -278,6 +278,9 @@ hw-obj-$(CONFIG_REALLY_VIRTFS) += virtio-9p-debug.o hw-obj-$(CONFIG_VIRTFS) += virtio-9p-local.o virtio-9p-xattr.o hw-obj-$(CONFIG_VIRTFS) += virtio-9p-xattr-user.o virtio-9p-posix-acl.o +# TPM passthrough device +hw-obj-$(CONFIG_TPM) += tpm_tis.o tpm_backend.o tpm_host_backend.o + ## # libdis # NOTE: the disassembler code is only needed for debugging diff --git a/configure b/configure index 2917874..ca97825 100755 --- a/configure +++ b/configure @@ -332,6 +332,7 @@ zero_malloc= trace_backend=nop trace_file=trace spice= +tpm=no # OS specific if check_define __linux__ ; then @@ -472,6 +473,7 @@ Haiku) usb=linux if [ $cpu = i386 -o $cpu = x86_64 ] ; then audio_possible_drivers=$audio_possible_drivers fmod +tpm=yes fi ;; esac @@ -739,6 +741,8 @@ for opt do ;; --enable-vhost-net) vhost_net=yes ;; + --disable-tpm) tpm=no + ;; --*dir) ;; *) echo ERROR: unknown option $opt; show_help=yes @@ -934,6 +938,7 @@ echo --trace-file=NAMEFull PATH,NAME of file to store traces echoDefault:trace-pid echo --disable-spice disable spice echo --enable-spice enable spice +echo --disable-tpmdisable tpm passthrough device emulation echo echo NOTE: The object files are built at the place where configure is launched exit 1 @@ -2354,6 +2359,7 @@ echo vhost-net support $vhost_net echo Trace backend $trace_backend echo Trace output file $trace_file-pid echo spice support $spice +echo tpm support $tpm if test $sdl_too_old = yes; then echo - Your SDL version is too old - please upgrade to have SDL support @@ -2606,6 +2612,9 @@ fi if test $fdatasync = yes ; then echo CONFIG_FDATASYNC=y $config_host_mak fi +if test $tpm = yes ; then + echo 
CONFIG_TPM=y $config_host_mak +fi if test $madvise = yes ; then echo CONFIG_MADVISE=y $config_host_mak fi diff --git a/qemu-config.c b/qemu-config.c index 965fa46..b42483c 100644 --- a/qemu-config.c +++ b/qemu-config.c @@ -445,6 +445,22 @@ QemuOptsList qemu_option_rom_opts = { }, }; +QemuOptsList qemu_tpm_opts = { +.name = tpm, +.implied_opt_name = type, +.head = QTAILQ_HEAD_INITIALIZER(qemu_tpm_opts.head), +.desc = { +{ +.name = type, +.type = QEMU_OPT_STRING, +},{ +.name = path, +.type = QEMU_OPT_STRING, +}, +{ /*End of list */ } +}, +}; + static QemuOptsList *vm_config_groups[32] = { qemu_drive_opts, qemu_chardev_opts, diff --git a/qemu-config.h b/qemu-config.h index 20d707f..eed9b3f 100644 --- a/qemu-config.h +++ b/qemu-config.h @@ -4,6 +4,7 @@ extern QemuOptsList qemu_fsdev_opts; extern QemuOptsList qemu_virtfs_opts; extern QemuOptsList qemu_spice_opts; +extern QemuOptsList qemu_tpm_opts; QemuOptsList *qemu_find_opts(const char *group); void qemu_add_opts(QemuOptsList *list); diff --git a/qemu-options.hx b/qemu-options.hx index 4d99a58..96cdb36 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -2312,6 +2312,12 @@ STEXI Specify a trace file to log output traces to. ETEXI #endif +#ifdef CONFIG_TPM +DEF(tpm, HAS_ARG, QEMU_OPTION_tpm, +-tpm host,id=id,path=path\n +enable TPM support and forward commands to the given TPM device file\n, +QEMU_ARCH_I386) +#endif HXCOMM This is the last statement. Insert new options before this line! 
STEXI diff --git a/vl.c b/vl.c index cb0a3ec..fa29cbf 100644 --- a/vl.c +++ b/vl.c @@ -152,6 +152,9 @@ int main(int argc, char **argv) #ifdef CONFIG_VIRTFS #include fsdev/qemu-fsdev.h #endif +#ifdef CONFIG_TPM +#include hw/tpm.h +#endif #include disas.h @@ -1614,6 +1617,16 @@ static int fsdev_init_func(QemuOpts *opts, void *opaque) } #endif +#ifdef CONFIG_TPM +static int tpm_init_func(QemuOpts *opts, void *opaque) +{ +int ret; +ret = qemu_tpm_add(opts); + +return ret; +} +#endif + static int mon_init_func(QemuOpts *opts, void *opaque) { CharDriverState *chr; @@ -1944,6 +1957,10 @@ int main(int argc, char **argv, char **envp) tb_size = 0; autostart= 1; +#ifdef CONFIG_TPM +qemu_add_opts(qemu_tpm_opts); +#endif + /* first pass of option parsing */ optind = 1; while (optind argc) { @@ -2438,6 +2455,13 @@ int main(int argc, char **argv, char **envp) qemu_free(arg_9p); break; } +case QEMU_OPTION_tpm: +
[Qemu-devel] [PATCH 2/7] Add TPM host passthrough device backend.
Threadlets are used for asynchronous I/O to the host TPM device because the Linux TPM driver does not allow for non-blocking I/O. This patch is based on the Threadlets patch series v12 posted on this list. Signed-off-by: Andreas Niederl andreas.nied...@iaik.tugraz.at --- hw/tpm_backend.c |1 + hw/tpm_host_backend.c | 219 + hw/tpm_int.h |7 ++ hw/tpm_tis.c |3 - 4 files changed, 227 insertions(+), 3 deletions(-) create mode 100644 hw/tpm_host_backend.c diff --git a/hw/tpm_backend.c b/hw/tpm_backend.c index a0bec7c..2d3b550 100644 --- a/hw/tpm_backend.c +++ b/hw/tpm_backend.c @@ -26,6 +26,7 @@ typedef struct { } TPMDriverTable; static const TPMDriverTable driver_table[] = { +{ .name = host, .open = qemu_tpm_host_open }, }; int qemu_tpm_add(QemuOpts *opts) { diff --git a/hw/tpm_host_backend.c b/hw/tpm_host_backend.c new file mode 100644 index 000..238b030 --- /dev/null +++ b/hw/tpm_host_backend.c @@ -0,0 +1,219 @@ + +#include errno.h +#include signal.h + +#include qemu-common.h +#include qemu-threadlets.h + +#include hw/tpm_int.h + + +#define STATUS_DONE(1 1) +#define STATUS_IN_PROGRESS (1 0) +#define STATUS_IDLE 0 + +typedef struct { +TPMDriver common; + +ThreadletWork work; + +uint8_t send_status; +uint8_t recv_status; + +int32_t send_len; +int32_t recv_len; + +int fd; +} TPMHostDriver; + +static int tpm_host_send(TPMDriver *drv, uint8_t locty, uint32_t len) +{ +TPMHostDriver *hdrv = DO_UPCAST(TPMHostDriver, common, drv); +int n = 0; + +drv-locty = locty; + +switch (hdrv-send_status) { +case STATUS_IN_PROGRESS: +break; +case STATUS_IDLE: +hdrv-send_len = len; +hdrv-recv_len = TPM_MAX_PKT; +/* asynchronous send */ +n = 1; +submit_work(hdrv-work); +break; +case STATUS_DONE: +break; +default: +n = -1; +fprintf(stderr, +tpm host backend: internal error on send status %d\n, +hdrv-send_status); +break; +} + +return n; +} + +static int tpm_host_recv(TPMDriver *drv, uint8_t locty, uint32_t len) +{ +TPMHostDriver *hdrv = DO_UPCAST(TPMHostDriver, common, drv); +int n = 0; + 
+drv-locty = locty; + +switch (hdrv-recv_status) { +case STATUS_IN_PROGRESS: +break; +case STATUS_IDLE: +break; +case STATUS_DONE: +hdrv-recv_status = STATUS_IDLE; +n = hdrv-recv_len; +break; +default: +n = -1; +fprintf(stderr, +tpm host backend: internal error on recv status %d\n, +hdrv-recv_status); +break; +} + +return n; +} + + +/* borrowed from qemu-char.c */ +static int unix_write(int fd, const uint8_t *buf, uint32_t len) +{ +int ret, len1; + +len1 = len; +while (len1 0) { +ret = write(fd, buf, len1); +if (ret 0) { +if (errno != EINTR errno != EAGAIN) +return -1; +} else if (ret == 0) { +break; +} else { +buf += ret; +len1 -= ret; +} +} +return len - len1; +} + +static int unix_read(int fd, uint8_t *buf, uint32_t len) +{ +int ret, len1; +uint8_t *buf1; + +len1 = len; +buf1 = buf; +while ((len1 0) (ret = read(fd, buf1, len1)) != 0) { +if (ret 0) { +if (errno != EINTR errno != EAGAIN) +return -1; +} else { +buf1 += ret; +len1 -= ret; +} +} +return len - len1; +} + + +static void tpm_host_send_receive(ThreadletWork *work) +{ +TPMHostDriver *drv = container_of(work, TPMHostDriver, work); +TPMDriver *s = drv-common; +uint32_t tpm_ret; +int ret; + +drv-send_status = STATUS_IN_PROGRESS; + +DSHOW_BUFF(s-buf, To TPM); + +ret = unix_write(drv-fd, s-buf, drv-send_len); + +drv-send_len= ret; +drv-send_status = STATUS_DONE; + +if (ret 0) { +fprintf(stderr, Error: while transmitting data to host tpm +: %s (%i)\n, +strerror(errno), errno); +return; +} + +drv-recv_status = STATUS_IN_PROGRESS; + +ret = unix_read(drv-fd, s-buf, drv-recv_len); + +drv-recv_len= ret; +drv-recv_status = STATUS_DONE; +drv-send_status = STATUS_IDLE; + +if (ret 0) { +fprintf(stderr, Error: while reading data from host tpm +: %s (%i)\n, +strerror(errno), errno); +return; +} + +DSHOW_BUFF(s-buf, From TPM); + +tpm_ret = (s-buf[8])*256 + s-buf[9]; +if (tpm_ret) { +DPRINTF(tpm command failed with error %d\n, tpm_ret); +} else { +DPRINTF(tpm command succeeded\n); +} +} + +
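The comparison and logical operators in unix_write() above were lost in the archive. As a rough sketch of the retry pattern it appears to follow (keep writing until everything is out, retry on EINTR/EAGAIN, give up on any other error), something like the following could be used. The names write_all() and write_all_roundtrip() are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the patch): write up to len bytes,
 * retrying when the call is interrupted or would block.
 * Returns the number of bytes written, or -1 on any other error. */
static ssize_t write_all(int fd, const unsigned char *buf, size_t len)
{
    size_t left = len;

    while (left > 0) {
        ssize_t ret = write(fd, buf, left);

        if (ret < 0) {
            if (errno != EINTR && errno != EAGAIN) {
                return -1;
            }
            /* EINTR or EAGAIN: simply try again */
        } else if (ret == 0) {
            break;              /* nothing more can be written */
        } else {
            buf += ret;
            left -= (size_t)ret;
        }
    }
    return (ssize_t)(len - left);
}

/* Hypothetical self-check: push a short message through a pipe and
 * read it back. Returns 0 on success, -1 on failure. */
static int write_all_roundtrip(void)
{
    int fds[2];
    unsigned char out[5] = { 't', 'p', 'm', '1', '2' };
    unsigned char in[5] = { 0 };
    int ok;

    if (pipe(fds) != 0) {
        return -1;
    }
    ok = write_all(fds[1], out, sizeof(out)) == (ssize_t)sizeof(out)
         && read(fds[0], in, sizeof(in)) == (ssize_t)sizeof(in)
         && memcmp(in, out, sizeof(out)) == 0;
    close(fds[0]);
    close(fds[1]);
    return ok ? 0 : -1;
}
```

Note that this loop spins on EAGAIN, which is only reasonable on a worker thread with a blocking descriptor, as the threadlet approach in the patch implies.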
[Qemu-devel] Re: [PATCH 3/7] Add configure script and command line options for TPM interface.
On 12/13/2010 07:04 PM, Andreas Niederl wrote: [...] Sorry for the wrong patch count in the subject. Total number is 4. Regards, Andreas
[Qemu-devel] [PATCH 1/7] Add TPM 1.2 device interface
This implementation is based on the TPM 1.2 interface for virtualized TPM devices from the Xen-4.0.0 ioemu-qemu-xen fork. A backend driver infrastructure is provided to be able to use different device backends. Signed-off-by: Andreas Niederl andreas.nied...@iaik.tugraz.at --- hw/tpm.h |6 + hw/tpm_backend.c | 63 + hw/tpm_int.h | 36 +++ hw/tpm_tis.c | 711 ++ 4 files changed, 816 insertions(+), 0 deletions(-) create mode 100644 hw/tpm.h create mode 100644 hw/tpm_backend.c create mode 100644 hw/tpm_int.h create mode 100644 hw/tpm_tis.c diff --git a/hw/tpm.h b/hw/tpm.h new file mode 100644 index 000..844c95e --- /dev/null +++ b/hw/tpm.h @@ -0,0 +1,6 @@ +#ifndef TPM_H +#define TPM_H + +int qemu_tpm_add(QemuOpts *opts); + +#endif /* TPM_H */ diff --git a/hw/tpm_backend.c b/hw/tpm_backend.c new file mode 100644 index 000..a0bec7c --- /dev/null +++ b/hw/tpm_backend.c @@ -0,0 +1,63 @@ + +#include qemu-option.h + +#include hw/tpm.h +#include hw/tpm_int.h + + +static QLIST_HEAD(, TPMDriver) tpm_drivers = +QLIST_HEAD_INITIALIZER(tpm_drivers); + +TPMDriver *tpm_get_driver(const char *id) +{ +TPMDriver *drv; +QLIST_FOREACH(drv, tpm_drivers, list) { +if (!strcmp(drv-id, id)) { +return drv; +} +} +return NULL; +} + + +typedef struct { +const char *name; +TPMDriver *(*open)(QemuOpts *opts); +} TPMDriverTable; + +static const TPMDriverTable driver_table[] = { +}; + +int qemu_tpm_add(QemuOpts *opts) { +TPMDriver *drv = NULL; +int i; + +if (qemu_opts_id(opts) == NULL) { +fprintf(stderr, tpm: no id specified\n); +return -1; +} + +for (i = 0; i ARRAY_SIZE(driver_table); i++) { +if (strcmp(driver_table[i].name, qemu_opt_get(opts, type)) == 0) { +break; +} +} + +if (i == ARRAY_SIZE(driver_table)) { +fprintf(stderr, tpm: backend type %s not found\n, +qemu_opt_get(opts, type)); +return -1; +} + +drv = driver_table[i].open(opts); + +if (drv == NULL) { +return -1; +} + +drv-id = qemu_strdup(qemu_opts_id(opts)); + +QLIST_INSERT_HEAD(tpm_drivers, drv, list); + +return 0; +} diff --git 
a/hw/tpm_int.h b/hw/tpm_int.h new file mode 100644 index 000..d52d7e2 --- /dev/null +++ b/hw/tpm_int.h @@ -0,0 +1,36 @@ +#ifndef TPM_INT_H +#define TPM_INT_H + + +#include inttypes.h +#include qemu-queue.h +#include qemu-option.h + + +typedef struct TPMDriver TPMDriver; +struct TPMDriver { +char *id; + +uint8_t locty; +uint8_t *buf; + +int (*send)(TPMDriver *drv, uint8_t locty, uint32_t len); +int (*recv)(TPMDriver *drv, uint8_t locty, uint32_t len); + +QLIST_ENTRY(TPMDriver) list; +}; + +TPMDriver *tpm_get_driver(const char *id); + +#define DEBUG_TPM +#ifdef DEBUG_TPM +void show_buff(unsigned char *buff, const char *string); +#define DPRINTF(fmt, ...) \ +fprintf(stderr, tpm_tis: %s: fmt, __FUNCTION__, ##__VA_ARGS__) +#define DSHOW_BUFF(buf, info) show_buff(buf, info) +#else +#define DPRINTF(fmt, ...) +#define DSHOW_BUFF(buf, info) +#endif + +#endif /* TPM_INT_H */ diff --git a/hw/tpm_tis.c b/hw/tpm_tis.c new file mode 100644 index 000..0cee917 --- /dev/null +++ b/hw/tpm_tis.c @@ -0,0 +1,711 @@ +/* + * tpm_tis.c - QEMU emulator for a 1.2 TPM with TIS interface + * + * Copyright (C) 2006 IBM Corporation + * Copyright (C) 2010 IAIK, Graz University of Technology + * + * Author: Stefan Berger stef...@us.ibm.com + * David Safford saff...@us.ibm.com + * + * Author: Andreas Niederl andreas.nied...@iaik.tugraz.at + * Pass through a TPM device rather than using the emulator + * Modified to use a separate thread for IO to/from TPM as the Linux + * TPM driver framework does not allow non-blocking IO + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation, version 2 of the + * License. 
+ * + * + * Implementation of the TIS interface according to specs at + * https://www.trustedcomputinggroup.org/ + * + */ + +#include sys/types.h +#include sys/stat.h +#include string.h + +#include qemu-option.h +#include qemu-config.h +#include hw/hw.h +#include hw/pc.h +#include hw/pci.h +#include hw/pci_ids.h +#include qemu-timer.h + +#include hw/tpm_int.h + + +#define TPM_MAX_PKT4096 +#define TPM_MAX_PATH 4096 + +#define TIS_ADDR_BASE 0xFED4 + +/* tis registers */ +#define TPM_REG_ACCESS0x00 +#define TPM_REG_INT_ENABLE0x08 +#define TPM_REG_INT_VECTOR0x0c +#define TPM_REG_INT_STATUS0x10 +#define TPM_REG_INTF_CAPABILITY 0x14 +#define TPM_REG_STS 0x18 +#define TPM_REG_DATA_FIFO 0x24 +#define TPM_REG_DID_VID 0xf00 +#define TPM_REG_RID 0xf04 + +#define
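For readers following the backend infrastructure in qemu_tpm_add() above: the lookup is a plain name-to-open-function table. A minimal standalone sketch of that pattern is below. Backend, BackendTableEntry, open_host() and backend_open() are hypothetical stand-ins, not the patch's types. The NULL guard is an addition of this sketch; the patch hands the qemu_opt_get() result straight to strcmp(), which may be worth guarding if the type option can be absent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the patch's TPMDriver. */
typedef struct Backend {
    const char *kind;
} Backend;

typedef Backend *(*backend_open_fn)(void);

/* Hypothetical stand-in for the patch's TPMDriverTable. */
typedef struct {
    const char *name;
    backend_open_fn open;
} BackendTableEntry;

static Backend host_backend = { "host" };

static Backend *open_host(void)
{
    return &host_backend;
}

static const BackendTableEntry backend_table[] = {
    { .name = "host", .open = open_host },
};

#define TABLE_SIZE(t) (sizeof(t) / sizeof((t)[0]))

/* Look up a backend by type name.
 * Returns NULL if the type is missing or not in the table. */
static Backend *backend_open(const char *type)
{
    size_t i;

    if (type == NULL) {
        return NULL;    /* guard: strcmp() must not see a NULL pointer */
    }
    for (i = 0; i < TABLE_SIZE(backend_table); i++) {
        if (strcmp(backend_table[i].name, type) == 0) {
            return backend_table[i].open();
        }
    }
    return NULL;
}
```

Adding another backend then only means adding one table entry, which is what patch 2/7 does for the host passthrough driver.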
Re: [Qemu-devel] [PATCH 2/6] [RFC] Emulation of GRLIB IRQMP as defined in GRLIB IP Core User's Manual.
On Mon, Dec 13, 2010 at 4:23 PM, Fabien Chouteau chout...@adacore.com wrote: On 12/11/2010 11:31 AM, Blue Swirl wrote: On Tue, Dec 7, 2010 at 10:43 AM, Fabien Chouteauchout...@adacore.com wrote: On 12/06/2010 06:25 PM, Blue Swirl wrote: On Mon, Dec 6, 2010 at 9:26 AM, Fabien Chouteauchout...@adacore.com wrote: Signed-off-by: Fabien Chouteauchout...@adacore.com --- hw/grlib_irqmp.c | 416 ++ 1 files changed, 416 insertions(+), 0 deletions(-) diff --git a/hw/grlib_irqmp.c b/hw/grlib_irqmp.c new file mode 100644 index 000..69e1553 --- /dev/null +++ b/hw/grlib_irqmp.c @@ -0,0 +1,416 @@ +/* + * QEMU GRLIB IRQMP Emulator + * + * (Multiprocessor and extended interrupt not supported) + * + * Copyright (c) 2010 AdaCore + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the Software), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include sysbus.h +#include cpu.h + +#include grlib.h + +/* #define DEBUG_IRQ */ + +#ifdef DEBUG_IRQ +#define DPRINTF(fmt, ...) 
\ + do { printf(IRQMP: fmt , ## __VA_ARGS__); } while (0) +#else +#define DPRINTF(fmt, ...) +#endif + +#define IRQMP_MAX_CPU 16 +#define IRQMP_REG_SIZE 256 /* Size of memory mapped registers */ + +/* Memory mapped register offsets */ +#define LEVEL_OFFSET 0x00 +#define PENDING_OFFSET 0x04 +#define FORCE0_OFFSET 0x08 +#define CLEAR_OFFSET 0x0C +#define MP_STATUS_OFFSET 0x10 +#define BROADCAST_OFFSET 0x14 +#define MASK_OFFSET 0x40 +#define FORCE_OFFSET 0x80 +#define EXTENDED_OFFSET 0xC0 + +typedef struct IRQMP +{ + SysBusDevice busdev; + + CPUSPARCState *env; Devices should never access CPUState directly. Instead, board level should create CPU irqs and these should then be passed here. This case is special, Leon3 is a System-On-Chip and some of the components are very close to the processor. IRQMP is not really a peripheral nor a part of the CPU, it's both... It's not a special case, it could be easily implemented separately. MMUs, FPUs or co-processors could be special even if they have been implemented as separate chips with real hardware. But we are actually not looking at the (historical or current) chip boundaries but more like what makes sense from QEMU architecture point of view. OK then, let's go back to your first comment, why a device can't access CPUState directly? And why Leon3.c would be better to do that. Devices should mind their own business, not other devices' or especially CPUs' businesses. The signals between devices should be made with qemu_irq or bus style interfaces. Board case is different because there we interface with QEMU host. Not all devices are very clean yet. This has been discussed a few times earlier, please see the list archives if you really are interested.
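To illustrate the point made in the review (the device drives an output line, and the board decides what that line is connected to), here is a small self-contained sketch. It is written in the spirit of qemu_irq but deliberately does not use the QEMU API; ToyIrq, ToyCpu, ToyIrqCtrl and all function names are hypothetical, and the mask semantics (bit set means enabled) are an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interrupt line: a handler plus opaque state. */
typedef void (*toy_irq_handler)(void *opaque, int level);

typedef struct {
    toy_irq_handler handler;
    void *opaque;
} ToyIrq;

static void toy_irq_set(const ToyIrq *irq, int level)
{
    if (irq->handler != NULL) {
        irq->handler(irq->opaque, level);
    }
}

/* Hypothetical CPU side: owns its own pending flag. */
typedef struct {
    int irq_pending;
} ToyCpu;

static void toy_cpu_set_irq(void *opaque, int level)
{
    ToyCpu *cpu = opaque;

    cpu->irq_pending = level;
}

/* Hypothetical interrupt controller: knows only its output line,
 * never the CPU state behind it. */
typedef struct {
    uint32_t pending;
    uint32_t mask;      /* bit set = interrupt enabled (assumption) */
    ToyIrq out;
} ToyIrqCtrl;

static void toy_ctrl_set_pending(ToyIrqCtrl *c, unsigned int n, int level)
{
    if (level) {
        c->pending |= UINT32_C(1) << n;
    } else {
        c->pending &= ~(UINT32_C(1) << n);
    }
    toy_irq_set(&c->out, (c->pending & c->mask) != 0);
}

/* Board-level wiring: only this code knows about both sides.
 * Returns 0 if the line behaves as expected, -1 otherwise. */
static int toy_board_demo(void)
{
    ToyCpu cpu = { 0 };
    ToyIrqCtrl ctrl = { 0, UINT32_C(1) << 1, { toy_cpu_set_irq, &cpu } };

    toy_ctrl_set_pending(&ctrl, 0, 1);  /* masked: CPU line stays low */
    if (cpu.irq_pending != 0) {
        return -1;
    }
    toy_ctrl_set_pending(&ctrl, 1, 1);  /* enabled: CPU line goes high */
    if (cpu.irq_pending != 1) {
        return -1;
    }
    toy_ctrl_set_pending(&ctrl, 1, 0);  /* cleared: line drops again */
    return cpu.irq_pending == 0 ? 0 : -1;
}
```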
Re: [Qemu-devel] [PATCH 5/6] [RFC] Emulation of Leon3.
On Mon, Dec 13, 2010 at 3:51 PM, Fabien Chouteau chout...@adacore.com wrote: On 12/11/2010 10:56 AM, Blue Swirl wrote: On Tue, Dec 7, 2010 at 11:40 AM, Fabien Chouteauchout...@adacore.com wrote: On 12/06/2010 06:53 PM, Blue Swirl wrote: On Mon, Dec 6, 2010 at 9:26 AM, Fabien Chouteauchout...@adacore.com wrote: Signed-off-by: Fabien Chouteauchout...@adacore.com --- Makefile.target | 5 +- hw/leon3.c | 310 ++ target-sparc/cpu.h | 10 ++ target-sparc/helper.c | 2 +- target-sparc/op_helper.c | 30 - 5 files changed, 353 insertions(+), 4 deletions(-) diff --git a/Makefile.target b/Makefile.target index 2800f47..f40e04f 100644 --- a/Makefile.target +++ b/Makefile.target @@ -290,7 +290,10 @@ obj-sparc-y += cirrus_vga.o else obj-sparc-y = sun4m.o lance.o tcx.o sun4m_iommu.o slavio_intctl.o obj-sparc-y += slavio_timer.o slavio_misc.o sparc32_dma.o -obj-sparc-y += cs4231.o eccmemctl.o sbi.o sun4c_intctl.o +obj-sparc-y += cs4231.o eccmemctl.o sbi.o sun4c_intctl.o leon3.o + +# GRLIB +obj-sparc-y += grlib_gptimer.o grlib_irqmp.o grlib_apbuart.o endif obj-arm-y = integratorcp.o versatilepb.o arm_pic.o arm_timer.o diff --git a/hw/leon3.c b/hw/leon3.c new file mode 100644 index 000..ba61081 --- /dev/null +++ b/hw/leon3.c @@ -0,0 +1,310 @@ +/* + * QEMU Leon3 System Emulator + * + * Copyright (c) 2010 AdaCore + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the Software), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. 
+ * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ +#include hw.h +#include qemu-timer.h +#include qemu-char.h +#include sysemu.h +#include boards.h +#include loader.h +#include elf.h + +#include grlib.h + +/* #define DEBUG_LEON3 */ + +#ifdef DEBUG_LEON3 +#define DPRINTF(fmt, ...) \ + do { printf(Leon3: fmt , ## __VA_ARGS__); } while (0) +#else +#define DPRINTF(fmt, ...) +#endif + +/* Default system clock. */ +#define CPU_CLK (40 * 1000 * 1000) + +#define PROM_FILENAME u-boot.bin + +#define MAX_PILS 16 + +typedef struct Leon3State +{ + uint32_t cache_control; + uint32_t inst_cache_conf; + uint32_t data_cache_conf; + + uint64_t entry; /* save kernel entry in case of reset */ +} Leon3State; + +Leon3State leon3_state; Again global state, please refactor. Perhaps most of the cache handling code belong to target-sparc/op_helper.c and this structure to CPUSPARCState. I will try to find a solution for that. Is it OK to add some Leon3 specific stuff in the CPUSPARCState? Yes, no problem. You can also drop the intermediate Leon3State structure if there is no benefit. 
+ +/* Cache control: emulate the behavior of cache control registers but without + any effect on the emulated CPU */ + +#define CACHE_DISABLED 0x0 +#define CACHE_FROZEN 0x1 +#define CACHE_ENABLED 0x3 + +/* Cache Control register fields */ + +#define CACHE_CTRL_IF (1 4) /* Instruction Cache Freeze on Interrupt */ +#define CACHE_CTRL_DF (1 5) /* Data Cache Freeze on Interrupt */ +#define CACHE_CTRL_DP (1 14) /* Data cache flush pending */ +#define CACHE_CTRL_IP (1 15) /* Instruction cache flush pending */ +#define CACHE_CTRL_IB (1 16) /* Instruction burst fetch */ +#define CACHE_CTRL_FI (1 21) /* Flush Instruction cache (Write only) */ +#define CACHE_CTRL_FD (1 22) /* Flush Data cache (Write only) */ +#define CACHE_CTRL_DS (1 23) /* Data cache snoop enable */ + +void leon3_cache_control_int(void) +{ + uint32_t state = 0; + + if (leon3_state.cache_control CACHE_CTRL_IF) { + /* Instruction cache state */ + state = leon3_state.cache_control 0x3; Please add a new define CACHE_CTRL_xxx to replace 0x3. Done. + if (state == CACHE_ENABLED) { + state = CACHE_FROZEN; + DPRINTF(Instruction cache: freeze\n); +
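As a sketch of the two review points above (a named mask instead of the bare 0x3, and no global state), the instruction cache part could look roughly like this. CACHE_STATE_MASK and cache_control_on_interrupt() are hypothetical names. The quoted code is cut off after the freeze, so the write-back of the new state into the register is an assumption here, and the shift operators lost in the archive are restored as left shifts.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_DISABLED 0x0
#define CACHE_FROZEN   0x1
#define CACHE_ENABLED  0x3

/* Hypothetical name for the 0x3 literal the review asks to replace. */
#define CACHE_STATE_MASK 0x3

/* Instruction Cache Freeze on Interrupt */
#define CACHE_CTRL_IF (UINT32_C(1) << 4)

/* Return the cache control value after an interrupt: an enabled
 * instruction cache becomes frozen when the IF bit is set.
 * Takes and returns the register value instead of touching a global. */
static uint32_t cache_control_on_interrupt(uint32_t cache_control)
{
    if (cache_control & CACHE_CTRL_IF) {
        uint32_t state = cache_control & CACHE_STATE_MASK;

        if (state == CACHE_ENABLED) {
            state = CACHE_FROZEN;
        }
        /* Assumed write-back: replace the old state bits with the new. */
        cache_control &= ~(uint32_t)CACHE_STATE_MASK;
        cache_control |= state;
    }
    return cache_control;
}
```

A caller holding the register in CPUSPARCState, as suggested in the review, would then just assign the result back to that field.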
[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
On Mon, Dec 13, 2010 at 05:57:28PM +, Stefan Hajnoczi wrote: On Mon, Dec 13, 2010 at 4:28 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Mon, Dec 13, 2010 at 4:12 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Dec 13, 2010 at 03:27:06PM +, Stefan Hajnoczi wrote: On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 01:11:27PM +, Stefan Hajnoczi wrote: Fresh results: 192.168.0.1 - host (runs netperf) 192.168.0.2 - guest (runs netserver) host$ src/netperf -H 192.168.0.2 -- -m 200 ioeventfd=on TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1759.25 ioeventfd=off TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1757.15 The results vary approx +/- 3% between runs. Invocation: $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev type=tap,id=net0,ifname=tap0,script=no,downscript=no -device virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img I am running qemu.git with v5 patches, based off 36888c6335422f07bbc50bf3443a39f24b90c7c6. Host: 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz 8 GB RAM RHEL 6 host Next I will try the patches on latest qemu-kvm.git Stefan One interesting thing is that I put virtio-net earlier on command line. Sorry I mean I put it after disk, you put it before. I can't find a measurable difference when swapping -drive and -netdev. One other concern I have is that we are apparently using ioeventfd for all VQs. E.g. 
for virtio-net we probably should not use it for the control VQ - it's a waste of resources.

One option is a per-device (block, net, etc) bitmap that masks out virtqueues. Is that something you'd like to see? I'm tempted to mask out the RX vq too and see how that affects the qemu-kvm.git specific issue.

As expected, the rx virtqueue is involved in the degradation. I enabled ioeventfd only for the TX virtqueue and got the same good results as userspace virtio-net. When I enable only the rx virtqueue, performance decreases as we've seen above. Stefan

Interesting. In particular this implies something's wrong with the queue: we should not normally be getting notifications from rx queue at all. Is it running low on buffers? Does it help to increase the vq size? Any other explanation? -- MST
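The per-device bitmap idea mentioned in this thread could be sketched as follows. This is only an illustration of the masking logic, not code from the series: ToyVirtioDevice, ioeventfd_vq_mask and vq_uses_ioeventfd() are hypothetical, and the queue numbering is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical queue indices for a net device; the ordering is an
 * assumption of this sketch. */
enum { VQ_RX = 0, VQ_TX = 1, VQ_CTRL = 2 };

/* Hypothetical per-device mask: bit n set means virtqueue n gets an
 * ioeventfd, bit n clear means it stays on the normal notify path. */
typedef struct {
    uint32_t ioeventfd_vq_mask;
} ToyVirtioDevice;

static bool vq_uses_ioeventfd(const ToyVirtioDevice *dev, unsigned int n)
{
    return n < 32 && (dev->ioeventfd_vq_mask & (UINT32_C(1) << n)) != 0;
}

/* Configuration matching the experiment described above:
 * ioeventfd for the TX queue only. */
static const ToyVirtioDevice toy_net_tx_only = {
    .ioeventfd_vq_mask = UINT32_C(1) << VQ_TX,
};
```

With such a mask the control queue and the rx queue can be left out without touching the generic ioeventfd setup code.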
[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate
On Mon, Dec 13, 2010 at 11:00:44AM -0700, Alex Williamson wrote:
> On Mon, 2010-12-13 at 19:50 +0200, Michael S. Tsirkin wrote:
> > On Mon, Dec 13, 2010 at 10:43:22AM -0700, Alex Williamson wrote:
> > > So, unfortunately, I stand by my original patch.
> >
> > What about the one that put -1 in saved index for a hotplugged device?
>
> There are still examples that don't work even without hotplug (example 2 and example 3 after the reboot). That hack limits the damage, but still leaves a latent bug for reboot and doesn't address the non-hotplug scenarios. So, I don't think it's worthwhile to pursue, and we shouldn't pretend we can use it to avoid bumping the version_id. Thanks,
>
> Alex

I guess when we bump it we tell users: migration is completely borken to the old version, don't even try it.

Is there a way for libvirt to discover such incompatibilities and avoid the migration?

-- MST
[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate
On Mon, 2010-12-13 at 20:54 +0200, Michael S. Tsirkin wrote:
> On Mon, Dec 13, 2010 at 11:00:44AM -0700, Alex Williamson wrote:
> > On Mon, 2010-12-13 at 19:50 +0200, Michael S. Tsirkin wrote:
> > > On Mon, Dec 13, 2010 at 10:43:22AM -0700, Alex Williamson wrote:
> > > > So, unfortunately, I stand by my original patch.
> > >
> > > What about the one that put -1 in saved index for a hotplugged device?
> >
> > There are still examples that don't work even without hotplug (example 2 and example 3 after the reboot). That hack limits the damage, but still leaves a latent bug for reboot and doesn't address the non-hotplug scenarios. So, I don't think it's worthwhile to pursue, and we shouldn't pretend we can use it to avoid bumping the version_id. Thanks,
> >
> > Alex
>
> I guess when we bump it we tell users: migration is completely borken to the old version, don't even try it.
>
> Is there a way for libvirt to discover such incompatibilities and avoid the migration?

I don't know if libvirt has a way to query this in advance. If a migration is attempted, the target will report:

savevm: unsupported version 5 for ':00:03.0/rtl8139' v4

And the source will continue running. We waste plenty of bits getting to that point, but hopefully libvirt understands that it failed. Thanks,

Alex
[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate
On Mon, Dec 13, 2010 at 11:59:16AM -0700, Alex Williamson wrote:
> On Mon, 2010-12-13 at 20:54 +0200, Michael S. Tsirkin wrote:
> > On Mon, Dec 13, 2010 at 11:00:44AM -0700, Alex Williamson wrote:
> > > On Mon, 2010-12-13 at 19:50 +0200, Michael S. Tsirkin wrote:
> > > > On Mon, Dec 13, 2010 at 10:43:22AM -0700, Alex Williamson wrote:
> > > > > So, unfortunately, I stand by my original patch.
> > > >
> > > > What about the one that put -1 in saved index for a hotplugged device?
> > >
> > > There are still examples that don't work even without hotplug (example 2 and example 3 after the reboot). That hack limits the damage, but still leaves a latent bug for reboot and doesn't address the non-hotplug scenarios. So, I don't think it's worthwhile to pursue, and we shouldn't pretend we can use it to avoid bumping the version_id. Thanks,
> > >
> > > Alex
> >
> > I guess when we bump it we tell users: migration is completely borken to the old version, don't even try it.
> >
> > Is there a way for libvirt to discover such incompatibilities and avoid the migration?
>
> I don't know if libvirt has a way to query this in advance. If a migration is attempted, the target will report:
>
> savevm: unsupported version 5 for ':00:03.0/rtl8139' v4
>
> And the source will continue running. We waste plenty of bits getting to that point,

Yes, this happens after all of memory has been migrated.

> but hopefully libvirt understands that it failed. Thanks,
>
> Alex
[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate
On Mon, 2010-12-13 at 21:06 +0200, Michael S. Tsirkin wrote:
> On Mon, Dec 13, 2010 at 11:59:16AM -0700, Alex Williamson wrote:
> > On Mon, 2010-12-13 at 20:54 +0200, Michael S. Tsirkin wrote:
> > > On Mon, Dec 13, 2010 at 11:00:44AM -0700, Alex Williamson wrote:
> > > > On Mon, 2010-12-13 at 19:50 +0200, Michael S. Tsirkin wrote:
> > > > > On Mon, Dec 13, 2010 at 10:43:22AM -0700, Alex Williamson wrote:
> > > > > > So, unfortunately, I stand by my original patch.
> > > > >
> > > > > What about the one that put -1 in saved index for a hotplugged device?
> > > >
> > > > There are still examples that don't work even without hotplug (example 2 and example 3 after the reboot). That hack limits the damage, but still leaves a latent bug for reboot and doesn't address the non-hotplug scenarios. So, I don't think it's worthwhile to pursue, and we shouldn't pretend we can use it to avoid bumping the version_id. Thanks,
> > > >
> > > > Alex
> > >
> > > I guess when we bump it we tell users: migration is completely borken to the old version, don't even try it.
> > >
> > > Is there a way for libvirt to discover such incompatibilities and avoid the migration?
> >
> > I don't know if libvirt has a way to query this in advance. If a migration is attempted, the target will report:
> >
> > savevm: unsupported version 5 for ':00:03.0/rtl8139' v4
> >
> > And the source will continue running. We waste plenty of bits getting to that point,
>
> Yes, this happens after all of memory has been migrated.

Better late than never :^\
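For anyone wondering what the failure discussed in this thread boils down to: the receiving side compares the version in the stream with the newest one it understands and refuses anything newer. A toy version of that check is sketched below. ToyStateDesc and toy_check_version() are hypothetical and are not QEMU's vmstate structures; the minimum version of 3 is made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical device state description: the range of state versions
 * this build is able to load. */
typedef struct {
    const char *name;
    int version_id;             /* newest version understood */
    int minimum_version_id;     /* oldest version still accepted */
} ToyStateDesc;

/* Returns 0 if the incoming version can be loaded, -EINVAL otherwise. */
static int toy_check_version(const ToyStateDesc *d, int incoming)
{
    if (incoming > d->version_id || incoming < d->minimum_version_id) {
        fprintf(stderr, "unsupported version %d for '%s' v%d\n",
                incoming, d->name, d->version_id);
        return -EINVAL;
    }
    return 0;
}

/* An "old" target that only knows rtl8139 state up to version 4. */
static const ToyStateDesc toy_old_rtl8139 = { "rtl8139", 4, 3 };
```

Feeding version 5 to the old target fails cleanly, which matches the behaviour described above: the target rejects the stream and the source keeps running.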
Re: [Qemu-devel] Re: [PATCH] PCI: Bus number from the bridge, not the device
On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote: On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote: pcibus_dev_print() was erroneously retrieving the device bus number from the secondary bus number offset of the device instead of the bridge above the device. This ends of landing in the 2nd byte of the 3rd BAR for devices, which thankfully is usually zero. pcibus_get_dev_path() copied this code, inheriting the same bug. pcibus_get_dev_path() is used for ramblock naming, so changing it can effect migration. However, I've only seen this byte be non-zero for an assigned device, which can't migrate anyway, so hopefully we won't run into any issues. Signed-off-by: Alex Williamson alex.william...@redhat.com Good catch. Applied. Um... submitted vs applied: PCI: Bus number from the bridge, not the device @@ -6,20 +8,28 @@ number from the secondary bus number offset of the device instead of the bridge above the device. This ends of landing in the 2nd byte of the 3rd BAR for devices, which thankfully -is usually zero. pcibus_get_dev_path() copied this code, +is usually zero. + +Note: pcibus_get_dev_path() copied this code, inheriting the same bug. pcibus_get_dev_path() is used for ramblock naming, so changing it can effect migration. However, I've only seen this byte be non-zero for an assigned device, which can't migrate anyway, so hopefully we won't run into any issues. +This patch does not touch pcibus_get_dev_path, as +bus number is guest assigned for nested buses, +so using it for migration is broken anyway. +Fix it properly later. + Signed-off-by: Alex Williamson alex.william...@redhat.com +Signed-off-by: Michael S. 
Tsirkin m...@redhat.com diff --git a/hw/pci.c b/hw/pci.c -index 6d0934d..15416dd 100644 +index 962886e..8f6fcf8 100644 --- a/hw/pci.c +++ b/hw/pci.c -@@ -1940,8 +1940,7 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent) +@@ -1806,8 +1806,7 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent) monitor_printf(mon, %*sclass %s, addr %02x:%02x.%x, pci id %04x:%04x (sub %04x:%04x)\n, @@ -29,14 +39,3 @@ PCI_SLOT(d-devfn), PCI_FUNC(d-devfn), pci_get_word(d-config + PCI_VENDOR_ID), pci_get_word(d-config + PCI_DEVICE_ID), -@@ -1965,7 +1964,7 @@ static char *pcibus_get_dev_path(DeviceState *dev) - char path[16]; - - snprintf(path, sizeof(path), %04x:%02x:%02x.%x, -- pci_find_domain(d-bus), d-config[PCI_SECONDARY_BUS], -+ pci_find_domain(d-bus), pci_bus_num(d-bus), - PCI_SLOT(d-devfn), PCI_FUNC(d-devfn)); - - return strdup(path); - - So the chunk that fixed the part that I was actually interested in got dropped even though the existing code is clearly wrong. Yes, we still have issues with nested bridges (not that we have many of those), but until the Fix it properly later part comes along, can we please include the obvious bug fix? Thanks, Alex
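To make the bug in the dropped hunk concrete: offset 0x19 in config space is the secondary bus number only for a bridge; on an ordinary device it is the 2nd byte of the 3rd BAR. The sketch below builds the domain:bus:slot.function path from the parent bus instead, along the lines of the dropped hunk. ToyPciBus, ToyPciDevice and the toy_* functions are hypothetical stand-ins rather than QEMU types; only the PCI_SLOT/PCI_FUNC bit layout is standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn) ((devfn) & 0x07)

/* Hypothetical stand-ins for the bus and device structures. */
typedef struct {
    unsigned int domain;
    uint8_t bus_num;            /* number of the bus the device sits on */
} ToyPciBus;

typedef struct {
    const ToyPciBus *bus;
    uint8_t devfn;
    uint8_t config[64];
} ToyPciDevice;

/* Format "domain:bus:slot.func" using the parent bus number, never the
 * device's own config space. */
static void toy_pci_dev_path(const ToyPciDevice *d, char *path, size_t len)
{
    snprintf(path, len, "%04x:%02x:%02x.%x",
             d->bus->domain, (unsigned int)d->bus->bus_num,
             (unsigned int)PCI_SLOT(d->devfn),
             (unsigned int)PCI_FUNC(d->devfn));
}

/* Self-check: device in slot 3, function 0, on bus 1. The byte at
 * offset 0x19 is set to a stray value that must not leak into the
 * path. Returns 0 when the path is as expected. */
static int toy_pci_path_demo(void)
{
    ToyPciBus bus = { 0, 1 };
    ToyPciDevice dev = { &bus, (3 << 3) | 0, { 0 } };
    char path[16];

    dev.config[0x19] = 0x7f;
    toy_pci_dev_path(&dev, path, sizeof(path));
    return strcmp(path, "0000:01:03.0");
}
```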
[Qemu-devel] [RESEND PATCH v3 0/2] Minimal RAM API support
No comments since v3, please apply. Thanks,

Alex

v3:
 - Address review comments
 - pc registers all memory below 4G in one chunk

Let me know if there are any further issues.

v2:
 - Move to Makefile.objs
 - Move structures to memory.c and create a callback function
 - Fix memory leak

I haven't moved to the state parameter because there should only be a single instance of this per VM. The state parameter seems like it would add complications in setup and function calling, but maybe point me to an example if I'm off base.

v1:

For VFIO based device assignment, we need to know what guest memory areas are actual RAM. RAMBlocks have long since become a grab bag of misc allocations, so aren't effective for this. Anthony has had a RAM API in mind for a while now that addresses this problem. This implements just enough of it so that we have an interface to get actual guest memory physical addresses to set up the host IOMMU. We can continue building a full RAM API on top of this stub.

Anthony, feel free to add copyright to memory.c as it's based on your initial implementation. I had to add something since the file in your branch just copies a header with Fabrice's copyright.

---

Alex Williamson (2):
      RAM API: Make use of it for x86 PC
      Minimal RAM API support

 Makefile.objs |    1 +
 cpu-common.h  |    2 +
 hw/pc.c       |    9 ++---
 memory.c      |   97 +
 memory.h      |   44 ++
 5 files changed, 147 insertions(+), 6 deletions(-)
 create mode 100644 memory.c
 create mode 100644 memory.h
[Qemu-devel] [RESEND PATCH v3 1/2] Minimal RAM API support
This adds a minimum chunk of Anthony's RAM API support so that we can identify actual VM RAM versus all the other things that make use of qemu_ram_alloc. Signed-off-by: Alex Williamson alex.william...@redhat.com --- Makefile.objs |1 + cpu-common.h |2 + memory.c | 97 + memory.h | 44 ++ 4 files changed, 144 insertions(+), 0 deletions(-) create mode 100644 memory.c create mode 100644 memory.h diff --git a/Makefile.objs b/Makefile.objs index cebb945..47f3c3a 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -172,6 +172,7 @@ hw-obj-y += pci.o pci_bridge.o msix.o msi.o hw-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o hw-obj-$(CONFIG_PCI) += ioh3420.o xio3130_upstream.o xio3130_downstream.o hw-obj-y += watchdog.o +hw-obj-y += memory.o hw-obj-$(CONFIG_ISA_MMIO) += isa_mmio.o hw-obj-$(CONFIG_ECC) += ecc.o hw-obj-$(CONFIG_NAND) += nand.o diff --git a/cpu-common.h b/cpu-common.h index 6d4a898..f08f93b 100644 --- a/cpu-common.h +++ b/cpu-common.h @@ -29,6 +29,8 @@ enum device_endian { /* address in the RAM (different from a physical address) */ typedef unsigned long ram_addr_t; +#include memory.h + /* memory API */ typedef void CPUWriteMemoryFunc(void *opaque, target_phys_addr_t addr, uint32_t value); diff --git a/memory.c b/memory.c new file mode 100644 index 000..742776f --- /dev/null +++ b/memory.c @@ -0,0 +1,97 @@ +/* + * RAM API + * + * Copyright Red Hat, Inc. 2010 + * + * Authors: + * Alex Williamson alex.william...@redhat.com + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. 
+ * + */ +#include memory.h +#include range.h + +typedef struct ram_slot { +target_phys_addr_t start_addr; +ram_addr_t size; +ram_addr_t offset; +QLIST_ENTRY(ram_slot) next; +} ram_slot; + +static QLIST_HEAD(ram_slots, ram_slot) ram_slots = +QLIST_HEAD_INITIALIZER(ram_slots); + +static ram_slot *qemu_ram_find_slot(target_phys_addr_t start_addr, + ram_addr_t size) +{ +ram_slot *slot; + +QLIST_FOREACH(slot, ram_slots, next) { +if (slot-start_addr == start_addr slot-size == size) { +return slot; +} + +if (ranges_overlap(start_addr, size, slot-start_addr, slot-size)) { +hw_error(Ram range overlaps existing slot\n); +} +} + +return NULL; +} + +int qemu_ram_register(target_phys_addr_t start_addr, ram_addr_t size, + ram_addr_t phys_offset) +{ +ram_slot *slot; + +if (!size) { +return -EINVAL; +} + +assert(!qemu_ram_find_slot(start_addr, size)); + +slot = qemu_mallocz(sizeof(ram_slot)); + +slot-start_addr = start_addr; +slot-size = size; +slot-offset = phys_offset; + +QLIST_INSERT_HEAD(ram_slots, slot, next); + +cpu_register_physical_memory(slot-start_addr, slot-size, slot-offset); + +return 0; +} + +void qemu_ram_unregister(target_phys_addr_t start_addr, ram_addr_t size) +{ +ram_slot *slot; + +if (!size) { +return; +} + +slot = qemu_ram_find_slot(start_addr, size); +assert(slot != NULL); + +QLIST_REMOVE(slot, next); +qemu_free(slot); +cpu_register_physical_memory(start_addr, size, IO_MEM_UNASSIGNED); + +return; +} + +int qemu_ram_for_each_slot(void *opaque, qemu_ram_for_each_slot_fn fn) +{ +ram_slot *slot; + +QLIST_FOREACH(slot, ram_slots, next) { +int ret = fn(opaque, slot-start_addr, slot-size, slot-offset); +if (ret) { +return ret; +} +} +return 0; +} diff --git a/memory.h b/memory.h new file mode 100644 index 000..e7aa5cb --- /dev/null +++ b/memory.h @@ -0,0 +1,44 @@ +#ifndef QEMU_MEMORY_H +#define QEMU_MEMORY_H +/* + * RAM API + * + * Copyright Red Hat, Inc. 
2010 + * + * Authors: + * Alex Williamson alex.william...@redhat.com + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + */ + +#include qemu-common.h +#include cpu-common.h + +typedef int (*qemu_ram_for_each_slot_fn)(void *opaque, + target_phys_addr_t start_addr, + ram_addr_t size, + ram_addr_t phys_offset); + +/** + * qemu_ram_register() : Register a region of guest physical memory + * + * The new region must not overlap an existing region. + */ +int qemu_ram_register(target_phys_addr_t start_addr, ram_addr_t size, + ram_addr_t phys_offset); + +/** + * qemu_ram_unregister() : Unregister a region of guest physical memory + */ +void qemu_ram_unregister(target_phys_addr_t start_addr, ram_addr_t size); + +/** + * qemu_ram_for_each_slot() : Call fn() on each registered region + * + * Stop on non-zero return from fn(). + */ +int qemu_ram_for_each_slot(void *opaque,
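Since the patch text above is hard to read in this archive, here is a stripped-down sketch of the same idea: a list of registered RAM slots, an overlap check on registration, and a for-each callback that a consumer such as an IOMMU setup path could use. All toy_* names are hypothetical. Unlike the patch, which calls hw_error() on overlap and asserts on an exact duplicate, this sketch simply returns -1, and it leaves out the unregister path and address wrap-around handling.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical slot record, loosely modelled on ram_slot. */
typedef struct ToySlot {
    uint64_t start;
    uint64_t size;
    struct ToySlot *next;
} ToySlot;

static ToySlot *toy_slots;

/* Do the half-open ranges [a, a+alen) and [b, b+blen) overlap? */
static int toy_ranges_overlap(uint64_t a, uint64_t alen,
                              uint64_t b, uint64_t blen)
{
    return a < b + blen && b < a + alen;
}

/* Register a region. Returns 0 on success, -1 if the size is zero,
 * the region overlaps an existing slot, or allocation fails. */
static int toy_ram_register(uint64_t start, uint64_t size)
{
    ToySlot *s;

    if (size == 0) {
        return -1;
    }
    for (s = toy_slots; s != NULL; s = s->next) {
        if (toy_ranges_overlap(start, size, s->start, s->size)) {
            return -1;
        }
    }
    s = calloc(1, sizeof(*s));
    if (s == NULL) {
        return -1;
    }
    s->start = start;
    s->size = size;
    s->next = toy_slots;
    toy_slots = s;
    return 0;
}

typedef int (*toy_slot_fn)(void *opaque, uint64_t start, uint64_t size);

/* Call fn() on each slot; stop on the first non-zero return. */
static int toy_ram_for_each_slot(void *opaque, toy_slot_fn fn)
{
    ToySlot *s;

    for (s = toy_slots; s != NULL; s = s->next) {
        int ret = fn(opaque, s->start, s->size);

        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* Example callback: add up the size of every registered slot. */
static int toy_sum_sizes(void *opaque, uint64_t start, uint64_t size)
{
    (void)start;
    *(uint64_t *)opaque += size;
    return 0;
}

static uint64_t toy_ram_total(void)
{
    uint64_t total = 0;

    toy_ram_for_each_slot(&total, toy_sum_sizes);
    return total;
}
```

A VFIO-style consumer would pass its own callback instead of toy_sum_sizes() and map each reported range.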
[Qemu-devel] [RESEND PATCH v3 2/2] RAM API: Make use of it for x86 PC
Register the actual VM RAM using the new API.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 hw/pc.c |    9 +++------
 1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index e1b2667..1554164 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -913,14 +913,11 @@ void pc_memory_init(ram_addr_t ram_size,
     /* allocate RAM */
     ram_addr = qemu_ram_alloc(NULL, "pc.ram",
                               below_4g_mem_size + above_4g_mem_size);
-    cpu_register_physical_memory(0, 0xa0000, ram_addr);
-    cpu_register_physical_memory(0x100000,
-                 below_4g_mem_size - 0x100000,
-                 ram_addr + 0x100000);
+    qemu_ram_register(0, below_4g_mem_size, ram_addr);
 #if TARGET_PHYS_ADDR_BITS > 32
     if (above_4g_mem_size > 0) {
-        cpu_register_physical_memory(0x100000000ULL, above_4g_mem_size,
-                                     ram_addr + below_4g_mem_size);
+        qemu_ram_register(0x100000000ULL, above_4g_mem_size,
+                          ram_addr + below_4g_mem_size);
     }
 #endif
Re: [Qemu-devel] [PATCH] libiscsi
On Mon, Dec 13, 2010 at 8:05 AM, Ronnie Sahlberg ronniesahlb...@gmail.com wrote: This patch adds a new block driver : block.iscsi.c This driver interfaces with the multiplatform posix library for iscsi initiator/client access to iscsi devices hosted at git://github.com/sahlberg/libiscsi.git The patch adds the driver to interface with the iscsi library. It also updated the configure script to * by default, probe is libiscsi is available and if so, build qemu against libiscsi. * --enable-libiscsi Force a build against libiscsi. If libiscsi is not available the build will fail. * --disable-libiscsi Do not link against libiscsi, even if it is available. When linked with libiscsi, qemu gains support to access iscsi resources such as disks and cdrom directly, without having to make the devices visible to the host. You can specify devices using a iscsi url of the form : iscsi://host[:port]/target-iqn-name/lun Example: -drive file=iscsi://10.1.1.1:3260/iqn.ronnie.test/1 -cdrom iscsi://10.1.1.1:3260/iqn.ronnie.test/2 Signed-off-by: Ronnie Sahlberg ronniesahlb...@gmail.com --- Makefile.objs | 2 +- block/iscsi.c | 528 + configure | 29 +++ 3 files changed, 558 insertions(+), 1 deletions(-) create mode 100644 block/iscsi.c diff --git a/Makefile.objs b/Makefile.objs index cebb945..81731c5 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -22,7 +22,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vv block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o block-nested-$(CONFIG_WIN32) += raw-win32.o -block-nested-$(CONFIG_POSIX) += raw-posix.o +block-nested-$(CONFIG_POSIX) += raw-posix.o iscsi.o Please use CONFIG_ISCSI... 
block-nested-$(CONFIG_CURL) += curl.o block-obj-y += $(addprefix block/, $(block-nested-y)) diff --git a/block/iscsi.c b/block/iscsi.c new file mode 100644 index 000..fba5ee6 --- /dev/null +++ b/block/iscsi.c @@ -0,0 +1,528 @@ +/* + * QEMU Block driver for iSCSI images + * + * Copyright (c) 2010 Ronnie Sahlberg ronniesahlb...@gmail.com + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the Software), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include config-host.h +#ifdef CONFIG_LIBISCSI ... then this is not needed. 
+ +#include poll.h +#include sysemu.h +#include qemu-common.h +#include qemu-error.h +#include block_int.h + +#include iscsi/iscsi.h +#include iscsi/scsi-lowlevel.h + + +typedef struct ISCSILUN { + struct iscsi_context *iscsi; + int lun; + int block_size; + unsigned long num_blocks; +} ISCSILUN; + +typedef struct ISCSIAIOCB { + BlockDriverAIOCB common; + QEMUIOVector *qiov; + QEMUBH *bh; + ISCSILUN *iscsilun; + int canceled; + int status; + size_t read_size; +} ISCSIAIOCB; + +struct iscsi_task { + ISCSILUN *iscsilun; + int status; + int complete; +}; Please see CODING_STYLE for struct naming and use of typedefs. + +static int +iscsi_is_inserted(BlockDriverState *bs) +{ + ISCSILUN *iscsilun = bs-opaque; + struct iscsi_context *iscsi = iscsilun-iscsi; + + return iscsi_is_logged_in(iscsi); +} + + +static void +iscsi_aio_cancel(BlockDriverAIOCB *blockacb) +{ + ISCSIAIOCB *acb = (ISCSIAIOCB *)blockacb; + + acb-status = -EIO; + acb-common.cb(acb-common.opaque, acb-status); + acb-canceled = 1; +} + +static AIOPool iscsi_aio_pool = { + .aiocb_size = sizeof(ISCSIAIOCB), + .cancel = iscsi_aio_cancel, +}; + + +static void iscsi_process_read(void *arg); +static void iscsi_process_write(void *arg); + +static void +iscsi_set_events(ISCSILUN *iscsilun) +{ + struct iscsi_context *iscsi = iscsilun-iscsi; + + qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), iscsi_process_read, +
Re: [Qemu-devel] [RESEND PATCH v3 1/2] Minimal RAM API support
On Mon, Dec 13, 2010 at 8:47 PM, Alex Williamson
alex.william...@redhat.com wrote:
> This adds a minimum chunk of Anthony's RAM API support so that we
> can identify actual VM RAM versus all the other things that make
> use of qemu_ram_alloc.
>
> Signed-off-by: Alex Williamson alex.william...@redhat.com
> ---
>
>  Makefile.objs |    1 +
>  cpu-common.h  |    2 +
>  memory.c      |   97 +
>  memory.h      |   44 ++
>  4 files changed, 144 insertions(+), 0 deletions(-)
>  create mode 100644 memory.c
>  create mode 100644 memory.h
>
> diff --git a/Makefile.objs b/Makefile.objs
> index cebb945..47f3c3a 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -172,6 +172,7 @@ hw-obj-y += pci.o pci_bridge.o msix.o msi.o
>  hw-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o
>  hw-obj-$(CONFIG_PCI) += ioh3420.o xio3130_upstream.o xio3130_downstream.o
>  hw-obj-y += watchdog.o
> +hw-obj-y += memory.o
>  hw-obj-$(CONFIG_ISA_MMIO) += isa_mmio.o
>  hw-obj-$(CONFIG_ECC) += ecc.o
>  hw-obj-$(CONFIG_NAND) += nand.o
> diff --git a/cpu-common.h b/cpu-common.h
> index 6d4a898..f08f93b 100644
> --- a/cpu-common.h
> +++ b/cpu-common.h
> @@ -29,6 +29,8 @@ enum device_endian {
>  /* address in the RAM (different from a physical address) */
>  typedef unsigned long ram_addr_t;
>
> +#include "memory.h"
> +
>  /* memory API */
>
>  typedef void CPUWriteMemoryFunc(void *opaque, target_phys_addr_t addr, uint32_t value);
> diff --git a/memory.c b/memory.c
> new file mode 100644
> index 0000000..742776f
> --- /dev/null
> +++ b/memory.c
> @@ -0,0 +1,97 @@
> +/*
> + * RAM API
> + *
> + *  Copyright Red Hat, Inc. 2010
> + *
> + * Authors:
> + *  Alex Williamson alex.william...@redhat.com
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + */
> +#include "memory.h"
> +#include "range.h"
> +
> +typedef struct ram_slot {
> +    target_phys_addr_t start_addr;
> +    ram_addr_t size;
> +    ram_addr_t offset;
> +    QLIST_ENTRY(ram_slot) next;
> +} ram_slot;

Please see CODING_STYLE for structure naming.

> +
> +static QLIST_HEAD(ram_slots, ram_slot) ram_slots =
> +    QLIST_HEAD_INITIALIZER(ram_slots);
> +
> +static ram_slot *qemu_ram_find_slot(target_phys_addr_t start_addr,
> +                                    ram_addr_t size)
> +{
> +    ram_slot *slot;
> +
> +    QLIST_FOREACH(slot, &ram_slots, next) {
> +        if (slot->start_addr == start_addr && slot->size == size) {
> +            return slot;
> +        }
> +
> +        if (ranges_overlap(start_addr, size, slot->start_addr, slot->size)) {
> +            hw_error("Ram range overlaps existing slot\n");
> +        }
> +    }
> +
> +    return NULL;
> +}
> +
> +int qemu_ram_register(target_phys_addr_t start_addr, ram_addr_t size,
> +                      ram_addr_t phys_offset)
> +{
> +    ram_slot *slot;
> +
> +    if (!size) {
> +        return -EINVAL;
> +    }
> +
> +    assert(!qemu_ram_find_slot(start_addr, size));
> +
> +    slot = qemu_mallocz(sizeof(ram_slot));

Since you initialize every field by hand later, this could be
qemu_malloc().

> +
> +    slot->start_addr = start_addr;
> +    slot->size = size;
> +    slot->offset = phys_offset;
> +
> +    QLIST_INSERT_HEAD(&ram_slots, slot, next);
> +
> +    cpu_register_physical_memory(slot->start_addr, slot->size, slot->offset);
> +
> +    return 0;
> +}
> +
> +void qemu_ram_unregister(target_phys_addr_t start_addr, ram_addr_t size)
> +{
> +    ram_slot *slot;
> +
> +    if (!size) {
> +        return;
> +    }
> +
> +    slot = qemu_ram_find_slot(start_addr, size);
> +    assert(slot != NULL);
> +
> +    QLIST_REMOVE(slot, next);
> +    qemu_free(slot);
> +    cpu_register_physical_memory(start_addr, size, IO_MEM_UNASSIGNED);
> +
> +    return;

Useless.
[Qemu-devel] [RESEND PATCH] exec: Implement qemu_ram_free_from_ptr()
Required for regions mapped via qemu_ram_alloc_from_ptr().  VFIO and
ivshmem will make use of this to remove mappings when devices are hot
unplugged.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

No comments on original patch.  Obvious missing function.  Cam has since
requested the same function for ivshmem.

 cpu-common.h |    1 +
 exec.c       |   13 +++++++++++++
 2 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/cpu-common.h b/cpu-common.h
index 6d4a898..9b763d0 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -49,6 +49,7 @@ ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t addr);
 ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
                         ram_addr_t size, void *host);
 ram_addr_t qemu_ram_alloc(DeviceState *dev, const char *name, ram_addr_t size);
+void qemu_ram_free_from_ptr(ram_addr_t addr);
 void qemu_ram_free(ram_addr_t addr);
 /* This should only be used for ram local to a device.  */
 void *qemu_get_ram_ptr(ram_addr_t addr);
diff --git a/exec.c b/exec.c
index a338495..eea7ea7 100644
--- a/exec.c
+++ b/exec.c
@@ -2875,6 +2875,19 @@ ram_addr_t qemu_ram_alloc(DeviceState *dev, const char *name, ram_addr_t size)
     return qemu_ram_alloc_from_ptr(dev, name, size, NULL);
 }

+void qemu_ram_free_from_ptr(ram_addr_t addr)
+{
+    RAMBlock *block;
+
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        if (addr == block->offset) {
+            QLIST_REMOVE(block, next);
+            qemu_free(block);
+            return;
+        }
+    }
+}
+
 void qemu_ram_free(ram_addr_t addr)
 {
     RAMBlock *block;
[Qemu-devel] Re: [RESEND PATCH v3 1/2] Minimal RAM API support
On 12/13/2010 02:47 PM, Alex Williamson wrote: This adds a minimum chunk of Anthony's RAM API support so that we can identify actual VM RAM versus all the other things that make use of qemu_ram_alloc. Signed-off-by: Alex Williamsonalex.william...@redhat.com --- Makefile.objs |1 + cpu-common.h |2 + memory.c | 97 + memory.h | 44 ++ 4 files changed, 144 insertions(+), 0 deletions(-) create mode 100644 memory.c create mode 100644 memory.h diff --git a/Makefile.objs b/Makefile.objs index cebb945..47f3c3a 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -172,6 +172,7 @@ hw-obj-y += pci.o pci_bridge.o msix.o msi.o hw-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o hw-obj-$(CONFIG_PCI) += ioh3420.o xio3130_upstream.o xio3130_downstream.o hw-obj-y += watchdog.o +hw-obj-y += memory.o hw-obj-$(CONFIG_ISA_MMIO) += isa_mmio.o hw-obj-$(CONFIG_ECC) += ecc.o hw-obj-$(CONFIG_NAND) += nand.o diff --git a/cpu-common.h b/cpu-common.h index 6d4a898..f08f93b 100644 --- a/cpu-common.h +++ b/cpu-common.h @@ -29,6 +29,8 @@ enum device_endian { /* address in the RAM (different from a physical address) */ typedef unsigned long ram_addr_t; +#include memory.h + /* memory API */ typedef void CPUWriteMemoryFunc(void *opaque, target_phys_addr_t addr, uint32_t value); diff --git a/memory.c b/memory.c new file mode 100644 index 000..742776f --- /dev/null +++ b/memory.c @@ -0,0 +1,97 @@ +/* + * RAM API + * + * Copyright Red Hat, Inc. 2010 + * + * Authors: + * Alex Williamsonalex.william...@redhat.com + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. 
+ * + */ +#include memory.h +#include range.h + +typedef struct ram_slot { +target_phys_addr_t start_addr; +ram_addr_t size; +ram_addr_t offset; +QLIST_ENTRY(ram_slot) next; +} ram_slot; + +static QLIST_HEAD(ram_slots, ram_slot) ram_slots = +QLIST_HEAD_INITIALIZER(ram_slots); + +static ram_slot *qemu_ram_find_slot(target_phys_addr_t start_addr, + ram_addr_t size) +{ +ram_slot *slot; + +QLIST_FOREACH(slot,ram_slots, next) { +if (slot-start_addr == start_addr slot-size == size) { +return slot; +} + +if (ranges_overlap(start_addr, size, slot-start_addr, slot-size)) { +hw_error(Ram range overlaps existing slot\n); +} +} + +return NULL; +} CODING_STYLE. RamSlot and drop the qemu_ prefix. +int qemu_ram_register(target_phys_addr_t start_addr, ram_addr_t size, + ram_addr_t phys_offset) +{ +ram_slot *slot; + +if (!size) { +return -EINVAL; +} + +assert(!qemu_ram_find_slot(start_addr, size)); + +slot = qemu_mallocz(sizeof(ram_slot)); + +slot-start_addr = start_addr; +slot-size = size; +slot-offset = phys_offset; + +QLIST_INSERT_HEAD(ram_slots, slot, next); + +cpu_register_physical_memory(slot-start_addr, slot-size, slot-offset); + +return 0; +} + +void qemu_ram_unregister(target_phys_addr_t start_addr, ram_addr_t size) +{ +ram_slot *slot; + +if (!size) { +return; +} + +slot = qemu_ram_find_slot(start_addr, size); +assert(slot != NULL); + +QLIST_REMOVE(slot, next); +qemu_free(slot); +cpu_register_physical_memory(start_addr, size, IO_MEM_UNASSIGNED); + +return; +} + +int qemu_ram_for_each_slot(void *opaque, qemu_ram_for_each_slot_fn fn) +{ +ram_slot *slot; + +QLIST_FOREACH(slot,ram_slots, next) { +int ret = fn(opaque, slot-start_addr, slot-size, slot-offset); +if (ret) { +return ret; +} +} +return 0; +} diff --git a/memory.h b/memory.h new file mode 100644 index 000..e7aa5cb --- /dev/null +++ b/memory.h @@ -0,0 +1,44 @@ +#ifndef QEMU_MEMORY_H +#define QEMU_MEMORY_H +/* + * RAM API + * + * Copyright Red Hat, Inc. 
2010 + * + * Authors: + * Alex Williamsonalex.william...@redhat.com + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + */ + +#include qemu-common.h +#include cpu-common.h + +typedef int (*qemu_ram_for_each_slot_fn)(void *opaque, + target_phys_addr_t start_addr, + ram_addr_t size, + ram_addr_t phys_offset); + +/** + * qemu_ram_register() : Register a region of guest physical memory + * + * The new region must not overlap an existing region. + */ +int qemu_ram_register(target_phys_addr_t start_addr, ram_addr_t size, + ram_addr_t phys_offset); + +/** + * qemu_ram_unregister() : Unregister a region of guest physical memory + */ +void qemu_ram_unregister(target_phys_addr_t start_addr, ram_addr_t size); + +/** + * qemu_ram_for_each_slot() : Call fn() on each
[Qemu-devel] [PATCH v4 0/2] Minimal RAM API support
Update per comments,

Thanks,

Alex

v4:

 - ram_slot -> RamSlot (per CODING_STYLE)
 - drop qemu_ prefix from functions (per CODING_STYLE)
 - mallocz -> malloc
 - drop extraneous return from void function

v3:

 - Address review comments
 - pc registers all memory below 4G in one chunk

Let me know if there are any further issues.

v2:

 - Move to Makefile.objs
 - Move structures to memory.c and create a callback function
 - Fix memory leak

I haven't moved to the state parameter because there should only be
a single instance of this per VM.  The state parameter seems like it
would add complications in setup and function calling, but maybe point
me to an example if I'm off base.

v1:

For VFIO based device assignment, we need to know what guest memory
areas are actual RAM.  RAMBlocks have long since become a grab bag of
misc allocations, so aren't effective for this.  Anthony has had a RAM
API in mind for a while now that addresses this problem.  This
implements just enough of it so that we have an interface to get
actual guest memory physical addresses to set up the host IOMMU.  We can
continue building a full RAM API on top of this stub.

Anthony, feel free to add copyright to memory.c as it's based on your
initial implementation.  I had to add something since the file in your
branch just copies a header with Fabrice's copyright.

---

Alex Williamson (2):
      RAM API: Make use of it for x86 PC
      Minimal RAM API support

 Makefile.objs |    1 +
 cpu-common.h  |    2 +
 hw/pc.c       |    9 ++---
 memory.c      |   94 +
 memory.h      |   44 +++
 5 files changed, 144 insertions(+), 6 deletions(-)
 create mode 100644 memory.c
 create mode 100644 memory.h
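
For readers following the series, the bookkeeping described in the v1 to v4 notes can be sketched outside QEMU. This is a minimal standalone illustration, not the code from memory.c: the integer typedefs, the plain singly linked list, the helper slots_overlap(), the callback sum_slot_sizes() and the errno-style returns (-EEXIST, -ENOENT) are all assumptions made here so the example compiles on its own. The posted patch uses QLIST, ranges_overlap(), hw_error() and assert() instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for target_phys_addr_t and ram_addr_t (assumed 64-bit here). */
typedef uint64_t phys_addr_t;
typedef uint64_t ram_off_t;

typedef struct RamSlot {
    phys_addr_t start_addr;   /* guest physical start */
    ram_off_t size;           /* length in bytes */
    ram_off_t offset;         /* offset into the RAM allocation */
    struct RamSlot *next;
} RamSlot;

typedef int (*ram_for_each_slot_fn)(void *opaque, phys_addr_t start_addr,
                                    ram_off_t size, ram_off_t offset);

static RamSlot *ram_slots;    /* a single list, one instance per VM */

/* Half-open ranges [a, a + alen) and [b, b + blen); ignores wraparound. */
static int slots_overlap(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
    return a < b + blen && b < a + alen;
}

/* Returns 0, or a negative errno (this sketch does not abort on overlap). */
int ram_register(phys_addr_t start_addr, ram_off_t size, ram_off_t offset)
{
    RamSlot *slot;

    if (!size) {
        return -EINVAL;
    }
    for (slot = ram_slots; slot; slot = slot->next) {
        if (slots_overlap(start_addr, size, slot->start_addr, slot->size)) {
            return -EEXIST;
        }
    }
    slot = malloc(sizeof(*slot));
    if (!slot) {
        return -ENOMEM;
    }
    slot->start_addr = start_addr;
    slot->size = size;
    slot->offset = offset;
    slot->next = ram_slots;
    ram_slots = slot;
    return 0;
}

/* Removes the slot that matches exactly; -ENOENT if there is none. */
int ram_unregister(phys_addr_t start_addr, ram_off_t size)
{
    RamSlot **p;

    for (p = &ram_slots; *p; p = &(*p)->next) {
        if ((*p)->start_addr == start_addr && (*p)->size == size) {
            RamSlot *dead = *p;
            *p = dead->next;
            free(dead);
            return 0;
        }
    }
    return -ENOENT;
}

/* Calls fn() on each slot and stops at the first non-zero return. */
int ram_for_each_slot(void *opaque, ram_for_each_slot_fn fn)
{
    RamSlot *slot;

    assert(fn != NULL);
    for (slot = ram_slots; slot; slot = slot->next) {
        int ret = fn(opaque, slot->start_addr, slot->size, slot->offset);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* Example callback: add up the size of every registered slot. */
int sum_slot_sizes(void *opaque, phys_addr_t start_addr,
                   ram_off_t size, ram_off_t offset)
{
    (void)start_addr;
    (void)offset;
    *(uint64_t *)opaque += size;
    return 0;
}
```

A consumer such as an IOMMU setup path would pass its own callback to ram_for_each_slot() and receive only the ranges that were registered as guest RAM, which is the point of keeping this list separate from the RAMBlock list.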
[Qemu-devel] [PATCH] RFC: delay pci_update_mappings for 64-bit BARs
Do not call pci_update_mappings on the lower 32 bits of a 64-bit BAR.
Wait for the upper 32 bits, or else QEMU will try to map using just the
lower 32 bits, which is probably going to corrupt memory.

I was encountering crashes when mapping certain PCI region sizes.  The
problem turns out to be that pci_update_mappings is being called before
all 64 bits of the BAR have been written.  For example, when mapping to
0x18000, once the lower 32 bits were written the remapping happened
(mapping to 0x800), which would overwrite something.

I'm not certain this is completely correct: I'm simply testing whether
the lower 4 bits of the written value equal the MEM_TYPE_64 flag.  Upper
32-bit address parts can be values like 0xff, which is tricky to test
against.

Cam

---
 hw/pci.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 438c0d1..3b81792 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1000,6 +1000,9 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
 {
     int i, was_irq_disabled = pci_irq_disabled(d);
     uint32_t config_size = pci_config_size(d);
+    int is_64 = 0;
+
+    is_64 = ((val & 0xf) == PCI_BASE_ADDRESS_MEM_TYPE_64);

     for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) {
         uint8_t wmask = d->wmask[addr + i];
@@ -1008,7 +1011,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
         d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
         d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
     }
-    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
+    if ((ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) && (!is_64)) ||
         ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
         ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
         range_covers_byte(addr, l, PCI_COMMAND))
-- 
1.7.0.4
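
To make the failure mode concrete, here is a small standalone sketch of how a 64-bit memory BAR is assembled from two 32-bit config writes. The flag values are the standard PCI ones (bit 0 selects I/O space, bits 2:1 hold the memory type, and 0x4 means 64-bit). The helper names bar_is_mem64() and bar64_address() and the example addresses are made up for illustration; they are not QEMU functions and not values taken from the report above.

```c
#include <assert.h>
#include <stdint.h>

/* Standard PCI BAR flag bits. */
#define PCI_BASE_ADDRESS_SPACE_IO      0x01
#define PCI_BASE_ADDRESS_MEM_TYPE_MASK 0x06
#define PCI_BASE_ADDRESS_MEM_TYPE_64   0x04
#define PCI_BASE_ADDRESS_MEM_MASK      (~0x0fULL)

/* Hypothetical helper: does the low dword describe a 64-bit memory BAR? */
static int bar_is_mem64(uint32_t lo)
{
    return !(lo & PCI_BASE_ADDRESS_SPACE_IO) &&
           (lo & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64;
}

/* Hypothetical helper: combine both dwords and strip the flag bits. */
static uint64_t bar64_address(uint32_t lo, uint32_t hi)
{
    assert(bar_is_mem64(lo));
    return (((uint64_t)hi << 32) | lo) & PCI_BASE_ADDRESS_MEM_MASK;
}
```

With lo = 0x80000004 and hi = 0x1 the full address is 0x180000000, but decoding after the low write alone (hi still 0) yields 0x80000000, a valid-looking address below 4G that belongs to something else. The sketch masks only the type field (0x06), so a prefetchable 64-bit BAR, whose low nibble is 0xc, is also recognized; comparing the whole low nibble against MEM_TYPE_64 would not match that case.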
[Qemu-devel] KVM call agenda for Dec 14
Please send in any agenda items you are interested in covering. thanks, -chris
[Qemu-devel] [PATCH 04/11] ide: move transfer_start after variable modification
We hook into transfer_start and immediately call the end function
for ahci. This means that everything needs to be in place for the
end function when we start the transfer, so let's move the function
down to where all state is in place.

Signed-off-by: Alexander Graf ag...@suse.de
---
 hw/ide/core.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 2d0ad56..04e463a 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -814,11 +814,11 @@ static void ide_atapi_cmd_reply_end(IDEState *s)
             size = s->cd_sector_size - s->io_buffer_index;
             if (size > s->elementary_transfer_size)
                 size = s->elementary_transfer_size;
-            ide_transfer_start(s, s->io_buffer + s->io_buffer_index,
-                               size, ide_atapi_cmd_reply_end);
             s->packet_transfer_size -= size;
             s->elementary_transfer_size -= size;
             s->io_buffer_index += size;
+            ide_transfer_start(s, s->io_buffer + s->io_buffer_index - size,
+                               size, ide_atapi_cmd_reply_end);
         } else {
             /* a new transfer is needed */
             s->nsector = (s->nsector & ~7) | ATAPI_INT_REASON_IO;
@@ -843,11 +843,11 @@ static void ide_atapi_cmd_reply_end(IDEState *s)
                 if (size > (s->cd_sector_size - s->io_buffer_index))
                     size = (s->cd_sector_size - s->io_buffer_index);
             }
-            ide_transfer_start(s, s->io_buffer + s->io_buffer_index,
-                               size, ide_atapi_cmd_reply_end);
             s->packet_transfer_size -= size;
             s->elementary_transfer_size -= size;
             s->io_buffer_index += size;
+            ide_transfer_start(s, s->io_buffer + s->io_buffer_index - size,
+                               size, ide_atapi_cmd_reply_end);
             ide_set_irq(s->bus);
 #ifdef DEBUG_IDE_ATAPI
             printf("status=0x%x\n", s->status);
-- 
1.6.0.2
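
The ordering requirement can be shown with a toy model. This is only a sketch under stated assumptions: Xfer, transfer_start(), record_end() and reply_chunk() are hypothetical stand-ins for IDEState, ide_transfer_start(), the end function and ide_atapi_cmd_reply_end(), and the backend here always calls the end function synchronously, as the commit message says the AHCI hook does.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Xfer Xfer;
typedef void (*EndFn)(Xfer *x);

struct Xfer {
    unsigned char buf[16];
    size_t index;                  /* next unread byte, like io_buffer_index */
    size_t remaining;              /* bytes left, like packet_transfer_size */
    size_t seen_remaining;         /* what the end callback observed */
    const unsigned char *data_ptr; /* start of the chunk being transferred */
};

/* Hypothetical synchronous backend: runs end_fn before returning. */
static void transfer_start(Xfer *x, const unsigned char *p, size_t n,
                           EndFn end_fn)
{
    (void)n;
    x->data_ptr = p;
    end_fn(x);
}

/* End callback that records the counter value it was able to see. */
static void record_end(Xfer *x)
{
    x->seen_remaining = x->remaining;
}

/* Update the counters first, then start, so the callback sees new state. */
static void reply_chunk(Xfer *x, size_t size)
{
    assert(size <= x->remaining);
    x->remaining -= size;
    x->index += size;
    /* index has already advanced, so subtract size to get the chunk start */
    transfer_start(x, x->buf + x->index - size, size, record_end);
}
```

If transfer_start() were called before the counters were updated, record_end() would observe the stale remaining value. That is also why the buffer pointer has to be computed as index minus size once the call is moved below the increment.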
[Qemu-devel] [PATCH 01/11] ide: split ide command interpretation off
The ATA command interpretation code can be used for PATA and SATA
interfaces alike. So let's split it out into a separate function.

Signed-off-by: Alexander Graf ag...@suse.de

---

v6 -> v7:

  - use bus instead of opaque (stefanha)
---
 hw/ide/core.c     |   20 ++++++++++++++------
 hw/ide/internal.h |    2 ++
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 430350f..ac4ee71 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -1791,9 +1791,6 @@ static void ide_clear_hob(IDEBus *bus)
 void ide_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 {
     IDEBus *bus = opaque;
-    IDEState *s;
-    int n;
-    int lba48 = 0;

 #ifdef DEBUG_IDE
     printf("IDE: write addr=0x%x val=0x%02x\n", addr, val);
@@ -1854,17 +1851,29 @@ void ide_ioport_write(void *opaque, uint32_t addr, uint32_t val)
     default:
     case 7:
         /* command */
+        ide_exec_cmd(bus, val);
+        break;
+    }
+}
+
+
+void ide_exec_cmd(IDEBus *bus, uint32_t val)
+{
+    IDEState *s;
+    int n;
+    int lba48 = 0;
+
 #if defined(DEBUG_IDE)
         printf("ide: CMD=%02x\n", val);
 #endif
         s = idebus_active_if(bus);
         /* ignore commands to non existant slave */
         if (s != bus->ifs && !s->bs)
-            break;
+            return;

         /* Only DEVICE RESET is allowed while BSY or/and DRQ are set */
         if ((s->status & (BUSY_STAT|DRQ_STAT)) && val != WIN_DEVICE_RESET)
-            break;
+            return;

         switch(val) {
         case WIN_IDENTIFY:
@@ -2355,7 +2364,6 @@ void ide_ioport_write(void *opaque, uint32_t addr, uint32_t val)
             ide_set_irq(s->bus);
             break;
         }
-    }
 }

 uint32_t ide_ioport_read(void *opaque, uint32_t addr1)
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index 71af66f..029c76c 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -567,6 +567,8 @@ void ide_init2_with_non_qdev_drives(IDEBus *bus, DriveInfo *hd0,
                                     DriveInfo *hd1, qemu_irq irq);
 void ide_init_ioport(IDEBus *bus, int iobase, int iobase2);

+void ide_exec_cmd(IDEBus *bus, uint32_t val);
+
 /* hw/ide/qdev.c */
 void ide_bus_new(IDEBus *idebus, DeviceState *dev, int bus_id);
 IDEDevice *ide_create_drive(IDEBus *bus, int unit, DriveInfo *drive);
-- 
1.6.0.2
[Qemu-devel] [PATCH 07/11] pci: add ich9 pci id
We need a PCI ID for our new AHCI adapter. I just picked an ICH-9 because
that's the one in the Q35 chipset. This patch adds a PCI ID define for an
ICH-9 AHCI adapter.

Signed-off-by: Alexander Graf ag...@suse.de

---

v3 -> v4:

  - add ICH7 instead of ICH7M (herbszt)

v4 -> v5:

  - rename to ICH7_AHCI_RAID (herbszt)

v6 -> v7:

  - use non-raid ich7 ahci (herbszt)

v8 -> v9:

  - use ICH9 instead of ICH7
---
 hw/pci.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/hw/pci.h b/hw/pci.h
index 89f7b76..7f02911 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -62,6 +62,7 @@
 /* Intel (0x8086) */
 #define PCI_DEVICE_ID_INTEL_82551IT      0x1209
 #define PCI_DEVICE_ID_INTEL_82557        0x1229
+#define PCI_DEVICE_ID_INTEL_82801IR      0x2922

 /* Red Hat / Qumranet (for QEMU) -- see pci-ids.txt */
 #define PCI_VENDOR_ID_REDHAT_QUMRANET    0x1af4
-- 
1.6.0.2
[Qemu-devel] [PATCH 05/11] ide: add ncq identify data for ahci sata drives
From: Roland Elek elek.rol...@gmail.com

I modified ide_identify() to include the zero-based queue length
value in word 75, and set bit 8 in word 76 to signal NCQ support
in the identify data for AHCI SATA drives.

Signed-off-by: Roland Elek elek.rol...@gmail.com
---
 hw/ide/core.c     |    7 +++++++
 hw/ide/internal.h |    2 ++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 04e463a..344b7b4 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -140,6 +140,13 @@ static void ide_identify(IDEState *s)
     put_le16(p + 66, 120);
     put_le16(p + 67, 120);
     put_le16(p + 68, 120);
+
+    if (s->ncq_queues) {
+        put_le16(p + 75, s->ncq_queues - 1);
+        /* NCQ supported */
+        put_le16(p + 76, (1 << 8));
+    }
+
     put_le16(p + 80, 0xf0); /* ata3 -> ata6 supported */
     put_le16(p + 81, 0x16); /* conforms to ata5 */
     /* 14=NOP supported, 5=WCACHE supported, 0=SMART supported */
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index aadb505..697c3b4 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -447,6 +447,8 @@ struct IDEState {
     int smart_errors;
     uint8_t smart_selftest_count;
     uint8_t *smart_selftest_data;
+    /* AHCI */
+    int ncq_queues;
 };

 struct IDEDMAOps {
-- 
1.6.0.2
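
As a byte-level illustration of the two identify words this patch touches, the following standalone sketch fills them into a 512-byte IDENTIFY buffer. The helpers put_word_le(), get_word_le() and identify_set_ncq() are hypothetical names chosen for this example and are not part of QEMU; the 1 to 32 range check reflects the 5-bit queue depth field of word 75 and is an assumption added by the sketch, not something the patch enforces.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 16-bit value little-endian at 16-bit word index `word`. */
static void put_word_le(uint8_t *id, unsigned word, uint16_t v)
{
    id[2 * word] = (uint8_t)(v & 0xff);
    id[2 * word + 1] = (uint8_t)(v >> 8);
}

/* Read back a little-endian 16-bit word. */
static uint16_t get_word_le(const uint8_t *id, unsigned word)
{
    return (uint16_t)(id[2 * word] | (id[2 * word + 1] << 8));
}

/* Advertise NCQ: word 75 holds depth - 1, word 76 bit 8 means supported. */
static void identify_set_ncq(uint8_t *id, unsigned queue_depth)
{
    assert(queue_depth >= 1 && queue_depth <= 32);
    put_word_le(id, 75, (uint16_t)(queue_depth - 1));
    put_word_le(id, 76, (uint16_t)(1u << 8));
}
```

Because the depth is stored zero-based, a device with 32 queue slots reports 31 in word 75, which matches the `s->ncq_queues - 1` expression in the diff.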
[Qemu-devel] [PATCH 00/11] AHCI emulation support v9
This patch adds support for AHCI emulation. I have tested and verified it works
in Linux, OpenBSD, Windows Vista and Windows 7.

This AHCI emulation supports NCQ, so multiple read or write requests can be
outstanding at the same time. The code is however not fully optimized yet. I'm
fairly sure that there is still low-hanging performance fruit to be found :).
In my simple benchmarks I achieved about two thirds of virtio performance.

Also, this AHCI emulation layer does not support legacy mode. So if you're
using a disk with this emulation, you do not get it exposed using the legacy
IDE interfaces.

Another nitpick is CD-ROM support in Windows. Somehow it doesn't detect a
CD-ROM drive attached to AHCI. At least it doesn't list it.

To attach an AHCI disk to your VM, please use

  -drive id=disk,file=...,if=none -device ahci,id=ahci \
  -device ide-drive,drive=disk,bus=ahci.0

This patch set is based on work done during the Google Summer of Code. I was
mentoring a student, Roland Elek, who wrote most of the AHCI emulation code
based on a patch from Chong Qiao. A bunch of other people were also involved,
so everybody who I didn't mention - thanks a lot!
git://repo.or.cz/qemu/ahci.git ahci

v1 -> v2:

  - rename IDEExtender to IDEBusOps and make a pointer (kraxel)
  - make dma hooks explicit by putting them into ops struct (stefanha)
  - use qdev buses (kraxel)
  - minor cleanups
  - dprintf overhaul
  - add reset function

v2 -> v3:

  - add msi support (kraxel)
  - use MIN macro (kraxel)
  - add msi support (kraxel)
  - fix ncq with multiple ports
  - zap qdev properties (kraxel)
  - redesign legacy IF_SATA hooks (kraxel)
  - don't build ahci as part of target
  - move to ide/ (kwolf)

v3 -> v4:

  - prepare for endianness safety
  - add lspci dump (herbszt)
  - use ich7 instead of ich7m (herbszt)
  - fix lst+fis mapping (kraxel)
  - coding style (blue swirl)
  - explicit mmio setters/getters (blue swirl)
  - split pata code out to pata.c (kwolf)
  - only include config-devices.h in machine description (blue swirl)

v4 -> v5:

  - s/H2dNcqFis/NCQFrame/g (blue swirl)
  - redo -drive magic (blue swirl)
  - bump BAR to 4k
  - rename ICH7_AHCI to ICH7_AHCI_RAID (herbszt)
  - drop device config header (blue swirl)

v5 -> v6:

  - PCI config space fixes (isaku)
  - remove CONFIG_AHCI from x86 default configs (paul brook)
  - use snprintf (blue swirl)
  - add generic PCI config file (paul brook)
  - build ahci on all PCI platforms (paul brook)

v6 -> v7:

  - use bus instead of opaque (stefanha)
  - change naming in IDEBusOps (stefanha, kwolf)
  - rename IDEBusOps (stefanha)
  - improve interrupt injection
  - combine tfdata code paths
  - update tfdata more often
  - reset port registers on port reset
  - improve debug output
  - add feature variable from fis for some extended commands
  - always set feature to DMA for atapi
  - osx 10.5.0 works as of this version
  - use non-raid ich7 ahci (herbszt)
  - reflect normal ich7 in pci dump
  - stick to new IDEBusOps (stefanha, kwolf)
  - stefan's ahci comments

v7 -> v8:

  - rewrite ops as DMA offsplit framework
  - split bmdma stuff out to pci.c
  - generate tfdata on the fly
  - reimplement immediate dma rw
  - add safety net for busy engine
  - adjust ahci code for new DMA framework
  - move ide core+pci to pci.mak
  - add sebastian's config space patches

v8 -> v9:

  - make dma providers subclass of idedma (kwolf)
  - s/set_status/add_status/g (kwolf)
  - cancel and clear ncq queue on reset (stefanha)
  - clear ptr on map failure (stefanha)
  - potential NULL deref, unregister reset (stefanha)
  - add error reporting for ncq (stefanha)
  - replace hw_error with DPRINTF (stefanha)
  - move sg generation to sg users
  - fix off-by-one in sglist interpretation
  - make background engine work (queued commands)
  - use ICH9 instead of ICH7 (aliguori)
  - update to new APIs

Alexander Graf (9):
  ide: split ide command interpretation off
  ide: fix whitespace gap in ide_exec_cmd
  ide: Split out BMDMA code from ATA core
  ide: move transfer_start after variable modification
  pci: add storage class for sata
  pci: add ich9 pci id
  ahci: add ahci emulation
  config: move ide core and pci to pci.mak
  config: add ahci for pci capable machines

Roland Elek (1):
  ide: add ncq identify data for ahci sata drives

Sebastian Herbszt (1):
  ahci: set SATA Mode Select

 Makefile.objs                        |    1 +
 default-configs/arm-softmmu.mak      |    1 -
 default-configs/i386-softmmu.mak     |    3 -
 default-configs/mips-softmmu.mak     |    3 -
 default-configs/mips64-softmmu.mak   |    3 -
 default-configs/mips64el-softmmu.mak |    3 -
 default-configs/mipsel-softmmu.mak   |    3 -
 default-configs/pci.mak              |    4 +
 default-configs/ppc-softmmu.mak      |    3 -
 default-configs/ppc64-softmmu.mak    |    3 -
 default-configs/ppcemb-softmmu.mak   |    3 -
 default-configs/sh4-softmmu.mak      |    1 -
 default-configs/sh4eb-softmmu.mak    |    1 -
[Qemu-devel] [PATCH 03/11] ide: Split out BMDMA code from ATA core
The ATA core is currently heavily intertwined with BMDMA code. Let's loosen that a bit, so we can happily replace the DMA backend with different implementations. Signed-off-by: Alexander Graf ag...@suse.de --- v7 - v8: - rewrite as DMA ops v8 - v9: - fold in: split out irq setting - fold in: move header definitions out - make dma providers subclass of idedma (kwolf) - s/set_status/add_status/g (kwolf) --- hw/ide/cmd646.c |7 +- hw/ide/core.c | 335 ++--- hw/ide/internal.h | 69 +-- hw/ide/pci.c | 289 +- hw/ide/pci.h | 30 + hw/ide/piix.c |7 +- hw/ide/via.c |7 +- 7 files changed, 446 insertions(+), 298 deletions(-) diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c index ea5d2dc..fde0617 100644 --- a/hw/ide/cmd646.c +++ b/hw/ide/cmd646.c @@ -167,9 +167,10 @@ static void bmdma_map(PCIDevice *pci_dev, int region_num, for(i = 0;i 2; i++) { BMDMAState *bm = d-bmdma[i]; -d-bus[i].bmdma = bm; +bmdma_init(d-bus[i], bm); bm-bus = d-bus+i; -qemu_add_vm_change_state_handler(ide_dma_restart_cb, bm); +qemu_add_vm_change_state_handler(d-bus[i].dma-ops-restart_cb, + bm-dma); if (i == 0) { register_ioport_write(addr, 4, 1, bmdma_writeb_0, d); @@ -218,7 +219,7 @@ static void cmd646_reset(void *opaque) for (i = 0; i 2; i++) { ide_bus_reset(d-bus[i]); -ide_dma_reset(d-bmdma[i]); +d-bus[i].dma-ops-reset(d-bmdma[i].dma); } } diff --git a/hw/ide/core.c b/hw/ide/core.c index 5e2fcbd..2d0ad56 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -34,8 +34,6 @@ #include hw/ide/internal.h -#define IDE_PAGE_SIZE 4096 - static const int smart_attributes[][5] = { /* id, flags, val, wrst, thrsh */ { 0x01, 0x03, 0x64, 0x64, 0x06}, /* raw read */ @@ -61,11 +59,8 @@ static inline int media_is_cd(IDEState *s) return (media_present(s) s-nb_sectors = CD_MAX_SECTORS); } -static void ide_dma_start(IDEState *s, BlockDriverCompletionFunc *dma_cb); -static void ide_dma_restart(IDEState *s, int is_read); static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret); static int ide_handle_rw_error(IDEState *s, int 
error, int op); -static void ide_flush_cache(IDEState *s); static void padstr(char *str, const char *src, int len) { @@ -314,11 +309,11 @@ static inline void ide_abort_command(IDEState *s) } static inline void ide_dma_submit_check(IDEState *s, - BlockDriverCompletionFunc *dma_cb, BMDMAState *bm) + BlockDriverCompletionFunc *dma_cb) { -if (bm-aiocb) +if (s-bus-dma-aiocb) return; -dma_cb(bm, -1); +dma_cb(s, -1); } /* prepare data transfer and tell what to do after */ @@ -328,8 +323,10 @@ static void ide_transfer_start(IDEState *s, uint8_t *buf, int size, s-end_transfer_func = end_transfer_func; s-data_ptr = buf; s-data_end = buf + size; -if (!(s-status ERR_STAT)) +if (!(s-status ERR_STAT)) { s-status |= DRQ_STAT; +} +s-bus-dma-ops-start_transfer(s-bus-dma); } static void ide_transfer_stop(IDEState *s) @@ -394,7 +391,7 @@ static void ide_rw_error(IDEState *s) { ide_set_irq(s-bus); } -static void ide_sector_read(IDEState *s) +void ide_sector_read(IDEState *s) { int64_t sector_num; int ret, n; @@ -427,58 +424,15 @@ static void ide_sector_read(IDEState *s) } } - -/* return 0 if buffer completed */ -static int dma_buf_prepare(BMDMAState *bm, int is_write) -{ -IDEState *s = bmdma_active_if(bm); -struct { -uint32_t addr; -uint32_t size; -} prd; -int l, len; - -qemu_sglist_init(s-sg, s-nsector / (IDE_PAGE_SIZE / 512) + 1); -s-io_buffer_size = 0; -for(;;) { -if (bm-cur_prd_len == 0) { -/* end of table (with a fail safe of one page) */ -if (bm-cur_prd_last || -(bm-cur_addr - bm-addr) = IDE_PAGE_SIZE) -return s-io_buffer_size != 0; -cpu_physical_memory_read(bm-cur_addr, (uint8_t *)prd, 8); -bm-cur_addr += 8; -prd.addr = le32_to_cpu(prd.addr); -prd.size = le32_to_cpu(prd.size); -len = prd.size 0xfffe; -if (len == 0) -len = 0x1; -bm-cur_prd_len = len; -bm-cur_prd_addr = prd.addr; -bm-cur_prd_last = (prd.size 0x8000); -} -l = bm-cur_prd_len; -if (l 0) { -qemu_sglist_add(s-sg, bm-cur_prd_addr, l); -bm-cur_prd_addr += l; -bm-cur_prd_len -= l; -s-io_buffer_size += l; -} -} -return 1; 
-} - static void dma_buf_commit(IDEState *s, int is_write) { qemu_sglist_destroy(s-sg); } -static void ide_dma_set_inactive(BMDMAState *bm) +static void
[Qemu-devel] [PATCH 10/11] config: add ahci for pci capable machines
This patch enables AHCI for all machines supporting PCI.

Signed-off-by: Alexander Graf ag...@suse.de
---
 default-configs/pci.mak |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/default-configs/pci.mak b/default-configs/pci.mak
index d700b3c..0471efb 100644
--- a/default-configs/pci.mak
+++ b/default-configs/pci.mak
@@ -13,3 +13,4 @@ CONFIG_E1000_PCI=y
 CONFIG_IDE_CORE=y
 CONFIG_IDE_QDEV=y
 CONFIG_IDE_PCI=y
+CONFIG_AHCI=y
-- 
1.6.0.2
[Qemu-devel] [PATCH 06/11] pci: add storage class for sata
This patch adds the storage sata class id.

Signed-off-by: Alexander Graf ag...@suse.de
---
 hw/pci_ids.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/hw/pci_ids.h b/hw/pci_ids.h
index 82cba7e..ea3418c 100644
--- a/hw/pci_ids.h
+++ b/hw/pci_ids.h
@@ -15,6 +15,7 @@

 #define PCI_CLASS_STORAGE_SCSI           0x0100
 #define PCI_CLASS_STORAGE_IDE            0x0101
+#define PCI_CLASS_STORAGE_SATA           0x0106
 #define PCI_CLASS_STORAGE_OTHER          0x0180

 #define PCI_CLASS_NETWORK_ETHERNET       0x0200
-- 
1.6.0.2
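For readers unfamiliar with the encoding: the 16-bit class value packs the base class into the high byte and the sub-class into the low byte, so 0x0106 reads as base class 0x01 (mass storage) with sub-class 0x06 (SATA). A minimal sketch of that split follows; pci_base_class() and pci_sub_class() are hypothetical helpers for illustration, not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the hw/pci_ids.h hunk in this patch. */
#define PCI_CLASS_STORAGE_SCSI  0x0100
#define PCI_CLASS_STORAGE_IDE   0x0101
#define PCI_CLASS_STORAGE_SATA  0x0106
#define PCI_CLASS_STORAGE_OTHER 0x0180

/* Compile-time check: SATA and IDE share the same (mass storage) base class. */
static_assert((PCI_CLASS_STORAGE_SATA >> 8) == (PCI_CLASS_STORAGE_IDE >> 8),
              "SATA and IDE should share a base class");

/* Hypothetical helper (not QEMU API): high byte is the base class. */
uint8_t pci_base_class(uint16_t class_id)
{
    return (uint8_t)(class_id >> 8);
}

/* Hypothetical helper (not QEMU API): low byte is the sub-class. */
uint8_t pci_sub_class(uint16_t class_id)
{
    return (uint8_t)(class_id & 0xff);
}
```

With these, pci_base_class(PCI_CLASS_STORAGE_SATA) gives 0x01 and pci_sub_class(PCI_CLASS_STORAGE_SATA) gives 0x06.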
[Qemu-devel] [PATCH 02/11] ide: fix whitespace gap in ide_exec_cmd
Now that we have the function split out, we have to reindent it. In order to increase the readability of the actual functional change, this is split out. Signed-off-by: Alexander Graf ag...@suse.de --- hw/ide/core.c | 734 1 files changed, 367 insertions(+), 367 deletions(-) diff --git a/hw/ide/core.c b/hw/ide/core.c index ac4ee71..5e2fcbd 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -1864,423 +1864,423 @@ void ide_exec_cmd(IDEBus *bus, uint32_t val) int lba48 = 0; #if defined(DEBUG_IDE) -printf(ide: CMD=%02x\n, val); +printf(ide: CMD=%02x\n, val); #endif -s = idebus_active_if(bus); -/* ignore commands to non existant slave */ -if (s != bus-ifs !s-bs) -return; +s = idebus_active_if(bus); +/* ignore commands to non existant slave */ +if (s != bus-ifs !s-bs) +return; -/* Only DEVICE RESET is allowed while BSY or/and DRQ are set */ -if ((s-status (BUSY_STAT|DRQ_STAT)) val != WIN_DEVICE_RESET) -return; +/* Only DEVICE RESET is allowed while BSY or/and DRQ are set */ +if ((s-status (BUSY_STAT|DRQ_STAT)) val != WIN_DEVICE_RESET) +return; -switch(val) { -case WIN_IDENTIFY: -if (s-bs s-drive_kind != IDE_CD) { -if (s-drive_kind != IDE_CFATA) -ide_identify(s); -else -ide_cfata_identify(s); -s-status = READY_STAT | SEEK_STAT; -ide_transfer_start(s, s-io_buffer, 512, ide_transfer_stop); -} else { -if (s-drive_kind == IDE_CD) { -ide_set_signature(s); -} -ide_abort_command(s); -} -ide_set_irq(s-bus); -break; -case WIN_SPECIFY: -case WIN_RECAL: -s-error = 0; +switch(val) { +case WIN_IDENTIFY: +if (s-bs s-drive_kind != IDE_CD) { +if (s-drive_kind != IDE_CFATA) +ide_identify(s); +else +ide_cfata_identify(s); s-status = READY_STAT | SEEK_STAT; -ide_set_irq(s-bus); -break; -case WIN_SETMULT: -if (s-drive_kind == IDE_CFATA s-nsector == 0) { -/* Disable Read and Write Multiple */ -s-mult_sectors = 0; -s-status = READY_STAT | SEEK_STAT; -} else if ((s-nsector 0xff) != 0 -((s-nsector 0xff) MAX_MULT_SECTORS || - (s-nsector (s-nsector - 1)) != 0)) { -ide_abort_command(s); -} else { 
-s-mult_sectors = s-nsector 0xff; -s-status = READY_STAT | SEEK_STAT; +ide_transfer_start(s, s-io_buffer, 512, ide_transfer_stop); +} else { +if (s-drive_kind == IDE_CD) { +ide_set_signature(s); } -ide_set_irq(s-bus); -break; -case WIN_VERIFY_EXT: - lba48 = 1; -case WIN_VERIFY: -case WIN_VERIFY_ONCE: -/* do sector number check ? */ - ide_cmd_lba48_transform(s, lba48); +ide_abort_command(s); +} +ide_set_irq(s-bus); +break; +case WIN_SPECIFY: +case WIN_RECAL: +s-error = 0; +s-status = READY_STAT | SEEK_STAT; +ide_set_irq(s-bus); +break; +case WIN_SETMULT: +if (s-drive_kind == IDE_CFATA s-nsector == 0) { +/* Disable Read and Write Multiple */ +s-mult_sectors = 0; s-status = READY_STAT | SEEK_STAT; -ide_set_irq(s-bus); -break; +} else if ((s-nsector 0xff) != 0 +((s-nsector 0xff) MAX_MULT_SECTORS || + (s-nsector (s-nsector - 1)) != 0)) { +ide_abort_command(s); +} else { +s-mult_sectors = s-nsector 0xff; +s-status = READY_STAT | SEEK_STAT; +} +ide_set_irq(s-bus); +break; +case WIN_VERIFY_EXT: + lba48 = 1; +case WIN_VERIFY: +case WIN_VERIFY_ONCE: +/* do sector number check ? */ + ide_cmd_lba48_transform(s, lba48); +s-status = READY_STAT | SEEK_STAT; +ide_set_irq(s-bus); +break; case WIN_READ_EXT: - lba48 = 1; -case WIN_READ: -case WIN_READ_ONCE: -if (!s-bs) -goto abort_cmd; - ide_cmd_lba48_transform(s, lba48); -s-req_nb_sectors = 1; -ide_sector_read(s); -break; + lba48 = 1; +case WIN_READ: +case WIN_READ_ONCE: +if (!s-bs) +goto abort_cmd; + ide_cmd_lba48_transform(s, lba48); +s-req_nb_sectors = 1; +ide_sector_read(s); +break; case WIN_WRITE_EXT: - lba48 = 1; -case
[Qemu-devel] [PATCH 11/11] ahci: set SATA Mode Select
From: Sebastian Herbszt herb...@gmx.de

Set SATA Mode Select to AHCI in the Address Map Register.

Signed-off-by: Sebastian Herbszt herb...@gmx.de
---
 hw/ide/ahci.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index f937a92..8ae236a 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1473,6 +1473,9 @@ static int pci_ahci_init(PCIDevice *dev)
     d->card.config[PCI_LATENCY_TIMER] = 0x00; /* Latency timer */
     pci_config_set_interrupt_pin(d->card.config, 1);

+    /* XXX Software should program this register */
+    d->card.config[0x90]   = 1 << 6; /* Address Map Register - AHCI mode */
+
     qemu_register_reset(ahci_reset, d);

     /* XXX BAR size should be 1k, but that breaks, so bump it to 4k for now */
-- 
1.6.0.2
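The value written to config offset 0x90 sets bit 6 only. A minimal sketch of how that maps onto a mode field is below. It assumes the SATA Mode Select field occupies bits 7:6 of the register, with 0 meaning IDE, 1 AHCI and 2 RAID; that layout is my reading of the ICH9 datasheet and should be checked against it. All names in the sketch are hypothetical, not taken from hw/ide/ahci.c.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_REG_OFFSET 0x90                      /* offset used in the patch */
#define MAP_SMS_SHIFT  6                         /* assumption: SMS is bits 7:6 */
#define MAP_SMS_MASK   (0x3u << MAP_SMS_SHIFT)

/* Assumed encoding of the SATA Mode Select field. */
enum sata_mode {
    SATA_MODE_IDE  = 0,
    SATA_MODE_AHCI = 1,
    SATA_MODE_RAID = 2
};

/* Return the register value with only the SMS field replaced. */
uint8_t map_set_sms(uint8_t map, enum sata_mode mode)
{
    assert(mode <= SATA_MODE_RAID);
    return (uint8_t)((map & ~MAP_SMS_MASK) |
                     ((unsigned)mode << MAP_SMS_SHIFT));
}

/* Extract the SMS field from a register value. */
enum sata_mode map_get_sms(uint8_t map)
{
    return (enum sata_mode)((map & MAP_SMS_MASK) >> MAP_SMS_SHIFT);
}
```

Under these assumptions map_set_sms(0, SATA_MODE_AHCI) yields 0x40, the same value the patch stores.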
[Qemu-devel] [PATCH 09/11] config: move ide core and pci to pci.mak
Every device that can do PCI should also be able to do IDE. So let's move the IDE definitions over to pci.mak.

Signed-off-by: Alexander Graf ag...@suse.de
---
 default-configs/arm-softmmu.mak      |    1 -
 default-configs/i386-softmmu.mak     |    3 ---
 default-configs/mips-softmmu.mak     |    3 ---
 default-configs/mips64-softmmu.mak   |    3 ---
 default-configs/mips64el-softmmu.mak |    3 ---
 default-configs/mipsel-softmmu.mak   |    3 ---
 default-configs/pci.mak              |    3 +++
 default-configs/ppc-softmmu.mak      |    3 ---
 default-configs/ppc64-softmmu.mak    |    3 ---
 default-configs/ppcemb-softmmu.mak   |    3 ---
 default-configs/sh4-softmmu.mak      |    1 -
 default-configs/sh4eb-softmmu.mak    |    1 -
 default-configs/sparc64-softmmu.mak  |    3 ---
 default-configs/x86_64-softmmu.mak   |    3 ---
 14 files changed, 3 insertions(+), 33 deletions(-)

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index ac48dc1..8d1174f 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -8,7 +8,6 @@ CONFIG_ECC=y
 CONFIG_SERIAL=y
 CONFIG_PTIMER=y
 CONFIG_SD=y
-CONFIG_IDE_CORE=y
 CONFIG_MAX7310=y
 CONFIG_WM8750=y
 CONFIG_TWL92230=y
diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index ce905d2..323fafb 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -13,9 +13,6 @@ CONFIG_FDC=y
 CONFIG_ACPI=y
 CONFIG_APM=y
 CONFIG_DMA=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_PIIX=y
 CONFIG_NE2000_ISA=y
diff --git a/default-configs/mips-softmmu.mak b/default-configs/mips-softmmu.mak
index 565e611..f524971 100644
--- a/default-configs/mips-softmmu.mak
+++ b/default-configs/mips-softmmu.mak
@@ -17,9 +17,6 @@ CONFIG_ACPI=y
 CONFIG_APM=y
 CONFIG_DMA=y
 CONFIG_PIIX4=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_PIIX=y
 CONFIG_NE2000_ISA=y
diff --git a/default-configs/mips64-softmmu.mak b/default-configs/mips64-softmmu.mak
index 03bd8eb..aeab6b2 100644
--- a/default-configs/mips64-softmmu.mak
+++ b/default-configs/mips64-softmmu.mak
@@ -17,9 +17,6 @@ CONFIG_ACPI=y
 CONFIG_APM=y
 CONFIG_DMA=y
 CONFIG_PIIX4=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_PIIX=y
 CONFIG_NE2000_ISA=y
diff --git a/default-configs/mips64el-softmmu.mak b/default-configs/mips64el-softmmu.mak
index 4661617..8e6511c 100644
--- a/default-configs/mips64el-softmmu.mak
+++ b/default-configs/mips64el-softmmu.mak
@@ -17,9 +17,6 @@ CONFIG_ACPI=y
 CONFIG_APM=y
 CONFIG_DMA=y
 CONFIG_PIIX4=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_PIIX=y
 CONFIG_IDE_VIA=y
diff --git a/default-configs/mipsel-softmmu.mak b/default-configs/mipsel-softmmu.mak
index 92fc473..a05ac25 100644
--- a/default-configs/mipsel-softmmu.mak
+++ b/default-configs/mipsel-softmmu.mak
@@ -17,9 +17,6 @@ CONFIG_ACPI=y
 CONFIG_APM=y
 CONFIG_DMA=y
 CONFIG_PIIX4=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_PIIX=y
 CONFIG_NE2000_ISA=y
diff --git a/default-configs/pci.mak b/default-configs/pci.mak
index c74a99f..d700b3c 100644
--- a/default-configs/pci.mak
+++ b/default-configs/pci.mak
@@ -10,3 +10,6 @@ CONFIG_PCNET_COMMON=y
 CONFIG_LSI_SCSI_PCI=y
 CONFIG_RTL8139_PCI=y
 CONFIG_E1000_PCI=y
+CONFIG_IDE_CORE=y
+CONFIG_IDE_QDEV=y
+CONFIG_IDE_PCI=y
diff --git a/default-configs/ppc-softmmu.mak b/default-configs/ppc-softmmu.mak
index f1cb99e..4563742 100644
--- a/default-configs/ppc-softmmu.mak
+++ b/default-configs/ppc-softmmu.mak
@@ -23,9 +23,6 @@ CONFIG_GRACKLE_PCI=y
 CONFIG_UNIN_PCI=y
 CONFIG_DEC_PCI=y
 CONFIG_PPCE500_PCI=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_CMD646=y
 CONFIG_IDE_MACIO=y
diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
index 83cbe97..d5073b3 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -23,9 +23,6 @@ CONFIG_GRACKLE_PCI=y
 CONFIG_UNIN_PCI=y
 CONFIG_DEC_PCI=y
 CONFIG_PPCE500_PCI=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_CMD646=y
 CONFIG_IDE_MACIO=y
diff --git a/default-configs/ppcemb-softmmu.mak b/default-configs/ppcemb-softmmu.mak
index 2b52d4a..9f0730c 100644
--- a/default-configs/ppcemb-softmmu.mak
+++ b/default-configs/ppcemb-softmmu.mak
@@ -23,9 +23,6 @@ CONFIG_GRACKLE_PCI=y
 CONFIG_UNIN_PCI=y
 CONFIG_DEC_PCI=y
 CONFIG_PPCE500_PCI=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_CMD646=y
 CONFIG_IDE_MACIO=y
diff --git a/default-configs/sh4-softmmu.mak b/default-configs/sh4-softmmu.mak
index 87247a4..5c69acc 100644
--- a/default-configs/sh4-softmmu.mak
+++ b/default-configs/sh4-softmmu.mak
@@ -3,6 +3,5 @@ include pci.mak
 CONFIG_SERIAL=y
 CONFIG_PTIMER=y
-CONFIG_IDE_CORE=y
 CONFIG_PFLASH_CFI02=y
 CONFIG_ISA_MMIO=y
[Qemu-devel] [PATCH 08/11] ahci: add ahci emulation
This patch adds an emulation layer for an ICH-9 AHCI controller. For now this controller does not do IDE legacy emulation. It is a pure AHCI controller.

Signed-off-by: Alexander Graf ag...@suse.de

---

v1 -> v2:

  - rename IDEExtender to IDEBusOps and make a pointer (kraxel)
  - make dma hooks explicit by putting them into ops struct (stefanha)
  - use qdev buses (kraxel)
  - minor cleanups
  - dprintf overhaul
  - add reset function

v2 -> v3:

  - add msi support (kraxel)
  - use MIN macro (kraxel)
  - fix ncq with multiple ports
  - zap qdev properties (kraxel)
  - redesign legacy IF_SATA hooks (kraxel)
  - don't build ahci as part of target
  - move to ide/ (kwolf)

v3 -> v4:

  - prepare for endianness safety
  - add lspci dump (herbszt)
  - use ich7 instead of ich7m (herbszt)
  - fix lst+fis mapping (kraxel)
  - coding style (blue swirl)
  - explicit mmio setters/getters (blue swirl)

v4 -> v5:

  - s/H2dNcqFis/NCQFrame/g (blue swirl)
  - redo -drive magic (blue swirl)
  - bump BAR to 4k
  - ahci.c: rename to ICH7_AHCI_RAID (herbszt)

v5 -> v6:

  - PCI config space fixes (isaku)
  - remove CONFIG_AHCI from default configs

v6 -> v7:

  - improve interrupt injection
  - combine tfdata code paths
  - update tfdata more often
  - reset port registers on port reset
  - improve debug output
  - add feature variable from fis for some extended commands
  - always set feature to DMA for atapi
  - osx 10.5.0 works as of this version
  - use non-raid ich7 ahci (herbszt)
  - reflect normal ich7 in pci dump
  - stick to new IDEBusOps (stefanha, kwolf)
  - ahci: stefan's ahci comments

v7 -> v8:

  - generate tfdata on the fly
  - reimplement immediate dma rw
  - add safety net for busy engine
  - adjust for new DMA interface

v8 -> v9:

  - ahci: set pci revision id to 0x02
  - make dma providers subclass of idedma (kwolf)
  - s/set_status/add_status/g (kwolf)
  - cancel and clear ncq queue on reset (stefanha)
  - clear ptr on map failure (stefanha)
  - potential NULL deref, unregister reset (stefanha)
  - add error reporting for ncq (stefanha)
  - replace hw_error with DPRINTF (stefanha)
  - move sg generation to sg users
  - fix off-by-one in sglist interpretation
  - make background engine work (queued commands)
  - use ICH9 instead of ICH7 (aliguori)
  - update to new APIs
---
 Makefile.objs |    1 +
 hw/ide/ahci.c | 1524 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 1525 insertions(+), 0 deletions(-)
 create mode 100644 hw/ide/ahci.c

diff --git a/Makefile.objs b/Makefile.objs
index cebb945..2693088 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -241,6 +241,7 @@ hw-obj-$(CONFIG_IDE_PIIX) += ide/piix.o
 hw-obj-$(CONFIG_IDE_CMD646) += ide/cmd646.o
 hw-obj-$(CONFIG_IDE_MACIO) += ide/macio.o
 hw-obj-$(CONFIG_IDE_VIA) += ide/via.o
+hw-obj-$(CONFIG_AHCI) += ide/ahci.o

 # SCSI layer
 hw-obj-$(CONFIG_LSI_SCSI_PCI) += lsi53c895a.o
diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
new file mode 100644
index 0000000..f937a92
--- /dev/null
+++ b/hw/ide/ahci.c
@@ -0,0 +1,1524 @@
+/*
+ * QEMU AHCI Emulation
+ *
+ * Copyright (c) 2010 qiaoch...@loongson.cn
+ * Copyright (c) 2010 Roland Elek elek.rol...@gmail.com
+ * Copyright (c) 2010 Sebastian Herbszt herb...@gmx.de
+ * Copyright (c) 2010 Alexander Graf ag...@suse.de
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ *
+ *
+ * lspci dump of a ICH-9 real device in IDE mode (hopefully close enough):
+ *
+ * 00:1f.2 SATA controller [0106]: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller [8086:2922] (rev 02) (prog-if 01 [AHCI 1.0])
+ *         Subsystem: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller [8086:2922]
+ *         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
+ *         Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
+ *         Latency: 0
+ *         Interrupt: pin B routed to IRQ 222
+ *         Region 0: I/O ports at d000 [size=8]
+ *         Region 1: I/O ports at cc00 [size=4]
+ *         Region 2: I/O ports at c880 [size=8]
+ *         Region 3: I/O ports at c800 [size=4]
+ *         Region 4: I/O ports at c480 [size=32]
+ *         Region 5: Memory at febf9000 (32-bit, non-prefetchable) [size=2K]
+ *         Capabilities: [80]
[Qemu-devel] [PATCH, RFC 3/4] prep: Fix duplicate ISA IDE IRQ
Calling isa_ide_init() twice with the same IRQ 13 fails:

qemu: hardware error: isa irq 13 already assigned

Use a different IRQ (14) for the second one to avoid this.

Signed-off-by: Hervé Poussineau hpous...@reactos.org
Cc: Alexander Graf ag...@suse.de
Signed-off-by: Andreas Färber andreas.faer...@web.de
---
 hw/ppc_prep.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/ppc_prep.c b/hw/ppc_prep.c
index 3575dee..3073870 100644
--- a/hw/ppc_prep.c
+++ b/hw/ppc_prep.c
@@ -76,7 +76,7 @@ qemu_log_mask(CPU_LOG_IOPORT, fmt, ## __VA_ARGS__)
 /* Constants for devices init */
 static const int ide_iobase[2] = { 0x1f0, 0x170 };
 static const int ide_iobase2[2] = { 0x3f6, 0x376 };
-static const int ide_irq[2] = { 13, 13 };
+static const int ide_irq[2] = { 13, 14 };

 #define NE2000_NB_MAX 6

-- 
1.7.3
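The failure described here comes from two entries in one IRQ table sharing a number. A minimal sketch of a check that would flag the old table and accept the new one is below; irq_table_has_duplicate() is a hypothetical helper for illustration and does not exist in QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not QEMU API): return true if any IRQ number
 * appears more than once among the n entries of the table. */
bool irq_table_has_duplicate(const int *irqs, size_t n)
{
    assert(irqs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (irqs[i] == irqs[j]) {
                return true;
            }
        }
    }
    return false;
}
```

Called on { 13, 13 } it reports a duplicate; called on { 13, 14 } it does not.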
[Qemu-devel] [PATCH 1/4] prep: Remove bogus BIOS size check
r3480 added this check to account for the entry vector 0xfff00100 to be available for CPUs that need it. Today however, the NIP is not yet initialized at this point (zero), so the check always triggers.

Cc: Hervé Poussineau hpous...@reactos.org
Cc: Alexander Graf ag...@suse.de
Signed-off-by: Andreas Färber andreas.faer...@web.de
---
 hw/ppc_prep.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/hw/ppc_prep.c b/hw/ppc_prep.c
index 1492266..6b22122 100644
--- a/hw/ppc_prep.c
+++ b/hw/ppc_prep.c
@@ -600,9 +600,6 @@ static void ppc_prep_init (ram_addr_t ram_size,
     if (filename) {
         qemu_free(filename);
     }
-    if (env->nip < 0xFFF80000 && bios_size < 0x00100000) {
-        hw_error("PowerPC 601 / 620 / 970 need a 1MB BIOS\n");
-    }

     if (linux_boot) {
         kernel_base = KERNEL_LOAD_ADDR;
-- 
1.7.3
[Qemu-devel] [FYI 4/4] prep: Quickfix for ioport
Workaround the following error:

qemu: hardware error: register_ioport_read: invalid opaque

Signed-off-by: Hervé Poussineau hpous...@reactos.org
Signed-off-by: Andreas Färber andreas.faer...@web.de
---
 hw/ppc_prep.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/hw/ppc_prep.c b/hw/ppc_prep.c
index 3073870..0c9183e 100644
--- a/hw/ppc_prep.c
+++ b/hw/ppc_prep.c
@@ -721,8 +721,10 @@ static void ppc_prep_init (ram_addr_t ram_size,
     register_ioport_read(0x398, 2, 1, PREP_io_read, sysctrl);
     register_ioport_write(0x398, 2, 1, PREP_io_write, sysctrl);
     /* System control ports */
+#if 0
     register_ioport_read(0x0092, 0x01, 1, PREP_io_800_readb, sysctrl);
     register_ioport_write(0x0092, 0x01, 1, PREP_io_800_writeb, sysctrl);
+#endif
     register_ioport_read(0x0800, 0x52, 1, PREP_io_800_readb, sysctrl);
     register_ioport_write(0x0800, 0x52, 1, PREP_io_800_writeb, sysctrl);
     /* PCI intack location */
-- 
1.7.3
[Qemu-devel] [PATCH 2/4] prep: Add ELF support
In order to switch from abandoned OpenHack'Ware to OpenBIOS firmware, the PReP machine needs to be able to load an ELF BIOS.

ELF loading is adapted from ppc_newworld, the fallback mechanism from sun4m.

Note that since we must register the maximum amount of ROM before attempting to load an ELF BIOS and since there is no cpu_unregister_physical_memory(), raw BIOS files such as OHW may now be preceded by unused ROM memory.

Cc: Alexander Graf ag...@suse.de
Cc: Hervé Poussineau hpous...@reactos.org
Signed-off-by: Andreas Färber andreas.faer...@web.de
---
 hw/ppc_prep.c |   24 +++++++++++++++---------
 1 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/hw/ppc_prep.c b/hw/ppc_prep.c
index 6b22122..3575dee 100644
--- a/hw/ppc_prep.c
+++ b/hw/ppc_prep.c
@@ -36,6 +36,7 @@
 #include "qemu-log.h"
 #include "ide.h"
 #include "loader.h"
+#include "elf.h"
 #include "mc146818rtc.h"
 #include "blockdev.h"

@@ -582,18 +583,23 @@ static void ppc_prep_init (ram_addr_t ram_size,
         bios_name = BIOS_FILENAME;
     filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
     if (filename) {
-        bios_size = get_image_size(filename);
+        cpu_register_physical_memory(0xfff00000, BIOS_SIZE,
+                                     bios_offset | IO_MEM_ROM);
+        bios_size = load_elf(filename, NULL, NULL, NULL,
+                             NULL, NULL, 1, ELF_MACHINE, 0);
+        if (bios_size < 0 || bios_size > BIOS_SIZE) {
+            bios_size = get_image_size(filename);
+            if (bios_size > 0 && bios_size <= BIOS_SIZE) {
+                target_phys_addr_t bios_addr;
+                bios_size = (bios_size + 0xfff) & ~0xfff;
+                bios_addr = (uint32_t)(-bios_size);
+                bios_size = load_image_targphys(filename, bios_addr,
+                                                bios_size);
+            }
+        }
     } else {
         bios_size = -1;
     }
-    if (bios_size > 0 && bios_size <= BIOS_SIZE) {
-        target_phys_addr_t bios_addr;
-        bios_size = (bios_size + 0xfff) & ~0xfff;
-        bios_addr = (uint32_t)(-bios_size);
-        cpu_register_physical_memory(bios_addr, bios_size,
-                                     bios_offset | IO_MEM_ROM);
-        bios_size = load_image_targphys(filename, bios_addr, bios_size);
-    }
     if (bios_size < 0 || bios_size > BIOS_SIZE) {
         hw_error("qemu: could not load PPC PREP bios '%s'\n", bios_name);
     }
-- 
1.7.3
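The raw-image fallback in this patch rounds the image size up to a 4 KiB boundary and then places the image so that it ends at the top of the 32-bit address space. A minimal sketch of that arithmetic is below; bios_round_up_4k() and bios_load_addr() are hypothetical helper names for illustration, not functions from hw/ppc_prep.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round a size up to the next multiple of 4 KiB,
 * mirroring the "(bios_size + 0xfff) & ~0xfff" step in the patch. */
uint32_t bios_round_up_4k(uint32_t size)
{
    return (size + 0xfffu) & ~(uint32_t)0xfffu;
}

/* Hypothetical helper: address at which an image of rounded_size bytes
 * must start so that it ends exactly at 4 GiB. This mirrors the
 * "(uint32_t)(-bios_size)" step and relies on unsigned wrap-around. */
uint32_t bios_load_addr(uint32_t rounded_size)
{
    assert((rounded_size & 0xfffu) == 0); /* expects an aligned size */
    return (uint32_t)(0u - rounded_size);
}
```

For a 0x7f001-byte image this gives a rounded size of 0x80000 and a load address of 0xfff80000; a full 1 MB image lands at 0xfff00000.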
[Qemu-devel] [PATCH 0/4] ppc: Fix PReP emulation
Hello,

Based on an earlier attempt of mine to make OpenBIOS work with -M prep, with kind support from Hervé Poussineau here's an initial stab at fixing the long-broken PReP emulation and preparing migration from abandoned OpenHack'Ware to OpenBIOS as default FOSS firmware.

In particular a number of hw_error()s are resolved, so that the BIOS can be entered at all. It is not yet working in terms of serial and VGA support etc.

This series is also available from:
git://repo.or.cz/qemu/afaerber.git prep-queue

Some more work-in-progress for the curious is on my prep branch [2]. The corresponding work-in-progress OpenBIOS changes are at [3].

Unfortunately the prep machine is lacking documentation of what exactly it tries to emulate. The plan thus is to merge emulation of a second, real IBM 40p machine based on Hervé's work at [1], for use with original binary firmware.

Also upcoming are new ppc_chrp machines, forked from ppc_newworld, emulating the 970-based IBM JS20 (using Apple U3) [4] and possibly the POWER5-based IntelliStation 285. These depend on the ongoing ppc64 port of OpenBIOS to be completed though. This relates to PReP in that the machine IDs will need to be coordinated.

Have fun,
Andreas

[1] git://repo.or.cz/qemu/hpoussin.git ppc
    http://repo.or.cz/w/qemu/hpoussin.git/shortlog/refs/heads/ppc
[2] http://repo.or.cz/w/qemu/afaerber.git/shortlog/refs/heads/prep
[3] http://repo.or.cz/w/openbios/afaerber.git/shortlog/refs/heads/prep
[4] http://repo.or.cz/w/qemu/afaerber.git/shortlog/refs/heads/aix

Andreas Färber (4):
  prep: Remove bogus BIOS size check
  prep: Add ELF support
  prep: Fix duplicate ISA IDE IRQ
  prep: Quickfix for ioport

 hw/ppc_prep.c |   31 ++++++++++++++++++-------------
 1 files changed, 18 insertions(+), 13 deletions(-)

-- 
1.7.3
[Qemu-devel] Can any one help me?
I use qemu-0.13.0 and want to emulate a SPARC system. I did the following:

1. qemu-img create solaris.img 10G
2. qemu-system-sparc -m 256 -hda solaris.img -boot d -cdrom sol-9-905-sparc.iso
3. qemu reported "Unhandled Exception 0X0007" and then "Stopping execution".

Does anyone know why this happened?

Thank you very much.
Best Regards!
[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate
On Mon, Dec 13, 2010 at 12:15:08PM -0700, Alex Williamson wrote: On Mon, 2010-12-13 at 21:06 +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 11:59:16AM -0700, Alex Williamson wrote: On Mon, 2010-12-13 at 20:54 +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 11:00:44AM -0700, Alex Williamson wrote: On Mon, 2010-12-13 at 19:50 +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 10:43:22AM -0700, Alex Williamson wrote: So, unfortunately, I stand by my original patch. What about the one that put -1 in saved index for a hotplugged device? There are still examples that don't work even without hotplug (example 2 and example 3 after the reboot). That hack limits the damage, but still leaves a latent bug for reboot and doesn't address the non-hotplug scenarios. So, I don't think it's worthwhile to pursue, and we shouldn't pretend we can use it to avoid bumping the version_id. Thanks, Alex I guess when we bump it we tell users: migration is completely borken to the old version, don't even try it. Is there a way for libvirt to discover such incompatibilities and avoid the migration? I don't know if libvirt has a way to query this in advance. If a migration is attempted, the target will report: savevm: unsupported version 5 for ':00:03.0/rtl8139' v4 And the source will continue running. We waste plenty of bits getting to that point, Yes, this happens after all of memory has been migrated. Better late than never :^\ One other question: can we do the same by creating a new (empty) section? As was discussed in the past this is easier for downstreams to cherry-pick. -- MST
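The rejection discussed in this thread (a target that only understands an older section version refusing a newer stream) can be sketched as a simple range check. The struct and function below are hypothetical and only loosely modelled on the idea of a version_id with a minimum accepted version; they are not the actual QEMU savevm code, and the message format merely imitates the one quoted in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical description of what the receiving side can load. */
struct section_desc {
    const char *idstr;       /* section name, e.g. ".../rtl8139" */
    int version_id;          /* newest layout the target understands */
    int minimum_version_id;  /* oldest layout the target still accepts */
};

/* Return true if a section sent with stream_version can be loaded.
 * Otherwise print a diagnostic in the style of the quoted message. */
bool section_version_ok(const struct section_desc *d, int stream_version)
{
    assert(d->minimum_version_id <= d->version_id);
    if (stream_version > d->version_id ||
        stream_version < d->minimum_version_id) {
        fprintf(stderr, "savevm: unsupported version %d for '%s' v%d\n",
                stream_version, d->idstr, d->version_id);
        return false;
    }
    return true;
}
```

A target describing itself as version 4 would reject a version 5 stream with this check, which matches the behaviour described above: the source keeps running, but only after the attempt has been made.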
Re: [Qemu-devel] Re: [PATCH] PCI: Bus number from the bridge, not the device
On Mon, Dec 13, 2010 at 01:04:23PM -0700, Alex Williamson wrote: On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote: On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote: pcibus_dev_print() was erroneously retrieving the device bus number from the secondary bus number offset of the device instead of the bridge above the device. This ends of landing in the 2nd byte of the 3rd BAR for devices, which thankfully is usually zero. pcibus_get_dev_path() copied this code, inheriting the same bug. pcibus_get_dev_path() is used for ramblock naming, so changing it can effect migration. However, I've only seen this byte be non-zero for an assigned device, which can't migrate anyway, so hopefully we won't run into any issues. Signed-off-by: Alex Williamson alex.william...@redhat.com Good catch. Applied. Um... submitted vs applied: PCI: Bus number from the bridge, not the device @@ -6,20 +8,28 @@ number from the secondary bus number offset of the device instead of the bridge above the device. This ends of landing in the 2nd byte of the 3rd BAR for devices, which thankfully -is usually zero. pcibus_get_dev_path() copied this code, +is usually zero. + +Note: pcibus_get_dev_path() copied this code, inheriting the same bug. pcibus_get_dev_path() is used for ramblock naming, so changing it can effect migration. However, I've only seen this byte be non-zero for an assigned device, which can't migrate anyway, so hopefully we won't run into any issues. +This patch does not touch pcibus_get_dev_path, as +bus number is guest assigned for nested buses, +so using it for migration is broken anyway. +Fix it properly later. + Signed-off-by: Alex Williamson alex.william...@redhat.com +Signed-off-by: Michael S. 
Tsirkin m...@redhat.com diff --git a/hw/pci.c b/hw/pci.c -index 6d0934d..15416dd 100644 +index 962886e..8f6fcf8 100644 --- a/hw/pci.c +++ b/hw/pci.c -@@ -1940,8 +1940,7 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent) +@@ -1806,8 +1806,7 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent) monitor_printf(mon, %*sclass %s, addr %02x:%02x.%x, pci id %04x:%04x (sub %04x:%04x)\n, @@ -29,14 +39,3 @@ PCI_SLOT(d-devfn), PCI_FUNC(d-devfn), pci_get_word(d-config + PCI_VENDOR_ID), pci_get_word(d-config + PCI_DEVICE_ID), -@@ -1965,7 +1964,7 @@ static char *pcibus_get_dev_path(DeviceState *dev) - char path[16]; - - snprintf(path, sizeof(path), %04x:%02x:%02x.%x, -- pci_find_domain(d-bus), d-config[PCI_SECONDARY_BUS], -+ pci_find_domain(d-bus), pci_bus_num(d-bus), - PCI_SLOT(d-devfn), PCI_FUNC(d-devfn)); - - return strdup(path); - - So the chunk that fixed the part that I was actually interested in got dropped even though the existing code is clearly wrong. Yes, we still have issues with nested bridges (not that we have many of those), but until the Fix it properly later part comes along, can we please include the obvious bug fix? Thanks, Alex We can stick 0 in there - would that help? I would much rather not create a version where we put the bus number there. -- MST
Re: [Qemu-devel] Re: [PATCH] PCI: Bus number from the bridge, not the device
On Tue, 2010-12-14 at 06:46 +0200, Michael S. Tsirkin wrote:
> We can stick 0 in there - would that help? I would much rather not
> create a version where we put the bus number there.

Yep, 0 is good enough until we solve the nested bridge problem.

Thanks,
Alex
Re: [Qemu-devel] Re: [PATCH] PCI: Bus number from the bridge, not the device
On Mon, Dec 13, 2010 at 09:49:21PM -0700, Alex Williamson wrote:
> Yep, 0 is good enough until we solve the nested bridge problem.
>
> Thanks,
> Alex

I'm surprised you see that it matters in practice, but ok. Like this?

diff --git a/hw/pci.c b/hw/pci.c
index 254647b..81231c5 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1952,7 +1952,10 @@ static char *pcibus_get_dev_path(DeviceState *dev)
     char path[16];

     snprintf(path, sizeof(path), "%04x:%02x:%02x.%x",
-             pci_find_domain(d->bus), d->config[PCI_SECONDARY_BUS],
+             pci_find_domain(d->bus),
+             0 /* TODO: need a persistent path for nested buses.
+                * Note: pci_bus_num(d->bus) is not right as it's guest
+                * assigned. */,
              PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));

     return strdup(path);
[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate
On Tue, 2010-12-14 at 06:43 +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 12:15:08PM -0700, Alex Williamson wrote: On Mon, 2010-12-13 at 21:06 +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 11:59:16AM -0700, Alex Williamson wrote: On Mon, 2010-12-13 at 20:54 +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 11:00:44AM -0700, Alex Williamson wrote: On Mon, 2010-12-13 at 19:50 +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 10:43:22AM -0700, Alex Williamson wrote: So, unfortunately, I stand by my original patch. What about the one that put -1 in saved index for a hotplugged device? There are still examples that don't work even without hotplug (example 2 and example 3 after the reboot). That hack limits the damage, but still leaves a latent bug for reboot and doesn't address the non-hotplug scenarios. So, I don't think it's worthwhile to pursue, and we shouldn't pretend we can use it to avoid bumping the version_id. Thanks, Alex I guess when we bump it we tell users: migration is completely borken to the old version, don't even try it. Is there a way for libvirt to discover such incompatibilities and avoid the migration? I don't know if libvirt has a way to query this in advance. If a migration is attempted, the target will report: savevm: unsupported version 5 for ':00:03.0/rtl8139' v4 And the source will continue running. We waste plenty of bits getting to that point, Yes, this happens after all of memory has been migrated. Better late than never :^\ One other question: can we do the same by creating a new (empty) section? As was discussed in the past this is easier for downstreams to cherry-pick. The only way I can think to do that would be to have a subsection that is always included, but saves no data. That would force a failure on new-old migration, but I don't think it really matches the intended purpose of subsections and feels like it's adding cruft for no gain. Maybe I'm missing something. 
Juan, is there any advantage to trapping this in a subsection? Thanks, Alex
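The version check discussed above can be sketched roughly as follows. This is a simplified illustration, not QEMU's actual savevm code: `struct section_desc`, `check_incoming_version()` and the bare "rtl8139" id string are hypothetical names made up for the example. It only shows why bumping version_id makes an older target refuse the stream while the source keeps running.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a device's saved-state descriptor. */
struct section_desc {
    const char *idstr;   /* section name, e.g. the device path plus model */
    int version_id;      /* newest stream format this build knows how to load */
};

/*
 * Assumed check on the migration target: a stream written with a newer
 * version_id than the local build supports is rejected.
 * Returns 0 if the section can be loaded, -1 otherwise; msg receives the
 * diagnostic text (empty on success).
 */
static int check_incoming_version(const struct section_desc *local,
                                  int incoming_version,
                                  char *msg, size_t msg_len)
{
    assert(msg != NULL && msg_len > 0);

    if (incoming_version > local->version_id) {
        snprintf(msg, msg_len, "savevm: unsupported version %d for '%s' v%d",
                 incoming_version, local->idstr, local->version_id);
        return -1;
    }
    msg[0] = '\0';
    return 0;
}
```

With a target that only knows version 4, an incoming version 5 section fails the check, which is the new-to-old case described in the thread; a version 4 section still loads.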
Re: [Qemu-devel] Re: [PATCH] PCI: Bus number from the bridge, not the device
On Tue, 2010-12-14 at 06:57 +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 09:49:21PM -0700, Alex Williamson wrote: On Tue, 2010-12-14 at 06:46 +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 01:04:23PM -0700, Alex Williamson wrote: On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote: On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote: pcibus_dev_print() was erroneously retrieving the device bus number from the secondary bus number offset of the device instead of the bridge above the device. This ends of landing in the 2nd byte of the 3rd BAR for devices, which thankfully is usually zero. pcibus_get_dev_path() copied this code, inheriting the same bug. pcibus_get_dev_path() is used for ramblock naming, so changing it can effect migration. However, I've only seen this byte be non-zero for an assigned device, which can't migrate anyway, so hopefully we won't run into any issues. Signed-off-by: Alex Williamson alex.william...@redhat.com Good catch. Applied. Um... submitted vs applied: PCI: Bus number from the bridge, not the device @@ -6,20 +8,28 @@ number from the secondary bus number offset of the device instead of the bridge above the device. This ends of landing in the 2nd byte of the 3rd BAR for devices, which thankfully -is usually zero. pcibus_get_dev_path() copied this code, +is usually zero. + +Note: pcibus_get_dev_path() copied this code, inheriting the same bug. pcibus_get_dev_path() is used for ramblock naming, so changing it can effect migration. However, I've only seen this byte be non-zero for an assigned device, which can't migrate anyway, so hopefully we won't run into any issues. +This patch does not touch pcibus_get_dev_path, as +bus number is guest assigned for nested buses, +so using it for migration is broken anyway. +Fix it properly later. + Signed-off-by: Alex Williamson alex.william...@redhat.com +Signed-off-by: Michael S. 
Tsirkin m...@redhat.com diff --git a/hw/pci.c b/hw/pci.c -index 6d0934d..15416dd 100644 +index 962886e..8f6fcf8 100644 --- a/hw/pci.c +++ b/hw/pci.c -@@ -1940,8 +1940,7 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent) +@@ -1806,8 +1806,7 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent) monitor_printf(mon, %*sclass %s, addr %02x:%02x.%x, pci id %04x:%04x (sub %04x:%04x)\n, @@ -29,14 +39,3 @@ PCI_SLOT(d-devfn), PCI_FUNC(d-devfn), pci_get_word(d-config + PCI_VENDOR_ID), pci_get_word(d-config + PCI_DEVICE_ID), -@@ -1965,7 +1964,7 @@ static char *pcibus_get_dev_path(DeviceState *dev) - char path[16]; - - snprintf(path, sizeof(path), %04x:%02x:%02x.%x, -- pci_find_domain(d-bus), d-config[PCI_SECONDARY_BUS], -+ pci_find_domain(d-bus), pci_bus_num(d-bus), - PCI_SLOT(d-devfn), PCI_FUNC(d-devfn)); - - return strdup(path); - - So the chunk that fixed the part that I was actually interested in got dropped even though the existing code is clearly wrong. Yes, we still have issues with nested bridges (not that we have many of those), but until the Fix it properly later part comes along, can we please include the obvious bug fix? Thanks, Alex We can stick 0 in there - would that help? I would much rather not create a version where we put the bus number there. Yep, 0 is good enough until we solve the nested bridge problem. Thanks, Alex I'm surprised you see that it matters in practice, but ok. Like this? I've only ever seen config[PCI_SECONDARY_BUS] be non-zero for an assigned device, so I'm pretty sure we're not going to hurt migration, but the code is clearly wrong and I'd like to make sure we don't trip on a migration failure for a minor device config space change. 
diff --git a/hw/pci.c b/hw/pci.c
index 254647b..81231c5 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1952,7 +1952,10 @@ static char *pcibus_get_dev_path(DeviceState *dev)
     char path[16];
 
     snprintf(path, sizeof(path), "%04x:%02x:%02x.%x",
-             pci_find_domain(d->bus), d->config[PCI_SECONDARY_BUS],
+             pci_find_domain(d->bus),
+             0 /* TODO: need a persistent path for nested buses.
+                * Note: pci_bus_num(d->bus) is not right as it's guest
+                * assigned. */,
              PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
 
     return strdup(path);

Sure, that's fine.

Acked-by: Alex Williamson alex.william...@redhat.com

Thanks, Alex
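To make the bug in this thread concrete, here is a small standalone sketch. It assumes the standard PCI header layout (BARs start at offset 0x10 in a device header, the secondary bus number sits at offset 0x19 in a bridge header); the helper names `bar_index_of()`, `byte_within_bar()`, `format_dev_path_old()` and `format_dev_path()` are hypothetical and are not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Standard PCI configuration-space offsets. */
#define PCI_BASE_ADDRESS_0 0x10 /* first BAR of a type 0 (device) header */
#define PCI_SECONDARY_BUS  0x19 /* only meaningful in a type 1 (bridge) header */

/* Hypothetical helper: index of the BAR containing offset, or -1 if none. */
static int bar_index_of(unsigned offset)
{
    if (offset < PCI_BASE_ADDRESS_0 || offset >= PCI_BASE_ADDRESS_0 + 6 * 4) {
        return -1;
    }
    return (int)((offset - PCI_BASE_ADDRESS_0) / 4);
}

/* Hypothetical helper: zero-based byte position of offset inside its BAR. */
static unsigned byte_within_bar(unsigned offset)
{
    return (offset - PCI_BASE_ADDRESS_0) % 4;
}

/* Old behaviour: the "bus" field is read from the device's own config space. */
static void format_dev_path_old(char *out, size_t len, unsigned domain,
                                const uint8_t *config,
                                unsigned slot, unsigned func)
{
    assert(out != NULL && config != NULL);
    snprintf(out, len, "%04x:%02x:%02x.%x",
             domain, (unsigned)config[PCI_SECONDARY_BUS], slot, func);
}

/* Behaviour after the patch above: the bus field is a constant 0. */
static void format_dev_path(char *out, size_t len, unsigned domain,
                            unsigned slot, unsigned func)
{
    assert(out != NULL);
    snprintf(out, len, "%04x:%02x:%02x.%x", domain, 0u, slot, func);
}
```

For an ordinary device, offset 0x19 is index 2 (the third BAR), byte 1 (the second byte), so a device whose third BAR has a non-zero second byte would get a different path string under the old code. With the constant 0 the string no longer depends on BAR contents.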
[Qemu-devel] SMBIOS support in Qemu?
Hi, Which version of Qemu contains the Smbios code? If I have to get the code in my repo, is there any place I can get the complete set of patches? Thanks Anjali
Re: [Qemu-devel] SMBIOS support in Qemu?
On Mon, Dec 13, 2010 at 10:47 PM, Anjali Kulkarni anj...@juniper.net wrote: Hi, Which version of Qemu contains the Smbios code? If I have to get the code in my repo, is there any place I can get the complete set of patches? We've had SMBIOS support for a couple of years; it should be in any recent release or distribution. SMBIOS is generated in seabios in src/smbios.*[1]. Support for loading tables and fields from qemu is in hw/smbios.*[2]. Alex [1] http://www.seabios.org/Download [2] http://wiki.qemu.org/Download