Re: [Qemu-devel] [PATCH v4 06/10] qemu-ga: Add Windows VSS provider to quiesce applications on fsfreeze
Il 26/06/2013 00:31, Tomoki Sekiyama ha scritto: From: Paolo Bonzini [paolo.bonz...@gmail.com] on behalf of Paolo Bonzini [pbonz...@redhat.com] Il 06/06/2013 17:06, Tomoki Sekiyama ha scritto: +STDAPI VSSCheckOSVersion(void); + +STDAPI COMRegister(void); +STDAPI COMUnregister(void); + +STDAPI DllRegisterServer(void); +STDAPI DllUnregisterServer(void); Can you explain the difference between COMRegister/COMUnregister and DllRegisterServer/DllUnregisterServer (and why the COM+ part need not be done by regsvr32)? Also, why does COMUnregister call DllUnregisterServer but COMRegister does not call DllRegisterServer? COMRegister and COMUnregister are called by`qemu-ga -s install`, to register/unregister the DLL into/from COM+ application catalogue. DllRegisterServer is automatically called inside pCatalog-InstallComponent() (like regsvr32 does), and register this DLL as a VSS provider. DllUnregisterServer will do the oposite. ICOMAdminCatalog (pCatalog) does not provide a method to uninstall component, so COMUnregister calls DllUnregisterServer by itself. Understood, thanks. Just one question remains: why is the COM+ part not needed when you invoke regsvr32? Thanks, Paolo Also, is it needed to call VSSCheckOSVersion from the requestor? I would think that checking VSSAPI.DLL is stronger than checking the version, and indeed you do that check too. In Windows XP, VSSAPI.DLL exists but it has different functionality and interfaces from newer Windows. http://msdn.microsoft.com/en-us/library/windows/desktop/aa384627(v=vs.85).aspx It is checking the OS version because this patchset only supports Windows 2003 or later. Thanks, Tomoki Sekiyama
Re: [Qemu-devel] block: Review of .has_zero_init use
Am 26.06.2013 um 05:14 schrieb Bharata B Rao bhar...@linux.vnet.ibm.com: On Tue, Jun 25, 2013 at 01:39:11PM +0200, Kevin Wolf wrote: Can you please review for the gluster, rbd, sheepdog and ssh driver whether it's safe to assume that the image reads back as zeros after bdrv_create? Gluster supports both file and block backends. While the above is true for file backend (which uses ftruncate), the same is not true for block backend (which uses lvcreate lvresize). So overall it is not safe to assume that an image on GlusterFS volume reads back as zeroes after create. Okay, so for safety we have to return has_zero_init = 0. Erroneously assuming a device is zero initialized can bring severe filesystem corruption. Do you see a way to query the information of the underlaying backend from the storage and return 1 or 0 conditionally? Peter
Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
Il 26/06/2013 04:59, liu ping fan ha scritto: The latter part could be the hard one in a multi-threaded context, but I think it's up to the device to ensure it. It doesn't _have_ to be hard. For example, joining the data-plane thread would do that as well. It seems not easy, take consideration of the sequence: _bh_delete(), // ensure not scheduled, ie, no further access to DeviceState wait for rcu grace period to come, // ensure not running free DeviceState in rcu-reclaimer But in current code, _delete() can be placed in DeviceState finalization path(in reclaimer), which means while reclaiming, there still exist access path to the DeviceState. It can be placed there, but it doesn't mean it is a good idea. :) On the other hand, pushing _delete() out of finalization path is not easy, since we do not what time the DeviceState has done with its bh. See above: - if the BH will run in the iothread, the BH is definitely not running (because the BH runs under the BQL, and the reclaimer owns it) - if the BH is running in another thread, waiting for that thread to terminate obviously makes the BH not running. What we need is separation of removal and reclamation. Without that, any effort to make things unplug-safe is going to be way way more complicated than necessary. Paolo
Re: [Qemu-devel] [PATCH v11 14/15] rdma: introduce MIG_STATE_NONE and change MIG_STATE_SETUP state transition
Il 26/06/2013 02:31, Michael R. Hines ha scritto: On 06/25/2013 05:06 PM, Paolo Bonzini wrote: Il 25/06/2013 22:56, Michael R. Hines ha scritto: I was wrong - this does require a protocol extension. This is because the RDMA transfers are asynchronous, and thus we cannot know in advance that it is safe to unregister the memory associated with each individual transfer before the transfer actually completes. While the destination currently uses the protocol to participate in *registering* the page, the destination does not participate in the RDMA transfers themselves, only the source does, and thus would require a new exchange of messages to block and instruct the destination to unpin the memory. Yes, that's what I recalled too (really what mst told me :)). Does it need to be blocking though? As long as the pinning is blocking, and messages are processed in order, the source can proceed immediately after sending an unpin message. This assumes of course that the chunk is not being transmitted, and I am not sure how easy the source can determine that. No, they're not processed in order. In fact, not only does the device write out of order, but also the PCI bus writes out of order. This was such a problem in fact, that I fixed several bugs as a result a few weeks ago (v7 of the patch with an in-depth description). The destination simply cannot assume whatsoever what the ordering of the writes are - that's really the whole point of using RDMA in the first place so that the software can get out of the way of the transfer process to lower the latency of each transfer. The memory is processed out of order, but what about the messages? Those must be in order. Note that I wrote above This assumes of course that the chunk is not being transmitted. Can the source know when an asynchronous transfer finished, and delay the unpinning until that time? Paolo The only option is to send a blocking message to the other side to request the unpinning (in addition to unpinning on the source first upon completion of the original transfer). As you can expect, this would be very expensive and we must ensure that we have *very* good a-priori information that this memory will not need to be re-registered anytime in the near future. - Michael
Re: [Qemu-devel] block: Review of .has_zero_init use
On Wed, Jun 26, 2013 at 07:59:27AM +0200, Peter Lieven wrote: Am 26.06.2013 um 05:14 schrieb Bharata B Rao bhar...@linux.vnet.ibm.com: On Tue, Jun 25, 2013 at 01:39:11PM +0200, Kevin Wolf wrote: Can you please review for the gluster, rbd, sheepdog and ssh driver whether it's safe to assume that the image reads back as zeros after bdrv_create? Gluster supports both file and block backends. While the above is true for file backend (which uses ftruncate), the same is not true for block backend (which uses lvcreate lvresize). So overall it is not safe to assume that an image on GlusterFS volume reads back as zeroes after create. Okay, so for safety we have to return has_zero_init = 0. Erroneously assuming a device is zero initialized can bring severe filesystem corruption. Do you see a way to query the information of the underlaying backend from the storage and return 1 or 0 conditionally? Right now no. There have been efforts to get the capabilities (file, block etc) of gluster volume from gluster cmdline, but there haven't been discussions about providing such capabilities from libgfapi which is the library QEMU uses to talk with gluster. I see has_zero_init being used from qemu-img convert path. Is there any other use of this in QEMU ? Regards, Bharata.
Re: [Qemu-devel] [PATCH v3 1/2] net: introduce RX_FILTER_CHANGED event
Eric Blake ebl...@redhat.com writes: On 05/23/2013 03:07 AM, Amos Kong wrote: Introduce this new QMP event to notify management after guest changes rx-filter configuration. Signed-off-by: Amos Kong ak...@redhat.com --- QMP/qmp-events.txt| 14 ++ include/monitor/monitor.h | 1 + monitor.c | 1 + 3 files changed, 16 insertions(+) diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt index 92fe5fb..ad6612b 100644 --- a/QMP/qmp-events.txt +++ b/QMP/qmp-events.txt @@ -154,6 +154,20 @@ Data: path: /machine/peripheral/virtio-net-pci-0 }, timestamp: { seconds: 1265044230, microseconds: 450486 } } +RX_FILTER_CHANGED +- + DEVICE_TRAY_MOVED - Isn't this file supposed to be kept in sorted order, to minimize merge conflicts when backporting events? Yes, please. Also helps humans navigate. GUEST_PANICKED is out of order, needs fixing.
Re: [Qemu-devel] [PULL 2/8] i.MX: Implement a more complete version of the GPT timer.
Il 25/06/2013 22:53, Peter Maydell ha scritto: On 25 June 2013 19:42, Paolo Bonzini pbonz...@redhat.com wrote: Il 25/06/2013 20:21, Peter Maydell ha scritto: @@ -416,7 +513,7 @@ static int imx_timerg_init(SysBusDevice *dev) sysbus_init_irq(dev, s-irq); memory_region_init_io(s-iomem, imx_timerg_ops, - s, imxg-timer, + s, TYPE_IMX_GPT, 0x1000); sysbus_init_mmio(dev, s-iomem); There was some agreement that this is not a good change. I agree (and more so regarding the use of the macro in the vmstate name), but nobody actually posted any comment to that effect against any of the versions of this patch that got sent out for review... Yeah, the timing was bad... Can you post a revert, though? Paolo
Re: [Qemu-devel] [PULL 2/8] i.MX: Implement a more complete version of the GPT timer.
For my own education, could you elaborate on the reason this is not a good change? Thanks. JC - Mail original - Il 25/06/2013 22:53, Peter Maydell ha scritto: On 25 June 2013 19:42, Paolo Bonzini pbonz...@redhat.com wrote: Il 25/06/2013 20:21, Peter Maydell ha scritto: @@ -416,7 +513,7 @@ static int imx_timerg_init(SysBusDevice *dev) sysbus_init_irq(dev, s-irq); memory_region_init_io(s-iomem, imx_timerg_ops, - s, imxg-timer, + s, TYPE_IMX_GPT, 0x1000); sysbus_init_mmio(dev, s-iomem); There was some agreement that this is not a good change. I agree (and more so regarding the use of the macro in the vmstate name), but nobody actually posted any comment to that effect against any of the versions of this patch that got sent out for review... Yeah, the timing was bad... Can you post a revert, though? Paolo
Re: [Qemu-devel] [PULL 2/8] i.MX: Implement a more complete version of the GPT timer.
Il 26/06/2013 09:08, j...@tribudubois.net ha scritto: For my own education, could you elaborate on the reason this is not a good change? Quoting from http://article.gmane.org/gmane.comp.emulators.qemu/218078: The MemoryRegion name is free text that can be changed as desired and doesn't need to match the type name. The TYPE_* constant serves to keep consistency between usages of the string literal but does not prohibit changing its value. The VMStateDescription should thus not use the TYPE_* constant because it must not ever change for live migration. Renaming a type is a valid thing to do if there is no command line compatibility to keep (think board-level xlnx.foo - xlnx,foo). Paolo
Re: [Qemu-devel] [PATCH v3 2/2] net: introduce command to query rx-filter information
Eric Blake ebl...@redhat.com writes: On 05/23/2013 03:08 AM, Amos Kong wrote: We want to implement mac programming over macvtap through Libvirt, related rx-filter configuration contains main mac, some of rx-mode and mac-table. The previous patch adds QMP event to notify management of rx-filter change. This patch adds a monitor command to query rx-filter information. A flag is used to avoid events flooding, if user don't query s/don't/doesn't/ rx-filter after receives one event, new events won't be sent s/after/after it/ to qmp monitor. +++ b/net/net.c @@ -961,6 +961,53 @@ void print_net_client(Monitor *mon, NetClientState *nc) nc-info_str); } +RxFilterInfoList *qmp_query_rx_filter(bool has_name, const char *name, + Error **errp) +{ +NetClientState *nc; +RxFilterInfoList *filter_list = NULL, *last_entry = NULL; + +QTAILQ_FOREACH(nc, net_clients, next) { +RxFilterInfoList *entry; +RxFilterInfo *info; + +if (nc-info-type != NET_CLIENT_OPTIONS_KIND_NIC) { +continue; +} +if (has_name strcmp(nc-name, name) != 0) { Do you need the has_name argument here, or can you ensure that the caller passes NULL when the caller's has_name was false, for one less parameter and the same amount of information? QAPI always generates a has_FOO parameter for an optional FOO parameter, even when FOO is a pointer, which has the perfectly obvious special not given value NULL. I hate that. [...]
Re: [Qemu-devel] [RFC PATCH 2/3] block: assign backing relationship in drive-backup
Il 26/06/2013 05:59, Fam Zheng ha scritto: Assign source image as the backing hd of target bs, so reading target bs gets the point-in-time copy of data from source image. Signed-off-by: Fam Zheng f...@redhat.com --- blockdev.c | 4 1 file changed, 4 insertions(+) diff --git a/blockdev.c b/blockdev.c index 5e694f3..a1d816e 100644 --- a/blockdev.c +++ b/blockdev.c @@ -1504,6 +1504,10 @@ void qmp_drive_backup(const char *device, const char *target, return; } +target_bs-backing_hd = bs; +pstrcpy(target_bs-backing_file, sizeof(target_bs-backing_file), +bs-filename); backing_format missing. Paolo backup_start(bs, target_bs, speed, on_source_error, on_target_error, block_job_cb, bs, local_err); if (local_err != NULL) {
Re: [Qemu-devel] [PULL 2/8] i.MX: Implement a more complete version of the GPT timer.
Thanks for the info. Is this consolidated into some developper documentation somewhere (a wiki, ...)? JC - Mail original - Il 26/06/2013 09:08, j...@tribudubois.net ha scritto: For my own education, could you elaborate on the reason this is not a good change? Quoting from http://article.gmane.org/gmane.comp.emulators.qemu/218078: The MemoryRegion name is free text that can be changed as desired and doesn't need to match the type name. The TYPE_* constant serves to keep consistency between usages of the string literal but does not prohibit changing its value. The VMStateDescription should thus not use the TYPE_* constant because it must not ever change for live migration. Renaming a type is a valid thing to do if there is no command line compatibility to keep (think board-level xlnx.foo - xlnx,foo). Paolo
Re: [Qemu-devel] [PULL 2/8] i.MX: Implement a more complete version of the GPT timer.
Hi On Wed, Jun 26, 2013 at 4:56 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 25/06/2013 22:53, Peter Maydell ha scritto: On 25 June 2013 19:42, Paolo Bonzini pbonz...@redhat.com wrote: Il 25/06/2013 20:21, Peter Maydell ha scritto: @@ -416,7 +513,7 @@ static int imx_timerg_init(SysBusDevice *dev) sysbus_init_irq(dev, s-irq); memory_region_init_io(s-iomem, imx_timerg_ops, - s, imxg-timer, + s, TYPE_IMX_GPT, 0x1000); sysbus_init_mmio(dev, s-iomem); There was some agreement that this is not a good change. I agree (and more so regarding the use of the macro in the vmstate name), but nobody actually posted any comment to that effect against any of the versions of this patch that got sent out for review... Yeah, the timing was bad... Can you post a revert, though? The original string being replaced was a poor choice as well. IIUC the consensus was string field of the memory regions is supposed to indicate the purpose of the memory region for the device. Good examples would be Control regs or MMIO. Naming the memory region after the device type is a redundancy as that info will come via memory region owners. So rather than revert could you just choose something more descriptive? Regards, Peter Paolo
Re: [Qemu-devel] [RFC PATCH 0/3] Point-in-time snapshot exporting with drive-backup
Il 26/06/2013 05:59, Fam Zheng ha scritto: The block job originally had total control over target bs: when the job completes, the target is deleted. Now it's shared with NBD, so we should make sure the job completion or canceling wouldn't crash NBD server. This patch doesn't handle this case: if the block job is ended before NBD exporting, there'll be problem. This should not be a problem. When the job completes or cancels it will close s-target. The NBD server has installed a close notifier for s-target, and will close connections with the clients. Similarly, if the source disk is hot-unplugged, the bdrv_close will cancel the job, which in turn closes connections with the clients. So this just works as long as no hot-unplug happens and as long as the job doesn't complete: the client can use the point-in-time snapshot via NBD as long as it cares to. This leads to another observation: a sync:'none' block-backup job probably should never complete, and should instead go on until explicit cancellation. This is because the job does no background writes, and thus completion would only happen after the guest has written the whole disk. Writing the whole disk is rare enough that it will likely cause bugs in the clients. It is easier just to never complete the job. Paolo
Re: [Qemu-devel] [RFC PATCH 0/3] Point-in-time snapshot exporting with drive-backup
On Wed, 06/26 09:23, Paolo Bonzini wrote: Il 26/06/2013 05:59, Fam Zheng ha scritto: The block job originally had total control over target bs: when the job completes, the target is deleted. Now it's shared with NBD, so we should make sure the job completion or canceling wouldn't crash NBD server. This patch doesn't handle this case: if the block job is ended before NBD exporting, there'll be problem. This should not be a problem. When the job completes or cancels it will close s-target. The NBD server has installed a close notifier for s-target, and will close connections with the clients. Similarly, if the source disk is hot-unplugged, the bdrv_close will cancel the job, which in turn closes connections with the clients. So this just works as long as no hot-unplug happens and as long as the job doesn't complete: the client can use the point-in-time snapshot via NBD as long as it cares to. OK, thanks for pointing this out. This leads to another observation: a sync:'none' block-backup job probably should never complete, and should instead go on until explicit cancellation. This is because the job does no background writes, and thus completion would only happen after the guest has written the whole disk. Writing the whole disk is rare enough that it will likely cause bugs in the clients. It is easier just to never complete the job. Yes, the sync mode none will simply run forever until cancelled. -- Fam
Re: [Qemu-devel] block: Review of .has_zero_init use
Am 26.06.2013 um 08:46 hat Bharata B Rao geschrieben: On Wed, Jun 26, 2013 at 07:59:27AM +0200, Peter Lieven wrote: Am 26.06.2013 um 05:14 schrieb Bharata B Rao bhar...@linux.vnet.ibm.com: On Tue, Jun 25, 2013 at 01:39:11PM +0200, Kevin Wolf wrote: Can you please review for the gluster, rbd, sheepdog and ssh driver whether it's safe to assume that the image reads back as zeros after bdrv_create? Gluster supports both file and block backends. While the above is true for file backend (which uses ftruncate), the same is not true for block backend (which uses lvcreate lvresize). So overall it is not safe to assume that an image on GlusterFS volume reads back as zeroes after create. Okay, so for safety we have to return has_zero_init = 0. Erroneously assuming a device is zero initialized can bring severe filesystem corruption. Do you see a way to query the information of the underlaying backend from the storage and return 1 or 0 conditionally? Right now no. There have been efforts to get the capabilities (file, block etc) of gluster volume from gluster cmdline, but there haven't been discussions about providing such capabilities from libgfapi which is the library QEMU uses to talk with gluster. I see has_zero_init being used from qemu-img convert path. Is there any other use of this in QEMU ? No, it's the only user. So what this means is that using 'qemu-img convert' with a glusterfs target corrupts the image if the volume is backed by a block device. Kevin
Re: [Qemu-devel] [PATCH] block/ssh: Set bdrv_has_zero_init according to the file type.
Am 25.06.2013 um 19:15 hat Richard W.M. Jones geschrieben: From: Richard W.M. Jones rjo...@redhat.com If the remote is a regular file, set it to true (ie. reads of uninitialized areas in a newly created file will return zeroes). If we can't prove that, return false (a safe default). Tested by adding a debugging print statement [not part of this commit] and creating a remote file and a remote block device: $ ./qemu-img create ssh://localhost/tmp/new 100M Formatting 'ssh://localhost/tmp/new', fmt=raw size=104857600 filename ssh://localhost/tmp/new: has_zero_init = 1 $ sudo lvcreate -L 1G -n tmp /dev/fedora Logical volume tmp created $ ./qemu-img create ssh://localhost/dev/fedora/tmp 1G Formatting 'ssh://localhost/dev/fedora/tmp', fmt=raw size=1073741824 filename ssh://localhost/dev/fedora/tmp: has_zero_init = 0 Cc: Kevin Wolf kw...@redhat.com Signed-off-by: Richard W.M. Jones rjo...@redhat.com Thanks, applied to the block branch, and CCed qemu-stable. Kevin
Re: [Qemu-devel] [PATCH] block: add drive_backup HMP command
On Tue, Jun 25, 2013 at 04:36:53PM +0200, Paolo Bonzini wrote: Il 25/06/2013 16:06, Kevin Wolf ha scritto: Am 25.06.2013 um 15:49 hat Paolo Bonzini geschrieben: Il 25/06/2013 15:26, Kevin Wolf ha scritto: +if (!full) { +error_setg(errp, -f is not yet implemented); +hmp_handle_error(mon, errp); +return; +} Then why make it a valid option and confuse users in the help text by describing options that don't really exist? Because otherwise we're stuck with a meaning of the flag that is different between drive-mirror and block-backup. Do you mean when otherwise isn't only we don't add -f now, but also we accidentally add a -f with different meaning later? Not sure if there's a real danger of that when we're aware that we want -f with the same meaning as for mirroring. We have drive-mirror with: * the default is 'top' * -f gives 'full' block-backup for now only implements 'full'. If we do not force the user to add -f, the default is 'full' and we should not change it later. However, I would move the not yet implemented error from HMP to QMP. This way, both drive-mirror and block-backup will have a mandatory 'sync' argument. We plan to implement it anyway, and it makes sense imo to avoid gratuitous differences in the APIs. Thanks, I should have explained this in the commit message. Requiring -f now avoids changing semantics later when 'top' becomes the default to match drive-mirror. I'll move the error into qmp_drive_backup(). Stefan
[Qemu-devel] [PATCH] gluster: Return bdrv_has_zero_init = 0
GlusterFS volumes can be backed by block devices, in which case bdrv_create() doesn't make sure that the image is zeroed out. It is currently not possibly to detect whether a given image is backed by a file or a block device, and incorrectly assuming that it is zeroed corrupts images during qemu-img convert, so let's err on the side of caution and always return 0. Cc: qemu-sta...@nongnu.org Signed-off-by: Kevin Wolf kw...@redhat.com --- block/gluster.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/block/gluster.c b/block/gluster.c index 91acde2..61424bc 100644 --- a/block/gluster.c +++ b/block/gluster.c @@ -574,6 +574,12 @@ static void qemu_gluster_close(BlockDriverState *bs) glfs_fini(s-glfs); } +static int qemu_gluster_has_zero_init(BlockDriverState *bs) +{ +/* GlusterFS volume could be backed by a block device */ +return 0; +} + static QEMUOptionParameter qemu_gluster_create_options[] = { { .name = BLOCK_OPT_SIZE, @@ -595,6 +601,7 @@ static BlockDriver bdrv_gluster = { .bdrv_aio_readv = qemu_gluster_aio_readv, .bdrv_aio_writev = qemu_gluster_aio_writev, .bdrv_aio_flush = qemu_gluster_aio_flush, +.bdrv_has_zero_init = qemu_gluster_has_zero_init, .create_options = qemu_gluster_create_options, }; @@ -610,6 +617,7 @@ static BlockDriver bdrv_gluster_tcp = { .bdrv_aio_readv = qemu_gluster_aio_readv, .bdrv_aio_writev = qemu_gluster_aio_writev, .bdrv_aio_flush = qemu_gluster_aio_flush, +.bdrv_has_zero_init = qemu_gluster_has_zero_init, .create_options = qemu_gluster_create_options, }; @@ -625,6 +633,7 @@ static BlockDriver bdrv_gluster_unix = { .bdrv_aio_readv = qemu_gluster_aio_readv, .bdrv_aio_writev = qemu_gluster_aio_writev, .bdrv_aio_flush = qemu_gluster_aio_flush, +.bdrv_has_zero_init = qemu_gluster_has_zero_init, .create_options = qemu_gluster_create_options, }; @@ -640,6 +649,7 @@ static BlockDriver bdrv_gluster_rdma = { .bdrv_aio_readv = qemu_gluster_aio_readv, .bdrv_aio_writev = qemu_gluster_aio_writev, .bdrv_aio_flush = qemu_gluster_aio_flush, +.bdrv_has_zero_init = qemu_gluster_has_zero_init, .create_options = qemu_gluster_create_options, }; -- 1.8.1.4
[Qemu-devel] Google Summer of Code 2013 has started
It is a pleasure to welcome the following GSoC 2013 students to the QEMU, KVM, and libvirt communities: Libvirt Wireshark Dissector - Yuto KAWAMURA (kawamuray) http://qemu-project.org/Features/LibvirtWiresharkDissector Libvirt Introduce API to query IP addresses for given domain - Nehal J. Wani (nehaljwani) http://www.google-melange.com/gsoc/project/google/gsoc2013/nehaljwani/51001 Libvirt More Intelligent virsh auto-completion - Tomas Meszaros http://www.google-melange.com/gsoc/project/google/gsoc2013/examon/13001 QEMU Integrated Copy-Paste - Ozan Çağlayan and Pallav Agrawal (pallav) http://qemu-project.org/Features/IntegratedCopyPaste QEMU Continuation Passing C - Charlie Shepherd (cs648) http://qemu-project.org/Features/Continuation-Passing_C QEMU Kconfig - Ákos Kovács http://qemu-project.org/Features/Kconfig QEMU USB Media Transfer Protocol emulation - a|mond http://www.google-melange.com/gsoc/project/google/gsoc2013/almond/1001 KVM Nested Virtualization Testsuite - Arthur Chunqi Li (xelatex) http://www.google-melange.com/gsoc/project/google/gsoc2013/xelatex/19001 Coding started on Monday, 17th of June and ends Monday, 23rd of September. Feel free to follow these projects - feature pages are being created with git repo and blog links. Stefan
Re: [Qemu-devel] [PATCH V2] build: remove compile warning
Il 24/06/2013 19:58, Stefan Weil ha scritto: hw/ppc/spapr_hcall.c:188:1: warning: control reaches end of non-void function [-Wreturn-type] hw/ppc/spapr_pci.c:454:1: warning: control reaches end of non-void function [-Wreturn-type] I think you could report this to mingw. GCC should handle if (!0) foo() just fine if foo is noreturn, perhaps the assertion failure runtime function is not noreturn in mingw. It's a gcc problem. Removing the assertion manually in the source code and compiling with NDEBUG (which we do by default)results in the same compiler warning. Huh? That seems wrong, assertions are there for a reason... Paolo
Re: [Qemu-devel] [PATCH 0/3] spapr: Fix remaining compiler warnings
Il 24/06/2013 19:48, Stefan Weil ha scritto: Hi Alex, this mini series fixes the remaining compiler warnings in my w32/w64 cross build environment on Debian wheezy. [PATCH 1/3] spapr: Use named enum for function remove_hpte [PATCH 2/3] spapr: Fix compiler warning for some versions of gcc [PATCH 3/3] spapr: Fix compiler warning for some versions of gcc The modifications are nearly trivial, but this time it would be better if you could apply them to your patch queue. I think these patches are a bit backwards. The right fix is just to use abort() instead of assert(0). Paolo
Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
On Wed, Jun 26, 2013 at 2:34 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 26/06/2013 04:59, liu ping fan ha scritto: The latter part could be the hard one in a multi-threaded context, but I think it's up to the device to ensure it. It doesn't _have_ to be hard. For example, joining the data-plane thread would do that as well. It seems not easy, take consideration of the sequence: _bh_delete(), // ensure not scheduled, ie, no further access to DeviceState wait for rcu grace period to come, // ensure not running free DeviceState in rcu-reclaimer But in current code, _delete() can be placed in DeviceState finalization path(in reclaimer), which means while reclaiming, there still exist access path to the DeviceState. It can be placed there, but it doesn't mean it is a good idea. :) Unfortunately , these is what current code does, otherwise, they should _bh_new() each time when scheduling bh, and _bh_delete() in bh's callback, i.e. callback is the exact place to know we have done with bh. (The other place is the destroy of DeviceState) On the other hand, pushing _delete() out of finalization path is not easy, since we do not what time the DeviceState has done with its bh. See above: - if the BH will run in the iothread, the BH is definitely not running (because the BH runs under the BQL, and the reclaimer owns it) It works for the case in which the DeviceState's reclaimer calls _bh_delete(). But in another case(also exist in current code), where we call _bh_delete() in callback, the bh will be scheduled, and oops! - if the BH is running in another thread, waiting for that thread to terminate obviously makes the BH not running. This imply that except for qemu_aio_context, AioContext can be only shared by one device, right? Otherwise we can not just terminate the thread. If it is true, why can not we have more than one just like qemu_aio_context? What we need is separation of removal and reclamation. Without that, any effort to make things unplug-safe is going to be way way more complicated than necessary. Agree, but when I tried for bh, I found the separation of removal and reclamation are not easy. We can not _bh_delete() in acpi_piix_eject_slot(), because mmio-dispatch can schedule a bh at the same time. And as explained, only two places can hold the _bh_delete(). In a short word, with rcu, we need to constrain the calling idiom of bh, i.e., putting them in reclaimer. On the other hand, my patch try to leave the calling idiom as it is, and handle this issue inside bh. Regards, Pingfan Paolo
Re: [Qemu-devel] [PATCH 10/15] target-i386: convert 'hv_spinlocks' to static property
On Tue, 25 Jun 2013 17:34:44 -0300 Eduardo Habkost ehabk...@redhat.com wrote: On Tue, Jun 25, 2013 at 05:30:50PM -0300, Eduardo Habkost wrote: On Wed, Jun 05, 2013 at 03:18:41PM +0200, Igor Mammedov wrote: Signed-off-by: Igor Mammedov imamm...@redhat.com --- [...] @@ -1632,6 +1677,7 @@ static void cpu_x86_parse_featurestr(X86CPU *cpu, char *features, Error **errp) } else if (!strcmp(featurestr, hv-spinlocks)) { char *err; const int min = 0xFFF; +char num[32]; numvalue = strtoul(val, err, 0); if (!*val || *err) { error_setg(errp, bad numerical value %s, val); @@ -1643,7 +1689,8 @@ static void cpu_x86_parse_featurestr(X86CPU *cpu, char *features, Error **errp) min); numvalue = min; } -env-hyperv_spinlock_attempts = numvalue; +snprintf(num, sizeof(num), % PRId32, numvalue); +object_property_parse(OBJECT(cpu), num, featurestr, errp); Why not use object_property_set_int()? Oh, I believe I have asked that before and you have already explained it: you are using strings to allow the existing object_property_parse() calls to be easily converted to qdev_prop_register_global() calls later. Correct? Yep, this will be used later to consolidate code.
Re: [Qemu-devel] [PATCH 08/15] target-i386: cpu: convert 'tsc-frequency' to static property
On Mon, 24 Jun 2013 14:09:43 -0300 Eduardo Habkost ehabk...@redhat.com wrote: On Wed, Jun 05, 2013 at 03:18:39PM +0200, Igor Mammedov wrote: Signed-off-by: Igor Mammedov imamm...@redhat.com --- target-i386/cpu.c | 12 +--- 1 files changed, 9 insertions(+), 3 deletions(-) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 9f6fe06..ec6d33f 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -1427,6 +1427,14 @@ static void x86_cpuid_set_tsc_freq(Object *obj, Visitor *v, void *opaque, cpu-env.tsc_khz = value / 1000; } +static PropertyInfo qdev_prop_tsc_freq = { +.name = int64, What about making it uint64 instead of keeping it signed? We already reject negative values. As this patch just converts the current code/semantics to use static properties, and the change to uint64 could be done in a separate patch: Yes, purpose of patch is to make conversion and nothing else, so it would be easy to review. Reviewed-by: Eduardo Habkost ehabk...@redhat.com +.get = x86_cpuid_get_tsc_freq, +.set = x86_cpuid_set_tsc_freq, +}; +#define DEFINE_PROP_TSC_FREQ(_n) \ +DEFINE_PROP(_n, X86CPU, env.tsc_khz, qdev_prop_tsc_freq, int32_t) + static void x86_cpuid_get_apic_id(Object *obj, Visitor *v, void *opaque, const char *name, Error **errp) { @@ -1509,6 +1517,7 @@ static Property cpu_x86_properties[] = { DEFINE_PROP_UINT32(xlevel, X86CPU, env.cpuid_xlevel, 0), DEFINE_PROP_VENDOR(vendor), DEFINE_PROP_MODEL_ID(model-id), +DEFINE_PROP_TSC_FREQ(tsc-frequency), DEFINE_PROP_END_OF_LIST(), }; @@ -2488,9 +2497,6 @@ static void x86_cpu_initfn(Object *obj) cs-env_ptr = env; cpu_exec_init(env); -object_property_add(obj, tsc-frequency, int, -x86_cpuid_get_tsc_freq, -x86_cpuid_set_tsc_freq, NULL, NULL, NULL); object_property_add(obj, apic-id, int, x86_cpuid_get_apic_id, x86_cpuid_set_apic_id, NULL, NULL, NULL); -- 1.7.1
Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
Il 26/06/2013 10:20, liu ping fan ha scritto: On the other hand, pushing _delete() out of finalization path is not easy, since we do not what time the DeviceState has done with its bh. See above: - if the BH will run in the iothread, the BH is definitely not running (because the BH runs under the BQL, and the reclaimer owns it) It works for the case in which the DeviceState's reclaimer calls _bh_delete(). But in another case(also exist in current code), where we call _bh_delete() in callback, the bh will be scheduled, and oops! But you may know that the BH is not scheduled after removal, too. There are not that many devices that have bottom halves (apart from those that use bottom halves in ptimer). If they really need to, they can do the object_ref/unref themselves. But I expect this to be rare, and generic code should not need an owner field in bottom halves. For example, block devices should stop sending requests after removal. - if the BH is running in another thread, waiting for that thread to terminate obviously makes the BH not running. This imply that except for qemu_aio_context, AioContext can be only shared by one device, right? Otherwise we can not just terminate the thread. If it is true, why can not we have more than one just like qemu_aio_context? Yes, if you do it that way you can have only one AioContext per device. RCU is another way, and doesn't have the same limitation. We should avoid introducing infrastructure that we are not sure is needed. For what it's worth, your patches to make the bottom half list thread-safe are also slightly premature. However, they do not change the API and it makes some sense to add the infrastructure. In this case, I'm simply not sure that we're there yet. If you look at the memory work, for example, the owner patches happen to be useful for BQL-less dispatch too, but they are solving existing hot-unplug bugs. What we need is separation of removal and reclamation. Without that, any effort to make things unplug-safe is going to be way way more complicated than necessary. Agree, but when I tried for bh, I found the separation of removal and reclamation are not easy. We can not _bh_delete() in acpi_piix_eject_slot(), because mmio-dispatch can schedule a bh at the same time. acpi_piix_eject_slot() is removal, not reclamation. The plan there is that qdev_free calls the exit callback immediately (which can do qemu_bh_delete), and schedules an unref after the next RCU grace period. At the next RCU grace period the BH will not be running, thus it is safe to finalize the object. Paolo And as explained, only two places can hold the _bh_delete(). In a short word, with rcu, we need to constrain the calling idiom of bh, i.e., putting them in reclaimer. On the other hand, my patch try to leave the calling idiom as it is, and handle this issue inside bh. Regards, Pingfan Paolo
Re: [Qemu-devel] [PATCH] target-arm: Added CP15 VBAR support
On 26 June 2013 02:57, Peter Crosthwaite peter.crosthwa...@xilinx.com wrote: On Wed, Feb 27, 2013 at 8:10 PM, Peter Maydell peter.mayd...@linaro.org wrote: This one's tricky because the VBAR only exists in CPUs with TrustZone, and strictly speaking QEMU models a non-TrustZone core. However we already have some dummy cp15 registers which are really TZ-only just to get guests working, and this may well fall into that category. I need to have a think about how we can coherently define what QEMU does about TZ so we can have a better idea of what should and shouldn't be implemented. (If you have any suggestions I'm open to them.) ARM_FEATURE_TZ? Yes, but what should this imply about QEMU's behaviour? Naming a feature switch is easy, semantics are hard. -- PMM
Re: [Qemu-devel] [PATCH 1/2] device_tree: Add qemu_devtree_setprop_sized_cells() utility function
On 26 June 2013 00:38, David Gibson da...@gibson.dropbear.id.au wrote: On Mon, Jun 24, 2013 at 12:02:39PM +0100, Peter Maydell wrote: On 24 June 2013 11:56, Alexander Graf ag...@suse.de wrote: This looks pretty complicated for something actually quite simple: All you want is to pass in a number of 64bit values, rather than 32bit ones, right? Nope. If the device tree blob says #address-cells is 1 and #size-cells is 1, then I want to pass in values to go in 32 bit cells. If it says #address-cells is 2 but #size-cells is still 1, then I want to pass in a value for a 64 bit cell then one for a 32 bit cell. If they're both 2 then I want to pass in values for two 64 bit cells. Hmm.. the property certainly needs to be constructed that way. But I think Alex's point is that you could make the arguments all 64-bit, and then truncate them in the generated property. Er, the arguments *are* all 64 bits and truncated in the generated property: + * @...: 0-terminated list of uint32_t number-of-cells, uint64_t value pairs There's a bigger problem, though, that exists with both versions. In general people expect integer arguments like this not to care too much about the exact integer type, because it will be promoted to the right thing. Except with varargs it won't. So if you ever have a notionally 64-bit address that happens to fit in a 32-bit variable and you pass that it, this function will be broken. And the worst of it is, it'll work most of the time, until you happen to hit the wrong ABI and parameter combination :(. Do you have a suggested improvement to the API to avoid this? thanks -- PMM
[Qemu-devel] Libvirt : Bootstrap fails for local gnulib
Hi All, The Libvirt compliation fails to bootstarp for local gnulib. We do not have outbound access for the test machines. So, we ll pull the gnulib from and copy to the test machines. When we run ./bootstrap --no-git --gnulib-srcdir=/path/to/local, below is the error: ./bootstrap: Bootstrapping from checked-out libvirt sources... ./bootstrap: consider installing git-merge-changelog from gnulib ./bootstrap: getting gnulib files... fatal: Needed a single revision Unable to find current revision in submodule path '.gnulib' If we run autogen.sh it tries to pull from git though the $GNULIB_SRCDIR is exported with the local gnulib. ./autogen.sh '--no-git' I am going to run ./configure with no arguments - if you wish to pass any to it, please specify them on the ./autogen.sh command line. running bootstrap --no-git... ./bootstrap: Bootstrapping from checked-out libvirt sources... ./bootstrap: consider installing git-merge-changelog from gnulib ./bootstrap: getting gnulib files... fatal: Unable to look up git.sv.gnu.org (port 9418) (Name or service not known) Unable to fetch in submodule path '.gnulib' Failed to bootstrap, please investigate. Please provide us the workaround for the same. Thanks, Chandrashekar
[Qemu-devel] [PATCH v5 06/14] acpi_piix4 : Implement memory device hotplug registers
From: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com A 32-byte register is used to present up to 256 hotplug-able memory devices to BIOS and OSPM. Hot-add and hot-remove functions trigger an ACPI hotplug event through these. Only reads are allowed from these registers. An ACPI hot-remove event but needs to wait for OSPM to eject the device. We use a single-byte register to know when OSPM has called the _EJ function for a particular dimm. A write to this byte will depopulate the respective dimm. Only writes are allowed to this byte. v1-v2: mems_sts address moved from 0xaf20 to 0xaf80 (to accomodate more space for cpu-hotplugging in the future). _EJ array is reduced to a single byte. Add documentation in docs/specs/acpi_hotplug.txt v3-v4: Removed hot-remove functions, will be added separately. Updated for memory API. Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- docs/specs/acpi_mem_hotplug.txt | 14 hw/acpi/piix4.c | 72 - 2 files changed, 85 insertions(+), 1 deletion(-) create mode 100644 docs/specs/acpi_mem_hotplug.txt diff --git a/docs/specs/acpi_mem_hotplug.txt b/docs/specs/acpi_mem_hotplug.txt new file mode 100644 index 000..8391713 --- /dev/null +++ b/docs/specs/acpi_mem_hotplug.txt @@ -0,0 +1,14 @@ +QEMU-ACPI BIOS hotplug interface +-- +This document describes the interface between QEMU and the ACPI BIOS for non-PCI +space. For the PCI interface please look at docs/specs/acpi_pci_hotplug.txt + +QEMU-ACPI BIOS memory hotplug interface +-- + +Memory Dimm status array (IO port 0xaf80-0xaf9f, 1-byte access): +--- +Dimm hot-plug notification pending. One bit per slot. + +Read by ACPI BIOS GPE.3 handler to notify OS of memory hot-add or hot-remove +events. Read-only. diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c index 756df3b..2795ab0 100644 --- a/hw/acpi/piix4.c +++ b/hw/acpi/piix4.c @@ -29,6 +29,7 @@ #include exec/ioport.h #include hw/nvram/fw_cfg.h #include exec/address-spaces.h +#include hw/mem-hotplug/dimm.h //#define DEBUG @@ -50,9 +51,12 @@ #define PIIX4_PROC_BASE 0xaf00 #define PIIX4_PROC_LEN 32 +#define PIIX4_MEM_BASE 0xaf80 +#define PIIX4_MEM_LEN 32 #define PIIX4_PCI_HOTPLUG_STATUS 2 #define PIIX4_CPU_HOTPLUG_STATUS 4 +#define PIIX4_MEM_HOTPLUG_STATUS 8 struct pci_status { uint32_t up; /* deprecated, maintained for migration compatibility */ @@ -63,6 +67,10 @@ typedef struct CPUStatus { uint8_t sts[PIIX4_PROC_LEN]; } CPUStatus; +typedef struct MemStatus { +uint8_t mems_sts[PIIX4_MEM_LEN]; +} MemStatus; + typedef struct PIIX4PMState { PCIDevice dev; @@ -70,6 +78,7 @@ typedef struct PIIX4PMState { MemoryRegion io_gpe; MemoryRegion io_pci; MemoryRegion io_cpu; +MemoryRegion io_mem; ACPIREGS ar; APMState apm; @@ -94,6 +103,8 @@ typedef struct PIIX4PMState { CPUStatus gpe_cpu; Notifier cpu_added_notifier; + +MemStatus gpe_mem; } PIIX4PMState; static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, @@ -113,7 +124,8 @@ static void pm_update_sci(PIIX4PMState *s) ACPI_BITMASK_GLOBAL_LOCK_ENABLE | ACPI_BITMASK_TIMER_ENABLE)) != 0) || (((s-ar.gpe.sts[0] s-ar.gpe.en[0]) - (PIIX4_PCI_HOTPLUG_STATUS | PIIX4_CPU_HOTPLUG_STATUS)) != 0); + (PIIX4_PCI_HOTPLUG_STATUS | PIIX4_CPU_HOTPLUG_STATUS | + PIIX4_MEM_HOTPLUG_STATUS)) != 0); qemu_set_irq(s-irq, sci_level); /* schedule a timer interruption if needed */ @@ -664,8 +676,40 @@ static void piix4_init_cpu_status(CPUState *cpu, void *data) g-sts[id / 8] |= (1 (id % 8)); } +static uint32_t mem_status_readb(void *opaque, uint32_t addr) +{ +PIIX4PMState *s = opaque; +uint32_t val = 0; +MemStatus *g = s-gpe_mem; +if (addr PIIX4_MEM_LEN) { +val = (uint32_t) g-mems_sts[addr]; +} +PIIX4_DPRINTF(memhp read % PRIu32 == % PRIu32 \n, addr, val); +return val; +} + +static const MemoryRegionOps mem_hotplug_ops = { +.old_portio = (MemoryRegionPortio[]) { +{ +.offset = 0, .len = PIIX4_MEM_LEN, .size = 1, +.read = mem_status_readb, +}, +PORTIO_END_OF_LIST() +}, +.endianness = DEVICE_LITTLE_ENDIAN, +}; + +static void piix4_init_mem_status(PIIX4PMState *s) +{ +int i; +for (i = 0; i PIIX4_MEM_LEN; i++) { +s-gpe_mem.mems_sts[i] = 0; +} +} + static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, PCIHotplugState state); +static int piix4_mem_hotplug(DeviceState *qdev, DimmDevice *dev, int add); static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, PCIBus *bus, PIIX4PMState *s)
[Qemu-devel] [PATCH v5 00/14] ACPI memory hotplug
It's been quite a while since v4 and lots of changes happend in qemu and v4 just can't apply anymore. So this series is basically a rebase. Another purpose is to bring up discussions to make consensus on some questions since v4, see http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg01219.html and http://lists.gnu.org/archive/html/qemu-devel/2013-02/msg05332.html This series works with seabios counterpart. changes from v4: - rebased on a recent qemu-git - based on another series which seperates i440fx refactor from v4. http://lists.nongnu.org/archive/html/qemu-devel/2013-06/msg03500.html - hot-unplug patches not included, as suggested by Vasilis, since hot-unplug has some more complications with refcounting memory regions. - fix some copy-paste errors in qapi-schema.json. v4: http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg02693.html Issues: - hot-remove needs to only unmap the dimm device from guest's view. Freeing the memory should happen when the last user of the device (e.g. virtio-blk) unrefs the device. A testcase is needed for this. - Live Migration: Ramblocks are migrated before qdev VMStates are migrated. So the DimmDevice is handled diferrently than other devices. Should this be reworked ?( DimmDevice structure currently does not define a VMStateDescription) Live migration works as long as the dimm layout (command line args) are identical at the source and destination qemu command line, and destination takes into account hot-operations that have occured on source. (v3 patch 10/19 created the DimmDevice that corresponds to an unknown incoming ramblock, e.g. for a dimm that was hot-added on source. but has been dropped for the moment). - A main blocker issue is windows guest functionality. The patchset does not work for windows currently. Testing on win2012 server RC or windows2008 consumer prerelease, when adding a DIMM, there is a BSOD with ACPI_BIOS_ERROR message. After this, the VM keeps rebooting with ACPI_BIOS_ERROR. The windows pnpmem driver obviosuly has a problem with the seabios dimm implementation (or the seabios dimm implementation is not fully ACPI-compliant). If someone can review the seabios patches or has any ideas to debug this, let me know. - hot-operation notification lists need to be added to migration state. - If a virtio sg element straddles a ramblock boundary, virtio_map_sg can't handle this and qemu exits with virtio: Trying to map MMIO memory assertion. This was discovered with stress testing in a VM with hotplugged DIMMs. The top-most commit in the qemu repo above tries to fix this (803aedf0) but maybe you have a better idea. This problem is critical for hot-add upstreaming and needs to be solved. Review/discussion on the list/irc is necessary. - q35 was supposed/rumoured to have native acpi hotplug support, but I haven't found it in the spec (and I think other people in the list didn't either). Iiuc the plan is to support paravirtual memory hotplug for both piix4 and q35. Hu Tao (5): qapi: make visit_type_size fallback to v-type_int() Implement dimm device abstraction memory: record below_4g_mem_size, above_4g_mem_size memory controller: initialize dram controller. pc: Add dimm paravirt SRAT info Vasilis Liaskovitis (9): Add SIZE type to qdev properties qemu-option: export parse_option_number vl: handle -device dimm acpi_piix4 : Implement memory device hotplug registers acpi_ich9 : Implement memory device hotplug registers Introduce paravirt interface QEMU_CFG_PCI_WINDOW Implement info memory and query-memory balloon: update with hotplugged memory Implement dimm-info default-configs/x86_64-softmmu.mak | 1 + docs/specs/acpi_mem_hotplug.txt| 14 ++ docs/specs/fwcfg.txt | 28 hmp-commands.hx| 4 + hmp.c | 25 hmp.h | 2 + hw/Makefile.objs | 1 + hw/acpi/ich9.c | 56 ++- hw/acpi/piix4.c| 72 - hw/core/qdev-properties.c | 61 hw/i386/pc.c | 74 +++-- hw/i386/pc_piix.c | 1 + hw/i386/pc_q35.c | 17 ++- hw/mem-hotplug/Makefile.objs | 1 + hw/mem-hotplug/dimm.c | 298 + hw/pci-host/piix.c | 13 ++ hw/virtio/virtio-balloon.c | 13 +- include/hw/acpi/ich9.h | 10 ++ include/hw/i386/pc.h | 8 + include/hw/mem-hotplug/dimm.h | 78 ++ include/hw/nvram/fw_cfg.h | 1 + include/hw/qdev-properties.h | 3 + include/qemu/option.h | 4 + include/sysemu/sysemu.h| 1 + monitor.c | 14 ++ qapi-schema.json | 40 + qapi/qapi-visit-core.c | 6 +- qmp-commands.hx
[Qemu-devel] [PATCH v5 01/14] qapi: make visit_type_size fallback to v-type_int()
In case of v-type_uint64 is NULL, fallback to v-type_int, by calling visit_type_uint64(). Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- qapi/qapi-visit-core.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/qapi/qapi-visit-core.c b/qapi/qapi-visit-core.c index 401ee6e..7ae2061 100644 --- a/qapi/qapi-visit-core.c +++ b/qapi/qapi-visit-core.c @@ -239,7 +239,11 @@ void visit_type_int64(Visitor *v, int64_t *obj, const char *name, Error **errp) void visit_type_size(Visitor *v, uint64_t *obj, const char *name, Error **errp) { if (!error_is_set(errp)) { -(v-type_size ? v-type_size : v-type_uint64)(v, obj, name, errp); +if (v-type_size) { +v-type_size(v, obj, name, errp); +} else { +visit_type_uint64(v, obj, name, errp); +} } } -- 1.8.3.1
[Qemu-devel] [PATCH v5 10/14] pc: Add dimm paravirt SRAT info
The numa_fw_cfg paravirt interface is extended to include SRAT information for all hotplug-able dimms. There are 3 words for each hotplug-able memory slot, denoting start address, size and node proximity. The new info is appended after existing numa info, so that the fw_cfg layout does not break. This information is used by Seabios to build hotplug memory device objects at runtime. nb_numa_nodes is set to 1 by default (not 0), so that we always pass srat info to SeaBIOS. v3-v4: numa_fw_cfg needs to be initalized after memory controller sets up dimm ranges. Make changes for pc_piix and pc_q35 to set numa_fw_cfg after i440fx initialization. v2-v3: setting nb_numa_nodes to 1 is not needed v1-v2: Dimm SRAT info (#dimms) is appended at end of existing numa fw_cfg in order not to break existing layout Documentation of the new fwcfg layout is included in docs/specs/fwcfg.txt Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- docs/specs/fwcfg.txt| 28 hw/i386/pc.c| 30 -- hw/i386/pc_piix.c | 1 + hw/i386/pc_q35.c| 7 +-- include/hw/i386/pc.h| 1 + include/sysemu/sysemu.h | 1 + 6 files changed, 60 insertions(+), 8 deletions(-) create mode 100644 docs/specs/fwcfg.txt diff --git a/docs/specs/fwcfg.txt b/docs/specs/fwcfg.txt new file mode 100644 index 000..e6fcd8f --- /dev/null +++ b/docs/specs/fwcfg.txt @@ -0,0 +1,28 @@ +QEMU-BIOS Paravirt Documentation +-- + +This document describes paravirt data structures passed from QEMU to BIOS. + +fw_cfg SRAT paravirt info + +The SRAT info passed from QEMU to BIOS has the following layout: + +--- +#nodes | cpu0_pxm | cpu1_pxm | ... | cpulast_pxm | node0_mem | node1_mem | ... | nodelast_mem + +--- +#dimms | dimm0_start | dimm0_sz | dimm0_pxm | ... | dimmlast_start | dimmlast_sz | dimmlast_pxm + +Entry 0 contains the number of numa nodes (nb_numa_nodes). + +Entries 1..max_cpus: The next max_cpus entries describe node proximity for each +one of the vCPUs in the system. + +Entries max_cpus+1..max_cpus+nb_numa_nodes+1: The next nb_numa_nodes entries +describe the memory size for each one of the NUMA nodes in the system. + +Entry max_cpus+nb_numa_nodes+1 contains the number of memory dimms (nb_hp_dimms) + +The last 3 * nb_hp_dimms entries are organized in triplets: Each triplet contains +the physical address offset, size (in bytes), and node proximity for the +respective dimm. diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 65838a6..b51d3b5 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -55,6 +55,7 @@ #include hw/acpi/acpi.h #include hw/cpu/icc_bus.h #include hw/boards.h +#include hw/mem-hotplug/dimm.h /* debug PC/ISA interrupts */ //#define DEBUG_IRQ @@ -606,8 +607,6 @@ static FWCfgState *bochs_bios_init(void) FWCfgState *fw_cfg; uint8_t *smbios_table; size_t smbios_len; -uint64_t *numa_fw_cfg; -int i, j; unsigned int apic_id_limit = pc_apic_id_limit(max_cpus); fw_cfg = fw_cfg_init(BIOS_CFG_IOPORT, BIOS_CFG_IOPORT + 1, 0, 0); @@ -640,11 +639,25 @@ static FWCfgState *bochs_bios_init(void) e820_table, sizeof(e820_table)); fw_cfg_add_bytes(fw_cfg, FW_CFG_HPET, hpet_cfg, sizeof(hpet_cfg)); + +return fw_cfg; +} + +void bochs_meminfo_bios_init(void *fw_cfg) +{ +uint64_t *numa_fw_cfg; +uint64_t *hp_dimms_fw_cfg; +int i, j; +unsigned int apic_id_limit = pc_apic_id_limit(max_cpus); + /* allocate memory for the NUMA channel: one (64bit) word for the number * of nodes, one word for each VCPU-node and one word for each node to * hold the amount of memory. + * Finally one word for the number of hotplug memory slots and three words + * for each hotplug memory slot (start address, size and node proximity). */ -numa_fw_cfg = g_new0(uint64_t, 1 + apic_id_limit + nb_numa_nodes); +numa_fw_cfg = g_new0(uint64_t, + 2 + apic_id_limit + nb_numa_nodes + 3 * nb_hp_dimms); numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes); for (i = 0; i max_cpus; i++) { unsigned int apic_id = x86_cpu_apic_id_from_index(i); @@ -659,11 +672,16 @@ static FWCfgState *bochs_bios_init(void) for (i = 0; i nb_numa_nodes; i++) { numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(node_mem[i]); } + +numa_fw_cfg[1 + apic_id_limit + nb_numa_nodes] = cpu_to_le64(nb_hp_dimms); + +hp_dimms_fw_cfg = numa_fw_cfg + 2 + apic_id_limit + nb_numa_nodes; +if (nb_hp_dimms) { +dimm_setup_fwcfg_layout(hp_dimms_fw_cfg); +} fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, numa_fw_cfg, - (1 + apic_id_limit +
[Qemu-devel] [PATCH v5 05/14] vl: handle -device dimm
From: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- vl.c | 51 +++ 1 file changed, 51 insertions(+) diff --git a/vl.c b/vl.c index 767e020..9d88a79 100644 --- a/vl.c +++ b/vl.c @@ -170,6 +170,7 @@ int main(int argc, char **argv) #include ui/qemu-spice.h #include qapi/string-input-visitor.h +#include hw/mem-hotplug/dimm.h //#define DEBUG_NET //#define DEBUG_SLIRP @@ -252,6 +253,7 @@ static QTAILQ_HEAD(, FWBootEntry) fw_boot_order = int nb_numa_nodes; uint64_t node_mem[MAX_NODES]; unsigned long *node_cpumask[MAX_NODES]; +int nb_hp_dimms; uint8_t qemu_uuid[16]; @@ -2338,6 +2340,50 @@ static int chardev_init_func(QemuOpts *opts, void *opaque) return 0; } +static int dimmcfg_init_func(QemuOpts *opts, void *opaque) +{ +const char *driver; +const char *id; +uint64_t node, size; +uint32_t populated; +const char *buf, *busbuf; + +/* DimmDevice configuration needs to be known in order to initialize chipset + * with correct memory and pci ranges. But all devices are created after + * chipset / machine initialization. In * order to avoid this problem, we + * parse dimm information earlier into dimmcfg structs. */ + +driver = qemu_opt_get(opts, driver); +if (!strcmp(driver, dimm)) { + +id = qemu_opts_id(opts); +buf = qemu_opt_get(opts, size); +parse_option_size(size, buf, size, NULL); +buf = qemu_opt_get(opts, node); +parse_option_number(node, buf, node, NULL); +busbuf = qemu_opt_get(opts, bus); +buf = qemu_opt_get(opts, populated); +if (!buf) { +populated = 0; +} else { +populated = strcmp(buf, on) ? 0 : 1; +} + +dimm_config_create((char *)id, size, busbuf ? busbuf : membus.0, + node, nb_hp_dimms); + +/* if !populated, we just keep the config. The real device + * will be created in the future with a normal device_add + * command. */ +if (!populated) { +qemu_opts_del(opts); +} +nb_hp_dimms++; +} + +return 0; +} + #ifdef CONFIG_VIRTFS static int fsdev_init_func(QemuOpts *opts, void *opaque) { @@ -4260,6 +4306,11 @@ int main(int argc, char **argv, char **envp) } qemu_add_globals(); +/* init generic devices */ +if (qemu_opts_foreach(qemu_find_opts(device), + dimmcfg_init_func, NULL, 1) != 0) { +exit(1); +} qdev_machine_init(); QEMUMachineInitArgs args = { .ram_size = ram_size, -- 1.8.3.1
[Qemu-devel] [PATCH v5 3/7] acpi-dsdt: Implement functions for memory hotplug
From: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com Extend the DSDT to include methods for handling memory hot-add and hot-remove notifications and memory device status requests. These functions are called from the memory device SSDT methods. Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com --- src/acpi-dsdt-mem-hotplug.dsl | 57 +++ src/acpi-dsdt.dsl | 5 +++- 2 files changed, 61 insertions(+), 1 deletion(-) create mode 100644 src/acpi-dsdt-mem-hotplug.dsl diff --git a/src/acpi-dsdt-mem-hotplug.dsl b/src/acpi-dsdt-mem-hotplug.dsl new file mode 100644 index 000..1d82532 --- /dev/null +++ b/src/acpi-dsdt-mem-hotplug.dsl @@ -0,0 +1,57 @@ +/ + * Memory hotplug + / + +Scope(\_SB) { +/* Objects filled in by run-time generated SSDT */ +External(MTFY, MethodObj) +External(MEON, PkgObj) + +Method (CMST, 1, NotSerialized) { +// _STA method - return ON status of memdevice +// Local0 = MEON flag for this cpu +Store(DerefOf(Index(MEON, Arg0)), Local0) +If (Local0) { Return(0xF) } Else { Return(0x0) } +} + +/* Memory hotplug notify array */ +OperationRegion(MEST, SystemIO, 0xaf80, 32) +Field (MEST, ByteAcc, NoLock, Preserve) +{ +MES, 256 +} + +Method(MESC, 0) { +// Local5 = active memdevice bitmap +Store (MES, Local5) +// Local2 = last read byte from bitmap +Store (Zero, Local2) +// Local0 = memory device iterator +Store (Zero, Local0) +While (LLess(Local0, SizeOf(MEON))) { +// Local1 = MEON flag for this memory device +Store(DerefOf(Index(MEON, Local0)), Local1) +If (And(Local0, 0x07)) { +// Shift down previously read bitmap byte +ShiftRight(Local2, 1, Local2) +} Else { +// Read next byte from memdevice bitmap +Store(DerefOf(Index(Local5, ShiftRight(Local0, 3))), Local2) +} +// Local3 = active state for this memory device +Store(And(Local2, 1), Local3) + +If (LNotEqual(Local1, Local3)) { +// State change - update MEON with new state +Store(Local3, Index(MEON, Local0)) +// Do MEM notify +If (LEqual(Local3, 1)) { +MTFY(Local0, 1) +} +} +Increment(Local0) +} +Return(One) +} + +} diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl index 158f6b4..98c9413 100644 --- a/src/acpi-dsdt.dsl +++ b/src/acpi-dsdt.dsl @@ -294,6 +294,7 @@ DefinitionBlock ( } #include acpi-dsdt-cpu-hotplug.dsl +#include acpi-dsdt-mem-hotplug.dsl / @@ -313,7 +314,9 @@ DefinitionBlock ( // CPU hotplug event \_SB.PRSC() } -Method(_L03) { +Method(_E03) { +// Memory hotplug event +\_SB.MESC() } Method(_L04) { } -- 1.8.3.1
[Qemu-devel] [PATCH v5 02/14] Add SIZE type to qdev properties
From: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com This patch adds a 'SIZE' type property to qdev. It will make dimm description more convenient by allowing sizes to be specified with K,M,G,T prefixes instead of number of bytes e.g.: -device dimm,id=mem0,size=2G,bus=membus.0 Credits go to Ian Molton for original patch. See: http://patchwork.ozlabs.org/patch/38835/ Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- hw/core/qdev-properties.c| 61 include/hw/qdev-properties.h | 3 +++ include/qemu/option.h| 2 ++ util/qemu-option.c | 4 +-- 4 files changed, 68 insertions(+), 2 deletions(-) diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c index 3a324fb..9c34793 100644 --- a/hw/core/qdev-properties.c +++ b/hw/core/qdev-properties.c @@ -1134,3 +1134,64 @@ void qdev_prop_set_globals(DeviceState *dev, Error **errp) class = object_class_get_parent(class); } while (class); } + +/* --- 64bit unsigned int 'size' type --- */ + +static void get_size(Object *obj, Visitor *v, void *opaque, + const char *name, Error **errp) +{ +DeviceState *dev = DEVICE(obj); +Property *prop = opaque; +uint64_t *ptr = qdev_get_prop_ptr(dev, prop); + +visit_type_size(v, ptr, name, errp); +} + +static void set_size(Object *obj, Visitor *v, void *opaque, + const char *name, Error **errp) +{ +DeviceState *dev = DEVICE(obj); +Property *prop = opaque; +uint64_t *ptr = qdev_get_prop_ptr(dev, prop); + +if (dev-realized) { +qdev_prop_set_after_realize(dev, name, errp); +return; +} + +visit_type_size(v, ptr, name, errp); +} + +static int parse_size(DeviceState *dev, Property *prop, const char *str) +{ +uint64_t *ptr = qdev_get_prop_ptr(dev, prop); +Error *errp = NULL; + +if (str != NULL) { +parse_option_size(prop-name, str, ptr, errp); +} +assert_no_error(errp); +return 0; +} + +static int print_size(DeviceState *dev, Property *prop, char *dest, size_t len) +{ +uint64_t *ptr = qdev_get_prop_ptr(dev, prop); +char suffixes[] = {'T', 'G', 'M', 'K', 'B'}; +int i = 0; +uint64_t div; + +for (div = (long int)1 40; !(*ptr / div) ; div = 10) { +i++; +} + +return snprintf(dest, len, %0.03f%c, (double)*ptr / div, suffixes[i]); +} + +PropertyInfo qdev_prop_size = { +.name = size, +.parse = parse_size, +.print = print_size, +.get = get_size, +.set = set_size, +}; diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h index 39448b7..692f82e 100644 --- a/include/hw/qdev-properties.h +++ b/include/hw/qdev-properties.h @@ -15,6 +15,7 @@ extern PropertyInfo qdev_prop_uint64; extern PropertyInfo qdev_prop_hex8; extern PropertyInfo qdev_prop_hex32; extern PropertyInfo qdev_prop_hex64; +extern PropertyInfo qdev_prop_size; extern PropertyInfo qdev_prop_string; extern PropertyInfo qdev_prop_chr; extern PropertyInfo qdev_prop_ptr; @@ -116,6 +117,8 @@ extern PropertyInfo qdev_prop_arraylen; DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_hex32, uint32_t) #define DEFINE_PROP_HEX64(_n, _s, _f, _d) \ DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_hex64, uint64_t) +#define DEFINE_PROP_SIZE(_n, _s, _f, _d) \ +DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_size, uint64_t) #define DEFINE_PROP_PCI_DEVFN(_n, _s, _f, _d) \ DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_pci_devfn, int32_t) diff --git a/include/qemu/option.h b/include/qemu/option.h index a83c700..83c1e84 100644 --- a/include/qemu/option.h +++ b/include/qemu/option.h @@ -153,5 +153,7 @@ typedef int (*qemu_opts_loopfunc)(QemuOpts *opts, void *opaque); int qemu_opts_print(QemuOpts *opts, void *dummy); int qemu_opts_foreach(QemuOptsList *list, qemu_opts_loopfunc func, void *opaque, int abort_on_failure); +void parse_option_size(const char *name, const char *value, + uint64_t *ret, Error **errp); #endif diff --git a/util/qemu-option.c b/util/qemu-option.c index 412c425..8c67857 100644 --- a/util/qemu-option.c +++ b/util/qemu-option.c @@ -173,8 +173,8 @@ static void parse_option_number(const char *name, const char *value, } } -static void parse_option_size(const char *name, const char *value, - uint64_t *ret, Error **errp) +void parse_option_size(const char *name, const char *value, + uint64_t *ret, Error **errp) { char *postfix; double sizef; -- 1.8.3.1
[Qemu-devel] [PATCH v5 09/14] memory controller: initialize dram controller.
Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- hw/i386/pc.c | 27 +++ include/hw/i386/pc.h | 5 + 2 files changed, 32 insertions(+) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 55056b1..65838a6 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -1283,6 +1283,27 @@ void mc_set_smm(int val, void *arg) memory_region_transaction_commit(); } +static hwaddr mc_dimm_offset(DeviceState *dev, uint64_t size) +{ +MemoryController *d = MEMORY_CONTROLLER(dev); +MemoryControllerClass *c = MEMORY_CONTROLLER_GET_CLASS(d); +hwaddr ret; + +if (d-below_4g_mem_size + size = c-pci_hole_start) { +/* if dimm fits before pci hole, append it normally */ +ret = d-below_4g_mem_size; +d-below_4g_mem_size += size; +} else { +/* otherwise place it above 4GB */ +ret = d-above_4g_mem_size + c-pci_hole_end; +d-above_4g_mem_size += size; +} + +d-ram_size += size; + +return ret; +} + static int memory_controller_init(PCIDevice *dev) { MemoryController *m = MEMORY_CONTROLLER(dev); @@ -1353,6 +1374,11 @@ static int memory_controller_init(PCIDevice *dev) PAM_EXPAN_SIZE); } +m-dram_channel0 = dimm_bus_create(OBJECT(m), membus.0, 8, + c-dimm_offset); +m-pv_dram_channel = dimm_bus_create(OBJECT(m), membus.pv, 0, + c-dimm_offset); + ram_size = m-ram_size / 8 / 1024 / 1024; if (ram_size 255) { ram_size = 255; @@ -1388,6 +1414,7 @@ static void memory_controller_class_init(ObjectClass *klass, void *data) dc-no_user = 1; mc-set_smm = mc_set_smm; mc-update = mc_update; +mc-dimm_offset = mc_dimm_offset; } static const TypeInfo memory_controller_type_info = { diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index e2cbc1b..959b92b 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -10,6 +10,7 @@ #include hw/i386/ioapic.h #include hw/pci/pci.h #include hw/pci-host/pam.h +#include hw/mem-hotplug/dimm.h #define TYPE_MEMORY_CONTROLLER memory controller #define MEMORY_CONTROLLER(obj) OBJECT_CHECK(MemoryController, (obj), TYPE_DEVICE) @@ -29,6 +30,7 @@ typedef struct MemoryControllerClass { void (*set_smm)(int val, void *arg); void (*update)(MemoryController *m); +hwaddr (*dimm_offset)(DeviceState *d, uint64_t size); } MemoryControllerClass; typedef struct MemoryController { @@ -48,6 +50,9 @@ typedef struct MemoryController { MemoryRegion ram_above_4g; hwaddr below_4g_mem_size; hwaddr above_4g_mem_size; + +DimmBus *dram_channel0; +DimmBus *pv_dram_channel; } MemoryController; void mc_update_pam(MemoryController *d); -- 1.8.3.1
[Qemu-devel] [PATCH v5 03/14] qemu-option: export parse_option_number
From: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- include/qemu/option.h | 2 ++ util/qemu-option.c| 4 ++-- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/include/qemu/option.h b/include/qemu/option.h index 83c1e84..d7a0bb9 100644 --- a/include/qemu/option.h +++ b/include/qemu/option.h @@ -155,5 +155,7 @@ int qemu_opts_foreach(QemuOptsList *list, qemu_opts_loopfunc func, void *opaque, int abort_on_failure); void parse_option_size(const char *name, const char *value, uint64_t *ret, Error **errp); +void parse_option_number(const char *name, const char *value, + uint64_t *ret, Error **errp); #endif diff --git a/util/qemu-option.c b/util/qemu-option.c index 8c67857..dddabcc 100644 --- a/util/qemu-option.c +++ b/util/qemu-option.c @@ -155,8 +155,8 @@ static void parse_option_bool(const char *name, const char *value, bool *ret, } } -static void parse_option_number(const char *name, const char *value, -uint64_t *ret, Error **errp) +void parse_option_number(const char *name, const char *value, + uint64_t *ret, Error **errp) { char *postfix; uint64_t number; -- 1.8.3.1
[Qemu-devel] [PATCH v5 13/14] balloon: update with hotplugged memory
From: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com query-balloon and info balloon should report total memory available to the guest. balloon inflate/ deflate can also use all memory available to the guest (initial + hotplugged memory) Ballon driver has been minimaly tested with the patch, please review and test. Caveat: if the guest does not online hotplugged-memory, it's easy for a balloon inflate command to OOM a guest. Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com --- hw/virtio/virtio-balloon.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c index d669756..c866000 100644 --- a/hw/virtio/virtio-balloon.c +++ b/hw/virtio/virtio-balloon.c @@ -30,6 +30,7 @@ #endif #include hw/virtio/virtio-bus.h +#include hw/mem-hotplug/dimm.h static void balloon_page(void *addr, int deflate) { @@ -271,10 +272,11 @@ static void virtio_balloon_set_config(VirtIODevice *vdev, VirtIOBalloon *dev = VIRTIO_BALLOON(vdev); struct virtio_balloon_config config; uint32_t oldactual = dev-actual; +uint64_t hotplugged_ram_size = get_hp_memory_total(); memcpy(config, config_data, 8); dev-actual = le32_to_cpu(config.actual); if (dev-actual != oldactual) { -qemu_balloon_changed(ram_size - +qemu_balloon_changed(ram_size + hotplugged_ram_size - ((ram_addr_t) dev-actual VIRTIO_BALLOON_PFN_SHIFT)); } } @@ -290,18 +292,21 @@ static void virtio_balloon_stat(void *opaque, BalloonInfo *info) VirtIOBalloon *dev = opaque; info-actual = ram_size - ((uint64_t) dev-actual VIRTIO_BALLOON_PFN_SHIFT); +info-actual += get_hp_memory_total(); } static void virtio_balloon_to_target(void *opaque, ram_addr_t target) { VirtIOBalloon *dev = VIRTIO_BALLOON(opaque); VirtIODevice *vdev = VIRTIO_DEVICE(dev); +uint64_t hotplugged_ram_size = get_hp_memory_total(); -if (target ram_size) { -target = ram_size; +if (target ram_size + hotplugged_ram_size) { +target = ram_size + hotplugged_ram_size; } if (target) { -dev-num_pages = (ram_size - target) VIRTIO_BALLOON_PFN_SHIFT; +dev-num_pages = (ram_size + hotplugged_ram_size - target) + VIRTIO_BALLOON_PFN_SHIFT; virtio_notify_config(vdev); } } -- 1.8.3.1
[Qemu-devel] [PATCH v5 7/7] pci: Use paravirt interface for pcimem_start and pcimem64_start
From: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com Initialize the 32-bit and 64-bit pci starting offsets from values passed in by the qemu paravirt interface QEMU_CFG_PCI_WINDOW. Qemu calculates the starting offsets based on initial memory and hotplug-able dimms. It's possible to avoid the new paravirt interface, and calculate pci ranges from srat entries. But the code changes are ugly, see: http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg03548.html Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com --- src/paravirt.c | 7 +++ src/paravirt.h | 1 + src/pciinit.c | 9 + 3 files changed, 17 insertions(+) diff --git a/src/paravirt.c b/src/paravirt.c index 5925c63..9c1e511 100644 --- a/src/paravirt.c +++ b/src/paravirt.c @@ -134,6 +134,7 @@ qemu_platform_setup(void) #define QEMU_CFG_BOOT_MENU 0x0e #define QEMU_CFG_MAX_CPUS 0x0f #define QEMU_CFG_FILE_DIR 0x19 +#define QEMU_CFG_PCI_WINDOW 0x1a #define QEMU_CFG_ARCH_LOCAL 0x8000 #define QEMU_CFG_ACPI_TABLES(QEMU_CFG_ARCH_LOCAL + 0) #define QEMU_CFG_SMBIOS_ENTRIES (QEMU_CFG_ARCH_LOCAL + 1) @@ -339,3 +340,9 @@ void qemu_cfg_init(void) , 0, be32_to_cpu(qfile.size)); } } + +void qemu_cfg_get_pci_offsets(u64 *pcimem_start, u64 *pcimem64_start) +{ +qemu_cfg_read_entry(pcimem_start, QEMU_CFG_PCI_WINDOW, sizeof(u64)); +qemu_cfg_read((u8*)(pcimem64_start), sizeof(u64)); +} diff --git a/src/paravirt.h b/src/paravirt.h index fce5af9..2c37d0d 100644 --- a/src/paravirt.h +++ b/src/paravirt.h @@ -27,5 +27,6 @@ static inline int runningOnKVM(void) { void qemu_preinit(void); void qemu_platform_setup(void); void qemu_cfg_init(void); +void qemu_cfg_get_pci_offsets(u64 *pcimem_start, u64 *pcimem64_start); #endif diff --git a/src/pciinit.c b/src/pciinit.c index 8370b96..7e63c5e 100644 --- a/src/pciinit.c +++ b/src/pciinit.c @@ -805,6 +805,7 @@ static void pci_bios_map_devices(struct pci_bus *busses) void pci_setup(void) { +u64 pv_pcimem_start, pv_pcimem64_start; if (!CONFIG_QEMU) return; @@ -837,6 +838,14 @@ pci_setup(void) pci_bios_init_devices(); +/* if qemu gives us other pci window values, it means there are hotplug-able + * dimms. Adjust accordingly */ +qemu_cfg_get_pci_offsets(pv_pcimem_start, pv_pcimem64_start); +if (pv_pcimem_start pcimem_start) +pcimem_start = pv_pcimem_start; +if (pv_pcimem64_start pcimem64_start) +pcimem64_start = pv_pcimem64_start; + free(busses); pci_enable_default_vga(); -- 1.8.3.1
[Qemu-devel] [PATCH v5 04/14] Implement dimm device abstraction
Each hotplug-able memory slot is a DimmDevice. All DimmDevices are attached to a new bus called DimmBus. This bus is introduced so that we no longer depend on hotplug-capability of main system bus (the main bus does not allow hotplugging). The DimmBus should be attached to a chipset Device (i440fx in case of the pc) A hot-add operation for a particular dimm: - creates a new DimmDevice and attaches it to the DimmBus - creates a new MemoryRegion of the given physical address offset, size and node proximity, and attaches it to main system memory as a sub_region. Hotplug operations are done through normal device_add commands. Also add properties to DimmDevice. TODO/FIXME: Dimm configurations are specified at qemu startup at the command line. Dimms with populated=off are actually not created, but their config is saved in the corresponding bus. If user does not specify the correct bus for a dimm device during device_add and there are multiple memory buses in the machine (e.g. right now membus.0 and membus.pv are created in this pachset) the first bus found in the device tree that matches the device type is picked. This bus may be the wrong one, and the dimm config is not retrieved, thus creating a useless dimm device. A solution would be to delete the new dimm if the dimm config was not found (see commented qmp_device_del in dimm_init) but this does not look clean. This design needs to be fixed somehow to avoid similar problems if multiple memory buses are present in the same machine. Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- default-configs/x86_64-softmmu.mak | 1 + hw/Makefile.objs | 1 + hw/mem-hotplug/Makefile.objs | 1 + hw/mem-hotplug/dimm.c | 241 + include/hw/mem-hotplug/dimm.h | 77 5 files changed, 321 insertions(+) create mode 100644 hw/mem-hotplug/Makefile.objs create mode 100644 hw/mem-hotplug/dimm.c create mode 100644 include/hw/mem-hotplug/dimm.h diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak index 599b630..5348800 100644 --- a/default-configs/x86_64-softmmu.mak +++ b/default-configs/x86_64-softmmu.mak @@ -46,3 +46,4 @@ CONFIG_APIC=y CONFIG_IOAPIC=y CONFIG_ICC_BUS=y CONFIG_PVPANIC=y +CONFIG_MEM_HOTPLUG=y diff --git a/hw/Makefile.objs b/hw/Makefile.objs index 0243d6a..6d3dc73 100644 --- a/hw/Makefile.objs +++ b/hw/Makefile.objs @@ -27,6 +27,7 @@ devices-dirs-$(CONFIG_SOFTMMU) += usb/ devices-dirs-$(CONFIG_VIRTIO) += virtio/ devices-dirs-$(CONFIG_SOFTMMU) += watchdog/ devices-dirs-$(CONFIG_SOFTMMU) += xen/ +devices-dirs-$(CONFIG_MEM_HOTPLUG) += mem-hotplug/ devices-dirs-y += core/ common-obj-y += $(devices-dirs-y) obj-y += $(devices-dirs-y) diff --git a/hw/mem-hotplug/Makefile.objs b/hw/mem-hotplug/Makefile.objs new file mode 100644 index 000..7563ef5 --- /dev/null +++ b/hw/mem-hotplug/Makefile.objs @@ -0,0 +1 @@ +common-obj-$(CONFIG_MEM_HOTPLUG) += dimm.o diff --git a/hw/mem-hotplug/dimm.c b/hw/mem-hotplug/dimm.c new file mode 100644 index 000..09cfc4b --- /dev/null +++ b/hw/mem-hotplug/dimm.c @@ -0,0 +1,241 @@ +/* + * Dimm device for Memory Hotplug + * + * Copyright ProfitBricks GmbH 2013 + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see http://www.gnu.org/licenses/ + */ + +#include trace.h +#include hw/qdev.h +#include hw/mem-hotplug/dimm.h +#include time.h +#include qmp-commands.h + +/* the following list is used to hold dimm config info before machine + * is initialized. After machine init, the list is not used anymore.*/ +static DimmConfiglist dimmconfig_list = + QTAILQ_HEAD_INITIALIZER(dimmconfig_list); + +/* the list of memory buses */ +static QLIST_HEAD(, DimmBus) memory_buses; + +static void dimmbus_dev_print(Monitor *mon, DeviceState *dev, int indent); +static char *dimmbus_get_fw_dev_path(DeviceState *dev); + +static Property dimm_properties[] = { +DEFINE_PROP_UINT64(start, DimmDevice, start, 0), +DEFINE_PROP_SIZE(size, DimmDevice, size, DEFAULT_DIMMSIZE), +DEFINE_PROP_UINT32(node, DimmDevice, node, 0), +DEFINE_PROP_END_OF_LIST(), +}; + +static void dimmbus_dev_print(Monitor *mon, DeviceState *dev, int indent) +{ +} + +static char *dimmbus_get_fw_dev_path(DeviceState *dev) +{ +char path[40]; + +snprintf(path, sizeof(path), %s,
[Qemu-devel] [PATCH v5 11/14] Introduce paravirt interface QEMU_CFG_PCI_WINDOW
From: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com Qemu calculates the 32-bit and 64-bit PCI starting offsets based on initial memory and hotplug-able dimms. This info needs to be passed to Seabios for PCI initialization. Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- hw/i386/pc_q35.c | 10 ++ hw/pci-host/piix.c| 13 + include/hw/nvram/fw_cfg.h | 1 + 3 files changed, 24 insertions(+) diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index 2c14977..f956c62 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -42,6 +42,7 @@ #include hw/ide/ahci.h #include hw/usb.h #include hw/cpu/icc_bus.h +#include hw/nvram/fw_cfg.h /* ICH9 AHCI has 6 ports */ #define MAX_SATA_PORTS 6 @@ -75,6 +76,7 @@ static void pc_q35_init(QEMUMachineInitArgs *args) PCIDevice *ahci; DeviceState *icc_bridge; void *fw_cfg = NULL; +uint64_t *pci_window_fw_cfg; icc_bridge = qdev_create(NULL, TYPE_ICC_BRIDGE); object_property_add_child(qdev_get_machine(), icc-bridge, @@ -119,6 +121,14 @@ static void pc_q35_init(QEMUMachineInitArgs *args) /* pci */ qdev_init_nofail(DEVICE(q35_host)); bochs_meminfo_bios_init(fw_cfg); + +pci_window_fw_cfg = g_new0(uint64_t, 2); +pci_window_fw_cfg[0] = cpu_to_le64(MCH_HOST_BRIDGE_PCIEXBAR_DEFAULT); +pci_window_fw_cfg[1] = cpu_to_le64(0x1ULL + + q35_host-mch.dev.above_4g_mem_size); +fw_cfg_add_bytes(fw_cfg, FW_CFG_PCI_WINDOW, + (uint8_t *)pci_window_fw_cfg, 2 * 8); + host_bus = q35_host-host.pci.bus; /* create ISA bus */ lpc = pci_create_simple_multifunction(host_bus, PCI_DEVFN(ICH9_LPC_DEV, diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c index 71a9b8b..ace8f2b 100644 --- a/hw/pci-host/piix.c +++ b/hw/pci-host/piix.c @@ -32,6 +32,7 @@ #include hw/xen/xen.h #include hw/pci-host/pam.h #include sysemu/sysemu.h +#include hw/nvram/fw_cfg.h /* * I440FX chipset data sheet. @@ -229,6 +230,8 @@ static PCIBus *i440fx_common_init(const char *device_name, PIIX3State *piix3; I440FXPMCState *f; I440FXState *i440fx; +uint64_t *pci_window_fw_cfg; +void *fw_cfg; i440fx = I440FX_DEVICE(object_new(TYPE_I440FX_DEVICE)); s = PCI_HOST_BRIDGE(i440fx); @@ -266,6 +269,16 @@ static PCIBus *i440fx_common_init(const char *device_name, *piix3_devfn = piix3-dev.devfn; *pci_address_space = i440fx-pci_address_space; +fw_cfg = fw_cfg_find(); +if (fw_cfg) { +pci_window_fw_cfg = g_new0(uint64_t, 2); +pci_window_fw_cfg[0] = cpu_to_le64(f-dev.below_4g_mem_size); +pci_window_fw_cfg[1] = cpu_to_le64(0x1ULL + + f-dev.above_4g_mem_size); +fw_cfg_add_bytes(fw_cfg, FW_CFG_PCI_WINDOW, + (uint8_t *)pci_window_fw_cfg, 2 * 8); +} + return s-bus; } diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h index f60dd67..41cfa32 100644 --- a/include/hw/nvram/fw_cfg.h +++ b/include/hw/nvram/fw_cfg.h @@ -35,6 +35,7 @@ #define FW_CFG_SETUP_SIZE 0x17 #define FW_CFG_SETUP_DATA 0x18 #define FW_CFG_FILE_DIR 0x19 +#define FW_CFG_PCI_WINDOW 0x1a #define FW_CFG_FILE_FIRST 0x20 #define FW_CFG_FILE_SLOTS 0x10 -- 1.8.3.1
[Qemu-devel] [PATCH v5 14/14] Implement dimm-info
From: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com query-dimm-info and info dimm will give current state of all dimms in the system e.g. dimm0: on dimm1: off dimm2: off dimm3: on etc. Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- hmp-commands.hx | 2 ++ hmp.c | 17 + hmp.h | 1 + hw/mem-hotplug/dimm.c | 43 +++ monitor.c | 7 +++ qapi-schema.json | 27 +++ 6 files changed, 97 insertions(+) diff --git a/hmp-commands.hx b/hmp-commands.hx index b2937c2..6cb1285 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1657,6 +1657,8 @@ show roms show memory-total @item info tpm show the TPM device +@item info dimm +show dimm @end table ETEXI diff --git a/hmp.c b/hmp.c index 0371da9..04586b3 100644 --- a/hmp.c +++ b/hmp.c @@ -683,6 +683,23 @@ void hmp_info_memory(Monitor *mon, const QDict *qdict) qapi_free_MemoryInfo(mem); } +void hmp_info_dimm(Monitor *mon, const QDict *qdict) +{ +DimmInfoList *info; +DimmInfoList *item; +DimmInfo *dimm; + +info = qmp_query_dimm_info(NULL); +for (item = info; item; item = item-next) { +dimm = item-value; +monitor_printf(mon, dimm %s : %s\n, dimm-dimm, + dimm-state ? on : off); +dimm-dimm = NULL; +} + +qapi_free_DimmInfoList(info); +} + void hmp_quit(Monitor *mon, const QDict *qdict) { monitor_suspend(mon); diff --git a/hmp.h b/hmp.h index b5a5afb..06cea4e 100644 --- a/hmp.h +++ b/hmp.h @@ -38,6 +38,7 @@ void hmp_info_pci(Monitor *mon, const QDict *qdict); void hmp_info_block_jobs(Monitor *mon, const QDict *qdict); void hmp_info_tpm(Monitor *mon, const QDict *qdict); void hmp_info_memory(Monitor *mon, const QDict *qdict); +void hmp_info_dimm(Monitor *mon, const QDict *qdict); void hmp_quit(Monitor *mon, const QDict *qdict); void hmp_stop(Monitor *mon, const QDict *qdict); void hmp_system_reset(Monitor *mon, const QDict *qdict); diff --git a/hw/mem-hotplug/dimm.c b/hw/mem-hotplug/dimm.c index da31a18..4b08706 100644 --- a/hw/mem-hotplug/dimm.c +++ b/hw/mem-hotplug/dimm.c @@ -170,6 +170,18 @@ static DimmConfig *dimmcfg_find_from_name(DimmBus *bus, const char *name) return NULL; } +static DimmDevice *dimm_find_from_name(DimmBus *bus, const char *name) +{ +DimmDevice *slot; + +QTAILQ_FOREACH(slot, bus-dimmlist, nextdimm) { +if (!strcmp(slot-qdev.id, name)) { +return slot; +} +} +return NULL; +} + void dimm_setup_fwcfg_layout(uint64_t *fw_cfg_slots) { DimmConfig *slot; @@ -185,6 +197,37 @@ void dimm_setup_fwcfg_layout(uint64_t *fw_cfg_slots) } } +DimmInfoList *qmp_query_dimm_info(Error **errp) +{ +DimmBus *bus; +DimmConfig *slot; +DimmInfoList *head = NULL, *info, *cur_item = NULL; + +QLIST_FOREACH(bus, memory_buses, next) { +QTAILQ_FOREACH(slot, bus-dimmconfig_list, nextdimmcfg) { + +info = g_malloc0(sizeof(*info)); +info-value = g_malloc0(sizeof(*info-value)); +info-value-dimm = g_malloc0(sizeof(char) * 32); +strcpy(info-value-dimm, slot-name); +if (dimm_find_from_name(bus, slot-name)) { +info-value-state = 1; +} else { +info-value-state = 0; +} +/* XXX: waiting for the qapi to support GSList */ +if (!cur_item) { +head = cur_item = info; +} else { +cur_item-next = info; +cur_item = info; +} +} +} + +return head; +} + uint64_t get_hp_memory_total(void) { DimmBus *bus; diff --git a/monitor.c b/monitor.c index 14a955e..c2b046c 100644 --- a/monitor.c +++ b/monitor.c @@ -2753,6 +2753,13 @@ static mon_cmd_t info_cmds[] = { .mhandler.cmd = do_trace_print_events, }, { +.name = dimm, +.args_type = , +.params = , +.help = show active and non active dimms, +.mhandler.cmd = hmp_info_dimm, +}, +{ .name = tpm, .args_type = , .params = , diff --git a/qapi-schema.json b/qapi-schema.json index d2940dc..88bbfac 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -3621,3 +3621,30 @@ { 'type': 'MemoryInfo', 'data': { 'total': 'int' } } { 'command': 'query-memory', 'returns': 'MemoryInfo' } + +## +# @DimmInfo: +# +# Information about status of dimm device +# +# @dimm: the name of the dimm +# +# @state: 'true' means the dimm device is plugged in, 'false' means +# means the dimm device is plugged out. +# +# Since: 1.6 +# +## +{ 'type': 'DimmInfo', + 'data': {'dimm': 'str', 'state': 'bool'} } + +## +# @query-dimm-info: +# +# Returns the state of dimm devices +# +# Returns: list of @DimmInfo +# +# Since: 1.6 +## +{
[Qemu-devel] [PATCH v5 07/14] acpi_ich9 : Implement memory device hotplug registers
From: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com This implements acpi dimm hot-add capability for q35 (ich9). The logic is the same as for the pc machine (piix4). TODO: Fix acpi irq delivery bug. Currently there is a flood of irqs when delivering an acpi interrupt (should be just one). Guest complains as follows: irq 9: nobody cared [...] Disabling IRQ #9 where #9 is the acpi irq Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- hw/acpi/ich9.c | 56 -- include/hw/acpi/ich9.h | 10 + 2 files changed, 64 insertions(+), 2 deletions(-) diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c index 4a17f32..0034aa2 100644 --- a/hw/acpi/ich9.c +++ b/hw/acpi/ich9.c @@ -33,6 +33,7 @@ #include exec/address-spaces.h #include hw/i386/ich9.h +#include hw/mem-hotplug/dimm.h //#define DEBUG @@ -49,11 +50,14 @@ static void pm_update_sci(ICH9LPCPMRegs *pm) pm1a_sts = acpi_pm1_evt_get_sts(pm-acpi_regs); -sci_level = (((pm1a_sts pm-acpi_regs.pm1.evt.en) +sci_level = pm1a_sts pm-acpi_regs.pm1.evt.en) (ACPI_BITMASK_RT_CLOCK_ENABLE | ACPI_BITMASK_POWER_BUTTON_ENABLE | ACPI_BITMASK_GLOBAL_LOCK_ENABLE | - ACPI_BITMASK_TIMER_ENABLE)) != 0); + ACPI_BITMASK_TIMER_ENABLE)) != 0) || +(((pm-acpi_regs.gpe.sts[0] pm-acpi_regs.gpe.en[0]) + (ICH9_MEM_HOTPLUG_STATUS)) != 0)); + qemu_set_irq(pm-irq, sci_level); /* schedule a timer interruption if needed */ @@ -202,6 +206,47 @@ static void pm_powerdown_req(Notifier *n, void *opaque) acpi_pm1_evt_power_down(pm-acpi_regs); } +static uint32_t mem_status_readb(void *opaque, uint32_t addr) +{ +ICH9LPCPMRegs *s = opaque; +uint32_t val = 0; +MemStatus *g = s-gpe_mem; +if (addr ICH9_MEM_LEN) { +val = (uint32_t) g-mems_sts[addr]; +} +ICH9_DEBUG(memhp read % PRIu32 == % PRIu32 \n, addr, val); +return val; +} + +static const MemoryRegionOps ich9_mem_hotplug_ops = { +.old_portio = (MemoryRegionPortio[]) { +{ +.offset = 0, .len = ICH9_MEM_LEN, .size = 1, +.read = mem_status_readb, +}, +PORTIO_END_OF_LIST() +}, +.endianness = DEVICE_LITTLE_ENDIAN, +}; + +static void enable_mem_device(ICH9LPCState *s, int memdevice) +{ +MemStatus *g = s-pm.gpe_mem; +s-pm.acpi_regs.gpe.sts[0] |= ICH9_MEM_HOTPLUG_STATUS; +g-mems_sts[memdevice / 8] |= (1 (memdevice % 8)); +} + +static int ich9_mem_hotplug(DeviceState *dev, DimmDevice *dimm, int add) +{ +ICH9LPCState *s = ICH9_LPC_DEVICE(dev); + +if (add) { +enable_mem_device(s, dimm-idx); +} +pm_update_sci(s-pm); +return 0; +} + void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm, qemu_irq sci_irq) { @@ -227,4 +272,11 @@ void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm, qemu_register_reset(pm_reset, pm); pm-powerdown_notifier.notify = pm_powerdown_req; qemu_register_powerdown_notifier(pm-powerdown_notifier); + +memory_region_init_io(pm-io_mem, ich9_mem_hotplug_ops, pm, + acpi-memory-hotplug0, ICH9_MEM_BASE); +memory_region_add_subregion(get_system_io(), ICH9_MEM_BASE, pm-io_mem); + +dimm_bus_hotplug(ich9_mem_hotplug, lpc_pci-qdev); + } diff --git a/include/hw/acpi/ich9.h b/include/hw/acpi/ich9.h index b1fe71f..300e07f 100644 --- a/include/hw/acpi/ich9.h +++ b/include/hw/acpi/ich9.h @@ -23,6 +23,14 @@ #include hw/acpi/acpi.h +#define ICH9_MEM_BASE 0xaf80 +#define ICH9_MEM_LEN 32 +#define ICH9_MEM_HOTPLUG_STATUS 8 + +typedef struct MemStatus { +uint8_t mems_sts[ICH9_MEM_LEN]; +} MemStatus; + typedef struct ICH9LPCPMRegs { /* * In ich9 spec says that pm1_cnt register is 32bit width and @@ -34,6 +42,7 @@ typedef struct ICH9LPCPMRegs { MemoryRegion io; MemoryRegion io_gpe; MemoryRegion io_smi; +MemoryRegion io_mem; uint32_t smi_en; uint32_t smi_sts; @@ -42,6 +51,7 @@ typedef struct ICH9LPCPMRegs { uint32_t pm_io_base; Notifier powerdown_notifier; +MemStatus gpe_mem; } ICH9LPCPMRegs; void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm, -- 1.8.3.1
[Qemu-devel] [PATCH v5 12/14] Implement info memory and query-memory
From: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com Returns total physical memory available to guest in bytes, including hotplugged memory. Note that the number reported here may be different from what the guest sees e.g. if the guest has not logically onlined hotplugged memory. This functionality is provided independently of a balloon device, since a guest can be using ACPI memory hotplug without using a balloon device. Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- hmp-commands.hx | 2 ++ hmp.c | 8 hmp.h | 1 + hw/mem-hotplug/dimm.c | 14 ++ include/hw/mem-hotplug/dimm.h | 1 + monitor.c | 7 +++ qapi-schema.json | 13 + qmp-commands.hx | 22 ++ vl.c | 9 + 9 files changed, 77 insertions(+) diff --git a/hmp-commands.hx b/hmp-commands.hx index 915b0d1..b2937c2 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1653,6 +1653,8 @@ show device tree show qdev device model list @item info roms show roms +@item info memory-total +show memory-total @item info tpm show the TPM device @end table diff --git a/hmp.c b/hmp.c index 494a9aa..0371da9 100644 --- a/hmp.c +++ b/hmp.c @@ -675,6 +675,14 @@ void hmp_info_tpm(Monitor *mon, const QDict *qdict) qapi_free_TPMInfoList(info_list); } +void hmp_info_memory(Monitor *mon, const QDict *qdict) +{ +MemoryInfo *mem; +mem = qmp_query_memory(NULL); +monitor_printf(mon, MemTotal: % PRIu64 \n, mem-total); +qapi_free_MemoryInfo(mem); +} + void hmp_quit(Monitor *mon, const QDict *qdict) { monitor_suspend(mon); diff --git a/hmp.h b/hmp.h index 56d2e92..b5a5afb 100644 --- a/hmp.h +++ b/hmp.h @@ -37,6 +37,7 @@ void hmp_info_balloon(Monitor *mon, const QDict *qdict); void hmp_info_pci(Monitor *mon, const QDict *qdict); void hmp_info_block_jobs(Monitor *mon, const QDict *qdict); void hmp_info_tpm(Monitor *mon, const QDict *qdict); +void hmp_info_memory(Monitor *mon, const QDict *qdict); void hmp_quit(Monitor *mon, const QDict *qdict); void hmp_stop(Monitor *mon, const QDict *qdict); void hmp_system_reset(Monitor *mon, const QDict *qdict); diff --git a/hw/mem-hotplug/dimm.c b/hw/mem-hotplug/dimm.c index 09cfc4b..da31a18 100644 --- a/hw/mem-hotplug/dimm.c +++ b/hw/mem-hotplug/dimm.c @@ -185,6 +185,20 @@ void dimm_setup_fwcfg_layout(uint64_t *fw_cfg_slots) } } +uint64_t get_hp_memory_total(void) +{ +DimmBus *bus; +DimmDevice *slot; +uint64_t info = 0; + +QLIST_FOREACH(bus, memory_buses, next) { +QTAILQ_FOREACH(slot, bus-dimmlist, nextdimm) { +info += slot-size; +} +} +return info; +} + static void dimm_realize(DeviceState *s, Error **errp) { DimmBus *bus = DIMM_BUS(qdev_get_parent_bus(s)); diff --git a/include/hw/mem-hotplug/dimm.h b/include/hw/mem-hotplug/dimm.h index 8625ae6..3947539 100644 --- a/include/hw/mem-hotplug/dimm.h +++ b/include/hw/mem-hotplug/dimm.h @@ -73,5 +73,6 @@ DimmBus *dimm_bus_create(Object *parent, const char *name, uint32_t max_dimms, dimm_calcoffset_fn pmc_set_offset); void dimm_config_create(char *id, uint64_t size, const char *bus, uint64_t node, uint32_t dimm_idx); +uint64_t get_hp_memory_total(void); #endif diff --git a/monitor.c b/monitor.c index 70ae8f5..14a955e 100644 --- a/monitor.c +++ b/monitor.c @@ -2760,6 +2760,13 @@ static mon_cmd_t info_cmds[] = { .mhandler.cmd = hmp_info_tpm, }, { +.name = memory, +.args_type = , +.params = , +.help = show memory informations including total memory size, +.mhandler.cmd = hmp_info_memory, +}, +{ .name = NULL, }, }; diff --git a/qapi-schema.json b/qapi-schema.json index a80ee40..d2940dc 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -3608,3 +3608,16 @@ '*cpuid-input-ecx': 'int', 'cpuid-register': 'X86CPURegister32', 'features': 'int' } } + +## +# @query-memory: +# +# Returns total memory in bytes, including hotplugged dimms +# +# Returns: int +# +# Since: 1.6 +## +{ 'type': 'MemoryInfo', + 'data': { 'total': 'int' } } +{ 'command': 'query-memory', 'returns': 'MemoryInfo' } diff --git a/qmp-commands.hx b/qmp-commands.hx index 8cea5e5..58d7c7c 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -2997,3 +2997,25 @@ Example: - { return: {} } EQMP + +{ +.name = query-memory, +.args_type = , +.mhandler.cmd_new = qmp_marshal_input_query_memory +}, +SQMP +query-memory +-- + +Return a json-object with the following information: + +- total: total memory in bytes, including hotplugged dimms + +Example: + +- { execute: query-memory } +- { + return: { total: 1073741824 } + } + +EQMP diff
[Qemu-devel] [PATCH v5 5/7] acpi: generate hotplug memory devices
The memory device generation is guided by qemu paravirt info. Seabios uses the info to setup SRAT entries for the hotplug-able memory slots, and to generate appropriate memory device objects. One memory device (and corresponding SRAT entry) is generated for each hotplug-able qemu memslot. Currently no SSDT memory device is created for initial system memory. We only support up to 255 DIMMs for now (PackageOp used for the MEON array can only describe an array of at most 255 elements. VarPackageOp would be needed to support more than 255 devices) Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- src/acpi.c | 151 ++--- src/paravirt.c | 8 +++ 2 files changed, 152 insertions(+), 7 deletions(-) diff --git a/src/acpi.c b/src/acpi.c index ce988e0..e9a0326 100644 --- a/src/acpi.c +++ b/src/acpi.c @@ -15,6 +15,8 @@ #include config.h // CONFIG_* #include paravirt.h // RamSize #include dev-q35.h +#include memmap.h +#include paravirt.h #include acpi-dsdt.hex @@ -250,11 +252,23 @@ encodeLen(u8 *ssdt_ptr, int length, int bytes) #define PCIHP_AML (ssdp_pcihp_aml + *ssdt_pcihp_start) #define PCI_SLOTS 32 +/* 0x5B 0x82 DeviceOp PkgLength NameString DimmID */ +#define MEM_BASE 0xaf80 +#define MEM_AML (ssdm_mem_aml + *ssdt_mem_start) +#define MEM_SIZEOF (*ssdt_mem_end - *ssdt_mem_start) +#define MEM_OFFSET_HEX (*ssdt_mem_name - *ssdt_mem_start + 2) +#define MEM_OFFSET_ID (*ssdt_mem_id - *ssdt_mem_start) +#define MEM_OFFSET_PXM 31 +#define MEM_OFFSET_START 55 +#define MEM_OFFSET_END 63 +#define MEM_OFFSET_SIZE 79 + #define SSDT_SIGNATURE 0x54445353 // SSDT #define SSDT_HEADER_LENGTH 36 #include ssdt-misc.hex #include ssdt-pcihp.hex +#include ssdt-mem.hex #define PCI_RMV_BASE 0xae0c @@ -306,9 +320,100 @@ static void patch_pcihp(int slot, u8 *ssdt_ptr, u32 eject) } } +static void build_memdev(u8 *ssdt_ptr, int i, u64 mem_base, u64 mem_len, u8 node) +{ +memcpy(ssdt_ptr, MEM_AML, MEM_SIZEOF); +ssdt_ptr[MEM_OFFSET_HEX] = getHex(i 4); +ssdt_ptr[MEM_OFFSET_HEX+1] = getHex(i); +ssdt_ptr[MEM_OFFSET_ID] = i; +ssdt_ptr[MEM_OFFSET_PXM] = node; +*(u64*)(ssdt_ptr + MEM_OFFSET_START) = cpu_to_le64(mem_base); +*(u64*)(ssdt_ptr + MEM_OFFSET_END) = cpu_to_le64(mem_base + mem_len); +*(u64*)(ssdt_ptr + MEM_OFFSET_SIZE) = cpu_to_le64(mem_len); +} + +static u8 *build_memssdt(u8 *ssdt_ptr, int memssdt_len, + u64 *numadimmsmap, int nb_memdevs) +{ +u64 mem_base, mem_len; +u64 *dimm = numadimmsmap; +int node; +int i; + +// build Scope(_SB_) header +*(ssdt_ptr++) = 0x10; // ScopeOp +ssdt_ptr = encodeLen(ssdt_ptr, memssdt_len, 3); +*(ssdt_ptr++) = '_'; +*(ssdt_ptr++) = 'S'; +*(ssdt_ptr++) = 'B'; +*(ssdt_ptr++) = '_'; + +for (i = 0; i nb_memdevs; i++) { +mem_base = *dimm++; +mem_len = *dimm++; +node = *dimm++; +build_memdev(ssdt_ptr, i, mem_base, mem_len, node); +ssdt_ptr += MEM_SIZEOF; +} + +// build Method(MTFY, 2) {If (LEqual(Arg0, 0x00)) {Notify(CM00, Arg1)} ...} +*(ssdt_ptr++) = 0x14; // MethodOp +ssdt_ptr = encodeLen(ssdt_ptr, 2+5+(12*nb_memdevs), 2); +*(ssdt_ptr++) = 'M'; +*(ssdt_ptr++) = 'T'; +*(ssdt_ptr++) = 'F'; +*(ssdt_ptr++) = 'Y'; +*(ssdt_ptr++) = 0x02; +for (i=0; inb_memdevs; i++) { +*(ssdt_ptr++) = 0xA0; // IfOp +ssdt_ptr = encodeLen(ssdt_ptr, 11, 1); +*(ssdt_ptr++) = 0x93; // LEqualOp +*(ssdt_ptr++) = 0x68; // Arg0Op +*(ssdt_ptr++) = 0x0A; // BytePrefix +*(ssdt_ptr++) = i; +*(ssdt_ptr++) = 0x86; // NotifyOp +*(ssdt_ptr++) = 'M'; +*(ssdt_ptr++) = 'P'; +*(ssdt_ptr++) = getHex(i 4); +*(ssdt_ptr++) = getHex(i); +*(ssdt_ptr++) = 0x69; // Arg1Op +} + +// build Name(MEON, Package() { One, One, ..., Zero, Zero, ... }) +*(ssdt_ptr++) = 0x08; // NameOp +*(ssdt_ptr++) = 'M'; +*(ssdt_ptr++) = 'E'; +*(ssdt_ptr++) = 'O'; +*(ssdt_ptr++) = 'N'; +*(ssdt_ptr++) = 0x12; // PackageOp +ssdt_ptr = encodeLen(ssdt_ptr, 2+1+(1*nb_memdevs), 2); +*(ssdt_ptr++) = nb_memdevs; + +dimm = numadimmsmap; +u8 memslot_status = 0, enabled; + +for (i = 0; i nb_memdevs; i++) { +enabled = 0; +if (i % 8 == 0) +memslot_status = inb(MEM_BASE + i/8); +enabled = memslot_status 1; +mem_base = *dimm++; +mem_len = *dimm++; +dimm++; // node +*(ssdt_ptr++) = enabled ? 0x01 : 0x00; +if (enabled) +add_e820(mem_base, mem_len, E820_RAM); +memslot_status = memslot_status 1; +} + +return ssdt_ptr; +} + static void* build_ssdt(void) { +int nb_memdevs; +u64 *numadimmsmap; int acpi_cpus = MaxCountCPUs 0xff ? 0xff : MaxCountCPUs; int length = (sizeof(ssdp_misc_aml)
[Qemu-devel] [PATCH v5 0/7] support for ACPI memory hotplug
This series adds memory devices and related methods to support ACPI memory hotplug. This series works with qemu counterpart. See qemu series for a detailed description. Hu Tao (2): set psize to 0 when romfile_loadfile failed acpi: generate hotplug memory devices Vasilis Liaskovitis (5): Add ACPI_EXTRACT_DEVICE* macros Add SSDT memory device support acpi-dsdt: Implement functions for memory hotplug q35: Add memory hotplug handler pci: Use paravirt interface for pcimem_start and pcimem64_start Makefile | 2 +- src/acpi-dsdt-mem-hotplug.dsl | 57 src/acpi-dsdt.dsl | 5 +- src/acpi.c| 151 -- src/paravirt.c| 15 + src/paravirt.h| 1 + src/pciinit.c | 9 +++ src/q35-acpi-dsdt.dsl | 6 +- src/romfile.c | 13 ++-- src/ssdt-mem.dsl | 61 + tools/acpi_extract.py | 28 11 files changed, 333 insertions(+), 15 deletions(-) create mode 100644 src/acpi-dsdt-mem-hotplug.dsl create mode 100644 src/ssdt-mem.dsl -- 1.8.3.1
[Qemu-devel] [PATCH v5 2/7] Add SSDT memory device support
From: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com Define SSDT hotplug-able memory devices in _SB namespace. The dynamically generated SSDT includes per memory device hotplug methods. These methods just call methods defined in the DSDT. Also dynamically generate a MTFY method and a MEON array of the online/available memory devices. ACPI extraction macros are used to place the AML code in variables later used by src/acpi. The design is taken from SSDT cpu generation. Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com --- Makefile | 2 +- src/ssdt-mem.dsl | 61 2 files changed, 62 insertions(+), 1 deletion(-) create mode 100644 src/ssdt-mem.dsl diff --git a/Makefile b/Makefile index 759..b88fb90 100644 --- a/Makefile +++ b/Makefile @@ -223,7 +223,7 @@ $(OUT)%.hex: src/%.dsl ./tools/acpi_extract_preprocess.py ./tools/acpi_extract.p $(Q)$(PYTHON) ./tools/acpi_extract.py $(OUT)$*.lst $(OUT)$*.off $(Q)cat $(OUT)$*.off $@ -$(OUT)acpi.o: $(OUT)acpi-dsdt.hex $(OUT)ssdt-proc.hex $(OUT)ssdt-pcihp.hex $(OUT)ssdt-misc.hex $(OUT)q35-acpi-dsdt.hex +$(OUT)acpi.o: $(OUT)acpi-dsdt.hex $(OUT)ssdt-proc.hex $(OUT)ssdt-pcihp.hex $(OUT)ssdt-misc.hex $(OUT)q35-acpi-dsdt.hex $(OUT)ssdt-mem.hex Kconfig rules diff --git a/src/ssdt-mem.dsl b/src/ssdt-mem.dsl new file mode 100644 index 000..d0f6203 --- /dev/null +++ b/src/ssdt-mem.dsl @@ -0,0 +1,61 @@ +/* This file is the basis for the ssdt_mem[] variable in src/acpi.c. + * It is similar in design to the ssdt_proc variable. + * It defines the contents of the per-dimm QWordMemory() object. At + * runtime, a dynamically generated SSDT will contain one copy of this + * AML snippet for every possible memory device in the system. The + * objects will * be placed in the \_SB_ namespace. + * + * In addition to the aml code generated from this file, the + * src/acpi.c file creates a MTFY method with an entry for each memdevice: + * Method(MTFY, 2) { + * If (LEqual(Arg0, 0x00)) { Notify(MP00, Arg1) } + * If (LEqual(Arg0, 0x01)) { Notify(MP01, Arg1) } + * ... + * } + * and a MEON array with the list of active and inactive memory devices: + * Name(MEON, Package() { One, One, ..., Zero, Zero, ... }) + */ +ACPI_EXTRACT_ALL_CODE ssdm_mem_aml + +DefinitionBlock (ssdt-mem.aml, SSDT, 0x02, BXPC, CSSDT, 0x1) +/* v-- DO NOT EDIT --v */ +{ +ACPI_EXTRACT_DEVICE_START ssdt_mem_start +ACPI_EXTRACT_DEVICE_END ssdt_mem_end +ACPI_EXTRACT_DEVICE_STRING ssdt_mem_name +Device(MPAA) { +ACPI_EXTRACT_NAME_BYTE_CONST ssdt_mem_id +Name(ID, 0xAA) +/* ^-- DO NOT EDIT --^ + * + * The src/acpi.c code requires the above layout so that it can update + * MPAA and 0xAA with the appropriate MEMDEVICE id (see + * SD_OFFSET_MEMHEX/MEMID1/MEMID2). Don't change the above without + * also updating the C code. + */ +Name(_HID, EISAID(PNP0C80)) +Name(_PXM, 0xAA) + +External(CMST, MethodObj) +External(MPEJ, MethodObj) + +Name(_CRS, ResourceTemplate() { +QwordMemory( + ResourceConsumer, + , // _DEC + MinFixed, // _MIF + MaxFixed, // _MAF + Cacheable,// _MEM + ReadWrite,// _RW + 0x0, // _GRA + 0xDEADBEEF, // _MIN + 0xE6ADBEEE, // _MAX + 0x, // _TRA + 0x0800, // _LEN + ) +}) +Method (_STA, 0) { +Return(CMST(ID)) +} +} +} -- 1.8.3.1
[Qemu-devel] [PATCH v5 08/14] memory: record below_4g_mem_size, above_4g_mem_size
Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- hw/i386/pc.c | 17 - include/hw/i386/pc.h | 2 ++ 2 files changed, 10 insertions(+), 9 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 5bb4989..55056b1 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -1288,7 +1288,6 @@ static int memory_controller_init(PCIDevice *dev) MemoryController *m = MEMORY_CONTROLLER(dev); MemoryControllerClass *c = MEMORY_CONTROLLER_GET_CLASS(dev); ram_addr_t ram_size; -hwaddr below_4g_mem_size, above_4g_mem_size; hwaddr pci_hole_start, pci_hole_size; hwaddr pci_hole64_start, pci_hole64_size; int i; @@ -1296,11 +1295,11 @@ static int memory_controller_init(PCIDevice *dev) g_assert(m-system_memory != NULL); if(m-ram_size c-pci_hole_start) { -below_4g_mem_size = c-pci_hole_start; -above_4g_mem_size = m-ram_size - c-pci_hole_start; +m-below_4g_mem_size = c-pci_hole_start; +m-above_4g_mem_size = m-ram_size - c-pci_hole_start; } else { -below_4g_mem_size = m-ram_size; -above_4g_mem_size = 0; +m-below_4g_mem_size = m-ram_size; +m-above_4g_mem_size = 0; } /* Allocate RAM. We allocate it as a single memory region and use @@ -1310,16 +1309,16 @@ static int memory_controller_init(PCIDevice *dev) memory_region_init_ram(m-ram, pc.ram, m-ram_size); vmstate_register_ram_global(m-ram); memory_region_init_alias(m-ram_below_4g, ram-below-4g, m-ram, - 0, below_4g_mem_size); + 0, m-below_4g_mem_size); memory_region_add_subregion(m-system_memory, 0, m-ram_below_4g); -if (above_4g_mem_size 0) { +if (m-above_4g_mem_size 0) { memory_region_init_alias(m-ram_above_4g, ram-above-4g, m-ram, - below_4g_mem_size, above_4g_mem_size); + m-below_4g_mem_size, m-above_4g_mem_size); memory_region_add_subregion(m-system_memory, c-pci_hole_end, m-ram_above_4g); } -pci_hole_start = below_4g_mem_size; +pci_hole_start = m-below_4g_mem_size; pci_hole_size = c-pci_hole_end - pci_hole_start; pci_hole64_start = c-pci_hole_end + m-ram_size - pci_hole_start; diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 5d36558..e2cbc1b 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -46,6 +46,8 @@ typedef struct MemoryController { MemoryRegion ram; MemoryRegion ram_below_4g; MemoryRegion ram_above_4g; +hwaddr below_4g_mem_size; +hwaddr above_4g_mem_size; } MemoryController; void mc_update_pam(MemoryController *d); -- 1.8.3.1
[Qemu-devel] [PATCH v5 4/7] set psize to 0 when romfile_loadfile failed
Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- src/romfile.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/src/romfile.c b/src/romfile.c index ea71d1f..b1b89bb 100644 --- a/src/romfile.c +++ b/src/romfile.c @@ -51,28 +51,33 @@ romfile_loadfile(const char *name, int *psize) { struct romfile_s *file = romfile_find(name); if (!file) -return NULL; +goto failed; int filesize = file-size; if (!filesize) -return NULL; +goto failed; char *data = malloc_tmphigh(filesize+1); if (!data) { warn_noalloc(); -return NULL; +goto failed; } dprintf(5, Copying romfile '%s' (len %d)\n, name, filesize); int ret = file-copy(file, data, filesize); if (ret 0) { free(data); -return NULL; +goto failed; } if (psize) *psize = filesize; data[filesize] = '\0'; return data; + +failed: +if (psize) +*psize = 0; +return NULL; } // Attempt to load an integer from the given file - return 'defval' -- 1.8.3.1
[Qemu-devel] [PATCH v5 1/7] Add ACPI_EXTRACT_DEVICE* macros
From: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com This allows to extract the beginning, end and name of a Device object. Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com --- tools/acpi_extract.py | 28 1 file changed, 28 insertions(+) diff --git a/tools/acpi_extract.py b/tools/acpi_extract.py index ab8ced6..8ad15d3 100755 --- a/tools/acpi_extract.py +++ b/tools/acpi_extract.py @@ -230,6 +230,28 @@ def aml_package_start(offset): offset += 1 return offset + aml_pkglen_bytes(offset) + 1 +def aml_device_start(offset): +#0x5B 0x82 DeviceOp PkgLength NameString ProcID +if ((aml[offset] != 0x5B) or (aml[offset + 1] != 0x82)): +die( Name offset 0x%x: expected 0x5B 0x83 actual 0x%x 0x%x % + (offset, aml[offset], aml[offset + 1])); +return offset + +def aml_device_string(offset): +#0x5B 0x82 DeviceOp PkgLength NameString ProcID +start = aml_device_start(offset) +offset += 2 +pkglenbytes = aml_pkglen_bytes(offset) +offset += pkglenbytes +return offset + +def aml_device_end(offset): +start = aml_device_start(offset) +offset += 2 +pkglenbytes = aml_pkglen_bytes(offset) +pkglen = aml_pkglen(offset) +return offset + pkglen + lineno = 0 for line in fileinput.input(): # Strip trailing newline @@ -322,6 +344,12 @@ for i in range(len(asl)): offset = aml_processor_end(offset) elif (directive == ACPI_EXTRACT_PKG_START): offset = aml_package_start(offset) +elif (directive == ACPI_EXTRACT_DEVICE_START): +offset = aml_device_start(offset) +elif (directive == ACPI_EXTRACT_DEVICE_STRING): +offset = aml_device_string(offset) +elif (directive == ACPI_EXTRACT_DEVICE_END): +offset = aml_device_end(offset) else: die(Unsupported directive %s % directive) -- 1.8.3.1
[Qemu-devel] [PATCH] vmdk: remove wrong calculation of relative path
When creating image with backing file, the driver tries to calculate the relative path from created image file to backing file, but the path computation is incorrect. e.g.: $ qemu-img create -f vmdk -b vmdk-data-disk.vmdk vmdk-data-snapshot1 Formatting 'vmdk-data-snapshot1', fmt=vmdk size=10737418240 backing_file='vmdk-data-disk.vmdk' compat6=off zeroed_grain=off $ qemu-img info vmdk-data-snapshot1 image: vmdk-data-snapshot1 file format: vmdk virtual size: 10G (10737418240 bytes) disk size: 12K - backing file: disk.vmdk The common part in file names, vmdk-data-, is incorrectly forgotten by relative_path(). As the VMDK specification has no restriction on parentNameHint to be relative path, we simply remove this by using the backing_file option. Signed-off-by: Fam Zheng f...@redhat.com --- block/vmdk.c | 44 +--- 1 file changed, 1 insertion(+), 43 deletions(-) diff --git a/block/vmdk.c b/block/vmdk.c index 975e1d4..a28fb5e 100644 --- a/block/vmdk.c +++ b/block/vmdk.c @@ -1487,45 +1487,6 @@ static int filename_decompose(const char *filename, char *path, char *prefix, return VMDK_OK; } -static int relative_path(char *dest, int dest_size, -const char *base, const char *target) -{ -int i = 0; -int n = 0; -const char *p, *q; -#ifdef _WIN32 -const char *sep = \\; -#else -const char *sep = /; -#endif - -if (!(dest base target)) { -return VMDK_ERROR; -} -if (path_is_absolute(target)) { -pstrcpy(dest, dest_size, target); -return VMDK_OK; -} -while (base[i] == target[i]) { -i++; -} -p = base[i]; -q = target[i]; -while (*p) { -if (*p == *sep) { -n++; -} -p++; -} -dest[0] = '\0'; -for (; n; n--) { -pstrcat(dest, dest_size, ..); -pstrcat(dest, dest_size, sep); -} -pstrcat(dest, dest_size, q); -return VMDK_OK; -} - static int vmdk_create(const char *filename, QEMUOptionParameter *options) { int fd, idx = 0; @@ -1625,7 +1586,6 @@ static int vmdk_create(const char *filename, QEMUOptionParameter *options) return -ENOTSUP; } if (backing_file) { -char parent_filename[PATH_MAX]; BlockDriverState *bs = bdrv_new(); ret = bdrv_open(bs, backing_file, NULL, 0, NULL); if (ret != 0) { @@ -1638,10 +1598,8 @@ static int vmdk_create(const char *filename, QEMUOptionParameter *options) } parent_cid = vmdk_read_cid(bs, 0); bdrv_delete(bs); -relative_path(parent_filename, sizeof(parent_filename), - filename, backing_file); snprintf(parent_desc_line, sizeof(parent_desc_line), -parentFileNameHint=\%s\, parent_filename); +parentFileNameHint=\%s\, backing_file); } /* Create extents */ -- 1.8.3.1
[Qemu-devel] [PATCH v5 6/7] q35: Add memory hotplug handler
From: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com --- src/q35-acpi-dsdt.dsl | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/q35-acpi-dsdt.dsl b/src/q35-acpi-dsdt.dsl index c031d83..5b28d72 100644 --- a/src/q35-acpi-dsdt.dsl +++ b/src/q35-acpi-dsdt.dsl @@ -403,7 +403,7 @@ DefinitionBlock ( } #include acpi-dsdt-cpu-hotplug.dsl - +#include acpi-dsdt-mem-hotplug.dsl / * General purpose events @@ -418,7 +418,9 @@ DefinitionBlock ( // CPU hotplug event \_SB.PRSC() } -Method(_L02) { +Method(_E02) { +// Memory hotplug event +\_SB.MESC() } Method(_L03) { } -- 1.8.3.1
Re: [Qemu-devel] [PATCH v3 2/2] net: introduce command to query rx-filter information
Eric Blake ebl...@redhat.com writes: On 05/24/2013 07:08 AM, Luiz Capitulino wrote: This sounds like premature optimization to me, but I wonder if instead of cluttering commands with arguments to do the filtering we could add some standard way of doing this in the QAPI. Maybe we could make QAPI support generic filtering for all query-* commands. But there's still a cost with making all query-* commands malloc the space for an entire list only to then have a QAPI layer on top of it do filtering before handing back the single element over the wire, compared to passing the filtering down to the query-* implementation to do the filtering in place. I'd expect the time spent on malloc to be dwarved several times over by I/O latency. If we worry about malloc impacting our latency, then we should not be using QAPI! It's really, really malloc-happy. In QMP, we generally don't. It was you who suggested a filter command? No, Stefan suggested it on v1: https://lists.gnu.org/archive/html/qemu-devel/2013-04/msg03102.html So far, I'm with Luiz here.
Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
On Wed, Jun 26, 2013 at 4:38 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 26/06/2013 10:20, liu ping fan ha scritto: On the other hand, pushing _delete() out of finalization path is not easy, since we do not what time the DeviceState has done with its bh. See above: - if the BH will run in the iothread, the BH is definitely not running (because the BH runs under the BQL, and the reclaimer owns it) It works for the case in which the DeviceState's reclaimer calls _bh_delete(). But in another case(also exist in current code), where we call _bh_delete() in callback, the bh will be scheduled, and oops! But you may know that the BH is not scheduled after removal, too. No, the removal can run in parallel with the other mmio-dispatch which can resort to schedule a bh. I.e, the removal can not call _bh_delete(), so it do not know whether bh will be scheduled or not. There are not that many devices that have bottom halves (apart from those that use bottom halves in ptimer). If they really need to, they can do the object_ref/unref themselves. But I expect this to be rare, and generic code should not need an owner field in bottom halves. For Agree :), so the aim is how to handle bh safely when unplug, if we can resolve it with rcu, it will be more fine! example, block devices should stop sending requests after removal. Yes, but need we need take account for stop sending request != no mmio-dispatch in fly ? And I think these mmio-dispatch are the criminal :), making it is hard to sync with bh's stop. - if the BH is running in another thread, waiting for that thread to terminate obviously makes the BH not running. This imply that except for qemu_aio_context, AioContext can be only shared by one device, right? Otherwise we can not just terminate the thread. If it is true, why can not we have more than one just like qemu_aio_context? Yes, if you do it that way you can have only one AioContext per device. RCU is another way, and doesn't have the same limitation. We should avoid introducing infrastructure that we are not sure is needed. For what it's worth, your patches to make the bottom half list thread-safe are also slightly premature. However, they do not change the API and it makes some sense to add the infrastructure. In this case, I'm simply not sure that we're there yet. See, thx. If you look at the memory work, for example, the owner patches happen to be useful for BQL-less dispatch too, but they are solving existing hot-unplug bugs. Oh, I will read it again, since I had thought the owner is only used for the purpose of refcnt. What we need is separation of removal and reclamation. Without that, any effort to make things unplug-safe is going to be way way more complicated than necessary. Agree, but when I tried for bh, I found the separation of removal and reclamation are not easy. We can not _bh_delete() in acpi_piix_eject_slot(), because mmio-dispatch can schedule a bh at the same time. acpi_piix_eject_slot() is removal, not reclamation. The plan there is that qdev_free calls the exit callback immediately (which can do qemu_bh_delete), and schedules an unref after the next RCU grace period. At the next RCU grace period the BH will not be running, thus it is safe to finalize the object. I have two question: 1st, who trigger qdev_free? //Not figuring out the total design, but I feel it is quite different from kernel's design, where refcnt-0, then exit is called. 2nd, _delete_bh() means even there is any not-severed request, the request will be canceled. Is it legal, and will not lose data(not sure, since I do not know who will call qdev_free)? Thxregards, Pingfan Paolo And as explained, only two places can hold the _bh_delete(). In a short word, with rcu, we need to constrain the calling idiom of bh, i.e., putting them in reclaimer. On the other hand, my patch try to leave the calling idiom as it is, and handle this issue inside bh. Regards, Pingfan Paolo
Re: [Qemu-devel] [PATCH v5 05/14] vl: handle -device dimm
Il 26/06/2013 11:13, Hu Tao ha scritto: From: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- vl.c | 51 +++ 1 file changed, 51 insertions(+) diff --git a/vl.c b/vl.c index 767e020..9d88a79 100644 --- a/vl.c +++ b/vl.c @@ -170,6 +170,7 @@ int main(int argc, char **argv) #include ui/qemu-spice.h #include qapi/string-input-visitor.h +#include hw/mem-hotplug/dimm.h //#define DEBUG_NET //#define DEBUG_SLIRP @@ -252,6 +253,7 @@ static QTAILQ_HEAD(, FWBootEntry) fw_boot_order = int nb_numa_nodes; uint64_t node_mem[MAX_NODES]; unsigned long *node_cpumask[MAX_NODES]; +int nb_hp_dimms; uint8_t qemu_uuid[16]; @@ -2338,6 +2340,50 @@ static int chardev_init_func(QemuOpts *opts, void *opaque) return 0; } +static int dimmcfg_init_func(QemuOpts *opts, void *opaque) +{ +const char *driver; +const char *id; +uint64_t node, size; +uint32_t populated; +const char *buf, *busbuf; + +/* DimmDevice configuration needs to be known in order to initialize chipset + * with correct memory and pci ranges. But all devices are created after + * chipset / machine initialization. In * order to avoid this problem, we + * parse dimm information earlier into dimmcfg structs. */ + +driver = qemu_opt_get(opts, driver); +if (!strcmp(driver, dimm)) { + +id = qemu_opts_id(opts); +buf = qemu_opt_get(opts, size); +parse_option_size(size, buf, size, NULL); +buf = qemu_opt_get(opts, node); +parse_option_number(node, buf, node, NULL); +busbuf = qemu_opt_get(opts, bus); +buf = qemu_opt_get(opts, populated); +if (!buf) { +populated = 0; +} else { +populated = strcmp(buf, on) ? 0 : 1; +} + +dimm_config_create((char *)id, size, busbuf ? busbuf : membus.0, + node, nb_hp_dimms); + +/* if !populated, we just keep the config. The real device + * will be created in the future with a normal device_add + * command. */ +if (!populated) { +qemu_opts_del(opts); +} I think you need another option than -device dimm. For example it could be declared together with the NUMA node. This could declare two NUMA nodes, each with 2G of populated and 2G of unpopulated RAM: -numa node,mem-range=0-2G,mem-range-hotplug=4G-6G \ -numa node,mem-range=2G-4G,mem-range-hotplug=6G-8G I'm not sure I like the names particularly though. CCing Eduardo, Bandan and Wanlong Gao. Paolo +nb_hp_dimms++; +} + +return 0; +} + #ifdef CONFIG_VIRTFS static int fsdev_init_func(QemuOpts *opts, void *opaque) { @@ -4260,6 +4306,11 @@ int main(int argc, char **argv, char **envp) } qemu_add_globals(); +/* init generic devices */ +if (qemu_opts_foreach(qemu_find_opts(device), + dimmcfg_init_func, NULL, 1) != 0) { +exit(1); +} qdev_machine_init(); QEMUMachineInitArgs args = { .ram_size = ram_size,
Re: [Qemu-devel] [PATCH v5] QEMUBH: make AioContext's bh re-entrant
On Tue, Jun 25, 2013 at 5:40 PM, Kevin Wolf kw...@redhat.com wrote: Am 25.06.2013 um 19:26 hat Liu Ping Fan geschrieben: BH will be used outside big lock, so introduce lock to protect between the writers, ie, bh's adders and deleter. The lock only affects the writers and bh's callback does not take this extra lock. Note that for the same AioContext, aio_bh_poll() can not run in parallel yet. Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com - Please use exactly three dashes so that 'git am' recognises it as the end of the commit message. Sorry, I will notice at the next time. Should I re-post it? Thanks and regards, Pingfan This doesn't compile yet because smp_read_barrier_depends() isn't merged yet, so maybe there's still time for some nitpicking: Wouldn't using atomic_set/get better document things and make them easier to read? It should be correct anyway, so: Reviewed-by: Kevin Wolf kw...@redhat.com
Re: [Qemu-devel] [PATCH v3 2/2] net: introduce command to query rx-filter information
Amos Kong ak...@redhat.com writes: We want to implement mac programming over macvtap through Libvirt, related rx-filter configuration contains main mac, some of rx-mode and mac-table. The previous patch adds QMP event to notify management of rx-filter change. This patch adds a monitor command to query rx-filter information. A flag is used to avoid events flooding, if user don't query rx-filter after receives one event, new events won't be sent to qmp monitor. (qemu) info rx-filter vnet0 vnet0: \ promiscuous: on \ multicast: normal \ unicast: normal \ broadcast-allowed: off \ multicast-overflow: off \ unicast-overflow: off \ main-mac: 52:54:00:12:34:56 \ unicast-table: \ multicast-table: 01:00:5e:00:00:01 33:33:00:00:00:01 33:33:ff:12:34:56 Signed-off-by: Amos Kong ak...@redhat.com --- hmp-commands.hx | 2 ++ hmp.c | 49 hmp.h | 1 + hw/net/virtio-net.c | 93 + include/net/net.h | 2 ++ monitor.c | 8 + net/net.c | 47 +++ qapi-schema.json| 73 + qmp-commands.hx | 55 +++ 9 files changed, 330 insertions(+) diff --git a/hmp-commands.hx b/hmp-commands.hx index 9cea415..b7863eb 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1639,6 +1639,8 @@ show qdev device model list show roms @item info tpm show the TPM device +@item info rx-filter [net client name] +show the rx-filter information for all nics (or for the given nic) Humor me: spell it NICs and NIC, because it's an acronym. @end table ETEXI diff --git a/hmp.c b/hmp.c index 4fb76ec..5b47382 100644 --- a/hmp.c +++ b/hmp.c @@ -653,6 +653,55 @@ void hmp_info_tpm(Monitor *mon, const QDict *qdict) qapi_free_TPMInfoList(info_list); } +void hmp_info_rx_filter(Monitor *mon, const QDict *qdict) +{ +RxFilterInfoList *filter_list, *entry; +strList *str_entry; +bool has_name = qdict_haskey(qdict, name); +const char *name = NULL; + +if (has_name) { +name = qdict_get_str(qdict, name); +} + +filter_list = qmp_query_rx_filter(has_name, name, NULL); Rather roundabout way to do const char *name = qdict_get_str(qdict, name); filter_list = qmp_query_rx_filter(name != NULL, name); +entry = filter_list; + +while (entry) { Recommend to put the loop control in one place: for (entry = filter_list; entry; entry = entry-next) { +monitor_printf(mon, %s:\n, entry-value-name); +monitor_printf(mon, \\ promiscuous: %s\n, + entry-value-promiscuous ? on : off); +monitor_printf(mon, \\ multicast: %s\n, + RxState_lookup[entry-value-multicast]); +monitor_printf(mon, \\ unicast: %s\n, + RxState_lookup[entry-value-unicast]); +monitor_printf(mon, \\ broadcast-allowed: %s\n, + entry-value-broadcast_allowed ? on : off); +monitor_printf(mon, \\ multicast-overflow: %s\n, + entry-value-multicast_overflow ? on : off); +monitor_printf(mon, \\ unicast-overflow: %s\n, + entry-value-unicast_overflow ? on : off); +monitor_printf(mon, \\ main-mac: %s\n, entry-value-main_mac); + +str_entry = entry-value-unicast_table; +monitor_printf(mon, \\ unicast-table:\n); +while (str_entry) { +monitor_printf(mon, %s\n, str_entry-value); +str_entry = str_entry-next; +} + +str_entry = entry-value-multicast_table; +monitor_printf(mon, \\ multicast-table:\n); +while (str_entry) { +monitor_printf(mon, %s\n, str_entry-value); +str_entry = str_entry-next; +} + +entry = entry-next; +} +qapi_free_RxFilterInfoList(filter_list); +} + void hmp_quit(Monitor *mon, const QDict *qdict) { monitor_suspend(mon); diff --git a/hmp.h b/hmp.h index 95fe76e..9af733e 100644 --- a/hmp.h +++ b/hmp.h @@ -37,6 +37,7 @@ void hmp_info_balloon(Monitor *mon, const QDict *qdict); void hmp_info_pci(Monitor *mon, const QDict *qdict); void hmp_info_block_jobs(Monitor *mon, const QDict *qdict); void hmp_info_tpm(Monitor *mon, const QDict *qdict); +void hmp_info_rx_filter(Monitor *mon, const QDict *qdict); void hmp_quit(Monitor *mon, const QDict *qdict); void hmp_stop(Monitor *mon, const QDict *qdict); void hmp_system_reset(Monitor *mon, const QDict *qdict); diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 1ea9556..f93e021 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -21,6 +21,8 @@ #include hw/virtio/virtio-net.h #include net/vhost_net.h #include hw/virtio/virtio-bus.h +#include qapi/qmp/qjson.h
Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
Il 26/06/2013 11:44, liu ping fan ha scritto: On Wed, Jun 26, 2013 at 4:38 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 26/06/2013 10:20, liu ping fan ha scritto: On the other hand, pushing _delete() out of finalization path is not easy, since we do not what time the DeviceState has done with its bh. See above: - if the BH will run in the iothread, the BH is definitely not running (because the BH runs under the BQL, and the reclaimer owns it) It works for the case in which the DeviceState's reclaimer calls _bh_delete(). But in another case(also exist in current code), where we call _bh_delete() in callback, the bh will be scheduled, and oops! But you may know that the BH is not scheduled after removal, too. No, the removal can run in parallel with the other mmio-dispatch which can resort to schedule a bh. But then behavior is more or less undefined. Of course the device must ensure that something sane happens, but that's the responsibility of the device, not of the generic layer. I.e, the removal can not call _bh_delete(), so it do not know whether bh will be scheduled or not. It can call qemu_bh_delete(), then all concurrent qemu_bh_schedule() will be no-ops. If you look at the memory work, for example, the owner patches happen to be useful for BQL-less dispatch too, but they are solving existing hot-unplug bugs. Oh, I will read it again, since I had thought the owner is only used for the purpose of refcnt. It is. But it goes beyond BQL-less dispatch. Anything that gives away the BQL between a map and unmap (including asynchronous I/O) needs an owner. We have ignored that until now because we do not have memory unplug. What we need is separation of removal and reclamation. Without that, any effort to make things unplug-safe is going to be way way more complicated than necessary. Agree, but when I tried for bh, I found the separation of removal and reclamation are not easy. We can not _bh_delete() in acpi_piix_eject_slot(), because mmio-dispatch can schedule a bh at the same time. acpi_piix_eject_slot() is removal, not reclamation. The plan there is that qdev_free calls the exit callback immediately (which can do qemu_bh_delete), and schedules an unref after the next RCU grace period. At the next RCU grace period the BH will not be running, thus it is safe to finalize the object. I have two question: 1st, who trigger qdev_free? //Not figuring out the total design, but I feel it is quite different from kernel's design, where refcnt-0, then exit is called. qdev_free is triggered by the guest, but free is a misnomer. It is really make it inaccessible from the guest and management (the kernel equivalent would be removal of /dev and /sys entries, for example). The actual free will happen later. 2nd, _delete_bh() means even there is any not-severed request, the request will be canceled. Is it legal, and will not lose data(not sure, since I do not know who will call qdev_free)? That's something that should be ensured by the device. For example it is not a problem to cancel virtio-net's tx_bh. If it is a problem, the device must do something else. It could even be a bdrv_drain_all() in the worst case. Paolo
Re: [Qemu-devel] [PATCH for mst/pci] output nc-name in NIC_RX_FILTER_CHANGED event
Amos Kong ak...@redhat.com writes: netclient 'name' entry in event is useful for management to know which device is changed. n-netclient_name is not always set. This patch changes to use nc-name. If we don't assign 'id', qemu will set a generated name to nc-name. Signed-off-by: Amos Kong ak...@redhat.com --- hw/net/virtio-net.c | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index c88403a..e4d9752 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -200,14 +200,9 @@ static void rxfilter_notify(NetClientState *nc) VirtIONet *n = qemu_get_nic_opaque(nc); if (nc-rxfilter_notify_enabled) { -if (n-netclient_name) { -event_data = qobject_from_jsonf({ 'name': %s, 'path': %s }, -n-netclient_name, - object_get_canonical_path(OBJECT(n-qdev))); -} else { -event_data = qobject_from_jsonf({ 'path': %s }, - object_get_canonical_path(OBJECT(n-qdev))); -} +event_data = qobject_from_jsonf({ 'name': %s, 'path': %s }, + nc-name, + object_get_canonical_path(OBJECT(n-qdev))); monitor_protocol_event(QEVENT_NIC_RX_FILTER_CHANGED, event_data); qobject_decref(event_data); Is this on top of [PATCH v3 0/2] mac programming over macvtap? Yes, qdev IDs are optional, and therefore can serve as reliable identifier only when the user / management application always specifies one, and even then, you're still screwed for auto-created devices. Easily avoided for NICs, but yes, the problem is real. However, I don't agree with the solution use NetCientState name, because that's too ad hoc, and not general. Can we please use QOM paths?
Re: [Qemu-devel] [PATCH for mst/pci] output nc-name in NIC_RX_FILTER_CHANGED event
Amos Kong ak...@redhat.com writes: On Mon, Jun 24, 2013 at 02:34:59PM +0800, Amos Kong wrote: netclient 'name' entry in event is useful for management to know which device is changed. n-netclient_name is not always set. This patch changes to use nc-name. If we don't assign 'id', qemu will set a generated name to nc-name. IRC: mst akong, what do other events include? name or id? I just checked QMP/qmp-event.txt, they all use 'device name'. (eg: BLOCK_IO_ERROR, DEVICE_DELETED, DEVICE_TRAY_MOVED, BLOCK_JOB_*) If we assign 'id' for -device, device name will be set to id. Otherwise, a generated device name will set to some device. DEVICE_DELETED uses device (the qdev ID) and path (the QOM path). For reasons I don't understand, it sets path only when the device has no qdev ID. That should be cleaned up.
Re: [Qemu-devel] [PATCH 1/2] device_tree: Add qemu_devtree_setprop_sized_cells() utility function
On 26.06.2013, at 10:49, Peter Maydell wrote: On 26 June 2013 00:38, David Gibson da...@gibson.dropbear.id.au wrote: On Mon, Jun 24, 2013 at 12:02:39PM +0100, Peter Maydell wrote: On 24 June 2013 11:56, Alexander Graf ag...@suse.de wrote: This looks pretty complicated for something actually quite simple: All you want is to pass in a number of 64bit values, rather than 32bit ones, right? Nope. If the device tree blob says #address-cells is 1 and #size-cells is 1, then I want to pass in values to go in 32 bit cells. If it says #address-cells is 2 but #size-cells is still 1, then I want to pass in a value for a 64 bit cell then one for a 32 bit cell. If they're both 2 then I want to pass in values for two 64 bit cells. Hmm.. the property certainly needs to be constructed that way. But I think Alex's point is that you could make the arguments all 64-bit, and then truncate them in the generated property. Er, the arguments *are* all 64 bits and truncated in the generated property: + * @...: 0-terminated list of uint32_t number-of-cells, uint64_t value pairs There's a bigger problem, though, that exists with both versions. In general people expect integer arguments like this not to care too much about the exact integer type, because it will be promoted to the right thing. Except with varargs it won't. So if you ever have a notionally 64-bit address that happens to fit in a 32-bit variable and you pass that it, this function will be broken. And the worst of it is, it'll work most of the time, until you happen to hit the wrong ABI and parameter combination :(. Do you have a suggested improvement to the API to avoid this? I think it makes sense to make this API special-purpose for reg. We currently have a generic put any number of 32bit values into the property function (qemu_devtree_setprop_cells). Can't we also just add a qemu_devtree_setprop_reg() that walks the tree downwards in search for #address-cells and #size-cells and assembles the correct reg property from a list of 64bit arguments? qemu_devtree_setprop_reg(fdt, /foo/bar, region[0].offset, region[0].size, region[1].offset, region[1].size); Maybe we could even make it more generic to also cover ranges. Alex
Re: [Qemu-devel] [Xen-devel] [PATCH] Add Xen platform PCI device version 2.
On Wed, 2013-06-19 at 12:31 +0100, Paul Durrant wrote: That's correct. If a vendor wishes to ship a new driver for an existing piece of h/w they just post it. However, at some point the vendor will sell a new piece of h/w which requires a driver that will not work with older h/w. So, they update their device id or revision to something new, leave the old driver on windows update alone, and post a new driver that will only bind to the new h/w. Right, but that is not analogous to this change. This change is more akin to the driver folks going to the hardware guys and saying we've ended up painted into a bit of corner, the easiest way to get out of it is for you guys to reissue the exact same hardware (ASIC?) with a different device id. I can imagine the hardware guys would be thrilled with that prospect! This is model we have followed in XenServer: the platform device represents 'the set of PV drivers' You may have followed that model in XenServer but it has never been the case for upstream. The platform device simply provides an end point for enabling/triggering baseline Xen detection/support. The set of PV drivers (in so far as such a thing even exists in the mix and match world of upstream Xen) is represented via the xenbus entries for the devices and the associated feature flags and the like. and therefore to ship a new and non-backwards-compatible set of PV drivers we incremented the platform device id. That fits with the way that Windows update works and seems like a totally reasonable way to think of the platform device IMO. Clearly my opinion, and the reality of Windows Update, are somewhat contentious :-( What's contentious IMHO is adding hacks to the Xen (or qemu) code base to workaround an issue which is entirely internal to the guest Windows update/driver bindings. Ian.
Re: [Qemu-devel] [Xen-devel] [PATCH] Add Xen platform PCI device version 2.
On Thu, 2013-06-20 at 09:56 +0100, Tim Deegan wrote: At 07:47 + on 20 Jun (1371714432), Paul Durrant wrote: I agree. If this is really the only solution, we would need to have both versions presented to the guest so that old drivers continue to work without any intervention. I suspect that if we expose both, both sets of drivers try to run the same PV connections, and hilarity ensues. Actually I think I can make that work, and it is the conclusion I came to after Alex's comment. Ah, nice! In that case, I'm a lot less worried -- we can just expose both versions/devices by default and there's no need for a visible control knob tied to driver version (except maybe for debugging). It means an 'unsupported' device appearing on other/older OSes, which is unfortunate, but ISTR only Windows really complains visibly about that. I'm not at all convinced this is a good approach. Are we going to add a third, forth and fifth device whenever Linux, BSD, $other-OS paint themselves into a corner somehow WRT their internal driver model vs their Xen PV drivers? AFAIK the Citrix PV drivers have never been formally supported on anything other than XenServer and XCP (and I'm not sure about formally for XCP), so this is really an issue of supporting upgrades for people running those. I think rather than making hacks upstream, which will effectively need to be supported forever, the hack should be done on the XenServer side and take advantage of whatever the supported upgrade path is (N+1 or N+2 or whatever). This way the hack can eventually go away. For anyone who grabbed the older drivers and used them outside of the context of XenServer or XCP this is a documentation/awareness issue. Can we use the blacklisting functionality of the PV unplug protocol to blacklist previous versions of the Citrix PV drivers? I wouldn't consider this an unsuitable thing to do in upstream, in fact it would be using it for exactly the purpose for which it was designed. As long as this is sufficient to boot with emulated devices in order to switch to the newer drivers that should be good enough. Ian.
Re: [Qemu-devel] [PATCH v4 04/10] Add a script to extract VSS SDK headers on POSIX system
On 06/25/13 17:01, Paolo Bonzini wrote: Il 25/06/2013 17:01, Laszlo Ersek ha scritto: On 06/06/13 17:06, Tomoki Sekiyama wrote: +if ! command -v msiextract /dev/null; then + echo 'msiextract not found. Please install msitools.' 2 + exit 1 +fi (This is not a review comment -- I'm trying to test it:) What msiextract version (and dependencies, like libgcab, libgsf etc) are you using? I'm unable to extract vsssdk.msi; msiextract spews a bunch of assertion failures. 2e39646b7850a12673bc66ade85fece3 setup.exe 433eb024ed0c669dd1563d952ca41091 vsssdk.msi My versions (RHEL-6.4.z distro): - msitools-0.92 (built from source) - gcab-0.4 (built from source) - libgsf-1.14.15-5.el6 (patch in [1] doesn't seem to help, it only changes the kinds of asserts that fail) - glib2-2.22.5-7.el6 - libuuid-2.17.2-12.9.el6_4.3 The attached patch may help building a newer libgsf on RHEL6. It did help building it, however the runtime error has stayed. I'm starting to worry this is a more fundamental issue, namely in glib. The first errors I'm getting on the msiextract stderr are: (msiextract:28858): GLib-GObject-WARNING **: invalid cast from `GDataInputStream' to `GSeekable' (msiextract:28858): GLib-GIO-CRITICAL **: g_seekable_seek: assertion `G_IS_SEEKABLE (seekable)' failed (The second clearly being a consequence of the first.) Problem is, the first warning is wrong. According to the current, live glib docs https://developer.gnome.org/gio/stable/GDataInputStream.html, GDataInputStream implements GSeekable. The NEWS file in glib states, Overview of changes from GLib 2.32.1 to 2.33.1 == [...] * GIO: - implement GSeekable for the data and buffered stream classes Commits: git log --oneline --reverse 2.32.1..2.33.1 | grep -i seekable 90739ba Make GBufferedInputStream implement GSeekable 43895e3 Make GBufferedOutputStream implement GSeekable a44e801 Make GDataOutputStream implement GSeekable Bug: https://bugzilla.gnome.org/show_bug.cgi?id=673034 GDataOutputStream and GBufferedOutputStream are both direct descendants of GFilterOutputStream, and they both implement the GSeekable interface in isolation. However GDataInputStream is derived from GBufferedInputStream (it the former is not the sibling of the latter, as for output). GBufferedInputStream was made to implement GSeekable explicitly, and GDataInputStream gets it via inheritance. (In 2.33.1, that is.) I guess I'll extract the MSI once on F19 and stash the contents... Thanks! Laszlo
Re: [Qemu-devel] [PATCH 1/2] device_tree: Add qemu_devtree_setprop_sized_cells() utility function
On 26 June 2013 11:31, Alexander Graf ag...@suse.de wrote: I think it makes sense to make this API special-purpose for reg. We currently have a generic put any number of 32bit values into the property function (qemu_devtree_setprop_cells). Yes, but that doesn't work for things that aren't simple arrays of 32 bit values, so I think that a generic way to deal with those too would be useful. If you wanted to write a ranges property you'd need this too, so it doesn't just apply to reg. I think we could avoid the varargs doesn't promote problem by using a varargs-macro wrapper: #define qemu_devtree_setprop_sized_cells(fdt, node, prop, ...) \ do { \ uint64_t args[] = { __VA_ARGS__ }; \ do_qemu_devtree_setprop_sized_cells(fdt, node, prop, \ args, sizeof(args)); } while (0) which will promote everything (including the size arguments, harmlessly) to uint64_t, and avoids having a varargs function. Can't we also just add a qemu_devtree_setprop_reg() that walks the tree downwards in search for #address-cells and #size-cells and assembles the correct reg property from a list of 64bit arguments? Do we have an actual use for this? It seems pretty complicated. I would expect that in practice there are two major use cases: (a) create your own fdt from scratch (in which case you can just make everything 64 bits and in any case will know when creating nodes what the #address-cells etc are) (b) modify an existing fdt, in which case you definitely don't want to go poking around too deeply in the tree; anything more than just put an extra node in the root is starting to get pretty chancy. thanks -- PMM
Re: [Qemu-devel] [Xen-devel] [PATCH] Add Xen platform PCI device version 2.
-Original Message- From: Ian Campbell Sent: 26 June 2013 11:40 To: Tim (Xen.org) Cc: Paul Durrant; Matt Wilson; Alex Bligh; xen-de...@lists.xen.org; qemu- de...@nongnu.org Subject: Re: [Xen-devel] [Qemu-devel] [PATCH] Add Xen platform PCI device version 2. On Thu, 2013-06-20 at 09:56 +0100, Tim Deegan wrote: At 07:47 + on 20 Jun (1371714432), Paul Durrant wrote: I agree. If this is really the only solution, we would need to have both versions presented to the guest so that old drivers continue to work without any intervention. I suspect that if we expose both, both sets of drivers try to run the same PV connections, and hilarity ensues. Actually I think I can make that work, and it is the conclusion I came to after Alex's comment. Ah, nice! In that case, I'm a lot less worried -- we can just expose both versions/devices by default and there's no need for a visible control knob tied to driver version (except maybe for debugging). It means an 'unsupported' device appearing on other/older OSes, which is unfortunate, but ISTR only Windows really complains visibly about that. I'm not at all convinced this is a good approach. Are we going to add a third, forth and fifth device whenever Linux, BSD, $other-OS paint themselves into a corner somehow WRT their internal driver model vs their Xen PV drivers? Is there any harm in having a separate device for each OS's PV drivers? I agree it's not entirely elegant, but at least it allows for revision control when you need it. AFAIK the Citrix PV drivers have never been formally supported on anything other than XenServer and XCP (and I'm not sure about formally for XCP), so this is really an issue of supporting upgrades for people running those. I think rather than making hacks upstream, which will effectively need to be supported forever, the hack should be done on the XenServer side and take advantage of whatever the supported upgrade path is (N+1 or N+2 or whatever). This way the hack can eventually go away. For anyone who grabbed the older drivers and used them outside of the context of XenServer or XCP this is a documentation/awareness issue. Can we use the blacklisting functionality of the PV unplug protocol to blacklist previous versions of the Citrix PV drivers? I wouldn't consider this an unsuitable thing to do in upstream, in fact it would be using it for exactly the purpose for which it was designed. As long as this is sufficient to boot with emulated devices in order to switch to the newer drivers that should be good enough. We could blacklist all existing Citrix PV drivers in upstream QEMU, to avoid the clash, but that seems like a very unfriendly approach. Also, it's not going to stop someone with an existing VM, who happens to be using legacy Citrix PV drivers (an AWS VM for instance) receiving a driver from Windows Update that will blue-screen their VM on next reboot. Hence the only way forward is to bind the new drivers to something new, that we can control, so we know what driver a VM is going to get from Windows Update. And we may indeed need to modify its revision in future so that we can retire old sets of PV drivers and replace them with new ones, but only for newer XenServer releases. Thus, I also propose to make the PCI revision of the new device a command line parameter. Paul
[Qemu-devel] [PATCH 2/2] fbdev: add monitor commands to enable/disable/query
This patch adds a fbdev monitor command to enable/disable the fbdev display at runtime to both qmp and hmp. qmp: framebuffer-display enable=on|off scale=on|off device=/dev/fbn hmp: framebuffer-display on|off There is also a query-framebuffer command for qmp. Signed-off-by: Gerd Hoffmann kra...@redhat.com --- hmp-commands.hx | 14 ++ hmp.c|9 + hmp.h|1 + include/ui/console.h |1 + qapi-schema.json | 43 +++ qmp-commands.hx | 12 qmp.c| 31 +++ ui/fbdev.c | 20 8 files changed, 131 insertions(+) diff --git a/hmp-commands.hx b/hmp-commands.hx index 915b0d1..283106d 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1563,7 +1563,21 @@ STEXI @findex qemu-io Executes a qemu-io command on the given block device. +ETEXI + +{ +.name = framebuffer-display, +.args_type = enable:b, +.params = on|off, +.help = enable/disable linux console framebuffer display, +.mhandler.cmd = hmp_framebuffer_display, +}, + +STEXI +@item framebuffer-display on | off +@findex framebuffer-display +enable/disable linux console framebuffer display. ETEXI { diff --git a/hmp.c b/hmp.c index 494a9aa..55f195f 100644 --- a/hmp.c +++ b/hmp.c @@ -1464,3 +1464,12 @@ void hmp_qemu_io(Monitor *mon, const QDict *qdict) hmp_handle_error(mon, err); } + +void hmp_framebuffer_display(Monitor *mon, const QDict *qdict) +{ +int enable = qdict_get_bool(qdict, enable); +Error *errp = NULL; + +qmp_framebuffer_display(enable, false, false, false, NULL, errp); +hmp_handle_error(mon, errp); +} diff --git a/hmp.h b/hmp.h index 56d2e92..c3a48e4 100644 --- a/hmp.h +++ b/hmp.h @@ -86,5 +86,6 @@ void hmp_nbd_server_stop(Monitor *mon, const QDict *qdict); void hmp_chardev_add(Monitor *mon, const QDict *qdict); void hmp_chardev_remove(Monitor *mon, const QDict *qdict); void hmp_qemu_io(Monitor *mon, const QDict *qdict); +void hmp_framebuffer_display(Monitor *mon, const QDict *qdict); #endif diff --git a/include/ui/console.h b/include/ui/console.h index 71b538a..5a9207d 100644 --- a/include/ui/console.h +++ b/include/ui/console.h @@ -311,6 +311,7 @@ void sdl_display_init(DisplayState *ds, int full_screen, int no_frame); /* fbdev.c */ int fbdev_display_init(const char *device, bool scale, Error **err); void fbdev_display_uninit(void); +FramebufferInfo *framebuffer_info(void); /* cocoa.m */ void cocoa_display_init(DisplayState *ds, int full_screen); diff --git a/qapi-schema.json b/qapi-schema.json index 6cc07c2..715dc1f 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -3608,3 +3608,46 @@ '*cpuid-input-ecx': 'int', 'cpuid-register': 'X86CPURegister32', 'features': 'int' } } + +## +# @framebuffer-display: +# +# Enable/disable linux console framebuffer display. +# +# @enable: whenever the framebuffer display should be enabled or disabled. +# +# @scale: #optional enables display scaling, default: off +# +# @device: #optional specifies framebuffer device, default: /dev/fb0 +# +# Returns: Nothing. +# +# Since: 1.6 +# +## +{ 'command': 'framebuffer-display', 'data': {'enable' : 'bool', + '*scale' : 'bool', + '*device' : 'str' } } + +## +# @FramebufferInfo: +# +# Since 1.6 +## +{ 'type': 'FramebufferInfo', + 'data': { 'enabled': 'bool', +'*scale' : 'bool', +'*device': 'str', +'*vtno' : 'int' } } + +## +# @query-framebuffer: +# +# Query linux console framebuffer state. +# +# Returns: FramebufferInfo. +# +# Since: 1.6 +# +## +{ 'command': 'query-framebuffer', 'returns': 'FramebufferInfo' } diff --git a/qmp-commands.hx b/qmp-commands.hx index 8cea5e5..e0661f0 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -2997,3 +2997,15 @@ Example: - { return: {} } EQMP + +{ +.name = framebuffer-display, +.args_type = enable:b,scale:b?,device:s?, +.mhandler.cmd_new = qmp_marshal_input_framebuffer_display, +}, + +{ +.name = query-framebuffer, +.args_type = , +.mhandler.cmd_new = qmp_marshal_input_query_framebuffer, +}, diff --git a/qmp.c b/qmp.c index 4c149b3..b6d826c 100644 --- a/qmp.c +++ b/qmp.c @@ -404,6 +404,37 @@ void qmp_change(const char *device, const char *target, } } +void qmp_framebuffer_display(bool enable, + bool has_scale, bool scale, + bool has_device, const char *device, + Error **errp) +{ +#if defined(CONFIG_FBDEV) +if (enable) { +if (fbdev_display_init(has_device ? device : NULL, + has_scale ? scale : false, + errp) != 0) {
[Qemu-devel] [PATCH 1/2] fbdev: add linux framebuffer display driver.
Display works, requires truecolor framebuffer with 16 or 32 bpp on the host. 32bpp is recommended. The framebuffer is used as-is, qemu doesn't try to switch modes. With LCD displays mode switching is pretty pointless IMHO, also it wouldn't work anyway with the most common fbdev drivers (vesafb, KMS). Guest display is centered on the host screen. Mouse works, uses /dev/input/mice. Keyboard works. Guest screen has whatever keymap you load inside the guest. Text windows (monitor, serial, ...) have a simple en-us keymap. Good enough to type monitor commands. Not goot enough to work seriously on a serial terminal. But the qemu terminal emulation isn't good enough for that anyway ;) Hot keys: Ctrl-Alt-Fnr - host console switching. Ctrl-Alt-nr - qemu console switching. Ctrl-Alt-S - toggle display scaling. Ctrl-Alt-ESC- exit qemu. Special feature: Sane console switching. Switching away stops screen updates. Switching back redraws the screen. When started from the linux console qemu uses the vt you've started it from (requires just read/write access to /dev/fb0). When starting from somewhere else qemu tries to open a unused virtual terminal and switch to it (usually requires root privileges to open /dev/ttynr). Signed-off-by: Gerd Hoffmann kra...@redhat.com --- configure | 12 + include/sysemu/sysemu.h |1 + include/ui/console.h|4 + qemu-options.hx |4 +- trace-events| 14 + ui/Makefile.objs|1 + ui/fbdev.c | 1164 +++ ui/linux-keynames.h | 388 vl.c| 21 + 9 files changed, 1608 insertions(+), 1 deletion(-) create mode 100644 ui/fbdev.c create mode 100644 ui/linux-keynames.h diff --git a/configure b/configure index 0e0adde..391e645 100755 --- a/configure +++ b/configure @@ -158,6 +158,7 @@ fdt= nptl= pixman= sdl= +fbdev=no virtfs= vnc=yes sparse=no @@ -549,6 +550,7 @@ Haiku) kvm=yes vhost_net=yes vhost_scsi=yes + fbdev=yes if [ $cpu = i386 -o $cpu = x86_64 ] ; then audio_possible_drivers=$audio_possible_drivers fmod fi @@ -700,6 +702,10 @@ for opt do ;; --enable-qom-cast-debug) qom_cast_debug=yes ;; + --disable-fbdev) fbdev=no + ;; + --enable-fbdev) fbdev=yes + ;; --disable-virtfs) virtfs=no ;; --enable-virtfs) virtfs=yes @@ -1051,6 +1057,8 @@ echo --disable-sdldisable SDL echo --enable-sdl enable SDL echo --disable-gtkdisable gtk UI echo --enable-gtk enable gtk UI +echo --disable-fbdev disable linux framebuffer +echo --enable-fbdev enable linux framebuffer echo --disable-virtfs disable VirtFS echo --enable-virtfs enable VirtFS echo --disable-vncdisable VNC @@ -3492,6 +3500,7 @@ fi echo pixman$pixman echo SDL support $sdl echo GTK support $gtk +echo fbdev support $fbdev echo curses support$curses echo curl support $curl echo mingw32 support $mingw32 @@ -3722,6 +3731,9 @@ if test $sdl = yes ; then echo CONFIG_SDL=y $config_host_mak echo SDL_CFLAGS=$sdl_cflags $config_host_mak fi +if test $fbdev = yes ; then + echo CONFIG_FBDEV=y $config_host_mak +fi if test $cocoa = yes ; then echo CONFIG_COCOA=y $config_host_mak fi diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h index 2fb71af..5922311 100644 --- a/include/sysemu/sysemu.h +++ b/include/sysemu/sysemu.h @@ -91,6 +91,7 @@ typedef enum DisplayType DT_CURSES, DT_SDL, DT_GTK, +DT_FBDEV, DT_NOGRAPHIC, DT_NONE, } DisplayType; diff --git a/include/ui/console.h b/include/ui/console.h index 98edf41..71b538a 100644 --- a/include/ui/console.h +++ b/include/ui/console.h @@ -308,6 +308,10 @@ void register_vc_handler(VcHandler *handler); /* sdl.c */ void sdl_display_init(DisplayState *ds, int full_screen, int no_frame); +/* fbdev.c */ +int fbdev_display_init(const char *device, bool scale, Error **err); +void fbdev_display_uninit(void); + /* cocoa.m */ void cocoa_display_init(DisplayState *ds, int full_screen); diff --git a/qemu-options.hx b/qemu-options.hx index ca6fdf6..259e76f 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -804,7 +804,7 @@ ETEXI DEF(display, HAS_ARG, QEMU_OPTION_display, -display sdl[,frame=on|off][,alt_grab=on|off][,ctrl_grab=on|off]\n -[,window_close=on|off]|curses|none|\n +[,window_close=on|off]|curses|fbdev|none|\n vnc=display[,optargs]\n select display type\n, QEMU_ARCH_ALL) STEXI @@ -822,6 +822,8 @@ support a text mode, QEMU can display this output using a curses/ncurses interface. Nothing is displayed when the graphics device is in graphical mode or if the graphics device does not support a text mode. Generally only the VGA device models support text mode. +@item fbdev +Display
Re: [Qemu-devel] [PATCH 1/2] device_tree: Add qemu_devtree_setprop_sized_cells() utility function
On 26.06.2013, at 12:50, Peter Maydell wrote: On 26 June 2013 11:31, Alexander Graf ag...@suse.de wrote: I think it makes sense to make this API special-purpose for reg. We currently have a generic put any number of 32bit values into the property function (qemu_devtree_setprop_cells). Yes, but that doesn't work for things that aren't simple arrays of 32 bit values, so I think that a generic way to deal with those too would be useful. If you wanted to write a ranges property you'd need this too, so it doesn't just apply to reg. I think we could avoid the varargs doesn't promote problem by using a varargs-macro wrapper: #define qemu_devtree_setprop_sized_cells(fdt, node, prop, ...) \ do { \ uint64_t args[] = { __VA_ARGS__ }; \ do_qemu_devtree_setprop_sized_cells(fdt, node, prop, \ args, sizeof(args)); } while (0) which will promote everything (including the size arguments, harmlessly) to uint64_t, and avoids having a varargs function. That would work, yes :). Can't we also just add a qemu_devtree_setprop_reg() that walks the tree downwards in search for #address-cells and #size-cells and assembles the correct reg property from a list of 64bit arguments? Do we have an actual use for this? It seems pretty complicated. I would expect that in practice there are two major use cases: (a) create your own fdt from scratch (in which case you can just make everything 64 bits and in any case will know when creating nodes what the #address-cells etc are) (b) modify an existing fdt, in which case you definitely don't want to go poking around too deeply in the tree; anything more than just put an extra node in the root is starting to get pretty chancy. Well, though I do agree it would mimic exactly what the interpreter will do when reading those values, ensuring consistency, no? Alex
Re: [Qemu-devel] [Xen-devel] [PATCH] Add Xen platform PCI device version 2.
On Wed, 2013-06-26 at 12:23 +0100, Paul Durrant wrote: -Original Message- From: Ian Campbell Sent: 26 June 2013 11:40 To: Tim (Xen.org) Cc: Paul Durrant; Matt Wilson; Alex Bligh; xen-de...@lists.xen.org; qemu- de...@nongnu.org Subject: Re: [Xen-devel] [Qemu-devel] [PATCH] Add Xen platform PCI device version 2. On Thu, 2013-06-20 at 09:56 +0100, Tim Deegan wrote: At 07:47 + on 20 Jun (1371714432), Paul Durrant wrote: I agree. If this is really the only solution, we would need to have both versions presented to the guest so that old drivers continue to work without any intervention. I suspect that if we expose both, both sets of drivers try to run the same PV connections, and hilarity ensues. Actually I think I can make that work, and it is the conclusion I came to after Alex's comment. Ah, nice! In that case, I'm a lot less worried -- we can just expose both versions/devices by default and there's no need for a visible control knob tied to driver version (except maybe for debugging). It means an 'unsupported' device appearing on other/older OSes, which is unfortunate, but ISTR only Windows really complains visibly about that. I'm not at all convinced this is a good approach. Are we going to add a third, forth and fifth device whenever Linux, BSD, $other-OS paint themselves into a corner somehow WRT their internal driver model vs their Xen PV drivers? Is there any harm in having a separate device for each OS's PV drivers? I agree it's not entirely elegant, but at least it allows for revision control when you need it. AFAIK the Citrix PV drivers have never been formally supported on anything other than XenServer and XCP (and I'm not sure about formally for XCP), so this is really an issue of supporting upgrades for people running those. I think rather than making hacks upstream, which will effectively need to be supported forever, the hack should be done on the XenServer side and take advantage of whatever the supported upgrade path is (N+1 or N+2 or whatever). This way the hack can eventually go away. For anyone who grabbed the older drivers and used them outside of the context of XenServer or XCP this is a documentation/awareness issue. Can we use the blacklisting functionality of the PV unplug protocol to blacklist previous versions of the Citrix PV drivers? I wouldn't consider this an unsuitable thing to do in upstream, in fact it would be using it for exactly the purpose for which it was designed. As long as this is sufficient to boot with emulated devices in order to switch to the newer drivers that should be good enough. We could blacklist all existing Citrix PV drivers in upstream QEMU, to avoid the clash, but that seems like a very unfriendly approach. Also, it's not going to stop someone with an existing VM, who happens to be using legacy Citrix PV drivers (an AWS VM for instance) receiving a driver from Windows Update that will blue-screen their VM on next reboot. Hence the only way forward is to bind the new drivers to something new, that we can control, so we know what driver a VM is going to get from Windows Update. Is it not also possible to issue a new version of the old driver via Windows update, e.g. a stunt one which just disables itself? Or to have the new one somehow reach out and nobble the old one. Or have the new one detect the presence of the old one and refuse to install until it has been removed? And we may indeed need to modify its revision in future so that we can retire old sets of PV drivers and replace them with new ones, but only for newer XenServer releases. Thus, I also propose to make the PCI revision of the new device a command line parameter. So this ugliness is really just the thin end of the wedge and all users of these drivers in the future are going to need to be educated on which magic qemu option they need to go with the version of the driver they are running? And when they get an update they might need to go into their configuration and change it or else risk a blue screen on the next reboot? That sounds like madness to me... It may be acceptable for XenServer to tie versions of the PV drivers to specific versions of XenServer but I'm afraid that is not acceptable upstream, we can't just go around willy nilly breaking compatibility between front and backends, this is why we have feature flags, negotiation and fallbacks. It would be one thing to accept this unfortunate event and take a one time hack to dig you out of a hole. But in that case it would really need to be the case that the new version of the drivers are designed with sufficient future proofing mechanisms that any future changes can be handled internally. It seems to me like you intend to treat this mechanism as an ongoing deliberate mechanism rather than a one off fix for a historical mistake. Ian.
Re: [Qemu-devel] [PATCH v2 1/2] ps2: add support of auto-repeat
Luiz Capitulino lcapitul...@redhat.com writes: On Fri, 14 Jun 2013 13:46:41 +0800 Amos Kong ak...@redhat.com wrote: On Fri, May 31, 2013 at 08:31:17PM +0800, Amos Kong wrote: On Thu, May 30, 2013 at 11:48:46AM -0500, Anthony Liguori wrote: Amos Kong ak...@redhat.com writes: diff --git a/hw/input/ps2.c b/hw/input/ps2.c index 3412079..8adbb4a 100644 --- a/hw/input/ps2.c +++ b/hw/input/ps2.c @@ -94,6 +94,10 @@ typedef struct { int translate; int scancode_set; /* 1=XT, 2=AT, 3=PS/2 */ int ledstate; +int repeat_period; /* typematic period, ms */ +int repeat_delay; /* typematic delay, ms */ +int repeat_key; /* keycode to repeat */ +QEMUTimer *repeat_timer; This state needs to be migrated, no? I suspect it can/should be done via a subsection too. It sounds only reasonable for 'sendkey' command. We want to repeat one key for 100 times, the key should be continaully repeated in the dest vm until it reaches to 100 times. For implement this, we should also migrate key_timer in ui/input.c, then it will send a release event to ps2 queue when the key_timer is expired. The bottom patch migrates repeat_timer repeat_key, where should we save key_timer for migration? Luiz, any suggestion about migrate the key_timer in ui/input.c? I don't have any. Maybe Markus or Juan can help (CC'ed). We need to migrate it, then sendkey can continually work in dest vm until the timer is expired. I read the thread, but I'm not sure I have enough context for a sensible answer. Let me try to piece it together. On a real PC keyboard, key down generates that key's make code, key up generates the key's break code. If the key is typematic, the make code repeats while the key is down. First repeat is after a programmable delay, subsequent repeats happen at a programmable rate. Which keys are typematic is programmable in scan code set 3, but we don't implement the commands to do so. Oh well, the world is full of crappy clone keyboards, this is just one more. What problem are you trying to solve? Your cover letter mentions the sendkey command. Takes an array of keys and an optional hold-time, defaulting to 100ms. Aside: array of keys + a single hold time is a rotten design. Outside the scope of this patch. 100ms is well below the smallest typematic delay, so by default no repeat happens. But if you specify a sufficiently large hold-time, it arguably should. Is that the problem you're trying to fix? Why is this problem worth fixing? Does your patch affect anything but the sendkey command? I'm asking because I'm not at all sure injecting emulated repeat between the user's keyboard and the guest OS is a good idea. I'd expect the user's keyboard to repeat according to the user's wishes already, and I'm concerned a second repeat could mess up things.
Re: [Qemu-devel] [Xen-devel] [PATCH] Add Xen platform PCI device version 2.
At 11:23 + on 26 Jun (1372245783), Paul Durrant wrote: We could blacklist all existing Citrix PV drivers in upstream QEMU, to avoid the clash, but that seems like a very unfriendly approach. Also, it's not going to stop someone with an existing VM, who happens to be using legacy Citrix PV drivers (an AWS VM for instance) receiving a driver from Windows Update that will blue-screen their VM on next reboot. Hence the only way forward is to bind the new drivers to something new, that we can control, so we know what driver a VM is going to get from Windows Update. I don't think you ever got an answer about whether this could be finagled using version numbers in the drivers. Also: would it not be possible to have a blkfront driver (even a trivial one) in your new bus driver so it can detect and avoid this problem? Or: have your bus driver detect when the blkfront driver is missing and not unplug the emulated disk? In fact wouldn't having the emulated disk driver do the unplug solve it for free? And we may indeed need to modify its revision in future so that we can retire old sets of PV drivers and replace them with new ones, but only for newer XenServer releases. Thus, I also propose to make the PCI revision of the new device a command line parameter. I'd rather not. This gets straight back to having host-admin controls that have to manually be matched to in-guest software. Tim.
[Qemu-devel] [PATCH] mac-io: Add escc-legacy memory alias region
Mac OS X's debugging serial driver accesses the ESCC through a different register layout, called escc-legacy. This layout differs from the normal escc register layout purely by the location of the respective registers. This patch adds a memory alias region that takes normal escc registers and maps them into the escc-legacy register space. With this patch applied, a Mac OS X guest successfully emits debug output on the serial port when run with debug parameters set, for example by running: $ qemu-system-ppc -prom-env -'boot-args=-v debug=0x8 io=0xff serial=0x3' \ -cdrom 10.4.iso -boot d Signed-off-by: Alexander Graf ag...@suse.de --- hw/misc/macio/macio.c | 47 +++ 1 file changed, 47 insertions(+) diff --git a/hw/misc/macio/macio.c b/hw/misc/macio/macio.c index 9ac004a..d9971e2 100644 --- a/hw/misc/macio/macio.c +++ b/hw/misc/macio/macio.c @@ -67,12 +67,59 @@ typedef struct NewWorldMacIOState { MACIOIDEState ide[2]; } NewWorldMacIOState; +/* + * The mac-io has two interfaces to the ESCC. One is called escc-legacy, + * while the other one is the normal, current ESCC interface. + * + * The magic below creates memory aliases to spawn the escc-legacy device + * purely by rerouting the respective registers to our escc region. This + * works because the only difference between the two memory regions is the + * register layout, not their semantics. + * + * Reference: ftp://ftp.software.ibm.com/rs6000/technology/spec/chrp/inwork/CHRP_IORef_1.0.pdf + */ +static void macio_escc_legacy_setup(MacIOState *macio_state) +{ +MemoryRegion *escc_legacy = g_new(MemoryRegion, 1); +MemoryRegion *bar = macio_state-bar; +int i; +static const int maps[] = { +0x00, 0x00, +0x02, 0x20, +0x04, 0x10, +0x06, 0x30, +0x08, 0x40, +0x0A, 0x50, +0x60, 0x60, +0x70, 0x70, +0x80, 0x70, +0x90, 0x80, +0xA0, 0x90, +0xB0, 0xA0, +0xC0, 0xB0, +0xD0, 0xC0, +0xE0, 0xD0, +0xF0, 0xE0, +}; + +memory_region_init(escc_legacy, escc-legacy, 256); +for (i = 0; i ARRAY_SIZE(maps); i += 2) { +MemoryRegion *port = g_new(MemoryRegion, 1); +memory_region_init_alias(port, escc-legacy-port, macio_state-escc_mem, + maps[i+1], 0x2); +memory_region_add_subregion(escc_legacy, maps[i], port); +} + +memory_region_add_subregion(bar, 0x12000, escc_legacy); +} + static void macio_bar_setup(MacIOState *macio_state) { MemoryRegion *bar = macio_state-bar; if (macio_state-escc_mem) { memory_region_add_subregion(bar, 0x13000, macio_state-escc_mem); +macio_escc_legacy_setup(macio_state); } } -- 1.8.1.4
Re: [Qemu-devel] [Xen-devel] [PATCH] Add Xen platform PCI device version 2.
-Original Message- From: Tim Deegan [mailto:t...@xen.org] Sent: 26 June 2013 12:58 To: Paul Durrant Cc: Ian Campbell; Matt Wilson; Alex Bligh; xen-de...@lists.xen.org; qemu- de...@nongnu.org Subject: Re: [Xen-devel] [Qemu-devel] [PATCH] Add Xen platform PCI device version 2. At 11:23 + on 26 Jun (1372245783), Paul Durrant wrote: We could blacklist all existing Citrix PV drivers in upstream QEMU, to avoid the clash, but that seems like a very unfriendly approach. Also, it's not going to stop someone with an existing VM, who happens to be using legacy Citrix PV drivers (an AWS VM for instance) receiving a driver from Windows Update that will blue-screen their VM on next reboot. Hence the only way forward is to bind the new drivers to something new, that we can control, so we know what driver a VM is going to get from Windows Update. I don't think you ever got an answer about whether this could be finagled using version numbers in the drivers. No, we thought about that and I don't believe it would be possible. Also: would it not be possible to have a blkfront driver (even a trivial one) in your new bus driver so it can detect and avoid this problem? Or: have your bus driver detect when the blkfront driver is missing and not unplug the emulated disk? In fact wouldn't having the emulated disk driver do the unplug solve it for free? The issue is the old s/w not the new s/w. The old drivers make assumptions about each other's presence as we can't change that because they are already out there. And we may indeed need to modify its revision in future so that we can retire old sets of PV drivers and replace them with new ones, but only for newer XenServer releases. Thus, I also propose to make the PCI revision of the new device a command line parameter. I'd rather not. This gets straight back to having host-admin controls that have to manually be matched to in-guest software. Well not really. This is just the same as a h/w vendor shipping a new device. The drivers for the old device are still there on Windows Update; so no change. The new drivers are for the new device so only download when it is present. Setting the revision number for the 'citrix pv bus' device would only be like choosing which emulated NIC you want. Paul
Re: [Qemu-devel] [PATCH v3 2/2] net: introduce command to query rx-filter information
Meh, I got confused and reviewed an out-of-date version. Hope it's not entirely noise.
[Qemu-devel] [PATCH v2 0/2] block: add drive_backup HMP command
These patches add the sync mode argument to drive-backup (just like drive-mirror) and then introduce the drive_backup HMP command. It's necessary to add the sync mode argument although only the 'full' is supported, since the drive_mirror HMP command defaults to sync mode 'top'. To avoid confusion, we implement the same behavior as drive_mirror and use the sync mode argument which Ian Main im...@redhat.com is currently working on implementing. Stefan Hajnoczi (2): blockdev: add sync mode to drive-backup QMP command block: add drive_backup HMP command blockdev.c | 6 ++ hmp-commands.hx| 20 hmp.c | 28 hmp.h | 1 + qapi-schema.json | 14 -- qmp-commands.hx| 6 +- tests/qemu-iotests/055 | 36 +--- 7 files changed, 93 insertions(+), 18 deletions(-) -- 1.8.1.4
[Qemu-devel] [PATCH v2 1/2] blockdev: add sync mode to drive-backup QMP command
The drive-backup command is similar to the drive-mirror command, except no guest data written after the command executes gets copied. Add a sync mode argument which determines whether the entire disk is copied, just allocated clusters, or only clusters being written to by the guest. Currently only sync mode 'full' is supported - it copies the entire disk. For read-only point-in-time snapshots we may only need sync mode 'none' since the target can be a qcow2 file using the guest's disk as its backing file (no need to copy the entire disk). Finally, sync mode 'top' is useful if we wish to preserve the backing chain. Note that this patch just adds the sync mode argument to drive-backup. It does not implement sync modes 'top' or 'none'. This patch is necessary so we can add a drive-backup HMP command that behaves like the existing drive-mirror HMP command and takes a sync mode. Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- blockdev.c | 6 ++ qapi-schema.json | 14 -- qmp-commands.hx| 6 +- tests/qemu-iotests/055 | 36 +--- 4 files changed, 44 insertions(+), 18 deletions(-) diff --git a/blockdev.c b/blockdev.c index edca843..fe73278 100644 --- a/blockdev.c +++ b/blockdev.c @@ -936,6 +936,7 @@ static void drive_backup_prepare(BlkTransactionState *common, Error **errp) qmp_drive_backup(backup-device, backup-target, backup-has_format, backup-format, + backup-sync, backup-has_mode, backup-mode, backup-has_speed, backup-speed, backup-has_on_source_error, backup-on_source_error, @@ -1421,6 +1422,7 @@ void qmp_block_commit(const char *device, void qmp_drive_backup(const char *device, const char *target, bool has_format, const char *format, + enum MirrorSyncMode sync, bool has_mode, enum NewImageMode mode, bool has_speed, int64_t speed, bool has_on_source_error, BlockdevOnError on_source_error, @@ -1435,6 +1437,10 @@ void qmp_drive_backup(const char *device, const char *target, int64_t size; int ret; +if (sync != MIRROR_SYNC_MODE_FULL) { +error_setg(errp, only sync mode 'full' is currently supported); +return; +} if (!has_speed) { speed = 0; } diff --git a/qapi-schema.json b/qapi-schema.json index d6479e1..1e19e4b 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -1626,6 +1626,10 @@ # @format: #optional the format of the new destination, default is to # probe if @mode is 'existing', else the format of the source # +# @sync: what parts of the disk image should be copied to the destination +#(all the disk, only the sectors allocated in the topmost image, or +#only new I/O). +# # @mode: #optional whether and how QEMU should create a new image, default is #'absolute-paths'. # @@ -1647,7 +1651,8 @@ ## { 'type': 'DriveBackup', 'data': { 'device': 'str', 'target': 'str', '*format': 'str', -'*mode': 'NewImageMode', '*speed': 'int', +'sync': 'MirrorSyncMode', '*mode': 'NewImageMode', +'*speed': 'int', '*on-source-error': 'BlockdevOnError', '*on-target-error': 'BlockdevOnError' } } @@ -1807,6 +1812,10 @@ # # @speed: #optional the maximum speed, in bytes per second # +# @sync: what parts of the disk image should be copied to the destination +#(all the disk, only the sectors allocated in the topmost image, or +#only new I/O). +# # @on-source-error: #optional the action to take on an error on the source, # default 'report'. 'stop' and 'enospc' can only be used # if the block device supports io-status (see BlockInfo). @@ -1826,7 +1835,8 @@ ## { 'command': 'drive-backup', 'data': { 'device': 'str', 'target': 'str', '*format': 'str', -'*mode': 'NewImageMode', '*speed': 'int', +'sync': 'MirrorSyncMode', '*mode': 'NewImageMode', +'*speed': 'int', '*on-source-error': 'BlockdevOnError', '*on-target-error': 'BlockdevOnError' } } diff --git a/qmp-commands.hx b/qmp-commands.hx index 362f0e1..e075df4 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -913,7 +913,7 @@ EQMP { .name = drive-backup, -.args_type = device:B,target:s,speed:i?,mode:s?,format:s?, +.args_type = sync:s,device:B,target:s,speed:i?,mode:s?,format:s?, on-source-error:s?,on-target-error:s?, .mhandler.cmd_new = qmp_marshal_input_drive_backup, }, @@ -939,6 +939,10 @@ Arguments: - format: the format of the new destination, default is to probe if 'mode' is 'existing', else the format of the source (json-string, optional) +- sync: what parts of the disk
[Qemu-devel] [PATCH v2 2/2] block: add drive_backup HMP command
Make drive_backup available on the HMP monitor: drive_backup [-n] [-f] device target [format] The -n flag requests QEMU to reuse the image found in new-image-file, instead of recreating it from scratch. The -f flag requests QEMU to copy the whole disk, so that the result does not need a backing file. Note that this flag *must* currently be passed since the other sync modes ('none' and 'top') have not been implemented yet. Requiring it ensures that drive_backup behaves like drive_mirror. Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- hmp-commands.hx | 20 hmp.c | 28 hmp.h | 1 + 3 files changed, 49 insertions(+) diff --git a/hmp-commands.hx b/hmp-commands.hx index 915b0d1..800101b 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1059,6 +1059,26 @@ using the specified target. ETEXI { +.name = drive_backup, +.args_type = reuse:-n,full:-f,device:B,target:s,format:s?, +.params = [-n] [-f] device target [format], +.help = initiates a point-in-time\n\t\t\t + copy for a device. The device's contents are\n\t\t\t + copied to the new image file, excluding data that\n\t\t\t + is written after the command is started.\n\t\t\t + The -n flag requests QEMU to reuse the image found\n\t\t\t + in new-image-file, instead of recreating it from scratch.\n\t\t\t + The -f flag requests QEMU to copy the whole disk,\n\t\t\t + so that the result does not need a backing file.\n\t\t\t, +.mhandler.cmd = hmp_drive_backup, +}, +STEXI +@item drive_backup +@findex drive_backup +Start a point-in-time copy of a block device to a specificed target. +ETEXI + +{ .name = drive_add, .args_type = pci_addr:s,opts:s, .params = [[domain:]bus:]slot\n diff --git a/hmp.c b/hmp.c index 494a9aa..f9fb999 100644 --- a/hmp.c +++ b/hmp.c @@ -889,6 +889,34 @@ void hmp_drive_mirror(Monitor *mon, const QDict *qdict) hmp_handle_error(mon, errp); } +void hmp_drive_backup(Monitor *mon, const QDict *qdict) +{ +const char *device = qdict_get_str(qdict, device); +const char *filename = qdict_get_str(qdict, target); +const char *format = qdict_get_try_str(qdict, format); +int reuse = qdict_get_try_bool(qdict, reuse, 0); +int full = qdict_get_try_bool(qdict, full, 0); +enum NewImageMode mode; +Error *errp = NULL; + +if (!filename) { +error_set(errp, QERR_MISSING_PARAMETER, target); +hmp_handle_error(mon, errp); +return; +} + +if (reuse) { +mode = NEW_IMAGE_MODE_EXISTING; +} else { +mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS; +} + +qmp_drive_backup(device, filename, !!format, format, + full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP, + true, mode, false, 0, false, 0, false, 0, errp); +hmp_handle_error(mon, errp); +} + void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict) { const char *device = qdict_get_str(qdict, device); diff --git a/hmp.h b/hmp.h index 56d2e92..6c3bdcd 100644 --- a/hmp.h +++ b/hmp.h @@ -55,6 +55,7 @@ void hmp_balloon(Monitor *mon, const QDict *qdict); void hmp_block_resize(Monitor *mon, const QDict *qdict); void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict); void hmp_drive_mirror(Monitor *mon, const QDict *qdict); +void hmp_drive_backup(Monitor *mon, const QDict *qdict); void hmp_migrate_cancel(Monitor *mon, const QDict *qdict); void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict); void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict); -- 1.8.1.4
Re: [Qemu-devel] [Xen-devel] [PATCH] Add Xen platform PCI device version 2.
At 12:06 + on 26 Jun (1372248391), Paul Durrant wrote: -Original Message- From: Tim Deegan [mailto:t...@xen.org] Sent: 26 June 2013 12:58 To: Paul Durrant Cc: Ian Campbell; Matt Wilson; Alex Bligh; xen-de...@lists.xen.org; qemu- de...@nongnu.org Subject: Re: [Xen-devel] [Qemu-devel] [PATCH] Add Xen platform PCI device version 2. At 11:23 + on 26 Jun (1372245783), Paul Durrant wrote: We could blacklist all existing Citrix PV drivers in upstream QEMU, to avoid the clash, but that seems like a very unfriendly approach. Also, it's not going to stop someone with an existing VM, who happens to be using legacy Citrix PV drivers (an AWS VM for instance) receiving a driver from Windows Update that will blue-screen their VM on next reboot. Hence the only way forward is to bind the new drivers to something new, that we can control, so we know what driver a VM is going to get from Windows Update. I don't think you ever got an answer about whether this could be finagled using version numbers in the drivers. No, we thought about that and I don't believe it would be possible. This doc makes it look like it's just a matter of choosing appropriate version and dates: http://msdn.microsoft.com/en-us/library/windows/hardware/gg487453.aspx Also: would it not be possible to have a blkfront driver (even a trivial one) in your new bus driver so it can detect and avoid this problem? Or: have your bus driver detect when the blkfront driver is missing and not unplug the emulated disk? In fact wouldn't having the emulated disk driver do the unplug solve it for free? The issue is the old s/w not the new s/w. The old drivers make assumptions about each other's presence as we can't change that because they are already out there. The issue you mentioned was that the old drivers bound the block driver to the PCI device, and when your new driver is installed you get STOP 7B because the system disk is missing (because I guess you'd need the new xenbus driver to come up before it will trigger installing the new block driver). So can the new driver not fix this, either by running a trivial blkfront itself or by allowing the emulated IDE controller to live? And we may indeed need to modify its revision in future so that we can retire old sets of PV drivers and replace them with new ones, but only for newer XenServer releases. Thus, I also propose to make the PCI revision of the new device a command line parameter. I'd rather not. This gets straight back to having host-admin controls that have to manually be matched to in-guest software. Well not really. This is just the same as a h/w vendor shipping a new device. Well, that would be more like having the PCI revision reflect the Xen version. Which might be a reasonable idea, if there is to be a second PCI device. But metaphors aside, it's still requiring an admin change in order to match the software inside the VM, which as we've seen is unpopular with admins. :) Tim.
Re: [Qemu-devel] [PATCH v11 14/15] rdma: introduce MIG_STATE_NONE and change MIG_STATE_SETUP state transition
On 06/26/2013 02:37 AM, Paolo Bonzini wrote: Il 26/06/2013 02:31, Michael R. Hines ha scritto: On 06/25/2013 05:06 PM, Paolo Bonzini wrote: Il 25/06/2013 22:56, Michael R. Hines ha scritto: I was wrong - this does require a protocol extension. This is because the RDMA transfers are asynchronous, and thus we cannot know in advance that it is safe to unregister the memory associated with each individual transfer before the transfer actually completes. While the destination currently uses the protocol to participate in *registering* the page, the destination does not participate in the RDMA transfers themselves, only the source does, and thus would require a new exchange of messages to block and instruct the destination to unpin the memory. Yes, that's what I recalled too (really what mst told me :)). Does it need to be blocking though? As long as the pinning is blocking, and messages are processed in order, the source can proceed immediately after sending an unpin message. This assumes of course that the chunk is not being transmitted, and I am not sure how easy the source can determine that. No, they're not processed in order. In fact, not only does the device write out of order, but also the PCI bus writes out of order. This was such a problem in fact, that I fixed several bugs as a result a few weeks ago (v7 of the patch with an in-depth description). The destination simply cannot assume whatsoever what the ordering of the writes are - that's really the whole point of using RDMA in the first place so that the software can get out of the way of the transfer process to lower the latency of each transfer. The memory is processed out of order, but what about the messages? Those must be in order. Note that I wrote above This assumes of course that the chunk is not being transmitted. Can the source know when an asynchronous transfer finished, and delay the unpinning until that time? Paolo Yes, the source does know. There's no problem unpinning on the source. But both sides must do the unpinning, not just the source. Did I misunderstand you? Are you suggesting *only* unpinning on the source? - Michael
Re: [Qemu-devel] [PATCH 1/2] device_tree: Add qemu_devtree_setprop_sized_cells() utility function
Hi, On Wed, Jun 26, 2013 at 8:50 PM, Peter Maydell peter.mayd...@linaro.org wrote: On 26 June 2013 11:31, Alexander Graf ag...@suse.de wrote: I think it makes sense to make this API special-purpose for reg. We currently have a generic put any number of 32bit values into the property function (qemu_devtree_setprop_cells). Yes, but that doesn't work for things that aren't simple arrays of 32 bit values, so I think that a generic way to deal with those too would be useful. If you wanted to write a ranges property you'd need this too, so it doesn't just apply to reg. +1. And wouldn't an implementation of such a reg-specific function back onto Peter's new function quite nicely anyway? I think we could avoid the varargs doesn't promote problem by using a varargs-macro wrapper: #define qemu_devtree_setprop_sized_cells(fdt, node, prop, ...) \ do { \ uint64_t args[] = { __VA_ARGS__ }; \ do_qemu_devtree_setprop_sized_cells(fdt, node, prop, \ args, sizeof(args)); } while (0) Are statement expressions sanctioned? Or do we need to give up the nice caller accessible return codes? And can we factor out common functionality (mainly the error checking) with existing set_prop_cells to make the interfaces consistent? (we need to get rid of those aborts sooner or later) which will promote everything (including the size arguments, harmlessly) to uint64_t, and avoids having a varargs function. Can't we also just add a qemu_devtree_setprop_reg() that walks the tree downwards in search for #address-cells and #size-cells and assembles the correct reg property from a list of 64bit arguments? I have a patch in my tree that generalises qemu_devtree_getprop* to allow you walk parents for properties (as per the #foo-cells semantic). I use it for interrupt cells however, which kinda suggests that this wish for parent traversal is more generic than just populating reg. I think that Peters patch, along with a parent search friendly property search will be enough to be able to do whatever you want in only a handful of lines. Do we have an actual use for this? It seems pretty complicated. I would expect that in practice there are two major use cases: (a) create your own fdt from scratch (in which case you can just make everything 64 bits and in any case will know when creating nodes what the #address-cells etc are) (b) modify an existing fdt, in which case you definitely don't want to go poking around too deeply in the tree; anything more than just put an extra node in the root is starting to get pretty chancy. Looking to the future, what about -device adding a periph then having qemu add it to the dtb before boot? Regards, Peter thanks -- PMM
Re: [Qemu-devel] [PATCH v11 14/15] rdma: introduce MIG_STATE_NONE and change MIG_STATE_SETUP state transition
Il 26/06/2013 14:37, Michael R. Hines ha scritto: On 06/26/2013 02:37 AM, Paolo Bonzini wrote: Il 26/06/2013 02:31, Michael R. Hines ha scritto: On 06/25/2013 05:06 PM, Paolo Bonzini wrote: Il 25/06/2013 22:56, Michael R. Hines ha scritto: I was wrong - this does require a protocol extension. This is because the RDMA transfers are asynchronous, and thus we cannot know in advance that it is safe to unregister the memory associated with each individual transfer before the transfer actually completes. While the destination currently uses the protocol to participate in *registering* the page, the destination does not participate in the RDMA transfers themselves, only the source does, and thus would require a new exchange of messages to block and instruct the destination to unpin the memory. Yes, that's what I recalled too (really what mst told me :)). Does it need to be blocking though? As long as the pinning is blocking, and messages are processed in order, the source can proceed immediately after sending an unpin message. This assumes of course that the chunk is not being transmitted, and I am not sure how easy the source can determine that. No, they're not processed in order. In fact, not only does the device write out of order, but also the PCI bus writes out of order. This was such a problem in fact, that I fixed several bugs as a result a few weeks ago (v7 of the patch with an in-depth description). The destination simply cannot assume whatsoever what the ordering of the writes are - that's really the whole point of using RDMA in the first place so that the software can get out of the way of the transfer process to lower the latency of each transfer. The memory is processed out of order, but what about the messages? Those must be in order. Note that I wrote above This assumes of course that the chunk is not being transmitted. Can the source know when an asynchronous transfer finished, and delay the unpinning until that time? Yes, the source does know. There's no problem unpinning on the source. But both sides must do the unpinning, not just the source. Did I misunderstand you? Are you suggesting *only* unpinning on the source? I'm suggesting (if possible) that the source only tells the destination to unpin once it knows it is safe for the destination to do it. As long as unpin and pin messages are not reordered, it should be possible to do it with an asynchronous message from the source to the destination. Paolo
Re: [Qemu-devel] [PATCH 1/2] device_tree: Add qemu_devtree_setprop_sized_cells() utility function
On 06/26/2013 02:38 PM, Peter Crosthwaite wrote: Hi, On Wed, Jun 26, 2013 at 8:50 PM, Peter Maydellpeter.mayd...@linaro.org wrote: On 26 June 2013 11:31, Alexander Grafag...@suse.de wrote: I think it makes sense to make this API special-purpose for reg. We currently have a generic put any number of 32bit values into the property function (qemu_devtree_setprop_cells). Yes, but that doesn't work for things that aren't simple arrays of 32 bit values, so I think that a generic way to deal with those too would be useful. If you wanted to write a ranges property you'd need this too, so it doesn't just apply to reg. +1. And wouldn't an implementation of such a reg-specific function back onto Peter's new function quite nicely anyway? I think we could avoid the varargs doesn't promote problem by using a varargs-macro wrapper: #define qemu_devtree_setprop_sized_cells(fdt, node, prop, ...) \ do { \ uint64_t args[] = { __VA_ARGS__ }; \ do_qemu_devtree_setprop_sized_cells(fdt, node, prop, \ args, sizeof(args)); } while (0) Are statement expressions sanctioned? Or do we need to give up the nice caller accessible return codes? And can we factor out common functionality (mainly the error checking) with existing set_prop_cells to make the interfaces consistent? (we need to get rid of those aborts sooner or later) which will promote everything (including the size arguments, harmlessly) to uint64_t, and avoids having a varargs function. Can't we also just add a qemu_devtree_setprop_reg() that walks the tree downwards in search for #address-cells and #size-cells and assembles the correct reg property from a list of 64bit arguments? I have a patch in my tree that generalises qemu_devtree_getprop* to allow you walk parents for properties (as per the #foo-cells semantic). I use it for interrupt cells however, which kinda suggests that this wish for parent traversal is more generic than just populating reg. I think that Peters patch, along with a parent search friendly property search will be enough to be able to do whatever you want in only a handful of lines. Do we have an actual use for this? It seems pretty complicated. I would expect that in practice there are two major use cases: (a) create your own fdt from scratch (in which case you can just make everything 64 bits and in any case will know when creating nodes what the #address-cells etc are) (b) modify an existing fdt, in which case you definitely don't want to go poking around too deeply in the tree; anything more than just put an extra node in the root is starting to get pretty chancy. Looking to the future, what about -device adding a periph then having qemu add it to the dtb before boot? I've had a lengthy discussion about that with Anthony a while ago. His take was that this is perfectly reasonable, as long as the device tree generation code stays within the machine model. The machine would just traverse the QOM hierachy and generate device tree nodes for everything it knows. Alex
Re: [Qemu-devel] [Xen-devel] [PATCH] Add Xen platform PCI device version 2.
-Original Message- From: Tim Deegan [mailto:t...@xen.org] Sent: 26 June 2013 13:36 To: Paul Durrant Cc: Ian Campbell; Matt Wilson; Alex Bligh; xen-de...@lists.xen.org; qemu- de...@nongnu.org Subject: Re: [Xen-devel] [Qemu-devel] [PATCH] Add Xen platform PCI device version 2. At 12:06 + on 26 Jun (1372248391), Paul Durrant wrote: -Original Message- From: Tim Deegan [mailto:t...@xen.org] Sent: 26 June 2013 12:58 To: Paul Durrant Cc: Ian Campbell; Matt Wilson; Alex Bligh; xen-de...@lists.xen.org; qemu- de...@nongnu.org Subject: Re: [Xen-devel] [Qemu-devel] [PATCH] Add Xen platform PCI device version 2. At 11:23 + on 26 Jun (1372245783), Paul Durrant wrote: We could blacklist all existing Citrix PV drivers in upstream QEMU, to avoid the clash, but that seems like a very unfriendly approach. Also, it's not going to stop someone with an existing VM, who happens to be using legacy Citrix PV drivers (an AWS VM for instance) receiving a driver from Windows Update that will blue-screen their VM on next reboot. Hence the only way forward is to bind the new drivers to something new, that we can control, so we know what driver a VM is going to get from Windows Update. I don't think you ever got an answer about whether this could be finagled using version numbers in the drivers. No, we thought about that and I don't believe it would be possible. This doc makes it look like it's just a matter of choosing appropriate version and dates: http://msdn.microsoft.com/en- us/library/windows/hardware/gg487453.aspx The issue, I believe, is that DriverVerDate trumps DriverVerVersion and AFAIK we can't fake an old date because the driver signing process will not allow it. Also: would it not be possible to have a blkfront driver (even a trivial one) in your new bus driver so it can detect and avoid this problem? Or: have your bus driver detect when the blkfront driver is missing and not unplug the emulated disk? In fact wouldn't having the emulated disk driver do the unplug solve it for free? The issue is the old s/w not the new s/w. The old drivers make assumptions about each other's presence as we can't change that because they are already out there. The issue you mentioned was that the old drivers bound the block driver to the PCI device, and when your new driver is installed you get STOP 7B because the system disk is missing (because I guess you'd need the new xenbus driver to come up before it will trigger installing the new block driver). So can the new driver not fix this, either by running a trivial blkfront itself or by allowing the emulated IDE controller to live? Aha, the old drivers have a registry override which will cause them to quiesce so I could have the new driver go and set that to make them shut up. That might allow us to cleanly take over on the next reboot. And we may indeed need to modify its revision in future so that we can retire old sets of PV drivers and replace them with new ones, but only for newer XenServer releases. Thus, I also propose to make the PCI revision of the new device a command line parameter. I'd rather not. This gets straight back to having host-admin controls that have to manually be matched to in-guest software. Well not really. This is just the same as a h/w vendor shipping a new device. Well, that would be more like having the PCI revision reflect the Xen version. Which might be a reasonable idea, if there is to be a second PCI device. But metaphors aside, it's still requiring an admin change in order to match the software inside the VM, which as we've seen is unpopular with admins. :) Yes, but that's more of a business decision than an architectural one. As you say, it may well be unpopular and thus is something I'd prefer we didn't do unless we absolutely have to. What we can't do is have drivers being downloaded from Windows Update into systems where we know they are going to break things so I would like the new device to have enough tunability to avoid a trainwreck, and a parameterised revision is just enough :-) Thus, I still prefer to have a new device and leave the old one alone. Paul
Re: [Qemu-devel] Monitoring Screen Activity in QEMU/KVM
On Wed, Jun 26, 2013 at 07:47:47AM +0530, Shehbaz Jaffer wrote: I want to determine the amount of screen activity taking place on VGA monitor/ Screen for different applications (eg. playing vlc video, normal typing.) When I do not start the X server, I can easily determine the screen activity by counting the number of pages accessed in the region 0xA - 0xB (This is the VGA Monitor region in boot screen). However when I start the X Server, A diffrent set of pages are hit. Could anyone please explain how the VGA Monitor works in QEMU? qemu emulates a PCI device. Probably an ancient Cirrus Logic CL 5446, but other devices are possible. Or if someone could suggest an alternate solution to determine amount of screen activity while playing diffrent applications? I would take an existing VNC client and modify it to log the screeen activity you want to log. VNC is a well-documented protocol, there are several high quality open source clients [gtk-vnc is the one I'd pick], and doing this means the guest can run at full speed. It depends a lot on how you define screen activity. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top
Re: [Qemu-devel] [PATCH 1/2] device_tree: Add qemu_devtree_setprop_sized_cells() utility function
On 26 June 2013 13:38, Peter Crosthwaite peter.crosthwa...@xilinx.com wrote: On Wed, Jun 26, 2013 at 8:50 PM, Peter Maydell peter.mayd...@linaro.org wrote: I think we could avoid the varargs doesn't promote problem by using a varargs-macro wrapper: #define qemu_devtree_setprop_sized_cells(fdt, node, prop, ...) \ do { \ uint64_t args[] = { __VA_ARGS__ }; \ do_qemu_devtree_setprop_sized_cells(fdt, node, prop, \ args, sizeof(args)); } while (0) Are statement expressions sanctioned? Or do we need to give up the nice caller accessible return codes? I just borrowed the syntax from the existing qemu_devtree_setprop_cells macro to illustrate how the varargs bit would work; I'll tweak it to actually get the return code right for the patch proper. And can we factor out common functionality (mainly the error checking) with existing set_prop_cells to make the interfaces consistent? (we need to get rid of those aborts sooner or later) Maybe we should at some point switch to using Error** for error reporting in our wrapper APIs here? It is a bit odd that half our device tree functions die on error and the other half report a failure code. which will promote everything (including the size arguments, harmlessly) to uint64_t, and avoids having a varargs function. Can't we also just add a qemu_devtree_setprop_reg() that walks the tree downwards in search for #address-cells and #size-cells and assembles the correct reg property from a list of 64bit arguments? I have a patch in my tree that generalises qemu_devtree_getprop* to allow you walk parents for properties (as per the #foo-cells semantic). I use it for interrupt cells however, which kinda suggests that this wish for parent traversal is more generic than just populating reg. I think that Peters patch, along with a parent search friendly property search will be enough to be able to do whatever you want in only a handful of lines. Sounds good to me. -- PMM
Re: [Qemu-devel] [PATCH 1/2] device_tree: Add qemu_devtree_setprop_sized_cells() utility function
On 26 June 2013 13:38, Peter Crosthwaite peter.crosthwa...@xilinx.com wrote: Are statement expressions sanctioned? ..and to answer this specific question which I missed first time round, we already use them (eg container_of, MAKE_TCGV_PTR). We seem to mark some but not all of them with __extension__; I think on balance that's probably unnecessary given we aren't going to be trying to compile with -pedantic any time soon. -- PMM
Re: [Qemu-devel] [PATCH 2/5] tcg: Simplify logic using TCG_OPF_NOT_PRESENT
Hello Richard, eons ago Richard wrote: Expand the definition of not present to include should not be present. This means we can simplify the logic surrounding the generic tcg opcodes for which the host backend ought not be providing definitions. Signed-off-by: Richard Henderson address@hidden --- tcg/tcg-opc.h | 26 +++--- tcg/tcg.c | 4 +--- tcg/tcg.h | 3 ++- 3 files changed, 18 insertions(+), 15 deletions(-) diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h index db5e6e5..83f7147 100644 --- a/tcg/tcg-opc.h +++ b/tcg/tcg-opc.h @@ -27,17 +27,21 @@ */ /* predefined ops */ -DEF(end, 0, 0, 0, 0) /* must be kept first */ -DEF(nop, 0, 0, 0, 0) -DEF(nop1, 0, 0, 1, 0) -DEF(nop2, 0, 0, 2, 0) -DEF(nop3, 0, 0, 3, 0) -DEF(nopn, 0, 0, 1, 0) /* variable number of parameters */ +DEF(end, 0, 0, 0, TCG_OPF_NOT_PRESENT) /* must be kept first */ +DEF(nop, 0, 0, 0, TCG_OPF_NOT_PRESENT) +DEF(nop1, 0, 0, 1, TCG_OPF_NOT_PRESENT) +DEF(nop2, 0, 0, 2, TCG_OPF_NOT_PRESENT) +DEF(nop3, 0, 0, 3, TCG_OPF_NOT_PRESENT) -DEF(discard, 1, 0, 0, 0) +/* variable number of parameters */ +DEF(nopn, 0, 0, 1, TCG_OPF_NOT_PRESENT) + +DEF(discard, 1, 0, 0, TCG_OPF_NOT_PRESENT) +DEF(set_label, 0, 0, 1, TCG_OPF_BB_END | TCG_OPF_NOT_PRESENT) + +/* variable number of parameters */ +DEF(call, 0, 1, 2, TCG_OPF_CALL_CLOBBER | TCG_OPF_NOT_PRESENT) If TCG_OPF_NOT_PRESENT is supposed to mark opcodes that should not be implemented by the target, then setting TCG_OPF_NOT_PRESENT for 'call' seems wrong to me. This will actually trigger the code in the ifdef CONFIG_DEBUG_TCG and cause an assert to fail. -DEF(set_label, 0, 0, 1, TCG_OPF_BB_END) -DEF(call, 0, 1, 2, TCG_OPF_CALL_CLOBBER) /* variable number of parameters */ DEF(br, 0, 0, 1, TCG_OPF_BB_END) #define IMPL(X) (__builtin_constant_p(X) !(X) ? TCG_OPF_NOT_PRESENT : 0) @@ -166,9 +170,9 @@ DEF(muls2_i64, 2, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_muls2_i64)) /* QEMU specific */ #if TARGET_LONG_BITS TCG_TARGET_REG_BITS -DEF(debug_insn_start, 0, 0, 2, 0) +DEF(debug_insn_start, 0, 0, 2, TCG_OPF_NOT_PRESENT) #else -DEF(debug_insn_start, 0, 0, 1, 0) +DEF(debug_insn_start, 0, 0, 1, TCG_OPF_NOT_PRESENT) #endif DEF(exit_tb, 0, 0, 1, TCG_OPF_BB_END) DEF(goto_tb, 0, 0, 1, TCG_OPF_BB_END) diff --git a/tcg/tcg.c b/tcg/tcg.c index 1d8099c..c7e6567 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -1160,9 +1160,7 @@ void tcg_add_target_add_op_defs(const TCGTargetOpDef *tdefs) i = 0; for (op = 0; op ARRAY_SIZE(tcg_op_defs); op++) { const TCGOpDef *def = tcg_op_defs[op]; -if (op INDEX_op_call -|| op == INDEX_op_debug_insn_start -|| (def-flags TCG_OPF_NOT_PRESENT)) { +if (def-flags TCG_OPF_NOT_PRESENT) { this will trigger for op = 'call' /* Wrong entry in op definitions? */ if (def-used) { fprintf(stderr, Invalid op definition for %s\n, def-name); diff --git a/tcg/tcg.h b/tcg/tcg.h index df375cf..72b694f 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -593,7 +593,8 @@ enum { TCG_OPF_SIDE_EFFECTS = 0x04, /* Instruction operands are 64-bits (otherwise 32-bits). */ TCG_OPF_64BIT= 0x08, -/* Instruction is optional and not implemented by the host. */ +/* Instruction is optional and not implemented by the host, or insn + is generic and should not be implemened by the host. */ TCG_OPF_NOT_PRESENT = 0x10, }; -- 1.8.1.4
[Qemu-devel] [PATCH v4 02/12] boot-order-test: New; covering just PC for now
Signed-off-by: Markus Armbruster arm...@redhat.com --- tests/Makefile | 2 ++ tests/boot-order-test.c | 73 + 2 files changed, 75 insertions(+) create mode 100644 tests/boot-order-test.c diff --git a/tests/Makefile b/tests/Makefile index c107489..394e029 100644 --- a/tests/Makefile +++ b/tests/Makefile @@ -54,6 +54,7 @@ gcov-files-i386-y = hw/fdc.c check-qtest-i386-y += tests/ide-test$(EXESUF) check-qtest-i386-y += tests/hd-geo-test$(EXESUF) gcov-files-i386-y += hw/hd-geometry.c +check-qtest-i386-y += tests/boot-order-test$(EXESUF) check-qtest-i386-y += tests/rtc-test$(EXESUF) check-qtest-i386-y += tests/i440fx-test$(EXESUF) check-qtest-i386-y += tests/fw_cfg-test$(EXESUF) @@ -130,6 +131,7 @@ tests/m48t59-test$(EXESUF): tests/m48t59-test.o tests/fdc-test$(EXESUF): tests/fdc-test.o tests/ide-test$(EXESUF): tests/ide-test.o $(libqos-pc-obj-y) tests/hd-geo-test$(EXESUF): tests/hd-geo-test.o +tests/boot-order-test$(EXESUF): tests/boot-order-test.o tests/tmp105-test$(EXESUF): tests/tmp105-test.o $(libqos-omap-obj-y) tests/i440fx-test$(EXESUF): tests/i440fx-test.o $(libqos-pc-obj-y) tests/fw_cfg-test$(EXESUF): tests/fw_cfg-test.o $(libqos-pc-obj-y) diff --git a/tests/boot-order-test.c b/tests/boot-order-test.c new file mode 100644 index 000..447be31 --- /dev/null +++ b/tests/boot-order-test.c @@ -0,0 +1,73 @@ +/* + * Boot order test cases. + * + * Copyright (c) 2013 Red Hat Inc. + * + * Authors: + * Markus Armbruster arm...@redhat.com, + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include glib.h +#include libqtest.h + +static void test_pc_cmos_byte(int reg, int expected) +{ +int actual; + +outb(0x70, reg); +actual = inb(0x71); +g_assert_cmphex(actual, ==, expected); +} + +static void test_pc_cmos(uint8_t boot1, uint8_t boot2) +{ +test_pc_cmos_byte(0x38, boot1); +test_pc_cmos_byte(0x3d, boot2); +} + +static void test_pc_with_args(const char *test_args, + uint8_t boot1, uint8_t boot2, + uint8_t reboot1, uint8_t reboot2) +{ +char *args = g_strdup_printf(-nodefaults -display none %s, test_args); + +qtest_start(args); +test_pc_cmos(boot1, boot2); +qmp({ 'execute': 'system_reset' }); +/* + * system_reset only requests reset. We get a RESET event after + * the actual reset completes. Need to wait for that. + */ +qmp();/* HACK: wait for event */ +test_pc_cmos(reboot1, reboot2); +qtest_quit(global_qtest); +g_free(args); +} + +static void test_pc_boot_order(void) +{ +test_pc_with_args(, 0x30, 0x12, 0x30, 0x12); +test_pc_with_args(-no-fd-bootchk, 0x31, 0x12, 0x31, 0x12); +test_pc_with_args(-boot c, 0, 0x02, 0, 0x02); +test_pc_with_args(-boot nda, 0x10, 0x34, 0x10, 0x34); +test_pc_with_args(-boot order=, 0, 0, 0, 0); +test_pc_with_args(-boot order= -boot order=c, 0, 0x02, 0, 0x02); +test_pc_with_args(-boot once=a, 0, 0x01, 0x30, 0x12); +test_pc_with_args(-boot once=a -no-fd-bootchk, 0x01, 0x01, 0x31, 0x12); +test_pc_with_args(-boot once=a,order=c, 0, 0x01, 0, 0x02); +test_pc_with_args(-boot once=d -boot order=nda, 0, 0x03, 0x10, 0x34); +test_pc_with_args(-boot once=a -boot once=b -boot once=c, + 0, 0x02, 0x30, 0x12); +} + +int main(int argc, char *argv[]) +{ +g_test_init(argc, argv, NULL); + +qtest_add_func(boot-order/pc, test_pc_boot_order); + +return g_test_run(); +} -- 1.7.11.7
[Qemu-devel] [PATCH v4 09/12] boot-order-test: Add tests for PowerPC PREP
Cc: Andreas Färber afaer...@suse.de Cc: Alexander Graf ag...@suse.de Cc: qemu-...@nongnu.org Signed-off-by: Markus Armbruster arm...@redhat.com --- tests/boot-order-test.c | 25 + 1 file changed, 25 insertions(+) diff --git a/tests/boot-order-test.c b/tests/boot-order-test.c index c711c71..75b1642 100644 --- a/tests/boot-order-test.c +++ b/tests/boot-order-test.c @@ -112,6 +112,30 @@ static void test_pc_boot_order(void) test_boot_orders(NULL, read_boot_order_pc, test_cases_pc); } +static uint8_t read_m48t59(uint64_t addr, uint16_t reg) +{ +writeb(addr, reg 0xff); +writeb(addr + 1, reg 8); +return readb(addr + 3); +} + +static uint64_t read_boot_order_prep(void) +{ +return read_m48t59(0x8000 + 0x74, 0x34); +} + +static const boot_order_test test_cases_prep[] = { +{ , 'c', 'c' }, +{ -boot c, 'c', 'c' }, +{ -boot d, 'd', 'd' }, +{} +}; + +static void test_prep_boot_order(void) +{ +test_boot_orders(prep, read_boot_order_prep, test_cases_prep); +} + static uint64_t read_boot_order_pmac(void) { QFWCFG *fw_cfg = mm_fw_cfg_init(0xf510); @@ -146,6 +170,7 @@ int main(int argc, char *argv[]) if (strcmp(arch, i386) == 0 || strcmp(arch, x86_64) == 0) { qtest_add_func(boot-order/pc, test_pc_boot_order); } else if (strcmp(arch, ppc) == 0 || strcmp(arch, ppc64) == 0) { +qtest_add_func(boot-order/prep, test_prep_boot_order); qtest_add_func(boot-order/pmac_oldworld, test_pmac_oldworld_boot_order); qtest_add_func(boot-order/pmac_newworld, -- 1.7.11.7
[Qemu-devel] [PATCH v4 10/12] boot-order-test: Add tests for Sun4m
Cc: Blue Swirl blauwir...@gmail.com Signed-off-by: Markus Armbruster arm...@redhat.com --- tests/boot-order-test.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/tests/boot-order-test.c b/tests/boot-order-test.c index 75b1642..a3928ed 100644 --- a/tests/boot-order-test.c +++ b/tests/boot-order-test.c @@ -161,6 +161,18 @@ static void test_pmac_newworld_boot_order(void) test_boot_orders(mac99, read_boot_order_pmac, test_cases_fw_cfg); } +static uint64_t read_boot_order_sun4m(void) +{ +QFWCFG *fw_cfg = mm_fw_cfg_init(0xd0510ULL); + +return qfw_cfg_get_u16(fw_cfg, FW_CFG_BOOT_DEVICE); +} + +static void test_sun4m_boot_order(void) +{ +test_boot_orders(SS-5, read_boot_order_sun4m, test_cases_fw_cfg); +} + int main(int argc, char *argv[]) { const char *arch = qtest_get_arch(); @@ -175,6 +187,8 @@ int main(int argc, char *argv[]) test_pmac_oldworld_boot_order); qtest_add_func(boot-order/pmac_newworld, test_pmac_newworld_boot_order); +} else if (strcmp(arch, sparc) == 0) { +qtest_add_func(boot-order/sun4m, test_sun4m_boot_order); } return g_test_run(); -- 1.7.11.7
[Qemu-devel] [PATCH v4 03/12] libqos: include dependencies
From: Anthony Liguori aligu...@us.ibm.com Otherwise rebuilds can fail when libqos is modified. Reported-by: Markus Armbruster arm...@redhat.com Signed-off-by: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Markus Armbruster arm...@redhat.com --- tests/Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/tests/Makefile b/tests/Makefile index 394e029..34a2325 100644 --- a/tests/Makefile +++ b/tests/Makefile @@ -224,3 +224,4 @@ check-block: $(patsubst %,check-%, $(check-block-y)) check: check-unit check-qtest -include $(wildcard tests/*.d) +-include $(wildcard tests/libqos/*.d) -- 1.7.11.7
[Qemu-devel] [PATCH v4 05/12] boot-order-test: Add tests for PowerMacs
From: Andreas Färber afaer...@suse.de They set the boot device via fw_cfg, which is then translated to a boot path of hd or cd in OpenBIOS. Signed-off-by: Andreas Färber afaer...@suse.de Cc: Alexander Graf ag...@suse.de Cc: qemu-...@nongnu.org Converted to libqos/fw_cfg on Anthony's request. Signed-off-by: Markus Armbruster arm...@redhat.com --- tests/Makefile | 4 +++- tests/boot-order-test.c | 50 - 2 files changed, 52 insertions(+), 2 deletions(-) diff --git a/tests/Makefile b/tests/Makefile index 34a2325..8c7352d 100644 --- a/tests/Makefile +++ b/tests/Makefile @@ -67,6 +67,8 @@ gcov-files-sparc-y += hw/m48t59.c gcov-files-sparc64-y += hw/m48t59.c check-qtest-arm-y = tests/tmp105-test$(EXESUF) gcov-files-arm-y += hw/tmp105.c +check-qtest-ppc-y += tests/boot-order-test$(EXESUF) +check-qtest-ppc64-y += tests/boot-order-test$(EXESUF) GENERATED_HEADERS += tests/test-qapi-types.h tests/test-qapi-visit.h tests/test-qmp-commands.h @@ -131,7 +133,7 @@ tests/m48t59-test$(EXESUF): tests/m48t59-test.o tests/fdc-test$(EXESUF): tests/fdc-test.o tests/ide-test$(EXESUF): tests/ide-test.o $(libqos-pc-obj-y) tests/hd-geo-test$(EXESUF): tests/hd-geo-test.o -tests/boot-order-test$(EXESUF): tests/boot-order-test.o +tests/boot-order-test$(EXESUF): tests/boot-order-test.o $(libqos-obj-y) tests/tmp105-test$(EXESUF): tests/tmp105-test.o $(libqos-omap-obj-y) tests/i440fx-test$(EXESUF): tests/i440fx-test.o $(libqos-pc-obj-y) tests/fw_cfg-test$(EXESUF): tests/fw_cfg-test.o $(libqos-pc-obj-y) diff --git a/tests/boot-order-test.c b/tests/boot-order-test.c index 447be31..766981d 100644 --- a/tests/boot-order-test.c +++ b/tests/boot-order-test.c @@ -10,7 +10,9 @@ * See the COPYING file in the top-level directory. */ +#include string.h #include glib.h +#include libqos/fw_cfg.h #include libqtest.h static void test_pc_cmos_byte(int reg, int expected) @@ -63,11 +65,57 @@ static void test_pc_boot_order(void) 0, 0x02, 0x30, 0x12); } +#define G3BEIGE_CFG_ADDR 0xf510 +#define MAC99_CFG_ADDR 0xf510 + +#define NO_QEMU_PROTOS +#include hw/nvram/fw_cfg.h +#undef NO_QEMU_PROTOS + +static void test_powermac_with_args(bool newworld, const char *extra_args, +uint16_t expected_boot, +uint16_t expected_reboot) +{ +char *args = g_strdup_printf(-nodefaults -display none -machine %s %s, + newworld ? mac99 : g3beige, extra_args); +QFWCFG *fw_cfg = mm_fw_cfg_init(newworld ? MAC99_CFG_ADDR +: G3BEIGE_CFG_ADDR); +uint16_t actual; + +qtest_start(args); +actual = qfw_cfg_get_u16(fw_cfg, FW_CFG_BOOT_DEVICE); +g_assert_cmphex(actual, ==, expected_boot); +qmp({ 'execute': 'system_reset' }); +actual = qfw_cfg_get_u16(fw_cfg, FW_CFG_BOOT_DEVICE); +g_assert_cmphex(actual, ==, expected_reboot); +qtest_quit(global_qtest); +g_free(args); +} + +static void test_powermac_boot_order(void) +{ +int i; + +for (i = 0; i 2; i++) { +bool newworld = (i == 1); + +test_powermac_with_args(newworld, , 'c', 'c'); +test_powermac_with_args(newworld, -boot c, 'c', 'c'); +test_powermac_with_args(newworld, -boot d, 'd', 'd'); +} +} + int main(int argc, char *argv[]) { +const char *arch = qtest_get_arch(); + g_test_init(argc, argv, NULL); -qtest_add_func(boot-order/pc, test_pc_boot_order); +if (strcmp(arch, i386) == 0 || strcmp(arch, x86_64) == 0) { +qtest_add_func(boot-order/pc, test_pc_boot_order); +} else if (strcmp(arch, ppc) == 0 || strcmp(arch, ppc64) == 0) { +qtest_add_func(boot-order/powermac, test_powermac_boot_order); +} return g_test_run(); } -- 1.7.11.7
[Qemu-devel] [PATCH v4 01/12] qtest: Don't reset on qtest chardev connect
libqtest's qtest_init() connecting to the qtest socket triggers reset. This was coded in the hope we could use the same QEMU process for multiple tests that way. Never used. Injects an extra reset even when it's not used, and that can mess up tests such as the one of -boot once I'm about to add. Drop it. Signed-off-by: Markus Armbruster arm...@redhat.com --- qtest.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/qtest.c b/qtest.c index 07a9612..74f1842 100644 --- a/qtest.c +++ b/qtest.c @@ -472,7 +472,12 @@ static void qtest_event(void *opaque, int event) switch (event) { case CHR_EVENT_OPENED: -qemu_system_reset(false); +/* + * We used to call qemu_system_reset() here, hoping we could + * use the same process for multiple tests that way. Never + * used. Injects an extra reset even when it's not used, and + * that can mess up tests, e.g. -boot once. + */ for (i = 0; i ARRAY_SIZE(irq_levels); i++) { irq_levels[i] = 0; } -- 1.7.11.7
[Qemu-devel] [PATCH v4 00/12] Boot order tests
v4: * Old PATCH 1-6 got merged, only tests left; cover letter changed accordingly * PATCH 1,6 unchanged * PATCH 2 fix system-reset race (Anthony) * New PATCH 4,11 to make libqos/fw_cfg usable for the ppc, Sun4 tests * New PATCH 3 shanghaied from Anthony, to make PATCH 4 compile without make clean. * PATCH 5,10,12 use libqos/fw_cfg (Anthony) * PATCH 7-9 rebased, minor tweaks v3: * Rebased, with only trivial conflicts * PATCH 08 cosmetic improvements * More test cases: new PATCH 09-16 v2: * New PATCH 7 to make testing -boot once possible * Old PATCH 5 reworked and extended became PATCH 8 * Writing more tests uncovered -no-fd-bootchk weirdness, cleaned up in new PATCH 5+6 Andreas Färber (1): boot-order-test: Add tests for PowerMacs Anthony Liguori (1): libqos: include dependencies Markus Armbruster (10): qtest: Don't reset on qtest chardev connect boot-order-test: New; covering just PC for now libqos: Add support for memory-mapped fw_cfg boot-order-test: Cover -boot once in ppc tests boot-order-test: Better separate target-specific and generic parts boot-order-test: Code motion for better readability boot-order-test: Add tests for PowerPC PREP boot-order-test: Add tests for Sun4m libqos: Generalize I/O-mapped fw_cfg boot-order-test: Add tests for Sun4u qtest.c | 7 +- tests/Makefile | 7 +- tests/boot-order-test.c | 209 +++ tests/fw_cfg-test.c | 2 +- tests/libqos/fw_cfg-pc.c | 40 - tests/libqos/fw_cfg-pc.h | 20 - tests/libqos/fw_cfg.c| 55 + tests/libqos/fw_cfg.h| 9 ++ tests/libqos/malloc-pc.c | 2 +- 9 files changed, 287 insertions(+), 64 deletions(-) create mode 100644 tests/boot-order-test.c delete mode 100644 tests/libqos/fw_cfg-pc.c delete mode 100644 tests/libqos/fw_cfg-pc.h -- 1.7.11.7
[Qemu-devel] [PATCH v4 12/12] boot-order-test: Add tests for Sun4u
Cc: Blue Swirl blauwir...@gmail.com Signed-off-by: Markus Armbruster arm...@redhat.com --- tests/boot-order-test.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/tests/boot-order-test.c b/tests/boot-order-test.c index a3928ed..4b233d0 100644 --- a/tests/boot-order-test.c +++ b/tests/boot-order-test.c @@ -173,6 +173,18 @@ static void test_sun4m_boot_order(void) test_boot_orders(SS-5, read_boot_order_sun4m, test_cases_fw_cfg); } +static uint64_t read_boot_order_sun4u(void) +{ +QFWCFG *fw_cfg = io_fw_cfg_init(0x510); + +return qfw_cfg_get_u16(fw_cfg, FW_CFG_BOOT_DEVICE); +} + +static void test_sun4u_boot_order(void) +{ +test_boot_orders(sun4u, read_boot_order_sun4u, test_cases_fw_cfg); +} + int main(int argc, char *argv[]) { const char *arch = qtest_get_arch(); @@ -189,6 +201,8 @@ int main(int argc, char *argv[]) test_pmac_newworld_boot_order); } else if (strcmp(arch, sparc) == 0) { qtest_add_func(boot-order/sun4m, test_sun4m_boot_order); +} else if (strcmp(arch, sparc64) == 0) { +qtest_add_func(boot-order/sun4u, test_sun4u_boot_order); } return g_test_run(); -- 1.7.11.7
[Qemu-devel] [PATCH v4 04/12] libqos: Add support for memory-mapped fw_cfg
Signed-off-by: Markus Armbruster arm...@redhat.com --- tests/libqos/fw_cfg.c | 29 + tests/libqos/fw_cfg.h | 3 +++ 2 files changed, 32 insertions(+) diff --git a/tests/libqos/fw_cfg.c b/tests/libqos/fw_cfg.c index e386ff7..49d1683 100644 --- a/tests/libqos/fw_cfg.c +++ b/tests/libqos/fw_cfg.c @@ -2,15 +2,19 @@ * libqos fw_cfg support * * Copyright IBM, Corp. 2012-2013 + * Copyright (C) 2013 Red Hat Inc. * * Authors: * Anthony Liguori aligu...@us.ibm.com + * Markus Armbruster arm...@redhat.com * * This work is licensed under the terms of the GNU GPL, version 2 or later. * See the COPYING file in the top-level directory. */ +#include glib.h #include libqos/fw_cfg.h +#include libqtest.h #include qemu/bswap.h void qfw_cfg_select(QFWCFG *fw_cfg, uint16_t key) @@ -50,3 +54,28 @@ uint64_t qfw_cfg_get_u64(QFWCFG *fw_cfg, uint16_t key) return le64_to_cpu(value); } +static void mm_fw_cfg_select(QFWCFG *fw_cfg, uint16_t key) +{ +writew(fw_cfg-base, key); +} + +static void mm_fw_cfg_read(QFWCFG *fw_cfg, void *data, size_t len) +{ +uint8_t *ptr = data; +int i; + +for (i = 0; i len; i++) { +ptr[i] = readb(fw_cfg-base + 2); +} +} + +QFWCFG *mm_fw_cfg_init(uint64_t base) +{ +QFWCFG *fw_cfg = g_malloc0(sizeof(*fw_cfg)); + +fw_cfg-base = base; +fw_cfg-select = mm_fw_cfg_select; +fw_cfg-read = mm_fw_cfg_read; + +return fw_cfg; +} diff --git a/tests/libqos/fw_cfg.h b/tests/libqos/fw_cfg.h index 44fc42b..19bb053 100644 --- a/tests/libqos/fw_cfg.h +++ b/tests/libqos/fw_cfg.h @@ -20,6 +20,7 @@ typedef struct QFWCFG QFWCFG; struct QFWCFG { +uint64_t base; void (*select)(QFWCFG *fw_cfg, uint16_t key); void (*read)(QFWCFG *fw_cfg, void *data, size_t len); }; @@ -31,4 +32,6 @@ uint16_t qfw_cfg_get_u16(QFWCFG *fw_cfg, uint16_t key); uint32_t qfw_cfg_get_u32(QFWCFG *fw_cfg, uint16_t key); uint64_t qfw_cfg_get_u64(QFWCFG *fw_cfg, uint16_t key); +QFWCFG *mm_fw_cfg_init(uint64_t base); + #endif -- 1.7.11.7
[Qemu-devel] [PATCH v4 08/12] boot-order-test: Code motion for better readability
Cc: Andreas Färber afaer...@suse.de Cc: Alexander Graf ag...@suse.de Cc: qemu-...@nongnu.org Signed-off-by: Markus Armbruster arm...@redhat.com --- tests/boot-order-test.c | 44 ++-- 1 file changed, 22 insertions(+), 22 deletions(-) diff --git a/tests/boot-order-test.c b/tests/boot-order-test.c index 78249fc..c711c71 100644 --- a/tests/boot-order-test.c +++ b/tests/boot-order-test.c @@ -15,19 +15,15 @@ #include libqos/fw_cfg.h #include libqtest.h -static uint8_t read_mc146818(uint16_t port, uint8_t reg) -{ -outb(port, reg); -return inb(port + 1); -} - -static uint64_t read_boot_order_pc(void) -{ -uint8_t b1 = read_mc146818(0x70, 0x38); -uint8_t b2 = read_mc146818(0x70, 0x3d); +#define NO_QEMU_PROTOS +#include hw/nvram/fw_cfg.h +#undef NO_QEMU_PROTOS -return b1 | (b2 8); -} +typedef struct { +const char *args; +uint64_t expected_boot; +uint64_t expected_reboot; +} boot_order_test; static void test_a_boot_order(const char *machine, const char *test_args, @@ -57,12 +53,6 @@ static void test_a_boot_order(const char *machine, g_free(args); } -typedef struct { -const char *args; -uint64_t expected_boot; -uint64_t expected_reboot; -} boot_order_test; - static void test_boot_orders(const char *machine, uint64_t (*read_boot_order)(void), const boot_order_test *tests) @@ -77,6 +67,20 @@ static void test_boot_orders(const char *machine, } } +static uint8_t read_mc146818(uint16_t port, uint8_t reg) +{ +outb(port, reg); +return inb(port + 1); +} + +static uint64_t read_boot_order_pc(void) +{ +uint8_t b1 = read_mc146818(0x70, 0x38); +uint8_t b2 = read_mc146818(0x70, 0x3d); + +return b1 | (b2 8); +} + static const boot_order_test test_cases_pc[] = { { , 0x1230, 0x1230 }, @@ -108,10 +112,6 @@ static void test_pc_boot_order(void) test_boot_orders(NULL, read_boot_order_pc, test_cases_pc); } -#define NO_QEMU_PROTOS -#include hw/nvram/fw_cfg.h -#undef NO_QEMU_PROTOS - static uint64_t read_boot_order_pmac(void) { QFWCFG *fw_cfg = mm_fw_cfg_init(0xf510); -- 1.7.11.7
[Qemu-devel] [PATCH v4 07/12] boot-order-test: Better separate target-specific and generic parts
The initial version did just PC. I didn't bother to separate out generic parts, because I don't like to abstract from a single case. Now we have two cases, PC and PowerMac, and I'm about to add more. Time to do it right. To ease review, this commit changes the code in-place, and the next commit reorders it for better readability. Cc: Andreas Färber afaer...@suse.de Cc: Alexander Graf ag...@suse.de Cc: qemu-...@nongnu.org Signed-off-by: Markus Armbruster arm...@redhat.com --- tests/boot-order-test.c | 148 +--- 1 file changed, 91 insertions(+), 57 deletions(-) diff --git a/tests/boot-order-test.c b/tests/boot-order-test.c index 1a8e22a..78249fc 100644 --- a/tests/boot-order-test.c +++ b/tests/boot-order-test.c @@ -15,95 +15,126 @@ #include libqos/fw_cfg.h #include libqtest.h -static void test_pc_cmos_byte(int reg, int expected) +static uint8_t read_mc146818(uint16_t port, uint8_t reg) { -int actual; - -outb(0x70, reg); -actual = inb(0x71); -g_assert_cmphex(actual, ==, expected); +outb(port, reg); +return inb(port + 1); } -static void test_pc_cmos(uint8_t boot1, uint8_t boot2) +static uint64_t read_boot_order_pc(void) { -test_pc_cmos_byte(0x38, boot1); -test_pc_cmos_byte(0x3d, boot2); +uint8_t b1 = read_mc146818(0x70, 0x38); +uint8_t b2 = read_mc146818(0x70, 0x3d); + +return b1 | (b2 8); } -static void test_pc_with_args(const char *test_args, - uint8_t boot1, uint8_t boot2, - uint8_t reboot1, uint8_t reboot2) +static void test_a_boot_order(const char *machine, + const char *test_args, + uint64_t (*read_boot_order)(void), + uint64_t expected_boot, + uint64_t expected_reboot) { -char *args = g_strdup_printf(-nodefaults -display none %s, test_args); +char *args; +uint64_t actual; +args = g_strdup_printf(-nodefaults -display none%s%s %s, + machine ? -M : , + machine ?: , + test_args); qtest_start(args); -test_pc_cmos(boot1, boot2); +actual = read_boot_order(); +g_assert_cmphex(actual, ==, expected_boot); qmp({ 'execute': 'system_reset' }); /* * system_reset only requests reset. We get a RESET event after * the actual reset completes. Need to wait for that. */ qmp();/* HACK: wait for event */ -test_pc_cmos(reboot1, reboot2); +actual = read_boot_order(); +g_assert_cmphex(actual, ==, expected_reboot); qtest_quit(global_qtest); g_free(args); } -static void test_pc_boot_order(void) +typedef struct { +const char *args; +uint64_t expected_boot; +uint64_t expected_reboot; +} boot_order_test; + +static void test_boot_orders(const char *machine, + uint64_t (*read_boot_order)(void), + const boot_order_test *tests) { -test_pc_with_args(, 0x30, 0x12, 0x30, 0x12); -test_pc_with_args(-no-fd-bootchk, 0x31, 0x12, 0x31, 0x12); -test_pc_with_args(-boot c, 0, 0x02, 0, 0x02); -test_pc_with_args(-boot nda, 0x10, 0x34, 0x10, 0x34); -test_pc_with_args(-boot order=, 0, 0, 0, 0); -test_pc_with_args(-boot order= -boot order=c, 0, 0x02, 0, 0x02); -test_pc_with_args(-boot once=a, 0, 0x01, 0x30, 0x12); -test_pc_with_args(-boot once=a -no-fd-bootchk, 0x01, 0x01, 0x31, 0x12); -test_pc_with_args(-boot once=a,order=c, 0, 0x01, 0, 0x02); -test_pc_with_args(-boot once=d -boot order=nda, 0, 0x03, 0x10, 0x34); -test_pc_with_args(-boot once=a -boot once=b -boot once=c, - 0, 0x02, 0x30, 0x12); +int i; + +for (i = 0; tests[i].args; i++) { +test_a_boot_order(machine, tests[i].args, + read_boot_order, + tests[i].expected_boot, + tests[i].expected_reboot); +} } -#define G3BEIGE_CFG_ADDR 0xf510 -#define MAC99_CFG_ADDR 0xf510 +static const boot_order_test test_cases_pc[] = { +{ , + 0x1230, 0x1230 }, +{ -no-fd-bootchk, + 0x1231, 0x1231 }, +{ -boot c, + 0x0200, 0x0200 }, +{ -boot nda, + 0x3410, 0x3410 }, +{ -boot order=, + 0, 0 }, +{ -boot order= -boot order=c, + 0x0200, 0x0200 }, +{ -boot once=a, + 0x0100, 0x1230 }, +{ -boot once=a -no-fd-bootchk, + 0x0101, 0x1231 }, +{ -boot once=a,order=c, + 0x0100, 0x0200 }, +{ -boot once=d -boot order=nda, + 0x0300, 0x3410 }, +{ -boot once=a -boot once=b -boot once=c, + 0x0200, 0x1230 }, +{} +}; + +static void test_pc_boot_order(void) +{ +test_boot_orders(NULL, read_boot_order_pc, test_cases_pc); +} #define NO_QEMU_PROTOS #include hw/nvram/fw_cfg.h #undef NO_QEMU_PROTOS -static void