[Qemu-devel] [RFC v0] HACK: qom: object_property_set: abort on failure
Hi All. A couple of times now ive had debug issues due to silent failure of object_property_set. This function silently fails if the requested property does not exist for the target object. To trap this error I applied the patch below to my tree, but I am assuming that this is not mergeable as is as im guessing there are clients out there that are speculatively trying to set props. Could someone confirm the expected policy here? is setting a non-existant property supposed to be a no-op (as it currently is) or should it fail gracefully? Whats the best meachinism for creating a no_fail version of object_property_set, for the 90% case where a non-existant property is an error in machine model development? Signed-off-by: Peter A. G. Crosthwaite peter.crosthwa...@petalogix.com --- qom/object.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/qom/object.c b/qom/object.c index a552be2..6e875a8 100644 --- a/qom/object.c +++ b/qom/object.c @@ -687,7 +687,7 @@ void object_property_set(Object *obj, Visitor *v, const char *name, { ObjectProperty *prop = object_property_find(obj, name, errp); if (prop == NULL) { -return; +abort(); } if (!prop-set) { -- 1.7.0.4
Re: [Qemu-devel] [PATCH 1/2] virtio-block: support auto-sensing of block sizes
On Mon, Aug 13, 2012 at 07:50:39PM +, Blue Swirl wrote: On Mon, Aug 13, 2012 at 8:40 AM, Jens Freimann jf...@de.ibm.com wrote: From: Einar Lueck elelu...@linux.vnet.ibm.com Virtio-blk does not impose fixed block sizes for access to backing devices. This patch introduces support for auto lookup of the block sizes of the backing block device. This automatic lookup needs to be enabled explicitly. Users may do this by specifying (physical|logical)_block_size=0. Machine types may do this for their defaults, too. To achieve this, a new function blkconf_blocksizes is implemented. If physical or logical block size are zero a corresponding ioctl tries to find an appropriate value. If this does not work, 512 is used. blkconf_blocksizes is therefore only called w/in the virtio-blk context. For s390-virtio, this patch configures auto lookup as default. Signed-off-by: Einar Lueck elelu...@linux.vnet.ibm.com Signed-off-by: Jens Freimann jf...@linux.vnet.ibm.com --- hw/block-common.c| 23 +++ hw/block-common.h| 12 +--- hw/qdev-properties.c | 4 +++- hw/s390-virtio-bus.c | 2 +- hw/virtio-blk.c | 1 + 5 files changed, 37 insertions(+), 5 deletions(-) diff --git a/hw/block-common.c b/hw/block-common.c index f0196d7..444edd2 100644 --- a/hw/block-common.c +++ b/hw/block-common.c @@ -10,6 +10,9 @@ #include blockdev.h #include hw/block-common.h #include qemu-error.h +#ifdef __linux__ +#include linux/fs.h +#endif void blkconf_serial(BlockConf *conf, char **serial) { @@ -24,6 +27,26 @@ void blkconf_serial(BlockConf *conf, char **serial) } } +void blkconf_blocksizes(BlockConf *conf) +{ +int block_size; + +if (!conf-physical_block_size) { +if (bdrv_ioctl(conf-bs, BLKPBSZGET, block_size) == 0) { This use of BLKPBSZGET is not protected by #ifdef __linux__ (or BLKPBSZGET), so this will probably fail when the host OS is not Linux and BLKPBSZGET is not defined. Yes, you're right. +conf-physical_block_size = (uint16_t) block_size; +} else { +conf-physical_block_size = BLOCK_PROPERTY_STD_BLKSIZE; +} +} +if (!conf-logical_block_size) { +if (bdrv_ioctl(conf-bs, BLKSSZGET, block_size) == 0) { Also here. Ok, will fix. Thanks for the review! Jens +conf-logical_block_size = (uint16_t) block_size; +} else { +conf-logical_block_size = BLOCK_PROPERTY_STD_BLKSIZE; +} +} +} + int blkconf_geometry(BlockConf *conf, int *ptrans, unsigned cyls_max, unsigned heads_max, unsigned secs_max) { diff --git a/hw/block-common.h b/hw/block-common.h index bb808f7..d593128 100644 --- a/hw/block-common.h +++ b/hw/block-common.h @@ -40,18 +40,23 @@ static inline unsigned int get_physical_block_exp(BlockConf *conf) return exp; } -#define DEFINE_BLOCK_PROPERTIES(_state, _conf) \ +#define BLOCK_PROPERTY_STD_BLKSIZE 512 +#define DEFINE_BLOCK_PROPERTIES_EXTENDED(_state, _conf, _blksize) \ DEFINE_PROP_DRIVE(drive, _state, _conf.bs), \ DEFINE_PROP_BLOCKSIZE(logical_block_size, _state, \ - _conf.logical_block_size, 512), \ + _conf.logical_block_size, _blksize), \ DEFINE_PROP_BLOCKSIZE(physical_block_size, _state,\ - _conf.physical_block_size, 512), \ + _conf.physical_block_size, _blksize), \ DEFINE_PROP_UINT16(min_io_size, _state, _conf.min_io_size, 0), \ DEFINE_PROP_UINT32(opt_io_size, _state, _conf.opt_io_size, 0),\ DEFINE_PROP_INT32(bootindex, _state, _conf.bootindex, -1),\ DEFINE_PROP_UINT32(discard_granularity, _state, \ _conf.discard_granularity, 0) +#define DEFINE_BLOCK_PROPERTIES(_state, _conf) \ +DEFINE_BLOCK_PROPERTIES_EXTENDED(_state, _conf, \ + BLOCK_PROPERTY_STD_BLKSIZE) + #define DEFINE_BLOCK_CHS_PROPERTIES(_state, _conf) \ DEFINE_PROP_UINT32(cyls, _state, _conf.cyls, 0), \ DEFINE_PROP_UINT32(heads, _state, _conf.heads, 0), \ @@ -60,6 +65,7 @@ static inline unsigned int get_physical_block_exp(BlockConf *conf) /* Configuration helpers */ void blkconf_serial(BlockConf *conf, char **serial); +void blkconf_blocksizes(BlockConf *conf); int blkconf_geometry(BlockConf *conf, int *trans, unsigned cyls_max, unsigned heads_max, unsigned secs_max); diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c index 8aca0d4..e99dee4 100644 --- a/hw/qdev-properties.c +++ b/hw/qdev-properties.c @@ -904,7 +904,9
[Qemu-devel] [PATCH] iscsi: fix race between task completition and task abortion
From: spriebe g...@profihost.ag --- block/iscsi.c | 36 1 files changed, 20 insertions(+), 16 deletions(-) diff --git a/block/iscsi.c b/block/iscsi.c index 12ca76d..257f97f 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -76,6 +76,10 @@ static void iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void *command_data, void *private_data) { + IscsiAIOCB *acb = (IscsiAIOCB *)private_data; + + scsi_free_scsi_task(acb-task); + acb-task = NULL; } static void @@ -85,14 +89,19 @@ iscsi_aio_cancel(BlockDriverAIOCB *blockacb) IscsiLun *iscsilun = acb-iscsilun; acb-common.cb(acb-common.opaque, -ECANCELED); -acb-canceled = 1; -/* send a task mgmt call to the target to cancel the task on the target */ -iscsi_task_mgmt_abort_task_async(iscsilun-iscsi, acb-task, - iscsi_abort_task_cb, NULL); +if (acb-canceled != 0) +return; + +acb-canceled = 1; -/* then also cancel the task locally in libiscsi */ -iscsi_scsi_task_cancel(iscsilun-iscsi, acb-task); +/* send a task mgmt call to the target to cancel the task on the target + * this also cancels the task in libiscsi + */ +if (acb-task) { +iscsi_task_mgmt_abort_task_async(iscsilun-iscsi, acb-task, + iscsi_abort_task_cb, acb); +} } static AIOPool iscsi_aio_pool = { @@ -184,6 +193,11 @@ iscsi_readv_writev_bh_cb(void *p) } qemu_aio_release(acb); + +if (acb-task) { + scsi_free_scsi_task(acb-task); + acb-task = NULL; +} } @@ -212,8 +226,6 @@ iscsi_aio_write16_cb(struct iscsi_context *iscsi, int status, } iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); -scsi_free_scsi_task(acb-task); -acb-task = NULL; } static int64_t sector_qemu2lun(int64_t sector, IscsiLun *iscsilun) @@ -313,8 +325,6 @@ iscsi_aio_read16_cb(struct iscsi_context *iscsi, int status, } iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); -scsi_free_scsi_task(acb-task); -acb-task = NULL; } static BlockDriverAIOCB * @@ -429,8 +439,6 @@ iscsi_synccache10_cb(struct iscsi_context *iscsi, int status, } iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); -scsi_free_scsi_task(acb-task); -acb-task = NULL; } static BlockDriverAIOCB * @@ -483,8 +491,6 @@ iscsi_unmap_cb(struct iscsi_context *iscsi, int status, } iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); -scsi_free_scsi_task(acb-task); -acb-task = NULL; } static BlockDriverAIOCB * @@ -560,8 +566,6 @@ iscsi_aio_ioctl_cb(struct iscsi_context *iscsi, int status, } iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); -scsi_free_scsi_task(acb-task); -acb-task = NULL; } static BlockDriverAIOCB *iscsi_aio_ioctl(BlockDriverState *bs, -- 1.7.2.5
Re: [Qemu-devel] [PATCH] qom: add style guide
On Mon, Aug 13, 2012 at 01:46:46PM -0500, Anthony Liguori wrote: Signed-off-by: Anthony Liguori aligu...@us.ibm.com --- docs/qom-style-guide.md | 489 +++ 1 files changed, 489 insertions(+), 0 deletions(-) create mode 100644 docs/qom-style-guide.md diff --git a/docs/qom-style-guide.md b/docs/qom-style-guide.md new file mode 100644 index 000..e7590e0 --- /dev/null +++ b/docs/qom-style-guide.md @@ -0,0 +1,489 @@ +QEMU Object Model Style Guide += + +Overview + +This document is a step-by-step tutorial of QOM. It is meant to be read from +top to bottom addressing the most common use-cases at the start. This is a +living document and contributions are welcome but it should not attempt to be +an API reference. There code contains inline documentation and API details There code contains - The code contains? +should be covered in the respective header files. + +Motivation +-- +QEMU makes extensive use of object oriented programming. Since QEMU is written +in C, these OOP concepts often use very different mechanisms to achieve the +same goal. The net effect is a lot of infrastructure duplication and a general +lack of consistency. + +The goal of QOM is to use a single infrastructure for all OOP within QEMU. This +improves consistency and eases maintainability over the long term. + +QOM provides a common infrastructure for: + +- Type Management + - Registering types + - Enumerating registered types +- Inheritence + - Single-parent inheritance + - Introspection of inheritance hierarchy + - Multiple inheritance through stateless interfaces +- Polymorphism + - Class based polymorphism + - Virtual and pure virtual methods + - Constructor/destructor chaining +- Object Properties + - Dynamic property registration (tied to Objects) + - Property introspection + - Access permissions + - Accessor hooks +- Type Casting + - Runtime checked upcast/downcast + - Full support for casting up and down the chain (including interfaces) +- Object Enumeration + - Expression of relationship between objects + - Ability to reference objects with a symbolic path + - Represented as a directed graph + +While QOM has a lot of high level concepts, the primary design goal has been to +keep simple concepts simple to implement. + +A Note on Consistency +- +Much of the QOM types in QEMU have been converted through automated scripts. +Tremendous effort was made to make sure the resulting code was high quality and +adhered to all of the guidelines in this document. However, there are many +cases where some of the conversion could not be scripted easily and some short +cuts where taken. + +Whenever possible, as code is refactored for other reasons, it should be brought +up fully to the guidelines expressed in this document. + +Creating a Simple Type +-- +The easiest way to understand QOM is to walk through an example. This is a +typical example of creating a new type derived from Object as the parent class. +In this example, all code would live in a single C source file. + +Let's get started: + +#include qemu/object.h + +This is the header file that contains the core QOM infrastructure. It has +minimum dependencies to facilitate unit testing. + +#define TYPE_MY_TYPE my-type +#define MY_TYPE(obj) OBJECT_CHECK(MyType, (obj), TYPE_MY_TYPE) + +All QOM types should define at least two macros. The first macro is a symbolic +version of the type name. It should always take the form +TYPE_ + upper(typename). Type names should generally follow the naming rules of +QAPI which means dashes, '-', are preferred to underscores, '_'. + +The second macro is a cast macro. The first argument is the type struct and the +remaining arguments are self-evident. This form should always be followed even +if the cast macro isn't currently used by the C file. + +typedef struct MyType MyType; + +struct MyType +{ +Object parent_obj; + +/* private */ +int foo; +}; + +When declaring the structure, a forward declaration should be used. This is +useful for consistency sake as it is required when defining classes. + +The first element must be the parent type and should be named 'parent_obj' or +just 'parent'. When working with QOM types, you should avoid ever accessing +this member directly instead relying on casting macros. + +Casting macros hide the inheritence hierarchy from the implementation. This +makes it easier to refactor code over time by changing the hierarchy without +changing the code in many places. + +static TypeInfo my_type_info = { +.name = TYPE_MY_TYPE, +.parent = TYPE_OBJECT, +.instance_size = sizeof(MyType), +}; + +static void
Re: [Qemu-devel] [PATCH 0/3] VFIO-based PCI device assignment for QEMU 1.2
On 2012-08-13 21:31, Anthony Liguori wrote: Jan Kiszka jan.kis...@siemens.com writes: On 2012-08-13 15:58, Avi Kivity wrote: On 08/13/2012 04:27 PM, Anthony Liguori wrote: Thanks for pushing this forward! Hopefully this will finally kill off qemu-kvm.git for good. No, it won't. vfio requires a 3.6 kernel, which we cannot assume anyone has. We'll need the original device assignment code side-by-side. ...which is on my to-do list for 1.3. Is there a deprecation plan for the old device assignment code? I'm not really against the idea of requiring a new kernel for new features. From a Fedora/OpenSUSE point of view, would supporting old kernels be a requirement to stop shipping qemu-kvm.git over qemu.git? Since distros ship new kernels and new userspaces, I don't think distros would care so I'm not sure who we're trying to support old kernels for. We are supporting KVM down to 2.6.3x, if not 2.6.2x. Also, device assignment is a new feature for upstream, but not for the masses of KVM users of QEMU (due to qemu-kvm and corresponding libvirt support). I think it will take some more kernel releases to have all feature there that allows performance-wise equivalent device assignment via VFIO. And it can even be helpful to cross-check issues of VFIO in the field. Except for some self-contained helper functions in the KVM layer, classic device assignment will be as isolated as VFIO. So I don't think we would take any noteworthy burden to maintain it as long as the kernel supports this interface. Can't comment on the other questions. Jan signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH] kvm-all.c: Move init of irqchip_inject_ioctl out of kvm_irqchip_create()
On 2012-08-06 19:05, Peter Maydell wrote: Move the init of the irqchip_inject_ioctl field of KVMState out of kvm_irqchip_create() and into kvm_init(), so that kvm_set_irq() can be used even when no irqchip is created (for architectures that support async interrupt notification even without an in kernel irqchip). Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- We can't just set irqchip_inject_ioctl in target-*/kvm.c because the KVMState struct layout is private to kvm-all.c. Moving the default initialisation of this field seemed the simplest approach. It's safe because the use in kvm_set_irq() is guarded by a check of kvm_async_interrupts_enabled(). The other approach would be to have a helper function for setting the field, but that seems overkill when KVM_IRQ_LINE is the standard default value. (KVM_IRQ_LINE_STATUS seems to be undocumented, incidentally. I am going to assume it's another x86ism until somebody does document it :-)) kvm-all.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kvm-all.c b/kvm-all.c index 6def6c9..9a1f001 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -1200,7 +1200,6 @@ static int kvm_irqchip_create(KVMState *s) return ret; } -s-irqchip_inject_ioctl = KVM_IRQ_LINE; if (kvm_check_extension(s, KVM_CAP_IRQ_INJECT_STATUS)) { s-irqchip_inject_ioctl = KVM_IRQ_LINE_STATUS; } Either you move both or none. KVM_IRQ_LINE is old-style, deprecated, KVM_IRQ_LINE_STATUS (i.e injection with feedback to allow lost-tick compensation) is the current standard that other archs should pick up. Jan signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH] kvm-all.c: Move init of irqchip_inject_ioctl out of kvm_irqchip_create()
On 2012-08-06 19:05, Peter Maydell wrote: Move the init of the irqchip_inject_ioctl field of KVMState out of kvm_irqchip_create() and into kvm_init(), so that kvm_set_irq() can be used even when no irqchip is created (for architectures that support async interrupt notification even without an in kernel irqchip). Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- We can't just set irqchip_inject_ioctl in target-*/kvm.c because the KVMState struct layout is private to kvm-all.c. Moving the default initialisation of this field seemed the simplest approach. It's safe because the use in kvm_set_irq() is guarded by a check of kvm_async_interrupts_enabled(). The other approach would be to have a helper function for setting the field, but that seems overkill when KVM_IRQ_LINE is the standard default value. (KVM_IRQ_LINE_STATUS seems to be undocumented, incidentally. I am going to assume it's another x86ism until somebody does document it :-)) Oh, and when implementing KVM_IRQ_LINE_STATUS for ARM, please also feel free to document it in a way that it applies both to its x86 history and new architectures. Jan signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH] kvm-all.c: Move init of irqchip_inject_ioctl out of kvm_irqchip_create()
On 14 August 2012 08:33, Jan Kiszka jan.kis...@web.de wrote: Either you move both or none. OK. KVM_IRQ_LINE is old-style, deprecated, KVM_IRQ_LINE_STATUS (i.e injection with feedback to allow lost-tick compensation) is the current standard that other archs should pick up. Can it be documented in the kernel api.txt then, please? Nobody is going to use it otherwise... (If I'd been paying attention at the time I'd have nak'd the qemu patches that added it on the grounds they were using an undocumented kernel API :-)) -- PMM
Re: [Qemu-devel] [PATCH uq/master] kvm: i8254: Finish time conversion fix
On 2012-08-13 20:40, Michael Tokarev wrote: On 13.08.2012 22:18, Jan Kiszka wrote: 0cdd3d1444 fixed reading back the counter load time from the kernel while assuming the kernel would always update its load time on writing the state. That is only true for channel 1, and so pit_get_channel_info returned wrong output pin states for high counter values. Fix this by applying the offset also on kvm_pit_put. For this purpose, we cache the clock offset in KVMPITState, only updating it on VM state changes or when we write the state while the VM is stopped. Wug. The fix (consisting of two halves) appears to be quite messy. I will split it up into offset caching and application to kvm_pit_put. Is it a (temporary) workaround or a real solution? No, this is the real solution. It may look complex, but it is required due to the different time bases of the in-kernel PIT and QEMU's vmclock. We didn't care about this in qemu-kvm in the past, but upstream now actually supports migration between in-kernel and user space models, and it also supports the PC speaker with the in-kernel PIT enabled. And yes, this second half fixes the reported issue with grub timekeeping, and should fix the seabios problem as well (so it shouldn't be necessary to mess with timekeeping in seabios anymore). Thanks, great to hear! Jan signature.asc Description: OpenPGP digital signature
[Qemu-devel] [PATCH] dma: Fix stupid typo/thinko
Hi hard a brain fart when coding that function, it will fail to set the memory beyond the first 512 bytes. This is in turn causing guest crashes in ibmveth (spapr_llan.c on the qemu side) due to the receive queue not being properly initialized. Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org --- Anthony, I believe this could/should go in ASAP :-) diff --git a/dma-helpers.c b/dma-helpers.c index 35cb500..53e47c6 100644 --- a/dma-helpers.c +++ b/dma-helpers.c @@ -24,8 +24,8 @@ static void do_dma_memory_set(dma_addr_t addr, uint8_t c, dma_addr_t len) while (len 0) { l = len FILLBUF_SIZE ? len : FILLBUF_SIZE; cpu_physical_memory_rw(addr, fillbuf, l, true); -len -= len; -addr += len; +len -= l; +addr += l; } }
Re: [Qemu-devel] [PATCH] kvm-all.c: Move init of irqchip_inject_ioctl out of kvm_irqchip_create()
On 2012-08-14 09:40, Peter Maydell wrote: On 14 August 2012 08:33, Jan Kiszka jan.kis...@web.de wrote: Either you move both or none. OK. KVM_IRQ_LINE is old-style, deprecated, KVM_IRQ_LINE_STATUS (i.e injection with feedback to allow lost-tick compensation) is the current standard that other archs should pick up. Can it be documented in the kernel api.txt then, please? Nobody is going to use it otherwise... (If I'd been paying attention at the time I'd have nak'd the qemu patches that added it on the grounds they were using an undocumented kernel API :-)) The kernel API's documentation has in fact a much younger history than KVM support in QEMU. I think we still need to add quite a few standard IOCTLs to make it complete. Patches always welcome. Jan signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH v8] kvm: notify host when the guest is panicked
On Mon, Aug 13, 2012 at 05:24:52PM -0300, Marcelo Tosatti wrote: On Mon, Aug 13, 2012 at 01:48:39PM -0600, Eric Blake wrote: On 08/13/2012 12:21 PM, Marcelo Tosatti wrote: On Wed, Aug 08, 2012 at 10:43:01AM +0800, Wen Congyang wrote: We can know the guest is panicked when the guest runs on xen. But we do not have such feature on kvm. Another purpose of this feature is: management app(for example: libvirt) can do auto dump when the guest is panicked. If management app does not do auto dump, the guest's user can do dump by hand if he sees the guest is panicked. We have three solutions to implement this feature: 1. use vmcall 2. use I/O port 3. use virtio-serial. We have decided to avoid touching hypervisor. The reason why I choose choose the I/O port is: 1. it is easier to implememt 2. it does not depend any virtual device 3. it can work when starting the kernel How about searching for the Kernel panic - not syncing string in the guests serial output? Say libvirtd could take an action upon that? Advantages: - It works for all architectures. - It does not depend on any virtual device. But it _does_ depend on a serial console, Which already exists and is supported. and furthermore requires libvirt to tee the serial console (right now, libvirt can treat the console as an opaque pass-through to the end user, but if you expect libvirt to parse the serial console for a particular string, you've lost some efficiency). - It works as early as serial console output does (panics before that should be rare). - It allows you to see why the guest panicked. I think your arguments for a serial console have already been made and refuted in earlier versions of this patch series, which is WHY this series is still applicable. Refuted why, exactly? Efficiency to parse serial console output in libvirt should not be a major issue surely? It is not zero config (guests do not send console output to serial by default). If vm users want to use serial for its working console panic notification will trigger every time user examines dmesg with Kernel panic - not syncing in it. -- Gleb.
Re: [Qemu-devel] [PATCH] kvm-all.c: Move init of irqchip_inject_ioctl out of kvm_irqchip_create()
On 14 August 2012 08:42, Jan Kiszka jan.kis...@web.de wrote: On 2012-08-14 09:40, Peter Maydell wrote: On 14 August 2012 08:33, Jan Kiszka jan.kis...@web.de wrote: KVM_IRQ_LINE is old-style, deprecated, KVM_IRQ_LINE_STATUS (i.e injection with feedback to allow lost-tick compensation) is the current standard that other archs should pick up. Can it be documented in the kernel api.txt then, please? Nobody is going to use it otherwise... (If I'd been paying attention at the time I'd have nak'd the qemu patches that added it on the grounds they were using an undocumented kernel API :-)) The kernel API's documentation has in fact a much younger history than KVM support in QEMU. I think we still need to add quite a few standard IOCTLs to make it complete. Patches always welcome. Well, you appear to know what this variant ioctl does and why it's better than KVM_IRQ_LINE, whereas I don't. I just want to deliver an interrupt, KVM_IRQ_LINE lets me deliver an interrupt, why do I need anything more? (What would I do with the status return, for instance? I have to assert the incoming irq line, there's nothing for me to do if the kernel says sorry, can't do that except abort qemu.) -- PMM
Re: [Qemu-devel] [PATCH] kvm-all.c: Move init of irqchip_inject_ioctl out of kvm_irqchip_create()
On 2012-08-14 09:52, Peter Maydell wrote: On 14 August 2012 08:42, Jan Kiszka jan.kis...@web.de wrote: On 2012-08-14 09:40, Peter Maydell wrote: On 14 August 2012 08:33, Jan Kiszka jan.kis...@web.de wrote: KVM_IRQ_LINE is old-style, deprecated, KVM_IRQ_LINE_STATUS (i.e injection with feedback to allow lost-tick compensation) is the current standard that other archs should pick up. Can it be documented in the kernel api.txt then, please? Nobody is going to use it otherwise... (If I'd been paying attention at the time I'd have nak'd the qemu patches that added it on the grounds they were using an undocumented kernel API :-)) The kernel API's documentation has in fact a much younger history than KVM support in QEMU. I think we still need to add quite a few standard IOCTLs to make it complete. Patches always welcome. Well, you appear to know what this variant ioctl does and why it's better than KVM_IRQ_LINE, whereas I don't. I just want to deliver an interrupt, KVM_IRQ_LINE lets me deliver an interrupt, why do I need anything more? (What would I do with the status return, for instance? I have to assert the incoming irq line, there's nothing for me to do if the kernel says sorry, can't do that except abort qemu.) Not sure how timekeeping of all your guests will work, but a classic scenario on x86 is that some timer is programmed to deliver periodic ticks (or one-shot ticks that also generates a virtual periodic timer) and that those ticks will then be used to derive the system time of the guest. Now, if the guest was unable to process the past tick completely (due to host load) and we inject already another tick event, that one will get lost. Some guests (older Linuxes but also many proprietary OSes) are not prepared for such tick loss and will suffer from drifting wall clocks. For that reason, we allow userspace to find out if a (potentially) tick driving IRQ was actually received by the guest or if it coalesced with an ongoing event. In the latter case, userspace can reinject those events once the guest is able to receive them again. All we need from the kernel API is that feedback KVM_IRQ_LINE_STATUS provides. The return values are nicely hidden in kvm_set_irq: /* * Return value: * 0 Interrupt was ignored (masked or not delivered for other reasons) * = 0 Interrupt was coalesced (previous irq is still pending) * 0 Number of CPUs interrupt was delivered to */ QEMU doesn't make use of that number of receiving CPUs and I'm mot sure why we even report it. Maybe the kernel API should just state that 0 means delivered. Jan signature.asc Description: OpenPGP digital signature
[Qemu-devel] [PATCH] i2c: factor out VMSD to parent class
Hi All. PMM raised a query on a recent series of mine (the SSI series) about handling VMSD for devices which define state at multiple levels of the QOM heirachy. Rather than complicate the discussion over in my series im trying to start the discussion with an existing subsystem - i2c. This patch is a first attempt at trying to get the VMSD for generic I2C state factored out of the individual devices and handled transparently by the super class (I2C_SLAVE). I have applied the change to only the one I2C device (max7310). If we were going to run with this, the change pattern would be applied to all I2C devices. This patch is not a merge proposal it is RFC only. Please review and let us know if this is flawed or not. What needs to be done to get this multi-level VMSD going? I will use whatever review I get to fix my SSI series as well as fix I2C. Signed-off-by: Peter A. G. Crosthwaite peter.crosthwa...@petalogix.com --- hw/i2c.c |2 ++ hw/i2c.h |8 hw/max7310.c |1 - 3 files changed, 2 insertions(+), 9 deletions(-) diff --git a/hw/i2c.c b/hw/i2c.c index 296bece..17e1633 100644 --- a/hw/i2c.c +++ b/hw/i2c.c @@ -207,6 +207,8 @@ static int i2c_slave_qdev_init(DeviceState *dev) I2CSlave *s = I2C_SLAVE_FROM_QDEV(dev); I2CSlaveClass *sc = I2C_SLAVE_GET_CLASS(s); +vmstate_register(NULL, 0, vmstate_i2c_slave, s); + return sc-init(s); } diff --git a/hw/i2c.h b/hw/i2c.h index 0f5682b..5b75ecc 100644 --- a/hw/i2c.h +++ b/hw/i2c.h @@ -81,12 +81,4 @@ void lm832x_key_event(DeviceState *dev, int key, int state); extern const VMStateDescription vmstate_i2c_slave; -#define VMSTATE_I2C_SLAVE(_field, _state) { \ -.name = (stringify(_field)), \ -.size = sizeof(I2CSlave), \ -.vmsd = vmstate_i2c_slave,\ -.flags = VMS_STRUCT,\ -.offset = vmstate_offset_value(_state, _field, I2CSlave),\ -} - #endif diff --git a/hw/max7310.c b/hw/max7310.c index 1ed18ba..9375691 100644 --- a/hw/max7310.c +++ b/hw/max7310.c @@ -156,7 +156,6 @@ static const VMStateDescription vmstate_max7310 = { VMSTATE_UINT8(polarity, MAX7310State), VMSTATE_UINT8(status, MAX7310State), VMSTATE_UINT8(command, MAX7310State), -VMSTATE_I2C_SLAVE(i2c, MAX7310State), VMSTATE_END_OF_LIST() } }; -- 1.7.0.4
[Qemu-devel] [PATCH uq/master 1/2] kvm: i8254: Cache kernel clock offset in KVMPITState
To prepare the final fix for clock calibration issues with the in-kernel PIT, we want to cache the offset between vmclock and the clock used by the in-kernel PIT. So far, we only need to update it when the VM state changes between running and stopped because we only read the in-kernel PIT state while the VM is running. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/kvm/i8254.c | 38 -- 1 files changed, 24 insertions(+), 14 deletions(-) diff --git a/hw/kvm/i8254.c b/hw/kvm/i8254.c index c5d3711..c235d80 100644 --- a/hw/kvm/i8254.c +++ b/hw/kvm/i8254.c @@ -35,7 +35,8 @@ typedef struct KVMPITState { PITCommonState pit; LostTickPolicy lost_tick_policy; -bool state_valid; +bool vm_stopped; +int64_t kernel_clock_offset; } KVMPITState; static int64_t abs64(int64_t v) @@ -43,19 +44,11 @@ static int64_t abs64(int64_t v) return v 0 ? -v : v; } -static void kvm_pit_get(PITCommonState *pit) +static void kvm_pit_update_clock_offset(KVMPITState *s) { -KVMPITState *s = DO_UPCAST(KVMPITState, pit, pit); -struct kvm_pit_state2 kpit; -struct kvm_pit_channel_state *kchan; -struct PITChannelState *sc; int64_t offset, clock_offset; struct timespec ts; -int i, ret; - -if (s-state_valid) { -return; -} +int i; /* * Measure the delta between CLOCK_MONOTONIC, the base used for @@ -72,6 +65,21 @@ static void kvm_pit_get(PITCommonState *pit) clock_offset = offset; } } +s-kernel_clock_offset = clock_offset; +} + +static void kvm_pit_get(PITCommonState *pit) +{ +KVMPITState *s = DO_UPCAST(KVMPITState, pit, pit); +struct kvm_pit_state2 kpit; +struct kvm_pit_channel_state *kchan; +struct PITChannelState *sc; +int i, ret; + +/* No need to re-read the state if VM is stopped. */ +if (s-vm_stopped) { +return; +} if (kvm_has_pit_state2()) { ret = kvm_vm_ioctl(kvm_state, KVM_GET_PIT2, kpit); @@ -106,7 +114,7 @@ static void kvm_pit_get(PITCommonState *pit) sc-mode = kchan-mode; sc-bcd = kchan-bcd; sc-gate = kchan-gate; -sc-count_load_time = kchan-count_load_time + clock_offset; +sc-count_load_time = kchan-count_load_time + s-kernel_clock_offset; } sc = pit-channels[0]; @@ -211,10 +219,12 @@ static void kvm_pit_vm_state_change(void *opaque, int running, KVMPITState *s = opaque; if (running) { -s-state_valid = false; +kvm_pit_update_clock_offset(s); +s-vm_stopped = false; } else { +kvm_pit_update_clock_offset(s); kvm_pit_get(s-pit); -s-state_valid = true; +s-vm_stopped = true; } } -- 1.7.3.4
[Qemu-devel] [PATCH uq/master 2/2] kvm: i8254: Finish time conversion fix
0cdd3d1444 fixed reading back the counter load time from the kernel while assuming the kernel would always update its load time on writing the state. That is only true for channel 1, and so pit_get_channel_info returned wrong output pin states for high counter values. Fix this by applying the offset also on kvm_pit_put. Now we also need to update the offset when we write the state while the VM is stopped as it keeps on changing in that state. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/kvm/i8254.c | 14 ++ 1 files changed, 10 insertions(+), 4 deletions(-) diff --git a/hw/kvm/i8254.c b/hw/kvm/i8254.c index c235d80..53d13e3 100644 --- a/hw/kvm/i8254.c +++ b/hw/kvm/i8254.c @@ -122,17 +122,23 @@ static void kvm_pit_get(PITCommonState *pit) pit_get_next_transition_time(sc, sc-count_load_time); } -static void kvm_pit_put(PITCommonState *s) +static void kvm_pit_put(PITCommonState *pit) { +KVMPITState *s = DO_UPCAST(KVMPITState, pit, pit); struct kvm_pit_state2 kpit; struct kvm_pit_channel_state *kchan; struct PITChannelState *sc; int i, ret; -kpit.flags = s-channels[0].irq_disabled ? KVM_PIT_FLAGS_HPET_LEGACY : 0; +/* The offset keeps changing as long as the VM is stopped. */ +if (s-vm_stopped) { +kvm_pit_update_clock_offset(s); +} + +kpit.flags = pit-channels[0].irq_disabled ? KVM_PIT_FLAGS_HPET_LEGACY : 0; for (i = 0; i 3; i++) { kchan = kpit.channels[i]; -sc = s-channels[i]; +sc = pit-channels[i]; kchan-count = sc-count; kchan-latched_count = sc-latched_count; kchan-count_latched = sc-count_latched; @@ -145,7 +151,7 @@ static void kvm_pit_put(PITCommonState *s) kchan-mode = sc-mode; kchan-bcd = sc-bcd; kchan-gate = sc-gate; -kchan-count_load_time = sc-count_load_time; +kchan-count_load_time = sc-count_load_time - s-kernel_clock_offset; } ret = kvm_vm_ioctl(kvm_state, -- 1.7.3.4
Re: [Qemu-devel] [PATCH] i2c: factor out VMSD to parent class
Peter A. G. Crosthwaite peter.crosthwa...@petalogix.com wrote: Hi All. PMM raised a query on a recent series of mine (the SSI series) about handling VMSD for devices which define state at multiple levels of the QOM heirachy. Rather than complicate the discussion over in my series im trying to start the discussion with an existing subsystem - i2c. This patch is a first attempt at trying to get the VMSD for generic I2C state factored out of the individual devices and handled transparently by the super class (I2C_SLAVE). I have applied the change to only the one I2C device (max7310). If we were going to run with this, the change pattern would be applied to all I2C devices. This patch is not a merge proposal it is RFC only. Please review and let us know if this is flawed or not. What needs to be done to get this multi-level VMSD going? I will use whatever review I get to fix my SSI series as well as fix I2C. Signed-off-by: Peter A. G. Crosthwaite peter.crosthwa...@petalogix.com This series move data from one part to another (obvious), now the questions: - how do you know that one part of the data relates to the same device on the other side? I don't know about i2c, I am hoping for an answer O;-) This will make that the state of the device will be sent in two chunks, so we should make sure that the two chunks ends in the same device. - This makes the change completely uncompatible. What boards use i2c/ssi? My understanding is that it is only used outside of x86(_64), if so, we can live without backward compatibility. We should increase the version numbers. Something like that on addition. static const VMStateDescription vmstate_max7310 = { .name = max7310, - .version_id = 0, - .minimum_version_id = 0, - .minimum_version_id_old = 0, + .version_id = 1, + .minimum_version_id = 1, + .minimum_version_id_old = 1, And that should make it. - If you ask me, I would very much preffer something like PCI devices, where the 1st field of any specific device is the i2c part. This would achieve two things: * all i2c devices would have the common fields at the beggining * we sent the data for one device in one go, so we will never had trouble making sure that both devices arrive at the same time, in the right order, etc. - I guess there is same reasy why you want to split the device state, it could be on the other series where I haven't read it though. Later, Juan.
Re: [Qemu-devel] [PATCH v6 2/2] block: Support GlusterFS as a QEMU block backend
Am 14.08.2012 06:38, schrieb Bharata B Rao: Kevin, Thanks for your review. I will address all of your comments in the next iteration, but have a few questions/comments on the others... On Mon, Aug 13, 2012 at 02:50:29PM +0200, Kevin Wolf wrote: +static int parse_server(GlusterURI *uri, char *server) +{ +int ret = -EINVAL; +char *token, *saveptr; +char *p, *q = server; + +p = strchr(server, '['); +if (p) { +/* [ipv6] */ +if (p != server) { +/* [ not in the beginning */ +goto out; +} +q++; +p = strrchr(p, ']'); +if (!p) { +/* No matching ] */ +goto out; +} +*p++ = '\0'; +uri-server = g_strdup(q); + +if (*p) { +if (*p != ':') { +/* [ipv6] followed by something other than : */ +goto out; +} +uri-port = strtoul(++p, NULL, 0); +if (uri-port 0) { +goto out; +} This accepts inputs where the colon isn't followed by any number. Yes, and that will result in port=0, which is default. So this is to cater for cases like gluster://[1:2:3:4:5]:/volname/image So you consider this a valid URL? I would have expected it to invalid. But let me see, there must be some official definition of an URL... Alright, so RFC 2234 says that having no digits after the colon is valid. It also says that you shouldn't generate such URLs. And it doesn't say what it means when it's there... Common interpretation seems to be that it's treated as if it wasn't specified, i.e. the default port for the schema is used. So if 0 is the default port for glusterfs, your code looks okay. But it doesn't seem to be a very useful default port number. In any case, let me see if I can get rid of this altogether and reuse qemu-sockets.c:inet_parse(). Cool, thanks! +if (token) { +uri-port = strtoul(token, NULL, 0); +if (uri-port 0) { +goto out; +} +} else { +uri-port = 0; +} The port parsing code is duplicated in IPv4 and IPv6, even though it's really the same. Being such a small piece of code, I didn't think it deserves to be made a function on its own and re-used. But if you really want it that way, I can do. It's not worth a separate function, but it can just be code outside the if statement. +static struct glfs *qemu_gluster_init(GlusterURI *uri, const char *filename) +{ +struct glfs *glfs = NULL; +int ret; + +ret = qemu_gluster_parseuri(uri, filename); +if (ret 0) { +error_report(Usage: file=gluster://server[:port]/volname/image +[?transport=socket]); Is 'socket' really the only valid transport and will it stay like this without changes to qemu? There are others like 'unix' and 'rdma'. I will fix this error message to reflect that. However QEMU needn't change for such transport types because I am not interpreting the transport type in QEMU but instead passing it on directly to GlusterFS. Maybe then just specify [?transport=...] instead of giving a specific option value? + +static void qemu_gluster_aio_event_reader(void *opaque) +{ +BDRVGlusterState *s = opaque; +GlusterAIOCB *event_acb; +int event_reader_pos = 0; +ssize_t ret; + +do { +char *p = (char *)event_acb; + +ret = read(s-fds[GLUSTER_FD_READ], p + event_reader_pos, + sizeof(event_acb) - event_reader_pos); So you're reading in a pointer address from a pipe? This is fun. +if (ret 0) { +event_reader_pos += ret; +if (event_reader_pos == sizeof(event_acb)) { +event_reader_pos = 0; +qemu_gluster_complete_aio(event_acb); +s-qemu_aio_count--; +} +} +} while (ret 0 errno == EINTR); In case of a short read the read data is just discarded? Maybe event_reader_pos was supposed to static? In earlier versions event_reader_pos was part of BDRVGlusterState and I made it local in subsequent versions and that is causing this problem. Will fix. Ah, yes, it needs to be in BDRVGlusterState, static doesn't work when you have multiple gluster drives. + +static void qemu_gluster_aio_cancel(BlockDriverAIOCB *blockacb) +{ +GlusterAIOCB *acb = (GlusterAIOCB *)blockacb; + +acb-common.cb(acb-common.opaque, -ECANCELED); +acb-canceled = true; +} After having called acb-common.cb you must not write any longer to the memory pointed at by the qiov. Either you can really cancel the request, or you need to wait until it completes. I don't think I can cancel the request that has already been sent to gluster server. block/qed.c introduces acb-finished and waits on it in qed_aio_canel. Do you suggest I follow that model ? This is a possible solution,
[Qemu-devel] memory: could we add extra input param for memory_region_init_io()?
To make memoryRegion survive without the protection of qemu big lock, we need to pin its based Object. In current code, the type of mr-opaque are quite different. Lots of them are Object, but quite of them are not yet. The most challenge for changing from memory_region_init_io(..., void *opaque, ...) to memory_region_init_io(..., Object *opaque,...) is such codes: hw/ide/cmd646.c: static void setup_cmd646_bar(PCIIDEState *d, int bus_num) { IDEBus *bus = d-bus[bus_num]; CMD646BAR *bar = d-cmd646_bar[bus_num]; bar-bus = bus; bar-pci_dev = d; memory_region_init_io(bar-cmd, cmd646_cmd_ops, bar, cmd646-cmd, 4); memory_region_init_io(bar-data, cmd646_data_ops, bar, cmd646-data, 8); } If we passed in mr's based Object @d to substitute @bar, then we can not pass the extra info @bus_num. To solve such issue, introduce extra member Object *base for MemoryRegion. diff --git a/memory.c b/memory.c index 643871b..afd5dea 100644 --- a/memory.c +++ b/memory.c @@ -931,6 +931,7 @@ static void memory_region_dispatch_write(MemoryRegion *mr, void memory_region_init_io(MemoryRegion *mr, const MemoryRegionOps *ops, + Object *base, void *opaque, const char *name, uint64_t size) @@ -938,6 +939,7 @@ void memory_region_init_io(MemoryRegion *mr, memory_region_init(mr, name, size); mr-ops = ops; mr-opaque = opaque; +mr-base = base; mr-terminates = true; mr-destructor = memory_region_destructor_iomem; mr-ram_addr = ~(ram_addr_t)0; diff --git a/memory.h b/memory.h index bd1bbae..2746e70 100644 --- a/memory.h +++ b/memory.h @@ -120,6 +120,7 @@ struct MemoryRegion { /* All fields are private - violators will be prosecuted */ const MemoryRegionOps *ops; void *opaque; +Object *base; MemoryRegion *parent; Int128 size; target_phys_addr_t addr; Any comment? Thanks, pingfan
Re: [Qemu-devel] Funny -m arguments can crash
Anthony Liguori anth...@codemonkey.ws writes: Markus Armbruster arm...@redhat.com writes: Avi Kivity a...@redhat.com writes: On 08/08/2012 12:04 PM, Markus Armbruster wrote: Yes please, maybe with a notice to the user. Next problem: minimum RAM size. For instance, -M pc -m X, where X 32KiB dies qemu: fatal: Trying to execute code outside RAM or ROM at [...] Aborted (core dumped) with TCG, and KVM internal error. Suberror: 1 with KVM. Should a minimum RAM size be enforced? Board-specific? It's really a BIOS bug causing a limitation of both kvm and tcg to be hit. The BIOS should recognize it doesn't have sufficient memory and hang gracefully (if you can picture that). It just assumes some low memory is available and tries to execute it with the results you got. SeaBIOS indeed assumes it got at least 1MiB of RAM. It doesn't bother to check CMOS for a smaller RAM size. However, that bug / feature is currently masked by a QEMU bug: we screw up CMOS contents when there's less than 1 MiB of RAM. pc_cmos_init(): int val, nb, i; [...] /* memory size */ val = 640; /* base memory in K */ rtc_set_memory(s, 0x15, val); rtc_set_memory(s, 0x16, val 8); val = (ram_size / 1024) - 1024; if (val 65535) val = 65535; rtc_set_memory(s, 0x17, val); rtc_set_memory(s, 0x18, val 8); If ram_size 1MiB, val goes negative. Oops. For instance, with -m 500k, we happily promise 640KiB base memory (CMOS addr 0x15..16), almost 64MiB extended memory (0x17..18 and 0x30..31), yet no memory above 16MiB (0x34..35). An easy way to fix this is to require 1MiB of RAM :) But if you like, I'll put sane values in CMOS instead. That'll expose the SeaBIOS bug. Anthony, you're the PC maintainer, got a preference? SeaBIOS thread: http://comments.gmane.org/gmane.comp.bios.coreboot.seabios/4341 I'd prefer fixing the CMOS values over limiting to 1MB of RAM. Having a 1MB limit is purely theoritical--not practical. There's no good reason for anyone to ask for 1MB unless they know what they're doing. If it's truly a mistake, then asking for 2MB is just as much of a mistake because no real guest will run with 2MB of memory anyway (you can't even load a kernel). So if we're just going for theoritical correctness, we ought to do it the Right Way which is fixing the CMOS values and putting the check in SeaBIOS. Next error: $ gdb --args qemu-system-x86_64 -nodefaults --enable-kvm -vnc :0 -monitor stdio -m 640k [...] Program received signal SIGSEGV, Segmentation fault. [...] (gdb) bt #0 0x003b0de884ac in __memcmp_sse2 () from /lib64/libc.so.6 #1 0x0063f1ad in patch_hypercalls (s=0x139b350) at /work/armbru/qemu/hw/i386/../kvmvapic.c:532 #2 0x0063f3fe in vapic_prepare (s=0x139b350) at /work/armbru/qemu/hw/i386/../kvmvapic.c:597 #3 0x0063f4ed in vapic_write (opaque=0x139b350, addr=0, data=32, size= 2) at /work/armbru/qemu/hw/i386/../kvmvapic.c:634 #4 0x00677a44 in memory_region_write_accessor (opaque=0x139d670, addr= 0, value=0x7654bbf0, size=2, shift=0, mask=65535) at /work/armbru/qemu/memory.c:329 #5 0x00677b26 in access_with_adjusted_size (addr=0, value= 0x7654bbf0, size=2, access_size_min=1, access_size_max=4, access= 0x6779d0 memory_region_write_accessor, opaque=0x139d670) at /work/armbru/qemu/memory.c:359 #6 0x00677f80 in memory_region_iorange_write (iorange=0x139e050, offset=0, width=2, data=32) at /work/armbru/qemu/memory.c:436 #7 0x006709a5 in ioport_writew_thunk (opaque=0x139e050, addr=126, data=32) at /work/armbru/qemu/ioport.c:219 #8 0x00670384 in ioport_write (index=1, address=126, data=32) at /work/armbru/qemu/ioport.c:83 #9 0x00670e32 in cpu_outw (addr=126, val=32) at /work/armbru/qemu/ioport.c:296 #10 0x00674637 in kvm_handle_io (port=126, data=0x77ffb000, direction=1, size=2, count=1) at /work/armbru/qemu/kvm-all.c:1411 #11 0x00674bc5 in kvm_cpu_exec (env=0x1389b30) at /work/armbru/qemu/kvm-all.c:1556 #12 0x0060e0b4 in qemu_kvm_cpu_thread_fn (arg=0x1389b30) at /work/armbru/qemu/cpus.c:757 #13 0x003b0ea07d14 in start_thread () from /lib64/libpthread.so.0 #14 0x003b0def197d in clone () from /lib64/libc.so.6 Happens when -m argument is a multiple of 4k in [648k..768k]. Only with --enable-kvm. With and without my CMOS fix applied. And another one: $ qemu-system-x86_64 -nodefaults --enable-kvm -vnc :0 -monitor stdio -m 900k QEMU 1.1.50 monitor - type 'help' for more information (qemu) KVM internal error. Suberror: 1 emulation failure EAX=000fdb78 EBX= ECX= EDX=000fdb64 ESI= EDI=000fdb64 EBP= ESP=6f98 EIP=000e3492 EFL=0006
Re: [Qemu-devel] [PATCH] i2c: factor out VMSD to parent class
On 14 August 2012 09:27, Juan Quintela quint...@redhat.com wrote: Peter A. G. Crosthwaite peter.crosthwa...@petalogix.com wrote: Hi All. PMM raised a query on a recent series of mine (the SSI series) about handling VMSD for devices which define state at multiple levels of the QOM heirachy. - If you ask me, I would very much preffer something like PCI devices, where the 1st field of any specific device is the i2c part. This would achieve two things: * all i2c devices would have the common fields at the beggining * we sent the data for one device in one go, so we will never had trouble making sure that both devices arrive at the same time, in the right order, etc. - I guess there is same reasy why you want to split the device state, it could be on the other series where I haven't read it though. Really I'm just trying to get clarification on how class hierarchies should handle vmstate. At the moment any device which is a subclass of i2c has a VMSTATE_I2C_SLAVE field corresponding to the element in its struct which is its parent object. This seems a bit odd because surely the parent class should be responsible for its own save/load? I'm also not sure we do this consistently through the whole QOM hierarchy. So what I'm after is not necessarily a patch so much as a decision about which way this should be handled. This would probably be good to put in a section of Anthony's QOM style guide... -- PMM
Re: [Qemu-devel] [PATCH 2/3] vfio: vfio-pci device assignment driver
On Tue, Jul 31, 2012 at 11:18:15PM -0600, Alex Williamson wrote: This adds the core of the QEMU VFIO-based PCI device assignment driver. To make use of this driver, enable CONFIG_VFIO, CONFIG_VFIO_IOMMU_TYPE1, and CONFIG_VFIO_PCI in your host Linux kernel config. Load the vfio-pci module. To assign device :05:00.0 to a guest, do the following: for dev in $(ls /sys/bus/pci/devices/:05:00.0/iommu_group/devices); do vendor=$(cat /sys/bus/pci/devices/$dev/vendor) device=$(cat /sys/bus/pci/devices/$dev/device) if [ -e /sys/bus/pci/devices/$dev/driver ]; then echo $dev /sys/bus/pci/devices/$dev/driver/unbind fi echo $vendor $device /sys/bus/pci/drivers/vfio-pci/new_id done Both vfio-pci and the old driver successfully match the $vendor:$device. What happens when another $vendor:$device PCI adapter is hotplugged into the host? Is there a way to bind vfio-pci on a per-adapter basis instead of a per-$vendor:$device? Stefan
Re: [Qemu-devel] [PATCH 2/4] s390: Add a mechanism to get the subchannel id.
On Tue, 7 Aug 2012, Cornelia Huck wrote: +/** + * ccw_device_get_schid - obtain a subchannel id + * @cdev: device to obtain the id for + * @schid: where to fill in the values + */ +void ccw_device_get_schid(struct ccw_device *cdev, struct subchannel_id *schid) +{ + *schid = cdev-private-schid; +} +EXPORT_SYMBOL(ccw_device_get_schid); After I gave this some thought I like your version of this function better. But please use the id of the parent subchannel instead of the copy in cdev-private (since this will be removed soon). I'll do a cleanup patch to convert the internal users of the old function to use the one above. Regards, Sebastian
Re: [Qemu-devel] [qemu-devel] [PATCH V2 0/3] [RFC] libqblock draft code v2
于 2012-8-14 4:00, Blue Swirl 写道: On Mon, Aug 13, 2012 at 11:27 AM, Wenchao Xia xiaw...@linux.vnet.ibm.com wrote: 于 2012-8-11 20:18, Blue Swirl 写道: On Fri, Aug 10, 2012 at 8:04 AM, Wenchao Xia xiaw...@linux.vnet.ibm.com wrote: Thanks for your review, sorry I have forgot some fixing you mentioned before, will correct them this time. 于 2012-8-10 1:12, Blue Swirl 写道: On Thu, Aug 9, 2012 at 10:10 AM, Wenchao Xia xiaw...@linux.vnet.ibm.com wrote: This patch intrudce libqblock API, libqblock-test is used as a test case. V2: Using struct_size and [xxx]_new [xxx]_free to keep ABI. Using embbed structure to class format options in image creating. Format specific options were brought to surface. Image format was changed to enum interger instead of string. Some API were renamed. Internel error with errno was saved and with an API caller can get it. ALL flags used were defined in libqblock.h. Something need discuss: Embbed structure or union could make the model more friendly, but that make ABI more difficult, because we need to check every embbed structure's size and guess compiler's memory arrangement. This means #pragma pack(4) or struct_size, offset_next in every structure. Any better way to solve it? or make every structure a plain one? I'd still use accessor functions instead of structure passing, it would avoid these problems. Do you mean some function like : CreateOption_Filename_Set(const char *filename) CreateOption_Format_Set(const char *filename) Something like this: int qb_create_cow(struct QBlockState *qbs, const char *filename, size_t virt_size, const char *backing_file, int backing_flag); etc. for rest of the formats. Alternatively, we could have more generic versions like you describe, or even more generic still: void qb_set_property(struct QBlockState *qbs, const char *prop_name, const char *prop_value); so the create sequence (ignoring error handling) would be: s = qb_init(); qb_set_property(s, filename, c:autoexec.bat); qb_set_property(s, format, cow); qb_set_property(s, virt_size, 10GB); // use defaults for backing_file and backing_flag qb_create(s); Likewise for info structure: char *qb_get_property(struct QBlockState *qbs, const char *prop_name); foo = qb_get_property(s, format); foo = qb_get_property(s, encrypted); foo = qb_get_property(s, num_backing_files); foo = qb_get_property(s, virt_size); This would be helpful for the client to display parameters without much understanding of their contents: char **qb_list_properties(struct QBlockState *qbs); /* returns array of property names available for this file, use get_property to retrieve their contents */ But the clients can't be completely ignorant of all formats available, for example a second UI dialog needs to be added for formats with backing files, otherwise it won't be able to access some files at all. Maybe by adding type descriptions for each property (type for filename is path, for others string, bool, enum etc). Thanks. This seems heavy document is needed for that no structure can indicate what options sub format have, user can only get that info from returns or documents. I am not sure if this is good, because it looks more like a object oriented API that C. This approach may be a bit over-engineered, but it may be simpler to tie to an UI. What do you think of the simple version then: int qb_create_cow(struct QBlockState *qbs, const char *filename, size_t virt_size, const char *backing_file, int backing_flag); it is hard to extend more options, if for every format a API is needed, then int qb_create_cow(struct QBlockState *qbs, struct QBlockOptionFmtCow *op) seems better, while keep struct QBlockOptionFmtCow a plain struct without embbed struct(using pointer instead). It can solve the issue, with a cost of more small APIs in header files that user should use. Not sure if there is a good way to make it more friendly as an object language: oc.filename = name; automatically call CreateOption_Filename_Set, API CreateOption_Filename_Set is invisible to user. Packing can even introduce a new set of problems since we don't control the CFLAGS of the client of the library. indeed, I tried to handle the difference in function qboo_adjust_o2n, found that structure member alignment is hard to deal. AIO is missing, need a prototype. Wenchao Xia (3): adding libqblock libqblock API libqblock test case Makefile |3 + block.c |2 +- block.h |1 + libqblock-test.c | 197 libqblock.c | 670 ++ libqblock.h | 447 6 files changed, 1319 insertions(+), 1 deletions(-) create mode 100644 libqblock-test.c create mode 100644 libqblock.c create mode 100644 libqblock.h -- Best Regards Wenchao Xia --
Re: [Qemu-devel] [v2 Patch 1/9]block: Framework for reopening image files safely
On 08/03/2012 01:49 AM, Luiz Capitulino wrote: On Tue, 31 Jul 2012 03:04:22 +0530 Supriya Kannerysupri...@linux.vnet.ibm.com wrote: + +void bdrv_reopen_commit(BlockDriverState *bs, BDRVReopenState *rs) +{ +BlockDriver *drv = bs-drv; + +drv-bdrv_reopen_commit(bs, rs); +} + +void bdrv_reopen_abort(BlockDriverState *bs, BDRVReopenState *rs) +{ +BlockDriver *drv = bs-drv; + +drv-bdrv_reopen_abort(bs, rs); +} + +void bdrv_reopen(BlockDriverState *bs, int bdrv_flags, Error **errp) +{ +BlockDriver *drv = bs-drv; +int ret = 0; +BDRVReopenState *reopen_state = NULL; + +/* Quiesce IO for the given block device */ +bdrv_drain_all(); +ret = bdrv_flush(bs); +if (ret != 0) { +error_set(errp, QERR_IO_ERROR); +return; +} + +/* Use driver specific reopen() if available */ +if (drv-bdrv_reopen_prepare) { +ret = bdrv_reopen_prepare(bs,reopen_state, bdrv_flags); + if (ret 0) { +bdrv_reopen_abort(bs, reopen_state); Why do you have to call bdrv_reopen_abort()? I'd expect bdrv_reopen_prepare() (to be able) to undo anything it has done. Having separate abort function avoids cluttering of reopen- prepare(). We wanted to logically separate out preparation, commit and abort. Same format is followed in implementations at block driver level as well. -thanks, Supriya
Re: [Qemu-devel] [PATCH v8] kvm: notify host when the guest is panicked
On Mon, Aug 13, 2012 at 03:21:32PM -0300, Marcelo Tosatti wrote: On Wed, Aug 08, 2012 at 10:43:01AM +0800, Wen Congyang wrote: We can know the guest is panicked when the guest runs on xen. But we do not have such feature on kvm. Another purpose of this feature is: management app(for example: libvirt) can do auto dump when the guest is panicked. If management app does not do auto dump, the guest's user can do dump by hand if he sees the guest is panicked. We have three solutions to implement this feature: 1. use vmcall 2. use I/O port 3. use virtio-serial. We have decided to avoid touching hypervisor. The reason why I choose choose the I/O port is: 1. it is easier to implememt 2. it does not depend any virtual device 3. it can work when starting the kernel How about searching for the Kernel panic - not syncing string in the guests serial output? Say libvirtd could take an action upon that? No, this is not satisfactory. It depends on the guest OS being configured to use the serial port for console output which we cannot mandate, since it may well be required for other purposes. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
[Qemu-devel] [PATCH] MAINTAINERS: Update email address for Stefan Hajnoczi
Switch to my personal email address. Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- Access to the IBM email address ceases from September 2012. Keeping qemu-trivial moving during September should be no problem. I'll be back 100% and continuing to contribute to QEMU in October again. MAINTAINERS |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 708ad54..6d864c1 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -568,7 +568,7 @@ F: monitor.c Network device layer M: Anthony Liguori aligu...@us.ibm.com -M: Stefan Hajnoczi stefa...@linux.vnet.ibm.com +M: Stefan Hajnoczi stefa...@gmail.com S: Maintained F: net/ T: git git://github.com/stefanha/qemu.git net @@ -588,7 +588,7 @@ F: slirp/ T: git git://git.kiszka.org/qemu.git queues/slirp Tracing -M: Stefan Hajnoczi stefa...@linux.vnet.ibm.com +M: Stefan Hajnoczi stefa...@gmail.com S: Maintained F: trace/ F: scripts/tracetool.py -- 1.7.10.4
Re: [Qemu-devel] [v2 Patch 1/9]block: Framework for reopening image files safely
On 08/09/2012 02:43 AM, Jeff Cody wrote: On 07/30/2012 05:34 PM, Supriya Kannery wrote: Struct BDRVReopenState along with three reopen related functions introduced for handling reopening of images safely. This can be extended by each of the block drivers to reopen respective image files. + +bdrv_reopen_commit(bs, reopen_state); +bs-open_flags = bdrv_flags; You also need to set bs-read_only here, like so: bs-read_only = !(bdrv_flags BDRV_O_RDWR); Sure..will include in updated patch. - thanks, Supriya
Re: [Qemu-devel] [PATCH v6 2/2] block: Support GlusterFS as a QEMU block backend
On Tue, Aug 14, 2012 at 10:29:26AM +0200, Kevin Wolf wrote: Yes, and that will result in port=0, which is default. So this is to cater for cases like gluster://[1:2:3:4:5]:/volname/image So you consider this a valid URL? I would have expected it to invalid. But let me see, there must be some official definition of an URL... Alright, so RFC 2234 says that having no digits after the colon is valid. It also says that you shouldn't generate such URLs. And it doesn't say what it means when it's there... Common interpretation seems to be that it's treated as if it wasn't specified, i.e. the default port for the schema is used. So if 0 is the default port for glusterfs, your code looks okay. But it doesn't seem to be a very useful default port number. I know, but gluster prefers to be called with port=0 which will be interpreted as default by it. While we are at this, let me bring out another issue. Gluster supports 3 transport types: - socket in which case the server will be hostname, ipv4 or ipv4 address. - rdma in which case server will be interpreted similar to socket. - unix in which case server will be a path to unix domain socket and this will look like any other filesystem path. (Eg. /tmp/glusterd.socket) I don't think we can fit 'unix' within the standard URI scheme (RFC 3986) easily, but I am planning to specify the 'unix' transport as below: gluster://[/path/to/unix/domain/socket]/volname/image?transport=unix i,e., I am asking the user to put the unix domain socket path within square brackets when transport type is unix. Do you think this is fine ? Is 'socket' really the only valid transport and will it stay like this without changes to qemu? There are others like 'unix' and 'rdma'. I will fix this error message to reflect that. However QEMU needn't change for such transport types because I am not interpreting the transport type in QEMU but instead passing it on directly to GlusterFS. Maybe then just specify [?transport=...] instead of giving a specific option value? Sure, but as I noted above, let me finalize on how to specify 'unix' transport type. +static int qemu_gluster_send_pipe(BDRVGlusterState *s, GlusterAIOCB *acb) +{ +int ret = 0; +while (1) { +fd_set wfd; +int fd = s-fds[GLUSTER_FD_WRITE]; + +ret = write(fd, (void *)acb, sizeof(acb)); +if (ret = 0) { +break; +} +if (errno == EINTR) { +continue; +} +if (errno != EAGAIN) { +break; +} + +FD_ZERO(wfd); +FD_SET(fd, wfd); +do { +ret = select(fd + 1, NULL, wfd, NULL, NULL); +} while (ret 0 errno == EINTR); What's the idea behind this? While we're hanging in this loop noone will read anything from the pipe, so it's unlikely that it magically becomes ready. I write to the pipe and wait for the reader to read it. The reader (qemu_gluster_aio_event_reader) is already waiting on the other end of the pipe. qemu_gluster_aio_even_reader() isn't called while we're looping here. It will only be called from the main loop, after this function has returned. May be I am not understanding you correctly here. Let me be a bit verbose. This routine is called by aio callback routine that we registered to be called by gluster after aio completion. Hence this routine will be called in the context of a separate (gluster) thread. This routine will write to the pipe and wait until the data is read from the read-end of it. As soon as the data is available to be read from this pipe, I think the routine registered for reading (qemu_gluster_aio_event_reader) would be called which will further handle the AIO completion. As per my understanding, the original co-routine thread that initiated the aio read or write request would be blocked on qemu_aio_wait(). That thread would wake up and run qemu_gluster_aio_event_reader(). So I am not clear why qemu_gluster_aio_even_reader() won't be called until this routine (qemu_gluster_send_pipe) returns. Regards, Bharata.
[Qemu-devel] [PATCH 01/10] linux-user: Fix incorrect TARGET_BLKBSZGET, TARGET_BLKBSZSET
The definitions for the ioctl numbers TARGET_BLKBSZGET and TARGET_BLKBSZSET had the wrong size parameters (they are defined with size_t, not int, even though the ioctl implementations themselves read and write integers). Since commit 354a0008 we now have an ioctl wrapper definition for BLKBSZGET and so on an x86-64-to-x86-64 linux-user binary we were triggering the mismatch warning in syscall_init(). Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- linux-user/syscall_defs.h |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h index ba9a58c..2026579 100644 --- a/linux-user/syscall_defs.h +++ b/linux-user/syscall_defs.h @@ -880,8 +880,8 @@ struct target_pollfd { #define TARGET_BLKSECTGET TARGET_IO(0x12,103)/* get max sectors per request (ll_rw_blk.c) */ #define TARGET_BLKSSZGET TARGET_IO(0x12,104)/* get block device sector size */ /* A jump here: 108-111 have been used for various private purposes. */ -#define TARGET_BLKBSZGET TARGET_IOR(0x12,112,int) -#define TARGET_BLKBSZSET TARGET_IOW(0x12,113,int) +#define TARGET_BLKBSZGET TARGET_IOR(0x12, 112, abi_ulong) +#define TARGET_BLKBSZSET TARGET_IOW(0x12, 113, abi_ulong) #define TARGET_BLKGETSIZE64 TARGET_IOR(0x12,114,abi_ulong) /* return device size in bytes (u64 *arg) */ -- 1.7.9.5
[Qemu-devel] [PATCH 10/10] linux-user: ARM: Ignore immediate value for svc in thumb mode
From: Alexander Graf ag...@suse.de When running in thumb mode, Linux doesn't evaluate the immediate value of the svc instruction, but instead just always assumes the syscall number to be in r7. This fixes executing go_bootstrap while building go for me. Signed-off-by: Alexander Graf ag...@suse.de Reviewed-by: Peter Maydell peter.mayd...@linaro.org Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- linux-user/main.c |3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/linux-user/main.c b/linux-user/main.c index 63c1249..7dea084 100644 --- a/linux-user/main.c +++ b/linux-user/main.c @@ -822,8 +822,7 @@ void cpu_loop(CPUARMState *env) } else if (n == ARM_NR_semihosting || n == ARM_NR_thumb_semihosting) { env-regs[0] = do_arm_semihosting (env); -} else if (n == 0 || n = ARM_SYSCALL_BASE - || (env-thumb n == ARM_THUMB_SYSCALL)) { +} else if (n == 0 || n = ARM_SYSCALL_BASE || env-thumb) { /* linux syscall */ if (env-thumb || n == 0) { n = env-regs[7]; -- 1.7.9.5
Re: [Qemu-devel] [Bug 1035572] Re: Bug in Qemu User Mode
On 14 August 2012 02:01, Dietmar Stölting dietmar.stoelt...@t-online.de wrote: with this new syscall.c content above things are going in the right direction:-). I make a test with strace from the program testthread of the Qemu testsuite. When I understand the result right, threading works now with this new compiled qemu-i386. The child and the parents tidptr NOW have the same number in one thread, and different but also same in other thread. This means for the not working program testclone: The functioncall with its sets of parameters is just wrong there. When you do a function call with those Flags as in testthread, threads can be builded with qemu-i386. So, the error is in the wrong calling of the function clone(). This can be corrected. Please tell me your thoughts, Yes, as I said, we know that threading does not work for i386 targets (it is also broken in more subtle ways for other targets). This is not going to get fixed until it is investigated by somebody who has the time and expertise with both i386 architecture and QEMU internals to produce a coherent fix which addresses all the problems in this area. (See also my remarks in comment #47 of bug 739785.) I'm sorry if that sounds a bit negative, but there is a reason this bug has been unfixed for over a year -- it's not a trivial one to fix, and it's not easy to evaluate whether a small patch is a component of the complete correct solution without investing the time to think about the problem as a whole. -- PMM
[Qemu-devel] [PATCH 06/10] linux-user: make host_to_target_cmsg support SO_TIMESTAMP cmsg_type
From: Jing Huang jing.huang@gmail.com Signed-off-by: Jing Huang jing.huang@gmail.com Reviewed-by: Peter Maydell peter.mayd...@linaro.org Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- linux-user/syscall.c | 20 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 700163e..3fa5299 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -1359,16 +1359,28 @@ static inline abi_long host_to_target_cmsg(struct target_msghdr *target_msgh, target_cmsg-cmsg_type = tswap32(cmsg-cmsg_type); target_cmsg-cmsg_len = tswapal(TARGET_CMSG_LEN(len)); -if (cmsg-cmsg_level != TARGET_SOL_SOCKET || cmsg-cmsg_type != SCM_RIGHTS) { -gemu_log(Unsupported ancillary data: %d/%d\n, cmsg-cmsg_level, cmsg-cmsg_type); -memcpy(target_data, data, len); -} else { +if ((cmsg-cmsg_level == TARGET_SOL_SOCKET) +(cmsg-cmsg_type == SCM_RIGHTS)) { int *fd = (int *)data; int *target_fd = (int *)target_data; int i, numfds = len / sizeof(int); for (i = 0; i numfds; i++) target_fd[i] = tswap32(fd[i]); +} else if ((cmsg-cmsg_level == TARGET_SOL_SOCKET) +(cmsg-cmsg_type == SO_TIMESTAMP) +(len == sizeof(struct timeval))) { +/* copy struct timeval to target */ +struct timeval *tv = (struct timeval *)data; +struct target_timeval *target_tv = +(struct target_timeval *)target_data; + +target_tv-tv_sec = tswapal(tv-tv_sec); +target_tv-tv_usec = tswapal(tv-tv_usec); +} else { +gemu_log(Unsupported ancillary data: %d/%d\n, +cmsg-cmsg_level, cmsg-cmsg_type); +memcpy(target_data, data, len); } cmsg = CMSG_NXTHDR(msgh, cmsg); -- 1.7.9.5
[Qemu-devel] [PATCH 08/10] linux-user: Factor out guest space probing into a function
From: Meador Inge mead...@codesourcery.com Signed-off-by: Meador Inge mead...@codesourcery.com Reviewed-by: Peter Maydell peter.mayd...@linaro.org Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- linux-user/elfload.c | 110 +++--- linux-user/qemu.h| 13 ++ 2 files changed, 90 insertions(+), 33 deletions(-) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index 6b622d4..cbc7617 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -1426,6 +1426,73 @@ bool guest_validate_base(unsigned long guest_base) } #endif +unsigned long init_guest_space(unsigned long host_start, + unsigned long host_size, + unsigned long guest_start, + bool fixed) +{ +unsigned long current_start, real_start; +int flags; + +assert(host_start || host_size); + +/* If just a starting address is given, then just verify that + * address. */ +if (host_start !host_size) { +if (guest_validate_base(host_start)) { +return host_start; +} else { +return (unsigned long)-1; +} +} + +/* Setup the initial flags and start address. */ +current_start = host_start qemu_host_page_mask; +flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE; +if (fixed) { +flags |= MAP_FIXED; +} + +/* Otherwise, a non-zero size region of memory needs to be mapped + * and validated. */ +while (1) { +/* Do not use mmap_find_vma here because that is limited to the + * guest address space. We are going to make the + * guest address space fit whatever we're given. + */ +real_start = (unsigned long) +mmap((void *)current_start, host_size, PROT_NONE, flags, -1, 0); +if (real_start == (unsigned long)-1) { +return (unsigned long)-1; +} + +if ((real_start == current_start) + guest_validate_base(real_start - guest_start)) { +break; +} + +/* That address didn't work. Unmap and try a different one. + * The address the host picked because is typically right at + * the top of the host address space and leaves the guest with + * no usable address space. Resort to a linear search. We + * already compensated for mmap_min_addr, so this should not + * happen often. Probably means we got unlucky and host + * address space randomization put a shared library somewhere + * inconvenient. + */ +munmap((void *)real_start, host_size); +current_start += qemu_host_page_size; +if (host_start == current_start) { +/* Theoretically possible if host doesn't have any suitably + * aligned areas. Normally the first mmap will fail. + */ +return (unsigned long)-1; +} +} + +return real_start; +} + static void probe_guest_base(const char *image_name, abi_ulong loaddr, abi_ulong hiaddr) { @@ -1452,46 +1519,23 @@ static void probe_guest_base(const char *image_name, } } host_size = hiaddr - loaddr; -while (1) { -/* Do not use mmap_find_vma here because that is limited to the - guest address space. We are going to make the - guest address space fit whatever we're given. */ -real_start = (unsigned long) -mmap((void *)host_start, host_size, PROT_NONE, - MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE, -1, 0); -if (real_start == (unsigned long)-1) { -goto exit_perror; -} -guest_base = real_start - loaddr; -if ((real_start == host_start) -guest_validate_base(guest_base)) { -break; -} -/* That address didn't work. Unmap and try a different one. - The address the host picked because is typically right at - the top of the host address space and leaves the guest with - no usable address space. Resort to a linear search. We - already compensated for mmap_min_addr, so this should not - happen often. Probably means we got unlucky and host - address space randomization put a shared library somewhere - inconvenient. */ -munmap((void *)real_start, host_size); -host_start += qemu_host_page_size; -if (host_start == loaddr) { -/* Theoretically possible if host doesn't have any suitably - aligned areas. Normally the first mmap will fail. */ -errmsg = Unable to find space for application; -goto exit_errmsg; -} + +/* Setup the initial guest memory space with ranges
[Qemu-devel] [PULL for-1.2 00/10] linux-user queue
Hi. This is a collection of recent linux-user patches: * my minor fixes for complaints about ioctl mismatches * Jing Huang's fixes for running ping * Meador's guest space probing patches * and some minor fixes from Mike Frysinger and Alex I've put these together as a pullreq with Riku's agreement, since he's currently busy and it seemed like a good idea to get these in before hardfreeze. Apologies to anybody if I missed their patch -- I collated these mostly by going back through my mail archives so it's quite likely I didn't catch everything. [Riku has OK'd this set of patches and also tested it.] Thanks -- PMM The following changes since commit 33e95c6328a3149a52615176617997c4f8f7088b: qom: Reimplement Interfaces (2012-08-13 11:20:41 +0200) are available in the git repository at: git://git.linaro.org/people/pmaydell/qemu-arm.git linux-user.next for you to fetch changes up to 610dd40359a6a8c52ba81aad2c24e7e9747b6978: linux-user: ARM: Ignore immediate value for svc in thumb mode (2012-08-13 15:47:33 +0100) Alexander Graf (1): linux-user: ARM: Ignore immediate value for svc in thumb mode Jing Huang (3): linux-user: pass sockaddr from host to target linux-user: make do_setsockopt support SOL_RAW ICMP_FILTER socket option linux-user: make host_to_target_cmsg support SO_TIMESTAMP cmsg_type Meador Inge (2): linux-user: Factor out guest space probing into a function linux-user: Use init_guest_space when -R and -B are specified Mike Frysinger (1): flatload: fix bss clearing Peter Maydell (3): linux-user: Fix incorrect TARGET_BLKBSZGET, TARGET_BLKBSZSET linux-user: Fix SNDCTL_DSP_MAP{IN, OUT}BUF ioctl definitions linux-user: Move target_to_host_errno_table[] setup out of ioctl loop linux-user/elfload.c | 161 +--- linux-user/flatload.c |2 +- linux-user/ioctls.h|4 +- linux-user/main.c | 38 ++- linux-user/qemu.h | 15 +++-- linux-user/syscall.c | 66 ++ linux-user/syscall_defs.h |8 +-- linux-user/syscall_types.h |3 + 8 files changed, 205 insertions(+), 92 deletions(-)
[Qemu-devel] [PATCH 02/10] linux-user: Fix SNDCTL_DSP_MAP{IN, OUT}BUF ioctl definitions
Fix the SNDCTL_DSP_MAP{IN,OUT}BUF ioctl definitions so that they refer to a suitably defined target struct layout rather than hardcoding the ioctl number. This fixes complaints from the syscall_init() consistency check when running an x86_64-to-x86_64 linux-user qemu. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- linux-user/ioctls.h|4 ++-- linux-user/syscall_defs.h |4 ++-- linux-user/syscall_types.h |3 +++ 3 files changed, 7 insertions(+), 4 deletions(-) diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h index eb96a08..8a47767 100644 --- a/linux-user/ioctls.h +++ b/linux-user/ioctls.h @@ -186,8 +186,8 @@ IOCTL(SNDCTL_DSP_GETISPACE, IOC_R, MK_PTR(MK_STRUCT(STRUCT_audio_buf_info))) IOCTL(SNDCTL_DSP_GETOSPACE, IOC_R, MK_PTR(MK_STRUCT(STRUCT_audio_buf_info))) IOCTL(SNDCTL_DSP_GETTRIGGER, IOC_R, MK_PTR(TYPE_INT)) - IOCTL(SNDCTL_DSP_MAPINBUF, IOC_R, MK_PTR(TYPE_INT)) - IOCTL(SNDCTL_DSP_MAPOUTBUF, IOC_R, MK_PTR(TYPE_INT)) + IOCTL(SNDCTL_DSP_MAPINBUF, IOC_R, MK_PTR(MK_STRUCT(STRUCT_buffmem_desc))) + IOCTL(SNDCTL_DSP_MAPOUTBUF, IOC_R, MK_PTR(MK_STRUCT(STRUCT_buffmem_desc))) IOCTL(SNDCTL_DSP_NONBLOCK, 0, TYPE_NULL) IOCTL(SNDCTL_DSP_POST, 0, TYPE_NULL) IOCTL(SNDCTL_DSP_RESET, 0, TYPE_NULL) diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h index 2026579..2cfda5a 100644 --- a/linux-user/syscall_defs.h +++ b/linux-user/syscall_defs.h @@ -2226,8 +2226,8 @@ struct target_eabi_flock64 { #define TARGET_SNDCTL_DSP_GETTRIGGER TARGET_IOR('P',16, int) #define TARGET_SNDCTL_DSP_GETIPTR TARGET_IORU('P',17) #define TARGET_SNDCTL_DSP_GETOPTR TARGET_IORU('P',18) -#define TARGET_SNDCTL_DSP_MAPINBUF0x80085013 -#define TARGET_SNDCTL_DSP_MAPOUTBUF 0x80085014 +#define TARGET_SNDCTL_DSP_MAPINBUFTARGET_IORU('P', 19) +#define TARGET_SNDCTL_DSP_MAPOUTBUF TARGET_IORU('P', 20) #define TARGET_SNDCTL_DSP_NONBLOCK0x500e #define TARGET_SNDCTL_DSP_SAMPLESIZE 0xc0045005 #define TARGET_SNDCTL_DSP_SETDUPLEX 0x5016 diff --git a/linux-user/syscall_types.h b/linux-user/syscall_types.h index 601618d..44b6a58 100644 --- a/linux-user/syscall_types.h +++ b/linux-user/syscall_types.h @@ -77,6 +77,9 @@ STRUCT(audio_buf_info, STRUCT(count_info, TYPE_INT, TYPE_INT, TYPE_INT) +STRUCT(buffmem_desc, + TYPE_PTRVOID, TYPE_INT) + STRUCT(mixer_info, MK_ARRAY(TYPE_CHAR, 16), MK_ARRAY(TYPE_CHAR, 32), TYPE_INT, MK_ARRAY(TYPE_INT, 10)) -- 1.7.9.5
[Qemu-devel] [PATCH 05/10] linux-user: make do_setsockopt support SOL_RAW ICMP_FILTER socket option
From: Jing Huang jing.huang@gmail.com Signed-off-by: Jing Huang jing.huang@gmail.com Reviewed-by: Peter Maydell peter.mayd...@linaro.org Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- linux-user/syscall.c | 20 1 file changed, 20 insertions(+) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 47f0eb3..700163e 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -60,6 +60,7 @@ int __clone2(int (*fn)(void *), void *child_stack_base, #include netinet/ip.h #include netinet/tcp.h #include linux/wireless.h +#include linux/icmp.h #include qemu-common.h #ifdef TARGET_GPROF #include sys/gmon.h @@ -1452,6 +1453,25 @@ static abi_long do_setsockopt(int sockfd, int level, int optname, goto unimplemented; } break; +case SOL_RAW: +switch (optname) { +case ICMP_FILTER: +/* struct icmp_filter takes an u32 value */ +if (optlen sizeof(uint32_t)) { +return -TARGET_EINVAL; +} + +if (get_user_u32(val, optval_addr)) { +return -TARGET_EFAULT; +} +ret = get_errno(setsockopt(sockfd, level, optname, + val, sizeof(val))); +break; + +default: +goto unimplemented; +} +break; case TARGET_SOL_SOCKET: switch (optname) { /* Options with 'int' argument. */ -- 1.7.9.5
[Qemu-devel] [PATCH 04/10] linux-user: pass sockaddr from host to target
From: Jing Huang jing.huang@gmail.com Signed-off-by: Jing Huang jing.huang@gmail.com Reviewed-by: Peter Maydell peter.mayd...@linaro.org Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- linux-user/syscall.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 1af68d2..47f0eb3 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -1268,7 +1268,6 @@ static inline abi_long host_to_target_sockaddr(abi_ulong target_addr, return 0; } -/* ??? Should this also swap msgh-name? */ static inline abi_long target_to_host_cmsg(struct msghdr *msgh, struct target_msghdr *target_msgh) { @@ -1325,7 +1324,6 @@ static inline abi_long target_to_host_cmsg(struct msghdr *msgh, return 0; } -/* ??? Should this also swap msgh-name? */ static inline abi_long host_to_target_cmsg(struct target_msghdr *target_msgh, struct msghdr *msgh) { @@ -1885,10 +1883,22 @@ static abi_long do_sendrecvmsg(int fd, abi_ulong target_msg, if (!is_error(ret)) { len = ret; ret = host_to_target_cmsg(msgp, msg); -if (!is_error(ret)) +if (!is_error(ret)) { +msgp-msg_namelen = tswap32(msg.msg_namelen); +if (msg.msg_name != NULL) { +ret = host_to_target_sockaddr(tswapal(msgp-msg_name), +msg.msg_name, msg.msg_namelen); +if (ret) { +goto out; +} +} + ret = len; +} } } + +out: unlock_iovec(vec, target_vec, count, !send); unlock_user_struct(msgp, target_msg, send ? 0 : 1); return ret; -- 1.7.9.5
Re: [Qemu-devel] [PATCH v6 2/2] block: Support GlusterFS as a QEMU block backend
Am 14.08.2012 11:34, schrieb Bharata B Rao: On Tue, Aug 14, 2012 at 10:29:26AM +0200, Kevin Wolf wrote: Yes, and that will result in port=0, which is default. So this is to cater for cases like gluster://[1:2:3:4:5]:/volname/image So you consider this a valid URL? I would have expected it to invalid. But let me see, there must be some official definition of an URL... Alright, so RFC 2234 says that having no digits after the colon is valid. It also says that you shouldn't generate such URLs. And it doesn't say what it means when it's there... Common interpretation seems to be that it's treated as if it wasn't specified, i.e. the default port for the schema is used. So if 0 is the default port for glusterfs, your code looks okay. But it doesn't seem to be a very useful default port number. I know, but gluster prefers to be called with port=0 which will be interpreted as default by it. Ok, that makes sense. While we are at this, let me bring out another issue. Gluster supports 3 transport types: - socket in which case the server will be hostname, ipv4 or ipv4 address. - rdma in which case server will be interpreted similar to socket. - unix in which case server will be a path to unix domain socket and this will look like any other filesystem path. (Eg. /tmp/glusterd.socket) I don't think we can fit 'unix' within the standard URI scheme (RFC 3986) easily, but I am planning to specify the 'unix' transport as below: gluster://[/path/to/unix/domain/socket]/volname/image?transport=unix i,e., I am asking the user to put the unix domain socket path within square brackets when transport type is unix. Do you think this is fine ? Never saw something like this before, but it does seem reasonable to me. Excludes ] from the valid characters in the file name of the socket, but that shouldn't be a problem in practice. +static int qemu_gluster_send_pipe(BDRVGlusterState *s, GlusterAIOCB *acb) +{ +int ret = 0; +while (1) { +fd_set wfd; +int fd = s-fds[GLUSTER_FD_WRITE]; + +ret = write(fd, (void *)acb, sizeof(acb)); +if (ret = 0) { +break; +} +if (errno == EINTR) { +continue; +} +if (errno != EAGAIN) { +break; +} + +FD_ZERO(wfd); +FD_SET(fd, wfd); +do { +ret = select(fd + 1, NULL, wfd, NULL, NULL); +} while (ret 0 errno == EINTR); What's the idea behind this? While we're hanging in this loop noone will read anything from the pipe, so it's unlikely that it magically becomes ready. I write to the pipe and wait for the reader to read it. The reader (qemu_gluster_aio_event_reader) is already waiting on the other end of the pipe. qemu_gluster_aio_even_reader() isn't called while we're looping here. It will only be called from the main loop, after this function has returned. May be I am not understanding you correctly here. Let me be a bit verbose. [...] Sorry, my mistake. I was assuming that this code runs in a thread created by qemu, which isn't true. Your explanation makes perfect sense. Kevin
[Qemu-devel] [PATCH 09/10] linux-user: Use init_guest_space when -R and -B are specified
From: Meador Inge mead...@codesourcery.com Roll the code used to initialize the guest memory space when -R or -B is used into 'init_guest_space' and then call 'init_guest_space' from the driver. This way the reserved guest memory space can be probed for. Calling 'mmap' just once as is currently done is not guaranteed to succeed since the host address space validation might fail. Signed-off-by: Meador Inge mead...@codesourcery.com [PMM: Fixed minor whitespace errors.] Reviewed-by: Peter Maydell peter.mayd...@linaro.org Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- linux-user/elfload.c | 59 ++ linux-user/main.c| 35 +- linux-user/qemu.h|6 - 3 files changed, 56 insertions(+), 44 deletions(-) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index cbc7617..819fdd5 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -332,9 +332,17 @@ enum ARM_HWCAP_ARM_VFPv3D16 = 1 13, }; -#define TARGET_HAS_GUEST_VALIDATE_BASE -/* We want the opportunity to check the suggested base */ -bool guest_validate_base(unsigned long guest_base) +#define TARGET_HAS_VALIDATE_GUEST_SPACE +/* Return 1 if the proposed guest space is suitable for the guest. + * Return 0 if the proposed guest space isn't suitable, but another + * address space should be tried. + * Return -1 if there is no way the proposed guest space can be + * valid regardless of the base. + * The guest code may leave a page mapped and populate it if the + * address is suitable. + */ +static int validate_guest_space(unsigned long guest_base, +unsigned long guest_size) { unsigned long real_start, test_page_addr; @@ -342,6 +350,15 @@ bool guest_validate_base(unsigned long guest_base) * commpage at 0x0fxx */ test_page_addr = guest_base + (0x0f00 qemu_host_page_mask); + +/* If the commpage lies within the already allocated guest space, + * then there is no way we can allocate it. + */ +if (test_page_addr = guest_base + test_page_addr = (guest_base + guest_size)) { +return -1; +} + /* Note it needs to be writeable to let us initialise it */ real_start = (unsigned long) mmap((void *)test_page_addr, qemu_host_page_size, @@ -1418,9 +1435,10 @@ static abi_ulong create_elf_tables(abi_ulong p, int argc, int envc, return sp; } -#ifndef TARGET_HAS_GUEST_VALIDATE_BASE +#ifndef TARGET_HAS_VALIDATE_GUEST_SPACE /* If the guest doesn't have a validation function just agree */ -bool guest_validate_base(unsigned long guest_base) +static int validate_guest_space(unsigned long guest_base, +unsigned long guest_size) { return 1; } @@ -1439,7 +1457,7 @@ unsigned long init_guest_space(unsigned long host_start, /* If just a starting address is given, then just verify that * address. */ if (host_start !host_size) { -if (guest_validate_base(host_start)) { +if (validate_guest_space(host_start, host_size) == 1) { return host_start; } else { return (unsigned long)-1; @@ -1456,6 +1474,8 @@ unsigned long init_guest_space(unsigned long host_start, /* Otherwise, a non-zero size region of memory needs to be mapped * and validated. */ while (1) { +unsigned long real_size = host_size; + /* Do not use mmap_find_vma here because that is limited to the * guest address space. We are going to make the * guest address space fit whatever we're given. @@ -1466,9 +1486,28 @@ unsigned long init_guest_space(unsigned long host_start, return (unsigned long)-1; } -if ((real_start == current_start) - guest_validate_base(real_start - guest_start)) { -break; +/* Ensure the address is properly aligned. */ +if (real_start ~qemu_host_page_mask) { +munmap((void *)real_start, host_size); +real_size = host_size + qemu_host_page_size; +real_start = (unsigned long) +mmap((void *)real_start, real_size, PROT_NONE, flags, -1, 0); +if (real_start == (unsigned long)-1) { +return (unsigned long)-1; +} +real_start = HOST_PAGE_ALIGN(real_start); +} + +/* Check to see if the address is valid. */ +if (!host_start || real_start == current_start) { +int valid = validate_guest_space(real_start - guest_start, + real_size); +if (valid == 1) { +break; +} else if (valid == -1) { +return (unsigned long)-1; +} +/* valid == 0, so try again. */ } /* That address didn't work. Unmap and try a different one. @@ -1490,6 +1529,8 @@ unsigned long
[Qemu-devel] [PATCH 03/10] linux-user: Move target_to_host_errno_table[] setup out of ioctl loop
The code to initialise the target_to_host_errno_table[] array was accidentally inside the loop through checking and initialising all the supported ioctls. This was harmless but meant that we reinitialised the array several hundred times on startup. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- linux-user/syscall.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 3ba3ef5..1af68d2 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -4606,6 +4606,11 @@ void syscall_init(void) #undef STRUCT #undef STRUCT_SPECIAL +/* Build target_to_host_errno_table[] table from + * host_to_target_errno_table[]. */ +for (i=0; i ERRNO_TABLE_SIZE; i++) +target_to_host_errno_table[host_to_target_errno_table[i]] = i; + /* we patch the ioctl size if necessary. We rely on the fact that no ioctl has all the bits at '1' in the size field */ ie = ioctl_entries; @@ -4625,11 +4630,6 @@ void syscall_init(void) (size TARGET_IOC_SIZESHIFT); } -/* Build target_to_host_errno_table[] table from - * host_to_target_errno_table[]. */ -for (i=0; i ERRNO_TABLE_SIZE; i++) -target_to_host_errno_table[host_to_target_errno_table[i]] = i; - /* automatic consistency check if same arch */ #if (defined(__i386__) defined(TARGET_I386) defined(TARGET_ABI32)) || \ (defined(__x86_64__) defined(TARGET_X86_64)) -- 1.7.9.5
[Qemu-devel] [PATCH 07/10] flatload: fix bss clearing
From: Mike Frysinger vap...@gentoo.org The current bss clear logic assumes the target mmap address and host address are the same. Use g2h to translate from the target address space to the host so we can call memset on it. Signed-off-by: Mike Frysinger vap...@gentoo.org Reviewed-by: Peter Maydell peter.mayd...@linaro.org Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- linux-user/flatload.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/linux-user/flatload.c b/linux-user/flatload.c index be79496..58f679e 100644 --- a/linux-user/flatload.c +++ b/linux-user/flatload.c @@ -660,7 +660,7 @@ static int load_flat_file(struct linux_binprm * bprm, } /* zero the BSS. */ -memset((void *)((unsigned long)datapos + data_len), 0, bss_len); +memset(g2h(datapos + data_len), 0, bss_len); return 0; } -- 1.7.9.5
Re: [Qemu-devel] Funny -m arguments can crash
On 08/14/2012 11:44 AM, Markus Armbruster wrote: Next error: $ gdb --args qemu-system-x86_64 -nodefaults --enable-kvm -vnc :0 -monitor stdio -m 640k [...] Program received signal SIGSEGV, Segmentation fault. [...] (gdb) bt #0 0x003b0de884ac in __memcmp_sse2 () from /lib64/libc.so.6 #1 0x0063f1ad in patch_hypercalls (s=0x139b350) at /work/armbru/qemu/hw/i386/../kvmvapic.c:532 #2 0x0063f3fe in vapic_prepare (s=0x139b350) at /work/armbru/qemu/hw/i386/../kvmvapic.c:597 #3 0x0063f4ed in vapic_write (opaque=0x139b350, addr=0, data=32, size= 2) at /work/armbru/qemu/hw/i386/../kvmvapic.c:634 #4 0x00677a44 in memory_region_write_accessor (opaque=0x139d670, addr= Happens when -m argument is a multiple of 4k in [648k..768k]. Only with --enable-kvm. With and without my CMOS fix applied. kvmvapic requires RAM to be present underneath the ROM. We could fix up kvmvapic to allocate a 4k region and insert it as an overlay, but it's sufficient IMO to require sub-1M users to disable it. It won't be of any use to the anyway as Windows XP requires more than 1MB. And another one: $ qemu-system-x86_64 -nodefaults --enable-kvm -vnc :0 -monitor stdio -m 900k QEMU 1.1.50 monitor - type 'help' for more information (qemu) KVM internal error. Suberror: 1 emulation failure EAX=000fdb78 EBX= ECX= EDX=000fdb64 ESI= EDI=000fdb64 EBP= ESP=6f98 EIP=000e3492 EFL=0006 [-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0010 00c09300 DPL=0 DS [-WA] CS =0008 00c09b00 DPL=0 CS32 [-RA] SS =0010 00c09300 DPL=0 DS [-WA] DS =0010 00c09300 DPL=0 DS [-WA] FS =0010 00c09300 DPL=0 DS [-WA] GS =0010 00c09300 DPL=0 DS [-WA] LDT= 8200 DPL=0 LDT TR = 8b00 DPL=0 TSS32-busy GDT= 000fcd68 0037 IDT= 000fdb60 CR0=0011 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= Code=00 00 b8 26 00 00 00 eb 95 83 c8 ff 83 c4 4c 5b 5e 5f 5d c3 57 56 53 89 d6 39 c2 72 06 89 c7 f3 a4 eb 1b 8d 51 ff 01 d0 01 d6 89 cf 31 d2 eb 08 8a 1c q Not sure what's the problem. 57 is a push reg instruction which we ought to emulate fine. 900k is 0xe1000, just below eip, but we ought to execute just fine from unshadowed ROM. Breakpoint on kvm_handle_internal_error() yields backtrace: #0 kvm_handle_internal_error (env=0x1389b30, run=0x77ffa000) at /work/armbru/qemu/kvm-all.c:1424 #1 0x00674c5a in kvm_cpu_exec (env=0x1389b30) at /work/armbru/qemu/kvm-all.c:1586 #2 0x0060e0b4 in qemu_kvm_cpu_thread_fn (arg=0x1389b30) at /work/armbru/qemu/cpus.c:757 #3 0x003b0ea07d14 in start_thread () from /lib64/libpthread.so.0 #4 0x003b0def197d in clone () from /lib64/libc.so.6 Also seen with 904k, 908k, 964k, 968k, 972k 976k, and a whole lot more. Same EIP in the dump with those? Not reproduced with 1024k+. An easy way to fix these is to require 1MiB of RAM :) Or disabling kvm. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [v2 Patch 1/9]block: Framework for reopening image files safely
On 08/09/2012 06:32 PM, Jeff Cody wrote: On 08/09/2012 05:20 AM, Kevin Wolf wrote: Am 09.08.2012 06:26, schrieb Jeff Cody: On 07/30/2012 05:34 PM, Supriya Kannery wrote: + +void bdrv_reopen(BlockDriverState *bs, int bdrv_flags, Error **errp) +{ +BlockDriver *drv = bs-drv; +int ret = 0; +BDRVReopenState *reopen_state = NULL; + We should assert if drv is NULL: assert(drv != NULL); +/* Quiesce IO for the given block device */ +bdrv_drain_all(); +ret = bdrv_flush(bs); +if (ret != 0) { +error_set(errp, QERR_IO_ERROR); +return; +} + We also need to reopen bs-file, so maybe something like this: /* open any file images */ if (bs-file) { bdrv_reopen(bs-file, bdrv_flags, errp); if (errp *errp) { goto exit; } } This will necessitate making the handlers in the raw.c file just stubs (I'll respond to that patch as well). Doesn't this break the transactional semantics? I think you should only prepare the bs-file reopen here and commit it when committing this one. Yes, it would, so if everything stayed as it is now, that would be the right way to do it... but one thing that would be nice (I also mention this in my comments on patch 0/9) is that it become transactional for multiple BlockDriverStates to be reopened. That would obviously change the interface a bit, so that multiple BDS entries could be queued, but then it shouldn't be any different to queue the bs-file as well as the bs. All prepare() stages would happen on queued items, and only progress to commit() if all prepare() stages passed, as you mentioned. Yes, will work on to get the transactional semantics applied to bs-file reopen as well. One other thing, and perhaps this is nit-picking some - but the prepare() functions all modify the 'live' structures, after backing them up into stashes. So, on abort, the structures are restored from the stashes, and on commit the stashes are discarded. Conceptually, it seems cleaner to this the opposite way - prepare() does it work into temporary stashes, and the commit() then copies the data over from the stash to the live structure, and abort() would just discard the stashes. prepare()/commit() and abort() are from the perspective of new changes to be tried on respective block driver state. Once block driver is ready for the state change, then commit() means we are commiting these new changes to driver state. Similarly abort() means we are aborting these new changes half way and getting back to old stashed state. This is the intended logic. - thanks, Supriya
Re: [Qemu-devel] [PATCH] iscsi: fix race between task completition and task abortion
On Tue, Aug 14, 2012 at 08:44:46AM +0200, Stefan Priebe wrote: From: spriebe g...@profihost.ag CCing Ronnie, iSCSI block driver author. --- block/iscsi.c | 36 1 files changed, 20 insertions(+), 16 deletions(-) diff --git a/block/iscsi.c b/block/iscsi.c index 12ca76d..257f97f 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -76,6 +76,10 @@ static void iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void *command_data, void *private_data) { + IscsiAIOCB *acb = (IscsiAIOCB *)private_data; + + scsi_free_scsi_task(acb-task); + acb-task = NULL; } static void @@ -85,14 +89,19 @@ iscsi_aio_cancel(BlockDriverAIOCB *blockacb) IscsiLun *iscsilun = acb-iscsilun; acb-common.cb(acb-common.opaque, -ECANCELED); -acb-canceled = 1; -/* send a task mgmt call to the target to cancel the task on the target */ -iscsi_task_mgmt_abort_task_async(iscsilun-iscsi, acb-task, - iscsi_abort_task_cb, NULL); +if (acb-canceled != 0) +return; + +acb-canceled = 1; -/* then also cancel the task locally in libiscsi */ -iscsi_scsi_task_cancel(iscsilun-iscsi, acb-task); +/* send a task mgmt call to the target to cancel the task on the target + * this also cancels the task in libiscsi + */ +if (acb-task) { +iscsi_task_mgmt_abort_task_async(iscsilun-iscsi, acb-task, + iscsi_abort_task_cb, acb); +} } static AIOPool iscsi_aio_pool = { @@ -184,6 +193,11 @@ iscsi_readv_writev_bh_cb(void *p) } qemu_aio_release(acb); + +if (acb-task) { + scsi_free_scsi_task(acb-task); + acb-task = NULL; +} } @@ -212,8 +226,6 @@ iscsi_aio_write16_cb(struct iscsi_context *iscsi, int status, } iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); -scsi_free_scsi_task(acb-task); -acb-task = NULL; } static int64_t sector_qemu2lun(int64_t sector, IscsiLun *iscsilun) @@ -313,8 +325,6 @@ iscsi_aio_read16_cb(struct iscsi_context *iscsi, int status, } iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); -scsi_free_scsi_task(acb-task); -acb-task = NULL; } static BlockDriverAIOCB * @@ -429,8 +439,6 @@ iscsi_synccache10_cb(struct iscsi_context *iscsi, int status, } iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); -scsi_free_scsi_task(acb-task); -acb-task = NULL; } static BlockDriverAIOCB * @@ -483,8 +491,6 @@ iscsi_unmap_cb(struct iscsi_context *iscsi, int status, } iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); -scsi_free_scsi_task(acb-task); -acb-task = NULL; } static BlockDriverAIOCB * @@ -560,8 +566,6 @@ iscsi_aio_ioctl_cb(struct iscsi_context *iscsi, int status, } iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); -scsi_free_scsi_task(acb-task); -acb-task = NULL; } static BlockDriverAIOCB *iscsi_aio_ioctl(BlockDriverState *bs, -- 1.7.2.5
Re: [Qemu-devel] [PATCH 2/4] s390: Add a mechanism to get the subchannel id.
On Tue, 14 Aug 2012 10:52:09 +0200 (CEST) Sebastian Ott seb...@linux.vnet.ibm.com wrote: On Tue, 7 Aug 2012, Cornelia Huck wrote: +/** + * ccw_device_get_schid - obtain a subchannel id + * @cdev: device to obtain the id for + * @schid: where to fill in the values + */ +void ccw_device_get_schid(struct ccw_device *cdev, struct subchannel_id *schid) +{ + *schid = cdev-private-schid; +} +EXPORT_SYMBOL(ccw_device_get_schid); After I gave this some thought I like your version of this function better. But please use the id of the parent subchannel instead of the copy in cdev-private (since this will be removed soon). I'll do a cleanup patch to convert the internal users of the old function to use the one above. Good, will change that (I think we can rely on a ccw device always having either a real subchannel or the orphanage pseudo subchannel as parent). Small taste question: EXPORT_SYMBOL vs EXPORT_SYMBOL_GPL?
Re: [Qemu-devel] [Qemu-ppc][PATCH v7 1/3] Add USB option in machine options
On 08/07/2012 04:41 AM, Li Zhang wrote: When -usb option is used, global varible usb_enabled is set. And all the plafrom will create one USB controller according to this variable. In fact, global varibles make code hard to read. So this patch is to remove global variable usb_enabled and add USB option in machine options. All the plaforms will get USB option value from machine options. USB option of machine options will be set either by: * -usb * -machine type=pseries,usb=on Both these ways can work now. They both set USB option in machine options. In the future, the first way will be removed. Signed-off-by: Li Zhang zhlci...@linux.vnet.ibm.com --- hw/nseries.c |9 + hw/pc_piix.c |6 ++ hw/ppc_newworld.c | 10 +- hw/ppc_oldworld.c |8 hw/ppc_prep.c |7 +++ hw/pxa2xx.c | 15 +++ hw/realview.c |8 hw/spapr.c| 12 hw/versatilepb.c |8 qemu-config.c |4 sysemu.h |1 - vl.c | 29 +++-- 12 files changed, 109 insertions(+), 8 deletions(-) diff --git a/hw/nseries.c b/hw/nseries.c index 4df2670..8e385b7 100644 --- a/hw/nseries.c +++ b/hw/nseries.c @@ -1282,6 +1282,9 @@ static void n8x0_init(ram_addr_t ram_size, const char *boot_device, int sdram_size = binfo-ram_size; DisplayState *ds; +QemuOpts *mach_opts; +bool usb_enabled = false; + s-mpu = omap2420_mpu_init(sysmem, sdram_size, cpu_model); /* Setup peripherals @@ -1322,6 +1325,12 @@ static void n8x0_init(ram_addr_t ram_size, const char *boot_device, n8x0_dss_setup(s); n8x0_cbus_setup(s); n8x0_uart_setup(s); + +mach_opts = qemu_opts_find(qemu_find_opts(machine), 0); +if (mach_opts) { +usb_enabled = qemu_opt_get_bool(mach_opts, usb, false); +} + if (usb_enabled) n8x0_usb_setup(s); diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 0c0096f..0bf25d0 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -148,6 +148,8 @@ static void pc_init1(MemoryRegion *system_memory, MemoryRegion *pci_memory; MemoryRegion *rom_memory; void *fw_cfg = NULL; +QemuOpts *mach_opts; +bool usb_enabled = false; pc_cpus_init(cpu_model); @@ -267,6 +269,10 @@ static void pc_init1(MemoryRegion *system_memory, pc_cmos_init(below_4g_mem_size, above_4g_mem_size, boot_device, floppy, idebus[0], idebus[1], rtc_state); +mach_opts = qemu_opts_find(qemu_find_opts(machine), 0); +if (mach_opts) { +usb_enabled = qemu_opt_get_bool(mach_opts, usb, false); +} if (pci_enabled usb_enabled) { pci_create_simple(pci_bus, piix3_devfn + 2, piix3-usb-uhci); } diff --git a/hw/ppc_newworld.c b/hw/ppc_newworld.c index 4e2a6e6..1a324eb 100644 --- a/hw/ppc_newworld.c +++ b/hw/ppc_newworld.c @@ -159,6 +159,9 @@ static void ppc_core99_init (ram_addr_t ram_size, linux_boot = (kernel_filename != NULL); +QemuOpts *mach_opts; +bool usb_enabled = false; + /* init CPUs */ if (cpu_model == NULL) #ifdef TARGET_PPC64 @@ -350,7 +353,7 @@ static void ppc_core99_init (ram_addr_t ram_size, /* cuda also initialize ADB */ if (machine_arch == ARCH_MAC99_U3) { -usb_enabled = 1; +usb_enabled = true; } cuda_init(cuda_mem, pic[0x19]); @@ -360,6 +363,11 @@ static void ppc_core99_init (ram_addr_t ram_size, macio_init(pci_bus, PCI_DEVICE_ID_APPLE_UNI_N_KEYL, 0, pic_mem, dbdma_mem, cuda_mem, NULL, 3, ide_mem, escc_bar); +mach_opts = qemu_opts_find(qemu_find_opts(machine), 0); +if (mach_opts) { +usb_enabled = qemu_opt_get_bool(mach_opts, usb, true); +} + if (usb_enabled) { pci_create_simple(pci_bus, -1, pci-ohci); } diff --git a/hw/ppc_oldworld.c b/hw/ppc_oldworld.c index f2c6908..da05705 100644 --- a/hw/ppc_oldworld.c +++ b/hw/ppc_oldworld.c @@ -101,6 +101,9 @@ static void ppc_heathrow_init (ram_addr_t ram_size, linux_boot = (kernel_filename != NULL); +QemuOpts *mach_opts; +bool usb_enabled = true; + /* init CPUs */ if (cpu_model == NULL) cpu_model = G3; @@ -286,6 +289,11 @@ static void ppc_heathrow_init (ram_addr_t ram_size, macio_init(pci_bus, PCI_DEVICE_ID_APPLE_343S1201, 1, pic_mem, dbdma_mem, cuda_mem, nvr, 2, ide_mem, escc_bar); +mach_opts = qemu_opts_find(qemu_find_opts(machine), 0); +if (mach_opts) { +usb_enabled = qemu_opt_get_bool(mach_opts, usb, true); +} + if (usb_enabled) { pci_create_simple(pci_bus, -1, pci-ohci); } diff --git a/hw/ppc_prep.c b/hw/ppc_prep.c index be2b268..dc294ae 100644 --- a/hw/ppc_prep.c +++ b/hw/ppc_prep.c @@ -484,6 +484,9 @@ static void ppc_prep_init (ram_addr_t ram_size, linux_boot = (kernel_filename != NULL); +
Re: [Qemu-devel] [PATCH v8] kvm: notify host when the guest is panicked
On 2012-08-14 10:56, Daniel P. Berrange wrote: On Mon, Aug 13, 2012 at 03:21:32PM -0300, Marcelo Tosatti wrote: On Wed, Aug 08, 2012 at 10:43:01AM +0800, Wen Congyang wrote: We can know the guest is panicked when the guest runs on xen. But we do not have such feature on kvm. Another purpose of this feature is: management app(for example: libvirt) can do auto dump when the guest is panicked. If management app does not do auto dump, the guest's user can do dump by hand if he sees the guest is panicked. We have three solutions to implement this feature: 1. use vmcall 2. use I/O port 3. use virtio-serial. We have decided to avoid touching hypervisor. The reason why I choose choose the I/O port is: 1. it is easier to implememt 2. it does not depend any virtual device 3. it can work when starting the kernel How about searching for the Kernel panic - not syncing string in the guests serial output? Say libvirtd could take an action upon that? No, this is not satisfactory. It depends on the guest OS being configured to use the serial port for console output which we cannot mandate, since it may well be required for other purposes. Well, we have more than a single serial port, even when leaving virtio-serial aside... Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux
Re: [Qemu-devel] Funny -m arguments can crash
On 2012-08-14 12:20, Avi Kivity wrote: On 08/14/2012 11:44 AM, Markus Armbruster wrote: Next error: $ gdb --args qemu-system-x86_64 -nodefaults --enable-kvm -vnc :0 -monitor stdio -m 640k [...] Program received signal SIGSEGV, Segmentation fault. [...] (gdb) bt #0 0x003b0de884ac in __memcmp_sse2 () from /lib64/libc.so.6 #1 0x0063f1ad in patch_hypercalls (s=0x139b350) at /work/armbru/qemu/hw/i386/../kvmvapic.c:532 #2 0x0063f3fe in vapic_prepare (s=0x139b350) at /work/armbru/qemu/hw/i386/../kvmvapic.c:597 #3 0x0063f4ed in vapic_write (opaque=0x139b350, addr=0, data=32, size= 2) at /work/armbru/qemu/hw/i386/../kvmvapic.c:634 #4 0x00677a44 in memory_region_write_accessor (opaque=0x139d670, addr= Happens when -m argument is a multiple of 4k in [648k..768k]. Only with --enable-kvm. With and without my CMOS fix applied. kvmvapic requires RAM to be present underneath the ROM. We could fix up kvmvapic to allocate a 4k region and insert it as an overlay, but it's sufficient IMO to require sub-1M users to disable it. It won't be of any use to the anyway as Windows XP requires more than 1MB. We can also easily automatically disable it when there is insufficient (1MB) memory. Will post a patch. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [Qemu-ppc][PATCH v7 1/3] Add USB option in machine options
On 08/07/2012 04:41 AM, Li Zhang wrote: When -usb option is used, global varible usb_enabled is set. And all the plafrom will create one USB controller according to this variable. In fact, global varibles make code hard to read. So this patch is to remove global variable usb_enabled and add USB option in machine options. All the plaforms will get USB option value from machine options. USB option of machine options will be set either by: * -usb * -machine type=pseries,usb=on Both these ways can work now. They both set USB option in machine options. In the future, the first way will be removed. Signed-off-by: Li Zhang zhlci...@linux.vnet.ibm.com --- hw/nseries.c |9 + hw/pc_piix.c |6 ++ hw/ppc_newworld.c | 10 +- hw/ppc_oldworld.c |8 hw/ppc_prep.c |7 +++ hw/pxa2xx.c | 15 +++ hw/realview.c |8 hw/spapr.c| 12 hw/versatilepb.c |8 qemu-config.c |4 sysemu.h |1 - vl.c | 29 +++-- 12 files changed, 109 insertions(+), 8 deletions(-) [...] diff --git a/hw/spapr.c b/hw/spapr.c index 81c9343..4dc5e59 100644 --- a/hw/spapr.c +++ b/hw/spapr.c @@ -575,6 +575,8 @@ static void ppc_spapr_init(ram_addr_t ram_size, long load_limit, rtas_limit, fw_size; long pteg_shift = 17; char *filename; +QemuOpts *mach_opts; +bool usb_enabled = true; spapr = g_malloc0(sizeof(*spapr)); QLIST_INIT(spapr-phbs); @@ -710,6 +712,16 @@ static void ppc_spapr_init(ram_addr_t ram_size, spapr_vscsi_create(spapr-vio_bus); } +mach_opts = qemu_opts_find(qemu_find_opts(machine), 0); +if (mach_opts) { +usb_enabled = qemu_opt_get_bool(mach_opts, usb, true); +} + +if (usb_enabled) { +pci_create_simple(QLIST_FIRST(spapr-phbs)-host_state.bus, + -1, pci-ohci); +} + This needs to go into a separate patch. This patch is about moving the global usb_enabled variable towards a machine opt. It shouldn't modify any code outside of that scope, least of all add usb_enabled support for a new platform! Alex
Re: [Qemu-devel] memory: could we add extra input param for memory_region_init_io()?
On 08/14/2012 11:30 AM, liu ping fan wrote: To make memoryRegion survive without the protection of qemu big lock, we need to pin its based Object. In current code, the type of mr-opaque are quite different. Lots of them are Object, but quite of them are not yet. The most challenge for changing from memory_region_init_io(..., void *opaque, ...) to memory_region_init_io(..., Object *opaque,...) is such codes: hw/ide/cmd646.c: static void setup_cmd646_bar(PCIIDEState *d, int bus_num) { IDEBus *bus = d-bus[bus_num]; CMD646BAR *bar = d-cmd646_bar[bus_num]; bar-bus = bus; bar-pci_dev = d; memory_region_init_io(bar-cmd, cmd646_cmd_ops, bar, cmd646-cmd, 4); memory_region_init_io(bar-data, cmd646_data_ops, bar, cmd646-data, 8); } If we passed in mr's based Object @d to substitute @bar, then we can not pass the extra info @bus_num. To solve such issue, introduce extra member Object *base for MemoryRegion. diff --git a/memory.c b/memory.c index 643871b..afd5dea 100644 --- a/memory.c +++ b/memory.c @@ -931,6 +931,7 @@ static void memory_region_dispatch_write(MemoryRegion *mr, void memory_region_init_io(MemoryRegion *mr, const MemoryRegionOps *ops, + Object *base, void *opaque, const char *name, uint64_t size) @@ -938,6 +939,7 @@ void memory_region_init_io(MemoryRegion *mr, memory_region_init(mr, name, size); mr-ops = ops; mr-opaque = opaque; +mr-base = base; mr-terminates = true; mr-destructor = memory_region_destructor_iomem; mr-ram_addr = ~(ram_addr_t)0; diff --git a/memory.h b/memory.h index bd1bbae..2746e70 100644 --- a/memory.h +++ b/memory.h @@ -120,6 +120,7 @@ struct MemoryRegion { /* All fields are private - violators will be prosecuted */ const MemoryRegionOps *ops; void *opaque; +Object *base; MemoryRegion *parent; Int128 size; target_phys_addr_t addr; Any comment? I prefer that we convert the third parameter (opaque) to be an Object. That is a huge change, but I think it will improve the code base overall. Other options are: 1) add MemoryRegionOps::ref(MemoryRegion *) and ::unref(MemoryRegion *) If NULL, these callbacks are ignored. If not, they are called with the MemoryRegion as a parameter. Their responsibility is to derive the Object from the MemoryRegion (through the opaque or using container_of()) and ref or unref it respectively. 2) add Object *MemoryRegionOps::object(MemoryRegion *) Similar; if NULL it is ignored, otherwise it is used to derive the Object, which the memory core will ref and unref. 3) add bool MemoryRegionOps::opaque_is_object Tells the memory core that it is safe to cast the opaque into an Object. 4) add memory_region_set_object(MemoryRegion *, Object *) Like your proposal, but avoids adding an extra paramter and changing all call sites. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH v10 0/7] file descriptor passing using fd sets
Am 13.08.2012 20:39, schrieb Corey Bryant: On 08/13/2012 02:02 PM, Eric Blake wrote: On 08/13/2012 08:08 AM, Corey Bryant wrote: libvirt's sVirt security driver provides SELinux MAC isolation for Qemu guest processes and their corresponding image files. In other words, sVirt uses SELinux to prevent a QEMU process from opening files that do not belong to it. Corey Bryant (7): qemu-char: Add MSG_CMSG_CLOEXEC flag to recvmsg qapi: Introduce add-fd, remove-fd, query-fdsets block: Prevent detection of /dev/fdset/ as floppy block: Convert open calls to qemu_open block: Convert close calls to qemu_close block: Enable qemu_open/close to work with fd sets monitor: Clean up fd sets on monitor disconnect Thanks, applied all to the block branch. Hooray - I think we're there! Series: Reviewed-by: Eric Blake ebl...@redhat.com Thanks again!! Btw, In the case that no new versions of the patches need to be submitted, how does your Reviewed-by get added? Do I need to do anything to add it? No, I picked it up. Kevin
Re: [Qemu-devel] Funny -m arguments can crash
On 08/14/2012 01:44 PM, Jan Kiszka wrote: On 2012-08-14 12:20, Avi Kivity wrote: On 08/14/2012 11:44 AM, Markus Armbruster wrote: Next error: $ gdb --args qemu-system-x86_64 -nodefaults --enable-kvm -vnc :0 -monitor stdio -m 640k [...] Program received signal SIGSEGV, Segmentation fault. [...] (gdb) bt #0 0x003b0de884ac in __memcmp_sse2 () from /lib64/libc.so.6 #1 0x0063f1ad in patch_hypercalls (s=0x139b350) at /work/armbru/qemu/hw/i386/../kvmvapic.c:532 #2 0x0063f3fe in vapic_prepare (s=0x139b350) at /work/armbru/qemu/hw/i386/../kvmvapic.c:597 #3 0x0063f4ed in vapic_write (opaque=0x139b350, addr=0, data=32, size= 2) at /work/armbru/qemu/hw/i386/../kvmvapic.c:634 #4 0x00677a44 in memory_region_write_accessor (opaque=0x139d670, addr= Happens when -m argument is a multiple of 4k in [648k..768k]. Only with --enable-kvm. With and without my CMOS fix applied. kvmvapic requires RAM to be present underneath the ROM. We could fix up kvmvapic to allocate a 4k region and insert it as an overlay, but it's sufficient IMO to require sub-1M users to disable it. It won't be of any use to the anyway as Windows XP requires more than 1MB. We can also easily automatically disable it when there is insufficient (1MB) memory. Will post a patch. Would be nicer if it auto-disables itself, but don't know if the option ROM has access to the memory size. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [v2 Patch 2/9]block: raw-posix image file reopen
On 07/31/2012 10:47 PM, Eric Blake wrote: On 07/30/2012 03:34 PM, Supriya Kannery wrote: +s-fd = dup3(raw_rs-stash_s-fd, s-fd, O_CLOEXEC); You called it out in your cover letter, but just pointing out that this is one of the locations that needs a conditional fallback to dup2/qemu_set_cloexec if dup3 and O_CLOEXEC are missing. More importantly, though, you want this to be fcntl(F_DUP_CLOEXEC) and NOT dup3, so that you are duplicating to the first available fd rather than accidentally overwriting whatever s-fd happened to be on entry. Also, where is your error checking that you didn't just assign s-fd to -1? It looks like your goal here is to stash a copy of the fd, so that on failure you can then pivot over to your copy. Will use fcntl(F_DUP_CLOEXEC) in updated patch. + +*prs =(raw_rs-reopen_state); + +/* Flags that can be set using fcntl */ +int fcntl_flags = BDRV_O_NOCACHE; This variable name is misleading; fcntl_flags in my mind implies O_* not BDRV_O_*, as we are not guaranteed that they are the same values. Maybe bdrv_flags is a better name. Or are you trying to list the subset of BDRV_O flags that can be changed via mapping to the appropriate O_ flag during fcntl? From the list of flags in bdrv-openflags (BDRV_O*), only few of them can be changed using fcntl. So to identify those allowed subset, I am using fcntl_flags. Excerpt from man fcntl for F_SETFL: On Linux this command can only change the O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME, and O_NONBLOCK flags. May be naming it fcntl_supported_flags would be better? - thanks, Supriya
Re: [Qemu-devel] [PATCH 2/4] s390: Add a mechanism to get the subchannel id.
On Tue, 14 Aug 2012, Cornelia Huck wrote: Sebastian Ott seb...@linux.vnet.ibm.com wrote: On Tue, 7 Aug 2012, Cornelia Huck wrote: +/** + * ccw_device_get_schid - obtain a subchannel id + * @cdev: device to obtain the id for + * @schid: where to fill in the values + */ +void ccw_device_get_schid(struct ccw_device *cdev, struct subchannel_id *schid) +{ + *schid = cdev-private-schid; +} +EXPORT_SYMBOL(ccw_device_get_schid); After I gave this some thought I like your version of this function better. But please use the id of the parent subchannel instead of the copy in cdev-private (since this will be removed soon). I'll do a cleanup patch to convert the internal users of the old function to use the one above. Good, will change that (I think we can rely on a ccw device always having either a real subchannel or the orphanage pseudo subchannel as parent). Yes you can rely on that. Small taste question: EXPORT_SYMBOL vs EXPORT_SYMBOL_GPL? get_subchannel_id was EXPORT_SYMBOL_GPL since its going to replace this let's use the _GPL variant. Regards, Sebastian
[Qemu-devel] [PATCH] xbzrle: fix compilation on ppc32
When compiling the xbzrle code on my ppc32 user space, I hit the following gcc compiler warning (treated as an error): cc1: warnings being treated as errors savevm.c: In function ‘xbzrle_encode_buffer’: savevm.c:2476: error: overflow in implicit constant conversion Fix this by making the cast explicit, rather than implicit. Signed-off-by: Alexander Graf ag...@suse.de --- savevm.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/savevm.c b/savevm.c index 0ea10c9..9ab4d83 100644 --- a/savevm.c +++ b/savevm.c @@ -2473,7 +2473,7 @@ int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen, /* word at a time for speed, use of 32-bit long okay */ if (!res) { /* truncation to 32-bit long okay */ -long mask = 0x0101010101010101ULL; +long mask = (long)0x0101010101010101ULL; while (i slen) { xor = *(long *)(old_buf + i) ^ *(long *)(new_buf + i); if ((xor - mask) ~xor (mask 7)) { -- 1.6.0.2
Re: [Qemu-devel] [PATCH] kvm-all.c: Move init of irqchip_inject_ioctl out of kvm_irqchip_create()
On 08/14/2012 10:33 AM, Jan Kiszka wrote: KVM_IRQ_LINE is old-style, deprecated, KVM_IRQ_LINE_STATUS (i.e injection with feedback to allow lost-tick compensation) is the current standard that other archs should pick up. KVM_IRQ_LINE_STATUS may not make sense on all architectures. I don't think we're really deprecating KVM_IRQ_LINE or discouraging its use. It's not like the kernel-allocated memory slot ioctls. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Funny -m arguments can crash
On 2012-08-14 12:51, Avi Kivity wrote: On 08/14/2012 01:44 PM, Jan Kiszka wrote: On 2012-08-14 12:20, Avi Kivity wrote: On 08/14/2012 11:44 AM, Markus Armbruster wrote: Next error: $ gdb --args qemu-system-x86_64 -nodefaults --enable-kvm -vnc :0 -monitor stdio -m 640k [...] Program received signal SIGSEGV, Segmentation fault. [...] (gdb) bt #0 0x003b0de884ac in __memcmp_sse2 () from /lib64/libc.so.6 #1 0x0063f1ad in patch_hypercalls (s=0x139b350) at /work/armbru/qemu/hw/i386/../kvmvapic.c:532 #2 0x0063f3fe in vapic_prepare (s=0x139b350) at /work/armbru/qemu/hw/i386/../kvmvapic.c:597 #3 0x0063f4ed in vapic_write (opaque=0x139b350, addr=0, data=32, size= 2) at /work/armbru/qemu/hw/i386/../kvmvapic.c:634 #4 0x00677a44 in memory_region_write_accessor (opaque=0x139d670, addr= Happens when -m argument is a multiple of 4k in [648k..768k]. Only with --enable-kvm. With and without my CMOS fix applied. kvmvapic requires RAM to be present underneath the ROM. We could fix up kvmvapic to allocate a 4k region and insert it as an overlay, but it's sufficient IMO to require sub-1M users to disable it. It won't be of any use to the anyway as Windows XP requires more than 1MB. We can also easily automatically disable it when there is insufficient (1MB) memory. Will post a patch. Would be nicer if it auto-disables itself, but don't know if the option ROM has access to the memory size. There is that global ram_size, also used by vmport. Not really nice but no precedent. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH 3/4] s390/kvm: Add a channel I/O based virtio transport driver.
On Tue, 14 Aug 2012 09:40:01 +0930 Rusty Russell ru...@rustcorp.com.au wrote: On Mon, 13 Aug 2012 10:56:38 +0200, Cornelia Huck cornelia.h...@de.ibm.com wrote: On Wed, 08 Aug 2012 13:52:57 +0930 Rusty Russell ru...@rustcorp.com.au wrote: On Tue, 7 Aug 2012 16:52:47 +0200, Cornelia Huck cornelia.h...@de.ibm.com wrote: 1) Please don't limit yourself to 32 feature bits! If you look at how virtio_mmio does it, they use a selector to index into a theoretically-infinite array of feature bits: It should be easy to extend the data processed by the feature ccws to a feature/index combination. Would it be practical to limit the index to an 8 bit value? 256 feature bits? That seems like it could one day be limiting. Or an 8 bit accessor into feature words? 8192 seems enough for anyone sane. An 8 bit accessor. I hope everybody stays on the sane side :) Note that we're also speculating a move to a new vring format, which will probably be little-endian. But you probably want a completely new ccw code for that anyway. Do you have a pointer to that discussion handy? If the host may support different vring formats, I'll probably want to add some kind of discovery mechanism for that as well (what discovery mechanism depends on whether this would be per-device or per-machine). It would be per-machine; per-device would be a bit crazy. We'd deprecate the old ring format. There's been no consistent thread on the ideas for a ring change, unfortunately, but you can find interesting parts here, off this thread: Message-ID: 8762gj6q5r@rustcorp.com.au Subject: Re: [RFC 7/11] virtio_pci: new, capability-aware driver. I've read a bit through this and it looks like this is really virtio-2 or so. How about discoverability by the guest? Guests will likely have to support both formats, and forcing them to look at the feature bits for each device in order to figure out the queue format feels wrong if it is going to be the same format for the whole machine anyway.
Re: [Qemu-devel] [Qemu-ppc][PATCH v7 2/3] Add one new file vga-pci.h and cleanup on all platforms
On 08/07/2012 04:41 AM, Li Zhang wrote: Functions pci_vga_init() and pci_cirrus_vga_init() are declared in pc.h. That prevents other platforms (e.g. sPAPR) to use them. This patch is to create one new file vga-pci.h and move the declarations to vga-pci.h, so that they can be shared by all platforms. This patch also cleans up on all platforms. Signed-off-by: Li Zhang zhlci...@linux.vnet.ibm.com Thanks, applied to ppc-next. Alex
Re: [Qemu-devel] [Qemu-ppc][PATCH v7 3/3] spapr: Add support for -vga option
On 08/07/2012 04:42 AM, Li Zhang wrote: Also instanciate the USB keyboard and mouse when that option is used (you can still use -device to create individual devices without all the defaults) Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org Signed-off-by: Li Zhang zhlci...@linux.vnet.ibm.com --- hw/spapr.c | 31 ++- 1 files changed, 30 insertions(+), 1 deletions(-) Thanks, applied to ppc-next without the USB bits. I also get the following warning now: $ ./ppc64-softmmu/qemu-system-ppc64 -nographic -M pseries -kernel /boot/vmlinux -initrd /boot/initrd -enable-kvm -m 1G -append root=/dev/null This vga model is not supported,currently it only supports -vga std [...] Fixing with a follow-up patch. Alex
Re: [Qemu-devel] [Qemu-ppc][PATCH v7 0/3] Add USB enablement and VGA enablement on sPAPR
On 08/07/2012 05:05 PM, Li Zhang wrote: Hi Alex and Andreas, Would you please help review the new version patches? We hope the patches can be merged into 1.2. 1.2 will be freezed soon. Most of it should be unintrusive enough for 1.2. Please just make sure to rework the USB patches quickly. Alex
Re: [Qemu-devel] [PATCH] kvm-all.c: Move init of irqchip_inject_ioctl out of kvm_irqchip_create()
On 2012-08-14 13:01, Avi Kivity wrote: On 08/14/2012 10:33 AM, Jan Kiszka wrote: KVM_IRQ_LINE is old-style, deprecated, KVM_IRQ_LINE_STATUS (i.e injection with feedback to allow lost-tick compensation) is the current standard that other archs should pick up. KVM_IRQ_LINE_STATUS may not make sense on all architectures. I don't think we're really deprecating KVM_IRQ_LINE or discouraging its use. It's not like the kernel-allocated memory slot ioctls. I do not think it makes sense to provide both interfaces long term (provided we ever do a cut). Also, it's almost trivial to provide the add-on feature of KVM_IRQ_LINE_STATUS, and it keeps the door open for IRQ decoalescing. If there is no way for an arch to detect coalescing, it can still return 0 unconditionally. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux
[Qemu-devel] [PATCH uq/master] kvmvapic: Disable if there is insufficient memory
We need at least 1M of RAM to map the option ROM. Otherwise, we will corrupt host memory or even crash. Reported-by: Markus Armbruster arm...@redhat.com Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/apic_common.c |4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/hw/apic_common.c b/hw/apic_common.c index 58e63b0..371f95d 100644 --- a/hw/apic_common.c +++ b/hw/apic_common.c @@ -299,7 +299,9 @@ static int apic_init_common(SysBusDevice *dev) sysbus_init_mmio(dev, s-io_memory); -if (!vapic s-vapic_control VAPIC_ENABLE_MASK) { +/* Note: We need at least 1M to map the VAPIC option ROM */ +if (!vapic s-vapic_control VAPIC_ENABLE_MASK +ram_size = 1024 * 1024) { vapic = sysbus_create_simple(kvmvapic, -1, NULL); } s-vapic = vapic; -- 1.7.3.4
Re: [Qemu-devel] Funny -m arguments can crash
Avi Kivity a...@redhat.com writes: On 08/14/2012 11:44 AM, Markus Armbruster wrote: [...] And another one: $ qemu-system-x86_64 -nodefaults --enable-kvm -vnc :0 -monitor stdio -m 900k QEMU 1.1.50 monitor - type 'help' for more information (qemu) KVM internal error. Suberror: 1 emulation failure EAX=000fdb78 EBX= ECX= EDX=000fdb64 ESI= EDI=000fdb64 EBP= ESP=6f98 EIP=000e3492 EFL=0006 [-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0010 00c09300 DPL=0 DS [-WA] CS =0008 00c09b00 DPL=0 CS32 [-RA] SS =0010 00c09300 DPL=0 DS [-WA] DS =0010 00c09300 DPL=0 DS [-WA] FS =0010 00c09300 DPL=0 DS [-WA] GS =0010 00c09300 DPL=0 DS [-WA] LDT= 8200 DPL=0 LDT TR = 8b00 DPL=0 TSS32-busy GDT= 000fcd68 0037 IDT= 000fdb60 CR0=0011 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= Code=00 00 b8 26 00 00 00 eb 95 83 c8 ff 83 c4 4c 5b 5e 5f 5d c3 57 56 53 89 d6 39 c2 72 06 89 c7 f3 a4 eb 1b 8d 51 ff 01 d0 01 d6 89 cf 31 d2 eb 08 8a 1c q Not sure what's the problem. 57 is a push reg instruction which we ought to emulate fine. 900k is 0xe1000, just below eip, but we ought to execute just fine from unshadowed ROM. Breakpoint on kvm_handle_internal_error() yields backtrace: #0 kvm_handle_internal_error (env=0x1389b30, run=0x77ffa000) at /work/armbru/qemu/kvm-all.c:1424 #1 0x00674c5a in kvm_cpu_exec (env=0x1389b30) at /work/armbru/qemu/kvm-all.c:1586 #2 0x0060e0b4 in qemu_kvm_cpu_thread_fn (arg=0x1389b30) at /work/armbru/qemu/cpus.c:757 #3 0x003b0ea07d14 in start_thread () from /lib64/libpthread.so.0 #4 0x003b0def197d in clone () from /lib64/libc.so.6 Also seen with 904k, 908k, 964k, 968k, 972k 976k, and a whole lot more. Same EIP in the dump with those? Offenders within 1s in range 868k..1028k step 4: 900 EIP=000e3492 EFL=0006 [-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 904 EIP=000e3492 EFL=0006 [-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 908 EIP=000e3492 EFL=0006 [-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 916 EIP=000e570e EFL=0012 [A--] CPL=0 II=0 A20=1 SMM=0 HLT=0 964 EIP=000f2b76 EFL=0012 [A--] CPL=0 II=0 A20=1 SMM=0 HLT=0 968 EIP=000f2b76 EFL=0012 [A--] CPL=0 II=0 A20=1 SMM=0 HLT=0 972 EIP=000fc6db EFL=0087 [--S--PC] CPL=0 II=0 A20=1 SMM=0 HLT=0 976 EIP=000fc6db EFL=0087 [--S--PC] CPL=0 II=0 A20=1 SMM=0 HLT=0 980 EIP=000fc6db EFL=0087 [--S--PC] CPL=0 II=0 A20=1 SMM=0 HLT=0 984 EIP=000fc6db EFL=0087 [--S--PC] CPL=0 II=0 A20=1 SMM=0 HLT=0 988 EIP=000fc6db EFL=0087 [--S--PC] CPL=0 II=0 A20=1 SMM=0 HLT=0 992 EIP=000fc6db EFL=0087 [--S--PC] CPL=0 II=0 A20=1 SMM=0 HLT=0 996 EIP=000fc6db EFL=0087 [--S--PC] CPL=0 II=0 A20=1 SMM=0 HLT=0 1000 EIP=000fc6db EFL=0087 [--S--PC] CPL=0 II=0 A20=1 SMM=0 HLT=0 1004 EIP=000fc6db EFL=0087 [--S--PC] CPL=0 II=0 A20=1 SMM=0 HLT=0 1008 EIP=000fc6db EFL=0087 [--S--PC] CPL=0 II=0 A20=1 SMM=0 HLT=0 1012 EIP=000fe69f EFL=0087 [--S--PC] CPL=0 II=0 A20=1 SMM=0 HLT=0 1016 EIP=000fe69f EFL=0083 [--S---C] CPL=0 II=0 A20=1 SMM=0 HLT=0 1020 EIP=f000 EFL=0046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 Not reproduced with 1024k+. An easy way to fix these is to require 1MiB of RAM :) Or disabling kvm.
Re: [Qemu-devel] [v2 Patch 2/9]block: raw-posix image file reopen
On 08/10/2012 07:15 PM, Corey Bryant wrote: On 07/30/2012 05:34 PM, Supriya Kannery wrote: +static int raw_reopen_prepare(BlockDriverState *bs, BDRVReopenState **prs, + int flags) +{ + BDRVRawReopenState *raw_rs = g_malloc0(sizeof(BDRVRawReopenState)); + BDRVRawState *s = bs-opaque; + int ret = 0; + + raw_rs-reopen_state.bs = bs; + + /* stash state before reopen */ + raw_rs-stash_s = g_malloc0(sizeof(BDRVRawState)); + raw_stash_state(raw_rs-stash_s, s); + s-fd = dup3(raw_rs-stash_s-fd, s-fd, O_CLOEXEC); + + *prs = (raw_rs-reopen_state); + + /* Flags that can be set using fcntl */ + int fcntl_flags = BDRV_O_NOCACHE; + + if ((bs-open_flags ~fcntl_flags) == (flags ~fcntl_flags)) { + if ((flags BDRV_O_NOCACHE)) { + s-open_flags |= O_DIRECT; + } else { + s-open_flags = ~O_DIRECT; + } + ret = fcntl_setfl(s-fd, s-open_flags); + } else { + + /* close and reopen using new flags */ + bs-drv-bdrv_close(bs); + ret = bs-drv-bdrv_file_open(bs, bs-filename, flags); Will this allow the fdset refcount to get to zero? I was hoping your patches would prevent that from happening. Perhaps Kevin or Eric can weigh in. qemu_open() increments the refcount for an fdset when an fd from it is used, and qemu_close() decrements it. I think if you were able to perform the open before the close here that refcount wouldn't get to zero. Since we are duping the file descriptor before reaching this bdrv_close(), refcount for fd won't become zero. - thanks, Supriya
[Qemu-devel] [PATCH] PPC: spapr: Rework VGA select logic
When selecting our VGA adapter, we want to: * fail completely when we can't satisfy the user's request * support -nographic where no VGA adapter should be spawned This patch reworks the logic so we fulfill the two conditions above. Signed-off-by: Alexander Graf ag...@suse.de --- hw/spapr.c | 14 ++ 1 files changed, 10 insertions(+), 4 deletions(-) diff --git a/hw/spapr.c b/hw/spapr.c index 494c412..31c4e53 100644 --- a/hw/spapr.c +++ b/hw/spapr.c @@ -563,16 +563,22 @@ static void spapr_cpu_reset(void *opaque) cpu_reset(CPU(cpu)); } +/* Returns whether we want to use VGA or not */ static int spapr_vga_init(PCIBus *pci_bus) { -if (std_vga_enabled) { +switch (vga_interface_type) { +case VGA_STD: pci_vga_init(pci_bus); -} else { +return 1; +break; +case VGA_NONE: +return 0; +default: fprintf(stderr, This vga model is not supported, currently it only supports -vga std\n); -return 0; +exit(0); +break; } -return 1; } /* pSeries LPAR / sPAPR hardware init */ -- 1.6.0.2
Re: [Qemu-devel] [PATCH 15/35] migration: don't rely on any QERR_SOCKET_*
Markus Armbruster arm...@redhat.com wrote: Luiz Capitulino lcapitul...@redhat.com writes: Use the in_progress argument for QERR_SOCKET_CONNECT_IN_PROGRESS. The other errors are handled the same by checking if the error is set and then calling migrate_fd_error() if it's. It's also necessary to change inet_connect_opts() not to set QERR_SOCKET_CONNECT_IN_PROGRESS. This error is only used by tcp_start_outgoing_migration() and not changing it along with the usage of in_progress would break migration. Furthermore this commit fixes a bug. Today, there's a spurious error report when migration succeeds: (qemu) migrate tcp:0: migrate: Connection can not be completed immediately (qemu) After this commit no spurious error is reported anymore. Signed-off-by: Luiz Capitulino lcapitul...@redhat.com I'd prefer s-fd = inet_connect(host_port, false, in_progress, errp); if (error_is_set(errp)) { migrate_fd_error(s); return -1; } if (in_progress) { DPRINTF(connect in progress\n); qemu_set_fd_handler2(s-fd, NULL, NULL, tcp_wait_for_connect, s); } else { migrate_fd_connect(s); } return 0; because it separates abnormal and normal code paths more clearly. Matter of taste. The 1st migrate_fd_error() is not needed (it was already wrong). migrate_fd_* functions are supposed to be called only after migrate_fd_connect() has been called. Later, Juan.
[Qemu-devel] [PATCH] PPC: spapr: Remove global variable
Global variables are bad. Let's move spapr_has_graphics into the machine state struct. Signed-off-by: Alexander Graf ag...@suse.de --- hw/spapr.c |5 ++--- hw/spapr.h |1 + 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/hw/spapr.c b/hw/spapr.c index 31c4e53..b3d14c3 100644 --- a/hw/spapr.c +++ b/hw/spapr.c @@ -83,7 +83,6 @@ #define PHANDLE_XICP0x sPAPREnvironment *spapr; -bool spapr_has_graphics; qemu_irq spapr_allocate_irq(uint32_t hint, uint32_t *irq_num, enum xics_irq_type type) @@ -508,7 +507,7 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr, } } -if (!spapr_has_graphics) { +if (!spapr-has_graphics) { spapr_populate_chosen_stdout(fdt, spapr-vio_bus); } @@ -737,7 +736,7 @@ static void ppc_spapr_init(ram_addr_t ram_size, /* Graphics */ if (spapr_vga_init(QLIST_FIRST(spapr-phbs)-host_state.bus)) { -spapr_has_graphics = true; +spapr-has_graphics = true; } if (rma_size (MIN_RMA_SLOF 20)) { diff --git a/hw/spapr.h b/hw/spapr.h index 9153f29..fe40e7d 100644 --- a/hw/spapr.h +++ b/hw/spapr.h @@ -23,6 +23,7 @@ typedef struct sPAPREnvironment { int next_irq; int rtc_offset; char *cpu_model; +bool has_graphics; } sPAPREnvironment; #define H_SUCCESS 0 -- 1.6.0.2
Re: [Qemu-devel] [PATCH] audio: Make pcspk card selectable again
On 2012-07-20 09:17, Jan Kiszka wrote: From: Jan Kiszka jan.kis...@siemens.com Since we moved pcspk into hwlib, CONFIG_PCSPK is no longer defined per target. Therefore, statically built soundhw array in arch_init.c stopped including this card. Work around this by re-adding this define to config-target.mak. Long-term, a dynamic creation of this soundhw list will be necessary. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- configure |5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/configure b/configure index 106209a..77b5096 100755 --- a/configure +++ b/configure @@ -3818,6 +3818,11 @@ if test $target_bsd_user = yes ; then echo CONFIG_BSD_USER=y $config_target_mak fi +# the static way of configuring available audio cards requires this workaround +if test $target_user_only != yes grep -q CONFIG_PCSPK $source_path/default-configs/$target.mak; then + echo CONFIG_PCSPK=y $config_target_mak +fi + # generate QEMU_CFLAGS/LDFLAGS for targets cflags= Ping. Sorry in case this arrives twice but the first ping didn't show up on the list for me. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [v2 Patch 2/9]block: raw-posix image file reopen
Am 14.08.2012 13:13, schrieb Supriya Kannery: On 08/10/2012 07:15 PM, Corey Bryant wrote: On 07/30/2012 05:34 PM, Supriya Kannery wrote: +static int raw_reopen_prepare(BlockDriverState *bs, BDRVReopenState **prs, + int flags) +{ + BDRVRawReopenState *raw_rs = g_malloc0(sizeof(BDRVRawReopenState)); + BDRVRawState *s = bs-opaque; + int ret = 0; + + raw_rs-reopen_state.bs = bs; + + /* stash state before reopen */ + raw_rs-stash_s = g_malloc0(sizeof(BDRVRawState)); + raw_stash_state(raw_rs-stash_s, s); + s-fd = dup3(raw_rs-stash_s-fd, s-fd, O_CLOEXEC); + + *prs = (raw_rs-reopen_state); + + /* Flags that can be set using fcntl */ + int fcntl_flags = BDRV_O_NOCACHE; + + if ((bs-open_flags ~fcntl_flags) == (flags ~fcntl_flags)) { + if ((flags BDRV_O_NOCACHE)) { + s-open_flags |= O_DIRECT; + } else { + s-open_flags = ~O_DIRECT; + } + ret = fcntl_setfl(s-fd, s-open_flags); + } else { + + /* close and reopen using new flags */ + bs-drv-bdrv_close(bs); + ret = bs-drv-bdrv_file_open(bs, bs-filename, flags); Will this allow the fdset refcount to get to zero? I was hoping your patches would prevent that from happening. Perhaps Kevin or Eric can weigh in. qemu_open() increments the refcount for an fdset when an fd from it is used, and qemu_close() decrements it. I think if you were able to perform the open before the close here that refcount wouldn't get to zero. Since we are duping the file descriptor before reaching this bdrv_close(), refcount for fd won't become zero. We need to use a qemu_dup() here, so that the fdset implementation can keep track of the new fd. Kevin
Re: [Qemu-devel] [PATCH 15/35] migration: don't rely on any QERR_SOCKET_*
Juan Quintela quint...@redhat.com writes: Markus Armbruster arm...@redhat.com wrote: Luiz Capitulino lcapitul...@redhat.com writes: Use the in_progress argument for QERR_SOCKET_CONNECT_IN_PROGRESS. The other errors are handled the same by checking if the error is set and then calling migrate_fd_error() if it's. It's also necessary to change inet_connect_opts() not to set QERR_SOCKET_CONNECT_IN_PROGRESS. This error is only used by tcp_start_outgoing_migration() and not changing it along with the usage of in_progress would break migration. Furthermore this commit fixes a bug. Today, there's a spurious error report when migration succeeds: (qemu) migrate tcp:0: migrate: Connection can not be completed immediately (qemu) After this commit no spurious error is reported anymore. Signed-off-by: Luiz Capitulino lcapitul...@redhat.com I'd prefer s-fd = inet_connect(host_port, false, in_progress, errp); if (error_is_set(errp)) { migrate_fd_error(s); return -1; } if (in_progress) { DPRINTF(connect in progress\n); qemu_set_fd_handler2(s-fd, NULL, NULL, tcp_wait_for_connect, s); } else { migrate_fd_connect(s); } return 0; because it separates abnormal and normal code paths more clearly. Matter of taste. The 1st migrate_fd_error() is not needed (it was already wrong). migrate_fd_* functions are supposed to be called only after migrate_fd_connect() has been called. It's been committed. Could you post a patch for it?
Re: [Qemu-devel] [PATCH uq/master] kvmvapic: Disable if there is insufficient memory
Jan Kiszka jan.kis...@siemens.com writes: We need at least 1M of RAM to map the option ROM. Otherwise, we will corrupt host memory or even crash. Let's put a reproducer in the commit message, if it's not too much trouble. Here's mine: $ qemu-system-x86_64 -nodefaults --enable-kvm -vnc :0 -m 640k Segmentation fault (core dumped) Reported-by: Markus Armbruster arm...@redhat.com Signed-off-by: Jan Kiszka jan.kis...@siemens.com Tested-by: Markus Armbruster arm...@redhat.com
[Qemu-devel] [PATCH v2 uq/master] kvmvapic: Disable if there is insufficient memory
We need at least 1M of RAM to map the option ROM. Otherwise, we will corrupt host memory or even crash: $ qemu-system-x86_64 -nodefaults --enable-kvm -vnc :0 -m 640k Segmentation fault (core dumped) Reported-and-tested-by: Markus Armbruster arm...@redhat.com Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/apic_common.c |4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/hw/apic_common.c b/hw/apic_common.c index 58e63b0..371f95d 100644 --- a/hw/apic_common.c +++ b/hw/apic_common.c @@ -299,7 +299,9 @@ static int apic_init_common(SysBusDevice *dev) sysbus_init_mmio(dev, s-io_memory); -if (!vapic s-vapic_control VAPIC_ENABLE_MASK) { +/* Note: We need at least 1M to map the VAPIC option ROM */ +if (!vapic s-vapic_control VAPIC_ENABLE_MASK +ram_size = 1024 * 1024) { vapic = sysbus_create_simple(kvmvapic, -1, NULL); } s-vapic = vapic; -- 1.7.3.4
Re: [Qemu-devel] [PATCH 06/10] pseries: Export find_phb() utility function for PCI code
On 08/08/2012 04:10 AM, David Gibson wrote: From: Alexey Kardashevskiy a...@ozlabs.ru The pseries PCI code makes use of an internal find_dev() function which locates a PCIDevice * given a (platform specific) bus ID and device address. Internally this needs to first locate the host bridge on which the device resides based on the bus ID. This patch exposes that host bridge lookup as a separate function, which we will need later in the MSI and VFIO code. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru Signed-off-by: David Gibson da...@gibson.dropbear.id.au --- hw/spapr_pci.c | 32 ++-- 1 file changed, 22 insertions(+), 10 deletions(-) diff --git a/hw/spapr_pci.c b/hw/spapr_pci.c index fcc358e..842068f 100644 --- a/hw/spapr_pci.c +++ b/hw/spapr_pci.c @@ -29,27 +29,39 @@ #include hw/spapr_pci.h #include exec-memory.h #include libfdt.h +#include trace.h trace.h? Alex #include hw/pci_internals.h -static PCIDevice *find_dev(sPAPREnvironment *spapr, - uint64_t buid, uint32_t config_addr) +static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid) { -int devfn = (config_addr 8) 0xFF; sPAPRPHBState *phb; QLIST_FOREACH(phb, spapr-phbs, list) { -BusChild *kid; - if (phb-buid != buid) { continue; } +return phb; +} + +return NULL; +} + +static PCIDevice *find_dev(sPAPREnvironment *spapr, uint64_t buid, + uint32_t config_addr) +{ +sPAPRPHBState *phb = find_phb(spapr, buid); +BusChild *kid; +int devfn = (config_addr 8) 0xFF; + +if (!phb) { +return NULL; +} -QTAILQ_FOREACH(kid, phb-host_state.bus-qbus.children, sibling) { -PCIDevice *dev = (PCIDevice *)kid-child; -if (dev-devfn == devfn) { -return dev; -} +QTAILQ_FOREACH(kid, phb-host_state.bus-qbus.children, sibling) { +PCIDevice *dev = (PCIDevice *)kid-child; +if (dev-devfn == devfn) { +return dev; } }
Re: [Qemu-devel] Qemu memory operations
Sorry friends for the misleading instructions in the previous mail. cmp ecx, [r12+0x4] mov r10b, [r13+0x0] mov byte [rax+0xf], 0x0 mov byte [rax+rdx], 0x0 It seems all the above instructions are getting covered with the tcg_gen_ld/st helpers. But now I have stumbled upon another problem : I initially thought that all the interactions with the guest memory happen through the helper instructions in the translate.c file. However, I found that the helper functions for some instructions like *cmpxcgh8b *and* cmpxchg16b* are actually accessing guest memory. So, does it mean there are more than one entry points for reading guest memory. Can some one please explain how are the *ldq and stq* instructions translated to access the guest memory ?? Thanks in advance. Regards, Prathmesh Kallurkar
[Qemu-devel] [PATCH 0/2] Fix two bugs related to ram_size
There are more, but let's start with these two. Markus Armbruster (2): vl: Round argument of -m up to multiple of 8KiB pc: Fix RTC CMOS info on RAM for ram_size 1MiB hw/pc.c | 27 +++ vl.c| 4 +++- 2 files changed, 18 insertions(+), 13 deletions(-) -- 1.7.11.2
[Qemu-devel] [PATCH 1/2] vl: Round argument of -m up to multiple of 8KiB
Partial pages make little sense and don't work. Ensure the RAM size is a multiple of any possible target's page size. Fixes $ qemu-system-x86_64 -nodefaults -S -vnc :0 -monitor stdio -m 0.8 QEMU 1.1.50 monitor - type 'help' for more information (qemu) qemu-system-x86_64: /work/armbru/qemu/exec.c:2255: register_subpage: Assertion `existing-mr-subpage || existing-mr == io_mem_unassigned' failed Signed-off-by: Markus Armbruster arm...@redhat.com --- See also http://lists.nongnu.org/archive/html/qemu-devel/2012-06/msg02813.html vl.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/vl.c b/vl.c index d01256a..b411d45 100644 --- a/vl.c +++ b/vl.c @@ -2708,11 +2708,13 @@ int main(int argc, char **argv, char **envp) fprintf(stderr, qemu: invalid ram size: %s\n, optarg); exit(1); } - if (value != (uint64_t)(ram_addr_t)value) { fprintf(stderr, qemu: ram size too large\n); exit(1); } +if (value 0x1fff) { +value = (value + 0x1fff) ~0x1fff; +} ram_size = value; break; } -- 1.7.11.2
Re: [Qemu-devel] [PATCH v10 6/7] block: Enable qemu_open/close to work with fd sets
Am 13.08.2012 16:08, schrieb Corey Bryant: When qemu_open is passed a filename of the /dev/fdset/nnn format (where nnn is the fdset ID), an fd with matching access mode flags will be searched for within the specified monitor fd set. If the fd is found, a dup of the fd will be returned from qemu_open. Signed-off-by: Corey Bryant cor...@linux.vnet.ibm.com cutils.c |5 +++ monitor.c | 83 ++- monitor.h |5 +++ osdep.c | 109 + qemu-common.h |1 + qemu-tool.c | 20 +++ 6 files changed, 222 insertions(+), 1 deletion(-) This breaks the build of vscclient and the qtest cases, because osdep.c has now a new dependency on the fdset monitor functions. The easy way to fix it would be to squash the following in (linking vscclient and qtests with qemu-tool.o). Any objections? Kevin diff --git a/Makefile b/Makefile index d736ea5..a11a7f4 100644 --- a/Makefile +++ b/Makefile @@ -148,9 +148,6 @@ install-libcacard: libcacard.la $(call quiet-command,$(MAKE) $(SUBDIR_MAKEFLAGS) -C libcacard V=$(V) TARGET_DIR=$*/ install-libcacard,) endif -vscclient$(EXESUF): $(libcacard-y) $(oslib-obj-y) $(trace-obj-y) qemu-timer-common.o libcacard/vscclient.o - $(call quiet-command,$(CC) $(LDFLAGS) -o $@ $^ $(libcacard_libs) $(LIBS), LINK $@) - ## qemu-img.o: qemu-img-cmds.h @@ -166,6 +163,9 @@ qemu-io$(EXESUF): qemu-io.o cmd.o $(tools-obj-y) $(block-obj-y) qemu-bridge-helper$(EXESUF): qemu-bridge-helper.o +vscclient$(EXESUF): $(tools-obj-y) $(libcacard-y) $(oslib-obj-y) $(trace-obj-y) qemu-timer-common.o libcacard/vscclient.o + $(call quiet-command,$(CC) $(LDFLAGS) -o $@ $^ $(libcacard_libs) $(LIBS), LINK $@) + fsdev/virtfs-proxy-helper$(EXESUF): fsdev/virtfs-proxy-helper.o fsdev/virtio-9p-marshal.o oslib-posix.o $(trace-obj-y) fsdev/virtfs-proxy-helper$(EXESUF): LIBS += -lcap diff --git a/tests/Makefile b/tests/Makefile index f3f4159..26a67ce 100644 --- a/tests/Makefile +++ b/tests/Makefile @@ -81,7 +81,7 @@ TARGETS=$(patsubst %-softmmu,%, $(filter %-softmmu,$(TARGET_DIRS))) QTEST_TARGETS=$(foreach TARGET,$(TARGETS), $(if $(check-qtest-$(TARGET)-y), $(TARGET),)) check-qtest-$(CONFIG_POSIX)=$(foreach TARGET,$(TARGETS), $(check-qtest-$(TARGET)-y)) -qtest-obj-y = tests/libqtest.o $(oslib-obj-y) +qtest-obj-y = tests/libqtest.o $(oslib-obj-y) $(tools-obj-y) $(check-qtest-y): $(qtest-obj-y) .PHONY: check-help
Re: [Qemu-devel] [PATCH] iscsi: fix race between task completition and task abortion
Stefan H How should I do Acked-by properly, Is a reply with the text Acked-by: Ronnie Sahlberg ronniesahlb...@gmail.com sufficient ? regards ronnie sahlberg On Tue, Aug 14, 2012 at 8:35 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Tue, Aug 14, 2012 at 08:44:46AM +0200, Stefan Priebe wrote: From: spriebe g...@profihost.ag CCing Ronnie, iSCSI block driver author. --- block/iscsi.c | 36 1 files changed, 20 insertions(+), 16 deletions(-) diff --git a/block/iscsi.c b/block/iscsi.c index 12ca76d..257f97f 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -76,6 +76,10 @@ static void iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void *command_data, void *private_data) { + IscsiAIOCB *acb = (IscsiAIOCB *)private_data; + + scsi_free_scsi_task(acb-task); + acb-task = NULL; } static void @@ -85,14 +89,19 @@ iscsi_aio_cancel(BlockDriverAIOCB *blockacb) IscsiLun *iscsilun = acb-iscsilun; acb-common.cb(acb-common.opaque, -ECANCELED); -acb-canceled = 1; -/* send a task mgmt call to the target to cancel the task on the target */ -iscsi_task_mgmt_abort_task_async(iscsilun-iscsi, acb-task, - iscsi_abort_task_cb, NULL); +if (acb-canceled != 0) +return; + +acb-canceled = 1; -/* then also cancel the task locally in libiscsi */ -iscsi_scsi_task_cancel(iscsilun-iscsi, acb-task); +/* send a task mgmt call to the target to cancel the task on the target + * this also cancels the task in libiscsi + */ +if (acb-task) { +iscsi_task_mgmt_abort_task_async(iscsilun-iscsi, acb-task, + iscsi_abort_task_cb, acb); +} } static AIOPool iscsi_aio_pool = { @@ -184,6 +193,11 @@ iscsi_readv_writev_bh_cb(void *p) } qemu_aio_release(acb); + +if (acb-task) { + scsi_free_scsi_task(acb-task); + acb-task = NULL; +} } @@ -212,8 +226,6 @@ iscsi_aio_write16_cb(struct iscsi_context *iscsi, int status, } iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); -scsi_free_scsi_task(acb-task); -acb-task = NULL; } static int64_t sector_qemu2lun(int64_t sector, IscsiLun *iscsilun) @@ -313,8 +325,6 @@ iscsi_aio_read16_cb(struct iscsi_context *iscsi, int status, } iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); -scsi_free_scsi_task(acb-task); -acb-task = NULL; } static BlockDriverAIOCB * @@ -429,8 +439,6 @@ iscsi_synccache10_cb(struct iscsi_context *iscsi, int status, } iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); -scsi_free_scsi_task(acb-task); -acb-task = NULL; } static BlockDriverAIOCB * @@ -483,8 +491,6 @@ iscsi_unmap_cb(struct iscsi_context *iscsi, int status, } iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); -scsi_free_scsi_task(acb-task); -acb-task = NULL; } static BlockDriverAIOCB * @@ -560,8 +566,6 @@ iscsi_aio_ioctl_cb(struct iscsi_context *iscsi, int status, } iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb); -scsi_free_scsi_task(acb-task); -acb-task = NULL; } static BlockDriverAIOCB *iscsi_aio_ioctl(BlockDriverState *bs, -- 1.7.2.5
[Qemu-devel] [PATCH 2/2] pc: Fix RTC CMOS info on RAM for ram_size 1MiB
pc_cmos_init() always claims 640KiB base memory, and ram_size - 1MiB extended memory. The latter can underflow to lots of extended memory. Fix both, and clean up some. Note: SeaBIOS currently requires 1MiB of RAM, and doesn't check whether it got enough. Signed-off-by: Markus Armbruster arm...@redhat.com --- hw/pc.c | 27 +++ 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/hw/pc.c b/hw/pc.c index e8bcfc0..1597fe6 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -337,32 +337,35 @@ void pc_cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size, /* various important CMOS locations needed by PC/Bochs bios */ /* memory size */ -val = 640; /* base memory in K */ +/* base memory (first MiB) */ +val = MIN(ram_size / 1024, 640); rtc_set_memory(s, 0x15, val); rtc_set_memory(s, 0x16, val 8); - -val = (ram_size / 1024) - 1024; +/* extended memory (next 64MiB) */ +if (ram_size 1024 * 1024) +val = (ram_size - 1024 * 1024) / 1024; +else +val = 0; if (val 65535) val = 65535; rtc_set_memory(s, 0x17, val); rtc_set_memory(s, 0x18, val 8); rtc_set_memory(s, 0x30, val); rtc_set_memory(s, 0x31, val 8); - -if (above_4g_mem_size) { -rtc_set_memory(s, 0x5b, (unsigned int)above_4g_mem_size 16); -rtc_set_memory(s, 0x5c, (unsigned int)above_4g_mem_size 24); -rtc_set_memory(s, 0x5d, (uint64_t)above_4g_mem_size 32); -} - -if (ram_size (16 * 1024 * 1024)) -val = (ram_size / 65536) - ((16 * 1024 * 1024) / 65536); +/* memory between 16MiB and 4GiB */ +if (ram_size 16 * 1024 * 1024) +val = (ram_size - 16 * 1024 * 1024) / 65536; else val = 0; if (val 65535) val = 65535; rtc_set_memory(s, 0x34, val); rtc_set_memory(s, 0x35, val 8); +/* memory above 4GiB */ +val = above_4g_mem_size / 65536; +rtc_set_memory(s, 0x5b, val); +rtc_set_memory(s, 0x5c, val 8); +rtc_set_memory(s, 0x5d, val 16); /* set the number of CPU */ rtc_set_memory(s, 0x5f, smp_cpus - 1); -- 1.7.11.2
Re: [Qemu-devel] [PATCH] iscsi: fix race between task completition and task abortion
On Tue, Aug 14, 2012 at 1:09 PM, ronnie sahlberg ronniesahlb...@gmail.com wrote: Is a reply with the text Acked-by: Ronnie Sahlberg ronniesahlb...@gmail.com sufficient ? Yes
Re: [Qemu-devel] [Qemu-ppc][PATCH v7 3/3] spapr: Add support for -vga option
On Tue, Aug 14, 2012 at 10:04:03PM +1000, Benjamin Herrenschmidt wrote: On Tue, 2012-08-14 at 13:04 +0200, Alexander Graf wrote: Thanks, applied to ppc-next without the USB bits. I also get the following warning now: $ ./ppc64-softmmu/qemu-system-ppc64 -nographic -M pseries -kernel /boot/vmlinux -initrd /boot/initrd -enable-kvm -m 1G -append root=/dev/null This vga model is not supported,currently it only supports -vga std [...] Fixing with a follow-up patch. Right, nowadays qemu defaults to cirrus regardless of the arch which is annoying. In fact it even creates a vga card with -nographic, you have to do -vga none to avoid it which is even more annoying for us :-) No, that's not the problem, or at least it's not the only problem. It actually rejected -vga none. I fixed that in the version in my tree, but I hadn't sent that out to Alex yet. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] [PATCH] trace/simple: Fix compiler warning for 32 bit hosts
On Mon, Aug 13, 2012 at 09:50:56PM +0200, Stefan Weil wrote: gcc complains when a 32 bit pointer is casted to a 64 bit integer. Cc: Stefan Hajnoczi stefa...@linux.vnet.ibm.com Signed-off-by: Stefan Weil s...@weilnetz.de --- scripts/tracetool/backend/simple.py |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) Thanks, applied to the tracing tree: https://github.com/stefanha/qemu/commits/tracing Stefan
Re: [Qemu-devel] [0/10] pseries updates and cleanups
On 08/08/2012 04:10 AM, David Gibson wrote: Hi Alex, This series contains all my outstanding pseries updates which aren't dependent on getting a generic patch upstream first. I have a number more which are actually more urgent to get into 1.2, but we need to get some word on the generic patches before I can really push those. In the meantime, if you could pull these various updates and cleanups into ppc-next ready for the 1.2 freeze, that would be great. Thanks, applied all to ppc-next. Alex
Re: [Qemu-devel] [PATCH] xbzrle: fix compilation on ppc32
On 08/14/2012 04:55 AM, Alexander Graf wrote: When compiling the xbzrle code on my ppc32 user space, I hit the following gcc compiler warning (treated as an error): cc1: warnings being treated as errors savevm.c: In function ‘xbzrle_encode_buffer’: savevm.c:2476: error: overflow in implicit constant conversion Fix this by making the cast explicit, rather than implicit. Signed-off-by: Alexander Graf ag...@suse.de --- savevm.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) Reviewed-by: Eric Blake ebl...@redhat.com diff --git a/savevm.c b/savevm.c index 0ea10c9..9ab4d83 100644 --- a/savevm.c +++ b/savevm.c @@ -2473,7 +2473,7 @@ int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen, /* word at a time for speed, use of 32-bit long okay */ if (!res) { /* truncation to 32-bit long okay */ -long mask = 0x0101010101010101ULL; +long mask = (long)0x0101010101010101ULL; What a picky compiler - too bad it can't just read the comment above that said we are okay with truncation :) -- Eric Blake ebl...@redhat.com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH 1/2] vl: Round argument of -m up to multiple of 8KiB
On 08/14/2012 02:58 PM, Markus Armbruster wrote: Partial pages make little sense and don't work. Ensure the RAM size is a multiple of any possible target's page size. index d01256a..b411d45 100644 --- a/vl.c +++ b/vl.c @@ -2708,11 +2708,13 @@ int main(int argc, char **argv, char **envp) fprintf(stderr, qemu: invalid ram size: %s\n, optarg); exit(1); } - if (value != (uint64_t)(ram_addr_t)value) { fprintf(stderr, qemu: ram size too large\n); exit(1); } +if (value 0x1fff) { +value = (value + 0x1fff) ~0x1fff; +} value = QEMU_ALIGN_UP(value, 8192); -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] trace/simple: Replace asprintf by g_strdup_printf
On Mon, Aug 13, 2012 at 09:51:16PM +0200, Stefan Weil wrote: asprintf is not available for all hosts. g_strdup_printf is more portable and simplifies the code because if does not need error handling. The static variable does not need an explicit assignment to be NULL. Cc: Stefan Hajnoczi stefa...@linux.vnet.ibm.com Signed-off-by: Stefan Weil s...@weilnetz.de --- trace/simple.c | 14 -- 1 files changed, 4 insertions(+), 10 deletions(-) Thanks, applied to the tracing tree: https://github.com/stefanha/qemu/commits/tracing Stefan
Re: [Qemu-devel] [Qemu-ppc][PATCH v7 3/3] spapr: Add support for -vga option
On Tue, 2012-08-14 at 13:04 +0200, Alexander Graf wrote: Thanks, applied to ppc-next without the USB bits. I also get the following warning now: $ ./ppc64-softmmu/qemu-system-ppc64 -nographic -M pseries -kernel /boot/vmlinux -initrd /boot/initrd -enable-kvm -m 1G -append root=/dev/null This vga model is not supported,currently it only supports -vga std [...] Fixing with a follow-up patch. Right, nowadays qemu defaults to cirrus regardless of the arch which is annoying. In fact it even creates a vga card with -nographic, you have to do -vga none to avoid it which is even more annoying for us :-) That's an area where we want less stupid global magic in vl.c and more fine tunes arch control. Cheers, Ben
[Qemu-devel] [PATCH: RFC] Adding BAR0 for e500 PCI controller
PCI Root complex have TYPE-1 configuration header while PCI endpoint have type-0 configuration header. The type-1 configuration header have a BAR (BAR0). In Freescale PCI controller BAR0 is used for mapping pci address space to CCSR address space. This can used for 2 purposes: 1) for MSI interrupt generation 2) Allow CCSR registers access when configured as PCI endpoint, which I am not sure is a use case with QEMU-KVM guest. What I observed is that when guest read the size of BAR0 of host controller configuration header (TYPE1 header) then it always reads it as 0. When looking into the QEMU hw/ppce500_pci.c, I do not find the PCI controller device registering BAR0. I do not find any other controller also doing so may they do not use BAR0. There are two issues when BAR0 is not there (which I can think of): 1) There should be BAR0 emulated for PCI Root comaplex (TYPE1 header) and when reading the size of BAR0, it should give size as per real h/w. 2) Do we need this BAR0 inbound address translation? When BAR0 is of non-zero size then it will be configured for PCI address space to local address(CCSR) space translation on inbound access. The primary use case is for MSI interrupt generation. The device is configured with a address offsets in PCI address space, which will be translated to MSI interrupt generation MPIC registers. Currently I do not understand the MSI interrupt generation mechanism in QEMU and also IIRC we do not use QEMU MSI interrupt mechanism on e500 guest machines. But this BAR0 will be used when using MSI on e500. I can see one more issue, There are ATMUs emulated in hw/ppce500_pci.c, but i do not see these being used for address translation. So far that works because pci address space and local address space are 1:1 mapped. BAR0 inbound translation + ATMU translation will complete the address translation of inbound traffic. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- hw/pci_host.h|1 + hw/ppce500_pci.c |5 + 2 files changed, 6 insertions(+), 0 deletions(-) diff --git a/hw/pci_host.h b/hw/pci_host.h index 359e38f..0ec0691 100644 --- a/hw/pci_host.h +++ b/hw/pci_host.h @@ -36,6 +36,7 @@ struct PCIHostState { MemoryRegion data_mem; MemoryRegion mmcfg; MemoryRegion *address_space; +MemoryRegion bar0; uint32_t config_reg; PCIBus *bus; }; diff --git a/hw/ppce500_pci.c b/hw/ppce500_pci.c index 0f60b24..7e58a62 100644 --- a/hw/ppce500_pci.c +++ b/hw/ppce500_pci.c @@ -306,6 +306,7 @@ static int e500_pcihost_initfn(SysBusDevice *dev) PCIHostState *h; PPCE500PCIState *s; PCIBus *b; +PCIDevice *pdev; int i; MemoryRegion *address_space_mem = get_system_memory(); MemoryRegion *address_space_io = get_system_io(); @@ -336,6 +337,10 @@ static int e500_pcihost_initfn(SysBusDevice *dev) memory_region_add_subregion(s-container, PCIE500_REG_BASE, s-iomem); sysbus_init_mmio(dev, s-container); +memory_region_init_io(h-bar0, pci_host_conf_be_ops, h, + PCIHOST-bar0, 0x100); +pdev = pci_find_device(b, 0, 0); +pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, h-bar0); return 0; } -- 1.7.0.4
Re: [Qemu-devel] [PATCH 1/2] vl: Round argument of -m up to multiple of 8KiB
Avi Kivity a...@redhat.com writes: On 08/14/2012 02:58 PM, Markus Armbruster wrote: Partial pages make little sense and don't work. Ensure the RAM size is a multiple of any possible target's page size. index d01256a..b411d45 100644 --- a/vl.c +++ b/vl.c @@ -2708,11 +2708,13 @@ int main(int argc, char **argv, char **envp) fprintf(stderr, qemu: invalid ram size: %s\n, optarg); exit(1); } - if (value != (uint64_t)(ram_addr_t)value) { fprintf(stderr, qemu: ram size too large\n); exit(1); } +if (value 0x1fff) { +value = (value + 0x1fff) ~0x1fff; +} value = QEMU_ALIGN_UP(value, 8192); I looked for such a macro, but my greps missed. Thanks!
[Qemu-devel] [PATCH 1/6] trace: rename TraceRecordHeader to TraceLogHeader
From: Harsh Prateek Bora ha...@linux.vnet.ibm.com The TraceRecordHeader is really the header for the entire trace log file. It's not per-record header so make this obvious by renaming it. Signed-off-by: Harsh Prateek Bora ha...@linux.vnet.ibm.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- trace/simple.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/trace/simple.c b/trace/simple.c index b700ea3..5d92939 100644 --- a/trace/simple.c +++ b/trace/simple.c @@ -70,7 +70,7 @@ typedef struct { uint64_t header_event_id; /* HEADER_EVENT_ID */ uint64_t header_magic;/* HEADER_MAGIC*/ uint64_t header_version; /* HEADER_VERSION */ -} TraceRecordHeader; +} TraceLogHeader; static void read_from_buffer(unsigned int idx, void *dataptr, size_t size); @@ -295,7 +295,7 @@ void st_set_trace_file_enabled(bool enable) flush_trace_file(true); if (enable) { -static const TraceRecordHeader header = { +static const TraceLogHeader header = { .header_event_id = HEADER_EVENT_ID, .header_magic = HEADER_MAGIC, /* Older log readers will check for version at next location */ -- 1.7.10.4
[Qemu-devel] [PATCH 2/6] trace: remove unnecessary write_to_buffer() typecasting
From: Harsh Prateek Bora ha...@linux.vnet.ibm.com The buffer argument is void* so it is not necessary to cast. Signed-off-by: Harsh Prateek Bora ha...@linux.vnet.ibm.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- trace/simple.c |6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/trace/simple.c b/trace/simple.c index 5d92939..a0e0f05 100644 --- a/trace/simple.c +++ b/trace/simple.c @@ -235,9 +235,9 @@ int trace_record_start(TraceBufferRecord *rec, TraceEventID event, size_t datasi rec-next_tbuf_idx = new_idx % TRACE_BUF_LEN; rec_off = idx; -rec_off = write_to_buffer(rec_off, (uint8_t*)event, sizeof(event)); -rec_off = write_to_buffer(rec_off, (uint8_t*)timestamp_ns, sizeof(timestamp_ns)); -rec_off = write_to_buffer(rec_off, (uint8_t*)rec_len, sizeof(rec_len)); +rec_off = write_to_buffer(rec_off, event, sizeof(event)); +rec_off = write_to_buffer(rec_off, timestamp_ns, sizeof(timestamp_ns)); +rec_off = write_to_buffer(rec_off, rec_len, sizeof(rec_len)); rec-tbuf_idx = idx; rec-rec_off = (idx + sizeof(TraceRecord)) % TRACE_BUF_LEN; -- 1.7.10.4
[Qemu-devel] [PATCH 5/6] trace/simple: Fix compiler warning for 32 bit hosts
From: Stefan Weil s...@weilnetz.de gcc complains when a 32 bit pointer is casted to a 64 bit integer. Cc: Stefan Hajnoczi stefa...@linux.vnet.ibm.com Signed-off-by: Stefan Weil s...@weilnetz.de Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- scripts/tracetool/backend/simple.py |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/tracetool/backend/simple.py b/scripts/tracetool/backend/simple.py index c7e47d6..e4b4a7f 100644 --- a/scripts/tracetool/backend/simple.py +++ b/scripts/tracetool/backend/simple.py @@ -79,7 +79,7 @@ def c(events): ) # pointer var (not string) elif type_.endswith('*'): -out('trace_record_write_u64(rec, (uint64_t)(uint64_t *)%(name)s);', +out('trace_record_write_u64(rec, (uintptr_t)(uint64_t *)%(name)s);', name = name, ) # primitive data type -- 1.7.10.4
[Qemu-devel] [PATCH 3/6] trace: drop unused TraceBufferRecord-next_tbuf_idx field
From: Harsh Prateek Bora ha...@linux.vnet.ibm.com Signed-off-by: Harsh Prateek Bora ha...@linux.vnet.ibm.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- trace/simple.c |2 -- trace/simple.h |1 - 2 files changed, 3 deletions(-) diff --git a/trace/simple.c b/trace/simple.c index a0e0f05..4fed07f 100644 --- a/trace/simple.c +++ b/trace/simple.c @@ -231,8 +231,6 @@ int trace_record_start(TraceBufferRecord *rec, TraceEventID event, size_t datasi } idx = old_idx % TRACE_BUF_LEN; -/* To check later if threshold crossed */ -rec-next_tbuf_idx = new_idx % TRACE_BUF_LEN; rec_off = idx; rec_off = write_to_buffer(rec_off, event, sizeof(event)); diff --git a/trace/simple.h b/trace/simple.h index 7e521c1..2ab96a8 100644 --- a/trace/simple.h +++ b/trace/simple.h @@ -29,7 +29,6 @@ void st_flush_trace_buffer(void); typedef struct { unsigned int tbuf_idx; -unsigned int next_tbuf_idx; unsigned int rec_off; } TraceBufferRecord; -- 1.7.10.4
Re: [Qemu-devel] [RFC v0] HACK: qom: object_property_set: abort on failure
Peter A. G. Crosthwaite peter.crosthwa...@petalogix.com writes: Hi All. A couple of times now ive had debug issues due to silent failure of object_property_set. This function silently fails if the requested property does not exist for the target object. To trap this error I applied the patch below to my tree, but I am assuming that this is not mergeable as is as im guessing there are clients out there that are speculatively trying to set props. Could someone confirm the expected policy here? is setting a non-existant property supposed to be a no-op (as it currently is) or should it fail gracefully? Are you calling set via object_property_set_[str,int,bool,...]? Why don't you add a terminal error to those functions if setting fails and errp is NULL. Regards, Anthony Liguori Whats the best meachinism for creating a no_fail version of object_property_set, for the 90% case where a non-existant property is an error in machine model development? Signed-off-by: Peter A. G. Crosthwaite peter.crosthwa...@petalogix.com --- qom/object.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/qom/object.c b/qom/object.c index a552be2..6e875a8 100644 --- a/qom/object.c +++ b/qom/object.c @@ -687,7 +687,7 @@ void object_property_set(Object *obj, Visitor *v, const char *name, { ObjectProperty *prop = object_property_find(obj, name, errp); if (prop == NULL) { -return; +abort(); } if (!prop-set) { -- 1.7.0.4
Re: [Qemu-devel] [Qemu-ppc] [0/10] pseries updates and cleanups
On 08/14/2012 02:34 PM, Alexander Graf wrote: On 08/08/2012 04:10 AM, David Gibson wrote: Hi Alex, This series contains all my outstanding pseries updates which aren't dependent on getting a generic patch upstream first. I have a number more which are actually more urgent to get into 1.2, but we need to get some word on the generic patches before I can really push those. In the meantime, if you could pull these various updates and cleanups into ppc-next ready for the 1.2 freeze, that would be great. Thanks, applied all to ppc-next. I also went manually through the queue and fixed up the authors of a few patches to reflect the first person in the sob line. Pretty annoying. Alex
[Qemu-devel] [PATCH 6/6] trace/simple: Replace asprintf by g_strdup_printf
From: Stefan Weil s...@weilnetz.de asprintf is not available for all hosts. g_strdup_printf is more portable and simplifies the code because if does not need error handling. The static variable does not need an explicit assignment to be NULL. Cc: Stefan Hajnoczi stefa...@linux.vnet.ibm.com Signed-off-by: Stefan Weil s...@weilnetz.de Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- trace/simple.c | 14 -- 1 file changed, 4 insertions(+), 10 deletions(-) diff --git a/trace/simple.c b/trace/simple.c index 8e175ec..d83681b 100644 --- a/trace/simple.c +++ b/trace/simple.c @@ -55,7 +55,7 @@ static unsigned int trace_idx; static unsigned int writeout_idx; static uint64_t dropped_events; static FILE *trace_fp; -static char *trace_file_name = NULL; +static char *trace_file_name; /* * Trace buffer entry */ typedef struct { @@ -329,18 +329,12 @@ bool st_set_trace_file(const char *file) { st_set_trace_file_enabled(false); -free(trace_file_name); +g_free(trace_file_name); if (!file) { -if (asprintf(trace_file_name, CONFIG_TRACE_FILE, getpid()) 0) { -trace_file_name = NULL; -return false; -} +trace_file_name = g_strdup_printf(CONFIG_TRACE_FILE, getpid()); } else { -if (asprintf(trace_file_name, %s, file) 0) { -trace_file_name = NULL; -return false; -} +trace_file_name = g_strdup_printf(%s, file); } st_set_trace_file_enabled(true); -- 1.7.10.4
Re: [Qemu-devel] [PATCH 15/35] migration: don't rely on any QERR_SOCKET_*
On Tue, 14 Aug 2012 13:35:59 +0200 Markus Armbruster arm...@redhat.com wrote: Juan Quintela quint...@redhat.com writes: Markus Armbruster arm...@redhat.com wrote: Luiz Capitulino lcapitul...@redhat.com writes: Use the in_progress argument for QERR_SOCKET_CONNECT_IN_PROGRESS. The other errors are handled the same by checking if the error is set and then calling migrate_fd_error() if it's. It's also necessary to change inet_connect_opts() not to set QERR_SOCKET_CONNECT_IN_PROGRESS. This error is only used by tcp_start_outgoing_migration() and not changing it along with the usage of in_progress would break migration. Furthermore this commit fixes a bug. Today, there's a spurious error report when migration succeeds: (qemu) migrate tcp:0: migrate: Connection can not be completed immediately (qemu) After this commit no spurious error is reported anymore. Signed-off-by: Luiz Capitulino lcapitul...@redhat.com I'd prefer s-fd = inet_connect(host_port, false, in_progress, errp); if (error_is_set(errp)) { migrate_fd_error(s); return -1; } if (in_progress) { DPRINTF(connect in progress\n); qemu_set_fd_handler2(s-fd, NULL, NULL, tcp_wait_for_connect, s); } else { migrate_fd_connect(s); } return 0; because it separates abnormal and normal code paths more clearly. Matter of taste. The 1st migrate_fd_error() is not needed (it was already wrong). migrate_fd_* functions are supposed to be called only after migrate_fd_connect() has been called. It's been committed. Could you post a patch for it? Juan, are you going to do this or do you want me to do it?