Re: [Qemu-devel] [PATCH 0/7] ATAPI CDROM passthrough v5
On 19.10.2010 at 02:10, Anthony Liguori anth...@codemonkey.ws wrote:
On 10/18/2010 06:29 PM, Alexander Graf wrote:
A user will get a really nasty surprise if they think they can use a flag or rely on QEMU to prevent a VM from doing something nasty with a device. If they have this feeling of security, they're likely to chmod the device to allow unprivileged users to access it. But how a device handles ATAPI commands is totally up to the device. If you issue the wrong sequence, I'm sure there are devices out there that totally hose themselves. Are you absolutely confident that every ATAPI device out there is completely safe against hostile code provided that you simply prevent the FW update commands? I'm certainly not.
Ping?
Who are you pinging?
Mostly Ian. I haven't seen any follow-up on this discussion and would like to know why, and whether there are still plans to upstream this code :).
Alex
[Qemu-devel] [Tracing][v4 PATCH 0/2] QMP Query interfaces for tracing
This patch set introduces three QMP query interfaces for tracing:
* query-trace: to list current contents of trace-buffer
* query-trace-events: to list all available trace-events with their state.
* query-trace-file: to list currently set trace-file with its status.

Changelog:
---
Changes v3 -> v4:
- Add 'query-trace-file' interface to query currently active trace-file.
- Cleanup.
Changes v2 -> v3:
- Change declarations of st_print_trace_to_qlist() and st_print_trace_events_to_qlist() to return QList*
Changes v1 -> v2:
- Add 'timestamp' field for query-trace output.
- Misc cleanups.

--
Prerna Saxena
Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
[Qemu-devel] [Tracing][v4 PATCH 1/2] Introduce QMP interfaces
[PATCH 1/2] Introduce QMP interfaces : - query-trace - query-trace-events - query-trace-file Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com --- monitor.c | 53 --- simpletrace.c | 69 + simpletrace.h |5 3 files changed, 123 insertions(+), 4 deletions(-) diff --git a/monitor.c b/monitor.c index 260cc02..c7e1f53 100644 --- a/monitor.c +++ b/monitor.c @@ -578,6 +578,11 @@ static void do_trace_file(Monitor *mon, const QDict *qdict) help_cmd(mon, trace-file); } } + +static void do_info_trace_file_to_qmp(Monitor *mon, QObject **ret_data) +{ +*ret_data = st_print_file_to_qobject(); +} #endif static void user_monitor_complete(void *opaque, QObject *ret_data) @@ -945,15 +950,27 @@ static void do_info_cpu_stats(Monitor *mon) #endif #if defined(CONFIG_SIMPLE_TRACE) -static void do_info_trace(Monitor *mon) +static void do_info_trace_print(Monitor *mon, const QObject *data) { st_print_trace((FILE *)mon, monitor_fprintf); } -static void do_info_trace_events(Monitor *mon) +static void do_info_trace(Monitor *mon, QObject **ret_data) +{ +QList *trace_event_list = st_print_trace_to_qlist(); +*ret_data = QOBJECT(trace_event_list); +} + +static void do_info_trace_events_print(Monitor *mon, const QObject *data) { st_print_trace_events((FILE *)mon, monitor_fprintf); } + +static void do_info_trace_events(Monitor *mon, QObject **ret_data) +{ +QList *trace_event_list = st_print_trace_events_to_qlist(); +*ret_data = QOBJECT(trace_event_list); +} #endif /** @@ -2610,14 +2627,16 @@ static const mon_cmd_t info_cmds[] = { .args_type = , .params = , .help = show current contents of trace buffer, -.mhandler.info = do_info_trace, +.user_print = do_info_trace_print, +.mhandler.info_new = do_info_trace, }, { .name = trace-events, .args_type = , .params = , .help = show available trace-events their state, -.mhandler.info = do_info_trace_events, +.user_print = do_info_trace_events_print, +.mhandler.info_new = do_info_trace_events, }, #endif { @@ -2752,6 +2771,32 @@ static const mon_cmd_t 
qmp_query_cmds[] = { .mhandler.info_async = do_info_balloon, .flags = MONITOR_CMD_ASYNC, }, +#if defined(CONFIG_SIMPLE_TRACE) +{ +.name = trace, +.args_type = , +.params = , +.help = show current contents of trace buffer, +.user_print = do_info_trace_print, +.mhandler.info_new = do_info_trace, +}, +{ +.name = trace-events, +.args_type = , +.params = , +.help = show available trace-events their state, +.user_print = do_info_trace_events_print, +.mhandler.info_new = do_info_trace_events, +}, +{ +.name = trace-file, +.args_type = , +.params = , +.help = show currently active trace output file and its status, +.user_print = monitor_user_noop, +.mhandler.info_new = do_info_trace_file_to_qmp, +}, +#endif { /* NULL */ }, }; diff --git a/simpletrace.c b/simpletrace.c index deb1e07..d24d6b0 100644 --- a/simpletrace.c +++ b/simpletrace.c @@ -220,6 +220,43 @@ void st_print_trace(FILE *stream, int (*stream_printf)(FILE *stream, const char } } +/** + * Add the current contents of trace-buffer as a QList. + * + */ +QList* st_print_trace_to_qlist(void) +{ +QObject *data; +QList *tlist; +unsigned int i; + +tlist = qlist_new(); + +for (i = 0; i trace_idx; i++) { + data = qobject_from_jsonf({ + 'timestamp': % PRId64 , + 'event': % PRId64 , + 'arg1': % PRId64 , + 'arg2': % PRId64 , + 'arg3': % PRId64 , + 'arg4': % PRId64 , + 'arg5': % PRId64 , + 'arg6': % PRId64 +}, +trace_buf[i].timestamp_ns, +trace_buf[i].event, +trace_buf[i].x1, +trace_buf[i].x2, +trace_buf[i].x3, +trace_buf[i].x4, +trace_buf[i].x5, +trace_buf[i].x6); + qlist_append_obj(tlist, data); +} + +return tlist; +} + void st_print_trace_events(FILE *stream, int (*stream_printf)(FILE *stream, const char *fmt, ...)) { unsigned int i; @@ -230,6 +267,38 @@
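As a rough illustration of the record-to-JSON step in st_print_trace_to_qlist(), here is a minimal standalone sketch. It does not use QEMU's qobject_from_jsonf(); TraceRecordSketch and trace_record_to_json() are hypothetical names introduced only for this example, and the JSON keys simply mirror the ones in the patch. It formats one record with snprintf() and the PRId64 macro, the same format macro the patch relies on.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one simpletrace record (not QEMU's struct). */
typedef struct {
    int64_t event;
    int64_t timestamp_ns;
    int64_t x[6];           /* arg1 .. arg6 */
} TraceRecordSketch;

/*
 * Format one record as JSON text into buf.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int trace_record_to_json(const TraceRecordSketch *r,
                                char *buf, size_t len)
{
    int n;

    assert(r != NULL && buf != NULL);
    n = snprintf(buf, len,
                 "{ \"timestamp\": %" PRId64 ", \"event\": %" PRId64
                 ", \"arg1\": %" PRId64 ", \"arg2\": %" PRId64
                 ", \"arg3\": %" PRId64 ", \"arg4\": %" PRId64
                 ", \"arg5\": %" PRId64 ", \"arg6\": %" PRId64 " }",
                 r->timestamp_ns, r->event,
                 r->x[0], r->x[1], r->x[2], r->x[3], r->x[4], r->x[5]);
    if (n < 0 || (size_t)n >= len) {
        return -1;          /* encoding error or truncated output */
    }
    return n;
}
```

Using PRId64 rather than a hard-coded "%ld" or "%lld" keeps the format string correct on both 32-bit and 64-bit hosts, which is presumably why the patch does the same.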
[Qemu-devel] [Tracing][v4 PATCH 2/2] Add documentation for QMP interfaces
[PATCH 2/2] Add documentation for QMP commands: - query-trace - query-trace-events - query-trace-file. Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com --- qmp-commands.hx | 94 +++ 1 files changed, 94 insertions(+), 0 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index 793cf1c..bc79b55 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -1539,3 +1539,97 @@ Example: EQMP +SQMP +query-trace +- + +Show contents of trace buffer. + +Returns a set of json-objects containing the following data: + +- event: Event ID for the trace-event(json-int) +- timestamp: trace timestamp (json-int) +- arg1 .. arg6: Arguments logged by the trace-event (json-int) + +Example: + +- { execute: query-trace } +- { + return:{ + event: 22, + timestamp: 129456235912365, + arg1: 886 + arg2: 80, + arg3: 0, + arg4: 0, + arg5: 0, + arg6: 0, + }, + { + event: 22, + timestamp: 129456235973407, + arg1: 886, + arg2: 80, + arg3: 0, + arg4: 0, + arg5: 0, + arg6: 0 + }, + ... + } + +EQMP + +SQMP +query-trace-events +-- + +Show all available trace-events their state. + +Returns a set of json-objects containing the following data: + +- name: Name of Trace-event (json-string) +- event-id: Event ID of Trace-event (json-int) +- state: State of trace-event [ '0': inactive; '1':active ] (json-int) + +Example: + +- { execute: query-trace-events } +- { + return:{ + name: qemu_malloc, + event-id: 0 + state: 0, + }, + { + name: qemu_realloc, + event-id: 1, + state: 0 + }, + ... + } + +EQMP + +SQMP +query-trace-file + + +Display currently set trace file name and its status. + +Returns a set of json-objects containing the following data: + +- trace-file: Name of Trace-file (json-string) +- status: State of trace-event [ '0': disabled; '1':enabled ] (json-int) + +Example: + +- { execute: query-trace-file } +- { + return:{ + trace-file: trace-26609, + status: 1 + } + } + +EQMP -- 1.7.2.2 -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
Re: [Qemu-devel] Re: [PATCH 1/2] pci: Automatically patch PCI vendor id and device id in PCI ROM
On 10/18/10 21:36, Stefan Weil wrote: There is already some kind of error feedback: the rom will not work. For etherboot roms, booting from network won't work. VGA works, after hacking the vgabios to not have the PCI ID hardcoded elsewhere. Nevertheless /me gets the feeling that we better should not take that route. vgabios needs special patching to work. etherboot does not work as-is. Even if we make it work now it always will be fragile. The next rom update might break it again. The ID automagically adapting doesn't happen on real hardware ... cheers, Gerd
Re: [Qemu-devel] [PATCH 1/2] pci: Automatically patch PCI vendor id and device id in PCI ROM
On Mon, Oct 18, 2010 at 09:11:55PM +0200, Stefan Weil wrote: QEMU must only make sure that patching of the supported roms with supported devices work. I think that's what Anthony was saying too - make this depend on a qdev property and set it only in eepro100 for now. -- MST
[Qemu-devel] [PATCH v5 04/14] pci/bridge: fix pci_bridge_reset()
The default values of the base/limit registers aren't specified in the spec, so pci_bridge_reset() shouldn't touch them. Instead, introduce two functions that reset those registers the way typical implementations do: zero the base/limit registers, or disable forwarding. They will be used later.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
Changes v4 -> v5:
- drop the lines in pci_bridge_reset()
- introduced two functions to reset base/limit registers.
---
 hw/pci_bridge.c |   57 +++---
 hw/pci_bridge.h |    2 +
 2 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/hw/pci_bridge.c b/hw/pci_bridge.c
index 638e3b3..de75e6a 100644
--- a/hw/pci_bridge.c
+++ b/hw/pci_bridge.c
@@ -151,6 +151,46 @@ void pci_bridge_write_config(PCIDevice *d,
     }
 }
 
+void pci_bridge_reset_zero_base_limit(PCIDevice *dev)
+{
+    uint8_t *conf = dev->config;
+
+    pci_byte_test_and_clear_mask(conf + PCI_IO_BASE,
+                                 PCI_IO_RANGE_MASK & 0xff);
+    pci_byte_test_and_clear_mask(conf + PCI_IO_LIMIT,
+                                 PCI_IO_RANGE_MASK & 0xff);
+    pci_word_test_and_clear_mask(conf + PCI_MEMORY_BASE,
+                                 PCI_MEMORY_RANGE_MASK & 0xffff);
+    pci_word_test_and_clear_mask(conf + PCI_MEMORY_LIMIT,
+                                 PCI_MEMORY_RANGE_MASK & 0xffff);
+    pci_word_test_and_clear_mask(conf + PCI_PREF_MEMORY_BASE,
+                                 PCI_PREF_RANGE_MASK & 0xffff);
+    pci_word_test_and_clear_mask(conf + PCI_PREF_MEMORY_LIMIT,
+                                 PCI_PREF_RANGE_MASK & 0xffff);
+    pci_set_word(conf + PCI_PREF_BASE_UPPER32, 0);
+    pci_set_word(conf + PCI_PREF_LIMIT_UPPER32, 0);
+}
+
+void pci_bridge_reset_disable_base_limit(PCIDevice *dev)
+{
+    uint8_t *conf = dev->config;
+
+    pci_byte_test_and_set_mask(conf + PCI_IO_BASE,
+                               PCI_IO_RANGE_MASK & 0xff);
+    pci_byte_test_and_clear_mask(conf + PCI_IO_LIMIT,
+                                 PCI_IO_RANGE_MASK & 0xff);
+    pci_word_test_and_set_mask(conf + PCI_MEMORY_BASE,
+                               PCI_MEMORY_RANGE_MASK & 0xffff);
+    pci_word_test_and_clear_mask(conf + PCI_MEMORY_LIMIT,
+                                 PCI_MEMORY_RANGE_MASK & 0xffff);
+    pci_word_test_and_set_mask(conf + PCI_PREF_MEMORY_BASE,
+                               PCI_PREF_RANGE_MASK & 0xffff);
+    pci_word_test_and_clear_mask(conf + PCI_PREF_MEMORY_LIMIT,
+                                 PCI_PREF_RANGE_MASK & 0xffff);
+    pci_set_word(conf + PCI_PREF_BASE_UPPER32, 0);
+    pci_set_word(conf + PCI_PREF_LIMIT_UPPER32, 0);
+}
+
 /* reset bridge specific configuration registers */
 void pci_bridge_reset_reg(PCIDevice *dev)
 {
@@ -161,14 +201,15 @@ void pci_bridge_reset_reg(PCIDevice *dev)
     conf[PCI_SUBORDINATE_BUS] = 0;
     conf[PCI_SEC_LATENCY_TIMER] = 0;
 
-    conf[PCI_IO_BASE] = 0;
-    conf[PCI_IO_LIMIT] = 0;
-    pci_set_word(conf + PCI_MEMORY_BASE, 0);
-    pci_set_word(conf + PCI_MEMORY_LIMIT, 0);
-    pci_set_word(conf + PCI_PREF_MEMORY_BASE, 0);
-    pci_set_word(conf + PCI_PREF_MEMORY_LIMIT, 0);
-    pci_set_word(conf + PCI_PREF_BASE_UPPER32, 0);
-    pci_set_word(conf + PCI_PREF_LIMIT_UPPER32, 0);
+    /*
+     * the default values for base/limit registers aren't specified
+     * in the PCI-to-PCI-bridge spec. So we don't touch them here.
+     * Each implementation can override it.
+     * typical implementation does
+     * - zero registers: pci_bridge_reset_zero_base_limit()
+     * or
+     * - disable forwarding: pci_bridge_reset_disable_base_limit()
+     */
     pci_set_word(conf + PCI_BRIDGE_CONTROL, 0);
 }
 
diff --git a/hw/pci_bridge.h b/hw/pci_bridge.h
index f6fade0..2359684 100644
--- a/hw/pci_bridge.h
+++ b/hw/pci_bridge.h
@@ -39,6 +39,8 @@ pcibus_t pci_bridge_get_limit(const PCIDevice *bridge, uint8_t type);
 
 void pci_bridge_write_config(PCIDevice *d,
                              uint32_t address, uint32_t val, int len);
+void pci_bridge_reset_zero_base_limit(PCIDevice *dev);
+void pci_bridge_reset_disable_base_limit(PCIDevice *dev);
 void pci_bridge_reset_reg(PCIDevice *dev);
 void pci_bridge_reset(DeviceState *qdev);
 
-- 
1.7.1.1
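To make the difference between the two reset styles concrete, here is a small standalone sketch of how the 8-bit I/O base/limit registers decode into a forwarding window. It is not QEMU code: the function names and IO_RANGE_MASK_SKETCH are made up for the example, and it assumes the usual PCI-to-PCI bridge layout in which the upper nibble of each register holds address bits 15:12 and the limit's low 12 bits read as 0xfff. Setting all range bits in the base while clearing them in the limit yields base > limit, which is an empty window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: corresponds to PCI_IO_RANGE_MASK & 0xff in the patch. */
#define IO_RANGE_MASK_SKETCH 0xf0u

/* First I/O address covered by the window (4 KiB granularity). */
static uint32_t io_window_base(uint8_t base_reg)
{
    return (uint32_t)(base_reg & IO_RANGE_MASK_SKETCH) << 8;
}

/* Last I/O address covered by the window. */
static uint32_t io_window_limit(uint8_t limit_reg)
{
    return ((uint32_t)(limit_reg & IO_RANGE_MASK_SKETCH) << 8) | 0xfffu;
}

/* The window forwards nothing once base is above limit. */
static bool io_window_forwards(uint8_t base_reg, uint8_t limit_reg)
{
    return io_window_base(base_reg) <= io_window_limit(limit_reg);
}
```

With both registers zeroed the window decodes as 0x0000..0x0fff, so whether anything is actually forwarded then depends on the I/O enable bit in the command register; with the "disable" pattern the window is empty regardless.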
[Qemu-devel] [PATCH v5 13/14] pcie/hotplug: introduce pushing attention button command
glue pcie_push_attention_button command. Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- hw/pcie_port.c | 82 +++ qemu-monitor.hx | 14 + sysemu.h|4 +++ 3 files changed, 100 insertions(+), 0 deletions(-) diff --git a/hw/pcie_port.c b/hw/pcie_port.c index 117de61..f43a1c7 100644 --- a/hw/pcie_port.c +++ b/hw/pcie_port.c @@ -18,6 +18,10 @@ * with this program; if not, see http://www.gnu.org/licenses/. */ +#include qemu-objects.h +#include sysemu.h +#include monitor.h +#include pcie.h #include pcie_port.h void pcie_port_init_reg(PCIDevice *d) @@ -114,3 +118,81 @@ void pcie_chassis_del_slot(PCIESlot *s) { QLIST_REMOVE(s, next); } + +/** + * glue for qemu monitor + */ + +/* Parse [chassis.]slot, return -1 on error */ +static int pcie_parse_slot_addr(const char* slot_addr, +uint8_t *chassisp, uint16_t *slotp) +{ +const char *p; +char *e; +unsigned long val; +unsigned long chassis = 0; +unsigned long slot; + +p = slot_addr; +val = strtoul(p, e, 0); +if (e == p) { +return -1; +} +if (*e == '.') { +chassis = val; +p = e + 1; +val = strtoul(p, e, 0); +if (e == p) { +return -1; +} +} +slot = val; + +if (*e) { +return -1; +} + +if (chassis 0xff || slot 0x) { +return -1; +} + +*chassisp = chassis; +*slotp = slot; +return 0; +} + +void pcie_attention_button_push_print(Monitor *mon, const QObject *data) +{ +QDict *qdict; + +assert(qobject_type(data) == QTYPE_QDICT); +qdict = qobject_to_qdict(data); + +monitor_printf(mon, OK chassis %d, slot %d\n, + (int) qdict_get_int(qdict, chassis), + (int) qdict_get_int(qdict, slot)); +} + +int pcie_attention_button_push(Monitor *mon, const QDict *qdict, + QObject **ret_data) +{ +const char* pcie_slot = qdict_get_str(qdict, pcie_slot); +uint8_t chassis; +uint16_t slot; +PCIESlot *s; + +if (pcie_parse_slot_addr(pcie_slot, chassis, slot) 0) { +monitor_printf(mon, invalid pcie slot address %s\n, pcie_slot); +return -1; +} +s = pcie_chassis_find_slot(chassis, slot); +if (!s) { +monitor_printf(mon, slot is not found. 
%s\n, pcie_slot); +return -1; +} +pcie_cap_slot_push_attention_button(s-port.br.dev); +*ret_data = qobject_from_jsonf({ 'chassis': %d, 'slot': %d}, + chassis, slot); +assert(*ret_data); +return 0; +} diff --git a/qemu-monitor.hx b/qemu-monitor.hx index 2af3de6..965c754 100644 --- a/qemu-monitor.hx +++ b/qemu-monitor.hx @@ -1154,6 +1154,20 @@ Hot remove PCI device. ETEXI { +.name = pcie_push_attention_button, +.args_type = pcie_slot:s, +.params = [chassis.]slot, +.help = push pci express attention button, +.user_print = pcie_attention_button_push_print, +.mhandler.cmd_new = pcie_attention_button_push, +}, + +STEXI +...@item pcie_abp +Push PCI express attention button +ETEXI + +{ .name = host_net_add, .args_type = device:s,opts:s?, .params = tap|user|socket|vde|dump [options], diff --git a/sysemu.h b/sysemu.h index 9c988bb..cca411d 100644 --- a/sysemu.h +++ b/sysemu.h @@ -150,6 +150,10 @@ extern unsigned int nb_prom_envs; void pci_device_hot_add(Monitor *mon, const QDict *qdict); void drive_hot_add(Monitor *mon, const QDict *qdict); void do_pci_device_hot_remove(Monitor *mon, const QDict *qdict); +/* pcie hotplug */ +void pcie_attention_button_push_print(Monitor *mon, const QObject *data); +int pcie_attention_button_push(Monitor *mon, const QDict *qdict, + QObject **ret_data); /* serial ports */ -- 1.7.1.1
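For anyone reading the flattened diff above, the "[chassis.]slot" parsing can be sketched as a standalone function. This is an illustration rather than the patch's exact code: parse_slot_addr_sketch() is a hypothetical name, and because the upper bounds in the archived text are garbled, the limits used here (whatever fits in uint8_t and uint16_t) are an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse "[chassis.]slot" (any base strtoul() accepts with base 0).
 * Returns 0 on success and -1 on malformed or out-of-range input.
 */
static int parse_slot_addr_sketch(const char *slot_addr,
                                  uint8_t *chassisp, uint16_t *slotp)
{
    const char *p = slot_addr;
    char *e;
    unsigned long val;
    unsigned long chassis = 0;

    assert(slot_addr != NULL && chassisp != NULL && slotp != NULL);

    val = strtoul(p, &e, 0);
    if (e == p) {
        return -1;              /* no digits at all */
    }
    if (*e == '.') {
        chassis = val;          /* first number was the chassis */
        p = e + 1;
        val = strtoul(p, &e, 0);
        if (e == p) {
            return -1;          /* "chassis." with nothing after it */
        }
    }
    if (*e != '\0') {
        return -1;              /* trailing garbage */
    }
    /* Assumed limits: the destination types' ranges. */
    if (chassis > UINT8_MAX || val > UINT16_MAX) {
        return -1;
    }

    *chassisp = (uint8_t)chassis;
    *slotp = (uint16_t)val;
    return 0;
}
```

Note that the end pointer has to be passed as &e; the archive dropped the ampersands from the patch text.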
[Qemu-devel] [PATCH v5 02/14] pci: introduce helper function to handle msi-x and msi.
This patch implements helper functions to handle msi-x and msi uniformly. They will be used later.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 hw/pci.c |   19 +++
 hw/pci.h |    3 +++
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index e3462a9..300079f 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -25,6 +25,8 @@
 #include "pci.h"
 #include "pci_bridge.h"
 #include "pci_internals.h"
+#include "msix.h"
+#include "msi.h"
 #include "monitor.h"
 #include "net.h"
 #include "sysemu.h"
@@ -1034,6 +1036,23 @@ static void pci_set_irq(void *opaque, int irq_num, int level)
     pci_change_irq_level(pci_dev, irq_num, change);
 }
 
+bool pci_msi_enabled(PCIDevice *dev)
+{
+    return msix_enabled(dev) || msi_enabled(dev);
+}
+
+void pci_msi_notify(PCIDevice *dev, unsigned int vector)
+{
+    if (msix_enabled(dev)) {
+        msix_notify(dev, vector);
+    } else if (msi_enabled(dev)) {
+        msi_notify(dev, vector);
+    } else {
+        /* MSI/MSI-X must be enabled */
+        abort();
+    }
+}
+
 /***/
 /* monitor info on PCI */
 
diff --git a/hw/pci.h b/hw/pci.h
index 752e652..3072a5f 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -239,6 +239,9 @@ void do_pci_info_print(Monitor *mon, const QObject *data);
 void do_pci_info(Monitor *mon, QObject **ret_data);
 void pci_bridge_update_mappings(PCIBus *b);
 
+bool pci_msi_enabled(PCIDevice *dev);
+void pci_msi_notify(PCIDevice *dev, unsigned int vector);
+
 static inline void
 pci_set_byte(uint8_t *config, uint8_t val)
 {
-- 
1.7.1.1
[Qemu-devel] [PATCH v5 14/14] pcie/aer: glue aer error injection into qemu monitor
introduce pcie_aer_inject_error command. Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- Changes v3 - v4: - s/PCIE_AER/PCIEAER/g for structure names. - compilation adjustment. Changes v2 - v3: - compilation adjustment. --- hw/pcie_aer.c | 84 +++ qemu-monitor.hx | 22 ++ sysemu.h|5 +++ 3 files changed, 111 insertions(+), 0 deletions(-) diff --git a/hw/pcie_aer.c b/hw/pcie_aer.c index 1b023b0..97d3e2e 100644 --- a/hw/pcie_aer.c +++ b/hw/pcie_aer.c @@ -19,6 +19,8 @@ */ #include sysemu.h +#include qemu-objects.h +#include monitor.h #include pci_bridge.h #include pcie.h #include msix.h @@ -783,3 +785,85 @@ const VMStateDescription vmstate_pcie_aer_log = { } }; +void pcie_aer_inject_error_print(Monitor *mon, const QObject *data) +{ +QDict *qdict; +int devfn; +assert(qobject_type(data) == QTYPE_QDICT); +qdict = qobject_to_qdict(data); + +devfn = (int)qdict_get_int(qdict, devfn); +monitor_printf(mon, OK domain: %x, bus: %x devfn: %x.%x\n, + (int) qdict_get_int(qdict, domain), + (int) qdict_get_int(qdict, bus), + PCI_SLOT(devfn), PCI_FUNC(devfn)); +} + +int do_pcie_aer_inejct_error(Monitor *mon, + const QDict *qdict, QObject **ret_data) +{ +const char *pci_addr = qdict_get_str(qdict, pci_addr); +int dom; +int bus; +unsigned int slot; +unsigned int func; +PCIDevice *dev; +PCIEAERErr err; + +/* Ideally qdev device path should be used. + * However at the moment there is no reliable way to determine + * wheher a given qdev is pci device or not. + * so pci_addr is used. + */ +if (pci_parse_devaddr(pci_addr, dom, bus, slot, func)) { +monitor_printf(mon, invalid pci address %s\n, pci_addr); +return -1; +} +dev = pci_find_device(pci_find_root_bus(dom), bus, slot, func); +if (!dev) { +monitor_printf(mon, device is not found. 0x%x:0x%x.0x%x\n, + bus, slot, func); +return -1; +} +if (!pci_is_express(dev)) { +monitor_printf(mon, the device doesn't support pci express. 
+ 0x%x:0x%x.0x%x\n, + bus, slot, func); +return -1; +} + +err.status = qdict_get_int(qdict, error_status); +err.source_id = (pci_bus_num(dev-bus) 8) | dev-devfn; + +err.flags = 0; +if (qdict_get_int(qdict, is_correctable)) { +err.flags |= PCIE_AER_ERR_IS_CORRECTABLE; +} +if (qdict_get_int(qdict, advisory_non_fatal)) { +err.flags |= PCIE_AER_ERR_MAYBE_ADVISORY; +} +if (qdict_haskey(qdict, tlph0)) { +err.flags |= PCIE_AER_ERR_HEADER_VALID; +} +if (qdict_haskey(qdict, hpfx0)) { +err.flags |= PCIE_AER_ERR_TLP_PRESENT; +} + +err.header[0] = qdict_get_try_int(qdict, tlph0, 0); +err.header[1] = qdict_get_try_int(qdict, tlph1, 0); +err.header[2] = qdict_get_try_int(qdict, tlph2, 0); +err.header[3] = qdict_get_try_int(qdict, tlph3, 0); + +err.prefix[0] = qdict_get_try_int(qdict, hpfx0, 0); +err.prefix[1] = qdict_get_try_int(qdict, hpfx1, 0); +err.prefix[2] = qdict_get_try_int(qdict, hpfx2, 0); +err.prefix[3] = qdict_get_try_int(qdict, hpfx3, 0); + +pcie_aer_inject_error(dev, err); +*ret_data = qobject_from_jsonf({ 'domain': %d, 'bus': %d, 'devfn': %d }, + pci_find_domain(dev-bus), + pci_bus_num(dev-bus), dev-devfn); +assert(*ret_data); + +return 0; +} diff --git a/qemu-monitor.hx b/qemu-monitor.hx index 965c754..ccb3d0e 100644 --- a/qemu-monitor.hx +++ b/qemu-monitor.hx @@ -1168,6 +1168,28 @@ Push PCI express attention button ETEXI { +.name = pcie_aer_inject_error, +.args_type = advisory_non_fatal:-a,is_correctable:-c, + pci_addr:s,error_status:i, + tlph0:i?,tlph1:i?,tlph2:i?,tlph3:i?, + hpfx0:i?,hpfx1:i?,hpfx2:i?,hpfx3:i?, +.params = [-a] [-c] [[domain:]bus:]slot.func + error status:32bit + [tlp header:(32bit x 4)] + [tlp header prefix:(32bit x 4)], +.help = inject pcie aer error + (use -a for advisory non fatal error) + (use -c for correctrable error), +.user_print = pcie_aer_inject_error_print, +.mhandler.cmd_new = do_pcie_aer_inejct_error, +}, + +STEXI +...@item pcie_abp +Push PCI express attention button +ETEXI + +{ .name = host_net_add, .args_type = device:s,opts:s?, 
.params = tap|user|socket|vde|dump [options], diff --git a/sysemu.h b/sysemu.h index cca411d..2f7157c 100644 --- a/sysemu.h +++ b/sysemu.h @@ -155,6 +155,11 @@ void pcie_attention_button_push_print(Monitor *mon, const QObject
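The source_id assignment in the patch packs the bus number and devfn into a 16-bit requester ID, and the print handler splits devfn back into slot and function. A minimal standalone sketch of that encoding follows; the helper names are hypothetical and only mirror what PCI_SLOT()/PCI_FUNC() and the "(bus << 8) | devfn" expression do, assuming the conventional 5-bit device / 3-bit function split.

```c
#include <assert.h>
#include <stdint.h>

/* devfn: device (slot) number in bits 7:3, function number in bits 2:0. */
static uint8_t make_devfn(unsigned int slot, unsigned int func)
{
    assert(slot < 32 && func < 8);
    return (uint8_t)((slot << 3) | func);
}

static unsigned int devfn_slot(uint8_t devfn)
{
    return (devfn >> 3) & 0x1f;
}

static unsigned int devfn_func(uint8_t devfn)
{
    return devfn & 0x07;
}

/* Requester/source ID: bus number in bits 15:8, devfn in bits 7:0. */
static uint16_t make_source_id(uint8_t bus, uint8_t devfn)
{
    return (uint16_t)(((unsigned int)bus << 8) | devfn);
}
```

For example, device 2 function 3 on bus 1 gives devfn 0x13 and source ID 0x0113.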
[Qemu-devel] [PATCH v5 01/14] pci: introduce helper functions to test-and-{clear, set} mask in configuration space
This patch introduces helper functions to test-and-{clear, set} mask in configuration space: pci_{byte, word, long, quad}_test_and_{clear, set}_mask(). They will be used later.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 hw/pci.h |   70 ++
 1 files changed, 70 insertions(+), 0 deletions(-)

diff --git a/hw/pci.h b/hw/pci.h
index d8b399f..752e652 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -323,6 +323,76 @@ pci_config_set_interrupt_pin(uint8_t *pci_config, uint8_t val)
     pci_set_byte(&pci_config[PCI_INTERRUPT_PIN], val);
 }
 
+/*
+ * helper functions to do bit mask operation on configuration space.
+ * Just to set bit, use test-and-set and discard returned value.
+ * Just to clear bit, use test-and-clear and discard returned value.
+ * NOTE: They aren't atomic.
+ */
+static inline uint8_t
+pci_byte_test_and_clear_mask(uint8_t *config, uint8_t mask)
+{
+    uint8_t val = pci_get_byte(config);
+    pci_set_byte(config, val & ~mask);
+    return val & mask;
+}
+
+static inline uint8_t
+pci_byte_test_and_set_mask(uint8_t *config, uint8_t mask)
+{
+    uint8_t val = pci_get_byte(config);
+    pci_set_byte(config, val | mask);
+    return val & mask;
+}
+
+static inline uint16_t
+pci_word_test_and_clear_mask(uint8_t *config, uint16_t mask)
+{
+    uint16_t val = pci_get_word(config);
+    pci_set_word(config, val & ~mask);
+    return val & mask;
+}
+
+static inline uint16_t
+pci_word_test_and_set_mask(uint8_t *config, uint16_t mask)
+{
+    uint16_t val = pci_get_word(config);
+    pci_set_word(config, val | mask);
+    return val & mask;
+}
+
+static inline uint32_t
+pci_long_test_and_clear_mask(uint8_t *config, uint32_t mask)
+{
+    uint32_t val = pci_get_long(config);
+    pci_set_long(config, val & ~mask);
+    return val & mask;
+}
+
+static inline uint32_t
+pci_long_test_and_set_mask(uint8_t *config, uint32_t mask)
+{
+    uint32_t val = pci_get_long(config);
+    pci_set_long(config, val | mask);
+    return val & mask;
+}
+
+static inline uint64_t
+pci_quad_test_and_clear_mask(uint8_t *config, uint64_t mask)
+{
+    uint64_t val = pci_get_quad(config);
+    pci_set_quad(config, val & ~mask);
+    return val & mask;
+}
+
+static inline uint64_t
+pci_quad_test_and_set_mask(uint8_t *config, uint64_t mask)
+{
+    uint64_t val = pci_get_quad(config);
+    pci_set_quad(config, val | mask);
+    return val & mask;
+}
+
 typedef int (*pci_qdev_initfn)(PCIDevice *dev);
 typedef struct {
     DeviceInfo qdev;
-- 
1.7.1.1
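The semantics of these helpers can be tried outside QEMU with a small standalone sketch. The cfg_* names below are hypothetical stand-ins for pci_get_word()/pci_set_word() and the word-sized helpers, assuming a little-endian byte array as configuration space; the return value is the masked bits as they were before the update.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in accessors: config space is a little-endian byte array. */
static uint16_t cfg_get_word(const uint8_t *config)
{
    return (uint16_t)(config[0] | (config[1] << 8));
}

static void cfg_set_word(uint8_t *config, uint16_t val)
{
    config[0] = (uint8_t)(val & 0xff);
    config[1] = (uint8_t)(val >> 8);
}

/* Clear the bits in mask; return the masked bits as they were before. */
static uint16_t cfg_word_test_and_clear_mask(uint8_t *config, uint16_t mask)
{
    uint16_t val = cfg_get_word(config);

    cfg_set_word(config, (uint16_t)(val & ~mask));
    return (uint16_t)(val & mask);
}

/* Set the bits in mask; return the masked bits as they were before. */
static uint16_t cfg_word_test_and_set_mask(uint8_t *config, uint16_t mask)
{
    uint16_t val = cfg_get_word(config);

    cfg_set_word(config, (uint16_t)(val | mask));
    return (uint16_t)(val & mask);
}
```

A caller that only wants to set or clear bits can ignore the return value, as the comment in the patch suggests. Like the originals, these are plain read-modify-write sequences and not atomic.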
[Qemu-devel] [PATCH v5 03/14] pci: use pci_word_test_and_clear_mask() in pci_device_reset()
Use pci_word_test_and_clear_mask() in pci_device_reset() where appropriate.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
Changes v4 -> v5:
- use pci_word_test_and_clear_mask()
---
 hw/pci.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 300079f..409e2c0 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -139,9 +139,8 @@ static void pci_device_reset(PCIDevice *dev)
     dev->irq_state = 0;
     pci_update_irq_status(dev);
     /* Clear all writeable bits */
-    pci_set_word(dev->config + PCI_COMMAND,
-                 pci_get_word(dev->config + PCI_COMMAND) &
-                 ~pci_get_word(dev->wmask + PCI_COMMAND));
+    pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
+                                 pci_get_word(dev->wmask + PCI_COMMAND));
     dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
     dev->config[PCI_INTERRUPT_LINE] = 0x0;
     for (r = 0; r < PCI_NUM_REGIONS; ++r) {
-- 
1.7.1.1
[Qemu-devel] [PATCH v5 00/14] pcie port switch emulators
Here is v5 of the pcie patch series. I hope I addressed the blockers.

One note on the uncorrectable error status register in pcie_aer_write_config(): the register is RW1CS, so making it writable and using test-and-clear doesn't work.

New patches: 1, 2
Updated patches (other than trivial changes): 4, 7, 8

BTW, as 0.13 is released, is there any chance to sync the pci branch with upstream by requesting a pull?

Patch description:
This patch series implements pcie port switch emulators, which are a basic part of pcie/q35 support. This is for the mst/pci tree.

Changes v4 -> v5:
- introduced pci_xxx_test_and_clear/set_mask
- eliminated xxx_notify(msi_trigger, int_level)
- eliminated FLR bits. FLR will be addressed at the next phase.

Changes v3 -> v4:
- introduced new pci config helper functions (clear/set bit).
- various clean up and some bug fixes.
- dropped pci_shift_xxx().
- dropped function pointer in pcie_aer.h
- dropped pci_exp_cap(), pcie_aer_cap().
- file rename (pcie_{root, upstream, downstream} => ioh3420, x3130).

Changes v2 -> v3:
- msi: improved comment and simplified shift/ffs dance
- pci w1c config register framework
- split pcie.[ch] into pcie_regs.h, pcie.[ch] and pcie_aer.[ch]
- pcie, aer: many changes by following reviews.

Changes v1 -> v2:
- update msi
- dropped already pushed out patches.
- added msix patches.

Isaku Yamahata (14):
  pci: introduce helper functions to test-and-{clear, set} mask in configuration space
  pci: introduce helper function to handle msi-x and msi.
  pci: use pci_word_test_and_clear_mask() in pci_device_reset()
  pci/bridge: fix pci_bridge_reset()
  msi: implements msi
  pcie: add pcie constants to pcie_regs.h
  pcie: helper functions for pcie capability and extended capability
  pcie/aer: helper functions for pcie aer capability
  pcie port: define struct PCIEPort/PCIESlot and helper functions
  ioh3420: pcie root port in X58 ioh
  x3130: pcie upstream port
  x3130: pcie downstream port
  pcie/hotplug: introduce pushing attention button command
  pcie/aer: glue aer error injection into qemu monitor

 Makefile.objs           |    4 +-
 hw/ioh3420.c            |  229 +
 hw/ioh3420.h            |   10 +
 hw/msi.c                |  352 +++
 hw/msi.h                |   41 +++
 hw/pci.c                |   24 ++-
 hw/pci.h                |   88 +-
 hw/pci_bridge.c         |   57 +++-
 hw/pci_bridge.h         |    2 +
 hw/pcie.c               |  540 +
 hw/pcie.h               |  113 ++
 hw/pcie_aer.c           |  869 +++
 hw/pcie_aer.h           |  105 ++
 hw/pcie_port.c          |  198 +++
 hw/pcie_port.h          |   51 +++
 hw/pcie_regs.h          |  154 +
 hw/xio3130_downstream.c |  197 +++
 hw/xio3130_downstream.h |   11 +
 hw/xio3130_upstream.c   |  181 ++
 hw/xio3130_upstream.h   |   10 +
 qemu-common.h           |    6 +
 qemu-monitor.hx         |   36 ++
 sysemu.h                |    9 +
 23 files changed, 3272 insertions(+), 15 deletions(-)
 create mode 100644 hw/ioh3420.c
 create mode 100644 hw/ioh3420.h
 create mode 100644 hw/msi.c
 create mode 100644 hw/msi.h
 create mode 100644 hw/pcie.c
 create mode 100644 hw/pcie.h
 create mode 100644 hw/pcie_aer.c
 create mode 100644 hw/pcie_aer.h
 create mode 100644 hw/pcie_port.c
 create mode 100644 hw/pcie_port.h
 create mode 100644 hw/pcie_regs.h
 create mode 100644 hw/xio3130_downstream.c
 create mode 100644 hw/xio3130_downstream.h
 create mode 100644 hw/xio3130_upstream.c
 create mode 100644 hw/xio3130_upstream.h
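The remark about RW1CS in the cover letter is easier to see with a tiny model of the two write behaviours. This is only a sketch with made-up function names, not the series' implementation: a plain writable register stores what the guest wrote, while a write-1-to-clear register clears exactly the bits the guest wrote as 1. Treating a W1C status register as plain writable would therefore keep the bit the guest meant to clear and lose the ones it left alone.

```c
#include <assert.h>
#include <stdint.h>

/* Plain read-write register: written value replaces the writable bits. */
static uint32_t rw_write(uint32_t old, uint32_t val, uint32_t wmask)
{
    return (old & ~wmask) | (val & wmask);
}

/* Write-1-to-clear register: a 1 clears the bit, a 0 leaves it alone. */
static uint32_t w1c_write(uint32_t old, uint32_t val, uint32_t w1cmask)
{
    return old & ~(val & w1cmask);
}
```

With two status bits pending (0x11), a guest write of 0x01 should leave 0x10 behind; the plain read-write model ends up with 0x01 instead.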
[Qemu-devel] Re: [Tracing][v4 PATCH 1/2] Introduce QMP interfaces
On Tue, Oct 19, 2010 at 11:55:50AM +0530, Prerna Saxena wrote: [PATCH 1/2] Introduce QMP interfaces : - query-trace - query-trace-events - query-trace-file Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com --- monitor.c | 53 --- simpletrace.c | 69 + simpletrace.h |5 3 files changed, 123 insertions(+), 4 deletions(-) Acked-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
[Qemu-devel] [PATCH v5 11/14] x3130: pcie upstream port
Implement TI x3130 pcie upstream port switch. Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- Changes v4 - v5: - remove flr related stuff. This will be addressed at the next phase. - use pci_xxx_test_and_xxx_mask(). Chnages v3 - v4: - rename pcie_upstream - x3130_upstream. - compilation adjustment. Changes v2 - v3: - compilation adjustment. --- Makefile.objs |2 +- hw/xio3130_upstream.c | 181 + hw/xio3130_upstream.h | 10 +++ 3 files changed, 192 insertions(+), 1 deletions(-) create mode 100644 hw/xio3130_upstream.c create mode 100644 hw/xio3130_upstream.h diff --git a/Makefile.objs b/Makefile.objs index cf7d2e9..d61e88a 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -140,7 +140,7 @@ hw-obj-y = hw-obj-y += vl.o loader.o hw-obj-y += virtio.o virtio-console.o hw-obj-y += fw_cfg.o pci.o pci_host.o pcie_host.o pci_bridge.o -hw-obj-y += ioh3420.o +hw-obj-y += ioh3420.o xio3130_upstream.o hw-obj-y += watchdog.o hw-obj-$(CONFIG_ISA_MMIO) += isa_mmio.o hw-obj-$(CONFIG_ECC) += ecc.o diff --git a/hw/xio3130_upstream.c b/hw/xio3130_upstream.c new file mode 100644 index 000..cba2b09 --- /dev/null +++ b/hw/xio3130_upstream.c @@ -0,0 +1,181 @@ +/* + * xio3130_upstream.c + * TI X3130 pci express upstream port switch + * + * Copyright (c) 2010 Isaku Yamahata yamahata at valinux co jp + *VA Linux Systems Japan K.K. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see http://www.gnu.org/licenses/. 
+ */ + +#include pci_ids.h +#include msi.h +#include pcie.h +#include xio3130_upstream.h + +#define PCI_DEVICE_ID_TI_XIO3130U 0x8232 /* upstream port */ +#define XIO3130_REVISION0x2 +#define XIO3130_MSI_OFFSET 0x70 +#define XIO3130_MSI_SUPPORTED_FLAGS PCI_MSI_FLAGS_64BIT +#define XIO3130_MSI_NR_VECTOR 1 +#define XIO3130_SSVID_OFFSET0x80 +#define XIO3130_SSVID_SVID 0 +#define XIO3130_SSVID_SSID 0 +#define XIO3130_EXP_OFFSET 0x90 +#define XIO3130_AER_OFFSET 0x100 + +static void xio3130_upstream_write_config(PCIDevice *d, uint32_t address, + uint32_t val, int len) +{ +uint32_t uncorsta = +pci_get_long(d-config + d-exp.aer_cap + PCI_ERR_UNCOR_STATUS); + +pci_bridge_write_config(d, address, val, len); +pcie_cap_flr_write_config(d, address, val, len); +msi_write_config(d, address, val, len); +pcie_aer_write_config(d, address, val, len, uncorsta); +pci_clear_written_write_config(d, address, val, len); +} + +static void xio3130_upstream_reset(DeviceState *qdev) +{ +PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev); +msi_reset(d); +pci_bridge_reset_zero_base_limit(d); +pci_bridge_reset(qdev); +pcie_cap_deverr_reset(d); +} + +static int xio3130_upstream_initfn(PCIDevice *d) +{ +PCIBridge* br = DO_UPCAST(PCIBridge, dev, d); +PCIEPort *p = DO_UPCAST(PCIEPort, br, br); +int rc; + +rc = pci_bridge_initfn(d); +if (rc 0) { +return rc; +} + +pcie_port_init_reg(d); +pci_config_set_vendor_id(d-config, PCI_VENDOR_ID_TI); +pci_config_set_device_id(d-config, PCI_DEVICE_ID_TI_XIO3130U); +d-config[PCI_REVISION_ID] = XIO3130_REVISION; + +rc = msi_init(d, XIO3130_MSI_OFFSET, XIO3130_MSI_NR_VECTOR, + XIO3130_MSI_SUPPORTED_FLAGS PCI_MSI_FLAGS_64BIT, + XIO3130_MSI_SUPPORTED_FLAGS PCI_MSI_FLAGS_MASKBIT); +if (rc 0) { +return rc; +} +rc = pci_bridge_ssvid_init(d, XIO3130_SSVID_OFFSET, + XIO3130_SSVID_SVID, XIO3130_SSVID_SSID); +if (rc 0) { +return rc; +} +rc = pcie_cap_init(d, XIO3130_EXP_OFFSET, PCI_EXP_TYPE_UPSTREAM, + p-port); +if (rc 0) { +return rc; +} + +/* TODO: implement FLR */ 
+pcie_cap_flr_init(d); + +pcie_cap_deverr_init(d); +pcie_aer_init(d, XIO3130_AER_OFFSET); + +return 0; +} + +static int xio3130_upstream_exitfn(PCIDevice *d) +{ +pcie_aer_exit(d); +msi_uninit(d); +pcie_cap_exit(d); +return pci_bridge_exitfn(d); +} + +PCIEPort *xio3130_upstream_init(PCIBus *bus, int devfn, bool multifunction, + const char *bus_name, pci_map_irq_fn map_irq, + uint8_t port) +{ +PCIDevice *d; +
[Qemu-devel] [PATCH v5 08/14] pcie/aer: helper functions for pcie aer capability
This patch implements helper functions for pcie aer capability which will be used later.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
Changes v4 -> v5:
- use pci_xxx_test_and_xxx_mask()
- rewrote PCIDevice::written bits.
- eliminated pcie_aer_notify()
- introduced PCIExpressDevice::aer_intx

Changes v3 -> v4:
- various naming fixes.
- use pci bit operation helper function
- eliminate errmsg function pointer
- replace pci_shift_xxx() with PCIDevice::written
- uncorrect error status register.
- dropped pcie_aer_cap()

Changes v2 -> v3:
- split out from pcie.[ch] to pcie_aer.[ch] to make the files shorter.
- embedded PCIExpressDevice into PCIDevice.
- CodingStyle fix
---
 Makefile.objs |    2 +-
 hw/pcie.h     |    6 +
 hw/pcie_aer.c |  785 +
 hw/pcie_aer.h |  105
 qemu-common.h |    3 +
 5 files changed, 900 insertions(+), 1 deletions(-)
 create mode 100644 hw/pcie_aer.c
 create mode 100644 hw/pcie_aer.h

diff --git a/Makefile.objs b/Makefile.objs
index eeb5134..68bcc48 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -186,7 +186,7 @@ hw-obj-$(CONFIG_PIIX4) += piix4.o
 # PCI watchdog devices
 hw-obj-y += wdt_i6300esb.o
 
-hw-obj-y += pcie.o
+hw-obj-y += pcie.o pcie_aer.o
 hw-obj-y += msix.o msi.o
 
 # PCI network cards
diff --git a/hw/pcie.h b/hw/pcie.h
index 68327d8..1b10753 100644
--- a/hw/pcie.h
+++ b/hw/pcie.h
@@ -24,6 +24,7 @@
 #include "hw.h"
 #include "pci_regs.h"
 #include "pcie_regs.h"
+#include "pcie_aer.h"
 
 typedef enum {
     /* for attention and power indicator */
@@ -66,6 +67,11 @@ struct PCIExpressDevice {
     /* SLOT */
     unsigned int hpev_intx;     /* INTx for hot plug event */
+
+    /* AER */
+    uint16_t aer_cap;
+    PCIEAERLog aer_log;
+    unsigned int aer_intx;      /* INTx for error reporting */
 };
 
 /* PCI express capability helper functions */
diff --git a/hw/pcie_aer.c b/hw/pcie_aer.c
new file mode 100644
index 0000000..1b023b0
--- /dev/null
+++ b/hw/pcie_aer.c
@@ -0,0 +1,785 @@
+/*
+ * pcie_aer.c
+ *
+ * Copyright (c) 2010 Isaku Yamahata yamahata at valinux co jp
+ *                    VA Linux Systems Japan K.K.
+ * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see http://www.gnu.org/licenses/. + */ + +#include sysemu.h +#include pci_bridge.h +#include pcie.h +#include msix.h +#include msi.h +#include pci_internals.h +#include pcie_regs.h + +//#define DEBUG_PCIE +#ifdef DEBUG_PCIE +# define PCIE_DPRINTF(fmt, ...) \ +fprintf(stderr, %s:%d fmt, __func__, __LINE__, ## __VA_ARGS__) +#else +# define PCIE_DPRINTF(fmt, ...) do {} while (0) +#endif +#define PCIE_DEV_PRINTF(dev, fmt, ...) \ +PCIE_DPRINTF(%s:%x fmt, (dev)-name, (dev)-devfn, ## __VA_ARGS__) + +static void pcie_aer_clear_error(PCIDevice *dev); +static uint8_t pcie_aer_root_get_vector(PCIDevice *dev); +static AERMsgResult +pcie_aer_msg_alldev(PCIDevice *dev, const PCIEAERMsg *msg); +static AERMsgResult +pcie_aer_msg_vbridge(PCIDevice *dev, const PCIEAERMsg *msg); +static AERMsgResult +pcie_aer_msg_root_port(PCIDevice *dev, const PCIEAERMsg *msg); + +/* From 6.2.7 Error Listing and Rules. 
Table 6-2, 6-3 and 6-4 */ +static PCIEAERSeverity pcie_aer_uncor_default_severity(uint32_t status) +{ +switch (status) { +case PCI_ERR_UNC_INTN: +case PCI_ERR_UNC_DLP: +case PCI_ERR_UNC_SDN: +case PCI_ERR_UNC_RX_OVER: +case PCI_ERR_UNC_FCP: +case PCI_ERR_UNC_MALF_TLP: +return AER_ERR_FATAL; +case PCI_ERR_UNC_POISON_TLP: +case PCI_ERR_UNC_ECRC: +case PCI_ERR_UNC_UNSUP: +case PCI_ERR_UNC_COMP_TIME: +case PCI_ERR_UNC_COMP_ABORT: +case PCI_ERR_UNC_UNX_COMP: +case PCI_ERR_UNC_ACSV: +case PCI_ERR_UNC_MCBTLP: +case PCI_ERR_UNC_ATOP_EBLOCKED: +case PCI_ERR_UNC_TLP_PRF_BLOCKED: +return AER_ERR_NONFATAL; +default: +break; +} +abort(); +return AER_ERR_FATAL; +} + +static uint32_t aer_log_next(uint32_t i, uint32_t max) +{ +return (i + 1) % max; +} + +static bool aer_log_empty_index(uint32_t producer, uint32_t consumer) +{ +return producer == consumer; +} + +static bool aer_log_empty(PCIEAERLog *aer_log) +{ +return aer_log_empty_index(aer_log-producer, aer_log-consumer); +} + +static bool aer_log_full(PCIEAERLog *aer_log) +{ +return aer_log_next(aer_log-producer, aer_log-log_max) == +aer_log-consumer; +}
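Editorial note on the aer_log_* helpers quoted above: they appear to treat the AER log as a ring buffer that tells "empty" from "full" by always leaving one slot unused. The following is a minimal standalone sketch of that index arithmetic only. The names (struct ring_log, ring_push, ring_pop) are hypothetical and are not the patch's PCIEAERLog API, and the assumption that log_max counts slots is mine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's PCIEAERLog; only the indices matter. */
struct ring_log {
    uint32_t producer;  /* index of the next slot to write */
    uint32_t consumer;  /* index of the next slot to read */
    uint32_t log_max;   /* number of slots; at most log_max - 1 hold data */
};

/* Advance an index by one, wrapping at max. */
static uint32_t ring_next(uint32_t i, uint32_t max)
{
    return (i + 1) % max;
}

/* Empty when both indices coincide. */
static bool ring_empty(const struct ring_log *r)
{
    return r->producer == r->consumer;
}

/* Full when one more write would make the indices coincide again. */
static bool ring_full(const struct ring_log *r)
{
    return ring_next(r->producer, r->log_max) == r->consumer;
}

/* Claim one slot; returns false and changes nothing when the ring is full. */
static bool ring_push(struct ring_log *r)
{
    assert(r->log_max > 1);
    if (ring_full(r)) {
        return false;
    }
    r->producer = ring_next(r->producer, r->log_max);
    return true;
}

/* Release one slot; returns false and changes nothing when the ring is empty. */
static bool ring_pop(struct ring_log *r)
{
    if (ring_empty(r)) {
        return false;
    }
    r->consumer = ring_next(r->consumer, r->log_max);
    return true;
}
```

With log_max set to 4, three pushes succeed and the fourth is refused, which is the cost of not keeping a separate element count.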
[Qemu-devel] [PATCH v5 06/14] pcie: add pcie constants to pcie_regs.h
add pcie constants to pcie_regs.h. Those constants should go to Linux pci_regs.h and then the file should go away eventually. Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- Changes v3 - v4: - removed copyright notice as requested. Changes v2 - v3: - moved out pcie constants from pcie.c to pcie_regs.h. - removed unused macros --- hw/pcie_regs.h | 154 1 files changed, 154 insertions(+), 0 deletions(-) create mode 100644 hw/pcie_regs.h diff --git a/hw/pcie_regs.h b/hw/pcie_regs.h new file mode 100644 index 000..3461a1b --- /dev/null +++ b/hw/pcie_regs.h @@ -0,0 +1,154 @@ +/* + * constants for pcie configurations space from pci express spec. + * + * TODO: + * Those constants and macros should go to Linux pci_regs.h + * Once they're merged, they will go away. + */ +#ifndef QEMU_PCIE_REGS_H +#define QEMU_PCIE_REGS_H + + +/* express capability */ + +#define PCI_EXP_VER2_SIZEOF 0x3c /* express capability of ver. 2 */ +#define PCI_EXT_CAP_VER_SHIFT 16 +#define PCI_EXT_CAP_NEXT_SHIFT 20 +#define PCI_EXT_CAP_NEXT_MASK (0xffc PCI_EXT_CAP_NEXT_SHIFT) + +#define PCI_EXT_CAP(id, ver, next) \ +((id) | \ + ((ver) PCI_EXT_CAP_VER_SHIFT) | \ + ((next) PCI_EXT_CAP_NEXT_SHIFT)) + +#define PCI_EXT_CAP_ALIGN 4 +#define PCI_EXT_CAP_ALIGNUP(x) \ +(((x) + PCI_EXT_CAP_ALIGN - 1) ~(PCI_EXT_CAP_ALIGN - 1)) + +/* PCI_EXP_FLAGS */ +#define PCI_EXP_FLAGS_VER2 2 /* for now, supports only ver. 
2 */ +#define PCI_EXP_FLAGS_IRQ_SHIFT (ffs(PCI_EXP_FLAGS_IRQ) - 1) +#define PCI_EXP_FLAGS_TYPE_SHIFT(ffs(PCI_EXP_FLAGS_TYPE) - 1) + + +/* PCI_EXP_LINK{CAP, STA} */ +/* link speed */ +#define PCI_EXP_LNK_LS_25 1 + +#define PCI_EXP_LNK_MLW_SHIFT (ffs(PCI_EXP_LNKCAP_MLW) - 1) +#define PCI_EXP_LNK_MLW_1 (1 PCI_EXP_LNK_MLW_SHIFT) + +/* PCI_EXP_LINKCAP */ +#define PCI_EXP_LNKCAP_ASPMS_SHIFT (ffs(PCI_EXP_LNKCAP_ASPMS) - 1) +#define PCI_EXP_LNKCAP_ASPMS_0S (1 PCI_EXP_LNKCAP_ASPMS_SHIFT) + +#define PCI_EXP_LNKCAP_PN_SHIFT (ffs(PCI_EXP_LNKCAP_PN) - 1) + +#define PCI_EXP_SLTCAP_PSN_SHIFT(ffs(PCI_EXP_SLTCAP_PSN) - 1) + +#define PCI_EXP_SLTCTL_IND_RESERVED 0x0 +#define PCI_EXP_SLTCTL_IND_ON 0x1 +#define PCI_EXP_SLTCTL_IND_BLINK0x2 +#define PCI_EXP_SLTCTL_IND_OFF 0x3 +#define PCI_EXP_SLTCTL_AIC_SHIFT(ffs(PCI_EXP_SLTCTL_AIC) - 1) +#define PCI_EXP_SLTCTL_AIC_OFF \ +(PCI_EXP_SLTCTL_IND_OFF PCI_EXP_SLTCTL_AIC_SHIFT) + +#define PCI_EXP_SLTCTL_PIC_SHIFT(ffs(PCI_EXP_SLTCTL_PIC) - 1) +#define PCI_EXP_SLTCTL_PIC_OFF \ +(PCI_EXP_SLTCTL_IND_OFF PCI_EXP_SLTCTL_PIC_SHIFT) + +#define PCI_EXP_SLTCTL_SUPPORTED\ +(PCI_EXP_SLTCTL_ABPE | \ + PCI_EXP_SLTCTL_PDCE | \ + PCI_EXP_SLTCTL_CCIE | \ + PCI_EXP_SLTCTL_HPIE | \ + PCI_EXP_SLTCTL_AIC | \ + PCI_EXP_SLTCTL_PCC | \ + PCI_EXP_SLTCTL_EIC) + +#define PCI_EXP_DEVCAP2_EFF 0x10 +#define PCI_EXP_DEVCAP2_EETLPP 0x20 + +#define PCI_EXP_DEVCTL2_EETLPPB 0x80 + +/* ARI */ +#define PCI_ARI_VER 1 +#define PCI_ARI_SIZEOF 8 + +/* AER */ +#define PCI_ERR_VER 2 +#define PCI_ERR_SIZEOF 0x48 + +#define PCI_ERR_UNC_SDN 0x0020 /* surprise down */ +#define PCI_ERR_UNC_ACSV0x0020 /* ACS Violation */ +#define PCI_ERR_UNC_INTN0x0040 /* Internal Error */ +#define PCI_ERR_UNC_MCBTLP 0x0080 /* MC Blcoked TLP */ +#define PCI_ERR_UNC_ATOP_EBLOCKED 0x0100 /* atomic op egress blocked */ +#define PCI_ERR_UNC_TLP_PRF_BLOCKED 0x0200 /* TLP Prefix Blocked */ +#define PCI_ERR_COR_ADV_NONFATAL0x2000 /* Advisory Non-Fatal */ +#define PCI_ERR_COR_INTERNAL0x4000 /* Corrected Internal */ 
+#define PCI_ERR_COR_HL_OVERFLOW 0x8000 /* Header Long Overflow */ +#define PCI_ERR_CAP_FEP_MASK0x001f +#define PCI_ERR_CAP_MHRC0x0200 +#define PCI_ERR_CAP_MHRE0x0400 +#define PCI_ERR_CAP_TLP 0x0800 + +#define PCI_ERR_TLP_PREFIX_LOG 0x38 + +#define PCI_SEC_STATUS_RCV_SYSTEM_ERROR 0x4000 + +/* aer root error command/status */ +#define PCI_ERR_ROOT_CMD_EN_MASK(PCI_ERR_ROOT_CMD_COR_EN | \ + PCI_ERR_ROOT_CMD_NONFATAL_EN | \ + PCI_ERR_ROOT_CMD_FATAL_EN) + +#define PCI_ERR_ROOT_IRQ_MAX
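Editorial note on PCI_EXT_CAP() and PCI_EXT_CAP_ALIGNUP() from this header: the archive has dropped the shift and AND operators, so the sketch below assumes they were `<<` and `&`, which is what the SHIFT and ALIGN constants suggest. It packs a 16-bit capability ID, a 4-bit version and a 12-bit next-capability offset into one 32-bit extended capability header. The function names are hypothetical; only the constants 16, 20, 0xffc and 4 come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the quoted header. */
#define EXT_CAP_VER_SHIFT   16
#define EXT_CAP_NEXT_SHIFT  20
#define EXT_CAP_ALIGN       4u

/* Build an extended capability header: id | ver << 16 | next << 20. */
static uint32_t ext_cap_header(uint16_t id, uint8_t ver, uint16_t next)
{
    assert(ver <= 0xf);     /* version field is 4 bits wide */
    assert(next <= 0xfff);  /* next pointer field is 12 bits wide */
    return (uint32_t)id |
           ((uint32_t)ver << EXT_CAP_VER_SHIFT) |
           ((uint32_t)next << EXT_CAP_NEXT_SHIFT);
}

/* Extract the next-capability offset; the low two bits are masked off. */
static uint16_t ext_cap_next(uint32_t header)
{
    return (uint16_t)((header >> EXT_CAP_NEXT_SHIFT) & 0xffc);
}

/* Round an offset up to the next multiple of EXT_CAP_ALIGN. */
static uint32_t ext_cap_alignup(uint32_t x)
{
    return (x + EXT_CAP_ALIGN - 1) & ~(EXT_CAP_ALIGN - 1);
}
```

For illustration, ext_cap_header(0x0001, 2, 0x148) uses version 2 as in PCI_ERR_VER, and 0x148 is simply 0x100 + 0x48, the AER offset used by the port patches plus PCI_ERR_SIZEOF. Whether anything is actually placed at that offset is an assumption.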
[Qemu-devel] Re: [Tracing][v4 PATCH 2/2] Add documentation for QMP interfaces
On Tue, Oct 19, 2010 at 11:57:50AM +0530, Prerna Saxena wrote:
> [PATCH 2/2] Add documentation for QMP commands:
> - query-trace
> - query-trace-events
> - query-trace-file.
>
> Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com
> ---
>  qmp-commands.hx |   94 +++
>  1 files changed, 94 insertions(+), 0 deletions(-)

Acked-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
[Qemu-devel] [PATCH v5 07/14] pcie: helper functions for pcie capability and extended capability
This patch implements helper functions for pci express capability and pci express extended capability allocation. NOTE: presence detection depends on pci_qdev_init() change. Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- Changes v4 - v5: - dropped FLR related members. This will be addressed at the next phase. - use pci_xxx_test_and_xxx_mask(). - drop PCIDevice::written bits. and made related registers writable. - eliminated pcie_cap_slot_notify() - introduced PCIExpressDevice::hpev_intx Changes v3 - v4: - various clean up - dropped pcie_notify(), pcie_del_capability() - use pci_{clear_set, clear}_bit_xxx() helper functions. - dropped pci_exp_cap() --- Makefile.objs |1 + hw/pci.h |5 + hw/pcie.c | 540 + hw/pcie.h | 107 qemu-common.h |1 + 5 files changed, 654 insertions(+), 0 deletions(-) create mode 100644 hw/pcie.c create mode 100644 hw/pcie.h diff --git a/Makefile.objs b/Makefile.objs index 5f5a4c5..eeb5134 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -186,6 +186,7 @@ hw-obj-$(CONFIG_PIIX4) += piix4.o # PCI watchdog devices hw-obj-y += wdt_i6300esb.o +hw-obj-y += pcie.o hw-obj-y += msix.o msi.o # PCI network cards diff --git a/hw/pci.h b/hw/pci.h index 9e2f27d..d6c522b 100644 --- a/hw/pci.h +++ b/hw/pci.h @@ -9,6 +9,8 @@ /* PCI includes legacy ISA access. */ #include isa.h +#include pcie.h + /* PCI bus */ #define PCI_DEVFN(slot, func) slot) 0x1f) 3) | ((func) 0x07)) @@ -175,6 +177,9 @@ struct PCIDevice { /* Offset of MSI capability in config space */ uint8_t msi_cap; +/* PCI Express */ +PCIExpressDevice exp; + /* Location of option rom */ char *romfile; ram_addr_t rom_offset; diff --git a/hw/pcie.c b/hw/pcie.c new file mode 100644 index 000..53d1fce --- /dev/null +++ b/hw/pcie.c @@ -0,0 +1,540 @@ +/* + * pcie.c + * + * Copyright (c) 2010 Isaku Yamahata yamahata at valinux co jp + *VA Linux Systems Japan K.K. 
+ * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see http://www.gnu.org/licenses/. + */ + +#include sysemu.h +#include pci_bridge.h +#include pcie.h +#include msix.h +#include msi.h +#include pci_internals.h +#include pcie_regs.h + +//#define DEBUG_PCIE +#ifdef DEBUG_PCIE +# define PCIE_DPRINTF(fmt, ...) \ +fprintf(stderr, %s:%d fmt, __func__, __LINE__, ## __VA_ARGS__) +#else +# define PCIE_DPRINTF(fmt, ...) do {} while (0) +#endif +#define PCIE_DEV_PRINTF(dev, fmt, ...) \ +PCIE_DPRINTF(%s:%x fmt, (dev)-name, (dev)-devfn, ## __VA_ARGS__) + + +/*** + * pci express capability helper functions + */ +int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type, uint8_t port) +{ +int pos; +uint8_t *exp_cap; + +assert(pci_is_express(dev)); + +pos = pci_add_capability(dev, PCI_CAP_ID_EXP, offset, + PCI_EXP_VER2_SIZEOF); +if (pos 0) { +return pos; +} +dev-exp.exp_cap = pos; +exp_cap = dev-config + pos; + +/* capability register + interrupt message number defaults to 0 */ +pci_set_word(exp_cap + PCI_EXP_FLAGS, + ((type PCI_EXP_FLAGS_TYPE_SHIFT) PCI_EXP_FLAGS_TYPE) | + PCI_EXP_FLAGS_VER2); + +/* device capability register + * table 7-12: + * roll based error reporting bit must be set by all + * Functions conforming to the ECN, PCI Express Base + * Specification, Revision 1.1., or subsequent PCI Express Base + * Specification revisions. 
+ */ +pci_set_long(exp_cap + PCI_EXP_DEVCAP, PCI_EXP_DEVCAP_RBER); + +pci_set_long(exp_cap + PCI_EXP_LNKCAP, + (port PCI_EXP_LNKCAP_PN_SHIFT) | + PCI_EXP_LNKCAP_ASPMS_0S | + PCI_EXP_LNK_MLW_1 | + PCI_EXP_LNK_LS_25); + +pci_set_word(exp_cap + PCI_EXP_LNKSTA, + PCI_EXP_LNK_MLW_1 | PCI_EXP_LNK_LS_25); + +pci_set_long(exp_cap + PCI_EXP_DEVCAP2, + PCI_EXP_DEVCAP2_EFF | PCI_EXP_DEVCAP2_EETLPP); + +pci_set_word(dev-wmask + pos, PCI_EXP_DEVCTL2_EETLPPB); +return pos; +} + +void pcie_cap_exit(PCIDevice *dev) +{ +pci_del_capability(dev, PCI_CAP_ID_EXP, PCI_EXP_VER2_SIZEOF); +} + +uint8_t
[Qemu-devel] [PATCH v5 05/14] msi: implements msi
implements msi related functions. Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- Changes v4 - v5: - use pci_xxx_test_and_clear/set_mask(). Changes v3 - v4: - use pci_set_bit_xxx helper function. - make nr_vectors, vector unsigned int. - introduce PCI_MSI_VECTORS_MAX. - fix undefined bit operations. - eliminate msi_set_pending(). Changes v2 - v3: - improved comment wording. - simplified shift/ffs dance. Changes v1 - v2: - opencode some oneline helper function/macros for readability - use ffs where appropriate - rename some functions/variables as suggested. - added assert() - 1 - 1U - clear INTx# when MSI is enabled - clear pending bits for freed vectors. - check the requested number of vectors. msi: update for helper functions. update for helper functions. Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- Makefile.objs |2 +- hw/msi.c | 352 + hw/msi.h | 41 +++ hw/pci.h | 10 +- 4 files changed, 401 insertions(+), 4 deletions(-) create mode 100644 hw/msi.c create mode 100644 hw/msi.h diff --git a/Makefile.objs b/Makefile.objs index 594894b..5f5a4c5 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -186,7 +186,7 @@ hw-obj-$(CONFIG_PIIX4) += piix4.o # PCI watchdog devices hw-obj-y += wdt_i6300esb.o -hw-obj-y += msix.o +hw-obj-y += msix.o msi.o # PCI network cards hw-obj-y += ne2000.o diff --git a/hw/msi.c b/hw/msi.c new file mode 100644 index 000..a949d82 --- /dev/null +++ b/hw/msi.c @@ -0,0 +1,352 @@ +/* + * msi.c + * + * Copyright (c) 2010 Isaku Yamahata yamahata at valinux co jp + *VA Linux Systems Japan K.K. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + + * You should have received a copy of the GNU General Public License along + * with this program; if not, see http://www.gnu.org/licenses/. + */ + +#include msi.h + +/* Eventually those constants should go to Linux pci_regs.h */ +#define PCI_MSI_PENDING_32 0x10 +#define PCI_MSI_PENDING_64 0x14 + +/* PCI_MSI_ADDRESS_LO */ +#define PCI_MSI_ADDRESS_LO_MASK (~0x3) + +/* If we get rid of cap allocator, we won't need those. */ +#define PCI_MSI_32_SIZEOF 0x0a +#define PCI_MSI_64_SIZEOF 0x0e +#define PCI_MSI_32M_SIZEOF 0x14 +#define PCI_MSI_64M_SIZEOF 0x18 + +#define PCI_MSI_VECTORS_MAX 32 + +/* If we get rid of cap allocator, we won't need this. */ +static inline uint8_t msi_cap_sizeof(uint16_t flags) +{ +switch (flags (PCI_MSI_FLAGS_MASKBIT | PCI_MSI_FLAGS_64BIT)) { +case PCI_MSI_FLAGS_MASKBIT | PCI_MSI_FLAGS_64BIT: +return PCI_MSI_64M_SIZEOF; +case PCI_MSI_FLAGS_64BIT: +return PCI_MSI_64_SIZEOF; +case PCI_MSI_FLAGS_MASKBIT: +return PCI_MSI_32M_SIZEOF; +case 0: +return PCI_MSI_32_SIZEOF; +default: +abort(); +break; +} +return 0; +} + +//#define MSI_DEBUG + +#ifdef MSI_DEBUG +# define MSI_DPRINTF(fmt, ...) \ +fprintf(stderr, %s:%d fmt, __func__, __LINE__, ## __VA_ARGS__) +#else +# define MSI_DPRINTF(fmt, ...) do { } while (0) +#endif +#define MSI_DEV_PRINTF(dev, fmt, ...) 
\ +MSI_DPRINTF(%s:%x fmt, (dev)-name, (dev)-devfn, ## __VA_ARGS__) + +static inline unsigned int msi_nr_vectors(uint16_t flags) +{ +return 1U +((flags PCI_MSI_FLAGS_QSIZE) (ffs(PCI_MSI_FLAGS_QSIZE) - 1)); +} + +static inline uint8_t msi_flags_off(const PCIDevice* dev) +{ +return dev-msi_cap + PCI_MSI_FLAGS; +} + +static inline uint8_t msi_address_lo_off(const PCIDevice* dev) +{ +return dev-msi_cap + PCI_MSI_ADDRESS_LO; +} + +static inline uint8_t msi_address_hi_off(const PCIDevice* dev) +{ +return dev-msi_cap + PCI_MSI_ADDRESS_HI; +} + +static inline uint8_t msi_data_off(const PCIDevice* dev, bool msi64bit) +{ +return dev-msi_cap + (msi64bit ? PCI_MSI_DATA_64 : PCI_MSI_DATA_32); +} + +static inline uint8_t msi_mask_off(const PCIDevice* dev, bool msi64bit) +{ +return dev-msi_cap + (msi64bit ? PCI_MSI_MASK_64 : PCI_MSI_MASK_32); +} + +static inline uint8_t msi_pending_off(const PCIDevice* dev, bool msi64bit) +{ +return dev-msi_cap + (msi64bit ? PCI_MSI_PENDING_64 : PCI_MSI_PENDING_32); +} + +bool msi_enabled(const PCIDevice *dev) +{ +return msi_present(dev) +(pci_get_word(dev-config + msi_flags_off(dev)) + PCI_MSI_FLAGS_ENABLE); +} + +int msi_init(struct PCIDevice *dev, uint8_t offset, + unsigned int nr_vectors,
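Editorial note on msi_nr_vectors() quoted above: the archive has dropped the `<<`, `&` and `>>` operators, so this sketch assumes the function reads the 3-bit queue-size field of the MSI message control word and returns 2 to that power. The mask value 0x70 is PCI_MSI_FLAGS_QSIZE from Linux pci_regs.h; the function name is hypothetical, and ffs() comes from the POSIX header strings.h.

```c
#include <assert.h>
#include <stdint.h>
#include <strings.h>   /* ffs(), POSIX */

/* Same value as PCI_MSI_FLAGS_QSIZE in Linux pci_regs.h. */
#define MSI_FLAGS_QSIZE 0x70

/* Decode the number of allocated vectors from the MSI flags word. */
static unsigned int nr_vectors_from_flags(uint16_t flags)
{
    /* ffs() returns the 1-based position of the lowest set bit of the mask,
     * so ffs(mask) - 1 is the shift that brings the field down to bit 0. */
    unsigned int log2n = (flags & MSI_FLAGS_QSIZE) >> (ffs(MSI_FLAGS_QSIZE) - 1);

    assert(log2n <= 5);  /* encodings above 32 vectors are reserved */
    return 1U << log2n;
}
```

Deriving the shift from the mask with ffs() keeps the two from drifting apart, at the price of a function call the compiler will normally fold into a constant.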
[Qemu-devel] [PATCH v5 12/14] x3130: pcie downstream port
Implement TI x3130 pcie downstream port switch. Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- Changes v4 - v5: - use pci_xxx_test_and_xxx_mask(). - removed flr related stuff. Changes v3 - v4: - rename: pcie_downstream - x3130_downstream - compilation adjustment. Changes v2 - v3: - compilation adjustment. --- Makefile.objs |2 +- hw/xio3130_downstream.c | 197 +++ hw/xio3130_downstream.h | 11 +++ 3 files changed, 209 insertions(+), 1 deletions(-) create mode 100644 hw/xio3130_downstream.c create mode 100644 hw/xio3130_downstream.h diff --git a/Makefile.objs b/Makefile.objs index d61e88a..48f98f3 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -140,7 +140,7 @@ hw-obj-y = hw-obj-y += vl.o loader.o hw-obj-y += virtio.o virtio-console.o hw-obj-y += fw_cfg.o pci.o pci_host.o pcie_host.o pci_bridge.o -hw-obj-y += ioh3420.o xio3130_upstream.o +hw-obj-y += ioh3420.o xio3130_upstream.o xio3130_downstream.o hw-obj-y += watchdog.o hw-obj-$(CONFIG_ISA_MMIO) += isa_mmio.o hw-obj-$(CONFIG_ECC) += ecc.o diff --git a/hw/xio3130_downstream.c b/hw/xio3130_downstream.c new file mode 100644 index 000..9801723 --- /dev/null +++ b/hw/xio3130_downstream.c @@ -0,0 +1,197 @@ +/* + * x3130_downstream.c + * TI X3130 pci express downstream port switch + * + * Copyright (c) 2010 Isaku Yamahata yamahata at valinux co jp + *VA Linux Systems Japan K.K. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see http://www.gnu.org/licenses/. + */ + +#include pci_ids.h +#include msi.h +#include pcie.h +#include xio3130_downstream.h + +#define PCI_DEVICE_ID_TI_XIO3130D 0x8233 /* downstream port */ +#define XIO3130_REVISION0x1 +#define XIO3130_MSI_OFFSET 0x70 +#define XIO3130_MSI_SUPPORTED_FLAGS PCI_MSI_FLAGS_64BIT +#define XIO3130_MSI_NR_VECTOR 1 +#define XIO3130_SSVID_OFFSET0x80 +#define XIO3130_SSVID_SVID 0 +#define XIO3130_SSVID_SSID 0 +#define XIO3130_EXP_OFFSET 0x90 +#define XIO3130_AER_OFFSET 0x100 + +static void xio3130_downstream_write_config(PCIDevice *d, uint32_t address, + uint32_t val, int len) +{ +uint16_t sltctl = +pci_get_word(d-config + d-exp.exp_cap + PCI_EXP_SLTCTL); +uint32_t uncorsta = +pci_get_long(d-config + d-exp.aer_cap + PCI_ERR_UNCOR_STATUS); + +pci_bridge_write_config(d, address, val, len); +pcie_cap_flr_write_config(d, address, val, len); +pcie_cap_slot_write_config(d, address, val, len, sltctl); +msi_write_config(d, address, val, len); +pcie_aer_write_config(d, address, val, len, uncorsta); +pci_clear_written_write_config(d, address, val, len); +} + +static void xio3130_downstream_reset(DeviceState *qdev) +{ +PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev); +msi_reset(d); +pcie_cap_deverr_reset(d); +pcie_cap_slot_reset(d); +pcie_cap_ari_reset(d); +pci_bridge_reset_zero_base_limit(d); +pci_bridge_reset(qdev); +} + +static int xio3130_downstream_initfn(PCIDevice *d) +{ +PCIBridge* br = DO_UPCAST(PCIBridge, dev, d); +PCIEPort *p = DO_UPCAST(PCIEPort, br, br); +PCIESlot *s = DO_UPCAST(PCIESlot, port, p); +int rc; + +rc = pci_bridge_initfn(d); +if (rc 0) { +return rc; +} + +pcie_port_init_reg(d); +pci_config_set_vendor_id(d-config, PCI_VENDOR_ID_TI); +pci_config_set_device_id(d-config, PCI_DEVICE_ID_TI_XIO3130D); +d-config[PCI_REVISION_ID] = XIO3130_REVISION; + +rc = msi_init(d, XIO3130_MSI_OFFSET, XIO3130_MSI_NR_VECTOR, + 
XIO3130_MSI_SUPPORTED_FLAGS PCI_MSI_FLAGS_64BIT, + XIO3130_MSI_SUPPORTED_FLAGS PCI_MSI_FLAGS_MASKBIT); +if (rc 0) { +return rc; +} +rc = pci_bridge_ssvid_init(d, XIO3130_SSVID_OFFSET, + XIO3130_SSVID_SVID, XIO3130_SSVID_SSID); +if (rc 0) { +return rc; +} +rc = pcie_cap_init(d, XIO3130_EXP_OFFSET, PCI_EXP_TYPE_DOWNSTREAM, + p-port); +if (rc 0) { +return rc; +} +pcie_cap_flr_init(d); /* TODO: implement FLR */ +pcie_cap_deverr_init(d); +pcie_cap_slot_init(d, s-slot); +pcie_chassis_create(s-chassis); +rc = pcie_chassis_add_slot(s); +if (rc 0) { +return rc; +} +
[Qemu-devel] [PATCH v5 09/14] pcie port: define struct PCIEPort/PCIESlot and helper functions
define struct PCIEPort which represents common part of pci express port.(root, upstream and downstream.) add a helper function for pcie port which can be used commonly by root/upstream/downstream port. define struct PCIESlot which represents common part of pcie slot.(root and downstream.) and helper functions for it. helper functions for chassis, slot - PCIESlot conversion. Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- Changes v4 - v5: - use pci_xxx_test_and_xxx_mask() Changes v3 - v4: - Initialize prefetchable memory base/limit registers correctly. They must support 64bit. - compilation adjustment. Changes v2 - v3: - static'fy chassis. - compilation adjustment. --- Makefile.objs |2 +- hw/pcie_port.c | 116 hw/pcie_port.h | 51 qemu-common.h |2 + 4 files changed, 170 insertions(+), 1 deletions(-) create mode 100644 hw/pcie_port.c create mode 100644 hw/pcie_port.h diff --git a/Makefile.objs b/Makefile.objs index 68bcc48..6c3b84a 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -186,7 +186,7 @@ hw-obj-$(CONFIG_PIIX4) += piix4.o # PCI watchdog devices hw-obj-y += wdt_i6300esb.o -hw-obj-y += pcie.o pcie_aer.o +hw-obj-y += pcie.o pcie_aer.o pcie_port.o hw-obj-y += msix.o msi.o # PCI network cards diff --git a/hw/pcie_port.c b/hw/pcie_port.c new file mode 100644 index 000..117de61 --- /dev/null +++ b/hw/pcie_port.c @@ -0,0 +1,116 @@ +/* + * pcie_port.c + * + * Copyright (c) 2010 Isaku Yamahata yamahata at valinux co jp + *VA Linux Systems Japan K.K. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see http://www.gnu.org/licenses/. + */ + +#include pcie_port.h + +void pcie_port_init_reg(PCIDevice *d) +{ +/* Unlike pci bridge, + 66MHz and fast back to back don't apply to pci express port. */ +pci_set_word(d-config + PCI_STATUS, 0); +pci_set_word(d-config + PCI_SEC_STATUS, 0); + +/* 7.5.3.5 Prefetchable Memory Base Limit + * The Prefetchable Memory Base and Prefetchable Memory Limit registers + * must indicate that 64-bit addresses are supported, as defined in + * PCI-to-PCI Bridge Architecture Specification, Revision 1.2. + */ +pci_word_test_and_set_mask(d-config + PCI_PREF_MEMORY_BASE, + PCI_PREF_RANGE_TYPE_64); +pci_word_test_and_set_mask(d-config + PCI_PREF_MEMORY_LIMIT, + PCI_PREF_RANGE_TYPE_64); +} + +/** + * (chassis number, pcie physical slot number) - pcie slot conversion + */ +struct PCIEChassis { +uint8_t number; + +QLIST_HEAD(, PCIESlot) slots; +QLIST_ENTRY(PCIEChassis) next; +}; + +static QLIST_HEAD(, PCIEChassis) chassis = QLIST_HEAD_INITIALIZER(chassis); + +static struct PCIEChassis *pcie_chassis_find(uint8_t chassis_number) +{ +struct PCIEChassis *c; +QLIST_FOREACH(c, chassis, next) { +if (c-number == chassis_number) { +break; +} +} +return c; +} + +void pcie_chassis_create(uint8_t chassis_number) +{ +struct PCIEChassis *c; +c = pcie_chassis_find(chassis_number); +if (c) { +return; +} +c = qemu_mallocz(sizeof(*c)); +c-number = chassis_number; +QLIST_INIT(c-slots); +QLIST_INSERT_HEAD(chassis, c, next); +} + +static PCIESlot *pcie_chassis_find_slot_with_chassis(struct PCIEChassis *c, + uint8_t slot) +{ +PCIESlot *s; +QLIST_FOREACH(s, c-slots, next) { +if (s-slot == slot) { +break; +} +} +return s; +} + +PCIESlot *pcie_chassis_find_slot(uint8_t chassis_number, uint16_t slot) +{ +struct PCIEChassis *c; +c = pcie_chassis_find(chassis_number); +if (!c) { +return NULL; +} +return pcie_chassis_find_slot_with_chassis(c, slot); +} + +int 
pcie_chassis_add_slot(struct PCIESlot *slot) +{ +struct PCIEChassis *c; +c = pcie_chassis_find(slot-chassis); +if (!c) { +return -ENODEV; +} +if (pcie_chassis_find_slot_with_chassis(c, slot-slot)) { +return -EBUSY; +} +QLIST_INSERT_HEAD(c-slots, slot, next); +return 0; +} + +void pcie_chassis_del_slot(PCIESlot *s) +{ +QLIST_REMOVE(s, next); +} diff --git a/hw/pcie_port.h b/hw/pcie_port.h new file mode 100644 index
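Editorial note on the chassis helpers quoted above: they keep a list of chassis, each with a list of slots, and refuse a slot whose chassis is unknown (-ENODEV) or whose slot number is already taken (-EBUSY). The sketch below reproduces that lookup logic with plain singly linked lists instead of QEMU's QLIST macros. All names are hypothetical, and the error return from the create function is an addition of mine because calloc() can fail where the patch's allocator does not report failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for PCIESlot and struct PCIEChassis. */
struct slot {
    uint8_t chassis;        /* chassis number this slot belongs to */
    uint16_t slot;          /* physical slot number within the chassis */
    struct slot *next;
};

struct chassis {
    uint8_t number;
    struct slot *slots;
    struct chassis *next;
};

static struct chassis *chassis_list;

/* Return the chassis with the given number, or NULL if there is none. */
static struct chassis *chassis_find(uint8_t number)
{
    struct chassis *c;

    for (c = chassis_list; c; c = c->next) {
        if (c->number == number) {
            break;
        }
    }
    return c;
}

/* Create the chassis if it does not exist yet; creating it twice is fine. */
static int chassis_create(uint8_t number)
{
    struct chassis *c = chassis_find(number);

    if (c) {
        return 0;
    }
    c = calloc(1, sizeof(*c));
    if (!c) {
        return -ENOMEM;
    }
    c->number = number;
    c->next = chassis_list;
    chassis_list = c;
    return 0;
}

/* Register a slot: -ENODEV for an unknown chassis, -EBUSY for a duplicate. */
static int chassis_add_slot(struct slot *s)
{
    struct chassis *c;
    struct slot *it;

    assert(s != NULL);
    c = chassis_find(s->chassis);
    if (!c) {
        return -ENODEV;
    }
    for (it = c->slots; it; it = it->next) {
        if (it->slot == s->slot) {
            return -EBUSY;
        }
    }
    s->next = c->slots;
    c->slots = s;
    return 0;
}
```

Returning a negative errno value lets the caller, here the port's init function, fail device creation cleanly instead of ending up with two slots that claim the same physical position.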
[Qemu-devel] [PATCH v5 10/14] ioh3420: pcie root port in X58 ioh
Implements pcie root port switch in intel X58 ioh whose device id is 0x3420. Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- Changes v4 - v5: - use pci_xxx_test_and_xxx_mask() Changes v3 - v4: - rename pcie_root - ioh3420 - compilation adjustment. Changes v2 - v3: - compilation adjustment. --- Makefile.objs |1 + hw/ioh3420.c | 229 + hw/ioh3420.h | 10 +++ 3 files changed, 240 insertions(+), 0 deletions(-) create mode 100644 hw/ioh3420.c create mode 100644 hw/ioh3420.h diff --git a/Makefile.objs b/Makefile.objs index 6c3b84a..cf7d2e9 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -140,6 +140,7 @@ hw-obj-y = hw-obj-y += vl.o loader.o hw-obj-y += virtio.o virtio-console.o hw-obj-y += fw_cfg.o pci.o pci_host.o pcie_host.o pci_bridge.o +hw-obj-y += ioh3420.o hw-obj-y += watchdog.o hw-obj-$(CONFIG_ISA_MMIO) += isa_mmio.o hw-obj-$(CONFIG_ECC) += ecc.o diff --git a/hw/ioh3420.c b/hw/ioh3420.c new file mode 100644 index 000..4317ac3 --- /dev/null +++ b/hw/ioh3420.c @@ -0,0 +1,229 @@ +/* + * ioh3420.c + * Intel X58 north bridge IOH + * PCI Express root port device id 3420 + * + * Copyright (c) 2010 Isaku Yamahata yamahata at valinux co jp + *VA Linux Systems Japan K.K. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see http://www.gnu.org/licenses/. 
+ */ + +#include pci_ids.h +#include msi.h +#include pcie.h +#include ioh3420.h + +#define PCI_DEVICE_ID_IOH_EPORT 0x3420 /* D0:F0 express mode */ +#define PCI_DEVICE_ID_IOH_REV 0x2 +#define IOH_EP_SSVID_OFFSET 0x40 +#define IOH_EP_SSVID_SVID PCI_VENDOR_ID_INTEL +#define IOH_EP_SSVID_SSID 0 +#define IOH_EP_MSI_OFFSET 0x60 +#define IOH_EP_MSI_SUPPORTED_FLAGS PCI_MSI_FLAGS_MASKBIT +#define IOH_EP_MSI_NR_VECTOR2 +#define IOH_EP_EXP_OFFSET 0x90 +#define IOH_EP_AER_OFFSET 0x100 + +/* + * If two MSI vector are allocated, Advanced Error Interrupt Message Number + * is 1. otherwise 0. + * 17.12.5.10 RPERRSTS, 32:27 bit Advanced Error Interrupt Message Number. + */ +static uint8_t ioh3420_aer_vector(const PCIDevice *d) +{ +switch (msi_nr_vectors_allocated(d)) { +case 1: +return 0; +case 2: +return 1; +case 4: +case 8: +case 16: +case 32: +default: +break; +} +abort(); +return 0; +} + +static void ioh3420_aer_vector_update(PCIDevice *d) +{ +pcie_aer_root_set_vector(d, ioh3420_aer_vector(d)); +} + +static void ioh3420_write_config(PCIDevice *d, + uint32_t address, uint32_t val, int len) +{ +uint16_t sltctl = +pci_get_word(d-config + d-exp.exp_cap + PCI_EXP_SLTCTL); +uint32_t uncorsta = +pci_get_long(d-config + d-exp.aer_cap + PCI_ERR_UNCOR_STATUS); +uint32_t root_cmd = +pci_get_long(d-config + d-exp.aer_cap + PCI_ERR_ROOT_COMMAND); + +pci_bridge_write_config(d, address, val, len); +msi_write_config(d, address, val, len); +ioh3420_aer_vector_update(d); +pcie_cap_slot_write_config(d, address, val, len, sltctl); +pcie_aer_write_config(d, address, val, len, uncorsta); +pcie_aer_root_write_config(d, address, val, len, root_cmd); +pci_clear_written_write_config(d, address, val, len); +} + +static void ioh3420_reset(DeviceState *qdev) +{ +PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev); +msi_reset(d); +ioh3420_aer_vector_update(d); +pcie_cap_root_reset(d); +pcie_cap_deverr_reset(d); +pcie_cap_slot_reset(d); +pcie_aer_root_reset(d); +pci_bridge_reset_disable_base_limit(d); 
+pci_bridge_reset(qdev); +} + +static int ioh3420_initfn(PCIDevice *d) +{ +PCIBridge* br = DO_UPCAST(PCIBridge, dev, d); +PCIEPort *p = DO_UPCAST(PCIEPort, br, br); +PCIESlot *s = DO_UPCAST(PCIESlot, port, p); +int rc; + +rc = pci_bridge_initfn(d); +if (rc 0) { +return rc; +} + +d-config[PCI_REVISION_ID] = PCI_DEVICE_ID_IOH_REV; +pcie_port_init_reg(d); + +pci_config_set_vendor_id(d-config, PCI_VENDOR_ID_INTEL); +pci_config_set_device_id(d-config, PCI_DEVICE_ID_IOH_EPORT); + +rc = pci_bridge_ssvid_init(d, IOH_EP_SSVID_OFFSET, + IOH_EP_SSVID_SVID, IOH_EP_SSVID_SSID); +if (rc 0) { +return rc; +} +rc = msi_init(d,
Re: Testing of russian keymap (was Re: [Qemu-devel] [PATCH] fix '/' and '|' on russian keymap)
On Mon, Oct 18, 2010 at 01:59:15PM -0500, Anthony Liguori wrote:
> On 10/18/2010 12:30 PM, Oleg Sadov wrote:
> > I don't understand the reasons for such locale-default keyboard
> > settings in qemu either, but maybe it's useful for someone...
>
> -k only exists to deal with crappy VNC clients. If you use a good VNC
> client (like vinagre or virt-viewer) then you don't have to use -k.

Indeed you must *NOT* use -k then, because that disables the extension
that vinagre/virt-viewer rely on for sane keyboard handling.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-        http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
[Qemu-devel] qemu aborts if i add a already registered device from qemu monitor ..
Hi,

I tried to add a device to a guest from the upstream qemu monitor using
device_add. When I unknowingly tried to add an already registered device,
qemu aborted. I don't see a reason to kill the monitor. I think abort() is
a bit rough, and we need a better way to handle this. If a user tries to
add an already registered device, qemu should tell the user that the device
is already registered. An error message would be better than aborting qemu.

    QLIST_FOREACH(block, &ram_list.blocks, next) {
        if (!strcmp(block->idstr, new_block->idstr)) {
            fprintf(stderr, "RAMBlock \"%s\" already registered, abort!\n",
                    new_block->idstr);
            abort();
        }
    }

If I return some other value in the code above instead of calling abort(),
I would need to change the code for every device, which I don't want to do.
Is there a way to check whether the device is already registered at the
very beginning of the device_add call?

Thanks,
Pradeep
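One way to picture the request above is a lookup that runs before
registration and hands a duplicate back to the caller as an error code. The
following is only a minimal sketch under stated assumptions, not QEMU code:
the registry, id_is_registered() and id_register() are hypothetical names
invented for illustration, and QEMU's real RAMBlock list is laid out
differently.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical registry entry; a stand-in for QEMU's RAMBlock list. */
struct id_entry {
    char idstr[64];
    struct id_entry *next;
};

static struct id_entry *registry;

/* Return 1 if idstr is already in the registry, 0 otherwise. */
static int id_is_registered(const char *idstr)
{
    const struct id_entry *e;

    for (e = registry; e != NULL; e = e->next) {
        if (strcmp(e->idstr, idstr) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Register idstr. A duplicate is reported as -EEXIST instead of abort(). */
static int id_register(const char *idstr)
{
    struct id_entry *e;

    if (strlen(idstr) >= sizeof(e->idstr)) {
        return -ENAMETOOLONG;
    }
    if (id_is_registered(idstr)) {
        return -EEXIST;
    }
    e = calloc(1, sizeof(*e));
    if (e == NULL) {
        return -ENOMEM;
    }
    strcpy(e->idstr, idstr);
    e->next = registry;
    registry = e;
    return 0;
}
```

A caller such as a monitor command could test the return value and print an
error, leaving the process running.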
Re: [Qemu-devel] [PATCH] Add a DTrace tracing backend targetted for SystemTAP compatability
On Mon, Oct 18, 2010 at 3:04 PM, Daniel P. Berrange berra...@redhat.com wrote: This introduces a new tracing backend that targets the SystemTAP implementation of DTrace userspace tracing. The core functionality should be applicable and standard across any DTrace implementation on Solaris, OS-X, *BSD, but the Makefile rules will likely need some small additional changes to cope with OS specific build requirements. Cool, I will try this patch out shortly. Here a few comments: DTrace detection in ./configure would help users trying out --trace-backend=dtrace on systems without SystemTAP installed. Perhaps running dtrace(1) is a sufficient test? If SystemTAP is not installed then an error message from ./configure will save users time. This backend builds a little differently from the other tracing backends. Specifically there is no 'trace.c' file, because the 'dtrace' command line tool generates a '.o' file directly from the dtrace probe definition file. The probe definition is usually named with a '.d' extension but QEMU uses '.d' files for its external makefile dependancy tracking, so this uses '.dtrace' as the extension for the probe definition file. The 'tracetool' program gains the ability to generate a trace.h file for DTrace, and also to generate the trace.d file containing the dtrace probe definition, and finally a qemu.stp file which is a wrapper around the probe definition providing more convenient access from SystemTAP scripts. eg, instead of probe process(qemu).mark(qemu_malloc) { printf(Malloc %d %p\n, $arg1, $arg2); } The addition of qemu.stp to /usr/share/systemtap/tapset/ lets users write probe qemu.qemu_malloc { printf(Malloc %d %p\n, size, ptr); } * .gitignore: Ignore trace-dtrace.* * Makefile: Extra rules for generating DTrace files * Makefile.obj: Don't build trace.o for DTrace, use trace-dtrace.o generated by 'dtrace' instead * tracetool: Support for generating DTrace/SystemTAP data files Signed-off-by: Daniel P. 
Berrange berra...@redhat.com --- .gitignore | 3 + Makefile | 31 ++ Makefile.objs | 4 + tracetool | 175 - 4 files changed, 211 insertions(+), 2 deletions(-) diff --git a/.gitignore b/.gitignore index a43e4d1..0d27afd 100644 --- a/.gitignore +++ b/.gitignore @@ -4,6 +4,9 @@ config-host.* config-target.* trace.h trace.c +trace-dtrace.h +trace-dtrace.dtrace +qemu.stp *-timestamp *-softmmu *-darwin-user diff --git a/Makefile b/Makefile index 252c817..812b0d3 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,9 @@ # Makefile for QEMU. GENERATED_HEADERS = config-host.h trace.h +ifeq ($(TRACE_BACKEND),dtrace) +GENERATED_HEADERS += trace-dtrace.h +endif ifneq ($(wildcard config-host.mak),) # Put the all: rule here so that config-host.mak can contain dependencies. @@ -106,7 +109,11 @@ ui/vnc.o: QEMU_CFLAGS += $(VNC_TLS_CFLAGS) bt-host.o: QEMU_CFLAGS += $(BLUEZ_CFLAGS) +ifeq ($(TRACE_BACKEND),dtrace) +trace.h: trace.h-timestamp trace-dtrace.h +else trace.h: trace.h-timestamp +endif trace.h-timestamp: $(SRC_PATH)/trace-events config-host.mak $(call quiet-command,sh $(SRC_PATH)/tracetool --$(TRACE_BACKEND) -h $ $@, GEN trace.h) �...@cmp -s $@ trace.h || cp $@ trace.h @@ -118,6 +125,23 @@ trace.c-timestamp: $(SRC_PATH)/trace-events config-host.mak trace.o: trace.c $(GENERATED_HEADERS) +trace-dtrace.h: trace-dtrace.dtrace + $(call quiet-command,dtrace -o $@ -h -s $, GEN trace-dtrace.h) + +# Normal practice is to name DTrace probe file with a '.d' extension +# but that gets picked up by QEMU's Makefile as an external dependancy +# rule file. 
So we use '.dtrace' instead +trace-dtrace.dtrace: trace-dtrace.dtrace-timestamp +trace-dtrace.dtrace-timestamp: $(SRC_PATH)/trace-events config-host.mak + $(call quiet-command,sh $(SRC_PATH)/tracetool --$(TRACE_BACKEND) -d $ $@, GEN trace-dtrace.dtrace) + @cmp -s $@ trace-dtrace.dtrace || cp $@ trace-dtrace.dtrace +ifdef CONFIG_LINUX + $(call quiet-command,sh $(SRC_PATH)/tracetool --$(TRACE_BACKEND) -s $ qemu.stp, GEN qemu.stp) +endif + +trace-dtrace.o: trace-dtrace.dtrace $(GENERATED_HEADERS) + $(call quiet-command,dtrace -o $@ -G -s $, GEN trace-dtrace.o) + simpletrace.o: simpletrace.c $(GENERATED_HEADERS) version.o: $(SRC_PATH)/version.rc config-host.mak @@ -154,6 +178,7 @@ clean: rm -f slirp/*.o slirp/*.d audio/*.o audio/*.d block/*.o block/*.d net/*.o net/*.d fsdev/*.o fsdev/*.d ui/*.o ui/*.d rm -f qemu-img-cmds.h rm -f trace.c trace.h trace.c-timestamp trace.h-timestamp + rm -f trace-dtrace.dtrace trace-dtrace.h trace-dtrace.h-timestamp qemu.stp $(MAKE) -C tests clean for d in $(ALL_SUBDIRS) libhw32 libhw64 libuser libdis libdis-user; do \ if test -d $$d; then $(MAKE) -C $$d $@ || exit 1;
Re: [Qemu-devel] [PATCH 1/2] Add drive_get_by_id
On Mon, Oct 18, 2010 at 11:17 PM, Ryan Harper <ry...@us.ibm.com> wrote:
> Add a function to find a drive by id string.
>
> Signed-off-by: Ryan Harper <ry...@us.ibm.com>
> ---
>  blockdev.c |   12 ++++++++++++
>  blockdev.h |    1 +
>  2 files changed, 13 insertions(+), 0 deletions(-)
>
> diff --git a/blockdev.c b/blockdev.c
> index ff7602b..a00b3fa 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -75,6 +75,18 @@ DriveInfo *drive_get(BlockInterfaceType type, int bus, int unit)
>      return NULL;
>  }
>
> +DriveInfo *drive_get_by_id(const char *id)
> +{
> +    DriveInfo *dinfo;
> +
> +    QTAILQ_FOREACH(dinfo, &drives, next) {
> +        if (strcmp(id, dinfo->id))
> +            continue;

QEMU coding style uses curly braces even for 1-line if statements:

    if (strcmp(id, dinfo->id)) {
        continue;
    }

Stefan
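For reference, a stand-alone sketch of the same lookup written in the brace
style requested above. It is not the patch itself: it uses the BSD
<sys/queue.h> TAILQ macros, which QEMU's QTAILQ macros closely resemble, and
struct drive, drive_add() and drive_find_by_id() are hypothetical names
chosen for illustration.

```c
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical stand-in for QEMU's DriveInfo; only the id matters here. */
struct drive {
    const char *id;
    TAILQ_ENTRY(drive) next;
};

TAILQ_HEAD(drive_list, drive);
static struct drive_list drives = TAILQ_HEAD_INITIALIZER(drives);

/* Append a drive to the global list. */
static void drive_add(struct drive *d)
{
    TAILQ_INSERT_TAIL(&drives, d, next);
}

/* Return the first drive whose id matches, or NULL if there is none. */
static struct drive *drive_find_by_id(const char *id)
{
    struct drive *d;

    TAILQ_FOREACH(d, &drives, next) {
        if (strcmp(id, d->id)) {
            continue;
        }
        return d;
    }
    return NULL;
}
```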
[Qemu-devel] [PATCH 1/3] qdev: make qdev_find_recursive public
---
 hw/qdev.c |    2 +-
 hw/qdev.h |    1 +
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/hw/qdev.c b/hw/qdev.c
index 35858cb..d669a9d 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
@@ -477,7 +477,7 @@ static BusState *qbus_find_recursive(BusState *bus, const char *name,
     return NULL;
 }
 
-static DeviceState *qdev_find_recursive(BusState *bus, const char *id)
+DeviceState *qdev_find_recursive(BusState *bus, const char *id)
 {
     DeviceState *dev, *ret;
     BusState *child;
diff --git a/hw/qdev.h b/hw/qdev.h
index 579328a..214066e 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -177,6 +177,7 @@ void qbus_create_inplace(BusState *bus, BusInfo *info,
                          DeviceState *parent, const char *name);
 BusState *qbus_create(BusInfo *info, DeviceState *parent, const char *name);
 void qbus_free(BusState *bus);
+DeviceState *qdev_find_recursive(BusState *bus, const char *id);
 
 #define FROM_QBUS(type, dev) DO_UPCAST(type, qbus, dev)
 
-- 
1.7.3.1
[Qemu-devel] [PATCH 0/3] add usb_detach and usb_attach (v2)
This patchset uses an id, like device_del does, for attaching/detaching
USB devices.

The first two patches prepare the way and the third adds the commands:
 1. makes qdev_find_recursive non-static and declares it in qdev.h
 2. adds usb_device_by_id, which goes over the USB buses calling
    qdev_find_recursive
 3. adds the commands that use usb_device_by_id

Alon Levy (3):
  qdev: make qdev_find_recursive public
  usb: add public usb_device_by_id
  monitor: add usb_attach and usb_detach

 hmp-commands.hx |   34 ++++++++++++++++++++++++++++++++++
 hw/qdev.c       |    2 +-
 hw/qdev.h       |    1 +
 hw/usb-bus.c    |   16 ++++++++++++++++
 hw/usb.h        |    1 +
 sysemu.h        |    2 ++
 vl.c            |   31 +++++++++++++++++++++++++++++++
 7 files changed, 86 insertions(+), 1 deletions(-)

-- 
1.7.3.1
[Qemu-devel] [PATCH 2/3] usb: add public usb_device_by_id
---
 hw/usb-bus.c |   16 ++++++++++++++++
 hw/usb.h     |    1 +
 2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/hw/usb-bus.c b/hw/usb-bus.c
index b692503..d732bd3 100644
--- a/hw/usb-bus.c
+++ b/hw/usb-bus.c
@@ -189,6 +189,22 @@ int usb_device_detach(USBDevice *dev)
     return 0;
 }
 
+USBDevice *usb_device_by_id(const char* id)
+{
+    USBBus *bus;
+    DeviceState *qdev;
+    USBDevice *dev;
+
+    QTAILQ_FOREACH(bus, &busses, next) {
+        qdev = qdev_find_recursive(&bus->qbus, id);
+        if (qdev != NULL) {
+            dev = DO_UPCAST(USBDevice, qdev, qdev);
+            return dev;
+        }
+    }
+    return NULL;
+}
+
 int usb_device_delete_addr(int busnr, int addr)
 {
     USBBus *bus;
diff --git a/hw/usb.h b/hw/usb.h
index 00d2802..e70fccd 100644
--- a/hw/usb.h
+++ b/hw/usb.h
@@ -317,6 +317,7 @@ void usb_unregister_port(USBBus *bus, USBPort *port);
 int usb_device_attach(USBDevice *dev);
 int usb_device_detach(USBDevice *dev);
 int usb_device_delete_addr(int busnr, int addr);
+USBDevice *usb_device_by_id(const char* id);
 
 static inline USBBus *usb_bus_from_device(USBDevice *d)
 {
-- 
1.7.3.1
[Qemu-devel] [PATCH 3/3] monitor: add usb_attach and usb_detach
---
 hmp-commands.hx |   34 ++++++++++++++++++++++++++++++++++
 sysemu.h        |    2 ++
 vl.c            |   31 +++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+), 0 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 81999aa..660205c 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -517,6 +517,40 @@ command @code{info usb} to see the devices you can remove.
 ETEXI
 
     {
+        .name       = "usb_attach",
+        .args_type  = "id:s",
+        .params     = "device",
+        .help       = "attach USB device 'bus.addr'",
+        .mhandler.cmd = do_usb_attach,
+    },
+
+STEXI
+@item usb_attach @var{devname}
+@findex usb_attach
+
+Attach the USB device @var{devname} to the QEMU virtual USB
+hub. @var{devname} has the syntax @code{bus.addr}. Use the monitor
+command @code{info usb} to see the devices you can attach.
+ETEXI
+
+    {
+        .name       = "usb_detach",
+        .args_type  = "id:s",
+        .params     = "device",
+        .help       = "remove USB device 'bus.addr'",
+        .mhandler.cmd = do_usb_detach,
+    },
+
+STEXI
+@item usb_detach @var{devname}
+@findex usb_detach
+
+Detach the USB device @var{devname} from the QEMU virtual USB
+hub. @var{devname} has the syntax @code{bus.addr}. Use the monitor
+command @code{info usb} to see the devices you can detach.
+ETEXI
+
+    {
         .name       = "device_add",
         .args_type  = "device:O",
         .params     = "driver[,prop=value][,...]",
diff --git a/sysemu.h b/sysemu.h
index b81a70e..1dc0e58 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -182,6 +182,8 @@ extern struct soundhw soundhw[];
 
 void do_usb_add(Monitor *mon, const QDict *qdict);
 void do_usb_del(Monitor *mon, const QDict *qdict);
+void do_usb_attach(Monitor *mon, const QDict *qdict);
+void do_usb_detach(Monitor *mon, const QDict *qdict);
 void usb_info(Monitor *mon);
 
 void rtc_change_mon_event(struct tm *tm);
diff --git a/vl.c b/vl.c
index df414ef..35db6c8 100644
--- a/vl.c
+++ b/vl.c
@@ -894,6 +894,37 @@ void do_usb_del(Monitor *mon, const QDict *qdict)
     }
 }
 
+void do_usb_attach(Monitor *mon, const QDict *qdict)
+{
+    const char *id = qdict_get_str(qdict, "id");
+    USBDevice *dev;
+
+    dev = usb_device_by_id(id);
+
+    if (dev == NULL) {
+        error_report("no such USB device '%s'", id);
+        return;
+    }
+    if (usb_device_attach(dev) < 0) {
+        error_report("could not attach USB device '%s'", id);
+    }
+}
+
+void do_usb_detach(Monitor *mon, const QDict *qdict)
+{
+    const char *id = qdict_get_str(qdict, "id");
+    USBDevice *dev;
+
+    dev = usb_device_by_id(id);
+    if (dev == NULL) {
+        error_report("no such USB device '%s'", id);
+        return;
+    }
+    if (usb_device_detach(dev) < 0) {
+        error_report("could not detach USB device '%s'", id);
+    }
+}
+
 /***/
 /* PCMCIA/Cardbus */
 
-- 
1.7.3.1
[Qemu-devel] [PATCH 10/10] Add savevm/loadvm support for MCE
Port qemu-kvm's commit 1bab5d11545d8de5facf46c28630085a2f9651ae Author: Huang Ying ying.hu...@intel.com Date: Wed Mar 3 16:52:46 2010 +0800 Add savevm/loadvm support for MCE MCE registers are saved/load into/from CPUState in kvm_arch_save/load_regs. To simulate the MCG_STATUS clearing upon reset, MSR_MCG_STATUS is set to 0 for KVM_PUT_RESET_STATE. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com --- target-i386/kvm.c | 39 ++- 1 files changed, 38 insertions(+), 1 deletions(-) diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 84bd400..d940175 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -777,7 +777,7 @@ static int kvm_put_msrs(CPUState *env, int level) struct kvm_msr_entry entries[100]; } msr_data; struct kvm_msr_entry *msrs = msr_data.entries; -int n = 0; +int i, n = 0; kvm_msr_entry_set(msrs[n++], MSR_IA32_SYSENTER_CS, env-sysenter_cs); kvm_msr_entry_set(msrs[n++], MSR_IA32_SYSENTER_ESP, env-sysenter_esp); @@ -797,6 +797,18 @@ static int kvm_put_msrs(CPUState *env, int level) env-system_time_msr); kvm_msr_entry_set(msrs[n++], MSR_KVM_WALL_CLOCK, env-wall_clock_msr); } +#ifdef KVM_CAP_MCE +if (env-mcg_cap) { +if (level == KVM_PUT_RESET_STATE) +kvm_msr_entry_set(msrs[n++], MSR_MCG_STATUS, env-mcg_status); +else if (level == KVM_PUT_FULL_STATE) { +kvm_msr_entry_set(msrs[n++], MSR_MCG_STATUS, env-mcg_status); +kvm_msr_entry_set(msrs[n++], MSR_MCG_CTL, env-mcg_ctl); +for (i = 0; i (env-mcg_cap 0xff) * 4; i++) +kvm_msr_entry_set(msrs[n++], MSR_MC0_CTL + i, env-mce_banks[i]); +} +} +#endif msr_data.info.nmsrs = n; @@ -1004,6 +1016,15 @@ static int kvm_get_msrs(CPUState *env) msrs[n++].index = MSR_KVM_SYSTEM_TIME; msrs[n++].index = MSR_KVM_WALL_CLOCK; +#ifdef KVM_CAP_MCE +if (env-mcg_cap) { +msrs[n++].index = MSR_MCG_STATUS; +msrs[n++].index = MSR_MCG_CTL; +for (i = 0; i (env-mcg_cap 0xff) * 4; i++) +msrs[n++].index = MSR_MC0_CTL + i; +} +#endif + msr_data.info.nmsrs = n; ret = kvm_vcpu_ioctl(env, 
KVM_GET_MSRS, msr_data); if (ret 0) @@ -1046,6 +1067,22 @@ static int kvm_get_msrs(CPUState *env) case MSR_KVM_WALL_CLOCK: env-wall_clock_msr = msrs[i].data; break; +#ifdef KVM_CAP_MCE +case MSR_MCG_STATUS: +env-mcg_status = msrs[i].data; +break; +case MSR_MCG_CTL: +env-mcg_ctl = msrs[i].data; +break; +#endif +default: +#ifdef KVM_CAP_MCE +if (msrs[i].index = MSR_MC0_CTL +msrs[i].index MSR_MC0_CTL + (env-mcg_cap 0xff) * 4) { +env-mce_banks[msrs[i].index - MSR_MC0_CTL] = msrs[i].data; +break; +} +#endif } } -- 1.7.2.1
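The save and restore loops in this patch treat the machine-check bank
registers as one flat array: the low byte of mcg_cap is the bank count, and
each bank contributes four consecutive MSRs (CTL, STATUS, ADDR, MISC)
starting at MSR_MC0_CTL. The sketch below only illustrates that index
arithmetic. The helper names are hypothetical, and the value 0x400 for
MSR_MC0_CTL is taken from the architectural IA32_MC0_CTL definition rather
than from the patch text.

```c
#include <stdint.h>

#define MSR_MC0_CTL 0x400u /* architectural IA32_MC0_CTL; assumed here */

/* Number of bank MSRs described by mcg_cap: four per bank. */
static unsigned mce_bank_msr_count(uint64_t mcg_cap)
{
    return (unsigned)(mcg_cap & 0xff) * 4;
}

/*
 * Map an MSR index to its slot in a flat mce_banks[] style array.
 * Returns -1 if the MSR is not a bank register for this mcg_cap.
 */
static int mce_bank_slot(uint64_t mcg_cap, uint32_t msr)
{
    if (msr >= MSR_MC0_CTL &&
        msr < MSR_MC0_CTL + mce_bank_msr_count(mcg_cap)) {
        return (int)(msr - MSR_MC0_CTL);
    }
    return -1;
}
```

With ten banks, for example, slots 0 to 39 are valid and the first MSR past
the last bank is rejected.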
[Qemu-devel] [PATCH 09/10] MCE: Relay UCR MCE to guest
Port qemu-kvm's commit 4b62fff1101a7ad77553147717a8bd3bf79df7ef Author: Huang Ying ying.hu...@intel.com Date: Mon Sep 21 10:43:25 2009 +0800 MCE: Relay UCR MCE to guest UCR (uncorrected recovery) MCE is supported in recent Intel CPUs, where some hardware error such as some memory error can be reported without PCC (processor context corrupted). To recover from such MCE, the corresponding memory will be unmapped, and all processes accessing the memory will be killed via SIGBUS. For KVM, if QEMU/KVM is killed, all guest processes will be killed too. So we relay SIGBUS from host OS to guest system via a UCR MCE injection. Then guest OS can isolate corresponding memory and kill necessary guest processes only. SIGBUS sent to main thread (not VCPU threads) will be broadcast to all VCPU threads as UCR MCE. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com --- cpus.c| 82 -- kvm-stub.c|5 ++ kvm.h |3 + target-i386/cpu.h | 20 +- target-i386/helper.c |2 +- target-i386/kvm.c | 178 - target-i386/kvm_x86.h |3 +- 7 files changed, 279 insertions(+), 14 deletions(-) diff --git a/cpus.c b/cpus.c index 429993a..62de0bc 100644 --- a/cpus.c +++ b/cpus.c @@ -34,6 +34,10 @@ #include cpus.h #include compatfd.h +#ifdef CONFIG_LINUX +#include sys/prctl.h +#include sys/signalfd.h +#endif #ifdef SIGRTMIN #define SIG_IPI (SIGRTMIN+4) @@ -41,6 +45,10 @@ #define SIG_IPI SIGUSR1 #endif +#ifndef PR_MCE_KILL +#define PR_MCE_KILL 33 +#endif + static CPUState *next_cpu; /***/ @@ -498,28 +506,77 @@ static void qemu_tcg_wait_io_event(void) } } +static void sigbus_reraise(void) +{ +sigset_t set; +struct sigaction action; + +memset(action, 0, sizeof(action)); +action.sa_handler = SIG_DFL; +if (!sigaction(SIGBUS, action, NULL)) { +raise(SIGBUS); +sigemptyset(set); +sigaddset(set, SIGBUS); +sigprocmask(SIG_UNBLOCK, set, NULL); +} +perror(Failed to re-raise SIGBUS!\n); +abort(); +} + +static void sigbus_handler(int n, struct qemu_signalfd_siginfo *siginfo, + void 
*ctx) +{ +#if defined(TARGET_I386) +if (kvm_on_sigbus(siginfo-ssi_code, (void *)(intptr_t)siginfo-ssi_addr)) +#endif +sigbus_reraise(); +} + static void qemu_kvm_eat_signal(CPUState *env, int timeout) { struct timespec ts; int r, e; siginfo_t siginfo; sigset_t waitset; +sigset_t chkset; ts.tv_sec = timeout / 1000; ts.tv_nsec = (timeout % 1000) * 100; sigemptyset(waitset); sigaddset(waitset, SIG_IPI); +sigaddset(waitset, SIGBUS); -qemu_mutex_unlock(qemu_global_mutex); -r = sigtimedwait(waitset, siginfo, ts); -e = errno; -qemu_mutex_lock(qemu_global_mutex); +do { +qemu_mutex_unlock(qemu_global_mutex); -if (r == -1 !(e == EAGAIN || e == EINTR)) { -fprintf(stderr, sigtimedwait: %s\n, strerror(e)); -exit(1); -} +r = sigtimedwait(waitset, siginfo, ts); +e = errno; + +qemu_mutex_lock(qemu_global_mutex); + +if (r == -1 !(e == EAGAIN || e == EINTR)) { +fprintf(stderr, sigtimedwait: %s\n, strerror(e)); +exit(1); +} + +switch (r) { +case SIGBUS: +#ifdef TARGET_I386 +if (kvm_on_sigbus_vcpu(env, siginfo.si_code, siginfo.si_addr)) +#endif +sigbus_reraise(); +break; +default: +break; +} + +r = sigpending(chkset); +if (r == -1) { +fprintf(stderr, sigpending: %s\n, strerror(e)); +exit(1); +} +} while (sigismember(chkset, SIG_IPI) || sigismember(chkset, SIGBUS)); } static void qemu_kvm_wait_io_event(CPUState *env) @@ -645,6 +702,7 @@ static void kvm_init_ipi(CPUState *env) pthread_sigmask(SIG_BLOCK, NULL, set); sigdelset(set, SIG_IPI); +sigdelset(set, SIGBUS); r = kvm_set_signal_mask(env, set); if (r) { fprintf(stderr, kvm_set_signal_mask: %s\n, strerror(r)); @@ -655,6 +713,7 @@ static void kvm_init_ipi(CPUState *env) static sigset_t block_io_signals(void) { sigset_t set; +struct sigaction action; /* SIGUSR2 used by posix-aio-compat.c */ sigemptyset(set); @@ -665,8 +724,15 @@ static sigset_t block_io_signals(void) sigaddset(set, SIGIO); sigaddset(set, SIGALRM); sigaddset(set, SIG_IPI); +sigaddset(set, SIGBUS); pthread_sigmask(SIG_BLOCK, set, NULL); +memset(action, 0, 
sizeof(action)); +action.sa_flags = SA_SIGINFO; +action.sa_sigaction = (void (*)(int, siginfo_t*, void*))sigbus_handler; +sigaction(SIGBUS, action, NULL); +prctl(PR_MCE_KILL, 1, 1, 0, 0); + return set;
[Qemu-devel] Re: [PATCH] Implement a virtio GPU transport
On 10/10/10 16:11, Avi Kivity wrote:
> On 10/06/2010 05:59 PM, Ian Molton wrote:
> > This patch implements a virtio-based transport for use by a
> > virtualised OpenGL passthrough implementation.
> >
> > The libGL and qemu-gl code to support this patch are available here:
> >
> > http://gitorious.org/vm-gl-accel/qemu-gl
> > http://gitorious.org/vm-gl-accel/qemu-libgl
> >
> > Comments please!
>
> 1. copy qemu-devel

Ok, will do.

> and virtualization@, many virtio developers live there.

You mean virtualizat...@lists.osdl.org?

> 2. should start with a patch to the virtio-pci spec to document what
> you're doing

Where can I find that spec?

> > + /* Transfer data */
> > + if (virtqueue_add_buf(vq, sg_list, o_page, i_page, (void *)1) >= 0) {
> > +     virtqueue_kick(vq);
> > +     /* Chill out until it's done with the buffer. */
> > +     while (!virtqueue_get_buf(vq, &count))
> > +         cpu_relax();
> > + }
> > +
>
> This is pretty gross, and will burn lots of cpu if the hypervisor
> processes the queue asynchronously.

It doesn't, at present... It could be changed fairly easily without
breaking anything if that happens, though.

-Ian
Re: [Qemu-devel] [PATCH][block] qcow2: Support exact L1 table growth
On 18.10.2010 17:53, Stefan Hajnoczi wrote:
> The L1 table grow operation includes a size calculation that bumps up
> the new L1 table size in order to anticipate the size needs of vmstate
> data. This helps reduce the number of times that the L1 table has to be
> grown when vmstate data is appended.
>
> This size overhead is not necessary during image creation,
> bdrv_truncate(), or snapshot goto operations. In fact, existing
> qemu-iotests that exercise table growth are no longer able to trigger it
> because image creation preallocates an L1 table that is too large after
> changes to qcow_create2().
>
> This patch keeps the size calculation but also adds exact growth for
> callers that do not want to inflate the L1 table size unnecessarily.
>
> Signed-off-by: Stefan Hajnoczi <stefa...@linux.vnet.ibm.com>
> ---
>  block/qcow2-cluster.c  |   25 ++++++++++++++++---------
>  block/qcow2-snapshot.c |    2 +-
>  block/qcow2.c          |    2 +-
>  block/qcow2.h          |    2 +-
>  4 files changed, 19 insertions(+), 12 deletions(-)
>
> Hi Kevin,
> This patch fixes the qcow_create2() issue seen in qemu-iotests 026 with
> your kevin.git/block branch. The issue was that the L1 table size of new
> images is inflated by qcow2_grow_l1_table(). This caused the differences
> in the test, e.g. L1 table grow tests no longer worked because they
> couldn't force the table to grow (it was already more than large enough).
>
> If we use exact L1 growth in bdrv_truncate() then less image space is
> wasted and the test passes again without changes to 026.out. I think
> this patch is the way to go, not just to satisfy the test, but also
> because we don't need to overallocate L1 tables to start with.

Good that you took a look at it, I hadn't even thought of changing the
qcow2 code. I agree that this makes sense even independent of
qemu-iotests.

The patch looks good to me, too.

Kevin
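The difference between the two growth modes discussed above can be sketched
as a small size calculation. This is an illustration under assumptions, not
the patch: the function name is hypothetical, and the roughly 1.5x bump
used for the inflated case is an assumed growth rule chosen for the
example, since the thread does not quote the actual calculation.

```c
#include <stdbool.h>

/*
 * Return the L1 table size to allocate so that at least min_size entries
 * fit. With exact == true the result is exactly min_size. Otherwise the
 * current size is bumped by about 1.5x per step (assumed rule) so that
 * later appends need fewer grow operations.
 */
static int l1_target_size(int cur_size, int min_size, bool exact)
{
    int new_size;

    if (min_size <= cur_size) {
        return cur_size; /* already large enough, nothing to grow */
    }
    if (exact) {
        return min_size;
    }
    new_size = cur_size > 0 ? cur_size : 1;
    while (min_size > new_size) {
        new_size = (new_size * 3 + 1) / 2;
    }
    return new_size;
}
```

Under this rule, growing a 4-entry table to hold 10 entries yields 14
entries in the inflated mode and exactly 10 in the exact mode, which is the
kind of gap that kept the table growth tests from triggering.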
[Qemu-devel] [PATCH 05/10] Expose thread_id in info cpus
commit ce6325ff1af34dbaee91c8d28e792277e43f1227 Author: Glauber Costa gco...@redhat.com Date: Wed Mar 5 17:01:10 2008 -0300 Augment info cpus This patch exposes the thread id associated with each cpu through the already well known 'info cpus' interface. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com --- cpu-defs.h |1 + cpus.c |5 + exec.c |1 + monitor.c |4 osdep.c| 15 +++ osdep.h|1 + 6 files changed, 27 insertions(+), 0 deletions(-) diff --git a/cpu-defs.h b/cpu-defs.h index 8d4bf86..eaed43e 100644 --- a/cpu-defs.h +++ b/cpu-defs.h @@ -197,6 +197,7 @@ typedef struct CPUWatchpoint { int nr_cores; /* number of cores within this CPU package */\ int nr_threads;/* number of threads within this CPU */ \ int running; /* Nonzero if cpu is currently running(usermode). */ \ +int thread_id; \ /* user data */ \ void *opaque; \ \ diff --git a/cpus.c b/cpus.c index 3875657..429993a 100644 --- a/cpus.c +++ b/cpus.c @@ -539,6 +539,7 @@ static void *kvm_cpu_thread_fn(void *arg) qemu_mutex_lock(qemu_global_mutex); qemu_thread_self(env-thread); +env-thread_id = get_thread_id(); if (kvm_enabled()) kvm_init_vcpu(env); @@ -578,6 +579,10 @@ static void *tcg_cpu_thread_fn(void *arg) while (!qemu_system_ready) qemu_cond_timedwait(qemu_system_cond, qemu_global_mutex, 100); +for (env = first_cpu; env != NULL; env = env-next_cpu) { +env-thread_id = get_thread_id(); +} + while (1) { cpu_exec_all(); qemu_tcg_wait_io_event(); diff --git a/exec.c b/exec.c index 1fbe91c..c09051d 100644 --- a/exec.c +++ b/exec.c @@ -637,6 +637,7 @@ void cpu_exec_init(CPUState *env) env-numa_node = 0; QTAILQ_INIT(env-breakpoints); QTAILQ_INIT(env-watchpoints); +env-thread_id = get_thread_id(); *penv = env; #if defined(CONFIG_USER_ONLY) cpu_list_unlock(); diff --git a/monitor.c b/monitor.c index 260cc02..709d0fd 100644 --- a/monitor.c +++ b/monitor.c @@ -849,6 +849,9 @@ static void print_cpu_iter(QObject *obj, void *opaque) monitor_printf(mon, (halted)); } 
+monitor_printf(mon, thread_id=% PRId64 , + qdict_get_int(cpu, thread_id)); + monitor_printf(mon, \n); } @@ -893,6 +896,7 @@ static void do_info_cpus(Monitor *mon, QObject **ret_data) #elif defined(TARGET_MIPS) qdict_put(cpu, PC, qint_from_int(env-active_tc.PC)); #endif +qdict_put(cpu, thread_id, qint_from_int(env-thread_id)); qlist_append(cpu_list, cpu); } diff --git a/osdep.c b/osdep.c index 2e05b21..dda0f43 100644 --- a/osdep.c +++ b/osdep.c @@ -44,6 +44,10 @@ extern int madvise(caddr_t, size_t, int); #endif +#ifdef CONFIG_LINUX +#include sys/syscall.h +#endif + #ifdef CONFIG_EVENTFD #include sys/eventfd.h #endif @@ -200,6 +204,17 @@ int qemu_create_pidfile(const char *filename) return 0; } +int get_thread_id(void) +{ +#if defined (_WIN32) +return GetCurrentThreadId(); +#elif defined (__linux__) +return syscall(SYS_gettid); +#else +return getpid(); +#endif +} + #ifdef _WIN32 /* mingw32 needs ffs for compilations without optimization. */ diff --git a/osdep.h b/osdep.h index 6716281..9b3bc2e 100644 --- a/osdep.h +++ b/osdep.h @@ -126,6 +126,7 @@ void qemu_vfree(void *ptr); int qemu_madvise(void *addr, size_t len, int advice); int qemu_create_pidfile(const char *filename); +int get_thread_id(void); #ifdef _WIN32 int ffs(int i); -- 1.7.2.1
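The platform fallback in get_thread_id() from the patch above can be tried
outside QEMU. The following is a minimal stand-alone sketch of the same
idea. The function name is changed to get_thread_id_sketch() to make clear
it is not the QEMU function, and the header choices are assumptions for a
plain C build.

```c
#define _GNU_SOURCE /* for syscall() on glibc */
#if defined(_WIN32)
#include <windows.h>
#else
#include <unistd.h>
#endif
#if defined(__linux__)
#include <sys/syscall.h>
#endif

/* Return an OS-level identifier for the calling thread. */
static int get_thread_id_sketch(void)
{
#if defined(_WIN32)
    return (int)GetCurrentThreadId();
#elif defined(__linux__)
    /* Kernel thread id; equals the process id in the main thread. */
    return (int)syscall(SYS_gettid);
#else
    /* No portable thread id: fall back to the process id. */
    return (int)getpid();
#endif
}
```

On Linux the value matches the per-thread ids shown by tools such as
top -H, which makes it usable for pinning or inspecting individual VCPU
threads.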
[Qemu-devel] [PATCH 04/10] iothread: use signalfd
Block SIGALRM, SIGIO and consume them via signalfd. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com --- cpus.c | 74 +++ 1 files changed, 69 insertions(+), 5 deletions(-) diff --git a/cpus.c b/cpus.c index b09f5e3..3875657 100644 --- a/cpus.c +++ b/cpus.c @@ -33,6 +33,7 @@ #include exec-all.h #include cpus.h +#include compatfd.h #ifdef SIGRTMIN #define SIG_IPI (SIGRTMIN+4) @@ -329,14 +330,75 @@ static QemuCond qemu_work_cond; static void tcg_init_ipi(void); static void kvm_init_ipi(CPUState *env); -static void unblock_io_signals(void); +static sigset_t block_io_signals(void); + +/* If we have signalfd, we mask out the signals we want to handle and then + * use signalfd to listen for them. We rely on whatever the current signal + * handler is to dispatch the signals when we receive them. + */ +static void sigfd_handler(void *opaque) +{ +int fd = (unsigned long) opaque; +struct qemu_signalfd_siginfo info; +struct sigaction action; +ssize_t len; + +while (1) { +do { +len = read(fd, info, sizeof(info)); +} while (len == -1 errno == EINTR); + +if (len == -1 errno == EAGAIN) { +break; +} + +if (len != sizeof(info)) { +printf(read from sigfd returned %zd: %m\n, len); +return; +} + +sigaction(info.ssi_signo, NULL, action); +if ((action.sa_flags SA_SIGINFO) action.sa_sigaction) { +action.sa_sigaction(info.ssi_signo, +(siginfo_t *)info, NULL); +} else if (action.sa_handler) { +action.sa_handler(info.ssi_signo); +} +} +} + +static int qemu_signalfd_init(sigset_t mask) +{ +int sigfd; + +sigfd = qemu_signalfd(mask); +if (sigfd == -1) { +fprintf(stderr, failed to create signalfd\n); +return -errno; +} + +fcntl_setfl(sigfd, O_NONBLOCK); + +qemu_set_fd_handler2(sigfd, NULL, sigfd_handler, NULL, + (void *)(unsigned long) sigfd); + +return 0; +} int qemu_init_main_loop(void) { int ret; +sigset_t blocked_signals; cpu_set_debug_excp_handler(cpu_debug_handler); +blocked_signals = block_io_signals(); + +ret = 
qemu_signalfd_init(blocked_signals); +if (ret) +return ret; + +/* Note eventfd must be drained before signalfd handlers run */ ret = qemu_event_init(); if (ret) return ret; @@ -347,7 +409,6 @@ int qemu_init_main_loop(void) qemu_mutex_init(qemu_global_mutex); qemu_mutex_lock(qemu_global_mutex); -unblock_io_signals(); qemu_thread_self(io_thread); return 0; @@ -586,19 +647,22 @@ static void kvm_init_ipi(CPUState *env) } } -static void unblock_io_signals(void) +static sigset_t block_io_signals(void) { sigset_t set; +/* SIGUSR2 used by posix-aio-compat.c */ sigemptyset(set); sigaddset(set, SIGUSR2); -sigaddset(set, SIGIO); -sigaddset(set, SIGALRM); pthread_sigmask(SIG_UNBLOCK, set, NULL); sigemptyset(set); +sigaddset(set, SIGIO); +sigaddset(set, SIGALRM); sigaddset(set, SIG_IPI); pthread_sigmask(SIG_BLOCK, set, NULL); + +return set; } void qemu_mutex_lock_iothread(void) -- 1.7.2.1
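The pattern in this patch (block the signals, then consume them from a file
descriptor) can be sketched with the native Linux signalfd(2) call. This is
an illustration under assumptions, not the QEMU code: it bypasses the
compatfd wrapper, uses sigprocmask() in place of pthread_sigmask() to stay
single-threaded, and open_signal_fd() and read_one_signal() are
hypothetical helper names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Block signo and return a non-blocking signalfd for it, or -1 on error. */
static int open_signal_fd(int signo)
{
    sigset_t mask;

    sigemptyset(&mask);
    sigaddset(&mask, signo);
    if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1) {
        return -1;
    }
    return signalfd(-1, &mask, SFD_NONBLOCK | SFD_CLOEXEC);
}

/*
 * Read one queued signal from fd. Returns the signal number, 0 if nothing
 * is pending, or -1 on a short read or other error.
 */
static int read_one_signal(int fd)
{
    struct signalfd_siginfo info;
    ssize_t len;

    do {
        len = read(fd, &info, sizeof(info));
    } while (len == -1 && errno == EINTR);

    if (len == -1 && errno == EAGAIN) {
        return 0;
    }
    if (len != (ssize_t)sizeof(info)) {
        return -1;
    }
    return (int)info.ssi_signo;
}
```

Because the signal stays blocked, it is never delivered asynchronously. It
remains pending until the descriptor is read, so a main loop can handle it
like any other readable file descriptor.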
[Qemu-devel] [PATCH 00/10] [PULL] qemu-kvm.git uq/master queue
The following changes since commit 38cc9b607f85017b095793cab6c129bc9844f441:

  issue snd_pcm_start() when capturing audio (2010-10-18 00:39:06 +0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master

Huang Ying (1):
      Add RAM -> physical addr mapping in MCE simulation

Joerg Roedel (2):
      Set cpuid definition to 0 before initializing it
      Add svm cpuid features

Marcelo Tosatti (7):
      signalfd compatibility
      iothread: use signalfd
      Expose thread_id in info cpus
      kvm: x86: add mce support
      Export qemu_ram_addr_from_host
      MCE: Relay UCR MCE to guest
      Add savevm/loadvm support for MCE

 Makefile.objs         |    1 +
 compatfd.c            |  117 +++
 compatfd.h            |   43 +++
 configure             |   18 +++
 cpu-common.h          |    3 +-
 cpu-defs.h            |    1 +
 cpus.c                |  161 --
 exec-all.h            |    2 +-
 exec.c                |   27 +++--
 kvm-all.c             |   18 +++
 kvm-stub.c            |    5 +
 kvm.h                 |    6 +
 monitor.c             |    4 +
 osdep.c               |   15 +++
 osdep.h               |    1 +
 target-i386/cpu.h     |   32 +-
 target-i386/cpuid.c   |   79 ++---
 target-i386/helper.c  |    6 +
 target-i386/kvm.c     |  300 -
 target-i386/kvm_x86.h |   22
 20 files changed, 817 insertions(+), 44 deletions(-)
 create mode 100644 compatfd.c
 create mode 100644 compatfd.h
 create mode 100644 target-i386/kvm_x86.h
[Qemu-devel] Re: [PATCH] Implement a virtio GPU transport
On 10/19/2010 12:31 PM, Ian Molton wrote:
> > and virtualization@, many virtio developers live there.
>
> You mean virtualizat...@lists.osdl.org?

Yes.

> > 2. should start with a patch to the virtio-pci spec to document what
> > you're doing
>
> Where can I find that spec?

http://ozlabs.org/~rusty/virtio-spec/

> > > + /* Transfer data */
> > > + if (virtqueue_add_buf(vq, sg_list, o_page, i_page, (void *)1) >= 0) {
> > > +     virtqueue_kick(vq);
> > > +     /* Chill out until it's done with the buffer. */
> > > +     while (!virtqueue_get_buf(vq, &count))
> > > +         cpu_relax();
> > > + }
> > > +
> >
> > This is pretty gross, and will burn lots of cpu if the hypervisor
> > processes the queue asynchronously.
>
> It doesn't, at present... It could be changed fairly easily without
> breaking anything if that happens, though.

The hypervisor and the guest can be changed independently. The driver
should be coded so that it doesn't depend on hypervisor implementation
details.

-- 
error compiling committee.c: too many arguments to function
[Qemu-devel] [PATCH 06/10] kvm: x86: add mce support
Port qemu-kvm's MCE support

commit c68b2374c9048812f488e00ffb95db66c0bc07a7
Author: Huang Ying ying.hu...@intel.com
Date:   Mon Jul 20 10:00:53 2009 +0800

    Add MCE simulation support to qemu/kvm

    KVM ioctls are used to initialize MCE simulation and inject MCE. The
    real MCE simulation is implemented in Linux kernel. The Kernel part
    has been merged.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 target-i386/helper.c  |    6 +++
 target-i386/kvm.c     |   84 +
 target-i386/kvm_x86.h |   21
 3 files changed, 111 insertions(+), 0 deletions(-)
 create mode 100644 target-i386/kvm_x86.h

diff --git a/target-i386/helper.c b/target-i386/helper.c
index e134340..4b430dd 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -27,6 +27,7 @@
 #include "exec-all.h"
 #include "qemu-common.h"
 #include "kvm.h"
+#include "kvm_x86.h"
 
 //#define DEBUG_MMU
 
@@ -1030,6 +1031,11 @@ void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
     if (bank >= bank_num || !(status & MCI_STATUS_VAL))
         return;
 
+    if (kvm_enabled()) {
+        kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc);
+        return;
+    }
+
     /*
      * if MSR_MCG_CTL is not all 1s, the uncorrected error
      * reporting is disabled
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 74e7b4f..343fb02 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -27,6 +27,7 @@
 #include "hw/pc.h"
 #include "hw/apic.h"
 #include "ioport.h"
+#include "kvm_x86.h"
 
 #ifdef CONFIG_KVM_PARA
 #include <linux/kvm_para.h>
@@ -167,6 +168,67 @@ static int get_para_features(CPUState *env)
 }
 #endif
 
+#ifdef KVM_CAP_MCE
+static int kvm_get_mce_cap_supported(KVMState *s, uint64_t *mce_cap,
+                                     int *max_banks)
+{
+    int r;
+
+    r = kvm_ioctl(s, KVM_CHECK_EXTENSION, KVM_CAP_MCE);
+    if (r > 0) {
+        *max_banks = r;
+        return kvm_ioctl(s, KVM_X86_GET_MCE_CAP_SUPPORTED, mce_cap);
+    }
+    return -ENOSYS;
+}
+
+static int kvm_setup_mce(CPUState *env, uint64_t *mcg_cap)
+{
+    return kvm_vcpu_ioctl(env, KVM_X86_SETUP_MCE, mcg_cap);
+}
+
+static int kvm_set_mce(CPUState *env, struct kvm_x86_mce *m)
+{
+    return kvm_vcpu_ioctl(env, KVM_X86_SET_MCE, m);
+}
+
+struct kvm_x86_mce_data
+{
+    CPUState *env;
+    struct kvm_x86_mce *mce;
+};
+
+static void kvm_do_inject_x86_mce(void *_data)
+{
+    struct kvm_x86_mce_data *data = _data;
+    int r;
+
+    r = kvm_set_mce(data->env, data->mce);
+    if (r < 0)
+        perror("kvm_set_mce FAILED");
+}
+#endif
+
+void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
+                        uint64_t mcg_status, uint64_t addr, uint64_t misc)
+{
+#ifdef KVM_CAP_MCE
+    struct kvm_x86_mce mce = {
+        .bank = bank,
+        .status = status,
+        .mcg_status = mcg_status,
+        .addr = addr,
+        .misc = misc,
+    };
+    struct kvm_x86_mce_data data = {
+        .env = cenv,
+        .mce = &mce,
+    };
+
+    run_on_cpu(cenv, kvm_do_inject_x86_mce, &data);
+#endif
+}
+
 int kvm_arch_init_vcpu(CPUState *env)
 {
     struct {
@@ -277,6 +339,28 @@ int kvm_arch_init_vcpu(CPUState *env)
     cpuid_data.cpuid.nent = cpuid_i;
 
+#ifdef KVM_CAP_MCE
+    if (((env->cpuid_version >> 8)&0xF) >= 6
+        && (env->cpuid_features&(CPUID_MCE|CPUID_MCA)) == (CPUID_MCE|CPUID_MCA)
+        && kvm_check_extension(env->kvm_state, KVM_CAP_MCE) > 0) {
+        uint64_t mcg_cap;
+        int banks;
+
+        if (kvm_get_mce_cap_supported(env->kvm_state, &mcg_cap, &banks))
+            perror("kvm_get_mce_cap_supported FAILED");
+        else {
+            if (banks > MCE_BANKS_DEF)
+                banks = MCE_BANKS_DEF;
+            mcg_cap &= MCE_CAP_DEF;
+            mcg_cap |= banks;
+            if (kvm_setup_mce(env, &mcg_cap))
+                perror("kvm_setup_mce FAILED");
+            else
+                env->mcg_cap = mcg_cap;
+        }
+    }
+#endif
+
     return kvm_vcpu_ioctl(env, KVM_SET_CPUID2, &cpuid_data);
 }
diff --git a/target-i386/kvm_x86.h b/target-i386/kvm_x86.h
new file mode 100644
index 0000000..c1ebd24
--- /dev/null
+++ b/target-i386/kvm_x86.h
@@ -0,0 +1,21 @@
+/*
+ * QEMU KVM support
+ *
+ * Copyright (C) 2009 Red Hat Inc.
+ * Copyright IBM, Corp. 2008
+ *
+ * Authors:
+ *  Anthony Liguori   aligu...@us.ibm.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef __KVM_X86_H__
+#define __KVM_X86_H__
+
+void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
+                        uint64_t mcg_status, uint64_t addr, uint64_t misc);
+
+#endif
-- 
1.7.2.1
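For readers following the capability negotiation in kvm_arch_init_vcpu() above, here is a minimal standalone sketch of how the guest-visible MCG_CAP value appears to be composed: the bank count is clamped, the kernel-supported capability bits are masked against a default set, and the bank count goes into the low bits. EX_MCE_BANKS_DEF, EX_MCE_CAP_DEF and ex_compose_mcg_cap() are hypothetical stand-ins, not QEMU definitions, and the constant values are illustrative only.

```c
#include <stdint.h>

/* Hypothetical stand-ins for QEMU's MCE_BANKS_DEF and MCE_CAP_DEF.
 * The values are assumptions chosen for illustration. */
#define EX_MCE_BANKS_DEF 10
#define EX_MCE_CAP_DEF   (1ULL << 8)   /* e.g. only the MCG_CTL_P bit */

/* Combine what the kernel reports as supported with the bank count
 * offered to the guest. */
static uint64_t ex_compose_mcg_cap(uint64_t supported, int banks)
{
    if (banks > EX_MCE_BANKS_DEF) {
        banks = EX_MCE_BANKS_DEF;      /* never expose more than the default */
    }
    supported &= EX_MCE_CAP_DEF;       /* keep only capability bits we emulate */
    supported |= (uint64_t)banks;      /* bank count lives in the low bits */
    return supported;
}
```

With these assumed constants, a kernel that reports every capability bit and 32 banks would still yield 10 banks and a single capability bit.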
Re: [Qemu-devel] [PATCH 2/2] Fix Block Hotplug race with drive_unplug()
On Mon, Oct 18, 2010 at 11:17 PM, Ryan Harper ry...@us.ibm.com wrote:

Block hot unplug is racy since the guest is required to acknowledge the ACPI unplug event; this may not happen synchronously with the device removal command.

This series aims to close a gap whereby mgmt applications that assume the block resource has been removed without confirming that the guest has acknowledged the removal may re-assign the underlying device to a second guest, leading to data leakage.

This series introduces a new monitor command to decouple asynchronous device removal from restricting guest access to a block device. We do this by creating a new monitor command drive_unplug which maps to a bdrv_unplug() command which does a bdrv_flush() and bdrv_close(). Once complete, subsequent IO is rejected from the device and the guest will get IO errors but continue to function.

A subsequent device removal command can be issued to remove the device, to which the guest may or may not respond, but as long as the unplugged bit is set, no IO will be submitted.

Signed-off-by: Ryan Harper ry...@us.ibm.com
---
 block.c         |    6 ++
 block.h         |    1 +
 blockdev.c      |   26 ++
 blockdev.h      |    1 +
 hmp-commands.hx |   15 +++
 5 files changed, 49 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index a19374d..9fedb27 100644
--- a/block.c
+++ b/block.c
@@ -1328,6 +1328,12 @@ void bdrv_set_removable(BlockDriverState *bs, int removable)
     }
 }
 
+void bdrv_unplug(BlockDriverState *bs)
+{
+    bdrv_flush(bs);
+    bdrv_close(bs);

bdrv_flush() does not wait for pending aio requests to complete. bdrv_close() does not wait either.

A VM with a qcow2 image file and pending aio requests could bdrv_unplug() and free the qcow2 state before aio completions occur. If a completion is handled after bdrv_close(), the qcow2 in-memory state has been freed and we get memory corruption or a crash.

I think the solution is to use qemu_aio_flush() before bdrv_flush(). It waits until all pending aio requests have been completed.

Stefan
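The ordering Stefan suggests can be illustrated with a small mock, assuming a drive that only tracks a count of in-flight requests. None of the names below (ExDrive, ex_aio_drain(), ex_unplug() and so on) are QEMU APIs; they are hypothetical stand-ins for qemu_aio_flush(), bdrv_flush() and bdrv_close(), and the sketch only shows why draining must come first.

```c
#include <stdbool.h>

/* Hypothetical mock of a block device with asynchronous requests. */
typedef struct ExDrive {
    int in_flight;               /* pending async requests */
    int completed_after_close;   /* completions that ran on freed state */
    bool open;
} ExDrive;

/* A completion touching format state is only safe while the drive is open. */
static void ex_complete_one(ExDrive *d)
{
    if (!d->open) {
        d->completed_after_close++;   /* would be a use-after-free */
    }
    d->in_flight--;
}

/* Stand-in for qemu_aio_flush(): wait until nothing is pending. */
static void ex_aio_drain(ExDrive *d)
{
    while (d->in_flight > 0) {
        ex_complete_one(d);
    }
}

/* Stand-in for bdrv_flush(): writes out caches, does not wait for aio. */
static void ex_flush(ExDrive *d)
{
    (void)d;
}

/* Stand-in for bdrv_close(): frees the format driver's in-memory state. */
static void ex_close(ExDrive *d)
{
    d->open = false;
}

/* Drain first, then flush and close, so no completion sees freed state. */
static void ex_unplug(ExDrive *d)
{
    ex_aio_drain(d);
    ex_flush(d);
    ex_close(d);
}
```

If ex_aio_drain() were dropped from ex_unplug(), any later completion in this mock would increment completed_after_close, which is the crash scenario described in the reply.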
[Qemu-devel] [PATCH 07/10] Export qemu_ram_addr_from_host
To be used by next patches.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 cpu-common.h |    3 ++-
 exec-all.h   |    2 +-
 exec.c       |   26 +-
 3 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/cpu-common.h b/cpu-common.h
index 0426bc8..a543b5d 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -47,7 +47,8 @@ void qemu_ram_free(ram_addr_t addr);
 /* This should only be used for ram local to a device. */
 void *qemu_get_ram_ptr(ram_addr_t addr);
 /* This should not be used by devices. */
-ram_addr_t qemu_ram_addr_from_host(void *ptr);
+int qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr);
+ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr);
 
 int cpu_register_io_memory(CPUReadMemoryFunc * const *mem_read,
                            CPUWriteMemoryFunc * const *mem_write,
diff --git a/exec-all.h b/exec-all.h
index 3a53fe6..c457058 100644
--- a/exec-all.h
+++ b/exec-all.h
@@ -334,7 +334,7 @@ static inline tb_page_addr_t get_page_addr_code(CPUState *env1, target_ulong add
     }
     p = (void *)(unsigned long)addr
         + env1->tlb_table[mmu_idx][page_index].addend;
-    return qemu_ram_addr_from_host(p);
+    return qemu_ram_addr_from_host_nofail(p);
 }
 #endif
 
diff --git a/exec.c b/exec.c
index c09051d..9991203 100644
--- a/exec.c
+++ b/exec.c
@@ -2086,7 +2086,7 @@ static inline void tlb_update_dirty(CPUTLBEntry *tlb_entry)
     if ((tlb_entry->addr_write & ~TARGET_PAGE_MASK) == IO_MEM_RAM) {
         p = (void *)(unsigned long)((tlb_entry->addr_write & TARGET_PAGE_MASK)
             + tlb_entry->addend);
-        ram_addr = qemu_ram_addr_from_host(p);
+        ram_addr = qemu_ram_addr_from_host_nofail(p);
         if (!cpu_physical_memory_is_dirty(ram_addr)) {
             tlb_entry->addr_write |= TLB_NOTDIRTY;
         }
@@ -2939,23 +2939,31 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
     return NULL;
 }
 
-/* Some of the softmmu routines need to translate from a host pointer
-   (typically a TLB entry) back to a ram offset. */
-ram_addr_t qemu_ram_addr_from_host(void *ptr)
+int qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
 {
     RAMBlock *block;
     uint8_t *host = ptr;
 
     QLIST_FOREACH(block, &ram_list.blocks, next) {
         if (host - block->host < block->length) {
-            return block->offset + (host - block->host);
+            *ram_addr = block->offset + (host - block->host);
+            return 0;
         }
     }
+    return -1;
+}
 
-    fprintf(stderr, "Bad ram pointer %p\n", ptr);
-    abort();
+/* Some of the softmmu routines need to translate from a host pointer
+   (typically a TLB entry) back to a ram offset. */
+ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr)
+{
+    ram_addr_t ram_addr;
 
-    return 0;
+    if (qemu_ram_addr_from_host(ptr, &ram_addr)) {
+        fprintf(stderr, "Bad ram pointer %p\n", ptr);
+        abort();
+    }
+    return ram_addr;
 }
 
 static uint32_t unassigned_mem_readb(void *opaque, target_phys_addr_t addr)
@@ -3704,7 +3712,7 @@ void cpu_physical_memory_unmap(void *buffer, target_phys_addr_t len,
 {
     if (buffer != bounce.buffer) {
         if (is_write) {
-            ram_addr_t addr1 = qemu_ram_addr_from_host(buffer);
+            ram_addr_t addr1 = qemu_ram_addr_from_host_nofail(buffer);
             while (access_len) {
                 unsigned l;
                 l = TARGET_PAGE_SIZE;
-- 
1.7.2.1
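The split this patch makes, a fallible lookup plus a thin wrapper that aborts, can be sketched outside QEMU as follows. This is only an illustration under simplified assumptions: ExBlock, ex_addr_from_host() and ex_addr_from_host_nofail() are hypothetical names, the block list is a plain array instead of a QLIST, and the range check goes through uintptr_t so a pointer outside a block wraps to a large offset and is rejected.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint64_t ex_ram_addr_t;

/* Hypothetical, simplified RAM block descriptor. */
typedef struct ExBlock {
    uint8_t *host;          /* start of the host mapping */
    ex_ram_addr_t offset;   /* offset of this block in the ram address space */
    size_t length;          /* size of the block in bytes */
} ExBlock;

/* Fallible variant: returns 0 and fills *out on success, -1 if ptr is
 * not inside any block. Callers that can recover use this one. */
static int ex_addr_from_host(const ExBlock *blocks, size_t n,
                             const void *ptr, ex_ram_addr_t *out)
{
    size_t i;

    for (i = 0; i < n; i++) {
        /* Unsigned subtraction: pointers below the block wrap around. */
        uintptr_t off = (uintptr_t)ptr - (uintptr_t)blocks[i].host;

        if (off < blocks[i].length) {
            *out = blocks[i].offset + off;
            return 0;
        }
    }
    return -1;
}

/* Wrapper for callers that treat a bad pointer as a fatal internal error. */
static ex_ram_addr_t ex_addr_from_host_nofail(const ExBlock *blocks, size_t n,
                                              const void *ptr)
{
    ex_ram_addr_t addr;

    if (ex_addr_from_host(blocks, n, ptr, &addr)) {
        fprintf(stderr, "Bad ram pointer %p\n", ptr);
        abort();
    }
    return addr;
}
```

The design choice is that existing callers keep the abort-on-failure behaviour through the wrapper, while new callers (presumably the later MCE patches) can test the return value and handle a pointer that is not guest RAM.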
[Qemu-devel] [PATCH 08/10] Add RAM - physical addr mapping in MCE simulation
From: Huang Ying ying.hu...@intel.com

In QEMU-KVM, physical address != RAM address, and MCE simulation needs the physical address rather than the RAM address. So kvm_physical_memory_addr_from_ram() is implemented to do the conversion, and it is invoked before the address is filled into the IA32_MCi_ADDR MSR.

Reported-by: Dean Nelson dnel...@redhat.com
Signed-off-by: Huang Ying ying.hu...@intel.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 kvm-all.c |   18 ++
 kvm.h     |    3 +++
 2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 1cc696f..37b99c7 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -137,6 +137,24 @@ static KVMSlot *kvm_lookup_overlapping_slot(KVMState *s,
     return found;
 }
 
+int kvm_physical_memory_addr_from_ram(KVMState *s, ram_addr_t ram_addr,
+                                      target_phys_addr_t *phys_addr)
+{
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
+        KVMSlot *mem = &s->slots[i];
+
+        if (ram_addr >= mem->phys_offset &&
+            ram_addr < mem->phys_offset + mem->memory_size) {
+            *phys_addr = mem->start_addr + (ram_addr - mem->phys_offset);
+            return 1;
+        }
+    }
+
+    return 0;
+}
+
 static int kvm_set_user_memory_region(KVMState *s, KVMSlot *slot)
 {
     struct kvm_userspace_memory_region mem;
diff --git a/kvm.h b/kvm.h
index 50b6c01..8f5a754 100644
--- a/kvm.h
+++ b/kvm.h
@@ -174,6 +174,9 @@ static inline void cpu_synchronize_post_init(CPUState *env)
     }
 }
 
+int kvm_physical_memory_addr_from_ram(KVMState *s, ram_addr_t ram_addr,
+                                      target_phys_addr_t *phys_addr);
+
 #endif
 int kvm_set_ioeventfd_mmio_long(int fd, uint32_t adr, uint32_t val, bool assign);
-- 
1.7.2.1
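As a worked illustration of the conversion, the following standalone sketch walks a small slot table and maps a RAM offset to a guest physical address. ExSlot and ex_phys_from_ram() are hypothetical simplifications of KVMSlot and kvm_physical_memory_addr_from_ram(); the field names mirror the patch, the table contents are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified memory slot. */
typedef struct ExSlot {
    uint64_t start_addr;    /* guest physical base of the slot */
    uint64_t memory_size;   /* slot size in bytes */
    uint64_t phys_offset;   /* RAM offset backing the slot */
} ExSlot;

/* Returns 1 and fills *phys when ram_addr falls inside a slot, else 0.
 * Note the convention: 1 means "found", unlike the 0-on-success
 * convention of qemu_ram_addr_from_host() in the previous patch. */
static int ex_phys_from_ram(const ExSlot *slots, size_t n,
                            uint64_t ram_addr, uint64_t *phys)
{
    size_t i;

    for (i = 0; i < n; i++) {
        const ExSlot *mem = &slots[i];

        if (ram_addr >= mem->phys_offset &&
            ram_addr < mem->phys_offset + mem->memory_size) {
            *phys = mem->start_addr + (ram_addr - mem->phys_offset);
            return 1;
        }
    }
    return 0;
}
```

For example, a slot mapping RAM offset 0x2000 (size 0x1000) at guest physical 0x100000000 translates RAM address 0x2010 to 0x100000010, while RAM address 0x3000 is just past the slot and is not found.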
[Qemu-devel] [PATCH 01/10] Set cpuid definition to 0 before initializing it
From: Joerg Roedel joerg.roe...@amd.com

This patch zeroes the (stack-allocated) cpuid definition before actually initializing it.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 target-i386/cpuid.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
index 04ba8d5..3fcf78f 100644
--- a/target-i386/cpuid.c
+++ b/target-i386/cpuid.c
@@ -788,6 +788,8 @@ int cpu_x86_register (CPUX86State *env, const char *cpu_model)
 {
     x86_def_t def1, *def = &def1;
 
+    memset(def, 0, sizeof(*def));
+
     if (cpu_x86_find_by_name(def, cpu_model) < 0)
         return -1;
     if (def->vendor1) {
-- 
1.7.2.1
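Why the memset matters can be shown with a reduced example, assuming a lookup routine that fills in only some fields of the definition. ExCpuDef, ex_find_by_name() and ex_register() are hypothetical names invented for this sketch; only the zero-before-fill pattern is taken from the patch.

```c
#include <string.h>

/* Hypothetical, much reduced CPU definition. */
typedef struct ExCpuDef {
    const char *name;
    unsigned vendor1;
    unsigned family;
    unsigned model;
} ExCpuDef;

/* Fills in only the fields it knows about; vendor1 and model are left
 * untouched, as a partially initializing lookup might do. */
static int ex_find_by_name(ExCpuDef *def, const char *name)
{
    if (strcmp(name, "ex-cpu") != 0) {
        return -1;
    }
    def->name = name;
    def->family = 6;
    return 0;
}

static int ex_register(const char *name, ExCpuDef *out)
{
    ExCpuDef def1, *def = &def1;

    /* Without this, untouched fields would hold stack garbage, and a
     * later test such as "if (def->vendor1)" would be unpredictable. */
    memset(def, 0, sizeof(*def));

    if (ex_find_by_name(def, name) < 0) {
        return -1;
    }
    *out = *def;
    return 0;
}
```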
[Qemu-devel] [PATCH 03/10] signalfd compatibility
Port qemu-kvm's signalfd compat code. commit 5a7fdd0abd7cd24dac205317a4195446ab8748b5 Author: Anthony Liguori aligu...@us.ibm.com Date: Wed May 7 11:55:47 2008 -0500 Use signalfd() in io-thread This patch reworks the IO thread to use signalfd() instead of sigtimedwait() This will eliminate the need to use SIGIO everywhere. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com --- Makefile.objs |1 + compatfd.c| 117 + compatfd.h| 43 + configure | 18 + 4 files changed, 179 insertions(+), 0 deletions(-) create mode 100644 compatfd.c create mode 100644 compatfd.h diff --git a/Makefile.objs b/Makefile.objs index 816194a..d73002d 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -125,6 +125,7 @@ common-obj-y += $(addprefix ui/, $(ui-obj-y)) common-obj-y += iov.o acl.o common-obj-$(CONFIG_THREAD) += qemu-thread.o +common-obj-$(CONFIG_IOTHREAD) += compatfd.o common-obj-y += notify.o event_notifier.o common-obj-y += qemu-timer.o diff --git a/compatfd.c b/compatfd.c new file mode 100644 index 000..a7cebc4 --- /dev/null +++ b/compatfd.c @@ -0,0 +1,117 @@ +/* + * signalfd/eventfd compatibility + * + * Copyright IBM, Corp. 2008 + * + * Authors: + * Anthony Liguori aligu...@us.ibm.com + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. 
+ * + */ + +#include qemu-common.h +#include compatfd.h + +#include sys/syscall.h +#include pthread.h + +struct sigfd_compat_info +{ +sigset_t mask; +int fd; +}; + +static void *sigwait_compat(void *opaque) +{ +struct sigfd_compat_info *info = opaque; +int err; +sigset_t all; + +sigfillset(all); +sigprocmask(SIG_BLOCK, all, NULL); + +do { +siginfo_t siginfo; + +err = sigwaitinfo(info-mask, siginfo); +if (err == -1 errno == EINTR) { +err = 0; +continue; +} + +if (err 0) { +char buffer[128]; +size_t offset = 0; + +memcpy(buffer, err, sizeof(err)); +while (offset sizeof(buffer)) { +ssize_t len; + +len = write(info-fd, buffer + offset, +sizeof(buffer) - offset); +if (len == -1 errno == EINTR) +continue; + +if (len = 0) { +err = -1; +break; +} + +offset += len; +} +} +} while (err = 0); + +return NULL; +} + +static int qemu_signalfd_compat(const sigset_t *mask) +{ +pthread_attr_t attr; +pthread_t tid; +struct sigfd_compat_info *info; +int fds[2]; + +info = malloc(sizeof(*info)); +if (info == NULL) { +errno = ENOMEM; +return -1; +} + +if (pipe(fds) == -1) { +free(info); +return -1; +} + +qemu_set_cloexec(fds[0]); +qemu_set_cloexec(fds[1]); + +memcpy(info-mask, mask, sizeof(*mask)); +info-fd = fds[1]; + +pthread_attr_init(attr); +pthread_attr_setdetachstate(attr, PTHREAD_CREATE_DETACHED); + +pthread_create(tid, attr, sigwait_compat, info); + +pthread_attr_destroy(attr); + +return fds[0]; +} + +int qemu_signalfd(const sigset_t *mask) +{ +#if defined(CONFIG_SIGNALFD) +int ret; + +ret = syscall(SYS_signalfd, -1, mask, _NSIG / 8); +if (ret != -1) { +qemu_set_cloexec(ret); +return ret; +} +#endif + +return qemu_signalfd_compat(mask); +} diff --git a/compatfd.h b/compatfd.h new file mode 100644 index 000..fc37915 --- /dev/null +++ b/compatfd.h @@ -0,0 +1,43 @@ +/* + * signalfd/eventfd compatibility + * + * Copyright IBM, Corp. 2008 + * + * Authors: + * Anthony Liguori aligu...@us.ibm.com + * + * This work is licensed under the terms of the GNU GPL, version 2. 
See + * the COPYING file in the top-level directory. + * + */ + +#ifndef QEMU_COMPATFD_H +#define QEMU_COMPATFD_H + +#include signal.h + +struct qemu_signalfd_siginfo { +uint32_t ssi_signo; /* Signal number */ +int32_t ssi_errno; /* Error number (unused) */ +int32_t ssi_code;/* Signal code */ +uint32_t ssi_pid; /* PID of sender */ +uint32_t ssi_uid; /* Real UID of sender */ +int32_t ssi_fd; /* File descriptor (SIGIO) */ +uint32_t ssi_tid; /* Kernel timer ID (POSIX timers) */ +uint32_t ssi_band;/* Band event (SIGIO) */ +uint32_t ssi_overrun; /* POSIX timer overrun count */ +uint32_t ssi_trapno; /* Trap number that caused signal */ +int32_t ssi_status; /* Exit status or signal (SIGCHLD) */ +int32_t ssi_int; /* Integer sent by sigqueue(2) */ +uint64_t ssi_ptr; /* Pointer sent by sigqueue(2) */ +uint64_t ssi_utime; /* User CPU time consumed (SIGCHLD) */ +uint64_t ssi_stime; /* System CPU
[Qemu-devel] Re: [PATCH v5 00/14] pcie port switch emulators
On Tue, Oct 19, 2010 at 06:06:27PM +0900, Isaku Yamahata wrote:
On uncorrectable error status register in pcie_aer_write_config(). The register is RW1CS, so making it writable and test-and-clear doesn't work.

Sure. But isn't this what w1cmask implements?

Also - mail to ad...@khaleel.us seems to bounce. I stripped it from the Cc list for now.

-- 
MST
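To make the RW1C point concrete, here is a small sketch of a config-space write that honours both a plain writable mask and a write-1-to-clear mask. ex_config_write() is a hypothetical helper operating on a single 32-bit register value; it follows the usual wmask/w1cmask idea but is not copied from any tree, so treat the exact form as an assumption.

```c
#include <stdint.h>

/* Compute the new register value after the guest writes `val`.
 * wmask:   bits the guest may set or clear directly.
 * w1cmask: bits that are cleared when the guest writes 1 to them and
 *          left alone when it writes 0. */
static uint32_t ex_config_write(uint32_t old, uint32_t val,
                                uint32_t wmask, uint32_t w1cmask)
{
    uint32_t next = (old & ~wmask) | (val & wmask);  /* ordinary RW bits */

    next &= ~(val & w1cmask);                        /* W1C bits */
    return next;
}
```

With status bits 0xA latched and the whole nibble marked W1C, writing 0x2 clears only bit 1 and leaves 0x8. If the same nibble were instead marked plainly writable, the same write would store 0x2, which both loses bit 3 and leaves bit 1 set; that is the failure mode Isaku describes.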
[Qemu-devel] Re: [PATCH v5 07/14] pcie: helper functions for pcie capability and extended capability
On Tue, Oct 19, 2010 at 06:06:34PM +0900, Isaku Yamahata wrote: This patch implements helper functions for pci express capability and pci express extended capability allocation. NOTE: presence detection depends on pci_qdev_init() change. Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- Changes v4 - v5: - dropped FLR related members. This will be addressed at the next phase. - use pci_xxx_test_and_xxx_mask(). - drop PCIDevice::written bits. and made related registers writable. - eliminated pcie_cap_slot_notify() - introduced PCIExpressDevice::hpev_intx Please also add a comment saying that hpev_intx field defaults to 0, and needs to be changed by devices that want to use another interrupt for hotplug events. Changes v3 - v4: - various clean up - dropped pcie_notify(), pcie_del_capability() - use pci_{clear_set, clear}_bit_xxx() helper functions. - dropped pci_exp_cap() --- Makefile.objs |1 + hw/pci.h |5 + hw/pcie.c | 540 + hw/pcie.h | 107 qemu-common.h |1 + 5 files changed, 654 insertions(+), 0 deletions(-) create mode 100644 hw/pcie.c create mode 100644 hw/pcie.h diff --git a/Makefile.objs b/Makefile.objs index 5f5a4c5..eeb5134 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -186,6 +186,7 @@ hw-obj-$(CONFIG_PIIX4) += piix4.o # PCI watchdog devices hw-obj-y += wdt_i6300esb.o +hw-obj-y += pcie.o hw-obj-y += msix.o msi.o # PCI network cards diff --git a/hw/pci.h b/hw/pci.h index 9e2f27d..d6c522b 100644 --- a/hw/pci.h +++ b/hw/pci.h @@ -9,6 +9,8 @@ /* PCI includes legacy ISA access. 
*/ #include isa.h +#include pcie.h + /* PCI bus */ #define PCI_DEVFN(slot, func) slot) 0x1f) 3) | ((func) 0x07)) @@ -175,6 +177,9 @@ struct PCIDevice { /* Offset of MSI capability in config space */ uint8_t msi_cap; +/* PCI Express */ +PCIExpressDevice exp; + /* Location of option rom */ char *romfile; ram_addr_t rom_offset; diff --git a/hw/pcie.c b/hw/pcie.c new file mode 100644 index 000..53d1fce --- /dev/null +++ b/hw/pcie.c @@ -0,0 +1,540 @@ +/* + * pcie.c + * + * Copyright (c) 2010 Isaku Yamahata yamahata at valinux co jp + *VA Linux Systems Japan K.K. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see http://www.gnu.org/licenses/. + */ + +#include sysemu.h +#include pci_bridge.h +#include pcie.h +#include msix.h +#include msi.h +#include pci_internals.h +#include pcie_regs.h + +//#define DEBUG_PCIE +#ifdef DEBUG_PCIE +# define PCIE_DPRINTF(fmt, ...) \ +fprintf(stderr, %s:%d fmt, __func__, __LINE__, ## __VA_ARGS__) +#else +# define PCIE_DPRINTF(fmt, ...) do {} while (0) +#endif +#define PCIE_DEV_PRINTF(dev, fmt, ...) 
\ +PCIE_DPRINTF(%s:%x fmt, (dev)-name, (dev)-devfn, ## __VA_ARGS__) + + +/*** + * pci express capability helper functions + */ +int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type, uint8_t port) +{ +int pos; +uint8_t *exp_cap; + +assert(pci_is_express(dev)); + +pos = pci_add_capability(dev, PCI_CAP_ID_EXP, offset, + PCI_EXP_VER2_SIZEOF); +if (pos 0) { +return pos; +} +dev-exp.exp_cap = pos; +exp_cap = dev-config + pos; + +/* capability register + interrupt message number defaults to 0 */ +pci_set_word(exp_cap + PCI_EXP_FLAGS, + ((type PCI_EXP_FLAGS_TYPE_SHIFT) PCI_EXP_FLAGS_TYPE) | + PCI_EXP_FLAGS_VER2); + +/* device capability register + * table 7-12: + * roll based error reporting bit must be set by all + * Functions conforming to the ECN, PCI Express Base + * Specification, Revision 1.1., or subsequent PCI Express Base + * Specification revisions. + */ +pci_set_long(exp_cap + PCI_EXP_DEVCAP, PCI_EXP_DEVCAP_RBER); + +pci_set_long(exp_cap + PCI_EXP_LNKCAP, + (port PCI_EXP_LNKCAP_PN_SHIFT) | + PCI_EXP_LNKCAP_ASPMS_0S | + PCI_EXP_LNK_MLW_1 | + PCI_EXP_LNK_LS_25); + +pci_set_word(exp_cap + PCI_EXP_LNKSTA, +
[Qemu-devel] Re: [PATCH v5 00/14] pcie port switch emulators
On Tue, Oct 19, 2010 at 06:06:27PM +0900, Isaku Yamahata wrote:
Here is v5 of the pcie patch series. I hope I addressed the blockers.
On uncorrectable error status register in pcie_aer_write_config(). The register is RW1CS, so making it writable and test-and-clear doesn't work.

new patches: 1, 2
updated patches except trivial change: 4, 7, 8

BTW, as 0.13 is released, any chance to sync pci branch with the upstream by requesting pull?

Patch description:
This patch series implements pcie port switch emulators which is basic part for pcie/q35 support. This is for mst/pci tree.

changes v4 - v5:
- introduced pci_xxx_test_and_clear/set_mask
- eliminated xxx_notify(msi_trigger, int_level)
- eliminated FLR bits. FLR will be addressed at the next phase.

changes v3 - v4:
- introduced new pci config helper functions. (clear set bit)
- various clean up and some bug fixes.
- dropped pci_shift_xxx().
- dropped function pointer in pcie_aer.h
- dropped pci_exp_cap(), pcie_aer_cap().
- file rename (pcie_{root, upstream, downstream} => ioh3420, x3130).

changes v2 - v3:
- msi: improved comment and simplified shift/ffs dance
- pci w1c config register framework
- split pcie.[ch] into pcie_regs.h, pcie.[ch] and pcie_aer.[ch]
- pcie, aer: many changes by following reviews.

changes v1 - v2:
- update msi
- dropped already pushed out patches.
- added msix patches.

Isaku Yamahata (14):
  pci: introduce helper functions to test-and-{clear, set} mask in configuration space
  pci: introduce helper function to handle msi-x and msi.
  pci: use pci_word_test_and_clear_mask() in pci_device_reset()
  pci/bridge: fix pci_bridge_reset()
  msi: implements msi
  pcie: add pcie constants to pcie_regs.h
  pcie: helper functions for pcie capability and extended capability

I'll apply these.

  pcie/aer: helper functions for pcie aer capability

Maybe move this to the end of the series?
pcie port: define struct PCIEPort/PCIESlot and helper functions ioh3420: pcie root port in X58 ioh x3130: pcie upstream port x3130: pcie downstream port pcie/hotplug: introduce pushing attention button command I think the above can be applied - just remove the dependency on aer for now. pcie/aer: glue aer error injection into qemu monitor Makefile.objs |4 +- hw/ioh3420.c| 229 + hw/ioh3420.h| 10 + hw/msi.c| 352 +++ hw/msi.h| 41 +++ hw/pci.c| 24 ++- hw/pci.h| 88 +- hw/pci_bridge.c | 57 +++- hw/pci_bridge.h |2 + hw/pcie.c | 540 + hw/pcie.h | 113 ++ hw/pcie_aer.c | 869 +++ hw/pcie_aer.h | 105 ++ hw/pcie_port.c | 198 +++ hw/pcie_port.h | 51 +++ hw/pcie_regs.h | 154 + hw/xio3130_downstream.c | 197 +++ hw/xio3130_downstream.h | 11 + hw/xio3130_upstream.c | 181 ++ hw/xio3130_upstream.h | 10 + qemu-common.h |6 + qemu-monitor.hx | 36 ++ sysemu.h|9 + 23 files changed, 3272 insertions(+), 15 deletions(-) create mode 100644 hw/ioh3420.c create mode 100644 hw/ioh3420.h create mode 100644 hw/msi.c create mode 100644 hw/msi.h create mode 100644 hw/pcie.c create mode 100644 hw/pcie.h create mode 100644 hw/pcie_aer.c create mode 100644 hw/pcie_aer.h create mode 100644 hw/pcie_port.c create mode 100644 hw/pcie_port.h create mode 100644 hw/pcie_regs.h create mode 100644 hw/xio3130_downstream.c create mode 100644 hw/xio3130_downstream.h create mode 100644 hw/xio3130_upstream.c create mode 100644 hw/xio3130_upstream.h
Re: [Qemu-devel] qemu aborts if i add a already registered device from qemu monitor ..
On Tue, 19 Oct 2010 15:27:37 +0530 pradeep psuri...@linux.vnet.ibm.com wrote:
Hi
I tried to add a device to guest from upstream qemu monitor using device_add.

Are you developing a new device or does it happen with existing ones? If it's the latter, can you describe steps to reproduce it?

Unknowingly I tried to add an already registered device from the qemu monitor, and qemu aborted. I don't see a reason to kill the monitor. I think abort() is a bit rough; we need a better way to handle it. If a user tries to add an already registered device, qemu should tell the user that the device is already registered. An error message would be better than aborting qemu.

    QLIST_FOREACH(block, &ram_list.blocks, next) {
        if (!strcmp(block->idstr, new_block->idstr)) {
            fprintf(stderr, "RAMBlock \"%s\" already registered, abort!\n",
                    new_block->idstr);
            abort();
        }
    }

If I return some other value in the above code instead of abort(), I would need to change the code for every device, which I don't want to do. Is there a way to check, at the very beginning of the device_add call, whether the device is already registered?

Thanks
Pradeep
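One possible shape of the up-front check being asked about is sketched below. It is not how QEMU resolves this; ExRegistry and ex_registry_add() are hypothetical, and the registry is a plain fixed-size array. The only point is that a duplicate id can be reported to the caller as an error code, which a monitor command could turn into a message, instead of calling abort().

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define EX_MAX_IDS 8

/* Hypothetical registry of already registered ids. */
typedef struct ExRegistry {
    const char *ids[EX_MAX_IDS];
    size_t count;
} ExRegistry;

/* Returns 0 on success, -EEXIST if the id is already registered, or
 * -ENOSPC if the table is full. Nothing is modified on failure. */
static int ex_registry_add(ExRegistry *r, const char *id)
{
    size_t i;

    for (i = 0; i < r->count; i++) {
        if (strcmp(r->ids[i], id) == 0) {
            return -EEXIST;      /* report instead of abort() */
        }
    }
    if (r->count == EX_MAX_IDS) {
        return -ENOSPC;
    }
    r->ids[r->count++] = id;
    return 0;
}
```

A caller would check for -EEXIST before doing any device setup and print something like "device already registered" to the user.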
[Qemu-devel] Re: [PATCH v5 04/14] pci/bridge: fix pci_bridge_reset()
On Tue, Oct 19, 2010 at 06:06:31PM +0900, Isaku Yamahata wrote: The default value of base/limit registers aren't specified in the spec. So pci_bridge_reset() shouldn't touch them. Instead, introduced two functions to reset those registers in a way of typical implementation. zero base/limit registers or disable forwarding. They will be used later. Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- I have some second thoughts: 1. pci_bridge_reset is used in several devices and I have no idea what the reset change will do there, how do real devices behave or whether guests depend on a specific behaviour. It seems harmless to leave the current implementation in place, and simply add pci_bridge_disable_base_limit which devices can call after pci_bridge_reset. 2. _zero and _disable describe what function does already, so we should drop _reset from function name. Thus we only get 1 new function, pci_bridge_disable_base_limit. Changes v4 - v5: - drop the lines in pci_bridge_reset() - introduced two functions to reset base/limit registers. 
--- hw/pci_bridge.c | 57 +++--- hw/pci_bridge.h |2 + 2 files changed, 51 insertions(+), 8 deletions(-) diff --git a/hw/pci_bridge.c b/hw/pci_bridge.c index 638e3b3..de75e6a 100644 --- a/hw/pci_bridge.c +++ b/hw/pci_bridge.c @@ -151,6 +151,46 @@ void pci_bridge_write_config(PCIDevice *d, } } +void pci_bridge_reset_zero_base_limit(PCIDevice *dev) +{ +uint8_t *conf = dev-config; + +pci_byte_test_and_clear_mask(conf + PCI_IO_BASE, + PCI_IO_RANGE_MASK 0xff); +pci_byte_test_and_clear_mask(conf + PCI_IO_LIMIT, + PCI_IO_RANGE_MASK 0xff); +pci_word_test_and_clear_mask(conf + PCI_MEMORY_BASE, + PCI_MEMORY_RANGE_MASK 0x); +pci_word_test_and_clear_mask(conf + PCI_MEMORY_LIMIT, + PCI_MEMORY_RANGE_MASK 0x); +pci_word_test_and_clear_mask(conf + PCI_PREF_MEMORY_BASE, + PCI_PREF_RANGE_MASK 0x); +pci_word_test_and_clear_mask(conf + PCI_PREF_MEMORY_LIMIT, + PCI_PREF_RANGE_MASK 0x); +pci_set_word(conf + PCI_PREF_BASE_UPPER32, 0); +pci_set_word(conf + PCI_PREF_LIMIT_UPPER32, 0); +} + +void pci_bridge_reset_disable_base_limit(PCIDevice *dev) +{ +uint8_t *conf = dev-config; + +pci_byte_test_and_set_mask(conf + PCI_IO_BASE, + PCI_IO_RANGE_MASK 0xff); +pci_byte_test_and_clear_mask(conf + PCI_IO_LIMIT, + PCI_IO_RANGE_MASK 0xff); +pci_word_test_and_set_mask(conf + PCI_MEMORY_BASE, + PCI_MEMORY_RANGE_MASK 0x); +pci_word_test_and_clear_mask(conf + PCI_MEMORY_LIMIT, + PCI_MEMORY_RANGE_MASK 0x); +pci_word_test_and_set_mask(conf + PCI_PREF_MEMORY_BASE, + PCI_PREF_RANGE_MASK 0x); +pci_word_test_and_clear_mask(conf + PCI_PREF_MEMORY_LIMIT, + PCI_PREF_RANGE_MASK 0x); +pci_set_word(conf + PCI_PREF_BASE_UPPER32, 0); +pci_set_word(conf + PCI_PREF_LIMIT_UPPER32, 0); +} + /* reset bridge specific configuration registers */ void pci_bridge_reset_reg(PCIDevice *dev) { @@ -161,14 +201,15 @@ void pci_bridge_reset_reg(PCIDevice *dev) conf[PCI_SUBORDINATE_BUS] = 0; conf[PCI_SEC_LATENCY_TIMER] = 0; -conf[PCI_IO_BASE] = 0; -conf[PCI_IO_LIMIT] = 0; -pci_set_word(conf + PCI_MEMORY_BASE, 0); -pci_set_word(conf + 
PCI_MEMORY_LIMIT, 0); -pci_set_word(conf + PCI_PREF_MEMORY_BASE, 0); -pci_set_word(conf + PCI_PREF_MEMORY_LIMIT, 0); -pci_set_word(conf + PCI_PREF_BASE_UPPER32, 0); -pci_set_word(conf + PCI_PREF_LIMIT_UPPER32, 0); +/* + * the default values for base/limit registers aren't specified + * in the PCI-to-PCI-bridge spec. So we don't thouch them here. + * Each implementation can override it. + * typical implementation does + * - zero registers: pci_bridge_reset_zer_base_limit() + * or + * - disable forwarding: pci_bridge_reset_disable_base_limit() + */ pci_set_word(conf + PCI_BRIDGE_CONTROL, 0); } diff --git a/hw/pci_bridge.h b/hw/pci_bridge.h index f6fade0..2359684 100644 --- a/hw/pci_bridge.h +++ b/hw/pci_bridge.h @@ -39,6 +39,8 @@ pcibus_t pci_bridge_get_limit(const PCIDevice *bridge, uint8_t type); void pci_bridge_write_config(PCIDevice *d, uint32_t address, uint32_t val, int len); +void pci_bridge_reset_zero_base_limit(PCIDevice *dev); +void pci_bridge_reset_disable_base_limit(PCIDevice *dev); void pci_bridge_reset_reg(PCIDevice *dev); void pci_bridge_reset(DeviceState *qdev); -- 1.7.1.1
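The difference between the two helpers discussed in this thread (zeroing the base/limit registers versus disabling forwarding) can be seen on the 8-bit I/O base/limit pair alone. The sketch below assumes the usual PCI-to-PCI bridge encoding, where the upper nibble of each register holds address bits 15:12 and the limit's low 12 bits read as 0xfff. EX_IO_RANGE_MASK and the ex_* functions are hypothetical names, not the helpers from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: upper nibble = address bits 15:12, lower nibble =
 * read-only capability bits that must be preserved. */
#define EX_IO_RANGE_MASK 0xf0u

static uint32_t ex_io_base(uint8_t reg)
{
    return (uint32_t)(reg & EX_IO_RANGE_MASK) << 8;
}

static uint32_t ex_io_limit(uint8_t reg)
{
    return ((uint32_t)(reg & EX_IO_RANGE_MASK) << 8) | 0xfffu;
}

/* The bridge forwards I/O only while base <= limit. */
static bool ex_io_window_enabled(uint8_t base_reg, uint8_t limit_reg)
{
    return ex_io_base(base_reg) <= ex_io_limit(limit_reg);
}

/* "Zero" variant: clears the range bits of both registers. */
static void ex_zero_base_limit(uint8_t *base, uint8_t *limit)
{
    *base &= (uint8_t)~EX_IO_RANGE_MASK;
    *limit &= (uint8_t)~EX_IO_RANGE_MASK;
}

/* "Disable" variant: base gets all range bits set, limit gets them
 * cleared, so base > limit and nothing is forwarded. */
static void ex_disable_base_limit(uint8_t *base, uint8_t *limit)
{
    *base |= EX_IO_RANGE_MASK;
    *limit &= (uint8_t)~EX_IO_RANGE_MASK;
}
```

Under these assumptions, zeroing still leaves a live window covering 0x0000 to 0x0fff, whereas the disable variant yields base 0xf000 and limit 0x0fff, which is an empty window. That may be part of why the review prefers keeping only the disable variant as an explicit opt-in for devices.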
Re: [Qemu-devel] [Tracing][v4 PATCH 2/2] Add documentation for QMP interfaces
On 10/19/2010 11:57 AM, Prerna Saxena wrote: [PATCH 2/2] Add documentation for QMP commands: - query-trace - query-trace-events - query-trace-file. I've been trying ways to avoid building this documentation for other trace backends ( since these commands are only available with the 'simple' backend ). However, looks like hxtool blindly copies text between SQMP and EQMP. I can only think of making hxtool a wee bit intelligent to be able to parse CONFIG_* options and build documentation accordingly. Is there a workaround I'm missing ? -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
Re: [Qemu-devel] Re: KVM call agenda for Oct 19
On 10/19/2010 04:11 AM, Chris Wright wrote:
* Juan Quintela (quint...@redhat.com) wrote:
Please send in any agenda items you are interested in covering.

- 0.13.X -stable handoff
- 0.14 planning
- threadlet work
- virtfs proposals

- Live snapshots
  - We were asked to add this feature for external qcow2 images. Would a simple approach of fsync + tracking each requested backing file (it can be per vDisk) and re-opening the new image be accepted?
  - Integration with FS freeze for consistent guest app snapshots. Many apps do not sync their RAM state to disk correctly or frequently enough. Physical-world backup software calls fs freeze on xfs and VSS for Windows to make the backup consistent. In order to integrate this with live snapshots we need a guest agent to trigger the guest fs freeze. We can either have qemu communicate with the agent directly through virtio-serial, or have a mgmt daemon use virtio-serial to communicate with the guest in addition to QMP messages about the live snapshot state. Preferences? The first solution complicates qemu while the second complicates mgmt.
[Qemu-devel] [PATCH 0/1] ccid emulated card (v2, for usb-ccid v3)
v2 changes: fixed a bug that made certificates emulation not work, and some cleanup. v1 message: Meant to be applied after the usb-ccid v3 patch on the list. Causes --enable-smartcard to depend on libcac_card, library for emulating CAC compliant smart cards at http://cgit.freedesktop.org/~alon/cac_card/ hw/ccid-card-emulated.c: new device Makefile.objs: add ccid-card-emulated.o if --enable-smartcard configure: dependency on libcac_card if --enable-smartcard hw/usb-ccid.c: added a TODO note hw/ccid-card-passthru.c: removed does-nothing print method. Alon Levy (1): add ccid-card-emulated device (v2) Makefile.objs |2 +- configure | 20 ++ hw/ccid-card-emulated.c | 495 +++ hw/ccid-card-passthru.c |6 - hw/usb-ccid.c |2 + 5 files changed, 518 insertions(+), 7 deletions(-) create mode 100644 hw/ccid-card-emulated.c -- 1.7.3.1
[Qemu-devel] [PATCH 1/1] add ccid-card-emulated device (v2)
changes from v1: remove stale comments, use only c-style comments bugfix, forgot to set recv_len change reader name to 'Virtual Reader' Signed-off-by: Alon Levy al...@redhat.com --- Makefile.objs |2 +- configure | 20 ++ hw/ccid-card-emulated.c | 495 +++ hw/ccid-card-passthru.c |6 - hw/usb-ccid.c |2 + 5 files changed, 518 insertions(+), 7 deletions(-) create mode 100644 hw/ccid-card-emulated.c diff --git a/Makefile.objs b/Makefile.objs index 3c4a880..ae12546 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -173,7 +173,7 @@ hw-obj-$(CONFIG_FDC) += fdc.o hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o hw-obj-$(CONFIG_APM) += pm_smbus.o apm.o hw-obj-$(CONFIG_DMA) += dma.o -hw-obj-$(CONFIG_SMARTCARD) += usb-ccid.o ccid-card-passthru.o +hw-obj-$(CONFIG_SMARTCARD) += usb-ccid.o ccid-card-passthru.o ccid-card-emulated.o # PPC devices hw-obj-$(CONFIG_OPENPIC) += openpic.o diff --git a/configure b/configure index e1922a3..59b8436 100755 --- a/configure +++ b/configure @@ -2112,6 +2112,26 @@ EOF fi fi +# check for libcaccard for smartcard support +if test $smartcard != no ; then + cat $TMPC EOF +#include vscard_common.h +int main() { return 0; } +EOF + smartcard_cflags=$($pkgconfig --cflags cac_card cac_card 2/dev/null) + smartcard_libs=$($pkgconfig --libs cac_card cac_card 2/dev/null) + if $pkgconfig --atleast-version=0.0.1 cac_card \ + compile_prog $smartcard_cflags $smartcard_libs ; then +smartcard=yes +QEMU_CFLAGS=$QEMU_CFLAGS $smartcard_cflags + else +if test smartcard = yes ; then + feature_not_found smartcard +fi +smartcard=no + fi +fi + ## ## diff --git a/hw/ccid-card-emulated.c b/hw/ccid-card-emulated.c new file mode 100644 index 000..9eee6b7 --- /dev/null +++ b/hw/ccid-card-emulated.c @@ -0,0 +1,495 @@ +/* + * CCID Card Device. Emulated card. + * + * It can be used to provide access to the local hardware in a non exclusive + * way, or it can use certificates. It requires the usb-ccid bus. + * + * Usage 1: standard, mirror hardware reader+card: + * qemu .. 
-usb -device usb-ccid -device ccid-card-emulated + * + * Usage 2: use certificates, no hardware required + * one time: create the certificates: + * for i in 1 2 3; do certutil -d /etc/pki/nssdb -x -t CT,CT,CT -S -s CN=user$i -n user$i; done + * qemu .. -usb -device usb-ccid -device ccid-card-emulated,cert1=user1,cert2=user2,cert3=user3 + * + * If you use a non default db for the certificates you can specify it using the db parameter. + * + * + * Copyright (c) 2010 Red Hat. + * Written by Alon Levy. + * + * This code is licenced under the LGPL. + */ + +#include pthread.h +#include eventt.h +#include vevent.h +#include vreader.h +#include vcard_emul.h +#include qemu-char.h +#include monitor.h +#include hw/ccid.h + +#define DPRINTF(lvl, fmt, ...) \ +do { if (lvl = debug) { printf(ccid-card-emul: %s: fmt , __func__, ## __VA_ARGS__); } } while (0) + +static int debug = 0; + +#define EMULATED_DEV_NAME ccid-card-emulated + +#define BACKEND_NSS_EMULATED nss-emulated /* the default */ +#define BACKEND_CERTIFICATES certificates + +typedef struct EmulatedState EmulatedState; + +enum { +EMUL_READER_INSERT = 0, +EMUL_READER_REMOVE, +EMUL_CARD_INSERT, +EMUL_CARD_REMOVE, +EMUL_GUEST_APDU, +EMUL_RESPONSE_APDU, +EMUL_ERROR, +}; + +static const char* emul_event_to_string(uint32_t emul_event) +{ +switch (emul_event) { +case EMUL_READER_INSERT: return EMUL_READER_INSERT; +case EMUL_READER_REMOVE: return EMUL_READER_REMOVE; +case EMUL_CARD_INSERT: return EMUL_CARD_INSERT; +case EMUL_CARD_REMOVE: return EMUL_CARD_REMOVE; +case EMUL_GUEST_APDU: return EMUL_GUEST_APDU; +case EMUL_RESPONSE_APDU: return EMUL_RESPONSE_APDU; +case EMUL_ERROR: return EMUL_ERROR; +default: +break; +} +return UNKNOWN; +} + +typedef struct EmulEvent { +QSIMPLEQ_ENTRY(EmulEvent) entry; +union { +struct { +uint32_t type; +} gen; +struct { +uint32_t type; +uint64_t code; +} error; +struct { +uint32_t type; +uint32_t len; +uint8_t data[]; +} data; +} p; +} EmulEvent; + +#define MAX_ATR_SIZE 40 +struct EmulatedState { 
+CCIDCardState base; +uint8_t debug; +char*backend; +char*cert1; +char*cert2; +char*cert3; +char*db; +uint8_t atr[MAX_ATR_SIZE]; +uint8_t atr_length; +QSIMPLEQ_HEAD(event_list, EmulEvent) event_list; +pthread_mutex_t event_list_mutex; +VReader *reader; +QSIMPLEQ_HEAD(guest_apdu_list, EmulEvent) guest_apdu_list; +pthread_mutex_t vreader_mutex; /* and guest_apdu_list mutex */ +pthread_mutex_t handle_apdu_mutex; +
Re: [Qemu-devel] Re: KVM call agenda for Oct 19
On 10/19/2010 02:48 PM, Dor Laor wrote:
On 10/19/2010 04:11 AM, Chris Wright wrote:
* Juan Quintela (quint...@redhat.com) wrote:
Please send in any agenda items you are interested in covering.

- 0.13.X -stable handoff
- 0.14 planning
- threadlet work
- virtfs proposals
- Live snapshots
  - We were asked to add this feature for external qcow2 images. Would a simple approach of fsync + tracking each requested backing file (it can be per vDisk) and re-opening the new image be accepted?
  - Integration with FS freeze for consistent guest app snapshot. Many apps do not sync their RAM state to disk correctly or frequently enough. Physical-world backup software calls fs freeze on xfs and VSS on Windows to make the backup consistent. In order to integrate this with live snapshots we need a guest agent to trigger the guest fs freeze. We can either have qemu communicate with the agent directly through virtio-serial, or have a mgmt daemon use virtio-serial to communicate with the guest in addition to QMP messages about the live snapshot state. Preferences? The first solution complicates qemu while the second complicates mgmt.

Third option: make the freeze path management -> qemu -> virtio-blk -> guest kernel -> file systems. The advantage is that it's easy to associate file systems with a block device this way.

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Re: KVM call agenda for Oct 19
On 10/19/2010 02:55 PM, Avi Kivity wrote:
On 10/19/2010 02:48 PM, Dor Laor wrote:
On 10/19/2010 04:11 AM, Chris Wright wrote:
* Juan Quintela (quint...@redhat.com) wrote:
Please send in any agenda items you are interested in covering.

- 0.13.X -stable handoff
- 0.14 planning
- threadlet work
- virtfs proposals
- Live snapshots
  - We were asked to add this feature for external qcow2 images. Would a simple approach of fsync + tracking each requested backing file (it can be per vDisk) and re-opening the new image be accepted?
  - Integration with FS freeze for consistent guest app snapshot. Many apps do not sync their RAM state to disk correctly or frequently enough. Physical-world backup software calls fs freeze on xfs and VSS on Windows to make the backup consistent. In order to integrate this with live snapshots we need a guest agent to trigger the guest fs freeze. We can either have qemu communicate with the agent directly through virtio-serial, or have a mgmt daemon use virtio-serial to communicate with the guest in addition to QMP messages about the live snapshot state. Preferences? The first solution complicates qemu while the second complicates mgmt.

Third option: make the freeze path management -> qemu -> virtio-blk -> guest kernel -> file systems. The advantage is that it's easy to associate file systems with a block device this way.

OTOH the userspace freeze path already exists, and now you create another path.

What about file systems that span LVM with multiple drives? IDE/SCSI?
Re: [Qemu-devel] Re: KVM call agenda for Oct 19
On 10/19/2010 08:03 AM, Avi Kivity wrote:
On 10/19/2010 02:58 PM, Dor Laor wrote:
On 10/19/2010 02:55 PM, Avi Kivity wrote:
On 10/19/2010 02:48 PM, Dor Laor wrote:
On 10/19/2010 04:11 AM, Chris Wright wrote:
* Juan Quintela (quint...@redhat.com) wrote:
Please send in any agenda items you are interested in covering.

- 0.13.X -stable handoff
- 0.14 planning
- threadlet work
- virtfs proposals
- Live snapshots
  - We were asked to add this feature for external qcow2 images. Would a simple approach of fsync + tracking each requested backing file (it can be per vDisk) and re-opening the new image be accepted?
  - Integration with FS freeze for consistent guest app snapshot. Many apps do not sync their RAM state to disk correctly or frequently enough. Physical-world backup software calls fs freeze on xfs and VSS on Windows to make the backup consistent. In order to integrate this with live snapshots we need a guest agent to trigger the guest fs freeze. We can either have qemu communicate with the agent directly through virtio-serial, or have a mgmt daemon use virtio-serial to communicate with the guest in addition to QMP messages about the live snapshot state. Preferences? The first solution complicates qemu while the second complicates mgmt.

Third option: make the freeze path management -> qemu -> virtio-blk -> guest kernel -> file systems. The advantage is that it's easy to associate file systems with a block device this way.

OTOH the userspace freeze path already exists, and now you create another path.

I guess we would still have a userspace daemon; instead of talking to virtio-serial it talks to virtio-blk. So:

management -> qemu -> virtio-blk -> guest driver -> kernel fs resolver -> daemon -> apps

Yuck.

Yeah, in Windows, I'm pretty sure the freeze API is a userspace concept. Various apps can hook into it to serialize their state.

At the risk of stealing Mike's thunder, we've actually been working on a simple guest agent exactly for this type of task. Mike's planning an RFC for later this week, but for those that are interested the repo is at http://repo.or.cz/w/qemu/mdroth.git

Regards,
Anthony Liguori

What about file systems that span LVM with multiple drives? IDE/SCSI?

Good points.
Re: [Qemu-devel] [PATCH 2/3] usb: add public usb_device_by_id
- Gerd Hoffmann kra...@redhat.com wrote:

+USBDevice *usb_device_by_id(const char *id)
+{
+    USBBus *bus;
+    DeviceState *qdev;
+    USBDevice *dev;
+
+    QTAILQ_FOREACH(bus, &busses, next) {
+        qdev = qdev_find_recursive(&bus->qbus, id);
+        if (qdev != NULL) {
+            dev = DO_UPCAST(USBDevice, qdev, qdev);
+            return dev;
+        }
+    }

You don't need qdev_find_recursive here. Have a look at the usb_info() code to see how to loop over all usb devices. Then compare id with USBDevice->qdev.id.

cheers,
Gerd

There is no problem looping over all usb devices. But first of all I don't want to loop on used, since then I miss any detached devices, so I actually do want the same behavior as qdev_find_recursive, and since it's already available, why rewrite it in a different compilation unit?

Alon
[Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
On 10/19/2010 08:07 AM, Stefan Hajnoczi wrote: Is there anything stopping this patch from being merged? Michael, any objections? If not, I'll merge it. Regards, Anthony Liguori Thanks, Stefan
Re: [Qemu-devel] [PATCH 2/3] usb: add public usb_device_by_id
+USBDevice *usb_device_by_id(const char *id)
+{
+    USBBus *bus;
+    DeviceState *qdev;
+    USBDevice *dev;
+
+    QTAILQ_FOREACH(bus, &busses, next) {
+        qdev = qdev_find_recursive(&bus->qbus, id);
+        if (qdev != NULL) {
+            dev = DO_UPCAST(USBDevice, qdev, qdev);
+            return dev;
+        }
+    }

You don't need qdev_find_recursive here. Have a look at the usb_info() code to see how to loop over all usb devices. Then compare id with USBDevice->qdev.id.

cheers,
Gerd
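For readers following the thread, the "loop over all devices and compare the id" approach suggested here can be sketched roughly as below. This is only an illustration under stated assumptions: the Dev struct and dev_by_id() are hypothetical stand-ins, not the QEMU USBDevice/USBBus types, and a plain singly linked list replaces the QTAILQ lists.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified device record; not the QEMU USBDevice type. */
typedef struct Dev {
    const char *id;      /* may be NULL when the user gave no id= */
    struct Dev *next;
} Dev;

/* Walk the list and return the first device whose id matches, or NULL. */
static Dev *dev_by_id(Dev *head, const char *id)
{
    Dev *d;

    for (d = head; d != NULL; d = d->next) {
        /* Devices without an id can never match. */
        if (d->id != NULL && strcmp(d->id, id) == 0) {
            return d;
        }
    }
    return NULL;
}
```

Note that a lookup like this only sees devices that are on the list being walked, which is the point raised later in the thread about detached devices.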
[Qemu-devel] Static tracepoint control via trace-event
Hi Stefan,

just had a closer look at qemu's new tracing framework. Looks cool, though it leaves a bit of room for improvement. ;)

One quirk I stumbled over quickly was the disable tag in trace-events. It confused me at first, as qemu starts without any tracepoint enabled by default and I thought I had to hack the file. Then I read the doc and wondered which existing or future backend would come without sufficiently fast dynamic tracepoint control. Do you have any in mind?

Instead of making it a compile-time switch (except for simpletrace), I would vote for declaring the simpletrace usage as the only one: disable sets the default state of the dynamic tracepoint. That way we could use trace-events to define a useful set of standard, moderate-impact tracepoints that shall be on. Others will still be available once a backend is configured, but remain off until enabled during runtime. Anything else looks like overkill to me.

There are a few more things I have in mind (ftrace backend, enhanced -trace switch, wildcards for enabling tracepoints, and more tracepoints). Will hopefully come up with patches to address them, but this may take a while.

Jan

PS: Do you maintain a tracing git tree?

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
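To make the proposal concrete, here is a minimal sketch of "disable only sets the default state of a dynamic tracepoint". Everything in it is assumed for illustration: the TraceEvent struct, the event names and the helper functions are hypothetical and are not the simpletrace API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tracepoint record: 'enabled' is the runtime state. */
typedef struct {
    const char *name;
    bool enabled;
} TraceEvent;

/* The initial values play the role of the "disable" tag in trace-events:
 * an entry tagged "disable" starts off, any other entry starts on. */
static TraceEvent trace_events[] = {
    { "example_hot_path",  false },  /* tagged "disable": off until enabled */
    { "example_rare_path", true  },  /* untagged: on by default */
};

static TraceEvent *trace_event_find(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(trace_events) / sizeof(trace_events[0]); i++) {
        if (strcmp(trace_events[i].name, name) == 0) {
            return &trace_events[i];
        }
    }
    return NULL;
}

/* Change the runtime state; returns false if the event does not exist. */
static bool trace_event_set_state(const char *name, bool enabled)
{
    TraceEvent *ev = trace_event_find(name);

    if (ev == NULL) {
        return false;
    }
    ev->enabled = enabled;
    return true;
}

static bool trace_event_is_enabled(const char *name)
{
    TraceEvent *ev = trace_event_find(name);

    return ev != NULL && ev->enabled;
}
```

With this model every tracepoint is always compiled in, and the tag only decides whether it starts on or off.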
Re: [Qemu-devel] Re: KVM call agenda for Oct 19
On 10/19/2010 03:22 PM, Anthony Liguori wrote:
I had assumed that this would involve:

qemu -hda windows.img
(qemu) snapshot ide0-disk0 snap0.img

1) create snap0.img internally by doing the equivalent of `qemu-img create -f qcow2 -b windows.img snap0.img'
2) bdrv_flush('ide0-disk0')
3) bdrv_open(snap0.img)
4) bdrv_close(windows.img)
5) rename('windows.img', 'windows.img.tmp')
6) rename('snap0.img', 'windows.img')
7) rename('windows.img.tmp', 'snap0.img')

Looks reasonable.

Would be interesting to look at this as a use case for the threading work. We should eventually be able to create a snapshot without stalling vcpus (stalling I/O of course allowed).

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Re: KVM call agenda for Oct 19
On 10/19/2010 07:48 AM, Dor Laor wrote:
On 10/19/2010 04:11 AM, Chris Wright wrote:
* Juan Quintela (quint...@redhat.com) wrote:
Please send in any agenda items you are interested in covering.

- 0.13.X -stable handoff
- 0.14 planning
- threadlet work
- virtfs proposals
- Live snapshots
  - We were asked to add this feature for external qcow2 images. Would a simple approach of fsync + tracking each requested backing file (it can be per vDisk) and re-opening the new image be accepted?
  - Integration with FS freeze for consistent guest app snapshot. Many apps do not sync their RAM state to disk correctly or frequently enough. Physical-world backup software calls fs freeze on xfs and VSS on Windows to make the backup consistent. In order to integrate this with live snapshots we need a guest agent to trigger the guest fs freeze. We can either have qemu communicate with the agent directly through virtio-serial, or have a mgmt daemon use virtio-serial to communicate with the guest in addition to QMP messages about the live snapshot state. Preferences? The first solution complicates qemu while the second complicates mgmt.

- usb-ccid (aka external device modules)

We probably won't get to it for today's call, but we should try to queue this topic up for discussion. We have a similar situation with vtpm (existing device model that wants to integrate with QEMU).

My position so far has been that we should avoid external device models because of the difficulty of integrating QEMU features with them. However, I'd like to hear opinions from a wider audience.

Regards,
Anthony Liguori

--
To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
On Thu, Sep 30, 2010 at 03:01:52PM +0100, Stefan Hajnoczi wrote:
Virtqueue notify is currently handled synchronously in userspace virtio. This prevents the vcpu from executing guest code while hardware emulation code handles the notify.

On systems that support KVM, the ioeventfd mechanism can be used to make virtqueue notify a lightweight exit by deferring hardware emulation to the iothread and allowing the VM to continue execution. This model is similar to how vhost receives virtqueue notifies.

The result of this change is improved performance for userspace virtio devices. Virtio-blk throughput increases especially for multithreaded scenarios and virtio-net transmit throughput increases substantially. Full numbers are below.

This patch employs ioeventfd virtqueue notify for all virtio devices. Linux kernels pre-2.6.34 only allow for 6 ioeventfds per VM and care must be taken so that vhost-net, the other ioeventfd user in QEMU, is able to function. On such kernels ioeventfd virtqueue notify will not be used.

Khoa Huynh k...@us.ibm.com collected the following data for virtio-blk with cache=none,aio=native:

FFSB Test            Threads   Unmodified   Patched
                               (MB/s)       (MB/s)
Large file create    1         21.7         21.8
                     8         101.0        118.0
                     16        119.0        157.0

Sequential reads     1         21.9         23.2
                     8         114.0        139.0
                     16        143.0        178.0

Random reads         1         3.3          3.6
                     8         23.0         25.4
                     16        43.3         47.8

Random writes        1         22.2         23.0
                     8         93.1         111.6
                     16        110.5        132.0

Sridhar Samudrala s...@us.ibm.com collected the following data for virtio-net with 2.6.36-rc1 on the host and 2.6.34 on the guest.

Guest to Host TCP_STREAM throughput (Mb/sec)
Msg Size   vhost-net   virtio-net   virtio-net/ioeventfd
65536      12755       6430         7590
16384      8499        3084         5764
4096       4723        1578         3659
1024       1827        981          2060

Host to Guest TCP_STREAM throughput (Mb/sec)
Msg Size   vhost-net   virtio-net   virtio-net/ioeventfd
65536      11156       5790         5853
16384      10787       5575         5691
4096       10452       5556         4277
1024       4437        3671         5277

Guest to Host TCP_RR latency (transactions/sec)
Msg Size   vhost-net   virtio-net   virtio-net/ioeventfd
1          9903        3459         3425
4096       7185        1931         1899
16384      6108        2102         1923
65536      3161        1610         1744

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
Small changes are required for qemu-kvm.git. I will send them once qemu.git has virtio-ioeventfd support.

 hw/vhost.c  |   6 ++--
 hw/virtio.c | 106 +++
 hw/virtio.h |   9 +
 kvm-all.c   |  39 +
 kvm-stub.c  |   5 +++
 kvm.h       |   1 +
 6 files changed, 156 insertions(+), 10 deletions(-)

Is there anything stopping this patch from being merged?

Thanks,
Stefan
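As background for the mechanism described above, here is a small Linux-only sketch of the eventfd primitive that ioeventfd builds on. It is not the patch's code: notifier_init(), notifier_kick() and notifier_test_and_clear() are hypothetical helper names, and in the real setup the "kick" would come from KVM on a guest write rather than from a userspace write().

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Create a non-blocking eventfd with an initial count of zero. */
static int notifier_init(void)
{
    return eventfd(0, EFD_NONBLOCK);
}

/* Signal the notifier: adds 1 to the kernel-side counter. */
static int notifier_kick(int fd)
{
    uint64_t one = 1;

    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Returns 1 if at least one kick was pending (and resets the counter),
 * 0 if nothing was pending. Several kicks collapse into one event. */
static int notifier_test_and_clear(int fd)
{
    uint64_t count;

    return read(fd, &count, sizeof(count)) == (ssize_t)sizeof(count) ? 1 : 0;
}
```

The property that matters for the patch is visible here: the signalling side returns immediately, and the handler (run from the iothread in the patch) can pick up any number of pending notifies with a single read.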
[Qemu-devel] Re: [PATCH v5 00/14] pcie port switch emulators
On Tue, Oct 19, 2010 at 06:06:27PM +0900, Isaku Yamahata wrote:
Here is v5 of the pcie patch series. I hope I addressed the blockers.

On the uncorrectable error status register in pcie_aer_write_config(): the register is RW1CS, so making it writable and test-and-clear doesn't work.

new patches: 1, 2
updated patches (except trivial changes): 4, 7, 8

Ok, I applied patches 1, 2, 3 and 5.

BTW, as 0.13 is released, any chance to sync the pci branch with upstream by requesting a pull?

Patch description:
This patch series implements pcie port switch emulators, which are a basic part of pcie/q35 support. This is for the mst/pci tree.

changes v4 -> v5:
- introduced pci_xxx_test_and_clear/set_mask
- eliminated xxx_notify(msi_trigger, int_level)
- eliminated FLR bits. FLR will be addressed at the next phase.

changes v3 -> v4:
- introduced new pci config helper functions.(clear set bit)
- various clean up and some bug fixes.
- dropped pci_shift_xxx().
- dropped function pointer in pcie_aer.h
- dropped pci_exp_cap(), pcie_aer_cap().
- file rename (pcie_{root, upstream, downstream} => ioh3420, x3130).

changes v2 -> v3:
- msi: improved comment and simplified shift/ffs dance
- pci w1c config register framework
- split pcie.[ch] into pcie_regs.h, pcie.[ch] and pcie_aer.[ch]
- pcie, aer: many changes by following reviews.

changes v1 -> v2:
- update msi
- dropped already pushed out patches.
- added msix patches.

Isaku Yamahata (14):
  pci: introduce helper functions to test-and-{clear, set} mask in configuration space
  pci: introduce helper function to handle msi-x and msi.
  pci: use pci_word_test_and_clear_mask() in pci_device_reset()
  pci/bridge: fix pci_bridge_reset()
  msi: implements msi
  pcie: add pcie constants to pcie_regs.h
  pcie: helper functions for pcie capability and extended capability
  pcie/aer: helper functions for pcie aer capability
  pcie port: define struct PCIEPort/PCIESlot and helper functions
  ioh3420: pcie root port in X58 ioh
  x3130: pcie upstream port
  x3130: pcie downstream port
  pcie/hotplug: introduce pushing attention button command
  pcie/aer: glue aer error injection into qemu monitor

 Makefile.objs           |   4 +-
 hw/ioh3420.c            | 229 +
 hw/ioh3420.h            |  10 +
 hw/msi.c                | 352 +++
 hw/msi.h                |  41 +++
 hw/pci.c                |  24 ++-
 hw/pci.h                |  88 +-
 hw/pci_bridge.c         |  57 +++-
 hw/pci_bridge.h         |   2 +
 hw/pcie.c               | 540 +
 hw/pcie.h               | 113 ++
 hw/pcie_aer.c           | 869 +++
 hw/pcie_aer.h           | 105 ++
 hw/pcie_port.c          | 198 +++
 hw/pcie_port.h          |  51 +++
 hw/pcie_regs.h          | 154 +
 hw/xio3130_downstream.c | 197 +++
 hw/xio3130_downstream.h |  11 +
 hw/xio3130_upstream.c   | 181 ++
 hw/xio3130_upstream.h   |  10 +
 qemu-common.h           |   6 +
 qemu-monitor.hx         |  36 ++
 sysemu.h                |   9 +
 23 files changed, 3272 insertions(+), 15 deletions(-)
 create mode 100644 hw/ioh3420.c
 create mode 100644 hw/ioh3420.h
 create mode 100644 hw/msi.c
 create mode 100644 hw/msi.h
 create mode 100644 hw/pcie.c
 create mode 100644 hw/pcie.h
 create mode 100644 hw/pcie_aer.c
 create mode 100644 hw/pcie_aer.h
 create mode 100644 hw/pcie_port.c
 create mode 100644 hw/pcie_port.h
 create mode 100644 hw/pcie_regs.h
 create mode 100644 hw/xio3130_downstream.c
 create mode 100644 hw/xio3130_downstream.h
 create mode 100644 hw/xio3130_upstream.c
 create mode 100644 hw/xio3130_upstream.h
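For context on the two register idioms mentioned in this cover letter, here is a rough sketch of a test-and-clear helper and of write-1-to-clear (W1C) handling. The function names below are made up for illustration and operate on plain integers; they are not the pci_word_test_and_clear_mask() implementation from the series, which works on PCI configuration space.

```c
#include <stdint.h>

/* Test-and-clear, as used by the emulator itself: return the bits of
 * 'mask' that were set in *reg, then clear them. */
static uint16_t word_test_and_clear_mask(uint16_t *reg, uint16_t mask)
{
    uint16_t old = (uint16_t)(*reg & mask);

    *reg = (uint16_t)(*reg & ~mask);
    return old;
}

/* Guest write to a W1C register: every bit written as 1 inside
 * 'w1c_mask' is cleared; all other bits keep their current value. */
static void w1c_write(uint32_t *reg, uint32_t val, uint32_t w1c_mask)
{
    *reg &= ~(val & w1c_mask);
}
```

The difference is the reason a W1C status register cannot simply be marked writable: a guest writing 1 must clear the bit, while a plain writable register would store the 1 instead.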
Re: [Qemu-devel] Static tracepoint control via trace-event
On Tue, Oct 19, 2010 at 03:08:08PM +0200, Jan Kiszka wrote:
Hi Stefan,

just had a closer look at qemu's new tracing framework. Looks cool, though it leaves a bit of room for improvement. ;)

One quirk I stumbled over quickly was the disable tag in trace-events. It confused me at first, as qemu starts without any tracepoint enabled by default and I thought I had to hack the file. Then I read the doc and wondered which existing or future backend would come without sufficiently fast dynamic tracepoint control. Do you have any in mind?

Instead of making it a compile-time switch (except for simpletrace), I would vote for declaring the simpletrace usage as the only one: disable sets the default state of the dynamic tracepoint. That way we could use trace-events to define a useful set of standard, moderate-impact tracepoints that shall be on. Others will still be available once a backend is configured, but remain off until enabled during runtime. Anything else looks like overkill to me.

FYI, with the DTrace/SystemTap backend I posted yesterday, the 'disable' keyword is effectively completely ignored. All tracepoints are disabled when QEMU is running normally. Only when an end user runs a dtrace script that references a QEMU tracepoint is that specific tracepoint enabled.

Regards,
Daniel

--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
Re: [Qemu-devel] Re: KVM call agenda for Oct 19
On Tue, Oct 19, 2010 at 2:33 PM, Anthony Liguori anth...@codemonkey.ws wrote:
On 10/19/2010 08:27 AM, Avi Kivity wrote:
On 10/19/2010 03:22 PM, Anthony Liguori wrote:
I had assumed that this would involve:

qemu -hda windows.img
(qemu) snapshot ide0-disk0 snap0.img

1) create snap0.img internally by doing the equivalent of `qemu-img create -f qcow2 -b windows.img snap0.img'
2) bdrv_flush('ide0-disk0')
3) bdrv_open(snap0.img)
4) bdrv_close(windows.img)
5) rename('windows.img', 'windows.img.tmp')
6) rename('snap0.img', 'windows.img')
7) rename('windows.img.tmp', 'snap0.img')

Looks reasonable.

Would be interesting to look at this as a use case for the threading work. We should eventually be able to create a snapshot without stalling vcpus (stalling I/O of course allowed).

If we had another block-level command, like bdrv_aio_freeze(), that queued all pending requests until the given callback completed, it would be very easy to do this entirely asynchronously. For instance:

bdrv_aio_freeze(create_snapshot)

create_snapshot():
    bdrv_aio_flush(done_flush)

done_flush():
    bdrv_open(...)
    bdrv_close(...)
    ...

Of course, closing a device while it's being frozen is probably a recipe for disaster but you get the idea :-)

bdrv_aio_freeze() or any mechanism to deal with pending requests in the generic block code would be a good step for future live support of other operations like truncate.

Stefan
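The "queue requests while frozen, replay them afterwards" idea behind the proposed bdrv_aio_freeze() can be modelled in a few lines. This is a toy sketch only: bdrv_aio_freeze() does not exist in the tree at this point, and FreezeQueue, fq_submit(), fq_freeze(), fq_thaw() and count_req() are hypothetical names with no relation to the block layer API.

```c
#include <stddef.h>

#define FQ_MAX_PENDING 16

typedef void (*ReqFn)(void *opaque);

/* Hypothetical per-device queue; zero-initialise before use. */
typedef struct {
    int frozen;
    size_t npending;
    struct {
        ReqFn fn;
        void *opaque;
    } pending[FQ_MAX_PENDING];
} FreezeQueue;

/* Run the request now, or park it if the device is frozen.
 * Returns -1 when the (fixed-size) pending queue is full. */
static int fq_submit(FreezeQueue *q, ReqFn fn, void *opaque)
{
    if (!q->frozen) {
        fn(opaque);
        return 0;
    }
    if (q->npending == FQ_MAX_PENDING) {
        return -1;
    }
    q->pending[q->npending].fn = fn;
    q->pending[q->npending].opaque = opaque;
    q->npending++;
    return 0;
}

/* Start deferring new requests. */
static void fq_freeze(FreezeQueue *q)
{
    q->frozen = 1;
}

/* Stop deferring and replay parked requests in submission order. */
static void fq_thaw(FreezeQueue *q)
{
    size_t i;

    q->frozen = 0;
    for (i = 0; i < q->npending; i++) {
        q->pending[i].fn(q->pending[i].opaque);
    }
    q->npending = 0;
}

/* Example request: bump a counter passed through 'opaque'. */
static void count_req(void *opaque)
{
    ++*(int *)opaque;
}
```

In the snapshot flow, the flush, open and close steps would run between fq_freeze() and fq_thaw(), so guest I/O issued in that window is delayed rather than failed.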
Re: [Qemu-devel] [PATCH 00/10] [PULL] qemu-kvm.git uq/master queue
On 10/19/2010 05:40 AM, Marcelo Tosatti wrote:
The following changes since commit 38cc9b607f85017b095793cab6c129bc9844f441:

  issue snd_pcm_start() when capturing audio (2010-10-18 00:39:06 +0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master

This breaks the build.

cc1: warnings being treated as errors
/home/anthony/git/qemu/target-i386/kvm.c: In function ‘kvm_on_sigbus_vcpu’:
/home/anthony/git/qemu/target-i386/kvm.c:1671: error: passing argument 3 of ‘kvm_physical_memory_addr_from_ram’ from incompatible pointer type
/home/anthony/git/qemu/kvm.h:180: note: expected ‘target_phys_addr_t *’ but argument is of type ‘long unsigned int *’
/home/anthony/git/qemu/target-i386/kvm.c: In function ‘kvm_on_sigbus’:
/home/anthony/git/qemu/target-i386/kvm.c:1714: error: passing argument 3 of ‘kvm_physical_memory_addr_from_ram’ from incompatible pointer type
/home/anthony/git/qemu/kvm.h:180: note: expected ‘target_phys_addr_t *’ but argument is of type ‘long unsigned int *’
make[1]: *** [kvm.o] Error 1
make: *** [subdir-i386-softmmu] Error 2

I've pushed my tree to http://repo.or.cz/w/qemu/aliguori.git qemu-kvm-20101019 but the merge is a fast-forward so you should have no trouble reproducing.

anth...@titi:~/build/qemu$ uname -a
Linux titi 2.6.32.11+drm33.2-x201 #1 SMP Sat May 22 09:58:34 PDT 2010 x86_64 GNU/Linux
anth...@titi:~/build/qemu$ gcc -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.4.3-4ubuntu5' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 --program-suffix=-4.4 --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-plugin --enable-objc-gc --disable-werror --with-arch-32=i486 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)

Regards,
Anthony Liguori

Huang Ying (1):
  Add RAM -> physical addr mapping in MCE simulation

Joerg Roedel (2):
  Set cpuid definition to 0 before initializing it
  Add svm cpuid features

Marcelo Tosatti (7):
  signalfd compatibility
  iothread: use signalfd
  Expose thread_id in info cpus
  kvm: x86: add mce support
  Export qemu_ram_addr_from_host
  MCE: Relay UCR MCE to guest
  Add savevm/loadvm support for MCE

 Makefile.objs         |   1 +
 compatfd.c            | 117 +++
 compatfd.h            |  43 +++
 configure             |  18 +++
 cpu-common.h          |   3 +-
 cpu-defs.h            |   1 +
 cpus.c                | 161 --
 exec-all.h            |   2 +-
 exec.c                |  27 +++--
 kvm-all.c             |  18 +++
 kvm-stub.c            |   5 +
 kvm.h                 |   6 +
 monitor.c             |   4 +
 osdep.c               |  15 +++
 osdep.h               |   1 +
 target-i386/cpu.h     |  32 +-
 target-i386/cpuid.c   |  79 ++---
 target-i386/helper.c  |   6 +
 target-i386/kvm.c     | 300 -
 target-i386/kvm_x86.h |  22
 20 files changed, 817 insertions(+), 44 deletions(-)
 create mode 100644 compatfd.c
 create mode 100644 compatfd.h
 create mode 100644 target-i386/kvm_x86.h
[Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
On Tue, Oct 19, 2010 at 08:12:42AM -0500, Anthony Liguori wrote: On 10/19/2010 08:07 AM, Stefan Hajnoczi wrote: Is there anything stopping this patch from being merged? Michael, any objections? If not, I'll merge it. I don't really understand what's going on there. The extra state in notifiers especially scares me. If you do and are comfortable with the code, go ahead :) -- MST
[Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
On Tue, Oct 19, 2010 at 2:35 PM, Michael S. Tsirkin m...@redhat.com wrote: On Tue, Oct 19, 2010 at 08:12:42AM -0500, Anthony Liguori wrote: On 10/19/2010 08:07 AM, Stefan Hajnoczi wrote: Is there anything stopping this patch from being merged? Michael, any objections? If not, I'll merge it. I don't really understand what's going on there. The extra state in notifiers especially scares me. If you do and are comfortable with the code, go ahead :) I'm happy to address your comments. The state machine was a bit icky but I don't see a way around it. Will follow up to your review email. Stefan
[Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
As a general comment, could you please try to split this patch up, to make it easier to review? I did a pass over it but I still don't understand it completely.

My main concern is that we add more state in notifiers that can easily get out of sync with users. If we absolutely need this state, let's try to at least document the state machine, and make the API for state transitions more transparent.

On Thu, Sep 30, 2010 at 03:01:52PM +0100, Stefan Hajnoczi wrote:

diff --git a/hw/vhost.c b/hw/vhost.c
index 1b8624d..f127a07 100644
--- a/hw/vhost.c
+++ b/hw/vhost.c
@@ -517,7 +517,7 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
         goto fail_guest_notifier;
     }
-    r = vdev->binding->set_host_notifier(vdev->binding_opaque, idx, true);
+    r = virtio_set_host_notifier(vdev, idx, true);
     if (r < 0) {
         fprintf(stderr, "Error binding host notifier: %d\n", -r);
         goto fail_host_notifier;
@@ -539,7 +539,7 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
 fail_call:
 fail_kick:
-    vdev->binding->set_host_notifier(vdev->binding_opaque, idx, false);
+    virtio_set_host_notifier(vdev, idx, false);
 fail_host_notifier:
     vdev->binding->set_guest_notifier(vdev->binding_opaque, idx, false);
 fail_guest_notifier:
@@ -575,7 +575,7 @@ static void vhost_virtqueue_cleanup(struct vhost_dev *dev,
     }
     assert (r >= 0);
-    r = vdev->binding->set_host_notifier(vdev->binding_opaque, idx, false);
+    r = virtio_set_host_notifier(vdev, idx, false);
     if (r < 0) {
         fprintf(stderr, "vhost VQ %d host cleanup failed: %d\n", idx, r);
         fflush(stderr);
diff --git a/hw/virtio.c b/hw/virtio.c
index fbef788..f075b3a 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -16,6 +16,7 @@
 #include "trace.h"
 #include "virtio.h"
 #include "sysemu.h"
+#include "kvm.h"
 /* The alignment to use between consumer and producer parts of vring.
  * x86 pagesize again. */
@@ -77,6 +78,11 @@ struct VirtQueue
     VirtIODevice *vdev;
     EventNotifier guest_notifier;
     EventNotifier host_notifier;
+    enum {
+        HOST_NOTIFIER_DEASSIGNED,   /* inactive */
+        HOST_NOTIFIER_ASSIGNED,     /* active */
+        HOST_NOTIFIER_OFFLIMITS,    /* active but outside our control */
+    } host_notifier_state;

This state machine confuses me. Please note that users already track notifier state and call set with assign/deassign correctly. The comment does not help: what does 'outside our control' mean? Whose control?

 };
 /* virt queue functions */
@@ -453,6 +459,93 @@ void virtio_update_irq(VirtIODevice *vdev)
     virtio_notify_vector(vdev, VIRTIO_NO_VECTOR);
 }
+/* Service virtqueue notify from a host notifier */
+static void virtio_read_host_notifier(void *opaque)
+{
+    VirtQueue *vq = opaque;
+    EventNotifier *notifier = virtio_queue_get_host_notifier(vq);
+    if (event_notifier_test_and_clear(notifier)) {
+        if (vq->vring.desc) {
+            vq->handle_output(vq->vdev, vq);
+        }
+    }
+}
+
+/* Transition between host notifier states */
+static int virtio_set_host_notifier_state(VirtIODevice *vdev, int n, int state)

Really unfortunate naming for functions: we seem to have about 4 of them starting with virtio_set_host_notifier*

+{
+    VirtQueue *vq = &vdev->vq[n];
+    EventNotifier *notifier = virtio_queue_get_host_notifier(vq);
+    int rc;
+
+    if (!kvm_enabled()) {
+        return -ENOSYS;
+    }

If this means that there's no need to do anything for non-kvm, return 0 here.

+
+    /* If the number of ioeventfds is limited, use them for vhost only */
+    if (state == HOST_NOTIFIER_ASSIGNED && !kvm_has_many_iobus_devs()) {
+        state = HOST_NOTIFIER_DEASSIGNED;
+    }
+
+    /* Ignore if no state change */
+    if (vq->host_notifier_state == state) {
+        return 0;
+    }
+
+    /* Disable read handler if transitioning away from assigned */
+    if (vq->host_notifier_state == HOST_NOTIFIER_ASSIGNED) {
+        qemu_set_fd_handler(event_notifier_get_fd(notifier), NULL, NULL, NULL);
+    }
+
+    /* Toggle host notifier if transitioning to or from deassigned */
+    if (state == HOST_NOTIFIER_DEASSIGNED ||
+        vq->host_notifier_state == HOST_NOTIFIER_DEASSIGNED) {
+        rc = vdev->binding->set_host_notifier(vdev->binding_opaque, n,
+                state != HOST_NOTIFIER_DEASSIGNED);
+        if (rc < 0) {
+            return rc;
+        }
+    }
+
+    /* Enable read handler if transitioning to assigned */
+    if (state == HOST_NOTIFIER_ASSIGNED) {
+        qemu_set_fd_handler(event_notifier_get_fd(notifier),
+                            virtio_read_host_notifier, NULL, vq);
+    }
+
+    vq->host_notifier_state = state;
+    return 0;
+}
+
+/* Try to assign/deassign host notifiers for all virtqueues */
+static void virtio_set_host_notifiers(VirtIODevice *vdev, bool assigned)

void? Don't we care whether this fails?

+{
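Since the review asks for the state machine to be documented, here is one way to spell out the transitions that the quoted virtio_set_host_notifier_state() performs. It is a reading aid only: the enum mirrors the three states in the patch, but TransitionActions and plan_transition() are hypothetical names, and the sketch leaves out the kvm_enabled() and kvm_has_many_iobus_devs() checks.

```c
/* The three states from the quoted patch. */
typedef enum {
    HOST_NOTIFIER_DEASSIGNED,   /* inactive */
    HOST_NOTIFIER_ASSIGNED,     /* active, serviced by the QEMU fd handler */
    HOST_NOTIFIER_OFFLIMITS     /* active, serviced elsewhere (e.g. vhost) */
} HostNotifierState;

/* Which side effects a transition needs (hypothetical helper type). */
typedef struct {
    int disable_handler;   /* remove the read handler: leaving ASSIGNED */
    int toggle_binding;    /* call set_host_notifier: to/from DEASSIGNED */
    int enable_handler;    /* install the read handler: entering ASSIGNED */
} TransitionActions;

static TransitionActions plan_transition(HostNotifierState from,
                                         HostNotifierState to)
{
    TransitionActions a = { 0, 0, 0 };

    if (from == to) {
        return a;   /* no state change, nothing to do */
    }
    a.disable_handler = (from == HOST_NOTIFIER_ASSIGNED);
    a.toggle_binding = (from == HOST_NOTIFIER_DEASSIGNED ||
                        to == HOST_NOTIFIER_DEASSIGNED);
    a.enable_handler = (to == HOST_NOTIFIER_ASSIGNED);
    return a;
}
```

Read this way, moving between ASSIGNED and OFFLIMITS only swaps who reads the eventfd, while any move to or from DEASSIGNED also binds or unbinds the ioeventfd itself.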
[Qemu-devel] CFP: 1st International QEMU Users Forum
* Call for Presentations 1st International QEMU Users Forum March 18th, 2011, Grenoble, France * Deadlines: Extended abstract Nov 28th, 2010 Notification of acceptance Nov 30th, 2010 More information is available at: http://adt.cs.upb.de/quf
Re: [Qemu-devel] Re: KVM call agenda for Oct 19
On 10/19/2010 07:48 AM, Dor Laor wrote:
On 10/19/2010 04:11 AM, Chris Wright wrote:
* Juan Quintela (quint...@redhat.com) wrote:
Please send in any agenda items you are interested in covering.

- 0.13.X -stable handoff
- 0.14 planning
- threadlet work
- virtfs proposals
- Live snapshots
  - We were asked to add this feature for external qcow2 images. Would a simple approach of fsync + tracking each requested backing file (it can be per vDisk) and re-opening the new image be accepted?

I had assumed that this would involve:

qemu -hda windows.img
(qemu) snapshot ide0-disk0 snap0.img

1) create snap0.img internally by doing the equivalent of `qemu-img create -f qcow2 -b windows.img snap0.img'
2) bdrv_flush('ide0-disk0')
3) bdrv_open(snap0.img)
4) bdrv_close(windows.img)
5) rename('windows.img', 'windows.img.tmp')
6) rename('snap0.img', 'windows.img')
7) rename('windows.img.tmp', 'snap0.img')

Regards,
Anthony Liguori

  - Integration with FS freeze for consistent guest app snapshot. Many apps do not sync their RAM state to disk correctly or frequently enough. Physical-world backup software calls fs freeze on xfs and VSS on Windows to make the backup consistent. In order to integrate this with live snapshots we need a guest agent to trigger the guest fs freeze. We can either have qemu communicate with the agent directly through virtio-serial, or have a mgmt daemon use virtio-serial to communicate with the guest in addition to QMP messages about the live snapshot state. Preferences? The first solution complicates qemu while the second complicates mgmt.
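Steps 5 to 7 of the sequence above amount to swapping two file names through a temporary name. A minimal sketch using the standard rename() call follows; swap_image_names() is a hypothetical helper, not QEMU code, and the sketch says nothing about how the open block driver handles react to the swap.

```c
#include <stdio.h>

/* Swap the names of two existing files via a temporary name.
 * Returns 0 on success, -1 if any rename fails. Each rename is atomic
 * on POSIX file systems, but the three together are not: a crash
 * between steps leaves one file under the temporary name. */
static int swap_image_names(const char *active, const char *snap,
                            const char *tmp)
{
    /* 5) active -> tmp */
    if (rename(active, tmp) != 0) {
        return -1;
    }
    /* 6) snap -> active; on failure try to put the active image back */
    if (rename(snap, active) != 0) {
        rename(tmp, active);
        return -1;
    }
    /* 7) tmp -> snap */
    if (rename(tmp, snap) != 0) {
        return -1;
    }
    return 0;
}
```

After a successful call, the name windows.img refers to the new overlay and snap0.img to the old contents, which matches the end state of the listed steps.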
Re: [Qemu-devel] [PATCH 3/3] monitor: add usb_attach and usb_detach
Hi,

+.help = attach USB device 'bus.addr',
+...@item usb_attach @var{devname}

/me sees a mismatch here.

There is still the use case question. Also note that this might have unwanted side effects when drivers automagically attach/detach devices like usb-host. Having this purely for debugging/troubleshooting purposes would be fine with me, but the documentation should clearly say so.

cheers, Gerd
Re: [Qemu-devel] [PATCH 2/3] usb: add public usb_device_by_id
Hi, There is no problem to loop over all usb devices. But first of all I don't want to loop on used, since then I miss any detached devices, so I actually do want the same behavior of qdev_find_recursive, and since it's already available, why rewrite it in a different compilation unit? Point. ACK then. cheers, Gerd
Re: [Qemu-devel] Re: KVM call agenda for Oct 19
On 10/19/2010 02:58 PM, Dor Laor wrote:
On 10/19/2010 02:55 PM, Avi Kivity wrote:
On 10/19/2010 02:48 PM, Dor Laor wrote:
On 10/19/2010 04:11 AM, Chris Wright wrote:
* Juan Quintela (quint...@redhat.com) wrote:

Please send in any agenda items you are interested in covering.

- 0.13.X -stable handoff
- 0.14 planning
- threadlet work
- virtfs proposals
- Live snapshots
  - We were asked to add this feature for external qcow2 images. Would a simple approach of fsync + tracking each requested backing file (it can be per vDisk) and re-opening the new image be accepted?
  - Integration with FS freeze for consistent guest app snapshot. Many apps do not sync their ram state to disk correctly or frequently enough. Physical world backup software calls fs freeze on xfs and VSS for windows to make the backup consistent. In order to integrate this with live snapshots we need a guest agent to trigger the guest fs freeze. We can either have qemu communicate with the agent directly through virtio-serial or have a mgmt daemon use virtio-serial to communicate with the guest in addition to QMP messages about the live snapshot state. Preferences? The first solution complicates qemu while the second complicates mgmt.

Third option, make the freeze path management -> qemu -> virtio-blk -> guest kernel -> file systems. The advantage is that it's easy to associate file systems with a block device this way.

OTOH the userspace freeze path already exists and now you create another path.

I guess we would still have a userspace daemon; instead of talking to virtio-serial it talks to virtio-blk. So:

management -> qemu -> virtio-blk -> guest driver -> kernel fs resolver -> daemon -> apps

Yuck.

What about FS that span over LVM with multiple drives? IDE/SCSI?

Good points.

-- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Re: KVM call agenda for Oct 19
On 10/19/2010 08:27 AM, Avi Kivity wrote:
On 10/19/2010 03:22 PM, Anthony Liguori wrote:

I had assumed that this would involve:

qemu -hda windows.img
(qemu) snapshot ide0-disk0 snap0.img

1) create snap0.img internally by doing the equivalent of `qemu-img create -f qcow2 -b windows.img snap0.img'
2) bdrv_flush('ide0-disk0')
3) bdrv_open(snap0.img)
4) bdrv_close(windows.img)
5) rename('windows.img', 'windows.img.tmp')
6) rename('snap0.img', 'windows.img')
7) rename('windows.img.tmp', 'snap0.img')

Looks reasonable. Would be interesting to look at this as a use case for the threading work. We should eventually be able to create a snapshot without stalling vcpus (stalling I/O of course allowed).

If we had another block-level command, like bdrv_aio_freeze(), that queued all pending requests until the given callback completed, it would be very easy to do this entirely asynchronously. For instance:

bdrv_aio_freeze(create_snapshot)

create_snapshot():
    bdrv_aio_flush(done_flush)

done_flush():
    bdrv_open(...)
    bdrv_close(...)
    ...

Of course, closing a device while it's being frozen is probably a recipe for disaster but you get the idea :-)

Regards, Anthony Liguori
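The queue-while-frozen idea behind the proposed bdrv_aio_freeze() can be modelled in a few lines of Python. This is only a toy sketch: bdrv_aio_freeze() is a proposal in this thread and not an existing QEMU function, and the class BlockDevice, its methods, and the request strings are all invented for illustration.

```python
from collections import deque


class BlockDevice:
    """Toy model of a device that can hold back requests while frozen."""

    def __init__(self):
        self.frozen = False
        self.pending = deque()   # requests that arrived while frozen
        self.log = []            # order in which things actually happened

    def submit(self, request):
        """Run the request now, or queue it if the device is frozen."""
        if self.frozen:
            self.pending.append(request)
        else:
            self.log.append(request)

    def freeze(self, callback):
        """Hypothetical analogue of the proposed bdrv_aio_freeze().

        Requests are queued until the callback calls done(), then replayed
        in arrival order.
        """
        self.frozen = True

        def done():
            self.frozen = False
            while self.pending:
                self.log.append(self.pending.popleft())

        callback(self, done)


def create_snapshot(dev, done):
    """Stand-in for the flush / open / close chain from the message."""
    dev.log.append("flush")
    dev.submit("write-during-freeze")   # guest I/O arriving mid-snapshot
    dev.log.append("reopen")
    done()


def demo():
    dev = BlockDevice()
    dev.submit("write-1")
    dev.freeze(create_snapshot)
    return dev.log
```

In this model the write issued during the snapshot is delayed until after the reopen, which is the property the message is after; a real implementation would also have to deal with requests already in flight when the freeze starts.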
[Qemu-devel] Re: Static tracepoint control via trace-event
On Tue, Oct 19, 2010 at 03:08:08PM +0200, Jan Kiszka wrote:

One quirk I stumbled over quickly was the disable tag in trace-events. It confused me first as qemu starts without any tracepoint enabled by default and I thought I had to hack the file. Then I read the doc and wondered which existing or future backend would come without sufficiently fast dynamic tracepoint control. Do you have any in mind?

Instead of making it a compile-time switch (except for simpletrace), I would vote for declaring the simpletrace usage as the only one: disable sets the default state of the dynamic tracepoint. That way we could use trace-events to define a useful set of standard, moderate-impact tracepoints that shall be on. Others will still be available once a backend is configured, but remain off until enabled during runtime. Anything else looks like overkill to me.

The motivation for disable producing a nop trace event is that it allows QEMU builds without certain trace events. A trace event cannot simply be removed by deleting its trace-events declaration since there are calls to its trace_*() function in the source tree. So this provided a way to disable trace events before simpletrace supported enabling/disabling trace events at runtime :). Today that's no longer an issue for simpletrace and other tracing backends like LTTng UST and SystemTAP handle disabled trace events well.

I agree that keeping just one meaning for the disable keyword is better. Perhaps we should keep a separate nop keyword to build out specific trace events. When would nop be handy? I think an ftrace backend is a good example. Since an ftrace marker cannot be enabled/disabled at runtime, the only way to silence unwanted trace events is to nop them at compile-time.

There are a few more things I have in mind (ftrace backend, enhanced -trace switch, wildcards for enabling tracepoints, and more tracepoints). Will hopefully come up with patches to address them, but this may take a while.

Sounds great.

PS: Do you maintain a tracing git tree?

No, I'm reviewing patches as they are posted for qemu-devel. If the backlog between mailing list discussion and merge reaches the point where your patches are suffering conflicts, please let me know and I can maintain one. For the initial QEMU tracing effort I kept a tree but I stopped after the patches were accepted into mainline. The patches I write go straight to qemu-devel now.

Stefan
[Qemu-devel] Re: Static tracepoint control via trace-event
Am 19.10.2010 15:30, Stefan Hajnoczi wrote:
On Tue, Oct 19, 2010 at 03:08:08PM +0200, Jan Kiszka wrote:

One quirk I stumbled over quickly was the disable tag in trace-events. It confused me first as qemu starts without any tracepoint enabled by default and I thought I had to hack the file. Then I read the doc and wondered which existing or future backend would come without sufficiently fast dynamic tracepoint control. Do you have any in mind?

Instead of making it a compile-time switch (except for simpletrace), I would vote for declaring the simpletrace usage as the only one: disable sets the default state of the dynamic tracepoint. That way we could use trace-events to define a useful set of standard, moderate-impact tracepoints that shall be on. Others will still be available once a backend is configured, but remain off until enabled during runtime. Anything else looks like overkill to me.

The motivation for disable producing a nop trace event is that it allows QEMU builds without certain trace events. A trace event cannot simply be removed by deleting its trace-events declaration since there are calls to its trace_*() function in the source tree. So this provided a way to disable trace events before simpletrace supported enabling/disabling trace events at runtime :). Today that's no longer an issue for simpletrace and other tracing backends like LTTng UST and SystemTAP handle disabled trace events well.

I agree that keeping just one meaning for the disable keyword is better. Perhaps we should keep a separate nop keyword to build out specific trace events. When would nop be handy? I think an ftrace backend is a good example. Since an ftrace marker cannot be enabled/disabled at runtime, the only way to silence unwanted trace events is to nop them at compile-time.

Another to-do item is to remove the strange dependency of tracing management features on CONFIG_SIMPLE_TRACE. That way the monitor commands and a to-be-added command line option to control individual tracepoints could of course also be used by an ftrace backend. I bet the DTrace backend will like to see this as well.

Jan

-- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
[Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
On Tue, Oct 19, 2010 at 02:44:35PM +0100, Stefan Hajnoczi wrote: On Tue, Oct 19, 2010 at 2:35 PM, Michael S. Tsirkin m...@redhat.com wrote: On Tue, Oct 19, 2010 at 08:12:42AM -0500, Anthony Liguori wrote: On 10/19/2010 08:07 AM, Stefan Hajnoczi wrote: Is there anything stopping this patch from being merged? Michael, any objections? If not, I'll merge it. I don't really understand what's going on there. The extra state in notifiers especially scares me. If you do and are comfortable with the code, go ahead :) I'm happy to address your comments. The state machine was a bit icky but I don't see a way around it. I think the situation is similar to irqfd in qemu-kvm - take a look there, specifically msix mask notifiers. -- MST
Re: [Qemu-devel] Re: Static tracepoint control via trace-event
On Tue, Oct 19, 2010 at 2:46 PM, Jan Kiszka jan.kis...@siemens.com wrote:
Am 19.10.2010 15:30, Stefan Hajnoczi wrote:
On Tue, Oct 19, 2010 at 03:08:08PM +0200, Jan Kiszka wrote:

One quirk I stumbled over quickly was the disable tag in trace-events. It confused me first as qemu starts without any tracepoint enabled by default and I thought I had to hack the file. Then I read the doc and wondered which existing or future backend would come without sufficiently fast dynamic tracepoint control. Do you have any in mind?

Instead of making it a compile-time switch (except for simpletrace), I would vote for declaring the simpletrace usage as the only one: disable sets the default state of the dynamic tracepoint. That way we could use trace-events to define a useful set of standard, moderate-impact tracepoints that shall be on. Others will still be available once a backend is configured, but remain off until enabled during runtime. Anything else looks like overkill to me.

The motivation for disable producing a nop trace event is that it allows QEMU builds without certain trace events. A trace event cannot simply be removed by deleting its trace-events declaration since there are calls to its trace_*() function in the source tree. So this provided a way to disable trace events before simpletrace supported enabling/disabling trace events at runtime :). Today that's no longer an issue for simpletrace and other tracing backends like LTTng UST and SystemTAP handle disabled trace events well.

I agree that keeping just one meaning for the disable keyword is better. Perhaps we should keep a separate nop keyword to build out specific trace events. When would nop be handy? I think an ftrace backend is a good example. Since an ftrace marker cannot be enabled/disabled at runtime, the only way to silence unwanted trace events is to nop them at compile-time.

Another to-do item is to remove the strange dependency of tracing management features on CONFIG_SIMPLE_TRACE. That way the monitor commands and a to-be-added command line option to control individual tracepoints could of course also be used by an ftrace backend. I bet the DTrace backend will like to see this as well.

If there is a programmatic way of inspecting and toggling trace events from inside an instrumented process, then yes. If this is possible with SystemTAP we should think about it now before QMP tracing commands become available in a release.

Stefan
[Qemu-devel] [PATCH] Fix test suite build with tracing enabled
qemu_malloc instrumentations require linking against the trace objects.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 Makefile |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/Makefile b/Makefile
index 252c817..106a401 100644
--- a/Makefile
+++ b/Makefile
@@ -140,12 +140,12 @@ qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx
 
 check-qint.o check-qstring.o check-qdict.o check-qlist.o check-qfloat.o check-qjson.o: $(GENERATED_HEADERS)
 
-check-qint: check-qint.o qint.o qemu-malloc.o
-check-qstring: check-qstring.o qstring.o qemu-malloc.o
-check-qdict: check-qdict.o qdict.o qfloat.o qint.o qstring.o qbool.o qemu-malloc.o qlist.o
-check-qlist: check-qlist.o qlist.o qint.o qemu-malloc.o
-check-qfloat: check-qfloat.o qfloat.o qemu-malloc.o
-check-qjson: check-qjson.o qfloat.o qint.o qdict.o qstring.o qlist.o qbool.o qjson.o json-streamer.o json-lexer.o json-parser.o qemu-malloc.o
+check-qint: check-qint.o qint.o qemu-malloc.o $(trace-obj-y)
+check-qstring: check-qstring.o qstring.o qemu-malloc.o $(trace-obj-y)
+check-qdict: check-qdict.o qdict.o qfloat.o qint.o qstring.o qbool.o qemu-malloc.o qlist.o $(trace-obj-y)
+check-qlist: check-qlist.o qlist.o qint.o qemu-malloc.o $(trace-obj-y)
+check-qfloat: check-qfloat.o qfloat.o qemu-malloc.o $(trace-obj-y)
+check-qjson: check-qjson.o qfloat.o qint.o qdict.o qstring.o qlist.o qbool.o qjson.o json-streamer.o json-lexer.o json-parser.o qemu-malloc.o $(trace-obj-y)
 
 clean:
 # avoid old build problems by removing potentially incorrect old files
-- 
1.7.1
Re: [Qemu-devel] Static tracepoint control via trace-event
On Tue, Oct 19, 2010 at 2:36 PM, Daniel P. Berrange berra...@redhat.com wrote:
On Tue, Oct 19, 2010 at 03:08:08PM +0200, Jan Kiszka wrote:

Hi Stefan, just had a closer look at qemu's new tracing framework. Looks cool, though it leaves a bit of room for improvement. ;)

One quirk I stumbled over quickly was the disable tag in trace-events. It confused me first as qemu starts without any tracepoint enabled by default and I thought I had to hack the file. Then I read the doc and wondered which existing or future backend would come without sufficiently fast dynamic tracepoint control. Do you have any in mind?

Instead of making it a compile-time switch (except for simpletrace), I would vote for declaring the simpletrace usage as the only one: disable sets the default state of the dynamic tracepoint. That way we could use trace-events to define a useful set of standard, moderate-impact tracepoints that shall be on. Others will still be available once a backend is configured, but remain off until enabled during runtime. Anything else looks like overkill to me.

FYI with the DTrace/SystemTAP backend I posted yesterday, the 'disable' keyword is effectively completely ignored. All tracepoints are disabled when QEMU is running normally. Only when an end user runs a dtrace script that references a QEMU tracepoint is that specific tracepoint enabled.

I think that makes sense for external trace backends. DTrace can launch a process for you with the probes you want enabled from the start. The simpletrace backend can't really do this so probes can be enabled/disabled at compile-time (e.g. early startup tracing).

Stefan
[Qemu-devel] Re: [PATCH v5 00/14] pcie port switch emulators
Isaku Yamahata (14):
  pci: introduce helper functions to test-and-{clear, set} mask in configuration space
  pci: introduce helper function to handle msi-x and msi.
  pci: use pci_word_test_and_clear_mask() in pci_device_reset()
  pci/bridge: fix pci_bridge_reset()
  msi: implements msi
  pcie: add pcie constants to pcie_regs.h
  pcie: helper functions for pcie capability and extended capability

I'll apply these.

  pcie/aer: helper functions for pcie aer capability

Maybe move this to the end of the series?

  pcie port: define struct PCIEPort/PCIESlot and helper functions
  ioh3420: pcie root port in X58 ioh
  x3130: pcie upstream port
  x3130: pcie downstream port
  pcie/hotplug: introduce pushing attention button command

I think the above can be applied - just remove the dependency on aer for now.

Okay. I'll update the patch series and send it tomorrow.

-- yamahata
Re: [Qemu-devel] Re: KVM call agenda for Oct 19
On 10/19/2010 03:38 PM, Stefan Hajnoczi wrote: bdrv_aio_freeze() or any mechanism to deal with pending requests in the generic block code would be a good step for future live support of other operations like truncate. + logical disk grow, etc. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Re: Static tracepoint control via trace-event
On Tue, Oct 19, 2010 at 03:46:35PM +0200, Jan Kiszka wrote:
Am 19.10.2010 15:30, Stefan Hajnoczi wrote:
On Tue, Oct 19, 2010 at 03:08:08PM +0200, Jan Kiszka wrote:

One quirk I stumbled over quickly was the disable tag in trace-events. It confused me first as qemu starts without any tracepoint enabled by default and I thought I had to hack the file. Then I read the doc and wondered which existing or future backend would come without sufficiently fast dynamic tracepoint control. Do you have any in mind?

Instead of making it a compile-time switch (except for simpletrace), I would vote for declaring the simpletrace usage as the only one: disable sets the default state of the dynamic tracepoint. That way we could use trace-events to define a useful set of standard, moderate-impact tracepoints that shall be on. Others will still be available once a backend is configured, but remain off until enabled during runtime. Anything else looks like overkill to me.

The motivation for disable producing a nop trace event is that it allows QEMU builds without certain trace events. A trace event cannot simply be removed by deleting its trace-events declaration since there are calls to its trace_*() function in the source tree. So this provided a way to disable trace events before simpletrace supported enabling/disabling trace events at runtime :). Today that's no longer an issue for simpletrace and other tracing backends like LTTng UST and SystemTAP handle disabled trace events well.

I agree that keeping just one meaning for the disable keyword is better. Perhaps we should keep a separate nop keyword to build out specific trace events. When would nop be handy? I think an ftrace backend is a good example. Since an ftrace marker cannot be enabled/disabled at runtime, the only way to silence unwanted trace events is to nop them at compile-time.

Another to-do item is to remove the strange dependency of tracing management features on CONFIG_SIMPLE_TRACE. That way the monitor commands and a to-be-added command line option to control individual tracepoints could of course also be used by an ftrace backend. I bet the DTrace backend will like to see this as well.

I don't see a need for any monitor commands or command line options for the DTrace backend, since everything is completely dynamically controlled based on the tracing scripts the user is running.

Regards, Daniel

--
|: Red Hat, Engineering, London-o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
Re: [Qemu-devel] Static tracepoint control via trace-event
Am 19.10.2010 15:52, Stefan Hajnoczi wrote:
On Tue, Oct 19, 2010 at 2:36 PM, Daniel P. Berrange berra...@redhat.com wrote:
On Tue, Oct 19, 2010 at 03:08:08PM +0200, Jan Kiszka wrote:

Hi Stefan, just had a closer look at qemu's new tracing framework. Looks cool, though it leaves a bit of room for improvement. ;)

One quirk I stumbled over quickly was the disable tag in trace-events. It confused me first as qemu starts without any tracepoint enabled by default and I thought I had to hack the file. Then I read the doc and wondered which existing or future backend would come without sufficiently fast dynamic tracepoint control. Do you have any in mind?

Instead of making it a compile-time switch (except for simpletrace), I would vote for declaring the simpletrace usage as the only one: disable sets the default state of the dynamic tracepoint. That way we could use trace-events to define a useful set of standard, moderate-impact tracepoints that shall be on. Others will still be available once a backend is configured, but remain off until enabled during runtime. Anything else looks like overkill to me.

FYI with the DTrace/SystemTAP backend I posted yesterday, the 'disable' keyword is effectively completely ignored. All tracepoints are disabled when QEMU is running normally. Only when an end user runs a dtrace script that references a QEMU tracepoint is that specific tracepoint enabled.

I think that makes sense for external trace backends. DTrace can launch a process for you with the probes you want enabled from the start. The simpletrace backend can't really do this so probes can be enabled/disabled at compile-time (e.g. early startup tracing).

Once we have -trace events=..., defining the list of active tracepoints before starting qemu will be trivial (e.g. via a config file). Of course, this requires that all tracepoints are built-in...

Jan

-- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
[Qemu-devel] [PATCH] simpletrace: Inline runtime state check
Instead of preparing all traced args, jumping into the common trace function, even collecting a timestamp, do the check if a particular tracepoint is enabled inline. Also, mark the enabled case unlikely to motivate the compiler to push the trace code out of the fastpath.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 simpletrace.c |    4 ----
 tracetool     |    7 +++++--
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/simpletrace.c b/simpletrace.c
index deb1e07..224e4ab 100644
--- a/simpletrace.c
+++ b/simpletrace.c
@@ -148,10 +148,6 @@ static void trace(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3,
      */
     clock_gettime(CLOCK_MONOTONIC, &ts);
 
-    if (!trace_list[event].state) {
-        return;
-    }
-
     rec->event = event;
     rec->timestamp_ns = ts.tv_sec * 1000000000LL + ts.tv_nsec;
     rec->x1 = x1;
diff --git a/tracetool b/tracetool
index 7010858..9532409 100755
--- a/tracetool
+++ b/tracetool
@@ -146,6 +146,8 @@ linetoh_begin_simple()
 {
     cat <<EOF
 #include "simpletrace.h"
+
+extern TraceEvent trace_list[];
 EOF
 
     simple_event_num=0
@@ -179,7 +181,9 @@ linetoh_simple()
     cat <<EOF
 static inline void trace_$name($args)
 {
-    trace$argc($trace_args);
+    if (unlikely(trace_list[$simple_event_num].state)) {
+        trace$argc($trace_args);
+    }
 }
 EOF
 
@@ -190,7 +194,6 @@ linetoh_end_simple()
 {
     cat <<EOF
 #define NR_TRACE_EVENTS $simple_event_num
-extern TraceEvent trace_list[NR_TRACE_EVENTS];
 EOF
 }
 
-- 
1.7.1
[Qemu-devel] Re: [PATCH] Fix test suite build with tracing enabled
On Tue, Oct 19, 2010 at 04:03:15PM +0200, Jan Kiszka wrote: qemu_malloc instrumentations require linking against the trace objects. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- Makefile | 12 ++-- 1 files changed, 6 insertions(+), 6 deletions(-) Acked-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Re: [Qemu-devel] [PATCH 0/7] ATAPI CDROM passthrough v5
On 19 October 2010 08:17, Alexander Graf ag...@suse.de wrote:
Am 19.10.2010 um 02:10 schrieb Anthony Liguori anth...@codemonkey.ws:
On 10/18/2010 06:29 PM, Alexander Graf wrote:

A user will get a really nasty surprise if they think they can use a flag or rely on QEMU to prevent a VM from doing something nasty with a device. If they have this feeling of security, they're likely to chmod the device to allow unprivileged users to access it. But how a device handles ATAPI commands is totally up to the device. If you issue the wrong sequence, I'm sure there are devices out there that totally hose themselves. Are you absolutely confident that every ATAPI device out there is completely safe against hostile code provided that you simply prevent the FW update commands? I'm certainly not.

Ping?

Who are you pinging?

Mostly Ian. I haven't seen any follow-up on this discussion and would like to know why and if there's still plans to upstream this code :).

Why is allowing ATAPI passthrough such a problem? Sure, if your boot drive is on the same IDE cable as the device you may have issues, but other than that the device may just stop working if it is not designed to handle incorrect commands gracefully (i.e. it is broken).

I am sure there are devices that also break when issued correct commands or commands that look vaguely sane. E.g. there are CD-ROMs that would lock up the whole system when you boot a certain vintage of Linux (not tested with current Linux due to lack of old hardware) on a machine with the Intel BX chipset and one of these CD-ROMs attached over an IDE cable.

However, assuming random hardware breakage you cannot allow anything.

Perhaps the ATAPI passthrough should be designed to allow any commands, and some command profiles could be selected to allow for some sane/conservative subset, burning, LightScribe, LabelFlash, disc t...@tto, FW upgrade, ..

It would be nice if these subsets were defined in a configuration file so that people can create their own 'default' combination or just install a new set when a new fancy feature comes out.

Thanks

Michal
Tracing block devices (was: Re: [Qemu-devel] Static tracepoint control via trace-event)
On Tue, Oct 19, 2010 at 03:59:36PM +0200, Jan Kiszka wrote: Once we have -trace events=..., defining the list of active tracepoints before starting qemu will be trivial (e.g. via a config file). Of course, this requires that all tracepoints are built-in... Sorry that I've not been following this very closely, but does this sort of thing allow tracing reads and writes to block devices? Am I right in thinking that if a tracepoint existed in the right place, one could get a log file from that which could be post-processed in another tool? cf: http://rwmj.wordpress.com/2010/10/05/visualizing-reads-writes-and-alignment/#content Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://et.redhat.com/~rjones/virt-df/
Re: [Qemu-devel] Re: Static tracepoint control via trace-event
Am 19.10.2010 16:12, Daniel P. Berrange wrote:
On Tue, Oct 19, 2010 at 03:46:35PM +0200, Jan Kiszka wrote:
Am 19.10.2010 15:30, Stefan Hajnoczi wrote:
On Tue, Oct 19, 2010 at 03:08:08PM +0200, Jan Kiszka wrote:

One quirk I stumbled over quickly was the disable tag in trace-events. It confused me first as qemu starts without any tracepoint enabled by default and I thought I had to hack the file. Then I read the doc and wondered which existing or future backend would come without sufficiently fast dynamic tracepoint control. Do you have any in mind?

Instead of making it a compile-time switch (except for simpletrace), I would vote for declaring the simpletrace usage as the only one: disable sets the default state of the dynamic tracepoint. That way we could use trace-events to define a useful set of standard, moderate-impact tracepoints that shall be on. Others will still be available once a backend is configured, but remain off until enabled during runtime. Anything else looks like overkill to me.

The motivation for disable producing a nop trace event is that it allows QEMU builds without certain trace events. A trace event cannot simply be removed by deleting its trace-events declaration since there are calls to its trace_*() function in the source tree. So this provided a way to disable trace events before simpletrace supported enabling/disabling trace events at runtime :). Today that's no longer an issue for simpletrace and other tracing backends like LTTng UST and SystemTAP handle disabled trace events well.

I agree that keeping just one meaning for the disable keyword is better. Perhaps we should keep a separate nop keyword to build out specific trace events. When would nop be handy? I think an ftrace backend is a good example. Since an ftrace marker cannot be enabled/disabled at runtime, the only way to silence unwanted trace events is to nop them at compile-time.

Another to-do item is to remove the strange dependency of tracing management features on CONFIG_SIMPLE_TRACE. That way the monitor commands and a to-be-added command line option to control individual tracepoints could of course also be used by an ftrace backend. I bet the DTrace backend will like to see this as well.

I don't see a need for any monitor commands or command line options for the DTrace backend, since everything is completely dynamically controlled based on the tracing scripts the user is running.

Ah, it's all dynamic probing, you just need the marks. OK, was a bad example. :)

Jan

-- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
[Qemu-devel] Re: Tracing block devices
Am 19.10.2010 16:29, Richard W.M. Jones wrote: On Tue, Oct 19, 2010 at 03:59:36PM +0200, Jan Kiszka wrote: Once we have -trace events=..., defining the list of active tracepoints before starting qemu will be trivial (e.g. via a config file). Of course, this requires that all tracepoints are built-in... Sorry that I've not been following this very closely, but does this sort of thing allow tracing reads and writes to block devices? Am I right in thinking that if a tracepoint existed in the right place, one could get a log file from that which could be post-processed in another tool? cf: http://rwmj.wordpress.com/2010/10/05/visualizing-reads-writes-and-alignment/#content Rich. Yes. The block layer is instrumented, not sure if already sufficiently, but you may simply want to try the simpletrace backend and inspect the result via its postprocessor (simpletrace.py). Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
[Qemu-devel] [PATCH 09/10] MCE: Relay UCR MCE to guest (v2)
Port qemu-kvm's

commit 4b62fff1101a7ad77553147717a8bd3bf79df7ef
Author: Huang Ying ying.hu...@intel.com
Date:   Mon Sep 21 10:43:25 2009 +0800

    MCE: Relay UCR MCE to guest

    UCR (uncorrected recovery) MCE is supported in recent Intel CPUs, where some hardware error such as some memory error can be reported without PCC (processor context corrupted). To recover from such MCE, the corresponding memory will be unmapped, and all processes accessing the memory will be killed via SIGBUS.

    For KVM, if QEMU/KVM is killed, all guest processes will be killed too. So we relay SIGBUS from host OS to guest system via a UCR MCE injection. Then guest OS can isolate corresponding memory and kill necessary guest processes only. SIGBUS sent to main thread (not VCPU threads) will be broadcast to all VCPU threads as UCR MCE.

    v2: use target_phys_addr_t type for paddr.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 cpus.c                |   82 --
 kvm-stub.c            |    5 ++
 kvm.h                 |    3 +
 target-i386/cpu.h     |   20 +-
 target-i386/helper.c  |    2 +-
 target-i386/kvm.c     |  178 -
 target-i386/kvm_x86.h |    3 +-
 7 files changed, 279 insertions(+), 14 deletions(-)

diff --git a/cpus.c b/cpus.c
index 429993a..62de0bc 100644
--- a/cpus.c
+++ b/cpus.c
@@ -34,6 +34,10 @@
 
 #include "cpus.h"
 #include "compatfd.h"
+#ifdef CONFIG_LINUX
+#include <sys/prctl.h>
+#include <sys/signalfd.h>
+#endif
 
 #ifdef SIGRTMIN
 #define SIG_IPI (SIGRTMIN+4)
@@ -41,6 +45,10 @@
 #define SIG_IPI SIGUSR1
 #endif
 
+#ifndef PR_MCE_KILL
+#define PR_MCE_KILL 33
+#endif
+
 static CPUState *next_cpu;
 
 /***/
@@ -498,28 +506,77 @@ static void qemu_tcg_wait_io_event(void)
     }
 }
 
+static void sigbus_reraise(void)
+{
+    sigset_t set;
+    struct sigaction action;
+
+    memset(&action, 0, sizeof(action));
+    action.sa_handler = SIG_DFL;
+    if (!sigaction(SIGBUS, &action, NULL)) {
+        raise(SIGBUS);
+        sigemptyset(&set);
+        sigaddset(&set, SIGBUS);
+        sigprocmask(SIG_UNBLOCK, &set, NULL);
+    }
+    perror("Failed to re-raise SIGBUS!\n");
+    abort();
+}
+
+static void sigbus_handler(int n, struct qemu_signalfd_siginfo *siginfo,
+                           void *ctx)
+{
+#if defined(TARGET_I386)
+    if (kvm_on_sigbus(siginfo->ssi_code, (void *)(intptr_t)siginfo->ssi_addr))
+#endif
+        sigbus_reraise();
+}
+
 static void qemu_kvm_eat_signal(CPUState *env, int timeout)
 {
     struct timespec ts;
     int r, e;
     siginfo_t siginfo;
     sigset_t waitset;
+    sigset_t chkset;
 
     ts.tv_sec = timeout / 1000;
     ts.tv_nsec = (timeout % 1000) * 1000000;
 
     sigemptyset(&waitset);
     sigaddset(&waitset, SIG_IPI);
+    sigaddset(&waitset, SIGBUS);
 
-    qemu_mutex_unlock(&qemu_global_mutex);
-    r = sigtimedwait(&waitset, &siginfo, &ts);
-    e = errno;
-    qemu_mutex_lock(&qemu_global_mutex);
+    do {
+        qemu_mutex_unlock(&qemu_global_mutex);
 
-    if (r == -1 && !(e == EAGAIN || e == EINTR)) {
-        fprintf(stderr, "sigtimedwait: %s\n", strerror(e));
-        exit(1);
-    }
+        r = sigtimedwait(&waitset, &siginfo, &ts);
+        e = errno;
+
+        qemu_mutex_lock(&qemu_global_mutex);
+
+        if (r == -1 && !(e == EAGAIN || e == EINTR)) {
+            fprintf(stderr, "sigtimedwait: %s\n", strerror(e));
+            exit(1);
+        }
+
+        switch (r) {
+        case SIGBUS:
+#ifdef TARGET_I386
+            if (kvm_on_sigbus_vcpu(env, siginfo.si_code, siginfo.si_addr))
+#endif
+                sigbus_reraise();
+            break;
+        default:
+            break;
+        }
+
+        r = sigpending(&chkset);
+        if (r == -1) {
+            fprintf(stderr, "sigpending: %s\n", strerror(e));
+            exit(1);
+        }
+    } while (sigismember(&chkset, SIG_IPI) || sigismember(&chkset, SIGBUS));
 }
 
 static void qemu_kvm_wait_io_event(CPUState *env)
@@ -645,6 +702,7 @@ static void kvm_init_ipi(CPUState *env)
 
     pthread_sigmask(SIG_BLOCK, NULL, &set);
     sigdelset(&set, SIG_IPI);
+    sigdelset(&set, SIGBUS);
     r = kvm_set_signal_mask(env, &set);
     if (r) {
         fprintf(stderr, "kvm_set_signal_mask: %s\n", strerror(r));
@@ -655,6 +713,7 @@ static void kvm_init_ipi(CPUState *env)
 static sigset_t block_io_signals(void)
 {
     sigset_t set;
+    struct sigaction action;
 
     /* SIGUSR2 used by posix-aio-compat.c */
     sigemptyset(&set);
@@ -665,8 +724,15 @@ static sigset_t block_io_signals(void)
     sigaddset(&set, SIGIO);
     sigaddset(&set, SIGALRM);
     sigaddset(&set, SIG_IPI);
+    sigaddset(&set, SIGBUS);
     pthread_sigmask(SIG_BLOCK, &set, NULL);
+    memset(&action, 0, sizeof(action));
+    action.sa_flags = SA_SIGINFO;
+    action.sa_sigaction = (void (*)(int, siginfo_t*, void*))sigbus_handler;
+    sigaction(SIGBUS, &action, NULL);
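The wait-then-recheck shape of the new do/while loop in qemu_kvm_eat_signal() can be tried outside QEMU. What follows is only a minimal sketch, assuming Linux and Python 3: SIGUSR1 and SIGUSR2 stand in for SIG_IPI and SIGBUS, and drain_blocked_signals is a made-up name for illustration, not anything in the patch.

```python
import signal
import threading

def drain_blocked_signals(waitset, timeout=0.1):
    """Consume pending signals from waitset and return their numbers.

    Same shape as the patched loop: wait for one signal, handle it,
    then look at the pending set again before leaving.
    """
    seen = []
    while True:
        info = signal.sigtimedwait(waitset, timeout)  # None on timeout
        if info is not None:
            seen.append(info.si_signo)
        # Leave only when none of the signals we care about is pending.
        if not (signal.sigpending() & set(waitset)):
            break
    return seen

# Stand-ins for SIG_IPI and SIGBUS (assumption: a real SIGBUS is avoided here).
WAITSET = {signal.SIGUSR1, signal.SIGUSR2}

# The signals must be blocked so they stay pending instead of being delivered.
signal.pthread_sigmask(signal.SIG_BLOCK, WAITSET)

# Queue both signals on the calling thread before waiting.
me = threading.get_ident()
signal.pthread_kill(me, signal.SIGUSR1)
signal.pthread_kill(me, signal.SIGUSR2)

received = drain_blocked_signals(WAITSET)
print(sorted(received))
```

Without the re-check, one call would return after the first signal and leave the second one pending until the next call, which appears to be the case the loop in the patch is meant to cover.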
[Qemu-devel] [PATCH 0/2] v2 Decouple block device removal from device removal
This patch series decouples the detachment of a block device from the removal of the backing pci-device. Removal of a hotplugged pci device requires the guest to respond before qemu tears down the block device. In some cases, the guest may not respond, leaving the guest with continued access to the block device. The new monitor command, drive_unplug, will revoke a guest's access to the block device independently of the removal of the pci device.

The first patch adds a new drive find method, the second patch implements the monitor command and block layer changes.

Changes since v1:
- CodingStyle fixes
- Added qemu_aio_flush() to bdrv_unplug()

Signed-off-by: Ryan Harper ry...@us.ibm.com
Re: Tracing block devices (was: Re: [Qemu-devel] Static tracepoint control via trace-event)
On Tue, Oct 19, 2010 at 03:29:51PM +0100, Richard W.M. Jones wrote:
> On Tue, Oct 19, 2010 at 03:59:36PM +0200, Jan Kiszka wrote:
> > Once we have -trace events=..., defining the list of active tracepoints before starting qemu will be trivial (e.g. via a config file). Of course, this requires that all tracepoints are built-in...
>
> Sorry that I've not been following this very closely, but does this sort of thing allow tracing reads and writes to block devices? Am I right in thinking that if a tracepoint existed in the right place, one could get a log file from that which could be post-processed in another tool?
>
> cf: http://rwmj.wordpress.com/2010/10/05/visualizing-reads-writes-and-alignment/#content

Definitely, here is the commit that added bdrv_aio_writev/bdrv_aio_readv tracing. bdrv_aio_multiwrite has been traced for a while.

http://patchwork.ozlabs.org/patch/66843/

As an example, I use the following script to find all write requests that touch a given region. This is very useful for debugging image corruptions given a trace file.

The usage is:

find_overlapping_io.py bs sector_num nb_sectors

where bs is the block driver state pointer, sector_num is the starting sector address, and nb_sectors is the number of sectors. The trace itself is read from standard input.

#!/usr/bin/env python
import sys

def trace_filter(fobj, event, keys):
    """Yield the key=value fields of every 'event' line that matches 'keys'."""
    for line in fobj:
        fields = line.strip().split()
        if not fields or fields[0] != event:
            continue

        # fields[0] is the event name; key=value pairs start at fields[2].
        attrs = dict(x.split('=', 1) for x in fields[2:])

        # Keep the line only if every requested key is present and equal.
        if all(attrs.get(k) == v for k, v in keys.items()):
            yield attrs

def intersection(a_sector_num, a_nb_sectors, b_sector_num, b_nb_sectors):
    # Two ranges overlap unless one ends at or before the start of the other.
    return not (a_sector_num + a_nb_sectors <= b_sector_num or
                b_sector_num + b_nb_sectors <= a_sector_num)

bs, sector_num, nb_sectors = sys.argv[1:]
sector_num = int(sector_num, 0)
nb_sectors = int(nb_sectors, 0)

for req in trace_filter(sys.stdin, 'bdrv_aio_writev', {'bs': bs}):
    if intersection(sector_num, nb_sectors,
                    int(req['sector_num'], 0), int(req['nb_sectors'], 0)):
        print(req)

Stefan
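The overlap test in the script above is easy to get wrong at the boundaries, so here is a small worked check. It is only a sketch: ranges_overlap is a hypothetical standalone copy of the same predicate, and the sector numbers are made up for illustration.

```python
def ranges_overlap(a_start, a_len, b_start, b_len):
    """True if [a_start, a_start + a_len) and [b_start, b_start + b_len) share a sector."""
    return not (a_start + a_len <= b_start or b_start + b_len <= a_start)

# Region of interest: 8 sectors starting at sector 100 (sectors 100..107).
checks = {
    "write ends right before region": ranges_overlap(100, 8, 92, 8),   # 92..99
    "write covers first sector":      ranges_overlap(100, 8, 93, 8),   # 93..100
    "write fully inside region":      ranges_overlap(100, 8, 102, 2),  # 102..103
    "write starts right after":       ranges_overlap(100, 8, 108, 4),  # 108..111
}

for name, hit in checks.items():
    print("%-32s %s" % (name, hit))
```

Requests that merely touch the region's edge (ending at sector 99 or starting at sector 108) are not reported, because the ranges are treated as half-open.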
Re: Testing of russian keymap (was Re: [Qemu-devel] [PATCH] fix '/' and '|' on russian keymap)
On 19/10/2010 at 10:32 +0100, Daniel P. Berrange wrote:
> On Mon, Oct 18, 2010 at 01:59:15PM -0500, Anthony Liguori wrote:
> > On 10/18/2010 12:30 PM, Oleg Sadov wrote:
> > > I don't understand reasons for such locale-default keyboard settings for qemu too, but may be it's useful for someone...
> >
> > -k only exists to deal with crappy VNC clients. If you use a good VNC client (like vinagre or virt-viewer) then you don't have to use -k.
>
> Indeed you must *NOT* use -k then, because that disables the extension that vinagre/virt-viewer rely on for sane keyboard handling.

I don't use the '-k' option directly; on my RHEL-based system it is automagically appended to the qemu-kvm command line by libvirt.

The KVM XML description created by the standard virt-manager GUI (package virt-manager-0.6.1-12.el5.x86_64) has a 'keymap' attribute on the 'graphics' tag, even though the configurator has no controls for the 'keymap' setting.

As I understand it, the 'default_keymap' function from util.py (package python-virtinst-0.400.3-9.el5.noarch) reads /etc/sysconfig/keyboard, looks the keymap up in the 'keytable' dictionary from keytable.py, and automatically places it in the 'keymap' attribute of the 'graphics' tag in the virtual machine's XML description.

Our system has Russian keyboard settings, so we get an XML description like this:

<graphics type='vnc' port='-1' autoport='yes' keymap='ru'/>

and, as a consequence, qemu-kvm runs with the '-k ru' option.

> Regards,
> Daniel

Sincerely,
--Oleg
[Qemu-devel] [PATCH 2/2] v2 Fix Block Hotplug race with drive_unplug()
Block hot unplug is racy since the guest is required to acknowledge the ACPI unplug event; this may not happen synchronously with the device removal command.

This series aims to close a gap whereby mgmt applications that assume the block resource has been removed, without confirming that the guest has acknowledged the removal, may re-assign the underlying device to a second guest, leading to data leakage.

This series introduces a new monitor command to decouple asynchronous device removal from restricting guest access to a block device. We do this by creating a new monitor command drive_unplug which maps to a bdrv_unplug() command which does a qemu_aio_flush(), bdrv_flush() and bdrv_close(). Once complete, subsequent IO is rejected from the device and the guest will get IO errors but continue to function.

A subsequent device removal command can be issued to remove the device, to which the guest may or may not respond, but as long as the unplugged bit is set, no IO will be submitted.

Changes since v1:
- Added qemu_aio_flush() before bdrv_flush() to wait on pending io

Signed-off-by: Ryan Harper ry...@us.ibm.com
---
 block.c         |    7 +++
 block.h         |    1 +
 blockdev.c      |   26 ++
 blockdev.h      |    1 +
 hmp-commands.hx |   15 +++
 5 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index a19374d..be47655 100644
--- a/block.c
+++ b/block.c
@@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable)
     }
 }
 
+void bdrv_unplug(BlockDriverState *bs)
+{
+    qemu_aio_flush();
+    bdrv_flush(bs);
+    bdrv_close(bs);
+}
+
 int bdrv_is_removable(BlockDriverState *bs)
 {
     return bs->removable;
diff --git a/block.h b/block.h
index 5f64380..732f63e 100644
--- a/block.h
+++ b/block.h
@@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, BlockErrorAction on_read_error,
                        BlockErrorAction on_write_error);
 BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
 void bdrv_set_removable(BlockDriverState *bs, int removable);
+void bdrv_unplug(BlockDriverState *bs);
 int bdrv_is_removable(BlockDriverState *bs);
 int bdrv_is_read_only(BlockDriverState *bs);
 int bdrv_is_sg(BlockDriverState *bs);
diff --git a/blockdev.c b/blockdev.c
index 5fc3b9b..68eb329 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -610,3 +610,29 @@ int do_change_block(Monitor *mon, const char *device,
     }
     return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
 }
+
+int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+    DriveInfo *dinfo;
+    BlockDriverState *bs;
+    const char *id;
+
+    if (!qdict_haskey(qdict, "id")) {
+        qerror_report(QERR_MISSING_PARAMETER, "id");
+        return -1;
+    }
+
+    id = qdict_get_str(qdict, "id");
+    dinfo = drive_get_by_id(id);
+    if (!dinfo) {
+        qerror_report(QERR_DEVICE_NOT_FOUND, id);
+        return -1;
+    }
+
+    /* mark block device unplugged */
+    bs = dinfo->bdrv;
+    bdrv_unplug(bs);
+
+    return 0;
+}
+
diff --git a/blockdev.h b/blockdev.h
index 19c6915..ecb9ac8 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -52,5 +52,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_change_block(Monitor *mon, const char *device,
                     const char *filename, const char *fmt);
+int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
 #endif
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 81999aa..7a32a2e 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -68,6 +68,21 @@ Eject a removable medium (use -f to force it).
 ETEXI
 
     {
+        .name = "drive_unplug",
+        .args_type = "id:s",
+        .params = "device",
+        .help = "unplug block device",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_drive_unplug,
+    },
+
+STEXI
+@item unplug @var{device}
+@findex unplug
+Unplug block device.
+ETEXI
+
+    {
         .name = "change",
         .args_type = "device:B,target:F,arg:s?",
         .params = "device filename [format]",
-- 
1.6.3.3
[Qemu-devel] [PATCH][RESEND] char: Flush read buffer in mux_chr_can_read
Move the buffer flush from mux_chr_read to mux_chr_can_read. While the latter is called periodically, the former will only be invoked when new characters arrive at the back-end. This caused problems for front-end drivers whenever they were unable to read data immediately, e.g. virtio-console attached to stdio.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 qemu-char.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 6d2dce7..f4c3876 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -398,6 +398,8 @@ static int mux_chr_can_read(void *opaque)
     MuxDriver *d = chr->opaque;
     int m = d->focus;
 
+    mux_chr_accept_input(opaque);
+
     if ((d->prod[m] - d->cons[m]) < MUX_BUFFER_SIZE)
         return 1;
     if (d->chr_can_read[m])
@@ -412,8 +414,6 @@ static void mux_chr_read(void *opaque, const uint8_t *buf, int size)
     int m = d->focus;
     int i;
 
-    mux_chr_accept_input (opaque);
-
     for(i = 0; i < size; i++)
         if (mux_proc_byte(chr, d, buf[i])) {
             if (d->prod[m] == d->cons[m] &&
-- 
1.7.1
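To see why the position of the flush matters, the situation can be modelled in a few lines of Python. This is a toy model under loud assumptions: ToyMux, frontend_ready and the run helper are invented for illustration and only mimic the control flow described in the commit message; they are not the QEMU data structures.

```python
from collections import deque

class ToyMux:
    """Toy single-front-end mux (illustrative names, not QEMU's)."""

    def __init__(self, flush_in_can_read):
        self.flush_in_can_read = flush_in_can_read
        self.pending = deque()       # bytes buffered for the front-end
        self.frontend_ready = False  # whether the front-end can take data
        self.delivered = []

    def accept_input(self):
        # Hand buffered bytes to the front-end while it accepts them.
        while self.pending and self.frontend_ready:
            self.delivered.append(self.pending.popleft())

    def can_read(self):
        # Polled periodically, whether or not new input exists.
        if self.flush_in_can_read:
            self.accept_input()
        return 1

    def read(self, data):
        # Called only when the back-end delivers new bytes.
        if not self.flush_in_can_read:
            self.accept_input()
        for byte in data:
            if not self.pending and self.frontend_ready:
                self.delivered.append(byte)
            else:
                self.pending.append(byte)

def run(flush_in_can_read):
    mux = ToyMux(flush_in_can_read)
    mux.read(b"hi")            # front-end busy: both bytes are buffered
    mux.frontend_ready = True  # front-end becomes ready later
    mux.can_read()             # periodic poll, no new input arrives
    return bytes(mux.delivered)

old_behaviour = run(flush_in_can_read=False)
new_behaviour = run(flush_in_can_read=True)
print(old_behaviour, new_behaviour)
```

In the toy model, flushing only in read() leaves the buffered bytes stuck until more input arrives, while flushing in can_read() drains them on the next poll.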
[Qemu-devel] [PATCH][RESEND] pcnet: Do not receive external frames in loopback mode
While not explicitly stated in the spec, it was observed on real systems that enabling loopback testing on the pcnet controller disables reception of external frames. And some legacy software relies on it, so provide this behavior.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/pcnet.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/pcnet.c b/hw/pcnet.c
index b52935a..f970bda 100644
--- a/hw/pcnet.c
+++ b/hw/pcnet.c
@@ -1048,9 +1048,10 @@ ssize_t pcnet_receive(VLANClientState *nc, const uint8_t *buf, size_t size_)
     int crc_err = 0;
     int size = size_;
 
-    if (CSR_DRX(s) || CSR_STOP(s) || CSR_SPND(s) || !size)
+    if (CSR_DRX(s) || CSR_STOP(s) || CSR_SPND(s) || !size ||
+        (CSR_LOOP(s) && !s->looptest)) {
         return -1;
-
+    }
 #ifdef PCNET_DEBUG
     printf("pcnet_receive size=%d\n", size);
 #endif
-- 
1.7.1