[Qemu-devel] [PATCH v5 0/5] hpet 'driftfix': alleviate time drift with HPET periodic timers
Hi,

This is version 5 of a series of patches that I originally posted in:

http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01989.html
http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01992.html
http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01991.html
http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01990.html
http://article.gmane.org/gmane.comp.emulators.kvm.devel/69325
http://article.gmane.org/gmane.comp.emulators.kvm.devel/69326
http://article.gmane.org/gmane.comp.emulators.kvm.devel/69327
http://article.gmane.org/gmane.comp.emulators.kvm.devel/69328

Changes since version 4: Added comments to patch part 3 and part 5. No changes in the actual code.

Please review and please comment.

Regards,

Uli

Ulrich Obergfell (5):
  hpet 'driftfix': add hooks required to detect coalesced interrupts (x86 apic only)
  hpet 'driftfix': add driftfix property to HPETState and DeviceInfo
  hpet 'driftfix': add fields to HPETTimer and VMStateDescription
  hpet 'driftfix': add code in update_irq() to detect coalesced interrupts (x86 apic only)
  hpet 'driftfix': add code in hpet_timer() to compensate delayed callbacks and coalesced interrupts

 hw/apic.c |    4 ++
 hw/hpet.c |  178 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 hw/pc.h   |   13 +
 vl.c      |   13 +
 4 files changed, 204 insertions(+), 4 deletions(-)
[Qemu-devel] [PATCH v5 4/5] hpet 'driftfix': add code in update_irq() to detect coalesced interrupts (x86 apic only)
update_irq() uses a similar method as in 'rtc_td_hack' to detect coalesced interrupts. The function entry addresses are retrieved from 'target_get_irq_delivered' and 'target_reset_irq_delivered'. This change can be replaced if a generic feedback infrastructure to track coalesced IRQs for periodic, clock providing devices becomes available.

Signed-off-by: Ulrich Obergfell uober...@redhat.com
---
 hw/hpet.c |   13 +++++++++++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/hpet.c b/hw/hpet.c
index dba9370..0428290 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -184,11 +184,12 @@ static inline uint64_t hpet_calculate_diff(HPETTimer *t, uint64_t current)
     }
 }

-static void update_irq(struct HPETTimer *timer, int set)
+static int update_irq(struct HPETTimer *timer, int set)
 {
     uint64_t mask;
     HPETState *s;
     int route;
+    int irq_delivered = 1;

     if (timer->tn <= 1 && hpet_in_legacy_mode(timer->state)) {
         /* if LegacyReplacementRoute bit is set, HPET specification requires
@@ -213,8 +214,16 @@ static void update_irq(struct HPETTimer *timer, int set)
             qemu_irq_raise(s->irqs[route]);
         } else {
             s->isr &= ~mask;
-            qemu_irq_pulse(s->irqs[route]);
+            if (s->driftfix) {
+                target_reset_irq_delivered();
+                qemu_irq_raise(s->irqs[route]);
+                irq_delivered = target_get_irq_delivered();
+                qemu_irq_lower(s->irqs[route]);
+            } else {
+                qemu_irq_pulse(s->irqs[route]);
+            }
         }
+    return irq_delivered;
 }

 static void hpet_pre_save(void *opaque)
--
1.6.2.5
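The reset/raise/check/lower sequence above can be exercised against a dummy interrupt sink. The sketch below is illustrative only: the fake_* names are invented stand-ins for the APIC hooks and the qemu_irq calls, and the "masked" flag simulates the guest not accepting an edge.

```c
#include <assert.h>
#include <stdbool.h>

/* Invented stand-ins for target_get/reset_irq_delivered() and the
 * qemu_irq line; not the real QEMU plumbing. */
static bool irq_delivered_flag;
static bool guest_irq_masked;

static void fake_reset_irq_delivered(void) { irq_delivered_flag = false; }
static int  fake_get_irq_delivered(void)   { return irq_delivered_flag; }

/* A raised edge counts as delivered only if the guest isn't masking it. */
static void fake_irq_raise(void)
{
    if (!guest_irq_masked) {
        irq_delivered_flag = true;
    }
}

static void fake_irq_lower(void) { }

/* The detection pattern from update_irq(): pulse the line by hand and
 * ask the sink whether the edge actually reached the guest. */
static int pulse_and_check(void)
{
    int delivered;

    fake_reset_irq_delivered();
    fake_irq_raise();
    delivered = fake_get_irq_delivered();
    fake_irq_lower();
    return delivered;
}
```

The point of splitting qemu_irq_pulse() into raise/lower is precisely to be able to query the delivery flag between the two halves.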
[Qemu-devel] [PATCH v5 5/5] hpet 'driftfix': add code in hpet_timer() to compensate delayed callbacks and coalesced interrupts
Loss of periodic timer interrupts caused by delayed callbacks and by interrupt coalescing is compensated by gradually injecting additional interrupts during subsequent timer intervals, starting at a rate of one additional interrupt per interval. The injection of additional interrupts is based on a backlog of unaccounted HPET clock periods (new HPETTimer field 'ticks_not_accounted'). The backlog increases due to delayed callbacks and coalesced interrupts, and it decreases if an interrupt was injected successfully. If the backlog increases while compensation is still in progress, the rate at which additional interrupts are injected is increased too. A limit is imposed on the backlog and on the rate.

Injecting additional timer interrupts to compensate lost interrupts can alleviate long term time drift. However, on a short time scale, this method can have the side effect of making virtual machine time intermittently pass slower and faster than real time (depending on the guest's time keeping algorithm). Compensation is disabled by default and can be enabled for guests where this behaviour may be acceptable.

Signed-off-by: Ulrich Obergfell uober...@redhat.com
---
 hw/hpet.c |  120 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 118 insertions(+), 2 deletions(-)

diff --git a/hw/hpet.c b/hw/hpet.c
index 0428290..bc2a21a 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -31,6 +31,7 @@
 #include "hpet_emul.h"
 #include "sysbus.h"
 #include "mc146818rtc.h"
+#include <assert.h>

 //#define HPET_DEBUG
 #ifdef HPET_DEBUG
@@ -41,6 +42,9 @@

 #define HPET_MSI_SUPPORT        0

+#define MAX_TICKS_NOT_ACCOUNTED (uint64_t)5 /* 5 sec */
+#define MAX_IRQ_RATE            (uint32_t)10
+
 struct HPETState;
 typedef struct HPETTimer {  /* timers */
     uint8_t tn;             /*timer number*/
@@ -334,13 +338,68 @@ static const VMStateDescription vmstate_hpet = {
 };

 /*
+ * This function resets the driftfix state in the following situations.
+ *
+ * - When the guest o/s changes the 'CFG_ENABLE' bit (overall enable)
+ *   in the General Configuration Register from 0 to 1.
+ *
+ * - When the guest o/s changes the 'TN_ENABLE' bit (timer N interrupt enable)
+ *   in the Timer N Configuration and Capabilities Register from 0 to 1.
+ */
+static void hpet_timer_driftfix_reset(HPETTimer *t)
+{
+    if (t->state->driftfix && timer_is_periodic(t)) {
+        t->ticks_not_accounted = t->prev_period = t->period;
+        t->irq_rate = 1;
+        t->divisor = 1;
+    }
+}
+
+/*
+ * This function determines whether there is a backlog of ticks for which
+ * no interrupts have been delivered to the guest o/s yet. If the backlog
+ * is equal to or greater than the current period length, then additional
+ * interrupts will be delivered to the guest o/s inside of the subsequent
+ * period interval to compensate missed interrupts.
+ *
+ * 'ticks_not_accounted' increases by 'N * period' when the comparator is
+ * being advanced, and it decreases by 'prev_period' when an interrupt is
+ * delivered to the guest o/s. Normally 'prev_period' is equal to 'period'
+ * and 'N' is 1. 'prev_period' is different from 'period' if a guest o/s
+ * has changed the comparator value during the previous period interval.
+ * 'N' is greater than 1 if the callback was delayed by 'N - 1' periods,
+ * and 'N' is zero while additional interrupts are delivered inside of an
+ * interval.
+ *
+ * This function is called after the comparator has been advanced but before
+ * the interrupt is delivered to the guest o/s. Hence, 'ticks_not_accounted'
+ * is equal to 'prev_period' plus 'period' if there is no backlog.
+ */
+static bool hpet_timer_has_tick_backlog(HPETTimer *t)
+{
+    uint64_t backlog = 0;
+
+    if (t->ticks_not_accounted >= t->period + t->prev_period) {
+        backlog = t->ticks_not_accounted - (t->period + t->prev_period);
+    }
+    return (backlog >= t->period);
+}
+
+/*
  * timer expiration callback
  */
 static void hpet_timer(void *opaque)
 {
     HPETTimer *t = opaque;
+    HPETState *s = t->state;
     uint64_t diff;
-
+    int irq_delivered = 0;
+    uint32_t period_count = 0; /* elapsed periods since last callback
+                                *  1: normal case
+                                * >1: missed 'period_count - 1' interrupts
+                                *     due to delayed callback
+                                *  0: callback inside of an interval
+                                *     to deliver additional interrupts */
     uint64_t period = t->period;
     uint64_t cur_tick = hpet_get_ticks(t->state);

@@ -348,13 +407,48 @@ static void hpet_timer(void *opaque)
     if (t->config & HPET_TN_32BIT) {
         while (hpet_time_after(cur_tick, t->cmp)) {
             t->cmp = (uint32_t)(t->cmp + t->period);
+            t->ticks_not_accounted += t->period;
+            period_count++;
         }
     } else {
         while (hpet_time_after64(cur_tick, t->cmp)) {
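The backlog test in hpet_timer_has_tick_backlog() can be restated as a standalone function over plain integers, which makes the arithmetic in the comment easy to check by hand (the parameter names mirror the HPETTimer fields, but this sketch has none of the patch's timer plumbing):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* After the comparator has been advanced, ticks_not_accounted equals
 * prev_period + period when no interrupt was missed; anything beyond
 * that, if it amounts to at least one full period, is a backlog that
 * must be repaid with additional interrupts. */
static bool has_tick_backlog(uint64_t ticks_not_accounted,
                             uint64_t period, uint64_t prev_period)
{
    uint64_t backlog = 0;

    if (ticks_not_accounted >= period + prev_period) {
        backlog = ticks_not_accounted - (period + prev_period);
    }
    return backlog >= period;
}
```

For example, with period = prev_period = 1000, a ticks_not_accounted value of 2000 is the no-loss steady state, while 3000 means exactly one period's worth of missed ticks and triggers compensation.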
[Qemu-devel] [PATCH v5 1/5] hpet 'driftfix': add hooks required to detect coalesced interrupts (x86 apic only)
'target_get_irq_delivered' and 'target_reset_irq_delivered' point to functions that are called by update_irq() to detect coalesced interrupts. Initially they point to stub functions which pretend successful interrupt injection. apic code calls two registration functions to replace the stubs with apic_get_irq_delivered() and apic_reset_irq_delivered(). This change can be replaced if a generic feedback infrastructure to track coalesced IRQs for periodic, clock providing devices becomes available.

Signed-off-by: Ulrich Obergfell uober...@redhat.com
---
 hw/apic.c |    4 ++++
 hw/pc.h   |   13 +++++++++++++
 vl.c      |   13 +++++++++++++
 3 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index a45b57f..94b1d15 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -17,6 +17,7 @@
  * License along with this library; if not, see <http://www.gnu.org/licenses/>
  */
 #include "hw.h"
+#include "pc.h"
 #include "apic.h"
 #include "ioapic.h"
 #include "qemu-timer.h"
@@ -1143,6 +1144,9 @@ static SysBusDeviceInfo apic_info = {

 static void apic_register_devices(void)
 {
+    register_target_get_irq_delivered(apic_get_irq_delivered);
+    register_target_reset_irq_delivered(apic_reset_irq_delivered);
+
     sysbus_register_withprop(&apic_info);
 }

diff --git a/hw/pc.h b/hw/pc.h
index bc8fcec..7511f28 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -7,6 +7,19 @@
 #include "fdc.h"
 #include "net.h"

+extern int (*target_get_irq_delivered)(void);
+extern void (*target_reset_irq_delivered)(void);
+
+static inline void register_target_get_irq_delivered(int (*func)(void))
+{
+    target_get_irq_delivered = func;
+}
+
+static inline void register_target_reset_irq_delivered(void (*func)(void))
+{
+    target_reset_irq_delivered = func;
+}
+
 /* PC-style peripherals (also used by other machines). */

 /* serial.c */

diff --git a/vl.c b/vl.c
index 73e147f..456e320 100644
--- a/vl.c
+++ b/vl.c
@@ -232,6 +232,19 @@ const char *prom_envs[MAX_PROM_ENVS];
 const char *nvram = NULL;
 int boot_menu;

+static int target_get_irq_delivered_stub(void)
+{
+    return 1;
+}
+
+static void target_reset_irq_delivered_stub(void)
+{
+    return;
+}
+
+int (*target_get_irq_delivered)(void) = target_get_irq_delivered_stub;
+void (*target_reset_irq_delivered)(void) = target_reset_irq_delivered_stub;
+
 typedef struct FWBootEntry FWBootEntry;

 struct FWBootEntry {
--
1.6.2.5
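The stub-plus-registration pattern used for the hooks above reduces to a few lines: a function pointer starts out pointing at a stub that pretends success, and the interrupt controller swaps in the real implementation at registration time. The fake_apic_* name below is an invented stand-in for apic_get_irq_delivered():

```c
#include <assert.h>

/* Default stub: pretend every injection was delivered. */
static int get_irq_delivered_stub(void) { return 1; }

/* Hook starts out pointing at the stub. */
static int (*get_irq_delivered)(void) = get_irq_delivered_stub;

static void register_get_irq_delivered(int (*func)(void))
{
    get_irq_delivered = func;
}

/* Invented stand-in for the real APIC implementation, here reporting
 * a coalesced (not delivered) interrupt. */
static int fake_apic_get_irq_delivered(void) { return 0; }
```

The benefit of this shape is that callers like update_irq() can always call the hook unconditionally; on targets without an APIC the stub keeps the driftfix path behaving as if every interrupt landed.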
[Qemu-devel] [PATCH v5 2/5] hpet 'driftfix': add driftfix property to HPETState and DeviceInfo
driftfix is a 'bit type' property. Compensation of delayed callbacks and coalesced interrupts can be enabled with the command line option

    -global hpet.driftfix=on

driftfix is 'off' (disabled) by default.

Signed-off-by: Ulrich Obergfell uober...@redhat.com
---
 hw/hpet.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/hw/hpet.c b/hw/hpet.c
index 6ce07bc..7513065 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -72,6 +72,8 @@ typedef struct HPETState {
     uint64_t isr;           /* interrupt status reg */
     uint64_t hpet_counter;  /* main counter */
     uint8_t  hpet_id;       /* instance id */
+
+    uint32_t driftfix;
 } HPETState;

 static uint32_t hpet_in_legacy_mode(HPETState *s)
@@ -738,6 +740,7 @@ static SysBusDeviceInfo hpet_device_info = {
     .qdev.props = (Property[]) {
         DEFINE_PROP_UINT8("timers", HPETState, num_timers, HPET_MIN_TIMERS),
         DEFINE_PROP_BIT("msi", HPETState, flags, HPET_MSI_SUPPORT, false),
+        DEFINE_PROP_BIT("driftfix", HPETState, driftfix, 0, false),
         DEFINE_PROP_END_OF_LIST(),
     },
 };
--
1.6.2.5
[Qemu-devel] [PATCH v5 3/5] hpet 'driftfix': add fields to HPETTimer and VMStateDescription
The new fields in HPETTimer are covered by a separate VMStateDescription which is a subsection of 'vmstate_hpet_timer'. They are only migrated if

    -global hpet.driftfix=on

Signed-off-by: Ulrich Obergfell uober...@redhat.com
---
 hw/hpet.c |   42 ++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 42 insertions(+), 0 deletions(-)

diff --git a/hw/hpet.c b/hw/hpet.c
index 7513065..dba9370 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -55,6 +55,19 @@ typedef struct HPETTimer {  /* timers */
     uint8_t wrap_flag;      /* timer pop will indicate wrap for one-shot
                              * 32-bit mode. Next pop will be actual timer
                              * expiration. */
+    /* driftfix state */
+    uint64_t prev_period;        /* needed when the guest o/s changes the
+                                  * comparator value */
+    uint64_t ticks_not_accounted;/* ticks for which no interrupts have been
+                                  * delivered to the guest o/s yet */
+    uint32_t irq_rate;           /* rate at which interrupts are delivered
+                                  * to the guest o/s during one period
+                                  * interval; if rate is greater than 1,
+                                  * additional interrupts are delivered
+                                  * to compensate missed interrupts */
+    uint32_t divisor;            /* needed to determine when the next
+                                  * timer callback should occur while
+                                  * rate is greater than 1 */
 } HPETTimer;

 typedef struct HPETState {
@@ -246,6 +259,27 @@ static int hpet_post_load(void *opaque, int version_id)
     return 0;
 }

+static bool hpet_timer_driftfix_vmstate_needed(void *opaque)
+{
+    HPETTimer *t = opaque;
+
+    return (t->state->driftfix != 0);
+}
+
+static const VMStateDescription vmstate_hpet_timer_driftfix = {
+    .name = "hpet_timer_driftfix",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields = (VMStateField []) {
+        VMSTATE_UINT64(prev_period, HPETTimer),
+        VMSTATE_UINT64(ticks_not_accounted, HPETTimer),
+        VMSTATE_UINT32(irq_rate, HPETTimer),
+        VMSTATE_UINT32(divisor, HPETTimer),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static const VMStateDescription vmstate_hpet_timer = {
     .name = "hpet_timer",
     .version_id = 1,
@@ -260,6 +294,14 @@ static const VMStateDescription vmstate_hpet_timer = {
         VMSTATE_UINT8(wrap_flag, HPETTimer),
         VMSTATE_TIMER(qemu_timer, HPETTimer),
         VMSTATE_END_OF_LIST()
+    },
+    .subsections = (VMStateSubsection []) {
+        {
+            .vmsd = &vmstate_hpet_timer_driftfix,
+            .needed = hpet_timer_driftfix_vmstate_needed,
+        }, {
+            /* empty */
+        }
     }
 };
--
1.6.2.5
Re: [Qemu-devel] [PATCH 20/26] target-xtensa: implement extended L32R
+static void gen_wsr_litbase(DisasContext *dc, uint32_t sr, TCGv_i32 s)
+{
+    tcg_gen_mov_i32(cpu_SR[sr], s);
+    /* This can change tb->flags, so exit tb */
+    gen_jumpi_check_loop_end(dc, -1);
+}

Surely you have to flush all TB's when changing litbase?

+((dc->tb->flags & XTENSA_TBFLAG_LITBASE) ?
+  dc->litbase :
+  ((dc->pc + 3) & ~3)) +
+(0xfffc | (RI16_IMM16 << 2)));

Unless you actually read from env->sr[LITBASE] here, instead of building the value into the TB.

You're right, I have to flush all TBs at gen_wsr_litbase for this code to always work correctly. As far as I can see LITBASE usage pattern is that it is set up once in early initialization and is never changed after.

Thanks.
-- Max
Re: [Qemu-devel] [RFC] Memory API
On Thu, May 19, 2011 at 08:55:49PM +0200, Jan Kiszka wrote: Because we should catch accidental overlaps in all those non PCI devices with hard-wired addressing. That's a bug in the device/machine model and should be reported as such by QEMU. Why should we complicate API to catch unlikely errors? If you want to debug that add capability to dump memory map from the monitor. Because we need to switch tons of code that so far saw a fairly different reaction of the core to overlapping regions. How so? Today if there is accidental overlap device will not function properly. With new API it will be the same. I rather expect subtle differences as overlapping registration changes existing regions, in the future those will recover. Where do you expect the differences will come from? Conversion to the new API shouldn't change the order of the registration and if the last registration will override previous one the end result should be the same as we have today. new region management will not cause any harm to overlapping regions so that they can recover when the overlap is gone. Another example may be APIC region and PCI. They overlap, but neither CPU nor PCI knows about it. And they do not need to. The APIC regions will be managed by the per-CPU region management, reusing the tool box we need for all bridges. It will register the APIC page with a priority higher than the default one, thus overriding everything that comes from the host bridge. I think that reflects pretty well real machine behaviour. What is higher? How does it know that priority is high enough? Because no one else manages priorities at a specific hierarchy level. There is only one. PCI and CPU are on different hierarchy levels. PCI is under the PIIX and CPU is on a system BUS. The priority for the APIC mapping will be applied at CPU level, of course. So it will override everything, not just PCI. So you do not need explicit priority because the place in hierarchy implicitly provides you with one. Yes. 
OK :) So you agree that we can do without priorities :)

Alternatively, you could add a prio offset to all mappings when climbing one level up, provided that offset is smaller than the prio range locally available to each level.

Then a memory region's final priority will depend on the tree height. If two disjoint tree branches of different heights claim the same memory region, the higher one will have higher priority. I think this priority management is a can of worms.

Only the lowest level (aka system bus) will use the memory API directly. A PCI device will call the PCI subsystem. The PCI subsystem, instead of assigning arbitrary priorities to all overlappings, may just resolve them and pass a flattened view to the chipset. The chipset in turn will look for overlappings between PCI memory areas and RAM/ISA/other memory areas that are outside of the PCI windows, resolve all of those, and pass the flattened view to the system bus, where the APIC/PCI conflict will be resolved and finally the memory API will be used to create the memory map.

In such a model I do not see the need for priorities. All overlappings are resolved in the most logical place, the one that has the best knowledge about how to resolve the conflict. There will be no code duplication. Overlapping resolution code will be in a separate library used by all layers.

-- Gleb.
Re: [Qemu-devel] [PATCH 19/26] target-xtensa: implement loop option
+if (env->sregs[LEND] != v) {
+    tb_invalidate_phys_page_range(
+        env->sregs[LEND] - 1, env->sregs[LEND], 0);
+    env->sregs[LEND] = v;
+    tb_invalidate_phys_page_range(
+        env->sregs[LEND] - 1, env->sregs[LEND], 0);
+}

Why are you invalidating twice?

TB at the old LEND and at the new. Although it will work correctly without first invalidation.

+static void gen_check_loop_end(DisasContext *dc, int slot)
+{
+    if (option_enabled(dc, XTENSA_OPTION_LOOP) &&
+        !(dc->tb->flags & XTENSA_TBFLAG_EXCM) &&
+        dc->next_pc == dc->lend) {
+        int label = gen_new_label();
+
+        tcg_gen_brcondi_i32(TCG_COND_NE, cpu_SR[LEND], dc->next_pc, label);
+        tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_SR[LCOUNT], 0, label);
+        tcg_gen_subi_i32(cpu_SR[LCOUNT], cpu_SR[LCOUNT], 1);
+        gen_jump(dc, cpu_SR[LBEG]);
+        gen_set_label(label);
+        gen_jumpi(dc, dc->next_pc, slot);

If you're going to pretend that LEND is a constant, you might as well pretend that LBEG is also a constant, so that you get to chain the TB's around the loop.

But there may be three exits from TB at the LEND if its last command is a branch: to the LBEG, to the branch target and to the next insn.

Thanks.
-- Max
Re: [Qemu-devel] [PATCH 09/26] target-xtensa: add special and user registers
+enum {
+    THREADPTR = 231,
+    FCR = 232,
+    FSR = 233,
+};
+
 typedef struct XtensaConfig {
     const char *name;
     uint64_t options;
@@ -109,6 +115,7 @@ typedef struct CPUXtensaState {
     uint32_t regs[16];
     uint32_t pc;
     uint32_t sregs[256];
+    uint32_t uregs[256];

Is it really worthwhile allocating 2k worth of space in the CPUState when only several of the slots are actually used? I would think that it might be better to have a function to map between number and offset/register. E.g.

int ur_offset(int ur)
{
    switch (ur) {
    case THREADPTR: return offsetof(CPUState, ur_threadptr);
    case FCR:       return offsetof(CPUState, ur_fcr);
    case FSR:       return offsetof(CPUState, ur_fsr);
    }
    return -1;
}

where the individual slots are allocated by hand in the CPUState. The fact that they'll be named in the struct will also make it easier to dump the value inside gdb and see what the individual values are.

User registers represent TIE states that may appear in custom xtensa configurations. I'd better change RUR and WUR so that they can access all user registers but warn on those not defined globally or in the CPUEnv::config. Is it OK?

Thanks.
-- Max
Re: [Qemu-devel] [RFC] Memory API
On 2011-05-20 09:23, Gleb Natapov wrote: On Thu, May 19, 2011 at 08:55:49PM +0200, Jan Kiszka wrote: Because we should catch accidental overlaps in all those non PCI devices with hard-wired addressing. That's a bug in the device/machine model and should be reported as such by QEMU. Why should we complicate API to catch unlikely errors? If you want to debug that add capability to dump memory map from the monitor. Because we need to switch tons of code that so far saw a fairly different reaction of the core to overlapping regions. How so? Today if there is accidental overlap device will not function properly. With new API it will be the same. I rather expect subtle differences as overlapping registration changes existing regions, in the future those will recover. Where do you expect the differences will come from? Conversion to the new API shouldn't change the order of the registration and if the last registration will override previous one the end result should be the same as we have today. A) Removing regions will change significantly. So far this is done by setting a region to IO_MEM_UNASSIGNED, keeping truncation. With the new API that will be a true removal which will additionally restore hidden regions. B) Uncontrolled overlapping is a bug that should be caught by the core, and a new API is a perfect chance to do this. new region management will not cause any harm to overlapping regions so that they can recover when the overlap is gone. Another example may be APIC region and PCI. They overlap, but neither CPU nor PCI knows about it. And they do not need to. The APIC regions will be managed by the per-CPU region management, reusing the tool box we need for all bridges. It will register the APIC page with a priority higher than the default one, thus overriding everything that comes from the host bridge. I think that reflects pretty well real machine behaviour. What is higher? How does it know that priority is high enough? 
Because no one else manages priorities at a specific hierarchy level. There is only one.

PCI and CPU are on different hierarchy levels. PCI is under the PIIX and CPU is on a system BUS.

The priority for the APIC mapping will be applied at CPU level, of course.

So it will override everything, not just PCI.

So you do not need explicit priority because the place in hierarchy implicitly provides you with one.

Yes.

OK :) So you agree that we can do without priorities :)

Nope, see below how your own example depends on them.

Alternatively, you could add a prio offset to all mappings when climbing one level up, provided that offset is smaller than the prio range locally available to each level. Then a memory region's final priority will depend on the tree height. If two disjoint tree branches of different heights claim the same memory region, the higher one will have higher priority. I think this priority management is a can of worms.

It is not, as it remains a purely local thing and helps implementing the sketched scenarios. Believe me, I tried to fix PAM/SMRAM already.

Only the lowest level (aka system bus) will use memory API directly.

Not necessarily. It depends on how much added value buses like PCI or ISA or whatever can offer for managing I/O regions. For some purposes, it may as well be fine to just call the memory_* service directly and pass the result of some operation to the bus API later on.

PCI device will call PCI subsystem. PCI subsystem, instead of assigning arbitrary priorities to all overlappings,

Again: PCI will _not_ assign arbitrary priorities but only MEMORY_REGION_DEFAULT_PRIORITY, likely 0.

may just resolve them and pass flattened view to the chipset. Chipset in turn will look for overlappings between PCI memory areas and RAM/ISA/other memory areas that are outside of PCI windows and resolve all those passing the flattened view to system bus where APIC/PCI conflict will be resolved and finally memory API will be used to create memory map.
In such a model I do not see the need for priorities. All overlappings are resolved in the most logical place, the one that has the best knowledge about how to resolve the conflict. There will be no code duplication. Overlapping resolution code will be in a separate library used by all layers.

That does not specify how the PCI bridge or the chipset will tell that overlapping-resolution lib _how_ overlapping regions shall be translated into a flat representation. And precisely here priorities come into play. They are the way to tell that lib that region A shall override region B when A has a higher prio, or that, if regions A and B overlap with the same prio, it may resolve them however it wants.

Jan
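To make the dispute concrete, here is a toy model of priority-based overlap resolution at a single hierarchy level, as discussed in this thread. Everything here is invented for illustration (the Region type, the resolve() helper, and the two-region layout); it is not the proposed QEMU memory API.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* A registered region at one hierarchy level. */
typedef struct {
    uint64_t start, size;
    int priority;
    const char *name;
} Region;

/* Flattening rule for one address: the visible region is the
 * registered one with the highest priority covering that address. */
static const char *resolve(const Region *regions, size_t n, uint64_t addr)
{
    const Region *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        const Region *r = &regions[i];
        if (addr >= r->start && addr < r->start + r->size &&
            (best == NULL || r->priority > best->priority)) {
            best = r;
        }
    }
    return best ? best->name : "unassigned";
}
```

With a large low-priority "pci" window and a small higher-priority "apic" page inside it, the apic page wins exactly where it overlaps and the pci window shows through everywhere else, which is the behaviour both sides of the thread want; the disagreement is only about who computes this and where.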
Re: [Qemu-devel] [PATCH 01/27] Clean up PowerPC SLB handling code
On 20.05.2011, at 05:34, David Gibson wrote: On Thu, May 19, 2011 at 10:25:04AM +0200, Andreas Färber wrote: QEMU HEAD still uses a 32-bit binary for both 32-bit and 64-bit. That one uses mtsrin so will need the compatibility, it seemed affected, too. OpenBIOS SVN HEAD (blob) uses slb* as linked to. We're in the preparation of 1.1 and I need to test it before we can update the QEMU binary. ;) Sorry for top-posting, Android sucks. So, my theory was half right. It was a problem with 64-bit mtsr emulation, but it wasn't that I just removed that code with the SLB cleanup. The code was still there and *almost* right. I was off by one in one shift, causing the storage key bits to end up in the wrong place in the SLB entry. I'll send out the patch right after I've sent this mail. Thanks a lot for tracking it down you two :) Alex
Re: [Qemu-devel] [PATCH] Fix a bug in mtsr/mtsrin emulation on ppc64
On 20.05.2011, at 05:34, David Gibson wrote:

Early ppc64 CPUs include a hack to partially simulate the ppc32 segment registers, by translating writes to them into writes to the SLB. This is not used by any current Linux kernel, but it is used by the openbios used in the qemu mac99 model.

Commit 81762d6dd0d430d87024f2c83e9c4dcc4329fb7d, cleaning up the SLB handling, introduced a bug in this code, breaking the openbios currently in qemu. Specifically, there was an off-by-one error bitshuffling the register format used by mtsr into the format needed for the SLB load, causing the flag bits to end up in the wrong place. This caused the storage keys to be wrong under openbios, meaning that the translation code incorrectly thought a legitimate access was a permission violation.

This patch fixes the bug; at the same time it fixes some build bugs in the MMU debugging code (only exposed when DEBUG_MMU is enabled).

Thanks, applied to ppc-next :)

Alex
Re: [Qemu-devel] [V2 2/2]Qemu: Add commands hostcache_set and hostcache_get
On Thu, May 19, 2011 at 10:38:03PM +0530, Supriya Kannery wrote:

Monitor commands hostcache_set and hostcache_get added for dynamic host cache change and display of host cache setting respectively.

A generic command for changing block device options would be nice, although I don't see other options where it makes sense to change them at runtime. The alternative would be:

block_set hostcache on

block_set, { "device": "ide1-cd0", "name": "hostcache", "enable": true }

The hostcache_get information would be part of query-block output:

{
    "device": "ide0-hd0",
    "locked": false,
    "removable": false,
    "inserted": {
        "ro": false,
        "drv": "qcow2",
        "encrypted": false,
        "file": "disks/test.img",
        "hostcache": true,
    },
    "type": "hd"
},

This approach is extensible if more options need to be exposed.

Signed-off-by: Supriya Kannery supri...@in.ibm.com
---
 block.c         |   48 ++++++++++++++++++++++++++++
 block.h         |    2 ++
 blockdev.c      |   48 ++++++++++++++++++++++++++++
 blockdev.h      |    2 ++
 hmp-commands.hx |   29 +++++++++++++++++
 qmp-commands.hx |   55 +++++++++++++++++++++++++++++++
 6 files changed, 184 insertions(+)

Index: qemu/hmp-commands.hx
===================================================================
--- qemu.orig/hmp-commands.hx
+++ qemu/hmp-commands.hx
@@ -70,6 +70,35 @@ but should be used with extreme caution.
 resizes image files, it can not resize block devices like LVM volumes.
 ETEXI

+    {
+        .name       = "hostcache_get",
+        .args_type  = "device:B",
+        .params     = "device",
+        .help       = "retrieve host cache settings for device",

Please make it clear these operations affect block devices: "for block device"

+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_hostcache_get,
+    },
+
+STEXI
+@item hostcache_get
+@findex hostcache_get
+Display host cache settings for a block device while guest is running.
+ETEXI
+
+    {
+        .name       = "hostcache_set",
+        .args_type  = "device:B,hostcache:s",
+        .params     = "device hostcache",
+        .help       = "change host cache setting for device",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_hostcache_set,
+    },
+
+STEXI
+@item hostcache_set
+@findex hostcache_set
+Change host cache options for a block device while guest is running.
+ETEXI

     {
         .name       = "eject",

Index: qemu/block.c
===================================================================
--- qemu.orig/block.c
+++ qemu/block.c
@@ -657,6 +657,34 @@ unlink_and_fail:
     return ret;
 }

+int bdrv_reopen(BlockDriverState *bs, int bdrv_flags)
+{
+    BlockDriver *drv = bs->drv;
+    int ret = 0;
+
+    /* No need to reopen as no change in flags */
+    if (bdrv_flags == bs->open_flags) {
+        return 0;
+    }
+
+    /* Quiesce IO for the given block device */
+    qemu_aio_flush();
+    bdrv_flush(bs);
+
+    bdrv_close(bs);
+    ret = bdrv_open(bs, bs->filename, bdrv_flags, drv);
+
+    /*
+     * A failed attempt to reopen the image file must lead to 'abort()'
+     */
+    if (ret != 0) {
+        qerror_report(QERR_REOPEN_FILE_FAILED, bs->filename);
+        abort();

The error is never reported on a QMP monitor because qerror_report() simply stashes away the qerror. The QMP client doesn't have a chance to read the error before QEMU terminates.

+    }
+
+    return ret;
+}
+
 void bdrv_close(BlockDriverState *bs)
 {
     if (bs->drv) {
@@ -3049,3 +3077,23 @@ out:

     return ret;
 }
+
+int bdrv_change_hostcache(BlockDriverState *bs, bool enable_host_cache)

Consistently using "hostcache" or "host_cache" would be nice.

+{
+    int bdrv_flags = bs->open_flags;
+
+    /* No change in existing hostcache setting */
+    if(!enable_host_cache == (bdrv_flags & BDRV_O_NOCACHE)) {

This expression doesn't work as expected. bool has a lower rank than int. That means !enable_host_cache is converted to an int and compared against bdrv_flags & BDRV_O_NOCACHE. This expression is always false because a bool is 0 or 1 and BDRV_O_NOCACHE is 0x0020.

+        return -1;

This shouldn't be a failure, and please don't use -1 when a negative errno indicates failure. -1 == -EPERM. The return value should be 0 here.

+    }

Anyway, this whole check is unnecessary since bdrv_reopen() already performs it.

+
+    /* set hostcache flags (without changing WCE/flush bits) */
+    if(!enable_host_cache) {
+        bdrv_flags |= BDRV_O_NOCACHE;
+    } else {
+        bdrv_flags &= ~BDRV_O_NOCACHE;
+    }
+
+    /* Reopen file with changed set of flags */
+    return(bdrv_reopen(bs, bdrv_flags));

Please run scripts/checkpatch.pl before submitting patches.

+}

Index: qemu/blockdev.c
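Stefan's point about the bool/int comparison can be illustrated with a corrected version of the flag manipulation. This is a hedged sketch, not the submitted patch: it keeps only the flag toggle and adds a predicate that normalizes the flag test to a bool before any comparison, which is what the original expression failed to do.

```c
#include <assert.h>
#include <stdbool.h>

#define BDRV_O_NOCACHE 0x0020 /* value quoted in the review above */

/* Compute the new open flags for a requested hostcache setting,
 * leaving every other bit untouched. */
static int new_open_flags(int flags, bool enable_host_cache)
{
    if (enable_host_cache) {
        flags &= ~BDRV_O_NOCACHE;
    } else {
        flags |= BDRV_O_NOCACHE;
    }
    return flags;
}

/* Correct way to read the setting back: the !(...) collapses the
 * 0x0020 bit to 0 or 1 before it is used as a truth value, avoiding
 * the bool-vs-int rank problem in the reviewed expression. */
static bool hostcache_enabled(int flags)
{
    return !(flags & BDRV_O_NOCACHE);
}
```

With these helpers, the "no change requested" check becomes simply `hostcache_enabled(bs_flags) == enable_host_cache`, which compares two bools.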
[Qemu-devel] virtio scsi host draft specification, v2
Hi all,

here is the second version of the spec. In the end I took the advice of merging all requestq's into one. The reason for this is that I took a look at the vSCSI device and liked its approach of using SAM 8-byte LUNs directly. While it _is_ complex (and not yet done right by QEMU---will send a patch for that), the scheme is actually quite natural to implement and use, and supporting generic bus/target/LUN topologies is good to have for passthrough, as well. I also added a few more features from SAM to avoid redefining the structs in the future.

Of course it may be that I'm completely wrong. :) Please comment on the spec!

Paolo

Virtio SCSI Host Device Spec

The virtio SCSI host device groups together one or more simple virtual devices (ie. disk), and allows communicating to these devices using the SCSI protocol. An instance of the device represents a SCSI host with possibly many buses, targets and LUNs attached.

The virtio SCSI device services two kinds of requests:

- command requests for a logical unit;
- task management functions related to a logical unit, target or command.

The device is also able to send out notifications about added and removed logical units.

v4: First public version
v5: Merged all virtqueues into one, removed separate TARGET fields

Configuration
-------------

Subsystem Device ID
    TBD

Virtqueues
    0: control transmitq
    1: control receiveq
    2: requestq

Feature bits
    VIRTIO_SCSI_F_INOUT - Whether a single request can include both
    read-only and write-only data buffers.

Device configuration layout

    struct virtio_scsi_config {
    }

    (Still empty)

Device initialization
---------------------

The initialization routine should first of all discover the device's control virtqueues. The driver should then place at least a buffer in the control receiveq. Buffers returned by the device on the control receiveq may be referred to as "events" in the rest of the document.

The driver can immediately issue requests (for example, INQUIRY or REPORT LUNS) or task management functions (for example, I_T RESET).
Device operation: request queue
---
The driver queues requests to the virtqueue, and they are used by the device (not necessarily in order). Requests have the following format:

struct virtio_scsi_req_cmd {
    u8 lun[8];
    u64 id;
    u8 task_attr;
    u8 prio;
    u8 crn;
    u32 num_dataout, num_datain;
    char cdb[];
    char data[][num_dataout+num_datain];
    u8 sense[];
    u32 sense_len;
    u32 residual;
    u16 status_qualifier;
    u8 status;
    u8 response;
};

/* command-specific response values */
#define VIRTIO_SCSI_S_OK       0
#define VIRTIO_SCSI_S_UNDERRUN 1
#define VIRTIO_SCSI_S_ABORTED  2
#define VIRTIO_SCSI_S_FAILURE  3

The lun field addresses a bus, target and logical unit in the SCSI host. The id field is the command identifier as defined in SAM. The task_attr and prio fields should always be zero, as task attributes other than SIMPLE, as well as command priority, are explicitly not supported by this version of the device. CRN is also as defined in SAM; while it is generally expected to be 0, clients can provide it. The maximum CRN value defined by the protocol is 255, since CRN is stored in an 8-bit integer. All of these fields are always read-only. The cdb, data and sense fields must reside in separate buffers. The cdb field is always read-only. The data buffers may be either read-only or write-only, depending on the request, with the read-only buffers coming first. The sense buffer is always write-only. The request shall have num_dataout read-only data buffers and num_datain write-only data buffers. One of these two values must be zero if the VIRTIO_SCSI_F_INOUT feature has not been negotiated. The remaining fields are filled in by the device. The sense_len field indicates the number of bytes actually written to the sense buffer, while the residual field indicates the residual size, calculated as data_length - number_of_transferred_bytes. The status byte is written by the device as the SCSI status code. 
The response byte is written by the device to be one of the following:
- VIRTIO_SCSI_S_OK when the request was completed and the status byte is filled with a SCSI status code (not necessarily GOOD).
- VIRTIO_SCSI_S_UNDERRUN if the content of the CDB requires transferring more data than is available in the data buffers.
- VIRTIO_SCSI_S_ABORTED if the request was cancelled due to a reset or another task management function.
- VIRTIO_SCSI_S_FAILURE for other host or guest errors.

Device operation: control transmitq
---
The control transmitq is used for other SCSI transport
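As an illustrative sketch (not part of the spec, and not QEMU code), the device-side choice of the response byte described above could look like the following; the helper name and its parameters are hypothetical:

```c
#include <stdint.h>

/* command-specific response values, as defined in the spec above */
#define VIRTIO_SCSI_S_OK       0
#define VIRTIO_SCSI_S_UNDERRUN 1
#define VIRTIO_SCSI_S_ABORTED  2
#define VIRTIO_SCSI_S_FAILURE  3

/* Hypothetical helper: pick the response byte for a completed request.
 * 'aborted' is set when a reset or task management function cancelled
 * the command; 'needed' is the transfer length implied by the CDB and
 * 'avail' is the total size of the data buffers supplied by the driver. */
static uint8_t scsi_response(int aborted, uint64_t needed, uint64_t avail)
{
    if (aborted) {
        return VIRTIO_SCSI_S_ABORTED;
    }
    if (needed > avail) {
        return VIRTIO_SCSI_S_UNDERRUN;
    }
    return VIRTIO_SCSI_S_OK;
}
```

Note that VIRTIO_SCSI_S_OK is returned even when the SCSI status byte is not GOOD; transport success and SCSI command status are reported separately.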
Re: [Qemu-devel] [RFC] Memory API
On 05/19/2011 07:36 PM, Anthony Liguori wrote: There are no global priorities. Priorities are only used inside each level of the memory region hierarchy to generate a resulting, flattened view for the next higher level. At that level, everything imported from below has the default prio again, i.e. the lowest one. Then SMM is impossible. It doesn't follow. Why do we need priorities at all? There should be no overlap at each level in the hierarchy. Of course there is overlap. PCI BARs overlap each other, the VGA windows and ROM overlap RAM. If you have overlapping BARs, the PCI bus will always send the request to a single device based on something that's implementation specific. This works because each PCI device advertises the BAR locations and sizes in its config space. BARs in general don't need priority, except we need to decide if BARs overlap RAM or vice versa. To dispatch a request, the PCI bus will walk the config space to find a match. If you remove something that was previously causing an overlap, the other device will now get the I/O requests. That's *exactly* what priority means. Which device is in front, and which is in the back. To model this correctly, you need to let the PCI bus decide how to dispatch I/O requests (again, you need hierarchical dispatch). And again, this API gives you hierarchical dispatch, with the addition that some of it is done at registration time so we can prepare the RAM slots. In the absence of this, the PCI bus needs to look at all of the devices, figure out the flat mapping, and register it. When a device is added or removed, it needs to recalculate the flat mapping and register it. However we do this, we need to look at all devices. There is no need to have centralized logic to decide this. I think you're completely missing the point of my proposal. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] [RFC] Memory API
On 05/19/2011 07:49 PM, Jan Kiszka wrote: If you have overlapping BARs, the PCI bus will always send the request to a single device based on something that's implementation specific. This works because each PCI device advertises the BAR locations and sizes in its config space. That's not a use case for priorities at all. Priorities are useful for PAM and SMRAM-like scenarios. Correct. Priorities are also useful to decide if BARs hide RAM or vice versa (determined by the PCI container's priority vs. the RAM container priorities, not individual BARs' priorities). -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] [RFC] Memory API
On 05/19/2011 07:32 PM, Anthony Liguori wrote: Think of how a window manager folds windows with priorities onto a flat framebuffer. You do a depth-first walk of the tree. For each child list, you iterate it from the lowest to highest priority, allowing later subregions override earlier subregions. Okay, but this doesn't explain how you'll let RAM override the VGA mapping since RAM is not represented in the same child list as VGA (RAM is a child of the PMC whereas VGA is a child of ISA/PCI, both of which are at least one level removed from the PMC). VGA will override RAM. Memory controller | +-- RAM container (prio 0) | +-- PCI container (prio 1) | +--- vga window -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
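The flattening rule discussed above (within one level of the hierarchy, the highest-priority subregion containing an address wins) can be sketched as a toy model; the names and the linear lookup are illustrative only, not the proposed API:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model: a flat list of sibling subregions with priorities.
 * Lookup picks the highest-priority region covering the address. */
typedef struct Region {
    uint64_t addr, size;
    int priority;
    const char *name;
} Region;

static const Region *region_lookup(const Region *regions, size_t n,
                                   uint64_t addr)
{
    const Region *best = NULL;
    for (size_t i = 0; i < n; i++) {
        const Region *r = &regions[i];
        if (addr >= r->addr && addr < r->addr + r->size &&
            (!best || r->priority > best->priority)) {
            best = r;
        }
    }
    return best;
}

/* Example mirroring the tree above: RAM at prio 0, a VGA window
 * (via the PCI container) at prio 1. */
static const Region example[] = {
    { 0x00000,  0x100000, 0, "ram" },
    { 0xa0000,  0x20000,  1, "vga" },
};

static const char *example_owner(uint64_t addr)
{
    const Region *r = region_lookup(example, 2, addr);
    return r ? r->name : "unassigned";
}
```

With this rule the VGA window overrides RAM at 0xa0000-0xbffff while RAM keeps serving everything else, which is the behaviour the tree is meant to express.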
Re: [Qemu-devel] [RFC] Memory API
On 05/19/2011 07:38 PM, Anthony Liguori wrote: You can always create a new memory region with higher priority, pointing to the RAM window you want to have above VGA. That's what we do today as well, just with different effects on the internal representation. But then we're no better than we are today. I thought the whole point of this thread of discussion was to allow overlapping I/O regions to be handled in a better way than we do today? It is, and the goal is achieved. Right now the code saves the old contents in isa_page_descs. With the new approach it calls memory_region_del_subregion() and the previous contents magically appear (or new contents if they changed in the meanwhile). -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] [RFC] Memory API
On 05/19/2011 09:22 PM, Gleb Natapov wrote: BARs may overlap with other BARs or with RAM. That's well-known, so PCI bridges need to register their regions with the _overlap variant unconditionally. In contrast to the current PhysPageDesc mechanism, the With what priority? It doesn't matter, since the spec doesn't define priorities among PCI BARs. If it needs to call _overlap unconditionally why not always call _overlap and drop the non-_overlap variant? Other uses need non-overlapping registration. And they do not need to. The APIC regions will be managed by the per-CPU region management, reusing the tool box we need for all bridges. It will register the APIC page with a priority higher than the default one, thus overriding everything that comes from the host bridge. I think that reflects pretty well real machine behaviour. What is higher? How does it know that priority is high enough? It is well known that 1 > 0, for example. I thought, from reading other replies, that priorities are meaningful only on the same hierarchy level (which kinda makes sense), but now you are saying that you will override PCI address from another part of the topology? --

per-cpu memory
|
+--- apic page (prio 1)
|
+--- global memory (prio 0)

-- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] [PATCH 19/26] target-xtensa: implement loop option
+ if (env->sregs[LEND] != v) {
+     tb_invalidate_phys_page_range(
+         env->sregs[LEND] - 1, env->sregs[LEND], 0);
+     env->sregs[LEND] = v;
+     tb_invalidate_phys_page_range(
+         env->sregs[LEND] - 1, env->sregs[LEND], 0);
+ }
Why are you invalidating twice? TB at the old LEND and at the new. Although it will work correctly without the first invalidation.
+static void gen_check_loop_end(DisasContext *dc, int slot)
+{
+    if (option_enabled(dc, XTENSA_OPTION_LOOP) &&
+        !(dc->tb->flags & XTENSA_TBFLAG_EXCM) &&
+        dc->next_pc == dc->lend) {
+        int label = gen_new_label();
+
+        tcg_gen_brcondi_i32(TCG_COND_NE, cpu_SR[LEND], dc->next_pc, label);
+        tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_SR[LCOUNT], 0, label);
+        tcg_gen_subi_i32(cpu_SR[LCOUNT], cpu_SR[LCOUNT], 1);
+        gen_jump(dc, cpu_SR[LBEG]);
+        gen_set_label(label);
+        gen_jumpi(dc, dc->next_pc, slot);
If you're going to pretend that LEND is a constant, you might as well pretend that LBEG is also a constant, so that you get to chain the TBs around the loop. But there may be three exits from a TB at the LEND if its last command is a branch: to the LBEG, to the branch target and to the next insn. Ok, I guess that I need to add gen_wsr_lbeg that invalidates the TB at the current LEND, pretend that LBEG is constant and use the given slot to jump to it. And also to get rid of tcg_gen_brcondi_i32(TCG_COND_NE, cpu_SR[LEND], dc->next_pc, label); -- Thanks. -- Max
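The architectural rule gen_check_loop_end() implements can be modelled in plain C: when execution reaches LEND (and the EXCM flag is clear), a non-zero LCOUNT is decremented and control transfers back to LBEG; otherwise execution falls through. This is only a sketch of the semantics, not the TCG code; the types and helper names are made up:

```c
#include <stdint.h>

/* Model of the Xtensa zero-overhead loop registers. */
typedef struct {
    uint32_t lbeg, lend, lcount;
} LoopRegs;

/* Given the PC that would follow the current instruction, apply the
 * loop-end check: jump back to LBEG while LCOUNT is non-zero. */
static uint32_t loop_next_pc(LoopRegs *r, uint32_t next_pc, int excm)
{
    if (!excm && next_pc == r->lend && r->lcount != 0) {
        r->lcount--;
        return r->lbeg;
    }
    return next_pc;
}

/* Run the model from LBEG until execution falls out of the loop,
 * returning how many times the loop body was executed. */
static int run_loop(uint32_t lbeg, uint32_t lend, uint32_t count)
{
    LoopRegs r = { lbeg, lend, count };
    int iterations = 0;
    for (;;) {
        iterations++;                          /* one pass over the body */
        if (loop_next_pc(&r, lend, 0) == lend) {
            return iterations;                 /* fell through past LEND */
        }
    }
}
```

With LCOUNT initially 2 the body runs three times (two back-branches, then fall-through), matching the decrement-and-branch check in the generated code.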
[Qemu-devel] [PATCH] hw/sd.c: Don't complain about SDIO commands CMD52/CMD53
The SDIO specification introduces new commands 52 and 53. Handle these as illegal commands but do not complain on stderr, as SDIO-aware OSes (including Linux) may legitimately use them when probing for the presence of an SDIO card. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- hw/sd.c | 11 +++ 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/hw/sd.c b/hw/sd.c
index f44a970..cedfb20 100644
--- a/hw/sd.c
+++ b/hw/sd.c
@@ -1104,6 +1104,17 @@ static sd_rsp_type_t sd_normal_command(SDState *sd,
         }
         break;
 
+    case 52:
+    case 53:
+        /* CMD52, CMD53: reserved for SDIO cards
+         * (see the SDIO Simplified Specification V2.0)
+         * Handle as illegal command but do not complain
+         * on stderr, as some OSes may use these in their
+         * probing for presence of an SDIO card.
+         */
+        sd->card_status |= ILLEGAL_COMMAND;
+        return sd_r0;
+
     /* Application specific commands (Class 8) */
     case 55:	/* CMD55: APP_CMD */
         if (sd->rca != rca)
-- 1.7.1
Re: [Qemu-devel] [RFC] Memory API
On 05/19/2011 09:18 PM, Anthony Liguori wrote: On 05/19/2011 09:11 AM, Avi Kivity wrote: On 05/19/2011 05:04 PM, Anthony Liguori wrote: Right, the chipset register is mainly used to program the contents of SMM. There is a single access pin that has effectively the same semantics as setting the chipset register. It's not a per-CPU setting--that's the point. You can't have one CPU reading SMM memory at exactly the same time as accessing VGA. But I guess you can never have two simultaneous accesses anyway so perhaps it's splitting hairs :-) Exactly - it just works. Well, not really. kvm.ko has a global mapping of RAM regions and currently only allows code execution from RAM. This means the only way for QEMU to enable SMM support is to program the global RAM regions table to allow RAM access for the VGA region. The problem with this is that it's perfectly conceivable to have CPU 0 in SMM mode while CPU 1 is doing MMIO to the VGA planar. kvm needs updates to support SMM; I already outlined them several months ago. The same problem exists with PAM. PAM is a completely different problem. The changes are global and fit kvm slot management. It would be much easier to implement PAM correctly in QEMU if it were possible to execute code via MMIO as we could just mark the BIOS memory as non-RAM and deal with the dispatch ourselves. Would it be fundamentally hard to support this in KVM? I guess you would need to put the VCPU in single step mode and maintain a page to copy the results into. You need to emulate everything. We're probably not far from that. However there may be a significant performance loss. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On 05/19/2011 10:07 PM, Alex Williamson wrote: On Thu, 2011-05-19 at 10:12 -0400, Avi Kivity wrote: The memory API separates the attributes of a memory region (its size, how reads or writes are handled, dirty logging, and coalescing) from where it is mapped and whether it is enabled. This allows a device to configure a memory region once, then hand it off to its parent bus to map it according to the bus configuration. Hierarchical registration also allows a device to compose a region out of a number of sub-regions with different properties; for example some may be RAM while others may be MMIO. +/* Guest-visible constraints: */ +struct { +/* If nonzero, specify bounds on access sizes beyond which a machine + * check is thrown. + */ +unsigned min_access_size; +unsigned max_access_size; Do we always support all access sizes between min and max? As far as I can tell, yes. This might be easier to describe as a bitmap of supported power of 2 access sizes. This is uglier to initialize. However we can provide #defines for common use (MEM_ACCESS_BYTE_TO_LONG, MEM_ACCESS_LONG). -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
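The guest-visible constraint check discussed above (accesses outside the min/max size bounds, or unaligned accesses where unsupported, trigger a machine check) can be sketched as follows; the struct and function names are illustrative, not the proposed API:

```c
#include <stdbool.h>

/* Per-region guest-visible access constraints, mirroring the fields in
 * the proposed declarations. A zero bound means "no constraint". */
typedef struct {
    unsigned min_access_size;
    unsigned max_access_size;
    bool unaligned;  /* are unaligned accesses supported? */
} AccessConstraints;

static bool access_valid(const AccessConstraints *c,
                         unsigned long addr, unsigned size)
{
    if (c->min_access_size && size < c->min_access_size) {
        return false;
    }
    if (c->max_access_size && size > c->max_access_size) {
        return false;
    }
    if (!c->unaligned && (addr & (size - 1))) {
        return false;
    }
    return true;
}

/* Example: a region accepting only naturally aligned 2- or 4-byte
 * accesses, i.e. every power-of-2 size between min and max. */
static bool example_valid(unsigned long addr, unsigned size)
{
    static const AccessConstraints c = { 2, 4, false };
    return access_valid(&c, addr, size);
}
```

Since a min/max pair implicitly allows every size in between, the bitmap alternative raised above would only matter for a device that, say, accepts 1- and 4-byte accesses but rejects 2-byte ones.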
Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On 05/19/2011 10:27 PM, Jan Kiszka wrote: On 2011-05-19 16:12, Avi Kivity wrote: +/* Sets an offset to be added to MemoryRegionOps callbacks. */ +void memory_region_set_offset(MemoryRegion *mr, target_phys_addr_t offset); Please mark this as a legacy helper, ideally to be removed after the complete conversion to this API. During that phase we should try to identify those devices which still depend on offset=0 and maybe directly fix them. Okay. +/* Turn logging on or off for specified client (display, migration) */ +void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client); +/* Enable memory coalescing for the region. MMIO ->write callbacks may be + * delayed until a non-coalesced MMIO is issued. + */ +void memory_region_set_coalescing(MemoryRegion *mr); +/* Enable memory coalescing for a sub-range of the region. MMIO ->write + * callbacks may be delayed until a non-coalesced MMIO is issued. + */ +void memory_region_add_coalescing(MemoryRegion *mr, + target_phys_addr_t offset, + target_phys_addr_t size); Will this be such a common use case that requesting the user to split up the region and then use set_coalescing will generate too much boiler plate code? Look at e1000, coalescing ranges have byte granularity. +/* Disable MMIO coalescing for the region. */ +void memory_region_clear_coalescing(MemoryRegion *mr); And what about clearing coalescing for sub-ranges? Clear them all and rebuild. Maybe skip add_coalescing for the first run and see how far we get. We get as far as e. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On 05/19/2011 11:43 PM, Anthony Liguori wrote: On 05/19/2011 09:12 AM, Avi Kivity wrote: The memory API separates the attributes of a memory region (its size, how reads or writes are handled, dirty logging, and coalescing) from where it is mapped and whether it is enabled. This allows a device to configure a memory region once, then hand it off to its parent bus to map it according to the bus configuration. Hierarchical registration also allows a device to compose a region out of a number of sub-regions with different properties; for example some may be RAM while others may be MMIO. +struct { +/* If nonzero, specify bounds on access sizes beyond which a machine + * check is thrown. + */ +unsigned min_access_size; +unsigned max_access_size; +/* If true, unaligned accesses are supported. Otherwise unaligned + * accesses throw machine checks. + */ + bool unaligned; +} valid; Under what circumstances would this be used? The behavior of devices that receive non-natural accesses varies wildly. For PCI devices, invalid accesses almost always return ~0. I can't think of a device where an MCE would occur. This was requested by Richard, so I'll let him comment. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On 05/20/2011 12:04 AM, Stefan Weil wrote: Am 19.05.2011 16:12, schrieb Avi Kivity: The memory API separates the attributes of a memory region (its size, how reads or writes are handled, dirty logging, and coalescing) from where it is mapped and whether it is enabled. This allows a device to configure a memory region once, then hand it off to its parent bus to map it according to the bus configuration. Hierarchical registration also allows a device to compose a region out of a number of sub-regions with different properties; for example some may be RAM while others may be MMIO. --- /dev/null +++ b/memory.h @@ -0,0 +1,142 @@ +#ifndef MEMORY_H +#define MEMORY_H + +#include <stdint.h> +#include <stdbool.h> stdbool.h is already included in qemu-common.h, stdint.h (indirectly) too. Therefore both include statements can be removed. We shouldn't rely on indirect includes, it makes updating headers very hard. Each header should #include what it directly needs and no more. +typedef struct CoalescedMemoryRange CoalescedMemoryRange; + +struct CoalescedMemoryRange { + target_phys_addr_t start; + target_phys_addr_t size; + QTAILQ_ENTRY(coalesced_ranges) link; +}; + +struct MemoryRegion { + /* All fields are private - violators will be prosecuted */ Is it possible to move this private declaration into the implementation file (or a private header file if the declaration is needed by more than one file)? No, the structure size is needed by clients. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
[Qemu-devel] Share a directory between a linux host and a windows guest w/o network?
Hi, is it possible to share a directory between a linux host and a windows guest running on it? Similar to samba but independent of the network? I have searched for combinations of v9fs or virtio-9p and windows but didn't find anything relevant. Thanks, Torsten Förtsch
Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On 05/20/2011 12:11 AM, Stefan Hajnoczi wrote: On Thu, May 19, 2011 at 3:12 PM, Avi Kivity a...@redhat.com wrote: +struct MemoryRegion { +/* All fields are private - violators will be prosecuted */ +const MemoryRegionOps *ops; +MemoryRegion *parent; In the case where a region is aliased (mapped twice into the address space at different addresses) I need two MemoryRegions? Yes. The MemoryRegion describes an actual mapping in the (parent, addr, ram_addr) tuple, not just the attributes of the region (ops, size, ...). Correct. The region is not just a read-only descriptor. memory_region_add_subregion() can be used only once on a region (unless you _del_subregion() in between). (it also follows from the fact that there is no separate opaque for registration, and from the fact that RAM is owned by the region, not provided as part of registration). -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] [RFC] Memory API
On 05/19/2011 07:27 PM, Gleb Natapov wrote: Think of how a window manager folds windows with priorities onto a flat framebuffer. You do a depth-first walk of the tree. For each child list, you iterate it from the lowest to highest priority, allowing later subregions override earlier subregions. I do not think that window manager is a good analogy. Window can overlap with only its siblings. In our memory tree each final node may overlap with any other node in the tree. Transparent windows. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] [PATCH 01/11] target-ppc: remove old CONFIG_SOFTFLOAT #ifdef
On 15 May 2011 15:13, Aurelien Jarno aurel...@aurel32.net wrote: target-ppc has been switched to softfloat only long ago, but a few #ifdef CONFIG_SOFTFLOAT have been forgotten. Remove them. Cc: Alexander Graf ag...@suse.de Signed-off-by: Aurelien Jarno aurel...@aurel32.net Reviewed-by: Peter Maydell peter.mayd...@linaro.org
Re: [Qemu-devel] [PATCH 03/11] softfloat-native: remove
On 15 May 2011 15:13, Aurelien Jarno aurel...@aurel32.net wrote: Remove softfloat-native support, all targets are now using softfloat instead. Signed-off-by: Aurelien Jarno aurel...@aurel32.net Reviewed-by: Peter Maydell peter.mayd...@linaro.org
Re: [Qemu-devel] [PATCH 04/11] softfloat: always enable floatx80 and float128 support
On 15 May 2011 15:13, Aurelien Jarno aurel...@aurel32.net wrote: Now that softfloat-native is gone, there is no real point on not always enabling floatx80 and float128 support. Signed-off-by: Aurelien Jarno aurel...@aurel32.net Reviewed-by: Peter Maydell peter.mayd...@linaro.org
Re: [Qemu-devel] [PATCH 05/11] target-i386: remove old code handling float64
On 15 May 2011 15:13, Aurelien Jarno aurel...@aurel32.net wrote: Now that target-i386 uses softfloat, floatx80 is always available and there is no need anymore to have code handling both float64 and floax80. Signed-off-by: Aurelien Jarno aurel...@aurel32.net This patch is OK in terms of how it leaves the code, but I think some parts of it are out of sequence with the rest of the patchset. For instance: -#ifdef FLOATX80 -#define USE_X86LDOUBLE -#endif We've already removed the FLOATX80 define in a previous patch, so if we don't delete the x86 use of it until this patch then the behaviour will briefly flip-flop as you go through the patch stack, which could be bad for bisection. -#if defined(CONFIG_SOFTFLOAT) -# define floatx_lg2 make_floatx80( 0x3ffd, 0x9a209a84fbcff799LL ) -# define floatx_l2e make_floatx80( 0x3fff, 0xb8aa3b295c17f0bcLL ) -# define floatx_l2t make_floatx80( 0x4000, 0xd49a784bcd1b8afeLL ) -#else -# define floatx_lg2 (0.30102999566398119523L) -# define floatx_l2e (1.44269504088896340739L) -# define floatx_l2t (3.32192809488736234781L) -#endif Similarly, this #ifdeffery should have gone away when we took out CONFIG_SOFTFLOAT, not later. (Also the patch was a bit of a pig to review because it combines several distinct mostly-mechanical transformations.) -- PMM
Re: [Qemu-devel] [PATCH 06/11] target-i386: use floatx80 constants in helper_fld*_ST0()
On 15 May 2011 15:13, Aurelien Jarno aurel...@aurel32.net wrote: Instead of using a table which doesn't correspond to anything from physical in the CPU, use directly the constants in helper_fld*_ST0(). Actually I rather suspect there is effectively a table in the CPU indexed by the last 3 bits of the FLD* opcode... It would be possible to implement this group of insns in QEMU with a single helper function that took the index into the array, but since the array seems to be causing weird compilation problems we might as well stick with the lots-of-helpers approach, at which point this is a sensible cleanup. Reviewed-by: Peter Maydell peter.mayd...@linaro.org
Re: [Qemu-devel] [PATCH 07/11] softfloat: add float*_is_zero_or_denormal()
On 15 May 2011 15:13, Aurelien Jarno aurel...@aurel32.net wrote: float*_is_zero_or_denormal() is available for float32, but not for float64, floatx80 and float128. Fix that. Signed-off-by: Aurelien Jarno aurel...@aurel32.net Reviewed-by: Peter Maydell peter.mayd...@linaro.org
[Qemu-devel] [PATCH v4 0/3] Coroutines for better asynchronous programming
QEMU is event-driven and suffers when blocking operations are performed because VM execution may be stopped until the operation completes. Therefore many operations that could block are performed asynchronously and a callback is invoked when the operation has completed. This allows QEMU to continue executing while the operation is pending. The downside to callbacks is that they split up code into many smaller functions, each of which is a single step in a state machine that quickly becomes complex and hard to understand. Callback functions also result in lots of noise as variables are packed and unpacked into temporary structs that pass state to the callback function. This patch series introduces coroutines as a solution for writing asynchronous code while still having a nice sequential control flow. The semantics are explained in the first patch. The second patch adds automated tests. A nice feature of coroutines is that it is relatively easy to take synchronous code and lift it into a coroutine to make it asynchronous. Work has been done to move qcow2 request processing into coroutines and thereby make it asynchronous (today qcow2 will perform synchronous metadata accesses). This qcow2 work is still ongoing and not quite ready for mainline yet. Coroutines are also being used for virtfs (virtio-9p) so I have submitted this patch now because virtfs patches that depend on coroutines are being published. 
v4: * Windows Fibers support (Paolo Bonzini pbonz...@redhat.com) * Return-after-setjmp() fix (Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com) * Re-entrancy for multi-threaded coroutines support * qemu-coroutine.h cleanup and documentation v3: * Updated LGPL v2 license header to use web link * Removed atexit(3) pool freeing * Removed thread-local current/leader * Documented thread-safety limitation * Disabled trace events v2: * Added ./check-coroutine --lifecycle-benchmark for performance measurement * Split pooling into a separate patch with performance justification * Set maximum pool size to prevent holding onto too many free coroutines * Added atexit(3) handler to free pool * Coding style cleanups Kevin Wolf (1): coroutine: introduce coroutines Stefan Hajnoczi (2): coroutine: add check-coroutine automated tests coroutine: add check-coroutine --benchmark-lifecycle Makefile |3 +- Makefile.objs|7 ++ check-coroutine.c| 236 ++ coroutine-ucontext.c | 229 coroutine-win32.c| 92 +++ qemu-coroutine-int.h | 48 ++ qemu-coroutine.c | 75 qemu-coroutine.h | 95 trace-events |5 + 9 files changed, 789 insertions(+), 1 deletions(-) create mode 100644 check-coroutine.c create mode 100644 coroutine-ucontext.c create mode 100644 coroutine-win32.c create mode 100644 qemu-coroutine-int.h create mode 100644 qemu-coroutine.c create mode 100644 qemu-coroutine.h -- 1.7.4.4
[Qemu-devel] [PATCH v4 2/3] coroutine: add check-coroutine automated tests
To run automated tests for coroutines:

  make check-coroutine
  ./check-coroutine

On success the program terminates with exit status 0. On failure an error message is written to stderr and the program exits with exit status 1. Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- Makefile | 3 +- check-coroutine.c | 188 + 2 files changed, 190 insertions(+), 1 deletions(-) create mode 100644 check-coroutine.c

diff --git a/Makefile b/Makefile
index 2b0438c..69c08c2 100644
--- a/Makefile
+++ b/Makefile
@@ -132,7 +132,7 @@ qemu-io$(EXESUF): qemu-io.o cmd.o qemu-tool.o qemu-error.o $(oslib-obj-y) $(trac
 qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx
 	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"  GEN   $@")
 
-check-qint.o check-qstring.o check-qdict.o check-qlist.o check-qfloat.o check-qjson.o: $(GENERATED_HEADERS)
+check-qint.o check-qstring.o check-qdict.o check-qlist.o check-qfloat.o check-qjson.o check-coroutine.o: $(GENERATED_HEADERS)
 
 CHECK_PROG_DEPS = qemu-malloc.o $(oslib-obj-y) $(trace-obj-y)
 
@@ -142,6 +142,7 @@ check-qdict: check-qdict.o qdict.o qfloat.o qint.o qstring.o qbool.o qlist.o $(C
 check-qlist: check-qlist.o qlist.o qint.o $(CHECK_PROG_DEPS)
 check-qfloat: check-qfloat.o qfloat.o $(CHECK_PROG_DEPS)
 check-qjson: check-qjson.o qfloat.o qint.o qdict.o qstring.o qlist.o qbool.o qjson.o json-streamer.o json-lexer.o json-parser.o $(CHECK_PROG_DEPS)
+check-coroutine: check-coroutine.o $(coroutine-obj-y) $(CHECK_PROG_DEPS)
 
 QEMULIBS=libhw32 libhw64 libuser libdis libdis-user

diff --git a/check-coroutine.c b/check-coroutine.c
new file mode 100644
index 000..f65ac2e
--- /dev/null
+++ b/check-coroutine.c
@@ -0,0 +1,188 @@
+/*
+ * Coroutine tests
+ *
+ * Copyright IBM, Corp. 2011
+ *
+ * Authors:
+ *  Stefan Hajnoczi    stefa...@linux.vnet.ibm.com
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory. 
+ *
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include "qemu-coroutine.h"
+
+static const char *cur_test_name;
+
+static void test_assert(bool condition, const char *msg)
+{
+    if (!condition) {
+        fprintf(stderr, "%s: %s\n", cur_test_name, msg);
+        exit(EXIT_FAILURE);
+    }
+}
+
+/*
+ * Check that qemu_in_coroutine() works
+ */
+
+static void coroutine_fn verify_in_coroutine(void *opaque)
+{
+    test_assert(qemu_in_coroutine(), "expected coroutine context");
+}
+
+static void test_in_coroutine(void)
+{
+    Coroutine *coroutine;
+
+    test_assert(!qemu_in_coroutine(), "expected no coroutine context");
+
+    coroutine = qemu_coroutine_create(verify_in_coroutine);
+    qemu_coroutine_enter(coroutine, NULL);
+}
+
+/*
+ * Check that qemu_coroutine_self() works
+ */
+
+static void coroutine_fn verify_self(void *opaque)
+{
+    test_assert(qemu_coroutine_self() == opaque,
+                "qemu_coroutine_self() did not return this coroutine");
+}
+
+static void test_self(void)
+{
+    Coroutine *coroutine;
+
+    coroutine = qemu_coroutine_create(verify_self);
+    qemu_coroutine_enter(coroutine, coroutine);
+}
+
+/*
+ * Check that coroutines may nest multiple levels
+ */
+
+typedef struct {
+    unsigned int n_enter;  /* num coroutines entered */
+    unsigned int n_return; /* num coroutines returned */
+    unsigned int max;      /* maximum level of nesting */
+} NestData;
+
+static void coroutine_fn nest(void *opaque)
+{
+    NestData *nd = opaque;
+
+    nd->n_enter++;
+
+    if (nd->n_enter < nd->max) {
+        Coroutine *child;
+
+        child = qemu_coroutine_create(nest);
+        qemu_coroutine_enter(child, nd);
+    }
+
+    nd->n_return++;
+}
+
+static void test_nesting(void)
+{
+    Coroutine *root;
+    NestData nd = {
+        .n_enter  = 0,
+        .n_return = 0,
+        .max      = 1,
+    };
+
+    root = qemu_coroutine_create(nest);
+    qemu_coroutine_enter(root, &nd);
+
+    test_assert(nd.n_enter == nd.max,
+                "failed entering to max nesting level");
+    test_assert(nd.n_return == nd.max,
+                "failed returning from max nesting level");
+}
+
+/*
+ * Check that yield/enter transfer control correctly
+ */
+
+static void coroutine_fn yield_5_times(void *opaque)
+{
+    bool *done = opaque;
+    int i;
+
+    for (i = 0; i < 5; i++) {
+        qemu_coroutine_yield();
+    }
+    *done = true;
+}
+
+static void test_yield(void)
+{
+    Coroutine *coroutine;
+    bool done = false;
+    int i = -1; /* one extra time to return from coroutine */
+
+    coroutine = qemu_coroutine_create(yield_5_times);
+    while (!done) {
+        qemu_coroutine_enter(coroutine, &done);
+        i++;
+    }
+    test_assert(i == 5, "coroutine did not yield 5 times");
+}
+
+/*
+ * Check that creation, enter, and return work
+ */
+
+static void coroutine_fn set_and_exit(void *opaque)
+{
+    bool *done = opaque;
+
+    *done = true;
+}
+
+static void test_lifecycle(void)
+{
[Qemu-devel] [PATCH v4 1/3] coroutine: introduce coroutines
From: Kevin Wolf kw...@redhat.com Asynchronous code is becoming very complex. At the same time synchronous code is growing because it is convenient to write. Sometimes duplicate code paths are even added, one synchronous and the other asynchronous. This patch introduces coroutines which allow code that looks synchronous but is asynchronous under the covers. A coroutine has its own stack and is therefore able to preserve state across blocking operations, which traditionally require callback functions and manual marshalling of parameters. Creating and starting a coroutine is easy: coroutine = qemu_coroutine_create(my_coroutine); qemu_coroutine_enter(coroutine, my_data); The coroutine then executes until it returns or yields: void coroutine_fn my_coroutine(void *opaque) { MyData *my_data = opaque; /* do some work */ qemu_coroutine_yield(); /* do some more work */ } Yielding switches control back to the caller of qemu_coroutine_enter(). This is typically used to switch back to the main thread's event loop after issuing an asynchronous I/O request. The request callback will then invoke qemu_coroutine_enter() once more to switch back to the coroutine. Note that if coroutines are used only from threads which hold the global mutex they will never execute concurrently. This makes programming with coroutines easier than with threads. Race conditions cannot occur since only one coroutine may be active at any time. Other coroutines can only run across yield. This coroutines implementation is based on the gtk-vnc implementation written by Anthony Liguori anth...@codemonkey.ws but it has been significantly rewritten by Kevin Wolf kw...@redhat.com to use setjmp()/longjmp() instead of the more expensive swapcontext() and by Paolo Bonzini pbonz...@redhat.com for Windows Fibers support. 
Signed-off-by: Kevin Wolf kw...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- Makefile.objs|7 ++ coroutine-ucontext.c | 229 ++ coroutine-win32.c| 92 qemu-coroutine-int.h | 48 +++ qemu-coroutine.c | 75 qemu-coroutine.h | 95 + trace-events |5 + 7 files changed, 551 insertions(+), 0 deletions(-) create mode 100644 coroutine-ucontext.c create mode 100644 coroutine-win32.c create mode 100644 qemu-coroutine-int.h create mode 100644 qemu-coroutine.c create mode 100644 qemu-coroutine.h diff --git a/Makefile.objs b/Makefile.objs index 4478c61..a8dbd15 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -11,6 +11,12 @@ oslib-obj-$(CONFIG_WIN32) += oslib-win32.o qemu-thread-win32.o oslib-obj-$(CONFIG_POSIX) += oslib-posix.o qemu-thread-posix.o ### +# coroutines +coroutine-obj-y = qemu-coroutine.o +coroutine-obj-$(CONFIG_POSIX) += coroutine-ucontext.o +coroutine-obj-$(CONFIG_WIN32) += coroutine-win32.o + +### # block-obj-y is code used by both qemu system emulation and qemu-img block-obj-y = cutils.o cache-utils.o qemu-malloc.o qemu-option.o module.o async.o @@ -67,6 +73,7 @@ common-obj-y += readline.o console.o cursor.o qemu-error.o common-obj-y += $(oslib-obj-y) common-obj-$(CONFIG_WIN32) += os-win32.o common-obj-$(CONFIG_POSIX) += os-posix.o +common-obj-y += $(coroutine-obj-y) common-obj-y += tcg-runtime.o host-utils.o common-obj-y += irq.o ioport.o input.o diff --git a/coroutine-ucontext.c b/coroutine-ucontext.c new file mode 100644 index 000..bcea2bd --- /dev/null +++ b/coroutine-ucontext.c @@ -0,0 +1,229 @@ +/* + * ucontext coroutine initialization code + * + * Copyright (C) 2006 Anthony Liguori anth...@codemonkey.ws + * Copyright (C) 2011 Kevin Wolf kw...@redhat.com + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.0 of the License, or (at your option) any later version. 
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/* XXX Is there a nicer way to disable glibc's stack check for longjmp? */
+#ifdef _FORTIFY_SOURCE
+#undef _FORTIFY_SOURCE
+#endif
+#include <setjmp.h>
+#include <stdint.h>
+#include <pthread.h>
+#include <ucontext.h>
+#include "qemu-common.h"
+#include "qemu-coroutine-int.h"
+
+enum {
+    /* Maximum free pool size prevents holding too many freed coroutines */
+    POOL_MAX_SIZE = 64,
+};
+
+typedef struct {
[Qemu-devel] [PATCH v4 3/3] coroutine: add check-coroutine --benchmark-lifecycle
Add a microbenchmark for coroutine create, enter, and return (aka lifecycle). This is a useful benchmark because users are expected to create many coroutines, one per I/O request for example, and we therefore need to provide good performance in that scenario.

To run:

  make check-coroutine
  ./check-coroutine --benchmark-lifecycle 20000000

This will do 20,000,000 coroutine create, enter, return iterations and print the resulting time.

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 check-coroutine.c |   48 ++++++++++++++++++++++++++++++++++
 1 files changed, 48 insertions(+), 0 deletions(-)

diff --git a/check-coroutine.c b/check-coroutine.c
index f65ac2e..8ed1a4f 100644
--- a/check-coroutine.c
+++ b/check-coroutine.c
@@ -11,8 +11,10 @@
  *
  */

+#include <string.h>
 #include <stdlib.h>
 #include <stdio.h>
+#include <sys/time.h>
 #include "qemu-coroutine.h"

 static const char *cur_test_name;
@@ -163,6 +165,43 @@ static void test_lifecycle(void)
     test_assert(done, "expected done to be true (second time)");
 }

+/*
+ * Lifecycle benchmark
+ */
+
+static void coroutine_fn empty_coroutine(void *opaque)
+{
+    /* Do nothing */
+}
+
+static void benchmark_lifecycle(const char *iterations)
+{
+    Coroutine *coroutine;
+    unsigned int i, max;
+    struct timeval start, finish;
+    time_t dsec;
+    long dusec;
+
+    max = atoi(iterations);
+
+    gettimeofday(&start, NULL);
+    for (i = 0; i < max; i++) {
+        coroutine = qemu_coroutine_create(empty_coroutine);
+        qemu_coroutine_enter(coroutine, NULL);
+    }
+    gettimeofday(&finish, NULL);
+
+    dsec = finish.tv_sec - start.tv_sec;
+    if (finish.tv_usec < start.tv_usec) {
+        dsec--;
+        dusec = finish.tv_usec + 1000000 - start.tv_usec;
+    } else {
+        dusec = finish.tv_usec - start.tv_usec;
+    }
+    printf("Lifecycle %u iterations: %lu sec %lu us\n",
+           max, dsec, dusec);
+}
+
 #define TESTCASE(fn) { #fn, fn }

 int main(int argc, char **argv)
 {
@@ -179,6 +218,15 @@ int main(int argc, char **argv)
     };
     int i;

+    if (argc == 3 && strcmp(argv[1], "--benchmark-lifecycle") == 0) {
+        benchmark_lifecycle(argv[2]);
+        return EXIT_SUCCESS;
+    } else if (argc != 1) {
+        fprintf(stderr, "usage: %s [--benchmark-lifecycle iterations]\n",
+                argv[0]);
+        return EXIT_FAILURE;
+    }
+
     for (i = 0; testcases[i].name; i++) {
         cur_test_name = testcases[i].name;
         printf("%s\n", testcases[i].name);
--
1.7.4.4
[Qemu-devel] [PATCH 2/2] Deprecate -M command line options
Superseded by -machine. Therefore, this patch removes -M from the help list and pushes -machine at the same place in the output. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- qemu-options.hx | 45 - 1 files changed, 20 insertions(+), 25 deletions(-) diff --git a/qemu-options.hx b/qemu-options.hx index 0dbc028..1204a00 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -27,14 +27,29 @@ STEXI Display version information and exit ETEXI -DEF(M, HAS_ARG, QEMU_OPTION_M, --M machine select emulated machine (-M ? for list)\n, QEMU_ARCH_ALL) +DEF(machine, HAS_ARG, QEMU_OPTION_machine, \ +-machine [type=]name[,prop[=value][,...]]\n +selects emulated machine (-machine ? for list)\n +property accel=accel1[:accel2[:...]] selects accelerator\n +supported accelerators are kvm, xen, tcg (default: tcg)\n, +QEMU_ARCH_ALL) STEXI -@item -M @var{machine} -@findex -M -Select the emulated @var{machine} (@code{-M ?} for list) +@item -machine [type=]@var{name}[,prop=@var{value}[,...]] +@findex -machine +Select the emulated machine by @var{name}. Use @code{-machine ?} to list +available machines. Supported machine properties are: +@table @option +@item accel=@var{accels1}[:@var{accels2}[:...]] +This is used to enable an accelerator. Depending on the target architecture, +kvm, xen, or tcg can be available. By default, tcg is used. If there is more +than one accelerator specified, the next one is used if the previous one fails +to initialize. +@end table ETEXI +HXCOMM Deprecated by -machine +DEF(M, HAS_ARG, QEMU_OPTION_M, , QEMU_ARCH_ALL) + DEF(cpu, HAS_ARG, QEMU_OPTION_cpu, -cpu cpuselect CPU (-cpu ? for list)\n, QEMU_ARCH_ALL) STEXI @@ -2032,26 +2047,6 @@ Enable KVM full virtualization support. This option is only available if KVM support is enabled when compiling. ETEXI -DEF(machine, HAS_ARG, QEMU_OPTION_machine, \ --machine [type=]name[,prop[=value][,...]]\n -selects emulated machine (-machine ? 
for list)\n -property accel=accel1[:accel2[:...]] selects accelerator\n -supported accelerators are kvm, xen, tcg (default: tcg)\n, -QEMU_ARCH_ALL) -STEXI -@item -machine [type=]@var{name}[,prop=@var{value}[,...]] -@findex -machine -Select the emulated machine by @var{name}. Use @code{-machine ?} to list -available machines. Supported machine properties are: -@table @option -@item accel=@var{accels1}[:@var{accels2}[:...]] -This is used to enable an accelerator. Depending on the target architecture, -kvm, xen, or tcg can be available. By default, tcg is used. If there is more -than one accelerator specified, the next one is used if the previous one fails -to initialize. -@end table -ETEXI - DEF(xen-domid, HAS_ARG, QEMU_OPTION_xen_domid, -xen-domid id specify xen guest domain id\n, QEMU_ARCH_ALL) DEF(xen-create, 0, QEMU_OPTION_xen_create, -- 1.7.1
[Qemu-devel] [PATCH 1/2] Generalize -machine command line option
-machine somehow suggests that it selects the machine, but it doesn't. Fix that before this command is set in stone. Actually, -machine should supersede -M and allow to introduce arbitrary per-machine options to the command line. That will change the internal realization again, but we will be able to keep the user interface stable. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- qemu-config.c |5 + qemu-options.hx | 20 +++- vl.c| 34 +++--- 3 files changed, 39 insertions(+), 20 deletions(-) diff --git a/qemu-config.c b/qemu-config.c index 5d7ffa2..01751b4 100644 --- a/qemu-config.c +++ b/qemu-config.c @@ -452,9 +452,14 @@ QemuOptsList qemu_option_rom_opts = { static QemuOptsList qemu_machine_opts = { .name = machine, +.implied_opt_name = type, .head = QTAILQ_HEAD_INITIALIZER(qemu_machine_opts.head), .desc = { { +.name = type, +.type = QEMU_OPT_STRING, +.help = emulated machine +}, { .name = accel, .type = QEMU_OPT_STRING, .help = accelerator list, diff --git a/qemu-options.hx b/qemu-options.hx index 82e085a..0dbc028 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -2033,13 +2033,23 @@ if KVM support is enabled when compiling. ETEXI DEF(machine, HAS_ARG, QEMU_OPTION_machine, \ --machine accel=accel1[:accel2]use an accelerator (kvm,xen,tcg), default is tcg\n, QEMU_ARCH_ALL) +-machine [type=]name[,prop[=value][,...]]\n +selects emulated machine (-machine ? for list)\n +property accel=accel1[:accel2[:...]] selects accelerator\n +supported accelerators are kvm, xen, tcg (default: tcg)\n, +QEMU_ARCH_ALL) STEXI -@item -machine accel=@var{accels} +@item -machine [type=]@var{name}[,prop=@var{value}[,...]] @findex -machine -This is use to enable an accelerator, in kvm,xen,tcg. -By default, it use only tcg. If there a more than one accelerator -specified, the next one is used if the first don't work. +Select the emulated machine by @var{name}. Use @code{-machine ?} to list +available machines. 
Supported machine properties are:
+@table @option
+@item accel=@var{accels1}[:@var{accels2}[:...]]
+This is used to enable an accelerator. Depending on the target architecture,
+kvm, xen, or tcg can be available. By default, tcg is used. If there is more
+than one accelerator specified, the next one is used if the previous one fails
+to initialize.
+@end table
 ETEXI

 DEF("xen-domid", HAS_ARG, QEMU_OPTION_xen_domid,
diff --git a/vl.c b/vl.c
index b362871..4560376 100644
--- a/vl.c
+++ b/vl.c
@@ -2144,20 +2144,9 @@ int main(int argc, char **argv, char **envp)
         }
         switch(popt->index) {
         case QEMU_OPTION_M:
-            machine = find_machine(optarg);
-            if (!machine) {
-                QEMUMachine *m;
-                printf("Supported machines are:\n");
-                for(m = first_machine; m != NULL; m = m->next) {
-                    if (m->alias)
-                        printf("%-10s %s (alias of %s)\n",
-                               m->alias, m->desc, m->name);
-                    printf("%-10s %s%s\n",
-                           m->name, m->desc,
-                           m->is_default ? " (default)" : "");
-                }
-                exit(*optarg != '?');
-            }
+            olist = qemu_find_opts("machine");
+            qemu_opts_reset(olist);
+            qemu_opts_parse(olist, optarg, 1);
             break;
         case QEMU_OPTION_cpu:
             /* hw initialization will check this */
@@ -2675,11 +2664,26 @@ int main(int argc, char **argv, char **envp)
         case QEMU_OPTION_machine:
             olist = qemu_find_opts("machine");
             qemu_opts_reset(olist);
-            opts = qemu_opts_parse(olist, optarg, 0);
+            opts = qemu_opts_parse(olist, optarg, 1);
             if (!opts) {
                 fprintf(stderr, "parse error: %s\n", optarg);
                 exit(1);
             }
+            optarg = qemu_opt_get(opts, "type");
+            machine = optarg ? find_machine(optarg) : NULL;
+            if (!machine) {
+                QEMUMachine *m;
+                printf("Supported machines are:\n");
+                for (m = first_machine; m != NULL; m = m->next) {
+                    if (m->alias) {
+                        printf("%-10s %s (alias of %s)\n",
+                               m->alias, m->desc, m->name);
+                    }
+                    printf("%-10s %s%s\n", m->name, m->desc,
+                           m->is_default ? " (default)" : "");
+                }
+                exit(!optarg || *optarg != '?');
+            }
             break;
         case QEMU_OPTION_usb:
Re: [Qemu-devel] Regression Warning: more nics requested than this machine supports
On 16 May 2011 17:58, Markus Armbruster arm...@redhat.com wrote: $ qemu-system-x86_64 -nodefaults -enable-kvm -m 384 -vnc :0 -S -netdev user,id=net0 -device e1000,netdev=net0 Warning: more nics requested than this machine supports; some have been ignored (qemu) info network Devices not on any VLAN: net0: net=10.0.2.0, restricted=n peer=e1000.0 e1000.0: model=e1000,macaddr=52:54:00:12:34:56 peer=net0 Culprit is net: Improve the warnings for dubious command line option combinations Its count of requested NICs is blissfully unaware of -device. In my example, it comes up with nb_nics == 0 and seen_nics == 1. As far as I can determine, -device e1000,netdev=0 doesn't go through net_init_nic() and doesn't put an entry in the nd_table[] for the NIC. This means it's broken, because a lot of board models look in nd_table[] to determine whether the user requested a NIC and whether it's the right type. So I think that in some ways this is just showing up an existing problem with trying to instantiate a network card with -device. -- PMM
Re: [Qemu-devel] [RFC] Memory API
On Fri, May 20, 2011 at 09:40:13AM +0200, Jan Kiszka wrote: On 2011-05-20 09:23, Gleb Natapov wrote: On Thu, May 19, 2011 at 08:55:49PM +0200, Jan Kiszka wrote: Because we should catch accidental overlaps in all those non PCI devices with hard-wired addressing. That's a bug in the device/machine model and should be reported as such by QEMU. Why should we complicate API to catch unlikely errors? If you want to debug that add capability to dump memory map from the monitor. Because we need to switch tons of code that so far saw a fairly different reaction of the core to overlapping regions. How so? Today if there is accidental overlap device will not function properly. With new API it will be the same. I rather expect subtle differences as overlapping registration changes existing regions, in the future those will recover. Where do you expect the differences will come from? Conversion to the new API shouldn't change the order of the registration and if the last registration will override previous one the end result should be the same as we have today. A) Removing regions will change significantly. So far this is done by setting a region to IO_MEM_UNASSIGNED, keeping truncation. With the new API that will be a true removal which will additionally restore hidden regions. And what problem do you expect may arise from that? Currently accessing such region after unassign will result in undefined behaviour, so this code is non working today, you can't make it worse. B) Uncontrolled overlapping is a bug that should be caught by the core, and a new API is a perfect chance to do this. Well, this will indeed introduce the difference in behaviour :) The guest that ran before will abort now. Are you actually aware of any such overlaps in the current code base? But if priorities are gona stay why not fail if two regions with the same priority overlap? 
If that happens it means that the memory creation didn't pass the point where conflict should have been resolved (by assigning different priorities) and this means that overlap is unintentional, no? new region management will not cause any harm to overlapping regions so that they can recover when the overlap is gone. Another example may be APIC region and PCI. They overlap, but neither CPU nor PCI knows about it. And they do not need to. The APIC regions will be managed by the per-CPU region management, reusing the tool box we need for all bridges. It will register the APIC page with a priority higher than the default one, thus overriding everything that comes from the host bridge. I think that reflects pretty well real machine behaviour. What is higher? How does it know that priority is high enough? Because no one else manages priorities at a specific hierarchy level. There is only one. PCI and CPU are on different hierarchy levels. PCI is under the PIIX and CPU is on a system BUS. The priority for the APIC mapping will be applied at CPU level, of course. So it will override everything, not just PCI. So you do not need explicit priority because the place in hierarchy implicitly provides you with one. Yes. OK :) So you agree that we can do without priorities :) Nope, see below how your own example depends on them. It depends on them in very defined way. Only layer that knows exactly what is going on defines priorities. The priorities do not leak on any other level or global database. It is different from propagating priority from PCI BAR to core memory API. I am starting to see how you can represent all this local decisions as priority numbers and then travel this weighted tree to find what memory region should be accessed (memory registration _has_ to be hierarchical for that to work in meaningful way). I still don't see why it is better than flattening the tree in the point of conflict. 
Alternatively, you could add a prio offset to all mappings when climbing one level up, provided that offset is smaller than the prio range locally available to each level. Then a memory region final priority will depend on a tree height. If two disjointed tree branches of different height will claim the same memory region the higher one will have higher priority. I think this priority management is a can of worms. It is not as it remains a pure local thing and helps implementing the sketched scenarios. Believe, I tried to fix PAM/SMRAM already. If it remains local thing then I misunderstand what do you mean by could add a prio offset to all mappings when climbing one level up. Doesn't sound like local things to me any more. What problem did you have with PAM except low number of KVM slots btw? Only the lowest level (aka system bus) will use memory API directly. Not necessarily. It depends on how much added value buses like PCI or ISA or whatever can offer for managing I/O regions. For some purposes,
Re: [Qemu-devel] Regression Warning: more nics requested than this machine supports
On 2011-05-20 13:19, Peter Maydell wrote: On 16 May 2011 17:58, Markus Armbruster arm...@redhat.com wrote: $ qemu-system-x86_64 -nodefaults -enable-kvm -m 384 -vnc :0 -S -netdev user,id=net0 -device e1000,netdev=net0 Warning: more nics requested than this machine supports; some have been ignored (qemu) info network Devices not on any VLAN: net0: net=10.0.2.0, restricted=n peer=e1000.0 e1000.0: model=e1000,macaddr=52:54:00:12:34:56 peer=net0 Culprit is net: Improve the warnings for dubious command line option combinations Its count of requested NICs is blissfully unaware of -device. In my example, it comes up with nb_nics == 0 and seen_nics == 1. As far as I can determine, -device e1000,netdev=0 doesn't go through net_init_nic() and doesn't put an entry in the nd_table[] for the NIC. This means it's broken, because a lot of board models look in nd_table[] to determine whether the user requested a NIC and whether it's the right type. So I think that in some ways this is just showing up an existing problem with trying to instantiate a network card with -device. qemu_new_nic must call net_init_nic so that this works properly. Of course we need to avoid calling it multiple times when the adapter is still instantiated via the old -net or via board init code. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [RFC] Memory API
On Fri, May 20, 2011 at 11:59:58AM +0300, Avi Kivity wrote: On 05/19/2011 07:27 PM, Gleb Natapov wrote: Think of how a window manager folds windows with priorities onto a flat framebuffer. You do a depth-first walk of the tree. For each child list, you iterate it from the lowest to highest priority, allowing later subregions override earlier subregions. I do not think that window manager is a good analogy. Window can overlap with only its siblings. In our memory tree each final node may overlap with any other node in the tree. Transparent windows. No, still not that. Think about child windows that resides outside of its parent windows on screen. In our memory region terms think about PCI BAR is registered to overlap with RAM at address 0x1000 for instance. PCI BAR memory region and RAM memory region are on very different branches of the global tree. -- Gleb.
Re: [Qemu-devel] [RFC] Memory API
On Fri, May 20, 2011 at 12:10:22PM +0300, Avi Kivity wrote: On 05/19/2011 09:22 PM, Gleb Natapov wrote: BARs may overlap with other BARs or with RAM. That's well-known, so PCI bridges need to register their regions with the _overlap variant unconditionally. In contrast to the current PhysPageDesc mechanism, the With what priority? It doesn't matter, since the spec doesn't define priorities among PCI BARs. And among PCI BAR and memory (the case the question above referred to). If it needs to call _overlap unconditionally, why not always call _overlap and drop the non-_overlap variant? Other uses need non-overlapping registration. And who prohibits them from creating one? And they do not need to. The APIC regions will be managed by the per-CPU region management, reusing the tool box we need for all bridges. It will register the APIC page with a priority higher than the default one, thus overriding everything that comes from the host bridge. I think that reflects pretty well real machine behaviour. What is higher? How does it know that priority is high enough? It is well known that 1 > 0, for example. That is if you have a global scale. In the case I am asking about you do not. Even if PCI registers a memory region that overlaps the APIC address with priority 1000, the APIC memory region should still be able to override it even with priority 0. Voila, 1000 > 0? Where is your sarcasm now? :) But Jan already answered this one. Actually what really matters is the place of the node in a topology, not priority. But then for all of this to make sense registration has to be hierarchical. I thought, from reading other replies, that priorities are meaningful only on the same hierarchy level (which kinda makes sense), but now you are saying that you will override the PCI address from another part of the topology?

  per-cpu memory
   |
   +--- apic page (prio 1)
   |
   +--- global memory (prio 0)

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
--
Gleb.
Re: [Qemu-devel] [PATCH v4 1/3] coroutine: introduce coroutines
On 05/20/2011 12:59 PM, Stefan Hajnoczi wrote: This coroutines implementation is based on the gtk-vnc implementation written by Anthony Liguorianth...@codemonkey.ws but it has been significantly rewritten by Kevin Wolfkw...@redhat.com to use setjmp()/longjmp() instead of the more expensive swapcontext() and by Paolo Bonzinipbonz...@redhat.com for Windows Fibers support. Not a blocker at all, but why did you move the pooling to the ucontext implementation? It's less expensive to create the fiber in Windows because there are no system calls (unlike swapcontext), but a future pthread-based implementation will also need the pooling. It can be left to whoever writes the pthread stuff, though. Paolo
Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration
I'm interested in what the API for snapshots would look like. Specifically, how does user software do the following:

1. Create a snapshot
2. Delete a snapshot
3. List snapshots
4. Access data from a snapshot
5. Restore a VM from a snapshot
6. Get the dirty blocks list (for incremental backup)

We've discussed image format-level approaches, but I think the scope of the API should cover several levels at which snapshots are implemented:

1. Image format - image file snapshot (Jes, Jagane)
2. Host file system - ext4 and btrfs snapshots
3. Storage system - LVM or SAN volume snapshots

It will be hard to take advantage of more efficient host file system or storage system snapshots if they are not designed in now. Is anyone familiar enough with the libvirt storage APIs to draft an extension that adds snapshot support? I will take a stab at it if no one else wants to try it.

Stefan
Re: [Qemu-devel] [PATCH v4 1/3] coroutine: introduce coroutines
On Fri, May 20, 2011 at 1:09 PM, Paolo Bonzini pbonz...@redhat.com wrote: On 05/20/2011 12:59 PM, Stefan Hajnoczi wrote: This coroutines implementation is based on the gtk-vnc implementation written by Anthony Liguori anth...@codemonkey.ws but it has been significantly rewritten by Kevin Wolf kw...@redhat.com to use setjmp()/longjmp() instead of the more expensive swapcontext() and by Paolo Bonzini pbonz...@redhat.com for Windows Fibers support. Not a blocker at all, but why did you move the pooling to the ucontext implementation? It's less expensive to create the fiber in Windows because there are no system calls (unlike swapcontext), but a future pthread-based implementation will also need the pooling. It can be left to whoever writes the pthread stuff, though.

There are two options for pooling:

1. Thread-local pools
2. One global pool with a lock

One of these choices must be selected because otherwise the pool could be accessed simultaneously from multiple threads. I tried #2 first because it was less code but it caused a noticeable slow-down with ./check-coroutine --benchmark-lifecycle. Option #1 had less impact but requires using thread-local storage, which I've used pthread APIs for. Hence I moved it into coroutine-ucontext.c hoping that win32 would either be fast enough as-is or that we could find a better solution if someone needs it.

Stefan
[Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG
This series implements some basic machine-independent optimizations. They simplify code and allow liveness analysis to do its work better.

Suppose we have the following ARM code:

  movw r12, #0xb6db
  movt r12, #0xdb6d

In TCG before optimizations we'll have:

  movi_i32 tmp8,$0xb6db
  mov_i32 r12,tmp8
  mov_i32 tmp8,r12
  ext16u_i32 tmp8,tmp8
  movi_i32 tmp9,$0xdb6d0000
  or_i32 tmp8,tmp8,tmp9
  mov_i32 r12,tmp8

And after optimizations we'll have this:

  movi_i32 r12,$0xdb6db6db

Here are performance evaluation results on SPEC CPU2000 integer tests in user-mode emulation on an x86_64 host. There were 5 runs of each test on the reference data set. The tables below show runtime in seconds for all these runs.

ARM guest without optimizations:

Test name          #1        #2        #3        #4        #5    Median
164.gzip     1403.612  1403.499   1403.52   1208.55  1403.583   1403.52
175.vpr      1237.729  1238.008  1238.019  1176.852  1237.902  1237.902
176.gcc       929.511   928.867   929.048   928.927   928.792   928.927
181.mcf       196.371   196.335   196.172   197.057   196.196   196.335
186.crafty   1547.101  1547.293  1547.133  1547.037  1547.044  1547.101
197.parser   3804.336  3804.429  3804.412   3804.45  3804.301  3804.412
252.eon      2760.414   2760.45  2473.608  2760.606  2760.216  2760.414
253.perlbmk  2557.966  2558.971  2559.731  2479.299  2556.835  2557.966
256.bzip2    1296.412  1296.215   1296.63  1296.489  1296.092  1296.412
300.twolf    2919.496  2919.444  2919.529  2919.384  2919.404  2919.444

ARM guest with optimizations:

Test name          #1        #2        #3        #4        #5    Median    Gain
164.gzip     1345.416  1401.741  1377.022  1401.737  1401.773  1401.737   0.13%
175.vpr       1116.75  1243.213   1243.32  1243.316  1243.144  1243.213  -0.43%
176.gcc       897.045   909.568     850.1    909.65    909.57   909.568   2.08%
181.mcf       199.058   198.717    198.28   198.866   197.955   198.717  -1.21%
186.crafty   1525.667  1526.663  1525.981  1525.995  1526.164  1525.995   1.36%
197.parser   3749.453  3749.522  3749.413    3749.5  3749.484  3749.484   1.44%
252.eon      2730.593  2746.525  2746.495  2746.493   2746.62  2746.495   0.50%
253.perlbmk  2577.341  2521.057  2578.461  2578.721  2581.313  2578.461  -0.80%
256.bzip2    1184.498  1190.116  1294.352  1294.554  1294.637  1294.352   0.16%
300.twolf    2894.264  2894.133  2894.398  2894.103  2894.146  2894.146   0.87%

x86_64 guest without optimizations:

Test name          #1        #2        #3        #4        #5    Median
164.gzip      858.118   858.151    858.09   858.139   858.122   858.122
175.vpr       956.361   956.465   956.521   956.438   956.705   956.465
176.gcc       647.275   647.465   647.186   647.294   647.268   647.275
181.mcf       219.239   221.964   220.244    220.74   220.559   220.559
186.crafty   1128.027  1128.071  1128.028  1128.115  1128.123  1128.071
197.parser   1815.669  1815.651  1815.669  1815.711  1815.759  1815.669
253.perlbmk  1777.143  1777.749  1667.508  1777.051  1778.391  1777.143
254.gap      1062.808  1062.758  1062.801  1063.099  1062.859  1062.808
255.vortex   1930.693  1930.706  1930.579    1930.7  1930.566  1930.693
256.bzip2    1014.566  1014.702    1014.6  1014.274  1014.421  1014.566
300.twolf    1342.653  1342.759  1344.092  1342.641  1342.794  1342.759

x86_64 guest with optimizations:

Test name          #1        #2        #3        #4        #5    Median    Gain
164.gzip      857.485   857.457   857.475   857.509   857.507   857.485   0.07%
175.vpr       963.255   962.972    963.27   963.124   963.686   963.255  -0.71%
176.gcc       644.123   644.055   644.145   643.818   635.773   644.055   0.50%
181.mcf       216.215   217.549   218.744   216.437    217.83   217.549   1.36%
186.crafty   1128.873  1128.792  1128.871  1128.816  1128.823  1128.823  -0.07%
197.parser   1814.626  1814.503  1814.552  1814.602  1814.748  1814.602   0.06%
253.perlbmk  1758.056  1751.963  1753.267   1765.27  1759.828  1758.056   1.07%
254.gap      1064.702  1064.712  1064.629  1064.657  1064.645  1064.657  -0.17%
255.vortex   1760.638  1936.387  1937.871  1937.471  1760.496  1936.387  -0.29%
256.bzip2    1007.658  1007.682  1007.316  1007.982  1007.747  1007.682   0.68%
300.twolf    1334.139  1333.791  1333.795  1334.147  1333.732  1333.795   0.67%

The ARM guests for 254.gap and 255.vortex and the x86_64 guest for 252.eon do not work under QEMU for some unrelated reason.

Kirill Batuzov (6):
  Add TCG optimizations stub
  Add copy and constant propagation.
  Do constant folding for basic arithmetic operations.
  Do constant folding for boolean operations.
  Do constant folding for shift operations.
Do constant folding for unary operations. Makefile.target |2 +- tcg/optimize.c | 539 +++ tcg/tcg.c |6 + tcg/tcg.h |3 + 4 files changed, 549 insertions(+), 1 deletions(-) create mode 100644 tcg/optimize.c -- 1.7.4.1
[Qemu-devel] [PATCH 1/6] Add TCG optimizations stub
[Qemu-devel] [PATCH 1/6] Add TCG optimizations stub
Added file tcg/optimize.c to hold TCG optimizations. Function tcg_optimize is called from tcg_gen_code_common. It calls other functions performing specific optimizations. Stub for constant folding was added.

Signed-off-by: Kirill Batuzov batuz...@ispras.ru
---
 Makefile.target |    2 +-
 tcg/optimize.c  |   87 +++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg.c       |    6 ++++
 tcg/tcg.h       |    3 ++
 4 files changed, 97 insertions(+), 1 deletions(-)
 create mode 100644 tcg/optimize.c

diff --git a/Makefile.target b/Makefile.target
index 21f864a..5a61778 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -70,7 +70,7 @@ all: $(PROGS) stap
 # cpu emulator library
 libobj-y = exec.o translate-all.o cpu-exec.o translate.o
-libobj-y += tcg/tcg.o
+libobj-y += tcg/tcg.o tcg/optimize.o
 libobj-$(CONFIG_SOFTFLOAT) += fpu/softfloat.o
 libobj-$(CONFIG_NOSOFTFLOAT) += fpu/softfloat-native.o
 libobj-y += op_helper.o helper.o
diff --git a/tcg/optimize.c b/tcg/optimize.c
new file mode 100644
index 0000000..cf31d18
--- /dev/null
+++ b/tcg/optimize.c
@@ -0,0 +1,87 @@
+/*
+ * Optimizations for Tiny Code Generator for QEMU
+ *
+ * Copyright (c) 2010 Samsung Electronics.
+ * Contributed by Kirill Batuzov batuz...@ispras.ru
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "config.h"
+
+#include <stdlib.h>
+#include <stdio.h>
+
+#include "qemu-common.h"
+#include "tcg-op.h"
+
+static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
+                                    TCGArg *args, TCGOpDef *tcg_op_defs)
+{
+    int i, nb_ops, op_index, op, nb_temps, nb_globals;
+    const TCGOpDef *def;
+    TCGArg *gen_args;
+
+    nb_temps = s->nb_temps;
+    nb_globals = s->nb_globals;
+
+    nb_ops = tcg_opc_ptr - gen_opc_buf;
+    gen_args = args;
+    for (op_index = 0; op_index < nb_ops; op_index++) {
+        op = gen_opc_buf[op_index];
+        def = &tcg_op_defs[op];
+        switch (op) {
+        case INDEX_op_call:
+        case INDEX_op_jmp:
+        case INDEX_op_br:
+        case INDEX_op_brcond_i32:
+        case INDEX_op_set_label:
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_brcond_i64:
+#endif
+            i = (op == INDEX_op_call) ?
+                (args[0] >> 16) + (args[0] & 0xffff) + 3 :
+                def->nb_args;
+            while (i) {
+                *gen_args = *args;
+                args++;
+                gen_args++;
+                i--;
+            }
+            break;
+        default:
+            for (i = 0; i < def->nb_args; i++) {
+                gen_args[i] = args[i];
+            }
+            args += def->nb_args;
+            gen_args += def->nb_args;
+            break;
+        }
+    }
+
+    return gen_args;
+}
+
+TCGArg *tcg_optimize(TCGContext *s, uint16_t *tcg_opc_ptr,
+                     TCGArg *args, TCGOpDef *tcg_op_defs)
+{
+    TCGArg *res;
+    res = tcg_constant_folding(s, tcg_opc_ptr, args, tcg_op_defs);
+    return res;
+}
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 8748c05..6fb4dd6 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -24,6 +24,7 @@
 /* define it to use liveness analysis (better code) */
 #define USE_LIVENESS_ANALYSIS
+#define USE_TCG_OPTIMIZATIONS
 
 #include "config.h"
@@ -2018,6 +2019,11 @@ static inline int tcg_gen_code_common(TCGContext *s, uint8_t *gen_code_buf,
     }
 #endif
 
+#ifdef USE_TCG_OPTIMIZATIONS
+    gen_opparam_ptr =
+        tcg_optimize(s, gen_opc_ptr, gen_opparam_buf, tcg_op_defs);
+#endif
+
 #ifdef CONFIG_PROFILER
     s->la_time -= profile_getclock();
 #endif
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 3fab8d6..a85a8d7 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -486,6 +486,9 @@ void tcg_gen_callN(TCGContext *s, TCGv_ptr func, unsigned int flags,
 void tcg_gen_shifti_i64(TCGv_i64 ret, TCGv_i64 arg1, int c,
                         int right, int arith);
 
+TCGArg *tcg_optimize(TCGContext *s, uint16_t *tcg_opc_ptr, TCGArg *args,
+                     TCGOpDef *tcg_op_defs);
-- 
1.7.4.1
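The only non-obvious part of the pass skeleton above is how it steps over INDEX_op_call, whose argument count is packed into args[0] rather than read from the opcode table. A minimal sketch of that decoding follows; the helper name and the reading of the "+ 3" (as covering the call op's fixed words) are our interpretation of the patch, not a documented QEMU API.

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of the patch's arg-count computation for INDEX_op_call.
 * We read the high 16 bits of args[0] as the output-argument count
 * and the low 16 bits as the input-argument count; the extra 3 is
 * assumed to account for the op's fixed words. */
static int call_nb_args(uint32_t arg0)
{
    return (int)((arg0 >> 16) + (arg0 & 0xffff) + 3);
}
```

With no outputs and no inputs a call still occupies 3 argument slots under this reading, which is why the copy loop cannot simply use def->nb_args for calls.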
[Qemu-devel] [PATCH 6/6] Do constant folding for unary operations.
Perform constant folding for NOT and EXT{8,16,32}{S,U} operations.

Signed-off-by: Kirill Batuzov batuz...@ispras.ru
---
 tcg/optimize.c |   82 ++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 82 insertions(+), 0 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index b6b0dc4..bda469a 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -104,6 +104,11 @@ static int op_bits(int op)
     case INDEX_op_sar_i32:
     case INDEX_op_rotl_i32:
     case INDEX_op_rotr_i32:
+    case INDEX_op_not_i32:
+    case INDEX_op_ext8s_i32:
+    case INDEX_op_ext16s_i32:
+    case INDEX_op_ext8u_i32:
+    case INDEX_op_ext16u_i32:
         return 32;
 #if TCG_TARGET_REG_BITS == 64
     case INDEX_op_mov_i64:
@@ -118,6 +123,13 @@ static int op_bits(int op)
     case INDEX_op_sar_i64:
     case INDEX_op_rotl_i64:
     case INDEX_op_rotr_i64:
+    case INDEX_op_not_i64:
+    case INDEX_op_ext8s_i64:
+    case INDEX_op_ext16s_i64:
+    case INDEX_op_ext32s_i64:
+    case INDEX_op_ext8u_i64:
+    case INDEX_op_ext16u_i64:
+    case INDEX_op_ext32u_i64:
         return 64;
 #endif
     default:
@@ -245,6 +257,44 @@ static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
         return x;
 #endif
 
+    case INDEX_op_not_i32:
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_not_i64:
+#endif
+        return ~x;
+
+    case INDEX_op_ext8s_i32:
+        return x & (1 << 7) ? x | ~0xff : x & 0xff;
+
+    case INDEX_op_ext16s_i32:
+        return x & (1 << 15) ? x | ~0xffff : x & 0xffff;
+
+    case INDEX_op_ext8u_i32:
+        return x & 0xff;
+
+    case INDEX_op_ext16u_i32:
+        return x & 0xffff;
+
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_ext8s_i64:
+        return x & (1 << 7) ? x | ~0xffULL : x & 0xff;
+
+    case INDEX_op_ext16s_i64:
+        return x & (1 << 15) ? x | ~0xffffULL : x & 0xffff;
+
+    case INDEX_op_ext32s_i64:
+        return x & (1U << 31) ? x | ~0xffffffffULL : x & 0xffffffff;
+
+    case INDEX_op_ext8u_i64:
+        return x & 0xff;
+
+    case INDEX_op_ext16u_i64:
+        return x & 0xffff;
+
+    case INDEX_op_ext32u_i64:
+        return x & 0xffffffff;
+#endif
+
     default:
         fprintf(stderr,
                 "Unrecognized operation %d in do_constant_folding.\n", op);
@@ -345,6 +395,38 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             gen_args += 2;
             args += 2;
             break;
+        case INDEX_op_not_i32:
+        case INDEX_op_ext8s_i32:
+        case INDEX_op_ext16s_i32:
+        case INDEX_op_ext8u_i32:
+        case INDEX_op_ext16u_i32:
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_not_i64:
+        case INDEX_op_ext8s_i64:
+        case INDEX_op_ext16s_i64:
+        case INDEX_op_ext32s_i64:
+        case INDEX_op_ext8u_i64:
+        case INDEX_op_ext16u_i64:
+        case INDEX_op_ext32u_i64:
+#endif
+            if (state[args[1]] == TCG_TEMP_CONST) {
+                gen_opc_buf[op_index] = op_to_movi(op);
+                gen_args[0] = args[0];
+                gen_args[1] = do_constant_folding(op, vals[args[1]], 0);
+                reset_temp(state, vals, gen_args[0], nb_temps, nb_globals);
+                state[gen_args[0]] = TCG_TEMP_CONST;
+                vals[gen_args[0]] = gen_args[1];
+                gen_args += 2;
+                args += 2;
+                break;
+            } else {
+                reset_temp(state, vals, args[0], nb_temps, nb_globals);
+                gen_args[0] = args[0];
+                gen_args[1] = args[1];
+                gen_args += 2;
+                args += 2;
+                break;
+            }
         case INDEX_op_or_i32:
         case INDEX_op_and_i32:
 #if TCG_TARGET_REG_BITS == 64
-- 
1.7.4.1
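The EXT folding rules above reduce to either a plain mask (unsigned extension) or a mask plus sign smear (signed extension). As a sanity check, here is a standalone version of two of them over a 32-bit value; the helper names are ours, not QEMU's.

```c
#include <assert.h>
#include <stdint.h>

/* ext8s: if bit 7 is set, fill the upper bits with ones,
 * otherwise keep only the low byte. */
static uint32_t fold_ext8s_i32(uint32_t x)
{
    return (x & 0x80) ? (x | ~0xffu) : (x & 0xffu);
}

/* ext16u: zero-extension is just a mask of the low 16 bits. */
static uint32_t fold_ext16u_i32(uint32_t x)
{
    return x & 0xffffu;
}
```

Everything outside the extended field is discarded first, so folding a constant through ext8s of 0x1ff behaves the same as ext8s of 0xff, matching the semantics of the TCG ops.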
[Qemu-devel] [PATCH 5/6] Do constant folding for shift operations.
Perform constant folding for SHR, SHL, SAR, ROTR, ROTL operations.

Signed-off-by: Kirill Batuzov batuz...@ispras.ru
---
 tcg/optimize.c |   87 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 87 insertions(+), 0 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index a02d5c1..b6b0dc4 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -99,6 +99,11 @@ static int op_bits(int op)
     case INDEX_op_and_i32:
     case INDEX_op_or_i32:
     case INDEX_op_xor_i32:
+    case INDEX_op_shl_i32:
+    case INDEX_op_shr_i32:
+    case INDEX_op_sar_i32:
+    case INDEX_op_rotl_i32:
+    case INDEX_op_rotr_i32:
         return 32;
 #if TCG_TARGET_REG_BITS == 64
     case INDEX_op_mov_i64:
@@ -108,6 +113,11 @@ static int op_bits(int op)
     case INDEX_op_and_i64:
     case INDEX_op_or_i64:
     case INDEX_op_xor_i64:
+    case INDEX_op_shl_i64:
+    case INDEX_op_shr_i64:
+    case INDEX_op_sar_i64:
+    case INDEX_op_rotl_i64:
+    case INDEX_op_rotr_i64:
         return 64;
 #endif
     default:
@@ -131,6 +141,7 @@ static int op_to_movi(int op)
 
 static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
 {
+    TCGArg r;
     switch (op) {
     case INDEX_op_add_i32:
 #if TCG_TARGET_REG_BITS == 64
@@ -168,6 +179,72 @@ static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
 #endif
         return x ^ y;
 
+    case INDEX_op_shl_i32:
+#if TCG_TARGET_REG_BITS == 64
+        y &= 0xffffffff;
+    case INDEX_op_shl_i64:
+#endif
+        return x << y;
+
+    case INDEX_op_shr_i32:
+#if TCG_TARGET_REG_BITS == 64
+        x &= 0xffffffff;
+        y &= 0xffffffff;
+    case INDEX_op_shr_i64:
+#endif
+        /* Assuming TCGArg to be unsigned */
+        return x >> y;
+
+    case INDEX_op_sar_i32:
+#if TCG_TARGET_REG_BITS == 64
+        x &= 0xffffffff;
+        y &= 0xffffffff;
+#endif
+        r = x & 0x80000000;
+        x &= ~0x80000000;
+        x >>= y;
+        r |= r - (r >> y);
+        x |= r;
+        return x;
+
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_sar_i64:
+        r = x & 0x8000000000000000ULL;
+        x &= ~0x8000000000000000ULL;
+        x >>= y;
+        r |= r - (r >> y);
+        x |= r;
+        return x;
+#endif
+
+    case INDEX_op_rotr_i32:
+#if TCG_TARGET_REG_BITS == 64
+        x &= 0xffffffff;
+        y &= 0xffffffff;
+#endif
+        x = (x << (32 - y)) | (x >> y);
+        return x;
+
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_rotr_i64:
+        x = (x << (64 - y)) | (x >> y);
+        return x;
+#endif
+
+    case INDEX_op_rotl_i32:
+#if TCG_TARGET_REG_BITS == 64
+        x &= 0xffffffff;
+        y &= 0xffffffff;
+#endif
+        x = (x << y) | (x >> (32 - y));
+        return x;
+
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_rotl_i64:
+        x = (x << y) | (x >> (64 - y));
+        return x;
+#endif
+
     default:
         fprintf(stderr,
                 "Unrecognized operation %d in do_constant_folding.\n", op);
@@ -297,11 +374,21 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
         case INDEX_op_add_i32:
         case INDEX_op_sub_i32:
         case INDEX_op_mul_i32:
+        case INDEX_op_shl_i32:
+        case INDEX_op_shr_i32:
+        case INDEX_op_sar_i32:
+        case INDEX_op_rotl_i32:
+        case INDEX_op_rotr_i32:
 #if TCG_TARGET_REG_BITS == 64
         case INDEX_op_xor_i64:
         case INDEX_op_add_i64:
         case INDEX_op_sub_i64:
         case INDEX_op_mul_i64:
+        case INDEX_op_shl_i64:
+        case INDEX_op_shr_i64:
+        case INDEX_op_sar_i64:
+        case INDEX_op_rotl_i64:
+        case INDEX_op_rotr_i64:
 #endif
             if (state[args[1]] == TCG_TEMP_CONST
                 && state[args[2]] == TCG_TEMP_CONST) {
-- 
1.7.4.1
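Two details of the shift folding above are worth illustrating in isolation: SAR is folded without relying on signed right shift (whose behavior on negative values is implementation-defined in C), and rotates are folded as a pair of shifts. A minimal sketch, with our own helper names and an explicit guard against the undefined shift-by-32 that a naive rotate would hit at y == 0:

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic shift right done portably on an unsigned value, using the
 * patch's trick: split off the sign bit, shift the rest logically, then
 * smear the sign bit back over the vacated top positions. */
static uint32_t fold_sar_i32(uint32_t x, uint32_t y)
{
    uint32_t r = x & 0x80000000u;   /* isolated sign bit */
    x &= ~0x80000000u;
    x >>= y;
    r |= r - (r >> y);              /* sign spread over the top y+1 bits */
    return x | r;
}

/* Right rotate as two shifts; y == 0 is special-cased because
 * x << (32 - 0) would be undefined behavior in C. */
static uint32_t fold_rotr_i32(uint32_t x, uint32_t y)
{
    y &= 31;
    return y ? (x >> y) | (x << (32 - y)) : x;
}
```

For a negative input the smear reproduces exactly what hardware SAR does: 0xffffffff shifted right by any amount stays 0xffffffff.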
[Qemu-devel] [PATCH 4/6] Do constant folding for boolean operations.
Perform constant folding for AND, OR, XOR operations.

Signed-off-by: Kirill Batuzov batuz...@ispras.ru
---
 tcg/optimize.c |   58 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 4073f05..a02d5c1 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -38,6 +38,13 @@ typedef enum {
     TCG_TEMP_ANY
 } tcg_temp_state;
 
+const int mov_opc[] = {
+    INDEX_op_mov_i32,
+#if TCG_TARGET_REG_BITS == 64
+    INDEX_op_mov_i64,
+#endif
+};
+
 static int mov_to_movi(int op)
 {
     switch (op) {
@@ -89,12 +96,18 @@ static int op_bits(int op)
     case INDEX_op_add_i32:
     case INDEX_op_sub_i32:
     case INDEX_op_mul_i32:
+    case INDEX_op_and_i32:
+    case INDEX_op_or_i32:
+    case INDEX_op_xor_i32:
         return 32;
 #if TCG_TARGET_REG_BITS == 64
     case INDEX_op_mov_i64:
     case INDEX_op_add_i64:
     case INDEX_op_sub_i64:
     case INDEX_op_mul_i64:
+    case INDEX_op_and_i64:
+    case INDEX_op_or_i64:
+    case INDEX_op_xor_i64:
         return 64;
 #endif
     default:
@@ -137,6 +150,24 @@ static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
 #endif
         return x * y;
 
+    case INDEX_op_and_i32:
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_and_i64:
+#endif
+        return x & y;
+
+    case INDEX_op_or_i32:
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_or_i64:
+#endif
+        return x | y;
+
+    case INDEX_op_xor_i32:
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_xor_i64:
+#endif
+        return x ^ y;
+
     default:
         fprintf(stderr,
                 "Unrecognized operation %d in do_constant_folding.\n", op);
@@ -237,10 +268,37 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             gen_args += 2;
             args += 2;
             break;
+        case INDEX_op_or_i32:
+        case INDEX_op_and_i32:
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_and_i64:
+        case INDEX_op_or_i64:
+#endif
+            if (args[1] == args[2]) {
+                if (args[1] == args[0]) {
+                    args += 3;
+                    gen_opc_buf[op_index] = INDEX_op_nop;
+                } else {
+                    reset_temp(state, vals, args[0], nb_temps, nb_globals);
+                    if (args[1] >= s->nb_globals) {
+                        state[args[0]] = TCG_TEMP_COPY;
+                        vals[args[0]] = args[1];
+                    }
+                    gen_opc_buf[op_index] = mov_opc[op_bits(op) / 32 - 1];
+                    gen_args[0] = args[0];
+                    gen_args[1] = args[1];
+                    gen_args += 2;
+                    args += 3;
+                }
+                break;
+            }
+            /* Proceed with default binary operation handling */
+        case INDEX_op_xor_i32:
         case INDEX_op_add_i32:
         case INDEX_op_sub_i32:
         case INDEX_op_mul_i32:
 #if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_xor_i64:
         case INDEX_op_add_i64:
         case INDEX_op_sub_i64:
         case INDEX_op_mul_i64:
-- 
1.7.4.1
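The special case above exploits idempotence: for AND and OR, identical source operands make the operation a plain copy, which can become a mov, or disappear entirely when the destination already equals the source. A toy decision function showing the three outcomes (names ours, not the patch's):

```c
#include <assert.h>

enum simplify_result { KEEP, TO_NOP, TO_MOV };

/* Decide what "dst = a AND b" (or OR) becomes when folded.
 * If a != b nothing can be said; if a == b the op copies a, so it is
 * either a mov, or a nop when dst is already that same temp. */
static enum simplify_result simplify_idempotent(int dst, int a, int b)
{
    if (a != b) {
        return KEEP;                 /* genuinely binary, leave it alone */
    }
    return (dst == a) ? TO_NOP : TO_MOV;
}
```

Note that XOR deliberately falls through to the generic handling in the patch: x XOR x is 0, not x, so it is not idempotent and would need a movi, not a mov.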
[Qemu-devel] [PATCH 2/6] Add copy and constant propagation.
Make tcg_constant_folding do copy and constant propagation. It is preparatory work before actual constant folding.

Signed-off-by: Kirill Batuzov batuz...@ispras.ru
---
 tcg/optimize.c |  123 ++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 123 insertions(+), 0 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index cf31d18..a761c51 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -31,22 +31,139 @@
 #include "qemu-common.h"
 #include "tcg-op.h"
 
+typedef enum {
+    TCG_TEMP_UNDEF = 0,
+    TCG_TEMP_CONST,
+    TCG_TEMP_COPY,
+    TCG_TEMP_ANY
+} tcg_temp_state;
+
+static int mov_to_movi(int op)
+{
+    switch (op) {
+    case INDEX_op_mov_i32: return INDEX_op_movi_i32;
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_mov_i64: return INDEX_op_movi_i64;
+#endif
+    default:
+        fprintf(stderr, "Unrecognized operation %d in mov_to_movi.\n", op);
+        tcg_abort();
+    }
+}
+
+/* Reset TEMP's state to TCG_TEMP_ANY.  If TEMP was a representative of some
+   class of equivalent temps, a new representative should be chosen in this
+   class. */
+static void reset_temp(tcg_temp_state *state, tcg_target_ulong *vals,
+                       TCGArg temp, int nb_temps, int nb_globals)
+{
+    int i;
+    TCGArg new_base;
+    new_base = (TCGArg)-1;
+    for (i = nb_globals; i < nb_temps; i++) {
+        if (state[i] == TCG_TEMP_COPY && vals[i] == temp) {
+            if (new_base == ((TCGArg)-1)) {
+                new_base = (TCGArg)i;
+                state[i] = TCG_TEMP_ANY;
+            } else {
+                vals[i] = new_base;
+            }
+        }
+    }
+    for (i = 0; i < nb_globals; i++) {
+        if (state[i] == TCG_TEMP_COPY && vals[i] == temp) {
+            if (new_base == ((TCGArg)-1)) {
+                state[i] = TCG_TEMP_ANY;
+            } else {
+                vals[i] = new_base;
+            }
+        }
+    }
+    state[temp] = TCG_TEMP_ANY;
+}
+
+/* Propagate constants and copies, fold constant expressions. */
 static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                                     TCGArg *args, TCGOpDef *tcg_op_defs)
 {
     int i, nb_ops, op_index, op, nb_temps, nb_globals;
     const TCGOpDef *def;
     TCGArg *gen_args;
+    /* Array VALS has an element for each temp.
+       If this temp holds a constant then its value is kept in VALS' element.
+       If this temp is a copy of other ones then this equivalence class'
+       representative is kept in VALS' element.
+       If this temp is neither copy nor constant then corresponding VALS'
+       element is unused. */
+    static tcg_target_ulong vals[TCG_MAX_TEMPS];
+    static tcg_temp_state state[TCG_MAX_TEMPS];
 
     nb_temps = s->nb_temps;
     nb_globals = s->nb_globals;
+    memset(state, 0, nb_temps * sizeof(tcg_temp_state));
 
     nb_ops = tcg_opc_ptr - gen_opc_buf;
     gen_args = args;
     for (op_index = 0; op_index < nb_ops; op_index++) {
         op = gen_opc_buf[op_index];
         def = &tcg_op_defs[op];
+        /* Do copy propagation */
+        if (op != INDEX_op_call) {
+            for (i = def->nb_oargs; i < def->nb_oargs + def->nb_iargs; i++) {
+                if (state[args[i]] == TCG_TEMP_COPY
+                    && !(def->args_ct[i].ct & TCG_CT_IALIAS)
+                    && (def->args_ct[i].ct & TCG_CT_REG)) {
+                    args[i] = vals[args[i]];
+                }
+            }
+        }
+
+        /* Propagate constants through copy operations and do constant
+           folding.  Constants will be substituted to arguments by register
+           allocator where needed and possible.  Also detect copies. */
         switch (op) {
+        case INDEX_op_mov_i32:
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_mov_i64:
+#endif
+            if ((state[args[1]] == TCG_TEMP_COPY
+                 && vals[args[1]] == args[0])
+                || args[0] == args[1]) {
+                args += 2;
+                gen_opc_buf[op_index] = INDEX_op_nop;
+                break;
+            }
+            if (state[args[1]] != TCG_TEMP_CONST) {
+                reset_temp(state, vals, args[0], nb_temps, nb_globals);
+                if (args[1] >= s->nb_globals) {
+                    state[args[0]] = TCG_TEMP_COPY;
+                    vals[args[0]] = args[1];
+                }
+                gen_args[0] = args[0];
+                gen_args[1] = args[1];
+                gen_args += 2;
+                args += 2;
+                break;
+            } else {
+                /* Source argument is constant.  Rewrite the operation and
+                   let movi case handle it. */
+                op = mov_to_movi(op);
+                gen_opc_buf[op_index] = op;
+                args[1] = vals[args[1]];
+                /* fallthrough */
+            }
+        case INDEX_op_movi_i32:
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_movi_i64:
+#endif
+            reset_temp(state, vals, args[0], nb_temps, nb_globals);
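The trickiest part of the patch is reset_temp(): when a temp that served as the representative of an equivalence class is clobbered, one of its remaining copies must be promoted to representative so the rest of the class stays linked. A toy model of that bookkeeping (enum and array names are ours, simplified to ignore the globals/non-globals split):

```c
#include <assert.h>

/* state[] records what is known about each temp; for TEMP_COPY temps,
 * vals[] names the representative of the equivalence class. */
enum { TEMP_ANY, TEMP_CONST, TEMP_COPY };
#define NB_TEMPS 8
static int  state[NB_TEMPS];
static long vals[NB_TEMPS];

static void set_copy(int dst, int src)
{
    state[dst] = TEMP_COPY;
    vals[dst] = src;
}

/* Forget everything about 'temp'; the first copy found becomes the new
 * representative, and the remaining copies are re-pointed at it. */
static void reset_temp(int temp)
{
    int i, new_base = -1;
    for (i = 0; i < NB_TEMPS; i++) {
        if (state[i] == TEMP_COPY && vals[i] == temp) {
            if (new_base < 0) {
                new_base = i;           /* promoted to representative */
                state[i] = TEMP_ANY;
            } else {
                vals[i] = new_base;     /* now a copy of the new base */
            }
        }
    }
    state[temp] = TEMP_ANY;
}

/* One promotion scenario: t1 and t2 are copies of t0, then t0 dies.
 * Returns 1 if t1 became the representative and t2 now points at it. */
static int demo_promotion(void)
{
    set_copy(1, 0);
    set_copy(2, 0);
    reset_temp(0);
    return state[0] == TEMP_ANY && state[1] == TEMP_ANY
        && state[2] == TEMP_COPY && vals[2] == 1;
}
```

The real pass additionally refuses to promote globals, since their values can change behind the optimizer's back across calls.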
[Qemu-devel] [PATCH 3/6] Do constant folding for basic arithmetic operations.
Perform actual constant folding for ADD, SUB and MUL operations.

Signed-off-by: Kirill Batuzov batuz...@ispras.ru
---
 tcg/optimize.c |  102 +++++++++++++++++++++++++++++++++++++++++
 1 files changed, 102 insertions(+), 0 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index a761c51..4073f05 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -82,6 +82,79 @@ static void reset_temp(tcg_temp_state *state, tcg_target_ulong *vals,
     state[temp] = TCG_TEMP_ANY;
 }
 
+static int op_bits(int op)
+{
+    switch (op) {
+    case INDEX_op_mov_i32:
+    case INDEX_op_add_i32:
+    case INDEX_op_sub_i32:
+    case INDEX_op_mul_i32:
+        return 32;
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_mov_i64:
+    case INDEX_op_add_i64:
+    case INDEX_op_sub_i64:
+    case INDEX_op_mul_i64:
+        return 64;
+#endif
+    default:
+        fprintf(stderr, "Unrecognized operation %d in op_bits.\n", op);
+        tcg_abort();
+    }
+}
+
+static int op_to_movi(int op)
+{
+    if (op_bits(op) == 32) {
+        return INDEX_op_movi_i32;
+    }
+#if TCG_TARGET_REG_BITS == 64
+    if (op_bits(op) == 64) {
+        return INDEX_op_movi_i64;
+    }
+#endif
+    tcg_abort();
+}
+
+static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
+{
+    switch (op) {
+    case INDEX_op_add_i32:
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_add_i64:
+#endif
+        return x + y;
+
+    case INDEX_op_sub_i32:
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_sub_i64:
+#endif
+        return x - y;
+
+    case INDEX_op_mul_i32:
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_mul_i64:
+#endif
+        return x * y;
+
+    default:
+        fprintf(stderr,
+                "Unrecognized operation %d in do_constant_folding.\n", op);
+        tcg_abort();
+    }
+}
+
+static TCGArg do_constant_folding(int op, TCGArg x, TCGArg y)
+{
+    TCGArg res = do_constant_folding_2(op, x, y);
+#if TCG_TARGET_REG_BITS == 64
+    if (op_bits(op) == 32) {
+        res &= 0xffffffff;
+    }
+#endif
+    return res;
+}
+
 /* Propagate constants and copies, fold constant expressions. */
 static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                                     TCGArg *args, TCGOpDef *tcg_op_defs)
@@ -164,6 +237,35 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             gen_args += 2;
             args += 2;
             break;
+        case INDEX_op_add_i32:
+        case INDEX_op_sub_i32:
+        case INDEX_op_mul_i32:
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_add_i64:
+        case INDEX_op_sub_i64:
+        case INDEX_op_mul_i64:
+#endif
+            if (state[args[1]] == TCG_TEMP_CONST
+                && state[args[2]] == TCG_TEMP_CONST) {
+                gen_opc_buf[op_index] = op_to_movi(op);
+                gen_args[0] = args[0];
+                gen_args[1] =
+                    do_constant_folding(op, vals[args[1]], vals[args[2]]);
+                reset_temp(state, vals, gen_args[0], nb_temps, nb_globals);
+                state[gen_args[0]] = TCG_TEMP_CONST;
+                vals[gen_args[0]] = gen_args[1];
+                gen_args += 2;
+                args += 3;
+                break;
+            } else {
+                reset_temp(state, vals, args[0], nb_temps, nb_globals);
+                gen_args[0] = args[0];
+                gen_args[1] = args[1];
+                gen_args[2] = args[2];
+                gen_args += 3;
+                args += 3;
+                break;
+            }
         case INDEX_op_call:
         case INDEX_op_jmp:
         case INDEX_op_br:
-- 
1.7.4.1
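The do_constant_folding() wrapper above exists for one reason: on a 64-bit host, TCGArg is 64 bits wide, so after folding a 32-bit op the result must be masked back down or a later comparison against the constant would see stale high bits. A self-contained sketch of that behavior (the operator-character dispatch is our simplification, not the patch's opcode switch):

```c
#include <assert.h>
#include <stdint.h>

/* Fold x op y, then truncate to the operation's width, mimicking
 * do_constant_folding() on a 64-bit host. */
static uint64_t fold_arith(char op, uint64_t x, uint64_t y, int bits)
{
    uint64_t res;
    switch (op) {
    case '+': res = x + y; break;
    case '-': res = x - y; break;
    default:  res = x * y; break;
    }
    if (bits == 32) {
        res &= 0xffffffffu;     /* a 32-bit op only defines the low half */
    }
    return res;
}
```

The visible effect: adding 1 to 0xffffffff wraps to 0 as a 32-bit op but yields 0x100000000 as a 64-bit op, exactly the distinction the width-aware mask preserves.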
Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration
On 05/20/11 14:19, Stefan Hajnoczi wrote:
> I'm interested in what the API for snapshots would look like.

I presume you're talking external snapshots here? The API is really what should be defined by libvirt, so you get a unified API that can work both on QEMU-level snapshots as well as enterprise storage, host file system snapshots, etc.

> Specifically how does user software do the following:
>
> 1. Create a snapshot

There's a QMP patch out already that is still not applied, but it is pretty simple, similar to the hmp command. Alternatively you can do it the evil way by pre-creating the snapshot image file and feeding that to the snapshot command. In this case QEMU won't create the snapshot file.

> 2. Delete a snapshot

This is still to be defined.

> 3. List snapshots

Again this is tricky as it depends on the type of snapshot. For QEMU-level ones they are files, so 'ls' is your friend :)

> 4. Access data from a snapshot

You boot the snapshot file.

> 5. Restore a VM from a snapshot

We're talking snapshots, not checkpointing, here, so you cannot restore a VM from a snapshot.

> 6. Get the dirty blocks list (for incremental backup)

Good question.

> We've discussed image format-level approaches but I think the scope of
> the API should cover several levels at which snapshots are implemented:
>
> 1. Image format - image file snapshot (Jes, Jagane)
> 2. Host file system - ext4 and btrfs snapshots
> 3. Storage system - LVM or SAN volume snapshots
>
> It will be hard to take advantage of more efficient host file system or
> storage system snapshots if they are not designed in now.
>
> Is anyone familiar enough with the libvirt storage APIs to draft an
> extension that adds snapshot support? I will take a stab at it if no one
> else wants to try it.

I believe the libvirt guys are already looking at this. Adding to the CC list.

Cheers,
Jes
Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration
On Fri, May 20, 2011 at 1:39 PM, Jes Sorensen jes.soren...@redhat.com wrote:
> On 05/20/11 14:19, Stefan Hajnoczi wrote:
>> I'm interested in what the API for snapshots would look like.
>
> I presume you're talking external snapshots here? The API is really what
> should be defined by libvirt, so you get a unified API that can work both
> on QEMU-level snapshots as well as enterprise storage, host file system
> snapshots, etc.

Thanks for the pointers on external snapshots using image files. I'm really thinking about the libvirt API. Basically I'm not sure we'll implement the right things if we don't think through the API that the user sees first.

Stefan
Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration
On 05/20/11 14:49, Stefan Hajnoczi wrote:
> Thanks for the pointers on external snapshots using image files. I'm
> really thinking about the libvirt API. Basically I'm not sure we'll
> implement the right things if we don't think through the API that the
> user sees first.

Right, I agree. There are a lot of variables there, and they are not necessarily easy to map into a single namespace. I am not sure it should be done either.

Cheers,
Jes
[Qemu-devel] mouse doesn't work on guest OS
Hello all,

I use QEMU to run an Ubuntu image (for kernel debugging). I use the following command:

sudo qemu -hda ubuntu-qemu-test -append root=/dev/sda1 -kernel /mnt/build/linux-2.6/arch/x86/boot/bzImage -boot c -net nic -net user

The mouse doesn't work in the guest Ubuntu. I googled my problem and found two suggested solutions:

- adding -usb -usbdevice tablet
- entering this command before running qemu: export SDL_VIDEO_X11_DGAMOUSE=0

But neither of them worked for me. Any help is appreciated.

-- 
Amirali Shambayati
Bachelor Student
Computer Engineering Department
Sharif University of Technology
Tehran, Iran
Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On 05/20/2011 02:23 AM, Avi Kivity wrote:
> On 05/19/2011 11:43 PM, Anthony Liguori wrote:
>> On 05/19/2011 09:12 AM, Avi Kivity wrote:
>>> The memory API separates the attributes of a memory region (its size,
>>> how reads or writes are handled, dirty logging, and coalescing) from
>>> where it is mapped and whether it is enabled. This allows a device to
>>> configure a memory region once, then hand it off to its parent bus to
>>> map it according to the bus configuration.
>>>
>>> Hierarchical registration also allows a device to compose a region out
>>> of a number of sub-regions with different properties; for example some
>>> may be RAM while others may be MMIO.
>>>
>>> +    struct {
>>> +        /* If nonzero, specify bounds on access sizes beyond which a
>>> +         * machine check is thrown.
>>> +         */
>>> +        unsigned min_access_size;
>>> +        unsigned max_access_size;
>>> +        /* If true, unaligned accesses are supported.  Otherwise
>>> +         * unaligned accesses throw machine checks.
>>> +         */
>>> +        bool unaligned;
>>> +    } valid;
>>
>> Under what circumstances would this be used? The behavior of devices
>> that receive non-natural accesses varies wildly. For PCI devices,
>> invalid accesses almost always return ~0. I can't think of a device
>> where an MCE would occur.
>
> This was requested by Richard, so I'll let him comment.

Several alpha system chips MCE when accessed with incorrect sizes, e.g. only 64-bit accesses are allowed.

> Is this structure honestly any better than 4 function pointers?

I can't see that it is, myself.

r~
Re: [Qemu-devel] [PATCH 19/26] target-xtensa: implement loop option
On 05/20/2011 02:10 AM, Max Filippov wrote:
>> If you're going to pretend that LEND is a constant, you might as well
>> pretend that LBEG is also a constant, so that you get to chain the TB's
>> around the loop.
>
> But there may be three exits from TB at the LEND if its last command is
> a branch: to the LBEG, to the branch target and to the next insn.
>
> Ok, I guess that I need to add gen_wsr_lbeg that invalidates TB at the
> current LEND, pretend that LBEG is constant and use given slot to jump
> to it. And also to get rid of
>
>     tcg_gen_brcondi_i32(TCG_COND_NE, cpu_SR[LEND], dc->next_pc, label);

Yes. Consider that the code is written to assume that the loop cycles, so the most likely exit at LEND is LBEG. If we choose to mirror that logic inside TCG, then of the 3 possible exits from the block one of them should be LBEG so that the most likely edge can get chained.

r~
Re: [Qemu-devel] [PATCH 09/26] target-xtensa: add special and user registers
On 05/20/2011 12:34 AM, Max Filippov wrote:
> User registers represent TIE states that may appear in custom xtensa
> configurations. I'd better change RUR and WUR so that they can access
> all user registers but warn on those not defined globally or in the
> CPUEnv::config. Is it OK?

Well, it's ok if you change nothing. However, I wanted you to think about other ways that might make sense than simply allocating all of the registers.

r~
[Qemu-devel] [PATCH v5, resend] revamp acpitable parsing and allow to specify complete (headerful) table
Since I've got no comments/replies whatsoever, -- neither positive nor negative, I assume no one received this email (sent on Thu, 12 May 2011), so am resending it again.

This patch almost rewrites the acpi_table_add() function (but still leaves it using the old get_param_value() interface). The result is that it's now possible to specify a whole table (together with a header) in an external file, instead of just the data portion, with a new file= parameter, but at the same time it's still possible to specify header fields as before.

Now with the checkpatch.pl formatting fixes, thanks to Stefan Hajnoczi for suggestions, with changes from Isaku Yamahata, and with my further refinements.

v5: rediffed against current qemu/master.

Signed-off-by: Michael Tokarev m...@tls.msk.ru
---
 hw/acpi.c       |  292 ++++++++++++++++++++++++++++++---------------
 qemu-options.hx |    7 +-
 2 files changed, 175 insertions(+), 124 deletions(-)

diff --git a/hw/acpi.c b/hw/acpi.c
index ad40fb4..4316189 100644
--- a/hw/acpi.c
+++ b/hw/acpi.c
@@ -22,17 +22,29 @@
 struct acpi_table_header
 {
-    char signature [4];       /* ACPI signature (4 ASCII characters) */
+    uint16_t _length;         /* our length, not actual part of the hdr */
+                              /* XXX why we have 2 length fields here? */
+    char sig[4];              /* ACPI signature (4 ASCII characters) */
     uint32_t length;          /* Length of table, in bytes, including header */
     uint8_t revision;         /* ACPI Specification minor version # */
     uint8_t checksum;         /* To make sum of entire table == 0 */
-    char oem_id [6];          /* OEM identification */
-    char oem_table_id [8];    /* OEM table identification */
+    char oem_id[6];           /* OEM identification */
+    char oem_table_id[8];     /* OEM table identification */
     uint32_t oem_revision;    /* OEM revision number */
-    char asl_compiler_id [4]; /* ASL compiler vendor ID */
+    char asl_compiler_id[4];  /* ASL compiler vendor ID */
     uint32_t asl_compiler_revision; /* ASL compiler revision number */
 } __attribute__((packed));
 
+#define ACPI_TABLE_HDR_SIZE sizeof(struct acpi_table_header)
+#define ACPI_TABLE_PFX_SIZE sizeof(uint16_t)  /* size of the extra prefix */
+
+static const char dfl_hdr[ACPI_TABLE_HDR_SIZE] =
+    "\0\0"                   /* fake _length (2) */
+    "QEMU\0\0\0\0\1\0"       /* sig (4), len (4), revno (1), csum (1) */
+    "QEMUQEQEMUQEMU\1\0\0\0" /* OEM id (6), table (8), revno (4) */
+    "QEMU\1\0\0\0"           /* ASL compiler ID (4), version (4) */
+;
+
 char *acpi_tables;
 size_t acpi_tables_len;
 
@@ -45,158 +57,192 @@ static int acpi_checksum(const uint8_t *data, int len)
     return (-sum) & 0xff;
 }
 
+/* like strncpy() but zero-fills the tail of destination */
+static void strzcpy(char *dst, const char *src, size_t size)
+{
+    size_t len = strlen(src);
+    if (len >= size) {
+        len = size;
+    } else {
+        memset(dst + len, 0, size - len);
+    }
+    memcpy(dst, src, len);
+}
+
+/* XXX fixme: this function uses obsolete argument parsing interface */
 int acpi_table_add(const char *t)
 {
-    static const char *dfl_id = "QEMUQEMU";
     char buf[1024], *p, *f;
-    struct acpi_table_header acpi_hdr;
     unsigned long val;
-    uint32_t length;
-    struct acpi_table_header *acpi_hdr_p;
-    size_t off;
+    size_t len, start, allen;
+    bool has_header;
+    int changed;
+    int r;
+    struct acpi_table_header hdr;
+
+    r = 0;
+    r |= get_param_value(buf, sizeof(buf), "data", t) ? 1 : 0;
+    r |= get_param_value(buf, sizeof(buf), "file", t) ? 2 : 0;
+    switch (r) {
+    case 0:
+        buf[0] = '\0';
+    case 1:
+        has_header = false;
+        break;
+    case 2:
+        has_header = true;
+        break;
+    default:
+        fprintf(stderr, "acpitable: both data and file are specified\n");
+        return -1;
+    }
+
+    if (!acpi_tables) {
+        allen = sizeof(uint16_t);
+        acpi_tables = qemu_mallocz(allen);
+    } else {
+        allen = acpi_tables_len;
+    }
+
+    start = allen;
+    acpi_tables = qemu_realloc(acpi_tables, start + ACPI_TABLE_HDR_SIZE);
+    allen += has_header ? ACPI_TABLE_PFX_SIZE : ACPI_TABLE_HDR_SIZE;
+
+    /* now read in the data files, reallocating buffer as needed */
+
+    for (f = strtok(buf, ":"); f; f = strtok(NULL, ":")) {
+        int fd = open(f, O_RDONLY);
+
+        if (fd < 0) {
+            fprintf(stderr, "can't open file %s: %s\n", f, strerror(errno));
+            return -1;
+        }
+
+        for (;;) {
+            char data[8192];
+            r = read(fd, data, sizeof(data));
+            if (r == 0) {
+                break;
+            } else if (r > 0) {
+                acpi_tables = qemu_realloc(acpi_tables, allen + r);
+                memcpy(acpi_tables + allen, data, r);
+                allen += r;
+            } else if (errno != EINTR) {
+                fprintf(stderr, "can't read file %s: %s\n",
                        f,
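The patch's strzcpy() helper is what lets it fill fixed-width ACPI header fields (which are not NUL-terminated) from user-supplied strings without leaving stale bytes behind. Here is the helper verbatim, plus a small check routine of our own showing the zero-fill and truncation behavior:

```c
#include <string.h>

/* like strncpy() but zero-fills the tail of destination */
static void strzcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);
    if (len >= size) {
        len = size;             /* truncate, no NUL terminator written */
    } else {
        memset(dst + len, 0, size - len);
    }
    memcpy(dst, src, len);
}

/* Fill a 6-byte OEM-id-style field twice and verify both the padded
 * short case and the truncated long case. Returns 1 on success. */
static int check_strzcpy(void)
{
    char field[6];

    strzcpy(field, "QEMU", sizeof(field));
    if (memcmp(field, "QEMU\0\0", 6) != 0) {
        return 0;
    }
    strzcpy(field, "TOOLONGNAME", sizeof(field));
    return memcmp(field, "TOOLON", 6) == 0;
}
```

Like strncpy(), the result is deliberately not NUL-terminated when the source fills the field, which is exactly right for ACPI's fixed-width character fields and would be a bug for ordinary C strings.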
Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On 05/20/2011 09:06 AM, Richard Henderson wrote:
> On 05/20/2011 02:23 AM, Avi Kivity wrote:
>> [...]
>> This was requested by Richard, so I'll let him comment.
>
> Several alpha system chips MCE when accessed with incorrect sizes,
> e.g. only 64-bit accesses are allowed.

But is this a characteristic of devices or is this a characteristic of the chipset/CPU?

At any rate, I'm fairly sure it doesn't belong in the MemoryRegion structure.

Regards,

Anthony Liguori
Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On 05/20/2011 07:31 AM, Anthony Liguori wrote:
> But is this a characteristic of devices or is this a characteristic of
> the chipset/CPU?

Chipset.

r~
Re: [Qemu-devel] [PATCH] hw/realview.c: Remove duplicate #include line
On Thu, May 19, 2011 at 4:21 PM, Peter Maydell peter.mayd...@linaro.org wrote: Remove a duplicate #include of sysbus.h. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- hw/realview.c | 1 - 1 files changed, 0 insertions(+), 1 deletions(-) Thanks, added to the trivial-patches tree: http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/trivial-patches Stefan
Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API
On 05/20/2011 09:40 AM, Richard Henderson wrote:
> On 05/20/2011 07:31 AM, Anthony Liguori wrote:
>> But is this a characteristic of devices or is this a characteristic of
>> the chipset/CPU?
>
> Chipset.

So if the chipset only allows accesses that are 64-bit, then you'll want to have the hierarchical dispatch filter non-64-bit accesses and raise an MCE appropriately. So you don't need anything in MemoryRegion, you need code in the dispatch path.

Regards,

Anthony Liguori
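The suggestion in this thread, validating access size and alignment in the bus dispatch path rather than as per-region flags, can be sketched in a few lines. All names here are illustrative, not the eventual QEMU memory API; the rule values model an Alpha-style chipset that only accepts aligned 64-bit accesses.

```c
#include <stdbool.h>

/* Per-bus (not per-region) constraints checked before dispatch. */
struct access_rule {
    unsigned min_size, max_size;    /* acceptable access widths in bytes */
    bool allow_unaligned;
};

/* Return false when the chipset would raise a machine check instead of
 * forwarding the access to the device. */
static bool access_ok(const struct access_rule *r,
                      unsigned long addr, unsigned size)
{
    if (size < r->min_size || size > r->max_size) {
        return false;
    }
    if (!r->allow_unaligned && (addr & (size - 1))) {
        return false;
    }
    return true;
}

/* Hypothetical Alpha-like policy: 64-bit aligned accesses only. */
static bool alpha_ok(unsigned long addr, unsigned size)
{
    static const struct access_rule alpha64 = { 8, 8, false };
    return access_ok(&alpha64, addr, size);
}
```

Keeping the check in the dispatch layer means PCI devices, which simply return ~0 for odd accesses, never see these constraints at all, which is the separation Anthony argues for.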
Re: [Qemu-devel] [Qemu-trivial] [PATCH] hw/sd.c: Don't complain about SDIO commands CMD52/CMD53
On Fri, May 20, 2011 at 10:11 AM, Peter Maydell peter.mayd...@linaro.org wrote: The SDIO specification introduces new commands 52 and 53. Handle as illegal command but do not complain on stderr, as SDIO-aware OSes (including Linux) may legitimately use these in their probing for presence of an SDIO card. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- hw/sd.c | 11 +++ 1 files changed, 11 insertions(+), 0 deletions(-) Thanks, added to the trivial patches tree: http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/trivial-patches Stefan
Re: [Qemu-devel] [RFC] Memory API
On 05/20/2011 03:56 AM, Avi Kivity wrote:
On 05/19/2011 07:36 PM, Anthony Liguori wrote:

There are no global priorities. Priorities are only used inside each level of the memory region hierarchy to generate a resulting, flattened view for the next higher level. At that level, everything imported from below has the default prio again, ie. the lowest one.

Then SMM is impossible.

It doesn't follow.

Why do we need priorities at all? There should be no overlap at each level in the hierarchy.

Of course there is overlap. PCI BARs overlap each other, the VGA windows and ROM overlap RAM.

Here's what I'm still struggling with: if children normally overlap their parents, but child priorities are always less than their parents', then what's the benefit of having anything more than two priority settings? As far as I can understand it, a priority of 0 means let child windows overlap, whereas a priority of 1 means don't let child windows overlap. Is there a use case for a priority above 1, and if so, what does it mean?

If you have overlapping BARs, the PCI bus will always send the request to a single device based on something that's implementation specific. This works because each PCI device advertises the BAR locations and sizes in its config space.

BARs in general don't need priority, except we need to decide if BARs overlap RAM or vice versa.

To dispatch a request, the PCI bus will walk the config space to find a match. If you remove something that was previously causing an overlap, the other device will now get the I/O requests.

That's *exactly* what priority means. Which device is in front, and which is in the back.

Why not use registration order to resolve this type of conflict? What are the use cases where priorities would work but registration order wouldn't be adequate? There is no need to have centralized logic to decide this.

I think you're completely missing the point of my proposal.

I'm struggling to find the mental model for priorities. I may just be dense here, but the analogy of transparent window ordering isn't helping me.

Regards,

Anthony Liguori
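For what it's worth, the flattening rule being debated can be modeled in a few lines. This is a toy, not the proposed API: within one container, the covering subregion with the highest priority wins, exactly like window stacking order:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model of resolving overlapping subregions within one container:
 * for a given address, the covering subregion with the highest priority
 * wins.  Illustrative names only. */
struct subregion {
    uint64_t base, size;
    int priority;           /* higher is "in front" */
    const char *name;
};

static const char *resolve(const struct subregion *rs, size_t n, uint64_t addr)
{
    const struct subregion *best = NULL;
    for (size_t i = 0; i < n; i++) {
        /* does this subregion cover the address? */
        if (addr >= rs[i].base && addr - rs[i].base < rs[i].size) {
            if (best == NULL || rs[i].priority > best->priority) {
                best = &rs[i];
            }
        }
    }
    return best ? best->name : "unassigned";
}
```

With RAM at priority 0 and a VGA window at priority 1, addresses in 0xa0000-0xbffff resolve to VGA and everything else to RAM, which matches the "VGA will override RAM" example upthread.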
[Qemu-devel] [RFC PATCH 0/6] SCSI series part 2, rewrite LUN parsing
This is the second part of my SCSI work. The first is still pending and this one is incomplete, but I would still like to get opinions early enough, because this design directly affects the UI.

This series is half of the work that is necessary to support multiple LUNs behind a target. The idea is to have two devices, scsi-path and scsi-target, each of which provides both a SCSIDevice and a SCSIBus. I plan to do this work using VSCSI and then cut-an^Wapply it later to virtio-scsi. This way we can be reasonably sure that the approach will be usable in the Linux virtio-scsi drivers too.

For an HBA like VSCSI or the upcoming virtio-scsi, which supports multiple paths, you can add to your HBA:

- a scsi-path (id=1) which has two scsi-disks. Then the disks will be at path 1, target 0, LUN 0/1
- a scsi-path (id=1) which has two scsi-targets, each with a scsi-disk. Then the disks will be at path 1, target 0/1, LUN 0
- a scsi-path (id=1) which has two scsi-targets, each with two scsi-disks. Then the four disks will be at path 1, target 0/1, LUN 0/1
- two scsi-path (id=1) each with two scsi-targets, each with two scsi-disks. Then the eight disks will be at path 1, target 0/1, LUN 0/1
- a scsi-target (id=0) which has two scsi-disks. Then the disks will be at path 0, target 0, LUN 0/1
- a scsi-target (id=0) with two scsi-disks, and a scsi-path (id=1) with two scsi-targets, each with two scsi-disks. Then two disks will be at path 0, target 0, LUN 0/1; four more will be at path 1, target 0/1, LUN 0/1.

For an HBA like lsi, which only supports one level, you can add to your HBA:

- a scsi-target (id=0) which has two scsi-disks. Then the disks will be at path 0, target 0, LUN 0/1
- two scsi-targets (id=0/1), each of which has two scsi-disks. Then the disks will be at path 0, targets 0/1, LUN 0/1
- one scsi-target (id=0) which has two scsi-disks, and one scsi-disk (id=1). Then two disks will be at path 0, target 0, LUN 0/1; the third will be at path 0, target 1, LUN 0.

and so on.

The patches do not provide the devices and relaying mechanism, but add plumbing for parsing complex LUNs such as those used by VSCSI. Patch 2 is useful on its own, because it fixes a mismatch between VSCSI's handling of OpenFirmware and Linux LUNs. It adds the main parsing code, and I'll probably resubmit it soon. Patch 5 adds the infrastructure that will be used by the simple LSI case. Patch 6 adds the infrastructure that will be used in the full case, and already kind-of attaches VSCSI to it. The other 3 are just complementary.

Ideas? Does the interface seem applicable to libvirt?

Paolo Bonzini (6):
  scsi: ignore LUN field in the CDB
  scsi: support parsing of SAM logical unit numbers
  scsi-generic: allow customization of the lun
  scsi-disk: allow customization of the lun
  scsi: let a SCSIDevice have children devices
  scsi: add walking of hierarchical LUNs

 hw/esp.c          |    4 +-
 hw/lsi53c895a.c   |    2 +-
 hw/scsi-bus.c     |  170 +
 hw/scsi-defs.h    |   22 ++
 hw/scsi-disk.c    |   19 +++---
 hw/scsi-generic.c |   41 ++-
 hw/scsi.h         |   17 +
 hw/spapr_vscsi.c  |   22 ++-
 8 files changed, 264 insertions(+), 33 deletions(-)

-- 
1.7.4.4
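As background for reviewers, the flat-space addressing method that patch 2 parses can be exercised in isolation. The helpers below are invented for illustration (they are not part of the series), but they follow the same bit layout patch 2 works with: the SAM addressing method in bits 62-63 of the 64-bit LUN, and the 14-bit flat LUN value in bits 48-61:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helpers (not part of the series) for the SAM flat-space
 * addressing method: method in bits 62-63, 14-bit LUN value in bits
 * 48-61 of the 64-bit LUN. */
#define SAM_ADDR_FLAT_SPACE 1ULL

static uint64_t sam_encode_flat_lun(unsigned lun)
{
    return (SAM_ADDR_FLAT_SPACE << 62) | ((uint64_t)(lun & 0x3fff) << 48);
}

/* Returns the flat LUN value, or -1 if another addressing method is used. */
static int sam_decode_flat_lun(uint64_t sam_lun)
{
    if ((sam_lun >> 62) != SAM_ADDR_FLAT_SPACE) {
        return -1;
    }
    return (sam_lun >> 48) & 0x3fff;
}
```

The round trip makes the layout concrete: a flat LUN of 5 encodes to a 64-bit value whose top two bits select flat-space addressing, and decoding recovers 5 again.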
[Qemu-devel] [RFC PATCH 3/6] scsi-generic: allow customization of the lun
This allows passthrough of devices with LUN != 0, by redirecting them to LUN 0 in the emulated target.

Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 hw/scsi-generic.c | 38 +++++++++++++++++++++++++++++++++-----
 1 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/hw/scsi-generic.c b/hw/scsi-generic.c
index e6f0efd..fb38934 100644
--- a/hw/scsi-generic.c
+++ b/hw/scsi-generic.c
@@ -230,8 +230,11 @@ static void scsi_read_data(SCSIRequest *req)
         return;
     }
 
-    if (r->req.cmd.buf[0] == REQUEST_SENSE && s->driver_status & SG_ERR_DRIVER_SENSE)
-    {
+    switch (r->req.cmd.buf[0]) {
+    case REQUEST_SENSE:
+        if (!(s->driver_status & SG_ERR_DRIVER_SENSE)) {
+            break;
+        }
         s->senselen = MIN(r->len, s->senselen);
         memcpy(r->buf, s->sensebuf, s->senselen);
         r->io_header.driver_status = 0;
@@ -246,6 +249,32 @@
         /* Clear sensebuf after REQUEST_SENSE */
         scsi_clear_sense(s);
         return;
+
+    case REPORT_LUNS:
+        assert(!s->lun);
+        if (r->req.cmd.xfer < 16) {
+            scsi_command_complete(r, -EINVAL);
+            return;
+        }
+        r->io_header.driver_status = 0;
+        r->io_header.status = 0;
+        r->io_header.dxfer_len = 16;
+        r->len = -1;
+        r->buf[3] = 8;
+        scsi_req_data(&r->req, 16);
+        scsi_command_complete(r, 0);
+        return;
+
+    case INQUIRY:
+        if (req->lun != s->lun) {
+            if (r->req.cmd.xfer < 1) {
+                scsi_command_complete(r, -EINVAL);
+                return;
+            }
+            outbuf[0] = 0x7f;
+            return MIN(req->cmd.xfer, SCSI_MAX_INQUIRY_LEN);
+        }
+        break;
+    }
 
     ret = execute_command(s->bs, r, SG_DXFER_FROM_DEV, scsi_read_complete);
@@ -335,7 +364,7 @@ static int32_t scsi_send_command(SCSIRequest *req, uint8_t *cmd)
     SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
     int ret;
 
-    if (cmd[0] != REQUEST_SENSE && req->lun != s->lun) {
+    if (cmd[0] != REQUEST_SENSE && cmd[0] != INQUIRY && req->lun != s->lun) {
         DPRINTF("Unimplemented LUN %d\n", req->lun);
         scsi_set_sense(s, SENSE_CODE(LUN_NOT_SUPPORTED));
         r->req.status = CHECK_CONDITION;
@@ -510,8 +539,6 @@ static int scsi_generic_initfn(SCSIDevice *dev)
     }
 
     /* define device state */
-    s->lun = scsiid.lun;
-    DPRINTF("LUN %d\n", s->lun);
     s->qdev.type = scsiid.scsi_type;
     DPRINTF("device type %d\n", s->qdev.type);
     if (s->qdev.type == TYPE_TAPE) {
@@ -552,6 +579,7 @@ static SCSIDeviceInfo scsi_generic_info = {
     .get_sense    = scsi_get_sense,
     .qdev.props   = (Property[]) {
         DEFINE_BLOCK_PROPERTIES(SCSIGenericState, qdev.conf),
+        DEFINE_PROP_UINT32("lun", SCSIDiskState, lun, 0),
         DEFINE_PROP_END_OF_LIST(),
     },
 };
-- 
1.7.4.4
[Qemu-devel] [RFC PATCH 2/6] scsi: support parsing of SAM logical unit numbers
SAM logical unit numbers are complicated beasts that can address multiple levels of buses and targets before finally reaching logical units. Begin supporting them by correctly parsing vSCSI LUNs.

Note that with the current (admittedly incorrect) code OpenFirmware thought the devices were at bus X, target 0, lun 0 (because OF prefers access mode 0, which places bus numbers in the top byte), while Linux thought it was bus 0, target Y, lun 0 (because Linux uses access mode 2, which places target numbers in the top byte). With this patch, everything consistently uses the former notation.

Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 hw/scsi-bus.c    | 109 +++++++++++++++++++++++++++++++++++++++++++++++++
 hw/scsi-defs.h   |  22 +++
 hw/scsi.h        |   7 +++
 hw/spapr_vscsi.c |  18 ++---
 4 files changed, 142 insertions(+), 14 deletions(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 2f0ffda..70b1092 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -718,3 +718,112 @@ static char *scsibus_get_fw_dev_path(DeviceState *dev)
     return strdup(path);
 }
+
+/* Decode the bus and level parts of a LUN, as defined in the SCSI architecture
+   model.  If false is returned, the LUN could not be parsed.  If true
+   is returned, *bus and *target identify the next two steps in the
+   hierarchical LUN.
+
+   *lun can be used with scsi_get_lun to continue the parsing.  */
+static bool scsi_decode_level(uint64_t sam_lun, int *bus, int *target,
+                              uint64_t *lun)
+{
+    switch (sam_lun >> 62) {
+    case ADDR_PERIPHERAL_DEVICE:
+        *bus = (sam_lun >> 56) & 0x3f;
+        if (*bus) {
+            /* The TARGET OR LUN field selects a target; walk the next
+               16-bits to find the LUN.  */
+            *target = (sam_lun >> 48) & 0xff;
+            *lun = sam_lun << 16;
+        } else {
+            /* The TARGET OR LUN field selects a LUN on the current
+               node, identified by bus 0.  */
+            *target = 0;
+            *lun = (sam_lun & 0xffLL) | (1LL << 62);
+        }
+        return true;
+    case ADDR_LOGICAL_UNIT:
+        *bus = (sam_lun >> 53) & 7;
+        *target = (sam_lun >> 56) & 0x3f;
+        *lun = (sam_lun & 0x1fLL) | (1LL << 62);
+        return true;
+    case ADDR_FLAT_SPACE:
+        *bus = 0;
+        *target = 0;
+        *lun = sam_lun;
+        return true;
+    case ADDR_LOGICAL_UNIT_EXT:
+        if ((sam_lun >> 56) == ADDR_WELL_KNOWN_LUN ||
+            (sam_lun >> 56) == ADDR_FLAT_SPACE_EXT) {
+            *bus = 0;
+            *target = 0;
+            *lun = sam_lun;
+            return true;
+        }
+        return false;
+    }
+    abort();
+}
+
+/* Extract a single-level LUN number from a LUN, as specified in the
+   SCSI architecture model.  Return -1 if this is not possible because
+   the LUN includes a bus or target component.  */
+static int scsi_get_lun(uint64_t sam_lun)
+{
+    int bus, target;
+
+retry:
+    switch (sam_lun >> 62) {
+    case ADDR_PERIPHERAL_DEVICE:
+    case ADDR_LOGICAL_UNIT:
+        scsi_decode_level(sam_lun, &bus, &target, &sam_lun);
+        if (bus || target) {
+            return LUN_INVALID;
+        }
+        goto retry;
+
+    case ADDR_FLAT_SPACE:
+        return (sam_lun >> 48) & 0x3fff;
+    case ADDR_LOGICAL_UNIT_EXT:
+        if ((sam_lun >> 56) == ADDR_WELL_KNOWN_LUN) {
+            return LUN_WLUN_BASE | ((sam_lun >> 48) & 0xff);
+        }
+        if ((sam_lun >> 56) == ADDR_FLAT_SPACE_EXT) {
+            return (sam_lun >> 32) & 0xff;
+        }
+        return LUN_INVALID;
+    }
+    abort();
+}
+
+/* Extract bus and target from the given LUN and use it to identify a
+   SCSIDevice from a SCSIBus.  Right now, only 1 target per bus is
+   supported.  In the future a SCSIDevice could host its own SCSIBus,
+   in an alternation of devices that select a bus (target ports) and
+   devices that select a target (initiator ports).  */
+SCSIDevice *scsi_decode_lun(SCSIBus *sbus, uint64_t sam_lun, int *lun)
+{
+    int bus, target, decoded_lun;
+    uint64_t next_lun;
+
+    if (!scsi_decode_level(sam_lun, &bus, &target, &next_lun)) {
+        /* Unsupported LUN format.  */
+        return NULL;
+    }
+    if (bus >= sbus->ndev || (bus == 0 && target > 0)) {
+        /* Out of range.  */
+        return NULL;
+    }
+    if (target != 0) {
+        /* Only one target for now.  */
+        return NULL;
+    }
+
+    decoded_lun = scsi_get_lun(next_lun);
+    if (decoded_lun != LUN_INVALID) {
+        *lun = decoded_lun;
+        return sbus->devs[bus];
+    }
+    return NULL;
+}
diff --git a/hw/scsi-defs.h b/hw/scsi-defs.h
index 413cce0..66dfd4a 100644
--- a/hw/scsi-defs.h
+++ b/hw/scsi-defs.h
@@ -164,3 +164,25 @@
 #define TYPE_ENCLOSURE      0x0d    /* Enclosure Services Device */
 #define TYPE_NO_LUN         0x7f
+
+/*
+ * SCSI addressing methods (bits 62-63 of the LUN).
+ */
+#define ADDR_PERIPHERAL_DEVICE 0
+#define ADDR_FLAT_SPACE        1
+#define
[Qemu-devel] [RFC PATCH 5/6] scsi: let a SCSIDevice have children devices
This provides the infrastructure for simple devices to pick LUNs. Of course, this will not do anything until there is a device that can report the existence of those LUNs.

Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 hw/esp.c        |  4 +++-
 hw/lsi53c895a.c |  2 +-
 hw/scsi-bus.c   | 14 ++++++++++++++
 hw/scsi.h       |  3 +++
 4 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/hw/esp.c b/hw/esp.c
index 5a33c67..e5bab76 100644
--- a/hw/esp.c
+++ b/hw/esp.c
@@ -239,12 +239,14 @@ static uint32_t get_cmd(ESPState *s, uint8_t *buf)
 
 static void do_busid_cmd(ESPState *s, uint8_t *buf, uint8_t busid)
 {
+    SCSIDevice *dev;
     int32_t datalen;
     int lun;
 
     DPRINTF("do_busid_cmd: busid 0x%x\n", busid);
     lun = busid & 7;
-    s->current_req = scsi_req_new(s->current_dev, 0, lun);
+    dev = scsi_find_lun(s->current_dev, lun, buf);
+    s->current_req = scsi_req_new(dev, 0, lun);
     datalen = scsi_req_enqueue(s->current_req, buf);
     s->ti_size = datalen;
     if (datalen != 0) {
diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index f291283..c549955 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -780,7 +780,7 @@ static void lsi_do_command(LSIState *s)
     s->command_complete = 0;
 
     id = (s->select_tag >> 8) & 0xf;
-    dev = s->bus.devs[id];
+    dev = scsi_find_lun(s->bus.devs[id], s->current_lun, buf);
     if (!dev) {
         lsi_bad_selection(s, id);
         return;
diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 70b1092..4d46831 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -719,6 +719,20 @@ static char *scsibus_get_fw_dev_path(DeviceState *dev)
     return strdup(path);
 }
 
+/* Simplified walk of the SCSI bus hierarchy, for devices that only support
+   one bus and only flat-space LUNs (typically 3-bit ones!).  */
+SCSIDevice *scsi_find_lun(SCSIDevice *sdev, int lun, uint8_t *cdb)
+{
+    SCSIBus *sbus = sdev->children;
+    if (!sbus ||
+        (lun == 0 && cdb[1] == REPORT_LUNS) ||
+        lun >= sbus->ndev || sbus->devs[lun] == NULL) {
+        return sdev;
+    } else {
+        return sbus->devs[lun];
+    }
+}
+
 /* Decode the bus and level parts of a LUN, as defined in the SCSI architecture
    model.  If false is returned, the LUN could not be parsed.  If true
    is returned, *bus and *target identify the next two steps in the
diff --git a/hw/scsi.h b/hw/scsi.h
index aa75b82..438dd89 100644
--- a/hw/scsi.h
+++ b/hw/scsi.h
@@ -58,6 +58,7 @@ struct SCSIDevice
     uint32_t id;
     BlockConf conf;
     SCSIDeviceInfo *info;
+    SCSIBus *children;
     QTAILQ_HEAD(, SCSIRequest) requests;
     int blocksize;
     int type;
@@ -143,7 +144,9 @@ extern const struct SCSISense sense_code_LUN_FAILURE;
 int scsi_build_sense(SCSISense sense, uint8_t *buf, int len, int fixed);
 int scsi_sense_valid(SCSISense sense);
 
 SCSIDevice *scsi_decode_lun(SCSIBus *sbus, uint64_t sam_lun, int *lun);
+SCSIDevice *scsi_find_lun(SCSIDevice *sdev, int lun, uint8_t *cdb);
 
 SCSIRequest *scsi_req_alloc(size_t size, SCSIDevice *d, uint32_t tag, uint32_t lun);
 SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun);
-- 
1.7.4.4
[Qemu-devel] [RFC PATCH 1/6] scsi: ignore LUN field in the CDB
The LUN field in the CDB is a historical relic. Ignore it as reserved, which is what modern SCSI specifications actually say.

Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 hw/scsi-disk.c    | 6 +++---
 hw/scsi-generic.c | 5 ++---
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 4c7a53e..b14c32f 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -516,7 +516,7 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf)
 
     memset(outbuf, 0, buflen);
 
-    if (req->lun || req->cmd.buf[1] >> 5) {
+    if (req->lun) {
         outbuf[0] = 0x7f;       /* LUN not supported */
         return buflen;
     }
@@ -1022,9 +1022,9 @@ static int32_t scsi_send_command(SCSIRequest *req, uint8_t *buf)
     }
 #endif
 
-    if (req->lun || buf[1] >> 5) {
+    if (req->lun) {
         /* Only LUN 0 supported.  */
-        DPRINTF("Unimplemented LUN %d\n", req->lun ? req->lun : buf[1] >> 5);
+        DPRINTF("Unimplemented LUN %d\n", req->lun);
         if (command != REQUEST_SENSE && command != INQUIRY) {
             scsi_command_complete(r, CHECK_CONDITION,
                                   SENSE_CODE(LUN_NOT_SUPPORTED));
diff --git a/hw/scsi-generic.c b/hw/scsi-generic.c
index 0c04606..e6f0efd 100644
--- a/hw/scsi-generic.c
+++ b/hw/scsi-generic.c
@@ -335,9 +335,8 @@ static int32_t scsi_send_command(SCSIRequest *req, uint8_t *cmd)
     SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
     int ret;
 
-    if (cmd[0] != REQUEST_SENSE &&
-        (req->lun != s->lun || (cmd[1] >> 5) != s->lun)) {
-        DPRINTF("Unimplemented LUN %d\n", req->lun ? req->lun : cmd[1] >> 5);
+    if (cmd[0] != REQUEST_SENSE && req->lun != s->lun) {
+        DPRINTF("Unimplemented LUN %d\n", req->lun);
         scsi_set_sense(s, SENSE_CODE(LUN_NOT_SUPPORTED));
         r->req.status = CHECK_CONDITION;
         scsi_req_complete(&r->req);
-- 
1.7.4.4
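For reviewers unfamiliar with the relic: in SCSI-2 and earlier, a 3-bit LUN occupied the top bits of CDB byte 1, which is exactly the `buf[1] >> 5` expression this patch removes. A sketch (the helper name is invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* The pre-SCSI-3 CDB carried a 3-bit LUN in the top three bits of
 * byte 1; modern specifications mark those bits reserved, which is why
 * the patch stops honouring them.  Helper name is illustrative. */
static unsigned cdb_legacy_lun(const uint8_t *cdb)
{
    return cdb[1] >> 5;
}
```

A CDB whose byte 1 is 0x40 would have been interpreted as addressing legacy LUN 2; after this patch, only the LUN selected by the transport is honoured.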
[Qemu-devel] [RFC PATCH 4/6] scsi-disk: allow customization of the lun
This will not work until there is a device that can answer REPORT LUNS for disks with LUN != 0. However, it provides the infrastructure.

Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 hw/scsi-disk.c | 17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index b14c32f..f41550a 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -66,6 +66,7 @@ struct SCSIDiskState
     /* The qemu block layer uses a fixed 512 byte sector size.  This is the
        number of 512 byte blocks in a single scsi sector.  */
     int cluster_size;
+    uint32_t lun;
     uint32_t removable;
     uint64_t max_lba;
     QEMUBH *bh;
@@ -516,7 +517,7 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf)
 
     memset(outbuf, 0, buflen);
 
-    if (req->lun) {
+    if (req->lun != s->lun) {
         outbuf[0] = 0x7f;       /* LUN not supported */
         return buflen;
     }
@@ -955,6 +956,7 @@ static int scsi_disk_emulate_command(SCSIDiskReq *r, uint8_t *outbuf)
         DPRINTF("Unsupported Service Action In\n");
         goto illegal_request;
     case REPORT_LUNS:
+        assert(!s->lun);
         if (req->cmd.xfer < 16)
             goto illegal_request;
         memset(outbuf, 0, 16);
@@ -1022,14 +1024,12 @@ static int32_t scsi_send_command(SCSIRequest *req, uint8_t *buf)
     }
 #endif
 
-    if (req->lun) {
-        /* Only LUN 0 supported.  */
+    if (command != REQUEST_SENSE && command != INQUIRY && req->lun != s->lun) {
+        /* Only one LUN supported.  */
         DPRINTF("Unimplemented LUN %d\n", req->lun);
-        if (command != REQUEST_SENSE && command != INQUIRY) {
-            scsi_command_complete(r, CHECK_CONDITION,
-                                  SENSE_CODE(LUN_NOT_SUPPORTED));
-            return 0;
-        }
+        scsi_command_complete(r, CHECK_CONDITION,
+                              SENSE_CODE(LUN_NOT_SUPPORTED));
+        return 0;
     }
     switch (command) {
     case TEST_UNIT_READY:
@@ -1247,6 +1247,7 @@ static SCSIDeviceInfo scsi_disk_info = {
     .get_sense    = scsi_get_sense,
     .qdev.props   = (Property[]) {
         DEFINE_BLOCK_PROPERTIES(SCSIDiskState, qdev.conf),
+        DEFINE_PROP_UINT32("lun", SCSIDiskState, lun, 0),
         DEFINE_PROP_STRING("ver", SCSIDiskState, version),
         DEFINE_PROP_STRING("serial", SCSIDiskState, serial),
         DEFINE_PROP_BIT("removable", SCSIDiskState, removable, 0, false),
-- 
1.7.4.4
[Qemu-devel] [RFC PATCH 6/6] scsi: add walking of hierarchical LUNs
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 hw/scsi-bus.c    | 79 +++++++++++++++++++++++++++++++++++++++-----------
 hw/scsi.h        |  9 +++++-
 hw/spapr_vscsi.c |  6 +++-
 3 files changed, 75 insertions(+), 19 deletions(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 4d46831..2037da3 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -811,33 +811,80 @@ retry:
     abort();
 }
 
-/* Extract bus and target from the given LUN and use it to identify a
-   SCSIDevice from a SCSIBus.  Right now, only 1 target per bus is
-   supported.  In the future a SCSIDevice could host its own SCSIBus,
-   in an alternation of devices that select a bus (target ports) and
-   devices that select a target (initiator ports).  */
-SCSIDevice *scsi_decode_lun(SCSIBus *sbus, uint64_t sam_lun, int *lun)
+/* Reusable implementation of the decode_lun entry in SCSIBusOps.  */
+SCSIDevice *scsi_decode_bus_from_lun(SCSIBus *sbus, uint64_t sam_lun,
+                                     uint64_t *next_lun)
 {
-    int bus, target, decoded_lun;
-    uint64_t next_lun;
+    int bus, target;
+    uint64_t my_next_lun;
+    SCSIDevice *sdev;
 
-    if (!scsi_decode_level(sam_lun, &bus, &target, &next_lun)) {
+    if (!scsi_decode_level(sam_lun, &bus, &target, &my_next_lun)) {
         /* Unsupported LUN format.  */
         return NULL;
     }
-    if (bus >= sbus->ndev || (bus == 0 && target > 0)) {
+    if (bus >= sbus->ndev) {
         /* Out of range.  */
         return NULL;
     }
-    if (target != 0) {
-        /* Only one target for now.  */
-        return NULL;
+
+    sdev = sbus->devs[bus];
+    if (!sdev) {
+        return NULL;
+    } else if (bus == 0 || !sdev->children) {
+        return target ? NULL : sdev;
+    } else {
+        /* Next we'll decode the target, so pass down the same LUN we got.  */
+        return sdev->children->ops.decode_lun(sbus, sam_lun, next_lun);
+    }
+}
+
+SCSIDevice *scsi_decode_target_from_lun(SCSIBus *sbus, uint64_t sam_lun,
+                                        uint64_t *next_lun)
+{
+    int bus, target;
+    SCSIDevice *sdev;
+
+    if (!scsi_decode_level(sam_lun, &bus, &target, next_lun)) {
+        /* Unsupported LUN format.  */
+        return NULL;
+    }
+    if (target >= sbus->ndev) {
+        /* Out of range.  */
+        return NULL;
+    }
+    sdev = sbus->devs[target];
+    if (!sdev || !sdev->children || (*next_lun >> 56) == ADDR_WELL_KNOWN_LUN) {
+        return sdev;
+    } else {
+        return sdev->children->ops.decode_lun(sbus, *next_lun, next_lun);
+    }
+}
+
+/* Extract bus and target from the given LUN and use it to identify a
+   SCSIDevice from a SCSIBus.  Right now, only 1 target per bus is
+   supported.  In the future a SCSIDevice could host its own SCSIBus,
+   in an alternation of devices that select a bus (target ports) and
+   devices that select a target (initiator ports).  */
+SCSIDevice *scsi_decode_lun(SCSIBus *sbus, uint64_t sam_lun,
+                            uint8_t *cdb, int *lun)
+{
+    int decoded_lun;
+    uint64_t next_lun;
+    SCSIDevice *sdev;
+
+    sdev = sbus->ops.decode_lun(sbus, sam_lun, &next_lun);
+    if (!sdev) {
+        return NULL;
     }
     decoded_lun = scsi_get_lun(next_lun);
-    if (decoded_lun != LUN_INVALID) {
-        *lun = decoded_lun;
-        return sbus->devs[bus];
+    if (decoded_lun == LUN_INVALID) {
+        return NULL;
+    }
+    if ((decoded_lun & ~LUN_WLUN_MASK) == LUN_WLUN_BASE) {
+        return sdev;
     }
-    return NULL;
+    *lun = decoded_lun;
+    return scsi_find_lun(sdev, decoded_lun, cdb);
 }
diff --git a/hw/scsi.h b/hw/scsi.h
index 438dd89..c4cca0b 100644
--- a/hw/scsi.h
+++ b/hw/scsi.h
@@ -88,6 +88,8 @@ struct SCSIBusOps {
     void (*transfer_data)(SCSIRequest *req, uint32_t arg);
     void (*complete)(SCSIRequest *req, uint32_t arg);
     void (*cancel)(SCSIRequest *req);
+    SCSIDevice *(*decode_lun)(SCSIBus *sbus, uint64_t sam_lun,
+                              uint64_t *next_lun);
 };
 
 struct SCSIBus {
@@ -145,7 +147,12 @@ extern const struct SCSISense sense_code_LUN_FAILURE;
 int scsi_build_sense(SCSISense sense, uint8_t *buf, int len, int fixed);
 int scsi_sense_valid(SCSISense sense);
 
-SCSIDevice *scsi_decode_lun(SCSIBus *sbus, uint64_t sam_lun, int *lun);
+SCSIDevice *scsi_decode_lun(SCSIBus *sbus, uint64_t sam_lun, uint8_t *cdb,
+                            int *lun);
+SCSIDevice *scsi_decode_bus_from_lun(SCSIBus *sbus, uint64_t sam_lun,
+                                     uint64_t *next_lun);
+SCSIDevice *scsi_decode_target_from_lun(SCSIBus *sbus, uint64_t sam_lun,
+                                        uint64_t *next_lun);
 SCSIDevice *scsi_find_lun(SCSIDevice *sdev, int lun, uint8_t *cdb);
 SCSIRequest *scsi_req_alloc(size_t size, SCSIDevice *d, uint32_t tag, uint32_t lun);
diff --git a/hw/spapr_vscsi.c b/hw/spapr_vscsi.c
index ee88ff6..d46ab30 100644
--- a/hw/spapr_vscsi.c
+++ b/hw/spapr_vscsi.c
@@ -640,7 +640,8 @@ static int vscsi_queue_cmd(VSCSIState *s, vscsi_req *req)
Re: [Qemu-devel] [PATCH 20/26] target-xtensa: implement extended L32R
On 05/20/2011 12:14 AM, Max Filippov wrote: As far as I can see LITBASE usage pattern is that it is set up once in early initialization and is never changed after. That's probably true on a per-program basis. I.e. for semi-hosting or userland emulation, hard-coding litbase into the TB could make sense. However, for full system emulation, with kernel and userland et al, I would expect that litbase would tend to be set per-application. At which point it would almost certainly be more efficient to read the value at runtime. r~
Re: [Qemu-devel] [RFC] Memory API
On 05/20/2011 04:01 AM, Avi Kivity wrote:
On 05/19/2011 07:32 PM, Anthony Liguori wrote:

Think of how a window manager folds windows with priorities onto a flat framebuffer.

You do a depth-first walk of the tree. For each child list, you iterate it from the lowest to highest priority, allowing later subregions to override earlier subregions.

Okay, but this doesn't explain how you'll let RAM override the VGA mapping, since RAM is not represented in the same child list as VGA (RAM is a child of the PMC whereas VGA is a child of ISA/PCI, both of which are at least one level removed from the PMC).

VGA will override RAM.

Memory controller
 |
 +-- RAM container (prio 0)
 |
 +-- PCI container (prio 1)
       |
       +--- vga window

Unless the RAM controller increases its priority, right? That's how you would implement SMM, by doing priority++?

But if you have:

Memory controller
 |
 +-- RAM container (prio 0)
 |
 +-- PCI container (prio 1)
 |
 +-- PCI-X container (prio 2)
       |
       +--- vga window

Now you need to do priority = 3? Jan had mentioned previously about registering a new temporary window. I assume the registration always gets highest_priority++, or do you have to explicitly specify that the PCI container gets priority=1?

Regards, Anthony Liguori
[Qemu-devel] [PATCH] s390x: complain when allocating ram fails
While trying out the 64GB guest RAM patch, I hit some virtual address limitations of my host system, which resulted in mmap failing. Unfortunately, qemu didn't tell me about this failure, but just used the NULL pointer happily, resulting in either segmentation faults or other fun errors. To spare other users from tracing this down, let's print a nice message instead so the user can figure out what's wrong from there.

Signed-off-by: Alexander Graf <ag...@suse.de>
---
 exec.c | 4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/exec.c b/exec.c
index 3f96d44..a4785b2 100644
--- a/exec.c
+++ b/exec.c
@@ -2918,6 +2918,10 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
         new_block->host = mmap((void*)0x8, size,
                                PROT_EXEC|PROT_READ|PROT_WRITE,
                                MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+        if (new_block->host == MAP_FAILED) {
+            fprintf(stderr, "Allocating RAM failed\n");
+            abort();
+        }
 #else
     if (xen_mapcache_enabled()) {
         xen_ram_alloc(new_block->offset, size);
-- 
1.6.0.2
[Qemu-devel] [PATCH V5 00/12] Qemu Trusted Platform Module (TPM) integration
The following series of patches adds a TPM (Trusted Platform Module) TIS (TPM Interface Spec) interface to Qemu and with that provides means to access a backend implementing the actual TPM functionality. This frontend enables, for example, Linux's TPM TIS (tpm_tis) driver.

I am also posting a backend implementation that is based on a library (libtpms) providing TPM functionality. This library is currently undergoing further testing but is now available via Fedora Rawhide:

http://download.fedora.redhat.com/pub/fedora/linux/development/rawhide/x86_64/os/Packages/libtpms-0.5.1-5.x86_64.rpm
http://download.fedora.redhat.com/pub/fedora/linux/development/rawhide/x86_64/os/Packages/libtpms-devel-0.5.1-5.x86_64.rpm

source at

http://download.fedora.redhat.com/pub/fedora/linux/development/rawhide/source/SRPMS/libtpms-0.5.1-5.src.rpm

All testing was done with the libtpms-based backend. It provides support for VM suspend/resume, migration and snapshotting. It uses QCoW2 as the file format for storing its persistent state, which is necessary for support of snapshotting. Using Linux as the OS, along with some recently posted patches for the Linux TPM TIS driver, suspend/resume works fine (using 'virsh save/restore'), along with hibernation and OS suspend (ACPI S3).

Proper support for the TPM requires support in the BIOS, since the BIOS needs to initialize the TPM upon machine start or issue commands to the TPM when it resumes from suspend (ACPI S3). It also builds and connects the necessary ACPI tables (SSDT for the TPM device, TCPA table for logging) to the ones that are built by a BIOS. To support this I have a fairly extensive set of extensions for SeaBIOS that have already been posted to the SeaBIOS mailing list and been ACK'ed by Kevin (thank you! :-)).

v5:
 - applies to checkout of 1fddfba1
 - adding support for a split command line using the -tpmdev ... -device ... options while keeping the -tpm option
 - support for querying the device models using -tpm model=?
 - support for monitor 'info tpm'
 - adding documentation of command line options for man page and web page
 - increasing room for ACPI tables that qemu reserves to 128kb (from 64kb)
 - adding (experimental) support for block migration
 - adding (experimental) support for taking measurements when kernel, initrd and kernel command line are directly passed to Qemu

v4:
 - applies to checkout of d2d979c6
 - more coding style fixes
 - adding patch for supporting blob encryption (in addition to the existing QCoW2-level encryption); this allows for graceful termination of a migration if the target is detected to have a wrong key
 - tested with big and little endian hosts
 - main thread releases mutex while checking for work to do on behalf of backend
 - introducing file locking (fcntl) on the block layer for serializing access to shared (QCoW2) files (used during migration)

v3:
 - building a null driver at patch 5/8 that responds to all requests with an error response; subsequently this driver is transformed into the libtpms-based driver for real TPM functionality
 - reworked the threading; dropped the patch for qemu_thread_join; the main thread synchronizing with the TPM thread termination may need to write data to the block storage while waiting for the thread to terminate; did not previously show a problem but is safer
 - a lot of testing based on recent git checkout 4b4a72e5 (4/10):
   - migration of i686 VM from x86_64 host to i686 host to ppc64 host while running tests inside the VM
   - tests with S3 suspend/resume
   - tests with snapshots
   - multiple-hour tests with VM suspend/resume (using virsh save/restore) while running a TPM test suite inside the VM
   All tests passed; [not all of them were done on the ppc64 host]

v2:
 - splitting some of the patches into smaller ones for easier review
 - fixes in individual patches

Regards,
Stefan
[Qemu-devel] [PATCH V5 05/12] Add a debug register
This patch makes use of the possibility to add vendor-specific registers: it adds a debug register useful for dumping the TIS's internal state. This register is only active in a debug build (#define DEBUG_TIS).

v3:
 - all output goes to stderr

Signed-off-by: Stefan Berger <stef...@linux.vnet.ibm.com>
---
 hw/tpm_tis.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

Index: qemu-git/hw/tpm_tis.c
===================================================================
--- qemu-git.orig/hw/tpm_tis.c
+++ qemu-git/hw/tpm_tis.c
@@ -43,6 +43,8 @@
 #define TIS_REG_DID_VID             0xf00
 #define TIS_REG_RID                 0xf04
 
+/* vendor-specific registers */
+#define TIS_REG_DEBUG               0xf90
 
 #define STS_VALID                   (1 << 7)
 #define STS_COMMAND_READY           (1 << 6)
@@ -316,6 +318,66 @@ static uint32_t tis_data_read(TPMState *
 }
 
+#ifdef DEBUG_TIS
+static void tis_dump_state(void *opaque, target_phys_addr_t addr)
+{
+    static const unsigned regs[] = {
+        TIS_REG_ACCESS,
+        TIS_REG_INT_ENABLE,
+        TIS_REG_INT_VECTOR,
+        TIS_REG_INT_STATUS,
+        TIS_REG_INTF_CAPABILITY,
+        TIS_REG_STS,
+        TIS_REG_DID_VID,
+        TIS_REG_RID,
+        0xfff};
+    int idx;
+    uint8_t locty = tis_locality_from_addr(addr);
+    target_phys_addr_t base = addr & ~0xfff;
+    TPMState *s = opaque;
+
+    fprintf(stderr,
+            "tpm_tis: active locality      : %d\n"
+            "tpm_tis: state of locality %d : %d\n"
+            "tpm_tis: register dump:\n",
+            s->active_locty,
+            locty, s->loc[locty].state);
+
+    for (idx = 0; regs[idx] != 0xfff; idx++) {
+        fprintf(stderr, "tpm_tis: 0x%04x : 0x%08x\n", regs[idx],
+                tis_mem_readl(opaque, base + regs[idx]));
+    }
+
+    fprintf(stderr,
+            "tpm_tis: read offset   : %d\n"
+            "tpm_tis: result buffer : ",
+            s->loc[locty].r_offset);
+    for (idx = 0;
+         idx < tis_get_size_from_buffer(&s->loc[locty].r_buffer);
+         idx++) {
+        fprintf(stderr, "%c%02x%s",
+                s->loc[locty].r_offset == idx ? '>' : ' ',
+                s->loc[locty].r_buffer.buffer[idx],
+                ((idx & 0xf) == 0xf) ? "\ntpm_tis: " : "");
+    }
+    fprintf(stderr,
+            "\n"
+            "tpm_tis: write offset  : %d\n"
+            "tpm_tis: request buffer: ",
+            s->loc[locty].w_offset);
+    for (idx = 0;
+         idx < tis_get_size_from_buffer(&s->loc[locty].w_buffer);
+         idx++) {
+        fprintf(stderr, "%c%02x%s",
+                s->loc[locty].w_offset == idx ? '>' : ' ',
+                s->loc[locty].w_buffer.buffer[idx],
+                ((idx & 0xf) == 0xf) ? "\ntpm_tis: " : "");
+    }
+    fprintf(stderr, "\n");
+}
+#endif
+
 /*
  * Read a register of the TIS interface
  * See specs pages 33-63 for description of the registers
@@ -391,6 +453,11 @@ static uint32_t tis_mem_readl(void *opaq
     case TIS_REG_RID:
         val = TPM_RID;
         break;
+#ifdef DEBUG_TIS
+    case TIS_REG_DEBUG:
+        tis_dump_state(opaque, addr);
+        break;
+#endif
     }
 
     qemu_mutex_unlock(&s->state_lock);
[Qemu-devel] [PATCH V5 01/12] Support for TPM command line options
This patch adds support for TPM command line options. The command line options supported here (considering the libtpms-based backend) are

./qemu-... -tpm builtin,path=<path to blockstorage file>

and

./qemu-... -tpmdev builtin,path=<path to blockstorage file>,id=<id> -device tpm-tis,tpmdev=<id>

and

./qemu-... -tpmdev ?

where the latter works similarly to -soundhw ? and shows a list of available TPM backends (currently 'builtin'). To show the available TPM models do:

./qemu-... -tpm model=?

In the case of -tpm, 'type' ('builtin' above) and 'model' are interpreted in tpm.c. In the case of -tpmdev, 'type' and 'id' are interpreted in tpm.c. Using the type parameter the backend is chosen, i.e., 'builtin' for the libtpms-based builtin TPM. The interpretation of the other parameters, along with determining whether enough parameters were provided, is pushed into the backend driver, which needs to implement the interface function 'create' and return a TPMDriver structure if the VM can be started, or NULL if not enough or bad parameters were provided.

Since SeaBIOS will now use 128 KB for ACPI tables, the amount of memory reserved for ACPI tables needs to be increased -- it is raised to 128 KB.

Monitor support for 'info tpm' has been added. It prints, for example, the following:

TPM devices: builtin: model=tpm-tis,id=tpm0

v5:
- fixed typo reported by Serge Hallyn
- adapted the code to the split command line parameters, supporting -tpmdev ... -device tpm-tis,tpmdev=...
- moved code out of arch_init.c|h into tpm.c|h
- increased the memory reserved for ACPI tables to 128 KB (from 64 KB)
- the backend interface has a create() function for interpreting the command line parameters and returning a TPMDevice structure; previously this function was called handle_options()
- the backend interface has a destroy() function for cleaning up after the create() function was called
- added support for 'info tpm' in the monitor

v4:
- coding style fixes

v3:
- added hw/tpm_tis.h to this patch so Qemu compiles at this stage

Signed-off-by: Stefan Berger stef...@linux.vnet.ibm.com
---
 Makefile.target |   1
 hmp-commands.hx |   2
 hw/pc.c         |   5 -
 hw/tpm_tis.h    |  76 +++
 monitor.c       |  10 ++
 qemu-config.c   |  46 +
 qemu-options.hx |  80
 tpm.c           | 277
 tpm.h           | 104 +
 vl.c            |  14 ++
 10 files changed, 614 insertions(+), 1 deletion(-)

Index: qemu-git/qemu-options.hx
===
--- qemu-git.orig/qemu-options.hx
+++ qemu-git/qemu-options.hx
@@ -1691,6 +1691,86 @@ ETEXI

 DEFHEADING()

+DEFHEADING(TPM device options:)
+
+#ifndef _WIN32
+# ifdef CONFIG_TPM
+DEF("tpm", HAS_ARG, QEMU_OPTION_tpm, \
+    "-tpm builtin,path=<path>[,model=<model>]\n"
+    "                enable a builtin TPM with state in file in path\n"
+    "-tpm model=?    to list available TPM device models\n"
+    "-tpm ?          to list available TPM backend types\n",
+    QEMU_ARCH_I386)
+DEF("tpmdev", HAS_ARG, QEMU_OPTION_tpmdev, \
+    "-tpmdev [builtin],id=str[,option][,option][,...]\n",
+    QEMU_ARCH_I386)
+# endif
+#endif
+STEXI
+
+The general form of a TPM device option is:
+@table @option
+
+@item -tpmdev @var{backend} ,id=@var{id} [,@var{options}]
+@findex -tpmdev
+Backend type must be:
+@option{builtin}.
+
+The specific backend type will determine the applicable options.
+The @code{-tpmdev} option requires a @code{-device} option.
+
+Options to each backend are described below.
+
+Use ? to print all available TPM backend types.
+@example
+qemu -tpmdev ?
+@end example
+
+@item -tpmdev builtin ,id=@var{id}, path=@var{path}
+
+Creates an instance of the built-in TPM.
+
+@option{path} specifies the path to the QCoW2 image that will store
+the TPM's persistent data. @option{path} is required.
+
+To create a built-in TPM use the following two options:
+@example
+-tpmdev builtin,id=tpm0,path=path_to_qcow2 -device tpm-tis,tpmdev=tpm0
+@end example
+Note that the @code{-tpmdev} id is @code{tpm0} and is referenced by
+@code{tpmdev=tpm0} in the device option.
+
+@end table
+
+The short form of a TPM device option is:
+@table @option
+
+@item -tpm @var{backend-type}, path=@var{path} [,model=@var{model}]
+@findex -tpm
+
+@option{model} specifies the device model. The default device model is the
+@code{tpm-tis} device model. @code{model} is optional.
+
+Use ? to print all available TPM models.
+@example
+qemu -tpm model=?
+@end example
+
+The other options have the same meaning as explained above.
+
+To create a built-in TPM use the following option:
+@example
+-tpm builtin,path=path_to_qcow2
+@end example
+
+@end table
+
+ETEXI
+
+
+DEFHEADING()
+
 DEFHEADING(Linux/Multiboot boot specific:)
 STEXI

Index: qemu-git/vl.c
===
--- qemu-git.orig/vl.c
+++ qemu-git/vl.c
@@ -137,6 +137,7 @@ int main(int
[Qemu-devel] [PATCH V5 02/12] Add TPM (frontend) hardware interface (TPM TIS) to Qemu
This patch adds the main code of the TPM frontend driver, the TPM TIS interface, to Qemu. The code is largely based on my previous implementation for Xen but has been significantly extended to meet the standard's requirements, such as support for changing localities and all the functionality of the available flags.

Communication with the backend (i.e., for Xen or the libtpms-based one) is cleanly separated through an interface which the backend driver needs to implement. The TPM TIS driver's backend was previously chosen in the code added to arch_init. The frontend holds a pointer to the chosen backend (interface).

Communication with the backend is largely based on signals and condition variables. Whenever the frontend has collected a complete packet, it signals the backend, which then starts processing the command. Once the result has been returned, the backend invokes a callback function (tis_tpm_receive_cb()).

The one tricky part is support for VM suspend while the TPM is processing a command. In this case the frontend driver waits for the backend to return the result of the last command before shutting down; it waits on a condition variable for a signal from the backend, which is delivered in tis_tpm_receive_cb().

Testing the proper functioning of the different flags and localities cannot be done from user space when running Linux, for example, since access to the address space of the TPM TIS interface is not possible. Also, the Linux driver itself does not exercise all the functionality. So, for testing, there is a fairly extensive test suite as part of the SeaBIOS patches, since from within the BIOS one has full access to all the TPM's registers.
v5:
- added a comment to tis_data_read
- refactoring following the support for split command line options -tpmdev and -device
- code handling the configuration of the TPM device was moved to tpm.c
- removed empty line at end of file

v3:
- prefixed functions with tis_
- added a function 'early_startup_tpm' to the backend interface that allows detecting the presence of the block storage and makes Qemu fail gracefully if it is not available. This works with migration using shared storage but does not support migration with block storage migration. For encrypted QCoW2, and in the case of a snapshot resume, the late_startup_tpm interface function is called.

Signed-off-by: Stefan Berger stef...@linux.vnet.ibm.com
---
 hw/tpm_tis.c | 839 +++
 1 file changed, 839 insertions(+)

Index: qemu-git/hw/tpm_tis.c
===
--- /dev/null
+++ qemu-git/hw/tpm_tis.c
@@ -0,0 +1,839 @@
+/*
+ * tpm_tis.c - QEMU emulator for a 1.2 TPM with TIS interface
+ *
+ * Copyright (C) 2006,2010 IBM Corporation
+ *
+ * Author: Stefan Berger stef...@us.ibm.com
+ *         David Safford saff...@us.ibm.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ *
+ *
+ * Implementation of the TIS interface according to specs at
+ * https://www.trustedcomputinggroup.org/groups/pc_client/TCG_PCClientTPMSpecification_1-20_1-00_FINAL.pdf
+ *
+ */
+
+#include "tpm.h"
+#include "block.h"
+#include "hw/hw.h"
+#include "hw/pc.h"
+#include "hw/tpm_tis.h"
+
+#include <stdio.h>
+
+//#define DEBUG_TIS
+
+/* whether the STS interrupt is supported */
+//#define RAISE_STS_IRQ
+
+/* tis registers */
+#define TIS_REG_ACCESS            0x00
+#define TIS_REG_INT_ENABLE        0x08
+#define TIS_REG_INT_VECTOR        0x0c
+#define TIS_REG_INT_STATUS        0x10
+#define TIS_REG_INTF_CAPABILITY   0x14
+#define TIS_REG_STS               0x18
+#define TIS_REG_DATA_FIFO         0x24
+#define TIS_REG_DID_VID           0xf00
+#define TIS_REG_RID               0xf04
+
+
+#define STS_VALID            (1 << 7)
+#define STS_COMMAND_READY    (1 << 6)
+#define STS_TPM_GO           (1 << 5)
+#define STS_DATA_AVAILABLE   (1 << 4)
+#define STS_EXPECT           (1 << 3)
+#define STS_RESPONSE_RETRY   (1 << 1)
+
+#define ACCESS_TPM_REG_VALID_STS (1 << 7)
+#define ACCESS_ACTIVE_LOCALITY   (1 << 5)
+#define ACCESS_BEEN_SEIZED       (1 << 4)
+#define ACCESS_SEIZE             (1 << 3)
+#define ACCESS_PENDING_REQUEST   (1 << 2)
+#define ACCESS_REQUEST_USE       (1 << 1)
+#define ACCESS_TPM_ESTABLISHMENT (1 << 0)
+
+#define INT_ENABLED          (1 << 31)
+#define INT_DATA_AVAILABLE   (1 << 0)
+#define INT_STS_VALID        (1 << 1)
+#define INT_LOCALITY_CHANGED (1 << 2)
+#define INT_COMMAND_READY    (1 << 7)
+
+#ifndef RAISE_STS_IRQ
+
+# define INTERRUPTS_SUPPORTED (INT_LOCALITY_CHANGED | \
+                               INT_DATA_AVAILABLE | \
+                               INT_COMMAND_READY)
+
+#else
+
+#
[Qemu-devel] [PATCH V5 04/12] Add tpm_tis driver to build process
The TPM interface (tpm_tis) needs to be explicitly enabled via ./configure --enable-tpm. This patch also restricts the building of TPM support to i386 and x86_64 targets, since it is currently supported only there. With that I am trying to prevent ending up with support for a frontend but no available backend.

v3:
- fixed and moved hunks in Makefile.target into the right place

Signed-off-by: Stefan Berger stef...@linux.vnet.ibm.com
---
 Makefile.target |  1 +
 configure       | 20
 2 files changed, 21 insertions(+)

Index: qemu-git/Makefile.target
===
--- qemu-git.orig/Makefile.target
+++ qemu-git/Makefile.target
@@ -239,6 +239,7 @@ obj-i386-y += device-hotplug.o pci-hotpl
 obj-i386-y += debugcon.o multiboot.o
 obj-i386-y += pc_piix.o kvmclock.o
 obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
+obj-i386-$(CONFIG_TPM) += tpm_tis.o

 # shared objects
 obj-ppc-y = ppc.o

Index: qemu-git/configure
===
--- qemu-git.orig/configure
+++ qemu-git/configure
@@ -179,6 +179,7 @@ rbd=""
 smartcard=""
 smartcard_nss=""
 opengl=""
+tpm="no"

 # parse CC options first
 for opt do
@@ -714,6 +715,8 @@ for opt do
   ;;
   --kerneldir=*) kerneldir="$optarg"
   ;;
+  --enable-tpm) tpm="yes"
+  ;;
   --with-pkgversion=*) pkgversion=" ($optarg)"
   ;;
   --disable-docs) docs="no"
@@ -1014,6 +1017,7 @@
 echo "  --disable-smartcard      disable smartcard support"
 echo "  --enable-smartcard       enable smartcard support"
 echo "  --disable-smartcard-nss  disable smartcard nss support"
 echo "  --enable-smartcard-nss   enable smartcard nss support"
+echo "  --enable-tpm             enables an emulated TPM"
 echo
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -2706,6 +2710,7 @@
 echo "rbd support       $rbd"
 echo "xfsctl support    $xfs"
 echo "nss used          $smartcard_nss"
 echo "OpenGL support    $opengl"
+echo "TPM support       $tpm"

 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -3524,6 +3529,21 @@ if test "$gprof" = "yes" ; then
   fi
 fi

+if test "$tpm" = "yes"; then
+  has_tpm=0
+  if test "$target_softmmu" =
 "yes" ; then
+    case "$TARGET_BASE_ARCH" in
+    i386)
+      has_tpm=1
+    ;;
+    esac
+  fi
+
+  if test "$has_tpm" = "1"; then
+    echo "CONFIG_TPM=y" >> $config_host_mak
+  fi
+fi
+
 linker_script="-Wl,-T../config-host.ld -Wl,-T,\$(SRC_PATH)/\$(ARCH).ld"
 if test "$target_linux_user" = "yes" -o "$target_bsd_user" = "yes" ; then
   case "$ARCH" in
[Qemu-devel] [PATCH V5 03/12] Add persistent state handling to TPM TIS frontend driver
This patch adds support for handling of persistent state to the TPM TIS frontend. The currently used buffer is determined (it can only be in the currently active locality, and be either a read or a write buffer) and only that buffer's contents are stored. The reverse is done when the state is restored from disk, where the buffer's contents are copied into the currently used buffer.

To keep compatibility with existing Xen, the VMStateDescription was adapted to be compatible with existing state. For that I am adding Andreas Niederl as an author to the file.

v5:
- removed qdev.no_user=1

v4:
- the main thread releases the 'state' lock while periodically calling the backend's function that may request it to write data into block storage

v3:
- all functions prefixed with tis_
- while the main thread is waiting for an outstanding TPM command to finish, it periodically does some work (writes data to the block storage)

Signed-off-by: Stefan Berger stef...@linux.vnet.ibm.com
---
 hw/tpm_tis.c | 166 +++
 1 file changed, 166 insertions(+)

Index: qemu-git/hw/tpm_tis.c
===
--- qemu-git.orig/hw/tpm_tis.c
+++ qemu-git/hw/tpm_tis.c
@@ -6,6 +6,8 @@
  * Author: Stefan Berger stef...@us.ibm.com
  *         David Safford saff...@us.ibm.com
  *
+ * Xen 4 support: Andreas Niederl andreas.nied...@iaik.tugraz.at
+ *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License as
  * published by the Free Software Foundation, version 2 of the
@@ -837,3 +839,167 @@ static int tis_init(ISADevice *dev)
 err_exit:
     return -1;
 }
+
+/* persistent state handling */
+
+static void tis_pre_save(void *opaque)
+{
+    TPMState *s = opaque;
+    uint8_t locty = s->active_locty;
+
+    qemu_mutex_lock(&s->state_lock);
+
+    /* wait for outstanding requests to complete */
+    if (IS_VALID_LOCTY(locty) && s->loc[locty].state == STATE_EXECUTION) {
+        if (!s->be_driver->ops->job_for_main_thread) {
+            qemu_cond_wait(&s->from_tpm_cond, &s->state_lock);
+        } else {
+            while (s->loc[locty].state == STATE_EXECUTION) {
+                qemu_mutex_unlock(&s->state_lock);
+
+                s->be_driver->ops->job_for_main_thread(NULL);
+                usleep(1);
+
+                qemu_mutex_lock(&s->state_lock);
+            }
+        }
+    }
+
+#ifdef DEBUG_TIS_SR
+    fprintf(stderr,
+            "tpm_tis: suspend: locty 0 : r_offset = %d, w_offset = %d\n",
+            s->loc[0].r_offset,
+            s->loc[0].w_offset);
+    if (s->loc[0].r_offset) {
+        tis_dump_state(opaque, 0);
+    }
+#endif
+
+    qemu_mutex_unlock(&s->state_lock);
+
+    /* copy current active read or write buffer into the buffer
+       written to disk */
+    if (IS_VALID_LOCTY(locty)) {
+        switch (s->loc[locty].state) {
+        case STATE_RECEPTION:
+            memcpy(s->buf,
+                   s->loc[locty].w_buffer.buffer,
+                   MIN(sizeof(s->buf),
+                       s->loc[locty].w_buffer.size));
+            s->offset = s->loc[locty].w_offset;
+            break;
+        case STATE_COMPLETION:
+            memcpy(s->buf,
+                   s->loc[locty].r_buffer.buffer,
+                   MIN(sizeof(s->buf),
+                       s->loc[locty].r_buffer.size));
+            s->offset = s->loc[locty].r_offset;
+            break;
+        default:
+            /* leak nothing */
+            memset(s->buf, 0x0, sizeof(s->buf));
+            break;
+        }
+    }
+
+    s->be_driver->ops->save_volatile_data();
+}
+
+
+static int tis_post_load(void *opaque,
+                         int version_id __attribute__((unused)))
+{
+    TPMState *s = opaque;
+
+    uint8_t locty = s->active_locty;
+
+    if (IS_VALID_LOCTY(locty)) {
+        switch (s->loc[locty].state) {
+        case STATE_RECEPTION:
+            memcpy(s->loc[locty].w_buffer.buffer,
+                   s->buf,
+                   MIN(sizeof(s->buf),
+                       s->loc[locty].w_buffer.size));
+            s->loc[locty].w_offset = s->offset;
+            break;
+        case STATE_COMPLETION:
+            memcpy(s->loc[locty].r_buffer.buffer,
+                   s->buf,
+                   MIN(sizeof(s->buf),
+                       s->loc[locty].r_buffer.size));
+            s->loc[locty].r_offset = s->offset;
+            break;
+        default:
+            break;
+        }
+    }
+
+#ifdef DEBUG_TIS_SR
+    fprintf(stderr,
+            "tpm_tis: resume : locty 0 : r_offset = %d, w_offset = %d\n",
+            s->loc[0].r_offset,
+            s->loc[0].w_offset);
+#endif
+
+    return s->be_driver->ops->load_volatile_data(s);
+}
+
+
+static const VMStateDescription vmstate_locty = {
+    .name = "loc",
+    .version_id = 1,
+    .minimum_version_id = 0,
+    .minimum_version_id_old = 0,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(state, TPMLocality),
+        VMSTATE_UINT32(inte, TPMLocality),
[Qemu-devel] [PATCH V5 08/12] Introduce file lock for the block layer
This patch introduces file locking via fcntl() for the block layer so that concurrent access to files shared by two Qemu instances, for example via NFS, can be serialized. This feature is useful primarily during the initial phases of VM migration, where the target machine's TIS driver validates the block storage (and, in a later patch, checks for missing AES keys) and terminates Qemu if the storage is found to be faulty. This allows migration to be terminated gracefully while Qemu continues running on the source machine.

Support for win32 is based on the win32 API and has been lightly tested with a standalone test program locking shared storage from two different machines.

To enable locking a file multiple times, a counter is used. Actual locking happens the very first time and unlocking happens when the counter reaches zero.

Signed-off-by: Stefan Berger stef...@linux.vnet.ibm.com
---
 block.c           | 40 ++
 block.h           |  8 ++
 block/raw-posix.c | 62 ++
 block/raw-win32.c | 51
 block_int.h       |  4 +++
 5 files changed, 165 insertions(+)

Index: qemu-git/block.c
===
--- qemu-git.orig/block.c
+++ qemu-git/block.c
@@ -475,6 +475,8 @@ static int bdrv_open_common(BlockDriverS
         goto free_and_fail;
     }

+    drv->num_locks = 0;
+
     bs->keep_read_only = bs->read_only = !(open_flags & BDRV_O_RDWR);

     ret = refresh_total_sectors(bs, bs->total_sectors);
@@ -1181,6 +1183,44 @@ void bdrv_get_geometry(BlockDriverState
     *nb_sectors_ptr = length;
 }

+/* file locking */
+static int bdrv_lock_common(BlockDriverState *bs, BDRVLockType lock_type)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (!drv)
+        return -ENOMEDIUM;
+
+    if (bs->file) {
+        drv = bs->file->drv;
+        if (drv->bdrv_lock) {
+            return drv->bdrv_lock(bs->file, lock_type);
+        }
+    }
+
+    if (drv->bdrv_lock) {
+        return drv->bdrv_lock(bs, lock_type);
+    }
+
+    return -ENOTSUP;
+}
+
+
+int bdrv_lock(BlockDriverState *bs)
+{
+    if (bdrv_is_read_only(bs)) {
+        return bdrv_lock_common(bs, BDRV_F_RDLCK);
+    }
+
+    return bdrv_lock_common(bs, BDRV_F_WRLCK);
+}
+
+void bdrv_unlock(BlockDriverState *bs)
+{
+    bdrv_lock_common(bs,
 BDRV_F_UNLCK);
+}
+
+
 struct partition {
     uint8_t boot_ind;   /* 0x80 - active */
     uint8_t head;       /* starting head */

Index: qemu-git/block.h
===
--- qemu-git.orig/block.h
+++ qemu-git/block.h
@@ -42,6 +42,12 @@ typedef struct QEMUSnapshotInfo {
 #define BDRV_SECTOR_MASK ~(BDRV_SECTOR_SIZE - 1)

 typedef enum {
+    BDRV_F_UNLCK,
+    BDRV_F_RDLCK,
+    BDRV_F_WRLCK,
+} BDRVLockType;
+
+typedef enum {
     BLOCK_ERR_REPORT, BLOCK_ERR_IGNORE, BLOCK_ERR_STOP_ENOSPC,
     BLOCK_ERR_STOP_ANY
 } BlockErrorAction;
@@ -95,6 +101,8 @@ int bdrv_commit(BlockDriverState *bs);
 void bdrv_commit_all(void);
 int bdrv_change_backing_file(BlockDriverState *bs,
     const char *backing_file, const char *backing_fmt);
+int bdrv_lock(BlockDriverState *bs);
+void bdrv_unlock(BlockDriverState *bs);

 void bdrv_register(BlockDriver *bdrv);

Index: qemu-git/block/raw-posix.c
===
--- qemu-git.orig/block/raw-posix.c
+++ qemu-git/block/raw-posix.c
@@ -718,6 +718,66 @@ static int64_t raw_getlength(BlockDriver
 }
 #endif

+static int raw_lock(BlockDriverState *bs, BDRVLockType lock_type)
+{
+    BlockDriver *drv = bs->drv;
+    BDRVRawState *s = bs->opaque;
+    struct flock flock = {
+        .l_whence = SEEK_SET,
+        .l_start = 0,
+        .l_len = 0,
+    };
+    int n;
+
+    switch (lock_type) {
+    case BDRV_F_RDLCK:
+    case BDRV_F_WRLCK:
+        if (drv->num_locks) {
+            drv->num_locks++;
+            return 0;
+        }
+        flock.l_type = (lock_type == BDRV_F_RDLCK) ? F_RDLCK : F_WRLCK;
+        break;
+
+    case BDRV_F_UNLCK:
+        if (--drv->num_locks > 0) {
+            return 0;
+        }
+
+        assert(drv->num_locks == 0);
+
+        flock.l_type = F_UNLCK;
+        break;
+
+    default:
+        return -EINVAL;
+    }
+
+    while (1) {
+        n = fcntl(s->fd, F_SETLKW, &flock);
+        if (n < 0) {
+            if (errno == EINTR) {
+                continue;
+            }
+            if (errno == EAGAIN) {
+                usleep(1);
+                continue;
+            }
+        }
+        break;
+    }
+
+    if (n == 0 &&
+        ((lock_type == BDRV_F_RDLCK) || (lock_type == BDRV_F_WRLCK))) {
+        drv->num_locks = 1;
+    }
+
+    if (n)
+        return -errno;
+
+    return 0;
+}
+
 static int raw_create(const char *filename, QEMUOptionParameter *options)
 {
     int fd;
@@ -814,6 +874,8 @@ static BlockDriver bdrv_file = {
     .bdrv_truncate = raw_truncate,
[Qemu-devel] [PATCH V5 12/12] Experimental support for taking measurements when kernel etc. are passed to Qemu
This really is just for experimental purposes, since there are problems when doing something similar with a multiboot kernel.

This patch addresses the case where the user provides the kernel, initrd and kernel command line via command line parameters to Qemu. To avoid incorrect measurements by SeaBIOS, the setup part of the kernel needs to be treated separately. For SeaBIOS to be able to measure the kernel such that the measurement corresponds to 'sha1sum <kernel file>', we need to preserve the setup part of the kernel. Since Qemu modifies it, we store a copy of the original setup, later retrieve it in SeaBIOS, and concatenate the setup and the rest of the kernel to get the correct measurement. An alternative would be to measure the files in Qemu and make the measurements available to SeaBIOS; this would introduce a dependency of Qemu on a sha1 algorithm.

Signed-off-by: Stefan Berger stef...@linux.vnet.ibm.com
---
 hw/fw_cfg.h | 1 +
 hw/pc.c     | 8 +++-
 2 files changed, 8 insertions(+), 1 deletion(-)

Index: qemu-git/hw/fw_cfg.h
===
--- qemu-git.orig/hw/fw_cfg.h
+++ qemu-git/hw/fw_cfg.h
@@ -27,6 +27,7 @@
 #define FW_CFG_SETUP_SIZE 0x17
 #define FW_CFG_SETUP_DATA 0x18
 #define FW_CFG_FILE_DIR   0x19
+#define FW_CFG_SETUP_ORIG_DATA 0x1a

 #define FW_CFG_FILE_FIRST 0x20
 #define FW_CFG_FILE_SLOTS 0x10

Index: qemu-git/hw/pc.c
===
--- qemu-git.orig/hw/pc.c
+++ qemu-git/hw/pc.c
@@ -659,7 +659,7 @@ static void load_linux(void *fw_cfg,
     uint16_t protocol;
     int setup_size, kernel_size, initrd_size = 0, cmdline_size;
     uint32_t initrd_max;
-    uint8_t header[8192], *setup, *kernel, *initrd_data;
+    uint8_t header[8192], *setup, *kernel, *initrd_data, *setup_orig;
     target_phys_addr_t real_addr, prot_addr, cmdline_addr, initrd_addr = 0;
     FILE *f;
     char *vmode;
@@ -807,6 +807,7 @@ static void load_linux(void *fw_cfg,
     kernel_size -= setup_size;

     setup = qemu_malloc(setup_size);
+    setup_orig = qemu_malloc(setup_size);
     kernel = qemu_malloc(kernel_size);
     fseek(f, 0, SEEK_SET);
     if (fread(setup, 1, setup_size, f) != setup_size) {
@@ -818,6 +819,9 @@ static void load_linux(void *fw_cfg,
         exit(1);
     }
     fclose(f);
+
+    memcpy(setup_orig, setup, setup_size);
+
     memcpy(setup, header, MIN(sizeof(header), setup_size));

     fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, prot_addr);
@@ -828,6 +832,8 @@ static void load_linux(void *fw_cfg,
     fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, setup_size);
     fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA, setup, setup_size);

+    fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_ORIG_DATA, setup_orig, setup_size);
+
     option_rom[nb_option_roms].name = "linuxboot.bin";
     option_rom[nb_option_roms].bootindex = 0;
     nb_option_roms++;
[Qemu-devel] [PATCH V5 06/12] Add a TPM backend skeleton implementation
This patch provides a TPM backend skeleton implementation. It doesn't do anything useful (except for returning an error response for every TPM command) but it compiles.

v5:
- the backend interface now has create and destroy functions. The former is used during the initialization phase of the TPM and the latter to clean up when Qemu terminates.

v3:
- in tpm_builtin.c all functions are prefixed with tpm_builtin_
- make the builtin TPM driver available at this point; it returns a failure response message for every command
- do not try to join the TPM thread but poll for its termination; the libtpms-based driver will require Qemu's main thread to write data to the block storage device while trying to join

v2:
- only terminate the thread in tpm_atexit if it is running

Signed-off-by: Stefan Berger stef...@linux.vnet.ibm.com
---
 Makefile.target  |   5
 configure        |   1
 hw/tpm_builtin.c | 454 +++
 tpm.c            |   3
 tpm.h            |   1
 5 files changed, 464 insertions(+)

Index: qemu-git/hw/tpm_builtin.c
===
--- /dev/null
+++ qemu-git/hw/tpm_builtin.c
@@ -0,0 +1,454 @@
+/*
+ * builtin 'null' TPM driver
+ *
+ * Copyright (c) 2010, 2011 IBM Corporation
+ * Copyright (c) 2010, 2011 Stefan Berger
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see http://www.gnu.org/licenses/
+ */
+
+#include "qemu-common.h"
+#include "tpm.h"
+#include "hw/hw.h"
+#include "hw/tpm_tis.h"
+#include "hw/pc.h"
+
+
+//#define DEBUG_TPM
+//#define DEBUG_TPM_SR /* suspend - resume */
+
+
+/* data structures */
+
+typedef struct ThreadParams {
+    TPMState *tpm_state;
+
+    TPMRecvDataCB *recv_data_callback;
+} ThreadParams;
+
+
+/* local variables */
+
+static QemuThread thread;
+
+static QemuMutex state_mutex;           /* protects *_state below */
+static QemuMutex tpm_initialized_mutex; /* protects tpm_initialized */
+
+static bool thread_terminate = false;
+static bool tpm_initialized = false;
+static bool had_fatal_error = false;
+static bool had_startup_error = false;
+static bool thread_running = false;
+
+static ThreadParams tpm_thread_params;
+
+/* locality of the command being executed by libtpms */
+static uint8_t g_locty;
+
+static const unsigned char tpm_std_fatal_error_response[10] = {
+    0x00, 0xc4, 0x00, 0x00, 0x00, 0x0A, 0x00, 0x00, 0x00, 0x09 /* TPM_FAIL */
+};
+
+static char dev_description[80];
+
+
+static void *tpm_builtin_main_loop(void *d)
+{
+    int res = 1;
+    ThreadParams *thr_parms = d;
+    uint32_t in_len, out_len;
+    uint8_t *in, *out;
+    uint32_t resp_size; /* total length of response */
+
+#ifdef DEBUG_TPM
+    fprintf(stderr, "tpm: THREAD IS STARTING\n");
+#endif
+
+    if (res != 0) {
+#if defined DEBUG_TPM || defined DEBUG_TPM_SR
+        fprintf(stderr, "tpm: Error: TPM initialization failed (rc=%d)\n",
+                res);
+#endif
+        had_fatal_error = true;
+    } else {
+        qemu_mutex_lock(&tpm_initialized_mutex);
+
+        tpm_initialized = true;
+
+        qemu_mutex_unlock(&tpm_initialized_mutex);
+    }
+
+    /* start command processing */
+    while (!thread_terminate) {
+        /* receive and handle commands */
+        in_len = 0;
+        do {
+#ifdef DEBUG_TPM
+            fprintf(stderr, "tpm: waiting for commands...\n");
+#endif
+
+            if (thread_terminate) {
+                break;
+            }
+
+            qemu_mutex_lock(&thr_parms->tpm_state->state_lock);
+
+            /* in case we were too slow and missed the signal, the
+               to_tpm_execute boolean tells us about a pending command */
+            if (!thr_parms->tpm_state->to_tpm_execute) {
+                qemu_cond_wait(&thr_parms->tpm_state->to_tpm_cond,
+                               &thr_parms->tpm_state->state_lock);
+            }
+
+            thr_parms->tpm_state->to_tpm_execute = false;
+
+            qemu_mutex_unlock(&thr_parms->tpm_state->state_lock);
+
+            if (thread_terminate) {
+                break;
+            }
+
+            g_locty = thr_parms->tpm_state->command_locty;
+
+            in = thr_parms->tpm_state->loc[g_locty].w_buffer.buffer;
+            in_len = thr_parms->tpm_state->loc[g_locty].w_offset;
+
+            if (!had_fatal_error) {
+
+                out_len = thr_parms->tpm_state->loc[g_locty].r_buffer.size;
+
+#ifdef DEBUG_TPM
+
[Qemu-devel] [PATCH V5 10/12] Encrypt state blobs using AES CBC encryption
This patch adds encryption of the individual state blobs that are written into the block storage. The 'directory' at the beginning of the block storage is not encrypted.

A key can be passed as a string of hexadecimal digits forming a 256, 192 or 128 bit AES key; such a key can optionally start with '0x'. If the parser does not recognize the string as such, the string itself is taken as the AES key.

The key is passed via a command line argument. It is wiped from the command line after parsing: if key=0x1234... was passed, it will be changed to key=--... so that 'ps' does not show the key anymore. Obviously it cannot be completely prevented that the key is visible during the very short period of time until Qemu is done parsing the command line parameters.

A flag is introduced in the directory structure indicating whether the blobs are encrypted. An additional 'layer' for reading and writing the blobs to the underlying block storage is added. This layer encrypts the blobs for writing if a key is available, and similarly decrypts them after reading. Checks are added that test whether a key has been provided although all data are stored in clear text, or whether a key is missing. In either of these cases the backend returns an error and Qemu terminates.
v5:
- -tpmdev now also gets a key parameter
- added documentation about the key parameter

Signed-off-by: Stefan Berger stef...@linux.vnet.ibm.com
---
 hw/tpm_builtin.c | 213 +--
 qemu-config.c    |  10 ++
 qemu-options.hx  |  20 -
 tpm.c            |  10 ++
 4 files changed, 246 insertions(+), 7 deletions(-)

Index: qemu-git/hw/tpm_builtin.c
===
--- qemu-git.orig/hw/tpm_builtin.c
+++ qemu-git/hw/tpm_builtin.c
@@ -27,6 +27,7 @@
 #include "hw/pc.h"
 #include "migration.h"
 #include "sysemu.h"
+#include "aes.h"

 #include libtpms/tpm_library.h
 #include libtpms/tpm_error.h
@@ -110,7 +111,8 @@ typedef struct BSDir {
     uint16_t rev;
     uint32_t checksum;
     uint32_t num_entries;
-    uint32_t reserved[10];
+    uint32_t flags;
+    uint32_t reserved[9];

     BSEntry entries[BS_DIR_MAX_NUM_ENTRIES];
 } __attribute__((packed)) BSDir;
@@ -119,6 +121,8 @@ typedef struct BSDir {

 #define BS_DIR_REV_CURRENT BS_DIR_REV1

+#define BS_DIR_FLAG_ENC_BLOBS (1 << 0)
+
 /* local variables */

 static QemuThread thread;
@@ -150,6 +154,8 @@ static const unsigned char tpm_std_fatal

 static char dev_description[80];

+static bool has_key;
+static AES_KEY tpm_enc_key, tpm_dec_key;

 static void tpm_builtin_adjust_data_layout(BlockDriverState *bs, BSDir *dir);
 static int tpm_builtin_load_sized_data_from_bs(BlockDriverState *bs,
@@ -206,6 +212,7 @@ static void tpm_builtin_dir_be_to_cpu(BS
     be16_to_cpus(&dir->rev);
     be32_to_cpus(&dir->checksum);
     be32_to_cpus(&dir->num_entries);
+    be32_to_cpus(&dir->flags);

     for (c = 0; c < dir->num_entries && c < BS_DIR_MAX_NUM_ENTRIES; c++) {
         be32_to_cpus(&dir->entries[c].type);
@@ -232,6 +239,7 @@ static void tpm_builtin_dir_cpu_to_be(BS
     dir->rev         = cpu_to_be16(dir->rev);
     dir->checksum    = cpu_to_be32(dir->checksum);
     dir->num_entries = cpu_to_be32(dir->num_entries);
+    dir->flags       = cpu_to_be32(dir->flags);
 }


@@ -297,6 +305,36 @@ static bool tpm_builtin_has_valid_conten
 }


+static uint32_t tpm_builtin_get_dir_flags(void)
+{
+    if (has_key) {
+        return BS_DIR_FLAG_ENC_BLOBS;
+    }
+
+    return 0;
+}
+
+
+static bool tpm_builtin_has_missing_key(const BSDir *dir)
+{
+    if ((dir->flags &
 BS_DIR_FLAG_ENC_BLOBS) && !has_key) {
+        return true;
+    }
+
+    return false;
+}
+
+
+static bool tpm_builtin_has_unnecessary_key(const BSDir *dir)
+{
+    if (!(dir->flags & BS_DIR_FLAG_ENC_BLOBS) && has_key) {
+        return true;
+    }
+
+    return false;
+}
+
+
 static int tpm_builtin_create_blank_dir(BlockDriverState *bs)
 {
     uint8_t buf[BDRV_SECTOR_SIZE];
@@ -307,6 +345,7 @@ static int tpm_builtin_create_blank_dir(
     dir = (BSDir *)buf;
     dir->rev = BS_DIR_REV_CURRENT;
     dir->num_entries = 0;
+    dir->flags = tpm_builtin_get_dir_flags();

     dir->checksum = tpm_builtin_calc_dir_checksum(dir);
@@ -408,6 +447,28 @@ static int tpm_builtin_startup_bs(BlockD

     tpm_builtin_dir_be_to_cpu(dir);

+    if (tpm_builtin_is_valid_bsdir(dir)) {
+        if (tpm_builtin_has_missing_key(dir)) {
+            fprintf(stderr,
+                    "tpm: the data are encrypted but I am missing the key.\n");
+            rc = -EIO;
+            goto err_exit;
+        }
+        if (tpm_builtin_has_unnecessary_key(dir)) {
+            fprintf(stderr,
+                    "tpm: I have a key but the data are not encrypted.\n");
+            rc = -EIO;
+            goto err_exit;
+        }
+        if ((dir->flags & BS_DIR_FLAG_ENC_BLOBS) &&
+            !tpm_builtin_has_valid_content(dir)) {
+            fprintf(stderr, "tpm: cannot read the data -
[Qemu-devel] [PATCH V5 07/12] Implementation of the libtpms-based backend
This patch provides the glue between the TPM TIS interface (frontend) and libtpms, which provides the actual TPM functionality.

Some details: this part of the patch provides support for spawning a thread that interacts with the libtpms-based TPM. It expects a signal from the frontend to wake up and pick up the TPM command that is to be processed, and it delivers the response packet using a callback function provided by the frontend. The backend connects itself to the frontend by filling out an interface structure with pointers to the functions implementing support for the various operations.

In this part, a structure with callback functions is registered with libtpms. Those callback functions mostly deal with persistent storage. The libtpms-based backend implements functionality to write into a Qemu block storage device rather than into plain files. With that we can support VM snapshotting, and we also get the possibility to use encrypted QCoW2 for free. Thanks to Anthony for pointing this out.

The storage part of the driver has been split off into its own patch.
v5:
 - check access() to TPM's state file and report error if file is not accessible
v3:
 - temporarily deactivate the building of tpm_builtin.c until a subsequent patch completely converts it to the libtpms-based driver
v2:
 - fixes to adhere to the qemu coding style

Signed-off-by: Stefan Berger stef...@linux.vnet.ibm.com

---
 configure        |    1
 hw/tpm_builtin.c |  422 ---
 hw/tpm_tis.h     |   17 ++
 3 files changed, 419 insertions(+), 21 deletions(-)

Index: qemu-git/hw/tpm_tis.h
===
--- qemu-git.orig/hw/tpm_tis.h
+++ qemu-git/hw/tpm_tis.h
@@ -73,4 +73,21 @@ static inline void dumpBuffer(FILE *stre
     fprintf(stream, "\n");
 }

+static inline void clear_sized_buffer(TPMSizedBuffer *tpmsb)
+{
+    if (tpmsb->buffer) {
+        tpmsb->size = 0;
+        qemu_free(tpmsb->buffer);
+        tpmsb->buffer = NULL;
+    }
+}
+
+static inline void set_sized_buffer(TPMSizedBuffer *tpmsb,
+                                    uint8_t *buffer, uint32_t size)
+{
+    clear_sized_buffer(tpmsb);
+    tpmsb->size = size;
+    tpmsb->buffer = buffer;
+}
+
 #endif /* _HW_TPM_TIS_H */

Index: qemu-git/hw/tpm_builtin.c
===
--- qemu-git.orig/hw/tpm_builtin.c
+++ qemu-git/hw/tpm_builtin.c
@@ -1,5 +1,5 @@
 /*
- * builtin 'null' TPM driver
+ * builtin TPM driver based on libtpms
  *
  * Copyright (c) 2010, 2011 IBM Corporation
  * Copyright (c) 2010, 2011 Stefan Berger
@@ -18,17 +18,36 @@
  * License along with this library; if not, see http://www.gnu.org/licenses/
  */

+#include "blockdev.h"
+#include "block_int.h"
 #include "qemu-common.h"
 #include "tpm.h"
 #include "hw/hw.h"
 #include "hw/tpm_tis.h"
 #include "hw/pc.h"
+#include "migration.h"
+#include "sysemu.h"
+
+#include <libtpms/tpm_library.h>
+#include <libtpms/tpm_error.h>
+#include <libtpms/tpm_memory.h>
+#include <libtpms/tpm_nvfilename.h>
+#include <libtpms/tpm_tis.h>
+
+#include <zlib.h>

 //#define DEBUG_TPM
 //#define DEBUG_TPM_SR /* suspend - resume */

+#define SAVESTATE_TYPE 'S'
+#define PERMSTATE_TYPE 'P'
+#define VOLASTATE_TYPE 'V'
+
+#define VTPM_DRIVE "drive-vtpm0-nvram"
+#define TPM_OPTS "id=" VTPM_DRIVE
+
 /* data structures */

 typedef struct ThreadParams {
@@ -44,12 +63,18
@@ static QemuThread thread;
 static QemuMutex state_mutex;            /* protects *_state below */
 static QemuMutex tpm_initialized_mutex;  /* protect tpm_initialized */
+static QemuCond bs_write_result_cond;
+static TPMSizedBuffer permanent_state = { .size = 0, .buffer = NULL, };
+static TPMSizedBuffer volatile_state  = { .size = 0, .buffer = NULL, };
+static TPMSizedBuffer save_state      = { .size = 0, .buffer = NULL, };
+static int pipefd[2] = {-1, -1};
 static bool thread_terminate = false;
 static bool tpm_initialized = false;
 static bool had_fatal_error = false;
 static bool had_startup_error = false;
 static bool thread_running = false;
+static bool need_read_volatile = false;

 static ThreadParams tpm_thread_params;
@@ -63,9 +88,21 @@ static const unsigned char tpm_std_fatal
 static char dev_description[80];

+static int tpmlib_get_prop(enum TPMLIB_TPMProperty prop)
+{
+    int result;
+
+    TPM_RESULT res = TPMLIB_GetTPMProperty(prop, &result);
+
+    assert(res == TPM_SUCCESS);
+
+    return result;
+}
+
+
 static void *tpm_builtin_main_loop(void *d)
 {
-    int res = 1;
+    TPM_RESULT res;
     ThreadParams *thr_parms = d;
     uint32_t in_len, out_len;
     uint8_t *in, *out;
@@ -75,9 +112,10 @@ static void *tpm_builtin_main_loop(void
     fprintf(stderr, "tpm: THREAD IS STARTING\n");
 #endif

-    if (res != 0) {
+    res = TPMLIB_MainInit();
+    if (res != TPM_SUCCESS) {
 #if defined DEBUG_TPM || defined DEBUG_TPM_SR
-        fprintf(stderr, "tpm: Error: TPM initialization failed (rc=%d)\n",
+        fprintf(stderr,
[Qemu-devel] [PATCH V5 11/12] Experimental support for block migrating TPMs state
This patch adds (experimental) support for block migration. In the case of block migration an empty QCoW2 image must be found on the destination, so early checks on the content and on whether it can be decrypted with the provided key have to be skipped. That empty file needs to be created by higher layers (i.e., libvirt).

Also, the completion of the block migration has to be delayed until after the TPM has written the last bytes of its state into the block device, so that we get the latest state on the target as well. Before the change to savevm.c it could happen that the latest state of the TPM did not make it to the destination host, since the TPM was still processing a command and changing its state (written into block storage) while the block migration had already finished. Re-ordering the saving of the live_state to finish after the 'non live_state' seems to get it right.

Signed-off-by: Stefan Berger stef...@linux.vnet.ibm.com
---
 hw/tpm_builtin.c |    5 +
 savevm.c         |   22 +++---
 2 files changed, 16 insertions(+), 11 deletions(-)

Index: qemu-git/hw/tpm_builtin.c
===
--- qemu-git.orig/hw/tpm_builtin.c
+++ qemu-git/hw/tpm_builtin.c
@@ -471,6 +471,11 @@ static int tpm_builtin_startup_bs(BlockD

     if (!tpm_builtin_is_valid_bsdir(dir) ||
         !tpm_builtin_has_valid_content(dir)) {
+        if (incoming_expected) {
+            /* during migration with block migration, we may end
+               up here due to an empty block file */
+            return -ENOKEY;
+        }
         /* if it's encrypted and has something else than null-content,
            we assume to have the wrong key */
         if (bdrv_is_encrypted(bs)) {

Index: qemu-git/savevm.c
===
--- qemu-git.orig/savevm.c
+++ qemu-git/savevm.c
@@ -1547,17 +1547,6 @@ int qemu_savevm_state_complete(Monitor *
     cpu_synchronize_all_states();

     QTAILQ_FOREACH(se, savevm_handlers, entry) {
-        if (se->save_live_state == NULL)
-            continue;
-
-        /* Section type */
-        qemu_put_byte(f, QEMU_VM_SECTION_END);
-        qemu_put_be32(f, se->section_id);
-
-        se->save_live_state(mon, f, QEMU_VM_SECTION_END, se->opaque);
-    }
-
-    QTAILQ_FOREACH(se,
savevm_handlers, entry) {
         int len;

         if (se->save_state == NULL && se->vmsd == NULL)
@@ -1578,6 +1567,17 @@ int qemu_savevm_state_complete(Monitor *
         vmstate_save(f, se);
     }

+    QTAILQ_FOREACH(se, savevm_handlers, entry) {
+        if (se->save_live_state == NULL)
+            continue;
+
+        /* Section type */
+        qemu_put_byte(f, QEMU_VM_SECTION_END);
+        qemu_put_be32(f, se->section_id);
+
+        se->save_live_state(mon, f, QEMU_VM_SECTION_END, se->opaque);
+    }
+
     qemu_put_byte(f, QEMU_VM_EOF);

     if (qemu_file_has_error(f))
Re: [Qemu-devel] [PATCH 23/26] target-xtensa: implement interrupt option
On 05/17/2011 03:32 PM, Max Filippov wrote:

+if (xtensa_option_enabled(env->config, XTENSA_OPTION_TIMER_INTERRUPT)) {
+    int i;
+    for (i = 0; i < env->config->nccompare; ++i) {
+        if (env->sregs[CCOMPARE + i] - old_ccount <= d) {
+            env->halted = 0;
+            xtensa_timer_irq(env, i, 1);

I don't think you should be writing to halted here; this is done by the code in cpu-exec.c, when noticing when cpu_has_work. Which will be true as a function of env->interrupt_request and the interrupt mask.

+if (env->halted) {
+    xtensa_advance_ccount(env,
+        muldiv64(qemu_get_clock_ns(vm_clock) - env->halt_clock,
+                 env->config->clock_freq_khz, 1000000));
+}

Why are you polling the vm_clock rather than setting up a timer?

+env->ccompare_timer =
+    qemu_new_timer_ns(vm_clock, xtensa_ccompare_cb, env);

... er, actually you are setting up a timer. So why aren't you using it?

 void do_interrupt(CPUState *env)
 {
     switch (env->exception_index) {
+    case EXC_IRQ:
+        if (handle_interrupt(env)) {
+            break;
+        }
+        /* not handled interrupt falls through,
+         * env->exception_index is updated
+         */

Do you really want to fall through, rather than restart the switch?

@@ -124,12 +198,16 @@ void do_interrupt(CPUState *env)
     if (env->config->exception_vector[env->exception_index]) {
         env->pc = env->config->exception_vector[env->exception_index];
         env->exception_taken = 1;
+        env->interrupt_request |= CPU_INTERRUPT_EXITTB;

Huh? What are you trying to accomplish here? EXITTB is supposed to be used when a device external to the cpu changes the memory mapping of the system. E.g. the x86 a20 line.

+DEF_HELPER_0(check_interrupts, void)
+DEF_HELPER_2(waiti, void, i32, i32)
+DEF_HELPER_2(timer_irq, void, i32, i32)
+DEF_HELPER_1(advance_ccount, void, i32)

You shouldn't have to manage any of this from within the translator.

r~
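For reference, the quoted ccount computation converts elapsed vm_clock nanoseconds into core clock cycles: with the frequency in kHz, cycles = ns * khz / 10^6. A minimal stand-in for QEMU's muldiv64() makes the arithmetic easy to check; this sketch assumes a compiler with unsigned __int128 (GCC/Clang) and is not QEMU's actual implementation.

```c
#include <stdint.h>

/* stand-in for QEMU's muldiv64(): (a * b) / c computed with a 128-bit
 * intermediate so the product cannot overflow (assumes GCC/Clang's
 * unsigned __int128; QEMU has its own portable definition) */
static uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
{
    return (uint64_t)(((unsigned __int128)a * b) / c);
}

/* elapsed cycles = elapsed ns * freq_khz / 1e6, mirroring the quoted
 * xtensa_advance_ccount() call */
static uint64_t ns_to_ccount(uint64_t ns_elapsed, uint32_t freq_khz)
{
    return muldiv64(ns_elapsed, freq_khz, 1000000);
}
```

For example, 1 ms (1,000,000 ns) at 10,000 kHz (10 MHz) advances ccount by 10,000 cycles.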
[Qemu-devel] [PULL] s390x patch queue
Hi,

This is my current s390x patch queue containing

  * s390x emulation
  * fixes for s390x kvm

Please pull.

Alex


The following changes since commit 1fddfba129f5435c80eda14e8bc23fdb888c7187:

  Alexander Graf (1):
        ahci: Fix non-NCQ accesses for LBA > 16bits

are available in the git repository at:

  git://repo.or.cz/qemu/agraf.git s390-next

Alexander Graf (12):
      tcg: extend max tcg opcodes when using 64-on-32bit
      s390x: make kvm exported functions conditional on kvm
      s390x: keep hint on virtio managing size
      s390x: Shift variables in CPUState for memset(0)
      s390x: helper functions for system emulation
      s390x: Implement opcode helpers
      s390x: Adjust internal kvm code
      s390x: translate engine for s390x CPU
      s390x: Adjust GDB stub
      s390x: remove compatibility cc field
      s390x: build s390x by default
      s390x: complain when allocating ram fails

Christian Borntraeger (4):
      s390x: fix smp support for kvm
      s390x: Fix debugging for unknown sigp order codes
      s390x: change mapping base to allow guests 2GB
      s390x: fix memory detection for guests 64GB

Ulrich Hecht (1):
      s390x: s390x-linux-user support

 configure                            |    2 +
 default-configs/s390x-linux-user.mak |    1 +
 exec-all.h                           |    4 +
 exec.c                               |   14 +-
 gdbstub.c                            |    8 +-
 hw/s390-virtio-bus.c                 |    3 +
 hw/s390-virtio-bus.h                 |    2 +-
 hw/s390-virtio.c                     |   20 +-
 linux-user/elfload.c                 |   19 +
 linux-user/main.c                    |   83 +
 linux-user/s390x/syscall.h           |   23 +
 linux-user/s390x/syscall_nr.h        |  349 +++
 linux-user/s390x/target_signal.h     |   26 +
 linux-user/s390x/termbits.h          |  283 ++
 linux-user/signal.c                  |  333 +++
 linux-user/syscall.c                 |   16 +-
 linux-user/syscall_defs.h            |   55 +-
 scripts/qemu-binfmt-conf.sh          |    4 +-
 target-s390x/cpu.h                   |   28 +-
 target-s390x/helper.c                |  565 -
 target-s390x/helpers.h               |  151 +
 target-s390x/kvm.c                   |   48 +-
 target-s390x/op_helper.c             | 2929 +++-
 target-s390x/translate.c             | 5167 +-
 24 files changed, 10058 insertions(+), 75 deletions(-)
 create mode 100644 default-configs/s390x-linux-user.mak
 create mode 100644 linux-user/s390x/syscall.h
 create mode 100644 linux-user/s390x/syscall_nr.h
 create mode 100644 linux-user/s390x/target_signal.h
 create mode 100644 linux-user/s390x/termbits.h
 create mode 100644 target-s390x/helpers.h
[Qemu-devel] [PATCH V5 09/12] Add block storage support for libtpms based TPM backend
This patch supports the storage of TPM persistent state. The TPM creates state of varying size, depending, for example, on how many keys are loaded into it at a certain time. The worst-case sizes of the different blobs the TPM can write have been pre-calculated, and this value is used to determine the minimum size of the Qcow2 image; it needs to be 63kb. 'qemu-... -tpm ?' shows this number when this backend driver is available.

The layout of the TPM's persistent data in the block storage is as follows: The first sector (512 bytes) holds a primitive directory for the different types of blobs that the TPM can write. This directory holds a revision number, a checksum over its content, the number of entries, and the entries themselves.

typedef struct BSDir {
    uint16_t rev;
    uint32_t checksum;
    uint32_t num_entries;
    uint32_t reserved[10];
    BSEntry entries[BS_DIR_MAX_NUM_ENTRIES];
} __attribute__((packed)) BSDir;

The entries are described through their absolute offsets, their maximum sizes, the number of currently valid bytes (the blobs inflate and deflate), and what type of blob it is (see below for the types). A CRC32 over the blob is also included.

typedef struct BSEntry {
    enum BSEntryType type;
    uint64_t offset;
    uint32_t space;
    uint32_t blobsize;
    uint32_t blobcrc32;
    uint32_t reserved[9];
} __attribute__((packed)) BSEntry;

The worst-case sizes of the blobs have been calculated, and according to these sizes the blobs are written at certain offsets into the block storage. Their offsets are all aligned to sectors (512 byte boundaries).

The TPM provides three different blobs that are written into the storage:

- volatile state
- permanent state
- save state

The 'save state' is written when the VM suspends (ACPI S3) and read when it resumes. This is done in concert with the BIOS, where the BIOS needs to send a command to the TPM upon resume (TPM_Startup(ST_STATE)), while the OS issues the command TPM_SaveState() before entering ACPI S3.
The 'permanent state' is written when the TPM receives a command that alters its permanent state, i.e., when a key is loaded into the TPM that is expected to be there upon reboot of the machine / VM.

Volatile state is written when the frontend triggers it to do so, i.e., when the VM's state is written out during the taking of a snapshot, migration, or suspension to disk (as in 'virsh save'). This state serves to resume at the point where the TPM previously stopped, but there is no need for it after a machine reboot, for example.

Tricky parts here are related to encrypted QCoW2 storage, where certain operations need to be deferred since the key for the storage only becomes available via the monitor much later than the time at which the backend is instantiated.

The backend also tries to check the validity of the block storage. If the Qcow2 is not encrypted and the checksum is found to be bad, the block storage directory will be initialized. In case the Qcow2 is encrypted, initialization will only be done if the directory is found to be all 0s. In case the directory cannot be checksummed correctly but is not all 0s, it is assumed that the user provided a wrong key. In this case I am not exiting qemu, but black out the TPM interface (it returns 0xff in all memory locations) due to a presumed fatal error and let the VM run (without TPM functionality).
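The per-entry CRC32 mentioned above is the standard IEEE 802.3 polynomial that zlib's crc32() computes (the patch includes zlib.h). The following standalone bitwise version produces the same values and shows what goes into a field like BSEntry.blobcrc32; blob_crc32 is an illustrative name, not the patch's function.

```c
#include <stdint.h>
#include <stddef.h>

/* Bitwise CRC-32 (reflected IEEE polynomial 0xEDB88320), matching
 * zlib's crc32() over the same bytes.  In the patch the result is
 * stored per directory entry and verified on read; how it is
 * byte-swapped on disk follows the directory's endianness rules. */
static uint32_t blob_crc32(const uint8_t *blob, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= blob[i];
        for (bit = 0; bit < 8; bit++) {
            crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
        }
    }
    return ~crc;   /* final inversion, as in zlib */
}
```

Computing the checksum when writing and re-checking it when reading is what lets the backend distinguish a corrupted (or wrongly decrypted) blob from a valid one.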
v5:
 - name of drive is 'drive-vtpm0-nvram'; was 'vtpm-nvram'
v4:
 - functions prefixed with tpm_builtin
 - added 10 uint32_t to BSDir as being reserved for future use
 - never move data in the block storage while migration is going on
 - use bdrv_lock/bdrv_unlock to serialize access to the TPM's state file, which is primarily necessary during migration and the startup of qemu on the target host, where the content of the drive is being read and validated
v3:
 - added reserved ints for future extensions to the entries in the directory structure
 - added a crc32 to every entry in the directory structure, calculating it when writing and checking it when reading
 - fixed an endianness issue related to crc calculation
 - surrounded debugging output function in adjust_data_layout with #if defined DEBUG_TPM
 - probing for installed libtpms development package by test-compiling

Signed-off-by: Stefan Berger stef...@linux.vnet.ibm.com
---
 configure        |   25 +
 hw/tpm_builtin.c |  816 ++-
 2 files changed, 837 insertions(+), 4 deletions(-)

Index: qemu-git/hw/tpm_builtin.c
===
--- qemu-git.orig/hw/tpm_builtin.c
+++ qemu-git/hw/tpm_builtin.c
@@ -48,6 +48,34 @@
 #define VTPM_DRIVE "drive-vtpm0-nvram"
 #define TPM_OPTS "id=" VTPM_DRIVE

+
+#define ALIGN(VAL, SIZE) \
+    (((VAL) + (SIZE) - 1) & ~((SIZE) - 1))
+
+
+#define DIRECTORY_SIZE        BDRV_SECTOR_SIZE
+
+#define PERMSTATE_DISK_OFFSET ALIGN(DIRECTORY_SIZE, BDRV_SECTOR_SIZE)
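The ALIGN macro in the hunk above rounds an offset up to the next multiple of a power-of-two size, which is how the blob offsets end up on 512-byte sector boundaries. A small self-contained check of the resulting layout (BDRV_SECTOR_SIZE is redefined locally here for illustration):

```c
#include <stdint.h>

#define BDRV_SECTOR_SIZE 512   /* QEMU's sector size, redefined locally */

/* round VAL up to the next multiple of SIZE (SIZE must be a power of two) */
#define ALIGN(VAL, SIZE) \
    (((VAL) + (SIZE) - 1) & ~((SIZE) - 1))

#define DIRECTORY_SIZE        BDRV_SECTOR_SIZE

/* the first blob starts in the sector right after the directory */
#define PERMSTATE_DISK_OFFSET ALIGN(DIRECTORY_SIZE, BDRV_SECTOR_SIZE)
```

Because the directory occupies exactly one sector, PERMSTATE_DISK_OFFSET evaluates to 512; any blob whose worst-case size is not a sector multiple still pushes the next blob to an aligned offset.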