
[Qemu-devel] [PATCH v5 0/5] hpet 'driftfix': alleviate time drift with HPET periodic timers

2011-05-20 Thread Ulrich Obergfell
Hi,

This is version 5 of a series of patches that I originally posted in:

http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01989.html
http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01992.html
http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01991.html
http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01990.html

http://article.gmane.org/gmane.comp.emulators.kvm.devel/69325
http://article.gmane.org/gmane.comp.emulators.kvm.devel/69326
http://article.gmane.org/gmane.comp.emulators.kvm.devel/69327
http://article.gmane.org/gmane.comp.emulators.kvm.devel/69328


Changes since version 4:

Added comments to patch part 3 and part 5. No changes in the actual code.


Please review and please comment.

Regards,

Uli


Ulrich Obergfell (5):
  hpet 'driftfix': add hooks required to detect coalesced interrupts (x86 apic only)
  hpet 'driftfix': add driftfix property to HPETState and DeviceInfo
  hpet 'driftfix': add fields to HPETTimer and VMStateDescription
  hpet 'driftfix': add code in update_irq() to detect coalesced interrupts (x86 apic only)
  hpet 'driftfix': add code in hpet_timer() to compensate delayed callbacks and coalesced interrupts

 hw/apic.c |    4 ++
 hw/hpet.c |  178 +++--
 hw/pc.h   |   13 +
 vl.c      |   13 +
 4 files changed, 204 insertions(+), 4 deletions(-)




[Qemu-devel] [PATCH v5 4/5] hpet 'driftfix': add code in update_irq() to detect coalesced interrupts (x86 apic only)

2011-05-20 Thread Ulrich Obergfell
update_irq() uses a method similar to the one in 'rtc_td_hack' to detect
coalesced interrupts. The function entry addresses are retrieved
from 'target_get_irq_delivered' and 'target_reset_irq_delivered'.

This change can be replaced if a generic feedback infrastructure to
track coalesced IRQs for periodic, clock providing devices becomes
available.

Signed-off-by: Ulrich Obergfell <uober...@redhat.com>
---
 hw/hpet.c |   13 +++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/hpet.c b/hw/hpet.c
index dba9370..0428290 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -184,11 +184,12 @@ static inline uint64_t hpet_calculate_diff(HPETTimer *t, uint64_t current)
     }
 }
 
-static void update_irq(struct HPETTimer *timer, int set)
+static int update_irq(struct HPETTimer *timer, int set)
 {
     uint64_t mask;
     HPETState *s;
     int route;
+    int irq_delivered = 1;
 
     if (timer->tn <= 1 && hpet_in_legacy_mode(timer->state)) {
         /* if LegacyReplacementRoute bit is set, HPET specification requires
@@ -213,8 +214,16 @@ static void update_irq(struct HPETTimer *timer, int set)
         qemu_irq_raise(s->irqs[route]);
     } else {
         s->isr &= ~mask;
-        qemu_irq_pulse(s->irqs[route]);
+        if (s->driftfix) {
+            target_reset_irq_delivered();
+            qemu_irq_raise(s->irqs[route]);
+            irq_delivered = target_get_irq_delivered();
+            qemu_irq_lower(s->irqs[route]);
+        } else {
+            qemu_irq_pulse(s->irqs[route]);
+        }
     }
+    return irq_delivered;
 }
 
 static void hpet_pre_save(void *opaque)
-- 
1.6.2.5




[Qemu-devel] [PATCH v5 5/5] hpet 'driftfix': add code in hpet_timer() to compensate delayed callbacks and coalesced interrupts

2011-05-20 Thread Ulrich Obergfell
Loss of periodic timer interrupts caused by delayed callbacks and by
interrupt coalescing is compensated by gradually injecting additional
interrupts during subsequent timer intervals, starting at a rate of
one additional interrupt per interval. The injection of additional
interrupts is based on a backlog of unaccounted HPET clock periods
(new HPETTimer field 'ticks_not_accounted'). The backlog increases
due to delayed callbacks and coalesced interrupts, and it decreases
if an interrupt was injected successfully. If the backlog increases
while compensation is still in progress, the rate at which additional
interrupts are injected is increased too. A limit is imposed on the
backlog and on the rate.

Injecting additional timer interrupts to compensate lost interrupts
can alleviate long term time drift. However, on a short time scale,
this method can have the side effect of making virtual machine time
intermittently pass slower and faster than real time (depending on
the guest's time keeping algorithm). Compensation is disabled by
default and can be enabled for guests where this behaviour may be
acceptable.

Signed-off-by: Ulrich Obergfell <uober...@redhat.com>
---
 hw/hpet.c |  120 +++-
 1 files changed, 118 insertions(+), 2 deletions(-)

diff --git a/hw/hpet.c b/hw/hpet.c
index 0428290..bc2a21a 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -31,6 +31,7 @@
 #include "hpet_emul.h"
 #include "sysbus.h"
 #include "mc146818rtc.h"
+#include <assert.h>
 
 //#define HPET_DEBUG
 #ifdef HPET_DEBUG
@@ -41,6 +42,9 @@
 
 #define HPET_MSI_SUPPORT        0
 
+#define MAX_TICKS_NOT_ACCOUNTED (uint64_t)5 /* 5 sec */
+#define MAX_IRQ_RATE            (uint32_t)10
+
 struct HPETState;
 typedef struct HPETTimer {  /* timers */
     uint8_t tn;             /*timer number*/
@@ -334,13 +338,68 @@ static const VMStateDescription vmstate_hpet = {
 };
 
 /*
+ * This function resets the driftfix state in the following situations.
+ *
+ * - When the guest o/s changes the 'CFG_ENABLE' bit (overall enable)
+ *   in the General Configuration Register from 0 to 1.
+ *
+ * - When the guest o/s changes the 'TN_ENABLE' bit (timer N interrupt enable)
+ *   in the Timer N Configuration and Capabilities Register from 0 to 1.
+ */
+static void hpet_timer_driftfix_reset(HPETTimer *t)
+{
+    if (t->state->driftfix && timer_is_periodic(t)) {
+        t->ticks_not_accounted = t->prev_period = t->period;
+        t->irq_rate = 1;
+        t->divisor = 1;
+    }
+}
+
+/*
+ * This function determines whether there is a backlog of ticks for which
+ * no interrupts have been delivered to the guest o/s yet. If the backlog
+ * is equal to or greater than the current period length, then additional
+ * interrupts will be delivered to the guest o/s inside of the subsequent
+ * period interval to compensate missed interrupts.
+ *
+ * 'ticks_not_accounted' increases by 'N * period' when the comparator is
+ * being advanced, and it decreases by 'prev_period' when an interrupt is
+ * delivered to the guest o/s. Normally 'prev_period' is equal to 'period'
+ * and 'N' is 1. 'prev_period' is different from 'period' if a guest o/s
+ * has changed the comparator value during the previous period interval.
+ * 'N' is greater than 1 if the callback was delayed by 'N - 1' periods,
+ * and 'N' is zero while additional interrupts are delivered inside of an
+ * interval.
+ *
+ * This function is called after the comparator has been advanced but before
+ * the interrupt is delivered to the guest o/s. Hence, 'ticks_not_accounted'
+ * is equal to 'prev_period' plus 'period' if there is no backlog.
+ */
+static bool hpet_timer_has_tick_backlog(HPETTimer *t)
+{
+    uint64_t backlog = 0;
+
+    if (t->ticks_not_accounted >= t->period + t->prev_period) {
+        backlog = t->ticks_not_accounted - (t->period + t->prev_period);
+    }
+    return (backlog >= t->period);
+}
+
+/*
  * timer expiration callback
  */
 static void hpet_timer(void *opaque)
 {
     HPETTimer *t = opaque;
+    HPETState *s = t->state;
     uint64_t diff;
-
+    int irq_delivered = 0;
+    uint32_t period_count = 0;   /* elapsed periods since last callback
+                                  *   1: normal case
+                                  * > 1: missed 'period_count - 1' interrupts
+                                  *      due to delayed callback
+                                  *   0: callback inside of an interval
+                                  *      to deliver additional interrupts */
     uint64_t period = t->period;
     uint64_t cur_tick = hpet_get_ticks(t->state);
 
@@ -348,13 +407,48 @@ static void hpet_timer(void *opaque)
     if (t->config & HPET_TN_32BIT) {
         while (hpet_time_after(cur_tick, t->cmp)) {
             t->cmp = (uint32_t)(t->cmp + t->period);
+            t->ticks_not_accounted += t->period;
+            period_count++;
         }
     } else {
         while (hpet_time_after64(cur_tick, t->cmp)) {
 

[Qemu-devel] [PATCH v5 1/5] hpet 'driftfix': add hooks required to detect coalesced interrupts (x86 apic only)

2011-05-20 Thread Ulrich Obergfell
'target_get_irq_delivered' and 'target_reset_irq_delivered' point
to functions that are called by update_irq() to detect coalesced
interrupts. Initially they point to stub functions which pretend
successful interrupt injection. apic code calls two registration
functions to replace the stubs with apic_get_irq_delivered() and
apic_reset_irq_delivered().

This change can be replaced if a generic feedback infrastructure to
track coalesced IRQs for periodic, clock providing devices becomes
available.

Signed-off-by: Ulrich Obergfell <uober...@redhat.com>
---
 hw/apic.c |    4 
 hw/pc.h   |   13 +
 vl.c      |   13 +
 3 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index a45b57f..94b1d15 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -17,6 +17,7 @@
  * License along with this library; if not, see <http://www.gnu.org/licenses/>
  */
 #include "hw.h"
+#include "pc.h"
 #include "apic.h"
 #include "ioapic.h"
 #include "qemu-timer.h"
@@ -1143,6 +1144,9 @@ static SysBusDeviceInfo apic_info = {
 
 static void apic_register_devices(void)
 {
+    register_target_get_irq_delivered(apic_get_irq_delivered);
+    register_target_reset_irq_delivered(apic_reset_irq_delivered);
+
     sysbus_register_withprop(&apic_info);
 }
 
diff --git a/hw/pc.h b/hw/pc.h
index bc8fcec..7511f28 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -7,6 +7,19 @@
 #include "fdc.h"
 #include "net.h"
 
+extern int (*target_get_irq_delivered)(void);
+extern void (*target_reset_irq_delivered)(void);
+
+static inline void register_target_get_irq_delivered(int (*func)(void))
+{
+    target_get_irq_delivered = func;
+}
+
+static inline void register_target_reset_irq_delivered(void (*func)(void))
+{
+    target_reset_irq_delivered = func;
+}
+
 /* PC-style peripherals (also used by other machines).  */
 
 /* serial.c */
diff --git a/vl.c b/vl.c
index 73e147f..456e320 100644
--- a/vl.c
+++ b/vl.c
@@ -232,6 +232,19 @@ const char *prom_envs[MAX_PROM_ENVS];
 const char *nvram = NULL;
 int boot_menu;
 
+static int target_get_irq_delivered_stub(void)
+{
+    return 1;
+}
+
+static void target_reset_irq_delivered_stub(void)
+{
+    return;
+}
+
+int (*target_get_irq_delivered)(void) = target_get_irq_delivered_stub;
+void (*target_reset_irq_delivered)(void) = target_reset_irq_delivered_stub;
+
 typedef struct FWBootEntry FWBootEntry;
 
 struct FWBootEntry {
-- 
1.6.2.5




[Qemu-devel] [PATCH v5 2/5] hpet 'driftfix': add driftfix property to HPETState and DeviceInfo

2011-05-20 Thread Ulrich Obergfell
driftfix is a 'bit type' property. Compensation of delayed callbacks
and coalesced interrupts can be enabled with the command line option

-global hpet.driftfix=on

driftfix is 'off' (disabled) by default.

Signed-off-by: Ulrich Obergfell <uober...@redhat.com>
---
 hw/hpet.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/hw/hpet.c b/hw/hpet.c
index 6ce07bc..7513065 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -72,6 +72,8 @@ typedef struct HPETState {
     uint64_t isr;               /* interrupt status reg */
     uint64_t hpet_counter;      /* main counter */
     uint8_t  hpet_id;           /* instance id */
+
+    uint32_t driftfix;
 } HPETState;
 
 static uint32_t hpet_in_legacy_mode(HPETState *s)
@@ -738,6 +740,7 @@ static SysBusDeviceInfo hpet_device_info = {
     .qdev.props = (Property[]) {
         DEFINE_PROP_UINT8("timers", HPETState, num_timers, HPET_MIN_TIMERS),
         DEFINE_PROP_BIT("msi", HPETState, flags, HPET_MSI_SUPPORT, false),
+        DEFINE_PROP_BIT("driftfix", HPETState, driftfix, 0, false),
         DEFINE_PROP_END_OF_LIST(),
     },
 };
-- 
1.6.2.5




[Qemu-devel] [PATCH v5 3/5] hpet 'driftfix': add fields to HPETTimer and VMStateDescription

2011-05-20 Thread Ulrich Obergfell
The new fields in HPETTimer are covered by a separate VMStateDescription
which is a subsection of 'vmstate_hpet_timer'. They are only migrated if
driftfix is enabled with

-global hpet.driftfix=on

Signed-off-by: Ulrich Obergfell <uober...@redhat.com>
---
 hw/hpet.c |   42 ++
 1 files changed, 42 insertions(+), 0 deletions(-)

diff --git a/hw/hpet.c b/hw/hpet.c
index 7513065..dba9370 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -55,6 +55,19 @@ typedef struct HPETTimer {  /* timers */
     uint8_t wrap_flag;      /* timer pop will indicate wrap for one-shot 32-bit
                              * mode. Next pop will be actual timer expiration.
                              */
+    /* driftfix state */
+    uint64_t prev_period;         /* needed when the guest o/s changes the
+                                   * comparator value */
+    uint64_t ticks_not_accounted; /* ticks for which no interrupts have been
+                                   * delivered to the guest o/s yet */
+    uint32_t irq_rate;            /* rate at which interrupts are delivered
+                                   * to the guest o/s during one period
+                                   * interval; if rate is greater than 1,
+                                   * additional interrupts are delivered
+                                   * to compensate missed interrupts */
+    uint32_t divisor;             /* needed to determine when the next
+                                   * timer callback should occur while
+                                   * rate is greater than 1 */
 } HPETTimer;
 
 typedef struct HPETState {
@@ -246,6 +259,27 @@ static int hpet_post_load(void *opaque, int version_id)
     return 0;
 }
 
+static bool hpet_timer_driftfix_vmstate_needed(void *opaque)
+{
+    HPETTimer *t = opaque;
+
+    return (t->state->driftfix != 0);
+}
+
+static const VMStateDescription vmstate_hpet_timer_driftfix = {
+    .name = "hpet_timer_driftfix",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINT64(prev_period, HPETTimer),
+        VMSTATE_UINT64(ticks_not_accounted, HPETTimer),
+        VMSTATE_UINT32(irq_rate, HPETTimer),
+        VMSTATE_UINT32(divisor, HPETTimer),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static const VMStateDescription vmstate_hpet_timer = {
     .name = "hpet_timer",
     .version_id = 1,
@@ -260,6 +294,14 @@ static const VMStateDescription vmstate_hpet_timer = {
         VMSTATE_UINT8(wrap_flag, HPETTimer),
         VMSTATE_TIMER(qemu_timer, HPETTimer),
         VMSTATE_END_OF_LIST()
+    },
+    .subsections = (VMStateSubsection []) {
+        {
+            .vmsd = &vmstate_hpet_timer_driftfix,
+            .needed = hpet_timer_driftfix_vmstate_needed,
+        }, {
+            /* empty */
+        }
     }
 };
 
-- 
1.6.2.5




Re: [Qemu-devel] [PATCH 20/26] target-xtensa: implement extended L32R

2011-05-20 Thread Max Filippov
> > +static void gen_wsr_litbase(DisasContext *dc, uint32_t sr, TCGv_i32 s)
> > +{
> > +    tcg_gen_mov_i32(cpu_SR[sr], s);
> > +    /* This can change tb->flags, so exit tb */
> > +    gen_jumpi_check_loop_end(dc, -1);
> > +}
> 
> Surely you have to flush all TB's when changing litbase?
> 
> > +                ((dc->tb->flags & XTENSA_TBFLAG_LITBASE) ?
> > +                 dc->litbase :
> > +                 ((dc->pc + 3) & ~3)) +
> > +                (0xfffc | (RI16_IMM16 << 2)));
> 
> Unless you actually read from env->sr[LITBASE] here, instead
> of building the value into the TB.

You're right, I have to flush all TBs in gen_wsr_litbase for this code to
always work correctly.
As far as I can see, the usual LITBASE usage pattern is that it is set up once
during early initialization and never changed afterwards.

Thanks.
-- Max



Re: [Qemu-devel] [RFC] Memory API

2011-05-20 Thread Gleb Natapov
On Thu, May 19, 2011 at 08:55:49PM +0200, Jan Kiszka wrote:
  Because we should catch accidental overlaps in all those non PCI devices
  with hard-wired addressing. That's a bug in the device/machine model and
  should be reported as such by QEMU.
  Why should we complicate API to catch unlikely errors? If you want to
  debug that add capability to dump memory map from the monitor.
 
  Because we need to switch tons of code that so far saw a fairly
  different reaction of the core to overlapping regions.
 
  How so? Today if there is accidental overlap device will not function 
  properly.
  With new API it will be the same.
 
 I rather expect subtle differences as overlapping registration changes
 existing regions, in the future those will recover.
 
Where do you expect the differences to come from? Conversion to the new
API shouldn't change the order of registration, and if the last
registration overrides the previous one, the end result should be the
same as what we have today.

  new region management will not cause any harm to overlapping regions so
  that they can recover when the overlap is gone.
 
 
  Another example may be APIC region and PCI. They overlap, but neither
  CPU nor PCI knows about it.
 
  And they do not need to. The APIC regions will be managed by the 
  per-CPU
  region management, reusing the tool box we need for all bridges. It 
  will
  register the APIC page with a priority higher than the default one, 
  thus
  overriding everything that comes from the host bridge. I think that
  reflects pretty well real machine behaviour.
 
  What is higher? How does it know that priority is high enough?
 
  Because no one else manages priorities at a specific hierarchy level.
  There is only one.
 
  PCI and CPU are on different hierarchy levels. PCI is under the PIIX and
  CPU is on a system BUS.
 
  The priority for the APIC mapping will be applied at CPU level, of
  course. So it will override everything, not just PCI.
 
  So you do not need explicit priority because the place in hierarchy
  implicitly provides you with one.
 
 Yes.
OK :) So you agree that we can do without priorities :)

   Alternatively, you could add a prio offset to all mappings when
 climbing one level up, provided that offset is smaller than the prio
 range locally available to each level.
 
Then a memory region's final priority will depend on the tree height. If two
disjoint tree branches of different heights claim the same memory
region, the higher one will have higher priority. I think this priority
management is a can of worms.

Only the lowest level (aka the system bus) will use the memory API directly.
A PCI device will call the PCI subsystem. The PCI subsystem, instead of
assigning arbitrary priorities to all overlaps, may just resolve them and
pass a flattened view to the chipset. The chipset in turn will look for
overlaps between PCI memory areas and RAM/ISA/other memory areas that are
outside of the PCI windows, resolve all of those, and pass the flattened
view to the system bus, where the APIC/PCI conflict will be resolved;
finally, the memory API will be used to create the memory map. In such a
model I do not see the need for priorities. All overlaps are resolved in
the most logical place, the one that has the best knowledge about how to
resolve the conflict. There will be no code duplication: the overlap
resolution code will be in a separate library used by all layers.

--
Gleb.



Re: [Qemu-devel] [PATCH 19/26] target-xtensa: implement loop option

2011-05-20 Thread Max Filippov
> > +        if (env->sregs[LEND] != v) {
> > +            tb_invalidate_phys_page_range(
> > +                    env->sregs[LEND] - 1, env->sregs[LEND], 0);
> > +            env->sregs[LEND] = v;
> > +            tb_invalidate_phys_page_range(
> > +                    env->sregs[LEND] - 1, env->sregs[LEND], 0);
> > +        }
> 
> Why are you invalidating twice?

To invalidate the TB at the old LEND and the one at the new LEND. It would
work correctly without the first invalidation, though.

> > +static void gen_check_loop_end(DisasContext *dc, int slot)
> > +{
> > +    if (option_enabled(dc, XTENSA_OPTION_LOOP) &&
> > +            !(dc->tb->flags & XTENSA_TBFLAG_EXCM) &&
> > +            dc->next_pc == dc->lend) {
> > +        int label = gen_new_label();
> > +
> > +        tcg_gen_brcondi_i32(TCG_COND_NE, cpu_SR[LEND], dc->next_pc, label);
> > +        tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_SR[LCOUNT], 0, label);
> > +        tcg_gen_subi_i32(cpu_SR[LCOUNT], cpu_SR[LCOUNT], 1);
> > +        gen_jump(dc, cpu_SR[LBEG]);
> > +        gen_set_label(label);
> > +        gen_jumpi(dc, dc->next_pc, slot);
> 
> If you're going to pretend that LEND is a constant, you might as well
> pretend that LBEG is also a constant, so that you get to chain the TB's
> around the loop.

But there may be three exits from the TB at LEND if its last instruction is a
branch: to LBEG, to the branch target, and to the next instruction.

Thanks.
-- Max



Re: [Qemu-devel] [PATCH 09/26] target-xtensa: add special and user registers

2011-05-20 Thread Max Filippov
> > +enum {
> > +    THREADPTR = 231,
> > +    FCR = 232,
> > +    FSR = 233,
> > +};
> > +
> >  typedef struct XtensaConfig {
> >      const char *name;
> >      uint64_t options;
> > @@ -109,6 +115,7 @@ typedef struct CPUXtensaState {
> >      uint32_t regs[16];
> >      uint32_t pc;
> >      uint32_t sregs[256];
> > +    uint32_t uregs[256];
> 
> Is it really worthwhile allocating 2k worth of space in the
> CPUState when only several of the slots are actually used?
> 
> I would think that it might be better to have a function to
> map between number to offset/register.  E.g.
> 
> int ur_offset(int ur)
> {
>     switch (ur) {
>     case THREADPTR:
>         return offsetof(CPUState, ur_threadptr);
>     case FCR:
>         return offsetof(CPUState, ur_fcr);
>     case FSR:
>         return offsetof(CPUState, ur_fsr);
>     }
>     return -1;
> }
> 
> where the individual slots are allocated by hand in the
> CPUState.  The fact that they'll be named in the struct
> will also make it easier to dump the value inside gdb and
> see what the individual values are.

User registers represent TIE state that may appear in custom Xtensa
configurations. I think it is better to change RUR and WUR so that they can
access all user registers, but warn about those not defined globally or in
CPUEnv::config.
Is that OK?

Thanks.
-- Max



Re: [Qemu-devel] [RFC] Memory API

2011-05-20 Thread Jan Kiszka
On 2011-05-20 09:23, Gleb Natapov wrote:
 On Thu, May 19, 2011 at 08:55:49PM +0200, Jan Kiszka wrote:
 Because we should catch accidental overlaps in all those non PCI devices
 with hard-wired addressing. That's a bug in the device/machine model and
 should be reported as such by QEMU.
 Why should we complicate API to catch unlikely errors? If you want to
 debug that add capability to dump memory map from the monitor.

 Because we need to switch tons of code that so far saw a fairly
 different reaction of the core to overlapping regions.

 How so? Today if there is accidental overlap device will not function 
 properly.
 With new API it will be the same.

 I rather expect subtle differences as overlapping registration changes
 existing regions, in the future those will recover.

 Where do you expect the differences will come from? Conversion to the new
 API shouldn't change the order of the registration and if the last
 registration will override previous one the end result should be the
 same as we have today.

A) Removing regions will change significantly. So far this is done by
setting a region to IO_MEM_UNASSIGNED, keeping truncation. With the new
API that will be a true removal which will additionally restore hidden
regions.

B) Uncontrolled overlapping is a bug that should be caught by the core,
and a new API is a perfect chance to do this.

 
 new region management will not cause any harm to overlapping regions so
 that they can recover when the overlap is gone.


 Another example may be APIC region and PCI. They overlap, but neither
 CPU nor PCI knows about it.

 And they do not need to. The APIC regions will be managed by the 
 per-CPU
 region management, reusing the tool box we need for all bridges. It 
 will
 register the APIC page with a priority higher than the default one, 
 thus
 overriding everything that comes from the host bridge. I think that
 reflects pretty well real machine behaviour.

 What is higher? How does it know that priority is high enough?

 Because no one else manages priorities at a specific hierarchy level.
 There is only one.

 PCI and CPU are on different hierarchy levels. PCI is under the PIIX and
 CPU is on a system BUS.

 The priority for the APIC mapping will be applied at CPU level, of
 course. So it will override everything, not just PCI.

 So you do not need explicit priority because the place in hierarchy
 implicitly provides you with one.

 Yes.
 OK :) So you agree that we can do without priorities :)

Nope, see below how your own example depends on them.

 
   Alternatively, you could add a prio offset to all mappings when
 climbing one level up, provided that offset is smaller than the prio
 range locally available to each level.

 Then a memory region final priority will depend on a tree height. If two
 disjointed tree branches of different height will claim the same memory
 region the higher one will have higher priority. I think this priority
 management is a can of worms.

It is not, as it remains a purely local thing and helps implement the
sketched scenarios. Believe me, I already tried to fix PAM/SMRAM.

 
 Only the lowest level (aka system bus) will use memory API directly.

Not necessarily. It depends on how much added value buses like PCI or
ISA or whatever can offer for managing I/O regions. For some purposes,
it may as well be fine to just call the memory_* service directly and
pass the result of some operation to the bus API later on.

 PCI
 device will call PCI subsystem. PCI subsystem, instead of assigning
 arbitrary priorities to all overlappings,

Again: PCI will _not_ assign arbitrary priorities but only
MEMORY_REGION_DEFAULT_PRIORITY, likely 0.

 may just resolve them and pass
 flattened view to the chipset. Chipset in turn will look for overlappings
 between PCI memory areas and RAM/ISA/other memory areas that are outside
 of PCI windows and resolve all those passing the flattened view to system
 bus where APIC/PCI conflict will be resolved and finally memory API will
 be used to create memory map. In such a model I do not see the need for
 priorities. All overlappings are resolved in the most logical place,
 the one that has the best knowledge about how to resolve the conflict.
 The will be no code duplication. Overlapping resolution code will be in
 separate library used by all layers.

That does not specify how the PCI bridge or the chipset will tell that
overlapping resolution lib _how_ overlapping regions shall be translated
into a flat representation. And precisely here priorities come into
play. They are the way to tell that lib either "region A shall override
region B if A has higher prio" or "if regions A and B overlap, do
whatever you want if both have the same prio".

Jan





Re: [Qemu-devel] [PATCH 01/27] Clean up PowerPC SLB handling code

2011-05-20 Thread Alexander Graf

On 20.05.2011, at 05:34, David Gibson wrote:

> On Thu, May 19, 2011 at 10:25:04AM +0200, Andreas Färber wrote:
> > QEMU HEAD still uses a 32-bit binary for both 32-bit and
> > 64-bit. That one uses mtsrin so will need the compatibility, it
> > seemed affected, too.
> > 
> > OpenBIOS SVN HEAD (blob) uses slb* as linked to. We're in the
> > preparation of 1.1 and I need to test it before we can update the
> > QEMU binary. ;)
> > 
> > Sorry for top-posting, Android sucks.
> 
> So, my theory was half right.  It was a problem with 64-bit mtsr
> emulation, but it wasn't that I just removed that code with the SLB
> cleanup.  The code was still there and *almost* right.  I was off by
> one in one shift, causing the storage key bits to end up in the wrong
> place in the SLB entry.  I'll send out the patch right after I've sent
> this mail.

Thanks a lot for tracking it down you two :)

Alex




Re: [Qemu-devel] [PATCH] Fix a bug in mtsr/mtsrin emulation on ppc64

2011-05-20 Thread Alexander Graf

On 20.05.2011, at 05:34, David Gibson wrote:

> Early ppc64 CPUs include a hack to partially simulate the ppc32 segment
> registers, by translating writes to them into writes to the SLB.  This is
> not used by any current Linux kernel, but it is used by the openbios used
> in the qemu mac99 model.
> 
> Commit 81762d6dd0d430d87024f2c83e9c4dcc4329fb7d, cleaning up the SLB
> handling, introduced a bug in this code, breaking the openbios currently in
> qemu.  Specifically, there was an off by one error bitshuffling the
> register format used by mtsr into the format needed for the SLB load,
> causing the flag bits to end up in the wrong place.  This caused the
> storage keys to be wrong under openbios, meaning that the translation code
> incorrectly thought a legitimate access was a permission violation.
> 
> This patch fixes the bug.  At the same time, it fixes a build bug in the
> MMU debugging code (only exposed when DEBUG_MMU is enabled).

Thanks, applied to ppc-next :)


Alex




Re: [Qemu-devel] [V2 2/2]Qemu: Add commands hostcache_set and hostcache_get

2011-05-20 Thread Stefan Hajnoczi
On Thu, May 19, 2011 at 10:38:03PM +0530, Supriya Kannery wrote:
> Monitor commands hostcache_set and hostcache_get added for dynamic
> host cache change and display of host cache setting respectively.

A generic command for changing block device options would be nice,
although I don't see other options where it makes sense to change them
at runtime.

The alternative would be:

block_set hostcache on

block_set, {"device": "ide1-cd0", "name": "hostcache", "enable": true}

The hostcache_get information would be part of query-block output:
 {
    "device":"ide0-hd0",
    "locked":false,
    "removable":false,
    "inserted":{
       "ro":false,
       "drv":"qcow2",
       "encrypted":false,
       "file":"disks/test.img",
       "hostcache":true
    },
    "type":"hd"
 },

This approach is extensible if more options need to be exposed.

> Signed-off-by: Supriya Kannery <supri...@in.ibm.com>
> 
> ---
>  block.c         |   48 
>  block.h         |    2 ++
>  blockdev.c      |   48 
>  blockdev.h      |    2 ++
>  hmp-commands.hx |   29 +
>  qmp-commands.hx |   55 +++
>  6 files changed, 184 insertions(+)
> 
> Index: qemu/hmp-commands.hx
> ===================================================================
> --- qemu.orig/hmp-commands.hx
> +++ qemu/hmp-commands.hx
> @@ -70,6 +70,35 @@ but should be used with extreme caution.
>  resizes image files, it can not resize block devices like LVM volumes.
>  ETEXI
>  
> +    {
> +        .name       = "hostcache_get",
> +        .args_type  = "device:B",
> +        .params     = "device",
> +        .help       = "retrieve host cache settings for device",

Please make it clear these operations affect block devices:
"for block device"

> +        .user_print = monitor_user_noop,
> +        .mhandler.cmd_new = do_hostcache_get,
> +    },
> +
> +STEXI
> +@item hostcache_get
> +@findex hostcache_get
> +Display host cache settings for a block device while guest is running.
> +ETEXI
> +
> +    {
> +        .name       = "hostcache_set",
> +        .args_type  = "device:B,hostcache:s",
> +        .params     = "device hostcache",
> +        .help       = "change host cache setting for device",
> +        .user_print = monitor_user_noop,
> +        .mhandler.cmd_new = do_hostcache_set,
> +    },
> +
> +STEXI
> +@item hostcache_set
> +@findex hostcache_set
> +Change host cache options for a block device while guest is running.
> +ETEXI
>  
>      {
>          .name       = "eject",
> Index: qemu/block.c
> ===================================================================
> --- qemu.orig/block.c
> +++ qemu/block.c
> @@ -657,6 +657,34 @@ unlink_and_fail:
>      return ret;
>  }
>  
> +int bdrv_reopen(BlockDriverState *bs, int bdrv_flags)
> +{
> +    BlockDriver *drv = bs->drv;
> +    int ret = 0;
> +
> +    /* No need to reopen as no change in flags */
> +    if (bdrv_flags == bs->open_flags) {
> +        return 0;
> +    }
> +
> +    /* Quiesce IO for the given block device */
> +    qemu_aio_flush();
> +    bdrv_flush(bs);
> +
> +    bdrv_close(bs);
> +    ret = bdrv_open(bs, bs->filename, bdrv_flags, drv);
> +
> +    /*
> +     * A failed attempt to reopen the image file must lead to 'abort()'
> +     */
> +    if (ret != 0) {
> +        qerror_report(QERR_REOPEN_FILE_FAILED, bs->filename);
> +        abort();

The error is never reported on a QMP monitor because qerror_report()
simply stashes away the qerror.  The QMP client doesn't have a chance to
read the error before QEMU terminates.

> +    }
> +
> +    return ret;
> +}
> +
>  void bdrv_close(BlockDriverState *bs)
>  {
>      if (bs->drv) {
> @@ -3049,3 +3077,23 @@ out:
>  
>      return ret;
>  }
> +
> +int bdrv_change_hostcache(BlockDriverState *bs, bool enable_host_cache)

Consistently using hostcache or host_cache would be nice.

> +{
> +    int bdrv_flags = bs->open_flags;
> +
> +    /* No change in existing hostcache setting */
> +    if(!enable_host_cache == (bdrv_flags & BDRV_O_NOCACHE)) {

This expression doesn't work as expected.  bool has a lower rank than
int.  That means !enable_host_cache is converted to an int (0 or 1) and
compared against bdrv_flags & BDRV_O_NOCACHE.  When BDRV_O_NOCACHE is set,
this expression is always false because a bool is 0 or 1 and
BDRV_O_NOCACHE is 0x0020.

> +        return -1;

This shouldn't be a failure and please don't use -1 when a negative
errno indicates failure.  -1 == -EPERM.  The return value should be 0
here.

> +    }

Anyway, this whole check is unnecessary since bdrv_reopen() already
performs it.

> +
> +    /* set hostcache flags (without changing WCE/flush bits) */
> +    if(!enable_host_cache) {
> +        bdrv_flags |= BDRV_O_NOCACHE;
> +    } else {
> +        bdrv_flags &= ~BDRV_O_NOCACHE;
> +    }
> +
> +    /* Reopen file with changed set of flags */
> +    return(bdrv_reopen(bs, bdrv_flags));

Please run scripts/checkpatch.pl before submitting patches.

 +}
 Index: qemu/blockdev.c
 

[Qemu-devel] virtio scsi host draft specification, v2

2011-05-20 Thread Paolo Bonzini

Hi all,

here is the second version of the spec.  In the end I took the advice of 
merging all requestq's into one.  The reason for this is that I took a 
look at the vSCSI device and liked its approach of using SAM 8-byte LUNs 
directly.  While it _is_ complex (and not yet done right by QEMU---will 
send a patch for that), the scheme is actually quite natural to 
implement and use, and supporting generic bus/target/LUN topologies is 
good to have for passthrough, as well.


I also added a few more features from SAM to avoid redefining the 
structs in the future.


Of course it may be that I'm completely wrong. :)  Please comment on the 
spec!


Paolo
Virtio SCSI Host Device Spec


The virtio SCSI host device groups together one or more simple virtual
devices (i.e. disks), and allows communicating with these devices using the
SCSI protocol.  An instance of the device represents a SCSI host with
possibly many buses, targets and LUNs attached.

The virtio SCSI device services two kinds of requests:

- command requests for a logical unit;

- task management functions related to a logical unit, target or
command.

The device is also able to send out notifications about added
and removed logical units.

v4:
First public version

v5:
Merged all virtqueues into one, removed separate TARGET fields

Configuration
-------------

Subsystem Device ID
TBD

Virtqueues
0:control transmitq
1:control receiveq
2:requestq

Feature bits
VIRTIO_SCSI_F_INOUT - Whether a single request can include both
read-only and write-only data buffers.

Device configuration layout
struct virtio_scsi_config {
}

(Still empty)

Device initialization
---------------------

The initialization routine should first of all discover the device's
control virtqueues.

The driver should then place at least one buffer in the control receiveq.
Buffers returned by the device on the control receiveq may be referred
to as events in the rest of the document.

The driver can immediately issue requests (for example, INQUIRY or
REPORT LUNS) or task management functions (for example, I_T RESET).

Device operation: request queue
-------------------------------

The driver queues requests to the virtqueue, and they are used by the device
(not necessarily in order).

Requests have the following format:

struct virtio_scsi_req_cmd {
u8 lun[8];
u64 id;
u8 task_attr;
u8 prio;
u8 crn;
u32 num_dataout, num_datain;
char cdb[];
char data[][num_dataout+num_datain];
u8 sense[];
u32 sense_len;
u32 residual;
u16 status_qualifier;
u8 status;
u8 response;
};

/* command-specific response values */
#define VIRTIO_SCSI_S_OK       0
#define VIRTIO_SCSI_S_UNDERRUN 1
#define VIRTIO_SCSI_S_ABORTED  2
#define VIRTIO_SCSI_S_FAILURE  3
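One way to express the fixed-size parts of the request in C, as a sketch: the header and trailer struct names and the init helper are hypothetical, and the draft does not yet specify packing or byte order, so none is assumed here. The variable-length cdb, data and sense parts travel in their own buffers, as the text below requires.

```c
#include <stdint.h>
#include <string.h>

/* Device-readable header (field order from the draft). */
struct virtio_scsi_req_hdr {
    uint8_t  lun[8];
    uint64_t id;
    uint8_t  task_attr;   /* must be 0 (SIMPLE) in this version */
    uint8_t  prio;        /* must be 0 in this version */
    uint8_t  crn;         /* generally 0 */
    uint32_t num_dataout;
    uint32_t num_datain;
};

/* Device-writable trailer that follows the sense buffer. */
struct virtio_scsi_req_tail {
    uint32_t sense_len;
    uint32_t residual;
    uint16_t status_qualifier;
    uint8_t  status;
    uint8_t  response;
};

/* Hypothetical helper: zero-initialization leaves task_attr, prio and
 * crn at 0, as this version of the device expects. */
static void req_hdr_init(struct virtio_scsi_req_hdr *h, const uint8_t lun[8],
                         uint64_t id, uint32_t dataout, uint32_t datain)
{
    memset(h, 0, sizeof(*h));
    memcpy(h->lun, lun, sizeof(h->lun));
    h->id = id;
    h->num_dataout = dataout;
    h->num_datain = datain;
}
```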

The lun field addresses a bus, target and logical unit in the SCSI
host.  The id field is the command identifier as defined in SAM.

The task_attr and prio fields should always be zero, as task
attributes other than SIMPLE, as well as command priority, are
explicitly not supported by this version of the device.
CRN is also as defined in SAM; while it is generally expected to
be 0, clients can provide it.  The maximum CRN value defined by
the protocol is 255, since CRN is stored in an 8-bit integer.

All of these fields are always read-only.

The cdb, data and sense fields must reside in separate buffers.
The cdb field is always read-only.  The data buffers may be either
read-only or write-only, depending on the request, with the read-only
buffers coming first.  The sense buffer is always write-only.

The request shall have num_dataout read-only data buffers and
num_datain write-only data buffers.  One of these two values must be
zero if the VIRTIO_SCSI_F_INOUT feature has not been negotiated.
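The two buffer rules above (read-only buffers first, and no mixing of directions unless VIRTIO_SCSI_F_INOUT was negotiated) can be checked mechanically. A sketch with hypothetical function names; `writable[i]` stands for the device-writable flag of descriptor i in the request's chain.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Without VIRTIO_SCSI_F_INOUT, a request carries data-out buffers or
 * data-in buffers, never both. */
static bool req_buffers_valid(uint32_t num_dataout, uint32_t num_datain,
                              bool inout_negotiated)
{
    if (inout_negotiated) {
        return true;
    }
    return num_dataout == 0 || num_datain == 0;
}

/* Read-only buffers must all come before any write-only buffer. */
static bool chain_order_valid(const bool *writable, size_t n)
{
    bool seen_writable = false;

    for (size_t i = 0; i < n; i++) {
        if (writable[i]) {
            seen_writable = true;
        } else if (seen_writable) {
            return false;   /* read-only buffer after a write-only one */
        }
    }
    return true;
}
```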

Remaining fields are filled in by the device.  The sense_len field
indicates the number of bytes actually written to the sense buffer,
while the residual field indicates the residual size, calculated as
data_length - number_of_transferred_bytes.

The status byte is written by the device to be the SCSI status code.

The response byte is written by the device to be one of the following:

- VIRTIO_SCSI_S_OK when the request was completed and the status byte
  is filled with a SCSI status code (not necessarily GOOD).

- VIRTIO_SCSI_S_UNDERRUN if the content of the CDB requires transferring
  more data than is available in the data buffers.

- VIRTIO_SCSI_S_ABORTED if the request was cancelled due to a reset
  or another task management function.

- VIRTIO_SCSI_S_FAILURE for other host or guest errors.
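A sketch of how a driver might interpret the completion fields, inverting the residual definition given above; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_SCSI_S_OK       0
#define VIRTIO_SCSI_S_UNDERRUN 1
#define VIRTIO_SCSI_S_ABORTED  2
#define VIRTIO_SCSI_S_FAILURE  3

/* residual = data_length - number_of_transferred_bytes, inverted here;
 * a residual larger than data_length is treated as nothing transferred. */
static uint32_t bytes_transferred(uint32_t data_length, uint32_t residual)
{
    return residual <= data_length ? data_length - residual : 0;
}

/* Only VIRTIO_SCSI_S_OK means the status byte holds a SCSI status code,
 * which may still be something other than GOOD. */
static bool status_byte_valid(uint8_t response)
{
    return response == VIRTIO_SCSI_S_OK;
}
```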

Device operation: control transmitq
-----------------------------------

The control transmitq is used for other SCSI transport 

Re: [Qemu-devel] [RFC] Memory API

2011-05-20 Thread Avi Kivity

On 05/19/2011 07:36 PM, Anthony Liguori wrote:

There are no global priorities. Priorities are only used inside each
level of the memory region hierarchy to generate a resulting, flattened
view for the next higher level. At that level, everything imported from
below has the default prio again, i.e. the lowest one.



Then SMM is impossible.



It doesn't follow.

Why do we need priorities at all?  There should be no overlap at each 
level in the hierarchy.


Of course there is overlap.  PCI BARs overlap each other, the VGA 
windows and ROM overlap RAM.




If you have overlapping BARs, the PCI bus will always send the request 
to a single device based on something that's implementation specific. 
This works because each PCI device advertises the BAR locations and 
sizes in its config space.


BARs in general don't need priority, except we need to decide if BARs 
overlap RAM or vice-versa.




To dispatch a request, the PCI bus will walk the config space to find 
a match.  If you remove something that was previously causing an 
overlap, the other device will now get the I/O requests.


That's *exactly* what priority means.  Which device is in front, and
which is in the back.




To model this correctly, you need to let the PCI bus decide how to 
dispatch I/O requests (again, you need hierarchical dispatch).


And again, this API gives you hierarchical dispatch, with the addition 
that some of it is done at registration time so we can prepare the RAM 
slots.




In the absence of this, the PCI bus needs to look at all of the 
devices, figure out the flat mapping, and register it.  When a device 
is added or removed, it needs to recalculate the flat mapping and 
register it.


However we do this, we need to look at all devices.



There is no need to have centralized logic to decide this.



I think you're completely missing the point of my proposal.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.




Re: [Qemu-devel] [RFC] Memory API

2011-05-20 Thread Avi Kivity

On 05/19/2011 07:49 PM, Jan Kiszka wrote:


  If you have overlapping BARs, the PCI bus will always send the request
  to a single device based on something that's implementation specific.
  This works because each PCI device advertises the BAR locations and
  sizes in its config space.

That's not a use case for priorities at all. Priorities are useful for
PAM and SMRAM-like scenarios.


Correct.  Priorities are also useful to decide if BARs hide RAM or 
vice-versa (determined by the PCI container's priority vs. the RAM 
container priorities, not individual BARs' priorities).


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.




Re: [Qemu-devel] [RFC] Memory API

2011-05-20 Thread Avi Kivity

On 05/19/2011 07:32 PM, Anthony Liguori wrote:

Think of how a window manager folds windows with priorities onto a flat
framebuffer.

You do a depth-first walk of the tree. For each child list, you iterate
it from the lowest to highest priority, allowing later subregions to
override earlier subregions.




Okay, but this doesn't explain how you'll let RAM override the VGA 
mapping since RAM is not represented in the same child list as VGA 
(RAM is a child of the PMC whereas VGA is a child of ISA/PCI, both of 
which are at least one level removed from the PMC).


VGA will override RAM.

Memory controller
 |
 +-- RAM container (prio 0)
 |
 +-- PCI container (prio 1)
  |
  +--- vga window


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
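A toy model of the depth-first, lowest-to-highest-priority flattening described in this thread, applied to the memory controller / RAM container / PCI container / VGA hierarchy above. Everything here (struct region, flatten(), page granularity, a 16-page address space) is hypothetical and is not the proposed API; children are not clipped to their container's bounds, and each node has at most four children.

```c
#include <stddef.h>

#define NPAGES 16   /* toy address space: 16 pages */

struct region {
    const char *name;            /* owner of a leaf; NULL for containers */
    unsigned start, size;        /* in pages, relative to the parent */
    int prio;                    /* compared only among siblings */
    struct region *children[4];
    size_t nchildren;            /* at most 4 */
};

static void flatten(const struct region *r, unsigned base,
                    const char *view[NPAGES])
{
    unsigned begin = base + r->start;
    const struct region *order[4];
    size_t n = r->nchildren;

    /* A leaf paints its pages; containers paint nothing themselves. */
    if (r->name) {
        for (unsigned p = begin; p < begin + r->size && p < NPAGES; p++) {
            view[p] = r->name;
        }
    }
    /* Stable insertion sort of the children by ascending priority. */
    for (size_t i = 0; i < n; i++) {
        size_t j = i;
        while (j > 0 && order[j - 1]->prio > r->children[i]->prio) {
            order[j] = order[j - 1];
            j--;
        }
        order[j] = r->children[i];
    }
    /* Later (higher-priority) children overwrite earlier ones. */
    for (size_t i = 0; i < n; i++) {
        flatten(order[i], begin, view);
    }
}
```

Painting only where a region actually has content is what makes containers act like transparent windows: RAM stays visible outside the VGA window even though the PCI container has the higher priority. Removing vga from the PCI container's child list and flattening again brings the RAM pages back.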




Re: [Qemu-devel] [RFC] Memory API

2011-05-20 Thread Avi Kivity

On 05/19/2011 07:38 PM, Anthony Liguori wrote:

You can always create a new memory region with higher priority, pointing
to the RAM window you want to have above VGA. That's what we do today as
well, just with different effects on the internal representation.



But then we're no better than we are today.  I thought the whole point 
of this thread of discussion was to allow overlapping I/O regions to 
be handled in a better way than we do today?


It is, and the goal is achieved.  Right now the code saves the old 
contents in isa_page_descs.  With the new approach it calls 
memory_region_del_subregion() and the previous contents magically appear 
(or new contents if they changed in the meanwhile).


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.




Re: [Qemu-devel] [RFC] Memory API

2011-05-20 Thread Avi Kivity

On 05/19/2011 09:22 PM, Gleb Natapov wrote:


  BARs may overlap with other BARs or with RAM. That's well-known, so PCI
  bridges need to register their regions with the _overlap variant
  unconditionally. In contrast to the current PhysPageDesc mechanism, the
With what priority?


It doesn't matter, since the spec doesn't define priorities among PCI BARs.


If it needs to call _overlap unconditionally why not
always call _overlap and drop the non-_overlap variant?


Other uses need non-overlapping registration.



  And they do not need to. The APIC regions will be managed by the per-CPU
  region management, reusing the tool box we need for all bridges. It will
  register the APIC page with a priority higher than the default one, thus
  overriding everything that comes from the host bridge. I think that
  reflects pretty well real machine behaviour.

What is higher? How does it know that priority is high enough?


It is well known that 1 > 0, for example.


I
thought, from reading other replies, that priorities are meaningful
only on the same hierarchy level (which kinda makes sense), but now you
are saying that you will override PCI address from another part of
the topology?


-- per-cpu memory
|
+--- apic page (prio 1)
|
+--- global memory (prio 0)

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.




Re: [Qemu-devel] [PATCH 19/26] target-xtensa: implement loop option

2011-05-20 Thread Max Filippov
  +    if (env->sregs[LEND] != v) {
  +        tb_invalidate_phys_page_range(
  +                env->sregs[LEND] - 1, env->sregs[LEND], 0);
  +        env->sregs[LEND] = v;
  +        tb_invalidate_phys_page_range(
  +                env->sregs[LEND] - 1, env->sregs[LEND], 0);
  +    }

 Why are you invalidating twice?

 To invalidate the TBs at both the old and the new LEND, although it would
 work correctly without the first invalidation.

  +static void gen_check_loop_end(DisasContext *dc, int slot)
  +{
  +    if (option_enabled(dc, XTENSA_OPTION_LOOP) &&
  +            !(dc->tb->flags & XTENSA_TBFLAG_EXCM) &&
  +            dc->next_pc == dc->lend) {
  +        int label = gen_new_label();
  +
  +        tcg_gen_brcondi_i32(TCG_COND_NE, cpu_SR[LEND], dc->next_pc, label);
  +        tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_SR[LCOUNT], 0, label);
  +        tcg_gen_subi_i32(cpu_SR[LCOUNT], cpu_SR[LCOUNT], 1);
  +        gen_jump(dc, cpu_SR[LBEG]);
  +        gen_set_label(label);
  +        gen_jumpi(dc, dc->next_pc, slot);

 If you're going to pretend that LEND is a constant, you might as well
 pretend that LBEG is also a constant, so that you get to chain the TB's
 around the loop.

 But there may be three exits from TB at the LEND if its last command is a 
 branch: to the LBEG, to the branch target and to the next insn.

OK, I guess I need to add gen_wsr_lbeg that invalidates the TB at the
current LEND, pretend that LBEG is constant and use the given slot to
jump to it. And also get rid of tcg_gen_brcondi_i32(TCG_COND_NE,
cpu_SR[LEND], dc->next_pc, label);

-- 
Thanks.
-- Max
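For reference, a plain-C sketch of what the generated loop-end check does, written as an interpreter step rather than TCG ops. struct loop_regs and loop_next_pc() are hypothetical, and the EXCM condition is omitted. With LCOUNT = n when the body is entered, the body executes n + 1 times.

```c
#include <stdint.h>

/* Hypothetical subset of the Xtensa loop-option special registers. */
struct loop_regs {
    uint32_t lbeg, lend, lcount;
};

/* PC of the next instruction after one whose fall-through PC is next_pc:
 * at LEND with LCOUNT != 0, decrement LCOUNT and branch back to LBEG. */
static uint32_t loop_next_pc(struct loop_regs *r, uint32_t next_pc)
{
    if (next_pc == r->lend && r->lcount != 0) {
        r->lcount--;
        return r->lbeg;
    }
    return next_pc;
}
```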



[Qemu-devel] [PATCH] hw/sd.c: Don't complain about SDIO commands CMD52/CMD53

2011-05-20 Thread Peter Maydell
The SDIO specification introduces new commands 52 and 53.
Handle as illegal command but do not complain on stderr,
as SDIO-aware OSes (including Linux) may legitimately use
these in their probing for presence of an SDIO card.

Signed-off-by: Peter Maydell <peter.mayd...@linaro.org>
---
 hw/sd.c |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/hw/sd.c b/hw/sd.c
index f44a970..cedfb20 100644
--- a/hw/sd.c
+++ b/hw/sd.c
@@ -1104,6 +1104,17 @@ static sd_rsp_type_t sd_normal_command(SDState *sd,
 }
 break;
 
+case 52:
+case 53:
+/* CMD52, CMD53: reserved for SDIO cards
+ * (see the SDIO Simplified Specification V2.0)
+ * Handle as illegal command but do not complain
+ * on stderr, as some OSes may use these in their
+ * probing for presence of an SDIO card.
+ */
+sd->card_status |= ILLEGAL_COMMAND;
+return sd_r0;
+
 /* Application specific commands (Class 8) */
 case 55:   /* CMD55:  APP_CMD */
 if (sd->rca != rca)
-- 
1.7.1




Re: [Qemu-devel] [RFC] Memory API

2011-05-20 Thread Avi Kivity

On 05/19/2011 09:18 PM, Anthony Liguori wrote:

On 05/19/2011 09:11 AM, Avi Kivity wrote:

On 05/19/2011 05:04 PM, Anthony Liguori wrote:


Right, the chipset register is mainly used to program the contents of
SMM.

There is a single access pin that has effectively the same semantics
as setting the chipset register.

It's not a per-CPU setting--that's the point. You can't have one CPU
reading SMM memory at exactly the same time as accessing VGA.

But I guess you can never have two simultaneous accesses anyway so
perhaps it's splitting hairs :-)


Exactly - it just works.


Well, not really.

kvm.ko has a global mapping of RAM regions and currently only allows 
code execution from RAM.


This means the only way for QEMU to enable SMM support is to program 
the global RAM regions table to allow RAM access for the VGA
region.


The problem with this is that it's perfectly conceivable to have CPU 0 
in SMM mode while CPU 1 is doing MMIO to the VGA planar.


kvm needs updates to support SMM; I already outlined them several months 
ago.




The same problem exists with PAM. 


PAM is a completely different problem.  The changes are global and fit 
kvm slot management.


It would be much easier to implement PAM correctly in QEMU if it were 
possible to execute code via MMIO as we could just mark the BIOS 
memory as non-RAM and deal with the dispatch ourselves.


Would it be fundamentally hard to support this in KVM?  I guess you 
would need to put the VCPU in single step mode and maintain a page to 
copy the results into.


You need to emulate everything.  We're probably not far from that.  
However there may be a significant performance loss.



--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.




Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-20 Thread Avi Kivity

On 05/19/2011 10:07 PM, Alex Williamson wrote:

On Thu, 2011-05-19 at 10:12 -0400, Avi Kivity wrote:
  The memory API separates the attributes of a memory region (its size, how
  reads or writes are handled, dirty logging, and coalescing) from where it
  is mapped and whether it is enabled.  This allows a device to configure
  a memory region once, then hand it off to its parent bus to map it according
  to the bus configuration.

  Hierarchical registration also allows a device to compose a region out of
  a number of sub-regions with different properties; for example some may be
  RAM while others may be MMIO.

  +/* Guest-visible constraints: */
  +struct {
  +/* If nonzero, specify bounds on access sizes beyond which a machine
  + * check is thrown.
  + */
  +unsigned min_access_size;
  +unsigned max_access_size;

Do we always support all access sizes between min and max?


As far as I can tell, yes.


This might
be easier to describe as a bitmap of supported power of 2 access sizes.


This is uglier to initialize.  However we can provide #defines for 
common use (MEM_ACCESS_BYTE_TO_LONG, MEM_ACCESS_LONG).


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
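A sketch of the bitmap alternative, with hypothetical MEM_ACCESS_* names modeled on the ones suggested above. Because bit n stands for a 2^n-byte access, each bit's value equals the access size it allows, so checking an access is a single AND. min_size is assumed to be nonzero.

```c
#include <stdbool.h>

#define MEM_ACCESS_BYTE  (1u << 0)   /* 1-byte accesses */
#define MEM_ACCESS_WORD  (1u << 1)   /* 2-byte accesses */
#define MEM_ACCESS_LONG  (1u << 2)   /* 4-byte accesses */
#define MEM_ACCESS_QUAD  (1u << 3)   /* 8-byte accesses */
#define MEM_ACCESS_BYTE_TO_LONG \
    (MEM_ACCESS_BYTE | MEM_ACCESS_WORD | MEM_ACCESS_LONG)

/* Expand a [min_size, max_size] range in bytes into the bitmap form. */
static unsigned access_bitmap(unsigned min_size, unsigned max_size)
{
    unsigned bits = 0;

    for (unsigned n = 0; n < 32; n++) {
        unsigned size = 1u << n;
        if (size > max_size) {
            break;
        }
        if (size >= min_size) {
            bits |= size;   /* bit n has the value 2^n == size */
        }
    }
    return bits;
}

/* Valid if the size is a power of two whose bit is set. */
static bool access_allowed(unsigned bitmap, unsigned size)
{
    bool pow2 = size != 0 && (size & (size - 1)) == 0;
    return pow2 && (bitmap & size) != 0;
}
```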




Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-20 Thread Avi Kivity

On 05/19/2011 10:27 PM, Jan Kiszka wrote:

On 2011-05-19 16:12, Avi Kivity wrote:
  +/* Sets an offset to be added to MemoryRegionOps callbacks. */
  +void memory_region_set_offset(MemoryRegion *mr, target_phys_addr_t offset);

Please mark this as a legacy helper, ideally to be removed after the
complete conversion to this API. During that phase we should try to
identify those devices which still depend on offset=0 and maybe directly
fix them.


Okay.


  +/* Turn logging on or off for specified client (display, migration) */
  +void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client);
  +/* Enable memory coalescing for the region.  MMIO ->write callbacks may be
  + * delayed until a non-coalesced MMIO is issued.
  + */
  +void memory_region_set_coalescing(MemoryRegion *mr);
  +/* Enable memory coalescing for a sub-range of the region.  MMIO ->write
  + * callbacks may be delayed until a non-coalesced MMIO is issued.
  + */
  +void memory_region_add_coalescing(MemoryRegion *mr,
  +  target_phys_addr_t offset,
  +  target_phys_addr_t size);

Will this be such a common use case that requesting the user to split up
the region and then use set_coalescing will generate too much
boilerplate code?


Look at e1000, coalescing ranges have byte granularity.


  +/* Disable MMIO coalescing for the region. */
  +void memory_region_clear_coalescing(MemoryRegion *mr);

And what about clearing coalescing for sub-ranges?


Clear them all and rebuild.


Maybe skip
add_coalescing for the first run and see how far we get.


We get as far as e.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.




Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-20 Thread Avi Kivity

On 05/19/2011 11:43 PM, Anthony Liguori wrote:

On 05/19/2011 09:12 AM, Avi Kivity wrote:
The memory API separates the attributes of a memory region (its size, how
reads or writes are handled, dirty logging, and coalescing) from where it
is mapped and whether it is enabled.  This allows a device to configure
a memory region once, then hand it off to its parent bus to map it according
to the bus configuration.

Hierarchical registration also allows a device to compose a region out of
a number of sub-regions with different properties; for example some may be
RAM while others may be MMIO.

+struct {
+/* If nonzero, specify bounds on access sizes beyond which a machine
+ * check is thrown.
+ */
+unsigned min_access_size;
+unsigned max_access_size;
+/* If true, unaligned accesses are supported.  Otherwise unaligned
+ * accesses throw machine checks.
+ */
+bool unaligned;
+} valid;


Under what circumstances would this be used?

The behavior of devices that receive non-natural accesses varies wildly.

For PCI devices, invalid accesses almost always return ~0.  I can't 
think of a device where an MCE would occur.


This was requested by Richard, so I'll let him comment.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.




Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-20 Thread Avi Kivity

On 05/20/2011 12:04 AM, Stefan Weil wrote:

On 19.05.2011 16:12, Avi Kivity wrote:
The memory API separates the attributes of a memory region (its size, how
reads or writes are handled, dirty logging, and coalescing) from where it
is mapped and whether it is enabled. This allows a device to configure
a memory region once, then hand it off to its parent bus to map it according
to the bus configuration.

Hierarchical registration also allows a device to compose a region out of
a number of sub-regions with different properties; for example some may be
RAM while others may be MMIO.



--- /dev/null
+++ b/memory.h
@@ -0,0 +1,142 @@
+#ifndef MEMORY_H
+#define MEMORY_H
+
+#include <stdint.h>
+#include <stdbool.h>


stdbool.h is already included in qemu-common.h,
stdint.h (indirectly) too.

Therefore both include statements can be removed.


We shouldn't rely on indirect includes, it makes updating headers very 
hard.  Each header should #include what it directly needs and no more.



+typedef struct CoalescedMemoryRange CoalescedMemoryRange;
+
+struct CoalescedMemoryRange {
+ target_phys_addr_t start;
+ target_phys_addr_t size;
+ QTAILQ_ENTRY(coalesced_ranges) link;
+};
+
+struct MemoryRegion {
+ /* All fields are private - violators will be prosecuted */


Is it possible to move this private declaration into the implementation
file (or a private header file if the declaration is needed by more than
one file)?



No, the structure size is needed by clients.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.




[Qemu-devel] Share a directory between a linux host and a windows guest w/o network?

2011-05-20 Thread Torsten Förtsch
Hi,

is it possible to share a directory between a Linux host and a Windows
guest? Something similar to Samba, but independent of the network?

I have searched for combinations of v9fs or virtio-9p and windows 
but didn't find anything relevant.

Thanks,
Torsten Förtsch



Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-20 Thread Avi Kivity

On 05/20/2011 12:11 AM, Stefan Hajnoczi wrote:

On Thu, May 19, 2011 at 3:12 PM, Avi Kivity <a...@redhat.com> wrote:
  +struct MemoryRegion {
  +/* All fields are private - violators will be prosecuted */
  +const MemoryRegionOps *ops;
  +MemoryRegion *parent;

In the case where a region is aliased (mapped twice into the address
space at different addresses) I need two MemoryRegions?


Yes.


The
MemoryRegion describes an actual mapping in the <parent, addr,
ram_addr> tuple, not just the attributes of the region (ops, size,
...).


Correct.  The region is not just a read-only descriptor.  
memory_region_add_subregion() can be used only once on a region (unless 
you _del_subregion() in between).


(it also follows from the fact that there is no separate opaque for 
registration, and from the fact that RAM is owned by the region, not 
provided as part of registration).


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.




Re: [Qemu-devel] [RFC] Memory API

2011-05-20 Thread Avi Kivity

On 05/19/2011 07:27 PM, Gleb Natapov wrote:

  Think of how a window manager folds windows with priorities onto a
  flat framebuffer.

  You do a depth-first walk of the tree.  For each child list, you
  iterate it from the lowest to highest priority, allowing later
  subregions to override earlier subregions.

I do not think that a window manager is a good analogy. A window can
overlap with only its siblings. In our memory tree each final node may
overlap with any other node in the tree.



Transparent windows.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.




Re: [Qemu-devel] [PATCH 01/11] target-ppc: remove old CONFIG_SOFTFLOAT #ifdef

2011-05-20 Thread Peter Maydell
On 15 May 2011 15:13, Aurelien Jarno <aurel...@aurel32.net> wrote:
 target-ppc has been switched to softfloat only long ago, but a
 few #ifdef CONFIG_SOFTFLOAT have been forgotten. Remove them.

 Cc: Alexander Graf <ag...@suse.de>
 Signed-off-by: Aurelien Jarno <aurel...@aurel32.net>

Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>



[Qemu-devel] Protest against fiscal, judicial and banking oppression

2011-05-20 Thread Protesta popolare
To view the site, click here
You too can be part of
Italia che lavora (Italy at work).
Protest site:
Fiscal, judicial and banking
Forward this message to your friends
To view the site, click here


Re: [Qemu-devel] [PATCH 03/11] softfloat-native: remove

2011-05-20 Thread Peter Maydell
On 15 May 2011 15:13, Aurelien Jarno <aurel...@aurel32.net> wrote:
 Remove softfloat-native support, all targets are now using softfloat
 instead.

 Signed-off-by: Aurelien Jarno <aurel...@aurel32.net>

Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>



Re: [Qemu-devel] [PATCH 04/11] softfloat: always enable floatx80 and float128 support

2011-05-20 Thread Peter Maydell
On 15 May 2011 15:13, Aurelien Jarno <aurel...@aurel32.net> wrote:
 Now that softfloat-native is gone, there is no real point in not always
 enabling floatx80 and float128 support.

 Signed-off-by: Aurelien Jarno <aurel...@aurel32.net>

Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>



Re: [Qemu-devel] [PATCH 05/11] target-i386: remove old code handling float64

2011-05-20 Thread Peter Maydell
On 15 May 2011 15:13, Aurelien Jarno <aurel...@aurel32.net> wrote:
 Now that target-i386 uses softfloat, floatx80 is always available and
 there is no need anymore to have code handling both float64 and floatx80.

 Signed-off-by: Aurelien Jarno <aurel...@aurel32.net>

This patch is OK in terms of how it leaves the code, but I think some
parts of it are out of sequence with the rest of the patchset.

For instance:
 -#ifdef FLOATX80
 -#define USE_X86LDOUBLE
 -#endif

We've already removed the FLOATX80 define in a previous patch,
so if we don't delete the x86 use of it until this patch then
the behaviour will briefly flip-flop as you go through the patch
stack, which could be bad for bisection.

 -#if defined(CONFIG_SOFTFLOAT)
 -# define floatx_lg2 make_floatx80( 0x3ffd, 0x9a209a84fbcff799LL )
 -# define floatx_l2e make_floatx80( 0x3fff, 0xb8aa3b295c17f0bcLL )
 -# define floatx_l2t make_floatx80( 0x4000, 0xd49a784bcd1b8afeLL )
 -#else
 -# define floatx_lg2 (0.30102999566398119523L)
 -# define floatx_l2e (1.44269504088896340739L)
 -# define floatx_l2t (3.32192809488736234781L)
 -#endif

Similarly, this #ifdeffery should have gone away when we took
out CONFIG_SOFTFLOAT, not later.

(Also the patch was a bit of a pig to review because it combines
several distinct mostly-mechanical transformations.)

-- PMM
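To check that the make_floatx80() constants match the long double literals in the removed #else branch (log10 2, log2 e and log2 10), one can decode them by hand. A sketch that handles normal numbers only and uses repeated doubling/halving instead of ldexp() so it needs no libm; the function name is hypothetical.

```c
#include <stdint.h>

/* Decode an x87 80-bit extended value from its sign/exponent word and its
 * 64-bit significand, which carries an explicit integer bit at bit 63. */
static double floatx80_to_double(uint16_t sign_exp, uint64_t mant)
{
    int e = (sign_exp & 0x7fff) - 16383;              /* remove the bias */
    double v = (double)mant / 9223372036854775808.0;  /* mant / 2^63 */

    while (e > 0) {
        v *= 2.0;
        e--;
    }
    while (e < 0) {
        v /= 2.0;
        e++;
    }
    return (sign_exp & 0x8000) ? -v : v;
}
```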



Re: [Qemu-devel] [PATCH 06/11] target-i386: use floatx80 constants in helper_fld*_ST0()

2011-05-20 Thread Peter Maydell
On 15 May 2011 15:13, Aurelien Jarno <aurel...@aurel32.net> wrote:
 Instead of using a table which doesn't correspond to anything
 physical in the CPU, use the constants directly in helper_fld*_ST0().

Actually I rather suspect there is effectively a table in the CPU
indexed by the last 3 bits of the FLD* opcode... It would be
possible to implement this group of insns in QEMU with a single
helper function that took the index into the array, but since the
array seems to be causing weird compilation problems we might
as well stick with the lots-of-helpers approach, at which point
this is a sensible cleanup.

Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>
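A sketch of the single-helper alternative mentioned above, with double stand-ins for the floatx80 constants; the table and function names are hypothetical. D9 E8 through D9 EE encode FLD1, FLDL2T, FLDL2E, FLDPI, FLDLG2, FLDLN2 and FLDZ, so the low 3 bits of the second opcode byte select the constant.

```c
#include <stdint.h>

/* Double approximations of the x87 constants; real helpers would load
 * floatx80 values with more precision. Indexed by (opcode2 & 7). */
static const double fld_constants[7] = {
    1.0,                    /* D9 E8: FLD1   */
    3.32192809488736234781, /* D9 E9: FLDL2T */
    1.44269504088896340739, /* D9 EA: FLDL2E */
    3.14159265358979323846, /* D9 EB: FLDPI  */
    0.30102999566398119523, /* D9 EC: FLDLG2 */
    0.69314718055994530942, /* D9 ED: FLDLN2 */
    0.0,                    /* D9 EE: FLDZ   */
};

/* Returns 0 and stores the constant for D9 E8..EE, or -1 otherwise. */
static int fld_const(uint8_t opcode2, double *out)
{
    if (opcode2 < 0xe8 || opcode2 > 0xee) {
        return -1;
    }
    *out = fld_constants[opcode2 & 7];
    return 0;
}
```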



Re: [Qemu-devel] [PATCH 07/11] softfloat: add float*_is_zero_or_denormal()

2011-05-20 Thread Peter Maydell
On 15 May 2011 15:13, Aurelien Jarno <aurel...@aurel32.net> wrote:
 float*_is_zero_or_denormal() is available for float32, but not for
 float64, floatx80 and float128. Fix that.

 Signed-off-by: Aurelien Jarno <aurel...@aurel32.net>

Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>
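For float64 the predicate comes down to the 11-bit exponent field being zero, regardless of sign and fraction. A sketch over raw IEEE 754 bit patterns; the function name is hypothetical and it does not use softfloat's float64 type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Zero or denormal exactly when the exponent field (bits 62..52) is 0. */
static bool float64_bits_is_zero_or_denormal(uint64_t bits)
{
    return (bits & 0x7ff0000000000000ULL) == 0;
}
```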



[Qemu-devel] [PATCH v4 0/3] Coroutines for better asynchronous programming

2011-05-20 Thread Stefan Hajnoczi
QEMU is event-driven and suffers when blocking operations are performed because
VM execution may be stopped until the operation completes.  Therefore many
operations that could block are performed asynchronously and a callback is
invoked when the operation has completed.  This allows QEMU to continue
executing while the operation is pending.

The downside to callbacks is that they split up code into many smaller
functions, each of which is a single step in a state machine that quickly
becomes complex and hard to understand.  Callback functions also result in lots
of noise as variables are packed and unpacked into temporary structs that pass
state to the callback function.

This patch series introduces coroutines as a solution for writing asynchronous
code while still having a nice sequential control flow.  The semantics are
explained in the first patch.  The second patch adds automated tests.
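To illustrate the sequential-control-flow idea only, here is a toy generator built directly on POSIX ucontext (the diffstat below lists a coroutine-ucontext.c backend). It assumes Linux/glibc, and none of these names belong to the qemu-coroutine.h API: toy_body() is written as a plain loop, and each toy_resume() runs it until its next toy_yield().

```c
#include <ucontext.h>

static ucontext_t caller_ctx, toy_ctx;
static long toy_stack[8192];   /* coroutine stack (64 KiB on LP64) */
static int toy_value;
static int toy_done;

/* Hand a value to the caller and suspend until resumed. */
static void toy_yield(int value)
{
    toy_value = value;
    swapcontext(&toy_ctx, &caller_ctx);
}

/* Straight-line code; no state struct or callback chain is needed. */
static void toy_body(void)
{
    for (int i = 1; i <= 3; i++) {
        toy_yield(i * 10);
    }
    toy_done = 1;
    /* Returning switches to uc_link, i.e. back to the last caller. */
}

static void toy_start(void)
{
    getcontext(&toy_ctx);
    toy_ctx.uc_stack.ss_sp = toy_stack;
    toy_ctx.uc_stack.ss_size = sizeof(toy_stack);
    toy_ctx.uc_link = &caller_ctx;
    makecontext(&toy_ctx, toy_body, 0);
    toy_done = 0;
}

/* Run until the next yield (returns 1 and stores the value) or until
 * the body finishes (returns 0). */
static int toy_resume(int *out)
{
    swapcontext(&caller_ctx, &toy_ctx);
    if (toy_done) {
        return 0;
    }
    *out = toy_value;
    return 1;
}
```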

A nice feature of coroutines is that it is relatively easy to take synchronous
code and lift it into a coroutine to make it asynchronous.  Work has been done
to move qcow2 request processing into coroutines and thereby make it
asynchronous (today qcow2 will perform synchronous metadata accesses).  This
qcow2 work is still ongoing and not quite ready for mainline yet.

Coroutines are also being used for virtfs (virtio-9p) so I have submitted this
patch now because virtfs patches that depend on coroutines are being published.

v4:
 * Windows Fibers support (Paolo Bonzini <pbonz...@redhat.com>)
 * Return-after-setjmp() fix (Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>)
 * Re-entrancy for multi-threaded coroutines support
 * qemu-coroutine.h cleanup and documentation

v3:
 * Updated LGPL v2 license header to use web link
 * Removed atexit(3) pool freeing
 * Removed thread-local current/leader
 * Documented thread-safety limitation
 * Disabled trace events

v2:
 * Added ./check-coroutine --lifecycle-benchmark for performance measurement
 * Split pooling into a separate patch with performance justification
 * Set maximum pool size to prevent holding onto too many free coroutines
 * Added atexit(3) handler to free pool
 * Coding style cleanups

Kevin Wolf (1):
  coroutine: introduce coroutines

Stefan Hajnoczi (2):
  coroutine: add check-coroutine automated tests
  coroutine: add check-coroutine --benchmark-lifecycle

 Makefile |3 +-
 Makefile.objs|7 ++
 check-coroutine.c|  236 ++
 coroutine-ucontext.c |  229 
 coroutine-win32.c|   92 +++
 qemu-coroutine-int.h |   48 ++
 qemu-coroutine.c |   75 
 qemu-coroutine.h |   95 
 trace-events |5 +
 9 files changed, 789 insertions(+), 1 deletions(-)
 create mode 100644 check-coroutine.c
 create mode 100644 coroutine-ucontext.c
 create mode 100644 coroutine-win32.c
 create mode 100644 qemu-coroutine-int.h
 create mode 100644 qemu-coroutine.c
 create mode 100644 qemu-coroutine.h

-- 
1.7.4.4




[Qemu-devel] [PATCH v4 2/3] coroutine: add check-coroutine automated tests

2011-05-20 Thread Stefan Hajnoczi
To run automated tests for coroutines:

  make check-coroutine
  ./check-coroutine

On success the program terminates with exit status 0.  On failure an
error message is written to stderr and the program exits with exit
status 1.

Signed-off-by: Stefan Hajnoczi <stefa...@linux.vnet.ibm.com>
---
 Makefile  |3 +-
 check-coroutine.c |  188 +
 2 files changed, 190 insertions(+), 1 deletions(-)
 create mode 100644 check-coroutine.c

diff --git a/Makefile b/Makefile
index 2b0438c..69c08c2 100644
--- a/Makefile
+++ b/Makefile
@@ -132,7 +132,7 @@ qemu-io$(EXESUF): qemu-io.o cmd.o qemu-tool.o qemu-error.o $(oslib-obj-y) $(trac
 qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx
$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"  GEN   $@")
 
-check-qint.o check-qstring.o check-qdict.o check-qlist.o check-qfloat.o check-qjson.o: $(GENERATED_HEADERS)
+check-qint.o check-qstring.o check-qdict.o check-qlist.o check-qfloat.o check-qjson.o check-coroutine.o: $(GENERATED_HEADERS)
 
 CHECK_PROG_DEPS = qemu-malloc.o $(oslib-obj-y) $(trace-obj-y)
 
@@ -142,6 +142,7 @@ check-qdict: check-qdict.o qdict.o qfloat.o qint.o qstring.o qbool.o qlist.o $(C
 check-qlist: check-qlist.o qlist.o qint.o $(CHECK_PROG_DEPS)
 check-qfloat: check-qfloat.o qfloat.o $(CHECK_PROG_DEPS)
 check-qjson: check-qjson.o qfloat.o qint.o qdict.o qstring.o qlist.o qbool.o qjson.o json-streamer.o json-lexer.o json-parser.o $(CHECK_PROG_DEPS)
+check-coroutine: check-coroutine.o $(coroutine-obj-y) $(CHECK_PROG_DEPS)
 
 QEMULIBS=libhw32 libhw64 libuser libdis libdis-user
 
diff --git a/check-coroutine.c b/check-coroutine.c
new file mode 100644
index 0000000..f65ac2e
--- /dev/null
+++ b/check-coroutine.c
@@ -0,0 +1,188 @@
+/*
+ * Coroutine tests
+ *
+ * Copyright IBM, Corp. 2011
+ *
+ * Authors:
+ *  Stefan Hajnoczi <stefa...@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include "qemu-coroutine.h"
+
+static const char *cur_test_name;
+
+static void test_assert(bool condition, const char *msg)
+{
+    if (!condition) {
+        fprintf(stderr, "%s: %s\n", cur_test_name, msg);
+        exit(EXIT_FAILURE);
+    }
+}
+
+/*
+ * Check that qemu_in_coroutine() works
+ */
+
+static void coroutine_fn verify_in_coroutine(void *opaque)
+{
+    test_assert(qemu_in_coroutine(), "expected coroutine context");
+}
+
+static void test_in_coroutine(void)
+{
+    Coroutine *coroutine;
+
+    test_assert(!qemu_in_coroutine(), "expected no coroutine context");
+
+    coroutine = qemu_coroutine_create(verify_in_coroutine);
+    qemu_coroutine_enter(coroutine, NULL);
+}
+
+/*
+ * Check that qemu_coroutine_self() works
+ */
+
+static void coroutine_fn verify_self(void *opaque)
+{
+    test_assert(qemu_coroutine_self() == opaque,
+                "qemu_coroutine_self() did not return this coroutine");
+}
+
+static void test_self(void)
+{
+    Coroutine *coroutine;
+
+    coroutine = qemu_coroutine_create(verify_self);
+    qemu_coroutine_enter(coroutine, coroutine);
+}
+
+/*
+ * Check that coroutines may nest multiple levels
+ */
+
+typedef struct {
+    unsigned int n_enter;   /* num coroutines entered */
+    unsigned int n_return;  /* num coroutines returned */
+    unsigned int max;       /* maximum level of nesting */
+} NestData;
+
+static void coroutine_fn nest(void *opaque)
+{
+    NestData *nd = opaque;
+
+    nd->n_enter++;
+
+    if (nd->n_enter < nd->max) {
+        Coroutine *child;
+
+        child = qemu_coroutine_create(nest);
+        qemu_coroutine_enter(child, nd);
+    }
+
+    nd->n_return++;
+}
+
+static void test_nesting(void)
+{
+    Coroutine *root;
+    NestData nd = {
+        .n_enter  = 0,
+        .n_return = 0,
+        .max      = 1,
+    };
+
+    root = qemu_coroutine_create(nest);
+    qemu_coroutine_enter(root, &nd);
+
+    test_assert(nd.n_enter == nd.max,
+                "failed entering to max nesting level");
+    test_assert(nd.n_return == nd.max,
+                "failed returning from max nesting level");
+}
+
+/*
+ * Check that yield/enter transfer control correctly
+ */
+
+static void coroutine_fn yield_5_times(void *opaque)
+{
+    bool *done = opaque;
+    int i;
+
+    for (i = 0; i < 5; i++) {
+        qemu_coroutine_yield();
+    }
+    *done = true;
+}
+
+static void test_yield(void)
+{
+    Coroutine *coroutine;
+    bool done = false;
+    int i = -1; /* one extra time to return from coroutine */
+
+    coroutine = qemu_coroutine_create(yield_5_times);
+    while (!done) {
+        qemu_coroutine_enter(coroutine, &done);
+        i++;
+    }
+    test_assert(i == 5, "coroutine did not yield 5 times");
+}
+
+/*
+ * Check that creation, enter, and return work
+ */
+
+static void coroutine_fn set_and_exit(void *opaque)
+{
+    bool *done = opaque;
+
+    *done = true;
+}
+
+static void test_lifecycle(void)
+{
+

[Qemu-devel] [PATCH v4 1/3] coroutine: introduce coroutines

2011-05-20 Thread Stefan Hajnoczi
From: Kevin Wolf <kw...@redhat.com>

Asynchronous code is becoming very complex.  At the same time
synchronous code is growing because it is convenient to write.
Sometimes duplicate code paths are even added, one synchronous and the
other asynchronous.  This patch introduces coroutines which allow code
that looks synchronous but is asynchronous under the covers.

A coroutine has its own stack and is therefore able to preserve state
across blocking operations, which traditionally require callback
functions and manual marshalling of parameters.

Creating and starting a coroutine is easy:

  coroutine = qemu_coroutine_create(my_coroutine);
  qemu_coroutine_enter(coroutine, my_data);

The coroutine then executes until it returns or yields:

  void coroutine_fn my_coroutine(void *opaque) {
      MyData *my_data = opaque;

      /* do some work */

      qemu_coroutine_yield();

      /* do some more work */
  }

Yielding switches control back to the caller of qemu_coroutine_enter().
This is typically used to switch back to the main thread's event loop
after issuing an asynchronous I/O request.  The request callback will
then invoke qemu_coroutine_enter() once more to switch back to the
coroutine.
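The create/enter/yield hand-off described above can be illustrated with a small, self-contained sketch built directly on POSIX getcontext()/makecontext()/swapcontext(). This is not the code in this patch (which, per the commit message, switches with setjmp()/longjmp() rather than swapcontext()); the MiniCo type and the mini_co_* functions are hypothetical names, it assumes a platform that still provides <ucontext.h> (for example glibc on Linux), and error handling is omitted.

```c
#define _XOPEN_SOURCE 600  /* some platforms require this for <ucontext.h> */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <ucontext.h>

/* Hypothetical minimal coroutine, for illustration only. */
typedef struct MiniCo {
    ucontext_t caller;          /* where to resume on yield or termination */
    ucontext_t self;            /* the coroutine's own saved context */
    void (*entry)(void *opaque);
    void *opaque;
    bool finished;
    char *stack;
} MiniCo;

static MiniCo *current_co;      /* NULL while running outside any coroutine */

static void mini_co_trampoline(void)
{
    MiniCo *co = current_co;

    co->entry(co->opaque);
    co->finished = true;
    /* Returning from here resumes co->caller through uc_link. */
}

static MiniCo *mini_co_create(void (*entry)(void *opaque), size_t stack_size)
{
    MiniCo *co = calloc(1, sizeof(*co));

    co->stack = malloc(stack_size);
    co->entry = entry;
    getcontext(&co->self);
    co->self.uc_stack.ss_sp = co->stack;
    co->self.uc_stack.ss_size = stack_size;
    co->self.uc_link = &co->caller;
    makecontext(&co->self, mini_co_trampoline, 0);
    return co;
}

/* Run co until it yields or returns; opaque reaches entry on the first enter. */
static void mini_co_enter(MiniCo *co, void *opaque)
{
    MiniCo *prev = current_co;  /* remember the caller to allow nesting */

    assert(!co->finished);
    co->opaque = opaque;
    current_co = co;
    swapcontext(&co->caller, &co->self);
    current_co = prev;
}

/* Switch back to whoever called mini_co_enter() for the running coroutine. */
static void mini_co_yield(void)
{
    MiniCo *co = current_co;

    swapcontext(&co->self, &co->caller);
}

static void mini_co_free(MiniCo *co)
{
    free(co->stack);
    free(co);
}

/* Example entry point mirroring yield_5_times() from check-coroutine.c. */
static void yield_five_times(void *opaque)
{
    bool *done = opaque;

    for (int i = 0; i < 5; i++) {
        mini_co_yield();
    }
    *done = true;
}
```

As in test_yield(), a caller that loops on mini_co_enter() until done is set enters the coroutine six times: five enters return through a yield and the last one returns when the entry function finishes.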

Note that if coroutines are used only from threads which hold the global
mutex, they will never execute concurrently.  This makes programming with
coroutines easier than with threads.  Race conditions cannot occur since
only one coroutine may be active at any time; other coroutines can only
run once the active one yields.

This coroutines implementation is based on the gtk-vnc implementation
written by Anthony Liguori <anth...@codemonkey.ws> but it has been
significantly rewritten by Kevin Wolf <kw...@redhat.com> to use
setjmp()/longjmp() instead of the more expensive swapcontext() and by
Paolo Bonzini <pbonz...@redhat.com> for Windows Fibers support.

Signed-off-by: Kevin Wolf <kw...@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefa...@linux.vnet.ibm.com>
---
 Makefile.objs        |    7 ++
 coroutine-ucontext.c |  229 ++
 coroutine-win32.c    |   92 
 qemu-coroutine-int.h |   48 +++
 qemu-coroutine.c     |   75 
 qemu-coroutine.h     |   95 +
 trace-events         |    5 +
 7 files changed, 551 insertions(+), 0 deletions(-)
 create mode 100644 coroutine-ucontext.c
 create mode 100644 coroutine-win32.c
 create mode 100644 qemu-coroutine-int.h
 create mode 100644 qemu-coroutine.c
 create mode 100644 qemu-coroutine.h

diff --git a/Makefile.objs b/Makefile.objs
index 4478c61..a8dbd15 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -11,6 +11,12 @@ oslib-obj-$(CONFIG_WIN32) += oslib-win32.o qemu-thread-win32.o
 oslib-obj-$(CONFIG_POSIX) += oslib-posix.o qemu-thread-posix.o
 
 ###
+# coroutines
+coroutine-obj-y = qemu-coroutine.o
+coroutine-obj-$(CONFIG_POSIX) += coroutine-ucontext.o
+coroutine-obj-$(CONFIG_WIN32) += coroutine-win32.o
+
+###
 # block-obj-y is code used by both qemu system emulation and qemu-img
 
 block-obj-y = cutils.o cache-utils.o qemu-malloc.o qemu-option.o module.o async.o
@@ -67,6 +73,7 @@ common-obj-y += readline.o console.o cursor.o qemu-error.o
 common-obj-y += $(oslib-obj-y)
 common-obj-$(CONFIG_WIN32) += os-win32.o
 common-obj-$(CONFIG_POSIX) += os-posix.o
+common-obj-y += $(coroutine-obj-y)
 
 common-obj-y += tcg-runtime.o host-utils.o
 common-obj-y += irq.o ioport.o input.o
diff --git a/coroutine-ucontext.c b/coroutine-ucontext.c
new file mode 100644
index 0000000..bcea2bd
--- /dev/null
+++ b/coroutine-ucontext.c
@@ -0,0 +1,229 @@
+/*
+ * ucontext coroutine initialization code
+ *
+ * Copyright (C) 2006  Anthony Liguori <anth...@codemonkey.ws>
+ * Copyright (C) 2011  Kevin Wolf <kw...@redhat.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.0 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see http://www.gnu.org/licenses/.
+ */
+
+/* XXX Is there a nicer way to disable glibc's stack check for longjmp? */
+#ifdef _FORTIFY_SOURCE
+#undef _FORTIFY_SOURCE
+#endif
+#include <setjmp.h>
+#include <stdint.h>
+#include <pthread.h>
+#include <ucontext.h>
+#include "qemu-common.h"
+#include "qemu-coroutine-int.h"
+
+enum {
+    /* Maximum free pool size prevents holding too many freed coroutines */
+    POOL_MAX_SIZE = 64,
+};
+
+typedef struct {
+

[Qemu-devel] [PATCH v4 3/3] coroutine: add check-coroutine --benchmark-lifecycle

2011-05-20 Thread Stefan Hajnoczi
Add a microbenchmark for coroutine create, enter, and return (aka
lifecycle).  This is a useful benchmark because users are expected to
create many coroutines, one per I/O request for example, and we
therefore need to provide good performance in that scenario.

To run:

  make check-coroutine
  ./check-coroutine --benchmark-lifecycle 20000000

This will do 20,000,000 coroutine create, enter, return iterations and
print the resulting time.

Signed-off-by: Stefan Hajnoczi <stefa...@linux.vnet.ibm.com>
---
 check-coroutine.c |   48 
 1 files changed, 48 insertions(+), 0 deletions(-)

diff --git a/check-coroutine.c b/check-coroutine.c
index f65ac2e..8ed1a4f 100644
--- a/check-coroutine.c
+++ b/check-coroutine.c
@@ -11,8 +11,10 @@
  *
  */
 
+#include <string.h>
 #include <stdlib.h>
 #include <stdio.h>
+#include <sys/time.h>
 #include "qemu-coroutine.h"
 
 static const char *cur_test_name;
@@ -163,6 +165,43 @@ static void test_lifecycle(void)
     test_assert(done, "expected done to be true (second time)");
 }
 
+/*
+ * Lifecycle benchmark
+ */
+
+static void coroutine_fn empty_coroutine(void *opaque)
+{
+    /* Do nothing */
+}
+
+static void benchmark_lifecycle(const char *iterations)
+{
+    Coroutine *coroutine;
+    unsigned int i, max;
+    struct timeval start, finish;
+    time_t dsec;
+    long dusec;
+
+    max = atoi(iterations);
+
+    gettimeofday(&start, NULL);
+    for (i = 0; i < max; i++) {
+        coroutine = qemu_coroutine_create(empty_coroutine);
+        qemu_coroutine_enter(coroutine, NULL);
+    }
+    gettimeofday(&finish, NULL);
+
+    /* Borrow one second (1000000 us) if the microsecond field wrapped */
+    dsec = finish.tv_sec - start.tv_sec;
+    if (finish.tv_usec < start.tv_usec) {
+        dsec--;
+        dusec = finish.tv_usec + 1000000 - start.tv_usec;
+    } else {
+        dusec = finish.tv_usec - start.tv_usec;
+    }
+    printf("Lifecycle %u iterations: %ld sec %ld us\n",
+           max, (long)dsec, dusec);
+}
+
 #define TESTCASE(fn) { #fn, fn }
 int main(int argc, char **argv)
 {
@@ -179,6 +218,15 @@ int main(int argc, char **argv)
     };
     int i;
 
+    if (argc == 3 && strcmp(argv[1], "--benchmark-lifecycle") == 0) {
+        benchmark_lifecycle(argv[2]);
+        return EXIT_SUCCESS;
+    } else if (argc != 1) {
+        fprintf(stderr, "usage: %s [--benchmark-lifecycle iterations]\n",
+                argv[0]);
+        return EXIT_FAILURE;
+    }
+
     for (i = 0; testcases[i].name; i++) {
         cur_test_name = testcases[i].name;
         printf("%s\n", testcases[i].name);
-- 
1.7.4.4
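The manual borrow in benchmark_lifecycle() is easy to get wrong, since the carry must add one million microseconds. A minimal alternative, not part of the patch, is to convert both samples into a single microsecond count before subtracting; timeval_diff_usec() is a hypothetical helper name.

```c
#include <sys/time.h>

/* Elapsed microseconds between two gettimeofday() samples.  Working in a
 * single 64-bit count removes the need for a manual tv_usec borrow. */
static long long timeval_diff_usec(const struct timeval *start,
                                   const struct timeval *finish)
{
    return (long long)(finish->tv_sec - start->tv_sec) * 1000000LL +
           (long long)(finish->tv_usec - start->tv_usec);
}
```

The caller can then print `diff / 1000000` seconds and `diff % 1000000` microseconds, which stays correct when the microsecond field of the finish sample is smaller than that of the start sample.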




[Qemu-devel] [PATCH 2/2] Deprecate -M command line options

2011-05-20 Thread Jan Kiszka
-M is superseded by -machine. Therefore, this patch removes -M from the
help output and lists -machine in the place -M used to occupy.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 qemu-options.hx |   45 -
 1 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 0dbc028..1204a00 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -27,14 +27,29 @@ STEXI
 Display version information and exit
 ETEXI
 
-DEF("M", HAS_ARG, QEMU_OPTION_M,
-    "-M machine      select emulated machine (-M ? for list)\n", QEMU_ARCH_ALL)
+DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
+    "-machine [type=]name[,prop[=value][,...]]\n"
+    "                selects emulated machine (-machine ? for list)\n"
+    "                property accel=accel1[:accel2[:...]] selects accelerator\n"
+    "                supported accelerators are kvm, xen, tcg (default: tcg)\n",
+    QEMU_ARCH_ALL)
 STEXI
-@item -M @var{machine}
-@findex -M
-Select the emulated @var{machine} (@code{-M ?} for list)
+@item -machine [type=]@var{name}[,prop=@var{value}[,...]]
+@findex -machine
+Select the emulated machine by @var{name}. Use @code{-machine ?} to list
+available machines. Supported machine properties are:
+@table @option
+@item accel=@var{accels1}[:@var{accels2}[:...]]
+This is used to enable an accelerator. Depending on the target architecture,
+kvm, xen, or tcg can be available. By default, tcg is used. If there is more
+than one accelerator specified, the next one is used if the previous one fails
+to initialize.
+@end table
 ETEXI
 
+HXCOMM Deprecated by -machine
+DEF("M", HAS_ARG, QEMU_OPTION_M, "", QEMU_ARCH_ALL)
+
 DEF("cpu", HAS_ARG, QEMU_OPTION_cpu,
     "-cpu cpu        select CPU (-cpu ? for list)\n", QEMU_ARCH_ALL)
 STEXI
@@ -2032,26 +2047,6 @@ Enable KVM full virtualization support. This option is only available
 if KVM support is enabled when compiling.
 ETEXI
 
-DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
-    "-machine [type=]name[,prop[=value][,...]]\n"
-    "                selects emulated machine (-machine ? for list)\n"
-    "                property accel=accel1[:accel2[:...]] selects accelerator\n"
-    "                supported accelerators are kvm, xen, tcg (default: tcg)\n",
-    QEMU_ARCH_ALL)
-STEXI
-@item -machine [type=]@var{name}[,prop=@var{value}[,...]]
-@findex -machine
-Select the emulated machine by @var{name}. Use @code{-machine ?} to list
-available machines. Supported machine properties are:
-@table @option
-@item accel=@var{accels1}[:@var{accels2}[:...]]
-This is used to enable an accelerator. Depending on the target architecture,
-kvm, xen, or tcg can be available. By default, tcg is used. If there is more
-than one accelerator specified, the next one is used if the previous one fails
-to initialize.
-@end table
-ETEXI
-
 DEF("xen-domid", HAS_ARG, QEMU_OPTION_xen_domid,
     "-xen-domid id   specify xen guest domain id\n", QEMU_ARCH_ALL)
 DEF("xen-create", 0, QEMU_OPTION_xen_create,
-- 
1.7.1
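The accel property's fallback rule (try each ':'-separated accelerator in turn and use the first one that initializes) can be sketched as follows. try_accels() and the AccelInit callback are hypothetical names for illustration, not QEMU's internal code; a real implementation would call each accelerator's own init function.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical init hook: returns true if the named accelerator started. */
typedef bool (*AccelInit)(const char *name);

/* Walk a list such as "kvm:xen:tcg" left to right and stop at the first
 * accelerator whose init hook succeeds; its name is copied to chosen.
 * Empty names and names too long for the local buffer are skipped. */
static bool try_accels(const char *list, AccelInit init,
                       char *chosen, size_t chosen_len)
{
    const char *p = list;

    while (*p) {
        size_t len = strcspn(p, ":");
        char name[32];

        if (len > 0 && len < sizeof(name)) {
            memcpy(name, p, len);
            name[len] = '\0';
            if (init(name)) {
                snprintf(chosen, chosen_len, "%s", name);
                return true;
            }
        }
        p += len;
        if (*p == ':') {
            p++;
        }
    }
    return false;   /* no accelerator in the list could be initialized */
}
```

With `accel=kvm:tcg` on a host where KVM is unavailable, the kvm hook fails and tcg is chosen, matching the documented "the next one is used if the previous one fails to initialize" behaviour.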



[Qemu-devel] [PATCH 1/2] Generalize -machine command line option

2011-05-20 Thread Jan Kiszka
-machine somehow suggests that it selects the machine, but it doesn't.
Fix that before this command is set in stone.

Actually, -machine should supersede -M and allow introducing arbitrary
per-machine options on the command line. That will change the internal
implementation again, but we will be able to keep the user interface
stable.
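With `.implied_opt_name = "type"` in the qemu-config.c hunk, a leading element without `=` appears to be taken as the value of `type`, so `-machine pc,accel=kvm` and `-machine type=pc,accel=kvm` should be equivalent. The standalone sketch below illustrates only that rule; machine_opt_get() is a hypothetical helper and does not reproduce QEMU's option parser (escaping, boolean flags, and error reporting are ignored).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Look up key in "[type=]name[,prop=value...]".  A first element with no
 * '=' is treated as the value of the implied option "type".  Later
 * elements without '=' are simply skipped in this sketch. */
static bool machine_opt_get(const char *opts, const char *key,
                            char *out, size_t out_len)
{
    const char *p = opts;
    bool first = true;

    while (*p) {
        size_t len = strcspn(p, ",");
        const char *eq = memchr(p, '=', len);
        const char *name = NULL, *val = NULL;
        size_t name_len = 0, val_len = 0;

        if (eq) {
            name = p;
            name_len = (size_t)(eq - p);
            val = eq + 1;
            val_len = len - name_len - 1;
        } else if (first) {
            name = "type";              /* the implied option name */
            name_len = strlen("type");
            val = p;
            val_len = len;
        }
        if (name && name_len == strlen(key) &&
            strncmp(name, key, name_len) == 0) {
            snprintf(out, out_len, "%.*s", (int)val_len, val);
            return true;
        }
        first = false;
        p += len;
        if (*p == ',') {
            p++;
        }
    }
    return false;
}
```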

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 qemu-config.c   |    5 +
 qemu-options.hx |   20 +++-
 vl.c            |   34 +++---
 3 files changed, 39 insertions(+), 20 deletions(-)

diff --git a/qemu-config.c b/qemu-config.c
index 5d7ffa2..01751b4 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -452,9 +452,14 @@ QemuOptsList qemu_option_rom_opts = {
 
 static QemuOptsList qemu_machine_opts = {
     .name = "machine",
+    .implied_opt_name = "type",
     .head = QTAILQ_HEAD_INITIALIZER(qemu_machine_opts.head),
     .desc = {
         {
+            .name = "type",
+            .type = QEMU_OPT_STRING,
+            .help = "emulated machine"
+        }, {
             .name = "accel",
             .type = QEMU_OPT_STRING,
             .help = "accelerator list",
diff --git a/qemu-options.hx b/qemu-options.hx
index 82e085a..0dbc028 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2033,13 +2033,23 @@ if KVM support is enabled when compiling.
 ETEXI
 
 DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
-    "-machine accel=accel1[:accel2]    use an accelerator (kvm,xen,tcg), default is tcg\n", QEMU_ARCH_ALL)
+    "-machine [type=]name[,prop[=value][,...]]\n"
+    "                selects emulated machine (-machine ? for list)\n"
+    "                property accel=accel1[:accel2[:...]] selects accelerator\n"
+    "                supported accelerators are kvm, xen, tcg (default: tcg)\n",
+    QEMU_ARCH_ALL)
 STEXI
-@item -machine accel=@var{accels}
+@item -machine [type=]@var{name}[,prop=@var{value}[,...]]
 @findex -machine
-This is use to enable an accelerator, in kvm,xen,tcg.
-By default, it use only tcg. If there a more than one accelerator
-specified, the next one is used if the first don't work.
+Select the emulated machine by @var{name}. Use @code{-machine ?} to list
+available machines. Supported machine properties are:
+@table @option
+@item accel=@var{accels1}[:@var{accels2}[:...]]
+This is used to enable an accelerator. Depending on the target architecture,
+kvm, xen, or tcg can be available. By default, tcg is used. If there is more
+than one accelerator specified, the next one is used if the previous one fails
+to initialize.
+@end table
 ETEXI
 
 DEF("xen-domid", HAS_ARG, QEMU_OPTION_xen_domid,
diff --git a/vl.c b/vl.c
index b362871..4560376 100644
--- a/vl.c
+++ b/vl.c
@@ -2144,20 +2144,9 @@ int main(int argc, char **argv, char **envp)
             }
             switch(popt->index) {
             case QEMU_OPTION_M:
-                machine = find_machine(optarg);
-                if (!machine) {
-                    QEMUMachine *m;
-                    printf("Supported machines are:\n");
-                    for(m = first_machine; m != NULL; m = m->next) {
-                        if (m->alias)
-                            printf("%-10s %s (alias of %s)\n",
-                                   m->alias, m->desc, m->name);
-                        printf("%-10s %s%s\n",
-                               m->name, m->desc,
-                               m->is_default ? " (default)" : "");
-                    }
-                    exit(*optarg != '?');
-                }
+                olist = qemu_find_opts("machine");
+                qemu_opts_reset(olist);
+                qemu_opts_parse(olist, optarg, 1);
                 break;
             case QEMU_OPTION_cpu:
                 /* hw initialization will check this */
@@ -2675,11 +2664,26 @@ int main(int argc, char **argv, char **envp)
             case QEMU_OPTION_machine:
                 olist = qemu_find_opts("machine");
                 qemu_opts_reset(olist);
-                opts = qemu_opts_parse(olist, optarg, 0);
+                opts = qemu_opts_parse(olist, optarg, 1);
                 if (!opts) {
                     fprintf(stderr, "parse error: %s\n", optarg);
                     exit(1);
                 }
+                optarg = qemu_opt_get(opts, "type");
+                machine = optarg ? find_machine(optarg) : NULL;
+                if (!machine) {
+                    QEMUMachine *m;
+                    printf("Supported machines are:\n");
+                    for (m = first_machine; m != NULL; m = m->next) {
+                        if (m->alias) {
+                            printf("%-10s %s (alias of %s)\n",
+                                   m->alias, m->desc, m->name);
+                        }
+                        printf("%-10s %s%s\n", m->name, m->desc,
+                               m->is_default ? " (default)" : "");
+                    }
+                    exit(!optarg || *optarg != '?');
+                }
                 break;
 case QEMU_OPTION_usb:

Re: [Qemu-devel] Regression Warning: more nics requested than this machine supports

2011-05-20 Thread Peter Maydell
On 16 May 2011 17:58, Markus Armbruster <arm...@redhat.com> wrote:
>     $ qemu-system-x86_64 -nodefaults -enable-kvm -m 384 -vnc :0 -S -netdev user,id=net0 -device e1000,netdev=net0
>     Warning: more nics requested than this machine supports; some have been ignored
>     (qemu) info network
>     Devices not on any VLAN:
>       net0: net=10.0.2.0, restricted=n peer=e1000.0
>       e1000.0: model=e1000,macaddr=52:54:00:12:34:56 peer=net0
>
> Culprit is
>     net: Improve the warnings for dubious command line option combinations
>
> Its count of requested NICs is blissfully unaware of -device.  In my
> example, it comes up with nb_nics == 0 and seen_nics == 1.

As far as I can determine, -device e1000,netdev=0 doesn't go through
net_init_nic() and doesn't put an entry in the nd_table[] for the NIC.
This means it's broken, because a lot of board models look in nd_table[]
to determine whether the user requested a NIC and whether it's the right
type. So I think that in some ways this is just showing up an existing
problem with trying to instantiate a network card with -device.

-- PMM



Re: [Qemu-devel] [RFC] Memory API

2011-05-20 Thread Gleb Natapov
On Fri, May 20, 2011 at 09:40:13AM +0200, Jan Kiszka wrote:
 On 2011-05-20 09:23, Gleb Natapov wrote:
  On Thu, May 19, 2011 at 08:55:49PM +0200, Jan Kiszka wrote:
  Because we should catch accidental overlaps in all those non PCI 
  devices
  with hard-wired addressing. That's a bug in the device/machine model 
  and
  should be reported as such by QEMU.
  Why should we complicate API to catch unlikely errors? If you want to
  debug that add capability to dump memory map from the monitor.
 
  Because we need to switch tons of code that so far saw a fairly
  different reaction of the core to overlapping regions.
 
  How so? Today if there is accidental overlap device will not function 
  properly.
  With new API it will be the same.
 
  I rather expect subtle differences as overlapping registration changes
  existing regions, in the future those will recover.
 
  Where do you expect the differences will come from? Conversion to the new
  API shouldn't change the order of the registration and if the last
  registration will override previous one the end result should be the
  same as we have today.
 
 A) Removing regions will change significantly. So far this is done by
 setting a region to IO_MEM_UNASSIGNED, keeping truncation. With the new
 API that will be a true removal which will additionally restore hidden
 regions.
 
And what problem do you expect may arise from that? Currently, accessing
such a region after unassigning it results in undefined behaviour, so this
code is already broken today; you can't make it worse.

 B) Uncontrolled overlapping is a bug that should be caught by the core,
 and a new API is a perfect chance to do this.
 
Well, this will indeed introduce the difference in behaviour :) The guest
that ran before will abort now. Are you actually aware of any such
overlaps in the current code base?

But if priorities are going to stay, why not fail if two regions with the
same priority overlap? If that happens it means that the memory creation
didn't pass the point where conflict should have been resolved (by
assigning different priorities) and this means that overlap is
unintentional, no?

  
  new region management will not cause any harm to overlapping regions 
  so
  that they can recover when the overlap is gone.
 
 
  Another example may be APIC region and PCI. They overlap, but 
  neither
  CPU nor PCI knows about it.
 
  And they do not need to. The APIC regions will be managed by the 
  per-CPU
  region management, reusing the tool box we need for all bridges. It 
  will
  register the APIC page with a priority higher than the default one, 
  thus
  overriding everything that comes from the host bridge. I think that
  reflects pretty well real machine behaviour.
 
  What is higher? How does it know that priority is high enough?
 
  Because no one else manages priorities at a specific hierarchy level.
  There is only one.
 
  PCI and CPU are on different hierarchy levels. PCI is under the PIIX and
  CPU is on a system BUS.
 
  The priority for the APIC mapping will be applied at CPU level, of
  course. So it will override everything, not just PCI.
 
  So you do not need explicit priority because the place in hierarchy
  implicitly provides you with one.
 
  Yes.
  OK :) So you agree that we can do without priorities :)
 
 Nope, see below how your own example depends on them.
 
It depends on them in very defined way. Only layer that knows exactly
what is going on defines priorities. The priorities do not leak on any
other level or global database. It is different from propagating priority
from PCI BAR to core memory API.

I am starting to see how you can represent all these local decisions as
priority numbers and then traverse this weighted tree to find which memory
region should be accessed (memory registration _has_ to be hierarchical
for that to work in a meaningful way). I still don't see why it is better
than flattening the tree at the point of conflict.
 
  
Alternatively, you could add a prio offset to all mappings when
  climbing one level up, provided that offset is smaller than the prio
  range locally available to each level.
 
  Then a memory region final priority will depend on a tree height. If two
  disjointed tree branches of different height will claim the same memory
  region the higher one will have higher priority. I think this priority
  management is a can of worms.
 
 It is not as it remains a pure local thing and helps implementing the
 sketched scenarios. Believe, I tried to fix PAM/SMRAM already.
If it remains a local thing, then I misunderstand what you mean by
"could add a prio offset to all mappings when climbing one level up".
That doesn't sound like a local thing to me any more.

What problem did you have with PAM except low number of KVM slots btw?

 
  
  Only the lowest level (aka system bus) will use memory API directly.
 
 Not necessarily. It depends on how much added value buses like PCI or
 ISA or whatever can offer for managing I/O regions. For some purposes,
 

Re: [Qemu-devel] Regression Warning: more nics requested than this machine supports

2011-05-20 Thread Jan Kiszka
On 2011-05-20 13:19, Peter Maydell wrote:
> On 16 May 2011 17:58, Markus Armbruster <arm...@redhat.com> wrote:
>>     $ qemu-system-x86_64 -nodefaults -enable-kvm -m 384 -vnc :0 -S -netdev user,id=net0 -device e1000,netdev=net0
>>     Warning: more nics requested than this machine supports; some have been ignored
>>     (qemu) info network
>>     Devices not on any VLAN:
>>       net0: net=10.0.2.0, restricted=n peer=e1000.0
>>       e1000.0: model=e1000,macaddr=52:54:00:12:34:56 peer=net0
>>
>> Culprit is
>>     net: Improve the warnings for dubious command line option combinations
>>
>> Its count of requested NICs is blissfully unaware of -device.  In my
>> example, it comes up with nb_nics == 0 and seen_nics == 1.
>
> As far as I can determine, -device e1000,netdev=0 doesn't go through
> net_init_nic() and doesn't put an entry in the nd_table[] for the NIC.
> This means it's broken, because a lot of board models look in nd_table[]
> to determine whether the user requested a NIC and whether it's the right
> type. So I think that in some ways this is just showing up an existing
> problem with trying to instantiate a network card with -device.

qemu_new_nic must call net_init_nic so that this works properly. Of
course we need to avoid calling it multiple times when the adapter is
still instantiated via the old -net or via board init code.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [RFC] Memory API

2011-05-20 Thread Gleb Natapov
On Fri, May 20, 2011 at 11:59:58AM +0300, Avi Kivity wrote:
 On 05/19/2011 07:27 PM, Gleb Natapov wrote:
   Think of how a window manager folds windows with priorities onto a
   flat framebuffer.
 
   You do a depth-first walk of the tree.  For each child list, you
   iterate it from the lowest to highest priority, allowing later
   subregions override earlier subregions.
 
 I do not think that window manager is a good analogy. Window can
 overlap with only its siblings. In our memory tree each final node may
 overlap with any other node in the tree.
 
 
 Transparent windows.
 
No, still not that. Think about child windows that reside outside of their
parent windows on screen. In our memory region terms, think about a PCI BAR
that is registered to overlap with RAM at address 0x1000, for instance. The
PCI BAR memory region and the RAM memory region are on very different
branches of the global tree.

--
Gleb.



Re: [Qemu-devel] [RFC] Memory API

2011-05-20 Thread Gleb Natapov
On Fri, May 20, 2011 at 12:10:22PM +0300, Avi Kivity wrote:
 On 05/19/2011 09:22 PM, Gleb Natapov wrote:
 
   BARs may overlap with other BARs or with RAM. That's well-known, so PCI
   bridged need to register their regions with the _overlap variant
   unconditionally. In contrast to the current PhysPageDesc mechanism, the
 With what priority?
 
 It doesn't matter, since the spec doesn't define priorities among PCI BARs.
 
And among PCI BAR and memory (the case the question above referred to).

 If it needs to call _overlap unconditionally why not
 always call _overlap and drop not _overlap variant?
 
 Other uses need non-overlapping registration.
And who prohibits them from creating one?

 
 
   And they do not need to. The APIC regions will be managed by the per-CPU
   region management, reusing the tool box we need for all bridges. It will
   register the APIC page with a priority higher than the default one, thus
   overriding everything that comes from the host bridge. I think that
   reflects pretty well real machine behaviour.
 
 What is higher? How does it know that priority is high enough?
 
 It is well known that 1 > 0, for example.
 
That is if you have global scale. In the case I am asking about you do
not. Even if PCI will register memory region that overlaps APIC address
with priority 1000 APIC memory region should still be able to override
it even with priority 0. Voila, 1000 < 0? Where is your sarcasm now? :)

But Jan already answered this one. Actually what really matters is the
place of the node in a topology, not priority. But then for all of this
to make sense registration has to be hierarchical.

 I
 thought, from reading other replies, that priorities are meaningful
 only on the same hierarchy level (which kinda make sense), but now you
 are saying that you will override PCI address from another part of
 the topology?
 
 -- per-cpu memory
 |
 +--- apic page (prio 1)
 |
 +--- global memory (prio 0)
 
 -- 
 I have a truly marvellous patch that fixes the bug which this
 signature is too narrow to contain.

--
Gleb.
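Avi's window-manager description (a depth-first walk in which each child list is visited from lowest to highest priority, so later subregions override earlier ones) and the per-cpu tree above can be modelled with a toy flattener. This is an illustrative sketch, not the proposed memory API: the Region layout, page-granular absolute addresses, and the render() name are all assumptions, and priorities are compared only among siblings.

```c
#include <stdlib.h>

enum { NB_PAGES = 16, MAX_CHILDREN = 8 };

/* Hypothetical region node: leaves carry an id, containers have children.
 * Ranges are absolute page numbers for simplicity. */
typedef struct Region {
    int id;                               /* leaf id; unused for containers */
    int start, size;                      /* pages [start, start + size) */
    int priority;                         /* compared only among siblings */
    const struct Region *const *children;
    int nb_children;
} Region;

static int cmp_priority(const void *a, const void *b)
{
    const Region *ra = *(const Region *const *)a;
    const Region *rb = *(const Region *const *)b;

    return (ra->priority > rb->priority) - (ra->priority < rb->priority);
}

/* Paint the tree into map[], clipped to [lo, hi).  Siblings are visited
 * from lowest to highest priority so higher ones overwrite lower ones;
 * equal priorities end up in an unspecified order.  Children beyond
 * MAX_CHILDREN are ignored in this sketch. */
static void render(const Region *r, int lo, int hi, int map[NB_PAGES])
{
    if (r->start > lo) {
        lo = r->start;
    }
    if (r->start + r->size < hi) {
        hi = r->start + r->size;
    }
    if (lo >= hi) {
        return;                           /* nothing visible in this window */
    }
    if (r->nb_children == 0) {
        for (int p = lo; p < hi; p++) {
            map[p] = r->id;
        }
        return;
    }

    const Region *sorted[MAX_CHILDREN];
    int n = r->nb_children < MAX_CHILDREN ? r->nb_children : MAX_CHILDREN;

    for (int i = 0; i < n; i++) {
        sorted[i] = r->children[i];
    }
    qsort(sorted, (size_t)n, sizeof(sorted[0]), cmp_priority);
    for (int i = 0; i < n; i++) {
        render(sorted[i], lo, hi, map);
    }
}
```

In this model the APIC page wins only because it outranks "global memory" within the per-cpu container, which is the purely local use of priorities Jan describes; it does not address Gleb's case of two leaves in distant branches overlapping.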



Re: [Qemu-devel] [PATCH v4 1/3] coroutine: introduce coroutines

2011-05-20 Thread Paolo Bonzini

On 05/20/2011 12:59 PM, Stefan Hajnoczi wrote:

> This coroutines implementation is based on the gtk-vnc implementation
> written by Anthony Liguori <anth...@codemonkey.ws> but it has been
> significantly rewritten by Kevin Wolf <kw...@redhat.com> to use
> setjmp()/longjmp() instead of the more expensive swapcontext() and by
> Paolo Bonzini <pbonz...@redhat.com> for Windows Fibers support.



Not a blocker at all, but why did you move the pooling to the ucontext 
implementation?  It's less expensive to create the fiber in Windows 
because there are no system calls (unlike swapcontext), but a future 
pthread-based implementation will also need the pooling.


It can be left to whoever writes the pthread stuff, though.

Paolo



Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-20 Thread Stefan Hajnoczi
I'm interested in what the API for snapshots would look like.
Specifically how does user software do the following:
1. Create a snapshot
2. Delete a snapshot
3. List snapshots
4. Access data from a snapshot
5. Restore a VM from a snapshot
6. Get the dirty blocks list (for incremental backup)

We've discussed image format-level approaches but I think the scope of
the API should cover several levels at which snapshots are
implemented:
1. Image format - image file snapshot (Jes, Jagane)
2. Host file system - ext4 and btrfs snapshots
3. Storage system - LVM or SAN volume snapshots

It will be hard to take advantage of more efficient host file system
or storage system snapshots if they are not designed in now.

Is anyone familiar enough with the libvirt storage APIs to draft an
extension that adds snapshot support?  I will take a stab at it if no
one else wants to try it.

Stefan



Re: [Qemu-devel] [PATCH v4 1/3] coroutine: introduce coroutines

2011-05-20 Thread Stefan Hajnoczi
On Fri, May 20, 2011 at 1:09 PM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> On 05/20/2011 12:59 PM, Stefan Hajnoczi wrote:
>
>> This coroutines implementation is based on the gtk-vnc implementation
>> written by Anthony Liguori <anth...@codemonkey.ws> but it has been
>> significantly rewritten by Kevin Wolf <kw...@redhat.com> to use
>> setjmp()/longjmp() instead of the more expensive swapcontext() and by
>> Paolo Bonzini <pbonz...@redhat.com> for Windows Fibers support.
>
>
> Not a blocker at all, but why did you move the pooling to the ucontext
> implementation?  It's less expensive to create the fiber in Windows because
> there are no system calls (unlike swapcontext), but a future pthread-based
> implementation will also need the pooling.
>
> It can be left to whoever writes the pthread stuff, though.

There are two options for pooling:
1. Thread-local pools
2. One global pool with a lock

One of these choices must be selected because otherwise the pool could
be accessed simultaneously from multiple threads.  I tried #2 first
because it was less code, but it caused a noticeable slow-down with
./check-coroutine --benchmark-lifecycle.  Option #1 had less impact
but requires thread-local storage, for which I've used pthread APIs.
Hence I moved it into coroutine-ucontext.c, hoping that win32 would
either be fast enough as-is or that we could find a better solution if
someone needs it.
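Option #1 (thread-local pools) could look roughly like the sketch below: each thread keeps its own capped free list, so no lock is needed. It uses C11 `_Thread_local` instead of the pthread TLS APIs mentioned above, and PoolItem, pool_get() and pool_put() are hypothetical names; it is not the code from the patch.

```c
#include <stdlib.h>

enum { POOL_MAX_SIZE = 64 };   /* same cap as in coroutine-ucontext.c */

typedef struct PoolItem {
    struct PoolItem *next;     /* free-list link while pooled */
    char stack[4096];          /* stand-in for a coroutine's stack */
} PoolItem;

/* Per-thread state: only the owning thread ever touches these. */
static _Thread_local PoolItem *pool_head;
static _Thread_local unsigned int pool_size;

/* Reuse a pooled item if one is available, else allocate a new one. */
static PoolItem *pool_get(void)
{
    PoolItem *item = pool_head;

    if (item) {
        pool_head = item->next;
        pool_size--;
        return item;
    }
    return malloc(sizeof(*item));
}

/* Return an item to this thread's pool, freeing it once the pool is full. */
static void pool_put(PoolItem *item)
{
    if (pool_size < POOL_MAX_SIZE) {
        item->next = pool_head;
        pool_head = item;
        pool_size++;
    } else {
        free(item);
    }
}
```

A complete version would also need to release the pooled items, for example at process exit as the atexit(3) handler mentioned in the v4 cover letter does.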

Stefan



[Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG

2011-05-20 Thread Kirill Batuzov
This series implements some basic machine-independent optimizations.  They
simplify code and allow liveness analysis to do its work better.

Suppose we have the following ARM code:

 movwr12, #0xb6db
 movtr12, #0xdb6d

In TCG before optimizations we'll have:

 movi_i32 tmp8,$0xb6db
 mov_i32 r12,tmp8
 mov_i32 tmp8,r12
 ext16u_i32 tmp8,tmp8
 movi_i32 tmp9,$0xdb6d
 or_i32 tmp8,tmp8,tmp9
 mov_i32 r12,tmp8

And after optimizations we'll have this:

 movi_i32 r12,$0xdb6db6db
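
The folded constant can be checked with a small C model that evaluates each TCG op of the sequence in order on known constants.  This is what constant propagation plus folding does at translation time.  It is only an illustration of the arithmetic, not the optimizer.  The function name is hypothetical.  The movt immediate is assumed to be ORed into the upper half, as the ext16u_i32/or_i32 pair implies:

```c
#include <stdint.h>

/* Evaluate the unoptimized TCG sequence with every operand known. */
static uint32_t fold_movw_movt(uint32_t movw_imm, uint32_t movt_imm)
{
    uint32_t tmp8, tmp9, r12;

    tmp8 = movw_imm;          /* movi_i32 tmp8, movw immediate   */
    r12 = tmp8;               /* mov_i32 r12,tmp8                */
    tmp8 = r12;               /* mov_i32 tmp8,r12                */
    tmp8 = tmp8 & 0xffffu;    /* ext16u_i32 tmp8,tmp8            */
    tmp9 = movt_imm << 16;    /* movi_i32 tmp9, movt imm << 16   */
    tmp8 = tmp8 | tmp9;       /* or_i32 tmp8,tmp8,tmp9           */
    r12 = tmp8;               /* mov_i32 r12,tmp8                */
    return r12;
}
```

With 0xb6db and 0xdb6d this yields 0xdb6db6db, the single movi_i32 shown above.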

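Here are performance evaluation results on SPEC CPU2000 integer tests in
user-mode emulation on an x86_64 host.  Each test was run 5 times on the
reference data set.  The tables below show the runtime in seconds of each
run and the median.  Gain is the relative reduction of the median runtime
compared with the build without optimizations.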

ARM guest without optimizations:
Test name         #1       #2       #3       #4       #5   Median
164.gzip    1403.612 1403.499  1403.52  1208.55 1403.583  1403.52
175.vpr     1237.729 1238.008 1238.019 1176.852 1237.902 1237.902
176.gcc      929.511  928.867  929.048  928.927  928.792  928.927
181.mcf      196.371  196.335  196.172  197.057  196.196  196.335
186.crafty  1547.101 1547.293 1547.133 1547.037 1547.044 1547.101
197.parser  3804.336 3804.429 3804.412  3804.45 3804.301 3804.412
252.eon     2760.414  2760.45 2473.608 2760.606 2760.216 2760.414
253.perlbmk 2557.966 2558.971 2559.731 2479.299 2556.835 2557.966
256.bzip2   1296.412 1296.215  1296.63 1296.489 1296.092 1296.412
300.twolf   2919.496 2919.444 2919.529 2919.384 2919.404 2919.444

ARM guest with optimizations:
Test name         #1       #2       #3       #4       #5   Median    Gain
164.gzip    1345.416 1401.741 1377.022 1401.737 1401.773 1401.737   0.13%
175.vpr      1116.75 1243.213  1243.32 1243.316 1243.144 1243.213  -0.43%
176.gcc      897.045  909.568    850.1   909.65   909.57  909.568   2.08%
181.mcf      199.058  198.717   198.28  198.866  197.955  198.717  -1.21%
186.crafty  1525.667 1526.663 1525.981 1525.995 1526.164 1525.995   1.36%
197.parser  3749.453 3749.522 3749.413   3749.5 3749.484 3749.484   1.44%
252.eon     2730.593 2746.525 2746.495 2746.493  2746.62 2746.495   0.50%
253.perlbmk 2577.341 2521.057 2578.461 2578.721 2581.313 2578.461  -0.80%
256.bzip2   1184.498 1190.116 1294.352 1294.554 1294.637 1294.352   0.16%
300.twolf   2894.264 2894.133 2894.398 2894.103 2894.146 2894.146   0.87%

x86_64 guest without optimizations:
Test name         #1       #2       #3       #4       #5   Median
164.gzip     858.118  858.151   858.09  858.139  858.122  858.122
175.vpr      956.361  956.465  956.521  956.438  956.705  956.465
176.gcc      647.275  647.465  647.186  647.294  647.268  647.275
181.mcf      219.239  221.964  220.244   220.74  220.559  220.559
186.crafty  1128.027 1128.071 1128.028 1128.115 1128.123 1128.071
197.parser  1815.669 1815.651 1815.669 1815.711 1815.759 1815.669
253.perlbmk 1777.143 1777.749 1667.508 1777.051 1778.391 1777.143
254.gap     1062.808 1062.758 1062.801 1063.099 1062.859 1062.808
255.vortex  1930.693 1930.706 1930.579   1930.7 1930.566 1930.693
256.bzip2   1014.566 1014.702   1014.6 1014.274 1014.421 1014.566
300.twolf   1342.653 1342.759 1344.092 1342.641 1342.794 1342.759

x86_64 guest with optimizations:
Test name         #1       #2       #3       #4       #5   Median    Gain
164.gzip     857.485  857.457  857.475  857.509  857.507  857.485   0.07%
175.vpr      963.255  962.972   963.27  963.124  963.686  963.255  -0.71%
176.gcc      644.123  644.055  644.145  643.818  635.773  644.055   0.50%
181.mcf      216.215  217.549  218.744  216.437   217.83  217.549   1.36%
186.crafty  1128.873 1128.792 1128.871 1128.816 1128.823 1128.823  -0.07%
197.parser  1814.626 1814.503 1814.552 1814.602 1814.748 1814.602   0.06%
253.perlbmk 1758.056 1751.963 1753.267  1765.27 1759.828 1758.056   1.07%
254.gap     1064.702 1064.712 1064.629 1064.657 1064.645 1064.657  -0.17%
255.vortex  1760.638 1936.387 1937.871 1937.471 1760.496 1936.387  -0.29%
256.bzip2   1007.658 1007.682 1007.316 1007.982 1007.747 1007.682   0.68%
300.twolf   1334.139 1333.791 1333.795 1334.147 1333.732 1333.795   0.67%
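
The Gain column is consistent with (base median - optimized median) / base median.  For 164.gzip on ARM that is (1403.52 - 1401.737) / 1403.52, about 0.13%.  Here is a small C sketch of that computation.  The helper names are hypothetical, and the sketch assumes at most 16 runs:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of up to 16 runs; sorts a private copy of the input. */
static double median(const double *runs, size_t n)
{
    double tmp[16];

    if (n == 0 || n > 16) {
        return 0.0;    /* outside what this sketch supports */
    }
    memcpy(tmp, runs, n * sizeof(*runs));
    qsort(tmp, n, sizeof(*tmp), cmp_double);
    return (n % 2) ? tmp[n / 2] : (tmp[n / 2 - 1] + tmp[n / 2]) / 2.0;
}

/* Gain in percent: positive when the optimized median runtime is shorter. */
static double gain_percent(double base_median, double opt_median)
{
    return (base_median - opt_median) / base_median * 100.0;
}
```

Using the median rather than the mean keeps single outliers, such as the 1208.55 s gzip run, from skewing the comparison.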

The ARM builds of 254.gap and 255.vortex and the x86_64 build of 252.eon
do not work under QEMU for some unrelated reason.

Kirill Batuzov (6):
  Add TCG optimizations stub
  Add copy and constant propagation.
  Do constant folding for basic arithmetic operations.
  Do constant folding for boolean operations.
  Do constant folding for shift operations.
  Do constant folding for unary operations.

 Makefile.target |    2 +-
 tcg/optimize.c  |  539 +++
 tcg/tcg.c       |    6 +
 tcg/tcg.h       |    3 +
 4 files changed, 549 insertions(+), 1 deletions(-)
 create mode 100644 tcg/optimize.c

-- 
1.7.4.1




[Qemu-devel] [PATCH 1/6] Add TCG optimizations stub

2011-05-20 Thread Kirill Batuzov
This adds the file tcg/optimize.c to hold TCG optimizations.  The function
tcg_optimize() is called from tcg_gen_code_common() and calls the functions
that perform specific optimizations.  A stub for constant folding is added.
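
The stub pass copies every op's arguments through unchanged.  The only subtle part is a call op, whose length is computed as (args[0] >> 16) + (args[0] & 0xffff) + 3.  Here is a standalone C sketch of that bookkeeping.  It assumes that the high 16 bits of the first word count outputs, the low 16 bits count inputs, and the extra 3 words cover the packed word itself plus trailing bookkeeping.  SketchArg, call_nb_args() and copy_op_args() are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t SketchArg;   /* stand-in for TCGArg (assumption) */

/* Number of argument words occupied by a call op, per the patch's formula. */
static size_t call_nb_args(SketchArg first)
{
    return (size_t)((first >> 16) + (first & 0xffff) + 3);
}

/* Copy one op's argument words from src to dst and return how many were
 * copied; non-call ops use their fixed nb_args from the op definition. */
static size_t copy_op_args(SketchArg *dst, const SketchArg *src,
                           int is_call, size_t nb_args)
{
    size_t n = is_call ? call_nb_args(src[0]) : nb_args;

    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
    return n;
}
```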

Signed-off-by: Kirill Batuzov batuz...@ispras.ru
---
 Makefile.target |    2 +-
 tcg/optimize.c  |   87 +++
 tcg/tcg.c       |    6 
 tcg/tcg.h       |    3 ++
 4 files changed, 97 insertions(+), 1 deletions(-)
 create mode 100644 tcg/optimize.c

diff --git a/Makefile.target b/Makefile.target
index 21f864a..5a61778 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -70,7 +70,7 @@ all: $(PROGS) stap
 #
 # cpu emulator library
 libobj-y = exec.o translate-all.o cpu-exec.o translate.o
-libobj-y += tcg/tcg.o
+libobj-y += tcg/tcg.o tcg/optimize.o
 libobj-$(CONFIG_SOFTFLOAT) += fpu/softfloat.o
 libobj-$(CONFIG_NOSOFTFLOAT) += fpu/softfloat-native.o
 libobj-y += op_helper.o helper.o
diff --git a/tcg/optimize.c b/tcg/optimize.c
new file mode 100644
index 0000000..cf31d18
--- /dev/null
+++ b/tcg/optimize.c
@@ -0,0 +1,87 @@
+/*
+ * Optimizations for Tiny Code Generator for QEMU
+ *
+ * Copyright (c) 2010 Samsung Electronics.
+ * Contributed by Kirill Batuzov <batuz...@ispras.ru>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "config.h"
+
+#include <stdlib.h>
+#include <stdio.h>
+
+#include "qemu-common.h"
+#include "tcg-op.h"
+
+static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
+                                    TCGArg *args, TCGOpDef *tcg_op_defs)
+{
+    int i, nb_ops, op_index, op, nb_temps, nb_globals;
+    const TCGOpDef *def;
+    TCGArg *gen_args;
+
+    nb_temps = s->nb_temps;
+    nb_globals = s->nb_globals;
+
+    nb_ops = tcg_opc_ptr - gen_opc_buf;
+    gen_args = args;
+    for (op_index = 0; op_index < nb_ops; op_index++) {
+        op = gen_opc_buf[op_index];
+        def = &tcg_op_defs[op];
+        switch (op) {
+        case INDEX_op_call:
+        case INDEX_op_jmp:
+        case INDEX_op_br:
+        case INDEX_op_brcond_i32:
+        case INDEX_op_set_label:
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_brcond_i64:
+#endif
+            i = (op == INDEX_op_call) ?
+                (args[0] >> 16) + (args[0] & 0xffff) + 3 :
+                def->nb_args;
+            while (i) {
+                *gen_args = *args;
+                args++;
+                gen_args++;
+                i--;
+            }
+            break;
+        default:
+            for (i = 0; i < def->nb_args; i++) {
+                gen_args[i] = args[i];
+            }
+            args += def->nb_args;
+            gen_args += def->nb_args;
+            break;
+        }
+    }
+
+    return gen_args;
+}
+
+TCGArg *tcg_optimize(TCGContext *s, uint16_t *tcg_opc_ptr,
+        TCGArg *args, TCGOpDef *tcg_op_defs)
+{
+    TCGArg *res;
+    res = tcg_constant_folding(s, tcg_opc_ptr, args, tcg_op_defs);
+    return res;
+}
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 8748c05..6fb4dd6 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -24,6 +24,7 @@
 
 /* define it to use liveness analysis (better code) */
 #define USE_LIVENESS_ANALYSIS
+#define USE_TCG_OPTIMIZATIONS
 
 #include "config.h"
 
@@ -2018,6 +2019,11 @@ static inline int tcg_gen_code_common(TCGContext *s, uint8_t *gen_code_buf,
     }
 #endif
 
+#ifdef USE_TCG_OPTIMIZATIONS
+    gen_opparam_ptr =
+        tcg_optimize(s, gen_opc_ptr, gen_opparam_buf, tcg_op_defs);
+#endif
+
 #ifdef CONFIG_PROFILER
     s->la_time -= profile_getclock();
 #endif
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 3fab8d6..a85a8d7 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -486,6 +486,9 @@ void tcg_gen_callN(TCGContext *s, TCGv_ptr func, unsigned int flags,
 void tcg_gen_shifti_i64(TCGv_i64 ret, TCGv_i64 arg1,
                         int c, int right, int arith);
 
+TCGArg *tcg_optimize(TCGContext *s, uint16_t *tcg_opc_ptr, TCGArg *args,
+  

[Qemu-devel] [PATCH 6/6] Do constant folding for unary operations.

2011-05-20 Thread Kirill Batuzov
Perform constant folding for NOT and EXT{8,16,32}{S,U} operations.
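
The extension folds use one pattern: test the sign bit of the narrow value, then either OR in all higher bits or mask them away.  On a 64-bit host a 32-bit result is truncated afterwards.  A minimal standalone C sketch of that pattern follows.  fold_sext(), fold_zext() and trunc32() are hypothetical helpers, and bits is assumed to be between 1 and 63:

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of x using the same
 * test-sign-bit-then-mask shape as the patch. */
static uint64_t fold_sext(uint64_t x, unsigned bits)
{
    uint64_t mask = (UINT64_C(1) << bits) - 1;
    uint64_t sign = UINT64_C(1) << (bits - 1);

    return (x & sign) ? (x | ~mask) : (x & mask);
}

/* Zero-extend: keep only the low `bits` bits. */
static uint64_t fold_zext(uint64_t x, unsigned bits)
{
    return x & ((UINT64_C(1) << bits) - 1);
}

/* A 32-bit op folded in 64-bit arithmetic is truncated afterwards, so
 * e.g. NOT or a sign extension does not leave bits set above bit 31. */
static uint64_t trunc32(uint64_t x)
{
    return x & UINT64_C(0xffffffff);
}
```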

Signed-off-by: Kirill Batuzov batuz...@ispras.ru
---
 tcg/optimize.c |   82 
 1 files changed, 82 insertions(+), 0 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index b6b0dc4..bda469a 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -104,6 +104,11 @@ static int op_bits(int op)
     case INDEX_op_sar_i32:
     case INDEX_op_rotl_i32:
     case INDEX_op_rotr_i32:
+    case INDEX_op_not_i32:
+    case INDEX_op_ext8s_i32:
+    case INDEX_op_ext16s_i32:
+    case INDEX_op_ext8u_i32:
+    case INDEX_op_ext16u_i32:
         return 32;
 #if TCG_TARGET_REG_BITS == 64
     case INDEX_op_mov_i64:
@@ -118,6 +123,13 @@ static int op_bits(int op)
     case INDEX_op_sar_i64:
     case INDEX_op_rotl_i64:
     case INDEX_op_rotr_i64:
+    case INDEX_op_not_i64:
+    case INDEX_op_ext8s_i64:
+    case INDEX_op_ext16s_i64:
+    case INDEX_op_ext32s_i64:
+    case INDEX_op_ext8u_i64:
+    case INDEX_op_ext16u_i64:
+    case INDEX_op_ext32u_i64:
         return 64;
 #endif
     default:
@@ -245,6 +257,44 @@ static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
         return x;
 #endif
 
+    case INDEX_op_not_i32:
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_not_i64:
+#endif
+        return ~x;
+
+    case INDEX_op_ext8s_i32:
+        return x & (1 << 7) ? x | ~0xff : x & 0xff;
+
+    case INDEX_op_ext16s_i32:
+        return x & (1 << 15) ? x | ~0xffff : x & 0xffff;
+
+    case INDEX_op_ext8u_i32:
+        return x & 0xff;
+
+    case INDEX_op_ext16u_i32:
+        return x & 0xffff;
+
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_ext8s_i64:
+        return x & (1 << 7) ? x | ~0xffULL : x & 0xff;
+
+    case INDEX_op_ext16s_i64:
+        return x & (1 << 15) ? x | ~0xffffULL : x & 0xffff;
+
+    case INDEX_op_ext32s_i64:
+        return x & (1U << 31) ? x | ~0xffffffffULL : x & 0xffffffff;
+
+    case INDEX_op_ext8u_i64:
+        return x & 0xff;
+
+    case INDEX_op_ext16u_i64:
+        return x & 0xffff;
+
+    case INDEX_op_ext32u_i64:
+        return x & 0xffffffff;
+#endif
+
     default:
         fprintf(stderr,
                 "Unrecognized operation %d in do_constant_folding.\n", op);
@@ -345,6 +395,38 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             gen_args += 2;
             args += 2;
             break;
+        case INDEX_op_not_i32:
+        case INDEX_op_ext8s_i32:
+        case INDEX_op_ext16s_i32:
+        case INDEX_op_ext8u_i32:
+        case INDEX_op_ext16u_i32:
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_not_i64:
+        case INDEX_op_ext8s_i64:
+        case INDEX_op_ext16s_i64:
+        case INDEX_op_ext32s_i64:
+        case INDEX_op_ext8u_i64:
+        case INDEX_op_ext16u_i64:
+        case INDEX_op_ext32u_i64:
+#endif
+            if (state[args[1]] == TCG_TEMP_CONST) {
+                gen_opc_buf[op_index] = op_to_movi(op);
+                gen_args[0] = args[0];
+                gen_args[1] = do_constant_folding(op, vals[args[1]], 0);
+                reset_temp(state, vals, gen_args[0], nb_temps, nb_globals);
+                state[gen_args[0]] = TCG_TEMP_CONST;
+                vals[gen_args[0]] = gen_args[1];
+                gen_args += 2;
+                args += 2;
+                break;
+            } else {
+                reset_temp(state, vals, args[0], nb_temps, nb_globals);
+                gen_args[0] = args[0];
+                gen_args[1] = args[1];
+                gen_args += 2;
+                args += 2;
+                break;
+            }
         case INDEX_op_or_i32:
         case INDEX_op_and_i32:
 #if TCG_TARGET_REG_BITS == 64
-- 
1.7.4.1




[Qemu-devel] [PATCH 5/6] Do constant folding for shift operations.

2011-05-20 Thread Kirill Batuzov
Perform constant folding for SHR, SHL, SAR, ROTR, ROTL operations.
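
The SAR fold computes an arithmetic shift with unsigned arithmetic only.  It shifts the magnitude bits logically and then rebuilds the sign bits with r - (r >> y).  A standalone 32-bit C sketch of that trick and of the rotates follows.  The function names are hypothetical.  Unlike a literal copy of the patch, the rotates reduce y mod 32 and special-case y == 0, because shifting a 32-bit value by 32 is undefined in C:

```c
#include <stdint.h>

/* Arithmetic right shift using only unsigned operations; assumes y < 32. */
static uint32_t fold_sar32(uint32_t x, uint32_t y)
{
    uint32_t r = x & 0x80000000u;   /* isolate the sign bit */

    x &= ~0x80000000u;              /* drop it from the value */
    x >>= y;                        /* logical shift of the magnitude */
    r |= r - (r >> y);              /* copy the sign bit into the top y+1 bits */
    return x | r;
}

static uint32_t fold_rotr32(uint32_t x, uint32_t y)
{
    y &= 31;
    return y ? (x >> y) | (x << (32 - y)) : x;
}

static uint32_t fold_rotl32(uint32_t x, uint32_t y)
{
    y &= 31;
    return y ? (x << y) | (x >> (32 - y)) : x;
}
```

For a negative value with the sign bit r = 0x80000000 and y = 4, r - (r >> 4) is 0x78000000.  ORing that back in sets bits 27 to 31, which is exactly the sign extension a signed shift would produce.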

Signed-off-by: Kirill Batuzov batuz...@ispras.ru
---
 tcg/optimize.c |   87 
 1 files changed, 87 insertions(+), 0 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index a02d5c1..b6b0dc4 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -99,6 +99,11 @@ static int op_bits(int op)
     case INDEX_op_and_i32:
     case INDEX_op_or_i32:
     case INDEX_op_xor_i32:
+    case INDEX_op_shl_i32:
+    case INDEX_op_shr_i32:
+    case INDEX_op_sar_i32:
+    case INDEX_op_rotl_i32:
+    case INDEX_op_rotr_i32:
         return 32;
 #if TCG_TARGET_REG_BITS == 64
     case INDEX_op_mov_i64:
@@ -108,6 +113,11 @@ static int op_bits(int op)
     case INDEX_op_and_i64:
     case INDEX_op_or_i64:
     case INDEX_op_xor_i64:
+    case INDEX_op_shl_i64:
+    case INDEX_op_shr_i64:
+    case INDEX_op_sar_i64:
+    case INDEX_op_rotl_i64:
+    case INDEX_op_rotr_i64:
         return 64;
 #endif
     default:
@@ -131,6 +141,7 @@ static int op_to_movi(int op)
 
 static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
 {
+    TCGArg r;
     switch (op) {
     case INDEX_op_add_i32:
 #if TCG_TARGET_REG_BITS == 64
@@ -168,6 +179,72 @@ static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
 #endif
         return x ^ y;
 
+    case INDEX_op_shl_i32:
+#if TCG_TARGET_REG_BITS == 64
+        y &= 0xffffffff;
+    case INDEX_op_shl_i64:
+#endif
+        return x << y;
+
+    case INDEX_op_shr_i32:
+#if TCG_TARGET_REG_BITS == 64
+        x &= 0xffffffff;
+        y &= 0xffffffff;
+    case INDEX_op_shr_i64:
+#endif
+        /* Assuming TCGArg to be unsigned */
+        return x >> y;
+
+    case INDEX_op_sar_i32:
+#if TCG_TARGET_REG_BITS == 64
+        x &= 0xffffffff;
+        y &= 0xffffffff;
+#endif
+        r = x & 0x80000000;
+        x &= ~0x80000000;
+        x >>= y;
+        r |= r - (r >> y);
+        x |= r;
+        return x;
+
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_sar_i64:
+        r = x & 0x8000000000000000ULL;
+        x &= ~0x8000000000000000ULL;
+        x >>= y;
+        r |= r - (r >> y);
+        x |= r;
+        return x;
+#endif
+
+    case INDEX_op_rotr_i32:
+#if TCG_TARGET_REG_BITS == 64
+        x &= 0xffffffff;
+        y &= 0xffffffff;
+#endif
+        x = (x << (32 - y)) | (x >> y);
+        return x;
+
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_rotr_i64:
+        x = (x << (64 - y)) | (x >> y);
+        return x;
+#endif
+
+    case INDEX_op_rotl_i32:
+#if TCG_TARGET_REG_BITS == 64
+        x &= 0xffffffff;
+        y &= 0xffffffff;
+#endif
+        x = (x << y) | (x >> (32 - y));
+        return x;
+
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_rotl_i64:
+        x = (x << y) | (x >> (64 - y));
+        return x;
+#endif
+
     default:
         fprintf(stderr,
                 "Unrecognized operation %d in do_constant_folding.\n", op);
@@ -297,11 +374,21 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
         case INDEX_op_add_i32:
         case INDEX_op_sub_i32:
         case INDEX_op_mul_i32:
+        case INDEX_op_shl_i32:
+        case INDEX_op_shr_i32:
+        case INDEX_op_sar_i32:
+        case INDEX_op_rotl_i32:
+        case INDEX_op_rotr_i32:
 #if TCG_TARGET_REG_BITS == 64
         case INDEX_op_xor_i64:
         case INDEX_op_add_i64:
         case INDEX_op_sub_i64:
         case INDEX_op_mul_i64:
+        case INDEX_op_shl_i64:
+        case INDEX_op_shr_i64:
+        case INDEX_op_sar_i64:
+        case INDEX_op_rotl_i64:
+        case INDEX_op_rotr_i64:
 #endif
             if (state[args[1]] == TCG_TEMP_CONST
                 && state[args[2]] == TCG_TEMP_CONST) {
-- 
1.7.4.1




[Qemu-devel] [PATCH 4/6] Do constant folding for boolean operations.

2011-05-20 Thread Kirill Batuzov
Perform constant folding for AND, OR, XOR operations.
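
Besides folding constants, the patch uses the identities x & x == x and x | x == x.  An AND or OR whose two sources are the same temp is turned into a plain move, or into a nop when the destination is that temp as well.  XOR falls through to the generic binary handling.  A minimal sketch of that rewrite rule on a hypothetical three-address op follows.  SketchOp and simplify_same_operands() are illustrative only, not TCG's representation:

```c
/* Hypothetical three-address op; not TCG's real data structures. */
typedef enum { SK_NOP, SK_MOV, SK_AND, SK_OR, SK_XOR } SketchOpc;

typedef struct {
    SketchOpc opc;
    int dst, src1, src2;   /* temp indices */
} SketchOp;

/* AND/OR are idempotent, so "op dst, a, a" is a copy of a, and nothing at
 * all when dst == a.  XOR is left for the generic path, as in the patch. */
static void simplify_same_operands(SketchOp *op)
{
    if ((op->opc == SK_AND || op->opc == SK_OR) && op->src1 == op->src2) {
        op->opc = (op->dst == op->src1) ? SK_NOP : SK_MOV;
    }
}
```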

Signed-off-by: Kirill Batuzov batuz...@ispras.ru
---
 tcg/optimize.c |   58 
 1 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 4073f05..a02d5c1 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -38,6 +38,13 @@ typedef enum {
     TCG_TEMP_ANY
 } tcg_temp_state;
 
+const int mov_opc[] = {
+    INDEX_op_mov_i32,
+#if TCG_TARGET_REG_BITS == 64
+    INDEX_op_mov_i64,
+#endif
+};
+
 static int mov_to_movi(int op)
 {
     switch (op) {
@@ -89,12 +96,18 @@ static int op_bits(int op)
     case INDEX_op_add_i32:
     case INDEX_op_sub_i32:
     case INDEX_op_mul_i32:
+    case INDEX_op_and_i32:
+    case INDEX_op_or_i32:
+    case INDEX_op_xor_i32:
         return 32;
 #if TCG_TARGET_REG_BITS == 64
     case INDEX_op_mov_i64:
     case INDEX_op_add_i64:
     case INDEX_op_sub_i64:
     case INDEX_op_mul_i64:
+    case INDEX_op_and_i64:
+    case INDEX_op_or_i64:
+    case INDEX_op_xor_i64:
         return 64;
 #endif
     default:
@@ -137,6 +150,24 @@ static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
 #endif
         return x * y;
 
+    case INDEX_op_and_i32:
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_and_i64:
+#endif
+        return x & y;
+
+    case INDEX_op_or_i32:
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_or_i64:
+#endif
+        return x | y;
+
+    case INDEX_op_xor_i32:
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_xor_i64:
+#endif
+        return x ^ y;
+
     default:
         fprintf(stderr,
                 "Unrecognized operation %d in do_constant_folding.\n", op);
@@ -237,10 +268,37 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             gen_args += 2;
             args += 2;
             break;
+        case INDEX_op_or_i32:
+        case INDEX_op_and_i32:
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_and_i64:
+        case INDEX_op_or_i64:
+#endif
+            if (args[1] == args[2]) {
+                if (args[1] == args[0]) {
+                    args += 3;
+                    gen_opc_buf[op_index] = INDEX_op_nop;
+                } else {
+                    reset_temp(state, vals, args[0], nb_temps, nb_globals);
+                    if (args[1] >= s->nb_globals) {
+                        state[args[0]] = TCG_TEMP_COPY;
+                        vals[args[0]] = args[1];
+                    }
+                    gen_opc_buf[op_index] = mov_opc[op_bits(op) / 32 - 1];
+                    gen_args[0] = args[0];
+                    gen_args[1] = args[1];
+                    gen_args += 2;
+                    args += 3;
+                }
+                break;
+            }
+            /* Proceed with default binary operation handling */
+        case INDEX_op_xor_i32:
         case INDEX_op_add_i32:
         case INDEX_op_sub_i32:
         case INDEX_op_mul_i32:
 #if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_xor_i64:
         case INDEX_op_add_i64:
         case INDEX_op_sub_i64:
         case INDEX_op_mul_i64:
-- 
1.7.4.1




[Qemu-devel] [PATCH 2/6] Add copy and constant propagation.

2011-05-20 Thread Kirill Batuzov
Make tcg_constant_folding do copy and constant propagation.  This is
preparatory work for the actual constant folding.
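
The bookkeeping in this patch keeps, per temp, a state (unknown, constant, or copy) and a value (the constant, or the representative temp of the copy class).  Reads of a copy are redirected to the representative, and a mov from a constant becomes a movi.  Here is a deliberately simplified toy version in C.  It works on a hypothetical three-op IR, ignores globals and register constraints, and simply forgets dependent copies where the patch's reset_temp() elects a new representative.  All names are illustrative:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical toy IR; not TCG's data structures. */
typedef enum { OP_NOP, OP_MOV, OP_MOVI, OP_ADD } ToyOpc;
typedef struct {
    ToyOpc opc;
    int dst, a, b;        /* temp indices (b used by OP_ADD only) */
    uint32_t imm;         /* immediate for OP_MOVI */
} ToyOp;

enum { ST_ANY = 0, ST_CONST, ST_COPY };
#define NB_TEMPS 8

static int state[NB_TEMPS];
static uint32_t vals[NB_TEMPS];   /* constant value, or representative temp */

/* Forget facts about t, including copies whose representative was t. */
static void forget(int t)
{
    for (int i = 0; i < NB_TEMPS; i++) {
        if (state[i] == ST_COPY && vals[i] == (uint32_t)t) {
            state[i] = ST_ANY;
        }
    }
    state[t] = ST_ANY;
}

/* Copy propagation: read the representative instead of the copy. */
static int resolve(int t)
{
    return state[t] == ST_COPY ? (int)vals[t] : t;
}

static void set_const(int t, uint32_t v)
{
    forget(t);
    state[t] = ST_CONST;
    vals[t] = v;
}

static void propagate(ToyOp *ops, int n)
{
    memset(state, 0, sizeof(state));
    for (int k = 0; k < n; k++) {
        ToyOp *op = &ops[k];

        switch (op->opc) {
        case OP_MOV:
            op->a = resolve(op->a);
            if (op->a == op->dst) {               /* mov t, t: drop it */
                op->opc = OP_NOP;
            } else if (state[op->a] == ST_CONST) {
                op->opc = OP_MOVI;                /* mov of a constant */
                op->imm = vals[op->a];
                set_const(op->dst, op->imm);
            } else {
                forget(op->dst);                  /* dst becomes a copy */
                state[op->dst] = ST_COPY;
                vals[op->dst] = (uint32_t)op->a;
            }
            break;
        case OP_MOVI:
            set_const(op->dst, op->imm);
            break;
        case OP_ADD:
            op->a = resolve(op->a);
            op->b = resolve(op->b);
            if (state[op->a] == ST_CONST && state[op->b] == ST_CONST) {
                op->opc = OP_MOVI;                /* fold a + b */
                op->imm = vals[op->a] + vals[op->b];
                set_const(op->dst, op->imm);
            } else {
                forget(op->dst);
            }
            break;
        default:
            break;
        }
    }
}
```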

Signed-off-by: Kirill Batuzov batuz...@ispras.ru
---
 tcg/optimize.c |  123 
 1 files changed, 123 insertions(+), 0 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index cf31d18..a761c51 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -31,22 +31,139 @@
 #include "qemu-common.h"
 #include "tcg-op.h"
 
+typedef enum {
+    TCG_TEMP_UNDEF = 0,
+    TCG_TEMP_CONST,
+    TCG_TEMP_COPY,
+    TCG_TEMP_ANY
+} tcg_temp_state;
+
+static int mov_to_movi(int op)
+{
+    switch (op) {
+    case INDEX_op_mov_i32: return INDEX_op_movi_i32;
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_mov_i64: return INDEX_op_movi_i64;
+#endif
+    default:
+        fprintf(stderr, "Unrecognized operation %d in mov_to_movi.\n", op);
+        tcg_abort();
+    }
+}
+
+/* Reset TEMP's state to TCG_TEMP_ANY.  If TEMP was a representative of some
+   class of equivalent temp's, a new representative should be chosen in this
+   class. */
+static void reset_temp(tcg_temp_state *state, tcg_target_ulong *vals,
+                       TCGArg temp, int nb_temps, int nb_globals)
+{
+    int i;
+    TCGArg new_base;
+    new_base = (TCGArg)-1;
+    for (i = nb_globals; i < nb_temps; i++) {
+        if (state[i] == TCG_TEMP_COPY && vals[i] == temp) {
+            if (new_base == ((TCGArg)-1)) {
+                new_base = (TCGArg)i;
+                state[i] = TCG_TEMP_ANY;
+            } else {
+                vals[i] = new_base;
+            }
+        }
+    }
+    for (i = 0; i < nb_globals; i++) {
+        if (state[i] == TCG_TEMP_COPY && vals[i] == temp) {
+            if (new_base == ((TCGArg)-1)) {
+                state[i] = TCG_TEMP_ANY;
+            } else {
+                vals[i] = new_base;
+            }
+        }
+    }
+    state[temp] = TCG_TEMP_ANY;
+}
+
+/* Propagate constants and copies, fold constant expressions. */
 static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                                     TCGArg *args, TCGOpDef *tcg_op_defs)
 {
     int i, nb_ops, op_index, op, nb_temps, nb_globals;
     const TCGOpDef *def;
     TCGArg *gen_args;
+    /* Array VALS has an element for each temp.
+       If this temp holds a constant then its value is kept in VALS' element.
+       If this temp is a copy of other ones then this equivalence class'
+       representative is kept in VALS' element.
+       If this temp is neither copy nor constant then corresponding VALS'
+       element is unused. */
+    static tcg_target_ulong vals[TCG_MAX_TEMPS];
+    static tcg_temp_state state[TCG_MAX_TEMPS];
 
     nb_temps = s->nb_temps;
     nb_globals = s->nb_globals;
+    memset(state, 0, nb_temps * sizeof(tcg_temp_state));
 
     nb_ops = tcg_opc_ptr - gen_opc_buf;
     gen_args = args;
     for (op_index = 0; op_index < nb_ops; op_index++) {
         op = gen_opc_buf[op_index];
         def = &tcg_op_defs[op];
+        /* Do copy propagation */
+        if (op != INDEX_op_call) {
+            for (i = def->nb_oargs; i < def->nb_oargs + def->nb_iargs; i++) {
+                if (state[args[i]] == TCG_TEMP_COPY
+                    && !(def->args_ct[i].ct & TCG_CT_IALIAS)
+                    && (def->args_ct[i].ct & TCG_CT_REG)) {
+                    args[i] = vals[args[i]];
+                }
+            }
+        }
+
+        /* Propagate constants through copy operations and do constant
+           folding.  Constants will be substituted to arguments by register
+           allocator where needed and possible.  Also detect copies. */
         switch (op) {
+        case INDEX_op_mov_i32:
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_mov_i64:
+#endif
+            if ((state[args[1]] == TCG_TEMP_COPY
+                && vals[args[1]] == args[0])
+                || args[0] == args[1]) {
+                args += 2;
+                gen_opc_buf[op_index] = INDEX_op_nop;
+                break;
+            }
+            if (state[args[1]] != TCG_TEMP_CONST) {
+                reset_temp(state, vals, args[0], nb_temps, nb_globals);
+                if (args[1] >= s->nb_globals) {
+                    state[args[0]] = TCG_TEMP_COPY;
+                    vals[args[0]] = args[1];
+                }
+                gen_args[0] = args[0];
+                gen_args[1] = args[1];
+                gen_args += 2;
+                args += 2;
+                break;
+            } else {
+                /* Source argument is constant.  Rewrite the operation and
+                   let movi case handle it. */
+                op = mov_to_movi(op);
+                gen_opc_buf[op_index] = op;
+                args[1] = vals[args[1]];
+                /* fallthrough */
+            }
+        case INDEX_op_movi_i32:
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_movi_i64:
+#endif
+            reset_temp(state, vals, args[0], nb_temps, nb_globals);
+ 

[Qemu-devel] [PATCH 3/6] Do constant folding for basic arithmetic operations.

2011-05-20 Thread Kirill Batuzov
Perform actual constant folding for ADD, SUB and MUL operations.
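
Folding happens in the host's TCGArg width.  On a 64-bit host a folded 32-bit result must therefore have its high half cleared, which is what do_constant_folding() does when op_bits(op) == 32.  A small standalone C sketch of the same idea follows, with a hypothetical FoldOp enum standing in for the INDEX_op_* codes:

```c
#include <stdint.h>

/* Hypothetical op codes standing in for INDEX_op_{add,sub,mul}_i{32,64}. */
typedef enum { FOLD_ADD, FOLD_SUB, FOLD_MUL } FoldOp;

/* Fold x OP y in 64-bit arithmetic (like TCGArg on a 64-bit host); for a
 * 32-bit op, keep only the low half so the constant matches what a 32-bit
 * guest register would hold. */
static uint64_t fold_arith(FoldOp op, uint64_t x, uint64_t y, int bits)
{
    uint64_t r;

    switch (op) {
    case FOLD_ADD: r = x + y; break;
    case FOLD_SUB: r = x - y; break;
    default:       r = x * y; break;
    }
    return bits == 32 ? (r & UINT64_C(0xffffffff)) : r;
}
```

Without the mask, 0 - 1 folded for a 32-bit op would produce a 64-bit all-ones constant instead of 0xffffffff.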

Signed-off-by: Kirill Batuzov batuz...@ispras.ru
---
 tcg/optimize.c |  102 
 1 files changed, 102 insertions(+), 0 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index a761c51..4073f05 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -82,6 +82,79 @@ static void reset_temp(tcg_temp_state *state, tcg_target_ulong *vals,
     state[temp] = TCG_TEMP_ANY;
 }
 
+static int op_bits(int op)
+{
+    switch (op) {
+    case INDEX_op_mov_i32:
+    case INDEX_op_add_i32:
+    case INDEX_op_sub_i32:
+    case INDEX_op_mul_i32:
+        return 32;
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_mov_i64:
+    case INDEX_op_add_i64:
+    case INDEX_op_sub_i64:
+    case INDEX_op_mul_i64:
+        return 64;
+#endif
+    default:
+        fprintf(stderr, "Unrecognized operation %d in op_bits.\n", op);
+        tcg_abort();
+    }
+}
+
+static int op_to_movi(int op)
+{
+    if (op_bits(op) == 32) {
+        return INDEX_op_movi_i32;
+    }
+#if TCG_TARGET_REG_BITS == 64
+    if (op_bits(op) == 64) {
+        return INDEX_op_movi_i64;
+    }
+#endif
+    tcg_abort();
+}
+
+static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
+{
+    switch (op) {
+    case INDEX_op_add_i32:
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_add_i64:
+#endif
+        return x + y;
+
+    case INDEX_op_sub_i32:
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_sub_i64:
+#endif
+        return x - y;
+
+    case INDEX_op_mul_i32:
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_mul_i64:
+#endif
+        return x * y;
+
+    default:
+        fprintf(stderr,
+                "Unrecognized operation %d in do_constant_folding.\n", op);
+        tcg_abort();
+    }
+}
+
+static TCGArg do_constant_folding(int op, TCGArg x, TCGArg y)
+{
+    TCGArg res = do_constant_folding_2(op, x, y);
+#if TCG_TARGET_REG_BITS == 64
+    if (op_bits(op) == 32) {
+        res &= 0xffffffff;
+    }
+#endif
+    return res;
+}
+
 /* Propagate constants and copies, fold constant expressions. */
 static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                                     TCGArg *args, TCGOpDef *tcg_op_defs)
@@ -164,6 +237,35 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             gen_args += 2;
             args += 2;
             break;
+        case INDEX_op_add_i32:
+        case INDEX_op_sub_i32:
+        case INDEX_op_mul_i32:
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_add_i64:
+        case INDEX_op_sub_i64:
+        case INDEX_op_mul_i64:
+#endif
+            if (state[args[1]] == TCG_TEMP_CONST
+                && state[args[2]] == TCG_TEMP_CONST) {
+                gen_opc_buf[op_index] = op_to_movi(op);
+                gen_args[0] = args[0];
+                gen_args[1] =
+                    do_constant_folding(op, vals[args[1]], vals[args[2]]);
+                reset_temp(state, vals, gen_args[0], nb_temps, nb_globals);
+                state[gen_args[0]] = TCG_TEMP_CONST;
+                vals[gen_args[0]] = gen_args[1];
+                gen_args += 2;
+                args += 3;
+                break;
+            } else {
+                reset_temp(state, vals, args[0], nb_temps, nb_globals);
+                gen_args[0] = args[0];
+                gen_args[1] = args[1];
+                gen_args[2] = args[2];
+                gen_args += 3;
+                args += 3;
+                break;
+            }
         case INDEX_op_call:
         case INDEX_op_jmp:
         case INDEX_op_br:
-- 
1.7.4.1




Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-20 Thread Jes Sorensen
On 05/20/11 14:19, Stefan Hajnoczi wrote:
 I'm interested in what the API for snapshots would look like.

I presume you're talking about external snapshots here? The API is really
what should be defined by libvirt, so you get a unified API that can work
both on QEMU-level snapshots and on enterprise storage, host file
system snapshots, etc.

 Specifically how does user software do the following:
 1. Create a snapshot

There's a QMP patch out already; it is not applied yet, but it is
pretty simple and similar to the HMP command.

Alternatively you can do it the evil way by pre-creating the snapshot
image file and feeding that to the snapshot command. In this case QEMU
won't create the snapshot file.

 2. Delete a snapshot

This is still to be defined.

 3. List snapshots

Again this is tricky as it depends on the type of snapshot. For QEMU
level ones they are files, so 'ls' is your friend :)

 4. Access data from a snapshot

You boot the snapshot file.

 5. Restore a VM from a snapshot

We're talking about snapshots, not checkpointing, here, so you cannot
restore a VM from a snapshot.

 6. Get the dirty blocks list (for incremental backup)

Good question.

 We've discussed image format-level approaches but I think the scope of
 the API should cover several levels at which snapshots are
 implemented:
 1. Image format - image file snapshot (Jes, Jagane)
 2. Host file system - ext4 and btrfs snapshots
 3. Storage system - LVM or SAN volume snapshots
 
 It will be hard to take advantage of more efficient host file system
 or storage system snapshots if they are not designed in now.
 
 Is anyone familiar enough with the libvirt storage APIs to draft an
 extension that adds snapshot support?  I will take a stab at it if no
 one else wants to try it.

I believe the libvirt guys are already looking at this. Adding to the CC
list.

Cheers,
Jes



Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-20 Thread Stefan Hajnoczi
On Fri, May 20, 2011 at 1:39 PM, Jes Sorensen jes.soren...@redhat.com wrote:
 On 05/20/11 14:19, Stefan Hajnoczi wrote:
 I'm interested in what the API for snapshots would look like.

 I presume you're talking about external snapshots here? The API is really what
 should be defined by libvirt, so you get a unified API that can work
 both on QEMU level snapshots as well as enterprise storage, host file
 system snapshots etc.

Thanks for the pointers on external snapshots using image files.  I'm
really thinking about the libvirt API.

Basically I'm not sure we'll implement the right things if we don't
think through the API that the user sees first.

Stefan



Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-20 Thread Jes Sorensen
On 05/20/11 14:49, Stefan Hajnoczi wrote:
 On Fri, May 20, 2011 at 1:39 PM, Jes Sorensen jes.soren...@redhat.com wrote:
 On 05/20/11 14:19, Stefan Hajnoczi wrote:
 I'm interested in what the API for snapshots would look like.

 I presume you're talking about external snapshots here? The API is really what
 should be defined by libvirt, so you get a unified API that can work
 both on QEMU level snapshots as well as enterprise storage, host file
 system snapshots etc.
 
 Thanks for the pointers on external snapshots using image files.  I'm
 really thinking about the libvirt API.
 
 Basically I'm not sure we'll implement the right things if we don't
 think through the API that the user sees first.

Right, I agree. There are a lot of variables there, and they are not
necessarily easy to map into a single namespace. I am not sure it should
be done either.

Cheers,
Jes



[Qemu-devel] Invitation to connect on LinkedIn

2011-05-20 Thread Sosthene Grosset-Janin via LinkedIn
LinkedIn
Sosthene Grosset-Janin requested to add you as a connection on LinkedIn:
--

Jiajun,

I'd like to add you to my professional network on LinkedIn.

- Sosthene

Accept invitation from Sosthene Grosset-Janin
http://www.linkedin.com/e/-kkb1ec-gnx65bmq-4b/qTMmi8QEI_f3FNXUkL1mvZgy00BGYniwg3/blk/I129329775_11/1BpC5vrmRLoRZcjkkZt5YCpnlOt3RApnhMpmdzgmhxrSNBszYNclYRdPsVczcVcz59bTwRlnAUiA9ebPsVcPkVc3oQej8LrCBxbOYWrSlI/EML_comm_afe/

View invitation from Sosthene Grosset-Janin
http://www.linkedin.com/e/-kkb1ec-gnx65bmq-4b/qTMmi8QEI_f3FNXUkL1mvZgy00BGYniwg3/blk/I129329775_11/34NnPkTdPAOcPAOckALqnpPbOYWrSlI/svi/

--

Why might connecting with Sosthene Grosset-Janin be a good idea?

Sosthene Grosset-Janin's connections could be useful to you:
After accepting Sosthene Grosset-Janin's invitation, check Sosthene Grosset-Janin's
connections to see who else you may know and who you might want an introduction
to. Building these connections can create opportunities in the future.

 
-- 
(c) 2011, LinkedIn Corporation

[Qemu-devel] mouse doesn't work on guest OS

2011-05-20 Thread Amirali Shambayati
Hello all,
I use QEMU to run an Ubuntu image (for kernel debugging). I use the
following command:

sudo qemu -hda ubuntu-qemu-test -append root=/dev/sda1 -kernel /mnt/build/linux-2.6/arch/x86/boot/bzImage -boot c -net nic -net user

The mouse doesn't work in the Ubuntu guest.

I googled my problem and found two suggested solutions:
1. Adding -usb -usbdevice tablet
2. Running this command before starting QEMU: export SDL_VIDEO_X11_DGAMOUSE=0

But neither of them worked for me. Any help is appreciated.

-- 
Amirali Shambayati
Bachelor Student
Computer Engineering Department
Sharif University of Technology
Tehran, Iran



Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-20 Thread Richard Henderson
On 05/20/2011 02:23 AM, Avi Kivity wrote:
 On 05/19/2011 11:43 PM, Anthony Liguori wrote:
 On 05/19/2011 09:12 AM, Avi Kivity wrote:
 The memory API separates the attributes of a memory region (its size, how
 reads or writes are handled, dirty logging, and coalescing) from where it
 is mapped and whether it is enabled.  This allows a device to configure
 a memory region once, then hand it off to its parent bus to map it according
 to the bus configuration.

 Hierarchical registration also allows a device to compose a region out of
 a number of sub-regions with different properties; for example some may be
 RAM while others may be MMIO.

 +    struct {
 +        /* If nonzero, specify bounds on access sizes beyond which a machine
 +         * check is thrown.
 +         */
 +        unsigned min_access_size;
 +        unsigned max_access_size;
 +        /* If true, unaligned accesses are supported.  Otherwise unaligned
 +         * accesses throw machine checks.
 +         */
 +        bool unaligned;
 +    } valid;

 Under what circumstances would this be used?

 The behavior of devices that receive non-natural accesses varies wildly.

 For PCI devices, invalid accesses almost always return ~0.  I can't think of 
 a device where an MCE would occur.
 
 This was requested by Richard, so I'll let him comment.
 

Several Alpha system chips raise a machine check when accessed with incorrect sizes.
E.g. only 64-bit accesses are allowed.

Is this structure honestly any better than 4 function pointers?
I can't see that it is, myself.


r~



Re: [Qemu-devel] [PATCH 19/26] target-xtensa: implement loop option

2011-05-20 Thread Richard Henderson
On 05/20/2011 02:10 AM, Max Filippov wrote:
 If you're going to pretend that LEND is a constant, you might as well
 pretend that LBEG is also a constant, so that you get to chain the TB's
 around the loop.

 But there may be three exits from TB at the LEND if its last
 command is a branch: to the LBEG, to the branch target and to the
 next insn.
 
 Ok, I guess that I need to add gen_wsr_lbeg that invalidates TB at the
 current LEND, pretend that LBEG is constant and use given slot to jump
 to it. And also to get rid of tcg_gen_brcondi_i32(TCG_COND_NE,
 cpu_SR[LEND], dc->next_pc, label);

Yes.

Consider that the code is written to assume that the loop cycles,
so the most likely exit at LEND is LBEG.  If we choose to mirror
that logic inside TCG, then of the 3 possible exits from the block
one of them should be LBEG so that the most likely edge can get
chained.
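To make the three exits concrete, here is a small C model of what happens at the end of an instruction when the zero-overhead loop option is active. It assumes the Xtensa ISA rule as I understand it: when execution falls through to LEND, LCOUNT is nonzero and PS.EXCM is clear, LCOUNT is decremented and control goes to LBEG; a taken branch goes to its own target instead. LoopState and next_pc_after() are illustrative, not code from the target-xtensa patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model of the loop special registers (assumed semantics). */
typedef struct {
    uint32_t lbeg;
    uint32_t lend;
    uint32_t lcount;
    bool excm;          /* PS.EXCM: loop-back is suppressed while set */
} LoopState;

/* Compute the PC after the last instruction of a block.  'next_pc' is the
 * sequential successor; 'branch_taken'/'branch_target' describe a branch
 * ending the block.  The three possible results are the branch target,
 * LBEG (loop back) and the sequential next instruction. */
static uint32_t next_pc_after(LoopState *s, uint32_t next_pc,
                              bool branch_taken, uint32_t branch_target)
{
    if (branch_taken) {
        return branch_target;
    }
    if (next_pc == s->lend && s->lcount != 0 && !s->excm) {
        s->lcount--;
        return s->lbeg;     /* the most likely edge, worth chaining */
    }
    return next_pc;
}
```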


r~



Re: [Qemu-devel] [PATCH 09/26] target-xtensa: add special and user registers

2011-05-20 Thread Richard Henderson
On 05/20/2011 12:34 AM, Max Filippov wrote:
 User registers represent TIE states that may appear in custom xtensa
 configurations. I'd better change RUR and WUR so that they can access
 all user registers but warn on those not defined globally or in the
 CPUEnv::config. Is it OK?

Well, it's ok if you change nothing.  However, I wanted you to think
about other ways that might make sense than simply allocating all of
the registers.


r~



[Qemu-devel] [PATCH v5, resend] revamp acpitable parsing and allow to specify complete (headerful) table

2011-05-20 Thread Michael Tokarev
Since I've received no comments or replies whatsoever, neither
positive nor negative, I assume no one received this email
(sent on Thu, 12 May 2011), so I am resending it.

This patch almost completely rewrites the acpi_table_add() function
(but still leaves it using the old get_param_value() interface).
The result is that it's now possible to specify a whole table
(together with its header) in an external file, instead of just the
data portion, with a new file= parameter, but at the same time
it's still possible to specify header fields as before.
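For reference, a headerful table loaded with file= has to be self-consistent: the header's length field should match the number of bytes read, and the checksum byte must make all bytes sum to zero modulo 256 (the rule acpi_checksum() implements). The sketch assumes the standard 36-byte ACPI header layout (length at offset 4, checksum at offset 9); in this patch's struct acpi_table_header those fields are preceded by QEMU's 2-byte _length prefix, so the offsets there shift by ACPI_TABLE_PFX_SIZE. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ACPI_HDR_LEN     36   /* standard ACPI table header size */
#define ACPI_LEN_OFFSET  4    /* uint32_t length, little-endian */
#define ACPI_CSUM_OFFSET 9    /* uint8_t checksum */

/* Read the little-endian 32-bit table length from the header. */
static uint32_t acpi_hdr_length(const uint8_t *t)
{
    return (uint32_t)t[ACPI_LEN_OFFSET] |
           (uint32_t)t[ACPI_LEN_OFFSET + 1] << 8 |
           (uint32_t)t[ACPI_LEN_OFFSET + 2] << 16 |
           (uint32_t)t[ACPI_LEN_OFFSET + 3] << 24;
}

/* Recompute the checksum byte so all 'len' bytes sum to 0 mod 256,
 * e.g. after header fields were overridden from the command line. */
static void acpi_fix_checksum(uint8_t *t, size_t len)
{
    uint8_t sum = 0;

    t[ACPI_CSUM_OFFSET] = 0;
    for (size_t i = 0; i < len; i++) {
        sum += t[i];
    }
    t[ACPI_CSUM_OFFSET] = (uint8_t)-sum;
}

/* Hypothetical sanity check for a table read from file=: the header must
 * fit and its length field must match the number of bytes read. */
static int acpi_check_table(const uint8_t *t, size_t nread)
{
    if (nread < ACPI_HDR_LEN || acpi_hdr_length(t) != nread) {
        return -1;
    }
    return 0;
}
```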

Now with the checkpatch.pl formatting fixes, thanks to
Stefan Hajnoczi for suggestions, with changes from
Isaku Yamahata, and with my further refinements.

v5: rediffed against current qemu/master.

Signed-off-by: Michael Tokarev m...@tls.msk.ru
---
 hw/acpi.c   |  292 ---
 qemu-options.hx |7 +-
 2 files changed, 175 insertions(+), 124 deletions(-)

diff --git a/hw/acpi.c b/hw/acpi.c
index ad40fb4..4316189 100644
--- a/hw/acpi.c
+++ b/hw/acpi.c
@@ -22,17 +22,29 @@
  struct acpi_table_header
 {
-char signature [4];/* ACPI signature (4 ASCII characters) */
+uint16_t _length; /* our length, not actual part of the hdr */
+  /* XXX why we have 2 length fields here? */
+char sig[4];  /* ACPI signature (4 ASCII characters) */
 uint32_t length;  /* Length of table, in bytes, including header */
 uint8_t revision; /* ACPI Specification minor version # */
 uint8_t checksum; /* To make sum of entire table == 0 */
-char oem_id [6];   /* OEM identification */
-char oem_table_id [8]; /* OEM table identification */
+char oem_id[6];   /* OEM identification */
+char oem_table_id[8]; /* OEM table identification */
 uint32_t oem_revision;/* OEM revision number */
-char asl_compiler_id [4]; /* ASL compiler vendor ID */
+char asl_compiler_id[4];  /* ASL compiler vendor ID */
 uint32_t asl_compiler_revision; /* ASL compiler revision number */
 } __attribute__((packed));
 
+#define ACPI_TABLE_HDR_SIZE sizeof(struct acpi_table_header)
+#define ACPI_TABLE_PFX_SIZE sizeof(uint16_t)  /* size of the extra prefix */
+
+static const char dfl_hdr[ACPI_TABLE_HDR_SIZE] =
+\0\0   /* fake _length (2) */
+QEMU\0\0\0\0\1\0   /* sig (4), len(4), revno (1), csum (1) */
+QEMUQEQEMUQEMU\1\0\0\0 /* OEM id (6), table (8), revno (4) */
+QEMU\1\0\0\0   /* ASL compiler ID (4), version (4) */
+;
+
 char *acpi_tables;
 size_t acpi_tables_len;
 
@@ -45,158 +57,192 @@ static int acpi_checksum(const uint8_t *data, int len)
 return (-sum) & 0xff;
 }
 
+/* like strncpy() but zero-fills the tail of destination */
+static void strzcpy(char *dst, const char *src, size_t size)
+{
+size_t len = strlen(src);
+if (len >= size) {
+len = size;
+} else {
+  memset(dst + len, 0, size - len);
+}
+memcpy(dst, src, len);
+}
+
+/* XXX fixme: this function uses obsolete argument parsing interface */
 int acpi_table_add(const char *t)
 {
-static const char *dfl_id = "QEMUQEMU";
 char buf[1024], *p, *f;
-struct acpi_table_header acpi_hdr;
 unsigned long val;
-uint32_t length;
-struct acpi_table_header *acpi_hdr_p;
-size_t off;
+size_t len, start, allen;
+bool has_header;
+int changed;
+int r;
+struct acpi_table_header hdr;
+
+r = 0;
+r |= get_param_value(buf, sizeof(buf), "data", t) ? 1 : 0;
+r |= get_param_value(buf, sizeof(buf), "file", t) ? 2 : 0;
+switch (r) {
+case 0:
+buf[0] = '\0';
+case 1:
+has_header = false;
+break;
+case 2:
+has_header = true;
+break;
+default:
+fprintf(stderr, "acpitable: both data and file are specified\n");
+return -1;
+}
+
+if (!acpi_tables) {
+allen = sizeof(uint16_t);
+acpi_tables = qemu_mallocz(allen);
+}
+else {
+allen = acpi_tables_len;
+}
+
+start = allen;
+acpi_tables = qemu_realloc(acpi_tables, start + ACPI_TABLE_HDR_SIZE);
+allen += has_header ? ACPI_TABLE_PFX_SIZE : ACPI_TABLE_HDR_SIZE;
+
+/* now read in the data files, reallocating buffer as needed */
+
+for (f = strtok(buf, ":"); f; f = strtok(NULL, ":")) {
+int fd = open(f, O_RDONLY);
+
+if (fd < 0) {
+fprintf(stderr, "can't open file %s: %s\n", f, strerror(errno));
+return -1;
+}
+
+for (;;) {
+char data[8192];
+r = read(fd, data, sizeof(data));
+if (r == 0) {
+break;
+} else if (r > 0) {
+acpi_tables = qemu_realloc(acpi_tables, allen + r);
+memcpy(acpi_tables + allen, data, r);
+allen += r;
+} else if (errno != EINTR) {
+fprintf(stderr, "can't read file %s: %s\n",
+f, 

Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-20 Thread Anthony Liguori

On 05/20/2011 09:06 AM, Richard Henderson wrote:

On 05/20/2011 02:23 AM, Avi Kivity wrote:

On 05/19/2011 11:43 PM, Anthony Liguori wrote:

On 05/19/2011 09:12 AM, Avi Kivity wrote:

The memory API separates the attributes of a memory region (its size, how
reads or writes are handled, dirty logging, and coalescing) from where it
is mapped and whether it is enabled.  This allows a device to configure
a memory region once, then hand it off to its parent bus to map it according
to the bus configuration.

Hierarchical registration also allows a device to compose a region out of
a number of sub-regions with different properties; for example some may be
RAM while others may be MMIO.

+struct {
+/* If nonzero, specify bounds on access sizes beyond which a machine
+ * check is thrown.
+ */
+unsigned min_access_size;
+unsigned max_access_size;
+/* If true, unaligned accesses are supported.  Otherwise unaligned
+ * accesses throw machine checks.
+ */
+ bool unaligned;
+} valid;


Under what circumstances would this be used?

The behavior of devices that receive non-natural accesses varies wildly.

For PCI devices, invalid accesses almost always return ~0.  I can't think of a 
device where an MCE would occur.


This was requested by Richard, so I'll let him comment.



Several alpha system chips MCE when accessed with incorrect sizes.
E.g. only 64-bit accesses are allowed.


But is this a characteristic of devices or is this a characteristic of 
the chipset/CPU?


At any rate, I'm fairly sure it doesn't belong in the MemoryRegion 
structure.


Regards,

Anthony Liguori



Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-20 Thread Richard Henderson
On 05/20/2011 07:31 AM, Anthony Liguori wrote:
 But is this a characteristic of devices or is this a characteristic of the 
 chipset/CPU?

Chipset.


r~



Re: [Qemu-devel] [PATCH] hw/realview.c: Remove duplicate #include line

2011-05-20 Thread Stefan Hajnoczi
On Thu, May 19, 2011 at 4:21 PM, Peter Maydell peter.mayd...@linaro.org wrote:
 Remove a duplicate #include of sysbus.h.

 Signed-off-by: Peter Maydell peter.mayd...@linaro.org
 ---
  hw/realview.c |    1 -
  1 files changed, 0 insertions(+), 1 deletions(-)

Thanks, added to the trivial-patches tree:
http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/trivial-patches

Stefan



Re: [Qemu-devel] [RFC v1] Add declarations for hierarchical memory region API

2011-05-20 Thread Anthony Liguori

On 05/20/2011 09:40 AM, Richard Henderson wrote:

On 05/20/2011 07:31 AM, Anthony Liguori wrote:

But is this a characteristic of devices or is this a characteristic of the 
chipset/CPU?


Chipset.


So if the chipset only allows accesses that are 64-bit, then you'll want 
to have hierarchical dispatch filter non 64-bit accesses and raise an 
MCE appropriately.


So you don't need anything in MemoryRegion, you need code in the 
dispatch path.
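A minimal sketch of that suggestion, assuming the chipset model owns an access policy and consults it before calling into the device: disallowed sizes or misaligned addresses are reported as a machine check, and the device callback never sees them. ChipsetAccessPolicy, AccessResult and chipset_check_access() are hypothetical names, and the alignment test assumes power-of-two access sizes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of a chipset-level access check. */
typedef enum { ACCESS_OK, ACCESS_MCE } AccessResult;

/* Hypothetical chipset policy for one decode window. */
typedef struct {
    unsigned min_size;
    unsigned max_size;
    bool allow_unaligned;
} ChipsetAccessPolicy;

/* Filter an access in the dispatch path, before the device callback.
 * Assumes 'size' is a power of two for the alignment test. */
static AccessResult chipset_check_access(const ChipsetAccessPolicy *p,
                                         uint64_t addr, unsigned size)
{
    if (size < p->min_size || size > p->max_size) {
        return ACCESS_MCE;
    }
    if (!p->allow_unaligned && (addr & (size - 1)) != 0) {
        return ACCESS_MCE;
    }
    return ACCESS_OK;
}

/* Example: a window that only accepts aligned 64-bit accesses. */
static const ChipsetAccessPolicy only_64bit = { 8, 8, false };
```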


Regards,

Anthony Liguori



r~






Re: [Qemu-devel] [Qemu-trivial] [PATCH] hw/sd.c: Don't complain about SDIO commands CMD52/CMD53

2011-05-20 Thread Stefan Hajnoczi
On Fri, May 20, 2011 at 10:11 AM, Peter Maydell
peter.mayd...@linaro.org wrote:
 The SDIO specification introduces new commands 52 and 53.
 Handle as illegal command but do not complain on stderr,
 as SDIO-aware OSes (including Linux) may legitimately use
 these in their probing for presence of an SDIO card.

 Signed-off-by: Peter Maydell peter.mayd...@linaro.org
 ---
  hw/sd.c |   11 +++
  1 files changed, 11 insertions(+), 0 deletions(-)

Thanks, added to the trivial patches tree:
http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/trivial-patches

Stefan



Re: [Qemu-devel] [RFC] Memory API

2011-05-20 Thread Anthony Liguori

On 05/20/2011 03:56 AM, Avi Kivity wrote:

On 05/19/2011 07:36 PM, Anthony Liguori wrote:

There are no global priorities. Priorities are only used inside each
level of the memory region hierarchy to generate a resulting, flattened
view for the next higher level. At that level, everything imported from
below has the default prio again, ie. the lowest one.



Then SMM is impossible.



It doesn't follow.


Why do we need priorities at all? There should be no overlap at each
level in the hierarchy.


Of course there is overlap. PCI BARs overlap each other, the VGA windows
and ROM overlap RAM.


Here's what I'm still struggling with:

If children normally overlap their parents, but child priorities are 
always less than their parents, then what's the benefit of having 
anything more than two priority settings?


As far as I can understand it, a priority of 0 means let children 
windows overlap whereas a priority of 1 means don't let children 
windows overlap.


Is there a use-case for a priority above 1 and if so, what does it mean?


If you have overlapping BARs, the PCI bus will always send the request
to a single device based on something that's implementation specific.
This works because each PCI device advertises the BAR locations and
sizes in its config space.


BARs in general don't need priority, except we need to decide if BARs
overlap RAM or vice-versa.



To dispatch a request, the PCI bus will walk the config space to find
a match. If you remove something that was previously causing an
overlap, the other device will now get the I/O requests.


That's *exactly* what priority means. Which device is in front, and
which is in the back.


Why not use registration order to resolve this type of conflict?  What 
are the use cases to use priorities where registration order wouldn't be 
adequate?
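One way to relate the two: if lookup picks the highest priority among the subregions covering an address and breaks ties by registration order, pure registration order is just the case where all priorities are equal. The sketch assumes that tie-break rule (later registration wins), which the RFC does not specify; SubRegion and resolve() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subregion record, in registration order. */
typedef struct {
    const char *name;
    uint64_t base;
    uint64_t size;
    int priority;       /* higher value is "in front" */
} SubRegion;

/* Return the subregion claiming 'addr', or NULL if none covers it.
 * Higher priority wins; for equal priorities the later-registered
 * entry (higher index) wins. */
static const SubRegion *resolve(const SubRegion *r, size_t n, uint64_t addr)
{
    const SubRegion *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (addr < r[i].base || addr - r[i].base >= r[i].size) {
            continue;
        }
        if (best == NULL || r[i].priority >= best->priority) {
            best = &r[i];
        }
    }
    return best;
}
```

In the test data the VGA window is registered before RAM but still wins inside 0xa0000-0xbffff because of its higher priority; pure registration order would need the registrations reordered to get the same result.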



There is no need to have centralized logic to decide this.



I think you're completely missing the point of my proposal.


I'm struggling to find the mental model for priorities.  I may just be 
dense here but the analogy of transparent window ordering isn't helping me.


Regards,

Anthony Liguori





[Qemu-devel] [RFC PATCH 0/6] SCSI series part 2, rewrite LUN parsing

2011-05-20 Thread Paolo Bonzini
This is the second part of my SCSI work.  The first is still pending
and this one is incomplete, but I still would like to get opinions
early enough because this design directly affects the UI.

This series is half of the work that is necessary to support multiple
LUNs behind a target.  The idea is to have two devices, scsi-path
and scsi-target, each of which provides both a SCSIDevice and a
SCSIBus.

I plan to do this work using VSCSI and then cut-an^Wapply it later to
virtio-scsi.  This way we can be reasonably sure that the approach will
be usable in the Linux virtio-scsi drivers too.

For an HBA like VSCSI or the upcoming virtio-scsi, which supports
multiple paths, you can add to your HBA:

- a scsi-path (id=1) which has two scsi-disks.  Then the disks
  will be at path 1, target 0, LUN 0/1

- a scsi-path (id=1) which has two scsi-targets each with a scsi-disk.
  Then the disks will be at path 1, target 0/1, LUN 0

- a scsi-path (id=1) which has two scsi-targets each with two scsi-disks.
  Then the four disks will be at path 1, target 0/1, LUN 0/1

- two scsi-paths (id=1) each with two scsi-targets each with two scsi-disks.
  Then the eight disks will be at path 1, target 0/1, LUN 0/1

- a scsi-target (id=0) which has two scsi-disks.  Then the disks
  will be at path 0, target 0, LUN 0/1

- a scsi-target (id=0) with two scsi-disks and a scsi-path (id=1) each with
  two scsi-targets each with two scsi-disks.  Then two disks will be at
  path 0, target 0, LUN 0/1; four more will be at path 1, target 0/1,
  LUN 0/1.


For an HBA like lsi, which only supports one level, you can add to your HBA:

- a scsi-target (id=0) which has two scsi-disks.  Then the disks
  will be at path 0, target 0, LUN 0/1

- two scsi-targets (id=0/1), each of which has two scsi-disks.  Then the disks
  will be at path 0, targets 0/1, LUN 0/1

- one scsi-target (id=0) which has two scsi-disks and one scsi-disk
  (id=1).  Then two disks will be at path 0, target 0, LUN 0/1,
  the third will be at path 0, target 1, LUN 0.

and so on.

The patches do not provide the devices and relaying mechanism, but add
plumbing for parsing complex LUNs such as those used by VSCSI.

Patch 2 is useful on its own, because it fixes a mismatch in VSCSI's handling
of OpenFirmware and Linux LUNs.  It adds the main parsing code, and I'll
probably resubmit it soon.

Patch 5 adds the infrastructure that will be used by the simple LSI case.

Patch 6 adds the infrastructure that will be used in the full case, and
already kind-of attaches VSCSI to it.

The other 3 are just complementary.

Ideas?  Does the interface seem applicable to libvirt?

Paolo Bonzini (6):
  scsi: ignore LUN field in the CDB
  scsi: support parsing of SAM logical unit numbers
  scsi-generic: allow customization of the lun
  scsi-disk: allow customization of the lun
  scsi: let a SCSIDevice have children devices
  scsi: add walking of hierarchical LUNs

 hw/esp.c  |4 +-
 hw/lsi53c895a.c   |2 +-
 hw/scsi-bus.c |  170 +
 hw/scsi-defs.h|   22 +++
 hw/scsi-disk.c|   19 +++---
 hw/scsi-generic.c |   41 +++--
 hw/scsi.h |   17 +
 hw/spapr_vscsi.c  |   22 ++-
 8 files changed, 264 insertions(+), 33 deletions(-)

-- 
1.7.4.4




[Qemu-devel] [RFC PATCH 3/6] scsi-generic: allow customization of the lun

2011-05-20 Thread Paolo Bonzini
This allows passthrough of devices with LUN != 0, by redirecting them to
LUN0 in the emulated target.
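The REPORT LUNS reply built here (16 bytes, byte 3 = 8) is the standard parameter data for a single LUN 0: a 4-byte big-endian LUN LIST LENGTH, 4 reserved bytes, then one 8-byte entry per LUN. A generalized sketch, assuming single-level LUNs encoded with peripheral device addressing below 256 and flat space addressing up to 16383; build_report_luns() and encode_lun() are hypothetical helpers and, like the patch, reject a too-small allocation length instead of truncating.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Encode a single-level LUN into an 8-byte SAM LUN field: method 00b
 * (bus 0) below 256, flat space (01b) up to 16383.  -1 if too large. */
static int encode_lun(uint8_t out[8], unsigned lun)
{
    memset(out, 0, 8);
    if (lun < 256) {
        out[1] = (uint8_t)lun;
    } else if (lun < 16384) {
        out[0] = (uint8_t)(0x40 | (lun >> 8));
        out[1] = (uint8_t)(lun & 0xff);
    } else {
        return -1;
    }
    return 0;
}

/* Build REPORT LUNS parameter data.  Returns bytes written, or -1 if
 * 'buflen' is too small or a LUN cannot be encoded. */
static int build_report_luns(uint8_t *buf, size_t buflen,
                             const unsigned *luns, size_t n)
{
    size_t total = 8 + 8 * n;
    uint32_t list_len = (uint32_t)(8 * n);   /* bytes of LUN entries */

    if (buflen < total) {
        return -1;
    }
    memset(buf, 0, 8);
    buf[0] = (uint8_t)(list_len >> 24);
    buf[1] = (uint8_t)(list_len >> 16);
    buf[2] = (uint8_t)(list_len >> 8);
    buf[3] = (uint8_t)list_len;
    for (size_t i = 0; i < n; i++) {
        if (encode_lun(buf + 8 + 8 * i, luns[i]) < 0) {
            return -1;
        }
    }
    return (int)total;
}
```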

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/scsi-generic.c |   38 +-
 1 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/hw/scsi-generic.c b/hw/scsi-generic.c
index e6f0efd..fb38934 100644
--- a/hw/scsi-generic.c
+++ b/hw/scsi-generic.c
@@ -230,8 +230,11 @@ static void scsi_read_data(SCSIRequest *req)
 return;
 }
 
-if (r->req.cmd.buf[0] == REQUEST_SENSE && s->driver_status & SG_ERR_DRIVER_SENSE)
-{
+switch (r->req.cmd.buf[0]) {
+case REQUEST_SENSE:
+if (!(s->driver_status & SG_ERR_DRIVER_SENSE)) {
+break;
+}
 s->senselen = MIN(r->len, s->senselen);
 memcpy(r->buf, s->sensebuf, s->senselen);
 r->io_header.driver_status = 0;
@@ -246,6 +249,32 @@ static void scsi_read_data(SCSIRequest *req)
 /* Clear sensebuf after REQUEST_SENSE */
 scsi_clear_sense(s);
 return;
+
+case REPORT_LUNS:
+   assert(!s->lun);
+if (r->req.cmd.xfer < 16) {
+scsi_command_complete(r, -EINVAL);
+return;
+}
+r->io_header.driver_status = 0;
+r->io_header.status = 0;
+r->io_header.dxfer_len  = 16;
+r->len = -1;
+r->buf[3] = 8;
+scsi_req_data(&r->req, 16);
+scsi_command_complete(r, 0);
+return;
+
+case INQUIRY:
+if (req->lun != s->lun) {
+if (r->req.cmd.xfer < 1) {
+scsi_command_complete(r, -EINVAL);
+return;
+}
+outbuf[0] = 0x7f;
+return MIN(req->cmd.xfer, SCSI_MAX_INQUIRY_LEN);
+}
+break;
 }
 
 ret = execute_command(s-bs, r, SG_DXFER_FROM_DEV, scsi_read_complete);
@@ -335,7 +364,7 @@ static int32_t scsi_send_command(SCSIRequest *req, uint8_t *cmd)
 SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
 int ret;
 
-if (cmd[0] != REQUEST_SENSE && req->lun != s->lun) {
+if (cmd[0] != REQUEST_SENSE && cmd[0] != INQUIRY && req->lun != s->lun) {
 DPRINTF("Unimplemented LUN %d\n", req->lun);
 scsi_set_sense(s, SENSE_CODE(LUN_NOT_SUPPORTED));
 r->req.status = CHECK_CONDITION;
@@ -510,8 +539,6 @@ static int scsi_generic_initfn(SCSIDevice *dev)
 }
 
 /* define device state */
-s->lun = scsiid.lun;
-DPRINTF("LUN %d\n", s->lun);
 s->qdev.type = scsiid.scsi_type;
 DPRINTF("device type %d\n", s->qdev.type);
 if (s->qdev.type == TYPE_TAPE) {
@@ -552,6 +579,7 @@ static SCSIDeviceInfo scsi_generic_info = {
 .get_sense= scsi_get_sense,
 .qdev.props   = (Property[]) {
 DEFINE_BLOCK_PROPERTIES(SCSIGenericState, qdev.conf),
+DEFINE_PROP_UINT32("lun",  SCSIDiskState, lun, 0),
 DEFINE_PROP_END_OF_LIST(),
 },
 };
-- 
1.7.4.4





[Qemu-devel] [RFC PATCH 2/6] scsi: support parsing of SAM logical unit numbers

2011-05-20 Thread Paolo Bonzini
SAM logical unit numbers are complicated beasts that can address
multiple levels of buses and targets before finally reaching
logical units.  Begin supporting them by correctly parsing vSCSI
LUNs.

Note that with the current (admittedly incorrect) code OpenFirmware
thought the devices were at bus X, target 0, lun 0 (because OF
prefers access mode 0, which places bus numbers in the top byte),
while Linux thought it was bus 0, target Y, lun 0 (because Linux
uses access mode 2, which places target numbers in the top byte).
With this patch, everything consistently uses the former notation.
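To illustrate the ambiguity, the sketch below decodes only the first level (top 16 bits) of a SAM LUN according to its addressing method. It reflects my reading of the SAM peripheral device, flat space and logical unit formats rather than the code in this patch, and LunLevel/decode_first_level() are hypothetical. Method 00b puts a bus number in the low six bits of the top byte, while method 10b puts a target number there, which is the OpenFirmware vs. Linux difference described above.

```c
#include <assert.h>
#include <stdint.h>

/* First-level fields of a SAM LUN; -1 marks a field the method lacks. */
typedef struct {
    int method;   /* bits 63:62: 0 peripheral, 1 flat, 2 logical unit, 3 ext */
    int bus;
    int target;
    int lun;
} LunLevel;

static LunLevel decode_first_level(uint64_t sam_lun)
{
    LunLevel l = { (int)(sam_lun >> 62), -1, -1, -1 };
    unsigned b0 = (unsigned)(sam_lun >> 56) & 0xff;
    unsigned b1 = (unsigned)(sam_lun >> 48) & 0xff;

    switch (l.method) {
    case 0:                         /* peripheral device addressing */
        l.bus = b0 & 0x3f;
        if (l.bus == 0) {
            l.lun = b1;             /* bus 0: byte 1 is a LUN */
        } else {
            l.target = b1;          /* otherwise byte 1 selects a target */
        }
        break;
    case 1:                         /* flat space addressing: 14-bit LUN */
        l.lun = (int)(((b0 & 0x3f) << 8) | b1);
        break;
    case 2:                         /* logical unit addressing */
        l.target = b0 & 0x3f;
        l.bus = b1 >> 5;
        l.lun = b1 & 0x1f;
        break;
    default:                        /* extended addressing: not decoded */
        break;
    }
    return l;
}
```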

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/scsi-bus.c|  109 ++
 hw/scsi-defs.h   |   22 +++
 hw/scsi.h|7 +++
 hw/spapr_vscsi.c |   18 ++---
 4 files changed, 142 insertions(+), 14 deletions(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 2f0ffda..70b1092 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -718,3 +718,112 @@ static char *scsibus_get_fw_dev_path(DeviceState *dev)
 
 return strdup(path);
 }
+
+/* Decode the bus and level parts of a LUN, as defined in the SCSI architecture
+   model.  If false is returned, the LUN could not be parsed.  If true
+   is return, *bus and *target identify the next two steps in the
+   hierarchical LUN.
+
+   *lun can be used with scsi_get_lun to continue the parsing.  */
+static bool scsi_decode_level(uint64_t sam_lun, int *bus, int *target,
+  uint64_t *lun)
+{
+switch (sam_lun >> 62) {
+case ADDR_PERIPHERAL_DEVICE:
+*bus = (sam_lun >> 56) & 0x3f;
+if (*bus) {
+/* The TARGET OR LUN field selects a target; walk the next
+   16-bits to find the LUN.  */
+*target = (sam_lun >> 48) & 0xff;
+*lun = sam_lun << 16;
+} else {
+/* The TARGET OR LUN field selects a LUN on the current
+   node, identified by bus 0.  */
+*target = 0;
+*lun = (sam_lun & 0xffLL) | (1LL << 62);
+}
+return true;
+case ADDR_LOGICAL_UNIT:
+*bus = (sam_lun >> 53) & 7;
+*target = (sam_lun >> 56) & 0x3f;
+*lun = (sam_lun & 0x1fLL) | (1LL << 62);
+return true;
+case ADDR_FLAT_SPACE:
+*bus = 0;
+*target = 0;
+*lun = sam_lun;
+return true;
+case ADDR_LOGICAL_UNIT_EXT:
+if ((sam_lun >> 56) == ADDR_WELL_KNOWN_LUN ||
+(sam_lun >> 56) == ADDR_FLAT_SPACE_EXT) {
+*bus = 0;
+*target = 0;
+*lun = sam_lun;
+return true;
+}
+return false;
+}
+abort();
+}
+
+/* Extract a single-level LUN number from a LUN, as specified in the
+   SCSI architecture model.  Return -1 if this is not possible because
+   the LUN includes a bus or target component.  */
+static int scsi_get_lun(uint64_t sam_lun)
+{
+int bus, target;
+
+retry:
+switch (sam_lun >> 62) {
+case ADDR_PERIPHERAL_DEVICE:
+case ADDR_LOGICAL_UNIT:
+scsi_decode_level(sam_lun, &bus, &target, &sam_lun);
+if (bus || target) {
+return LUN_INVALID;
+}
+goto retry;
+
+case ADDR_FLAT_SPACE:
+return (sam_lun >> 48) & 0x3fff;
+case ADDR_LOGICAL_UNIT_EXT:
+if ((sam_lun >> 56) == ADDR_WELL_KNOWN_LUN) {
+return LUN_WLUN_BASE | ((sam_lun >> 48) & 0xff);
+}
+if ((sam_lun >> 56) == ADDR_FLAT_SPACE_EXT) {
+return (sam_lun >> 32) & 0xff;
+}
+return LUN_INVALID;
+}
+abort();
+}
+
+/* Extract bus and target from the given LUN and use it to identify a
+   SCSIDevice from a SCSIBus.  Right now, only 1 target per bus is
+   supported.  In the future a SCSIDevice could host its own SCSIBus,
+   in an alternation of devices that select a bus (target ports) and
+   devices that select a target (initiator ports).  */
+SCSIDevice *scsi_decode_lun(SCSIBus *sbus, uint64_t sam_lun, int *lun)
+{
+int bus, target, decoded_lun;
+uint64_t next_lun;
+
+if (!scsi_decode_level(sam_lun, &bus, &target, &next_lun)) {
+ /* Unsupported LUN format.  */
+ return NULL;
+}
+if (bus >= sbus->ndev || (bus == 0 && target > 0)) {
+/* Out of range.  */
+return NULL;
+}
+if (target != 0) {
+/* Only one target for now.  */
+return NULL;
+}
+
+decoded_lun = scsi_get_lun(next_lun);
+if (decoded_lun != LUN_INVALID) {
+*lun = decoded_lun;
+return sbus->devs[bus];
+}
+return NULL;
+}
diff --git a/hw/scsi-defs.h b/hw/scsi-defs.h
index 413cce0..66dfd4a 100644
--- a/hw/scsi-defs.h
+++ b/hw/scsi-defs.h
@@ -164,3 +164,25 @@
 #define TYPE_ENCLOSURE 0x0d/* Enclosure Services Device */
 #define TYPE_NO_LUN 0x7f
 
+/*
+ *  SCSI addressing methods (bits 62-63 of the LUN).
+ */
+#define ADDR_PERIPHERAL_DEVICE 0
+#define ADDR_FLAT_SPACE        1
+#define 

[Qemu-devel] [RFC PATCH 5/6] scsi: let a SCSIDevice have children devices

2011-05-20 Thread Paolo Bonzini
This provides the infrastructure for simple devices to pick LUNs.
Of course, this will not do anything until there is a device that
can report the existence of those LUNs.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/esp.c|4 +++-
 hw/lsi53c895a.c |2 +-
 hw/scsi-bus.c   |   14 ++
 hw/scsi.h   |3 +++
 4 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/hw/esp.c b/hw/esp.c
index 5a33c67..e5bab76 100644
--- a/hw/esp.c
+++ b/hw/esp.c
@@ -239,12 +239,14 @@ static uint32_t get_cmd(ESPState *s, uint8_t *buf)
 
 static void do_busid_cmd(ESPState *s, uint8_t *buf, uint8_t busid)
 {
+SCSIDevice *dev;
 int32_t datalen;
 int lun;
 
  DPRINTF("do_busid_cmd: busid 0x%x\n", busid);
 lun = busid & 7;
-s->current_req = scsi_req_new(s->current_dev, 0, lun);
+dev = scsi_find_lun(s->current_dev, lun, buf);
+s->current_req = scsi_req_new(dev, 0, lun);
 datalen = scsi_req_enqueue(s->current_req, buf);
 s->ti_size = datalen;
 if (datalen != 0) {
diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index f291283..c549955 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -780,7 +780,7 @@ static void lsi_do_command(LSIState *s)
 s->command_complete = 0;
 
 id = (s->select_tag >> 8) & 0xf;
-dev = s->bus.devs[id];
+dev = scsi_find_lun(s->bus.devs[id], s->current_lun, buf);
 if (!dev) {
 lsi_bad_selection(s, id);
 return;
diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 70b1092..4d46831 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -719,6 +719,20 @@ static char *scsibus_get_fw_dev_path(DeviceState *dev)
 return strdup(path);
 }
 
+/* Simplified walk of the SCSI bus hierarchy, for devices that only support
+   one bus and only flat-space LUNs (typically 3-bit ones!).  */
+SCSIDevice *scsi_find_lun(SCSIDevice *sdev, int lun, uint8_t *cdb)
+{
+SCSIBus *sbus = sdev->children;
+if (!sbus ||
+(lun == 0 && cdb[1] == REPORT_LUNS) ||
+lun >= sbus->ndev || sbus->devs[lun] == NULL) {
+return sdev;
+} else {
+return sbus->devs[lun];
+}
+}
+
 /* Decode the bus and level parts of a LUN, as defined in the SCSI architecture
model.  If false is returned, the LUN could not be parsed.  If true
is return, *bus and *target identify the next two steps in the
diff --git a/hw/scsi.h b/hw/scsi.h
index aa75b82..438dd89 100644
--- a/hw/scsi.h
+++ b/hw/scsi.h
@@ -58,6 +58,7 @@ struct SCSIDevice
 uint32_t id;
 BlockConf conf;
 SCSIDeviceInfo *info;
+SCSIBus *children;
 QTAILQ_HEAD(, SCSIRequest) requests;
 int blocksize;
 int type;
@@ -143,7 +144,9 @@ extern const struct SCSISense sense_code_LUN_FAILURE;
 
 int scsi_build_sense(SCSISense sense, uint8_t *buf, int len, int fixed);
 int scsi_sense_valid(SCSISense sense);
+
 SCSIDevice *scsi_decode_lun(SCSIBus *sbus, uint64_t sam_lun, int *lun);
+SCSIDevice *scsi_find_lun(SCSIDevice *sdev, int lun, uint8_t *cdb);
 
 SCSIRequest *scsi_req_alloc(size_t size, SCSIDevice *d, uint32_t tag, uint32_t lun);
 SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun);
-- 
1.7.4.4





[Qemu-devel] [RFC PATCH 1/6] scsi: ignore LUN field in the CDB

2011-05-20 Thread Paolo Bonzini
The LUN field in the CDB is a historical relic.  Ignore it as reserved,
which is what modern SCSI specifications actually say.
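For context, SCSI-2 let an initiator carry a 3-bit LUN in bits 7-5 of CDB byte 1; that is what the removed buf[1]/cmd[1] shifts extracted. A tiny sketch of the old and new behaviour, with hypothetical helper names: after the patch only the transport-supplied LUN (req->lun in QEMU) matters.

```c
#include <assert.h>
#include <stdint.h>

/* Old behaviour: a 3-bit LUN taken from bits 7:5 of CDB byte 1. */
static unsigned legacy_cdb_lun(const uint8_t *cdb)
{
    return cdb[1] >> 5;
}

/* New behaviour (hypothetical helper): the LUN comes only from the
 * transport; CDB byte 1 bits 7:5 are treated as reserved. */
static unsigned effective_lun(uint32_t transport_lun, const uint8_t *cdb)
{
    (void)cdb;   /* deliberately ignored */
    return transport_lun;
}
```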

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/scsi-disk.c|6 +++---
 hw/scsi-generic.c |5 ++---
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 4c7a53e..b14c32f 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -516,7 +516,7 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf)
 
 memset(outbuf, 0, buflen);
 
-if (req->lun || req->cmd.buf[1] >> 5) {
+if (req->lun) {
 outbuf[0] = 0x7f;  /* LUN not supported */
 return buflen;
 }
@@ -1022,9 +1022,9 @@ static int32_t scsi_send_command(SCSIRequest *req, uint8_t *buf)
 }
 #endif
 
-if (req->lun || buf[1] >> 5) {
+if (req->lun) {
 /* Only LUN 0 supported.  */
-DPRINTF("Unimplemented LUN %d\n", req->lun ? req->lun : buf[1] >> 5);
+DPRINTF("Unimplemented LUN %d\n", req->lun);
 if (command != REQUEST_SENSE && command != INQUIRY) {
 scsi_command_complete(r, CHECK_CONDITION,
   SENSE_CODE(LUN_NOT_SUPPORTED));
diff --git a/hw/scsi-generic.c b/hw/scsi-generic.c
index 0c04606..e6f0efd 100644
--- a/hw/scsi-generic.c
+++ b/hw/scsi-generic.c
@@ -335,9 +335,8 @@ static int32_t scsi_send_command(SCSIRequest *req, uint8_t *cmd)
 SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
 int ret;
 
-if (cmd[0] != REQUEST_SENSE &&
-(req->lun != s->lun || (cmd[1] >> 5) != s->lun)) {
-DPRINTF("Unimplemented LUN %d\n", req->lun ? req->lun : cmd[1] >> 5);
+if (cmd[0] != REQUEST_SENSE && req->lun != s->lun) {
+DPRINTF("Unimplemented LUN %d\n", req->lun);
 scsi_set_sense(s, SENSE_CODE(LUN_NOT_SUPPORTED));
 r->req.status = CHECK_CONDITION;
 scsi_req_complete(&r->req);
-- 
1.7.4.4





[Qemu-devel] [RFC PATCH 4/6] scsi-disk: allow customization of the lun

2011-05-20 Thread Paolo Bonzini
This will not work until there is a device that can answer REPORT LUNS
for disks with LUN != 0.  However, it provides the infrastructure.
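With a per-device lun property, INQUIRY for a different LUN returns data whose byte 0 is 0x7f: peripheral qualifier 011b (no device can be supported at this LUN) in bits 7-5 and peripheral device type 1Fh in bits 4-0. A sketch of that reply; inquiry_no_lun() and the 36-byte cap (the standard INQUIRY data length) are assumptions standing in for the patch's own buffer handling and SCSI_MAX_INQUIRY_LEN.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INQ_PQ_NOT_SUPPORTED 0x3    /* peripheral qualifier 011b */
#define INQ_TYPE_UNKNOWN     0x1f   /* peripheral device type 1Fh */
#define INQ_MAX_LEN          36     /* assumed cap: standard INQUIRY length */

/* Fill INQUIRY data for an unimplemented LUN; returns bytes produced. */
static size_t inquiry_no_lun(uint8_t *out, size_t alloc_len)
{
    size_t n = alloc_len < INQ_MAX_LEN ? alloc_len : INQ_MAX_LEN;

    if (n == 0) {
        return 0;
    }
    memset(out, 0, n);
    out[0] = (uint8_t)((INQ_PQ_NOT_SUPPORTED << 5) | INQ_TYPE_UNKNOWN);
    return n;
}
```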

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/scsi-disk.c |   17 +
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index b14c32f..f41550a 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -66,6 +66,7 @@ struct SCSIDiskState
 /* The qemu block layer uses a fixed 512 byte sector size.
This is the number of 512 byte blocks in a single scsi sector.  */
 int cluster_size;
+uint32_t lun;
 uint32_t removable;
 uint64_t max_lba;
 QEMUBH *bh;
@@ -516,7 +517,7 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf)
 
 memset(outbuf, 0, buflen);
 
-if (req->lun) {
+if (req->lun != s->lun) {
 outbuf[0] = 0x7f;  /* LUN not supported */
 return buflen;
 }
@@ -955,6 +956,7 @@ static int scsi_disk_emulate_command(SCSIDiskReq *r, uint8_t *outbuf)
 DPRINTF("Unsupported Service Action In\n");
 goto illegal_request;
 case REPORT_LUNS:
+assert(!s->lun);
 if (req->cmd.xfer < 16)
 goto illegal_request;
 memset(outbuf, 0, 16);
@@ -1022,14 +1024,12 @@ static int32_t scsi_send_command(SCSIRequest *req, uint8_t *buf)
 }
 #endif
 
-if (req->lun) {
-/* Only LUN 0 supported.  */
+if (command != REQUEST_SENSE && command != INQUIRY && req->lun != s->lun) {
+/* Only one LUN supported.  */
 DPRINTF("Unimplemented LUN %d\n", req->lun);
-if (command != REQUEST_SENSE && command != INQUIRY) {
-scsi_command_complete(r, CHECK_CONDITION,
-  SENSE_CODE(LUN_NOT_SUPPORTED));
-return 0;
-}
+scsi_command_complete(r, CHECK_CONDITION,
+  SENSE_CODE(LUN_NOT_SUPPORTED));
+return 0;
 }
 switch (command) {
 case TEST_UNIT_READY:
@@ -1247,6 +1247,7 @@ static SCSIDeviceInfo scsi_disk_info = {
 .get_sense= scsi_get_sense,
 .qdev.props   = (Property[]) {
 DEFINE_BLOCK_PROPERTIES(SCSIDiskState, qdev.conf),
+DEFINE_PROP_UINT32("lun",  SCSIDiskState, lun, 0),
 DEFINE_PROP_STRING("ver",  SCSIDiskState, version),
 DEFINE_PROP_STRING("serial",  SCSIDiskState, serial),
 DEFINE_PROP_BIT("removable", SCSIDiskState, removable, 0, false),
-- 
1.7.4.4





[Qemu-devel] [RFC PATCH 6/6] scsi: add walking of hierarchical LUNs

2011-05-20 Thread Paolo Bonzini
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/scsi-bus.c|   79 +++---
 hw/scsi.h|9 +-
 hw/spapr_vscsi.c |6 +++-
 3 files changed, 75 insertions(+), 19 deletions(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 4d46831..2037da3 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -811,33 +811,80 @@ retry:
 abort();
 }
 
-/* Extract bus and target from the given LUN and use it to identify a
-   SCSIDevice from a SCSIBus.  Right now, only 1 target per bus is
-   supported.  In the future a SCSIDevice could host its own SCSIBus,
-   in an alternation of devices that select a bus (target ports) and
-   devices that select a target (initiator ports).  */
-SCSIDevice *scsi_decode_lun(SCSIBus *sbus, uint64_t sam_lun, int *lun)
+/* Reusable implementation of the decode_lun entry in SCSIBusOps.  */
+SCSIDevice *scsi_decode_bus_from_lun(SCSIBus *sbus, uint64_t sam_lun,
+ uint64_t *next_lun)
 {
-int bus, target, decoded_lun;
-uint64_t next_lun;
+int bus, target;
+uint64_t my_next_lun;
+SCSIDevice *sdev;
 
-if (!scsi_decode_level(sam_lun, &bus, &target, &next_lun)) {
+if (!scsi_decode_level(sam_lun, &bus, &target, &my_next_lun)) {
  /* Unsupported LUN format.  */
  return NULL;
 }
-if (bus >= sbus->ndev || (bus == 0 && target > 0)) {
+if (bus >= sbus->ndev) {
 /* Out of range.  */
 return NULL;
 }
-if (target != 0) {
-/* Only one target for now.  */
+
+sdev = sbus->devs[bus];
+if (!sdev) {
+   return NULL;
+} else if (bus == 0 || !sdev->children) {
+return target ? NULL : sdev;
+} else {
+/* Next we'll decode the target, so pass down the same LUN we got.  */
+return sdev->children->ops.decode_lun(sbus, sam_lun, next_lun);
+}
+}
+
+SCSIDevice *scsi_decode_target_from_lun(SCSIBus *sbus, uint64_t sam_lun,
+uint64_t *next_lun)
+{
+int bus, target;
+SCSIDevice *sdev;
+
+if (!scsi_decode_level(sam_lun, &bus, &target, next_lun)) {
+ /* Unsupported LUN format.  */
+ return NULL;
+}
+if (target >= sbus->ndev) {
+/* Out of range.  */
 return NULL;
 }
 
+sdev = sbus->devs[target];
+if (!sdev || !sdev->children || (*next_lun >> 56) == ADDR_WELL_KNOWN_LUN) {
+return sdev;
+} else {
+return sdev->children->ops.decode_lun(sbus, *next_lun, next_lun);
+}
+}
+
+/* Extract bus and target from the given LUN and use it to identify a
+   SCSIDevice from a SCSIBus.  Right now, only 1 target per bus is
+   supported.  In the future a SCSIDevice could host its own SCSIBus,
+   in an alternation of devices that select a bus (target ports) and
+   devices that select a target (initiator ports).  */
+SCSIDevice *scsi_decode_lun(SCSIBus *sbus, uint64_t sam_lun,
+uint8_t *cdb, int *lun)
+{
+int decoded_lun;
+uint64_t next_lun;
+SCSIDevice *sdev;
+
+sdev = sbus->ops.decode_lun(sbus, sam_lun, &next_lun);
+if (!sdev) {
+return NULL;
+}
 decoded_lun = scsi_get_lun(next_lun);
-if (decoded_lun != LUN_INVALID) {
-*lun = decoded_lun;
-return sbus->devs[bus];
+if (decoded_lun == LUN_INVALID) {
+return NULL;
+}
+if ((decoded_lun & ~LUN_WLUN_MASK) == LUN_WLUN_BASE) {
+return sdev;
 }
-return NULL;
+*lun = decoded_lun;
+return scsi_find_lun(sdev, decoded_lun, cdb);
 }
diff --git a/hw/scsi.h b/hw/scsi.h
index 438dd89..c4cca0b 100644
--- a/hw/scsi.h
+++ b/hw/scsi.h
@@ -88,6 +88,8 @@ struct SCSIBusOps {
 void (*transfer_data)(SCSIRequest *req, uint32_t arg);
 void (*complete)(SCSIRequest *req, uint32_t arg);
 void (*cancel)(SCSIRequest *req);
+SCSIDevice *(*decode_lun)(SCSIBus *sbus, uint64_t sam_lun,
+  uint64_t *next_lun);
 };
 
 struct SCSIBus {
@@ -145,7 +147,12 @@ extern const struct SCSISense sense_code_LUN_FAILURE;
 int scsi_build_sense(SCSISense sense, uint8_t *buf, int len, int fixed);
 int scsi_sense_valid(SCSISense sense);
 
-SCSIDevice *scsi_decode_lun(SCSIBus *sbus, uint64_t sam_lun, int *lun);
+SCSIDevice *scsi_decode_lun(SCSIBus *sbus, uint64_t sam_lun, uint8_t *cdb,
+int *lun);
+SCSIDevice *scsi_decode_bus_from_lun(SCSIBus *sbus, uint64_t sam_lun,
+ uint64_t *next_lun);
+SCSIDevice *scsi_decode_target_from_lun(SCSIBus *sbus, uint64_t sam_lun,
+uint64_t *next_lun);
 SCSIDevice *scsi_find_lun(SCSIDevice *sdev, int lun, uint8_t *cdb);
 
 SCSIRequest *scsi_req_alloc(size_t size, SCSIDevice *d, uint32_t tag, uint32_t lun);
diff --git a/hw/spapr_vscsi.c b/hw/spapr_vscsi.c
index ee88ff6..d46ab30 100644
--- a/hw/spapr_vscsi.c
+++ b/hw/spapr_vscsi.c
@@ -640,7 +640,8 @@ static int vscsi_queue_cmd(VSCSIState *s, vscsi_req *req)
 

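The patch above resolves a SAM LUN one level at a time. Each bus decodes a bus/target pair and hands the remaining levels (`next_lun`) to its children's `decode_lun` hook. This excerpt does not show how `scsi_decode_level()` parses a level, so the following is only a minimal sketch of one decoding step. It assumes the SAM peripheral device addressing method, and `decode_level_sketch` is a hypothetical name, not the QEMU function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not QEMU's scsi_decode_level): peel the first
 * 16-bit level off an 8-byte SAM LUN. Only the peripheral device
 * addressing method (top two bits 00b) is accepted; in that method
 * bits 13..8 of the level hold the bus and bits 7..0 the target. */
static int decode_level_sketch(uint64_t sam_lun, int *bus, int *target,
                               uint64_t *next_lun)
{
    uint16_t level = (uint16_t)(sam_lun >> 48);

    if ((level >> 14) != 0) {
        return 0;                  /* other addressing methods: unsupported */
    }
    *bus = (level >> 8) & 0x3f;
    *target = level & 0xff;
    *next_lun = sam_lun << 16;     /* shift the next level to the top */
    return 1;
}
```

Passing `next_lun` back into the helper descends one more level. This mirrors the hand-off through `ops.decode_lun` in the patch.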
Re: [Qemu-devel] [PATCH 20/26] target-xtensa: implement extended L32R

2011-05-20 Thread Richard Henderson
On 05/20/2011 12:14 AM, Max Filippov wrote:
> As far as I can see LITBASE usage pattern is that it is set up once
> in early initialization and is never changed after.

That's probably true on a per-program basis.  I.e. for semi-hosting or userland
emulation, hard-coding litbase into the TB could make sense.

However, for full system emulation, with kernel and userland et al, I would 
expect that litbase would tend to be set per-application.  At which point it
would almost certainly be more efficient to read the value at runtime.


r~



Re: [Qemu-devel] [RFC] Memory API

2011-05-20 Thread Anthony Liguori

On 05/20/2011 04:01 AM, Avi Kivity wrote:

On 05/19/2011 07:32 PM, Anthony Liguori wrote:

Think of how a window manager folds windows with priorities onto a flat
framebuffer.

You do a depth-first walk of the tree. For each child list, you iterate
it from the lowest to highest priority, allowing later subregions to
override earlier subregions.




Okay, but this doesn't explain how you'll let RAM override the VGA
mapping since RAM is not represented in the same child list as VGA
(RAM is a child of the PMC whereas VGA is a child of ISA/PCI, both of
which are at least one level removed from the PMC).


VGA will override RAM.

Memory controller
|
+-- RAM container (prio 0)
|
+-- PCI container (prio 1)
|
+--- vga window


Unless the RAM controller increases its priority, right?  That's how
you would implement SMM, by doing priority++?


But if you have:

Memory controller
|
+-- RAM container (prio 0)
|
+-- PCI container (prio 1)
|
+-- PCI-X container (prio 2)
|
+--- vga window

Now you need to do priority = 3?

Jan had previously mentioned registering a new temporary window.
I assume the registration always gets highest_priority++, or do you have
to explicitly specify that the PCI container gets priority=1?


Regards,

Anthony Liguori
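The depth-first walk described in this thread visits each child list from lowest to highest priority, so later subregions override earlier ones. It can be sketched in a few lines. This is a toy model, not the proposed memory API. `Region`, `flatten` and the 16-slot address space are hypothetical, and regions use absolute offsets to keep the model small. It also shows why VGA overrides RAM in the first tree: the PCI container (priority 1) is rendered after the RAM container (priority 0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SPACE_SIZE   16  /* toy address space: one slot per page */
#define MAX_CHILDREN 8

/* Hypothetical region node; offsets are absolute to keep the toy small. */
typedef struct Region {
    const char *name;
    int base, size;             /* used by leaves only */
    int priority;               /* higher priority wins among siblings */
    int is_leaf;
    const struct Region *children[MAX_CHILDREN];
    size_t nr_children;
} Region;

static int cmp_priority(const void *a, const void *b)
{
    const Region *ra = *(const Region *const *)a;
    const Region *rb = *(const Region *const *)b;
    return (ra->priority > rb->priority) - (ra->priority < rb->priority);
}

/* Depth-first walk: siblings are rendered from lowest to highest priority,
 * so a higher-priority subregion overwrites what a lower one wrote.
 * (qsort is not stable, so equal priorities have no defined order.) */
static void flatten(const Region *r, const char *map[SPACE_SIZE])
{
    const Region *sorted[MAX_CHILDREN];
    size_t n = r->nr_children < MAX_CHILDREN ? r->nr_children : MAX_CHILDREN;

    if (r->is_leaf) {
        for (int a = r->base; a < r->base + r->size && a < SPACE_SIZE; a++) {
            map[a] = r->name;
        }
        return;
    }
    memcpy(sorted, r->children, n * sizeof(sorted[0]));
    qsort(sorted, n, sizeof(sorted[0]), cmp_priority);
    for (size_t i = 0; i < n; i++) {
        flatten(sorted[i], map);
    }
}
```

To make SMM-style RAM win over the VGA window in this model, the RAM container's priority would have to exceed the PCI container's. That is the trade-off being debated above.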









[Qemu-devel] [PATCH] s390x: complain when allocating ram fails

2011-05-20 Thread Alexander Graf
While trying out the > 64GB guest RAM patch, I hit some virtual address
limitations of my host system, which resulted in mmap failing. Unfortunately,
qemu didn't tell me about this failure, but just happily used the invalid
pointer, resulting in either segmentation faults or other fun errors.

To spare other users from tracing this down, let's print a nice message
instead so the user can figure out what's wrong from there.

Signed-off-by: Alexander Graf ag...@suse.de
---
 exec.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/exec.c b/exec.c
index 3f96d44..a4785b2 100644
--- a/exec.c
+++ b/exec.c
@@ -2918,6 +2918,10 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
 new_block->host = mmap((void*)0x8, size,
 PROT_EXEC|PROT_READ|PROT_WRITE,
 MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+if (new_block->host == MAP_FAILED) {
+fprintf(stderr, "Allocating RAM failed\n");
+abort();
+}
 #else
 if (xen_mapcache_enabled()) {
 xen_ram_alloc(new_block->offset, size);
-- 
1.6.0.2
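The fix works because `mmap()` reports failure with `MAP_FAILED`, not `NULL`. Below is a minimal, hypothetical helper that applies the same check outside QEMU. Unlike the patch, it returns `NULL` to the caller instead of calling `abort()`, and it uses neither the fixed address nor `MAP_SHARED` from the s390x path.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical helper: map anonymous guest RAM and report failure.
 * mmap() returns MAP_FAILED ((void *)-1) on error, so testing only for
 * NULL would let an invalid pointer through. */
static void *alloc_guest_ram(size_t size)
{
    void *host;

    assert(size > 0);
    host = mmap(NULL, size, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (host == MAP_FAILED) {
        perror("Allocating RAM failed");
        return NULL;
    }
    return host;
}
```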




[Qemu-devel] [PATCH V5 00/12] Qemu Trusted Platform Module (TPM) integration

2011-05-20 Thread Stefan Berger
The following series of patches adds a TPM (Trusted Platform Module)
TIS (TPM Interface Spec) interface to Qemu and thereby provides the
means to access a backend implementing the actual TPM functionality.
This frontend enables, for example, Linux's TPM TIS (tpm_tis) driver.

I am also posting a backend implementation that is based on a library
(libtpms) providing TPM functionality. This library is still undergoing
further testing but is now available via Fedora Rawhide:

http://download.fedora.redhat.com/pub/fedora/linux/development/rawhide/x86_64/os/Packages/libtpms-0.5.1-5.x86_64.rpm
http://download.fedora.redhat.com/pub/fedora/linux/development/rawhide/x86_64/os/Packages/libtpms-devel-0.5.1-5.x86_64.rpm

source at

http://download.fedora.redhat.com/pub/fedora/linux/development/rawhide/source/SRPMS/libtpms-0.5.1-5.src.rpm

All testing was done with the libtpms-based backend. It provides support for
VM suspend/resume, migration and snapshotting. It stores its persistent state
in a QCoW2 file, which is necessary for supporting snapshotting. With Linux as
the OS, along with some recently posted patches for the Linux TPM TIS driver,
suspend/resume works fine (using 'virsh save/restore'), as do hibernation and
OS suspend (ACPI S3).

Proper support for the TPM requires support in the BIOS, since the BIOS
needs to initialize the TPM upon machine start and issue commands to the TPM
when the machine resumes from suspend (ACPI S3). The BIOS also builds the
necessary ACPI tables (SSDT for the TPM device, TCPA table for logging) and
connects them to the other ACPI tables it builds. To support this I have
written a fairly extensive set of extensions for SeaBIOS that have already
been posted to the SeaBIOS mailing list and ACK'ed by Kevin (thank you! :-)).

v5:
 - applies to checkout of 1fddfba1
 - adding support for split command line using the -tpmdev ... -device ...
   options while keeping the -tpm option
 - support for querying the device models using -tpm model=?
 - support for monitor 'info tpm'
 - adding documentation of command line options for man page and web page
 - increasing room for ACPI tables that qemu reserves to 128kb (from 64kb)
 - adding (experimental) support for block migration
 - adding (experimental) support for taking measurements when kernel,
   initrd and kernel command line are directly passed to Qemu

v4:
 - applies to checkout of d2d979c6
 - more coding style fixes
 - adding patch for supporting blob encryption (in addition to the existing
   QCoW2-level encryption)
   - this allows for graceful termination of a migration if the target
 is detected to have a wrong key
   - tested with big and little endian hosts
 - main thread releases mutex while checking for work to do on behalf of
   backend
 - introducing file locking (fcntl) on the block layer for serializing access
   to shared (QCoW2) files (used during migration)

v3:
 - Building a null driver at patch 5/8 that responds to all requests
   with an error response; subsequently this driver is transformed to the
   libtpms-based driver for real TPM functionality
 - Reworked the threading; dropped the patch for qemu_thread_join; the
   main thread synchronizing with the TPM thread termination may need
   to write data to the block storage while waiting for the thread to 
   terminate; did not previously show a problem but is safer
 - A lot of testing based on recent git checkout 4b4a72e5 (4/10):
   - migration of i686 VM from x86_64 host to i686 host to ppc64 host while
 running tests inside the VM
   - tests with S3 suspend/resume
   - tests with snapshots
   - multiple-hour tests with VM suspend/resume (using virsh save/restore)
 while running a TPM test suite inside the VM
   All tests passed; [not all of them were done on the ppc64 host]

v2:
 - splitting some of the patches into smaller ones for easier review
 - fixes in individual patches

Regards,
Stefan





[Qemu-devel] [PATCH V5 05/12] Add a debug register

2011-05-20 Thread Stefan Berger
This patch uses the TIS provision for vendor-specific registers to add a
debug register that is useful for dumping the TIS's internal state. The
register is active only in a debug build (#define DEBUG_TIS).

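The dump loop in this patch prints each buffer byte as hex and marks the byte at the current offset. Below is the same idea as a standalone, hypothetical helper. It formats into a caller-supplied string instead of stderr so the output can be checked, and it omits the `tpm_tis:` continuation prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring the dump loop in tis_dump_state(): each
 * byte is printed as two hex digits, prefixed by '>' if it is the byte at
 * the current read/write offset (otherwise by a space), with a newline
 * after every 16 bytes. Returns the number of characters written, or -1
 * if out is too small. */
static int format_tis_buffer(char *out, size_t outlen, const uint8_t *buf,
                             size_t len, size_t offset)
{
    size_t used = 0;

    assert(out != NULL && outlen > 0);
    out[0] = '\0';
    for (size_t idx = 0; idx < len; idx++) {
        int n = snprintf(out + used, outlen - used, "%c%02x%s",
                         idx == offset ? '>' : ' ', buf[idx],
                         (idx & 0xf) == 0xf ? "\n" : "");
        if (n < 0 || (size_t)n >= outlen - used) {
            return -1;
        }
        used += (size_t)n;
    }
    return (int)used;
}
```

For example, the bytes `00 80 c1` with offset 1 format as `" 00>80 c1"`.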
v3:
 - all output goes to stderr

Signed-off-by: Stefan Berger stef...@linux.vnet.ibm.com

---
 hw/tpm_tis.c |   67 +++
 1 file changed, 67 insertions(+)

Index: qemu-git/hw/tpm_tis.c
===
--- qemu-git.orig/hw/tpm_tis.c
+++ qemu-git/hw/tpm_tis.c
@@ -43,6 +43,8 @@
 #define TIS_REG_DID_VID   0xf00
 #define TIS_REG_RID   0xf04
 
+/* vendor-specific registers */
+#define TIS_REG_DEBUG 0xf90
 
 #define STS_VALID                    (1 << 7)
 #define STS_COMMAND_READY            (1 << 6)
@@ -316,6 +318,66 @@ static uint32_t tis_data_read(TPMState *
 }
 
 
+#ifdef DEBUG_TIS
+static void tis_dump_state(void *opaque, target_phys_addr_t addr)
+{
+static const unsigned regs[] = {
+TIS_REG_ACCESS,
+TIS_REG_INT_ENABLE,
+TIS_REG_INT_VECTOR,
+TIS_REG_INT_STATUS,
+TIS_REG_INTF_CAPABILITY,
+TIS_REG_STS,
+TIS_REG_DID_VID,
+TIS_REG_RID,
+0xfff};
+int idx;
+uint8_t locty = tis_locality_from_addr(addr);
+target_phys_addr_t base = addr & ~0xfff;
+TPMState *s = opaque;
+
+fprintf(stderr,
+"tpm_tis: active locality      : %d\n"
+"tpm_tis: state of locality %d : %d\n"
+"tpm_tis: register dump:\n",
+s->active_locty,
+locty, s->loc[locty].state);
+
+for (idx = 0; regs[idx] != 0xfff; idx++) {
+fprintf(stderr, "tpm_tis: 0x%04x : 0x%08x\n", regs[idx],
+tis_mem_readl(opaque, base + regs[idx]));
+}
+}
+
+fprintf(stderr,
+"tpm_tis: read offset   : %d\n"
+"tpm_tis: result buffer : ",
+s->loc[locty].r_offset);
+for (idx = 0;
+ idx < tis_get_size_from_buffer(&s->loc[locty].r_buffer);
+ idx++) {
+fprintf(stderr, "%c%02x%s",
+s->loc[locty].r_offset == idx ? '>' : ' ',
+s->loc[locty].r_buffer.buffer[idx],
+((idx & 0xf) == 0xf) ? "\ntpm_tis:  " : "");
+}
+fprintf(stderr,
+"\n"
+"tpm_tis: write offset  : %d\n"
+"tpm_tis: request buffer: ",
+s->loc[locty].w_offset);
+for (idx = 0;
+ idx < tis_get_size_from_buffer(&s->loc[locty].w_buffer);
+ idx++) {
+fprintf(stderr, "%c%02x%s",
+s->loc[locty].w_offset == idx ? '>' : ' ',
+s->loc[locty].w_buffer.buffer[idx],
+((idx & 0xf) == 0xf) ? "\ntpm_tis:  " : "");
+}
+fprintf(stderr, "\n");
+}
+#endif
+
+
 /*
  * Read a register of the TIS interface
  * See specs pages 33-63 for description of the registers
@@ -391,6 +453,11 @@ static uint32_t tis_mem_readl(void *opaq
 case TIS_REG_RID:
 val = TPM_RID;
 break;
+#ifdef DEBUG_TIS
+case TIS_REG_DEBUG:
+tis_dump_state(opaque, addr);
+break;
+#endif
 }
 
 qemu_mutex_unlock(&s->state_lock);




[Qemu-devel] [PATCH V5 01/12] Support for TPM command line options

2011-05-20 Thread Stefan Berger
This patch adds support for TPM command line options.
The command lines supported here (for the libtpms-based
backend) are

./qemu-... -tpm builtin,path=<path to blockstorage file>

and

./qemu-... -tpmdev builtin,path=<path to blockstorage file>,id=<id>
   -device tpm-tis,tpmdev=<id>

and

./qemu-... -tpmdev ?

where the latter works similarly to -soundhw ? and shows a list of
available TPM backends ('builtin').

To show the available TPM models do:

./qemu-... -tpm model=?


In the case of -tpm, 'type' ('builtin' above) and 'model' are interpreted in
tpm.c. In the case of -tpmdev, 'type' and 'id' are interpreted in tpm.c.
The type parameter selects the backend, i.e., 'builtin' for the
libtpms-based built-in TPM. Interpreting the other parameters, and determining
whether enough parameters were provided, is left to the backend driver, which
needs to implement the interface function 'create' and return a TPMDriver
structure if the VM can be started, or NULL if too few or invalid parameters
were provided.

Since SeaBIOS will now use 128kb for ACPI tables, the amount of memory
reserved for ACPI tables is increased to 128kb.

Monitor support for 'info tpm' has been added. For example, it prints the
following:

TPM devices:
  builtin: model=tpm-tis,id=tpm0



v5:
 - fixing typo reported by Serge Hallyn
 - Adapting code to split command line parameters supporting 
   -tpmdev ... -device tpm-tis,tpmdev=...
 - moved code out of arch_init.c|h into tpm.c|h
 - increasing reserved memory for ACPI tables to 128kb (from 64kb)
 - the backend interface has a create() function for interpreting the command
   line parameters and returning a TPMDevice structure; previously
   this function was called handle_options()
 - the backend interface has a destroy() function for cleaning up after
   the create() function was called
 - added support for 'info tpm' in monitor

v4:
 - coding style fixes

v3:
 - added hw/tpm_tis.h to this patch so Qemu compiles at this stage

Signed-off-by: Stefan Berger stef...@linux.vnet.ibm.com

---
 Makefile.target |1 
 hmp-commands.hx |2 
 hw/pc.c |5 -
 hw/tpm_tis.h|   76 +++
 monitor.c   |   10 ++
 qemu-config.c   |   46 +
 qemu-options.hx |   80 
 tpm.c   |  277 
 tpm.h   |  104 +
 vl.c|   14 ++
 10 files changed, 614 insertions(+), 1 deletion(-)

Index: qemu-git/qemu-options.hx
===
--- qemu-git.orig/qemu-options.hx
+++ qemu-git/qemu-options.hx
@@ -1691,6 +1691,86 @@ ETEXI
 
 DEFHEADING()
 
+DEFHEADING(TPM device options:)
+
+#ifndef _WIN32
+# ifdef CONFIG_TPM
+DEF("tpm", HAS_ARG, QEMU_OPTION_tpm, \
+"" \
+"-tpm builtin,path=path[,model=model]\n" \
+"                enable a builtin TPM with state in file in path\n" \
+"-tpm model=?    to list available TPM device models\n" \
+"-tpm ?          to list available TPM backend types\n",
+QEMU_ARCH_I386)
+DEF("tpmdev", HAS_ARG, QEMU_OPTION_tpmdev, \
+"-tpmdev [builtin],id=str[,option][,option][,...]\n",
+QEMU_ARCH_I386)
+# endif
+#endif
+STEXI
+
+The general form of a TPM device option is:
+@table @option
+
+@item -tpmdev @var{backend} ,id=@var{id} [,@var{options}]
+@findex -tpmdev
+Backend type must be:
+@option{builtin}.
+
+The specific backend type will determine the applicable options.
+The @code{-tpmdev} option requires a @code{-device} option.
+
+Options to each backend are described below.
+
+Use ? to print all available TPM backend types.
+@example
+qemu -tpmdev ?
+@end example
+
+@item -tpmdev builtin,id=@var{id},path=@var{path}
+
+Creates an instance of the built-in TPM.
+
+@option{path} specifies the path to the QCoW2 image that will store
+the TPM's persistent data. @option{path} is required.
+
+To create a built-in TPM use the following two options:
+@example
+-tpmdev builtin,id=tpm0,path=path_to_qcow2 -device tpm-tis,tpmdev=tpm0
+@end example
+Note that the @code{-tpmdev} id is @code{tpm0} and is referenced by
+@code{tpmdev=tpm0} in the device option.
+
+@end table
+
+The short form of a TPM device option is:
+@table @option
+
+@item -tpm @var{backend-type},path=@var{path}[,model=@var{model}]
+@findex -tpm
+
+@option{model} specifies the device model and is optional. The default
+device model is @code{tpm-tis}.
+
+Use ? to print all available TPM models.
+@example
+qemu -tpm model=?
+@end example
+
+The other options have the same meaning as explained above.
+
+To create a built-in TPM use the following option:
+@example
+-tpm builtin,path=path_to_qcow2
+@end example
+
+@end table
+
+ETEXI
+
+
+DEFHEADING()
+
 DEFHEADING(Linux/Multiboot boot specific:)
 STEXI
 
Index: qemu-git/vl.c
===
--- qemu-git.orig/vl.c
+++ qemu-git/vl.c
@@ -137,6 +137,7 @@ int main(int 

[Qemu-devel] [PATCH V5 02/12] Add TPM (frontend) hardware interface (TPM TIS) to Qemu

2011-05-20 Thread Stefan Berger
This patch adds the main code of the TPM frontend driver, the TPM TIS
interface, to Qemu. The code is largely based on my previous implementation
for Xen but has been significantly extended to meet the standard's
requirements, such as support for changing localities and the full
functionality of the available flags.

Communication with the backend (e.g., the one for Xen or the libtpms-based
one) is cleanly separated through an interface which the backend driver
needs to implement.

The TPM TIS driver's backend was previously chosen in the code added
to arch_init. The frontend holds a pointer to the chosen backend (interface).

Communication with the backend is largely based on signals and conditions.
Whenever the frontend has collected a complete packet, it will signal
the backend, which then starts processing the command. Once the result
has been returned, the backend invokes a callback function
(tis_tpm_receive_cb()).

The one tricky part is support for VM suspend while the TPM is processing
a command. In this case the frontend driver waits for the backend
to return the result of the last command before shutting down. It waits
on a condition for a signal from the backend, which is delivered in
tis_tpm_receive_cb().

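The hand-off between frontend and backend, and the suspend-time wait described above, can be sketched with POSIX threads. This is not the patch's code, which uses QEMU's mutex and condition wrappers. `TisSketch`, `submit()`, `receive_cb()` and `wait_for_completion()` are hypothetical names. Each wait loops on its predicate so that spurious wakeups are harmless.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

enum { STATE_IDLE, STATE_EXECUTION, STATE_COMPLETION };

/* Hypothetical frontend/backend shared state. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t to_backend;    /* frontend -> backend: command queued */
    pthread_cond_t from_backend;  /* backend -> frontend: result ready */
    int state;
    int shutdown;
    char cmd[32], resp[32];
} TisSketch;

static void tis_sketch_init(TisSketch *s)
{
    memset(s, 0, sizeof(*s));
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->to_backend, NULL);
    pthread_cond_init(&s->from_backend, NULL);
}

/* Plays the role of tis_tpm_receive_cb(); called with s->lock held. */
static void receive_cb(TisSketch *s, const char *resp)
{
    strncpy(s->resp, resp, sizeof(s->resp) - 1);
    s->state = STATE_COMPLETION;
    pthread_cond_broadcast(&s->from_backend);
}

static void *backend_thread(void *opaque)
{
    TisSketch *s = opaque;

    pthread_mutex_lock(&s->lock);
    for (;;) {
        /* Re-check the predicate after every wakeup. */
        while (s->state != STATE_EXECUTION && !s->shutdown) {
            pthread_cond_wait(&s->to_backend, &s->lock);
        }
        if (s->shutdown) {
            break;
        }
        receive_cb(s, s->cmd);  /* "process" the command by echoing it */
    }
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

/* Frontend: a complete command packet has been collected. */
static void submit(TisSketch *s, const char *cmd)
{
    pthread_mutex_lock(&s->lock);
    strncpy(s->cmd, cmd, sizeof(s->cmd) - 1);
    s->state = STATE_EXECUTION;
    pthread_cond_signal(&s->to_backend);
    pthread_mutex_unlock(&s->lock);
}

/* Suspend path: block until the in-flight command has completed. */
static void wait_for_completion(TisSketch *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->state == STATE_EXECUTION) {
        pthread_cond_wait(&s->from_backend, &s->lock);
    }
    pthread_mutex_unlock(&s->lock);
}

static void stop_backend(TisSketch *s, pthread_t thread)
{
    pthread_mutex_lock(&s->lock);
    s->shutdown = 1;
    pthread_cond_signal(&s->to_backend);
    pthread_mutex_unlock(&s->lock);
    pthread_join(thread, NULL);
}
```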
The correct functioning of the different flags and localities cannot be
tested from user space (when running Linux, for example), since the address
space of the TPM TIS interface is not accessible from there. Also, the Linux
driver itself does not exercise all functionality. Therefore a fairly extensive
test suite is part of the SeaBIOS patches, since from within the BIOS one has
full access to all of the TPM's registers.

v5:
  - adding comment to tis_data_read
  - refactoring following support for split command line options
-tpmdev and -device
  - code handling the configuration of the TPM device was moved to tpm.c
  - removed empty line at end of file

v3:
  - prefixing functions with tis_
  - added a function 'early_startup_tpm' to the backend interface that
detects the presence of the block storage and gracefully terminates
Qemu if it is not available. This works with migration using shared
storage but not with block storage migration.
For encrypted QCoW2 and in the case of a snapshot resume, the
late_startup_tpm interface function is called.

Signed-off-by: Stefan Berger stef...@linux.vnet.ibm.com

---
 hw/tpm_tis.c |  839 +++
 1 file changed, 839 insertions(+)

Index: qemu-git/hw/tpm_tis.c
===
--- /dev/null
+++ qemu-git/hw/tpm_tis.c
@@ -0,0 +1,839 @@
+/*
+ * tpm_tis.c - QEMU emulator for a 1.2 TPM with TIS interface
+ *
+ * Copyright (C) 2006,2010 IBM Corporation
+ *
+ * Author: Stefan Berger stef...@us.ibm.com
+ * David Safford saff...@us.ibm.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ *
+ *
+ * Implementation of the TIS interface according to specs at
+ * https://www.trustedcomputinggroup.org/groups/pc_client/TCG_PCClientTPMSpecification_1-20_1-00_FINAL.pdf
+ *
+ */
+
+#include "tpm.h"
+#include "block.h"
+#include "hw/hw.h"
+#include "hw/pc.h"
+#include "hw/tpm_tis.h"
+
+#include <stdio.h>
+
+//#define DEBUG_TIS
+
+/* whether the STS interrupt is supported */
+//#define RAISE_STS_IRQ
+
+/* tis registers */
+#define TIS_REG_ACCESS                 0x00
+#define TIS_REG_INT_ENABLE             0x08
+#define TIS_REG_INT_VECTOR             0x0c
+#define TIS_REG_INT_STATUS             0x10
+#define TIS_REG_INTF_CAPABILITY        0x14
+#define TIS_REG_STS                    0x18
+#define TIS_REG_DATA_FIFO              0x24
+#define TIS_REG_DID_VID                0xf00
+#define TIS_REG_RID                    0xf04
+
+
+#define STS_VALID                    (1 << 7)
+#define STS_COMMAND_READY            (1 << 6)
+#define STS_TPM_GO                   (1 << 5)
+#define STS_DATA_AVAILABLE           (1 << 4)
+#define STS_EXPECT                   (1 << 3)
+#define STS_RESPONSE_RETRY           (1 << 1)
+
+#define ACCESS_TPM_REG_VALID_STS     (1 << 7)
+#define ACCESS_ACTIVE_LOCALITY       (1 << 5)
+#define ACCESS_BEEN_SEIZED           (1 << 4)
+#define ACCESS_SEIZE                 (1 << 3)
+#define ACCESS_PENDING_REQUEST       (1 << 2)
+#define ACCESS_REQUEST_USE           (1 << 1)
+#define ACCESS_TPM_ESTABLISHMENT     (1 << 0)
+
+#define INT_ENABLED                  (1 << 31)
+#define INT_DATA_AVAILABLE           (1 << 0)
+#define INT_STS_VALID                (1 << 1)
+#define INT_LOCALITY_CHANGED         (1 << 2)
+#define INT_COMMAND_READY            (1 << 7)
+
+#ifndef RAISE_STS_IRQ
+
+# define INTERRUPTS_SUPPORTED (INT_LOCALITY_CHANGED | \
+   INT_DATA_AVAILABLE   | \
+   INT_COMMAND_READY)
+
+#else
+
+# 

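A TIS driver polls the STS bits defined in this patch. The helpers below sketch how those bits combine. The helper names are hypothetical. The rule that dataAvail and Expect are meaningful only while stsValid is set comes from the TIS specification, not from this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in the tpm_tis.c patch. */
#define STS_VALID          (1 << 7)
#define STS_COMMAND_READY  (1 << 6)
#define STS_TPM_GO         (1 << 5)
#define STS_DATA_AVAILABLE (1 << 4)
#define STS_EXPECT         (1 << 3)

/* Hypothetical helpers: dataAvail and Expect are only meaningful while
 * stsValid is set, so both bits are checked together. */
static bool tis_response_available(uint32_t sts)
{
    const uint32_t mask = STS_VALID | STS_DATA_AVAILABLE;
    return (sts & mask) == mask;
}

static bool tis_expects_more_data(uint32_t sts)
{
    const uint32_t mask = STS_VALID | STS_EXPECT;
    return (sts & mask) == mask;
}

static bool tis_is_command_ready(uint32_t sts)
{
    return (sts & STS_COMMAND_READY) != 0;
}
```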
[Qemu-devel] [PATCH V5 04/12] Add tpm_tis driver to build process

2011-05-20 Thread Stefan Berger
The TPM interface (tpm_tis) needs to be explicitly enabled via
./configure --enable-tpm. This patch also restricts building the TPM
support to i386 and x86_64 targets, since it is currently supported only
there. This is meant to avoid ending up with support for a frontend but
no available backend.

v3:
 - fixed and moved hunks in Makefile.target into right place

Signed-off-by: Stefan Berger stef...@linux.vnet.ibm.com

---
 Makefile.target |1 +
 configure   |   20 
 2 files changed, 21 insertions(+)

Index: qemu-git/Makefile.target
===
--- qemu-git.orig/Makefile.target
+++ qemu-git/Makefile.target
@@ -239,6 +239,7 @@ obj-i386-y += device-hotplug.o pci-hotpl
 obj-i386-y += debugcon.o multiboot.o
 obj-i386-y += pc_piix.o kvmclock.o
 obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
+obj-i386-$(CONFIG_TPM) += tpm_tis.o
 
 # shared objects
 obj-ppc-y = ppc.o
Index: qemu-git/configure
===
--- qemu-git.orig/configure
+++ qemu-git/configure
@@ -179,6 +179,7 @@ rbd=
 smartcard=
 smartcard_nss=
 opengl=
+tpm=no
 
 # parse CC options first
 for opt do
@@ -714,6 +715,8 @@ for opt do
   ;;
   --kerneldir=*) kerneldir=$optarg
   ;;
+  --enable-tpm) tpm=yes
+  ;;
   --with-pkgversion=*) pkgversion=" ($optarg)"
   ;;
   --disable-docs) docs=no
@@ -1014,6 +1017,7 @@ echo   --disable-smartcard  disable
 echo "  --enable-smartcard       enable smartcard support"
 echo "  --disable-smartcard-nss  disable smartcard nss support"
 echo "  --enable-smartcard-nss   enable smartcard nss support"
+echo "  --enable-tpm             enables an emulated TPM"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -2706,6 +2710,7 @@ echo rbd support   $rbd
 echo "xfsctl support    $xfs"
 echo "nss used          $smartcard_nss"
 echo "OpenGL support    $opengl"
+echo "TPM support       $tpm"
 
 if test $sdl_too_old = yes; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -3524,6 +3529,21 @@ if test $gprof = yes ; then
   fi
 fi
 
+if test "$tpm" = "yes"; then
+  has_tpm=0
+  if test "$target_softmmu" = "yes" ; then
+case "$TARGET_BASE_ARCH" in
+i386)
+  has_tpm=1
+;;
+esac
+  fi
+
+  if test "$has_tpm" = "1"; then
+  echo "CONFIG_TPM=y" >> $config_host_mak
+  fi
+fi
+
 linker_script="-Wl,-T../config-host.ld -Wl,-T,\$(SRC_PATH)/\$(ARCH).ld"
 if test "$target_linux_user" = "yes" -o "$target_bsd_user" = "yes" ; then
   case $ARCH in




[Qemu-devel] [PATCH V5 03/12] Add persistent state handling to TPM TIS frontend driver

2011-05-20 Thread Stefan Berger
This patch adds handling of persistent state to the TPM TIS
frontend.

The buffer currently in use is determined (it can only belong to the currently
active locality and is either a read or a write buffer), and only that buffer's
content is stored. The reverse is done when the state is restored from disk:
the saved content is copied back into the buffer currently in use.

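The save/restore scheme described above persists only the buffer in use and zeroes the saved copy otherwise. It can be sketched with simplified, hypothetical types. The real code additionally waits for an executing command to finish and lets the backend save its own volatile data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

enum { STATE_IDLE, STATE_RECEPTION, STATE_EXECUTION, STATE_COMPLETION };

/* Hypothetical, simplified per-locality state. */
typedef struct {
    int state;
    uint8_t w_buffer[64], r_buffer[64];
    uint32_t w_offset, r_offset;
} LocalitySketch;

/* Hypothetical device state: only the active locality, plus the single
 * buffer and offset that would be written to disk. */
typedef struct {
    LocalitySketch loc;
    uint8_t buf[32];
    uint32_t offset;
} TPMStateSketch;

/* Save only the buffer that is in use; zero the saved buffer otherwise. */
static void pre_save_sketch(TPMStateSketch *s)
{
    LocalitySketch *l = &s->loc;

    switch (l->state) {
    case STATE_RECEPTION:   /* a command was being written by the guest */
        memcpy(s->buf, l->w_buffer, MIN(sizeof(s->buf), sizeof(l->w_buffer)));
        s->offset = l->w_offset;
        break;
    case STATE_COMPLETION:  /* a response was waiting to be read */
        memcpy(s->buf, l->r_buffer, MIN(sizeof(s->buf), sizeof(l->r_buffer)));
        s->offset = l->r_offset;
        break;
    default:
        memset(s->buf, 0, sizeof(s->buf));  /* leak nothing */
        break;
    }
}

/* Reverse direction: copy the saved buffer back into the one in use. */
static void post_load_sketch(TPMStateSketch *s)
{
    LocalitySketch *l = &s->loc;

    switch (l->state) {
    case STATE_RECEPTION:
        memcpy(l->w_buffer, s->buf, MIN(sizeof(s->buf), sizeof(l->w_buffer)));
        l->w_offset = s->offset;
        break;
    case STATE_COMPLETION:
        memcpy(l->r_buffer, s->buf, MIN(sizeof(s->buf), sizeof(l->r_buffer)));
        l->r_offset = s->offset;
        break;
    default:
        break;
    }
}
```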
To remain compatible with existing Xen state, the VMStateDescription was
adapted accordingly. For that reason I am adding Andreas Niederl as an
author of the file.

v5:
 - removing qdev.no_user=1

v4:
 - main thread releases the 'state' lock while periodically calling the
   backend's function that may request it to write data into block storage.

v3:
 - all functions prefixed with tis_
 - while the main thread is waiting for an outstanding TPM command to finish,
   it periodically does some work (writes data to the block storage)

Signed-off-by: Stefan Berger stef...@linux.vnet.ibm.com

---
 hw/tpm_tis.c |  166 +++
 1 file changed, 166 insertions(+)

Index: qemu-git/hw/tpm_tis.c
===
--- qemu-git.orig/hw/tpm_tis.c
+++ qemu-git/hw/tpm_tis.c
@@ -6,6 +6,8 @@
  * Author: Stefan Berger stef...@us.ibm.com
  * David Safford saff...@us.ibm.com
  *
+ * Xen 4 support: Andreas Niederl andreas.nied...@iaik.tugraz.at
+ *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License as
  * published by the Free Software Foundation, version 2 of the
@@ -837,3 +839,167 @@ static int tis_init(ISADevice *dev)
  err_exit:
 return -1;
 }
+
+/* persistent state handling */
+
+static void tis_pre_save(void *opaque)
+{
+TPMState *s = opaque;
+uint8_t locty = s->active_locty;
+
+qemu_mutex_lock(&s->state_lock);
+
+/* wait for outstanding requests to complete */
+if (IS_VALID_LOCTY(locty) && s->loc[locty].state == STATE_EXECUTION) {
+if (!s->be_driver->ops->job_for_main_thread) {
+qemu_cond_wait(&s->from_tpm_cond, &s->state_lock);
+} else {
+while (s->loc[locty].state == STATE_EXECUTION) {
+qemu_mutex_unlock(&s->state_lock);
+
+s->be_driver->ops->job_for_main_thread(NULL);
+usleep(1);
+
+qemu_mutex_lock(&s->state_lock);
+}
+}
+}
+
+#ifdef DEBUG_TIS_SR
+fprintf(stderr, "tpm_tis: suspend: locty 0 : r_offset = %d, w_offset = %d\n",
+s->loc[0].r_offset,
+s->loc[0].w_offset);
+if (s->loc[0].r_offset) {
+tis_dump_state(opaque, 0);
+}
+#endif
+
+qemu_mutex_unlock(&s->state_lock);
+
+/* copy current active read or write buffer into the buffer
+   written to disk */
+if (IS_VALID_LOCTY(locty)) {
+switch (s->loc[locty].state) {
+case STATE_RECEPTION:
+memcpy(s->buf,
+   s->loc[locty].w_buffer.buffer,
+   MIN(sizeof(s->buf),
+   s->loc[locty].w_buffer.size));
+s->offset = s->loc[locty].w_offset;
+break;
+case STATE_COMPLETION:
+memcpy(s->buf,
+   s->loc[locty].r_buffer.buffer,
+   MIN(sizeof(s->buf),
+   s->loc[locty].r_buffer.size));
+s->offset = s->loc[locty].r_offset;
+break;
+default:
+/* leak nothing */
+memset(s->buf, 0x0, sizeof(s->buf));
+break;
+}
+}
+
+s->be_driver->ops->save_volatile_data();
+}
+
+
+static int tis_post_load(void *opaque,
+ int version_id __attribute__((unused)))
+{
+TPMState *s = opaque;
+
+uint8_t locty = s->active_locty;
+
+if (IS_VALID_LOCTY(locty)) {
+switch (s->loc[locty].state) {
+case STATE_RECEPTION:
+memcpy(s->loc[locty].w_buffer.buffer,
+   s->buf,
+   MIN(sizeof(s->buf),
+   s->loc[locty].w_buffer.size));
+s->loc[locty].w_offset = s->offset;
+break;
+case STATE_COMPLETION:
+memcpy(s->loc[locty].r_buffer.buffer,
+   s->buf,
+   MIN(sizeof(s->buf),
+   s->loc[locty].r_buffer.size));
+s->loc[locty].r_offset = s->offset;
+break;
+default:
+break;
+}
+}
+
+#ifdef DEBUG_TIS_SR
+fprintf(stderr, "tpm_tis: resume : locty 0 : r_offset = %d, w_offset = %d\n",
+s->loc[0].r_offset,
+s->loc[0].w_offset);
+#endif
+
+return s->be_driver->ops->load_volatile_data(s);
+}
+
+
+static const VMStateDescription vmstate_locty = {
+.name = "loc",
+.version_id = 1,
+.minimum_version_id = 0,
+.minimum_version_id_old = 0,
+.fields  = (VMStateField[]) {
+VMSTATE_UINT32(state   , TPMLocality),
+VMSTATE_UINT32(inte, TPMLocality),
+

[Qemu-devel] [PATCH V5 08/12] Introduce file lock for the block layer

2011-05-20 Thread Stefan Berger
This patch introduces file locking via fcntl() for the block layer so that
concurrent access to files shared by two Qemu instances, for example via NFS,
can be serialized. This feature is useful primarily during the initial phase of
VM migration, when the target machine's TIS driver validates the block
storage (and, in a later patch, checks for missing AES keys) and terminates
Qemu if the storage is found to be faulty. This allows the migration to be
terminated gracefully while Qemu continues running on the source machine.

Support for win32 is based on the win32 API and has been lightly tested with a
standalone test program that locks shared storage from two different machines.

To allow locking a file multiple times, a counter is used. The actual lock is
taken the very first time, and the file is unlocked when the counter drops back
to zero.

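The reference-counted locking described above can be sketched as follows. The patch keeps `num_locks` in `BlockDriver`. This hypothetical sketch instead keeps the counter next to the file descriptor, giving one counter per open file. `F_SETLKW` blocks until the lock is granted, so the sketch retries only on `EINTR`. As in the patch, a nested lock call does not change the lock mode.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical per-file lock state: descriptor plus nesting counter. */
typedef struct {
    int fd;
    int num_locks;
} LockedFileSketch;

static int set_whole_file_lock(int fd, short type)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;  /* 0 means "up to the end of the file" */
    /* F_SETLKW blocks until the lock is granted; retry only on EINTR. */
    while (fcntl(fd, F_SETLKW, &fl) < 0) {
        if (errno != EINTR) {
            return -errno;
        }
    }
    return 0;
}

/* Only the first lock call reaches fcntl(); later ones just count. */
static int lock_sketch(LockedFileSketch *f, int write)
{
    int ret;

    if (f->num_locks > 0) {
        f->num_locks++;
        return 0;
    }
    ret = set_whole_file_lock(f->fd, write ? F_WRLCK : F_RDLCK);
    if (ret == 0) {
        f->num_locks = 1;
    }
    return ret;
}

/* The file is really unlocked only when the counter drops to zero. */
static int unlock_sketch(LockedFileSketch *f)
{
    if (f->num_locks == 0) {
        return -EINVAL;
    }
    if (--f->num_locks > 0) {
        return 0;
    }
    return set_whole_file_lock(f->fd, F_UNLCK);
}
```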
Signed-off-by: Stefan Berger stef...@linux.vnet.ibm.com

---
 block.c   |   40 ++
 block.h   |8 ++
 block/raw-posix.c |   62 ++
 block/raw-win32.c |   51 
 block_int.h   |4 +++
 5 files changed, 165 insertions(+)

Index: qemu-git/block.c
===
--- qemu-git.orig/block.c
+++ qemu-git/block.c
@@ -475,6 +475,8 @@ static int bdrv_open_common(BlockDriverS
 goto free_and_fail;
 }
 
+drv->num_locks = 0;
+
 bs->keep_read_only = bs->read_only = !(open_flags & BDRV_O_RDWR);
 
 ret = refresh_total_sectors(bs, bs->total_sectors);
@@ -1181,6 +1183,44 @@ void bdrv_get_geometry(BlockDriverState 
 *nb_sectors_ptr = length;
 }
 
+/* file locking */
+static int bdrv_lock_common(BlockDriverState *bs, BDRVLockType lock_type)
+{
+BlockDriver *drv = bs->drv;
+
+if (!drv)
+return -ENOMEDIUM;
+
+if (bs->file) {
+drv = bs->file->drv;
+if (drv->bdrv_lock) {
+return drv->bdrv_lock(bs->file, lock_type);
+}
+}
+
+if (drv->bdrv_lock) {
+return drv->bdrv_lock(bs, lock_type);
+}
+
+return -ENOTSUP;
+}
+
+
+int bdrv_lock(BlockDriverState *bs)
+{
+if (bdrv_is_read_only(bs)) {
+return bdrv_lock_common(bs, BDRV_F_RDLCK);
+}
+
+return bdrv_lock_common(bs, BDRV_F_WRLCK);
+}
+
+void bdrv_unlock(BlockDriverState *bs)
+{
+bdrv_lock_common(bs, BDRV_F_UNLCK);
+}
+
+
 struct partition {
 uint8_t boot_ind;   /* 0x80 - active */
 uint8_t head;   /* starting head */
Index: qemu-git/block.h
===
--- qemu-git.orig/block.h
+++ qemu-git/block.h
@@ -42,6 +42,12 @@ typedef struct QEMUSnapshotInfo {
 #define BDRV_SECTOR_MASK   ~(BDRV_SECTOR_SIZE - 1)
 
 typedef enum {
+BDRV_F_UNLCK,
+BDRV_F_RDLCK,
+BDRV_F_WRLCK,
+} BDRVLockType;
+
+typedef enum {
 BLOCK_ERR_REPORT, BLOCK_ERR_IGNORE, BLOCK_ERR_STOP_ENOSPC,
 BLOCK_ERR_STOP_ANY
 } BlockErrorAction;
@@ -95,6 +101,8 @@ int bdrv_commit(BlockDriverState *bs);
 void bdrv_commit_all(void);
 int bdrv_change_backing_file(BlockDriverState *bs,
 const char *backing_file, const char *backing_fmt);
+int bdrv_lock(BlockDriverState *bs);
+void bdrv_unlock(BlockDriverState *bs);
 void bdrv_register(BlockDriver *bdrv);
 
 
Index: qemu-git/block/raw-posix.c
===
--- qemu-git.orig/block/raw-posix.c
+++ qemu-git/block/raw-posix.c
@@ -718,6 +718,66 @@ static int64_t raw_getlength(BlockDriver
 }
 #endif
 
+static int raw_lock(BlockDriverState *bs, BDRVLockType lock_type)
+{
+BlockDriver *drv = bs->drv;
+BDRVRawState *s = bs->opaque;
+struct flock flock = {
+.l_whence = SEEK_SET,
+.l_start = 0,
+.l_len = 0,
+};
+int n;
+
+switch (lock_type) {
+case BDRV_F_RDLCK:
+case BDRV_F_WRLCK:
+if (drv->num_locks) {
+drv->num_locks++;
+return 0;
+}
+flock.l_type = (lock_type == BDRV_F_RDLCK) ? F_RDLCK : F_WRLCK;
+break;
+
+case BDRV_F_UNLCK:
+if (--drv->num_locks > 0) {
+return 0;
+}
+
+assert(drv->num_locks == 0);
+
+flock.l_type = F_UNLCK;
+break;
+
+default:
+return -EINVAL;
+}
+
+while (1) {
+n = fcntl(s->fd, F_SETLKW, &flock);
+if (n < 0) {
+if (errno == EINTR) {
+continue;
+}
+if (errno == EAGAIN) {
+usleep(1);
+continue;
+}
+}
+break;
+}
+
+if (n == 0 &&
+((lock_type == BDRV_F_RDLCK) || (lock_type == BDRV_F_WRLCK))) {
+drv->num_locks = 1;
+}
+
+if (n)
+return -errno;
+
+return 0;
+}
+
 static int raw_create(const char *filename, QEMUOptionParameter *options)
 {
 int fd;
@@ -814,6 +874,8 @@ static BlockDriver bdrv_file = {
 .bdrv_truncate = raw_truncate,
   

[Qemu-devel] [PATCH V5 12/12] Experimental support for taking measurements when kernel etc. are passed to Qemu

2011-05-20 Thread Stefan Berger
This really is just for experimental purposes since there are problems
when doing something similar with a multiboot kernel.

This patch addresses the case where the user provides the kernel, initrd
and kernel command line via command line parameters to Qemu. To avoid
incorrect measurements by SeaBIOS, the setup part of the kernel needs
to be treated separately.

For SeaBIOS to be able to measure the kernel such that the measurement
corresponds to running 'sha1sum' on the kernel file, we need to preserve the
setup part of the kernel. Since Qemu modifies it, we store a copy of the
original setup part, which SeaBIOS later retrieves and concatenates with the
rest of the kernel to get the correct measurement.
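To make the measurement idea concrete, the following host-side C sketch rebuilds the original file image from the preserved setup copy and the rest of the kernel. It is not SeaBIOS code: rebuild_kernel_image() is a hypothetical helper, and a firmware implementation would more likely feed both pieces into an incremental SHA-1 than copy them into one buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: rebuild the byte image of the original kernel file
 * from the unmodified setup copy and the remainder of the kernel, so that
 * hashing the result gives the same digest as 'sha1sum' on the file.
 * Returns a newly allocated buffer (caller frees) or NULL on failure.
 */
static uint8_t *rebuild_kernel_image(const uint8_t *setup_orig,
                                     size_t setup_size,
                                     const uint8_t *kernel, size_t kernel_size,
                                     size_t *image_size)
{
    uint8_t *image;

    assert(setup_orig != NULL && kernel != NULL && image_size != NULL);

    if (setup_size > SIZE_MAX - kernel_size) {
        return NULL;                                 /* size would overflow */
    }
    image = malloc(setup_size + kernel_size);
    if (image == NULL) {
        return NULL;
    }
    memcpy(image, setup_orig, setup_size);           /* unmodified setup part */
    memcpy(image + setup_size, kernel, kernel_size); /* rest of the kernel */
    *image_size = setup_size + kernel_size;
    return image;
}
```

Hashing the returned buffer should then match the digest of the file on disk, because the setup part Qemu patches in memory is replaced by the untouched copy.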

An alternative would be to measure the files in Qemu and make the measurements
available to SeaBIOS. This would, however, make Qemu depend on a SHA-1
implementation.

Signed-off-by: Stefan Berger <stef...@linux.vnet.ibm.com>

---
 hw/fw_cfg.h |1 +
 hw/pc.c |8 +++-
 2 files changed, 8 insertions(+), 1 deletion(-)

Index: qemu-git/hw/fw_cfg.h
===
--- qemu-git.orig/hw/fw_cfg.h
+++ qemu-git/hw/fw_cfg.h
@@ -27,6 +27,7 @@
 #define FW_CFG_SETUP_SIZE   0x17
 #define FW_CFG_SETUP_DATA   0x18
 #define FW_CFG_FILE_DIR 0x19
+#define FW_CFG_SETUP_ORIG_DATA  0x1a
 
 #define FW_CFG_FILE_FIRST   0x20
 #define FW_CFG_FILE_SLOTS   0x10
Index: qemu-git/hw/pc.c
===
--- qemu-git.orig/hw/pc.c
+++ qemu-git/hw/pc.c
@@ -659,7 +659,7 @@ static void load_linux(void *fw_cfg,
 uint16_t protocol;
 int setup_size, kernel_size, initrd_size = 0, cmdline_size;
 uint32_t initrd_max;
-uint8_t header[8192], *setup, *kernel, *initrd_data;
+uint8_t header[8192], *setup, *kernel, *initrd_data, *setup_orig;
 target_phys_addr_t real_addr, prot_addr, cmdline_addr, initrd_addr = 0;
 FILE *f;
 char *vmode;
@@ -807,6 +807,7 @@ static void load_linux(void *fw_cfg,
 kernel_size -= setup_size;
 
 setup  = qemu_malloc(setup_size);
+setup_orig = qemu_malloc(setup_size);
 kernel = qemu_malloc(kernel_size);
 fseek(f, 0, SEEK_SET);
 if (fread(setup, 1, setup_size, f) != setup_size) {
@@ -818,6 +819,9 @@ static void load_linux(void *fw_cfg,
 exit(1);
 }
 fclose(f);
+
+memcpy(setup_orig, setup, setup_size);
+
 memcpy(setup, header, MIN(sizeof(header), setup_size));
 
 fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, prot_addr);
@@ -828,6 +832,8 @@ static void load_linux(void *fw_cfg,
 fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, setup_size);
 fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA, setup, setup_size);
 
+fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_ORIG_DATA, setup_orig, setup_size);
+
 option_rom[nb_option_roms].name = "linuxboot.bin";
 option_rom[nb_option_roms].bootindex = 0;
 nb_option_roms++;




[Qemu-devel] [PATCH V5 06/12] Add a TPM backend skeleton implementation

2011-05-20 Thread Stefan Berger
This patch provides a TPM backend skeleton implementation. It doesn't do
anything useful (except for returning an error response for every TPM
command), but it compiles.
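The error response the skeleton returns is the 10-byte tpm_std_fatal_error_response array defined in the patch: a big-endian tag, total length, and return code. This sketch shows how such a response could be built; build_error_response() and the put_be*() helpers are hypothetical, while 0x00c4 (TPM_TAG_RSP_COMMAND) and 9 (TPM_FAIL) are the TPM 1.2 values that match the bytes in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 16-bit value in big-endian byte order. */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/* Store a 32-bit value in big-endian byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)((v >> 16) & 0xff);
    p[2] = (uint8_t)((v >> 8) & 0xff);
    p[3] = (uint8_t)(v & 0xff);
}

/*
 * Build a minimal TPM 1.2 error response:
 * tag (2 bytes) | paramSize (4 bytes) | returnCode (4 bytes).
 * Returns the number of bytes written, or 0 if 'out' is too small.
 */
static size_t build_error_response(uint8_t *out, size_t out_len,
                                   uint32_t return_code)
{
    const uint16_t tag = 0x00c4;  /* TPM_TAG_RSP_COMMAND */
    const uint32_t size = 10;     /* total response length in bytes */

    assert(out != NULL);
    if (out_len < size) {
        return 0;
    }
    put_be16(out, tag);
    put_be32(out + 2, size);
    put_be32(out + 6, return_code);
    return size;
}
```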

v5:
  - the backend interface now has a create and destroy function.
The former is used during the initialization phase of the TPM
and the latter to clean up when Qemu terminates.

v3:
  - in tpm_builtin.c, all functions are prefixed with tpm_builtin_
  - build the builtin TPM driver, which at this point returns
    a failure response message for every command
  - do not try to join the TPM thread but poll for its termination;
    the libtpms-based driver will require Qemu's main thread to write
    data to the block storage device while it would otherwise be
    blocked trying to join

v2:
  - only terminate the thread in tpm_atexit if it is running

Signed-off-by: Stefan Berger <stef...@linux.vnet.ibm.com>

---
 Makefile.target  |5 
 configure|1 
 hw/tpm_builtin.c |  454 +++
 tpm.c|3 
 tpm.h|1 
 5 files changed, 464 insertions(+)

Index: qemu-git/hw/tpm_builtin.c
===
--- /dev/null
+++ qemu-git/hw/tpm_builtin.c
@@ -0,0 +1,454 @@
+/*
+ *  builtin 'null' TPM driver
+ *
+ *  Copyright (c) 2010, 2011 IBM Corporation
+ *  Copyright (c) 2010, 2011 Stefan Berger
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see http://www.gnu.org/licenses/
+ */
+
+#include "qemu-common.h"
+#include "tpm.h"
+#include "hw/hw.h"
+#include "hw/tpm_tis.h"
+#include "hw/pc.h"
+
+
+//#define DEBUG_TPM
+//#define DEBUG_TPM_SR /* suspend - resume */
+
+
+/* data structures */
+
+typedef struct ThreadParams {
+TPMState *tpm_state;
+
+TPMRecvDataCB *recv_data_callback;
+} ThreadParams;
+
+
+/* local variables */
+
+static QemuThread thread;
+
+static QemuMutex state_mutex; /* protects *_state below */
+static QemuMutex tpm_initialized_mutex; /* protect tpm_initialized */
+
+static bool thread_terminate = false;
+static bool tpm_initialized = false;
+static bool had_fatal_error = false;
+static bool had_startup_error = false;
+static bool thread_running = false;
+
+static ThreadParams tpm_thread_params;
+
+/* locality of the command being executed by libtpms */
+static uint8_t g_locty;
+
+static const unsigned char tpm_std_fatal_error_response[10] = {
+0x00, 0xc4, 0x00, 0x00, 0x00, 0x0A, 0x00, 0x00, 0x00, 0x09 /* TPM_FAIL */
+};
+
+static char dev_description[80];
+
+
+static void *tpm_builtin_main_loop(void *d)
+{
+    int res = 1;
+    ThreadParams *thr_parms = d;
+    uint32_t in_len, out_len;
+    uint8_t *in, *out;
+    uint32_t resp_size; /* total length of response */
+
+#ifdef DEBUG_TPM
+    fprintf(stderr, "tpm: THREAD IS STARTING\n");
+#endif
+
+    if (res != 0) {
+#if defined DEBUG_TPM || defined DEBUG_TPM_SR
+        fprintf(stderr, "tpm: Error: TPM initialization failed (rc=%d)\n",
+                res);
+#endif
+        had_fatal_error = true;
+    } else {
+        qemu_mutex_lock(&tpm_initialized_mutex);
+
+        tpm_initialized = true;
+
+        qemu_mutex_unlock(&tpm_initialized_mutex);
+    }
+
+    /* start command processing */
+    while (!thread_terminate) {
+        /* receive and handle commands */
+        in_len = 0;
+        do {
+#ifdef DEBUG_TPM
+            fprintf(stderr, "tpm: waiting for commands...\n");
+#endif
+
+            if (thread_terminate) {
+                break;
+            }
+
+            qemu_mutex_lock(&thr_parms->tpm_state->state_lock);
+
+            /* in case we were too slow and missed the signal, the
+               to_tpm_execute boolean tells us about a pending command */
+            if (!thr_parms->tpm_state->to_tpm_execute) {
+                qemu_cond_wait(&thr_parms->tpm_state->to_tpm_cond,
+                               &thr_parms->tpm_state->state_lock);
+            }
+
+            thr_parms->tpm_state->to_tpm_execute = false;
+
+            qemu_mutex_unlock(&thr_parms->tpm_state->state_lock);
+
+            if (thread_terminate) {
+                break;
+            }
+
+            g_locty = thr_parms->tpm_state->command_locty;
+
+            in = thr_parms->tpm_state->loc[g_locty].w_buffer.buffer;
+            in_len = thr_parms->tpm_state->loc[g_locty].w_offset;
+
+            if (!had_fatal_error) {
+
+                out_len = thr_parms->tpm_state->loc[g_locty].r_buffer.size;
+
+#ifdef DEBUG_TPM
+   

[Qemu-devel] [PATCH V5 10/12] Encrypt state blobs using AES CBC encryption

2011-05-20 Thread Stefan Berger
This patch adds encryption of the individual state blobs that are written
into the block storage. The 'directory' at the beginning of the block
storage is not encrypted.

Keys can be passed either as a string of hexadecimal digits forming a 256-,
192- or 128-bit AES key, optionally prefixed with '0x', or as an arbitrary
string: if the parser does not recognize the string as a hexadecimal key,
the string itself is taken as the AES key.
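A minimal sketch of these parsing rules, assuming the hexadecimal form must have exactly 32, 48 or 64 digits. parse_aes_key() and hex_value() are hypothetical names, and the patch does not say how a plain-string key of arbitrary length is turned into an AES key; the fallback here (copy up to 16 bytes, zero-pad to 128 bits) is purely an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Return the value of a hex digit, or -1 if 'c' is not one. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    c = (char)tolower((unsigned char)c);
    if (c >= 'a' && c <= 'f') {
        return c - 'a' + 10;
    }
    return -1;
}

/*
 * Hypothetical key parser. Returns the key length in bits (128, 192 or 256).
 * - Optionally "0x"-prefixed strings of 32, 48 or 64 hex digits are decoded.
 * - Anything else is used as raw bytes: at most 16 bytes are copied and the
 *   key is zero-padded to 128 bits (an assumed rule, not from the patch).
 */
static int parse_aes_key(const char *str, uint8_t key[32])
{
    const char *hex = str;
    size_t len, i;

    assert(str != NULL && key != NULL);

    if (hex[0] == '0' && (hex[1] == 'x' || hex[1] == 'X')) {
        hex += 2;
    }
    len = strlen(hex);

    if (len == 32 || len == 48 || len == 64) {
        for (i = 0; i < len; i++) {
            if (hex_value(hex[i]) < 0) {
                break;
            }
        }
        if (i == len) {
            for (i = 0; i < len / 2; i++) {
                key[i] = (uint8_t)((hex_value(hex[2 * i]) << 4) |
                                   hex_value(hex[2 * i + 1]));
            }
            return (int)(len * 4);
        }
    }

    /* Not a recognized hex key: the string itself becomes the key. */
    memset(key, 0, 32);
    len = strlen(str);
    memcpy(key, str, len < 16 ? len : 16);
    return 128;
}
```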

The key is passed via a command line argument and is wiped from the command
line after parsing: if key=0x1234... was passed, it will be changed to
key=--... so that 'ps' no longer shows the key. Obviously, it cannot be
completely prevented that the key is visible for a very short period of
time until qemu is done parsing the command line parameters.
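The wiping step might look roughly like this. wipe_key_option() is hypothetical and assumes the key value ends at the next ',' of the option string. Overwriting the argument string in place works because, on Linux, 'ps' reads the command line from the process's own memory via /proc/<pid>/cmdline.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical sketch: overwrite the value of every "key=" option in place,
 * e.g. "type=builtin,key=0x1234" -> "type=builtin,key=------".
 * A match counts only at the start of the string or right after a ','.
 */
static void wipe_key_option(char *arg)
{
    char *p;

    assert(arg != NULL);
    p = strstr(arg, "key=");
    while (p != NULL) {
        if (p == arg || p[-1] == ',') {
            /* replace the value up to the next ',' or the end */
            for (p += 4; *p != '\0' && *p != ','; p++) {
                *p = '-';
            }
        } else {
            p += 4;  /* e.g. "mykey=": not our option, skip it */
        }
        p = strstr(p, "key=");
    }
}
```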

A flag is introduced in the directory structure indicating whether the blobs
are encrypted.

An additional 'layer' for reading and writing the blobs to the underlying
block storage is added. This layer encrypts the blobs before writing if a key
is available. Similarly, it decrypts the blobs after reading them.

Checks are added that test whether a key has been provided although all
data are stored in clear text, or whether a key is missing although the
data are encrypted. In either case the backend returns an error and Qemu
terminates.

v5:
  - -tpmdev now also gets a key parameter
  - add documentation about key parameter

Signed-off-by: Stefan Berger <stef...@linux.vnet.ibm.com>

---
 hw/tpm_builtin.c |  213 +--
 qemu-config.c|   10 ++
 qemu-options.hx  |   20 -
 tpm.c|   10 ++
 4 files changed, 246 insertions(+), 7 deletions(-)

Index: qemu-git/hw/tpm_builtin.c
===
--- qemu-git.orig/hw/tpm_builtin.c
+++ qemu-git/hw/tpm_builtin.c
@@ -27,6 +27,7 @@
 #include "hw/pc.h"
 #include "migration.h"
 #include "sysemu.h"
+#include "aes.h"
 
 #include <libtpms/tpm_library.h>
 #include <libtpms/tpm_error.h>
@@ -110,7 +111,8 @@ typedef struct BSDir {
 uint16_t  rev;
 uint32_t  checksum;
 uint32_t  num_entries;
-uint32_t  reserved[10];
+uint32_t  flags;
+uint32_t  reserved[9];
 BSEntry   entries[BS_DIR_MAX_NUM_ENTRIES];
 } __attribute__((packed)) BSDir;
 
@@ -119,6 +121,8 @@ typedef struct BSDir {
 
 #define BS_DIR_REV_CURRENT  BS_DIR_REV1
 
+#define BS_DIR_FLAG_ENC_BLOBS   (1 << 0)
+
 /* local variables */
 
 static QemuThread thread;
@@ -150,6 +154,8 @@ static const unsigned char tpm_std_fatal
 
 static char dev_description[80];
 
+static bool has_key;
+static AES_KEY tpm_enc_key, tpm_dec_key;
 
 static void tpm_builtin_adjust_data_layout(BlockDriverState *bs, BSDir *dir);
 static int tpm_builtin_load_sized_data_from_bs(BlockDriverState *bs,
@@ -206,6 +212,7 @@ static void tpm_builtin_dir_be_to_cpu(BS
 be16_to_cpus(&dir->rev);
 be32_to_cpus(&dir->checksum);
 be32_to_cpus(&dir->num_entries);
+be32_to_cpus(&dir->flags);
 
 for (c = 0; c < dir->num_entries && c < BS_DIR_MAX_NUM_ENTRIES; c++) {
 be32_to_cpus(&dir->entries[c].type);
@@ -232,6 +239,7 @@ static void tpm_builtin_dir_cpu_to_be(BS
 dir->rev         = cpu_to_be16(dir->rev);
 dir->checksum    = cpu_to_be32(dir->checksum);
 dir->num_entries = cpu_to_be32(dir->num_entries);
+dir->flags       = cpu_to_be32(dir->flags);
 }
 
 
@@ -297,6 +305,36 @@ static bool tpm_builtin_has_valid_conten
 }
 
 
+static uint32_t tpm_builtin_get_dir_flags(void)
+{
+if (has_key) {
+return BS_DIR_FLAG_ENC_BLOBS;
+}
+
+return 0;
+}
+
+
+static bool tpm_builtin_has_missing_key(const BSDir *dir)
+{
+if ((dir->flags & BS_DIR_FLAG_ENC_BLOBS) && !has_key) {
+return true;
+}
+
+return false;
+}
+
+
+static bool tpm_builtin_has_unnecessary_key(const BSDir *dir)
+{
+if (!(dir->flags & BS_DIR_FLAG_ENC_BLOBS) && has_key) {
+return true;
+}
+
+return false;
+}
+
+
 static int tpm_builtin_create_blank_dir(BlockDriverState *bs)
 {
 uint8_t buf[BDRV_SECTOR_SIZE];
@@ -307,6 +345,7 @@ static int tpm_builtin_create_blank_dir(
 dir = (BSDir *)buf;
 dir->rev = BS_DIR_REV_CURRENT;
 dir->num_entries = 0;
+dir->flags = tpm_builtin_get_dir_flags();
 
 dir-checksum = tpm_builtin_calc_dir_checksum(dir);
 
@@ -408,6 +447,28 @@ static int tpm_builtin_startup_bs(BlockD
 
 tpm_builtin_dir_be_to_cpu(dir);
 
+if (tpm_builtin_is_valid_bsdir(dir)) {
+if (tpm_builtin_has_missing_key(dir)) {
+fprintf(stderr,
+"tpm: the data are encrypted but I am missing the key.\n");
+rc = -EIO;
+goto err_exit;
+}
+if (tpm_builtin_has_unnecessary_key(dir)) {
+fprintf(stderr,
+"tpm: I have a key but the data are not encrypted.\n");
+rc = -EIO;
+goto err_exit;
+}
+if ((dir->flags & BS_DIR_FLAG_ENC_BLOBS) &&
+!tpm_builtin_has_valid_content(dir)) {
+fprintf(stderr, "tpm: cannot read the data - "
+

[Qemu-devel] [PATCH V5 07/12] Implementation of the libtpms-based backend

2011-05-20 Thread Stefan Berger
This patch provides the glue between the TPM TIS interface (frontend) and
libtpms, which provides the actual TPM functionality.

Some details:

This part of the patch provides support for spawning a thread that
interacts with the libtpms-based TPM. The thread waits for a signal from
the frontend to wake up and pick up the TPM command to be processed, and
delivers the response packet using a callback function provided by the
frontend.
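The wake-up protocol between frontend and TPM thread follows the usual mutex/condition-variable pattern with a 'pending' flag, like the to_tpm_execute boolean in the patch, so that a signal sent before the thread starts waiting is not lost. This sketch uses plain POSIX threads instead of QEMU's qemu_mutex/qemu_cond wrappers; Worker, worker_main() and submit_command() are hypothetical. It waits in a while loop, the conventional form that also copes with spurious wakeups.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state shared between the frontend and the TPM thread. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool pending;    /* a command is waiting to be processed */
    bool terminate;  /* ask the worker thread to exit */
    int processed;   /* number of commands handled (for the demo) */
} Worker;

static void *worker_main(void *opaque)
{
    Worker *w = opaque;

    pthread_mutex_lock(&w->lock);
    for (;;) {
        /* The flag, not the signal, records pending work, so a signal
         * sent before we started waiting is not lost. */
        while (!w->pending && !w->terminate) {
            pthread_cond_wait(&w->cond, &w->lock);
        }
        if (w->pending) {
            w->pending = false;
            /* A real backend would run the command here, ideally
             * after dropping the lock. */
            w->processed++;
            continue;
        }
        break;  /* terminate requested and nothing pending */
    }
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

/* Frontend side: mark a command as pending and wake the worker. */
static void submit_command(Worker *w)
{
    pthread_mutex_lock(&w->lock);
    w->pending = true;
    pthread_cond_signal(&w->cond);
    pthread_mutex_unlock(&w->lock);
}
```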

The backend connects itself to the frontend by filling out an interface
structure with pointers to the functions implementing support for various
operations.

In this part, a structure with callback functions is registered with
libtpms. Those callback functions mostly deal with persistent storage.

The libtpms-based backend implements functionality to write into a
Qemu block storage device rather than into plain files. With that we
can support VM snapshotting and we also get the possibility to use
encrypted QCoW2 for free. Thanks to Anthony for pointing this out.
The storage part of the driver has been split off into its own patch.

v5:
  - check access() to TPM's state file and report error if file is not
accessible

v3:
  - temporarily deactivate the building of tpm_builtin.c until a
    subsequent patch completely converts it to the libtpms-based driver

v2:
  - fixes to adhere to the qemu coding style


Signed-off-by: Stefan Berger <stef...@linux.vnet.ibm.com>

---
 configure|1 
 hw/tpm_builtin.c |  422 ---
 hw/tpm_tis.h |   17 ++
 3 files changed, 419 insertions(+), 21 deletions(-)

Index: qemu-git/hw/tpm_tis.h
===
--- qemu-git.orig/hw/tpm_tis.h
+++ qemu-git/hw/tpm_tis.h
@@ -73,4 +73,21 @@ static inline void dumpBuffer(FILE *stre
 fprintf(stream, "\n");
 }
 
+static inline void clear_sized_buffer(TPMSizedBuffer *tpmsb)
+{
+    if (tpmsb->buffer) {
+        tpmsb->size = 0;
+        qemu_free(tpmsb->buffer);
+        tpmsb->buffer = NULL;
+    }
+}
+
+static inline void set_sized_buffer(TPMSizedBuffer *tpmsb,
+                                    uint8_t *buffer, uint32_t size)
+{
+    clear_sized_buffer(tpmsb);
+    tpmsb->size = size;
+    tpmsb->buffer = buffer;
+}
+
 #endif /* _HW_TPM_TIS_H */
Index: qemu-git/hw/tpm_builtin.c
===
--- qemu-git.orig/hw/tpm_builtin.c
+++ qemu-git/hw/tpm_builtin.c
@@ -1,5 +1,5 @@
 /*
- *  builtin 'null' TPM driver
+ *  builtin TPM driver based on libtpms
  *
  *  Copyright (c) 2010, 2011 IBM Corporation
  *  Copyright (c) 2010, 2011 Stefan Berger
@@ -18,17 +18,36 @@
  * License along with this library; if not, see http://www.gnu.org/licenses/
  */
 
+#include "blockdev.h"
+#include "block_int.h"
 #include "qemu-common.h"
 #include "tpm.h"
 #include "hw/hw.h"
 #include "hw/tpm_tis.h"
 #include "hw/pc.h"
+#include "migration.h"
+#include "sysemu.h"
+
+#include <libtpms/tpm_library.h>
+#include <libtpms/tpm_error.h>
+#include <libtpms/tpm_memory.h>
+#include <libtpms/tpm_nvfilename.h>
+#include <libtpms/tpm_tis.h>
+
+#include <zlib.h>
 
 
 //#define DEBUG_TPM
 //#define DEBUG_TPM_SR /* suspend - resume */
 
 
+#define SAVESTATE_TYPE 'S'
+#define PERMSTATE_TYPE 'P'
+#define VOLASTATE_TYPE 'V'
+
+#define VTPM_DRIVE  "drive-vtpm0-nvram"
+#define TPM_OPTS    "id=" VTPM_DRIVE
+
 /* data structures */
 
 typedef struct ThreadParams {
@@ -44,12 +63,18 @@ static QemuThread thread;
 
 static QemuMutex state_mutex; /* protects *_state below */
 static QemuMutex tpm_initialized_mutex; /* protect tpm_initialized */
+static QemuCond bs_write_result_cond;
+static TPMSizedBuffer permanent_state = { .size = 0, .buffer = NULL, };
+static TPMSizedBuffer volatile_state  = { .size = 0, .buffer = NULL, };
+static TPMSizedBuffer save_state  = { .size = 0, .buffer = NULL, };
+static int pipefd[2] =  {-1, -1};
 
 static bool thread_terminate = false;
 static bool tpm_initialized = false;
 static bool had_fatal_error = false;
 static bool had_startup_error = false;
 static bool thread_running = false;
+static bool need_read_volatile = false;
 
 static ThreadParams tpm_thread_params;
 
@@ -63,9 +88,21 @@ static const unsigned char tpm_std_fatal
 static char dev_description[80];
 
 
+static int tpmlib_get_prop(enum TPMLIB_TPMProperty prop)
+{
+int result;
+
+TPM_RESULT res = TPMLIB_GetTPMProperty(prop, &result);
+
+assert(res == TPM_SUCCESS);
+
+return result;
+}
+
+
 static void *tpm_builtin_main_loop(void *d)
 {
-int res = 1;
+TPM_RESULT res;
 ThreadParams *thr_parms = d;
 uint32_t in_len, out_len;
 uint8_t *in, *out;
@@ -75,9 +112,10 @@ static void *tpm_builtin_main_loop(void 
 fprintf(stderr, "tpm: THREAD IS STARTING\n");
 #endif
 
-if (res != 0) {
+res = TPMLIB_MainInit();
+if (res != TPM_SUCCESS) {
 #if defined DEBUG_TPM || defined DEBUG_TPM_SR
-fprintf(stderr, "tpm: Error: TPM initialization failed (rc=%d)\n",
+fprintf(stderr, 

[Qemu-devel] [PATCH V5 11/12] Experimental support for block migrating TPM's state

2011-05-20 Thread Stefan Berger
This patch adds (experimental) support for block migration.

In the case of block migration, an empty QCoW2 image must exist on the
destination, so early checks on its content and on whether it can be
decrypted with the provided key have to be skipped. That empty file needs
to be created by higher layers (e.g., libvirt).

Also, the completion of the block migration has to be delayed until after
the TPM has written the last bytes of its state into the block device, so
that the target gets the latest state as well. Before the change to
savevm.c, the latest state of the TPM could fail to reach the destination
host: the TPM was still processing a command and changing its state
(written into block storage) while the block migration had already
finished. Re-ordering the saving so that the live state is finished after
the 'non-live' state seems to get it right.
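Why the new order helps can be modelled in a few lines. This is not QEMU's savevm.c: Handler and complete_save_order() are hypothetical, and the sketch only records the order in which the final pass would emit sections after the patch, with all non-live device state (such as the TPM's, which may still write to its block device) first and the final section of each live handler (such as block migration) last.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for a savevm handler entry. */
typedef struct {
    const char *name;
    int has_live_state;   /* iterative handler, e.g. block migration */
    int has_device_state; /* non-live device state, e.g. the TPM */
} Handler;

/*
 * Record the order of the final save pass after the reordering:
 * every non-live device state first, then the QEMU_VM_SECTION_END
 * part of every live handler. 'order' must have room for 2 * n names;
 * returns how many names were written.
 */
static int complete_save_order(const Handler *h, int n, const char **order)
{
    int i, k = 0;

    for (i = 0; i < n; i++) {
        if (h[i].has_device_state) {
            order[k++] = h[i].name;
        }
    }
    for (i = 0; i < n; i++) {
        if (h[i].has_live_state) {
            order[k++] = h[i].name;
        }
    }
    return k;
}
```

Any blocks the TPM dirties while its device state is saved are thus still picked up by the final block-migration section.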

Signed-off-by: Stefan Berger <stef...@linux.vnet.ibm.com>

---
 hw/tpm_builtin.c |5 +
 savevm.c |   22 +++---
 2 files changed, 16 insertions(+), 11 deletions(-)

Index: qemu-git/hw/tpm_builtin.c
===
--- qemu-git.orig/hw/tpm_builtin.c
+++ qemu-git/hw/tpm_builtin.c
@@ -471,6 +471,11 @@ static int tpm_builtin_startup_bs(BlockD
 
 if (!tpm_builtin_is_valid_bsdir(dir) ||
 !tpm_builtin_has_valid_content(dir)) {
+if (incoming_expected) {
+/* during migration with block migration, we may end
+   up here due to an empty block file */
+return -ENOKEY;
+}
 /* if it's encrypted and has something else than null-content,
we assume to have the wrong key */
 if (bdrv_is_encrypted(bs)) {
Index: qemu-git/savevm.c
===
--- qemu-git.orig/savevm.c
+++ qemu-git/savevm.c
@@ -1547,17 +1547,6 @@ int qemu_savevm_state_complete(Monitor *
 cpu_synchronize_all_states();
 
 QTAILQ_FOREACH(se, &savevm_handlers, entry) {
-if (se->save_live_state == NULL)
-continue;
-
-/* Section type */
-qemu_put_byte(f, QEMU_VM_SECTION_END);
-qemu_put_be32(f, se->section_id);
-
-se->save_live_state(mon, f, QEMU_VM_SECTION_END, se->opaque);
-}
-
-QTAILQ_FOREACH(se, &savevm_handlers, entry) {
 int len;
 
 if (se->save_state == NULL && se->vmsd == NULL)
@@ -1578,6 +1567,17 @@ int qemu_savevm_state_complete(Monitor *
 vmstate_save(f, se);
 }
 
+QTAILQ_FOREACH(se, &savevm_handlers, entry) {
+if (se->save_live_state == NULL)
+continue;
+
+/* Section type */
+qemu_put_byte(f, QEMU_VM_SECTION_END);
+qemu_put_be32(f, se->section_id);
+
+se->save_live_state(mon, f, QEMU_VM_SECTION_END, se->opaque);
+}
+
 qemu_put_byte(f, QEMU_VM_EOF);
 
 if (qemu_file_has_error(f))




Re: [Qemu-devel] [PATCH 23/26] target-xtensa: implement interrupt option

2011-05-20 Thread Richard Henderson
On 05/17/2011 03:32 PM, Max Filippov wrote:
> +if (xtensa_option_enabled(env->config, XTENSA_OPTION_TIMER_INTERRUPT)) {
> +int i;
> +for (i = 0; i < env->config->nccompare; ++i) {
> +if (env->sregs[CCOMPARE + i] - old_ccount <= d) {
> +env->halted = 0;
> +xtensa_timer_irq(env, i, 1);

I don't think you should be writing to halted here; this is done by
the code in cpu-exec.c when it notices cpu_has_work, which will
be true as a function of env->interrupt_request and the interrupt mask.


> +if (env->halted) {
> +xtensa_advance_ccount(env,
> +muldiv64(qemu_get_clock_ns(vm_clock) - env->halt_clock,
> +env->config->clock_freq_khz, 100));
> +}

Why are you polling the vm_clock rather than setting up a timer?

> +env->ccompare_timer =
> +qemu_new_timer_ns(vm_clock, xtensa_ccompare_cb, env);

... er, actually you are setting up a timer.  So why aren't you using it?

>  void do_interrupt(CPUState *env)
>  {
>  switch (env->exception_index) {
> +case EXC_IRQ:
> +if (handle_interrupt(env)) {
> +break;
> +}
> +/* not handled interrupt falls through,
> + * env->exception_index is updated
> + */

Do you really want to fall through, rather than restart the switch?

> @@ -124,12 +198,16 @@ void do_interrupt(CPUState *env)
>  if (env->config->exception_vector[env->exception_index]) {
>  env->pc = env->config->exception_vector[env->exception_index];
>  env->exception_taken = 1;
> +env->interrupt_request |= CPU_INTERRUPT_EXITTB;

Huh?  What are you trying to accomplish here?
EXITTB is supposed to be used when a device external to the cpu
changes the memory mapping of the system.  E.g. the x86 a20 line.

> +DEF_HELPER_0(check_interrupts, void)
> +DEF_HELPER_2(waiti, void, i32, i32)
> +DEF_HELPER_2(timer_irq, void, i32, i32)
> +DEF_HELPER_1(advance_ccount, void, i32)

You shouldn't have to manage any of this from within the translator.


r~



[Qemu-devel] [PULL] s390x patch queue

2011-05-20 Thread Alexander Graf
Hi,

This is my current s390x patch queue containing

  * s390x emulation
  * fixes for s390x kvm

Please pull.

Alex

The following changes since commit 1fddfba129f5435c80eda14e8bc23fdb888c7187:
  Alexander Graf (1):
ahci: Fix non-NCQ accesses for LBA > 16bits

are available in the git repository at:

  git://repo.or.cz/qemu/agraf.git s390-next

Alexander Graf (12):
  tcg: extend max tcg opcodes when using 64-on-32bit
  s390x: make kvm exported functions conditional on kvm
  s390x: keep hint on virtio managing size
  s390x: Shift variables in CPUState for memset(0)
  s390x: helper functions for system emulation
  s390x: Implement opcode helpers
  s390x: Adjust internal kvm code
  s390x: translate engine for s390x CPU
  s390x: Adjust GDB stub
  s390x: remove compatibility cc field
  s390x: build s390x by default
  s390x: complain when allocating ram fails

Christian Borntraeger (4):
  s390x: fix smp support for kvm
  s390x: Fix debugging for unknown sigp order codes
  s390x: change mapping base to allow guests > 2GB
  s390x: fix memory detection for guests > 64GB

Ulrich Hecht (1):
  s390x: s390x-linux-user support

 configure|2 +
 default-configs/s390x-linux-user.mak |1 +
 exec-all.h   |4 +
 exec.c   |   14 +-
 gdbstub.c|8 +-
 hw/s390-virtio-bus.c |3 +
 hw/s390-virtio-bus.h |2 +-
 hw/s390-virtio.c |   20 +-
 linux-user/elfload.c |   19 +
 linux-user/main.c|   83 +
 linux-user/s390x/syscall.h   |   23 +
 linux-user/s390x/syscall_nr.h|  349 +++
 linux-user/s390x/target_signal.h |   26 +
 linux-user/s390x/termbits.h  |  283 ++
 linux-user/signal.c  |  333 +++
 linux-user/syscall.c |   16 +-
 linux-user/syscall_defs.h|   55 +-
 scripts/qemu-binfmt-conf.sh  |4 +-
 target-s390x/cpu.h   |   28 +-
 target-s390x/helper.c|  565 -
 target-s390x/helpers.h   |  151 +
 target-s390x/kvm.c   |   48 +-
 target-s390x/op_helper.c | 2929 +++-
 target-s390x/translate.c | 5167 +-
 24 files changed, 10058 insertions(+), 75 deletions(-)
 create mode 100644 default-configs/s390x-linux-user.mak
 create mode 100644 linux-user/s390x/syscall.h
 create mode 100644 linux-user/s390x/syscall_nr.h
 create mode 100644 linux-user/s390x/target_signal.h
 create mode 100644 linux-user/s390x/termbits.h
 create mode 100644 target-s390x/helpers.h




[Qemu-devel] [PATCH V5 09/12] Add block storage support for libtpms based TPM backend

2011-05-20 Thread Stefan Berger
This patch supports the storage of TPM persistent state.

The TPM creates state of varying size, depending for example how many
keys are loaded into it at a certain time. The worst-case sizes of
the different blobs the TPM can write have been pre-calculated and this
value is used to determine the minimum size of the Qcow2 image. It needs to
be 63kb. 'qemu-... -tpm ?' shows this number when this backend driver is
available.


The layout of the TPM's persistent data in the block storage is as follows:

The first sector (512 bytes) holds a primitive directory for the different
types of blobs that the TPM can write. This directory holds a revision
number, a checksum over its content, the number of entries, and the entries
themselves. 

typedef struct BSDir {
    uint16_t  rev;
    uint32_t  checksum;
    uint32_t  num_entries;
    uint32_t  reserved[10];
    BSEntry   entries[BS_DIR_MAX_NUM_ENTRIES];
} __attribute__((packed)) BSDir;

Each entry describes a blob through its absolute offset, its maximum size,
the number of currently valid bytes (the blobs grow and shrink), and the
blob's type (see below for the types). A CRC32 over the blob is also
included.
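The per-entry CRC32 is the standard CRC-32 checksum. Since the patch includes zlib.h, it presumably uses zlib's crc32(); the sketch below is a dependency-free bitwise version of the same checksum, and crc32_update() is a hypothetical name. As with zlib, passing 0 starts a new checksum and the result can be fed back in to continue over more data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and
 * final XOR 0xFFFFFFFF), the checksum zlib's crc32() computes.
 * 'crc' is the running value; pass 0 to start a new checksum.
 */
static uint32_t crc32_update(uint32_t crc, const uint8_t *buf, size_t len)
{
    size_t i;
    int bit;

    assert(buf != NULL || len == 0);
    crc = ~crc;
    for (i = 0; i < len; i++) {
        crc ^= buf[i];
        for (bit = 0; bit < 8; bit++) {
            /* shift right; XOR in the polynomial if the low bit was set */
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
        }
    }
    return ~crc;
}
```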

typedef struct BSEntry {
    enum BSEntryType type;
    uint64_t offset;
    uint32_t space;
    uint32_t blobsize;
    uint32_t blobcrc32;
    uint32_t reserved[9];
} __attribute__((packed)) BSEntry;
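With these packed layouts, the number of entries that fit into the one-sector directory can be checked mechanically. This sketch rests on assumptions: the enum field is replaced by uint32_t (enum size is implementation-defined, though 4 bytes on common x86 ABIs), BSDirHeader is a hypothetical name for the fixed part of BSDir, and BS_DIR_CAPACITY is only an upper bound, since the actual BS_DIR_MAX_NUM_ENTRIES value is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512

/* Same fields as BSEntry above, with the enum replaced by uint32_t. */
typedef struct BSEntry {
    uint32_t type;
    uint64_t offset;
    uint32_t space;
    uint32_t blobsize;
    uint32_t blobcrc32;
    uint32_t reserved[9];
} __attribute__((packed)) BSEntry;

/* Hypothetical name for the fixed header part of BSDir. */
typedef struct BSDirHeader {
    uint16_t rev;
    uint32_t checksum;
    uint32_t num_entries;
    uint32_t reserved[10];
} __attribute__((packed)) BSDirHeader;

/* Upper bound on how many entries fit into a one-sector directory. */
enum {
    BS_DIR_CAPACITY = (SECTOR_SIZE - sizeof(BSDirHeader)) / sizeof(BSEntry)
};
```

With a 50-byte header and 60-byte entries, at most 7 entries fit, which is plenty for the three blob types.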


The worst-case sizes of the blobs have been calculated, and based on these
sizes the blobs are written at fixed offsets into the block storage. Their
offsets are all aligned to sectors (512-byte boundaries).

The TPM provides three different blobs that are written into the storage:

- volatile state
- permanent state
- save state

The 'save state' is written when the VM suspends (ACPI S3) and read when it
resumes. This is done in concert with the BIOS where the BIOS needs to send
a command to the TPM upon resume (TPM_Startup(ST_STATE)), while the OS
issues the command TPM_SaveState() before entering ACPI S3.

The 'permanent state' is written when the TPM receives a command that alters
its permanent state, e.g., when a key is loaded into the TPM that
is expected to be there upon reboot of the machine / VM.

Volatile state is written when the frontend triggers it, i.e., when the
VM's state is written out while taking a snapshot, during migration, or
on suspension to disk (as in 'virsh save'). This state serves to resume
at the point where the TPM previously stopped; it is not needed, for
example, after a machine reboot.

Tricky parts here are related to encrypted QCoW2 storage, where certain
operations need to be deferred since the key for the storage only becomes
available via the monitor, much later than the time at which the backend
is instantiated.

The backend also checks the validity of the block storage. For example, if
the QCoW2 image is not encrypted and the checksum is found to be bad, the
block storage directory will be initialized.
If the QCoW2 image is encrypted, initialization will only be done if
the directory is found to be all 0s. If the directory's checksum is not
correct but the directory is not all 0s, it is assumed that the user
provided a wrong key. In this case I do not exit qemu, but black out
the TPM interface (it returns 0xff in all memory locations) due to a
presumed fatal error and let the VM run (without TPM functionality).
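The validation rules above amount to a small decision table. check_directory() and DirAction are hypothetical names, and the checksum verification itself is left to the caller as a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    DIR_OK,         /* directory is valid, use it */
    DIR_INITIALIZE, /* (re)create a blank directory */
    DIR_WRONG_KEY,  /* presumed wrong key: black out the TPM */
} DirAction;

static bool all_zero(const uint8_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Hypothetical decision function following the rules described above. */
static DirAction check_directory(const uint8_t *sector, size_t len,
                                 bool checksum_ok, bool image_encrypted)
{
    assert(sector != NULL);
    if (checksum_ok) {
        return DIR_OK;
    }
    if (!image_encrypted) {
        return DIR_INITIALIZE;
    }
    /* Encrypted image: only an all-zero (fresh) directory is initialized;
     * anything else is assumed to have been decrypted with a wrong key. */
    return all_zero(sector, len) ? DIR_INITIALIZE : DIR_WRONG_KEY;
}
```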

v5:
  - name of drive is 'drive-vtpm0-nvram'; was 'vtpm-nvram'

v4:
  - functions prefixed with tpm_builtin
  - added 10 uint32_t to BSDir as being reserved for future use
  - never move data in the block storage while migration is going on
  - use bdrv_lock/bdrv_unlock to serialize access to the TPM's state
file which is primarily necessary during migration and the startup
of qemu on the target host where the content of the drive is being
read and validated

v3:
  - added reserved ints for future extensions to the entries in the
directory structure
  - added crc32 to every entry in the directory structure and calculating
it when writing and checking it when reading
  - fixed an endianness issue related to crc calculation
  - surrounding debugging output function in adjust_data_layout
with #if defined DEBUG_TPM
  - probing for installed libtpms development package by test-compiling

Signed-off-by: Stefan Berger <stef...@linux.vnet.ibm.com>

---
 configure|   25 +
 hw/tpm_builtin.c |  816 ++-
 2 files changed, 837 insertions(+), 4 deletions(-)

Index: qemu-git/hw/tpm_builtin.c
===
--- qemu-git.orig/hw/tpm_builtin.c
+++ qemu-git/hw/tpm_builtin.c
@@ -48,6 +48,34 @@
 #define VTPM_DRIVE  "drive-vtpm0-nvram"
 #define TPM_OPTS    "id=" VTPM_DRIVE
 
+
+#define ALIGN(VAL, SIZE) \
+  ( ( (VAL) + (SIZE) - 1 ) & ~( (SIZE) - 1 ) )
+
+
+#define DIRECTORY_SIZEBDRV_SECTOR_SIZE
+
+#define PERMSTATE_DISK_OFFSET ALIGN(DIRECTORY_SIZE, BDRV_SECTOR_SIZE)
