Re: [Qemu-devel] [PATCH] arm_gic: Keep track of GICD_CPENDR and GICD_SPENDR

2013-10-29 Thread Bhushan Bharat-R65777
Hi Christoffer,

Not related to the patch, for edge type of interrupt, will setting bit in 
ICD_SPENDR generate interrupt?

Thanks
-Bharat

 -Original Message-
 From: Christoffer Dall [mailto:christoffer.d...@linaro.org]
 Sent: Wednesday, October 23, 2013 8:57 PM
 To: peter.mayd...@linaro.org
 Cc: patc...@linaro.org; qemu-devel@nongnu.org; kvm...@lists.cs.columbia.edu
 Subject: [PATCH] arm_gic: Keep track of GICD_CPENDR and GICD_SPENDR
 
 If software writes to the ISPENDR and sets the pending state of a level-
 triggered interrupt, the falling edge of the hardware input must not clear the
 pending state.  Conversely, if software writes to the ICPENDR, the pending 
 state
 of a level-triggered interrupt should only be cleared if the hardware input is
 not asserted.
 
 This requires an extra state variable to keep track of software writes.
 
 Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
 ---
  hw/intc/arm_gic.c| 20 +---
  hw/intc/arm_gic_common.c |  5 +++--
  hw/intc/gic_internal.h   |  4 
  3 files changed, 24 insertions(+), 5 deletions(-)
 
 diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c index d1ddac1..db54061 
 100644
 --- a/hw/intc/arm_gic.c
 +++ b/hw/intc/arm_gic.c
 @@ -101,6 +101,12 @@ static void gic_clear_pending(GICState *s, int irq, int 
 cm,
 uint8_t src)  {
  unsigned cpu;
 
 +/* If a level-triggered interrupt has been set to pending through the
 + * GICD_SPENDR, then a falling edge does not clear the pending state.
 + */
 +if (GIC_TEST_SW_PENDING(irq, cm))
 +return;
 +
  GIC_CLEAR_PENDING(irq, cm);
  if (irq  GIC_NR_SGIS) {
  cpu = (unsigned)ffs(cm) - 1;
 @@ -177,8 +183,9 @@ uint32_t gic_acknowledge_irq(GICState *s, int cpu)
  s-last_active[new_irq][cpu] = s-running_irq[cpu];
  /* Clear pending flags for both level and edge triggered interrupts.
 Level triggered IRQs will be reasserted once they become inactive.  */
 -gic_clear_pending(s, new_irq, GIC_TEST_MODEL(new_irq) ? ALL_CPU_MASK : 
 cm,
 -  GIC_SGI_SRC(new_irq, cpu));
 +cm = GIC_TEST_MODEL(new_irq) ? ALL_CPU_MASK : cm;
 +GIC_CLEAR_SW_PENDING(new_irq, cm);
 +gic_clear_pending(s, new_irq, cm, GIC_SGI_SRC(new_irq, cpu));
  gic_set_running_irq(s, cpu, new_irq);
  DPRINTF(ACK %d\n, new_irq);
  return new_irq;
 @@ -445,16 +452,23 @@ static void gic_dist_writeb(void *opaque, hwaddr offset,
  for (i = 0; i  8; i++) {
  if (value  (1  i)) {
  GIC_SET_PENDING(irq + i, GIC_TARGET(irq + i));
 +if (!GIC_TEST_TRIGGER(irq + i)) {
 +GIC_SET_SW_PENDING(irq + i, GIC_TARGET(irq + i));
 +}
  }
  }
  } else if (offset  0x300) {
 +int cm = (1  cpu);
  /* Interrupt Clear Pending.  */
  irq = (offset - 0x280) * 8 + GIC_BASE_IRQ;
  if (irq = s-num_irq)
  goto bad_reg;
  for (i = 0; i  8; i++, irq++) {
  if (irq  GIC_NR_SGIS  value  (1  i)) {
 -gic_clear_pending(s, irq, 1  cpu, 0);
 +GIC_CLEAR_SW_PENDING(irq, cm);
 +if (GIC_TEST_TRIGGER(irq + i) || !GIC_TEST_LEVEL(irq, cm)) {
 +GIC_CLEAR_PENDING(irq, cm);
 +}
  }
  }
  } else if (offset  0x400) {
 diff --git a/hw/intc/arm_gic_common.c b/hw/intc/arm_gic_common.c index
 1d3b738..7f0615f 100644
 --- a/hw/intc/arm_gic_common.c
 +++ b/hw/intc/arm_gic_common.c
 @@ -43,11 +43,12 @@ static int gic_post_load(void *opaque, int version_id)
 
  static const VMStateDescription vmstate_gic_irq_state = {
  .name = arm_gic_irq_state,
 -.version_id = 1,
 -.minimum_version_id = 1,
 +.version_id = 2,
 +.minimum_version_id = 2,
  .fields = (VMStateField[]) {
  VMSTATE_UINT8(enabled, gic_irq_state),
  VMSTATE_UINT8(pending, gic_irq_state),
 +VMSTATE_UINT8(sw_pending, gic_irq_state),
  VMSTATE_UINT8(active, gic_irq_state),
  VMSTATE_UINT8(level, gic_irq_state),
  VMSTATE_BOOL(model, gic_irq_state), diff --git 
 a/hw/intc/gic_internal.h
 b/hw/intc/gic_internal.h index f9133b9..173c607 100644
 --- a/hw/intc/gic_internal.h
 +++ b/hw/intc/gic_internal.h
 @@ -43,6 +43,9 @@
  #define GIC_SET_PENDING(irq, cm) s-irq_state[irq].pending |= (cm)  #define
 GIC_CLEAR_PENDING(irq, cm) s-irq_state[irq].pending = ~(cm)  #define
 GIC_TEST_PENDING(irq, cm) ((s-irq_state[irq].pending  (cm)) != 0)
 +#define GIC_SET_SW_PENDING(irq, cm) s-irq_state[irq].sw_pending |=
 +(cm) #define GIC_CLEAR_SW_PENDING(irq, cm) s-irq_state[irq].sw_pending
 += ~(cm) #define GIC_TEST_SW_PENDING(irq, cm)
 +((s-irq_state[irq].sw_pending  (cm)) != 0)
  #define GIC_SET_ACTIVE(irq, cm) s-irq_state[irq].active |= (cm)  #define
 GIC_CLEAR_ACTIVE(irq, cm) s-irq_state[irq].active = ~(cm)  #define
 GIC_TEST_ACTIVE(irq, cm) ((s-irq_state[irq].active  (cm)) != 0) @@ 

Re: [Qemu-devel] vfio for platform devices - 9/5/2012 - minutes

2013-09-12 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Wednesday, September 11, 2013 10:45 PM
 To: Yoder Stuart-B08248
 Cc: Wood Scott-B07421; Sethi Varun-B16395; Bhushan Bharat-R65777; 'Peter
 Maydell'; 'Santosh Shukla'; 'Alexander Graf'; 'Antonios Motakis'; 'Christoffer
 Dall'; 'kim.phill...@linaro.org'; kvm...@lists.cs.columbia.edu; kvm-
 p...@vger.kernel.org; qemu-devel@nongnu.org
 Subject: Re: vfio for platform devices - 9/5/2012 - minutes
 
 On Wed, 2013-09-11 at 16:42 +, Yoder Stuart-B08248 wrote:
 
   -Original Message-
   From: Yoder Stuart-B08248
   Sent: Thursday, September 05, 2013 12:51 PM
   To: Wood Scott-B07421; Sethi Varun-B16395; Bhushan Bharat-R65777;
   'Peter Maydell'; 'Santosh Shukla'; 'Alex Williamson'; 'Alexander
   Graf'; 'Antonios Motakis'; 'Christoffer Dall'; 'kim.phill...@linaro.org'
   Cc: kvm...@lists.cs.columbia.edu; 'kvm-...@vger.kernel.org'; 'qemu-
   de...@nongnu.org'
   Subject: vfio for platform devices - 9/5/2012 - minutes
  
   We had a call with those interested and/or working on vfio for
   platform devices.
  
   Participants: Scott Wood, Varun Sethi, Bharat Bhushan, Peter Maydell,
 Santosh Shukla, Alex Williamson, Alexander Graf,
 Antonios Motakis, Christoffer Dall, Kim Phillips,
 Stuart Yoder
  
   Several aspects to vfio for platform devices:
  
   1. IOMMU groups
  
-iommu driver needs to register a bus notifier for the platform bus
 and create groups for relevant platform devices  -Antonios is
   looking at this for several ARM IOMMUs  -PAMU (Freescale) driver
   already does this
  
   2. unbinding device from host
  
PCI:
  echo :06:0d.0 
   /sys/bus/pci/devices/:06:0d.0/driver/unbind
Platform:
  echo ffe101300.dma 
   /sys/bus/platform/devices/ffe101300.dma/driver/unbind
  
-don't believe there are issues or work to do here
  
   3. binding device to vfio-platform driver
  
PCI:
  echo 1102 0002  /sys/bus/pci/drivers/vfio-pci/new_id
  
-this is probably the least understood issue-- platform drivers
 register themselves with the bus for a specific name
 string.  That is matched with device tree compatible strings
 later to bind a device to a driver  -we want is to have the
   vfio-platform driver to dynamically bind
 to a variety of platform devices previously unknown to
 vfio-platform
-ideally unbinding and binding could be an atomic operation  -Alex
   W pointed out that x86 could leverage this work so
 keep that in mind in what we design  -Kim Phillips (Linaro) will
   start working on this
 
  One thing we didn't discuss needs to be considered (probably by Kim
  who is looking at the 'binding device' issue) is around returning a
  passthru device back to the host.
 
  After a platform device has been bound to vfio and is in use by user
  space or a virtual machine, we also need to be able to unwind all that
  and return the device back to the host in a sane state.
 
  What happens when user space exits and the vfio file descriptors are
  closed?
 
 For reference, expectations of how vfio-pci handles these situations:
 
 For vfio-pci, when the reference count on the device fd drops to zero we call 
 a
 device disable function that includes disabling the bus master bit in config
 space stop ongoing DMA.

There is no bus mastering for platform devices and device disabling is not a 
standard like PCI. Do you think that we might need to do some device specific 
handling. For example for DMA controller, we need to atleast disable the DMA 
controller and mask its interrupts (may be ensure that there is no 
pending/running dma transaction, release irqs etc). As we are not yet having 
any linkage to respective kernel driver then I am not sure how we will do that 
specific handling?

 
  What if the device is still active and doing bus
  mastering?   (e.g. a VM crashed causing a QEMU
  exit)
 
 If the VM crashes the vfio fds get released resulting in the above opportunity
 for the vfio device driver to quiesce the device.

I think the quiesing of devices with we device specific, so the generic 
vfio=platform driver may not be able to handle that, right?

 
  How can the vfio-platform layer in the host kernel get a specific
  device in a sane state?
 
 It's not easy on pci either.  We save config space prior to exposing the 
 device
 and restore config space later, but it's not complete.  We mostly rely on 
 device
 (function) resets, to put things in a sane state, but those aren't always
 supported.

All platform devices also may not have reset capability (and if any then not 
generic way for all devices).

  I just introduced patches for v3.12 that enable a PCI bus reset
 interface, but it's mostly useful for userspace, since on PCI it's often the
 case that a bus contains multiple devices which don't necessarily align to 
 iommu
 group boundaries.
 
  When a plaform device is 'unbound

Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61

2013-06-11 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, June 10, 2013 11:40 PM
 To: Wood Scott-B07421
 Cc: Bhushan Bharat-R65777; Andreas Färber; qemu-...@nongnu.org; qemu-
 de...@nongnu.org; Wood Scott-B07421
 Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for
 target_bit above 61
 
 
 On 10.06.2013, at 19:20, Scott Wood wrote:
 
  On 06/10/2013 09:26:18 AM, Alexander Graf wrote:
  On 06/10/2013 02:47 PM, Bhushan Bharat-R65777 wrote:
  -Original Message-
  From: Andreas Färber [mailto:afaer...@suse.de]
  Sent: Monday, June 10, 2013 5:43 PM
  To: Bhushan Bharat-R65777
  Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org; ag...@suse.de; Wood
  Scott- B07421; Bhushan Bharat-R65777
  Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer
  for target_bit above 61 So IIUC we can only allow 63 bits due to
  signedness, thus a maximum of (1  62), thus target_bit= 61.
  Any chance at least the comment can be worded to explain that any
  better? Maybe also use (target-bit + 1= 63) or period  INT64_MAX as
 condition?
  How about this:
  /* QEMU timer supports a maximum timer of INT64_MAX
 (0x7fff_).
   * Run booke fit/wdog timer when
   * ((1ULL  target_bit + 1)  0x4000_), i.e target_bit =
 61.
   * Also the time with this maximum target_bit (with current range of
   * CPU frequency PowerPC supports) will be many many years. So it is
   * pretty safe to stop the timer above this threshold. */
  How about
   /* This timeout will take years to trigger. Treat the timer as
  disabled. */
 
  There should be at least a brief mention that it's because the QEMU
  timer can't handle larger values,
 
 If it can't handle higher values, maybe it's better to just set the timer 
 value
 to INT64_MAX when we detect an overflow? That would make the code plainly
 obvious.
 

What about below comment (a mix of both :)):

/* Timeout calculated with (target_bit + 1)  62 will take
 * hundreds of years to trigger. Treat the timer as disabled.
 * Also this timeout is within the qemu supported maximum
 * timeout limit (INT64_MAX.). */

-Bharat

 
 Alex
 
  with the detailed explanation in the changelog.  A better lower bound on the
 number of years would be nice as well (e.g. hundreds of years).
 
  -Scott
 





Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61

2013-06-11 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Tuesday, June 11, 2013 6:10 PM
 To: Bhushan Bharat-R65777
 Cc: Wood Scott-B07421; Andreas Färber; qemu-...@nongnu.org; qemu-
 de...@nongnu.org
 Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for
 target_bit above 61
 
 On 06/11/2013 01:40 PM, Bhushan Bharat-R65777 wrote:
 
  -Original Message-
  From: Alexander Graf [mailto:ag...@suse.de]
  Sent: Monday, June 10, 2013 11:40 PM
  To: Wood Scott-B07421
  Cc: Bhushan Bharat-R65777; Andreas Färber; qemu-...@nongnu.org; qemu-
  de...@nongnu.org; Wood Scott-B07421
  Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer
  for target_bit above 61
 
 
  On 10.06.2013, at 19:20, Scott Wood wrote:
 
  On 06/10/2013 09:26:18 AM, Alexander Graf wrote:
  On 06/10/2013 02:47 PM, Bhushan Bharat-R65777 wrote:
  -Original Message-
  From: Andreas Färber [mailto:afaer...@suse.de]
  Sent: Monday, June 10, 2013 5:43 PM
  To: Bhushan Bharat-R65777
  Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org; ag...@suse.de;
  Wood
  Scott- B07421; Bhushan Bharat-R65777
  Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate
  timer for target_bit above 61 So IIUC we can only allow 63 bits due to
  signedness, thus a maximum of (1   62), thus target_bit= 61.
  Any chance at least the comment can be worded to explain that any
  better? Maybe also use (target-bit + 1= 63) or period   INT64_MAX as
  condition?
  How about this:
   /* QEMU timer supports a maximum timer of INT64_MAX
  (0x7fff_).
* Run booke fit/wdog timer when
* ((1ULL   target_bit + 1)   0x4000_), i.e 
  target_bit
 =
  61.
* Also the time with this maximum target_bit (with current range 
  of
* CPU frequency PowerPC supports) will be many many years. So it 
  is
* pretty safe to stop the timer above this threshold. */
  How about
/* This timeout will take years to trigger. Treat the timer as
  disabled. */
  There should be at least a brief mention that it's because the QEMU
  timer can't handle larger values,
  If it can't handle higher values, maybe it's better to just set the
  timer value to INT64_MAX when we detect an overflow? That would make
  the code plainly obvious.
 
  What about below comment (a mix of both :)):
 
   /* Timeout calculated with (target_bit + 1)  62 will take
* hundreds of years to trigger. Treat the timer as disabled.
* Also this timeout is within the qemu supported maximum
* timeout limit (INT64_MAX.). */
 
 Ok, next question: Why does return disable the timer?

Actually here disabled means _not_ starting the timer. This function will be 
called to start timer initially and then later it is called to restart after 
every expiry. If we do not start then it is as good as stopped/disabled (it is 
not disabled in TCR). Probably saying do not start qemu timer or something 
similar is better than saying disabling the timer.

Thanks
-Bharat

 
 
 Alex
 





Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61

2013-06-11 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Tuesday, June 11, 2013 6:27 PM
 To: Bhushan Bharat-R65777
 Cc: Wood Scott-B07421; Andreas Färber; qemu-...@nongnu.org; qemu-
 de...@nongnu.org
 Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for
 target_bit above 61
 
 On 06/11/2013 02:47 PM, Bhushan Bharat-R65777 wrote:
 
  -Original Message-
  From: Alexander Graf [mailto:ag...@suse.de]
  Sent: Tuesday, June 11, 2013 6:10 PM
  To: Bhushan Bharat-R65777
  Cc: Wood Scott-B07421; Andreas Färber; qemu-...@nongnu.org; qemu-
  de...@nongnu.org
  Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer
  for target_bit above 61
 
  On 06/11/2013 01:40 PM, Bhushan Bharat-R65777 wrote:
  -Original Message-
  From: Alexander Graf [mailto:ag...@suse.de]
  Sent: Monday, June 10, 2013 11:40 PM
  To: Wood Scott-B07421
  Cc: Bhushan Bharat-R65777; Andreas Färber; qemu-...@nongnu.org;
  qemu- de...@nongnu.org; Wood Scott-B07421
  Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer
  for target_bit above 61
 
 
  On 10.06.2013, at 19:20, Scott Wood wrote:
 
  On 06/10/2013 09:26:18 AM, Alexander Graf wrote:
  On 06/10/2013 02:47 PM, Bhushan Bharat-R65777 wrote:
  -Original Message-
  From: Andreas Färber [mailto:afaer...@suse.de]
  Sent: Monday, June 10, 2013 5:43 PM
  To: Bhushan Bharat-R65777
  Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org; ag...@suse.de;
  Wood
  Scott- B07421; Bhushan Bharat-R65777
  Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate
  timer for target_bit above 61 So IIUC we can only allow 63 bits due 
  to
  signedness, thus a maximum of (162), thus target_bit= 61.
  Any chance at least the comment can be worded to explain that any
  better? Maybe also use (target-bit + 1= 63) or periodINT64_MAX 
  as
  condition?
  How about this:
/* QEMU timer supports a maximum timer of INT64_MAX
  (0x7fff_).
 * Run booke fit/wdog timer when
 * ((1ULLtarget_bit + 1)0x4000_), i.e
 target_bit
  =
  61.
 * Also the time with this maximum target_bit (with current 
  range
 of
 * CPU frequency PowerPC supports) will be many many years. So 
  it
 is
 * pretty safe to stop the timer above this threshold. */
  How about
 /* This timeout will take years to trigger. Treat the timer as
  disabled. */
  There should be at least a brief mention that it's because the
  QEMU timer can't handle larger values,
  If it can't handle higher values, maybe it's better to just set the
  timer value to INT64_MAX when we detect an overflow? That would
  make the code plainly obvious.
 
  What about below comment (a mix of both :)):
 
/* Timeout calculated with (target_bit + 1)   62 will take
 * hundreds of years to trigger. Treat the timer as disabled.
 * Also this timeout is within the qemu supported maximum
 * timeout limit (INT64_MAX.). */
  Ok, next question: Why does return disable the timer?
  Actually here disabled means _not_ starting the timer. This function will be
 called to start timer initially and then later it is called to restart after
 every expiry. If we do not start then it is as good as stopped/disabled (it is
 not disabled in TCR). Probably saying do not start qemu timer or something
 similar is better than saying disabling the timer.
 
 Couldn't you simply make things obvious from the code flow without pulling up
 assumptions?

You yourself suggested to stop/disable timer above a threshold :)

 
 Something along the lines of
 
 if (overflow) {

What is overflow?

Do you mean something like this: 
diff --git a/hw/ppc/ppc_booke.c b/hw/ppc/ppc_booke.c
index e41b036..5b84b96 100644
--- a/hw/ppc/ppc_booke.c
+++ b/hw/ppc/ppc_booke.c
@@ -133,15 +133,19 @@ static void booke_update_fixed_timer(CPUPPCState 
*env,
 ppc_tb_t *tb_env = env-tb_env;
 uint64_t lapse;
 uint64_t tb;
-uint64_t period = 1  (target_bit + 1);
+uint64_t period;
 uint64_t now;
 
 now = qemu_get_clock_ns(vm_clock);
 tb  = cpu_ppc_get_tb(tb_env, now, tb_env-tb_offset);
 
-lapse = period - ((tb - (1  target_bit))  (period - 1));
-
-*next = now + muldiv64(lapse, get_ticks_per_sec(), tb_env-tb_freq);
+if (target_bit = 62) {
+*next = INT64_MAX;
+} else {
+period = 1ULL  (target_bit + 1);
+lapse = period - ((tb - (1  target_bit))  (period - 1));
+*next = now + muldiv64(lapse, get_ticks_per_sec(), tb_env-tb_freq);
+}
 

Thanks
-Bharat

  *next = INT64_MAX;
 }
 
 qemu_mod_timer(timer, *next);
 
 Then everyone knows what's going on, we can always assume the timer is running
 and there's no need to understand complex corner cases. It feels more like the
 timer framework would be the one to decid to ignore timeouts that take years 
 to
 finish.
 
 
 Alex
 





Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61

2013-06-10 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Peter Maydell [mailto:peter.mayd...@linaro.org]
 Sent: Monday, June 10, 2013 2:56 PM
 To: Bhushan Bharat-R65777
 Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org; ag...@suse.de; Wood Scott-
 B07421; Bhushan Bharat-R65777
 Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for
 target_bit above 61
 
 On 10 June 2013 08:55, Bharat Bhushan r65...@freescale.com wrote:
  QEMU timer supports a maximum timer of INT64_MAX. So starting timer
  only for time which is calculated using target_bit  62 and
  deactivate/stop timer if the target bit is above 61.
 
 Is this really what the hardware does? Or do we need to set a timer for the
 maximum QEMU allows and then reset it for the residual time when the first 
 timer
 expires?

No, this is not how hardware works. But the time with the maximum target bit of 
61 (with current range of CPU frequency PowerPC supports) will be many many 
years. So I think it is pretty safe to stop the timer.

Thanks
-Bharat

 
 thanks
 -- PMM



Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61

2013-06-10 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Andreas Färber [mailto:afaer...@suse.de]
 Sent: Monday, June 10, 2013 5:43 PM
 To: Bhushan Bharat-R65777
 Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org; ag...@suse.de; Wood Scott-
 B07421; Bhushan Bharat-R65777
 Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for
 target_bit above 61
 
 Am 10.06.2013 09:55, schrieb Bharat Bhushan:
  QEMU timer supports a maximum timer of INT64_MAX. So starting timer
  only for time which is calculated using target_bit  62 and
  deactivate/stop timer if the target bit is above 61.
 
  This patch also fix the time calculation from target_bit.
  The code was doing (1  (target_bit + 1)) while this should be (1ULL
   (target_bit + 1)).
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  v1-v2
   - Added booke: timer: in patch subject
 
   hw/ppc/ppc_booke.c |8 +++-
   1 files changed, 7 insertions(+), 1 deletions(-)
 
  diff --git a/hw/ppc/ppc_booke.c b/hw/ppc/ppc_booke.c index
  e41b036..f4eda15 100644
  --- a/hw/ppc/ppc_booke.c
  +++ b/hw/ppc/ppc_booke.c
  @@ -133,9 +133,15 @@ static void booke_update_fixed_timer(CPUPPCState
 *env,
   ppc_tb_t *tb_env = env-tb_env;
   uint64_t lapse;
   uint64_t tb;
  -uint64_t period = 1  (target_bit + 1);
  +uint64_t period;
   uint64_t now;
 
  +/* Deactivate timer for target_bit  61 */
  +if (target_bit  61)
  +return;
 
 Braces missing and trailing whitespace after return.

Ok, will correct

 
 So IIUC we can only allow 63 bits due to signedness, thus a maximum of
 (1  62), thus target_bit = 61.
 
 Any chance at least the comment can be worded to explain that any better? 
 Maybe
 also use (target-bit + 1 = 63) or period  INT64_MAX as condition?

How about this:
/* QEMU timer supports a maximum timer of INT64_MAX (0x7fff_).
 * Run booke fit/wdog timer when
 * ((1ULL  target_bit + 1)  0x4000_), i.e target_bit = 61.
 * Also the time with this maximum target_bit (with current range of
 * CPU frequency PowerPC supports) will be many many years. So it is
 * pretty safe to stop the timer above this threshold. */


 
 Best regards,
 Andreas
 
  +
  +period = 1ULL  (target_bit + 1);
  +
   now = qemu_get_clock_ns(vm_clock);
   tb  = cpu_ppc_get_tb(tb_env, now, tb_env-tb_offset);
 
 
 
 
 --
 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
 GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg





Re: [Qemu-devel] [PATCH v2] ppc: initialize GPRs as per epapr

2013-04-29 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Wood Scott-B07421
 Sent: Monday, April 29, 2013 11:21 PM
 To: Bhushan Bharat-R65777
 Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org; ag...@suse.de; Yoder Stuart-
 B08248; da...@gibson.dropbear.id.au; Bhushan Bharat-R65777; Bhushan Bharat-
 R65777
 Subject: Re: [PATCH v2] ppc: initialize GPRs as per epapr
 
 On 04/29/2013 09:40:56 AM, Bharat Bhushan wrote:
  ePAPR defines the initial values of cpu registers.
  This patch initialize the GPRs as per ePAPR specification.
 
  This resolves the issue of guest reboot/reset (guest hang on reboot).
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  v1--v2
   - Dynemic seting of initial map size in gpr[7]
   - For this the tlb size calculation is moved into a function.
 
   hw/ppc/e500.c |   29 ++---
   1 files changed, 26 insertions(+), 3 deletions(-)
 
  diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c index c1bdb6b..decd86c
  100644
  --- a/hw/ppc/e500.c
  +++ b/hw/ppc/e500.c
  @@ -37,6 +37,7 @@
   #include qemu/host-utils.h
   #include hw/pci-host/ppce500.h
 
  +#define EPAPR_MAGIC(0x45504150)
   #define BINARY_DEVICE_TREE_FILEmpc8544ds.dtb
   #define UIMAGE_LOAD_BASE   0
   #define DTC_LOAD_PAD   0x180
  @@ -393,11 +394,10 @@ static inline hwaddr
  booke206_page_size_to_tlb(uint64_t size)
   return 63 - clz64(size  10);
   }
 
  -static void mmubooke_create_initial_mapping(CPUPPCState *env)
  +static int booke206_initial_map_tsize(CPUPPCState *env)
   {
   struct boot_info *bi = env-load_info;
  -ppcmas_tlb_t *tlb = booke206_get_tlbm(env, 1, 0, 0);
  -hwaddr size, dt_end;
  +hwaddr dt_end;
   int ps;
 
   /* Our initial TLB entry needs to cover everything from 0 to @@
  -408,6 +408,23 @@ static void
  mmubooke_create_initial_mapping(CPUPPCState *env)
   /* e500v2 can only do even TLB size bits */
   ps++;
   }
  +return ps;
  +}
  +static uint64_t mmubooke_initial_mapsize(CPUPPCState *env)
 
 Please leave a blank line after each of those } lines.
 
 You change the function name to look like it's simply returning
 information, but it still creates the TLB entry as far as I can see.

This is just returning the tlb size . not creating the tlb entry.



 Then you go calling it multiple times (why?).  This may not be harmful, but it
 is ugly.
 
  +{
  +int tsize;
  +
  +tsize = booke206_initial_map_tsize(env);
  +return (1ULL  10  tsize);
  +}
 
 What value does this wrapper have?

This returning the size on initial tlb entry in bytes.

  The caller needs both the tsize
 and the size in bytes, and you only call this wrapper once, so let the caller 
 do
 the conversion instead.

Caller does not need to know both. It only need to know the size in bytes.

I think git diff make a mess of this. just trying to put the code for clarity 


static int booke206_initial_map_tsize(CPUPPCState *env)
{
struct boot_info *bi = env-load_info;
hwaddr dt_end;
int ps;

/* Our initial TLB entry needs to cover everything from 0 to
   the device tree top */
dt_end = bi-dt_base + bi-dt_size;
ps = booke206_page_size_to_tlb(dt_end) + 1;
if (ps  1) {
/* e500v2 can only do even TLB size bits */
ps++;
}
return ps;
}
static uint64_t mmubooke_initial_mapsize(CPUPPCState *env)
{
int tsize;

tsize = booke206_initial_map_tsize(env);
return (1ULL  10  tsize);
}

static void mmubooke_create_initial_mapping(CPUPPCState *env)
{
ppcmas_tlb_t *tlb = booke206_get_tlbm(env, 1, 0, 0);
hwaddr size;
int ps;

ps = booke206_initial_map_tsize(env);
size = (ps  MAS1_TSIZE_SHIFT);
tlb-mas1 = MAS1_VALID | size;
tlb-mas2 = 0;
tlb-mas7_3 = 0;
tlb-mas7_3 |= MAS3_UR | MAS3_UW | MAS3_UX | MAS3_SR | MAS3_SW | MAS3_SX;

env-tlb_dirty = true;
}

--

Thanks
-Bharat
 
 -Scott




Re: [Qemu-devel] [PATCH] ppc: initialize GPRs as per epapr

2013-04-26 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Friday, April 26, 2013 11:51 AM
 To: Bhushan Bharat-R65777
 Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org; Wood Scott-B07421; Bhushan
 Bharat-R65777; Yoder Stuart-B08248
 Subject: Re: [PATCH] ppc: initialize GPRs as per epapr
 
 
 On 26.04.2013, at 08:17, Bharat Bhushan wrote:
 
  ePAPR defines the initial values of cpu registers. This patch
  initialize the GPRs as per ePAPR specification.
 
  This resolves the issue of guest reboot/reset (guest hang on reboot).
 
 Why does it hang only on reboot, not on initial bootup?

may be memory pointed by env pointer are zero initialized initially.
Reboot also not always hangs. I have seen reboot mostly working on 
e500v2/e500mc and mostly hanging on e5500.

 
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  Signed-off-by: Stuart Yoder stuart.yo...@freescale.com
  ---
  hw/ppc/e500.c |7 +++
  1 files changed, 7 insertions(+), 0 deletions(-)
 
  diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c index c1bdb6b..a47f976
  100644
  --- a/hw/ppc/e500.c
  +++ b/hw/ppc/e500.c
  @@ -37,6 +37,7 @@
  #include qemu/host-utils.h
  #include hw/pci-host/ppce500.h
 
  +#define EPAPR_MAGIC(0x45504150)
  #define BINARY_DEVICE_TREE_FILEmpc8544ds.dtb
  #define UIMAGE_LOAD_BASE   0
  #define DTC_LOAD_PAD   0x180
  @@ -444,6 +445,12 @@ static void ppce500_cpu_reset(void *opaque)
 
 Does ePAPR mention anything wrt GPR state of secondary CPUs?

Yes, I think we handle this in hw/ppc/ppce500_spin.c

 
  cs-halted = 0;
  env-gpr[1] = (1620) - 8;
  env-gpr[3] = bi-dt_base;
  +env-gpr[4] = 0;
  +env-gpr[5] = 0;
  +env-gpr[6] = EPAPR_MAGIC;
  +env-gpr[7] = (64 * 1024 * 1024);
 
 What is this?

Size of initial TLB ( should be big enough to cover kernel handler). I do not 
see ePAPR defines any value, I set this to 64M.

-Bharat

 
 
 Alex
 
  +env-gpr[8] = 0;
  +env-gpr[9] = 0;
  env-nip = bi-entry;
  mmubooke_create_initial_mapping(env);
  }
  --
  1.7.0.4
 
 
 





Re: [Qemu-devel] RFC: vfio API changes needed for powerpc (v3)

2013-04-09 Thread Bhushan Bharat-R65777

So now the sequence would be something like:

1)VFIO_GROUP_SET_CONTAINER // add groups to the container

2)VFIO_SET_IOMMU(VFIO_FSL_PAMU)// set iommu model

3)count = VFIO_IOMMU_GET_MSI_BANK_COUNT// returns max # of MSI banks

4)VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY) // set overall aperture

5)VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS) // set # of windows, including MSI 
banks

6)   For (int I = 0; I  count; i++)
VFIO_IOMMU_PAMU_MAP_MSI_BANK() // map the MSI banks, do not 
enable aperture here.

7)   Memory Listener will call- VFIO_IOMMU_MAP_DMA// map the guest's 
memory
   --- kernel enables aperture here on first VFIO_IOMMU_MAP_DMA

8)VFIO_DEVICE_SET_IRQS
   --- VFIO in kernel makes 
pci_enable_msix()/pci_enable_msi_block() calls, this sets actual MSI addr/data 
in physical device.
  --- As the address set by above APIs is not what we want so
- is using MSIX, VFIO will update address in the MSI-X 
table
- if using MSI, update MSI address in PCI configuration 
space.



Thanks
-Bharat


 -Original Message-
 From: Yoder Stuart-B08248
 Sent: Friday, April 05, 2013 3:40 AM
 To: Alex Williamson
 Cc: Wood Scott-B07421; ag...@suse.de; Bhushan Bharat-R65777; Sethi 
 Varun-B16395;
 k...@vger.kernel.org; qemu-devel@nongnu.org; io...@lists.linux-foundation.org
 Subject: RFC: vfio API changes needed for powerpc (v3)
 
 -v3 updates
-made vfio_pamu_attr a union, added flags
-s/VFIO_PAMU_/VFIO_IOMMU_PAMU_/ for the ioctls to make it more
 clear which fd is being operated on
-added flags to vfio_pamu_msi_bank_map/umap
-VFIO_PAMU_GET_MSI_BANK_COUNT now just returns a __u32
 not a struct
-fixed some typos
 
 
 
 The Freescale PAMU is an aperture-based IOMMU with the following
 characteristics.  Each device has an entry in a table in memory
 describing the iova-phys mapping. The mapping has:
-an overall aperture that is power of 2 sized, and has a start iova that
 is naturally aligned
-has 1 or more windows within the aperture
   -number of windows must be power of 2, max is 256
   -size of each window is determined by aperture size / # of windows
   -iova of each window is determined by aperture start iova / # of windows
   -the mapped region in each window can be different than
the window size...mapping must power of 2
   -physical address of the mapping must be naturally aligned
with the mapping size
 
 These ioctls operate on the VFIO file descriptor (/dev/vfio/vfio).
 
 /*
  * VFIO_IOMMU_PAMU_GET_ATTR
  *
  * Gets the iommu attributes for the current vfio container.  This
  * ioctl is applicable to an iommu type of VFIO_PAMU only.
  * Caller sets argsz and attribute.  The ioctl fills in
  * the provided struct vfio_pamu_attr based on the attribute
  * value that was set.
 
  * Return: 0 on success, -errno on failure
  */
 struct vfio_pamu_attr {
 __u32   argsz;
 __u32   flags;/* no flags currently */
 __u32   attribute;
 
 union {
 /* VFIO_ATTR_GEOMETRY */
 struct {
 __u64 aperture_start; /* first addr that can be mapped
 */
 __u64 aperture_end;  /* last addr that can be mapped 
 */
 } attr;
 
 /* VFIO_ATTR_WINDOWS */
 __u32 windows;  /* number of windows in the aperture */
 /* initially this will be the max number
  * of windows that can be set
  */
 
 /* VFIO_ATTR_PAMU_STASH */
 struct {
 __u32 cpu; /* CPU number for stashing */
 __u32 cache;   /* cache ID for stashing */
 } stash;
 }
 };
 #define VFIO_IOMMU_PAMU_GET_ATTR  _IO(VFIO_TYPE, VFIO_BASE + x,
 struct vfio_pamu_attr)
 
 /*
  * VFIO_IOMMU_PAMU_SET_ATTR
  *
  * Sets the iommu attributes for the current vfio container.  This
  * ioctl is applicable to an iommu type of VFIO_PAMU only.
  * Caller sets struct vfio_pamu attr, including argsz and attribute and
  * setting any fields that are valid for the attribute.
  * Return: 0 on success, -errno on failure
  */
 #define VFIO_IOMMU_PAMU_SET_ATTR  _IO(VFIO_TYPE, VFIO_BASE + x,
 struct vfio_pamu_attr)
 
 /*
  * VFIO_IOMMU_PAMU_GET_MSI_BANK_COUNT
  *
  * Returns the number of MSI banks for this platform.  This tells user space
  * how many aperture windows should be reserved for MSI banks when setting
  * the PAMU geometry and window count.
  * Return: __u32 bank count on success, -errno on failure
  */
 #define VFIO_IOMMU_PAMU_GET_MSI_BANK_COUNT _IO(VFIO_TYPE, VFIO_BASE + x, 
 __u32)
 
 /*
  * VFIO_IOMMU_PAMU_MAP_MSI_BANK

Re: [Qemu-devel] VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED PCI bars

2013-02-18 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Thursday, February 14, 2013 11:57 PM
 To: Bhushan Bharat-R65777
 Cc: Yoder Stuart-B08248; Wood Scott-B07421; Graf Alexander-B36701; qemu-
 de...@nongnu.org; qemu-...@nongnu.org
 Subject: Re: VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED PCI bars
 
 On Thu, 2013-02-14 at 08:22 +, Bhushan Bharat-R65777 wrote:
  Hi Alex Williamson,
 
  I am able (with some hacks :)) to directly assign the e1000 PCI device
  to KVM guest using VFIO on Freescale device. One of the problem I am
  facing is about the DMA mapping in IOMMU for PCI device BARs. On
  Freescale devices, the mmaped PCI device BARs are not required to be
  mapped in IOMMU. Typically the flow of in/out transaction (from CPU)
  is:
 
  Incoming flow:
 
  -| |--| |---|
 |-|
  CORE |-| IOMMU|--| 
  PCI-Controller|---
 | PCI device  |
  -| |--| |---|
 |-|
 
  Outgoing Flow: IOMMU is bypassed for out transactions
 
  -| |--| |---|
 |-|
  CORE |--|   | IOMMU|---| 
  PCI-Controller|---
 | PCI device  |
  -| |   |--|   ^ |---|
 |-|
 |  |
 |--|
 
 Mapping the BAR into the IOMMU isn't for this second use case, CPU to device 
 is
 never IOMMU translated.  Instead, it's for this:
 
 |--| |---|   |-|
 | IOMMU|--| PCI-Controller|---| PCI device  |
 |--| |---|   |-|
   |
   |  |---|   |-|
   +- -| PCI-Controller|---| PCI device  |
  |---|   |-|
 
 It's perfectly fine to not support peer-to-peer DMA, but let's skip it where
 it's not supported in case it might actually work in some cases.
 
  Also because of some hardware limitations on our IOMMU it is difficult
  to map these BAR regions with RAM (DDR) regions. So on Freescale
  device we want the VFIO_IOMMU_MAP_DMA ioctl to be called for RAM
  regions (DDR) only and _not_ for these mmaped ram regions of PCI
  device bars. I can understand that we need to add these mmaped PCI
  bars as RAM type in qemu memory_region_*(). So for that I tried to
  skip these regions in VFIO memory_listeners. Below changes which works
  for me. I am not sure whether this is correct approach, please
  suggest.
 
  -
  diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c index c51ae67..63728d8
  100644
  --- a/hw/vfio_pci.c
  +++ b/hw/vfio_pci.c
  @@ -1115,9 +1115,35 @@ static int vfio_dma_map(VFIOContainer *container,
 hwaddr iova,
   return -errno;
   }
 
  -static bool vfio_listener_skipped_section(MemoryRegionSection
  *section)
  +static int memory_region_is_mmap_bars(VFIOContainer *container,
  +  MemoryRegionSection *section)
   {
  -return !memory_region_is_ram(section-mr);
  +VFIOGroup *group;
  +VFIODevice *vdev;
  +int i;
  +
  +QLIST_FOREACH(group, container-group_list, next) {
 
 I think you want to start at the global group_list

Why you think global? My understanding is that the operation on dma_map/unmap 
is limited to a group in the given container.

 
  +QLIST_FOREACH(vdev, group-device_list, next) {
  +if (vdev-msix-mmap_mem.ram_addr ==
  + section-mr-ram_addr)
 
 Note the comment in memory.h:
 
 struct MemoryRegion {
 /* All fields are private - violators will be prosecuted */
 ...
 ram_addr_t ram_addr;
 
 You'll need to invent some kind of memory region comparison interfaces instead
 of accessing these private fields.

You mean adding some api ine memory.c?

 
  +return 1;
  +for (i = 0; i  PCI_ROM_SLOT; i++) {
  +VFIOBAR *bar = vdev-bars[i];
  +if (bar-mmap_mem.ram_addr == section-mr-ram_addr)
  +return 1;
  +}
  +}
  +}
  +
  +return 0;
  +}
 
 It's a pretty heavy weight function, I think the memory API should help us
 differentiate MMIO from RAM.

What about adding a bool in  struct MemoryRegion { 

Bool require_iommu_map;
There will be APIs to set/clear this bool and default all ram = true, will 
have require_iommu_map = true

if (!(memory_region-ram == true  memory_region-require_iommu_map == true)
skip_iommumap;


 
  +static bool vfio_listener_skipped_section(VFIOContainer *container,
  +  MemoryRegionSection
  +*section) {
  +if (!memory_region_is_ram(section-mr))
  +   return 1;
 
 Need

Re: [Qemu-devel] VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED PCI bars

2013-02-18 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Wood Scott-B07421
 Sent: Tuesday, February 19, 2013 9:39 AM
 To: Bhushan Bharat-R65777
 Cc: Alex Williamson; Yoder Stuart-B08248; Wood Scott-B07421; Graf Alexander-
 B36701; qemu-devel@nongnu.org; qemu-...@nongnu.org
 Subject: Re: VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED PCI bars
 
 On 02/18/2013 10:05:20 PM, Bhushan Bharat-R65777 wrote:
 
 
   -Original Message-
   From: Alex Williamson [mailto:alex.william...@redhat.com]
   Sent: Thursday, February 14, 2013 11:57 PM
   To: Bhushan Bharat-R65777
   Cc: Yoder Stuart-B08248; Wood Scott-B07421; Graf Alexander-B36701;
  qemu-
   de...@nongnu.org; qemu-...@nongnu.org
   Subject: Re: VFIO: Not require to make VFIO_IOMMU_MAP_DMA for
  MMAPED PCI bars
  
   On Thu, 2013-02-14 at 08:22 +, Bhushan Bharat-R65777 wrote:
+static bool vfio_listener_skipped_section(VFIOContainer
  *container,
+  MemoryRegionSection
+*section) {
+if (!memory_region_is_ram(section-mr))
+   return 1;
  
   Need some kind of conditional here for platforms that support
  peer-to-peer.
   Thanks,
 
  What about adding a iommu_type will help here (VFIO_TYPE2_IOMMU),
  which does not require BARs to be mapped in iommu ?
 
 What specifically does type 2 mean again?  If we're going to use it as the
 thing to key off of for all sorts of PAMU quirks, might as well just call it
 VFIO_IOMMU_FSL_PAMU.

Just thought of a common name so in case some other IOMMU also fall in this 
category and add a comment where this will be defined.
I am fine with this name as well :)

Thanks
-Bharat






Re: [Qemu-devel] VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED PCI bars

2013-02-18 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Tuesday, February 19, 2013 9:55 AM
 To: Bhushan Bharat-R65777
 Cc: Yoder Stuart-B08248; Wood Scott-B07421; Graf Alexander-B36701; qemu-
 de...@nongnu.org; qemu-...@nongnu.org
 Subject: Re: VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED PCI bars
 
 On Tue, 2013-02-19 at 04:05 +, Bhushan Bharat-R65777 wrote:
 
   -Original Message-
   From: Alex Williamson [mailto:alex.william...@redhat.com]
   Sent: Thursday, February 14, 2013 11:57 PM
   To: Bhushan Bharat-R65777
   Cc: Yoder Stuart-B08248; Wood Scott-B07421; Graf Alexander-B36701;
   qemu- de...@nongnu.org; qemu-...@nongnu.org
   Subject: Re: VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED
   PCI bars
  
   On Thu, 2013-02-14 at 08:22 +, Bhushan Bharat-R65777 wrote:
Hi Alex Williamson,
   
I am able (with some hacks :)) to directly assign the e1000 PCI
device to KVM guest using VFIO on Freescale device. One of the
problem I am facing is about the DMA mapping in IOMMU for PCI
device BARs. On Freescale devices, the mmaped PCI device BARs are
not required to be mapped in IOMMU. Typically the flow of in/out
transaction (from CPU)
is:
   
Incoming flow:
   
-| |--| |---|
   |-|
CORE |-| IOMMU|--| 
PCI-Controller|---
 
   | PCI device  |
-| |--| |---|
   |-|
   
Outgoing Flow: IOMMU is bypassed for out transactions
   
-| |--| |---|
   |-|
CORE |--|   | IOMMU|---| 
PCI-Controller|---
 
   | PCI device  |
-| |   |--|   ^ |---|
   |-|
   |  |
   |--|
  
   Mapping the BAR into the IOMMU isn't for this second use case, CPU
   to device is never IOMMU translated.  Instead, it's for this:
  
   |--| |---|   |-|
   | IOMMU|--| PCI-Controller|---| PCI device  |
   |--| |---|   |-|
 |
 |  |---|   |-|
 +- -| PCI-Controller|---| PCI device  |
|---|   |-|
  
   It's perfectly fine to not support peer-to-peer DMA, but let's skip
   it where it's not supported in case it might actually work in some cases.
  
Also because of some hardware limitations on our IOMMU it is
difficult to map these BAR regions with RAM (DDR) regions. So on
Freescale device we want the VFIO_IOMMU_MAP_DMA ioctl to be called
for RAM regions (DDR) only and _not_ for these mmaped ram regions
of PCI device bars. I can understand that we need to add these
mmaped PCI bars as RAM type in qemu memory_region_*(). So for that
I tried to skip these regions in VFIO memory_listeners. Below
changes which works for me. I am not sure whether this is correct
approach, please suggest.
   
-
diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c index c51ae67..63728d8
100644
--- a/hw/vfio_pci.c
+++ b/hw/vfio_pci.c
@@ -1115,9 +1115,35 @@ static int vfio_dma_map(VFIOContainer
*container,
   hwaddr iova,
 return -errno;
 }
   
-static bool vfio_listener_skipped_section(MemoryRegionSection
*section)
+static int memory_region_is_mmap_bars(VFIOContainer *container,
+  MemoryRegionSection
+*section)
 {
-return !memory_region_is_ram(section-mr);
+VFIOGroup *group;
+VFIODevice *vdev;
+int i;
+
+QLIST_FOREACH(group, container-group_list, next) {
  
   I think you want to start at the global group_list
 
  Why you think global? My understanding is that the operation on 
  dma_map/unmap
 is limited to a group in the given container.
 
 There can theoretically be more than one container, so if you're only 
 searching
 groups within a container then you may not be searching all the devices.
 
+QLIST_FOREACH(vdev, group-device_list, next) {
+if (vdev-msix-mmap_mem.ram_addr ==
+ section-mr-ram_addr)
  
   Note the comment in memory.h:
  
   struct MemoryRegion {
   /* All fields are private - violators will be prosecuted */
   ...
   ram_addr_t ram_addr;
  
   You'll need to invent some kind of memory region comparison
   interfaces instead of accessing these private fields.
 
  You mean adding some api ine memory.c?
 
 Yes
 
+return 1;
+for (i = 0; i  PCI_ROM_SLOT; i++) {
+VFIOBAR *bar

[Qemu-devel] VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED PCI bars

2013-02-14 Thread Bhushan Bharat-R65777
Hi Alex Williamson,

I am able (with some hacks :)) to directly assign the e1000 PCI device to KVM 
guest using VFIO on Freescale device. One of the problem I am facing is about 
the DMA mapping in IOMMU for PCI device BARs. On Freescale devices, the mmaped 
PCI device BARs are not required to be mapped in IOMMU. Typically the flow of 
in/out transaction (from CPU) is: 

Incoming flow:

-| |--| |---|   
|-|
CORE |-| IOMMU|--| 
PCI-Controller|---| PCI device  |
-| |--| |---|   
|-|

Outgoing Flow: IOMMU is bypassed for out transactions

-| |--| |---|   
|-|
CORE |--|   | IOMMU|---| 
PCI-Controller|---| PCI device  |
-| |   |--|   ^ |---|   
|-|
   |  |
   |--|


Also because of some hardware limitations on our IOMMU it is difficult to map 
these BAR regions with RAM (DDR) regions. So on Freescale device we want the 
VFIO_IOMMU_MAP_DMA ioctl to be called for RAM regions (DDR) only and _not_ for 
these mmaped ram regions of PCI device bars. I can understand that we need to 
add these mmaped PCI bars as RAM type in qemu memory_region_*(). So for that I 
tried to skip these regions in VFIO memory_listeners. Below changes which works 
for me. I am not sure whether this is correct approach, please suggest.

-
diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c
index c51ae67..63728d8 100644
--- a/hw/vfio_pci.c
+++ b/hw/vfio_pci.c
@@ -1115,9 +1115,35 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr 
iova,
 return -errno;
 }
 
-static bool vfio_listener_skipped_section(MemoryRegionSection *section)
+static int memory_region_is_mmap_bars(VFIOContainer *container,
+  MemoryRegionSection *section)
 {
-return !memory_region_is_ram(section-mr);
+VFIOGroup *group;
+VFIODevice *vdev;
+int i;
+
+QLIST_FOREACH(group, container-group_list, next) {
+QLIST_FOREACH(vdev, group-device_list, next) {
+if (vdev-msix-mmap_mem.ram_addr == section-mr-ram_addr)
+return 1;
+for (i = 0; i  PCI_ROM_SLOT; i++) {
+VFIOBAR *bar = vdev-bars[i];
+if (bar-mmap_mem.ram_addr == section-mr-ram_addr)
+return 1;
+}
+}
+}
+
+return 0;
+}
+
+static bool vfio_listener_skipped_section(VFIOContainer *container,
+  MemoryRegionSection *section)
+{
+if (!memory_region_is_ram(section-mr))
+   return 1;
+
+return memory_region_is_mmap_bars(container, section);
 }
 
 static void vfio_listener_region_add(MemoryListener *listener,
@@ -1129,7 +1155,7 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 void *vaddr;
 int ret;
 
-if (vfio_listener_skipped_section(section)) {
+if (vfio_listener_skipped_section(container, section)) {
 DPRINTF(vfio: SKIPPING region_add %HWADDR_PRIx - %PRIx64\n,
 section-offset_within_address_space,
 section-offset_within_address_space + section-size - 1);
@@ -1173,7 +1199,7 @@ static void vfio_listener_region_del(MemoryListener 
*listener,
 hwaddr iova, end;
 int ret;
-if (vfio_listener_skipped_section(section)) {
+if (vfio_listener_skipped_section(container, section)) {
 DPRINTF(vfio: SKIPPING region_del %HWADDR_PRIx - %PRIx64\n,
 section-offset_within_address_space,
 section-offset_within_address_space + section-size - 1);

-


Thanks
-Bharat




Re: [Qemu-devel] [PATCH 4/7 v2] KVM regsync: Add register bitmap parameter to do_kvm_cpu_synchronize_state

2013-01-16 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
 Sent: Wednesday, January 16, 2013 9:36 PM
 To: Jason J. Herne
 Cc: ag...@suse.de; borntrae...@de.ibm.com; aligu...@us.ibm.com; qemu-
 de...@nongnu.org; Bhushan Bharat-R65777; jan.kis...@siemens.com
 Subject: Re: [PATCH 4/7 v2] KVM regsync: Add register bitmap parameter to
 do_kvm_cpu_synchronize_state
 
 On Thu, Jan 10, 2013 at 10:29:04AM -0500, Jason J. Herne wrote:
  From: Jason J. Herne jjhe...@us.ibm.com
 
  do_kvm_cpu_synchronize_state is called via run_on_cpu, so we can only
  pass a single argument.  Create SyncStateArgs struct for this purpose
  and add register bitmap data member to it.
 
  Signed-off-by: Jason J. Herne jjhe...@us.ibm.com
  Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com
  ---
   include/sysemu/kvm.h |6 ++
   kvm-all.c|   27 +--
   2 files changed, 23 insertions(+), 10 deletions(-)
 
  diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h index
  e0738ba..193d1f4 100644
  --- a/include/sysemu/kvm.h
  +++ b/include/sysemu/kvm.h
  @@ -223,6 +223,12 @@ int kvm_check_extension(KVMState *s, unsigned int
  extension);
 
   uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function,
 uint32_t index, int reg);
  +
  +struct kvm_cpu_syncstate_args {
  +CPUState *cpu;
  +int regmap;
  +};
  +
   void kvm_cpu_synchronize_state(CPUArchState *env);  void
  kvm_cpu_synchronize_post_reset(CPUArchState *env);  void
  kvm_cpu_synchronize_post_init(CPUArchState *env); diff --git
  a/kvm-all.c b/kvm-all.c index 1aa61bb..77ab72a 100644
  --- a/kvm-all.c
  +++ b/kvm-all.c
  @@ -231,7 +231,7 @@ int kvm_init_vcpu(CPUArchState *env)
 
   cpu-kvm_fd = ret;
   cpu-kvm_state = s;
  -cpu-kvm_vcpu_dirty = true;
  +cpu-kvm_vcpu_dirty = KVM_REGSYNC_FULL_STATE;
 
   mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
   if (mmap_size  0) {
  @@ -1491,20 +1491,27 @@ void kvm_flush_coalesced_mmio_buffer(void)
 
   static void do_kvm_cpu_synchronize_state(void *arg)  {
  -CPUState *cpu = arg;
  +struct kvm_cpu_syncstate_args *args = arg;
 
  -if (!cpu-kvm_vcpu_dirty) {
  -kvm_arch_get_registers(cpu, KVM_REGSYNC_FULL_STATE);
  -cpu-kvm_vcpu_dirty = true;
  +/* Do not sync regs that are already dirty */
  +int regs_to_get = args-regmap  ~args-cpu-kvm_vcpu_dirty;
  +
  +if (regs_to_get) {
  +kvm_arch_get_registers(args-cpu, regs_to_get);
  +args-cpu-kvm_vcpu_dirty |= regs_to_get;
   }
   }
 
   void kvm_cpu_synchronize_state(CPUArchState *env)  {
   CPUState *cpu = ENV_GET_CPU(env);
  +struct kvm_cpu_syncstate_args args;
  +
  +args.cpu = cpu;
  +args.regmap = KVM_REGSYNC_FULL_STATE;
 
  -if (!cpu-kvm_vcpu_dirty) {
  -run_on_cpu(cpu, do_kvm_cpu_synchronize_state, cpu);
  +if (args.regmap  ~cpu-kvm_vcpu_dirty) {
  +run_on_cpu(cpu, do_kvm_cpu_synchronize_state, args);
   }
   }
 
  @@ -1513,7 +1520,7 @@ void kvm_cpu_synchronize_post_reset(CPUArchState *env)
   CPUState *cpu = ENV_GET_CPU(env);
 
   kvm_arch_put_registers(cpu, KVM_REGSYNC_RESET_STATE);
  -cpu-kvm_vcpu_dirty = false;
  +cpu-kvm_vcpu_dirty = ~KVM_REGSYNC_RESET_STATE;
   }
 
   void kvm_cpu_synchronize_post_init(CPUArchState *env) @@ -1521,7
  +1528,7 @@ void kvm_cpu_synchronize_post_init(CPUArchState *env)
   CPUState *cpu = ENV_GET_CPU(env);
 
   kvm_arch_put_registers(cpu, KVM_REGSYNC_FULL_STATE);
  -cpu-kvm_vcpu_dirty = false;
  +cpu-kvm_vcpu_dirty = ~KVM_REGSYNC_FULL_STATE;
   }
 
   int kvm_cpu_exec(CPUArchState *env)
  @@ -1540,7 +1547,7 @@ int kvm_cpu_exec(CPUArchState *env)
   do {
   if (cpu-kvm_vcpu_dirty) {
   kvm_arch_put_registers(cpu, KVM_REGSYNC_RUNTIME_STATE);
  -cpu-kvm_vcpu_dirty = false;
  +cpu-kvm_vcpu_dirty = ~KVM_REGSYNC_RUNTIME_STATE;
   }
 
 1) This implies a vcpu can enter guest mode with kvm_vcpu_dirty non-zero.

I think above code should be:
kvm_arch_put_registers(cpu, cpu-kvm_vcpu_dirty);
cpu-kvm_vcpu_dirty = false;

so vcpu will not enter guest state with dirty registers in qemu.

 
 Unrelated:
 
 2) Also, what is the reason for specifying sets of registers in arch-specific
 code? Is that because it allows PPC to fix their sync-timer register problem?
 
 When you are writing generic code, what does it mean to use
 'KVM_REGSYNC_{RUNTIME,RESET,FULL}_STATE' ?
 Answer: it depends on the architecture.
 
 3) On x86, kvm_arch_get_registers(GET_FULL) must not imply
 kvm_arch_put_registers(PUT_FULL).
 
 The S/390 problem, from
 http://lists.nongnu.org/archive/html/qemu-devel/2012-11/msg02213.html:
 
  The kvm register sync needs to happen in the kvm register sync
  function :)
  That would eliminate the whole purpose of sync regs and forces us to
  have an expensive ioctl on lots of exits (again). I

Re: [Qemu-devel] [PATCH 4/7 v2] KVM regsync: Add register bitmap parameter to do_kvm_cpu_synchronize_state

2013-01-14 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Jason J. Herne [mailto:jjhe...@us.ibm.com]
 Sent: Thursday, January 10, 2013 8:59 PM
 To: ag...@suse.de; borntrae...@de.ibm.com; aligu...@us.ibm.com;
 mtosa...@redhat.com; qemu-devel@nongnu.org; Bhushan Bharat-R65777;
 jan.kis...@siemens.com
 Cc: Jason J. Herne
 Subject: [PATCH 4/7 v2] KVM regsync: Add register bitmap parameter to
 do_kvm_cpu_synchronize_state
 
 From: Jason J. Herne jjhe...@us.ibm.com
 
 do_kvm_cpu_synchronize_state is called via run_on_cpu, so we can only pass
 a single argument.  Create SyncStateArgs struct for this purpose and add
 register bitmap data member to it.
 
 Signed-off-by: Jason J. Herne jjhe...@us.ibm.com
 Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com
 ---
  include/sysemu/kvm.h |6 ++
  kvm-all.c|   27 +--
  2 files changed, 23 insertions(+), 10 deletions(-)
 
 diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
 index e0738ba..193d1f4 100644
 --- a/include/sysemu/kvm.h
 +++ b/include/sysemu/kvm.h
 @@ -223,6 +223,12 @@ int kvm_check_extension(KVMState *s, unsigned int
 extension);
 
  uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function,
uint32_t index, int reg);
 +
 +struct kvm_cpu_syncstate_args {
 +CPUState *cpu;
 +int regmap;
 +};
 +
  void kvm_cpu_synchronize_state(CPUArchState *env);
  void kvm_cpu_synchronize_post_reset(CPUArchState *env);
  void kvm_cpu_synchronize_post_init(CPUArchState *env);
 diff --git a/kvm-all.c b/kvm-all.c
 index 1aa61bb..77ab72a 100644
 --- a/kvm-all.c
 +++ b/kvm-all.c
 @@ -231,7 +231,7 @@ int kvm_init_vcpu(CPUArchState *env)
 
  cpu-kvm_fd = ret;
  cpu-kvm_state = s;
 -cpu-kvm_vcpu_dirty = true;
 +cpu-kvm_vcpu_dirty = KVM_REGSYNC_FULL_STATE;
 
  mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
  if (mmap_size  0) {
 @@ -1491,20 +1491,27 @@ void kvm_flush_coalesced_mmio_buffer(void)
 
  static void do_kvm_cpu_synchronize_state(void *arg)
  {
 -CPUState *cpu = arg;
 +struct kvm_cpu_syncstate_args *args = arg;
 
 -if (!cpu-kvm_vcpu_dirty) {
 -kvm_arch_get_registers(cpu, KVM_REGSYNC_FULL_STATE);
 -cpu-kvm_vcpu_dirty = true;
 +/* Do not sync regs that are already dirty */
 +int regs_to_get = args-regmap  ~args-cpu-kvm_vcpu_dirty;
 +
 +if (regs_to_get) {
 +kvm_arch_get_registers(args-cpu, regs_to_get);
 +args-cpu-kvm_vcpu_dirty |= regs_to_get;
  }
  }
 
  void kvm_cpu_synchronize_state(CPUArchState *env)
  {
  CPUState *cpu = ENV_GET_CPU(env);
 +struct kvm_cpu_syncstate_args args;
 +
 +args.cpu = cpu;
 +args.regmap = KVM_REGSYNC_FULL_STATE;
 
 -if (!cpu-kvm_vcpu_dirty) {
 -run_on_cpu(cpu, do_kvm_cpu_synchronize_state, cpu);
 +if (args.regmap  ~cpu-kvm_vcpu_dirty) {
 +run_on_cpu(cpu, do_kvm_cpu_synchronize_state, args);
  }
  }
 
 @@ -1513,7 +1520,7 @@ void kvm_cpu_synchronize_post_reset(CPUArchState *env)
  CPUState *cpu = ENV_GET_CPU(env);
 
  kvm_arch_put_registers(cpu, KVM_REGSYNC_RESET_STATE);
 -cpu-kvm_vcpu_dirty = false;
 +cpu-kvm_vcpu_dirty = ~KVM_REGSYNC_RESET_STATE;
  }
 
  void kvm_cpu_synchronize_post_init(CPUArchState *env)
 @@ -1521,7 +1528,7 @@ void kvm_cpu_synchronize_post_init(CPUArchState *env)
  CPUState *cpu = ENV_GET_CPU(env);
 
  kvm_arch_put_registers(cpu, KVM_REGSYNC_FULL_STATE);
 -cpu-kvm_vcpu_dirty = false;
 +cpu-kvm_vcpu_dirty = ~KVM_REGSYNC_FULL_STATE;
  }
 
  int kvm_cpu_exec(CPUArchState *env)
 @@ -1540,7 +1547,7 @@ int kvm_cpu_exec(CPUArchState *env)
  do {
  if (cpu-kvm_vcpu_dirty) {
  kvm_arch_put_registers(cpu, KVM_REGSYNC_RUNTIME_STATE);

s/KVM_REGSYNC_RUNTIME_STATE/cpu-kvm_vcpu_dirty


 -cpu-kvm_vcpu_dirty = false;
 +cpu-kvm_vcpu_dirty = ~KVM_REGSYNC_RUNTIME_STATE;

cpu-kvm_vcpu_dirty = 0;

-Bharat

  }
 
  kvm_arch_pre_run(cpu, run);
 --
 1.7.9.5
 





Re: [Qemu-devel] [PATCH 0/7 v2] KVM regsync

2013-01-14 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Jason J. Herne [mailto:jjhe...@linux.vnet.ibm.com]
 Sent: Thursday, January 10, 2013 8:59 PM
 To: ag...@suse.de; borntrae...@de.ibm.com; aligu...@us.ibm.com;
 mtosa...@redhat.com; qemu-devel@nongnu.org; Bhushan Bharat-R65777;
 jan.kis...@siemens.com
 Subject: [PATCH 0/7 v2] KVM regsync
 
 Rework the method used to synchronize CPU registers between Qemu  KVM.  This
 patch set extends kvm_arch_put_registers() and
 kvm_arch_get_registers() to take a register bitmap parameter.  All existing 
 code
 paths are updated to specify this new parameter.
 
 IMPORTANT NOTE:  The PPC and i386 implementations are incomplete.
 I am submitting this code at this time only to get a review on the
 implementation of the existing code and to perhaps seek assistance with the
 mentioned architectures.
 
 I am not sure who will finish the implementation of PPC/i386 yet.  Due to the
 fact that I am unfamiliar with these architectures at the register level and I
 do not have test environments I would like to humbly request that a maintainer
 of these architectures take a look at it.  Or perhaps Bharat could handle the
 PPC code?  This would only leave
 i386 to worry about.  If I cannot find someone to handle i386 I will look into
 the feasibility of completing it myself.
 
 In order to complete the missing implementations, kvm_arch_get_registers and
 kvm_arch_put_registers (and associated helper functions) will need to be 
 updated
 to only sync the registers contained in the new bitmap argument.  Also, each 
 set
 of registers represented  by one of the bits must be mutually exclusive with
 respect to every other bit.  if this is not the case then local register data
 can be lost when kvm_arch_get_registers is called causing an old register 
 value
 to overwrite a newer local value.

I think net step can be let the register bitmap to go all upto the KVM, so the 
KVM also read/write the specified registers set only.

Thanks
-Bharat




Re: [Qemu-devel] [PATCH 0/7 v2] KVM regsync

2013-01-11 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Jason J. Herne [mailto:jjhe...@linux.vnet.ibm.com]
 Sent: Thursday, January 10, 2013 8:59 PM
 To: ag...@suse.de; borntrae...@de.ibm.com; aligu...@us.ibm.com;
 mtosa...@redhat.com; qemu-devel@nongnu.org; Bhushan Bharat-R65777;
 jan.kis...@siemens.com
 Subject: [PATCH 0/7 v2] KVM regsync
 
 Rework the method used to synchronize CPU registers between Qemu  KVM.  This
 patch set extends kvm_arch_put_registers() and
 kvm_arch_get_registers() to take a register bitmap parameter.  All existing 
 code
 paths are updated to specify this new parameter.
 
 IMPORTANT NOTE:  The PPC and i386 implementations are incomplete.
 I am submitting this code at this time only to get a review on the
 implementation of the existing code and to perhaps seek assistance with the
 mentioned architectures.
 
 I am not sure who will finish the implementation of PPC/i386 yet.  Due to the
 fact that I am unfamiliar with these architectures at the register level and I
 do not have test environments I would like to humbly request that a maintainer
 of these architectures take a look at it.  Or perhaps Bharat could handle the
 PPC code?  This would only leave

I will handle the PPC code.

-Bharat

 i386 to worry about.  If I cannot find someone to handle i386 I will look into
 the feasibility of completing it myself.
 
 In order to complete the missing implementations, kvm_arch_get_registers and
 kvm_arch_put_registers (and associated helper functions) will need to be 
 updated
 to only sync the registers contained in the new bitmap argument.  Also, each 
 set
 of registers represented  by one of the bits must be mutually exclusive with
 respect to every other bit.  if this is not the case then local register data
 can be lost when kvm_arch_get_registers is called causing an old register 
 value
 to overwrite a newer local value.
 
 --
 -- Jason J. Herne (jjhe...@linux.vnet.ibm.com)
 





Re: [Qemu-devel] [PATCH 3/3 v2] Enable kvm emulated watchdog

2013-01-10 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Thursday, January 10, 2013 9:07 PM
 To: Bhushan Bharat-R65777
 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 3/3 v2] Enable kvm emulated watchdog
 
 
 On 28.12.2012, at 06:16, Bharat Bhushan wrote:
 
  Enable the KVM emulated watchdog if KVM supports (use the capability
  enablement in watchdog handler). Also watchdog exit
  (KVM_EXIT_WATCHDOG) handling is added.
  Watchdog state machine is cleared whenever VM state changes to running.
  This is to handle the cases like return from debug halt etc.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  v2:
   - access cap_* from target_ppc/kvm.c only.
   - Added wrapper functions in target_ppc/kvm.c for
 enable_watchdog and tsr_sregs synchronization.
   - Incorporated other Review comments
 
  hw/ppc.h |1 +
  hw/ppc_booke.c   |   36 +++-
  target-ppc/kvm.c |   56 
  ++
  target-ppc/kvm_ppc.h |   11 +
  4 files changed, 103 insertions(+), 1 deletions(-)
 
  diff --git a/hw/ppc.h b/hw/ppc.h
  index 2f3ea27..6ad9e1f 100644
  --- a/hw/ppc.h
  +++ b/hw/ppc.h
  @@ -90,3 +90,4 @@ enum {
 
  /* ppc_booke.c */
  void ppc_booke_timers_init(CPUPPCState *env, uint32_t freq, uint32_t
  flags);
  +void ppc_booke_watchdog_clear_tsr(CPUPPCState *env);
  diff --git a/hw/ppc_booke.c b/hw/ppc_booke.c index 837a5b6..7273259
  100644
  --- a/hw/ppc_booke.c
  +++ b/hw/ppc_booke.c
  @@ -28,7 +28,7 @@
  #include nvram.h
  #include qemu-log.h
  #include loader.h
  -
  +#include kvm_ppc.h
 
  /* Timer Control Register */
 
  @@ -203,6 +203,11 @@ static void booke_wdt_cb(void *opaque)
   booke_timer-wdt_timer); }
 
  +void ppc_booke_watchdog_clear_tsr(CPUPPCState *env) {
  +env-spr[SPR_BOOKE_TSR] = ~(TSR_ENW | TSR_WIS | TSR_WRS_MASK); }
  +
  void store_booke_tsr(CPUPPCState *env, target_ulong val) {
  env-spr[SPR_BOOKE_TSR] = ~val;
  @@ -241,10 +246,27 @@ static void ppc_booke_timer_reset_handle(void *opaque)
  booke_update_irq(env);
  }
 
  +static void cpu_state_change_handler(void *opaque, int running,
  +RunState state) {
 
 Needs a comment when this happens
 
  +CPUPPCState *env = opaque;
  +
  +if (!running) {
  +return;
  +}
  +
  +/*
  + * Clear watchdog interrupt condition by clearing TSR.
  + */
  +ppc_booke_watchdog_clear_tsr(env);
  +
  +kvmppc_synch_sregs_tsr(env);
 
 kvmppc_sync_tsr. Also please add the sync to store_booke_tsr(). Then here, you
 can just do
 
   store_booke_tsr(TSR_ENW | TSR_WIS | TSR_WRS_MASK);
 
  +}
  +
  void ppc_booke_timers_init(CPUPPCState *env, uint32_t freq, uint32_t
  flags) {
  ppc_tb_t *tb_env;
  booke_timer_t *booke_timer;
  +int ret = 0;
 
  tb_env  = g_malloc0(sizeof(ppc_tb_t));
  booke_timer = g_malloc0(sizeof(booke_timer_t)); @@ -262,5 +284,17
  @@ void ppc_booke_timers_init(CPUPPCState *env, uint32_t freq, uint32_t 
  flags)
  booke_timer-wdt_timer =
  qemu_new_timer_ns(vm_clock, booke_wdt_cb, env);
 
  +ret = kvmppc_booke_watchdog_enable(env);
  +
  +if (ret) {
  +/* TODO: Start the QEMU emulated watchdog if not running on KVM.
  + * Also start the QEMU emulated watchdog if KVM does not support
  + * emulated watchdog or somehow it is not enabled (supported but
  + * not enabled is though some bug and requires debugging :)).
  + */
  +}
  +
  +qemu_add_vm_change_state_handler(cpu_state_change_handler, env);
  +
  qemu_register_reset(ppc_booke_timer_reset_handle, env); } diff
  --git a/target-ppc/kvm.c b/target-ppc/kvm.c index 3f5df57..6828afa
  100644
  --- a/target-ppc/kvm.c
  +++ b/target-ppc/kvm.c
  @@ -32,10 +32,12 @@
  #include device_tree.h
  #include hw/sysbus.h
  #include hw/spapr.h
  +#include hw/watchdog.h
 
  #include hw/sysbus.h
  #include hw/spapr.h
  #include hw/spapr_vio.h
  +#include hw/ppc.h
 
  //#define DEBUG_KVM
 
  @@ -61,6 +63,7 @@ static int cap_ppc_smt; static int cap_ppc_rma;
  static int cap_spapr_tce; static int cap_hior;
  +static int cap_ppc_watchdog;
 
  /* XXX We have a race condition where we actually have a level triggered
   * interrupt, but the infrastructure can't expose that yet, so the guest
  @@ -90,6 +93,7 @@ int kvm_arch_init(KVMState *s)
  cap_ppc_rma = kvm_check_extension(s, KVM_CAP_PPC_RMA);
  cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE);
  cap_hior = kvm_check_extension(s, KVM_CAP_PPC_HIOR);
  +cap_ppc_watchdog = kvm_check_extension(s, KVM_CAP_PPC_BOOKE_WATCHDOG);
 
  if (!cap_interrupt_level) {
  fprintf(stderr, KVM: Couldn't find level irq capability. Expect 
  the 
  @@ -823,6 +827,12 @@ int kvm_arch_handle_exit(CPUPPCState *env, struct 
  kvm_run
 *run)
  ret = 0;
  break;
  #endif
  +case

Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue

2013-01-06 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Jason J. Herne [mailto:jjhe...@linux.vnet.ibm.com]
 Sent: Friday, January 04, 2013 11:10 PM
 To: Alexander Graf
 Cc: Christian Borntraeger; Anthony Liguori; Marcelo Tosatti; qemu-
 de...@nongnu.org qemu-devel; Bhushan Bharat-R65777
 Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
 do_kvm_cpu_synchronize_state data integrity issue
 
 On 01/04/2013 11:27 AM, Alexander Graf wrote:
 
  On 04.01.2013, at 16:25, Jason J. Herne wrote:
 
  If I've followed the conversation correctly this is what needs to be done:
 
  1. Remove the level parameters from kvm_arch_get_registers and
 kvm_arch_put_registers.
 
  2. Add a new bitmap parameter to kvm_arch_get_registers and
 kvm_arch_put_registers.
 
  I would combine these into replace levels with bitmap.
 
  3. Define a bit that correlates to our current notion of all runtime
 registers.  This bit, and all bits in this bitmap, would be architecture
 specific.
 
  Why would that bit be architecture specific? All runtime registers ==
 registers that gdb can access IIRC. The implementation on what exactly that
 means obviously is architecture specific, but the bit itself would not be, as
 the gdbstub wants to be able to synchronize in arch independent code.
 
  4. Remove the cpustate-kvm_sync_dirty field.  Replace it with a bitmap 
  that
 tracks which bits are dirty and need to be synced back to KVM-land.
 
  5. As we do today, we'll assume registers are dirty and turn on their
 corresponding bit in this new bitmap whenever we get the registers from KVM.
 
  Yes. Changing these semantics is nothing for today :).
 
  6. Add other bits as needed on a case by case basis.
 
  Does this seem to match what was discussed, and what we want to do?
 
  It's probably the best way forward, keeping everyone happy.
 
  Please coordinate with Bharat on who actually wants to sit down to implement
 this. Or if you're quick you might be able to beat him to it regardless thanks
 to time zones :).
 
 
 Hi Bharat,
 
 How would you like to handle these changes?  I can do them, or you could if 
 you
 prefer. Please let me know.

Hi Jason,

I will be happy to see the patch from you and because of some other things if 
you think that it will take time then let me know, I will do the changes. This 
framework will be used by my watchdog patches and only thing I want is my 
watchdog changes get pushed in QEMU.

Thanks
-Bharat





Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue

2013-01-04 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Friday, January 04, 2013 2:11 PM
 To: Bhushan Bharat-R65777
 Cc: Marcelo Tosatti; jjhe...@linux.vnet.ibm.com; Christian Borntraeger; 
 Anthony
 Liguori; qemu-devel@nongnu.org qemu-devel
 Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
 do_kvm_cpu_synchronize_state data integrity issue
 
 
 On 04.01.2013, at 05:22, Bhushan Bharat-R65777 wrote:
 
 
 
  -Original Message-
  From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
  Sent: Friday, January 04, 2013 7:08 AM
  To: Alexander Graf
  Cc: jjhe...@linux.vnet.ibm.com; Christian Borntraeger; Anthony
  Liguori; qemu- de...@nongnu.org qemu-devel; Bhushan Bharat-R65777
  Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
  do_kvm_cpu_synchronize_state data integrity issue
 
  On Thu, Jan 03, 2013 at 08:09:22PM +0100, Alexander Graf wrote:
 
  On 03.01.2013, at 19:48, Jason J. Herne wrote:
 
  On 01/03/2013 08:56 AM, Alexander Graf wrote:
  static void do_kvm_cpu_synchronize_state(void *_args)
  {
 struct kvm_cpu_syncstate_args *args = _args;
  +CPUArchState *env = args-env;
  +int register_level = args-register_level;
 
  This probably becomes more readable if we explicitly revert back
  to
  unsynced state first:
 
  /* Write back local modifications at our current level */ if
  (register_level  env-kvm_vcpu_dirty) {
 kvm_arch_put_registers(...);
 env-kvm_vcpu_dirty = 0;
  }
 
  and then do the sync we are requested to do:
 
  if (!env-kvm_vcpu_dirty) {
 ...
  }
 
  I agree, but only if we add a second conditional to the if 1st
  statement  as
  such:
 
  if (args-env-kvm_vcpu_dirty  register_level 
  env-kvm_vcpu_dirty)
 
  This is to cover the case where the caller is asking for register level 
  1
  and we're already dirty at level 2. In this case, nothing should
  happen and we'll need the args-env-kvm_vcpu_dirty to ensure that is the
 case.
 
  As before, I'd prefer to make this explicit:
 
 
  static void do_kvm_cpu_synchronize_state(void *_args) {
struct kvm_cpu_syncstate_args *args = _args;
CPUArchState *env = args-env;
int register_level = args-register_level;
 
  if (register_level  env-kvm_vcpu_dirty) {
 /* We are more dirty than we need to - all is well */
 return;
  }
 
 
/* Write back local modifications at our current level */
if (args-env-kvm_vcpu_dirty  register_level  env-kvm_vcpu_dirty) 
  {
kvm_arch_put_registers(env, env-kvm_vcpu_dirty);
env-kvm_vcpu_dirty = 0;
}
 
if (!args-env-kvm_vcpu_dirty) {
kvm_arch_get_registers(env, register_level);
env-kvm_vcpu_dirty = register_level;
}
  }
 
  Do you agree?  Thanks for your time. :)
 
  Please also check out the discussions I've had with Bharat about his
  watchdog
  patches. There we need a mechanism to synchronize registers only when
  we actually need to, in order to avoid potential race conditions with
  a kernel timer.
 
  That particular case doesn't work well with levels. We can have
  multiple
  different potential race producers in the kernel that we need to
  avoid individually, so we can't always synchronize all of them when
  only one of them needs to be synchronized.
 
  The big question is what we should be doing about this. We basically
  have 3
  options:
 
   * implement levels, treat racy registers as manually
  synchronized, as
  Bharat's latest patch set does
   * implement levels, add a bitmap for additional special
  synchronization bits
   * replace levels by bitmap
 
  I'm quite frankly not sure which one of the 3 would be the best way 
  forward.
 
 
  Alex
 
  Implement read/write accessors for individual registers, similarly to
  how its done in the kernel. See kvm_cache_regs.h.
 
  Read/write for individual registers can be heavy for cases where multiple
 register needed to be updated. Is not it?
 
 It depends. We can still have classes of registers to synchronize that we 
 could
 even synchronize using MANY_REG. For writes, things become even faster with
 individual accessors since we can easily coalesce all register writes until we
 hit the vcpu again into a MANY_REG array and write them in one go.
 
 
  For cases where multiple register needed to be synchronized then I would 
  like
 to elaborate the options as:
 
  Option 1:
  int timer_func(CPUArchState env)
  {
  cpu_synchronize_state(env);
  //update env-timer_registers
  env-updated_regs_type |= TIMER_REGS_TYPE_BIT;
 
 To scale this one, it's probably also more clever to swap the logic:
 
 env-kvm_sync_extra |= SYNC_TIMER_REGS;
 cpu_synchronize_state(env); /* gets timer regs */
 /* update env-timer_registers */
 /* timer regs will be synchronized later because kvm_sync_extra has
 SYNC_TIMER_REGS set */
 
 Your case right now is special in that we don't need to get any register 
 state,
 but only write it. However, if we do
 
 cpu_synchronize_state(env); /* will not get timer regs

Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue

2013-01-04 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Friday, January 04, 2013 2:11 PM
 To: Bhushan Bharat-R65777
 Cc: Marcelo Tosatti; jjhe...@linux.vnet.ibm.com; Christian Borntraeger; 
 Anthony
 Liguori; qemu-devel@nongnu.org qemu-devel
 Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
 do_kvm_cpu_synchronize_state data integrity issue
 
 
 On 04.01.2013, at 05:22, Bhushan Bharat-R65777 wrote:
 
 
 
  -Original Message-
  From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
  Sent: Friday, January 04, 2013 7:08 AM
  To: Alexander Graf
  Cc: jjhe...@linux.vnet.ibm.com; Christian Borntraeger; Anthony
  Liguori; qemu- de...@nongnu.org qemu-devel; Bhushan Bharat-R65777
  Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
  do_kvm_cpu_synchronize_state data integrity issue
 
  On Thu, Jan 03, 2013 at 08:09:22PM +0100, Alexander Graf wrote:
 
  On 03.01.2013, at 19:48, Jason J. Herne wrote:
 
  On 01/03/2013 08:56 AM, Alexander Graf wrote:
  static void do_kvm_cpu_synchronize_state(void *_args)
  {
 struct kvm_cpu_syncstate_args *args = _args;
  +CPUArchState *env = args-env;
  +int register_level = args-register_level;
 
  This probably becomes more readable if we explicitly revert back
  to
  unsynced state first:
 
  /* Write back local modifications at our current level */ if
  (register_level  env-kvm_vcpu_dirty) {
 kvm_arch_put_registers(...);
 env-kvm_vcpu_dirty = 0;
  }
 
  and then do the sync we are requested to do:
 
  if (!env-kvm_vcpu_dirty) {
 ...
  }
 
  I agree, but only if we add a second conditional to the if 1st
  statement  as
  such:
 
  if (args-env-kvm_vcpu_dirty  register_level 
  env-kvm_vcpu_dirty)
 
  This is to cover the case where the caller is asking for register level 
  1
  and we're already dirty at level 2. In this case, nothing should
  happen and we'll need the args-env-kvm_vcpu_dirty to ensure that is the
 case.
 
  As before, I'd prefer to make this explicit:
 
 
  static void do_kvm_cpu_synchronize_state(void *_args) {
struct kvm_cpu_syncstate_args *args = _args;
CPUArchState *env = args-env;
int register_level = args-register_level;
 
  if (register_level  env-kvm_vcpu_dirty) {
 /* We are more dirty than we need to - all is well */
 return;
  }
 
 
/* Write back local modifications at our current level */
if (args-env-kvm_vcpu_dirty  register_level  env-kvm_vcpu_dirty) 
  {
kvm_arch_put_registers(env, env-kvm_vcpu_dirty);
env-kvm_vcpu_dirty = 0;
}
 
if (!args-env-kvm_vcpu_dirty) {
kvm_arch_get_registers(env, register_level);
env-kvm_vcpu_dirty = register_level;
}
  }
 
  Do you agree?  Thanks for your time. :)
 
  Please also check out the discussions I've had with Bharat about his
  watchdog
  patches. There we need a mechanism to synchronize registers only when
  we actually need to, in order to avoid potential race conditions with
  a kernel timer.
 
  That particular case doesn't work well with levels. We can have
  multiple
  different potential race producers in the kernel that we need to
  avoid individually, so we can't always synchronize all of them when
  only one of them needs to be synchronized.
 
  The big question is what we should be doing about this. We basically
  have 3
  options:
 
   * implement levels, treat racy registers as manually
  synchronized, as
  Bharat's latest patch set does
   * implement levels, add a bitmap for additional special
  synchronization bits
   * replace levels by bitmap
 
  I'm quite frankly not sure which one of the 3 would be the best way 
  forward.
 
 
  Alex
 
  Implement read/write accessors for individual registers, similarly to
  how its done in the kernel. See kvm_cache_regs.h.
 
  Read/write for individual registers can be heavy for cases where multiple
 register needed to be updated. Is not it?
 
 It depends. We can still have classes of registers to synchronize that we 
 could
 even synchronize using MANY_REG. For writes, things become even faster with
 individual accessors since we can easily coalesce all register writes until we
 hit the vcpu again into a MANY_REG array and write them in one go.
 
 
  For cases where multiple register needed to be synchronized then I would 
  like
 to elaborate the options as:
 
  Option 1:
  int timer_func(CPUArchState env)
  {
  cpu_synchronize_state(env);
  //update env-timer_registers
  env-updated_regs_type |= TIMER_REGS_TYPE_BIT;
 
 To scale this one, it's probably also more clever to swap the logic:
 
 env-kvm_sync_extra |= SYNC_TIMER_REGS;
 cpu_synchronize_state(env); /* gets timer regs */
 /* update env-timer_registers */
 /* timer regs will be synchronized later because kvm_sync_extra has
 SYNC_TIMER_REGS set */
 
 Your case right now is special in that we don't need to get any register 
 state,
 but only write it. However, if we do
 
 cpu_synchronize_state(env); /* will not get timer regs

Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue

2013-01-04 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Friday, January 04, 2013 3:59 PM
 To: Bhushan Bharat-R65777
 Cc: Marcelo Tosatti; jjhe...@linux.vnet.ibm.com; Christian Borntraeger; 
 Anthony
 Liguori; qemu-devel@nongnu.org qemu-devel
 Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
 do_kvm_cpu_synchronize_state data integrity issue
 
 
 On 04.01.2013, at 11:23, Bhushan Bharat-R65777 wrote:
 
 
 
  -Original Message-
  From: Alexander Graf [mailto:ag...@suse.de]
  Sent: Friday, January 04, 2013 2:11 PM
  To: Bhushan Bharat-R65777
  Cc: Marcelo Tosatti; jjhe...@linux.vnet.ibm.com; Christian
  Borntraeger; Anthony Liguori; qemu-devel@nongnu.org qemu-devel
  Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
  do_kvm_cpu_synchronize_state data integrity issue
 
 
  On 04.01.2013, at 05:22, Bhushan Bharat-R65777 wrote:
 
 
 
  -Original Message-
  From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
  Sent: Friday, January 04, 2013 7:08 AM
  To: Alexander Graf
  Cc: jjhe...@linux.vnet.ibm.com; Christian Borntraeger; Anthony
  Liguori; qemu- de...@nongnu.org qemu-devel; Bhushan Bharat-R65777
  Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
  do_kvm_cpu_synchronize_state data integrity issue
 
  On Thu, Jan 03, 2013 at 08:09:22PM +0100, Alexander Graf wrote:
 
  On 03.01.2013, at 19:48, Jason J. Herne wrote:
 
  On 01/03/2013 08:56 AM, Alexander Graf wrote:
  static void do_kvm_cpu_synchronize_state(void *_args)
  {
struct kvm_cpu_syncstate_args *args = _args;
  +CPUArchState *env = args-env;
  +int register_level = args-register_level;
 
  This probably becomes more readable if we explicitly revert back
  to
  unsynced state first:
 
  /* Write back local modifications at our current level */ if
  (register_level  env-kvm_vcpu_dirty) {
kvm_arch_put_registers(...);
env-kvm_vcpu_dirty = 0;
  }
 
  and then do the sync we are requested to do:
 
  if (!env-kvm_vcpu_dirty) {
...
  }
 
  I agree, but only if we add a second conditional to the if 1st
  statement  as
  such:
 
  if (args-env-kvm_vcpu_dirty  register_level 
  env-kvm_vcpu_dirty)
 
  This is to cover the case where the caller is asking for register level
 1
  and we're already dirty at level 2. In this case, nothing should
  happen and we'll need the args-env-kvm_vcpu_dirty to ensure
  that is the
  case.
 
  As before, I'd prefer to make this explicit:
 
 
  static void do_kvm_cpu_synchronize_state(void *_args) {  struct
  kvm_cpu_syncstate_args *args = _args;  CPUArchState *env =
  args-env;  int register_level = args-register_level;
 
  if (register_level  env-kvm_vcpu_dirty) {
/* We are more dirty than we need to - all is well */
return;
  }
 
 
   /* Write back local modifications at our current level */  if
  (args-env-kvm_vcpu_dirty  register_level  env-kvm_vcpu_dirty) {
   kvm_arch_put_registers(env, env-kvm_vcpu_dirty);
   env-kvm_vcpu_dirty = 0;
   }
 
   if (!args-env-kvm_vcpu_dirty) {
   kvm_arch_get_registers(env, register_level);
   env-kvm_vcpu_dirty = register_level;  } }
 
  Do you agree?  Thanks for your time. :)
 
  Please also check out the discussions I've had with Bharat about
  his watchdog
  patches. There we need a mechanism to synchronize registers only
  when we actually need to, in order to avoid potential race
  conditions with a kernel timer.
 
  That particular case doesn't work well with levels. We can have
  multiple
  different potential race producers in the kernel that we need to
  avoid individually, so we can't always synchronize all of them when
  only one of them needs to be synchronized.
 
  The big question is what we should be doing about this. We
  basically have 3
  options:
 
  * implement levels, treat racy registers as manually
  synchronized, as
  Bharat's latest patch set does
  * implement levels, add a bitmap for additional special
  synchronization bits
  * replace levels by bitmap
 
  I'm quite frankly not sure which one of the 3 would be the best way
 forward.
 
 
  Alex
 
  Implement read/write accessors for individual registers, similarly
  to how its done in the kernel. See kvm_cache_regs.h.
 
  Read/write for individual registers can be heavy for cases where
  multiple
  register needed to be updated. Is not it?
 
  It depends. We can still have classes of registers to synchronize
  that we could even synchronize using MANY_REG. For writes, things
  become even faster with individual accessors since we can easily
  coalesce all register writes until we hit the vcpu again into a MANY_REG
 array and write them in one go.
 
 
  For cases where multiple register needed to be synchronized then I
  would like
  to elaborate the options as:
 
  Option 1:
  int timer_func(CPUArchState env)
  {
cpu_synchronize_state(env);
//update env-timer_registers
env-updated_regs_type |= TIMER_REGS_TYPE_BIT;
 
  To scale this one, it's probably also more clever to swap the logic:
 
 env

Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue

2013-01-04 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Friday, January 04, 2013 4:07 PM
 To: Bhushan Bharat-R65777
 Cc: Marcelo Tosatti; jjhe...@linux.vnet.ibm.com; Christian Borntraeger; 
 Anthony
 Liguori; qemu-devel@nongnu.org qemu-devel
 Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
 do_kvm_cpu_synchronize_state data integrity issue
 
 
 On 04.01.2013, at 11:32, Bhushan Bharat-R65777 wrote:
 
 
 
  -Original Message-
 
  Int timer_func(CPUxxState env)
  {
   Struct timer_regs;
   read_regs_type((env, timer_regs,TIMER_REGS);  //update
  env-timer_registers  Write_regs_type(env, TIMER_REGS) }
 
  This will get one type of register_types and can cause multiple
  IOCTL per
  entry to QEMU.
 
  This is still too explicit. How about
 
  static inline void ppc_set_tsr(CPUState *env, target_ulong val) {
env-kvm_sync_extra |= SYNC_TIMER_REGS;
cpu_synchronize_registers(env);
env-spr[SPR_TSR] = val;
 
  You also want env-kvm_sync_dirty also, right?
 
  Not quite, since SYNC_TIMER_REGS spans more than only TSR. So we need
  to make sure we fetch the non-TSR register values before we can
  declare TIMER_REGS as dirty.
 
  Right , I actually want communicate mark dirty the TIMER_REGS only :)
 
 Imagine TIMER_REGS includes TSR and TCR. Then
 
 ppc_set_tsr();
 
 should still work without a respective ppc_set_tcr() call. So we can't just 
 mark
 it dirty here. The only way we could optimize this bit would be by either
 splitting the bitmap per register or by extending the setter functions to the
 full scope of the sync:

Are you talking about GET all register of a type, say TIMER (TCR and TSR) on 
first call to cpu_synchronization_function but SET will be only for dirty 
register. Say if TSR is changed then setter will only communicate that TSR is 
changed?

Or

(
get multiple regs of a type , say TIMER (TCR and TSR), change one or more and 
mark dirty if any one changed. Set all registers of a type, TIMER (TCR and TSR) 
if dirty.
  In this case the first call by ppc_set_tsr() to cpu_synchronization_function 
will implicitly get all registers of TIMER (TCR and TSR) - means one or more 
registers of timer can be updated by QEMU. This function will update only 
env-TSR.
  Later if ppc_set_tcr() is called, it will also call 
cpu_synchronization_function but this will see that TIMER registers are already 
available to QEMU, so it will not fetch again. This function will update the 
TCR also.
  Finally they all, TIMER (TCR and TSR) will be pushed to kernel.
)

Thanks
-Bharat

 
 static inline void ppc_set_timer_regs(CPUState *env, target_ulong tsr,
 target_ulong tcr) {
 ...
 }
 
 
 Alex
 





Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/3 v2] Reset qemu timers when guest reset

2013-01-03 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Wood Scott-B07421
 Sent: Friday, January 04, 2013 1:51 AM
 To: Bhushan Bharat-R65777
 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de; Bhushan Bharat-
 R65777
 Subject: Re: [Qemu-ppc] [PATCH 2/3 v2] Reset qemu timers when guest reset
 
 On 12/27/2012 11:16:51 PM, Bharat Bhushan wrote:
  This patch install the timer reset handler. This will be called when
  the guest is reset.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
   v2: same as v1
 
   hw/ppc_booke.c |   12 
   1 files changed, 12 insertions(+), 0 deletions(-)
 
  diff --git a/hw/ppc_booke.c b/hw/ppc_booke.c index d51e7fa..837a5b6
  100644
  --- a/hw/ppc_booke.c
  +++ b/hw/ppc_booke.c
  @@ -231,6 +231,16 @@ void store_booke_tcr(CPUPPCState *env,
  target_ulong val)
 
   }
 
  +static void ppc_booke_timer_reset_handle(void *opaque) {
  +CPUPPCState *env = opaque;
  +
  +env-spr[SPR_BOOKE_TSR] = 0;
  +env-spr[SPR_BOOKE_TCR] = 0;
  +
  +booke_update_irq(env);
  +}
 
 When does KVM_SET_SREGS get called?

This is part of reset processing and is not cpu_synchronize_state() called 
before all reset handlers are called and after that post_synchronize will do 
the KVM_SET_SREGS in kvm_put_registers().

Thanks
-Bharat





Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue

2013-01-03 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
 Sent: Friday, January 04, 2013 7:08 AM
 To: Alexander Graf
 Cc: jjhe...@linux.vnet.ibm.com; Christian Borntraeger; Anthony Liguori; qemu-
 de...@nongnu.org qemu-devel; Bhushan Bharat-R65777
 Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
 do_kvm_cpu_synchronize_state data integrity issue
 
 On Thu, Jan 03, 2013 at 08:09:22PM +0100, Alexander Graf wrote:
 
  On 03.01.2013, at 19:48, Jason J. Herne wrote:
 
   On 01/03/2013 08:56 AM, Alexander Graf wrote:
   static void do_kvm_cpu_synchronize_state(void *_args)
   {
struct kvm_cpu_syncstate_args *args = _args;
   +CPUArchState *env = args-env;
   +int register_level = args-register_level;
   
   This probably becomes more readable if we explicitly revert back to
 unsynced state first:
  
   /* Write back local modifications at our current level */ if
   (register_level  env-kvm_vcpu_dirty) {
   kvm_arch_put_registers(...);
   env-kvm_vcpu_dirty = 0;
   }
  
   and then do the sync we are requested to do:
  
   if (!env-kvm_vcpu_dirty) {
   ...
   }
  
   I agree, but only if we add a second conditional to the if 1st statement  
   as
 such:
  
   if (args-env-kvm_vcpu_dirty  register_level 
   env-kvm_vcpu_dirty)
  
   This is to cover the case where the caller is asking for register level 
   1
 and we're already dirty at level 2. In this case, nothing should happen and
 we'll need the args-env-kvm_vcpu_dirty to ensure that is the case.
 
  As before, I'd prefer to make this explicit:
 
  
   static void do_kvm_cpu_synchronize_state(void *_args) {
  struct kvm_cpu_syncstate_args *args = _args;
  CPUArchState *env = args-env;
  int register_level = args-register_level;
 
  if (register_level  env-kvm_vcpu_dirty) {
  /* We are more dirty than we need to - all is well */
  return;
  }
 
  
  /* Write back local modifications at our current level */
  if (args-env-kvm_vcpu_dirty  register_level  env-kvm_vcpu_dirty) 
   {
  kvm_arch_put_registers(env, env-kvm_vcpu_dirty);
  env-kvm_vcpu_dirty = 0;
  }
  
  if (!args-env-kvm_vcpu_dirty) {
  kvm_arch_get_registers(env, register_level);
  env-kvm_vcpu_dirty = register_level;
  }
   }
  
   Do you agree?  Thanks for your time. :)
 
  Please also check out the discussions I've had with Bharat about his 
  watchdog
 patches. There we need a mechanism to synchronize registers only when we
 actually need to, in order to avoid potential race conditions with a kernel
 timer.
 
  That particular case doesn't work well with levels. We can have multiple
 different potential race producers in the kernel that we need to avoid
 individually, so we can't always synchronize all of them when only one of them
 needs to be synchronized.
 
  The big question is what we should be doing about this. We basically have 3
 options:
 
* implement levels, treat racy registers as manually synchronized, as
 Bharat's latest patch set does
* implement levels, add a bitmap for additional special synchronization 
  bits
* replace levels by bitmap
 
  I'm quite frankly not sure which one of the 3 would be the best way forward.
 
 
  Alex
 
 Implement read/write accessors for individual registers, similarly to how its
 done in the kernel. See kvm_cache_regs.h.

Read/write for individual registers can be heavy for cases where multiple 
register needed to be updated. Is not it?

For cases where multiple register needed to be synchronized then I would like 
to elaborate the options as:

Option 1:
int timer_func(CPUArchState env)
{
cpu_synchronize_state(env);
//update env-timer_registers
env-updated_regs_type |= TIMER_REGS_TYPE_BIT;
}

Change the kvm_arch_put_registers() to add followings:

int kvm_arch_put_registers(CPUArchState *env, int level)
{

If (env-updated_regs_type) {
struct kvm_sregs sregs;
If (env-updated_regs_type  TIMER_REGS_TYPE_BIT) {
// update sregs-timer_registers;
// may be we can have a bitmap to tell kernel for what 
actually updated
}
If (env-updated_regs_type  XX_CPU_REGS_TYPE) {
// update respective registers in sregs
}

ioctl(env, KVM_GET_SREGS, sregs);
}
}

This mechanism will get all registers state but this is required only once per 
entry in QEMU.


Option 2:
Define read_regs_type() and write_regs_type() for cases where it requires 
multiple registers updates:

int read_regs_type((CPUArchState env, void *regs_ptr, int reg_type)
{
 - reg_type are bitmap like: TIMER_REGS, DEBUG_REGS, PERFMON_REGS etc.

Switch(reg_type) {
Case TIMER_REGS:
Struct *timer_regs = regs_ptr;
Ioctl(KVM_GET_SREGS, timer_regs, CPU_reg_type); 
Break;

Default

Re: [Qemu-devel] [PATCH 3/3] Enable kvm emulated watchdog

2012-12-27 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, December 17, 2012 8:09 PM
 To: Bhushan Bharat-R65777
 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 3/3] Enable kvm emulated watchdog
 
 
 On 17.12.2012, at 07:08, Bharat Bhushan wrote:
 
  Enable the KVM emulated watchdog if KVM supports (use the capability
  enablement in watchdog handler). Also watchdog exit
  (KVM_EXIT_WATCHDOG) handling is added.
  Watchdog state machine is cleared whenever VM state changes to running.
  This is to handle the cases like return from debug halt etc.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  hw/ppc.h |2 +
  hw/ppc_booke.c   |   71 
  ++
  target-ppc/kvm.c |   13 +-
  3 files changed, 85 insertions(+), 1 deletions(-)
 
  diff --git a/hw/ppc.h b/hw/ppc.h
  index 2f3ea27..3672fe8 100644
  --- a/hw/ppc.h
  +++ b/hw/ppc.h
  @@ -44,6 +44,8 @@ struct ppc_tb_t {
 
  uint64_t cpu_ppc_get_tb(ppc_tb_t *tb_env, uint64_t vmclk, int64_t
  tb_offset); clk_setup_cb cpu_ppc_tb_init (CPUPPCState *env, uint32_t
  freq);
  +extern int cap_ppc_watchdog;
  +extern int cap_booke_sregs;
 
 No. Never export cap_ variables. They are kvm internal.
 
  /* Embedded PowerPC DCR management */
  typedef uint32_t (*dcr_read_cb)(void *opaque, int dcrn); typedef void
  (*dcr_write_cb)(void *opaque, int dcrn, uint32_t val); diff --git
  a/hw/ppc_booke.c b/hw/ppc_booke.c index 837a5b6..f18df74 100644
  --- a/hw/ppc_booke.c
  +++ b/hw/ppc_booke.c
  @@ -21,6 +21,8 @@
   * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
  DEALINGS IN
   * THE SOFTWARE.
   */
  +#include sysemu.h
  +#include kvm.h
  #include hw.h
  #include ppc.h
  #include qemu-timer.h
  @@ -203,6 +205,11 @@ static void booke_wdt_cb(void *opaque)
   booke_timer-wdt_timer); }
 
  +static void ppc_booke_watchdog_clear_tsr(CPUPPCState *env,
  +target_ulong tsr) {
  +env-spr[SPR_BOOKE_TSR] = tsr  ~(TSR_ENW | TSR_WIS |
  +TSR_WRS_MASK); }
  +
  void store_booke_tsr(CPUPPCState *env, target_ulong val) {
  env-spr[SPR_BOOKE_TSR] = ~val;
  @@ -241,6 +248,64 @@ static void ppc_booke_timer_reset_handle(void *opaque)
  booke_update_irq(env);
  }
 
  +static void cpu_state_change_handler(void *opaque, int running,
  +RunState state) {
  +CPUPPCState *env = opaque;
  +
  +struct kvm_sregs sregs;
  +
  +if (!running) {
  +return;
  +}
  +
  +/*
  + * Clear watchdog interrupt condition by clearing TSR.
  + * Similar logic needed to be implemented for watchdog
  + * emulation in qemu.
  + */
  +
  +if (!kvm_enabled()) {
  +/* FIXME: add handling for qemu emulated case */
  +return;
  +}
  +
  +if (cap_booke_sregs  cap_ppc_watchdog) {
  +kvm_vcpu_ioctl(env, KVM_GET_SREGS, sregs);
  +
  +/* Clear TSR.ENW, TSR.WIS and TSR.WRS */
  +ppc_booke_watchdog_clear_tsr(env, sregs.u.e.tsr);
 
 This should happen outside of the if (kvm_enabled()) block.
 
  +sregs.u.e.tsr = env-spr[SPR_BOOKE_TSR];
  +sregs.u.e.update_special = KVM_SREGS_E_UPDATE_TSR;
  +
  +kvm_vcpu_ioctl(env, KVM_SET_SREGS, sregs);
 
 Please create a kvmppc_... wrapper for all this in target-ppc/kvm.c. Or maybe
 even better yet add a helper variable that tells the kvm register sync 
 function
 to sync TSR as well and just use the normal cpu_synchronize_state() way of
 pushing register into the CPU.

Not sure what type of helper variable you are talking about.
What came in my mine is we define a helper variable as per bitmap of SREGS 
update feature KVM_SREGS_E_UPDATE_* (update_special) in env. Whenever any code 
changes the env[spr] it will set the update_special. Env-update_special will 
be checked in put_registers().

Thanks
-Bharat

 
  +}
  +}
  +
  +static int kvm_booke_watchdog_enable(CPUPPCState *env) {
  +int ret;
  +struct kvm_enable_cap encap = {};
  +
  +if (!kvm_enabled()) {
  +return 0;
 
 Why return 0?
 
  +}
  +
  +if (!cap_ppc_watchdog) {
  +printf(warning: KVM does not support watchdog);
  +return 0;
 
 Why return 0?
 
  +}
  +
  +encap.cap = KVM_CAP_PPC_BOOKE_WATCHDOG;
  +ret = kvm_vcpu_ioctl(env, KVM_ENABLE_CAP, encap);
  +if (ret  0) {
  +fprintf(stderr, %s: couldn't enable KVM_CAP_PPC_BOOKE_WATCHDOG:
 %s\n,
  +__func__, strerror(-ret));
  +return ret;
  +}
 
 Please wrap this in a kvmppc_... function in kvm.c.
 
  +
  +return ret;
  +}
  +
  void ppc_booke_timers_init(CPUPPCState *env, uint32_t freq, uint32_t
  flags) {
  ppc_tb_t *tb_env;
  @@ -262,5 +327,11 @@ void ppc_booke_timers_init(CPUPPCState *env, uint32_t
 freq, uint32_t flags)
  booke_timer-wdt_timer =
  qemu_new_timer_ns(vm_clock, booke_wdt_cb, env);
 
  +if (kvm_enabled()) {
 
 Why double-check

Re: [Qemu-devel] [PATCH v2] Added uapi directory into linux-header

2012-12-18 Thread Bhushan Bharat-R65777
  +++ b/scripts/update-linux-headers.sh
  @@ -46,14 +46,26 @@ for arch in $ARCHLIST; do
 
   make -C $linux INSTALL_HDR_PATH=$tmpdir SRCARCH=$arch
  headers_install
 
  +if [ -e $linux/arch/$arch/include/uapi ] 
  +! [ -e $output/linux-headers/uapi ] ; then
  +mkdir $output/linux-headers/uapi
 
  mkdir -p
 
  But looking through this whole thing, it seems like the root
  cause is actually different. We don't want any uapi directories
  exposed to user space. So let's go back a step:
 
  Why do we need the uapi include dir? Because some header is using it.
 
  linux-headers/asm-powerpc/kvm_para.h:
 
  The kvm_para.h (also kvm.h) are now defined in include/uapi/asm/
 
  Is not this the correct thing that any header file in
  include/uapi/asm/ (in
  this case kvm_para.h) includes another header file (epapr_hcalls.h)
  in same directory?
 
  Also I think now only the uapi/asm/*.h files should be exposed to
  userspace
  (QEMU here).
 
  make headers_install should basically remove all the uapi magic and
  give us normal backwards-compatible asm trees :).
 
  I am perfectly fine, How we can do this now :)
 
  Well, for starters, do the headers work if you apply the patch I sent
  in a previous mail plus the epapr_hcall.h copy? If so, then that's
  the way to go :)
 
  Are you really sure that applying a patch and then syncing (or other way
 round)  is the way you want to go ?
 
 Yes, because I'm quite confident we're generating broken headers right now.

Ok, so every time someone does the sync he/she has to do this? Also do we think 
that sometime in future this will be taken care by make header_install?

Thanks
-Bharat

 
 Alex
 
 
  To me it does not look good, I think we can go with the script changes to 
  make
 install_header is updated to do the work.
 
  -Bharat
 
 
 
  Alex
 
 
 





Re: [Qemu-devel] [PATCH] Added uapi directory into linux-header

2012-12-17 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, December 17, 2012 7:45 PM
 To: Bhushan Bharat-R65777
 Cc: qemu-devel qemu-devel; qemu-...@nongnu.org List; Bhushan Bharat-R65777; 
 Jan
 Kiszka; Peter Maydell
 Subject: Re: [PATCH] Added uapi directory into linux-header
 
 
 On 17.12.2012, at 05:08, Bharat Bhushan wrote:
 
  Linux ARCH specific header files are now in uapi/ directory also.
  These header files (epapr_hcalls.h on powerpc) are needed in qemu in
  linux-headers/uapi/asm/ directory as these are referenced by other
  header files in linux-headers/asm/.
 
  This patch is about changing the scripts for same.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 
 Please include people who were participating in the discussion previously in 
 CC
 when you post a new version.
 
  ---
  configure   |1 +
  scripts/update-linux-headers.sh |   10 ++
  2 files changed, 11 insertions(+), 0 deletions(-)
 
  diff --git a/configure b/configure
  index 38b1cc6..51cce6b 100755
  --- a/configure
  +++ b/configure
  @@ -3724,6 +3724,7 @@ if test $linux = yes ; then
  # For non-KVM architectures we will not have asm headers
  if [ -e $source_path/linux-headers/asm-$linux_arch ]; then
symlink $source_path/linux-headers/asm-$linux_arch
  linux-headers/asm
  +  symlink $source_path/linux-headers/uapi/asm-$linux_arch
  + linux-headers/uapi/asm
 
 This is still wrong. You want this guarded by another if [ -e ].

This is not wrong because  

 
  fi
  fi
 
  diff --git a/scripts/update-linux-headers.sh
  b/scripts/update-linux-headers.sh index 4c7b566..d40f9c4 100755
  --- a/scripts/update-linux-headers.sh
  +++ b/scripts/update-linux-headers.sh
  @@ -46,14 +46,24 @@ for arch in $ARCHLIST; do
 
  make -C $linux INSTALL_HDR_PATH=$tmpdir SRCARCH=$arch
  headers_install
 
  +if ! [ -e $output/linux-headers/uapi ] ; then
  +mkdir $output/linux-headers/uapi
  +fi
 
 ... which would mean you don't need this bit.

... because here we ensure that the directory exists.

 
  +
  rm -rf $output/linux-headers/asm-$arch
  mkdir -p $output/linux-headers/asm-$arch
  +rm -rf $output/linux-headers/uapi/asm-$arch
  +mkdir -p $output/linux-headers/uapi/asm-$arch
 
 Only if it exists, no?

Above it is ensured that the directory always exists .. 

 
  +
  for header in kvm.h kvm_para.h; do
  cp $tmpdir/include/asm/$header $output/linux-headers/asm-$arch
  done
  if [ $arch = x86 ]; then
  cp $tmpdir/include/asm/hyperv.h $output/linux-headers/asm-x86
  fi
  +if [ $arch = powerpc ]; then
  +cp $linux/arch/$arch/include/uapi/asm/epapr_hcalls.h
 $output/linux-headers/uapi/asm-$arch/
  +fi
 
 Why is this pointing at $linux and not $tmpdir?

Because I found that $tmpdir does not have epapr_haclls.h, though I do not know 
why ? any idea why ?

 Also, no need for $arch - be
 explicit and make it powerpc.

Ok.

-Bharat

 
 
 Alex
 
  done
 
  rm -rf $output/linux-headers/linux
  --
  1.7.0.4
 
 
 





Re: [Qemu-devel] [PATCH] Added uapi directory into linux-header

2012-12-17 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, December 17, 2012 8:24 PM
 To: Bhushan Bharat-R65777
 Cc: qemu-devel qemu-devel; qemu-...@nongnu.org List; Jan Kiszka; Peter Maydell
 Subject: Re: [PATCH] Added uapi directory into linux-header
 
 
 On 17.12.2012, at 15:51, Bhushan Bharat-R65777 wrote:
 
 
 
  -Original Message-
  From: Alexander Graf [mailto:ag...@suse.de]
  Sent: Monday, December 17, 2012 7:45 PM
  To: Bhushan Bharat-R65777
  Cc: qemu-devel qemu-devel; qemu-...@nongnu.org List; Bhushan
  Bharat-R65777; Jan Kiszka; Peter Maydell
  Subject: Re: [PATCH] Added uapi directory into linux-header
 
 
  On 17.12.2012, at 05:08, Bharat Bhushan wrote:
 
  Linux ARCH specific header files are now in uapi/ directory also.
  These header files (epapr_hcalls.h on powerpc) are needed in qemu in
  linux-headers/uapi/asm/ directory as these are referenced by other
  header files in linux-headers/asm/.
 
  This patch is about changing the scripts for same.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 
  Please include people who were participating in the discussion
  previously in CC when you post a new version.
 
  ---
  configure   |1 +
  scripts/update-linux-headers.sh |   10 ++
  2 files changed, 11 insertions(+), 0 deletions(-)
 
  diff --git a/configure b/configure
  index 38b1cc6..51cce6b 100755
  --- a/configure
  +++ b/configure
  @@ -3724,6 +3724,7 @@ if test $linux = yes ; then
 # For non-KVM architectures we will not have asm headers
 if [ -e $source_path/linux-headers/asm-$linux_arch ]; then
   symlink $source_path/linux-headers/asm-$linux_arch
  linux-headers/asm
  +  symlink $source_path/linux-headers/uapi/asm-$linux_arch
  + linux-headers/uapi/asm
 
  This is still wrong. You want this guarded by another if [ -e ].
 
  This is not wrong because 
 
 
 fi
  fi
 
  diff --git a/scripts/update-linux-headers.sh
  b/scripts/update-linux-headers.sh index 4c7b566..d40f9c4 100755
  --- a/scripts/update-linux-headers.sh
  +++ b/scripts/update-linux-headers.sh
  @@ -46,14 +46,24 @@ for arch in $ARCHLIST; do
 
 make -C $linux INSTALL_HDR_PATH=$tmpdir SRCARCH=$arch
  headers_install
 
  +if ! [ -e $output/linux-headers/uapi ] ; then
  +mkdir $output/linux-headers/uapi
  +fi
 
  ... which would mean you don't need this bit.
 
  ... because here we ensure that the directory exists.
 
 But that means we create a uapi directory for archs that don't have one, 
 right?

Yes, I think that's the direction going forward (user specific header files in 
uapi/).

 
 
 
  +
 rm -rf $output/linux-headers/asm-$arch
 mkdir -p $output/linux-headers/asm-$arch
  +rm -rf $output/linux-headers/uapi/asm-$arch
  +mkdir -p $output/linux-headers/uapi/asm-$arch
 
  Only if it exists, no?
 
  Above it is ensured that the directory always exists ..
 
 
  +
 for header in kvm.h kvm_para.h; do
 cp $tmpdir/include/asm/$header $output/linux-headers/asm-$arch
 done
 if [ $arch = x86 ]; then
 cp $tmpdir/include/asm/hyperv.h $output/linux-headers/asm-x86
 fi
  +if [ $arch = powerpc ]; then
  +cp $linux/arch/$arch/include/uapi/asm/epapr_hcalls.h
  $output/linux-headers/uapi/asm-$arch/
  +fi
 
  Why is this pointing at $linux and not $tmpdir?
 
  Because I found that $tmpdir does not have epapr_haclls.h, though I do not
 know why ? any idea why ?
 
 Smells like a kernel bug :). In fact, I remember that we used to forget
 uapi/asm/epapr_hcalls.h in the header list a while ago.

I am not able to locate which command (set of commands) are installing the 
header files (all/selectively header files of arch/powerpc/include/asm/ and 
arch/powerp/include/uapi/asm/) in $tmpdir/

 Are you sure you're
 basing on the latest kvm code?

kvm-ppc-next branch of http://github.com/agraf/linux-2.6.git 

-Bharat

 
 
 Alex
 
 
  Also, no need for $arch - be
  explicit and make it powerpc.
 
  Ok.
 
  -Bharat
 
 
 
  Alex
 
  done
 
  rm -rf $output/linux-headers/linux
  --
  1.7.0.4
 
 
 
 
 
 





Re: [Qemu-devel] [PATCH 2/3] Reset qemu timers when guest reset

2012-12-17 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, December 17, 2012 7:53 PM
 To: Bhushan Bharat-R65777
 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 2/3] Reset qemu timers when guest reset
 
 
 On 17.12.2012, at 07:08, Bharat Bhushan wrote:
 
  This patch install the timer reset handler. This will be called when
  the guest is reset.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  hw/ppc_booke.c |   12 
  1 files changed, 12 insertions(+), 0 deletions(-)
 
  diff --git a/hw/ppc_booke.c b/hw/ppc_booke.c index d51e7fa..837a5b6
  100644
  --- a/hw/ppc_booke.c
  +++ b/hw/ppc_booke.c
  @@ -231,6 +231,16 @@ void store_booke_tcr(CPUPPCState *env,
  target_ulong val)
 
  }
 
  +static void ppc_booke_timer_reset_handle(void *opaque) {
  +CPUPPCState *env = opaque;
  +
 
 Doesn't this need a cpu_synchronize_state() call?

There are some more registered reset_handler which changes the spr's but does 
not call synchronize..

But is not the qemu_system_reset() ( which calls registered reset handler) 
synchronizes the cpu state ?

-Bharat

 
 Alex
 
  +env-spr[SPR_BOOKE_TSR] = 0;
  +env-spr[SPR_BOOKE_TCR] = 0;
  +
  +booke_update_irq(env);
  +}
  +
  void ppc_booke_timers_init(CPUPPCState *env, uint32_t freq, uint32_t
  flags) {
  ppc_tb_t *tb_env;
  @@ -251,4 +261,6 @@ void ppc_booke_timers_init(CPUPPCState *env, uint32_t
 freq, uint32_t flags)
  qemu_new_timer_ns(vm_clock, booke_fit_cb, env);
  booke_timer-wdt_timer =
  qemu_new_timer_ns(vm_clock, booke_wdt_cb, env);
  +
  +qemu_register_reset(ppc_booke_timer_reset_handle, env);
  }
  --
  1.7.0.4
 
 
 





Re: [Qemu-devel] [PATCH] Added uapi directory into linux-header

2012-12-17 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, December 17, 2012 8:39 PM
 To: Bhushan Bharat-R65777
 Cc: qemu-devel qemu-devel; qemu-...@nongnu.org List; Jan Kiszka; Peter Maydell
 Subject: Re: [PATCH] Added uapi directory into linux-header
 
 
 On 17.12.2012, at 16:02, Bhushan Bharat-R65777 wrote:
 
 
 
  -Original Message-
  From: Alexander Graf [mailto:ag...@suse.de]
  Sent: Monday, December 17, 2012 8:24 PM
  To: Bhushan Bharat-R65777
  Cc: qemu-devel qemu-devel; qemu-...@nongnu.org List; Jan Kiszka;
  Peter Maydell
  Subject: Re: [PATCH] Added uapi directory into linux-header
 
 
  On 17.12.2012, at 15:51, Bhushan Bharat-R65777 wrote:
 
 
 
  -Original Message-
  From: Alexander Graf [mailto:ag...@suse.de]
  Sent: Monday, December 17, 2012 7:45 PM
  To: Bhushan Bharat-R65777
  Cc: qemu-devel qemu-devel; qemu-...@nongnu.org List; Bhushan
  Bharat-R65777; Jan Kiszka; Peter Maydell
  Subject: Re: [PATCH] Added uapi directory into linux-header
 
 
  On 17.12.2012, at 05:08, Bharat Bhushan wrote:
 
  Linux ARCH specific header files are now in uapi/ directory also.
  These header files (epapr_hcalls.h on powerpc) are needed in qemu
  in linux-headers/uapi/asm/ directory as these are referenced by
  other header files in linux-headers/asm/.
 
  This patch is about changing the scripts for same.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 
  Please include people who were participating in the discussion
  previously in CC when you post a new version.
 
  ---
  configure   |1 +
  scripts/update-linux-headers.sh |   10 ++
  2 files changed, 11 insertions(+), 0 deletions(-)
 
  diff --git a/configure b/configure index 38b1cc6..51cce6b 100755
  --- a/configure
  +++ b/configure
  @@ -3724,6 +3724,7 @@ if test $linux = yes ; then
# For non-KVM architectures we will not have asm headers
if [ -e $source_path/linux-headers/asm-$linux_arch ]; then
  symlink $source_path/linux-headers/asm-$linux_arch
  linux-headers/asm
  +  symlink $source_path/linux-headers/uapi/asm-$linux_arch
  + linux-headers/uapi/asm
 
  This is still wrong. You want this guarded by another if [ -e ].
 
  This is not wrong because 
 
 
fi
  fi
 
  diff --git a/scripts/update-linux-headers.sh
  b/scripts/update-linux-headers.sh index 4c7b566..d40f9c4 100755
  --- a/scripts/update-linux-headers.sh
  +++ b/scripts/update-linux-headers.sh
  @@ -46,14 +46,24 @@ for arch in $ARCHLIST; do
 
make -C $linux INSTALL_HDR_PATH=$tmpdir SRCARCH=$arch
  headers_install
 
  +if ! [ -e $output/linux-headers/uapi ] ; then
  +mkdir $output/linux-headers/uapi
  +fi
 
  ... which would mean you don't need this bit.
 
  ... because here we ensure that the directory exists.
 
  But that means we create a uapi directory for archs that don't have one,
 right?
 
  Yes, I think that's the direction going forward (user specific header files 
  in
 uapi/).
 
 Sure, but we shouldn't be faster than Linux itself here ;).

I am fine with what makes you happier :)

 
 
 
 
 
  +
rm -rf $output/linux-headers/asm-$arch
mkdir -p $output/linux-headers/asm-$arch
  +rm -rf $output/linux-headers/uapi/asm-$arch
  +mkdir -p $output/linux-headers/uapi/asm-$arch
 
  Only if it exists, no?
 
  Above it is ensured that the directory always exists ..
 
 
  +
for header in kvm.h kvm_para.h; do
cp $tmpdir/include/asm/$header $output/linux-headers/asm-$arch
done
if [ $arch = x86 ]; then
cp $tmpdir/include/asm/hyperv.h $output/linux-headers/asm-x86
fi
  +if [ $arch = powerpc ]; then
  +cp $linux/arch/$arch/include/uapi/asm/epapr_hcalls.h
  $output/linux-headers/uapi/asm-$arch/
  +fi
 
  Why is this pointing at $linux and not $tmpdir?
 
  Because I found that $tmpdir does not have epapr_haclls.h, though I
  do not
  know why ? any idea why ?
 
  Smells like a kernel bug :). In fact, I remember that we used to
  forget uapi/asm/epapr_hcalls.h in the header list a while ago.
 
  I am not able to locate which command (set of commands) are installing
  the header files (all/selectively header files of
  arch/powerpc/include/asm/ and arch/powerp/include/uapi/asm/) in
  $tmpdir/
 
 There is a make target headers_install to install user space headers into a
 directory. To be installed by this, the file needs to be listed in
 arch/powerpc/include/uapi/asm/Kbuild. Commit
 c99ec973a63e2249020d6d93a46d7572432da6a2 put it in there, so it should get
 installed...

Ahh, thanks . Actually all the files are getting installed in 
$tmpdir/include/asm/ while I was looking for the header file in 
$tmpdir/include/uapi/asm/

-Bharat

 
 
  Are you sure you're
  basing on the latest kvm code?
 
  kvm-ppc-next branch of http://github.com/agraf/linux-2.6.git
 
 Yeah, that one is good :)
 
 
 Alex
 





Re: [Qemu-devel] [PATCH v2] Added uapi directory into linux-header

2012-12-17 Thread Bhushan Bharat-R65777
  +++ b/scripts/update-linux-headers.sh
  @@ -46,14 +46,26 @@ for arch in $ARCHLIST; do
 
  make -C $linux INSTALL_HDR_PATH=$tmpdir SRCARCH=$arch
  headers_install
 
  +if [ -e $linux/arch/$arch/include/uapi ] 
  +! [ -e $output/linux-headers/uapi ] ; then
  +mkdir $output/linux-headers/uapi
 
 mkdir -p
 
 But looking through this whole thing, it seems like the root cause is actually
 different. We don't want any uapi directories exposed to user space. So let's 
 go
 back a step:
 
 Why do we need the uapi include dir? Because some header is using it.
 
 linux-headers/asm-powerpc/kvm_para.h:

The kvm_para.h (also kvm.h) are now defined in include/uapi/asm/

Is not this the correct thing that any header file in include/uapi/asm/ (in 
this case kvm_para.h) includes another header file (epapr_hcalls.h) in same 
directory?

Also I think now only the uapi/asm/*.h files should be exposed to userspace 
(QEMU here).

Waiting for David's comment for better clarity ... 

-Bharat

 
 #include uapi/asm/epapr_hcalls.h
 
 This is the root cause of the problem. We must never manually include any uapi
 header paths. We only ever include their normal asm-counterparts which then 
 may
 include uapi (in kernel) or actually are uapi (in user space). David, please
 correct me if I'm wrong.
 
 Could you please try and see if this kernel side patch makes things work for 
 you
 too?
 
 
 Alex
 
 
 diff --git a/arch/powerpc/include/uapi/asm/kvm_para.h
 b/arch/powerpc/include/uapi/asm/kvm_para.h
 index ed0e025..e3af328 100644
 --- a/arch/powerpc/include/uapi/asm/kvm_para.h
 +++ b/arch/powerpc/include/uapi/asm/kvm_para.h
 @@ -78,7 +78,7 @@ struct kvm_vcpu_arch_shared {
 
  #define KVM_HCALL_TOKEN(num) _EV_HCALL_TOKEN(EV_KVM_VENDOR_ID, num)
 
 -#include uapi/asm/epapr_hcalls.h
 +#include asm/epapr_hcalls.h
 
  #define KVM_FEATURE_MAGIC_PAGE   1
 






Re: [Qemu-devel] [PATCH v2] Added uapi directory into linux-header

2012-12-17 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Tuesday, December 18, 2012 6:51 AM
 To: Bhushan Bharat-R65777
 Cc: qemu-devel qemu-devel; Peter Maydell; Jan Kiszka; qemu-...@nongnu.org 
 List;
 Marcelo Tosatti; David Howells
 Subject: Re: [PATCH v2] Added uapi directory into linux-header
 
 
 On 18.12.2012, at 02:14, Bhushan Bharat-R65777 wrote:
 
  +++ b/scripts/update-linux-headers.sh
  @@ -46,14 +46,26 @@ for arch in $ARCHLIST; do
 
 make -C $linux INSTALL_HDR_PATH=$tmpdir SRCARCH=$arch
  headers_install
 
  +if [ -e $linux/arch/$arch/include/uapi ] 
  +! [ -e $output/linux-headers/uapi ] ; then
  +mkdir $output/linux-headers/uapi
 
  mkdir -p
 
  But looking through this whole thing, it seems like the root cause is
  actually different. We don't want any uapi directories exposed to
  user space. So let's go back a step:
 
  Why do we need the uapi include dir? Because some header is using it.
 
  linux-headers/asm-powerpc/kvm_para.h:
 
  The kvm_para.h (also kvm.h) are now defined in include/uapi/asm/
 
  Is not this the correct thing that any header file in include/uapi/asm/ (in
 this case kvm_para.h) includes another header file (epapr_hcalls.h) in same
 directory?
 
  Also I think now only the uapi/asm/*.h files should be exposed to userspace
 (QEMU here).
 
 make headers_install should basically remove all the uapi magic and give us
 normal backwards-compatible asm trees :).

I am perfectly fine, How we can do this now :)

-Bharat

 
 
 Alex
 
 
  Waiting for David's comment for better clarity ...
 
  -Bharat
 
 
 #include uapi/asm/epapr_hcalls.h
 
  This is the root cause of the problem. We must never manually include
  any uapi header paths. We only ever include their normal
  asm-counterparts which then may include uapi (in kernel) or actually
  are uapi (in user space). David, please correct me if I'm wrong.
 
  Could you please try and see if this kernel side patch makes things
  work for you too?
 
 
  Alex
 
 
  diff --git a/arch/powerpc/include/uapi/asm/kvm_para.h
  b/arch/powerpc/include/uapi/asm/kvm_para.h
  index ed0e025..e3af328 100644
  --- a/arch/powerpc/include/uapi/asm/kvm_para.h
  +++ b/arch/powerpc/include/uapi/asm/kvm_para.h
  @@ -78,7 +78,7 @@ struct kvm_vcpu_arch_shared {
 
  #define KVM_HCALL_TOKEN(num) _EV_HCALL_TOKEN(EV_KVM_VENDOR_ID, num)
 
  -#include uapi/asm/epapr_hcalls.h
  +#include asm/epapr_hcalls.h
 
  #define KVM_FEATURE_MAGIC_PAGE 1
 
 
 
 
 





Re: [Qemu-devel] [PATCH v2] Added uapi directory into linux-header

2012-12-17 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Tuesday, December 18, 2012 7:00 AM
 To: Bhushan Bharat-R65777
 Cc: qemu-devel qemu-devel; Peter Maydell; Jan Kiszka; qemu-...@nongnu.org 
 List;
 Marcelo Tosatti; David Howells
 Subject: Re: [PATCH v2] Added uapi directory into linux-header
 
 
 On 18.12.2012, at 02:27, Bhushan Bharat-R65777 wrote:
 
 
 
  -Original Message-
  From: Alexander Graf [mailto:ag...@suse.de]
  Sent: Tuesday, December 18, 2012 6:51 AM
  To: Bhushan Bharat-R65777
  Cc: qemu-devel qemu-devel; Peter Maydell; Jan Kiszka;
  qemu-...@nongnu.org List; Marcelo Tosatti; David Howells
  Subject: Re: [PATCH v2] Added uapi directory into linux-header
 
 
  On 18.12.2012, at 02:14, Bhushan Bharat-R65777 wrote:
 
  +++ b/scripts/update-linux-headers.sh
  @@ -46,14 +46,26 @@ for arch in $ARCHLIST; do
 
make -C $linux INSTALL_HDR_PATH=$tmpdir SRCARCH=$arch
  headers_install
 
  +if [ -e $linux/arch/$arch/include/uapi ] 
  +! [ -e $output/linux-headers/uapi ] ; then
  +mkdir $output/linux-headers/uapi
 
  mkdir -p
 
  But looking through this whole thing, it seems like the root cause
  is actually different. We don't want any uapi directories exposed
  to user space. So let's go back a step:
 
  Why do we need the uapi include dir? Because some header is using it.
 
  linux-headers/asm-powerpc/kvm_para.h:
 
  The kvm_para.h (also kvm.h) are now defined in include/uapi/asm/
 
  Is not this the correct thing that any header file in
  include/uapi/asm/ (in
  this case kvm_para.h) includes another header file (epapr_hcalls.h)
  in same directory?
 
  Also I think now only the uapi/asm/*.h files should be exposed to
  userspace
  (QEMU here).
 
  make headers_install should basically remove all the uapi magic and
  give us normal backwards-compatible asm trees :).
 
  I am perfectly fine, How we can do this now :)
 
 Well, for starters, do the headers work if you apply the patch I sent in a
 previous mail plus the epapr_hcall.h copy? If so, then that's the way to go :)

Are you really sure that applying a patch and then syncing (or other way round) 
 is the way you want to go ?

To me it does not look good, I think we can go with the script changes to make 
install_header is updated to do the work.

-Bharat

 
 
 Alex
 





Re: [Qemu-devel] [PATCH RFC] PowerPC: Added uapi directory into linux-header

2012-12-14 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Friday, December 14, 2012 5:06 PM
 To: Bhushan Bharat-R65777
 Cc: qemu-devel qemu-devel; qemu-...@nongnu.org List; Bhushan Bharat-R65777; 
 Jan
 Kiszka
 Subject: Re: [PATCH RFC] PowerPC: Added uapi directory into linux-header
 
 
 On 14.12.2012, at 12:04, Bharat Bhushan wrote:
 
  This is corrently done for powerpc.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 
 Jan, could you please check if this is correct?
 
  ---
  configure   |1 +
  scripts/update-linux-headers.sh |5 +
  2 files changed, 6 insertions(+), 0 deletions(-)
 
  diff --git a/configure b/configure
  index 780b19a..bdc2d5e 100755
  --- a/configure
  +++ b/configure
  @@ -3701,6 +3701,7 @@ if test $linux = yes ; then
  # For non-KVM architectures we will not have asm headers
  if [ -e $source_path/linux-headers/asm-$linux_arch ]; then
symlink $source_path/linux-headers/asm-$linux_arch
  linux-headers/asm
  +  symlink $source_path/linux-headers/uapi/asm-$linux_arch
  + linux-headers/uapi/asm
  fi
  fi
 
  diff --git a/scripts/update-linux-headers.sh
  b/scripts/update-linux-headers.sh index 4c7b566..9f6bf25 100755
  --- a/scripts/update-linux-headers.sh
  +++ b/scripts/update-linux-headers.sh
  @@ -48,6 +48,11 @@ for arch in $ARCHLIST; do
 
  rm -rf $output/linux-headers/asm-$arch
  mkdir -p $output/linux-headers/asm-$arch
  +if [ $arch = powerpc ]; then
 
 This looks bogus. There shouldn't be any powerpc specifics anywhere in this
 file.

This file have x86 specific also, why ?

-Bharat

 
 
 Alex
 
  +   rm -rf $output/linux-headers/uapi/asm-$arch/*
  +cp $linux/arch/$arch/include/uapi/asm/epapr_hcalls.h
 $output/linux-headers/uapi/asm-$arch/
  +fi
  +
  for header in kvm.h kvm_para.h; do
  cp $tmpdir/include/asm/$header $output/linux-headers/asm-$arch
  done
  --
  1.7.0.4
 
 
 





Re: [Qemu-devel] [PATCH 2/3] e500: Adding CCSR memory region

2012-10-09 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Avi Kivity [mailto:a...@redhat.com]
 Sent: Tuesday, October 09, 2012 2:35 PM
 To: Andreas Färber
 Cc: Bhushan Bharat-R65777; ag...@suse.de; qemu-...@nongnu.org; qemu-
 de...@nongnu.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 2/3] e500: Adding CCSR memory region
 
 On 10/08/2012 07:21 PM, Andreas Färber wrote:
  Am 08.10.2012 18:46, schrieb Bharat Bhushan:
  All devices are also placed under CCSR memory region.
  The CCSR memory region is exported to pci device. The MSI interrupt
  generation is the main reason to export the CCSR region to PCI device.
  This put the requirement to move mpic under CCSR region, but
  logically all devices should be under CCSR. So this patch places all
  emulated devices under ccsr region.
 
  +sysbus_connect_irq(s, 0, mpic[pci_irq_nrs[0]]);
  +sysbus_connect_irq(s, 1, mpic[pci_irq_nrs[1]]);
  +sysbus_connect_irq(s, 2, mpic[pci_irq_nrs[2]]);
  +sysbus_connect_irq(s, 3, mpic[pci_irq_nrs[3]]);
  +memory_region_add_subregion(ccsr, MPC8544_PCI_REGS_OFFSET,
  + s-mmio[0].memory);
 
  ... I wonder if fiddling with SysBus MMIO is a good idea.
  s-mmio[0].addr is not getting assigned this way, which is checked as
  condition for deleting the subregion. But sysbus_mmio_map() only adds
  to / deletes from get_system_memory().
  The alternative would be using a custom field rather than the
  SysBus-internal one. Avi/Alex?
 
 IMO yes.  Or not use sysbus at all.

What about adding a API:
void sysbus_mmio_map_to_mr(SysBusDevice *dev, int n, target_phys_addr_t addr,
   MemoryRegion *mr)
{
assert(n = 0  n  dev-num_mmio);

if (dev-mmio[n].addr == addr) {
/* ??? region already mapped here.  */
return;
}
if (dev-mmio[n].addr != (target_phys_addr_t)-1) {
/* Unregister previous mapping.  */
memory_region_del_subregion(mr, dev-mmio[n].memory);
}
dev-mmio[n].addr = addr;
memory_region_add_subregion(mr, addr, dev-mmio[n].memory);
}

Thanks
-Bharat

 
 
 --
 error compiling committee.c: too many arguments to function





Re: [Qemu-devel] [PATCH 2/3] e500: Adding CCSR memory region

2012-10-09 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Avi Kivity [mailto:a...@redhat.com]
 Sent: Tuesday, October 09, 2012 10:24 PM
 To: Bhushan Bharat-R65777
 Cc: Andreas Färber; ag...@suse.de; qemu-...@nongnu.org; qemu-devel@nongnu.org
 Subject: Re: [PATCH 2/3] e500: Adding CCSR memory region
 
 On 10/09/2012 06:45 PM, Bhushan Bharat-R65777 wrote:
 
  What about adding a API:
  void sysbus_mmio_map_to_mr(SysBusDevice *dev, int n, target_phys_addr_t 
  addr,
 MemoryRegion *mr) {
  assert(n = 0  n  dev-num_mmio);
 
  if (dev-mmio[n].addr == addr) {
  /* ??? region already mapped here.  */
  return;
  }
  if (dev-mmio[n].addr != (target_phys_addr_t)-1) {
  /* Unregister previous mapping.  */
  memory_region_del_subregion(mr, dev-mmio[n].memory);
  }
  dev-mmio[n].addr = addr;
  memory_region_add_subregion(mr, addr, dev-mmio[n].memory); }
 
 
 I think you can just use sysbus_mmio_get_region().  There are plenty of other
 users, so there's precedent.

You mean something like this : memory_region_add_subregion(mr, addr, 
sysbus_mmio_get_region(dev, 0);

Ok, but this will still not resolve the issue of not setting the 
dev-mmio[n].addr, no ?

Thanks
-Bharat

 
 
 --
 error compiling committee.c: too many arguments to function





Re: [Qemu-devel] [PATCH 2/3] e500: Adding CCSR memory region

2012-10-09 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Avi Kivity [mailto:a...@redhat.com]
 Sent: Tuesday, October 09, 2012 10:31 PM
 To: Bhushan Bharat-R65777
 Cc: Andreas Färber; ag...@suse.de; qemu-...@nongnu.org; qemu-devel@nongnu.org
 Subject: Re: [PATCH 2/3] e500: Adding CCSR memory region
 
 On 10/09/2012 06:57 PM, Bhushan Bharat-R65777 wrote:
 
 
  -Original Message-
  From: Avi Kivity [mailto:a...@redhat.com]
  Sent: Tuesday, October 09, 2012 10:24 PM
  To: Bhushan Bharat-R65777
  Cc: Andreas Färber; ag...@suse.de; qemu-...@nongnu.org;
  qemu-devel@nongnu.org
  Subject: Re: [PATCH 2/3] e500: Adding CCSR memory region
 
  On 10/09/2012 06:45 PM, Bhushan Bharat-R65777 wrote:
  
   What about adding a API:
   void sysbus_mmio_map_to_mr(SysBusDevice *dev, int n, target_phys_addr_t
 addr,
  MemoryRegion *mr) {
   assert(n = 0  n  dev-num_mmio);
  
   if (dev-mmio[n].addr == addr) {
   /* ??? region already mapped here.  */
   return;
   }
   if (dev-mmio[n].addr != (target_phys_addr_t)-1) {
   /* Unregister previous mapping.  */
   memory_region_del_subregion(mr, dev-mmio[n].memory);
   }
   dev-mmio[n].addr = addr;
   memory_region_add_subregion(mr, addr, dev-mmio[n].memory); }
  
 
  I think you can just use sysbus_mmio_get_region().  There are plenty
  of other users, so there's precedent.
 
  You mean something like this : memory_region_add_subregion(mr, addr,
  sysbus_mmio_get_region(dev, 0);
 
  Ok, but this will still not resolve the issue of not setting the dev-
 mmio[n].addr, no ?
 
 Correct.  But there are 20 uses already, so it can't matter much.  If someone
 wants to fix them, they can write a new API and do a sweep.
 
 But really, sysbus just needs to go away.  It's pointless to give it more and
 more features.

Ok

Thanks
-Bharat

 
 
 --
 error compiling committee.c: too many arguments to function





Re: [Qemu-devel] [PATCH 2/2 v2] Adding BAR0 for e500 PCI controller

2012-10-09 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Wood Scott-B07421
 Sent: Wednesday, October 10, 2012 4:08 AM
 To: Bhushan Bharat-R65777
 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de; 
 afaer...@suse.de;
 Bhushan Bharat-R65777
 Subject: Re: [Qemu-devel] [PATCH 2/2 v2] Adding BAR0 for e500 PCI controller
 
 On 10/09/2012 01:19:10 PM, Bharat Bhushan wrote:
  +static int e500_ccsr_initfn(SysBusDevice *dev) {
  +PPCE500CCSRState *pci_ccsr;
  +
  +pci_ccsr = CCSR(dev);
  +memory_region_init(pci_ccsr-ccsr_space, e500-ccsr,
  +   MPC8544_CCSRBAR_SIZE);
  +return 0;
  +}
 
 Is this object supposed to represent CCSR (which is what the type name seems 
 to
 imply, along with the existence of a different
 PPCE500PCIBridgeState) or PCI BAR0 (which is what pci_ccsr seems to imply, 
 along
 with the fact that it's being added in the PCI patch)?

It is ccsr, I will correct this naming.

Thanks
-Bharat





Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller

2012-10-08 Thread Bhushan Bharat-R65777
  Which device's initialization function you are talking about?
 
  static const TypeInfo e500_host_bridge_info = {
.name  = e500-host-bridge,
.parent= TYPE_PCI_DEVICE,
.instance_size = sizeof(PCIDevice),
.class_init= e500_host_bridge_class_init,
  };
 
  This needs to describe a derived class of PCIDevice, hold the BAR
  as a data member, and register the BAR in the init function (which
  doesn't exist yet because you haven't subclassed it).  This way the
  BAR lives in the device which exposes it, like BARs everywhere.
 
  Do you mean that we add the MemoryRegion bar0 in PCIDevice struct.
  Do the
  same thing that I was doing in e500_pcihost_initfn() in the k-init()
  (will add
  this) function of e500-host-bridge
 
  No, he means that you create a new struct like this:
 
   struct foo {
 PCIDevice p;
 MemoryRegion bar0;
   };
 
  Please check out any other random PCI device in QEMU. Almost all of
  them do this to store more information than their parent class can hold.
 
  Just want to be sure I understood you correctly: Do you mean something
  like this : ( I know I have to switch to QOM mechanism to share
  parameters)
 
  diff --git a/hw/ppce500_pci.c b/hw/ppce500_pci.c index
  92b1dc0..a948bc6 100644
  --- a/hw/ppce500_pci.c
  +++ b/hw/ppce500_pci.c
  @@ -89,6 +89,12 @@ struct PPCE500PCIState {
  MemoryRegion iomem;
  };
 
  +struct BHARAT {
  +PCIDevice p;
  +void *bar0;
 
 MemoryRegion *bar0

If I uses  MemoryRegion *bar0  or  MemoryRegion bar0 then how the size and 
address of bar0 will be populated?

As far as I can see, other PCI devices using this way to have extra information 
but they does not get MemoryRegion information from other object.

 
  +};
  +
  +typedef struct BHARAT bharat;
  typedef struct PPCE500PCIState PPCE500PCIState;
 
  static uint64_t pci_reg_read4(void *opaque, target_phys_addr_t addr,
  @@ -307,6 +313,16 @@ static const VMStateDescription
  vmstate_ppce500_pci = {
 
  #include exec-memory.h
 
  +static int e500_pcihost_bridge_initfn(PCIDevice *d) {
  +bharat *b = DO_UPCAST(bharat, p, d);
  +
  +printf(Addr = %llx, size = %llx\n, ((MemoryRegion *)b-bar0)-addr,
 (unsigned long long)int128_get64(((Me
  +pci_register_bar(d, 0, PCI_BASE_ADDRESS_SPACE_MEMORY,
  + (MemoryRegion *)b-bar0);
 
 That one still has to call its parent initfn, no?

I am really sorry, but I did not get. The object says its parent is 
TYPE_PCI_DEVICE, so which function are you talking about?

 
  +return 0;
  +}
  +
  +MemoryRegion ccsr;
  static int e500_pcihost_initfn(SysBusDevice *dev) {
  PCIHostState *h;
  @@ -315,6 +331,7 @@ static int e500_pcihost_initfn(SysBusDevice *dev)
  int i;
  MemoryRegion *address_space_mem = get_system_memory();
  MemoryRegion *address_space_io = get_system_io();
  +PCIDevice *d;
 
  h = PCI_HOST_BRIDGE(dev);
  s = PPC_E500_PCI_HOST_BRIDGE(dev); @@ -328,7 +345,12 @@ static int
  e500_pcihost_initfn(SysBusDevice *dev)
   address_space_io, PCI_DEVFN(0x11, 0), 4);
  h-bus = b;
 
  -pci_create_simple(b, 0, e500-host-bridge);
  +d = pci_create(b, 0, e500-host-bridge);
  +/* ccsr- should be passed from hw/ppc/e500.c */
  +ccsr.addr = 0xE000;
  +ccsr.size = int128_make64(0x0010);
 
 What is this?

I am trying to pass the CCSR information to e500-host-bridge. This is 
currently hardcoded, but finally this will come from hw/ppc/e500.c.

Thanks
-Bharat

 
 
 Alex
 
  +qdev_prop_set_ptr(d-qdev, bar0_region, ccsr);
  +qdev_init_nofail(d-qdev);
 
  memory_region_init(s-container, pci-container, PCIE500_ALL_SIZE);
  memory_region_init_io(h-conf_mem, pci_host_conf_be_ops, h, @@
  -345,11 +367,18 @@ static int e500_pcihost_initfn(SysBusDevice *dev)
  return 0;
  }
 
  +static Property pci_host_dev_info[] = {
  +DEFINE_PROP_PTR(bar0_region, bharat, bar0),
  +DEFINE_PROP_END_OF_LIST(),
  +};
  +
  static void e500_host_bridge_class_init(ObjectClass *klass, void
  *data) {
  DeviceClass *dc = DEVICE_CLASS(klass);
  PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
  +k-init = e500_pcihost_bridge_initfn;
  +dc-props = pci_host_dev_info;
  k-vendor_id = PCI_VENDOR_ID_FREESCALE;
  k-device_id = PCI_DEVICE_ID_MPC8533E;
  k-class_id = PCI_CLASS_PROCESSOR_POWERPC; @@ -359,10 +388,11 @@
  static void e500_host_bridge_class_init(ObjectClass *klass, void
  *data) static const TypeInfo e500_host_bridge_info = {
  .name  = e500-host-bridge,
  .parent= TYPE_PCI_DEVICE,
  -.instance_size = sizeof(PCIDevice),
  +.instance_size = sizeof(bharat),
  .class_init= e500_host_bridge_class_init,
  };
 
  static void e500_pcihost_class_init(ObjectClass *klass, void *data) {
  DeviceClass *dc = DEVICE_CLASS(klass);
 
  Thanks
  -Bharat
 
 
 
  Alex
 
 
  This way I should be able to met both of my requirements.
 
  Thanks
  -Bharat
 
 
  --
  error compiling 

Re: [Qemu-devel] [PATCH 3/3] Adding BAR0 for e500 PCI controller

2012-10-08 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Wood Scott-B07421
 Sent: Tuesday, October 09, 2012 1:54 AM
 To: Alexander Graf
 Cc: Andreas Färber; Bhushan Bharat-R65777; Avi Kivity; qemu-...@nongnu.org;
 qemu-devel@nongnu.org; Bhushan Bharat-R65777
 Subject: Re: [Qemu-devel] [PATCH 3/3] Adding BAR0 for e500 PCI controller
 
 On 10/08/2012 02:04:43 PM, Alexander Graf wrote:
 
  On 08.10.2012, at 20:00, Andreas Färber wrote:
 
   Am 08.10.2012 18:46, schrieb Bharat Bhushan:
   #define BINARY_DEVICE_TREE_FILEmpc8544ds.dtb
   #define UIMAGE_LOAD_BASE   0
   -#define DTC_LOAD_PAD   0x180
   +#define DTC_LOAD_PAD   0x50
   #define DTC_PAD_MASK   0xF
   #define INITRD_LOAD_PAD0x200
   #define INITRD_PAD_MASK0xFF
  
   Was this change intentional? I don't see it being used here, and
  commit
   message doesn't seem to mention it.
 
  I'd assume he tried to work around a bug I fixed in between. But this
  change definitely is not intentional.
 
 It looks like an accidental revert of
 http://patchwork.ozlabs.org/patch/179475/

Oops, it was not intended to be sent as patch. It was not accidental because 
without this revert I was not able to boot my guest.

Thanks
-Bharat





Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller

2012-10-05 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Thursday, October 04, 2012 9:37 PM
 To: Bhushan Bharat-R65777
 Cc: Avi Kivity; qemu-devel@nongnu.org; qemu-...@nongnu.org
 Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller
 
 
 On 04.10.2012, at 18:03, Bhushan Bharat-R65777 wrote:
 
 
 
  -Original Message-
  From: Avi Kivity [mailto:a...@redhat.com]
  Sent: Thursday, October 04, 2012 8:28 PM
  To: Bhushan Bharat-R65777
  Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de
  Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI
  controller
 
  On 10/04/2012 03:46 PM, Bhushan Bharat-R65777 wrote:
 
 
  -Original Message-
  From: Avi Kivity [mailto:a...@redhat.com]
  Sent: Thursday, October 04, 2012 6:02 PM
  To: Bhushan Bharat-R65777
  Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de;
  Bhushan Bharat-
  R65777
  Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI
  controller
 
  On 10/03/2012 01:50 PM, Bharat Bhushan wrote:
  sysbus_connect_irq(s, 0, mpic[pci_irq_nrs[0]]); diff --git
  a/hw/ppce500_pci.c b/hw/ppce500_pci.c index 92b1dc0..16e4af2
  100644
  --- a/hw/ppce500_pci.c
  +++ b/hw/ppce500_pci.c
  @@ -87,6 +87,7 @@ struct PPCE500PCIState {
  /* mmio maps */
  MemoryRegion container;
  MemoryRegion iomem;
  +void *bar0;
  };
 
  void *?  Why?
 
  Passing parameter via qdev is either possible by value or by void *.
  That's
  why I used void *.
 
  Then you shouldn't be using qdev for this.  But if you make the
  changes below, it will likely not be necessary.
 
 
  However this is best done from the pci device's initialization
  function, not here (the MemoryRegion should be a member of the pci
  device as
  well).
 
  We want to set BAR0 of PCI controller so we did this here. But yes,
  we also
  want that all devices under the PCI controller have this mapping, so
  when they access the MPIC register to send MSI then they can do that.
 
  Please elaborate.  One way to describe what you want is to issue an 'info
 mtree'
  and show what changes you want.
 
  Setup is something like this:
 
   |-|
   | PCI device  |
   | |
   ---
 |
 |
 |
 |
  |---|
  |   |
  | PCI controller|
  |   |
  |   --  |
  |   |  BAR0 in   |  |
  |   |   TYPE1|  |
  |   | Config HDR |  |
  |   --  |
  |   |
   ---
 
  BAR0: it is an inbound window, and on e500 devices this maps the pci bus
 address to CCSR address.
 
  My requirement are:
  1) when guest read pci controller BAR0, it is able to get proper size
  ( Size of CCSR space)
  2) Guest can program this to any address in pci address space. When PCI 
  device
 access this address range then that address should be mapped to CCSR address
 space.
 
  Though this patch only met the requirement number (1).
 
 No, it also meets (2). The PCI address space is identical to the CPU memory
 space in our mapping right now. So if the guest maps BAR0 somewhere, it
 automatically maps CCSR into CPU address space, which exposes it to PCI 
 address
 space.
 
 
 
 
  Which device's initialization function you are talking about?
 
  static const TypeInfo e500_host_bridge_info = {
 .name  = e500-host-bridge,
 .parent= TYPE_PCI_DEVICE,
 .instance_size = sizeof(PCIDevice),
 .class_init= e500_host_bridge_class_init,
  };
 
  This needs to describe a derived class of PCIDevice, hold the BAR as
  a data member, and register the BAR in the init function (which
  doesn't exist yet because you haven't subclassed it).  This way the
  BAR lives in the device which exposes it, like BARs everywhere.
 
  Do you mean that we add the MemoryRegion bar0 in PCIDevice struct. Do the
 same thing that I was doing in e500_pcihost_initfn() in the k-init() (will 
 add
 this) function of e500-host-bridge
 
 No, he means that you create a new struct like this:
 
   struct foo {
 PCIDevice p;
 MemoryRegion bar0;
   };
 
 Please check out any other random PCI device in QEMU. Almost all of them do 
 this
 to store more information than their parent class can hold.

Just want to be sure I understood you correctly: Do you mean something like 
this : ( I know I have to switch to QOM mechanism to share parameters)

diff --git a/hw/ppce500_pci.c b/hw/ppce500_pci.c
index 92b1dc0..a948bc6 100644
--- a/hw/ppce500_pci.c
+++ b/hw/ppce500_pci.c
@@ -89,6 +89,12 @@ struct PPCE500PCIState {
 MemoryRegion iomem;
 };
 
+struct BHARAT {
+PCIDevice p;
+void *bar0;
+};
+
+typedef struct BHARAT bharat;
 typedef struct PPCE500PCIState PPCE500PCIState;
 
 static uint64_t pci_reg_read4(void *opaque, target_phys_addr_t addr,
@@ -307,6 +313,16 @@ static const VMStateDescription vmstate_ppce500_pci = {
 
 #include exec-memory.h
 
+static int e500_pcihost_bridge_initfn

Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller

2012-10-04 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Avi Kivity [mailto:a...@redhat.com]
 Sent: Thursday, October 04, 2012 6:02 PM
 To: Bhushan Bharat-R65777
 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de; Bhushan Bharat-
 R65777
 Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller
 
 On 10/03/2012 01:50 PM, Bharat Bhushan wrote:
   sysbus_connect_irq(s, 0, mpic[pci_irq_nrs[0]]); diff --git
  a/hw/ppce500_pci.c b/hw/ppce500_pci.c index 92b1dc0..16e4af2 100644
  --- a/hw/ppce500_pci.c
  +++ b/hw/ppce500_pci.c
  @@ -87,6 +87,7 @@ struct PPCE500PCIState {
   /* mmio maps */
   MemoryRegion container;
   MemoryRegion iomem;
  +void *bar0;
   };
 
 void *?  Why?

Passing parameter via qdev is either possible by value or by void *. That's why 
I used void *.

 
 
   typedef struct PPCE500PCIState PPCE500PCIState; @@ -315,6 +316,8 @@
  static int e500_pcihost_initfn(SysBusDevice *dev)
   int i;
   MemoryRegion *address_space_mem = get_system_memory();
   MemoryRegion *address_space_io = get_system_io();
  +PCIDevice *pdev;
  +MemoryRegion bar0;
 
   h = PCI_HOST_BRIDGE(dev);
   s = PPC_E500_PCI_HOST_BRIDGE(dev); @@ -342,6 +345,10 @@ static
  int e500_pcihost_initfn(SysBusDevice *dev)
   memory_region_add_subregion(s-container, PCIE500_REG_BASE, 
  s-iomem);
   sysbus_init_mmio(dev, s-container);
 
  +bar0 = *(MemoryRegion *)s-bar0;
  +pdev = pci_find_device(b, 0, 0);
  +pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, bar0);
  +
 
 This is broken, you're registering an object on the stack which will be freed 
 as
 soon as the function returns.
 
 Just pass s-bar0 as Alex suggests.

Ok.

 
 However this is best done from the pci device's initialization function, not
 here (the MemoryRegion should be a member of the pci device as well).

We want to set BAR0 of PCI controller so we did this here. But yes, we also 
want that all devices under the PCI controller have this mapping, so when they 
access the MPIC register to send MSI then they can do that.

Which device's initialization function you are talking about?

Thanks
-Bharat

 
   return 0;
   }
 
 
 
 --
 error compiling committee.c: too many arguments to function





Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller

2012-10-04 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Avi Kivity [mailto:a...@redhat.com]
 Sent: Thursday, October 04, 2012 8:28 PM
 To: Bhushan Bharat-R65777
 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de
 Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller
 
 On 10/04/2012 03:46 PM, Bhushan Bharat-R65777 wrote:
 
 
  -Original Message-
  From: Avi Kivity [mailto:a...@redhat.com]
  Sent: Thursday, October 04, 2012 6:02 PM
  To: Bhushan Bharat-R65777
  Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de;
  Bhushan Bharat-
  R65777
  Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI
  controller
 
  On 10/03/2012 01:50 PM, Bharat Bhushan wrote:
sysbus_connect_irq(s, 0, mpic[pci_irq_nrs[0]]); diff --git
   a/hw/ppce500_pci.c b/hw/ppce500_pci.c index 92b1dc0..16e4af2 100644
   --- a/hw/ppce500_pci.c
   +++ b/hw/ppce500_pci.c
   @@ -87,6 +87,7 @@ struct PPCE500PCIState {
/* mmio maps */
MemoryRegion container;
MemoryRegion iomem;
   +void *bar0;
};
 
  void *?  Why?
 
  Passing parameter via qdev is either possible by value or by void *. That's
 why I used void *.
 
 Then you shouldn't be using qdev for this.  But if you make the changes below,
 it will likely not be necessary.
 
 
  However this is best done from the pci device's initialization
  function, not here (the MemoryRegion should be a member of the pci device 
  as
 well).
 
  We want to set BAR0 of PCI controller so we did this here. But yes, we also
 want that all devices under the PCI controller have this mapping, so when they
 access the MPIC register to send MSI then they can do that.
 
 Please elaborate.  One way to describe what you want is to issue an 'info 
 mtree'
 and show what changes you want.

Setup is something like this:

  |-|
  | PCI device  |
  | |
  ---
|
|
|
|
 |---|
 |   |
 | PCI controller|
 |   |
 |   --  |
 |   |  BAR0 in   |  |
 |   |   TYPE1|  |
 |   | Config HDR |  |
 |   --  |
 |   |
  ---

BAR0: it is an inbound window, and on e500 devices this maps the pci bus 
address to CCSR address.

My requirement are:
 1) when guest read pci controller BAR0, it is able to get proper size ( Size 
of CCSR space)
 2) Guest can program this to any address in pci address space. When PCI device 
access this address range then that address should be mapped to CCSR address 
space.

Though this patch only met the requirement number (1). 

 
 
  Which device's initialization function you are talking about?
 
 static const TypeInfo e500_host_bridge_info = {
 .name  = e500-host-bridge,
 .parent= TYPE_PCI_DEVICE,
 .instance_size = sizeof(PCIDevice),
 .class_init= e500_host_bridge_class_init,
 };
 
 This needs to describe a derived class of PCIDevice, hold the BAR as a data
 member, and register the BAR in the init function (which doesn't exist yet
 because you haven't subclassed it).  This way the BAR lives in the device 
 which
 exposes it, like BARs everywhere.

Do you mean that we add the MemoryRegion bar0 in PCIDevice struct. Do the 
same thing that I was doing in e500_pcihost_initfn() in the k-init() (will add 
this) function of e500-host-bridge

This way I should be able to met both of my requirements.

Thanks
-Bharat

 
 --
 error compiling committee.c: too many arguments to function





Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller

2012-10-04 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Thursday, October 04, 2012 9:37 PM
 To: Bhushan Bharat-R65777
 Cc: Avi Kivity; qemu-devel@nongnu.org; qemu-...@nongnu.org
 Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller
 
 
 On 04.10.2012, at 18:03, Bhushan Bharat-R65777 wrote:
 
 
 
  -Original Message-
  From: Avi Kivity [mailto:a...@redhat.com]
  Sent: Thursday, October 04, 2012 8:28 PM
  To: Bhushan Bharat-R65777
  Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de
  Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI
  controller
 
  On 10/04/2012 03:46 PM, Bhushan Bharat-R65777 wrote:
 
 
  -Original Message-
  From: Avi Kivity [mailto:a...@redhat.com]
  Sent: Thursday, October 04, 2012 6:02 PM
  To: Bhushan Bharat-R65777
  Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de;
  Bhushan Bharat-
  R65777
  Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI
  controller
 
  On 10/03/2012 01:50 PM, Bharat Bhushan wrote:
  sysbus_connect_irq(s, 0, mpic[pci_irq_nrs[0]]); diff --git
  a/hw/ppce500_pci.c b/hw/ppce500_pci.c index 92b1dc0..16e4af2
  100644
  --- a/hw/ppce500_pci.c
  +++ b/hw/ppce500_pci.c
  @@ -87,6 +87,7 @@ struct PPCE500PCIState {
  /* mmio maps */
  MemoryRegion container;
  MemoryRegion iomem;
  +void *bar0;
  };
 
  void *?  Why?
 
  Passing parameter via qdev is either possible by value or by void *.
  That's
  why I used void *.
 
  Then you shouldn't be using qdev for this.  But if you make the
  changes below, it will likely not be necessary.
 
 
  However this is best done from the pci device's initialization
  function, not here (the MemoryRegion should be a member of the pci
  device as
  well).
 
  We want to set BAR0 of PCI controller so we did this here. But yes,
  we also
  want that all devices under the PCI controller have this mapping, so
  when they access the MPIC register to send MSI then they can do that.
 
  Please elaborate.  One way to describe what you want is to issue an 'info
 mtree'
  and show what changes you want.
 
  Setup is something like this:
 
   |-|
   | PCI device  |
   | |
   ---
 |
 |
 |
 |
  |---|
  |   |
  | PCI controller|
  |   |
  |   --  |
  |   |  BAR0 in   |  |
  |   |   TYPE1|  |
  |   | Config HDR |  |
  |   --  |
  |   |
   ---
 
  BAR0: it is an inbound window, and on e500 devices this maps the pci bus
 address to CCSR address.
 
  My requirement are:
  1) when guest read pci controller BAR0, it is able to get proper size
  ( Size of CCSR space)
  2) Guest can program this to any address in pci address space. When PCI 
  device
 access this address range then that address should be mapped to CCSR address
 space.
 
  Though this patch only met the requirement number (1).
 
 No, it also meets (2). The PCI address space is identical to the CPU memory
 space in our mapping right now. So if the guest maps BAR0 somewhere, it
 automatically maps CCSR into CPU address space, which exposes it to PCI 
 address
 space.

Really? I think on powerpc the pci address space is defined as: it maps the 
outbound window just below 0x1__, then CCSR and then inbound window. So 
inbound window is 1:1 map if guest physical starts from 0x0. But I do not think 
CCSR is 1:1 map in pci address space and cpu address space. 

 
 
 
 
  Which device's initialization function you are talking about?
 
  static const TypeInfo e500_host_bridge_info = {
 .name  = e500-host-bridge,
 .parent= TYPE_PCI_DEVICE,
 .instance_size = sizeof(PCIDevice),
 .class_init= e500_host_bridge_class_init,
  };
 
  This needs to describe a derived class of PCIDevice, hold the BAR as
  a data member, and register the BAR in the init function (which
  doesn't exist yet because you haven't subclassed it).  This way the
  BAR lives in the device which exposes it, like BARs everywhere.
 
  Do you mean that we add the MemoryRegion bar0 in PCIDevice struct. Do the
 same thing that I was doing in e500_pcihost_initfn() in the k-init() (will 
 add
 this) function of e500-host-bridge
 
 No, he means that you create a new struct like this:
 
   struct foo {
 PCIDevice p;
 MemoryRegion bar0;
   };
 
 Please check out any other random PCI device in QEMU. Almost all of them do 
 this
 to store more information than their parent class can hold.

Ok,

Thanks
-Bharat

 
 
 Alex
 
 
  This way I should be able to met both of my requirements.
 
  Thanks
  -Bharat
 
 
  --
  error compiling committee.c: too many arguments to function
 
 
 





Re: [Qemu-devel] [PATCH 1/2] e500: Adding CCSR memory region

2012-10-03 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Wednesday, October 03, 2012 5:39 PM
 To: Bhushan Bharat-R65777
 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 1/2] e500: Adding CCSR memory region
 
 
 On 03.10.2012, at 13:49, Bharat Bhushan wrote:
 
  All devices are also placed under CCSR memory region.
  The CCSR memory region is exported to pci device. The MSI interrupt
  generation is the main reason to export the CCSR region to PCI device.
  This put the requirement to move mpic under CCSR region, but logically
  all devices should be under CCSR. So this patch places all emulated
  devices under ccsr region.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  hw/ppc/e500.c |   51 +--
  1 files changed, 37 insertions(+), 14 deletions(-)
 
  diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c index 5bab340..197411d
  100644
  --- a/hw/ppc/e500.c
  +++ b/hw/ppc/e500.c
  @@ -46,14 +46,23 @@
  /* TODO: parameterize */
  #define MPC8544_CCSRBAR_BASE   0xE000ULL
  #define MPC8544_CCSRBAR_SIZE   0x0010ULL
  -#define MPC8544_MPIC_REGS_BASE (MPC8544_CCSRBAR_BASE + 0x4ULL)
  -#define MPC8544_SERIAL0_REGS_BASE  (MPC8544_CCSRBAR_BASE + 0x4500ULL)
  -#define MPC8544_SERIAL1_REGS_BASE  (MPC8544_CCSRBAR_BASE + 0x4600ULL)
  -#define MPC8544_PCI_REGS_BASE  (MPC8544_CCSRBAR_BASE + 0x8000ULL)
  +#define MPC8544_MPIC_REGS_OFFSET   0x4ULL
  +#define MPC8544_MPIC_REGS_BASE (MPC8544_CCSRBAR_BASE + \
  +MPC8544_MPIC_REGS_OFFSET) #define
  +MPC8544_SERIAL0_REGS_OFFSET 0x4500ULL #define
  +MPC8544_SERIAL0_REGS_BASE  (MPC8544_CCSRBAR_BASE + \
  +MPC8544_SERIAL0_REGS_OFFSET)
  +#define MPC8544_SERIAL1_REGS_OFFSET 0x4600ULL #define
  +MPC8544_SERIAL1_REGS_BASE  (MPC8544_CCSRBAR_BASE + \
  +MPC8544_SERIAL1_REGS_OFFSET)
  +#define MPC8544_PCI_REGS_OFFSET0x8000ULL
  +#define MPC8544_PCI_REGS_BASE  (MPC8544_CCSRBAR_BASE + \
  +MPC8544_PCI_REGS_OFFSET)
 
 You don't use any of the bases anymore, right? Please remove the respective
 #define's.

Alex, some of these bases are used in device tree creation code.

Thanks
-Bharat
 
 The rest of the patch looks very nice.
 
 
 Alex
 
  #define MPC8544_PCI_REGS_SIZE  0x1000ULL
  #define MPC8544_PCI_IO 0xE100ULL
  #define MPC8544_PCI_IOLEN  0x1ULL
  -#define MPC8544_UTIL_BASE  (MPC8544_CCSRBAR_BASE + 0xeULL)
  +#define MPC8544_UTIL_OFFSET0xeULL
  +#define MPC8544_UTIL_BASE  (MPC8544_CCSRBAR_BASE +
 MPC8544_UTIL_OFFSET)
  #define MPC8544_SPIN_BASE  0xEF00ULL
 
  struct boot_info
  @@ -419,6 +428,8 @@ void ppce500_init(PPCE500Params *params)
  qemu_irq **irqs, *mpic;
  DeviceState *dev;
  CPUPPCState *firstenv = NULL;
  +MemoryRegion *ccsr;
  +SysBusDevice *s;
 
  /* Setup CPUs */
  if (params-cpu_model == NULL) {
  @@ -474,8 +485,12 @@ void ppce500_init(PPCE500Params *params)
  vmstate_register_ram_global(ram);
  memory_region_add_subregion(address_space_mem, 0, ram);
 
  +ccsr = g_malloc0(sizeof(MemoryRegion));
  +memory_region_init(ccsr, e500-cssr, MPC8544_CCSRBAR_SIZE);
  +memory_region_add_subregion(address_space_mem,
  + MPC8544_CCSRBAR_BASE, ccsr);
  +
  /* MPIC */
  -mpic = mpic_init(address_space_mem, MPC8544_MPIC_REGS_BASE,
  +mpic = mpic_init(ccsr, MPC8544_MPIC_REGS_OFFSET,
   smp_cpus, irqs, NULL);
 
  if (!mpic) {
  @@ -484,25 +499,33 @@ void ppce500_init(PPCE500Params *params)
 
  /* Serial */
  if (serial_hds[0]) {
  -serial_mm_init(address_space_mem, MPC8544_SERIAL0_REGS_BASE,
  +serial_mm_init(ccsr, MPC8544_SERIAL0_REGS_OFFSET,
 0, mpic[12+26], 399193,
 serial_hds[0], DEVICE_BIG_ENDIAN);
  }
 
  if (serial_hds[1]) {
  -serial_mm_init(address_space_mem, MPC8544_SERIAL1_REGS_BASE,
  +serial_mm_init(ccsr, MPC8544_SERIAL1_REGS_OFFSET,
 0, mpic[12+26], 399193,
  -   serial_hds[0], DEVICE_BIG_ENDIAN);
  +   serial_hds[1], DEVICE_BIG_ENDIAN);
  }
 
  /* General Utility device */
  -sysbus_create_simple(mpc8544-guts, MPC8544_UTIL_BASE, NULL);
  +dev = qdev_create(NULL, mpc8544-guts);
  +qdev_init_nofail(dev);
  +s = sysbus_from_qdev(dev);
  +memory_region_add_subregion(ccsr, MPC8544_UTIL_OFFSET,
  + s-mmio[0].memory);
 
  /* PCI */
  -dev = sysbus_create_varargs(e500-pcihost, MPC8544_PCI_REGS_BASE,
  -mpic[pci_irq_nrs[0]], mpic[pci_irq_nrs[1]],
  -mpic[pci_irq_nrs[2]], mpic[pci_irq_nrs[3]],
  -NULL);
  +dev = qdev_create

Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller

2012-10-03 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Wednesday, October 03, 2012 5:41 PM
 To: Bhushan Bharat-R65777
 Cc: qemu-devel qemu-devel; qemu-...@nongnu.org List; Bhushan Bharat-R65777; 
 Avi
 Kivity
 Subject: Re: [PATCH 2/2] Adding BAR0 for e500 PCI controller
 
 
 On 03.10.2012, at 13:50, Bharat Bhushan wrote:
 
  PCI Root complex have TYPE-1 configuration header while PCI endpoint
  have type-0 configuration header. The type-1 configuration header have
  a BAR (BAR0). In Freescale PCI controller BAR0 is used for mapping pci
  address space to CCSR address space. This can used for 2 purposes: 1)
  for MSI interrupt generation 2) Allow CCSR registers access when
  configured as PCI endpoint, which I am not sure is a use case with QEMU-KVM
 guest.
 
  What I observed is that when guest read the size of BAR0 of host
  controller configuration header (TYPE1 header) then it always reads it
  as 0. When looking into the QEMU hw/ppce500_pci.c, I do not find the
  PCI controller device registering BAR0. I do not find any other
  controller also doing so may they do not use BAR0.
 
  There are two issues when BAR0 is not there (which I can think of):
  1) There should be BAR0 emulated for PCI Root comaplex (TYPE1 header)
  and when reading the size of BAR0, it should give size as per real h/w.
 
  This patch solves this problem.
 
  2) Do we need this BAR0 inbound address translation?
 When BAR0 is of non-zero size then it will be configured for
  PCI address space to local address(CCSR) space translation on inbound 
  access.
  The primary use case is for MSI interrupt generation. The device is
  configured with a address offsets in PCI address space, which will be
  translated to MSI interrupt generation MPIC registers. Currently I do
  not understand the MSI interrupt generation mechanism in QEMU and also
  IIRC we do not use QEMU MSI interrupt mechanism on e500 guest machines.
  But this BAR0 will be used when using MSI on e500.
 
  I can see one more issue, There are ATMUs emulated in
  hw/ppce500_pci.c, but i do not see these being used for address translation.
  So far that works because pci address space and local address space
  are 1:1 mapped. BAR0 inbound translation + ATMU translation will
  complete the address translation of inbound traffic.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  hw/ppc/e500.c|1 +
  hw/ppce500_pci.c |   13 +
  2 files changed, 14 insertions(+), 0 deletions(-)
 
  diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c index 197411d..c7ae2b6
  100644
  --- a/hw/ppc/e500.c
  +++ b/hw/ppc/e500.c
  @@ -518,6 +518,7 @@ void ppce500_init(PPCE500Params *params)
 
  /* PCI */
  dev = qdev_create(NULL, e500-pcihost);
  +qdev_prop_set_ptr(dev, bar0_region, ccsr);
  qdev_init_nofail(dev);
  s = sysbus_from_qdev(dev);
  sysbus_connect_irq(s, 0, mpic[pci_irq_nrs[0]]); diff --git
  a/hw/ppce500_pci.c b/hw/ppce500_pci.c index 92b1dc0..16e4af2 100644
  --- a/hw/ppce500_pci.c
  +++ b/hw/ppce500_pci.c
  @@ -87,6 +87,7 @@ struct PPCE500PCIState {
  /* mmio maps */
  MemoryRegion container;
  MemoryRegion iomem;
  +void *bar0;
  };
 
  typedef struct PPCE500PCIState PPCE500PCIState; @@ -315,6 +316,8 @@
  static int e500_pcihost_initfn(SysBusDevice *dev)
  int i;
  MemoryRegion *address_space_mem = get_system_memory();
  MemoryRegion *address_space_io = get_system_io();
  +PCIDevice *pdev;
  +MemoryRegion bar0;
 
  h = PCI_HOST_BRIDGE(dev);
  s = PPC_E500_PCI_HOST_BRIDGE(dev); @@ -342,6 +345,10 @@ static int
  e500_pcihost_initfn(SysBusDevice *dev)
  memory_region_add_subregion(s-container, PCIE500_REG_BASE, s-iomem);
  sysbus_init_mmio(dev, s-container);
 
  +bar0 = *(MemoryRegion *)s-bar0;
  +pdev = pci_find_device(b, 0, 0);
  +pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, bar0);
 
 Any reason you can't just pass in s-bar0 here? Though I'm not sure whether we
 have to do something special to get a memory region alias.

Yes I think this should work:
 pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, (MemoryRegion 
*)s-bar0);

Thanks
-Bharat

 
 Avi, any ideas? We want the same memory region mapped twice. Once at a fixed
 address, once as BAR0 of the PCI host controller.
 
 
 Alex
 
  +
  return 0;
  }
 
  @@ -363,6 +370,11 @@ static const TypeInfo e500_host_bridge_info = {
  .class_init= e500_host_bridge_class_init,
  };
 
  +static Property pci_host_dev_info[] = {
  +DEFINE_PROP_PTR(bar0_region, PPCE500PCIState, bar0),
  +DEFINE_PROP_END_OF_LIST(),
  +};
  +
  static void e500_pcihost_class_init(ObjectClass *klass, void *data) {
  DeviceClass *dc = DEVICE_CLASS(klass); @@ -370,6 +382,7 @@ static
  void e500_pcihost_class_init(ObjectClass *klass, void *data)
 
  k-init = e500_pcihost_initfn;
  dc-vmsd = vmstate_ppce500_pci;
  +dc-props = pci_host_dev_info;
  }
 
  static

Re: [Qemu-devel] [PATCH] Adding BAR0 for e500 PCI controller

2012-09-19 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Wednesday, September 19, 2012 4:33 PM
 To: Bhushan Bharat-R65777
 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; afaer...@suse.de; Bhushan
 Bharat-R65777
 Subject: Re: [PATCH] Adding BAR0 for e500 PCI controller
 
 
 On 19.09.2012, at 09:41, Bharat Bhushan wrote:
 
  PCI Root complex have TYPE-1 configuration header while PCI endpoint
  have type-0 configuration header. The type-1 configuration header have
  a BAR (BAR0). In Freescale PCI controller BAR0 is used for mapping pci
  address space to CCSR address space. This can used for 2 purposes: 1)
  for MSI interrupt generation 2) Allow CCSR registers access when
  configured as PCI endpoint, which I am not sure is a use case with QEMU-KVM
 guest.
 
  What I observed is that when guest read the size of BAR0 of host
  controller configuration header (TYPE1 header) then it always reads it
  as 0. When looking into the QEMU hw/ppce500_pci.c, I do not find the
  PCI controller device registering BAR0. I do not find any other
  controller also doing so may they do not use BAR0.
 
  There are two issues when BAR0 is not there (which I can think of):
  1) There should be BAR0 emulated for PCI Root comaplex (TYPE1 header)
  and when reading the size of BAR0, it should give size as per real h/w.
 
  This patch solves this problem.
 
  2) Do we need this BAR0 inbound address translation?
 When BAR0 is of non-zero size then it will be configured for
  PCI address space to local address(CCSR) space translation on inbound 
  access.
  The primary use case is for MSI interrupt generation. The device is
  configured with a address offsets in PCI address space, which will be
  translated to MSI interrupt generation MPIC registers. Currently I do
  not understand the MSI interrupt generation mechanism in QEMU and also
  IIRC we do not use QEMU MSI interrupt mechanism on e500 guest machines.
  But this BAR0 will be used when using MSI on e500.
 
  I can see one more issue, There are ATMUs emulated in
  hw/ppce500_pci.c, but i do not see these being used for address translation.
  So far that works because pci address space and local address space
  are 1:1 mapped. BAR0 inbound translation + ATMU translation will
  complete the address translation of inbound traffic.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  hw/pci_host.h|9 +
  hw/ppc/e500.c|   21 +
  hw/ppce500_pci.c |7 +++
  3 files changed, 37 insertions(+), 0 deletions(-)
 
  diff --git a/hw/pci_host.h b/hw/pci_host.h index 4b9c300..c1ec7eb
  100644
  --- a/hw/pci_host.h
  +++ b/hw/pci_host.h
  @@ -41,10 +41,19 @@ struct PCIHostState {
  MemoryRegion data_mem;
  MemoryRegion mmcfg;
  MemoryRegion *address_space;
  +MemoryRegion bar0;
  uint32_t config_reg;
  PCIBus *bus;
  };
 
  +typedef struct PPCE500CCSRState {
  +SysBusDevice *parent;
  +MemoryRegion ccsr_space;
  +} PPCE500CCSRState;
  +
  +#define TYPE_CCSR e500-ccsr
  +#define CCSR(obj) OBJECT_CHECK(PPCE500CCSRState, (obj), TYPE_CCSR)
  +
 
 All of this is e500 specific, so it should definitely not be in any generic 
 pci
 host header.

Yes, ok.

 
  /* common internal helpers for PCI/PCIe hosts, cut off overflows */
  void pci_host_config_write_common(PCIDevice *pci_dev, uint32_t addr,
uint32_t limit, uint32_t val,
  uint32_t len); diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c index
  6f0de6d..1f5da28 100644
  --- a/hw/ppc/e500.c
  +++ b/hw/ppc/e500.c
  @@ -31,6 +31,7 @@
  #include hw/loader.h
  #include elf.h
  #include hw/sysbus.h
  +#include hw/pci_host.h
  #include exec-memory.h
  #include host-utils.h
 
  @@ -587,3 +588,23 @@ void ppce500_init(PPCE500Params *params)
  kvmppc_init();
  }
  }
  +
  +static void e500_ccsr_type_init(Object *obj) {
  +PPCE500CCSRState *ccsr = CCSR(obj);
  +ccsr-ccsr_space.size = int128_make64(MPC8544_CCSRBAR_SIZE);
  +}
  +
  +static const TypeInfo e500_ccsr_info = {
  +.name  = TYPE_CCSR,
  +.parent=  TYPE_SYS_BUS_DEVICE,
  +.instance_size = sizeof(PPCE500CCSRState),
  +.instance_init = e500_ccsr_type_init, };
  +
  +static void e500_ccsr_register_types(void) {
  +type_register_static(e500_ccsr_info);
  +}
  +
  +type_init(e500_ccsr_register_types)
  diff --git a/hw/ppce500_pci.c b/hw/ppce500_pci.c index
  92b1dc0..135f2a6 100644
  --- a/hw/ppce500_pci.c
  +++ b/hw/ppce500_pci.c
  @@ -315,6 +315,8 @@ static int e500_pcihost_initfn(SysBusDevice *dev)
  int i;
  MemoryRegion *address_space_mem = get_system_memory();
  MemoryRegion *address_space_io = get_system_io();
  +PPCE500CCSRState *ccsr;
  +PCIDevice *pdev;
 
  h = PCI_HOST_BRIDGE(dev);
  s = PPC_E500_PCI_HOST_BRIDGE(dev); @@ -342,6 +344,11 @@ static int
  e500_pcihost_initfn(SysBusDevice *dev)
  memory_region_add_subregion(s-container, PCIE500_REG_BASE, s

Re: [Qemu-devel] [Qemu-ppc] [PATCH: RFC] Adding BAR0 for e500 PCI controller

2012-09-03 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Wood Scott-B07421
 Sent: Wednesday, August 15, 2012 6:59 AM
 To: Bhushan Bharat-R65777
 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de; Bhushan Bharat-
 R65777
 Subject: Re: [Qemu-ppc] [PATCH: RFC] Adding BAR0 for e500 PCI controller
 
 On 08/14/2012 07:50 AM, Bharat Bhushan wrote:
  PCI Root complex have TYPE-1 configuration header while PCI endpoint
  have type-0 configuration header. The type-1 configuration header have
  a BAR (BAR0). In Freescale PCI controller BAR0 is used for mapping pci
  address space to CCSR address space. This can used for 2 purposes: 1)
  for MSI interrupt generation 2) Allow CCSR registers access when
  configured as PCI endpoint, which I am not sure is a use case with QEMU-KVM
 guest.
 
  What I observed is that when guest read the size of BAR0 of host
  controller configuration header (TYPE1 header) then it always reads it
  as 0. When looking into the QEMU hw/ppce500_pci.c, I do not find the
  PCI controller device registering BAR0. I do not find any other
  controller also doing so may they do not use BAR0.
 
  There are two issues when BAR0 is not there (which I can think of):
  1) There should be BAR0 emulated for PCI Root comaplex (TYPE1 header)
  and when reading the size of BAR0, it should give size as per real h/w.
 
  2) Do we need this BAR0 inbound address translation?
  When BAR0 is of non-zero size then it will be configured for PCI
  address space to local address(CCSR) space translation on inbound access.
  The primary use case is for MSI interrupt generation. The device is
  configured with a address offsets in PCI address space, which will be
  translated to MSI interrupt generation MPIC registers. Currently I do
  not understand the MSI interrupt generation mechanism in QEMU and also
  IIRC we do not use QEMU MSI interrupt mechanism on e500 guest machines.
  But this BAR0 will be used when using MSI on e500.
 
 This patch is only trying to address #1, right?  I don't see any connection 
 from
 this BAR to CCSR.
 
  +memory_region_init_io(h-bar0, pci_host_conf_be_ops, h,
  +  PCIHOST-bar0, 0x100);
 
 0x0100 is correct for e500mc-based systems, but it should be 0x0010 
 for
 e500v2-based systems.

Scott,

Currently we have a generic e500 machine which have CCSR size 0x0010 
(MPC8544_CCSRBAR_SIZE). We do not have e500mc and e500v2 machines. So should we 
make this 0x0010 as per generic e500 machine?

Can we somehow pass this via qdev/varargs from machine emulation code 
(hw/ppc/e500.c) ?

Thanks
-Bharat

 
 -Scott



Re: [Qemu-devel] Isuue assiging devices using VFIO on x86

2012-08-29 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Wednesday, August 29, 2012 12:16 PM
 To: Bhushan Bharat-R65777
 Cc: k...@vger.kernel.org; Avi Kivity; qemu-devel@nongnu.org
 Subject: Re: Isuue assiging devices using VFIO on x86
 
 On Tue, 2012-08-28 at 17:10 +, Bhushan Bharat-R65777 wrote:
 
   -Original Message-
   From: Alex Williamson [mailto:alex.william...@redhat.com]
   Sent: Tuesday, August 28, 2012 9:27 PM
   To: Bhushan Bharat-R65777
   Cc: k...@vger.kernel.org; Avi Kivity; qemu-devel@nongnu.org
   Subject: Re: Isuue assiging devices using VFIO on x86
  
   On Tue, 2012-08-28 at 09:23 +, Bhushan Bharat-R65777 wrote:
Hi Alex,
   
In my susyem I have following devices:
   
I tried assigning a following PCI devices:
00:03.0 Communication controller: Intel Corporation 4 Series
Chipset HECI
   Controller (rev 03)
00:03.2 IDE interface: Intel Corporation 4 Series Chipset PT IDER
Controller
   (rev 03)
00:03.3 Serial controller: Intel Corporation 4 Series Chipset
Serial KT
   Controller (rev 03)
   
and getting below error:
   
---
Command:
---
qemu-system-x86_64 -enable-kvm  -nographic  -kernel
/boot/vmlinuz-3.6.0-rc2+ -
   initrd /boot/initramfs-3.6.0-rc2+.img -append root=/dev/sda1 -m
   1024 -drive
   file=/home/kvmdev/debian_squeeze_amd64_standard.qcow2 -device vfio-
   pci,host=:00:03.0 -device vfio-pci,host=:00:03.2 -device
   vfio-
   pci,host=:00:03.3
   
Error prints:

qemu-system-x86_64: -device vfio-pci,host=:00:03.0: Warning,
device
   :00:03.0 does not support reset
   
qemu-system-x86_64: -device vfio-pci,host=:00:03.0: VFIO
:00:03.0 BAR
   0 is too small to mmap, this may affect performance.
   
qemu-system-x86_64: -device vfio-pci,host=:00:03.2: Warning,
device
   :00:03.2 does not support reset
   
qemu-system-x86_64: -device vfio-pci,host=:00:03.3: Warning,
device
   :00:03.3 does not support reset
   
qemu: hardware error: register_ioport_read: invalid opaque for
address 0x3f6 CPU #0:
EAX=8003 EBX=3ffe0e80 ECX=8003 EDX=0cfc ESI=2800
EDI=80002804 EBP=6fc0 ESP=6f38
EIP=3ffec005 EFL=0086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010   00c09300 DPL=0 DS   [-WA]
CS =0008   00c09b00 DPL=0 CS32 [-RA]
SS =0010   00c09300 DPL=0 DS   [-WA]
DS =0010   00c09300 DPL=0 DS   [-WA]
FS =0010   00c09300 DPL=0 DS   [-WA]
GS =0010   00c09300 DPL=0 DS   [-WA]
LDT=   8200 DPL=0 LDT TR = 
 8b00 DPL=0 TSS32-busy
GDT= 000fcd68 0037
IDT= 000fdb60 
CR0=0011 CR2= CR3= CR4=
DR0= DR1= DR2=
   DR3=
DR6=0ff0 DR7=0400 EFER=
FCW=037f FSW= [ST=0] FTW=00 MXCSR=1f80
FPR0=  FPR1= 
FPR2=  FPR3= 
FPR4=  FPR5= 
FPR6=  FPR7= 
XMM00=
XMM01=
XMM02=
XMM03=
XMM04=
XMM05=
XMM06=
XMM07=
Aborted (core dumped)
   

   
Linux: http://github.com/awilliam/linux-vfio.git next
Qemu: https://github.com/awilliam/qemu-vfio.git  iommu-group-vfio
   
Any idea what's is the issue.
  
   Please try the vfio-pci-for-qemu-1.2-v3 qemu-vfio.git tag, I'm not
   even sure what's in that old iommu-group-vfio branch.  Thanks,
 
  Can you share the command to configure qemu?
 
 Nothing special: ./configure
 
 I sometimes set --prefix or --target-list, but nothing that changes the
 binaries.  Thanks,

Hello Alex,
I am using  ./configure --target-list=x86_64-softmmu --enable-kvm  .
QEMU repo is at:
commit 3b4672f6614e07f9dc0a85a12a8c89d480a2493c
Author: Alex Williamson alex.william...@redhat.com
Date:   Tue Aug 14 14:07:19 2012 -0600

vfio: Enable vfio-pci and mark supported

Signed-off-by: Alex Williamson alex.william...@redhat.com

Linux Repo is at :
commit d9875690d9b89a866022ff49e3fcea892345ad92
Author: Linus Torvalds torva...@linux-foundation.org
Date:   Thu Aug 16 14:51:24 2012 -0700

Linux 3.6-rc2

---

Then I bound 3 pci devices with VFIO and run below command to launch VM
qemu-system-x86_64 -enable-kvm  -nographic  -kernel /boot/vmlinuz-3.6.0-rc2+ 
-initrd /boot/initramfs-3.6.0-rc2+.img

Re: [Qemu-devel] Isuue assiging devices using VFIO on x86

2012-08-29 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Wednesday, August 29, 2012 8:44 PM
 To: Bhushan Bharat-R65777
 Cc: k...@vger.kernel.org; Avi Kivity; qemu-devel@nongnu.org
 Subject: Re: Isuue assiging devices using VFIO on x86
 
 On Wed, 2012-08-29 at 07:11 +, Bhushan Bharat-R65777 wrote:
 
   -Original Message-
   From: Alex Williamson [mailto:alex.william...@redhat.com]
   Sent: Wednesday, August 29, 2012 12:16 PM
   To: Bhushan Bharat-R65777
   Cc: k...@vger.kernel.org; Avi Kivity; qemu-devel@nongnu.org
   Subject: Re: Isuue assiging devices using VFIO on x86
  
   On Tue, 2012-08-28 at 17:10 +, Bhushan Bharat-R65777 wrote:
   
 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Tuesday, August 28, 2012 9:27 PM
 To: Bhushan Bharat-R65777
 Cc: k...@vger.kernel.org; Avi Kivity; qemu-devel@nongnu.org
 Subject: Re: Isuue assiging devices using VFIO on x86

 On Tue, 2012-08-28 at 09:23 +, Bhushan Bharat-R65777 wrote:
  Hi Alex,
 
  In my susyem I have following devices:
 
  I tried assigning a following PCI devices:
  00:03.0 Communication controller: Intel Corporation 4 Series
  Chipset HECI
 Controller (rev 03)
  00:03.2 IDE interface: Intel Corporation 4 Series Chipset PT
  IDER Controller
 (rev 03)
  00:03.3 Serial controller: Intel Corporation 4 Series Chipset
  Serial KT
 Controller (rev 03)
  Hello Alex,
  I am using  ./configure --target-list=x86_64-softmmu --enable-kvm  .
  QEMU repo is at:
  commit 3b4672f6614e07f9dc0a85a12a8c89d480a2493c
  Author: Alex Williamson alex.william...@redhat.com
  Date:   Tue Aug 14 14:07:19 2012 -0600
 
  vfio: Enable vfio-pci and mark supported
 
  Signed-off-by: Alex Williamson alex.william...@redhat.com
 
  Linux Repo is at :
  commit d9875690d9b89a866022ff49e3fcea892345ad92
  Author: Linus Torvalds torva...@linux-foundation.org
  Date:   Thu Aug 16 14:51:24 2012 -0700
 
  Linux 3.6-rc2
 
  ---
 
  Then I bound 3 pci devices with VFIO and run below command to launch
  VM
  qemu-system-x86_64 -enable-kvm  -nographic  -kernel
  /boot/vmlinuz-3.6.0-rc2+ -initrd /boot/initramfs-3.6.0-rc2+.img
  -append root=/dev/sda1 -m 1024 -drive
  file=/home/kvmdev/debian_squeeze_amd64_standard.qcow2 -device
  vfio-pci,host=:00:03.0 -device vfio-pci,host=:00:03.2 -device
  vfio-pci,host=:00:03.3
 
  Then I was getting below errors:
  qemu-system-x86_64: -device vfio-pci,host=:00:03.0: vfio: failed
  to set iommu for container: Operation not permitted
 
  qemu-system-x86_64: -device vfio-pci,host=:00:03.0: vfio: failed
  to setup container for group 2
 
  qemu-system-x86_64: -device vfio-pci,host=:00:03.0: vfio: failed
  to get group 2
 
  qemu-system-x86_64: -device vfio-pci,host=:00:03.0: Device
  'vfio-pci' could not be initialized
  -
 
  Then I set
  /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts = 1
 
  And getting below error:
 
  qemu-system-x86_64: -device vfio-pci,host=:00:03.0: Warning,
  device :00:03.0 does not support reset
 
  qemu-system-x86_64: -device vfio-pci,host=:00:03.2: Warning,
  device :00:03.2 does not support reset
 
  qemu-system-x86_64: -device vfio-pci,host=:00:03.3: Warning,
  device :00:03.3 does not support reset
 
  qemu: hardware error: register_ioport_read: invalid opaque for address
  0x3f6 CPU #0:
  EAX=8003 EBX=3ffe0e80 ECX=8003 EDX=0cfc ESI=2800
  EDI=80002804 EBP=6fc0 ESP=6f38
  EIP=3ffec005 EFL=0086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
  ES =0010   00c09300 DPL=0 DS   [-WA]
  CS =0008   00c09b00 DPL=0 CS32 [-RA]
  SS =0010   00c09300 DPL=0 DS   [-WA]
  DS =0010   00c09300 DPL=0 DS   [-WA]
  FS =0010   00c09300 DPL=0 DS   [-WA]
  GS =0010   00c09300 DPL=0 DS   [-WA]
  LDT=   8200 DPL=0 LDT TR = 
   8b00 DPL=0 TSS32-busy
  GDT= 000fcd68 0037
  IDT= 000fdb60 
  CR0=0011 CR2= CR3= CR4=
  DR0= DR1= DR2=
  DR3= DR6=0ff0 DR7=0400
  EFER= FCW=037f FSW= [ST=0] FTW=00 MXCSR=1f80
  FPR0=  FPR1= 
  FPR2=  FPR3= 
  FPR4=  FPR5= 
  FPR6=  FPR7= 
  XMM00=
  XMM01=
  XMM02=
  XMM03=
  XMM04=
  XMM05=
  XMM06=
  XMM07=

[Qemu-devel] Isuue assiging devices using VFIO on x86

2012-08-28 Thread Bhushan Bharat-R65777
Hi Alex,

In my susyem I have following devices:

I tried assigning a following PCI devices:
00:03.0 Communication controller: Intel Corporation 4 Series Chipset HECI 
Controller (rev 03)
00:03.2 IDE interface: Intel Corporation 4 Series Chipset PT IDER Controller 
(rev 03)
00:03.3 Serial controller: Intel Corporation 4 Series Chipset Serial KT 
Controller (rev 03)

and getting below error:

---
Command:
---
qemu-system-x86_64 -enable-kvm  -nographic  -kernel /boot/vmlinuz-3.6.0-rc2+ 
-initrd /boot/initramfs-3.6.0-rc2+.img -append root=/dev/sda1 -m 1024 -drive 
file=/home/kvmdev/debian_squeeze_amd64_standard.qcow2 -device 
vfio-pci,host=:00:03.0 -device vfio-pci,host=:00:03.2 -device 
vfio-pci,host=:00:03.3

Error prints:

qemu-system-x86_64: -device vfio-pci,host=:00:03.0: Warning, device 
:00:03.0 does not support reset

qemu-system-x86_64: -device vfio-pci,host=:00:03.0: VFIO :00:03.0 BAR 0 
is too small to mmap, this may affect performance.

qemu-system-x86_64: -device vfio-pci,host=:00:03.2: Warning, device 
:00:03.2 does not support reset

qemu-system-x86_64: -device vfio-pci,host=:00:03.3: Warning, device 
:00:03.3 does not support reset

qemu: hardware error: register_ioport_read: invalid opaque for address 0x3f6
CPU #0:
EAX=8003 EBX=3ffe0e80 ECX=8003 EDX=0cfc
ESI=2800 EDI=80002804 EBP=6fc0 ESP=6f38
EIP=3ffec005 EFL=0086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010   00c09300 DPL=0 DS   [-WA]
CS =0008   00c09b00 DPL=0 CS32 [-RA]
SS =0010   00c09300 DPL=0 DS   [-WA]
DS =0010   00c09300 DPL=0 DS   [-WA]
FS =0010   00c09300 DPL=0 DS   [-WA]
GS =0010   00c09300 DPL=0 DS   [-WA]
LDT=   8200 DPL=0 LDT
TR =   8b00 DPL=0 TSS32-busy
GDT= 000fcd68 0037
IDT= 000fdb60 
CR0=0011 CR2= CR3= CR4=
DR0= DR1= DR2= 
DR3= 
DR6=0ff0 DR7=0400
EFER=
FCW=037f FSW= [ST=0] FTW=00 MXCSR=1f80
FPR0=  FPR1= 
FPR2=  FPR3= 
FPR4=  FPR5= 
FPR6=  FPR7= 
XMM00= XMM01=
XMM02= XMM03=
XMM04= XMM05=
XMM06= XMM07=
Aborted (core dumped)



Linux: http://github.com/awilliam/linux-vfio.git next
Qemu: https://github.com/awilliam/qemu-vfio.git  iommu-group-vfio

Any idea what's is the issue.

Thanks
-Bharat




Re: [Qemu-devel] Isuue assiging devices using VFIO on x86

2012-08-28 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Tuesday, August 28, 2012 9:27 PM
 To: Bhushan Bharat-R65777
 Cc: k...@vger.kernel.org; Avi Kivity; qemu-devel@nongnu.org
 Subject: Re: Isuue assiging devices using VFIO on x86
 
 On Tue, 2012-08-28 at 09:23 +, Bhushan Bharat-R65777 wrote:
  Hi Alex,
 
  In my susyem I have following devices:
 
  I tried assigning a following PCI devices:
  00:03.0 Communication controller: Intel Corporation 4 Series Chipset HECI
 Controller (rev 03)
  00:03.2 IDE interface: Intel Corporation 4 Series Chipset PT IDER Controller
 (rev 03)
  00:03.3 Serial controller: Intel Corporation 4 Series Chipset Serial KT
 Controller (rev 03)
 
  and getting below error:
 
  ---
  Command:
  ---
  qemu-system-x86_64 -enable-kvm  -nographic  -kernel 
  /boot/vmlinuz-3.6.0-rc2+ -
 initrd /boot/initramfs-3.6.0-rc2+.img -append root=/dev/sda1 -m 1024 -drive
 file=/home/kvmdev/debian_squeeze_amd64_standard.qcow2 -device vfio-
 pci,host=:00:03.0 -device vfio-pci,host=:00:03.2 -device vfio-
 pci,host=:00:03.3
 
  Error prints:
  
  qemu-system-x86_64: -device vfio-pci,host=:00:03.0: Warning, device
 :00:03.0 does not support reset
 
  qemu-system-x86_64: -device vfio-pci,host=:00:03.0: VFIO :00:03.0 
  BAR
 0 is too small to mmap, this may affect performance.
 
  qemu-system-x86_64: -device vfio-pci,host=:00:03.2: Warning, device
 :00:03.2 does not support reset
 
  qemu-system-x86_64: -device vfio-pci,host=:00:03.3: Warning, device
 :00:03.3 does not support reset
 
  qemu: hardware error: register_ioport_read: invalid opaque for address 0x3f6
  CPU #0:
  EAX=8003 EBX=3ffe0e80 ECX=8003 EDX=0cfc
  ESI=2800 EDI=80002804 EBP=6fc0 ESP=6f38
  EIP=3ffec005 EFL=0086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
  ES =0010   00c09300 DPL=0 DS   [-WA]
  CS =0008   00c09b00 DPL=0 CS32 [-RA]
  SS =0010   00c09300 DPL=0 DS   [-WA]
  DS =0010   00c09300 DPL=0 DS   [-WA]
  FS =0010   00c09300 DPL=0 DS   [-WA]
  GS =0010   00c09300 DPL=0 DS   [-WA]
  LDT=   8200 DPL=0 LDT
  TR =   8b00 DPL=0 TSS32-busy
  GDT= 000fcd68 0037
  IDT= 000fdb60 
  CR0=0011 CR2= CR3= CR4=
  DR0= DR1= DR2=
 DR3=
  DR6=0ff0 DR7=0400
  EFER=
  FCW=037f FSW= [ST=0] FTW=00 MXCSR=1f80
  FPR0=  FPR1= 
  FPR2=  FPR3= 
  FPR4=  FPR5= 
  FPR6=  FPR7= 
  XMM00= 
  XMM01=
  XMM02= 
  XMM03=
  XMM04= 
  XMM05=
  XMM06= 
  XMM07=
  Aborted (core dumped)
 
  
 
  Linux: http://github.com/awilliam/linux-vfio.git next
  Qemu: https://github.com/awilliam/qemu-vfio.git  iommu-group-vfio
 
  Any idea what's is the issue.
 
 Please try the vfio-pci-for-qemu-1.2-v3 qemu-vfio.git tag, I'm not even
 sure what's in that old iommu-group-vfio branch.  Thanks,

Can you share the command to configure qemu?

Thanks
-Bharat

 
 Alex
 



Re: [Qemu-devel] [Qemu-ppc] [PATCH: RFC] Adding BAR0 for e500 PCI controller

2012-08-14 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Wood Scott-B07421
 Sent: Wednesday, August 15, 2012 6:59 AM
 To: Bhushan Bharat-R65777
 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de; Bhushan Bharat-
 R65777
 Subject: Re: [Qemu-ppc] [PATCH: RFC] Adding BAR0 for e500 PCI controller
 
 On 08/14/2012 07:50 AM, Bharat Bhushan wrote:
  PCI Root complex have TYPE-1 configuration header while PCI endpoint
  have type-0 configuration header. The type-1 configuration header have
  a BAR (BAR0). In Freescale PCI controller BAR0 is used for mapping pci
  address space to CCSR address space. This can used for 2 purposes: 1)
  for MSI interrupt generation 2) Allow CCSR registers access when
  configured as PCI endpoint, which I am not sure is a use case with QEMU-KVM
 guest.
 
  What I observed is that when guest read the size of BAR0 of host
  controller configuration header (TYPE1 header) then it always reads it
  as 0. When looking into the QEMU hw/ppce500_pci.c, I do not find the
  PCI controller device registering BAR0. I do not find any other
  controller also doing so may they do not use BAR0.
 
  There are two issues when BAR0 is not there (which I can think of):
  1) There should be BAR0 emulated for PCI Root comaplex (TYPE1 header)
  and when reading the size of BAR0, it should give size as per real h/w.
 
  2) Do we need this BAR0 inbound address translation?
  When BAR0 is of non-zero size then it will be configured for PCI
  address space to local address(CCSR) space translation on inbound access.
  The primary use case is for MSI interrupt generation. The device is
  configured with a address offsets in PCI address space, which will be
  translated to MSI interrupt generation MPIC registers. Currently I do
  not understand the MSI interrupt generation mechanism in QEMU and also
  IIRC we do not use QEMU MSI interrupt mechanism on e500 guest machines.
  But this BAR0 will be used when using MSI on e500.
 
 This patch is only trying to address #1, right?

Yes.

  I don't see any connection from
 this BAR to CCSR.

Yes, it is not there. That can be the next step.

 
  +memory_region_init_io(h-bar0, pci_host_conf_be_ops, h,
  +  PCIHOST-bar0, 0x100);
 
 0x0100 is correct for e500mc-based systems, but it should be 0x0010 
 for
 e500v2-based systems.

Yes, I will correct this.

Thanks
-Bharat

 
 -Scott



Re: [Qemu-devel] Running KVM guest on X86

2012-08-10 Thread Bhushan Bharat-R65777
On Mon, 2012-08-06 at 15:40 +, Bhushan Bharat-R65777 wrote:
 Hi Avi/All,

 I am facing issue to boot KVM guest on x86 (I used to work on
 PowerPC platform
and do not have enough knowledge of x86). I am working on making
VFIO working on PowerPC Booke, So I have cloned Alex Williamsons
git repository, compiled kernel for x86 on fedora with
virtualization configuration (selected all kernel config options
for same). Run below command to boot Guest (I have not provided vfio
 device yet):

 qemu-system-x86_64 -enable-kvm -m 1024 -nographic -kernel
arch/x86_64/boot/bzImage -initrd /boot/initramfs-3.5.0-rc4+.img
-serial tcp::,server,telnet

 After the I can see qemu command line (able to run various
 commands like info
registers etc), while guest does not boot (not even the first print
 comes).

 Can anyone help in what I am missing or doing wrong?
   
x86 doesn't use the serial port for console by default, so you're
making things quite a bit more difficult that way.  Typically
you'll want to provide a disk image (the -hda option is the
easiest way to do this), a display (-vga std -vnc :0 is again
easiest), and probably something to install from (-cdrom
image.iso).  You can also add a -boot d to get it to choose the cdrom
 the first time for install.
Thanks,
   
Thanks Avi and Alex, I can see the KVM guest boot prints by adding
-append
   console=ttyS0
  
   Note, once you get to user space you will need a getty specified in
   inittab in order to get a login on your serial port.   Something like:
  
  T0:23:respawn:/sbin/getty -L ttyS0
 
  1)
  I tried booting with prebuilt qcow2 then it works for me:
  qemu-system-x86_64  -enable-kvm  -nographic  -device sga  -m 1024 -hda
  debian_squeeze_amd64_standard.qcow2
 
  Does anyone help on how I can add my kernel to qcow2? Or create a proper
 qcow2?
 
  2)
  Also I tried as mentioned in section 3.9 Direct Linux Boot:
 http://qemu.weilnetz.de/qemu-doc.html#disk_005fimages :
 
  qemu-kvm  -enable-kvm  -nographic -kernel /boot/vmlinuz-3.5.0+ -hda
  /boot/initramfs-3.5.0+.img  -append console=ttyS0 root=/dev/sda -m
  1024
 
 -hda  /boot/initramfs-3.5.0+.img is incorrect. Should be -hda
 debian_squeeze_amd64_standard.qcow2 -initrd /boot/initramfs-3.5.0+.img and
 root=/dev/sda1 probably.

I tried :
qemu-system-x86_64  -enable-kvm  -nographic  -kernel /boot/vmlinuz-3.5.0+  
-initrd /boot/initramfs-3.5.0+.img  -append root=/dev/sda1 rw console=ttyS0 
-m 1024 -hda debian_squeeze_amd64_standard.qcow2

With this I get the login prompt, but it is not taking input character from 
keyboard properly (not able to give login credentials even). Seeing some weird 
behavior, like sometimes it treat normal character as like ENTER pressed.

Below are some boot prints and it is found that there were some junk characters 
after Setting console screen modes.

--
Setting console screen modes.
]Rcannot (un)set powersave mode
[9;30][14;30]Skipping font and keymap setup (handled by console-setup).
Setting up console font and keymap...done.
[   11.547904] rc (278) used greatest stack depth: 1760 bytes left
Starting portmap daemon...Already running..
Starting NFS common utilities: statd.
Starting enhanced syslogd: rsyslogd.
Starting deferred execution scheduler: atd.
Starting ACPI services
Starting periodic command scheduler: cron.
Starting MTA: exim4.

Debian GNU/Linux 6.0 debian-amd64 ttyS0

debian-amd64 login:


Thanks 
-Bharat





Re: [Qemu-devel] Running KVM guest on X86

2012-08-09 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Stuart Yoder [mailto:b08...@gmail.com]
 Sent: Thursday, August 09, 2012 8:28 PM
 To: Bhushan Bharat-R65777
 Cc: Alex Williamson; qemu-devel@nongnu.org; Avi Kivity
 Subject: Re: [Qemu-devel] Running KVM guest on X86
 
 On Tue, Aug 7, 2012 at 1:30 AM, Bhushan Bharat-R65777 r65...@freescale.com
 wrote:
 
 
  -Original Message-
  From: Alex Williamson [mailto:alex.william...@redhat.com]
  Sent: Monday, August 06, 2012 9:27 PM
  To: Bhushan Bharat-R65777
  Cc: qemu-devel@nongnu.org; Avi Kivity
  Subject: Re: Running KVM guest on X86
 
  On Mon, 2012-08-06 at 15:40 +, Bhushan Bharat-R65777 wrote:
   Hi Avi/All,
  
   I am facing issue to boot KVM guest on x86 (I used to work on
   PowerPC platform
  and do not have enough knowledge of x86). I am working on making VFIO
  working on PowerPC Booke, So I have cloned Alex Williamsons git
  repository, compiled kernel for x86 on fedora with virtualization
  configuration (selected all kernel config options for same). Run
  below command to boot Guest (I have not provided vfio device yet):
  
   qemu-system-x86_64 -enable-kvm -m 1024 -nographic -kernel
  arch/x86_64/boot/bzImage -initrd /boot/initramfs-3.5.0-rc4+.img
  -serial tcp::,server,telnet
  
   After the I can see qemu command line (able to run various commands
   like info
  registers etc), while guest does not boot (not even the first print 
  comes).
  
   Can anyone help in what I am missing or doing wrong?
 
  x86 doesn't use the serial port for console by default, so you're
  making things quite a bit more difficult that way.  Typically you'll
  want to provide a disk image (the -hda option is the easiest way to
  do this), a display (-vga std -vnc :0 is again easiest), and probably
  something to install from (-cdrom image.iso).  You can also add a
  -boot d to get it to choose the cdrom the first time for install.
  Thanks,
 
  Thanks Avi and Alex, I can see the KVM guest boot prints by adding -append
 console=ttyS0
 
 Note, once you get to user space you will need a getty specified in
 inittab in order to get a login on your serial port.   Something like:
 
T0:23:respawn:/sbin/getty -L ttyS0

1)
I tried booting with prebuilt qcow2 then it works for me:
qemu-system-x86_64  -enable-kvm  -nographic  -device sga  -m 1024 -hda 
debian_squeeze_amd64_standard.qcow2

Does anyone help on how I can add my kernel to qcow2? Or create a proper qcow2?

2)
Also I tried as mentioned in section 3.9 Direct Linux Boot: 
http://qemu.weilnetz.de/qemu-doc.html#disk_005fimages : 

qemu-kvm  -enable-kvm  -nographic -kernel /boot/vmlinuz-3.5.0+ -hda 
/boot/initramfs-3.5.0+.img  -append console=ttyS0 root=/dev/sda -m 1024 

I get below error :
[1.299225] No filesystem could mount root, tried:  ext3 ext2 ext4 iso9660
[1.303232] Kernel panic - not syncing: VFS: Unable to mount root fs on 
unknown-block(8,0)
[1.307683] Pid: 1, comm: swapper/0 Not tainted 3.3.5-2.fc16.x86_64 #1
[1.311201] Call Trace:
[1.312548]  [815eac62] panic+0xba/0x1cd
[1.315160]  [81cf1075] mount_block_root+0x258/0x283
[1.318275]  [81cf10f3] mount_root+0x53/0x57
[1.321047]  [81cf1234] prepare_namespace+0x13d/0x176
[1.324206]  [81cf0d59] kernel_init+0x156/0x15b
[1.327114]  [81089587] ? schedule_tail+0x27/0xb0
[1.330102]  [815fd6a4] kernel_thread_helper+0x4/0x10
[1.333413]  [81cf0c03] ? start_kernel+0x3c5/0x3c5
[1.336446]  [815fd6a0] ? gs_change+0x13/0x13

Thanks
-Bharat




Re: [Qemu-devel] Running KVM guest on X86

2012-08-09 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Thursday, August 09, 2012 11:25 PM
 To: Bhushan Bharat-R65777
 Cc: Stuart Yoder; qemu-devel@nongnu.org; Avi Kivity
 Subject: Re: [Qemu-devel] Running KVM guest on X86
 
 On Thu, 2012-08-09 at 17:39 +, Bhushan Bharat-R65777 wrote:
 
   -Original Message-
   From: Stuart Yoder [mailto:b08...@gmail.com]
   Sent: Thursday, August 09, 2012 8:28 PM
   To: Bhushan Bharat-R65777
   Cc: Alex Williamson; qemu-devel@nongnu.org; Avi Kivity
   Subject: Re: [Qemu-devel] Running KVM guest on X86
  
   On Tue, Aug 7, 2012 at 1:30 AM, Bhushan Bharat-R65777
   r65...@freescale.com
   wrote:
   
   
-Original Message-
From: Alex Williamson [mailto:alex.william...@redhat.com]
Sent: Monday, August 06, 2012 9:27 PM
To: Bhushan Bharat-R65777
Cc: qemu-devel@nongnu.org; Avi Kivity
Subject: Re: Running KVM guest on X86
   
On Mon, 2012-08-06 at 15:40 +, Bhushan Bharat-R65777 wrote:
 Hi Avi/All,

 I am facing issue to boot KVM guest on x86 (I used to work on
 PowerPC platform
and do not have enough knowledge of x86). I am working on making
VFIO working on PowerPC Booke, So I have cloned Alex Williamsons
git repository, compiled kernel for x86 on fedora with
virtualization configuration (selected all kernel config options
for same). Run below command to boot Guest (I have not provided vfio
 device yet):

 qemu-system-x86_64 -enable-kvm -m 1024 -nographic -kernel
arch/x86_64/boot/bzImage -initrd /boot/initramfs-3.5.0-rc4+.img
-serial tcp::,server,telnet

 After the I can see qemu command line (able to run various
 commands like info
registers etc), while guest does not boot (not even the first print
 comes).

 Can anyone help in what I am missing or doing wrong?
   
x86 doesn't use the serial port for console by default, so you're
making things quite a bit more difficult that way.  Typically
you'll want to provide a disk image (the -hda option is the
easiest way to do this), a display (-vga std -vnc :0 is again
easiest), and probably something to install from (-cdrom
image.iso).  You can also add a -boot d to get it to choose the cdrom
 the first time for install.
Thanks,
   
Thanks Avi and Alex, I can see the KVM guest boot prints by adding
-append
   console=ttyS0
  
   Note, once you get to user space you will need a getty specified in
   inittab in order to get a login on your serial port.   Something like:
  
  T0:23:respawn:/sbin/getty -L ttyS0
 
  1)
  I tried booting with prebuilt qcow2 then it works for me:
  qemu-system-x86_64  -enable-kvm  -nographic  -device sga  -m 1024 -hda
  debian_squeeze_amd64_standard.qcow2
 
  Does anyone help on how I can add my kernel to qcow2? Or create a proper
 qcow2?
 
  2)
  Also I tried as mentioned in section 3.9 Direct Linux Boot:
 http://qemu.weilnetz.de/qemu-doc.html#disk_005fimages :
 
  qemu-kvm  -enable-kvm  -nographic -kernel /boot/vmlinuz-3.5.0+ -hda
  /boot/initramfs-3.5.0+.img  -append console=ttyS0 root=/dev/sda -m
  1024
 
  I get below error :
  [1.299225] No filesystem could mount root, tried:  ext3 ext2 ext4 
  iso9660
  [1.303232] Kernel panic - not syncing: VFS: Unable to mount root fs on
 unknown-block(8,0)
 
 
 Chances are your root is a partition on /dev/sda, not the disk itself (ex. 
 sda1,
 sda2)... Why exactly are you causing yourself so much pain by doing these 
 direct
 boot, no VGA options instead of starting with an ISO image, installing it, 
 then
 setting up a serial console if you want serial access?  This may be the norm 
 for
 powerpc VMs, but it's not how I think most people setup x86 VMs.  Thanks,

I have a fedora machine to which I do not have direct access (but I can reboot 
remotely, have a console). So far what I was trying direct booting VM using 
same initramfs and bzimage as of host,

Alex, How I can create a ISO image with my kernel? Where I should place that on 
host?

Thanks
-Bharat

 
 Alex
 
  [1.307683] Pid: 1, comm: swapper/0 Not tainted 3.3.5-2.fc16.x86_64 #1
  [1.311201] Call Trace:
  [1.312548]  [815eac62] panic+0xba/0x1cd
  [1.315160]  [81cf1075] mount_block_root+0x258/0x283
  [1.318275]  [81cf10f3] mount_root+0x53/0x57
  [1.321047]  [81cf1234] prepare_namespace+0x13d/0x176
  [1.324206]  [81cf0d59] kernel_init+0x156/0x15b
  [1.327114]  [81089587] ? schedule_tail+0x27/0xb0
  [1.330102]  [815fd6a4] kernel_thread_helper+0x4/0x10
  [1.333413]  [81cf0c03] ? start_kernel+0x3c5/0x3c5
  [1.336446]  [815fd6a0] ? gs_change+0x13/0x13
 
  Thanks
  -Bharat
 
 
 
 



Re: [Qemu-devel] Running KVM guest on X86

2012-08-07 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Monday, August 06, 2012 9:27 PM
 To: Bhushan Bharat-R65777
 Cc: qemu-devel@nongnu.org; Avi Kivity
 Subject: Re: Running KVM guest on X86
 
 On Mon, 2012-08-06 at 15:40 +, Bhushan Bharat-R65777 wrote:
  Hi Avi/All,
 
  I am facing issue to boot KVM guest on x86 (I used to work on PowerPC 
  platform
 and do not have enough knowledge of x86). I am working on making VFIO working 
 on
 PowerPC Booke, So I have cloned Alex Williamsons git repository, compiled 
 kernel
 for x86 on fedora with virtualization configuration (selected all kernel 
 config
 options for same). Run below command to boot Guest (I have not provided vfio
 device yet):
 
  qemu-system-x86_64 -enable-kvm -m 1024 -nographic -kernel
 arch/x86_64/boot/bzImage -initrd /boot/initramfs-3.5.0-rc4+.img -serial
 tcp::,server,telnet
 
  After the I can see qemu command line (able to run various commands like 
  info
 registers etc), while guest does not boot (not even the first print comes).
 
  Can anyone help in what I am missing or doing wrong?
 
 x86 doesn't use the serial port for console by default, so you're making 
 things
 quite a bit more difficult that way.  Typically you'll want to provide a disk
 image (the -hda option is the easiest way to do this), a display (-vga std 
 -vnc
 :0 is again easiest), and probably something to install from (-cdrom
 image.iso).  You can also add a -boot d to get it to choose the cdrom the
 first time for install.  Thanks,

Thanks Avi and Alex, I can see the KVM guest boot prints by adding -append 
console=ttyS0

Now my exact command is like: 
qemu-system-x86_64 -enable-kvm -nographic -nodefconfig -kernel 
/boot/vmlinuz-3.5.0+  -initrd /boot/initramfs-3.5.0+.img -append 
console=ttyS0 -hda fedora.qcow  -m 1024

  Where fedora.qcow is created by qemu-img create fedora.qcow 5G

With this it is falling to Dracut, below are the error prints:


[2.288931] dracut: FATAL: No or empty root= argument
[2.291808] dracut: Refusing to continue


[2.294721] dracut Warning: Signal caught!
dracut Warning: Signal caught!
[2.298894] dracut Warning: dracut: FATAL: No or empty root= argument
dracut Warning: dracut: FATAL: No or empty root= argument
[2.304713] dracut Warning: dracut: Refusing to continue
dracut Warning: dracut: Refusing to continue

[2.320311] init (1) used greatest stack depth: 2664 bytes left
[2.323851] Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x0100
[2.323851] 
[2.324551] Pid: 1, comm: init Not tainted 3.5.0+ #7
[2.324551] Call Trace:
[2.324551]  [8171c34e] panic+0xc6/0x1e1
[2.324551]  [81729be0] ? _raw_write_unlock_irq+0x30/0x50
[2.324551]  [810c0020] do_exit+0xa20/0xb70
[2.324551]  [8172a215] ? retint_swapgs+0x13/0x1b
[2.324551]  [810c04bf] do_group_exit+0x4f/0xc0
[2.324551]  [810c0547] sys_exit_group+0x17/0x20
[2.324551]  [81732d69] system_call_fastpath+0x16/0x1b
-

Should I mount the root-file-system in disk and pass -append root=mount-path.

I had no luck when I tried qemu-system-x86_64 -device virtio-scsi-pci,id=scsi  
-enable-kvm  -nographic -nodefconfig -nodefaults  -chardev stdio,id=charserial0 
 -device isa-serial,chardev=charserial0,id=serial0  -drive 
file=fedora.qcow,if=none,id=sda  -device scsi-disk,bus=scsi.0,drive=sda  
-device sga  -kernel /boot/vmlinuz-3.5.0+  -initrd /boot/initramfs-3.5.0+.img  
-append console=ttyS0 -m 1024
 

Thanks in Advance
-Bharat



[Qemu-devel] Running KVM guest on X86

2012-08-06 Thread Bhushan Bharat-R65777
Hi Avi/All,

I am facing issue to boot KVM guest on x86 (I used to work on PowerPC platform 
and do not have enough knowledge of x86). I am working on making VFIO working 
on PowerPC Booke, So I have cloned Alex Williamsons git repository, compiled 
kernel for x86 on fedora with virtualization configuration (selected all kernel 
config options for same). Run below command to boot Guest (I have not provided 
vfio device yet): 

qemu-system-x86_64 -enable-kvm -m 1024 -nographic -kernel 
arch/x86_64/boot/bzImage -initrd /boot/initramfs-3.5.0-rc4+.img -serial 
tcp::,server,telnet

After the I can see qemu command line (able to run various commands like info 
registers etc), while guest does not boot (not even the first print comes).

Can anyone help in what I am missing or doing wrong?

Thanks
-Bharat




Re: [Qemu-devel] [PATCH 4/4] Enable kvm emulated watchdog

2012-08-02 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Wood Scott-B07421
 Sent: Wednesday, August 01, 2012 11:31 PM
 To: Bhushan Bharat-R65777
 Cc: Alexander Graf; qemu-...@nongnu.org List; kvm-...@vger.kernel.org; qemu-
 devel qemu-devel; KVM list
 Subject: Re: [PATCH 4/4] Enable kvm emulated watchdog
 
 On 08/01/2012 12:27 PM, Bhushan Bharat-R65777 wrote:
 
 
  -Original Message-
  From: Alexander Graf [mailto:ag...@suse.de]
  Sent: Wednesday, August 01, 2012 7:57 AM
  To: Bhushan Bharat-R65777
  Cc: qemu-...@nongnu.org List; kvm-...@vger.kernel.org; Bhushan
  Bharat-R65777; qemu-devel qemu-devel; KVM list
  Subject: Re: [PATCH 4/4] Enable kvm emulated watchdog
 
 
  On 20.07.2012, at 07:23, Bharat Bhushan wrote:
  @@ -384,6 +437,15 @@ int kvm_arch_init_vcpu(CPUPPCState *cenv)
  return ret;
  }
 
  +if (enable_watchdog_support) {
  +ret = kvm_watchdog_enable(cenv);
 
  Do you think this is a good idea? Why would real hardware not
  implement a watchdog just because the user didn't select an action?
 
  If there is no watchdog action then why we want to run watchdog timer?
 
 On real hardware, if software sets WRC to a non-zero value, the watchdog 
 action
 is a system reset.  The user doesn't have to do anything special.

Scott, maybe I misunderstood you comment, did not you commented to enable kvm 
watchdog if there is watchdog action provided by use. 

Thanks
-Bharat



Re: [Qemu-devel] [PATCH 4/4] Enable kvm emulated watchdog

2012-08-02 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Wood Scott-B07421
 Sent: Friday, August 03, 2012 2:02 AM
 To: Bhushan Bharat-R65777
 Cc: Wood Scott-B07421; Alexander Graf; qemu-...@nongnu.org List; kvm-
 p...@vger.kernel.org; qemu-devel qemu-devel; KVM list
 Subject: Re: [PATCH 4/4] Enable kvm emulated watchdog
 
 On 08/02/2012 10:56 AM, Bhushan Bharat-R65777 wrote:
 
 
  -Original Message-
  From: Wood Scott-B07421
  Sent: Wednesday, August 01, 2012 11:31 PM
  To: Bhushan Bharat-R65777
  Cc: Alexander Graf; qemu-...@nongnu.org List;
  kvm-...@vger.kernel.org; qemu- devel qemu-devel; KVM list
  Subject: Re: [PATCH 4/4] Enable kvm emulated watchdog
 
  On 08/01/2012 12:27 PM, Bhushan Bharat-R65777 wrote:
 
 
  -Original Message-
  From: Alexander Graf [mailto:ag...@suse.de]
  Sent: Wednesday, August 01, 2012 7:57 AM
  To: Bhushan Bharat-R65777
  Cc: qemu-...@nongnu.org List; kvm-...@vger.kernel.org; Bhushan
  Bharat-R65777; qemu-devel qemu-devel; KVM list
  Subject: Re: [PATCH 4/4] Enable kvm emulated watchdog
 
 
  On 20.07.2012, at 07:23, Bharat Bhushan wrote:
  @@ -384,6 +437,15 @@ int kvm_arch_init_vcpu(CPUPPCState *cenv)
  return ret;
  }
 
  +if (enable_watchdog_support) {
  +ret = kvm_watchdog_enable(cenv);
 
  Do you think this is a good idea? Why would real hardware not
  implement a watchdog just because the user didn't select an action?
 
  If there is no watchdog action then why we want to run watchdog timer?
 
  On real hardware, if software sets WRC to a non-zero value, the
  watchdog action is a system reset.  The user doesn't have to do anything
 special.
 
  Scott, maybe I misunderstood you comment, did not you commented to
  enable kvm watchdog if there is watchdog action provided by use.
 
 I changed my mind. :-)
 
 The main difference between the user specifically asking for an action of 
 system
 reset, and the user not asking for anything, is that only in the former case
 should we error out if watchdog functionality isn't available -- but if it's a
 pain to distinguish, don't error out in either case.

:-) ..  ok 

-Bharat

 
 -Scott



Re: [Qemu-devel] [PATCH 4/4] Enable kvm emulated watchdog

2012-08-01 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Wednesday, August 01, 2012 7:57 AM
 To: Bhushan Bharat-R65777
 Cc: qemu-...@nongnu.org List; kvm-...@vger.kernel.org; Bhushan Bharat-R65777;
 qemu-devel qemu-devel; KVM list
 Subject: Re: [PATCH 4/4] Enable kvm emulated watchdog
 
 
 On 20.07.2012, at 07:23, Bharat Bhushan wrote:
 
  Enable the KVM emulated watchdog if KVM supports (use the capability
  enablement in watchdog handler). Also watchdog exit
  (KVM_EXIT_WATCHDOG) handling is added.
  Watchdog state machine is cleared whenever VM state changes to running.
  This is to handle the cases like return from debug halt etc.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  v4: Enbale watchdog support only when user specifies watchdog-action
 Earlier this was [3/3] of the patch series. Now because i separated
 linux-header updation in separate patch, so this become [4/4]
 
  v3:
  - TSR clearing is removed in whatchdog exit handling as this is
no more needed.
 
  v2:
  - Merged ([PATCH 3/4] Watchdog exit handling support) and
([PATCH 4/4] Enable to use kvm emulated watchdog)
  - Clear watchdog state machine when VM state changes to running.
 
  hw/ppc_booke.c   |5 
  sysemu.h |1 +
  target-ppc/cpu.h |1 +
  target-ppc/kvm.c |   69 
  ++
  vl.c |2 +
  5 files changed, 78 insertions(+), 0 deletions(-)
 
  diff --git a/hw/ppc_booke.c b/hw/ppc_booke.c index 837a5b6..478dbcd
  100644
  --- a/hw/ppc_booke.c
  +++ b/hw/ppc_booke.c
  @@ -203,6 +203,11 @@ static void booke_wdt_cb(void *opaque)
   booke_timer-wdt_timer); }
 
  +void ppc_booke_watchdog_clear_tsr(CPUPPCState *env, target_ulong tsr)
  +{
  +env-spr[SPR_BOOKE_TSR] = tsr  ~(TSR_ENW | TSR_WIS |
  +TSR_WRS_MASK); }
  +
  void store_booke_tsr(CPUPPCState *env, target_ulong val) {
  env-spr[SPR_BOOKE_TSR] = ~val;
  diff --git a/sysemu.h b/sysemu.h
  index bc2c788..fc388b7 100644
  --- a/sysemu.h
  +++ b/sysemu.h
  @@ -131,6 +131,7 @@ extern uint8_t *boot_splash_filedata; extern int
  boot_splash_filedata_size; extern uint8_t qemu_extra_params_fw[2];
  extern QEMUClock *rtc_clock;
  +extern int enable_watchdog_support;
 
  #define MAX_NODES 64
  extern int nb_numa_nodes;
  diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h index
  ca2fc21..163389a 100644
  --- a/target-ppc/cpu.h
  +++ b/target-ppc/cpu.h
  @@ -1191,6 +1191,7 @@ void store_40x_dbcr0 (CPUPPCState *env, uint32_t
  val); void store_40x_sler (CPUPPCState *env, uint32_t val); void
  store_booke_tcr (CPUPPCState *env, target_ulong val); void
  store_booke_tsr (CPUPPCState *env, target_ulong val);
  +void ppc_booke_watchdog_clear_tsr(CPUPPCState *env, target_ulong
  +tsr);
  void booke206_flush_tlb(CPUPPCState *env, int flags, const int
  check_iprot); target_phys_addr_t booke206_tlb_to_page_size(CPUPPCState
  *env, ppcmas_tlb_t *tlb); int ppcmas_tlb_check(CPUPPCState *env,
  ppcmas_tlb_t *tlb, diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
  index b6ef72d..0226b5e 100644
  --- a/target-ppc/kvm.c
  +++ b/target-ppc/kvm.c
  @@ -32,6 +32,7 @@
  #include device_tree.h
  #include hw/sysbus.h
  #include hw/spapr.h
  +#include hw/watchdog.h
 
  #include hw/sysbus.h
  #include hw/spapr.h
  @@ -60,6 +61,7 @@ static int cap_booke_sregs; static int cap_ppc_smt;
  static int cap_ppc_rma; static int cap_spapr_tce;
  +static int cap_ppc_watchdog;
 
  /* XXX We have a race condition where we actually have a level triggered
   * interrupt, but the infrastructure can't expose that yet, so the guest
  @@ -86,6 +88,7 @@ int kvm_arch_init(KVMState *s)
  cap_ppc_smt = kvm_check_extension(s, KVM_CAP_PPC_SMT);
  cap_ppc_rma = kvm_check_extension(s, KVM_CAP_PPC_RMA);
  cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE);
  +cap_ppc_watchdog = kvm_check_extension(s, KVM_CAP_PPC_BOOKE_WATCHDOG);
 
  if (!cap_interrupt_level) {
  fprintf(stderr, KVM: Couldn't find level irq capability. Expect 
  the 
  @@ -168,6 +171,30 @@ static int kvm_booke206_tlb_init(CPUPPCState *env)
  return 0;
  }
 
  +static int kvm_watchdog_enable(CPUPPCState *env)
  +{
  +int ret;
  +struct kvm_enable_cap encap = {};
  +
  +if (!kvm_enabled()) {
  +return 0;
  +}
  +
  +if (!cap_ppc_watchdog) {
  +printf(warning: KVM does not support watchdog);
  +return 0;
  +}
  +
  +encap.cap = KVM_CAP_PPC_BOOKE_WATCHDOG;
  +ret = kvm_vcpu_ioctl(env, KVM_ENABLE_CAP, encap);
  +if (ret  0) {
  +fprintf(stderr, %s: couldn't enable KVM_CAP_PPC_BOOKE_WATCHDOG:
 %s\n,
  +__func__, strerror(-ret));
  +return ret;
  +}
  +
  +return ret;
  +}
 
  #if defined(TARGET_PPC64)
  static void kvm_get_fallback_smmu_info(CPUPPCState *env,
  @@ -371,6 +398,32 @@ static inline void kvm_fixup_page_sizes(CPUPPCState 
  *env)
 
  #endif /* !defined

Re: [Qemu-devel] [PATCH 4/4] Enable kvm emulated watchdog

2012-08-01 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Wednesday, August 01, 2012 7:57 AM
 To: Bhushan Bharat-R65777
 Cc: qemu-...@nongnu.org List; kvm-...@vger.kernel.org; Bhushan Bharat-R65777;
 qemu-devel qemu-devel; KVM list
 Subject: Re: [PATCH 4/4] Enable kvm emulated watchdog
 
 
 On 20.07.2012, at 07:23, Bharat Bhushan wrote:
 
  Enable the KVM emulated watchdog if KVM supports (use the capability
  enablement in watchdog handler). Also watchdog exit
  (KVM_EXIT_WATCHDOG) handling is added.
  Watchdog state machine is cleared whenever VM state changes to running.
  This is to handle the cases like return from debug halt etc.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  v4: Enbale watchdog support only when user specifies watchdog-action
 Earlier this was [3/3] of the patch series. Now because i separated
 linux-header updation in separate patch, so this become [4/4]
 
  v3:
  - TSR clearing is removed in whatchdog exit handling as this is
no more needed.
 
  v2:
  - Merged ([PATCH 3/4] Watchdog exit handling support) and
([PATCH 4/4] Enable to use kvm emulated watchdog)
  - Clear watchdog state machine when VM state changes to running.
 
  hw/ppc_booke.c   |5 
  sysemu.h |1 +
  target-ppc/cpu.h |1 +
  target-ppc/kvm.c |   69 
  ++
  vl.c |2 +
  5 files changed, 78 insertions(+), 0 deletions(-)
 
  diff --git a/hw/ppc_booke.c b/hw/ppc_booke.c index 837a5b6..478dbcd
  100644
  --- a/hw/ppc_booke.c
  +++ b/hw/ppc_booke.c
  @@ -203,6 +203,11 @@ static void booke_wdt_cb(void *opaque)
   booke_timer-wdt_timer); }
 
  +void ppc_booke_watchdog_clear_tsr(CPUPPCState *env, target_ulong tsr)
  +{
  +env-spr[SPR_BOOKE_TSR] = tsr  ~(TSR_ENW | TSR_WIS |
  +TSR_WRS_MASK); }
  +
  void store_booke_tsr(CPUPPCState *env, target_ulong val) {
  env-spr[SPR_BOOKE_TSR] = ~val;
  diff --git a/sysemu.h b/sysemu.h
  index bc2c788..fc388b7 100644
  --- a/sysemu.h
  +++ b/sysemu.h
  @@ -131,6 +131,7 @@ extern uint8_t *boot_splash_filedata; extern int
  boot_splash_filedata_size; extern uint8_t qemu_extra_params_fw[2];
  extern QEMUClock *rtc_clock;
  +extern int enable_watchdog_support;
 
  #define MAX_NODES 64
  extern int nb_numa_nodes;
  diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h index
  ca2fc21..163389a 100644
  --- a/target-ppc/cpu.h
  +++ b/target-ppc/cpu.h
  @@ -1191,6 +1191,7 @@ void store_40x_dbcr0 (CPUPPCState *env, uint32_t
  val); void store_40x_sler (CPUPPCState *env, uint32_t val); void
  store_booke_tcr (CPUPPCState *env, target_ulong val); void
  store_booke_tsr (CPUPPCState *env, target_ulong val);
  +void ppc_booke_watchdog_clear_tsr(CPUPPCState *env, target_ulong
  +tsr);
  void booke206_flush_tlb(CPUPPCState *env, int flags, const int
  check_iprot); target_phys_addr_t booke206_tlb_to_page_size(CPUPPCState
  *env, ppcmas_tlb_t *tlb); int ppcmas_tlb_check(CPUPPCState *env,
  ppcmas_tlb_t *tlb, diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
  index b6ef72d..0226b5e 100644
  --- a/target-ppc/kvm.c
  +++ b/target-ppc/kvm.c
  @@ -32,6 +32,7 @@
  #include device_tree.h
  #include hw/sysbus.h
  #include hw/spapr.h
  +#include hw/watchdog.h
 
  #include hw/sysbus.h
  #include hw/spapr.h
  @@ -60,6 +61,7 @@ static int cap_booke_sregs; static int cap_ppc_smt;
  static int cap_ppc_rma; static int cap_spapr_tce;
  +static int cap_ppc_watchdog;
 
  /* XXX We have a race condition where we actually have a level triggered
   * interrupt, but the infrastructure can't expose that yet, so the guest
  @@ -86,6 +88,7 @@ int kvm_arch_init(KVMState *s)
  cap_ppc_smt = kvm_check_extension(s, KVM_CAP_PPC_SMT);
  cap_ppc_rma = kvm_check_extension(s, KVM_CAP_PPC_RMA);
  cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE);
  +cap_ppc_watchdog = kvm_check_extension(s, KVM_CAP_PPC_BOOKE_WATCHDOG);
 
  if (!cap_interrupt_level) {
  fprintf(stderr, KVM: Couldn't find level irq capability. Expect 
  the 
  @@ -168,6 +171,30 @@ static int kvm_booke206_tlb_init(CPUPPCState *env)
  return 0;
  }
 
  +static int kvm_watchdog_enable(CPUPPCState *env)
  +{
  +int ret;
  +struct kvm_enable_cap encap = {};
  +
  +if (!kvm_enabled()) {
  +return 0;
  +}
  +
  +if (!cap_ppc_watchdog) {
  +printf(warning: KVM does not support watchdog);
  +return 0;
  +}
  +
  +encap.cap = KVM_CAP_PPC_BOOKE_WATCHDOG;
  +ret = kvm_vcpu_ioctl(env, KVM_ENABLE_CAP, encap);
  +if (ret  0) {
  +fprintf(stderr, %s: couldn't enable KVM_CAP_PPC_BOOKE_WATCHDOG:
 %s\n,
  +__func__, strerror(-ret));
  +return ret;
  +}
  +
  +return ret;
  +}
 
  #if defined(TARGET_PPC64)
  static void kvm_get_fallback_smmu_info(CPUPPCState *env,
  @@ -371,6 +398,32 @@ static inline void kvm_fixup_page_sizes(CPUPPCState 
  *env)
 
  #endif /* !defined

Re: [Qemu-devel] [RFC] Proposal: PCI/PCIe: inbound BAR0 emulation for PCI controller (Root Complex)

2012-06-11 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
 Sent: Friday, June 08, 2012 4:32 PM
 To: Bhushan Bharat-R65777
 Cc: qemu-devel@nongnu.org; Wood Scott-B07421; Yoder Stuart-B08248
 Subject: Re: [RFC] Proposal: PCI/PCIe: inbound BAR0 emulation for PCI 
 controller
 (Root Complex)
 
 On Fri, 2012-06-08 at 09:03 +, Bhushan Bharat-R65777 wrote:
  Hi All,
 
  When Freescale PCI controller configured in Root Complex mode then,
  its configuration header (type 1) have one inbound BAR (BAR0, called
  as CCSRBAR). And rest of BARs (inbound and outbound) are supported by
  ATMU registers, which are outside the Type 1 configuration header.
  This BAR0 of Type 1 configuration header always translate to CCSR
  space and is of the size of CCSR. This BAR0 (inbound window) is
  required for MSI interrupts support. With this window, the pci devices
  can write to the MPIC MSI registers to generate MSI interrupt.
 
 So you should start with giving an overview, nobody here knows what ATMU or
 CCSR are, those are device specific acronyms

ATMU - Address Translation and Mapping Unit :- This facilitate flexibility in 
defining the address maps for the external interfaces (PCI, Rapidio etc). 
This basically provides following features:
1. Translating the local address to an external address space
2. Translating external addresses to the local address space

CCSR - Configuration, Control, and Status Register (CCSR) Space. This is memory 
mapped system register space.

In PCI/PCIe context this can be visualized as:


-|--|
Local|   |PCI controller (Root Complex)||
Address  |   |   ___| 
Space|-|  |ATMU  |  |  Type 1||--PCI Address Space---
 |   |  |__|  | Config Header  ||
 |   ||||
-|--|


 
 As a matter of fact it looks like I misunderstood you on IRC :-) IE. I thought
 your problem was that BAR0 is 'special' as it represents the inbound memory
 window (as it does on some PPC 4xx for example and a whole other collection of
 embedded devices). You seem to indicate that it's in fact MMIO:

In some SOCs, BAR0 always maps to CCSR (MMIO) while in some new SOCs, the BAR0 
default maps to CCSR space while it can be programmed to map system memory in 
local address space.

 
  As far as I know, as of now no emulated PCI controller supports this
  BAR0 in type 1 configuration header. But probably (I think so) that
  supporting this is not of big concern, but the point is that this
  window (BAR0) translate to mmio-regs (CCSR) and not to DDR memory.
 
 So the 4xx case or the case where your BAR 0 actually represents system memory
 are something that is bothering me but not what you are trying to discuss 
 here.
 
 IE. Normal BAR operations should work for an MMIO BAR, ie register it normally
 as a PCIe device and devices accessing those addresses will do the right 
 thing.
 
 From there, AFAIK, the MSI code will simply do stl_le_phys, which I
 -believe- will hit a BAR that does MMIO decoding for those addresses, but I'll
 let people knowing qemu more in depth reply whether that's true or not.
 
 There's very few devices with MSI support in hw/* so it's hard to test with
 anything other than virtio.

Probably testing with virtio should be ok as of now ( to put the framework in 
place).

Thanks
-Bharat
 
  So I have couple of concerns here:
 
  1. Whenever PCI device does need DMA then these windows (inbound and
  outbound ATMUs registers) need to used to translate pci address to
  system physical address (Sometime we also call this as cpu address
  space). This will probably be done by : [Qemu-devel] [PATCH 00/12]
  IOMMU Infrastructure : patch-set ( I am trying to understand these
  patches :-))
 
 Yes, that's basically it. The patches allow you to add a set of routines that
 will be used for translating DMA accesses to system memory along with 
 map/unmap
 operations etc...
 
 Beware that this only works with drivers that use the proper accessors.
 The patch series fixes some of them but not everything. I don't know off 
 hand
 (I don't have the code at hand right now) whether the MSI code goes through 
 that
 or not, if it does, you may need to be careful to
 -not- hit the translation layer and pass the accesses on.
 
  2. Hook up this inbound BAR0 in the above patch-set to translate to
  mmio-regs. As this would be controller specific, a callback will be
  registered for translation. Now it will be upto the controller
  specific code on how it translates. As this is needed only for MSI
  interrupt, maybe, initially we do not do anything initially, till we
  want MSI emulation in QEMU.
 
 Well, virtio can make good use of MSI emulation ...
 
 Cheers,
 Ben.
 
  Please provide your comment.
 
  Thanks
  -Bharat
 
 
 
 



Re: [Qemu-devel] [RFC] Proposal: PCI/PCIe: inbound BAR0 emulation for PCI controller (Root Complex)

2012-06-10 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Wood Scott-B07421
 Sent: Saturday, June 09, 2012 12:27 AM
 To: Bhushan Bharat-R65777
 Cc: qemu-devel@nongnu.org; b...@kernel.crashing.org; Wood Scott-B07421; Yoder
 Stuart-B08248
 Subject: Re: [RFC] Proposal: PCI/PCIe: inbound BAR0 emulation for PCI 
 controller
 (Root Complex)
 
 On 06/08/2012 04:03 AM, Bhushan Bharat-R65777 wrote:
  Hi All,
 
  When Freescale PCI controller configured in Root Complex mode then,
  its configuration header (type 1) have one inbound BAR (BAR0, called
  as CCSRBAR).
 
 It maps to CCSRBAR, but it's called PCICSRBAR/PEXCSRBAR.
 
  And rest of BARs (inbound and outbound) are supported by ATMU
  registers, which are outside the Type 1 configuration header.
  This BAR0 of Type 1 configuration header always translate to CCSR
  space and is of the size of CCSR. This BAR0 (inbound window) is
  required for MSI interrupts support.
 
 Some sort of inbound mapping is required for MSIs.  It's typically this one or
 PEXMSIBAR (which is a similar concept but more full-featured, and not in 
 config
 space), but doesn't have to be (e.g. we use normal inbound windows for MSI on
 Topaz, because of PAMU limitations that prevent us from using the CCSR address
 as the guest physical MSI destination).

When guest access BAR0 of configuration space or it accesses PEXMSIBAR, a 
common inbound mapping will translate the address to CCSR (Now the translated 
address may be further translated to real physical by IOMMU translation layer 
if configured in QEMU).

 
  As far as I know, as of now no emulated PCI controller supports this
  BAR0 in type 1 configuration header. But probably (I think so) that
  supporting this is not of big concern, but the point is that this
  window (BAR0) translate to mmio-regs (CCSR) and not to DDR memory.
 
  So I have couple of concerns here:
 
  1. Whenever PCI device does need DMA then these windows (inbound and
  outbound ATMUs registers) need to used to translate pci address to
  system physical address (Sometime we also call this as cpu address
  space). This will probably be done by : [Qemu-devel] [PATCH 00/12]
  IOMMU Infrastructure : patch-set ( I am trying to understand these
  patches :-))
 
  2. Hook up this inbound BAR0 in the above patch-set to translate to
  mmio-regs. As this would be controller specific, a callback will be
  registered for translation. Now it will be upto the controller
  specific code on how it translates. As this is needed only for MSI
  interrupt, maybe, initially we do not do anything initially, till we
  want MSI emulation in QEMU.
 
 MSIs may be the only thing that we plan to use it for, but if you want to
 properly emulate the chip you need to fully implement all the translation
 windows.

Do you mean address translation using ATMU inbound/outbound windows? Or 
complete translation support of CCSR space?

The former, I think, not sure, should get supported by IOMMU Infrastructure 
patch set.

The later is not supported. And if I understood correctly then, I think the I 
agree with you on emulating complete CCSR space. Do you think that we can start 
with supporting the MSI region mapping (MPIC mmio regs) and design it in way 
that later if needed it can be extended for rest of CCSR space.

If in kernel MPIC is there, then the QEMU will use the in-kernel MPIC to 
generate MSI interrupt or it will use QEMU emulated MPIC to generate MSI 
interrupt.

 
 We also want the BAR itself to be properly emulated, as otherwise software can
 get confused when it reads it and sees zero as the location of the PCI DMA
 memory hole.
 

Probably I did not understood clearly. Can you please elaborate of what type of 
confusion.

Thanks
-Bharat




[Qemu-devel] [RFC] Proposal: PCI/PCIe: inbound BAR0 emulation for PCI controller (Root Complex)

2012-06-08 Thread Bhushan Bharat-R65777
Hi All,

When Freescale PCI controller configured in Root Complex mode then, its 
configuration header (type 1) have one inbound BAR (BAR0, called as CCSRBAR). 
And rest of BARs (inbound and outbound) are supported by ATMU registers, which 
are outside the Type 1 configuration header. This BAR0 of Type 1 configuration 
header always translate to CCSR space and is of the size of CCSR. This BAR0 
(inbound window) is required for MSI interrupts support. With this window, the 
pci devices can write to the MPIC MSI registers to generate MSI interrupt.

As far as I know, as of now no emulated PCI controller supports this BAR0 in 
type 1 configuration header. But probably (I think so) that supporting this is 
not of big concern, but the point is that this window (BAR0) translate to 
mmio-regs (CCSR) and not to DDR memory.

So I have couple of concerns here:

1. Whenever PCI device does need DMA then these windows (inbound and outbound 
ATMUs registers) need to used to translate pci address to system physical 
address (Sometime we also call this as cpu address space). This will probably 
be done by : [Qemu-devel] [PATCH 00/12] IOMMU Infrastructure : patch-set ( I am 
trying to understand these patches :-))

2. Hook up this inbound BAR0 in the above patch-set to translate to mmio-regs. 
As this would be controller specific, a callback will be registered for 
translation. Now it will be upto the controller specific code on how it 
translates. As this is needed only for MSI interrupt, maybe, initially we do 
not do anything initially, till we want MSI emulation in QEMU.

Please provide your comment.

Thanks
-Bharat






Re: [Qemu-devel] [RFC] Proposal: PCI/PCIe: inbound BAR0 emulation for PCI controller (Root Complex)

2012-06-08 Thread Bhushan Bharat-R65777
Just adding Alex Graf and qemu-ppc@.

Will add my response in reply. 

 -Original Message-
 From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
 Sent: Friday, June 08, 2012 4:32 PM
 To: Bhushan Bharat-R65777
 Cc: qemu-devel@nongnu.org; Wood Scott-B07421; Yoder Stuart-B08248
 Subject: Re: [RFC] Proposal: PCI/PCIe: inbound BAR0 emulation for PCI 
 controller
 (Root Complex)
 
 On Fri, 2012-06-08 at 09:03 +, Bhushan Bharat-R65777 wrote:
  Hi All,
 
  When Freescale PCI controller configured in Root Complex mode then,
  its configuration header (type 1) have one inbound BAR (BAR0, called
  as CCSRBAR). And rest of BARs (inbound and outbound) are supported by
  ATMU registers, which are outside the Type 1 configuration header.
  This BAR0 of Type 1 configuration header always translate to CCSR
  space and is of the size of CCSR. This BAR0 (inbound window) is
  required for MSI interrupts support. With this window, the pci devices
  can write to the MPIC MSI registers to generate MSI interrupt.
 
 So you should start with giving an overview, nobody here knows what ATMU or
 CCSR are, those are device specific acronyms
 
 As a matter of fact it looks like I misunderstood you on IRC :-) IE. I thought
 your problem was that BAR0 is 'special' as it represents the inbound memory
 window (as it does on some PPC 4xx for example and a whole other collection of
 embedded devices). You seem to indicate that it's in fact MMIO:
 
  As far as I know, as of now no emulated PCI controller supports this
  BAR0 in type 1 configuration header. But probably (I think so) that
  supporting this is not of big concern, but the point is that this
  window (BAR0) translate to mmio-regs (CCSR) and not to DDR memory.
 
 So the 4xx case or the case where your BAR 0 actually represents system memory
 are something that is bothering me but not what you are trying to discuss 
 here.
 
 IE. Normal BAR operations should work for an MMIO BAR, ie register it normally
 as a PCIe device and devices accessing those addresses will do the right 
 thing.
 
 From there, AFAIK, the MSI code will simply do stl_le_phys, which I
 -believe- will hit a BAR that does MMIO decoding for those addresses, but I'll
 let people knowing qemu more in depth reply whether that's true or not.
 
 There's very few devices with MSI support in hw/* so it's hard to test with
 anything other than virtio.
 
  So I have couple of concerns here:
 
  1. Whenever PCI device does need DMA then these windows (inbound and
  outbound ATMUs registers) need to used to translate pci address to
  system physical address (Sometime we also call this as cpu address
  space). This will probably be done by : [Qemu-devel] [PATCH 00/12]
  IOMMU Infrastructure : patch-set ( I am trying to understand these
  patches :-))
 
 Yes, that's basically it. The patches allow you to add a set of routines that
 will be used for translating DMA accesses to system memory along with 
 map/unmap
 operations etc...
 
 Beware that this only works with drivers that use the proper accessors.
 The patch series fixes some of them but not everything. I don't know off 
 hand
 (I don't have the code at hand right now) whether the MSI code goes through 
 that
 or not, if it does, you may need to be careful to
 -not- hit the translation layer and pass the accesses on.
 
  2. Hook up this inbound BAR0 in the above patch-set to translate to
  mmio-regs. As this would be controller specific, a callback will be
  registered for translation. Now it will be upto the controller
  specific code on how it translates. As this is needed only for MSI
  interrupt, maybe, initially we do not do anything initially, till we
  want MSI emulation in QEMU.
 
 Well, virtio can make good use of MSI emulation ...
 
 Cheers,
 Ben.
 
  Please provide your comment.
 
  Thanks
  -Bharat
 
 
 
 



Re: [Qemu-devel] qemu_power_down_request handling

2012-04-16 Thread Bhushan Bharat-R65777

Also why not use qemu_system_shutdown_request() rather than 
qemu_system_powerdown_request() for -watchdog-action shutdown ?

Thanks
-Bharat

 -Original Message-
 From: Bhushan Bharat-R65777
 Sent: Monday, April 16, 2012 1:37 PM
 To: 'qemu-devel@nongnu.org'
 Cc: 'Alexander Graf'; Wood Scott-B07421
 Subject: qemu_power_down_request handling
 
 Hi All,
 
 The -watchdog-action shutdown uses the monitor system_shutdown mechanism but
 this does not actually powerdown the system.
 
 In code, If there is pending powerdown_request 
 (qemu_system_powerdown_request()
 called) then it try to raise an irq. What is that irq supposed to do. 
 Currently
 it in NULL pointer so does not do anything.
 
 Code snapshot -- vl.c : main_loop_should_exit()
 
 
 if (qemu_powerdown_requested()) {
 monitor_protocol_event(QEVENT_POWERDOWN, NULL);
 qemu_irq_raise(qemu_system_powerdown);
 }
 . . . .
 
 Is the powerdown_request handling not complete?
 
 Thanks
 -Bharat
 




[Qemu-devel] qemu_power_down_request handling

2012-04-16 Thread Bhushan Bharat-R65777
Hi All,

The -watchdog-action shutdown uses the monitor system_shutdown mechanism but 
this does not actually powerdown the system. 

In code, If there is pending powerdown_request (qemu_system_powerdown_request() 
called) then it try to raise an irq. What is that irq supposed to do. Currently 
it in NULL pointer so does not do anything.

Code snapshot -- vl.c : main_loop_should_exit()


if (qemu_powerdown_requested()) {
monitor_protocol_event(QEVENT_POWERDOWN, NULL);
qemu_irq_raise(qemu_system_powerdown);
}
. . . .

Is the powerdown_request handling not complete?

Thanks
-Bharat 





Re: [Qemu-devel] qemu_power_down_request handling

2012-04-16 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, April 16, 2012 1:47 PM
 To: Bhushan Bharat-R65777
 Cc: qemu-devel@nongnu.org; Wood Scott-B07421
 Subject: Re: qemu_power_down_request handling
 
 
 On 16.04.2012, at 10:07, Bhushan Bharat-R65777 wrote:
 
  Hi All,
 
  The -watchdog-action shutdown uses the monitor system_shutdown mechanism 
  but
 this does not actually powerdown the system.
 
  In code, If there is pending powerdown_request
 (qemu_system_powerdown_request() called) then it try to raise an irq. What is
 that irq supposed to do. Currently it in NULL pointer so does not do anything.
 
  Code snapshot -- vl.c : main_loop_should_exit()
 
  
 if (qemu_powerdown_requested()) {
 monitor_protocol_event(QEVENT_POWERDOWN, NULL);
 qemu_irq_raise(qemu_system_powerdown);
 }
  . . . .
 
  Is the powerdown_request handling not complete?
 
 The watchdog shutdown event is a soft power off. Imagine a PC where you press
 the power button, the OS gets notified through ACPI that it's supposed to shut
 down and shuts down the machine.
 
 I don't know of any BookE system that has a comparable mechanism. We could
 implement it through qemu-ga and fall back to hard power off maybe?

Till someone develop this interface, does it make sense to use 
qemu_system_shutdown_request() rather than qemu_system_powerdown_request() for 
-watchdog-action shutdown ?

Thanks
-Bharat




Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework

2011-12-02 Thread Bhushan Bharat-R65777


 -Original Message-
 From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
 Behalf Of Stuart Yoder
 Sent: Friday, December 02, 2011 8:11 PM
 To: Alex Williamson
 Cc: Alexey Kardashevskiy; aafab...@cisco.com; k...@vger.kernel.org;
 p...@au1.ibm.com; qemu-devel@nongnu.org; joerg.roe...@amd.com;
 konrad.w...@oracle.com; ag...@suse.de; d...@au1.ibm.com; chrisw@sous-
 sol.org; Yoder Stuart-B08248; io...@lists.linux-foundation.org;
 a...@redhat.com; linux-...@vger.kernel.org; Wood Scott-B07421;
 be...@cisco.com
 Subject: Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework
 
 On Thu, Dec 1, 2011 at 3:25 PM, Alex Williamson
 alex.william...@redhat.com wrote:
  On Thu, 2011-12-01 at 14:58 -0600, Stuart Yoder wrote:
  One other mechanism we need as well is the ability to enable/disable
  a domain.
 
  For example-- suppose a device is assigned to a VM, the device is in
  use when the VM is abruptly terminated.  The VM terminate would shut
  off DMA at the IOMMU, but now the device is in an indeterminate
  state.   Some devices have no simple reset bit and getting the device
  back into a sane state could be complicated-- something the
  hypervisor doesn't want to do.
 
  So now KVM restarts the VM, vfio init happens for the device and  the
  IOMMU for that device is re-configured, etc, but we really can't
  re-enable DMA until the guest OS tells us (via an hcall) that it is
  ready.   The guest needs to get the assigned device in a sane state
  before DMA is enabled.
 
  Giant red flag.  We need to paravirtualize the guest?  Not on x86.
 
 It's the reality we have to deal with, but doing this would obviously
 only apply to platforms that need it.
 
  Some
  devices are better for assignment than others.  PCI devices are moving
  towards supporting standard reset mechanisms.
 
  Does this warrant a new domain enable/disable API, or should we make
  this part of the setup API we are discussing here?
 
  What's wrong with simply not adding any DMA mapping entries until you
  think your guest is ready?  Isn't that effectively the same thing?
  Unmap ~= disable.  If the IOMMU API had a mechanism to toggle the
  iommu domain on and off, I wouldn't be opposed to adding an ioctl to
  do it, but it really seems like just a shortcut vs map/unmap.  Thanks,
 
 Yes, we could do something like that I guess.

How do we determine whether guest is ready or not? There can be multiple device 
get ready at different time.
Further if guest have given the device to it user level process or its guest. 
Should not there be some mechanism for a guest to enable/disable on per device 
or group?

Thanks
-Bharat






Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework

2011-12-02 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Wood Scott-B07421
 Sent: Friday, December 02, 2011 11:57 PM
 To: Bhushan Bharat-R65777
 Cc: Stuart Yoder; Alex Williamson; Alexey Kardashevskiy;
 aafab...@cisco.com; k...@vger.kernel.org; p...@au1.ibm.com; qemu-
 de...@nongnu.org; joerg.roe...@amd.com; konrad.w...@oracle.com;
 ag...@suse.de; d...@au1.ibm.com; chr...@sous-sol.org; Yoder Stuart-B08248;
 io...@lists.linux-foundation.org; a...@redhat.com; linux-
 p...@vger.kernel.org; Wood Scott-B07421; be...@cisco.com
 Subject: Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework
 
 On 12/02/2011 12:11 PM, Bhushan Bharat-R65777 wrote:
  How do we determine whether guest is ready or not? There can be
 multiple device get ready at different time.
 
 The guest makes a hypercall with a device handle -- at least that's how
 we do it in Topaz.
 
  Further if guest have given the device to it user level process or its
 guest. Should not there be some mechanism for a guest to enable/disable
 on per device or group?
 
 Yes, the same mechanism can be used for that -- though in that case we'll
 also want the ability for the guest to be able to control another layer
 of mapping (which will get quite tricky with PAMU's limitations).
  This isn't really VFIO's concern, though (unless you're talking about
 the VFIO implementation in the guest).

Scott, I am not sure if there is any real use case where device needed to 
assigned beyond 2 level (host + immediate guest) in nested virtualization.

But if there any exists then will not it be better to virtualizes the iommu 
(PAMU for Freescale)?

Thanks
-Bharat 






Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework

2011-12-02 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Wood Scott-B07421
 Sent: Friday, December 02, 2011 11:57 PM
 To: Bhushan Bharat-R65777
 Cc: Stuart Yoder; Alex Williamson; Alexey Kardashevskiy;
 aafab...@cisco.com; k...@vger.kernel.org; p...@au1.ibm.com; qemu-
 de...@nongnu.org; joerg.roe...@amd.com; konrad.w...@oracle.com;
 ag...@suse.de; d...@au1.ibm.com; chr...@sous-sol.org; Yoder Stuart-B08248;
 io...@lists.linux-foundation.org; a...@redhat.com; linux-
 p...@vger.kernel.org; Wood Scott-B07421; be...@cisco.com
 Subject: Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework
 
 On 12/02/2011 12:11 PM, Bhushan Bharat-R65777 wrote:
  How do we determine whether guest is ready or not? There can be
 multiple device get ready at different time.
 
 The guest makes a hypercall with a device handle -- at least that's how
 we do it in Topaz.

Yes, it is ok from hcall with device handle perspective.
But I could not understood how cleanly this can be handled with the idea of 
enabling iommu when guest is ready.

Thanks
-Bharat

 
  Further if guest have given the device to it user level process or its
 guest. Should not there be some mechanism for a guest to enable/disable
 on per device or group?
 
 Yes, the same mechanism can be used for that -- though in that case we'll
 also want the ability for the guest to be able to control another layer
 of mapping (which will get quite tricky with PAMU's limitations).
  This isn't really VFIO's concern, though (unless you're talking about
 the VFIO implementation in the guest).

May be thinking too ahead, but will not something like this will be needed for 
nested virtualization?

Thanks
-Bharat




[Qemu-devel] Query: gdbstub software breakpoint removal

2011-03-21 Thread Bhushan Bharat-R65777
Hi All,

I can see that the software breakpoint queue is cleared in 
kvm_remove_breakpoint() in kvm-all.c.  I.e 
QTAILQ_REMOVE(current_env-kvm_state-kvm_sw_breakpoints, bp, entry); is 
called when the breakpoint is cleared.

While the queue is not cleared on kvm_remove_all_breakpoints();
Is there any specific reason for that?


Thanks
-Bharat



[Qemu-devel] RE: Query: gdbstub software breakpoint removal

2011-03-21 Thread Bhushan Bharat-R65777
Hi Jan,

I will send a patch.

Thanks
-Bharat

 -Original Message-
 From: Jan Kiszka [mailto:jan.kis...@siemens.com]
 Sent: Monday, March 21, 2011 5:04 PM
 To: Bhushan Bharat-R65777
 Cc: qemu-devel@nongnu.org
 Subject: Re: Query: gdbstub software breakpoint removal
 
 On 2011-03-21 12:01, Bhushan Bharat-R65777 wrote:
  Hi All,
 
  I can see that the software breakpoint queue is cleared in
 kvm_remove_breakpoint() in kvm-all.c.  I.e QTAILQ_REMOVE(current_env-
 kvm_state-kvm_sw_breakpoints, bp, entry); is called when the breakpoint
 is cleared.
 
  While the queue is not cleared on kvm_remove_all_breakpoints(); Is
  there any specific reason for that?
 
 Hmm, no particular reason that I recall. This rather looks like a very
 old leak. Patch welcome.
 
 Jan
 
 --
 Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence
 Center Embedded Linux