Re: [Xen-devel] Backport request for tools/hotplug: set mtu from bridge for tap interface

2014-11-19 Thread Daniel Kiper
On Tue, Nov 18, 2014 at 06:24:31PM +, Ian Jackson wrote:
 Daniel Kiper writes (Re: [Xen-devel] delaying 4.4.2 and 4.3.4):
  By the way, what should I do to get commit
  f3f5f1927f0d3aef9e3d2ce554dbfa0de73487d5
  (tools/hotplug: set mtu from bridge for tap interface) into at least Xen 4.3?
  I have been asking about this for more than five months. This patch fixes a real bug.

 I don't seem to be able to find these mails from you but my mailbox is
 very big.  The normal thing ought to be for you to post a backport
 request and CC the stable tools maintainer (ie me).  I'm sorry if I
 dropped your message.

 The patch looks reasonable to backport.  I have put it on my list for
 backporting later.  I'll wait a bit to see if anyone objects.
 (I have also CC'd the patch's original author and also Ian C because
 he acked it for unstable.)

 Does it apply cleanly to 4.3 and 4.4?  I haven't checked.  Daniel, if
 you could check that, that would be helpful.  If it doesn't then the
 normal process would be for the backport requestor (ie you) to post
 the revised patch against 4.3 and/or 4.4.

4.4 and later already have this patch; 4.3 and earlier do not.
It can be cherry-picked to 4.3 and 4.2 without any issues.

Daniel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] fix restore: xenstore entries left when restore failed

2014-11-19 Thread Ian Campbell
On Wed, 2014-11-19 at 15:12 +0800, Chunyan Liu wrote:
 While running the libvirt-tck domain/102-broken-save-restore.t
 test (save the domain, corrupt the saved file by truncating the last 512k,
 then restore), I found that restoring the domain failed, but the
 domain-related xenstore entries still existed in xenstore.
 
 Add a patch to clear the xenstore entries in this case.
 
 Signed-off-by: Chunyan Liu cy...@suse.com
 ---
  tools/libxl/libxl.c | 52 
  1 file changed, 52 insertions(+)
 
 diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
 index de23fec..447840d 100644
 --- a/tools/libxl/libxl.c
 +++ b/tools/libxl/libxl.c
 @@ -1525,6 +1525,54 @@ static void devices_destroy_cb(libxl__egc *egc,
 libxl__devices_remove_state *drs,
 int rc);
  
 +static void libxl_clear_xs_entry(libxl__gc *gc, uint32_t domid)

This seems to duplicate a bunch of stuff already done by
libxl__destroy_domid and libxl__devices_destroy etc. I think rather than
duplicating this libxl__destroy_domid should be made to Do The Right
Thing by trying to clean up any remnants of the domid even if it doesn't
currently exist.

Alternatively perhaps the real bug is in the error path of the restore
functionality, which isn't calling the correct unwind path. Ian, any
thoughts?

Ian.

 +const char *dom_path, *vm_path;
 +char *path;
 +unsigned int num_kinds, num_dev_xsentries;
 +char **kinds = NULL, **devs = NULL;
 +int i, j;
 +
 +/* remove libxl path */
 +libxl__xs_rm_checked(gc, XBT_NULL, libxl__xs_libxl_path(gc, domid));
 +
 +dom_path = libxl__xs_get_dompath(gc, domid);
 +if (!dom_path)
 +return;
 +
 +/* remove backend entries */
 +path = GCSPRINTF("%s/device", dom_path);
 +kinds = libxl__xs_directory(gc, XBT_NULL, path, &num_kinds);
 +if (kinds && num_kinds) {
 +for (i = 0; i < num_kinds; i++) {
 +path = GCSPRINTF("%s/device/%s", dom_path, kinds[i]);
 +devs = libxl__xs_directory(gc, XBT_NULL, path, 
 &num_dev_xsentries);
 +if (!devs)
 +continue;
 +for (j = 0; j < num_dev_xsentries; j++) {
 +path = GCSPRINTF("%s/device/%s/%s/backend",
 + dom_path, kinds[i], devs[j]);
 +path = libxl__xs_read(gc, XBT_NULL, path);
 +if (path)
 +libxl__xs_rm_checked(gc, XBT_NULL, path);
 +}
 +}
 +}
 +
 +path = GCSPRINTF("%s/console/backend", dom_path);
 +path = libxl__xs_read(gc, XBT_NULL, path);
 +if (path)
 +libxl__xs_rm_checked(gc, XBT_NULL, path);
 +
 +/* remove vm path */
 +vm_path = libxl__xs_read(gc, XBT_NULL, GCSPRINTF("%s/vm", dom_path));
 +if (vm_path)
 +libxl__xs_rm_checked(gc, XBT_NULL, vm_path);
 +
 +/* remove dom path */
 +libxl__xs_rm_checked(gc, XBT_NULL, dom_path);
 +}
 +
  void libxl__destroy_domid(libxl__egc *egc, libxl__destroy_domid_state *dis)
  {
  STATE_AO_GC(dis->ao);
 @@ -1540,6 +1588,10 @@ void libxl__destroy_domid(libxl__egc *egc, 
 libxl__destroy_domid_state *dis)
  break;
  case ERROR_INVAL:
  LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "non-existant domain %d", domid);
 +/* domain may not started successfully but some xenstore entries
 + * might be created already in earlier stage. We need to clear
 + * those entries. */
 +libxl_clear_xs_entry(gc, domid);
  default:
  goto out;
  }





Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
Hi Stefano,

Thank you for your support.

You are right - with the latest change you proposed I get continuous
prints during the platform hang:

(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0

Looks like the issue needs deeper debugging.

Regards,
Andrii

On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:
 Hello Andrii,
 we are getting closer :-)

 It would help if you post the output with GIC_DEBUG defined but without
 the other change that fixes the issue.

 I think the problem is probably due to software irqs.
 You are getting too many

 gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending

 messages. That means you are losing virtual SGIs (guest VCPU to guest
 VCPU). It would be best to investigate why, especially if you get many
 more of the same messages without the MAINTENANCE_IRQ change I
 suggested.

 This patch might also help in understanding the problem better:


 diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
 index b7516c0..5eaeca2 100644
 --- a/xen/arch/arm/gic.c
 +++ b/xen/arch/arm/gic.c
 @@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
  list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
  {
  i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
 -if ( i >= nr_lrs ) return;
 +if ( i >= nr_lrs )
 +{
 +gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n",
 +p->irq, v->domain->domain_id, v->vcpu_id);
 +continue;
 +}

  spin_lock_irqsave(&gic.lock, flags);
  gic_set_lr(i, p, GICH_LR_PENDING);
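
The "LRs full" message is printed when no list register (LR) is free for a queued interrupt. As a rough illustration only, the following self-contained C sketch models that allocation with a plain bitmask. The names `NR_LRS`, `first_zero_bit` and `restore_pending` are hypothetical stand-ins, not Xen's code, and the sketch stops at the first failure, as the unpatched function does.

```c
#include <assert.h>
#include <stdint.h>

#define NR_LRS 4  /* hypothetical count; real hardware reports its own */

/* Return the index of the first clear bit below nr, or nr if all are set. */
unsigned int first_zero_bit(uint32_t mask, unsigned int nr)
{
    assert(nr <= 32);
    for (unsigned int i = 0; i < nr; i++)
        if (!(mask & (UINT32_C(1) << i)))
            return i;
    return nr;
}

/*
 * Try to place `count` queued interrupts into free list registers.
 * Returns how many were placed; the remainder stay queued.
 */
unsigned int restore_pending(uint32_t *lr_mask, unsigned int count)
{
    unsigned int injected = 0;

    assert(lr_mask != 0);
    for (unsigned int n = 0; n < count; n++) {
        unsigned int i = first_zero_bit(*lr_mask, NR_LRS);

        if (i >= NR_LRS)
            break;                      /* every LR is busy: "LRs full" */
        *lr_mask |= UINT32_C(1) << i;   /* mark the LR as in use */
        injected++;
    }
    return injected;
}
```

With two of four registers already in use, a call asking for five injections places two and leaves three queued, which is the situation the debug print reports.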




 On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
 Hi Stefano,

 No hangs with this change.
 Complete log is the following:

 U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26)
 DRA752 ES1.0
 ethaddr not set. Validating first E-fuse MAC
 cpsw
 - UART enabled -
 - CPU  booting -
 - Xen starting in Hyp mode -
 - Zero BSS -
 - Setting up control registers -
 - Turning on paging -
 - Ready -
 (XEN) Checking for initrd in /chosen
 (XEN) RAM: 8000 - 9fff
 (XEN) RAM: a000 - bfff
 (XEN) RAM: c000 - dfff
 (XEN)
 (XEN) MODULE[1]: c200 - c20069aa
 (XEN) MODULE[2]: c000 - c200
 (XEN) MODULE[3]:  - 
 (XEN) MODULE[4]: c300 - c301
 (XEN)  RESVD[0]: ba30 - bfd0
 (XEN)  RESVD[1]: 9580 - 9590
 (XEN)  RESVD[2]: 98a0 - 98b0
 (XEN)  RESVD[3]: 95f0 - 98a0
 (XEN)  RESVD[4]: 9590 - 95f0
 (XEN)
 (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0
 dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1
 (XEN) Placing Xen at 0xdfe0-0xe000
 (XEN) Xen heap: d200-de00 (49152 pages)
 (XEN) Dom heap: 344064 pages
 (XEN) Domain heap initialised
 (XEN) Looking for UART console serial0
  Xen 4.5-unstable
 (XEN) Xen version 4.5-unstable (atseglytskyi@)
 (arm-linux-gnueabihf-gcc (crosstool-NG
 linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3
 20130328 (prerelease)) debu4
 (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty
 (XEN) Processor: 412fc0f2: ARM Limited, variant: 0x2, part 0xc0f, rev 0x2
 (XEN) 32-bit Execution:
 (XEN)   Processor Features: 1131:00011011
 (XEN) Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle
 (XEN) Extensions: GenericTimer Security
 (XEN)   Debug Features: 02010555
 (XEN)   Auxiliary Features: 
 (XEN)   Memory Model Features: 10201105 2000 0124 02102211
 (XEN)  ISA Features: 02101110 13112111 21232041 2131 10011142 
 (XEN) Platform: TI DRA7
 (XEN) /psci method must be smc, but is: hvc
 (XEN) Set AuxCoreBoot1 to dfe0004c (0020004c)
 (XEN) Set AuxCoreBoot0 to 0x20
 (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27
 (XEN) Using generic timer at 6144 KHz
 (XEN) GIC initialization:
 (XEN) gic_dist_addr=48211000
 (XEN) gic_cpu_addr=48212000
 (XEN) gic_hyp_addr=48214000
 (XEN) gic_vcpu_addr=48216000
 (XEN) gic_maintenance_irq=25
 (XEN) GIC: 192 lines, 2 cpus, secure (IID 043b).
 (XEN) Using scheduler: SMP Credit Scheduler (credit)
 (XEN) I/O virtualisation disabled
 (XEN) Allocated console ring of 16 KiB.
 (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
 (XEN) Bringing up CPU1
 - CPU 

Re: [Xen-devel] [v7][RFC][PATCH 06/13] hvmloader/ram: check if guest memory is out of reserved device memory maps

2014-11-19 Thread Tian, Kevin
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Wednesday, November 12, 2014 5:57 PM
 
  On 12.11.14 at 10:13, tiejun.c...@intel.com wrote:
  On 2014/11/12 17:02, Jan Beulich wrote:
  On 12.11.14 at 09:45, tiejun.c...@intel.com wrote:
  #2 flags field in each specific device of new domctl would control
  whether this device need to check/reserve its own RMRR range. But its
  not dependent on current device assignment domctl, so the user can
 use
  them to control which devices need to work as hotplug later, separately.
 
  And this could be left as a second step, in order for what needs to
  be done now to not get more complicated than necessary.
 
 
  Do you mean currently we still rely on the device assignment domctl to
  provide SBDF? So looks nothing should be changed in our policy.
 
  I can't connect your question to what I said. What I tried to tell you
 
  I think I am misunderstanding something.
 
  was that I don't currently see a need to make this overly complicated:
  Having the option to punch holes for all devices and (by default)
  dealing with just the devices assigned at boot may be sufficient as a
  first step. Yet (repeating just to avoid any misunderstanding) that
  makes things easier only if we decide to require device assignment to
  happen before memory getting populated (since in that case there's
 
  Here what do you mean, 'if we decide to require device assignment to
  happen before memory getting populated'?
 
  Because -quote-
  
  At present, device assignment always happens after memory population.
  And as I mentioned previously, I double-checked this sequence with printk.
  
 
  Or do you already plan, or have you decided, to change this sequence?
 
 So it is now the 3rd time that I'm telling you that part of your
 decision making as to which route to follow should be to
 re-consider whether the current sequence of operations shouldn't
 be changed. Please also consult with the VT-d maintainers (hint to
 them: participating in this discussion publicly would be really nice)
 on _all_ decisions to be made here.
 

No decision has been made privately; we want all of the discussion to happen in public.
We will get back with our thoughts soon.

Thanks
Kevin



Re: [Xen-devel] [PATCH for-4.5 2/4] xen: arm: correct off by one in xgene-storm's map_one_mmio

2014-11-19 Thread Ian Campbell
On Tue, 2014-11-18 at 17:01 +, Julien Grall wrote:
 Hi Ian,
 
 On 11/18/2014 04:44 PM, Ian Campbell wrote:
  The callers pass the end as the pfn immediately *after* the last page to be
  mapped, therefore adding one is incorrect and causes an additional page to 
  be
  mapped.
  
  At the same time correct the printing of the mfn values, zero-padding them 
  to
  16 digits as for a paddr when they are frame numbers is just confusing.
  
  Signed-off-by: Ian Campbell ian.campb...@citrix.com
  ---
   xen/arch/arm/platforms/xgene-storm.c |4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)
  
  diff --git a/xen/arch/arm/platforms/xgene-storm.c 
  b/xen/arch/arm/platforms/xgene-storm.c
  index 29c4752..38674cd 100644
  --- a/xen/arch/arm/platforms/xgene-storm.c
  +++ b/xen/arch/arm/platforms/xgene-storm.c
  @@ -45,9 +45,9 @@ static int map_one_mmio(struct domain *d, const char 
  *what,
   {
   int ret;
   
  -printk("Additional MMIO %"PRIpaddr"-%"PRIpaddr" (%s)\n",
  +printk("Additional MMIO %lx-%lx (%s)\n",
  start, end, what);
  -ret = map_mmio_regions(d, start, end - start + 1, start);
  +ret = map_mmio_regions(d, start, end - start, start);
   if ( ret )
   printk("Failed to map %s @ %"PRIpaddr" to dom%d\n",
  what, start, d->domain_id);
 
 As you fixed the previous printf format, I would fix this one too.

Yes, good idea.

Ian.
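
The off-by-one above comes down to range conventions: when `end` is the frame immediately after the last one to map, the range is half-open and the count is simply `end - start`. A minimal sketch, where `nr_frames` is a hypothetical helper and not part of the patch:

```c
#include <assert.h>

/* Number of frames in the half-open range [start, end). */
unsigned long nr_frames(unsigned long start, unsigned long end)
{
    assert(end >= start);
    /* No "+ 1": end already points one past the last frame. */
    return end - start;
}
```

Adding one would only be right for an inclusive range, where `end` names the last frame itself.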




Re: [Xen-devel] [PATCH] libxc: Expose the pdpe1gb cpuid flag to guest

2014-11-19 Thread Tim Deegan
At 01:29 + on 19 Nov (1416356943), Zhang, Yang Z wrote:
 Tim Deegan wrote on 2014-11-18:
  In this case, the guest is entitled to _expect_ pagefaults on 1GB
  mappings if CPUID claims they are not supported.  That sounds like an
  unlikely thing for the guest to be relying on, but Xen itself does
  something similar for the SHOPT_FAST_FAULT_PATH (and now also for
  IOMMU entries for the deferred caching attribute updates).
 
 Indeed. How about adding the software check (as Andrew mentioned)
 firstly and leave the hardware problem (Actually, I don't think we
 can solve it currently).

I don't think we should change the software path unless we can change
the hardware behaviour too.  It's better to be consistent, and it
saves us some cycles in the pt walker.

Cheers,

Tim.



Re: [Xen-devel] [PATCH for-4.5 1/4] xen: arm: Add earlyprintk for McDivitt.

2014-11-19 Thread Ian Campbell
On Tue, 2014-11-18 at 16:59 +, Julien Grall wrote:
 Hi Ian,
 
 On 11/18/2014 04:44 PM, Ian Campbell wrote:
  Signed-off-by: Ian Campbell ian.campb...@citrix.com
  ---
   xen/arch/arm/Rules.mk |6 ++
   1 file changed, 6 insertions(+)
  
  diff --git a/xen/arch/arm/Rules.mk b/xen/arch/arm/Rules.mk
  index 572d854..ef887a5 100644
  --- a/xen/arch/arm/Rules.mk
  +++ b/xen/arch/arm/Rules.mk
  @@ -95,6 +95,12 @@ EARLY_PRINTK_BAUD := 115200
   EARLY_UART_BASE_ADDRESS := 0x1c02
   EARLY_UART_REG_SHIFT := 2
   endif
  +ifeq ($(CONFIG_EARLY_PRINTK), xgene-mcdivitt)
  +EARLY_PRINTK_INC := 8250
  +EARLY_PRINTK_BAUD := 9600
 
 EARLY_PRINTK_BAUD is not necessary as we don't use the initialization
 function (EARLY_PRINTK_INIT_UART is not set).

Oh yes, oops. Also the baud is not even what is actually used, so it's
not even serving a documentary purpose.

 With the EARLY_PRINTK_BAUD dropped, this could be merged with the
 xgene-storm  early printk

It's at a different base address. Long term I either want to make this
(somewhat) runtime configurable or at least to rationalise the options
into the form soc/soc-family-uartN, or perhaps even 8250|pl011|
etc@address[,ratesettings], if it's not to skanky to arrange to
parse that somewhere in the build system. Not for 4.5 though.

 (I didn't really understand why the baud rate
 is different).

Different hardware might potentially have different baud rates
configured in firmware which we would want to seamlessly follow, but
it's moot since the right thing to do in most cases is leave the
bootloader provided cfg alone.

 But I don't think it's 4.5 material.

You mean the patch generally or the merging?

Ian.




Re: [Xen-devel] [PATCH for-4.5 4/4] xen: arm: Support the other 4 PCI buses on Xgene

2014-11-19 Thread Ian Campbell
On Tue, 2014-11-18 at 17:15 +, Julien Grall wrote:
  +default:
  +/* Ignore unknown PCI busses */
 
 I would add a
 printk("Ignoring PCI busses %s\n", dt_node_full_name(dev));
 
  +ret = 0;
  +break;
 
 continue?

Yes, that makes sense (probably the ret = is then unnecessary).

  You can't assume the order of the PCI busses in the device tree.

But, I don't understand what this has to do with using continue.

 
  +}
  +
  +if ( ret < 0 )
  +return ret;
  +
  +printk("Mapped additional regions for PCIe device at 0x%"PRIx64"\n",
  +   addr);
 
 Printing the device tree path would be more helpful than the address.

OK.




Re: [Xen-devel] [PATCH for-4.5 4/4] xen: arm: Support the other 4 PCI buses on Xgene

2014-11-19 Thread Julien Grall

Hi Ian,

On 19/11/2014 09:56, Ian Campbell wrote:

On Tue, 2014-11-18 at 17:15 +, Julien Grall wrote:

+default:
+/* Ignore unknown PCI busses */


I would add a
printk("Ignoring PCI busses %s\n", dt_node_full_name(dev));


+ret = 0;
+break;


continue?


Yes, that makes sense (probably the ret = is then unnecessary).


  You can't assume the order of the PCI busses in the device tree.


But, I don't understand what this has to do with using continue.


The current xgene-storm DTS has the different PCI busses in order. So as 
soon as you don't find the PCI range, it means there are no more PCI busses.


Without the continue, this patch gives the impression that you rely on 
the node order on the device tree.


Regards,

--
Julien Grall



[Xen-devel] Strangeness in generated xen-command-line.html

2014-11-19 Thread Ian Campbell
http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html has a
bunch of random sha id's in it, where the 4.4-testing version does not.

They seem to have replaced the various
`= boolean`

Default: `true`

Bits.

Andy, Any thoughts or should I investigate?

I don't see anything since 4.4 touching the html generation itself (we
added pandoc for pdf but didn't touch HTML afaict).

Ian.




Re: [Xen-devel] [PATCH for-4.5 4/4] xen: arm: Support the other 4 PCI buses on Xgene

2014-11-19 Thread Ian Campbell
On Wed, 2014-11-19 at 10:06 +, Julien Grall wrote:
 Hi Ian,
 
 On 19/11/2014 09:56, Ian Campbell wrote:
  On Tue, 2014-11-18 at 17:15 +, Julien Grall wrote:
  +default:
  +/* Ignore unknown PCI busses */
 
  I would add a
  printk("Ignoring PCI busses %s\n", dt_node_full_name(dev));
 
  +ret = 0;
  +break;
 
  continue?
 
  Yes, that makes sense (probably the ret = is then unnecessary).
 
You can't assume the order of the PCI busses in the device tree.
 
  But, I don't understand what this has to do with using continue.
 
 The current xgene-storm DTS has the different PCI busses ordered. So as 
 soon as you don't find the PCI range, it means there is no more PCI busses.

I don't think it does, the patch iterates over all of the buses, even
ones we don't understand, we don't give up at the first one we don't
grok.

 Without the continue, this patch gives the impression that you rely on 
 the node order on the device tree.



 
 Regards,
 





Re: [Xen-devel] Strangeness in generated xen-command-line.html

2014-11-19 Thread Ian Campbell
On Wed, 2014-11-19 at 10:12 +, Ian Campbell wrote:
 http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html has a
 bunch of random sha id's in it, where the 4.4-testing version does not.
 
 They seem to have replaced the various
 `= boolean`
 
 Default: `true`
 
 Bits.
 
 Andy, Any thoughts or should I investigate?
 
 I don't see anything since 4.4 touching the html generation itself (we
 added pandoc for pdf but didn't touch HTML afaict).

FWIW it seems to happen from the conring_size entry onwards. The
com1,com2 entry and earlier ones are OK. I can't see anything about the
com1,com2 entry which would be causing this...

 
 Ian.
 
 





Re: [Xen-devel] Strangeness in generated xen-command-line.html

2014-11-19 Thread Andrew Cooper
On 19/11/14 10:12, Ian Campbell wrote:
 http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html has a
 bunch of random sha id's in it, where the 4.4-testing version does not.

 They seem to have replaced the various
 `= boolean`
 
 Default: `true`
 
 Bits.

 Andy, Any thoughts or should I investigate?

 I don't see anything since 4.4 touching the html generation itself (we
 added pandoc for pdf but didn't touch HTML afaict).

 Ian.


I have looked into it before but didn't get very far.  I suspect it
might be a bug in wheezy's markdown.  It doesn't reproduce when building
using other versions of markdown.

I had planned (given some non-existent free time) to see about
converting it from markdown to pandoc, which leads to a far more
nicely formatted document.

~Andrew



Re: [Xen-devel] [PATCH for-4.5 4/4] xen: arm: Support the other 4 PCI buses on Xgene

2014-11-19 Thread Julien Grall



On 19/11/2014 10:18, Ian Campbell wrote:

On Wed, 2014-11-19 at 10:06 +, Julien Grall wrote:

Hi Ian,

On 19/11/2014 09:56, Ian Campbell wrote:

On Tue, 2014-11-18 at 17:15 +, Julien Grall wrote:

+default:
+/* Ignore unknown PCI busses */


I would add a
printk("Ignoring PCI busses %s\n", dt_node_full_name(dev));


+ret = 0;
+break;


continue?


Yes, that makes sense (probably the ret = is then unnecessary).


   You can't assume the order of the PCI busses in the device tree.


But, I don't understand what this has to do with using continue.


The current xgene-storm DTS has the different PCI busses ordered. So as
soon as you don't find the PCI range, it means there is no more PCI busses.


I don't think it does, the patch iterates over all of the buses, even
ones we don't understand, we don't give up at the first one we don't
grok.


Hrmm, you are right. I don't know why I thought the break was bound to 
the loop and not the switch.
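
For reference, the C rule at issue: a `break` inside a `switch` leaves only the `switch`, so an enclosing loop carries on with its next iteration. A small sketch, where `bus_kind` and `count_visited` are hypothetical names and not taken from the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bus classification, for illustration only. */
enum bus_kind { BUS_KNOWN_A, BUS_KNOWN_B, BUS_UNKNOWN };

/*
 * Walk every entry, counting the ones that would be mapped.
 * Returns the number of entries visited by the loop.
 */
unsigned int count_visited(const enum bus_kind *buses, unsigned int n,
                           unsigned int *mapped)
{
    unsigned int visited = 0;

    assert(mapped != NULL);
    *mapped = 0;
    for (unsigned int i = 0; i < n; i++) {
        visited++;
        switch (buses[i]) {
        case BUS_KNOWN_A:
        case BUS_KNOWN_B:
            (*mapped)++;
            break;
        default:
            /* Leaves the switch only; the for loop continues. */
            break;
        }
    }
    return visited;
}
```

An unknown bus in the middle of the array is skipped and the later entries are still processed, so the result does not depend on node order.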


Sorry for the noise.

Regards,

--
Julien Grall



Re: [Xen-devel] Strangeness in generated xen-command-line.html

2014-11-19 Thread Andrew Cooper
On 19/11/14 10:30, Ian Campbell wrote:
 On Wed, 2014-11-19 at 10:24 +, Andrew Cooper wrote:
 On 19/11/14 10:12, Ian Campbell wrote:
 http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html has a
 bunch of random sha id's in it, where the 4.4-testing version does not.

 They seem to have replaced the various
 `= boolean`
 
 Default: `true`
 
 Bits.

 Andy, Any thoughts or should I investigate?

 I don't see anything since 4.4 touching the html generation itself (we
 added pandoc for pdf but didn't touch HTML afaict).

 Ian.

 I have looked into it before but didn't get very far.  I suspect it
 might be a bug in wheezy's markdown.  It doesn't reproduce when building
 using other versions of markdown.
 Right.

 It seems to be triggered by the line:
   `S` is an integer 1 or 2 for the number of stop bits.
 just removing that makes the issue go away. It's not the `s since
 removing just those retains the issue. WTAF!

 Ian.


So it does.  As best as I can tell, that is all legal markdown for a
nested block.

~Andrew



Re: [Xen-devel] Strangeness in generated xen-command-line.html

2014-11-19 Thread Ian Campbell
On Wed, 2014-11-19 at 10:38 +, Ian Campbell wrote:
 I've not been able to find a workaround...

This works for me...

8<---

From 3483179d333c47deacfc8c2eb195bf7dc4a555ff Mon Sep 17 00:00:00 2001
From: Ian Campbell ian.campb...@citrix.com
Date: Wed, 19 Nov 2014 10:42:18 +
Subject: [PATCH] docs: workaround markdown parser error in
 xen-command-line.markdown

Some versions of markdown (specifically the one in Debian Wheezy, currently
used to generate
http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html) seem to be
confused by nested lists in the middle of multi-paragraph parent list entries
as seen in the com1,com2 entry.

The effect is that the Default section of all following entries is replaced
by some sort of hash or checksum (at least, a string of 32 random-seeming hex
digits).

Work around this issue by making the descriptions of the DPS options a nested
list, moving the existing nested list describing the options for P into a third
level list. This seems to avoid the issue, and is arguably better formatting in
its own right (at least it's not a regression IMHO)

Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
 docs/misc/xen-command-line.markdown |   16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown 
b/docs/misc/xen-command-line.markdown
index 0830e5f..c40f89b 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -248,17 +248,17 @@ Both option `com1` and `com2` follow the same format.
 * `DPS` represents the number of data bits, the parity, and the number
   of stop bits.
 
-  `D` is an integer between 5 and 8 for the number of data bits.
+  * `D` is an integer between 5 and 8 for the number of data bits.
 
-  `P` is a single character representing the type of parity:
+  * `P` is a single character representing the type of parity:
 
-   * `n` No
-   * `o` Odd
-   * `e` Even
-   * `m` Mark
-   * `s` Space
+  * `n` No
+  * `o` Odd
+  * `e` Even
+  * `m` Mark
+  * `s` Space
 
-  `S` is an integer 1 or 2 for the number of stop bits.
+  * `S` is an integer 1 or 2 for the number of stop bits.
 
 * `io-base` is an integer which specifies the IO base port for UART
   registers.
-- 
1.7.10.4






Re: [Xen-devel] Strangeness in generated xen-command-line.html

2014-11-19 Thread Andrew Cooper
On 19/11/14 10:46, Ian Campbell wrote:
 On Wed, 2014-11-19 at 10:38 +, Ian Campbell wrote:
 I've not been able to find a workaround...
 This works for me...

 8---

 From 3483179d333c47deacfc8c2eb195bf7dc4a555ff Mon Sep 17 00:00:00 2001
 From: Ian Campbell ian.campb...@citrix.com
 Date: Wed, 19 Nov 2014 10:42:18 +
 Subject: [PATCH] docs: workaround markdown parser error in
  xen-command-line.markdown

 Some versions of markdown (specifically the one in Debian Wheezy, currently
 used to generate
 http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html) seem to be
 confused by nested lists in the middle of multi-paragraph parent list entries
 as seen in the com1,com2 entry.

 The effect is that the Default section of all following entries are replace
 by some sort of hash or checksum (at least, a string of 32 random seeming hex
 digits).

 Workaround this issue by making the decriptions of the DPS options a nested
 list, moving the existing nested list describing the options for S into a 
 third
 level list. This seems to avoid the issue, and is arguably better formatting 
 in
 its own right (at least its not a regression IMHO)

 Signed-off-by: Ian Campbell ian.campb...@citrix.com

I had just identified a different way, but this way is slightly better.

If you take out all the blank lines visible in the context below, the
resulting HTML will be correctly formatted and rather neater (i.e.
without sporadic blank lines).

~Andrew

 ---
  docs/misc/xen-command-line.markdown |   16 
  1 file changed, 8 insertions(+), 8 deletions(-)

 diff --git a/docs/misc/xen-command-line.markdown 
 b/docs/misc/xen-command-line.markdown
 index 0830e5f..c40f89b 100644
 --- a/docs/misc/xen-command-line.markdown
 +++ b/docs/misc/xen-command-line.markdown
 @@ -248,17 +248,17 @@ Both option `com1` and `com2` follow the same format.
  * `DPS` represents the number of data bits, the parity, and the number
of stop bits.
  
 -  `D` is an integer between 5 and 8 for the number of data bits.
 +  * `D` is an integer between 5 and 8 for the number of data bits.
  
 -  `P` is a single character representing the type of parity:
 +  * `P` is a single character representing the type of parity:
  
 -   * `n` No
 -   * `o` Odd
 -   * `e` Even
 -   * `m` Mark
 -   * `s` Space
 +  * `n` No
 +  * `o` Odd
 +  * `e` Even
 +  * `m` Mark
 +  * `s` Space
  
 -  `S` is an integer 1 or 2 for the number of stop bits.
 +  * `S` is an integer 1 or 2 for the number of stop bits.
  
  * `io-base` is an integer which specifies the IO base port for UART
registers.




Re: [Xen-devel] [BUGFIX][PATCH for 2.2 1/1] hw/ide/core.c: Prevent SIGSEGV during migration

2014-11-19 Thread Stefano Stabellini
ping?

On Tue, 18 Nov 2014, Stefano Stabellini wrote:
 Konrad,
 I think we should have this fix in Xen 4.5. Should I go ahead and
 backport it?
 
 On Mon, 17 Nov 2014, Don Slutz wrote:
  The other callers to blk_set_enable_write_cache() in this file
  already check for s->blk == NULL.
  
  Signed-off-by: Don Slutz dsl...@verizon.com
  ---
  
  I think this is a bugfix that should be back ported to stable
  releases.
  
  I also think this should be done in xen's copy of QEMU for 4.5 with
  back port(s) to active stable releases.
  
  Note: In 2.1 and earlier the routine is
  bdrv_set_enable_write_cache(); variable is s->bs.
  
   hw/ide/core.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
  
  diff --git a/hw/ide/core.c b/hw/ide/core.c
  index 00e21cf..d4af5e2 100644
  --- a/hw/ide/core.c
  +++ b/hw/ide/core.c
  @@ -2401,7 +2401,7 @@ static int ide_drive_post_load(void *opaque, int 
  version_id)
   {
   IDEState *s = opaque;
   
  -if (s->identify_set) {
  +if (s->blk && s->identify_set) {
   blk_set_enable_write_cache(s->blk, !!(s->identify_data[85] & (1 << 5)));
   }
   return 0;
  -- 
  1.8.4
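
The shape of the fix can be shown in isolation: test the backend pointer before touching it, so a drive with no backend is skipped instead of dereferencing NULL. A minimal sketch follows; `struct backend`, `struct ide_state` and `post_load` are simplified stand-ins invented for illustration and are not QEMU's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real types. */
struct backend {
    bool write_cache;
};

struct ide_state {
    struct backend *blk;            /* may be NULL */
    bool identify_set;
    uint16_t identify_data[256];
};

/* Mirrors the guarded post-load step: skip drives without a backend. */
int post_load(struct ide_state *s)
{
    assert(s != NULL);
    if (s->blk && s->identify_set) {
        /* Bit 5 of IDENTIFY word 85 reports "write cache enabled". */
        s->blk->write_cache = !!(s->identify_data[85] & (1 << 5));
    }
    return 0;
}
```

Without the `s->blk` test, the first branch would run for a drive whose `blk` is NULL, which is the crash the patch title describes.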
  
 



Re: [Xen-devel] Strangeness in generated xen-command-line.html

2014-11-19 Thread Ian Campbell
On Wed, 2014-11-19 at 10:52 +, Andrew Cooper wrote:
 On 19/11/14 10:46, Ian Campbell wrote:
  On Wed, 2014-11-19 at 10:38 +, Ian Campbell wrote:
  I've not been able to find a workaround...
  This works for me...
 
  8---
 
  From 3483179d333c47deacfc8c2eb195bf7dc4a555ff Mon Sep 17 00:00:00 2001
  From: Ian Campbell ian.campb...@citrix.com
  Date: Wed, 19 Nov 2014 10:42:18 +
  Subject: [PATCH] docs: workaround markdown parser error in
   xen-command-line.markdown
 
  Some versions of markdown (specifically the one in Debian Wheezy, currently
  used to generate
  http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html) seem to be
  confused by nested lists in the middle of multi-paragraph parent list 
  entries
  as seen in the com1,com2 entry.
 
  The effect is that the Default section of all following entries are 
  replace
  by some sort of hash or checksum (at least, a string of 32 random seeming 
  hex
  digits).
 
  Workaround this issue by making the decriptions of the DPS options a nested
  list, moving the existing nested list describing the options for S into a 
  third
  level list. This seems to avoid the issue, and is arguably better 
  formatting in
  its own right (at least its not a regression IMHO)
 
  Signed-off-by: Ian Campbell ian.campb...@citrix.com
 
 I had just identified a different way, but this way is slightly better.
 
 If you take out all the blank lines visible in the context below, the
 resulting HTML will be correctly formatted and rather neater (i.e.
 without sporadic blank lines).

Agreed.

8<--

From 53398a9729d391f1fb7b6f753a0032b1f3604d4d Mon Sep 17 00:00:00 2001
From: Ian Campbell ian.campb...@citrix.com
Date: Wed, 19 Nov 2014 10:42:18 +
Subject: [PATCH] docs: workaround markdown parser error in
 xen-command-line.markdown

Some versions of markdown (specifically the one in Debian Wheezy, currently
used to generate
http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html) seem to be
confused by nested lists in the middle of multi-paragraph parent list entries
as seen in the com1,com2 entry.

The effect is that the Default section of all following entries is replaced
by some sort of hash or checksum (at least, a string of 32 random seeming hex
digits).

Work around this issue by making the descriptions of the DPS options a nested
list, moving the existing nested list describing the options for S into a third
level list. This seems to avoid the issue, and is arguably better formatting in
its own right (at least it's not a regression IMHO)

Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
v2: Less blank lines == nicer output.
---
 docs/misc/xen-command-line.markdown |   21 -
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown 
b/docs/misc/xen-command-line.markdown
index 0830e5f..b7eaeea 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -247,19 +247,14 @@ Both option `com1` and `com2` follow the same format.
 * Optionally, a clock speed measured in hz can be specified.
 * `DPS` represents the number of data bits, the parity, and the number
   of stop bits.
-
-  `D` is an integer between 5 and 8 for the number of data bits.
-
-  `P` is a single character representing the type of parity:
-
-   * `n` No
-   * `o` Odd
-   * `e` Even
-   * `m` Mark
-   * `s` Space
-
-  `S` is an integer 1 or 2 for the number of stop bits.
-
+  * `D` is an integer between 5 and 8 for the number of data bits.
+  * `P` is a single character representing the type of parity:
+  * `n` No
+  * `o` Odd
+  * `e` Even
+  * `m` Mark
+  * `s` Space
+  * `S` is an integer 1 or 2 for the number of stop bits.
 * `io-base` is an integer which specifies the IO base port for UART
   registers.
 * `irq` is the IRQ number to use, or `0` to use the UART in poll
-- 
1.7.10.4
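
For readers unfamiliar with the `DPS` field documented in the hunk above, the following is a minimal sketch of how such a triple could be validated. It is not Xen's parser: the `struct dps` type and the `parse_dps` function are hypothetical names made up for illustration, and only the ranges quoted in the documentation (D in 5..8, P in n/o/e/m/s, S in 1..2) are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for a parsed DPS triple; this is not a Xen type. */
struct dps {
    int data_bits;   /* D: 5 to 8 */
    char parity;     /* P: one of n, o, e, m, s */
    int stop_bits;   /* S: 1 or 2 */
};

/*
 * Parse a three-character DPS string such as "8n1".
 * Returns true and fills *out on success, false if the string has the
 * wrong length or any field is outside the documented range.
 */
bool parse_dps(const char *s, struct dps *out)
{
    assert(out != NULL);

    if (s == NULL || strlen(s) != 3)
        return false;
    if (s[0] < '5' || s[0] > '8')
        return false;
    if (strchr("noems", s[1]) == NULL)
        return false;
    if (s[2] != '1' && s[2] != '2')
        return false;

    out->data_bits = s[0] - '0';
    out->parity = s[1];
    out->stop_bits = s[2] - '0';
    return true;
}
```

With this sketch, "8n1" is accepted as 8 data bits, no parity and 1 stop bit, while strings such as "9n1" or "8n3" are rejected.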






Re: [Xen-devel] Strangeness in generated xen-command-line.html

2014-11-19 Thread Andrew Cooper
On 19/11/14 11:04, Ian Campbell wrote:
 On Wed, 2014-11-19 at 10:52 +, Andrew Cooper wrote:
 On 19/11/14 10:46, Ian Campbell wrote:
 On Wed, 2014-11-19 at 10:38 +, Ian Campbell wrote:
 I've not been able to find a workaround...
 This works for me...

 8---

 From 3483179d333c47deacfc8c2eb195bf7dc4a555ff Mon Sep 17 00:00:00 2001
 From: Ian Campbell ian.campb...@citrix.com
 Date: Wed, 19 Nov 2014 10:42:18 +
 Subject: [PATCH] docs: workaround markdown parser error in
  xen-command-line.markdown

 Some versions of markdown (specifically the one in Debian Wheezy, currently
 used to generate
 http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html) seem to be
 confused by nested lists in the middle of multi-paragraph parent list 
 entries
 as seen in the com1,com2 entry.

 The effect is that the Default section of all following entries is
 replaced by some sort of hash or checksum (at least, a string of 32
 random seeming hex digits).

 Work around this issue by making the descriptions of the DPS options a
 nested list, moving the existing nested list describing the options for S
 into a third level list. This seems to avoid the issue, and is arguably
 better formatting in its own right (at least it's not a regression IMHO)

 Signed-off-by: Ian Campbell ian.campb...@citrix.com
 I had just identified a different way, but this way is slightly better.

 If you take out all the blank lines visible in the context below, the
 resulting HTML will be correctly formatted and rather neater (i.e.
 without sporadic blank lines).
 Agreed.

 8--

 From 53398a9729d391f1fb7b6f753a0032b1f3604d4d Mon Sep 17 00:00:00 2001
 From: Ian Campbell ian.campb...@citrix.com
 Date: Wed, 19 Nov 2014 10:42:18 +
 Subject: [PATCH] docs: workaround markdown parser error in
  xen-command-line.markdown

 Some versions of markdown (specifically the one in Debian Wheezy, currently
 used to generate
 http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html) seem to be
 confused by nested lists in the middle of multi-paragraph parent list entries
 as seen in the com1,com2 entry.

 The effect is that the Default section of all following entries is replaced
 by some sort of hash or checksum (at least, a string of 32 random seeming hex
 digits).

 Work around this issue by making the descriptions of the DPS options a
 nested list, moving the existing nested list describing the options for S
 into a third level list. This seems to avoid the issue, and is arguably
 better formatting in its own right (at least it's not a regression IMHO)

 Signed-off-by: Ian Campbell ian.campb...@citrix.com

Reviewed-by: Andrew Cooper andrew.coop...@citrix.com

 ---
 v2: Less blank lines == nicer output.
 ---
  docs/misc/xen-command-line.markdown |   21 -
  1 file changed, 8 insertions(+), 13 deletions(-)

 diff --git a/docs/misc/xen-command-line.markdown 
 b/docs/misc/xen-command-line.markdown
 index 0830e5f..b7eaeea 100644
 --- a/docs/misc/xen-command-line.markdown
 +++ b/docs/misc/xen-command-line.markdown
 @@ -247,19 +247,14 @@ Both option `com1` and `com2` follow the same format.
  * Optionally, a clock speed measured in hz can be specified.
  * `DPS` represents the number of data bits, the parity, and the number
of stop bits.
 -
 -  `D` is an integer between 5 and 8 for the number of data bits.
 -
 -  `P` is a single character representing the type of parity:
 -
 -   * `n` No
 -   * `o` Odd
 -   * `e` Even
 -   * `m` Mark
 -   * `s` Space
 -
 -  `S` is an integer 1 or 2 for the number of stop bits.
 -
 +  * `D` is an integer between 5 and 8 for the number of data bits.
 +  * `P` is a single character representing the type of parity:
 +  * `n` No
 +  * `o` Odd
 +  * `e` Even
 +  * `m` Mark
 +  * `s` Space
 +  * `S` is an integer 1 or 2 for the number of stop bits.
  * `io-base` is an integer which specifies the IO base port for UART
registers.
  * `irq` is the IRQ number to use, or `0` to use the UART in poll




Re: [Xen-devel] [BUGFIX][PATCH for 2.2 1/1] hw/ide/core.c: Prevent SIGSEGV during migration

2014-11-19 Thread Konrad Rzeszutek Wilk
On November 19, 2014 5:52:58 AM EST, Stefano Stabellini 
stefano.stabell...@eu.citrix.com wrote:
ping?

On Tue, 18 Nov 2014, Stefano Stabellini wrote:
 Konrad,
 I think we should have this fix in Xen 4.5. Should I go ahead and
 backport it?

Go for it. Release-Acked-by: Konrad Rzeszutek Wilk (konrad.w...@oracle.com)

 
 On Mon, 17 Nov 2014, Don Slutz wrote:
  The other callers to blk_set_enable_write_cache() in this file
  already check for s->blk == NULL.
  
  Signed-off-by: Don Slutz dsl...@verizon.com
  ---
  
  I think this is a bugfix that should be back ported to stable
  releases.
  
  I also think this should be done in xen's copy of QEMU for 4.5 with
  back port(s) to active stable releases.
  
  Note: In 2.1 and earlier the routine is
  bdrv_set_enable_write_cache(); the variable is s->bs.
  
   hw/ide/core.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
  
  diff --git a/hw/ide/core.c b/hw/ide/core.c
  index 00e21cf..d4af5e2 100644
  --- a/hw/ide/core.c
  +++ b/hw/ide/core.c
  @@ -2401,7 +2401,7 @@ static int ide_drive_post_load(void *opaque,
int version_id)
   {
   IDEState *s = opaque;
   
  -if (s->identify_set) {
  +if (s->blk && s->identify_set) {
   blk_set_enable_write_cache(s->blk, !!(s->identify_data[85] & (1 << 5)));
   }
   return 0;
  -- 
  1.8.4
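
The guard added by the patch above can be illustrated outside QEMU. This is only a sketch under stated assumptions: `fake_blk`, `fake_ide_state`, `fake_set_enable_write_cache` and `fake_post_load` are hypothetical stand-ins, not QEMU types or functions, and the backend pointer is assumed to be NULL when no drive backs the IDE slot. Only the condition and the bit test on `identify_data[85]` mirror the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a block backend; not QEMU's BlockBackend. */
struct fake_blk {
    bool write_cache;
};

/* Hypothetical stand-in for the parts of the IDE state the patch touches. */
struct fake_ide_state {
    struct fake_blk *blk;          /* assumed NULL when no drive is attached */
    bool identify_set;
    uint16_t identify_data[256];
};

/* Dereferences blk, so it must never be called with a NULL backend. */
void fake_set_enable_write_cache(struct fake_blk *blk, bool on)
{
    blk->write_cache = on;
}

/* Post-load hook: only touch the backend if there is one. */
int fake_post_load(struct fake_ide_state *s)
{
    assert(s != NULL);

    if (s->blk && s->identify_set) {
        /* Bit 5 of IDENTIFY word 85 is used as the write-cache flag here,
         * as in the patch. */
        fake_set_enable_write_cache(s->blk,
                                    !!(s->identify_data[85] & (1 << 5)));
    }
    return 0;
}
```

Without the `s->blk` test, the first call in a scenario with `identify_set` true and no backend would dereference a NULL pointer, which is the SIGSEGV the subject line refers to.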
  
 





[Xen-devel] [qemu-mainline test] 31668: regressions - FAIL

2014-11-19 Thread xen . org
flight 31668 qemu-mainline real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/31668/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-pair   17 guest-migrate/src_host/dst_host fail REGR. vs. 30603

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt                  9 guest-start           fail never pass
 test-amd64-amd64-xl-pcipt-intel           9 guest-start           fail never pass
 test-amd64-i386-libvirt                   9 guest-start           fail never pass
 test-armhf-armhf-xl                      10 migrate-support-check fail never pass
 test-amd64-amd64-libvirt                  9 guest-start           fail never pass
 test-amd64-i386-xl-qemut-win7-amd64      14 guest-stop            fail never pass
 test-amd64-i386-xl-win7-amd64            14 guest-stop            fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop            fail never pass
 test-amd64-amd64-xl-win7-amd64           14 guest-stop            fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop            fail never pass
 test-amd64-amd64-xl-winxpsp3             14 guest-stop            fail never pass
 test-amd64-amd64-xl-qemut-winxpsp3       14 guest-stop            fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64      14 guest-stop            fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64     14 guest-stop            fail never pass
 test-amd64-i386-xl-qemut-winxpsp3        14 guest-stop            fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3       14 guest-stop            fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64     14 guest-stop            fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1       14 guest-stop            fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3        14 guest-stop            fail never pass
 test-amd64-i386-xl-winxpsp3              14 guest-stop            fail never pass

version targeted for testing:
 qemuu                f874bf905ff2f8dcc17acbfc61e49a92a6f4d04b
baseline version:
 qemuu                b00a0ddb31a393b8386d30a9bef4d9bbb249e7ec


People who touched revisions under test:
  Adam Crume adamcr...@gmail.com
  Alex Bennée alex.ben...@linaro.org
  Alex Williamson alex.william...@redhat.com
  Alexander Graf ag...@suse.de
  Alexey Kardashevskiy a...@ozlabs.ru
  Amit Shah amit.s...@redhat.com
  Amos Kong ak...@redhat.com
  Andreas Färber afaer...@suse.de
  Andrew Jones drjo...@redhat.com
  Ard Biesheuvel ard.biesheu...@linaro.org
  Aurelien Jarno aurel...@aurel32.net
  Bastian Koppelmann kbast...@mail.uni-paderborn.de
  Bharata B Rao bhar...@linux.vnet.ibm.com
  Bin Wu wu.wu...@huawei.com
  Chao Peng chao.p.p...@linux.intel.com
  Chen Fan chen.fan.f...@cn.fujitsu.com
  Chen Gang gang.chen.5...@gmail.com
  Chenliang chenlian...@huawei.com
  Chris Johns chr...@rtems.org
  Chris Spiegel chris.spie...@cypherpath.com
  Christian Borntraeger borntrae...@de.ibm.com
  Claudio Fontana claudio.font...@huawei.com
  Cole Robinson crobi...@redhat.com
  Corey Minyard cminy...@mvista.com
  Cornelia Huck cornelia.h...@de.ibm.com
  David Gibson da...@gibson.dropbear.id.au
  David Hildenbrand d...@linux.vnet.ibm.com
  Denis V. Lunev d...@openvz.org
  Don Slutz dsl...@verizon.com
  Dongxue Zhang elta@gmail.com
  Dr. David Alan Gilbert dgilb...@redhat.com
  Edgar E. Iglesias edgar.igles...@xilinx.com
  Eduardo Habkost ehabk...@redhat.com
  Eduardo Otubo eduardo.ot...@profitbricks.com
  Fabian Aggeler aggel...@ethz.ch
  Fam Zheng f...@redhat.com
  Frank Blaschka blasc...@linux.vnet.ibm.com
  Gal Hammer gham...@redhat.com
  Gerd Hoffmann kra...@redhat.com
  Gonglei arei.gong...@huawei.com
  Greg Bellows greg.bell...@linaro.org
  Gu Zheng guz.f...@cn.fujitsu.com
  Hannes Reinecke h...@suse.de
  Heinz Graalfs graa...@linux.vnet.ibm.com
  Igor Mammedov imamm...@redhat.com
  James Harper james.har...@ejbdigital.com.au
  James Harper ja...@ejbdigital.com.au
  Jan Kiszka jan.kis...@siemens.com
  Jan Vesely jano.ves...@gmail.com
  Jens Freimann jf...@linux.vnet.ibm.com
  Joel Schopp jsch...@linux.vnet.ibm.com
  John Snow js...@redhat.com
  Jonas Gorski j...@openwrt.org
  Jonas Maebe jonas.ma...@elis.ugent.be
  Juan Quintela quint...@redhat.com
  Juan Quintela quint...@trasno.org
  Jun Li junm...@gmail.com
  Kevin Wolf kw...@redhat.com
  KONRAD Frederic fred.kon...@greensocs.com
  Laszlo Ersek ler...@redhat.com
  Leon Alrae leon.al...@imgtec.com
  Li Liang liang.z...@intel.com
  Li Liu john.li...@huawei.com
  Luiz Capitulino lcapitul...@redhat.com
  Maciej W. Rozycki ma...@codesourcery.com
  Magnus Reftel ref...@spotify.com
  Marc-André Lureau marcandre.lur...@gmail.com
  Marcel Apfelbaum marce...@redhat.com
  Mark Cave-Ayland mark.cave-ayl...@ilande.co.uk
  Markus Armbruster arm...@redhat.com
  Martin Decky mar...@decky.cz
  Martin Simmons mar...@lispworks.com
  Max Filippov jcmvb...@gmail.com
  Max Reitz mre...@redhat.com
  Michael 

Re: [Xen-devel] Strangeness in generated xen-command-line.html

2014-11-19 Thread Konrad Rzeszutek Wilk
On November 19, 2014 6:05:33 AM EST, Andrew Cooper andrew.coop...@citrix.com 
wrote:
On 19/11/14 11:04, Ian Campbell wrote:
 On Wed, 2014-11-19 at 10:52 +, Andrew Cooper wrote:
 On 19/11/14 10:46, Ian Campbell wrote:
 On Wed, 2014-11-19 at 10:38 +, Ian Campbell wrote:
 I've not been able to find a workaround...
 This works for me...

 8---

 From 3483179d333c47deacfc8c2eb195bf7dc4a555ff Mon Sep 17 00:00:00
2001
 From: Ian Campbell ian.campb...@citrix.com
 Date: Wed, 19 Nov 2014 10:42:18 +
 Subject: [PATCH] docs: workaround markdown parser error in
  xen-command-line.markdown

 Some versions of markdown (specifically the one in Debian Wheezy,
currently
 used to generate
 http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html)
seem to be
 confused by nested lists in the middle of multi-paragraph parent
list entries
 as seen in the com1,com2 entry.

 The effect is that the Default section of all following entries is
 replaced by some sort of hash or checksum (at least, a string of 32
 random seeming hex digits).

 Work around this issue by making the descriptions of the DPS options
 a nested list, moving the existing nested list describing the options
 for S into a third level list. This seems to avoid the issue, and is
 arguably better formatting in its own right (at least it's not a
 regression IMHO)

 Signed-off-by: Ian Campbell ian.campb...@citrix.com
 I had just identified a different way, but this way is slightly
better.

 If you take out all the blank lines visible in the context below,
the
 resulting HTML will be correctly formatted and rather neater (i.e.
 without sporadic blank lines).
 Agreed.

 8--

 From 53398a9729d391f1fb7b6f753a0032b1f3604d4d Mon Sep 17 00:00:00
2001
 From: Ian Campbell ian.campb...@citrix.com
 Date: Wed, 19 Nov 2014 10:42:18 +
 Subject: [PATCH] docs: workaround markdown parser error in
  xen-command-line.markdown

 Some versions of markdown (specifically the one in Debian Wheezy,
currently
 used to generate
 http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html) seem
to be
 confused by nested lists in the middle of multi-paragraph parent list
entries
 as seen in the com1,com2 entry.

 The effect is that the Default section of all following entries is
 replaced by some sort of hash or checksum (at least, a string of 32
 random seeming hex digits).

 Work around this issue by making the descriptions of the DPS options
 a nested list, moving the existing nested list describing the options
 for S into a third level list. This seems to avoid the issue, and is
 arguably better formatting in its own right (at least it's not a
 regression IMHO)

 Signed-off-by: Ian Campbell ian.campb...@citrix.com

Reviewed-by: Andrew Cooper andrew.coop...@citrix.com

Release-Acked-by: Konrad Rzeszutek Wilk (konrad.w...@oracle.com)

In case you were thinking of putting in 4.5

 ---
 v2: Less blank lines == nicer output.
 ---
  docs/misc/xen-command-line.markdown |   21 -
  1 file changed, 8 insertions(+), 13 deletions(-)

 diff --git a/docs/misc/xen-command-line.markdown
b/docs/misc/xen-command-line.markdown
 index 0830e5f..b7eaeea 100644
 --- a/docs/misc/xen-command-line.markdown
 +++ b/docs/misc/xen-command-line.markdown
 @@ -247,19 +247,14 @@ Both option `com1` and `com2` follow the same
format.
  * Optionally, a clock speed measured in hz can be specified.
  * `DPS` represents the number of data bits, the parity, and the
number
of stop bits.
 -
 -  `D` is an integer between 5 and 8 for the number of data bits.
 -
 -  `P` is a single character representing the type of parity:
 -
 -   * `n` No
 -   * `o` Odd
 -   * `e` Even
 -   * `m` Mark
 -   * `s` Space
 -
 -  `S` is an integer 1 or 2 for the number of stop bits.
 -
 +  * `D` is an integer between 5 and 8 for the number of data bits.
 +  * `P` is a single character representing the type of parity:
 +  * `n` No
 +  * `o` Odd
 +  * `e` Even
 +  * `m` Mark
 +  * `s` Space
 +  * `S` is an integer 1 or 2 for the number of stop bits.
  * `io-base` is an integer which specifies the IO base port for
UART
registers.
  * `irq` is the IRQ number to use, or `0` to use the UART in poll




Re: [Xen-devel] [PATCH 1/2 V3] remove domain field in xenstore backend dir

2014-11-19 Thread Ian Jackson
Chunyan Liu writes ([PATCH 1/2 V3] remove domain field in xenstore backend dir):
 Remove the unusual 'domain' field under backend directory. The
 affected are backend/console, backend/vfb, backend/vkbd.

Thanks.

Acked-by: Ian Jackson ian.jack...@eu.citrix.com



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 Hi Stefano,
 
 Thank you for your support.
 
 You are right - with the latest change you proposed I get continuous
 prints during the platform hang:
 
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 
 Looks like the issue needs deeper debugging.

Cool! You could simply print what irqs are in all LRs when they are
full, for example you could call gic_dump_info. That would tell us what
is taking all the LRs space we have.

How many LRs are available on omap5 anyway?

I doubt you have so much interrupt traffic to actually fill all the LRs,
so I am thinking that a few LRs might not be cleared properly (that
should happen on hypervisor entry, gic_update_one_lr should take care of
it).
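
To make the "LRs full" condition concrete, here is a minimal sketch of the check, assuming (as the `i >= nr_lrs` test in the debug patch implies) that the bit search returns the number of bits when no zero bit is found. `first_zero_bit` and `lrs_full` are hypothetical helper names written for this illustration; they are not Xen's `find_first_zero_bit`, which operates on bitmap pointers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the index of the lowest clear bit among the first nr_bits bits
 * of mask, or nr_bits if all of them are set.
 */
unsigned int first_zero_bit(uint32_t mask, unsigned int nr_bits)
{
    unsigned int i;

    assert(nr_bits <= 32);

    for (i = 0; i < nr_bits; i++)
        if (!(mask & (UINT32_C(1) << i)))
            return i;
    return nr_bits;
}

/* A set bit means "list register in use"; full means no free slot. */
int lrs_full(uint32_t lr_mask, unsigned int nr_lrs)
{
    return first_zero_bit(lr_mask, nr_lrs) >= nr_lrs;
}
```

For example, with four list registers a mask of 0xf has no free slot, so the search returns 4 and the interrupt stays queued, whereas a mask of 0xb would yield slot 2.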


 Regards,
 Andrii
 
 On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
  Hello Andrii,
  we are getting closer :-)
 
  It would help if you post the output with GIC_DEBUG defined but without
  the other change that fixes the issue.
 
  I think the problem is probably due to software irqs.
  You are getting too many
 
  gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
 
  messages. That means you are losing virtual SGIs (guest VCPU to guest
  VCPU). It would be best to investigate why, especially if you get many
  more of the same messages without the MAINTENANCE_IRQ change I
  suggested.
 
  This patch might also help in understanding the problem better:
 
 
  diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
  index b7516c0..5eaeca2 100644
  --- a/xen/arch/arm/gic.c
  +++ b/xen/arch/arm/gic.c
  @@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
   list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
   {
   i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
  -if ( i >= nr_lrs ) return;
  +if ( i >= nr_lrs )
  +{
  +gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n",
  +p->irq, v->domain->domain_id, v->vcpu_id);
  +continue;
  +}
 
   spin_lock_irqsave(&gic.lock, flags);
   gic_set_lr(i, p, GICH_LR_PENDING);
 
 
 
 
  On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
  Hi Stefano,
 
  No hangs with this change.
  Complete log is the following:
 
  U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26)
  DRA752 ES1.0
  ethaddr not set. Validating first E-fuse MAC
  cpsw
  - UART enabled -
  - CPU  booting -
  - Xen starting in Hyp mode -
  - Zero BSS -
  - Setting up control registers -
  - Turning on paging -
  - Ready -
  (XEN) Checking for initrd in /chosen
  (XEN) RAM: 8000 - 9fff
  (XEN) RAM: a000 - bfff
  (XEN) RAM: c000 - dfff
  (XEN)
  (XEN) MODULE[1]: c200 - c20069aa
  (XEN) MODULE[2]: c000 - c200
  (XEN) MODULE[3]:  - 
  (XEN) MODULE[4]: c300 - c301
  (XEN)  RESVD[0]: ba30 - bfd0
  (XEN)  RESVD[1]: 9580 - 9590
  (XEN)  RESVD[2]: 98a0 - 98b0
  (XEN)  RESVD[3]: 95f0 - 98a0
  (XEN)  RESVD[4]: 9590 - 95f0
  (XEN)
  (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0
  dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1
  (XEN) Placing Xen at 0xdfe0-0xe000
  (XEN) Xen heap: d200-de00 (49152 pages)
  (XEN) Dom heap: 344064 pages
  (XEN) Domain heap initialised
  (XEN) Looking for UART console serial0
   Xen 4.5-unstable
  (XEN) Xen version 4.5-unstable (atseglytskyi@)
  (arm-linux-gnueabihf-gcc (crosstool-NG
  linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3
  20130328 (prerelease)) debu4
  (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty
  (XEN) Processor: 412fc0f2: ARM Limited, variant: 0x2, part 0xc0f, rev 0x2
  (XEN) 32-bit Execution:
  (XEN)   Processor Features: 1131:00011011
  (XEN) Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle
  (XEN) Extensions: GenericTimer Security
  (XEN)   Debug Features: 02010555
  (XEN)   Auxiliary Features: 
  (XEN)   Memory Model Features: 10201105 2000 0124 02102211
  (XEN)  ISA Features: 02101110 13112111 21232041 2131 10011142 
  (XEN) Platform: TI DRA7
  (XEN) /psci method must be smc, but is: hvc
  (XEN) Set AuxCoreBoot1 to dfe0004c (0020004c)
  (XEN) Set AuxCoreBoot0 to 0x20
  (XEN) Generic 

Re: [Xen-devel] [PATCH 2/2 V3] fix rename: xenstore not fully updated

2014-11-19 Thread Ian Jackson
Chunyan Liu writes ([PATCH 2/2 V3] fix rename: xenstore not fully updated):
 libxl__domain_rename only updates /local/domain/domid/name;
 /vm/uuid/name in xenstore is not updated. Add code in
 libxl__domain_rename to update /vm/uuid/name too.

Acked-by: Ian Jackson ian.jack...@eu.citrix.com



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
On Wed, Nov 19, 2014 at 1:12 PM, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:
 On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 Hi Stefano,

 Thank you for your support.

 You are right - with the latest change you proposed I get continuous
 prints during the platform hang:

 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0

 Looks like the issue needs deeper debugging.

 Cool! You could simply print what irqs are in all LRs when they are
 full, for example you could call gic_dump_info. That would tell us what
 is taking all the LRs space we have.

 How many LRs are available on omap5 anyway?

:) Already done this:


(XEN) gic.c:725:d0v0 LRs full, not injecting irq=27 nr_lrs 4 i 4 into d0v0
(XEN) GICH_LRs (vcpu 0) mask=f
(XEN)HW_LR[0]=1a1f
(XEN)HW_LR[1]=9a00e439
(XEN)HW_LR[2]=1a02
(XEN)HW_LR[3]=9a015856
(XEN) Inflight irq=31 lr=0
(XEN) Inflight irq=57 lr=1
(XEN) Inflight irq=2 lr=2
(XEN) Inflight irq=86 lr=3
(XEN) Inflight irq=27 lr=255
(XEN) Pending irq=27
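
As an aid to reading the dump above, the following sketch decodes a list register value into its fields. It assumes the GICv2 GICH_LR layout (virtual ID in bits 9:0, physical ID in bits 19:10, priority in bits 27:23, state in bits 29:28, HW in bit 31); `lr_fields`, `decode_gich_lr` and `lr_state_name` are hypothetical names for illustration and are not part of Xen. Only the two values that appear as full eight-digit words in the dump are used as examples.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of one list register, under the GICv2 layout assumed above. */
struct lr_fields {
    unsigned int virq;      /* bits [9:0]   virtual interrupt ID */
    unsigned int pirq;      /* bits [19:10] physical interrupt ID (if hw) */
    unsigned int priority;  /* bits [27:23] */
    unsigned int state;     /* bits [29:28] */
    unsigned int hw;        /* bit  [31]    hardware interrupt flag */
};

/* Split a raw list register value into its fields. */
struct lr_fields decode_gich_lr(uint32_t lr)
{
    struct lr_fields f;

    f.virq = lr & 0x3ff;
    f.pirq = (lr >> 10) & 0x3ff;
    f.priority = (lr >> 23) & 0x1f;
    f.state = (lr >> 28) & 0x3;
    f.hw = (lr >> 31) & 0x1;
    return f;
}

/* Human-readable name for the two-bit state field. */
const char *lr_state_name(unsigned int state)
{
    static const char *const names[] = {
        "invalid", "pending", "active", "pending+active"
    };

    assert(state < 4);
    return names[state];
}
```

Under these assumptions, 0x9a00e439 decodes to a hardware interrupt with virtual and physical ID 57 in the pending state, and 0x9a015856 to ID 86, also pending, which agrees with the "Inflight irq=57 lr=1" and "Inflight irq=86 lr=3" lines.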



 I doubt you have so much interrupt traffic to actually fill all the LRs,
 so I am thinking that a few LRs might not be cleared properly (that
 should happen on hypervisor entry, gic_update_one_lr should take care of
 it).

This would explain why it happens during domU start: SGI
traffic might be very heavy at that time.



 Regards,
 Andrii

 On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
  Hello Andrii,
  we are getting closer :-)
 
  It would help if you post the output with GIC_DEBUG defined but without
  the other change that fixes the issue.
 
  I think the problem is probably due to software irqs.
  You are getting too many
 
  gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still 
  lr_pending
 
  messages. That means you are losing virtual SGIs (guest VCPU to guest
  VCPU). It would be best to investigate why, especially if you get many
  more of the same messages without the MAINTENANCE_IRQ change I
  suggested.
 
  This patch might also help in understanding the problem better:
 
 
  diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
  index b7516c0..5eaeca2 100644
  --- a/xen/arch/arm/gic.c
  +++ b/xen/arch/arm/gic.c
  @@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
   list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
   {
   i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
  -if ( i >= nr_lrs ) return;
  +if ( i >= nr_lrs )
  +{
  +gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n",
  +p->irq, v->domain->domain_id, v->vcpu_id);
  +continue;
  +}
 
   spin_lock_irqsave(&gic.lock, flags);
   gic_set_lr(i, p, GICH_LR_PENDING);
 
 
 
 
  On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
  Hi Stefano,
 
  No hangs with this change.
  Complete log is the following:
 
  U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26)
  DRA752 ES1.0
  ethaddr not set. Validating first E-fuse MAC
  cpsw
  - UART enabled -
  - CPU  booting -
  - Xen starting in Hyp mode -
  - Zero BSS -
  - Setting up control registers -
  - Turning on paging -
  - Ready -
  (XEN) Checking for initrd in /chosen
  (XEN) RAM: 8000 - 9fff
  (XEN) RAM: a000 - bfff
  (XEN) RAM: c000 - dfff
  (XEN)
  (XEN) MODULE[1]: c200 - c20069aa
  (XEN) MODULE[2]: c000 - c200
  (XEN) MODULE[3]:  - 
  (XEN) MODULE[4]: c300 - c301
  (XEN)  RESVD[0]: ba30 - bfd0
  (XEN)  RESVD[1]: 9580 - 9590
  (XEN)  RESVD[2]: 98a0 - 98b0
  (XEN)  RESVD[3]: 95f0 - 98a0
  (XEN)  RESVD[4]: 9590 - 95f0
  (XEN)
  (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0
  dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1
  (XEN) Placing Xen at 0xdfe0-0xe000
  (XEN) Xen heap: d200-de00 (49152 pages)
  (XEN) Dom heap: 344064 pages
  (XEN) Domain heap initialised
  (XEN) Looking for UART console serial0
   Xen 4.5-unstable
  (XEN) Xen version 4.5-unstable (atseglytskyi@)
  (arm-linux-gnueabihf-gcc (crosstool-NG
  linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3
  20130328 (prerelease)) debu4
  (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty
  (XEN) Processor: 412fc0f2: ARM Limited, variant: 0x2, 

Re: [Xen-devel] RFC: vNUMA project

2014-11-19 Thread George Dunlap
On Tue, Nov 11, 2014 at 5:36 PM, Wei Liu wei.l...@citrix.com wrote:
 Third stage:

            Basic  PoD  Ballooning  Mem_relocation
 PV/PVH     Y      na   Y           na
 HVM        Y      Y    Y           X
 NUMA-aware PoD?

Hmm, that will certainly be interesting. :-)

The point of PoD is to allocate a chunk of memory at guest creation
time and have the VM balloon down to fit that amount of memory.

If we assume that vnodes correspond to some set of pnodes, then the
initial allocation will (ideally) have to come from *some* subset of
those pnodes; but depending on the situation, it may be any
combination.  So for example, a guest with 2 vnodes of 2GiB each
might end up with 1G on each pnode, or 2G on one pnode and none on
another.

In this case, the only way to get an ideal memory layout is to
communicate back to the balloon driver how much memory to free on each
virtual node.  If the split is 1G / 1G, then the balloon driver will
need to allocate 1G for each vnode.  If the split was 0.5G / 1.5G,
then it would have to allocate 1.5G / 0.5G, etc.
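
The arithmetic in the example above can be written down as a short sketch. It assumes, for simplicity, that vnode i is backed by exactly one pnode and that the toolstack can tell the guest how much memory was actually populated behind each vnode; `pod_balloon_targets` is a hypothetical helper, not an existing Xen or balloon-driver interface. All sizes are in MiB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * For each vnode, the amount the balloon driver has to give back is the
 * vnode's nominal size minus the memory actually populated behind it.
 */
void pod_balloon_targets(const uint64_t *vnode_size_mib,
                         const uint64_t *populated_mib,
                         uint64_t *balloon_mib,
                         size_t nr_vnodes)
{
    size_t i;

    for (i = 0; i < nr_vnodes; i++) {
        /* A vnode cannot have more memory behind it than its size. */
        assert(populated_mib[i] <= vnode_size_mib[i]);
        balloon_mib[i] = vnode_size_mib[i] - populated_mib[i];
    }
}
```

With two 2048 MiB vnodes and 512 MiB / 1536 MiB populated, the targets come out as 1536 MiB / 512 MiB, matching the 0.5G / 1.5G case described above.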

 -George



Re: [Xen-devel] [PATCH for-4.5] docs/commandline: Fix formatting issues

2014-11-19 Thread Ian Campbell
On Wed, 2014-11-19 at 11:17 +, Andrew Cooper wrote:
 In both of these cases, markdown was interpreting the text as regular text,
 and reflowing it as a regular paragraph, leading to a single line as output.
 Reformat them as code blocks inside blockquote blocks, which causes them to
 take their precise whitespace layout.
 
 Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
Acked-by: Ian Campbell ian.campb...@citrix.com

 CC: Ian Jackson ian.jack...@eu.citrix.com
 CC: Wei Liu wei.l...@citrix.com
 CC: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 
 ---
 
 Konrad: this is a documentation fix, so requesting a 4.5 ack please.

FWIW IMHO documentation fixes in general should have a very low bar to
cross until very late in the release cycle...

 ---
  docs/misc/xen-command-line.markdown |   38 
 +--
  1 file changed, 19 insertions(+), 19 deletions(-)
 
 diff --git a/docs/misc/xen-command-line.markdown 
 b/docs/misc/xen-command-line.markdown
 index f054d4b..e3a5a15 100644
 --- a/docs/misc/xen-command-line.markdown
 +++ b/docs/misc/xen-command-line.markdown
 @@ -475,13 +475,13 @@ defaults of 1 and unlimited respectively are used 
 instead.
  
  For example, with `dom0_max_vcpus=4-8`:
  
 - Number of
 -  PCPUs | Dom0 VCPUs
 -   2|  4
 -   4|  4
 -   6|  6
 -   8|  8
 -  10|  8
 +Number of
 + PCPUs | Dom0 VCPUs
 +  2|  4
 +  4|  4
 +  6|  6
 +  8|  8
 + 10|  8
  
  ### dom0\_mem
   `= List of ( min:size | max:size | size )`
 @@ -684,18 +684,18 @@ supported only when compiled with XSM\_ENABLE=y on x86.
  The specified value is a bit mask with the individual bits having the
  following meaning:
  
 -Bit  0 - debug level 0 (unused at present)
 -Bit  1 - debug level 1 (Control Register logging)
 -Bit  2 - debug level 2 (VMX logging of MSR restores when context switching)
 -Bit  3 - debug level 3 (unused at present)
 -Bit  4 - I/O operation logging
 -Bit  5 - vMMU logging
 -Bit  6 - vLAPIC general logging
 -Bit  7 - vLAPIC timer logging
 -Bit  8 - vLAPIC interrupt logging
 -Bit  9 - vIOAPIC logging
 -Bit 10 - hypercall logging
 -Bit 11 - MSR operation logging
 + Bit  0 - debug level 0 (unused at present)
 + Bit  1 - debug level 1 (Control Register logging)
 + Bit  2 - debug level 2 (VMX logging of MSR restores when context 
 switching)
 + Bit  3 - debug level 3 (unused at present)
 + Bit  4 - I/O operation logging
 + Bit  5 - vMMU logging
 + Bit  6 - vLAPIC general logging
 + Bit  7 - vLAPIC timer logging
 + Bit  8 - vLAPIC interrupt logging
 + Bit  9 - vIOAPIC logging
 + Bit 10 - hypercall logging
 + Bit 11 - MSR operation logging
  
  Recognized in debug builds of the hypervisor only.
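
The `dom0_max_vcpus=4-8` table in the first hunk of the quoted patch is consistent with clamping the number of physical CPUs to the given range. The sketch below reproduces only that table; `clamp_dom0_vcpus` is a hypothetical function name and this is not a copy of the hypervisor's actual logic, which may take other settings into account.

```c
#include <assert.h>

/* Clamp the physical CPU count to the [min, max] range from the option. */
unsigned int clamp_dom0_vcpus(unsigned int pcpus,
                              unsigned int min,
                              unsigned int max)
{
    assert(min <= max);

    if (pcpus < min)
        return min;
    if (pcpus > max)
        return max;
    return pcpus;
}
```

Called with min 4 and max 8, the rows of the table (2, 4, 6, 8 and 10 PCPUs) map to 4, 4, 6, 8 and 8 Dom0 VCPUs respectively.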
  





Re: [Xen-devel] [PATCH 0/2 V3] fix rename: xenstore not fully updated

2014-11-19 Thread Ian Jackson
Hi Konrad, I have another release ack request:

Chunyan Liu writes ([PATCH 0/2 V3] fix rename: xenstore not fully updated):
 Currently libxl__domain_rename only updates /local/domain/domid/name;
 some other places in xenstore are still not updated, including
 /vm/uuid/name and /local/domain/0/backend/device/domid/.../domain.
 This patch series updates /vm/uuid/name in xenstore,

This ([PATCH 2/2 V3] fix rename: xenstore not fully updated) is a
bugfix which I think should go into Xen 4.5.

The risk WITHOUT this patch is that there are out-of-tree tools which
look here for the domain name and will get confused after it is
renamed.

The risk WITH this patch is that the implementation could be wrong
somehow, in which case the code would need to be updated again.  But
it's a very small patch and has been fully reviewed.


 and removes the unusual 'domain' field under backend directory.

This is a reference to [PATCH 1/2 V3] remove domain field in xenstore
backend dir.  The change to libxl is that it no longer writes
  /local/domain/0/backend/vfb/3/0/domain = name of frontend domain

It seems hardly conceivable that anyone could be using this field.
Existing users will not work after the domain is renamed, anyway.

The risk on both sides of the decision lies entirely with out-of-tree
software which looks here for the domain name for some reason.  We
don't think any such tools exist.

Note that the domain name cannot be used directly by non-dom0
programs because the mapping between domids and domain names is in a
part of xenstore which is not accessible to guests.  (It is possible
that a guest would read this value merely to display it.)


If such out-of-tree software exists:

The risk WITHOUT this patch is that it might report, or (worse)
operate on, the wrong domain entirely.

The risk WITH this patch is that it (or some subset of its
functionality) would stop working right away.


An alternative would be to update all of these entries on rename.
That's a large and somewhat fiddly patch which we don't think is
appropriate given that the presence of this key is a mistake.


Thanks,
ian.



Re: [Xen-devel] [BUGFIX][PATCH for 2.2 1/1] hw/ide/core.c: Prevent SIGSEGV during migration

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, Konrad Rzeszutek Wilk wrote:
 On November 19, 2014 5:52:58 AM EST, Stefano Stabellini 
 stefano.stabell...@eu.citrix.com wrote:
 ping?
 
 On Tue, 18 Nov 2014, Stefano Stabellini wrote:
  Konrad,
  I think we should have this fix in Xen 4.5. Should I go ahead and
  backport it?
 
 Go for it. Release-Acked-by: Konrad Rzeszutek Wilk (konrad.w...@oracle.com)

Done, thanks!


  
  On Mon, 17 Nov 2014, Don Slutz wrote:
   The other callers to blk_set_enable_write_cache() in this file
   already check for s->blk == NULL.
   
   Signed-off-by: Don Slutz dsl...@verizon.com
   ---
   
   I think this is a bugfix that should be back ported to stable
   releases.
   
   I also think this should be done in xen's copy of QEMU for 4.5 with
   back port(s) to active stable releases.
   
   Note: In 2.1 and earlier the routine is
   bdrv_set_enable_write_cache(); variable is s->bs.
   
    hw/ide/core.c | 2 +-
    1 file changed, 1 insertion(+), 1 deletion(-)
   
   diff --git a/hw/ide/core.c b/hw/ide/core.c
   index 00e21cf..d4af5e2 100644
   --- a/hw/ide/core.c
   +++ b/hw/ide/core.c
   @@ -2401,7 +2401,7 @@ static int ide_drive_post_load(void *opaque, int version_id)
    {
        IDEState *s = opaque;
    
   -    if (s->identify_set) {
   +    if (s->blk && s->identify_set) {
            blk_set_enable_write_cache(s->blk, !!(s->identify_data[85] & (1 << 5)));
        }
        return 0;
   -- 
   1.8.4
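
The guard added by the patch above can be sketched in isolation as follows. This is a simplified stand-in, not QEMU code: `struct backend`, `struct drive` and `drive_post_load()` are hypothetical names, and the write-cache flag is modelled as a plain bool. The reading of bit 5 of IDENTIFY word 85 as "write cache enabled" follows the expression in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the block backend; only the one field used here. */
struct backend {
    bool write_cache;
};

/* Stand-in for the drive state restored after migration. */
struct drive {
    struct backend *blk;          /* NULL when no backend is attached */
    bool identify_set;
    uint16_t identify_data[256];
};

/* Post-load hook: only touch the backend when one is attached. */
static int drive_post_load(struct drive *s)
{
    assert(s != NULL);

    if (s->blk && s->identify_set) {
        /* Bit 5 of IDENTIFY word 85 carries the write cache state. */
        s->blk->write_cache = !!(s->identify_data[85] & (1 << 5));
    }
    return 0;
}
```

Without the `s->blk` test, a drive with no backend but with `identify_set` true would dereference a NULL pointer, which matches the SIGSEGV described in the subject line.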
   
  
 
 



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Ian Campbell
On Wed, 2014-11-19 at 11:42 +, Stefano Stabellini wrote:
 So it looks like there is not actually anything wrong, is just that you
 have too much inflight irqs? It should cause problems because in that
 case GICH_HCR_UIE should be set and you should get a maintenance
 interrupt when LRs become available (actually when none, or only one,
 of the List register entries is marked as a valid interrupt).
 
 Maybe GICH_HCR_UIE is the one that doesn't work properly.

How much testing did this aspect get when the no-maint-irq series
originally went in? Did you manage to find a workload which filled all
the LRs or try artificially limiting the number of LRs somehow in order
to provoke it?

I ask because my intuition is that this won't happen very much, meaning
those code paths may not be as well tested...



  It might be
 worth checking that you are receiving maintenance interrupts:
 
 
 diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
 index b7516c0..b3eaa44 100644
 --- a/xen/arch/arm/gic.c
 +++ b/xen/arch/arm/gic.c
 @@ -868,6 +868,8 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
   * on return to guest that is going to clear the old LRs and inject
   * new interrupts.
   */
 +
 +    gdprintk(XENLOG_DEBUG, "maintenance interrupt\n");
  }
  
  void gic_dump_info(struct vcpu *v)
 
  
 You could also try to replace GICH_HCR_UIE with GICH_HCR_NPIE, you
 should still be receiving maintenance interrupts when one or more LRs
 become available.
 
 
  
   I doubt you have so much interrupt traffic to actually fill all the LRs,
   so I am thinking that a few LRs might not be cleared properly (that
   should happen on hypervisor entry, gic_update_one_lr should take care of
   it).
  
  This actually explains why this happens during domU start - SGI
  traffic might be very heavy this time
  
  
  
   Regards,
   Andrii
  
   On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini
   stefano.stabell...@eu.citrix.com wrote:
Hello Andrii,
we are getting closer :-)
   
It would help if you post the output with GIC_DEBUG defined but without
the other change that fixes the issue.
   
I think the problem is probably due to software irqs.
You are getting too many
   
gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
   
messages. That means you are losing virtual SGIs (guest VCPU to guest
VCPU). It would be best to investigate why, especially if you get many
more of the same messages without the MAINTENANCE_IRQ change I
suggested.
   
This patch might also help in understanding the problem better:
   
   
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index b7516c0..5eaeca2 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
     list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
     {
         i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
-        if ( i >= nr_lrs ) return;
+        if ( i >= nr_lrs )
+        {
+            gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n",
+                   p->irq, v->domain->domain_id, v->vcpu_id);
+            continue;
+        }
 
         spin_lock_irqsave(&gic.lock, flags);
         gic_set_lr(i, p, GICH_LR_PENDING);
   
   
   
   
On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
Hi Stefano,
   
No hangs with this change.
Complete log is the following:
   
U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26)
DRA752 ES1.0
ethaddr not set. Validating first E-fuse MAC
cpsw
- UART enabled -
- CPU  booting -
- Xen starting in Hyp mode -
- Zero BSS -
- Setting up control registers -
- Turning on paging -
- Ready -
(XEN) Checking for initrd in /chosen
(XEN) RAM: 8000 - 9fff
(XEN) RAM: a000 - bfff
(XEN) RAM: c000 - dfff
(XEN)
(XEN) MODULE[1]: c200 - c20069aa
(XEN) MODULE[2]: c000 - c200
(XEN) MODULE[3]:  - 
(XEN) MODULE[4]: c300 - c301
(XEN)  RESVD[0]: ba30 - bfd0
(XEN)  RESVD[1]: 9580 - 9590
(XEN)  RESVD[2]: 98a0 - 98b0
(XEN)  RESVD[3]: 95f0 - 98a0
(XEN)  RESVD[4]: 9590 - 95f0
(XEN)
(XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0
dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1
(XEN) Placing Xen at 0xdfe0-0xe000
(XEN) Xen heap: d200-de00 (49152 pages)
(XEN) Dom heap: 344064 pages
(XEN) Domain heap initialised
(XEN) Looking for UART console serial0
 Xen 

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
Hi Stefano,

   if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
  -GICH[GICH_HCR] |= GICH_HCR_UIE;
  +GICH[GICH_HCR] |= GICH_HCR_NPIE;
   else
  -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
  +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
 
   }

 Yes, exactly

I tried, hang still occurs with this change

Regards,
Andrii




-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
Hi Julien,

On Wed, Nov 19, 2014 at 2:23 PM, Julien Grall julien.gr...@linaro.org wrote:
 On 11/19/2014 12:17 PM, Stefano Stabellini wrote:
 On Wed, 19 Nov 2014, Ian Campbell wrote:
 On Wed, 2014-11-19 at 11:42 +, Stefano Stabellini wrote:
 So it looks like there is not actually anything wrong, is just that you
 have too much inflight irqs? It should cause problems because in that
 case GICH_HCR_UIE should be set and you should get a maintenance
 interrupt when LRs become available (actually when none, or only one,
 of the List register entries is marked as a valid interrupt).

 Maybe GICH_HCR_UIE is the one that doesn't work properly.

 How much testing did this aspect get when the no-maint-irq series
 originally went in? Did you manage to find a workload which filled all
 the LRs or try artificially limiting the number of LRs somehow in order
 to provoke it?

 I ask because my intuition is that this won't happen very much, meaning
 those code paths may not be as well tested...

 I did test it by artificially limiting the number of LRs to 1.
 However there have been many iterations of that series and I didn't run
 this test at every iteration.

 am I the only to think this may not be related to this bug? All the LRs
 are full with IRQ of the same priority. So it's valid.

 As gic_restore_pending_irqs is called every time that we return to the
 guest. It could be anything else.

 It would be interesting to see why we are trapping all the time in Xen.


I may perform any test if you have some specific scenario.


 Regards,

 --
 Julien Grall



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Julien Grall
On 11/19/2014 12:40 PM, Andrii Tseglytskyi wrote:
 Hi Julien,
 
 On Wed, Nov 19, 2014 at 2:23 PM, Julien Grall julien.gr...@linaro.org wrote:
 On 11/19/2014 12:17 PM, Stefano Stabellini wrote:
 On Wed, 19 Nov 2014, Ian Campbell wrote:
 On Wed, 2014-11-19 at 11:42 +, Stefano Stabellini wrote:
 So it looks like there is not actually anything wrong, is just that you
 have too much inflight irqs? It should cause problems because in that
 case GICH_HCR_UIE should be set and you should get a maintenance
 interrupt when LRs become available (actually when none, or only one,
 of the List register entries is marked as a valid interrupt).

 Maybe GICH_HCR_UIE is the one that doesn't work properly.

 How much testing did this aspect get when the no-maint-irq series
 originally went in? Did you manage to find a workload which filled all
 the LRs or try artificially limiting the number of LRs somehow in order
 to provoke it?

 I ask because my intuition is that this won't happen very much, meaning
 those code paths may not be as well tested...

 I did test it by artificially limiting the number of LRs to 1.
 However there have been many iterations of that series and I didn't run
 this test at every iteration.

 am I the only to think this may not be related to this bug? All the LRs
 are full with IRQ of the same priority. So it's valid.

 As gic_restore_pending_irqs is called every time that we return to the
 guest. It could be anything else.

 It would be interesting to see why we are trapping all the time in Xen.

 
 I may perform any test if you have some specific scenario.

I have no specific scenario in my mind :/.

It looks like I'm able to reproduce it on my ARM board by restricting
the number of LRs to 1.

I will investigate.

Regards,

-- 
Julien Grall



Re: [Xen-devel] qemu 2.2 crash on linux hvm domU (full backtrace included)

2014-11-19 Thread Fabio Fantoni

On 14/11/2014 12:25, Fabio Fantoni wrote:
dom0 xen-unstable from staging git with the "x86/hvm: Extend HVM cpuid
leaf with vcpu id" and "x86/hvm: Add per-vcpu evtchn upcalls" patches,
and qemu 2.2 from spice git (spice/next commit
e779fa0a715530311e6f59fc8adb0f6eca914a89):

https://github.com/Fantu/Xen/commits/rebase/m2r-staging


I tried with qemu tag v2.2.0-rc2 and the crash still happens. Here is the
full backtrace of the latest test:

Program received signal SIGSEGV, Segmentation fault.
0x55689b07 in vmport_ioport_read (opaque=0x564443a0, addr=0,
size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73
73          eax = env->regs[R_EAX];
(gdb) bt full
#0  0x55689b07 in vmport_ioport_read (opaque=0x564443a0, 
addr=0,

size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73
s = 0x564443a0
cs = 0x0
cpu = 0x0
__func__ = vmport_ioport_read
env = 0x8250
command = 0 '\000'
eax = 0
#1  0x55655fc4 in memory_region_read_accessor (mr=0x5628,
addr=0, value=0x7fffd8d0, size=4, shift=0, mask=4294967295)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:410
tmp = 0
#2  0x556562b7 in access_with_adjusted_size (addr=0,
value=0x7fffd8d0, size=4, access_size_min=4, access_size_max=4,
access=0x55655f62 memory_region_read_accessor, 
mr=0x5628)

at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:480
access_mask = 4294967295
access_size = 4
i = 0
#3  0x556590e9 in memory_region_dispatch_read1 
(mr=0x5628,

addr=0, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1077
data = 0
#4  0x556591b1 in memory_region_dispatch_read (mr=0x5628,
addr=0, pval=0x7fffd9a8, size=4)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1099
No locals.
#5  0x5565cbbc in io_mem_read (mr=0x5628, addr=0,
pval=0x7fffd9a8, size=4)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1962
No locals.
#6  0x5560a1ca in address_space_rw (as=0x55eaf920, 
addr=22104,

buf=0x7fffda50 \377\377\377\377, len=4, is_write=false)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/exec.c:2167
l = 4
ptr = 0x55a92d87 %s/%d:\n
val = 7852232130387826944
addr1 = 0
mr = 0x5628
error = false
#7  0x5560a38f in address_space_read (as=0x55eaf920, 
addr=22104,

buf=0x7fffda50 \377\377\377\377, len=4)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/exec.c:2205
No locals.
#8  0x5564fd4b in cpu_inl (addr=22104)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/ioport.c:117
buf = \377\377\377\377
val = 21845
#9  0x55670c73 in do_inp (addr=22104, size=4)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:684
No locals.
#10 0x55670ee0 in cpu_ioreq_pio (req=0x77ff3020)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:747
i = 1
#11 0x556714b3 in handle_ioreq (state=0x563c2510,
req=0x77ff3020) at 
/mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:853

No locals.
#12 0x55671826 in cpu_handle_ioreq (opaque=0x563c2510)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:931
state = 0x563c2510
req = 0x77ff3020
#13 0x5596e240 in qemu_iohandler_poll (pollfds=0x56389a30, 
ret=1)

at iohandler.c:143
revents = 1
pioh = 0x563f7610
ioh = 0x56450a40
#14 0x5596de1c in main_loop_wait (nonblocking=0) at 
main-loop.c:495

ret = 1
timeout = 4294967295
timeout_ns = 3965432
#15 0x55756d3f in main_loop () at vl.c:1882
nonblocking = false
last_io = 0
#16 0x5575ea49 in main (argc=62, argv=0x7fffe048,
envp=0x7fffe240) at vl.c:4400
i = 128
snapshot = 0
linux_boot = 0
initrd_filename = 0x0
kernel_filename = 0x0
kernel_cmdline = 0x55a48f86 
boot_order = 0x56387460 dc
ds = 0x564b2040
cyls = 0
heads = 0
secs = 0
translation = 0
hda_opts = 0x0
opts = 0x563873b0
machine_opts = 0x56389010
icount_opts = 0x0
olist = 0x55e57e80
optind = 62
optarg = 0x7fffe914 
file=/mnt/vm/disks/FEDORA19.disk1.xm,if=ide,index=0,media=disk,format=raw,cache=writeback

loadvm = 0x0
machine_class = 0x5637d5c0
cpu_model = 0x0
vga_model = 0x0
qtest_chrdev = 0x0
qtest_log = 0x0
pid_file = 0x0
incoming = 0x0
show_vnc_port = 0
defconfig = true
userconfig = true
log_mask = 0x0
log_file = 0x0
   

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Julien Grall
On 11/19/2014 01:30 PM, Andrii Tseglytskyi wrote:
 On Wed, Nov 19, 2014 at 3:26 PM, Julien Grall julien.gr...@linaro.org wrote:
 On 11/19/2014 12:40 PM, Andrii Tseglytskyi wrote:
 Hi Julien,

 On Wed, Nov 19, 2014 at 2:23 PM, Julien Grall julien.gr...@linaro.org 
 wrote:
 On 11/19/2014 12:17 PM, Stefano Stabellini wrote:
 On Wed, 19 Nov 2014, Ian Campbell wrote:
 On Wed, 2014-11-19 at 11:42 +, Stefano Stabellini wrote:
 So it looks like there is not actually anything wrong, is just that you
 have too much inflight irqs? It should cause problems because in that
 case GICH_HCR_UIE should be set and you should get a maintenance
 interrupt when LRs become available (actually when none, or only one,
 of the List register entries is marked as a valid interrupt).

 Maybe GICH_HCR_UIE is the one that doesn't work properly.

 How much testing did this aspect get when the no-maint-irq series
 originally went in? Did you manage to find a workload which filled all
 the LRs or try artificially limiting the number of LRs somehow in order
 to provoke it?

 I ask because my intuition is that this won't happen very much, meaning
 those code paths may not be as well tested...

 I did test it by artificially limiting the number of LRs to 1.
 However there have been many iterations of that series and I didn't run
 this test at every iteration.

 am I the only to think this may not be related to this bug? All the LRs
 are full with IRQ of the same priority. So it's valid.

 As gic_restore_pending_irqs is called every time that we return to the
 guest. It could be anything else.

 It would be interesting to see why we are trapping all the time in Xen.


 I may perform any test if you have some specific scenario.

 I have no specific scenario in my mind :/.

 It looks like I'm able to reproduce it on my ARM board by restricting
 the number of LRs to 1.

 
 Do you mean that you got a hang with current xen/master branch ?

Yes but I forgot to update another part of the code.

With the patch below to restrict the number of LRs I'm still able to boot,
and I don't see any maintenance interrupts.

Stefano, is it valid?

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index faad1ff..c1c0f7ff 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -327,6 +327,7 @@ static void __cpuinit gicv2_hyp_init(void)
     vtr = readl_gich(GICH_VTR);
     nr_lrs  = (vtr & GICH_V2_VTR_NRLRGS) + 1;
     gicv2_info.nr_lrs = nr_lrs;
+    gicv2_info.nr_lrs = 1;
 
     writel_gich(GICH_MISR_EOI, GICH_MISR);
 }
@@ -488,6 +489,16 @@ static void gicv2_write_lr(int lr, const struct gic_lr *lr_reg)
 
 static void gicv2_hcr_status(uint32_t flag, bool_t status)
 {
+    uint32_t lr = readl_gich(GICH_LR + 0);
+
+    if ( status )
+        lr |= GICH_V2_LR_MAINTENANCE_IRQ;
+    else
+        lr &= ~GICH_V2_LR_MAINTENANCE_IRQ;
+
+    writel_gich(lr, GICH_LR + 0);
+
+#if 0
     uint32_t hcr = readl_gich(GICH_HCR);
 
     if ( status )
@@ -496,6 +507,7 @@ static void gicv2_hcr_status(uint32_t flag, bool_t status)
         hcr &= (~flag);
 
     writel_gich(hcr, GICH_HCR);
+#endif
 }
 
 static unsigned int gicv2_read_vmcr_priority(void)
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 70d10d6..c726d7a 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -599,6 +599,7 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
      * on return to guest that is going to clear the old LRs and inject
      * new interrupts.
      */
+    gdprintk(XENLOG_DEBUG, "\n");
 }
 
 void gic_dump_info(struct vcpu *v)


-- 
Julien Grall



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 Hi Stefano,
 
    if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
   -GICH[GICH_HCR] |= GICH_HCR_UIE;
   +GICH[GICH_HCR] |= GICH_HCR_NPIE;
    else
   -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
   +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
  
}
 
  Yes, exactly
 
 I tried, hang still occurs with this change

We need to figure out why during the hang you still have all the LRs
busy even if you are getting maintenance interrupts that should cause
them to be cleared.

Could you please call gic_dump_info(current) from maintenance_interrupt,
and post the output during the hang? Remove the other gic_dump_info to
avoid confusion, we want to understand what is the status of the LRs
after clearing them upon receiving a maintenance interrupt at busy times.
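
To make the expected behaviour concrete, here is a toy model of the list-register bookkeeping discussed in this thread. It is only a sketch under stated assumptions: `NR_LRS`, `struct lr_model` and the `model_*` functions are invented for illustration and are not Xen's data structures. It shows interrupts being queued once every LR is in use, and a freed LR being refilled from the queue.

```c
#include <assert.h>
#include <stdint.h>

#define NR_LRS 4   /* assumed number of list registers for this sketch */

struct lr_model {
    uint32_t lr_mask;      /* bit i set: list register i is in use */
    unsigned int pending;  /* interrupts waiting for a free list register */
};

/* Index of the first clear bit below nr, or nr when all are set. */
static unsigned int first_zero(uint32_t mask, unsigned int nr)
{
    for (unsigned int i = 0; i < nr; i++)
        if (!(mask & (UINT32_C(1) << i)))
            return i;
    return nr;
}

/* Inject one interrupt: returns the LR used, or -1 if it was queued. */
static int model_inject(struct lr_model *m)
{
    unsigned int i = first_zero(m->lr_mask, NR_LRS);

    if (i >= NR_LRS) {
        m->pending++;      /* all LRs full: keep it on the pending queue */
        return -1;
    }
    m->lr_mask |= UINT32_C(1) << i;
    return (int)i;
}

/* The guest is done with one LR: free it, then refill it from the queue. */
static void model_clear_lr(struct lr_model *m, unsigned int lr)
{
    assert(lr < NR_LRS);
    m->lr_mask &= ~(UINT32_C(1) << lr);
    if (m->pending > 0) {
        m->pending--;
        m->lr_mask |= UINT32_C(1) << lr;
    }
}
```

In this model the pending count can only stay non-zero while `lr_mask` has every bit set, so a dump that shows all LRs busy right after a clearing pass suggests the clearing step did not actually free anything.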



Re: [Xen-devel] [Qemu-devel] qemu 2.2 crash on linux hvm domU (full backtrace included)

2014-11-19 Thread Don Slutz
I think I know what is happening here.  But you are pointing at the
wrong change.


commit 9b23cfb76b3a5e9eb5cc899eaf2f46bc46d33ba4

is what I am guessing at this time is the issue.  I think that
xen_enabled() is returning false in pc_machine_initfn, whereas in
pc_init1 it is returning true.


I am thinking that:


diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 7bb97a4..3268c29 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -914,7 +914,7 @@ static QEMUMachine xenfv_machine = {
     .desc = "Xen Fully-virtualized PC",
     .init = pc_xen_hvm_init,
     .max_cpus = HVM_MAX_VCPUS,
-    .default_machine_opts = "accel=xen",
+    .default_machine_opts = "accel=xen,vmport=off",
     .hot_add_cpu = pc_hot_add_cpu,
 };
 #endif

Will fix your issue. I have not tested this yet.

-Don Slutz


On 11/19/14 09:04, Fabio Fantoni wrote:

On 14/11/2014 12:25, Fabio Fantoni wrote:
dom0 xen-unstable from staging git with the "x86/hvm: Extend HVM cpuid
leaf with vcpu id" and "x86/hvm: Add per-vcpu evtchn upcalls" patches,
and qemu 2.2 from spice git (spice/next commit
e779fa0a715530311e6f59fc8adb0f6eca914a89):

https://github.com/Fantu/Xen/commits/rebase/m2r-staging


I tried with qemu tag v2.2.0-rc2 and the crash still happens. Here is the
full backtrace of the latest test:

Program received signal SIGSEGV, Segmentation fault.
0x55689b07 in vmport_ioport_read (opaque=0x564443a0, addr=0,
size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73
73          eax = env->regs[R_EAX];
(gdb) bt full
#0  0x55689b07 in vmport_ioport_read (opaque=0x564443a0, 
addr=0,

size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73
s = 0x564443a0
cs = 0x0
cpu = 0x0
__func__ = vmport_ioport_read
env = 0x8250
command = 0 '\000'
eax = 0
#1  0x55655fc4 in memory_region_read_accessor 
(mr=0x5628,

addr=0, value=0x7fffd8d0, size=4, shift=0, mask=4294967295)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:410
tmp = 0
#2  0x556562b7 in access_with_adjusted_size (addr=0,
value=0x7fffd8d0, size=4, access_size_min=4, access_size_max=4,
access=0x55655f62 memory_region_read_accessor, 
mr=0x5628)

at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:480
access_mask = 4294967295
access_size = 4
i = 0
#3  0x556590e9 in memory_region_dispatch_read1 
(mr=0x5628,

addr=0, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1077
data = 0
#4  0x556591b1 in memory_region_dispatch_read 
(mr=0x5628,

addr=0, pval=0x7fffd9a8, size=4)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1099
No locals.
#5  0x5565cbbc in io_mem_read (mr=0x5628, addr=0,
pval=0x7fffd9a8, size=4)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1962
No locals.
#6  0x5560a1ca in address_space_rw (as=0x55eaf920, 
addr=22104,

buf=0x7fffda50 \377\377\377\377, len=4, is_write=false)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/exec.c:2167
l = 4
ptr = 0x55a92d87 %s/%d:\n
val = 7852232130387826944
addr1 = 0
mr = 0x5628
error = false
#7  0x5560a38f in address_space_read (as=0x55eaf920, 
addr=22104,

buf=0x7fffda50 \377\377\377\377, len=4)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/exec.c:2205
No locals.
#8  0x5564fd4b in cpu_inl (addr=22104)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/ioport.c:117
buf = \377\377\377\377
val = 21845
#9  0x55670c73 in do_inp (addr=22104, size=4)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:684
No locals.
#10 0x55670ee0 in cpu_ioreq_pio (req=0x77ff3020)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:747
i = 1
#11 0x556714b3 in handle_ioreq (state=0x563c2510,
req=0x77ff3020) at 
/mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:853

No locals.
#12 0x55671826 in cpu_handle_ioreq (opaque=0x563c2510)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:931
state = 0x563c2510
req = 0x77ff3020
#13 0x5596e240 in qemu_iohandler_poll 
(pollfds=0x56389a30, ret=1)

at iohandler.c:143
revents = 1
pioh = 0x563f7610
ioh = 0x56450a40
#14 0x5596de1c in main_loop_wait (nonblocking=0) at 
main-loop.c:495

ret = 1
timeout = 4294967295
timeout_ns = 3965432
#15 0x55756d3f in main_loop () at vl.c:1882
nonblocking = false
last_io = 0
#16 0x5575ea49 in main (argc=62, argv=0x7fffe048,
envp=0x7fffe240) at vl.c:4400
i = 128
snapshot = 0
linux_boot = 0
initrd_filename = 0x0
kernel_filename = 

Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq

2014-11-19 Thread Konrad Rzeszutek Wilk
On Wed, Nov 19, 2014 at 12:16:44PM +0100, Sander Eikelenboom wrote:
 
 Wednesday, November 19, 2014, 2:55:41 AM, you wrote:
 
  On Tue, Nov 18, 2014 at 11:12:54PM +0100, Sander Eikelenboom wrote:
  
  Tuesday, November 18, 2014, 9:56:33 PM, you wrote:
  
   
   Uhmm i thought i had these switched off (due to problems earlier and
   then forgot about them .. however looking at the earlier reports these
   lines were also in those reports).
   
   The xen-syms and these last runs are all with a pristine xen tree
   cloned today (staging branch), so the qemu-xen and seabios defined with
   that were also freshly cloned and had a new default seabios config.
   (just to rule out anything stale in my tree)
   
   If you don't see those messages .. perhaps your seabios and qemu trees
   (and at least the seabios config) are not the most recent (they don't
   get updated automatically when you just do a git pull on the main tree) ?
   
   In /tools/firmware/seabios-dir/.config i have:
   CONFIG_USB=y
   CONFIG_USB_UHCI=y
   CONFIG_USB_OHCI=y
   CONFIG_USB_EHCI=y
   CONFIG_USB_XHCI=y
   CONFIG_USB_MSC=y
   CONFIG_USB_UAS=y
   CONFIG_USB_HUB=y
   CONFIG_USB_KEYBOARD=y
   CONFIG_USB_MOUSE=y
   
  
   I seem to have the same thing. Perhaps it is my XHCI controller being 
   wonky.
  
   And this is all just from a:
   - git clone git://xenbits.xen.org/xen.git -b staging
   - make clean && ./configure && make -j6 && make -j6 install
  
   Aye. 
   .. snip..
 1) test_and_[set|clear]_bit sometimes return unexpected values.
[But this might be invalid as the addition of the 8303faaf25a8
 might be correct - as the second dpci the softirq is processing
 could be the MSI one]
   
   Would there be an easy way to stress test this function separately in 
   some 
   debugging function to see if it indeed is returning unexpected values ?
  
   Sadly no. But you got me looking in the right direction when you 
   mentioned
   'timeout'.
   
 2) INIT_LIST_HEAD operations on the same CPU are not honored.
   
   Just curious, have you also tested the patches on AMD hardware ?
  
   Yes. To reproduce this the first thing I did was to get an AMD box.
  
   

When i look at the combination of (2) and (3), It seems it could be an
interaction between the two passed through devices and/or different
IRQ types.
   
Could be - as in it is causing this issue to show up faster than
expected. Or it is the one that triggers more than one dpci happening
at the same time.
   
   Well that didn't seem to be it (see separate amendment i mailed 
   previously)
  
   Right, the current theory I have is that the interrupts are not being
   Acked within 8 milliseconds and we reset the 'state' - and at the same
   time we get an interrupt and schedule it - while we are still processing
   the same interrupt. This would explain why the 'test_and_clear_bit'
   got the wrong value.
  
   In regards to the list poison - following this thread of logic - with
   the 'state = 0' set we open the floodgates for any CPU to put the same
   'struct hvm_pirq_dpci' on its list.
  
   We do reset the 'state' on _every_ GSI that is mapped to a guest - so
   we also reset the 'state' for the MSI one (XHCI). Anyhow in your case:
  
   CPUX:                                   CPUY:
   pt_irq_time_out:
   state = 0;
   [out of timer code, the                 raise_softirq
    pirq_dpci is on the dpci_list]         [adds the pirq_dpci as state == 0]
  
   softirq_dpci                            softirq_dpci:
       list_del
       [entries poison]
                                           list_del <= BOOM
   
   Is what I believe is happening.
  
   The INTX device - once I put a load on it - does not trigger
   any pt_irq_time_out, so that would explain why I cannot hit this.
  
   But I believe your card hits these hiccups.   
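
The theory above can be walked through with a small single-threaded model. This is a sketch only: `struct dpci_model`, `STATE_SCHED` and the `model_*` functions are hypothetical and merely mimic the idea of a "scheduled" bit that guards list insertion. It is not the hypervisor's implementation and it does not reproduce real concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STATE_SCHED 0x1u   /* hypothetical "already scheduled" bit */

struct dpci_model {
    unsigned int state;
    unsigned int times_queued;  /* how many per-CPU lists hold this entry */
};

/* Queue the entry only if the scheduled bit was previously clear. */
static bool model_schedule(struct dpci_model *d)
{
    assert(d != NULL);
    if (d->state & STATE_SCHED)
        return false;            /* already queued somewhere: do nothing */
    d->state |= STATE_SCHED;
    d->times_queued++;
    return true;
}

/* Timeout path as described above: clears the state, leaves the entry queued. */
static void model_timeout_reset(struct dpci_model *d)
{
    assert(d != NULL);
    d->state = 0;
}
```

Calling `model_schedule()`, then `model_timeout_reset()`, then `model_schedule()` again leaves `times_queued` at 2: the same entry now sits on two lists, so the second `list_del` would operate on already poisoned pointers.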
  
  
  Hi Konrad,
  
  I just tested your 5 patches and as a result i still got an(other) host
  crash:
  (complete serial log attached)
  
  (XEN) [2014-11-18 21:55:41.591] [ Xen-4.5.0-rc  x86_64  debug=y  Not 
  tainted ]
  (XEN) [2014-11-18 21:55:41.591] CPU:0
  (XEN) [2014-11-18 21:55:41.591] [ Xen-4.5.0-rc  x86_64  debug=y  Not 
  tainted ]
  (XEN) [2014-11-18 21:55:41.591] RIP:e008:[82d08012c7e7]CPU:2
  (XEN) [2014-11-18 21:55:41.591] RIP:e008:[82d08014a461] 
  hvm_do_IRQ_dpci+0xbd/0x13c
  (XEN) [2014-11-18 21:55:41.591] RFLAGS: 00010006
  _spin_unlock+0x1f/0x30CONTEXT: hypervisor
 
  Duh!
 
  Here is another patch on top of the five you have (attached and inline).
 
 Hi Konrad,
 
 Happy to report it has been running with this additional patch for 2 hours
 now without any problems. I think you nailed it :-)

Could you also do an 'xl debug-keys k' and send that please?

 More than happy to test the definitive patch as well.


Re: [Xen-devel] Problems accessing passthrough PCI device

2014-11-19 Thread Simon Martin
Hello Jan and Konrad,

Tuesday, November 18, 2014, 1:49:13 PM, you wrote:


 I've just checked this with lspci. I see that the IO is being enabled.

 Memory you mean.

Yes. Sorry.

 Any other idea on why I might be reading back 0xff for all PCI
 memory area reads? The lspci output follows.

 Since this isn't behind a bridge - no, not really. Did you try this with
 any other device for comparison purposes?

This is getting more interesting. It seems that something is
overwriting the pci-back configuration data.

Starting from a fresh reboot I checked the Dom0 pci configuration and
got this:

root@smartin-xen:~# lspci -s 00:19.0 -x
00:19.0 Ethernet controller: Intel Corporation Device 1559 (rev 04)
00: 86 80 59 15 00 00 10 00 04 00 00 02 00 00 00 00
10: 00 00 d0 f7 00 c0 d3 f7 81 f0 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 54 20
30: 00 00 00 00 c8 00 00 00 00 00 00 00 05 01 00 00

I then started and stopped my DomU, checked the Dom0 pci configuration
again, and got this:

root@smartin-xen:~# lspci -s 00:19.0 -x
00:19.0 Ethernet controller: Intel Corporation Device 1559 (rev 04)
00: 86 80 59 15 00 00 10 00 04 00 00 02 00 00 00 00
10: 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 54 20
30: 00 00 00 00 c8 00 00 00 00 00 00 00 05 01 00 00

Inside my DomU I added code to print the PCI configuration registers
and what I get after restarting the DomU is:

(d18) 14:57:04.042 src/e1000e.c@00150: 00: 86 80 59 15 00 00 10 00 04 00 00 02 00 00 00 00
(d18) 14:57:04.042 src/e1000e.c@00150: 10: 00 00 d0 f7 00 c0 d3 f7 81 f0 00 00 00 00 00 00
(d18) 14:57:04.042 src/e1000e.c@00150: 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 54 20
(d18) 14:57:04.043 src/e1000e.c@00150: 30: 00 00 00 00 c8 00 00 00 00 00 00 00 14 01 00 00
(d18) 14:57:04.043 src/e1000e.c@00324: Enable PCI Memory Access
(d18) 14:57:05.043 src/e1000e.c@00150: 00: 86 80 59 15 03 00 10 00 04 00 00 02 00 00 00 00
(d18) 14:57:05.044 src/e1000e.c@00150: 10: 00 00 d0 f7 00 c0 d3 f7 81 f0 00 00 00 00 00 00
(d18) 14:57:05.044 src/e1000e.c@00150: 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 54 20
(d18) 14:57:05.045 src/e1000e.c@00150: 30: 00 00 00 00 c8 00 00 00 00 00 00 00 14 01 00 00

As you can see the pci configuration read from the pci-back driver by
my DomU is different to the data in the Dom0 pci configuration!

Just before leaving my DomU I disable the pci memory access and this
is what I see

(d18) 15:01:02.051 src/e1000e.c@00150: 00: 86 80 59 15 03 00 10 00 04 00 00 02 00 00 00 00
(d18) 15:01:02.051 src/e1000e.c@00150: 10: 00 00 d0 f7 00 c0 d3 f7 81 f0 00 00 00 00 00 00
(d18) 15:01:02.051 src/e1000e.c@00150: 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 54 20
(d18) 15:01:02.052 src/e1000e.c@00150: 30: 00 00 00 00 c8 00 00 00 00 00 00 00 14 01 00 00
(d18) 15:01:02.052 src/e1000e.c@00541: Disable PCI Memory Access
(d18) 15:01:02.052 src/e1000e.c@00150: 00: 86 80 59 15 00 00 10 00 04 00 00 02 00 00 00 00
(d18) 15:01:02.052 src/e1000e.c@00150: 10: 00 00 d0 f7 00 c0 d3 f7 81 f0 00 00 00 00 00 00
(d18) 15:01:02.052 src/e1000e.c@00150: 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 54 20
(d18) 15:01:02.053 src/e1000e.c@00150: 30: 00 00 00 00 c8 00 00 00 00 00 00 00 14 01 00 00

As you can see, the data is consistent with just writing to the
PCI command register (offset 0x04).
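
For readers comparing the dumps by hand, a minimal sketch of how the
interesting fields can be decoded is shown here. The byte arrays are
copied from the lspci and DomU output in this message; the offsets are
the standard PCI type 0 header offsets. The helper and array names
(cfg_read16, cfg_read32, count_changed_bars, dom0_before, dom0_after,
domu_after_enable) are made up for this illustration and are not part
of xen-pciback or the DomU driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard PCI type 0 header offsets and command register bits. */
#define PCI_COMMAND         0x04
#define PCI_COMMAND_IO      0x1     /* I/O space enable */
#define PCI_COMMAND_MEMORY  0x2     /* memory space enable */
#define PCI_BASE_ADDRESS_0  0x10
#define PCI_INTERRUPT_LINE  0x3c

/* Dom0 view after a fresh reboot (first lspci dump in this message). */
const uint8_t dom0_before[64] = {
    0x86, 0x80, 0x59, 0x15, 0x00, 0x00, 0x10, 0x00,
    0x04, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0xd0, 0xf7, 0x00, 0xc0, 0xd3, 0xf7,
    0x81, 0xf0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x86, 0x80, 0x54, 0x20,
    0x00, 0x00, 0x00, 0x00, 0xc8, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x05, 0x01, 0x00, 0x00,
};

/* Dom0 view after the DomU was started and stopped (second dump). */
const uint8_t dom0_after[64] = {
    0x86, 0x80, 0x59, 0x15, 0x00, 0x00, 0x10, 0x00,
    0x04, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x86, 0x80, 0x54, 0x20,
    0x00, 0x00, 0x00, 0x00, 0xc8, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x05, 0x01, 0x00, 0x00,
};

/* First row printed by the DomU after "Enable PCI Memory Access". */
const uint8_t domu_after_enable[16] = {
    0x86, 0x80, 0x59, 0x15, 0x03, 0x00, 0x10, 0x00,
    0x04, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00,
};

/* PCI configuration space is little-endian. */
uint16_t cfg_read16(const uint8_t *cfg, size_t off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
    return (uint32_t)cfg[off] |
           ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) |
           ((uint32_t)cfg[off + 3] << 24);
}

/* Count how many of the six BARs differ between two 64-byte snapshots. */
int count_changed_bars(const uint8_t *a, const uint8_t *b)
{
    int changed = 0;

    for (int bar = 0; bar < 6; bar++) {
        size_t off = PCI_BASE_ADDRESS_0 + 4 * (size_t)bar;

        if (cfg_read32(a, off) != cfg_read32(b, off))
            changed++;
    }
    return changed;
}
```

Decoded this way, the DomU toggles the command register between 0x0000
and 0x0003 (I/O and memory space enable), while in Dom0 the first three
BARs go from 0xf7d00000, 0xf7d3c000 and 0x0000f081 to 0, 0 and
0x00000001 after the DomU start/stop.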

This is the output from the debug version of the xen-pciback module.

[ 5429.351231] pciback :00:19.0: enabling device ( - 0003)
[ 5429.351367] xen: registering gsi 20 triggering 0 polarity 1
[ 5429.351373] Already setup the GSI :20
[ 5429.351387] pciback :00:19.0: xen-pciback[:00:19.0]: #20 on  disable- enable
[ 5429.351436] pciback :00:19.0: xen-pciback[:00:19.0]: #20 on  enabled
[ 5434.360078] pciback :00:19.0: xen-pciback[:00:19.0]: #20 off  enable- disable
[ 5434.360116] pciback :00:19.0: xen-pciback[:00:19.0]: #0 off  disabled
[ 5434.361491] xen-pciback pci-20-0: fe state changed 5
[ 5434.362473] xen-pciback pci-20-0: fe state changed 6
[ 5434.363540] xen-pciback pci-20-0: fe state changed 0
[ 5434.363544] xen-pciback pci-20-0: frontend is gone! unregister device
[ 5434.467359] pciback :00:19.0: resetting virtual configuration space
[ 5434.467376] pciback :00:19.0: free-ing dynamically allocated virtual configuration space fields

Does this make any sense to you?

-- 
Best regards,
 Simon  mailto:furryfutt...@gmail.com


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 0/5 v2 for-4.5] xen: arm: xgene bug fixes + support for McDivitt

2014-11-19 Thread Ian Campbell
These patches:

  * fix an off-by-one bug in the xgene mapping of additional PCI
    bus resources, which would cause an extra page to be mapped
  * correct the size of the mapped regions to match the docs
  * add support for the other 4 PCI buses on the chip, which
    enables mcdivitt and presumably most other X-Gene based platforms
    which use PCI buses other than pcie0
  * add earlyprintk for the mcdivitt platform

They can also be found at:
git://xenbits.xen.org/people/ianc/xen.git mcdivitt-v2

McDivitt is the X-Gene based HP Moonshot cartridge (McDivitt is the code
name, I think the product is called m400, not quite sure).

Other than the bug fixes I'd like to see the mcdivitt support
(specifically the patch for the other 4 PCI buses) in 4.5 because
Moonshot is an interesting and exciting platform for arm64. It is also
being used for ongoing work on Xen on ARM on OpenStack in Linaro. The
earlyprintk patch
is totally harmless unless it's explicitly enabled at compile time, IMHO
if we are taking the rest we may as well throw it in...

The risk here is that we break the existing support for the Mustang
platform, which would be the most likely failure case for the second
patch. I've tested these on a Mustang, including firing up a PCI NIC
device. The new mappings are a superset of the existing ones so the
potential for breakage should be quite small.

I've also successfully tested on a McDivitt.

Ian.




[Xen-devel] [PATCH v2 for-4.5 1/5] xen: arm: Add earlyprintk for McDivitt.

2014-11-19 Thread Ian Campbell
Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
v2: Remove pointless/unused baud rate setting.

A bunch of other entries have these, but cleaning them up is out of scope here 
I think.
---
 xen/arch/arm/Rules.mk |5 +
 1 file changed, 5 insertions(+)

diff --git a/xen/arch/arm/Rules.mk b/xen/arch/arm/Rules.mk
index 572d854..30c7823 100644
--- a/xen/arch/arm/Rules.mk
+++ b/xen/arch/arm/Rules.mk
@@ -95,6 +95,11 @@ EARLY_PRINTK_BAUD := 115200
 EARLY_UART_BASE_ADDRESS := 0x1c02
 EARLY_UART_REG_SHIFT := 2
 endif
+ifeq ($(CONFIG_EARLY_PRINTK), xgene-mcdivitt)
+EARLY_PRINTK_INC := 8250
+EARLY_UART_BASE_ADDRESS := 0x1c021000
+EARLY_UART_REG_SHIFT := 2
+endif
 ifeq ($(CONFIG_EARLY_PRINTK), juno)
 EARLY_PRINTK_INC := pl011
 EARLY_PRINTK_BAUD := 115200
-- 
1.7.10.4



[Xen-devel] [PATCH v2 for-4.5 2/5] xen: arm: Drop EARLY_PRINTK_BAUD from entries which don't set ..._INIT_UART

2014-11-19 Thread Ian Campbell
EARLY_PRINTK_BAUD doesn't do anything unless EARLY_PRINTK_INIT_UART is set.

Furthermore only the pl011 driver implements the init routine at all, so the
entries which use 8250 and specified a BAUD were doubly wrong.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
v2: New patch.
---
 xen/arch/arm/Rules.mk |7 ---
 1 file changed, 7 deletions(-)

diff --git a/xen/arch/arm/Rules.mk b/xen/arch/arm/Rules.mk
index 30c7823..4ee51a9 100644
--- a/xen/arch/arm/Rules.mk
+++ b/xen/arch/arm/Rules.mk
@@ -45,7 +45,6 @@ ifeq ($(debug),y)
 # Early printk for versatile express
 ifeq ($(CONFIG_EARLY_PRINTK), vexpress)
 EARLY_PRINTK_INC := pl011
-EARLY_PRINTK_BAUD := 38400
 EARLY_UART_BASE_ADDRESS := 0x1c09
 endif
 ifeq ($(CONFIG_EARLY_PRINTK), fastmodel)
@@ -56,12 +55,10 @@ EARLY_UART_BASE_ADDRESS := 0x1c09
 endif
 ifeq ($(CONFIG_EARLY_PRINTK), exynos5250)
 EARLY_PRINTK_INC := exynos4210
-EARLY_PRINTK_BAUD := 115200
 EARLY_UART_BASE_ADDRESS := 0x12c2
 endif
 ifeq ($(CONFIG_EARLY_PRINTK), midway)
 EARLY_PRINTK_INC := pl011
-EARLY_PRINTK_BAUD := 115200
 EARLY_UART_BASE_ADDRESS := 0xfff36000
 endif
 ifeq ($(CONFIG_EARLY_PRINTK), omap5432)
@@ -91,7 +88,6 @@ EARLY_UART_REG_SHIFT := 2
 endif
 ifeq ($(CONFIG_EARLY_PRINTK), xgene-storm)
 EARLY_PRINTK_INC := 8250
-EARLY_PRINTK_BAUD := 115200
 EARLY_UART_BASE_ADDRESS := 0x1c02
 EARLY_UART_REG_SHIFT := 2
 endif
@@ -102,18 +98,15 @@ EARLY_UART_REG_SHIFT := 2
 endif
 ifeq ($(CONFIG_EARLY_PRINTK), juno)
 EARLY_PRINTK_INC := pl011
-EARLY_PRINTK_BAUD := 115200
 EARLY_UART_BASE_ADDRESS := 0x7ff8
 endif
 ifeq ($(CONFIG_EARLY_PRINTK), hip04-d01)
 EARLY_PRINTK_INC := 8250
-EARLY_PRINTK_BAUD := 115200
 EARLY_UART_BASE_ADDRESS := 0xE4007000
 EARLY_UART_REG_SHIFT := 2
 endif
 ifeq ($(CONFIG_EARLY_PRINTK), seattle)
 EARLY_PRINTK_INC := pl011
-EARLY_PRINTK_BAUD := 115200
 EARLY_UART_BASE_ADDRESS := 0xe101
 endif
 
-- 
1.7.10.4



[Xen-devel] [PATCH v2 for-4.5 4/5] xen: arm: correct specific mappings for PCIE0 on X-Gene

2014-11-19 Thread Ian Campbell
The region assigned to PCIE0, according to the docs, is 0x0e0 to
0x100. They make no distinction between PCI CFG and PCI IO mem within
this range (in fact, I'm not sure that isn't up to the driver).

Signed-off-by: Ian Campbell ian.campb...@citrix.com
Reviewed-by: Julien Grall julien.gr...@linaro.org
---
 xen/arch/arm/platforms/xgene-storm.c |   18 ++
 1 file changed, 2 insertions(+), 16 deletions(-)

diff --git a/xen/arch/arm/platforms/xgene-storm.c b/xen/arch/arm/platforms/xgene-storm.c
index 8685c93..8c27f24 100644
--- a/xen/arch/arm/platforms/xgene-storm.c
+++ b/xen/arch/arm/platforms/xgene-storm.c
@@ -89,22 +89,8 @@ static int xgene_storm_specific_mapping(struct domain *d)
 int ret;
 
 /* Map the PCIe bus resources */
-ret = map_one_mmio(d, PCI MEM REGION, paddr_to_pfn(0xe0UL),
-paddr_to_pfn(0xe01000UL));
-if ( ret )
-goto err;
-
-ret = map_one_mmio(d, PCI IO REGION, paddr_to_pfn(0xe08000UL),
-   paddr_to_pfn(0xe08001UL));
-if ( ret )
-goto err;
-
-ret = map_one_mmio(d, PCI CFG REGION, paddr_to_pfn(0xe0d000UL),
-paddr_to_pfn(0xe0d020UL));
-if ( ret )
-goto err;
-ret = map_one_mmio(d, PCI MSI REGION, paddr_to_pfn(0xe01000UL),
-paddr_to_pfn(0xe01080UL));
+ret = map_one_mmio(d, PCI MEMORY, paddr_to_pfn(0x0e0UL),
+paddr_to_pfn(0x010UL));
 if ( ret )
 goto err;
 
-- 
1.7.10.4



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 Hi Stefano,
 
 On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
  On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
  Hi Stefano,
 
 if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
-GICH[GICH_HCR] |= GICH_HCR_UIE;
+GICH[GICH_HCR] |= GICH_HCR_NPIE;
 else
-GICH[GICH_HCR] &= ~GICH_HCR_UIE;
+GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
   
 }
  
   Yes, exactly
 
  I tried, hang still occurs with this change
 
  We need to figure out why during the hang you still have all the LRs
  busy even if you are getting maintenance interrupts that should cause
  them to be cleared.
 
 
 I see that I have free LRs during maintenance interrupt
 
 (XEN) gic.c:871:d0v0 maintenance interrupt
 (XEN) GICH_LRs (vcpu 0) mask=0
 (XEN)HW_LR[0]=9a015856
 (XEN)HW_LR[1]=0
 (XEN)HW_LR[2]=0
 (XEN)HW_LR[3]=0
 (XEN) Inflight irq=86 lr=0
 (XEN) Inflight irq=2 lr=255
 (XEN) Pending irq=2
 
 But I see that after I got hang - maintenance interrupts are generated
 continuously. Platform continues printing the same log till reboot.

Exactly the same log? As in the one above you just pasted?
That is very very suspicious.

I am thinking that we are not handling GICH_HCR_UIE correctly and
something we do in Xen, maybe writing to an LR register, might trigger a
new maintenance interrupt immediately causing an infinite loop.

Could you please try this patch? It disables GICH_HCR_UIE immediately on
hypervisor entry.
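
A toy model of the suspected loop may help readers follow the
hypothesis. It assumes the GICv2 rule that the underflow maintenance
interrupt is signalled while GICH_HCR.UIE is set and at most one list
register holds a valid entry, and it assumes the GICv2 list register
layout (state in bits 29:28, virtual ID in bits 9:0). The helper names
(lr_virq, lr_is_valid, underflow_asserted) and the hang_lrs array are
invented for this sketch and are not Xen functions; the LR values are
the ones from the log quoted above.

```c
#include <stdint.h>

/* Bit positions assumed from the GICv2 architecture layout. */
#define GICH_HCR_EN          (1u << 0)
#define GICH_HCR_UIE         (1u << 1)
#define GICH_LR_STATE_SHIFT  28
#define GICH_LR_STATE_MASK   0x3u
#define GICH_LR_VIRTUAL_MASK 0x3ffu

/* List registers as printed in the maintenance interrupt log. */
const uint32_t hang_lrs[4] = { 0x9a015856u, 0u, 0u, 0u };

/* Virtual interrupt ID carried by a list register. */
unsigned int lr_virq(uint32_t lr)
{
    return lr & GICH_LR_VIRTUAL_MASK;
}

/* A list register is valid if its state is pending and/or active. */
int lr_is_valid(uint32_t lr)
{
    return ((lr >> GICH_LR_STATE_SHIFT) & GICH_LR_STATE_MASK) != 0;
}

/*
 * Model of the underflow condition: asserted while UIE is set and
 * no more than one list register is valid.
 */
int underflow_asserted(uint32_t hcr, const uint32_t *lrs, unsigned int nr_lrs)
{
    unsigned int valid = 0;

    for (unsigned int i = 0; i < nr_lrs; i++)
        if (lr_is_valid(lrs[i]))
            valid++;

    return (hcr & GICH_HCR_UIE) && valid <= 1;
}
```

Under these assumptions the logged state (one valid LR carrying virq
86, three empty LRs) keeps the underflow condition true for as long as
UIE stays set, which would match a maintenance interrupt that fires
again as soon as it is handled. Clearing UIE makes the model report no
interrupt.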


diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 4d2a92d..6ae8dc4 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
     if ( is_idle_vcpu(v) )
         return;
 
+    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
+
     spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
     while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
@@ -821,12 +823,8 @@ void gic_inject(void)
 
     gic_restore_pending_irqs(current);
 
-
     if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
         GICH[GICH_HCR] |= GICH_HCR_UIE;
-    else
-        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
-
 }
 
 static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)


Re: [Xen-devel] [Qemu-devel] qemu 2.2 crash on linux hvm domU (full backtrace included)

2014-11-19 Thread Fabio Fantoni

On 19/11/2014 15:56, Don Slutz wrote:
I think I know what is happening here.  But you are pointing at the
wrong change.

commit 9b23cfb76b3a5e9eb5cc899eaf2f46bc46d33ba4

is what I currently suspect is the cause.  I think that xen_enabled()
is returning false in pc_machine_initfn, whereas in pc_init1 it is
returning true.

I am thinking that:


diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 7bb97a4..3268c29 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -914,7 +914,7 @@ static QEMUMachine xenfv_machine = {
     .desc = "Xen Fully-virtualized PC",
     .init = pc_xen_hvm_init,
     .max_cpus = HVM_MAX_VCPUS,
-    .default_machine_opts = "accel=xen",
+    .default_machine_opts = "accel=xen,vmport=off",
     .hot_add_cpu = pc_hot_add_cpu,
 };
 #endif

Will fix your issue. I have not tested this yet.


Tested now: it fixes the regression of Linux HVM domUs with qemu 2.2,
thanks.
I think that I'm not the only one hitting this regression, and that
this patch (or a fix for the underlying cause in vmport) should be
applied before qemu 2.2 final.




-Don Slutz


On 11/19/14 09:04, Fabio Fantoni wrote:

On 14/11/2014 12:25, Fabio Fantoni wrote:
dom0 xen-unstable from staging git with "x86/hvm: Extend HVM cpuid
leaf with vcpu id" and "x86/hvm: Add per-vcpu evtchn upcalls"
patches, and qemu 2.2 from spice git (spice/next commit
e779fa0a715530311e6f59fc8adb0f6eca914a89):

https://github.com/Fantu/Xen/commits/rebase/m2r-staging


I tried with qemu tag v2.2.0-rc2 and the crash still happens; here is
the full backtrace of the latest test:

Program received signal SIGSEGV, Segmentation fault.
0x55689b07 in vmport_ioport_read (opaque=0x564443a0, addr=0, size=4)
    at /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73
73  eax = env->regs[R_EAX];
(gdb) bt full
#0  0x55689b07 in vmport_ioport_read (opaque=0x564443a0, addr=0, size=4)
    at /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73
        s = 0x564443a0
        cs = 0x0
        cpu = 0x0
        __func__ = vmport_ioport_read
        env = 0x8250
        command = 0 '\000'
        eax = 0
#1  0x55655fc4 in memory_region_read_accessor (mr=0x5628, addr=0,
    value=0x7fffd8d0, size=4, shift=0, mask=4294967295)
    at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:410
        tmp = 0
#2  0x556562b7 in access_with_adjusted_size (addr=0,
    value=0x7fffd8d0, size=4, access_size_min=4, access_size_max=4,
    access=0x55655f62 memory_region_read_accessor, mr=0x5628)
    at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:480
        access_mask = 4294967295
        access_size = 4
        i = 0
#3  0x556590e9 in memory_region_dispatch_read1 (mr=0x5628, addr=0, size=4)
    at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1077
        data = 0
#4  0x556591b1 in memory_region_dispatch_read (mr=0x5628, addr=0,
    pval=0x7fffd9a8, size=4)
    at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1099
No locals.
#5  0x5565cbbc in io_mem_read (mr=0x5628, addr=0,
    pval=0x7fffd9a8, size=4)
    at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1962
No locals.
#6  0x5560a1ca in address_space_rw (as=0x55eaf920, addr=22104,
    buf=0x7fffda50 \377\377\377\377, len=4, is_write=false)
    at /mnt/vm/xen/Xen/tools/qemu-xen-dir/exec.c:2167
        l = 4
        ptr = 0x55a92d87 %s/%d:\n
        val = 7852232130387826944
        addr1 = 0
        mr = 0x5628
        error = false
#7  0x5560a38f in address_space_read (as=0x55eaf920, addr=22104,
    buf=0x7fffda50 \377\377\377\377, len=4)
    at /mnt/vm/xen/Xen/tools/qemu-xen-dir/exec.c:2205
No locals.
#8  0x5564fd4b in cpu_inl (addr=22104)
    at /mnt/vm/xen/Xen/tools/qemu-xen-dir/ioport.c:117
        buf = \377\377\377\377
        val = 21845
#9  0x55670c73 in do_inp (addr=22104, size=4)
    at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:684
No locals.
#10 0x55670ee0 in cpu_ioreq_pio (req=0x77ff3020)
    at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:747
        i = 1
#11 0x556714b3 in handle_ioreq (state=0x563c2510, req=0x77ff3020)
    at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:853
No locals.
#12 0x55671826 in cpu_handle_ioreq (opaque=0x563c2510)
    at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:931
        state = 0x563c2510
        req = 0x77ff3020
#13 0x5596e240 in qemu_iohandler_poll (pollfds=0x56389a30, ret=1)
    at iohandler.c:143
        revents = 1
        pioh = 0x563f7610
        ioh = 0x56450a40
#14 0x5596de1c in main_loop_wait (nonblocking=0) at main-loop.c:495
        ret = 1
        timeout = 4294967295
        timeout_ns = 3965432
#15 0x55756d3f in main_loop () at vl.c:1882
        nonblocking = false

Re: [Xen-devel] [Qemu-devel] qemu 2.2 crash on linux hvm domU (full backtrace included)

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, Fabio Fantoni wrote:
 On 19/11/2014 15:56, Don Slutz wrote:
  I think I know what is happening here.  But you are pointing at the wrong
  change.
  
  commit 9b23cfb76b3a5e9eb5cc899eaf2f46bc46d33ba4
  
  is what I currently suspect is the cause.  I think that xen_enabled()
  is returning false in pc_machine_initfn, whereas in pc_init1 it is
  returning true.
  
  I am thinking that:
  
  
  diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
  index 7bb97a4..3268c29 100644
  --- a/hw/i386/pc_piix.c
  +++ b/hw/i386/pc_piix.c
  @@ -914,7 +914,7 @@ static QEMUMachine xenfv_machine = {
   .desc = "Xen Fully-virtualized PC",
   .init = pc_xen_hvm_init,
   .max_cpus = HVM_MAX_VCPUS,
  -.default_machine_opts = "accel=xen",
  +.default_machine_opts = "accel=xen,vmport=off",
   .hot_add_cpu = pc_hot_add_cpu,
   };
   #endif
  
  Will fix your issue. I have not tested this yet.
 
 Tested now: it fixes the regression of Linux HVM domUs with qemu 2.2, thanks.
 I think that I'm not the only one hitting this regression, and that this
 patch (or a fix for the underlying cause in vmport) should be applied
 before qemu 2.2 final.

Don,
please submit a proper patch with a Signed-off-by.

Thanks!

- Stefano

  

[Xen-devel] [PATCHv3 0/4]: dma, x86, xen: reduce SWIOTLB usage in Xen guests

2014-11-19 Thread David Vrabel
On systems where DMA addresses and physical addresses are not 1:1
(such as Xen PV guests), the generic dma_get_required_mask() will not
return the correct mask (since it uses max_pfn).

Some device drivers (such as mptsas, mpt2sas) use
dma_get_required_mask() to set the device's DMA mask to allow them to use
only 32-bit DMA addresses in hardware structures.  This results in
unnecessary use of the SWIOTLB if DMA addresses are wider than 32 bits,
impacting performance significantly.

This series allows Xen PV guests to override the default
dma_get_required_mask() with one that calculates the DMA mask from the
maximum MFN (and not the PFN).
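
To make the difference concrete, here is a small user-space sketch of
an MFN-based calculation in the spirit of the one this series adds. It
is an illustration only: fls64_sketch() and dma_bit_mask() are local
stand-ins for the kernel's fls_long() and DMA_BIT_MASK(), PAGE_SHIFT
is assumed to be 12 (4 KiB pages), and the function name
required_mask_from_max_mfn() is made up for this example.

```c
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Position of the highest set bit, counting from 1; 0 if x is 0. */
unsigned int fls64_sketch(uint64_t x)
{
    unsigned int n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/* Mask with the low n bits set (all 64 bits if n >= 64). */
uint64_t dma_bit_mask(unsigned int n)
{
    return n >= 64 ? ~(uint64_t)0 : (((uint64_t)1 << n) - 1);
}

/*
 * Required DMA mask derived from the highest machine frame number:
 * enough bits to address the last byte of frame (max_mfn - 1).
 */
uint64_t required_mask_from_max_mfn(uint64_t max_mfn)
{
    return dma_bit_mask(fls64_sketch(max_mfn - 1) + PAGE_SHIFT);
}
```

For example, a host whose frames end exactly at 4 GiB
(max_mfn = 0x100000) yields a 32-bit mask, while a host with frames up
to 32 GiB (max_mfn = 0x800000) yields a 35-bit mask, regardless of how
small the guest's own max_pfn is.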

Changes in v3:
- fix off-by-one in xen_dma_get_required_mask()
- split ia64 changes into separate patch.

Changes in v2:
- split x86 and xen changes into separate patches

David


[Xen-devel] [PATCH 4/4] x86/xen: use the maximum MFN to calculate the required DMA mask

2014-11-19 Thread David Vrabel
On a Xen PV guest the DMA addresses and physical addresses are not 1:1,
so the generic dma_get_required_mask() does not return the correct
mask (since it uses max_pfn).

Some device drivers (such as mptsas, mpt2sas) use
dma_get_required_mask() to set the device's DMA mask to allow them to
use only 32-bit DMA addresses in hardware structures.  This results in
unnecessary use of the SWIOTLB if DMA addresses are wider than 32 bits,
impacting performance significantly.

Provide a get_required_mask op that uses the maximum MFN to calculate
the DMA mask.

Signed-off-by: David Vrabel david.vra...@citrix.com
---
 arch/x86/xen/pci-swiotlb-xen.c |1 +
 drivers/xen/swiotlb-xen.c  |   13 +
 include/xen/swiotlb-xen.h  |4 
 3 files changed, 18 insertions(+)

diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
index 0e98e5d..a5d180a 100644
--- a/arch/x86/xen/pci-swiotlb-xen.c
+++ b/arch/x86/xen/pci-swiotlb-xen.c
@@ -31,6 +31,7 @@ static struct dma_map_ops xen_swiotlb_dma_ops = {
.map_page = xen_swiotlb_map_page,
.unmap_page = xen_swiotlb_unmap_page,
.dma_supported = xen_swiotlb_dma_supported,
+   .get_required_mask = xen_swiotlb_get_required_mask,
 };
 
 /*
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index ebd8f21..654587d 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -42,9 +42,11 @@
 #include <xen/page.h>
 #include <xen/xen-ops.h>
 #include <xen/hvc-console.h>
+#include <xen/interface/memory.h>
 
 #include <asm/dma-mapping.h>
 #include <asm/xen/page-coherent.h>
+#include <asm/xen/hypercall.h>
 
 #include <trace/events/swiotlb.h>
 /*
@@ -683,3 +685,14 @@ xen_swiotlb_set_dma_mask(struct device *dev, u64 dma_mask)
return 0;
 }
 EXPORT_SYMBOL_GPL(xen_swiotlb_set_dma_mask);
+
+u64
+xen_swiotlb_get_required_mask(struct device *dev)
+{
+   unsigned long max_mfn;
+
+   max_mfn = HYPERVISOR_memory_op(XENMEM_maximum_ram_page, NULL);
+
+   return DMA_BIT_MASK(fls_long(max_mfn - 1) + PAGE_SHIFT);
+}
+EXPORT_SYMBOL_GPL(xen_swiotlb_get_required_mask);
diff --git a/include/xen/swiotlb-xen.h b/include/xen/swiotlb-xen.h
index 8b2eb93..640 100644
--- a/include/xen/swiotlb-xen.h
+++ b/include/xen/swiotlb-xen.h
@@ -58,4 +58,8 @@ xen_swiotlb_dma_supported(struct device *hwdev, u64 mask);
 
 extern int
 xen_swiotlb_set_dma_mask(struct device *dev, u64 dma_mask);
+
+extern u64
+xen_swiotlb_get_required_mask(struct device *dev);
+
 #endif /* __LINUX_SWIOTLB_XEN_H */
-- 
1.7.10.4



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:
 On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 Hi Stefano,

 On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
  On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
  Hi Stefano,
 
 if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
-GICH[GICH_HCR] |= GICH_HCR_UIE;
+GICH[GICH_HCR] |= GICH_HCR_NPIE;
 else
-GICH[GICH_HCR] &= ~GICH_HCR_UIE;
+GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
   
 }
  
   Yes, exactly
 
  I tried, hang still occurs with this change
 
  We need to figure out why during the hang you still have all the LRs
  busy even if you are getting maintenance interrupts that should cause
  them to be cleared.
 

 I see that I have free LRs during maintenance interrupt

 (XEN) gic.c:871:d0v0 maintenance interrupt
 (XEN) GICH_LRs (vcpu 0) mask=0
 (XEN)HW_LR[0]=9a015856
 (XEN)HW_LR[1]=0
 (XEN)HW_LR[2]=0
 (XEN)HW_LR[3]=0
 (XEN) Inflight irq=86 lr=0
 (XEN) Inflight irq=2 lr=255
 (XEN) Pending irq=2

 But I see that after I got hang - maintenance interrupts are generated
 continuously. Platform continues printing the same log till reboot.

 Exactly the same log? As in the one above you just pasted?
 That is very very suspicious.

Yes, exactly the same log. And it looks like this means that the LRs
are flushed correctly.


 I am thinking that we are not handling GICH_HCR_UIE correctly and
 something we do in Xen, maybe writing to an LR register, might trigger a
 new maintenance interrupt immediately causing an infinite loop.


Yes, this is what I'm thinking too. Taking into account all the
collected debug info, it looks like once the LRs are overloaded with
SGIs, a maintenance interrupt occurs.
It is then not handled properly and occurs again and again, so the
platform hangs inside its handler.

 Could you please try this patch? It disables GICH_HCR_UIE immediately on
 hypervisor entry.


Now trying.


 diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
 index 4d2a92d..6ae8dc4 100644
 --- a/xen/arch/arm/gic.c
 +++ b/xen/arch/arm/gic.c
 @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
  if ( is_idle_vcpu(v) )
  return;

 +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
 +
  spin_lock_irqsave(&v->arch.vgic.lock, flags);

  while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
 @@ -821,12 +823,8 @@ void gic_inject(void)

  gic_restore_pending_irqs(current);

 -
  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
  GICH[GICH_HCR] |= GICH_HCR_UIE;
 -else
 -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
 -
  }

  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com


[Xen-devel] [PATCH 1/4] dma: add dma_get_required_mask_from_max_pfn()

2014-11-19 Thread David Vrabel
A generic dma_get_required_mask() is useful even for architectures (such
as ia64) that define ARCH_HAS_GET_REQUIRED_MASK.
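
For reference, the calculation that the renamed helper performs can be
sketched in user space as follows. This is an illustrative
re-implementation, not the kernel source: max_pfn is passed in as a
parameter instead of being a global, PAGE_SHIFT is assumed to be 12,
fls32_sketch() stands in for the kernel's fls(), and the sketch
assumes max_pfn is greater than 1.

```c
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Position of the highest set bit, counting from 1; 0 if x is 0. */
int fls32_sketch(uint32_t x)
{
    int n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/*
 * Smallest "all ones" mask that covers the address of the last page
 * below max_pfn. Assumes max_pfn > 1.
 */
uint64_t required_mask_from_max_pfn(uint64_t max_pfn)
{
    /* Low and high 32 bits of the byte address of page (max_pfn - 1). */
    uint32_t low_totalram = (uint32_t)((max_pfn - 1) << PAGE_SHIFT);
    uint32_t high_totalram = (uint32_t)((max_pfn - 1) >> (32 - PAGE_SHIFT));
    uint64_t mask;

    if (!high_totalram) {
        /* Memory fits below 4 GiB: round up to 2^n - 1. */
        low_totalram = (uint32_t)1 << (fls32_sketch(low_totalram) - 1);
        low_totalram += low_totalram - 1;
        mask = low_totalram;
    } else {
        /* Memory extends above 4 GiB: round the high word up. */
        high_totalram = (uint32_t)1 << (fls32_sketch(high_totalram) - 1);
        high_totalram += high_totalram - 1;
        mask = ((uint64_t)high_totalram << 32) + 0xffffffffu;
    }
    return mask;
}
```

With these assumptions, 1 GiB of memory (max_pfn = 0x40000) gives a
30-bit mask and 8 GiB (max_pfn = 0x200000) gives a 33-bit mask.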

Signed-off-by: David Vrabel david.vra...@citrix.com
Reviewed-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
---
 drivers/base/platform.c |   10 --
 include/linux/dma-mapping.h |1 +
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index b2afc29..f9f3930 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1009,8 +1009,7 @@ int __init platform_bus_init(void)
return error;
 }
 
-#ifndef ARCH_HAS_DMA_GET_REQUIRED_MASK
-u64 dma_get_required_mask(struct device *dev)
+u64 dma_get_required_mask_from_max_pfn(struct device *dev)
 {
    u32 low_totalram = ((max_pfn - 1) << PAGE_SHIFT);
    u32 high_totalram = ((max_pfn - 1) >> (32 - PAGE_SHIFT));
@@ -1028,6 +1027,13 @@ u64 dma_get_required_mask(struct device *dev)
}
return mask;
 }
+EXPORT_SYMBOL_GPL(dma_get_required_mask_from_max_pfn);
+
+#ifndef ARCH_HAS_DMA_GET_REQUIRED_MASK
+u64 dma_get_required_mask(struct device *dev)
+{
+   return dma_get_required_mask_from_max_pfn(dev);
+}
 EXPORT_SYMBOL_GPL(dma_get_required_mask);
 #endif
 
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index d5d3881..6e2fdfc 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -127,6 +127,7 @@ static inline int dma_coerce_mask_and_coherent(struct device *dev, u64 mask)
return dma_set_mask_and_coherent(dev, mask);
 }
 
+extern u64 dma_get_required_mask_from_max_pfn(struct device *dev);
 extern u64 dma_get_required_mask(struct device *dev);
 
 #ifndef set_arch_dma_coherent_ops
-- 
1.7.10.4



[Xen-devel] [PATCH 3/4] x86: allow dma_get_required_mask() to be overridden

2014-11-19 Thread David Vrabel
Use dma_ops->get_required_mask() if provided, defaulting to
dma_get_required_mask_from_max_pfn().

This is needed on systems (such as Xen PV guests) where the DMA
address and the physical address are not equal.

ARCH_HAS_DMA_GET_REQUIRED_MASK is defined in asm/device.h instead of
asm/dma-mapping.h because linux/dma-mapping.h uses the define before
including asm/dma-mapping.h
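
The override pattern described above (use the per-platform callback if
one is set, otherwise fall back to the generic calculation) can be
sketched in isolation like this. All names here (device_sketch,
dma_ops_sketch, native_ops_sketch, xen_ops_sketch and the two mask
functions) are hypothetical stand-ins, and the returned masks are
fixed example values rather than real calculations.

```c
#include <stddef.h>
#include <stdint.h>

struct device_sketch;   /* opaque stand-in for struct device */

/* Stand-in for the subset of dma_map_ops that matters here. */
struct dma_ops_sketch {
    uint64_t (*get_required_mask)(struct device_sketch *dev);
};

/* Generic fallback; returns a fixed 32-bit mask for the example. */
uint64_t mask_from_max_pfn_sketch(struct device_sketch *dev)
{
    (void)dev;
    return 0xffffffffULL;
}

/* Platform override; returns a fixed 35-bit mask for the example. */
uint64_t xen_mask_sketch(struct device_sketch *dev)
{
    (void)dev;
    return 0x7ffffffffULL;
}

/* An ops table without the optional callback, and one with it. */
const struct dma_ops_sketch native_ops_sketch = { NULL };
const struct dma_ops_sketch xen_ops_sketch = { xen_mask_sketch };

/* Prefer the platform callback when present, else use the default. */
uint64_t required_mask_sketch(const struct dma_ops_sketch *ops,
                              struct device_sketch *dev)
{
    if (ops->get_required_mask)
        return ops->get_required_mask(dev);
    return mask_from_max_pfn_sketch(dev);
}
```

Leaving the callback NULL keeps the existing behaviour for every
platform that does not need a different mask, which is why only the
Xen ops table has to change in the final patch of the series.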

Signed-off-by: David Vrabel david.vra...@citrix.com
Reviewed-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
---
 arch/x86/include/asm/device.h |2 ++
 arch/x86/kernel/pci-dma.c |8 
 2 files changed, 10 insertions(+)

diff --git a/arch/x86/include/asm/device.h b/arch/x86/include/asm/device.h
index 03dd729..10bc628 100644
--- a/arch/x86/include/asm/device.h
+++ b/arch/x86/include/asm/device.h
@@ -13,4 +13,6 @@ struct dev_archdata {
 struct pdev_archdata {
 };
 
+#define ARCH_HAS_DMA_GET_REQUIRED_MASK
+
 #endif /* _ASM_X86_DEVICE_H */
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index a25e202..5154400 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -140,6 +140,14 @@ void dma_generic_free_coherent(struct device *dev, size_t size, void *vaddr,
    free_pages((unsigned long)vaddr, get_order(size));
 }
 
+u64 dma_get_required_mask(struct device *dev)
+{
+   if (dma_ops->get_required_mask)
+       return dma_ops->get_required_mask(dev);
+   return dma_get_required_mask_from_max_pfn(dev);
+}
+EXPORT_SYMBOL_GPL(dma_get_required_mask);
+
 /*
  * See Documentation/x86/x86_64/boot-options.txt for the iommu kernel
  * parameter documentation.
-- 
1.7.10.4



[Xen-devel] [PATCH 2/4] ia64: use common dma_get_required_mask_from_pfn()

2014-11-19 Thread David Vrabel
Signed-off-by: David Vrabel david.vra...@citrix.com
Cc: Tony Luck tony.l...@intel.com
Cc: Fenghua Yu fenghua...@intel.com
Cc: linux-i...@vger.kernel.org
---
 arch/ia64/include/asm/machvec.h  |2 +-
 arch/ia64/include/asm/machvec_init.h |1 -
 arch/ia64/pci/pci.c  |   20 
 3 files changed, 1 insertion(+), 22 deletions(-)

diff --git a/arch/ia64/include/asm/machvec.h b/arch/ia64/include/asm/machvec.h
index 9c39bdf..beaa47d 100644
--- a/arch/ia64/include/asm/machvec.h
+++ b/arch/ia64/include/asm/machvec.h
@@ -287,7 +287,7 @@ extern struct dma_map_ops *dma_get_ops(struct device *);
 # define platform_dma_get_ops  dma_get_ops
 #endif
 #ifndef platform_dma_get_required_mask
-# define  platform_dma_get_required_mask   ia64_dma_get_required_mask
+# define  platform_dma_get_required_mask   dma_get_required_mask_from_max_pfn
 #endif
 #ifndef platform_irq_to_vector
 # define platform_irq_to_vector   __ia64_irq_to_vector
diff --git a/arch/ia64/include/asm/machvec_init.h b/arch/ia64/include/asm/machvec_init.h
index 37a4698..ef964b2 100644
--- a/arch/ia64/include/asm/machvec_init.h
+++ b/arch/ia64/include/asm/machvec_init.h
@@ -3,7 +3,6 @@
 
 extern ia64_mv_send_ipi_t ia64_send_ipi;
 extern ia64_mv_global_tlb_purge_t ia64_global_tlb_purge;
-extern ia64_mv_dma_get_required_mask ia64_dma_get_required_mask;
 extern ia64_mv_irq_to_vector __ia64_irq_to_vector;
 extern ia64_mv_local_vector_to_irq __ia64_local_vector_to_irq;
 extern ia64_mv_pci_get_legacy_mem_t ia64_pci_get_legacy_mem;
diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 291a582..79da21b 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -791,26 +791,6 @@ static void __init set_pci_dfl_cacheline_size(void)
    pci_dfl_cache_line_size = (1 << cci.pcci_line_size) / 4;
 }
 
-u64 ia64_dma_get_required_mask(struct device *dev)
-{
-   u32 low_totalram = ((max_pfn - 1) << PAGE_SHIFT);
-   u32 high_totalram = ((max_pfn - 1) >> (32 - PAGE_SHIFT));
-   u64 mask;
-
-   if (!high_totalram) {
-       /* convert to mask just covering totalram */
-       low_totalram = (1 << (fls(low_totalram) - 1));
-       low_totalram += low_totalram - 1;
-       mask = low_totalram;
-   } else {
-       high_totalram = (1 << (fls(high_totalram) - 1));
-       high_totalram += high_totalram - 1;
-       mask = (((u64)high_totalram) << 32) + 0xffffffff;
-   }
-   return mask;
-}
-EXPORT_SYMBOL_GPL(ia64_dma_get_required_mask);
-
 u64 dma_get_required_mask(struct device *dev)
 {
return platform_dma_get_required_mask(dev);
-- 
1.7.10.4
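
For readers following the removal: the deleted helper derives the DMA
mask from the highest page frame number. Below is a minimal user-space
sketch of that arithmetic. It assumes the address fits in one 64-bit
variable instead of the split 32-bit halves used in the patch, and
`required_mask` is an illustrative name, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest all-ones mask covering the last page. */
static uint64_t required_mask(uint64_t max_pfn, unsigned page_shift)
{
    uint64_t mask = (max_pfn - 1) << page_shift; /* address of last page */

    /* Smear the highest set bit into every lower bit position. */
    mask |= mask >> 1;
    mask |= mask >> 2;
    mask |= mask >> 4;
    mask |= mask >> 8;
    mask |= mask >> 16;
    mask |= mask >> 32;
    return mask;
}
```

With 4 KiB pages, 1 GiB of RAM (max_pfn = 0x40000) gives 0x3fffffff and
8 GiB (max_pfn = 0x200000) gives 0x1ffffffff, which is what the removed
code would compute for the same inputs.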


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
 andrii.tseglyts...@globallogic.com wrote:
  On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
  stefano.stabell...@eu.citrix.com wrote:
  On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
  Hi Stefano,
 
  On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
  stefano.stabell...@eu.citrix.com wrote:
   On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
   Hi Stefano,
  
  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
 -GICH[GICH_HCR] |= GICH_HCR_UIE;
 +GICH[GICH_HCR] |= GICH_HCR_NPIE;
  else
 -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
 +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;

  }
   
Yes, exactly
  
   I tried, hang still occurs with this change
  
   We need to figure out why during the hang you still have all the LRs
   busy even if you are getting maintenance interrupts that should cause
   them to be cleared.
  
 
  I see that I have free LRs during maintenance interrupt
 
  (XEN) gic.c:871:d0v0 maintenance interrupt
  (XEN) GICH_LRs (vcpu 0) mask=0
  (XEN)HW_LR[0]=9a015856
  (XEN)HW_LR[1]=0
  (XEN)HW_LR[2]=0
  (XEN)HW_LR[3]=0
  (XEN) Inflight irq=86 lr=0
  (XEN) Inflight irq=2 lr=255
  (XEN) Pending irq=2
 
  But I see that after I got hang - maintenance interrupts are generated
  continuously. Platform continues printing the same log till reboot.
 
  Exactly the same log? As in the one above you just pasted?
  That is very very suspicious.
 
  Yes exactly the same log. And looks like it means that LRs are flushed
  correctly.
 
 
  I am thinking that we are not handling GICH_HCR_UIE correctly and
  something we do in Xen, maybe writing to an LR register, might trigger a
  new maintenance interrupt immediately causing an infinite loop.
 
 
  Yes, this is what I'm thinking about. Taking in account all collected
  debug info it looks like once LRs are overloaded with SGIs -
  maintenance interrupt occurs.
  And then it is not handled properly, and occurs again and again - so
  platform hangs inside its handler.
 
  Could you please try this patch? It disable GICH_HCR_UIE immediately on
  hypervisor entry.
 
 
  Now trying.
 
 
  diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
  index 4d2a92d..6ae8dc4 100644
  --- a/xen/arch/arm/gic.c
  +++ b/xen/arch/arm/gic.c
  @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
   if ( is_idle_vcpu(v) )
   return;
 
  +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
  +
   spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
   while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
  @@ -821,12 +823,8 @@ void gic_inject(void)
 
   gic_restore_pending_irqs(current);
 
  -
   if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
   GICH[GICH_HCR] |= GICH_HCR_UIE;
  -else
  -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
  -
   }
 
   static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
 
 
 Heh - I don't see hangs with this patch :) But also I see that
 maintenance interrupt doesn't occur (and no hang as result)
 Stefano - is this expected?

No maintenance interrupts at all? That's strange. You should be
receiving them when LRs are full and you still have interrupts pending
to be added to them.

You could add another printk here to see if you should be receiving
them:

     if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
+    {
+        gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
         GICH[GICH_HCR] |= GICH_HCR_UIE;
-    else
-        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
-
+    }
 }
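
The condition being instrumented here can be modelled outside the
hypervisor. The following is only a sketch: it assumes four list
registers (as in the dumps in this thread) and uses plain arguments in
place of the per-CPU lr_mask and the lr_pending list. The name
`should_request_uie` is hypothetical and does not exist in Xen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_LRS 4u /* assumption: four list registers, as in the dumps */

/* True when every list register is in use (all NR_LRS low bits set). */
static bool lr_all_full(uint64_t lr_mask)
{
    return lr_mask == ((UINT64_C(1) << NR_LRS) - 1);
}

/*
 * Hypothetical model of the decision in gic_inject(): ask for an
 * underflow maintenance interrupt only if interrupts are still queued
 * and there is no free list register to put them in.
 */
static bool should_request_uie(uint64_t lr_mask, unsigned pending)
{
    return pending > 0 && lr_all_full(lr_mask);
}
```

A mask of 0xf with one pending interrupt yields true; a mask of 0, or a
full mask with nothing pending, yields false.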


 
 
  --
 
  Andrii Tseglytskyi | Embedded Dev
  GlobalLogic
  www.globallogic.com
 
 
 
 -- 
 
 Andrii Tseglytskyi | Embedded Dev
 GlobalLogic
 www.globallogic.com
 



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:
 On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
 andrii.tseglyts...@globallogic.com wrote:
  On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
  stefano.stabell...@eu.citrix.com wrote:
  On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
  Hi Stefano,
 
  On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
  stefano.stabell...@eu.citrix.com wrote:
   On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
   Hi Stefano,
  
  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
 -GICH[GICH_HCR] |= GICH_HCR_UIE;
 +GICH[GICH_HCR] |= GICH_HCR_NPIE;
  else
 -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
 +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;

  }
   
Yes, exactly
  
   I tried, hang still occurs with this change
  
   We need to figure out why during the hang you still have all the LRs
   busy even if you are getting maintenance interrupts that should cause
   them to be cleared.
  
 
  I see that I have free LRs during maintenance interrupt
 
  (XEN) gic.c:871:d0v0 maintenance interrupt
  (XEN) GICH_LRs (vcpu 0) mask=0
  (XEN)HW_LR[0]=9a015856
  (XEN)HW_LR[1]=0
  (XEN)HW_LR[2]=0
  (XEN)HW_LR[3]=0
  (XEN) Inflight irq=86 lr=0
  (XEN) Inflight irq=2 lr=255
  (XEN) Pending irq=2
 
  But I see that after I got hang - maintenance interrupts are generated
  continuously. Platform continues printing the same log till reboot.
 
  Exactly the same log? As in the one above you just pasted?
  That is very very suspicious.
 
  Yes exactly the same log. And looks like it means that LRs are flushed
  correctly.
 
 
  I am thinking that we are not handling GICH_HCR_UIE correctly and
  something we do in Xen, maybe writing to an LR register, might trigger a
  new maintenance interrupt immediately causing an infinite loop.
 
 
  Yes, this is what I'm thinking about. Taking in account all collected
  debug info it looks like once LRs are overloaded with SGIs -
  maintenance interrupt occurs.
  And then it is not handled properly, and occurs again and again - so
  platform hangs inside its handler.
 
  Could you please try this patch? It disable GICH_HCR_UIE immediately on
  hypervisor entry.
 
 
  Now trying.
 
 
  diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
  index 4d2a92d..6ae8dc4 100644
  --- a/xen/arch/arm/gic.c
  +++ b/xen/arch/arm/gic.c
  @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
   if ( is_idle_vcpu(v) )
   return;
 
  +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
  +
   spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
   while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
  @@ -821,12 +823,8 @@ void gic_inject(void)
 
   gic_restore_pending_irqs(current);
 
  -
   if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
   GICH[GICH_HCR] |= GICH_HCR_UIE;
  -else
  -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
  -
   }
 
   static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
 

 Heh - I don't see hangs with this patch :) But also I see that
 maintenance interrupt doesn't occur (and no hang as result)
 Stefano - is this expected?

 No maintenance interrupts at all? That's strange. You should be
 receiving them when LRs are full and you still have interrupts pending
 to be added to them.

 You could add another printk here to see if you should be receiving
 them:

  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
 +{
 +gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
  GICH[GICH_HCR] |= GICH_HCR_UIE;
 -else
 -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
 -
 +}
  }


Requested properly:

(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt

But does not occur



 
 
  --
 
  Andrii Tseglytskyi | Embedded Dev
  GlobalLogic
  www.globallogic.com



 --

 Andrii Tseglytskyi | Embedded Dev
 GlobalLogic
 www.globallogic.com




-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
GIC dump taken while the maintenance interrupt is being requested:

(XEN) GICH_LRs (vcpu 0) mask=f
(XEN)HW_LR[0]=3a1f
(XEN)HW_LR[1]=9a015856
(XEN)HW_LR[2]=1a1b
(XEN)HW_LR[3]=9a00e439
(XEN) Inflight irq=31 lr=0
(XEN) Inflight irq=86 lr=1
(XEN) Inflight irq=27 lr=2
(XEN) Inflight irq=57 lr=3
(XEN) Inflight irq=2 lr=255
(XEN) Pending irq=2
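
For anyone reading these dumps, the HW_LR values can be unpacked by
hand. The sketch below assumes the GICv2 list register layout as I
understand it from the architecture specification (virtual ID in bits
9:0, physical ID in bits 19:10, priority in bits 27:23, state in bits
29:28, Grp1 in bit 30, HW in bit 31). `struct lr_fields` and `decode_lr`
are illustrative names, not Xen code.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a GICv2 list register (GICH_LRn), per the assumed layout. */
struct lr_fields {
    unsigned virq;     /* bits [9:0]   virtual interrupt ID */
    unsigned pirq;     /* bits [19:10] physical ID, meaningful when hw == 1 */
    unsigned priority; /* bits [27:23] */
    unsigned state;    /* bits [29:28] 0 invalid, 1 pending, 2 active, 3 both */
    unsigned grp1;     /* bit 30 */
    unsigned hw;       /* bit 31, set for hardware interrupts */
};

static struct lr_fields decode_lr(uint32_t lr)
{
    struct lr_fields f;

    f.virq     = lr & 0x3ffu;
    f.pirq     = (lr >> 10) & 0x3ffu;
    f.priority = (lr >> 23) & 0x1fu;
    f.state    = (lr >> 28) & 0x3u;
    f.grp1     = (lr >> 30) & 0x1u;
    f.hw       = (lr >> 31) & 0x1u;
    return f;
}
```

Under that layout, 9a015856 decodes to a pending hardware interrupt with
virtual and physical ID 86, and 9a00e439 to a pending hardware interrupt
with ID 57, which agrees with the "Inflight irq" lines in the dump.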

On Wed, Nov 19, 2014 at 6:29 PM, Andrii Tseglytskyi
andrii.tseglyts...@globallogic.com wrote:
 On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
 On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
 andrii.tseglyts...@globallogic.com wrote:
  On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
  stefano.stabell...@eu.citrix.com wrote:
  On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
  Hi Stefano,
 
  On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
  stefano.stabell...@eu.citrix.com wrote:
   On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
   Hi Stefano,
  
  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
 -GICH[GICH_HCR] |= GICH_HCR_UIE;
 +GICH[GICH_HCR] |= GICH_HCR_NPIE;
  else
 -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
 +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;

  }
   
Yes, exactly
  
   I tried, hang still occurs with this change
  
   We need to figure out why during the hang you still have all the LRs
   busy even if you are getting maintenance interrupts that should cause
   them to be cleared.
  
 
  I see that I have free LRs during maintenance interrupt
 
  (XEN) gic.c:871:d0v0 maintenance interrupt
  (XEN) GICH_LRs (vcpu 0) mask=0
  (XEN)HW_LR[0]=9a015856
  (XEN)HW_LR[1]=0
  (XEN)HW_LR[2]=0
  (XEN)HW_LR[3]=0
  (XEN) Inflight irq=86 lr=0
  (XEN) Inflight irq=2 lr=255
  (XEN) Pending irq=2
 
  But I see that after I got hang - maintenance interrupts are generated
  continuously. Platform continues printing the same log till reboot.
 
  Exactly the same log? As in the one above you just pasted?
  That is very very suspicious.
 
  Yes exactly the same log. And looks like it means that LRs are flushed
  correctly.
 
 
  I am thinking that we are not handling GICH_HCR_UIE correctly and
  something we do in Xen, maybe writing to an LR register, might trigger a
  new maintenance interrupt immediately causing an infinite loop.
 
 
  Yes, this is what I'm thinking about. Taking in account all collected
  debug info it looks like once LRs are overloaded with SGIs -
  maintenance interrupt occurs.
  And then it is not handled properly, and occurs again and again - so
  platform hangs inside its handler.
 
  Could you please try this patch? It disable GICH_HCR_UIE immediately on
  hypervisor entry.
 
 
  Now trying.
 
 
  diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
  index 4d2a92d..6ae8dc4 100644
  --- a/xen/arch/arm/gic.c
  +++ b/xen/arch/arm/gic.c
  @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
   if ( is_idle_vcpu(v) )
   return;
 
  +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
  +
   spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
   while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
  @@ -821,12 +823,8 @@ void gic_inject(void)
 
   gic_restore_pending_irqs(current);
 
  -
   if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
   GICH[GICH_HCR] |= GICH_HCR_UIE;
  -else
  -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
  -
   }
 
   static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
 

 Heh - I don't see hangs with this patch :) But also I see that
 maintenance interrupt doesn't occur (and no hang as result)
 Stefano - is this expected?

 No maintenance interrupts at all? That's strange. You should be
 receiving them when LRs are full and you still have interrupts pending
 to be added to them.

 You could add another printk here to see if you should be receiving
 them:

  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
 +{
 +gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
  GICH[GICH_HCR] |= GICH_HCR_UIE;
 -else
 -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
 -
 +}
  }


 Requested properly:

 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt

 But does not occur



 
 
  --
 
  Andrii Tseglytskyi | Embedded Dev
  GlobalLogic
  www.globallogic.com



 --

 Andrii Tseglytskyi | Embedded Dev
 GlobalLogic
 www.globallogic.com




 --

 Andrii Tseglytskyi | Embedded Dev
 GlobalLogic
 www.globallogic.com



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
BTW - shouldn't the GICH_LR_MAINTENANCE_IRQ flag be set once a
maintenance interrupt has been requested?

On Wed, Nov 19, 2014 at 6:32 PM, Andrii Tseglytskyi
andrii.tseglyts...@globallogic.com wrote:
 Gic dump during interrupt requesting:

 (XEN) GICH_LRs (vcpu 0) mask=f
 (XEN)HW_LR[0]=3a1f
 (XEN)HW_LR[1]=9a015856
 (XEN)HW_LR[2]=1a1b
 (XEN)HW_LR[3]=9a00e439
 (XEN) Inflight irq=31 lr=0
 (XEN) Inflight irq=86 lr=1
 (XEN) Inflight irq=27 lr=2
 (XEN) Inflight irq=57 lr=3
 (XEN) Inflight irq=2 lr=255
 (XEN) Pending irq=2

 On Wed, Nov 19, 2014 at 6:29 PM, Andrii Tseglytskyi
 andrii.tseglyts...@globallogic.com wrote:
 On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
 On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
 andrii.tseglyts...@globallogic.com wrote:
  On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
  stefano.stabell...@eu.citrix.com wrote:
  On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
  Hi Stefano,
 
  On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
  stefano.stabell...@eu.citrix.com wrote:
   On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
   Hi Stefano,
  
  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
 -GICH[GICH_HCR] |= GICH_HCR_UIE;
 +GICH[GICH_HCR] |= GICH_HCR_NPIE;
  else
 -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
 +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;

  }
   
Yes, exactly
  
   I tried, hang still occurs with this change
  
   We need to figure out why during the hang you still have all the LRs
   busy even if you are getting maintenance interrupts that should cause
   them to be cleared.
  
 
  I see that I have free LRs during maintenance interrupt
 
  (XEN) gic.c:871:d0v0 maintenance interrupt
  (XEN) GICH_LRs (vcpu 0) mask=0
  (XEN)HW_LR[0]=9a015856
  (XEN)HW_LR[1]=0
  (XEN)HW_LR[2]=0
  (XEN)HW_LR[3]=0
  (XEN) Inflight irq=86 lr=0
  (XEN) Inflight irq=2 lr=255
  (XEN) Pending irq=2
 
  But I see that after I got hang - maintenance interrupts are generated
  continuously. Platform continues printing the same log till reboot.
 
  Exactly the same log? As in the one above you just pasted?
  That is very very suspicious.
 
  Yes exactly the same log. And looks like it means that LRs are flushed
  correctly.
 
 
  I am thinking that we are not handling GICH_HCR_UIE correctly and
  something we do in Xen, maybe writing to an LR register, might trigger a
  new maintenance interrupt immediately causing an infinite loop.
 
 
  Yes, this is what I'm thinking about. Taking in account all collected
  debug info it looks like once LRs are overloaded with SGIs -
  maintenance interrupt occurs.
  And then it is not handled properly, and occurs again and again - so
  platform hangs inside its handler.
 
  Could you please try this patch? It disable GICH_HCR_UIE immediately on
  hypervisor entry.
 
 
  Now trying.
 
 
  diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
  index 4d2a92d..6ae8dc4 100644
  --- a/xen/arch/arm/gic.c
  +++ b/xen/arch/arm/gic.c
  @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
   if ( is_idle_vcpu(v) )
   return;
 
  +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
  +
   spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
   while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
  @@ -821,12 +823,8 @@ void gic_inject(void)
 
   gic_restore_pending_irqs(current);
 
  -
   if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
   GICH[GICH_HCR] |= GICH_HCR_UIE;
  -else
  -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
  -
   }
 
   static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
 

 Heh - I don't see hangs with this patch :) But also I see that
 maintenance interrupt doesn't occur (and no hang as result)
 Stefano - is this expected?

 No maintenance interrupts at all? That's strange. You should be
 receiving them when LRs are full and you still have interrupts pending
 to be added to them.

 You could add another printk here to see if you should be receiving
 them:

  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
 +{
 +gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
  GICH[GICH_HCR] |= GICH_HCR_UIE;
 -else
 -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
 -
 +}
  }


 Requested properly:

 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt

 But does not occur



 
 
  --
 
  Andrii Tseglytskyi | Embedded Dev
  GlobalLogic
  www.globallogic.com



 --

 Andrii Tseglytskyi | Embedded Dev
 

Re: [Xen-devel] [PATCH v10 for-xen-4.5 2/2] dpci: Replace tasklet with an softirq

2014-11-19 Thread Konrad Rzeszutek Wilk
On Fri, Nov 14, 2014 at 11:11:46AM -0500, Konrad Rzeszutek Wilk wrote:
 On Fri, Nov 14, 2014 at 03:13:42PM +, Jan Beulich wrote:
   On 12.11.14 at 03:23, konrad.w...@oracle.com wrote:
   +static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
   +{
   +    struct domain *d = pirq_dpci->dom;
   +
   +    ASSERT(spin_is_locked(&d->event_lock));
   +
   +    switch ( cmpxchg(&pirq_dpci->state, 1 << STATE_SCHED, 0) )
   +    {
   +    case (1 << STATE_SCHED):
   +        /*
   +         * We are going to try to de-schedule the softirq before it goes in
   +         * STATE_RUN. Whoever clears STATE_SCHED MUST refcount the 'dom'.
   +         */
   +        put_domain(d);
   +        /* fallthrough. */
  
  Considering Sander's report, the only suspicious place I find is this
  one: When the STATE_SCHED flag is set, pirq_dpci is on some
  CPU's list. What guarantees it to get removed from that list before
  getting inserted on another one?
 
 None. The moment that STATE_SCHED is cleared, 'raise_softirq_for'
 is free to manipulate the list.

I was too quick to say this. A bit more inspection shows that while
'raise_softirq_for' is free to manipulate the list - it won't be called.

The reason is that pt_pirq_softirq_reset is called _after_ the IRQ
action handlers are removed for this IRQ. That means we will not receive
any interrupts for it and so will not call 'raise_softirq_for', at least
until 'pt_irq_create_bind' is called. And said function has a check for
this too:

     * A crude 'while' loop with us dropping the spinlock and giving
     * the softirq_dpci a chance to run.
     * We MUST check for this condition as the softirq could be scheduled
     * and hasn't run yet. Note that this code replaced tasklet_kill which
     * would have spun forever and would do the same thing (wait to flush out
     * outstanding hvm_dirq_assist calls.
     */
    if ( pt_pirq_softirq_active(pirq_dpci) )

Hence the patch below is not needed.
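
To illustrate the compare-and-exchange step discussed above, here is a
stand-alone sketch using C11 atomics rather than Xen's cmpxchg(). The
struct, the `try_deschedule` name, and the integer reference counter are
all hypothetical stand-ins; the bit positions for STATE_SCHED and
STATE_RUN are assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { STATE_SCHED = 0, STATE_RUN = 1 }; /* bit positions; values assumed */

struct pirq_model {
    atomic_uint state;
    int domain_refs; /* stands in for the reference dropped by put_domain() */
};

/*
 * Try to cancel a scheduled-but-not-running softirq. Returns true if the
 * state was exactly (1 << STATE_SCHED) and has been cleared; the caller
 * that wins this race is the one that drops the domain reference.
 */
static bool try_deschedule(struct pirq_model *p)
{
    unsigned expected = 1u << STATE_SCHED;

    if (atomic_compare_exchange_strong(&p->state, &expected, 0u)) {
        p->domain_refs--;
        return true;
    }
    return false; /* idle, already running, or rescheduled while running */
}
```

Only one caller can observe the SCHED-only state and clear it, so the
reference is dropped exactly once; any other state is left untouched.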



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
  On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
  On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
  andrii.tseglyts...@globallogic.com wrote:
   On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
   stefano.stabell...@eu.citrix.com wrote:
   On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
   Hi Stefano,
  
   On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
   stefano.stabell...@eu.citrix.com wrote:
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
Hi Stefano,
   
   if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
  -GICH[GICH_HCR] |= GICH_HCR_UIE;
  +GICH[GICH_HCR] |= GICH_HCR_NPIE;
   else
  -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
  +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
 
   }

 Yes, exactly
   
I tried, hang still occurs with this change
   
We need to figure out why during the hang you still have all the LRs
busy even if you are getting maintenance interrupts that should cause
them to be cleared.
   
  
   I see that I have free LRs during maintenance interrupt
  
   (XEN) gic.c:871:d0v0 maintenance interrupt
   (XEN) GICH_LRs (vcpu 0) mask=0
   (XEN)HW_LR[0]=9a015856
   (XEN)HW_LR[1]=0
   (XEN)HW_LR[2]=0
   (XEN)HW_LR[3]=0
   (XEN) Inflight irq=86 lr=0
   (XEN) Inflight irq=2 lr=255
   (XEN) Pending irq=2
  
   But I see that after I got hang - maintenance interrupts are generated
   continuously. Platform continues printing the same log till reboot.
  
   Exactly the same log? As in the one above you just pasted?
   That is very very suspicious.
  
   Yes exactly the same log. And looks like it means that LRs are flushed
   correctly.
  
  
   I am thinking that we are not handling GICH_HCR_UIE correctly and
   something we do in Xen, maybe writing to an LR register, might trigger a
   new maintenance interrupt immediately causing an infinite loop.
  
  
   Yes, this is what I'm thinking about. Taking in account all collected
   debug info it looks like once LRs are overloaded with SGIs -
   maintenance interrupt occurs.
   And then it is not handled properly, and occurs again and again - so
   platform hangs inside its handler.
  
   Could you please try this patch? It disable GICH_HCR_UIE immediately on
   hypervisor entry.
  
  
   Now trying.
  
  
   diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
   index 4d2a92d..6ae8dc4 100644
   --- a/xen/arch/arm/gic.c
   +++ b/xen/arch/arm/gic.c
   @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
if ( is_idle_vcpu(v) )
return;
  
   +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
   +
    spin_lock_irqsave(&v->arch.vgic.lock, flags);
  
    while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
   @@ -821,12 +823,8 @@ void gic_inject(void)
  
    gic_restore_pending_irqs(current);
  
   -
    if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
    GICH[GICH_HCR] |= GICH_HCR_UIE;
   -else
   -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
   -
    }
  
    static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
  
 
  Heh - I don't see hangs with this patch :) But also I see that
  maintenance interrupt doesn't occur (and no hang as result)
  Stefano - is this expected?
 
  No maintenance interrupts at all? That's strange. You should be
  receiving them when LRs are full and you still have interrupts pending
  to be added to them.
 
  You could add another printk here to see if you should be receiving
  them:
 
   if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
  +{
  +gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
   GICH[GICH_HCR] |= GICH_HCR_UIE;
  -else
  -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
  -
  +}
   }
 
 
 Requested properly:
 
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 
 But does not occur

OK, let's see what's going on then by printing the irq number of the
maintenance interrupt:

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 4d2a92d..fed3167 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -55,6 +55,7 @@ static struct {
 static DEFINE_PER_CPU(uint64_t, lr_mask);
 
 static uint8_t nr_lrs;
+static bool uie_on;
 #define lr_all_full() (this_cpu(lr_mask) == ((1 << nr_lrs) - 1))
 
 /* The GIC mapping of CPU interfaces does not necessarily match the
@@ -694,6 +695,7 @@ void gic_clear_lrs(struct vcpu *v)
 {
 int i = 0;
 unsigned long flags;
+unsigned 

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini
I think that's OK: it looks like on your board, for some reason, you get
irq 1023 (spurious interrupt) when UIE is set instead of your normal
maintenance interrupt.

But everything should work anyway without issues.
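
As a side note on the 1023 value: in GICv2 the interrupt ID is carried
in the low ten bits of the value read from the acknowledge register, and
1023 is the ID reported when there is nothing to acknowledge. A small
sketch of that check follows; `iar_to_irq` and `irq_is_spurious` are
illustrative names, not Xen functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GIC_SPURIOUS_ID 1023u /* ID returned when nothing is pending */

/* The interrupt ID lives in bits [9:0] of the acknowledge register value. */
static unsigned iar_to_irq(uint32_t iar)
{
    return iar & 0x3ffu;
}

static bool irq_is_spurious(uint32_t iar)
{
    return iar_to_irq(iar) == GIC_SPURIOUS_ID;
}
```

A handler that sees a spurious ID can simply return without touching
the list registers.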

This is the same patch as before but on top of the latest xen-unstable
tree. Please confirm whether it works.

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 70d10d6..df140b9 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
     if ( is_idle_vcpu(v) )
         return;
 
+    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
+
     spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
     while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
@@ -527,8 +529,6 @@ void gic_inject(void)
 
     if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
         gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
-    else
-        gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
 }
 
 static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 I got this strange log:
 
 (XEN) received maintenance interrupt irq=1023
 
 And platform does not hang due to this:
 +hcr = GICH[GICH_HCR];
 +if ( hcr & GICH_HCR_UIE )
 +{
 +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
 +uie_on = 1;
 +}
 
 On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
  On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
  On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
  stefano.stabell...@eu.citrix.com wrote:
   On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
   On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
   andrii.tseglyts...@globallogic.com wrote:
On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
Hi Stefano,
   
On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:
 On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 Hi Stefano,

    if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
   -GICH[GICH_HCR] |= GICH_HCR_UIE;
   +GICH[GICH_HCR] |= GICH_HCR_NPIE;
    else
   -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
   +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
  
}
 
  Yes, exactly

 I tried, hang still occurs with this change

 We need to figure out why during the hang you still have all the 
 LRs
 busy even if you are getting maintenance interrupts that should 
 cause
 them to be cleared.

   
I see that I have free LRs during maintenance interrupt
   
(XEN) gic.c:871:d0v0 maintenance interrupt
(XEN) GICH_LRs (vcpu 0) mask=0
(XEN)HW_LR[0]=9a015856
(XEN)HW_LR[1]=0
(XEN)HW_LR[2]=0
(XEN)HW_LR[3]=0
(XEN) Inflight irq=86 lr=0
(XEN) Inflight irq=2 lr=255
(XEN) Pending irq=2
   
But I see that after I got hang - maintenance interrupts are 
generated
continuously. Platform continues printing the same log till reboot.
   
Exactly the same log? As in the one above you just pasted?
That is very very suspicious.
   
Yes exactly the same log. And looks like it means that LRs are flushed
correctly.
   
   
I am thinking that we are not handling GICH_HCR_UIE correctly and
something we do in Xen, maybe writing to an LR register, might 
trigger a
new maintenance interrupt immediately causing an infinite loop.
   
   
Yes, this is what I'm thinking about. Taking in account all collected
debug info it looks like once LRs are overloaded with SGIs -
maintenance interrupt occurs.
And then it is not handled properly, and occurs again and again - so
platform hangs inside its handler.
   
Could you please try this patch? It disable GICH_HCR_UIE immediately 
on
hypervisor entry.
   
   
Now trying.
   
   
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 4d2a92d..6ae8dc4 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
 if ( is_idle_vcpu(v) )
 return;
   
    +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
    +
     spin_lock_irqsave(&v->arch.vgic.lock, flags);
   
     while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
    @@ -821,12 +823,8 @@ void gic_inject(void)
   
     gic_restore_pending_irqs(current);
   
    -
     if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
     GICH[GICH_HCR] |= GICH_HCR_UIE;
    -else
    -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
    -
     }
   
     static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
   
  
   Heh - I don't see hangs with this patch :) But also I see that
   maintenance interrupt doesn't occur (and no hang as 

Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq

2014-11-19 Thread Sander Eikelenboom

Wednesday, November 19, 2014, 4:04:59 PM, you wrote:

 On Wed, Nov 19, 2014 at 12:16:44PM +0100, Sander Eikelenboom wrote:
 
 Wednesday, November 19, 2014, 2:55:41 AM, you wrote:
 
  On Tue, Nov 18, 2014 at 11:12:54PM +0100, Sander Eikelenboom wrote:
  
  Tuesday, November 18, 2014, 9:56:33 PM, you wrote:
  
   
   Uhmm i thought i had these switched off (due to problems earlier and then
   forgot about them .. however looking at the earlier reports these lines
   were also in those reports).
   
   The xen-syms and these last runs are all with a pristine xen tree cloned
   today (staging branch), so the qemu-xen and seabios defined with that were
   also freshly cloned and had a new default seabios config. (just to rule
   out anything stale in my tree)
   
   If you don't see those messages .. perhaps your seabios and qemu trees
   (and at least the seabios config) are not the most recent (they don't get
   updated automatically when you just do a git pull on the main tree) ?
   
   In /tools/firmware/seabios-dir/.config i have:
   CONFIG_USB=y
   CONFIG_USB_UHCI=y
   CONFIG_USB_OHCI=y
   CONFIG_USB_EHCI=y
   CONFIG_USB_XHCI=y
   CONFIG_USB_MSC=y
   CONFIG_USB_UAS=y
   CONFIG_USB_HUB=y
   CONFIG_USB_KEYBOARD=y
   CONFIG_USB_MOUSE=y
   
  
   I seem to have the same thing. Perhaps it is my XHCI controller being wonky.
  
   And this is all just from a:
   - git clone git://xenbits.xen.org/xen.git -b staging
   - make clean && ./configure && make -j6 && make -j6 install
  
   Aye. 
   .. snip..
 1) test_and_[set|clear]_bit sometimes return unexpected values.
    [But this might be invalid as the addition of the 8303faaf25a8
     might be correct - as the second dpci the softirq is processing
     could be the MSI one]
   
   Would there be an easy way to stress test this function separately in 
   some 
   debugging function to see if it indeed is returning unexpected values ?
  
   Sadly no. But you got me looking in the right direction when you 
   mentioned
   'timeout'.
   
 2) INIT_LIST_HEAD operations on the same CPU are not honored.
   
   Just curious, have you also tested the patches on AMD hardware ?
  
   Yes. To reproduce this the first thing I did was to get an AMD box.
  
   

When i look at the combination of (2) and (3), It seems it could be 
an 
interaction between the two passed through devices and/or different 
IRQ types.
   
Could be - as in it is causing this issue to show up faster than
expected. Or it is the one that triggers more than one dpci happening
at the same time.
   
   Well that didn't seem to be it (see separate amendment i mailed 
   previously)
  
   Right, the current theory I have is that the interrupts are not being
   Acked within 8 milliseconds and we reset the 'state' - and at the same
   time we get an interrupt and schedule it - while we are still processing
   the same interrupt. This would explain why the 'test_and_clear_bit'
   got the wrong value.
  
   In regards to the list poison - following this thread of logic - with
   the 'state = 0' set we open the floodgates for any CPU to put the same
   'struct hvm_pirq_dpci' on its list.
  
   We do reset the 'state' on _every_ GSI that is mapped to a guest - so
   we also reset the 'state' for the MSI one (XHCI). Anyhow in your case:
  
   CPUX:                                CPUY:
   pt_irq_time_out:
   state = 0;
   [out of timer code, the              raise_softirq
   pirq_dpci is on the dpci_list]       [adds the pirq_dpci as state == 0]

   softirq_dpci:                        softirq_dpci:
   list_del
   [entries poison]
                                        list_del <= BOOM
   
   Is what I believe is happening.
  
   The INTX device - once I put a load on it - does not trigger
   any pt_irq_time_out, so that would explain why I cannot hit this.
  
   But I believe your card hits these hiccups.   
  
  
  Hi Konrad,
  
  I just tested your 5 patches and as a result i still got an(other) host 
  crash:
  (complete serial log attached)
  
  (XEN) [2014-11-18 21:55:41.591] [ Xen-4.5.0-rc  x86_64  debug=y  Not 
  tainted ]
  (XEN) [2014-11-18 21:55:41.591] CPU:0
  (XEN) [2014-11-18 21:55:41.591] [ Xen-4.5.0-rc  x86_64  debug=y  Not 
  tainted ]
  (XEN) [2014-11-18 21:55:41.591] RIP:e008:[82d08012c7e7]CPU:2
  (XEN) [2014-11-18 21:55:41.591] RIP:e008:[82d08014a461] 
  hvm_do_IRQ_dpci+0xbd/0x13c
  (XEN) [2014-11-18 21:55:41.591] RFLAGS: 00010006
  _spin_unlock+0x1f/0x30CONTEXT: hypervisor
 
  Duh!
 
  Here is another patch on top of the five you have (attached and inline).
 
 Hi Konrad,
 
 Happy to report it has been running with this additional patch for 2 hours 
 now 
 without any problems. I think you nailed it :-)

 Could you also do an 'xl debug-keys k' and send that please?

Sure:

(XEN) 

[Xen-devel] [for-xen-4.5 PATCH] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked.

2014-11-19 Thread Konrad Rzeszutek Wilk
If we pass in INTx type devices to a guest on an over-subscribed
machine - and in an over-worked guest - we can cause the
pirq_dpci->softirq_list to become corrupted.

The reason for this is that 'pt_irq_guest_eoi' ends up setting
'state' to zero. However the 'state' value (STATE_SCHED, STATE_RUN)
is used to communicate between 'raise_softirq_for' and 'dpci_softirq'
to determine whether the 'struct hvm_pirq_dpci' can be re-scheduled.
We are ignoring the teardown path for simplicity for now.
'pt_irq_guest_eoi' was not adhering to the proper dialogue: it did not
use locked cmpxchg or test_bit operations and ended up setting 'state'
to zero. That meant 'raise_softirq_for' was free to schedule it while
the 'struct hvm_pirq_dpci' was still on a per-cpu list.
The end result was list_del being called twice and the second call
corrupting the per-cpu list.

For this to occur one of the CPUs must be in the idle loop executing
softirqs and the interrupt handler in the guest must not
respond to the pending interrupt within 8ms, and we must receive
another interrupt for this device on another CPU.

CPU0:                                   CPU1:

timer_softirq_action
 \- pt_irq_time_out
     state = 0;                         do_IRQ
     [out of timer code, the            raise_softirq
     pirq_dpci is on the CPU0           [adds the pirq_dpci to CPU1
     dpci_list]                         dpci_list as state == 0]

softirq_dpci:                           softirq_dpci:
    list_del
    [list entries are poisoned]
                                        list_del <= BOOM

The fix is simple - enroll 'pt_irq_guest_eoi' to use the locked
semantics for 'state'. We piggyback on pt_pirq_softirq_cancel (was
pt_pirq_softirq_reset) to use cmpxchg. We also expand said function
to reset the '->dom' only on the teardown paths - but not on the
timeouts.
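
The difference between a plain store and a cmpxchg on 'state' can be
sketched in user-space C11. This is only a model under stated
assumptions, not the Xen code: the MODEL_* bit values and all model_*
function names are hypothetical, and C11 atomics stand in for Xen's
cmpxchg and test_and_set_bit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit values; the patch only names STATE_SCHED and STATE_RUN. */
enum { MODEL_SCHED = 1u << 0, MODEL_RUN = 1u << 1 };

/* Models the gate in 'raise_softirq_for': only the caller that flips
 * SCHED from clear to set is allowed to queue the entry. */
static bool model_try_schedule(atomic_uint *state)
{
    unsigned int old = atomic_fetch_or(state, MODEL_SCHED);
    return !(old & MODEL_SCHED);
}

/* Pre-patch behaviour of 'pt_irq_guest_eoi': an unconditional store. */
static void model_cancel_plain(atomic_uint *state)
{
    atomic_store(state, 0);
}

/* cmpxchg-style cancel: clear the state only if it is exactly SCHED.
 * Returns true when this call unscheduled the entry. */
static bool model_cancel_cmpxchg(atomic_uint *state)
{
    unsigned int expected = MODEL_SCHED;
    return atomic_compare_exchange_strong(state, &expected, 0);
}

/* Walks through one interleaving: the softirq side owns the entry
 * while a timeout tries to cancel it. */
static void state_demo(void)
{
    atomic_uint state;

    atomic_init(&state, MODEL_SCHED | MODEL_RUN);

    /* The cmpxchg cancel refuses to touch a state it does not own... */
    assert(!model_cancel_cmpxchg(&state));
    assert(atomic_load(&state) == (MODEL_SCHED | MODEL_RUN));
    /* ...so a concurrent interrupt still cannot queue the entry again. */
    assert(!model_try_schedule(&state));

    /* The plain store wipes the state, and the entry can be queued
     * again although the other side never finished with it. */
    model_cancel_plain(&state);
    assert(model_try_schedule(&state));
}
```

Calling state_demo() exercises both variants. The point of the model is
only that the compare step makes the cancel conditional on the state
the caller expects, which a bare 'state = 0' cannot do.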

Reported-by: Sander Eikelenboom li...@eikelenboom.it
Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 xen/drivers/passthrough/io.c | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index efc66dc..2039d31 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -57,7 +57,7 @@ enum {
  * This can be called multiple times, but the softirq is only raised once.
  * That is until the STATE_SCHED state has been cleared. The state can be
  * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'),
- * or by 'pt_pirq_softirq_reset' (which will try to clear the state before
+ * or by 'pt_pirq_softirq_cancel' (which will try to clear the state before
  * the softirq had a chance to run).
  */
 static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
@@ -97,13 +97,15 @@ bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
 }
 
 /*
- * Reset the pirq_dpci->dom parameter to NULL.
+ * Cancels an outstanding pirq_dpci (if scheduled). Also if clear is set,
+ * reset pirq_dpci->dom parameter to NULL (used for teardown).
  *
  * This function checks the different states to make sure it can do it
  * at the right time. If it unschedules the 'hvm_dirq_assist' from running
  * it also refcounts (which is what the softirq would have done) properly.
  */
-static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
+static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci,
+   unsigned int clear)
 {
 struct domain *d = pirq_dpci->dom;
 
@@ -125,8 +127,13 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
  * to a shortcut the 'dpci_softirq' implements. It stashes the 'dom'
  * in local variable before it sets STATE_RUN - and therefore will not
  * dereference '->dom' which would crash.
+ *
+ * However, if this is called from 'pt_irq_time_out' we do not want to
+ * clear the '->dom' as we can re-use the 'pirq_dpci' after that and
+ * need '->dom'.
  */
-pirq_dpci->dom = NULL;
+if ( clear )
+pirq_dpci->dom = NULL;
 break;
 }
 }
@@ -142,7 +149,7 @@ static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
 if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT,
   &pirq_dpci->flags) )
 {
-pirq_dpci->state = 0;
+pt_pirq_softirq_cancel(pirq_dpci, 0 /* keep dom */);
 pirq_dpci->pending = 0;
 pirq_guest_eoi(dpci_pirq(pirq_dpci));
 }
@@ -285,7 +292,7 @@ int pt_irq_create_bind(
  * to be scheduled but we must deal with the one that may be
  * in the queue.
  */
-pt_pirq_softirq_reset(pirq_dpci);
+pt_pirq_softirq_cancel(pirq_dpci, 1 /* reset dom */);
 }
 }
 if ( unlikely(rc) )
@@ -536,9 +543,9 @@ int 

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 Hi Stefano,
 
 On Wed, Nov 19, 2014 at 7:07 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
  I think that's OK: it looks like that on your board for some reasons
  when UIE is set you get irq 1023 (spurious interrupt) instead of your
  normal maintenance interrupt.
 
 OK, but I think this should be investigated too. What do you think ?

I think it is harmless: my guess is that if we clear UIE before reading
GICC_IAR, GICC_IAR returns spurious interrupt instead of maintenance
interrupt. But it doesn't really matter to us.

 
  But everything should work anyway without issues.
 
  This is the same patch as before but on top of the latest xen-unstable
  tree. Please confirm if it works.
 
  diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
  index 70d10d6..df140b9 100644
  --- a/xen/arch/arm/gic.c
  +++ b/xen/arch/arm/gic.c
  @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
   if ( is_idle_vcpu(v) )
   return;
 
  +gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
  +
   spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
   while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
  @@ -527,8 +529,6 @@ void gic_inject(void)
 
   if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
   gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
  -else
  -gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
   }
 
 
 I confirm - it works fine. Will this be a final fix ?

Yep :-)
Many thanks for your help on this!


 Regards,
 Andrii
 
   static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
 
  On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
  I got this strange log:
 
  (XEN) received maintenance interrupt irq=1023
 
  And platform does not hang due to this:
  +hcr = GICH[GICH_HCR];
  +if ( hcr & GICH_HCR_UIE )
  +{
  +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
  +uie_on = 1;
  +}
 
  On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini
  stefano.stabell...@eu.citrix.com wrote:
   On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
   On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
   stefano.stabell...@eu.citrix.com wrote:
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
andrii.tseglyts...@globallogic.com wrote:
 On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
 On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 Hi Stefano,

 On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
  On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
  Hi Stefano,
 
 if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
-GICH[GICH_HCR] |= GICH_HCR_UIE;
+GICH[GICH_HCR] |= GICH_HCR_NPIE;
 else
-GICH[GICH_HCR] &= ~GICH_HCR_UIE;
+GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
   
 }
  
   Yes, exactly
 
  I tried, hang still occurs with this change
 
  We need to figure out why during the hang you still have all 
  the LRs
  busy even if you are getting maintenance interrupts that 
  should cause
  them to be cleared.
 

 I see that I have free LRs during maintenance interrupt

 (XEN) gic.c:871:d0v0 maintenance interrupt
 (XEN) GICH_LRs (vcpu 0) mask=0
 (XEN)HW_LR[0]=9a015856
 (XEN)HW_LR[1]=0
 (XEN)HW_LR[2]=0
 (XEN)HW_LR[3]=0
 (XEN) Inflight irq=86 lr=0
 (XEN) Inflight irq=2 lr=255
 (XEN) Pending irq=2

 But I see that after I got hang - maintenance interrupts are 
 generated
 continuously. Platform continues printing the same log till 
 reboot.

 Exactly the same log? As in the one above you just pasted?
 That is very very suspicious.

 Yes exactly the same log. And looks like it means that LRs are 
 flushed
 correctly.


 I am thinking that we are not handling GICH_HCR_UIE correctly and
 something we do in Xen, maybe writing to an LR register, might 
 trigger a
 new maintenance interrupt immediately causing an infinite loop.


 Yes, this is what I'm thinking about. Taking in account all 
 collected
 debug info it looks like once LRs are overloaded with SGIs -
 maintenance interrupt occurs.
 And then it is not handled properly, and occurs again and again - 
 so
 platform hangs inside its handler.

 Could you please try this patch? It disable GICH_HCR_UIE 
 immediately on
 hypervisor entry.


 Now trying.


 diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
 index 4d2a92d..6ae8dc4 100644
 --- a/xen/arch/arm/gic.c
 +++ b/xen/arch/arm/gic.c
 @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
  if ( is_idle_vcpu(v) )
  

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
On Wed, Nov 19, 2014 at 7:42 PM, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:
 On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 Hi Stefano,

 On Wed, Nov 19, 2014 at 7:07 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
  I think that's OK: it looks like that on your board for some reasons
  when UIE is set you get irq 1023 (spurious interrupt) instead of your
  normal maintenance interrupt.

 OK, but I think this should be investigated too. What do you think ?

 I think it is harmless: my guess is that if we clear UIE before reading
 GICC_IAR, GICC_IAR returns spurious interrupt instead of maintenance
 interrupt. But it doesn't really matter to us.

OK. I think catching this will be a good exercise for someone )) But
out of scope for this issue.


 
  But everything should work anyway without issues.
 
  This is the same patch as before but on top of the latest xen-unstable
  tree. Please confirm if it works.
 
  diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
  index 70d10d6..df140b9 100644
  --- a/xen/arch/arm/gic.c
  +++ b/xen/arch/arm/gic.c
  @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
   if ( is_idle_vcpu(v) )
   return;
 
  +gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
  +
   spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
   while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
  @@ -527,8 +529,6 @@ void gic_inject(void)
 
   if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
   gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
  -else
  -gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
   }
 

 I confirm - it works fine. Will this be a final fix ?

 Yep :-)
 Many thanks for your help on this!

Thank you Stefano. This issue was really critical for us :)

Regards,
Andrii



 Regards,
 Andrii

   static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
 
  On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
  I got this strange log:
 
  (XEN) received maintenance interrupt irq=1023
 
  And platform does not hang due to this:
  +hcr = GICH[GICH_HCR];
  +if ( hcr & GICH_HCR_UIE )
  +{
  +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
  +uie_on = 1;
  +}
 
  On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini
  stefano.stabell...@eu.citrix.com wrote:
   On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
   On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
   stefano.stabell...@eu.citrix.com wrote:
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
andrii.tseglyts...@globallogic.com wrote:
 On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
 On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 Hi Stefano,

 On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
  On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
  Hi Stefano,
 
 if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
-GICH[GICH_HCR] |= GICH_HCR_UIE;
+GICH[GICH_HCR] |= GICH_HCR_NPIE;
 else
-GICH[GICH_HCR] &= ~GICH_HCR_UIE;
+GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
   
 }
  
   Yes, exactly
 
  I tried, hang still occurs with this change
 
  We need to figure out why during the hang you still have all 
  the LRs
  busy even if you are getting maintenance interrupts that 
  should cause
  them to be cleared.
 

 I see that I have free LRs during maintenance interrupt

 (XEN) gic.c:871:d0v0 maintenance interrupt
 (XEN) GICH_LRs (vcpu 0) mask=0
 (XEN)HW_LR[0]=9a015856
 (XEN)HW_LR[1]=0
 (XEN)HW_LR[2]=0
 (XEN)HW_LR[3]=0
 (XEN) Inflight irq=86 lr=0
 (XEN) Inflight irq=2 lr=255
 (XEN) Pending irq=2

 But I see that after I got hang - maintenance interrupts are 
 generated
 continuously. Platform continues printing the same log till 
 reboot.

 Exactly the same log? As in the one above you just pasted?
 That is very very suspicious.

 Yes exactly the same log. And looks like it means that LRs are 
 flushed
 correctly.


 I am thinking that we are not handling GICH_HCR_UIE correctly and
 something we do in Xen, maybe writing to an LR register, might 
 trigger a
 new maintenance interrupt immediately causing an infinite loop.


 Yes, this is what I'm thinking about. Taking in account all 
 collected
 debug info it looks like once LRs are overloaded with SGIs -
 maintenance interrupt occurs.
 And then it is not handled properly, and occurs again and again - 
 so
 platform hangs inside its handler.

 Could you please try this patch? It disable GICH_HCR_UIE 
 immediately on
 hypervisor entry.


 Now trying.

[Xen-devel] [PATCH for-4.5] xen/arm: clear UIE on hypervisor entry

2014-11-19 Thread Stefano Stabellini
UIE being set can cause maintenance interrupts to occur when Xen writes
to one or more LR registers. The effect is a busy loop around the
interrupt handler in Xen
(http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck.

Konrad, this fixes an actual bug, at least on OMAP5. It should have no
bad side effects on any other platforms as far as I can tell. It should
go in 4.5.

Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
Tested-by: Andrii Tseglytskyi andrii.tseglyts...@globallogic.com

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 70d10d6..df140b9 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
 if ( is_idle_vcpu(v) )
 return;
 
+gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
+
 spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
 while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
@@ -527,8 +529,6 @@ void gic_inject(void)
 
 if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
 gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
-else
-gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
 }
 
 static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
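
The UIE handling in the patch above can be modelled in a few lines of
user-space C. This is a sketch under stated assumptions, not the Xen
implementation: 'struct gic_model', MODEL_HCR_UIE and the model_*
functions are hypothetical stand-ins, and the walk over the LRs that
the real gic_clear_lrs performs is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_HCR_UIE (1u << 1)   /* stand-in for GICH_HCR_UIE */

struct gic_model {
    uint32_t hcr;           /* models the GICH_HCR register */
    unsigned int pending;   /* interrupts waiting for a free LR */
    unsigned int lrs_used;  /* LRs currently occupied */
    unsigned int nr_lrs;    /* total number of LRs */
};

/* Set or clear one flag without disturbing the other bits. */
static void model_update_hcr(struct gic_model *g, uint32_t flag, bool set)
{
    if (set)
        g->hcr |= flag;
    else
        g->hcr &= ~flag;
}

/* Hypervisor entry: drop UIE first, before any LR would be written. */
static void model_clear_lrs(struct gic_model *g)
{
    model_update_hcr(g, MODEL_HCR_UIE, false);
}

/* Return to guest: request the interrupt only when work is pending and
 * no LR is free. There is deliberately no else branch any more. */
static void model_inject(struct gic_model *g)
{
    if (g->pending > 0 && g->lrs_used == g->nr_lrs)
        model_update_hcr(g, MODEL_HCR_UIE, true);
}

static void uie_demo(void)
{
    struct gic_model g = { .hcr = 0, .pending = 1, .lrs_used = 4, .nr_lrs = 4 };

    model_inject(&g);                   /* LRs full, work pending */
    assert(g.hcr & MODEL_HCR_UIE);

    model_clear_lrs(&g);                /* next hypervisor entry */
    assert(!(g.hcr & MODEL_HCR_UIE));

    g.pending = 0;
    g.lrs_used = 1;
    model_inject(&g);                   /* nothing to wait for */
    assert(!(g.hcr & MODEL_HCR_UIE));
}
```

In this model UIE can only be observed as set between an inject with
full LRs and the following hypervisor entry, which is the window the
patch aims for.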


Re: [Xen-devel] [PATCH 4/4] x86/xen: use the maximum MFN to calculate the required DMA mask

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, David Vrabel wrote:
 On systems where DMA addresses and physical addresses are not 1:1
 (such as Xen PV guests), the generic dma_get_required_mask() does
 not return the correct mask (since it uses max_pfn).
 
 Some device drivers (such as mptsas, mpt2sas) use
 dma_get_required_mask() to set the device's DMA mask to allow them to
 use only 32-bit DMA addresses in hardware structures.  This results in
 unnecessary use of the SWIOTLB if DMA addresses are more than 32-bits,
 impacting performance significantly.
 
 Provide a get_required_mask op that uses the maximum MFN to calculate
 the DMA mask.
 
 Signed-off-by: David Vrabel david.vra...@citrix.com
 ---
  arch/x86/xen/pci-swiotlb-xen.c |1 +
  drivers/xen/swiotlb-xen.c  |   13 +
  include/xen/swiotlb-xen.h  |4 
  3 files changed, 18 insertions(+)
 
 diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
 index 0e98e5d..a5d180a 100644
 --- a/arch/x86/xen/pci-swiotlb-xen.c
 +++ b/arch/x86/xen/pci-swiotlb-xen.c
 @@ -31,6 +31,7 @@ static struct dma_map_ops xen_swiotlb_dma_ops = {
   .map_page = xen_swiotlb_map_page,
   .unmap_page = xen_swiotlb_unmap_page,
   .dma_supported = xen_swiotlb_dma_supported,
 + .get_required_mask = xen_swiotlb_get_required_mask,
  };
  
  /*
 diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
 index ebd8f21..654587d 100644
 --- a/drivers/xen/swiotlb-xen.c
 +++ b/drivers/xen/swiotlb-xen.c
 @@ -42,9 +42,11 @@
  #include <xen/page.h>
  #include <xen/xen-ops.h>
  #include <xen/hvc-console.h>
 +#include <xen/interface/memory.h>
  
  #include <asm/dma-mapping.h>
  #include <asm/xen/page-coherent.h>
 +#include <asm/xen/hypercall.h>
  
  #include <trace/events/swiotlb.h>
  /*
 @@ -683,3 +685,14 @@ xen_swiotlb_set_dma_mask(struct device *dev, u64 dma_mask)
   return 0;
  }
  EXPORT_SYMBOL_GPL(xen_swiotlb_set_dma_mask);
 +
 +u64
 +xen_swiotlb_get_required_mask(struct device *dev)
 +{
 + unsigned long max_mfn;
 +
 + max_mfn = HYPERVISOR_memory_op(XENMEM_maximum_ram_page, NULL);

As Jan pointed out, I think you need to change the prototype of
HYPERVISOR_memory_op to return long. Please do so consistently across all
relevant archs.


 + return DMA_BIT_MASK(fls_long(max_mfn - 1) + PAGE_SHIFT);
 +}
 +EXPORT_SYMBOL_GPL(xen_swiotlb_get_required_mask);
 diff --git a/include/xen/swiotlb-xen.h b/include/xen/swiotlb-xen.h
 index 8b2eb93..640 100644
 --- a/include/xen/swiotlb-xen.h
 +++ b/include/xen/swiotlb-xen.h
 @@ -58,4 +58,8 @@ xen_swiotlb_dma_supported(struct device *hwdev, u64 mask);
  
  extern int
  xen_swiotlb_set_dma_mask(struct device *dev, u64 dma_mask);
 +
 +extern u64
 +xen_swiotlb_get_required_mask(struct device *dev);
 +
  #endif /* __LINUX_SWIOTLB_XEN_H */
 -- 
 1.7.10.4
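
A worked example of the mask arithmetic in the quoted
xen_swiotlb_get_required_mask() may help here. The sketch below is
plain user-space C and assumes 4 KiB pages; model_fls64,
model_dma_bit_mask and model_required_mask are hypothetical helpers
that mimic what fls_long() and DMA_BIT_MASK() are used for in the
patch, and a 64-bit type is used for the frame number so that a large
maximum MFN is not truncated.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Position of the most significant set bit, 1-based; 0 for x == 0. */
static unsigned int model_fls64(uint64_t x)
{
    unsigned int n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/* Mask with the low n bits set; n >= 64 yields all ones. */
static uint64_t model_dma_bit_mask(unsigned int n)
{
    return n >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << n) - 1;
}

/* Bits needed to address every byte below frame number max_mfn. */
static uint64_t model_required_mask(uint64_t max_mfn)
{
    return model_dma_bit_mask(model_fls64(max_mfn - 1) + MODEL_PAGE_SHIFT);
}

static void mask_demo(void)
{
    /* 2^20 frames of 4 KiB = 4 GiB: 32 address bits are enough. */
    assert(model_required_mask(UINT64_C(0x100000)) == UINT64_C(0xFFFFFFFF));

    /* One frame more and a 33rd bit is required. */
    assert(model_required_mask(UINT64_C(0x100001)) == UINT64_C(0x1FFFFFFFF));
}
```

The second case is the one that matters for drivers which test the
required mask against 32 bits: a single machine frame above 4 GiB is
enough to push the result past a 32-bit mask.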
 
 


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini
That's right, the maintenance interrupt handler is not called, but it
doesn't do anything so we are fine. The important thing is that an
interrupt is sent and gic_clear_lrs gets called on hypervisor entry.

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 The only ambiguity left - maintenance interrupt handler is not called.
 It was requested for specific IRQ number, retrieved from device tree.
 But when we trigger GICH_HCR_UIE - we got maintenance interrupt for
 spurious number 1023.
 
 Regards,
 Andrii
 
 On Wed, Nov 19, 2014 at 7:47 PM, Andrii Tseglytskyi
 andrii.tseglyts...@globallogic.com wrote:
  On Wed, Nov 19, 2014 at 7:42 PM, Stefano Stabellini
  stefano.stabell...@eu.citrix.com wrote:
  On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
  Hi Stefano,
 
  On Wed, Nov 19, 2014 at 7:07 PM, Stefano Stabellini
  stefano.stabell...@eu.citrix.com wrote:
   I think that's OK: it looks like that on your board for some reasons
   when UIE is set you get irq 1023 (spurious interrupt) instead of your
   normal maintenance interrupt.
 
  OK, but I think this should be investigated too. What do you think ?
 
  I think it is harmless: my guess is that if we clear UIE before reading
  GICC_IAR, GICC_IAR returns spurious interrupt instead of maintenance
  interrupt. But it doesn't really matter to us.
 
  OK. I think catching this will be a good exercise for someone )) But
  out of scope for this issue.
 
 
  
   But everything should work anyway without issues.
  
   This is the same patch as before but on top of the latest xen-unstable
   tree. Please confirm if it works.
  
   diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
   index 70d10d6..df140b9 100644
   --- a/xen/arch/arm/gic.c
   +++ b/xen/arch/arm/gic.c
   @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
if ( is_idle_vcpu(v) )
return;
  
   +gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
   +
    spin_lock_irqsave(&v->arch.vgic.lock, flags);
  
    while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
   @@ -527,8 +529,6 @@ void gic_inject(void)
  
    if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
   -else
   -gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
}
  
 
  I confirm - it works fine. Will this be a final fix ?
 
  Yep :-)
  Many thanks for your help on this!
 
  Thank you Stefano. This issue was really critical for us :)
 
  Regards,
  Andrii
 
 
 
  Regards,
  Andrii
 
static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
  
   On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
   I got this strange log:
  
   (XEN) received maintenance interrupt irq=1023
  
   And platform does not hang due to this:
   +hcr = GICH[GICH_HCR];
   +if ( hcr & GICH_HCR_UIE )
   +{
   +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
   +uie_on = 1;
   +}
  
   On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini
   stefano.stabell...@eu.citrix.com wrote:
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:
 On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
 andrii.tseglyts...@globallogic.com wrote:
  On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
  stefano.stabell...@eu.citrix.com wrote:
  On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
  Hi Stefano,
 
  On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
  stefano.stabell...@eu.citrix.com wrote:
   On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
   Hi Stefano,
  
  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
 -GICH[GICH_HCR] |= GICH_HCR_UIE;
 +GICH[GICH_HCR] |= GICH_HCR_NPIE;
  else
 -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
 +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;

  }
   
Yes, exactly
  
   I tried, hang still occurs with this change
  
   We need to figure out why during the hang you still have 
   all the LRs
   busy even if you are getting maintenance interrupts that 
   should cause
   them to be cleared.
  
 
  I see that I have free LRs during maintenance interrupt
 
  (XEN) gic.c:871:d0v0 maintenance interrupt
  (XEN) GICH_LRs (vcpu 0) mask=0
  (XEN)HW_LR[0]=9a015856
  (XEN)HW_LR[1]=0
  (XEN)HW_LR[2]=0
  (XEN)HW_LR[3]=0
  (XEN) Inflight irq=86 lr=0
  (XEN) Inflight irq=2 lr=255
  (XEN) Pending irq=2
 
  But I see that after I got hang - maintenance interrupts are 
  generated
  continuously. Platform continues printing the same log till 
  reboot.
 
  Exactly the same log? As in the one above you just pasted?
  That is very very suspicious.
 
  Yes exactly the 

Re: [Xen-devel] [Qemu-devel] qemu 2.2 crash on linux hvm domU (full backtrace included)

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, Don Slutz wrote:
 I have posted the patch:
 
 Subject: [BUGFIX][PATCH for 2.2 1/1] hw/i386/pc_piix.c: Also pass vmport=off
 for xenfv machine
 Date: Wed, 19 Nov 2014 12:30:57 -0500
 Message-ID: 1416418257-10166-1-git-send-email-dsl...@verizon.com
 
 
 Which fixes QEMU 2.2 for xenfv.  However if you configure xen_platform_pci=0
 you will still
 have this issue.  The good news is that xen-4.5 currently does not have QEMU
 2.2 and so does
 not have this issue.
 
 Only people (groups like spice?) that want QEMU 2.2.0 with xen 4.5.0 (or older
 xen versions)
 will hit this.
 
 I have changes to xen 4.6 which will fix the xen_platform_pci=0 case also.
 
 In order to get xen 4.5 to fully work with QEMU 2.2.0 (both in hard freeze)
 
 the 1st patch from Dr. David Alan Gilbert dgilb...@redhat.com
 would need to be applied to xen's qemu 2.0.2 (+ changes) so that
 vmport=off can be added to --machine.
 
 And a patch (yet to be written, subset of changes I have pending for 4.6)
 that adds vmport=off to QEMU args for --machine (it can be done in all cases).

What happens if you pass vmport=off via --machine, without David Alan
Gilbert's patch in QEMU?


 -Don Slutz
 
 
 
 On 11/19/14 10:52, Stefano Stabellini wrote:
  On Wed, 19 Nov 2014, Fabio Fantoni wrote:
   On 19/11/2014 15:56, Don Slutz wrote:
I think I know what is happening here.  But you are pointing at the
wrong
change.

commit 9b23cfb76b3a5e9eb5cc899eaf2f46bc46d33ba4

Is what I am guessing at this time is the issue.  I think that xen_enabled()
is returning false in pc_machine_initfn, whereas in pc_init1 it is returning
true.

I am thinking that:


diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 7bb97a4..3268c29 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -914,7 +914,7 @@ static QEMUMachine xenfv_machine = {
  .desc = "Xen Fully-virtualized PC",
  .init = pc_xen_hvm_init,
  .max_cpus = HVM_MAX_VCPUS,
-.default_machine_opts = "accel=xen",
+.default_machine_opts = "accel=xen,vmport=off",
  .hot_add_cpu = pc_hot_add_cpu,
  };
  #endif

Will fix your issue. I have not tested this yet.
   Tested now and it solves the regression of linux hvm domUs with qemu 2.2,
   thanks.
   I think that I'm not the only one with this regression and that this patch
   (or a fix to the cause in vmport) should be applied before qemu 2.2 final.
  Don,
  please submit a proper patch with a Signed-off-by.
  
  Thanks!
  
  - Stefano
  
 -Don Slutz


On 11/19/14 09:04, Fabio Fantoni wrote:
 On 14/11/2014 12:25, Fabio Fantoni wrote:
  dom0 xen-unstable from staging git with x86/hvm: Extend HVM cpuid
  leaf
  with vcpu id and x86/hvm: Add per-vcpu evtchn upcalls patches,
  and
  qemu 2.2 from spice git (spice/next commit
  e779fa0a715530311e6f59fc8adb0f6eca914a89):
  https://github.com/Fantu/Xen/commits/rebase/m2r-staging
 I tried with qemu  tag v2.2.0-rc2 and crash still happen, here the
 full
 backtrace of latest test:
  Program received signal SIGSEGV, Segmentation fault.
  0x55689b07 in vmport_ioport_read (opaque=0x564443a0,
  addr=0,
   size=4) at
  /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73
  73  eax = env-regs[R_EAX];
  (gdb) bt full
  #0  0x55689b07 in vmport_ioport_read (opaque=0x564443a0,
  addr=0,
   size=4) at
  /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73
   s = 0x564443a0
   cs = 0x0
   cpu = 0x0
   __func__ = vmport_ioport_read
   env = 0x8250
   command = 0 '\000'
   eax = 0
  #1  0x55655fc4 in memory_region_read_accessor
  (mr=0x5628,
   addr=0, value=0x7fffd8d0, size=4, shift=0, mask=4294967295)
   at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:410
   tmp = 0
  #2  0x556562b7 in access_with_adjusted_size (addr=0,
   value=0x7fffd8d0, size=4, access_size_min=4,
  access_size_max=4,
   access=0x55655f62 memory_region_read_accessor,
  mr=0x5628)
   at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:480
   access_mask = 4294967295
   access_size = 4
   i = 0
  #3  0x556590e9 in memory_region_dispatch_read1
  (mr=0x5628,
   addr=0, size=4) at
  /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1077
   data = 0
  #4  0x556591b1 in memory_region_dispatch_read
  (mr=0x5628,
   addr=0, pval=0x7fffd9a8, size=4)
   at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1099
  No locals.
  #5  0x5565cbbc in io_mem_read (mr=0x5628, 

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Julien Grall
On 11/19/2014 06:14 PM, Stefano Stabellini wrote:
 That's right, the maintenance interrupt handler is not called, but it
 doesn't do anything so we are fine. The important thing is that an
 interrupt is sent and gic_clear_lrs gets called on hypervisor entry.

It would be worth writing this down somewhere, just in case someone
decides to add code to the maintenance interrupt handler later.

Regards,

-- 
Julien Grall



Re: [Xen-devel] [PATCH for-4.5] xen/arm: clear UIE on hypervisor entry

2014-11-19 Thread Konrad Rzeszutek Wilk
On Wed, Nov 19, 2014 at 05:44:49PM +, Stefano Stabellini wrote:
 UIE being set can cause maintenance interrupts to occur when Xen writes
 to one or more LR registers. The effect is a busy loop around the
 interrupt handler in Xen
 (http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck.
 
 Konrad, this fixes an actual bug, at least on OMAP5. It should have no
 bad side effects on any other platforms as far as I can tell. It should
 go in 4.5.

Have you checked (aka ran the tests) on the other platforms?
 
 Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
 Tested-by: Andrii Tseglytskyi andrii.tseglyts...@globallogic.com
  ^^^
 'Reported-and-Tested-by'
 
 diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
 index 70d10d6..df140b9 100644
 --- a/xen/arch/arm/gic.c
 +++ b/xen/arch/arm/gic.c
 @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
  if ( is_idle_vcpu(v) )
  return;
  
 +gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
 +
  spin_lock_irqsave(&v->arch.vgic.lock, flags);
  
  while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
 @@ -527,8 +529,6 @@ void gic_inject(void)
  
  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
  gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
 -else
 -gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
  }
  
  static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, Julien Grall wrote:
 On 11/19/2014 06:14 PM, Stefano Stabellini wrote:
  That's right, the maintenance interrupt handler is not called, but it
  doesn't do anything so we are fine. The important thing is that an
  interrupt is sent and gic_clear_lrs gets called on hypervisor entry.
 
 It would be worth writing this down somewhere, just in case someone
 decides to add code to the maintenance interrupt handler later.

Yes, I could add a comment in the handler



Re: [Xen-devel] [PATCH for-4.5] xen/arm: clear UIE on hypervisor entry

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, Konrad Rzeszutek Wilk wrote:
 On Wed, Nov 19, 2014 at 05:44:49PM +, Stefano Stabellini wrote:
  UIE being set can cause maintenance interrupts to occur when Xen writes
  to one or more LR registers. The effect is a busy loop around the
  interrupt handler in Xen
  (http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck.
  
  Konrad, this fixes an actual bug, at least on OMAP5. It should have no
  bad side effects on any other platforms as far as I can tell. It should
  go in 4.5.
 
 Have you checked (aka ran the tests) on the other platforms?

Yes, I tested on Midway and it runs fine.


  Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
  Tested-by: Andrii Tseglytskyi andrii.tseglyts...@globallogic.com
   ^^^
  'Reported-and-Tested-by'

Good point


  diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
  index 70d10d6..df140b9 100644
  --- a/xen/arch/arm/gic.c
  +++ b/xen/arch/arm/gic.c
  @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
       if ( is_idle_vcpu(v) )
           return;
   
  +    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
  +
       spin_lock_irqsave(&v->arch.vgic.lock, flags);
   
       while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
  @@ -527,8 +529,6 @@ void gic_inject(void)
   
       if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
           gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
  -    else
  -        gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
   }
   
   static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
 



Re: [Xen-devel] [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq.

2014-11-19 Thread Andrew Cooper
On 19/11/2014 18:54, Sander Eikelenboom wrote:
 Wednesday, November 19, 2014, 6:31:39 PM, you wrote:

 Hey,
 This patch should fix the issue that Sander had seen. The full details
 are in the patch itself. Sander, if you could - please test origin/staging
 with this patch to make sure it does fix the issue.

  xen/drivers/passthrough/io.c | 27 +--
 Konrad Rzeszutek Wilk (1):
   dpci: Fix list corruption if INTx device is used and an IRQ timeout is 
 invoked.
  1 file changed, 17 insertions(+), 10 deletions(-)

 Hi Konrad,

 Hmm just tested with a freshly cloned tree .. unfortunately it blew up again.
 (i must admit i also re-enabled stuff i had disabled in debugging like, 
 cpuidle, cpufreq). 

 (XEN) [2014-11-19 18:41:25.999] [ Xen-4.5.0-rc  x86_64  debug=y  Not 
 tainted ]
 (XEN) [2014-11-19 18:41:25.999] CPU:5
 (XEN) [2014-11-19 18:41:25.999] RIP:e008:[82d0801490ac] 
 dpci_softirq+0x9c/0x23d
 (XEN) [2014-11-19 18:41:25.999] RFLAGS: 00010283   CONTEXT: hypervisor
 (XEN) [2014-11-19 18:41:25.999] rax: 0100100100100100   rbx: 8303bb688d90 
   rcx: 0001
 (XEN) [2014-11-19 18:41:25.999] rdx: 83054ef18000   rsi: 0002 
   rdi: 83050b29e0b8
 (XEN) [2014-11-19 18:41:25.999] rbp: 83054ef1feb0   rsp: 83054ef1fe50 
   r8:  8303bb688d60
 (XEN) [2014-11-19 18:41:25.999] r9:  01d5f62fff63   r10: deadbeef 
   r11: 0246
 (XEN) [2014-11-19 18:41:25.999] r12: 8303bb688d38   r13: 83050b29e000 
   r14: 8303bb688d28
 (XEN) [2014-11-19 18:41:25.999] r15: 8303bb688d28   cr0: 8005003b 
   cr4: 06f0
 (XEN) [2014-11-19 18:41:25.999] cr3: 00050b2c7000   cr2: ff600400
 (XEN) [2014-11-19 18:41:25.999] ds: 002b   es: 002b   fs:    gs:    
 ss: e010   cs: e008
 (XEN) [2014-11-19 18:41:25.999] Xen stack trace from rsp=83054ef1fe50:
 (XEN) [2014-11-19 18:41:25.999]0c23 83050b29e0b8 
 8303bb688d38 83054ef1fe70
 (XEN) [2014-11-19 18:41:25.999]8303bb688d90 8303bb688d90 
 00fb 82d080300200
 (XEN) [2014-11-19 18:41:25.999]82d0802fff80  
 83054ef18000 0002
 (XEN) [2014-11-19 18:41:25.999]83054ef1fee0 82d08012be31 
 83054ef18000 83009fd2d000
 (XEN) [2014-11-19 18:41:25.999] 83054ef28068 
 83054ef1fef0 82d08012be89
 (XEN) [2014-11-19 18:41:25.999]83054ef1ff10 82d0801633e5 
 82d08012be89 83009ff8b000
 (XEN) [2014-11-19 18:41:25.999]83054ef1fde8 880059bf8000 
 880059bf8000 
 (XEN) [2014-11-19 18:41:25.999] 880059bfbeb0 
 822f3ec0 0246
 (XEN) [2014-11-19 18:41:25.999]0001  
  
 (XEN) [2014-11-19 18:41:25.999]810013aa 880059bde480 
 deadbeef deadbeef
 (XEN) [2014-11-19 18:41:25.999]0100 810013aa 
 e033 0246
 (XEN) [2014-11-19 18:41:25.999]880059bfbe98 e02b 
 1862060042c8beef 224d41480704beef
 (XEN) [2014-11-19 18:41:25.999]99171042639bbeef 74c88180108cbeef 
 c0dc604c0005 83009ff8b000
 (XEN) [2014-11-19 18:41:26.000]0034cebff280 ca836183a4020303
 (XEN) [2014-11-19 18:41:26.000] Xen call trace:
 (XEN) [2014-11-19 18:41:26.000][82d0801490ac] 
 dpci_softirq+0x9c/0x23d
 (XEN) [2014-11-19 18:41:26.000][82d08012be31] __do_softirq+0x81/0x8c
 (XEN) [2014-11-19 18:41:26.000][82d08012be89] do_softirq+0x13/0x15
 (XEN) [2014-11-19 18:41:26.000][82d0801633e5] idle_loop+0x5e/0x6e
 (XEN) [2014-11-19 18:41:26.000] 
 (XEN) [2014-11-19 18:41:26.778] 
 (XEN) [2014-11-19 18:41:26.787] 
 (XEN) [2014-11-19 18:41:26.806] Panic on CPU 5:
 (XEN) [2014-11-19 18:41:26.819] GENERAL PROTECTION FAULT
 (XEN) [2014-11-19 18:41:26.834] [error_code=]
 (XEN) [2014-11-19 18:41:26.847] 
 (XEN) [2014-11-19 18:41:26.867] 
 (XEN) [2014-11-19 18:41:26.876] Reboot in five seconds...
 (XEN) [2014-11-19 18:41:26.891] APIC error on CPU0: 00(08)
 (XEN) [2014-11-19 18:41:26.906] APIC error on CPU0: 08(08)

For the avoidance of any confusion, this is still LIST_POISON1 (see
%rax), but now a #GP fault following c/s 404227138 (now with 100% less
chance of dereferencing into guest-controlled virtual address space)

~Andrew
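
The poison value in %rax can be checked with a small standalone sketch. This is not Xen's list code: the names `LIST_POISON1_SKETCH`, `is_canonical`, `node_del`, and `node_is_poisoned` are hypothetical, the poison constant is simply the %rax value from the dump above, and the sketch assumes a 64-bit build with 48-bit virtual addresses. It shows why following a poisoned pointer would raise #GP rather than touch mapped memory: the value is not a canonical x86-64 address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed poison value, copied from %rax in the register dump above. */
#define LIST_POISON1_SKETCH UINT64_C(0x0100100100100100)

/*
 * With 48-bit virtual addresses, an x86-64 address is canonical only if
 * bits 63..47 are all zeros or all ones.
 */
static bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;   /* the 17 most significant bits */

    return top == 0 || top == 0x1FFFF;
}

struct node {
    struct node *next, *prev;
};

/*
 * Unlink a node and poison its 'next' pointer so that a later use of the
 * stale entry is detectable. Simplification: 'prev' is set to NULL here
 * instead of a second poison value.
 */
static void node_del(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = (struct node *)(uintptr_t)LIST_POISON1_SKETCH;
    n->prev = NULL;
}

/* True if the node was already removed from its list. */
static bool node_is_poisoned(const struct node *n)
{
    return (uint64_t)(uintptr_t)n->next == LIST_POISON1_SKETCH;
}
```

A list walker that reaches a node for which `node_is_poisoned()` is true has found an entry that was deleted while still being referenced, which is the kind of corruption the trace above points at.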



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
On 19 Nov 2014 20:32, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:

 On Wed, 19 Nov 2014, Julien Grall wrote:
  On 11/19/2014 06:14 PM, Stefano Stabellini wrote:
   That's right, the maintenance interrupt handler is not called, but it
   doesn't do anything so we are fine. The important thing is that an
   interrupt is sent and gic_clear_lrs gets called on hypervisor entry.
 
  It would be worth writing this down somewhere, just in case someone
  decides to add code to the maintenance interrupt handler later.

 Yes, I could add a comment in the handler

Maybe it wouldn't take a lot of effort to fix it? I am just worried that
we may be hiding some issue: a spurious interrupt is typically not what is
expected.


[Xen-devel] [xen-4.3-testing test] 31670: regressions - FAIL

2014-11-19 Thread xen . org
flight 31670 xen-4.3-testing real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/31670/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemut-winxpsp3  7 windows-install fail REGR. vs. 31536

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-rumpuserxen-amd64        1 build-check(1)     blocked n/a
 test-amd64-i386-rumpuserxen-i386          1 build-check(1)     blocked n/a
 build-amd64-rumpuserxen                   6 xen-build          fail    never pass
 build-i386-rumpuserxen                    6 xen-build          fail    never pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64      7 debian-hvm-install fail    never pass
 test-amd64-i386-libvirt                   9 guest-start        fail    never pass
 test-amd64-i386-xl-qemuu-ovmf-amd64       7 debian-hvm-install fail    never pass
 test-amd64-amd64-libvirt                  9 guest-start        fail    never pass
 test-amd64-amd64-xl-pcipt-intel           9 guest-start        fail    never pass
 test-armhf-armhf-xl                       5 xen-boot           fail    never pass
 test-armhf-armhf-libvirt                  5 xen-boot           fail    never pass
 test-amd64-i386-xl-qemut-win7-amd64      14 guest-stop         fail    never pass
 test-amd64-i386-xend-winxpsp3            17 leak-check/check   fail    never pass
 test-amd64-i386-xl-winxpsp3-vcpus1       14 guest-stop         fail    never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop         fail    never pass
 test-amd64-i386-xend-qemut-winxpsp3      17 leak-check/check   fail    never pass
 test-amd64-i386-xl-qemuu-win7-amd64      14 guest-stop         fail    never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop         fail    never pass
 test-amd64-amd64-xl-win7-amd64           14 guest-stop         fail    never pass
 test-amd64-amd64-xl-qemuu-win7-amd64     14 guest-stop         fail    never pass
 test-amd64-i386-xl-win7-amd64            14 guest-stop         fail    never pass
 test-amd64-amd64-xl-qemut-win7-amd64     14 guest-stop         fail    never pass
 test-amd64-amd64-xl-winxpsp3             14 guest-stop         fail    never pass
 test-amd64-amd64-xl-qemuu-winxpsp3       14 guest-stop         fail    never pass

version targeted for testing:
 xen  82fa0623454a52c7d1812a9419c4cc09567d243d
baseline version:
 xen  d6281e354393f1c8a02fac55f4f611b4d4856303


People who touched revisions under test:
  Jan Beulich jbeul...@suse.com
  Tim Deegan t...@xen.org


jobs:
 build-amd64                                   pass
 build-armhf                                   pass
 build-i386                                    pass
 build-amd64-libvirt                           pass
 build-armhf-libvirt                           pass
 build-i386-libvirt                            pass
 build-amd64-pvops                             pass
 build-armhf-pvops                             pass
 build-i386-pvops                              pass
 build-amd64-rumpuserxen                       fail
 build-i386-rumpuserxen                        fail
 test-amd64-amd64-xl                           pass
 test-armhf-armhf-xl                           fail
 test-amd64-i386-xl                            pass
 test-amd64-i386-rhel6hvm-amd                  pass
 test-amd64-i386-qemut-rhel6hvm-amd            pass
 test-amd64-i386-qemuu-rhel6hvm-amd            pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64     pass
 test-amd64-i386-xl-qemut-debianhvm-amd64      pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64     pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64      pass
 test-amd64-i386-freebsd10-amd64               pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64          fail
 test-amd64-i386-xl-qemuu-ovmf-amd64           fail
 test-amd64-amd64-rumpuserxen-amd64            blocked
 test-amd64-amd64-xl-qemut-win7-amd64          fail
 test-amd64-i386-xl-qemut-win7-amd64           fail
 test-amd64-amd64-xl-qemuu-win7-amd64          fail
 test-amd64-i386-xl-qemuu-win7-amd64           fail
 test-amd64-amd64-xl-win7-amd64                fail
 test-amd64-i386-xl-win7-amd64                 fail
 test-amd64-i386-xl-credit2                    pass
 test-amd64-i386-freebsd10-i386                pass

Re: [Xen-devel] [Qemu-devel] qemu 2.2 crash on linux hvm domU (full backtrace included)

2014-11-19 Thread Don Slutz

On 11/19/14 13:18, Stefano Stabellini wrote:

On Wed, 19 Nov 2014, Don Slutz wrote:

I have posted the patch:

Subject: [BUGFIX][PATCH for 2.2 1/1] hw/i386/pc_piix.c: Also pass vmport=off
for xenfv machine
Date: Wed, 19 Nov 2014 12:30:57 -0500
Message-ID: 1416418257-10166-1-git-send-email-dsl...@verizon.com


Which fixes QEMU 2.2 for xenfv.  However, if you configure xen_platform_pci=0
you will still have this issue.  The good news is that xen-4.5 currently does
not have QEMU 2.2 and so does not have this issue.

Only people (groups like spice?) that want QEMU 2.2.0 with xen 4.5.0 (or older
xen versions) will hit this.

I have changes to xen 4.6 which will fix the xen_platform_pci=0 case also.

In order to get xen 4.5 to fully work with QEMU 2.2.0 (both in hard freeze)

the 1st patch from Dr. David Alan Gilbert dgilb...@redhat.com
would need to be applied to xen's qemu 2.0.2 (+ changes) so that
vmport=off can be added to --machine.

And a patch (yet to be written, subset of changes I have pending for 4.6)
that adds vmport=off to QEMU args for --machine (it can be done in all cases).

What happens if you pass vmport=off via --machine, without David Alan
Gilbert's patch in QEMU?


I am almost (99%) sure that QEMU will complain about a bad arg.

gdb says:

(gdb) r
Starting program: 
/home/don/qemu/out/master/x86_64-softmmu/qemu-system-x86_64 -M pc 
-machine accel=xen,vmportport=1

[Thread debugging using libthread_db enabled]
Using host libthread_db library /lib64/libthread_db.so.1.
qemu-system-x86_64: -machine accel=xen,vmportport=1: Invalid parameter 
'vmportport'



In which case domU will fail to start.
   -Don Slutz




 -Don Slutz



On 11/19/14 10:52, Stefano Stabellini wrote:

On Wed, 19 Nov 2014, Fabio Fantoni wrote:

On 19/11/2014 15:56, Don Slutz wrote:

I think I know what is happening here.  But you are pointing at the wrong
change.

commit 9b23cfb76b3a5e9eb5cc899eaf2f46bc46d33ba4

is what I am guessing is the issue at this time.  I think that xen_enabled()
is returning false in pc_machine_initfn, whereas in pc_init1 it is returning
true.

I am thinking that:


diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 7bb97a4..3268c29 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -914,7 +914,7 @@ static QEMUMachine xenfv_machine = {
     .desc = "Xen Fully-virtualized PC",
     .init = pc_xen_hvm_init,
     .max_cpus = HVM_MAX_VCPUS,
-    .default_machine_opts = "accel=xen",
+    .default_machine_opts = "accel=xen,vmport=off",
     .hot_add_cpu = pc_hot_add_cpu,
 };
 #endif

Will fix your issue. I have not tested this yet.

Tested now and it fixes the regression of Linux HVM domUs with qemu 2.2,
thanks.
I think that I'm not the only one hitting this regression, and that this
patch (or a fix to the cause in vmport) should be applied before qemu 2.2
final.

Don,
please submit a proper patch with a Signed-off-by.

Thanks!

- Stefano


  -Don Slutz


On 11/19/14 09:04, Fabio Fantoni wrote:

On 14/11/2014 12:25, Fabio Fantoni wrote:

dom0 xen-unstable from staging git with the "x86/hvm: Extend HVM cpuid leaf
with vcpu id" and "x86/hvm: Add per-vcpu evtchn upcalls" patches, and
qemu 2.2 from spice git (spice/next commit
e779fa0a715530311e6f59fc8adb0f6eca914a89):
https://github.com/Fantu/Xen/commits/rebase/m2r-staging

I tried with qemu tag v2.2.0-rc2 and the crash still happens; here is the
full backtrace of the latest test:

Program received signal SIGSEGV, Segmentation fault.
0x55689b07 in vmport_ioport_read (opaque=0x564443a0,
addr=0,
  size=4) at
/mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73
73  eax = env->regs[R_EAX];
(gdb) bt full
#0  0x55689b07 in vmport_ioport_read (opaque=0x564443a0,
addr=0,
  size=4) at
/mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73
  s = 0x564443a0
  cs = 0x0
  cpu = 0x0
  __func__ = vmport_ioport_read
  env = 0x8250
  command = 0 '\000'
  eax = 0
#1  0x55655fc4 in memory_region_read_accessor
(mr=0x5628,
  addr=0, value=0x7fffd8d0, size=4, shift=0, mask=4294967295)
  at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:410
  tmp = 0
#2  0x556562b7 in access_with_adjusted_size (addr=0,
  value=0x7fffd8d0, size=4, access_size_min=4,
access_size_max=4,
  access=0x55655f62 memory_region_read_accessor,
mr=0x5628)
  at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:480
  access_mask = 4294967295
  access_size = 4
  i = 0
#3  0x556590e9 in memory_region_dispatch_read1
(mr=0x5628,
  addr=0, size=4) at
/mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1077
  data = 0
#4  0x556591b1 in memory_region_dispatch_read
(mr=0x5628,
  addr=0, pval=0x7fffd9a8, size=4)
  at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1099
No locals.
#5  0x5565cbbc in io_mem_read 

Re: [Xen-devel] [PATCH V3 2/8] xen: Delay remapping memory of pv-domain

2014-11-19 Thread Konrad Rzeszutek Wilk
On Fri, Nov 14, 2014 at 06:14:06PM +0100, Juergen Gross wrote:
 On 11/14/2014 05:47 PM, Konrad Rzeszutek Wilk wrote:
 On Fri, Nov 14, 2014 at 05:53:19AM +0100, Juergen Gross wrote:
 On 11/13/2014 08:56 PM, Konrad Rzeszutek Wilk wrote:
 +   mfn_save = virt_to_mfn(buf);
 +
 +   while (xen_remap_mfn != INVALID_P2M_ENTRY) {
 
 So the 'list' is constructed by going forward - that is from low-numbered
 PFNs to higher numbered ones. But the 'xen_remap_mfn' is going the
 other way - from the highest PFN to the lowest PFN.
 
 Won't that mean we will restore the chunks of memory in the wrong
 order? That is we will still restore them in chunks size, but the
 chunks will be in descending order instead of ascending?
 
 No, the information where to put each chunk is contained in the chunk
 data. I can add a comment explaining this.
 
 Right, the MFNs in a chunk are going to be restored in the right order.
 
 I was thinking that the chunks (so a set of MFNs) will be restored in
 the opposite order that they are written to.
 
 And oddly enough the chunks are done in 512-3 = 509 MFNs at once?
 
 More don't fit on a single page due to the other info needed. So: yes.
 
 But you could use two pages - one for the structure and the other
 for the list of MFNs. That would fix the problem of having only
 509 MFNs being contiguous per chunk when restoring.
 
 That's no problem (see below).
 
 Anyhow, the point I am worried about is that we do not restore the
 MFNs in the same order. We do it in chunk size, which is OK (so the 509 MFNs
 at once), but the order in which we traverse the restoration process is the
 opposite of the save process. Say we have 4MB of contiguous MFNs, so two
 (err, three) chunks. The first one we iterate is from 0-509, the second is
 510-1018, the last is 1019-1023. When we restore (remap) we start with the
 last 'chunk', so we end up restoring them in 1019-1023, 510-1018, 0-509 order.
 
 No. When building up the chunks we save in each chunk where to put it
 on remap. So in your example 0-509 should be mapped at dest+0,
 510-1018 at dest+510, and 1019-1023 at dest+1019.
 
 When remapping we map 1019-1023 to dest+1019, 510-1018 at dest+510
 and last 0-509 at dest+0. So we do the mapping in reverse order, but
 to the correct pfns.

Excellent! Could a condensed version of that explanation be put in the code ?
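
As a rough illustration of the explanation above (not the code from the patch), the point that each chunk carries its own destination can be sketched as follows. The struct layout and the names `remap_chunk`, `dest_pfn`, and `remap_chunks_reverse` are hypothetical; only the 509-entries-per-chunk figure comes from the thread.

```c
#include <stddef.h>

#define CHUNK_MAX 509   /* MFNs per chunk, as discussed in the thread */

/* Hypothetical chunk layout: each chunk records where it belongs. */
struct remap_chunk {
    unsigned long dest_pfn;         /* first target pfn for this chunk */
    size_t count;                   /* number of valid entries in mfns[] */
    unsigned long mfns[CHUNK_MAX];  /* MFNs to place at dest_pfn, +1, ... */
};

/*
 * Replay the chunks from last to first. Because every chunk carries its
 * own destination, the replay order does not affect the final layout.
 */
static void remap_chunks_reverse(const struct remap_chunk *chunks,
                                 size_t nr_chunks, unsigned long *p2m)
{
    for (size_t c = nr_chunks; c-- > 0; ) {
        const struct remap_chunk *ch = &chunks[c];

        for (size_t i = 0; i < ch->count; i++)
            p2m[ch->dest_pfn + i] = ch->mfns[i];
    }
}
```

Under these assumptions the chunks could be replayed in any order and still end up at the correct pfns, which is the property Juergen describes.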

 
 Juergen



[Xen-devel] [PATCH V3] Decouple SandyBridge quirk from VTd timeout

2014-11-19 Thread Donald D. Dugger
Currently the quirk code for SandyBridge uses the VTd timeout value when
writing to an IGD register.  This is the wrong timeout to use and, at
1000 msec., is also much too large.  This patch changes the quirk code to
use a timeout that is specific to the IGD device and allows the user
control of the timeout.

Boolean settings for the boot parameter `snb_igd_quirk' keep their current
meaning, enabling or disabling the quirk code with a timeout of 1000 msec.

In addition specifying `snb_igd_quirk=default' will enable the code and
set the timeout to the theoretical maximum of 670 msec.  For finer control,
specifying `snb_igd_quirk=n', where `n' is a decimal number, will enable
the code and set the timeout to `n' msec.
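
The mapping from parameter value to timeout described above can be sketched in plain C. This is only an illustration using the standard library, not the Xen parser from the patch below: `snb_quirk_timeout_ms` and the two constants are hypothetical names, the result is in msec with 0 meaning "quirk disabled", and the set of boolean spellings accepted here is an assumption standing in for Xen's own boolean parsing.

```c
#include <stdlib.h>
#include <string.h>

#define SNB_TIMEOUT_LEGACY_MS  1000UL  /* timeout kept for boolean settings */
#define SNB_TIMEOUT_DEFAULT_MS  670UL  /* "theoretical maximum" per the text */

/* Returns the timeout in msec; 0 means the quirk stays disabled. */
static unsigned long snb_quirk_timeout_ms(const char *s)
{
    if (*s == '\0')                      /* bare "snb_igd_quirk" */
        return SNB_TIMEOUT_LEGACY_MS;

    if (strcmp(s, "default") == 0)
        return SNB_TIMEOUT_DEFAULT_MS;

    if (*s >= '0' && *s <= '9') {
        unsigned long n = strtoul(s, NULL, 10);

        /* "1" doubles as boolean true, mirroring the patch's special case. */
        return n == 1 ? SNB_TIMEOUT_LEGACY_MS : n;
    }

    /* Assumed boolean spellings; anything else leaves the quirk off. */
    if (!strcmp(s, "yes") || !strcmp(s, "on") ||
        !strcmp(s, "true") || !strcmp(s, "enable"))
        return SNB_TIMEOUT_LEGACY_MS;

    return 0;
}
```

With this sketch, `snb_igd_quirk=250` would give a 250 msec timeout, while `snb_igd_quirk=0` and `snb_igd_quirk=no` both leave the quirk disabled.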

Signed-off-by: Don Dugger donald.d.dug...@intel.com
-- 
diff -r 9d485e2c8339 xen/drivers/passthrough/vtd/quirks.c
--- a/xen/drivers/passthrough/vtd/quirks.c  Mon Nov 10 12:03:36 2014 +
+++ b/xen/drivers/passthrough/vtd/quirks.c  Wed Nov 19 09:49:31 2014 -0700
@@ -50,6 +50,10 @@
 #define IS_ILK(id)(id == 0x00408086 || id == 0x00448086 || id== 0x00628086 
|| id == 0x006A8086)
 #define IS_CPT(id)(id == 0x01008086 || id == 0x01048086)
 
+#define SNB_IGD_TIMEOUT_LEGACY MILLISECS(1000)
+#define SNB_IGD_TIMEOUT        MILLISECS( 670)
+static u32 snb_igd_timeout = 0;
+
 static u32 __read_mostly ioh_id;
 static u32 __initdata igd_id;
 bool_t __read_mostly rwbf_quirk;
@@ -158,6 +162,16 @@
  * Workaround is to prevent graphics get into RC6
  * state when doing VT-d IOTLB operations, do the VT-d
  * IOTLB operation, and then re-enable RC6 state.
+ *
+ * This quirk is enabled with the snb_igd_quirk command
+ * line parameter.  Specifying snb_igd_quirk with no value
+ * (or any of the standard boolean values) enables this
+ * quirk and sets the timeout to the legacy timeout of
+ * 1000 msec.  Setting this parameter to the string
+ * default enables this quirk and sets the timeout to
+ * the theoretical maximum of 670 msec.  Setting this
+ * parameter to a numerical value enables the quirk and
+ * sets the timeout to that numerical number of msecs.
  */
 static void snb_vtd_ops_preamble(struct iommu* iommu)
 {
@@ -177,7 +191,7 @@
 start_time = NOW();
 while ( (*(volatile u32 *)(igd_reg_va + 0x22AC) & 0xF) != 0 )
 {
-if ( NOW() > start_time + DMAR_OPERATION_TIMEOUT )
+if ( NOW() > start_time + snb_igd_timeout )
 {
 dprintk(XENLOG_INFO VTDPREFIX,
 "snb_vtd_ops_preamble: failed to disable idle handshake\n");
@@ -208,13 +222,10 @@
  * call before VT-d translation enable and IOTLB flush operations.
  */
 
-static int snb_igd_quirk;
-boolean_param("snb_igd_quirk", snb_igd_quirk);
-
 void vtd_ops_preamble_quirk(struct iommu* iommu)
 {
 cantiga_vtd_ops_preamble(iommu);
-if ( snb_igd_quirk )
+if ( snb_igd_timeout != 0 )
 {
 spin_lock(&igd_lock);
 
@@ -228,7 +239,7 @@
  */
 void vtd_ops_postamble_quirk(struct iommu* iommu)
 {
-if ( snb_igd_quirk )
+if ( snb_igd_timeout != 0 )
 {
 snb_vtd_ops_postamble(iommu);
 
@@ -237,6 +248,42 @@
 }
 }
 
+static void __init parse_snb_timeout(const char *s)
+{
+   int not;
+
+   switch (*s) {
+
+   case '\0':
+   snb_igd_timeout = SNB_IGD_TIMEOUT_LEGACY;
+   break;
+
+   case '0':   case '1':   case '2':
+   case '3':   case '4':   case '5':
+   case '6':   case '7':   case '8':
+   case '9':
+   snb_igd_timeout = MILLISECS(simple_strtoul(s, &s, 0));
+   if ( snb_igd_timeout == MILLISECS(1) )
+   snb_igd_timeout = SNB_IGD_TIMEOUT_LEGACY;
+   break;
+
+   default:
+   if ( strncmp("default", s, 7) == 0 ) {
+   snb_igd_timeout = SNB_IGD_TIMEOUT;
+   break;
+   }
+   not = !strncmp("no-", s, 3);
+   if ( not )
+   s += 3;
+   if ( not ^ parse_bool(s) )
+   snb_igd_timeout = SNB_IGD_TIMEOUT_LEGACY;
+   break;
+
+   }
+   return;
+}
+custom_param("snb_igd_quirk", parse_snb_timeout);
+
 /* 5500/5520/X58 Chipset Interrupt remapping errata, for stepping B-3.
  * Fixed in stepping C-2. */
 static void __init tylersburg_intremap_quirk(void)



Re: [Xen-devel] [PATCH V3 7/8] xen: switch to linear virtual mapped sparse p2m list

2014-11-19 Thread Konrad Rzeszutek Wilk
On Tue, Nov 11, 2014 at 06:43:45AM +0100, Juergen Gross wrote:
 At start of the day the Xen hypervisor presents a contiguous mfn list
 to a pv-domain. In order to support sparse memory this mfn list is
 accessed via a three level p2m tree built early in the boot process.
 Whenever the system needs the mfn associated with a pfn this tree is
 used to find the mfn.
 
 Instead of using a software walked tree for accessing a specific mfn
 list entry this patch is creating a virtual address area for the
 entire possible mfn list including memory holes. The holes are
 covered by mapping a pre-defined  page consisting only of invalid
 mfn entries. Access to a mfn entry is possible by just using the
 virtual base address of the mfn list and the pfn as index into that
 list. This speeds up the (hot) path of determining the mfn of a
 pfn.
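
A minimal user-space sketch of the idea described above, assuming the mfn list is visible as one flat array in which holes hold an invalid marker. The names `p2m_list` and `p2m_lookup` are hypothetical, and returning the invalid marker for an out-of-range pfn is a simplification: the patch itself falls back to other paths in that case.

```c
#include <stddef.h>

#define INVALID_P2M_ENTRY (~0UL)

/* Hypothetical flat view of the p2m list; holes hold INVALID_P2M_ENTRY. */
struct p2m_list {
    const unsigned long *entries;  /* base address of the linear list */
    size_t size;                   /* number of pfns covered by the list */
};

/* Look up the mfn for a pfn with one indexed load instead of a tree walk. */
static unsigned long p2m_lookup(const struct p2m_list *p2m, unsigned long pfn)
{
    if (pfn >= p2m->size)
        return INVALID_P2M_ENTRY;  /* outside the mapped list */

    return p2m->entries[pfn];
}
```

The hot path is a bounds check plus an array access, which is where the speedup over walking three tree levels would come from.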
 
 Kernel build on a Dell Latitude E6440 (2 cores, HT) in 64 bit Dom0
 showed following improvements:
 
 Elapsed time: 32:50 ->  32:35
 System:       18:07 ->  17:47
 User:        104:00 -> 103:30
 
 Tested on 64 bit dom0 and 32 bit domU.
 
 Signed-off-by: Juergen Gross jgr...@suse.com
 ---
  arch/x86/include/asm/xen/page.h |  14 +-
  arch/x86/xen/mmu.c  |  32 +-
  arch/x86/xen/p2m.c  | 732 
 +---
  arch/x86/xen/xen-ops.h  |   2 +-
  4 files changed, 342 insertions(+), 438 deletions(-)
 
 diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
 index 07d8a7b..4a227ec 100644
 --- a/arch/x86/include/asm/xen/page.h
 +++ b/arch/x86/include/asm/xen/page.h
 @@ -72,7 +72,19 @@ extern unsigned long m2p_find_override_pfn(unsigned long 
 mfn, unsigned long pfn)
   */
  static inline unsigned long __pfn_to_mfn(unsigned long pfn)
  {
 - return get_phys_to_machine(pfn);
 + unsigned long mfn;
 +
 + if (pfn < xen_p2m_size)
 + mfn = xen_p2m_addr[pfn];
 + else if (unlikely(pfn < xen_max_p2m_pfn))
 + return get_phys_to_machine(pfn);
 + else
 + return IDENTITY_FRAME(pfn);
 +
 + if (unlikely(mfn == INVALID_P2M_ENTRY))
 + return get_phys_to_machine(pfn);
 +
 + return mfn;
  }
  
  static inline unsigned long pfn_to_mfn(unsigned long pfn)
 diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
 index 31ca515..0b43c45 100644
 --- a/arch/x86/xen/mmu.c
 +++ b/arch/x86/xen/mmu.c
 @@ -1158,20 +1158,16 @@ static void __init xen_cleanhighmap(unsigned long 
 vaddr,
* instead of somewhere later and be confusing. */
   xen_mc_flush();
  }
 -static void __init xen_pagetable_p2m_copy(void)
 +
 +static void __init xen_pagetable_p2m_free(void)
  {
   unsigned long size;
   unsigned long addr;
 - unsigned long new_mfn_list;
 -
 - if (xen_feature(XENFEAT_auto_translated_physmap))
 - return;
  
   size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
  
 - new_mfn_list = xen_revector_p2m_tree();
   /* No memory or already called. */
 - if (!new_mfn_list || new_mfn_list == xen_start_info->mfn_list)
 + if ((unsigned long)xen_p2m_addr == xen_start_info->mfn_list)
   return;
  
   /* using __ka address and sticking INVALID_P2M_ENTRY! */
 @@ -1189,8 +1185,6 @@ static void __init xen_pagetable_p2m_copy(void)
  
   size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
   memblock_free(__pa(xen_start_info->mfn_list), size);
 - /* And revector! Bye bye old array */
 - xen_start_info->mfn_list = new_mfn_list;
  
   /* At this stage, cleanup_highmap has already cleaned __ka space
* from _brk_limit way up to the max_pfn_mapped (which is the end of
 @@ -1214,12 +1208,26 @@ static void __init xen_pagetable_p2m_copy(void)
  }
  #endif
  
 -static void __init xen_pagetable_init(void)
 +static void __init xen_pagetable_p2m_setup(void)
  {
 - paging_init();
 + if (xen_feature(XENFEAT_auto_translated_physmap))
 + return;
 +
 + xen_vmalloc_p2m_tree();
 +
  #ifdef CONFIG_X86_64
 - xen_pagetable_p2m_copy();
 + xen_pagetable_p2m_free();
  #endif
 + /* And revector! Bye bye old array */
 + xen_start_info->mfn_list = (unsigned long)xen_p2m_addr;
 +}
 +
 +static void __init xen_pagetable_init(void)
 +{
 + paging_init();
 +
 + xen_pagetable_p2m_setup();
 +
   /* Allocate and initialize top and mid mfn levels for p2m structure */
   xen_build_mfn_list_list();
  
 diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
 index 328875a..7df446d 100644
 --- a/arch/x86/xen/p2m.c
 +++ b/arch/x86/xen/p2m.c
 @@ -3,21 +3,22 @@
   * guests themselves, but it must also access and update the p2m array
   * during suspend/resume when all the pages are reallocated.
   *
 - * The p2m table is logically a flat array, but we implement it as a
 - * three-level tree to allow the address space to be sparse.
 + * The logical flat p2m table is mapped to a linear kernel memory area.
 + * For accesses by Xen a three-level tree linked via 

Re: [Xen-devel] [PATCH V3 0/8] xen: Switch to virtual mapped linear p2m list

2014-11-19 Thread Konrad Rzeszutek Wilk
On Tue, Nov 11, 2014 at 06:43:38AM +0100, Juergen Gross wrote:
 Paravirtualized kernels running on Xen use a three level tree for
 translation of guest specific physical addresses to machine global
 addresses. This p2m tree is used for construction of page table
 entries, so the p2m tree walk is performance critical.
 
 By using a linear virtual mapped p2m list accesses to p2m elements
 can be sped up while even simplifying code. To achieve this goal
 some p2m related initializations have to be performed later in the
 boot process, as the final p2m list can be set up only after basic
 memory management functions are available.
 

Hey Juergen,

I finally finished looking at the patchset. I had some comments and
some questions that I hope can make it into the patch, so that in
six months or so when somebody looks at the code they can
understand the subtle pieces.

Looking forward to the v4! (Though keep in mind that next week
is Thanksgiving week, so I won't be able to look much after Wednesday.)

  arch/x86/include/asm/pgtable_types.h |1 +
  arch/x86/include/asm/xen/page.h  |   49 +-
  arch/x86/mm/pageattr.c   |   20 +
  arch/x86/xen/mmu.c   |   38 +-
  arch/x86/xen/p2m.c   | 1315 
 ++
  arch/x86/xen/setup.c |  460 ++--
  arch/x86/xen/xen-ops.h   |6 +-
  7 files changed, 854 insertions(+), 1035 deletions(-)

And best of all, we are deleting more code!

 
 -- 
 2.1.2
 



Re: [Xen-devel] [PATCH for-4.5] libxl: remove existence check for PCI device hotplug

2014-11-19 Thread Konrad Rzeszutek Wilk
On Mon, Nov 17, 2014 at 12:10:34PM +, Wei Liu wrote:
 The existence check is to make sure a device is not added to a guest
 multiple times.
 
 PCI device backend path has different rules from vif, disk etc. For
 example:
 /local/domain/0/backend/pci/9/0/dev-1/:03:10.1
 /local/domain/0/backend/pci/9/0/key-1/:03:10.1
 /local/domain/0/backend/pci/9/0/dev-2/:03:10.2
 /local/domain/0/backend/pci/9/0/key-2/:03:10.2
 
 The devid for PCI devices is hardcoded to 0. libxl__device_exists only
 checks up to /local/.../9/0, so it always returns true even if the device
 is assignable.
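
To make the problem concrete, here is a small sketch (not libxl code) of how such a backend path could be built. `pci_backend_path` is a hypothetical helper; the path layout follows the example entries above. With the devid fixed at 0, every PCI device of a guest maps to the same path, so an existence check on that path cannot tell devices apart.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the per-device backend directory that an
 * existence check would look at. Returns the snprintf() result.
 */
static int pci_backend_path(char *buf, size_t len, unsigned int backend_domid,
                            unsigned int guest_domid, unsigned int devid)
{
    return snprintf(buf, len, "/local/domain/%u/backend/pci/%u/%u",
                    backend_domid, guest_domid, devid);
}
```

Calling this twice for two different devices of guest 9, both with devid 0, yields `/local/domain/0/backend/pci/9/0` each time; once the first device has been added, a check on that path succeeds for every later device as well.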
 
 Remove the invocation of libxl__device_exists. We're sure at this point that
 the PCI device is assignable (hence no xenstore entry or JSON entry).
 The check is done beforehand. For HVM guests it's done by calling
 xc_test_assign_device and for PV guests it's done by calling
 pciback_dev_is_assigned.
 
 Reported-by: Li, Liang Z liang.z...@intel.com
 Signed-off-by: Wei Liu wei.l...@citrix.com
 Cc: Ian Campbell ian.campb...@citrix.com
 Cc: Ian Jackson ian.jack...@eu.citrix.com
 Cc: Konrad Wilk konrad.w...@oracle.com
 ---
 This patch fixes a regression in 4.5.

Ouch! That needs then to be fixed.

Is this the version you would want to commit? I did test it, and it
looked to do the right thing, though xen-pciback is stuck in
state 7. However, that is a separate issue that I believe is due to
Xen pciback, not your patches.

 
 The risk is that I misunderstood semantics of xc_test_assign_device and
 pciback_dev_is_assigned and end up adding several entries to JSON config
 template. But if the assignable tests are incorrect I think we have a
 bigger problem to worry about than duplicated entries in JSON template.
 
 It would be good for someone who has a PCI hotplug setup to run a quick test.  I
 think Liang confirmed (indirectly) that xc_test_assign_device worked well for
 him, so I think there won't be multiple JSON template entries for HVM guests.
 However, the PV side still remains to be tested.
 ---
  tools/libxl/libxl_pci.c |8 
  1 file changed, 8 deletions(-)
 
 diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
 index 9f40100..316643c 100644
 --- a/tools/libxl/libxl_pci.c
 +++ b/tools/libxl/libxl_pci.c
 @@ -175,14 +175,6 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, 
 uint32_t domid, libxl_d
  rc = libxl__xs_transaction_start(gc, t);
  if (rc) goto out;
  
 -rc = libxl__device_exists(gc, t, device);
 -if (rc < 0) goto out;
 -if (rc == 1) {
 -LOG(ERROR, "device already exists in xenstore");
 -rc = ERROR_DEVICE_EXISTS;
 -goto out;
 -}
 -
  rc = libxl__set_domain_configuration(gc, domid, d_config);
  if (rc) goto out;
  
 -- 
 1.7.10.4
 



Re: [Xen-devel] [PATCH v9 12/13] swiotlb-xen: pass dev_addr to xen_dma_unmap_page and xen_dma_sync_single_for_cpu

2014-11-19 Thread Konrad Rzeszutek Wilk
On Wed, Nov 12, 2014 at 11:40:53AM +, Stefano Stabellini wrote:
 xen_dma_unmap_page and xen_dma_sync_single_for_cpu take a dma_addr_t
 handle as argument, not a physical address.

Ouch. Should this also go on stable tree?

 
 Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
 Reviewed-by: Catalin Marinas catalin.mari...@arm.com
 ---
  drivers/xen/swiotlb-xen.c |6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)
 
 diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
 index 3725ee4..498b654 100644
 --- a/drivers/xen/swiotlb-xen.c
 +++ b/drivers/xen/swiotlb-xen.c
 @@ -449,7 +449,7 @@ static void xen_unmap_single(struct device *hwdev, 
 dma_addr_t dev_addr,
  
   BUG_ON(dir == DMA_NONE);
  
 - xen_dma_unmap_page(hwdev, paddr, size, dir, attrs);
 + xen_dma_unmap_page(hwdev, dev_addr, size, dir, attrs);
  
   /* NOTE: We use dev_addr here, not paddr! */
   if (is_xen_swiotlb_buffer(dev_addr)) {
 @@ -497,14 +497,14 @@ xen_swiotlb_sync_single(struct device *hwdev, 
 dma_addr_t dev_addr,
   BUG_ON(dir == DMA_NONE);
  
   if (target == SYNC_FOR_CPU)
 - xen_dma_sync_single_for_cpu(hwdev, paddr, size, dir);
 + xen_dma_sync_single_for_cpu(hwdev, dev_addr, size, dir);
  
   /* NOTE: We use dev_addr here, not paddr! */
   if (is_xen_swiotlb_buffer(dev_addr))
   swiotlb_tbl_sync_single(hwdev, paddr, size, dir, target);
  
   if (target == SYNC_FOR_DEVICE)
 - xen_dma_sync_single_for_cpu(hwdev, paddr, size, dir);
 + xen_dma_sync_single_for_cpu(hwdev, dev_addr, size, dir);
  
   if (dir != DMA_FROM_DEVICE)
   return;
 -- 
 1.7.10.4
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] set pv guest default video_memkb to 0

2014-11-19 Thread Konrad Rzeszutek Wilk
On Tue, Nov 18, 2014 at 03:57:08PM -0500, Zhigang Wang wrote:
 Before this patch, the pv guest video_memkb is -1, which is an invalid value.
 It makes the xenstore 'memory/target' calculation wrong:
 
 memory/target = info->target_memkb - info->video_memkb

CC-ing the maintainers.

Is this a regression compared to Xen 4.4, or is this also in Xen 4.4?

Thanks.

 
 Signed-off-by: Zhigang Wang zhigang.x.w...@oracle.com
 ---
  tools/libxl/libxl_create.c | 2 ++
  1 file changed, 2 insertions(+)
 
 diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
 index b1ff5ae..1198225 100644
 --- a/tools/libxl/libxl_create.c
 +++ b/tools/libxl/libxl_create.c
 @@ -357,6 +357,8 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
  break;
  case LIBXL_DOMAIN_TYPE_PV:
  libxl_defbool_setdefault(&b_info->u.pv.e820_host, false);
 +if (b_info->video_memkb == LIBXL_MEMKB_DEFAULT)
 +b_info->video_memkb = 0;
  if (b_info->shadow_memkb == LIBXL_MEMKB_DEFAULT)
  b_info->shadow_memkb = 0;
  if (b_info->u.pv.slack_memkb == LIBXL_MEMKB_DEFAULT)
 -- 
 1.8.3.1
 
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.5] docs/commandline: Fix formatting issues

2014-11-19 Thread Konrad Rzeszutek Wilk
On Wed, Nov 19, 2014 at 11:22:18AM +, Ian Campbell wrote:
 On Wed, 2014-11-19 at 11:17 +, Andrew Cooper wrote:
  In both of these cases, markdown was interpreting the text as regular text,
  and reflowing it as a regular paragraph, leading to a single line as output.
  Reformat them as code blocks inside blockquote blocks, which causes them to
  take their precise whitespace layout.
  
  Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
 Acked-by: Ian Campbell ian.campb...@citrix.com
 
  CC: Ian Jackson ian.jack...@eu.citrix.com
  CC: Wei Liu wei.l...@citrix.com
  CC: Konrad Rzeszutek Wilk konrad.w...@oracle.com
  
  ---
  
  Konrad: this is a documentation fix, so requesting a 4.5 ack please.
 
 FWIW IMHO documentation fixes in general should have a very low bar to
 cross until very late in the release cycle...

I concur. I updated the release criteria doc so that such fixes will be
expedited in the future.

 
  ---
   docs/misc/xen-command-line.markdown |   38 
  +--
   1 file changed, 19 insertions(+), 19 deletions(-)
  
  diff --git a/docs/misc/xen-command-line.markdown 
  b/docs/misc/xen-command-line.markdown
  index f054d4b..e3a5a15 100644
  --- a/docs/misc/xen-command-line.markdown
  +++ b/docs/misc/xen-command-line.markdown
  @@ -475,13 +475,13 @@ defaults of 1 and unlimited respectively are used 
  instead.
   
   For example, with `dom0_max_vcpus=4-8`:
   
  - Number of
  -  PCPUs | Dom0 VCPUs
  -   2|  4
  -   4|  4
  -   6|  6
  -   8|  8
  -  10|  8
  +Number of
  + PCPUs | Dom0 VCPUs
  +  2|  4
  +  4|  4
  +  6|  6
  +  8|  8
  + 10|  8
   
   ### dom0\_mem
`= List of ( min:size | max:size | size )`
  @@ -684,18 +684,18 @@ supported only when compiled with XSM\_ENABLE=y on 
  x86.
   The specified value is a bit mask with the individual bits having the
   following meaning:
   
  -Bit  0 - debug level 0 (unused at present)
  -Bit  1 - debug level 1 (Control Register logging)
  -Bit  2 - debug level 2 (VMX logging of MSR restores when context switching)
  -Bit  3 - debug level 3 (unused at present)
  -Bit  4 - I/O operation logging
  -Bit  5 - vMMU logging
  -Bit  6 - vLAPIC general logging
  -Bit  7 - vLAPIC timer logging
  -Bit  8 - vLAPIC interrupt logging
  -Bit  9 - vIOAPIC logging
  -Bit 10 - hypercall logging
  -Bit 11 - MSR operation logging
  + Bit  0 - debug level 0 (unused at present)
  + Bit  1 - debug level 1 (Control Register logging)
  + Bit  2 - debug level 2 (VMX logging of MSR restores when context 
  switching)
  + Bit  3 - debug level 3 (unused at present)
  + Bit  4 - I/O operation logging
  + Bit  5 - vMMU logging
  + Bit  6 - vLAPIC general logging
  + Bit  7 - vLAPIC timer logging
  + Bit  8 - vLAPIC interrupt logging
  + Bit  9 - vIOAPIC logging
  + Bit 10 - hypercall logging
  + Bit 11 - MSR operation logging
   
   Recognized in debug builds of the hypervisor only.
   
 
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] set pv guest default video_memkb to 0

2014-11-19 Thread Wei Liu
On Wed, Nov 19, 2014 at 04:08:46PM -0500, Konrad Rzeszutek Wilk wrote:
 On Tue, Nov 18, 2014 at 03:57:08PM -0500, Zhigang Wang wrote:
  Before this patch, the pv guest video_memkb is -1, which is an invalid value.
  It makes the xenstore 'memory/target' calculation wrong:
  
  memory/target = info->target_memkb - info->video_memkb
 
 CC-ing the maintainers.
 
 Is this a regression compared to Xen 4.4, or is this also in Xen 4.4?
 

I don't think this is a regression, it has been broken for quite a
while.

Wei.
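
For illustration, the arithmetic behind the report can be sketched as below. This is a minimal standalone sketch, not libxl code: MEMKB_DEFAULT is an assumed stand-in for LIBXL_MEMKB_DEFAULT (taken here to be the -1 the report mentions), and memory_target() and pv_video_memkb_default() are hypothetical helpers that only mirror the quoted formula and the proposed default.

```c
#include <stdint.h>

/* Assumption: stand-in for LIBXL_MEMKB_DEFAULT, the -1 "unset" marker
 * described in the report. */
#define MEMKB_DEFAULT ((int64_t)-1)

/* Hypothetical helper mirroring the quoted formula:
 *   memory/target = target_memkb - video_memkb */
static int64_t memory_target(int64_t target_memkb, int64_t video_memkb)
{
    return target_memkb - video_memkb;
}

/* Hypothetical helper: the default the patch applies for PV guests,
 * replacing the "unset" marker with 0. */
static int64_t pv_video_memkb_default(int64_t video_memkb)
{
    return video_memkb == MEMKB_DEFAULT ? 0 : video_memkb;
}
```

With the marker left in place, a 1048576 KiB target comes out one KiB too large; after applying the default it is unchanged.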

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.5] libxl: remove existence check for PCI device hotplug

2014-11-19 Thread Konrad Rzeszutek Wilk
On Wed, Nov 19, 2014 at 09:21:23PM +, Wei Liu wrote:
 On Wed, Nov 19, 2014 at 04:01:54PM -0500, Konrad Rzeszutek Wilk wrote:
  On Mon, Nov 17, 2014 at 12:10:34PM +, Wei Liu wrote:
   The existence check is to make sure a device is not added to a guest
   multiple times.
   
   PCI device backend path has different rules from vif, disk etc. For
   example:
   /local/domain/0/backend/pci/9/0/dev-1/:03:10.1
   /local/domain/0/backend/pci/9/0/key-1/:03:10.1
   /local/domain/0/backend/pci/9/0/dev-2/:03:10.2
   /local/domain/0/backend/pci/9/0/key-2/:03:10.2
   
   The devid for PCI devices is hardcoded to 0. libxl__device_exists only
   checks up to /local/.../9/0, so it always returns true even if the device
   is assignable.
   
   Remove the invocation of libxl__device_exists. We're sure at this point that
   the PCI device is assignable (hence no xenstore entry or JSON entry).
   The check is done beforehand. For an HVM guest it's done by calling
   xc_test_assign_device and for a PV guest it's done by calling
   pciback_dev_is_assigned.
   
   Reported-by: Li, Liang Z liang.z...@intel.com
   Signed-off-by: Wei Liu wei.l...@citrix.com
   Cc: Ian Campbell ian.campb...@citrix.com
   Cc: Ian Jackson ian.jack...@eu.citrix.com
   Cc: Konrad Wilk konrad.w...@oracle.com
   ---
   This patch fixes a regression in 4.5.
  
  Ouch! That needs then to be fixed.
  
  Is this the version you would want to commit? I did test it - and it
 
 Yes.

Then Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 
  looked to do the right thing - though the xen-pciback is stuck in
  state 7. However, that is a separate issue that I believe is due to
  Xen pciback, not your patches.
  
 
 Thanks for testing.
 
 Wei.
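
The reason the existence check always fired can be sketched as follows. This is a simplified model under stated assumptions, not the libxl implementation: xenstore is represented as a flat NULL-terminated list of paths, and backend_dir_exists() is a hypothetical helper that, like the check described above, only looks at the shared backend directory rather than the per-device dev-N/key-N entries.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: returns 1 if any stored path is the directory
 * itself or lives underneath it, 0 otherwise. */
static int backend_dir_exists(const char *const *store, const char *dir)
{
    size_t n = strlen(dir);

    for (size_t i = 0; store[i] != NULL; i++) {
        /* Match "dir" exactly or "dir/..." but not "dirfoo". */
        if (strncmp(store[i], dir, n) == 0 &&
            (store[i][n] == '/' || store[i][n] == '\0'))
            return 1;
    }
    return 0;
}
```

Because every PCI device of a guest shares devid 0, the directory exists as soon as the first device is attached, so a check at that level reports "exists" for every later device as well.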

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v0 RFC 0/2] xl/libxl support for PVUSB

2014-11-19 Thread Konrad Rzeszutek Wilk
On Sun, Nov 16, 2014 at 10:36:28AM +0800, Simon Cao wrote:
 Hi,
 
 I have been working on this, but I was busy preparing for some job interviews
 over the last three months; sorry for the long delay. I will post an update
 on my progress in a few days.

OK, I put your name down for this, targeting Xen 4.6.

Thanks!
 
 Thanks!
 
 Bo Cao
 
 On Mon, Nov 10, 2014 at 4:37 PM, Chun Yan Liu cy...@suse.com wrote:
 
  Is there any progress on this work? I haven't seen a new version since this
  one. Does anyone know the status?
 
  Thanks,
  Chunyan
 
   On 8/11/2014 at 04:23 AM, in message
  1407702234-22309-1-git-send-email-caobosi...@gmail.com, Bo Cao
  caobosi...@gmail.com wrote:
   Finally I have a workable version of xl/libxl support for PVUSB. Most of
   its commands work properly now, but there are still some problems to be
   solved.
   Please take a look and give me some advice.
  
   == What has been implemented? ==
   I have implemented libxl functions for PVUSB in libxl_usb.c. It mainly
   consists of two parts: usbctrl_add/remove/list and usb_add/remove/list,
   where usbctrl denotes a USB controller into which a USB device can be
   plugged. I don't use ao_dev in libxl_device_usbctrl_add since we don't
   need to execute a hotplug script for usbctrl, and without ao_dev, adding
   the default usbctrl for a USB device is easier.
  
   For the commands that manipulate USB devices, such as xl usb-attach and
   xl usb-detach, this patch currently only supports specifying USB devices
   by their interface in sysfs. Using this interface, we can read USB device
   information through sysfs and bind/unbind the USB device. (Support for
   mapping the lsusb bus:addr to the sysfs USB interface will come later.)
  
   == What needs to be done next? ==
   There are two main problems to be solved.
  
   1. PVUSB options in the VM guest's configuration file
   The interface in the VM guest's configuration file to add a USB device is:
   usb=[interface=1-1].
   The problem is that after the default usbctrl is added, the state of
   usbctrl is 2, i.e. XenbusStateInitWait, waiting for xen-usbfront to
   connect. The xen-usbfront in the VM guest isn't loaded yet. Therefore,
   sysfs_intf_write will report an error. Does anyone have any clue how to
   solve this?
  
   2. sysfs_intf_write
   In the process of xl usb-attach domid intf=1-1, after writing 1-1 to the
   Xenstore entry, we need to bind the controller of this USB device to the
   usbback driver so that it can be used by the VM guest. For example, for
   USB device 1-1, its controller interface may be 1-1:1.0, and we write
   this value to /sys/bus/usb/drivers/usbback/bind.
   But some devices have two controllers, for example 1-1:1.0 and 1-1:1.1.
   I think this means the device has two functions, such as usbhid and
   usb-storage. So in this case, do we bind both controllers to usbback?
  
   
   There may be some errors or bugs in the code. Feel free to tell me.
  
   Cheers,
  
   - Simon
  
   ---
   CC: George Dunlap george.dun...@eu.citrix.com
   CC: Ian Jackson ian.jack...@citrix.com
   CC: Ian Campbell ian.campb...@citrix.com
   CC: Pasi Kärkkäinen pa...@iki.fi
   CC: Lars Kurth lars.ku...@citrix.com
  
  
  
  
 
 



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq.

2014-11-19 Thread Konrad Rzeszutek Wilk
On Wed, Nov 19, 2014 at 08:17:35PM +0100, Sander Eikelenboom wrote:
 
 Wednesday, November 19, 2014, 8:01:31 PM, you wrote:
 
  On Wed, Nov 19, 2014 at 07:54:39PM +0100, Sander Eikelenboom wrote:
  
  Wednesday, November 19, 2014, 6:31:39 PM, you wrote:
  
   Hey,
  
   This patch should fix the issue that Sander had seen. The full details
   are in the patch itself. Sander, if you could - please test 
   origin/staging
   with this patch to make sure it does fix the issue.
  
  
xen/drivers/passthrough/io.c | 27 +--
  
   Konrad Rzeszutek Wilk (1):
 dpci: Fix list corruption if INTx device is used and an IRQ 
   timeout is invoked.
  
1 file changed, 17 insertions(+), 10 deletions(-)
  
  
  Hi Konrad,
  
  Hmm just tested with a freshly cloned tree .. unfortunately it blew up 
  again.
  (i must admit i also re-enabled stuff i had disabled in debugging like, 
  cpuidle, cpufreq). 
 
  Argh.
 
  Could you also try the first patch the STATE_ZOMBIE one?
 
 Building now ..

(Attached and inline)

Sander mentioned to me over IRC that with the STATE_ZOMBIE patch things work 
peachy for him.

The patch in combination with the previous one adds two extra paths:

1) in raise_softirq, we delay scheduling of the dpci_pirq until STATE_ZOMBIE is
cleared.
2) dpci_softirq will pick up the cancelled dpci_pirq and then clear the
STATE_ZOMBIE.

Let's follow the case without the zombie patch and with the zombie patch:

w/o zombie:

timer_softirq_action
    pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchg the state to 0.
    pirq_dpci is still on dpci_list.
dpci_softirq
    while (!list_empty(our_list))
        list_del, but has not yet done 'entry->next = LIST_POISON1;'
    [interrupt happens]
        raise_softirq checks state which is zero. Adds pirq_dpci to the
        dpci_list.
    [interrupt is done, back to dpci_softirq]
        finishes the entry->next = LIST_POISON1;
        .. test STATE_SCHED returns true, so executes the hvm_dirq_assist.
    ends the loop, exits.
dpci_softirq
    while (!list_empty)
        list_del, but ->next already has LIST_POISON1 and we blow up.


w/ zombie:
timer_softirq_action
    pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchg the state to
    STATE_ZOMBIE.
    pirq_dpci is still on dpci_list.
dpci_softirq
    while (!list_empty(our_list))
        list_del, but has not yet done 'entry->next = LIST_POISON1;'
    [interrupt happens]
        raise_softirq checks state, it is STATE_ZOMBIE so returns.
    [interrupt is done, back to dpci_softirq]
        finishes the entry->next = LIST_POISON1;
        .. test STATE_SCHED returns true, so executes the hvm_dirq_assist.
    ends the loop, exits.
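
The two extra paths can be modelled roughly as below. This is a simplified sketch using C11 atomics, not the Xen code: the state is treated as a single value rather than the bit flags io.c uses, the names (zpirq_sketch, zsketch_raise, zsketch_cancel, zsketch_softirq, SK_*) are hypothetical, and on_list merely stands in for membership of the per-CPU dpci_list.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified states; assumption: one value instead of Xen's bit flags. */
enum sketch_state { SK_IDLE, SK_SCHED, SK_ZOMBIE };

struct zpirq_sketch {
    _Atomic int state;
    bool on_list;          /* stand-in for being linked on the dpci_list */
};

/* Interrupt path: only an idle entry may be queued. A zombie entry is
 * still linked on a list, so queueing it again is refused. */
static bool zsketch_raise(struct zpirq_sketch *p)
{
    int expected = SK_IDLE;

    if (!atomic_compare_exchange_strong(&p->state, &expected, SK_SCHED))
        return false;
    p->on_list = true;
    return true;
}

/* Timeout path: cancel a scheduled entry but mark it zombie instead of
 * idle, because it has not been unlinked yet. */
static bool zsketch_cancel(struct zpirq_sketch *p)
{
    int expected = SK_SCHED;

    return atomic_compare_exchange_strong(&p->state, &expected, SK_ZOMBIE);
}

/* Softirq path: unlink first, then release the entry. Returns true if
 * the deferred work should run, false if the entry had been cancelled. */
static bool zsketch_softirq(struct zpirq_sketch *p)
{
    int expected = SK_ZOMBIE;

    p->on_list = false;
    if (atomic_compare_exchange_strong(&p->state, &expected, SK_IDLE))
        return false;
    atomic_store(&p->state, SK_IDLE);
    return true;
}
```

The point of the intermediate state is that "cancelled" and "safe to queue again" are no longer the same thing: only the softirq, after unlinking, hands the entry back.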

So it seems that the STATE_ZOMBIE is needed, but for a different reason than
Jan initially thought of:


From c89a97f695fda245f5fcb16ddb36d3df7f6f28b9 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Date: Fri, 14 Nov 2014 12:15:26 -0500
Subject: [PATCH] dpci: Add ZOMBIE state to allow the softirq to finish with
 the dpci_pirq.

When we want to cancel an outstanding 'struct hvm_pirq_dpci' we perform
a cmpxchg on the state to set it to zero. That is OK on the teardown
paths as it is guaranteed that the do_IRQ action handler has been removed.
Hence no more interrupts can be scheduled. But with the introduction
of "dpci: Fix list corruption if INTx device is used and an IRQ timeout is
invoked." we now utilize the pt_pirq_softirq_cancel when we want to cancel
outstanding operations. However once we cancel them the do_IRQ is
free to schedule them back in - even if said 'struct hvm_pirq_dpci'
is still on the dpci_list.

The code base before this patch could follow this race:

\- timer_softirq_action
    pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchg the state to 0.
    pirq_dpci is still on dpci_list.
\- dpci_softirq
    while (!list_empty(our_list))
        list_del, but has not yet done 'entry->next = LIST_POISON1;'
    [interrupt happens]
        raise_softirq checks state which is zero. Adds pirq_dpci to the
        dpci_list.
    [interrupt is done, back to dpci_softirq]
        finishes the entry->next = LIST_POISON1;
        .. test STATE_SCHED returns true, so executes the hvm_dirq_assist.
    ends the loop, exits.

\- dpci_softirq
    while (!list_empty)
        list_del, but ->next already has LIST_POISON1 and we blow up.

This patch in combination adds two extra paths:

1) in raise_softirq, we delay scheduling of the dpci_pirq until STATE_ZOMBIE is
cleared.
2) dpci_softirq will pick up the cancelled dpci_pirq and then clear the
STATE_ZOMBIE.

Using the example above the code-paths would be now:
\- timer_softirq_action
    pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchg the state to
    STATE_ZOMBIE.
    pirq_dpci is still on dpci_list.
\- dpci_softirq
    while (!list_empty(our_list))

[Xen-devel] [for xen-4.5 PATCH v2] Fix list corruption in dpci_softirq.

2014-11-19 Thread Konrad Rzeszutek Wilk
Hey,

Attached are two patches that fix the dpci_softirq list corruption
that Sander was observing.


 xen/drivers/passthrough/io.c | 55 +++-
 1 file changed, 39 insertions(+), 16 deletions(-)

Konrad Rzeszutek Wilk (2):
  dpci: Fix list corruption if INTx device is used and an IRQ timeout is 
invoked.
  dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [for-xen-4.5 PATCH v2 1/2] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked.

2014-11-19 Thread Konrad Rzeszutek Wilk
If we pass in INTx type devices to a guest on an over-subscribed
machine - and in an over-worked guest - we can cause the
pirq_dpci->softirq_list to become corrupted.

The reason for this is that the 'pt_irq_guest_eoi' ends up
setting the 'state' to zero value. However the 'state' value
(STATE_SCHED, STATE_RUN) is used to communicate between
'raise_softirq_for' and 'dpci_softirq' to determine whether the
'struct hvm_pirq_dpci' can be re-scheduled. We are ignoring the
teardown path for simplicity for right now. The 'pt_irq_guest_eoi' was
not adhering to the proper dialogue and was not using locked cmpxchg or
test_bit operations and ended up setting 'state' to zero. That
meant 'raise_softirq_for' was free to schedule it while the
'struct hvm_pirq_dpci' was still on a per-cpu list.
The end result was list_del being called twice and the second call
corrupting the per-cpu list.

For this to occur one of the CPUs must be in the idle loop executing
softirqs and the interrupt handler in the guest must not
respond to the pending interrupt within 8ms, and we must receive
another interrupt for this device on another CPU.

CPU0:                                  CPU1:

timer_softirq_action
 \- pt_irq_time_out
     state = 0;                        do_IRQ
 [out of timer code, the                raise_softirq
 pirq_dpci is on the CPU0 dpci_list]    [adds the pirq_dpci to CPU1
                                         dpci_list as state == 0]

softirq_dpci:                          softirq_dpci:
    list_del
    [list entries are poisoned]
                                           list_del <= BOOM

The fix is simple - enroll 'pt_irq_guest_eoi' to use the locked
semantics for 'state'. We piggyback on pt_pirq_softirq_cancel (was
pt_pirq_softirq_reset) to use cmpxchg. We also expand said function
to reset the '->dom' only on the teardown paths - but not on the
timeouts.
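
As a rough model of the "locked semantics" mentioned above, the sketch below uses C11 atomics instead of Xen's own bit operations. It is an illustration under stated assumptions, not the code in io.c: pirq_sketch, sketch_raise and sketch_cancel are hypothetical names, and the flag values are chosen only for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption: flag values picked for this sketch only. */
enum { SK_STATE_SCHED = 1u << 0, SK_STATE_RUN = 1u << 1 };

struct pirq_sketch {
    atomic_uint state;
};

/* Scheduling side: atomically set SCHED and report whether this caller
 * was the one that set it. Only that caller may queue the entry. */
static bool sketch_raise(struct pirq_sketch *p)
{
    unsigned int old = atomic_fetch_or(&p->state, SK_STATE_SCHED);

    return !(old & SK_STATE_SCHED);
}

/* Cancel side: clear the state only if the entry is scheduled and not
 * running. A plain "state = 0" store would also wipe STATE_RUN and race
 * with the scheduling side. */
static bool sketch_cancel(struct pirq_sketch *p)
{
    unsigned int expected = SK_STATE_SCHED;

    return atomic_compare_exchange_strong(&p->state, &expected, 0);
}
```

The compare-and-exchange makes the cancel a no-op when the softirq is already running the entry, which is the case the plain store got wrong.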

Reported-and-Tested-by: Sander Eikelenboom li...@eikelenboom.it
Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 xen/drivers/passthrough/io.c | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index efc66dc..2039d31 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -57,7 +57,7 @@ enum {
  * This can be called multiple times, but the softirq is only raised once.
  * That is until the STATE_SCHED state has been cleared. The state can be
  * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'),
- * or by 'pt_pirq_softirq_reset' (which will try to clear the state before
+ * or by 'pt_pirq_softirq_cancel' (which will try to clear the state before
  * the softirq had a chance to run).
  */
 static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
@@ -97,13 +97,15 @@ bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci 
*pirq_dpci)
 }
 
 /*
- * Reset the pirq_dpci->dom parameter to NULL.
+ * Cancels an outstanding pirq_dpci (if scheduled). Also if clear is set,
+ * reset pirq_dpci->dom parameter to NULL (used for teardown).
  *
  * This function checks the different states to make sure it can do it
  * at the right time. If it unschedules the 'hvm_dirq_assist' from running
  * it also refcounts (which is what the softirq would have done) properly.
  */
-static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
+static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci,
+   unsigned int clear)
 {
 struct domain *d = pirq_dpci->dom;
 
@@ -125,8 +127,13 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci 
*pirq_dpci)
  * to a shortcut the 'dpci_softirq' implements. It stashes the 'dom'
  * in local variable before it sets STATE_RUN - and therefore will not
  * dereference '->dom' which would crash.
+ *
+ * However, if this is called from 'pt_irq_time_out' we do not want to
+ * clear the '->dom' as we can re-use the 'pirq_dpci' after that and
+ * need '->dom'.
  */
-pirq_dpci->dom = NULL;
+if ( clear )
+pirq_dpci->dom = NULL;
 break;
 }
 }
@@ -142,7 +149,7 @@ static int pt_irq_guest_eoi(struct domain *d, struct 
hvm_pirq_dpci *pirq_dpci,
 if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT,
   &pirq_dpci->flags) )
 {
-pirq_dpci-state = 0;
+pt_pirq_softirq_cancel(pirq_dpci, 0 /* keep dom */);
 pirq_dpci-pending = 0;
 pirq_guest_eoi(dpci_pirq(pirq_dpci));
 }
@@ -285,7 +292,7 @@ int pt_irq_create_bind(
  * to be scheduled but we must deal with the one that may 
be
  * in the queue.
  */
-pt_pirq_softirq_reset(pirq_dpci);
+pt_pirq_softirq_cancel(pirq_dpci, 1 /* reset dom */);
 }
 }
 if ( unlikely(rc) )
@@ -536,9 

Re: [Xen-devel] [PATCH v9 05/13] arm: introduce is_device_dma_coherent

2014-11-19 Thread Russell King - ARM Linux
On Tue, Nov 18, 2014 at 04:49:21PM +, Stefano Stabellini wrote:
 ping?

Sending something which wants my attention _To:_ me is always a good idea :)

The patch is fine in itself, but I have a niggle about the
is_device_dma_coherent() - provided this is only used in architecture
specific code, that should be fine.  It could probably do with a comment
to that effect in an attempt to discourage drivers using it (thereby
becoming less portable to other architectures.)

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-19 Thread Steve Freitas

On 11/17/2014 23:54, Jan Beulich wrote:

On 17.11.14 at 20:21, sfl...@ihonk.com wrote:

Okay, I did a bisection and was not able to correlate the above error
message with the problem I'm seeing. Not saying it's not related, but I
had plenty of successful test runs in the presence of that error.

Took me about a week (sometimes it takes as much as 6 hours to produce
the error), but bisect narrowed it down to this commit:

http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=9a727a813e9b25003e433b3dc3fa47e621f9e238

What do you think?

Thanks for narrowing this, even if this change didn't show any other
bad effects so far (and it's been widely tested by now), and even if
problems here would generally be expected to surface independent
of the use of PCI pass-through. But a hang (rather than a crash)
would indeed be the most natural result of something being wrong
here. To double check the result, could you, in an up-to-date tree,
simply make x86's arch_skip_send_event_check() return 0
unconditionally?


Made this change and the host was happy.


  Plus, without said adjustment, first just disable the
MWAIT CPU idle driver (mwait-idle=0) and then, if that didn't make
a difference, use of C states altogether (cpuidle=0). If any of this
does make a difference, limiting use of C states without fully
excluding their use may need to be the next step.


Will do this next.


Another thing - now that serial logging appears to be working for
you, did you try whether the host, once hung, still reacts to serial
input (perhaps force input to go to Xen right at boot via the
conswitch= option)? If so, 'd' debug-key output would likely be
the piece of most interest.


Here you go. Performed with a checkout of 9a727a81 (because it was 
handy), let me know if you'd rather see the results from 4.5-rc2 or any 
other Xen debugging info:


(XEN) 'd' pressed - dumping registers
(XEN)
(XEN) *** Dumping CPU0 guest state (d1v2): ***
(XEN) [ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]
(XEN) CPU:0
(XEN) RIP:0010:[f8000281e2c1]
(XEN) RFLAGS: 0002   CONTEXT: hvm guest
(XEN) rax: 3acd4939f3e7   rbx: 3acd493a0cce   rcx: 
(XEN) rdx: 3acd   rsi:    rdi: 0057
(XEN) rbp: 645c   rsp: f880033edf90   r8: f880033edff0
(XEN) r9:     r10: f880033ee040   r11: 000342934690
(XEN) r12: f880033ee3c8   r13: 1000   r14: 
(XEN) r15: 0058   cr0: 80050031   cr4: 06f8
(XEN) cr3: 66aca000   cr2: f9800268
(XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010
(XEN)
(XEN) *** Dumping CPU1 host state: ***
(XEN) [ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]
(XEN) CPU:1
(XEN) RIP:e008:[82d08012a9a1] _spin_unlock_irq+0x30/0x31
(XEN) RFLAGS: 0246   CONTEXT: hypervisor
(XEN) rax:    rbx: 8300a943e000   rcx: 0001
(XEN) rdx: 830c3dc7   rsi: 0004   rdi: 830c3dc7a088
(XEN) rbp: 830c3dc77ec8   rsp: 830c3dc77e40   r8: 830c3dc7a0a0
(XEN) r9:     r10: f88002fd82a0   r11: f88002fe2d70
(XEN) r12: 151cc8b48756   r13: 8300a943e000   r14: 830c3dc7a088
(XEN) r15: 01c9c380   cr0: 8005003b   cr4: 26f0
(XEN) cr3: 000c18962000   cr2: ff331aa0
(XEN) ds:    es:    fs:    gs:    ss:    cs: e008
(XEN) Xen stack trace from rsp=830c3dc77e40:
(XEN)82d080126ec5 82d080321280 830c3dc7a0a0 000100c77e78
(XEN)830c3dc7a080 82d0801b5277 8300a943e000 f88002fe2d70
(XEN)8300a943e000 01c9c380 82d0801e0f00 830c3dc77f08
(XEN)82d0802f8080 82d0802f8000  830c3dc7
(XEN)0001 830c3dc77ef8 82d08012a1b3 8300a943e000
(XEN)f88002fe2d70 36d08fbeebe8 000f 830c3dc77f08
(XEN)82d08012a20b 000f 82d0801e3d2a 0001
(XEN)000f 36d08fbeebe8 f88002fe2d70 000f
(XEN)f88002fd8180 f88002fe2d70 f88002fd82a0 34711df61755
(XEN)f88002fd82a0 0002 f88002fd81c0 0400
(XEN) f88002fe2eb0 beefbeef f8000298520c
(XEN)00bfbeef 0046 f88002fe2c20 beef
(XEN)c2c2c2c2c2c2beef c2c2c2c2c2c2beef c2c2c2c2c2c2beef c2c2c2c2c2c2beef
(XEN)c2c2c2c20001 8300a943e000 003bbd958e00 c2c2c2c2c2c2c2c2
(XEN) Xen call trace:
(XEN)[82d08012a9a1] _spin_unlock_irq+0x30/0x31
(XEN)[82d08012a1b3] __do_softirq+0x81/0x8c
(XEN)[82d08012a20b] do_softirq+0x13/0x15
(XEN)[82d0801e3d2a] vmx_asm_do_vmentry+0x2a/0x45
(XEN)
(XEN) *** Dumping CPU1 guest state (d1v5): ***
(XEN) [ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]
(XEN) CPU:  

Re: [Xen-devel] [v7][RFC][PATCH 06/13] hvmloader/ram: check if guest memory is out of reserved device memory maps

2014-11-19 Thread Jan Beulich
 On 19.11.14 at 02:26, tiejun.c...@intel.com wrote:
  So without looking up devices[i], how can we call func() for each sbdf as
 you mentioned?

 You've got both rmrr and bdf in the body of for_each_rmrr_device().
 After all - as I said - you just open-coded it.

 
 Yeah, so change this again,
 
 int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
 {
     struct acpi_rmrr_unit *rmrr;
     int rc = 0;
     unsigned int i;
     u16 bdf;
 
     for_each_rmrr_device ( rmrr, bdf, i )
     {
         rc = func(PFN_DOWN(rmrr->base_address),
                   PFN_UP(rmrr->end_address) -
                     PFN_DOWN(rmrr->base_address),
                   PCI_SBDF(rmrr->segment, bdf),
                   ctxt);
         /* Hit this entry so just go next. */
         if ( rc == 1 )
             i = rmrr->scope.devices_cnt;
         else if ( rc < 0 )
             return rc;
     }
 
     return rc;
 }

Better. Another improvement would be to make it not depend on the
internal workings of for_each_rmrr_device()... And in any case you
should not special case 1 - just return when rc is negative and skip
the rest of the current RMRR when it's positive. And of course make
the function's final return value predictable.
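
One possible shape for such a loop is sketched below. It is only an illustration of the three points (return on negative, skip the rest of the RMRR on positive, fixed final return value): rmrr_sketch, grdm_cb, walk_reserved, count_ctx and count_cb are hypothetical names, and the explicit nested loops replace for_each_rmrr_device() so that nothing depends on that macro's internals.

```c
#include <stddef.h>

/* Hypothetical, simplified view of one RMRR and the devices in its scope. */
struct rmrr_sketch {
    unsigned long base_pfn;
    unsigned long nr_pages;
    const unsigned short *bdfs;
    size_t nr_bdfs;
};

/* Hypothetical callback type: <0 error, 0 continue, >0 done with this RMRR. */
typedef int grdm_cb(unsigned long start, unsigned long nr,
                    unsigned int sbdf, void *ctxt);

static int walk_reserved(const struct rmrr_sketch *rmrrs, size_t nr_rmrrs,
                         grdm_cb *func, void *ctxt)
{
    for (size_t r = 0; r < nr_rmrrs; r++) {
        for (size_t d = 0; d < rmrrs[r].nr_bdfs; d++) {
            int rc = func(rmrrs[r].base_pfn, rmrrs[r].nr_pages,
                          rmrrs[r].bdfs[d], ctxt);

            if (rc < 0)
                return rc;   /* error: propagate immediately */
            if (rc > 0)
                break;       /* any positive value: skip rest of this RMRR */
        }
    }
    return 0;                /* predictable result on success */
}

/* Example callback for experimenting: counts calls, returns a fixed value. */
struct count_ctx {
    int calls;
    int ret;
};

static int count_cb(unsigned long start, unsigned long nr,
                    unsigned int sbdf, void *ctxt)
{
    struct count_ctx *c = ctxt;

    (void)start; (void)nr; (void)sbdf;
    c->calls++;
    return c->ret;
}
```

With two RMRRs holding two and one devices, a callback that always returns 0 is invoked three times, one that returns a positive value twice, and one that returns an error once.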

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v7][RFC][PATCH 06/13] hvmloader/ram: check if guest memory is out of reserved device memory maps

2014-11-19 Thread Tian, Kevin
 From: Tian, Kevin
 Sent: Wednesday, November 19, 2014 4:18 PM
 
  From: Jan Beulich [mailto:jbeul...@suse.com]
  Sent: Wednesday, November 12, 2014 5:57 PM
 
   On 12.11.14 at 10:13, tiejun.c...@intel.com wrote:
   On 2014/11/12 17:02, Jan Beulich wrote:
   On 12.11.14 at 09:45, tiejun.c...@intel.com wrote:
   #2 flags field in each specific device of new domctl would control
   whether this device need to check/reserve its own RMRR range. But
 its
   not dependent on current device assignment domctl, so the user can
  use
   them to control which devices need to work as hotplug later,
 separately.
  
   And this could be left as a second step, in order for what needs to
   be done now to not get more complicated that necessary.
  
  
   Do you mean currently we still rely on the device assignment domctl to
   provide SBDF? So looks nothing should be changed in our policy.
  
   I can't connect your question to what I said. What I tried to tell you
  
   Something is misunderstanding to me.
  
   was that I don't currently see a need to make this overly complicated:
   Having the option to punch holes for all devices and (by default)
   dealing with just the devices assigned at boot may be sufficient as a
   first step. Yet (repeating just to avoid any misunderstanding) that
   makes things easier only if we decide to require device assignment to
   happen before memory getting populated (since in that case there's
  
   Here what do you mean, 'if we decide to require device assignment to
   happen before memory getting populated'?
  
   Because -quote-
   
   In the present the device assignment is always after memory population.
   And I also mentioned previously I double checked this sequence with 
   printk.
   
  
   Or do you already plan, or have you decided, to change this sequence?
 
  So it is now the 3rd time that I'm telling you that part of your
  decision making as to which route to follow should be to
  re-consider whether the current sequence of operations shouldn't
  be changed. Please also consult with the VT-d maintainers (hint to
  them: participating in this discussion publicly would be really nice)
  on _all_ decisions to be made here.
 
 

Yang and I discussed this. We understand your point about avoiding
a new interface if we can leverage existing code.
However it's not a trivial effort to move device assignment before
populating p2m, and there is no other benefit of doing so except
for this purpose. So we would not suggest going this way.

The current option sounds like a reasonable one, i.e. passing a list of BDFs
assigned to this VM before populating p2m, and then having the
hypervisor filter out reserved regions associated with those
BDFs. This way libxc teaches Xen to create reserved regions once,
and then later the filtered info is returned upon query.
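
The filtering step could look roughly like the sketch below. It is a standalone illustration, not the proposed hypercall or libxc interface: rdm_region_sketch, sbdf_is_assigned and filter_reserved_regions are hypothetical names, and "filter out" is read here as "select the regions that belong to the assigned BDFs".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record: one reserved region and the device it belongs to. */
struct rdm_region_sketch {
    unsigned long start_pfn;
    unsigned long nr_pages;
    unsigned int sbdf;
};

/* True if sbdf appears in the list of devices assigned to the VM. */
static bool sbdf_is_assigned(const unsigned int *assigned, size_t nr,
                             unsigned int sbdf)
{
    for (size_t i = 0; i < nr; i++)
        if (assigned[i] == sbdf)
            return true;
    return false;
}

/* Copy into 'out' the regions owned by assigned devices, up to out_cap
 * entries. Returns the number of regions copied. */
static size_t filter_reserved_regions(const struct rdm_region_sketch *all,
                                      size_t nr_all,
                                      const unsigned int *assigned,
                                      size_t nr_assigned,
                                      struct rdm_region_sketch *out,
                                      size_t out_cap)
{
    size_t n = 0;

    for (size_t i = 0; i < nr_all && n < out_cap; i++)
        if (sbdf_is_assigned(assigned, nr_assigned, all[i].sbdf))
            out[n++] = all[i];
    return n;
}
```

Doing the selection once, when the BDF list is passed in, means a later query only has to return the stored result.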

The limitation of wasted memory due to conflicts can be
mitigated, and we think a further enhancement can be made
later in libxc so that, when populating p2m, the reserved regions
are skipped explicitly at the initial p2m creation phase; then
there would be no waste at all. But this optimization takes some
time and can be built incrementally on the current patch and interface,
after the 4.5 release. For now let's focus on correctness first.

If you agree, Tiejun will move forward and send another series for 4.5. So
far lots of open issues have been closed with your help, but it also means
the original v7 needs a serious update (the latest code is in the in-depth
discussion on the list).

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel