date:20150227

Re: [Xen-devel] Poor network performance between DomU with multiqueue support

2015-02-27 Thread openlui

At 2015-02-27 18:59:52, Wei Liu wei.l...@citrix.com wrote:
Cc'ing David (XenServer kernel maintainer)

On Fri, Feb 27, 2015 at 05:21:11PM +0800, openlui wrote:
 On Mon, Dec 08, 2014 at 01:08:18PM +, Zhangleiqiang (Trump) wrote:
   On Mon, Dec 08, 2014 at 06:44:26AM +, Zhangleiqiang (Trump) wrote:
 On Fri, Dec 05, 2014 at 01:17:16AM +, Zhangleiqiang (Trump) wrote:
 [...]
   I think that's expected, because guest RX data path still 
   uses grant_copy while guest TX uses grant_map to do zero-copy 
   transmit.
 
  As far as I know, there are three main grant-related operations used
  in the split device model: grant mapping, grant transfer and grant copy.
  Grant transfer is not used now, and grant mapping and grant transfer
  both involve TLB refresh work for the hypervisor, am I right? Or does
  only grant transfer have this overhead?

 Transfer is not used so I can't tell. Grant unmap causes TLB flush.

 I saw in an email the other day that XenServer folks have some planned
 improvements to avoid the TLB flush in Xen, to be upstreamed in the 4.6
 window. I can't say for sure it will get upstreamed, as I don't work on
 that.

  Does grant copy surely have more overhead than grant mapping?
 

 At the very least the zero-copy TX path is faster than previous 
 copying path.

 But speaking of the micro operation I'm not sure.

 There was once a persistent-map prototype of netback / netfront that
 establishes a memory pool between FE and BE and then uses memcpy to
 copy data. Unfortunately that prototype was not done right, so the
 result was not good.
   
The newest mail about persistent grants I can find was sent on 16 Nov 2012
(http://lists.xen.org/archives/html/xen-devel/2012-11/msg00832.html).
Why was it not done right, and why was it not merged upstream?
   
   AFAICT there's one more memcpy than necessary, i.e. frontend memcpy 
   data into the pool then backend memcpy data out of the pool, when 
   backend should be able to use the page in pool directly.
  
  Memcpy should be cheaper than grant_copy because the former does not
  need the hypercall, which causes a VM exit to the Xen hypervisor, am I
  right? For the RX path, using memcpy based on a persistent grant table
  may give higher performance than using grant copy as now.
 
 In theory yes. Unfortunately nobody has benchmarked that properly.
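
A toy model may help make the copy-count trade-off in this exchange concrete. It is a minimal sketch under stated assumptions, not netback/netfront code: it only counts payload copies and the bytes each copy reads and writes, and flags whether the path needs a grant-table hypercall; hypercall, mapping and TLB-flush costs are not modelled. All function and struct names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy accounting for the to-guest (RX) data path. Hypothetical model:
 * each copy of n bytes is counted as n bytes read plus n bytes written. */
struct rx_path_cost {
    size_t copies;         /* payload copies performed */
    size_t bytes_moved;    /* bytes read + bytes written by those copies */
    bool needs_hypercall;  /* true if a grant-table hypercall is involved */
};

static void counted_copy(struct rx_path_cost *c, void *dst, const void *src, size_t n)
{
    assert(dst != src);    /* memcpy needs non-overlapping buffers */
    memcpy(dst, src, n);
    c->copies++;
    c->bytes_moved += 2 * n;
}

/* Grant copy: the hypervisor copies the packet from the backend buffer
 * straight into the page the guest granted (one copy, via a hypercall). */
static struct rx_path_cost rx_grant_copy(const char *pkt, char *guest_page, size_t n)
{
    struct rx_path_cost c = { 0, 0, true };
    counted_copy(&c, guest_page, pkt, n);
    return c;
}

/* Persistent-grant style: the backend copies into a pool page that stays
 * mapped, then the frontend copies out of the pool into its own buffer
 * (two copies, no hypercall). */
static struct rx_path_cost rx_persistent(const char *pkt, char *pool_page,
                                         char *guest_buf, size_t n)
{
    struct rx_path_cost c = { 0, 0, false };
    counted_copy(&c, pool_page, pkt, n);       /* backend: into the pool */
    counted_copy(&c, guest_buf, pool_page, n); /* frontend: out of the pool */
    return c;
}
```

For a 1400-byte packet the grant-copy path moves 2800 bytes and the persistent path 5600, which is the memory-bandwidth cost of the extra memcpy; what the model leaves out (hypercall and mapping cost) is what a proper benchmark would have to measure.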

 I have done some RX performance testing using the persistent grant
 method and the upstream method (3.17.4 branch), and the results show
 that the persistent grant method does have higher performance than the
 upstream method (from 3.5Gbps to about 6Gbps). I also find that the
 persistent grant mechanism is already used in blkfront/blkback, so I am
 wondering why there are no efforts to replace grant copy with persistent
 grants now, at least in the RX path. Are there other disadvantages in
 the persistent grant method which stop us from using it?
 

I've seen numbers better than 6Gbps. See upstream changeset
1650d5455bd2dc6b5ee134bd6fc1a3236c266b5b.
Thanks, Wei. The throughput I mentioned (3.5Gbps and 6Gbps) is for UDP with
1400-byte packets; I think the result based on
1650d5455bd2dc6b5ee134bd6fc1a3236c266b5b is for TCP.

Persistent grant is not a silver bullet. There is an email thread on the
list discussing whether it should be removed from the block driver.

I have tried to look for the thread but found no detailed info. Could you
give me some keywords to find the thread? Thanks.


XenServer folks have been working on improving network performance. It's
my understanding that they chose a different route than persistent
grants. David might have more insight.


Wei.

 PS. I used pkt-gen to send packets from dom0 to a domU running on
 another dom0; the CPUs of both dom0s are Intel E5640 2.4GHz, and the two
 dom0s are connected with a 10GE NIC.
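
To relate the UDP figures quoted in this thread to per-packet cost, the throughput can be converted into a packet rate. A minimal sketch, assuming every packet carries 1400 bytes and ignoring header and framing overhead (so it describes the rate as measured, not the wire rate); the function names are illustrative.

```c
#include <assert.h>

/* Packets per second needed to carry `gbps` gigabits per second when
 * each packet holds `bytes` bytes (headers and framing ignored). */
static double packets_per_second(double gbps, double bytes)
{
    assert(bytes > 0.0);
    return gbps * 1e9 / (bytes * 8.0);
}

/* Average time available per packet at that rate, in nanoseconds. */
static double ns_per_packet(double gbps, double bytes)
{
    return 1e9 / packets_per_second(gbps, bytes);
}
```

Under these assumptions 3.5Gbps is 312,500 packets/s (3.2 µs per packet) and 6Gbps is about 535,700 packets/s (about 1.87 µs per packet), so at the higher rate the whole RX path, grant or copy operations included, has to fit in under 2 µs per packet on average.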
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [qemu-upstream-unstable test] 35474: regressions - FAIL

2015-02-27 Thread xen . org

flight 35474 qemu-upstream-unstable real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/35474/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-rhel6hvm-amd 6 leak-check/basis(6) running in 34247 [st=running!]
 test-amd64-amd64-xl-winxpsp3 10 guest-localmigrate fail in 34247 REGR. vs. 33488

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 7 debian-hvm-install fail pass in 35312
 test-amd64-i386-pair 17 guest-migrate/src_host/dst_host fail pass in 34247
 test-amd64-amd64-xl-winxpsp3  7 windows-install fail pass in 34319
 test-amd64-i386-freebsd10-i386 11 guest-localmigrate fail in 34247 pass in 35474
 test-amd64-i386-xl-qemuu-debianhvm-amd64 10 guest-localmigrate fail in 34247 pass in 35474
 test-amd64-i386-freebsd10-amd64 11 guest-localmigrate fail in 34247 pass in 35474
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 10 guest-localmigrate fail in 34247 pass in 35474
 test-amd64-i386-xl-qemuu-ovmf-amd64 10 guest-localmigrate fail in 34247 pass in 35474
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 10 guest-localmigrate fail in 34247 pass in 35312
 test-amd64-amd64-xl-qemuu-win7-amd64 10 guest-localmigrate fail in 34247 pass in 35474
 test-amd64-i386-xl-win7-amd64 10 guest-localmigrate fail in 34247 pass in 35474
 test-amd64-amd64-xl-win7-amd64 10 guest-localmigrate fail in 34247 pass in 35474
 test-amd64-amd64-xl-qemuu-ovmf-amd64 10 guest-localmigrate fail in 34247 pass in 35474
 test-amd64-i386-xl-winxpsp3-vcpus1 10 guest-localmigrate fail in 34247 pass in 35474
 test-amd64-i386-xl-winxpsp3  10 guest-localmigrate fail in 34247 pass in 35474
 test-amd64-i386-xl-qemuu-winxpsp3 10 guest-localmigrate fail in 34247 pass in 35474
 test-amd64-amd64-xl-qemuu-winxpsp3 10 guest-localmigrate fail in 34247 pass in 35474
 test-amd64-i386-xl-qemuu-win7-amd64 10 guest-localmigrate fail in 34247 pass in 35474

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-sedf-pin 13 guest-destroy   fail in 35312 blocked in 33488
 test-amd64-amd64-libvirt  9 guest-start   fail in 35312 like 33488
 test-armhf-armhf-xl-multivcpu 14 leak-check/check fail in 34247 blocked in 33488
 test-armhf-armhf-xl-credit2   5 xen-boot fail in 34247 blocked in 33488
 test-armhf-armhf-libvirt 13 guest-destroy   fail in 34247 blocked in 33488
 test-amd64-i386-libvirt   9 guest-start   fail in 34247 like 33488

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-intel  9 guest-start  fail  never pass
 test-armhf-armhf-xl-multivcpu 10 migrate-support-check fail  never pass
 test-armhf-armhf-xl  10 migrate-support-check fail   never pass
 test-armhf-armhf-xl-sedf-pin 10 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  10 migrate-support-check fail   never pass
 test-armhf-armhf-xl-sedf 10 migrate-support-check fail   never pass
 test-armhf-armhf-libvirt 10 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt 10 migrate-support-check fail   never pass
 test-amd64-amd64-xl-pvh-amd   9 guest-start  fail   never pass
 test-armhf-armhf-xl-midway   10 migrate-support-check fail   never pass
 test-amd64-amd64-xl-pcipt-intel  9 guest-start fail never pass
 test-amd64-i386-libvirt  10 migrate-support-check fail   never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemut-winxpsp3 14 guest-stop fail never pass
 test-amd64-i386-xl-winxpsp3  14 guest-stop   fail   never pass
 test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stop fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass

version targeted for testing:
 qemuu be11dc1e9172f91e798a8f831b30c14b479e08e8
baseline version:
 qemuu 0d37748342e29854db7c9f6c47d7f58c6cfba6b2


People who touched revisions under test:
  Don Slutz dsl...@verizon.com
  Paul Durrant

[Xen-devel] [xen-4.5-testing test] 35450: trouble: broken/fail/pass

2015-02-27 Thread xen . org

flight 35450 xen-4.5-testing real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/35450/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf  3 host-install(3) broken in 35097 REGR. vs. 34731

Tests which are failing intermittently (not blocking):
 test-amd64-i386-pair  4 host-install/dst_host(4)  broken pass in 35097
 test-amd64-i386-libvirt   5 xen-boot   fail in 35097 pass in 35450
 test-amd64-amd64-rumpuserxen-amd64  5 xen-boot fail in 35097 pass in 35450
 test-amd64-i386-qemut-rhel6hvm-intel  5 xen-boot   fail in 35097 pass in 35450
 test-amd64-i386-qemuu-rhel6hvm-amd  5 xen-boot fail in 35097 pass in 35450
 test-amd64-i386-xl   5 xen-boot   fail in 35097 pass in 35450
 test-amd64-amd64-xl-sedf  5 xen-boot   fail in 35097 pass in 35450
 test-amd64-amd64-xl   5 xen-boot   fail in 35097 pass in 35450
 test-amd64-i386-freebsd10-i386  5 xen-boot fail in 35097 pass in 35450
 test-amd64-i386-rumpuserxen-i386  5 xen-boot   fail in 35097 pass in 35450
 test-amd64-amd64-xl-qemuu-win7-amd64  5 xen-boot   fail in 35097 pass in 35450
 test-amd64-i386-xl-qemut-debianhvm-amd64 5 xen-boot fail in 35097 pass in 35450
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 5 xen-boot fail in 35097 pass in 35450
 test-amd64-i386-xl-qemuu-winxpsp3  5 xen-boot  fail in 35097 pass in 35450
 test-amd64-i386-xl-winxpsp3-vcpus1  5 xen-boot fail in 35097 pass in 35450
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 5 xen-boot fail in 35097 pass in 35450
 test-amd64-amd64-xl-qemut-debianhvm-amd64 5 xen-boot fail in 35097 pass in 35450
 test-amd64-amd64-xl-qemuu-ovmf-amd64  5 xen-boot   fail in 35097 pass in 35450
 test-amd64-i386-xl-qemut-winxpsp3  5 xen-boot  fail in 35097 pass in 35450

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-libvirt  5 xen-boot  fail in 35097 like 34638

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-intel  9 guest-start  fail  never pass
 test-armhf-armhf-xl-sedf 10 migrate-support-check fail   never pass
 test-armhf-armhf-libvirt 10 migrate-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 10 migrate-support-check fail  never pass
 test-armhf-armhf-xl  10 migrate-support-check fail   never pass
 test-amd64-amd64-xl-pvh-amd   9 guest-start  fail   never pass
 test-armhf-armhf-xl-midway   10 migrate-support-check fail   never pass
 test-armhf-armhf-xl-sedf-pin 10 migrate-support-check fail   never pass
 test-amd64-i386-libvirt  10 migrate-support-check fail   never pass
 test-amd64-amd64-rumpuserxen-amd64 13 rumpuserxen-demo-xenstorels/xenstorels.repeat fail never pass
 test-amd64-amd64-libvirt 10 migrate-support-check fail   never pass
 test-amd64-amd64-xl-pcipt-intel  9 guest-start fail never pass
 test-armhf-armhf-xl-credit2   5 xen-boot fail   never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-winxpsp3  14 guest-stop   fail   never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stop fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-qemut-winxpsp3 14 guest-stop fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop   fail never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop   fail   never pass
 test-armhf-armhf-xl-sedf  1 build-check(1) blocked in 35097 n/a
 test-armhf-armhf-libvirt  1 build-check(1) blocked in 35097 n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked in 35097 n/a
 test-armhf-armhf-xl   1 build-check(1) blocked in 35097 n/a
 test-armhf-armhf-xl-midway 1 build-check(1) blocked in 35097 n/a
 test-armhf-armhf-xl-sedf-pin  1 build-check(1) blocked in 35097 n/a
 test-armhf-armhf-xl-credit2   1 build-check(1) blocked in 35097 n/a
 build-armhf-libvirt   1 build-check(1) blocked in 35097 n/a

Re: [Xen-devel] Poor network performance between DomU with multiqueue support

2015-02-27 Thread openlui

At 2015-02-27 19:30:20, David Vrabel david.vra...@citrix.com wrote:
On 27/02/15 10:59, Wei Liu wrote:
 
 Persistent grant is not silver bullet. There is email thread on the
 list discussing whether it should be removed in block driver.

Persistent grants for to-guest network traffic is a flawed idea.  It
either requires:

a) the backend to memcpy into the mapped grant /and/ the frontend to
memcpy out of the persistently mapped pool.  This is clearly going to be
worse for memory bandwidth than a single grant copy.


Yes, the persistent grant method does use more DomU CPU than the grant copy method.


However, while the persistent method does have one more memcpy operation
than grant copy, it needs two fewer mmap operations than grant copy and no
hypercall either. I have examined the code for grant copy: it needs to map
the memory of the source and destination domains into the hypervisor, then
memcpy the data from source to destination, so more CPU is used by the
hypervisor instead of DomU.


or

b) the backend to accumulate more and more mappings of guest memory,
which is bad for security and it uses too many grant and map track
resources hence it does not scale to many VIFs.

I find that the persistent grant patch has an upper limit on the amount of
guest memory that can be mapped by each queue of a VIF. If I understand
correctly, the limit seems to be the VIF's ring size, so the amount does not
seem high.
In my benchmark, at least for a single UDP flow, the persistent grant method
has higher throughput than the grant copy method.
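
The two positions here can be put into numbers with a back-of-the-envelope bound. A minimal sketch, assuming each ring slot of each queue can pin at most one persistently granted page, with illustrative values of 256 slots per ring and 4 KiB pages (treat both as assumptions, not values taken from the patch); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on guest memory a backend keeps mapped when every ring slot
 * of every queue of every VIF holds one persistently granted page. */
static uint64_t max_mapped_bytes(unsigned vifs, unsigned queues_per_vif,
                                 unsigned ring_slots, unsigned page_size)
{
    assert(page_size > 0);
    return (uint64_t)vifs * queues_per_vif * ring_slots * page_size;
}

/* Grant (and map-track) entries consumed under the same assumption:
 * one per persistently mapped page. */
static uint64_t max_mapped_grants(unsigned vifs, unsigned queues_per_vif,
                                  unsigned ring_slots)
{
    return (uint64_t)vifs * queues_per_vif * ring_slots;
}
```

With one VIF and one queue the bound is 1 MiB (256 pages), which fits openlui's point that the per-queue amount is modest; with 100 VIFs of 4 queues each it becomes 400 MiB and 102,400 persistent mappings held by the backend, which is the scaling concern in David's point (b).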


David


[Xen-devel] [PATCH v7 0/3] xen/arm: Add support for Huawei hip04-d01 platform

2015-02-27 Thread Frediano Ziglio

This set of patches adds Xen support for the hip04-d01 platform (see
https://wiki.linaro.org/Boards/D01 for details).

Changes from v6:
- collapsed some patches (Julien Grall);
- remove useless check for irq values;
- test interrupt controller not using DT compatibility;
- remove non-standard drivers flag (Ian Campbell).

Changes from V5.99.1:
- removed RFC again;
- use different constants for hip04 instead of redefine standard ones;
- comment compatible string change;
- add an option to ARM to enable non-standard drivers;
- rename gicv2 to hip04gic to make clear this is not a standard gic.

Changes from v5:
- do not change gic-v2.c code but use a copy.

To be considered RFC, to see whether it is better to use a copy or other techniques.

Changes from v4:
- rebased to new version;
- removed patch for computing GIC addresses as it applies to all platforms;
- removed patches to platform (cpu and system operations) as now they can
  use a bootwrapper which provides them.

Changes from v3:
- change the way regs property is computed for GICv2 (Julien Grall);
- revert order of compatible names for GIC (Julien Grall).

Changes from v2:
- rewrote DTS fix patch (Ian Campbell);
- use is_hip04 macro instead of doing explicit test (Julien Grall);
- do not use quirks to distinguish this platform (Ian Campbell);
- move some GIC constants to C files instead of header (Julien Grall);
- minor changes (Julien Grall).

Changes from v1:
- style (Julien Grall);
- make gicv2_send_SGI faster (Julien Grall);
- cleanup correctly if hip04_smp_init fails (Julien Grall);
- remove quirks using compatibility (Ian Campbell);
- other minor suggestions by Julien Grall.

Frediano Ziglio (3):
  xen/arm: Duplicate gic-v2.c file to support hip04 platform version
  xen/arm: Make gic-v2 code handle hip04-d01 platform
  xen/arm: Force dom0 to use normal GICv2 driver on Hip04 platform

 xen/arch/arm/Makefile   |   1 +
 xen/arch/arm/domain_build.c |   2 +-
 xen/arch/arm/gic-hip04.c    | 817
 3 files changed, 819 insertions(+), 1 deletion(-)
 create mode 100644 xen/arch/arm/gic-hip04.c

-- 
1.9.1




Re: [Xen-devel] [PATCH 1/5] x86: allow specifying the NUMA nodes Dom0 should run on

2015-02-27 Thread Dario Faggioli

On Fri, 2015-02-27 at 10:50 +, Jan Beulich wrote:
  On 27.02.15 at 11:04, dario.faggi...@citrix.com wrote:
  On Fri, 2015-02-27 at 08:46 +, Jan Beulich wrote:

  This way behavior doesn't change if internally in the hypervisor we
  need to change the mapping from PXMs to node IDs.
  
  Ok, I see the value of this. I'm still a bit concerned about the fact
  that everything else speaks NUMA nodes, but it's probably just me being
  much more used to that than to PXMs. :-)
 
 With everything else I suppose you mean the tool stack? There
 shouldn't be any node IDs kept across reboots there. Yet the
 consistent behavior to be achieved here is particularly for multiple
 boots.
 
Sure. I was thinking more of the inconsistency in the user's mind, as
they'll have to deal with PXMs when configuring Dom0, and with node IDs
after boot... but again, maybe it's only me.
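
The property Jan describes (the option keeps its meaning even if the hypervisor's PXM-to-node numbering changes) comes from resolving PXMs at boot rather than storing node IDs. A minimal sketch of that translation; the lookup table, its values and the function name are hypothetical stand-ins for whatever the firmware tables yield on a given machine, and this is not the patch's parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NO_NODE 0xffu  /* hypothetical marker: PXM not present */

/* Hypothetical boot-time table: index = ACPI proximity domain (PXM),
 * value = node ID the hypervisor assigned to it on this boot. */
static const uint8_t demo_pxm_table[8] = {
    0, 1, DEMO_NO_NODE, DEMO_NO_NODE, 2, 3, DEMO_NO_NODE, DEMO_NO_NODE
};

/* Translate user-specified PXMs into a bitmask of node IDs.
 * Returns 0 on success, -1 if any PXM is unknown (mask left untouched). */
static int demo_pxms_to_nodemask(const unsigned *pxms, unsigned n, uint32_t *mask)
{
    uint32_t m = 0;

    assert(mask != NULL);
    for (unsigned i = 0; i < n; i++) {
        if (pxms[i] >= sizeof(demo_pxm_table) ||
            demo_pxm_table[pxms[i]] == DEMO_NO_NODE)
            return -1;
        m |= UINT32_C(1) << demo_pxm_table[pxms[i]];
    }
    *mask = m;
    return 0;
}
```

With this table PXMs 1 and 4 resolve to nodes 1 and 2 (mask 0x6); if a later boot numbered the nodes differently, the same PXMs would still select the same proximity domains, assuming the firmware tables themselves do not change.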

  I'm simply adjusting what sched_init_vcpu() did, which is alter
  hard affinity conditionally on is_pinned and soft affinity
  unconditionally.
  
  Ok, I understand the idea behind this better now, thanks.
  [...]
  Setting soft affinity as a superset of (in the former case) or equal to
  (in the latter) hard affinity is just pure overhead, when in the
  scheduler.
 
 Then why does sched_init_vcpu() do what it does? If you want to
 alter that, I'm fine with altering it here.
 
It does that, but, in there, soft affinity is unconditionally set to
'all bits set'. Then, in the scheduler, if we find out that the soft
affinity mask is fully set, we just skip the soft affinity balancing
step.

The idea is that, whether the mask is full because no one touched this
default, or because it has been manually set like that, there is nothing
to do at the soft affinity balancing level.

So, you actually are right: rather than not touching soft affinity, as I
said in the previous email, I think we should set hard affinity
conditionally on is_pinned, as in the patch, and then unconditionally
set soft affinity to all, as in sched_init_vcpu().

  Then, if we want to make it possible to tweak soft affinity, we can
  allow for something like dom0_nodes=soft:1,3 and, in that case, alter
  soft affinity only.
 
 Hmm, not sure. And I keep being confused whether soft means
 allow and hard means prefer or the other way around. 

hard means allow (or not allow)
soft means prefer

 In any
 event, again, with sched_init_vcpu() setting up things so that
 soft is a superset of hard (and most likely they're equal), I don't
 see why the same done here would be more of a problem.
 
Indeed, sorry, my bad. When talking about soft being superset, I forgot
to mention the sort of special casing we are granting to the case when
soft mask is all set.

Using cpumask_setall here, as done in sched_init_vcpu(), would avoid
incurring the pointless soft affinity balancing overhead.
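
The rule described here (hard affinity = CPUs a vCPU may run on, soft affinity = CPUs it prefers, and an all-set soft mask means there is nothing to balance) can be modelled with plain bitmasks. A minimal sketch, assuming at most 64 CPUs; this is a simplified model, not Xen's scheduler code, and the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t demo_cpumask;  /* bit i set => CPU i is in the mask */

/* Should the scheduler bother with a soft-affinity balancing step? */
static bool demo_soft_affinity_matters(demo_cpumask online, demo_cpumask hard,
                                       demo_cpumask soft)
{
    assert(hard != 0);             /* a vCPU must be allowed somewhere */
    if ((online & ~soft) == 0)     /* soft covers every online CPU, e.g. all bits set */
        return false;
    if ((hard & ~soft) == 0)       /* soft is equal to, or a superset of, hard */
        return false;
    return (soft & hard) != 0;     /* some allowed CPU is also preferred */
}
```

An all-set soft mask, as cpumask_setall() would produce, fails the first test immediately, which is why setting soft affinity that way costs nothing; a soft mask equal to or wider than the hard one fails the second, matching the "pure overhead" remark earlier in the thread.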

Regards,
Dario



[Xen-devel] [PATCH v2 3/4] xen: sched: make counters for vCPU tickling generic

2015-02-27 Thread Dario Faggioli

and update them from Credit2 and RTDS schedulers.

Signed-off-by: Dario Faggioli dario.faggi...@citrix.com
Cc: Meng Xu xumengpa...@gmail.com
Cc: George Dunlap george.dun...@eu.citrix.com
Cc: Jan Beulich jbeul...@suse.com
Cc: Keir Fraser k...@xen.org
Reviewed-by: Meng Xu men...@cis.upenn.edu
Acked-by: Jan Beulich jbeul...@suse.com
---
Changes from v1:
 * fixed the 'no_tickle' case, in Credit2, as requested
   during review
---
 xen/common/sched_credit2.c   |    4 ++++
 xen/common/sched_rt.c        |    2 ++
 xen/include/xen/perfc_defn.h |    4 ++--
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 2b852cc..c0f7452 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -556,7 +556,10 @@ runq_tickle(const struct scheduler *ops, unsigned int cpu, struct csched2_vcpu *
     /* Only switch to another processor if the credit difference is greater
      * than the migrate resistance */
     if ( ipid == -1 || lowest + CSCHED2_MIGRATE_RESIST > new->credit )
+    {
+        SCHED_STAT_CRANK(tickle_idlers_none);
         goto no_tickle;
+    }
 
 tickle:
     BUG_ON(ipid == -1);
@@ -571,6 +574,7 @@ tickle:
                   (unsigned char *)&d);
     }
     cpumask_set_cpu(ipid, &rqd->tickled);
+    SCHED_STAT_CRANK(tickle_idlers_some);
     cpu_raise_softirq(ipid, SCHEDULE_SOFTIRQ);
 
 no_tickle:
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 49d1b83..2ad0c68 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -929,6 +929,7 @@ runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
     }
 
     /* didn't tickle any cpu */
+    SCHED_STAT_CRANK(tickle_idlers_none);
     return;
 out:
     /* TRACE */
@@ -944,6 +945,7 @@ out:
     }
 
     cpumask_set_cpu(cpu_to_tickle, &prv->tickled);
+    SCHED_STAT_CRANK(tickle_idlers_some);
     cpu_raise_softirq(cpu_to_tickle, SCHEDULE_SOFTIRQ);
     return;
 }
diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h
index 2dc78fe..f754331 100644
--- a/xen/include/xen/perfc_defn.h
+++ b/xen/include/xen/perfc_defn.h
@@ -26,6 +26,8 @@ PERFCOUNTER(vcpu_wake_running,  "sched: vcpu_wake_running")
 PERFCOUNTER(vcpu_wake_onrunq,   "sched: vcpu_wake_onrunq")
 PERFCOUNTER(vcpu_wake_runnable, "sched: vcpu_wake_runnable")
 PERFCOUNTER(vcpu_wake_not_runnable, "sched: vcpu_wake_not_runnable")
+PERFCOUNTER(tickle_idlers_none, "sched: tickle_idlers_none")
+PERFCOUNTER(tickle_idlers_some, "sched: tickle_idlers_some")
 
 /* credit specific counters */
 PERFCOUNTER(delay_ms,   "csched: delay")
@@ -39,8 +41,6 @@ PERFCOUNTER(acct_vcpu_active,   "csched: acct_vcpu_active")
 PERFCOUNTER(acct_vcpu_idle, "csched: acct_vcpu_idle")
 PERFCOUNTER(vcpu_park,  "csched: vcpu_park")
 PERFCOUNTER(vcpu_unpark, "csched: vcpu_unpark")
-PERFCOUNTER(tickle_idlers_none, "csched: tickle_idlers_none")
-PERFCOUNTER(tickle_idlers_some, "csched: tickle_idlers_some")
 PERFCOUNTER(load_balance_idle,  "csched: load_balance_idle")
 PERFCOUNTER(load_balance_over,  "csched: load_balance_over")
 PERFCOUNTER(load_balance_other, "csched: load_balance_other")
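
Counters like these are declared once in a list and expanded several times, the X-macro pattern that perfc_defn.h appears to rely on (it is included under different PERFCOUNTER definitions). A minimal single-file sketch of that pattern, with hypothetical DEMO_ names; it is not Xen's perfc implementation and leaves out details such as whatever per-CPU storage or compile-time switches the real code uses.

```c
#include <assert.h>

/* One list of counters, in the style of perfc_defn.h (simplified). */
#define DEMO_PERFCOUNTERS(X)                            \
    X(tickle_idlers_none, "sched: tickle_idlers_none")  \
    X(tickle_idlers_some, "sched: tickle_idlers_some")

/* Expansion 1: an index for each counter. */
#define DEMO_AS_ENUM(var, name) DEMO_PERFC_##var,
enum demo_perfc { DEMO_PERFCOUNTERS(DEMO_AS_ENUM) DEMO_NUM_PERFCOUNTERS };

/* Expansion 2: the display strings, in the same order. */
#define DEMO_AS_NAME(var, name) name,
static const char *const demo_perfc_names[] = { DEMO_PERFCOUNTERS(DEMO_AS_NAME) };

/* Both expansions come from one list, so they cannot drift apart. */
static_assert(sizeof demo_perfc_names / sizeof demo_perfc_names[0]
              == DEMO_NUM_PERFCOUNTERS, "every counter needs a display name");

/* The counters themselves: one slot per list entry. */
static unsigned long demo_perfc[DEMO_NUM_PERFCOUNTERS];

/* Roughly what a SCHED_STAT_CRANK()-style macro reduces to in this model. */
#define DEMO_STAT_CRANK(var) (demo_perfc[DEMO_PERFC_##var]++)
```

In this model the section a counter sits in is only its position in the list, so moving the two tickle_idlers entries to the generic section and changing their display prefix is enough for any scheduler to crank them.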



[Xen-devel] [PATCH v2 4/4] xen: credit2: add a few performance counters

2015-02-27 Thread Dario Faggioli

for events that are specific to Credit2 (as it happens
for Credit1 already).

Signed-off-by: Dario Faggioli dario.faggi...@citrix.com
Cc: George Dunlap george.dun...@eu.citrix.com
Cc: Jan Beulich jbeul...@suse.com
Cc: Keir Fraser k...@xen.org
Acked-by: Jan Beulich jbeul...@suse.com
---
Changes from v1:
 * fixed the repeated typo in perfc_defn.h, as requested
   during review.
---
 xen/common/sched_credit2.c   |   23 +++
 xen/include/xen/perfc_defn.h |   15 ++-
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index c0f7452..bf0d651 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -654,6 +654,8 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
         }
     }
 
+    SCHED_STAT_CRANK(credit_reset);
+
     /* No need to resort runqueue, as everyone's order should be the same. */
 }
 
@@ -673,6 +675,7 @@ void burn_credits(struct csched2_runqueue_data *rqd, struct csched2_vcpu *svc, s
     delta = now - svc->start_time;
 
     if ( delta > 0 ) {
+        SCHED_STAT_CRANK(burn_credits_t2c);
         t2c_update(rqd, delta, svc);
         svc->start_time = now;
 
@@ -713,6 +716,7 @@ static void update_max_weight(struct csched2_runqueue_data *rqd, int new_weight,
     {
         rqd->max_weight = new_weight;
         d2printk("%s: Runqueue id %d max weight %d\n", __func__, rqd->id, rqd->max_weight);
+        SCHED_STAT_CRANK(upd_max_weight_quick);
     }
     else if ( old_weight == rqd->max_weight )
     {
@@ -729,6 +733,7 @@ static void update_max_weight(struct csched2_runqueue_data *rqd, int new_weight,
 
         rqd->max_weight = max_weight;
         d2printk("%s: Runqueue %d max weight %d\n", __func__, rqd->id, rqd->max_weight);
+        SCHED_STAT_CRANK(upd_max_weight_full);
     }
 }
 
@@ -750,6 +755,7 @@ __csched2_vcpu_check(struct vcpu *vc)
     {
         BUG_ON( !is_idle_vcpu(vc) );
     }
+    SCHED_STAT_CRANK(vcpu_check);
 }
 #define CSCHED2_VCPU_CHECK(_vc)  (__csched2_vcpu_check(_vc))
 #else
@@ -1203,6 +1209,7 @@ static void migrate(const struct scheduler *ops,
         svc->migrate_rqd = trqd;
         set_bit(_VPF_migrating, &svc->vcpu->pause_flags);
         set_bit(__CSFLAG_runq_migrate_request, &svc->flags);
+        SCHED_STAT_CRANK(migrate_requested);
     }
     else
     {
@@ -1223,7 +1230,10 @@ static void migrate(const struct scheduler *ops,
             update_load(ops, svc->rqd, svc, 1, now);
             runq_insert(ops, svc->vcpu->processor, svc);
             runq_tickle(ops, svc->vcpu->processor, svc, now);
+            SCHED_STAT_CRANK(migrate_on_runq);
         }
+        else
+            SCHED_STAT_CRANK(migrate_no_runq);
     }
 }
 
@@ -1577,7 +1587,10 @@ csched2_runtime(const struct scheduler *ops, int cpu, struct csched2_vcpu *snext
     /* The next guy may actually have a higher credit, if we've tried to
      * avoid migrating him from a different cpu.  DTRT.  */
     if ( rt_credit <= 0 )
+    {
         time = CSCHED2_MIN_TIMER;
+        SCHED_STAT_CRANK(runtime_min_timer);
+    }
     else
     {
         /* FIXME: See if we can eliminate this conversion if we know time
@@ -1588,9 +1601,15 @@ csched2_runtime(const struct scheduler *ops, int cpu, struct csched2_vcpu *snext
 
         /* Check limits */
         if ( time < CSCHED2_MIN_TIMER )
+        {
             time = CSCHED2_MIN_TIMER;
+            SCHED_STAT_CRANK(runtime_min_timer);
+        }
         else if ( time > CSCHED2_MAX_TIMER )
+        {
             time = CSCHED2_MAX_TIMER;
+            SCHED_STAT_CRANK(runtime_max_timer);
+        }
     }
 
     return time;
@@ -1623,7 +1642,10 @@ runq_candidate(struct csched2_runqueue_data *rqd,
          * its credit is at least CSCHED2_MIGRATE_RESIST higher. */
         if ( svc->vcpu->processor != cpu
              && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
+        {
+            SCHED_STAT_CRANK(migrate_resisted);
             continue;
+        }
 
         /* If the next one on the list has more credit than current
          * (or idle, if current is not runnable), choose it. */
@@ -1768,6 +1790,7 @@ csched2_schedule(
         {
             snext->credit += CSCHED2_MIGRATE_COMPENSATION;
             snext->vcpu->processor = cpu;
+            SCHED_STAT_CRANK(migrated);
             ret.migrated = 1;
         }
     }
diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h
index f754331..526002d 100644
--- a/xen/include/xen/perfc_defn.h
+++ b/xen/include/xen/perfc_defn.h
@@ -28,10 +28,10 @@ PERFCOUNTER(vcpu_wake_runnable, "sched: vcpu_wake_runnable")
 PERFCOUNTER(vcpu_wake_not_runnable, "sched: vcpu_wake_not_runnable")
 PERFCOUNTER(tickle_idlers_none, "sched: tickle_idlers_none")
 PERFCOUNTER(tickle_idlers_some, "sched: tickle_idlers_some")
+PERFCOUNTER(vcpu_check, "sched: vcpu_check")
 
 /* credit specific counters */
 PERFCOUNTER(delay_ms,   "csched: delay")
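
The runtime-limit and migration-resistance checks that these counters instrument can be exercised outside the hypervisor. A minimal sketch that mirrors the comparisons in csched2_runtime() and runq_candidate(); the constant values are assumptions chosen for illustration (not necessarily Credit2's defaults), and all DEMO_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t demo_time_t;             /* nanoseconds */

/* Assumed values, for illustration only. */
#define DEMO_MIN_TIMER       500000LL    /* 500 us */
#define DEMO_MAX_TIMER      2000000LL    /* 2 ms */
#define DEMO_MIGRATE_RESIST  500000LL    /* credit margin */

enum { DEMO_RUNTIME_MIN, DEMO_RUNTIME_MAX, DEMO_NR_STATS };
static unsigned long demo_stats[DEMO_NR_STATS];

/* Clamp a proposed time slice to [MIN, MAX], counting which limit was hit,
 * like the runtime_min_timer / runtime_max_timer counters in the patch. */
static demo_time_t demo_clamp_runtime(demo_time_t time)
{
    assert(DEMO_MIN_TIMER <= DEMO_MAX_TIMER);
    if (time < DEMO_MIN_TIMER) {
        demo_stats[DEMO_RUNTIME_MIN]++;
        return DEMO_MIN_TIMER;
    }
    if (time > DEMO_MAX_TIMER) {
        demo_stats[DEMO_RUNTIME_MAX]++;
        return DEMO_MAX_TIMER;
    }
    return time;
}

/* A vCPU whose processor is another CPU is skipped (migrate_resisted)
 * unless its credit exceeds the current pick's by at least the margin. */
static bool demo_migrate_resisted(bool on_other_cpu, int64_t snext_credit,
                                  int64_t candidate_credit)
{
    return on_other_cpu && snext_credit + DEMO_MIGRATE_RESIST > candidate_credit;
}
```

In the patch the rt_credit <= 0 branch also cranks runtime_min_timer, so that counter covers both reasons for getting the minimum slice.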

Re: [Xen-devel] [PATCH 3/3] mini-os: sort objects in binary archives

2015-02-27 Thread Ian Campbell

On Wed, 2015-02-11 at 11:37 +, Wei Liu wrote:
 Otherwise we can commence splitting off and then apply this patch to the
 split-off mini-os tree.

mini-os has just been split off, minus this patch.

I intend to let the push gate process that split (hopefully the gate
will pass over the w/e) and then apply this patch as the first fresh
commit in the new tree, which will help check all the bits are in place
etc.

I can adjust the paths and fix the missing bracket as I go. I'll also update
MINIOS_UPSTREAM_REVISION in xen.git to the new thing.

Ian.




Re: [Xen-devel] [PATCH v3 0/8] Split off mini-os to a separate tree

2015-02-27 Thread Ian Campbell

On Wed, 2015-02-25 at 11:21 +, Wei Liu wrote:
 This is v3 of my mini-os splitting off patch series.

As xen@xenbits I ran:
$ mkdir ~/git/mini-os.git
$ cd ~/git/mini-os.git
$ git init --bare
Initialized empty Git repository in /home/xen/git/mini-os.git/
$ chgrp -R xenmaint .
$ find . -type d -exec chmod g+s {} \;
$ git config --add receive.denyNonFastForwards true
$ git config --add receive.unpackLimit 1
$ git config --add gc.autopacklimit 25

(the last three are due to what is in xen.git/config)

Then on the machine where I usually do committing stuff I did:
$ git clone git://xenbits.xen.org/mini-os.git mini-os.git
Cloning into 'mini-os.git'...
warning: You appear to have cloned an empty repository.
$ git fetch git://xenbits.xen.org/people/liuw/mini-os.git master
remote: Counting objects: 3325, done.
remote: Compressing objects: 100% (954/954), done.
remote: Total 3325 (delta 2308), reused 3291 (delta 2282)
Receiving objects: 100% (3325/3325), 962.22 KiB | 451 KiB/s, done.
Resolving deltas: 100% (2308/2308), done.
From git://xenbits.xen.org/people/liuw/mini-os
 * branch            master     -> FETCH_HEAD
$ git push --dry-run origin f5d9868796e91bee70601805b9bfc1bb544b0586:refs/heads/master
To ssh://xenbits.xen.org/home/xen/git/mini-os.git
 * [new branch]      f5d9868796e91bee70601805b9bfc1bb544b0586 -> master

However having merged wip.build-system-v4 I discovered that autogen.sh
needed to have been run half way up the merged branch.

Wei fixed this up and produced a new people/liuw/mini-os.git and
wip.build-system-v5, see 20150227161058.ge29...@zion.uk.xensource.com.

So in mini-os.git:

$ git fetch git://xenbits.xen.org/people/liuw/mini-os.git master
remote: Counting objects: 99, done.
remote: Compressing objects: 100% (71/71), done.
remote: Total 90 (delta 19), reused 84 (delta 15)
Unpacking objects: 100% (90/90), done.
From git://xenbits.xen.org/people/liuw/mini-os
 * branch            master     -> FETCH_HEAD
$ git rev-parse FETCH_HEAD
55f7cd7427ef3e7fe3563a3da46d8664a2ed0d6d
$ git push --dry-run origin +55f7cd7427ef3e7fe3563a3da46d8664a2ed0d6d:refs/heads/master
To ssh://xenbits.xen.org/home/xen/git/mini-os.git
 + f5d9868...55f7cd7 55f7cd7427ef3e7fe3563a3da46d8664a2ed0d6d -> master (forced update)
$ git push origin +55f7cd7427ef3e7fe3563a3da46d8664a2ed0d6d:refs/heads/master
Counting objects: 99, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (70/70), done.
Writing objects: 100% (90/90), 183.78 KiB, done.
Total 90 (delta 19), reused 86 (delta 16)
To ssh://xenbits.xen.org/home/xen/git/mini-os.git
 + f5d9868...55f7cd7 55f7cd7427ef3e7fe3563a3da46d8664a2ed0d6d -> master (forced update)

This required me to temporarily disable receive.denyNonFastForwards on
the xenbits repo. It is re-enabled now.

Having done that I pulled

 git://xenbits.xen.org/people/liuw/xen.git wip.build-system-v5

into my staging branch, build tested it and pushed it back out to the
xen.git#staging branch.

Phew!

Ian.



Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-27 Thread Jan Beulich

 On 27.02.15 at 16:24, ian.campb...@citrix.com wrote:
 On Fri, 2015-02-27 at 14:54 +, Stefano Stabellini wrote:
 MMCFG is a Linux config option, not to be confused with
 PHYSDEVOP_pci_mmcfg_reserved that is a Xen hypercall interface.  I don't
 think that the way Linux (or FreeBSD) call PHYSDEVOP_pci_mmcfg_reserved
 is relevant.
 
 My (possibly flawed) understanding was that pci_mmcfg_reserved was
 intended to propagate the result of dom0 parsing some firmware table or
 other to the hypervisor.

That's not flawed at all.

 In Linux dom0 we call it walking pci_mmcfg_list, which looking at
 arch/x86/pci/mmconfig-shared.c pci_parse_mcfg is populated by walking
 over a struct acpi_table_mcfg (there also appears to be a bunch of
 processor family derived entries, which I guess are quirks of some
 sort).

Right - this parses ACPI tables (plus applies some knowledge about
certain specific systems/chipsets/CPUs) and verifies that the space
needed for the MMCFG region is properly reserved either in E820 or
in the ACPI specified resources (only if so Linux decides to use
MMCFG and consequently also tells Xen that it may use it).

Jan
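
A minimal sketch of what an MCFG entry provides. Each entry gives a configuration-space base address for one PCI segment and one bus range. This is roughly the information dom0 forwards to Xen through PHYSDEVOP_pci_mmcfg_reserved. The C below shows how an ECAM configuration address is derived from such an entry.

The struct and function names are hypothetical and are not Xen's or Linux's definitions. The base is taken as corresponding to bus 0, which is my understanding of how Linux's mmconfig code uses it; treat that as an assumption to check against the PCI Firmware Specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one MCFG allocation entry (illustration only). */
struct mcfg_region {
    uint64_t base;      /* ECAM base address, assumed to correspond to bus 0 */
    uint16_t segment;   /* PCI segment group number */
    uint8_t  start_bus; /* first bus decoded by this region */
    uint8_t  end_bus;   /* last bus decoded by this region */
};

/*
 * Physical address of config register `reg` of seg:bus:dev.fn, using the
 * ECAM layout of 1 MiB per bus, 32 KiB per device and 4 KiB per function.
 * Returns false if the request falls outside this region.
 */
static bool ecam_address(const struct mcfg_region *r, uint16_t seg,
                         uint8_t bus, uint8_t dev, uint8_t fn,
                         uint16_t reg, uint64_t *addr)
{
    if (seg != r->segment || bus < r->start_bus || bus > r->end_bus)
        return false;
    if (dev > 31 || fn > 7 || reg > 0xfff)
        return false;

    *addr = r->base +
            ((uint64_t)bus << 20) +
            ((uint64_t)dev << 15) +
            ((uint64_t)fn << 12) +
            reg;
    return true;
}
```

Only a region that firmware has also reserved (in E820 or the ACPI resources, as Jan describes) would be handed to Xen in the first place.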



[Xen-devel] [PATCH v2 1/4] xen: sched: honour generic perf counters in the RTDS scheduler

2015-02-27 Thread Dario Faggioli

more specifically, about vCPU initialization and destruction events,
in line with adb26c09f26e (xen: sched: introduce a couple of counters
in credit2 and SEDF).

Signed-off-by: Dario Faggioli dario.faggi...@citrix.com
Cc: George Dunlap george.dun...@eu.citrix.com
Cc: Meng Xu xumengpa...@gmail.com
Cc: Jan Beulich jbeul...@suse.com
Cc: Keir Fraser k...@xen.org
Reviewed-by: Meng Xu men...@cis.upenn.edu
---
 xen/common/sched_rt.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index df4adac..58dd646 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -525,6 +525,8 @@ rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
     if ( !is_idle_vcpu(vc) )
         svc->budget = RTDS_DEFAULT_BUDGET;
 
+    SCHED_STAT_CRANK(vcpu_init);
+
     return svc;
 }
 
 
@@ -574,6 +576,8 @@ rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
     struct rt_dom * const sdom = svc->sdom;
     spinlock_t *lock;
 
+    SCHED_STAT_CRANK(vcpu_destroy);
+
     BUG_ON( sdom == NULL );
 
     lock = vcpu_schedule_lock_irq(vc);
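
The counters cranked in this patch are declared as PERFCOUNTER(name, description) entries in xen/include/xen/perfc_defn.h and bumped with SCHED_STAT_CRANK(). The self-contained C below is a sketch of that X-macro idea. One list is expanded once into counter indices and once into descriptions.

The DEMO_* names are hypothetical. This is not Xen's perfc implementation, which includes the definitions file with PERFCOUNTER redefined and only counts when performance counters are compiled in.

```c
#include <stdio.h>

/* Hypothetical counter list in the style of perfc_defn.h. */
#define DEMO_COUNTERS(X)                          \
    X(vcpu_init,    "sched: vcpu_init")           \
    X(vcpu_destroy, "sched: vcpu_destroy")

/* First expansion: one enum index per counter. */
#define DEMO_ENUM(name, desc) DEMO_PERFC_##name,
enum demo_perfc { DEMO_COUNTERS(DEMO_ENUM) DEMO_NUM_COUNTERS };
#undef DEMO_ENUM

/* Second expansion: the matching human-readable descriptions. */
#define DEMO_DESC(name, desc) desc,
static const char *const demo_perfc_desc[] = { DEMO_COUNTERS(DEMO_DESC) };
#undef DEMO_DESC

static unsigned long demo_perfc[DEMO_NUM_COUNTERS];

/* Stand-in for SCHED_STAT_CRANK(): bump the named counter. */
#define DEMO_STAT_CRANK(name) (demo_perfc[DEMO_PERFC_##name]++)

/* Print every counter with its description. */
static void demo_perfc_dump(void)
{
    for (int i = 0; i < DEMO_NUM_COUNTERS; i++)
        printf("%-24s %lu\n", demo_perfc_desc[i], demo_perfc[i]);
}
```

Because the name is pasted into the index, a typo in a crank site fails at compile time rather than silently counting the wrong thing.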



[Xen-devel] [PATCH v2 0/4] xen: sched: rework and add performance counters

2015-02-27 Thread Dario Faggioli

Take 2 of this:

 http://lists.xen.org/archives/html/xen-devel/2015-02/msg03249.html

I've made all the changes suggested during v1.

The series has Meng's Reviewed-by for the changes to sched_rt.c, and Jan's Ack
for the non-strictly scheduling related part (1 file! :-D), so I think what is
missing is George's view/Ack.

Thanks and Regards,
Dario

---
Dario Faggioli (4):
  xen: sched: honour generic perf counters in the RTDS scheduler
  xen: sched: make counters for vCPU sleep and wakeup generic
  xen: sched: make counters for vCPU tickling generic
  xen: credit2: add a few performance counters

 xen/common/sched_credit2.c   |   39 +++
 xen/common/sched_rt.c|   18 ++
 xen/include/xen/perfc_defn.h |   29 +
 3 files changed, 74 insertions(+), 12 deletions(-)

--
This happens because I choose it to happen! (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems RD Ltd., Cambridge (UK)


[Xen-devel] [PATCH v2 2/4] xen: sched: make counters for vCPU sleep and wakeup generic

2015-02-27 Thread Dario Faggioli

and update them from Credit2 and RTDS. In Credit2, while there,
remove some stale comments too.

Signed-off-by: Dario Faggioli dario.faggi...@citrix.com
Cc: George Dunlap george.dun...@eu.citrix.com
Cc: Meng Xu men...@cis.upenn.edu
Cc: Jan Beulich jbeul...@suse.com
Cc: Keir Fraser k...@xen.org
Reviewed-by: Meng Xu men...@cis.upenn.edu
Acked-by: Jan Beulich jbeul...@suse.com
---
 xen/common/sched_credit2.c   |   12 
 xen/common/sched_rt.c|   12 
 xen/include/xen/perfc_defn.h |   10 +-
 3 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ad0a5d4..2b852cc 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -931,6 +931,7 @@ csched2_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
     struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
 
     BUG_ON( is_idle_vcpu(vc) );
+    SCHED_STAT_CRANK(vcpu_sleep);
 
     if ( per_cpu(schedule_data, vc->processor).curr == vc )
         cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
@@ -956,19 +957,22 @@ csched2_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
 
     BUG_ON( is_idle_vcpu(vc) );
 
-    /* Make sure svc priority mod happens before runq check */
     if ( unlikely(per_cpu(schedule_data, vc->processor).curr == vc) )
     {
+        SCHED_STAT_CRANK(vcpu_wake_running);
         goto out;
     }
-
     if ( unlikely(__vcpu_on_runq(svc)) )
     {
-        /* If we've boosted someone that's already on a runqueue, prioritize
-         * it and inform the cpu in question. */
+        SCHED_STAT_CRANK(vcpu_wake_onrunq);
         goto out;
     }
 
+    if ( likely(vcpu_runnable(vc)) )
+        SCHED_STAT_CRANK(vcpu_wake_runnable);
+    else
+        SCHED_STAT_CRANK(vcpu_wake_not_runnable);
+
     /* If the context hasn't been saved for this vcpu yet, we can't put it on
      * another runqueue.  Instead, we set a flag so that it will be put on the runqueue
      * after the context has been saved. */
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 58dd646..49d1b83 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -851,6 +851,7 @@ rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
     struct rt_vcpu * const svc = rt_vcpu(vc);
 
     BUG_ON( is_idle_vcpu(vc) );
+    SCHED_STAT_CRANK(vcpu_sleep);
 
     if ( curr_on_cpu(vc->processor) == vc )
         cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
@@ -966,11 +967,22 @@ rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
     BUG_ON( is_idle_vcpu(vc) );
 
     if ( unlikely(curr_on_cpu(vc->processor) == vc) )
+    {
+        SCHED_STAT_CRANK(vcpu_wake_running);
         return;
+    }
 
     /* on RunQ/DepletedQ, just update info is ok */
     if ( unlikely(__vcpu_on_q(svc)) )
+    {
+        SCHED_STAT_CRANK(vcpu_wake_onrunq);
         return;
+    }
+
+    if ( likely(vcpu_runnable(vc)) )
+        SCHED_STAT_CRANK(vcpu_wake_runnable);
+    else
+        SCHED_STAT_CRANK(vcpu_wake_not_runnable);
 
     /* If context hasn't been saved for this vcpu yet, we can't put it on
      * the Runqueue/DepletedQ. Instead, we set a flag so that it will be
diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h
index 3ac7b45..2dc78fe 100644
--- a/xen/include/xen/perfc_defn.h
+++ b/xen/include/xen/perfc_defn.h
@@ -21,6 +21,11 @@ PERFCOUNTER(dom_init,   "sched: dom_init")
 PERFCOUNTER(dom_destroy, "sched: dom_destroy")
 PERFCOUNTER(vcpu_init,  "sched: vcpu_init")
 PERFCOUNTER(vcpu_destroy,   "sched: vcpu_destroy")
+PERFCOUNTER(vcpu_sleep, "sched: vcpu_sleep")
+PERFCOUNTER(vcpu_wake_running,  "sched: vcpu_wake_running")
+PERFCOUNTER(vcpu_wake_onrunq,   "sched: vcpu_wake_onrunq")
+PERFCOUNTER(vcpu_wake_runnable, "sched: vcpu_wake_runnable")
+PERFCOUNTER(vcpu_wake_not_runnable, "sched: vcpu_wake_not_runnable")
 
 /* credit specific counters */
 PERFCOUNTER(delay_ms,   "csched: delay")
@@ -32,11 +37,6 @@ PERFCOUNTER(acct_reorder,   "csched: acct_reorder")
 PERFCOUNTER(acct_min_credit, "csched: acct_min_credit")
 PERFCOUNTER(acct_vcpu_active,   "csched: acct_vcpu_active")
 PERFCOUNTER(acct_vcpu_idle, "csched: acct_vcpu_idle")
-PERFCOUNTER(vcpu_sleep, "csched: vcpu_sleep")
-PERFCOUNTER(vcpu_wake_running,  "csched: vcpu_wake_running")
-PERFCOUNTER(vcpu_wake_onrunq,   "csched: vcpu_wake_onrunq")
-PERFCOUNTER(vcpu_wake_runnable, "csched: vcpu_wake_runnable")
-PERFCOUNTER(vcpu_wake_not_runnable, "csched: vcpu_wake_not_runnable")
 PERFCOUNTER(vcpu_park,  "csched: vcpu_park")
 PERFCOUNTER(vcpu_unpark, "csched: vcpu_unpark")
 PERFCOUNTER(tickle_idlers_none, "csched: tickle_idlers_none")
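
After this patch, csched2_vcpu_wake() and rt_vcpu_wake() put every wakeup into exactly one of four generic buckets, checked in the same order. The sketch below restates that order as a pure function. The struct and its booleans are hypothetical stand-ins for the real checks:

- the "is current on its pCPU" test;
- __vcpu_on_runq() / __vcpu_on_q();
- vcpu_runnable().

```c
#include <stdbool.h>

/* Which generic wake counter a wakeup ends up cranking. */
enum wake_kind {
    WAKE_RUNNING,       /* vcpu_wake_running */
    WAKE_ONRUNQ,        /* vcpu_wake_onrunq */
    WAKE_RUNNABLE,      /* vcpu_wake_runnable */
    WAKE_NOT_RUNNABLE,  /* vcpu_wake_not_runnable */
};

/* Hypothetical snapshot of the state the wake handlers inspect. */
struct wake_state {
    bool is_current;  /* vCPU is already running on its pCPU */
    bool on_queue;    /* already on a runqueue (or DepletedQ for RTDS) */
    bool runnable;    /* what vcpu_runnable() would report */
};

static enum wake_kind classify_wake(const struct wake_state *s)
{
    if (s->is_current)
        return WAKE_RUNNING;   /* handler counts this and bails out */
    if (s->on_queue)
        return WAKE_ONRUNQ;    /* handler counts this and bails out */
    /* In the patch these two are counted and the wake path carries on. */
    return s->runnable ? WAKE_RUNNABLE : WAKE_NOT_RUNNABLE;
}
```

Since the first two checks return early, the runnable and not-runnable counters only ever see vCPUs that are neither current nor already queued.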



Re: [Xen-devel] [PATCH v6 02/23] xen: move NUMA_NO_NODE to public memory.h as XEN_NUMA_NO_NODE

2015-02-27 Thread Andrew Cooper

On 27/02/15 16:51, Wei Liu wrote:
 On Fri, Feb 27, 2015 at 04:42:42PM +, Jan Beulich wrote:
 On 26.02.15 at 16:55, wei.l...@citrix.com wrote:
 Update NUMA_NO_NODE in Xen code to use the new macro.

 No functional change introduced.
 But also no explanation given why this is being done. After all just
 leaving out the explicit specification on a node in the memop flags
 has the effect of saying NUMA_NO_NODE.

 During last round review, Andrew wanted me to move this to Xen public
 header to avoid reinventing it in libxc. Now this value is used in libxc
 patch.

 But I don't particularly mind whether we move it or not, it's up to you
 maintainers to decide.

It is a sentinel value used in the public ABI.  It should therefore
appear in the public API.

~Andrew
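
Jan's point and Andrew's can both be seen in how memop flags carry an optional node. As far as I know, the public memory.h stores it in bits 8-15 as XENMEMF_node(x) (((x) + 1) << 8) and decodes it with XENMEMF_get_node(x) ((((x) >> 8) - 1) & 0xffu). Jan's point is that leaving the node out already means NUMA_NO_NODE. Andrew's is that the sentinel is therefore part of the ABI.

The sketch copies those macros under DEMO_ names, on that assumption. It shows that flags with no node decode to 0xFF, which is assumed here to be the value of NUMA_NO_NODE inside Xen.

```c
#include <stdint.h>

/* Assumed to mirror XENMEMF_node()/XENMEMF_get_node() in public memory.h. */
#define DEMO_XENMEMF_node(x)     (((x) + 1) << 8)
#define DEMO_XENMEMF_get_node(x) ((((x) >> 8) - 1) & 0xffu)

/* Assumed value of NUMA_NO_NODE inside Xen. */
#define DEMO_NUMA_NO_NODE 0xFFu

/* Decode the node a memop asked for; DEMO_NUMA_NO_NODE means "any node". */
static unsigned int demo_requested_node(uint32_t mem_flags)
{
    return DEMO_XENMEMF_get_node(mem_flags);
}
```

The +1/-1 bias is what makes an all-zero node field wrap around to 0xFF. A caller such as libxc that wants to test for "no node" therefore needs the sentinel's value from the public header.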


Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-27 Thread Stefano Stabellini

On Fri, 27 Feb 2015, Ian Campbell wrote:
 On Fri, 2015-02-27 at 16:35 +, Jan Beulich wrote:
   On 27.02.15 at 16:24, ian.campb...@citrix.com wrote:
   On Fri, 2015-02-27 at 14:54 +, Stefano Stabellini wrote:
   MMCFG is a Linux config option, not to be confused with
   PHYSDEVOP_pci_mmcfg_reserved that is a Xen hypercall interface.  I don't
   think that the way Linux (or FreeBSD) call PHYSDEVOP_pci_mmcfg_reserved
   is relevant.
   
   My (possibly flawed) understanding was that pci_mmcfg_reserved was
   intended to propagate the result of dom0 parsing some firmware table or
   other to the hypervisor.
  
  That's not flawed at all.
 
 I think that's a first in this thread ;-)
 
   In Linux dom0 we call it walking pci_mmcfg_list, which looking at
   arch/x86/pci/mmconfig-shared.c pci_parse_mcfg is populated by walking
   over a struct acpi_table_mcfg (there also appears to be a bunch of
   processor family derived entries, which I guess are quirks of some
   sort).
  
  Right - this parses ACPI tables (plus applies some knowledge about
  certain specific systems/chipsets/CPUs) and verifies that the space
  needed for the MMCFG region is properly reserved either in E820 or
  in the ACPI specified resources (only if so Linux decides to use
  MMCFG and consequently also tells Xen that it may use it).
 
 Thanks.
 
 So I think what I wrote in 1424948710.14641.25.ca...@citrix.com
 applies as is to Device Tree based ARM devices, including the need for
 the PHYSDEVOP_pci_host_bridge_add call.

Although I understand now that PHYSDEVOP_pci_mmcfg_reserved was
intended for passing down firmware information to Xen, as the
information that we need is exactly the same, I think it would be
acceptable to use the same hypercall on ARM too.

I am not hard set on this and the new hypercall is also a viable option.
However, if we do introduce a new hypercall as Ian suggested, do we need
to take into account the possibility that a host bridge might have
multiple cfg memory ranges?


 On ACPI based devices we will have the MCFG table, and things follow
 much as for x86:
 
   * Xen should parse MCFG to discover the PCI host-bridges
   * Dom0 should do likewise and call PHYSDEVOP_pci_mmcfg_reserved in
 the same way as Xen/x86 does.
 
 The SBSA, an ARM standard for servers, mandates various things which
 we can rely on here because ACPI on ARM requires an SBSA compliant
 system. So things like odd quirks in PCI controllers or magic setup are
 spec'd out of our zone of caring (into the firmware I suppose), hence
 there is nothing like the DT_DEVICE_START stuff to register specific
 drivers etc.
 
 The PHYSDEVOP_pci_host_bridge_add call is not AFAICT needed on ACPI ARM
 systems (any more than it is on x86). We can decide whether to omit it
 from dom0 or ignore it from Xen later on.
 
 (Manish, this is FYI, I don't expect you to implement ACPI support!)
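
Stefano asked above whether a host bridge could have several config-space ranges. If it can, one way to leave room for that in a new interface is a small array of windows per bridge, each covering a bus sub-range, with a per-bus lookup. Every name below is hypothetical. Neither the structure nor the lookup comes from the patches or from any existing hypercall.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_CFG_RANGES 4

/* One config-space window serving buses [start_bus, end_bus]. */
struct demo_cfg_range {
    uint64_t base;      /* base address of this window */
    uint8_t  start_bus;
    uint8_t  end_bus;
};

/* Hypothetical host bridge description carrying several windows. */
struct demo_host_bridge {
    uint16_t segment;
    size_t   nr_ranges;
    struct demo_cfg_range ranges[DEMO_MAX_CFG_RANGES];
};

/* Return the window that serves `bus`, or NULL if none does. */
static const struct demo_cfg_range *
demo_find_cfg_range(const struct demo_host_bridge *hb, uint8_t bus)
{
    for (size_t i = 0; i < hb->nr_ranges && i < DEMO_MAX_CFG_RANGES; i++) {
        const struct demo_cfg_range *r = &hb->ranges[i];

        if (bus >= r->start_bus && bus <= r->end_bus)
            return r;
    }
    return NULL;
}
```

A single-range bridge is just the nr_ranges == 1 case, so such a layout would not cost the simple case anything.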


Re: [Xen-devel] [PATCH] MAINTAINERS: Add OVMF maintainers.

2015-02-27 Thread Wei Liu

On Fri, Feb 27, 2015 at 04:49:18PM +, Anthony PERARD wrote:
 Signed-off-by: Anthony PERARD anthony.per...@citrix.com

Acked-by: Wei Liu wei.l...@citrix.com

 ---
  MAINTAINERS | 6 ++++++
  1 file changed, 6 insertions(+)
 
 diff --git a/MAINTAINERS b/MAINTAINERS
 index 3bbac9e..e94a763 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
 @@ -237,6 +237,12 @@ M:   David Scott dave.sc...@eu.citrix.com
  S:   Supported
  F:   tools/ocaml/
  
 +OVMF UPSTREAM
 +M:   Anthony PERARD anthony.per...@citrix.com
 +M:   Wei Liu wei.l...@citrix.com
 +S:   Supported
 +T:   git git://xenbits.xen.org/ovmf.git
 +
  POWER MANAGEMENT
  M:   Jan Beulich jbeul...@suse.com
  M:   Liu Jinsong jinsong@alibaba-inc.com
 -- 
 Anthony PERARD


Re: [Xen-devel] [PATCH v6 02/23] xen: move NUMA_NO_NODE to public memory.h as XEN_NUMA_NO_NODE

2015-02-27 Thread Wei Liu

On Fri, Feb 27, 2015 at 04:42:42PM +, Jan Beulich wrote:
  On 26.02.15 at 16:55, wei.l...@citrix.com wrote:
  Update NUMA_NO_NODE in Xen code to use the new macro.
  
  No functional change introduced.
 
 But also no explanation given why this is being done. After all just
 leaving out the explicit specification on a node in the memop flags
 has the effect of saying NUMA_NO_NODE.
 

During last round review, Andrew wanted me to move this to Xen public
header to avoid reinventing it in libxc. Now this value is used in libxc
patch.

But I don't particularly mind whether we move it or not, it's up to you
maintainers to decide.

Wei.

 Jan
 
 


Re: [Xen-devel] [PATCH v6 02/23] xen: move NUMA_NO_NODE to public memory.h as XEN_NUMA_NO_NODE

2015-02-27 Thread Jan Beulich

 On 26.02.15 at 16:55, wei.l...@citrix.com wrote:
 Update NUMA_NO_NODE in Xen code to use the new macro.
 
 No functional change introduced.

But also no explanation given why this is being done. After all just
leaving out the explicit specification on a node in the memop flags
has the effect of saying NUMA_NO_NODE.

Jan



[Xen-devel] [PATCH] MAINTAINERS: Add OVMF maintainers.

2015-02-27 Thread Anthony PERARD

Signed-off-by: Anthony PERARD anthony.per...@citrix.com
---
 MAINTAINERS | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 3bbac9e..e94a763 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -237,6 +237,12 @@ M: David Scott dave.sc...@eu.citrix.com
 S: Supported
 F: tools/ocaml/
 
+OVMF UPSTREAM
+M: Anthony PERARD anthony.per...@citrix.com
+M: Wei Liu wei.l...@citrix.com
+S: Supported
+T: git git://xenbits.xen.org/ovmf.git
+
 POWER MANAGEMENT
 M: Jan Beulich jbeul...@suse.com
 M: Liu Jinsong jinsong@alibaba-inc.com
-- 
Anthony PERARD



Re: [Xen-devel] [PATCH v6 03/23] xen: make two memory hypercalls vNUMA-aware

2015-02-27 Thread Jan Beulich

 On 26.02.15 at 16:55, wei.l...@citrix.com wrote:
 Make XENMEM_increase_reservation and XENMEM_populate_physmap
 vNUMA-aware.
 
 That is, if guest requests Xen to allocate memory for specific vnode,
 Xen can translate vnode to pnode using vNUMA information of that guest.
 
 XENMEMF_vnode is introduced for the guest to mark the node number is in
 fact virtual node number and should be translated by Xen.
 
 XENFEAT_memory_op_vnode_supported is introduced to indicate that Xen is
 able to translate virtual node to physical node.
 
 Signed-off-by: Wei Liu wei.l...@citrix.com

As I massaged your first patch (also, but not only, to do what Andrew
requested), this one will need adjustment too. Perhaps additionally if
the 2nd one is to be dropped...

Jan
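
The quoted commit message describes a translation step. A guest passes a node number flagged with XENMEMF_vnode, and Xen maps that virtual node to a physical node using the guest's vNUMA information. That step amounts to a bounds-checked table lookup.

The sketch below assumes a per-domain vnode_to_pnode array and a 0xFF "no node" sentinel. The struct, the names and the error convention are hypothetical, not code from the series.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NO_NODE 0xFFu   /* assumed "no node" sentinel */

/* Hypothetical per-domain vNUMA information. */
struct demo_vnuma {
    unsigned int nr_vnodes;
    const uint8_t *vnode_to_pnode;  /* nr_vnodes entries */
};

/*
 * Translate the node a memop asked for. A node not flagged as virtual is
 * passed through unchanged, as is the "no node" sentinel. Returns false
 * for a virtual node the domain does not have; the caller would then fail
 * the hypercall.
 */
static bool demo_translate_node(const struct demo_vnuma *v, bool is_vnode,
                                unsigned int node, unsigned int *pnode)
{
    if (!is_vnode || node == DEMO_NO_NODE) {
        *pnode = node;
        return true;
    }
    if (v == NULL || node >= v->nr_vnodes)
        return false;
    *pnode = v->vnode_to_pnode[node];
    return true;
}
```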



Re: [Xen-devel] [PATCH v6 03/23] xen: make two memory hypercalls vNUMA-aware

2015-02-27 Thread Wei Liu

On Fri, Feb 27, 2015 at 04:59:02PM +, Jan Beulich wrote:
  On 26.02.15 at 16:55, wei.l...@citrix.com wrote:
  Make XENMEM_increase_reservation and XENMEM_populate_physmap
  vNUMA-aware.
  
  That is, if guest requests Xen to allocate memory for specific vnode,
  Xen can translate vnode to pnode using vNUMA information of that guest.
  
  XENMEMF_vnode is introduced for the guest to mark the node number is in
  fact virtual node number and should be translated by Xen.
  
  XENFEAT_memory_op_vnode_supported is introduced to indicate that Xen is
  able to translate virtual node to physical node.
  
  Signed-off-by: Wei Liu wei.l...@citrix.com
 
 As I massaged your first patch (also, but not only, to do what Andrew
 requested), this one will need adjustment too. Perhaps additionally if
 the 2nd one is to be dropped...
 

I can resend after we come to a conclusion on what to do with patch 2.

Wei.

 Jan


Re: [Xen-devel] [PATCH v3 0/8] Split off mini-os to a separate tree

2015-02-27 Thread Ian Campbell

On Fri, 2015-02-27 at 16:37 +, Ian Campbell wrote:
 On Wed, 2015-02-25 at 11:21 +, Wei Liu wrote:
  This is v3 of my mini-os splitting off patch series.
 
 As xen@xenbits I ran:
 $ mkdir ~/git/mini-os.git
 $ cd ~/git/mini-os.git
 $ git init --bare
 Initialized empty Git repository in /home/xen/git/mini-os.git/
 $ chgrp -R xenmaint .
 $ find . -type d -exec chmod g+s {} \;
 $ git config --add receive.denyNonFastForwards true
 $ git config --add receive.unpackLimit 1
 $ git config --add gc.autopacklimit 25

This omitted setting up the mails to xen-stag...@lists.xensource.com on
push.

Following Ian's advice to look at ~xen/release-checklist on xenbits I
have now:

xen@xenbits:~/HG/patchbot$ echo 55f7cd7427ef3e7fe3563a3da46d8664a2ed0d6d > mini-os--master.patchbot-reported-heads

edited versions to add:

/home/xen/git mini-os.git#master
xen-change...@lists.xensource.com   xen-de...@lists.xensource.com

and committed that change to the git repo in cwd.

Ian.



Re: [Xen-devel] Regression, host crash with 4.5rc1

2015-02-27 Thread Dugger, Donald D

Len (CC'd on this email) is our power expert and has some ideas on this
issue; I'll let him explain further.

--
Don Dugger
Censeo Toto nos in Kansa esse decisse. - D. Gale
Ph: 303/443-3786

-Original Message-
From: Jan Beulich [mailto:jbeul...@suse.com] 
Sent: Thursday, November 27, 2014 2:28 AM
To: Steve Freitas; Dugger, Donald D; Nakajima, Jun
Cc: xen-devel@lists.xen.org; Don Slutz
Subject: Re: [Xen-devel] Regression, host crash with 4.5rc1

 On 27.11.14 at 06:29, sfl...@ihonk.com wrote:
 On 11/25/2014 03:00 AM, Jan Beulich wrote:
 Okay, so it's not really the mwait-idle driver causing the 
 regression, but it is C-state related. Hence we're now down to seeing 
 whether all or just the deeper C states are affected, i.e. I now need 
 to ask you to play with max_cstate=. For that you'll have to 
 remember that the option's effect differs between the ACPI and the MWAIT 
 idle drivers.
 In the spirit of bisection I'd suggest using max_cstate=2 first no 
 matter which of the two scenarios you pick. If that still hangs, 
 max_cstate=1 obviously is the only other thing to try. Should that 
 not hang (and you left out mwait-idle=0), trying max_cstate=3
 in that same scenario would be the other case to check.

 No need for 'd' and 'a' output for the time being, but 'c' output 
 would be much appreciated for all cases where you observe hangs.

 
 Okay, working through that now. I tried max_cstate=2 and got no hangs, 
 whether with or without mwait-idle=0. However, I was puzzled by this:
 
 (XEN) 'c' pressed - printing ACPI Cx structures
 (XEN) ==cpu0==
 (XEN) active state: C0
 (XEN) max_cstate:   C2
 (XEN) states:
 (XEN) C1:   type[C1] latency[003] usage[12219860] method[  FFH] duration[1190961948551]
 (XEN) C2:   type[C1] latency[010] usage[10205554] method[  FFH] duration[2015393965907]
 (XEN) C3:   type[C2] latency[020] usage[50926286] method[  FFH] duration[30527997858148]
 (XEN)*C0:   usage[73351700] duration[9974627547595]
 (XEN) max=0 pwr=0 urg=0 nxt=0
 (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
 (XEN) CC3[28794734145697] CC6[0] CC7[0]
 (XEN) ==cpu1==
 (XEN) active state: C3
 (XEN) max_cstate:   C2
 (XEN) states:
 (XEN) C1:   type[C1] latency[003] usage[10699950] method[  FFH] duration[1141422044112]
 (XEN) C2:   type[C1] latency[010] usage[06382904] method[  FFH] duration[1329739264322]
 (XEN)*C3:   type[C2] latency[020] usage[44630764] method[  FFH] duration[31676618425954]
 (XEN) C0:   usage[61713618] duration[9561201640320]
 (XEN) max=0 pwr=0 urg=0 nxt=0
 (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
 (XEN) CC3[30066495105056] CC6[0] CC7[0] [...]
 
 Why would some of the cores be in C3 even though they list max_cstate as C2?

This was precisely the reason why I told you that the numbering differs (and
is confusing and has nothing to do with actual C state numbers): What
max_cstate refers to in the mwait-idle driver is what above is listed as
type[Cx], i.e. the state at index 1 is C1, at 2 we've got C1E, and at 3 we've
got C2. And those still aren't in line with the numbering the CPU
documentation uses, it's rather kind of meant to refer to the ACPI numbering
(but probably also not fully matching up).

So max_cstate=2 working suggests a problem with what the CPU calls C6, which
presumably isn't all that surprising considering the many errata (BD35, BD38,
BD40, BD59, BD87, and BD104). Not sure how to proceed from here - I suppose
you already made sure you run with the latest available BIOS. And with 6
errata documented it's not all that unlikely that there's a 7th one with
MONITOR/MWAIT behavior. The commit you bisected to (and which you had verified
to be the culprit by just forcing arch_skip_send_event_check() to always
return false) could be reasonably assumed to be broken only when MWAIT use for
all C states didn't work.

Don, Jun - is there anything known but not yet publicly documented for
Family 6 Model 44 Xeons?

Jan
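
Jan's explanation and the 'c' dump together suggest that max_cstate is compared against the type[Cx] value. It is not compared against the Cn index at the start of each line. That is why a state labelled C3 can be active while max_cstate is C2.

The sketch below models only that comparison. The first three entries are transcribed from cpu0's dump. The fourth, with type 3, is an assumption standing in for the deeper state that max_cstate=2 excluded (what Jan suspects is the CPU's C6). The structure and helpers are illustrative, not code from Xen's idle drivers.

```c
#include <stddef.h>

/* One idle state as printed by Xen's 'c' debug key. */
struct demo_cstate {
    unsigned int index;  /* the Cn label: position in the driver's table */
    unsigned int type;   /* the type[Cn] value: what max_cstate is checked against */
};

static const struct demo_cstate demo_states[] = {
    { 1, 1 },  /* "C1: type[C1]" from the dump */
    { 2, 1 },  /* "C2: type[C1]" from the dump (C1E per Jan) */
    { 3, 2 },  /* "C3: type[C2]" from the dump */
    { 4, 3 },  /* assumed deeper state, absent under max_cstate=2 */
};

#define DEMO_NR_STATES (sizeof(demo_states) / sizeof(demo_states[0]))

/* Number of states left enabled for a given max_cstate= value. */
static size_t demo_enabled_states(unsigned int max_cstate)
{
    size_t n = 0;

    for (size_t i = 0; i < DEMO_NR_STATES; i++)
        if (demo_states[i].type <= max_cstate)
            n++;
    return n;
}

/* Deepest index still usable: why "C3" can be active with max_cstate C2. */
static unsigned int demo_deepest_index(unsigned int max_cstate)
{
    unsigned int deepest = 0;

    for (size_t i = 0; i < DEMO_NR_STATES; i++)
        if (demo_states[i].type <= max_cstate && demo_states[i].index > deepest)
            deepest = demo_states[i].index;
    return deepest;
}
```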



Re: [Xen-devel] RFC: xen config changes v4

2015-02-27 Thread Luis R. Rodriguez

On Fri, Feb 27, 2015 at 6:30 AM, Juergen Gross jgr...@suse.com wrote:
 On 02/27/2015 02:38 PM, Stefano Stabellini wrote:

 On Fri, 27 Feb 2015, Juergen Gross wrote:

 On 02/27/2015 01:24 PM, Stefano Stabellini wrote:

 On Fri, 27 Feb 2015, Juergen Gross wrote:

 On 02/27/2015 11:11 AM, Stefano Stabellini wrote:

 On Fri, 27 Feb 2015, Juergen Gross wrote:

 On 02/27/2015 10:41 AM, Stefano Stabellini wrote:

 On Fri, 27 Feb 2015, Juergen Gross wrote:

 On 02/26/2015 06:42 PM, Stefano Stabellini wrote:

 On Thu, 26 Feb 2015, Luis R. Rodriguez wrote:

 On Thu, Feb 26, 2015 at 11:08:20AM +, Stefano Stabellini
 wrote:

 On Thu, 26 Feb 2015, David Vrabel wrote:

 On 26/02/15 04:59, Juergen Gross wrote:


 So we are again in the situation that pv-drivers always imply the pvops
 kernel (PARAVIRT selected). I started the whole Kconfig rework to
 eliminate this dependency.


 Yes.  Can you produce a series that just addresses this one issue.

 In the absence of any concrete requirement for this big Kconfig reorg
 I don't think it is helpful.


 I clearly missed some context as I didn't realize that this was the
 intended goal. Why do we want this? Please explain as it won't come
 for free.


 We have a few PV interfaces for HVM guests that need PARAVIRT in Linux
 in order to be used, for example pv_time_ops and HVMOP_pagetable_dying.
 They are critical performance improvements and from the interface
 perspective, small enough that it doesn't make much sense having a
 separate KConfig option for them.


 In order to reach the goal above we necessarily need to introduce a
 differentiation in terms of PV on HVM guests in Linux:

 1) basic guests with PV network, disk, etc but no PV timers, no
    HVMOP_pagetable_dying, no PV IPIs
 2) full PV on HVM guests that have PV network, disk, timers,
    HVMOP_pagetable_dying, PV IPIs and anything else that makes sense.

 2) is much faster than 1) on Xen and 2) is only a tiny bit slower than
 1) on native x86


 Also don't we shove 2) down hvm guests right now? Even when everything
 is built in I do not see how we opt out for HVM for 1) at run time
 right now.

 If this is true then the question of motivation for this becomes even
 stronger I think.


 Yes, indeed there is no way to do 1) at the moment. And for good
 reasons, see above.


 Hmm, after checking the code I'm not convinced:

 - HVMOP_pagetable_dying is obsolete on modern hardware supporting
   EPT/HAP


 That might be true, but what about older hardware?
 Even on modern hardware a few workloads still run faster on shadow.
 But if HVMOP_pagetable_dying is the only reason to keep PARAVIRT for HVM
 guests, then I agree with you that we should remove it.


 - PV IPIs are not needed on single-vcpu guests

 - PARAVIRT_CLOCK doesn't need PARAVIRT (in fact the SUSE kernel configs
   for all x86_64 kernels have CONFIG_PARAVIRT_CLOCK=y)

 So I think we really should enable building Xen frontends without
 PARAVIRT, implying at least no XEN_PV and no XEN_PVH.

 I'll have a try setting up patches.


 If we are doing this as a performance improvement, I would like to see a
 couple of benchmarks (kernbench, hackbench) to show that on a
 single-vcpu guest and multi-vcpu guest (let's say 4 vcpus) disabling
 PARAVIRT leads to better performance on Xen on EPT hardware.


 This is not meant to be a performance improvement. It is meant to enable
 a standard distro kernel configured without PARAVIRT to be able to run
 as a HVM guest using the pv-drivers.


 This is not a convincing explanation.  Debian, Ubuntu and Fedora seem
 to be able to cope with it just fine.

 Why do you want to do that, even though it will cause a performance
 regression and a maintenance pain?  You haven't provided a reason yet.


 Either we are talking about different things, or I really don't
 understand your problem here. I don't want to disable something. I
 just want to enable kernels without PARAVIRT to run under Xen better
 than today. Be it 32 bit non-PAE kernels as Ian pointed out or
 distro kernels like e.g. SLES and probably RHEL.

 Using PV frontends is completely orthogonal to other PV enhancements
 like PARAVIRT_CLOCK, HVMOP_pagetable_dying or PV IPIs. So why do you
 object to enabling the PV frontends for those kernels?


 I am for it.  I would like to avoid two user visible XEN enablement
 options (XEN_FRONTEND vs. XEN_PVHVM) for x86_64 and PAE HVM guests to
 avoid configurations with just XEN_FRONTEND, that can be considered a
 performance regression compared to what we have now (on x86_64 and PAE).


 Would you be okay with making this an expert configuration alternative
 for PAE/x86_64? This would enable the possibility to use PV drivers for
 native-performance-tuned kernels. I would explicitly mention the better
 alternative XEN_PVHVM in the Kconfig help text.


 I would prefer to hide it on PAE and x86_64.


 Okay, as long as it is still _possible_ somehow to configure it.

That begs

[Xen-devel] [PATCH v7 2/3] xen/arm: Make gic-v2 code handle hip04-d01 platform

2015-02-27 Thread Frediano Ziglio

The GIC in this platform is mainly compatible with the standard
GICv2 except for:
- ITARGET is extended to 16 bit to support 16 CPUs;
- SGI mask is extended to support 16 CPUs;
- maximum supported interrupt is 510;
- GICH APR and LR register offsets.

Signed-off-by: Frediano Ziglio frediano.zig...@huawei.com
Signed-off-by: Zoltan Kiss zoltan.k...@huawei.com
---
 xen/arch/arm/Makefile   |   1 +
 xen/arch/arm/domain_build.c |   2 +-
 xen/arch/arm/gic-hip04.c| 400 +++-
 3 files changed, 207 insertions(+), 196 deletions(-)

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 41aba2e..72499e9 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -12,6 +12,7 @@ obj-y += domctl.o
 obj-y += sysctl.o
 obj-y += domain_build.o
 obj-y += gic.o gic-v2.o
+obj-$(arm32) += gic-hip04.o
 obj-$(CONFIG_ARM_64) += gic-v3.o
 obj-y += io.o
 obj-y += irq.o
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 9f1f59f..83951a3 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1069,7 +1069,7 @@ static int handle_node(struct domain *d, struct kernel_info *kinfo,
 
     /* Replace these nodes with our own. Note that the original may be
      * used_by DOMID_XEN so this check comes first. */
-    if ( dt_match_node(gic_matches, node) )
+    if ( node == dt_interrupt_controller || dt_match_node(gic_matches, node) )
         return make_gic_node(d, kinfo->fdt, node);
     if ( dt_match_node(timer_matches, node) )
         return make_timer_node(d, kinfo->fdt, node);
diff --git a/xen/arch/arm/gic-hip04.c b/xen/arch/arm/gic-hip04.c
index fa695d1..9977f9b 100644
--- a/xen/arch/arm/gic-hip04.c
+++ b/xen/arch/arm/gic-hip04.c
@@ -1,7 +1,8 @@
 /*
- * xen/arch/arm/gic-v2.c
+ * xen/arch/arm/gic-hip04.c
  *
- * ARM Generic Interrupt Controller support v2
+ * Generic Interrupt Controller for HiSilicon Hip04 platform
+ * Based heavily on gic-v2.c
  *
  * Tim Deegan t...@xen.org
  * Copyright (c) 2011 Citrix Systems.
@@ -71,59 +72,71 @@ static struct {
 void __iomem * map_hbase; /* IO Address of virtual interface registers */
 paddr_t vbase;/* Address of virtual cpu interface registers */
 spinlock_t lock;
-} gicv2;
+} hip04gic;
 
-static struct gic_info gicv2_info;
+static struct gic_info hip04gic_info;
 
 /* The GIC mapping of CPU interfaces does not necessarily match the
  * logical CPU numbering. Let's use mapping as returned by the GIC
  * itself
  */
-static DEFINE_PER_CPU(u8, gic_cpu_id);
+static DEFINE_PER_CPU(u16, gic_cpu_id);
 
 /* Maximum cpu interface per GIC */
-#define NR_GIC_CPU_IF 8
+#define NR_GIC_CPU_IF 16
+
+#define HIP04_GICD_SGI_TARGET_SHIFT 8
+
+#define HIP04_GICH_APR   0x70
+#define HIP04_GICH_LR0x80
+
+#define DT_COMPAT_GIC_HIP04 hisilicon,hip04-intc
 
 static inline void writeb_gicd(uint8_t val, unsigned int offset)
 {
-writeb_relaxed(val, gicv2.map_dbase + offset);
+writeb_relaxed(val, hip04gic.map_dbase + offset);
+}
+
+static inline void writew_gicd(uint16_t val, unsigned int offset)
+{
+writew_relaxed(val, hip04gic.map_dbase + offset);
 }
 
 static inline void writel_gicd(uint32_t val, unsigned int offset)
 {
-writel_relaxed(val, gicv2.map_dbase + offset);
+writel_relaxed(val, hip04gic.map_dbase + offset);
 }
 
 static inline uint32_t readl_gicd(unsigned int offset)
 {
-return readl_relaxed(gicv2.map_dbase + offset);
+return readl_relaxed(hip04gic.map_dbase + offset);
 }
 
 static inline void writel_gicc(uint32_t val, unsigned int offset)
 {
 unsigned int page = offset  PAGE_SHIFT;
 offset = ~PAGE_MASK;
-writel_relaxed(val, gicv2.map_cbase[page] + offset);
+writel_relaxed(val, hip04gic.map_cbase[page] + offset);
 }
 
 static inline uint32_t readl_gicc(unsigned int offset)
 {
 unsigned int page = offset  PAGE_SHIFT;
 offset = ~PAGE_MASK;
-return readl_relaxed(gicv2.map_cbase[page] + offset);
+return readl_relaxed(hip04gic.map_cbase[page] + offset);
 }
 
 static inline void writel_gich(uint32_t val, unsigned int offset)
 {
-writel_relaxed(val, gicv2.map_hbase + offset);
+writel_relaxed(val, hip04gic.map_hbase + offset);
 }
 
 static inline uint32_t readl_gich(int unsigned offset)
 {
-return readl_relaxed(gicv2.map_hbase + offset);
+return readl_relaxed(hip04gic.map_hbase + offset);
 }
 
-static unsigned int gicv2_cpu_mask(const cpumask_t *cpumask)
+static unsigned int hip04gic_cpu_mask(const cpumask_t *cpumask)
 {
 unsigned int cpu;
 unsigned int mask = 0;
@@ -139,7 +152,7 @@ static unsigned int gicv2_cpu_mask(const cpumask_t *cpumask)
 return mask;
 }
 
-static void gicv2_save_state(struct vcpu *v)
+static void hip04gic_save_state(struct vcpu *v)
 {
 int i;
 
@@ -147,58 +160,58 @@ static void gicv2_save_state(struct vcpu *v)
  * this call and it only accesses struct vcpu fields that cannot be
  * accessed simultaneously by another pCPU.
  */
-for ( i = 0; i
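
Two pieces of register arithmetic in the HIP04 accessors above can be checked in isolation: splitting a GICC offset into an index for the two separately mapped pages in map_cbase[] plus an in-page offset, and placing a CPU-interface mask in the SGI register. This is a standalone sketch, not Xen code: it assumes 4 KiB pages, and it assumes (from HIP04_GICD_SGI_TARGET_SHIFT being 8 and NR_GIC_CPU_IF growing to 16) that HIP04 uses a 16-bit target list starting at bit 8, where GICv2 uses an 8-bit list at bits [23:16]. gicc_split() and sgir_value() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/*
 * Same arithmetic as writel_gicc()/readl_gicc(): the high bits select
 * which mapped GICC page (map_cbase[0] or map_cbase[1]) to use, the low
 * bits are the offset inside that page.
 */
static void gicc_split(unsigned int offset, unsigned int *page,
                       unsigned int *in_page)
{
    *page = offset >> PAGE_SHIFT;
    *in_page = offset & ~PAGE_MASK;
}

/* GICv2 GICD_SGIR: 8-bit CPU target list in bits [23:16]. */
#define GICV2_SGI_TARGET_SHIFT      16
/* From the HIP04 patch; assumed to mean a 16-bit list in bits [23:8]. */
#define HIP04_GICD_SGI_TARGET_SHIFT 8

/* Hypothetical helper: SGI register value for a CPU-interface mask. */
static uint32_t sgir_value(unsigned int target_shift, uint32_t cpu_mask,
                           unsigned int sgi)
{
    assert(sgi < 16);                /* SGI IDs are 0..15, bits [3:0] */
    return (cpu_mask << target_shift) | sgi;
}
```

The same mask for CPU interface 9 cannot be expressed at all with the GICv2 shift, which is why the target field has to move.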

[Xen-devel] [PATCH v7 1/3] xen/arm: Duplicate gic-v2.c file to support hip04 platform version

2015-02-27 Thread Frediano Ziglio

The HiSilicon Hip04 platform uses a slightly different GIC version.
This is just a verbatim copy of the file to work around git
not fully supporting copy operations.

Signed-off-by: Frediano Ziglio frediano.zig...@huawei.com
---
 xen/arch/arm/gic-hip04.c | 803 +++
 1 file changed, 803 insertions(+)
 create mode 100644 xen/arch/arm/gic-hip04.c

diff --git a/xen/arch/arm/gic-hip04.c b/xen/arch/arm/gic-hip04.c
new file mode 100644
index 000..fa695d1
--- /dev/null
+++ b/xen/arch/arm/gic-hip04.c
@@ -0,0 +1,803 @@
+/*
+ * xen/arch/arm/gic-v2.c
+ *
+ * ARM Generic Interrupt Controller support v2
+ *
+ * Tim Deegan t...@xen.org
+ * Copyright (c) 2011 Citrix Systems.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <xen/config.h>
+#include <xen/lib.h>
+#include <xen/init.h>
+#include <xen/mm.h>
+#include <xen/irq.h>
+#include <xen/sched.h>
+#include <xen/errno.h>
+#include <xen/softirq.h>
+#include <xen/list.h>
+#include <xen/device_tree.h>
+#include <xen/libfdt/libfdt.h>
+#include <asm/p2m.h>
+#include <asm/domain.h>
+#include <asm/platform.h>
+#include <asm/device.h>
+
+#include <asm/io.h>
+#include <asm/gic.h>
+
+/*
+ * LR register definitions are GIC v2 specific.
+ * Moved these definitions from header file to here
+ */
+#define GICH_V2_LR_VIRTUAL_MASK    0x3ff
+#define GICH_V2_LR_VIRTUAL_SHIFT   0
+#define GICH_V2_LR_PHYSICAL_MASK   0x3ff
+#define GICH_V2_LR_PHYSICAL_SHIFT  10
+#define GICH_V2_LR_STATE_MASK      0x3
+#define GICH_V2_LR_STATE_SHIFT     28
+#define GICH_V2_LR_PRIORITY_SHIFT  23
+#define GICH_V2_LR_PRIORITY_MASK   0x1f
+#define GICH_V2_LR_HW_SHIFT        31
+#define GICH_V2_LR_HW_MASK         0x1
+#define GICH_V2_LR_GRP_SHIFT       30
+#define GICH_V2_LR_GRP_MASK        0x1
+#define GICH_V2_LR_MAINTENANCE_IRQ (1 << 19)
+#define GICH_V2_LR_GRP1            (1 << 30)
+#define GICH_V2_LR_HW              (1 << 31)
+#define GICH_V2_LR_CPUID_SHIFT     9
+#define GICH_V2_VTR_NRLRGS         0x3f
+
+#define GICH_V2_VMCR_PRIORITY_MASK   0x1f
+#define GICH_V2_VMCR_PRIORITY_SHIFT  27
+
+/* Global state */
+static struct {
+paddr_t dbase;               /* Address of distributor registers */
+void __iomem * map_dbase;    /* IO mapped Address of distributor registers */
+paddr_t cbase;               /* Address of CPU interface registers */
+void __iomem * map_cbase[2]; /* IO mapped Address of CPU interface registers */
+paddr_t hbase;               /* Address of virtual interface registers */
+void __iomem * map_hbase;    /* IO Address of virtual interface registers */
+paddr_t vbase;               /* Address of virtual cpu interface registers */
+spinlock_t lock;
+} gicv2;
+
+static struct gic_info gicv2_info;
+
+/* The GIC mapping of CPU interfaces does not necessarily match the
+ * logical CPU numbering. Let's use mapping as returned by the GIC
+ * itself
+ */
+static DEFINE_PER_CPU(u8, gic_cpu_id);
+
+/* Maximum cpu interface per GIC */
+#define NR_GIC_CPU_IF 8
+
+static inline void writeb_gicd(uint8_t val, unsigned int offset)
+{
+writeb_relaxed(val, gicv2.map_dbase + offset);
+}
+
+static inline void writel_gicd(uint32_t val, unsigned int offset)
+{
+writel_relaxed(val, gicv2.map_dbase + offset);
+}
+
+static inline uint32_t readl_gicd(unsigned int offset)
+{
+return readl_relaxed(gicv2.map_dbase + offset);
+}
+
+static inline void writel_gicc(uint32_t val, unsigned int offset)
+{
+unsigned int page = offset >> PAGE_SHIFT;
+offset &= ~PAGE_MASK;
+writel_relaxed(val, gicv2.map_cbase[page] + offset);
+}
+
+static inline uint32_t readl_gicc(unsigned int offset)
+{
+unsigned int page = offset >> PAGE_SHIFT;
+offset &= ~PAGE_MASK;
+return readl_relaxed(gicv2.map_cbase[page] + offset);
+}
+
+static inline void writel_gich(uint32_t val, unsigned int offset)
+{
+writel_relaxed(val, gicv2.map_hbase + offset);
+}
+
+static inline uint32_t readl_gich(int unsigned offset)
+{
+return readl_relaxed(gicv2.map_hbase + offset);
+}
+
+static unsigned int gicv2_cpu_mask(const cpumask_t *cpumask)
+{
+unsigned int cpu;
+unsigned int mask = 0;
+cpumask_t possible_mask;
+
+cpumask_and(&possible_mask, cpumask, &cpu_possible_map);
+for_each_cpu( cpu, &possible_mask )
+{
+ASSERT(cpu < NR_GIC_CPU_IF);
+mask |= per_cpu(gic_cpu_id, cpu);
+}
+
+return mask;
+}
+
+static void gicv2_save_state(struct vcpu *v)
+{
+int i;
+
+/* No need for spinlocks here because interrupts are disabled around
+ * this call and it only accesses struct vcpu fields that cannot be
+ *
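
The GICH_V2_LR_* definitions above describe how a GICv2 list register is packed. A minimal standalone sketch of encoding and decoding one with the same shifts and masks follows; lr_encode(), lr_virq() and lr_state() are hypothetical names, not the functions in gic-v2.c. The constants use unsigned literals because in ISO C shifting 1 into the sign bit of a signed int (1 << 31) is undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as the GICH_V2_LR_* defines above, as unsigned constants. */
#define LR_VIRTUAL_MASK    0x3ffu
#define LR_VIRTUAL_SHIFT   0
#define LR_PHYSICAL_MASK   0x3ffu
#define LR_PHYSICAL_SHIFT  10
#define LR_PRIORITY_MASK   0x1fu
#define LR_PRIORITY_SHIFT  23
#define LR_STATE_MASK      0x3u
#define LR_STATE_SHIFT     28          /* 1 = pending, 2 = active, 3 = both */
#define LR_HW              (1u << 31)

/* Hypothetical helper: pack virtual IRQ, state, 5-bit priority and, for
 * hardware-backed interrupts, the HW bit plus the physical IRQ. */
static uint32_t lr_encode(unsigned int virq, unsigned int state,
                          unsigned int prio, int hw, unsigned int pirq)
{
    uint32_t lr = ((state & LR_STATE_MASK) << LR_STATE_SHIFT) |
                  ((prio & LR_PRIORITY_MASK) << LR_PRIORITY_SHIFT) |
                  ((virq & LR_VIRTUAL_MASK) << LR_VIRTUAL_SHIFT);

    if ( hw )
        lr |= LR_HW | ((pirq & LR_PHYSICAL_MASK) << LR_PHYSICAL_SHIFT);
    return lr;
}

/* Hypothetical helpers: extract fields back out of a list register. */
static unsigned int lr_virq(uint32_t lr)
{
    return (lr >> LR_VIRTUAL_SHIFT) & LR_VIRTUAL_MASK;
}

static unsigned int lr_state(uint32_t lr)
{
    return (lr >> LR_STATE_SHIFT) & LR_STATE_MASK;
}
```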

[Xen-devel] Pygrub backports

2015-02-27 Thread Ian Jackson

I think the following commits from master should be considered for
backport:

0c12e5b7427b4dfd2dfabf21f6b0e6e24bc8e864
tools/pygrub: Fix extlinux when /boot is a separate partition from /

d1b93ea2615bd789ee28901f1f1c05ffb319cb61
tools/pygrub: Make pygrub understand default entry in string format

4ee393f9d6528640c29a0554fdc6cb3e795fb6e8
pygrub: fix non-interactive parsing of grub1 config files

3b279811707dab4bab95c2e952e94ebf4d6badd9
pygrub: Fix regression from c/s d1b93ea, attempt 2


Existing Xen 4.4.1 as found in Ubuntu cannot parse the grub.cfg files
that Ubuntu itself generates, which was:

Reported-by: Owen Dunn osd1...@cam.ac.uk

Owen kindly tested pygrub from xen.git#master (merged with the
Debian/Ubuntu patchset, provided by me) and reports that it worked in
his setup.


Opinions?

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 3/5] x86: widen NUMA nodes to be allocated from

2015-02-27 Thread Dario Faggioli

On Fri, 2015-02-27 at 13:36 +, Jan Beulich wrote:
  On 27.02.15 at 14:27, dario.faggi...@citrix.com wrote:

  I'm asking because I really don't like vcpu_to_node(). And I'm not
  talking about how it is implemented (there probably are not many
  alternatives), I'm saying I don't think it should exist, and I really
  would see value in killing it. :-)
 
 I'm all for killing it. In fact I'd also like to see domain_to_node()
 go away, as it's similarly bogus (regardless of the proposed
 implementation change) - neither a vCPU nor a domain has
 a focus node or some such (some may happen to if their node
 mask has just a single set bit, but that's nothing code should
 depend on). 

I totally agree. I didn't go as far as suggesting that because,
if my grep-ing is not failing, it's still in use in two more places,
even with your series applied.

But yes, we really should make it possible to remove it too.

 (And btw, at the very least first_node() in your
 proposal should become any_node().)
 
Except, there is no such function. But again, I agree, and if we get to
the point where we can kill vcpu_to_node() but need to keep
domain_to_node, we can of course implement it. :-)

Regards,
Dario



Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-27 Thread Ian Campbell

On Fri, 2015-02-27 at 14:33 +, Stefano Stabellini wrote:
 On Thu, 26 Feb 2015, Ian Campbell wrote:
  On Thu, 2015-02-26 at 15:39 +0530, Manish Jaggi wrote:
   Have you reached a conclusion?
  
  My current thinking on how PCI for Xen on ARM should look is thus:
  
  xen/arch/arm/pci.c:
  
  New file, containing core PCI infrastructure for ARM. Includes:
  
  pci_hostbridge_register(), which registers a host bridge:
  
  Registration includes:
  DT node pointer
  
  CFG space address
  
  pci_hostbridge_ops function table, which
  contains e.g. cfg space read/write ops, perhaps
  other stuff).
  
  Function for setting the (segment,bus) for a given host bridge.
  Let's say pci_hostbridge_setup(); the host bridge must have been
  previously registered. Looks up the host bridge via CFG space
  address and maps that to (segment,bus).
  
  Functions for looking up host bridges by various keys as needed
  (cfg base address, DT node, etc)
  
  pci_init() function, called from somewhere appropriate in
  setup.c which calls device_init(node, DEVICE_PCIHOST, NULL) (see
  gic_init() for the shape of this)
  
  Any other common helper functions for managing PCI devices, e.g.
  for implementing PHYSDEVOP_*, which cannot be made properly
  common (i.e. shared with x86).
  
  xen/drivers/pci/host-*.c (or pci/host/*.c):
  
  New files, one per supported PCI controller IP block. Each
  should use the normal DT_DEVICE infrastructure for probing,
  i.e.:
  DT_DEVICE_START(foo, "FOO", DEVICE_PCIHOST)
  
  Probe function should call pci_hostbridge_register() for each
  host bridge which the controller exposes.
  
  xen/arch/arm/physdev.c:
  
  Implements do_physdev_op handling PHYSDEVOP_*. Includes:
  
  New hypercall subop PHYSDEVOP_pci_host_bridge_add:
  
  As per 1424703761.27930.140.ca...@citrix.com which
  calls pci_hostbridge_setup() to map the (segment,bus) to
  a specific pci_hostbridge_ops (i.e. must have previously
  been registered with pci_hostbridge_register(), else
  error).
 
 I think that the new hypercall is unnecessary. We know the MMCFG address
 ranges belonging to a given host bridge from DT and
 PHYSDEVOP_pci_mmcfg_reserved gives us segment, start_bus and end_bus for
 a specific MMCFG.

My understanding from discussion with Jan was that this is not what this
hypercall does, or at least that this would be an abuse of the existing
interface. See: 54e75d87027800062...@mail.emea.novell.com

Anyway, what happens when there is no MMCFG table to drive dom0's
calls to pci_mmcfg_reserved? Or when a given host bridge doesn't have
special flags and so isn't mentioned there?

I think a dedicated hypercall is better.

Ian.



Re: [Xen-devel] Pygrub backports

2015-02-27 Thread Ian Jackson

Jan Beulich writes (Re: Pygrub backports):
  On 27.02.15 at 13:29, ian.jack...@eu.citrix.com wrote:
  I think the following commits from master should be considered for
  backport:
 
 Looks reasonable. Question is - do you still want this for 4.4.2 or
 only afterwards? If for it, then can these please go in before RC2
 (which really is only pending a push on the branches)?

Well, TBH I was kind of surprised that we hadn't queued these as
backports anyway.  Backporting pygrub improvements is important for
compatibility with newer guests.

So if you don't mind too much, can we have them in 4.4.2 ?  In which
case I would push them right away.

Ian.


Re: [Xen-devel] [PATCH] x86/Dom0: account for shadow/HAP allocation

2015-02-27 Thread Andrew Cooper

On 27/02/15 13:21, Jan Beulich wrote:
 On 27.02.15 at 13:02, andrew.coop...@citrix.com wrote:
 On 26/02/15 07:43, Jan Beulich wrote:
 On 25.02.15 at 18:06, andrew.coop...@citrix.com wrote:
 On 25/02/15 14:45, Jan Beulich wrote:
 +static unsigned long __init dom0_paging_pages(const struct domain *d,
 +  unsigned long nr_pages)
 +{
 +/* Copied from: libxl_get_required_shadow_memory() */
 +unsigned long memkb = nr_pages * (PAGE_SIZE / 1024);
 +
 +memkb = 4 * (256 * d->max_vcpus + 2 * (memkb / 1024));
 I have recently raised a bug against Xapi for similar wrong logic when
 calculating the size of the shadow pool.

 A per-vcpu reservation of shadow allocation is only needed if shadow
 paging is actually in use, and even then should match
 shadow_min_acceptable_pages() at 128 pages per vcpu.

 If HAP is in use, the only allocations from the shadow pool are for the
 EPT/NPT tables (1% of nr_pages), IOMMU tables (another 1% of nr_pages if
 in use), and the logdirty radix tree (substantially less than 1% of
 nr_pages).

 One could argue that structures such as the vmcs/vmcb should have their
 allocations accounted against the domain, in which case a small per-vcpu
 component would be appropriate.

 However as it currently stands, this calculation wastes 4MB of ram per
 vcpu in shadow allocation which is not going to be used.
 But you realize that the functional change here explicitly only covers
 the shadow case - the PVH (i.e. HAP) case is effectively unchanged
 (merely correcting the mistake of not accounting for what gets
 actually allocated), and I don't intend any functional change for PVH
 (other than said bug fix) with this patch.
 Ok

 Hence correcting this (i.e.
 lowering the accounted for as well as the allocated amount) as well
 as adding accounting for VMCS/VMCB (just like we account for
 struct vcpu) should be the subject of a separate patch, presumably
 by someone actively working on PVH (and then perhaps at once for
 libxc). I also think that this calculation would better become a paging
 variant specific hook if calculations differ between shadow and HAP.
 That would be better, in the long run.
 Taking this together, can I read this as an ack then?

Acked-by: Andrew Cooper andrew.coop...@citrix.com
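
For reference, the quoted memkb lines evaluated on their own, as a sketch assuming 4 KiB pages. The rest of dom0_paging_pages(), which turns the result into a page count, is not quoted in this thread, so paging_kb() is a hypothetical stand-in that stops at the KiB figure.

```c
#include <assert.h>

/* Assumption: 4 KiB pages, so PAGE_SIZE / 1024 == 4. */
#define PAGE_SIZE 4096UL

/* The two memkb lines from dom0_paging_pages() above; result in KiB. */
static unsigned long paging_kb(unsigned long nr_pages, unsigned int max_vcpus)
{
    unsigned long memkb = nr_pages * (PAGE_SIZE / 1024);   /* RAM in KiB */

    /* 256 pages per vCPU plus 2 pages per MiB of RAM, 4 KiB each. */
    return 4 * (256 * max_vcpus + 2 * (memkb / 1024));
}
```

For example, a 1 GiB dom0 (262144 pages) with 4 vCPUs gives paging_kb(262144, 4) == 12288 KiB.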


 Jan




Re: [Xen-devel] [PATCH 3/4] xen: sched: make counters for vCPU tickling generic

2015-02-27 Thread Meng Xu

2015-02-27 5:53 GMT-05:00 Dario Faggioli dario.faggi...@citrix.com:
 On Fri, 2015-02-27 at 00:47 -0500, Meng Xu wrote:

 2015-02-26 8:37 GMT-05:00 Dario Faggioli dario.faggi...@citrix.com:
 and update them from Credit2 and RTDS schedulers.

 Signed-off-by: Dario Faggioli dario.faggi...@citrix.com
 Cc: Meng Xu xumengpa...@gmail.com
 Cc: George Dunlap george.dun...@eu.citrix.com
 Cc: Jan Beulich jbeul...@suse.com
 Cc: Keir Fraser k...@xen.org
 ---
  xen/common/sched_credit2.c   |2 ++
  xen/common/sched_rt.c|2 ++
  xen/include/xen/perfc_defn.h |4 ++--
  3 files changed, 6 insertions(+), 2 deletions(-)

 The change for RTDS scheduler looks good to me.

 Does this count as a Reviewed-by: Meng Xu men...@cis.upenn.edu ?

 Also, if yes, does it also apply to patch #2 ? That is unclear as
 sched_rt.c is modified in patches #1, #2 and #3, while what you did is:
  - you explicitly provided the tag for patch #1
  - you said looks good for this for patch #3
  - you said nothing for patch #2

 The bottom line of all this being: with Ack-s/Reviewed-by-s, it's always
 better to be pretty explicit! :-D

I see. Thank you very much, Dario, for explaining this to me! :-)

After you add return before no_tickle:, this patch is good to go, IMHO.

So after this change,
Reviewed-by: Meng Xu men...@cis.upenn.edu

Thank you very much!

Best,

Meng

-- 

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


Re: [Xen-devel] [PATCH v3 08/24] xen/arm: Allow virq != irq

2015-02-27 Thread Julien Grall

Hi Ian,

On 20/02/15 15:52, Ian Campbell wrote:
 As DOM0 will get most of the devices, the vIRQ is equal to the IRQ in that case.
 
 Am I correct that after this patch all callers still pass irq==virq to
 the new function?

Sorry, I forgot to answer this question. Yes, all the callers will
pass irq == virq in case of DOM0.

Regards,

-- 
Julien Grall


Re: [Xen-devel] [PATCH] xen/arm: Handle translated addresses for hardware domains in GICv2

2015-02-27 Thread Ian Campbell

On Fri, 2015-02-27 at 13:53 +, Julien Grall wrote:
 Hi Frediano,
 
 On 25/02/15 13:21, Frediano Ziglio wrote:
  Translated addresses (in d->arch.vgic.{c,d}base) are now bus addresses
  which cannot always be applied to the DT.
  Copy the original addresses from DT directly to get the original
  untranslated reg property, which will give the same d->arch.vgic.{c,d}base
  values once translated again.
  
  Signed-off-by: Frediano Ziglio frediano.zig...@huawei.com
  ---
   xen/arch/arm/gic-v2.c | 25 ++---
   1 file changed, 14 insertions(+), 11 deletions(-)
  
  Fixed typos in comments.
  
  diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
  index 31fb81a..a401e3f 100644
  --- a/xen/arch/arm/gic-v2.c
  +++ b/xen/arch/arm/gic-v2.c
  @@ -590,7 +590,7 @@ static int gicv2_make_dt_node(const struct domain *d,
   const struct dt_device_node *gic = dt_interrupt_controller;
   const void *compatible = NULL;
   u32 len;
  -__be32 *new_cells, *tmp;
  +const __be32 *regs;
   int res = 0;
   
   compatible = dt_get_property(gic, "compatible", &len);
  @@ -617,18 +617,21 @@ static int gicv2_make_dt_node(const struct domain *d,
   if ( res )
   return res;
   
  -len = dt_cells_to_size(dt_n_addr_cells(node) + dt_n_size_cells(node));
  -len *= 2; /* GIC has two memory regions: Distributor + CPU interface */
  -new_cells = xzalloc_bytes(len);
  -if ( new_cells == NULL )
  -return -FDT_ERR_XEN(ENOMEM);
  +/*
  + * DTB provides up to 4 regions to handle virtualization
 
 Sorry to ask for more changes.
 
 I'm not sure why you speak about virtualization here.

Because two of the regions are GICH and GICV, and those are the ones we
are truncating out here.
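
The GIC binding lists the reg regions in the order GICD, GICC, GICH, GICV, each taking #address-cells + #size-cells cells, so copying only the first two (address, size) pairs drops the virtualization regions. A hypothetical standalone helper sketching that arithmetic (the removed code above computed the same length with dt_cells_to_size() and len *= 2); the cells are copied verbatim, so their big-endian encoding is preserved without conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy the first `nregions` (address, size) pairs of
 * a DT "reg" property into `out`. For a GIC node, nregions = 2 keeps
 * GICD and GICC and drops GICH and GICV. Returns the cells written.
 */
static size_t keep_first_regions(const uint32_t *reg, size_t ncells,
                                 unsigned int addr_cells,
                                 unsigned int size_cells,
                                 unsigned int nregions, uint32_t *out)
{
    size_t per_region = (size_t)addr_cells + size_cells;
    size_t want;

    assert(per_region > 0);
    want = per_region * nregions;
    if ( want > ncells )
        want = ncells - (ncells % per_region);   /* whole regions only */
    memcpy(out, reg, want * sizeof(*reg));
    return want;
}
```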

 
 Also, can you write somewhere that the GICC and GICD are the first 2
 regions of the reg?
 
 Other than that this patch looks good to me:
 
 Reviewed-by: Julien Grall julien.gr...@linaro.org
 
 Regards,
 




Re: [Xen-devel] [PATCH 3/5] x86: widen NUMA nodes to be allocated from

2015-02-27 Thread Dario Faggioli

On Fri, 2015-02-27 at 13:46 +, Ian Campbell wrote:
 On Fri, 2015-02-27 at 13:27 +, Dario Faggioli wrote:

  After this series, vcpu_to_node() (defined in xen/include/xen/numa.h) is
  left with only one use, in xen/arch/arm/domain.c, besides of course
  being used to implement domain_to_node() (still in
  xen/include/xen/numa.h).
  
  So, provided ARM people (and I'm Cc-ing them) can get rid of that,
 
 Happy to do so if you have advice on what to replace it with, just 0?
 
As Julien says, with the MEMF_no_owner feature Jan is introducing in the
series.

 We don't do NUMA yet on ARM so that would be fine, but eventually we'd
 want the vcpu stack to be allocated in some sensible location relative
 to the vcpu affinity...
 
Yes, and Jan's MEMF_no_owner, if it works on your arch too, as it seems
it could, will provide exactly that.

Regards,
Dario



Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-27 Thread Stefano Stabellini

On Thu, 26 Feb 2015, Ian Campbell wrote:
 On Thu, 2015-02-26 at 15:39 +0530, Manish Jaggi wrote:
  Have you reached a conclusion?
 
 My current thinking on how PCI for Xen on ARM should look is thus:
 
 xen/arch/arm/pci.c:
 
 New file, containing core PCI infrastructure for ARM. Includes:
 
 pci_hostbridge_register(), which registers a host bridge:
 
 Registration includes:
 DT node pointer
 
 CFG space address
 
 pci_hostbridge_ops function table, which
 contains e.g. cfg space read/write ops, perhaps
 other stuff).
 
 Function for setting the (segment,bus) for a given host bridge.
 Let's say pci_hostbridge_setup(); the host bridge must have been
 previously registered. Looks up the host bridge via CFG space
 address and maps that to (segment,bus).
 
 Functions for looking up host bridges by various keys as needed
 (cfg base address, DT node, etc)
 
 pci_init() function, called from somewhere appropriate in
 setup.c which calls device_init(node, DEVICE_PCIHOST, NULL) (see
 gic_init() for the shape of this)
 
 Any other common helper functions for managing PCI devices, e.g.
 for implementing PHYSDEVOP_*, which cannot be made properly
 common (i.e. shared with x86).
 
 xen/drivers/pci/host-*.c (or pci/host/*.c):
 
 New files, one per supported PCI controller IP block. Each
 should use the normal DT_DEVICE infrastructure for probing,
 i.e.:
 DT_DEVICE_START(foo, "FOO", DEVICE_PCIHOST)
 
 Probe function should call pci_hostbridge_register() for each
 host bridge which the controller exposes.
 
 xen/arch/arm/physdev.c:
 
 Implements do_physdev_op handling PHYSDEVOP_*. Includes:
 
 New hypercall subop PHYSDEVOP_pci_host_bridge_add:
 
 As per 1424703761.27930.140.ca...@citrix.com which
 calls pci_hostbridge_setup() to map the (segment,bus) to
 a specific pci_hostbridge_ops (i.e. must have previously
 been registered with pci_hostbridge_register(), else
 error).

I think that the new hypercall is unnecessary. We know the MMCFG address
ranges belonging to a given host bridge from DT and
PHYSDEVOP_pci_mmcfg_reserved gives us segment, start_bus and end_bus for
a specific MMCFG. We don't need anything else: we can simply match the
host bridge based on the MMCFG address that dom0 tells us via
PHYSDEVOP_pci_mmcfg_reserved with the addresses on DT.

But we do need to support PHYSDEVOP_pci_mmcfg_reserved on ARM.
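
A minimal sketch of the registry both approaches imply: controllers register by CFG/MMCFG base at probe time, a later call (a dedicated PHYSDEVOP_pci_host_bridge_add, or matching the address passed via PHYSDEVOP_pci_mmcfg_reserved) binds a segment and bus range to the bridge with that base, and subsequent operations look bridges up by (segment, bus). The struct, function names and fixed-size table are hypothetical illustrations, not Xen code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host bridge record; not an existing Xen structure. */
struct hostbridge {
    uint64_t cfg_base;           /* CFG (MMCFG) base address from the DT */
    int      segment;            /* -1 until a (segment, bus) is bound */
    uint8_t  start_bus, end_bus;
};

#define MAX_BRIDGES 8
static struct hostbridge bridges[MAX_BRIDGES];
static size_t nr_bridges;

/* Probe-time registration, in the spirit of pci_hostbridge_register(). */
static int bridge_register(uint64_t cfg_base)
{
    if ( nr_bridges == MAX_BRIDGES )
        return -1;
    bridges[nr_bridges].cfg_base = cfg_base;
    bridges[nr_bridges].segment = -1;
    nr_bridges++;
    return 0;
}

/* Bind (segment, bus range) to the bridge whose CFG base matches. */
static int bridge_setup(uint64_t cfg_base, int segment,
                        uint8_t start_bus, uint8_t end_bus)
{
    for ( size_t i = 0; i < nr_bridges; i++ )
    {
        if ( bridges[i].cfg_base != cfg_base )
            continue;
        bridges[i].segment = segment;
        bridges[i].start_bus = start_bus;
        bridges[i].end_bus = end_bus;
        return 0;
    }
    return -1;                   /* never registered: reject */
}

/* Find the bridge owning (segment, bus), e.g. for a device-add call. */
static struct hostbridge *bridge_lookup(int segment, uint8_t bus)
{
    for ( size_t i = 0; i < nr_bridges; i++ )
        if ( bridges[i].segment == segment &&
             bus >= bridges[i].start_bus && bus <= bridges[i].end_bus )
            return &bridges[i];
    return NULL;
}
```

Either hypercall design ends up calling something like bridge_setup(); the disagreement is only about which interface supplies its arguments.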


 PHYSDEVOP_pci_device_add/remove: Implement existing hypercall
 interface used by x86 for ARM.
 
 This requires that PHYSDEVOP_pci_host_bridge_add has
 been called for the (segment,bus) which it refers to,
 otherwise error.
 
 Looks up the host bridge and does whatever setup is
 required plus e.g. calling of pci_add_device().
 
 No doubt various other existing interfaces will need wiring up, e.g.
 pci_conf_{read,write}* should lookup the host bridge ops struct and call
 the associated method.
 
 I'm sure the above must be incomplete, but I hope the general shape
 makes sense?
 
I think it makes sense and it is along the lines of what I was thinking
too.


[Xen-devel] [PATCH v7 3/3] xen/arm: Force dom0 to use normal GICv2 driver on Hip04 platform

2015-02-27 Thread Frediano Ziglio

Until vGIC support is implemented and tested, this prevents guest
kernels from using their Hip04 driver, or from crashing when they
don't have one.

Signed-off-by: Frediano Ziglio frediano.zig...@huawei.com
---
 xen/arch/arm/gic-hip04.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/gic-hip04.c b/xen/arch/arm/gic-hip04.c
index 9977f9b..a7c0892 100644
--- a/xen/arch/arm/gic-hip04.c
+++ b/xen/arch/arm/gic-hip04.c
@@ -614,17 +614,21 @@ static int hip04gic_make_dt_node(const struct domain *d,
   const struct dt_device_node *node, void *fdt)
 {
 const struct dt_device_node *gic = dt_interrupt_controller;
-const void *compatible = NULL;
+const void *compatible;
 u32 len;
 const __be32 *regs;
 int res = 0;
 
-compatible = dt_get_property(gic, "compatible", &len);
-if ( !compatible )
-{
-dprintk(XENLOG_ERR, "Can't find compatible property for the gic node\n");
-return -FDT_ERR_XEN(ENOENT);
-}
+/*
+ * Replace the compatible string with a standard one so that dom0
+ * sees a standard GIC. This works because GICC is compatible with
+ * the standard one and GICD (emulated by Xen) is standard too;
+ * otherwise we would have to implement the HIP04 GICD in the
+ * virtual GIC.
+ * As a side effect, this limits dom0 to 8 CPUs.
+ */
+compatible = DT_COMPAT_GIC_CORTEX_A15;
+len = strlen((char*) compatible) + 1;
 
 res = fdt_begin_node(fdt, "interrupt-controller");
 if ( res )
-- 
1.9.1




Re: [Xen-devel] [PATCH] xen: correct bug in p2m list initialization

2015-02-27 Thread David Vrabel

On 27/02/15 14:45, Juergen Gross wrote:
 Commit 054954eb051f35e74b75a566a96fe756015352c8 (xen: switch to
 linear virtual mapped sparse p2m list) introduced an error.
 
 During initialization of the p2m list a p2m identity area mapped by
 a complete identity pmd entry has to be split up into smaller chunks
 sometimes, if a non-identity pfn is introduced in this area.
 
 If this non-identity pfn is not at index 0 of a p2m page the new
 p2m page needed is initialized with wrong identity entries, as the
 identity pfns don't start with the value corresponding to index 0,
 but with the initial non-identity pfn. This results in weird wrong
 mappings.
 
 Correct the wrong initialization by starting with the correct pfn.

Applied to stable/for-linus-4.0, thanks.

David


Re: [Xen-devel] Pygrub backports

2015-02-27 Thread Ian Jackson

Ian Campbell writes (Re: Pygrub backports):
 Sounds good. If we could also get an example of the problematic grub.cfg
 to be checked into xen.git/tools/pygrub/examples that would be handy
 too.

I have asked the reporter for a (suitably-laundered) copy and some
info about how it was generated.

Ian.


Re: [Xen-devel] [PATCH v3 08/24] xen/arm: Allow virq != irq

2015-02-27 Thread Ian Campbell

On Fri, 2015-02-27 at 14:33 +, Julien Grall wrote:
 Hi Ian,
 
 On 20/02/15 17:09, Julien Grall wrote:
  On 20/02/15 15:52, Ian Campbell wrote:
   
   action = xmalloc(struct irqaction);
  -if (!action)
  +if ( !action )
  +return -ENOMEM;
  +
  +info = xmalloc(struct irq_guest);
 
  FWIW you might (subject to sizing/alignment needs) be able to do
 action = _xmalloc(sizeof(struct irqaction) + sizeof(struct irq_guest));
 info = (struct irq_guest *)(action + 1);
 
  which would save some memory overhead for free pointers etc and allow
  you to avoid manually managing the info.
 
  You probably won't like that though, so feel free to ignore.
  
  Actually it's a good idea :). I hadn't thought about it.
 
 I thought about it. The pointer to irq_guest may not be correctly aligned
 with this solution, right?

It depends on sizeof(struct irqaction) (which is what I meant by
subject to...). It'd probably need a ROUNDUP(sizeof(foo),
pointer-alignment) in there somewhere.

 So I prefer to keep separate the allocation. We can revisit it later.

OK.
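
The ROUNDUP point can be illustrated with a standalone sketch: put the second object at an offset rounded up to its own alignment so both fit in one allocation and a single free() releases them. The _demo structs are stand-ins chosen so that the rounding actually matters (the real irqaction/irq_guest layouts are not shown in this thread), and plain malloc() is used instead of Xen's allocator.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

/* Round x up to a multiple of a (a must be a power of two). */
#define ROUNDUP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* Stand-in structs: the first has a size that is not a multiple of the
 * second's alignment, so placing the second at (action + 1) would be
 * misaligned. */
struct irqaction_demo { char name[12]; uint16_t flags; };
struct irq_guest_demo { void *d; uint64_t state; };

#define GUEST_OFFSET \
    ROUNDUP(sizeof(struct irqaction_demo), alignof(struct irq_guest_demo))

/* One allocation holding both objects; free(action) releases both. */
static struct irqaction_demo *alloc_pair(struct irq_guest_demo **guest)
{
    /* malloc() returns memory suitably aligned for any standard type. */
    char *p = malloc(GUEST_OFFSET + sizeof(struct irq_guest_demo));

    if ( !p )
        return NULL;
    *guest = (struct irq_guest_demo *)(p + GUEST_OFFSET);
    return (struct irqaction_demo *)p;
}
```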





Re: [Xen-devel] [PATCH 3/5] x86: widen NUMA nodes to be allocated from

2015-02-27 Thread Dario Faggioli

On Fri, 2015-02-27 at 13:38 +, Julien Grall wrote:

  Signed-off-by: Jan Beulich jbeul...@suse.com
 
  Reviewed-by: Dario Faggioli dario.faggi...@citrix.com
  
  One question (a genuine one, i.e., I'm really not sure what I'm saying
  is correct).
  
  After this series, vcpu_to_node() (defined in xen/include/xen/numa.h) is
  left with only one use, in xen/arch/arm/domain.c, besides of course
  being used to implement domain_to_node() (still in
  xen/include/xen/numa.h).
  
  So, provided ARM people (and I'm Cc-ing them) can get rid of that, can
  that macro be removed altogether, and domain_to_node(d) be defined
  after d->node_affinity... something like:

 Given the changes made by Jan on x86, I think we could replace
 vcpu_to_node by MEMF_no_owner.
 
I expected this to be the case. Happy to hear it is! :-)

 FWIW, we don't have any NUMA support on ARM currently.
 
I know.

Thanks and Regards,
Dario



Re: [Xen-devel] RFC: xen config changes v4

2015-02-27 Thread Konrad Rzeszutek Wilk

  This is not meant to be a performance improvement. It is meant to enable
  a standard distro kernel configured without PARAVIRT to be able to run
  as a HVM guest using the pv-drivers.
  
 This is not a convincing explanation.  Debian, Ubuntu and Fedora seem
 to be able to cope with it just fine.

No, they are not. The 32-bit Fedora Core 21 LiveISO is non-PAE. I think the
same was true of Ubuntu.
 
 Why do you want to do that, even though it will cause a performance
 regression and a maintenance pain?  You haven't provided a reason yet.


Re: [Xen-devel] RFC: xen config changes v4

2015-02-27 Thread Konrad Rzeszutek Wilk

On Fri, Feb 27, 2015 at 09:53:46AM -0800, Luis R. Rodriguez wrote:
 On Fri, Feb 27, 2015 at 6:30 AM, Juergen Gross jgr...@suse.com wrote:
  On 02/27/2015 02:38 PM, Stefano Stabellini wrote:
 
  On Fri, 27 Feb 2015, Juergen Gross wrote:
 
  On 02/27/2015 01:24 PM, Stefano Stabellini wrote:
 
  On Fri, 27 Feb 2015, Juergen Gross wrote:
 
  On 02/27/2015 11:11 AM, Stefano Stabellini wrote:
 
  On Fri, 27 Feb 2015, Juergen Gross wrote:
 
  On 02/27/2015 10:41 AM, Stefano Stabellini wrote:
 
  On Fri, 27 Feb 2015, Juergen Gross wrote:
 
  On 02/26/2015 06:42 PM, Stefano Stabellini wrote:
 
  On Thu, 26 Feb 2015, Luis R. Rodriguez wrote:
 
  On Thu, Feb 26, 2015 at 11:08:20AM +, Stefano Stabellini wrote:
 
  On Thu, 26 Feb 2015, David Vrabel wrote:
 
  On 26/02/15 04:59, Juergen Gross wrote:
 
 
  So we are again in the situation that pv-drivers always imply the
  pvops kernel (PARAVIRT selected). I started the whole Kconfig rework
  to eliminate this dependency.
 
 
  Yes.  Can you produce a series that just addresses this one issue.
 
  In the absence of any concrete requirement for this big Kconfig reorg
  I don't think it is helpful.
 
 
  I clearly missed some context as I didn't realize that this was the
  intended goal. Why do we want this? Please explain as it won't come
  for free.
 
 
  We have a few PV interfaces for HVM guests that need PARAVIRT in
  Linux in order to be used, for example pv_time_ops and
  HVMOP_pagetable_dying. They are critical performance improvements and,
  from the interface perspective, small enough that it doesn't make much
  sense having a separate KConfig option for them.
 
 
  In order to reach the goal above we necessarily need to introduce a
  differentiation in terms of PV on HVM guests in Linux:

  1) basic guests with PV network, disk, etc but no PV timers, no
     HVMOP_pagetable_dying, no PV IPIs
  2) full PV on HVM guests that have PV network, disk, timers,
     HVMOP_pagetable_dying, PV IPIs and anything else that makes sense.

  2) is much faster than 1) on Xen and 2) is only a tiny bit slower than
  1) on native x86
 
 
  Also don't we shove 2) down hvm guests right now? Even when everything
  is built in I do not see how we opt out for HVM for 1) at run time
  right now.
 
  If this is true then the question of motivation for this becomes even
  stronger I think.
 
 
  Yes, indeed there is no way to do 1) at the moment. And for good
  reasons, see above.
 
 
  Hmm, after checking the code I'm not convinced:
 
  - HVMOP_pagetable_dying is obsolete on modern hardware supporting
    EPT/HAP
 
 
  That might be true, but what about older hardware?
  Even on modern hardware a few workloads still run faster on shadow.
  But if HVMOP_pagetable_dying is the only reason to keep PARAVIRT for
  HVM guests, then I agree with you that we should remove it.
 
 
  - PV IPIs are not needed on single-vcpu guests
 
  - PARAVIRT_CLOCK doesn't need PARAVIRT (in fact the SUSE kernel
    configs for all x86_64 kernels have CONFIG_PARAVIRT_CLOCK=y)
 
  So I think we really should enable building Xen frontends without
  PARAVIRT, implying at least no XEN_PV and no XEN_PVH.
 
  I'll have a try setting up patches.
 
 
  If we are doing this as a performance improvement, I would like to
  see a couple of benchmarks (kernbench, hackbench) to show that on a
  single-vcpu guest and multi-vcpu guest (let's say 4 vcpus) disabling
  PARAVIRT leads to better performance on Xen on EPT hardware.
 
 
  This is not meant to be a performance improvement. It is meant to
  enable a standard distro kernel configured without PARAVIRT to be able
  to run as a HVM guest using the pv-drivers.
 
 
  This is not a convincing explanation. Debian, Ubuntu and Fedora seem
  to be able to cope with it just fine.
 
  Why do you want to do that, even though it will cause a performance
  regression and a maintenance pain?  You haven't provided a reason yet.
 
 
  Either we are talking about different things, or I really don't
  understand your problem here. I don't want to disable something. I
  just want to enable kernels without PARAVIRT to run under Xen better
  than today, be it 32-bit non-PAE kernels as Ian pointed out or
  distro kernels like SLES and probably RHEL.
 
  Using PV frontends is completely orthogonal to other PV enhancements
  like PARAVIRT_CLOCK, HVMOP_pagetable_dying or PV IPIs. So why do you
  object to enabling the PV frontends for those kernels?
 
 
  I am for it, but I would like to avoid two user-visible Xen enablement
  options (XEN_FRONTEND vs. XEN_PVHVM) for x86_64 and PAE HVM guests, to
  avoid configurations with just XEN_FRONTEND, which could be considered a
  performance regression compared to what we have now (on x86_64 and PAE).
 
 
  Would you be okay with making this an expert configuration alternative
  for PAE/x86_64? This would enable the possibility to use PV drivers for

[Xen-devel] [PATCH v5] tools/xenconsoled: Increase file descriptor limit

2015-02-27 Thread Andrew Cooper

XenServer's VM density testing uncovered a regression when moving from
sysvinit to systemd where the file descriptor limit dropped from 4096 to
1024. (XenServer had previously inserted a ulimit statement into its
initscripts.)

One solution is to use LimitNOFILE=4096 in xenconsoled.service to match the
lost ulimit, but that is only a stopgap solution.

As Xenconsoled genuinely needs a large number of file descriptors if a large
number of domains are running, attempt to increase the limit.

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com

---
v5:
 * Drop system maximum checking
 * Unify set paths
v4:
 * Calculate fd limit based on domid ABI - result is 132008 fds
 * Warn if sufficient fds are not available.
v3:
 * Hide Linux specific bits in #ifdef __linux__
v2:
 * Always increase soft limit to hard limit
 * Correct comment regarding number of file descriptors
 * long -> unsigned long as that appears to be the underlying type of an rlim_t
---
 tools/console/daemon/main.c |   36 
 1 file changed, 36 insertions(+)

diff --git a/tools/console/daemon/main.c b/tools/console/daemon/main.c
index 92d2fc4..6e84f5a 100644
--- a/tools/console/daemon/main.c
+++ b/tools/console/daemon/main.c
@@ -26,6 +26,7 @@
 #include <string.h>
 #include <signal.h>
 #include <sys/types.h>
+#include <sys/resource.h>
 
 #include <xenctrl.h>
 
@@ -55,6 +56,39 @@ static void version(char *name)
 	printf("Xen Console Daemon 3.0\n");
 }
 
+static void increase_fd_limit(void)
+{
+	/*
+	 * We require many file descriptors:
+	 * - per domain: pty master, pty slave, logfile and evtchn
+	 * - misc extra: hypervisor log, privcmd, gntdev, std...
+	 *
+	 * Allow a generous 1000 for misc, and calculate the maximum possible
+	 * number of fds which could be used.
+	 */
+	unsigned min_fds = (DOMID_FIRST_RESERVED * 4) + 1000;
+	struct rlimit lim, new = { min_fds, min_fds };
+
+	if (getrlimit(RLIMIT_NOFILE, &lim) < 0) {
+		fprintf(stderr, "Failed to obtain fd limit: %s\n",
+			strerror(errno));
+		exit(1);
+	}
+
+	/* Do we already have sufficient? Great! */
+	if (lim.rlim_cur >= min_fds)
+		return;
+
+	/* Try to increase our limit. */
+	if (setrlimit(RLIMIT_NOFILE, &new) < 0)
+		syslog(LOG_WARNING,
+		       "Unable to increase fd limit from {%lu, %lu} to "
+		       "{%lu, %lu}: (%s) - May run out with lots of domains",
+		       lim.rlim_cur, lim.rlim_max,
+		       new.rlim_cur, new.rlim_max,
+		       strerror(errno));
+}
+
 int main(int argc, char **argv)
 {
 	const char *sopts = "hVvit:o:";
@@ -154,6 +188,8 @@ int main(int argc, char **argv)
 	openlog("xenconsoled", syslog_option, LOG_DAEMON);
 	setlogmask(syslog_mask);
 
+	increase_fd_limit();
+
 	if (!is_interactive) {
 		daemonize(pidfile ? pidfile : "/var/run/xenconsoled.pid");
 	}
-- 
1.7.10.4
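For reference, the fd budget computed in increase_fd_limit() works out as follows. This assumes DOMID_FIRST_RESERVED is 0x7FF0 (32752), its value in Xen's public xen.h header; the result matches the 132008 figure quoted in the v4 changelog.

```latex
\text{min\_fds} = 4 \times \text{DOMID\_FIRST\_RESERVED} + 1000
                = 4 \times 32752 + 1000
                = 131008 + 1000
                = 132008
```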


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v5] tools/xenconsoled: Increase file descriptor limit

2015-02-27 Thread Ian Jackson

Andrew Cooper writes ("[PATCH v5] tools/xenconsoled: Increase file descriptor limit"):
 XenServer's VM density testing uncovered a regression when moving from
 sysvinit to systemd where the file descriptor limit dropped from 4096 to
 1024. (XenServer had previously inserted a ulimit statement into its
 initscripts.)
...

Thanks, and sorry to be pernickety.

Acked-by: Ian Jackson ian.jack...@eu.citrix.com


Re: [Xen-devel] Regression, host crash with 4.5rc1

2015-02-27 Thread Brown, Len

(Please forgive my lack of Xen-fu knowledge in advance)

If this issue were to happen on Linux/bare-metal, this is how I'd debug it.
Hopefully some of this will translate to Xen in one way or another.

dmesg | grep idle
will tell us what idle driver is running (on Dom0 kernel)
and if it is intel_idle, it will also tell us the supported sub-states 
(CPUID.MWAIT.EDX value)

grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
will tell us what states the OS is requesting;
it will expand on the FFH bit here:

  (XEN) C1:   type[C1] latency[003] usage[12219860] method[  FFH]
  duration[1190961948551]
  (XEN) C2:   type[C1] latency[010] usage[10205554] method[  FFH]
  duration[2015393965907]
  (XEN) C3:   type[C2] latency[020] usage[50926286] method[  FFH]
  duration[30527997858148]

I'm hopeful that this information comes from the hardware's BIOS
and not some hypervisor tricking out Dom0 with a fake BIOS, yes?

If Xen doesn't have cpuidle, or its sysfs, then acpidump for the platform
should be able to tell us what the platform is exporting.

Next, hopefully the attached turbostat utility can be invoked on Dom0
and it can read the MSRs on at least 1 processor via the /dev/cpu interface.

This will tell you what the hardware supports, and what HW states are actually
being invoked (which may be different from what the OS asks for...).

It may tell us just the same thing I think we learned here:

  (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
  (XEN) CC3[28794734145697] CC6[0] CC7[0]

which I'm assuming are a dump of the MSR residency counters.
If yes, it appears that this platform is not invoking C6 and PC6 at all,
and that the deepest state being used is actually CC3 and PC3.
I don't know if that is because you've booted the kernel with max_cstate=N
of some kind, or if this is default.

Attached is turbostat, source and binary; run it this way
and send the ts.out file:

# ./turbostat --debug sleep 5 > ts.out 2>&1

Guessing...
If there are no surprises in the debug output requested above, and
if the Xen debug output above is with C6 explicitly disabled...
Note that there are two kinds of C6 -- CC6 (core) and PC6 (package).
If this box supports both, the next thing to try will be to keep CC6
enabled, but to just disable PC6.  This is done by writing an MSR that
turbostat dumps out (MSR_NHM_SNB_PKG_CST_CFG_CTL) with the wrmsr(8) utility.
However, if that MSR is locked by the BIOS, then a BIOS SETUP option
may be the only way to disable the package C-state limit without
also disabling the associated core C-state.
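A minimal sketch of decoding a raw value of that MSR (address 0xE2, e.g. as read through the /dev/cpu/N/msr interface), assuming the Nehalem/Sandy Bridge layout described in the Intel SDM: bits 2:0 hold the package C-state limit and bit 15 is the configuration lock. Later microarchitectures widen the limit field and assign different meanings to the limit encodings, so treat this as illustrative. The struct and helper names are hypothetical.

```c
#include <stdint.h>

/* MSR address as used by turbostat (MSR_NHM_SNB_PKG_CST_CFG_CTL). */
#define PKG_CST_CFG_CTL_MSR  0xE2u

/* Assumed Nehalem/Sandy Bridge layout: bits 2:0 = package C-state limit. */
#define PKG_CST_LIMIT_MASK   UINT64_C(0x7)
/* Bit 15 = CFG lock; once set by the BIOS, wrmsr cannot change the limit. */
#define PKG_CST_LOCK_BIT     (UINT64_C(1) << 15)

struct pkg_cst_cfg {
    unsigned int limit;  /* raw limit encoding; its meaning is model specific */
    int locked;          /* non-zero if the BIOS has locked the register */
};

/* Split a raw MSR value into the two fields discussed above. */
struct pkg_cst_cfg decode_pkg_cst_cfg(uint64_t raw)
{
    struct pkg_cst_cfg cfg;

    cfg.limit = (unsigned int)(raw & PKG_CST_LIMIT_MASK);
    cfg.locked = (raw & PKG_CST_LOCK_BIT) != 0;
    return cfg;
}

/*
 * Compute the value one would hand to wrmsr to change only the limit
 * field. This is pointless when the lock bit is set.
 */
uint64_t set_pkg_cst_limit(uint64_t raw, unsigned int limit)
{
    return (raw & ~PKG_CST_LIMIT_MASK) | ((uint64_t)limit & PKG_CST_LIMIT_MASK);
}
```

Checking `locked` before attempting a write mirrors the point above: if the BIOS has set the lock bit, the only remaining knob is the BIOS SETUP option.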

cheers,
-Len


ps. 



turbostat-test.tar.gz
Description: turbostat-test.tar.gz

Re: [Xen-devel] freemem-slack and large memory environments

2015-02-27 Thread Mike Latimer

On Friday, February 27, 2015 08:28:49 AM Mike Latimer wrote:
 On Friday, February 27, 2015 10:52:17 AM Stefano Stabellini wrote:
  On Thu, 26 Feb 2015, Mike Latimer wrote:
  libxl_set_memory_target = 1
  
  The new memory target is set for dom0 successfully.
  
  libxl_wait_for_free_memory = -5
  
  Still there isn't enough free memory in the system.
  
  libxl_wait_for_memory_target = 0
  
  However dom0 reached the new memory target already.
  Who is stealing your memory?
 
 I just realized I was missing commit 2048aeec, which corrects the hardcoded
 return value of libxl_wait_for_memory_target from 0 to rc. I'll retest with
 this change in place.
 
  In any case in the context of libxl_wait_for_memory_target, ERROR_FAIL
  means that the memory target has not been reached.
 
 I'm expecting this commit to change what I'm seeing, but I'm not
 convinced it will be a good change...  There is zero chance dom0 will
 balloon down 64GB (or 512GB) in the 10 second window set by freemem. This
 will likely mean the entire process will fail (when given a bit more time
 it would have succeeded).
 
 I'll add the missing commit, and send a complete set of debug logs later
 today.

After adding 2048aeec, dom0's target is lowered by the required amount (e.g. 
64GB), but as dom0 cannot balloon down fast enough, 
libxl_wait_for_memory_target returns -5, and the domain create fails (failed 
to free memory for the domain).

As dom0's target was lowered successfully, dom0 continues to balloon down in 
the background. So, after waiting a while, the domain creation will succeed. 
This is one of the problems I would like to solve. As the ballooning is 
working (just taking longer than expected) the code should monitor it and wait 
somehow.
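One way to "monitor it and wait" is to replace the fixed overall timeout with a progress check: keep waiting as long as free memory keeps growing between polls, and fail only after several consecutive polls with no progress. The sketch below models this with hypothetical callbacks and a simulated balloon. It is not libxl's API, and real code would query the hypervisor for free memory and sleep for a real interval.

```c
#include <stdint.h>

/* Hypothetical hooks: read current free memory, sleep one interval. */
typedef uint64_t (*free_mem_fn)(void *ctx);
typedef void (*sleep_fn)(void *ctx, unsigned int seconds);

/*
 * Wait until at least need_kb is free. Give up only after max_stalls
 * consecutive intervals in which free memory did not grow, i.e. the
 * balloon driver has stopped making progress.
 * Returns 0 on success, -1 if progress stalled.
 */
int wait_for_free_memory_progress(uint64_t need_kb, unsigned int interval_s,
                                  unsigned int max_stalls,
                                  free_mem_fn get_free, sleep_fn do_sleep,
                                  void *ctx)
{
    uint64_t last = get_free(ctx);
    unsigned int stalls = 0;

    while (last < need_kb) {
        uint64_t now;

        do_sleep(ctx, interval_s);
        now = get_free(ctx);
        if (now > last)
            stalls = 0;              /* ballooning still progressing */
        else if (++stalls >= max_stalls)
            return -1;               /* no progress for max_stalls polls */
        last = now;
    }
    return 0;
}

/* Trivial simulated balloon for trying the loop out. */
struct sim_balloon {
    uint64_t free_kb;
    uint64_t step_kb;   /* memory released per interval; 0 = stuck */
};

static uint64_t sim_free(void *ctx)
{
    return ((struct sim_balloon *)ctx)->free_kb;
}

static void sim_sleep(void *ctx, unsigned int seconds)
{
    struct sim_balloon *b = ctx;

    (void)seconds;               /* simulated: no real delay */
    b->free_kb += b->step_kb;
}
```

With this policy, a slow but steady balloon-down of dom0 succeeds however long it takes, while a genuinely stuck one still fails after `max_stalls * interval_s` seconds.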

I'll send in detailed logs (without 2048aeec) later today, to make sure I've 
explained this well enough.

-Mike


Re: [Xen-devel] [xen-unstable test] 35257: regressions - FAIL

2015-02-27 Thread Jim Fehlig

Ian Campbell wrote:
 On Thu, 2015-02-26 at 20:14 +, xen.org wrote:
   
 flight 35257 xen-unstable real [real]
 http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/

 Regressions :-(

 Tests which did not succeed and are blocking,
 including tests which could not be run:
  test-armhf-armhf-libvirt 12 guest-start.2 fail REGR. vs. 34629
 

 logs:
 http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/test-armhf-armhf-libvirt/info.html

 http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/test-armhf-armhf-libvirt/12.ts-guest-start.log
 2015-02-23 20:21:48 Z executing ssh ... root@10.80.229.106 virsh domxml-from-native xen-xl /etc/xen/debian.guest.osstest.cfg > /etc/xen/debian.guest.osstest.cfg.xml
 error: failed to connect to the hypervisor
 error: no valid connection
 error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': 
 Connection refused

 http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/test-armhf-armhf-libvirt/marilith-n4-output-ps_wwwaxf_-eo_pid%2Ctty%2Cstat%2Ctime%2Cnice%2Cpsr%2Cpcpu%2Cpmem%2Cnwchan%2Cwchan%2325%2Cargs
 appears to show no libvirtd process.

 http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/test-armhf-armhf-libvirt/marilith-n4---var-log-libvirt-libvirtd.log
 says:
 2015-02-23 20:13:15.556+: 2133: info : libvirt version: 1.2.13
 2015-02-23 20:13:15.556+: 2133: error : 
 dnsmasqCapsRefreshInternal:726 : Cannot check dnsmasq binary dnsmasq: No such 
 file or directory
   
 2015-02-23 20:13:15.845+: 2133: error :
 virFirewallValidateBackend:193 : direct firewall backend requested,
 but /sbin/ebtables is not available: No such file or directory

Odd, since ebtables was found when building

checking for ebtables... /sbin/ebtables

But AFAICT, that won't prevent libvirtd from starting.

 
 I think these are just spurious.

 2015-02-23 20:13:15.845+: 2133: error : virFirewallApply:936 : 
 out of memory
 

 2015-02-23 20:13:16.092+: 2133: error : virExec:491 : Cannot find 
 'pm-is-supported' in path: No such file or directory
 2015-02-23 20:13:16.092+: 2133: warning : virQEMUCapsInit:999 : 
 Failed to get host power management capabilities
 
 As are these two.
 
 2015-02-23 20:13:16.400+: 2133: error : virFirewallApply:936 : 
 out of memory

 Have these OOM messages resulted in libvirtd exiting?

No, I don't think so.  The related code is

int
virFirewallApply(virFirewallPtr firewall)
{
size_t i, j;
int ret = -1;

virMutexLock(&ruleLock);

if (!firewall || firewall->err == ENOMEM) {
virReportOOMError();
goto cleanup;
...
}

I suspect 'firewall' is null, so OOM error is reported and the function
returns -1.  But I also don't see this preventing libvirtd from
starting.  I've cc'd the libvirt list for verification that these errors
won't prevent libvirtd from starting.

  I don't see any
 evidence of a crash elsewhere in the logs (i.e. no process segfaulted
 in dmesg, no OOM killing going on etc).

 We don't seem to collect dom0 freemem info, but that most likely
 wouldn't help given the libvirtd process has exited.

   
 Any ideas where to look next?

Can you access the test environment and try starting libvirtd in the
foreground?  Or enable debug log level in /etc/libvirt/libvirtd.conf?

Regards,
Jim



Re: [Xen-devel] [PATCH 2/4] xen: sched: make counters for vCPU sleep and wakeup generic

2015-02-27 Thread Meng Xu

[I see the reason why I neglected this patch: my Gmail just filtered it
into the Forum category and I didn't see it. :-)
Dario, do you have any suggestions for an email client (maybe the one
you guys are using)?]

2015-02-26 8:37 GMT-05:00 Dario Faggioli dario.faggi...@citrix.com:
 and update them from Credit2 and RTDS. In Credit2, while there,
 remove some stale comments too.

 Signed-off-by: Dario Faggioli dario.faggi...@citrix.com
 Cc: George Dunlap george.dun...@eu.citrix.com
 Cc: Jan Beulich jbeul...@suse.com
 Cc: Keir Fraser k...@xen.org
 ---
  xen/common/sched_credit2.c   |   12 
  xen/common/sched_rt.c|   12 
  xen/include/xen/perfc_defn.h |   10 +-
  3 files changed, 25 insertions(+), 9 deletions(-)

 diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
 index ad0a5d4..2b852cc 100644
 --- a/xen/common/sched_credit2.c
 +++ b/xen/common/sched_credit2.c
 @@ -931,6 +931,7 @@ csched2_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
  struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);

  BUG_ON( is_idle_vcpu(vc) );
 +SCHED_STAT_CRANK(vcpu_sleep);

  if ( per_cpu(schedule_data, vc->processor).curr == vc )
  cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
 @@ -956,19 +957,22 @@ csched2_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)

  BUG_ON( is_idle_vcpu(vc) );

 -/* Make sure svc priority mod happens before runq check */
  if ( unlikely(per_cpu(schedule_data, vc->processor).curr == vc) )
  {
 +SCHED_STAT_CRANK(vcpu_wake_running);
  goto out;
  }
 -
  if ( unlikely(__vcpu_on_runq(svc)) )
  {
 -/* If we've boosted someone that's already on a runqueue, prioritize
 - * it and inform the cpu in question. */
 +SCHED_STAT_CRANK(vcpu_wake_onrunq);
  goto out;
  }

 +if ( likely(vcpu_runnable(vc)) )
 +SCHED_STAT_CRANK(vcpu_wake_runnable);
 +else
 +SCHED_STAT_CRANK(vcpu_wake_not_runnable);
 +
  /* If the context hasn't been saved for this vcpu yet, we can't put it on
   * another runqueue.  Instead, we set a flag so that it will be put on the runqueue
   * after the context has been saved. */
 diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
 index 58dd646..49d1b83 100644
 --- a/xen/common/sched_rt.c
 +++ b/xen/common/sched_rt.c
 @@ -851,6 +851,7 @@ rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
  struct rt_vcpu * const svc = rt_vcpu(vc);

  BUG_ON( is_idle_vcpu(vc) );
 +SCHED_STAT_CRANK(vcpu_sleep);

  if ( curr_on_cpu(vc->processor) == vc )
  cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
 @@ -966,11 +967,22 @@ rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
  BUG_ON( is_idle_vcpu(vc) );

  if ( unlikely(curr_on_cpu(vc->processor) == vc) )
 +{
 +SCHED_STAT_CRANK(vcpu_wake_running);
  return;
 +}

  /* on RunQ/DepletedQ, just update info is ok */
  if ( unlikely(__vcpu_on_q(svc)) )
 +{
 +SCHED_STAT_CRANK(vcpu_wake_onrunq);
  return;
 +}
 +
 +if ( likely(vcpu_runnable(vc)) )
 +SCHED_STAT_CRANK(vcpu_wake_runnable);
 +else
 +SCHED_STAT_CRANK(vcpu_wake_not_runnable);

  /* If context hasn't been saved for this vcpu yet, we can't put it on
   * the Runqueue/DepletedQ. Instead, we set a flag so that it will be
 diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h
 index 3ac7b45..2dc78fe 100644
 --- a/xen/include/xen/perfc_defn.h
 +++ b/xen/include/xen/perfc_defn.h
 @@ -21,6 +21,11 @@ PERFCOUNTER(dom_init,   "sched: dom_init")
  PERFCOUNTER(dom_destroy,    "sched: dom_destroy")
  PERFCOUNTER(vcpu_init,  "sched: vcpu_init")
  PERFCOUNTER(vcpu_destroy,   "sched: vcpu_destroy")
 +PERFCOUNTER(vcpu_sleep, "sched: vcpu_sleep")
 +PERFCOUNTER(vcpu_wake_running,  "sched: vcpu_wake_running")
 +PERFCOUNTER(vcpu_wake_onrunq,   "sched: vcpu_wake_onrunq")
 +PERFCOUNTER(vcpu_wake_runnable, "sched: vcpu_wake_runnable")
 +PERFCOUNTER(vcpu_wake_not_runnable, "sched: vcpu_wake_not_runnable")

  /* credit specific counters */
  PERFCOUNTER(delay_ms,   "csched: delay")
 @@ -32,11 +37,6 @@ PERFCOUNTER(acct_reorder,   "csched: acct_reorder")
  PERFCOUNTER(acct_min_credit,    "csched: acct_min_credit")
  PERFCOUNTER(acct_vcpu_active,   "csched: acct_vcpu_active")
  PERFCOUNTER(acct_vcpu_idle, "csched: acct_vcpu_idle")
 -PERFCOUNTER(vcpu_sleep, "csched: vcpu_sleep")
 -PERFCOUNTER(vcpu_wake_running,  "csched: vcpu_wake_running")
 -PERFCOUNTER(vcpu_wake_onrunq,   "csched: vcpu_wake_onrunq")
 -PERFCOUNTER(vcpu_wake_runnable, "csched: vcpu_wake_runnable")
 -PERFCOUNTER(vcpu_wake_not_runnable, "csched: vcpu_wake_not_runnable")
  PERFCOUNTER(vcpu_park,  "csched: vcpu_park")
  PERFCOUNTER(vcpu_unpark,    "csched: vcpu_unpark")
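Moving the five counters into the generic section works because perfc_defn.h is a list of PERFCOUNTER() entries that is expanded more than once, X-macro style. The following is a simplified, self-contained illustration of that technique together with the wakeup classification the patch adds to Credit2 and RTDS. SCHED_COUNTERS, STAT_CRANK and classify_wake are hypothetical names. This is not Xen's perfc.h, which also handles per-CPU storage and arrays.

```c
#include <stdio.h>

/* One list of (identifier, description) pairs, like perfc_defn.h. */
#define SCHED_COUNTERS(X)                                    \
    X(vcpu_sleep,             "sched: vcpu_sleep")           \
    X(vcpu_wake_running,      "sched: vcpu_wake_running")    \
    X(vcpu_wake_onrunq,       "sched: vcpu_wake_onrunq")     \
    X(vcpu_wake_runnable,     "sched: vcpu_wake_runnable")   \
    X(vcpu_wake_not_runnable, "sched: vcpu_wake_not_runnable")

/* Expand the list once into an enum of counter indices... */
#define AS_ENUM(name, desc) PERFC_##name,
enum perfc_id { SCHED_COUNTERS(AS_ENUM) NUM_PERFCOUNTERS };
#undef AS_ENUM

/* ...and once into a description table with the same ordering. */
#define AS_DESC(name, desc) desc,
static const char *const perfc_desc[NUM_PERFCOUNTERS] = { SCHED_COUNTERS(AS_DESC) };
#undef AS_DESC

static unsigned long perfc_values[NUM_PERFCOUNTERS];

/* Hypothetical stand-in for SCHED_STAT_CRANK(): bump one counter. */
#define STAT_CRANK(name) (perfc_values[PERFC_##name]++)

/* Same decision order as the patched csched2_vcpu_wake()/rt_vcpu_wake(). */
void classify_wake(int is_running, int on_runq, int runnable)
{
    if (is_running)
        STAT_CRANK(vcpu_wake_running);
    else if (on_runq)
        STAT_CRANK(vcpu_wake_onrunq);
    else if (runnable)
        STAT_CRANK(vcpu_wake_runnable);
    else
        STAT_CRANK(vcpu_wake_not_runnable);
}

void dump_counters(void)
{
    for (int i = 0; i < NUM_PERFCOUNTERS; i++)
        printf("%-32s %lu\n", perfc_desc[i], perfc_values[i]);
}
```

Because the enum and the description table are generated from the same list, a counter can move between sections of perfc_defn.h without any caller changing. Only its description string changes, from "csched:" to "sched:".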

Re: [Xen-devel] [PATCH v3 0/8] Split off mini-os to a separate tree

2015-02-27 Thread Ian Campbell

On Fri, 2015-02-27 at 14:58 +, Wei Liu wrote:
 On Fri, Feb 27, 2015 at 02:46:58PM +, Ian Campbell wrote:
  On Fri, 2015-02-27 at 13:50 +, Wei Liu wrote:
   On Fri, Feb 27, 2015 at 01:38:58PM +, Ian Campbell wrote:
On Wed, 2015-02-25 at 11:21 +, Wei Liu wrote:
 git://xenbits.xen.org/people/liuw/xen.git wip.build-system-v3

I think the series is now fully acked. Please could you rebase -i and
add the acks and push this as v4 without changing the base commit, i.e.
not pulling it up to current master or staging, leave it at
cb34a7c8d741aa447d79e1b01d71168a4088a4d7.

Not rebasing means you do not need to retest etc and I can just git pull
the result.

   
   git://xenbits.xen.org/people/liuw/xen.git wip.build-system-v4
  
  Thanks. I'm going to commit this after my current test run with the OVMF
  update completes.
  
  Please can you confirm the precise changeset ID you expect me to find at
  git://xenbits.xen.org/people/liuw/mini-os.git master
 
  f5d9868796e91bee70601805b9bfc1bb544b0586

Thanks.

  and to push to git://xenbits.xen.org/people/mini-os.git master as part
^^
  You don't need people I think?

Correct, I removed one too few path elements. It's
git://xenbits.xen.org/mini-os.git.

Ian.



Re: [Xen-devel] [PATCH v6 23/23] xl: vNUMA support

2015-02-27 Thread Dario Faggioli

On Thu, 2015-02-26 at 15:56 +, Wei Liu wrote:
 This patch includes configuration options parser and documentation.
 
 Please find the hunk to xl.cfg.pod.5 for more information.
 
 Signed-off-by: Wei Liu wei.l...@citrix.com
 Cc: Ian Campbell ian.campb...@citrix.com
 Cc: Ian Jackson ian.jack...@eu.citrix.com
 ---
 Changes in v6:
 1. Disable NUMA auto-placement.
 ---

Reviewed-and-Tested-by: Dario Faggioli dario.faggi...@citrix.com

Regards,
Dario



Re: [Xen-devel] [PATCH v4] tools/xenconsoled: Increase file descriptor limit

2015-02-27 Thread Ian Jackson

Andrew Cooper writes ("[PATCH v4] tools/xenconsoled: Increase file descriptor limit"):
 XenServer's VM density testing uncovered a regression when moving from
 sysvinit to systemd where the file descriptor limit dropped from 4096 to
 1024. (XenServer had previously inserted a ulimit statement into its
 initscripts.)
 
 One solution is to use LimitNOFILE=4096 in xenconsoled.service to match the
 lost ulimit, but that is only a stopgap solution.
 
 As Xenconsoled genuinely needs a large number of file descriptors if a large
 number of domains are running, attempt to increase the limit.
...

There's still a lot of code here I think we can do without.

Why do we care about the system maximum ?

 +	/*
 +	 * Will min_fds fit within our current hard limit?
 +	 * (likely on *BSD, unlikely on Linux)
 +	 * If so, raise our soft limit.
 +	 */
 +	if (min_fds <= lim.rlim_max) {
 +		struct rlimit new = {
 +			.rlim_cur = min_fds,
 +			.rlim_max = lim.rlim_max,
 +		};
 +
 +		if (setrlimit(RLIMIT_NOFILE, &new) < 0)
 +			syslog(LOG_WARNING,
 +			       "Unable to increase fd soft limit: %lu -> %u, "
 +			       "hard %lu (%s) - May run out with lots of domains",
 +			       lim.rlim_cur, min_fds, lim.rlim_max,
 +			       strerror(errno));
 +	} else {
 +		/*
 +		 * Let's hope that, as a root process, we have sufficient
 +		 * privilege to up the hard limit.
 +		 */
 +		struct rlimit new = { .rlim_cur = min_fds, .rlim_max = min_fds };
 +
 +		if (setrlimit(RLIMIT_NOFILE, &new) < 0)
 +			syslog(LOG_WARNING,
 +			       "Unable to increase fd hard limit: %lu -> %u (%s)"
 +			       " - May run out with lots of domains",
 +			       lim.rlim_max, min_fds, strerror(errno));
 +	}

This is very repetitive.  The only difference between the two branches
is (a) the value of .rlim_max and (b) the log message.  (b) can be
dealt with by making the log message depend only on the contents of
new.
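The following sketch shows one way to fold the two branches into a single path, as suggested. Only rlim_max depends on the branch, and the warning is built solely from `lim` and `new`. The function names are hypothetical. The code assumes, as holds on Linux and the BSDs, that RLIM_INFINITY is the largest rlim_t value, so an unlimited hard limit is left alone. The v5 patch instead always requests {min_fds, min_fds}.

```c
#include <errno.h>
#include <string.h>
#include <syslog.h>
#include <sys/resource.h>

/*
 * The soft limit becomes min_fds. The hard limit is raised only when
 * it is below min_fds, which covers both branches of the v4 code.
 */
struct rlimit compute_new_fd_limit(struct rlimit cur, rlim_t min_fds)
{
    struct rlimit new;

    new.rlim_cur = min_fds;
    new.rlim_max = cur.rlim_max > min_fds ? cur.rlim_max : min_fds;
    return new;
}

int raise_fd_limit(rlim_t min_fds)
{
    struct rlimit lim, new;

    if (getrlimit(RLIMIT_NOFILE, &lim) < 0)
        return -1;
    if (lim.rlim_cur >= min_fds)
        return 0;                      /* already sufficient */

    new = compute_new_fd_limit(lim, min_fds);
    if (setrlimit(RLIMIT_NOFILE, &new) < 0) {
        /* One message, driven purely by the contents of lim and new. */
        syslog(LOG_WARNING,
               "Unable to increase fd limit from {%lu, %lu} to {%lu, %lu}: (%s)"
               " - May run out with lots of domains",
               (unsigned long)lim.rlim_cur, (unsigned long)lim.rlim_max,
               (unsigned long)new.rlim_cur, (unsigned long)new.rlim_max,
               strerror(errno));
        return -1;
    }
    return 0;
}
```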

Ian.


Re: [Xen-devel] [PATCH v3 0/8] Split off mini-os to a separate tree

2015-02-27 Thread Wei Liu

On Fri, Feb 27, 2015 at 02:46:58PM +, Ian Campbell wrote:
 On Fri, 2015-02-27 at 13:50 +, Wei Liu wrote:
  On Fri, Feb 27, 2015 at 01:38:58PM +, Ian Campbell wrote:
   On Wed, 2015-02-25 at 11:21 +, Wei Liu wrote:
git://xenbits.xen.org/people/liuw/xen.git wip.build-system-v3
   
   I think the series is now fully acked. Please could you rebase -i and
   add the acks and push this as v4 without changing the base commit, i.e.
   not pulling it up to current master or staging, leave it at
   cb34a7c8d741aa447d79e1b01d71168a4088a4d7.
   
   Not rebasing means you do not need to retest etc and I can just git pull
   the result.
   
  
  git://xenbits.xen.org/people/liuw/xen.git wip.build-system-v4
 
 Thanks. I'm going to commit this after my current test run with the OVMF
 update completes.
 
 Please can you confirm the precise changeset ID you expect me to find at
 git://xenbits.xen.org/people/liuw/mini-os.git master

 f5d9868796e91bee70601805b9bfc1bb544b0586

 and to push to git://xenbits.xen.org/people/mini-os.git master as part
   ^^
   You don't need people I think?

 of this.


[Xen-devel] [PATCH] xen: credit2: use curr_on_cpu(cpu) in place of `per_cpu(s, c).curr'

2015-02-27 Thread Dario Faggioli

as 0bba5747f4bee4ddd (xen: sched_credit: define and use
curr_on_cpu(cpu)) did for Credit1, hence making the code more
consistent and easier to read.

Signed-off-by: Dario Faggioli dario.faggi...@citrix.com
Cc: George Dunlap george.dun...@eu.citrix.com
Cc: Jan Beulich jbeul...@suse.com
Cc: Keir Fraser k...@xen.org
---
 xen/common/sched_credit2.c |   12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ad0a5d4..f0e2c82 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -493,7 +493,7 @@ runq_tickle(const struct scheduler *ops, unsigned int cpu, struct csched2_vcpu *
 BUG_ON(new->rqd != rqd);
 
 /* Look at the cpu it's running on first */
-cur = CSCHED2_VCPU(per_cpu(schedule_data, cpu).curr);
+cur = CSCHED2_VCPU(curr_on_cpu(cpu));
 burn_credits(rqd, cur, now);
 
 if ( cur->credit < new->credit )
@@ -526,7 +526,7 @@ runq_tickle(const struct scheduler *ops, unsigned int cpu, struct csched2_vcpu *
 if ( i == cpu )
 continue;
 
-cur = CSCHED2_VCPU(per_cpu(schedule_data, i).curr);
+cur = CSCHED2_VCPU(curr_on_cpu(i));
 
 BUG_ON(is_idle_vcpu(cur->vcpu));
 
@@ -658,7 +658,7 @@ void burn_credits(struct csched2_runqueue_data *rqd, struct csched2_vcpu *svc, s
 s_time_t delta;
 
 /* Assert svc is current */
-ASSERT(svc==CSCHED2_VCPU(per_cpu(schedule_data, svc->vcpu->processor).curr));
+ASSERT(svc==CSCHED2_VCPU(curr_on_cpu(svc->vcpu->processor)));
 
 if ( is_idle_vcpu(svc->vcpu) )
 {
@@ -932,7 +932,7 @@ csched2_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
 
 BUG_ON( is_idle_vcpu(vc) );
 
-if ( per_cpu(schedule_data, vc->processor).curr == vc )
+if ( curr_on_cpu(vc->processor) == vc )
 cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
 else if ( __vcpu_on_runq(svc) )
 {
@@ -957,7 +957,7 @@ csched2_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
 BUG_ON( is_idle_vcpu(vc) );
 
 /* Make sure svc priority mod happens before runq check */
-if ( unlikely(per_cpu(schedule_data, vc->processor).curr == vc) )
+if ( unlikely(curr_on_cpu(vc->processor) == vc) )
 {
 goto out;
 }
@@ -1815,7 +1815,7 @@ csched2_dump_pcpu(const struct scheduler *ops, int cpu)
 printk("core=%s\n", cpustr);
 
 /* current VCPU */
-svc = CSCHED2_VCPU(per_cpu(schedule_data, cpu).curr);
+svc = CSCHED2_VCPU(curr_on_cpu(cpu));
 if ( svc )
 {
 printk("\trun: ");
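The patch is purely a readability change, because curr_on_cpu() is believed to be defined in Xen's scheduler headers as roughly `(per_cpu(schedule_data, c).curr)`. The sketch below models that relationship with a plain array in place of Xen's real per-CPU machinery. All the types and the per_cpu() stand-in are simplified assumptions for illustration only.

```c
#include <stddef.h>

#define NR_CPUS 4

struct vcpu { int id; unsigned int processor; };

/* Simplified stand-in for Xen's per-CPU schedule_data. */
struct schedule_data { struct vcpu *curr; };
static struct schedule_data schedule_data[NR_CPUS];

/* Simplified per_cpu(): real Xen uses per-CPU offsets, not an array. */
#define per_cpu(var, cpu) ((var)[cpu])

/* The accessor names the intent: "who is running on this pCPU?" */
#define curr_on_cpu(cpu) (per_cpu(schedule_data, cpu).curr)

/* Mirrors the csched2_vcpu_sleep()/wake() test in the patch. */
int vcpu_is_running(const struct vcpu *v)
{
    return curr_on_cpu(v->processor) == v;
}
```

Since the macro expands to an lvalue, it works both for reads, as in the patch, and for assignments.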



Re: [Xen-devel] [PATCH v6 0/5] xen/arm: Add support for Huawei hip04-d01 platform

2015-02-27 Thread Frediano Ziglio

2015-02-26 13:24 GMT+00:00 Julien Grall julien.gr...@linaro.org:
 Hi Frediano,

 On 26/02/15 12:40, Frediano Ziglio wrote:
   xen/arm: Make gic-v2 code handle hip04-d01 platform
   xen/arm: handle GICH register changes for hip04-d01 platform
   xen/arm: Force dom0 to use normal GICv2 driver on Hip04 platform

 There is not much benefit in having 3 separate patches. I think they
 could be merged into a single patch.


In the last version I merged 2 of the 3 patches. In the third the
comment is really specific to the piece of code.

Frediano


Re: [Xen-devel] [PATCH 1/5] x86: allow specifying the NUMA nodes Dom0 should run on

2015-02-27 Thread Jan Beulich

 On 27.02.15 at 15:54, dario.faggi...@citrix.com wrote:
 On Fri, 2015-02-27 at 10:50 +, Jan Beulich wrote:
  On 27.02.15 at 11:04, dario.faggi...@citrix.com wrote:
  On Fri, 2015-02-27 at 08:46 +, Jan Beulich wrote:
  I'm simply adjusting what sched_init_vcpu() did, which is alter
  hard affinity conditionally on is_pinned and soft affinity
  unconditionally.
  
  Ok, I understand the idea behind this better now, thanks.
  [...]
  Setting soft affinity as a superset of (in the former case) or equal to
  (in the latter) hard affinity is just pure overhead, when in the
  scheduler.
 
 Then why does sched_init_vcpu() do what it does? If you want to
 alter that, I'm fine with altering it here.
 
 It does that, but, in there, soft affinity is unconditionally set to
 'all bits set'. Then, in the scheduler, if we find out that the soft
 affinity mask is fully set, we just skip the soft affinity balancing
 step.
 
 The idea is that, whether the mask is full because no one touched this
 default, or because it has been manually set like that, there is nothing
 to do at the soft affinity balancing level.
 
 So, you actually are right: rather than not touching soft affinity, as I
 said in the previous email, I think we should set hard affinity
 conditionally to is_pinned, as in the patch, and then unconditionally
 set soft affinity to all, as in sched_init_vcpu().

I.e. effectively not touching it anyway (because just before it
got set to all by sched_init_vcpu()). I guess instead of
removing the line, I'll put it in a comment.
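The rule Dario describes can be sketched as follows: the soft-affinity balancing step is only worth running when soft affinity actually narrows the choice. If the soft mask is full, or already contains every CPU allowed by hard affinity, the step is pure overhead. This is a simplified model using 64-bit masks and hypothetical helper names. Xen's actual check in the Credit scheduler works on cpumask_t and also considers the cpupool and the candidate CPUs.

```c
#include <stdint.h>

/* Simplified model: one bit per pCPU, at most 64 pCPUs. */
typedef uint64_t cpumask64_t;

static int mask_full(cpumask64_t m, unsigned int nr_cpus)
{
    cpumask64_t all = nr_cpus >= 64 ? ~UINT64_C(0)
                                    : ((UINT64_C(1) << nr_cpus) - 1);
    return (m & all) == all;
}

static int mask_subset(cpumask64_t a, cpumask64_t b)
{
    return (a & ~b) == 0;   /* every bit set in a is also set in b */
}

/*
 * Non-zero if the soft-affinity balancing step has anything to do.
 * The full-mask test is a cheap fast path for the default set by
 * sched_init_vcpu(); a full mask is also trivially a superset of hard.
 */
int needs_soft_affinity_step(cpumask64_t hard, cpumask64_t soft,
                             unsigned int nr_cpus)
{
    return !mask_full(soft, nr_cpus) && !mask_subset(hard, soft);
}
```

Under this model, setting hard affinity conditionally on is_pinned and leaving soft affinity at "all" makes the balancing step a no-op, which is the behaviour the thread converges on.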

Jan



Re: [Xen-devel] [xen-unstable test] 35257: regressions - FAIL

2015-02-27 Thread Ian Campbell

On Fri, 2015-02-27 at 11:51 -0700, Jim Fehlig wrote:
  2015-02-23 20:13:15.845+: 2133: error :
  virFirewallValidateBackend:193 : direct firewall backend requested,
  but /sbin/ebtables is not available: No such file or directory
 
 Odd, since ebtables was found when building
 
 checking for ebtables... /sbin/ebtables
 
 But AFAICT, that won't prevent libvirtd from starting.

The build host and the runtime host will likely be different (or at
least reinstalled).

The base set of packages should be the same, but the build one will
install a bunch of libfoo-dev while the runtime host will only get
libfoo. Perhaps some libfoo-dev is pulling in ebtables somehow while
just libfoo is not. I'll have a look next week. I think it's probably
non-critical to the error here.

  I think these are just spurious.
 
  2015-02-23 20:13:15.845+: 2133: error : virFirewallApply:936 : 
  out of memory
  
 
  2015-02-23 20:13:16.092+: 2133: error : virExec:491 : Cannot 
  find 'pm-is-supported' in path: No such file or directory
  2015-02-23 20:13:16.092+: 2133: warning : virQEMUCapsInit:999 : 
  Failed to get host power management capabilities
  
  As are these two.
  
  2015-02-23 20:13:16.400+: 2133: error : virFirewallApply:936 : 
  out of memory
 
  Have these OOM messages resulted in libvirtd exiting?
 
 No, I don't think so.  The related code is
 
 int
 virFirewallApply(virFirewallPtr firewall)
 {
 size_t i, j;
 int ret = -1;
 
 virMutexLock(&ruleLock);
 
 if (!firewall || firewall->err == ENOMEM) {
 virReportOOMError();
 goto cleanup;
 ...
 }
 
 I suspect 'firewall' is null, so OOM error is reported and the function
 returns -1.  But I also don't see this preventing libvirtd from
 starting.  I've cc'd the libvirt list for verification that these errors
 won't prevent libvirtd from starting.

I'm pretty sure libvirtd did successfully start, since we have
successfully done a guest start and stop.

The failing step is a second guest start, so it seems like libvirtd has
either crashed or exited.

I suppose these messages are from start of day and therefore
red-herrings wrt the reason libvirtd went away.

   I don't see any
  evidence of a crash elsewhere in the logs (i.e. no process segfaulted
  in dmesg, no OOM killing going on etc).
 
  We don't seem to collect dom0 freemem info, but that most likely
  wouldn't help given the libvirtd process has exited.
 

  Any ideas where to look next?
 
 Can you access the test environment and try starting libvirtd in the
 foreground?  Or enable debug log level in /etc/libvirt/libvirtd.conf?

The test env will have been recycled, I could try and replicate it
manually, but I think to start with I should arrange for the test env to
have more logging enabled, in the hopes that if it happens again we get
more information. I had some questions around this in my reply to Wei in
this thread at 1425042785.14641.188.ca...@citrix.com.

Cheers,
Ian.




Re: [Xen-devel] [xen-unstable test] 35257: regressions - FAIL

2015-02-27 Thread Jim Fehlig

Ian Campbell wrote:
 On Fri, 2015-02-27 at 10:48 +, Wei Liu wrote:
   
 On Fri, Feb 27, 2015 at 09:42:29AM +, Ian Campbell wrote:
 
 On Thu, 2015-02-26 at 20:14 +, xen.org wrote:
   
 flight 35257 xen-unstable real [real]
 http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/

 Regressions :-(

 Tests which did not succeed and are blocking,
 including tests which could not be run:
  test-armhf-armhf-libvirt 12 guest-start.2 fail REGR. vs. 34629
 
 logs:
 http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/test-armhf-armhf-libvirt/info.html

 http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/test-armhf-armhf-libvirt/12.ts-guest-start.log
 2015-02-23 20:21:48 Z executing ssh ... root@10.80.229.106 virsh 
 domxml-from-native xen-xl /etc/xen/debian.guest.osstest.cfg > /etc/xen/debian.guest.osstest.cfg.xml
 error: failed to connect to the hypervisor
 error: no valid connection
 error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': 
 Connection refused

 http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/test-armhf-armhf-libvirt/marilith-n4-output-ps_wwwaxf_-eo_pid%2Ctty%2Cstat%2Ctime%2Cnice%2Cpsr%2Cpcpu%2Cpmem%2Cnwchan%2Cwchan%2325%2Cargs
 appears to show no libvirtd process.

 http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/test-armhf-armhf-libvirt/marilith-n4---var-log-libvirt-libvirtd.log
 says:
 2015-02-23 20:13:15.556+: 2133: info : libvirt version: 1.2.13
 2015-02-23 20:13:15.556+: 2133: error : 
 dnsmasqCapsRefreshInternal:726 : Cannot check dnsmasq binary dnsmasq: No 
 such file or directory
 2015-02-23 20:13:15.845+: 2133: error : 
 virFirewallValidateBackend:193 : direct firewall backend requested, but 
 /sbin/ebtables is not available: No such file or directory
 
 I think these are just spurious.

 2015-02-23 20:13:15.845+: 2133: error : virFirewallApply:936 : 
 out of memory
 

 2015-02-23 20:13:16.092+: 2133: error : virExec:491 : Cannot 
 find 'pm-is-supported' in path: No such file or directory
 2015-02-23 20:13:16.092+: 2133: warning : virQEMUCapsInit:999 : 
 Failed to get host power management capabilities
 
 As are these two.
 
 2015-02-23 20:13:16.400+: 2133: error : virFirewallApply:936 : 
 out of memory

   
 Last time Ian and I debugged a libvirt crashing bug, out of memory
 didn't cause libvirtd to exit. It turned out to be a bug in the libxl
 event machinery that caused libvirtd to exit, but the assertion message
 was not shown anywhere.

 I think we might need to log in to that host and run libvirtd in
 foreground to determine what goes wrong.
 

 That's possible I suppose, but it would be nice to arrange not to have
 to in the future.

 Perhaps we should be forcing higher log levels on libvirtd when
 installing, patching /usr/local/etc/libvirt/libvirtd.conf to set
 log_level=2 (or even 1) perhaps? (Default is 3 == warnings+error, 2 is
 info, 1 is debug)

 Jim, what debug level would you recommend for automated tests? Unless it
 is super verbose, I suppose 1=debug is the way to go?

I think we need DEBUG log level, although it is rather verbose.  If that
becomes a problem, we could experiment with a minimally useful
log_filters setting, e.g.

log_filters="1:daemon 1:libxl"

 Adding -v to libvirtd command line would be an easier patch, but only
 gives the effect of log_level=2 AFAICT. Perhaps that is considered
 sufficient?
   

In my experience, if ERROR is insufficient, INFO and WARNING don't
help.  DEBUG is needed.

Regards,
Jim



[Xen-devel] [PATCH] xen/pciback: Don't print scary messages when unsupported by hypervisor.

2015-02-27 Thread Konrad Rzeszutek Wilk

We print at the warning level messages such as:
pciback 0000:90:00.5: MSI-X preparation failed (-38)

where -38 is -ENOSYS: the hypervisor does not support this
sub-hypercall (which was added in Xen 4.3).

Instead of printing scary messages all the time, only print them
when the hypercall is actually supported.

Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 drivers/xen/xen-pciback/pci_stub.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 7acc796..ddc5500 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -115,7 +115,7 @@ static void pcistub_device_release(struct kref *kref)
int err = HYPERVISOR_physdev_op(PHYSDEVOP_release_msix,
&ppdev);
 
-   if (err)
+   if (err && err != -ENOSYS)
dev_warn(&dev->dev, "MSI-X release failed (%d)\n",
 err);
}
@@ -376,7 +376,7 @@ static int __devinit pcistub_init_device(struct pci_dev *dev)
};
 
err = HYPERVISOR_physdev_op(PHYSDEVOP_prepare_msix, &ppdev);
-   if (err)
+   if (err && err != -ENOSYS)
dev_err(&dev->dev, "MSI-X preparation failed (%d)\n",
err);
}
-- 
2.1.0
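The filtering added by this patch can be tried out in userspace. A minimal sketch, assuming a hypothetical `fake_physdev_op()` that stands in for `HYPERVISOR_physdev_op()` and simply returns the error it is given; on a hypervisor older than 4.3 the real call returns -ENOSYS (-38 in the log above):

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for HYPERVISOR_physdev_op(): it returns the
 * result it is given, so old and new hypervisors can be simulated. */
static int fake_physdev_op(int simulated_result)
{
    return simulated_result;
}

/* Same condition as the patch: warn on real failures only, not when
 * the sub-hypercall is simply unimplemented (-ENOSYS). */
static int msix_should_warn(int err)
{
    return err && err != -ENOSYS;
}

/* Simulated prepare step; returns 1 if a warning was printed. */
static int prepare_msix(int simulated_result)
{
    int err = fake_physdev_op(simulated_result);

    assert(err <= 0); /* hypercalls report failure as a negative errno */
    if (msix_should_warn(err)) {
        fprintf(stderr, "MSI-X preparation failed (%d)\n", err);
        return 1;
    }
    return 0;
}
```

Only the -ENOSYS case is silenced; any other negative error is still reported.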



Re: [Xen-devel] RFC: xen config changes v4

2015-02-27 Thread Luis R. Rodriguez

On Fri, Feb 27, 2015 at 07:14:32AM +0100, Juergen Gross wrote:
 On 02/26/2015 07:48 PM, Luis R. Rodriguez wrote:
 On Thu, Feb 26, 2015 at 05:42:57PM +, Stefano Stabellini wrote:
 On Thu, 26 Feb 2015, Luis R. Rodriguez wrote:
 On Thu, Feb 26, 2015 at 11:08:20AM +, Stefano Stabellini wrote:
 On Thu, 26 Feb 2015, David Vrabel wrote:
 On 26/02/15 04:59, Juergen Gross wrote:

 So we are again in the situation that pv-drivers always imply the pvops
 kernel (PARAVIRT selected). I started the whole Kconfig rework to
 eliminate this dependency.

 Yes.  Can you produce a series that just addresses this one issue.

 In the absence of any concrete requirement for this big Kconfig reorg,
 I don't think it is helpful.

 I clearly missed some context as I didn't realize that this was the
 intended goal. Why do we want this? Please explain as it won't come
 for free.


 We have a few PV interfaces for HVM guests that need PARAVIRT in Linux
 in order to be used, for example pv_time_ops and HVMOP_pagetable_dying.
 They are critical performance improvements and, from the interface
 perspective, small enough that it doesn't make much sense to have a separate
 Kconfig option for them.


 In order to reach the goal above we necessarily need to introduce a
 differentiation in terms of PV on HVM guests in Linux:

 1) basic guests with PV network, disk, etc but no PV timers, no
 HVMOP_pagetable_dying, no PV IPIs
 2) full PV on HVM guests that have PV network, disk, timers,
 HVMOP_pagetable_dying, PV IPIs and anything else that makes sense.

 2) is much faster than 1) on Xen and 2) is only a tiny bit slower than
 1) on native x86

 Also don't we shove 2) down hvm guests right now? Even when everything is
 built in I do not see how we opt out for HVM for 1) at run time right now.

 If this is true then the question of motivation for this becomes even
 stronger I think.

 Yes, indeed there is no way to do 1) at the moment. And for good
 reasons, see above.

 OK if the goal is to be able to build front end drivers by avoiding building
 PARAVIRT / PARAVIRT_CLOCK and if the gains to be able to do so (which haven't
 been stated other than just the ability to do so) are small (as Stefano notes
 simple hvm containers do not perform great) but requires a bit of work, I'd
 rather ask -- why not address *why* we are avoiding PARAVIRT /
 PARAVIRT_CLOCK and stick to the original goals behind the pvops model by
 addressing what is required to be able to continue to be happy with one single
 kernel. The work required to do that might be more than to just be able to
 build simple Xen hvm containers without PARAVIRT / PARAVIRT_CLOCK  but I'd
 think the gains would be much higher.

 I absolutely agree. I think this is a long term goal we should work on.
 PVH should address most of the issues, BTW.

 If this resonates well then I'd like to ask: what are the current most pressing
 issues with enabling PARAVIRT / PARAVIRT_CLOCK.

 PARAVIRT: performance, especially memory management

Do we have studies on specific areas? I'd be very interested in the exact 
routines.

 PARAVIRT_CLOCK: none

Great!

 Luis


[Xen-devel] [PATCH] xen, apic: Setup our own APIC driver and validator for APIC IDs.

2015-02-27 Thread Konrad Rzeszutek Wilk

Via CPUID masking and the different apic-> overrides we
effectively make PV guests use only the default APIC
driver. That is OK, as a PV guest should never access any
APIC registers. However, the APIC is also used to limit the
number of CPUs if the APIC IDs are incorrect, and since we
mask the x2APIC from the CPUID, any APIC IDs above 0xFF
are deemed incorrect by the default APIC routines.

As such, add a new routine to check the APIC ID, which will
only be used if the native CPUID tells us the system
is using x2APIC.

This allows us to boot with more than 255 CPUs when running
as the initial domain.
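The ID handling that xen_get_apic_id() and xen_apic_read() perform in this patch relies on the xAPIC ID register carrying the ID in bits 31:24. A minimal userspace sketch of that encoding (all helper names are hypothetical) shows why an ID above 0xFF cannot be represented once x2APIC is masked, and how xen_phys_pkg_id() derives a package ID:

```c
#include <assert.h>
#include <stdint.h>

/* Pack an APIC ID into the xAPIC ID register layout (bits 31:24), as
 * xen_apic_read() does with "apic_id << 24". Unsigned arithmetic wraps,
 * so any bits shifted past bit 31 are lost. */
static uint32_t xapic_id_reg_encode(uint32_t apic_id)
{
    return (uint32_t)(apic_id << 24);
}

/* Unpack it again, as xen_get_apic_id() does with "(x >> 24) & 0xFF". */
static uint32_t xapic_id_reg_decode(uint32_t reg)
{
    return (reg >> 24) & 0xFFu;
}

/* Package ID from the initial APIC ID, as in xen_phys_pkg_id(). */
static uint32_t phys_pkg_id(uint32_t initial_apic_id, unsigned int index_msb)
{
    assert(index_msb < 32); /* shifting a 32-bit value by >= 32 is undefined */
    return initial_apic_id >> index_msb;
}
```

An ID of 0x1FF round-trips as 0xFF, which is why IDs above 0xFF need a validator that does not depend on this 8-bit field.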

The probing of APIC drivers is dependent on the build. The
arch/x86/kernel/apic/Makefile lists them as (assuming 64-bit):
 apic_numachip.o
 x2apic_uv_x.o
 x2apic_phys.o
 x2apic_cluster.o
 apic_flat_64.o

Looking at the .apicdrivers section I see:
xen_apic, apic_x2apic_phys, apic_x2apic_cluster, apic_physflat, apic_flat
addresses. Since arch/x86/xen can be built before or after
arch/x86/kernel/apic, we add a late probe function to switch to
the Xen PV APIC driver if that hadn't been done during bootup.
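The late-probe idea can be pictured as a first-match walk over the driver table followed by a correction step. This is a simplified userspace sketch, not the kernel's implementation: the `apic_driver` struct, the two-entry table and `late_probe()` are hypothetical, and a global flag simulates `xen_pv_domain()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified driver descriptor. */
struct apic_driver {
    const char *name;
    int (*probe)(void); /* returns 1 if this driver should be used */
};

static int running_as_xen_pv; /* simulates xen_pv_domain() */

static int probe_xen(void)  { return running_as_xen_pv; }
static int probe_flat(void) { return 1; } /* toy generic driver always matches */

static const struct apic_driver xen_apic  = { "Xen PV", probe_xen };
static const struct apic_driver flat_apic = { "flat", probe_flat };

/* Walk order follows link order; here the generic driver ends up first,
 * which is the case the late probe has to correct. */
static const struct apic_driver *const drivers[] = { &flat_apic, &xen_apic };

static const struct apic_driver *current_apic;

/* Early probe: the first driver whose probe() succeeds wins. */
static void early_probe(void)
{
    for (size_t i = 0; i < sizeof(drivers) / sizeof(drivers[0]); i++) {
        if (drivers[i]->probe()) {
            current_apic = drivers[i];
            return;
        }
    }
}

/* Late probe: switch to the Xen driver if bootup picked another one. */
static void late_probe(void)
{
    if (xen_apic.probe() && current_apic != &xen_apic)
        current_apic = &xen_apic;
}
```

Because the correction runs after every driver has had its chance, the result no longer depends on which object file the linker placed first.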

Reported-by: Cathy Avery cathy.av...@oracle.com
Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 arch/x86/xen/apic.c  | 169 +++
 arch/x86/xen/enlighten.c |  90 +
 2 files changed, 170 insertions(+), 89 deletions(-)

diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c
index 7005ced..9b9a5fc 100644
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -7,6 +7,7 @@
 #include <xen/xen.h>
 #include <xen/interface/physdev.h>
 #include "xen-ops.h"
+#include "smp.h"
 
 static unsigned int xen_io_apic_read(unsigned apic, unsigned reg)
 {
@@ -28,7 +29,175 @@ static unsigned int xen_io_apic_read(unsigned apic, unsigned reg)
return 0xfd;
 }
 
+static unsigned long xen_set_apic_id(unsigned int x)
+{
+   WARN_ON(1);
+   return x;
+}
+
+static unsigned int xen_get_apic_id(unsigned long x)
+{
+   return ((x)>>24) & 0xFFu;
+}
+
+static u32 xen_apic_read(u32 reg)
+{
+   struct xen_platform_op op = {
+   .cmd = XENPF_get_cpuinfo,
+   .interface_version = XENPF_INTERFACE_VERSION,
+   .u.pcpu_info.xen_cpuid = 0,
+   };
+   int ret = 0;
+
+   /* Shouldn't need this as APIC is turned off for PV, and we only
+* get called on the bootup processor. But just in case. */
+   if (!xen_initial_domain() || smp_processor_id())
+   return 0;
+
+   if (reg == APIC_LVR)
+   return 0x10;
+
+   if (reg != APIC_ID)
+   return 0;
+
+   ret = HYPERVISOR_dom0_op(&op);
+   if (ret)
+   return 0;
+
+   return op.u.pcpu_info.apic_id << 24;
+}
+
+static void xen_apic_write(u32 reg, u32 val)
+{
+   /* Warn to see if there's any stray references */
+   WARN_ON(1);
+}
+
+static u64 xen_apic_icr_read(void)
+{
+   return 0;
+}
+
+static void xen_apic_icr_write(u32 low, u32 id)
+{
+   /* Warn to see if there's any stray references */
+   WARN_ON(1);
+}
+
+static u32 xen_safe_apic_wait_icr_idle(void)
+{
+return 0;
+}
+
+
+static int probe_xen(void)
+{
+   if (xen_pv_domain())
+   return 1;
+
+   return 0;
+}
+
+static int xen_madt_oem_check(char *oem_id, char *oem_table_id)
+{
+   return 1;
+}
+
+static int xen_id_always_valid(int apicid)
+{
+   return 1;
+}
+
+static int xen_id_always_registered(void)
+{
+   return 1;
+}
+
+static int xen_phys_pkg_id(int initial_apic_id, int index_msb)
+{
+   return initial_apic_id >> index_msb;
+}
+
+static void xen_noop(void)
+{
+}
+
+static void xen_silent_inquire(int apicid)
+{
+}
+
+static struct apic xen_apic = {
+   .name   = "Xen PV",
+   .probe  = probe_xen,
+   .acpi_madt_oem_check= xen_madt_oem_check,
+   .apic_id_valid  = xen_id_always_valid,
+   .apic_id_registered = xen_id_always_registered,
+
+   /* .irq_delivery_mode - used in native_compose_msi_msg only */
+   /* .irq_dest_mode - used in native_compose_msi_msg only */
+
+   .target_cpus= default_target_cpus,
+   .disable_esr= 0,
+   /* .dest_logical  -  default_send_IPI_ use it but we use our own. */
+   .check_apicid_used  = default_check_apicid_used, /* Used on 32-bit */
+
+   .vector_allocation_domain   = flat_vector_allocation_domain,
+   .init_apic_ldr  = xen_noop, /* setup_local_APIC calls it */
+
+   .ioapic_phys_id_map = default_ioapic_phys_id_map, /* Used on 32-bit */
+   .setup_apic_routing = NULL,
+   .cpu_present_to_apicid  = default_cpu_present_to_apicid,
+   .apicid_to_cpu_present  = physid_set_mask_of_physid, /* Used on 32-bit */
+   .check_phys_apicid_present  =

[Xen-devel] Regression due to d9581c7dcac15c02ad4d47c60c60f4d8f197db55 xen/fb: allow xenfb initialization for hvm guest

2015-02-27 Thread Konrad Rzeszutek Wilk

This has been in queue for some time.

In our kernels (UEK3) we had to revert said patch. The patch says:

xen/fb: allow xenfb initialization for hvm guests

There is no reasons why an HVM guest shouldn't be allowed to use xenfb.
As a matter of fact ARM guests, HVM from Linux POV, can use xenfb.
Given that no Xen toolstacks configure a xenfb backend for x86 HVM
guests, they are not affected.

Please note that at this time QEMU needs few outstanding fixes to
provide xenfb on ARM:

http://marc.info/?l=qemu-devel&m=138739419700837&w=2


which is a lie. The "no Xen toolstacks configure a xenfb backend for
x86 HVM" statement is false. If you try to boot this kernel under
Xen with Xend, there is a problem, because Xend does set up a 'vfb'
device.

The end result is that during bootup, up until X starts, there is
no console output in the VNC window, because the Linux kernel tries to use
the vfb console driver.

Any suggestions on how to fix this? Should we just wrap the
whole thing with #ifdef, like this?

diff --git a/drivers/video/fbdev/xen-fbfront.c b/drivers/video/fbdev/xen-fbfront.c
index 09dc447..584be8e 100644
--- a/drivers/video/fbdev/xen-fbfront.c
+++ b/drivers/video/fbdev/xen-fbfront.c
@@ -696,7 +696,10 @@ static int __init xenfb_init(void)
 {
if (!xen_domain())
return -ENODEV;
-
+#ifdef CONFIG_X86
+   if (!xen_pv_domain())
+   return -ENODEV;
+#endif
/* Nothing to do if running in dom0. */
if (xen_initial_domain())
return -ENODEV;
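The order of checks in the proposed xenfb_init() can be written as a small pure function. A sketch only: the boolean parameters are hypothetical stand-ins for xen_domain(), xen_pv_domain(), xen_initial_domain() and the CONFIG_X86 build option.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the proposed check order: returns 0 if the
 * frontend should register, -ENODEV otherwise. */
static int xenfb_should_init(bool is_x86, bool xen_domain, bool pv_domain,
                             bool initial_domain)
{
    if (!xen_domain)
        return -ENODEV;
    /* The proposed #ifdef CONFIG_X86 block: keep x86 HVM guests off
     * xenfb, even when the toolstack (e.g. Xend) configures a 'vfb'. */
    if (is_x86 && !pv_domain)
        return -ENODEV;
    /* Nothing to do if running in dom0. */
    if (initial_domain)
        return -ENODEV;
    return 0;
}
```

On ARM (is_x86 false) an HVM guest still gets xenfb, which is the case the original commit set out to enable.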


Re: [Xen-devel] libxl__device_pci_reset() questions

2015-02-27 Thread Konrad Rzeszutek Wilk

On Thu, Feb 26, 2015 at 02:28:34PM +, Jan Beulich wrote:
  On 19.02.15 at 15:30, ian.campb...@citrix.com wrote:
  On Thu, 2015-02-19 at 13:59 +, Jan Beulich wrote:
  All,
  
  in the context of someone seeing "The kernel doesn't support reset
  from sysfs for PCI device", is my understanding correct that the lack
  of error checking in any caller (perhaps intentional) means that any
  of the errors logged from this function are really just warnings, i.e.
  don't prevent the assignment from taking place?
  
  It was a long while ago, but I believe that was the intention, yes.
  
  Furthermore I'm puzzled by the function first thing trying to access
  a do_flr file supposedly made available by the pciback driver, yet
  I can't see either the upstream or the old 2.6.18 driver surfacing
  such a file. What am I missing here?
  
  I'm not sure, on the basis of
  http://lists.xen.org/archives/html/xen-devel/2014-06/msg03105.html and 
  http://lists.xen.org/archives/html/xen-devel/2014-07/msg01108.html I've
  added Konrad to the CC.
 
 Konrad?

I talked with David about this and his points were:
 1). If the device advertises that it can 'reset', it had better be able to do it.

 2). However, there are some that lie. If they exist, we should have a quirk for them
 in the PCI layer so that we don't think this feature is available.

 3). Where the PCI device has none of the mechanisms to do the reset,
 we should provide one via xen-pciback.

For 3), David has a patch, which is in XenServer, that does the work: it first
figures out whether the PCI device reports being able to do the reset. If it
does not, we install our own 'reset' SysFS entry, which does the bus reset.
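The three points above can be summarised as a decision function. This is a hypothetical sketch: the enum, the flags and choose_reset() are invented for illustration and are not taken from pciback, libxl or the XenServer patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reset methods, one per case in the policy above. */
enum reset_method {
    RESET_NONE,        /* no reset available; the caller can only log it */
    RESET_DEVICE,      /* the device's own advertised reset */
    RESET_PCIBACK_BUS, /* bus reset provided via xen-pciback */
};

/* 1) trust an advertised reset, 2) unless a PCI-layer quirk marks it as
 * a lie, 3) otherwise fall back to a pciback-provided bus reset. */
static enum reset_method choose_reset(bool advertises_reset,
                                      bool quirk_broken_reset,
                                      bool bus_reset_possible)
{
    if (advertises_reset && !quirk_broken_reset)
        return RESET_DEVICE;
    if (bus_reset_possible)
        return RESET_PCIBACK_BUS;
    return RESET_NONE;
}
```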

However, looking at how VFIO and QEMU do it, there is also a check in
the user-space part, where it decides in some cases to ignore the 'reset'
from SysFS and do its bus reset via the VFIO ioctl. I haven't yet dug
completely into the code to understand what logic decides when it has to
use the VFIO ioctl bus reset instead of the PCI reset mechanism.

 
 Thanks, Jan
 


Re: [Xen-devel] [OSSTEST PATCH 3/8] emails: honour OSSTEST_EMAIL_SUBJECT_PREFIX

2015-02-27 Thread Ian Campbell

On Thu, 2015-02-26 at 17:44 +, Ian Jackson wrote:
 Ian Campbell writes (Re: [OSSTEST PATCH 3/8] emails: honour 
 OSSTEST_EMAIL_SUBJECT_PREFIX):
  On Wed, 2015-02-25 at 13:01 +, Ian Jackson wrote:
   This is prefixed before the other computed prefixes.  It makes it
   easier to distinguish an adhoc cr-daily-branch test runs for a real
   branch.
  
  Do they not already get adhoc in the $subject? i.e. my commissioning
  runs for the new arm create (following README.dev procedure) resulted in
  mails with:
  
  [adhoc test] 34418: trouble: blocked/broken/fail/pass
  
  (IOW it seems $branch is replaced by adhoc somewhere along the way)
 
 That happens if you use mg-execute-flight.  If you let cr-daily-branch
 run the flight for you, it uses the standard email stuff.

Ah, OK, I didn't realise there was a difference.

So Ack to this and the next patch which I didn't ack for similar
reasons. (I think that makes the whole series acked, FWIW)

 
 Ian.




Re: [Xen-devel] Poor network performance between DomU with multiqueue support

2015-02-27 Thread openlui

On Mon, Dec 08, 2014 at 01:08:18PM +, Zhangleiqiang (Trump) wrote:
  On Mon, Dec 08, 2014 at 06:44:26AM +, Zhangleiqiang (Trump) wrote:
On Fri, Dec 05, 2014 at 01:17:16AM +, Zhangleiqiang (Trump) wrote:
[...]
  I think that's expected, because guest RX data path still 
  uses grant_copy while guest TX uses grant_map to do zero-copy 
  transmit.

 As far as I know, there are three main grant-related 
 operations used in split
device model: grant mapping, grant transfer and grant copy.
 Grant transfer has not used now, and grant mapping and grant 
 transfer both
involve TLB refresh work for hypervisor, am I right?  Or only 
grant transfer has this overhead?
   
Transfer is not used so I can't tell. Grant unmap causes TLB flush.
   
I saw in an email the other day XenServer folks has some planned 
improvement to avoid TLB flush in Xen to upstream in 4.6 window. 
I can't speak for sure it will get upstreamed as I don't work on that.
   
 Does grant copy surely has more overhead than grant mapping?

   
At the very least the zero-copy TX path is faster than previous 
copying path.
   
But speaking of the micro operation I'm not sure.
   
There was once persistent map prototype netback / netfront that 
establishes a memory pool between FE and BE then use memcpy to 
copy data. Unfortunately that prototype was not done right so 
the result was not
  good.
  
   The newest mail about persistent grant I can find is sent from 16 
   Nov
   2012
   (http://lists.xen.org/archives/html/xen-devel/2012-11/msg00832.html).
   Why is it not done right and not merged into upstream?
  
  AFAICT there's one more memcpy than necessary, i.e. frontend memcpy 
  data into the pool then backend memcpy data out of the pool, when 
  backend should be able to use the page in pool directly.
 
 memcpy should be cheaper than grant_copy because the former does not need
 a hypercall, which causes a VM exit to the Xen hypervisor, am I
 right? For the RX path, using memcpy based on a persistent grant table may
 give higher performance than using grant copy does now.

In theory yes. Unfortunately nobody has benchmarked that properly.
I have done some RX performance testing with the persistent grant method and the
upstream method (3.17.4 branch); the results show that the persistent grant method
does perform better than the upstream method (from 3.5Gbps to about
6Gbps). I also find that the persistent grant mechanism is already used in
blkfront/blkback, so I am wondering why there has been no effort to replace grant
copy with persistent grants, at least in the RX path. Are there other
disadvantages of the persistent grant method that stop us from using it?

PS. I used pkt-gen to send packets from a dom0 to a domU running on another dom0.
The CPUs of both dom0s are Intel E5640 2.4GHz, and the two dom0s are connected
with a 10GE NIC.
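The extra copy Wei describes in the prototype (frontend memcpy into the pool, backend memcpy out of it) can be illustrated with a toy model. This is a userspace sketch only: a static array stands in for the persistently granted pages, the function names are hypothetical, and a counter records each memcpy(); no grant-table operations are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_PAGES 4
#define PAGE_SZ    4096

/* Stand-in for pages granted once at setup and kept mapped by both ends. */
static unsigned char pool[POOL_PAGES][PAGE_SZ];
static unsigned int copies; /* number of memcpy() calls made */

/* Frontend side: copy the packet into a pool slot (one copy). */
static void fe_put(size_t slot, const void *pkt, size_t len)
{
    assert(slot < POOL_PAGES && len <= PAGE_SZ);
    memcpy(pool[slot], pkt, len);
    copies++;
}

/* Prototype-style backend: copy the packet back out (second copy). */
static void be_get_copy(size_t slot, void *out, size_t len)
{
    assert(slot < POOL_PAGES && len <= PAGE_SZ);
    memcpy(out, pool[slot], len);
    copies++;
}

/* Suggested backend: use the pool page directly, no extra copy. */
static const unsigned char *be_get_inplace(size_t slot)
{
    assert(slot < POOL_PAGES);
    return pool[slot];
}
```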




If you're interested in doing work on optimising RX performance, you might 
want to sync up with XenServer folks?

 
 I have seen "move grant copy to guest" and "Fix grant copy alignment
 problem" as optimization methods used in NetChannel2
 (http://www-archive.xenproject.org/files/xensummit_fall07/16_JoseRenatoSantos.pdf).
 Unfortunately, NetChannel2 seems not to be supported since 2.6.32. Do you
 know about them, and would they be helpful for RX path optimization in the
 current upstream implementation?

Not sure, that's long before I ever started working on Xen.

 
 By the way, after rethinking the testing results for the multi-queue PV
 (kernel 3.17.4 + Xen 4.4) implementation, I find that when using four
 queues for netback/netfront, there are about 3 netback processes
 running with high CPU usage on the receiving Dom0 (about 85% usage per
 process, each running on one CPU core), and the aggregate throughput is only
 about 5Gbps. I suspect there may be some bug or pitfall in the current
 multi-queue implementation, because occupying nearly all of 3 CPU cores
 for packet receiving at only 5Gbps throughput is somewhat abnormal.
 

3.17.4 doesn't contain David Vrabel's fixes.

Look for
  bc96f648df1bbc2729abbb84513cf4f64273a1f1
  f48da8b14d04ca87ffcffe68829afd45f926ec6a
  ecf08d2dbb96d5a4b4bcc53a39e8d29cc8fef02e
in David Miller's net tree.

BTW there are some improvements planned for 4.6: "[Xen-devel] [PATCH v3 0/2]
gnttab: Improve scaleability". This is orthogonal to the problem you're trying
to solve but it should help improve performance in general.


Wei.

Re: [Xen-devel] [PATCH v3] RFC: Automatically check xen's public headers for C++ pitfalls.

2015-02-27 Thread Jan Beulich

 On 27.02.15 at 10:22, t...@xen.org wrote:
 At 08:36 + on 27 Feb (1425022578), Jan Beulich wrote:
  On 26.02.15 at 17:24, t...@xen.org wrote:
  +PUBLIC_ANSI_HEADERS := $(filter-out public/%ctl.h public/xsm/% public/%hvm/save.h, $(PUBLIC_HEADERS))
  +
  +headers.chk: $(PUBLIC_ANSI_HEADERS) Makefile
  +  for i in $(filter %.h,$^); do \
  +  $(CC) -x c -ansi -Wall -Werror -include stdint.h \
  +-S -o /dev/null $$i || exit 1; \
  +  echo $$i; \
  +  done >$@.new
  +  mv $@.new $@
  +
  +headers++.chk: $(PUBLIC_HEADERS) Makefile
  +  if $(CXX) -v >/dev/null 2>&1; then \
  +  for i in $(filter %.h,$^); do \
  +  $(CXX) -x c++ -std=gnu++98 -Wall -Werror \
  + -D__XEN_TOOLS__ -Dprivate=private_is_a_keyword_in_cpp \
 
 With -D__XEN_TOOLS__ added, did you check that domctl.h and
 sysctl.h still actually need to be excluded from this test?
 
 The C++ check includes those headers and defines __XEN_TOOLS__; the
 ANSI C check does neither (as before). 

Argh - I again didn't look closely enough; I'm sorry.

 Would you like to change that too?

No.

Ack on v3 then.

Jan



Re: [Xen-devel] [OSSTEST PATCH 9/8] README.dev: Runes for adhoc testing in the production environment

2015-02-27 Thread Ian Campbell

On Thu, 2015-02-26 at 17:53 +, Ian Jackson wrote:
 Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com

Looks good, Acked-by: Ian Campbell ian.campb...@citrix.com
 ---
  README.dev |   18 ++
  1 file changed, 18 insertions(+)
 
 diff --git a/README.dev b/README.dev
 index aae4f17..03c3e61 100644
 --- a/README.dev
 +++ b/README.dev
 @@ -164,3 +164,21 @@ $HOME/bisects/for-$branch.git/stop
  $HOME/testing.git/$xenbranch.stop
  
stops everything using $xenbranch
 +
 +Adhoc testing in the production environment
 +===
 +
 +Adhoc (`play') testing of a proposed osstest branch:
 +
 +  As yourself on the osstest controller VM:
 +
 +  Check out the version of osstest to be tested.  If you are editing
 +  on your workstation, it is easiest to commit everything and then
 + git-push osstestvm:osstest-wombat-tree.git +HEAD:t
 +  and on the controller
 + git checkout t~0
 +
 +  Create (on the controller) daily-cron-email-foo containing
 + To: something appropriate
 +  Then
 + OSSTEST_EMAIL_HEADER=daily-cron-email-foo OSSTEST_USE_HEAD=y OSSTEST_NO_BASELINE=y ./cr-daily-branch osstest




Re: [Xen-devel] [PATCH v2 3/3] xen/arm: allow console=hvc0 to be omitted for guests

2015-02-27 Thread Ian Campbell

On Thu, 2015-02-26 at 18:22 +, Stefano Stabellini wrote:
 On Wed, 18 Feb 2015, Ian Campbell wrote:
  On Wed, 2015-02-18 at 09:50 -0600, Rob Herring wrote:
   On Wed, Feb 18, 2015 at 7:51 AM, Julien Grall julien.gr...@linaro.org 
   wrote:
From: Ard Biesheuvel ard.biesheu...@linaro.org
   
This patch registers hvc0 as the preferred console if no console
has been specified explicitly on the kernel command line.
   
The purpose is to allow platform agnostic kernels and boot images
(such as distro installers) to boot in a Xen/ARM domU without the
need to modify the command line by hand.
   
   How does this interact with DT chosen stdout-path?
  
  I think it shouldn't any more than the existing calls from e.g. the 8250
  driver to preferred_console do.
 
   Is there a node for hvc0?
  
  Not a direct one, it is inferred from the presence of the general Xen
  node.
 
 Xen PV consoles, including hvc0, like all the other Xen PV devices, are
 advertised on xenstore.

Do we actually use the xenstore node for hvc0? I thought we got it from
hvmparams (so that the primary console can be used before xenstore is up).

 
 
  I did vaguely consider handling a stdout-path pointing to that --
  but it seemed a bit of an abuse.
 




Re: [Xen-devel] RFC: xen config changes v4

2015-02-27 Thread Stefano Stabellini

On Fri, 27 Feb 2015, Juergen Gross wrote:
 On 02/26/2015 06:42 PM, Stefano Stabellini wrote:
  On Thu, 26 Feb 2015, Luis R. Rodriguez wrote:
   On Thu, Feb 26, 2015 at 11:08:20AM +, Stefano Stabellini wrote:
On Thu, 26 Feb 2015, David Vrabel wrote:
 On 26/02/15 04:59, Juergen Gross wrote:
  
  So we are again in the situation that pv-drivers always imply the
  pvops
  kernel (PARAVIRT selected). I started the whole Kconfig rework to
  eliminate this dependency.
 
 Yes.  Can you produce a series that just addresses this one issue.
 
 In the absence of any concrete requirement for this big Kconfig reorg,
 I don't think it is helpful.

I clearly missed some context as I didn't realize that this was the
intended goal. Why do we want this? Please explain as it won't come
for free.


We have a few PV interfaces for HVM guests that need PARAVIRT in Linux
in order to be used, for example pv_time_ops and HVMOP_pagetable_dying.
They are critical performance improvements and, from the interface
perspective, small enough that it doesn't make much sense to have a separate
Kconfig option for them.


In order to reach the goal above we necessarily need to introduce a
differentiation in terms of PV on HVM guests in Linux:

1) basic guests with PV network, disk, etc but no PV timers, no
HVMOP_pagetable_dying, no PV IPIs
2) full PV on HVM guests that have PV network, disk, timers,
HVMOP_pagetable_dying, PV IPIs and anything else that makes sense.

2) is much faster than 1) on Xen and 2) is only a tiny bit slower than
1) on native x86
   
   Also don't we shove 2) down hvm guests right now? Even when everything is
   built in I do not see how we opt out for HVM for 1) at run time right now.
   
   If this is true then the question of motivation for this becomes even
   stronger I think.
  
  Yes, indeed there is no way to do 1) at the moment. And for good
  reasons, see above.
 
 Hmm, after checking the code I'm not convinced:
 
 - HVMOP_pagetable_dying is obsolete on modern hardware supporting
   EPT/HAP

That might be true, but what about older hardware?
Even on modern hardware a few workloads still run faster on shadow.
But if HVMOP_pagetable_dying is the only reason to keep PARAVIRT for HVM
guests, then I agree with you that we should remove it.


 - PV IPIs are not needed on single-vcpu guests

 - PARAVIRT_CLOCK doesn't need PARAVIRT (in fact SUSE's kernel configs
   for all x86_64 kernels have CONFIG_PARAVIRT_CLOCK=y)
 
 So I think we really should enable building Xen frontends without
 PARAVIRT, implying at least no XEN_PV and no XEN_PVH.
 
 I'll have a try setting up patches.
 
If we are doing this as a performance improvement, I would like to see a
couple of benchmarks (kernbench, hackbench) to show that on a
single-vcpu guest and multi-vcpu guest (let's say 4 vcpus) disabling
PARAVIRT leads to better performance on Xen on EPT hardware.


Re: [Xen-devel] [PATCH 1/5] x86: allow specifying the NUMA nodes Dom0 should run on

2015-02-27 Thread Jan Beulich

 On 26.02.15 at 18:14, dario.faggi...@citrix.com wrote:
 On Thu, 2015-02-26 at 13:52 +, Jan Beulich wrote:
 +### dom0\_nodes
 +
 +> `= <integer>[,...]`
 +
 +Specify the NUMA nodes to place Dom0 on. Defaults for vCPU-s created
 +and memory assigned to Dom0 will be adjusted to match the node
 +restrictions set up here. Note that the values to be specified here are
 +ACPI PXM ones, not Xen internal node numbers.
 +
 Why use PXM ids? It might be me being much more used to work with NUMA
 node ids, but wouldn't the other way round be more consistent (almost
 everything the user interacts with after boot speak node ids) and easier
 for the user to figure things out (e.g., with tools like numactl on
 baremetal)?

This way behavior doesn't change if internally in the hypervisor we
need to change the mapping from PXMs to node IDs.
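Jan's point can be seen with a toy PXM-to-node table. A hypothetical sketch: the table contents and helper names are invented, standing in for the mapping the hypervisor builds from the ACPI SRAT at boot.

```c
#include <assert.h>
#include <stddef.h>

#define NO_NODE (-1)

/* Hypothetical mapping: firmware proximity domains 0, 1, 4, 5 become
 * internal nodes 0..3. A different hypervisor version could number
 * them differently without the firmware (PXM) values changing. */
struct pxm_map {
    unsigned int pxm;
    int node;
};

static const struct pxm_map pxm_to_node_tbl[] = {
    { 0, 0 }, { 1, 1 }, { 4, 2 }, { 5, 3 },
};

/* Translate a user-supplied PXM into the internal node ID. */
static int pxm_to_node(unsigned int pxm)
{
    for (size_t i = 0; i < sizeof(pxm_to_node_tbl) / sizeof(pxm_to_node_tbl[0]); i++)
        if (pxm_to_node_tbl[i].pxm == pxm)
            return pxm_to_node_tbl[i].node;
    return NO_NODE;
}
```

A `dom0_nodes=4` option keeps naming the same firmware proximity domain even if a later hypervisor numbers its nodes differently; only the lookup table would change.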

 +static struct vcpu *__init setup_vcpu(struct domain *d, unsigned int vcpu_id,
 +  unsigned int cpu)
 +{
 +struct vcpu *v = alloc_vcpu(d, vcpu_id, cpu);
 +
 +if ( v )
 +{
 +if ( !d->is_pinned )
 +cpumask_copy(v->cpu_hard_affinity, dom0_cpus);
 +cpumask_copy(v->cpu_soft_affinity, dom0_cpus);
 +}
 +
 About this, for DomUs, now that we have soft affinity available, what we
 do is set only soft affinity to match the NUMA placement. I think I see
 and agree why we want to be 'more strict' in Dom0, but I felt it
 was worth pointing out the difference in behaviour (should it be
 documented somewhere?).

I'm simply adjusting what sched_init_vcpu() did, which is alter
hard affinity conditionally on is_pinned and soft affinity
unconditionally.
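The behaviour Jan describes (hard affinity set only when Dom0 is not pinned, soft affinity always) can be modelled with plain bitmasks. A userspace sketch: uint64_t stands in for cpumask_t, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical vCPU with one bit per physical CPU. */
struct toy_vcpu {
    uint64_t hard_affinity; /* CPUs the vCPU may run on */
    uint64_t soft_affinity; /* CPUs the scheduler prefers */
};

/* Mirrors the rule in the patch: when is_pinned, hard affinity is left
 * as already set; soft affinity always follows the Dom0 node CPUs. */
static void apply_dom0_affinity(struct toy_vcpu *v, bool is_pinned,
                                uint64_t dom0_cpus)
{
    if (!is_pinned)
        v->hard_affinity = dom0_cpus;
    v->soft_affinity = dom0_cpus;
}
```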

 BTW, mostly out of curiosity, I've had a few strange issues/conflicts in
 applying this on top of staging, in order to test it... Was it me doing
 something very stupid, or was this based on something different?

Apart from the one patch named in the cover letter there shouldn't
be any other dependencies. Without you naming the issues you
encountered, I can't tell.

Jan



Re: [Xen-devel] [PATCH v3] RFC: Automatically check xen's public headers for C++ pitfalls.

2015-02-27 Thread Tim Deegan

At 08:36 + on 27 Feb (1425022578), Jan Beulich wrote:
  On 26.02.15 at 17:24, t...@xen.org wrote:
  +PUBLIC_ANSI_HEADERS := $(filter-out public/%ctl.h public/xsm/% public/%hvm/save.h, $(PUBLIC_HEADERS))
  +
  +headers.chk: $(PUBLIC_ANSI_HEADERS) Makefile
  +   for i in $(filter %.h,$^); do \
  +   $(CC) -x c -ansi -Wall -Werror -include stdint.h \
  + -S -o /dev/null $$i || exit 1; \
  +   echo $$i; \
  +   done >$@.new
  +   mv $@.new $@
  +
  +headers++.chk: $(PUBLIC_HEADERS) Makefile
  +   if $(CXX) -v >/dev/null 2>&1; then \
  +   for i in $(filter %.h,$^); do \
  +   $(CXX) -x c++ -std=gnu++98 -Wall -Werror \
  +  -D__XEN_TOOLS__ -Dprivate=private_is_a_keyword_in_cpp \
 
 With -D__XEN_TOOLS__ added, did you check that domctl.h and
 sysctl.h still actually need to be excluded from this test?

The C++ check includes those headers and defines __XEN_TOOLS__; the
ANSI C check does neither (as before).  Would you like to change that too?

Tim.


Re: [Xen-devel] [xen-unstable test] 35257: regressions - FAIL

2015-02-27 Thread Wei Liu

On Fri, Feb 27, 2015 at 09:42:29AM +0000, Ian Campbell wrote:
 On Thu, 2015-02-26 at 20:14 +0000, xen.org wrote:
  flight 35257 xen-unstable real [real]
  http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/
  
  Regressions :-(
  
  Tests which did not succeed and are blocking,
  including tests which could not be run:
   test-armhf-armhf-libvirt 12 guest-start.2 fail REGR. vs. 34629
 
 logs:
 http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/test-armhf-armhf-libvirt/info.html
 
 http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/test-armhf-armhf-libvirt/12.ts-guest-start.log
 2015-02-23 20:21:48 Z executing ssh ... root@10.80.229.106 virsh domxml-from-native xen-xl /etc/xen/debian.guest.osstest.cfg > /etc/xen/debian.guest.osstest.cfg.xml
 error: failed to connect to the hypervisor
 error: no valid connection
 error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
 
 http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/test-armhf-armhf-libvirt/marilith-n4-output-ps_wwwaxf_-eo_pid%2Ctty%2Cstat%2Ctime%2Cnice%2Cpsr%2Cpcpu%2Cpmem%2Cnwchan%2Cwchan%2325%2Cargs
 appears to show no libvirtd process.
 
 http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/test-armhf-armhf-libvirt/marilith-n4---var-log-libvirt-libvirtd.log
 says:
 2015-02-23 20:13:15.556+0000: 2133: info : libvirt version: 1.2.13
 2015-02-23 20:13:15.556+0000: 2133: error : dnsmasqCapsRefreshInternal:726 : Cannot check dnsmasq binary dnsmasq: No such file or directory
 2015-02-23 20:13:15.845+0000: 2133: error : virFirewallValidateBackend:193 : direct firewall backend requested, but /sbin/ebtables is not available: No such file or directory
 
 I think these are just spurious.
 
 2015-02-23 20:13:15.845+0000: 2133: error : virFirewallApply:936 : out of memory
 
 
 2015-02-23 20:13:16.092+0000: 2133: error : virExec:491 : Cannot find 'pm-is-supported' in path: No such file or directory
 2015-02-23 20:13:16.092+0000: 2133: warning : virQEMUCapsInit:999 : Failed to get host power management capabilities
 
 As are these two.
 
 2015-02-23 20:13:16.400+0000: 2133: error : virFirewallApply:936 : out of memory
 

Last time Ian and I debugged a libvirt crashing bug, out of memory
didn't cause libvirtd to exit. It turned out to be a bug in the libxl
event machinery that caused libvirtd to exit, but the assertion message
was not shown anywhere.

I think we might need to log in to that host and run libvirtd in
the foreground to determine what goes wrong.

Wei.


 Have these OOM messages resulted in libvirtd exiting? I don't see any
 evidence of a crash elsewhere in the logs (i.e. no process segfaulted
 in dmesg, no OOM killing going on etc).
 
 We don't seem to collect dom0 freemem info, but that most likely
 wouldn't help given the libvirtd process has exited.
 
 Any ideas where to look next?
 
 Ian.
 
 


Re: [Xen-devel] [PATCH 1/5] x86: allow specifying the NUMA nodes Dom0 should run on

2015-02-27 Thread Jan Beulich

 On 27.02.15 at 11:04, dario.faggi...@citrix.com wrote:
 On Fri, 2015-02-27 at 08:46 +0000, Jan Beulich wrote:
  On 26.02.15 at 18:14, dario.faggi...@citrix.com wrote:
  On Thu, 2015-02-26 at 13:52 +, Jan Beulich wrote:
  +### dom0\_nodes
  +
  + `= integer[,...]`
  +
  +Specify the NUMA nodes to place Dom0 on. Defaults for vCPU-s created
  +and memory assigned to Dom0 will be adjusted to match the node
  +restrictions set up here. Note that the values to be specified here are
  +ACPI PXM ones, not Xen internal node numbers.
  +
  Why use PXM ids? It might be me being much more used to work with NUMA
  node ids, but wouldn't the other way round be more consistent (almost
  everything the user interacts with after boot speaks node ids) and easier
  for the user to figure things out (e.g., with tools like numactl on
  baremetal)?
 
 This way behavior doesn't change if internally in the hypervisor we
 need to change the mapping from PXMs to node IDs.
 
 Ok, I see the value of this. I'm still a bit concerned about the fact
 that everything else speaks NUMA node IDs, but it's probably just me being
 much more used to that than to PXMs. :-)

With everything else I suppose you mean the tool stack? There
shouldn't be any node IDs kept across reboots there. Yet the
consistent behavior to be achieved here is particularly for multiple
boots.
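
To make the PXM-versus-node-ID point concrete, here is a minimal sketch of parsing a dom0_nodes=-style list of PXM values and translating each one through a PXM-to-node lookup. Everything in it is hypothetical (the table, parse_dom0_pxms(), the 64-bit node mask); it is not the code from this patch. What it illustrates: if the hypervisor's internal numbering changes, only the table changes, while the command line the user wrote keeps the same meaning across boots.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_PXM 256
#define NO_NODE 0xffu   /* marks a PXM with no node */

/* Hypothetical PXM -> internal node ID table, as firmware (SRAT)
 * parsing might fill it in; not Xen's actual data structure. */
static unsigned char pxm_to_node_tbl[MAX_PXM];

static void init_pxm_table(void)
{
    for (unsigned int i = 0; i < MAX_PXM; i++)
        pxm_to_node_tbl[i] = NO_NODE;
}

/*
 * Parse a comma-separated list of PXM values such as "1,3" and return
 * a bitmask of internal node IDs.  Returns 0 on any error (bad syntax,
 * unknown PXM, node ID too large for the mask).
 */
static uint64_t parse_dom0_pxms(const char *s)
{
    uint64_t nodes = 0;

    assert(s != NULL);
    while (*s) {
        char *end;
        unsigned long pxm = strtoul(s, &end, 0);

        if (end == s || pxm >= MAX_PXM || pxm_to_node_tbl[pxm] >= 64)
            return 0;           /* NO_NODE (0xff) also fails this test */
        nodes |= UINT64_C(1) << pxm_to_node_tbl[pxm];

        if (*end == ',')
            end++;
        else if (*end != '\0')
            return 0;
        s = end;
    }
    return nodes;
}
```

For example, with PXM 1 mapped to node 0 and PXM 3 to node 1, parse_dom0_pxms("1,3") yields the mask 0x3.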

  +static struct vcpu *__init setup_vcpu(struct domain *d, unsigned int vcpu_id,
  +                                      unsigned int cpu)
  +{
  +    struct vcpu *v = alloc_vcpu(d, vcpu_id, cpu);
  +
  +    if ( v )
  +    {
  +        if ( !d->is_pinned )
  +            cpumask_copy(v->cpu_hard_affinity, dom0_cpus);
  +        cpumask_copy(v->cpu_soft_affinity, dom0_cpus);
  +    }
  +
  About this, for DomUs, now that we have soft affinity available, what we
  do is set only soft affinity to match the NUMA placement. I think I see
  and agree why we want to be 'more strict' in Dom0, but I felt like it
  was worth pointing out the difference in behaviour (should it be
  documented somewhere?).
 
 I'm simply adjusting what sched_init_vcpu() did, which is alter
 hard affinity conditionally on is_pinned and soft affinity
 unconditionally.
 
 Ok, I understand the idea behind this better now, thanks.
 [...]
 Setting soft affinity as a superset of (in the former case) or equal to
 (in the latter) hard affinity is just pure overhead, when in the
 scheduler.

Then why does sched_init_vcpu() do what it does? If you want to
alter that, I'm fine with altering it here.

 In fact, if the scheduler sees that soft affinity is defined, it will go
 through the load balancing/vcpu placement logic twice, the first time
 using the soft affinity mask, the second using the hard affinity one.
 Actually, the first time it uses 'soft & hard', which in these cases is
 exactly equal to hard, and that's why I'm calling this pure overhead.
 
 I probably should add checks in the scheduler to identify such
 situations as no need to consider soft affinity. I thought about this
 before, but didn't do that because it means more cpumask_foo() fiddling in
 a few hot paths... but of course I can check for the relationship
 between hard and soft affinity masks upfront, cache the result in a
 bool_t, and use _that_ in hot paths... what do you think?

Avoiding the fiddling in hot paths is surely desirable. But it would
indeed seem even better to avoid the inefficiency in the first place
(i.e. when storing affinities).

 All this being said, I still would avoid putting the system in a
 configuration where soft is superset or equal to hard, at the very least
 not automatically, as I think it can appear confusing to the user (the
 user himself can, of course, do that after boot, for Dom0 or DomUs, but
 that's another story, I think). So I'm now thinking whether it wouldn't
 be better to, in this patch, leave soft affinity alone completely.
 
 Then, if we want to make it possible to tweak soft affinity, we can
 allow for something like dom0_nodes=soft:1,3 and, in that case, alter
 soft affinity only.

Hmm, not sure. And I keep being confused whether soft means
allow and hard means prefer or the other way around. In any
event, again, with sched_init_vcpu() setting up things so that
soft is a superset of hard (and most likely they're equal), I don't
see why the same done here would be more of a problem.

  BTW, mostly out of curiosity, I've had a few strange issues/conflicts in
  applying this on top of staging, in order to test it... Was it me doing
  something very stupid, or was this based on something different?
 
 Apart from the one patch named in the cover letter there shouldn't
 be any other dependencies. Without you naming the issues you
 encountered, I can't tell.
 
 I see. Never mind then, maybe I messed up with my various branches...
 Sorry for bothering with this. :-)

No reason to be sorry - I'm more than happy if inconsistencies get
pointed out before trying to commit anything.

Jan

Re: [Xen-devel] [PATCH 3/4] xen: sched: make counters for vCPU tickling generic

2015-02-27 Thread Dario Faggioli

On Fri, 2015-02-27 at 00:47 -0500, Meng Xu wrote:

 2015-02-26 8:37 GMT-05:00 Dario Faggioli dario.faggi...@citrix.com:
 and update them from Credit2 and RTDS schedulers.
 
 Signed-off-by: Dario Faggioli dario.faggi...@citrix.com
 Cc: Meng Xu xumengpa...@gmail.com
 Cc: George Dunlap george.dun...@eu.citrix.com
 Cc: Jan Beulich jbeul...@suse.com
 Cc: Keir Fraser k...@xen.org
 ---
  xen/common/sched_credit2.c   |2 ++
  xen/common/sched_rt.c|2 ++
  xen/include/xen/perfc_defn.h |4 ++--
  3 files changed, 6 insertions(+), 2 deletions(-)
 
 The change for RTDS scheduler looks good to me.

Does this count as a Reviewed-by: Meng Xu men...@cis.upenn.edu ?

Also, if yes, does it also apply to patch #2 ? That is unclear as
sched_rt.c is modified in patches #1, #2 and #3, while what you did is:
 - you explicitly provided the tag for patch #1
 - you said looks good for this for patch #3
 - you said nothing for patch #2

The bottom line of all this being: with Ack-s/Reviewed-by-s, it's always
better to be pretty explicit! :-D

Thanks and Regards,
Dario




Re: [Xen-devel] freemem-slack and large memory environments

2015-02-27 Thread Stefano Stabellini

On Thu, 26 Feb 2015, Mike Latimer wrote:
 On Thursday, February 26, 2015 01:45:16 PM Mike Latimer wrote:
  On Thursday, February 26, 2015 05:53:06 PM Stefano Stabellini wrote:
   What is the return value of libxl_set_memory_target and
   libxl_wait_for_free_memory in that case? Isn't it just a matter of
   properly handle the return values?
  
  The return from libxl_set_memory_target is 0, as the assignment works just
  fine. I don't have the return from libxl_wait_for_free_memory in my notes,
  so I'll spin up another test and track that down.
 
 I slightly misspoke here... In my testing, the returns are actually:
 
libxl_set_memory_target = 1

The new memory target is set for dom0 successfully.


libxl_wait_for_free_memory = -5

Still there isn't enough free memory in the system.


libxl_wait_for_memory_target = 0

However dom0 reached the new memory target already.
Who is stealing your memory?


   Note - libxl_wait_for_memory_target is confusing, as rc can be set
   to ERROR_FAIL, but the function returns 0 anyway (unless an error
   is encountered earlier.) I guess this just means we need to continue
   to wait...

Maybe I am misunderstanding what you meant, but as far as I can tell rc
is set to ERROR_FAIL only right before the out label in
libxl_wait_for_memory_target. In that case the function would return
ERROR_FAIL.

In any case in the context of libxl_wait_for_memory_target, ERROR_FAIL
means that the memory target has not been reached.

 
 I was testing spinning up a 64GB guest on a 2TB host. After the ballooning
 had completed, dom0 had ballooned down an extra ~320GB. On this particular
 machine, each iteration of the loop was showing only 5-7GB of memory being
 freed at a time. (The loop took 12 iterations.)

I would investigate why, although dom0 is ballooning down as much as you
asked it to, the free memory in the system is still not enough.


Re: [Xen-devel] Poor network performance between DomU with multiqueue support

2015-02-27 Thread Wei Liu

Cc'ing David (XenServer kernel maintainer)

On Fri, Feb 27, 2015 at 05:21:11PM +0800, openlui wrote:
 On Mon, Dec 08, 2014 at 01:08:18PM +0000, Zhangleiqiang (Trump) wrote:
   On Mon, Dec 08, 2014 at 06:44:26AM +0000, Zhangleiqiang (Trump) wrote:
 On Fri, Dec 05, 2014 at 01:17:16AM +0000, Zhangleiqiang (Trump) wrote:
 [...]
   I think that's expected, because guest RX data path still 
   uses grant_copy while guest TX uses grant_map to do zero-copy 
   transmit.
 
  As far as I know, there are three main grant-related operations used in
  the split device model: grant mapping, grant transfer and grant copy.
  Grant transfer is not used now, and grant mapping and grant transfer both
  involve TLB refresh work for the hypervisor, am I right? Or does only
  grant transfer have this overhead?

 Transfer is not used so I can't tell. Grant unmap causes TLB flush.

 I saw in an email the other day XenServer folks have some planned 
 improvement to avoid TLB flush in Xen to upstream in 4.6 window. 
 I can't speak for sure it will get upstreamed as I don't work on 
 that.

  Does grant copy necessarily have more overhead than grant mapping?
 

 At the very least the zero-copy TX path is faster than previous 
 copying path.

 But speaking of the micro operation I'm not sure.

 There was once a persistent map prototype netback / netfront that
 establishes a memory pool between FE and BE, then uses memcpy to
 copy data. Unfortunately that prototype was not done right so
 the result was not good.
   
The newest mail about persistent grant I can find was sent on 16 Nov 2012
(http://lists.xen.org/archives/html/xen-devel/2012-11/msg00832.html).
Why was it not done right, and why was it not merged upstream?
   
   AFAICT there's one more memcpy than necessary, i.e. frontend memcpy 
   data into the pool then backend memcpy data out of the pool, when 
   backend should be able to use the page in pool directly.
  
  Memcpy should be cheaper than grant_copy because the former does not need
  a hypercall, which causes a VM exit to the Xen hypervisor, am I
  right? For the RX path, using memcpy based on a persistent grant table may
  give higher performance than using grant copy does now.
 
 In theory yes. Unfortunately nobody has benchmarked that properly.

 I have done some testing of RX performance with the persistent grant method
 and the upstream method (3.17.4 branch), and the results show that the
 persistent grant method does give higher performance than the upstream method
 (from 3.5Gbps to about 6Gbps). I also find that the persistent grant mechanism
 is already used in blkfront/blkback, so I am wondering why there are no
 efforts to replace grant copy with persistent grants now, at least in the
 RX path. Are there other disadvantages of the persistent grant method
 that stop us from using it? 
 

I've seen numbers better than 6Gbps. See upstream changeset
1650d5455bd2dc6b5ee134bd6fc1a3236c266b5b.

Persistent grant is not a silver bullet. There is an email thread on the
list discussing whether it should be removed from the block driver.

XenServer folks have been working on improving network performance. It's
my understanding that they chose a different route from persistent
grants. David might have more insight.

Wei.

 PS. I used pkt-gen to send packets from dom0 to a domU running on
 another dom0; the CPUs of both dom0s are Intel E5640 2.4GHz, and the two
 dom0s are connected with a 10GE NIC.
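
The extra copy Wei points out in the persistent-grant prototype can be illustrated with a toy model that only counts memcpy traffic. This is not netback/netfront code and performs no grant operations; struct pool, fe_put(), be_get_copy() and be_get_direct() are hypothetical. With the prototype's scheme each packet is copied twice (into the pool, then out of it); letting the backend use the pool page directly leaves only the frontend's copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE  4096
#define POOL_PAGES 4

/* Hypothetical model of pages granted once (persistently) by the
 * frontend and kept mapped by the backend; no Xen calls involved. */
struct pool {
    unsigned char page[POOL_PAGES][PAGE_SIZE];
    size_t bytes_copied;            /* running total of memcpy traffic */
};

/* Frontend: copy a packet into a pool slot (the unavoidable copy). */
static void fe_put(struct pool *p, unsigned int slot,
                   const void *pkt, size_t len)
{
    assert(slot < POOL_PAGES && len <= PAGE_SIZE);
    memcpy(p->page[slot], pkt, len);
    p->bytes_copied += len;
}

/* Backend, prototype-style: copy the data out again into its own buffer. */
static void be_get_copy(struct pool *p, unsigned int slot,
                        void *buf, size_t len)
{
    assert(slot < POOL_PAGES && len <= PAGE_SIZE);
    memcpy(buf, p->page[slot], len);
    p->bytes_copied += len;
}

/* Backend, suggested alternative: use the pool page itself. */
static const unsigned char *be_get_direct(const struct pool *p,
                                          unsigned int slot)
{
    assert(slot < POOL_PAGES);
    return p->page[slot];
}
```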
 


Re: [Xen-devel] [Qemu-devel] [v2][PATCH] libxl: add one machine property to support IGD GFX passthrough

2015-02-27 Thread Ian Campbell

On Fri, 2015-02-27 at 14:28 +0800, Chen, Tiejun wrote:
 On 2015/2/27 0:17, Ian Campbell wrote:
  On Thu, 2015-02-26 at 14:35 +0800, Chen, Tiejun wrote:
 
  If we are going to do this then I think we need to arrange for the
  interface to be able to express the need to force the workarounds for a
  particular device. IOW a boolean will not suffice since it doesn't
  indicate that IGD workarounds are needed.
 
  Probably it would be simplest to just leave this functionality out for
  the time being and revisit if/when maintaining the list becomes an
  annoyance or an end user trips over it.
 
 
  You mean we should maintain one list to save all targeted devices, then
  the tools use the IDs as an index to look up this list to pass something to qemu.
 
  I (think I) meant a list of pci vid:did in libxl, which is matched
  against the devices passed to the domain (e.g. pci = [...] in xl cfg),
  which then enables the igd workarounds, i.e. by passing the option to
 
 Yeah, this is exactly what I'm understanding.
 
  qemu.
 
  But actually one question that I have always been thinking about is: is it
  really Xen's responsibility to determine which device type should be
  passed by probing that pair of vendor and device IDs? Xen is just one of
  many users of qemu, so such a rare workaround option could be
  passed explicitly by any user, instead of by Xen. Furthermore, this is
  more flexible for those cases where we want to force an override.
 
  I'm not sure, but I think you are suggesting that qemu should autodetect
  this situation, without being explicitly told igd-passthru=on on the
  command line?
 
  If the qemu maintainers are amenable to that, and it's not already the
  case that other components (e.g. hvmloader) need to be told about these
  workarounds, then I suppose that would work.
 
  So I think qemu should mainly play this role. If qemu realizes we're
  passing through an IGD or other targeted device, it should post a warning
  or even an error message to indicate what the right behavior is, or what
  the potential risk is by default.
 
  Hrm, here it sounds more like you are suggesting that qemu should detect
  and warn, rather than detect and do the right thing?
 
  I'm not sure how Qemu could indicate what the right behaviour is going
  to be, it'll differ for different hypervisors or even for which Xen
  toolstack (xl vs libvirt etc) is in use.
 
  Or maybe I've misunderstood?
 
 
 IGD is a tricky case since Qemu has to construct an ISA bridge and a host 
 bridge before we pass through the IGD device. But we don't want to expose these two 
 bridges unconditionally, and this is also why we need this option.
 
 Here I just mean when Qemu realizes IGD is passed through but without 
 that appropriate option set, Qemu can post something to explicitly 
 notify user that this option is needed in his case. But it may be a lazy 
 idea.

In any case I think the addition of such warnings in qemu is
separate from the discussion in this thread, so I propose to leave it
alone for now.

 So now I think I'd better go back to handling this on the Xen side with your 
 comments. As you said, the Boolean doesn't suffice to indicate that IGD 
 workarounds are needed. So I think we can reuse that existing bool 
 'gfx_passthru'.
 
 Firstly we can redefine this as string,

Unfortunately not since libxl's API guarantee requires older clients to
keep working, i.e. those who use libxl_defbool_set on this field.

Probably the best which can be done is to deprecate this field in favour
of a new one (the old field would need to be obeyed only if the new one
was set to its default value).

Probably an Enumeration would be better than a raw string here as well.

This approach doesn't allow for the possibility of multiple such
workarounds though. It's unclear to me if this matters or not.

The other option which I've mentioned is to leave gfx_passthru and have
libxl figure out which workarounds to enable based on the set of PCI
devices passed through. I guess you don't like that approach? (due to
the need to maintain the pci vid:did list?)
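
The alternative, having libxl decide from the devices actually passed through, amounts to a table lookup like the one below. It is a sketch only: struct pci_id, igd_ids[] and needs_igd_workarounds() are hypothetical, 0x8086 is Intel's PCI vendor ID, and the device IDs are placeholders rather than a real list of IGD parts. The table is exactly the list whose maintenance cost is being weighed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pci_id { uint16_t vendor, device; };

/* Hypothetical table of devices needing the IGD workarounds.  The
 * device IDs are placeholders, not real IGD parts. */
static const struct pci_id igd_ids[] = {
    { 0x8086, 0x1111 },
    { 0x8086, 0x2222 },
};

/* True if any device passed to the guest (e.g. from the pci = [...]
 * list in the xl config) appears in the table. */
static bool needs_igd_workarounds(const struct pci_id *devs, size_t n)
{
    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < sizeof(igd_ids) / sizeof(igd_ids[0]); j++)
            if (devs[i].vendor == igd_ids[j].vendor &&
                devs[i].device == igd_ids[j].device)
                return true;
    return false;
}
```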

 
 -   ("gfx_passthru", libxl_defbool),
 +   ("gfx_passthru", string),
 
 Then
 
 +
 +if (libxl__is_igd_vga_passthru(gc, guest_config) ||
 +(b_info->u.hvm.gfx_passthru &&
 + strncmp(b_info->u.hvm.gfx_passthru, "igd", 3) == 0) ) {
 +machinearg = GCSPRINTF("%s,igd-passthru=on", machinearg);
 +}
 +
 
 Of course we need modify something else to align this change.
 
 Thanks
 Tiejun




Re: [Xen-devel] freemem-slack and large memory environments

2015-02-27 Thread Stefano Stabellini

On Fri, 27 Feb 2015, Ian Campbell wrote:
 On Thu, 2015-02-26 at 13:38 -0700, Mike Latimer wrote:
  (Sorry for the delayed response, dealing with ENOTIME.)
  
  On Thursday, February 26, 2015 05:47:21 PM Ian Campbell wrote:
   On Thu, 2015-02-26 at 10:38 -0700, Mike Latimer wrote:
  
   rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0);
  
   I think so. In essence we just need to update need_memkb on each
   iteration, right?
  
  Not quite...
 
 Indeed, looking again I see that the 1 there means relative, so I'm
 still confused about why free_memkb - need_memkb isn't the correct delta
 on every iteration.
 
 Is the issue that if you have a current target of, say, 15 and you wish
 to go to ten you would say
 
   libxl_set_memory_target(, 15 - (-5), 1, 0)
 i.e.
   libxl_set_memory_target(, -5, 1, 0)
 
 then the target would be set to 10, but if during
 libxl_wait_for_free_memory you only ballooned -2 and failed the target
 gets left at 10 but the current free is actually now 13 so next time
 around you say:
 
   libxl_set_memory_target(, 13 - (-3), 1, 0)
 i.e.
   libxl_set_memory_target(, -3, 1, 0)
 
 and the target now becomes 10-3 == 7, rather than 13-3=10 as one might
 expect?
 
  need_memkb is used in the loop to determine if we have enough
  free memory for the new domain. So, need_memkb should always remain set to
  the total amount of memory requested - not just the amount of change still
  required.
  
  The easiest thing to do is set the dom0's memory target before the loop,
  which is what my original patch did.
 
 It seems like there are two viable approaches here:
 
 First is to just set the target before the loop and wait (perhaps much
 longer) for it to be achieved.

 The second is to decrement the target in smaller steps and wait to reach
 it each time.

 I don't think an approach which sets a target, waits for that target to
 be achieved and then on partial success tries to figure out what the
 relative progress is and what is left to achieve, and factors that into a
 new target request, makes sense.

The reason for the loop is not to make the memory decrease request more
digestible for dom0 or to cope with errors. The loop tries to handle
scenarios where the freed memory is not available to us somehow.
This is a more wordy explanation of it:

  get free memory
  is it enough? if so, return, otherwise continue
  set dom0 memory target = current - need
  is there enough memory now? if so, return, otherwise continue
  has dom0 actually reached his target? If so, loop again (who stole the memory?), otherwise fail (dom0 is busy)

This is consistent with Mike's logs: the memory is freed by dom0 but it
is not available somehow. Maybe XenD is running? Another guest is
ballooning up at the same time?


 This is all confounded by the fact that the libxl_wait_for_free_*
 functions have a barking interface.

That is true.


 I've just seen this comment right
 above:

 /*
  * WARNING
  * This memory management API is unstable even in Xen 4.2.
  * It has a numer of deficiencies and we intend to replace it.
  *
  * The semantics of these functions should not be relied on to be very
  * coherent or stable.  We will however endeavour to keep working
  * existing programs which use them in roughly the same way as libxl.
  */
 
 Given that I think that we should feel free, if necessary, to deprecate
 the current interface and replace it with one which is actually usable.
 Whatever that might mean.
 
 Ian.
 


Re: [Xen-devel] freemem-slack and large memory environments

2015-02-27 Thread Mike Latimer

On Friday, February 27, 2015 11:29:12 AM Mike Latimer wrote:
 On Friday, February 27, 2015 08:28:49 AM Mike Latimer wrote:
 After adding 2048aeec, dom0's target is lowered by the required amount (e.g.
 64GB), but as dom0 cannot balloon down fast enough,
 libxl_wait_for_memory_target returns -5, and the domain create fails
(wrong return code - libxl_wait_for_memory_target actually returns -3)

With libxl_wait_for_memory_target return code corrected (2048aeec), debug 
messages look like this:

Parsing config from sles12pv
 DBG: start freemem loop
 DBG: free_memkb = 541976, need_memkb = 67651584 (rc=0)
 DBG: dom0_curr_target = 2118976472, set_memory_target = -67109608 (rc=1)
 DBG: wait_for_free_memory = 67651584 (rc=-5)
 DBG: wait_for_memory_target (rc=-3)
failed to free memory for the domain

After failing, dom0 continues to balloon down by the requested amount 
(-67109608), so a subsequent startup attempt would work.

My original fix (2563bca1) was intended to continue looping in freemem until dom0 
ballooned down the requested amount. However, this really only worked without 
2048aeec, as wait_for_memory_target was always returning 0. After Stefano 
pointed out this problem, commit 2563bca1 can still be useful - but seems less 
important as ballooning down dom0 is where the major delays are seen.

The following messages show what was happening when wait_for_memory_target was 
always returning 0. I've narrowed it down to just the interesting messages:

DBG: free_memkb = 9794852, need_memkb = 67651584 (rc=0)
DBG: dom0_curr_target = 2118976464, set_memory_target = -67109596 (rc=1)
DBG: dom0_curr_target = 2051866868, set_memory_target = -57856732 (rc=1)
DBG: dom0_curr_target = 1994010136, set_memory_target = -50615004 (rc=1)
DBG: dom0_curr_target = 1943395132, set_memory_target = -43965148 (rc=1)
DBG: dom0_curr_target = 1899429984, set_memory_target = -37538524 (rc=1)
DBG: dom0_curr_target = 1861891460, set_memory_target = -31560412 (rc=1)
DBG: dom0_curr_target = 1830331048, set_memory_target = -25309916 (rc=1)
DBG: dom0_curr_target = 1805021132, set_memory_target = -19514076 (rc=1)
DBG: dom0_curr_target = 1785507056, set_memory_target = -13949660 (rc=1)
DBG: dom0_curr_target = 1771557396, set_memory_target = -8057564 (rc=1)
DBG: dom0_curr_target = 1763499832, set_memory_target = -1862364 (rc=1)

The above situation is no longer relevant, but the overall dom0 target problem 
is still an issue. It now seems rather obvious (hopefully) that the 10 second 
delay in wait_for_memory_target is not sufficient. Should that function be 
modified to monitor ongoing progress and continue waiting as long as progress 
is being made?

Sorry for the long discussion to get to this point. :(

-Mike




Re: [Xen-devel] RFC: xen config changes v4

2015-02-27 Thread Juergen Gross


On 02/27/2015 10:41 AM, Stefano Stabellini wrote:

On Fri, 27 Feb 2015, Juergen Gross wrote:

On 02/26/2015 06:42 PM, Stefano Stabellini wrote:

On Thu, 26 Feb 2015, Luis R. Rodriguez wrote:

On Thu, Feb 26, 2015 at 11:08:20AM +0000, Stefano Stabellini wrote:

On Thu, 26 Feb 2015, David Vrabel wrote:

On 26/02/15 04:59, Juergen Gross wrote:


So we are again in the situation that pv-drivers always imply the
pvops
kernel (PARAVIRT selected). I started the whole Kconfig rework to
eliminate this dependency.


Yes.  Can you produce a series that just addresses this one issue.

In the absence of any concrete requirement for this big Kconfig reorg
I don't think it is helpful.


I clearly missed some context as I didn't realize that this was the
intended goal. Why do we want this? Please explain as it won't come
for free.


We have a few PV interfaces for HVM guests that need PARAVIRT in Linux
in order to be used, for example pv_time_ops and HVMOP_pagetable_dying.
They are critical performance improvements and from the interface
perspective, small enough that it doesn't make much sense having a separate
KConfig option for them.


In order to reach the goal above we necessarily need to introduce a
differentiation in terms of PV on HVM guests in Linux:

1) basic guests with PV network, disk, etc but no PV timers, no
 HVMOP_pagetable_dying, no PV IPIs
2) full PV on HVM guests that have PV network, disk, timers,
 HVMOP_pagetable_dying, PV IPIs and anything else that makes sense.

2) is much faster than 1) on Xen and 2) is only a tiny bit slower than
1) on native x86


Also don't we shove 2) down hvm guests right now? Even when everything is
built in I do not see how we opt out for HVM for 1) at run time right now.

If this is true then the question of motivation for this becomes even
stronger I think.


Yes, indeed there is no way to do 1) at the moment. And for good
reasons, see above.


Hmm, after checking the code I'm not convinced:

- HVMOP_pagetable_dying is obsolete on modern hardware supporting
   EPT/HAP


That might be true, but what about older hardware?
Even on modern hardware a few workloads still run faster on shadow.
But if HVMOP_pagetable_dying is the only reason to keep PARAVIRT for HVM
guests, then I agree with you that we should remove it.



- PV IPIs are not needed on single-vcpu guests

- PARAVIRT_CLOCK doesn't need PARAVIRT (in fact SUSE's kernel configs
   for all x86_64 kernels have CONFIG_PARAVIRT_CLOCK=y)

So I think we really should enable building Xen frontends without
PARAVIRT, implying at least no XEN_PV and no XEN_PVH.

I'll have a try setting up patches.


If we are doing this as a performance improvement, I would like to see a
couple of benchmarks (kernbench, hackbench) to show that on a
single-vcpu guest and multi-vcpu guest (let's say 4 vcpus) disabling
PARAVIRT leads to better performance on Xen on EPT hardware.


This is not meant to be a performance improvement. It is meant to enable
a standard distro kernel configured without PARAVIRT to be able to run
as an HVM guest using the pv-drivers.

Juergen


Re: [Xen-devel] [PATCH v3] RFC: Automatically check xen's public headers for C++ pitfalls.

2015-02-27 Thread Tim Deegan

At 15:28 -0500 on 26 Feb (1424960919), Don Slutz wrote:
 On 02/26/15 11:24, Tim Deegan wrote:
  Explicitly _not_ addressing the use of 'private' in various fields,
  since we'd previously decided not to fix that.
 
 This sentence and the -Dprivate=private_is_a_keyword_in_cpp below
 appear to be at odds.

Yes, that's not very clear; will reword as I apply.

 You can add my
 
 Tested-by: Don Slutz dsl...@verizon.com

Thanks.

Tim.


Re: [Xen-devel] RFC: xen config changes v4

2015-02-27 Thread Ian Campbell

On Thu, 2015-02-26 at 19:48 +0100, Luis R. Rodriguez wrote:
 On Thu, Feb 26, 2015 at 05:42:57PM +0000, Stefano Stabellini wrote:
  On Thu, 26 Feb 2015, Luis R. Rodriguez wrote:
   On Thu, Feb 26, 2015 at 11:08:20AM +0000, Stefano Stabellini wrote:
On Thu, 26 Feb 2015, David Vrabel wrote:
 On 26/02/15 04:59, Juergen Gross wrote:
  
  So we are again in the situation that pv-drivers always imply the 
  pvops
  kernel (PARAVIRT selected). I started the whole Kconfig rework to
  eliminate this dependency.
 
 Yes.  Can you produce a series that just addresses this one issue.
 
 In the absence of any concrete requirement for this big Kconfig reorg
 I don't think it is helpful.

I clearly missed some context as I didn't realize that this was the
intended goal. Why do we want this? Please explain as it won't come
for free.


We have a few PV interfaces for HVM guests that need PARAVIRT in Linux
in order to be used, for example pv_time_ops and HVMOP_pagetable_dying.
They are critical performance improvements and from the interface
perspective, small enough that it doesn't make much sense having a separate
KConfig option for them.


In order to reach the goal above we necessarily need to introduce a
differentiation in terms of PV on HVM guests in Linux:

1) basic guests with PV network, disk, etc but no PV timers, no
   HVMOP_pagetable_dying, no PV IPIs
2) full PV on HVM guests that have PV network, disk, timers,
   HVMOP_pagetable_dying, PV IPIs and anything else that makes sense.

2) is much faster than 1) on Xen and 2) is only a tiny bit slower than
1) on native x86
   
   Also don't we shove 2) down hvm guests right now? Even when everything is
   built in I do not see how we opt out for HVM for 1) at run time right now.
  
   If this is true then the question of motivation for this becomes even
   stronger I think.
  
  Yes, indeed there is no way to do 1) at the moment. And for good
  reasons, see above.
 
 OK if the goal is to be able to build front end drivers by avoiding building
 PARAVIRT / PARAVIRT_CLOCK and if the gains to be able to do so (which haven't
 been stated other than just the ability to do so) are small (as Stefano notes
 simple hvm containers do not perform great)

I may have misunderstood this bit, WRT this last parenthetical: adding
PV I/O drivers to an HVM guest is AFAIAA the single biggest improvement
you can make to a bare HVM guest in terms of performance.

There are indeed additional gains to be had from other PV stuff which
Stefano mentions (clocks etc), but I believe those are all mostly
incremental and not as impressive as the PV I/O gains (but still good
improvements).

That's not to say that there's an argument in the context of Linux that
"if you can enable PV I/O then you can also enable other PV
optimisations", but I thought I would mention it.

Wasn't part of the original point here to be able to enable PV I/O (and
perhaps other PV stuff) for non-PAE 32-bit x86, i.e. in a context where
PVMMU isn't available? (That doesn't necessarily conflict with "if you
can enable PV I/O then you can also enable other PV optimisations",
though.)

Ian.



Re: [Xen-devel] [PATCH] VT-d: print_vtd_entries() should cope with superpages

2015-02-27 Thread Roger Pau Monné

On 27/02/15 at 10.52, Jan Beulich wrote:
 Even if VT-d code alone (i.e. when not sharing tables with EPT) still
 doesn't support superpages, this function - invoked upon DMA remapping
 faults - needs to cope with such.
 
 While at it also replace a few more plain numbers with suitable named
 constants.
 
 Signed-off-by: Jan Beulich jbeul...@suse.com

Thanks for this, looks fine to me:

Acked-by: Roger Pau Monné roger@citrix.com



Re: [Xen-devel] RFC: xen config changes v4

2015-02-27 Thread Stefano Stabellini

On Fri, 27 Feb 2015, Juergen Gross wrote:
 On 02/27/2015 10:41 AM, Stefano Stabellini wrote:
  On Fri, 27 Feb 2015, Juergen Gross wrote:
   On 02/26/2015 06:42 PM, Stefano Stabellini wrote:
On Thu, 26 Feb 2015, Luis R. Rodriguez wrote:
 On Thu, Feb 26, 2015 at 11:08:20AM +, Stefano Stabellini wrote:
  On Thu, 26 Feb 2015, David Vrabel wrote:
   On 26/02/15 04:59, Juergen Gross wrote:

So we are again in the situation that pv-drivers always imply the pvops
kernel (PARAVIRT selected). I started the whole Kconfig rework to
eliminate this dependency.
   
   Yes.  Can you produce a series that just addresses this one issue.
   
   In the absence of any concrete requirement for this big Kconfig reorg
   I don't think it is helpful.
  
  I clearly missed some context as I didn't realize that this was the
  intended goal. Why do we want this? Please explain as it won't come
  for free.
  
  
  We have a few PV interfaces for HVM guests that need PARAVIRT in Linux
  in order to be used, for example pv_time_ops and HVMOP_pagetable_dying.
  They are critical performance improvements and from the interface
  perspective, small enough that it doesn't make much sense to have a
  separate Kconfig option for them.
  
  
  In order to reach the goal above we necessarily need to introduce a
  differentiation in terms of PV on HVM guests in Linux:
  
  1) basic guests with PV network, disk, etc but no PV timers, no
   HVMOP_pagetable_dying, no PV IPIs
  2) full PV on HVM guests that have PV network, disk, timers,
   HVMOP_pagetable_dying, PV IPIs and anything else that makes sense.
  
  2) is much faster than 1) on Xen and 2) is only a tiny bit slower than
  1) on native x86
 
 Also don't we shove 2) down hvm guests right now? Even when everything is
 built in I do not see how we opt out for HVM for 1) at run time right now.
 
 If this is true then the question of motivation for this becomes even
 stronger I think.

Yes, indeed there is no way to do 1) at the moment. And for good
reasons, see above.
   
   Hmm, after checking the code I'm not convinced:
   
   - HVMOP_pagetable_dying is obsolete on modern hardware supporting EPT/HAP
  
  That might be true, but what about older hardware?
  Even on modern hardware a few workloads still run faster on shadow.
  But if HVMOP_pagetable_dying is the only reason to keep PARAVIRT for HVM
  guests, then I agree with you that we should remove it.
  
  
   - PV IPIs are not needed on single-vcpu guests
   
   - PARAVIRT_CLOCK doesn't need PARAVIRT (in fact the SUSE kernel configs
  for all x86_64 kernels have CONFIG_PARAVIRT_CLOCK=y)
   
   So I think we really should enable building Xen frontends without
   PARAVIRT, implying at least no XEN_PV and no XEN_PVH.
   
   I'll have a try setting up patches.
  
  If we are doing this as a performance improvement, I would like to see a
  couple of benchmarks (kernbench, hackbench) to show that on a
  single-vcpu guest and multi-vcpu guest (let's say 4 vcpus) disabling
  PARAVIRT leads to better performance on Xen on EPT hardware.
 
 This is not meant to be a performance improvement. It is meant to enable
 a standard distro kernel configured without PARAVIRT to be able to run
 as a HVM guest using the pv-drivers.
 
This is not a convincing explanation.  Debian, Ubuntu and Fedora seem
to be able to cope with it just fine.

Why do you want to do that, even though it will cause a performance
regression and a maintenance pain?  You haven't provided a reason yet.


Re: [Xen-devel] freemem-slack and large memory environments

2015-02-27 Thread Ian Campbell

On Thu, 2015-02-26 at 13:38 -0700, Mike Latimer wrote:
 (Sorry for the delayed response, dealing with ENOTIME.)
 
 On Thursday, February 26, 2015 05:47:21 PM Ian Campbell wrote:
  On Thu, 2015-02-26 at 10:38 -0700, Mike Latimer wrote:
 
  rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0);
 
  I think so. In essence we just need to update need_memkb on each
  iteration, right?
 
 Not quite...

Indeed, looking again I see that the 1 there means relative, so I'm
still confused about why free_memkb - need_memkb isn't the correct delta
on every iteration.

Is the issue that if you have a current target of, say, 15 and you wish
to go to ten you would say

libxl_set_memory_target(, 15 - (-5), 1, 0)
i.e.
libxl_set_memory_target(, -5, 1, 0)

then the target would be set to 10, but if during
libxl_wait_for_free_memory you only ballooned -2 and failed, the target
gets left at 10 but the current free is actually now 13, so next time
around you say:

libxl_set_memory_target(, 13 - (-3), 1, 0)
i.e.
libxl_set_memory_target(, -3, 1, 0)

and the target now becomes 10-3 == 7, rather than 13-3=10 as one might
expect?

 need_memkb is used in the loop to determine if we have enough free memory
 for the new domain. So, need_memkb should always remain set to the total
 amount of memory requested - not just the amount of change still required.
 
 The easiest thing to do is set the dom0's memory target before the loop,
 which is what my original patch did.

It seems like there are two viable approaches here:

First is to just set the target before the loop and wait (perhaps much
longer) for it to be achieved.

The second is to decrement the target in smaller steps and wait to reach
it each time.

I don't think an approach which sets a target, waits for that target to
be achieved and then on partial success tries to figure out what the
relative progress is and what is left to achieve and factor that into a
new target request makes sense.

This is all confounded by the fact that the libxl_wait_for_free_*
functions have a barking interface. I've just seen this comment right
above:

/*
 * WARNING
 * This memory management API is unstable even in Xen 4.2.
 * It has a numer of deficiencies and we intend to replace it.
 *
 * The semantics of these functions should not be relied on to be very
 * coherent or stable.  We will however endeavour to keep working
 * existing programs which use them in roughly the same way as libxl.
 */

Given that, I think we should feel free, if necessary, to deprecate
the current interface and replace it with one which is actually usable,
whatever that might mean.

Ian.



Re: [Xen-devel] RFC: xen config changes v4

2015-02-27 Thread Ian Campbell

On Fri, 2015-02-27 at 10:11 +, Stefano Stabellini wrote:

(for some reason I initially thought this was in reply to my mail, so
it's written in a way which assumes that, so sprinkle IMHO around the
place and/or take it as a follow on to my previous mail in this thread,
I guess)

 This is not a convincing explanation.  Debian, Ubuntu and Fedora seem
 to be able to cope with it just fine.

Debian doesn't really: for an i386 Debian installation you need to go
and find some slightly obscure media which has a PAE kernel on it in
order to install with PV drivers. If you just download the most obvious
i386 installation media you get no PV drivers of any description in an
HVM guest.

Fedora IIRC has moved everything over to PAE by default (no non-PAE
support), so they are probably OK.

I've no idea what Ubuntu does.

 Why do you want to do that, even though it will cause a performance
 regression and a maintenance pain?  You haven't provided a reason yet.

Where is the performance regression?

For a non-PAE x86 guest, which currently has 0 PV optimisations enabled
(no PV I/O, no PV clock, nothing) being able to enable PV I/O is a
useful performance improvement.

I'm also not saying that it *only* makes sense to enable PV I/O, if it
was also possible to enable other PV things, like PV clocks etc for
non-PAE x86 guests then that would also be worthwhile.

But I am saying that if enabling those extra optimisations for non-PAE
x86 guests is too invasive or problematic or whatever then it would
*still* be worth enabling PV I/O if that is more achievable.

Note that in no case am I suggesting turning off something which is
possible today. In particular I see no reason to want to disable PV
optimisations for PAE enabled x86 guests.





Re: [Xen-devel] [PATCH] correct mis-conversion set_bit() -> __cpumask_set_cpu() by 4aaca0e9cd

2015-02-27 Thread Ian Campbell

On Fri, 2015-02-27 at 07:33 +, Jan Beulich wrote:
  On 26.02.15 at 17:53, li...@eikelenboom.it wrote:
 
  Monday, February 23, 2015, 12:06:00 PM, you wrote:
  
  I have no idea how I came to use __cpumask_set_cpu() there, the
  conversion should have been set_bit() -> __set_bit(). The wrong
  construct results in problems on systems with relatively few CPUs.
  
  Reported-by: Sander Eikelenboom li...@eikelenboom.it
  Signed-off-by: Jan Beulich jbeul...@suse.com
  
  --- a/xen/common/softirq.c
  +++ b/xen/common/softirq.c
  @@ -106,7 +106,7 @@ void cpu_raise_softirq(unsigned int cpu,
   if ( !per_cpu(batching, this_cpu) || in_irq() )
   smp_send_event_check_cpu(cpu);
   else
  -__cpumask_set_cpu(nr, per_cpu(batch_mask, this_cpu));
  +__set_bit(nr, per_cpu(batch_mask, this_cpu));
   }
   
   void cpu_raise_softirq_batch_begin(void)
  
  Hi Jan,
  
  Any reason this wasn't applied to staging yet ?
 
 It didn't get ack-ed

Sorry, I thought this was an x86 patch for some reason and therefore
that Andrew's ack was sufficient.

For v2 of the patch (54eb3d88027800062...@mail.emea.novell.com,
using __cpumask_set_cpu(cpu, ...)):

Acked-by: Ian Campbell ian.campb...@citrix.com




Re: [Xen-devel] [PATCH] xen/iommu: fix usage of shared EPT/IOMMU page tables on PVH guests

2015-02-27 Thread Jan Beulich

 On 27.02.15 at 11:10, roger@citrix.com wrote:
 iommu_share_p2m_table should not prevent PVH guests from using a shared page
 table. Change the condition to has_hvm_container_domain instead of
 is_hvm_domain. This allows both PVH and HVM guests to use it. Remove the
 asserts in iommu_set_pgd and amd_iommu_share_p2m, iommu_share_p2m_table
 and p2m_alloc_table already do them.

This wording is confusing - it took me going into p2m_alloc_table()
to see that one half of the assertion is being satisfied there and
the other in iommu_share_p2m_table(). While not asserting what
IOMMU code does is quite fine in IOMMU code (especially as closely
related as is the case here), the assertion regarding what P2M
code does (and what a future second caller of
iommu_share_p2m_table() might violate) should be kept, but
perhaps be moved into iommu_share_p2m_table() instead of
keeping it in vendor specific code.

Jan



Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-27 Thread Ian Campbell

On Fri, 2015-02-27 at 15:41 +0530, Pranavkumar Sawargaonkar wrote:
 Hi Julien,
 
 On Thu, Feb 26, 2015 at 8:47 PM, Julien Grall julien.gr...@linaro.org wrote:
  On 26/02/15 14:46, Pranavkumar Sawargaonkar wrote:
  Hi
 
  Hi Pranavkumar,
 
  Also if we just show only one vITS (or only one Virtual v2m frame)
  instead of two vITS
  then actual hardware interrupt number and virtual interrupt number
  which guest will see will become different.
  This will hamper direct irq routing to guest.
 
  The IRQ injection should not consider a 1:1 mapping between pIRQ and vIRQ.
 
 Yes, but in the case of GICv2m (I am not sure about ITS), the device has to
 write the interrupt ID (which is the pirq) to the MSI_SETSPI_NS register to
 generate an interrupt.
 If you write a virq which is different from the pirq (associated with the
 actual GICv2m frame) then it will not trigger any interrupt.
 
 Now there is a case which I am not sure can be solved with one
 vITS/vGICv2m:
 
 - Suppose we have two GICv2m frames, say one having the address 0x1000
 for its MSI_SETSPI_NS register and the other 0x2000 for its
 MSI_SETSPI_NS register.
 - Assume the first frame has (physical) SPIs 0x64-0x72 associated and
 the second has 0x80-0x88 associated.
 - Now there are two PCIe hosts, the first using the first GICv2m frame
 as its MSI parent and the other using the second frame.
 - A device on the first host uses the MSI_SETSPI_NS address (0x1000)
 along with data (i.e. an interrupt number, say 0x64), and a device on
 the second host uses 0x2000 and data 0x80.
 
 Now if we show one vGICv2m frame in the guest for both devices, what
 address will I program in each device's config space for MSI, and what
 will the data value be?
 Secondly, a device's writes to these addresses are transparent to the
 CPU, so how can we trap them when the device wants to trigger an
 interrupt?

 Please correct me if I misunderstood anything.

Is what you are suggesting a v2m specific issue?

I thought the whole point of the ITS stuff in GICv3 was that one could
program such virt->phys mappings into the hardware ITS and it would do
the translation (the T in ITS) such that the host got the pIRQ it was
expecting when the guest wrote the virtualised vIRQ information to the
device.

Caveat: If I've read the ITS bits of that doc at any point it was long
ago and I've forgotten everything I knew about it... And I've never read
anything about v2m at all ;-)

Ian.



[Xen-devel] [PATCH] VT-d: print_vtd_entries() should cope with superpages

2015-02-27 Thread Jan Beulich

Even if VT-d code alone (i.e. when not sharing tables with EPT) still
doesn't support superpages, this function - invoked upon DMA remapping
faults - needs to cope with such.

While at it also replace a few more plain numbers with suitable named
constants.

Signed-off-by: Jan Beulich jbeul...@suse.com

--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -268,18 +268,22 @@ struct dma_pte {
 };
 #define DMA_PTE_READ (1)
 #define DMA_PTE_WRITE (2)
+#define DMA_PTE_PROT (DMA_PTE_READ | DMA_PTE_WRITE)
+#define DMA_PTE_SP   (1 << 7)
 #define DMA_PTE_SNP  (1 << 11)
 #define dma_clear_pte(p) do {(p).val = 0;} while(0)
 #define dma_set_pte_readable(p) do {(p).val |= DMA_PTE_READ;} while(0)
 #define dma_set_pte_writable(p) do {(p).val |= DMA_PTE_WRITE;} while(0)
-#define dma_set_pte_superpage(p) do {(p).val |= (1 << 7);} while(0)
+#define dma_set_pte_superpage(p) do {(p).val |= DMA_PTE_SP;} while(0)
 #define dma_set_pte_snp(p)  do {(p).val |= DMA_PTE_SNP;} while(0)
-#define dma_set_pte_prot(p, prot) \
-do {(p).val = ((p).val & ~3) | ((prot) & 3); } while (0)
+#define dma_set_pte_prot(p, prot) do { \
+(p).val = ((p).val & ~DMA_PTE_PROT) | ((prot) & DMA_PTE_PROT); \
+} while (0)
 #define dma_pte_addr(p) ((p).val & PADDR_MASK & PAGE_MASK_4K)
 #define dma_set_pte_addr(p, addr) do {\
 (p).val |= ((addr) & PAGE_MASK_4K); } while (0)
-#define dma_pte_present(p) (((p).val & 3) != 0)
+#define dma_pte_present(p) (((p).val & DMA_PTE_PROT) != 0)
+#define dma_pte_superpage(p) (((p).val & DMA_PTE_SP) != 0)
 
 /* interrupt remap entry */
 struct iremap_entry {
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -179,6 +179,8 @@ void print_vtd_entries(struct iommu *iom
 printk("l%d[%x] not present\n", level, l_index);
 break;
 }
+if ( dma_pte_superpage(pte) )
+break;
 val = dma_pte_addr(pte);
 } while ( --level );
 }




Re: [Xen-devel] [PATCH 0/5] (not just)x86/Dom0: NUMA related adjustments

2015-02-27 Thread Dario Faggioli

[adding Wei, as he may be interested, for his vNUMA work]

On Thu, 2015-02-26 at 13:44 +, Jan Beulich wrote:
 1: x86: allow specifying the NUMA nodes Dom0 should run on
 2: allow domain heap allocations to specify more than one NUMA node
 3: x86: widen NUMA nodes to be allocated from
 4: VT-d: widen NUMA nodes to be allocated from
 5: AMD IOMMU: widen NUMA nodes to be allocated from
 
 Signed-off-by: Jan Beulich jbeul...@suse.com
 ---
 To apply cleanly this depends on "x86/Dom0: account for shadow/HAP allocation"
 (http://lists.xenproject.org/archives/html/xen-devel/2015-02/msg03111.html).




[Xen-devel] [PATCH] xen/iommu: fix usage of shared EPT/IOMMU page tables on PVH guests

2015-02-27 Thread Roger Pau Monne

iommu_share_p2m_table should not prevent PVH guests from using a shared page
table. Change the condition to has_hvm_container_domain instead of
is_hvm_domain. This allows both PVH and HVM guests to use it. Remove the
asserts in iommu_set_pgd and amd_iommu_share_p2m, iommu_share_p2m_table
and p2m_alloc_table already do them.

Also fix another incorrect usage of is_hvm_domain in
arch_iommu_populate_page_table. This has not given problems so far because
all the pages in PVH guests are of type PGT_writable_page.

Signed-off-by: Roger Pau Monné roger@citrix.com
Cc: Suravee Suthikulpanit suravee.suthikulpa...@amd.com
Cc: Aravind Gopalakrishnan aravind.gopalakrish...@amd.com
Cc: Jan Beulich jbeul...@suse.com
Cc: Yang Zhang yang.z.zh...@intel.com
Cc: Kevin Tian kevin.t...@intel.com
---
 xen/drivers/passthrough/amd/iommu_map.c | 2 --
 xen/drivers/passthrough/iommu.c | 2 +-
 xen/drivers/passthrough/vtd/iommu.c | 2 --
 xen/drivers/passthrough/x86/iommu.c | 2 +-
 4 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index a8c60ec..31dc05d 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -785,8 +785,6 @@ void amd_iommu_share_p2m(struct domain *d)
 struct page_info *p2m_table;
 mfn_t pgd_mfn;
 
-ASSERT( is_hvm_domain(d) && d->arch.hvm_domain.hap_enabled );
-
 if ( !iommu_use_hap_pt(d) )
 return;
 
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index cc12735..3e11d6b 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -332,7 +332,7 @@ void iommu_share_p2m_table(struct domain* d)
 {
 const struct iommu_ops *ops = iommu_get_ops();
 
-if ( iommu_enabled && is_hvm_domain(d) )
+if ( iommu_enabled && has_hvm_container_domain(d) )
 ops->share_p2m(d);
 }
 
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 2e113d7..ff542cb 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1788,8 +1788,6 @@ static void iommu_set_pgd(struct domain *d)
 struct hvm_iommu *hd  = domain_hvm_iommu(d);
 mfn_t pgd_mfn;
 
-ASSERT( is_hvm_domain(d) && d->arch.hvm_domain.hap_enabled );
-
 if ( !iommu_use_hap_pt(d) )
 return;
 
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index 52d8948..9eb8d33 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -56,7 +56,7 @@ int arch_iommu_populate_page_table(struct domain *d)
 
 while ( !rc && (page = page_list_remove_head(&d->page_list)) )
 {
-if ( is_hvm_domain(d) ||
+if ( has_hvm_container_domain(d) ||
 (page->u.inuse.type_info & PGT_type_mask) == PGT_writable_page )
 {
 BUG_ON(SHARED_M2P(mfn_to_gmfn(d, page_to_mfn(page))));
-- 
1.9.3 (Apple Git-50)



Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-27 Thread Pranavkumar Sawargaonkar

Hi Julien,

On Thu, Feb 26, 2015 at 8:47 PM, Julien Grall julien.gr...@linaro.org wrote:
 On 26/02/15 14:46, Pranavkumar Sawargaonkar wrote:
 Hi

 Hi Pranavkumar,

 Also if we just show only one vITS (or only one Virtual v2m frame)
 instead of two vITS
 then actual hardware interrupt number and virtual interrupt number
 which guest will see will become different.
 This will hamper direct irq routing to guest.

 The IRQ injection should not consider a 1:1 mapping between pIRQ and vIRQ.

Yes, but in the case of GICv2m (I am not sure about ITS), the device has to
write the interrupt ID (which is the pirq) to the MSI_SETSPI_NS register to
generate an interrupt.
If you write a virq which is different from the pirq (associated with the
actual GICv2m frame) then it will not trigger any interrupt.

Now there is a case which I am not sure can be solved with one
vITS/vGICv2m:

- Suppose we have two GICv2m frames, say one having the address 0x1000
for its MSI_SETSPI_NS register and the other 0x2000 for its
MSI_SETSPI_NS register.
- Assume the first frame has (physical) SPIs 0x64-0x72 associated and
the second has 0x80-0x88 associated.
- Now there are two PCIe hosts, the first using the first GICv2m frame
as its MSI parent and the other using the second frame.
- A device on the first host uses the MSI_SETSPI_NS address (0x1000)
along with data (i.e. an interrupt number, say 0x64), and a device on
the second host uses 0x2000 and data 0x80.

Now if we show one vGICv2m frame in the guest for both devices, what
address will I program in each device's config space for MSI, and what
will the data value be?
Secondly, a device's writes to these addresses are transparent to the
CPU, so how can we trap them when the device wants to trigger an
interrupt?
Please correct me if I misunderstood anything.

Thanks,
Pranav




 I have a patch which allows virq != pirq:

 https://patches.linaro.org/43012/

 Regards,

 --
 Julien Grall


Re: [Xen-devel] freemem-slack and large memory environments

2015-02-27 Thread Ian Campbell

On Thu, 2015-02-26 at 16:30 -0700, Mike Latimer wrote:
 On Thursday, February 26, 2015 01:45:16 PM Mike Latimer wrote:
  On Thursday, February 26, 2015 05:53:06 PM Stefano Stabellini wrote:
   What is the return value of libxl_set_memory_target and
   libxl_wait_for_free_memory in that case? Isn't it just a matter of
   properly handle the return values?
  
  The return from libxl_set_memory_target is 0, as the assignment works just
  fine. I don't have the return from libxl_wait_for_free_memory in my notes,
  so I'll spin up another test and track that down.
 
 I slightly misspoke here... In my testing, the returns are actually:
 
libxl_set_memory_target = 1
libxl_wait_for_free_memory = -5
libxl_wait_for_memory_target = 0
   Note - libxl_wait_for_memory_target is confusing,

Further to the comment I just made WRT this source comment:
/*
 * WARNING
 * This memory management API is unstable even in Xen 4.2.
 * It has a numer of deficiencies and we intend to replace it.
 *
 * The semantics of these functions should not be relied on to be very
 * coherent or stable.  We will however endeavour to keep working
 * existing programs which use them in roughly the same way as libxl.
 */

I think we should feel free to introduce a new interface which has
semantics which we can actually work with. IOW

  as rc can be set
   to ERROR_FAIL, but the function returns 0 anyway (unless an error
   is encountered earlier.) I guess this just means we need to continue
   to wait...

Do something sensible so there is no more guessing.

I'm not sure yet what sensible would be.

One approach to fixing this might be that when the replacement for
libxl_wait_for_memory_target fails, it sets the target to whatever was
actually achieved, such that further calculations involving free_memkb
and the overall target will still be valid.

Or we could move the "progress is being made" logic currently in xl's
freemem down into the wait_for_memory_target replacement so it hopefully
has more information available to it in order to make better decisions
about the timeouts.

Ian.



Re: [Xen-devel] how to assign resources exclusive to a single domU

2015-02-27 Thread Ian Campbell

On Fri, 2015-02-27 at 09:19 +0100, Olaf Hering wrote:
 On Fri, Feb 27, Jürgen Groß wrote:
 
  On 02/26/2015 09:57 AM, Olaf Hering wrote:
  I wonder what should be done in my changes for libxl.
  If you are doing something, please add a flag to be able to disable
  the additional security checks regarding multiple assignment.
 
 I think libxl should just allow multiple assignments of physical
 devices. It's up to the admin to make sure the overall config is sane.

I can't remember what libxl does today but WRT disks (with the phy
backend at least) xend used to have sharing checks and refuse to allow
sharing (for writeable disks) unless overridden (by w+ in the mode
string, IIRC).

I don't think libxl implements those checks, so the override isn't
supported, but maybe it would be good to do so, and maybe it would be a
good idea for pvscsi to at least be consistent with what we might
eventually do for disks?

(FWIW I think most of the checks were actually in the block-* scripts,
I'm not sure why they are active under libxl)

Ian.



Re: [Xen-devel] [PATCH] VT-d: print_vtd_entries() should cope with superpages

2015-02-27 Thread Andrew Cooper

On 27/02/15 09:52, Jan Beulich wrote:
 Even if VT-d code alone (i.e. when not sharing tables with EPT) still
 doesn't support superpages, this function - invoked upon DMA remapping
 faults - needs to cope with such.

 While at it also replace a few more plain numbers with suitable named
 constants.

 Signed-off-by: Jan Beulich jbeul...@suse.com

Reviewed-by: Andrew Cooper andrew.coop...@citrix.com


 --- a/xen/drivers/passthrough/vtd/iommu.h
 +++ b/xen/drivers/passthrough/vtd/iommu.h
 @@ -268,18 +268,22 @@ struct dma_pte {
  };
  #define DMA_PTE_READ (1)
  #define DMA_PTE_WRITE (2)
 +#define DMA_PTE_PROT (DMA_PTE_READ | DMA_PTE_WRITE)
 +#define DMA_PTE_SP   (1 << 7)
  #define DMA_PTE_SNP  (1 << 11)
  #define dma_clear_pte(p) do {(p).val = 0;} while(0)
  #define dma_set_pte_readable(p) do {(p).val |= DMA_PTE_READ;} while(0)
  #define dma_set_pte_writable(p) do {(p).val |= DMA_PTE_WRITE;} while(0)
 -#define dma_set_pte_superpage(p) do {(p).val |= (1 << 7);} while(0)
 +#define dma_set_pte_superpage(p) do {(p).val |= DMA_PTE_SP;} while(0)
  #define dma_set_pte_snp(p)  do {(p).val |= DMA_PTE_SNP;} while(0)
 -#define dma_set_pte_prot(p, prot) \
 -do {(p).val = ((p).val & ~3) | ((prot) & 3); } while (0)
 +#define dma_set_pte_prot(p, prot) do { \
 +(p).val = ((p).val & ~DMA_PTE_PROT) | ((prot) & DMA_PTE_PROT); \
 +} while (0)
  #define dma_pte_addr(p) ((p).val & PADDR_MASK & PAGE_MASK_4K)
  #define dma_set_pte_addr(p, addr) do {\
  (p).val |= ((addr) & PAGE_MASK_4K); } while (0)
 -#define dma_pte_present(p) (((p).val & 3) != 0)
 +#define dma_pte_present(p) (((p).val & DMA_PTE_PROT) != 0)
 +#define dma_pte_superpage(p) (((p).val & DMA_PTE_SP) != 0)
  
  /* interrupt remap entry */
  struct iremap_entry {
 --- a/xen/drivers/passthrough/vtd/utils.c
 +++ b/xen/drivers/passthrough/vtd/utils.c
 @@ -179,6 +179,8 @@ void print_vtd_entries(struct iommu *iom
  printk("l%d[%x] not present\n", level, l_index);
  break;
  }
 +if ( dma_pte_superpage(pte) )
 +break;
  val = dma_pte_addr(pte);
  } while ( --level );
  }






Re: [Xen-devel] [PATCH 1/1] xen-netback: remove compilation warning

2015-02-27 Thread pmarzo


On Thu, 2015-02-26 at 11:30 -0500, David Miller wrote:
 From: pedro marzo.pe...@gmail.com
 Date: Thu, 26 Feb 2015 09:25:41 +0100
 
  From: pmarzo marzo.pe...@gmail.com
  
  offset and size are of type uint16_t so the %lu gives a warning
  A %u specifier, the same used in size makes gcc happy
  Not sure if a %x would be more correct
  
  Signed-off-by: Pedro Marzo Perez marzo.pe...@gmail.com
 
 This patch actually adds a warning on my machine, and your analysis
 of the types is therefore probably incorrect:
 
 drivers/net/xen-netback/netback.c: In function ‘xenvif_tx_build_gops’:
 drivers/net/xen-netback/netback.c:1259:8: warning: format ‘%u’ expects argument of type ‘unsigned int’, but argument 5 has type ‘long unsigned int’ [-Wformat=]

You are right: this patch is completely wrong for i386, where it gives me
a warning too. I should have checked that before, sorry.

I should also have mentioned that I am using a cross compiler, and it is
the one that gives the warning when compiling the current code:
arm-linux-gnueabi-gcc --version
arm-linux-gnueabi-gcc (Ubuntu/Linaro 4.7.3-12ubuntu1) 4.7.3


 
 The issue is probably ~PAGE_MASK and I think the type of that
 propagates into the type of the overall calculation.
That is probably what is happening. Operands are converted to a common
type before the operation, and the x86 compiler is converting everything
to unsigned long (because I have a 64-bit machine??), but the ARM
compiler ends up with int :-(

PAGE_MASK is defined as a plain number without any cast, so I am not
sure which compiler is right:
#define PAGE_SHIFT  12
#define PAGE_MASK   (~((1 << PAGE_SHIFT) - 1))

This new patch fixes the warning for both the ARM gcc compiler and the
i386 compiler; it just makes sure everything is cast to unsigned long.
Could you please disregard the previous patch and give your opinion on
this one?

--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1248,9 +1248,10 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 /* No crossing a page as the payload mustn't fragment. */
 if (unlikely((txreq.offset + txreq.size) > PAGE_SIZE)) {
 netdev_err(queue->vif->dev,
-  "txreq.offset: %x, size: %u, end: %u\n",
+  "txreq.offset: %x, size: %u, end: %lu\n",
   txreq.offset, txreq.size,
-  (txreq.offset&~PAGE_MASK) + txreq.size);
+  ((unsigned long)txreq.offset&~PAGE_MASK)
+  + txreq.size);
 xenvif_fatal_tx_err(queue->vif);
 break;
 }
-- 
1.9.1

[Xen-devel] [linux-linus test] 35443: regressions - trouble: blocked/broken/fail/pass

2015-02-27 Thread xen . org

flight 35443 linux-linus real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/35443/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-rumpuserxen-amd64  8 guest-start fail REGR. vs. 34227
 build-armhf-libvirt   3 host-install(3) broken REGR. vs. 34227
 build-armhf-pvops 3 host-install(3) broken REGR. vs. 34227
 test-amd64-amd64-xl-qemut-win7-amd64  7 windows-install   fail REGR. vs. 34227

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-freebsd10-i386  7 freebsd-install  fail like 34227
 test-amd64-i386-freebsd10-amd64  7 freebsd-install fail like 34227
 test-amd64-i386-pair 17 guest-migrate/src_host/dst_host fail like 34227

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-intel  9 guest-start  fail  never pass
 test-armhf-armhf-xl-midway1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-sedf-pin  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-sedf  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt 10 migrate-support-check fail   never pass
 test-amd64-amd64-xl-pcipt-intel  9 guest-start fail never pass
 test-amd64-amd64-xl-pvh-amd   9 guest-start  fail   never pass
 test-amd64-i386-libvirt  10 migrate-support-check fail   never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop   fail   never pass
 test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stop fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-amd64-i386-xl-winxpsp3  14 guest-stop   fail   never pass
 test-amd64-i386-xl-qemut-winxpsp3 14 guest-stop fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop   fail never pass

version targeted for testing:
 linux b24e2bdde4af656bb0679a101265ebb8f8735d3c
baseline version:
 linux 9d82f5eb3376cbae96ad36a063a9390de1694546


1736 people touched revisions under test,
not listing them all


jobs:
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  broken  
 build-i386-libvirt   pass
 build-amd64-pvops pass
 build-armhf-pvops broken
 build-i386-pvops pass
 build-amd64-rumpuserxen  pass
 build-i386-rumpuserxen   pass
 test-amd64-amd64-xl  pass
 test-armhf-armhf-xl  blocked 
 test-amd64-i386-xl   pass
 test-amd64-amd64-xl-pvh-amd  fail
 test-amd64-i386-rhel6hvm-amd pass
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64 pass
 test-amd64-i386-xl-qemut-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-freebsd10-amd64  fail
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass

Re: [Xen-devel] backport c1d322e6048796296555dd36fdd102d7fa2f50bf to all stable trees

2015-02-27 Thread Stefano Stabellini

On Fri, 27 Feb 2015, Fabio Fantoni wrote:
 On 26/02/2015 14:02, Stefano Stabellini wrote:
  Hi all,
  
  I would like to request a backport of
  
  commit c1d322e6048796296555dd36fdd102d7fa2f50bf
  Author: Stefano Stabellini stefano.stabell...@eu.citrix.com
  Date:   Wed Dec 3 08:15:19 2014 -0500
  
   xen-hvm: increase maxmem before calling xc_domain_populate_physmap
 
 It seems that this fix is applied only in staging/qemu-upstream-unstable.git
 (among Xen's git trees), but not yet in qemu-upstream-unstable.git or the
 stable ones.

An unrelated local-migrate test is failing. It is believed to be due to
Paul's ioreq-server API changes and the fix should be in xen-unstable
already (the fix is a patch to the hypervisor). We expect the test to
pass soon.


 Could this be the cause of a strange problem: a loop of failing memory
 increases when HVM domUs start, with Xen 4.4, 4.5 and unstable and a newer
 kernel, even though the domUs and dom0 all have fixed memory settings with
 ballooning disabled?

What exactly are you referring to?
Are you talking about http://marc.info/?l=xen-devel&m=142499350515886 ?


 Or is it another memory bug in Xen?
 My syslog and kern.log grow by several GB each day, full of:
 xen:balloon: reserve_additional_memory: add_memory() failed: -17
 on one 4.5.0 dom0, also with kernel 3.16.7-ckt4-3~bpo70+1, which has this applied:
 [xen] cancel ballooning if adding new memory failed (Closes: #776448)
 
 Thanks for any reply, and sorry for my bad English.

It shouldn't have anything to do with "xen-hvm: increase maxmem before
calling xc_domain_populate_physmap". To make sure, you could simply
revert c1d322e6048796296555dd36fdd102d7fa2f50bf
(901230fd8ce053cc21312a2eca2f3ba9f1d103f2 in qemu-upstream-unstable.git)
and try again to see if the memory issues you are experiencing go away.


  
  to all QEMU stable trees. Which ones are the currently maintained trees?
  
  It applies without issues to 2.2, 2.1, 2.0, 1.7, 1.6, 1.5.
  The filename in the commit needs to be changed from xen-hvm.c to
  xen-all.c for 1.4, 1.3, 1.2, 1.1. I didn't go farther back.
  
  Thanks,
  
  Stefano