Re: [Xen-devel] Poor network performance between DomU with multiqueue support
At 2015-02-27 18:59:52, Wei Liu wei.l...@citrix.com wrote: Cc'ing David (XenServer kernel maintainer) On Fri, Feb 27, 2015 at 05:21:11PM +0800, openlui wrote: On Mon, Dec 08, 2014 at 01:08:18PM +, Zhangleiqiang (Trump) wrote: On Mon, Dec 08, 2014 at 06:44:26AM +, Zhangleiqiang (Trump) wrote: On Fri, Dec 05, 2014 at 01:17:16AM +, Zhangleiqiang (Trump) wrote: [...]

I think that's expected, because the guest RX data path still uses grant_copy while guest TX uses grant_map to do zero-copy transmit.

As far as I know, there are three main grant-related operations used in the split device model: grant mapping, grant transfer and grant copy. Grant transfer is not used now, and grant mapping and grant transfer both involve TLB refresh work for the hypervisor, am I right? Or does only grant transfer have this overhead?

Transfer is not used so I can't tell. Grant unmap causes a TLB flush. I saw in an email the other day that the XenServer folks have some planned improvement to avoid the TLB flush in Xen, to be upstreamed in the 4.6 window. I can't say for sure it will get upstreamed as I don't work on that.

Does grant copy surely have more overhead than grant mapping?

At the very least the zero-copy TX path is faster than the previous copying path. But speaking of the micro operation I'm not sure. There was once a persistent map prototype of netback / netfront that establishes a memory pool between FE and BE and then uses memcpy to copy data. Unfortunately that prototype was not done right so the result was not good.

The newest mail about persistent grants I can find was sent on 16 Nov 2012 (http://lists.xen.org/archives/html/xen-devel/2012-11/msg00832.html). Why was it not done right and not merged upstream?

AFAICT there's one more memcpy than necessary, i.e. the frontend memcpys data into the pool and then the backend memcpys data out of the pool, when the backend should be able to use the page in the pool directly.

Memcpy should be cheaper than grant_copy because the former does not need the hypercall, which causes a VM exit to the Xen hypervisor, am I right? For the RX path, using memcpy based on a persistent grant table may have higher performance than using grant copy as is done now.

In theory yes. Unfortunately nobody has benchmarked that properly.

I have done some testing of RX performance using the persistent grant method and the upstream method (3.17.4 branch). The results show that the persistent grant method does have higher performance than the upstream method (from 3.5Gbps to about 6Gbps). I also find that the persistent grant mechanism is already used in blkfront/blkback. I am wondering why there are no efforts to replace grant copy with persistent grants now, at least in the RX path. Are there other disadvantages of the persistent grant method which stop us from using it?

I've seen numbers better than 6Gbps. See upstream changeset 1650d5455bd2dc6b5ee134bd6fc1a3236c266b5b.

Thanks, Wei. The throughput I mentioned (3.5Gbps and 6Gbps) is for UDP with 1400-byte packets; I think the result based on 1650d5455bd2dc6b5ee134bd6fc1a3236c266b5b is for TCP.

Persistent grant is not a silver bullet. There is an email thread on the list discussing whether it should be removed in the block driver.

I have tried to look for the thread but found no detailed info. Could you give me some keywords to find the thread, thanks.

XenServer folks have been working on improving network performance. It's my understanding that they chose different routes than persistent grants. David might have more insight.

Wei.

PS. I used pkt-gen to send packets from dom0 to a domU running on another dom0. The CPUs of both dom0s are Intel E5640 2.4GHz, and the two dom0s are connected with a 10GE NIC.

___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [qemu-upstream-unstable test] 35474: regressions - FAIL
flight 35474 qemu-upstream-unstable real [real] http://www.chiark.greenend.org.uk/~xensrcts/logs/35474/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-rhel6hvm-amd 6 leak-check/basis(6) running in 34247 [st=running!] test-amd64-amd64-xl-winxpsp3 10 guest-localmigrate fail in 34247 REGR. vs. 33488 Tests which are failing intermittently (not blocking): test-amd64-amd64-xl-qemuu-debianhvm-amd64 7 debian-hvm-install fail pass in 35312 test-amd64-i386-pair 17 guest-migrate/src_host/dst_host fail pass in 34247 test-amd64-amd64-xl-winxpsp3 7 windows-install fail pass in 34319 test-amd64-i386-freebsd10-i386 11 guest-localmigrate fail in 34247 pass in 35474 test-amd64-i386-xl-qemuu-debianhvm-amd64 10 guest-localmigrate fail in 34247 pass in 35474 test-amd64-i386-freebsd10-amd64 11 guest-localmigrate fail in 34247 pass in 35474 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 10 guest-localmigrate fail in 34247 pass in 35474 test-amd64-i386-xl-qemuu-ovmf-amd64 10 guest-localmigrate fail in 34247 pass in 35474 test-amd64-amd64-xl-qemuu-debianhvm-amd64 10 guest-localmigrate fail in 34247 pass in 35312 test-amd64-amd64-xl-qemuu-win7-amd64 10 guest-localmigrate fail in 34247 pass in 35474 test-amd64-i386-xl-win7-amd64 10 guest-localmigrate fail in 34247 pass in 35474 test-amd64-amd64-xl-win7-amd64 10 guest-localmigrate fail in 34247 pass in 35474 test-amd64-amd64-xl-qemuu-ovmf-amd64 10 guest-localmigrate fail in 34247 pass in 35474 test-amd64-i386-xl-winxpsp3-vcpus1 10 guest-localmigrate fail in 34247 pass in 35474 test-amd64-i386-xl-winxpsp3 10 guest-localmigrate fail in 34247 pass in 35474 test-amd64-i386-xl-qemuu-winxpsp3 10 guest-localmigrate fail in 34247 pass in 35474 test-amd64-amd64-xl-qemuu-winxpsp3 10 guest-localmigrate fail in 34247 pass in 35474 test-amd64-i386-xl-qemuu-win7-amd64 10 guest-localmigrate fail in 34247 pass in 35474 Regressions which are regarded as allowable (not blocking): 
test-armhf-armhf-xl-sedf-pin 13 guest-destroy fail in 35312 blocked in 33488 test-amd64-amd64-libvirt 9 guest-start fail in 35312 like 33488 test-armhf-armhf-xl-multivcpu 14 leak-check/check fail in 34247 blocked in 33488 test-armhf-armhf-xl-credit2 5 xen-bootfail in 34247 blocked in 33488 test-armhf-armhf-libvirt 13 guest-destroy fail in 34247 blocked in 33488 test-amd64-i386-libvirt 9 guest-start fail in 34247 like 33488 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-pvh-intel 9 guest-start fail never pass test-armhf-armhf-xl-multivcpu 10 migrate-support-checkfail never pass test-armhf-armhf-xl 10 migrate-support-checkfail never pass test-armhf-armhf-xl-sedf-pin 10 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 10 migrate-support-checkfail never pass test-armhf-armhf-xl-sedf 10 migrate-support-checkfail never pass test-armhf-armhf-libvirt 10 migrate-support-checkfail never pass test-amd64-amd64-libvirt 10 migrate-support-checkfail never pass test-amd64-amd64-xl-pvh-amd 9 guest-start fail never pass test-armhf-armhf-xl-midway 10 migrate-support-checkfail never pass test-amd64-amd64-xl-pcipt-intel 9 guest-start fail never pass test-amd64-i386-libvirt 10 migrate-support-checkfail never pass test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop fail never pass test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-qemut-winxpsp3 14 guest-stopfail never pass test-amd64-i386-xl-winxpsp3 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3 
14 guest-stopfail never pass test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop fail never pass version targeted for testing: qemuube11dc1e9172f91e798a8f831b30c14b479e08e8 baseline version: qemuu0d37748342e29854db7c9f6c47d7f58c6cfba6b2 People who touched revisions under test: Don Slutz dsl...@verizon.com Paul Durrant
[Xen-devel] [xen-4.5-testing test] 35450: trouble: broken/fail/pass
flight 35450 xen-4.5-testing real [real] http://www.chiark.greenend.org.uk/~xensrcts/logs/35450/ Failures and problems with tests :-( Tests which did not succeed and are blocking, including tests which could not be run: build-armhf 3 host-install(3) broken in 35097 REGR. vs. 34731 Tests which are failing intermittently (not blocking): test-amd64-i386-pair 4 host-install/dst_host(4) broken pass in 35097 test-amd64-i386-libvirt 5 xen-boot fail in 35097 pass in 35450 test-amd64-amd64-rumpuserxen-amd64 5 xen-boot fail in 35097 pass in 35450 test-amd64-i386-qemut-rhel6hvm-intel 5 xen-boot fail in 35097 pass in 35450 test-amd64-i386-qemuu-rhel6hvm-amd 5 xen-boot fail in 35097 pass in 35450 test-amd64-i386-xl5 xen-boot fail in 35097 pass in 35450 test-amd64-amd64-xl-sedf 5 xen-boot fail in 35097 pass in 35450 test-amd64-amd64-xl 5 xen-boot fail in 35097 pass in 35450 test-amd64-i386-freebsd10-i386 5 xen-boot fail in 35097 pass in 35450 test-amd64-i386-rumpuserxen-i386 5 xen-boot fail in 35097 pass in 35450 test-amd64-amd64-xl-qemuu-win7-amd64 5 xen-boot fail in 35097 pass in 35450 test-amd64-i386-xl-qemut-debianhvm-amd64 5 xen-boot fail in 35097 pass in 35450 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 5 xen-boot fail in 35097 pass in 35450 test-amd64-i386-xl-qemuu-winxpsp3 5 xen-boot fail in 35097 pass in 35450 test-amd64-i386-xl-winxpsp3-vcpus1 5 xen-boot fail in 35097 pass in 35450 test-amd64-amd64-xl-qemuu-debianhvm-amd64 5 xen-boot fail in 35097 pass in 35450 test-amd64-amd64-xl-qemut-debianhvm-amd64 5 xen-boot fail in 35097 pass in 35450 test-amd64-amd64-xl-qemuu-ovmf-amd64 5 xen-boot fail in 35097 pass in 35450 test-amd64-i386-xl-qemut-winxpsp3 5 xen-boot fail in 35097 pass in 35450 Regressions which are regarded as allowable (not blocking): test-amd64-amd64-libvirt 5 xen-boot fail in 35097 like 34638 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-pvh-intel 9 guest-start fail never pass test-armhf-armhf-xl-sedf 10 migrate-support-checkfail 
never pass test-armhf-armhf-libvirt 10 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 10 migrate-support-checkfail never pass test-armhf-armhf-xl 10 migrate-support-checkfail never pass test-amd64-amd64-xl-pvh-amd 9 guest-start fail never pass test-armhf-armhf-xl-midway 10 migrate-support-checkfail never pass test-armhf-armhf-xl-sedf-pin 10 migrate-support-checkfail never pass test-amd64-i386-libvirt 10 migrate-support-checkfail never pass test-amd64-amd64-rumpuserxen-amd64 13 rumpuserxen-demo-xenstorels/xenstorels.repeat fail never pass test-amd64-amd64-libvirt 10 migrate-support-checkfail never pass test-amd64-amd64-xl-pcipt-intel 9 guest-start fail never pass test-armhf-armhf-xl-credit2 5 xen-boot fail never pass test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-winxpsp3 14 guest-stop fail never pass test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stopfail never pass test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-qemut-winxpsp3 14 guest-stopfail never pass test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop fail never pass test-amd64-amd64-xl-winxpsp3 14 guest-stop fail never pass test-armhf-armhf-xl-sedf 1 build-check(1)blocked in 35097 n/a test-armhf-armhf-libvirt 1 build-check(1)blocked in 35097 n/a test-armhf-armhf-xl-multivcpu 1 build-check(1) blocked in 35097 n/a test-armhf-armhf-xl 1 build-check(1)blocked in 35097 n/a test-armhf-armhf-xl-midway1 
build-check(1)blocked in 35097 n/a test-armhf-armhf-xl-sedf-pin 1 build-check(1)blocked in 35097 n/a test-armhf-armhf-xl-credit2 1 build-check(1)blocked in 35097 n/a build-armhf-libvirt 1 build-check(1)blocked in 35097 n/a
Re: [Xen-devel] Poor network performance between DomU with multiqueue support
At 2015-02-27 19:30:20, David Vrabel david.vra...@citrix.com wrote: On 27/02/15 10:59, Wei Liu wrote:

Persistent grant is not a silver bullet. There is an email thread on the list discussing whether it should be removed in the block driver.

Persistent grants for to-guest network traffic is a flawed idea. It either requires:

a) the backend to memcpy into the mapped grant /and/ the frontend to memcpy out of the persistently mapped pool. This is clearly going to be worse for memory bandwidth than a single grant copy.

Yes, the persistent grant method does use more of the DomU's CPU than the grant copy method. The persistent way has one more memcpy operation than grant copy, but it has two fewer mmap operations than grant copy and no hypercall either. I have examined the code for grant copy: it needs to map the memory from the src and dest domains into the hypervisor, then memcpy the data from src to dest. More CPU is then used by the hypervisor instead of the DomU.

or b) the backend to accumulate more and more mappings of guest memory, which is bad for security and it uses too many grant and map track resources hence it does not scale to many VIFs.

I find that the persistent grant patch has an upper limit on the amount of guest memory that can be mapped by each queue of a VIF. The limit seems to be the VIF's ring size if I understand right, so the amount does not seem high. Under my benchmark, at least for a single UDP flow, the persistent grant way has higher throughput than the grant copy way.

David
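To make the copy counts argued about in this exchange concrete, here is a minimal user-space sketch in C. It is only a model under stated assumptions: `rx_via_pool`, `rx_via_single_copy` and `struct copy_stats` are hypothetical names, plain `memcpy` stands in both for the persistently mapped pool and for the hypervisor's grant copy, and no Xen API is involved. It counts operations; it says nothing about their relative cost, which is exactly what the thread disagrees on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PKT_MAX 1500

/* Hypothetical bookkeeping, for illustration only. */
struct copy_stats {
    unsigned int memcpys;    /* copies done by backend/frontend CPUs */
    unsigned int hypercalls; /* stand-in for grant copy operations */
};

/* Persistent-pool model: the backend copies into the shared pool slot,
 * then the frontend copies out of it. Two memcpys, no hypercall. */
void rx_via_pool(const unsigned char *src, unsigned char *pool_slot,
                 unsigned char *dst, size_t len, struct copy_stats *st)
{
    memcpy(pool_slot, src, len);
    st->memcpys++;
    memcpy(dst, pool_slot, len);
    st->memcpys++;
}

/* Grant-copy model: one copy, performed on the guest's behalf by the
 * hypervisor (modelled here as a plain memcpy plus one hypercall). */
void rx_via_single_copy(const unsigned char *src, unsigned char *dst,
                        size_t len, struct copy_stats *st)
{
    memcpy(dst, src, len);
    st->hypercalls++;
}
```

Calling both functions on a 1400-byte buffer (the UDP payload size mentioned in the thread) leaves the pool path with two memcpys and no hypercall, and the single-copy path with one hypercall and no guest-side memcpy.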
[Xen-devel] [PATCH v7 0/3] xen/arm: Add support for Huawei hip04-d01 platform
This set of patches adds Xen support for the hip04-d01 platform (see https://wiki.linaro.org/Boards/D01 for details).

Changes from v6:
- collapsed some patches (Julien Grall);
- remove useless check for irq values;
- test interrupt controller not using DT compatibility;
- remove non standard drivers flag (Ian Campbell).

Changes from V5.99.1:
- removed RFC again;
- use different constants for hip04 instead of redefining standard ones;
- comment compatible string change;
- add an option to ARM to enable non standard drivers;
- rename gicv2 to hip04gic to make clear this is not a standard gic.

Changes from v5:
- do not change gic-v2.c code but use a copy. To be considered RFC, to see if it is better to use a copy or other techniques.

Changes from v4:
- rebased to new version;
- removed patch for computing GIC addresses as it applies to all platforms;
- removed patches to platform (cpu and system operations) as now they can use a bootwrapper which provides them.

Changes from v3:
- change the way regs property is computed for GICv2 (Julien Grall);
- revert order of compatible names for GIC (Julien Grall).

Changes from v2:
- rewrote DTS fix patch (Ian Campbell);
- use is_hip04 macro instead of doing explicit test (Julien Grall);
- do not use quirks to distinguish this platform (Ian Campbell);
- move some GIC constants to C files instead of header (Julien Grall);
- minor changes (Julien Grall).

Changes from v1:
- style (Julien Grall);
- make gicv2_send_SGI faster (Julien Grall);
- cleanup correctly if hip04_smp_init fails (Julien Grall);
- remove quirks using compatibility (Ian Campbell);
- other minor suggestions by Julien Grall.

Frediano Ziglio (3):
  xen/arm: Duplicate gic-v2.c file to support hip04 platform version
  xen/arm: Make gic-v2 code handle hip04-d01 platform
  xen/arm: Force dom0 to use normal GICv2 driver on Hip04 platform

 xen/arch/arm/Makefile       |   1 +
 xen/arch/arm/domain_build.c |   2 +-
 xen/arch/arm/gic-hip04.c    | 817
 3 files changed, 819 insertions(+), 1 deletion(-)
 create mode 100644 xen/arch/arm/gic-hip04.c

--
1.9.1
Re: [Xen-devel] [PATCH 1/5] x86: allow specifying the NUMA nodes Dom0 should run on
On Fri, 2015-02-27 at 10:50 +, Jan Beulich wrote: On 27.02.15 at 11:04, dario.faggi...@citrix.com wrote: On Fri, 2015-02-27 at 08:46 +, Jan Beulich wrote:

This way behavior doesn't change if internally in the hypervisor we need to change the mapping from PXMs to node IDs.

Ok, I see the value of this. I'm still a bit concerned about the fact that everything else speaks NUMA node, but it's probably just me being much more used to that than to PXMs. :-)

With everything else I suppose you mean the tool stack? There shouldn't be any node IDs kept across reboots there. Yet the consistent behavior to be achieved here is particularly for multiple boots.

Sure. I was more thinking of inconsistency in the user's mind, as he'll have to deal with PXMs when configuring Dom0, and with node IDs after boot... but again, maybe it's only me.

I'm simply adjusting what sched_init_vcpu() did, which is alter hard affinity conditionally on is_pinned and soft affinity unconditionally.

Ok, I understand the idea behind this better now, thanks.

[...]

Setting soft affinity as a superset of (in the former case) or equal to (in the latter) hard affinity is just pure overhead, when in the scheduler.

Then why does sched_init_vcpu() do what it does? If you want to alter that, I'm fine with altering it here.

It does that, but, in there, soft affinity is unconditionally set to 'all bits set'. Then, in the scheduler, if we find out that the soft affinity mask is fully set, we just skip the soft affinity balancing step. The idea is that, whether the mask is full because no one touched this default, or because it has been manually set like that, there is nothing to do at the soft affinity balancing level.

So, you actually are right: rather than not touching soft affinity, as I said in the previous email, I think we should set hard affinity conditionally on is_pinned, as in the patch, and then unconditionally set soft affinity to all, as in sched_init_vcpu().

Then, if we want to make it possible to tweak soft affinity, we can allow for something like dom0_nodes=soft:1,3 and, in that case, alter soft affinity only.

Hmm, not sure. And I keep being confused whether soft means allow and hard means prefer or the other way around.

hard means allow (or not allow); soft means prefer.

In any event, again, with sched_init_vcpu() setting up things so that soft is a superset of hard (and most likely they're equal), I don't see why the same done here would be more of a problem.

Indeed, sorry, my bad. When talking about soft being a superset, I forgot to mention the sort of special casing we are granting to the case when the soft mask is all set. Using cpumask_setall here, as done in sched_init_vcpu(), would avoid incurring the pointless soft affinity balancing overhead.

Regards, Dario
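The hard/soft distinction and the "all bits set" special case discussed in this message can be modelled with plain bitmasks. What follows is a hedged sketch, not Xen code: `cpumask64_t`, `struct vcpu_affinity`, `init_affinity` and `needs_soft_balancing` are hypothetical names, a 64-bit integer stands in for Xen's cpumask type (so CPU numbers must be below 64), and the check is a simplification of what a scheduler might do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a CPU mask: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpumask64_t;

struct vcpu_affinity {
    cpumask64_t hard; /* CPUs the vCPU is allowed to run on */
    cpumask64_t soft; /* CPUs the vCPU prefers to run on */
};

/* Model of the setup agreed on in the thread: hard affinity is narrowed
 * only when pinning is requested, soft affinity is always "all bits set". */
void init_affinity(struct vcpu_affinity *a, bool is_pinned, unsigned int cpu)
{
    a->hard = is_pinned ? ((cpumask64_t)1 << cpu) : ~(cpumask64_t)0;
    a->soft = ~(cpumask64_t)0;
}

/* Soft-affinity balancing is only worth doing when the soft mask actually
 * narrows the choice the hard mask already allows. */
bool needs_soft_balancing(const struct vcpu_affinity *a, cpumask64_t online)
{
    if ((online & ~a->soft) == 0)  /* soft covers every online CPU */
        return false;
    if ((a->hard & ~a->soft) == 0) /* soft is a superset of hard */
        return false;
    return true;
}
```

With the default setup the balancing step is skipped; it only becomes relevant once the soft mask is narrowed, as a `dom0_nodes=soft:1,3` style option would do.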
[Xen-devel] [PATCH v2 3/4] xen: sched: make counters for vCPU tickling generic
and update them from Credit2 and RTDS schedulers.

Signed-off-by: Dario Faggioli dario.faggi...@citrix.com
Cc: Meng Xu xumengpa...@gmail.com
Cc: George Dunlap george.dun...@eu.citrix.com
Cc: Jan Beulich jbeul...@suse.com
Cc: Keir Fraser k...@xen.org
Reviewed-by: Meng Xu men...@cis.upenn.edu
Acked-by: Jan Beulich jbeul...@suse.com
---
Changes from v1:
 * fixed the 'no_tickle' case, in Credit2, as requested during review
---
 xen/common/sched_credit2.c   |    4 ++++
 xen/common/sched_rt.c        |    2 ++
 xen/include/xen/perfc_defn.h |    4 ++--
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 2b852cc..c0f7452 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -556,7 +556,10 @@ runq_tickle(const struct scheduler *ops, unsigned int cpu, struct csched2_vcpu *
     /* Only switch to another processor if the credit difference is greater
      * than the migrate resistance */
     if ( ipid == -1 || lowest + CSCHED2_MIGRATE_RESIST > new->credit )
+    {
+        SCHED_STAT_CRANK(tickle_idlers_none);
         goto no_tickle;
+    }
 
 tickle:
     BUG_ON(ipid == -1);
@@ -571,6 +574,7 @@ tickle:
                   (unsigned char *)&d);
     }
     cpumask_set_cpu(ipid, &rqd->tickled);
+    SCHED_STAT_CRANK(tickle_idlers_some);
     cpu_raise_softirq(ipid, SCHEDULE_SOFTIRQ);
 
 no_tickle:
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 49d1b83..2ad0c68 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -929,6 +929,7 @@ runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
     }
 
     /* didn't tickle any cpu */
+    SCHED_STAT_CRANK(tickle_idlers_none);
     return;
 out:
     /* TRACE */
@@ -944,6 +945,7 @@ out:
     }
 
     cpumask_set_cpu(cpu_to_tickle, &prv->tickled);
+    SCHED_STAT_CRANK(tickle_idlers_some);
     cpu_raise_softirq(cpu_to_tickle, SCHEDULE_SOFTIRQ);
     return;
 }
diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h
index 2dc78fe..f754331 100644
--- a/xen/include/xen/perfc_defn.h
+++ b/xen/include/xen/perfc_defn.h
@@ -26,6 +26,8 @@ PERFCOUNTER(vcpu_wake_running, "sched: vcpu_wake_running")
 PERFCOUNTER(vcpu_wake_onrunq, "sched: vcpu_wake_onrunq")
 PERFCOUNTER(vcpu_wake_runnable, "sched: vcpu_wake_runnable")
 PERFCOUNTER(vcpu_wake_not_runnable, "sched: vcpu_wake_not_runnable")
+PERFCOUNTER(tickle_idlers_none, "sched: tickle_idlers_none")
+PERFCOUNTER(tickle_idlers_some, "sched: tickle_idlers_some")
 
 /* credit specific counters */
 PERFCOUNTER(delay_ms, "csched: delay")
@@ -39,8 +41,6 @@ PERFCOUNTER(acct_vcpu_active, "csched: acct_vcpu_active")
 PERFCOUNTER(acct_vcpu_idle, "csched: acct_vcpu_idle")
 PERFCOUNTER(vcpu_park, "csched: vcpu_park")
 PERFCOUNTER(vcpu_unpark, "csched: vcpu_unpark")
-PERFCOUNTER(tickle_idlers_none, "csched: tickle_idlers_none")
-PERFCOUNTER(tickle_idlers_some, "csched: tickle_idlers_some")
 PERFCOUNTER(load_balance_idle, "csched: load_balance_idle")
 PERFCOUNTER(load_balance_over, "csched: load_balance_over")
 PERFCOUNTER(load_balance_other, "csched: load_balance_other")
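The idea behind this patch, a counter declared once in a shared list and then bumped from whichever scheduler hits the event, can be shown with a stand-alone model. This is only a sketch under assumptions: every `DEMO_`/`demo_` name is hypothetical, the list is an ordinary X-macro, and Xen's real `PERFCOUNTER` and `SCHED_STAT_CRANK` machinery is built and configured differently.

```c
#include <assert.h>

/* One shared list of counters, in the spirit of perfc_defn.h. */
#define DEMO_COUNTERS(X)                               \
    X(tickle_idlers_none, "sched: tickle_idlers_none") \
    X(tickle_idlers_some, "sched: tickle_idlers_some")

/* Expand the list into an enum of counter indices. */
enum demo_counter {
#define X(name, desc) DEMO_##name,
    DEMO_COUNTERS(X)
#undef X
    DEMO_NR_COUNTERS
};

/* Expand the same list into a table of descriptions. */
const char *const demo_counter_desc[DEMO_NR_COUNTERS] = {
#define X(name, desc) [DEMO_##name] = desc,
    DEMO_COUNTERS(X)
#undef X
};

unsigned long demo_counter_val[DEMO_NR_COUNTERS];

/* Hypothetical analogue of SCHED_STAT_CRANK(): bump one counter. */
#define DEMO_STAT_CRANK(name) (demo_counter_val[DEMO_##name]++)

/* Model of a tickle path shared by several schedulers: a negative ipid
 * means no suitable CPU was found, so nothing is tickled. */
int demo_tickle(int ipid)
{
    if (ipid < 0) {
        DEMO_STAT_CRANK(tickle_idlers_none);
        return 0;
    }
    DEMO_STAT_CRANK(tickle_idlers_some);
    return 1;
}
```

Because the counter names carry no scheduler-specific prefix, the same two counters can be cranked from any scheduler's tickle path, which is the point of moving them under the generic `sched:` heading.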
[Xen-devel] [PATCH v2 4/4] xen: credit2: add a few performance counters
for events that are specific to Credit2 (as it happens for Credit1 already).

Signed-off-by: Dario Faggioli dario.faggi...@citrix.com
Cc: George Dunlap george.dun...@eu.citrix.com
Cc: Jan Beulich jbeul...@suse.com
Cc: Keir Fraser k...@xen.org
Acked-by: Jan Beulich jbeul...@suse.com
---
Changes from v1:
 * fixed the repeated typo in perfc_defn.h, as requested during review.
---
 xen/common/sched_credit2.c   | 23 +++
 xen/include/xen/perfc_defn.h | 15 ++-
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index c0f7452..bf0d651 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -654,6 +654,8 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
         }
     }
 
+    SCHED_STAT_CRANK(credit_reset);
+
     /* No need to resort runqueue, as everyone's order should be the same. */
 }
 
@@ -673,6 +675,7 @@ void burn_credits(struct csched2_runqueue_data *rqd, struct csched2_vcpu *svc, s
     delta = now - svc->start_time;
 
     if ( delta > 0 ) {
+        SCHED_STAT_CRANK(burn_credits_t2c);
         t2c_update(rqd, delta, svc);
         svc->start_time = now;
 
@@ -713,6 +716,7 @@ static void update_max_weight(struct csched2_runqueue_data *rqd, int new_weight,
     {
         rqd->max_weight = new_weight;
         d2printk("%s: Runqueue id %d max weight %d\n", __func__, rqd->id, rqd->max_weight);
+        SCHED_STAT_CRANK(upd_max_weight_quick);
     }
     else if ( old_weight == rqd->max_weight )
     {
@@ -729,6 +733,7 @@ static void update_max_weight(struct csched2_runqueue_data *rqd, int new_weight,
 
         rqd->max_weight = max_weight;
         d2printk("%s: Runqueue %d max weight %d\n", __func__, rqd->id, rqd->max_weight);
+        SCHED_STAT_CRANK(upd_max_weight_full);
     }
 }
 
@@ -750,6 +755,7 @@ __csched2_vcpu_check(struct vcpu *vc)
     {
         BUG_ON( !is_idle_vcpu(vc) );
     }
+    SCHED_STAT_CRANK(vcpu_check);
 }
 #define CSCHED2_VCPU_CHECK(_vc)  (__csched2_vcpu_check(_vc))
 #else
@@ -1203,6 +1209,7 @@ static void migrate(const struct scheduler *ops,
         svc->migrate_rqd = trqd;
         set_bit(_VPF_migrating, &svc->vcpu->pause_flags);
         set_bit(__CSFLAG_runq_migrate_request, &svc->flags);
+        SCHED_STAT_CRANK(migrate_requested);
     }
     else
     {
@@ -1223,7 +1230,10 @@ static void migrate(const struct scheduler *ops,
             update_load(ops, svc->rqd, svc, 1, now);
             runq_insert(ops, svc->vcpu->processor, svc);
             runq_tickle(ops, svc->vcpu->processor, svc, now);
+            SCHED_STAT_CRANK(migrate_on_runq);
         }
+        else
+            SCHED_STAT_CRANK(migrate_no_runq);
     }
 }
 
@@ -1577,7 +1587,10 @@ csched2_runtime(const struct scheduler *ops, int cpu, struct csched2_vcpu *snext
     /* The next guy may actually have a higher credit, if we've tried to
      * avoid migrating him from a different cpu.  DTRT.  */
     if ( rt_credit <= 0 )
+    {
         time = CSCHED2_MIN_TIMER;
+        SCHED_STAT_CRANK(runtime_min_timer);
+    }
     else
     {
         /* FIXME: See if we can eliminate this conversion if we know time
@@ -1588,9 +1601,15 @@ csched2_runtime(const struct scheduler *ops, int cpu, struct csched2_vcpu *snext
 
         /* Check limits */
         if ( time < CSCHED2_MIN_TIMER )
+        {
             time = CSCHED2_MIN_TIMER;
+            SCHED_STAT_CRANK(runtime_min_timer);
+        }
         else if ( time > CSCHED2_MAX_TIMER )
+        {
             time = CSCHED2_MAX_TIMER;
+            SCHED_STAT_CRANK(runtime_max_timer);
+        }
     }
 
     return time;
@@ -1623,7 +1642,10 @@ runq_candidate(struct csched2_runqueue_data *rqd,
          * its credit is at least CSCHED2_MIGRATE_RESIST higher. */
         if ( svc->vcpu->processor != cpu
              && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
+        {
+            SCHED_STAT_CRANK(migrate_resisted);
             continue;
+        }
 
         /* If the next one on the list has more credit than current
          * (or idle, if current is not runnable), choose it. */
@@ -1768,6 +1790,7 @@ csched2_schedule(
         {
             snext->credit += CSCHED2_MIGRATE_COMPENSATION;
             snext->vcpu->processor = cpu;
+            SCHED_STAT_CRANK(migrated);
             ret.migrated = 1;
         }
     }
diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h
index f754331..526002d 100644
--- a/xen/include/xen/perfc_defn.h
+++ b/xen/include/xen/perfc_defn.h
@@ -28,10 +28,10 @@ PERFCOUNTER(vcpu_wake_runnable, "sched: vcpu_wake_runnable")
 PERFCOUNTER(vcpu_wake_not_runnable, "sched: vcpu_wake_not_runnable")
 PERFCOUNTER(tickle_idlers_none, "sched: tickle_idlers_none")
 PERFCOUNTER(tickle_idlers_some, "sched: tickle_idlers_some")
+PERFCOUNTER(vcpu_check, "sched: vcpu_check")
 
 /* credit specific counters */
 PERFCOUNTER(delay_ms, "csched: delay")
Re: [Xen-devel] [PATCH 3/3] mini-os: sort objects in binary archives
On Wed, 2015-02-11 at 11:37 +, Wei Liu wrote: Otherwise we can commence splitting off and then apply this patch to the split-off mini-os tree. mini-os has just been split off, minus this patch. I intend to let the push gate process that split (hopefully the gate will pass over the w/e) and then apply this patch as the first fresh commit in the new tree, which will help check all the bits are in place etc. I can adjust the paths and fix missing bracket as I go. I'll also update MINIOS_UPSTREAM_REVISION in xen.git to the new thing. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 0/8] Split off mini-os to a separate tree
On Wed, 2015-02-25 at 11:21 +, Wei Liu wrote:
This is v3 of my mini-os splitting off patch series.

As xen@xenbits I ran:

$ mkdir ~/git/mini-os.git
$ cd ~/git/mini-os.git
$ git init --bare
Initialized empty Git repository in /home/xen/git/mini-os.git/
$ chgrp -R xenmaint .
$ find . -type d -exec chmod g+s {} \;
$ git config --add receive.denyNonFastForwards true
$ git config --add receive.unpackLimit 1
$ git config --add gc.autopacklimit 25

(the last three are due to what is in xen.git/config)

Then on the machine where I usually do committing stuff I did:

$ git clone git://xenbits.xen.org/mini-os.git mini-os.git
Cloning into 'mini-os.git'...
warning: You appear to have cloned an empty repository.
$ git fetch git://xenbits.xen.org/people/liuw/mini-os.git master
remote: Counting objects: 3325, done.
remote: Compressing objects: 100% (954/954), done.
remote: Total 3325 (delta 2308), reused 3291 (delta 2282)
Receiving objects: 100% (3325/3325), 962.22 KiB | 451 KiB/s, done.
Resolving deltas: 100% (2308/2308), done.
From git://xenbits.xen.org/people/liuw/mini-os
 * branch            master     -> FETCH_HEAD
$ git push --dry-run origin f5d9868796e91bee70601805b9bfc1bb544b0586:refs/heads/master
To ssh://xenbits.xen.org/home/xen/git/mini-os.git
 * [new branch]      f5d9868796e91bee70601805b9bfc1bb544b0586 -> master

However, having merged wip.build-system-v4, I discovered that autogen.sh needed to have been run half way up the merged branch. Wei fixed this up and produced a new people/liuw/mini-os.git and wip.build-system-v5, see 20150227161058.ge29...@zion.uk.xensource.com.

So in mini-os.git:

$ git fetch git://xenbits.xen.org/people/liuw/mini-os.git master
remote: Counting objects: 99, done.
remote: Compressing objects: 100% (71/71), done.
remote: Total 90 (delta 19), reused 84 (delta 15)
Unpacking objects: 100% (90/90), done.
From git://xenbits.xen.org/people/liuw/mini-os
 * branch            master     -> FETCH_HEAD
$ git rev-parse FETCH_HEAD
55f7cd7427ef3e7fe3563a3da46d8664a2ed0d6d
$ git push --dry-run origin +55f7cd7427ef3e7fe3563a3da46d8664a2ed0d6d:refs/heads/master
To ssh://xenbits.xen.org/home/xen/git/mini-os.git
 + f5d9868...55f7cd7 55f7cd7427ef3e7fe3563a3da46d8664a2ed0d6d -> master (forced update)
$ git push origin +55f7cd7427ef3e7fe3563a3da46d8664a2ed0d6d:refs/heads/master
Counting objects: 99, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (70/70), done.
Writing objects: 100% (90/90), 183.78 KiB, done.
Total 90 (delta 19), reused 86 (delta 16)
To ssh://xenbits.xen.org/home/xen/git/mini-os.git
 + f5d9868...55f7cd7 55f7cd7427ef3e7fe3563a3da46d8664a2ed0d6d -> master (forced update)

This required me to temporarily disable receive.denyNonFastForwards on the xenbits repo. It is re-enabled now.

Having done that I pulled git://xenbits.xen.org/people/liuw/xen.git wip.build-system-v5 into my staging branch, build tested it and pushed it back out to the xen.git#staging branch.

Phew!

Ian.
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On 27.02.15 at 16:24, ian.campb...@citrix.com wrote: On Fri, 2015-02-27 at 14:54 +, Stefano Stabellini wrote:

MMCFG is a Linux config option, not to be confused with PHYSDEVOP_pci_mmcfg_reserved that is a Xen hypercall interface. I don't think that the way Linux (or FreeBSD) call PHYSDEVOP_pci_mmcfg_reserved is relevant.

My (possibly flawed) understanding was that pci_mmcfg_reserved was intended to propagate the result of dom0 parsing some firmware table or other to the hypervisor.

That's not flawed at all.

In Linux dom0 we call it walking pci_mmcfg_list, which, looking at arch/x86/pci/mmconfig-shared.c pci_parse_mcfg, is populated by walking over a struct acpi_table_mcfg (there also appears to be a bunch of processor family derived entries, which I guess are quirks of some sort).

Right - this parses ACPI tables (plus applies some knowledge about certain specific systems/chipsets/CPUs) and verifies that the space needed for the MMCFG region is properly reserved either in E820 or in the ACPI specified resources (only if so does Linux decide to use MMCFG and consequently also tell Xen that it may use it).

Jan
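The verification step Jan describes, checking that the MMCFG window is covered by a reserved region before anyone uses it, boils down to a range-containment test. Below is a minimal sketch under assumptions: `struct demo_mem_entry`, `enum demo_mem_type` and `demo_mmcfg_is_reserved` are hypothetical names, the memory map is a simplified stand-in for an E820 table, and the real Linux logic (which also consults ACPI resources and may handle regions differently) is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified firmware memory map entry. */
enum demo_mem_type { DEMO_MEM_RAM, DEMO_MEM_RESERVED };

struct demo_mem_entry {
    uint64_t start;
    uint64_t size;
    enum demo_mem_type type;
};

/* Return true if [start, start + size) lies entirely inside a single
 * reserved entry. Written with subtractions so that it cannot overflow. */
bool demo_mmcfg_is_reserved(const struct demo_mem_entry *map, size_t n,
                            uint64_t start, uint64_t size)
{
    for (size_t i = 0; i < n; i++) {
        const struct demo_mem_entry *e = &map[i];

        if (e->type != DEMO_MEM_RESERVED)
            continue;
        if (start >= e->start && size <= e->size &&
            start - e->start <= e->size - size)
            return true;
    }
    return false;
}
```

In this model a dom0 kernel would only go on to report the region to the hypervisor (the role `PHYSDEVOP_pci_mmcfg_reserved` plays in the thread) when the check succeeds.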
[Xen-devel] [PATCH v2 1/4] xen: sched: honour generic perf conuters in the RTDS scheduler
more specifically, about vCPU initialization and destruction events, in line with adb26c09f26e (xen: sched: introduce a couple of counters in credit2 and SEDF).

Signed-off-by: Dario Faggioli dario.faggi...@citrix.com
Cc: George Dunlap george.dun...@eu.citrix.com
Cc: Meng Xu xumengpa...@gmail.com
Cc: Jan Beulich jbeul...@suse.com
Cc: Keir Fraser k...@xen.org
Reviewed-by: Meng Xu men...@cis.upenn.edu
---
 xen/common/sched_rt.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index df4adac..58dd646 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -525,6 +525,8 @@ rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
     if ( !is_idle_vcpu(vc) )
         svc->budget = RTDS_DEFAULT_BUDGET;
 
+    SCHED_STAT_CRANK(vcpu_init);
+
     return svc;
 }
 
@@ -574,6 +576,8 @@ rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
     struct rt_dom * const sdom = svc->sdom;
     spinlock_t *lock;
 
+    SCHED_STAT_CRANK(vcpu_destroy);
+
     BUG_ON( sdom == NULL );
 
     lock = vcpu_schedule_lock_irq(vc);
[Xen-devel] [PATCH v2 0/4] xen: sched: rework and add performance counters
Take 2 of this:
http://lists.xen.org/archives/html/xen-devel/2015-02/msg03249.html

I've made all the changes suggested during v1. The series has Meng's Reviewed-by for the changes to sched_rt.c, and Jan's Ack for the non-strictly scheduling related part (1 file! :-D), so I think what is missing is George's view/Ack.

Thanks and Regards,
Dario
---
Dario Faggioli (4):
      xen: sched: honour generic perf counters in the RTDS scheduler
      xen: sched: make counters for vCPU sleep and wakeup generic
      xen: sched: make counters for vCPU tickling generic
      xen: credit2: add a few performance counters

 xen/common/sched_credit2.c   | 39 +++
 xen/common/sched_rt.c        | 18 ++
 xen/include/xen/perfc_defn.h | 29 +
 3 files changed, 74 insertions(+), 12 deletions(-)

--
This happens because I choose it to happen! (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
[Xen-devel] [PATCH v2 2/4] xen: sched: make counters for vCPU sleep and wakeup generic
and update them from Credit2 and RTDS.

In Credit2, while there, remove some stale comments too.

Signed-off-by: Dario Faggioli <dario.faggi...@citrix.com>
Cc: George Dunlap <george.dun...@eu.citrix.com>
Cc: Meng Xu <men...@cis.upenn.edu>
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Keir Fraser <k...@xen.org>
Reviewed-by: Meng Xu <men...@cis.upenn.edu>
Acked-by: Jan Beulich <jbeul...@suse.com>
---
 xen/common/sched_credit2.c   | 12 ++++++++----
 xen/common/sched_rt.c        | 12 ++++++++++++
 xen/include/xen/perfc_defn.h | 10 +++++-----
 3 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ad0a5d4..2b852cc 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -931,6 +931,7 @@ csched2_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
     struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
 
     BUG_ON( is_idle_vcpu(vc) );
+    SCHED_STAT_CRANK(vcpu_sleep);
 
     if ( per_cpu(schedule_data, vc->processor).curr == vc )
         cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
@@ -956,19 +957,22 @@ csched2_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
 
     BUG_ON( is_idle_vcpu(vc) );
 
-    /* Make sure svc priority mod happens before runq check */
     if ( unlikely(per_cpu(schedule_data, vc->processor).curr == vc) )
     {
+        SCHED_STAT_CRANK(vcpu_wake_running);
         goto out;
     }
-
     if ( unlikely(__vcpu_on_runq(svc)) )
     {
-        /* If we've boosted someone that's already on a runqueue, prioritize
-         * it and inform the cpu in question. */
+        SCHED_STAT_CRANK(vcpu_wake_onrunq);
         goto out;
     }
 
+    if ( likely(vcpu_runnable(vc)) )
+        SCHED_STAT_CRANK(vcpu_wake_runnable);
+    else
+        SCHED_STAT_CRANK(vcpu_wake_not_runnable);
+
     /* If the context hasn't been saved for this vcpu yet, we can't put it on
      * another runqueue. Instead, we set a flag so that it will be put on the runqueue
      * after the context has been saved. */
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 58dd646..49d1b83 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -851,6 +851,7 @@ rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
     struct rt_vcpu * const svc = rt_vcpu(vc);
 
     BUG_ON( is_idle_vcpu(vc) );
+    SCHED_STAT_CRANK(vcpu_sleep);
 
     if ( curr_on_cpu(vc->processor) == vc )
         cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
@@ -966,11 +967,22 @@ rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
     BUG_ON( is_idle_vcpu(vc) );
 
     if ( unlikely(curr_on_cpu(vc->processor) == vc) )
+    {
+        SCHED_STAT_CRANK(vcpu_wake_running);
         return;
+    }
 
     /* on RunQ/DepletedQ, just update info is ok */
     if ( unlikely(__vcpu_on_q(svc)) )
+    {
+        SCHED_STAT_CRANK(vcpu_wake_onrunq);
         return;
+    }
+
+    if ( likely(vcpu_runnable(vc)) )
+        SCHED_STAT_CRANK(vcpu_wake_runnable);
+    else
+        SCHED_STAT_CRANK(vcpu_wake_not_runnable);
 
     /* If context hasn't been saved for this vcpu yet, we can't put it on
      * the Runqueue/DepletedQ. Instead, we set a flag so that it will be
diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h
index 3ac7b45..2dc78fe 100644
--- a/xen/include/xen/perfc_defn.h
+++ b/xen/include/xen/perfc_defn.h
@@ -21,6 +21,11 @@ PERFCOUNTER(dom_init,               "sched: dom_init")
 PERFCOUNTER(dom_destroy,            "sched: dom_destroy")
 PERFCOUNTER(vcpu_init,              "sched: vcpu_init")
 PERFCOUNTER(vcpu_destroy,           "sched: vcpu_destroy")
+PERFCOUNTER(vcpu_sleep,             "sched: vcpu_sleep")
+PERFCOUNTER(vcpu_wake_running,      "sched: vcpu_wake_running")
+PERFCOUNTER(vcpu_wake_onrunq,       "sched: vcpu_wake_onrunq")
+PERFCOUNTER(vcpu_wake_runnable,     "sched: vcpu_wake_runnable")
+PERFCOUNTER(vcpu_wake_not_runnable, "sched: vcpu_wake_not_runnable")
 
 /* credit specific counters */
 PERFCOUNTER(delay_ms,               "csched: delay")
@@ -32,11 +37,6 @@ PERFCOUNTER(acct_reorder,           "csched: acct_reorder")
 PERFCOUNTER(acct_min_credit,        "csched: acct_min_credit")
 PERFCOUNTER(acct_vcpu_active,       "csched: acct_vcpu_active")
 PERFCOUNTER(acct_vcpu_idle,         "csched: acct_vcpu_idle")
-PERFCOUNTER(vcpu_sleep,             "csched: vcpu_sleep")
-PERFCOUNTER(vcpu_wake_running,      "csched: vcpu_wake_running")
-PERFCOUNTER(vcpu_wake_onrunq,       "csched: vcpu_wake_onrunq")
-PERFCOUNTER(vcpu_wake_runnable,     "csched: vcpu_wake_runnable")
-PERFCOUNTER(vcpu_wake_not_runnable, "csched: vcpu_wake_not_runnable")
 PERFCOUNTER(vcpu_park,              "csched: vcpu_park")
 PERFCOUNTER(vcpu_unpark,            "csched: vcpu_unpark")
 PERFCOUNTER(tickle_idlers_none,     "csched: tickle_idlers_none")
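The wake-up accounting that both schedulers now share boils down to a four-way classification. The sketch below is a toy model of that decision order only; struct toy_vcpu, its three flags, the WAKE_* names and toy_vcpu_wake() are assumptions made for illustration, not types or functions from the Xen tree.

```c
/* Hypothetical counter indices, named after the perf counters in the patch. */
enum wake_counter {
    WAKE_RUNNING,       /* vCPU is already running on its pCPU */
    WAKE_ONRUNQ,        /* vCPU is already queued */
    WAKE_RUNNABLE,      /* ordinary wake-up of a runnable vCPU */
    WAKE_NOT_RUNNABLE,  /* woken, but still not runnable */
    WAKE_COUNTER_MAX
};

unsigned long wake_counters[WAKE_COUNTER_MAX];

/* Stand-in for the state the real schedulers derive from struct vcpu. */
struct toy_vcpu {
    int is_current;
    int on_runq;
    int runnable;
};

/* Classify one wake-up, bump the matching counter, and report which one. */
enum wake_counter toy_vcpu_wake(const struct toy_vcpu *v)
{
    enum wake_counter c;

    if (v->is_current)
        c = WAKE_RUNNING;        /* early exit in the real code */
    else if (v->on_runq)
        c = WAKE_ONRUNQ;         /* early exit in the real code */
    else if (v->runnable)
        c = WAKE_RUNNABLE;
    else
        c = WAKE_NOT_RUNNABLE;

    wake_counters[c]++;
    return c;
}
```

Exactly one counter is bumped per wake-up in this model, so the four counters always sum to the number of wake calls.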
Re: [Xen-devel] [PATCH v6 02/23] xen: move NUMA_NO_NODE to public memory.h as XEN_NUMA_NO_NODE
On 27/02/15 16:51, Wei Liu wrote:
On Fri, Feb 27, 2015 at 04:42:42PM +, Jan Beulich wrote:
On 26.02.15 at 16:55, wei.l...@citrix.com wrote:

Update NUMA_NO_NODE in Xen code to use the new macro. No functional change introduced.

But also no explanation given why this is being done. After all just leaving out the explicit specification on a node in the memop flags has the effect of saying NUMA_NO_NODE.

During last round review, Andrew wanted me to move this to Xen public header to avoid reinventing it in libxc. Now this value is used in libxc patch. But I don't particularly mind whether we move it or not, it's up to you maintainers to decide.

It is a sentinel value used in the public ABI. It should therefore appear in the public API.

~Andrew
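To make the two positions in this sub-thread concrete, here is a small sketch of a flags word in which "no node specified" decodes to a sentinel. The encoding (node stored as node + 1 in bits 8 and up, sentinel 0xFF) is an assumption modelled loosely on how memop flags carry a node; the TOY_* names and both helpers are invented for the example and should not be read as the contents of Xen's public memory.h.

```c
/* Assumed sentinel meaning "no particular NUMA node". */
#define TOY_NO_NODE 0xFFu

/* Encode a node into a flags word: stored as (node + 1) in bits 8..15,
 * so an all-zero flags word carries no node at all. */
unsigned int toy_memf_node(unsigned int node)
{
    return (node + 1) << 8;
}

/* Decode the node from a flags word. With no node encoded, the
 * subtraction wraps around and the mask yields TOY_NO_NODE. */
unsigned int toy_memf_get_node(unsigned int flags)
{
    return ((flags >> 8) - 1) & 0xFFu;
}
```

With this layout Jan's observation holds (omitting the node already means "no node"), while Andrew's point is that a caller such as libxc still needs a named constant to compare the decoded value against.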
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Fri, 27 Feb 2015, Ian Campbell wrote:
On Fri, 2015-02-27 at 16:35 +, Jan Beulich wrote:
On 27.02.15 at 16:24, ian.campb...@citrix.com wrote:
On Fri, 2015-02-27 at 14:54 +, Stefano Stabellini wrote:

MMCFG is a Linux config option, not to be confused with PHYSDEVOP_pci_mmcfg_reserved that is a Xen hypercall interface. I don't think that the way Linux (or FreeBSD) calls PHYSDEVOP_pci_mmcfg_reserved is relevant.

My (possibly flawed) understanding was that pci_mmcfg_reserved was intended to propagate the result of dom0 parsing some firmware table or other to the hypervisor.

That's not flawed at all.

I think that's a first in this thread ;-)

In Linux dom0 we call it walking pci_mmcfg_list, which looking at arch/x86/pci/mmconfig-shared.c pci_parse_mcfg is populated by walking over a struct acpi_table_mcfg (there also appears to be a bunch of processor family derived entries, which I guess are quirks of some sort).

Right - this parses ACPI tables (plus applies some knowledge about certain specific systems/chipsets/CPUs) and verifies that the space needed for the MMCFG region is properly reserved either in E820 or in the ACPI specified resources (only if so Linux decides to use MMCFG and consequently also tells Xen that it may use it).

Thanks. So I think what I wrote in 1424948710.14641.25.ca...@citrix.com applies as is to Device Tree based ARM devices, including the need for the PHYSDEVOP_pci_host_bridge_add call.

Although I understand now that PHYSDEVOP_pci_mmcfg_reserved was intended for passing down firmware information to Xen, as the information that we need is exactly the same, I think it would be acceptable to use the same hypercall on ARM too. I am not hard set on this and the new hypercall is also a viable option. However, if we do introduce a new hypercall as Ian suggested, do we need to take into account the possibility that a host bridge might have multiple cfg memory ranges?
On ACPI based devices we will have the MCFG table, and things follow much as for x86:

  * Xen should parse MCFG to discover the PCI host-bridges
  * Dom0 should do likewise and call PHYSDEVOP_pci_mmcfg_reserved in the same way as Xen/x86 does.

The SBSA, an ARM standard for servers, mandates various things which we can rely on here because ACPI on ARM requires an SBSA compliant system. So things like odd quirks in PCI controllers or magic setup are spec'd out of our zone of caring (into the firmware I suppose), hence there is nothing like the DT_DEVICE_START stuff to register specific drivers etc.

The PHYSDEVOP_pci_host_bridge_add call is not AFAICT needed on ACPI ARM systems (any more than it is on x86). We can decide whether to omit it from dom0 or ignore it from Xen later on.

(Manish, this is FYI, I don't expect you to implement ACPI support!)
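As background for the "space needed for the MMCFG region" wording above, here is a rough sketch of the arithmetic involved, assuming the usual ECAM layout in which every bus occupies 1 MiB (32 devices x 8 functions x 4 KiB) and the base address corresponds to bus 0. All toy_* names and struct toy_range are hypothetical, and the reservation check is deliberately much simpler than what Linux's mmconfig code does against E820 and ACPI resources.

```c
#include <stddef.h>
#include <stdint.h>

/* A reserved physical range, end exclusive (made-up helper type). */
struct toy_range {
    uint64_t start;
    uint64_t end;
};

/* Bytes of config space needed for buses start_bus..end_bus inclusive. */
uint64_t toy_mmcfg_size(uint8_t start_bus, uint8_t end_bus)
{
    if (end_bus < start_bus)
        return 0;
    return ((uint64_t)(end_bus - start_bus) + 1) << 20;
}

/* Address of a config register: bus in bits 20..27, device in bits 15..19,
 * function in bits 12..14, register offset in bits 0..11. */
uint64_t toy_mmcfg_addr(uint64_t base, uint8_t bus, uint8_t dev,
                        uint8_t fn, uint16_t reg)
{
    return base + (((uint64_t)bus << 20) |
                   ((uint64_t)(dev & 0x1f) << 15) |
                   ((uint64_t)(fn & 0x7) << 12) |
                   (uint64_t)(reg & 0xfff));
}

/* True if one reserved range fully covers the window for the bus range. */
int toy_mmcfg_is_reserved(uint64_t base, uint8_t start_bus, uint8_t end_bus,
                          const struct toy_range *resv, size_t nr)
{
    uint64_t start = base + ((uint64_t)start_bus << 20);
    uint64_t end = start + toy_mmcfg_size(start_bus, end_bus);
    size_t i;

    for (i = 0; i < nr; i++)
        if (resv[i].start <= start && end <= resv[i].end)
            return 1;
    return 0;
}
```

In this model a segment covering buses 0 to 255 needs 256 MiB, which is why the check that the whole window is reserved matters before anyone starts dereferencing it.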
Re: [Xen-devel] [PATCH] MAINTAINERS: Add OVMF maintainers.
On Fri, Feb 27, 2015 at 04:49:18PM +, Anthony PERARD wrote:

Signed-off-by: Anthony PERARD <anthony.per...@citrix.com>

Acked-by: Wei Liu <wei.l...@citrix.com>

---
 MAINTAINERS | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 3bbac9e..e94a763 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -237,6 +237,12 @@ M: David Scott <dave.sc...@eu.citrix.com>
 S: Supported
 F: tools/ocaml/
 
+OVMF UPSTREAM
+M: Anthony PERARD <anthony.per...@citrix.com>
+M: Wei Liu <wei.l...@citrix.com>
+S: Supported
+T: git git://xenbits.xen.org/ovmf.git
+
 POWER MANAGEMENT
 M: Jan Beulich <jbeul...@suse.com>
 M: Liu Jinsong <jinsong@alibaba-inc.com>
--
Anthony PERARD
Re: [Xen-devel] [PATCH v6 02/23] xen: move NUMA_NO_NODE to public memory.h as XEN_NUMA_NO_NODE
On Fri, Feb 27, 2015 at 04:42:42PM +, Jan Beulich wrote:
On 26.02.15 at 16:55, wei.l...@citrix.com wrote:

Update NUMA_NO_NODE in Xen code to use the new macro. No functional change introduced.

But also no explanation given why this is being done. After all just leaving out the explicit specification on a node in the memop flags has the effect of saying NUMA_NO_NODE.

During last round review, Andrew wanted me to move this to Xen public header to avoid reinventing it in libxc. Now this value is used in libxc patch. But I don't particularly mind whether we move it or not, it's up to you maintainers to decide.

Wei.

Jan
Re: [Xen-devel] [PATCH v6 02/23] xen: move NUMA_NO_NODE to public memory.h as XEN_NUMA_NO_NODE
On 26.02.15 at 16:55, wei.l...@citrix.com wrote:

Update NUMA_NO_NODE in Xen code to use the new macro. No functional change introduced.

But also no explanation given why this is being done. After all just leaving out the explicit specification on a node in the memop flags has the effect of saying NUMA_NO_NODE.

Jan
[Xen-devel] [PATCH] MAINTAINERS: Add OVMF maintainers.
Signed-off-by: Anthony PERARD <anthony.per...@citrix.com>
---
 MAINTAINERS | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 3bbac9e..e94a763 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -237,6 +237,12 @@ M: David Scott <dave.sc...@eu.citrix.com>
 S: Supported
 F: tools/ocaml/
 
+OVMF UPSTREAM
+M: Anthony PERARD <anthony.per...@citrix.com>
+M: Wei Liu <wei.l...@citrix.com>
+S: Supported
+T: git git://xenbits.xen.org/ovmf.git
+
 POWER MANAGEMENT
 M: Jan Beulich <jbeul...@suse.com>
 M: Liu Jinsong <jinsong@alibaba-inc.com>
--
Anthony PERARD
Re: [Xen-devel] [PATCH v6 03/23] xen: make two memory hypercalls vNUMA-aware
On 26.02.15 at 16:55, wei.l...@citrix.com wrote:

Make XENMEM_increase_reservation and XENMEM_populate_physmap vNUMA-aware.

That is, if guest requests Xen to allocate memory for specific vnode, Xen can translate vnode to pnode using vNUMA information of that guest.

XENMEMF_vnode is introduced for the guest to mark the node number is in fact virtual node number and should be translated by Xen.

XENFEAT_memory_op_vnode_supported is introduced to indicate that Xen is able to translate virtual node to physical node.

Signed-off-by: Wei Liu <wei.l...@citrix.com>

As I massaged your first patch (also, but not only, to do what Andrew requested), this one will need adjustment too. Perhaps additionally if the 2nd one is to be dropped...

Jan
Re: [Xen-devel] [PATCH v6 03/23] xen: make two memory hypercalls vNUMA-aware
On Fri, Feb 27, 2015 at 04:59:02PM +, Jan Beulich wrote:
On 26.02.15 at 16:55, wei.l...@citrix.com wrote:

Make XENMEM_increase_reservation and XENMEM_populate_physmap vNUMA-aware.

That is, if guest requests Xen to allocate memory for specific vnode, Xen can translate vnode to pnode using vNUMA information of that guest.

XENMEMF_vnode is introduced for the guest to mark the node number is in fact virtual node number and should be translated by Xen.

XENFEAT_memory_op_vnode_supported is introduced to indicate that Xen is able to translate virtual node to physical node.

Signed-off-by: Wei Liu <wei.l...@citrix.com>

As I massaged your first patch (also, but not only, to do what Andrew requested), this one will need adjustment too. Perhaps additionally if the 2nd one is to be dropped...

I can resend after we come to conclusion on what to do with patch 2.

Wei.

Jan
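The translation described in the commit message can be pictured with a small sketch: if the guest marks the node number as virtual, look it up in a per-guest table, otherwise take it as given. Everything below is hypothetical (struct toy_vnuma, TOY_MEMF_VNODE, TOY_VNUMA_NO_NODE and both functions); it only illustrates the idea and is not taken from the patch itself.

```c
#include <stddef.h>

/* Assumed sentinel for "no usable node". */
#define TOY_VNUMA_NO_NODE 0xFFu

/* Assumed flag bit meaning "the node number is a virtual node". */
#define TOY_MEMF_VNODE (1u << 18)

/* Per-guest virtual-to-physical node table (made-up layout). */
struct toy_vnuma {
    unsigned int nr_vnodes;
    const unsigned int *vnode_to_pnode;
};

/* Translate a virtual node; out-of-range or missing info gives the sentinel. */
unsigned int toy_vnode_to_pnode(const struct toy_vnuma *v, unsigned int vnode)
{
    if (v == NULL || vnode >= v->nr_vnodes)
        return TOY_VNUMA_NO_NODE;
    return v->vnode_to_pnode[vnode];
}

/* Pick the physical node for a request: translate only when the guest
 * flagged the node number as virtual. */
unsigned int toy_resolve_node(const struct toy_vnuma *v, unsigned int node,
                              unsigned int flags)
{
    if (flags & TOY_MEMF_VNODE)
        return toy_vnode_to_pnode(v, node);
    return node;
}
```

Falling back to the sentinel when the lookup fails is one possible policy; whether a real implementation should fail the request instead is a separate question that this sketch does not try to answer.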
Re: [Xen-devel] [PATCH v3 0/8] Split off mini-os to a separate tree
On Fri, 2015-02-27 at 16:37 +, Ian Campbell wrote:
On Wed, 2015-02-25 at 11:21 +, Wei Liu wrote:

This is v3 of my mini-os splitting off patch series.

As xen@xenbits I ran:

$ mkdir ~/git/mini-os.git
$ cd ~/git/mini-os.git
$ git init --bare
Initialized empty Git repository in /home/xen/git/mini-os.git/
$ chgrp -R xenmaint .
$ find . -type d -exec chmod g+s {} \;
$ git config --add receive.denyNonFastForwards true
$ git config --add receive.unpackLimit 1
$ git config --add gc.autopacklimit 25

This omitted setting up the mails to xen-stag...@lists.xensource.com on push. Following Ian's advice to look at ~xen/release-checklist on xenbits I have now:

xen@xenbits:~/HG/patchbot$ echo 55f7cd7427ef3e7fe3563a3da46d8664a2ed0d6d > mini-os--master.patchbot-reported-heads

edited versions to add:

/home/xen/git mini-os.git#master xen-change...@lists.xensource.com xen-de...@lists.xensource.com

and committed that change to the git repo in cwd.

Ian.
Re: [Xen-devel] Regression, host crash with 4.5rc1
Len (CC'd on this email) is our power expert who has some ideas on this issue, I'll let him explain further. -- Don Dugger Censeo Toto nos in Kansa esse decisse. - D. Gale Ph: 303/443-3786 -Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Thursday, November 27, 2014 2:28 AM To: Steve Freitas; Dugger, Donald D; Nakajima, Jun Cc: xen-devel@lists.xen.org; Don Slutz Subject: Re: [Xen-devel] Regression, host crash with 4.5rc1 On 27.11.14 at 06:29, sfl...@ihonk.com wrote: On 11/25/2014 03:00 AM, Jan Beulich wrote: Okay, so it's not really the mwait-idle driver causing the regression, but it is C-state related. Hence we're now down to seeing whether all or just the deeper C states are affected, i.e. I now need to ask you to play with max_cstate=. For that you'll have to remember that the option's effect differs between the ACPI and the MWAIT idle drivers. In the spirit of bisection I'd suggest using max_cstate=2 first no matter which of the two scenarios you pick. If that still hangs, max_cstate=1 obviously is the only other thing to try. Should that not hang (and you left out mwait-idle=0), trying max_cstate=3 in that same scenario would be the other case to check. No need for 'd' and 'a' output for the time being, but 'c' output would be much appreciated for all cases where you observe hangs. Okay, working through that now. I tried max_cstate=2 and got no hangs, whether with or without mwait-idle=0. 
However, I was puzzled by this:

(XEN) 'c' pressed - printing ACPI Cx structures
(XEN) ==cpu0==
(XEN) active state: C0
(XEN) max_cstate: C2
(XEN) states:
(XEN)   C1: type[C1] latency[003] usage[12219860] method[ FFH] duration[1190961948551]
(XEN)   C2: type[C1] latency[010] usage[10205554] method[ FFH] duration[2015393965907]
(XEN)   C3: type[C2] latency[020] usage[50926286] method[ FFH] duration[30527997858148]
(XEN)  *C0: usage[73351700] duration[9974627547595]
(XEN) max=0 pwr=0 urg=0 nxt=0
(XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
(XEN) CC3[28794734145697] CC6[0] CC7[0]
(XEN) ==cpu1==
(XEN) active state: C3
(XEN) max_cstate: C2
(XEN) states:
(XEN)   C1: type[C1] latency[003] usage[10699950] method[ FFH] duration[1141422044112]
(XEN)   C2: type[C1] latency[010] usage[06382904] method[ FFH] duration[1329739264322]
(XEN)  *C3: type[C2] latency[020] usage[44630764] method[ FFH] duration[31676618425954]
(XEN)   C0: usage[61713618] duration[9561201640320]
(XEN) max=0 pwr=0 urg=0 nxt=0
(XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
(XEN) CC3[30066495105056] CC6[0] CC7[0]
[...]

Why would some of the cores be in C3 even though they list max_cstate as C2?

This was precisely the reason why I told you that the numbering differs (and is confusing and has nothing to do with actual C state numbers): What max_cstate refers to in the mwait-idle driver is what above is listed as type[Cx], i.e. the state at index 1 is C1, at 2 we've got C1E, and at 3 we've got C2. And those still aren't in line with the numbering the CPU documentation uses, it's rather kind of meant to refer to the ACPI numbering (but probably also not fully matching up).

So max_cstate=2 working suggests a problem with what the CPU calls C6, which presumably isn't all that surprising considering the many errata (BD35, BD38, BD40, BD59, BD87, and BD104).

Not sure how to proceed from here - I suppose you already made sure you run with the latest available BIOS.
And with 6 errata documented it's not all that unlikely that there's a 7th one with MONITOR/MWAIT behavior. The commit you bisected to (and which you had verified to be the culprit by just forcing arch_skip_send_event_check() to always return false) could be reasonably assumed to be broken only when MWAIT use for all C states didn't work. Don, Jun - is there anything known but not yet publicly documented for Family 6 Model 44 Xeons? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
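The index-versus-type confusion discussed in this message can be modelled in a few lines. The sketch assumes, following Jan's explanation, that the limit is compared against each state's type rather than its position in the table; struct toy_cstate and toy_deepest_allowed() are invented for the example and are not the mwait-idle driver's code.

```c
#include <stddef.h>

/* One row of the 'c' key dump: table index and the reported type[Cx]. */
struct toy_cstate {
    unsigned int index;  /* position as printed: C1, C2, C3, ... */
    unsigned int type;   /* number inside type[Cx] */
};

/* Return the index of the deepest state whose type does not exceed
 * max_cstate, or 0 (meaning C0 only) if none qualifies. The table is
 * assumed to be ordered from shallowest to deepest. */
unsigned int toy_deepest_allowed(const struct toy_cstate *states, size_t nr,
                                 unsigned int max_cstate)
{
    unsigned int deepest = 0;
    size_t i;

    for (i = 0; i < nr; i++)
        if (states[i].type <= max_cstate)
            deepest = states[i].index;
    return deepest;
}
```

Fed with the table from the dump above (types 1, 1, 2 at indices 1, 2, 3), a limit of 2 still permits index 3, which would match the cpu1 output showing an active state of C3 under max_cstate C2.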
Re: [Xen-devel] RFC: xen config changes v4
On Fri, Feb 27, 2015 at 6:30 AM, Juergen Gross jgr...@suse.com wrote: On 02/27/2015 02:38 PM, Stefano Stabellini wrote: On Fri, 27 Feb 2015, Juergen Gross wrote: On 02/27/2015 01:24 PM, Stefano Stabellini wrote: On Fri, 27 Feb 2015, Juergen Gross wrote: On 02/27/2015 11:11 AM, Stefano Stabellini wrote: On Fri, 27 Feb 2015, Juergen Gross wrote: On 02/27/2015 10:41 AM, Stefano Stabellini wrote: On Fri, 27 Feb 2015, Juergen Gross wrote: On 02/26/2015 06:42 PM, Stefano Stabellini wrote: On Thu, 26 Feb 2015, Luis R. Rodriguez wrote: On Thu, Feb 26, 2015 at 11:08:20AM +, Stefano Stabellini wrote: On Thu, 26 Feb 2015, David Vrabel wrote: On 26/02/15 04:59, Juergen Gross wrote: So we are again in the situation that pv-drivers always imply the pvops kernel (PARAVIRT selected). I started the whole Kconfig rework to eliminate this dependency. Yes. Can you produce a series that just addresses this one issue. In the absence of any concrete requirement for this big Kconfig reorg I I don't think it is helpful. I clearly missed some context as I didn't realize that this was the intended goal. Why do we want this? Please explain as it won't come for free. We have a few PV interfaces for HVM guests that need PARAVIRT in Linux in order to be used, for example pv_time_ops and HVMOP_pagetable_dying. They are critical performance improvements and from the interface perspective, small enough that doesn't make much sense having a separate KConfig option for them. In order to reach the goal above we necessarily need to introduce a differentiation in terms of PV on HVM guests in Linux: 1) basic guests with PV network, disk, etc but no PV timers, no HVMOP_pagetable_dying, no PV IPIs 2) full PV on HVM guests that have PV network, disk, timers, HVMOP_pagetable_dying, PV IPIs and anything else that makes sense. 2) is much faster than 1) on Xen and 2) is only a tiny bit slower than 1) on native x86 Also don't we shove 2) down hvm guests right now? 
Even when everything is built in I do not see how we opt out for HVM for 1) at run time right now. If this is true then the question of motivation for this becomes even stronger I think. Yes, indeed there is no way to do 1) at the moment. And for good reasons, see above. Hmm, after checking the code I'm not convinced: - HVMOP_pagetable_dying is obsolete on modern hardware supporting EPT/HAP That might be true, but what about older hardware? Even on modern hardware a few workloads still run faster on shadow. But if HVMOP_pagetable_dying is the only reason to keep PARAVIRT for HVM guests, then I agree with you that we should remove it. - PV IPIs are not needed on single-vcpu guests - PARAVIRT_CLOCK doesn't need PARAVIRT (in fact the SUSEs kernel configs for all x86_64 kernels have CONFIG_PARAVIRT_CLOCK=y) So I think we really should enable building Xen frontends without PARAVIRT, implying at least no XEN_PV and no XEN_PVH. I'll have a try setting up patches. If we are doing this as a performance improvement, I would like to see a couple of benchmarks (kernbench, hackbench) to show that on a single-vcpu guest and multi-vcpu guest (let's say 4 vcpus) disabling PARAVIRT leads to better performance on Xen on EPT hardware. This is not meant to be a performance improvement. It is meant to enable a standard distro kernel configured without PARAVIRT to be able to run as a HVM guest using the pv-drivers. This is not a convincing explanation. Debian, Ubuntu and Fedora seems to be able to cope with it just fine. Why do you want to do that, even though it will cause a performance regression and a maintenance pain? You haven't provided a reason yet. Either we are talking about different things, or I really don't understand your problem here. I don't want to disable something. I just want to enable kernels without PARAVIRT to run under Xen better than today. Being it 32 bit non-PAE kernels as Ian pointed out or distro kernels like e.g. SLES and probably RHEL. 
Using PV frontends is completely orthogonal to other PV enhancements like PARAVIRT_CLOCK, HVMOP_pagetable_dying or PV IPIs. So why do you object enabling the PV frontends for those kernels? I am for it. I would like to avoid two user visible XEN enablement options (XEN_FRONTEND vs. XEN_PVHVM) for x86_64 and PAE HVM guests to avoid configurations with just XEN_FRONTEND, that can be considered a performance regression compared to what we have now (on x86_64 and PAE). Would you be okay with making this an expert configuration alternative for PAE/x86_64? This would enable the possibility to use PV drivers for native-performance-tuned kernels. I would explicitly mention the better alternative XEN_PVHVM in the Kconfig help text. I would prefer to hide it on PAE and x86_64. Okay, as long as it is still _possible_ somehow to configure it. That begs
[Xen-devel] [PATCH v7 2/3] xen/arm: Make gic-v2 code handle hip04-d01 platform
The GIC in this platform is mainly compatible with the standard GICv2 beside:
- ITARGET is extended to 16 bit to support 16 CPUs;
- SGI mask is extended to support 16 CPUs;
- maximum supported interrupt is 510;
- GICH APR and LR register offsets.

Signed-off-by: Frediano Ziglio <frediano.zig...@huawei.com>
Signed-off-by: Zoltan Kiss <zoltan.k...@huawei.com>
---
 xen/arch/arm/Makefile       | 1 +
 xen/arch/arm/domain_build.c | 2 +-
 xen/arch/arm/gic-hip04.c    | 400 +++-
 3 files changed, 207 insertions(+), 196 deletions(-)

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 41aba2e..72499e9 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -12,6 +12,7 @@ obj-y += domctl.o
 obj-y += sysctl.o
 obj-y += domain_build.o
 obj-y += gic.o gic-v2.o
+obj-$(arm32) += gic-hip04.o
 obj-$(CONFIG_ARM_64) += gic-v3.o
 obj-y += io.o
 obj-y += irq.o
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 9f1f59f..83951a3 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1069,7 +1069,7 @@ static int handle_node(struct domain *d, struct kernel_info *kinfo,
 
     /* Replace these nodes with our own. Note that the original may be
      * used_by DOMID_XEN so this check comes first. */
-    if ( dt_match_node(gic_matches, node) )
+    if ( node == dt_interrupt_controller || dt_match_node(gic_matches, node) )
         return make_gic_node(d, kinfo->fdt, node);
     if ( dt_match_node(timer_matches, node) )
         return make_timer_node(d, kinfo->fdt, node);
diff --git a/xen/arch/arm/gic-hip04.c b/xen/arch/arm/gic-hip04.c
index fa695d1..9977f9b 100644
--- a/xen/arch/arm/gic-hip04.c
+++ b/xen/arch/arm/gic-hip04.c
@@ -1,7 +1,8 @@
 /*
- * xen/arch/arm/gic-v2.c
+ * xen/arch/arm/gic-hip04.c
  *
- * ARM Generic Interrupt Controller support v2
+ * Generic Interrupt Controller for HiSilicon Hip04 platform
+ * Based heavily from gic-v2.c
  *
  * Tim Deegan <t...@xen.org>
  * Copyright (c) 2011 Citrix Systems.
@@ -71,59 +72,71 @@ static struct {
     void __iomem * map_hbase; /* IO Address of virtual interface registers */
     paddr_t vbase;            /* Address of virtual cpu interface registers */
     spinlock_t lock;
-} gicv2;
+} hip04gic;
 
-static struct gic_info gicv2_info;
+static struct gic_info hip04gic_info;
 
 /* The GIC mapping of CPU interfaces does not necessarily match the
  * logical CPU numbering. Let's use mapping as returned by the GIC
  * itself
  */
-static DEFINE_PER_CPU(u8, gic_cpu_id);
+static DEFINE_PER_CPU(u16, gic_cpu_id);
 
 /* Maximum cpu interface per GIC */
-#define NR_GIC_CPU_IF 8
+#define NR_GIC_CPU_IF 16
+
+#define HIP04_GICD_SGI_TARGET_SHIFT 8
+
+#define HIP04_GICH_APR 0x70
+#define HIP04_GICH_LR  0x80
+
+#define DT_COMPAT_GIC_HIP04 "hisilicon,hip04-intc"
 
 static inline void writeb_gicd(uint8_t val, unsigned int offset)
 {
-    writeb_relaxed(val, gicv2.map_dbase + offset);
+    writeb_relaxed(val, hip04gic.map_dbase + offset);
+}
+
+static inline void writew_gicd(uint16_t val, unsigned int offset)
+{
+    writew_relaxed(val, hip04gic.map_dbase + offset);
 }
 
 static inline void writel_gicd(uint32_t val, unsigned int offset)
 {
-    writel_relaxed(val, gicv2.map_dbase + offset);
+    writel_relaxed(val, hip04gic.map_dbase + offset);
 }
 
 static inline uint32_t readl_gicd(unsigned int offset)
 {
-    return readl_relaxed(gicv2.map_dbase + offset);
+    return readl_relaxed(hip04gic.map_dbase + offset);
 }
 
 static inline void writel_gicc(uint32_t val, unsigned int offset)
 {
     unsigned int page = offset >> PAGE_SHIFT;
     offset &= ~PAGE_MASK;
-    writel_relaxed(val, gicv2.map_cbase[page] + offset);
+    writel_relaxed(val, hip04gic.map_cbase[page] + offset);
 }
 
 static inline uint32_t readl_gicc(unsigned int offset)
 {
     unsigned int page = offset >> PAGE_SHIFT;
     offset &= ~PAGE_MASK;
-    return readl_relaxed(gicv2.map_cbase[page] + offset);
+    return readl_relaxed(hip04gic.map_cbase[page] + offset);
 }
 
 static inline void writel_gich(uint32_t val, unsigned int offset)
 {
-    writel_relaxed(val, gicv2.map_hbase + offset);
+    writel_relaxed(val, hip04gic.map_hbase + offset);
 }
 
 static inline uint32_t readl_gich(int unsigned offset)
 {
-    return readl_relaxed(gicv2.map_hbase + offset);
+    return readl_relaxed(hip04gic.map_hbase + offset);
 }
 
-static unsigned int gicv2_cpu_mask(const cpumask_t *cpumask)
+static unsigned int hip04gic_cpu_mask(const cpumask_t *cpumask)
 {
     unsigned int cpu;
     unsigned int mask = 0;
@@ -139,7 +152,7 @@ static unsigned int gicv2_cpu_mask(const cpumask_t *cpumask)
     return mask;
 }
 
-static void gicv2_save_state(struct vcpu *v)
+static void hip04gic_save_state(struct vcpu *v)
 {
     int i;
 
@@ -147,58 +160,58 @@ static void gicv2_save_state(struct vcpu *v)
      * this call and it only accesses struct vcpu fields that cannot be
      * accessed simultaneously by another pCPU.
      */
-    for ( i = 0; i
[Xen-devel] [PATCH v7 1/3] xen/arm: Duplicate gic-v2.c file to support hip04 platform version
HiSilicon Hip04 platform uses a slightly different version. This is just a verbatim copy of the file to work around git not fully supporting copy operation.

Signed-off-by: Frediano Ziglio <frediano.zig...@huawei.com>
---
 xen/arch/arm/gic-hip04.c | 803 +++
 1 file changed, 803 insertions(+)
 create mode 100644 xen/arch/arm/gic-hip04.c

diff --git a/xen/arch/arm/gic-hip04.c b/xen/arch/arm/gic-hip04.c
new file mode 100644
index 000..fa695d1
--- /dev/null
+++ b/xen/arch/arm/gic-hip04.c
@@ -0,0 +1,803 @@
+/*
+ * xen/arch/arm/gic-v2.c
+ *
+ * ARM Generic Interrupt Controller support v2
+ *
+ * Tim Deegan <t...@xen.org>
+ * Copyright (c) 2011 Citrix Systems.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include <xen/config.h>
+#include <xen/lib.h>
+#include <xen/init.h>
+#include <xen/mm.h>
+#include <xen/irq.h>
+#include <xen/sched.h>
+#include <xen/errno.h>
+#include <xen/softirq.h>
+#include <xen/list.h>
+#include <xen/device_tree.h>
+#include <xen/libfdt/libfdt.h>
+#include <asm/p2m.h>
+#include <asm/domain.h>
+#include <asm/platform.h>
+#include <asm/device.h>
+
+#include <asm/io.h>
+#include <asm/gic.h>
+
+/*
+ * LR register definitions are GIC v2 specific.
+ * Moved these definitions from header file to here
+ */
+#define GICH_V2_LR_VIRTUAL_MASK    0x3ff
+#define GICH_V2_LR_VIRTUAL_SHIFT   0
+#define GICH_V2_LR_PHYSICAL_MASK   0x3ff
+#define GICH_V2_LR_PHYSICAL_SHIFT  10
+#define GICH_V2_LR_STATE_MASK      0x3
+#define GICH_V2_LR_STATE_SHIFT     28
+#define GICH_V2_LR_PRIORITY_SHIFT  23
+#define GICH_V2_LR_PRIORITY_MASK   0x1f
+#define GICH_V2_LR_HW_SHIFT        31
+#define GICH_V2_LR_HW_MASK         0x1
+#define GICH_V2_LR_GRP_SHIFT       30
+#define GICH_V2_LR_GRP_MASK        0x1
+#define GICH_V2_LR_MAINTENANCE_IRQ (1<<19)
+#define GICH_V2_LR_GRP1            (1<<30)
+#define GICH_V2_LR_HW              (1<<31)
+#define GICH_V2_LR_CPUID_SHIFT     9
+#define GICH_V2_VTR_NRLRGS         0x3f
+
+#define GICH_V2_VMCR_PRIORITY_MASK   0x1f
+#define GICH_V2_VMCR_PRIORITY_SHIFT  27
+
+/* Global state */
+static struct {
+    paddr_t dbase;            /* Address of distributor registers */
+    void __iomem * map_dbase; /* IO mapped Address of distributor registers */
+    paddr_t cbase;            /* Address of CPU interface registers */
+    void __iomem * map_cbase[2]; /* IO mapped Address of CPU interface registers */
+    paddr_t hbase;            /* Address of virtual interface registers */
+    void __iomem * map_hbase; /* IO Address of virtual interface registers */
+    paddr_t vbase;            /* Address of virtual cpu interface registers */
+    spinlock_t lock;
+} gicv2;
+
+static struct gic_info gicv2_info;
+
+/* The GIC mapping of CPU interfaces does not necessarily match the
+ * logical CPU numbering. Let's use mapping as returned by the GIC
+ * itself
+ */
+static DEFINE_PER_CPU(u8, gic_cpu_id);
+
+/* Maximum cpu interface per GIC */
+#define NR_GIC_CPU_IF 8
+
+static inline void writeb_gicd(uint8_t val, unsigned int offset)
+{
+    writeb_relaxed(val, gicv2.map_dbase + offset);
+}
+
+static inline void writel_gicd(uint32_t val, unsigned int offset)
+{
+    writel_relaxed(val, gicv2.map_dbase + offset);
+}
+
+static inline uint32_t readl_gicd(unsigned int offset)
+{
+    return readl_relaxed(gicv2.map_dbase + offset);
+}
+
+static inline void writel_gicc(uint32_t val, unsigned int offset)
+{
+    unsigned int page = offset >> PAGE_SHIFT;
+    offset &= ~PAGE_MASK;
+    writel_relaxed(val, gicv2.map_cbase[page] + offset);
+}
+
+static inline uint32_t readl_gicc(unsigned int offset)
+{
+    unsigned int page = offset >> PAGE_SHIFT;
+    offset &= ~PAGE_MASK;
+    return readl_relaxed(gicv2.map_cbase[page] + offset);
+}
+
+static inline void writel_gich(uint32_t val, unsigned int offset)
+{
+    writel_relaxed(val, gicv2.map_hbase + offset);
+}
+
+static inline uint32_t readl_gich(int unsigned offset)
+{
+    return readl_relaxed(gicv2.map_hbase + offset);
+}
+
+static unsigned int gicv2_cpu_mask(const cpumask_t *cpumask)
+{
+    unsigned int cpu;
+    unsigned int mask = 0;
+    cpumask_t possible_mask;
+
+    cpumask_and(&possible_mask, cpumask, &cpu_possible_map);
+    for_each_cpu( cpu, &possible_mask )
+    {
+        ASSERT(cpu < NR_GIC_CPU_IF);
+        mask |= per_cpu(gic_cpu_id, cpu);
+    }
+
+    return mask;
+}
+
+static void gicv2_save_state(struct vcpu *v)
+{
+    int i;
+
+    /* No need for spinlocks here because interrupts are disabled around
+     * this call and it only accesses struct vcpu fields that cannot be
+     *
[Xen-devel] Pygrub backports
I think the following commits from master should be considered for backport:

0c12e5b7427b4dfd2dfabf21f6b0e6e24bc8e864 tools/pygrub: Fix extlinux when /boot is a separate partition from /
d1b93ea2615bd789ee28901f1f1c05ffb319cb61 tools/pygrub: Make pygrub understand default entry in string format
4ee393f9d6528640c29a0554fdc6cb3e795fb6e8 pygrub: fix non-interactive parsing of grub1 config files
3b279811707dab4bab95c2e952e94ebf4d6badd9 pygrub: Fix regression from c/s d1b93ea, attempt 2

Existing Xen 4.4.1 as found in Ubuntu cannot parse the grub.cfg files that Ubuntu itself generates, which was:

Reported-by: Owen Dunn osd1...@cam.ac.uk

Owen kindly tested pygrub from xen.git#master (merged with the Debian/Ubuntu patchset, provided by me) and reports that it worked in his setup.

Opinions?

Ian.
Re: [Xen-devel] [PATCH 3/5] x86: widen NUMA nodes to be allocated from
On Fri, 2015-02-27 at 13:36 +, Jan Beulich wrote: On 27.02.15 at 14:27, dario.faggi...@citrix.com wrote: I'm asking because I really don't like vcpu_to_node(). And I'm not talking about how it is implemented (there probably are not many alternatives), I'm saying I don't think it should exist, and I really would see value in killing it. :-) I'm all for killing it. In fact I'd also like to see domain_to_node() go away, as it's similarly bogus (no matter of the proposed changed implementation) - neither a vCPU nor a domain have a focus node or some such (some may happen to if their node mask has just a single set bit, but that's nothing code should depend on). I totally agree. I didn't go as far as suggesting that because, if my grep-ing is not failing, it's still in use in two more places, even with your series applied. But yes, we really should make it possible to remove it too. (And btw, at the very least first_node() in your proposal should become any_node().) Except, there is no such function. But again, I agree, and if we get to the point where we can kill vcpu_to_node() but need to keep domain_to_node(), we can of course implement it. :-) Regards, Dario
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Fri, 2015-02-27 at 14:33 +, Stefano Stabellini wrote: On Thu, 26 Feb 2015, Ian Campbell wrote: On Thu, 2015-02-26 at 15:39 +0530, Manish Jaggi wrote: Have you reached a conclusion? My current thinking on how PCI for Xen on ARM should look is thus: xen/arch/arm/pci.c: New file, containing core PCI infrastructure for ARM. Includes: pci_hostbridge_register(), which registers a host bridge: Registration includes: DT node pointer CFG space address pci_hostbridge_ops function table, which contains e.g. cfg space read/write ops, perhaps other stuff). Function for setting the (segment,bus) for a given host bridge. Lets say pci_hostbridge_setup(), the host bridge must have been previously registered. Looks up the host bridge via CFG space address and maps that to (segment,bus). Functions for looking up host bridges by various keys as needed (cfg base address, DT node, etc) pci_init() function, called from somewhere appropriate in setup.c which calls device_init(node, DEVICE_PCIHOST, NULL) (see gic_init() for the shape of this) Any other common helper functions for managing PCI devices, e.g. for implementing PHYSDEVOP_*, which cannot be made properly common (i.e. shared with x86). xen/drivers/pci/host-*.c (or pci/host/*.c): New files, one per supported PCI controller IP block. Each should use the normal DT_DEVICE infrastructure for probing, i.e.: DT_DEVICE_START(foo, FOO, DEVICE_PCIHOST) Probe function should call pci_hostbridge_register() for each host bridge which the controller exposes. xen/arch/arm/physdev.c: Implements do_physdev_op handling PHYSDEVOP_*. Includes: New hypercall subop PHYSDEVOP_pci_host_bridge_add: As per 1424703761.27930.140.ca...@citrix.com which calls pci_hostbridge_setup() to map the (segment,bus) to a specific pci_hostbridge_ops (i.e. must have previously been registered with pci_hostbridge_register(), else error). I think that the new hypercall is unnecessary. 
We know the MMCFG address ranges belonging to a given host bridge from DT and PHYSDEVOP_pci_mmcfg_reserved gives us segment, start_bus and end_bus for a specific MMCFG. My understanding from discussion with Jan was that this is not what this hypercall does, or at least that this would be an abuse of the existing interface. See: 54e75d87027800062...@mail.emea.novell.com Anyway, what happens for when there is no MMCFG table to drive dom0's calls to pci_mmcfg_reserved? Or a given host-bridge doesn't have special flags and so isn't mentioned there. I think a dedicated hypercall is better. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Pygrub backports
Jan Beulich writes (Re: Pygrub backports): On 27.02.15 at 13:29, ian.jack...@eu.citrix.com wrote: I think the following commits from master should be considered for backport: Looks reasonable. Question is - do you still want this for 4.4.2 or only afterwards? If for it, then can these please go in before RC2 (which really is only pending a push on the branches)? Well, TBH I was kind of surprised that we hadn't queued these as backports anyway. Backporting pygrub improvements is important for compatibility with newer guests. So if you don't mind too much, can we have them in 4.4.2 ? In which case I would push them right away. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/Dom0: account for shadow/HAP allocation
On 27/02/15 13:21, Jan Beulich wrote: On 27.02.15 at 13:02, andrew.coop...@citrix.com wrote: On 26/02/15 07:43, Jan Beulich wrote: On 25.02.15 at 18:06, andrew.coop...@citrix.com wrote: On 25/02/15 14:45, Jan Beulich wrote: +static unsigned long __init dom0_paging_pages(const struct domain *d, + unsigned long nr_pages) +{ +/* Copied from: libxl_get_required_shadow_memory() */ +unsigned long memkb = nr_pages * (PAGE_SIZE / 1024); + +memkb = 4 * (256 * d->max_vcpus + 2 * (memkb / 1024)); I have recently raised a bug against Xapi for similar wrong logic when calculating the size of the shadow pool. A per-vcpu reservation of shadow allocation is only needed if shadow paging is actually in use, and even then should match shadow_min_acceptable_pages() at 128 pages per vcpu. If HAP is in use, the only allocations from the shadow pool are for the EPT/NPT tables (1% of nr_pages), IOMMU tables (another 1% of nr_pages if in use), and the logdirty radix tree (substantially less than 1% of nr_pages). One could argue that structures such as the vmcs/vmcb should have their allocations accounted against the domain, in which case a small per-vcpu component would be appropriate. However as it currently stands, this calculation wastes 4MB of ram per vcpu in shadow allocation which is not going to be used. But you realize that the functional change here explicitly only covers the shadow case - the PVH (i.e. HAP) case is effectively unchanged (merely correcting the mistake of not accounting for what gets actually allocated), and I don't intend any functional change for PVH (other than said bug fix) with this patch. Ok Hence correcting this (i.e. lowering the accounted for as well as the allocated amount) as well as adding accounting for VMCS/VMCB (just like we account for struct vcpu) should be the subject of a separate patch, presumably by someone actively working on PVH (and then perhaps at once for libxc). I also think that this calculation would better become a paging variant specific hook if calculations differ between shadow and HAP. That would be better, in the long run. Taking this together, can I read this as an ack then? Acked-by: Andrew Cooper andrew.coop...@citrix.com Jan
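For reference, the formula quoted from the patch can be worked through in isolation. The sketch below is a standalone user-space illustration, not the Xen function itself: demo_paging_kb(), DEMO_PAGE_SIZE and the parameter names are made up for this example, and a 4 KiB page size is assumed. It reproduces only the two arithmetic lines visible in the quoted hunk.

```c
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define DEMO_PAGE_SIZE 4096UL

/*
 * Mirrors the two arithmetic lines quoted from the patch:
 *   memkb = nr_pages * (PAGE_SIZE / 1024);
 *   memkb = 4 * (256 * max_vcpus + 2 * (memkb / 1024));
 * The result is in KiB: a per-vCPU term plus a term that
 * scales with the amount of guest memory in MiB.
 */
static uint64_t demo_paging_kb(uint64_t nr_pages, unsigned int max_vcpus)
{
    uint64_t memkb = nr_pages * (DEMO_PAGE_SIZE / 1024);

    return 4 * (256 * (uint64_t)max_vcpus + 2 * (memkb / 1024));
}
```

For a domain with 4 vCPUs and 4 GiB of memory (1048576 pages), this gives 4 * (1024 + 8192) = 36864 KiB: 4096 KiB from the per-vCPU term and 32768 KiB from the memory term.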
Re: [Xen-devel] [PATCH 3/4] xen: sched: make counters for vCPU tickling generic
2015-02-27 5:53 GMT-05:00 Dario Faggioli dario.faggi...@citrix.com: On Fri, 2015-02-27 at 00:47 -0500, Meng Xu wrote: 2015-02-26 8:37 GMT-05:00 Dario Faggioli dario.faggi...@citrix.com: and update them from Credit2 and RTDS schedulers. Signed-off-by: Dario Faggioli dario.faggi...@citrix.com Cc: Meng Xu xumengpa...@gmail.com Cc: George Dunlap george.dun...@eu.citrix.com Cc: Jan Beulich jbeul...@suse.com Cc: Keir Fraser k...@xen.org --- xen/common/sched_credit2.c |2 ++ xen/common/sched_rt.c|2 ++ xen/include/xen/perfc_defn.h |4 ++-- 3 files changed, 6 insertions(+), 2 deletions(-) The change for RTDS scheduler looks good to me. Does this count as a Reviewed-by: Meng Xu men...@cis.upenn.edu ? Also, if yes, does it also apply to patch #2 ? That is unclear as sched_rt.c is modified in patches #1, #2 ad #3, while what you did is: - you explicitly provided the tag for patch #1 - you said looks good for this for patch #3 - you said nothing for patch #2 The bottom line of all this being: with Ack-s/Reviewed-by-s, it's always better be pretty explicit! :-D I see. Thank you very much, Dario, for explaining this to me! :-) After you add return before no_tickle:, this patch is good to go, IMHO. So after the this change, Reviewed-by: Meng Xu men...@cis.upenn.edu Thank you very much! Best, Meng -- --- Meng Xu PhD Student in Computer and Information Science University of Pennsylvania ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 08/24] xen/arm: Allow virq != irq
Hi Ian, On 20/02/15 15:52, Ian Campbell wrote: As DOM0 will get most of the devices, the vIRQ is equal to the IRQ in that case. Am I correct that after this patch all callers still pass irq==virq to the new function? Sorry, I forgot to answer this question. Yes, all the callers will pass irq == virq in the DOM0 case. Regards, -- Julien Grall
Re: [Xen-devel] [PATCH] xen/arm: Handle translated addresses for hardware domains in GICv2
On Fri, 2015-02-27 at 13:53 +, Julien Grall wrote: Hi Frediano, On 25/02/15 13:21, Frediano Ziglio wrote: Translated addresses (in d-arch.vgic.{c,d}base) are now bus addresses which could not always be applied to the DT. Copy the original addresses from DT directly to get the original untranslated reg property which will give same d-arch.vgic.{c,d}base values once translated again. Signed-off-by: Frediano Ziglio frediano.zig...@huawei.com --- xen/arch/arm/gic-v2.c | 25 ++--- 1 file changed, 14 insertions(+), 11 deletions(-) Fixed typos in comments. diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c index 31fb81a..a401e3f 100644 --- a/xen/arch/arm/gic-v2.c +++ b/xen/arch/arm/gic-v2.c @@ -590,7 +590,7 @@ static int gicv2_make_dt_node(const struct domain *d, const struct dt_device_node *gic = dt_interrupt_controller; const void *compatible = NULL; u32 len; -__be32 *new_cells, *tmp; +const __be32 *regs; int res = 0; compatible = dt_get_property(gic, compatible, len); @@ -617,18 +617,21 @@ static int gicv2_make_dt_node(const struct domain *d, if ( res ) return res; -len = dt_cells_to_size(dt_n_addr_cells(node) + dt_n_size_cells(node)); -len *= 2; /* GIC has two memory regions: Distributor + CPU interface */ -new_cells = xzalloc_bytes(len); -if ( new_cells == NULL ) -return -FDT_ERR_XEN(ENOMEM); +/* + * DTB provides up to 4 regions to handle virtualization Sorry to ask more change. I'm not sure why you speak about virtualization here. Because two of the regions are GICH and GICV, and those are the ones we are truncating out here. Also, can you write somewhere that the GICC and GICD are the first 2 regions of the reg? Other than that this patch looks good to me: Reviewed-by: Julien Grall julien.gr...@linaro.org Regards, ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 3/5] x86: widen NUMA nodes to be allocated from
On Fri, 2015-02-27 at 13:46 +, Ian Campbell wrote: On Fri, 2015-02-27 at 13:27 +, Dario Faggioli wrote: After this series, vcpu_to_node() (defined in xen/include/xen/numa.h) is left with only one use, in xen/arch/arm/domain.c, besides of course being used to implement domain_to_node() (still in xen/include/xen/numa.h). So, provided ARM people (and I'm Cc-ing them) can get rid of that, Happy to do so if you have advice on what to replace it with, just 0? As Julien says, with the MEMF_no_owner feature Jan is introducing in the series. We don't do NUMA yet on ARM so that would be fine, but eventually we'd want the vcpu stack to be allocated in some sort of sensible location relative to vcpu affinity... Yes, and Jan's MEMF_no_owner, if it works on your arch too, as it seems it could, will provide exactly that. Regards, Dario
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Thu, 26 Feb 2015, Ian Campbell wrote: On Thu, 2015-02-26 at 15:39 +0530, Manish Jaggi wrote: Have you reached a conclusion? My current thinking on how PCI for Xen on ARM should look is thus: xen/arch/arm/pci.c: New file, containing core PCI infrastructure for ARM. Includes: pci_hostbridge_register(), which registers a host bridge: Registration includes: DT node pointer CFG space address pci_hostbridge_ops function table, which contains e.g. cfg space read/write ops, perhaps other stuff). Function for setting the (segment,bus) for a given host bridge. Lets say pci_hostbridge_setup(), the host bridge must have been previously registered. Looks up the host bridge via CFG space address and maps that to (segment,bus). Functions for looking up host bridges by various keys as needed (cfg base address, DT node, etc) pci_init() function, called from somewhere appropriate in setup.c which calls device_init(node, DEVICE_PCIHOST, NULL) (see gic_init() for the shape of this) Any other common helper functions for managing PCI devices, e.g. for implementing PHYSDEVOP_*, which cannot be made properly common (i.e. shared with x86). xen/drivers/pci/host-*.c (or pci/host/*.c): New files, one per supported PCI controller IP block. Each should use the normal DT_DEVICE infrastructure for probing, i.e.: DT_DEVICE_START(foo, FOO, DEVICE_PCIHOST) Probe function should call pci_hostbridge_register() for each host bridge which the controller exposes. xen/arch/arm/physdev.c: Implements do_physdev_op handling PHYSDEVOP_*. Includes: New hypercall subop PHYSDEVOP_pci_host_bridge_add: As per 1424703761.27930.140.ca...@citrix.com which calls pci_hostbridge_setup() to map the (segment,bus) to a specific pci_hostbridge_ops (i.e. must have previously been registered with pci_hostbridge_register(), else error). I think that the new hypercall is unnecessary. 
We know the MMCFG address ranges belonging to a given host bridge from DT and PHYSDEVOP_pci_mmcfg_reserved gives us segment, start_bus and end_bus for a specific MMCFG. We don't need anything else: we can simply match the host bridge based on the MMCFG address that dom0 tells us via PHYSDEVOP_pci_mmcfg_reserved with the addresses on DT. But we do need to support PHYSDEVOP_pci_mmcfg_reserved on ARM. PHYSDEVOP_pci_device_add/remove: Implement existing hypercall interface used by x86 for ARM. This requires that PHYSDEVOP_pci_host_bridge_add has been called for the (segment,bus) which it refers to, otherwise error. Looks up the host bridge and does whatever setup is required plus e.g. calling of pci_add_device(). No doubt various other existing interfaces will need wiring up, e.g. pci_conf_{read,write}* should lookup the host bridge ops struct and call the associated method. I'm sure the above must be incomplete, but I hope the general shape makes sense? I think it makes sense and it is along the lines of what I was thinking too. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
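The register-then-bind flow proposed in this thread can be sketched in a few lines of C. This is only an illustration of the shape of the design under discussion, not Xen code: every name below (demo_hostbridge_register(), demo_hostbridge_setup(), demo_hostbridge_find(), the struct layouts, the fixed-size table) is hypothetical. The sketch is neutral on whether the (segment, bus) information arrives through a new hypercall or through PHYSDEVOP_pci_mmcfg_reserved; in both variants the bridge is looked up by its CFG space address.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_BRIDGES 4

/* Hypothetical per-controller operations table. */
struct demo_hostbridge_ops {
    uint32_t (*cfg_read)(uint64_t cfg_base, uint8_t bus, uint8_t devfn,
                         unsigned int reg);
};

struct demo_hostbridge {
    uint64_t cfg_base;                     /* key known from the device tree */
    const struct demo_hostbridge_ops *ops; /* controller-specific accessors */
    int assigned;                          /* non-zero once (segment, bus) is set */
    uint16_t segment;
    uint8_t bus;
};

static struct demo_hostbridge demo_bridges[DEMO_MAX_BRIDGES];
static unsigned int demo_nr_bridges;

/* Step 1: a controller driver's probe function registers each bridge. */
static int demo_hostbridge_register(uint64_t cfg_base,
                                    const struct demo_hostbridge_ops *ops)
{
    struct demo_hostbridge *hb;

    if (demo_nr_bridges == DEMO_MAX_BRIDGES)
        return -1;

    hb = &demo_bridges[demo_nr_bridges++];
    hb->cfg_base = cfg_base;
    hb->ops = ops;
    hb->assigned = 0;
    return 0;
}

/* Step 2: bind (segment, bus) to a registered bridge; error if unknown. */
static int demo_hostbridge_setup(uint64_t cfg_base, uint16_t segment,
                                 uint8_t bus)
{
    unsigned int i;

    for (i = 0; i < demo_nr_bridges; i++) {
        if (demo_bridges[i].cfg_base == cfg_base) {
            demo_bridges[i].segment = segment;
            demo_bridges[i].bus = bus;
            demo_bridges[i].assigned = 1;
            return 0;
        }
    }
    return -1;
}

/* Step 3: later operations find the bridge (and its ops) by (segment, bus). */
static const struct demo_hostbridge *demo_hostbridge_find(uint16_t segment,
                                                          uint8_t bus)
{
    unsigned int i;

    for (i = 0; i < demo_nr_bridges; i++) {
        if (demo_bridges[i].assigned &&
            demo_bridges[i].segment == segment &&
            demo_bridges[i].bus == bus)
            return &demo_bridges[i];
    }
    return NULL;
}
```

Binding an address that was never registered fails, which corresponds to the "else error" cases in the proposal.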
[Xen-devel] [PATCH v7 3/3] xen/arm: Force dom0 to use normal GICv2 driver on Hip04 platform
Until vGIC support is implemented and tested, this will prevent guest kernels from using their Hip04 driver, or from crashing when they don't have one.

Signed-off-by: Frediano Ziglio frediano.zig...@huawei.com
---
 xen/arch/arm/gic-hip04.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/gic-hip04.c b/xen/arch/arm/gic-hip04.c
index 9977f9b..a7c0892 100644
--- a/xen/arch/arm/gic-hip04.c
+++ b/xen/arch/arm/gic-hip04.c
@@ -614,17 +614,21 @@ static int hip04gic_make_dt_node(const struct domain *d,
                                  const struct dt_device_node *node, void *fdt)
 {
     const struct dt_device_node *gic = dt_interrupt_controller;
-    const void *compatible = NULL;
+    const void *compatible;
     u32 len;
     const __be32 *regs;
     int res = 0;
 
-    compatible = dt_get_property(gic, "compatible", &len);
-    if ( !compatible )
-    {
-        dprintk(XENLOG_ERR, "Can't find compatible property for the gic node\n");
-        return -FDT_ERR_XEN(ENOENT);
-    }
+    /*
+     * Replace compatibility string with a standard one.
+     * dom0 will see a compatible GIC. This as GICC is compatible
+     * with standard one and GICD (emulated by Xen) is compatible
+     * to standard. Otherwise we should implement HIP04 GICD in
+     * the virtual GIC.
+     * This actually limit CPU number to 8 for dom0.
+     */
+    compatible = DT_COMPAT_GIC_CORTEX_A15;
+    len = strlen((char*) compatible) + 1;
 
     res = fdt_begin_node(fdt, "interrupt-controller");
     if ( res )
-- 
1.9.1
Re: [Xen-devel] [PATCH] xen: correct bug in p2m list initialization
On 27/02/15 14:45, Juergen Gross wrote: Commit 054954eb051f35e74b75a566a96fe756015352c8 (xen: switch to linear virtual mapped sparse p2m list) introduced an error. During initialization of the p2m list a p2m identity area mapped by a complete identity pmd entry has to be split up into smaller chunks sometimes, if a non-identity pfn is introduced in this area. If this non-identity pfn is not at index 0 of a p2m page the new p2m page needed is initialized with wrong identity entries, as the identity pfns don't start with the value corresponding to index 0, but with the initial non-identity pfn. This results in weird wrong mappings. Correct the wrong initialization by starting with the correct pfn. Applied to stable/for-linus-4.0, thanks. David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
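The failure mode described in the commit message can be illustrated with a small standalone sketch. This is not the kernel code: DEMO_P2M_PER_PAGE, DEMO_IDENTITY_BIT and demo_fill_identity_page() are hypothetical names chosen for the example, and the marker bit is arbitrary. The point is only that the fill must start from the pfn that corresponds to index 0 of the page, not from the non-identity pfn that triggered the split.

```c
#include <stdint.h>

/* Assumed values for illustration only. */
#define DEMO_P2M_PER_PAGE 512u
#define DEMO_IDENTITY_BIT (UINT64_C(1) << 62)

/* Encode "this pfn is identity mapped" as the pfn plus a marker bit. */
static uint64_t demo_identity(uint64_t pfn)
{
    return pfn | DEMO_IDENTITY_BIT;
}

/*
 * Fill one p2m page with identity entries. 'pfn' may be any pfn covered
 * by the page; it is rounded down so that entry 0 gets the pfn that
 * really belongs at index 0. Starting the fill at 'pfn' itself would
 * shift every entry whenever pfn is not page-aligned in the p2m.
 */
static void demo_fill_identity_page(uint64_t *page, uint64_t pfn)
{
    uint64_t base = pfn & ~(uint64_t)(DEMO_P2M_PER_PAGE - 1);
    unsigned int i;

    for (i = 0; i < DEMO_P2M_PER_PAGE; i++)
        page[i] = demo_identity(base + i);
}
```

With pfn 515, the page covers pfns 512 to 1023, so entry 3 (not entry 0) must hold the identity value for 515.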
Re: [Xen-devel] Pygrub backports
Ian Campbell writes (Re: Pygrub backports): Sounds good. If we could also get an example of the problematic grub.cfg to be checked into xen.git/tools/pygrub/examples that would be handy too. I have asked the reporter for a (suitably-laundered) copy and some info about how it was generated. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 08/24] xen/arm: Allow virq != irq
On Fri, 2015-02-27 at 14:33 +, Julien Grall wrote: Hi Ian, On 20/02/15 17:09, Julien Grall wrote: On 20/02/15 15:52, Ian Campbell wrote: action = xmalloc(struct irqaction); -if (!action) +if ( !action ) +return -ENOMEM; + +info = xmalloc(struct irq_guest); FWIW you might (subject to sizing/alignment needs) be able to do action = _xmalloc(sizeof(struct irqaction) + sizeof(struct irq_guest)); info = (struct irq_guest *)(action + 1); which would save some memory overhead for free pointers etc and allow you to avoid manually managing the info. You probably won't like that though, so feel free to ignore. Actually it's a good idea :). I hadn't thought about it. I thought about it. The pointer to irq_guest may not be correctly aligned with this solution, right? It depends on sizeof(struct irqaction) (which is what I meant by subject to...). It'd probably need a ROUNDUP(sizeof(foo), pointer-alignment) in there somewhere. So I prefer to keep the allocations separate. We can revisit it later. OK.
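The single-allocation idea and the alignment concern raised above can be sketched as follows. This is a user-space illustration using calloc(), not Xen's allocator, and the struct definitions are hypothetical stand-ins rather than Xen's real struct irqaction and struct irq_guest. The offset of the second object is rounded up to its alignment, which is the ROUNDUP step mentioned in the discussion. C11 is assumed for alignof.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not Xen's real definitions. */
struct demo_irqaction {
    void (*handler)(int irq, void *dev_id);
    const char *name;
    void *dev_id;
    char free_on_release;
};

struct demo_irq_guest {
    void *domain;
    unsigned int virq;
};

/* Round x up to the next multiple of a. */
#define DEMO_ROUNDUP(x, a) ((((x) + (a) - 1) / (a)) * (a))

/* Offset of the trailing irq_guest within the combined allocation. */
static size_t demo_info_offset(void)
{
    return DEMO_ROUNDUP(sizeof(struct demo_irqaction),
                        alignof(struct demo_irq_guest));
}

/*
 * Allocate both objects in one block. The caller frees only the returned
 * action pointer; *info points into the same block and must not be freed
 * separately.
 */
static struct demo_irqaction *demo_alloc_action(struct demo_irq_guest **info)
{
    size_t off = demo_info_offset();
    unsigned char *p = calloc(1, off + sizeof(struct demo_irq_guest));

    if (p == NULL)
        return NULL;

    *info = (struct demo_irq_guest *)(p + off);
    return (struct demo_irqaction *)p;
}
```

The trade-off is the one noted in the thread: one allocation and one free instead of two, at the cost of an explicit alignment calculation.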
Re: [Xen-devel] [PATCH 3/5] x86: widen NUMA nodes to be allocated from
On Fri, 2015-02-27 at 13:38 +, Julien Grall wrote: Signed-off-by: Jan Beulich jbeul...@suse.com Reviewed-by: Dario Faggioli dario.faggi...@citrix.com One question (a genuine one, i.e., I'm really not sure what I'm saying is correct). After this series, vcpu_to_node() (defined in xen/include/xen/numa.h) is left with only one use, in xen/arch/arm/domain.c, besides of course being used to implement domain_to_node() (still in xen/include/xen/numa.h). So, provided ARM people (and I'm Cc-ing them) can get rid of that, can that macro be removed altogether, and domain_to_node(d) be defined after d->node_affinity... something like: Given the changes made by Jan on x86, I think we could replace vcpu_to_node by MEMF_no_owner. I expected this to be the case. Happy to hear it is! :-) FWIW, we don't have any NUMA support on ARM currently. I know. Thanks and Regards, Dario
Re: [Xen-devel] RFC: xen config changes v4
This is not meant to be a performance improvement. It is meant to enable a standard distro kernel configured without PARAVIRT to be able to run as a HVM guest using the pv-drivers. This is not a convincing explanation. Debian, Ubuntu and Fedora seems to be able to cope with it just fine. No they are not. The 32-bit Fedora Core 21 LiveISO is non-PAE. I think the situation was the same with Ubuntu. Why do you want to do that, even though it will cause a performance regression and a maintenance pain? You haven't provided a reason yet.
Re: [Xen-devel] RFC: xen config changes v4
On Fri, Feb 27, 2015 at 09:53:46AM -0800, Luis R. Rodriguez wrote: On Fri, Feb 27, 2015 at 6:30 AM, Juergen Gross jgr...@suse.com wrote: On 02/27/2015 02:38 PM, Stefano Stabellini wrote: On Fri, 27 Feb 2015, Juergen Gross wrote: On 02/27/2015 01:24 PM, Stefano Stabellini wrote: On Fri, 27 Feb 2015, Juergen Gross wrote: On 02/27/2015 11:11 AM, Stefano Stabellini wrote: On Fri, 27 Feb 2015, Juergen Gross wrote: On 02/27/2015 10:41 AM, Stefano Stabellini wrote: On Fri, 27 Feb 2015, Juergen Gross wrote: On 02/26/2015 06:42 PM, Stefano Stabellini wrote: On Thu, 26 Feb 2015, Luis R. Rodriguez wrote: On Thu, Feb 26, 2015 at 11:08:20AM +, Stefano Stabellini wrote: On Thu, 26 Feb 2015, David Vrabel wrote: On 26/02/15 04:59, Juergen Gross wrote: So we are again in the situation that pv-drivers always imply the pvops kernel (PARAVIRT selected). I started the whole Kconfig rework to eliminate this dependency. Yes. Can you produce a series that just addresses this one issue. In the absence of any concrete requirement for this big Kconfig reorg I I don't think it is helpful. I clearly missed some context as I didn't realize that this was the intended goal. Why do we want this? Please explain as it won't come for free. We have a few PV interfaces for HVM guests that need PARAVIRT in Linux in order to be used, for example pv_time_ops and HVMOP_pagetable_dying. They are critical performance improvements and from the interface perspective, small enough that doesn't make much sense having a separate KConfig option for them. In order to reach the goal above we necessarily need to introduce a differentiation in terms of PV on HVM guests in Linux: 1) basic guests with PV network, disk, etc but no PV timers, no HVMOP_pagetable_dying, no PV IPIs 2) full PV on HVM guests that have PV network, disk, timers, HVMOP_pagetable_dying, PV IPIs and anything else that makes sense. 
2) is much faster than 1) on Xen and 2) is only a tiny bit slower than 1) on native x86 Also don't we shove 2) down hvm guests right now? Even when everything is built in I do not see how we opt out for HVM for 1) at run time right now. If this is true then the question of motivation for this becomes even stronger I think. Yes, indeed there is no way to do 1) at the moment. And for good reasons, see above. Hmm, after checking the code I'm not convinced: - HVMOP_pagetable_dying is obsolete on modern hardware supporting EPT/HAP That might be true, but what about older hardware? Even on modern hardware a few workloads still run faster on shadow. But if HVMOP_pagetable_dying is the only reason to keep PARAVIRT for HVM guests, then I agree with you that we should remove it. - PV IPIs are not needed on single-vcpu guests - PARAVIRT_CLOCK doesn't need PARAVIRT (in fact the SUSEs kernel configs for all x86_64 kernels have CONFIG_PARAVIRT_CLOCK=y) So I think we really should enable building Xen frontends without PARAVIRT, implying at least no XEN_PV and no XEN_PVH. I'll have a try setting up patches. If we are doing this as a performance improvement, I would like to see a couple of benchmarks (kernbench, hackbench) to show that on a single-vcpu guest and multi-vcpu guest (let's say 4 vcpus) disabling PARAVIRT leads to better performance on Xen on EPT hardware. This is not meant to be a performance improvement. It is meant to enable a standard distro kernel configured without PARAVIRT to be able to run as a HVM guest using the pv-drivers. This is not a convincing explanation. Debian, Ubuntu and Fedora seems to be able to cope with it just fine. Why do you want to do that, even though it will cause a performance regression and a maintenance pain? You haven't provided a reason yet. Either we are talking about different things, or I really don't understand your problem here. I don't want to disable something. 
I just want to enable kernels without PARAVIRT to run under Xen better than today. Being it 32 bit non-PAE kernels as Ian pointed out or distro kernels like e.g. SLES and probably RHEL. Using PV frontends is completely orthogonal to other PV enhancements like PARAVIRT_CLOCK, HVMOP_pagetable_dying or PV IPIs. So why do you object enabling the PV frontends for those kernels? I am for it. I would like to avoid two user visible XEN enablement options (XEN_FRONTEND vs. XEN_PVHVM) for x86_64 and PAE HVM guests to avoid configurations with just XEN_FRONTEND, that can be considered a performance regression compared to what we have now (on x86_64 and PAE). Would you be okay with making this an expert configuration alternative for PAE/x86_64? This would enable the possibility to use PV drivers for
[Xen-devel] [PATCH v5] tools/xenconsoled: Increase file descriptor limit
XenServer's VM density testing uncovered a regression when moving from sysvinit to systemd where the file descriptor limit dropped from 4096 to 1024. (XenServer had previously inserted a ulimit statement into its initscripts.)

One solution is to use LimitNOFILE=4096 in xenconsoled.service to match the lost ulimit, but that is only a stopgap solution.

As Xenconsoled genuinely needs a large number of file descriptors if a large number of domains are running, attempt to increase the limit.

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com

---
v5:
 * Drop system maximum checking
 * Unify set paths

v4:
 * Calculate fd limit based on domid ABI - result is 132008 fds
 * Warn if sufficient fds are not available.

v3:
 * Hide Linux specific bits in #ifdef __linux__

v2:
 * Always increase soft limit to hard limit
 * Correct comment regarding number of file descriptors
 * long -> unsigned long as that appears to be the underlying type of an rlim_t
---
 tools/console/daemon/main.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/tools/console/daemon/main.c b/tools/console/daemon/main.c
index 92d2fc4..6e84f5a 100644
--- a/tools/console/daemon/main.c
+++ b/tools/console/daemon/main.c
@@ -26,6 +26,7 @@
 #include <string.h>
 #include <signal.h>
 #include <sys/types.h>
+#include <sys/resource.h>
 
 #include "xenctrl.h"
 
@@ -55,6 +56,39 @@ static void version(char *name)
 	printf("Xen Console Daemon 3.0\n");
 }
 
+static void increase_fd_limit(void)
+{
+	/*
+	 * We require many file descriptors:
+	 * - per domain: pty master, pty slave, logfile and evtchn
+	 * - misc extra: hypervisor log, privcmd, gntdev, std...
+	 *
+	 * Allow a generous 1000 for misc, and calculate the maximum possible
+	 * number of fds which could be used.
+	 */
+	unsigned min_fds = (DOMID_FIRST_RESERVED * 4) + 1000;
+	struct rlimit lim, new = { min_fds, min_fds };
+
+	if (getrlimit(RLIMIT_NOFILE, &lim) < 0) {
+		fprintf(stderr, "Failed to obtain fd limit: %s\n",
+			strerror(errno));
+		exit(1);
+	}
+
+	/* Do we already have sufficient? Great! */
+	if (lim.rlim_cur >= min_fds)
+		return;
+
+	/* Try to increase our limit. */
+	if (setrlimit(RLIMIT_NOFILE, &new) < 0)
+		syslog(LOG_WARNING,
+		       "Unable to increase fd limit from {%lu, %lu} to "
+		       "{%lu, %lu}: (%s) - May run out with lots of domains",
+		       lim.rlim_cur, lim.rlim_max,
+		       new.rlim_cur, new.rlim_max,
+		       strerror(errno));
+}
+
 int main(int argc, char **argv)
 {
 	const char *sopts = "hVvit:o:";
@@ -154,6 +188,8 @@ int main(int argc, char **argv)
 	openlog("xenconsoled", syslog_option, LOG_DAEMON);
 	setlogmask(syslog_mask);
 
+	increase_fd_limit();
+
 	if (!is_interactive) {
 		daemonize(pidfile ? pidfile : "/var/run/xenconsoled.pid");
 	}
-- 
1.7.10.4
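The 132008 figure mentioned in the v4 notes follows directly from the formula in the patch. A minimal worked example, assuming DOMID_FIRST_RESERVED is 0x7FF0 as in Xen's public headers (the value is redefined locally under a different name so the snippet stands alone, and demo_min_fds() is an illustrative name, not part of the patch):

```c
/* Assumed to match Xen's public xen.h; redefined so this compiles standalone. */
#define DEMO_DOMID_FIRST_RESERVED 0x7FF0u

/* Four fds per possible domain, plus 1000 for miscellaneous use. */
static unsigned int demo_min_fds(void)
{
    return (DEMO_DOMID_FIRST_RESERVED * 4) + 1000;
}
```

Step by step: 0x7FF0 is 32752 in decimal, 32752 * 4 = 131008, and 131008 + 1000 = 132008.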
Re: [Xen-devel] [PATCH v5] tools/xenconsoled: Increase file descriptor limit
Andrew Cooper writes ([PATCH v5] tools/xenconsoled: Increase file descriptor limit): XenServer's VM density testing uncovered a regression when moving from sysvinit to systemd where the file descriptor limit dropped from 4096 to 1024. (XenServer had previously inserted a ulimit statement into its initscripts.) ... Thanks, and sorry to be pernickety. Acked-by: Ian Jackson ian.jack...@eu.citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Regression, host crash with 4.5rc1
(Please forgive my lack of Xen-fu knowledge in advance) If this issue were to happen on Linux/bare-metal, this is how I'd debug it. Hopefully some of this will translate to Xen in one way or another. dmesg | grep idle will tell us what idle driver is running (on Dom0 kernel) and if it is intel_idle, it will also tell us the supported sub-states (CPUID.MWAIT.EDX value) grep . /sys/devices/system/cpu/cpu0/cpuidle/*/* will tell us what states the OS is requesting, It will expand on the FFH bit here: (XEN) C1: type[C1] latency[003] usage[12219860] method[ FFH] duration[1190961948551] (XEN) C2: type[C1] latency[010] usage[10205554] method[ FFH] duration[2015393965907] (XEN) C3: type[C2] latency[020] usage[50926286] method[ FFH] duration[30527997858148] I'm hopeful that this information comes from the hardware's BIOS and not some hypervisor tricking out Dom0 with a fake BIOS, yes? If Xen doesn't have cpuidle, or its sysfs, then acpidump for the platform should be able to tell us what the platform is exporting. Next, hopefully the attached turbostat utility can be invoked on Dom0 and it can read the MSRs on at least 1 processor via the /dev/cpu interface. This will tell you what the hardware supports, and what HW states are actually being invoked. (which may be different from what the OS asks for...) It may tell us just the same thing I think we learned here: (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0] (XEN) CC3[28794734145697] CC6[0] CC7[0] which I'm assuming are a dump of the MSR residency counters. If yes, it appears to be that this platform is not invoking c6 and pc6 at all, and that the deepest state being used is actually cc3 and pc3. I don't know if that is because you've booted the kernel with max_cstate=N of some kind, or if this is default. attached is turbostat, source and binary, run it this way and send the ts.out file: # ./turbostat --debug sleep 5 ts.out 21 Guessing... 
If no surprises in the debug stuff requested above, and If the XEN debug stuff above is with c6 explicitly disabled... Note that here are two kinds of c6 -- CC6 (core) and PC6 (package). If this box supports both, the next thing to try will be to keep CC6 enabled, but to just disable PC6. This is done via an MSR that turbostat dumps out (MSR_NHM_SNB_PKG_CST_CFG_CTL) via the wrmsr(8) utility. Though if that MSR is locked by the BIOS, then BIOS SETUP option may be the only way to disable the package C-state limit without also disabling the associated core C-state. cheers, -Len ps. turbostat-test.tar.gz Description: turbostat-test.tar.gz ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] freemem-slack and large memory environments
On Friday, February 27, 2015 08:28:49 AM Mike Latimer wrote:
> On Friday, February 27, 2015 10:52:17 AM Stefano Stabellini wrote:
> > On Thu, 26 Feb 2015, Mike Latimer wrote:
> > > libxl_set_memory_target = 1
> >
> > The new memory target is set for dom0 successfully.
> >
> > > libxl_wait_for_free_memory = -5
> >
> > Still there isn't enough free memory in the system.
> >
> > > libxl_wait_for_memory_target = 0
> >
> > However dom0 reached the new memory target already. Who is stealing your memory?
>
> I just realized I was missing commit 2048aeec, which corrects the hardcoded return value of libxl_wait_for_memory_target from 0 to rc. I'll retest with this change in place.
>
> > In any case in the context of libxl_wait_for_memory_target, ERROR_FAIL means that the memory target has not been reached.
>
> I'm expecting this commit to change what I'm seeing, but I'm not convinced it will be a good change... There is zero chance dom0 will balloon down 64GB (or 512GB) in the 10 second window set by freemem. This will likely mean the entire process will fail (when given a bit more time it would have succeeded).
>
> I'll add the missing commit, and send a complete set of debug logs later today.

After adding 2048aeec, dom0's target is lowered by the required amount (e.g. 64GB), but as dom0 cannot balloon down fast enough, libxl_wait_for_memory_target returns -5, and the domain create fails ("failed to free memory for the domain").

As dom0's target was lowered successfully, dom0 continues to balloon down in the background. So, after waiting a while, the domain creation will succeed.

This is one of the problems I would like to solve. As the ballooning is working (just taking longer than expected), the code should monitor it and wait somehow.

I'll send in detailed logs (without 2048aeec) later today, to make sure I've explained this well enough.

-Mike
Re: [Xen-devel] [xen-unstable test] 35257: regressions - FAIL
Ian Campbell wrote:
> On Thu, 2015-02-26 at 20:14 +0000, xen.org wrote:
> > flight 35257 xen-unstable real [real]
> > http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/
> >
> > Regressions :-(
> >
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> >  test-armhf-armhf-libvirt 12 guest-start.2 fail REGR. vs. 34629
>
> logs:
> http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/test-armhf-armhf-libvirt/info.html
> http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/test-armhf-armhf-libvirt/12.ts-guest-start.log
>
> 2015-02-23 20:21:48 Z executing ssh ... root@10.80.229.106 virsh domxml-from-native xen-xl /etc/xen/debian.guest.osstest.cfg /etc/xen/debian.guest.osstest.cfg.xml
> error: failed to connect to the hypervisor
> error: no valid connection
> error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
>
> http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/test-armhf-armhf-libvirt/marilith-n4-output-ps_wwwaxf_-eo_pid%2Ctty%2Cstat%2Ctime%2Cnice%2Cpsr%2Cpcpu%2Cpmem%2Cnwchan%2Cwchan%2325%2Cargs
> appears to show no libvirtd process.
>
> http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/test-armhf-armhf-libvirt/marilith-n4---var-log-libvirt-libvirtd.log
> says:
> 2015-02-23 20:13:15.556+0000: 2133: info : libvirt version: 1.2.13
> 2015-02-23 20:13:15.556+0000: 2133: error : dnsmasqCapsRefreshInternal:726 : Cannot check dnsmasq binary dnsmasq: No such file or directory
> 2015-02-23 20:13:15.845+0000: 2133: error : virFirewallValidateBackend:193 : direct firewall backend requested, but /sbin/ebtables is not available: No such file or directory

Odd, since ebtables was found when building:

  checking for ebtables... /sbin/ebtables

But AFAICT, that won't prevent libvirtd from starting.

> I think these are just spurious.
>
> 2015-02-23 20:13:15.845+0000: 2133: error : virFirewallApply:936 : out of memory
> 2015-02-23 20:13:16.092+0000: 2133: error : virExec:491 : Cannot find 'pm-is-supported' in path: No such file or directory
> 2015-02-23 20:13:16.092+0000: 2133: warning : virQEMUCapsInit:999 : Failed to get host power management capabilities
>
> As are these two.
>
> 2015-02-23 20:13:16.400+0000: 2133: error : virFirewallApply:936 : out of memory
>
> Have these OOM messages resulted in libvirtd exiting?

No, I don't think so. The related code is

  int
  virFirewallApply(virFirewallPtr firewall)
  {
      size_t i, j;
      int ret = -1;

      virMutexLock(&ruleLock);

      if (!firewall || firewall->err == ENOMEM) {
          virReportOOMError();
          goto cleanup;
      ...
  }

I suspect 'firewall' is null, so an OOM error is reported and the function returns -1. But I also don't see this preventing libvirtd from starting. I've cc'd the libvirt list for verification that these errors won't prevent libvirtd from starting.

> I don't see any evidence of a crash elsewhere in the logs (i.e. no process segfaulted in dmesg, no OOM killing going on etc). We don't seem to collect dom0 freemem info, but that most likely wouldn't help given the libvirtd process has exited.
>
> Any ideas where to look next?

Can you access the test environment and try starting libvirtd in the foreground? Or enable debug log level in /etc/libvirt/libvirtd.conf?

Regards,
Jim
Re: [Xen-devel] [PATCH 2/4] xen: sched: make counters for vCPU sleep and wakeup generic
[I see the reason why I neglected this patch: my Gmail just filtered it into the Forum category and I didn't see it. :-) Dario, do you have any suggestion for an email client (maybe the one you guys are using)?]

2015-02-26 8:37 GMT-05:00 Dario Faggioli dario.faggi...@citrix.com:
> and update them from Credit2 and RTDS.
>
> In Credit2, while there, remove some stale comments too.
>
> Signed-off-by: Dario Faggioli dario.faggi...@citrix.com
> Cc: George Dunlap george.dun...@eu.citrix.com
> Cc: Jan Beulich jbeul...@suse.com
> Cc: Keir Fraser k...@xen.org
> ---
>  xen/common/sched_credit2.c   | 12
>  xen/common/sched_rt.c        | 12
>  xen/include/xen/perfc_defn.h | 10 +-
>  3 files changed, 25 insertions(+), 9 deletions(-)
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index ad0a5d4..2b852cc 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -931,6 +931,7 @@ csched2_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
>      struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
>
>      BUG_ON( is_idle_vcpu(vc) );
> +    SCHED_STAT_CRANK(vcpu_sleep);
>
>      if ( per_cpu(schedule_data, vc->processor).curr == vc )
>          cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
> @@ -956,19 +957,22 @@ csched2_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
>      BUG_ON( is_idle_vcpu(vc) );
>
> -    /* Make sure svc priority mod happens before runq check */
>      if ( unlikely(per_cpu(schedule_data, vc->processor).curr == vc) )
>      {
> +        SCHED_STAT_CRANK(vcpu_wake_running);
>          goto out;
>      }
>
> -
>      if ( unlikely(__vcpu_on_runq(svc)) )
>      {
> -        /* If we've boosted someone that's already on a runqueue, prioritize
> -         * it and inform the cpu in question. */
> +        SCHED_STAT_CRANK(vcpu_wake_onrunq);
>          goto out;
>      }
>
> +    if ( likely(vcpu_runnable(vc)) )
> +        SCHED_STAT_CRANK(vcpu_wake_runnable);
> +    else
> +        SCHED_STAT_CRANK(vcpu_wake_not_runnable);
> +
>      /* If the context hasn't been saved for this vcpu yet, we can't put it on
>       * another runqueue. Instead, we set a flag so that it will be put on the runqueue
>       * after the context has been saved. */
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 58dd646..49d1b83 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -851,6 +851,7 @@ rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
>      struct rt_vcpu * const svc = rt_vcpu(vc);
>
>      BUG_ON( is_idle_vcpu(vc) );
> +    SCHED_STAT_CRANK(vcpu_sleep);
>
>      if ( curr_on_cpu(vc->processor) == vc )
>          cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
> @@ -966,11 +967,22 @@ rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
>      BUG_ON( is_idle_vcpu(vc) );
>
>      if ( unlikely(curr_on_cpu(vc->processor) == vc) )
> +    {
> +        SCHED_STAT_CRANK(vcpu_wake_running);
>          return;
> +    }
>
>      /* on RunQ/DepletedQ, just update info is ok */
>      if ( unlikely(__vcpu_on_q(svc)) )
> +    {
> +        SCHED_STAT_CRANK(vcpu_wake_onrunq);
>          return;
> +    }
> +
> +    if ( likely(vcpu_runnable(vc)) )
> +        SCHED_STAT_CRANK(vcpu_wake_runnable);
> +    else
> +        SCHED_STAT_CRANK(vcpu_wake_not_runnable);
>
>      /* If context hasn't been saved for this vcpu yet, we can't put it on
>       * the Runqueue/DepletedQ. Instead, we set a flag so that it will be
> diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h
> index 3ac7b45..2dc78fe 100644
> --- a/xen/include/xen/perfc_defn.h
> +++ b/xen/include/xen/perfc_defn.h
> @@ -21,6 +21,11 @@ PERFCOUNTER(dom_init, "sched: dom_init")
>  PERFCOUNTER(dom_destroy,            "sched: dom_destroy")
>  PERFCOUNTER(vcpu_init,              "sched: vcpu_init")
>  PERFCOUNTER(vcpu_destroy,           "sched: vcpu_destroy")
> +PERFCOUNTER(vcpu_sleep,             "sched: vcpu_sleep")
> +PERFCOUNTER(vcpu_wake_running,      "sched: vcpu_wake_running")
> +PERFCOUNTER(vcpu_wake_onrunq,       "sched: vcpu_wake_onrunq")
> +PERFCOUNTER(vcpu_wake_runnable,     "sched: vcpu_wake_runnable")
> +PERFCOUNTER(vcpu_wake_not_runnable, "sched: vcpu_wake_not_runnable")
>
>  /* credit specific counters */
>  PERFCOUNTER(delay_ms,               "csched: delay")
> @@ -32,11 +37,6 @@ PERFCOUNTER(acct_reorder, "csched: acct_reorder")
>  PERFCOUNTER(acct_min_credit,        "csched: acct_min_credit")
>  PERFCOUNTER(acct_vcpu_active,       "csched: acct_vcpu_active")
>  PERFCOUNTER(acct_vcpu_idle,         "csched: acct_vcpu_idle")
> -PERFCOUNTER(vcpu_sleep,             "csched: vcpu_sleep")
> -PERFCOUNTER(vcpu_wake_running,      "csched: vcpu_wake_running")
> -PERFCOUNTER(vcpu_wake_onrunq,       "csched: vcpu_wake_onrunq")
> -PERFCOUNTER(vcpu_wake_runnable,     "csched: vcpu_wake_runnable")
> -PERFCOUNTER(vcpu_wake_not_runnable, "csched: vcpu_wake_not_runnable")
>  PERFCOUNTER(vcpu_park,              "csched: vcpu_park")
>  PERFCOUNTER(vcpu_unpark,            "csched: vcpu_unpark")
Re: [Xen-devel] [PATCH v3 0/8] Split off mini-os to a separate tree
On Fri, 2015-02-27 at 14:58 +, Wei Liu wrote: On Fri, Feb 27, 2015 at 02:46:58PM +, Ian Campbell wrote: On Fri, 2015-02-27 at 13:50 +, Wei Liu wrote: On Fri, Feb 27, 2015 at 01:38:58PM +, Ian Campbell wrote: On Wed, 2015-02-25 at 11:21 +, Wei Liu wrote: git://xenbits.xen.org/people/liuw/xen.git wip.build-system-v3 I think the series is now fully acked. Please could you rebase -i and add the acks and push this as v4 without changing the base commit, i.e. not pulling it up to current master or staging, leave it at cb34a7c8d741aa447d79e1b01d71168a4088a4d7. Not rebasing means you do not need to retest etc and I can just git pull the result. git://xenbits.xen.org/people/liuw/xen.git wip.build-system-v4 Thanks. I'm going to commit this after my current test run with the OVMF update completes. Please can you confirm the precise changeset ID you expect me to find at git://xenbits.xen.org/people/liuw/mini-os.git master f5d9868796e91bee70601805b9bfc1bb544b0586 Thanks. and to push to git://xenbits.xen.org/people/mini-os.git master as part ^^ You don't need people I think? Correct, I removed one too few path elements. It's git://xenbits.xen.org/mini-os.git. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v6 23/23] xl: vNUMA support
On Thu, 2015-02-26 at 15:56 +, Wei Liu wrote: This patch includes configuration options parser and documentation. Please find the hunk to xl.cfg.pod.5 for more information. Signed-off-by: Wei Liu wei.l...@citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com --- Changes in v6: 1. Disable NUMA auto-placement. --- Reviewed-and-Tested-by: Dario Faggioli dario.faggi...@citrix.com Regards, Dario signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4] tools/xenconsoled: Increase file descriptor limit
Andrew Cooper writes ("[PATCH v4] tools/xenconsoled: Increase file descriptor limit"):
> XenServer's VM density testing uncovered a regression when moving from sysvinit to systemd where the file descriptor limit dropped from 4096 to 1024. (XenServer had previously inserted a ulimit statement into its initscripts.)
>
> One solution is to use LimitNOFILE=4096 in xenconsoled.service to match the lost ulimit, but that is only a stopgap solution.
>
> As Xenconsoled genuinely needs a large number of file descriptors if a large number of domains are running, attempt to increase the limit.
...

There's still a lot of code here I think we can do without. Why do we care about the system maximum?

> +	/*
> +	 * Will min_fds fit within our current hard limit?
> +	 * (likely on *BSD, unlikely on Linux)
> +	 * If so, raise our soft limit.
> +	 */
> +	if (min_fds <= lim.rlim_max) {
> +		struct rlimit new = {
> +			.rlim_cur = min_fds,
> +			.rlim_max = lim.rlim_max,
> +		};
> +
> +		if (setrlimit(RLIMIT_NOFILE, &new) < 0)
> +			syslog(LOG_WARNING,
> +			       "Unable to increase fd soft limit: %lu -> %u, "
> +			       "hard %lu (%s) - May run out with lots of domains",
> +			       lim.rlim_cur, min_fds, lim.rlim_max,
> +			       strerror(errno));
> +	} else {
> +		/*
> +		 * Lets hope that, as a root process, we have sufficient
> +		 * privilege to up the hard limit.
> +		 */
> +		struct rlimit new = { .rlim_cur = min_fds, .rlim_max = min_fds };
> +
> +		if (setrlimit(RLIMIT_NOFILE, &new) < 0)
> +			syslog(LOG_WARNING,
> +			       "Unable to increase fd hard limit: %lu -> %u (%s)"
> +			       " - May run out with lots of domains",
> +			       lim.rlim_max, min_fds, strerror(errno));
> +	}

This is very repetitive. The only difference between the two branches is (a) the value of .rlim_max and (b) the log message. (b) can be dealt with by making the log message depend only on the contents of new.

Ian.
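One way to remove the repetition pointed out above is to compute the desired limits first and then make a single setrlimit() call, with one log message that depends only on the new values. The following is a sketch in C under that reading of the review, not the patch that was eventually committed; both function names are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/resource.h>
#include <syslog.h>

/*
 * Compute the limits we want: raise the soft limit to at least min_fds,
 * and raise the hard limit only if it is too small to allow that.
 */
static struct rlimit wanted_fd_limit(struct rlimit cur, rlim_t min_fds)
{
    struct rlimit new = cur;

    if (new.rlim_max != RLIM_INFINITY && new.rlim_max < min_fds)
        new.rlim_max = min_fds;      /* needs privilege to succeed */
    if (new.rlim_cur != RLIM_INFINITY && new.rlim_cur < min_fds)
        new.rlim_cur = min_fds;
    return new;
}

/* Apply the computed limits; returns 0 on success, -1 on failure. */
static int raise_fd_limit(rlim_t min_fds)
{
    struct rlimit lim, new;

    assert(min_fds > 0);
    if (getrlimit(RLIMIT_NOFILE, &lim) < 0)
        return -1;

    new = wanted_fd_limit(lim, min_fds);
    if (new.rlim_cur == lim.rlim_cur && new.rlim_max == lim.rlim_max)
        return 0;                    /* already sufficient */

    if (setrlimit(RLIMIT_NOFILE, &new) < 0) {
        /* One message for both cases, built only from 'new'. */
        syslog(LOG_WARNING,
               "Unable to set fd limit to soft %lu, hard %lu (%s)"
               " - May run out with lots of domains",
               (unsigned long)new.rlim_cur, (unsigned long)new.rlim_max,
               strerror(errno));
        return -1;
    }
    return 0;
}
```

Keeping the computation in a pure helper also makes the policy easy to check without touching the process limits.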
Re: [Xen-devel] [PATCH v3 0/8] Split off mini-os to a separate tree
On Fri, Feb 27, 2015 at 02:46:58PM +, Ian Campbell wrote: On Fri, 2015-02-27 at 13:50 +, Wei Liu wrote: On Fri, Feb 27, 2015 at 01:38:58PM +, Ian Campbell wrote: On Wed, 2015-02-25 at 11:21 +, Wei Liu wrote: git://xenbits.xen.org/people/liuw/xen.git wip.build-system-v3 I think the series is now fully acked. Please could you rebase -i and add the acks and push this as v4 without changing the base commit, i.e. not pulling it up to current master or staging, leave it at cb34a7c8d741aa447d79e1b01d71168a4088a4d7. Not rebasing means you do not need to retest etc and I can just git pull the result. git://xenbits.xen.org/people/liuw/xen.git wip.build-system-v4 Thanks. I'm going to commit this after my current test run with the OVMF update completes. Please can you confirm the precise changeset ID you expect me to find at git://xenbits.xen.org/people/liuw/mini-os.git master f5d9868796e91bee70601805b9bfc1bb544b0586 and to push to git://xenbits.xen.org/people/mini-os.git master as part ^^ You don't need people I think? of this. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] xen: credit2: use curr_on_cpu(cpu) in place of `per_cpu(s, c).curr'
as 0bba5747f4bee4ddd ("xen: sched_credit: define and use curr_on_cpu(cpu)") did for Credit1, hence making the code more consistent and easier to read.

Signed-off-by: Dario Faggioli dario.faggi...@citrix.com
Cc: George Dunlap george.dun...@eu.citrix.com
Cc: Jan Beulich jbeul...@suse.com
Cc: Keir Fraser k...@xen.org
---
 xen/common/sched_credit2.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ad0a5d4..f0e2c82 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -493,7 +493,7 @@ runq_tickle(const struct scheduler *ops, unsigned int cpu, struct csched2_vcpu *
     BUG_ON(new->rqd != rqd);

     /* Look at the cpu it's running on first */
-    cur = CSCHED2_VCPU(per_cpu(schedule_data, cpu).curr);
+    cur = CSCHED2_VCPU(curr_on_cpu(cpu));
     burn_credits(rqd, cur, now);

     if ( cur->credit < new->credit )
@@ -526,7 +526,7 @@ runq_tickle(const struct scheduler *ops, unsigned int cpu, struct csched2_vcpu *
         if ( i == cpu )
             continue;

-        cur = CSCHED2_VCPU(per_cpu(schedule_data, i).curr);
+        cur = CSCHED2_VCPU(curr_on_cpu(i));

         BUG_ON(is_idle_vcpu(cur->vcpu));

@@ -658,7 +658,7 @@ void burn_credits(struct csched2_runqueue_data *rqd, struct csched2_vcpu *svc, s
     s_time_t delta;

     /* Assert svc is current */
-    ASSERT(svc==CSCHED2_VCPU(per_cpu(schedule_data, svc->vcpu->processor).curr));
+    ASSERT(svc==CSCHED2_VCPU(curr_on_cpu(svc->vcpu->processor)));

     if ( is_idle_vcpu(svc->vcpu) )
     {
@@ -932,7 +932,7 @@ csched2_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)

     BUG_ON( is_idle_vcpu(vc) );

-    if ( per_cpu(schedule_data, vc->processor).curr == vc )
+    if ( curr_on_cpu(vc->processor) == vc )
         cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
     else if ( __vcpu_on_runq(svc) )
     {
@@ -957,7 +957,7 @@ csched2_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
     BUG_ON( is_idle_vcpu(vc) );

     /* Make sure svc priority mod happens before runq check */
-    if ( unlikely(per_cpu(schedule_data, vc->processor).curr == vc) )
+    if ( unlikely(curr_on_cpu(vc->processor) == vc) )
     {
         goto out;
     }
@@ -1815,7 +1815,7 @@ csched2_dump_pcpu(const struct scheduler *ops, int cpu)
     printk("core=%s\n", cpustr);

     /* current VCPU */
-    svc = CSCHED2_VCPU(per_cpu(schedule_data, cpu).curr);
+    svc = CSCHED2_VCPU(curr_on_cpu(cpu));
     if ( svc )
     {
         printk("\trun: ");
Re: [Xen-devel] [PATCH v6 0/5] xen/arm: Add support for Huawei hip04-d01 platform
2015-02-26 13:24 GMT+00:00 Julien Grall julien.gr...@linaro.org:
> Hi Frediano,
>
> On 26/02/15 12:40, Frediano Ziglio wrote:
> > xen/arm: Make gic-v2 code handle hip04-d01 platform
> > xen/arm: handle GICH register changes for hip04-d01 platform
> > xen/arm: Force dom0 to use normal GICv2 driver on Hip04 platform
>
> There is not much benefit in having 3 separate patches. I think they could be merged into a single patch.

In the last version I merged 2 of the 3 patches. In the third, the comment is really specific to the piece of code.

Frediano
Re: [Xen-devel] [PATCH 1/5] x86: allow specifying the NUMA nodes Dom0 should run on
On 27.02.15 at 15:54, dario.faggi...@citrix.com wrote: On Fri, 2015-02-27 at 10:50 +0000, Jan Beulich wrote: On 27.02.15 at 11:04, dario.faggi...@citrix.com wrote: On Fri, 2015-02-27 at 08:46 +0000, Jan Beulich wrote:

I'm simply adjusting what sched_init_vcpu() did, which is alter hard affinity conditionally on is_pinned and soft affinity unconditionally.

Ok, I understand the idea behind this better now, thanks. [...]

Setting soft affinity as a superset of (in the former case) or equal to (in the latter) hard affinity is just pure overhead, when in the scheduler.

Then why does sched_init_vcpu() do what it does? If you want to alter that, I'm fine with altering it here.

It does that, but, in there, soft affinity is unconditionally set to 'all bits set'. Then, in the scheduler, if we find out that the soft affinity mask is fully set, we just skip the soft affinity balancing step.

The idea is that, whether the mask is full because no one touched this default, or because it has been manually set like that, there is nothing to do at the soft affinity balancing level.

So, you actually are right: rather than not touch soft affinity, as I said in the previous email, I think we should set hard affinity conditionally on is_pinned, as in the patch, and then unconditionally set soft affinity to all, as in sched_init_vcpu().

I.e. effectively not touching it anyway (because just before it got set to all by sched_init_vcpu()). I guess instead of removing the line, I'll put it in a comment.

Jan
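The point made above, that a soft-affinity mask which is fully set (or which merely covers the hard-affinity mask) leaves nothing for the soft-affinity balancing step to do, can be illustrated with plain bitmasks. This is a simplified sketch in C and not Xen's scheduler code: it uses 64-bit integers in place of cpumask_t, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Soft affinity is only worth a separate balancing step if it actually
 * narrows the choice: it must exclude at least one online CPU, exclude
 * at least one CPU allowed by hard affinity, and still overlap hard
 * affinity.
 */
static bool soft_affinity_matters(uint64_t soft, uint64_t hard,
                                  uint64_t online)
{
    assert(online != 0);
    if ((online & ~soft) == 0)   /* covers every online CPU: "all" */
        return false;
    if ((hard & ~soft) == 0)     /* superset of, or equal to, hard affinity */
        return false;
    return (soft & hard) != 0;   /* disjoint masks cannot be honoured */
}
```

Under this model, leaving soft affinity at "all" for Dom0 and setting it to "all" explicitly are indistinguishable to the scheduler, which is why the line in question can become a comment.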
Re: [Xen-devel] [xen-unstable test] 35257: regressions - FAIL
On Fri, 2015-02-27 at 11:51 -0700, Jim Fehlig wrote: 2015-02-23 20:13:15.845+: 2133: error : virFirewallValidateBackend:193 : direct firewall backend requested, but /sbin/ebtables is not available: No such file or directory Odd, since ebtables was found when building checking for ebtables... /sbin/ebtables But AFAICT, that wont prevent libvirtd from starting. The build host and the runtime host will likely be different (or at least reinstalled). The base set of packages should be the same, but the build one will install a bunch of libfoo-dev while the runtime host will only get libfoo. Perhaps some libfoo-dev is pulling in ebtables somehow while just libfoo is not. I'll have a look next week. I think its probably non-critical to the error here. I think these are just spurious. 2015-02-23 20:13:15.845+: 2133: error : virFirewallApply:936 : out of memory 2015-02-23 20:13:16.092+: 2133: error : virExec:491 : Cannot find 'pm-is-supported' in path: No such file or directory 2015-02-23 20:13:16.092+: 2133: warning : virQEMUCapsInit:999 : Failed to get host power management capabilities As are these two. 2015-02-23 20:13:16.400+: 2133: error : virFirewallApply:936 : out of memory Has these OOM messages resulted in libvirtd exiting? No, I don't think so. The related code is int virFirewallApply(virFirewallPtr firewall) { size_t i, j; int ret = -1; virMutexLock(ruleLock); if (!firewall || firewall-err == ENOMEM) { virReportOOMError(); goto cleanup; ... } I suspect 'firewall' is null, so OOM error is reported and the function returns -1. But I also don't see this preventing libvirtd from starting. I've cc'd the libvirt list for verification that these errors won't prevent libvirtd from starting. I'm pretty sure libvirtd did successfully start, since we have successfully done a guest start and stop. The failing step is a second guest start, so it seems like libvirtd has either crashed or exited. 
I suppose these messages are from start of day and therefore red-herrings wrt the reason libvirtd went away. I don't see any evidence of a crash elsewhere in the logs (i.e. no process segfaulted in dmesg, no OOM killing going on etc). We don't seem to collect dom0 freemem info, but that most likely wouldn't help given the libvirtd process has exited. Any ideas where to look next? Can you access the test environment and try starting libvirtd in the foreground? Or enable debug log level in /etc/libvirt/libvirtd.conf? The test env will have been recycled, I could try and replicate it manually, but I think to start with I should arrange for the test env to have more logging enabled, in the hopes that if it happens again we get more information. I had some question around this in my reply Wei in this thread at 1425042785.14641.188.ca...@citrix.com. Cheers, Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [xen-unstable test] 35257: regressions - FAIL
Ian Campbell wrote: On Fri, 2015-02-27 at 10:48 +, Wei Liu wrote: On Fri, Feb 27, 2015 at 09:42:29AM +, Ian Campbell wrote: On Thu, 2015-02-26 at 20:14 +, xen.org wrote: flight 35257 xen-unstable real [real] http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-armhf-armhf-libvirt 12 guest-start.2 fail REGR. vs. 34629 logs: http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/test-armhf-armhf-libvirt/info.html http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/test-armhf-armhf-libvirt/12.ts-guest-start.log 2015-02-23 20:21:48 Z executing ssh ... root@10.80.229.106 virsh domxml-from-native xen-xl /etc/xen/debian.guest.osstest.cfg /etc/xen/debian.guest.osstest.cfg.xml error: failed to connect to the hypervisor error: no valid connection error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/test-armhf-armhf-libvirt/marilith-n4-output-ps_wwwaxf_-eo_pid%2Ctty%2Cstat%2Ctime%2Cnice%2Cpsr%2Cpcpu%2Cpmem%2Cnwchan%2Cwchan%2325%2Cargs appears to show no libvirtd process. http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/test-armhf-armhf-libvirt/marilith-n4---var-log-libvirt-libvirtd.log says: 2015-02-23 20:13:15.556+: 2133: info : libvirt version: 1.2.13 2015-02-23 20:13:15.556+: 2133: error : dnsmasqCapsRefreshInternal:726 : Cannot check dnsmasq binary dnsmasq: No such file or directory 2015-02-23 20:13:15.845+: 2133: error : virFirewallValidateBackend:193 : direct firewall backend requested, but /sbin/ebtables is not available: No such file or directory I think these are just spurious. 
2015-02-23 20:13:15.845+: 2133: error : virFirewallApply:936 : out of memory 2015-02-23 20:13:16.092+: 2133: error : virExec:491 : Cannot find 'pm-is-supported' in path: No such file or directory 2015-02-23 20:13:16.092+: 2133: warning : virQEMUCapsInit:999 : Failed to get host power management capabilities As are these two. 2015-02-23 20:13:16.400+: 2133: error : virFirewallApply:936 : out of memory Last time Ian and I debugged a libvirt crashing bug, out of memory didn't cause libvirtd to exit. It turned out it's some bug in libxl event machinery that caused libvirt to exit, but the assertion message was not shown anywhere. I think we might need to login to that host and run libvirtd in foreground to determine what goes wrong. That's possible I suppose, but it would be nice to arrange not to have to in the future. Perhaps we should be forcing higher log levels on libvirtd when installing, patching /usr/local/etc/libvirt/libvirtd.conf to set log_level=2 (or even 1) perhaps? (Default is 3 == warnings+error, 2 is info, 1 is debug) Jim, what debug level would you recommend for automated test? Unless it is super verbose I suppose 1=debug is the way to go? I think we need DEBUG log level, although it is rather verbose. If that becomes a problem, we could experiment with a minimally useful log_filters setting, e.g. log_filters=1:daemon 1:libxl Adding -v to libvirtd command line would be an easier patch, but only gives the effect of log_level=2 AFAICT. Perhaps that is considered sufficient? In my experience, if ERROR is insufficient, INFO and WARNING don't help. DEBUG is needed. Regards, Jim ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] xen/pciback: Don't print scary messages when unsupported by hypervisor.
We print messages at the warning level such as:

pciback 0000:90:00.5: MSI-X preparation failed (-38)

which is due to the hypervisor not supporting this sub-hypercall (which was added in Xen 4.3).

Instead of having scary messages all the time - only have it when the hypercall is actually supported.

Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 drivers/xen/xen-pciback/pci_stub.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 7acc796..ddc5500 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -115,7 +115,7 @@ static void pcistub_device_release(struct kref *kref)
 		int err = HYPERVISOR_physdev_op(PHYSDEVOP_release_msix,
 						&ppdev);

-		if (err)
+		if (err && err != -ENOSYS)
 			dev_warn(&dev->dev, "MSI-X release failed (%d)\n",
 				 err);
 	}
@@ -376,7 +376,7 @@ static int __devinit pcistub_init_device(struct pci_dev *dev)
 		};

 		err = HYPERVISOR_physdev_op(PHYSDEVOP_prepare_msix, &ppdev);
-		if (err)
+		if (err && err != -ENOSYS)
 			dev_err(&dev->dev, "MSI-X preparation failed (%d)\n",
 				err);
 	}
-- 
2.1.0
Re: [Xen-devel] RFC: xen config changes v4
On Fri, Feb 27, 2015 at 07:14:32AM +0100, Juergen Gross wrote: On 02/26/2015 07:48 PM, Luis R. Rodriguez wrote: On Thu, Feb 26, 2015 at 05:42:57PM +, Stefano Stabellini wrote: On Thu, 26 Feb 2015, Luis R. Rodriguez wrote: On Thu, Feb 26, 2015 at 11:08:20AM +, Stefano Stabellini wrote: On Thu, 26 Feb 2015, David Vrabel wrote: On 26/02/15 04:59, Juergen Gross wrote: So we are again in the situation that pv-drivers always imply the pvops kernel (PARAVIRT selected). I started the whole Kconfig rework to eliminate this dependency. Yes. Can you produce a series that just addresses this one issue. In the absence of any concrete requirement for this big Kconfig reorg I I don't think it is helpful. I clearly missed some context as I didn't realize that this was the intended goal. Why do we want this? Please explain as it won't come for free. We have a few PV interfaces for HVM guests that need PARAVIRT in Linux in order to be used, for example pv_time_ops and HVMOP_pagetable_dying. They are critical performance improvements and from the interface perspective, small enough that doesn't make much sense having a separate KConfig option for them. In order to reach the goal above we necessarily need to introduce a differentiation in terms of PV on HVM guests in Linux: 1) basic guests with PV network, disk, etc but no PV timers, no HVMOP_pagetable_dying, no PV IPIs 2) full PV on HVM guests that have PV network, disk, timers, HVMOP_pagetable_dying, PV IPIs and anything else that makes sense. 2) is much faster than 1) on Xen and 2) is only a tiny bit slower than 1) on native x86 Also don't we shove 2) down hvm guests right now? Even when everything is built in I do not see how we opt out for HVM for 1) at run time right now. If this is true then the question of motivation for this becomes even stronger I think. Yes, indeed there is no way to do 1) at the moment. And for good reasons, see above. 
OK if the goal is to be able to build front end drivers by avoiding building PARAVIRT / PARAVIRT_CLOCK and if the gains to be able to do so (which haven't been stated other than just the ability to do so) are small (as Stefano notes simple hvm containers do not perform great) but requires a bit of work, I'd rather ask -- why not address *why* we are avoiding PARAVIRT / PARAVIRT_CLOCK and stick to the original goals behind the pvops model by addressing what is required to be able to continue to be happy with one single kernel. The work required to do that might be more than to just be able to build simple Xen hvm containers without PARAVIRT / PARAVIRT_CLOCK but I'd think the gains would be much higher. I absolutely agree. I think this is a long term goal we should work on. PVH should address most of the issues, BTW. If this resonates well then I'd like to ask: what are the current most pressing issues with enabling PARAVIRT / PARAVIRT_CLOCK. PARAVIRT: performance, especially memory management Do we have studies on specific areas? I'd be very interested in the exact routines. PARAVIRT_CLOCK: none Great! Luis ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] xen, apic: Setup our own APIC driver and validator for APIC IDs.
Via CPUID masking and the different apic-> overrides we effectively restrict PV guests to the default APIC driver. That is OK as a PV guest should never access any APIC registers.

However, the APIC is also used to limit the number of CPUs if the APIC IDs are incorrect - and since we mask the x2APIC from the CPUID - any APIC IDs above 0xFF are deemed incorrect by the default APIC routines.

As such, add a new routine to check the APIC ID, which will only be used if the CPUID (native one) tells us the system is using x2APIC. This allows us to boot with more than 255 CPUs if running as initial domain.

The probing of APIC drivers is dependent on the build. The arch/x86/kernel/apic/Makefile lists them as (assuming 64-bit):

apic_numachip.o x2apic_uv_x.o x2apic_phys.o x2apic_cluster.o apic_flat_64.o

Looking at the .apicdrivers section I see the addresses of xen_apic, apic_x2apic_phys, apic_x2apic_cluster, apic_physflat and apic_flat.

We build from arch/x86/xen, which can be built before or after arch/x86/kernel/apic. As such we add a late probe function to switch to the Xen PV driver if that hadn't been done during bootup.
Reported-by: Cathy Avery <cathy.av...@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
---
 arch/x86/xen/apic.c      | 169 +++
 arch/x86/xen/enlighten.c |  90 +
 2 files changed, 170 insertions(+), 89 deletions(-)

diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c
index 7005ced..9b9a5fc 100644
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -7,6 +7,7 @@
 #include <xen/xen.h>
 #include <xen/interface/physdev.h>
 #include "xen-ops.h"
+#include "smp.h"
 
 static unsigned int xen_io_apic_read(unsigned apic, unsigned reg)
 {
@@ -28,7 +29,175 @@ static unsigned int xen_io_apic_read(unsigned apic, unsigned reg)
 	return 0xfd;
 }
 
+static unsigned long xen_set_apic_id(unsigned int x)
+{
+	WARN_ON(1);
+	return x;
+}
+
+static unsigned int xen_get_apic_id(unsigned long x)
+{
+	return ((x)>>24) & 0xFFu;
+}
+
+static u32 xen_apic_read(u32 reg)
+{
+	struct xen_platform_op op = {
+		.cmd = XENPF_get_cpuinfo,
+		.interface_version = XENPF_INTERFACE_VERSION,
+		.u.pcpu_info.xen_cpuid = 0,
+	};
+	int ret = 0;
+
+	/* Shouldn't need this as APIC is turned off for PV, and we only
+	 * get called on the bootup processor. But just in case. */
+	if (!xen_initial_domain() || smp_processor_id())
+		return 0;
+
+	if (reg == APIC_LVR)
+		return 0x10;
+
+	if (reg != APIC_ID)
+		return 0;
+
+	ret = HYPERVISOR_dom0_op(&op);
+	if (ret)
+		return 0;
+
+	return op.u.pcpu_info.apic_id << 24;
+}
+
+static void xen_apic_write(u32 reg, u32 val)
+{
+	/* Warn to see if there's any stray references */
+	WARN_ON(1);
+}
+
+static u64 xen_apic_icr_read(void)
+{
+	return 0;
+}
+
+static void xen_apic_icr_write(u32 low, u32 id)
+{
+	/* Warn to see if there's any stray references */
+	WARN_ON(1);
+}
+
+static u32 xen_safe_apic_wait_icr_idle(void)
+{
+	return 0;
+}
+
+
+static int probe_xen(void)
+{
+	if (xen_pv_domain())
+		return 1;
+
+	return 0;
+}
+
+static int xen_madt_oem_check(char *oem_id, char *oem_table_id)
+{
+	return 1;
+}
+
+static int xen_id_always_valid(int apicid)
+{
+	return 1;
+}
+
+static int xen_id_always_registered(void)
+{
+	return 1;
+}
+
+static int xen_phys_pkg_id(int initial_apic_id, int index_msb)
+{
+	return initial_apic_id >> index_msb;
+}
+
+static void xen_noop(void)
+{
+}
+
+static void xen_silent_inquire(int apicid)
+{
+}
+
+static struct apic xen_apic = {
+	.name = "Xen PV",
+	.probe = probe_xen,
+	.acpi_madt_oem_check = xen_madt_oem_check,
+	.apic_id_valid = xen_id_always_valid,
+	.apic_id_registered = xen_id_always_registered,
+
+	/* .irq_delivery_mode - used in native_compose_msi_msg only */
+	/* .irq_dest_mode - used in native_compose_msi_msg only */
+
+	.target_cpus = default_target_cpus,
+	.disable_esr = 0,
+	/* .dest_logical - default_send_IPI_ use it but we use our own. */
+	.check_apicid_used = default_check_apicid_used, /* Used on 32-bit */
+
+	.vector_allocation_domain = flat_vector_allocation_domain,
+	.init_apic_ldr = xen_noop, /* setup_local_APIC calls it */
+
+	.ioapic_phys_id_map = default_ioapic_phys_id_map, /* Used on 32-bit */
+	.setup_apic_routing = NULL,
+	.cpu_present_to_apicid = default_cpu_present_to_apicid,
+	.apicid_to_cpu_present = physid_set_mask_of_physid, /* Used on 32-bit */
+	.check_phys_apicid_present =
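
A side note on why APIC IDs above 0xFF are a problem for the default routines: in xAPIC mode the ID lives in the top byte of the APIC_ID register, which is the layout the shift-and-mask in xen_get_apic_id() assumes. The following is only a minimal standalone sketch of that encoding, not part of the patch; the toy_ names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pack an APIC ID into an xAPIC-style register value
 * (ID in bits 31:24). Any ID bits above 0xFF fall off the 32-bit value. */
static uint32_t toy_xapic_id_to_reg(unsigned int id)
{
	return (uint32_t)id << 24;
}

/* Hypothetical helper: same shift-and-mask as xen_get_apic_id() above. */
static unsigned int toy_xapic_reg_to_id(uint32_t reg)
{
	return (reg >> 24) & 0xFFu;
}

/* True if the ID can be represented in the 8-bit xAPIC field. */
static bool toy_id_survives_round_trip(unsigned int id)
{
	return toy_xapic_reg_to_id(toy_xapic_id_to_reg(id)) == id;
}
```

With this layout, ID 0xFF round-trips but ID 0x100 does not, which is why a separate validity check is needed once x2APIC-sized IDs are in play.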
[Xen-devel] Regression due to d9581c7dcac15c02ad4d47c60c60f4d8f197db55 xen/fb: allow xenfb initialization for hvm guests
This has been in queue for some time. In our kernels (UEK3) we had to revert said patch. The patch says:

    xen/fb: allow xenfb initialization for hvm guests

    There is no reasons why an HVM guest shouldn't be allowed to use xenfb. As a matter of fact ARM guests, HVM from Linux POV, can use xenfb. Given that no Xen toolstacks configure a xenfb backend for x86 HVM guests, they are not affected.

    Please note that at this time QEMU needs few outstanding fixes to provide xenfb on ARM:

    http://marc.info/?l=qemu-devel&m=138739419700837&w=2

The statement that no Xen toolstacks configure a xenfb backend for x86 HVM guests is actually a lie. If you try to boot this kernel under Xen with Xend it will be a problem, as Xend does set up a 'vfb' device. The end result is that during bootup, up until X starts, there is no console output on the VNC window, as the Linux kernel tries to use the vfb console driver.

Any suggestions on how to fix this? Should we just wrap the whole thing with #ifdef, like this?

diff --git a/drivers/video/fbdev/xen-fbfront.c b/drivers/video/fbdev/xen-fbfront.c
index 09dc447..584be8e 100644
--- a/drivers/video/fbdev/xen-fbfront.c
+++ b/drivers/video/fbdev/xen-fbfront.c
@@ -696,7 +696,10 @@ static int __init xenfb_init(void)
 {
 	if (!xen_domain())
 		return -ENODEV;
-
+#ifdef CONFIG_X86
+	if (!xen_pv_domain())
+		return -ENODEV;
+#endif
 	/* Nothing to do if running in dom0. */
 	if (xen_initial_domain())
 		return -ENODEV;

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
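
To make the effect of the proposed #ifdef easier to see, here is a small standalone model of the checks at the top of xenfb_init(). It is only a sketch: struct toy_domain and toy_xenfb_init() are made-up names, and the compile-time CONFIG_X86 test is modelled as a run-time flag so both cases can be compared.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for xen_domain(), xen_pv_domain() and
 * xen_initial_domain(). */
struct toy_domain {
	bool is_xen;
	bool is_pv;
	bool is_initial;
};

/* Models the guarded init path: on x86 only PV guests proceed, while on
 * other architectures (e.g. ARM) non-PV guests may still use xenfb. */
static int toy_xenfb_init(struct toy_domain d, bool arch_is_x86)
{
	if (!d.is_xen)
		return -ENODEV;
	if (arch_is_x86 && !d.is_pv)
		return -ENODEV;
	/* Nothing to do if running in dom0. */
	if (d.is_initial)
		return -ENODEV;
	return 0;
}
```

Under this model an x86 HVM guest bails out with -ENODEV, while an ARM guest and an x86 PV domU still initialise the frontend.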
Re: [Xen-devel] libxl__device_pci_reset() questions
On Thu, Feb 26, 2015 at 02:28:34PM +, Jan Beulich wrote:
On 19.02.15 at 15:30, ian.campb...@citrix.com wrote:
On Thu, 2015-02-19 at 13:59 +, Jan Beulich wrote:

All, in the context of someone seeing "The kernel doesn't support reset from sysfs for PCI device", is my understanding correct that the lack of error checking in any caller (perhaps intentional) means that any of the errors logged from this function are really just warnings, i.e. don't prevent the assignment from taking place?

It was a long while ago, but I believe that was the intention, yes.

Furthermore I'm puzzled by the function first thing trying to access a do_flr file supposedly made available by the pciback driver, yet I can't see either the upstream or the old 2.6.18 driver surfacing such a file. What am I missing here?

I'm not sure, on the basis of
http://lists.xen.org/archives/html/xen-devel/2014-06/msg03105.html
and
http://lists.xen.org/archives/html/xen-devel/2014-07/msg01108.html
I've added Konrad to the CC.

Konrad?

I talked with David about this and his point was that:

1). If the device advertises it can 'reset', it had better be able to do it.
2). However, there are some that lie. If they exist we should have a quirk for them in the PCI layer so that we don't think we have this feature available.
3). In the case where the PCI device has none of the mechanisms to do the reset, we should provide one via xen-pciback.

For 3), David had a patch, which is in XenServer, that does the work: it first figures out whether the PCI device reports being able to do the reset. If it does not, then we install our own 'reset' SysFS entry which will do the bus reset.

However, looking at how VFIO and QEMU do it, there is also a check in the user-space part, where it decides in some cases to ignore the 'reset' from SysFS and do its bus reset via the VFIO ioctl. I haven't yet dug completely into the code to understand by what logic it has to use the VFIO ioctl bus reset instead of the PCI reset mechanism.

Thanks, Jan
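
The three points above amount to a small decision procedure. The sketch below is only an illustration of that reasoning, not the XenServer patch or any existing kernel interface; every name in it is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical description of what a device claims and what we know. */
struct toy_pci_dev {
	bool advertises_reset;   /* device reports a reset mechanism      */
	bool quirk_reset_broken; /* PCI-layer quirk: the device lies      */
};

enum toy_reset_method {
	TOY_RESET_DEVICE,        /* use the device's own reset            */
	TOY_RESET_PCIBACK_BUS,   /* fall back to a pciback-provided reset */
};

/* Points 1) and 2): trust the device unless a quirk says otherwise.
 * Point 3): otherwise pciback supplies its own bus reset. */
static enum toy_reset_method toy_pick_reset(struct toy_pci_dev dev)
{
	if (dev.advertises_reset && !dev.quirk_reset_broken)
		return TOY_RESET_DEVICE;
	return TOY_RESET_PCIBACK_BUS;
}
```

The user-space override that VFIO and QEMU apply is deliberately left out, since the conditions for it are not spelled out in this thread.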
Re: [Xen-devel] [OSSTEST PATCH 3/8] emails: honour OSSTEST_EMAIL_SUBJECT_PREFIX
On Thu, 2015-02-26 at 17:44 +, Ian Jackson wrote:
Ian Campbell writes (Re: [OSSTEST PATCH 3/8] emails: honour OSSTEST_EMAIL_SUBJECT_PREFIX):
On Wed, 2015-02-25 at 13:01 +, Ian Jackson wrote:

This is prefixed before the other computed prefixes. It makes it easier to distinguish an adhoc cr-daily-branch test runs for a real branch.

Do they not already get adhoc in the $subject? i.e. my commissioning runs for the new arm create (following README.dev procedure) resulted in mails with:

  [adhoc test] 34418: trouble: blocked/broken/fail/pass

(IOW it seems $branch is replaced by adhoc somewhere along the way)

That happens if you use mg-execute-flight. If you let cr-daily-branch run the flight for you, it uses the standard email stuff.

Ah, OK, I didn't realise there was a difference. So Ack to this and the next patch, which I didn't ack for similar reasons.

(I think that makes the whole series acked, FWIW)

Ian.
Re: [Xen-devel] Poor network performance between DomU with multiqueue support
On Mon, Dec 08, 2014 at 01:08:18PM +, Zhangleiqiang (Trump) wrote: On Mon, Dec 08, 2014 at 06:44:26AM +, Zhangleiqiang (Trump) wrote: On Fri, Dec 05, 2014 at 01:17:16AM +, Zhangleiqiang (Trump) wrote: [...] I think that's expected, because guest RX data path still uses grant_copy while guest TX uses grant_map to do zero-copy transmit. As far as I know, there are three main grant-related operations used in split device model: grant mapping, grant transfer and grant copy. Grant transfer has not used now, and grant mapping and grant transfer both involve TLB refresh work for hypervisor, am I right? Or only grant transfer has this overhead? Transfer is not used so I can't tell. Grant unmap causes TLB flush. I saw in an email the other day XenServer folks has some planned improvement to avoid TLB flush in Xen to upstream in 4.6 window. I can't speak for sure it will get upstreamed as I don't work on that. Does grant copy surely has more overhead than grant mapping? At the very least the zero-copy TX path is faster than previous copying path. But speaking of the micro operation I'm not sure. There was once persistent map prototype netback / netfront that establishes a memory pool between FE and BE then use memcpy to copy data. Unfortunately that prototype was not done right so the result was not good. The newest mail about persistent grant I can find is sent from 16 Nov 2012 (http://lists.xen.org/archives/html/xen-devel/2012-11/msg00832.html). Why is it not done right and not merged into upstream? AFAICT there's one more memcpy than necessary, i.e. frontend memcpy data into the pool then backend memcpy data out of the pool, when backend should be able to use the page in pool directly. Memcpy should cheaper than grant_copy because the former needs not the hypercall which will cause VM Exit to XEN Hypervisor, am I right? For RX path, using memcpy based on persistent grant table may have higher performance than using grant copy now. In theory yes. 
Unfortunately nobody has benchmarked that properly.

I have done some testing of RX performance using the persistent grant method and the upstream method (3.17.4 branch). The results show that the persistent grant method does have higher performance than the upstream method (from 3.5Gbps to about 6Gbps). I also find that the persistent grant mechanism is already used in blkfront/blkback, so I am wondering why there are no efforts to replace grant copy with persistent grants now, at least in the RX path. Are there other disadvantages in the persistent grant method which stop us from using it?

PS. I used pkt-gen to send packets from dom0 to a domU running on another dom0. The CPUs of both dom0s are Intel E5640 2.4GHz, and the two dom0s are connected with a 10GE NIC.

If you're interested in doing work on optimising RX performance, you might want to sync up with XenServer folks?

I have seen "move grant copy to guest" and "Fix grant copy alignment problem" as optimization methods used in NetChannel2 (http://www-archive.xenproject.org/files/xensummit_fall07/16_JoseRenatoSantos.pdf). Unfortunately, NetChannel2 seems not to be supported since 2.6.32. Do you know them, and would they be helpful for RX path optimization under the current upstream implementation?

Not sure, that's long before I ever started working on Xen.

By the way, after rethinking the testing results for the multi-queue pv (kernel 3.17.4 + Xen 4.4) implementation, I find that when using four queues for netback/netfront, there are about 3 netback processes running with high CPU usage on the receiving Dom0 (about 85% usage per process, each running on one CPU core), and the aggregate throughput is only about 5Gbps. I suspect there may be some bug or pitfall in the current multi-queue implementation, because occupying almost all of 3 CPU cores for packet receiving at 5Gbps throughput is somewhat abnormal.

3.17.4 doesn't contain David Vrabel's fixes. Look for

  bc96f648df1bbc2729abbb84513cf4f64273a1f1
  f48da8b14d04ca87ffcffe68829afd45f926ec6a
  ecf08d2dbb96d5a4b4bcc53a39e8d29cc8fef02e

in David Miller's net tree.

BTW there are some improvements planned for 4.6: "[Xen-devel] [PATCH v3 0/2] gnttab: Improve scaleability". This is orthogonal to the problem you're trying to solve but it should help improve performance in general.

Wei.
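
For a rough sense of scale, the figures quoted in this thread (about 5Gbps aggregate, about 3 netback processes at roughly 85% of a core each, and 3.5Gbps versus about 6Gbps for the persistent grant test) can be turned into per-core numbers. This is back-of-the-envelope arithmetic only, assuming CPU cost scales linearly with throughput; the function names are made up.

```c
/* Throughput delivered per fully busy core, given an aggregate rate and
 * the number of processes each using cpu_fraction of one core. */
static double toy_gbps_per_busy_core(double aggregate_gbps, int procs,
				     double cpu_fraction)
{
	return aggregate_gbps / ((double)procs * cpu_fraction);
}

/* Relative improvement between two measured rates. */
static double toy_speedup(double before_gbps, double after_gbps)
{
	return after_gbps / before_gbps;
}
```

With the numbers above, toy_gbps_per_busy_core(5.0, 3, 0.85) comes out just under 2Gbps per busy core, and toy_speedup(3.5, 6.0) is roughly 1.7x.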
Re: [Xen-devel] [PATCH v3] RFC: Automatically check xen's public headers for C++ pitfalls.
On 27.02.15 at 10:22, t...@xen.org wrote:
At 08:36 + on 27 Feb (1425022578), Jan Beulich wrote:
On 26.02.15 at 17:24, t...@xen.org wrote:

+PUBLIC_ANSI_HEADERS := $(filter-out public/%ctl.h public/xsm/% public/%hvm/save.h, $(PUBLIC_HEADERS))
+
+headers.chk: $(PUBLIC_ANSI_HEADERS) Makefile
+	for i in $(filter %.h,$^); do \
+	    $(CC) -x c -ansi -Wall -Werror -include stdint.h \
+	          -S -o /dev/null $$i || exit 1; \
+	    echo $$i; \
+	done >$@.new
+	mv $@.new $@
+
+headers++.chk: $(PUBLIC_HEADERS) Makefile
+	if $(CXX) -v >/dev/null 2>&1; then \
+	    for i in $(filter %.h,$^); do \
+	        $(CXX) -x c++ -std=gnu++98 -Wall -Werror \
+	               -D__XEN_TOOLS__ -Dprivate=private_is_a_keyword_in_cpp \

With -D__XEN_TOOLS__ added, did you check that domctl.h and sysctl.h still actually need to be excluded from this test?

The C++ check includes those headers and defines __XEN_TOOLS__; the ANSI C check does neither (as before).

Argh - I again didn't look closely enough; I'm sorry.

Would you like to change that too?

No. Ack on v3 then.

Jan
Re: [Xen-devel] [OSSTEST PATCH 9/8] README.dev: Runes for adhoc testing in the production environment
On Thu, 2015-02-26 at 17:53 +, Ian Jackson wrote:
Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com

Looks good,

Acked-by: Ian Campbell ian.campb...@citrix.com

---
 README.dev | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/README.dev b/README.dev
index aae4f17..03c3e61 100644
--- a/README.dev
+++ b/README.dev
@@ -164,3 +164,21 @@ $HOME/bisects/for-$branch.git/stop
 $HOME/testing.git/$xenbranch.stop
    stops everything using $xenbranch
+
+Adhoc testing in the production environment
+===
+
+Adhoc (`play') testing of a proposed osstest branch:
+
+ As yourself on the osstest controller VM:
+
+ Check out the version of osstest to be tested. If you are editing
+ on your workstation, it is easiest to commit everything and then
+   git-push osstestvm:osstest-wombat-tree.git +HEAD:t
+ and on the controller
+   git checkout t~0
+
+ Create (on the controller) daily-cron-email-foo containing
+   To: something appropriate
+ Then
+   OSSTEST_EMAIL_HEADER=daily-cron-email-foo OSSTEST_USE_HEAD=y OSSTEST_NO_BASELINE=y ./cr-daily-branch osstest
Re: [Xen-devel] [PATCH v2 3/3] xen/arm: allow console=hvc0 to be omitted for guests
On Thu, 2015-02-26 at 18:22 +, Stefano Stabellini wrote: On Wed, 18 Feb 2015, Ian Campbell wrote: On Wed, 2015-02-18 at 09:50 -0600, Rob Herring wrote: On Wed, Feb 18, 2015 at 7:51 AM, Julien Grall julien.gr...@linaro.org wrote: From: Ard Biesheuvel ard.biesheu...@linaro.org This patch registers hvc0 as the preferred console if no console has been specified explicitly on the kernel command line. The purpose is to allow platform agnostic kernels and boot images (such as distro installers) to boot in a Xen/ARM domU without the need to modify the command line by hand. How does this interact with DT chosen stdout-path? I think it shouldn't any more than the existing calls from e.g. the 8250 driver to preferred_console do. Is there a node for hvc0? Not a direct one, it is inferred from the presence of the general Xen node. Xen PV consoles, including hvc0, as all the other Xen PV devices are advertised on xenstore. Do we actually use the xenstore node for hvc0? I thought we got it from hvmparams (so the primary it can be used before xenstore is up) I did vaguely consider handling a stdout-path pointing to that -- but it seemed a bit of an abuse. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] RFC: xen config changes v4
On Fri, 27 Feb 2015, Juergen Gross wrote: On 02/26/2015 06:42 PM, Stefano Stabellini wrote: On Thu, 26 Feb 2015, Luis R. Rodriguez wrote: On Thu, Feb 26, 2015 at 11:08:20AM +, Stefano Stabellini wrote: On Thu, 26 Feb 2015, David Vrabel wrote: On 26/02/15 04:59, Juergen Gross wrote: So we are again in the situation that pv-drivers always imply the pvops kernel (PARAVIRT selected). I started the whole Kconfig rework to eliminate this dependency. Yes. Can you produce a series that just addresses this one issue. In the absence of any concrete requirement for this big Kconfig reorg I I don't think it is helpful. I clearly missed some context as I didn't realize that this was the intended goal. Why do we want this? Please explain as it won't come for free. We have a few PV interfaces for HVM guests that need PARAVIRT in Linux in order to be used, for example pv_time_ops and HVMOP_pagetable_dying. They are critical performance improvements and from the interface perspective, small enough that doesn't make much sense having a separate KConfig option for them. In order to reach the goal above we necessarily need to introduce a differentiation in terms of PV on HVM guests in Linux: 1) basic guests with PV network, disk, etc but no PV timers, no HVMOP_pagetable_dying, no PV IPIs 2) full PV on HVM guests that have PV network, disk, timers, HVMOP_pagetable_dying, PV IPIs and anything else that makes sense. 2) is much faster than 1) on Xen and 2) is only a tiny bit slower than 1) on native x86 Also don't we shove 2) down hvm guests right now? Even when everything is built in I do not see how we opt out for HVM for 1) at run time right now. If this is true then the question of motivation for this becomes even stronger I think. Yes, indeed there is no way to do 1) at the moment. And for good reasons, see above. Hmm, after checking the code I'm not convinced: - HVMOP_pagetable_dying is obsolete on modern hardware supporting EPT/HAP That might be true, but what about older hardware? 
Even on modern hardware a few workloads still run faster on shadow. But if HVMOP_pagetable_dying is the only reason to keep PARAVIRT for HVM guests, then I agree with you that we should remove it. - PV IPIs are not needed on single-vcpu guests - PARAVIRT_CLOCK doesn't need PARAVIRT (in fact the SUSEs kernel configs for all x86_64 kernels have CONFIG_PARAVIRT_CLOCK=y) So I think we really should enable building Xen frontends without PARAVIRT, implying at least no XEN_PV and no XEN_PVH. I'll have a try setting up patches. If we are doing this as a performance improvement, I would like to see a couple of benchmarks (kernbench, hackbench) to show that on a single-vcpu guest and multi-vcpu guest (let's say 4 vcpus) disabling PARAVIRT leads to better performance on Xen on EPT hardware. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/5] x86: allow specifying the NUMA nodes Dom0 should run on
On 26.02.15 at 18:14, dario.faggi...@citrix.com wrote:
On Thu, 2015-02-26 at 13:52 +, Jan Beulich wrote:

+### dom0\_nodes
+
+ `= integer[,...]`
+
+Specify the NUMA nodes to place Dom0 on. Defaults for vCPU-s created
+and memory assigned to Dom0 will be adjusted to match the node
+restrictions set up here. Note that the values to be specified here are
+ACPI PXM ones, not Xen internal node numbers.
+

Why use PXM ids? It might be me being much more used to work with NUMA node ids, but wouldn't the other way round be more consistent (almost everything the user interacts with after boot speaks node ids) and easier for the user to figure things out (e.g., with tools like numactl on baremetal)?

This way behavior doesn't change if internally in the hypervisor we need to change the mapping from PXMs to node IDs.

+static struct vcpu *__init setup_vcpu(struct domain *d, unsigned int vcpu_id,
+                                      unsigned int cpu)
+{
+    struct vcpu *v = alloc_vcpu(d, vcpu_id, cpu);
+
+    if ( v )
+    {
+        if ( !d->is_pinned )
+            cpumask_copy(v->cpu_hard_affinity, &dom0_cpus);
+        cpumask_copy(v->cpu_soft_affinity, &dom0_cpus);
+    }
+

About this, for DomUs, now that we have soft affinity available, what we do is set only soft affinity to match the NUMA placement. I think I see and agree why we want to be 'more strict' in Dom0, but I felt like it was worth pointing out the difference in behaviour (should it be documented somewhere?).

I'm simply adjusting what sched_init_vcpu() did, which is alter hard affinity conditionally on is_pinned and soft affinity unconditionally.

BTW, mostly out of curiosity, I've had a few strange issues/conflicts in applying this on top of staging, in order to test it... Was it me doing something very stupid, or was this based on something different?

Apart from the one patch named in the cover letter there shouldn't be any other dependencies. Without you naming the issues you encountered, I can't tell.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
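
The point about PXM values being stable across changes in the hypervisor's internal numbering can be illustrated with a toy translation step. This is not Xen's actual parsing code; the table layout and all toy_ names are assumptions made for the example.

```c
#define TOY_MAX_NODES 8

/* Hypothetical firmware-derived table: pxm[n] is the ACPI PXM value
 * that the hypervisor happened to assign to internal node n. */
struct toy_pxm_map {
	unsigned int pxm[TOY_MAX_NODES];
	unsigned int nr_nodes;
};

/* Look up the internal node for a PXM value; -1 if unknown. */
static int toy_pxm_to_node(const struct toy_pxm_map *m, unsigned int pxm)
{
	for (unsigned int n = 0; n < m->nr_nodes; n++)
		if (m->pxm[n] == pxm)
			return (int)n;
	return -1;
}

/* Turn a user-supplied list of PXMs (as given to dom0_nodes) into a
 * bitmask of internal node numbers, skipping unknown PXMs. */
static unsigned int toy_nodes_from_pxms(const struct toy_pxm_map *m,
					const unsigned int *pxms,
					unsigned int count)
{
	unsigned int mask = 0;

	for (unsigned int i = 0; i < count; i++) {
		int node = toy_pxm_to_node(m, pxms[i]);

		if (node >= 0)
			mask |= 1u << node;
	}
	return mask;
}
```

If the internal node order changed between releases, only the table would differ; the same PXM list on the command line would still select the same physical nodes.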
Re: [Xen-devel] [PATCH v3] RFC: Automatically check xen's public headers for C++ pitfalls.
At 08:36 + on 27 Feb (1425022578), Jan Beulich wrote:
On 26.02.15 at 17:24, t...@xen.org wrote:

+PUBLIC_ANSI_HEADERS := $(filter-out public/%ctl.h public/xsm/% public/%hvm/save.h, $(PUBLIC_HEADERS))
+
+headers.chk: $(PUBLIC_ANSI_HEADERS) Makefile
+	for i in $(filter %.h,$^); do \
+	    $(CC) -x c -ansi -Wall -Werror -include stdint.h \
+	          -S -o /dev/null $$i || exit 1; \
+	    echo $$i; \
+	done >$@.new
+	mv $@.new $@
+
+headers++.chk: $(PUBLIC_HEADERS) Makefile
+	if $(CXX) -v >/dev/null 2>&1; then \
+	    for i in $(filter %.h,$^); do \
+	        $(CXX) -x c++ -std=gnu++98 -Wall -Werror \
+	               -D__XEN_TOOLS__ -Dprivate=private_is_a_keyword_in_cpp \

With -D__XEN_TOOLS__ added, did you check that domctl.h and sysctl.h still actually need to be excluded from this test?

The C++ check includes those headers and defines __XEN_TOOLS__; the ANSI C check does neither (as before). Would you like to change that too?

Tim.
Re: [Xen-devel] [xen-unstable test] 35257: regressions - FAIL
On Fri, Feb 27, 2015 at 09:42:29AM +, Ian Campbell wrote: On Thu, 2015-02-26 at 20:14 +, xen.org wrote: flight 35257 xen-unstable real [real] http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-armhf-armhf-libvirt 12 guest-start.2 fail REGR. vs. 34629 logs: http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/test-armhf-armhf-libvirt/info.html http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/test-armhf-armhf-libvirt/12.ts-guest-start.log 2015-02-23 20:21:48 Z executing ssh ... root@10.80.229.106 virsh domxml-from-native xen-xl /etc/xen/debian.guest.osstest.cfg /etc/xen/debian.guest.osstest.cfg.xml error: failed to connect to the hypervisor error: no valid connection error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/test-armhf-armhf-libvirt/marilith-n4-output-ps_wwwaxf_-eo_pid%2Ctty%2Cstat%2Ctime%2Cnice%2Cpsr%2Cpcpu%2Cpmem%2Cnwchan%2Cwchan%2325%2Cargs appears to show no libvirtd process. http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/test-armhf-armhf-libvirt/marilith-n4---var-log-libvirt-libvirtd.log says: 2015-02-23 20:13:15.556+: 2133: info : libvirt version: 1.2.13 2015-02-23 20:13:15.556+: 2133: error : dnsmasqCapsRefreshInternal:726 : Cannot check dnsmasq binary dnsmasq: No such file or directory 2015-02-23 20:13:15.845+: 2133: error : virFirewallValidateBackend:193 : direct firewall backend requested, but /sbin/ebtables is not available: No such file or directory I think these are just spurious. 2015-02-23 20:13:15.845+: 2133: error : virFirewallApply:936 : out of memory 2015-02-23 20:13:16.092+: 2133: error : virExec:491 : Cannot find 'pm-is-supported' in path: No such file or directory 2015-02-23 20:13:16.092+: 2133: warning : virQEMUCapsInit:999 : Failed to get host power management capabilities As are these two. 
2015-02-23 20:13:16.400+: 2133: error : virFirewallApply:936 : out of memory Last time Ian and I debugged a libvirt crashing bug, out of memory didn't cause libvirtd to exit. It turned out it's some bug in libxl event machinery that caused libvirt to exit, but the assertion message was not shown anywhere. I think we might need to login to that host and run libvirtd in foreground to determine what goes wrong. Wei. Has these OOM messages resulted in libvirtd exiting? I don't see any evidence of a crash elsewhere in the logs (i.e. no process segfaulted in dmesg, no OOM killing going on etc). We don't seem to collect dom0 freemem info, but that most likely wouldn't help given the libvirtd process has exited. Any ideas where to look next? Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/5] x86: allow specifying the NUMA nodes Dom0 should run on
On 27.02.15 at 11:04, dario.faggi...@citrix.com wrote:
On Fri, 2015-02-27 at 08:46 +, Jan Beulich wrote:
On 26.02.15 at 18:14, dario.faggi...@citrix.com wrote:
On Thu, 2015-02-26 at 13:52 +, Jan Beulich wrote:

+### dom0\_nodes
+
+ `= integer[,...]`
+
+Specify the NUMA nodes to place Dom0 on. Defaults for vCPU-s created
+and memory assigned to Dom0 will be adjusted to match the node
+restrictions set up here. Note that the values to be specified here are
+ACPI PXM ones, not Xen internal node numbers.
+

Why use PXM ids? It might be me being much more used to work with NUMA node ids, but wouldn't the other way round be more consistent (almost everything the user interacts with after boot speaks node ids) and easier for the user to figure things out (e.g., with tools like numactl on baremetal)?

This way behavior doesn't change if internally in the hypervisor we need to change the mapping from PXMs to node IDs.

Ok, I see the value of this. I'm still a bit concerned about the fact that everything else speaks NUMA node, but it's probably just me being much more used to that than to PXMs. :-)

With "everything else" I suppose you mean the tool stack? There shouldn't be any node IDs kept across reboots there. Yet the consistent behavior to be achieved here is particularly for multiple boots.

+static struct vcpu *__init setup_vcpu(struct domain *d, unsigned int vcpu_id,
+                                      unsigned int cpu)
+{
+    struct vcpu *v = alloc_vcpu(d, vcpu_id, cpu);
+
+    if ( v )
+    {
+        if ( !d->is_pinned )
+            cpumask_copy(v->cpu_hard_affinity, &dom0_cpus);
+        cpumask_copy(v->cpu_soft_affinity, &dom0_cpus);
+    }
+

About this, for DomUs, now that we have soft affinity available, what we do is set only soft affinity to match the NUMA placement. I think I see and agree why we want to be 'more strict' in Dom0, but I felt like it was worth pointing out the difference in behaviour (should it be documented somewhere?).

I'm simply adjusting what sched_init_vcpu() did, which is alter hard affinity conditionally on is_pinned and soft affinity unconditionally.

Ok, I understand the idea behind this better now, thanks.

[...]

Setting soft affinity as a superset of (in the former case) or equal to (in the latter) hard affinity is just pure overhead, when in the scheduler.

Then why does sched_init_vcpu() do what it does? If you want to alter that, I'm fine with altering it here.

In fact, if the scheduler sees that soft affinity is defined, it will go through the load balancing/vcpu placement logic twice, the first time using the soft affinity mask, the second using the hard affinity one. Actually, the first time it uses 'soft & hard', which in these cases is exactly equal to hard, and that's why I'm calling this pure overhead.

I probably should add checks in the scheduler to identify such situations as "no need to consider soft affinity". I thought about this before, but didn't do that because it's more cpumask_foo() fiddling in a few hot paths... but of course I can check for the relationship between hard and soft affinity masks upfront, cache the result in a bool_t, and use _that_ in hot paths... what do you think?

Avoiding the fiddling in hot paths is surely desirable. But it would indeed seem even better to avoid the inefficiency in the first place (i.e. when storing affinities).

All this being said, I still would avoid putting the system in a configuration where soft is a superset of or equal to hard, at the very least not automatically, as I think it can appear confusing to the user (the user himself can, of course, do that after boot, for Dom0 or DomUs, but that's another story, I think).

So I'm now thinking whether it wouldn't be better to, in this patch, leave soft affinity alone completely. Then, if we want to make it possible to tweak soft affinity, we can allow for something like dom0_nodes=soft:1,3 and, in that case, alter soft affinity only.

Hmm, not sure. And I keep being confused whether soft means allow and hard means prefer or the other way around. In any event, again, with sched_init_vcpu() setting up things so that soft is a superset of hard (and most likely they're equal), I don't see why the same done here would be more of a problem.

BTW, mostly out of curiosity, I've had a few strange issues/conflicts in applying this on top of staging, in order to test it... Was it me doing something very stupid, or was this based on something different?

Apart from the one patch named in the cover letter there shouldn't be any other dependencies. Without you naming the issues you encountered, I can't tell.

I see. Never mind then, maybe I messed up with my various branches... Sorry for bothering with this. :-)

No reason to be sorry - I'm more than happy if inconsistencies get pointed out before trying to commit anything.

Jan
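
The idea floated above (work out the relationship between the hard and soft masks once, cache the answer, and consult only the cached flag in hot paths) can be sketched as follows. This is a toy model using a 64-bit integer as the CPU mask, not Xen's cpumask API or scheduler code; all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t toy_cpumask; /* one bit per pCPU, toy stand-in */

struct toy_vcpu {
	toy_cpumask hard;
	toy_cpumask soft;
	bool soft_matters;    /* cached when affinities are stored */
};

/* A separate soft-affinity pass only helps if 'soft & hard' is a
 * non-empty, strict subset of hard. If it equals hard (soft is a
 * superset of or equal to hard), the first pass repeats the second. */
static bool toy_soft_is_useful(toy_cpumask hard, toy_cpumask soft)
{
	toy_cpumask effective = soft & hard;

	return effective != 0 && effective != hard;
}

/* Slow path: store the masks and cache the relationship. */
static void toy_set_affinity(struct toy_vcpu *v, toy_cpumask hard,
			     toy_cpumask soft)
{
	v->hard = hard;
	v->soft = soft;
	v->soft_matters = toy_soft_is_useful(hard, soft);
}

/* Hot path: number of balancing passes, decided by the cached flag. */
static int toy_balance_passes(const struct toy_vcpu *v)
{
	return v->soft_matters ? 2 : 1;
}
```

In this model a Dom0 vCPU whose soft mask equals its hard mask, as set up by the patch, would take a single pass, which is the overhead saving being discussed.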
Re: [Xen-devel] [PATCH 3/4] xen: sched: make counters for vCPU tickling generic
On Fri, 2015-02-27 at 00:47 -0500, Meng Xu wrote:
2015-02-26 8:37 GMT-05:00 Dario Faggioli dario.faggi...@citrix.com:

and update them from Credit2 and RTDS schedulers.

Signed-off-by: Dario Faggioli dario.faggi...@citrix.com
Cc: Meng Xu xumengpa...@gmail.com
Cc: George Dunlap george.dun...@eu.citrix.com
Cc: Jan Beulich jbeul...@suse.com
Cc: Keir Fraser k...@xen.org
---
 xen/common/sched_credit2.c   |2 ++
 xen/common/sched_rt.c        |2 ++
 xen/include/xen/perfc_defn.h |4 ++--
 3 files changed, 6 insertions(+), 2 deletions(-)

The change for RTDS scheduler looks good to me.

Does this count as a

Reviewed-by: Meng Xu men...@cis.upenn.edu

? Also, if yes, does it also apply to patch #2? That is unclear, as sched_rt.c is modified in patches #1, #2 and #3, while what you did is:

- you explicitly provided the tag for patch #1
- you said "looks good" for patch #3
- you said nothing for patch #2

The bottom line of all this being: with Ack-s/Reviewed-by-s, it's always better to be pretty explicit! :-D

Thanks and Regards,
Dario
Re: [Xen-devel] freemem-slack and large memory environments
On Thu, 26 Feb 2015, Mike Latimer wrote:
On Thursday, February 26, 2015 01:45:16 PM Mike Latimer wrote:
On Thursday, February 26, 2015 05:53:06 PM Stefano Stabellini wrote:

What is the return value of libxl_set_memory_target and libxl_wait_for_free_memory in that case? Isn't it just a matter of properly handling the return values?

The return from libxl_set_memory_target is 0, as the assignment works just fine. I don't have the return from libxl_wait_for_free_memory in my notes, so I'll spin up another test and track that down.

I slightly misspoke here... In my testing, the returns are actually:

  libxl_set_memory_target = 1

The new memory target is set for dom0 successfully.

  libxl_wait_for_free_memory = -5

Still there isn't enough free memory in the system.

  libxl_wait_for_memory_target = 0

However dom0 reached the new memory target already. Who is stealing your memory?

Note - libxl_wait_for_memory_target is confusing, as rc can be set to ERROR_FAIL, but the function returns 0 anyway (unless an error is encountered earlier.) I guess this just means we need to continue to wait...

Maybe I am misunderstanding what you meant, but as far as I can tell rc is set to ERROR_FAIL only right before the out label in libxl_wait_for_memory_target. In that case the function would return ERROR_FAIL. In any case, in the context of libxl_wait_for_memory_target, ERROR_FAIL means that the memory target has not been reached.

I was testing spinning up a 64GB guest on a 2TB host. After the ballooning had completed, dom0 had ballooned down an extra ~320GB. On this particular machine, each iteration of the loop was showing only 5-7GB of memory being freed at a time. (The loop took 12 iterations.)

I would investigate why dom0 is ballooning down as much as you asked it to, but the free memory in the system is still not enough.
Re: [Xen-devel] Poor network performance between DomU with multiqueue support
Cc'ing David (XenServer kernel maintainer) On Fri, Feb 27, 2015 at 05:21:11PM +0800, openlui wrote: On Mon, Dec 08, 2014 at 01:08:18PM +, Zhangleiqiang (Trump) wrote: On Mon, Dec 08, 2014 at 06:44:26AM +, Zhangleiqiang (Trump) wrote: On Fri, Dec 05, 2014 at 01:17:16AM +, Zhangleiqiang (Trump) wrote: [...] I think that's expected, because guest RX data path still uses grant_copy while guest TX uses grant_map to do zero-copy transmit. As far as I know, there are three main grant-related operations used in split device model: grant mapping, grant transfer and grant copy. Grant transfer has not used now, and grant mapping and grant transfer both involve TLB refresh work for hypervisor, am I right? Or only grant transfer has this overhead? Transfer is not used so I can't tell. Grant unmap causes TLB flush. I saw in an email the other day XenServer folks has some planned improvement to avoid TLB flush in Xen to upstream in 4.6 window. I can't speak for sure it will get upstreamed as I don't work on that. Does grant copy surely has more overhead than grant mapping? At the very least the zero-copy TX path is faster than previous copying path. But speaking of the micro operation I'm not sure. There was once persistent map prototype netback / netfront that establishes a memory pool between FE and BE then use memcpy to copy data. Unfortunately that prototype was not done right so the result was not good. The newest mail about persistent grant I can find is sent from 16 Nov 2012 (http://lists.xen.org/archives/html/xen-devel/2012-11/msg00832.html). Why is it not done right and not merged into upstream? AFAICT there's one more memcpy than necessary, i.e. frontend memcpy data into the pool then backend memcpy data out of the pool, when backend should be able to use the page in pool directly. Memcpy should cheaper than grant_copy because the former needs not the hypercall which will cause VM Exit to XEN Hypervisor, am I right? 
For RX path, using memcpy based on persistent grant table may have higher performance than using grant copy now. In theory yes. Unfortunately nobody has benchmarked that properly. I have done some testing of RX performance using the persistent grant method and the upstream method (3.17.4 branch); the results show that the persistent grant method does have higher performance than the upstream method (from 3.5Gbps to about 6Gbps). And I find that the persistent grant mechanism is already used in blkfront/blkback, so I am wondering why there are no efforts to replace the grant copy by persistent grant now, at least in the RX path. Are there other disadvantages in the persistent grant method which stop us from using it? I've seen numbers better than 6Gbps. See upstream changeset 1650d5455bd2dc6b5ee134bd6fc1a3236c266b5b. Persistent grant is not a silver bullet. There is an email thread on the list discussing whether it should be removed in the block driver. XenServer folks have been working on improving network performance. It's my understanding that they chose different routes than persistent grant. David might have more insight. Wei. PS. I used pkt-gen to send packets from dom0 to a domU running on another dom0; the CPUs of both dom0s are Intel E5640 2.4GHz, and the two dom0s are connected with a 10GE NIC.
Re: [Xen-devel] [Qemu-devel] [v2][PATCH] libxl: add one machine property to support IGD GFX passthrough
On Fri, 2015-02-27 at 14:28 +0800, Chen, Tiejun wrote: On 2015/2/27 0:17, Ian Campbell wrote: On Thu, 2015-02-26 at 14:35 +0800, Chen, Tiejun wrote: If we are going to do this then I think we need to arrange for the interface to be able to express the need to force the workarounds for a particular device. IOW a boolean will not suffice since it doesn't indicate that IGD workarounds are needed. Probably it would be simplest to just leave this functionality out for the time being and revisit if/when maintaining the list becomes an annoyance or an end user trips over it. You mean we should maintain one list to save all targeted devices, then tools uses ids as an index to lookup this list to pass something to qemu. I (think I) meant a list of pci vid:did in libxl, which is matched against the devices passed to the domain (e.g. pci = [...] in xl cfg), which then enables the igd workarounds, i.e. by passing the option to qemu. Yeah, this is exactly what I'm understanding. But actually one question that I have always been thinking about is: is it really a responsibility of Xen to determine which device type should be passed by probing that pair of vendor and device ids? Xen is just one of so many approaches to qemu so such a rare workaround option can be passed actively by any user, instead of Xen. Furthermore, it also becomes more flexible for those cases where we want to force overriding this. I'm not sure, but I think you are suggesting that qemu should autodetect this situation, without being explicitly told igd-passthru=on on the command line? If the qemu maintainers are amenable to that, and it's not already the case that other components (e.g. hvmloader) need to be told about these workarounds, then I suppose that would work. So I think qemu should mainly play this role. If qemu realizes we're passing through an IGD or other targeted device, it should post a warning or even error message to indicate what right behavior is needed, or what is that potential risk by default.
Hrm, here it sounds more like you are suggesting that qemu should detect and warn, rather than detect and do the right thing? I'm not sure how Qemu could indicate what the right behaviour is going to be, it'll differ for different hypervisors or even for which Xen toolstack (xl vs libvirt etc) is in use. Or maybe I've misunderstood? IGD is a tricky case since Qemu has to construct an ISA bridge and host bridge before we pass the IGD device. But we don't like to expose these two bridges unconditionally, and this is also why we need this option. Here I just mean when Qemu realizes IGD is passed through but without that appropriate option set, Qemu can post something to explicitly notify the user that this option is needed in his case. But it may be a lazy idea. In any case I think the addition of such warnings in qemu is separate from the discussion in this thread, so I propose to leave it alone for now. So now I think I'd better go back to handling this on the Xen side with your comments. As you said the Boolean doesn't suffice to indicate that IGD workarounds are needed. So I think we can reuse that existing bool 'gfx_passthru'. Firstly we can redefine this as string, Unfortunately not since libxl's API guarantee requires older clients to keep working, i.e. those who use libxl_defbool_set on this field. Probably the best which can be done is to deprecate this field in favour of a new one (the old field would need to be obeyed only if the new one was set to its default value). Probably an Enumeration would be better than a raw string here as well. This approach doesn't allow for the possibility of multiple such workarounds though. It's unclear to me if this matters or not. The other option which I've mentioned is to leave gfx_passthru and have libxl figure out which workarounds to enable based on the set of PCI devices passed through. I guess you don't like that approach? (due to the need to maintain the pci vid:did list?)
-    ("gfx_passthru", libxl_defbool),
+    ("gfx_passthru", string),
Then
+
+    if (libxl__is_igd_vga_passthru(gc, guest_config) ||
+        (b_info->u.hvm.gfx_passthru &&
+         strncmp(b_info->u.hvm.gfx_passthru, "igd", 3) == 0)) {
+        machinearg = GCSPRINTF("%s,igd-passthru=on", machinearg);
+    }
+
Of course we need to modify something else to align with this change. Thanks Tiejun
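Ian's suggestion above (keep the old boolean for existing clients, add a new enumeration, and obey the old field only while the new one is at its default) could be sketched roughly as follows. This is only an illustration of that precedence rule: the enum names, the `tristate` stand-in for libxl_defbool, and the function `want_igd_workarounds()` are hypothetical and are not libxl API, and the fallback to device detection when only the old boolean is set is one possible reading of the thread.

```c
/* Hypothetical stand-in for libxl_defbool: unset, false, or true. */
enum tristate { TRI_DEFAULT = 0, TRI_FALSE, TRI_TRUE };

/* Hypothetical new field: which passthrough workarounds are requested. */
enum gfx_passthru_kind {
    GFX_PASSTHRU_KIND_DEFAULT = 0, /* not set: fall back to the old field */
    GFX_PASSTHRU_KIND_NONE,        /* explicitly no workarounds */
    GFX_PASSTHRU_KIND_IGD          /* explicitly force the IGD workarounds */
};

/*
 * Decide whether the IGD workarounds (e.g. igd-passthru=on) are wanted.
 * The new enum wins whenever it is not at its default value; otherwise
 * the deprecated boolean is obeyed, and because a bare boolean cannot
 * say "IGD", the answer then depends on whether an IGD was detected
 * among the passed-through devices (igd_detected != 0).
 */
static int want_igd_workarounds(enum tristate old_gfx_passthru,
                                enum gfx_passthru_kind kind,
                                int igd_detected)
{
    if (kind == GFX_PASSTHRU_KIND_IGD)
        return 1;
    if (kind == GFX_PASSTHRU_KIND_NONE)
        return 0;
    /* New field left at default: only the old boolean is available. */
    return old_gfx_passthru == TRI_TRUE && igd_detected;
}
```

With this shape an older client that only sets the boolean keeps its current behaviour, while a newer client can force or suppress the workarounds regardless of detection.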
Re: [Xen-devel] freemem-slack and large memory environments
On Fri, 27 Feb 2015, Ian Campbell wrote: On Thu, 2015-02-26 at 13:38 -0700, Mike Latimer wrote: (Sorry for the delayed response, dealing with ENOTIME.) On Thursday, February 26, 2015 05:47:21 PM Ian Campbell wrote: On Thu, 2015-02-26 at 10:38 -0700, Mike Latimer wrote: rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0); I think so. In essence we just need to update need_memkb on each iteration, right? Not quite... Indeed, looking again I see that the 1 there means relative, so I'm still confused about why free_memkb - need_memkb isn't the correct delta on every iteration. Is the issue that if you have a current target of, say, 15 and you wish to go to ten you would say libxl_set_memory_target(, 15 - (-5), 1, 0) i.e. libxl_set_memory_target(, -5, 1, 0) then the target would be set to 10, but if during libxl_wait_for_free_memory you only ballooned -2 and failed the target gets left at 10 but the current free is actually now 13 so next time around you say: libxl_set_memory_target(, 13 - (-3), 1, 0) i.e. libxl_set_memory_target(, -3, 1, 0) and the target now becomes 10-3 == 7, rather than 13-3=10 as one might expect? need_memkb is used in the loop to determine if we have enough free memory for the new domain. So, need_memkb should always remain set to the total amount of memory requested - not just the amount of change still required. The easiest thing to do is set the dom0's memory target before the loop, which is what my original patch did. It seems like there are two viable approaches here: First is to just set the target before the loop and wait (perhaps much longer) for it to be achieved. The second is to decrement the target in smaller steps and wait to reach it each time. I don't think an approach which sets a target, waits for that target to be achieved and then on partial success tries to figure out what the relative progress is and what is left to achieve and factor that into a new target request makes sense. 
The reason for the loop is not to make the memory decrease request more digestible for dom0 or to cope with errors. The loop tries to handle scenarios where the freed memory is not available to us somehow. This is a more wordy explanation of it:
- get free memory
- is it enough? if so, return, otherwise continue
- set dom0 memory target = current - need
- is there enough memory now? if so, return, otherwise continue
- has dom0 actually reached his target? If so, loop again (who stole the memory?), otherwise fail (dom0 is busy)
This is consistent with Mike's logs: the memory is freed by dom0 but it is not available somehow. Maybe XenD is running? Another guest is ballooning up at the same time? This is all confounded by the fact that the libxl_wait_for_free_* functions have a barking interface. That is true I've just seen this comment right above:
/*
 * WARNING
 * This memory management API is unstable even in Xen 4.2.
 * It has a numer of deficiencies and we intend to replace it.
 *
 * The semantics of these functions should not be relied on to be very
 * coherent or stable. We will however endeavour to keep working
 * existing programs which use them in roughly the same way as libxl.
 */
Given that I think that we should feel free, if necessary, to deprecate the current interface and replace it with one which is actually usable. Whatever that might mean. Ian.
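The loop Stefano describes above can be sketched in C as follows. This is a simulation under stated assumptions, not xl's actual code: `struct sim_host`, `sim_wait()` and `freemem_sketch()` are hypothetical names, memory is modelled with plain counters instead of being queried through libxl, and `stolen_kb` models "someone else takes the freed memory" (XenD, another guest ballooning up).

```c
#include <stdint.h>

/* Hypothetical simulated host; nothing here calls libxl or Xen. */
struct sim_host {
    int64_t free_kb;        /* memory currently free in the system */
    int64_t dom0_kb;        /* memory dom0 currently holds */
    int64_t dom0_target_kb; /* dom0's balloon target */
    int64_t step_kb;        /* most dom0 can release during one wait */
    int64_t stolen_kb;      /* free memory taken by someone else per wait */
};

enum freemem_result {
    FREEMEM_OK = 0,         /* enough free memory for the new domain */
    FREEMEM_DOM0_BUSY = -1, /* dom0 did not reach its target in time */
    FREEMEM_GAVE_UP = -2    /* target reached every time, memory vanished */
};

static int64_t min_kb(int64_t a, int64_t b) { return a < b ? a : b; }

/* One wait period: dom0 balloons towards its target, then memory is stolen. */
static void sim_wait(struct sim_host *h)
{
    int64_t todo = h->dom0_kb - h->dom0_target_kb;
    int64_t freed = todo > 0 ? min_kb(todo, h->step_kb) : 0;

    h->dom0_kb -= freed;
    h->free_kb += freed;
    h->free_kb -= min_kb(h->stolen_kb, h->free_kb);
}

static enum freemem_result freemem_sketch(struct sim_host *h,
                                          int64_t need_kb, int retries)
{
    while (retries-- > 0) {
        /* get free memory; is it enough? */
        if (h->free_kb >= need_kb)
            return FREEMEM_OK;

        /* set dom0 memory target = current - need (a relative, negative delta) */
        h->dom0_target_kb += h->free_kb - need_kb;

        /* wait, then: is there enough memory now? */
        sim_wait(h);
        if (h->free_kb >= need_kb)
            return FREEMEM_OK;

        /* has dom0 reached its target? if not, fail (dom0 is busy) */
        if (h->dom0_kb != h->dom0_target_kb)
            return FREEMEM_DOM0_BUSY;

        /* target reached but memory is missing: loop again */
    }
    return FREEMEM_GAVE_UP;
}
```

In this model a dom0 that balloons slowly (small `step_kb`) takes the "dom0 is busy" exit on the first pass, which resembles the failure Mike reports on the 2TB host, while a non-zero `stolen_kb` makes the target drop further on every iteration.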
Re: [Xen-devel] freemem-slack and large memory environments
On Friday, February 27, 2015 11:29:12 AM Mike Latimer wrote: On Friday, February 27, 2015 08:28:49 AM Mike Latimer wrote: After adding 2048aeec, dom0's target is lowered by the required amount (e.g. 64GB), but as dom0 cannot balloon down fast enough, libxl_wait_for_memory_target returns -5, and the domain create fails (wrong return code - libxl_wait_for_memory_target actually returns -3) With libxl_wait_for_memory_target return code corrected (2048aeec), debug messages look like this:
Parsing config from sles12pv
DBG: start freemem loop
DBG: free_memkb = 541976, need_memkb = 67651584 (rc=0)
DBG: dom0_curr_target = 2118976472, set_memory_target = -67109608 (rc=1)
DBG: wait_for_free_memory = 67651584 (rc=-5)
DBG: wait_for_memory_target (rc=-3)
failed to free memory for the domain
After failing, dom0 continues to balloon down by the requested amount (-67109608), so a subsequent startup attempt would work. My original fix (2563bca1) was intended to continue looping in freemem until dom0 ballooned down the requested amount. However, this really only worked without 2048aeec, as wait_for_memory_target was always returning 0. After Stefano pointed out this problem, commit 2563bca1 can still be useful - but seems less important as ballooning down dom0 is where the major delays are seen. The following messages show what was happening when wait_for_memory_target was always returning 0. I've narrowed it down to just the interesting messages:
DBG: free_memkb = 9794852, need_memkb = 67651584 (rc=0)
DBG: dom0_curr_target = 2118976464, set_memory_target = -67109596 (rc=1)
DBG: dom0_curr_target = 2051866868, set_memory_target = -57856732 (rc=1)
DBG: dom0_curr_target = 1994010136, set_memory_target = -50615004 (rc=1)
DBG: dom0_curr_target = 1943395132, set_memory_target = -43965148 (rc=1)
DBG: dom0_curr_target = 1899429984, set_memory_target = -37538524 (rc=1)
DBG: dom0_curr_target = 1861891460, set_memory_target = -31560412 (rc=1)
DBG: dom0_curr_target = 1830331048, set_memory_target = -25309916 (rc=1)
DBG: dom0_curr_target = 1805021132, set_memory_target = -19514076 (rc=1)
DBG: dom0_curr_target = 1785507056, set_memory_target = -13949660 (rc=1)
DBG: dom0_curr_target = 1771557396, set_memory_target = -8057564 (rc=1)
DBG: dom0_curr_target = 1763499832, set_memory_target = -1862364 (rc=1)
The above situation is no longer relevant, but the overall dom0 target problem is still an issue. It now seems rather obvious (hopefully) that the 10 second delay in wait_for_memory_target is not sufficient. Should that function be modified to monitor ongoing progress and continue waiting as long as progress is being made? Sorry for the long discussion to get to this point. :( -Mike
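Mike's closing question (keep waiting as long as dom0 is still making progress, instead of using a fixed delay) could look roughly like the sketch below. It is an assumption-laden illustration, not a proposed patch: `wait_while_progress()`, the `read_kb_fn` callback and the `sim_dom0` model are hypothetical, and a real implementation would sleep between polls and read dom0's memory through libxl rather than from a simulated counter.

```c
#include <stdint.h>

/* Callback returning dom0's current memory in kB (hypothetical interface). */
typedef int64_t (*read_kb_fn)(void *opaque);

/*
 * Poll until dom0 is at or below target_kb. Give up only after max_stalls
 * consecutive polls without any decrease, so a slow but steadily ballooning
 * dom0 is never timed out. Returns 0 on success, -1 if progress stopped.
 */
static int wait_while_progress(read_kb_fn read_current_kb, void *opaque,
                               int64_t target_kb, int max_stalls)
{
    int64_t prev = read_current_kb(opaque);
    int stalls = 0;

    while (prev > target_kb) {
        /* Real code would sleep here before polling again. */
        int64_t now = read_current_kb(opaque);

        if (now < prev) {
            stalls = 0;          /* progress was made: reset the stall counter */
            prev = now;
        } else if (++stalls >= max_stalls) {
            return -1;           /* no progress for too long */
        }
    }
    return 0;
}

/* Simulated dom0: releases step_kb per poll until it hits floor_kb. */
struct sim_dom0 {
    int64_t cur_kb;
    int64_t step_kb;
    int64_t floor_kb;
};

static int64_t sim_read(void *opaque)
{
    struct sim_dom0 *d = opaque;
    int64_t value = d->cur_kb;

    if (d->cur_kb - d->step_kb >= d->floor_kb)
        d->cur_kb -= d->step_kb;
    return value;
}
```

The design choice here is that the timeout bounds the time without progress rather than the total time, which is what would let a large dom0 freeing only a few GB per interval finish ballooning.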
Re: [Xen-devel] RFC: xen config changes v4
On 02/27/2015 10:41 AM, Stefano Stabellini wrote: On Fri, 27 Feb 2015, Juergen Gross wrote: On 02/26/2015 06:42 PM, Stefano Stabellini wrote: On Thu, 26 Feb 2015, Luis R. Rodriguez wrote: On Thu, Feb 26, 2015 at 11:08:20AM +, Stefano Stabellini wrote: On Thu, 26 Feb 2015, David Vrabel wrote: On 26/02/15 04:59, Juergen Gross wrote: So we are again in the situation that pv-drivers always imply the pvops kernel (PARAVIRT selected). I started the whole Kconfig rework to eliminate this dependency. Yes. Can you produce a series that just addresses this one issue. In the absence of any concrete requirement for this big Kconfig reorg I I don't think it is helpful. I clearly missed some context as I didn't realize that this was the intended goal. Why do we want this? Please explain as it won't come for free. We have a few PV interfaces for HVM guests that need PARAVIRT in Linux in order to be used, for example pv_time_ops and HVMOP_pagetable_dying. They are critical performance improvements and from the interface perspective, small enough that doesn't make much sense having a separate KConfig option for them. In order to reach the goal above we necessarily need to introduce a differentiation in terms of PV on HVM guests in Linux: 1) basic guests with PV network, disk, etc but no PV timers, no HVMOP_pagetable_dying, no PV IPIs 2) full PV on HVM guests that have PV network, disk, timers, HVMOP_pagetable_dying, PV IPIs and anything else that makes sense. 2) is much faster than 1) on Xen and 2) is only a tiny bit slower than 1) on native x86 Also don't we shove 2) down hvm guests right now? Even when everything is built in I do not see how we opt out for HVM for 1) at run time right now. If this is true then the question of motivation for this becomes even stronger I think. Yes, indeed there is no way to do 1) at the moment. And for good reasons, see above. 
Hmm, after checking the code I'm not convinced: - HVMOP_pagetable_dying is obsolete on modern hardware supporting EPT/HAP That might be true, but what about older hardware? Even on modern hardware a few workloads still run faster on shadow. But if HVMOP_pagetable_dying is the only reason to keep PARAVIRT for HVM guests, then I agree with you that we should remove it. - PV IPIs are not needed on single-vcpu guests - PARAVIRT_CLOCK doesn't need PARAVIRT (in fact the SUSEs kernel configs for all x86_64 kernels have CONFIG_PARAVIRT_CLOCK=y) So I think we really should enable building Xen frontends without PARAVIRT, implying at least no XEN_PV and no XEN_PVH. I'll have a try setting up patches. If we are doing this as a performance improvement, I would like to see a couple of benchmarks (kernbench, hackbench) to show that on a single-vcpu guest and multi-vcpu guest (let's say 4 vcpus) disabling PARAVIRT leads to better performance on Xen on EPT hardware. This is not meant to be a performance improvement. It is meant to enable a standard distro kernel configured without PARAVIRT to be able to run as a HVM guest using the pv-drivers. Juergen ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3] RFC: Automatically check xen's public headers for C++ pitfalls.
At 15:28 -0500 on 26 Feb (1424960919), Don Slutz wrote: On 02/26/15 11:24, Tim Deegan wrote: Explicitly _not_ addressing the use of 'private' in various fields, since we'd previously decided not to fix that. This sentence and the -Dprivate=private_is_a_keyword_in_cpp below appear to be at odds. Yes, that's not very clear; will reword as I apply. You can add my Tested-by: Don Slutz dsl...@verizon.com Thanks. Tim.
Re: [Xen-devel] RFC: xen config changes v4
On Thu, 2015-02-26 at 19:48 +0100, Luis R. Rodriguez wrote: On Thu, Feb 26, 2015 at 05:42:57PM +, Stefano Stabellini wrote: On Thu, 26 Feb 2015, Luis R. Rodriguez wrote: On Thu, Feb 26, 2015 at 11:08:20AM +, Stefano Stabellini wrote: On Thu, 26 Feb 2015, David Vrabel wrote: On 26/02/15 04:59, Juergen Gross wrote: So we are again in the situation that pv-drivers always imply the pvops kernel (PARAVIRT selected). I started the whole Kconfig rework to eliminate this dependency. Yes. Can you produce a series that just addresses this one issue. In the absence of any concrete requirement for this big Kconfig reorg I I don't think it is helpful. I clearly missed some context as I didn't realize that this was the intended goal. Why do we want this? Please explain as it won't come for free. We have a few PV interfaces for HVM guests that need PARAVIRT in Linux in order to be used, for example pv_time_ops and HVMOP_pagetable_dying. They are critical performance improvements and from the interface perspective, small enough that doesn't make much sense having a separate KConfig option for them. In order to reach the goal above we necessarily need to introduce a differentiation in terms of PV on HVM guests in Linux: 1) basic guests with PV network, disk, etc but no PV timers, no HVMOP_pagetable_dying, no PV IPIs 2) full PV on HVM guests that have PV network, disk, timers, HVMOP_pagetable_dying, PV IPIs and anything else that makes sense. 2) is much faster than 1) on Xen and 2) is only a tiny bit slower than 1) on native x86 Also don't we shove 2) down hvm guests right now? Even when everything is built in I do not see how we opt out for HVM for 1) at run time right now. If this is true then the question of motivation for this becomes even stronger I think. Yes, indeed there is no way to do 1) at the moment. And for good reasons, see above. 
OK if the goal is to be able to build front end drivers by avoiding building PARAVIRT / PARAVIRT_CLOCK and if the gains to be able to do so (which haven't been stated other than just the ability to do so) are small (as Stefano notes simple hvm containers do not perform great) I may have misunderstood this bit, WRT this last parenthetical: adding PV I/O drivers to an HVM guest is AFAIAA the single biggest improvement you can make to a bare HVM guest in terms of performance. There are indeed additional gains to be had from other PV stuff which Stefano mentions (clocks etc), but I believe those are all mostly incremental and not as impressive as the PV I/O gains (but still good improvements). That's not to say that there's an argument in the context of Linux that if you can enable PV I/O then you can also enable other PV optimisations, but I thought I would mention it. Wasn't part of the original point here to be able to enable PV I/O (and perhaps other PV stuff) for non-PAE 32-bit x86, i.e. in a context where PVMMU isn't available. (That doesn't necessarily conflict with if you can enable PV I/O then you can also enable other PV optimisations though) Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] VT-d: print_vtd_entries() should cope with superpages
On 27/02/15 at 10:52, Jan Beulich wrote: Even if VT-d code alone (i.e. when not sharing tables with EPT) still doesn't support superpages, this function - invoked upon DMA remapping faults - needs to cope with such. While at it also replace a few more plain numbers with suitable named constants. Signed-off-by: Jan Beulich jbeul...@suse.com Thanks for this, looks fine to me: Acked-by: Roger Pau Monné roger@citrix.com
Re: [Xen-devel] RFC: xen config changes v4
On Fri, 27 Feb 2015, Juergen Gross wrote: On 02/27/2015 10:41 AM, Stefano Stabellini wrote: On Fri, 27 Feb 2015, Juergen Gross wrote: On 02/26/2015 06:42 PM, Stefano Stabellini wrote: On Thu, 26 Feb 2015, Luis R. Rodriguez wrote: On Thu, Feb 26, 2015 at 11:08:20AM +, Stefano Stabellini wrote: On Thu, 26 Feb 2015, David Vrabel wrote: On 26/02/15 04:59, Juergen Gross wrote: So we are again in the situation that pv-drivers always imply the pvops kernel (PARAVIRT selected). I started the whole Kconfig rework to eliminate this dependency. Yes. Can you produce a series that just addresses this one issue. In the absence of any concrete requirement for this big Kconfig reorg I I don't think it is helpful. I clearly missed some context as I didn't realize that this was the intended goal. Why do we want this? Please explain as it won't come for free. We have a few PV interfaces for HVM guests that need PARAVIRT in Linux in order to be used, for example pv_time_ops and HVMOP_pagetable_dying. They are critical performance improvements and from the interface perspective, small enough that doesn't make much sense having a separate KConfig option for them. In order to reach the goal above we necessarily need to introduce a differentiation in terms of PV on HVM guests in Linux: 1) basic guests with PV network, disk, etc but no PV timers, no HVMOP_pagetable_dying, no PV IPIs 2) full PV on HVM guests that have PV network, disk, timers, HVMOP_pagetable_dying, PV IPIs and anything else that makes sense. 2) is much faster than 1) on Xen and 2) is only a tiny bit slower than 1) on native x86 Also don't we shove 2) down hvm guests right now? Even when everything is built in I do not see how we opt out for HVM for 1) at run time right now. If this is true then the question of motivation for this becomes even stronger I think. Yes, indeed there is no way to do 1) at the moment. And for good reasons, see above. 
Hmm, after checking the code I'm not convinced: - HVMOP_pagetable_dying is obsolete on modern hardware supporting EPT/HAP That might be true, but what about older hardware? Even on modern hardware a few workloads still run faster on shadow. But if HVMOP_pagetable_dying is the only reason to keep PARAVIRT for HVM guests, then I agree with you that we should remove it. - PV IPIs are not needed on single-vcpu guests - PARAVIRT_CLOCK doesn't need PARAVIRT (in fact the SUSEs kernel configs for all x86_64 kernels have CONFIG_PARAVIRT_CLOCK=y) So I think we really should enable building Xen frontends without PARAVIRT, implying at least no XEN_PV and no XEN_PVH. I'll have a try setting up patches. If we are doing this as a performance improvement, I would like to see a couple of benchmarks (kernbench, hackbench) to show that on a single-vcpu guest and multi-vcpu guest (let's say 4 vcpus) disabling PARAVIRT leads to better performance on Xen on EPT hardware. This is not meant to be a performance improvement. It is meant to enable a standard distro kernel configured without PARAVIRT to be able to run as a HVM guest using the pv-drivers. This is not a convincing explanation. Debian, Ubuntu and Fedora seems to be able to cope with it just fine. Why do you want to do that, even though it will cause a performance regression and a maintenance pain? You haven't provided a reason yet. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] freemem-slack and large memory environments
On Thu, 2015-02-26 at 13:38 -0700, Mike Latimer wrote: (Sorry for the delayed response, dealing with ENOTIME.) On Thursday, February 26, 2015 05:47:21 PM Ian Campbell wrote: On Thu, 2015-02-26 at 10:38 -0700, Mike Latimer wrote: rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0); I think so. In essence we just need to update need_memkb on each iteration, right? Not quite... Indeed, looking again I see that the 1 there means relative, so I'm still confused about why free_memkb - need_memkb isn't the correct delta on every iteration. Is the issue that if you have a current target of, say, 15 and you wish to go to ten you would say libxl_set_memory_target(, 15 - (-5), 1, 0) i.e. libxl_set_memory_target(, -5, 1, 0) then the target would be set to 10, but if during libxl_wait_for_free_memory you only ballooned -2 and failed the target gets left at 10 but the current free is actually now 13 so next time around you say: libxl_set_memory_target(, 13 - (-3), 1, 0) i.e. libxl_set_memory_target(, -3, 1, 0) and the target now becomes 10-3 == 7, rather than 13-3=10 as one might expect? need_memkb is used in the loop to determine if we have enough free memory for the new domain. So, need_memkb should always remain set to the total amount of memory requested - not just the amount of change still required. The easiest thing to do is set the dom0's memory target before the loop, which is what my original patch did. It seems like there are two viable approaches here: First is to just set the target before the loop and wait (perhaps much longer) for it to be achieved. The second is to decrement the target in smaller steps and wait to reach it each time. I don't think an approach which sets a target, waits for that target to be achieved and then on partial success tries to figure out what the relative progress is and what is left to achieve and factor that into a new target request makes sense. 
This is all confounded by the fact that the libxl_wait_for_free_* functions have a barking interface. I've just seen this comment right above: /* * WARNING * This memory management API is unstable even in Xen 4.2. * It has a numer of deficiencies and we intend to replace it. * * The semantics of these functions should not be relied on to be very * coherent or stable. We will however endeavour to keep working * existing programs which use them in roughly the same way as libxl. */ Given that I think that we should feel free, if necessary, to deprecate the current interface and replace it with one which is actually usable. Whatever that might mean. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
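The drift Ian asks about in his 15 / 10 / 13 example can be replayed with a few lines of arithmetic. The sketch below assumes, as Ian's question does, that a relative request is added to the previously stored target rather than to what dom0 currently holds; `set_target_relative()` and `replay_example()` are hypothetical helpers written for this illustration, not libxl functions.

```c
#include <stdint.h>

/* Relative update under the assumption above: delta is added to the old target. */
static int64_t set_target_relative(int64_t target, int64_t delta)
{
    return target + delta;
}

/*
 * Replays two iterations: request the full delta, let dom0 balloon down by
 * only ballooned_first_try, then request the remaining delta again.
 * Returns the target left behind after the second request.
 */
static int64_t replay_example(int64_t start, int64_t want,
                              int64_t ballooned_first_try)
{
    int64_t target = start;
    int64_t current = start;

    /* Iteration 1: ask for want - current (e.g. 10 - 15 = -5). */
    target = set_target_relative(target, want - current);

    /* dom0 only manages part of it before the wait gives up. */
    current -= ballooned_first_try;

    /* Iteration 2: ask for the remainder (e.g. 10 - 13 = -3). */
    target = set_target_relative(target, want - current);

    return target;
}
```

With Ian's numbers, `replay_example(15, 10, 2)` leaves the target at 7 rather than 10, while `replay_example(15, 10, 5)` stays at 10 because the second delta is zero. That is the reason the thread prefers setting the target once before the loop instead of recomputing a relative delta after partial success.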
Re: [Xen-devel] RFC: xen config changes v4
On Fri, 2015-02-27 at 10:11 +, Stefano Stabellini wrote: (for some reason I initially thought this was in reply to my mail, so it's written in a way which assumes that, so sprinkle IMHO around the place and/or take it as a follow on to my previous mail in this thread, I guess) This is not a convincing explanation. Debian, Ubuntu and Fedora seems to be able to cope with it just fine. Debian doesn't really, for an i386 Debian installation you need to go and find some slightly obscure media which has a PAE kernel on it in order to install with PV drivers. If you just download the most obvious i386 installation media you get no PV drivers of any description in an HVM guest. Fedora IIRC has moved everything over to PAE by default (no non-PAE support), so they are probably OK. I've no idea what Ubuntu does. Why do you want to do that, even though it will cause a performance regression and a maintenance pain? You haven't provided a reason yet. Where is the performance regression? For a non-PAE x86 guest, which currently has 0 PV optimisations enabled (no PV I/O, no PV clock, nothing) being able to enable PV I/O is a useful performance improvement. I'm also not saying that it *only* makes sense to enable PV I/O, if it was also possible to enable other PV things, like PV clocks etc for non-PAE x86 guests then that would also be worthwhile. But I am saying that if enabling those extra optimisations for non-PAE x86 guests is too invasive or problematic or whatever then it would *still* be worth enabling PV I/O if that is more possible. Note that in no case am I suggesting turning off something which is possible today. In particular I see no reason to want to disable PV optimisations for PAE enabled x86 guests. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] correct mis-conversion set_bit() -> __cpumask_set_cpu() by 4aaca0e9cd
On Fri, 2015-02-27 at 07:33 +, Jan Beulich wrote: On 26.02.15 at 17:53, li...@eikelenboom.it wrote: Monday, February 23, 2015, 12:06:00 PM, you wrote: I have no idea how I came to use __cpumask_set_cpu() there, the conversion should have been set_bit() -> __set_bit(). The wrong construct results in problems on systems with relatively few CPUs. Reported-by: Sander Eikelenboom li...@eikelenboom.it Signed-off-by: Jan Beulich jbeul...@suse.com
--- a/xen/common/softirq.c
+++ b/xen/common/softirq.c
@@ -106,7 +106,7 @@ void cpu_raise_softirq(unsigned int cpu,
     if ( !per_cpu(batching, this_cpu) || in_irq() )
         smp_send_event_check_cpu(cpu);
     else
-        __cpumask_set_cpu(nr, &per_cpu(batch_mask, this_cpu));
+        __set_bit(nr, &per_cpu(batch_mask, this_cpu));
 }

 void cpu_raise_softirq_batch_begin(void)
Hi Jan, Any reason this wasn't applied to staging yet ? It didn't get ack-ed Sorry, I thought this was an x86 patch for some reason and therefore that Andrew's ack was sufficient. For v2 of the patch (54eb3d88027800062...@mail.emea.novell.com, using __cpumask_set_cpu(cpu, ...)): Acked-by: Ian Campbell ian.campb...@citrix.com
Re: [Xen-devel] [PATCH] xen/iommu: fix usage of shared EPT/IOMMU page tables on PVH guests
On 27.02.15 at 11:10, roger@citrix.com wrote: iommu_share_p2m_table should not prevent PVH guests from using a shared page table. Change the condition to has_hvm_container_domain instead of is_hvm_domain. This allows both PVH and HVM guests to use it. Remove the asserts in iommu_set_pgd and amd_iommu_share_p2m, iommu_share_p2m_table and p2m_alloc_table already do them. This wording is confusing - it took me going into p2m_alloc_table() to see that one half of the assertion is being satisfied there and the other in iommu_share_p2m_table(). While not asserting what IOMMU code does is quite fine in IOMMU code (especially as closely related as is the case here), the assertion regarding what P2M code does (and what a future second caller of iommu_share_p2m_table() might violate) should be kept, but perhaps be moved into iommu_share_p2m_table() instead of keeping it in vendor specific code. Jan
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Fri, 2015-02-27 at 15:41 +0530, Pranavkumar Sawargaonkar wrote:
> Hi Julien,
>
> On Thu, Feb 26, 2015 at 8:47 PM, Julien Grall julien.gr...@linaro.org wrote:
> > On 26/02/15 14:46, Pranavkumar Sawargaonkar wrote:
> > > Hi
> >
> > Hi Pranavkumar,
> >
> > > Also if we just show only one vITS (or only one virtual v2m frame) instead of two vITS, then the actual hardware interrupt number and the virtual interrupt number which the guest will see will become different. This will hamper direct irq routing to the guest.
> >
> > The IRQ injection should not consider a 1:1 mapping between pIRQ and vIRQ.
>
> Yes, but in the case of GICv2m (I am not sure about ITS) the device has to write the interrupt ID (which is the pirq) to the MSI_SETSPI_NS register to generate an interrupt. If you write a virq which is different from the pirq (associated with the actual GICv2m frame) then it will not trigger any interrupt.
>
> Now there is a case which I am not sure can be solved with one vITS/vGICv2m:
>
> - Suppose we have two GICv2m frames, one having address 0x1000 for its MSI_SETSPI_NS register and the other 0x2000 for its MSI_SETSPI_NS register.
> - Assume the first frame has SPIs (physical) 0x64 - 0x72 associated and the second has 0x80 - 0x88 associated.
> - Now there are two PCIe hosts, the first using the first GICv2m frame as its MSI parent and the other using the second frame.
> - A device on the first host uses the MSI_SETSPI_NS (0x1000) address along with data (i.e. an interrupt number, say 0x64) and a device on the second host uses 0x2000 and data 0x80.
>
> Now if we show one vGICv2m frame in the guest for both devices, then what address will I program in each device's config space for MSI, and what will the data value be? Secondly, the device's writes to these addresses will be transparent to the cpu, so how can we trap them when the device wants to trigger an interrupt?
>
> Please correct me if I misunderstood anything.

Is what you are suggesting a v2m specific issue?

I thought the whole point of the ITS stuff in GICv3 was that one could program such virt-phys mappings into the hardware ITS and it would do the translation (the T in ITS) such that the host got the pIRQ it was expecting when the guest wrote the virtualised vIRQ information to the device.

Caveat: If I've read the ITS bits of that doc at any point it was long ago and I've forgotten everything I knew about it... And I've never read anything about v2m at all ;-)

Ian.
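The two-frame scenario in this thread can be written down as data. The sketch below is purely illustrative and does not reflect how Xen or any GIC driver is implemented: all type and function names are invented, and the guest-visible SPI numbers 0x40 and 0x41 are assumptions chosen only to show vIRQ != pIRQ. The frame addresses and physical SPI ranges are the ones given in the example above. It shows one possible reading of the discussion: whatever the guest sees, the (address, data) pair that ends up in the device has to name the physical frame and the physical SPI, so some software or hardware layer must hold the virtual-to-physical table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One physical GICv2m frame: doorbell address and the SPI range it owns. */
struct v2m_frame_sketch {
    uint64_t setspi_addr;   /* address of MSI_SETSPI_NS */
    uint32_t spi_first;
    uint32_t spi_last;
};

/* What gets programmed into a device's MSI capability. */
struct msi_msg_sketch {
    uint64_t addr;
    uint32_t data;
};

/* Hypothetical guest-visible SPI to physical SPI mapping. */
struct virq_map_sketch {
    uint32_t vspi;
    uint32_t pspi;
};

static const struct v2m_frame_sketch frames[] = {
    { 0x1000, 0x64, 0x72 },   /* first frame from the example  */
    { 0x2000, 0x80, 0x88 },   /* second frame from the example */
};

static const struct virq_map_sketch virq_map[] = {
    { 0x40, 0x64 },           /* assumed vSPI for the first device  */
    { 0x41, 0x80 },           /* assumed vSPI for the second device */
};

/*
 * Translate a guest-visible SPI into the physical MSI message.
 * Returns 0 on success, -1 if the vSPI is unmapped or no frame owns
 * the resulting physical SPI.
 */
static int compose_physical_msi(uint32_t vspi, struct msi_msg_sketch *msg)
{
    assert(msg != NULL);

    for (size_t i = 0; i < sizeof(virq_map) / sizeof(virq_map[0]); i++) {
        if (virq_map[i].vspi != vspi)
            continue;

        uint32_t pspi = virq_map[i].pspi;

        for (size_t j = 0; j < sizeof(frames) / sizeof(frames[0]); j++) {
            if (pspi >= frames[j].spi_first && pspi <= frames[j].spi_last) {
                msg->addr = frames[j].setspi_addr;
                msg->data = pspi;
                return 0;
            }
        }
        return -1;
    }
    return -1;
}
```

Whether such a table lives in the hypervisor (trapping the guest's config-space writes) or in hardware (as the ITS is described as doing above) is exactly the open question of the thread; the sketch takes no position on it.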
[Xen-devel] [PATCH] VT-d: print_vtd_entries() should cope with superpages
Even if VT-d code alone (i.e. when not sharing tables with EPT) still doesn't support superpages, this function - invoked upon DMA remapping faults - needs to cope with such.

While at it also replace a few more plain numbers with suitable named constants.

Signed-off-by: Jan Beulich jbeul...@suse.com

--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -268,18 +268,22 @@ struct dma_pte {
 };
 #define DMA_PTE_READ (1)
 #define DMA_PTE_WRITE (2)
+#define DMA_PTE_PROT (DMA_PTE_READ | DMA_PTE_WRITE)
+#define DMA_PTE_SP   (1 << 7)
 #define DMA_PTE_SNP  (1 << 11)
 #define dma_clear_pte(p)    do {(p).val = 0;} while(0)
 #define dma_set_pte_readable(p) do {(p).val |= DMA_PTE_READ;} while(0)
 #define dma_set_pte_writable(p) do {(p).val |= DMA_PTE_WRITE;} while(0)
-#define dma_set_pte_superpage(p) do {(p).val |= (1 << 7);} while(0)
+#define dma_set_pte_superpage(p) do {(p).val |= DMA_PTE_SP;} while(0)
 #define dma_set_pte_snp(p)  do {(p).val |= DMA_PTE_SNP;} while(0)
-#define dma_set_pte_prot(p, prot) \
-            do {(p).val = ((p).val & ~3) | ((prot) & 3); } while (0)
+#define dma_set_pte_prot(p, prot) do { \
+        (p).val = ((p).val & ~DMA_PTE_PROT) | ((prot) & DMA_PTE_PROT); \
+    } while (0)
 #define dma_pte_addr(p) ((p).val & PADDR_MASK & PAGE_MASK_4K)
 #define dma_set_pte_addr(p, addr) do {\
             (p).val |= ((addr) & PAGE_MASK_4K); } while (0)
-#define dma_pte_present(p) (((p).val & 3) != 0)
+#define dma_pte_present(p) (((p).val & DMA_PTE_PROT) != 0)
+#define dma_pte_superpage(p) (((p).val & DMA_PTE_SP) != 0)

 /* interrupt remap entry */
 struct iremap_entry {
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -179,6 +179,8 @@ void print_vtd_entries(struct iommu *iom
             printk("    l%d[%x] not present\n", level, l_index);
             break;
         }
+        if ( dma_pte_superpage(pte) )
+            break;
         val = dma_pte_addr(pte);
     } while ( --level );
 }
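The effect of the added superpage check on the table walk can be illustrated with a toy model. This is a minimal sketch and not the Xen implementation: the table is reduced to one entry per level, the `PTE_*` constants only mirror the bit positions used by the patch (read/write in bits 0-1, superpage in bit 7), and every name below is made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_READ  (1ULL << 0)
#define PTE_WRITE (1ULL << 1)
#define PTE_PROT  (PTE_READ | PTE_WRITE)
#define PTE_SP    (1ULL << 7)

enum walk_result { WALK_NOT_PRESENT, WALK_SUPERPAGE, WALK_LEAF };

struct walk_outcome {
    enum walk_result result;
    int level;              /* level at which the walk stopped */
};

/*
 * Toy walk: entries[level - 1] is the single entry consulted at 'level'.
 * The walk starts at top_level and descends until it hits a non-present
 * entry, a superpage entry, or the level 1 leaf.
 */
static struct walk_outcome walk(const uint64_t *entries, int top_level)
{
    struct walk_outcome out = { WALK_LEAF, 1 };
    int level = top_level;

    assert(top_level >= 1);

    do {
        uint64_t pte = entries[level - 1];

        out.level = level;

        if ((pte & PTE_PROT) == 0) {
            out.result = WALK_NOT_PRESENT;
            return out;
        }

        /* Without this check the walk would treat the superpage's
         * address as a pointer to a lower-level table. */
        if (pte & PTE_SP) {
            out.result = WALK_SUPERPAGE;
            return out;
        }
    } while (--level);

    out.result = WALK_LEAF;
    out.level = 1;
    return out;
}
```

A walk over a table with a superpage entry at level 2 stops there and never looks at level 1, which is the behaviour the two added lines in utils.c are after.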
Re: [Xen-devel] [PATCH 0/5] (not just)x86/Dom0: NUMA related adjustments
[adding Wei, as he may be interested, for his vNUMA work]

On Thu, 2015-02-26 at 13:44 +, Jan Beulich wrote:
> 1: x86: allow specifying the NUMA nodes Dom0 should run on
> 2: allow domain heap allocations to specify more than one NUMA node
> 3: x86: widen NUMA nodes to be allocated from
> 4: VT-d: widen NUMA nodes to be allocated from
> 5: AMD IOMMU: widen NUMA nodes to be allocated from
>
> Signed-off-by: Jan Beulich jbeul...@suse.com
> ---
> To apply cleanly this depends on "x86/Dom0: account for shadow/HAP allocation" (http://lists.xenproject.org/archives/html/xen-devel/2015-02/msg03111.html).
[Xen-devel] [PATCH] xen/iommu: fix usage of shared EPT/IOMMU page tables on PVH guests
iommu_share_p2m_table should not prevent PVH guests from using a shared page table. Change the condition to has_hvm_container_domain instead of is_hvm_domain. This allows both PVH and HVM guests to use it.

Remove the asserts in iommu_set_pgd and amd_iommu_share_p2m, iommu_share_p2m_table and p2m_alloc_table already do them.

Also fix another incorrect usage of is_hvm_domain in arch_iommu_populate_page_table. This has not given problems so far because all the pages in PVH guests are of type PGT_writable_page.

Signed-off-by: Roger Pau Monné roger@citrix.com
Cc: Suravee Suthikulpanit suravee.suthikulpa...@amd.com
Cc: Aravind Gopalakrishnan aravind.gopalakrish...@amd.com
Cc: Jan Beulich jbeul...@suse.com
Cc: Yang Zhang yang.z.zh...@intel.com
Cc: Kevin Tian kevin.t...@intel.com
---
 xen/drivers/passthrough/amd/iommu_map.c | 2 --
 xen/drivers/passthrough/iommu.c         | 2 +-
 xen/drivers/passthrough/vtd/iommu.c     | 2 --
 xen/drivers/passthrough/x86/iommu.c     | 2 +-
 4 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index a8c60ec..31dc05d 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -785,8 +785,6 @@ void amd_iommu_share_p2m(struct domain *d)
     struct page_info *p2m_table;
     mfn_t pgd_mfn;

-    ASSERT( is_hvm_domain(d) && d->arch.hvm_domain.hap_enabled );
-
     if ( !iommu_use_hap_pt(d) )
         return;

diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index cc12735..3e11d6b 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -332,7 +332,7 @@ void iommu_share_p2m_table(struct domain* d)
 {
     const struct iommu_ops *ops = iommu_get_ops();

-    if ( iommu_enabled && is_hvm_domain(d) )
+    if ( iommu_enabled && has_hvm_container_domain(d) )
         ops->share_p2m(d);
 }

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 2e113d7..ff542cb 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1788,8 +1788,6 @@ static void iommu_set_pgd(struct domain *d)
     struct hvm_iommu *hd = domain_hvm_iommu(d);
     mfn_t pgd_mfn;

-    ASSERT( is_hvm_domain(d) && d->arch.hvm_domain.hap_enabled );
-
     if ( !iommu_use_hap_pt(d) )
         return;

diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index 52d8948..9eb8d33 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -56,7 +56,7 @@ int arch_iommu_populate_page_table(struct domain *d)

     while ( !rc && (page = page_list_remove_head(&d->page_list)) )
     {
-        if ( is_hvm_domain(d) ||
+        if ( has_hvm_container_domain(d) ||
             (page->u.inuse.type_info & PGT_type_mask) == PGT_writable_page )
         {
             BUG_ON(SHARED_M2P(mfn_to_gmfn(d, page_to_mfn(page))));
--
1.9.3 (Apple Git-50)
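The difference between the two predicates swapped by this patch can be shown with a small model. This is only a sketch under assumptions: the enum, struct and function names are invented, and it assumes (as the patch description implies) that a PVH guest is not an HVM guest but does run in an HVM container, so it fails the old test and passes the new one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical guest types, modelled on the three kinds named above. */
enum guest_type_sketch { GUEST_PV, GUEST_PVH, GUEST_HVM };

struct domain_sketch {
    enum guest_type_sketch type;
};

/* True only for full HVM guests. */
static bool is_hvm(const struct domain_sketch *d)
{
    return d->type == GUEST_HVM;
}

/* True for anything running in an HVM container: HVM and PVH. */
static bool has_hvm_container(const struct domain_sketch *d)
{
    return d->type != GUEST_PV;
}

/* Condition before the patch: PVH guests are excluded from sharing. */
static bool may_share_old(bool iommu_enabled, const struct domain_sketch *d)
{
    return iommu_enabled && is_hvm(d);
}

/* Condition after the patch: PVH guests may share the page tables too. */
static bool may_share_new(bool iommu_enabled, const struct domain_sketch *d)
{
    return iommu_enabled && has_hvm_container(d);
}
```

In this model only the PVH case changes between the old and the new condition; PV guests are rejected and HVM guests accepted by both.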
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
Hi Julien,

On Thu, Feb 26, 2015 at 8:47 PM, Julien Grall julien.gr...@linaro.org wrote:
> On 26/02/15 14:46, Pranavkumar Sawargaonkar wrote:
> > Hi
>
> Hi Pranavkumar,
>
> > Also if we just show only one vITS (or only one virtual v2m frame) instead of two vITS, then the actual hardware interrupt number and the virtual interrupt number which the guest will see will become different. This will hamper direct irq routing to the guest.
>
> The IRQ injection should not consider a 1:1 mapping between pIRQ and vIRQ.

Yes, but in the case of GICv2m (I am not sure about ITS) the device has to write the interrupt ID (which is the pirq) to the MSI_SETSPI_NS register to generate an interrupt. If you write a virq which is different from the pirq (associated with the actual GICv2m frame) then it will not trigger any interrupt.

Now there is a case which I am not sure can be solved with one vITS/vGICv2m:

- Suppose we have two GICv2m frames, one having address 0x1000 for its MSI_SETSPI_NS register and the other 0x2000 for its MSI_SETSPI_NS register.
- Assume the first frame has SPIs (physical) 0x64 - 0x72 associated and the second has 0x80 - 0x88 associated.
- Now there are two PCIe hosts, the first using the first GICv2m frame as its MSI parent and the other using the second frame.
- A device on the first host uses the MSI_SETSPI_NS (0x1000) address along with data (i.e. an interrupt number, say 0x64) and a device on the second host uses 0x2000 and data 0x80.

Now if we show one vGICv2m frame in the guest for both devices, then what address will I program in each device's config space for MSI, and what will the data value be? Secondly, the device's writes to these addresses will be transparent to the cpu, so how can we trap them when the device wants to trigger an interrupt?

Please correct me if I misunderstood anything.

Thanks,
Pranav

> I have a patch which allows virq != pirq:
>
> https://patches.linaro.org/43012/
>
> Regards,
>
> --
> Julien Grall
Re: [Xen-devel] freemem-slack and large memory environments
On Thu, 2015-02-26 at 16:30 -0700, Mike Latimer wrote:
On Thursday, February 26, 2015 01:45:16 PM Mike Latimer wrote:
On Thursday, February 26, 2015 05:53:06 PM Stefano Stabellini wrote:

What is the return value of libxl_set_memory_target and libxl_wait_for_free_memory in that case? Isn't it just a matter of properly handling the return values?

The return from libxl_set_memory_target is 0, as the assignment works just fine. I don't have the return from libxl_wait_for_free_memory in my notes, so I'll spin up another test and track that down.

I slightly misspoke here... In my testing, the returns are actually:

    libxl_set_memory_target = 1
    libxl_wait_for_free_memory = -5
    libxl_wait_for_memory_target = 0

Note - libxl_wait_for_memory_target is confusing,

Further to the comment I just made WRT this source comment:

    /*
     * WARNING
     * This memory management API is unstable even in Xen 4.2.
     * It has a numer of deficiencies and we intend to replace it.
     *
     * The semantics of these functions should not be relied on to be very
     * coherent or stable. We will however endeavour to keep working
     * existing programs which use them in roughly the same way as libxl.
     */

I think we should feel free to introduce a new interface which has semantics which we can actually work with. IOW as rc can be set to ERROR_FAIL, but the function returns 0 anyway (unless an error is encountered earlier.) I guess this just means we need to continue to wait...

Do something sensible so there is no more guessing. I'm not sure yet what sensible would be.

One approach to fixing this might be that when the replacement for libxl_wait_for_memory_target fails, it sets the target to whatever was actually achieved, such that further calculations involving free_memkb and the overall target will still be valid.

Or we could move the "progress is being made" logic currently in xl's freemem down into the wait_for_memory_target replacement, so it hopefully has more information available to it in order to make better decisions about the timeouts.

Ian.
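The two ideas floated above (report what was actually achieved on failure, and keep waiting for as long as progress is being made) can be combined in a few lines. The following is a sketch only and is not libxl code: the function `wait_for_free_memory_sketch`, the callback type, and the fake balloon used to exercise it are all hypothetical names, and a real implementation would sleep between polls.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback returning the currently free memory in KiB. */
typedef uint64_t (*free_memkb_fn)(void *ctx);

/*
 * Poll until at least need_kb is free. The idle counter is reset every
 * time free memory has grown since the previous poll, so a slow but
 * steadily shrinking domain does not hit the limit.
 *
 * Returns 0 on success, -1 if max_idle_polls consecutive polls showed no
 * progress. In both cases *achieved_kb holds the last observed value, so
 * the caller can base further calculations on what was really achieved.
 */
static int wait_for_free_memory_sketch(free_memkb_fn read_free, void *ctx,
                                       uint64_t need_kb,
                                       unsigned int max_idle_polls,
                                       uint64_t *achieved_kb)
{
    uint64_t last = read_free(ctx);
    unsigned int idle = 0;

    assert(achieved_kb != 0);

    while (last < need_kb) {
        if (idle >= max_idle_polls) {
            *achieved_kb = last;
            return -1;
        }

        /* A real implementation would sleep here before polling again. */
        uint64_t now = read_free(ctx);

        if (now > last)
            idle = 0;       /* progress: start counting from scratch */
        else
            idle++;
        last = now;
    }

    *achieved_kb = last;
    return 0;
}

/* Test double: free memory grows by step_kb per poll up to cap_kb. */
struct fake_balloon {
    uint64_t free_kb;
    uint64_t step_kb;
    uint64_t cap_kb;
};

static uint64_t fake_read_free(void *ctx)
{
    struct fake_balloon *b = ctx;
    uint64_t cur = b->free_kb;

    if (b->free_kb + b->step_kb <= b->cap_kb)
        b->free_kb += b->step_kb;
    else
        b->free_kb = b->cap_kb;
    return cur;
}
```

The design choice here is that failure is defined by lack of progress rather than by elapsed time, and that the caller always learns the achieved value instead of having to guess from a return code.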
Re: [Xen-devel] how to assign resources exclusive to a single domU
On Fri, 2015-02-27 at 09:19 +0100, Olaf Hering wrote:
> On Fri, Feb 27, Jürgen Groß wrote:
> > On 02/26/2015 09:57 AM, Olaf Hering wrote:
> > > I wonder what should be done in my changes for libxl.
> >
> > If you are doing something, please add a flag to be able to disable the additional security checks regarding multiple assignment.
>
> I think libxl should just allow multiple assignments of physical devices. It's up to the admin to make sure the overall config is sane.

I can't remember what libxl does today, but WRT disks (with the phy backend at least) xend used to have sharing checks and refuse to allow sharing (for writeable disks) unless overridden (by w+ in the mode string, IIRC).

I don't think libxl implements those checks, so the override isn't supported, but maybe it would be good to do so, and maybe it would be a good idea for pvscsi to at least be consistent with what we might eventually do for disks?

(FWIW I think most of the checks were actually in the block-* scripts, I'm not sure why they are active under libxl)

Ian.
Re: [Xen-devel] [PATCH] VT-d: print_vtd_entries() should cope with superpages
On 27/02/15 09:52, Jan Beulich wrote:
> Even if VT-d code alone (i.e. when not sharing tables with EPT) still doesn't support superpages, this function - invoked upon DMA remapping faults - needs to cope with such.
>
> While at it also replace a few more plain numbers with suitable named constants.
>
> Signed-off-by: Jan Beulich jbeul...@suse.com

Reviewed-by: Andrew Cooper andrew.coop...@citrix.com

> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -268,18 +268,22 @@ struct dma_pte {
>  };
>  #define DMA_PTE_READ (1)
>  #define DMA_PTE_WRITE (2)
> +#define DMA_PTE_PROT (DMA_PTE_READ | DMA_PTE_WRITE)
> +#define DMA_PTE_SP   (1 << 7)
>  #define DMA_PTE_SNP  (1 << 11)
>  #define dma_clear_pte(p)    do {(p).val = 0;} while(0)
>  #define dma_set_pte_readable(p) do {(p).val |= DMA_PTE_READ;} while(0)
>  #define dma_set_pte_writable(p) do {(p).val |= DMA_PTE_WRITE;} while(0)
> -#define dma_set_pte_superpage(p) do {(p).val |= (1 << 7);} while(0)
> +#define dma_set_pte_superpage(p) do {(p).val |= DMA_PTE_SP;} while(0)
>  #define dma_set_pte_snp(p)  do {(p).val |= DMA_PTE_SNP;} while(0)
> -#define dma_set_pte_prot(p, prot) \
> -            do {(p).val = ((p).val & ~3) | ((prot) & 3); } while (0)
> +#define dma_set_pte_prot(p, prot) do { \
> +        (p).val = ((p).val & ~DMA_PTE_PROT) | ((prot) & DMA_PTE_PROT); \
> +    } while (0)
>  #define dma_pte_addr(p) ((p).val & PADDR_MASK & PAGE_MASK_4K)
>  #define dma_set_pte_addr(p, addr) do {\
>              (p).val |= ((addr) & PAGE_MASK_4K); } while (0)
> -#define dma_pte_present(p) (((p).val & 3) != 0)
> +#define dma_pte_present(p) (((p).val & DMA_PTE_PROT) != 0)
> +#define dma_pte_superpage(p) (((p).val & DMA_PTE_SP) != 0)
>
>  /* interrupt remap entry */
>  struct iremap_entry {
> --- a/xen/drivers/passthrough/vtd/utils.c
> +++ b/xen/drivers/passthrough/vtd/utils.c
> @@ -179,6 +179,8 @@ void print_vtd_entries(struct iommu *iom
>              printk("    l%d[%x] not present\n", level, l_index);
>              break;
>          }
> +        if ( dma_pte_superpage(pte) )
> +            break;
>          val = dma_pte_addr(pte);
>      } while ( --level );
>  }
Re: [Xen-devel] [PATCH 1/1] xen-netback: remove compilation warning
On Thu, 2015-02-26 at 11:30 -0500, David Miller wrote:
> From: pedro marzo.pe...@gmail.com
> Date: Thu, 26 Feb 2015 09:25:41 +0100
>
> > From: pmarzo marzo.pe...@gmail.com
> >
> > offset and size are of type uint16_t so the %lu gives a warning
> > A %u specifier, the same used in size makes gcc happy
> > Not sure if a %x would be more correct
> >
> > Signed-off-by: Pedro Marzo Perez marzo.pe...@gmail.com
>
> This patch actually adds a warning on my machine, and your analysis of the types is therefore probably incorrect:
>
> drivers/net/xen-netback/netback.c: In function ‘xenvif_tx_build_gops’:
> drivers/net/xen-netback/netback.c:1259:8: warning: format ‘%u’ expects argument of type ‘unsigned int’, but argument 5 has type ‘long unsigned int’ [-Wformat=]

You are right, this patch is completely wrong for i386, it gives me a warning too. I should have checked that before, sorry.

I should also have said I am using a cross compiler, which is the one that gives the warning when compiling the current code:

arm-linux-gnueabi-gcc --version
arm-linux-gnueabi-gcc (Ubuntu/Linaro 4.7.3-12ubuntu1) 4.7.3

> The issue is probably ~PAGE_MASK and I think the type of that propagates into the type of the overall calculation.

That is probably what is happening: operations must be done on operands of the same size, and the Intel compiler is converting everything to unsigned long (because I have a 64 bit machine??), but the ARM compiler is converting to unsigned int :-(

PAGE_MASK is defined as a number without any cast, so I am not sure which compiler is right:

#define PAGE_SHIFT 12
#define PAGE_MASK (~((1 << PAGE_SHIFT) - 1))

This new patch fixes the warning for both the ARM gcc compiler and the i386 compiler; it just makes sure everything is cast to unsigned long.

Could you please forget the previous one and give your opinion about this one?

--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1248,9 +1248,10 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
         /* No crossing a page as the payload mustn't fragment. */
         if (unlikely((txreq.offset + txreq.size) > PAGE_SIZE)) {
             netdev_err(queue->vif->dev,
-                   "txreq.offset: %x, size: %u, end: %u \n",
+                   "txreq.offset: %x, size: %u, end: %lu \n",
                    txreq.offset, txreq.size,
-                   (txreq.offset&~PAGE_MASK) + txreq.size);
+                   ((unsigned long)txreq.offset&~PAGE_MASK)
+                   + txreq.size);
             xenvif_fatal_tx_err(queue->vif);
             break;
         }
--
1.9.1
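Neither compiler is wrong here; the type of the "end" expression simply follows the type of the mask. The sketch below makes that visible with C11 `_Generic`. It is an illustration under assumptions: `PAGE_MASK_INT` copies the int-typed definition quoted in this thread, while `PAGE_MASK_UL` is an assumed stand-in for a build where the mask is derived from an `unsigned long` constant; the helper names are invented. Note that with the int-typed mask the result is `int` (after the usual integer promotions), not `unsigned int`.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* int-typed mask, as in the definition quoted in the thread. */
#define PAGE_MASK_INT (~((1 << PAGE_SHIFT) - 1))

/* unsigned long mask: an assumption about what the other build sees. */
#define PAGE_MASK_UL  (~((1UL << PAGE_SHIFT) - 1))

static_assert(sizeof(uint16_t) < sizeof(int),
              "this sketch assumes uint16_t promotes to int");

enum expr_type { EXPR_INT, EXPR_UINT, EXPR_LONG, EXPR_ULONG, EXPR_OTHER };

/* Classify the type of an expression without evaluating it. */
#define EXPR_TYPE(x) _Generic((x),      \
    int:           EXPR_INT,            \
    unsigned int:  EXPR_UINT,           \
    long:          EXPR_LONG,           \
    unsigned long: EXPR_ULONG,          \
    default:       EXPR_OTHER)

/* uint16_t & int + uint16_t: everything is promoted to int. */
static enum expr_type end_type_with_int_mask(uint16_t offset, uint16_t size)
{
    return EXPR_TYPE((offset & ~PAGE_MASK_INT) + size);
}

/* uint16_t & unsigned long + uint16_t: the result is unsigned long. */
static enum expr_type end_type_with_ul_mask(uint16_t offset, uint16_t size)
{
    return EXPR_TYPE((offset & ~PAGE_MASK_UL) + size);
}

/* The approach of the patch: cast first, so %lu is right everywhere. */
static unsigned long request_end(uint16_t offset, uint16_t size)
{
    return ((unsigned long)offset & ~PAGE_MASK_UL) + size;
}
```

Casting one operand to `unsigned long`, as the second patch does, pins the type of the whole expression regardless of how PAGE_MASK is defined, which is why a single `%lu` specifier can then satisfy both builds.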
[Xen-devel] [linux-linus test] 35443: regressions - trouble: blocked/broken/fail/pass
flight 35443 linux-linus real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/35443/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-rumpuserxen-amd64        8 guest-start       fail   REGR. vs. 34227
 build-armhf-libvirt                       3 host-install(3)   broken REGR. vs. 34227
 build-armhf-pvops                         3 host-install(3)   broken REGR. vs. 34227
 test-amd64-amd64-xl-qemut-win7-amd64      7 windows-install   fail   REGR. vs. 34227

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-freebsd10-i386            7 freebsd-install   fail like 34227
 test-amd64-i386-freebsd10-amd64           7 freebsd-install   fail like 34227
 test-amd64-i386-pair                     17 guest-migrate/src_host/dst_host fail like 34227

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-intel             9 guest-start            fail    never pass
 test-armhf-armhf-xl-midway                1 build-check(1)         blocked n/a
 test-armhf-armhf-xl-sedf-pin              1 build-check(1)         blocked n/a
 test-armhf-armhf-xl-sedf                  1 build-check(1)         blocked n/a
 test-armhf-armhf-xl                       1 build-check(1)         blocked n/a
 test-armhf-armhf-xl-multivcpu             1 build-check(1)         blocked n/a
 test-armhf-armhf-xl-credit2               1 build-check(1)         blocked n/a
 test-armhf-armhf-libvirt                  1 build-check(1)         blocked n/a
 test-amd64-amd64-libvirt                 10 migrate-support-check  fail    never pass
 test-amd64-amd64-xl-pcipt-intel           9 guest-start            fail    never pass
 test-amd64-amd64-xl-pvh-amd               9 guest-start            fail    never pass
 test-amd64-i386-libvirt                  10 migrate-support-check  fail    never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop             fail    never pass
 test-amd64-amd64-xl-winxpsp3             14 guest-stop             fail    never pass
 test-amd64-i386-xl-qemuu-winxpsp3        14 guest-stop             fail    never pass
 test-amd64-i386-xl-qemut-win7-amd64      14 guest-stop             fail    never pass
 test-amd64-i386-xl-qemuu-win7-amd64      14 guest-stop             fail    never pass
 test-amd64-i386-xl-win7-amd64            14 guest-stop             fail    never pass
 test-amd64-amd64-xl-win7-amd64           14 guest-stop             fail    never pass
 test-amd64-amd64-xl-qemuu-win7-amd64     14 guest-stop             fail    never pass
 test-amd64-amd64-xl-qemut-winxpsp3       14 guest-stop             fail    never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop             fail    never pass
 test-amd64-i386-xl-winxpsp3-vcpus1       14 guest-stop             fail    never pass
 test-amd64-i386-xl-winxpsp3              14 guest-stop             fail    never pass
 test-amd64-i386-xl-qemut-winxpsp3        14 guest-stop             fail    never pass
 test-amd64-amd64-xl-qemuu-winxpsp3       14 guest-stop             fail    never pass

version targeted for testing:
 linux                b24e2bdde4af656bb0679a101265ebb8f8735d3c
baseline version:
 linux                9d82f5eb3376cbae96ad36a063a9390de1694546

1736 people touched revisions under test,
not listing them all

jobs:
 build-amd64                                  pass
 build-armhf                                  pass
 build-i386                                   pass
 build-amd64-libvirt                          pass
 build-armhf-libvirt                          broken
 build-i386-libvirt                           pass
 build-amd64-pvops                            pass
 build-armhf-pvops                            broken
 build-i386-pvops                             pass
 build-amd64-rumpuserxen                      pass
 build-i386-rumpuserxen                       pass
 test-amd64-amd64-xl                          pass
 test-armhf-armhf-xl                          blocked
 test-amd64-i386-xl                           pass
 test-amd64-amd64-xl-pvh-amd                  fail
 test-amd64-i386-rhel6hvm-amd                 pass
 test-amd64-i386-qemut-rhel6hvm-amd           pass
 test-amd64-i386-qemuu-rhel6hvm-amd           pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64    pass
 test-amd64-i386-xl-qemut-debianhvm-amd64     pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64    pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64     pass
 test-amd64-i386-freebsd10-amd64              fail
 test-amd64-amd64-xl-qemuu-ovmf-amd64         pass
Re: [Xen-devel] backport c1d322e6048796296555dd36fdd102d7fa2f50bf to all stable trees
On Fri, 27 Feb 2015, Fabio Fantoni wrote:
> On 26/02/2015 14:02, Stefano Stabellini wrote:
> > Hi all,
> > I would like to request a backport of
> >
> > commit c1d322e6048796296555dd36fdd102d7fa2f50bf
> > Author: Stefano Stabellini stefano.stabell...@eu.citrix.com
> > Date: Wed Dec 3 08:15:19 2014 -0500
> >
> >     xen-hvm: increase maxmem before calling xc_domain_populate_physmap
>
> It seems that this fix has been applied only to staging/qemu-upstream-unstable.git (of Xen's git trees) but not yet to qemu-upstream-unstable.git or the stable ones.

An unrelated local-migrate test is failing. It is believed to be due to Paul's ioreq-server API changes and the fix should be in xen-unstable already (the fix is a patch to the hypervisor). We expect the test to pass soon.

> Could this be the cause of the strange problem of a loop of failing memory increases when HVM domUs start on Xen 4.4, 4.5 and unstable with a newer kernel, even though the domUs and dom0 all have fixed memory settings with ballooning disabled?

What exactly are you referring to? Are you talking about http://marc.info/?l=xen-devel&m=142499350515886 ?

> Or is it another memory bug in Xen?
> My syslog and kern.log grow by some GB each day, full of:
> xen:balloon: reserve_additional_memory: add_memory() failed: -17
> on one 4.5.0 dom0, also with kernel 3.16.7-ckt4-3~bpo70+1 with this applied:
> [xen] cancel ballooning if adding new memory failed (Closes: #776448)
>
> Thanks for any reply and sorry for my bad English.

It shouldn't have anything to do with "xen-hvm: increase maxmem before calling xc_domain_populate_physmap". To make sure, you could simply revert c1d322e6048796296555dd36fdd102d7fa2f50bf (901230fd8ce053cc21312a2eca2f3ba9f1d103f2 in qemu-upstream-unstable.git) and try again to see if the memory issues you are experiencing go away.

> > to all QEMU stable trees. Which ones are the currently maintained trees?
> >
> > It applies without issues to 2.2, 2.1, 2.0, 1.7, 1.6, 1.5.
> > The filename in the commit needs to be changed from xen-hvm.c to xen-all.c for 1.4, 1.3, 1.2, 1.1. I didn't go farther back.
> >
> > Thanks,
> >
> > Stefano