Re: [Xen-devel] [PATCH] xen-blkfront: fix mq start/stop race
Hi Boris & Juergen,

Could you help review this patch? This is a race and will cause an I/O hang.

Thanks,
Junxiao.

On 06/22/2017 09:36 AM, Junxiao Bi wrote:
> When the ring buffer is full, the hw queue will be stopped. When the blkif
> interrupt handler consumes requests and frees space in the ring buffer, the
> hw queue will be started again. But since starting the queue is protected
> by the spin lock while stopping is not, there is a race:
>
> interrupt:                              process:
> blkif_interrupt()                       blkif_queue_rq()
>  kick_pending_request_queues_locked()
>   blk_mq_start_stopped_hw_queues()
>    clear_bit(BLK_MQ_S_STOPPED, &hctx->state)
>                                          blk_mq_stop_hw_queue(hctx)
>    blk_mq_run_hw_queue(hctx, async)
>
> If the ring buffer is made empty in this case, the interrupt will never
> come again, so the hw queue stays stopped forever and all processes
> waiting for pending I/O in the queue will hang.
>
> Signed-off-by: Junxiao Bi
> Reviewed-by: Ankur Arora
> ---
>  drivers/block/xen-blkfront.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 8bb160cd00e1..4767b82b2cf6 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -912,8 +912,8 @@ out_err:
>  	return BLK_MQ_RQ_QUEUE_ERROR;
>
>  out_busy:
> -	spin_unlock_irqrestore(&rinfo->ring_lock, flags);
>  	blk_mq_stop_hw_queue(hctx);
> +	spin_unlock_irqrestore(&rinfo->ring_lock, flags);
>  	return BLK_MQ_RQ_QUEUE_BUSY;
>  }

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 3/3] VT-d PI: restrict the vcpu number on a given pcpu
On Fri, Jun 16, 2017 at 09:09:13AM -0600, Jan Beulich wrote:
> On 24.05.17 at 08:56, wrote:
>> Currently, a blocked vCPU is put in its pCPU's PI blocking list. If
>> too many vCPUs are blocked on a given pCPU, the list can grow too long.
>> A simple analysis: with 32K domains and 128 vCPUs per domain, about 4M
>> vCPUs may be blocked in one pCPU's PI blocking list. When a wakeup
>> interrupt arrives, the list is traversed to find the specific vCPUs to
>> wake up, and in that case the traversal can consume much time.
>>
>> To mitigate this issue, this patch limits the vcpu number on a given
>> pCPU,
>
> This would be a bug, but I think it's the description which is wrong
> (or at least imprecise): You don't limit the number of vCPU-s _run_
> on any pCPU, but those tracked on any pCPU-s blocking list. Please
> say so here to avoid confusion.

Agree.

>> taking factors such as performance of the common case, current hvm
>> vcpu count and current pcpu count into consideration. With this
>> method, for the common case it works fast, and for some extreme cases
>> the list length is under control.
>>
>> The change in vmx_pi_unblock_vcpu() is for the following case:
>> vcpu is running -> try to block (this patch may change NDST to
>> another pCPU) but notification comes in time, thus the vcpu
>
> What does "but notification comes in time" mean?

I mean when local_events_need_delivery() in vcpu_block() returns true.

>> goes back to running station -> VM-entry (we should set NDST again,
>
> s/station/state/ ?
>
>> reverting the change we make to NDST in vmx_vcpu_block())
>
> Overall I'm not sure I really understand what you try to explain
> here.

Will put it above the related change. I wanted to explain why we need
this change if a vcpu can be added to a remote pcpu (meaning the vcpu
isn't running on this pcpu).
A vcpu may go through two different paths from calling vcpu_block() to
VM-entry:

Path1: vcpu_block() -> vmx_vcpu_block() -> local_events_need_delivery()
       (returns true) -> vmx_pi_unblock_vcpu() (during VM-entry)
Path2: vcpu_block() -> vmx_vcpu_block() -> local_events_need_delivery()
       (returns false) -> vmx_pi_switch_from() -> vmx_pi_switch_to() ->
       vmx_pi_unblock_vcpu() (during VM-entry)

Since migrating a vcpu to another pcpu would lead to an incorrect
pi_desc->ndst, vmx_pi_switch_to() re-assigns pi_desc->ndst. That was
enough for Path1 (nothing changed the pi_desc->ndst field or the binding
between pcpu and vcpu) and for Path2. But now vmx_vcpu_block() may change
pi_desc->ndst to another pcpu to receive the wakeup interrupt. So if
local_events_need_delivery() returns true, we should correct
pi_desc->ndst back to the current pcpu in vmx_pi_unblock_vcpu().

>
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -100,16 +100,62 @@ void vmx_pi_per_cpu_init(unsigned int cpu)
>>      spin_lock_init(&per_cpu(vmx_pi_blocking, cpu).lock);
>>  }
>>
>> +/*
>> + * By default, the local pcpu (meaning the one the vcpu is currently
>> + * running on) is chosen as the destination of the wakeup interrupt. But
>> + * if the vcpu count of the pcpu exceeds a limit, another pcpu is chosen
>> + * until we find a suitable one.
>> + *
>> + * Currently, choose (v_tot/p_tot) + K as the limit of vcpu count, where
>> + * v_tot is the total number of hvm vcpus on the system, p_tot is the
>> + * total number of pcpus in the system, and K is a fixed number.
>> + * Experiments show the maximum time to wake up a vcpu from a 128-entry
>> + * blocking list is about 22us, which is tolerable. So choose 128 as the
>> + * fixed number K.
>
> Giving any kind of absolute time value requires also stating on what
> hardware this was measured.
>
>> + * This policy makes sure:
>> + * 1) for common cases, the limit won't be reached and the local pcpu
>> + *    is used, which is beneficial to performance (at least, it avoids
>> + *    an IPI when unblocking a vcpu).
>> + * 2) for the worst case, the blocking list length scales with the vcpu
>> + *    count divided by the pcpu count.
>> + */
>> +#define PI_LIST_FIXED_NUM 128
>> +#define PI_LIST_LIMIT (atomic_read(&num_hvm_vcpus) / num_online_cpus() + \
>> +                       PI_LIST_FIXED_NUM)
>> +
>> +static bool pi_over_limit(int count)
>
> Can a caller validly pass a negative argument? Otherwise unsigned int
> please.
>
>> +{
>> +    /* Compare w/ constant first to save an atomic read in the common case */
>
> As an atomic read is just a normal read on x86, does this really matter?

Agree.

>
>> +    return ((count > PI_LIST_FIXED_NUM) &&
>> +            (count > (atomic_read(&num_hvm_vcpus) / num_online_cpus()) +
>> +             PI_LIST_FIXED_NUM));
>
> Right above you've #define-d PI_LIST_LIMIT - why do you open code
> it here? Also note that the outer pair of parentheses is pointless (and
> hampering readability).
>
>>  static void vmx_vcpu_block(struct vcpu *v)
>>  {
>>      unsigned long flags;
>> -    unsigned int dest;
>> +    unsigned int dest,
[Xen-devel] [linux-linus test] 110950: regressions - FAIL
flight 110950 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/110950/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemuu-ovmf-amd64 14 guest-saverestore.2 fail REGR. vs. 110515
 test-amd64-amd64-xl-qemuu-win7-amd64 15 guest-localmigrate/x10 fail REGR. vs. 110515

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-start/win.repeat fail blocked in 110515
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-check fail like 110515
 test-armhf-armhf-libvirt 13 saverestore-support-check fail like 110515
 test-armhf-armhf-xl-rtds 15 guest-start/debian.repeat fail like 110515
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 110515
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail like 110515
 test-amd64-amd64-xl-rtds 9 debian-install fail like 110515
 test-armhf-armhf-libvirt-raw 12 saverestore-support-check fail like 110515
 test-amd64-amd64-xl-qemut-ws16-amd64 9 windows-install fail never pass
 test-amd64-i386-libvirt 12 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 12 migrate-support-check fail never pass
 test-amd64-amd64-libvirt 12 migrate-support-check fail never pass
 test-arm64-arm64-xl 12 migrate-support-check fail never pass
 test-arm64-arm64-xl 13 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 12 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 12 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 13 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 13 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 12 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 13 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2 fail never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-check fail never pass
 test-armhf-armhf-xl 12 migrate-support-check fail never pass
 test-armhf-armhf-xl 13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-check fail never pass
 test-armhf-armhf-libvirt 12 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64 12 guest-saverestore fail never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 9 windows-install fail never pass
 test-armhf-armhf-xl-vhd 11 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 12 saverestore-support-check fail never pass
 test-armhf-armhf-xl-xsm 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-xsm 13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 13 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-check fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 12 guest-saverestore fail never pass
 test-amd64-amd64-xl-qemut-win10-i386 9 windows-install fail never pass
 test-amd64-i386-xl-qemuu-win10-i386 9 windows-install fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386 9 windows-install fail never pass
 test-amd64-i386-xl-qemut-win10-i386 9 windows-install fail never pass
version targeted for testing:
 linux                48b6bbef9a1789f0365c1a385879a1fea4460016
baseline version:
 linux                1439ccf73d9c07654fdd5b4969fd53c2feb8684d

Last test of basis   110515  2017-06-17 06:48:56 Z   5 days
Failing since        110536  2017-06-17 23:48:13 Z   5 days   6 attempts
Testing same since   110950  2017-06-21 22:17:11 Z   1 days   1 attempts

People who
Re: [Xen-devel] [PATCH for-4.9 v3 3/3] xen/livepatch: Don't crash on encountering STN_UNDEF relocations
On Thu, Jun 22, 2017 at 07:15:29PM +0100, Andrew Cooper wrote:
> A symndx of STN_UNDEF is special, and means a symbol value of 0. While
> legitimate in the ELF standard, its existence in a livepatch is
> questionable at best. Until a plausible usecase presents itself, reject
> such a relocation with -EOPNOTSUPP.
>
> Additionally, fix an off-by-one error while range checking symndx, and
> perform a safety check on elf->sym[symndx].sym before dereferencing it,
> to avoid tripping over a NULL pointer when calculating val.
>
> Signed-off-by: Andrew Cooper
> ---
> CC: Konrad Rzeszutek Wilk

Reviewed-by: Konrad Rzeszutek Wilk
Tested-by: Konrad Rzeszutek Wilk [arm32 and x86]
Re: [Xen-devel] [PATCH for-4.9 v3 2/3] xen/livepatch: Use zeroed memory allocations for arrays
On Thu, Jun 22, 2017 at 07:15:28PM +0100, Andrew Cooper wrote:
> Each of these arrays is sparse. Use zeroed allocations to cause
> uninitialised array elements to contain deterministic values, most
> importantly for the embedded pointers.
>
> Signed-off-by: Andrew Cooper
> ---
> CC: Konrad Rzeszutek Wilk

Reviewed-by: Konrad Rzeszutek Wilk
Tested-by: Konrad Rzeszutek Wilk [x86 and ARM32]

> CC: Ross Lagerwall
>
> * new in v3
> ---
>  xen/common/livepatch.c     | 4 ++--
>  xen/common/livepatch_elf.c | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
> index df67a1a..66d532d 100644
> --- a/xen/common/livepatch.c
> +++ b/xen/common/livepatch.c
> @@ -771,8 +771,8 @@ static int build_symbol_table(struct payload *payload,
>          }
>      }
>
> -    symtab = xmalloc_array(struct livepatch_symbol, nsyms);
> -    strtab = xmalloc_array(char, strtab_len);
> +    symtab = xzalloc_array(struct livepatch_symbol, nsyms);
> +    strtab = xzalloc_array(char, strtab_len);
>
>      if ( !strtab || !symtab )
>      {
> diff --git a/xen/common/livepatch_elf.c b/xen/common/livepatch_elf.c
> index c4a9633..b69e271 100644
> --- a/xen/common/livepatch_elf.c
> +++ b/xen/common/livepatch_elf.c
> @@ -52,7 +52,7 @@ static int elf_resolve_sections(struct livepatch_elf *elf, const void *data)
>      int rc;
>
>      /* livepatch_elf_load sanity checked e_shnum. */
> -    sec = xmalloc_array(struct livepatch_elf_sec, elf->hdr->e_shnum);
> +    sec = xzalloc_array(struct livepatch_elf_sec, elf->hdr->e_shnum);
>      if ( !sec )
>      {
>          dprintk(XENLOG_ERR, LIVEPATCH "%s: Could not allocate memory for section table!\n",
> @@ -225,7 +225,7 @@ static int elf_get_sym(struct livepatch_elf *elf, const void *data)
>      /* No need to check values as elf_resolve_sections did it. */
>      nsym = symtab_sec->sec->sh_size / symtab_sec->sec->sh_entsize;
>
> -    sym = xmalloc_array(struct livepatch_elf_sym, nsym);
> +    sym = xzalloc_array(struct livepatch_elf_sym, nsym);
>      if ( !sym )
>      {
>          dprintk(XENLOG_ERR, LIVEPATCH "%s: Could not allocate memory for symbols\n",
> --
> 2.1.4
Re: [Xen-devel] [PATCH] xen: Replace ASSERT(0) with ASSERT_UNREACHABLE()
On Wed, Jun 21, 2017 at 01:40:45PM +0100, Andrew Cooper wrote:
> No functional change, but the result is more informative both in the code
> and error messages if the assertions do get hit.
>
> Signed-off-by: Andrew Cooper
> ---
> CC: Jan Beulich
> CC: Konrad Rzeszutek Wilk

Acked-by: Konrad Rzeszutek Wilk
Re: [Xen-devel] Travis build failing because "tools/xen-detect: try sysfs node for obtaining guest type" ?
On Thu, Jun 22, 2017 at 07:31:53PM +0200, Dario Faggioli wrote:
> Hey,
>
> Am I the only one for which Travis seems to be unhappy of this:

Nope. I saw it too, but then figured there was some patch from Olaf for this?

> I/home/travis/build/fdario/xen/tools/misc/../../tools/include
> xen-detect.c -o xen-detect
> xen-detect.c: In function ‘check_sysfs’:
> xen-detect.c:196:17: error: ignoring return value of ‘asprintf’, declared
> with attribute warn_unused_result [-Werror=unused-result]
>      asprintf(, "V%s.%s", str, tmp);
>      ^
> xen-detect.c: In function ‘check_for_xen’:
> xen-detect.c:93:17: error: ignoring return value of ‘asprintf’, declared
> with attribute warn_unused_result [-Werror=unused-result]
>      asprintf(, "V%u.%u",
>      ^
> cc1: all warnings being treated as errors
>
> https://travis-ci.org/fdario/xen/jobs/245864401
>
> Which, to me, looks related to 48d0c822640f8ce4754de16f1bee5c995bac7078
> ("tools/xen-detect: try sysfs node for obtaining guest type").
>
> I can, however, build the tools locally, with:
> gcc version 6.3.0 20170516 (Debian 6.3.0-18)
>
> Thoughts?
>
> Regards,
> Dario
> --
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
[Xen-devel] [xen-4.8-testing test] 110946: tolerable FAIL - PUSHED
flight 110946 xen-4.8-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/110946/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail REGR. vs. 110437
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail REGR. vs. 110437

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-credit2 15 guest-start/debian.repeat fail like 110410
 test-xtf-amd64-amd64-3 45 xtf/test-hvm64-lbr-tsx-vmentry fail like 110437
 test-xtf-amd64-amd64-1 45 xtf/test-hvm64-lbr-tsx-vmentry fail like 110437
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail like 110437
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 110437
 test-armhf-armhf-xl-rtds 15 guest-start/debian.repeat fail like 110437
 test-amd64-amd64-xl-rtds 9 debian-install fail like 110437
 build-amd64-prev 6 xen-build/dist-test fail never pass
 build-i386-prev 6 xen-build/dist-test fail never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start fail never pass
 test-amd64-amd64-xl-qemut-ws16-amd64 9 windows-install fail never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-check fail never pass
 test-amd64-i386-libvirt 12 migrate-support-check fail never pass
 test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass
 test-amd64-i386-libvirt-xsm 12 migrate-support-check fail never pass
 test-amd64-amd64-libvirt 12 migrate-support-check fail never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 9 windows-install fail never pass
 test-arm64-arm64-xl-credit2 12 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 13 saverestore-support-check fail never pass
 test-arm64-arm64-xl 12 migrate-support-check fail never pass
 test-arm64-arm64-xl 13 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 12 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 13 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 12 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 13 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2 fail never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-check fail never pass
 test-armhf-armhf-xl 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-xsm 12 migrate-support-check fail never pass
 test-armhf-armhf-xl 13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-xsm 13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 12 migrate-support-check fail never pass
 test-armhf-armhf-libvirt 13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 13 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-raw 12 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-vhd 11 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 12 saverestore-support-check fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64 9 windows-install fail never pass
 test-amd64-amd64-xl-qemut-win10-i386 9 windows-install fail never pass
 test-amd64-i386-xl-qemut-win10-i386 9 windows-install fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386 9 windows-install fail never pass
 test-amd64-i386-xl-qemuu-win10-i386 9 windows-install fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 9 windows-install fail never pass
Re: [Xen-devel] [PATCH 14/17 v5] xen/arm: vpl011: Add support for vuart console in xenconsole
On Thu, 22 Jun 2017, Bhupinder Thakur wrote:
> This patch finally adds the support for vuart console.
>
> Signed-off-by: Bhupinder Thakur
> ---
> CC: Ian Jackson
> CC: Wei Liu
> CC: Stefano Stabellini
> CC: Julien Grall
>
> Changes since v4:
> - Renamed VUART_CFLAGS- to CFLAGS_vuart- in the Makefile as per the
>   convention.
>
>  config/arm32.mk           |  1 +
>  config/arm64.mk           |  1 +
>  tools/console/Makefile    |  3 ++-
>  tools/console/daemon/io.c | 31 ++-
>  4 files changed, 34 insertions(+), 2 deletions(-)
>
> diff --git a/config/arm32.mk b/config/arm32.mk
> index f95228e..b9f23fe 100644
> --- a/config/arm32.mk
> +++ b/config/arm32.mk
> @@ -1,5 +1,6 @@
>  CONFIG_ARM := y
>  CONFIG_ARM_32 := y
> +CONFIG_VUART_CONSOLE := y
>  CONFIG_ARM_$(XEN_OS) := y

I am tempted to disable this by default on arm32 (but leaving it
configurable via Kconfig maybe). Typically arm32 cpus are not found on
server platforms, where SBSA compliance is important. Julien, what do
you think?

>  CONFIG_XEN_INSTALL_SUFFIX :=
> diff --git a/config/arm64.mk b/config/arm64.mk
> index aa45772..861d0a4 100644
> --- a/config/arm64.mk
> +++ b/config/arm64.mk
> @@ -1,5 +1,6 @@
>  CONFIG_ARM := y
>  CONFIG_ARM_64 := y
> +CONFIG_VUART_CONSOLE := y
>  CONFIG_ARM_$(XEN_OS) := y
>
>  CONFIG_XEN_INSTALL_SUFFIX :=
> diff --git a/tools/console/Makefile b/tools/console/Makefile
> index c8b0300..1cddb6e 100644
> --- a/tools/console/Makefile
> +++ b/tools/console/Makefile
> @@ -11,6 +11,7 @@ LDLIBS += $(SOCKET_LIBS)
>
>  LDLIBS_xenconsoled += $(UTIL_LIBS)
>  LDLIBS_xenconsoled += -lrt
> +CFLAGS_vuart-$(CONFIG_VUART_CONSOLE) = -DCONFIG_VUART_CONSOLE
>
>  BIN = xenconsoled xenconsole
>
> @@ -28,7 +29,7 @@ clean:
>  distclean: clean
>
>  daemon/main.o: daemon/_paths.h
> -daemon/io.o: CFLAGS += $(CFLAGS_libxenevtchn) $(CFLAGS_libxengnttab)
> +daemon/io.o: CFLAGS += $(CFLAGS_libxenevtchn) $(CFLAGS_libxengnttab) $(CFLAGS_vuart-y)
>  xenconsoled: $(patsubst %.c,%.o,$(wildcard daemon/*.c))
>  	$(CC) $(LDFLAGS) $^ -o $@ $(LDLIBS) $(LDLIBS_libxenevtchn) $(LDLIBS_libxengnttab) $(LDLIBS_xenconsoled) $(APPEND_LDFLAGS)
>
> diff --git a/tools/console/daemon/io.c b/tools/console/daemon/io.c
> index baf0e2e..6b0114e 100644
> --- a/tools/console/daemon/io.c
> +++ b/tools/console/daemon/io.c
> @@ -107,12 +107,16 @@ struct console {
>  	xenevtchn_port_or_error_t remote_port;
>  	struct xencons_interface *interface;
>  	struct domain *d;
> +	bool optional;
> +	bool prefer_gnttab;
>  };
>
>  struct console_data {
>  	char *xsname;
>  	char *ttyname;
>  	char *log_suffix;
> +	bool optional;
> +	bool prefer_gnttab;
>  };
>
>  static struct console_data console_data[] = {
> @@ -121,7 +125,18 @@ static struct console_data console_data[] = {
>  		.xsname = "/console",
>  		.ttyname = "tty",
>  		.log_suffix = "",
> +		.optional = false,
> +		.prefer_gnttab = true,
>  	},
> +#if defined(CONFIG_VUART_CONSOLE)
> +	{
> +		.xsname = "/vuart/0",
> +		.ttyname = "tty",
> +		.log_suffix = "-vuart0",
> +		.optional = true,
> +		.prefer_gnttab = false,
> +	},
> +#endif
>  };
>
>  #define MAX_CONSOLE (sizeof(console_data)/sizeof(struct console_data))
> @@ -655,8 +670,18 @@ static int console_create_ring(struct console *con)
>  			"ring-ref", "%u", &ring_ref,
>  			"port", "%i", &remote_port,
>  			NULL);
>
>  	if (err)
> +	{
> +		/*
> +		 * This is a normal condition for optional consoles: they
> +		 * might not be present on xenstore at all. In that case,
> +		 * just return without error.
> +		 */
> +		if (con->optional)
> +			err = 0;
> +
>  		goto out;
> +	}
>
>  	snprintf(path, sizeof(path), "%s/type", con->xspath);
>  	type = xs_read(xs, XBT_NULL, path, NULL);
> @@ -670,7 +695,9 @@ static int console_create_ring(struct console *con)
>  	if (ring_ref != con->ring_ref && con->ring_ref != -1)
>  		console_unmap_interface(con);
>
> -	if (!con->interface && xgt_handle) {
> +	if (!con->interface &&
> +	    xgt_handle &&
> +	    con->prefer_gnttab) {
>  		/* Prefer using grant table */
>  		con->interface = xengnttab_map_grant_ref(xgt_handle,
>  			dom->domid, GNTTAB_RESERVED_CONSOLE,
> @@ -790,6 +817,8 @@ static int console_init(struct console *con, struct domain *dom, void **data)
>  	con->d = dom;
>  	con->ttyname = (*con_data)->ttyname;
>  	con->log_suffix = (*con_data)->log_suffix;
> +	con->optional = (*con_data)->optional;
> +	con->prefer_gnttab =
Re: [Xen-devel] [PATCH 13/17 v5] xen/arm: vpl011: Modify xenconsole to support multiple consoles
On Thu, 22 Jun 2017, Bhupinder Thakur wrote:
> This patch adds the support for multiple consoles and introduces the
> iterator functions to operate on multiple consoles.
>
> This patch is in preparation to support a new vuart console.
>
> Signed-off-by: Bhupinder Thakur
> ---
> CC: Ian Jackson
> CC: Wei Liu
> CC: Stefano Stabellini
> CC: Julien Grall
>
> Changes since v4:
> - Changes to make event channel handling per console rather than per
>   domain.
>
> Changes since v3:
> - The changes in xenconsole have been split into four patches. This is
>   the third patch.
>
>  tools/console/daemon/io.c | 435 --
>  1 file changed, 302 insertions(+), 133 deletions(-)
>
> diff --git a/tools/console/daemon/io.c b/tools/console/daemon/io.c
> index a2a3496..baf0e2e 100644
> --- a/tools/console/daemon/io.c
> +++ b/tools/console/daemon/io.c
> @@ -90,12 +90,14 @@ struct buffer {
>  };
>
>  struct console {
> +	char *ttyname;
>  	int master_fd;
>  	int master_pollfd_idx;
>  	int slave_fd;
>  	int log_fd;
>  	struct buffer buffer;
>  	char *xspath;
> +	char *log_suffix;
>  	int ring_ref;
>  	xenevtchn_handle *xce_handle;
>  	int xce_pollfd_idx;
> @@ -107,16 +109,112 @@ struct console {
>  	struct domain *d;
>  };
>
> +struct console_data {
> +	char *xsname;
> +	char *ttyname;
> +	char *log_suffix;
> +};
> +
> +static struct console_data console_data[] = {
> +	{
> +		.xsname = "/console",
> +		.ttyname = "tty",
> +		.log_suffix = "",
> +	},
> +};
> +
> +#define MAX_CONSOLE (sizeof(console_data)/sizeof(struct console_data))
> +
>  struct domain {
>  	int domid;
>  	bool is_dead;
>  	unsigned last_seen;
>  	struct domain *next;
> -	struct console console;
> +	struct console console[MAX_CONSOLE];
>  };
>
>  static struct domain *dom_head;
>
> +typedef void (*VOID_ITER_FUNC_ARG1)(struct console *);
> +typedef bool (*BOOL_ITER_FUNC_ARG1)(struct console *);
> +typedef int (*INT_ITER_FUNC_ARG1)(struct console *);
> +typedef void (*VOID_ITER_FUNC_ARG2)(struct console *, void *);
> +typedef int (*INT_ITER_FUNC_ARG3)(struct console *,
> +				  struct domain *dom, void **);
> +
> +static inline bool console_enabled(struct console *con)
> +{
> +	return con->local_port != -1;
> +}
> +
> +static inline void console_iter_void_arg1(struct domain *d,
> +					  VOID_ITER_FUNC_ARG1 iter_func)
> +{
> +	int i = 0;
> +	struct console *con = &(d->console[0]);
> +
> +	for (i = 0; i < MAX_CONSOLE; i++, con++)
> +	{
> +		iter_func(con);
> +	}
> +}
> +
> +static inline void console_iter_void_arg2(struct domain *d,
> +					  VOID_ITER_FUNC_ARG2 iter_func,
> +					  void *iter_data)
> +{
> +	int i = 0;
> +	struct console *con = &(d->console[0]);
> +
> +	for (i = 0; i < MAX_CONSOLE; i++, con++)
> +	{
> +		iter_func(con, iter_data);
> +	}
> +}
> +
> +static inline bool console_iter_bool_arg1(struct domain *d,
> +					  BOOL_ITER_FUNC_ARG1 iter_func)
> +{
> +	int i = 0;
> +	struct console *con = &(d->console[0]);
> +
> +	for (i = 0; i < MAX_CONSOLE; i++, con++)
> +	{
> +		if (iter_func(con))
> +			return true;
> +	}
> +	return false;
> +}
> +
> +static inline int console_iter_int_arg1(struct domain *d,
> +					INT_ITER_FUNC_ARG1 iter_func)
> +{
> +	int i = 0;
> +	struct console *con = &(d->console[0]);
> +
> +	for (i = 0; i < MAX_CONSOLE; i++, con++)
> +	{
> +		if (iter_func(con))
> +			return 1;
> +	}
> +	return 0;
> +}
> +
> +static inline int console_iter_int_arg3(struct domain *d,
> +					INT_ITER_FUNC_ARG3 iter_func,
> +					void **iter_data)
> +{
> +	int i = 0;
> +	struct console *con = &(d->console[0]);
> +
> +	for (i = 0; i < MAX_CONSOLE; i++, con++)
> +	{
> +		if (iter_func(con, d, iter_data))
> +			return 1;
> +	}
> +	return 0;
> +}
>
>  static int write_all(int fd, const char* buf, size_t len)
>  {
>  	while (len) {
> @@ -163,12 +261,22 @@ static int write_with_timestamp(int fd, const char *data, size_t sz,
>  		return 0;
>  }
>
> -static void buffer_append(struct console
Re: [Xen-devel] [PATCH 11/17 v5] xen/arm: vpl011: Rename the console structure field conspath to xspath
On Thu, 22 Jun 2017, Bhupinder Thakur wrote:
> The console->conspath name is changed to console->xspath as it is
> clear from the name that it is referring to the xenstore path.
>
> Signed-off-by: Bhupinder Thakur

Reviewed-by: Stefano Stabellini

> ---
> CC: Ian Jackson
> CC: Wei Liu
> CC: Stefano Stabellini
> CC: Julien Grall
>
> Changes since v4:
> - Split this change in a separate patch.
>
>  tools/console/daemon/io.c | 30 +++---
>  1 file changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/tools/console/daemon/io.c b/tools/console/daemon/io.c
> index 30cd167..6f5c69c 100644
> --- a/tools/console/daemon/io.c
> +++ b/tools/console/daemon/io.c
> @@ -95,7 +95,7 @@ struct console {
>  	int slave_fd;
>  	int log_fd;
>  	struct buffer buffer;
> -	char *conspath;
> +	char *xspath;
>  	int ring_ref;
>  	xenevtchn_handle *xce_handle;
>  	int xce_pollfd_idx;
> @@ -463,7 +463,7 @@ static int domain_create_tty(struct domain *dom)
>  		goto out;
>  	}
>
> -	success = asprintf(&path, "%s/limit", con->conspath) !=
> +	success = asprintf(&path, "%s/limit", con->xspath) !=
>  		-1;
>  	if (!success)
>  		goto out;
> @@ -474,7 +474,7 @@ static int domain_create_tty(struct domain *dom)
>  	}
>  	free(path);
>
> -	success = (asprintf(&path, "%s/tty", con->conspath) != -1);
> +	success = (asprintf(&path, "%s/tty", con->xspath) != -1);
>  	if (!success)
>  		goto out;
>  	success = xs_write(xs, XBT_NULL, path, slave, strlen(slave));
> @@ -546,14 +546,14 @@ static int domain_create_ring(struct domain *dom)
>  	char *type, path[PATH_MAX];
>  	struct console *con = &dom->console;
>
> -	err = xs_gather(xs, con->conspath,
> +	err = xs_gather(xs, con->xspath,
>  			"ring-ref", "%u", &ring_ref,
>  			"port", "%i", &remote_port,
>  			NULL);
>  	if (err)
>  		goto out;
>
> -	snprintf(path, sizeof(path), "%s/type", con->conspath);
> +	snprintf(path, sizeof(path), "%s/type", con->xspath);
>  	type = xs_read(xs, XBT_NULL, path, NULL);
>  	if (type && strcmp(type, "xenconsoled") != 0) {
>  		free(type);
> @@ -646,13 +646,13 @@ static bool watch_domain(struct domain *dom, bool watch)
>
>  	snprintf(domid_str, sizeof(domid_str), "dom%u", dom->domid);
>  	if (watch) {
> -		success = xs_watch(xs, con->conspath, domid_str);
> +		success = xs_watch(xs, con->xspath, domid_str);
>  		if (success)
>  			domain_create_ring(dom);
>  		else
> -			xs_unwatch(xs, con->conspath, domid_str);
> +			xs_unwatch(xs, con->xspath, domid_str);
>  	} else {
> -		success = xs_unwatch(xs, con->conspath, domid_str);
> +		success = xs_unwatch(xs, con->xspath, domid_str);
>  	}
>
>  	return success;
> @@ -682,13 +682,13 @@ static struct domain *create_domain(int domid)
>  	dom->domid = domid;
>
>  	con = &dom->console;
> -	con->conspath = xs_get_domain_path(xs, dom->domid);
> -	s = realloc(con->conspath, strlen(con->conspath) +
> +	con->xspath = xs_get_domain_path(xs, dom->domid);
> +	s = realloc(con->xspath, strlen(con->xspath) +
>  		    strlen("/console") + 1);
>  	if (s == NULL)
>  		goto out;
> -	con->conspath = s;
> -	strcat(con->conspath, "/console");
> +	con->xspath = s;
> +	strcat(con->xspath, "/console");
>
>  	con->master_fd = -1;
>  	con->master_pollfd_idx = -1;
> @@ -712,7 +712,7 @@ static struct domain *create_domain(int domid)
>
>  	return dom;
>  out:
> -	free(con->conspath);
> +	free(con->xspath);
>  	free(dom);
>  	return NULL;
>  }
> @@ -756,8 +756,8 @@ static void cleanup_domain(struct domain *d)
>  	free(con->buffer.data);
>  	con->buffer.data = NULL;
>
> -	free(con->conspath);
> -	con->conspath = NULL;
> +	free(con->xspath);
> +	con->xspath = NULL;
>
>  	remove_domain(d);
>  }
> --
> 2.7.4
Re: [Xen-devel] [PATCH 10/17 v5] xen/arm: vpl011: Modify xenconsole to define and use a new console structure
On Thu, 22 Jun 2017, Bhupinder Thakur wrote: > Xenconsole uses a domain structure which contains console specific fields. > This > patch defines a new console structure, which would be used by the xenconsole > functions to perform console specific operations like reading/writing data > from/to > the console ring buffer or reading/writing data from/to console tty. > > This patch is in preparation to support multiple consoles to support vuart > console. > > Signed-off-by: Bhupinder ThakurReviewed-by: Stefano Stabellini > --- > CC: Ian Jackson > CC: Wei Liu > CC: Stefano Stabellini > CC: Julien Grall > > Changes since v4: > - Moved the following fields from the struct domain to struct console: > ->xenevtchn_handle *xce_handle; > ->int xce_pollfd_idx; > ->int event_count; > ->long long next_period; > > Changes since v3: > - The changes in xenconsole have been split into four patches. This is the > first patch > which modifies the xenconsole to use a new console structure. > > Changes since v2: > - Defined a new function console_create_ring() which sets up the ring buffer > and > event channel a new console. domain_create_ring() uses this function to > setup > a console. > - This patch does not contain vuart specific changes, which would be > introduced in > the next patch. > - Changes for keeping the PV log file name unchanged. 
> > Changes since v1: > - Split the domain struture to a separate console structure > - Modified the functions to operate on the console struture > - Replaced repetitive per console code with generic code > > tools/console/daemon/io.c | 299 > +- > 1 file changed, 165 insertions(+), 134 deletions(-) > > diff --git a/tools/console/daemon/io.c b/tools/console/daemon/io.c > index e8033d2..30cd167 100644 > --- a/tools/console/daemon/io.c > +++ b/tools/console/daemon/io.c > @@ -89,25 +89,30 @@ struct buffer { > size_t max_capacity; > }; > > -struct domain { > - int domid; > +struct console { > int master_fd; > int master_pollfd_idx; > int slave_fd; > int log_fd; > - bool is_dead; > - unsigned last_seen; > struct buffer buffer; > - struct domain *next; > char *conspath; > int ring_ref; > - xenevtchn_port_or_error_t local_port; > - xenevtchn_port_or_error_t remote_port; > xenevtchn_handle *xce_handle; > int xce_pollfd_idx; > - struct xencons_interface *interface; > int event_count; > long long next_period; > + xenevtchn_port_or_error_t local_port; > + xenevtchn_port_or_error_t remote_port; > + struct xencons_interface *interface; > + struct domain *d; > +}; > + > +struct domain { > + int domid; > + bool is_dead; > + unsigned last_seen; > + struct domain *next; > + struct console console; > }; > > static struct domain *dom_head; > @@ -160,9 +165,10 @@ static int write_with_timestamp(int fd, const char > *data, size_t sz, > > static void buffer_append(struct domain *dom) > { > - struct buffer *buffer = >buffer; > + struct console *con = >console; > + struct buffer *buffer = >buffer; > XENCONS_RING_IDX cons, prod, size; > - struct xencons_interface *intf = dom->interface; > + struct xencons_interface *intf = con->interface; > > cons = intf->out_cons; > prod = intf->out_prod; > @@ -187,22 +193,22 @@ static void buffer_append(struct domain *dom) > > xen_mb(); > intf->out_cons = cons; > - xenevtchn_notify(dom->xce_handle, dom->local_port); > + xenevtchn_notify(con->xce_handle, 
con->local_port); > > /* Get the data to the logfile as early as possible because if >* no one is listening on the console pty then it will fill up >* and handle_tty_write will stop being called. >*/ > - if (dom->log_fd != -1) { > int logret; > if (log_time_guest) { > logret = write_with_timestamp( > - dom->log_fd, > + con->log_fd, > buffer->data + buffer->size - size, > size, &log_time_guest_needts); > } else { > logret = write_all( > - dom->log_fd, > + con->log_fd, > buffer->data + buffer->size - size, > size); > } > @@ -338,14 +344,16 @@ static int create_domain_log(struct domain *dom) > > static void domain_close_tty(struct domain *dom) > { > - if (dom->master_fd != -1) { > - close(dom->master_fd); > - dom->master_fd = -1; > + struct console *con = &dom->console; >
Re: [Xen-devel] [PATCH 15/17 v5] xen/arm: vpl011: Add a new vuart console type to xenconsole client
On Thu, 22 Jun 2017, Bhupinder Thakur wrote: > Add a new console type VUART to connect to guest's emualated vuart > console. > > Signed-off-by: Bhupinder ThakurReviewed-by: Stefano Stabellini > --- > CC: Ian Jackson > CC: Wei Liu > CC: Stefano Stabellini > CC: Julien Grall > > Changes since v4: > - Removed the vuart compile time flag so that vuart code is compiled always. > > Changes since v3: > - The vuart console support is under CONFIG_VUART_CONSOLE option. > - Since there is a change from last review, I have not included > reviewed-by tag from Stefano and acked-by tag from Wei. > > tools/console/client/main.c | 13 +++-- > 1 file changed, 11 insertions(+), 2 deletions(-) > > diff --git a/tools/console/client/main.c b/tools/console/client/main.c > index 99f..3dbb06f 100644 > --- a/tools/console/client/main.c > +++ b/tools/console/client/main.c > @@ -76,7 +76,7 @@ static void usage(const char *program) { > "\n" > " -h, --help display this help and exit\n" > " -n, --num N use console number N\n" > -" --type TYPE console type. must be 'pv' or 'serial'\n" > +" --type TYPE console type. 
must be 'pv', 'serial' or > 'vuart'\n" > " --start-notify-fd N file descriptor used to notify parent\n" > , program); > } > @@ -264,6 +264,7 @@ typedef enum { > CONSOLE_INVAL, > CONSOLE_PV, > CONSOLE_SERIAL, > + CONSOLE_VUART, > } console_type; > > static struct termios stdin_old_attr; > @@ -343,6 +344,7 @@ int main(int argc, char **argv) > char *end; > console_type type = CONSOLE_INVAL; > bool interactive = 0; > + char *console_names = "serial, pv, vuart"; > > if (isatty(STDIN_FILENO) && isatty(STDOUT_FILENO)) > interactive = 1; > @@ -361,9 +363,12 @@ int main(int argc, char **argv) > type = CONSOLE_SERIAL; > else if (!strcmp(optarg, "pv")) > type = CONSOLE_PV; > + else if (!strcmp(optarg, "vuart")) > + type = CONSOLE_VUART; > else { > fprintf(stderr, "Invalid type argument\n"); > - fprintf(stderr, "Console types supported are: > serial, pv\n"); > + fprintf(stderr, "Console types supported are: > %s\n", > + console_names); > exit(EINVAL); > } > break; > @@ -436,6 +441,10 @@ int main(int argc, char **argv) > else > snprintf(path, strlen(dom_path) + > strlen("/device/console/%d/tty") + 5, "%s/device/console/%d/tty", dom_path, > num); > } > + if (type == CONSOLE_VUART) { > + snprintf(path, strlen(dom_path) + strlen("/vuart/0/tty") + 1, > + "%s/vuart/0/tty", dom_path); > + } > > /* FIXME consoled currently does not assume domain-0 doesn't have a > console which is good when we break domain-0 up. To keep us > -- > 2.7.4 > ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 09/17 v5] xen/arm: vpl011: Add a new vuart node in the xenstore
On Thu, 22 Jun 2017, Bhupinder Thakur wrote: > Add a new vuart console node to xenstore. This node is added at > > /local/domain/$DOMID/vuart/0. > > The node contains information such as the ring-ref, event channel, > buffer limit and type of console. > > Xenconsole reads the node information to setup the ring buffer and > event channel for sending/receiving vuart data. > > Signed-off-by: Bhupinder ThakurReviewed-by: Stefano Stabellini > --- > CC: Ian Jackson > CC: Wei Liu > CC: Stefano Stabellini > CC: Julien Grall > > Changes since v4: > - vuart_device moved inside libxl__device_vuart_add() as a local variable. > > Changes since v3: > - Added a backend node for vpl011. > - Removed libxl__device_vuart_add() for HVM guest. It is called only for PV > guest. > > tools/libxl/libxl_console.c | 44 > > tools/libxl/libxl_create.c | 10 +++- > tools/libxl/libxl_device.c | 9 ++-- > tools/libxl/libxl_internal.h | 3 +++ > tools/libxl/libxl_types_internal.idl | 1 + > 5 files changed, 64 insertions(+), 3 deletions(-) > > diff --git a/tools/libxl/libxl_console.c b/tools/libxl/libxl_console.c > index 853be15..cdaf7fd 100644 > --- a/tools/libxl/libxl_console.c > +++ b/tools/libxl/libxl_console.c > @@ -344,6 +344,50 @@ out: > return rc; > } > > +int libxl__device_vuart_add(libxl__gc *gc, uint32_t domid, > +libxl__device_console *console, > +libxl__domain_build_state *state) > +{ > +libxl__device device; > +flexarray_t *ro_front; > +flexarray_t *back; > +int rc; > + > +ro_front = flexarray_make(gc, 16, 1); > +back = flexarray_make(gc, 16, 1); > + > +device.backend_devid = console->devid; > +device.backend_domid = console->backend_domid; > +device.backend_kind = LIBXL__DEVICE_KIND_VUART; > +device.devid = console->devid; > +device.domid = domid; > +device.kind = LIBXL__DEVICE_KIND_VUART; > + > +flexarray_append(back, "frontend-id"); > +flexarray_append(back, GCSPRINTF("%d", domid)); > +flexarray_append(back, "online"); > +flexarray_append(back, "1"); > +flexarray_append(back, 
"state"); > +flexarray_append(back, GCSPRINTF("%d", XenbusStateInitialising)); > +flexarray_append(back, "protocol"); > +flexarray_append(back, LIBXL_XENCONSOLE_PROTOCOL); > + > +flexarray_append(ro_front, "port"); > +flexarray_append(ro_front, GCSPRINTF("%"PRIu32, state->vuart_port)); > +flexarray_append(ro_front, "ring-ref"); > +flexarray_append(ro_front, GCSPRINTF("%lu", state->vuart_gfn)); > +flexarray_append(ro_front, "limit"); > +flexarray_append(ro_front, GCSPRINTF("%d", LIBXL_XENCONSOLE_LIMIT)); > +flexarray_append(ro_front, "type"); > +flexarray_append(ro_front, "xenconsoled"); > + > +rc = libxl__device_generic_add(gc, XBT_NULL, , > + libxl__xs_kvs_of_flexarray(gc, back), > + NULL, > + libxl__xs_kvs_of_flexarray(gc, ro_front)); > +return rc; > +} > + > int libxl__init_console_from_channel(libxl__gc *gc, > libxl__device_console *console, > int dev_num, > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c > index bffbc45..cfd85ec 100644 > --- a/tools/libxl/libxl_create.c > +++ b/tools/libxl/libxl_create.c > @@ -1367,7 +1367,7 @@ static void domcreate_launch_dm(libxl__egc *egc, > libxl__multidev *multidev, > } > case LIBXL_DOMAIN_TYPE_PV: > { > -libxl__device_console console; > +libxl__device_console console, vuart; > libxl__device device; > > for (i = 0; i < d_config->num_vfbs; i++) { > @@ -1375,6 +1375,14 @@ static void domcreate_launch_dm(libxl__egc *egc, > libxl__multidev *multidev, > libxl__device_vkb_add(gc, domid, _config->vkbs[i]); > } > > +if (d_config->b_info.arch_arm.vuart) > +{ > +init_console_info(gc, , 0); > +vuart.backend_domid = state->console_domid; > +libxl__device_vuart_add(gc, domid, , state); > +libxl__device_console_dispose(); > +} > + > init_console_info(gc, , 0); > console.backend_domid = state->console_domid; > libxl__device_console_add(gc, domid, , state, ); > diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c > index 00356af..3b10c58 100644 > --- a/tools/libxl/libxl_device.c > +++ 
b/tools/libxl/libxl_device.c > @@ -26,6 +26,9 @@ static char *libxl__device_frontend_path(libxl__gc *gc, > libxl__device *device) > if (device->kind == LIBXL__DEVICE_KIND_CONSOLE && device->devid == 0) > return
Re: [Xen-devel] [PATCH 08/17 v5] xen/arm: vpl011: Add a new domctl API to initialize vpl011
On Thu, 22 Jun 2017, Bhupinder Thakur wrote: > Add a new domctl API to initialize vpl011. It takes the GFN and console > backend domid as input and returns an event channel to be used for > sending and receiving events from Xen. > > Xen will communicate with xenconsole using GFN as the ring buffer and > the event channel to transmit and receive pl011 data on the guest domain's > behalf. > > Signed-off-by: Bhupinder Thakur> --- > CC: Ian Jackson > CC: Wei Liu > CC: Stefano Stabellini > CC: Julien Grall > > Changes since v4: > - Removed libxl__arch_domain_create_finish(). > - Added a new function libxl__arch_build_dom_finish(), which is called at the > last > in libxl__build_dom(). This function calls the vpl011 initialization > function now. > > Changes since v3: > - Added a new arch specific function libxl__arch_domain_create_finish(), which > calls the vpl011 initialization function. For x86 this function does not do > anything. > - domain_vpl011_init() takes a pointer to a structure which contains all the > required information such as console_domid, gfn instead of passing > parameters > separately. > - Dropped a DOMCTL API defined for de-initializing vpl011 as that should be > taken care when the domain is destroyed (and not dependent on userspace > libraries/applications). > > Changes since v2: > - Replaced the DOMCTL APIs defined for get/set of event channel and GFN with > a set of DOMCTL APIs for initializing and de-initializing vpl011 emulation. 
> > tools/libxc/include/xenctrl.h | 20 > tools/libxc/xc_domain.c | 25 + > tools/libxl/libxl_arch.h | 6 ++ > tools/libxl/libxl_arm.c | 22 ++ > tools/libxl/libxl_dom.c | 4 > tools/libxl/libxl_x86.c | 8 > xen/arch/arm/domain.c | 5 + > xen/arch/arm/domctl.c | 37 + > xen/include/public/domctl.h | 12 > 9 files changed, 139 insertions(+) > > diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h > index 1629f41..26f3d1e 100644 > --- a/tools/libxc/include/xenctrl.h > +++ b/tools/libxc/include/xenctrl.h > @@ -885,6 +885,26 @@ int xc_vcpu_getcontext(xc_interface *xch, > uint32_t vcpu, > vcpu_guest_context_any_t *ctxt); > > +#if defined (__arm__) || defined(__aarch64__) > +/** > + * This function initializes the vpl011 emulation and returns > + * the event to be used by the backend for communicating with > + * the emulation code. > + * > + * @parm xch a handle to an open hypervisor interface > + * @parm domid the domain to get information from > + * @parm console_domid the domid of the backend console > + * @parm gfn the guest pfn to be used as the ring buffer > + * @parm evtchn the event channel to be used for events > + * @return 0 on success, negative error on failure > + */ > +int xc_dom_vpl011_init(xc_interface *xch, > + uint32_t domid, > + uint32_t console_domid, > + xen_pfn_t gfn, > + evtchn_port_t *evtchn); > +#endif Actually, the pattern is to define the xc_ function on all architecture but only return ENOSYS where it's not implemented, see xc_vcpu_get_extstate. > /** > * This function returns information about the XSAVE state of a particular > * vcpu of a domain. 
If extstate->size and extstate->xfeature_mask are 0, > diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c > index 5d192ea..55de408 100644 > --- a/tools/libxc/xc_domain.c > +++ b/tools/libxc/xc_domain.c > @@ -343,6 +343,31 @@ int xc_domain_get_guest_width(xc_interface *xch, > uint32_t domid, > return 0; > } > > +#if defined (__arm__) || defined(__aarch64__) > +int xc_dom_vpl011_init(xc_interface *xch, > + uint32_t domid, > + uint32_t console_domid, > + xen_pfn_t gfn, > + evtchn_port_t *evtchn) > +{ See other comment. > +DECLARE_DOMCTL; > +int rc = 0; > + > +domctl.cmd = XEN_DOMCTL_vuart_op; > +domctl.domain = (domid_t)domid; > +domctl.u.vuart_op.cmd = XEN_DOMCTL_VUART_OP_INIT_VPL011; > +domctl.u.vuart_op.console_domid = console_domid; > +domctl.u.vuart_op.gfn = gfn; > + > +if ( (rc = do_domctl(xch, )) < 0 ) > +return rc; > + > +*evtchn = domctl.u.vuart_op.evtchn; > + > +return rc; > +} > +#endif > + > int xc_domain_getinfo(xc_interface *xch, >uint32_t first_domid, >unsigned int max_doms, > diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h > index 5e1fc60..118b92c 100644 > --- a/tools/libxl/libxl_arch.h > +++ b/tools/libxl/libxl_arch.h > @@ -44,6 +44,12 @@ int
Re: [Xen-devel] [PATCH 07/17 v5] xen/arm: vpl011: Rearrange xen header includes in alphabetical order in domctl.c
On Thu, 22 Jun 2017, Bhupinder Thakur wrote: > Rearrange xen header includes in alphabetical order in domctl.c. > > Signed-off-by: Bhupinder Thakur > Reviewed-by: Stefano Stabellini > --- > CC: Stefano Stabellini > CC: Julien Grall > > xen/arch/arm/domctl.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c > index 971caec..86fa102 100644 > --- a/xen/arch/arm/domctl.c > +++ b/xen/arch/arm/domctl.c > @@ -5,11 +5,11 @@ > */ > > #include > -#include > #include > -#include > #include > #include > +#include > +#include > #include > #include > > -- > 2.7.4 >
Re: [Xen-devel] [PATCH 06/17 v5] xen/arm: vpl011: Add support for vuart in libxl
On Thu, 22 Jun 2017, Bhupinder Thakur wrote: > An option is provided in libxl to enable/disable sbsa vuart while > creating a guest domain. > > Libxl now suppots a generic vuart console and sbsa uart is a specific type. > In future support can be added for multiple vuart of different types. > > User can enable sbsa vuart by adding the following line in the guest > configuration file: > > vuart = "sbsa_uart" > > Signed-off-by: Bhupinder ThakurAcked-by: Stefano Stabellini > --- > CC: Ian Jackson > CC: Wei Liu > CC: Stefano Stabellini > CC: Julien Grall > > Changes since v4: > - Renamed "pl011" to "sbsa_uart". > > Changes since v3: > - Added a new config option CONFIG_VUART_CONSOLE to enable/disable vuart > console > support. > - Moved libxl_vuart_type to arch-arm part of libxl_domain_build_info > - Updated xl command help to mention new console type - vuart. > > Changes since v2: > - Defined vuart option as an enum instead of a string. > - Removed the domain creation flag defined for vuart and the related code > to pass on the information while domain creation. Now vpl011 is initialized > independent of domain creation through new DOMCTL APIs. > > tools/libxl/libxl.h | 6 ++ > tools/libxl/libxl_console.c | 3 +++ > tools/libxl/libxl_dom.c | 1 + > tools/libxl/libxl_internal.h | 3 +++ > tools/libxl/libxl_types.idl | 7 +++ > tools/xl/xl_cmdtable.c | 2 +- > tools/xl/xl_console.c| 5 - > tools/xl/xl_parse.c | 8 > 8 files changed, 33 insertions(+), 2 deletions(-) > > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h > index cf8687a..bcfbb6c 100644 > --- a/tools/libxl/libxl.h > +++ b/tools/libxl/libxl.h > @@ -306,6 +306,12 @@ > #define LIBXL_HAVE_BUILDINFO_HVM_ACPI_LAPTOP_SLATE 1 > > /* > + * LIBXL_HAVE_VUART indicates that xenconsole/client supports > + * virtual uart. 
> + */ > +#define LIBXL_HAVE_VUART 1 > + > +/* > * libxl ABI compatibility > * > * The only guarantee which libxl makes regarding ABI compatibility > diff --git a/tools/libxl/libxl_console.c b/tools/libxl/libxl_console.c > index 446e766..853be15 100644 > --- a/tools/libxl/libxl_console.c > +++ b/tools/libxl/libxl_console.c > @@ -67,6 +67,9 @@ int libxl_console_exec(libxl_ctx *ctx, uint32_t domid, int > cons_num, > case LIBXL_CONSOLE_TYPE_SERIAL: > cons_type_s = "serial"; > break; > +case LIBXL_CONSOLE_TYPE_VUART: > +cons_type_s = "vuart"; > +break; > default: > goto out; > } > diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c > index 5d914a5..c98af60 100644 > --- a/tools/libxl/libxl_dom.c > +++ b/tools/libxl/libxl_dom.c > @@ -788,6 +788,7 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid, > if (xc_dom_translated(dom)) { > state->console_mfn = dom->console_pfn; > state->store_mfn = dom->xenstore_pfn; > +state->vuart_gfn = dom->vuart_gfn; > } else { > state->console_mfn = xc_dom_p2m(dom, dom->console_pfn); > state->store_mfn = xc_dom_p2m(dom, dom->xenstore_pfn); > diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h > index afe6652..d0d50c3 100644 > --- a/tools/libxl/libxl_internal.h > +++ b/tools/libxl/libxl_internal.h > @@ -1139,6 +1139,9 @@ typedef struct { > uint32_t num_vmemranges; > > xc_domain_configuration_t config; > + > +xen_pfn_t vuart_gfn; > +evtchn_port_t vuart_port; > } libxl__domain_build_state; > > _hidden int libxl__build_pre(libxl__gc *gc, uint32_t domid, > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl > index 2204425..d492b35 100644 > --- a/tools/libxl/libxl_types.idl > +++ b/tools/libxl/libxl_types.idl > @@ -105,6 +105,7 @@ libxl_console_type = Enumeration("console_type", [ > (0, "UNKNOWN"), > (1, "SERIAL"), > (2, "PV"), > +(3, "VUART"), > ]) > > libxl_disk_format = Enumeration("disk_format", [ > @@ -240,6 +241,11 @@ libxl_checkpointed_stream = > Enumeration("checkpointed_stream", [ > 
(2, "COLO"), > ]) > > +libxl_vuart_type = Enumeration("vuart_type", [ > +(0, "unknown"), > +(1, "sbsa_uart"), > +]) > + > # > # Complex libxl types > # > @@ -580,6 +586,7 @@ libxl_domain_build_info = Struct("domain_build_info",[ > > > ("arch_arm", Struct(None, [("gic_version", libxl_gic_version), > + ("vuart", libxl_vuart_type), >])), > # Alternate p2m is not bound to any architecture or guest type, as it is > # supported by x86 HVM and ARM support is planned. > diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c > index 30eb93c..9f91651 100644 > --- a/tools/xl/xl_cmdtable.c > +++ b/tools/xl/xl_cmdtable.c > @@ -133,7 +133,7 @@ struct cmd_spec cmd_table[] = { >
Re: [Xen-devel] [PATCH 04/17 v5] xen/arm: vpl011: Add SBSA UART emulation in Xen
On Thu, 22 Jun 2017, Bhupinder Thakur wrote: > Add emulation code to emulate read/write access to pl011 registers > and pl011 interrupts: > > - Emulate DR read/write by reading and writing from/to the IN > and OUT ring buffers and raising an event to the backend when > there is data in the OUT ring buffer and injecting an interrupt > to the guest when there is data in the IN ring buffer > > - Other registers are related to interrupt management and > essentially control when interrupts are delivered to the guest > > This patch implements the SBSA Generic UART which is a subset of ARM > PL011 UART. > > The SBSA Generic UART is covered in Appendix B of > https://static.docs.arm.com/den0029/a/Server_Base_System_Architecture_v3_1_ARM_DEN_0029A.pdf > > Signed-off-by: Bhupinder Thakur> --- > CC: Stefano Stabellini > CC: Julien Grall > CC: Andre Przywara > > Changes since v4: > - Renamed vpl011_update() to vpl011_update_interrupt_status() and added logic > to avoid > raising spurious interrupts. > - Used barrier instructions correctly while reading/writing data to the ring > buffer. > - Proper lock taken before reading ring buffer indices. > > Changes since v3: > - Moved the call to DEFINE_XEN_FLEX_RING from vpl011.h to public/console.h. > This macro defines > standard functions to operate on the ring buffer. > - Lock taken while updating the interrupt mask and clear registers in > mmio_write. > - Use gfn_t instead of xen_pfn_t. > - vgic_free_virq called if there is any error in vpl011 initialization. > - mmio handlers freed if there is any error in vpl011 initialization. > - Removed vpl011->initialized flag usage as the same check could be done > using vpl011->ring-ref. > - Used return instead of break in the switch handling of emulation of > different pl011 registers. > - Renamed vpl011_update_spi() to vpl011_update(). > > Changes since v2: > - Use generic vreg_reg* for read/write of registers emulating pl011. 
> - Use generic ring buffer functions defined using DEFINE_XEN_FLEX_RING. > - Renamed the SPI injection function to vpl011_update_spi() to reflect level > triggered nature of pl011 interrupts. > - The pl011 register access address should always be the base address of the > corresponding register as per section B of the SBSA document. For this > reason, > the register range address access is not allowed. > > Changes since v1: > - Removed the optimiztion related to sendiing events to xenconsole > - Use local variables as ring buffer indices while using the ring buffer > > xen/arch/arm/Kconfig | 7 + > xen/arch/arm/Makefile| 1 + > xen/arch/arm/vpl011.c| 449 > +++ > xen/include/asm-arm/domain.h | 6 + > xen/include/asm-arm/pl011-uart.h | 2 + > xen/include/asm-arm/vpl011.h | 73 +++ > xen/include/public/arch-arm.h| 6 + > 7 files changed, 544 insertions(+) > create mode 100644 xen/arch/arm/vpl011.c > create mode 100644 xen/include/asm-arm/vpl011.h > > diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig > index d46b98c..f58019d 100644 > --- a/xen/arch/arm/Kconfig > +++ b/xen/arch/arm/Kconfig > @@ -50,6 +50,13 @@ config HAS_ITS > prompt "GICv3 ITS MSI controller support" if EXPERT = "y" > depends on HAS_GICV3 > > +config SBSA_VUART_CONSOLE > + bool "Emulated SBSA UART console support" > + default y > + ---help--- > + Allows a guest to use SBSA Generic UART as a console. The > + SBSA Generic UART implements a subset of ARM PL011 UART. 
> + > endmenu > > menu "ARM errata workaround via the alternative framework" > diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile > index 49e1fb2..d9c6ebf 100644 > --- a/xen/arch/arm/Makefile > +++ b/xen/arch/arm/Makefile > @@ -50,6 +50,7 @@ obj-$(CONFIG_HAS_GICV3) += vgic-v3.o > obj-$(CONFIG_HAS_ITS) += vgic-v3-its.o > obj-y += vm_event.o > obj-y += vtimer.o > +obj-$(CONFIG_SBSA_VUART_CONSOLE) += vpl011.o > obj-y += vpsci.o > obj-y += vuart.o > > diff --git a/xen/arch/arm/vpl011.c b/xen/arch/arm/vpl011.c > new file mode 100644 > index 000..db8651c > --- /dev/null > +++ b/xen/arch/arm/vpl011.c > @@ -0,0 +1,449 @@ > +/* > + * arch/arm/vpl011.c > + * > + * Virtual PL011 UART > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms and conditions of the GNU General Public License, > + * version 2, as published by the Free Software Foundation. > + * > + * This program is distributed in the hope it will be useful, but WITHOUT > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for > + * more details. > + * > + * You should have received a copy of the GNU General Public License along > with > + * this
Re: [Xen-devel] [PATCH 03/17 v5] xen/arm: vpl011: Define common ring buffer helper functions in console.h
On Thu, 22 Jun 2017, Bhupinder Thakur wrote: > DEFINE_XEN_FLEX_RING(xencons) defines common helper functions such as > xencons_queued() to tell the current size of the ring buffer, > xencons_mask() to mask off the index, which are useful helper functions. > pl011 emulation code will use these helper functions. > > io/consol.h includes io/ring.h which defines DEFINE_XEN_FLEX_RING. io/console.h > In console/daemon/io.c, string.h had to be included before io/console.h > because ring.h uses string functions. > > Signed-off-by: Bhupinder Thakur > Reviewed-by: Stefano Stabellini > --- > CC: Ian Jackson > CC: Wei Liu > CC: Konrad Rzeszutek Wilk > CC: Stefano Stabellini > CC: Julien Grall > > Changes since v4: > - Split this change in a separate patch. > > tools/console/daemon/io.c | 2 +- > xen/include/public/io/console.h | 4 > 2 files changed, 5 insertions(+), 1 deletion(-) > > diff --git a/tools/console/daemon/io.c b/tools/console/daemon/io.c > index 7e474bb..e8033d2 100644 > --- a/tools/console/daemon/io.c > +++ b/tools/console/daemon/io.c > @@ -21,6 +21,7 @@ > > #include "utils.h" > #include "io.h" > +#include <string.h> > #include > #include > #include > @@ -29,7 +30,6 @@ > > #include > #include > -#include <string.h> > #include > #include > #include > diff --git a/xen/include/public/io/console.h b/xen/include/public/io/console.h > index e2cd97f..5e45e1c 100644 > --- a/xen/include/public/io/console.h > +++ b/xen/include/public/io/console.h > @@ -27,6 +27,8 @@ > #ifndef __XEN_PUBLIC_IO_CONSOLE_H__ > #define __XEN_PUBLIC_IO_CONSOLE_H__ > > +#include "ring.h" > + > typedef uint32_t XENCONS_RING_IDX; > > #define MASK_XENCONS_IDX(idx, ring) ((idx) & (sizeof(ring)-1)) > @@ -38,6 +40,8 @@ struct xencons_interface { > XENCONS_RING_IDX out_cons, out_prod; > }; > > +DEFINE_XEN_FLEX_RING(xencons); > + > #endif /* __XEN_PUBLIC_IO_CONSOLE_H__ */ > > /*
Re: [Xen-devel] [PATCH v2 3/3] xen-disk: use an IOThread per instance
CC'ing Andreas Färber. Could you please give a quick look below at the way the iothread object is instantiate and destroyed? I am no object model expert and would appreaciate a second opinion. On Wed, 21 Jun 2017, Paul Durrant wrote: > This patch allocates an IOThread object for each xen_disk instance and > sets the AIO context appropriately on connect. This allows processing > of I/O to proceed in parallel. > > The patch also adds tracepoints into xen_disk to make it possible to > follow the state transtions of an instance in the log. > > Signed-off-by: Paul Durrant> --- > Cc: Stefano Stabellini > Cc: Anthony Perard > Cc: Kevin Wolf > Cc: Max Reitz > > v2: > - explicitly acquire and release AIO context in qemu_aio_complete() and >blk_bh() > --- > hw/block/trace-events | 7 ++ > hw/block/xen_disk.c | 69 > --- > 2 files changed, 67 insertions(+), 9 deletions(-) > > diff --git a/hw/block/trace-events b/hw/block/trace-events > index 65e83dc258..608b24ba66 100644 > --- a/hw/block/trace-events > +++ b/hw/block/trace-events > @@ -10,3 +10,10 @@ virtio_blk_submit_multireq(void *mrb, int start, int > num_reqs, uint64_t offset, > # hw/block/hd-geometry.c > hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p > LCHS %d %d %d" > hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t secs, > int trans) "blk %p CHS %u %u %u trans %d" > + > +# hw/block/xen_disk.c > +xen_disk_alloc(char *name) "%s" > +xen_disk_init(char *name) "%s" > +xen_disk_connect(char *name) "%s" > +xen_disk_disconnect(char *name) "%s" > +xen_disk_free(char *name) "%s" > diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c > index 0e6513708e..8548195195 100644 > --- a/hw/block/xen_disk.c > +++ b/hw/block/xen_disk.c > @@ -27,10 +27,13 @@ > #include "hw/xen/xen_backend.h" > #include "xen_blkif.h" > #include "sysemu/blockdev.h" > +#include "sysemu/iothread.h" > #include "sysemu/block-backend.h" > #include "qapi/error.h" > #include "qapi/qmp/qdict.h" > #include 
"qapi/qmp/qstring.h" > +#include "qom/object_interfaces.h" > +#include "trace.h" > > /* - */ > > @@ -128,6 +131,9 @@ struct XenBlkDev { > DriveInfo *dinfo; > BlockBackend*blk; > QEMUBH *bh; > + > +IOThread*iothread; > +AioContext *ctx; > }; > > /* - */ > @@ -599,9 +605,12 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq); > static void qemu_aio_complete(void *opaque, int ret) > { > struct ioreq *ioreq = opaque; > +struct XenBlkDev *blkdev = ioreq->blkdev; > + > +aio_context_acquire(blkdev->ctx); I think that Paolo was right that we need a aio_context_acquire here, however the issue is that with the current code: blk_handle_requests -> ioreq_runio_qemu_aio -> qemu_aio_complete leading to aio_context_acquire being called twice on the same lock, which I don't think is allowed? I think we need to get rid of the qemu_aio_complete call from ioreq_runio_qemu_aio, but to do that we need to be careful with the accounting of aio_inflight (today it's incremented unconditionally at the beginning of ioreq_runio_qemu_aio, I think we would have to change that to increment it only if presync). > if (ret != 0) { > -xen_pv_printf(>blkdev->xendev, 0, "%s I/O error\n", > +xen_pv_printf(>xendev, 0, "%s I/O error\n", >ioreq->req.operation == BLKIF_OP_READ ? "read" : > "write"); > ioreq->aio_errors++; > } > @@ -610,13 +619,13 @@ static void qemu_aio_complete(void *opaque, int ret) > if (ioreq->presync) { > ioreq->presync = 0; > ioreq_runio_qemu_aio(ioreq); > -return; > +goto done; > } > if (ioreq->aio_inflight > 0) { > -return; > +goto done; > } > > -if (ioreq->blkdev->feature_grant_copy) { > +if (blkdev->feature_grant_copy) { > switch (ioreq->req.operation) { > case BLKIF_OP_READ: > /* in case of failure ioreq->aio_errors is increased */ > @@ -638,7 +647,7 @@ static void qemu_aio_complete(void *opaque, int ret) > } > > ioreq->status = ioreq->aio_errors ? 
BLKIF_RSP_ERROR : BLKIF_RSP_OKAY; > -if (!ioreq->blkdev->feature_grant_copy) { > +if (!blkdev->feature_grant_copy) { > ioreq_unmap(ioreq); > } > ioreq_finish(ioreq); > @@ -650,16 +659,19 @@ static void qemu_aio_complete(void *opaque, int ret) > } > case BLKIF_OP_READ: > if (ioreq->status == BLKIF_RSP_OKAY) { > -block_acct_done(blk_get_stats(ioreq->blkdev->blk), >acct); > +block_acct_done(blk_get_stats(blkdev->blk), >acct); > } else { > -
Re: [Xen-devel] new dma-mapping tree, was Re: clean up and modularize arch dma_mapping interface V2
Hi all, On Wed, 21 Jun 2017 15:32:39 +0200 Marek Szyprowski wrote: > > On 2017-06-20 15:16, Christoph Hellwig wrote: > > On Tue, Jun 20, 2017 at 11:04:00PM +1000, Stephen Rothwell wrote: > >> git://git.linaro.org/people/mszyprowski/linux-dma-mapping.git#dma-mapping-next > >> > >> Contacts: Marek Szyprowski and Kyungmin Park (cc'd) > >> > >> I have called your tree dma-mapping-hch for now. The other tree has > >> not been updated since 4.9-rc1 and I am not sure how general it is. > >> Marek, Kyungmin, any comments? > > I'd be happy to join efforts - co-maintainers and reviewers are always > > welcome. > > I did some dma-mapping unification work in the past and my tree in > linux-next > was a side effect of that. I think that for now it can be dropped in > favor of > Christoph's tree. I can also do some review and help with maintainers' work if > needed, although I was recently busy with other stuff. OK, so I have dropped the dma-mapping tree and renamed dma-mapping-hch to dma-mapping. -- Cheers, Stephen Rothwell ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [xen-4.7-testing test] 110944: tolerable FAIL - PUSHED
flight 110944 xen-4.7-testing real [real] http://logs.test-lab.xenproject.org/osstest/logs/110944/ Failures :-/ but no regressions. Tests which are failing intermittently (not blocking): test-armhf-armhf-xl 3 host-install(3) broken in 110902 pass in 110944 test-xtf-amd64-amd64-2 45 xtf/test-hvm64-lbr-tsx-vmentry fail pass in 110902 test-arm64-arm64-xl-credit2 9 debian-install fail pass in 110902 test-armhf-armhf-xl-cubietruck 16 guest-start.2fail pass in 110902 Regressions which are regarded as allowable (not blocking): test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stopfail REGR. vs. 110430 Tests which did not succeed, but are not blocking: test-armhf-armhf-xl-rtds 16 guest-start.2 fail blocked in 110430 test-armhf-armhf-xl-rtds 15 guest-start/debian.repeat fail in 110902 like 110430 test-arm64-arm64-xl-credit2 12 migrate-support-check fail in 110902 never pass test-arm64-arm64-xl-credit2 13 saverestore-support-check fail in 110902 never pass test-armhf-armhf-libvirt-xsm 13 saverestore-support-checkfail like 110430 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stopfail like 110430 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail like 110430 test-armhf-armhf-libvirt-raw 12 saverestore-support-checkfail like 110430 test-armhf-armhf-libvirt 13 saverestore-support-checkfail like 110430 test-amd64-amd64-xl-qemut-ws16-amd64 9 windows-installfail never pass test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail never pass test-amd64-amd64-xl-qemuu-ws16-amd64 9 windows-installfail never pass test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass test-amd64-i386-libvirt-xsm 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt 12 migrate-support-checkfail never pass test-amd64-amd64-xl-pvh-intel 11 guest-start fail never pass test-amd64-i386-libvirt 12 migrate-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass test-arm64-arm64-xl-xsm 12 migrate-support-checkfail never pass 
test-arm64-arm64-libvirt-xsm 12 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 13 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 saverestore-support-checkfail never pass test-arm64-arm64-xl 12 migrate-support-checkfail never pass test-arm64-arm64-xl 13 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass test-armhf-armhf-xl-arndale 12 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 13 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail never pass test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2 fail never pass test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail never pass test-armhf-armhf-xl 12 migrate-support-checkfail never pass test-armhf-armhf-xl 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-xl-xsm 13 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail never pass test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail never pass test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 12 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 11 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 12 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 12 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 13 saverestore-support-checkfail never pass test-amd64-i386-xl-qemuu-win10-i386 9 windows-install fail never pass test-amd64-amd64-xl-qemuu-win10-i386 9 windows-installfail never pass test-amd64-amd64-xl-qemut-win10-i386 9 windows-installfail never pass 
test-amd64-i386-xl-qemut-win10-i386 9 windows-install fail never pass test-amd64-i386-xl-qemuu-ws16-amd64 9 windows-install fail never pass test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass test-amd64-i386-xl-qemut-ws16-amd64 9 windows-install fail never pass version targeted for
Re: [Xen-devel] [PATCH for-4.9 v3 3/3] xen/livepatch: Don't crash on encountering STN_UNDEF relocations
On Thu, 22 Jun 2017, Andrew Cooper wrote: > A symndx of STN_UNDEF is special, and means a symbol value of 0. While > legitimate in the ELF standard, its existence in a livepatch is questionable > at best. Until a plausible usecase presents itself, reject such a relocation > with -EOPNOTSUPP. > > Additionally, fix an off-by-one error while range checking symndx, and perform > a safety check on elf->sym[symndx].sym before dereferencing it, to avoid > tripping over a NULL pointer when calculating val. > > Signed-off-by: Andrew Cooper> --- > CC: Konrad Rzeszutek Wilk > CC: Ross Lagerwall > CC: Jan Beulich > CC: Stefano Stabellini > CC: Julien Grall > > v3: > * Fix off-by-one error > v2: > * Reject STN_UNDEF with -EOPNOTSUPP Reviewed-by: Stefano Stabellini > --- > xen/arch/arm/arm32/livepatch.c | 14 +- > xen/arch/arm/arm64/livepatch.c | 14 +- > xen/arch/x86/livepatch.c | 14 +- > 3 files changed, 39 insertions(+), 3 deletions(-) > > diff --git a/xen/arch/arm/arm32/livepatch.c b/xen/arch/arm/arm32/livepatch.c > index a328179..41378a5 100644 > --- a/xen/arch/arm/arm32/livepatch.c > +++ b/xen/arch/arm/arm32/livepatch.c > @@ -254,12 +254,24 @@ int arch_livepatch_perform(struct livepatch_elf *elf, > addend = get_addend(type, dest); > } > > -if ( symndx > elf->nsym ) > +if ( symndx == STN_UNDEF ) > +{ > +dprintk(XENLOG_ERR, LIVEPATCH "%s: Encountered STN_UNDEF\n", > +elf->name); > +return -EOPNOTSUPP; > +} > +else if ( symndx >= elf->nsym ) > { > dprintk(XENLOG_ERR, LIVEPATCH "%s: Relative symbol wants > symbol@%u which is past end!\n", > elf->name, symndx); > return -EINVAL; > } > +else if ( !elf->sym[symndx].sym ) > +{ > +dprintk(XENLOG_ERR, LIVEPATCH "%s: No relative symbol@%u\n", > +elf->name, symndx); > +return -EINVAL; > +} > > val = elf->sym[symndx].sym->st_value; /* S */ > > diff --git a/xen/arch/arm/arm64/livepatch.c b/xen/arch/arm/arm64/livepatch.c > index 63929b1..2247b92 100644 > --- a/xen/arch/arm/arm64/livepatch.c > +++ b/xen/arch/arm/arm64/livepatch.c > @@ -252,12 
+252,24 @@ int arch_livepatch_perform_rela(struct livepatch_elf > *elf, > int ovf = 0; > uint64_t val; > > -if ( symndx > elf->nsym ) > +if ( symndx == STN_UNDEF ) > +{ > +dprintk(XENLOG_ERR, LIVEPATCH "%s: Encountered STN_UNDEF\n", > +elf->name); > +return -EOPNOTSUPP; > +} > +else if ( symndx >= elf->nsym ) > { > dprintk(XENLOG_ERR, LIVEPATCH "%s: Relative relocation wants > symbol@%u which is past end!\n", > elf->name, symndx); > return -EINVAL; > } > +else if ( !elf->sym[symndx].sym ) > +{ > +dprintk(XENLOG_ERR, LIVEPATCH "%s: No relative symbol@%u\n", > +elf->name, symndx); > +return -EINVAL; > +} > > val = elf->sym[symndx].sym->st_value + r->r_addend; /* S+A */ > > diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c > index 7917610..406eb91 100644 > --- a/xen/arch/x86/livepatch.c > +++ b/xen/arch/x86/livepatch.c > @@ -170,12 +170,24 @@ int arch_livepatch_perform_rela(struct livepatch_elf > *elf, > uint8_t *dest = base->load_addr + r->r_offset; > uint64_t val; > > -if ( symndx > elf->nsym ) > +if ( symndx == STN_UNDEF ) > +{ > +dprintk(XENLOG_ERR, LIVEPATCH "%s: Encountered STN_UNDEF\n", > +elf->name); > +return -EOPNOTSUPP; > +} > +else if ( symndx >= elf->nsym ) > { > dprintk(XENLOG_ERR, LIVEPATCH "%s: Relative relocation wants > symbol@%u which is past end!\n", > elf->name, symndx); > return -EINVAL; > } > +else if ( !elf->sym[symndx].sym ) > +{ > +dprintk(XENLOG_ERR, LIVEPATCH "%s: No symbol@%u\n", > +elf->name, symndx); > +return -EINVAL; > +} > > val = r->r_addend + elf->sym[symndx].sym->st_value; > > -- > 2.1.4 > ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC v2]Proposal to allow setting up shared memory areas between VMs from xl config file
On Wed, 21 Jun 2017, Zhongze Liu wrote: > > 1. Motivation and Description > > Virtual machines use grant table hypercalls to set up a shared page for > inter-VM communication. These hypercalls are used by all PV > protocols today. However, very simple guests, such as baremetal > applications, might not have the infrastructure to handle the grant table. > This project is about setting up several shared memory areas for inter-VM > communication directly from the VM config file, > so that the guest kernel doesn't have to have grant table support (in the > embedded space, this is not unusual) to be able to communicate with > other guests. > > > 2. Implementation Plan: > > > == > 2.1 Introduce a new VM config option in xl: > == > The shared areas should be shareable among several (>=2) VMs, so > every shared physical memory area is assigned to a set of VMs. > Therefore, a “token” or “identifier” should be used here to uniquely > identify a backing memory area. > > The backing area would be taken from one domain, which we will regard > as the "master domain", and this domain should be created prior to any > other "slave domain"s. Again, we have to use some kind of tag to tell who > is the "master domain". > > And the ability to specify the attributes of the pages (say, WO/RO/X) > to be shared should also be given to the user. For the master domain, > these attributes often describe the maximum permission allowed for the > shared pages, and for the slave domains, these attributes are often used > to describe with what permissions this area will be mapped. > This information should also be specified in the xl config entry. > > To handle all these, I would suggest using an unsigned integer to serve as the > identifier, and using a "master" tag in the master domain's xl config entry > to announce that she will provide the backing memory pages. A separate > entry would be used to describe the attributes of the shared memory area, of > the form "prot=RW". 
> For example: > > In xl config file of vm1: > > static_shared_mem = ["id = ID1, begin = gmfn1, end = gmfn2, > granularity = 4k, prot = RO, master”, > "id = ID2, begin = gmfn3, end = gmfn4, > granularity = 4k, prot = RW, master”] > > In xl config file of vm2: > > static_shared_mem = ["id = ID1, begin = gmfn5, end = gmfn6, > granularity = 4k, prot = RO”] > > In xl config file of vm3: > > static_shared_mem = ["id = ID2, begin = gmfn7, end = gmfn8, > granularity = 4k, prot = RW”] > > gmfn's above are all hex of the form "0x2". > > In the example above, a memory area ID1 will be shared between vm1 and vm2. > This area will be taken from vm1 and mapped into vm2's stage-2 page table. > The parameter "prot=RO" means that this memory area is offered with read-only > permission. vm1 can access this area using gmfn1~gmfn2, and vm2 using > gmfn5~gmfn6. > Likewise, a memory area ID2 will be shared between vm1 and vm3 with read and > write permissions. vm1 is the master and vm3 the slave. vm1 can access the > area using gmfn3~gmfn4 and vm3 using gmfn7~gmfn8. > > The "granularity" is optional in the slaves' config entries. But if it's > present in a slave's config entry, it has to be the same as its > master's. > Besides, the size of the gmfn range must also match. And overlapping backing > memory areas are well defined. > > Note that the "master" tag in vm1 for both ID1 and ID2 indicates that vm1 > should be created prior to both vm2 and vm3, for they both rely on the pages > backed by vm1. If one tries to create vm2 or vm3 prior to vm1, she will get > an error. And in vm1's config file, the "prot=RO" parameter of ID1 indicates > that if one tries to share this page with vm1 with, say, "WR" permission, > she will get an error, too. > > == > 2.2 Store the mem-sharing information in xenstore > == > Since we don't have any persistent storage for xl to store the information > of the shared memory areas, we have to find some way to keep it between xl > launches. 
And xenstore is a good place to do this. The information for one > shared area should include the ID, master domid and gmfn ranges and > memory attributes in master and slave domains of this area. > A current plan is to place the information under /local/shared_mem/ID. > Still take the above config files as an example: > > If we instantiate vm1, vm2 and vm3, one after another, > “xenstore ls -f” should output something like this: > > After VM1 was instantiated, the output of “xenstore ls -f” > will be something like this: > >
Re: [Xen-devel] [RFC v2]Proposal to allow setting up shared memory areas between VMs from xl config file
On Fri, 23 Jun 2017, Zhongze Liu wrote: > Hi Julien, > > 2017-06-21 1:29 GMT+08:00 Julien Grall: > > Hi, > > > > Thank you for the new proposal. > > > > On 06/20/2017 06:18 PM, Zhongze Liu wrote: > >> > >> In the example above. A memory area ID1 will be shared between vm1 and > >> vm2. > >> This area will be taken from vm1 and mapped into vm2's stage-2 page table. > >> The parameter "prot=RO" means that this memory area is offered with > >> read-only > >> permission. vm1 can access this area using gmfn1~gmfn2, and vm2 using > >> gmfn5~gmfn6. > > > > > > [...] > > > >> > >> == > >> 2.3 mapping the memory areas > >> == > >> Handle the newly added config option in tools/{xl, libxl} and utilize > >> tools/libxc to do the actual memory mapping. Specifically, we will use > >> a wrapper to XENMEM_add_to_physmap_batch with XENMAPSPACE_gmfn_foreign to > >> do the actual mapping. But since there isn't such a wrapper in libxc, > >> we'll > >> have to add a new wrapper, xc_domain_add_to_physmap_batch in > >> libxc/xc_domain.c > > > > > > In the paragraph above, you suggest the user can select the permission on the > > shared page. However, the hypercall XENMEM_add_to_physmap does not currently > > take permission. So how do you plan to handle that? > > > > I think this could be done via XENMEM_access_op? I discussed this topic with Zhongze. I suggested to leave permissions as "TODO" for the moment, given that for the use-case we have in mind they aren't needed.
Re: [Xen-devel] [PATCH] passthrough: give XEN_DOMCTL_test_assign_device more sane semantics
On 06/22/2017 05:40 AM, George Dunlap wrote: On 22/06/17 08:05, Jan Beulich wrote: On 21.06.17 at 18:36,wrote: On 21/06/17 16:59, Jan Beulich wrote: On 21.06.17 at 16:38, wrote: On 21/06/17 11:08, Jan Beulich wrote: So far callers of the libxc interface passed in a domain ID which was then ignored in the hypervisor. Instead, make the hypervisor honor it (accepting DOMID_INVALID to obtain original behavior), allowing to query whether a device is assigned to a particular domain. Ignore the passed in domain ID at the libxc layer instead, in order to not break existing callers. New libxc functions would need to be added if callers wanted to leverage the new functionality. I don't think your modified description matches the name of the call at all. It looks like the callers expect "test_assign_device" to answer the question: "Can I assign a device to this domain"? I don't think so - the question being answered by the original operation is "Is this device assigned to any domain?" with the implied inverse "Is this device available to be assigned to some domain (i.e. it is currently unassigned or owned by Dom0)?" If the question were "Is this device assigned to any domain?", then I would expect: 1. The return value to be a boolean 2. It would always return, "No it's not assigned" in the case where there is no IOMMU. However, that's not what happens: 1. It returns "success" if there is an IOMMU and the device is *not* assigned, and returns an error if the device is assigned 2. It returns an error if there is no IOMMU. The only place in the code this is called 'for real' in the tree is in libxl_pci.c:libxl__device_pci_add() if (libxl__domain_type(gc, domid) == LIBXL_DOMAIN_TYPE_HVM) { rc = xc_test_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev)); if (rc) { LOGD(ERROR, domid, "PCI device %04x:%02x:%02x.%u %s?", pcidev->domain, pcidev->bus, pcidev->dev, pcidev->func, errno == ENOSYS ? 
"cannot be assigned - no IOMMU" : "already assigned to a different guest"); goto out; } } Here 'domid' is the domain to which libxl wants to assign the device. So libxl is now asking Xen, "Am I allowed to assign device $bdf to domain $domain?" Your description provides the *algorithm* by which Xen normally provides an answer: that is, normally the only thing Xen cares about is that it hasn't already been assigned to a domain. But it still remains the case that what libxl is asking is, "Can I assign X to Y?" Taking the log message into account that you quote, I do not view the code's intention to be what you describe. Well, I'm not sure what to say, because in my view the log message supports my view. :-) Note that there are two errors, both explaining why the device cannot be assigned -- one is "no IOMMU", one is "already assigned to a different guest". Yes, at the moment it doesn't have a separate message for -EPERM (which is presumably what XSM would return if there were some other problem). But it also doesn't correctly report other potential errors: -ENODEV if you try to assign a DT device on a PCI-based system, or a PCI device on a DT-based system. (Apparently we also return -EINVAL if you included inappropriate flags, *or* if the device didn't exist, *or* if the device was already assigned somewhere else. As long as we're re-painting things we should probably change this as well.) But to make test_assign_device answer the question, "Is this assigned to a domU?", you'd have to have it return SUCCESS when there is no IOMMU (since the device is not, in fact, assigned to a domU); and thus libxl would have to make a separate call to find out if an IOMMU was present. It looks like it's meant to be used in XSM environments, to allow a policy to permit or forbid specific guests to have access to specific devices. 
On a default (non-XSM) system, the answer to that question doesn't depend on the domain it's being assigned to, but only whether the device is already assigned to another domain; but on XSM systems the logic can presumably be more complicated. That sounds like a perfectly sane semantic to me, and this patch removes that ability. And again I don't think so: Prior to the patch, do_domctl() at its very top makes sure to entirely ignore the passed in domain ID. This code sits ahead of the XSM check, so XSM has no way of knowing which domain has been specified by the caller. Right, I see that now. Still, I assert that the original hypercall semantics is a very useful one, and what you're doing is changing the hypercall such that the question can no longer be asked. It would be better to extend things so that XSM can actually deny device assignment based on both the bdf and the domain. Do you have a particular use case in mind for your alternate hypercall? No - I'm open to any change to it which
Re: [Xen-devel] [PATCH v2 2/2] x86/xen/efi: Init only efi struct members used by Xen
On 06/22/2017 06:51 AM, Daniel Kiper wrote: > Current approach, wholesale efi struct initialization from efi_xen, is not > good. Usually if new member is defined then it is properly initialized in > drivers/firmware/efi/efi.c but not in arch/x86/xen/efi.c. As I saw it happened > a few times until now. So, let's initialize only efi struct members used by > Xen to avoid such issues in the future. > > Signed-off-by: Daniel Kiper> Acked-by: Ard Biesheuvel Reviewed-by: Boris Ostrovsky
[Xen-devel] [xen-4.9-testing test] 110942: regressions - FAIL
flight 110942 xen-4.9-testing real [real] http://logs.test-lab.xenproject.org/osstest/logs/110942/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-armhf-armhf-xl-credit2 15 guest-start/debian.repeat fail REGR. vs. 110542 test-armhf-armhf-xl 6 xen-boot fail REGR. vs. 110550 Regressions which are regarded as allowable (not blocking): test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stopfail REGR. vs. 110542 Tests which did not succeed, but are not blocking: test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail like 110499 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stopfail like 110524 test-amd64-amd64-xl-qemut-win7-amd64 15 guest-localmigrate/x10 fail like 110550 test-amd64-amd64-xl-rtds 9 debian-install fail like 110550 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail never pass test-amd64-amd64-xl-qemuu-ws16-amd64 9 windows-installfail never pass test-amd64-i386-libvirt-xsm 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt 12 migrate-support-checkfail never pass test-amd64-amd64-xl-qemut-ws16-amd64 9 windows-installfail never pass build-amd64-prev 6 xen-build/dist-test fail never pass test-arm64-arm64-xl-credit2 12 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 13 saverestore-support-checkfail never pass test-arm64-arm64-xl 12 migrate-support-checkfail never pass test-arm64-arm64-xl 13 saverestore-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass test-arm64-arm64-xl-xsm 12 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 13 saverestore-support-checkfail never pass build-i386-prev 6 xen-build/dist-test fail never pass test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail never pass test-armhf-armhf-xl-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-xl-xsm 13 
saverestore-support-checkfail never pass test-armhf-armhf-libvirt 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-xsm 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass test-amd64-i386-libvirt 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 12 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 12 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 12 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 12 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 13 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 12 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-vhd 11 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 12 saverestore-support-checkfail never pass test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2 fail never pass test-amd64-i386-xl-qemuu-win10-i386 9 windows-install fail never pass test-amd64-amd64-xl-qemut-win10-i386 9 windows-installfail never pass test-amd64-amd64-xl-qemuu-win10-i386 9 windows-installfail never pass test-amd64-i386-xl-qemut-win10-i386 9 windows-install fail never pass test-amd64-i386-xl-qemut-ws16-amd64 9 windows-install fail never pass test-amd64-i386-xl-qemuu-ws16-amd64 9 windows-install fail never pass 
version targeted for testing: xen b38b1479a532f08fedd7f3b761673bc78b66739d baseline version: xen e197d29514165202308fe65db6effc4835aabfeb Last test of basis 110550 2017-06-18 21:49:42 Z3 days Failing since110568 2017-06-19 13:14:32 Z3 days3 attempts Testing same since 110942 2017-06-21 16:30:45 Z1 days1 attempts
[Xen-devel] [PATCH v5 17/18] xen/pvcalls: implement write
When the other end notifies us that there is data to be written (pvcalls_back_conn_event), increment the io and write counters, and schedule the ioworker. Implement the write function called by ioworker by reading the data from the data ring, writing it to the socket by calling inet_sendmsg. Set out_error on error. Signed-off-by: Stefano Stabellini CC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/pvcalls-back.c | 74 +- 1 file changed, 73 insertions(+), 1 deletion(-) diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c index ccceabd..424dcac 100644 --- a/drivers/xen/pvcalls-back.c +++ b/drivers/xen/pvcalls-back.c @@ -179,7 +179,66 @@ static void pvcalls_conn_back_read(void *opaque) static int pvcalls_conn_back_write(struct sock_mapping *map) { - return 0; + struct pvcalls_data_intf *intf = map->ring; + struct pvcalls_data *data = &map->data; + struct msghdr msg; + struct kvec vec[2]; + RING_IDX cons, prod, size, array_size; + int ret; + + cons = intf->out_cons; + prod = intf->out_prod; + /* read the indexes before dealing with the data */ + virt_mb(); + + array_size = XEN_FLEX_RING_SIZE(map->ring_order); + size = pvcalls_queued(prod, cons, array_size); + if (size == 0) + return 0; + + memset(&msg, 0, sizeof(msg)); + msg.msg_flags |= MSG_DONTWAIT; + msg.msg_iter.type = ITER_KVEC|READ; + msg.msg_iter.count = size; + if (pvcalls_mask(prod, array_size) > pvcalls_mask(cons, array_size)) { + vec[0].iov_base = data->out + pvcalls_mask(cons, array_size); + vec[0].iov_len = size; + msg.msg_iter.kvec = vec; + msg.msg_iter.nr_segs = 1; + } else { + vec[0].iov_base = data->out + pvcalls_mask(cons, array_size); + vec[0].iov_len = array_size - pvcalls_mask(cons, array_size); + vec[1].iov_base = data->out; + vec[1].iov_len = size - vec[0].iov_len; + msg.msg_iter.kvec = vec; + msg.msg_iter.nr_segs = 2; + } + + atomic_set(&map->write, 0); + ret = inet_sendmsg(map->sock, &msg, size); + if (ret == -EAGAIN || (ret >= 0 && ret < size)) { + atomic_inc(&map->write); + 
atomic_inc(&map->io); + } + if (ret == -EAGAIN) + return ret; + + /* write the data, then update the indexes */ + virt_wmb(); + if (ret < 0) { + intf->out_error = ret; + } else { + intf->out_error = 0; + intf->out_cons = cons + ret; + prod = intf->out_prod; + } + /* update the indexes, then notify the other end */ + virt_wmb(); + if (prod != cons + ret) + atomic_inc(&map->write); + notify_remote_via_irq(map->irq); + + return ret; } static void pvcalls_back_ioworker(struct work_struct *work) @@ -849,6 +908,19 @@ static irqreturn_t pvcalls_back_event(int irq, void *dev_id) static irqreturn_t pvcalls_back_conn_event(int irq, void *sock_map) { + struct sock_mapping *map = sock_map; + struct pvcalls_ioworker *iow; + + if (map == NULL || map->sock == NULL || map->sock->sk == NULL || + map->sock->sk->sk_user_data != map) + return IRQ_HANDLED; + + iow = &map->ioworker; + + atomic_inc(&map->write); + atomic_inc(&map->io); + queue_work(iow->wq, &iow->register_work); + return IRQ_HANDLED; } -- 1.9.1
[Xen-devel] [PATCH v5 07/18] xen/pvcalls: implement socket command
Just reply with success to the other end for now. Delay the allocation of the actual socket to bind and/or connect. Signed-off-by: Stefano Stabellini Reviewed-by: Boris Ostrovsky CC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/pvcalls-back.c | 27 +++ 1 file changed, 27 insertions(+) diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c index 437c2ad..953458b 100644 --- a/drivers/xen/pvcalls-back.c +++ b/drivers/xen/pvcalls-back.c @@ -12,12 +12,17 @@ * GNU General Public License for more details. */ +#include #include #include #include #include #include #include +#include +#include +#include +#include #include #include @@ -54,6 +59,28 @@ struct pvcalls_fedata { static int pvcalls_back_socket(struct xenbus_device *dev, struct xen_pvcalls_request *req) { + struct pvcalls_fedata *fedata; + int ret; + struct xen_pvcalls_response *rsp; + + fedata = dev_get_drvdata(&dev->dev); + + if (req->u.socket.domain != AF_INET || + req->u.socket.type != SOCK_STREAM || + (req->u.socket.protocol != IPPROTO_IP && + req->u.socket.protocol != AF_INET)) + ret = -EAFNOSUPPORT; + else + ret = 0; + + /* leave the actual socket allocation for later */ + + rsp = RING_GET_RESPONSE(&fedata->ring, fedata->ring.rsp_prod_pvt++); + rsp->req_id = req->req_id; + rsp->cmd = req->cmd; + rsp->u.socket.id = req->u.socket.id; + rsp->ret = ret; + + return 0; } -- 1.9.1
[Xen-devel] [PATCH v5 16/18] xen/pvcalls: implement read
When an active socket has data available, increment the io and read counters, and schedule the ioworker. Implement the read function by reading from the socket, writing the data to the data ring. Set in_error on error. Signed-off-by: Stefano Stabellini CC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/pvcalls-back.c | 85 ++ 1 file changed, 85 insertions(+) diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c index ab7882a..ccceabd 100644 --- a/drivers/xen/pvcalls-back.c +++ b/drivers/xen/pvcalls-back.c @@ -100,6 +100,81 @@ static int pvcalls_back_release_active(struct xenbus_device *dev, static void pvcalls_conn_back_read(void *opaque) { + struct sock_mapping *map = (struct sock_mapping *)opaque; + struct msghdr msg; + struct kvec vec[2]; + RING_IDX cons, prod, size, wanted, array_size, masked_prod, masked_cons; + int32_t error; + struct pvcalls_data_intf *intf = map->ring; + struct pvcalls_data *data = &map->data; + unsigned long flags; + int ret; + + array_size = XEN_FLEX_RING_SIZE(map->ring_order); + cons = intf->in_cons; + prod = intf->in_prod; + error = intf->in_error; + /* read the indexes first, then deal with the data */ + virt_mb(); + + if (error) + return; + + size = pvcalls_queued(prod, cons, array_size); + if (size >= array_size) + return; + spin_lock_irqsave(&map->sock->sk->sk_receive_queue.lock, flags); + if (skb_queue_empty(&map->sock->sk->sk_receive_queue)) { + atomic_set(&map->read, 0); + spin_unlock_irqrestore(&map->sock->sk->sk_receive_queue.lock, + flags); + return; + } + spin_unlock_irqrestore(&map->sock->sk->sk_receive_queue.lock, flags); + wanted = array_size - size; + masked_prod = pvcalls_mask(prod, array_size); + masked_cons = pvcalls_mask(cons, array_size); + + memset(&msg, 0, sizeof(msg)); + msg.msg_iter.type = ITER_KVEC|WRITE; + msg.msg_iter.count = wanted; + if (masked_prod < masked_cons) { + vec[0].iov_base = data->in + masked_prod; + vec[0].iov_len = wanted; + msg.msg_iter.kvec = vec; + msg.msg_iter.nr_segs = 1; + } else { + 
vec[0].iov_base = data->in + masked_prod; + vec[0].iov_len = array_size - masked_prod; + vec[1].iov_base = data->in; + vec[1].iov_len = wanted - vec[0].iov_len; + msg.msg_iter.kvec = vec; + msg.msg_iter.nr_segs = 2; + } + + atomic_set(>read, 0); + ret = inet_recvmsg(map->sock, , wanted, MSG_DONTWAIT); + WARN_ON(ret > wanted); + if (ret == -EAGAIN) /* shouldn't happen */ + return; + if (!ret) + ret = -ENOTCONN; + spin_lock_irqsave(>sock->sk->sk_receive_queue.lock, flags); + if (ret > 0 && !skb_queue_empty(>sock->sk->sk_receive_queue)) + atomic_inc(>read); + spin_unlock_irqrestore(>sock->sk->sk_receive_queue.lock, flags); + + /* write the data, then modify the indexes */ + virt_wmb(); + if (ret < 0) + intf->in_error = ret; + else + intf->in_prod = prod + ret; + /* update the indexes, then notify the other end */ + virt_wmb(); + notify_remote_via_irq(map->irq); + + return; } static int pvcalls_conn_back_write(struct sock_mapping *map) @@ -172,6 +247,16 @@ static void pvcalls_sk_state_change(struct sock *sock) static void pvcalls_sk_data_ready(struct sock *sock) { + struct sock_mapping *map = sock->sk_user_data; + struct pvcalls_ioworker *iow; + + if (map == NULL) + return; + + iow = >ioworker; + atomic_inc(>read); + atomic_inc(>io); + queue_work(iow->wq, >register_work); } static struct sock_mapping *pvcalls_new_active_socket( -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
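For reference, the producer/consumer arithmetic used above (pvcalls_mask() and pvcalls_queued()) can be modelled in plain user-space C. This is a sketch of the helpers generated by DEFINE_XEN_FLEX_RING, not the kernel code itself; the only requirement is that ring_size is a power of two, so masking a free-running index is a cheap modulo.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t RING_IDX;

/* ring_size is a power of two, so masking gives the in-ring offset */
static RING_IDX pvcalls_mask(RING_IDX idx, RING_IDX ring_size)
{
    return idx & (ring_size - 1);
}

/* Bytes queued between cons and prod; the indices are free-running
 * and wrap naturally in 32-bit arithmetic. */
static RING_IDX pvcalls_queued(RING_IDX prod, RING_IDX cons,
                               RING_IDX ring_size)
{
    RING_IDX size;

    if (prod == cons)
        return 0;

    prod = pvcalls_mask(prod, ring_size);
    cons = pvcalls_mask(cons, ring_size);

    if (prod == cons)           /* same offset but prod != cons: full */
        return ring_size;

    if (prod > cons)
        size = prod - cons;
    else                        /* producer has wrapped past the end */
        size = ring_size - (cons - prod);
    return size;
}
```

The `masked_prod < masked_cons` test in pvcalls_conn_back_read() is exactly the wrapped case: free space is contiguous when the producer sits before the consumer, and split into two kvecs otherwise.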
[Xen-devel] [PATCH v5 04/18] xen/pvcalls: xenbus state handling
Introduce the code to handle xenbus state changes. Implement the probe function for the pvcalls backend. Write the supported versions, max-page-order and function-calls nodes to xenstore, as required by the protocol. Introduce stub functions for disconnecting from/connecting to a frontend.

Signed-off-by: Stefano Stabellini
Reviewed-by: Boris Ostrovsky
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 152 +
 1 file changed, 152 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 9044cf2..7bce750 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -25,20 +25,172 @@
 #include
 #include
 
+#define PVCALLS_VERSIONS "1"
+#define MAX_RING_ORDER XENBUS_MAX_RING_GRANT_ORDER
+
 struct pvcalls_back_global {
 	struct list_head frontends;
 	struct semaphore frontends_lock;
 } pvcalls_back_global;
 
+static int backend_connect(struct xenbus_device *dev)
+{
+	return 0;
+}
+
+static int backend_disconnect(struct xenbus_device *dev)
+{
+	return 0;
+}
+
 static int pvcalls_back_probe(struct xenbus_device *dev,
 			      const struct xenbus_device_id *id)
 {
+	int err, abort;
+	struct xenbus_transaction xbt;
+
+again:
+	abort = 1;
+
+	err = xenbus_transaction_start(&xbt);
+	if (err) {
+		pr_warn("%s cannot create xenstore transaction\n", __func__);
+		return err;
+	}
+
+	err = xenbus_printf(xbt, dev->nodename, "versions", "%s",
+			    PVCALLS_VERSIONS);
+	if (err) {
+		pr_warn("%s write out 'versions' failed\n", __func__);
+		goto abort;
+	}
+
+	err = xenbus_printf(xbt, dev->nodename, "max-page-order", "%u",
+			    MAX_RING_ORDER);
+	if (err) {
+		pr_warn("%s write out 'max-page-order' failed\n", __func__);
+		goto abort;
+	}
+
+	err = xenbus_printf(xbt, dev->nodename, "function-calls",
+			    XENBUS_FUNCTIONS_CALLS);
+	if (err) {
+		pr_warn("%s write out 'function-calls' failed\n", __func__);
+		goto abort;
+	}
+
+	abort = 0;
+abort:
+	err = xenbus_transaction_end(xbt, abort);
+	if (err) {
+		if (err == -EAGAIN && !abort)
+			goto again;
+		pr_warn("%s cannot complete xenstore transaction\n", __func__);
+		return err;
+	}
+
+	xenbus_switch_state(dev, XenbusStateInitWait);
+	return 0;
 }
 
+static void set_backend_state(struct xenbus_device *dev,
+			      enum xenbus_state state)
+{
+	while (dev->state != state) {
+		switch (dev->state) {
+		case XenbusStateClosed:
+			switch (state) {
+			case XenbusStateInitWait:
+			case XenbusStateConnected:
+				xenbus_switch_state(dev, XenbusStateInitWait);
+				break;
+			case XenbusStateClosing:
+				xenbus_switch_state(dev, XenbusStateClosing);
+				break;
+			default:
+				__WARN();
+			}
+			break;
+		case XenbusStateInitWait:
+		case XenbusStateInitialised:
+			switch (state) {
+			case XenbusStateConnected:
+				backend_connect(dev);
+				xenbus_switch_state(dev, XenbusStateConnected);
+				break;
+			case XenbusStateClosing:
+			case XenbusStateClosed:
+				xenbus_switch_state(dev, XenbusStateClosing);
+				break;
+			default:
+				__WARN();
+			}
+			break;
+		case XenbusStateConnected:
+			switch (state) {
+			case XenbusStateInitWait:
+			case XenbusStateClosing:
+			case XenbusStateClosed:
+				down(&pvcalls_back_global.frontends_lock);
+				backend_disconnect(dev);
+				up(&pvcalls_back_global.frontends_lock);
+				xenbus_switch_state(dev, XenbusStateClosing);
+				break;
+			default:
+				__WARN();
+			}
+			break;
+		case XenbusStateClosing:
+			switch (state) {
+			case XenbusStateInitWait:
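The nested switches in set_backend_state() amount to a transition table: from the current xenbus state, take one step toward the target, looping until the two match. The table can be modelled as a pure function in user-space C. This is a hypothetical model, not driver code; the Closing arm is an assumption, since the quoted hunk is cut off before that case (the driver presumably completes the teardown by switching to Closed).

```c
#include <assert.h>

enum xb { XB_INITWAIT, XB_INITIALISED, XB_CONNECTED, XB_CLOSING, XB_CLOSED };

/* One step of set_backend_state(): from 'cur', heading for 'target',
 * which state does the backend switch to next? */
static enum xb next_state(enum xb cur, enum xb target)
{
    switch (cur) {
    case XB_CLOSED:
        /* reopen via InitWait unless we are asked to close again */
        return target == XB_CLOSING ? XB_CLOSING : XB_INITWAIT;
    case XB_INITWAIT:
    case XB_INITIALISED:
        /* connect directly, or start closing */
        return target == XB_CONNECTED ? XB_CONNECTED : XB_CLOSING;
    case XB_CONNECTED:
        /* disconnect always goes through Closing first */
        return XB_CLOSING;
    default: /* XB_CLOSING: assumed, the hunk is truncated here */
        return XB_CLOSED;
    }
}
```

Because every step moves strictly toward the target, the `while (dev->state != state)` loop in the patch terminates after at most a few iterations.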
[Xen-devel] [PATCH v5 06/18] xen/pvcalls: handle commands from the frontend
When the other end notifies us that there are commands to be read (pvcalls_back_event), wake up the backend thread to parse the command.

The command ring works like most other Xen rings, so use the usual ring macros to read and write to it. The functions implementing the commands are empty stubs for now.

Signed-off-by: Stefano Stabellini
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 119 +
 1 file changed, 119 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index e4c2e46..437c2ad 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -51,12 +51,131 @@ struct pvcalls_fedata {
 	struct work_struct register_work;
 };
 
+static int pvcalls_back_socket(struct xenbus_device *dev,
+		struct xen_pvcalls_request *req)
+{
+	return 0;
+}
+
+static int pvcalls_back_connect(struct xenbus_device *dev,
+				struct xen_pvcalls_request *req)
+{
+	return 0;
+}
+
+static int pvcalls_back_release(struct xenbus_device *dev,
+				struct xen_pvcalls_request *req)
+{
+	return 0;
+}
+
+static int pvcalls_back_bind(struct xenbus_device *dev,
+			     struct xen_pvcalls_request *req)
+{
+	return 0;
+}
+
+static int pvcalls_back_listen(struct xenbus_device *dev,
+			       struct xen_pvcalls_request *req)
+{
+	return 0;
+}
+
+static int pvcalls_back_accept(struct xenbus_device *dev,
+			       struct xen_pvcalls_request *req)
+{
+	return 0;
+}
+
+static int pvcalls_back_poll(struct xenbus_device *dev,
+			     struct xen_pvcalls_request *req)
+{
+	return 0;
+}
+
+static int pvcalls_back_handle_cmd(struct xenbus_device *dev,
+				   struct xen_pvcalls_request *req)
+{
+	int ret = 0;
+
+	switch (req->cmd) {
+	case PVCALLS_SOCKET:
+		ret = pvcalls_back_socket(dev, req);
+		break;
+	case PVCALLS_CONNECT:
+		ret = pvcalls_back_connect(dev, req);
+		break;
+	case PVCALLS_RELEASE:
+		ret = pvcalls_back_release(dev, req);
+		break;
+	case PVCALLS_BIND:
+		ret = pvcalls_back_bind(dev, req);
+		break;
+	case PVCALLS_LISTEN:
+		ret = pvcalls_back_listen(dev, req);
+		break;
+	case PVCALLS_ACCEPT:
+		ret = pvcalls_back_accept(dev, req);
+		break;
+	case PVCALLS_POLL:
+		ret = pvcalls_back_poll(dev, req);
+		break;
+	default:
+		ret = -ENOTSUPP;
+		break;
+	}
+	return ret;
+}
+
 static void pvcalls_back_work(struct work_struct *work)
 {
+	struct pvcalls_fedata *fedata = container_of(work,
+		struct pvcalls_fedata, register_work);
+	int notify, notify_all = 0, more = 1;
+	struct xen_pvcalls_request req;
+	struct xenbus_device *dev = fedata->dev;
+
+	while (more) {
+		while (RING_HAS_UNCONSUMED_REQUESTS(&fedata->ring)) {
+			RING_COPY_REQUEST(&fedata->ring,
+					  fedata->ring.req_cons++,
+					  &req);
+
+			if (!pvcalls_back_handle_cmd(dev, &req)) {
+				RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(
+					&fedata->ring, notify);
+				notify_all += notify;
+			}
+		}
+
+		if (notify_all)
+			notify_remote_via_irq(fedata->irq);
+
+		RING_FINAL_CHECK_FOR_REQUESTS(&fedata->ring, more);
+	}
 }
 
 static irqreturn_t pvcalls_back_event(int irq, void *dev_id)
 {
+	struct xenbus_device *dev = dev_id;
+	struct pvcalls_fedata *fedata = NULL;
+
+	if (dev == NULL)
+		return IRQ_HANDLED;
+
+	fedata = dev_get_drvdata(&dev->dev);
+	if (fedata == NULL)
+		return IRQ_HANDLED;
+
+	/*
+	 * TODO: a small theoretical race exists if we try to queue work
+	 * after pvcalls_back_work checked for final requests and before
+	 * it returns. The queuing will fail, and pvcalls_back_work
+	 * won't do the work because it is about to return. In that
+	 * case, we lose the notification.
+	 */
+	queue_work(fedata->wq, &fedata->register_work);
+	return IRQ_HANDLED;
 }
-- 
1.9.1
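The shape of the consume loop is the interesting part: handle every queued request, push a response only when the handler succeeds, and coalesce the per-response notify flags into at most one event per batch. A stripped-down user-space model (hypothetical names, with plain ints standing in for the ring macros) makes the batching visible:

```c
#include <assert.h>

/* Minimal model of pvcalls_back_work(): free-running indices, a
 * success-only response path, and one notification per batch. */
struct ring_model {
    int req_prod, req_cons;   /* free-running request indices */
    int rsp_sent;             /* responses pushed */
    int notifications;        /* events actually sent */
};

/* stand-in for pvcalls_back_handle_cmd(): 0 on success */
static int handle_cmd(int cmd)
{
    return (cmd >= 0 && cmd <= 6) ? 0 : -1;   /* 0..6 = PVCALLS_* */
}

static void back_work(struct ring_model *r, const int *cmds)
{
    int notify_all = 0;

    while (r->req_cons != r->req_prod) {
        int cmd = cmds[r->req_cons++];
        if (!handle_cmd(cmd)) {
            r->rsp_sent++;
            notify_all++;     /* stands in for ..._CHECK_NOTIFY */
        }
    }
    if (notify_all)
        r->notifications++;   /* one notify_remote_via_irq per batch */
}
```

Accumulating `notify_all` instead of notifying per request is what keeps the event-channel traffic down when the frontend posts several commands at once.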
[Xen-devel] [PATCH v5 15/18] xen/pvcalls: implement the ioworker functions
We have one ioworker per socket. Each ioworker goes through the list of outstanding read/write requests. Once all requests have been dealt with, it returns.

We use one atomic counter per socket for "read" operations and one for "write" operations to keep track of the reads/writes to do. We also use one atomic counter ("io") per ioworker to keep track of how many outstanding requests we have in total assigned to the ioworker. The ioworker finishes when there are none.

Signed-off-by: Stefano Stabellini
Reviewed-by: Boris Ostrovsky
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 7a8e866..ab7882a 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -98,8 +98,35 @@ static int pvcalls_back_release_active(struct xenbus_device *dev,
 				       struct pvcalls_fedata *fedata,
 				       struct sock_mapping *map);
 
+static void pvcalls_conn_back_read(void *opaque)
+{
+}
+
+static int pvcalls_conn_back_write(struct sock_mapping *map)
+{
+	return 0;
+}
+
 static void pvcalls_back_ioworker(struct work_struct *work)
 {
+	struct pvcalls_ioworker *ioworker = container_of(work,
+		struct pvcalls_ioworker, register_work);
+	struct sock_mapping *map = container_of(ioworker, struct sock_mapping,
+		ioworker);
+
+	while (atomic_read(&map->io) > 0) {
+		if (atomic_read(&map->release) > 0) {
+			atomic_set(&map->release, 0);
+			return;
+		}
+
+		if (atomic_read(&map->read) > 0)
+			pvcalls_conn_back_read(map);
+		if (atomic_read(&map->write) > 0)
+			pvcalls_conn_back_write(map);
+
+		atomic_dec(&map->io);
+	}
 }
 
 static int pvcalls_back_socket(struct xenbus_device *dev,
-- 
1.9.1
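The counter discipline can be exercised in user space with C11 atomics. This is a simplified model of the loop above, not the driver: here `atomic_exchange` both tests and clears the read/write counters, whereas in the driver they are cleared by pvcalls_conn_back_read()/write() themselves; the `release` flag aborting the loop is the same handshake the release patch later relies on.

```c
#include <assert.h>
#include <stdatomic.h>

struct iow {
    atomic_int io;        /* outstanding work items in total */
    atomic_int read;      /* socket has data to read */
    atomic_int write;     /* ring has data to write */
    atomic_int release;   /* socket is being torn down: abort */
    int reads_done, writes_done;
};

static void ioworker(struct iow *w)
{
    while (atomic_load(&w->io) > 0) {
        if (atomic_load(&w->release) > 0) {
            atomic_store(&w->release, 0);
            return;                    /* bail out for release */
        }
        /* test-and-clear stands in for the read/write functions
         * clearing their own counters */
        if (atomic_exchange(&w->read, 0) > 0)
            w->reads_done++;
        if (atomic_exchange(&w->write, 0) > 0)
            w->writes_done++;
        atomic_fetch_sub(&w->io, 1);
    }
}
```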
[Xen-devel] [PATCH v5 14/18] xen/pvcalls: disconnect and module_exit
Implement backend_disconnect. Call pvcalls_back_release_active on active sockets and pvcalls_back_release_passive on passive sockets.

Implement module_exit by calling backend_disconnect on frontend connections.

Signed-off-by: Stefano Stabellini
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 53 ++
 1 file changed, 53 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index f6f88ce..7a8e866 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -812,6 +812,43 @@ static int backend_connect(struct xenbus_device *dev)
 
 static int backend_disconnect(struct xenbus_device *dev)
 {
+	struct pvcalls_fedata *fedata;
+	struct sock_mapping *map, *n;
+	struct sockpass_mapping *mappass;
+	struct radix_tree_iter iter;
+	void **slot;
+
+	fedata = dev_get_drvdata(&dev->dev);
+
+	down(&fedata->socket_lock);
+	list_for_each_entry_safe(map, n, &fedata->socket_mappings, list) {
+		list_del(&map->list);
+		pvcalls_back_release_active(dev, fedata, map);
+	}
+
+	radix_tree_for_each_slot(slot, &fedata->socketpass_mappings, &iter, 0) {
+		mappass = radix_tree_deref_slot(slot);
+		if (!mappass)
+			continue;
+		if (radix_tree_exception(mappass)) {
+			if (radix_tree_deref_retry(mappass))
+				slot = radix_tree_iter_retry(&iter);
+		} else {
+			radix_tree_delete(&fedata->socketpass_mappings,
+					  mappass->id);
+			pvcalls_back_release_passive(dev, fedata, mappass);
+		}
+	}
+	up(&fedata->socket_lock);
+
+	xenbus_unmap_ring_vfree(dev, fedata->sring);
+	unbind_from_irqhandler(fedata->irq, dev);
+
+	list_del(&fedata->list);
+	destroy_workqueue(fedata->wq);
+	kfree(fedata);
+	dev_set_drvdata(&dev->dev, NULL);
+
 	return 0;
 }
 
@@ -1005,3 +1042,19 @@ static int __init pvcalls_back_init(void)
 	return 0;
 }
 module_init(pvcalls_back_init);
+
+static void __exit pvcalls_back_fin(void)
+{
+	struct pvcalls_fedata *fedata, *nfedata;
+
+	down(&pvcalls_back_global.frontends_lock);
+	list_for_each_entry_safe(fedata, nfedata, &pvcalls_back_global.frontends,
+				 list) {
+		backend_disconnect(fedata->dev);
+	}
+	up(&pvcalls_back_global.frontends_lock);
+
+	xenbus_unregister_driver(&pvcalls_back_driver);
+}
+
+module_exit(pvcalls_back_fin);
-- 
1.9.1
[Xen-devel] [PATCH v5 18/18] xen: introduce a Kconfig option to enable the pvcalls backend
Also add pvcalls-back to the Makefile.

Signed-off-by: Stefano Stabellini
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/Kconfig  | 12 
 drivers/xen/Makefile |  1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index f15bb3b7..4545561 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -196,6 +196,18 @@ config XEN_PCIDEV_BACKEND
 
 	  If in doubt, say m.
 
+config XEN_PVCALLS_BACKEND
+	bool "XEN PV Calls backend driver"
+	depends on INET && XEN && XEN_BACKEND
+	default n
+	help
+	  Experimental backend for the Xen PV Calls protocol
+	  (https://xenbits.xen.org/docs/unstable/misc/pvcalls.html). It
+	  allows PV Calls frontends to send POSIX calls to the backend,
+	  which implements them.
+
+	  If in doubt, say n.
+
 config XEN_SCSI_BACKEND
 	tristate "XEN SCSI backend driver"
 	depends on XEN && XEN_BACKEND && TARGET_CORE
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 8feab810..480b928 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_XEN_ACPI_PROCESSOR)	+= xen-acpi-processor.o
 obj-$(CONFIG_XEN_EFI)			+= efi.o
 obj-$(CONFIG_XEN_SCSI_BACKEND)		+= xen-scsiback.o
 obj-$(CONFIG_XEN_AUTO_XLATE)		+= xlate_mmu.o
+obj-$(CONFIG_XEN_PVCALLS_BACKEND)	+= pvcalls-back.o
 xen-evtchn-y				:= evtchn.o
 xen-gntdev-y				:= gntdev.o
 xen-gntalloc-y				:= gntalloc.o
-- 
1.9.1
[Xen-devel] [PATCH v5 11/18] xen/pvcalls: implement accept command
Implement the accept command by calling inet_accept. To avoid blocking in the kernel, call inet_accept(O_NONBLOCK) from a workqueue, which gets scheduled on sk_data_ready (for a passive socket, it means that there are connections to accept).

Use the reqcopy field to store the request. Accept the new socket from the delayed work function, create a new sock_mapping for it, map the indexes page and data ring, and reply to the other end. Allocate an ioworker for the socket.

Only support one outstanding blocking accept request for every socket at any time.

Add a field to sock_mapping to remember the passive socket from which an active socket was created.

Signed-off-by: Stefano Stabellini
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 113 +
 1 file changed, 113 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 2a47425..62738e4 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -64,6 +64,7 @@ struct pvcalls_ioworker {
 struct sock_mapping {
 	struct list_head list;
 	struct pvcalls_fedata *fedata;
+	struct sockpass_mapping *sockpass;
 	struct socket *sock;
 	uint64_t id;
 	grant_ref_t ref;
@@ -279,10 +280,83 @@ static int pvcalls_back_release(struct xenbus_device *dev,
 
 static void __pvcalls_back_accept(struct work_struct *work)
 {
+	struct sockpass_mapping *mappass = container_of(
+		work, struct sockpass_mapping, register_work);
+	struct sock_mapping *map;
+	struct pvcalls_ioworker *iow;
+	struct pvcalls_fedata *fedata;
+	struct socket *sock;
+	struct xen_pvcalls_response *rsp;
+	struct xen_pvcalls_request *req;
+	int notify;
+	int ret = -EINVAL;
+	unsigned long flags;
+
+	fedata = mappass->fedata;
+	/*
+	 * __pvcalls_back_accept can race against pvcalls_back_accept.
+	 * We only need to check the value of "cmd" on read. It could be
+	 * done atomically, but to simplify the code on the write side, we
+	 * use a spinlock.
+	 */
+	spin_lock_irqsave(&mappass->copy_lock, flags);
+	req = &mappass->reqcopy;
+	if (req->cmd != PVCALLS_ACCEPT) {
+		spin_unlock_irqrestore(&mappass->copy_lock, flags);
+		return;
+	}
+	spin_unlock_irqrestore(&mappass->copy_lock, flags);
+
+	sock = sock_alloc();
+	if (sock == NULL)
+		goto out_error;
+	sock->type = mappass->sock->type;
+	sock->ops = mappass->sock->ops;
+
+	ret = inet_accept(mappass->sock, sock, O_NONBLOCK, true);
+	if (ret == -EAGAIN) {
+		sock_release(sock);
+		goto out_error;
+	}
+
+	map = pvcalls_new_active_socket(fedata,
+					req->u.accept.id_new,
+					req->u.accept.ref,
+					req->u.accept.evtchn,
+					sock);
+	if (!map) {
+		ret = -EFAULT;
+		sock_release(sock);
+		goto out_error;
+	}
+
+	map->sockpass = mappass;
+	iow = &map->ioworker;
+	atomic_inc(&map->read);
+	atomic_inc(&map->io);
+	queue_work(iow->wq, &iow->register_work);
+
+out_error:
+	rsp = RING_GET_RESPONSE(&fedata->ring, fedata->ring.rsp_prod_pvt++);
+	rsp->req_id = req->req_id;
+	rsp->cmd = req->cmd;
+	rsp->u.accept.id = req->u.accept.id;
+	rsp->ret = ret;
+	RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&fedata->ring, notify);
+	if (notify)
+		notify_remote_via_irq(fedata->irq);
+
+	mappass->reqcopy.cmd = 0;
 }
 
 static void pvcalls_pass_sk_data_ready(struct sock *sock)
 {
+	struct sockpass_mapping *mappass = sock->sk_user_data;
+
+	if (mappass == NULL)
+		return;
+
+	queue_work(mappass->wq, &mappass->register_work);
 }
 
 static int pvcalls_back_bind(struct xenbus_device *dev,
@@ -388,6 +462,45 @@ static int pvcalls_back_listen(struct xenbus_device *dev,
 static int pvcalls_back_accept(struct xenbus_device *dev,
 			       struct xen_pvcalls_request *req)
 {
+	struct pvcalls_fedata *fedata;
+	struct sockpass_mapping *mappass;
+	int ret = -EINVAL;
+	struct xen_pvcalls_response *rsp;
+	unsigned long flags;
+
+	fedata = dev_get_drvdata(&dev->dev);
+
+	down(&fedata->socket_lock);
+	mappass = radix_tree_lookup(&fedata->socketpass_mappings,
+				    req->u.accept.id);
+	up(&fedata->socket_lock);
+	if (mappass == NULL)
+		goto out_error;
+
+	/*
+	 * Limitation of the current implementation: only support one
+	 * concurrent accept or poll call on one socket.
+	 */
+	spin_lock_irqsave(&mappass->copy_lock, flags);
+	if (mappass->reqcopy.cmd != 0) {
+		spin_unlock_irqrestore(&mappass->copy_lock, flags);
+		ret = -EINTR;
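The "one outstanding accept or poll per passive socket" rule hinges on reqcopy.cmd acting as a claim slot: non-zero means taken, and the check-and-set is done under mappass->copy_lock. The same claim-or-fail semantics can be sketched in user-space C; this hypothetical model uses an atomic compare-and-swap in place of the spinlocked check, which is the alternative the patch's own comment mentions.

```c
#include <assert.h>
#include <stdatomic.h>

#define MODEL_EINTR 4   /* errno value of EINTR */

/* reqcopy.cmd != 0 marks the delayed-reply slot as taken */
struct sockpass_model {
    atomic_int reqcopy_cmd;   /* 0 = free, otherwise a PVCALLS_* command */
};

/* Claim the slot for 'cmd'; fail with -EINTR if an accept/poll is
 * already pending, mirroring pvcalls_back_accept(). */
static int claim_reqcopy(struct sockpass_model *m, int cmd)
{
    int expected = 0;
    if (atomic_compare_exchange_strong(&m->reqcopy_cmd, &expected, cmd))
        return 0;
    return -MODEL_EINTR;
}

/* Done in __pvcalls_back_accept() after the response is pushed */
static void release_reqcopy(struct sockpass_model *m)
{
    atomic_store(&m->reqcopy_cmd, 0);
}
```

Clearing the slot only after the response has been pushed is what makes the delayed reply safe: a second accept arriving in between simply gets -EINTR instead of corrupting the saved request.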
[Xen-devel] [PATCH v5 05/18] xen/pvcalls: connect to a frontend
Introduce a per-frontend data structure named pvcalls_fedata. It contains pointers to the command ring, its event channel, a list of active sockets and a tree of passive sockets (passive sockets need to be looked up from the id on listen, accept and poll commands, while active sockets only on release).

It also has an unbound workqueue to schedule the work of parsing and executing commands on the command ring. socket_lock protects the two lists. In pvcalls_back_global, keep a list of connected frontends.

Signed-off-by: Stefano Stabellini
Reviewed-by: Boris Ostrovsky
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 92 ++
 1 file changed, 92 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 7bce750..e4c2e46 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -33,9 +33,101 @@ struct pvcalls_back_global {
 	struct semaphore frontends_lock;
 } pvcalls_back_global;
 
+/*
+ * Per-frontend data structure. It contains pointers to the command
+ * ring, its event channel, a list of active sockets and a tree of
+ * passive sockets.
+ */
+struct pvcalls_fedata {
+	struct list_head list;
+	struct xenbus_device *dev;
+	struct xen_pvcalls_sring *sring;
+	struct xen_pvcalls_back_ring ring;
+	int irq;
+	struct list_head socket_mappings;
+	struct radix_tree_root socketpass_mappings;
+	struct semaphore socket_lock;
+	struct workqueue_struct *wq;
+	struct work_struct register_work;
+};
+
+static void pvcalls_back_work(struct work_struct *work)
+{
+}
+
+static irqreturn_t pvcalls_back_event(int irq, void *dev_id)
+{
+	return IRQ_HANDLED;
+}
+
 static int backend_connect(struct xenbus_device *dev)
 {
+	int err, evtchn;
+	grant_ref_t ring_ref;
+	struct pvcalls_fedata *fedata = NULL;
+
+	fedata = kzalloc(sizeof(struct pvcalls_fedata), GFP_KERNEL);
+	if (!fedata)
+		return -ENOMEM;
+
+	err = xenbus_scanf(XBT_NIL, dev->otherend, "port", "%u",
+			   &evtchn);
+	if (err != 1) {
+		err = -EINVAL;
+		xenbus_dev_fatal(dev, err, "reading %s/event-channel",
+				 dev->otherend);
+		goto error;
+	}
+
+	err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-ref", "%u", &ring_ref);
+	if (err != 1) {
+		err = -EINVAL;
+		xenbus_dev_fatal(dev, err, "reading %s/ring-ref",
+				 dev->otherend);
+		goto error;
+	}
+
+	err = bind_interdomain_evtchn_to_irqhandler(dev->otherend_id, evtchn,
+						    pvcalls_back_event, 0,
+						    "pvcalls-backend", dev);
+	if (err < 0)
+		goto error;
+	fedata->irq = err;
+
+	fedata->wq = alloc_workqueue("pvcalls_back_wq", WQ_UNBOUND, 1);
+	if (!fedata->wq) {
+		err = -ENOMEM;
+		goto error;
+	}
+
+	err = xenbus_map_ring_valloc(dev, &ring_ref, 1,
+				     (void **)&fedata->sring);
+	if (err < 0)
+		goto error;
+
+	BACK_RING_INIT(&fedata->ring, fedata->sring, XEN_PAGE_SIZE * 1);
+	fedata->dev = dev;
+
+	INIT_WORK(&fedata->register_work, pvcalls_back_work);
+	INIT_LIST_HEAD(&fedata->socket_mappings);
+	INIT_RADIX_TREE(&fedata->socketpass_mappings, GFP_KERNEL);
+	sema_init(&fedata->socket_lock, 1);
+	dev_set_drvdata(&dev->dev, fedata);
+
+	down(&pvcalls_back_global.frontends_lock);
+	list_add_tail(&fedata->list, &pvcalls_back_global.frontends);
+	up(&pvcalls_back_global.frontends_lock);
+	queue_work(fedata->wq, &fedata->register_work);
+
+	return 0;
+
+error:
+	if (fedata->sring != NULL)
+		xenbus_unmap_ring_vfree(dev, fedata->sring);
+	if (fedata->wq)
+		destroy_workqueue(fedata->wq);
+	unbind_from_irqhandler(fedata->irq, dev);
+	kfree(fedata);
+	return err;
 }
 
 static int backend_disconnect(struct xenbus_device *dev)
-- 
1.9.1
[Xen-devel] [PATCH v5 01/18] xen: introduce the pvcalls interface header
Introduce the C header file which defines the PV Calls interface. It is imported from xen/include/public/io/pvcalls.h.

Signed-off-by: Stefano Stabellini
Reviewed-by: Boris Ostrovsky
CC: konrad.w...@oracle.com
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 include/xen/interface/io/pvcalls.h | 121 +
 include/xen/interface/io/ring.h    |   2 +
 2 files changed, 123 insertions(+)
 create mode 100644 include/xen/interface/io/pvcalls.h

diff --git a/include/xen/interface/io/pvcalls.h b/include/xen/interface/io/pvcalls.h
new file mode 100644
index 000..ccf97b8
--- /dev/null
+++ b/include/xen/interface/io/pvcalls.h
@@ -0,0 +1,121 @@
+#ifndef __XEN_PUBLIC_IO_XEN_PVCALLS_H__
+#define __XEN_PUBLIC_IO_XEN_PVCALLS_H__
+
+#include
+#include
+#include
+
+/* "1" means socket, connect, release, bind, listen, accept and poll */
+#define XENBUS_FUNCTIONS_CALLS "1"
+
+/*
+ * See docs/misc/pvcalls.markdown in xen.git for the full specification:
+ * https://xenbits.xen.org/docs/unstable/misc/pvcalls.html
+ */
+struct pvcalls_data_intf {
+    RING_IDX in_cons, in_prod, in_error;
+
+    uint8_t pad1[52];
+
+    RING_IDX out_cons, out_prod, out_error;
+
+    uint8_t pad2[52];
+
+    RING_IDX ring_order;
+    grant_ref_t ref[];
+};
+DEFINE_XEN_FLEX_RING(pvcalls);
+
+#define PVCALLS_SOCKET  0
+#define PVCALLS_CONNECT 1
+#define PVCALLS_RELEASE 2
+#define PVCALLS_BIND    3
+#define PVCALLS_LISTEN  4
+#define PVCALLS_ACCEPT  5
+#define PVCALLS_POLL    6
+
+struct xen_pvcalls_request {
+    uint32_t req_id; /* private to guest, echoed in response */
+    uint32_t cmd;    /* command to execute */
+    union {
+        struct xen_pvcalls_socket {
+            uint64_t id;
+            uint32_t domain;
+            uint32_t type;
+            uint32_t protocol;
+        } socket;
+        struct xen_pvcalls_connect {
+            uint64_t id;
+            uint8_t addr[28];
+            uint32_t len;
+            uint32_t flags;
+            grant_ref_t ref;
+            uint32_t evtchn;
+        } connect;
+        struct xen_pvcalls_release {
+            uint64_t id;
+            uint8_t reuse;
+        } release;
+        struct xen_pvcalls_bind {
+            uint64_t id;
+            uint8_t addr[28];
+            uint32_t len;
+        } bind;
+        struct xen_pvcalls_listen {
+            uint64_t id;
+            uint32_t backlog;
+        } listen;
+        struct xen_pvcalls_accept {
+            uint64_t id;
+            uint64_t id_new;
+            grant_ref_t ref;
+            uint32_t evtchn;
+        } accept;
+        struct xen_pvcalls_poll {
+            uint64_t id;
+        } poll;
+        /* dummy member to force sizeof(struct xen_pvcalls_request)
+         * to match across archs */
+        struct xen_pvcalls_dummy {
+            uint8_t dummy[56];
+        } dummy;
+    } u;
+};
+
+struct xen_pvcalls_response {
+    uint32_t req_id;
+    uint32_t cmd;
+    int32_t ret;
+    uint32_t pad;
+    union {
+        struct _xen_pvcalls_socket {
+            uint64_t id;
+        } socket;
+        struct _xen_pvcalls_connect {
+            uint64_t id;
+        } connect;
+        struct _xen_pvcalls_release {
+            uint64_t id;
+        } release;
+        struct _xen_pvcalls_bind {
+            uint64_t id;
+        } bind;
+        struct _xen_pvcalls_listen {
+            uint64_t id;
+        } listen;
+        struct _xen_pvcalls_accept {
+            uint64_t id;
+        } accept;
+        struct _xen_pvcalls_poll {
+            uint64_t id;
+        } poll;
+        struct _xen_pvcalls_dummy {
+            uint8_t dummy[8];
+        } dummy;
+    } u;
+};
+
+DEFINE_RING_TYPES(xen_pvcalls, struct xen_pvcalls_request,
+                  struct xen_pvcalls_response);
+
+#endif
diff --git a/include/xen/interface/io/ring.h b/include/xen/interface/io/ring.h
index c794568..e547088 100644
--- a/include/xen/interface/io/ring.h
+++ b/include/xen/interface/io/ring.h
@@ -9,6 +9,8 @@
 #ifndef __XEN_PUBLIC_IO_RING_H__
 #define __XEN_PUBLIC_IO_RING_H__
 
+#include
+
 typedef unsigned int RING_IDX;
 
 /* Round a 32-bit unsigned constant down to the nearest power of two. */
-- 
1.9.1
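The `dummy[56]` member in the request union is what fixes the ABI: the two leading uint32_t fields plus a 56-byte union give sizeof(struct xen_pvcalls_request) == 64 regardless of the architecture's alignment rules for uint64_t. A minimal re-declaration (hypothetical names, with uint32_t standing in for grant_ref_t) lets the arithmetic be checked outside the kernel:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch mirroring the layout of struct xen_pvcalls_request: the
 * largest real union arm (connect) is 52 bytes of payload, padded to
 * 56; dummy[56] pins the union size even where padding rules differ. */
struct req_model {
    uint32_t req_id;
    uint32_t cmd;
    union {
        struct {
            uint64_t id;
            uint8_t  addr[28];
            uint32_t len;
            uint32_t flags;
            uint32_t ref;      /* grant_ref_t is a uint32_t */
            uint32_t evtchn;
        } connect;
        uint8_t dummy[56];
    } u;
};
```

8 (req_id + cmd) + 56 (union) = 64 bytes, matching the "56-byte dummy" comment in the header.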
[Xen-devel] [PATCH v5 08/18] xen/pvcalls: implement connect command
Allocate a socket. Keep track of socket <-> ring mappings with a new data structure, called sock_mapping. Implement the connect command by calling inet_stream_connect, and mapping the new indexes page and data ring. Allocate a workqueue and a work_struct, called ioworker, to perform reads and writes to the socket.

When an active socket is closed (sk_state_change), set in_error to -ENOTCONN and notify the other end, as specified by the protocol.

sk_data_ready and pvcalls_back_ioworker will be implemented later.

Signed-off-by: Stefano Stabellini
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 174 +
 1 file changed, 174 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 953458b..5435ce7 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -56,6 +56,39 @@ struct pvcalls_fedata {
 	struct work_struct register_work;
 };
 
+struct pvcalls_ioworker {
+	struct work_struct register_work;
+	struct workqueue_struct *wq;
+};
+
+struct sock_mapping {
+	struct list_head list;
+	struct pvcalls_fedata *fedata;
+	struct socket *sock;
+	uint64_t id;
+	grant_ref_t ref;
+	struct pvcalls_data_intf *ring;
+	void *bytes;
+	struct pvcalls_data data;
+	uint32_t ring_order;
+	int irq;
+	atomic_t read;
+	atomic_t write;
+	atomic_t io;
+	atomic_t release;
+	void (*saved_data_ready)(struct sock *sk);
+	struct pvcalls_ioworker ioworker;
+};
+
+static irqreturn_t pvcalls_back_conn_event(int irq, void *sock_map);
+static int pvcalls_back_release_active(struct xenbus_device *dev,
+				       struct pvcalls_fedata *fedata,
+				       struct sock_mapping *map);
+
+static void pvcalls_back_ioworker(struct work_struct *work)
+{
+}
+
 static int pvcalls_back_socket(struct xenbus_device *dev,
 		struct xen_pvcalls_request *req)
 {
@@ -84,9 +117,145 @@ static int pvcalls_back_socket(struct xenbus_device *dev,
 	return 0;
 }
 
+static void pvcalls_sk_state_change(struct sock *sock)
+{
+	struct sock_mapping *map = sock->sk_user_data;
+	struct pvcalls_data_intf *intf;
+
+	if (map == NULL)
+		return;
+
+	intf = map->ring;
+	intf->in_error = -ENOTCONN;
+	notify_remote_via_irq(map->irq);
+}
+
+static void pvcalls_sk_data_ready(struct sock *sock)
+{
+}
+
+static struct sock_mapping *pvcalls_new_active_socket(
+		struct pvcalls_fedata *fedata,
+		uint64_t id,
+		grant_ref_t ref,
+		uint32_t evtchn,
+		struct socket *sock)
+{
+	int ret;
+	struct sock_mapping *map;
+	void *page;
+
+	map = kzalloc(sizeof(*map), GFP_KERNEL);
+	if (map == NULL)
+		return NULL;
+
+	map->fedata = fedata;
+	map->sock = sock;
+	map->id = id;
+	map->ref = ref;
+
+	ret = xenbus_map_ring_valloc(fedata->dev, &ref, 1, &page);
+	if (ret < 0)
+		goto out;
+	map->ring = page;
+	map->ring_order = map->ring->ring_order;
+	/* first read the order, then map the data ring */
+	virt_rmb();
+	if (map->ring_order > MAX_RING_ORDER) {
+		pr_warn("%s frontend requested ring_order %u, which is > MAX (%u)\n",
+			__func__, map->ring_order, MAX_RING_ORDER);
+		goto out;
+	}
+	ret = xenbus_map_ring_valloc(fedata->dev, map->ring->ref,
+				     (1 << map->ring_order), &page);
+	if (ret < 0)
+		goto out;
+	map->bytes = page;
+
+	ret = bind_interdomain_evtchn_to_irqhandler(fedata->dev->otherend_id,
+						    evtchn,
+						    pvcalls_back_conn_event,
+						    0,
+						    "pvcalls-backend",
+						    map);
+	if (ret < 0)
+		goto out;
+	map->irq = ret;
+
+	map->data.in = map->bytes;
+	map->data.out = map->bytes + XEN_FLEX_RING_SIZE(map->ring_order);
+
+	map->ioworker.wq = alloc_workqueue("pvcalls_io", WQ_UNBOUND, 1);
+	if (!map->ioworker.wq)
+		goto out;
+	atomic_set(&map->io, 1);
+	INIT_WORK(&map->ioworker.register_work, pvcalls_back_ioworker);
+
+	down(&fedata->socket_lock);
+	list_add_tail(&map->list, &fedata->socket_mappings);
+	up(&fedata->socket_lock);
+
+	write_lock_bh(&map->sock->sk->sk_callback_lock);
+	map->saved_data_ready = map->sock->sk->sk_data_ready;
+	map->sock->sk->sk_user_data = map;
+	map->sock->sk->sk_data_ready = pvcalls_sk_data_ready;
+	map->sock->sk->sk_state_change = pvcalls_sk_state_change;
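The data-ring split done in pvcalls_new_active_socket() is worth spelling out: the frontend grants (1 << ring_order) pages, and XEN_FLEX_RING_SIZE(order) = 1UL << (order + XEN_PAGE_SHIFT - 1) is exactly half of that area, so `data.in` (backend writes, frontend reads) and `data.out` (frontend writes, backend reads) are two equal halves. A user-space sketch of the arithmetic, with MODEL_PAGE_SHIFT standing in for XEN_PAGE_SHIFT:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12   /* 4 KiB pages, as XEN_PAGE_SHIFT */

/* XEN_FLEX_RING_SIZE(order): half of a (1 << order)-page area */
static size_t flex_ring_size(unsigned int order)
{
    return 1UL << (order + MODEL_PAGE_SHIFT - 1);
}

/* Mirror of the in/out split in pvcalls_new_active_socket() */
static void split_data_ring(uint8_t *bytes, unsigned int order,
                            uint8_t **in, uint8_t **out)
{
    *in  = bytes;                           /* first half: to frontend */
    *out = bytes + flex_ring_size(order);   /* second half: from frontend */
}
```

For the default ring_order of 1, the mapped area is two pages and each direction gets one page of buffer.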
[Xen-devel] [PATCH v5 13/18] xen/pvcalls: implement release command
Release both active and passive sockets. For active sockets, make sure to avoid possible conflicts with the ioworker reading/writing to those sockets concurrently. Set map->release to let the ioworker know atomically that the socket will be released soon, then wait until the ioworker finishes (flush_work).

Unmap the indexes pages and data rings.

Signed-off-by: Stefano Stabellini
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 68 ++
 1 file changed, 68 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 5b2ef60..f6f88ce 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -269,12 +269,80 @@ static int pvcalls_back_release_active(struct xenbus_device *dev,
 				       struct pvcalls_fedata *fedata,
 				       struct sock_mapping *map)
 {
+	disable_irq(map->irq);
+	if (map->sock->sk != NULL) {
+		write_lock_bh(&map->sock->sk->sk_callback_lock);
+		map->sock->sk->sk_user_data = NULL;
+		map->sock->sk->sk_data_ready = map->saved_data_ready;
+		write_unlock_bh(&map->sock->sk->sk_callback_lock);
+	}
+
+	atomic_set(&map->release, 1);
+	flush_work(&map->ioworker.register_work);
+
+	xenbus_unmap_ring_vfree(dev, map->bytes);
+	xenbus_unmap_ring_vfree(dev, (void *)map->ring);
+	unbind_from_irqhandler(map->irq, map);
+
+	sock_release(map->sock);
+	kfree(map);
+
+	return 0;
+}
+
+static int pvcalls_back_release_passive(struct xenbus_device *dev,
+					struct pvcalls_fedata *fedata,
+					struct sockpass_mapping *mappass)
+{
+	if (mappass->sock->sk != NULL) {
+		write_lock_bh(&mappass->sock->sk->sk_callback_lock);
+		mappass->sock->sk->sk_user_data = NULL;
+		mappass->sock->sk->sk_data_ready = mappass->saved_data_ready;
+		write_unlock_bh(&mappass->sock->sk->sk_callback_lock);
+	}
+	sock_release(mappass->sock);
+	flush_workqueue(mappass->wq);
+	destroy_workqueue(mappass->wq);
+	kfree(mappass);
+
+	return 0;
 }
 
 static int pvcalls_back_release(struct xenbus_device *dev,
 				struct xen_pvcalls_request *req)
 {
+	struct pvcalls_fedata *fedata;
+	struct sock_mapping *map, *n;
+	struct sockpass_mapping *mappass;
+	int ret = 0;
+	struct xen_pvcalls_response *rsp;
+
+	fedata = dev_get_drvdata(&dev->dev);
+
+	down(&fedata->socket_lock);
+	list_for_each_entry_safe(map, n, &fedata->socket_mappings, list) {
+		if (map->id == req->u.release.id) {
+			list_del(&map->list);
+			up(&fedata->socket_lock);
+			ret = pvcalls_back_release_active(dev, fedata, map);
+			goto out;
+		}
+	}
+	mappass = radix_tree_lookup(&fedata->socketpass_mappings,
+				    req->u.release.id);
+	if (mappass != NULL) {
+		radix_tree_delete(&fedata->socketpass_mappings, mappass->id);
+		up(&fedata->socket_lock);
+		ret = pvcalls_back_release_passive(dev, fedata, mappass);
+	} else
+		up(&fedata->socket_lock);
+
+out:
+	rsp = RING_GET_RESPONSE(&fedata->ring, fedata->ring.rsp_prod_pvt++);
+	rsp->req_id = req->req_id;
+	rsp->u.release.id = req->u.release.id;
+	rsp->cmd = req->cmd;
+	rsp->ret = ret;
 	return 0;
 }
-- 
1.9.1
[Xen-devel] [PATCH v5 12/18] xen/pvcalls: implement poll command
Implement poll on passive sockets by requesting a delayed response with mappass->reqcopy, and reply back when there is data on the passive socket. Poll on active socket is unimplemented as by the spec, as the frontend should just wait for events and check the indexes on the indexes page. Only support one outstanding poll (or accept) request for every passive socket at any given time. Signed-off-by: Stefano StabelliniCC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/pvcalls-back.c | 73 +- 1 file changed, 72 insertions(+), 1 deletion(-) diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c index 62738e4..5b2ef60 100644 --- a/drivers/xen/pvcalls-back.c +++ b/drivers/xen/pvcalls-back.c @@ -352,11 +352,33 @@ static void __pvcalls_back_accept(struct work_struct *work) static void pvcalls_pass_sk_data_ready(struct sock *sock) { struct sockpass_mapping *mappass = sock->sk_user_data; + struct pvcalls_fedata *fedata; + struct xen_pvcalls_response *rsp; + unsigned long flags; + int notify; if (mappass == NULL) return; - queue_work(mappass->wq, >register_work); + fedata = mappass->fedata; + spin_lock_irqsave(>copy_lock, flags); + if (mappass->reqcopy.cmd == PVCALLS_POLL) { + rsp = RING_GET_RESPONSE(>ring, fedata->ring.rsp_prod_pvt++); + rsp->req_id = mappass->reqcopy.req_id; + rsp->u.poll.id = mappass->reqcopy.u.poll.id; + rsp->cmd = mappass->reqcopy.cmd; + rsp->ret = 0; + + mappass->reqcopy.cmd = 0; + spin_unlock_irqrestore(>copy_lock, flags); + + RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(>ring, notify); + if (notify) + notify_remote_via_irq(mappass->fedata->irq); + } else { + spin_unlock_irqrestore(>copy_lock, flags); + queue_work(mappass->wq, >register_work); + } } static int pvcalls_back_bind(struct xenbus_device *dev, @@ -507,6 +529,55 @@ static int pvcalls_back_accept(struct xenbus_device *dev, static int pvcalls_back_poll(struct xenbus_device *dev, struct xen_pvcalls_request *req) { + struct pvcalls_fedata *fedata; + struct sockpass_mapping 
*mappass; + struct xen_pvcalls_response *rsp; + struct inet_connection_sock *icsk; + struct request_sock_queue *queue; + unsigned long flags; + int ret; + bool data; + + fedata = dev_get_drvdata(>dev); + + down(>socket_lock); + mappass = radix_tree_lookup(>socketpass_mappings, req->u.poll.id); + up(>socket_lock); + if (mappass == NULL) + return -EINVAL; + + /* +* Limitation of the current implementation: only support one +* concurrent accept or poll call on one socket. +*/ + spin_lock_irqsave(>copy_lock, flags); + if (mappass->reqcopy.cmd != 0) { + ret = -EINTR; + goto out; + } + + mappass->reqcopy = *req; + icsk = inet_csk(mappass->sock->sk); + queue = >icsk_accept_queue; + data = queue->rskq_accept_head != NULL; + if (data) { + mappass->reqcopy.cmd = 0; + ret = 0; + goto out; + } + spin_unlock_irqrestore(>copy_lock, flags); + + /* Tell the caller we don't need to send back a notification yet */ + return -1; + +out: + spin_unlock_irqrestore(>copy_lock, flags); + + rsp = RING_GET_RESPONSE(>ring, fedata->ring.rsp_prod_pvt++); + rsp->req_id = req->req_id; + rsp->cmd = req->cmd; + rsp->u.poll.id = req->u.poll.id; + rsp->ret = ret; return 0; } -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
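pvcalls_back_poll() replies immediately only when the accept queue already holds data; otherwise it parks a copy of the request in mappass->reqcopy and returns -1 ("no notification yet"), and pvcalls_pass_sk_data_ready() completes the parked request later. Treating reqcopy.cmd != 0 as "busy" is also what enforces the one-outstanding-poll-or-accept-per-passive-socket limitation. A compact sketch of that state machine (field names and return codes are simplified, not the kernel's):

```c
#include <assert.h>

enum { CMD_NONE = 0, CMD_POLL = 2 };

struct mappass {
    int reqcopy_cmd;   /* nonzero means an outstanding poll/accept */
    int replied;       /* has a response been pushed yet? */
};

/* Poll request arrives: reply now if data is ready, else park it. */
static int do_poll(struct mappass *m, int data_ready)
{
    if (m->reqcopy_cmd != CMD_NONE)
        return -4;                 /* -EINTR: one request at a time */
    m->reqcopy_cmd = CMD_POLL;
    if (data_ready) {
        m->reqcopy_cmd = CMD_NONE;
        m->replied = 1;
        return 0;                  /* immediate response */
    }
    return -1;                     /* response deferred */
}

/* Data arrives later: complete the parked poll, if there is one. */
static void data_ready_event(struct mappass *m)
{
    if (m->reqcopy_cmd == CMD_POLL) {
        m->reqcopy_cmd = CMD_NONE;
        m->replied = 1;            /* push the delayed response now */
    }
}
```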
[Xen-devel] [PATCH v5 02/18] xen/pvcalls: introduce the pvcalls xenbus backend
Introduce a xenbus backend for the pvcalls protocol, as defined by https://xenbits.xen.org/docs/unstable/misc/pvcalls.html. This patch only adds the stubs, the code will be added by the following patches. Signed-off-by: Stefano StabelliniReviewed-by: Boris Ostrovsky CC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/pvcalls-back.c | 61 ++ 1 file changed, 61 insertions(+) create mode 100644 drivers/xen/pvcalls-back.c diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c new file mode 100644 index 000..f3d0daa --- /dev/null +++ b/drivers/xen/pvcalls-back.c @@ -0,0 +1,61 @@ +/* + * (c) 2017 Stefano Stabellini + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ */ + +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +static int pvcalls_back_probe(struct xenbus_device *dev, + const struct xenbus_device_id *id) +{ + return 0; +} + +static void pvcalls_back_changed(struct xenbus_device *dev, +enum xenbus_state frontend_state) +{ +} + +static int pvcalls_back_remove(struct xenbus_device *dev) +{ + return 0; +} + +static int pvcalls_back_uevent(struct xenbus_device *xdev, + struct kobj_uevent_env *env) +{ + return 0; +} + +static const struct xenbus_device_id pvcalls_back_ids[] = { + { "pvcalls" }, + { "" } +}; + +static struct xenbus_driver pvcalls_back_driver = { + .ids = pvcalls_back_ids, + .probe = pvcalls_back_probe, + .remove = pvcalls_back_remove, + .uevent = pvcalls_back_uevent, + .otherend_changed = pvcalls_back_changed, +}; -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v5 03/18] xen/pvcalls: initialize the module and register the xenbus backend
Keep a list of connected frontends. Use a semaphore to protect list accesses. Signed-off-by: Stefano StabelliniReviewed-by: Boris Ostrovsky CC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/pvcalls-back.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c index f3d0daa..9044cf2 100644 --- a/drivers/xen/pvcalls-back.c +++ b/drivers/xen/pvcalls-back.c @@ -25,6 +25,11 @@ #include #include +struct pvcalls_back_global { + struct list_head frontends; + struct semaphore frontends_lock; +} pvcalls_back_global; + static int pvcalls_back_probe(struct xenbus_device *dev, const struct xenbus_device_id *id) { @@ -59,3 +64,20 @@ static int pvcalls_back_uevent(struct xenbus_device *xdev, .uevent = pvcalls_back_uevent, .otherend_changed = pvcalls_back_changed, }; + +static int __init pvcalls_back_init(void) +{ + int ret; + + if (!xen_domain()) + return -ENODEV; + + ret = xenbus_register_backend(_back_driver); + if (ret < 0) + return ret; + + sema_init(_back_global.frontends_lock, 1); + INIT_LIST_HEAD(_back_global.frontends); + return 0; +} +module_init(pvcalls_back_init); -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v5 09/18] xen/pvcalls: implement bind command
Allocate a socket. Track the allocated passive sockets with a new data structure named sockpass_mapping. It contains an unbound workqueue to schedule delayed work for the accept and poll commands. It also has a reqcopy field to be used to store a copy of a request for delayed work. Reads/writes to it are protected by a lock (the "copy_lock" spinlock). Initialize the workqueue in pvcalls_back_bind. Implement the bind command with inet_bind. The pass_sk_data_ready event handler will be added later. Signed-off-by: Stefano StabelliniCC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/pvcalls-back.c | 87 ++ 1 file changed, 87 insertions(+) diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c index 5435ce7..2c0bfef 100644 --- a/drivers/xen/pvcalls-back.c +++ b/drivers/xen/pvcalls-back.c @@ -80,6 +80,18 @@ struct sock_mapping { struct pvcalls_ioworker ioworker; }; +struct sockpass_mapping { + struct list_head list; + struct pvcalls_fedata *fedata; + struct socket *sock; + uint64_t id; + struct xen_pvcalls_request reqcopy; + spinlock_t copy_lock; + struct workqueue_struct *wq; + struct work_struct register_work; + void (*saved_data_ready)(struct sock *sk); +}; + static irqreturn_t pvcalls_back_conn_event(int irq, void *sock_map); static int pvcalls_back_release_active(struct xenbus_device *dev, struct pvcalls_fedata *fedata, @@ -265,9 +277,84 @@ static int pvcalls_back_release(struct xenbus_device *dev, return 0; } +static void __pvcalls_back_accept(struct work_struct *work) +{ +} + +static void pvcalls_pass_sk_data_ready(struct sock *sock) +{ +} + static int pvcalls_back_bind(struct xenbus_device *dev, struct xen_pvcalls_request *req) { + struct pvcalls_fedata *fedata; + int ret, err; + struct socket *sock; + struct sockpass_mapping *map; + struct xen_pvcalls_response *rsp; + + fedata = dev_get_drvdata(>dev); + + map = kzalloc(sizeof(*map), GFP_KERNEL); + if (map == NULL) { + ret = -ENOMEM; + goto out; + } + + INIT_WORK(>register_work, 
__pvcalls_back_accept); + spin_lock_init(>copy_lock); + map->wq = alloc_workqueue("pvcalls_wq", WQ_UNBOUND, 1); + if (!map->wq) { + ret = -ENOMEM; + kfree(map); + goto out; + } + + ret = sock_create(AF_INET, SOCK_STREAM, 0, ); + if (ret < 0) { + destroy_workqueue(map->wq); + kfree(map); + goto out; + } + + ret = inet_bind(sock, (struct sockaddr *)>u.bind.addr, + req->u.bind.len); + if (ret < 0) { + sock_release(sock); + destroy_workqueue(map->wq); + kfree(map); + goto out; + } + + map->fedata = fedata; + map->sock = sock; + map->id = req->u.bind.id; + + down(>socket_lock); + err = radix_tree_insert(>socketpass_mappings, map->id, + map); + up(>socket_lock); + if (err) { + ret = err; + sock_release(sock); + destroy_workqueue(map->wq); + kfree(map); + goto out; + } + + write_lock_bh(>sk->sk_callback_lock); + map->saved_data_ready = sock->sk->sk_data_ready; + sock->sk->sk_user_data = map; + sock->sk->sk_data_ready = pvcalls_pass_sk_data_ready; + write_unlock_bh(>sk->sk_callback_lock); + +out: + rsp = RING_GET_RESPONSE(>ring, fedata->ring.rsp_prod_pvt++); + rsp->req_id = req->req_id; + rsp->cmd = req->cmd; + rsp->u.bind.id = req->u.bind.id; + rsp->ret = ret; return 0; } -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
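pvcalls_back_bind() swaps in its own sk_data_ready under sk_callback_lock, stashing the socket's original callback in map->saved_data_ready so the release path can restore it. A userspace sketch of this save/hook/restore pattern for a callback pointer (the struct layouts are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stddef.h>

struct sock {
    void (*data_ready)(struct sock *);
    void *user_data;
};

struct sockpass_mapping {
    void (*saved_data_ready)(struct sock *);
    int events;
};

static void default_data_ready(struct sock *sk) { (void)sk; }

/* Our hook: count the event, mimicking queue_work(&register_work). */
static void pass_sk_data_ready(struct sock *sk)
{
    struct sockpass_mapping *map = sk->user_data;
    if (map)
        map->events++;
}

static void install_hook(struct sock *sk, struct sockpass_mapping *map)
{
    map->saved_data_ready = sk->data_ready;  /* save for later restore */
    sk->user_data = map;
    sk->data_ready = pass_sk_data_ready;
}

static void remove_hook(struct sock *sk, struct sockpass_mapping *map)
{
    sk->user_data = NULL;
    sk->data_ready = map->saved_data_ready;  /* as release_passive does */
}
```

In the real driver both install and remove happen inside write_lock_bh(&sk->sk_callback_lock) so the callback is never observed half-swapped.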
[Xen-devel] [PATCH v5 10/18] xen/pvcalls: implement listen command
Call inet_listen to implement the listen command. Signed-off-by: Stefano StabelliniReviewed-by: Boris Ostrovsky CC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/pvcalls-back.c | 21 + 1 file changed, 21 insertions(+) diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c index 2c0bfef..2a47425 100644 --- a/drivers/xen/pvcalls-back.c +++ b/drivers/xen/pvcalls-back.c @@ -361,6 +361,27 @@ static int pvcalls_back_bind(struct xenbus_device *dev, static int pvcalls_back_listen(struct xenbus_device *dev, struct xen_pvcalls_request *req) { + struct pvcalls_fedata *fedata; + int ret = -EINVAL; + struct sockpass_mapping *map; + struct xen_pvcalls_response *rsp; + + fedata = dev_get_drvdata(>dev); + + down(>socket_lock); + map = radix_tree_lookup(>socketpass_mappings, req->u.listen.id); + up(>socket_lock); + if (map == NULL) + goto out; + + ret = inet_listen(map->sock, req->u.listen.backlog); + +out: + rsp = RING_GET_RESPONSE(>ring, fedata->ring.rsp_prod_pvt++); + rsp->req_id = req->req_id; + rsp->cmd = req->cmd; + rsp->u.listen.id = req->u.listen.id; + rsp->ret = ret; return 0; } -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v5 00/18] introduce the Xen PV Calls backend
Hi all, this series introduces the backend for the newly introduced PV Calls protocol. PV Calls is a paravirtualized protocol that allows the implementation of a set of POSIX functions in a different domain. The PV Calls frontend sends POSIX function calls to the backend, which implements them, acts on the function call, and returns a value to the frontend. For more information about PV Calls, please read: https://xenbits.xen.org/docs/unstable/misc/pvcalls.html I tried to split the source code into small pieces to make it easier to read and understand. Please review! Changes in v5: - added reviewed-bys - remove unnecessary gotos - ret 0 in pvcalls_back_connect - do not lose ret values - remove queue->rskq_lock - make sure all accesses to socket_mappings and socketpass_mappings are protected by socket_lock - rename ring_size to array_size Changes in v4: - add reviewed-bys - fix return values of many functions - remove pointless initializers - print a warning if ring_order > MAX_RING_ORDER - remove map->ioworker.cpu - use queue_work instead of queue_work_on - add sock_release() on error paths where appropriate - add a comment in __pvcalls_back_accept about racing with pvcalls_back_accept and atomicity of reqcopy - remove unneeded (void*) casts - remove unneeded {} - fix backend_disconnect if !mappass - remove pointless continue in backend_disconnect - remove pointless memset of pvcalls_back_global - pass *opaque to pvcalls_conn_back_read - improve WARN_ON in pvcalls_conn_back_read - fix error checks in pvcalls_conn_back_write - XEN_PVCALLS_BACKEND depends on XEN_BACKEND - rename priv to fedata across all patches Changes in v3: - added reviewed-bys - return err from pvcalls_back_probe - remove old comments - use a xenstore transaction in pvcalls_back_probe - ignore errors from xenbus_switch_state - rename pvcalls_back_priv to pvcalls_fedata - remove addr from backend_connect - remove priv->work, add comment about theoretical race - use IPPROTO_IP - refactor active socket
allocation in a single new function Changes in v2: - allocate one ioworker per socket (rather than 1 per vcpu) - rename privs to frontends - add newlines - define "1" in the public header - better error returns in pvcalls_back_probe - do not set XenbusStateClosed twice in set_backend_state - add more comments - replace rw_semaphore with semaphore - rename pvcallss to socket_lock - move xenbus_map_ring_valloc closer to first use in backend_connect - use more traditional return codes from pvcalls_back_handle_cmd and callees - remove useless dev == NULL checks - replace lock_sock with more appropriate and fine grained socket locks Stefano Stabellini (18): xen: introduce the pvcalls interface header xen/pvcalls: introduce the pvcalls xenbus backend xen/pvcalls: initialize the module and register the xenbus backend xen/pvcalls: xenbus state handling xen/pvcalls: connect to a frontend xen/pvcalls: handle commands from the frontend xen/pvcalls: implement socket command xen/pvcalls: implement connect command xen/pvcalls: implement bind command xen/pvcalls: implement listen command xen/pvcalls: implement accept command xen/pvcalls: implement poll command xen/pvcalls: implement release command xen/pvcalls: disconnect and module_exit xen/pvcalls: implement the ioworker functions xen/pvcalls: implement read xen/pvcalls: implement write xen: introduce a Kconfig option to enable the pvcalls backend drivers/xen/Kconfig| 12 + drivers/xen/Makefile |1 + drivers/xen/pvcalls-back.c | 1244 include/xen/interface/io/pvcalls.h | 121 include/xen/interface/io/ring.h|2 + 5 files changed, 1380 insertions(+) create mode 100644 drivers/xen/pvcalls-back.c create mode 100644 include/xen/interface/io/pvcalls.h ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH for-4.8] x86/pv: Fix the handling of `int $x` for vectors which alias exceptions
The claim at the top of c/s 2e426d6eecf "x86/traps: Drop use_error_code parameter from do_{,guest_}trap()" is only actually true for hardware exceptions. It is not true for `int $x` instructions (which never push error code), irrespective of whether the vector aliases an exception or not. Furthermore, c/s 6480cc6280e "x86/traps: Fix failed ASSERT() in do_guest_trap()" really should have helped highlight that a regression had been introduced. Modify pv_inject_event() to understand event types other than X86_EVENTTYPE_HW_EXCEPTION, and introduce pv_inject_sw_interrupt() for the `int $x` handling code. Add further assertions to pv_inject_event() concerning the type of events passed in, which in turn requires that do_guest_trap() set its type appropriately (which is now used exclusively for hardware exceptions). This is logically a backport of c/s 5c4f579e0ee4f38cad5636bbf8ce700a394338d0 from Xen 4.9, but disentangled from the other injection work. Signed-off-by: Andrew Cooper--- CC: Jan Beulich --- xen/arch/x86/traps.c | 26 +- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 19ac652..8c992ce 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -625,14 +625,24 @@ void fatal_trap(const struct cpu_user_regs *regs, bool_t show_remote) (regs->eflags & X86_EFLAGS_IF) ? 
"" : ", IN INTERRUPT CONTEXT"); } -static void do_guest_trap(unsigned int trapnr, - const struct cpu_user_regs *regs) +static void pv_inject_event( +unsigned int trapnr, const struct cpu_user_regs *regs, unsigned int type) { struct vcpu *v = current; struct trap_bounce *tb; const struct trap_info *ti; -const bool use_error_code = -((trapnr < 32) && (TRAP_HAVE_EC & (1u << trapnr))); +bool use_error_code; + +if ( type == X86_EVENTTYPE_HW_EXCEPTION ) +{ +ASSERT(trapnr < 32); +use_error_code = TRAP_HAVE_EC & (1u << trapnr); +} +else +{ +ASSERT(type == X86_EVENTTYPE_SW_INTERRUPT); +use_error_code = false; +} trace_pv_trap(trapnr, regs->eip, use_error_code, regs->error_code); @@ -658,6 +668,12 @@ static void do_guest_trap(unsigned int trapnr, trapstr(trapnr), trapnr, regs->error_code); } +static void do_guest_trap( +unsigned int trapnr, const struct cpu_user_regs *regs) +{ +pv_inject_event(trapnr, regs, X86_EVENTTYPE_HW_EXCEPTION); +} + static void instruction_done( struct cpu_user_regs *regs, unsigned long eip, unsigned int bpmatch) { @@ -3685,7 +3701,7 @@ void do_general_protection(struct cpu_user_regs *regs) if ( permit_softint(TI_GET_DPL(ti), v, regs) ) { regs->eip += 2; -do_guest_trap(vector, regs); +pv_inject_event(vector, regs, X86_EVENTTYPE_SW_INTERRUPT); return; } } -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
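The heart of the fix is that "does this event push an error code?" now depends on the delivery type, not only on the vector: hardware exceptions consult the TRAP_HAVE_EC mask, while software interrupts from `int $x` never push one, even when the vector aliases an exception such as #GP. A sketch of that decision (the mask below lists the classic error-code vectors and is illustrative, not copied from Xen's headers):

```c
#include <assert.h>

enum { EVT_HW_EXCEPTION, EVT_SW_INTERRUPT };

/* Vectors that push an error code when raised as HW exceptions:
 * 8 (#DF), 10 (#TS), 11 (#NP), 12 (#SS), 13 (#GP), 14 (#PF), 17 (#AC). */
#define TRAP_HAVE_EC ((1u << 8) | (1u << 10) | (1u << 11) | \
                      (1u << 12) | (1u << 13) | (1u << 14) | (1u << 17))

static int use_error_code(unsigned int trapnr, int type)
{
    if (type == EVT_HW_EXCEPTION) {
        assert(trapnr < 32);   /* HW exceptions live in vectors 0-31 */
        return !!(TRAP_HAVE_EC & (1u << trapnr));
    }
    assert(type == EVT_SW_INTERRUPT);
    return 0;                  /* `int $x` never pushes an error code */
}
```

This mirrors why the old `(trapnr < 32) && (TRAP_HAVE_EC & ...)` test was wrong for `int $13`: the vector says "error code", but the delivery type says otherwise.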
[Xen-devel] [PATCH v5 4/8] mm: Scrub memory from idle loop
Instead of scrubbing pages during guest destruction (from free_heap_pages()) do this opportunistically, from the idle loop. We might come to scrub_free_pages()from idle loop while another CPU uses mapcache override, resulting in a fault while trying to do __map_domain_page() in scrub_one_page(). To avoid this, make mapcache vcpu override a per-cpu variable. Signed-off-by: Boris Ostrovsky--- CC: Dario Faggioli --- Changes in v5: * Added explanation in commit message for making mapcache override VCPU a per-cpu variable * Fixed loop counting in scrub_free_pages() * Fixed the off-by-one error in setting first_dirty in scrub_free_pages(). * Various style fixes * Added a comment in node_to_scrub() explaining why it should be OK to prevent another CPU from scrubbing a node that ths current CPU temporarily claimed. (I decided against using locks there) xen/arch/arm/domain.c | 2 +- xen/arch/x86/domain.c | 2 +- xen/arch/x86/domain_page.c | 6 +-- xen/common/page_alloc.c| 118 - xen/include/xen/mm.h | 1 + 5 files changed, 111 insertions(+), 18 deletions(-) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index 2dc8b0a..d282cd8 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -51,7 +51,7 @@ void idle_loop(void) /* Are we here for running vcpu context tasklets, or for idling? */ if ( unlikely(tasklet_work_to_do(cpu)) ) do_tasklet(); -else +else if ( !softirq_pending(cpu) && !scrub_free_pages() ) { local_irq_disable(); if ( cpu_is_haltable(cpu) ) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index f7873da..71f1ef4 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -122,7 +122,7 @@ static void idle_loop(void) /* Are we here for running vcpu context tasklets, or for idling? 
*/ if ( unlikely(tasklet_work_to_do(cpu)) ) do_tasklet(); -else +else if ( !softirq_pending(cpu) && !scrub_free_pages() ) pm_idle(); do_softirq(); /* diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c index 71baede..0783c1e 100644 --- a/xen/arch/x86/domain_page.c +++ b/xen/arch/x86/domain_page.c @@ -18,12 +18,12 @@ #include #include -static struct vcpu *__read_mostly override; +static DEFINE_PER_CPU(struct vcpu *, override); static inline struct vcpu *mapcache_current_vcpu(void) { /* In the common case we use the mapcache of the running VCPU. */ -struct vcpu *v = override ?: current; +struct vcpu *v = this_cpu(override) ?: current; /* * When current isn't properly set up yet, this is equivalent to @@ -59,7 +59,7 @@ static inline struct vcpu *mapcache_current_vcpu(void) void __init mapcache_override_current(struct vcpu *v) { -override = v; +this_cpu(override) = v; } #define mapcache_l2_entry(e) ((e) >> PAGETABLE_ORDER) diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c index 9aac196..4e2775f 100644 --- a/xen/common/page_alloc.c +++ b/xen/common/page_alloc.c @@ -1019,15 +1019,85 @@ static int reserve_offlined_page(struct page_info *head) return count; } -static void scrub_free_pages(unsigned int node) +static nodemask_t node_scrubbing; + +/* + * If get_node is true this will return closest node that needs to be scrubbed, + * with appropriate bit in node_scrubbing set. + * If get_node is not set, this will return *a* node that needs to be scrubbed. + * node_scrubbing bitmask will no be updated. + * If no node needs scrubbing then NUMA_NO_NODE is returned. 
+ */ +static unsigned int node_to_scrub(bool get_node) { -struct page_info *pg; -unsigned int zone; +nodeid_t node = cpu_to_node(smp_processor_id()), local_node; +nodeid_t closest = NUMA_NO_NODE; +u8 dist, shortest = 0xff; -ASSERT(spin_is_locked(_lock)); +if ( node == NUMA_NO_NODE ) +node = 0; -if ( !node_need_scrub[node] ) -return; +if ( node_need_scrub[node] && + (!get_node || !node_test_and_set(node, node_scrubbing)) ) +return node; + +/* + * See if there are memory-only nodes that need scrubbing and choose + * the closest one. + */ +local_node = node; +for ( ; ; ) +{ +do { +node = cycle_node(node, node_online_map); +} while ( !cpumask_empty(_to_cpumask(node)) && + (node != local_node) ); + +if ( node == local_node ) +break; + +/* + * Grab the node right away. If we find a closer node later we will + * release this one. While there is a chance that another CPU will + * not be able to scrub that node when it is searching for scrub work + * at the same time it will be able to do so next time it wakes up. + * The alternative would be to perform this
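As I read node_to_scrub(), the idle CPU first prefers its own node, then cycles over the CPU-less (memory-only) nodes that need scrubbing and keeps the one with the shortest NUMA distance, claiming it via the node_scrubbing mask. A simplified sketch of that closest-node selection (the claim/release bookkeeping and cycling order are omitted, and the distance matrix is made up):

```c
#include <assert.h>

#define NNODES 4
#define NO_NODE (-1)

/* Illustrative inputs: which nodes need scrubbing, which have CPUs,
 * and a symmetric NUMA distance matrix (as __node_distance would give). */
static int need_scrub[NNODES] = { 0, 1, 0, 1 };
static int has_cpus[NNODES]   = { 1, 0, 1, 0 };
static int dist[NNODES][NNODES] = {
    { 10, 20, 30, 40 },
    { 20, 10, 20, 30 },
    { 30, 20, 10, 15 },
    { 40, 30, 15, 10 },
};

/* Return the closest CPU-less node needing a scrub, as node_to_scrub()
 * does for memory-only nodes. */
static int closest_dirty_node(int local)
{
    int n, best = NO_NODE, shortest = 255;

    for (n = 0; n < NNODES; n++) {
        if (n == local || has_cpus[n] || !need_scrub[n])
            continue;
        if (dist[local][n] < shortest) {   /* keep the nearest candidate */
            shortest = dist[local][n];
            best = n;
        }
    }
    return best;
}
```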
[Xen-devel] [PATCH v5 0/8] Memory scrubbing from idle loop
Changes in v5: * Make page_info.u.free a union and use bitfields there. * Bug fixes (see per-patch notes) When a domain is destroyed the hypervisor must scrub the domain's pages before giving them to another guest, in order to prevent leaking the deceased guest's data. Currently this is done during guest destruction, possibly causing a very lengthy cleanup process. This series adds support for scrubbing released pages from the idle loop, making guest destruction significantly faster. For example, destroying a 1TB guest can now be completed in 40+ seconds, as opposed to about 9 minutes with the existing scrubbing algorithm. Briefly, the new algorithm places dirty pages at the end of the heap's page list for each node/zone/order, to avoid having to scan the full list while searching for dirty pages. One processor from each node checks whether the node has any dirty pages and, if such pages are found, scrubs them. Scrubbing itself happens without holding the heap lock, so other users may access the heap in the meantime. If, while the idle loop is scrubbing a particular chunk of pages, that chunk is requested by the heap allocator, scrubbing is immediately stopped. On the allocation side, alloc_heap_pages() first tries to satisfy the allocation request using only clean pages. If this is not possible, the search is repeated and dirty pages are scrubbed by the allocator. This series is somewhat based on earlier work by Bob Liu. V1: * Only set the PGC_need_scrub bit for the buddy head, thus making it unnecessary to scan the whole buddy * Fix spin_lock_cb() * Scrub CPU-less nodes * ARM support. Note that I have not been able to test this, only built the binary * Added a scrub test patch (last one). Not sure whether it should be considered for committing, but I have been running with it.
V2: * merge_chunks() returns new buddy head * scrub_free_pages() returns softirq pending status in addition to (factored out) status of unscrubbed memory * spin_lock uses inlined spin_lock_cb() * scrub debugging code checks whole page, not just the first word. V3: * Keep dirty bit per page * Simplify merge_chunks() (now merge_and_free_buddy()) * When scrubbing memmory-only nodes try to find the closest node. V4: * Keep track of dirty pages in a buddy with page_info.u.free.first_dirty. * Drop patch 1 (factoring out merge_and_free_buddy()) since there is only one caller now * Drop patch patch 5 (from V3) since we are not breaking partially-scrubbed buddy anymore * Extract search loop in alloc_heap_pages() into get_free_buddy() (patch 2) * Add MEMF_no_scrub flag Deferred: * Per-node heap locks. In addition to (presumably) improving performance in general, once they are available we can parallelize scrubbing further by allowing more than one core per node to do idle loop scrubbing. * AVX-based scrubbing * Use idle loop scrubbing during boot. Boris Ostrovsky (8): mm: Place unscrubbed pages at the end of pagelist mm: Extract allocation loop from alloc_heap_pages() mm: Scrub pages in alloc_heap_pages() if needed mm: Scrub memory from idle loop spinlock: Introduce spin_lock_cb() mm: Keep heap accessible to others while scrubbing mm: Print number of unscrubbed pages in 'H' debug handler mm: Make sure pages are scrubbed xen/Kconfig.debug | 7 + xen/arch/arm/domain.c | 2 +- xen/arch/x86/domain.c | 2 +- xen/arch/x86/domain_page.c | 6 +- xen/common/page_alloc.c| 612 ++--- xen/common/spinlock.c | 9 +- xen/include/asm-arm/mm.h | 30 ++- xen/include/asm-x86/mm.h | 30 ++- xen/include/xen/mm.h | 5 +- xen/include/xen/spinlock.h | 8 + 10 files changed, 603 insertions(+), 108 deletions(-) -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v5 3/8] mm: Scrub pages in alloc_heap_pages() if needed
When allocating pages in alloc_heap_pages() first look for clean pages. If none is found then retry, take pages marked as unscrubbed and scrub them. Note that we shouldn't find unscrubbed pages in alloc_heap_pages() yet. However, this will become possible when we stop scrubbing from free_heap_pages() and instead do it from idle loop. Since not all allocations require clean pages (such as xenheap allocations) introduce MEMF_no_scrub flag that callers can set if they are willing to consume unscrubbed pages. Signed-off-by: Boris Ostrovsky--- Changes in v5: * Added comment explaining why we always grab order 0 pages in alloc_heap_pages) * Dropped the somewhat confusing comment about not needing to set first_dirty in alloc_heap_pages(). * Moved first bit of _MEMF_node by 8 to accommodate MEMF_no_scrub (bit 7 is no longer available) xen/common/page_alloc.c | 36 +++- xen/include/xen/mm.h| 4 +++- 2 files changed, 34 insertions(+), 6 deletions(-) diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c index 89fe3ce..9aac196 100644 --- a/xen/common/page_alloc.c +++ b/xen/common/page_alloc.c @@ -703,6 +703,7 @@ static struct page_info *get_free_buddy(unsigned int zone_lo, nodemask_t nodemask = d ? d->node_affinity : node_online_map; unsigned int j, zone, nodemask_retry = 0; struct page_info *pg; +bool use_unscrubbed = (memflags & MEMF_no_scrub); if ( node == NUMA_NO_NODE ) { @@ -734,8 +735,20 @@ static struct page_info *get_free_buddy(unsigned int zone_lo, /* Find smallest order which can satisfy the request. */ for ( j = order; j <= MAX_ORDER; j++ ) +{ if ( (pg = page_list_remove_head((node, zone, j))) ) -return pg; +{ +/* + * We grab single pages (order=0) even if they are + * unscrubbed. Given that scrubbing one page is fairly quick + * it is not worth breaking higher orders. 
+ */ +if ( (order == 0) || use_unscrubbed || + pg->u.free.first_dirty == INVALID_DIRTY_IDX) +return pg; +page_list_add_tail(pg, (node, zone, j)); +} +} } while ( zone-- > zone_lo ); /* careful: unsigned zone may wrap */ if ( (memflags & MEMF_exact_node) && req_node != NUMA_NO_NODE ) @@ -775,7 +788,7 @@ static struct page_info *alloc_heap_pages( unsigned int i, buddy_order, zone; unsigned long request = 1UL << order; struct page_info *pg, *first_dirty_pg = NULL; -bool_t need_tlbflush = 0; +bool need_scrub, need_tlbflush = false; uint32_t tlbflush_timestamp = 0; /* Make sure there are enough bits in memflags for nodeID. */ @@ -819,6 +832,10 @@ static struct page_info *alloc_heap_pages( } pg = get_free_buddy(zone_lo, zone_hi, order, memflags, d); +/* Try getting a dirty buddy if we couldn't get a clean one. */ +if ( !pg && !(memflags & MEMF_no_scrub) ) +pg = get_free_buddy(zone_lo, zone_hi, order, +memflags | MEMF_no_scrub, d); if ( !pg ) { /* No suitable memory blocks. Fail the request. */ @@ -862,10 +879,19 @@ static struct page_info *alloc_heap_pages( if ( d != NULL ) d->last_alloc_node = node; +need_scrub = !!first_dirty_pg && !(memflags & MEMF_no_scrub); for ( i = 0; i < (1 << order); i++ ) { /* Reference count must continuously be zero for free pages. 
*/ -BUG_ON(pg[i].count_info != PGC_state_free); +BUG_ON((pg[i].count_info & ~PGC_need_scrub) != PGC_state_free); + +if ( test_bit(_PGC_need_scrub, [i].count_info) ) +{ +if ( need_scrub ) +scrub_one_page([i]); +node_need_scrub[node]--; +} + pg[i].count_info = PGC_state_inuse; if ( !(memflags & MEMF_no_tlbflush) ) @@ -1749,7 +1775,7 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags) ASSERT(!in_irq()); pg = alloc_heap_pages(MEMZONE_XEN, MEMZONE_XEN, - order, memflags, NULL); + order, memflags | MEMF_no_scrub, NULL); if ( unlikely(pg == NULL) ) return NULL; @@ -1799,7 +1825,7 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags) if ( !(memflags >> _MEMF_bits) ) memflags |= MEMF_bits(xenheap_bits); -pg = alloc_domheap_pages(NULL, order, memflags); +pg = alloc_domheap_pages(NULL, order, memflags | MEMF_no_scrub); if ( unlikely(pg == NULL) ) return NULL; diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h index 3d3f31b..5f3d84a 100644 --- a/xen/include/xen/mm.h +++ b/xen/include/xen/mm.h @@ -238,7 +238,9 @@ struct npfec { #define MEMF_no_tlbflush
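So allocation becomes two passes: get_free_buddy() is first called without MEMF_no_scrub and returns only clean buddies (plus the order-0 exception), and only if that fails is it retried with MEMF_no_scrub, after which alloc_heap_pages() scrubs whatever dirty pages it took. A toy model of the two-pass fallback (the real code walks per node/zone/order buddy lists; this flat array is just for illustration):

```c
#include <assert.h>

#define NBUDDIES 4

struct buddy { int free; int dirty; };

static struct buddy heap[NBUDDIES] = {
    { 0, 0 }, { 1, 1 }, { 1, 1 }, { 0, 0 },   /* only dirty buddies free */
};

static int scrub_count;

/* One search pass; allow_dirty mirrors passing MEMF_no_scrub down. */
static int get_free_buddy(int allow_dirty)
{
    int i;
    for (i = 0; i < NBUDDIES; i++)
        if (heap[i].free && (allow_dirty || !heap[i].dirty))
            return i;
    return -1;
}

/* Clean pass first; if it fails, retry accepting dirty, then scrub. */
static int alloc_heap_pages(void)
{
    int i = get_free_buddy(0);
    if (i < 0)
        i = get_free_buddy(1);
    if (i < 0)
        return -1;                       /* no suitable memory blocks */
    heap[i].free = 0;
    if (heap[i].dirty) {                 /* scrub_one_page() analogue */
        heap[i].dirty = 0;
        scrub_count++;
    }
    return i;
}
```

Callers that set MEMF_no_scrub (e.g. xenheap allocations) would simply skip the clean-only pass, since they are willing to consume unscrubbed pages.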
[Xen-devel] [PATCH v5 7/8] mm: Print number of unscrubbed pages in 'H' debug handler
Signed-off-by: Boris OstrovskyReviewed-by: Wei Liu --- xen/common/page_alloc.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c index f0e5399..da5ffc2 100644 --- a/xen/common/page_alloc.c +++ b/xen/common/page_alloc.c @@ -2315,6 +2315,13 @@ static void dump_heap(unsigned char key) printk("heap[node=%d][zone=%d] -> %lu pages\n", i, j, avail[i][j]); } + +for ( i = 0; i < MAX_NUMNODES; i++ ) +{ +if ( !node_need_scrub[i] ) +continue; +printk("Node %d has %lu unscrubbed pages\n", i, node_need_scrub[i]); +} } static __init int register_heap_trigger(void) -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC v2] Proposal to allow setting up shared memory areas between VMs from xl config file
Hi, After talking to Stefano, I now know that there seems to be no hypercall for restricting the W/R/X permissions on the shared backing pages (XENMEM_access_op serves another purpose; sorry for getting its usage wrong), and the ability to specify these permissions is not strictly necessary anyway. Since the goal of this project is to set up VM-to-VM communication, in most cases users would simply expect the shared memory to be mapped read-write with the cacheability attributes of normal memory. So the tentative conclusion is to restrict the design to sharing read-write pages with normal caching attributes, and leave the rest on the to-be-done list. Cheers, Zhongze Liu ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v5 5/8] spinlock: Introduce spin_lock_cb()
While waiting for a lock we may want to periodically run some code. This code may, for example, allow the caller to release resources held by it that are no longer needed in the critical section protected by the lock. Specifically, this feature will be needed by scrubbing code where the scrubber, while waiting for the heap lock to merge back clean pages, may be requested by the page allocator (which is currently holding the lock) to abort merging and release the buddy page head that the allocator wants. We could use spin_trylock() but since it doesn't take a lock ticket it may take a long time until the lock is taken. Instead we add spin_lock_cb() that allows us to grab the ticket and execute a callback while waiting. This callback is executed on every iteration of the spinlock waiting loop. Since we may be sleeping in the lock until it is released we need a mechanism that will make sure that the callback has a chance to run. We add spin_lock_kick() that will wake up the waiter. Signed-off-by: Boris Ostrovsky --- Changes in v5: * Added a sentence in commit message to note that callback function is called on every iteration of the spin loop.
xen/common/spinlock.c | 9 - xen/include/xen/spinlock.h | 8 2 files changed, 16 insertions(+), 1 deletion(-) diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c index 2a06406..3c1caae 100644 --- a/xen/common/spinlock.c +++ b/xen/common/spinlock.c @@ -129,7 +129,7 @@ static always_inline u16 observe_head(spinlock_tickets_t *t) return read_atomic(>head); } -void _spin_lock(spinlock_t *lock) +void inline _spin_lock_cb(spinlock_t *lock, void (*cb)(void *), void *data) { spinlock_tickets_t tickets = SPINLOCK_TICKET_INC; LOCK_PROFILE_VAR; @@ -140,6 +140,8 @@ void _spin_lock(spinlock_t *lock) while ( tickets.tail != observe_head(>tickets) ) { LOCK_PROFILE_BLOCK; +if ( unlikely(cb) ) +cb(data); arch_lock_relax(); } LOCK_PROFILE_GOT; @@ -147,6 +149,11 @@ void _spin_lock(spinlock_t *lock) arch_lock_acquire_barrier(); } +void _spin_lock(spinlock_t *lock) +{ + _spin_lock_cb(lock, NULL, NULL); +} + void _spin_lock_irq(spinlock_t *lock) { ASSERT(local_irq_is_enabled()); diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h index c1883bd..91bfb95 100644 --- a/xen/include/xen/spinlock.h +++ b/xen/include/xen/spinlock.h @@ -153,6 +153,7 @@ typedef struct spinlock { #define spin_lock_init(l) (*(l) = (spinlock_t)SPIN_LOCK_UNLOCKED) void _spin_lock(spinlock_t *lock); +void _spin_lock_cb(spinlock_t *lock, void (*cond)(void *), void *data); void _spin_lock_irq(spinlock_t *lock); unsigned long _spin_lock_irqsave(spinlock_t *lock); @@ -169,6 +170,7 @@ void _spin_lock_recursive(spinlock_t *lock); void _spin_unlock_recursive(spinlock_t *lock); #define spin_lock(l) _spin_lock(l) +#define spin_lock_cb(l, c, d) _spin_lock_cb(l, c, d) #define spin_lock_irq(l) _spin_lock_irq(l) #define spin_lock_irqsave(l, f) \ ({ \ @@ -190,6 +192,12 @@ void _spin_unlock_recursive(spinlock_t *lock); 1 : ({ local_irq_restore(flags); 0; }); \ }) +#define spin_lock_kick(l) \ +({ \ +smp_mb(); \ +arch_lock_signal(); \ +}) + /* Ensure a lock is quiescent between two critical operations. 
*/ #define spin_barrier(l) _spin_barrier(l) -- 1.8.3.1
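As a rough illustration of the mechanism (not the actual Xen implementation, which uses spinlock_tickets_t plus arch_lock_relax()/arch_lock_signal()), a ticket lock whose wait loop invokes an optional callback on every iteration can be modelled with C11 atomics. The bump_head callback is purely a demo device so the contended case terminates in a single thread.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <assert.h>

typedef struct {
    atomic_uint head;   /* ticket currently being served */
    atomic_uint tail;   /* next ticket to hand out       */
} ticketlock_t;

static void ticket_lock_cb(ticketlock_t *l, void (*cb)(void *), void *data)
{
    unsigned int my = atomic_fetch_add(&l->tail, 1);  /* grab a ticket */

    while (atomic_load(&l->head) != my) {
        if (cb)
            cb(data);       /* runs once per spin iteration */
        /* arch_lock_relax() would go here */
    }
}

static void ticket_unlock(ticketlock_t *l)
{
    atomic_fetch_add(&l->head, 1);
}

/* Demo callback: advances head directly so a contended single-threaded
 * demo can make progress; a real callback would release resources the
 * lock holder is waiting for. */
static void bump_head(void *p)
{
    atomic_fetch_add((atomic_uint *)p, 1);
}
```

In the scrubbing use case the callback would be something like scrub_continue(), handing the buddy back to a waiting allocator, and spin_lock_kick() is what wakes a sleeping waiter so the callback gets a chance to run.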
[Xen-devel] [PATCH v5 6/8] mm: Keep heap accessible to others while scrubbing
Instead of scrubbing pages while holding heap lock we can mark buddy's head as being scrubbed and drop the lock temporarily. If someone (most likely alloc_heap_pages()) tries to access this chunk it will signal the scrubber to abort scrub by setting head's BUDDY_SCRUB_ABORT bit. The scrubber checks this bit after processing each page and stops its work as soon as it sees it. Signed-off-by: Boris Ostrovsky--- Changes in v5: * Fixed off-by-one error in setting first_dirty * Changed struct page_info.u.free to a union to permit use of ACCESS_ONCE in check_and_stop_scrub() * Renamed PAGE_SCRUBBING etc. macros to BUDDY_SCRUBBING etc xen/common/page_alloc.c | 105 +-- xen/include/asm-arm/mm.h | 28 - xen/include/asm-x86/mm.h | 29 - 3 files changed, 138 insertions(+), 24 deletions(-) diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c index 4e2775f..f0e5399 100644 --- a/xen/common/page_alloc.c +++ b/xen/common/page_alloc.c @@ -687,6 +687,7 @@ static void page_list_add_scrub(struct page_info *pg, unsigned int node, { PFN_ORDER(pg) = order; pg->u.free.first_dirty = first_dirty; +pg->u.free.scrub_state = BUDDY_NOT_SCRUBBING; if ( first_dirty != INVALID_DIRTY_IDX ) page_list_add_tail(pg, (node, zone, order)); @@ -694,6 +695,25 @@ static void page_list_add_scrub(struct page_info *pg, unsigned int node, page_list_add(pg, (node, zone, order)); } +static void check_and_stop_scrub(struct page_info *head) +{ +if ( head->u.free.scrub_state == BUDDY_SCRUBBING ) +{ +struct page_info pg; + +head->u.free.scrub_state = BUDDY_SCRUB_ABORT; +spin_lock_kick(); +for ( ; ; ) +{ +/* Can't ACCESS_ONCE() a bitfield. 
*/ +pg.u.free.val = ACCESS_ONCE(head->u.free.val); +if ( pg.u.free.scrub_state != BUDDY_SCRUB_ABORT ) +break; +cpu_relax(); +} +} +} + static struct page_info *get_free_buddy(unsigned int zone_lo, unsigned int zone_hi, unsigned int order, unsigned int memflags, @@ -738,14 +758,19 @@ static struct page_info *get_free_buddy(unsigned int zone_lo, { if ( (pg = page_list_remove_head((node, zone, j))) ) { +if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX ) +return pg; /* * We grab single pages (order=0) even if they are * unscrubbed. Given that scrubbing one page is fairly quick * it is not worth breaking higher orders. */ -if ( (order == 0) || use_unscrubbed || - pg->u.free.first_dirty == INVALID_DIRTY_IDX) +if ( (order == 0) || use_unscrubbed ) +{ +check_and_stop_scrub(pg); return pg; +} + page_list_add_tail(pg, (node, zone, j)); } } @@ -928,6 +953,7 @@ static int reserve_offlined_page(struct page_info *head) cur_head = head; +check_and_stop_scrub(head); /* * We may break the buddy so let's mark the head as clean. Then, when * merging chunks back into the heap, we will see whether the chunk has @@ -1084,6 +1110,29 @@ static unsigned int node_to_scrub(bool get_node) return closest; } +struct scrub_wait_state { +struct page_info *pg; +unsigned int first_dirty; +bool drop; +}; + +static void scrub_continue(void *data) +{ +struct scrub_wait_state *st = data; + +if ( st->drop ) +return; + +if ( st->pg->u.free.scrub_state == BUDDY_SCRUB_ABORT ) +{ +/* There is a waiter for this buddy. Release it. */ +st->drop = true; +st->pg->u.free.first_dirty = st->first_dirty; +smp_wmb(); +st->pg->u.free.scrub_state = BUDDY_NOT_SCRUBBING; +} +} + bool scrub_free_pages(void) { struct page_info *pg; @@ -1106,25 +1155,53 @@ bool scrub_free_pages(void) do { while ( !page_list_empty((node, zone, order)) ) { -unsigned int i; +unsigned int i, dirty_cnt; +struct scrub_wait_state st; /* Unscrubbed pages are always at the end of the list. 
*/ pg = page_list_last((node, zone, order)); if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX ) break; +ASSERT(!pg->u.free.scrub_state); +pg->u.free.scrub_state = BUDDY_SCRUBBING; + +spin_unlock(_lock); + +dirty_cnt = 0; + for ( i = pg->u.free.first_dirty; i < (1U << order); i++) {
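To make the handshake above easier to follow, here is a single-threaded model of the abort protocol: the scrubber advertises BUDDY_SCRUBBING on the buddy head, checks for BUDDY_SCRUB_ABORT after every page, and records how far it got in first_dirty. The abort_after driver stands in for the racing allocator (which in the real code also issues spin_lock_kick()); all names and the sequencing are illustrative only.

```c
#include <assert.h>

#define INVALID_DIRTY_IDX (~0u)

enum buddy_state { BUDDY_NOT_SCRUBBING, BUDDY_SCRUBBING, BUDDY_SCRUB_ABORT };

struct buddy {
    enum buddy_state scrub_state;
    unsigned int first_dirty, npages;
};

/* Allocator side: ask a busy scrubber to stop. */
static void request_abort(struct buddy *b)
{
    if (b->scrub_state == BUDDY_SCRUBBING)
        b->scrub_state = BUDDY_SCRUB_ABORT;
}

/* Scrubber side: returns the number of pages actually scrubbed. */
static unsigned int scrub_buddy(struct buddy *b, unsigned int abort_after)
{
    unsigned int i, next, done = 0;

    b->scrub_state = BUDDY_SCRUBBING;
    for (i = b->first_dirty; i < b->npages; i++) {
        /* scrub_one_page(&pg[i]) would happen here */
        done++;
        if (done == abort_after)
            request_abort(b);            /* simulates the racing allocator */
        if (b->scrub_state == BUDDY_SCRUB_ABORT)
            break;                       /* checked after every page */
    }
    next = (i < b->npages) ? i + 1 : i;
    b->first_dirty = (next >= b->npages) ? INVALID_DIRTY_IDX : next;
    b->scrub_state = BUDDY_NOT_SCRUBBING;
    return done;
}
```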
[Xen-devel] [PATCH v5 2/8] mm: Extract allocation loop from alloc_heap_pages()
This will make code a bit more readable, especially with changes that will be introduced in subsequent patches. Signed-off-by: Boris Ostrovsky--- Changes in v5: * Constified get_free_buddy()'s struct domain argument * Dropped request local variable in get_free_buddy(). Because of rebasing there were few more changes in this patch so I decided not to keep Jan's ACK. xen/common/page_alloc.c | 143 ++-- 1 file changed, 79 insertions(+), 64 deletions(-) diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c index 570d1f7..89fe3ce 100644 --- a/xen/common/page_alloc.c +++ b/xen/common/page_alloc.c @@ -694,22 +694,15 @@ static void page_list_add_scrub(struct page_info *pg, unsigned int node, page_list_add(pg, (node, zone, order)); } -/* Allocate 2^@order contiguous pages. */ -static struct page_info *alloc_heap_pages( -unsigned int zone_lo, unsigned int zone_hi, -unsigned int order, unsigned int memflags, -struct domain *d) +static struct page_info *get_free_buddy(unsigned int zone_lo, +unsigned int zone_hi, +unsigned int order, unsigned int memflags, +const struct domain *d) { -unsigned int i, j, zone = 0, nodemask_retry = 0; nodeid_t first_node, node = MEMF_get_node(memflags), req_node = node; -unsigned long request = 1UL << order; -struct page_info *pg, *first_dirty_pg = NULL; -nodemask_t nodemask = (d != NULL ) ? d->node_affinity : node_online_map; -bool_t need_tlbflush = 0; -uint32_t tlbflush_timestamp = 0; - -/* Make sure there are enough bits in memflags for nodeID. */ -BUILD_BUG_ON((_MEMF_bits - _MEMF_node) < (8 * sizeof(nodeid_t))); +nodemask_t nodemask = d ? 
d->node_affinity : node_online_map; +unsigned int j, zone, nodemask_retry = 0; +struct page_info *pg; if ( node == NUMA_NO_NODE ) { @@ -725,34 +718,6 @@ static struct page_info *alloc_heap_pages( first_node = node; ASSERT(node < MAX_NUMNODES); -ASSERT(zone_lo <= zone_hi); -ASSERT(zone_hi < NR_ZONES); - -if ( unlikely(order > MAX_ORDER) ) -return NULL; - -spin_lock(_lock); - -/* - * Claimed memory is considered unavailable unless the request - * is made by a domain with sufficient unclaimed pages. - */ -if ( (outstanding_claims + request > - total_avail_pages + tmem_freeable_pages()) && - ((memflags & MEMF_no_refcount) || - !d || d->outstanding_pages < request) ) -goto not_found; - -/* - * TMEM: When available memory is scarce due to tmem absorbing it, allow - * only mid-size allocations to avoid worst of fragmentation issues. - * Others try tmem pools then fail. This is a workaround until all - * post-dom0-creation-multi-page allocations can be eliminated. - */ -if ( ((order == 0) || (order >= 9)) && - (total_avail_pages <= midsize_alloc_zone_pages) && - tmem_freeable_pages() ) -goto try_tmem; /* * Start with requested node, but exhaust all node memory in requested @@ -764,17 +729,17 @@ static struct page_info *alloc_heap_pages( zone = zone_hi; do { /* Check if target node can support the allocation. */ -if ( !avail[node] || (avail[node][zone] < request) ) +if ( !avail[node] || (avail[node][zone] < (1UL << order)) ) continue; /* Find smallest order which can satisfy the request. */ for ( j = order; j <= MAX_ORDER; j++ ) if ( (pg = page_list_remove_head((node, zone, j))) ) -goto found; +return pg; } while ( zone-- > zone_lo ); /* careful: unsigned zone may wrap */ if ( (memflags & MEMF_exact_node) && req_node != NUMA_NO_NODE ) -goto not_found; +return NULL; /* Pick next node. */ if ( !node_isset(node, nodemask) ) @@ -791,39 +756,89 @@ static struct page_info *alloc_heap_pages( { /* When we have tried all in nodemask, we fall back to others. 
*/ if ( (memflags & MEMF_exact_node) || nodemask_retry++ ) -goto not_found; +return NULL; nodes_andnot(nodemask, node_online_map, nodemask); first_node = node = first_node(nodemask); if ( node >= MAX_NUMNODES ) -goto not_found; +return NULL; } } +} - try_tmem: -/* Try to free memory from tmem */ -if ( (pg = tmem_relinquish_pages(order, memflags)) != NULL ) +/* Allocate 2^@order contiguous pages. */ +static struct page_info *alloc_heap_pages( +unsigned int zone_lo, unsigned int zone_hi, +unsigned int order, unsigned int memflags, +struct domain *d) +{ +nodeid_t node; +
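The loop being extracted into get_free_buddy() can be reduced to the following sketch: scan zones from high to low and, within each, take the smallest free buddy of at least the requested order. Free lists are modelled as plain counters rather than Xen's page_list_head, and take_free_buddy is a made-up name.

```c
#include <assert.h>

#define MAX_ORDER 4
#define NR_ZONES  3

/* free_count[zone][order]: number of free buddies of that size */
static unsigned int free_count[NR_ZONES][MAX_ORDER + 1];

/* Returns the order of the buddy taken, or -1 if nothing fits. */
static int take_free_buddy(unsigned int zone_lo, unsigned int zone_hi,
                           unsigned int order)
{
    unsigned int zone, j;

    for (zone = zone_hi + 1; zone-- > zone_lo; ) {  /* careful: unsigned */
        /* Find smallest order which can satisfy the request. */
        for (j = order; j <= MAX_ORDER; j++) {
            if (free_count[zone][j]) {
                free_count[zone][j]--;   /* page_list_remove_head() */
                return (int)j;
            }
        }
    }
    return -1;
}
```

A caller receiving an order larger than it asked for would then halve the chunk repeatedly, which is exactly what the alloc_heap_pages() tail continues to do after the extraction.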
[Xen-devel] [PATCH v5 8/8] mm: Make sure pages are scrubbed
Add a debug Kconfig option that will make page allocator verify that pages that were supposed to be scrubbed are, in fact, clean. Signed-off-by: Boris Ostrovsky--- Changes in v5: * Defined SCRUB_PATTERN for NDEBUG * Style chages xen/Kconfig.debug | 7 ++ xen/common/page_alloc.c | 63 - 2 files changed, 69 insertions(+), 1 deletion(-) diff --git a/xen/Kconfig.debug b/xen/Kconfig.debug index 689f297..195d504 100644 --- a/xen/Kconfig.debug +++ b/xen/Kconfig.debug @@ -114,6 +114,13 @@ config DEVICE_TREE_DEBUG logged in the Xen ring buffer. If unsure, say N here. +config SCRUB_DEBUG + bool "Page scrubbing test" + default DEBUG + ---help--- + Verify that pages that need to be scrubbed before being allocated to + a guest are indeed scrubbed. + endif # DEBUG || EXPERT endmenu diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c index da5ffc2..5d50c2a 100644 --- a/xen/common/page_alloc.c +++ b/xen/common/page_alloc.c @@ -170,6 +170,10 @@ boolean_param("bootscrub", opt_bootscrub); static unsigned long __initdata opt_bootscrub_chunk = MB(128); size_param("bootscrub_chunk", opt_bootscrub_chunk); +#ifdef CONFIG_SCRUB_DEBUG +static bool __read_mostly boot_scrub_done; +#endif + /* * Bit width of the DMA heap -- used to override NUMA-node-first. * allocation strategy, which can otherwise exhaust low memory. @@ -695,6 +699,43 @@ static void page_list_add_scrub(struct page_info *pg, unsigned int node, page_list_add(pg, (node, zone, order)); } +/* SCRUB_PATTERN needs to be a repeating series of bytes. 
*/ +#ifndef NDEBUG +#define SCRUB_PATTERN0xc2c2c2c2c2c2c2c2ULL +#else +#define SCRUB_PATTERN0ULL +#endif +#define SCRUB_BYTE_PATTERN (SCRUB_PATTERN & 0xff) + +static void poison_one_page(struct page_info *pg) +{ +#ifdef CONFIG_SCRUB_DEBUG +mfn_t mfn = _mfn(page_to_mfn(pg)); +uint64_t *ptr; + +ptr = map_domain_page(mfn); +*ptr = ~SCRUB_PATTERN; +unmap_domain_page(ptr); +#endif +} + +static void check_one_page(struct page_info *pg) +{ +#ifdef CONFIG_SCRUB_DEBUG +mfn_t mfn = _mfn(page_to_mfn(pg)); +const uint64_t *ptr; +unsigned int i; + +if ( !boot_scrub_done ) +return; + +ptr = map_domain_page(mfn); +for ( i = 0; i < PAGE_SIZE / sizeof (*ptr); i++ ) +ASSERT(ptr[i] == SCRUB_PATTERN); +unmap_domain_page(ptr); +#endif +} + static void check_and_stop_scrub(struct page_info *head) { if ( head->u.free.scrub_state == BUDDY_SCRUBBING ) @@ -931,6 +972,9 @@ static struct page_info *alloc_heap_pages( * guest can control its own visibility of/through the cache. */ flush_page_to_ram(page_to_mfn([i]), !(memflags & MEMF_no_icache_flush)); + +if ( !(memflags & MEMF_no_scrub) ) +check_one_page([i]); } spin_unlock(_lock); @@ -1294,7 +1338,10 @@ static void free_heap_pages( set_gpfn_from_mfn(mfn + i, INVALID_M2P_ENTRY); if ( need_scrub ) +{ pg[i].count_info |= PGC_need_scrub; +poison_one_page([i]); +} } avail[node][zone] += 1 << order; @@ -1656,7 +1703,12 @@ static void init_heap_pages( nr_pages -= n; } +#ifndef CONFIG_SCRUB_DEBUG free_heap_pages(pg + i, 0, false); +#else +free_heap_pages(pg + i, 0, boot_scrub_done); +#endif + } } @@ -1922,6 +1974,10 @@ void __init scrub_heap_pages(void) printk("done.\n"); +#ifdef CONFIG_SCRUB_DEBUG +boot_scrub_done = true; +#endif + /* Now that the heap is initialized, run checks and set bounds * for the low mem virq algorithm. 
*/ setup_low_mem_virq(); @@ -2195,12 +2251,16 @@ void free_domheap_pages(struct page_info *pg, unsigned int order) spin_unlock_recursive(>page_alloc_lock); +#ifndef CONFIG_SCRUB_DEBUG /* * Normally we expect a domain to clear pages before freeing them, * if it cares about the secrecy of their contents. However, after * a domain has died we assume responsibility for erasure. */ scrub = !!d->is_dying; +#else +scrub = true; +#endif } else { @@ -2292,7 +2352,8 @@ void scrub_one_page(struct page_info *pg) #ifndef NDEBUG /* Avoid callers relying on allocations returning zeroed pages. */ -unmap_domain_page(memset(__map_domain_page(pg), 0xc2, PAGE_SIZE)); +unmap_domain_page(memset(__map_domain_page(pg), + SCRUB_BYTE_PATTERN, PAGE_SIZE)); #else /* For a production build, clear_page() is the fastest way to scrub. */ clear_domain_page(_mfn(page_to_mfn(pg))); -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
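The poison/check pair at the heart of CONFIG_SCRUB_DEBUG can be demonstrated standalone: freeing a dirty page writes the complement of SCRUB_PATTERN, scrubbing writes the pattern, and allocation verifies it. The local buffer stands in for a mapped page and PAGE_SIZE is shrunk for the demo; only the pattern logic mirrors the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE     64          /* tiny pages keep the demo cheap */
#define SCRUB_PATTERN 0xc2c2c2c2c2c2c2c2ULL

static void poison_one_page(uint64_t *page)
{
    page[0] = ~SCRUB_PATTERN;     /* one word is enough to catch a miss */
}

static void scrub_one_page(uint64_t *page)
{
    memset(page, 0xc2, PAGE_SIZE);   /* SCRUB_BYTE_PATTERN */
}

/* check_one_page(): 1 if the whole page carries the scrub pattern. */
static int check_one_page(const uint64_t *page)
{
    unsigned int i;

    for (i = 0; i < PAGE_SIZE / sizeof(*page); i++)
        if (page[i] != SCRUB_PATTERN)
            return 0;
    return 1;
}
```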
[Xen-devel] [PATCH v5 1/8] mm: Place unscrubbed pages at the end of pagelist
. so that it's easy to find pages that need to be scrubbed (those pages are now marked with _PGC_need_scrub bit). We keep track of the first unscrubbed page in a page buddy using first_dirty field. For now it can have two values, 0 (whole buddy needs scrubbing) or INVALID_DIRTY_IDX (the buddy does not need to be scrubbed). Subsequent patches will allow scrubbing to be interrupted, resulting in first_dirty taking any value. Signed-off-by: Boris Ostrovsky--- Changes in v5: * Be more careful when returning unused portion of a buddy to the heap in alloc_heap_pages() and don't set first_dirty if we know that the sub-buddy is clean * In reserve_offlined_page(), don't try to find dirty pages in sub-buddies if we can figure out that there is none. * Drop unnecessary setting of first_dirty in free_heap_pages() * Switch to using bitfields in page_info.u.free I kept node_need_scrub[] as a global array and not a "per-node". I think splitting it should be part of making heap_lock a per-node lock, together with increasing scrub concurrency by having more than one CPU scrub a node. xen/common/page_alloc.c | 190 +++ xen/include/asm-arm/mm.h | 18 - xen/include/asm-x86/mm.h | 17 - 3 files changed, 190 insertions(+), 35 deletions(-) diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c index 8bcef6a..570d1f7 100644 --- a/xen/common/page_alloc.c +++ b/xen/common/page_alloc.c @@ -383,6 +383,8 @@ typedef struct page_list_head heap_by_zone_and_order_t[NR_ZONES][MAX_ORDER+1]; static heap_by_zone_and_order_t *_heap[MAX_NUMNODES]; #define heap(node, zone, order) ((*_heap[node])[zone][order]) +static unsigned long node_need_scrub[MAX_NUMNODES]; + static unsigned long *avail[MAX_NUMNODES]; static long total_avail_pages; @@ -678,6 +680,20 @@ static void check_low_mem_virq(void) } } +/* Pages that need a scrub are added to tail, otherwise to head. 
*/ +static void page_list_add_scrub(struct page_info *pg, unsigned int node, +unsigned int zone, unsigned int order, +unsigned int first_dirty) +{ +PFN_ORDER(pg) = order; +pg->u.free.first_dirty = first_dirty; + +if ( first_dirty != INVALID_DIRTY_IDX ) +page_list_add_tail(pg, (node, zone, order)); +else +page_list_add(pg, (node, zone, order)); +} + /* Allocate 2^@order contiguous pages. */ static struct page_info *alloc_heap_pages( unsigned int zone_lo, unsigned int zone_hi, @@ -687,7 +703,7 @@ static struct page_info *alloc_heap_pages( unsigned int i, j, zone = 0, nodemask_retry = 0; nodeid_t first_node, node = MEMF_get_node(memflags), req_node = node; unsigned long request = 1UL << order; -struct page_info *pg; +struct page_info *pg, *first_dirty_pg = NULL; nodemask_t nodemask = (d != NULL ) ? d->node_affinity : node_online_map; bool_t need_tlbflush = 0; uint32_t tlbflush_timestamp = 0; @@ -798,11 +814,26 @@ static struct page_info *alloc_heap_pages( return NULL; found: + +if ( pg->u.free.first_dirty != INVALID_DIRTY_IDX ) +first_dirty_pg = pg + pg->u.free.first_dirty; + /* We may have to halve the chunk a number of times. */ while ( j != order ) { -PFN_ORDER(pg) = --j; -page_list_add_tail(pg, (node, zone, j)); +unsigned int first_dirty; + +if ( first_dirty_pg && ((pg + (1 << j)) > first_dirty_pg) ) +{ +if ( pg < first_dirty_pg ) +first_dirty = (first_dirty_pg - pg) / sizeof(*pg); +else +first_dirty = 0; +} +else +first_dirty = INVALID_DIRTY_IDX; + +page_list_add_scrub(pg, node, zone, --j, first_dirty); pg += 1 << j; } @@ -849,13 +880,22 @@ static int reserve_offlined_page(struct page_info *head) { unsigned int node = phys_to_nid(page_to_maddr(head)); int zone = page_to_zone(head), i, head_order = PFN_ORDER(head), count = 0; -struct page_info *cur_head; +struct page_info *cur_head, *first_dirty_pg = NULL; int cur_order; ASSERT(spin_is_locked(_lock)); cur_head = head; +/* + * We may break the buddy so let's mark the head as clean. 
Then, when + * merging chunks back into the heap, we will see whether the chunk has + * unscrubbed pages and set its first_dirty properly. + */ +if (head->u.free.first_dirty != INVALID_DIRTY_IDX) +first_dirty_pg = head + head->u.free.first_dirty; +head->u.free.first_dirty = INVALID_DIRTY_IDX; + page_list_del(head, (node, zone, head_order)); while ( cur_head < (head + (1 << head_order)) ) @@ -873,6 +913,8 @@ static int reserve_offlined_page(struct page_info *head) while ( cur_order < head_order ) { +unsigned int first_dirty = INVALID_DIRTY_IDX; + next_order = cur_order + 1;
Re: [Xen-devel] [PATCH] xen/disk: don't leak stack data via response ring
On Thu, 22 Jun 2017, Jan Beulich wrote: > >>> On 21.06.17 at 20:46,wrote: > > On Wed, 21 Jun 2017, Jan Beulich wrote: > >> >>> On 20.06.17 at 23:48, wrote: > >> > On Tue, 20 Jun 2017, Jan Beulich wrote: > >> >> @@ -36,13 +33,7 @@ struct blkif_x86_32_request_discard { > >> >> blkif_sector_t sector_number;/* start sector idx on disk (r/w > > only) */ > >> >> uint64_t nr_sectors; /* # of contiguous sectors to > >> >> discard > > */ > >> >> }; > >> >> -struct blkif_x86_32_response { > >> >> -uint64_tid; /* copied from request */ > >> >> -uint8_t operation; /* copied from request */ > >> >> -int16_t status; /* BLKIF_RSP_??? */ > >> >> -}; > >> >> typedef struct blkif_x86_32_request blkif_x86_32_request_t; > >> >> -typedef struct blkif_x86_32_response blkif_x86_32_response_t; > >> >> #pragma pack(pop) > >> >> > >> >> /* x86_64 protocol version */ > >> >> @@ -62,20 +53,14 @@ struct blkif_x86_64_request_discard { > >> >> blkif_sector_t sector_number;/* start sector idx on disk (r/w > > only) */ > >> >> uint64_t nr_sectors; /* # of contiguous sectors to > >> >> discard > > */ > >> >> }; > >> >> -struct blkif_x86_64_response { > >> >> -uint64_t __attribute__((__aligned__(8))) id; > >> >> -uint8_t operation; /* copied from request */ > >> >> -int16_t status; /* BLKIF_RSP_??? 
*/ > >> >> -}; > >> >> > >> >> typedef struct blkif_x86_64_request blkif_x86_64_request_t; > >> >> -typedef struct blkif_x86_64_response blkif_x86_64_response_t; > >> >> > >> >> DEFINE_RING_TYPES(blkif_common, struct blkif_common_request, > >> >> - struct blkif_common_response); > >> >> + struct blkif_response); > >> >> DEFINE_RING_TYPES(blkif_x86_32, struct blkif_x86_32_request, > >> >> - struct blkif_x86_32_response); > >> >> + struct blkif_response QEMU_PACKED); > >> > > >> > In my test, the previous sizes and alignments of the response structs > >> > were (on both x86_32 and x86_64): > >> > > >> > sizeof(blkif_x86_32_response)=12 sizeof(blkif_x86_64_response)=16 > >> > align(blkif_x86_32_response)=4 align(blkif_x86_64_response)=8 > >> > > >> > While with these changes are now, when compiled on x86_64: > >> > sizeof(blkif_x86_32_response)=11 sizeof(blkif_x86_64_response)=16 > >> > align(blkif_x86_32_response)=1 align(blkif_x86_64_response)=8 > >> > > >> > when compiled on x86_32: > >> > sizeof(blkif_x86_32_response)=11 sizeof(blkif_x86_64_response)=12 > >> > align(blkif_x86_32_response)=1 align(blkif_x86_64_response)=4 > >> > > >> > Did I do my tests wrong? > >> > > >> > QEMU_PACKED is not the same as #pragma pack(push, 4). In fact, it is the > >> > same as #pragma pack(push, 1), causing the struct to be densely packed, > >> > leaving no padding whatsever. > >> > > >> > In addition, without __attribute__((__aligned__(8))), > >> > blkif_x86_64_response won't be 8 bytes aligned when built on x86_32. > >> > > >> > Am I missing something? > >> > >> Well, you're mixing attribute application upon structure > >> declaration with attribute application upon structure use. It's > >> the latter here, and hence the attribute doesn't affect > >> structure layout at all. All it does is avoid the _containing_ > >> 32-bit union to become 8-byte aligned (and tail padding to be > >> inserted). > > > > Thanks for the explanation. 
I admit it's the first time I see the > > aligned attribute being used at structure usage only. I think it's the > > first time QEMU_PACKED is used this way in QEMU too. > > > > Anyway, even taking that into account, things are still not completely > > right: the alignment of struct blkif_x86_32_response QEMU_PACKED is 4 > > bytes as you wrote, but the size of struct blkif_x86_32_response is > > still 16 bytes instead of 12 bytes in my test. I suspect it worked for > > you because the other member of the union (blkif_x86_32_request) is > > larger than that. However, I think is not a good idea to rely on this > > implementation detail. The implementation of DEFINE_RING_TYPES should be > > opaque from our point of view. We shouldn't have to know that there is a > > union there. > > I don't follow - why should we not rely on this? It is a fundamental > aspect of the shared ring model that requests and responses share > space. > > > Moreover, the other problem is still unaddressed: the size and alignment > > of blkif_x86_64_response when built on x86_32 are 12 and 4 instead of 16 > > and 8 bytes. Is that working also because it's relying on the other > > member of the union to enforce the right alignment and bigger size? > > Yes. For these as well as your comments further up - sizeof() and > alignof() are completely uninteresting as long as we don't > instantiate objects of those types _and
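The numbers being compared in this thread are easy to reproduce. The structs below model blkif_x86_32_response under the three layouts discussed; note Jan's point that applying the attribute at the use site (as the patch does) only affects the containing union, whereas this demo applies packing at declaration to show the layout numbers Stefano measured. The expected sizes assume a typical 64-bit gcc/clang ABI, which is exactly the kind of dependence the thread worries about.

```c
#include <assert.h>
#include <stdint.h>

#pragma pack(push, 4)
struct resp_pack4 {              /* the original 32-bit protocol layout */
    uint64_t id;
    uint8_t  operation;
    int16_t  status;
};
#pragma pack(pop)

struct resp_packed {             /* __attribute__((packed)), a la QEMU_PACKED */
    uint64_t id;
    uint8_t  operation;
    int16_t  status;
} __attribute__((packed));

struct resp_natural {            /* natural 64-bit layout */
    uint64_t id;
    uint8_t  operation;
    int16_t  status;
};
```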
Re: [Xen-devel] [PATCH v4 07/18] xen/pvcalls: implement socket command
On Thu, 22 Jun 2017, Andrew Cooper wrote: > On 22/06/17 19:29, Stefano Stabellini wrote: > > On Thu, 22 Jun 2017, Roger Pau Monné wrote: > >> On Wed, Jun 21, 2017 at 01:16:56PM -0700, Stefano Stabellini wrote: > >>> On Tue, 20 Jun 2017, Roger Pau Monné wrote: > On Thu, Jun 15, 2017 at 12:09:36PM -0700, Stefano Stabellini wrote: > > Just reply with success to the other end for now. Delay the allocation > > of the actual socket to bind and/or connect. > > > > Signed-off-by: Stefano Stabellini> > CC: boris.ostrov...@oracle.com > > CC: jgr...@suse.com > > --- > > drivers/xen/pvcalls-back.c | 27 +++ > > 1 file changed, 27 insertions(+) > > > > diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c > > index 437c2ad..953458b 100644 > > --- a/drivers/xen/pvcalls-back.c > > +++ b/drivers/xen/pvcalls-back.c > > @@ -12,12 +12,17 @@ > > * GNU General Public License for more details. > > */ > > > > +#include > > #include > > #include > > #include > > #include > > #include > > #include > > +#include > > +#include > > +#include > > +#include > > > > #include > > #include > > @@ -54,6 +59,28 @@ struct pvcalls_fedata { > > static int pvcalls_back_socket(struct xenbus_device *dev, > > struct xen_pvcalls_request *req) > > { > > + struct pvcalls_fedata *fedata; > > + int ret; > > + struct xen_pvcalls_response *rsp; > > + > > + fedata = dev_get_drvdata(>dev); > > + > > + if (req->u.socket.domain != AF_INET || > > + req->u.socket.type != SOCK_STREAM || > > + (req->u.socket.protocol != IPPROTO_IP && > > +req->u.socket.protocol != AF_INET)) > > + ret = -EAFNOSUPPORT; > Sorry for jumping into this out of the blue, but shouldn't all the > constants used above be part of the protocol? AF_INET/SOCK_STREAM/... > are all part of POSIX, but their specific value is not defined in the > standard, hence we should have XEN_AF_INET/XEN_SOCK_STREAM/... Or am I > just missing something? 
> >>> The values of these constants for the pvcalls protocol are defined by > >>> docs/misc/pvcalls.markdown under "Socket families and address format". > >>> > >>> They happen to be the same as the ones defined by Linux as AF_INET, > >>> SOCK_STREAM, etc, so in Linux I am just using those, but that is just an > >>> implementation detail internal to the Linux kernel driver. What is > >>> important from the protocol ABI perspective are the values defined by > >>> docs/misc/pvcalls.markdown. > >> Oh I see. I still think this should be part of the public pvcalls.h > >> header, and that the error codes should be the ones defined in > >> public/errno.h (or else also added to the pvcalls header). > > This was done differently in the past, but now that we have a formal > > process, a person in charge of new PV drivers reviews, and design > > documents with clearly spelled out ABIs, I consider the design docs > > under docs/misc as the official specification. We don't need headers > > anymore, they are redundant. In fact, we cannot have two specifications, > > and the design docs are certainly the official ones (we don't want the > > specs to be written as header files in C). To me, the headers under > > xen/include/public/io/ are optional helpers. It doesn't matter what's in > > there, or if frontends and backends use them or not. > > > > There is really an argument for removing those headers, because they > > might get out of sync with the spec by mistake, and in those cases, then > > we really end up with two specifications for the same protocol. I would > > be in favor of `git rm'ing all files under xen/include/public/io/ for > > which we have a complete design doc under docs/misc. > > +1. > > Specifications should not be written in C. The mess that is the net and > block protocol ABIs are perfect examples of why. > > Its fine (and indeed recommended) to provide a header file which > describes the specified protocol, but the authoritative spec should be > in text from. 
> > I would really prefer if more people started using ../docs/specs/. The > migration v2 documents are currently lonely there... I didn't realize we had a docs/specs. Feel free to move pvcalls and 9pfs under there.
Re: [Xen-devel] [PATCH v4 07/18] xen/pvcalls: implement socket command
On 22/06/17 19:29, Stefano Stabellini wrote: > On Thu, 22 Jun 2017, Roger Pau Monné wrote: >> On Wed, Jun 21, 2017 at 01:16:56PM -0700, Stefano Stabellini wrote: >>> On Tue, 20 Jun 2017, Roger Pau Monné wrote: On Thu, Jun 15, 2017 at 12:09:36PM -0700, Stefano Stabellini wrote: > Just reply with success to the other end for now. Delay the allocation > of the actual socket to bind and/or connect. > > Signed-off-by: Stefano Stabellini> CC: boris.ostrov...@oracle.com > CC: jgr...@suse.com > --- > drivers/xen/pvcalls-back.c | 27 +++ > 1 file changed, 27 insertions(+) > > diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c > index 437c2ad..953458b 100644 > --- a/drivers/xen/pvcalls-back.c > +++ b/drivers/xen/pvcalls-back.c > @@ -12,12 +12,17 @@ > * GNU General Public License for more details. > */ > > +#include > #include > #include > #include > #include > #include > #include > +#include > +#include > +#include > +#include > > #include > #include > @@ -54,6 +59,28 @@ struct pvcalls_fedata { > static int pvcalls_back_socket(struct xenbus_device *dev, > struct xen_pvcalls_request *req) > { > + struct pvcalls_fedata *fedata; > + int ret; > + struct xen_pvcalls_response *rsp; > + > + fedata = dev_get_drvdata(>dev); > + > + if (req->u.socket.domain != AF_INET || > + req->u.socket.type != SOCK_STREAM || > + (req->u.socket.protocol != IPPROTO_IP && > + req->u.socket.protocol != AF_INET)) > + ret = -EAFNOSUPPORT; Sorry for jumping into this out of the blue, but shouldn't all the constants used above be part of the protocol? AF_INET/SOCK_STREAM/... are all part of POSIX, but their specific value is not defined in the standard, hence we should have XEN_AF_INET/XEN_SOCK_STREAM/... Or am I just missing something? >>> The values of these constants for the pvcalls protocol are defined by >>> docs/misc/pvcalls.markdown under "Socket families and address format". 
>>> >>> They happen to be the same as the ones defined by Linux as AF_INET, >>> SOCK_STREAM, etc, so in Linux I am just using those, but that is just an >>> implementation detail internal to the Linux kernel driver. What is >>> important from the protocol ABI perspective are the values defined by >>> docs/misc/pvcalls.markdown. >> Oh I see. I still think this should be part of the public pvcalls.h >> header, and that the error codes should be the ones defined in >> public/errno.h (or else also added to the pvcalls header). > This was done differently in the past, but now that we have a formal > process, a person in charge of new PV drivers reviews, and design > documents with clearly spelled out ABIs, I consider the design docs > under docs/misc as the official specification. We don't need headers > anymore, they are redundant. In fact, we cannot have two specifications, > and the design docs are certainly the official ones (we don't want the > specs to be written as header files in C). To me, the headers under > xen/include/public/io/ are optional helpers. It doesn't matter what's in > there, or if frontends and backends use them or not. > > There is really an argument for removing those headers, because they > might get out of sync with the spec by mistake, and in those cases, then > we really end up with two specifications for the same protocol. I would > be in favor of `git rm'ing all files under xen/include/public/io/ for > which we have a complete design doc under docs/misc. +1. Specifications should not be written in C. The mess that is the net and block protocol ABIs are perfect examples of why. Its fine (and indeed recommended) to provide a header file which describes the specified protocol, but the authoritative spec should be in text from. I would really prefer if more people started using ../docs/specs/. The migration v2 documents are currently lonely there... ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 01/16] xen/mm: Don't use _{g,m}fn for defining INVALID_{G,M}FN
Hi, On 20/06/17 11:32, Jan Beulich wrote: On 20.06.17 at 12:06,wrote: At 03:36 -0600 on 20 Jun (1497929778), Jan Beulich wrote: On 20.06.17 at 11:14, wrote: At 01:32 -0600 on 20 Jun (1497922345), Jan Beulich wrote: On 19.06.17 at 18:57, wrote: --- a/xen/include/xen/mm.h +++ b/xen/include/xen/mm.h @@ -56,7 +56,7 @@ TYPE_SAFE(unsigned long, mfn); #define PRI_mfn "05lx" -#define INVALID_MFN _mfn(~0UL) +#define INVALID_MFN (mfn_t){ ~0UL } While I don't expect anyone to wish to use a suffix expression on this constant, for maximum compatibility this should still be fully parenthesized, I think. Of course this should be easy enough to do while committing. Are you able to assure us that clang supports this gcc extension (compound literal for non-compound types) AIUI this is a C99 feature, not a GCCism. Most parts of it yes (it is a gcc extension in C89 mode only), but the specific use here isn't afaict: Compound literals outside of functions are static objects, and hence couldn't be used as initializers of other objects. Ah, I see. So would it be better to use #define INVALID_MFN ((const mfn_t) { ~0UL }) ? While I think we should indeed consider adding the const, the above still is a static object, and hence still not suitable as an initializer as per C99 or C11. But as long as gcc and clang permit it, we're fine. Actually this solutions breaks on GCC 4.9 provided by Linaro ([1] 4.9-2016-02 and 4.9-2017.01). This small reproducer does not compile with -std=gnu99 (used by Xen) but compile with this option. Jan, have you tried 4.9 with this patch? typedef struct { unsigned long i; } mfn_t; mfn_t v = (const mfn_t){~0UL}; Cheers, [1] https://releases.linaro.org/components/toolchain/binaries/ Jan -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4 07/18] xen/pvcalls: implement socket command
On Thu, 22 Jun 2017, Roger Pau Monné wrote: > On Wed, Jun 21, 2017 at 01:16:56PM -0700, Stefano Stabellini wrote: > > On Tue, 20 Jun 2017, Roger Pau Monné wrote: > > > On Thu, Jun 15, 2017 at 12:09:36PM -0700, Stefano Stabellini wrote: > > > > Just reply with success to the other end for now. Delay the allocation > > > > of the actual socket to bind and/or connect. > > > > > > > > Signed-off-by: Stefano Stabellini> > > > CC: boris.ostrov...@oracle.com > > > > CC: jgr...@suse.com > > > > --- > > > > drivers/xen/pvcalls-back.c | 27 +++ > > > > 1 file changed, 27 insertions(+) > > > > > > > > diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c > > > > index 437c2ad..953458b 100644 > > > > --- a/drivers/xen/pvcalls-back.c > > > > +++ b/drivers/xen/pvcalls-back.c > > > > @@ -12,12 +12,17 @@ > > > > * GNU General Public License for more details. > > > > */ > > > > > > > > +#include > > > > #include > > > > #include > > > > #include > > > > #include > > > > #include > > > > #include > > > > +#include > > > > +#include > > > > +#include > > > > +#include > > > > > > > > #include > > > > #include > > > > @@ -54,6 +59,28 @@ struct pvcalls_fedata { > > > > static int pvcalls_back_socket(struct xenbus_device *dev, > > > > struct xen_pvcalls_request *req) > > > > { > > > > + struct pvcalls_fedata *fedata; > > > > + int ret; > > > > + struct xen_pvcalls_response *rsp; > > > > + > > > > + fedata = dev_get_drvdata(>dev); > > > > + > > > > + if (req->u.socket.domain != AF_INET || > > > > + req->u.socket.type != SOCK_STREAM || > > > > + (req->u.socket.protocol != IPPROTO_IP && > > > > +req->u.socket.protocol != AF_INET)) > > > > + ret = -EAFNOSUPPORT; > > > > > > Sorry for jumping into this out of the blue, but shouldn't all the > > > constants used above be part of the protocol? AF_INET/SOCK_STREAM/... > > > are all part of POSIX, but their specific value is not defined in the > > > standard, hence we should have XEN_AF_INET/XEN_SOCK_STREAM/... 
Or am I > > > just missing something? > > > > The values of these constants for the pvcalls protocol are defined by > > docs/misc/pvcalls.markdown under "Socket families and address format". > > > > They happen to be the same as the ones defined by Linux as AF_INET, > > SOCK_STREAM, etc, so in Linux I am just using those, but that is just an > > implementation detail internal to the Linux kernel driver. What is > > important from the protocol ABI perspective are the values defined by > > docs/misc/pvcalls.markdown. > > Oh I see. I still think this should be part of the public pvcalls.h > header, and that the error codes should be the ones defined in > public/errno.h (or else also added to the pvcalls header). This was done differently in the past, but now that we have a formal process, a person in charge of new PV drivers reviews, and design documents with clearly spelled out ABIs, I consider the design docs under docs/misc as the official specification. We don't need headers anymore, they are redundant. In fact, we cannot have two specifications, and the design docs are certainly the official ones (we don't want the specs to be written as header files in C). To me, the headers under xen/include/public/io/ are optional helpers. It doesn't matter what's in there, or if frontends and backends use them or not. There is really an argument for removing those headers, because they might get out of sync with the spec by mistake, and in those cases, then we really end up with two specifications for the same protocol. I would be in favor of `git rm'ing all files under xen/include/public/io/ for which we have a complete design doc under docs/misc.___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH for-4.9 v3 1/3] xen/livepatch: Clean up arch relocation handling
* Reduce symbol scope and initalisation as much as possible * Annotate a fallthrough case in arm64 * Fix switch statement style in arm32 No functional change. Signed-off-by: Andrew CooperReviewed-by: Jan Beulich Reviewed-by: Konrad Rzeszutek Wilk Tested-by: Konrad Rzeszutek Wilk --- xen/arch/arm/arm32/livepatch.c | 27 --- xen/arch/arm/arm64/livepatch.c | 19 +++ xen/arch/x86/livepatch.c | 13 + 3 files changed, 24 insertions(+), 35 deletions(-) diff --git a/xen/arch/arm/arm32/livepatch.c b/xen/arch/arm/arm32/livepatch.c index a7fd5e2..a328179 100644 --- a/xen/arch/arm/arm32/livepatch.c +++ b/xen/arch/arm/arm32/livepatch.c @@ -224,21 +224,21 @@ int arch_livepatch_perform(struct livepatch_elf *elf, const struct livepatch_elf_sec *rela, bool use_rela) { -const Elf_RelA *r_a; -const Elf_Rel *r; -unsigned int symndx, i; -uint32_t val; -void *dest; +unsigned int i; int rc = 0; for ( i = 0; i < (rela->sec->sh_size / rela->sec->sh_entsize); i++ ) { +unsigned int symndx; +uint32_t val; +void *dest; unsigned char type; -s32 addend = 0; +s32 addend; if ( use_rela ) { -r_a = rela->data + i * rela->sec->sh_entsize; +const Elf_RelA *r_a = rela->data + i * rela->sec->sh_entsize; + symndx = ELF32_R_SYM(r_a->r_info); type = ELF32_R_TYPE(r_a->r_info); dest = base->load_addr + r_a->r_offset; /* P */ @@ -246,10 +246,12 @@ int arch_livepatch_perform(struct livepatch_elf *elf, } else { -r = rela->data + i * rela->sec->sh_entsize; +const Elf_Rel *r = rela->data + i * rela->sec->sh_entsize; + symndx = ELF32_R_SYM(r->r_info); type = ELF32_R_TYPE(r->r_info); dest = base->load_addr + r->r_offset; /* P */ +addend = get_addend(type, dest); } if ( symndx > elf->nsym ) @@ -259,13 +261,11 @@ int arch_livepatch_perform(struct livepatch_elf *elf, return -EINVAL; } -if ( !use_rela ) -addend = get_addend(type, dest); - val = elf->sym[symndx].sym->st_value; /* S */ rc = perform_rel(type, dest, val, addend); -switch ( rc ) { +switch ( rc ) +{ case -EOVERFLOW: dprintk(XENLOG_ERR, LIVEPATCH "%s: Overflow 
in relocation %u in %s for %s!\n", elf->name, i, rela->name, base->name); @@ -275,9 +275,6 @@ int arch_livepatch_perform(struct livepatch_elf *elf, dprintk(XENLOG_ERR, LIVEPATCH "%s: Unhandled relocation #%x\n", elf->name, type); break; - -default: -break; } if ( rc ) diff --git a/xen/arch/arm/arm64/livepatch.c b/xen/arch/arm/arm64/livepatch.c index dae64f5..63929b1 100644 --- a/xen/arch/arm/arm64/livepatch.c +++ b/xen/arch/arm/arm64/livepatch.c @@ -241,19 +241,16 @@ int arch_livepatch_perform_rela(struct livepatch_elf *elf, const struct livepatch_elf_sec *base, const struct livepatch_elf_sec *rela) { -const Elf_RelA *r; -unsigned int symndx, i; -uint64_t val; -void *dest; -bool_t overflow_check; +unsigned int i; for ( i = 0; i < (rela->sec->sh_size / rela->sec->sh_entsize); i++ ) { +const Elf_RelA *r = rela->data + i * rela->sec->sh_entsize; +unsigned int symndx = ELF64_R_SYM(r->r_info); +void *dest = base->load_addr + r->r_offset; /* P */ +bool overflow_check = true; int ovf = 0; - -r = rela->data + i * rela->sec->sh_entsize; - -symndx = ELF64_R_SYM(r->r_info); +uint64_t val; if ( symndx > elf->nsym ) { @@ -262,11 +259,8 @@ int arch_livepatch_perform_rela(struct livepatch_elf *elf, return -EINVAL; } -dest = base->load_addr + r->r_offset; /* P */ val = elf->sym[symndx].sym->st_value + r->r_addend; /* S+A */ -overflow_check = true; - /* ARM64 operations at minimum are always 32-bit. */ if ( r->r_offset >= base->sec->sh_size || (r->r_offset + sizeof(uint32_t)) > base->sec->sh_size ) @@ -403,6 +397,7 @@ int arch_livepatch_perform_rela(struct livepatch_elf *elf, case R_AARCH64_ADR_PREL_PG_HI21_NC: overflow_check = false; +/* Fallthrough. */ case R_AARCH64_ADR_PREL_PG_HI21: ovf = reloc_insn_imm(RELOC_OP_PAGE, dest, val, 12, 21, AARCH64_INSN_IMM_ADR); diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c index dd50dd1..7917610 100644 ---
[Xen-devel] [PATCH for-4.9 v3 0/3] Fixes for livepatching
Andrew Cooper (3):
  xen/livepatch: Clean up arch relocation handling
  xen/livepatch: Use zeroed memory allocations for arrays
  xen/livepatch: Don't crash on encountering STN_UNDEF relocations

 xen/arch/arm/arm32/livepatch.c | 41 +
 xen/arch/arm/arm64/livepatch.c | 33 -
 xen/arch/x86/livepatch.c       | 27 ++-
 xen/common/livepatch.c         |  4 ++--
 xen/common/livepatch_elf.c     |  4 ++--
 5 files changed, 67 insertions(+), 42 deletions(-)

-- 
2.1.4
[Xen-devel] [PATCH for-4.9 v3 2/3] xen/livepatch: Use zeroed memory allocations for arrays
Each of these arrays is sparse. Use zeroed allocations to cause uninitialised array elements to contain deterministic values, most importantly for the embedded pointers. Signed-off-by: Andrew Cooper--- CC: Konrad Rzeszutek Wilk CC: Ross Lagerwall * new in v3 --- xen/common/livepatch.c | 4 ++-- xen/common/livepatch_elf.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c index df67a1a..66d532d 100644 --- a/xen/common/livepatch.c +++ b/xen/common/livepatch.c @@ -771,8 +771,8 @@ static int build_symbol_table(struct payload *payload, } } -symtab = xmalloc_array(struct livepatch_symbol, nsyms); -strtab = xmalloc_array(char, strtab_len); +symtab = xzalloc_array(struct livepatch_symbol, nsyms); +strtab = xzalloc_array(char, strtab_len); if ( !strtab || !symtab ) { diff --git a/xen/common/livepatch_elf.c b/xen/common/livepatch_elf.c index c4a9633..b69e271 100644 --- a/xen/common/livepatch_elf.c +++ b/xen/common/livepatch_elf.c @@ -52,7 +52,7 @@ static int elf_resolve_sections(struct livepatch_elf *elf, const void *data) int rc; /* livepatch_elf_load sanity checked e_shnum. */ -sec = xmalloc_array(struct livepatch_elf_sec, elf->hdr->e_shnum); +sec = xzalloc_array(struct livepatch_elf_sec, elf->hdr->e_shnum); if ( !sec ) { dprintk(XENLOG_ERR, LIVEPATCH"%s: Could not allocate memory for section table!\n", @@ -225,7 +225,7 @@ static int elf_get_sym(struct livepatch_elf *elf, const void *data) /* No need to check values as elf_resolve_sections did it. */ nsym = symtab_sec->sec->sh_size / symtab_sec->sec->sh_entsize; -sym = xmalloc_array(struct livepatch_elf_sym, nsym); +sym = xzalloc_array(struct livepatch_elf_sym, nsym); if ( !sym ) { dprintk(XENLOG_ERR, LIVEPATCH "%s: Could not allocate memory for symbols\n", -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH for-4.9 v3 3/3] xen/livepatch: Don't crash on encountering STN_UNDEF relocations
A symndx of STN_UNDEF is special, and means a symbol value of 0. While legitimate in the ELF standard, its existance in a livepatch is questionable at best. Until a plausible usecase presents itself, reject such a relocation with -EOPNOTSUPP. Additionally, fix an off-by-one error while range checking symndx, and perform a safety check on elf->sym[symndx].sym before derefencing it, to avoid tripping over a NULL pointer when calculating val. Signed-off-by: Andrew Cooper--- CC: Konrad Rzeszutek Wilk CC: Ross Lagerwall CC: Jan Beulich CC: Stefano Stabellini CC: Julien Grall v3: * Fix off-by-one error v2: * Reject STN_UNDEF with -EOPNOTSUPP --- xen/arch/arm/arm32/livepatch.c | 14 +- xen/arch/arm/arm64/livepatch.c | 14 +- xen/arch/x86/livepatch.c | 14 +- 3 files changed, 39 insertions(+), 3 deletions(-) diff --git a/xen/arch/arm/arm32/livepatch.c b/xen/arch/arm/arm32/livepatch.c index a328179..41378a5 100644 --- a/xen/arch/arm/arm32/livepatch.c +++ b/xen/arch/arm/arm32/livepatch.c @@ -254,12 +254,24 @@ int arch_livepatch_perform(struct livepatch_elf *elf, addend = get_addend(type, dest); } -if ( symndx > elf->nsym ) +if ( symndx == STN_UNDEF ) +{ +dprintk(XENLOG_ERR, LIVEPATCH "%s: Encountered STN_UNDEF\n", +elf->name); +return -EOPNOTSUPP; +} +else if ( symndx >= elf->nsym ) { dprintk(XENLOG_ERR, LIVEPATCH "%s: Relative symbol wants symbol@%u which is past end!\n", elf->name, symndx); return -EINVAL; } +else if ( !elf->sym[symndx].sym ) +{ +dprintk(XENLOG_ERR, LIVEPATCH "%s: No relative symbol@%u\n", +elf->name, symndx); +return -EINVAL; +} val = elf->sym[symndx].sym->st_value; /* S */ diff --git a/xen/arch/arm/arm64/livepatch.c b/xen/arch/arm/arm64/livepatch.c index 63929b1..2247b92 100644 --- a/xen/arch/arm/arm64/livepatch.c +++ b/xen/arch/arm/arm64/livepatch.c @@ -252,12 +252,24 @@ int arch_livepatch_perform_rela(struct livepatch_elf *elf, int ovf = 0; uint64_t val; -if ( symndx > elf->nsym ) +if ( symndx == STN_UNDEF ) +{ +dprintk(XENLOG_ERR, LIVEPATCH "%s: 
Encountered STN_UNDEF\n", +elf->name); +return -EOPNOTSUPP; +} +else if ( symndx >= elf->nsym ) { dprintk(XENLOG_ERR, LIVEPATCH "%s: Relative relocation wants symbol@%u which is past end!\n", elf->name, symndx); return -EINVAL; } +else if ( !elf->sym[symndx].sym ) +{ +dprintk(XENLOG_ERR, LIVEPATCH "%s: No relative symbol@%u\n", +elf->name, symndx); +return -EINVAL; +} val = elf->sym[symndx].sym->st_value + r->r_addend; /* S+A */ diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c index 7917610..406eb91 100644 --- a/xen/arch/x86/livepatch.c +++ b/xen/arch/x86/livepatch.c @@ -170,12 +170,24 @@ int arch_livepatch_perform_rela(struct livepatch_elf *elf, uint8_t *dest = base->load_addr + r->r_offset; uint64_t val; -if ( symndx > elf->nsym ) +if ( symndx == STN_UNDEF ) +{ +dprintk(XENLOG_ERR, LIVEPATCH "%s: Encountered STN_UNDEF\n", +elf->name); +return -EOPNOTSUPP; +} +else if ( symndx >= elf->nsym ) { dprintk(XENLOG_ERR, LIVEPATCH "%s: Relative relocation wants symbol@%u which is past end!\n", elf->name, symndx); return -EINVAL; } +else if ( !elf->sym[symndx].sym ) +{ +dprintk(XENLOG_ERR, LIVEPATCH "%s: No symbol@%u\n", +elf->name, symndx); +return -EINVAL; +} val = r->r_addend + elf->sym[symndx].sym->st_value; -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] VT-d: fix VF of RC integrated endpoint matched to wrong VT-d unit
On 2017-06-22 11:52:50 -0400, Konrad Rzeszutek Wilk wrote: > On Thu, Jun 22, 2017 at 09:31:50AM -0600, Jan Beulich wrote: > > >>> On 22.06.17 at 16:21,wrote: > > > On Thu, Jun 22, 2017 at 03:26:04AM -0600, Jan Beulich wrote: > > > On 21.06.17 at 12:47, wrote: > > >>> The problem is a VF of RC integrated PF (e.g. PF's BDF is 00:02.0), > > >>> we would wrongly use 00:00.0 to search VT-d unit. > > >>> > > >>> To search VT-d unit for a VF, the BDF of the PF is used. And If the > > >>> PF is an Extended Function, the BDF of one traditional function is > > >>> used. The following line (from acpi_find_matched_drhd_unit()): > > >>> devfn = PCI_SLOT(pdev->info.physfn.devfn) ? 0 : > > >>> pdev->info.physfn.devfn; > > >>> sets 'devfn' to 0 if PF's devfn > 7. Apparently, it treats all > > >>> PFs which has devfn > 7 as extended function. However, it is wrong for > > >>> a RC integrated PF, which is not ARI-capable but may have devfn > 7. > > >> > > >>I'm again having trouble with you talking about ARI and RC > > >>integrated here, but not checking for either in any way in the > > >>new code. Please make sure you establish the full connection > > >>in the description. > > > > > > Sorry for this. Let me explain this again. > > > > > > From SRIOV spec 3.7.3, it says: > > > "ARI is not applicable to Root Complex Integrated Endpoints; all other > > > SR-IOV Capable Devices (Devices that include at least one PF) shall > > > implement the ARI Capability in each Function." > > > > > > So I _think_ PFs can be classified to two kinds: one is RC integrated > > > PF and the other is non-RC integrated PF. The former can't support ARI. > > > The latter shall support ARI. Only for extended functions, one > > > traditional function's BDF should be used to search VT-d unit. And > > > according to PCIE spec, Extended function means within an ARI Device, a > > > Function whose Function Number is greater than 7. So the former > > > can't be an extended function. 
The latter is an extended function as > > > long as PF's devfn > 7, this check is exactly what the original code > > > did. So I think the original code didn't aware the former > > > (aka, RC integrated endpoints.). This patch checks the is_extfn > > > directly. All of this is only my understanding. I need you and Kevin's > > > help to decide it's right or not. > > > > This makes sense to me, but as said, the patch description will need > > to include this in some form. > > > > >>> --- a/xen/drivers/passthrough/vtd/dmar.c > > >>> +++ b/xen/drivers/passthrough/vtd/dmar.c > > >>> @@ -218,8 +218,18 @@ struct acpi_drhd_unit > > >>> *acpi_find_matched_drhd_unit(const > > >>> struct pci_dev *pdev) > > >>> } > > >>> else if ( pdev->info.is_virtfn ) > > >>> { > > >>> +struct pci_dev *physfn; > > >> > > >>const > > >> > > >>> bus = pdev->info.physfn.bus; > > >>> -devfn = PCI_SLOT(pdev->info.physfn.devfn) ? 0 : > > >>> pdev->info.physfn.devfn; > > >>> +/* > > >>> + * Use 0 as 'devfn' to search VT-d unit when the physical > > >>> function > > >>> + * is an Extended Function. > > >>> + */ > > >>> +pcidevs_lock(); > > >>> +physfn = pci_get_pdev(pdev->seg, bus, pdev->info.physfn.devfn); > > >>> +pcidevs_unlock(); > > >>> +ASSERT(physfn); > > >>> +devfn = physfn->info.is_extfn ? 0 : pdev->info.physfn.devfn; > > >> > > >>This change looks to be fine is we assume that is_extfn is always > > >>set correctly. Looking at the Linux code setting it, I'm not sure > > >>though: I can't see any connection to the PF needing to be RC > > >>integrated there. > > > > > > Linux code sets it when > > > pci_ari_enabled(pci_dev->bus) && PCI_SLOT(pci_dev->devfn) > > > > > > I _think_ pci_ari_enabled(pci_dev->bus) means ARIforwarding is enabled > > > in the immediatedly upstream Downstream port. 
Thus, I think the pci_dev > > > is an ARI-capable device for PCIe spec 6.13 says: > > > > > > It is strongly recommended that software in general Set the ARI > > > Forwarding Enable bit in a 5 Downstream Port only if software is certain > > > that the device immediately below the Downstream Port is an ARI Device. > > > If the bit is Set when a non-ARI Device is present, the non-ARI Device > > > can respond to Configuration Space accesses under what it interprets as > > > being different Device Numbers, and its Functions can be aliased under > > > multiple Device Numbers, generally leading to undesired behavior. > > > > > > and the pci_dev can't be a RC integrated endpoints. From another side, it > > > also means the is_extfn won't be set for RC integrated PF. Is that > > > right? > > > > Well, I'm not sure about the Linux parts here? Konrad, do you > > happen to know? Or do you know someone who does? pci_ari_enabled() and related code trusts that an RC integrated endpoint does not present the PCI_EXT_CAP_ID_ARI capability. As long as we do
[Xen-devel] [xen-unstable-smoke test] 110976: tolerable trouble: broken/pass - PUSHED
flight 110976 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/110976/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-arm64-arm64-xl-xsm 1 build-check(1) blocked n/a test-amd64-amd64-libvirt 12 migrate-support-checkfail never pass test-armhf-armhf-xl 12 migrate-support-checkfail never pass test-armhf-armhf-xl 13 saverestore-support-checkfail never pass version targeted for testing: xen 579d698da608a24ab334a6a38d932176bac5cecd baseline version: xen 4514f788d1024ab727ed5d6cc29aed9e8f24 Last test of basis 110964 2017-06-22 08:02:12 Z0 days Testing same since 110976 2017-06-22 16:01:47 Z0 days1 attempts People who touched revisions under test: Bernhard M. WiedemannBernhard M. Wiedemann Ian Jackson Wei Liu jobs: build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-arm64-arm64-xl-xsm broken test-amd64-amd64-xl-qemuu-debianhvm-i386 pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : + branch=xen-unstable-smoke + revision=579d698da608a24ab334a6a38d932176bac5cecd + . ./cri-lock-repos ++ . ./cri-common +++ . ./cri-getconfig +++ umask 002 +++ getrepos getconfig Repos perl -e ' use Osstest; readglobalconfig(); print $c{"Repos"} or die $!; ' +++ local repos=/home/osstest/repos +++ '[' -z /home/osstest/repos ']' +++ '[' '!' 
-d /home/osstest/repos ']' +++ echo /home/osstest/repos ++ repos=/home/osstest/repos ++ repos_lock=/home/osstest/repos/lock ++ '[' x '!=' x/home/osstest/repos/lock ']' ++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock ++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push xen-unstable-smoke 579d698da608a24ab334a6a38d932176bac5cecd + branch=xen-unstable-smoke + revision=579d698da608a24ab334a6a38d932176bac5cecd + . ./cri-lock-repos ++ . ./cri-common +++ . ./cri-getconfig +++ umask 002 +++ getrepos getconfig Repos perl -e ' use Osstest; readglobalconfig(); print $c{"Repos"} or die $!; ' +++ local repos=/home/osstest/repos +++ '[' -z /home/osstest/repos ']' +++ '[' '!' -d /home/osstest/repos ']' +++ echo /home/osstest/repos ++ repos=/home/osstest/repos ++ repos_lock=/home/osstest/repos/lock ++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']' + . ./cri-common ++ . ./cri-getconfig ++ umask 002 + select_xenbranch + case "$branch" in + tree=xen + xenbranch=xen-unstable-smoke + qemuubranch=qemu-upstream-unstable + '[' xxen = xlinux ']' + linuxbranch= + '[' xqemu-upstream-unstable = x ']' + select_prevxenbranch ++ ./cri-getprevxenbranch xen-unstable-smoke + prevxenbranch=xen-4.9-testing + '[' x579d698da608a24ab334a6a38d932176bac5cecd = x ']' + : tested/2.6.39.x + . 
./ap-common ++ : osst...@xenbits.xen.org +++ getconfig OsstestUpstream +++ perl -e ' use Osstest; readglobalconfig(); print $c{"OsstestUpstream"} or die $!; ' ++ : ++ : git://xenbits.xen.org/xen.git ++ : osst...@xenbits.xen.org:/home/xen/git/xen.git ++ : git://xenbits.xen.org/qemu-xen-traditional.git ++ : git://git.kernel.org ++ : git://git.kernel.org/pub/scm/linux/kernel/git ++ : git ++ : git://xenbits.xen.org/xtf.git ++ : osst...@xenbits.xen.org:/home/xen/git/xtf.git ++ : git://xenbits.xen.org/xtf.git ++ : git://xenbits.xen.org/libvirt.git ++ : osst...@xenbits.xen.org:/home/xen/git/libvirt.git ++ : git://xenbits.xen.org/libvirt.git ++ : git://xenbits.xen.org/osstest/rumprun.git ++ : git ++ : git://xenbits.xen.org/osstest/rumprun.git ++ : osst...@xenbits.xen.org:/home/xen/git/osstest/rumprun.git ++ : git://git.seabios.org/seabios.git ++ : osst...@xenbits.xen.org:/home/xen/git/osstest/seabios.git ++ : git://xenbits.xen.org/osstest/seabios.git ++ :
[Xen-devel] Xen 4.9 rc9
Hi all,

Xen 4.9 rc9 is tagged. You can check that out from xen.git:

  git://xenbits.xen.org/xen.git 4.9.0-rc9

For your convenience there is also a tarball at:

  https://downloads.xenproject.org/release/xen/4.9.0-rc9/xen-4.9.0-rc9.tar.gz

And the signature is at:

  https://downloads.xenproject.org/release/xen/4.9.0-rc9/xen-4.9.0-rc9.tar.gz.sig

Please send bug reports and test reports to xen-de...@lists.xenproject.org. When sending bug reports, please CC relevant maintainers and me (julien.gr...@arm.com).

Cheers,

-- 
Julien Grall
[Xen-devel] Travis build failing because "tools/xen-detect: try sysfs node for obtaining guest type" ?
Hey,

Am I the only one for which Travis seems to be unhappy of this:

  I/home/travis/build/fdario/xen/tools/misc/../../tools/include xen-detect.c -o xen-detect
  xen-detect.c: In function ‘check_sysfs’:
  xen-detect.c:196:17: error: ignoring return value of ‘asprintf’, declared with attribute warn_unused_result [-Werror=unused-result]
       asprintf(, "V%s.%s", str, tmp);
       ^
  xen-detect.c: In function ‘check_for_xen’:
  xen-detect.c:93:17: error: ignoring return value of ‘asprintf’, declared with attribute warn_unused_result [-Werror=unused-result]
       asprintf(, "V%u.%u",
       ^
  cc1: all warnings being treated as errors

https://travis-ci.org/fdario/xen/jobs/245864401

Which, to me, looks related to 48d0c822640f8ce4754de16f1bee5c995bac7078 ("tools/xen-detect: try sysfs node for obtaining guest type").

I can, however, build the tools locally, with:

  gcc version 6.3.0 20170516 (Debian 6.3.0-18)

Thoughts?

Regards,
Dario

-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [RFC v2]Proposal to allow setting up shared memory areas between VMs from xl config file
Hi Wei, Thank you for your valuable comments. 2017-06-21 23:09 GMT+08:00 Wei Liu: > On Wed, Jun 21, 2017 at 01:18:38AM +0800, Zhongze Liu wrote: >> >> 1. Motivation and Description >> >> Virtual machines use grant table hypercalls to setup a share page for >> inter-VMs communications. These hypercalls are used by all PV >> protocols today. However, very simple guests, such as baremetal >> applications, might not have the infrastructure to handle the grant table. >> This project is about setting up several shared memory areas for inter-VMs >> communications directly from the VM config file. >> So that the guest kernel doesn't have to have grant table support (in the >> embedded space, this is not unusual) to be able to communicate with >> other guests. >> >> >> 2. Implementation Plan: >> >> >> == >> 2.1 Introduce a new VM config option in xl: >> == >> The shared areas should be shareable among several (>=2) VMs, so >> every shared physical memory area is assigned to a set of VMs. >> Therefore, a “token” or “identifier” should be used here to uniquely >> identify a backing memory area. >> >> The backing area would be taken from one domain, which we will regard >> as the "master domain", and this domain should be created prior to any >> other "slave domain"s. Again, we have to use some kind of tag to tell who >> is the "master domain". >> >> And the ability to specify the attributes of the pages (say, WO/RO/X) >> to be shared should be also given to the user. For the master domain, >> these attributes often describes the maximum permission allowed for the >> shared pages, and for the slave domains, these attributes are often used >> to describe with what permissions this area will be mapped. >> This information should also be specified in the xl config entry. >> > > I don't quite get the attribute settings. 
If you only insert a backing > page into guest physical address space with XENMEM hypercall, how do you > audit the attributes when the guest tries to map the page? > I'm still considering about this, and any suggestions are welcomed. The current plan I have in mind is XENMEM_access_op. >> To handle all these, I would suggest using an unsigned integer to serve as >> the >> identifier, and using a "master" tag in the master domain's xl config entry >> to announce that she will provide the backing memory pages. A separate >> entry would be used to describe the attributes of the shared memory area, of >> the form "prot=RW". > > I think using an integer is too limiting. You would need the user to > know if a particular number is already used. Maybe using a number is > good enough for the use case you have in mind, but it is not future > proof. I don't know how sophisticated we want this to be, though. > Sounds reasonable. I chose integers because I think integers are fast and easy to manipulate. But integers are somewhat hard to memorize and this isn't a good thing from a user's point of view. So maybe I'll make it a string with a maximum size of 32 or longer. >> For example: >> >> In xl config file of vm1: >> >> static_shared_mem = ["id = ID1, begin = gmfn1, end = gmfn2, >> granularity = 4k, prot = RO, master”, >> "id = ID2, begin = gmfn3, end = gmfn4, > > I think you mean "gpfn" here and below. > Yes, according to https://wiki.xenproject.org/wiki/XenTerminology, the section "Address Spaces", gmfn == gpfn for auto-translated guests. But this usage seems to be outdated and should be phased out according to include/xen/mm.h. And just as what Julien has pointed out, the term "gfn" should be used here. 
>> granularity = 4k, prot = RW, master”] >> >> In xl config file of vm2: >> >> static_shared_mem = ["id = ID1, begin = gmfn5, end = gmfn6, >> granularity = 4k, prot = RO”] >> >> In xl config file of vm3: >> >> static_shared_mem = ["id = ID2, begin = gmfn7, end = gmfn8, >> granularity = 4k, prot = RW”] >> >> gmfn's above are all hex of the form "0x2". >> >> In the example above. A memory area ID1 will be shared between vm1 and vm2. >> This area will be taken from vm1 and mapped into vm2's stage-2 page table. >> The parameter "prot=RO" means that this memory area are offered with >> read-only >> permission. vm1 can access this area using gmfn1~gmfn2, and vm2 using >> gmfn5~gmfn6. >> Likewise, a memory area ID will be shared between vm1 and vm3 with read and >> write permissions. vm1 is the master and vm2 the slave. vm1 can access the >> area using gmfn3~gmfn4 and vm3 using gmfn7~gmfn8. >> >> The "granularity" is optional in the slaves' config entries. But if it's >> presented in the
Re: [Xen-devel] [PATCH v3 5/9] xen/vpci: add handlers to map the BARs
On Fri, May 19, 2017 at 09:21:56AM -0600, Jan Beulich wrote: > >>> On 27.04.17 at 16:35,wrote: > > +static int vpci_modify_bars(struct pci_dev *pdev, const bool map) > > +{ > > +struct vpci_header *header = >vpci->header; > > +unsigned int i; > > +int rc = 0; > > + > > +for ( i = 0; i < ARRAY_SIZE(header->bars); i++ ) > > +{ > > +paddr_t gaddr = map ? header->bars[i].gaddr > > +: header->bars[i].mapped_addr; > > +paddr_t paddr = header->bars[i].paddr; > > + > > +if ( header->bars[i].type != VPCI_BAR_MEM && > > + header->bars[i].type != VPCI_BAR_MEM64_LO ) > > +continue; > > + > > +rc = modify_mmio(pdev->domain, _gfn(PFN_DOWN(gaddr)), > > + _mfn(PFN_DOWN(paddr)), > > PFN_UP(header->bars[i].size), > > The PFN_UP() indicates a problem: For sub-page BARs you can't > blindly map/unmap them without taking into consideration other > devices sharing the same page. I'm not sure I follow, the start address of BARs is always aligned to a 4KB boundary, so there's no chance of the same page being used by two different BARs at the same time. The size is indeed not aligned to 4KB, but I don't see how this can cause collisions with other BARs unless the domain is actively trying to make the BARs overlap, in which case there's not much Xen can do. > > + map); > > +if ( rc ) > > +break; > > + > > +header->bars[i].mapped_addr = map ? gaddr : 0; > > +} > > + > > +return rc; > > +} > > Shouldn't this function somewhere honor the unset flags? Right, I've added a check to make sure the BAR is positioned before trying to map it into the domain p2m. > > +static int vpci_cmd_read(struct pci_dev *pdev, unsigned int reg, > > + union vpci_val *val, void *data) > > +{ > > +struct vpci_header *header = data; > > + > > +val->word = header->command; > > Rather than reading back and storing the value in the write handler, > I'd recommending doing an actual read here. OK. 
> > +static int vpci_cmd_write(struct pci_dev *pdev, unsigned int reg, > > + union vpci_val val, void *data) > > +{ > > +struct vpci_header *header = data; > > +uint16_t new_cmd, saved_cmd; > > +uint8_t seg = pdev->seg, bus = pdev->bus; > > +uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn); > > +int rc; > > + > > +new_cmd = val.word; > > +saved_cmd = header->command; > > + > > +if ( !((new_cmd ^ saved_cmd) & PCI_COMMAND_MEMORY) ) > > +goto out; > > + > > +/* Memory space access change. */ > > +rc = vpci_modify_bars(pdev, new_cmd & PCI_COMMAND_MEMORY); > > +if ( rc ) > > +{ > > +dprintk(XENLOG_ERR, > > +"%04x:%02x:%02x.%u:unable to %smap BARs: %d\n", > > +seg, bus, slot, func, > > +new_cmd & PCI_COMMAND_MEMORY ? "" : "un", rc); > > +return rc; > > I guess you can guess the question already: What is the bare > hardware equivalent of this failure return? Yes, this is already fixed since write handlers simply return void. The hw equivalent would be to ignore the write AFAICT (ie: memory decoding will not be enabled). Are you fine with the dprintk or would you also like me to remove that? (IMHO it's helpful for debugging). > > +} > > + > > + out: > > Please try to avoid goto-s and labels for other than error handling > (and even then only when code would otherwise end up pretty > convoluted). Done. > > +static int vpci_bar_read(struct pci_dev *pdev, unsigned int reg, > > + union vpci_val *val, void *data) > > +{ > > +struct vpci_bar *bar = data; > > const > > > +bool hi = false; > > + > > +ASSERT(bar->type == VPCI_BAR_MEM || bar->type == VPCI_BAR_MEM64_LO || > > + bar->type == VPCI_BAR_MEM64_HI); > > + > > +if ( bar->type == VPCI_BAR_MEM64_HI ) > > +{ > > +ASSERT(reg - PCI_BASE_ADDRESS_0 > 0); > > reg > PCI_BASE_ADDRESS_0 Fixed. > > +bar--; > > +hi = true; > > +} > > + > > +if ( bar->sizing ) > > +val->double_word = ~(bar->size - 1) >> (hi ? 32 : 0); > > There's also a comment further down - this is producing undefined > behavior on 32-bits arches. 
I've changed size to be a uint64_t. > > +static int vpci_bar_write(struct pci_dev *pdev, unsigned int reg, > > + union vpci_val val, void *data) > > +{ > > +struct vpci_bar *bar = data; > > +uint32_t wdata = val.double_word; > > +bool hi = false, unset = false; > > + > > +ASSERT(bar->type == VPCI_BAR_MEM || bar->type == VPCI_BAR_MEM64_LO || > > + bar->type == VPCI_BAR_MEM64_HI); > > + > > +if ( wdata == GENMASK(31, 0) ) > > I'm afraid this again doesn't match real hardware behavior: As the > low bits are r/o, writes with them having any value, but all other
Re: [Xen-devel] [PATCH for-4.9 v2] xen/livepatch: Don't crash on encountering STN_UNDEF relocations
On Thu, Jun 22, 2017 at 12:33:57PM -0400, Konrad Rzeszutek Wilk wrote: > On Thu, Jun 22, 2017 at 12:10:46PM -0400, Konrad Rzeszutek Wilk wrote: > > On Thu, Jun 22, 2017 at 11:27:50AM -0400, Konrad Rzeszutek Wilk wrote: > > > On Wed, Jun 21, 2017 at 09:26:15PM -0400, Konrad Rzeszutek Wilk wrote: > > > > On Wed, Jun 21, 2017 at 07:13:36PM +0100, Andrew Cooper wrote: > > > > > A symndx of STN_UNDEF is special, and means a symbol value of 0. > > > > > While > > > > > legitimate in the ELF standard, its existence in a livepatch is > > > > > questionable > > > > > at best. Until a plausible usecase presents itself, reject such a > > > > > relocation > > > > > with -EOPNOTSUPP. > > > > > > > > > > Additionally, perform a safety check on elf->sym[symndx].sym before > > > > > dereferencing it, to avoid tripping over a NULL pointer when > > > > > calculating val. > > > > > > > > > > Signed-off-by: Andrew Cooper> > > > > > > > Reviewed-by: Konrad Rzeszutek Wilk > > > > Tested-by: Konrad Rzeszutek Wilk [x86 right > > > > now, will do > > > > arm32 tomorrow] > > > > > > I did that on my Cubietruck and I made the rookie mistake of not trying > > > a hypervisor _without_ your changes, so I don't know if this crash > > > (see inline) is due to your patch or something else. > > > > > > Also I messed up and made the livepatch test run every time it boots, so > > > now it is stuck in a loop of crashes :-( > > > > > > The git tree is: > > > > > > git://xenbits.xen.org/people/konradwilk/xen.git staging-4.9 > > > > > > Stay tuned. > > > > And I see the same thing with b38b147 (that is the top of 'origin/staging'). > > > > So time to dig in. > /me blushes. > > I compiled the hypervisor and the livepatches on a cross compiler. > arm-linux-gnueabi-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609 > > > But if I compile both on the Cubietruck (natively) it all works nicely. 
> gcc (Ubuntu/Linaro 4.8.2-19ubuntu1) 4.8.2 > > So: > > Tested-by: Konrad Rzeszutek Wilk [x86, arm32] > > for both of the patches. Sorry for the alarm. Jan, Do you recall perchance this thread: http://www.mail-archive.com/xen-devel@lists.xen.org/msg80633.html I am thinking of resurrecting it but following the same spirit as here, that is return -EOPNOTSUPP if the sh_addralign is not the correct value. > > Julien, would you be OK with these two going in 4.9? Please? ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC v2]Proposal to allow setting up shared memory areas between VMs from xl config file
Hi Julien, 2017-06-21 1:29 GMT+08:00 Julien Grall: > Hi, > > Thank you for the new proposal. > > On 06/20/2017 06:18 PM, Zhongze Liu wrote: >> >> In the example above, a memory area ID1 will be shared between vm1 and >> vm2. >> This area will be taken from vm1 and mapped into vm2's stage-2 page table. >> The parameter "prot=RO" means that this memory area is offered with >> read-only >> permission. vm1 can access this area using gmfn1~gmfn2, and vm2 using >> gmfn5~gmfn6. > > > [...] > >> >> == >> 2.3 mapping the memory areas >> == >> Handle the newly added config option in tools/{xl, libxl} and utilize >> tools/libxc to do the actual memory mapping. Specifically, we will use >> a wrapper to XENMEM_add_to_physmap_batch with XENMAPSPACE_gmfn_foreign to >> do the actual mapping. But since there isn't such a wrapper in libxc, >> we'll >> have to add a new wrapper, xc_domain_add_to_physmap_batch in >> libxc/xc_domain.c > > > In the paragraph above, you suggest the user can select the permission on the > shared page. However, the hypercall XENMEM_add_to_physmap does not currently > take permission. So how do you plan to handle that? > I think this could be done via XENMEM_access_op? Cheers, Zhongze Liu
Re: [Xen-devel] [PATCH 1/4] xen: credit2: implement utilization cap
On 08/06/17 13:08, Dario Faggioli wrote: > This commit implements the Xen part of the cap mechanism for > Credit2. > > A cap is how much, in terms of % of physical CPU time, a domain > can execute at most. > > For instance, a domain that must not use more than 1/4 of one > physical CPU, must have a cap of 25%; one that must not use more > than 1+1/2 of physical CPU time, must be given a cap of 150%. > > Caps are per domain, so it is all a domain's vCPUs, cumulatively, > that will be forced to execute no more than the decided amount. > > This is implemented by giving each domain a 'budget', and using > a (per-domain again) periodic timer. Values of budget and 'period' > are chosen so that budget/period is equal to the cap itself. > > Budget is burned by the domain's vCPUs, in a similar way to how > credits are. > > When a domain runs out of budget, its vCPUs can't run any longer. > They can run again when the budget is replenished by the timer, which > happens once every period. > > Blocking the vCPUs because of lack of budget happens by > means of a new (_VPF_parked) pause flag, so that, e.g., > vcpu_runnable() still works. This is similar to what is > done in sched_rtds.c, as opposed to what happens in > sched_credit.c, where vcpu_pause() and vcpu_unpause() are used > (which means, among other things, more overhead). > > Note that xenalyze and tools/xentrace/format are also modified, > to keep them updated with one modified event. > > Signed-off-by: Dario FaggioliLooks really good overall, Dario! Just a few relatively minor comments. > diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c > index 126417c..ba4bf4b 100644 > --- a/xen/common/sched_credit2.c > +++ b/xen/common/sched_credit2.c > @@ -92,6 +92,82 @@ > */ > > /* > + * Utilization cap: > + * > + * Setting a pCPU utilization cap for a domain means the following: > + * > + * - a domain can have a cap, expressed in terms of % of physical CPU time. 
> + * A domain that must not use more than 1/4 of _one_ physical CPU, will > + * be given a cap of 25%; a domain that must not use more than 1+1/2 of > + * physical CPU time, will be given a cap of 150%; > + * > + * - caps are per-domain (not per-vCPU). If a domain has only 1 vCPU, and > + * a 40% cap, that one vCPU will use 40% of one pCPU. If a domain has 4 > + * vCPUs, and a 200% cap, all its 4 vCPUs are allowed to run for (the > + * equivalent of) 100% time on 2 pCPUs. How much each of the various 4 > + * vCPUs will get, is unspecified (will depend on various aspects: > workload, > + * system load, etc.). > + * > + * For implementing this, we use the following approach: > + * > + * - each domain is given a 'budget', and each domain has a timer, which > + * replenishes the domain's budget periodically. The budget is the amount > + * of time the vCPUs of the domain can use every 'period'; > + * > + * - the period is CSCHED2_BDGT_REPL_PERIOD, and is the same for all domains > + * (but each domain has its own timer; so they are all periodic with the same > + * period, but replenishment of the budgets of the various domains, at > + * period boundaries, are not synchronous); > + * > + * - when vCPUs run, they consume budget. When they don't run, they don't > + * consume budget. If there is no budget left for the domain, no vCPU of > + * that domain can run. If a vCPU tries to run and finds that there is no > + * budget, it blocks. > + * Budget never expires, so at whatever time a vCPU wants to run, it can > + * check the domain's budget, and if there is some, it can use it. I'm not sure what this paragraph is trying to say. Saying budget "never expires" makes it sound like you continue to accumulate it, such that if you don't run at all for several periods, you could "save it up" and run at 100% for one full period. But that's contradicted by... > + * - budget is replenished to the top of the capacity for the domain once > + * per period. 
Even if there was some leftover budget from previous period, > + * though, the budget after a replenishment will always be at most equal > + * to the total capacity of the domain ('tot_budget'); ...this paragraph. > + * - when a budget replenishment occurs, if there are vCPUs that had been > + * blocked because of lack of budget, they'll be unblocked, and they will > + * (potentially) be able to run again. > + * > + * Finally, some even more implementation related detail: > + * > + * - budget is stored in a domain-wide pool. vCPUs of the domain that want > + * to run go to such pool, and grab some. When they do so, the amount > + * they grabbed is _immediately_ removed from the pool. This happens in > + * vcpu_try_to_get_budget(); This sounds like a good solution to the "greedy vcpu" problem. :-) > + * - when vCPUs stop running, if they've not consumed all the budget they > + * took, the leftover is put back in the pool. This happens in > + *
Re: [Xen-devel] [PATCH v7 27/36] iommu/amd: Allow the AMD IOMMU to work with memory encryption
On 6/22/2017 5:56 AM, Borislav Petkov wrote: On Fri, Jun 16, 2017 at 01:54:59PM -0500, Tom Lendacky wrote: The IOMMU is programmed with physical addresses for the various tables and buffers that are used to communicate between the device and the driver. When the driver allocates this memory it is encrypted. In order for the IOMMU to access the memory as encrypted the encryption mask needs to be included in these physical addresses during configuration. The PTE entries created by the IOMMU should also include the encryption mask so that when the device behind the IOMMU performs a DMA, the DMA will be performed to encrypted memory. Signed-off-by: Tom Lendacky--- drivers/iommu/amd_iommu.c | 30 -- drivers/iommu/amd_iommu_init.c | 34 -- drivers/iommu/amd_iommu_proto.h | 10 ++ drivers/iommu/amd_iommu_types.h |2 +- 4 files changed, 55 insertions(+), 21 deletions(-) Reviewed-by: Borislav Petkov Btw, I'm assuming the virt_to_phys() difference on SME systems is only needed in a handful of places. Otherwise, I'd suggest changing the virt_to_phys() function/macro directly. But I guess most of the places need the real physical address without the enc bit. Correct. Thanks, Tom
[Xen-devel] [PATCH v3] libxc: add xc_domain_add_to_physmap_batch to wrap XENMEM_add_to_physmap_batch
This is a preparation for the proposal "allow setting up shared memory areas between VMs from xl config file". See: V2: https://lists.xen.org/archives/html/xen-devel/2017-06/msg02256.html V1: https://lists.xen.org/archives/html/xen-devel/2017-05/msg01288.html The plan is to use XENMEM_add_to_physmap_batch in xl to map foreign pages from one DomU to another so that the page could be shared. But currently there is no wrapper for XENMEM_add_to_physmap_batch in libxc, so we just add a wrapper for it. Signed-off-by: Zhongze Liu--- Changed Since v2: * fix coding style issue * let rc = 1 on buffer bouncing failures Changed Since v1: * explain why such a sudden wrapper * change the parameters' types Cc: Ian Jackson , Cc: Wei Liu , Cc: Stefano Stabellini Cc: Julien Grall Cc: Jan Beulich
Re: [Xen-devel] [PATCH for-4.9 v2] xen/livepatch: Don't crash on encountering STN_UNDEF relocations
On Thu, Jun 22, 2017 at 12:10:46PM -0400, Konrad Rzeszutek Wilk wrote: > On Thu, Jun 22, 2017 at 11:27:50AM -0400, Konrad Rzeszutek Wilk wrote: > > On Wed, Jun 21, 2017 at 09:26:15PM -0400, Konrad Rzeszutek Wilk wrote: > > > On Wed, Jun 21, 2017 at 07:13:36PM +0100, Andrew Cooper wrote: > > > > A symndx of STN_UNDEF is special, and means a symbol value of 0. While > > > > legitimate in the ELF standard, its existence in a livepatch is > > > > questionable > > > > at best. Until a plausible usecase presents itself, reject such a > > > > relocation > > > > with -EOPNOTSUPP. > > > > > > > > Additionally, perform a safety check on elf->sym[symndx].sym before > > > > dereferencing it, to avoid tripping over a NULL pointer when calculating > > > > val. > > > > > > > > Signed-off-by: Andrew Cooper> > > > > > Reviewed-by: Konrad Rzeszutek Wilk > > > Tested-by: Konrad Rzeszutek Wilk [x86 right now, > > > will do > > > arm32 tomorrow] > > > > I did that on my Cubietruck and I made the rookie mistake of not trying > > a hypervisor _without_ your changes, so I don't know if this crash > > (see inline) is due to your patch or something else. > > > > Also I messed up and made the livepatch test run every time it boots, so > > now it is stuck in a loop of crashes :-( > > > > The git tree is: > > > > git://xenbits.xen.org/people/konradwilk/xen.git staging-4.9 > > > > Stay tuned. > And I see the same thing with b38b147 (that is the top of 'origin/staging'). > > So time to dig in. /me blushes. I compiled the hypervisor and the livepatches on a cross compiler. arm-linux-gnueabi-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609 But if I compile both on the Cubietruck (natively) it all works nicely. gcc (Ubuntu/Linaro 4.8.2-19ubuntu1) 4.8.2 So: Tested-by: Konrad Rzeszutek Wilk [x86, arm32] for both of the patches. Sorry for the alarm. Julien, would you be OK with these two going in 4.9? Please?
Re: [Xen-devel] [PATCH 1/2] arm: smccc: handle SMCs/HVCs according to SMCCC
Hi Julien, On 15.06.17 13:48, Julien Grall wrote: Hi Volodymyr, On 14/06/17 15:10, Volodymyr Babchuk wrote: SMCCC (SMC Call Convention) describes how to handle both HVCs and SMCs. SMCCC states that both HVC and SMC are valid conduits to call different firmware functions. Thus, for example PSCI calls can be made both by SMC or HVC. Also SMCCC defines function number coding for such calls. Besides functional calls there are query calls, which allow the underlying OS to determine version, UID and number of functions provided by a service provider. This patch adds a new file `smccc.c`, which handles both generic SMCs and HVCs according to SMCCC. At this moment it implements only one service: Standard Hypervisor Service. Standard Hypervisor Service only supports query calls, so a caller can ask about the hypervisor UID and determine that it is XEN running. This change allows more generic handling for SMCs and HVCs and it can be easily extended to support new services and functions. Signed-off-by: Volodymyr BabchukReviewed-by: Oleksandr Andrushchenko Reviewed-by: Oleksandr Tyshchenko --- xen/arch/arm/Makefile | 1 + xen/arch/arm/smccc.c| 96 + xen/arch/arm/traps.c| 10 - xen/include/asm-arm/smccc.h | 89 + 4 files changed, 194 insertions(+), 2 deletions(-) create mode 100644 xen/arch/arm/smccc.c create mode 100644 xen/include/asm-arm/smccc.h diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile index 49e1fb2..b8728cf 100644 --- a/xen/arch/arm/Makefile +++ b/xen/arch/arm/Makefile @@ -39,6 +39,7 @@ obj-y += psci.o obj-y += setup.o obj-y += shutdown.o obj-y += smc.o +obj-y += smccc.o obj-y += smp.o obj-y += smpboot.o obj-y += sysctl.o diff --git a/xen/arch/arm/smccc.c b/xen/arch/arm/smccc.c new file mode 100644 index 000..5d10964 --- /dev/null +++ b/xen/arch/arm/smccc.c I would name this file vsmccc.c to show it is about virtual SMC. 
Also, I would have expected pretty everyone to use the SMCC, so I would even name the file vsmc.c @@ -0,0 +1,96 @@ +/* + * xen/arch/arm/smccc.c + * + * Generic handler for SMC and HVC calls according to + * ARM SMC callling convention s/callling/calling/ + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. I know that some of the other headers are wrong about the GPL license. But Xen is GPLv2 only. Please update the copyright accordingly. I.e: * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + + +#include +#include +#include Why this is included here? You don't use it. +/* Need to include xen/sched.h before asm/domain.h or it breaks build*/ xen/sched.h will include asm/domain.h. So no need to include the latter here. +#include +#include +#include +#include +#include You don't use this header here. +#include +#include + +#define XEN_SMCCC_UID ARM_SMCCC_UID(0xa71812dc, 0xc698, 0x4369, \ +0x9a, 0xcf, 0x79, 0xd1, \ +0x8d, 0xde, 0xe6, 0x67) Please mention that this value was generated. This would avoid to wonder where this value comes from. + +/* + * We can't use XEN version here: + * Major revision should change every time SMC/HVC function is removed. + * Minor revision should change every time SMC/HVC function is added. + * So, it is SMCCC protocol revision code, not XEN version It would be nice to say this is a requirement of the spec. Also missing full stop. 
+ */ +#define XEN_SMCCC_MAJOR_REVISION 0 +#define XEN_SMCCC_MINOR_REVISION 1 I first thought the revision was 0.1.3 and was about to ask why. But then noticed XEN_SMCC_FUNCTION_COUNT is not part of the revision. So please add a newline for clarity. +#define XEN_SMCCC_FUNCTION_COUNT 3 + +/* SMCCC interface for hypervisor. Tell about self */ Tell about itself. + missing full stop. +static bool handle_hypervisor(struct cpu_user_regs *regs, const union hsr hsr) hsr is already part of regs. +{ +switch ( ARM_SMCCC_FUNC_NUM(get_user_reg(regs, 0)) ) +{ +case ARM_SMCCC_FUNC_CALL_COUNT: +set_user_reg(regs, 0, XEN_SMCCC_FUNCTION_COUNT); +return true; +case ARM_SMCCC_FUNC_CALL_UID: +set_user_reg(regs, 0, XEN_SMCCC_UID.a[0]); +
[Xen-devel] [PATCH v2 3/4] arm: traps: handle PSCI calls inside `vsmc.c`
PSCI is part of the HVC/SMC interface, so it should be handled in the appropriate place: `vsmc.c`. This patch just moves PSCI handler calls from `traps.c` to `vsmc.c`. PSCI is considered as two different "services" in terms of SMCCC. Older PSCI 0.1 is treated as "architecture service", while newer PSCI 0.2 is defined as "standard secure service". Signed-off-by: Volodymyr BabchukReviewed-by: Oleksandr Andrushchenko Reviewed-by: Oleksandr Tyshchenko --- Split this patch into two. Now this patch does not change the way the PSCI code accesses the arguments. --- xen/arch/arm/traps.c | 124 -- xen/arch/arm/vsmc.c | 136 ++ xen/include/public/arch-arm/smc.h | 5 ++ 3 files changed, 153 insertions(+), 112 deletions(-) diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index 66242e5..e806474 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -39,7 +39,6 @@ #include #include #include -#include #include #include #include @@ -1450,113 +1449,6 @@ static void do_debug_trap(struct cpu_user_regs *regs, unsigned int code) } #endif -/* helper function for checking arm mode 32/64 bit */ -static inline int psci_mode_check(struct domain *d, register_t fid) -{ -return !( is_64bit_domain(d)^( (fid & PSCI_0_2_64BIT) >> 30 ) ); -} - -static void do_trap_psci(struct cpu_user_regs *regs) -{ -register_t fid = get_user_reg(regs,0); - -/* preloading in case psci_mode_check fails */ -set_user_reg(regs, 0, PSCI_INVALID_PARAMETERS); -switch( fid ) -{ -case PSCI_cpu_off: -{ -uint32_t pstate = get_user_reg(regs, 1); -perfc_incr(vpsci_cpu_off); -set_user_reg(regs, 0, do_psci_cpu_off(pstate)); -} -break; -case PSCI_cpu_on: -{ -uint32_t vcpuid = get_user_reg(regs, 1); -register_t epoint = get_user_reg(regs, 2); -perfc_incr(vpsci_cpu_on); -set_user_reg(regs, 0, do_psci_cpu_on(vcpuid, epoint)); -} -break; -case PSCI_0_2_FN_PSCI_VERSION: -perfc_incr(vpsci_version); -set_user_reg(regs, 0, do_psci_0_2_version()); -break; -case PSCI_0_2_FN_CPU_OFF: -perfc_incr(vpsci_cpu_off); -set_user_reg(regs, 
0, do_psci_0_2_cpu_off()); -break; -case PSCI_0_2_FN_MIGRATE_INFO_TYPE: -perfc_incr(vpsci_migrate_info_type); -set_user_reg(regs, 0, do_psci_0_2_migrate_info_type()); -break; -case PSCI_0_2_FN_MIGRATE_INFO_UP_CPU: -case PSCI_0_2_FN64_MIGRATE_INFO_UP_CPU: -perfc_incr(vpsci_migrate_info_up_cpu); -if ( psci_mode_check(current->domain, fid) ) -set_user_reg(regs, 0, do_psci_0_2_migrate_info_up_cpu()); -break; -case PSCI_0_2_FN_SYSTEM_OFF: -perfc_incr(vpsci_system_off); -do_psci_0_2_system_off(); -set_user_reg(regs, 0, PSCI_INTERNAL_FAILURE); -break; -case PSCI_0_2_FN_SYSTEM_RESET: -perfc_incr(vpsci_system_reset); -do_psci_0_2_system_reset(); -set_user_reg(regs, 0, PSCI_INTERNAL_FAILURE); -break; -case PSCI_0_2_FN_CPU_ON: -case PSCI_0_2_FN64_CPU_ON: -perfc_incr(vpsci_cpu_on); -if ( psci_mode_check(current->domain, fid) ) -{ -register_t vcpuid = get_user_reg(regs, 1); -register_t epoint = get_user_reg(regs, 2); -register_t cid = get_user_reg(regs, 3); -set_user_reg(regs, 0, - do_psci_0_2_cpu_on(vcpuid, epoint, cid)); -} -break; -case PSCI_0_2_FN_CPU_SUSPEND: -case PSCI_0_2_FN64_CPU_SUSPEND: -perfc_incr(vpsci_cpu_suspend); -if ( psci_mode_check(current->domain, fid) ) -{ -uint32_t pstate = get_user_reg(regs, 1); -register_t epoint = get_user_reg(regs, 2); -register_t cid = get_user_reg(regs, 3); -set_user_reg(regs, 0, - do_psci_0_2_cpu_suspend(pstate, epoint, cid)); -} -break; -case PSCI_0_2_FN_AFFINITY_INFO: -case PSCI_0_2_FN64_AFFINITY_INFO: -perfc_incr(vpsci_cpu_affinity_info); -if ( psci_mode_check(current->domain, fid) ) -{ -register_t taff = get_user_reg(regs, 1); -uint32_t laff = get_user_reg(regs, 2); -set_user_reg(regs, 0, - do_psci_0_2_affinity_info(taff, laff)); -} -break; -case PSCI_0_2_FN_MIGRATE: -case PSCI_0_2_FN64_MIGRATE: -perfc_incr(vpsci_cpu_migrate); -if ( psci_mode_check(current->domain, fid) ) -{ -uint32_t tcpu = get_user_reg(regs, 1); -set_user_reg(regs, 0, do_psci_0_2_migrate(tcpu)); -} -break; -default: -
[Xen-devel] [PATCH v2 1/4] arm: traps: psci: use generic register accessors
There are standard functions set_user_reg() and get_user_reg(). Use them instead of PSCI_RESULT_REG()/PSCI_ARG() macros. Signed-off-by: Volodymyr Babchuk--- xen/arch/arm/traps.c | 68 ++-- 1 file changed, 29 insertions(+), 39 deletions(-) diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index 6cf9ee7..2054c69 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -1449,16 +1449,6 @@ static void do_debug_trap(struct cpu_user_regs *regs, unsigned int code) } #endif -#ifdef CONFIG_ARM_64 -#define PSCI_RESULT_REG(reg) (reg)->x0 -#define PSCI_ARG(reg,n) (reg)->x##n -#define PSCI_ARG32(reg,n) (uint32_t)( (reg)->x##n & 0x ) -#else -#define PSCI_RESULT_REG(reg) (reg)->r0 -#define PSCI_ARG(reg,n) (reg)->r##n -#define PSCI_ARG32(reg,n) PSCI_ARG(reg,n) -#endif - /* helper function for checking arm mode 32/64 bit */ static inline int psci_mode_check(struct domain *d, register_t fid) { @@ -1467,65 +1457,65 @@ static inline int psci_mode_check(struct domain *d, register_t fid) static void do_trap_psci(struct cpu_user_regs *regs) { -register_t fid = PSCI_ARG(regs,0); +register_t fid = get_user_reg(regs,0); /* preloading in case psci_mode_check fails */ -PSCI_RESULT_REG(regs) = PSCI_INVALID_PARAMETERS; +set_user_reg(regs, 0, PSCI_INVALID_PARAMETERS); switch( fid ) { case PSCI_cpu_off: { -uint32_t pstate = PSCI_ARG32(regs,1); +uint32_t pstate = get_user_reg(regs, 1); perfc_incr(vpsci_cpu_off); -PSCI_RESULT_REG(regs) = do_psci_cpu_off(pstate); +set_user_reg(regs, 0, do_psci_cpu_off(pstate)); } break; case PSCI_cpu_on: { -uint32_t vcpuid = PSCI_ARG32(regs,1); -register_t epoint = PSCI_ARG(regs,2); +uint32_t vcpuid = get_user_reg(regs, 1); +register_t epoint = get_user_reg(regs, 2); perfc_incr(vpsci_cpu_on); -PSCI_RESULT_REG(regs) = do_psci_cpu_on(vcpuid, epoint); +set_user_reg(regs, 0, do_psci_cpu_on(vcpuid, epoint)); } break; case PSCI_0_2_FN_PSCI_VERSION: perfc_incr(vpsci_version); -PSCI_RESULT_REG(regs) = do_psci_0_2_version(); +set_user_reg(regs, 0, 
do_psci_0_2_version()); break; case PSCI_0_2_FN_CPU_OFF: perfc_incr(vpsci_cpu_off); -PSCI_RESULT_REG(regs) = do_psci_0_2_cpu_off(); +set_user_reg(regs, 0, do_psci_0_2_cpu_off()); break; case PSCI_0_2_FN_MIGRATE_INFO_TYPE: perfc_incr(vpsci_migrate_info_type); -PSCI_RESULT_REG(regs) = do_psci_0_2_migrate_info_type(); +set_user_reg(regs, 0, do_psci_0_2_migrate_info_type()); break; case PSCI_0_2_FN_MIGRATE_INFO_UP_CPU: case PSCI_0_2_FN64_MIGRATE_INFO_UP_CPU: perfc_incr(vpsci_migrate_info_up_cpu); if ( psci_mode_check(current->domain, fid) ) -PSCI_RESULT_REG(regs) = do_psci_0_2_migrate_info_up_cpu(); +set_user_reg(regs, 0, do_psci_0_2_migrate_info_up_cpu()); break; case PSCI_0_2_FN_SYSTEM_OFF: perfc_incr(vpsci_system_off); do_psci_0_2_system_off(); -PSCI_RESULT_REG(regs) = PSCI_INTERNAL_FAILURE; +set_user_reg(regs, 0, PSCI_INTERNAL_FAILURE); break; case PSCI_0_2_FN_SYSTEM_RESET: perfc_incr(vpsci_system_reset); do_psci_0_2_system_reset(); -PSCI_RESULT_REG(regs) = PSCI_INTERNAL_FAILURE; +set_user_reg(regs, 0, PSCI_INTERNAL_FAILURE); break; case PSCI_0_2_FN_CPU_ON: case PSCI_0_2_FN64_CPU_ON: perfc_incr(vpsci_cpu_on); if ( psci_mode_check(current->domain, fid) ) { -register_t vcpuid = PSCI_ARG(regs,1); -register_t epoint = PSCI_ARG(regs,2); -register_t cid = PSCI_ARG(regs,3); -PSCI_RESULT_REG(regs) = -do_psci_0_2_cpu_on(vcpuid, epoint, cid); +register_t vcpuid = get_user_reg(regs, 1); +register_t epoint = get_user_reg(regs, 2); +register_t cid = get_user_reg(regs, 3); +set_user_reg(regs, 0, + do_psci_0_2_cpu_on(vcpuid, epoint, cid)); } break; case PSCI_0_2_FN_CPU_SUSPEND: @@ -1533,11 +1523,11 @@ static void do_trap_psci(struct cpu_user_regs *regs) perfc_incr(vpsci_cpu_suspend); if ( psci_mode_check(current->domain, fid) ) { -uint32_t pstate = PSCI_ARG32(regs,1); -register_t epoint = PSCI_ARG(regs,2); -register_t cid = PSCI_ARG(regs,3); -PSCI_RESULT_REG(regs) = -do_psci_0_2_cpu_suspend(pstate, epoint, cid); +uint32_t pstate = get_user_reg(regs, 1); +register_t epoint = 
get_user_reg(regs, 2); +register_t cid = get_user_reg(regs, 3); +
[Xen-devel] [PATCH v2 0/4] Handle SMCs and HVCs in conformance with SMCCC
Hello all, This is the second version. Instead of 2 patches, there are 4 now. I have divided the PSCI patch into two: one changes how the PSCI code accesses registers and the second one moves the PSCI code with the new accessors to vsmc.c. Also I have removed the redundant 64 bit mode check in the PSCI code, as it does not conform with SMCCC. Per-patch changes are described in the corresponding patch messages.
[Xen-devel] [PATCH v2 4/4] vsmc: psci: remove 64 bit mode check
The PSCI handling code had a helper routine that checked the calling convention. It is not needed anymore, because: - Generic handler checks that 64 bit calls can be made only by 64 bit guests. - SMCCC requires that 64-bit handler should support both 32 and 64 bit calls even if they originate from 64 bit caller. This patch removes that extra check. Signed-off-by: Volodymyr Babchuk--- xen/arch/arm/vsmc.c | 13 + 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/xen/arch/arm/vsmc.c b/xen/arch/arm/vsmc.c index 5f10fd1..1983e0e 100644 --- a/xen/arch/arm/vsmc.c +++ b/xen/arch/arm/vsmc.c @@ -98,12 +98,6 @@ static bool handle_arch(struct cpu_user_regs *regs) return false; } -/* helper function for checking arm mode 32/64 bit */ -static inline int psci_mode_check(struct domain *d, register_t fid) -{ -return !( is_64bit_domain(d)^( (fid & PSCI_0_2_64BIT) >> 30 ) ); -} - /* PSCI 2.0 interface */ static bool handle_ssc(struct cpu_user_regs *regs) { @@ -125,8 +119,7 @@ static bool handle_ssc(struct cpu_user_regs *regs) return true; case ARM_SMCCC_FUNC_NUM(PSCI_0_2_FN_MIGRATE_INFO_UP_CPU): perfc_incr(vpsci_migrate_info_up_cpu); -if ( psci_mode_check(current->domain, fid) ) -set_user_reg(regs, 0, do_psci_0_2_migrate_info_up_cpu()); +set_user_reg(regs, 0, do_psci_0_2_migrate_info_up_cpu()); return true; case ARM_SMCCC_FUNC_NUM(PSCI_0_2_FN_SYSTEM_OFF): perfc_incr(vpsci_system_off); @@ -140,7 +133,6 @@ static bool handle_ssc(struct cpu_user_regs *regs) return true; case ARM_SMCCC_FUNC_NUM(PSCI_0_2_FN_CPU_ON): perfc_incr(vpsci_cpu_on); -if ( psci_mode_check(current->domain, fid) ) { register_t vcpuid = get_user_reg(regs, 1); register_t epoint = get_user_reg(regs, 2); @@ -151,7 +143,6 @@ static bool handle_ssc(struct cpu_user_regs *regs) return true; case ARM_SMCCC_FUNC_NUM(PSCI_0_2_FN_CPU_SUSPEND): perfc_incr(vpsci_cpu_suspend); -if ( psci_mode_check(current->domain, fid) ) { uint32_t pstate = get_user_reg(regs, 1); register_t epoint = get_user_reg(regs, 2); @@ -162,7 +153,6 @@ 
static bool handle_ssc(struct cpu_user_regs *regs) return true; case ARM_SMCCC_FUNC_NUM(PSCI_0_2_FN_AFFINITY_INFO): perfc_incr(vpsci_cpu_affinity_info); -if ( psci_mode_check(current->domain, fid) ) { register_t taff = get_user_reg(regs, 1); uint32_t laff = get_user_reg(regs,2); @@ -172,7 +162,6 @@ static bool handle_ssc(struct cpu_user_regs *regs) return true; case ARM_SMCCC_FUNC_NUM(PSCI_0_2_FN_MIGRATE): perfc_incr(vpsci_cpu_migrate); -if ( psci_mode_check(current->domain, fid) ) { uint32_t tcpu = get_user_reg(regs, 1); set_user_reg(regs, 0, do_psci_0_2_migrate(tcpu)); -- 2.7.4
[Xen-devel] [PATCH v2 2/4] arm: smccc: handle SMCs/HVCs according to SMCCC
SMCCC (SMC Call Convention) describes how to handle both HVCs and SMCs. SMCCC states that both HVC and SMC are valid conduits to call different firmware functions. Thus, for example PSCI calls can be made both by SMC or HVC. Also SMCCC defines function number coding for such calls. Besides functional calls there are query calls, which allow the underlying OS to determine version, UID and number of functions provided by a service provider. This patch adds a new file `vsmc.c`, which handles both generic SMCs and HVCs according to SMCCC. At this moment it implements only one service: Standard Hypervisor Service. Standard Hypervisor Service only supports query calls, so a caller can ask about the hypervisor UID and determine that it is XEN running. This change allows more generic handling for SMCs and HVCs and it can be easily extended to support new services and functions. But, before SMC is forwarded to standard SMCCC handler, it can be routed to a domain monitor, if one is installed. Signed-off-by: Volodymyr BabchukReviewed-by: Oleksandr Andrushchenko Reviewed-by: Oleksandr Tyshchenko --- - Moved UID definition to xen/include/public/arch-arm/smc.h - Renamed smccc.c to vsmc.c and smccc.h to vsmc.h - Reformatted vsmc.h and commented definitions there - Added immediate value check for SMC64, HVC32 and HVC64 - Added conditional flags check for SMC calls (HVC will be handled and checked in the next patch). 
- Added check for 64 bit calls from 32 bit guests - Removed HSR value passing as separate argument - Various changes in comments --- xen/arch/arm/Makefile | 1 + xen/arch/arm/traps.c | 16 - xen/arch/arm/vsmc.c | 128 ++ xen/include/asm-arm/vsmc.h| 94 xen/include/public/arch-arm/smc.h | 45 ++ 5 files changed, 283 insertions(+), 1 deletion(-) create mode 100644 xen/arch/arm/vsmc.c create mode 100644 xen/include/asm-arm/vsmc.h create mode 100644 xen/include/public/arch-arm/smc.h diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile index 49e1fb2..4efd01c 100644 --- a/xen/arch/arm/Makefile +++ b/xen/arch/arm/Makefile @@ -50,6 +50,7 @@ obj-$(CONFIG_HAS_GICV3) += vgic-v3.o obj-$(CONFIG_HAS_ITS) += vgic-v3-its.o obj-y += vm_event.o obj-y += vtimer.o +obj-y += vsmc.o obj-y += vpsci.o obj-y += vuart.o diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index 2054c69..66242e5 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -44,6 +44,7 @@ #include #include #include +#include #include "decode.h" #include "vtimer.h" @@ -2771,10 +2772,23 @@ static void do_trap_smc(struct cpu_user_regs *regs, const union hsr hsr) { int rc = 0; +if ( !check_conditional_instr(regs, hsr) ) +{ +advance_pc(regs, hsr); +return; +} + +/* If monitor is enabled, let it handle the call */ if ( current->domain->arch.monitor.privileged_call_enabled ) rc = monitor_smc(); -if ( rc != 1 ) +if ( rc == 1 ) +return; + +/* Use standard routines to handle the call */ +if ( vsmc_handle_call(regs) ) +advance_pc(regs, hsr); +else inject_undef_exception(regs, hsr); } diff --git a/xen/arch/arm/vsmc.c b/xen/arch/arm/vsmc.c new file mode 100644 index 000..10c4acd --- /dev/null +++ b/xen/arch/arm/vsmc.c @@ -0,0 +1,128 @@ +/* + * xen/arch/arm/vsmc.c + * + * Generic handler for SMC and HVC calls according to + * ARM SMC calling convention + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by 
the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + + +#include +#include +/* Need to include xen/sched.h before asm/domain.h or it breaks build*/ +#include +#include +#include +#include +#include +#include + +/* + * Hypervisor Service version + * + * We can't use XEN version here, because of SMCCC requirements: + * Major revision should change every time SMC/HVC function is removed. + * Minor revision should change every time SMC/HVC function is added. + * So, it is SMCCC protocol revision code, not XEN version. + * + * Those values are subjected to change, when interface will be extended. + * They should not be stored in public/asm-arm/smc.h because they should + * be queried by guest using SMC/HVC interface. + */ +#define XEN_SMCCC_MAJOR_REVISION 0 +#define XEN_SMCCC_MINOR_REVISION 1 + +/* Number of functions currently supported by Hypervisor Service. */ +#define XEN_SMCCC_FUNCTION_COUNT 3 + +/* SMCCC interface for hypervisor. Tell about itself.
[Xen-devel] [qemu-upstream-4.9-testing test] 110939: tolerable FAIL - PUSHED
flight 110939 qemu-upstream-4.9-testing real [real] http://logs.test-lab.xenproject.org/osstest/logs/110939/ Failures :-/ but no regressions. Regressions which are regarded as allowable (not blocking): test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail REGR. vs. 109926 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stopfail REGR. vs. 109926 Tests which did not succeed, but are not blocking: test-amd64-i386-libvirt-xsm 12 migrate-support-checkfail never pass test-amd64-amd64-xl-qemuu-ws16-amd64 9 windows-installfail never pass test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail never pass test-amd64-i386-libvirt 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt 12 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 12 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 13 saverestore-support-checkfail never pass test-arm64-arm64-xl 12 migrate-support-checkfail never pass test-arm64-arm64-xl 13 saverestore-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass test-arm64-arm64-libvirt-xsm 12 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 12 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-xl-xsm 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 12 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail 
never pass test-armhf-armhf-libvirt-xsm 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 12 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 13 saverestore-support-checkfail never pass test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2 fail never pass test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 12 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 12 saverestore-support-checkfail never pass test-armhf-armhf-xl-vhd 11 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 12 saverestore-support-checkfail never pass test-armhf-armhf-xl 12 migrate-support-checkfail never pass test-armhf-armhf-xl 13 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemuu-win10-i386 9 windows-installfail never pass test-amd64-i386-xl-qemuu-win10-i386 9 windows-install fail never pass test-amd64-i386-xl-qemuu-ws16-amd64 9 windows-install fail never pass version targeted for testing: qemuu414d069b38ab114b89085e44989bf57604ea86d7 baseline version: qemuue97832ec6b2a7ddd48b8e6d1d848ffdfee6a31c7 Last test of basis 109926 2017-06-01 11:16:20 Z 21 days Testing same since 110939 2017-06-21 15:44:00 Z1 days1 attempts People who touched revisions under test: Anthony PERARDJan Beulich jobs: build-amd64-xsm pass build-arm64-xsm pass build-armhf-xsm pass build-i386-xsm pass build-amd64 pass build-arm64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-arm64-libvirt
Re: [Xen-devel] [PATCH for-4.9 v2] xen/livepatch: Don't crash on encountering STN_UNDEF relocations
On Thu, Jun 22, 2017 at 11:27:50AM -0400, Konrad Rzeszutek Wilk wrote: > On Wed, Jun 21, 2017 at 09:26:15PM -0400, Konrad Rzeszutek Wilk wrote: > > On Wed, Jun 21, 2017 at 07:13:36PM +0100, Andrew Cooper wrote: > > > A symndx of STN_UNDEF is special, and means a symbol value of 0. While > > > legitimate in the ELF standard, its existence in a livepatch is > > > questionable > > > at best. Until a plausible usecase presents itself, reject such a > > > relocation > > > with -EOPNOTSUPP. > > > > > > Additionally, perform a safety check on elf->sym[symndx].sym before > > > dereferencing it, to avoid tripping over a NULL pointer when calculating > > > val. > > > > > > Signed-off-by: Andrew Cooper> > > > Reviewed-by: Konrad Rzeszutek Wilk > > Tested-by: Konrad Rzeszutek Wilk [x86 right now, > > will do > > arm32 tomorrow] > > I did that on my Cubietruck and I made the rookie mistake of not trying > a hypervisor _without_ your changes, so I don't know if this crash > (see inline) is due to your patch or something else. > > Also I messed up and made the livepatch test run every time it boots, so > now it is stuck in a loop of crashes :-( > > The git tree is: > > git://xenbits.xen.org/people/konradwilk/xen.git staging-4.9 > > Stay tuned. And I see the same thing with b38b147 (that is the top of 'origin/staging'). So time to dig in. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] libxc: add xc_domain_add_to_physmap_batch to wrap XENMEM_add_to_physmap_batch
Hi Wei, 2017-06-21 23:44 GMT+08:00 Wei Liu: > On Wed, Jun 21, 2017 at 01:29:26AM +0800, Zhongze Liu wrote: >> This is a preparation for the proposal "allow setting up shared memory areas >> between VMs from xl config file". See: >> V2: https://lists.xen.org/archives/html/xen-devel/2017-06/msg02256.html >> V1: https://lists.xen.org/archives/html/xen-devel/2017-05/msg01288.html >> >> The plan is to use XENMEM_add_to_physmap_batch in xl to map foreign pages >> from >> one DomU to another so that the page could be shared. But currently there is >> no >> wrapper for XENMEM_add_to_physmap_batch in libxc, so we just add a wrapper >> for >> it. >> >> Signed-off-by: Zhongze Liu >> --- >> +int xc_domain_add_to_physmap_batch(xc_interface *xch, >> + domid_t domid, >> + domid_t foreign_domid, >> + unsigned int space, >> + unsigned int size, >> + xen_ulong_t *idxs, >> + xen_pfn_t *gpfns, >> + int *errs) >> +{ >> +int rc; >> +DECLARE_HYPERCALL_BOUNCE(idxs, size * sizeof(*idxs), >> XC_HYPERCALL_BUFFER_BOUNCE_IN); >> +DECLARE_HYPERCALL_BOUNCE(gpfns, size * sizeof(*gpfns), >> XC_HYPERCALL_BUFFER_BOUNCE_IN); >> +DECLARE_HYPERCALL_BOUNCE(errs, size * sizeof(*errs), >> XC_HYPERCALL_BUFFER_BOUNCE_OUT); >> + >> +struct xen_add_to_physmap_batch xatp_batch = { >> +.domid = domid, >> +.space = space, >> +.size = size, >> +.u = {.foreign_domid = foreign_domid} > > Coding style issue. Do you mean that I should add a space between '{' and '.' near ".u = {.foreign" in this line? > > Just a note, the struct is different for pre-4.7 and post-4.7 Xen. You > don't need to implement a version of this function for pre-4.7 Xen. > >> +}; >> + >> +if ( xc_hypercall_bounce_pre(xch, idxs) || >> + xc_hypercall_bounce_pre(xch, gpfns) || >> + xc_hypercall_bounce_pre(xch, errs) ) >> +{ >> +PERROR("Could not bounce memory for XENMEM_add_to_physmap_batch"); >> +goto out; > > rc will be uninitialised in this exit path. 
> >> +} >> + >> +set_xen_guest_handle(xatp_batch.idxs, idxs); >> +set_xen_guest_handle(xatp_batch.gpfns, gpfns); >> +set_xen_guest_handle(xatp_batch.errs, errs); >> + >> +rc = do_memory_op(xch, XENMEM_add_to_physmap_batch, >> + _batch, sizeof(xatp_batch)); >> + >> +out: >> +xc_hypercall_bounce_post(xch, idxs); >> +xc_hypercall_bounce_post(xch, gpfns); >> +xc_hypercall_bounce_post(xch, errs); >> + >> +return rc; >> +} >> + >> int xc_domain_claim_pages(xc_interface *xch, >> uint32_t domid, >> unsigned long nr_pages) >> -- >> 2.13.1 >> Cheers, Zhongze Liu ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/2] x86/altp2m: Add a hvmop for setting the suppress #VE bit
On Thu, Jun 22, 2017 at 06:13:22AM -0600, Jan Beulich wrote: > >>> On 22.06.17 at 14:04,wrote: > > On Fri, Jun 16, 2017 at 02:39:10AM -0600, Jan Beulich wrote: > >> >>> On 15.06.17 at 21:01, wrote: > >> > On Fri, Jun 9, 2017 at 10:51 AM, Adrian Pop wrote: > >> >> --- a/xen/arch/x86/mm/mem_access.c > >> >> +++ b/xen/arch/x86/mm/mem_access.c > >> >> @@ -466,6 +466,58 @@ int p2m_get_mem_access(struct domain *d, gfn_t > >> >> gfn, > > xenmem_access_t *access) > >> >> } > >> >> > >> >> /* > >> >> + * Set/clear the #VE suppress bit for a page. Only available on VMX. > >> >> + */ > >> >> +int p2m_set_suppress_ve(struct domain *d, gfn_t gfn, bool suppress_ve, > >> >> +unsigned int altp2m_idx) > >> >> +{ > >> >> +struct p2m_domain *host_p2m = p2m_get_hostp2m(d); > >> >> +struct p2m_domain *ap2m = NULL; > >> >> +struct p2m_domain *p2m; > >> >> +mfn_t mfn; > >> >> +p2m_access_t a; > >> >> +p2m_type_t t; > >> >> +int rc; > >> >> + > >> >> +if ( !cpu_has_vmx_virt_exceptions ) > >> >> +return -EOPNOTSUPP; > >> >> + > >> >> +/* This subop should only be used from a privileged domain. */ > >> >> +if ( !current->domain->is_privileged ) > >> >> +return -EINVAL; > >> > > >> > This check looks wrong to me. If this subop should only be used by an > >> > external (privileged) domain then I don't think this should be > >> > implemented as an HVMOP, looks more like a domctl to me. > >> > >> I think this wants to be an XSM_DM_PRIV check instead. > > > > I'm not sure, but I expect that to not behave as intended security-wise > > if Xen is compiled without XSM. Would it? It would be great if this > > feature worked well without XSM too. > > Well, without you explaining why you think this wouldn't work > without XSM, I don't really know what to answer. I suppose > you've grep-ed for other uses of this and/or other XSM_* values, > finding that these exist in various places where all is fine without > XSM? OK; it indeed does what it should without XSM as well. 
___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] VT-d: fix VF of RC integrated endpoint matched to wrong VT-d unit
On Thu, Jun 22, 2017 at 09:31:50AM -0600, Jan Beulich wrote: > >>> On 22.06.17 at 16:21,wrote: > > On Thu, Jun 22, 2017 at 03:26:04AM -0600, Jan Beulich wrote: > > On 21.06.17 at 12:47, wrote: > >>> The problem is a VF of RC integrated PF (e.g. PF's BDF is 00:02.0), > >>> we would wrongly use 00:00.0 to search VT-d unit. > >>> > >>> To search VT-d unit for a VF, the BDF of the PF is used. And If the > >>> PF is an Extended Function, the BDF of one traditional function is > >>> used. The following line (from acpi_find_matched_drhd_unit()): > >>> devfn = PCI_SLOT(pdev->info.physfn.devfn) ? 0 : > >>> pdev->info.physfn.devfn; > >>> sets 'devfn' to 0 if PF's devfn > 7. Apparently, it treats all > >>> PFs which has devfn > 7 as extended function. However, it is wrong for > >>> a RC integrated PF, which is not ARI-capable but may have devfn > 7. > >> > >>I'm again having trouble with you talking about ARI and RC > >>integrated here, but not checking for either in any way in the > >>new code. Please make sure you establish the full connection > >>in the description. > > > > Sorry for this. Let me explain this again. > > > > From SRIOV spec 3.7.3, it says: > > "ARI is not applicable to Root Complex Integrated Endpoints; all other > > SR-IOV Capable Devices (Devices that include at least one PF) shall > > implement the ARI Capability in each Function." > > > > So I _think_ PFs can be classified to two kinds: one is RC integrated > > PF and the other is non-RC integrated PF. The former can't support ARI. > > The latter shall support ARI. Only for extended functions, one > > traditional function's BDF should be used to search VT-d unit. And > > according to PCIE spec, Extended function means within an ARI Device, a > > Function whose Function Number is greater than 7. So the former > > can't be an extended function. The latter is an extended function as > > long as PF's devfn > 7, this check is exactly what the original code > > did. 
So I think the original code wasn't aware of the former > > (aka, RC integrated endpoints.). This patch checks the is_extfn > > directly. All of this is only my understanding. I need you and Kevin's > > help to decide it's right or not. > > This makes sense to me, but as said, the patch description will need > to include this in some form. > > >>> --- a/xen/drivers/passthrough/vtd/dmar.c > >>> +++ b/xen/drivers/passthrough/vtd/dmar.c > >>> @@ -218,8 +218,18 @@ struct acpi_drhd_unit > >>> *acpi_find_matched_drhd_unit(const > >>> struct pci_dev *pdev) > >>> } > >>> else if ( pdev->info.is_virtfn ) > >>> { > >>> +struct pci_dev *physfn; > >> > >>const > >> > >>> bus = pdev->info.physfn.bus; > >>> -devfn = PCI_SLOT(pdev->info.physfn.devfn) ? 0 : > >>> pdev->info.physfn.devfn; > >>> +/* > >>> + * Use 0 as 'devfn' to search VT-d unit when the physical > >>> function > >>> + * is an Extended Function. > >>> + */ > >>> +pcidevs_lock(); > >>> +physfn = pci_get_pdev(pdev->seg, bus, pdev->info.physfn.devfn); > >>> +pcidevs_unlock(); > >>> +ASSERT(physfn); > >>> +devfn = physfn->info.is_extfn ? 0 : pdev->info.physfn.devfn; > >> > >>This change looks to be fine if we assume that is_extfn is always > >>set correctly. Looking at the Linux code setting it, I'm not sure > >>though: I can't see any connection to the PF needing to be RC > >>integrated there. > > > > Linux code sets it when > > pci_ari_enabled(pci_dev->bus) && PCI_SLOT(pci_dev->devfn) > > > > I _think_ pci_ari_enabled(pci_dev->bus) means ARI forwarding is enabled > > in the immediately upstream Downstream port. Thus, I think the pci_dev > > is an ARI-capable device for PCIe spec 6.13 says: > > > > It is strongly recommended that software in general Set the ARI > > Forwarding Enable bit in a Downstream Port only if software is certain > > that the device immediately below the Downstream Port is an ARI Device. 
> > If the bit is Set when a non-ARI Device is present, the non-ARI Device > > can respond to Configuration Space accesses under what it interprets as > > being different Device Numbers, and its Functions can be aliased under > > multiple Device Numbers, generally leading to undesired behavior. > > > > and the pci_dev can't be a RC integrated endpoints. From another side, it > > also means the is_extfn won't be set for RC integrated PF. Is that > > right? > > Well, I'm not sure about the Linux parts here? Konrad, do you > happen to know? Or do you know someone who does? Including Govinda and Venu, > > >>I'd also suggest doing error handling not by ASSERT(), but by > >>checking physfn in the conditional expression. > > > > do you mean this: > > devfn = (physfn && physfn->info.is_extfn) ? 0 : pdev->info.physfn.devfn; > > Yes. > > Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org
Re: [Xen-devel] [PATCH] mini-os: use gzip -n
On Thu, Jun 22, 2017 at 03:55:21PM +0100, Andrew Cooper wrote: > On 22/06/17 15:09, Wei Liu wrote: > > Cc minios-devel and Samuel > > > > On Thu, Jun 22, 2017 at 03:40:26PM +0200, Bernhard M. Wiedemann wrote: > >> to not add current timestamp to > >> ioemu-stubdom.gz > >> pv-grub-x86_32.gz > >> pv-grub-x86_64.gz > >> xenstore-stubdom.gz > >> > >> to allow for reproducible builds > >> > >> Signed-off-by: Bernhard M. Wiedemann> > Acked-by: Wei Liu > > Would it make sense to have a $(GZIP) in the same way as we abstract out > other programs, and export GZIP = gzip -n ? Sure, that would be a nice thing to have. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] VT-d: fix VF of RC integrated endpoint matched to wrong VT-d unit
>>> On 22.06.17 at 16:21,wrote: > On Thu, Jun 22, 2017 at 03:26:04AM -0600, Jan Beulich wrote: > On 21.06.17 at 12:47, wrote: >>> The problem is a VF of RC integrated PF (e.g. PF's BDF is 00:02.0), >>> we would wrongly use 00:00.0 to search VT-d unit. >>> >>> To search VT-d unit for a VF, the BDF of the PF is used. And If the >>> PF is an Extended Function, the BDF of one traditional function is >>> used. The following line (from acpi_find_matched_drhd_unit()): >>> devfn = PCI_SLOT(pdev->info.physfn.devfn) ? 0 : pdev->info.physfn.devfn; >>> sets 'devfn' to 0 if PF's devfn > 7. Apparently, it treats all >>> PFs which has devfn > 7 as extended function. However, it is wrong for >>> a RC integrated PF, which is not ARI-capable but may have devfn > 7. >> >>I'm again having trouble with you talking about ARI and RC >>integrated here, but not checking for either in any way in the >>new code. Please make sure you establish the full connection >>in the description. > > Sorry for this. Let me explain this again. > > From SRIOV spec 3.7.3, it says: > "ARI is not applicable to Root Complex Integrated Endpoints; all other > SR-IOV Capable Devices (Devices that include at least one PF) shall > implement the ARI Capability in each Function." > > So I _think_ PFs can be classified to two kinds: one is RC integrated > PF and the other is non-RC integrated PF. The former can't support ARI. > The latter shall support ARI. Only for extended functions, one > traditional function's BDF should be used to search VT-d unit. And > according to PCIE spec, Extended function means within an ARI Device, a > Function whose Function Number is greater than 7. So the former > can't be an extended function. The latter is an extended function as > long as PF's devfn > 7, this check is exactly what the original code > did. So I think the original code didn't aware the former > (aka, RC integrated endpoints.). This patch checks the is_extfn > directly. All of this is only my understanding. 
I need you and Kevin's > help to decide it's right or not. This makes sense to me, but as said, the patch description will need to include this in some form. >>> --- a/xen/drivers/passthrough/vtd/dmar.c >>> +++ b/xen/drivers/passthrough/vtd/dmar.c >>> @@ -218,8 +218,18 @@ struct acpi_drhd_unit >>> *acpi_find_matched_drhd_unit(const >>> struct pci_dev *pdev) >>> } >>> else if ( pdev->info.is_virtfn ) >>> { >>> +struct pci_dev *physfn; >> >>const >> >>> bus = pdev->info.physfn.bus; >>> -devfn = PCI_SLOT(pdev->info.physfn.devfn) ? 0 : >>> pdev->info.physfn.devfn; >>> +/* >>> + * Use 0 as 'devfn' to search VT-d unit when the physical function >>> + * is an Extended Function. >>> + */ >>> +pcidevs_lock(); >>> +physfn = pci_get_pdev(pdev->seg, bus, pdev->info.physfn.devfn); >>> +pcidevs_unlock(); >>> +ASSERT(physfn); >>> +devfn = physfn->info.is_extfn ? 0 : pdev->info.physfn.devfn; >> >>This change looks to be fine if we assume that is_extfn is always >>set correctly. Looking at the Linux code setting it, I'm not sure >>though: I can't see any connection to the PF needing to be RC >>integrated there. > > Linux code sets it when > pci_ari_enabled(pci_dev->bus) && PCI_SLOT(pci_dev->devfn) > > I _think_ pci_ari_enabled(pci_dev->bus) means ARI forwarding is enabled > in the immediately upstream Downstream port. Thus, I think the pci_dev > is an ARI-capable device for PCIe spec 6.13 says: > > It is strongly recommended that software in general Set the ARI > Forwarding Enable bit in a Downstream Port only if software is certain > that the device immediately below the Downstream Port is an ARI Device. > If the bit is Set when a non-ARI Device is present, the non-ARI Device > can respond to Configuration Space accesses under what it interprets as > being different Device Numbers, and its Functions can be aliased under > multiple Device Numbers, generally leading to undesired behavior. > > and the pci_dev can't be an RC integrated endpoint. 
From another side, it > also means the is_extfn won't be set for RC integrated PF. Is that > right? Well, I'm not sure about the Linux parts here? Konrad, do you happen to know? Or do you know someone who does? >>I'd also suggest doing error handling not by ASSERT(), but by >>checking physfn in the conditional expression. > > do you mean this: > devfn = (physfn && physfn->info.is_extfn) ? 0 : pdev->info.physfn.devfn; Yes. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.9 v2] xen/livepatch: Don't crash on encountering STN_UNDEF relocations
On Wed, Jun 21, 2017 at 09:26:15PM -0400, Konrad Rzeszutek Wilk wrote: > On Wed, Jun 21, 2017 at 07:13:36PM +0100, Andrew Cooper wrote: > > A symndx of STN_UNDEF is special, and means a symbol value of 0. While > > legitimate in the ELF standard, its existance in a livepatch is questionable > > at best. Until a plausible usecase presents itself, reject such a > > relocation > > with -EOPNOTSUPP. > > > > Additionally, perform a safety check on elf->sym[symndx].sym before > > derefencing it, to avoid tripping over a NULL pointer when calculating val. > > > > Signed-off-by: Andrew Cooper> > Reviewed-by: Konrad Rzeszutek Wilk > Tested-by: Konrad Rzeszutek Wilk [x86 right now, > will do > arm32 tomorrow] I did that on my Cubietruck and I made the rookie mistake of not trying a hypervisor _without_ your changes, so I don't know if this crash (see inline) is due to your patch or something else. Also I messed up and made the livepatch test run every time it boots, so now it is stuck in a loop of crashes :-( The git tree is: git://xenbits.xen.org/people/konradwilk/xen.git staging-4.9 Stay tuned. U-Boot SPL 2015.04 (Mar 14 2016 - 12:00:28) DRAM: 2048 MiB CPU: 91200Hz, AXI/AHB/APB: 3/2/2 U-Boot 2015.04 (Mar 14 2016 - 12:00:28) Allwinner Technology CPU: Allwinner A20 (SUN7I) I2C: ready DRAM: 2 GiB MMC: SUNXI SD/MMC: 0 Setting up a 1024x768 vga console In:serial Out: vga Err: vga SCSI: SUNXI SCSI INIT SATA link 0 timeout. AHCI 0001.0100 32 slots 1 ports 3 Gbps 0x1 impl SATA mode flags: ncq stag pm led clo only pmp pio slum part ccc apst Net: dwmac.1c5 starting USB... USB0: USB EHCI 1.00 scanning bus 0 for devices... 1 USB Device(s) found USB1: USB EHCI 1.00 scanning bus 1 for devices... 1 USB Device(s) found scanning usb for storage devices... 0 Storage Device(s) found Hit any key to stop autoboot: 2 1 0 switch to partitions #0, OK mmc0 is current device Scanning mmc 0:1... 
Found U-Boot script /boot.scr reading /boot.scr 1629 bytes read in 22 ms (72.3 KiB/s) ## Executing script at 4310 reading /xen 884744 bytes read in 72 ms (11.7 MiB/s) reading /sun7i-a20-cubietruck.dtb 30801 bytes read in 42 ms (715.8 KiB/s) reading /vmlinuz 5662136 bytes read in 382 ms (14.1 MiB/s) Kernel image @ 0xaea0 [ 0x00 - 0x11b700 ] ## Flattened Device Tree blob at aec0 Booting using the fdt blob at 0xaec0 reserving fdt memory region: addr=aec0 size=8000 Using Device Tree in place at aec0, end aec0afff Starting kernel ... Xen 4.9-rc (XEN) Xen version 4.9-rc (kon...@dumpdata.com) (arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609) debug=y Wed Jun 21 21:55:01 EDT 2017 (XEN) Latest ChangeSet: Wed Jun 21 19:13:36 2017 +0100 git:e199fd6 (XEN) Processor: 410fc074: "ARM Limited", variant: 0x0, part 0xc07, rev 0x4 (XEN) 32-bit Execution: (XEN) Processor Features: 1131:00011011 (XEN) Instruction Sets: AArch32 A32 Thumb Thumb-2 ThumbEE Jazelle (XEN) Extensions: GenericTimer Security (XEN) Debug Features: 02010555 (XEN) Auxiliary Features: (XEN) Memory Model Features: 10101105 4000 0124 02102211 (XEN) ISA Features: 02101110 13112111 21232041 2131 10011142 (XEN) Using PSCI-0.1 for SMP bringup (XEN) SMP: Allowing 2 CPUs (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27 Freq: 24000 KHz (XEN) GICv2: WARNING: The GICC size is too small: 0x1000 expected 0x2000 (XEN) GICv2 initialization: (XEN) gic_dist_addr=01c81000 (XEN) gic_cpu_addr=01c82000 (XEN) gic_hyp_addr=01c84000 (XEN) gic_vcpu_addr=01c86000 (XEN) gic_maintenance_irq=25 (XEN) GICv2: 160 lines, 2 cpus, secure (IID 0100143b). (XEN) Using scheduler: SMP Credit Scheduler (credit) (XEN) Allocated console ring of 16 KiB. (XEN) VFP implementer 0x41 architecture 2 part 0x30 variant 0x7 rev 0x4 (XEN) Bringing up CPU1 (XEN) CPU 1 booted. 
(XEN) Brought up 2 CPUs (XEN) P2M: 40-bit IPA (XEN) P2M: 3 levels with order-1 root, VTCR 0x80003558 (XEN) I/O virtualisation disabled (XEN) build-id: d406e500724be7c1443df04d783419bc70fa75b9 (XEN) alternatives: Patching with alt table 100c1464 -> 100c1494 (XEN) *** LOADING DOMAIN 0 *** (XEN) Loading kernel from boot module @ af60 (XEN) Allocating 1:1 mappings totalling 512MB for dom0: (XEN) BANK[0] 0x006000-0x008000 (512MB) (XEN) Grant table range: 0x00bfa0-0x00bfa6d000 (XEN) Loading zImage from af60 to 67a0-67f665b8 (XEN) Allocating PPI 16 for event channel interrupt (XEN) Loading dom0 DTB to 0x6800-0x680072e0 (XEN) Scrubbing Free RAM on 1 nodes using 2 CPUs (XEN) done. (XEN) Initial low memory virq threshold set at 0x4000 pages. (XEN) Std. Loglevel: All (XEN) Guest Loglevel: All (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch
[Xen-devel] [qemu-upstream-unstable test] 110938: regressions - FAIL
flight 110938 qemu-upstream-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/110938/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl-qemuu-win7-amd64 15 guest-localmigrate/x10 fail REGR. vs. 106833 test-armhf-armhf-xl-credit2 15 guest-start/debian.repeat fail REGR. vs. 106833 Regressions which are regarded as allowable (not blocking): test-armhf-armhf-xl-rtds15 guest-start/debian.repeat fail REGR. vs. 106833 Tests which did not succeed, but are not blocking: test-armhf-armhf-libvirt-raw 12 saverestore-support-checkfail like 106813 test-armhf-armhf-libvirt-xsm 13 saverestore-support-checkfail like 106833 test-armhf-armhf-libvirt 13 saverestore-support-checkfail like 106833 test-amd64-amd64-xl-qemuu-ws16-amd64 9 windows-installfail never pass test-amd64-amd64-libvirt 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass test-amd64-i386-libvirt 12 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 12 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 13 saverestore-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass test-arm64-arm64-xl 12 migrate-support-checkfail never pass test-arm64-arm64-xl 13 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 12 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 12 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail never pass 
test-armhf-armhf-xl-arndale 12 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 13 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail never pass test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2 fail never pass test-armhf-armhf-xl-rtds 12 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 12 migrate-support-checkfail never pass test-armhf-armhf-xl 12 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 13 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt 12 migrate-support-checkfail never pass test-armhf-armhf-xl-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-xl-xsm 13 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 11 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 12 saverestore-support-checkfail never pass test-amd64-i386-xl-qemuu-win10-i386 9 windows-install fail never pass test-amd64-i386-xl-qemuu-ws16-amd64 9 windows-install fail never pass test-amd64-amd64-xl-qemuu-win10-i386 9 windows-installfail never pass version targeted for testing: qemuu414d069b38ab114b89085e44989bf57604ea86d7 baseline version: qemuue97832ec6b2a7ddd48b8e6d1d848ffdfee6a31c7 Last test of basis 106833 2017-03-22 07:02:01 Z 92 days Testing same since 110938 2017-06-21 15:39:52 Z0 days1 attempts People who touched revisions under test: Anthony PERARDJan Beulich jobs: build-amd64-xsm pass build-arm64-xsm pass build-armhf-xsm pass build-i386-xsm pass build-amd64 pass build-arm64 pass build-armhf pass build-i386
Re: [Xen-devel] [PATCH net] xen-netback: correctly schedule rate-limited queues
From: Wei Liu
Date: Wed, 21 Jun 2017 10:21:22 +0100

> Add a flag to indicate if a queue is rate-limited. Test the flag in
> NAPI poll handler and avoid rescheduling the queue if true, otherwise
> we risk locking up the host. The rescheduling will be done in the
> timer callback function.
>
> Reported-by: Jean-Louis Dupond
> Signed-off-by: Wei Liu
> Tested-by: Jean-Louis Dupond

Applied.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] mini-os: use gzip -n
On 22/06/17 15:09, Wei Liu wrote:
> Cc minios-devel and Samuel
>
> On Thu, Jun 22, 2017 at 03:40:26PM +0200, Bernhard M. Wiedemann wrote:
>> to not add the current timestamp to
>> ioemu-stubdom.gz
>> pv-grub-x86_32.gz
>> pv-grub-x86_64.gz
>> xenstore-stubdom.gz
>>
>> to allow for reproducible builds
>>
>> Signed-off-by: Bernhard M. Wiedemann
> Acked-by: Wei Liu

Would it make sense to have a $(GZIP) variable, in the same way as we abstract out other programs, and export GZIP = gzip -n ?

~Andrew
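A minimal sketch of the abstraction Andrew suggests, assuming a Config.mk-style build system; the variable name GZIP_PROG and the pattern rule are illustrative, not the actual mini-os makefiles:

```make
# Hypothetical fragment: name the compressor once so every stubdom image
# is compressed the same way, with -n keeping name/timestamp metadata out
# of the .gz header for reproducible output.
GZIP_PROG ?= gzip -n

%.gz: %
	$(GZIP_PROG) -f -c $< > $@
```

One caveat with literally exporting a variable named GZIP: gzip(1) also reads a GZIP environment variable as a list of extra options, so an exported `GZIP = gzip -n` would be seen by gzip itself as bogus option text; a distinct name such as GZIP_PROG sidesteps that collision.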
Re: [Xen-devel] live migration of HVM domUs with more than 32 vcpus fails
On 06/22/2017 10:39 AM, Olaf Hering wrote:
> On Thu, Jun 22, Konrad Rzeszutek Wilk wrote:
>
>> On Thu, Jun 22, 2017 at 03:57:52PM +0200, Olaf Hering wrote:
>>> It seems that live migration of HVM domUs with more than 32 vcpus causes
>>> a hang of the domU on the remote side. Both ping and 'xl console' show no
>>> reaction.
>>> This happens also with kernel-4.12. Is this a known bug?
>> Ankur had some patches for more than 32 vCPUs.
> Great, where can I get a copy?

They are queued for 4.13:
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git for-linus-4.13

-boris
Re: [Xen-devel] live migration of HVM domUs with more than 32 vcpus fails
On Thu, Jun 22, Konrad Rzeszutek Wilk wrote:
> On Thu, Jun 22, 2017 at 03:57:52PM +0200, Olaf Hering wrote:
>> It seems that live migration of HVM domUs with more than 32 vcpus causes
>> a hang of the domU on the remote side. Both ping and 'xl console' show no
>> reaction.
>> This happens also with kernel-4.12. Is this a known bug?
>
> Ankur had some patches for more than 32 vCPUs.

Great, where can I get a copy?

Olaf
Re: [Xen-devel] live migration of HVM domUs with more than 32 vcpus fails
On Thu, Jun 22, 2017 at 03:57:52PM +0200, Olaf Hering wrote:
> It seems that live migration of HVM domUs with more than 32 vcpus causes
> a hang of the domU on the remote side. Both ping and 'xl console' show no
> reaction.
> This happens also with kernel-4.12. Is this a known bug?

Ankur had some patches for more than 32 vCPUs.
Re: [Xen-devel] [PATCH v2] VT-d: fix VF of RC integrated endpoint matched to wrong VT-d unit
On Thu, Jun 22, 2017 at 03:26:04AM -0600, Jan Beulich wrote:
> On 21.06.17 at 12:47, wrote:
>> The problem is a VF of an RC integrated PF (e.g. the PF's BDF is 00:02.0):
>> we would wrongly use 00:00.0 to search the VT-d unit.
>>
>> To search the VT-d unit for a VF, the BDF of the PF is used. And if the
>> PF is an Extended Function, the BDF of one traditional function is
>> used. The following line (from acpi_find_matched_drhd_unit()):
>>   devfn = PCI_SLOT(pdev->info.physfn.devfn) ? 0 : pdev->info.physfn.devfn;
>> sets 'devfn' to 0 if the PF's devfn > 7. Apparently, it treats all
>> PFs which have devfn > 7 as Extended Functions. However, that is wrong
>> for an RC integrated PF, which is not ARI-capable but may have devfn > 7.
>
> I'm again having trouble with you talking about ARI and RC
> integrated here, but not checking for either in any way in the
> new code. Please make sure you establish the full connection
> in the description.

Sorry for this. Let me explain this again. The SR-IOV spec 3.7.3 says:

"ARI is not applicable to Root Complex Integrated Endpoints; all other SR-IOV Capable Devices (Devices that include at least one PF) shall implement the ARI Capability in each Function."

So I _think_ PFs can be classified into two kinds: RC integrated PFs and non-RC integrated PFs. The former can't support ARI; the latter shall support ARI. Only for Extended Functions should one traditional function's BDF be used to search the VT-d unit. And according to the PCIe spec, an Extended Function means, within an ARI Device, a Function whose Function Number is greater than 7. So the former can never be an Extended Function, while the latter is an Extended Function whenever its devfn > 7 -- which is exactly what the original code checked. So I think the original code wasn't aware of the former kind (i.e. RC integrated endpoints). This patch checks is_extfn directly.

All of this is only my understanding. I need your and Kevin's help to decide whether it's right or not.
>
>> --- a/xen/drivers/passthrough/vtd/dmar.c
>> +++ b/xen/drivers/passthrough/vtd/dmar.c
>> @@ -218,8 +218,18 @@ struct acpi_drhd_unit *acpi_find_matched_drhd_unit(const struct pci_dev *pdev)
>>      }
>>      else if ( pdev->info.is_virtfn )
>>      {
>> +        struct pci_dev *physfn;
>
> const
>
>>          bus = pdev->info.physfn.bus;
>> -        devfn = PCI_SLOT(pdev->info.physfn.devfn) ? 0 : pdev->info.physfn.devfn;
>> +        /*
>> +         * Use 0 as 'devfn' to search VT-d unit when the physical function
>> +         * is an Extended Function.
>> +         */
>> +        pcidevs_lock();
>> +        physfn = pci_get_pdev(pdev->seg, bus, pdev->info.physfn.devfn);
>> +        pcidevs_unlock();
>> +        ASSERT(physfn);
>> +        devfn = physfn->info.is_extfn ? 0 : pdev->info.physfn.devfn;
>
> This change looks to be fine if we assume that is_extfn is always
> set correctly. Looking at the Linux code setting it, I'm not sure
> though: I can't see any connection to the PF needing to be RC
> integrated there.

The Linux code sets it when:

    pci_ari_enabled(pci_dev->bus) && PCI_SLOT(pci_dev->devfn)

I _think_ pci_ari_enabled(pci_dev->bus) means ARI Forwarding is enabled in the immediately upstream Downstream Port. Thus, I think pci_dev must be an ARI-capable device, for PCIe spec 6.13 says:

"It is strongly recommended that software in general Set the ARI Forwarding Enable bit in a Downstream Port only if software is certain that the device immediately below the Downstream Port is an ARI Device. If the bit is Set when a non-ARI Device is present, the non-ARI Device can respond to Configuration Space accesses under what it interprets as being different Device Numbers, and its Functions can be aliased under multiple Device Numbers, generally leading to undesired behavior."

and pci_dev can't be an RC integrated endpoint. Seen from the other side, it also means is_extfn won't be set for an RC integrated PF. Is that right?

> I'd also suggest doing error handling not by ASSERT(), but by
> checking physfn in the conditional expression.
Do you mean this:

    devfn = (physfn && physfn->info.is_extfn) ? 0 : pdev->info.physfn.devfn;

Thanks
Chao