Re: [Xen-devel] [PATCH] xen-blkfront: fix mq start/stop race

2017-06-22 Thread Junxiao Bi
Hi Boris & Juergen,

Could you help review this patch? This is a race and will cause I/O to hang.

Thanks,
Junxiao.

On 06/22/2017 09:36 AM, Junxiao Bi wrote:
> When the ring buffer is full, the hw queue will be stopped. When the blkif interrupt
> consumes requests and frees space in the ring buffer, the hw queue will be started
> again. But since starting the queue is protected by the spin lock while stopping it
> is not, this causes a race.
> 
> interrupt:                                  process:
> blkif_interrupt()                           blkif_queue_rq()
>  kick_pending_request_queues_locked()
>   blk_mq_start_stopped_hw_queues()
>    clear_bit(BLK_MQ_S_STOPPED, &hctx->state)
>                                              blk_mq_stop_hw_queue(hctx)
>    blk_mq_run_hw_queue(hctx, async)
> 
> If the ring buffer happens to be emptied in this window, no further interrupt will
> come, so the hw queue stays stopped forever and all processes waiting for the
> pending I/O in the queue will hang.
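For reference, an annotated version of the fixed slow path (mirroring the diff below, with nothing new added): stopping the hw queue while the ring lock is still held closes the window in which blkif_interrupt() could consume the ring and restart the queue between the stop decision and its effect.

    out_busy:
        /*
         * Stop the queue while still holding rinfo->ring_lock, so that
         * blkif_interrupt() -> kick_pending_request_queues_locked() cannot
         * slip in between and have its restart overridden by this stop.
         */
        blk_mq_stop_hw_queue(hctx);
        spin_unlock_irqrestore(&rinfo->ring_lock, flags);
        /* From here on the interrupt handler may legitimately restart the queue. */
        return BLK_MQ_RQ_QUEUE_BUSY;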
> 
> Signed-off-by: Junxiao Bi 
> Reviewed-by: Ankur Arora 
> ---
>  drivers/block/xen-blkfront.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 8bb160cd00e1..4767b82b2cf6 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -912,8 +912,8 @@ out_err:
>   return BLK_MQ_RQ_QUEUE_ERROR;
>  
>  out_busy:
> - spin_unlock_irqrestore(&rinfo->ring_lock, flags);
>   blk_mq_stop_hw_queue(hctx);
> + spin_unlock_irqrestore(&rinfo->ring_lock, flags);
>   return BLK_MQ_RQ_QUEUE_BUSY;
>  }
>  
> 




Re: [Xen-devel] [PATCH v3 3/3] VT-d PI: restrict the vcpu number on a given pcpu

2017-06-22 Thread Chao Gao
On Fri, Jun 16, 2017 at 09:09:13AM -0600, Jan Beulich wrote:
 On 24.05.17 at 08:56,  wrote:
>> Currently, a blocked vCPU is put on its pCPU's PI blocking list. If
>> too many vCPUs are blocked on a given pCPU, the list can grow very
>> long. As a simple worst-case analysis, with 32k domains and
>> 128 vCPUs per domain, about 4M vCPUs may be blocked in one pCPU's
>> PI blocking list. When a wakeup interrupt arrives, the list is
>> traversed to find the specific vCPUs to wake up, and in that case the
>> traversal would consume a lot of time.
>> 
>> To mitigate this issue, this patch limits the vcpu number on a given
>> pCPU,
>
>This would be a bug, but I think it's the description which is wrong
>(or at least imprecise): You don't limit the number of vCPU-s _run_
>on any pCPU, but those tracked on any pCPU-s blocking list. Please
>say so here to avoid confusion.

Agree.

>
>> taking factors such as performance in the common case, the current hvm vcpu
>> count and the current pcpu count into consideration. With this method, the
>> common case stays fast, and for extreme cases the list
>> length is kept under control.
>> 
>> The change in vmx_pi_unblock_vcpu() is for the following case:
>> vcpu is running -> try to block (this patch may change NSDT to
>> another pCPU) but notification comes in time, thus the vcpu
>
>What does "but notification comes in time" mean?
>

I mean the case when local_events_need_delivery() in vcpu_block() returns true.

>> goes back to running station -> VM-entry (we should set NSDT again,
>
>s/station/state/ ?
>
>> reverting the change we make to NSDT in vmx_vcpu_block())
>
>Overall I'm not sure I really understand what you try to explain
>here.

Will put it above the related change.

I wanted to explain why we need this change once a vcpu can be added to a
remote pcpu's blocking list (i.e. a pcpu the vcpu isn't currently running on).

A vcpu may go through two different paths from calling vcpu_block() to
VM-entry:
Path1: vcpu_block() -> vmx_vcpu_block() -> local_events_need_delivery()
returns true -> vmx_pi_unblock_vcpu() (during VM-entry)
Path2: vcpu_block() -> vmx_vcpu_block() -> local_events_need_delivery()
returns false -> vmx_pi_switch_from() -> vmx_pi_switch_to()
-> vmx_pi_unblock_vcpu() (during VM-entry)

Because migrating a vcpu to another pcpu would leave an incorrect
pi_desc->ndst, vmx_pi_switch_to() re-assigns pi_desc->ndst. That was enough
for Path1 (nothing changed the pi_desc->ndst field or the binding between
pcpu and vcpu) and Path2. But now vmx_vcpu_block() may change pi_desc->ndst
to another pcpu in order to receive the wakeup interrupt there. So if
local_events_need_delivery() returns true, we have to correct pi_desc->ndst
back to the current pcpu in vmx_pi_unblock_vcpu().
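To make this concrete, a hedged sketch (illustrative, not the exact Xen hunk) of what
vmx_pi_unblock_vcpu() has to do on that aborted-block path: rewrite pi_desc->ndst so the
posted-interrupt notification targets the pCPU the vCPU is about to re-enter on, undoing
the remote destination that vmx_vcpu_block() may have installed.

    /* Sketch only: point the PI notification destination back at the local pCPU. */
    unsigned int dest = cpu_physical_id(v->processor);

    write_atomic(&pi_desc->ndst,
                 x2apic_enabled ? dest
                                : MASK_INSR(dest, PI_xAPIC_NDST_MASK));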

>
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -100,16 +100,62 @@ void vmx_pi_per_cpu_init(unsigned int cpu)
>>  spin_lock_init(&per_cpu(vmx_pi_blocking, cpu).lock);
>>  }
>>  
>> +/*
>> + * By default, the local pcpu (means the one the vcpu is currently running 
>> on)
>> + * is chosen as the destination of wakeup interrupt. But if the vcpu number 
>> of
>> + * the pcpu exceeds a limit, another pcpu is chosen until we find a suitable
>> + * one.
>> + *
>> + * Currently, choose (v_tot/p_tot) + K as the limit of vcpu count, where
>> + * v_tot is the total number of hvm vcpus on the system, p_tot is the total
>> number of pcpus in the system, and K is a fixed number. Experiments show
>> + * the maximum time to wakeup a vcpu from a 128-entry blocking list is about
>> + * 22us, which is tolerable. So choose 128 as the fixed number K.
>
>Giving any kind of absolute time value requires also stating on what
>hardware this was measured.
>
>> + * This policy makes sure:
>> + * 1) for common cases, the limit won't be reached and the local pcpu is 
>> used
>> + * which is beneficial to performance (at least, avoid an IPI when 
>> unblocking
>> + * vcpu).
>> + * 2) for the worst case, the blocking list length scales with the vcpu 
>> count
>> + * divided by the pcpu count.
>> + */
>> +#define PI_LIST_FIXED_NUM 128
>> +#define PI_LIST_LIMIT (atomic_read(&num_hvm_vcpus) / num_online_cpus() + \
>> +   PI_LIST_FIXED_NUM)
>> +
>> +static bool pi_over_limit(int count)
>
>Can a caller validly pass a negative argument? Otherwise unsigned int
>please.
>
>> +{
>> +/* Compare w/ constant first to save an atomic read in the common case */
>
>As an atomic read is just a normal read on x86, does this really matter?

agree.

>
>> +return ((count > PI_LIST_FIXED_NUM) &&
>> +(count > (atomic_read(&num_hvm_vcpus) / num_online_cpus()) +
>> +PI_LIST_FIXED_NUM));
>
>Right above you've #define-d PI_LIST_LIMIT - why do you open code
>it here? Also note that the outer pair of parentheses is pointless (and
>hampering readability).
>
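A hedged sketch of the helper after folding in both comments (unsigned parameter, and
reusing PI_LIST_LIMIT instead of open-coding it); the counter name num_hvm_vcpus is
inferred from the partially garbled quote above:

    #define PI_LIST_FIXED_NUM 128
    #define PI_LIST_LIMIT     (atomic_read(&num_hvm_vcpus) / num_online_cpus() + \
                               PI_LIST_FIXED_NUM)

    static bool pi_over_limit(unsigned int count)
    {
        /* Cheap constant comparison first; only then evaluate the full limit. */
        return count > PI_LIST_FIXED_NUM && count > PI_LIST_LIMIT;
    }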
>>  static void vmx_vcpu_block(struct vcpu *v)
>>  {
>>  unsigned long flags;
>> -unsigned int dest;
>> +unsigned int dest, 

[Xen-devel] [linux-linus test] 110950: regressions - FAIL

2017-06-22 Thread osstest service owner
flight 110950 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/110950/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemuu-ovmf-amd64 14 guest-saverestore.2 fail REGR. vs. 
110515
 test-amd64-amd64-xl-qemuu-win7-amd64 15 guest-localmigrate/x10 fail REGR. vs. 
110515

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-start/win.repeat fail blocked in 
110515
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-checkfail  like 110515
 test-armhf-armhf-libvirt 13 saverestore-support-checkfail  like 110515
 test-armhf-armhf-xl-rtds 15 guest-start/debian.repeatfail  like 110515
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 110515
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail like 110515
 test-amd64-amd64-xl-rtds  9 debian-install   fail  like 110515
 test-armhf-armhf-libvirt-raw 12 saverestore-support-checkfail  like 110515
 test-amd64-amd64-xl-qemut-ws16-amd64  9 windows-installfail never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-xsm  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64 12 guest-saverestore   fail never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64  9 windows-installfail never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 12 guest-saverestore   fail never pass
 test-amd64-amd64-xl-qemut-win10-i386  9 windows-installfail never pass
 test-amd64-i386-xl-qemuu-win10-i386  9 windows-install fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386  9 windows-installfail never pass
 test-amd64-i386-xl-qemut-win10-i386  9 windows-install fail never pass

version targeted for testing:
 linux                48b6bbef9a1789f0365c1a385879a1fea4460016
baseline version:
 linux                1439ccf73d9c07654fdd5b4969fd53c2feb8684d

Last test of basis   110515  2017-06-17 06:48:56 Z    5 days
Failing since        110536  2017-06-17 23:48:13 Z    5 days    6 attempts
Testing same since   110950  2017-06-21 22:17:11 Z    1 days    1 attempts


People who 

Re: [Xen-devel] [PATCH for-4.9 v3 3/3] xen/livepatch: Don't crash on encountering STN_UNDEF relocations

2017-06-22 Thread Konrad Rzeszutek Wilk
On Thu, Jun 22, 2017 at 07:15:29PM +0100, Andrew Cooper wrote:
> A symndx of STN_UNDEF is special, and means a symbol value of 0.  While
> legitimate in the ELF standard, its existence in a livepatch is questionable
> at best.  Until a plausible usecase presents itself, reject such a relocation
> with -EOPNOTSUPP.
> 
> Additionally, fix an off-by-one error while range checking symndx, and perform
> a safety check on elf->sym[symndx].sym before dereferencing it, to avoid
> tripping over a NULL pointer when calculating val.
> 
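A hedged sketch of the three checks described above (field names follow the livepatch
code quoted elsewhere in this series; this is not the verbatim hunk):

    if ( symndx == STN_UNDEF )
    {
        dprintk(XENLOG_ERR, LIVEPATCH "%s: Encountered STN_UNDEF relocation\n",
                elf->name);
        return -EOPNOTSUPP;
    }
    else if ( symndx >= elf->nsym )   /* '>=', not '>': the off-by-one fix */
    {
        dprintk(XENLOG_ERR, LIVEPATCH "%s: Relocation wants symbol@%u which is past end\n",
                elf->name, symndx);
        return -EINVAL;
    }
    else if ( !elf->sym[symndx].sym ) /* avoid dereferencing a NULL symbol pointer */
    {
        dprintk(XENLOG_ERR, LIVEPATCH "%s: No symbol@%u\n", elf->name, symndx);
        return -EINVAL;
    }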
> Signed-off-by: Andrew Cooper 
> ---
> CC: Konrad Rzeszutek Wilk 

Reviewed-by: Konrad Rzeszutek Wilk 
Tested-by: Konrad Rzeszutek Wilk  [arm32 and x86]



Re: [Xen-devel] [PATCH for-4.9 v3 2/3] xen/livepatch: Use zeroed memory allocations for arrays

2017-06-22 Thread Konrad Rzeszutek Wilk
On Thu, Jun 22, 2017 at 07:15:28PM +0100, Andrew Cooper wrote:
> Each of these arrays is sparse.  Use zeroed allocations to cause uninitialised
> array elements to contain deterministic values, most importantly for the
> embedded pointers.
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Konrad Rzeszutek Wilk 

Reviewed-by: Konrad Rzeszutek Wilk 

Tested-by: Konrad Rzeszutek Wilk 
[x86 and ARM32]
> CC: Ross Lagerwall 
> 
> * new in v3
> ---
>  xen/common/livepatch.c | 4 ++--
>  xen/common/livepatch_elf.c | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
> index df67a1a..66d532d 100644
> --- a/xen/common/livepatch.c
> +++ b/xen/common/livepatch.c
> @@ -771,8 +771,8 @@ static int build_symbol_table(struct payload *payload,
>  }
>  }
>  
> -symtab = xmalloc_array(struct livepatch_symbol, nsyms);
> -strtab = xmalloc_array(char, strtab_len);
> +symtab = xzalloc_array(struct livepatch_symbol, nsyms);
> +strtab = xzalloc_array(char, strtab_len);
>  
>  if ( !strtab || !symtab )
>  {
> diff --git a/xen/common/livepatch_elf.c b/xen/common/livepatch_elf.c
> index c4a9633..b69e271 100644
> --- a/xen/common/livepatch_elf.c
> +++ b/xen/common/livepatch_elf.c
> @@ -52,7 +52,7 @@ static int elf_resolve_sections(struct livepatch_elf *elf, 
> const void *data)
>  int rc;
>  
>  /* livepatch_elf_load sanity checked e_shnum. */
> -sec = xmalloc_array(struct livepatch_elf_sec, elf->hdr->e_shnum);
> +sec = xzalloc_array(struct livepatch_elf_sec, elf->hdr->e_shnum);
>  if ( !sec )
>  {
>  dprintk(XENLOG_ERR, LIVEPATCH"%s: Could not allocate memory for 
> section table!\n",
> @@ -225,7 +225,7 @@ static int elf_get_sym(struct livepatch_elf *elf, const 
> void *data)
>  /* No need to check values as elf_resolve_sections did it. */
>  nsym = symtab_sec->sec->sh_size / symtab_sec->sec->sh_entsize;
>  
> -sym = xmalloc_array(struct livepatch_elf_sym, nsym);
> +sym = xzalloc_array(struct livepatch_elf_sym, nsym);
>  if ( !sym )
>  {
>  dprintk(XENLOG_ERR, LIVEPATCH "%s: Could not allocate memory for 
> symbols\n",
> -- 
> 2.1.4
> 



Re: [Xen-devel] [PATCH] xen: Replace ASSERT(0) with ASSERT_UNREACHABLE()

2017-06-22 Thread Konrad Rzeszutek Wilk
On Wed, Jun 21, 2017 at 01:40:45PM +0100, Andrew Cooper wrote:
> No functional change, but the result is more informative both in the code and
> error messages if the assertions do get hit.
> 
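For context, a paraphrased (not verbatim) illustration of why the report becomes more
informative:

    /* Paraphrased macro shapes, for illustration only. */
    #define ASSERT(p)            do { if ( unlikely(!(p)) ) assert_failed(#p); } while (0)
    #define ASSERT_UNREACHABLE() assert_failed("unreachable")

    ASSERT(0);            /* panic message stringifies the predicate: just "0"    */
    ASSERT_UNREACHABLE(); /* panic message reads "unreachable" -- self-describing */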
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Konrad Rzeszutek Wilk 

Acked-by: Konrad Rzeszutek Wilk 



Re: [Xen-devel] Travis build failing because "tools/xen-detect: try sysfs node for obtaining guest type" ?

2017-06-22 Thread Konrad Rzeszutek Wilk
On Thu, Jun 22, 2017 at 07:31:53PM +0200, Dario Faggioli wrote:
> Hey,
> 
> Am I the only one for which Travis seems to be unhappy of this:

Nope. I saw it too, but then figured there was some patch from Olaf for this?

> 
> I/home/travis/build/fdario/xen/tools/misc/../../tools/include  
> xen-detect.c   -o xen-detect
> xen-detect.c: In function ‘check_sysfs’:
> xen-detect.c:196:17: error: ignoring return value of ‘asprintf’, declared 
> with attribute warn_unused_result [-Werror=unused-result]
>  asprintf(, "V%s.%s", str, tmp);
>  ^
> xen-detect.c: In function ‘check_for_xen’:
> xen-detect.c:93:17: error: ignoring return value of ‘asprintf’, declared with 
> attribute warn_unused_result [-Werror=unused-result]
>  asprintf(, "V%u.%u",
>  ^
> cc1: all warnings being treated as errors
> 
> https://travis-ci.org/fdario/xen/jobs/245864401
> 
> Which, to me, looks related to 48d0c822640f8ce4754de16f1bee5c995bac7078
> ("tools/xen-detect: try sysfs node for obtaining guest type").
> 
> I can, however, build the tools locally, with:
> gcc version 6.3.0 20170516 (Debian 6.3.0-18)
> 
> Thoughts?
> 
> Regards,
> Dario
> -- 
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
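One hedged way to silence -Werror=unused-result here is simply to check asprintf()'s
return value; the names below (str, major, minor) are illustrative, since the first
argument was lost in the quoted log, and this is not necessarily the fix that was
eventually committed:

    char *str = NULL;

    if ( asprintf(&str, "V%u.%u", major, minor) < 0 )
        str = NULL;   /* allocation/format failure: treat the version as unknown */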







[Xen-devel] [xen-4.8-testing test] 110946: tolerable FAIL - PUSHED

2017-06-22 Thread osstest service owner
flight 110946 xen-4.8-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/110946/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stopfail REGR. vs. 110437
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop   fail REGR. vs. 110437

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-credit2  15 guest-start/debian.repeatfail  like 110410
 test-xtf-amd64-amd64-3  45 xtf/test-hvm64-lbr-tsx-vmentry fail like 110437
 test-xtf-amd64-amd64-1  45 xtf/test-hvm64-lbr-tsx-vmentry fail like 110437
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stopfail like 110437
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 110437
 test-armhf-armhf-xl-rtds 15 guest-start/debian.repeatfail  like 110437
 test-amd64-amd64-xl-rtds  9 debian-install   fail  like 110437
 build-amd64-prev  6 xen-build/dist-test  fail   never pass
 build-i386-prev   6 xen-build/dist-test  fail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-amd64-amd64-xl-qemut-ws16-amd64  9 windows-installfail never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64  9 windows-installfail never pass
 test-arm64-arm64-xl-credit2  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemut-ws16-amd64  9 windows-install fail never pass
 test-amd64-amd64-xl-qemut-win10-i386  9 windows-installfail never pass
 test-amd64-i386-xl-qemut-win10-i386  9 windows-install fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386  9 windows-installfail never pass
 test-amd64-i386-xl-qemuu-win10-i386  9 windows-install fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64  9 windows-install fail never pass


Re: [Xen-devel] [PATCH 14/17 v5] xen/arm: vpl011: Add support for vuart console in xenconsole

2017-06-22 Thread Stefano Stabellini
On Thu, 22 Jun 2017, Bhupinder Thakur wrote:
> This patch finally adds support for the vuart console.
> 
> Signed-off-by: Bhupinder Thakur 
> ---
> CC: Ian Jackson 
> CC: Wei Liu 
> CC: Stefano Stabellini 
> CC: Julien Grall 
> 
> Changes since v4:
> - Renamed VUART_CFLAGS- to CFLAGS_vuart- in the Makefile as per the 
> convention.
> 
>  config/arm32.mk   |  1 +
>  config/arm64.mk   |  1 +
>  tools/console/Makefile|  3 ++-
>  tools/console/daemon/io.c | 31 ++-
>  4 files changed, 34 insertions(+), 2 deletions(-)
> 
> diff --git a/config/arm32.mk b/config/arm32.mk
> index f95228e..b9f23fe 100644
> --- a/config/arm32.mk
> +++ b/config/arm32.mk
> @@ -1,5 +1,6 @@
>  CONFIG_ARM := y
>  CONFIG_ARM_32 := y
> +CONFIG_VUART_CONSOLE := y
>  CONFIG_ARM_$(XEN_OS) := y

I am tempted to disable this by default on arm32 (but leaving it
configurable via Kconfig maybe). Typically, arm32 CPUs are not found on
server platforms, where SBSA compliance is important. Julien, what do
you think?


>  CONFIG_XEN_INSTALL_SUFFIX :=
> diff --git a/config/arm64.mk b/config/arm64.mk
> index aa45772..861d0a4 100644
> --- a/config/arm64.mk
> +++ b/config/arm64.mk
> @@ -1,5 +1,6 @@
>  CONFIG_ARM := y
>  CONFIG_ARM_64 := y
> +CONFIG_VUART_CONSOLE := y
>  CONFIG_ARM_$(XEN_OS) := y
>  
>  CONFIG_XEN_INSTALL_SUFFIX :=
> diff --git a/tools/console/Makefile b/tools/console/Makefile
> index c8b0300..1cddb6e 100644
> --- a/tools/console/Makefile
> +++ b/tools/console/Makefile
> @@ -11,6 +11,7 @@ LDLIBS += $(SOCKET_LIBS)
>  
>  LDLIBS_xenconsoled += $(UTIL_LIBS)
>  LDLIBS_xenconsoled += -lrt
> +CFLAGS_vuart-$(CONFIG_VUART_CONSOLE) = -DCONFIG_VUART_CONSOLE
>  
>  BIN  = xenconsoled xenconsole
>  
> @@ -28,7 +29,7 @@ clean:
>  distclean: clean
>  
>  daemon/main.o: daemon/_paths.h
> -daemon/io.o: CFLAGS += $(CFLAGS_libxenevtchn) $(CFLAGS_libxengnttab)
> +daemon/io.o: CFLAGS += $(CFLAGS_libxenevtchn) $(CFLAGS_libxengnttab) 
> $(CFLAGS_vuart-y)
>  xenconsoled: $(patsubst %.c,%.o,$(wildcard daemon/*.c))
>   $(CC) $(LDFLAGS) $^ -o $@ $(LDLIBS) $(LDLIBS_libxenevtchn) 
> $(LDLIBS_libxengnttab) $(LDLIBS_xenconsoled) $(APPEND_LDFLAGS)
>  
> diff --git a/tools/console/daemon/io.c b/tools/console/daemon/io.c
> index baf0e2e..6b0114e 100644
> --- a/tools/console/daemon/io.c
> +++ b/tools/console/daemon/io.c
> @@ -107,12 +107,16 @@ struct console {
>   xenevtchn_port_or_error_t remote_port;
>   struct xencons_interface *interface;
>   struct domain *d;
> + bool optional;
> + bool prefer_gnttab;
>  };
>  
>  struct console_data {
>   char *xsname;
>   char *ttyname;
>   char *log_suffix;
> + bool optional;
> + bool prefer_gnttab;
>  };
>  
>  static struct console_data console_data[] = {
> @@ -121,7 +125,18 @@ static struct console_data console_data[] = {
>   .xsname = "/console",
>   .ttyname = "tty",
>   .log_suffix = "",
> + .optional = false,
> + .prefer_gnttab = true,
>   },
> +#if defined(CONFIG_VUART_CONSOLE)
> + {
> + .xsname = "/vuart/0",
> + .ttyname = "tty",
> + .log_suffix = "-vuart0",
> + .optional = true,
> + .prefer_gnttab = false,
> + },
> +#endif
>  };
>  
>  #define MAX_CONSOLE (sizeof(console_data)/sizeof(struct console_data))
> @@ -655,8 +670,18 @@ static int console_create_ring(struct console *con)
>   "ring-ref", "%u", &ring_ref,
>   "port", "%i", &remote_port,
>   NULL);
> +
>   if (err)
> + {
> + /*
> +  * This is a normal condition for optional consoles: they might 
> not be
> +  * present on xenstore at all. In that case, just return 
> without error.
> + */
> + if (con->optional)
> + err = 0;
> +
>   goto out;
> + }
>  
>   snprintf(path, sizeof(path), "%s/type", con->xspath);
>   type = xs_read(xs, XBT_NULL, path, NULL);
> @@ -670,7 +695,9 @@ static int console_create_ring(struct console *con)
>   if (ring_ref != con->ring_ref && con->ring_ref != -1)
>   console_unmap_interface(con);
>  
> - if (!con->interface && xgt_handle) {
> + if (!con->interface && 
> + xgt_handle &&
> + con->prefer_gnttab) {
>   /* Prefer using grant table */
>   con->interface = xengnttab_map_grant_ref(xgt_handle,
>   dom->domid, GNTTAB_RESERVED_CONSOLE,
> @@ -790,6 +817,8 @@ static int console_init(struct console *con, struct 
> domain *dom, void **data)
>   con->d = dom;
>   con->ttyname = (*con_data)->ttyname;
>   con->log_suffix = (*con_data)->log_suffix;
> + con->optional = (*con_data)->optional;
> + con->prefer_gnttab = 

Re: [Xen-devel] [PATCH 13/17 v5] xen/arm: vpl011: Modify xenconsole to support multiple consoles

2017-06-22 Thread Stefano Stabellini
On Thu, 22 Jun 2017, Bhupinder Thakur wrote:
> This patch adds support for multiple consoles and introduces iterator
> functions to operate on them.
> 
> This patch is in preparation for supporting a new vuart console.
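As a hedged illustration of how the iterators are meant to be used (the call site and
helper names here are assumptions, not quoted from the patch), a per-domain operation
becomes one call that is applied to every console of the domain:

    /* Illustrative only: apply per-console cleanups to all consoles of a domain. */
    static void domain_cleanup_consoles(struct domain *d)
    {
        console_iter_void_arg1(d, console_close_tty);     /* assumed helper name */
        console_iter_void_arg1(d, console_close_evtchn);  /* assumed helper name */
    }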
> 
> Signed-off-by: Bhupinder Thakur 
> ---
> CC: Ian Jackson 
> CC: Wei Liu 
> CC: Stefano Stabellini 
> CC: Julien Grall 
> 
> Changes since v4:
> - Changes to make event channel handling per console rather than per domain.
> 
> Changes since v3:
> - The changes in xenconsole have been split into four patches. This is the 
> third patch.
> 
>  tools/console/daemon/io.c | 435 
> --
>  1 file changed, 302 insertions(+), 133 deletions(-)
> 
> diff --git a/tools/console/daemon/io.c b/tools/console/daemon/io.c
> index a2a3496..baf0e2e 100644
> --- a/tools/console/daemon/io.c
> +++ b/tools/console/daemon/io.c
> @@ -90,12 +90,14 @@ struct buffer {
>  };
>  
>  struct console {
> + char *ttyname;
>   int master_fd;
>   int master_pollfd_idx;
>   int slave_fd;
>   int log_fd;
>   struct buffer buffer;
>   char *xspath;
> + char *log_suffix;
>   int ring_ref;
>   xenevtchn_handle *xce_handle;
>   int xce_pollfd_idx;
> @@ -107,16 +109,112 @@ struct console {
>   struct domain *d;
>  };
>  
> +struct console_data {
> + char *xsname;
> + char *ttyname;
> + char *log_suffix;
> +};
> +
> +static struct console_data console_data[] = {
> +
> + {
> + .xsname = "/console",
> + .ttyname = "tty",
> + .log_suffix = "",
> + },
> +};
> +
> +#define MAX_CONSOLE (sizeof(console_data)/sizeof(struct console_data))
> +
>  struct domain {
>   int domid;
>   bool is_dead;
>   unsigned last_seen;
>   struct domain *next;
> - struct console console;
> + struct console console[MAX_CONSOLE];
>  };
>  
>  static struct domain *dom_head;
>  
> +typedef void (*VOID_ITER_FUNC_ARG1)(struct console *);
> +typedef bool (*BOOL_ITER_FUNC_ARG1)(struct console *);
> +typedef int (*INT_ITER_FUNC_ARG1)(struct console *);
> +typedef void (*VOID_ITER_FUNC_ARG2)(struct console *,  void *);
> +typedef int (*INT_ITER_FUNC_ARG3)(struct console *,
> +  struct domain *dom, void **);
> +
> +static inline bool console_enabled(struct console *con)
> +{
> + return con->local_port != -1;
> +}
> +
> +static inline void console_iter_void_arg1(struct domain *d,
> + 
>   VOID_ITER_FUNC_ARG1 iter_func)
> +{
> + int i = 0;
> + struct console *con = &(d->console[0]);
> +
> + for (i = 0; i < MAX_CONSOLE; i++, con++)
> + {
> + iter_func(con);
> + }
> +}
> +
> +static inline void console_iter_void_arg2(struct domain *d,
> + 
>   VOID_ITER_FUNC_ARG2 iter_func,
> + 
>   void *iter_data)
> +{
> + int i = 0;
> + struct console *con = &(d->console[0]);
> +
> + for (i = 0; i < MAX_CONSOLE; i++, con++)
> + {
> + iter_func(con, iter_data);
> + }
> +}
> +
> +static inline bool console_iter_bool_arg1(struct domain *d,
> + 
>   BOOL_ITER_FUNC_ARG1 iter_func)
> +{
> + int i = 0;
> + struct console *con = &(d->console[0]);
> +
> + for (i = 0; i < MAX_CONSOLE; i++, con++)
> + {
> + if (iter_func(con))
> + return true;
> + }
> + return false;
> +}
> +
> +static inline int console_iter_int_arg1(struct domain *d,
> + 
> INT_ITER_FUNC_ARG1 iter_func)
> +{
> + int i = 0;
> + struct console *con = &(d->console[0]);
> +
> + for (i = 0; i < MAX_CONSOLE; i++, con++)
> + {
> + if (iter_func(con))
> + return 1;
> + }
> + return 0;
> +}
> +
> +static inline int console_iter_int_arg3(struct domain *d,
> + 
> INT_ITER_FUNC_ARG3 iter_func,
> + 
> void **iter_data)
> +{
> + int i = 0;
> + struct console *con = &(d->console[0]);
> +
> + for (i = 0; i < MAX_CONSOLE; i++, con++)
> + {
> + if (iter_func(con, d, iter_data))
> + return 1;
> + }
> + return 0;
> +}
>  static int write_all(int fd, const char* buf, size_t len)
>  {
>   while (len) {
> @@ -163,12 +261,22 @@ static int write_with_timestamp(int fd, const char 
> *data, size_t sz,
>   return 0;
>  }
>  
> -static void buffer_append(struct console 

Re: [Xen-devel] [PATCH 11/17 v5] xen/arm: vpl011: Rename the console structure field conspath to xspath

2017-06-22 Thread Stefano Stabellini
On Thu, 22 Jun 2017, Bhupinder Thakur wrote:
> The console->conspath field is renamed to console->xspath so that it is
> clear from the name that it refers to a xenstore path.
> 
> Signed-off-by: Bhupinder Thakur 

Reviewed-by: Stefano Stabellini 


> ---
> CC: Ian Jackson 
> CC: Wei Liu 
> CC: Stefano Stabellini 
> CC: Julien Grall 
> 
> Changes since v4:
> - Split this change in a separate patch.
> 
>  tools/console/daemon/io.c | 30 +++---
>  1 file changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/tools/console/daemon/io.c b/tools/console/daemon/io.c
> index 30cd167..6f5c69c 100644
> --- a/tools/console/daemon/io.c
> +++ b/tools/console/daemon/io.c
> @@ -95,7 +95,7 @@ struct console {
>   int slave_fd;
>   int log_fd;
>   struct buffer buffer;
> - char *conspath;
> + char *xspath;
>   int ring_ref;
>   xenevtchn_handle *xce_handle;
>   int xce_pollfd_idx;
> @@ -463,7 +463,7 @@ static int domain_create_tty(struct domain *dom)
>   goto out;
>   }
>  
> - success = asprintf(&path, "%s/limit", con->conspath) !=
> + success = asprintf(, "%s/limit", con->xspath) !=
>   -1;
>   if (!success)
>   goto out;
> @@ -474,7 +474,7 @@ static int domain_create_tty(struct domain *dom)
>   }
>   free(path);
>  
> - success = (asprintf(&path, "%s/tty", con->conspath) != -1);
> + success = (asprintf(&path, "%s/tty", con->xspath) != -1);
>   if (!success)
>   goto out;
>   success = xs_write(xs, XBT_NULL, path, slave, strlen(slave));
> @@ -546,14 +546,14 @@ static int domain_create_ring(struct domain *dom)
>   char *type, path[PATH_MAX];
>   struct console *con = &dom->console;
>  
> - err = xs_gather(xs, con->conspath,
> + err = xs_gather(xs, con->xspath,
>   "ring-ref", "%u", &ring_ref,
>   "port", "%i", &remote_port,
>   NULL);
>   if (err)
>   goto out;
>  
> - snprintf(path, sizeof(path), "%s/type", con->conspath);
> + snprintf(path, sizeof(path), "%s/type", con->xspath);
>   type = xs_read(xs, XBT_NULL, path, NULL);
>   if (type && strcmp(type, "xenconsoled") != 0) {
>   free(type);
> @@ -646,13 +646,13 @@ static bool watch_domain(struct domain *dom, bool watch)
>  
>   snprintf(domid_str, sizeof(domid_str), "dom%u", dom->domid);
>   if (watch) {
> - success = xs_watch(xs, con->conspath, domid_str);
> + success = xs_watch(xs, con->xspath, domid_str);
>   if (success)
>   domain_create_ring(dom);
>   else
> - xs_unwatch(xs, con->conspath, domid_str);
> + xs_unwatch(xs, con->xspath, domid_str);
>   } else {
> - success = xs_unwatch(xs, con->conspath, domid_str);
> + success = xs_unwatch(xs, con->xspath, domid_str);
>   }
>  
>   return success;
> @@ -682,13 +682,13 @@ static struct domain *create_domain(int domid)
>   dom->domid = domid;
>  
>   con = &dom->console;
> - con->conspath = xs_get_domain_path(xs, dom->domid);
> - s = realloc(con->conspath, strlen(con->conspath) +
> + con->xspath = xs_get_domain_path(xs, dom->domid);
> + s = realloc(con->xspath, strlen(con->xspath) +
>   strlen("/console") + 1);
>   if (s == NULL)
>   goto out;
> - con->conspath = s;
> - strcat(con->conspath, "/console");
> + con->xspath = s;
> + strcat(con->xspath, "/console");
>  
>   con->master_fd = -1;
>   con->master_pollfd_idx = -1;
> @@ -712,7 +712,7 @@ static struct domain *create_domain(int domid)
>  
>   return dom;
>   out:
> - free(con->conspath);
> + free(con->xspath);
>   free(dom);
>   return NULL;
>  }
> @@ -756,8 +756,8 @@ static void cleanup_domain(struct domain *d)
>   free(con->buffer.data);
>   con->buffer.data = NULL;
>  
> - free(con->conspath);
> - con->conspath = NULL;
> + free(con->xspath);
> + con->xspath = NULL;
>  
>   remove_domain(d);
>  }
> -- 
> 2.7.4
> 



Re: [Xen-devel] [PATCH 10/17 v5] xen/arm: vpl011: Modify xenconsole to define and use a new console structure

2017-06-22 Thread Stefano Stabellini
On Thu, 22 Jun 2017, Bhupinder Thakur wrote:
> Xenconsole uses a domain structure which contains console-specific fields. This
> patch defines a new console structure, which is used by the xenconsole
> functions to perform console-specific operations such as reading/writing data
> from/to the console ring buffer or from/to the console tty.
> 
> This patch is in preparation for supporting multiple consoles, which is needed
> for the vuart console.
> 
> Signed-off-by: Bhupinder Thakur 

Reviewed-by: Stefano Stabellini 


> ---
> CC: Ian Jackson 
> CC: Wei Liu 
> CC: Stefano Stabellini 
> CC: Julien Grall 
> 
> Changes since v4:
> - Moved the following fields from the struct domain to struct console:
>   ->xenevtchn_handle *xce_handle;
>   ->int xce_pollfd_idx;
>   ->int event_count;
>   ->long long next_period;
> 
> Changes since v3:
> - The changes in xenconsole have been split into four patches. This is the 
> first patch
>   which modifies the xenconsole to use a new console structure.
> 
> Changes since v2:
> - Defined a new function console_create_ring() which sets up the ring buffer 
> and 
>   event channel a new console. domain_create_ring() uses this function to 
> setup
>   a console.
> - This patch does not contain vuart specific changes, which would be 
> introduced in
>   the next patch.
> - Changes for keeping the PV log file name unchanged.
> 
> Changes since v1:
> - Split the domain struture to a separate console structure
> - Modified the functions to operate on the console struture
> - Replaced repetitive per console code with generic code
> 
>  tools/console/daemon/io.c | 299 
> +-
>  1 file changed, 165 insertions(+), 134 deletions(-)
> 
> diff --git a/tools/console/daemon/io.c b/tools/console/daemon/io.c
> index e8033d2..30cd167 100644
> --- a/tools/console/daemon/io.c
> +++ b/tools/console/daemon/io.c
> @@ -89,25 +89,30 @@ struct buffer {
>   size_t max_capacity;
>  };
>  
> -struct domain {
> - int domid;
> +struct console {
>   int master_fd;
>   int master_pollfd_idx;
>   int slave_fd;
>   int log_fd;
> - bool is_dead;
> - unsigned last_seen;
>   struct buffer buffer;
> - struct domain *next;
>   char *conspath;
>   int ring_ref;
> - xenevtchn_port_or_error_t local_port;
> - xenevtchn_port_or_error_t remote_port;
>   xenevtchn_handle *xce_handle;
>   int xce_pollfd_idx;
> - struct xencons_interface *interface;
>   int event_count;
>   long long next_period;
> + xenevtchn_port_or_error_t local_port;
> + xenevtchn_port_or_error_t remote_port;
> + struct xencons_interface *interface;
> + struct domain *d;
> +};
> +
> +struct domain {
> + int domid;
> + bool is_dead;
> + unsigned last_seen;
> + struct domain *next;
> + struct console console;
>  };
>  
>  static struct domain *dom_head;
> @@ -160,9 +165,10 @@ static int write_with_timestamp(int fd, const char 
> *data, size_t sz,
>  
>  static void buffer_append(struct domain *dom)
>  {
> - struct buffer *buffer = &dom->buffer;
> + struct buffer *buffer = &con->buffer;
> + struct buffer *buffer = >buffer;
>   XENCONS_RING_IDX cons, prod, size;
> - struct xencons_interface *intf = dom->interface;
> + struct xencons_interface *intf = con->interface;
>  
>   cons = intf->out_cons;
>   prod = intf->out_prod;
> @@ -187,22 +193,22 @@ static void buffer_append(struct domain *dom)
>  
>   xen_mb();
>   intf->out_cons = cons;
> - xenevtchn_notify(dom->xce_handle, dom->local_port);
> + xenevtchn_notify(con->xce_handle, con->local_port);
>  
>   /* Get the data to the logfile as early as possible because if
>* no one is listening on the console pty then it will fill up
>* and handle_tty_write will stop being called.
>*/
> - if (dom->log_fd != -1) {
> + if (con->log_fd != -1) {
>   int logret;
>   if (log_time_guest) {
>   logret = write_with_timestamp(
> - dom->log_fd,
> + con->log_fd,
>   buffer->data + buffer->size - size,
>   size, _time_guest_needts);
>   } else {
>   logret = write_all(
> - dom->log_fd,
> + con->log_fd,
>   buffer->data + buffer->size - size,
>   size);
>   }
> @@ -338,14 +344,16 @@ static int create_domain_log(struct domain *dom)
>  
>  static void domain_close_tty(struct domain *dom)
>  {
> - if (dom->master_fd != -1) {
> - close(dom->master_fd);
> - dom->master_fd = -1;
> + struct console *con = >console;
> 

Re: [Xen-devel] [PATCH 15/17 v5] xen/arm: vpl011: Add a new vuart console type to xenconsole client

2017-06-22 Thread Stefano Stabellini
On Thu, 22 Jun 2017, Bhupinder Thakur wrote:
> Add a new console type VUART to connect to the guest's emulated vuart
> console.
> 
> Signed-off-by: Bhupinder Thakur 

Reviewed-by: Stefano Stabellini 


> ---
> CC: Ian Jackson 
> CC: Wei Liu 
> CC: Stefano Stabellini 
> CC: Julien Grall 
> 
> Changes since v4:
> - Removed the vuart compile time flag so that vuart code is compiled always.
> 
> Changes since v3:
> - The vuart console support is under CONFIG_VUART_CONSOLE option.
> - Since there is a change from last review, I have not included
>   reviewed-by tag from Stefano and acked-by tag from Wei.
> 
>  tools/console/client/main.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/console/client/main.c b/tools/console/client/main.c
> index 99f..3dbb06f 100644
> --- a/tools/console/client/main.c
> +++ b/tools/console/client/main.c
> @@ -76,7 +76,7 @@ static void usage(const char *program) {
>  "\n"
>  "  -h, --help   display this help and exit\n"
>  "  -n, --num N  use console number N\n"
> -"  --type TYPE  console type. must be 'pv' or 'serial'\n"
> +"  --type TYPE  console type. must be 'pv', 'serial' or 
> 'vuart'\n"
>  "  --start-notify-fd N file descriptor used to notify parent\n"
>  , program);
>  }
> @@ -264,6 +264,7 @@ typedef enum {
> CONSOLE_INVAL,
> CONSOLE_PV,
> CONSOLE_SERIAL,
> +   CONSOLE_VUART,
>  } console_type;
>  
>  static struct termios stdin_old_attr;
> @@ -343,6 +344,7 @@ int main(int argc, char **argv)
>   char *end;
>   console_type type = CONSOLE_INVAL;
>   bool interactive = 0;
> + char *console_names = "serial, pv, vuart";
>  
>   if (isatty(STDIN_FILENO) && isatty(STDOUT_FILENO))
>   interactive = 1;
> @@ -361,9 +363,12 @@ int main(int argc, char **argv)
>   type = CONSOLE_SERIAL;
>   else if (!strcmp(optarg, "pv"))
>   type = CONSOLE_PV;
> + else if (!strcmp(optarg, "vuart"))
> + type = CONSOLE_VUART;
>   else {
>   fprintf(stderr, "Invalid type argument\n");
> - fprintf(stderr, "Console types supported are: 
> serial, pv\n");
> + fprintf(stderr, "Console types supported are: 
> %s\n",
> + console_names);
>   exit(EINVAL);
>   }
>   break;
> @@ -436,6 +441,10 @@ int main(int argc, char **argv)
>   else
>   snprintf(path, strlen(dom_path) + 
> strlen("/device/console/%d/tty") + 5, "%s/device/console/%d/tty", dom_path, 
> num);
>   }
> + if (type == CONSOLE_VUART) {
> + snprintf(path, strlen(dom_path) + strlen("/vuart/0/tty") + 1,
> +  "%s/vuart/0/tty", dom_path);
> + }
>  
>   /* FIXME consoled currently does not assume domain-0 doesn't have a
>  console which is good when we break domain-0 up.  To keep us
> -- 
> 2.7.4
> 



Re: [Xen-devel] [PATCH 09/17 v5] xen/arm: vpl011: Add a new vuart node in the xenstore

2017-06-22 Thread Stefano Stabellini
On Thu, 22 Jun 2017, Bhupinder Thakur wrote:
> Add a new vuart console node to xenstore. This node is added at
> 
> /local/domain/$DOMID/vuart/0.
> 
> The node contains information such as the ring-ref, event channel,
> buffer limit and type of console.
> 
> Xenconsole reads the node information to set up the ring buffer and
> event channel for sending/receiving vuart data.
> 
> Signed-off-by: Bhupinder Thakur 

Reviewed-by: Stefano Stabellini 


> ---
> CC: Ian Jackson 
> CC: Wei Liu 
> CC: Stefano Stabellini 
> CC: Julien Grall 
> 
> Changes since v4:
> -  vuart_device moved inside libxl__device_vuart_add() as a local variable.
> 
> Changes since v3:
> - Added a backend node for vpl011.
> - Removed libxl__device_vuart_add() for HVM guest. It is called only for PV 
> guest.
> 
>  tools/libxl/libxl_console.c  | 44 
> 
>  tools/libxl/libxl_create.c   | 10 +++-
>  tools/libxl/libxl_device.c   |  9 ++--
>  tools/libxl/libxl_internal.h |  3 +++
>  tools/libxl/libxl_types_internal.idl |  1 +
>  5 files changed, 64 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/libxl/libxl_console.c b/tools/libxl/libxl_console.c
> index 853be15..cdaf7fd 100644
> --- a/tools/libxl/libxl_console.c
> +++ b/tools/libxl/libxl_console.c
> @@ -344,6 +344,50 @@ out:
>  return rc;
>  }
>  
> +int libxl__device_vuart_add(libxl__gc *gc, uint32_t domid,
> +libxl__device_console *console,
> +libxl__domain_build_state *state)
> +{
> +libxl__device device;
> +flexarray_t *ro_front;
> +flexarray_t *back;
> +int rc;
> +
> +ro_front = flexarray_make(gc, 16, 1);
> +back = flexarray_make(gc, 16, 1);
> +
> +device.backend_devid = console->devid;
> +device.backend_domid = console->backend_domid;
> +device.backend_kind = LIBXL__DEVICE_KIND_VUART;
> +device.devid = console->devid;
> +device.domid = domid;
> +device.kind = LIBXL__DEVICE_KIND_VUART;
> +
> +flexarray_append(back, "frontend-id");
> +flexarray_append(back, GCSPRINTF("%d", domid));
> +flexarray_append(back, "online");
> +flexarray_append(back, "1");
> +flexarray_append(back, "state");
> +flexarray_append(back, GCSPRINTF("%d", XenbusStateInitialising));
> +flexarray_append(back, "protocol");
> +flexarray_append(back, LIBXL_XENCONSOLE_PROTOCOL);
> +
> +flexarray_append(ro_front, "port");
> +flexarray_append(ro_front, GCSPRINTF("%"PRIu32, state->vuart_port));
> +flexarray_append(ro_front, "ring-ref");
> +flexarray_append(ro_front, GCSPRINTF("%lu", state->vuart_gfn));
> +flexarray_append(ro_front, "limit");
> +flexarray_append(ro_front, GCSPRINTF("%d", LIBXL_XENCONSOLE_LIMIT));
> +flexarray_append(ro_front, "type");
> +flexarray_append(ro_front, "xenconsoled");
> +
> +rc = libxl__device_generic_add(gc, XBT_NULL, &device,
> +   libxl__xs_kvs_of_flexarray(gc, back),
> +   NULL,
> +   libxl__xs_kvs_of_flexarray(gc, ro_front));
> +return rc;
> +}
> +
>  int libxl__init_console_from_channel(libxl__gc *gc,
>   libxl__device_console *console,
>   int dev_num,
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index bffbc45..cfd85ec 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -1367,7 +1367,7 @@ static void domcreate_launch_dm(libxl__egc *egc, 
> libxl__multidev *multidev,
>  }
>  case LIBXL_DOMAIN_TYPE_PV:
>  {
> -libxl__device_console console;
> +libxl__device_console console, vuart;
>  libxl__device device;
>  
>  for (i = 0; i < d_config->num_vfbs; i++) {
> @@ -1375,6 +1375,14 @@ static void domcreate_launch_dm(libxl__egc *egc, 
> libxl__multidev *multidev,
>  libxl__device_vkb_add(gc, domid, &d_config->vkbs[i]);
>  }
>  
> +if (d_config->b_info.arch_arm.vuart)
> +{
> +init_console_info(gc, &vuart, 0);
> +vuart.backend_domid = state->console_domid;
> +libxl__device_vuart_add(gc, domid, &vuart, state);
> +libxl__device_console_dispose(&vuart);
> +}
> +
>  init_console_info(gc, &console, 0);
>  console.backend_domid = state->console_domid;
>  libxl__device_console_add(gc, domid, &console, state, &device);
> diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
> index 00356af..3b10c58 100644
> --- a/tools/libxl/libxl_device.c
> +++ b/tools/libxl/libxl_device.c
> @@ -26,6 +26,9 @@ static char *libxl__device_frontend_path(libxl__gc *gc, 
> libxl__device *device)
>  if (device->kind == LIBXL__DEVICE_KIND_CONSOLE && device->devid == 0)
>  return 

Re: [Xen-devel] [PATCH 08/17 v5] xen/arm: vpl011: Add a new domctl API to initialize vpl011

2017-06-22 Thread Stefano Stabellini
On Thu, 22 Jun 2017, Bhupinder Thakur wrote:
> Add a new domctl API to initialize vpl011. It takes the GFN and console
> backend domid as input and returns an event channel to be used for
> sending and receiving events from Xen.
> 
> Xen will communicate with xenconsole using GFN as the ring buffer and
> the event channel to transmit and receive pl011 data on the guest domain's
> behalf.
> 
> Signed-off-by: Bhupinder Thakur 
> ---
> CC: Ian Jackson 
> CC: Wei Liu 
> CC: Stefano Stabellini 
> CC: Julien Grall 
> 
> Changes since v4:
> - Removed libxl__arch_domain_create_finish().
> - Added a new function libxl__arch_build_dom_finish(), which is called last in
>   libxl__build_dom(). This function now calls the vpl011 initialization function.
> 
> Changes since v3:
> - Added a new arch specific function libxl__arch_domain_create_finish(), which
>   calls the vpl011 initialization function. For x86 this function does not do
>   anything.
> - domain_vpl011_init() takes a pointer to a structure which contains all the 
>   required information such as console_domid, gfn instead of passing 
> parameters
>   separately.
> - Dropped a DOMCTL API defined for de-initializing vpl011 as that should be
>   taken care when the domain is destroyed (and not dependent on userspace 
>   libraries/applications).
> 
> Changes since v2:
> - Replaced the DOMCTL APIs defined for get/set of event channel and GFN with 
>   a set of DOMCTL APIs for initializing and de-initializing vpl011 emulation.
> 
>  tools/libxc/include/xenctrl.h | 20 
>  tools/libxc/xc_domain.c   | 25 +
>  tools/libxl/libxl_arch.h  |  6 ++
>  tools/libxl/libxl_arm.c   | 22 ++
>  tools/libxl/libxl_dom.c   |  4 
>  tools/libxl/libxl_x86.c   |  8 
>  xen/arch/arm/domain.c |  5 +
>  xen/arch/arm/domctl.c | 37 +
>  xen/include/public/domctl.h   | 12 
>  9 files changed, 139 insertions(+)
> 
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 1629f41..26f3d1e 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -885,6 +885,26 @@ int xc_vcpu_getcontext(xc_interface *xch,
> uint32_t vcpu,
> vcpu_guest_context_any_t *ctxt);
>  
> +#if defined (__arm__) || defined(__aarch64__)
> +/**
> + * This function initializes the vpl011 emulation and returns
> + * the event to be used by the backend for communicating with
> + * the emulation code.
> + *
> + * @parm xch a handle to an open hypervisor interface
> + * @parm domid the domain to get information from
> + * @parm console_domid the domid of the backend console
> + * @parm gfn the guest pfn to be used as the ring buffer
> + * @parm evtchn the event channel to be used for events
> + * @return 0 on success, negative error on failure
> + */
> +int xc_dom_vpl011_init(xc_interface *xch,
> +   uint32_t domid,
> +   uint32_t console_domid,
> +   xen_pfn_t gfn,
> +   evtchn_port_t *evtchn);
> +#endif

Actually, the pattern is to define the xc_ function on all architectures
but only return ENOSYS where it's not implemented; see
xc_vcpu_get_extstate.
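A hedged sketch of that pattern applied to the new call (the ARM branch mirrors the
quoted body; the non-ARM branch is the part being suggested):

    int xc_dom_vpl011_init(xc_interface *xch,
                           uint32_t domid,
                           uint32_t console_domid,
                           xen_pfn_t gfn,
                           evtchn_port_t *evtchn)
    {
    #if defined(__arm__) || defined(__aarch64__)
        DECLARE_DOMCTL;
        int rc;

        domctl.cmd = XEN_DOMCTL_vuart_op;
        domctl.domain = (domid_t)domid;
        domctl.u.vuart_op.cmd = XEN_DOMCTL_VUART_OP_INIT_VPL011;
        domctl.u.vuart_op.console_domid = console_domid;
        domctl.u.vuart_op.gfn = gfn;

        rc = do_domctl(xch, &domctl);
        if ( rc < 0 )
            return rc;

        *evtchn = domctl.u.vuart_op.evtchn;
        return rc;
    #else
        errno = ENOSYS;   /* defined everywhere, but unimplemented on this arch */
        return -1;
    #endif
    }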


>  /**
>   * This function returns information about the XSAVE state of a particular
>   * vcpu of a domain. If extstate->size and extstate->xfeature_mask are 0,
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index 5d192ea..55de408 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -343,6 +343,31 @@ int xc_domain_get_guest_width(xc_interface *xch, 
> uint32_t domid,
>  return 0;
>  }
>  
> +#if defined (__arm__) || defined(__aarch64__)
> +int xc_dom_vpl011_init(xc_interface *xch,
> +   uint32_t domid,
> +   uint32_t console_domid,
> +   xen_pfn_t gfn,
> +   evtchn_port_t *evtchn)
> +{

See other comment.


> +DECLARE_DOMCTL;
> +int rc = 0;
> +
> +domctl.cmd = XEN_DOMCTL_vuart_op;
> +domctl.domain = (domid_t)domid;
> +domctl.u.vuart_op.cmd = XEN_DOMCTL_VUART_OP_INIT_VPL011;
> +domctl.u.vuart_op.console_domid = console_domid;
> +domctl.u.vuart_op.gfn = gfn;
> +
> +if ( (rc = do_domctl(xch, &domctl)) < 0 )
> +return rc;
> +
> +*evtchn = domctl.u.vuart_op.evtchn;
> +
> +return rc;
> +}
> +#endif
> +
>  int xc_domain_getinfo(xc_interface *xch,
>uint32_t first_domid,
>unsigned int max_doms,
> diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h
> index 5e1fc60..118b92c 100644
> --- a/tools/libxl/libxl_arch.h
> +++ b/tools/libxl/libxl_arch.h
> @@ -44,6 +44,12 @@ int 

Re: [Xen-devel] [PATCH 07/17 v5] xen/arm: vpl011: Rearrange xen header includes in alphabetical order in domctl.c

2017-06-22 Thread Stefano Stabellini
On Thu, 22 Jun 2017, Bhupinder Thakur wrote:
> Rearrange xen header includes in alphabetical order in domctl.c.
> 
> Signed-off-by: Bhupinder Thakur 

Reviewed-by: Stefano Stabellini 

> ---
> CC: Stefano Stabellini 
> CC: Julien Grall 
> 
>  xen/arch/arm/domctl.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
> index 971caec..86fa102 100644
> --- a/xen/arch/arm/domctl.c
> +++ b/xen/arch/arm/domctl.c
> @@ -5,11 +5,11 @@
>   */
>  
>  #include 
> -#include 
>  #include 
> -#include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  #include 
>  
> -- 
> 2.7.4
> 



Re: [Xen-devel] [PATCH 06/17 v5] xen/arm: vpl011: Add support for vuart in libxl

2017-06-22 Thread Stefano Stabellini
On Thu, 22 Jun 2017, Bhupinder Thakur wrote:
> An option is provided in libxl to enable/disable sbsa vuart while
> creating a guest domain.
> 
> Libxl now supports a generic vuart console, of which the sbsa uart is a specific type.
> In future, support can be added for multiple vuarts of different types.
> 
> User can enable sbsa vuart by adding the following line in the guest
> configuration file:
> 
> vuart = "sbsa_uart"
> 
> Signed-off-by: Bhupinder Thakur 

Acked-by: Stefano Stabellini 


> ---
> CC: Ian Jackson 
> CC: Wei Liu 
> CC: Stefano Stabellini 
> CC: Julien Grall 
> 
> Changes since v4:
> - Renamed "pl011" to "sbsa_uart".
> 
> Changes since v3:
> - Added a new config option CONFIG_VUART_CONSOLE to enable/disable vuart 
> console
>   support.
> - Moved libxl_vuart_type to arch-arm part of libxl_domain_build_info
> - Updated xl command help to mention new console type - vuart.
> 
> Changes since v2:
> - Defined vuart option as an enum instead of a string.
> - Removed the domain creation flag defined for vuart and the related code
>   to pass on the information while domain creation. Now vpl011 is initialized
>   independent of domain creation through new DOMCTL APIs.
> 
>  tools/libxl/libxl.h  | 6 ++
>  tools/libxl/libxl_console.c  | 3 +++
>  tools/libxl/libxl_dom.c  | 1 +
>  tools/libxl/libxl_internal.h | 3 +++
>  tools/libxl/libxl_types.idl  | 7 +++
>  tools/xl/xl_cmdtable.c   | 2 +-
>  tools/xl/xl_console.c| 5 -
>  tools/xl/xl_parse.c  | 8 
>  8 files changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index cf8687a..bcfbb6c 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -306,6 +306,12 @@
>  #define LIBXL_HAVE_BUILDINFO_HVM_ACPI_LAPTOP_SLATE 1
>  
>  /*
> + * LIBXL_HAVE_VUART indicates that xenconsole/client supports
> + * virtual uart.
> + */
> +#define LIBXL_HAVE_VUART 1
> +
> +/*
>   * libxl ABI compatibility
>   *
>   * The only guarantee which libxl makes regarding ABI compatibility
> diff --git a/tools/libxl/libxl_console.c b/tools/libxl/libxl_console.c
> index 446e766..853be15 100644
> --- a/tools/libxl/libxl_console.c
> +++ b/tools/libxl/libxl_console.c
> @@ -67,6 +67,9 @@ int libxl_console_exec(libxl_ctx *ctx, uint32_t domid, int 
> cons_num,
>  case LIBXL_CONSOLE_TYPE_SERIAL:
>  cons_type_s = "serial";
>  break;
> +case LIBXL_CONSOLE_TYPE_VUART:
> +cons_type_s = "vuart";
> +break;
>  default:
>  goto out;
>  }
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 5d914a5..c98af60 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -788,6 +788,7 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid,
>  if (xc_dom_translated(dom)) {
>  state->console_mfn = dom->console_pfn;
>  state->store_mfn = dom->xenstore_pfn;
> +state->vuart_gfn = dom->vuart_gfn;
>  } else {
>  state->console_mfn = xc_dom_p2m(dom, dom->console_pfn);
>  state->store_mfn = xc_dom_p2m(dom, dom->xenstore_pfn);
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index afe6652..d0d50c3 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -1139,6 +1139,9 @@ typedef struct {
>  uint32_t num_vmemranges;
>  
>  xc_domain_configuration_t config;
> +
> +xen_pfn_t vuart_gfn;
> +evtchn_port_t vuart_port;
>  } libxl__domain_build_state;
>  
>  _hidden int libxl__build_pre(libxl__gc *gc, uint32_t domid,
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 2204425..d492b35 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -105,6 +105,7 @@ libxl_console_type = Enumeration("console_type", [
>  (0, "UNKNOWN"),
>  (1, "SERIAL"),
>  (2, "PV"),
> +(3, "VUART"),
>  ])
>  
>  libxl_disk_format = Enumeration("disk_format", [
> @@ -240,6 +241,11 @@ libxl_checkpointed_stream = 
> Enumeration("checkpointed_stream", [
>  (2, "COLO"),
>  ])
>  
> +libxl_vuart_type = Enumeration("vuart_type", [
> +(0, "unknown"),
> +(1, "sbsa_uart"),
> +])
> +
>  #
>  # Complex libxl types
>  #
> @@ -580,6 +586,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
>  
>  
>  ("arch_arm", Struct(None, [("gic_version", libxl_gic_version),
> +   ("vuart", libxl_vuart_type),
>])),
>  # Alternate p2m is not bound to any architecture or guest type, as it is
>  # supported by x86 HVM and ARM support is planned.
> diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
> index 30eb93c..9f91651 100644
> --- a/tools/xl/xl_cmdtable.c
> +++ b/tools/xl/xl_cmdtable.c
> @@ -133,7 +133,7 @@ struct cmd_spec cmd_table[] = {
>  

Re: [Xen-devel] [PATCH 04/17 v5] xen/arm: vpl011: Add SBSA UART emulation in Xen

2017-06-22 Thread Stefano Stabellini
On Thu, 22 Jun 2017, Bhupinder Thakur wrote:
> Add emulation code to emulate read/write access to pl011 registers
> and pl011 interrupts:
> 
> - Emulate DR read/write by reading and writing from/to the IN
>   and OUT ring buffers and raising an event to the backend when
>   there is data in the OUT ring buffer and injecting an interrupt
>   to the guest when there is data in the IN ring buffer
> 
> - Other registers are related to interrupt management and
>   essentially control when interrupts are delivered to the guest
> 
> This patch implements the SBSA Generic UART which is a subset of ARM
> PL011 UART.
> 
> The SBSA Generic UART is covered in Appendix B of
> https://static.docs.arm.com/den0029/a/Server_Base_System_Architecture_v3_1_ARM_DEN_0029A.pdf
> 
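A hedged sketch of the DR read path described above (helper and field names follow the
quoted patch context, but the body is illustrative rather than the verbatim vpl011.c
code):

    /* Sketch: guest read of the SBSA UART data register. */
    static uint32_t vpl011_read_data(struct domain *d)
    {
        struct vpl011 *vpl011 = &d->arch.vpl011;           /* assumed field path */
        struct xencons_interface *intf = vpl011->ring_buf; /* assumed field name */
        uint32_t data = 0;

        spin_lock(&vpl011->lock);

        if ( xencons_queued(intf->in_prod, intf->in_cons, sizeof(intf->in)) > 0 )
        {
            data = intf->in[xencons_mask(intf->in_cons, sizeof(intf->in))];
            smp_mb();          /* consume the byte before publishing the new cons */
            intf->in_cons++;
            /* Tell the backend (xenconsole) there is now space in the IN ring. */
            notify_via_xen_event_channel(d, vpl011->evtchn);
        }

        /* The IN ring may now be empty: recompute RX status and interrupt lines. */
        vpl011_update_interrupt_status(d);

        spin_unlock(&vpl011->lock);
        return data;
    }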
> Signed-off-by: Bhupinder Thakur 
> ---
> CC: Stefano Stabellini 
> CC: Julien Grall 
> CC: Andre Przywara 
> 
> Changes since v4:
> - Renamed vpl011_update() to vpl011_update_interrupt_status() and added logic 
> to avoid
>   raising spurious interrupts.
> - Used barrier instructions correctly while reading/writing data to the ring 
> buffer.
> - Proper lock taken before reading ring buffer indices.
> 
> Changes since v3:
> - Moved the call to DEFINE_XEN_FLEX_RING from vpl011.h to public/console.h. 
> This macro defines
>   standard functions to operate on the ring buffer.
> - Lock taken while updating the interrupt mask and clear registers in 
> mmio_write.
> - Use gfn_t instead of xen_pfn_t.
> - vgic_free_virq called if there is any error in vpl011 initialization.
> - mmio handlers freed if there is any error in vpl011 initialization.
> - Removed vpl011->initialized flag usage as the same check could be done 
>   using vpl011->ring-ref.
> - Used return instead of break in the switch handling of emulation of 
> different pl011 registers.
> - Renamed vpl011_update_spi() to vpl011_update().
> 
> Changes since v2:
> - Use generic vreg_reg* for read/write of registers emulating pl011.
> - Use generic ring buffer functions defined using DEFINE_XEN_FLEX_RING.
> - Renamed the SPI injection function to vpl011_update_spi() to reflect level 
>   triggered nature of pl011 interrupts.
> - The pl011 register access address should always be the base address of the
>   corresponding register as per section B of the SBSA document. For this 
> reason,
>   the register range address access is not allowed.
> 
> Changes since v1:
> - Removed the optimiztion related to sendiing events to xenconsole 
> - Use local variables as ring buffer indices while using the ring buffer
> 
>  xen/arch/arm/Kconfig |   7 +
>  xen/arch/arm/Makefile|   1 +
>  xen/arch/arm/vpl011.c| 449 
> +++
>  xen/include/asm-arm/domain.h |   6 +
>  xen/include/asm-arm/pl011-uart.h |   2 +
>  xen/include/asm-arm/vpl011.h |  73 +++
>  xen/include/public/arch-arm.h|   6 +
>  7 files changed, 544 insertions(+)
>  create mode 100644 xen/arch/arm/vpl011.c
>  create mode 100644 xen/include/asm-arm/vpl011.h
> 
> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> index d46b98c..f58019d 100644
> --- a/xen/arch/arm/Kconfig
> +++ b/xen/arch/arm/Kconfig
> @@ -50,6 +50,13 @@ config HAS_ITS
>  prompt "GICv3 ITS MSI controller support" if EXPERT = "y"
>  depends on HAS_GICV3
>  
> +config SBSA_VUART_CONSOLE
> + bool "Emulated SBSA UART console support"
> + default y
> + ---help---
> +   Allows a guest to use SBSA Generic UART as a console. The
> +   SBSA Generic UART implements a subset of ARM PL011 UART.
> +
>  endmenu
>  
>  menu "ARM errata workaround via the alternative framework"
> diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
> index 49e1fb2..d9c6ebf 100644
> --- a/xen/arch/arm/Makefile
> +++ b/xen/arch/arm/Makefile
> @@ -50,6 +50,7 @@ obj-$(CONFIG_HAS_GICV3) += vgic-v3.o
>  obj-$(CONFIG_HAS_ITS) += vgic-v3-its.o
>  obj-y += vm_event.o
>  obj-y += vtimer.o
> +obj-$(CONFIG_SBSA_VUART_CONSOLE) += vpl011.o
>  obj-y += vpsci.o
>  obj-y += vuart.o
>  
> diff --git a/xen/arch/arm/vpl011.c b/xen/arch/arm/vpl011.c
> new file mode 100644
> index 000..db8651c
> --- /dev/null
> +++ b/xen/arch/arm/vpl011.c
> @@ -0,0 +1,449 @@
> +/*
> + * arch/arm/vpl011.c
> + *
> + * Virtual PL011 UART
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along 
> with
> + * this 

Re: [Xen-devel] [PATCH 03/17 v5] xen/arm: vpl011: Define common ring buffer helper functions in console.h

2017-06-22 Thread Stefano Stabellini
On Thu, 22 Jun 2017, Bhupinder Thakur wrote:
> DEFINE_XEN_FLEX_RING(xencons) defines common helper functions such as
> xencons_queued() to tell the current size of the ring buffer,
> xencons_mask() to mask off the index, which are useful helper functions.
> pl011 emulation code will use these helper functions.
> 
> io/consol.h includes io/ring.h which defines DEFINE_XEN_FLEX_RING.

io/console.h


> In console/daemon/io.c, string.h had to be included before io/console.h
> because ring.h uses string functions.
> 
> Signed-off-by: Bhupinder Thakur 

Reviewed-by: Stefano Stabellini 


> ---
> CC: Ian Jackson 
> CC: Wei Liu 
> CC: Konrad Rzeszutek Wilk 
> CC: Stefano Stabellini 
> CC: Julien Grall 
> 
> Changes since v4:
> - Split this change in a separate patch.
> 
>  tools/console/daemon/io.c   | 2 +-
>  xen/include/public/io/console.h | 4 
>  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/console/daemon/io.c b/tools/console/daemon/io.c
> index 7e474bb..e8033d2 100644
> --- a/tools/console/daemon/io.c
> +++ b/tools/console/daemon/io.c
> @@ -21,6 +21,7 @@
>  
>  #include "utils.h"
>  #include "io.h"
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -29,7 +30,6 @@
>  
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> diff --git a/xen/include/public/io/console.h b/xen/include/public/io/console.h
> index e2cd97f..5e45e1c 100644
> --- a/xen/include/public/io/console.h
> +++ b/xen/include/public/io/console.h
> @@ -27,6 +27,8 @@
>  #ifndef __XEN_PUBLIC_IO_CONSOLE_H__
>  #define __XEN_PUBLIC_IO_CONSOLE_H__
>  
> +#include "ring.h"
> +
>  typedef uint32_t XENCONS_RING_IDX;
>  
>  #define MASK_XENCONS_IDX(idx, ring) ((idx) & (sizeof(ring)-1))
> @@ -38,6 +40,8 @@ struct xencons_interface {
>  XENCONS_RING_IDX out_cons, out_prod;
>  };
>  
> +DEFINE_XEN_FLEX_RING(xencons);
> +
>  #endif /* __XEN_PUBLIC_IO_CONSOLE_H__ */
>  
>  /*



Re: [Xen-devel] [PATCH v2 3/3] xen-disk: use an IOThread per instance

2017-06-22 Thread Stefano Stabellini
CC'ing Andreas Färber. Could you please give a quick look below at the
way the iothread object is instantiated and destroyed? I am no object
model expert and would appreciate a second opinion.


On Wed, 21 Jun 2017, Paul Durrant wrote:
> This patch allocates an IOThread object for each xen_disk instance and
> sets the AIO context appropriately on connect. This allows processing
> of I/O to proceed in parallel.
> 
> The patch also adds tracepoints into xen_disk to make it possible to
> follow the state transitions of an instance in the log.
> 
> Signed-off-by: Paul Durrant 
> ---
> Cc: Stefano Stabellini 
> Cc: Anthony Perard 
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> 
> v2:
>  - explicitly acquire and release AIO context in qemu_aio_complete() and
>blk_bh()
> ---
>  hw/block/trace-events |  7 ++
>  hw/block/xen_disk.c   | 69 
> ---
>  2 files changed, 67 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index 65e83dc258..608b24ba66 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -10,3 +10,10 @@ virtio_blk_submit_multireq(void *mrb, int start, int 
> num_reqs, uint64_t offset,
>  # hw/block/hd-geometry.c
>  hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p 
> LCHS %d %d %d"
>  hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t secs, 
> int trans) "blk %p CHS %u %u %u trans %d"
> +
> +# hw/block/xen_disk.c
> +xen_disk_alloc(char *name) "%s"
> +xen_disk_init(char *name) "%s"
> +xen_disk_connect(char *name) "%s"
> +xen_disk_disconnect(char *name) "%s"
> +xen_disk_free(char *name) "%s"
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 0e6513708e..8548195195 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -27,10 +27,13 @@
>  #include "hw/xen/xen_backend.h"
>  #include "xen_blkif.h"
>  #include "sysemu/blockdev.h"
> +#include "sysemu/iothread.h"
>  #include "sysemu/block-backend.h"
>  #include "qapi/error.h"
>  #include "qapi/qmp/qdict.h"
>  #include "qapi/qmp/qstring.h"
> +#include "qom/object_interfaces.h"
> +#include "trace.h"
>  
>  /* - */
>  
> @@ -128,6 +131,9 @@ struct XenBlkDev {
>  DriveInfo   *dinfo;
>  BlockBackend*blk;
>  QEMUBH  *bh;
> +
> +IOThread*iothread;
> +AioContext  *ctx;
>  };
>  
>  /* - */
> @@ -599,9 +605,12 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq);
>  static void qemu_aio_complete(void *opaque, int ret)
>  {
>  struct ioreq *ioreq = opaque;
> +struct XenBlkDev *blkdev = ioreq->blkdev;
> +
> +aio_context_acquire(blkdev->ctx);

I think that Paolo was right that we need a aio_context_acquire here,
however the issue is that with the current code:

  blk_handle_requests -> ioreq_runio_qemu_aio -> qemu_aio_complete

leading to aio_context_acquire being called twice on the same lock,
which I don't think is allowed?

I think we need to get rid of the qemu_aio_complete call from
ioreq_runio_qemu_aio, but to do that we need to be careful with the
accounting of aio_inflight (today it's incremented unconditionally at
the beginning of ioreq_runio_qemu_aio, I think we would have to change
that to increment it only if presync).
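
To make that concrete, here is a rough sketch of the restructuring
(assumptions: the presync flush keeps its own aio_inflight reference and the
direct qemu_aio_complete() call at the end of ioreq_runio_qemu_aio() goes
away; this is not the actual patch):

    static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
    {
        struct XenBlkDev *blkdev = ioreq->blkdev;

        if (ioreq->presync) {
            /* reference for the flush only; qemu_aio_complete() drops it
             * and re-enters us with presync cleared */
            ioreq->aio_inflight++;
            blk_aio_flush(blkdev->blk, qemu_aio_complete, ioreq);
            return 0;
        }

        /* submit the read/write AIOs; each submission takes its own
         * ioreq->aio_inflight reference and its completion callback drops
         * it, so there is no direct qemu_aio_complete() call here and the
         * AIO context is never acquired recursively */
        ...
        return 0;
    }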


>  if (ret != 0) {
> -xen_pv_printf(&ioreq->blkdev->xendev, 0, "%s I/O error\n",
> +xen_pv_printf(&blkdev->xendev, 0, "%s I/O error\n",
>ioreq->req.operation == BLKIF_OP_READ ? "read" : 
> "write");
>  ioreq->aio_errors++;
>  }
> @@ -610,13 +619,13 @@ static void qemu_aio_complete(void *opaque, int ret)
>  if (ioreq->presync) {
>  ioreq->presync = 0;
>  ioreq_runio_qemu_aio(ioreq);
> -return;
> +goto done;
>  }
>  if (ioreq->aio_inflight > 0) {
> -return;
> +goto done;
>  }
>  
> -if (ioreq->blkdev->feature_grant_copy) {
> +if (blkdev->feature_grant_copy) {
>  switch (ioreq->req.operation) {
>  case BLKIF_OP_READ:
>  /* in case of failure ioreq->aio_errors is increased */
> @@ -638,7 +647,7 @@ static void qemu_aio_complete(void *opaque, int ret)
>  }
>  
>  ioreq->status = ioreq->aio_errors ? BLKIF_RSP_ERROR : BLKIF_RSP_OKAY;
> -if (!ioreq->blkdev->feature_grant_copy) {
> +if (!blkdev->feature_grant_copy) {
>  ioreq_unmap(ioreq);
>  }
>  ioreq_finish(ioreq);
> @@ -650,16 +659,19 @@ static void qemu_aio_complete(void *opaque, int ret)
>  }
>  case BLKIF_OP_READ:
>  if (ioreq->status == BLKIF_RSP_OKAY) {
> -block_acct_done(blk_get_stats(ioreq->blkdev->blk), &ioreq->acct);
> +block_acct_done(blk_get_stats(blkdev->blk), &ioreq->acct);
>  } else {
> - 

Re: [Xen-devel] new dma-mapping tree, was Re: clean up and modularize arch dma_mapping interface V2

2017-06-22 Thread Stephen Rothwell
Hi all,

On Wed, 21 Jun 2017 15:32:39 +0200 Marek Szyprowski  
wrote:
>
> On 2017-06-20 15:16, Christoph Hellwig wrote:
> > On Tue, Jun 20, 2017 at 11:04:00PM +1000, Stephen Rothwell wrote:  
> >> git://git.linaro.org/people/mszyprowski/linux-dma-mapping.git#dma-mapping-next
> >>
> >> Contacts: Marek Szyprowski and Kyungmin Park (cc'd)
> >>
> >> I have called your tree dma-mapping-hch for now.  The other tree has
> >> not been updated since 4.9-rc1 and I am not sure how general it is.
> >> Marek, Kyungmin, any comments?  
> > I'd be happy to join efforts - co-maintainers and reviers are always
> > welcome.  
> 
> I did some dma-mapping unification works in the past and my tree in 
> linux-next
> was a side effect of that. I think that for now it can be dropped in 
> favor of
> Christoph's tree. I can also do some review and help in maintainers work if
> needed, although I was recently busy with other stuff.

OK, so I have dropped the dma-mapping tree and renamed dma-mapping-hch
to dma-mapping.

-- 
Cheers,
Stephen Rothwell



[Xen-devel] [xen-4.7-testing test] 110944: tolerable FAIL - PUSHED

2017-06-22 Thread osstest service owner
flight 110944 xen-4.7-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/110944/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl  3 host-install(3) broken in 110902 pass in 110944
 test-xtf-amd64-amd64-2   45 xtf/test-hvm64-lbr-tsx-vmentry fail pass in 110902
 test-arm64-arm64-xl-credit2   9 debian-install fail pass in 110902
 test-armhf-armhf-xl-cubietruck 16 guest-start.2fail pass in 110902

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stopfail REGR. vs. 110430

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-rtds 16 guest-start.2   fail blocked in 110430
 test-armhf-armhf-xl-rtds 15 guest-start/debian.repeat fail in 110902 like 
110430
 test-arm64-arm64-xl-credit2 12 migrate-support-check fail in 110902 never pass
 test-arm64-arm64-xl-credit2 13 saverestore-support-check fail in 110902 never 
pass
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-checkfail  like 110430
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stopfail like 110430
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail like 110430
 test-armhf-armhf-libvirt-raw 12 saverestore-support-checkfail  like 110430
 test-armhf-armhf-libvirt 13 saverestore-support-checkfail  like 110430
 test-amd64-amd64-xl-qemut-ws16-amd64  9 windows-installfail never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64  9 windows-installfail never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-xsm  12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-win10-i386  9 windows-install fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386  9 windows-installfail never pass
 test-amd64-amd64-xl-qemut-win10-i386  9 windows-installfail never pass
 test-amd64-i386-xl-qemut-win10-i386  9 windows-install fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64  9 windows-install fail never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-amd64-i386-xl-qemut-ws16-amd64  9 windows-install fail never pass

version targeted for 

Re: [Xen-devel] [PATCH for-4.9 v3 3/3] xen/livepatch: Don't crash on encountering STN_UNDEF relocations

2017-06-22 Thread Stefano Stabellini
On Thu, 22 Jun 2017, Andrew Cooper wrote:
> A symndx of STN_UNDEF is special, and means a symbol value of 0.  While
> legitimate in the ELF standard, its existence in a livepatch is questionable
> at best. Until a plausible use case presents itself, reject such a relocation
> with -EOPNOTSUPP.
> 
> Additionally, fix an off-by-one error while range checking symndx, and perform
> a safety check on elf->sym[symndx].sym before dereferencing it, to avoid
> tripping over a NULL pointer when calculating val.
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Konrad Rzeszutek Wilk 
> CC: Ross Lagerwall 
> CC: Jan Beulich 
> CC: Stefano Stabellini 
> CC: Julien Grall 
> 
> v3:
>  * Fix off-by-one error
> v2:
>  * Reject STN_UNDEF with -EOPNOTSUPP

Reviewed-by: Stefano Stabellini 


> ---
>  xen/arch/arm/arm32/livepatch.c | 14 +-
>  xen/arch/arm/arm64/livepatch.c | 14 +-
>  xen/arch/x86/livepatch.c   | 14 +-
>  3 files changed, 39 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/arm/arm32/livepatch.c b/xen/arch/arm/arm32/livepatch.c
> index a328179..41378a5 100644
> --- a/xen/arch/arm/arm32/livepatch.c
> +++ b/xen/arch/arm/arm32/livepatch.c
> @@ -254,12 +254,24 @@ int arch_livepatch_perform(struct livepatch_elf *elf,
>  addend = get_addend(type, dest);
>  }
>  
> -if ( symndx > elf->nsym )
> +if ( symndx == STN_UNDEF )
> +{
> +dprintk(XENLOG_ERR, LIVEPATCH "%s: Encountered STN_UNDEF\n",
> +elf->name);
> +return -EOPNOTSUPP;
> +}
> +else if ( symndx >= elf->nsym )
>  {
>  dprintk(XENLOG_ERR, LIVEPATCH "%s: Relative symbol wants 
> symbol@%u which is past end!\n",
>  elf->name, symndx);
>  return -EINVAL;
>  }
> +else if ( !elf->sym[symndx].sym )
> +{
> +dprintk(XENLOG_ERR, LIVEPATCH "%s: No relative symbol@%u\n",
> +elf->name, symndx);
> +return -EINVAL;
> +}
>  
>  val = elf->sym[symndx].sym->st_value; /* S */
>  
> diff --git a/xen/arch/arm/arm64/livepatch.c b/xen/arch/arm/arm64/livepatch.c
> index 63929b1..2247b92 100644
> --- a/xen/arch/arm/arm64/livepatch.c
> +++ b/xen/arch/arm/arm64/livepatch.c
> @@ -252,12 +252,24 @@ int arch_livepatch_perform_rela(struct livepatch_elf 
> *elf,
>  int ovf = 0;
>  uint64_t val;
>  
> -if ( symndx > elf->nsym )
> +if ( symndx == STN_UNDEF )
> +{
> +dprintk(XENLOG_ERR, LIVEPATCH "%s: Encountered STN_UNDEF\n",
> +elf->name);
> +return -EOPNOTSUPP;
> +}
> +else if ( symndx >= elf->nsym )
>  {
>  dprintk(XENLOG_ERR, LIVEPATCH "%s: Relative relocation wants 
> symbol@%u which is past end!\n",
>  elf->name, symndx);
>  return -EINVAL;
>  }
> +else if ( !elf->sym[symndx].sym )
> +{
> +dprintk(XENLOG_ERR, LIVEPATCH "%s: No relative symbol@%u\n",
> +elf->name, symndx);
> +return -EINVAL;
> +}
>  
>  val = elf->sym[symndx].sym->st_value +  r->r_addend; /* S+A */
>  
> diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c
> index 7917610..406eb91 100644
> --- a/xen/arch/x86/livepatch.c
> +++ b/xen/arch/x86/livepatch.c
> @@ -170,12 +170,24 @@ int arch_livepatch_perform_rela(struct livepatch_elf 
> *elf,
>  uint8_t *dest = base->load_addr + r->r_offset;
>  uint64_t val;
>  
> -if ( symndx > elf->nsym )
> +if ( symndx == STN_UNDEF )
> +{
> +dprintk(XENLOG_ERR, LIVEPATCH "%s: Encountered STN_UNDEF\n",
> +elf->name);
> +return -EOPNOTSUPP;
> +}
> +else if ( symndx >= elf->nsym )
>  {
>  dprintk(XENLOG_ERR, LIVEPATCH "%s: Relative relocation wants 
> symbol@%u which is past end!\n",
>  elf->name, symndx);
>  return -EINVAL;
>  }
> +else if ( !elf->sym[symndx].sym )
> +{
> +dprintk(XENLOG_ERR, LIVEPATCH "%s: No symbol@%u\n",
> +elf->name, symndx);
> +return -EINVAL;
> +}
>  
>  val = r->r_addend + elf->sym[symndx].sym->st_value;
>  
> -- 
> 2.1.4
> 



Re: [Xen-devel] [RFC v2]Proposal to allow setting up shared memory areas between VMs from xl config file

2017-06-22 Thread Stefano Stabellini
On Wed, 21 Jun 2017, Zhongze Liu wrote:
> 
> 1. Motivation and Description
> 
> Virtual machines use grant table hypercalls to set up a shared page for
> inter-VM communication. These hypercalls are used by all PV
> protocols today. However, very simple guests, such as baremetal
> applications, might not have the infrastructure to handle the grant table.
> This project is about setting up several shared memory areas for inter-VM
> communication directly from the VM config file, so that the guest kernel
> doesn't have to have grant table support (in the embedded space, this is
> not unusual) to be able to communicate with other guests.
> 
> 
> 2. Implementation Plan:
> 
> 
> ==
> 2.1 Introduce a new VM config option in xl:
> ==
> The shared areas should be shareable among several (>=2) VMs, so
> every shared physical memory area is assigned to a set of VMs.
> Therefore, a “token” or “identifier” should be used here to uniquely
> identify a backing memory area.
> 
> The backing area would be taken from one domain, which we will regard
> as the "master domain", and this domain should be created prior to any
> other "slave domain"s. Again, we have to use some kind of tag to tell who
> is the "master domain".
> 
> And the ability to specify the attributes of the pages (say, WO/RO/X)
> to be shared should be also given to the user. For the master domain,
> these attributes often describes the maximum permission allowed for the
> shared pages, and for the slave domains, these attributes are often used
> to describe with what permissions this area will be mapped.
> This information should also be specified in the xl config entry.
> 
> To handle all these, I would suggest using an unsigned integer to serve as the
> identifier, and using a "master" tag in the master domain's xl config entry
> to announce that she will provide the backing memory pages. A separate
> entry would be used to describe the attributes of the shared memory area, of
> the form "prot=RW".
> For example:
> 
> In xl config file of vm1:
> 
> static_shared_mem = ["id = ID1, begin = gmfn1, end = gmfn2,
>   granularity = 4k, prot = RO, master”,
>  "id = ID2, begin = gmfn3, end = gmfn4,
>  granularity = 4k, prot = RW, master”]
> 
> In xl config file of vm2:
> 
> static_shared_mem = ["id = ID1, begin = gmfn5, end = gmfn6,
>   granularity = 4k, prot = RO”]
> 
> In xl config file of vm3:
> 
> static_shared_mem = ["id = ID2, begin = gmfn7, end = gmfn8,
>   granularity = 4k, prot = RW”]
> 
> gmfn's above are all hex of the form "0x2".
> 
> In the example above. A memory area ID1 will be shared between vm1 and vm2.
> This area will be taken from vm1 and mapped into vm2's stage-2 page table.
> The parameter "prot=RO" means that this memory area are offered with read-only
> permission. vm1 can access this area using gmfn1~gmfn2, and vm2 using
> gmfn5~gmfn6.
> Likewise, a memory area ID2 will be shared between vm1 and vm3 with read and
> write permissions. vm1 is the master and vm3 the slave. vm1 can access the
> area using gmfn3~gmfn4 and vm3 using gmfn7~gmfn8.
> 
> The "granularity" is optional in the slaves' config entries. But if it's
> presented in the slaves' config entry, it has to be the same with its 
> master's.
> Besides, the size of the gmfn range must also match. And overlapping backing
> memory areas are well defined.
> 
> Note that the "master" tag in vm1 for both ID1 and ID2 indicates that vm1
> should be created prior to both vm2 and vm3, for they both rely on the pages
> backed by vm1. If one tries to create vm2 or vm3 prior to vm1, she will get
> an error. And in vm1's config file, the "prot=RO" parameter of ID1 indicates
> that if one tries to share this page with vm1 with, say, "WR" permission,
> she will get an error, too.
> 
> ==
> 2.2 Store the mem-sharing information in xenstore
> ==
> For we don't have some persistent storage for xl to store the information
> of the shared memory areas, we have to find some way to keep it between xl
> launches. And xenstore is a good place to do this. The information for one
> shared area should include the ID, master domid and gmfn ranges and
> memory attributes in master and slave domains of this area.
> A current plan is to place the information under /local/shared_mem/ID.
> Still take the above config files as an example:
> 
> If we instantiate vm1, vm2 and vm3, one after another,
> “xenstore ls -f” should output something like this:
> 
> After VM1 was instantiated, the output of “xenstore ls -f”
> will be something like this:
> 
> 

Re: [Xen-devel] [RFC v2]Proposal to allow setting up shared memory areas between VMs from xl config file

2017-06-22 Thread Stefano Stabellini
On Fri, 23 Jun 2017, Zhongze Liu wrote:
> Hi Julien,
> 
> 2017-06-21 1:29 GMT+08:00 Julien Grall :
> > Hi,
> >
> > Thank you for the new proposal.
> >
> > On 06/20/2017 06:18 PM, Zhongze Liu wrote:
> >>
> >> In the example above. A memory area ID1 will be shared between vm1 and
> >> vm2.
> >> This area will be taken from vm1 and mapped into vm2's stage-2 page table.
> >> The parameter "prot=RO" means that this memory area are offered with
> >> read-only
> >> permission. vm1 can access this area using gmfn1~gmfn2, and vm2 using
> >> gmfn5~gmfn6.
> >
> >
> > [...]
> >
> >>
> >> ==
> >> 2.3 mapping the memory areas
> >> ==
> >> Handle the newly added config option in tools/{xl, libxl} and utilize
> >> toos/libxc to do the actual memory mapping. Specifically, we will use
> >> a wrapper to XENMEM_add_to_physmap_batch with XENMAPSPACE_gmfn_foreign to
> >> do the actual mapping. But since there isn't such a wrapper in libxc,
> >> we'll
> >> have to add a new wrapper, xc_domain_add_to_physmap_batch in
> >> libxc/xc_domain.c
> >
> >
> > In the paragrah above, you suggest the user can select the permission on the
> > shared page. However, the hypercall XENMEM_add_to_physmap does not currently
> > take permission. So how do you plan to handle that?
> >
> 
> I think this could be done via XENMEM_access_op?

I discussed this topic with Zhongze. I suggested to leave permissions as
"TODO" for the moment, given that for the use-case we have in mind they
aren't needed.
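
For reference, the libxc wrapper mentioned in 2.3 could look roughly like the
sketch below. It assumes a thin layer over XENMEM_add_to_physmap_batch with
XENMAPSPACE_gmfn_foreign; the name, signature and error handling are
placeholders rather than a final interface:

    int xc_domain_add_to_physmap_batch(xc_interface *xch,
                                       uint32_t domid,          /* slave  */
                                       uint32_t foreign_domid,  /* master */
                                       unsigned int space,      /* XENMAPSPACE_gmfn_foreign */
                                       unsigned int size,
                                       xen_ulong_t *idxs,       /* gmfns in the master */
                                       xen_pfn_t *gpfns,        /* gmfns in the slave  */
                                       int *errs)
    {
        int rc;
        DECLARE_HYPERCALL_BOUNCE(idxs, size * sizeof(*idxs),
                                 XC_HYPERCALL_BUFFER_BOUNCE_IN);
        DECLARE_HYPERCALL_BOUNCE(gpfns, size * sizeof(*gpfns),
                                 XC_HYPERCALL_BUFFER_BOUNCE_IN);
        DECLARE_HYPERCALL_BOUNCE(errs, size * sizeof(*errs),
                                 XC_HYPERCALL_BUFFER_BOUNCE_OUT);
        struct xen_add_to_physmap_batch xatpb = {
            .domid = domid,
            .space = space,
            .size = size,
            .u.foreign_domid = foreign_domid,
        };

        if ( xc_hypercall_bounce_pre(xch, idxs) ||
             xc_hypercall_bounce_pre(xch, gpfns) ||
             xc_hypercall_bounce_pre(xch, errs) )
            return -1;

        set_xen_guest_handle(xatpb.idxs, idxs);
        set_xen_guest_handle(xatpb.gpfns, gpfns);
        set_xen_guest_handle(xatpb.errs, errs);

        rc = do_memory_op(xch, XENMEM_add_to_physmap_batch, &xatpb, sizeof(xatpb));

        xc_hypercall_bounce_post(xch, idxs);
        xc_hypercall_bounce_post(xch, gpfns);
        xc_hypercall_bounce_post(xch, errs);

        return rc;
    }

Permissions would then be applied (or left as the TODO above) as a separate
step after the mapping succeeds.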



Re: [Xen-devel] [PATCH] passthrough: give XEN_DOMCTL_test_assign_device more sane semantics

2017-06-22 Thread Daniel De Graaf

On 06/22/2017 05:40 AM, George Dunlap wrote:

On 22/06/17 08:05, Jan Beulich wrote:

On 21.06.17 at 18:36,  wrote:

On 21/06/17 16:59, Jan Beulich wrote:

On 21.06.17 at 16:38,  wrote:

On 21/06/17 11:08, Jan Beulich wrote:

So far callers of the libxc interface passed in a domain ID which was
then ignored in the hypervisor. Instead, make the hypervisor honor it
(accepting DOMID_INVALID to obtain original behavior), allowing to
query whether a device is assigned to a particular domain. Ignore the
passed in domain ID at the libxc layer instead, in order to not break
existing callers. New libxc functions would need to be added if callers
wanted to leverage the new functionality.


I don't think your modified description matches the name of the call at all.

It looks like the callers expect "test_assign_device" to answer the
question: "Can I assign a device to this domain"?


I don't think so - the question being answered by the original
operation is "Is this device assigned to any domain?" with the
implied inverse "Is this device available to be assigned to some
domain (i.e. it is currently unassigned or owned by Dom0)?"


If the question were "Is this device assigned to any domain?", then I
would expect:
1. The return value to be a boolean
2. It would always return, "No it's not assigned" in the case where
there is no IOMMU.

However, that's not what happens:
1. It returns "success" if there is an IOMMU and the device is *not*
assigned, and returns an error if the device is assigned
2. It returns an error if there is no IOMMU.

The only place in the code this is called 'for real' in the tree is in
libxl_pci.c:libxl__device_pci_add()

 if (libxl__domain_type(gc, domid) == LIBXL_DOMAIN_TYPE_HVM) {
 rc = xc_test_assign_device(ctx->xch, domid,
pcidev_encode_bdf(pcidev));
 if (rc) {
 LOGD(ERROR, domid,
  "PCI device %04x:%02x:%02x.%u %s?",
  pcidev->domain, pcidev->bus, pcidev->dev, pcidev->func,
  errno == ENOSYS ? "cannot be assigned - no IOMMU"
  : "already assigned to a different guest");
 goto out;
 }
 }

Here 'domid' is the domain to which libxl wants to assign the device.
So libxl is now asking Xen, "Am I allowed to assign device $bdf to
domain $domain?"

Your description provides the *algorithm* by which Xen normally provides
an answer: that is, normally the only thing Xen cares about is that it
hasn't already been assigned to a domain.  But it still remains the case
that what libxl is asking is, "Can I assign X to Y?"


Taking the log message into account that you quote, I do not
view the code's intention to be what you describe.


Well, I'm not sure what to say, because in my view the log message
supports my view. :-)  Note that there are two errors, both explaining
why the domain cannot be assigned -- one is "no IOMMU", one is "already
assigned to a different guest".

Yes, at the moment it doesn't have a separate message for -EPERM (which
is presumably what XSM would return if there were some other problem).
But it also doesn't correctly report other potential errors: -ENODEV if
you try to assign a DT device on a PCI-based system, or a PCI device on
a DT-based system.  (Apparently we also return -EINVAL if you included
inappropriate flags, *or* if the device didn't exist, *or* if the device
was already assigned somewhere else.  As long as we're re-painting
things we should probably change this as well.)

But to make test_assign_device answer the question, "Is this assigned to
a domU?", you'd have to have it return SUCCESS when there is no IOMMU
(since the device is not, in fact, assigned to a domU); and thus libxl
would have to make a separate call to find out if an IOMMU was present.


It looks like it's meant to be used in XSM environments, to allow a
policy to permit or forbid specific guests to have access to specific
devices.  On a default (non-XSM) system, the answer to that question
doesn't depend on the domain it's being assigned to, but only whether
the device is already assigned to another domain; but on XSM systems the
logic can presumably be more complicated.

That sounds like a perfectly sane semantic to me, and this patch removes
that ability.


And again I don't think so: Prior to the patch, do_domctl() at its
very top makes sure to entirely ignore the passed in domain ID.
This code sits ahead of the XSM check, so XSM has no way of
knowing which domain has been specified by the caller.


Right, I see that now.

Still, I assert that the original hypercall semantics is a very useful
one, and what you're doing is changing the hypercall such that the
question can no longer be asked.  It would be better to extend things so
that XSM can actually deny device assignment based on both the bdf and
the domain.

Do you have a particular use case in mind for your alternate hypercall?


No - I'm open to any change to it which 

Re: [Xen-devel] [PATCH v2 2/2] x86/xen/efi: Init only efi struct members used by Xen

2017-06-22 Thread Boris Ostrovsky
On 06/22/2017 06:51 AM, Daniel Kiper wrote:
> The current approach, wholesale initialization of the efi struct from efi_xen,
> is not good. Usually when a new member is added it is properly initialized in
> drivers/firmware/efi/efi.c but not in arch/x86/xen/efi.c; I have seen this
> happen a few times already. So, let's initialize only the efi struct members
> used by Xen to avoid such issues in the future.
>
> Signed-off-by: Daniel Kiper 
> Acked-by: Ard Biesheuvel 

Reviewed-by: Boris Ostrovsky 





[Xen-devel] [xen-4.9-testing test] 110942: regressions - FAIL

2017-06-22 Thread osstest service owner
flight 110942 xen-4.9-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/110942/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl-credit2 15 guest-start/debian.repeat fail REGR. vs. 110542
 test-armhf-armhf-xl   6 xen-boot fail REGR. vs. 110550

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stopfail REGR. vs. 110542

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail like 110499
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stopfail like 110524
 test-amd64-amd64-xl-qemut-win7-amd64 15 guest-localmigrate/x10 fail like 110550
 test-amd64-amd64-xl-rtds  9 debian-install   fail  like 110550
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64  9 windows-installfail never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemut-ws16-amd64  9 windows-installfail never pass
 build-amd64-prev  6 xen-build/dist-test  fail   never pass
 test-arm64-arm64-xl-credit2  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  13 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-xsm  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 saverestore-support-checkfail   never pass
 build-i386-prev   6 xen-build/dist-test  fail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-amd64-i386-xl-qemuu-win10-i386  9 windows-install fail never pass
 test-amd64-amd64-xl-qemut-win10-i386  9 windows-installfail never pass
 test-amd64-amd64-xl-qemuu-win10-i386  9 windows-installfail never pass
 test-amd64-i386-xl-qemut-win10-i386  9 windows-install fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64  9 windows-install fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64  9 windows-install fail never pass

version targeted for testing:
 xen  b38b1479a532f08fedd7f3b761673bc78b66739d
baseline version:
 xen  e197d29514165202308fe65db6effc4835aabfeb

Last test of basis   110550  2017-06-18 21:49:42 Z3 days
Failing since110568  2017-06-19 13:14:32 Z3 days3 attempts
Testing same since   110942  2017-06-21 16:30:45 Z1 days1 attempts


[Xen-devel] [PATCH v5 17/18] xen/pvcalls: implement write

2017-06-22 Thread Stefano Stabellini
When the other end notifies us that there is data to be written
(pvcalls_back_conn_event), increment the io and write counters, and
schedule the ioworker.

Implement the write function called by ioworker by reading the data from
the data ring, writing it to the socket by calling inet_sendmsg.

Set out_error on error.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 74 +-
 1 file changed, 73 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index ccceabd..424dcac 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -179,7 +179,66 @@ static void pvcalls_conn_back_read(void *opaque)
 
 static int pvcalls_conn_back_write(struct sock_mapping *map)
 {
-   return 0;
+   struct pvcalls_data_intf *intf = map->ring;
+   struct pvcalls_data *data = &map->data;
+   struct msghdr msg;
+   struct kvec vec[2];
+   RING_IDX cons, prod, size, array_size;
+   int ret;
+
+   cons = intf->out_cons;
+   prod = intf->out_prod;
+   /* read the indexes before dealing with the data */
+   virt_mb();
+
+   array_size = XEN_FLEX_RING_SIZE(map->ring_order);
+   size = pvcalls_queued(prod, cons, array_size);
+   if (size == 0)
+   return 0;
+
+   memset(&msg, 0, sizeof(msg));
+   msg.msg_flags |= MSG_DONTWAIT;
+   msg.msg_iter.type = ITER_KVEC|READ;
+   msg.msg_iter.count = size;
+   if (pvcalls_mask(prod, array_size) > pvcalls_mask(cons, array_size)) {
+   vec[0].iov_base = data->out + pvcalls_mask(cons, array_size);
+   vec[0].iov_len = size;
+   msg.msg_iter.kvec = vec;
+   msg.msg_iter.nr_segs = 1;
+   } else {
+   vec[0].iov_base = data->out + pvcalls_mask(cons, array_size);
+   vec[0].iov_len = array_size - pvcalls_mask(cons, array_size);
+   vec[1].iov_base = data->out;
+   vec[1].iov_len = size - vec[0].iov_len;
+   msg.msg_iter.kvec = vec;
+   msg.msg_iter.nr_segs = 2;
+   }
+
+   atomic_set(&map->write, 0);
+   ret = inet_sendmsg(map->sock, &msg, size);
+   if (ret == -EAGAIN || (ret >= 0 && ret < size)) {
+   atomic_inc(&map->write);
+   atomic_inc(&map->io);
+   }
+   if (ret == -EAGAIN)
+   return ret;
+
+   /* write the data, then update the indexes */
+   virt_wmb();
+   if (ret < 0) {
+   intf->out_error = ret;
+   } else {
+   intf->out_error = 0;
+   intf->out_cons = cons + ret;
+   prod = intf->out_prod;
+   }
+   /* update the indexes, then notify the other end */
+   virt_wmb();
+   if (prod != cons + ret)
+   atomic_inc(&map->write);
+   notify_remote_via_irq(map->irq);
+
+   return ret;
 }
 
 static void pvcalls_back_ioworker(struct work_struct *work)
@@ -849,6 +908,19 @@ static irqreturn_t pvcalls_back_event(int irq, void 
*dev_id)
 
 static irqreturn_t pvcalls_back_conn_event(int irq, void *sock_map)
 {
+   struct sock_mapping *map = sock_map;
+   struct pvcalls_ioworker *iow;
+
+   if (map == NULL || map->sock == NULL || map->sock->sk == NULL ||
+   map->sock->sk->sk_user_data != map)
+   return IRQ_HANDLED;
+
+   iow = &map->ioworker;
+
+   atomic_inc(&map->write);
+   atomic_inc(&map->io);
+   queue_work(iow->wq, &iow->register_work);
+
return IRQ_HANDLED;
 }
 
-- 
1.9.1




[Xen-devel] [PATCH v5 07/18] xen/pvcalls: implement socket command

2017-06-22 Thread Stefano Stabellini
Just reply with success to the other end for now. Delay the allocation
of the actual socket to bind and/or connect.

Signed-off-by: Stefano Stabellini 
Reviewed-by: Boris Ostrovsky 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 437c2ad..953458b 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -12,12 +12,17 @@
  * GNU General Public License for more details.
  */
 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
 
 #include 
 #include 
@@ -54,6 +59,28 @@ struct pvcalls_fedata {
 static int pvcalls_back_socket(struct xenbus_device *dev,
struct xen_pvcalls_request *req)
 {
+   struct pvcalls_fedata *fedata;
+   int ret;
+   struct xen_pvcalls_response *rsp;
+
+   fedata = dev_get_drvdata(&dev->dev);
+
+   if (req->u.socket.domain != AF_INET ||
+   req->u.socket.type != SOCK_STREAM ||
+   (req->u.socket.protocol != IPPROTO_IP &&
+req->u.socket.protocol != AF_INET))
+   ret = -EAFNOSUPPORT;
+   else
+   ret = 0;
+
+   /* leave the actual socket allocation for later */
+
+   rsp = RING_GET_RESPONSE(&fedata->ring, fedata->ring.rsp_prod_pvt++);
+   rsp->req_id = req->req_id;
+   rsp->cmd = req->cmd;
+   rsp->u.socket.id = req->u.socket.id;
+   rsp->ret = ret;
+
return 0;
 }
 
-- 
1.9.1




[Xen-devel] [PATCH v5 16/18] xen/pvcalls: implement read

2017-06-22 Thread Stefano Stabellini
When an active socket has data available, increment the io and read
counters, and schedule the ioworker.

Implement the read function by reading from the socket, writing the data
to the data ring.

Set in_error on error.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 85 ++
 1 file changed, 85 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index ab7882a..ccceabd 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -100,6 +100,81 @@ static int pvcalls_back_release_active(struct 
xenbus_device *dev,
 
 static void pvcalls_conn_back_read(void *opaque)
 {
+   struct sock_mapping *map = (struct sock_mapping *)opaque;
+   struct msghdr msg;
+   struct kvec vec[2];
+   RING_IDX cons, prod, size, wanted, array_size, masked_prod, masked_cons;
+   int32_t error;
+   struct pvcalls_data_intf *intf = map->ring;
+   struct pvcalls_data *data = &map->data;
+   unsigned long flags;
+   int ret;
+
+   array_size = XEN_FLEX_RING_SIZE(map->ring_order);
+   cons = intf->in_cons;
+   prod = intf->in_prod;
+   error = intf->in_error;
+   /* read the indexes first, then deal with the data */
+   virt_mb();
+
+   if (error)
+   return;
+
+   size = pvcalls_queued(prod, cons, array_size);
+   if (size >= array_size)
+   return;
+   spin_lock_irqsave(&map->sock->sk->sk_receive_queue.lock, flags);
+   if (skb_queue_empty(&map->sock->sk->sk_receive_queue)) {
+   atomic_set(&map->read, 0);
+   spin_unlock_irqrestore(&map->sock->sk->sk_receive_queue.lock,
+   flags);
+   return;
+   }
+   spin_unlock_irqrestore(&map->sock->sk->sk_receive_queue.lock, flags);
+   wanted = array_size - size;
+   masked_prod = pvcalls_mask(prod, array_size);
+   masked_cons = pvcalls_mask(cons, array_size);
+
+   memset(&msg, 0, sizeof(msg));
+   msg.msg_iter.type = ITER_KVEC|WRITE;
+   msg.msg_iter.count = wanted;
+   if (masked_prod < masked_cons) {
+   vec[0].iov_base = data->in + masked_prod;
+   vec[0].iov_len = wanted;
+   msg.msg_iter.kvec = vec;
+   msg.msg_iter.nr_segs = 1;
+   } else {
+   vec[0].iov_base = data->in + masked_prod;
+   vec[0].iov_len = array_size - masked_prod;
+   vec[1].iov_base = data->in;
+   vec[1].iov_len = wanted - vec[0].iov_len;
+   msg.msg_iter.kvec = vec;
+   msg.msg_iter.nr_segs = 2;
+   }
+
+   atomic_set(&map->read, 0);
+   ret = inet_recvmsg(map->sock, &msg, wanted, MSG_DONTWAIT);
+   WARN_ON(ret > wanted);
+   if (ret == -EAGAIN) /* shouldn't happen */
+   return;
+   if (!ret)
+   ret = -ENOTCONN;
+   spin_lock_irqsave(&map->sock->sk->sk_receive_queue.lock, flags);
+   if (ret > 0 && !skb_queue_empty(&map->sock->sk->sk_receive_queue))
+   atomic_inc(&map->read);
+   spin_unlock_irqrestore(&map->sock->sk->sk_receive_queue.lock, flags);
+
+   /* write the data, then modify the indexes */
+   virt_wmb();
+   if (ret < 0)
+   intf->in_error = ret;
+   else
+   intf->in_prod = prod + ret;
+   /* update the indexes, then notify the other end */
+   virt_wmb();
+   notify_remote_via_irq(map->irq);
+
+   return;
 }
 
 static int pvcalls_conn_back_write(struct sock_mapping *map)
@@ -172,6 +247,16 @@ static void pvcalls_sk_state_change(struct sock *sock)
 
 static void pvcalls_sk_data_ready(struct sock *sock)
 {
+   struct sock_mapping *map = sock->sk_user_data;
+   struct pvcalls_ioworker *iow;
+
+   if (map == NULL)
+   return;
+
+   iow = &map->ioworker;
+   atomic_inc(&map->read);
+   atomic_inc(&map->io);
+   queue_work(iow->wq, &iow->register_work);
 }
 
 static struct sock_mapping *pvcalls_new_active_socket(
-- 
1.9.1




[Xen-devel] [PATCH v5 04/18] xen/pvcalls: xenbus state handling

2017-06-22 Thread Stefano Stabellini
Introduce the code to handle xenbus state changes.

Implement the probe function for the pvcalls backend. Write the
supported versions, max-page-order and function-calls nodes to xenstore,
as required by the protocol.

Introduce stub functions for disconnecting/connecting to a frontend.

Signed-off-by: Stefano Stabellini 
Reviewed-by: Boris Ostrovsky 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 152 +
 1 file changed, 152 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 9044cf2..7bce750 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -25,20 +25,172 @@
 #include 
 #include 
 
+#define PVCALLS_VERSIONS "1"
+#define MAX_RING_ORDER XENBUS_MAX_RING_GRANT_ORDER
+
 struct pvcalls_back_global {
struct list_head frontends;
struct semaphore frontends_lock;
 } pvcalls_back_global;
 
+static int backend_connect(struct xenbus_device *dev)
+{
+   return 0;
+}
+
+static int backend_disconnect(struct xenbus_device *dev)
+{
+   return 0;
+}
+
 static int pvcalls_back_probe(struct xenbus_device *dev,
  const struct xenbus_device_id *id)
 {
+   int err, abort;
+   struct xenbus_transaction xbt;
+
+again:
+   abort = 1;
+
+   err = xenbus_transaction_start(&xbt);
+   if (err) {
+   pr_warn("%s cannot create xenstore transaction\n", __func__);
+   return err;
+   }
+
+   err = xenbus_printf(xbt, dev->nodename, "versions", "%s",
+   PVCALLS_VERSIONS);
+   if (err) {
+   pr_warn("%s write out 'version' failed\n", __func__);
+   goto abort;
+   }
+
+   err = xenbus_printf(xbt, dev->nodename, "max-page-order", "%u",
+   MAX_RING_ORDER);
+   if (err) {
+   pr_warn("%s write out 'max-page-order' failed\n", __func__);
+   goto abort;
+   }
+
+   err = xenbus_printf(xbt, dev->nodename, "function-calls",
+   XENBUS_FUNCTIONS_CALLS);
+   if (err) {
+   pr_warn("%s write out 'function-calls' failed\n", __func__);
+   goto abort;
+   }
+
+   abort = 0;
+abort:
+   err = xenbus_transaction_end(xbt, abort);
+   if (err) {
+   if (err == -EAGAIN && !abort)
+   goto again;
+   pr_warn("%s cannot complete xenstore transaction\n", __func__);
+   return err;
+   }
+
+   xenbus_switch_state(dev, XenbusStateInitWait);
+
return 0;
 }
 
+static void set_backend_state(struct xenbus_device *dev,
+ enum xenbus_state state)
+{
+   while (dev->state != state) {
+   switch (dev->state) {
+   case XenbusStateClosed:
+   switch (state) {
+   case XenbusStateInitWait:
+   case XenbusStateConnected:
+   xenbus_switch_state(dev, XenbusStateInitWait);
+   break;
+   case XenbusStateClosing:
+   xenbus_switch_state(dev, XenbusStateClosing);
+   break;
+   default:
+   __WARN();
+   }
+   break;
+   case XenbusStateInitWait:
+   case XenbusStateInitialised:
+   switch (state) {
+   case XenbusStateConnected:
+   backend_connect(dev);
+   xenbus_switch_state(dev, XenbusStateConnected);
+   break;
+   case XenbusStateClosing:
+   case XenbusStateClosed:
+   xenbus_switch_state(dev, XenbusStateClosing);
+   break;
+   default:
+   __WARN();
+   }
+   break;
+   case XenbusStateConnected:
+   switch (state) {
+   case XenbusStateInitWait:
+   case XenbusStateClosing:
+   case XenbusStateClosed:
+   down(&pvcalls_back_global.frontends_lock);
+   backend_disconnect(dev);
+   up(&pvcalls_back_global.frontends_lock);
+   xenbus_switch_state(dev, XenbusStateClosing);
+   break;
+   default:
+   __WARN();
+   }
+   break;
+   case XenbusStateClosing:
+   switch (state) {
+   case XenbusStateInitWait:
+

[Xen-devel] [PATCH v5 06/18] xen/pvcalls: handle commands from the frontend

2017-06-22 Thread Stefano Stabellini
When the other end notifies us that there are commands to be read
(pvcalls_back_event), wake up the backend thread to parse the command.

The command ring works like most other Xen rings, so use the usual
ring macros to read and write to it. The functions implementing the
commands are empty stubs for now.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 119 +
 1 file changed, 119 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index e4c2e46..437c2ad 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -51,12 +51,131 @@ struct pvcalls_fedata {
struct work_struct register_work;
 };
 
+static int pvcalls_back_socket(struct xenbus_device *dev,
+   struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_connect(struct xenbus_device *dev,
+   struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_release(struct xenbus_device *dev,
+   struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_bind(struct xenbus_device *dev,
+struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_listen(struct xenbus_device *dev,
+  struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_accept(struct xenbus_device *dev,
+  struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_poll(struct xenbus_device *dev,
+struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_handle_cmd(struct xenbus_device *dev,
+  struct xen_pvcalls_request *req)
+{
+   int ret = 0;
+
+   switch (req->cmd) {
+   case PVCALLS_SOCKET:
+   ret = pvcalls_back_socket(dev, req);
+   break;
+   case PVCALLS_CONNECT:
+   ret = pvcalls_back_connect(dev, req);
+   break;
+   case PVCALLS_RELEASE:
+   ret = pvcalls_back_release(dev, req);
+   break;
+   case PVCALLS_BIND:
+   ret = pvcalls_back_bind(dev, req);
+   break;
+   case PVCALLS_LISTEN:
+   ret = pvcalls_back_listen(dev, req);
+   break;
+   case PVCALLS_ACCEPT:
+   ret = pvcalls_back_accept(dev, req);
+   break;
+   case PVCALLS_POLL:
+   ret = pvcalls_back_poll(dev, req);
+   break;
+   default:
+   ret = -ENOTSUPP;
+   break;
+   }
+   return ret;
+}
+
 static void pvcalls_back_work(struct work_struct *work)
 {
+   struct pvcalls_fedata *fedata = container_of(work,
+   struct pvcalls_fedata, register_work);
+   int notify, notify_all = 0, more = 1;
+   struct xen_pvcalls_request req;
+   struct xenbus_device *dev = fedata->dev;
+
+   while (more) {
+   while (RING_HAS_UNCONSUMED_REQUESTS(&fedata->ring)) {
+   RING_COPY_REQUEST(&fedata->ring,
+ fedata->ring.req_cons++,
+ &req);
+
+   if (!pvcalls_back_handle_cmd(dev, &req)) {
+   RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(
+   &fedata->ring, notify);
+   notify_all += notify;
+   }
+   }
+
+   if (notify_all)
+   notify_remote_via_irq(fedata->irq);
+
+   RING_FINAL_CHECK_FOR_REQUESTS(&fedata->ring, more);
+   }
 }
 
 static irqreturn_t pvcalls_back_event(int irq, void *dev_id)
 {
+   struct xenbus_device *dev = dev_id;
+   struct pvcalls_fedata *fedata = NULL;
+
+   if (dev == NULL)
+   return IRQ_HANDLED;
+
+   fedata = dev_get_drvdata(>dev);
+   if (fedata == NULL)
+   return IRQ_HANDLED;
+
+   /*
+* TODO: a small theoretical race exists if we try to queue work
+* after pvcalls_back_work checked for final requests and before
+* it returns. The queuing will fail, and pvcalls_back_work
+* won't do the work because it is about to return. In that
+* case, we lose the notification.
+*/
+   queue_work(fedata->wq, &fedata->register_work);
+
return IRQ_HANDLED;
 }
 
-- 
1.9.1




[Xen-devel] [PATCH v5 15/18] xen/pvcalls: implement the ioworker functions

2017-06-22 Thread Stefano Stabellini
We have one ioworker per socket. Each ioworker goes through the list of
outstanding read/write requests. Once all requests have been dealt with,
it returns.

We use one atomic counter per socket for "read" operations and one
for "write" operations to keep track of the reads/writes to do.

We also use one atomic counter ("io") per ioworker to keep track of how
many outstanding requests we have in total assigned to the ioworker. The
ioworker finishes when there are none.

Signed-off-by: Stefano Stabellini 
Reviewed-by: Boris Ostrovsky 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 7a8e866..ab7882a 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -98,8 +98,35 @@ static int pvcalls_back_release_active(struct xenbus_device 
*dev,
   struct pvcalls_fedata *fedata,
   struct sock_mapping *map);
 
+static void pvcalls_conn_back_read(void *opaque)
+{
+}
+
+static int pvcalls_conn_back_write(struct sock_mapping *map)
+{
+   return 0;
+}
+
 static void pvcalls_back_ioworker(struct work_struct *work)
 {
+   struct pvcalls_ioworker *ioworker = container_of(work,
+   struct pvcalls_ioworker, register_work);
+   struct sock_mapping *map = container_of(ioworker, struct sock_mapping,
+   ioworker);
+
+   while (atomic_read(&map->io) > 0) {
+   if (atomic_read(&map->release) > 0) {
+   atomic_set(&map->release, 0);
+   return;
+   }
+
+   if (atomic_read(&map->read) > 0)
+   pvcalls_conn_back_read(map);
+   if (atomic_read(&map->write) > 0)
+   pvcalls_conn_back_write(map);
+
+   atomic_dec(&map->io);
+   }
 }
 
 static int pvcalls_back_socket(struct xenbus_device *dev,
-- 
1.9.1




[Xen-devel] [PATCH v5 14/18] xen/pvcalls: disconnect and module_exit

2017-06-22 Thread Stefano Stabellini
Implement backend_disconnect. Call pvcalls_back_release_active on active
sockets and pvcalls_back_release_passive on passive sockets.

Implement module_exit by calling backend_disconnect on frontend
connections.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 53 ++
 1 file changed, 53 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index f6f88ce..7a8e866 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -812,6 +812,43 @@ static int backend_connect(struct xenbus_device *dev)
 
 static int backend_disconnect(struct xenbus_device *dev)
 {
+   struct pvcalls_fedata *fedata;
+   struct sock_mapping *map, *n;
+   struct sockpass_mapping *mappass;
+   struct radix_tree_iter iter;
+   void **slot;
+
+
+   fedata = dev_get_drvdata(&dev->dev);
+
+   down(&fedata->socket_lock);
+   list_for_each_entry_safe(map, n, &fedata->socket_mappings, list) {
+   list_del(&map->list);
+   pvcalls_back_release_active(dev, fedata, map);
+   }
+
+   radix_tree_for_each_slot(slot, &fedata->socketpass_mappings, &iter, 0) {
+   mappass = radix_tree_deref_slot(slot);
+   if (!mappass)
+   continue;
+   if (radix_tree_exception(mappass)) {
+   if (radix_tree_deref_retry(mappass))
+   slot = radix_tree_iter_retry(&iter);
+   } else {
+   radix_tree_delete(&fedata->socketpass_mappings, mappass->id);
+   pvcalls_back_release_passive(dev, fedata, mappass);
+   }
+   }
+   up(&fedata->socket_lock);
+
+   xenbus_unmap_ring_vfree(dev, fedata->sring);
+   unbind_from_irqhandler(fedata->irq, dev);
+
+   list_del(&fedata->list);
+   destroy_workqueue(fedata->wq);
+   kfree(fedata);
+   dev_set_drvdata(&dev->dev, NULL);
+
return 0;
 }
 
@@ -1005,3 +1042,19 @@ static int __init pvcalls_back_init(void)
return 0;
 }
 module_init(pvcalls_back_init);
+
+static void __exit pvcalls_back_fin(void)
+{
+   struct pvcalls_fedata *fedata, *nfedata;
+
+   down(&pvcalls_back_global.frontends_lock);
+   list_for_each_entry_safe(fedata, nfedata, &pvcalls_back_global.frontends,
+list) {
+   backend_disconnect(fedata->dev);
+   }
+   up(&pvcalls_back_global.frontends_lock);
+
+   xenbus_unregister_driver(&pvcalls_back_driver);
+}
+
+module_exit(pvcalls_back_fin);
-- 
1.9.1




[Xen-devel] [PATCH v5 18/18] xen: introduce a Kconfig option to enable the pvcalls backend

2017-06-22 Thread Stefano Stabellini
Also add pvcalls-back to the Makefile.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/Kconfig  | 12 
 drivers/xen/Makefile |  1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index f15bb3b7..4545561 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -196,6 +196,18 @@ config XEN_PCIDEV_BACKEND
 
  If in doubt, say m.
 
+config XEN_PVCALLS_BACKEND
+   bool "XEN PV Calls backend driver"
+   depends on INET && XEN && XEN_BACKEND
+   default n
+   help
+ Experimental backend for the Xen PV Calls protocol
+ (https://xenbits.xen.org/docs/unstable/misc/pvcalls.html). It
+ allows PV Calls frontends to send POSIX calls to the backend,
+ which implements them.
+
+ If in doubt, say n.
+
 config XEN_SCSI_BACKEND
tristate "XEN SCSI backend driver"
depends on XEN && XEN_BACKEND && TARGET_CORE
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 8feab810..480b928 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_XEN_ACPI_PROCESSOR)  += xen-acpi-processor.o
 obj-$(CONFIG_XEN_EFI)  += efi.o
 obj-$(CONFIG_XEN_SCSI_BACKEND) += xen-scsiback.o
 obj-$(CONFIG_XEN_AUTO_XLATE)   += xlate_mmu.o
+obj-$(CONFIG_XEN_PVCALLS_BACKEND)  += pvcalls-back.o
 xen-evtchn-y   := evtchn.o
 xen-gntdev-y   := gntdev.o
 xen-gntalloc-y := gntalloc.o
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 11/18] xen/pvcalls: implement accept command

2017-06-22 Thread Stefano Stabellini
Implement the accept command by calling inet_accept. To avoid blocking
in the kernel, call inet_accept(O_NONBLOCK) from a workqueue, which gets
scheduled on sk_data_ready (for a passive socket, it means that there
are connections to accept).

Use the reqcopy field to store the request. Accept the new socket from
the delayed work function, create a new sock_mapping for it, map
the indexes page and data ring, and reply to the other end. Allocate an
ioworker for the socket.

Only support one outstanding blocking accept request for every socket at
any time.

Add a field to sock_mapping to remember the passive socket from which an
active socket was created.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 113 +
 1 file changed, 113 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 2a47425..62738e4 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -64,6 +64,7 @@ struct pvcalls_ioworker {
 struct sock_mapping {
struct list_head list;
struct pvcalls_fedata *fedata;
+   struct sockpass_mapping *sockpass;
struct socket *sock;
uint64_t id;
grant_ref_t ref;
@@ -279,10 +280,83 @@ static int pvcalls_back_release(struct xenbus_device *dev,
 
 static void __pvcalls_back_accept(struct work_struct *work)
 {
+   struct sockpass_mapping *mappass = container_of(
+   work, struct sockpass_mapping, register_work);
+   struct sock_mapping *map;
+   struct pvcalls_ioworker *iow;
+   struct pvcalls_fedata *fedata;
+   struct socket *sock;
+   struct xen_pvcalls_response *rsp;
+   struct xen_pvcalls_request *req;
+   int notify;
+   int ret = -EINVAL;
+   unsigned long flags;
+
+   fedata = mappass->fedata;
+   /*
+* __pvcalls_back_accept can race against pvcalls_back_accept.
+* We only need to check the value of "cmd" on read. It could be
+* done atomically, but to simplify the code on the write side, we
+* use a spinlock.
+*/
+   spin_lock_irqsave(&mappass->copy_lock, flags);
+   req = &mappass->reqcopy;
+   if (req->cmd != PVCALLS_ACCEPT) {
+   spin_unlock_irqrestore(&mappass->copy_lock, flags);
+   return;
+   }
+   spin_unlock_irqrestore(&mappass->copy_lock, flags);
+
+   sock = sock_alloc();
+   if (sock == NULL)
+   goto out_error;
+   sock->type = mappass->sock->type;
+   sock->ops = mappass->sock->ops;
+
+   ret = inet_accept(mappass->sock, sock, O_NONBLOCK, true);
+   if (ret == -EAGAIN) {
+   sock_release(sock);
+   goto out_error;
+   }
+
+   map = pvcalls_new_active_socket(fedata,
+   req->u.accept.id_new,
+   req->u.accept.ref,
+   req->u.accept.evtchn,
+   sock);
+   if (!map) {
+   ret = -EFAULT;
+   sock_release(sock);
+   goto out_error;
+   }
+
+   map->sockpass = mappass;
+   iow = &map->ioworker;
+   atomic_inc(&map->read);
+   atomic_inc(&map->io);
+   queue_work(iow->wq, &iow->register_work);
+
+out_error:
+   rsp = RING_GET_RESPONSE(&fedata->ring, fedata->ring.rsp_prod_pvt++);
+   rsp->req_id = req->req_id;
+   rsp->cmd = req->cmd;
+   rsp->u.accept.id = req->u.accept.id;
+   rsp->ret = ret;
+   RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&fedata->ring, notify);
+   if (notify)
+   notify_remote_via_irq(fedata->irq);
+
+   mappass->reqcopy.cmd = 0;
 }
 
 static void pvcalls_pass_sk_data_ready(struct sock *sock)
 {
+   struct sockpass_mapping *mappass = sock->sk_user_data;
+
+   if (mappass == NULL)
+   return;
+
+   queue_work(mappass->wq, &mappass->register_work);
 }
 
 static int pvcalls_back_bind(struct xenbus_device *dev,
@@ -388,6 +462,45 @@ static int pvcalls_back_listen(struct xenbus_device *dev,
 static int pvcalls_back_accept(struct xenbus_device *dev,
   struct xen_pvcalls_request *req)
 {
+   struct pvcalls_fedata *fedata;
+   struct sockpass_mapping *mappass;
+   int ret = -EINVAL;
+   struct xen_pvcalls_response *rsp;
+   unsigned long flags;
+
+   fedata = dev_get_drvdata(&dev->dev);
+
+   down(&fedata->socket_lock);
+   mappass = radix_tree_lookup(&fedata->socketpass_mappings,
+   req->u.accept.id);
+   up(&fedata->socket_lock);
+   if (mappass == NULL)
+   goto out_error;
+
+   /*
+* Limitation of the current implementation: only support one
+* concurrent accept or poll call on one socket.
+*/
+   spin_lock_irqsave(&mappass->copy_lock, flags);
+   if (mappass->reqcopy.cmd != 0) {
+   spin_unlock_irqrestore(&mappass->copy_lock, flags);
+   ret = -EINTR;
+  

[Xen-devel] [PATCH v5 05/18] xen/pvcalls: connect to a frontend

2017-06-22 Thread Stefano Stabellini
Introduce a per-frontend data structure named pvcalls_fedata. It
contains pointers to the command ring, its event channel, a list of
active sockets and a tree of passive sockets (passive sockets need to be
looked up from the id on listen, accept and poll commands, while active
sockets only on release).

It also has an unbound workqueue to schedule the work of parsing and
executing commands on the command ring. socket_lock protects the two
lists. In pvcalls_back_global, keep a list of connected frontends.

Signed-off-by: Stefano Stabellini 
Reviewed-by: Boris Ostrovsky 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 92 ++
 1 file changed, 92 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 7bce750..e4c2e46 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -33,9 +33,101 @@ struct pvcalls_back_global {
struct semaphore frontends_lock;
 } pvcalls_back_global;
 
+/*
+ * Per-frontend data structure. It contains pointers to the command
+ * ring, its event channel, a list of active sockets and a tree of
+ * passive sockets.
+ */
+struct pvcalls_fedata {
+   struct list_head list;
+   struct xenbus_device *dev;
+   struct xen_pvcalls_sring *sring;
+   struct xen_pvcalls_back_ring ring;
+   int irq;
+   struct list_head socket_mappings;
+   struct radix_tree_root socketpass_mappings;
+   struct semaphore socket_lock;
+   struct workqueue_struct *wq;
+   struct work_struct register_work;
+};
+
+static void pvcalls_back_work(struct work_struct *work)
+{
+}
+
+static irqreturn_t pvcalls_back_event(int irq, void *dev_id)
+{
+   return IRQ_HANDLED;
+}
+
 static int backend_connect(struct xenbus_device *dev)
 {
+   int err, evtchn;
+   grant_ref_t ring_ref;
+   struct pvcalls_fedata *fedata = NULL;
+
+   fedata = kzalloc(sizeof(struct pvcalls_fedata), GFP_KERNEL);
+   if (!fedata)
+   return -ENOMEM;
+
+   err = xenbus_scanf(XBT_NIL, dev->otherend, "port", "%u",
+  );
+   if (err != 1) {
+   err = -EINVAL;
+   xenbus_dev_fatal(dev, err, "reading %s/event-channel",
+dev->otherend);
+   goto error;
+   }
+
+   err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-ref", "%u", _ref);
+   if (err != 1) {
+   err = -EINVAL;
+   xenbus_dev_fatal(dev, err, "reading %s/ring-ref",
+dev->otherend);
+   goto error;
+   }
+
+   err = bind_interdomain_evtchn_to_irqhandler(dev->otherend_id, evtchn,
+   pvcalls_back_event, 0,
+   "pvcalls-backend", dev);
+   if (err < 0)
+   goto error;
+   fedata->irq = err;
+
+   fedata->wq = alloc_workqueue("pvcalls_back_wq", WQ_UNBOUND, 1);
+   if (!fedata->wq) {
+   err = -ENOMEM;
+   goto error;
+   }
+
+   err = xenbus_map_ring_valloc(dev, &ring_ref, 1, (void **)&fedata->sring);
+   if (err < 0)
+   goto error;
+
+   BACK_RING_INIT(&fedata->ring, fedata->sring, XEN_PAGE_SIZE * 1);
+   fedata->dev = dev;
+
+   INIT_WORK(&fedata->register_work, pvcalls_back_work);
+   INIT_LIST_HEAD(&fedata->socket_mappings);
+   INIT_RADIX_TREE(&fedata->socketpass_mappings, GFP_KERNEL);
+   sema_init(&fedata->socket_lock, 1);
+   dev_set_drvdata(&dev->dev, fedata);
+
+   down(&pvcalls_back_global.frontends_lock);
+   list_add_tail(&fedata->list, &pvcalls_back_global.frontends);
+   up(&pvcalls_back_global.frontends_lock);
+   queue_work(fedata->wq, &fedata->register_work);
+
return 0;
+
+ error:
+   if (fedata->sring != NULL)
+   xenbus_unmap_ring_vfree(dev, fedata->sring);
+   if (fedata->wq)
+   destroy_workqueue(fedata->wq);
+   unbind_from_irqhandler(fedata->irq, dev);
+   kfree(fedata);
+   return err;
 }
 
 static int backend_disconnect(struct xenbus_device *dev)
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 01/18] xen: introduce the pvcalls interface header

2017-06-22 Thread Stefano Stabellini
Introduce the C header file which defines the PV Calls interface. It is
imported from xen/include/public/io/pvcalls.h.

Signed-off-by: Stefano Stabellini 
Reviewed-by: Boris Ostrovsky 
CC: konrad.w...@oracle.com
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 include/xen/interface/io/pvcalls.h | 121 +
 include/xen/interface/io/ring.h|   2 +
 2 files changed, 123 insertions(+)
 create mode 100644 include/xen/interface/io/pvcalls.h

diff --git a/include/xen/interface/io/pvcalls.h 
b/include/xen/interface/io/pvcalls.h
new file mode 100644
index 000..ccf97b8
--- /dev/null
+++ b/include/xen/interface/io/pvcalls.h
@@ -0,0 +1,121 @@
+#ifndef __XEN_PUBLIC_IO_XEN_PVCALLS_H__
+#define __XEN_PUBLIC_IO_XEN_PVCALLS_H__
+
+#include 
+#include 
+#include 
+
+/* "1" means socket, connect, release, bind, listen, accept and poll */
+#define XENBUS_FUNCTIONS_CALLS "1"
+
+/*
+ * See docs/misc/pvcalls.markdown in xen.git for the full specification:
+ * https://xenbits.xen.org/docs/unstable/misc/pvcalls.html
+ */
+struct pvcalls_data_intf {
+RING_IDX in_cons, in_prod, in_error;
+
+uint8_t pad1[52];
+
+RING_IDX out_cons, out_prod, out_error;
+
+uint8_t pad2[52];
+
+RING_IDX ring_order;
+grant_ref_t ref[];
+};
+DEFINE_XEN_FLEX_RING(pvcalls);
+
+#define PVCALLS_SOCKET 0
+#define PVCALLS_CONNECT1
+#define PVCALLS_RELEASE2
+#define PVCALLS_BIND   3
+#define PVCALLS_LISTEN 4
+#define PVCALLS_ACCEPT 5
+#define PVCALLS_POLL   6
+
+struct xen_pvcalls_request {
+uint32_t req_id; /* private to guest, echoed in response */
+uint32_t cmd;/* command to execute */
+union {
+struct xen_pvcalls_socket {
+uint64_t id;
+uint32_t domain;
+uint32_t type;
+uint32_t protocol;
+} socket;
+struct xen_pvcalls_connect {
+uint64_t id;
+uint8_t addr[28];
+uint32_t len;
+uint32_t flags;
+grant_ref_t ref;
+uint32_t evtchn;
+} connect;
+struct xen_pvcalls_release {
+uint64_t id;
+uint8_t reuse;
+} release;
+struct xen_pvcalls_bind {
+uint64_t id;
+uint8_t addr[28];
+uint32_t len;
+} bind;
+struct xen_pvcalls_listen {
+uint64_t id;
+uint32_t backlog;
+} listen;
+struct xen_pvcalls_accept {
+uint64_t id;
+uint64_t id_new;
+grant_ref_t ref;
+uint32_t evtchn;
+} accept;
+struct xen_pvcalls_poll {
+uint64_t id;
+} poll;
+/* dummy member to force sizeof(struct xen_pvcalls_request)
+ * to match across archs */
+struct xen_pvcalls_dummy {
+uint8_t dummy[56];
+} dummy;
+} u;
+};
+
+struct xen_pvcalls_response {
+uint32_t req_id;
+uint32_t cmd;
+int32_t ret;
+uint32_t pad;
+union {
+struct _xen_pvcalls_socket {
+uint64_t id;
+} socket;
+struct _xen_pvcalls_connect {
+uint64_t id;
+} connect;
+struct _xen_pvcalls_release {
+uint64_t id;
+} release;
+struct _xen_pvcalls_bind {
+uint64_t id;
+} bind;
+struct _xen_pvcalls_listen {
+uint64_t id;
+} listen;
+struct _xen_pvcalls_accept {
+uint64_t id;
+} accept;
+struct _xen_pvcalls_poll {
+uint64_t id;
+} poll;
+struct _xen_pvcalls_dummy {
+uint8_t dummy[8];
+} dummy;
+} u;
+};
+
+DEFINE_RING_TYPES(xen_pvcalls, struct xen_pvcalls_request,
+  struct xen_pvcalls_response);
+
+#endif
diff --git a/include/xen/interface/io/ring.h b/include/xen/interface/io/ring.h
index c794568..e547088 100644
--- a/include/xen/interface/io/ring.h
+++ b/include/xen/interface/io/ring.h
@@ -9,6 +9,8 @@
 #ifndef __XEN_PUBLIC_IO_RING_H__
 #define __XEN_PUBLIC_IO_RING_H__
 
+#include 
+
 typedef unsigned int RING_IDX;
 
 /* Round a 32-bit unsigned constant down to the nearest power of two. */
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 08/18] xen/pvcalls: implement connect command

2017-06-22 Thread Stefano Stabellini
Allocate a socket. Keep track of socket <-> ring mappings with a new data
structure, called sock_mapping. Implement the connect command by calling
inet_stream_connect, and mapping the new indexes page and data ring.
Allocate a workqueue and a work_struct, called ioworker, to perform
reads and writes to the socket.

When an active socket is closed (sk_state_change), set in_error to
-ENOTCONN and notify the other end, as specified by the protocol.

sk_data_ready and pvcalls_back_ioworker will be implemented later.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 174 +
 1 file changed, 174 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 953458b..5435ce7 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -56,6 +56,39 @@ struct pvcalls_fedata {
struct work_struct register_work;
 };
 
+struct pvcalls_ioworker {
+   struct work_struct register_work;
+   struct workqueue_struct *wq;
+};
+
+struct sock_mapping {
+   struct list_head list;
+   struct pvcalls_fedata *fedata;
+   struct socket *sock;
+   uint64_t id;
+   grant_ref_t ref;
+   struct pvcalls_data_intf *ring;
+   void *bytes;
+   struct pvcalls_data data;
+   uint32_t ring_order;
+   int irq;
+   atomic_t read;
+   atomic_t write;
+   atomic_t io;
+   atomic_t release;
+   void (*saved_data_ready)(struct sock *sk);
+   struct pvcalls_ioworker ioworker;
+};
+
+static irqreturn_t pvcalls_back_conn_event(int irq, void *sock_map);
+static int pvcalls_back_release_active(struct xenbus_device *dev,
+  struct pvcalls_fedata *fedata,
+  struct sock_mapping *map);
+
+static void pvcalls_back_ioworker(struct work_struct *work)
+{
+}
+
 static int pvcalls_back_socket(struct xenbus_device *dev,
struct xen_pvcalls_request *req)
 {
@@ -84,9 +117,145 @@ static int pvcalls_back_socket(struct xenbus_device *dev,
return 0;
 }
 
+static void pvcalls_sk_state_change(struct sock *sock)
+{
+   struct sock_mapping *map = sock->sk_user_data;
+   struct pvcalls_data_intf *intf;
+
+   if (map == NULL)
+   return;
+
+   intf = map->ring;
+   intf->in_error = -ENOTCONN;
+   notify_remote_via_irq(map->irq);
+}
+
+static void pvcalls_sk_data_ready(struct sock *sock)
+{
+}
+
+static struct sock_mapping *pvcalls_new_active_socket(
+   struct pvcalls_fedata *fedata,
+   uint64_t id,
+   grant_ref_t ref,
+   uint32_t evtchn,
+   struct socket *sock)
+{
+   int ret;
+   struct sock_mapping *map;
+   void *page;
+
+   map = kzalloc(sizeof(*map), GFP_KERNEL);
+   if (map == NULL)
+   return NULL;
+
+   map->fedata = fedata;
+   map->sock = sock;
+   map->id = id;
+   map->ref = ref;
+
+   ret = xenbus_map_ring_valloc(fedata->dev, &ref, 1, &page);
+   if (ret < 0)
+   goto out;
+   map->ring = page;
+   map->ring_order = map->ring->ring_order;
+   /* first read the order, then map the data ring */
+   virt_rmb();
+   if (map->ring_order > MAX_RING_ORDER) {
+   pr_warn("%s frontend requested ring_order %u, which is > MAX 
(%u)\n",
+   __func__, map->ring_order, MAX_RING_ORDER);
+   goto out;
+   }
+   ret = xenbus_map_ring_valloc(fedata->dev, map->ring->ref,
+(1 << map->ring_order), &page);
+   if (ret < 0)
+   goto out;
+   map->bytes = page;
+
+   ret = bind_interdomain_evtchn_to_irqhandler(fedata->dev->otherend_id,
+   evtchn,
+   pvcalls_back_conn_event,
+   0,
+   "pvcalls-backend",
+   map);
+   if (ret < 0)
+   goto out;
+   map->irq = ret;
+
+   map->data.in = map->bytes;
+   map->data.out = map->bytes + XEN_FLEX_RING_SIZE(map->ring_order);
+   
+   map->ioworker.wq = alloc_workqueue("pvcalls_io", WQ_UNBOUND, 1);
+   if (!map->ioworker.wq)
+   goto out;
+   atomic_set(&map->io, 1);
+   INIT_WORK(&map->ioworker.register_work, pvcalls_back_ioworker);
+
+   down(&fedata->socket_lock);
+   list_add_tail(&map->list, &fedata->socket_mappings);
+   up(&fedata->socket_lock);
+
+   write_lock_bh(&map->sock->sk->sk_callback_lock);
+   map->saved_data_ready = map->sock->sk->sk_data_ready;
+   map->sock->sk->sk_user_data = map;
+   map->sock->sk->sk_data_ready = pvcalls_sk_data_ready;
+   map->sock->sk->sk_state_change = pvcalls_sk_state_change;
+   

[Xen-devel] [PATCH v5 13/18] xen/pvcalls: implement release command

2017-06-22 Thread Stefano Stabellini
Release both active and passive sockets. For active sockets, make sure
to avoid possible conflicts with the ioworker reading/writing to those
sockets concurrently. Set map->release to let the ioworker know
atomically that the socket will be released soon, then wait until the
ioworker finishes (flush_work).

Unmap indexes pages and data rings.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 68 ++
 1 file changed, 68 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 5b2ef60..f6f88ce 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -269,12 +269,80 @@ static int pvcalls_back_release_active(struct 
xenbus_device *dev,
   struct pvcalls_fedata *fedata,
   struct sock_mapping *map)
 {
+   disable_irq(map->irq);
+   if (map->sock->sk != NULL) {
+   write_lock_bh(&map->sock->sk->sk_callback_lock);
+   map->sock->sk->sk_user_data = NULL;
+   map->sock->sk->sk_data_ready = map->saved_data_ready;
+   write_unlock_bh(&map->sock->sk->sk_callback_lock);
+   }
+
+   atomic_set(&map->release, 1);
+   flush_work(&map->ioworker.register_work);
+
+   xenbus_unmap_ring_vfree(dev, map->bytes);
+   xenbus_unmap_ring_vfree(dev, (void *)map->ring);
+   unbind_from_irqhandler(map->irq, map);
+
+   sock_release(map->sock);
+   kfree(map);
+
+   return 0;
+}
+
+static int pvcalls_back_release_passive(struct xenbus_device *dev,
+   struct pvcalls_fedata *fedata,
+   struct sockpass_mapping *mappass)
+{
+   if (mappass->sock->sk != NULL) {
+   write_lock_bh(&mappass->sock->sk->sk_callback_lock);
+   mappass->sock->sk->sk_user_data = NULL;
+   mappass->sock->sk->sk_data_ready = mappass->saved_data_ready;
+   write_unlock_bh(&mappass->sock->sk->sk_callback_lock);
+   }
+   sock_release(mappass->sock);
+   flush_workqueue(mappass->wq);
+   destroy_workqueue(mappass->wq);
+   kfree(mappass);
+
return 0;
 }
 
 static int pvcalls_back_release(struct xenbus_device *dev,
struct xen_pvcalls_request *req)
 {
+   struct pvcalls_fedata *fedata;
+   struct sock_mapping *map, *n;
+   struct sockpass_mapping *mappass;
+   int ret = 0;
+   struct xen_pvcalls_response *rsp;
+
+   fedata = dev_get_drvdata(&dev->dev);
+
+   down(&fedata->socket_lock);
+   list_for_each_entry_safe(map, n, &fedata->socket_mappings, list) {
+   if (map->id == req->u.release.id) {
+   list_del(&map->list);
+   up(&fedata->socket_lock);
+   ret = pvcalls_back_release_active(dev, fedata, map);
+   goto out;
+   }
+   }
+   mappass = radix_tree_lookup(&fedata->socketpass_mappings,
+   req->u.release.id);
+   if (mappass != NULL) {
+   radix_tree_delete(&fedata->socketpass_mappings, mappass->id);
+   up(&fedata->socket_lock);
+   ret = pvcalls_back_release_passive(dev, fedata, mappass);
+   } else
+   up(&fedata->socket_lock);
+
+out:
+   rsp = RING_GET_RESPONSE(&fedata->ring, fedata->ring.rsp_prod_pvt++);
+   rsp->req_id = req->req_id;
+   rsp->u.release.id = req->u.release.id;
+   rsp->cmd = req->cmd;
+   rsp->ret = ret;
return 0;
 }
 
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 12/18] xen/pvcalls: implement poll command

2017-06-22 Thread Stefano Stabellini
Implement poll on passive sockets by requesting a delayed response with
mappass->reqcopy, and reply back when there is data on the passive
socket.

Poll on active socket is unimplemented as by the spec, as the frontend
should just wait for events and check the indexes on the indexes page.

Only support one outstanding poll (or accept) request for every passive
socket at any given time.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 73 +-
 1 file changed, 72 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 62738e4..5b2ef60 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -352,11 +352,33 @@ static void __pvcalls_back_accept(struct work_struct 
*work)
 static void pvcalls_pass_sk_data_ready(struct sock *sock)
 {
struct sockpass_mapping *mappass = sock->sk_user_data;
+   struct pvcalls_fedata *fedata;
+   struct xen_pvcalls_response *rsp;
+   unsigned long flags;
+   int notify;
 
if (mappass == NULL)
return;
 
-   queue_work(mappass->wq, &mappass->register_work);
+   fedata = mappass->fedata;
+   spin_lock_irqsave(&mappass->copy_lock, flags);
+   if (mappass->reqcopy.cmd == PVCALLS_POLL) {
+   rsp = RING_GET_RESPONSE(&fedata->ring, fedata->ring.rsp_prod_pvt++);
+   rsp->req_id = mappass->reqcopy.req_id;
+   rsp->u.poll.id = mappass->reqcopy.u.poll.id;
+   rsp->cmd = mappass->reqcopy.cmd;
+   rsp->ret = 0;
+
+   mappass->reqcopy.cmd = 0;
+   spin_unlock_irqrestore(&mappass->copy_lock, flags);
+
+   RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&fedata->ring, notify);
+   if (notify)
+   notify_remote_via_irq(mappass->fedata->irq);
+   } else {
+   spin_unlock_irqrestore(&mappass->copy_lock, flags);
+   queue_work(mappass->wq, &mappass->register_work);
+   }
 }
 
 static int pvcalls_back_bind(struct xenbus_device *dev,
@@ -507,6 +529,55 @@ static int pvcalls_back_accept(struct xenbus_device *dev,
 static int pvcalls_back_poll(struct xenbus_device *dev,
 struct xen_pvcalls_request *req)
 {
+   struct pvcalls_fedata *fedata;
+   struct sockpass_mapping *mappass;
+   struct xen_pvcalls_response *rsp;
+   struct inet_connection_sock *icsk;
+   struct request_sock_queue *queue;
+   unsigned long flags;
+   int ret;
+   bool data;
+
+   fedata = dev_get_drvdata(&dev->dev);
+
+   down(&fedata->socket_lock);
+   mappass = radix_tree_lookup(&fedata->socketpass_mappings, req->u.poll.id);
+   up(&fedata->socket_lock);
+   if (mappass == NULL)
+   return -EINVAL;
+
+   /*
+* Limitation of the current implementation: only support one
+* concurrent accept or poll call on one socket.
+*/
+   spin_lock_irqsave(&mappass->copy_lock, flags);
+   if (mappass->reqcopy.cmd != 0) {
+   ret = -EINTR;
+   goto out;
+   }
+
+   mappass->reqcopy = *req;
+   icsk = inet_csk(mappass->sock->sk);
+   queue = &icsk->icsk_accept_queue;
+   data = queue->rskq_accept_head != NULL;
+   if (data) {
+   mappass->reqcopy.cmd = 0;
+   ret = 0;
+   goto out;
+   }
+   spin_unlock_irqrestore(&mappass->copy_lock, flags);
+
+   /* Tell the caller we don't need to send back a notification yet */
+   return -1;
+
+out:
+   spin_unlock_irqrestore(&mappass->copy_lock, flags);
+
+   rsp = RING_GET_RESPONSE(&fedata->ring, fedata->ring.rsp_prod_pvt++);
+   rsp->req_id = req->req_id;
+   rsp->cmd = req->cmd;
+   rsp->u.poll.id = req->u.poll.id;
+   rsp->ret = ret;
return 0;
 }
 
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 02/18] xen/pvcalls: introduce the pvcalls xenbus backend

2017-06-22 Thread Stefano Stabellini
Introduce a xenbus backend for the pvcalls protocol, as defined by
https://xenbits.xen.org/docs/unstable/misc/pvcalls.html.

This patch only adds the stubs, the code will be added by the following
patches.

Signed-off-by: Stefano Stabellini 
Reviewed-by: Boris Ostrovsky 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 61 ++
 1 file changed, 61 insertions(+)
 create mode 100644 drivers/xen/pvcalls-back.c

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
new file mode 100644
index 000..f3d0daa
--- /dev/null
+++ b/drivers/xen/pvcalls-back.c
@@ -0,0 +1,61 @@
+/*
+ * (c) 2017 Stefano Stabellini 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static int pvcalls_back_probe(struct xenbus_device *dev,
+ const struct xenbus_device_id *id)
+{
+   return 0;
+}
+
+static void pvcalls_back_changed(struct xenbus_device *dev,
+enum xenbus_state frontend_state)
+{
+}
+
+static int pvcalls_back_remove(struct xenbus_device *dev)
+{
+   return 0;
+}
+
+static int pvcalls_back_uevent(struct xenbus_device *xdev,
+  struct kobj_uevent_env *env)
+{
+   return 0;
+}
+
+static const struct xenbus_device_id pvcalls_back_ids[] = {
+   { "pvcalls" },
+   { "" }
+};
+
+static struct xenbus_driver pvcalls_back_driver = {
+   .ids = pvcalls_back_ids,
+   .probe = pvcalls_back_probe,
+   .remove = pvcalls_back_remove,
+   .uevent = pvcalls_back_uevent,
+   .otherend_changed = pvcalls_back_changed,
+};
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 03/18] xen/pvcalls: initialize the module and register the xenbus backend

2017-06-22 Thread Stefano Stabellini
Keep a list of connected frontends. Use a semaphore to protect list
accesses.

Signed-off-by: Stefano Stabellini 
Reviewed-by: Boris Ostrovsky 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index f3d0daa..9044cf2 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -25,6 +25,11 @@
 #include 
 #include 
 
+struct pvcalls_back_global {
+   struct list_head frontends;
+   struct semaphore frontends_lock;
+} pvcalls_back_global;
+
 static int pvcalls_back_probe(struct xenbus_device *dev,
  const struct xenbus_device_id *id)
 {
@@ -59,3 +64,20 @@ static int pvcalls_back_uevent(struct xenbus_device *xdev,
.uevent = pvcalls_back_uevent,
.otherend_changed = pvcalls_back_changed,
 };
+
+static int __init pvcalls_back_init(void)
+{
+   int ret;
+
+   if (!xen_domain())
+   return -ENODEV;
+
+   ret = xenbus_register_backend(&pvcalls_back_driver);
+   if (ret < 0)
+   return ret;
+
+   sema_init(&pvcalls_back_global.frontends_lock, 1);
+   INIT_LIST_HEAD(&pvcalls_back_global.frontends);
+   return 0;
+}
+module_init(pvcalls_back_init);
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 09/18] xen/pvcalls: implement bind command

2017-06-22 Thread Stefano Stabellini
Allocate a socket. Track the allocated passive sockets with a new data
structure named sockpass_mapping. It contains an unbound workqueue to
schedule delayed work for the accept and poll commands. It also has a
reqcopy field to be used to store a copy of a request for delayed work.
Reads/writes to it are protected by a lock (the "copy_lock" spinlock).
Initialize the workqueue in pvcalls_back_bind.

Implement the bind command with inet_bind.

The pass_sk_data_ready event handler will be added later.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 87 ++
 1 file changed, 87 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 5435ce7..2c0bfef 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -80,6 +80,18 @@ struct sock_mapping {
struct pvcalls_ioworker ioworker;
 };
 
+struct sockpass_mapping {
+   struct list_head list;
+   struct pvcalls_fedata *fedata;
+   struct socket *sock;
+   uint64_t id;
+   struct xen_pvcalls_request reqcopy;
+   spinlock_t copy_lock;
+   struct workqueue_struct *wq;
+   struct work_struct register_work;
+   void (*saved_data_ready)(struct sock *sk);
+};
+
 static irqreturn_t pvcalls_back_conn_event(int irq, void *sock_map);
 static int pvcalls_back_release_active(struct xenbus_device *dev,
   struct pvcalls_fedata *fedata,
@@ -265,9 +277,84 @@ static int pvcalls_back_release(struct xenbus_device *dev,
return 0;
 }
 
+static void __pvcalls_back_accept(struct work_struct *work)
+{
+}
+
+static void pvcalls_pass_sk_data_ready(struct sock *sock)
+{
+}
+
 static int pvcalls_back_bind(struct xenbus_device *dev,
 struct xen_pvcalls_request *req)
 {
+   struct pvcalls_fedata *fedata;
+   int ret, err;
+   struct socket *sock;
+   struct sockpass_mapping *map;
+   struct xen_pvcalls_response *rsp;
+
+   fedata = dev_get_drvdata(&dev->dev);
+
+   map = kzalloc(sizeof(*map), GFP_KERNEL);
+   if (map == NULL) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   INIT_WORK(&map->register_work, __pvcalls_back_accept);
+   spin_lock_init(&map->copy_lock);
+   map->wq = alloc_workqueue("pvcalls_wq", WQ_UNBOUND, 1);
+   if (!map->wq) {
+   ret = -ENOMEM;
+   kfree(map);
+   goto out;
+   }
+
+   ret = sock_create(AF_INET, SOCK_STREAM, 0, &sock);
+   if (ret < 0) {
+   destroy_workqueue(map->wq);
+   kfree(map);
+   goto out;
+   }
+
+   ret = inet_bind(sock, (struct sockaddr *)&req->u.bind.addr,
+   req->u.bind.len);
+   if (ret < 0) {
+   sock_release(sock);
+   destroy_workqueue(map->wq);
+   kfree(map);
+   goto out;
+   }
+
+   map->fedata = fedata;
+   map->sock = sock;
+   map->id = req->u.bind.id;
+
+   down(&fedata->socket_lock);
+   err = radix_tree_insert(&fedata->socketpass_mappings, map->id,
+   map);
+   up(&fedata->socket_lock);
+   if (err) {
+   ret = err;
+   sock_release(sock);
+   destroy_workqueue(map->wq);
+   kfree(map);
+   goto out;
+   }
+
+   write_lock_bh(&sock->sk->sk_callback_lock);
+   map->saved_data_ready = sock->sk->sk_data_ready;
+   sock->sk->sk_user_data = map;
+   sock->sk->sk_data_ready = pvcalls_pass_sk_data_ready;
+   write_unlock_bh(&sock->sk->sk_callback_lock);
+
+out:
+   rsp = RING_GET_RESPONSE(&fedata->ring, fedata->ring.rsp_prod_pvt++);
+   rsp->req_id = req->req_id;
+   rsp->cmd = req->cmd;
+   rsp->u.bind.id = req->u.bind.id;
+   rsp->ret = ret;
return 0;
 }
 
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 10/18] xen/pvcalls: implement listen command

2017-06-22 Thread Stefano Stabellini
Call inet_listen to implement the listen command.

Signed-off-by: Stefano Stabellini 
Reviewed-by: Boris Ostrovsky 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 2c0bfef..2a47425 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -361,6 +361,27 @@ static int pvcalls_back_bind(struct xenbus_device *dev,
 static int pvcalls_back_listen(struct xenbus_device *dev,
   struct xen_pvcalls_request *req)
 {
+   struct pvcalls_fedata *fedata;
+   int ret = -EINVAL;
+   struct sockpass_mapping *map;
+   struct xen_pvcalls_response *rsp;
+
+   fedata = dev_get_drvdata(&dev->dev);
+
+   down(&fedata->socket_lock);
+   map = radix_tree_lookup(&fedata->socketpass_mappings, req->u.listen.id);
+   up(&fedata->socket_lock);
+   if (map == NULL)
+   goto out;
+
+   ret = inet_listen(map->sock, req->u.listen.backlog);
+
+out:
+   rsp = RING_GET_RESPONSE(&fedata->ring, fedata->ring.rsp_prod_pvt++);
+   rsp->req_id = req->req_id;
+   rsp->cmd = req->cmd;
+   rsp->u.listen.id = req->u.listen.id;
+   rsp->ret = ret;
return 0;
 }
 
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 00/18] introduce the Xen PV Calls backend

2017-06-22 Thread Stefano Stabellini
Hi all,

this series introduces the backend for the newly introduced PV Calls
protocol.

PV Calls is a paravirtualized protocol that allows the implementation of
a set of POSIX functions in a different domain. The PV Calls frontend
sends POSIX function calls to the backend, which implements them, acts on
the function call, and returns a value to the frontend.

For more information about PV Calls, please read:

https://xenbits.xen.org/docs/unstable/misc/pvcalls.html
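
For readers unfamiliar with the command ring, here is a minimal, purely
illustrative sketch of how a frontend might submit a PVCALLS_SOCKET request,
using the structures from patch 01 and the standard Xen ring/event helpers.
The front_ring, req_id, sock_id and irq variables are assumptions made up
for the example; they are not defined by this series:

    /* Illustrative frontend-side fragment, not part of this series. */
    struct xen_pvcalls_request *req;
    int notify;

    req = RING_GET_REQUEST(&front_ring, front_ring.req_prod_pvt);
    front_ring.req_prod_pvt++;
    req->req_id = req_id;            /* echoed back in the response */
    req->cmd = PVCALLS_SOCKET;
    req->u.socket.id = sock_id;      /* opaque id chosen by the frontend */
    req->u.socket.domain = AF_INET;
    req->u.socket.type = SOCK_STREAM;
    req->u.socket.protocol = 0;

    RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&front_ring, notify);
    if (notify)
        notify_remote_via_irq(irq);  /* kick the backend's event channel */

The backend picks the request up from the shared ring, performs the
corresponding kernel socket operation and writes a xen_pvcalls_response
carrying the same req_id.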

I tried to split the source code into small pieces to make it easier to
read and understand. Please review!


Changes in v5:
- added review-byes
- remove unnecessary gotos
- ret 0 in pvcalls_back_connect
- do not lose ret values
- remove queue->rskq_lock
- make sure all accesses to socket_mappings and socketpass_mappings are
  protected by socket_lock
- rename ring_size to array_size

Changes in v4:
- add reviewed-bys
- fix return values of many functions
- remove pointless initializers
- print a warning if ring_order > MAX_RING_ORDER
- remove map->ioworker.cpu
- use queue_work instead of queue_work_on
- add sock_release() on error paths where appropriate
- add a comment in __pvcalls_back_accept about racing with
  pvcalls_back_accept and atomicity of reqcopy
- remove unneded (void*) casts
- remove unneded {}
- fix backend_disconnect if !mappass
- remove pointless continue in backend_disconnect
- remove pointless memset of _back_global
- pass *opaque to pvcalls_conn_back_read
- improve WARN_ON in pvcalls_conn_back_read
- fix error checks in pvcalls_conn_back_write
- XEN_PVCALLS_BACKEND depends on XEN_BACKEND
- rename priv to fedata across all patches

Changes in v3:
- added reviewed-bys
- return err from pvcalls_back_probe
- remove old comments
- use a xenstore transaction in pvcalls_back_probe
- ignore errors from xenbus_switch_state
- rename pvcalls_back_priv to pvcalls_fedata
- remove addr from backend_connect
- remove priv->work, add comment about theoretical race
- use IPPROTO_IP
- refactor active socket allocation in a single new function

Changes in v2:
- allocate one ioworker per socket (rather than 1 per vcpu)
- rename privs to frontends
- add newlines
- define "1" in the public header
- better error returns in pvcalls_back_probe
- do not set XenbusStateClosed twice in set_backend_state
- add more comments
- replace rw_semaphore with semaphore
- rename pvcallss to socket_lock
- move xenbus_map_ring_valloc closer to first use in backend_connect
- use more traditional return codes from pvcalls_back_handle_cmd and
  callees
- remove useless dev == NULL checks
- replace lock_sock with more appropriate and fine grained socket locks


Stefano Stabellini (18):
  xen: introduce the pvcalls interface header
  xen/pvcalls: introduce the pvcalls xenbus backend
  xen/pvcalls: initialize the module and register the xenbus backend
  xen/pvcalls: xenbus state handling
  xen/pvcalls: connect to a frontend
  xen/pvcalls: handle commands from the frontend
  xen/pvcalls: implement socket command
  xen/pvcalls: implement connect command
  xen/pvcalls: implement bind command
  xen/pvcalls: implement listen command
  xen/pvcalls: implement accept command
  xen/pvcalls: implement poll command
  xen/pvcalls: implement release command
  xen/pvcalls: disconnect and module_exit
  xen/pvcalls: implement the ioworker functions
  xen/pvcalls: implement read
  xen/pvcalls: implement write
  xen: introduce a Kconfig option to enable the pvcalls backend

 drivers/xen/Kconfig|   12 +
 drivers/xen/Makefile   |1 +
 drivers/xen/pvcalls-back.c | 1244 
 include/xen/interface/io/pvcalls.h |  121 
 include/xen/interface/io/ring.h|2 +
 5 files changed, 1380 insertions(+)
 create mode 100644 drivers/xen/pvcalls-back.c
 create mode 100644 include/xen/interface/io/pvcalls.h

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH for-4.8] x86/pv: Fix the handling of `int $x` for vectors which alias exceptions

2017-06-22 Thread Andrew Cooper
The claim at the top of c/s 2e426d6eecf "x86/traps: Drop use_error_code
parameter from do_{,guest_}trap()" is only actually true for hardware
exceptions.  It is not true for `int $x` instructions (which never push error
code), irrespective of whether the vector aliases an exception or not.

Furthermore, c/s 6480cc6280e "x86/traps: Fix failed ASSERT() in
do_guest_trap()" really should have helped highlight that a regression had
been introduced.

Modify pv_inject_event() to understand event types other than
X86_EVENTTYPE_HW_EXCEPTION, and introduce pv_inject_sw_interrupt() for the
`int $x` handling code.

Add further assertions to pv_inject_event() concerning the type of events
passed in, which in turn requires that do_guest_trap() set its type
appropriately (which is now used exclusively for hardware exceptions).

This is logically a backport of c/s 5c4f579e0ee4f38cad5636bbf8ce700a394338d0
from Xen 4.9, but disentangled from the other injection work.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
---
 xen/arch/x86/traps.c | 26 +-
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 19ac652..8c992ce 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -625,14 +625,24 @@ void fatal_trap(const struct cpu_user_regs *regs, bool_t 
show_remote)
   (regs->eflags & X86_EFLAGS_IF) ? "" : ", IN INTERRUPT CONTEXT");
 }
 
-static void do_guest_trap(unsigned int trapnr,
-  const struct cpu_user_regs *regs)
+static void pv_inject_event(
+unsigned int trapnr, const struct cpu_user_regs *regs, unsigned int type)
 {
 struct vcpu *v = current;
 struct trap_bounce *tb;
 const struct trap_info *ti;
-const bool use_error_code =
-((trapnr < 32) && (TRAP_HAVE_EC & (1u << trapnr)));
+bool use_error_code;
+
+if ( type == X86_EVENTTYPE_HW_EXCEPTION )
+{
+ASSERT(trapnr < 32);
+use_error_code = TRAP_HAVE_EC & (1u << trapnr);
+}
+else
+{
+ASSERT(type == X86_EVENTTYPE_SW_INTERRUPT);
+use_error_code = false;
+}
 
 trace_pv_trap(trapnr, regs->eip, use_error_code, regs->error_code);
 
@@ -658,6 +668,12 @@ static void do_guest_trap(unsigned int trapnr,
 trapstr(trapnr), trapnr, regs->error_code);
 }
 
+static void do_guest_trap(
+unsigned int trapnr, const struct cpu_user_regs *regs)
+{
+pv_inject_event(trapnr, regs, X86_EVENTTYPE_HW_EXCEPTION);
+}
+
 static void instruction_done(
 struct cpu_user_regs *regs, unsigned long eip, unsigned int bpmatch)
 {
@@ -3685,7 +3701,7 @@ void do_general_protection(struct cpu_user_regs *regs)
 if ( permit_softint(TI_GET_DPL(ti), v, regs) )
 {
 regs->eip += 2;
-do_guest_trap(vector, regs);
+pv_inject_event(vector, regs, X86_EVENTTYPE_SW_INTERRUPT);
 return;
 }
 }
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 4/8] mm: Scrub memory from idle loop

2017-06-22 Thread Boris Ostrovsky
Instead of scrubbing pages during guest destruction (from
free_heap_pages()) do this opportunistically, from the idle loop.

We might come to scrub_free_pages() from the idle loop while another CPU
uses mapcache override, resulting in a fault while trying to do 
__map_domain_page() in scrub_one_page(). To avoid this, make mapcache
vcpu override a per-cpu variable.

Signed-off-by: Boris Ostrovsky 
---
CC: Dario Faggioli 
---
Changes in v5:
* Added explanation in commit message for making mapcache override VCPU
  a per-cpu variable
* Fixed loop counting in scrub_free_pages()
* Fixed the off-by-one error in setting first_dirty in scrub_free_pages().
* Various style fixes
* Added a comment in node_to_scrub() explaining why it should be OK to
  prevent another CPU from scrubbing a node that ths current CPU temporarily
  claimed. (I decided against using locks there)


 xen/arch/arm/domain.c  |   2 +-
 xen/arch/x86/domain.c  |   2 +-
 xen/arch/x86/domain_page.c |   6 +--
 xen/common/page_alloc.c| 118 -
 xen/include/xen/mm.h   |   1 +
 5 files changed, 111 insertions(+), 18 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 2dc8b0a..d282cd8 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -51,7 +51,7 @@ void idle_loop(void)
 /* Are we here for running vcpu context tasklets, or for idling? */
 if ( unlikely(tasklet_work_to_do(cpu)) )
 do_tasklet();
-else
+else if ( !softirq_pending(cpu) && !scrub_free_pages() )
 {
 local_irq_disable();
 if ( cpu_is_haltable(cpu) )
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index f7873da..71f1ef4 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -122,7 +122,7 @@ static void idle_loop(void)
 /* Are we here for running vcpu context tasklets, or for idling? */
 if ( unlikely(tasklet_work_to_do(cpu)) )
 do_tasklet();
-else
+else if ( !softirq_pending(cpu) && !scrub_free_pages() )
 pm_idle();
 do_softirq();
 /*
diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index 71baede..0783c1e 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -18,12 +18,12 @@
 #include 
 #include 
 
-static struct vcpu *__read_mostly override;
+static DEFINE_PER_CPU(struct vcpu *, override);
 
 static inline struct vcpu *mapcache_current_vcpu(void)
 {
 /* In the common case we use the mapcache of the running VCPU. */
-struct vcpu *v = override ?: current;
+struct vcpu *v = this_cpu(override) ?: current;
 
 /*
  * When current isn't properly set up yet, this is equivalent to
@@ -59,7 +59,7 @@ static inline struct vcpu *mapcache_current_vcpu(void)
 
 void __init mapcache_override_current(struct vcpu *v)
 {
-override = v;
+this_cpu(override) = v;
 }
 
 #define mapcache_l2_entry(e) ((e) >> PAGETABLE_ORDER)
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 9aac196..4e2775f 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1019,15 +1019,85 @@ static int reserve_offlined_page(struct page_info *head)
 return count;
 }
 
-static void scrub_free_pages(unsigned int node)
+static nodemask_t node_scrubbing;
+
+/*
+ * If get_node is true this will return the closest node that needs to be
+ * scrubbed, with the appropriate bit in node_scrubbing set.
+ * If get_node is not set, this will return *a* node that needs to be scrubbed.
+ * The node_scrubbing bitmask will not be updated.
+ * If no node needs scrubbing then NUMA_NO_NODE is returned.
+ */
+static unsigned int node_to_scrub(bool get_node)
 {
-struct page_info *pg;
-unsigned int zone;
+nodeid_t node = cpu_to_node(smp_processor_id()), local_node;
+nodeid_t closest = NUMA_NO_NODE;
+u8 dist, shortest = 0xff;
 
-ASSERT(spin_is_locked(&heap_lock));
+if ( node == NUMA_NO_NODE )
+node = 0;
 
-if ( !node_need_scrub[node] )
-return;
+if ( node_need_scrub[node] &&
+ (!get_node || !node_test_and_set(node, node_scrubbing)) )
+return node;
+
+/*
+ * See if there are memory-only nodes that need scrubbing and choose
+ * the closest one.
+ */
+local_node = node;
+for ( ; ; )
+{
+do {
+node = cycle_node(node, node_online_map);
+} while ( !cpumask_empty(&node_to_cpumask(node)) &&
+  (node != local_node) );
+
+if ( node == local_node )
+break;
+
+/*
+ * Grab the node right away. If we find a closer node later we will
+ * release this one. While there is a chance that another CPU will
+ * not be able to scrub that node when it is searching for scrub work
+ * at the same time it will be able to do so next time it wakes up.
+ * The alternative would be to perform this 

[Xen-devel] [PATCH v5 0/8] Memory scrubbing from idle loop

2017-06-22 Thread Boris Ostrovsky
Changes in v5:
* Make page_info.u.free and union and use bitfields there.
* Bug fixes

(see per-patch notes)

When a domain is destroyed the hypervisor must scrub the domain's pages before
giving them to another guest in order to prevent leaking the deceased
guest's data. Currently this is done during guest's destruction, possibly
causing very lengthy cleanup process.

This series adds support for scrubbing released pages from idle loop,
making guest destruction significantly faster. For example, destroying a
1TB guest can now be completed in 40+ seconds as opposed to about 9 minutes
using existing scrubbing algorithm.

Briefly, the new algorithm places dirty pages at the end of heap's page list
for each node/zone/order to avoid having to scan the full list while searching
for dirty pages. One processor from each node checks whether the node has any
dirty pages and, if such pages are found, scrubs them. Scrubbing itself
happens without holding heap lock so other users may access heap in the
meantime. If while idle loop is scrubbing a particular chunk of pages this
chunk is requested by the heap allocator, scrubbing is immediately stopped.
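
As a rough sketch (simplified from patch 4 below), the idle-loop side of this
looks as follows; scrub_free_pages() returns nonzero while there is scrubbing
work for this node, so the CPU only goes to sleep when no softirq is pending
and nothing is left to scrub. Error handling and arch details are omitted:

    /* Simplified sketch of the modified idle loop (see patch 4 for the
     * real code). */
    static void idle_loop(void)
    {
        unsigned int cpu = smp_processor_id();

        for ( ; ; )
        {
            if ( unlikely(tasklet_work_to_do(cpu)) )
                do_tasklet();
            else if ( !softirq_pending(cpu) && !scrub_free_pages() )
                pm_idle();          /* nothing to scrub, really go idle */
            do_softirq();
        }
    }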

On the allocation side, alloc_heap_pages() first tries to satisfy allocation
request using only clean pages. If this is not possible, the search is
repeated and dirty pages are scrubbed by the allocator.

This series is somewhat based on earlier work by Bob Liu.

V1:
* Only set PGC_need_scrub bit for the buddy head, thus making it unnecessary
  to scan whole buddy
* Fix spin_lock_cb()
* Scrub CPU-less nodes
* ARM support. Note that I have not been able to test this, only built the
  binary
* Added scrub test patch (last one). Not sure whether it should be considered
  for committing but I have been running with it.

V2:
* merge_chunks() returns new buddy head
* scrub_free_pages() returns softirq pending status in addition to (factored 
out)
  status of unscrubbed memory
* spin_lock uses inlined spin_lock_cb()
* scrub debugging code checks whole page, not just the first word.

V3:
* Keep dirty bit per page
* Simplify merge_chunks() (now merge_and_free_buddy())
* When scrubbing memmory-only nodes try to find the closest node.

V4:
* Keep track of dirty pages in a buddy with page_info.u.free.first_dirty.
* Drop patch 1 (factoring out merge_and_free_buddy()) since there is only
  one caller now
* Drop patch patch 5 (from V3) since we are not breaking partially-scrubbed
  buddy anymore
* Extract search loop in alloc_heap_pages() into get_free_buddy() (patch 2)
* Add MEMF_no_scrub flag


Deferred:
* Per-node heap locks. In addition to (presumably) improving performance in
  general, once they are available we can parallelize scrubbing further by
  allowing more than one core per node to do idle loop scrubbing.
* AVX-based scrubbing
* Use idle loop scrubbing during boot.


Boris Ostrovsky (8):
  mm: Place unscrubbed pages at the end of pagelist
  mm: Extract allocation loop from alloc_heap_pages()
  mm: Scrub pages in alloc_heap_pages() if needed
  mm: Scrub memory from idle loop
  spinlock: Introduce spin_lock_cb()
  mm: Keep heap accessible to others while scrubbing
  mm: Print number of unscrubbed pages in 'H' debug handler
  mm: Make sure pages are scrubbed

 xen/Kconfig.debug  |   7 +
 xen/arch/arm/domain.c  |   2 +-
 xen/arch/x86/domain.c  |   2 +-
 xen/arch/x86/domain_page.c |   6 +-
 xen/common/page_alloc.c| 612 ++---
 xen/common/spinlock.c  |   9 +-
 xen/include/asm-arm/mm.h   |  30 ++-
 xen/include/asm-x86/mm.h   |  30 ++-
 xen/include/xen/mm.h   |   5 +-
 xen/include/xen/spinlock.h |   8 +
 10 files changed, 603 insertions(+), 108 deletions(-)

-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 3/8] mm: Scrub pages in alloc_heap_pages() if needed

2017-06-22 Thread Boris Ostrovsky
When allocating pages in alloc_heap_pages() first look for clean pages. If none
is found then retry, take pages marked as unscrubbed and scrub them.

Note that we shouldn't find unscrubbed pages in alloc_heap_pages() yet. However,
this will become possible when we stop scrubbing from free_heap_pages() and
instead do it from idle loop.

Since not all allocations require clean pages (such as xenheap allocations)
introduce MEMF_no_scrub flag that callers can set if they are willing to
consume unscrubbed pages.

Signed-off-by: Boris Ostrovsky 
---
Changes in v5:
* Added comment explaining why we always grab order 0 pages in alloc_heap_pages)
* Dropped the somewhat confusing comment about not needing to set first_dirty
  in alloc_heap_pages().
* Moved first bit of _MEMF_node by 8 to accommodate MEMF_no_scrub (bit 7 is
  no longer available)
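
As a usage illustration (mirroring the alloc_xenheap_pages() hunk below), a
caller that does not care about page contents simply ORs the new flag into
memflags; the surrounding variables are as in the existing function:

    /* Sketch: xenheap allocations may consume dirty pages, since their
     * contents are about to be overwritten by Xen anyway. */
    pg = alloc_heap_pages(MEMZONE_XEN, MEMZONE_XEN, order,
                          memflags | MEMF_no_scrub, NULL);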


 xen/common/page_alloc.c | 36 +++-
 xen/include/xen/mm.h|  4 +++-
 2 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 89fe3ce..9aac196 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -703,6 +703,7 @@ static struct page_info *get_free_buddy(unsigned int 
zone_lo,
 nodemask_t nodemask = d ? d->node_affinity : node_online_map;
 unsigned int j, zone, nodemask_retry = 0;
 struct page_info *pg;
+bool use_unscrubbed = (memflags & MEMF_no_scrub);
 
 if ( node == NUMA_NO_NODE )
 {
@@ -734,8 +735,20 @@ static struct page_info *get_free_buddy(unsigned int 
zone_lo,
 
 /* Find smallest order which can satisfy the request. */
 for ( j = order; j <= MAX_ORDER; j++ )
+{
 if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
-return pg;
+{
+/*
+ * We grab single pages (order=0) even if they are
+ * unscrubbed. Given that scrubbing one page is fairly quick
+ * it is not worth breaking higher orders.
+ */
+if ( (order == 0) || use_unscrubbed ||
+ pg->u.free.first_dirty == INVALID_DIRTY_IDX)
+return pg;
+page_list_add_tail(pg, &heap(node, zone, j));
+}
+}
 } while ( zone-- > zone_lo ); /* careful: unsigned zone may wrap */
 
 if ( (memflags & MEMF_exact_node) && req_node != NUMA_NO_NODE )
@@ -775,7 +788,7 @@ static struct page_info *alloc_heap_pages(
 unsigned int i, buddy_order, zone;
 unsigned long request = 1UL << order;
 struct page_info *pg, *first_dirty_pg = NULL;
-bool_t need_tlbflush = 0;
+bool need_scrub, need_tlbflush = false;
 uint32_t tlbflush_timestamp = 0;
 
 /* Make sure there are enough bits in memflags for nodeID. */
@@ -819,6 +832,10 @@ static struct page_info *alloc_heap_pages(
  }
  
 pg = get_free_buddy(zone_lo, zone_hi, order, memflags, d);
+/* Try getting a dirty buddy if we couldn't get a clean one. */
+if ( !pg && !(memflags & MEMF_no_scrub) )
+pg = get_free_buddy(zone_lo, zone_hi, order,
+memflags | MEMF_no_scrub, d);
 if ( !pg )
 {
 /* No suitable memory blocks. Fail the request. */
@@ -862,10 +879,19 @@ static struct page_info *alloc_heap_pages(
 if ( d != NULL )
 d->last_alloc_node = node;
 
+need_scrub = !!first_dirty_pg && !(memflags & MEMF_no_scrub);
 for ( i = 0; i < (1 << order); i++ )
 {
 /* Reference count must continuously be zero for free pages. */
-BUG_ON(pg[i].count_info != PGC_state_free);
+BUG_ON((pg[i].count_info & ~PGC_need_scrub) != PGC_state_free);
+
+if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
+{
+if ( need_scrub )
+scrub_one_page(&pg[i]);
+node_need_scrub[node]--;
+}
+
 pg[i].count_info = PGC_state_inuse;
 
 if ( !(memflags & MEMF_no_tlbflush) )
@@ -1749,7 +1775,7 @@ void *alloc_xenheap_pages(unsigned int order, unsigned 
int memflags)
 ASSERT(!in_irq());
 
 pg = alloc_heap_pages(MEMZONE_XEN, MEMZONE_XEN,
-  order, memflags, NULL);
+  order, memflags | MEMF_no_scrub, NULL);
 if ( unlikely(pg == NULL) )
 return NULL;
 
@@ -1799,7 +1825,7 @@ void *alloc_xenheap_pages(unsigned int order, unsigned 
int memflags)
 if ( !(memflags >> _MEMF_bits) )
 memflags |= MEMF_bits(xenheap_bits);
 
-pg = alloc_domheap_pages(NULL, order, memflags);
+pg = alloc_domheap_pages(NULL, order, memflags | MEMF_no_scrub);
 if ( unlikely(pg == NULL) )
 return NULL;
 
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 3d3f31b..5f3d84a 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -238,7 +238,9 @@ struct npfec {
 #define  MEMF_no_tlbflush 

[Xen-devel] [PATCH v5 7/8] mm: Print number of unscrubbed pages in 'H' debug handler

2017-06-22 Thread Boris Ostrovsky
Signed-off-by: Boris Ostrovsky 
Reviewed-by: Wei Liu 
---
 xen/common/page_alloc.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index f0e5399..da5ffc2 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -2315,6 +2315,13 @@ static void dump_heap(unsigned char key)
 printk("heap[node=%d][zone=%d] -> %lu pages\n",
i, j, avail[i][j]);
 }
+
+for ( i = 0; i < MAX_NUMNODES; i++ )
+{
+if ( !node_need_scrub[i] )
+continue;
+printk("Node %d has %lu unscrubbed pages\n", i, node_need_scrub[i]);
+}
 }
 
 static __init int register_heap_trigger(void)
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC v2]Proposal to allow setting up shared memory areas between VMs from xl config file

2017-06-22 Thread Zhongze Liu
Hi,

After talking to Stefano, I know that there seem to be no such
hypercalls to restrict the W/R/X
permissions on the shared backing pages (XENMEM_access_op is for
another purpose,
sorry for getting its usage wrong). And it seems that the ability to
specify these permissions
is not strictly necessary. Since the goal of this project is to setup
VM-to-VM communication,
in most cases, users would just expect that the shared memory is
mapped read-write with
cacheability attributes of normal memory. So the temporary conclusion
is to restrict the
design to sharing read-write pages with normal caching attributes,
with the rest left to
the to-be-done list.


Cheers,

Zhongze Liu

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 5/8] spinlock: Introduce spin_lock_cb()

2017-06-22 Thread Boris Ostrovsky
While waiting for a lock we may want to periodically run some
code. This code may, for example, allow the caller to release
resources held by it that are no longer needed in the critical
section protected by the lock.

Specifically, this feature will be needed by scrubbing code where
the scrubber, while waiting for heap lock to merge back clean
pages, may be requested by page allocator (which is currently
holding the lock) to abort merging and release the buddy page head
that the allocator wants.

We could use spin_trylock() but since it doesn't take a lock ticket
it may take a long time until the lock is taken. Instead we add
spin_lock_cb() that allows us to grab the ticket and execute a
callback while waiting. This callback is executed on every iteration
of the spinlock waiting loop.

Since we may be sleeping in the lock until it is released we need a
mechanism that will make sure that the callback has a chance to run.
We add spin_lock_kick() that will wake up the waiter.

Signed-off-by: Boris Ostrovsky 
---
Changes in v5:
* Added a sentence in commit message to note that callback function is
  called on every iteration of the spin loop.
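
For illustration only (not part of the patch), a waiter wanting to stay
responsive while spinning on the heap lock could look like the sketch below.
Only spin_lock_cb() comes from this patch; the scrub_wait_state fields and
the callback name are made up for the example:

    /* Hypothetical waiter: poll an abort flag on every spin iteration. */
    static void scrub_wait_cb(void *data)
    {
        struct scrub_wait_state *st = data;

        if ( ACCESS_ONCE(st->abort_requested) )   /* set by the lock holder */
            st->drop = true;   /* remember to back off once we get the lock */
    }

    ...
    spin_lock_cb(&heap_lock, scrub_wait_cb, &st);
    /* The lock holder sets st->abort_requested and then uses
     * spin_lock_kick() so the spinning waiter notices promptly. */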

 xen/common/spinlock.c  | 9 -
 xen/include/xen/spinlock.h | 8 
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
index 2a06406..3c1caae 100644
--- a/xen/common/spinlock.c
+++ b/xen/common/spinlock.c
@@ -129,7 +129,7 @@ static always_inline u16 observe_head(spinlock_tickets_t *t)
 return read_atomic(&t->head);
 }
 
-void _spin_lock(spinlock_t *lock)
+void inline _spin_lock_cb(spinlock_t *lock, void (*cb)(void *), void *data)
 {
 spinlock_tickets_t tickets = SPINLOCK_TICKET_INC;
 LOCK_PROFILE_VAR;
@@ -140,6 +140,8 @@ void _spin_lock(spinlock_t *lock)
 while ( tickets.tail != observe_head(&lock->tickets) )
 {
 LOCK_PROFILE_BLOCK;
+if ( unlikely(cb) )
+cb(data);
 arch_lock_relax();
 }
 LOCK_PROFILE_GOT;
@@ -147,6 +149,11 @@ void _spin_lock(spinlock_t *lock)
 arch_lock_acquire_barrier();
 }
 
+void _spin_lock(spinlock_t *lock)
+{
+ _spin_lock_cb(lock, NULL, NULL);
+}
+
 void _spin_lock_irq(spinlock_t *lock)
 {
 ASSERT(local_irq_is_enabled());
diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
index c1883bd..91bfb95 100644
--- a/xen/include/xen/spinlock.h
+++ b/xen/include/xen/spinlock.h
@@ -153,6 +153,7 @@ typedef struct spinlock {
 #define spin_lock_init(l) (*(l) = (spinlock_t)SPIN_LOCK_UNLOCKED)
 
 void _spin_lock(spinlock_t *lock);
+void _spin_lock_cb(spinlock_t *lock, void (*cond)(void *), void *data);
 void _spin_lock_irq(spinlock_t *lock);
 unsigned long _spin_lock_irqsave(spinlock_t *lock);
 
@@ -169,6 +170,7 @@ void _spin_lock_recursive(spinlock_t *lock);
 void _spin_unlock_recursive(spinlock_t *lock);
 
 #define spin_lock(l)  _spin_lock(l)
+#define spin_lock_cb(l, c, d) _spin_lock_cb(l, c, d)
 #define spin_lock_irq(l)  _spin_lock_irq(l)
 #define spin_lock_irqsave(l, f) \
 ({  \
@@ -190,6 +192,12 @@ void _spin_unlock_recursive(spinlock_t *lock);
 1 : ({ local_irq_restore(flags); 0; }); \
 })
 
+#define spin_lock_kick(l)   \
+({  \
+smp_mb();   \
+arch_lock_signal(); \
+})
+
 /* Ensure a lock is quiescent between two critical operations. */
 #define spin_barrier(l)   _spin_barrier(l)
 
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 6/8] mm: Keep heap accessible to others while scrubbing

2017-06-22 Thread Boris Ostrovsky
Instead of scrubbing pages while holding heap lock we can mark
buddy's head as being scrubbed and drop the lock temporarily.
If someone (most likely alloc_heap_pages()) tries to access
this chunk it will signal the scrubber to abort scrub by setting
head's BUDDY_SCRUB_ABORT bit. The scrubber checks this bit after
processing each page and stops its work as soon as it sees it.
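
Schematically, the two sides of the handshake look roughly like this (a
simplified sketch of the code below; read_scrub_state() is a hypothetical
stand-in for the ACCESS_ONCE() dance on the bitfield):

    /* Allocator side, with heap_lock held (cf. check_and_stop_scrub()). */
    if ( head->u.free.scrub_state == BUDDY_SCRUBBING )
    {
        head->u.free.scrub_state = BUDDY_SCRUB_ABORT;
        spin_lock_kick();                         /* wake the waiting scrubber */
        while ( read_scrub_state(head) == BUDDY_SCRUB_ABORT )
            cpu_relax();                          /* wait for the scrubber's ack */
    }

    /* Scrubber side: checked after each page and again when re-taking the
     * heap lock (cf. scrub_continue()). */
    if ( head->u.free.scrub_state == BUDDY_SCRUB_ABORT )
    {
        head->u.free.first_dirty = first_dirty;   /* record progress so far */
        smp_wmb();
        head->u.free.scrub_state = BUDDY_NOT_SCRUBBING;  /* ack: buddy released */
    }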

Signed-off-by: Boris Ostrovsky 
---
Changes in v5:
* Fixed off-by-one error in setting first_dirty
* Changed struct page_info.u.free to a union to permit use of ACCESS_ONCE in
  check_and_stop_scrub()
* Renamed PAGE_SCRUBBING etc. macros to BUDDY_SCRUBBING etc

 xen/common/page_alloc.c  | 105 +--
 xen/include/asm-arm/mm.h |  28 -
 xen/include/asm-x86/mm.h |  29 -
 3 files changed, 138 insertions(+), 24 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 4e2775f..f0e5399 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -687,6 +687,7 @@ static void page_list_add_scrub(struct page_info *pg, 
unsigned int node,
 {
 PFN_ORDER(pg) = order;
 pg->u.free.first_dirty = first_dirty;
+pg->u.free.scrub_state = BUDDY_NOT_SCRUBBING;
 
 if ( first_dirty != INVALID_DIRTY_IDX )
        page_list_add_tail(pg, &heap(node, zone, order));
@@ -694,6 +695,25 @@ static void page_list_add_scrub(struct page_info *pg, 
unsigned int node,
        page_list_add(pg, &heap(node, zone, order));
 }
 
+static void check_and_stop_scrub(struct page_info *head)
+{
+if ( head->u.free.scrub_state == BUDDY_SCRUBBING )
+{
+struct page_info pg;
+
+head->u.free.scrub_state = BUDDY_SCRUB_ABORT;
+spin_lock_kick();
+for ( ; ; )
+{
+/* Can't ACCESS_ONCE() a bitfield. */
+pg.u.free.val = ACCESS_ONCE(head->u.free.val);
+if ( pg.u.free.scrub_state != BUDDY_SCRUB_ABORT )
+break;
+cpu_relax();
+}
+}
+}
+
 static struct page_info *get_free_buddy(unsigned int zone_lo,
 unsigned int zone_hi,
 unsigned int order, unsigned int 
memflags,
@@ -738,14 +758,19 @@ static struct page_info *get_free_buddy(unsigned int 
zone_lo,
 {
            if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
 {
+if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
+return pg;
 /*
  * We grab single pages (order=0) even if they are
  * unscrubbed. Given that scrubbing one page is fairly 
quick
  * it is not worth breaking higher orders.
  */
-if ( (order == 0) || use_unscrubbed ||
- pg->u.free.first_dirty == INVALID_DIRTY_IDX)
+if ( (order == 0) || use_unscrubbed )
+{
+check_and_stop_scrub(pg);
 return pg;
+}
+
                page_list_add_tail(pg, &heap(node, zone, j));
 }
 }
@@ -928,6 +953,7 @@ static int reserve_offlined_page(struct page_info *head)
 
 cur_head = head;
 
+check_and_stop_scrub(head);
 /*
  * We may break the buddy so let's mark the head as clean. Then, when
  * merging chunks back into the heap, we will see whether the chunk has
@@ -1084,6 +1110,29 @@ static unsigned int node_to_scrub(bool get_node)
 return closest;
 }
 
+struct scrub_wait_state {
+struct page_info *pg;
+unsigned int first_dirty;
+bool drop;
+};
+
+static void scrub_continue(void *data)
+{
+struct scrub_wait_state *st = data;
+
+if ( st->drop )
+return;
+
+if ( st->pg->u.free.scrub_state == BUDDY_SCRUB_ABORT )
+{
+/* There is a waiter for this buddy. Release it. */
+st->drop = true;
+st->pg->u.free.first_dirty = st->first_dirty;
+smp_wmb();
+st->pg->u.free.scrub_state = BUDDY_NOT_SCRUBBING;
+}
+}
+
 bool scrub_free_pages(void)
 {
 struct page_info *pg;
@@ -1106,25 +1155,53 @@ bool scrub_free_pages(void)
 do {
            while ( !page_list_empty(&heap(node, zone, order)) )
 {
-unsigned int i;
+unsigned int i, dirty_cnt;
+struct scrub_wait_state st;
 
 /* Unscrubbed pages are always at the end of the list. */
                pg = page_list_last(&heap(node, zone, order));
 if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
 break;
 
+ASSERT(!pg->u.free.scrub_state);
+pg->u.free.scrub_state = BUDDY_SCRUBBING;
+
+spin_unlock(&heap_lock);
+
+dirty_cnt = 0;
+
 for ( i = pg->u.free.first_dirty; i < (1U << order); i++)
 {
  

[Xen-devel] [PATCH v5 2/8] mm: Extract allocation loop from alloc_heap_pages()

2017-06-22 Thread Boris Ostrovsky
This will make the code a bit more readable, especially with the changes
that will be introduced in subsequent patches.

Signed-off-by: Boris Ostrovsky 
---
Changes in v5:
* Constified get_free_buddy()'s struct domain argument
* Dropped request local variable in get_free_buddy().

Because of rebasing there were a few more changes in this patch, so I decided
not to keep Jan's ACK.


 xen/common/page_alloc.c | 143 ++--
 1 file changed, 79 insertions(+), 64 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 570d1f7..89fe3ce 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -694,22 +694,15 @@ static void page_list_add_scrub(struct page_info *pg, 
unsigned int node,
  page_list_add(pg, &heap(node, zone, order));
 }
 
-/* Allocate 2^@order contiguous pages. */
-static struct page_info *alloc_heap_pages(
-unsigned int zone_lo, unsigned int zone_hi,
-unsigned int order, unsigned int memflags,
-struct domain *d)
+static struct page_info *get_free_buddy(unsigned int zone_lo,
+unsigned int zone_hi,
+unsigned int order, unsigned int 
memflags,
+const struct domain *d)
 {
-unsigned int i, j, zone = 0, nodemask_retry = 0;
 nodeid_t first_node, node = MEMF_get_node(memflags), req_node = node;
-unsigned long request = 1UL << order;
-struct page_info *pg, *first_dirty_pg = NULL;
-nodemask_t nodemask = (d != NULL ) ? d->node_affinity : node_online_map;
-bool_t need_tlbflush = 0;
-uint32_t tlbflush_timestamp = 0;
-
-/* Make sure there are enough bits in memflags for nodeID. */
-BUILD_BUG_ON((_MEMF_bits - _MEMF_node) < (8 * sizeof(nodeid_t)));
+nodemask_t nodemask = d ? d->node_affinity : node_online_map;
+unsigned int j, zone, nodemask_retry = 0;
+struct page_info *pg;
 
 if ( node == NUMA_NO_NODE )
 {
@@ -725,34 +718,6 @@ static struct page_info *alloc_heap_pages(
 first_node = node;
 
 ASSERT(node < MAX_NUMNODES);
-ASSERT(zone_lo <= zone_hi);
-ASSERT(zone_hi < NR_ZONES);
-
-if ( unlikely(order > MAX_ORDER) )
-return NULL;
-
-spin_lock(&heap_lock);
-
-/*
- * Claimed memory is considered unavailable unless the request
- * is made by a domain with sufficient unclaimed pages.
- */
-if ( (outstanding_claims + request >
-  total_avail_pages + tmem_freeable_pages()) &&
-  ((memflags & MEMF_no_refcount) ||
-   !d || d->outstanding_pages < request) )
-goto not_found;
-
-/*
- * TMEM: When available memory is scarce due to tmem absorbing it, allow
- * only mid-size allocations to avoid worst of fragmentation issues.
- * Others try tmem pools then fail.  This is a workaround until all
- * post-dom0-creation-multi-page allocations can be eliminated.
- */
-if ( ((order == 0) || (order >= 9)) &&
- (total_avail_pages <= midsize_alloc_zone_pages) &&
- tmem_freeable_pages() )
-goto try_tmem;
 
 /*
  * Start with requested node, but exhaust all node memory in requested 
@@ -764,17 +729,17 @@ static struct page_info *alloc_heap_pages(
 zone = zone_hi;
 do {
 /* Check if target node can support the allocation. */
-if ( !avail[node] || (avail[node][zone] < request) )
+if ( !avail[node] || (avail[node][zone] < (1UL << order)) )
 continue;
 
 /* Find smallest order which can satisfy the request. */
 for ( j = order; j <= MAX_ORDER; j++ )
            if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
-goto found;
+return pg;
 } while ( zone-- > zone_lo ); /* careful: unsigned zone may wrap */
 
 if ( (memflags & MEMF_exact_node) && req_node != NUMA_NO_NODE )
-goto not_found;
+return NULL;
 
 /* Pick next node. */
 if ( !node_isset(node, nodemask) )
@@ -791,39 +756,89 @@ static struct page_info *alloc_heap_pages(
 {
 /* When we have tried all in nodemask, we fall back to others. */
 if ( (memflags & MEMF_exact_node) || nodemask_retry++ )
-goto not_found;
+return NULL;
 nodes_andnot(nodemask, node_online_map, nodemask);
 first_node = node = first_node(nodemask);
 if ( node >= MAX_NUMNODES )
-goto not_found;
+return NULL;
 }
 }
+}
 
- try_tmem:
-/* Try to free memory from tmem */
-if ( (pg = tmem_relinquish_pages(order, memflags)) != NULL )
+/* Allocate 2^@order contiguous pages. */
+static struct page_info *alloc_heap_pages(
+unsigned int zone_lo, unsigned int zone_hi,
+unsigned int order, unsigned int memflags,
+struct domain *d)
+{
+nodeid_t node;
+

[Xen-devel] [PATCH v5 8/8] mm: Make sure pages are scrubbed

2017-06-22 Thread Boris Ostrovsky
Add a debug Kconfig option that will make the page allocator verify
that pages that were supposed to be scrubbed are, in fact, clean.
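
The invariant being enforced is easy to see in miniature (a toy,
stand-alone illustration, not hypervisor code):

    #include <assert.h>
    #include <stdint.h>

    #define SCRUB_PATTERN 0xc2c2c2c2c2c2c2c2ULL

    static uint64_t word;                                /* stands in for a page */

    static void poison(void) { word = ~SCRUB_PATTERN; }  /* free_heap_pages()    */
    static void scrub(void)  { word = SCRUB_PATTERN;  }  /* scrub_one_page()     */
    static void check(void)  { assert(word == SCRUB_PATTERN); } /* alloc path    */

    int main(void)
    {
        poison();
        scrub();   /* drop this call and check() fires, catching the missed scrub */
        check();
        return 0;
    }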

Signed-off-by: Boris Ostrovsky 
---
Changes in v5:
* Defined SCRUB_PATTERN for NDEBUG
* Style changes


 xen/Kconfig.debug   |  7 ++
 xen/common/page_alloc.c | 63 -
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/xen/Kconfig.debug b/xen/Kconfig.debug
index 689f297..195d504 100644
--- a/xen/Kconfig.debug
+++ b/xen/Kconfig.debug
@@ -114,6 +114,13 @@ config DEVICE_TREE_DEBUG
  logged in the Xen ring buffer.
  If unsure, say N here.
 
+config SCRUB_DEBUG
+   bool "Page scrubbing test"
+   default DEBUG
+   ---help---
+ Verify that pages that need to be scrubbed before being allocated to
+ a guest are indeed scrubbed.
+
 endif # DEBUG || EXPERT
 
 endmenu
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index da5ffc2..5d50c2a 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -170,6 +170,10 @@ boolean_param("bootscrub", opt_bootscrub);
 static unsigned long __initdata opt_bootscrub_chunk = MB(128);
 size_param("bootscrub_chunk", opt_bootscrub_chunk);
 
+#ifdef CONFIG_SCRUB_DEBUG
+static bool __read_mostly boot_scrub_done;
+#endif
+
 /*
  * Bit width of the DMA heap -- used to override NUMA-node-first.
  * allocation strategy, which can otherwise exhaust low memory.
@@ -695,6 +699,43 @@ static void page_list_add_scrub(struct page_info *pg, 
unsigned int node,
  page_list_add(pg, &heap(node, zone, order));
 }
 
+/* SCRUB_PATTERN needs to be a repeating series of bytes. */
+#ifndef NDEBUG
+#define SCRUB_PATTERN0xc2c2c2c2c2c2c2c2ULL
+#else
+#define SCRUB_PATTERN0ULL
+#endif
+#define SCRUB_BYTE_PATTERN   (SCRUB_PATTERN & 0xff)
+
+static void poison_one_page(struct page_info *pg)
+{
+#ifdef CONFIG_SCRUB_DEBUG
+mfn_t mfn = _mfn(page_to_mfn(pg));
+uint64_t *ptr;
+
+ptr = map_domain_page(mfn);
+*ptr = ~SCRUB_PATTERN;
+unmap_domain_page(ptr);
+#endif
+}
+
+static void check_one_page(struct page_info *pg)
+{
+#ifdef CONFIG_SCRUB_DEBUG
+mfn_t mfn = _mfn(page_to_mfn(pg));
+const uint64_t *ptr;
+unsigned int i;
+
+if ( !boot_scrub_done )
+return;
+
+ptr = map_domain_page(mfn);
+for ( i = 0; i < PAGE_SIZE / sizeof (*ptr); i++ )
+ASSERT(ptr[i] == SCRUB_PATTERN);
+unmap_domain_page(ptr);
+#endif
+}
+
 static void check_and_stop_scrub(struct page_info *head)
 {
 if ( head->u.free.scrub_state == BUDDY_SCRUBBING )
@@ -931,6 +972,9 @@ static struct page_info *alloc_heap_pages(
  * guest can control its own visibility of/through the cache.
  */
 flush_page_to_ram(page_to_mfn(&pg[i]), !(memflags & 
MEMF_no_icache_flush));
+
+if ( !(memflags & MEMF_no_scrub) )
+check_one_page(&pg[i]);
 }
 
     spin_unlock(&heap_lock);
@@ -1294,7 +1338,10 @@ static void free_heap_pages(
 set_gpfn_from_mfn(mfn + i, INVALID_M2P_ENTRY);
 
 if ( need_scrub )
+{
 pg[i].count_info |= PGC_need_scrub;
+poison_one_page(&pg[i]);
+}
 }
 
 avail[node][zone] += 1 << order;
@@ -1656,7 +1703,12 @@ static void init_heap_pages(
 nr_pages -= n;
 }
 
+#ifndef CONFIG_SCRUB_DEBUG
 free_heap_pages(pg + i, 0, false);
+#else
+free_heap_pages(pg + i, 0, boot_scrub_done);
+#endif
+   
 }
 }
 
@@ -1922,6 +1974,10 @@ void __init scrub_heap_pages(void)
 
 printk("done.\n");
 
+#ifdef CONFIG_SCRUB_DEBUG
+boot_scrub_done = true;
+#endif
+
 /* Now that the heap is initialized, run checks and set bounds
  * for the low mem virq algorithm. */
 setup_low_mem_virq();
@@ -2195,12 +2251,16 @@ void free_domheap_pages(struct page_info *pg, unsigned 
int order)
 
 spin_unlock_recursive(&d->page_alloc_lock);
 
+#ifndef CONFIG_SCRUB_DEBUG
 /*
  * Normally we expect a domain to clear pages before freeing them,
  * if it cares about the secrecy of their contents. However, after
  * a domain has died we assume responsibility for erasure.
  */
 scrub = !!d->is_dying;
+#else
+scrub = true;
+#endif
 }
 else
 {
@@ -2292,7 +2352,8 @@ void scrub_one_page(struct page_info *pg)
 
 #ifndef NDEBUG
 /* Avoid callers relying on allocations returning zeroed pages. */
-unmap_domain_page(memset(__map_domain_page(pg), 0xc2, PAGE_SIZE));
+unmap_domain_page(memset(__map_domain_page(pg),
+ SCRUB_BYTE_PATTERN, PAGE_SIZE));
 #else
 /* For a production build, clear_page() is the fastest way to scrub. */
 clear_domain_page(_mfn(page_to_mfn(pg)));
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 1/8] mm: Place unscrubbed pages at the end of pagelist

2017-06-22 Thread Boris Ostrovsky
. so that it's easy to find pages that need to be scrubbed (those pages are
now marked with the _PGC_need_scrub bit).

We keep track of the first unscrubbed page in a page buddy using the
first_dirty field. For now it can have two values: 0 (the whole buddy needs
scrubbing) or INVALID_DIRTY_IDX (the buddy does not need to be scrubbed).
Subsequent patches will allow scrubbing to be interrupted, resulting in
first_dirty taking any value.
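
For illustration, a consumer of first_dirty only ever has to look at the
tail of a buddy, along these lines (a simplified sketch in the spirit of
the later scrub loop, not code from this patch):

    /* Scrub only the dirty tail of a buddy of 2^order pages. */
    static void scrub_buddy(struct page_info *pg, unsigned int order)
    {
        unsigned int i;

        if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
            return;                 /* nothing in this buddy needs scrubbing */

        for ( i = pg->u.free.first_dirty; i < (1U << order); i++ )
            if ( pg[i].count_info & PGC_need_scrub )
                scrub_one_page(&pg[i]);
    }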

Signed-off-by: Boris Ostrovsky 
---
Changes in v5:
* Be more careful when returning the unused portion of a buddy to the heap in
  alloc_heap_pages(), and don't set first_dirty if we know that the sub-buddy
  is clean
* In reserve_offlined_page(), don't try to find dirty pages in sub-buddies if we
  can figure out that there are none.
* Drop unnecessary setting of first_dirty in free_heap_pages()
* Switch to using bitfields in page_info.u.free

I kept node_need_scrub[] as a global array and not a "per-node" one. I think
splitting it should be part of making heap_lock a per-node lock, together with
increasing scrub concurrency by having more than one CPU scrub a node.



 xen/common/page_alloc.c  | 190 +++
 xen/include/asm-arm/mm.h |  18 -
 xen/include/asm-x86/mm.h |  17 -
 3 files changed, 190 insertions(+), 35 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 8bcef6a..570d1f7 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -383,6 +383,8 @@ typedef struct page_list_head 
heap_by_zone_and_order_t[NR_ZONES][MAX_ORDER+1];
 static heap_by_zone_and_order_t *_heap[MAX_NUMNODES];
 #define heap(node, zone, order) ((*_heap[node])[zone][order])
 
+static unsigned long node_need_scrub[MAX_NUMNODES];
+
 static unsigned long *avail[MAX_NUMNODES];
 static long total_avail_pages;
 
@@ -678,6 +680,20 @@ static void check_low_mem_virq(void)
 }
 }
 
+/* Pages that need a scrub are added to tail, otherwise to head. */
+static void page_list_add_scrub(struct page_info *pg, unsigned int node,
+unsigned int zone, unsigned int order,
+unsigned int first_dirty)
+{
+PFN_ORDER(pg) = order;
+pg->u.free.first_dirty = first_dirty;
+
+if ( first_dirty != INVALID_DIRTY_IDX )
+page_list_add_tail(pg, &heap(node, zone, order));
+else
+page_list_add(pg, &heap(node, zone, order));
+}
+
 /* Allocate 2^@order contiguous pages. */
 static struct page_info *alloc_heap_pages(
 unsigned int zone_lo, unsigned int zone_hi,
@@ -687,7 +703,7 @@ static struct page_info *alloc_heap_pages(
 unsigned int i, j, zone = 0, nodemask_retry = 0;
 nodeid_t first_node, node = MEMF_get_node(memflags), req_node = node;
 unsigned long request = 1UL << order;
-struct page_info *pg;
+struct page_info *pg, *first_dirty_pg = NULL;
 nodemask_t nodemask = (d != NULL ) ? d->node_affinity : node_online_map;
 bool_t need_tlbflush = 0;
 uint32_t tlbflush_timestamp = 0;
@@ -798,11 +814,26 @@ static struct page_info *alloc_heap_pages(
 return NULL;
 
  found: 
+
+if ( pg->u.free.first_dirty != INVALID_DIRTY_IDX )
+first_dirty_pg = pg + pg->u.free.first_dirty;
+
 /* We may have to halve the chunk a number of times. */
 while ( j != order )
 {
-PFN_ORDER(pg) = --j;
-page_list_add_tail(pg, &heap(node, zone, j));
+unsigned int first_dirty;
+
+if ( first_dirty_pg && ((pg + (1 << j)) > first_dirty_pg) )
+{
+if ( pg < first_dirty_pg )
+first_dirty = (first_dirty_pg - pg) / sizeof(*pg);
+else
+first_dirty = 0;
+}
+else
+first_dirty = INVALID_DIRTY_IDX;
+
+page_list_add_scrub(pg, node, zone, --j, first_dirty);
 pg += 1 << j;
 }
 
@@ -849,13 +880,22 @@ static int reserve_offlined_page(struct page_info *head)
 {
 unsigned int node = phys_to_nid(page_to_maddr(head));
 int zone = page_to_zone(head), i, head_order = PFN_ORDER(head), count = 0;
-struct page_info *cur_head;
+struct page_info *cur_head, *first_dirty_pg = NULL;
 int cur_order;
 
     ASSERT(spin_is_locked(&heap_lock));
 
 cur_head = head;
 
+/*
+ * We may break the buddy so let's mark the head as clean. Then, when
+ * merging chunks back into the heap, we will see whether the chunk has
+ * unscrubbed pages and set its first_dirty properly.
+ */
+if (head->u.free.first_dirty != INVALID_DIRTY_IDX)
+first_dirty_pg = head + head->u.free.first_dirty;
+head->u.free.first_dirty = INVALID_DIRTY_IDX;
+
     page_list_del(head, &heap(node, zone, head_order));
 
 while ( cur_head < (head + (1 << head_order)) )
@@ -873,6 +913,8 @@ static int reserve_offlined_page(struct page_info *head)
 
 while ( cur_order < head_order )
 {
+unsigned int first_dirty = INVALID_DIRTY_IDX;
+
 next_order = cur_order + 1;
 
 

Re: [Xen-devel] [PATCH] xen/disk: don't leak stack data via response ring

2017-06-22 Thread Stefano Stabellini
On Thu, 22 Jun 2017, Jan Beulich wrote:
> >>> On 21.06.17 at 20:46,  wrote:
> > On Wed, 21 Jun 2017, Jan Beulich wrote:
> >> >>> On 20.06.17 at 23:48,  wrote:
> >> > On Tue, 20 Jun 2017, Jan Beulich wrote:
> >> >> @@ -36,13 +33,7 @@ struct blkif_x86_32_request_discard {
> >> >>  blkif_sector_t sector_number;/* start sector idx on disk (r/w 
> > only)  */
> >> >>  uint64_t   nr_sectors;   /* # of contiguous sectors to 
> >> >> discard 
> >   */
> >> >>  };
> >> >> -struct blkif_x86_32_response {
> >> >> -uint64_tid;  /* copied from request */
> >> >> -uint8_t operation;   /* copied from request */
> >> >> -int16_t status;  /* BLKIF_RSP_???   */
> >> >> -};
> >> >>  typedef struct blkif_x86_32_request blkif_x86_32_request_t;
> >> >> -typedef struct blkif_x86_32_response blkif_x86_32_response_t;
> >> >>  #pragma pack(pop)
> >> >>  
> >> >>  /* x86_64 protocol version */
> >> >> @@ -62,20 +53,14 @@ struct blkif_x86_64_request_discard {
> >> >>  blkif_sector_t sector_number;/* start sector idx on disk (r/w 
> > only)  */
> >> >>  uint64_t   nr_sectors;   /* # of contiguous sectors to 
> >> >> discard 
> >   */
> >> >>  };
> >> >> -struct blkif_x86_64_response {
> >> >> -uint64_t   __attribute__((__aligned__(8))) id;
> >> >> -uint8_t operation;   /* copied from request */
> >> >> -int16_t status;  /* BLKIF_RSP_???   */
> >> >> -};
> >> >>
> >> >>  typedef struct blkif_x86_64_request blkif_x86_64_request_t;
> >> >> -typedef struct blkif_x86_64_response blkif_x86_64_response_t;
> >> >>  
> >> >>  DEFINE_RING_TYPES(blkif_common, struct blkif_common_request,
> >> >> -  struct blkif_common_response);
> >> >> +  struct blkif_response);
> >> >>  DEFINE_RING_TYPES(blkif_x86_32, struct blkif_x86_32_request,
> >> >> -  struct blkif_x86_32_response);
> >> >> +  struct blkif_response QEMU_PACKED);
> >> > 
> >> > In my test, the previous sizes and alignments of the response structs
> >> > were (on both x86_32 and x86_64):
> >> > 
> >> > sizeof(blkif_x86_32_response)=12   sizeof(blkif_x86_64_response)=16
> >> > align(blkif_x86_32_response)=4 align(blkif_x86_64_response)=8
> >> > 
> >> > While with these changes are now, when compiled on x86_64:
> >> > sizeof(blkif_x86_32_response)=11   sizeof(blkif_x86_64_response)=16
> >> > align(blkif_x86_32_response)=1 align(blkif_x86_64_response)=8
> >> > 
> >> > when compiled on x86_32:
> >> > sizeof(blkif_x86_32_response)=11   sizeof(blkif_x86_64_response)=12
> >> > align(blkif_x86_32_response)=1 align(blkif_x86_64_response)=4
> >> > 
> >> > Did I do my tests wrong?
> >> > 
> >> > QEMU_PACKED is not the same as #pragma pack(push, 4). In fact, it is the
> >> > same as #pragma pack(push, 1), causing the struct to be densely packed,
> >> > leaving no padding whatsoever.
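
A stand-alone illustration of that difference, using a hypothetical struct
with the same members as the response:

    #include <stdint.h>

    /* #pragma pack(push, 4): members are at most 4-byte aligned and the
     * tail is padded, giving sizeof == 12 and alignment == 4. */
    #pragma pack(push, 4)
    struct resp_pack4 {
        uint64_t id;
        uint8_t  operation;
        int16_t  status;
    };
    #pragma pack(pop)

    /* __attribute__((packed)) -- what QEMU_PACKED boils down to here:
     * every member is byte-aligned, giving sizeof == 11 and alignment == 1. */
    struct resp_packed {
        uint64_t id;
        uint8_t  operation;
        int16_t  status;
    } __attribute__((packed));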
> >> > 
> >> > In addition, without __attribute__((__aligned__(8))),
> >> > blkif_x86_64_response won't be 8 bytes aligned when built on x86_32.
> >> > 
> >> > Am I missing something?
> >> 
> >> Well, you're mixing attribute application upon structure
> >> declaration with attribute application upon structure use. It's
> >> the latter here, and hence the attribute doesn't affect
> >> structure layout at all. All it does is avoid the _containing_
> >> 32-bit union to become 8-byte aligned (and tail padding to be
> >> inserted).
> > 
> > Thanks for the explanation. I admit it's the first time I see the
> > aligned attribute being used at structure usage only. I think it's the
> > first time QEMU_PACKED is used this way in QEMU too.
> > 
> > Anyway, even taking that into account, things are still not completely
> > right: the alignment of struct blkif_x86_32_response QEMU_PACKED is 4
> > bytes as you wrote, but the size of struct blkif_x86_32_response is
> > still 16 bytes instead of 12 bytes in my test. I suspect it worked for
> > you because the other member of the union (blkif_x86_32_request) is
> > larger than that. However, I think it is not a good idea to rely on this
> > implementation detail. The implementation of DEFINE_RING_TYPES should be
> > opaque from our point of view. We shouldn't have to know that there is a
> > union there.
> 
> I don't follow - why should we not rely on this? It is a fundamental
> aspect of the shared ring model that requests and responses share
> space.
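
For reference, the macros in xen/include/public/io/ring.h make that sharing
explicit: each ring slot is a union of the request and the response type,
roughly along these lines (abridged):

    /* Rough shape of what DEFINE_RING_TYPES(name, req_t, rsp_t) generates. */
    union name_sring_entry {
        req_t req;
        rsp_t rsp;
    };

    struct name_sring {
        RING_IDX req_prod, req_event;
        RING_IDX rsp_prod, rsp_event;
        uint8_t  pad[48];
        union name_sring_entry ring[1];   /* variable-length array of slots */
    };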
> 
> > Moreover, the other problem is still unaddressed: the size and alignment
> > of blkif_x86_64_response when built on x86_32 are 12 and 4 instead of 16
> > and 8 bytes. Is that working also because it's relying on the other
> > member of the union to enforce the right alignment and bigger size?
> 
> Yes. For these as well as your comments further up - sizeof() and
> alignof() are completely uninteresting as long as we don't
> instantiate objects of those types _and 

Re: [Xen-devel] [PATCH v4 07/18] xen/pvcalls: implement socket command

2017-06-22 Thread Stefano Stabellini
On Thu, 22 Jun 2017, Andrew Cooper wrote:
> On 22/06/17 19:29, Stefano Stabellini wrote:
> > On Thu, 22 Jun 2017, Roger Pau Monné wrote:
> >> On Wed, Jun 21, 2017 at 01:16:56PM -0700, Stefano Stabellini wrote:
> >>> On Tue, 20 Jun 2017, Roger Pau Monné wrote:
>  On Thu, Jun 15, 2017 at 12:09:36PM -0700, Stefano Stabellini wrote:
> > Just reply with success to the other end for now. Delay the allocation
> > of the actual socket to bind and/or connect.
> >
> > Signed-off-by: Stefano Stabellini 
> > CC: boris.ostrov...@oracle.com
> > CC: jgr...@suse.com
> > ---
> >  drivers/xen/pvcalls-back.c | 27 +++
> >  1 file changed, 27 insertions(+)
> >
> > diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> > index 437c2ad..953458b 100644
> > --- a/drivers/xen/pvcalls-back.c
> > +++ b/drivers/xen/pvcalls-back.c
> > @@ -12,12 +12,17 @@
> >   * GNU General Public License for more details.
> >   */
> >  
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> >  
> >  #include 
> >  #include 
> > @@ -54,6 +59,28 @@ struct pvcalls_fedata {
> >  static int pvcalls_back_socket(struct xenbus_device *dev,
> > struct xen_pvcalls_request *req)
> >  {
> > +   struct pvcalls_fedata *fedata;
> > +   int ret;
> > +   struct xen_pvcalls_response *rsp;
> > +
> > +   fedata = dev_get_drvdata(&dev->dev);
> > +
> > +   if (req->u.socket.domain != AF_INET ||
> > +   req->u.socket.type != SOCK_STREAM ||
> > +   (req->u.socket.protocol != IPPROTO_IP &&
> > +req->u.socket.protocol != AF_INET))
> > +   ret = -EAFNOSUPPORT;
>  Sorry for jumping into this out of the blue, but shouldn't all the
>  constants used above be part of the protocol? AF_INET/SOCK_STREAM/...
>  are all part of POSIX, but their specific value is not defined in the
>  standard, hence we should have XEN_AF_INET/XEN_SOCK_STREAM/... Or am I
>  just missing something?
> >>> The values of these constants for the pvcalls protocol are defined by
> >>> docs/misc/pvcalls.markdown under "Socket families and address format".
> >>>
> >>> They happen to be the same as the ones defined by Linux as AF_INET,
> >>> SOCK_STREAM, etc, so in Linux I am just using those, but that is just an
> >>> implementation detail internal to the Linux kernel driver. What is
> >>> important from the protocol ABI perspective are the values defined by
> >>> docs/misc/pvcalls.markdown.
> >> Oh I see. I still think this should be part of the public pvcalls.h
> >> header, and that the error codes should be the ones defined in
> >> public/errno.h (or else also added to the pvcalls header).
> > This was done differently in the past, but now that we have a formal
> > process, a person in charge of new PV drivers reviews, and design
> > documents with clearly spelled out ABIs, I consider the design docs
> > under docs/misc as the official specification. We don't need headers
> > anymore, they are redundant. In fact, we cannot have two specifications,
> > and the design docs are certainly the official ones (we don't want the
> > specs to be written as header files in C). To me, the headers under
> > xen/include/public/io/ are optional helpers. It doesn't matter what's in
> > there, or if frontends and backends use them or not.
> >
> > There is really an argument for removing those headers, because they
> > might get out of sync with the spec by mistake, and in those cases, then
> > we really end up with two specifications for the same protocol. I would
> > be in favor of `git rm'ing all files under xen/include/public/io/ for
> > which we have a complete design doc under docs/misc.
> 
> +1.
> 
> Specifications should not be written in C.  The mess that is the net and
> block protocol ABIs are perfect examples of why.
> 
> Its fine (and indeed recommended) to provide a header file which
> describes the specified protocol, but the authoritative spec should be
> in text form.
> 
> I would really prefer if more people started using ../docs/specs/.  The
> migration v2 documents are currently lonely there...

I didn't realize we had a docs/specs. Feel free to move pvcalls and 9pfs
under there.
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 07/18] xen/pvcalls: implement socket command

2017-06-22 Thread Andrew Cooper
On 22/06/17 19:29, Stefano Stabellini wrote:
> On Thu, 22 Jun 2017, Roger Pau Monné wrote:
>> On Wed, Jun 21, 2017 at 01:16:56PM -0700, Stefano Stabellini wrote:
>>> On Tue, 20 Jun 2017, Roger Pau Monné wrote:
 On Thu, Jun 15, 2017 at 12:09:36PM -0700, Stefano Stabellini wrote:
> Just reply with success to the other end for now. Delay the allocation
> of the actual socket to bind and/or connect.
>
> Signed-off-by: Stefano Stabellini 
> CC: boris.ostrov...@oracle.com
> CC: jgr...@suse.com
> ---
>  drivers/xen/pvcalls-back.c | 27 +++
>  1 file changed, 27 insertions(+)
>
> diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> index 437c2ad..953458b 100644
> --- a/drivers/xen/pvcalls-back.c
> +++ b/drivers/xen/pvcalls-back.c
> @@ -12,12 +12,17 @@
>   * GNU General Public License for more details.
>   */
>  
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
> +#include 
>  
>  #include 
>  #include 
> @@ -54,6 +59,28 @@ struct pvcalls_fedata {
>  static int pvcalls_back_socket(struct xenbus_device *dev,
>   struct xen_pvcalls_request *req)
>  {
> + struct pvcalls_fedata *fedata;
> + int ret;
> + struct xen_pvcalls_response *rsp;
> +
> + fedata = dev_get_drvdata(&dev->dev);
> +
> + if (req->u.socket.domain != AF_INET ||
> + req->u.socket.type != SOCK_STREAM ||
> + (req->u.socket.protocol != IPPROTO_IP &&
> +  req->u.socket.protocol != AF_INET))
> + ret = -EAFNOSUPPORT;
 Sorry for jumping into this out of the blue, but shouldn't all the
 constants used above be part of the protocol? AF_INET/SOCK_STREAM/...
 are all part of POSIX, but their specific value is not defined in the
 standard, hence we should have XEN_AF_INET/XEN_SOCK_STREAM/... Or am I
 just missing something?
>>> The values of these constants for the pvcalls protocol are defined by
>>> docs/misc/pvcalls.markdown under "Socket families and address format".
>>>
>>> They happen to be the same as the ones defined by Linux as AF_INET,
>>> SOCK_STREAM, etc, so in Linux I am just using those, but that is just an
>>> implementation detail internal to the Linux kernel driver. What is
>>> important from the protocol ABI perspective are the values defined by
>>> docs/misc/pvcalls.markdown.
>> Oh I see. I still think this should be part of the public pvcalls.h
>> header, and that the error codes should be the ones defined in
>> public/errno.h (or else also added to the pvcalls header).
> This was done differently in the past, but now that we have a formal
> process, a person in charge of new PV drivers reviews, and design
> documents with clearly spelled out ABIs, I consider the design docs
> under docs/misc as the official specification. We don't need headers
> anymore, they are redundant. In fact, we cannot have two specifications,
> and the design docs are certainly the official ones (we don't want the
> specs to be written as header files in C). To me, the headers under
> xen/include/public/io/ are optional helpers. It doesn't matter what's in
> there, or if frontends and backends use them or not.
>
> There is really an argument for removing those headers, because they
> might get out of sync with the spec by mistake, and in those cases, then
> we really end up with two specifications for the same protocol. I would
> be in favor of `git rm'ing all files under xen/include/public/io/ for
> which we have a complete design doc under docs/misc.

+1.

Specifications should not be written in C.  The mess that is the net and
block protocol ABIs are perfect examples of why.

Its fine (and indeed recommended) to provide a header file which
describes the specified protocol, but the authoritative spec should be
in text form.

I would really prefer if more people started using ../docs/specs/.  The
migration v2 documents are currently lonely there...

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 01/16] xen/mm: Don't use _{g, m}fn for defining INVALID_{G, M}FN

2017-06-22 Thread Julien Grall

Hi,

On 20/06/17 11:32, Jan Beulich wrote:

On 20.06.17 at 12:06,  wrote:

At 03:36 -0600 on 20 Jun (1497929778), Jan Beulich wrote:

On 20.06.17 at 11:14,  wrote:

At 01:32 -0600 on 20 Jun (1497922345), Jan Beulich wrote:

On 19.06.17 at 18:57,  wrote:

--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -56,7 +56,7 @@

 TYPE_SAFE(unsigned long, mfn);
 #define PRI_mfn  "05lx"
-#define INVALID_MFN  _mfn(~0UL)
+#define INVALID_MFN  (mfn_t){ ~0UL }


While I don't expect anyone to wish to use a suffix expression on
this constant, for maximum compatibility this should still be fully
parenthesized, I think. Of course this should be easy enough to
do while committing.

Are you able to assure us that clang supports this gcc extension
(compound literal for non-compound types)


AIUI this is a C99 feature, not a GCCism.


Most parts of it yes (it is a gcc extension in C89 mode only), but the
specific use here isn't afaict: Compound literals outside of functions
are static objects, and hence couldn't be used as initializers of other
objects.


Ah, I see.  So would it be better to use

  #define INVALID_MFN ((const mfn_t) { ~0UL })

?


While I think we should indeed consider adding the const, the above
still is a static object, and hence still not suitable as an initializer as
per C99 or C11. But as long as gcc and clang permit it, we're fine.


Actually this solution breaks on GCC 4.9 provided by Linaro ([1] 
4.9-2016-02 and 4.9-2017.01).


This small reproducer does not compile with -std=gnu99 (used by Xen) but 
does compile without this option. Jan, have you tried 4.9 with this patch?


typedef struct
{
unsigned long i;
} mfn_t;

mfn_t v = (const mfn_t){~0UL};

Cheers,

[1] https://releases.linaro.org/components/toolchain/binaries/



Jan



--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 07/18] xen/pvcalls: implement socket command

2017-06-22 Thread Stefano Stabellini
On Thu, 22 Jun 2017, Roger Pau Monné wrote:
> On Wed, Jun 21, 2017 at 01:16:56PM -0700, Stefano Stabellini wrote:
> > On Tue, 20 Jun 2017, Roger Pau Monné wrote:
> > > On Thu, Jun 15, 2017 at 12:09:36PM -0700, Stefano Stabellini wrote:
> > > > Just reply with success to the other end for now. Delay the allocation
> > > > of the actual socket to bind and/or connect.
> > > > 
> > > > Signed-off-by: Stefano Stabellini 
> > > > CC: boris.ostrov...@oracle.com
> > > > CC: jgr...@suse.com
> > > > ---
> > > >  drivers/xen/pvcalls-back.c | 27 +++
> > > >  1 file changed, 27 insertions(+)
> > > > 
> > > > diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> > > > index 437c2ad..953458b 100644
> > > > --- a/drivers/xen/pvcalls-back.c
> > > > +++ b/drivers/xen/pvcalls-back.c
> > > > @@ -12,12 +12,17 @@
> > > >   * GNU General Public License for more details.
> > > >   */
> > > >  
> > > > +#include 
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > >  
> > > >  #include 
> > > >  #include 
> > > > @@ -54,6 +59,28 @@ struct pvcalls_fedata {
> > > >  static int pvcalls_back_socket(struct xenbus_device *dev,
> > > > struct xen_pvcalls_request *req)
> > > >  {
> > > > +   struct pvcalls_fedata *fedata;
> > > > +   int ret;
> > > > +   struct xen_pvcalls_response *rsp;
> > > > +
> > > > +   fedata = dev_get_drvdata(&dev->dev);
> > > > +
> > > > +   if (req->u.socket.domain != AF_INET ||
> > > > +   req->u.socket.type != SOCK_STREAM ||
> > > > +   (req->u.socket.protocol != IPPROTO_IP &&
> > > > +req->u.socket.protocol != AF_INET))
> > > > +   ret = -EAFNOSUPPORT;
> > > 
> > > Sorry for jumping into this out of the blue, but shouldn't all the
> > > constants used above be part of the protocol? AF_INET/SOCK_STREAM/...
> > > are all part of POSIX, but their specific value is not defined in the
> > > standard, hence we should have XEN_AF_INET/XEN_SOCK_STREAM/... Or am I
> > > just missing something?
> > 
> > The values of these constants for the pvcalls protocol are defined by
> > docs/misc/pvcalls.markdown under "Socket families and address format".
> > 
> > They happen to be the same as the ones defined by Linux as AF_INET,
> > SOCK_STREAM, etc, so in Linux I am just using those, but that is just an
> > implementation detail internal to the Linux kernel driver. What is
> > important from the protocol ABI perspective are the values defined by
> > docs/misc/pvcalls.markdown.
> 
> Oh I see. I still think this should be part of the public pvcalls.h
> header, and that the error codes should be the ones defined in
> public/errno.h (or else also added to the pvcalls header).

This was done differently in the past, but now that we have a formal
process, a person in charge of new PV drivers reviews, and design
documents with clearly spelled out ABIs, I consider the design docs
under docs/misc as the official specification. We don't need headers
anymore, they are redundant. In fact, we cannot have two specifications,
and the design docs are certainly the official ones (we don't want the
specs to be written as header files in C). To me, the headers under
xen/include/public/io/ are optional helpers. It doesn't matter what's in
there, or if frontends and backends use them or not.

There is really an argument for removing those headers, because they
might get out of sync with the spec by mistake, and in those cases, then
we really end up with two specifications for the same protocol. I would
be in favor of `git rm'ing all files under xen/include/public/io/ for
which we have a complete design doc under docs/misc.
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH for-4.9 v3 1/3] xen/livepatch: Clean up arch relocation handling

2017-06-22 Thread Andrew Cooper
 * Reduce symbol scope and initialisation as much as possible
 * Annotate a fallthrough case in arm64
 * Fix switch statement style in arm32

No functional change.

Signed-off-by: Andrew Cooper 
Reviewed-by: Jan Beulich 
Reviewed-by: Konrad Rzeszutek Wilk 
Tested-by: Konrad Rzeszutek Wilk 
---
 xen/arch/arm/arm32/livepatch.c | 27 ---
 xen/arch/arm/arm64/livepatch.c | 19 +++
 xen/arch/x86/livepatch.c   | 13 +
 3 files changed, 24 insertions(+), 35 deletions(-)

diff --git a/xen/arch/arm/arm32/livepatch.c b/xen/arch/arm/arm32/livepatch.c
index a7fd5e2..a328179 100644
--- a/xen/arch/arm/arm32/livepatch.c
+++ b/xen/arch/arm/arm32/livepatch.c
@@ -224,21 +224,21 @@ int arch_livepatch_perform(struct livepatch_elf *elf,
const struct livepatch_elf_sec *rela,
bool use_rela)
 {
-const Elf_RelA *r_a;
-const Elf_Rel *r;
-unsigned int symndx, i;
-uint32_t val;
-void *dest;
+unsigned int i;
 int rc = 0;
 
 for ( i = 0; i < (rela->sec->sh_size / rela->sec->sh_entsize); i++ )
 {
+unsigned int symndx;
+uint32_t val;
+void *dest;
 unsigned char type;
-s32 addend = 0;
+s32 addend;
 
 if ( use_rela )
 {
-r_a = rela->data + i * rela->sec->sh_entsize;
+const Elf_RelA *r_a = rela->data + i * rela->sec->sh_entsize;
+
 symndx = ELF32_R_SYM(r_a->r_info);
 type = ELF32_R_TYPE(r_a->r_info);
 dest = base->load_addr + r_a->r_offset; /* P */
@@ -246,10 +246,12 @@ int arch_livepatch_perform(struct livepatch_elf *elf,
 }
 else
 {
-r = rela->data + i * rela->sec->sh_entsize;
+const Elf_Rel *r = rela->data + i * rela->sec->sh_entsize;
+
 symndx = ELF32_R_SYM(r->r_info);
 type = ELF32_R_TYPE(r->r_info);
 dest = base->load_addr + r->r_offset; /* P */
+addend = get_addend(type, dest);
 }
 
 if ( symndx > elf->nsym )
@@ -259,13 +261,11 @@ int arch_livepatch_perform(struct livepatch_elf *elf,
 return -EINVAL;
 }
 
-if ( !use_rela )
-addend = get_addend(type, dest);
-
 val = elf->sym[symndx].sym->st_value; /* S */
 
 rc = perform_rel(type, dest, val, addend);
-switch ( rc ) {
+switch ( rc )
+{
 case -EOVERFLOW:
 dprintk(XENLOG_ERR, LIVEPATCH "%s: Overflow in relocation %u in %s 
for %s!\n",
 elf->name, i, rela->name, base->name);
@@ -275,9 +275,6 @@ int arch_livepatch_perform(struct livepatch_elf *elf,
 dprintk(XENLOG_ERR, LIVEPATCH "%s: Unhandled relocation #%x\n",
 elf->name, type);
 break;
-
-default:
-break;
 }
 
 if ( rc )
diff --git a/xen/arch/arm/arm64/livepatch.c b/xen/arch/arm/arm64/livepatch.c
index dae64f5..63929b1 100644
--- a/xen/arch/arm/arm64/livepatch.c
+++ b/xen/arch/arm/arm64/livepatch.c
@@ -241,19 +241,16 @@ int arch_livepatch_perform_rela(struct livepatch_elf *elf,
 const struct livepatch_elf_sec *base,
 const struct livepatch_elf_sec *rela)
 {
-const Elf_RelA *r;
-unsigned int symndx, i;
-uint64_t val;
-void *dest;
-bool_t overflow_check;
+unsigned int i;
 
 for ( i = 0; i < (rela->sec->sh_size / rela->sec->sh_entsize); i++ )
 {
+const Elf_RelA *r = rela->data + i * rela->sec->sh_entsize;
+unsigned int symndx = ELF64_R_SYM(r->r_info);
+void *dest = base->load_addr + r->r_offset; /* P */
+bool overflow_check = true;
 int ovf = 0;
-
-r = rela->data + i * rela->sec->sh_entsize;
-
-symndx = ELF64_R_SYM(r->r_info);
+uint64_t val;
 
 if ( symndx > elf->nsym )
 {
@@ -262,11 +259,8 @@ int arch_livepatch_perform_rela(struct livepatch_elf *elf,
 return -EINVAL;
 }
 
-dest = base->load_addr + r->r_offset; /* P */
 val = elf->sym[symndx].sym->st_value +  r->r_addend; /* S+A */
 
-overflow_check = true;
-
 /* ARM64 operations at minimum are always 32-bit. */
 if ( r->r_offset >= base->sec->sh_size ||
 (r->r_offset + sizeof(uint32_t)) > base->sec->sh_size )
@@ -403,6 +397,7 @@ int arch_livepatch_perform_rela(struct livepatch_elf *elf,
 
 case R_AARCH64_ADR_PREL_PG_HI21_NC:
 overflow_check = false;
+/* Fallthrough. */
 case R_AARCH64_ADR_PREL_PG_HI21:
 ovf = reloc_insn_imm(RELOC_OP_PAGE, dest, val, 12, 21,
  AARCH64_INSN_IMM_ADR);
diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c
index dd50dd1..7917610 100644
--- 

[Xen-devel] [PATCH for-4.9 v3 0/3] Fixes for livepatching

2017-06-22 Thread Andrew Cooper
Andrew Cooper (3):
  xen/livepatch: Clean up arch relocation handling
  xen/livepatch: Use zeroed memory allocations for arrays
  xen/livepatch: Don't crash on encountering STN_UNDEF relocations

 xen/arch/arm/arm32/livepatch.c | 41 +
 xen/arch/arm/arm64/livepatch.c | 33 -
 xen/arch/x86/livepatch.c   | 27 ++-
 xen/common/livepatch.c |  4 ++--
 xen/common/livepatch_elf.c |  4 ++--
 5 files changed, 67 insertions(+), 42 deletions(-)

-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH for-4.9 v3 2/3] xen/livepatch: Use zeroed memory allocations for arrays

2017-06-22 Thread Andrew Cooper
Each of these arrays is sparse.  Use zeroed allocations to cause uninitialised
array elements to contain deterministic values, most importantly for the
embedded pointers.
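
The hazard being addressed can be shown with a tiny sketch (hypothetical
struct and field names, for illustration only):

    #include <xen/xmalloc.h>

    struct entry {
        const char *name;      /* NULL is used to mean "slot unused" */
        unsigned long value;
    };

    static void example(unsigned int n)
    {
        struct entry *tab  = xmalloc_array(struct entry, n);  /* junk in unused slots */
        struct entry *ztab = xzalloc_array(struct entry, n);  /* unused slots read 0s */
        unsigned int i;

        if ( !tab || !ztab )
            goto out;

        for ( i = 0; i < n; i++ )
        {
            /* With tab this test may follow a garbage pointer; with ztab
             * an unused slot is guaranteed to have a NULL name. */
            if ( ztab[i].name )
                continue; /* ... use the populated slot ... */
        }

     out:
        xfree(tab);
        xfree(ztab);
    }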

Signed-off-by: Andrew Cooper 
---
CC: Konrad Rzeszutek Wilk 
CC: Ross Lagerwall 

* new in v3
---
 xen/common/livepatch.c | 4 ++--
 xen/common/livepatch_elf.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
index df67a1a..66d532d 100644
--- a/xen/common/livepatch.c
+++ b/xen/common/livepatch.c
@@ -771,8 +771,8 @@ static int build_symbol_table(struct payload *payload,
 }
 }
 
-symtab = xmalloc_array(struct livepatch_symbol, nsyms);
-strtab = xmalloc_array(char, strtab_len);
+symtab = xzalloc_array(struct livepatch_symbol, nsyms);
+strtab = xzalloc_array(char, strtab_len);
 
 if ( !strtab || !symtab )
 {
diff --git a/xen/common/livepatch_elf.c b/xen/common/livepatch_elf.c
index c4a9633..b69e271 100644
--- a/xen/common/livepatch_elf.c
+++ b/xen/common/livepatch_elf.c
@@ -52,7 +52,7 @@ static int elf_resolve_sections(struct livepatch_elf *elf, 
const void *data)
 int rc;
 
 /* livepatch_elf_load sanity checked e_shnum. */
-sec = xmalloc_array(struct livepatch_elf_sec, elf->hdr->e_shnum);
+sec = xzalloc_array(struct livepatch_elf_sec, elf->hdr->e_shnum);
 if ( !sec )
 {
 dprintk(XENLOG_ERR, LIVEPATCH"%s: Could not allocate memory for 
section table!\n",
@@ -225,7 +225,7 @@ static int elf_get_sym(struct livepatch_elf *elf, const 
void *data)
 /* No need to check values as elf_resolve_sections did it. */
 nsym = symtab_sec->sec->sh_size / symtab_sec->sec->sh_entsize;
 
-sym = xmalloc_array(struct livepatch_elf_sym, nsym);
+sym = xzalloc_array(struct livepatch_elf_sym, nsym);
 if ( !sym )
 {
 dprintk(XENLOG_ERR, LIVEPATCH "%s: Could not allocate memory for 
symbols\n",
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH for-4.9 v3 3/3] xen/livepatch: Don't crash on encountering STN_UNDEF relocations

2017-06-22 Thread Andrew Cooper
A symndx of STN_UNDEF is special, and means a symbol value of 0.  While
legitimate in the ELF standard, its existence in a livepatch is questionable
at best.  Until a plausible use case presents itself, reject such a relocation
with -EOPNOTSUPP.

Additionally, fix an off-by-one error while range checking symndx, and perform
a safety check on elf->sym[symndx].sym before dereferencing it, to avoid
tripping over a NULL pointer when calculating val.

Signed-off-by: Andrew Cooper 
---
CC: Konrad Rzeszutek Wilk 
CC: Ross Lagerwall 
CC: Jan Beulich 
CC: Stefano Stabellini 
CC: Julien Grall 

v3:
 * Fix off-by-one error
v2:
 * Reject STN_UNDEF with -EOPNOTSUPP
---
 xen/arch/arm/arm32/livepatch.c | 14 +-
 xen/arch/arm/arm64/livepatch.c | 14 +-
 xen/arch/x86/livepatch.c   | 14 +-
 3 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/arm32/livepatch.c b/xen/arch/arm/arm32/livepatch.c
index a328179..41378a5 100644
--- a/xen/arch/arm/arm32/livepatch.c
+++ b/xen/arch/arm/arm32/livepatch.c
@@ -254,12 +254,24 @@ int arch_livepatch_perform(struct livepatch_elf *elf,
 addend = get_addend(type, dest);
 }
 
-if ( symndx > elf->nsym )
+if ( symndx == STN_UNDEF )
+{
+dprintk(XENLOG_ERR, LIVEPATCH "%s: Encountered STN_UNDEF\n",
+elf->name);
+return -EOPNOTSUPP;
+}
+else if ( symndx >= elf->nsym )
 {
 dprintk(XENLOG_ERR, LIVEPATCH "%s: Relative symbol wants symbol@%u 
which is past end!\n",
 elf->name, symndx);
 return -EINVAL;
 }
+else if ( !elf->sym[symndx].sym )
+{
+dprintk(XENLOG_ERR, LIVEPATCH "%s: No relative symbol@%u\n",
+elf->name, symndx);
+return -EINVAL;
+}
 
 val = elf->sym[symndx].sym->st_value; /* S */
 
diff --git a/xen/arch/arm/arm64/livepatch.c b/xen/arch/arm/arm64/livepatch.c
index 63929b1..2247b92 100644
--- a/xen/arch/arm/arm64/livepatch.c
+++ b/xen/arch/arm/arm64/livepatch.c
@@ -252,12 +252,24 @@ int arch_livepatch_perform_rela(struct livepatch_elf *elf,
 int ovf = 0;
 uint64_t val;
 
-if ( symndx > elf->nsym )
+if ( symndx == STN_UNDEF )
+{
+dprintk(XENLOG_ERR, LIVEPATCH "%s: Encountered STN_UNDEF\n",
+elf->name);
+return -EOPNOTSUPP;
+}
+else if ( symndx >= elf->nsym )
 {
 dprintk(XENLOG_ERR, LIVEPATCH "%s: Relative relocation wants 
symbol@%u which is past end!\n",
 elf->name, symndx);
 return -EINVAL;
 }
+else if ( !elf->sym[symndx].sym )
+{
+dprintk(XENLOG_ERR, LIVEPATCH "%s: No relative symbol@%u\n",
+elf->name, symndx);
+return -EINVAL;
+}
 
 val = elf->sym[symndx].sym->st_value +  r->r_addend; /* S+A */
 
diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c
index 7917610..406eb91 100644
--- a/xen/arch/x86/livepatch.c
+++ b/xen/arch/x86/livepatch.c
@@ -170,12 +170,24 @@ int arch_livepatch_perform_rela(struct livepatch_elf *elf,
 uint8_t *dest = base->load_addr + r->r_offset;
 uint64_t val;
 
-if ( symndx > elf->nsym )
+if ( symndx == STN_UNDEF )
+{
+dprintk(XENLOG_ERR, LIVEPATCH "%s: Encountered STN_UNDEF\n",
+elf->name);
+return -EOPNOTSUPP;
+}
+else if ( symndx >= elf->nsym )
 {
 dprintk(XENLOG_ERR, LIVEPATCH "%s: Relative relocation wants 
symbol@%u which is past end!\n",
 elf->name, symndx);
 return -EINVAL;
 }
+else if ( !elf->sym[symndx].sym )
+{
+dprintk(XENLOG_ERR, LIVEPATCH "%s: No symbol@%u\n",
+elf->name, symndx);
+return -EINVAL;
+}
 
 val = r->r_addend + elf->sym[symndx].sym->st_value;
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] VT-d: fix VF of RC integrated endpoint matched to wrong VT-d unit

2017-06-22 Thread Venu Busireddy
On 2017-06-22 11:52:50 -0400, Konrad Rzeszutek Wilk wrote:
> On Thu, Jun 22, 2017 at 09:31:50AM -0600, Jan Beulich wrote:
> > >>> On 22.06.17 at 16:21,  wrote:
> > > On Thu, Jun 22, 2017 at 03:26:04AM -0600, Jan Beulich wrote:
> > > On 21.06.17 at 12:47,  wrote:
> > >>> The problem is a VF of RC integrated PF (e.g. PF's BDF is 00:02.0),
> > >>> we would wrongly use 00:00.0 to search VT-d unit.
> > >>> 
> > >>> To search VT-d unit for a VF, the BDF of the PF is used. And If the
> > >>> PF is an Extended Function, the BDF of one traditional function is
> > >>> used.  The following line (from acpi_find_matched_drhd_unit()):
> > >>> devfn = PCI_SLOT(pdev->info.physfn.devfn) ? 0 : 
> > >>> pdev->info.physfn.devfn;
> > >>> sets 'devfn' to 0 if PF's devfn > 7. Apparently, it treats all
> > >>> PFs which have devfn > 7 as extended functions. However, it is wrong for
> > >>> a RC integrated PF, which is not ARI-capable but may have devfn > 7.
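
To make the failure concrete, here is the old computation applied to an RC
integrated PF at 00:02.0 (a stand-alone worked example, not code from the
patch):

    #include <stdio.h>

    #define PCI_SLOT(devfn)  (((devfn) >> 3) & 0x1f)
    #define PCI_FUNC(devfn)  ((devfn) & 0x07)

    int main(void)
    {
        unsigned int pf_devfn = (0x02 << 3) | 0x0;   /* PF at 00:02.0, devfn 0x10 */

        /* Old logic: any PF with a non-zero slot was treated as an Extended
         * Function, so the lookup devfn collapses to 0, i.e. 00:00.0. */
        unsigned int devfn = PCI_SLOT(pf_devfn) ? 0 : pf_devfn;

        printf("VT-d unit looked up for 00:%02x.%x instead of 00:02.0\n",
               PCI_SLOT(devfn), PCI_FUNC(devfn));
        return 0;
    }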
> > >>
> > >>I'm again having trouble with you talking about ARI and RC
> > >>integrated here, but not checking for either in any way in the
> > >>new code. Please make sure you establish the full connection
> > >>in the description.
> > > 
> > > Sorry for this. Let me explain this again.
> > > 
> > > From SRIOV spec 3.7.3, it says:
> > > "ARI is not applicable to Root Complex Integrated Endpoints; all other
> > > SR-IOV Capable Devices (Devices that include at least one PF) shall
> > > implement the ARI Capability in each Function."
> > > 
> > > So I _think_ PFs can be classified into two kinds: one is RC integrated
> > > PF and the other is non-RC integrated PF. The former can't support ARI.
> > > The latter shall support ARI. Only for extended functions, one
> > > traditional function's BDF should be used to search VT-d unit. And
> > > according to PCIE spec, Extended function means within an ARI Device, a
> > > Function whose Function Number is greater than 7. So the former
> > > can't be an extended function. The latter is an extended function as
> > > long as PF's devfn > 7; this check is exactly what the original code
> > > did. So I think the original code wasn't aware of the former
> > > (i.e., RC integrated endpoints). This patch checks the is_extfn flag
> > > directly. All of this is only my understanding. I need you and Kevin's
> > > help to decide whether it's right or not.
> > 
> > This makes sense to me, but as said, the patch description will need
> > to include this in some form.
> > 
> > >>> --- a/xen/drivers/passthrough/vtd/dmar.c
> > >>> +++ b/xen/drivers/passthrough/vtd/dmar.c
> > >>> @@ -218,8 +218,18 @@ struct acpi_drhd_unit 
> > >>> *acpi_find_matched_drhd_unit(const 
> > >>> struct pci_dev *pdev)
> > >>>  }
> > >>>  else if ( pdev->info.is_virtfn )
> > >>>  {
> > >>> +struct pci_dev *physfn;
> > >>
> > >>const
> > >>
> > >>>  bus = pdev->info.physfn.bus;
> > >>> -devfn = PCI_SLOT(pdev->info.physfn.devfn) ? 0 : 
> > >>> pdev->info.physfn.devfn;
> > >>> +/*
> > >>> + * Use 0 as 'devfn' to search VT-d unit when the physical 
> > >>> function
> > >>> + * is an Extended Function.
> > >>> + */
> > >>> +pcidevs_lock();
> > >>> +physfn = pci_get_pdev(pdev->seg, bus, pdev->info.physfn.devfn);
> > >>> +pcidevs_unlock();
> > >>> +ASSERT(physfn);
> > >>> +devfn = physfn->info.is_extfn ? 0 : pdev->info.physfn.devfn;
> > >>
> > >>This change looks to be fine is we assume that is_extfn is always
> > >>set correctly. Looking at the Linux code setting it, I'm not sure
> > >>though: I can't see any connection to the PF needing to be RC
> > >>integrated there.
> > > 
> > > Linux code sets it when
> > >  pci_ari_enabled(pci_dev->bus) && PCI_SLOT(pci_dev->devfn)
> > > 
> > >  I _think_ pci_ari_enabled(pci_dev->bus) means ARIforwarding is enabled
> > >  in the immediatedly upstream Downstream port. Thus, I think the pci_dev
> > >  is an ARI-capable device for PCIe spec 6.13 says:
> > > 
> > > It is strongly recommended that software in general Set the ARI
> > > Forwarding Enable bit in a Downstream Port only if software is certain
> > > that the device immediately below the Downstream Port is an ARI Device.
> > > If the bit is Set when a non-ARI Device is present, the non-ARI Device
> > > can respond to Configuration Space accesses under what it interprets as
> > > being different Device Numbers, and its Functions can be aliased under
> > > multiple Device Numbers, generally leading to undesired behavior.
> > > 
> > > and the pci_dev can't be a RC integrated endpoints. From another side, it
> > > also means the is_extfn won't be set for RC integrated PF. Is that
> > > right?
> > 
> > Well, I'm not sure about the Linux parts here? Konrad, do you
> > happen to know? Or do you know someone who does?

pci_ari_enabled() and related code trusts that an RC integrated endpoint
does not present the PCI_EXT_CAP_ID_ARI capability. As long as we do

[Xen-devel] [xen-unstable-smoke test] 110976: tolerable trouble: broken/pass - PUSHED

2017-06-22 Thread osstest service owner
flight 110976 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/110976/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  579d698da608a24ab334a6a38d932176bac5cecd
baseline version:
 xen  4514f788d1024ab727ed5d6cc29aed9e8f24

Last test of basis   110964  2017-06-22 08:02:12 Z0 days
Testing same since   110976  2017-06-22 16:01:47 Z0 days1 attempts


People who touched revisions under test:
  Bernhard M. Wiedemann 
  Bernhard M. Wiedemann 
  Ian Jackson 
  Wei Liu 

jobs:
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  broken  
 test-amd64-amd64-xl-qemuu-debianhvm-i386 pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

+ branch=xen-unstable-smoke
+ revision=579d698da608a24ab334a6a38d932176bac5cecd
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push xen-unstable-smoke 
579d698da608a24ab334a6a38d932176bac5cecd
+ branch=xen-unstable-smoke
+ revision=579d698da608a24ab334a6a38d932176bac5cecd
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']'
+ . ./cri-common
++ . ./cri-getconfig
++ umask 002
+ select_xenbranch
+ case "$branch" in
+ tree=xen
+ xenbranch=xen-unstable-smoke
+ qemuubranch=qemu-upstream-unstable
+ '[' xxen = xlinux ']'
+ linuxbranch=
+ '[' xqemu-upstream-unstable = x ']'
+ select_prevxenbranch
++ ./cri-getprevxenbranch xen-unstable-smoke
+ prevxenbranch=xen-4.9-testing
+ '[' x579d698da608a24ab334a6a38d932176bac5cecd = x ']'
+ : tested/2.6.39.x
+ . ./ap-common
++ : osst...@xenbits.xen.org
+++ getconfig OsstestUpstream
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{"OsstestUpstream"} or die $!;
'
++ :
++ : git://xenbits.xen.org/xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/xen.git
++ : git://xenbits.xen.org/qemu-xen-traditional.git
++ : git://git.kernel.org
++ : git://git.kernel.org/pub/scm/linux/kernel/git
++ : git
++ : git://xenbits.xen.org/xtf.git
++ : osst...@xenbits.xen.org:/home/xen/git/xtf.git
++ : git://xenbits.xen.org/xtf.git
++ : git://xenbits.xen.org/libvirt.git
++ : osst...@xenbits.xen.org:/home/xen/git/libvirt.git
++ : git://xenbits.xen.org/libvirt.git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/rumprun.git
++ : git://git.seabios.org/seabios.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/seabios.git
++ : git://xenbits.xen.org/osstest/seabios.git
++ : 

[Xen-devel] Xen 4.9 rc9

2017-06-22 Thread Julien Grall
Hi all,

Xen 4.9 rc9 is tagged. You can check that out from xen.git:

 git://xenbits.xen.org/xen.git 4.9.0-rc9

For your convenience there is also a tarball at:
https://downloads.xenproject.org/release/xen/4.9.0-rc9/xen-4.9.0-rc9.tar.gz

And the signature is at:
https://downloads.xenproject.org/release/xen/4.9.0-rc9/xen-4.9.0-rc9.tar.gz.sig

Please send bug reports and test reports to
xen-de...@lists.xenproject.org. When sending bug reports,
please CC relevant maintainers and me (julien.gr...@arm.com).

Cheers,

-- 
Julien Grall



[Xen-devel] Travis build failing because "tools/xen-detect: try sysfs node for obtaining guest type" ?

2017-06-22 Thread Dario Faggioli
Hey,

Am I the only one for whom Travis seems to be unhappy about this:

I/home/travis/build/fdario/xen/tools/misc/../../tools/include  xen-detect.c 
  -o xen-detect
xen-detect.c: In function ‘check_sysfs’:
xen-detect.c:196:17: error: ignoring return value of ‘asprintf’, declared with 
attribute warn_unused_result [-Werror=unused-result]
 asprintf(, "V%s.%s", str, tmp);
 ^
xen-detect.c: In function ‘check_for_xen’:
xen-detect.c:93:17: error: ignoring return value of ‘asprintf’, declared with 
attribute warn_unused_result [-Werror=unused-result]
 asprintf(, "V%u.%u",
 ^
cc1: all warnings being treated as errors

https://travis-ci.org/fdario/xen/jobs/245864401

Which, to me, looks related to 48d0c822640f8ce4754de16f1bee5c995bac7078
("tools/xen-detect: try sysfs node for obtaining guest type").

I can, however, build the tools locally, with:
gcc version 6.3.0 20170516 (Debian 6.3.0-18)

Thoughts?
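
For reference, the usual way to keep -Werror=unused-result quiet is to actually
check asprintf()'s return value; a minimal sketch (not the actual xen-detect.c
fix, names taken from the error output above) would be:

#define _GNU_SOURCE
#include <stdio.h>

/* Sketch only: treat asprintf() failure as "no version string" instead of
 * ignoring the return value, which is what the warning complains about. */
static char *format_version(const char *str, const char *tmp)
{
    char *version;

    if (asprintf(&version, "V%s.%s", str, tmp) < 0)
        return NULL;    /* allocation failed, nothing was assigned */

    return version;
}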

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)



Re: [Xen-devel] [RFC v2]Proposal to allow setting up shared memory areas between VMs from xl config file

2017-06-22 Thread Zhongze Liu
Hi Wei,

Thank you for your valuable comments.

2017-06-21 23:09 GMT+08:00 Wei Liu :
> On Wed, Jun 21, 2017 at 01:18:38AM +0800, Zhongze Liu wrote:
>> 
>> 1. Motivation and Description
>> 
>> Virtual machines use grant table hypercalls to setup a share page for
>> inter-VMs communications. These hypercalls are used by all PV
>> protocols today. However, very simple guests, such as baremetal
>> applications, might not have the infrastructure to handle the grant table.
>> This project is about setting up several shared memory areas for inter-VMs
>> communications directly from the VM config file.
>> So that the guest kernel doesn't have to have grant table support (in the
>> embedded space, this is not unusual) to be able to communicate with
>> other guests.
>>
>> 
>> 2. Implementation Plan:
>> 
>>
>> ==
>> 2.1 Introduce a new VM config option in xl:
>> ==
>> The shared areas should be shareable among several (>=2) VMs, so
>> every shared physical memory area is assigned to a set of VMs.
>> Therefore, a “token” or “identifier” should be used here to uniquely
>> identify a backing memory area.
>>
>> The backing area would be taken from one domain, which we will regard
>> as the "master domain", and this domain should be created prior to any
>> other "slave domain"s. Again, we have to use some kind of tag to tell who
>> is the "master domain".
>>
>> And the ability to specify the attributes of the pages (say, WO/RO/X)
>> to be shared should be also given to the user. For the master domain,
>> these attributes often describes the maximum permission allowed for the
>> shared pages, and for the slave domains, these attributes are often used
>> to describe with what permissions this area will be mapped.
>> This information should also be specified in the xl config entry.
>>
>
> I don't quite get the attribute settings. If you only insert a backing
> page into guest physical address space with XENMEM hypercall, how do you
> audit the attributes when the guest tries to map the page?
>

I'm still thinking about this, and any suggestions are welcome. The current
plan I have in mind is XENMEM_access_op.
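
For instance, after the pages are mapped, the toolstack could clamp the
slave's permissions with something along these lines (just a sketch of the
idea; whether XENMEM_access_op is really the right vehicle is exactly what
I'm still unsure about, and the gfn range below is made up):

#include <xenctrl.h>

/* Sketch: restrict the slave domain's access to the shared range to
 * read-only via xc_set_mem_access(), the libxc wrapper around
 * XENMEM_access_op. */
static int restrict_shared_range(xc_interface *xch, uint32_t slave_domid,
                                 uint64_t first_gfn, uint32_t nr_pages)
{
    return xc_set_mem_access(xch, slave_domid, XENMEM_access_r,
                             first_gfn, nr_pages);
}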

>> To handle all these, I would suggest using an unsigned integer to serve as 
>> the
>> identifier, and using a "master" tag in the master domain's xl config entry
>> to announce that she will provide the backing memory pages. A separate
>> entry would be used to describe the attributes of the shared memory area, of
>> the form "prot=RW".
>
> I think using an integer is too limiting. You would need the user to
> know if a particular number is already used. Maybe using a number is
> good enough for the use case you have in mind, but it is not future
> proof. I don't know how sophisticated we want this to be, though.
>

Sounds reasonable. I chose integers because they are fast and easy to
manipulate. But integers are somewhat hard to memorize, and that isn't a good
thing from a user's point of view. So maybe I'll make it a string, with a
maximum length of 32 (or somewhat longer).

>> For example:
>>
>> In xl config file of vm1:
>>
>> static_shared_mem = ["id = ID1, begin = gmfn1, end = gmfn2,
>>   granularity = 4k, prot = RO, master”,
>>  "id = ID2, begin = gmfn3, end = gmfn4,
>
> I think you mean "gpfn" here and below.
>

Yes, according to https://wiki.xenproject.org/wiki/XenTerminology (section
"Address Spaces"), gmfn == gpfn for auto-translated guests. But this usage
seems to be outdated and should be phased out according to include/xen/mm.h.
And, as Julien pointed out, the term "gfn" should be used here.

>>  granularity = 4k, prot = RW, master”]
>>
>> In xl config file of vm2:
>>
>> static_shared_mem = ["id = ID1, begin = gmfn5, end = gmfn6,
>>   granularity = 4k, prot = RO”]
>>
>> In xl config file of vm3:
>>
>> static_shared_mem = ["id = ID2, begin = gmfn7, end = gmfn8,
>>   granularity = 4k, prot = RW”]
>>
>> gmfn's above are all hex of the form "0x2".
>>
>> In the example above. A memory area ID1 will be shared between vm1 and vm2.
>> This area will be taken from vm1 and mapped into vm2's stage-2 page table.
>> The parameter "prot=RO" means that this memory area are offered with 
>> read-only
>> permission. vm1 can access this area using gmfn1~gmfn2, and vm2 using
>> gmfn5~gmfn6.
>> Likewise, a memory area ID will be shared between vm1 and vm3 with read and
>> write permissions. vm1 is the master and vm2 the slave. vm1 can access the
>> area using gmfn3~gmfn4 and vm3 using gmfn7~gmfn8.
>>
>> The "granularity" is optional in the slaves' config entries. But if it's
>> presented in the 

Re: [Xen-devel] [PATCH v3 5/9] xen/vpci: add handlers to map the BARs

2017-06-22 Thread Roger Pau Monne
On Fri, May 19, 2017 at 09:21:56AM -0600, Jan Beulich wrote:
> >>> On 27.04.17 at 16:35,  wrote:
> > +static int vpci_modify_bars(struct pci_dev *pdev, const bool map)
> > +{
> > +struct vpci_header *header = >vpci->header;
> > +unsigned int i;
> > +int rc = 0;
> > +
> > +for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> > +{
> > +paddr_t gaddr = map ? header->bars[i].gaddr
> > +: header->bars[i].mapped_addr;
> > +paddr_t paddr = header->bars[i].paddr;
> > +
> > +if ( header->bars[i].type != VPCI_BAR_MEM &&
> > + header->bars[i].type != VPCI_BAR_MEM64_LO )
> > +continue;
> > +
> > +rc = modify_mmio(pdev->domain, _gfn(PFN_DOWN(gaddr)),
> > + _mfn(PFN_DOWN(paddr)), 
> > PFN_UP(header->bars[i].size),
> 
> The PFN_UP() indicates a problem: For sub-page BARs you can't
> blindly map/unmap them without taking into consideration other
> devices sharing the same page.

I'm not sure I follow, the start address of BARs is always aligned to
a 4KB boundary, so there's no chance of the same page being used by
two different BARs at the same time.

The size is indeed not aligned to 4KB, but I don't see how this can
cause collisions with other BARs unless the domain is actively trying
to make the BARs overlap, in which case there's not much Xen can do.

> > + map);
> > +if ( rc )
> > +break;
> > +
> > +header->bars[i].mapped_addr = map ? gaddr : 0;
> > +}
> > +
> > +return rc;
> > +}
> 
> Shouldn't this function somewhere honor the unset flags?

Right, I've added a check to make sure the BAR is positioned before
trying to map it into the domain p2m.

> > +static int vpci_cmd_read(struct pci_dev *pdev, unsigned int reg,
> > + union vpci_val *val, void *data)
> > +{
> > +struct vpci_header *header = data;
> > +
> > +val->word = header->command;
> 
> Rather than reading back and storing the value in the write handler,
> I'd recommending doing an actual read here.

OK.

> > +static int vpci_cmd_write(struct pci_dev *pdev, unsigned int reg,
> > +  union vpci_val val, void *data)
> > +{
> > +struct vpci_header *header = data;
> > +uint16_t new_cmd, saved_cmd;
> > +uint8_t seg = pdev->seg, bus = pdev->bus;
> > +uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> > +int rc;
> > +
> > +new_cmd = val.word;
> > +saved_cmd = header->command;
> > +
> > +if ( !((new_cmd ^ saved_cmd) & PCI_COMMAND_MEMORY) )
> > +goto out;
> > +
> > +/* Memory space access change. */
> > +rc = vpci_modify_bars(pdev, new_cmd & PCI_COMMAND_MEMORY);
> > +if ( rc )
> > +{
> > +dprintk(XENLOG_ERR,
> > +"%04x:%02x:%02x.%u:unable to %smap BARs: %d\n",
> > +seg, bus, slot, func,
> > +new_cmd & PCI_COMMAND_MEMORY ? "" : "un", rc);
> > +return rc;
> 
> I guess you can guess the question already: What is the bare
> hardware equivalent of this failure return?

Yes, this is already fixed since write handlers simply return void.
The hw equivalent would be to ignore the write AFAICT (ie: memory
decoding will not be enabled).

Are you fine with the dprintk or would you also like me to remove
that? (IMHO it's helpful for debugging).

> > +}
> > +
> > + out:
> 
> Please try to avoid goto-s and labels for other than error handling
> (and even then only when code would otherwise end up pretty
> convoluted).

Done.

> > +static int vpci_bar_read(struct pci_dev *pdev, unsigned int reg,
> > + union vpci_val *val, void *data)
> > +{
> > +struct vpci_bar *bar = data;
> 
> const
> 
> > +bool hi = false;
> > +
> > +ASSERT(bar->type == VPCI_BAR_MEM || bar->type == VPCI_BAR_MEM64_LO ||
> > +   bar->type == VPCI_BAR_MEM64_HI);
> > +
> > +if ( bar->type == VPCI_BAR_MEM64_HI )
> > +{
> > +ASSERT(reg - PCI_BASE_ADDRESS_0 > 0);
> 
> reg > PCI_BASE_ADDRESS_0

Fixed.

> > +bar--;
> > +hi = true;
> > +}
> > +
> > +if ( bar->sizing )
> > +val->double_word = ~(bar->size - 1) >> (hi ? 32 : 0);
> 
> There's also a comment further down - this is producing undefined
> behavior on 32-bits arches.

I've changed size to be a uint64_t.

> > +static int vpci_bar_write(struct pci_dev *pdev, unsigned int reg,
> > +  union vpci_val val, void *data)
> > +{
> > +struct vpci_bar *bar = data;
> > +uint32_t wdata = val.double_word;
> > +bool hi = false, unset = false;
> > +
> > +ASSERT(bar->type == VPCI_BAR_MEM || bar->type == VPCI_BAR_MEM64_LO ||
> > +   bar->type == VPCI_BAR_MEM64_HI);
> > +
> > +if ( wdata == GENMASK(31, 0) )
> 
> I'm afraid this again doesn't match real hardware behavior: As the
> low bits are r/o, writes with them having any value, but all other

Re: [Xen-devel] [PATCH for-4.9 v2] xen/livepatch: Don't crash on encountering STN_UNDEF relocations

2017-06-22 Thread Konrad Rzeszutek Wilk
On Thu, Jun 22, 2017 at 12:33:57PM -0400, Konrad Rzeszutek Wilk wrote:
> On Thu, Jun 22, 2017 at 12:10:46PM -0400, Konrad Rzeszutek Wilk wrote:
> > On Thu, Jun 22, 2017 at 11:27:50AM -0400, Konrad Rzeszutek Wilk wrote:
> > > On Wed, Jun 21, 2017 at 09:26:15PM -0400, Konrad Rzeszutek Wilk wrote:
> > > > On Wed, Jun 21, 2017 at 07:13:36PM +0100, Andrew Cooper wrote:
> > > > > A symndx of STN_UNDEF is special, and means a symbol value of 0.  
> > > > > While
> > > > > legitimate in the ELF standard, its existance in a livepatch is 
> > > > > questionable
> > > > > at best.  Until a plausible usecase presents itself, reject such a 
> > > > > relocation
> > > > > with -EOPNOTSUPP.
> > > > > 
> > > > > Additionally, perform a safety check on elf->sym[symndx].sym before
> > > > > derefencing it, to avoid tripping over a NULL pointer when 
> > > > > calculating val.
> > > > > 
> > > > > Signed-off-by: Andrew Cooper 
> > > > 
> > > > Reviewed-by: Konrad Rzeszutek Wilk 
> > > > Tested-by: Konrad Rzeszutek Wilk  [x86 right 
> > > > now, will do
> > > > arm32 tomorrow]
> > > 
> > > I did that on my Cubietruck and I made the rookie mistake of not trying
> > > a hypervisor _without_ your changes, so I don't know if this crash
> > > (see inline) is due to your patch or something else.
> > > 
> > > Also I messed up and made the livepatch test run every time it boots, so
> > > now it is stuck in a loop of crashes :-(
> > > 
> > > The git tree is:
> > > 
> > > git://xenbits.xen.org/people/konradwilk/xen.git staging-4.9
> > > 
> > > Stay tuned.
> > 
> > And I see the same thing with b38b147 (that is the top of 'origin/staging').
> > 
> > So time to dig in.
> 
> /me blushes.
> 
> I compiled the hypervisor and the livepatches on a cross compiler.
> arm-linux-gnueabi-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
> 
> 
> But if I compile both on the Cubietruck (natively) it all works nicely.
> gcc (Ubuntu/Linaro 4.8.2-19ubuntu1) 4.8.2
> 
> So:
> 
> Tested-by: Konrad Rzeszutek Wilk  [x86, arm32]
> 
> for both of the patches. Sorry for the alarm.


Jan,

Do you recall perchance this thread: 
http://www.mail-archive.com/xen-devel@lists.xen.org/msg80633.html

I am thinking of resurrecting it, but following the same spirit as here,
that is, returning -EOPNOTSUPP if the sh_addralign is not a correct
value.
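
Roughly along these lines (sketch only; the struct layout is assumed rather
than taken from the actual livepatch code):

/* Reject sections whose sh_addralign is zero or not a power of two,
 * mirroring the -EOPNOTSUPP approach used above for STN_UNDEF. */
const uint64_t align = sec->sec->sh_addralign;

if ( align == 0 || (align & (align - 1)) )
    return -EOPNOTSUPP;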

> 
> Julien, would you be OK with these two going in 4.9? Please?



Re: [Xen-devel] [RFC v2]Proposal to allow setting up shared memory areas between VMs from xl config file

2017-06-22 Thread Zhongze Liu
Hi Julien,

2017-06-21 1:29 GMT+08:00 Julien Grall :
> Hi,
>
> Thank you for the new proposal.
>
> On 06/20/2017 06:18 PM, Zhongze Liu wrote:
>>
>> In the example above. A memory area ID1 will be shared between vm1 and
>> vm2.
>> This area will be taken from vm1 and mapped into vm2's stage-2 page table.
>> The parameter "prot=RO" means that this memory area are offered with
>> read-only
>> permission. vm1 can access this area using gmfn1~gmfn2, and vm2 using
>> gmfn5~gmfn6.
>
>
> [...]
>
>>
>> ==
>> 2.3 mapping the memory areas
>> ==
>> Handle the newly added config option in tools/{xl, libxl} and utilize
>> toos/libxc to do the actual memory mapping. Specifically, we will use
>> a wrapper to XENMME_add_to_physmap_batch with XENMAPSPACE_gmfn_foreign to
>> do the actual mapping. But since there isn't such a wrapper in libxc,
>> we'll
>> have to add a new wrapper, xc_domain_add_to_physmap_batch in
>> libxc/xc_domain.c
>
>
> In the paragrah above, you suggest the user can select the permission on the
> shared page. However, the hypercall XENMEM_add_to_physmap does not currently
> take permission. So how do you plan to handle that?
>

I think this could be done via XENMEM_access_op?

Cheers,

Zhongze Liu



Re: [Xen-devel] [PATCH 1/4] xen: credit2: implement utilization cap

2017-06-22 Thread George Dunlap
On 08/06/17 13:08, Dario Faggioli wrote:
> This commit implements the Xen part of the cap mechanism for
> Credit2.
> 
> A cap is how much, in terms of % of physical CPU time, a domain
> can execute at most.
> 
> For instance, a domain that must not use more than 1/4 of one
> physical CPU, must have a cap of 25%; one that must not use more
> than 1+1/2 of physical CPU time, must be given a cap of 150%.
> 
> Caps are per domain, so it is all a domain's vCPUs, cumulatively,
> that will be forced to execute no more than the decided amount.
> 
> This is implemented by giving each domain a 'budget', and using
> a (per-domain again) periodic timer. Values of budget and 'period'
> are chosen so that budget/period is equal to the cap itself.
> 
> Budget is burned by the domain's vCPUs, in a similar way to how
> credits are.
> 
> When a domain runs out of budget, its vCPUs can't run any longer.
> They can gain, when the budget is replenishment by the timer, which
> event happens once every period.
> 
> Blocking the vCPUs because of lack of budget happens by
> means of a new (_VPF_parked) pause flag, so that, e.g.,
> vcpu_runnable() still works. This is similar to what is
> done in sched_rtds.c, as opposed to what happens in
> sched_credit.c, where vcpu_pause() and vcpu_unpause()
> (which means, among other things, more overhead).
> 
> Note that xenalyze and tools/xentrace/format are also modified,
> to keep them updated with one modified event.
> 
> Signed-off-by: Dario Faggioli 

Looks really good overall, Dario!  Just a few relatively minor comments.

> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 126417c..ba4bf4b 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -92,6 +92,82 @@
>   */
>  
>  /*
> + * Utilization cap:
> + *
> + * Setting an pCPU utilization cap for a domain means the following:
> + *
> + * - a domain can have a cap, expressed in terms of % of physical CPU time.
> + *   A domain that must not use more than 1/4 of _one_ physical CPU, will
> + *   be given a cap of 25%; a domain that must not use more than 1+1/2 of
> + *   physical CPU time, will be given a cap of 150%;
> + *
> + * - caps are per-domain (not per-vCPU). If a domain has only 1 vCPU, and
> + *   a 40% cap, that one vCPU will use 40% of one pCPU. If a domain has 4
> + *   vCPUs, and a 200% cap, all its 4 vCPUs are allowed to run for (the
> + *   equivalent of) 100% time on 2 pCPUs. How much each of the various 4
> + *   vCPUs will get, is unspecified (will depend on various aspects: 
> workload,
> + *   system load, etc.).
> + *
> + * For implementing this, we use the following approach:
> + *
> + * - each domain is given a 'budget', and each domain has a timer, which
> + *   replenishes the domain's budget periodically. The budget is the amount
> + *   of time the vCPUs of the domain can use every 'period';
> + *
> + * - the period is CSCHED2_BDGT_REPL_PERIOD, and is the same for all domains
> + *   (but each domain has its own timer; so they are all periodic with the same
> + *   period, but replenishment of the budgets of the various domains, at
> + *   periods boundaries, are not synchronous);
> + *
> + * - when vCPUs run, they consume budget. When they don't run, they don't
> + *   consume budget. If there is no budget left for the domain, no vCPU of
> + *   that domain can run. If a vCPU tries to run and finds that there is no
> + *   budget, it blocks.
> + *   Budget never expires, so at whatever time a vCPU wants to run, it can
> + *   check the domain's budget, and if there is some, it can use it.

I'm not sure what this paragraph is trying to say. Saying budget "never
expires" makes it sound like you continue to accumulate it, such that if
you don't run at all for several periods, you could "save it up" and run
at 100% for one full period.

But that's contradicted by...

> + * - budget is replenished to the top of the capacity for the domain once
> + *   per period. Even if there was some leftover budget from previous period,
> + *   though, the budget after a replenishment will always be at most equal
> + *   to the total capacity of the domain ('tot_budget');

...this paragraph.

> + * - when a budget replenishment occurs, if there are vCPUs that had been
> + *   blocked because of lack of budget, they'll be unblocked, and they will
> + *   (potentially) be able to run again.
> + *
> + * Finally, some even more implementation related detail:
> + *
> + * - budget is stored in a domain-wide pool. vCPUs of the domain that want
> + *   to run go to such pool, and grub some. When they do so, the amount
> + *   they grabbed is _immediately_ removed from the pool. This happens in
> + *   vcpu_try_to_get_budget();

This sounds like a good solution to the "greedy vcpu" problem. :-)
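
Just to check my understanding, the grab step is conceptually something like
the following (hand-written sketch, not the patch code; the struct / field /
constant names are approximations):

/* A vCPU takes a slice of the domain-wide budget under the budget lock;
 * whatever it grabs is removed from the pool immediately. */
static bool try_to_get_budget_sketch(struct csched2_vcpu *svc)
{
    struct csched2_dom *sdom = svc->sdom;
    bool got_budget = false;

    spin_lock(&sdom->budget_lock);

    if ( sdom->budget > 0 )
    {
        /* CSCHED2_BUDGET_SLICE is an assumed per-grab upper bound. */
        svc->budget = min(sdom->budget, (s_time_t)CSCHED2_BUDGET_SLICE);
        sdom->budget -= svc->budget;
        got_budget = true;
    }

    spin_unlock(&sdom->budget_lock);

    return got_budget;
}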

> + * - when vCPUs stop running, if they've not consumed all the budget they
> + *   took, the leftover is put back in the pool. This happens in
> + *   

Re: [Xen-devel] [PATCH v7 27/36] iommu/amd: Allow the AMD IOMMU to work with memory encryption

2017-06-22 Thread Tom Lendacky

On 6/22/2017 5:56 AM, Borislav Petkov wrote:

On Fri, Jun 16, 2017 at 01:54:59PM -0500, Tom Lendacky wrote:

The IOMMU is programmed with physical addresses for the various tables
and buffers that are used to communicate between the device and the
driver. When the driver allocates this memory it is encrypted. In order
for the IOMMU to access the memory as encrypted the encryption mask needs
to be included in these physical addresses during configuration.

The PTE entries created by the IOMMU should also include the encryption
mask so that when the device behind the IOMMU performs a DMA, the DMA
will be performed to encrypted memory.

Signed-off-by: Tom Lendacky 
---
  drivers/iommu/amd_iommu.c   |   30 --
  drivers/iommu/amd_iommu_init.c  |   34 --
  drivers/iommu/amd_iommu_proto.h |   10 ++
  drivers/iommu/amd_iommu_types.h |2 +-
  4 files changed, 55 insertions(+), 21 deletions(-)


Reviewed-by: Borislav Petkov 

Btw, I'm assuming the virt_to_phys() difference on SME systems is only
needed in a handful of places. Otherwise, I'd suggest changing the
virt_to_phys() function/macro directly. But I guess most of the places
need the real physical address without the enc bit.


Correct.
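
To expand a little: for the IOMMU-owned tables the mask is folded in right
where the physical address is handed to the hardware, conceptually something
like this (sketch, not the exact hunk from this patch):

/*
 * Include the memory encryption mask whenever a physical address is
 * given to the IOMMU, so the device's DMA goes to the encrypted
 * mapping.  __sme_set() just ORs in sme_me_mask, which is zero when
 * SME is not active.
 */
static inline u64 iommu_virt_to_phys(void *vaddr)
{
        return (u64)__sme_set(virt_to_phys(vaddr));
}

Everything else keeps using plain virt_to_phys(), which is why changing the
generic helper didn't look attractive.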

Thanks,
Tom







[Xen-devel] [PATCH v3] libxc: add xc_domain_add_to_physmap_batch to wrap XENMEM_add_to_physmap_batch

2017-06-22 Thread Zhongze Liu
This is a preparation for the proposal "allow setting up shared memory areas
between VMs from xl config file". See:
V2: https://lists.xen.org/archives/html/xen-devel/2017-06/msg02256.html
V1: https://lists.xen.org/archives/html/xen-devel/2017-05/msg01288.html

The plan is to use XENMEM_add_to_physmap_batch in xl to map foreign pages from
one DomU to another so that the pages can be shared. But currently there is no
wrapper for XENMEM_add_to_physmap_batch in libxc, so add one.
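
For illustration (not part of this patch), a caller mapping a batch of
foreign gfns would use the wrapper roughly as follows; the domids and frame
numbers are made up, and per-entry failures come back in errs[]:

/* Sketch: map 4 pages of the master domain (gfns 0x100000..0x100003)
 * into the slave domain at gfns 0x200000..0x200003. */
xen_ulong_t idxs[4]  = { 0x100000, 0x100001, 0x100002, 0x100003 };
xen_pfn_t   gpfns[4] = { 0x200000, 0x200001, 0x200002, 0x200003 };
int errs[4];

int rc = xc_domain_add_to_physmap_batch(xch, slave_domid, master_domid,
                                        XENMAPSPACE_gmfn_foreign, 4,
                                        idxs, gpfns, errs);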

Signed-off-by: Zhongze Liu 
---
Changed Since v2:
  * fix coding style issue
  * let rc = 1 on buffer bouncing failures

Changed Since v1:
  * explain why such a sudden wrapper
  * change the parameters' types

Cc: Ian Jackson ,
Cc: Wei Liu ,
Cc: Stefano Stabellini 
Cc: Julien Grall 
Cc: Jan Beulich 

Re: [Xen-devel] [PATCH for-4.9 v2] xen/livepatch: Don't crash on encountering STN_UNDEF relocations

2017-06-22 Thread Konrad Rzeszutek Wilk
On Thu, Jun 22, 2017 at 12:10:46PM -0400, Konrad Rzeszutek Wilk wrote:
> On Thu, Jun 22, 2017 at 11:27:50AM -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Jun 21, 2017 at 09:26:15PM -0400, Konrad Rzeszutek Wilk wrote:
> > > On Wed, Jun 21, 2017 at 07:13:36PM +0100, Andrew Cooper wrote:
> > > > A symndx of STN_UNDEF is special, and means a symbol value of 0.  While
> > > > legitimate in the ELF standard, its existance in a livepatch is 
> > > > questionable
> > > > at best.  Until a plausible usecase presents itself, reject such a 
> > > > relocation
> > > > with -EOPNOTSUPP.
> > > > 
> > > > Additionally, perform a safety check on elf->sym[symndx].sym before
> > > > derefencing it, to avoid tripping over a NULL pointer when calculating 
> > > > val.
> > > > 
> > > > Signed-off-by: Andrew Cooper 
> > > 
> > > Reviewed-by: Konrad Rzeszutek Wilk 
> > > Tested-by: Konrad Rzeszutek Wilk  [x86 right now, 
> > > will do
> > > arm32 tomorrow]
> > 
> > I did that on my Cubietruck and I made the rookie mistake of not trying
> > a hypervisor _without_ your changes, so I don't know if this crash
> > (see inline) is due to your patch or something else.
> > 
> > Also I messed up and made the livepatch test run every time it boots, so
> > now it is stuck in a loop of crashes :-(
> > 
> > The git tree is:
> > 
> > git://xenbits.xen.org/people/konradwilk/xen.git staging-4.9
> > 
> > Stay tuned.
> 
> And I see the same thing with b38b147 (that is the top of 'origin/staging').
> 
> So time to dig in.

/me blushes.

I compiled the hypervisor and the livepatches on a cross compiler.
arm-linux-gnueabi-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609


But if I compile both on the Cubietruck (natively) it all works nicely.
gcc (Ubuntu/Linaro 4.8.2-19ubuntu1) 4.8.2

So:

Tested-by: Konrad Rzeszutek Wilk  [x86, arm32]

for both of the patches. Sorry for the alarm.

Julien, would you be OK with these two going in 4.9? Please?



Re: [Xen-devel] [PATCH 1/2] arm: smccc: handle SMCs/HVCs according to SMCCC

2017-06-22 Thread Volodymyr Babchuk

Hi Julien,

On 15.06.17 13:48, Julien Grall wrote:

Hi Volodymyr,

On 14/06/17 15:10, Volodymyr Babchuk wrote:

SMCCC (SMC Call Convention) describes how to handle both HVCs and SMCs.
SMCCC states that both HVC and SMC are valid conduits to call to a 
different

firmware functions. Thus, for example PSCI calls can be made both by
SMC or HVC. Also SMCCC defines function number coding for such calls.
Besides functional calls there are query calls, which allows underling
OS determine version, UID and number of functions provided by service
provider.

This patch adds new file `smccc.c`, which handles both generic SMCs
and HVC according to SMC. At this moment it implements only one
service: Standard Hypervisor Service.

Standard Hypervisor Service only supports query calls, so caller can
ask about hypervisor UID and determine that it is XEN running.

This change allows more generic handling for SMCs and HVCs and it can
be easily extended to support new services and functions.

Signed-off-by: Volodymyr Babchuk 
Reviewed-by: Oleksandr Andrushchenko 
Reviewed-by: Oleksandr Tyshchenko 
---
 xen/arch/arm/Makefile   |  1 +
 xen/arch/arm/smccc.c| 96 
+

 xen/arch/arm/traps.c| 10 -
 xen/include/asm-arm/smccc.h | 89 
+

 4 files changed, 194 insertions(+), 2 deletions(-)
 create mode 100644 xen/arch/arm/smccc.c
 create mode 100644 xen/include/asm-arm/smccc.h

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 49e1fb2..b8728cf 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -39,6 +39,7 @@ obj-y += psci.o
 obj-y += setup.o
 obj-y += shutdown.o
 obj-y += smc.o
+obj-y += smccc.o
 obj-y += smp.o
 obj-y += smpboot.o
 obj-y += sysctl.o
diff --git a/xen/arch/arm/smccc.c b/xen/arch/arm/smccc.c
new file mode 100644
index 000..5d10964
--- /dev/null
+++ b/xen/arch/arm/smccc.c


I would name this file vsmccc.c to show it is about virtual SMC. Also, I 
would have expected pretty much everyone to use the SMCC, so I would even 
name the file vsmc.c



@@ -0,0 +1,96 @@
+/*
+ * xen/arch/arm/smccc.c
+ *
+ * Generic handler for SMC and HVC calls according to
+ * ARM SMC callling convention


s/callling/calling/


+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.


I know that some of the other headers are wrong about the GPL license. 
But Xen is GPLv2 only. Please update the copyright accordingly. I.e:


  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.


+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+
+#include 
+#include 
+#include 


Why is this included here? You don't use it.


+/* Need to include xen/sched.h before asm/domain.h or it breaks build*/


xen/sched.h will include asm/domain.h. So no need to include the latter 
here.



+#include 
+#include 
+#include 
+#include 
+#include 


You don't use this header here.


+#include 
+#include 
+
+#define XEN_SMCCC_UID ARM_SMCCC_UID(0xa71812dc, 0xc698, 0x4369, \
+0x9a, 0xcf, 0x79, 0xd1, \
+0x8d, 0xde, 0xe6, 0x67)


Please mention that this value was generated. This would avoid wondering 
where this value comes from.



+
+/*
+ * We can't use XEN version here:
+ * Major revision should change every time SMC/HVC function is removed.
+ * Minor revision should change every time SMC/HVC function is added.
+ * So, it is SMCCC protocol revision code, not XEN version


It would be nice to say this is a requirement of the spec. Also missing 
full stop.



+ */
+#define XEN_SMCCC_MAJOR_REVISION 0
+#define XEN_SMCCC_MINOR_REVISION 1


I first thought the revision was 0.1.3 and was about to ask why. But 
then I noticed XEN_SMCCC_FUNCTION_COUNT is not part of the revision.


So please add a newline for clarity.


+#define XEN_SMCCC_FUNCTION_COUNT 3
+
+/* SMCCC interface for hypervisor. Tell about self */


Tell about itself. + missing full stop.

+static bool handle_hypervisor(struct cpu_user_regs *regs, const union 
hsr hsr)


hsr is already part of regs.


+{
+switch ( ARM_SMCCC_FUNC_NUM(get_user_reg(regs, 0)) )
+{
+case ARM_SMCCC_FUNC_CALL_COUNT:
+set_user_reg(regs, 0, XEN_SMCCC_FUNCTION_COUNT);
+return true;
+case ARM_SMCCC_FUNC_CALL_UID:
+set_user_reg(regs, 0, XEN_SMCCC_UID.a[0]);
+

[Xen-devel] [PATCH v2 3/4] arm: traps: handle PSCI calls inside `vsmc.c`

2017-06-22 Thread Volodymyr Babchuk
PSCI is part of the HVC/SMC interface, so it should be handled in the
appropriate place: `vsmc.c`. This patch just moves the PSCI handler
calls from `traps.c` to `vsmc.c`.

PSCI is considered as two different "services" in terms of SMCCC.
The older PSCI 1.0 is treated as an "architecture service", while the
newer PSCI 2.0 is defined as a "standard secure service".

Signed-off-by: Volodymyr Babchuk 
Reviewed-by: Oleksandr Andrushchenko 
Reviewed-by: Oleksandr Tyshchenko 
---
Split this patch into two. Now this patch does not change the way the
PSCI code accesses the arguments.
---
 xen/arch/arm/traps.c  | 124 --
 xen/arch/arm/vsmc.c   | 136 ++
 xen/include/public/arch-arm/smc.h |   5 ++
 3 files changed, 153 insertions(+), 112 deletions(-)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 66242e5..e806474 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -39,7 +39,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -1450,113 +1449,6 @@ static void do_debug_trap(struct cpu_user_regs *regs, 
unsigned int code)
 }
 #endif
 
-/* helper function for checking arm mode 32/64 bit */
-static inline int psci_mode_check(struct domain *d, register_t fid)
-{
-return !( is_64bit_domain(d)^( (fid & PSCI_0_2_64BIT) >> 30 ) );
-}
-
-static void do_trap_psci(struct cpu_user_regs *regs)
-{
-register_t fid = get_user_reg(regs,0);
-
-/* preloading in case psci_mode_check fails */
-set_user_reg(regs, 0, PSCI_INVALID_PARAMETERS);
-switch( fid )
-{
-case PSCI_cpu_off:
-{
-uint32_t pstate = get_user_reg(regs, 1);
-perfc_incr(vpsci_cpu_off);
-set_user_reg(regs, 0, do_psci_cpu_off(pstate));
-}
-break;
-case PSCI_cpu_on:
-{
-uint32_t vcpuid =  get_user_reg(regs, 1);
-register_t epoint = get_user_reg(regs, 2);
-perfc_incr(vpsci_cpu_on);
-set_user_reg(regs, 0, do_psci_cpu_on(vcpuid, epoint));
-}
-break;
-case PSCI_0_2_FN_PSCI_VERSION:
-perfc_incr(vpsci_version);
-set_user_reg(regs, 0, do_psci_0_2_version());
-break;
-case PSCI_0_2_FN_CPU_OFF:
-perfc_incr(vpsci_cpu_off);
-set_user_reg(regs, 0, do_psci_0_2_cpu_off());
-break;
-case PSCI_0_2_FN_MIGRATE_INFO_TYPE:
-perfc_incr(vpsci_migrate_info_type);
-set_user_reg(regs, 0, do_psci_0_2_migrate_info_type());
-break;
-case PSCI_0_2_FN_MIGRATE_INFO_UP_CPU:
-case PSCI_0_2_FN64_MIGRATE_INFO_UP_CPU:
-perfc_incr(vpsci_migrate_info_up_cpu);
-if ( psci_mode_check(current->domain, fid) )
-set_user_reg(regs, 0, do_psci_0_2_migrate_info_up_cpu());
-break;
-case PSCI_0_2_FN_SYSTEM_OFF:
-perfc_incr(vpsci_system_off);
-do_psci_0_2_system_off();
-set_user_reg(regs, 0, PSCI_INTERNAL_FAILURE);
-break;
-case PSCI_0_2_FN_SYSTEM_RESET:
-perfc_incr(vpsci_system_reset);
-do_psci_0_2_system_reset();
-set_user_reg(regs, 0, PSCI_INTERNAL_FAILURE);
-break;
-case PSCI_0_2_FN_CPU_ON:
-case PSCI_0_2_FN64_CPU_ON:
-perfc_incr(vpsci_cpu_on);
-if ( psci_mode_check(current->domain, fid) )
-{
-register_t vcpuid = get_user_reg(regs, 1);
-register_t epoint = get_user_reg(regs, 2);
-register_t cid = get_user_reg(regs, 3);
-set_user_reg(regs, 0,
- do_psci_0_2_cpu_on(vcpuid, epoint, cid));
-}
-break;
-case PSCI_0_2_FN_CPU_SUSPEND:
-case PSCI_0_2_FN64_CPU_SUSPEND:
-perfc_incr(vpsci_cpu_suspend);
-if ( psci_mode_check(current->domain, fid) )
-{
-uint32_t pstate = get_user_reg(regs, 1);
-register_t epoint = get_user_reg(regs, 2);
-register_t cid = get_user_reg(regs, 3);
-set_user_reg(regs, 0,
- do_psci_0_2_cpu_suspend(pstate, epoint, cid));
-}
-break;
-case PSCI_0_2_FN_AFFINITY_INFO:
-case PSCI_0_2_FN64_AFFINITY_INFO:
-perfc_incr(vpsci_cpu_affinity_info);
-if ( psci_mode_check(current->domain, fid) )
-{
-register_t taff = get_user_reg(regs, 1);
-uint32_t laff = get_user_reg(regs, 2);
-set_user_reg(regs, 0,
- do_psci_0_2_affinity_info(taff, laff));
-}
-break;
-case PSCI_0_2_FN_MIGRATE:
-case PSCI_0_2_FN64_MIGRATE:
-perfc_incr(vpsci_cpu_migrate);
-if ( psci_mode_check(current->domain, fid) )
-{
-uint32_t tcpu = get_user_reg(regs, 1);
-set_user_reg(regs, 0, do_psci_0_2_migrate(tcpu));
-}
-break;
-default:
-

[Xen-devel] [PATCH v2 1/4] arm: traps: psci: use generic register accessors

2017-06-22 Thread Volodymyr Babchuk
There are standard functions set_user_reg() and get_user_reg(). Use
them instead of PSCI_RESULT_REG()/PSCI_ARG() macros.

Signed-off-by: Volodymyr Babchuk 
---
 xen/arch/arm/traps.c | 68 ++--
 1 file changed, 29 insertions(+), 39 deletions(-)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 6cf9ee7..2054c69 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1449,16 +1449,6 @@ static void do_debug_trap(struct cpu_user_regs *regs, 
unsigned int code)
 }
 #endif
 
-#ifdef CONFIG_ARM_64
-#define PSCI_RESULT_REG(reg) (reg)->x0
-#define PSCI_ARG(reg,n) (reg)->x##n
-#define PSCI_ARG32(reg,n) (uint32_t)( (reg)->x##n & 0x )
-#else
-#define PSCI_RESULT_REG(reg) (reg)->r0
-#define PSCI_ARG(reg,n) (reg)->r##n
-#define PSCI_ARG32(reg,n) PSCI_ARG(reg,n)
-#endif
-
 /* helper function for checking arm mode 32/64 bit */
 static inline int psci_mode_check(struct domain *d, register_t fid)
 {
@@ -1467,65 +1457,65 @@ static inline int psci_mode_check(struct domain *d, 
register_t fid)
 
 static void do_trap_psci(struct cpu_user_regs *regs)
 {
-register_t fid = PSCI_ARG(regs,0);
+register_t fid = get_user_reg(regs,0);
 
 /* preloading in case psci_mode_check fails */
-PSCI_RESULT_REG(regs) = PSCI_INVALID_PARAMETERS;
+set_user_reg(regs, 0, PSCI_INVALID_PARAMETERS);
 switch( fid )
 {
 case PSCI_cpu_off:
 {
-uint32_t pstate = PSCI_ARG32(regs,1);
+uint32_t pstate = get_user_reg(regs, 1);
 perfc_incr(vpsci_cpu_off);
-PSCI_RESULT_REG(regs) = do_psci_cpu_off(pstate);
+set_user_reg(regs, 0, do_psci_cpu_off(pstate));
 }
 break;
 case PSCI_cpu_on:
 {
-uint32_t vcpuid = PSCI_ARG32(regs,1);
-register_t epoint = PSCI_ARG(regs,2);
+uint32_t vcpuid =  get_user_reg(regs, 1);
+register_t epoint = get_user_reg(regs, 2);
 perfc_incr(vpsci_cpu_on);
-PSCI_RESULT_REG(regs) = do_psci_cpu_on(vcpuid, epoint);
+set_user_reg(regs, 0, do_psci_cpu_on(vcpuid, epoint));
 }
 break;
 case PSCI_0_2_FN_PSCI_VERSION:
 perfc_incr(vpsci_version);
-PSCI_RESULT_REG(regs) = do_psci_0_2_version();
+set_user_reg(regs, 0, do_psci_0_2_version());
 break;
 case PSCI_0_2_FN_CPU_OFF:
 perfc_incr(vpsci_cpu_off);
-PSCI_RESULT_REG(regs) = do_psci_0_2_cpu_off();
+set_user_reg(regs, 0, do_psci_0_2_cpu_off());
 break;
 case PSCI_0_2_FN_MIGRATE_INFO_TYPE:
 perfc_incr(vpsci_migrate_info_type);
-PSCI_RESULT_REG(regs) = do_psci_0_2_migrate_info_type();
+set_user_reg(regs, 0, do_psci_0_2_migrate_info_type());
 break;
 case PSCI_0_2_FN_MIGRATE_INFO_UP_CPU:
 case PSCI_0_2_FN64_MIGRATE_INFO_UP_CPU:
 perfc_incr(vpsci_migrate_info_up_cpu);
 if ( psci_mode_check(current->domain, fid) )
-PSCI_RESULT_REG(regs) = do_psci_0_2_migrate_info_up_cpu();
+set_user_reg(regs, 0, do_psci_0_2_migrate_info_up_cpu());
 break;
 case PSCI_0_2_FN_SYSTEM_OFF:
 perfc_incr(vpsci_system_off);
 do_psci_0_2_system_off();
-PSCI_RESULT_REG(regs) = PSCI_INTERNAL_FAILURE;
+set_user_reg(regs, 0, PSCI_INTERNAL_FAILURE);
 break;
 case PSCI_0_2_FN_SYSTEM_RESET:
 perfc_incr(vpsci_system_reset);
 do_psci_0_2_system_reset();
-PSCI_RESULT_REG(regs) = PSCI_INTERNAL_FAILURE;
+set_user_reg(regs, 0, PSCI_INTERNAL_FAILURE);
 break;
 case PSCI_0_2_FN_CPU_ON:
 case PSCI_0_2_FN64_CPU_ON:
 perfc_incr(vpsci_cpu_on);
 if ( psci_mode_check(current->domain, fid) )
 {
-register_t vcpuid = PSCI_ARG(regs,1);
-register_t epoint = PSCI_ARG(regs,2);
-register_t cid = PSCI_ARG(regs,3);
-PSCI_RESULT_REG(regs) =
-do_psci_0_2_cpu_on(vcpuid, epoint, cid);
+register_t vcpuid = get_user_reg(regs, 1);
+register_t epoint = get_user_reg(regs, 2);
+register_t cid = get_user_reg(regs, 3);
+set_user_reg(regs, 0,
+ do_psci_0_2_cpu_on(vcpuid, epoint, cid));
 }
 break;
 case PSCI_0_2_FN_CPU_SUSPEND:
@@ -1533,11 +1523,11 @@ static void do_trap_psci(struct cpu_user_regs *regs)
 perfc_incr(vpsci_cpu_suspend);
 if ( psci_mode_check(current->domain, fid) )
 {
-uint32_t pstate = PSCI_ARG32(regs,1);
-register_t epoint = PSCI_ARG(regs,2);
-register_t cid = PSCI_ARG(regs,3);
-PSCI_RESULT_REG(regs) =
-do_psci_0_2_cpu_suspend(pstate, epoint, cid);
+uint32_t pstate = get_user_reg(regs, 1);
+register_t epoint = get_user_reg(regs, 2);
+register_t cid = get_user_reg(regs, 3);
+   

[Xen-devel] [PATCH v2 0/4] Handle SMCs and HVCs in conformance with SMCCC

2017-06-22 Thread Volodymyr Babchuk
Hello all,

This is the second version. Instead of 2 patches, there are now 4.
I have divided the PSCI patch into two: one changes how the PSCI
code accesses registers, and the second moves the PSCI code, with
the new accessors, to vsmc.c.

Also, I have removed the redundant 64-bit mode check in the PSCI code,
as it does not conform with SMCCC.

Per-patch changes are described in the corresponding patch messages.




[Xen-devel] [PATCH v2 4/4] vsmc: psci: remove 64 bit mode check

2017-06-22 Thread Volodymyr Babchuk
The PSCI handling code had a helper routine that checked the calling
convention. It is not needed anymore, because:

 - The generic handler checks that 64-bit calls can be made only by
   64-bit guests.

 - SMCCC requires that a 64-bit handler support both 32- and 64-bit
   calls, even if they originate from a 64-bit caller.

This patch removes that extra check.
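
For reference, the centralised check boils down to something like this in the
generic dispatcher (illustrative sketch; the name of the bit-30 calling
convention flag is an assumption):

/* A 32-bit guest must not make 64-bit (SMC64/HVC64) calls. */
if ( is_32bit_domain(current->domain) && (fid & ARM_SMCCC_64BIT) )
    return false;   /* caller gets an undefined instruction exception */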

Signed-off-by: Volodymyr Babchuk 
---
 xen/arch/arm/vsmc.c | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/xen/arch/arm/vsmc.c b/xen/arch/arm/vsmc.c
index 5f10fd1..1983e0e 100644
--- a/xen/arch/arm/vsmc.c
+++ b/xen/arch/arm/vsmc.c
@@ -98,12 +98,6 @@ static bool handle_arch(struct cpu_user_regs *regs)
 return false;
 }
 
-/* helper function for checking arm mode 32/64 bit */
-static inline int psci_mode_check(struct domain *d, register_t fid)
-{
-return !( is_64bit_domain(d)^( (fid & PSCI_0_2_64BIT) >> 30 ) );
-}
-
 /* PSCI 2.0 interface */
 static bool handle_ssc(struct cpu_user_regs *regs)
 {
@@ -125,8 +119,7 @@ static bool handle_ssc(struct cpu_user_regs *regs)
 return true;
 case ARM_SMCCC_FUNC_NUM(PSCI_0_2_FN_MIGRATE_INFO_UP_CPU):
 perfc_incr(vpsci_migrate_info_up_cpu);
-if ( psci_mode_check(current->domain, fid) )
-set_user_reg(regs, 0, do_psci_0_2_migrate_info_up_cpu());
+set_user_reg(regs, 0, do_psci_0_2_migrate_info_up_cpu());
 return true;
 case ARM_SMCCC_FUNC_NUM(PSCI_0_2_FN_SYSTEM_OFF):
 perfc_incr(vpsci_system_off);
@@ -140,7 +133,6 @@ static bool handle_ssc(struct cpu_user_regs *regs)
 return true;
 case ARM_SMCCC_FUNC_NUM(PSCI_0_2_FN_CPU_ON):
 perfc_incr(vpsci_cpu_on);
-if ( psci_mode_check(current->domain, fid) )
 {
 register_t vcpuid = get_user_reg(regs, 1);
 register_t epoint = get_user_reg(regs, 2);
@@ -151,7 +143,6 @@ static bool handle_ssc(struct cpu_user_regs *regs)
 return true;
 case ARM_SMCCC_FUNC_NUM(PSCI_0_2_FN_CPU_SUSPEND):
 perfc_incr(vpsci_cpu_suspend);
-if ( psci_mode_check(current->domain, fid) )
 {
 uint32_t pstate = get_user_reg(regs, 1);
 register_t epoint = get_user_reg(regs, 2);
@@ -162,7 +153,6 @@ static bool handle_ssc(struct cpu_user_regs *regs)
 return true;
 case ARM_SMCCC_FUNC_NUM(PSCI_0_2_FN_AFFINITY_INFO):
 perfc_incr(vpsci_cpu_affinity_info);
-if ( psci_mode_check(current->domain, fid) )
 {
 register_t taff = get_user_reg(regs, 1);
 uint32_t laff = get_user_reg(regs,2);
@@ -172,7 +162,6 @@ static bool handle_ssc(struct cpu_user_regs *regs)
 return true;
 case ARM_SMCCC_FUNC_NUM(PSCI_0_2_FN_MIGRATE):
 perfc_incr(vpsci_cpu_migrate);
-if ( psci_mode_check(current->domain, fid) )
 {
 uint32_t tcpu = get_user_reg(regs, 1);
 set_user_reg(regs, 0, do_psci_0_2_migrate(tcpu));
-- 
2.7.4




[Xen-devel] [PATCH v2 2/4] arm: smccc: handle SMCs/HVCs according to SMCCC

2017-06-22 Thread Volodymyr Babchuk
SMCCC (SMC Calling Convention) describes how to handle both HVCs and SMCs.
SMCCC states that both HVC and SMC are valid conduits for calling different
firmware functions. Thus, for example, PSCI calls can be made by either
SMC or HVC. SMCCC also defines the function number coding for such calls.
Besides functional calls there are query calls, which allow the underlying
OS to determine the version, UID and number of functions provided by a
service provider.

This patch adds a new file, `vsmc.c`, which handles both generic SMCs
and HVCs according to SMCCC. At this moment it implements only one
service: the Standard Hypervisor Service.

The Standard Hypervisor Service only supports query calls, so a caller can
ask about the hypervisor UID and determine that it is Xen running.

This change allows more generic handling of SMCs and HVCs, and it can
be easily extended to support new services and functions.

But before an SMC is forwarded to the standard SMCCC handler, it can be
routed to a domain monitor, if one is installed.
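
For illustration only (this is not part of the patch), a guest would probe
the new service with an ordinary SMCCC call; an AArch64 sketch, with the
function ID left as a parameter, could look like:

#include <stdint.h>

/* Issue the "Call UID" query via HVC.  Per SMCCC the function ID goes in
 * x0 and the four result words come back in x0..x3. */
static void probe_hyp_uid(uint32_t fid, uint64_t uid[4])
{
    register uint64_t x0 asm("x0") = fid;
    register uint64_t x1 asm("x1");
    register uint64_t x2 asm("x2");
    register uint64_t x3 asm("x3");

    asm volatile("hvc #0"
                 : "+r" (x0), "=r" (x1), "=r" (x2), "=r" (x3)
                 : : "memory");

    uid[0] = x0; uid[1] = x1; uid[2] = x2; uid[3] = x3;
}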

Signed-off-by: Volodymyr Babchuk 
Reviewed-by: Oleksandr Andrushchenko 
Reviewed-by: Oleksandr Tyshchenko 
---
 - Moved UID definition to xen/include/public/arch-arm/smc.h
 - Renamed smccc.c to vsmc.c and smccc.h to vsmc.h
 - Reformatted vsmc.h and commented the definitions there
 - Added immediate value check for SMC64, HVC32 and HVC64
 - Added conditional flags check for SMC calls (HVC will be handled
   and checked in the next patch).
 - Added check for 64 bit calls from 32 bit guests
 - Removed HSR value passing as separate argument
 - Various changes in comments
---
 xen/arch/arm/Makefile |   1 +
 xen/arch/arm/traps.c  |  16 -
 xen/arch/arm/vsmc.c   | 128 ++
 xen/include/asm-arm/vsmc.h|  94 
 xen/include/public/arch-arm/smc.h |  45 ++
 5 files changed, 283 insertions(+), 1 deletion(-)
 create mode 100644 xen/arch/arm/vsmc.c
 create mode 100644 xen/include/asm-arm/vsmc.h
 create mode 100644 xen/include/public/arch-arm/smc.h

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 49e1fb2..4efd01c 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -50,6 +50,7 @@ obj-$(CONFIG_HAS_GICV3) += vgic-v3.o
 obj-$(CONFIG_HAS_ITS) += vgic-v3-its.o
 obj-y += vm_event.o
 obj-y += vtimer.o
+obj-y += vsmc.o
 obj-y += vpsci.o
 obj-y += vuart.o
 
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 2054c69..66242e5 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "decode.h"
 #include "vtimer.h"
@@ -2771,10 +2772,23 @@ static void do_trap_smc(struct cpu_user_regs *regs, 
const union hsr hsr)
 {
 int rc = 0;
 
+if ( !check_conditional_instr(regs, hsr) )
+{
+advance_pc(regs, hsr);
+return;
+}
+
+/* If monitor is enabled, let it handle the call */
 if ( current->domain->arch.monitor.privileged_call_enabled )
 rc = monitor_smc();
 
-if ( rc != 1 )
+if ( rc == 1 )
+return;
+
+/* Use standard routines to handle the call */
+if ( vsmc_handle_call(regs) )
+advance_pc(regs, hsr);
+else
 inject_undef_exception(regs, hsr);
 }
 
diff --git a/xen/arch/arm/vsmc.c b/xen/arch/arm/vsmc.c
new file mode 100644
index 000..10c4acd
--- /dev/null
+++ b/xen/arch/arm/vsmc.c
@@ -0,0 +1,128 @@
+/*
+ * xen/arch/arm/vsmc.c
+ *
+ * Generic handler for SMC and HVC calls according to
+ * ARM SMC calling convention
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+
+#include 
+#include 
+/* Need to include xen/sched.h before asm/domain.h or it breaks build*/
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * Hypervisor Service version
+ *
+ * We can't use XEN version here, because of SMCCC requirements:
+ * Major revision should change every time SMC/HVC function is removed.
+ * Minor revision should change every time SMC/HVC function is added.
+ * So, it is SMCCC protocol revision code, not XEN version.
+ *
+ * Those values are subjected to change, when interface will be extended.
+ * They should not be stored in public/asm-arm/smc.h because they should
+ * be queried by guest using SMC/HVC interface.
+ */
+#define XEN_SMCCC_MAJOR_REVISION 0
+#define XEN_SMCCC_MINOR_REVISION 1
+
+/* Number of functions currently supported by Hypervisor Service. */
+#define XEN_SMCCC_FUNCTION_COUNT 3
+
+/* SMCCC interface for hypervisor. Tell about itself. 

[Xen-devel] [qemu-upstream-4.9-testing test] 110939: tolerable FAIL - PUSHED

2017-06-22 Thread osstest service owner
flight 110939 qemu-upstream-4.9-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/110939/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop   fail REGR. vs. 109926
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stopfail REGR. vs. 109926

Tests which did not succeed, but are not blocking:
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64  9 windows-installfail never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  13 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-arm64-arm64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-win10-i386  9 windows-installfail never pass
 test-amd64-i386-xl-qemuu-win10-i386  9 windows-install fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64  9 windows-install fail never pass

version targeted for testing:
 qemuu414d069b38ab114b89085e44989bf57604ea86d7
baseline version:
 qemuue97832ec6b2a7ddd48b8e6d1d848ffdfee6a31c7

Last test of basis   109926  2017-06-01 11:16:20 Z   21 days
Testing same since   110939  2017-06-21 15:44:00 Z1 days1 attempts


People who touched revisions under test:
  Anthony PERARD 
  Jan Beulich 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt

Re: [Xen-devel] [PATCH for-4.9 v2] xen/livepatch: Don't crash on encountering STN_UNDEF relocations

2017-06-22 Thread Konrad Rzeszutek Wilk
On Thu, Jun 22, 2017 at 11:27:50AM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Jun 21, 2017 at 09:26:15PM -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Jun 21, 2017 at 07:13:36PM +0100, Andrew Cooper wrote:
> > > A symndx of STN_UNDEF is special, and means a symbol value of 0.  While
> > > legitimate in the ELF standard, its existance in a livepatch is 
> > > questionable
> > > at best.  Until a plausible usecase presents itself, reject such a 
> > > relocation
> > > with -EOPNOTSUPP.
> > > 
> > > Additionally, perform a safety check on elf->sym[symndx].sym before
> > > derefencing it, to avoid tripping over a NULL pointer when calculating 
> > > val.
> > > 
> > > Signed-off-by: Andrew Cooper 
> > 
> > Reviewed-by: Konrad Rzeszutek Wilk 
> > Tested-by: Konrad Rzeszutek Wilk  [x86 right now, 
> > will do
> > arm32 tomorrow]
> 
> I did that on my Cubietruck and I made the rookie mistake of not trying
> a hypervisor _without_ your changes, so I don't know if this crash
> (see inline) is due to your patch or something else.
> 
> Also I messed up and made the livepatch test run every time it boots, so
> now it is stuck in a loop of crashes :-(
> 
> The git tree is:
> 
> git://xenbits.xen.org/people/konradwilk/xen.git staging-4.9
> 
> Stay tuned.

And I see the same thing with b38b147 (that is the top of 'origin/staging').

So time to dig in.



Re: [Xen-devel] [PATCH v2] libxc: add xc_domain_add_to_physmap_batch to wrap XENMEM_add_to_physmap_batch

2017-06-22 Thread Zhongze Liu
Hi Wei,

2017-06-21 23:44 GMT+08:00 Wei Liu :
> On Wed, Jun 21, 2017 at 01:29:26AM +0800, Zhongze Liu wrote:
>> This is a preparation for the proposal "allow setting up shared memory areas
>> between VMs from xl config file". See:
>> V2: https://lists.xen.org/archives/html/xen-devel/2017-06/msg02256.html
>> V1: https://lists.xen.org/archives/html/xen-devel/2017-05/msg01288.html
>>
>> The plan is to use XENMEM_add_to_physmap_batch in xl to map foregin pages 
>> from
>> one DomU to another so that the page could be shared. But currently there is 
>> no
>> wrapper for XENMEM_add_to_physmap_batch in libxc, so we just add a wrapper 
>> for
>> it.
>>
>> Signed-off-by: Zhongze Liu 
>> ---
>> +int xc_domain_add_to_physmap_batch(xc_interface *xch,
>> +   domid_t domid,
>> +   domid_t foreign_domid,
>> +   unsigned int space,
>> +   unsigned int size,
>> +   xen_ulong_t *idxs,
>> +   xen_pfn_t *gpfns,
>> + .  int *errs)
>> +{
>> +int rc;
>> +DECLARE_HYPERCALL_BOUNCE(idxs, size * sizeof(*idxs), 
>> XC_HYPERCALL_BUFFER_BOUNCE_IN);
>> +DECLARE_HYPERCALL_BOUNCE(gpfns, size * sizeof(*gpfns), 
>> XC_HYPERCALL_BUFFER_BOUNCE_IN);
>> +DECLARE_HYPERCALL_BOUNCE(errs, size * sizeof(*errs), 
>> XC_HYPERCALL_BUFFER_BOUNCE_OUT);
>> +
>> +struct xen_add_to_physmap_batch xatp_batch = {
>> +.domid = domid,
>> +.space = space,
>> +.size = size,
>> +.u = {.foreign_domid = foreign_domid}
>
> Coding style issue.

Do you mean that I should add a space between '{' and '.' near ".u =
{.foreign" in this line?

>
> Just a note, the struct is different for pre-4.7 and post-4.7 Xen. You
> don't need to implement a version of this function for pre-4.7 Xen.
>
>> +};
>> +
>> +if ( xc_hypercall_bounce_pre(xch, idxs)  ||
>> + xc_hypercall_bounce_pre(xch, gpfns) ||
>> + xc_hypercall_bounce_pre(xch, errs)  )
>> +{
>> +PERROR("Could not bounce memory for XENMEM_add_to_physmap_batch");
>> +goto out;
>
> rc will be uninitialised in this exit path.
>
>> +}
>> +
>> +set_xen_guest_handle(xatp_batch.idxs, idxs);
>> +set_xen_guest_handle(xatp_batch.gpfns, gpfns);
>> +set_xen_guest_handle(xatp_batch.errs, errs);
>> +
>> +rc = do_memory_op(xch, XENMEM_add_to_physmap_batch,
>> +  _batch, sizeof(xatp_batch));
>> +
>> +out:
>> +xc_hypercall_bounce_post(xch, idxs);
>> +xc_hypercall_bounce_post(xch, gpfns);
>> +xc_hypercall_bounce_post(xch, errs);
>> +
>> +return rc;
>> +}
>> +
>>  int xc_domain_claim_pages(xc_interface *xch,
>> uint32_t domid,
>> unsigned long nr_pages)
>> --
>> 2.13.1
>>


Cheers,

Zhongze Liu



Re: [Xen-devel] [PATCH 2/2] x86/altp2m: Add a hvmop for setting the suppress #VE bit

2017-06-22 Thread Adrian Pop
On Thu, Jun 22, 2017 at 06:13:22AM -0600, Jan Beulich wrote:
> >>> On 22.06.17 at 14:04,  wrote:
> > On Fri, Jun 16, 2017 at 02:39:10AM -0600, Jan Beulich wrote:
> >> >>> On 15.06.17 at 21:01,  wrote:
> >> > On Fri, Jun 9, 2017 at 10:51 AM, Adrian Pop  wrote:
> >> >> --- a/xen/arch/x86/mm/mem_access.c
> >> >> +++ b/xen/arch/x86/mm/mem_access.c
> >> >> @@ -466,6 +466,58 @@ int p2m_get_mem_access(struct domain *d, gfn_t 
> >> >> gfn, 
> > xenmem_access_t *access)
> >> >>  }
> >> >>
> >> >>  /*
> >> >> + * Set/clear the #VE suppress bit for a page.  Only available on VMX.
> >> >> + */
> >> >> +int p2m_set_suppress_ve(struct domain *d, gfn_t gfn, bool suppress_ve,
> >> >> +unsigned int altp2m_idx)
> >> >> +{
> >> >> +struct p2m_domain *host_p2m = p2m_get_hostp2m(d);
> >> >> +struct p2m_domain *ap2m = NULL;
> >> >> +struct p2m_domain *p2m;
> >> >> +mfn_t mfn;
> >> >> +p2m_access_t a;
> >> >> +p2m_type_t t;
> >> >> +int rc;
> >> >> +
> >> >> +if ( !cpu_has_vmx_virt_exceptions )
> >> >> +return -EOPNOTSUPP;
> >> >> +
> >> >> +/* This subop should only be used from a privileged domain. */
> >> >> +if ( !current->domain->is_privileged )
> >> >> +return -EINVAL;
> >> > 
> >> > This check looks wrong to me. If this subop should only be used by an
> >> > external (privileged) domain then I don't think this should be
> >> > implemented as an HVMOP, looks more like a domctl to me.
> >> 
> >> I think this wants to be an XSM_DM_PRIV check instead.
> > 
> > I'm not sure, but I expect that to not behave as intended security-wise
> > if Xen is compiled without XSM.  Would it?  It would be great if this
> > feature worked well without XSM too.
> 
> Well, without you explaining why you think this wouldn't work
> without XSM, I don't really know what to answer. I suppose
> you've grep-ed for other uses of this and/or other XSM_* values,
> finding that these exist in various places where all is fine without
> XSM?

OK; it indeed does what it should without XSM as well.
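
For anyone else wondering why: even without XSM compiled in, the dummy policy
still enforces XSM_DM_PRIV, which is roughly the following (simplified sketch,
not the exact code in xsm/dummy.h):

/* The caller must be the control domain, or the device model that has
 * been registered (via set_target) for the target domain. */
static int dm_priv_check(struct domain *src, struct domain *target)
{
    if ( src->is_privileged )
        return 0;
    if ( target != NULL && src->target == target )
        return 0;
    return -EPERM;
}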



Re: [Xen-devel] [PATCH v2] VT-d: fix VF of RC integrated endpoint matched to wrong VT-d unit

2017-06-22 Thread Konrad Rzeszutek Wilk
On Thu, Jun 22, 2017 at 09:31:50AM -0600, Jan Beulich wrote:
> >>> On 22.06.17 at 16:21,  wrote:
> > On Thu, Jun 22, 2017 at 03:26:04AM -0600, Jan Beulich wrote:
> > On 21.06.17 at 12:47,  wrote:
> >>> The problem is that for a VF of an RC integrated PF (e.g. the PF's BDF
> >>> is 00:02.0), we would wrongly use 00:00.0 to search for the VT-d unit.
> >>> 
> >>> To search the VT-d unit for a VF, the BDF of the PF is used. And if the
> >>> PF is an Extended Function, the BDF of one traditional function is
> >>> used.  The following line (from acpi_find_matched_drhd_unit()):
> >>> devfn = PCI_SLOT(pdev->info.physfn.devfn) ? 0 : 
> >>> pdev->info.physfn.devfn;
> >>> sets 'devfn' to 0 if the PF's devfn > 7. Apparently, it treats all
> >>> PFs which have devfn > 7 as extended functions. However, this is wrong
> >>> for an RC integrated PF, which is not ARI-capable but may have devfn > 7.
> >>
> >>I'm again having trouble with you talking about ARI and RC
> >>integrated here, but not checking for either in any way in the
> >>new code. Please make sure you establish the full connection
> >>in the description.
> > 
> > Sorry for this. Let me explain this again.
> > 
> > From SRIOV spec 3.7.3, it says:
> > "ARI is not applicable to Root Complex Integrated Endpoints; all other
> > SR-IOV Capable Devices (Devices that include at least one PF) shall
> > implement the ARI Capability in each Function."
> > 
> > So I _think_ PFs can be classified into two kinds: one is the RC integrated
> > PF and the other is the non-RC integrated PF. The former can't support ARI;
> > the latter shall support ARI. Only for extended functions should one
> > traditional function's BDF be used to search the VT-d unit. And
> > according to the PCIe spec, an Extended Function means, within an ARI
> > Device, a Function whose Function Number is greater than 7. So the former
> > can't be an extended function, while the latter is an extended function as
> > long as the PF's devfn > 7, which is exactly the check the original code
> > did. So I think the original code wasn't aware of the former
> > (aka RC integrated endpoints). This patch checks is_extfn
> > directly. All of this is only my understanding; I need your and Kevin's
> > help to decide whether it's right or not.
> 
> This makes sense to me, but as said, the patch description will need
> to include this in some form.
> 
> >>> --- a/xen/drivers/passthrough/vtd/dmar.c
> >>> +++ b/xen/drivers/passthrough/vtd/dmar.c
> >>> @@ -218,8 +218,18 @@ struct acpi_drhd_unit 
> >>> *acpi_find_matched_drhd_unit(const 
> >>> struct pci_dev *pdev)
> >>>  }
> >>>  else if ( pdev->info.is_virtfn )
> >>>  {
> >>> +struct pci_dev *physfn;
> >>
> >>const
> >>
> >>>  bus = pdev->info.physfn.bus;
> >>> -devfn = PCI_SLOT(pdev->info.physfn.devfn) ? 0 : 
> >>> pdev->info.physfn.devfn;
> >>> +/*
> >>> + * Use 0 as 'devfn' to search VT-d unit when the physical 
> >>> function
> >>> + * is an Extended Function.
> >>> + */
> >>> +pcidevs_lock();
> >>> +physfn = pci_get_pdev(pdev->seg, bus, pdev->info.physfn.devfn);
> >>> +pcidevs_unlock();
> >>> +ASSERT(physfn);
> >>> +devfn = physfn->info.is_extfn ? 0 : pdev->info.physfn.devfn;
> >>
>>This change looks to be fine if we assume that is_extfn is always
> >>set correctly. Looking at the Linux code setting it, I'm not sure
> >>though: I can't see any connection to the PF needing to be RC
> >>integrated there.
> > 
> > Linux code sets it when
> >  pci_ari_enabled(pci_dev->bus) && PCI_SLOT(pci_dev->devfn)
> > 
>  I _think_ pci_ari_enabled(pci_dev->bus) means ARI Forwarding is enabled
>  in the immediately upstream Downstream Port. Thus, I think the pci_dev
>  is an ARI-capable device, for PCIe spec 6.13 says:
> > 
> > It is strongly recommended that software in general Set the ARI
> Forwarding Enable bit in a Downstream Port only if software is certain
> > that the device immediately below the Downstream Port is an ARI Device.
> > If the bit is Set when a non-ARI Device is present, the non-ARI Device
> > can respond to Configuration Space accesses under what it interprets as
> > being different Device Numbers, and its Functions can be aliased under
> > multiple Device Numbers, generally leading to undesired behavior.
> > 
> and the pci_dev can't be an RC integrated endpoint. From another angle, it
> also means is_extfn won't be set for an RC integrated PF. Is that
> > right?
> 
> Well, I'm not sure about the Linux parts here. Konrad, do you
> happen to know? Or do you know someone who does?

Including Govinda and Venu,

> 
> >>I'd also suggest doing error handling not by ASSERT(), but by
> >>checking physfn in the conditional expression.
> > 
> > do you mean this:
> > devfn = (physfn && physfn->info.is_extfn) ? 0 : pdev->info.physfn.devfn;
> 
> Yes.
> 
> Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org

Re: [Xen-devel] [PATCH] mini-os: use gzip -n

2017-06-22 Thread Wei Liu
On Thu, Jun 22, 2017 at 03:55:21PM +0100, Andrew Cooper wrote:
> On 22/06/17 15:09, Wei Liu wrote:
> > Cc minios-devel and Samuel
> >
> > On Thu, Jun 22, 2017 at 03:40:26PM +0200, Bernhard M. Wiedemann wrote:
> >> to not add current timestamp to
> >> ioemu-stubdom.gz
> >> pv-grub-x86_32.gz
> >> pv-grub-x86_64.gz
> >> xenstore-stubdom.gz
> >>
> >> to allow for reproducible builds
> >>
> >> Signed-off-by: Bernhard M. Wiedemann 
> > Acked-by: Wei Liu 
> 
> Would it make sense to have a $(GZIP) in the same way as we abstract out
> other programs, and export GZIP = gzip -n ?

Sure, that would be a nice thing to have.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] VT-d: fix VF of RC integrated endpoint matched to wrong VT-d unit

2017-06-22 Thread Jan Beulich
>>> On 22.06.17 at 16:21,  wrote:
> On Thu, Jun 22, 2017 at 03:26:04AM -0600, Jan Beulich wrote:
> On 21.06.17 at 12:47,  wrote:
>>> The problem is that for a VF of an RC integrated PF (e.g. the PF's BDF is
>>> 00:02.0), we would wrongly use 00:00.0 to search for the VT-d unit.
>>> 
>>> To search the VT-d unit for a VF, the BDF of the PF is used. And if the
>>> PF is an Extended Function, the BDF of one traditional function is
>>> used.  The following line (from acpi_find_matched_drhd_unit()):
>>> devfn = PCI_SLOT(pdev->info.physfn.devfn) ? 0 : pdev->info.physfn.devfn;
>>> sets 'devfn' to 0 if the PF's devfn > 7. Apparently, it treats all
>>> PFs which have devfn > 7 as extended functions. However, this is wrong for
>>> an RC integrated PF, which is not ARI-capable but may have devfn > 7.
>>
>>I'm again having trouble with you talking about ARI and RC
>>integrated here, but not checking for either in any way in the
>>new code. Please make sure you establish the full connection
>>in the description.
> 
> Sorry for this. Let me explain this again.
> 
> From SRIOV spec 3.7.3, it says:
> "ARI is not applicable to Root Complex Integrated Endpoints; all other
> SR-IOV Capable Devices (Devices that include at least one PF) shall
> implement the ARI Capability in each Function."
> 
> So I _think_ PFs can be classified into two kinds: one is the RC integrated
> PF and the other is the non-RC integrated PF. The former can't support ARI;
> the latter shall support ARI. Only for extended functions should one
> traditional function's BDF be used to search the VT-d unit. And
> according to the PCIe spec, an Extended Function means, within an ARI
> Device, a Function whose Function Number is greater than 7. So the former
> can't be an extended function, while the latter is an extended function as
> long as the PF's devfn > 7, which is exactly the check the original code
> did. So I think the original code wasn't aware of the former
> (aka RC integrated endpoints). This patch checks is_extfn
> directly. All of this is only my understanding; I need your and Kevin's
> help to decide whether it's right or not.

This makes sense to me, but as said, the patch description will need
to include this in some form.

>>> --- a/xen/drivers/passthrough/vtd/dmar.c
>>> +++ b/xen/drivers/passthrough/vtd/dmar.c
>>> @@ -218,8 +218,18 @@ struct acpi_drhd_unit 
>>> *acpi_find_matched_drhd_unit(const 
>>> struct pci_dev *pdev)
>>>  }
>>>  else if ( pdev->info.is_virtfn )
>>>  {
>>> +struct pci_dev *physfn;
>>
>>const
>>
>>>  bus = pdev->info.physfn.bus;
>>> -devfn = PCI_SLOT(pdev->info.physfn.devfn) ? 0 : 
>>> pdev->info.physfn.devfn;
>>> +/*
>>> + * Use 0 as 'devfn' to search VT-d unit when the physical function
>>> + * is an Extended Function.
>>> + */
>>> +pcidevs_lock();
>>> +physfn = pci_get_pdev(pdev->seg, bus, pdev->info.physfn.devfn);
>>> +pcidevs_unlock();
>>> +ASSERT(physfn);
>>> +devfn = physfn->info.is_extfn ? 0 : pdev->info.physfn.devfn;
>>
>>This change looks to be fine if we assume that is_extfn is always
>>set correctly. Looking at the Linux code setting it, I'm not sure
>>though: I can't see any connection to the PF needing to be RC
>>integrated there.
> 
> Linux code sets it when
>  pci_ari_enabled(pci_dev->bus) && PCI_SLOT(pci_dev->devfn)
> 
>  I _think_ pci_ari_enabled(pci_dev->bus) means ARI Forwarding is enabled
>  in the immediately upstream Downstream Port. Thus, I think the pci_dev
>  is an ARI-capable device, for PCIe spec 6.13 says:
> 
> It is strongly recommended that software in general Set the ARI
> Forwarding Enable bit in a Downstream Port only if software is certain
> that the device immediately below the Downstream Port is an ARI Device.
> If the bit is Set when a non-ARI Device is present, the non-ARI Device
> can respond to Configuration Space accesses under what it interprets as
> being different Device Numbers, and its Functions can be aliased under
> multiple Device Numbers, generally leading to undesired behavior.
> 
> and the pci_dev can't be an RC integrated endpoint. From another angle, it
> also means is_extfn won't be set for an RC integrated PF. Is that
> right?

Well, I'm not sure about the Linux parts here. Konrad, do you
happen to know? Or do you know someone who does?

>>I'd also suggest doing error handling not by ASSERT(), but by
>>checking physfn in the conditional expression.
> 
> do you mean this:
> devfn = (physfn && physfn->info.is_extfn) ? 0 : pdev->info.physfn.devfn;

Yes.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.9 v2] xen/livepatch: Don't crash on encountering STN_UNDEF relocations

2017-06-22 Thread Konrad Rzeszutek Wilk
On Wed, Jun 21, 2017 at 09:26:15PM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Jun 21, 2017 at 07:13:36PM +0100, Andrew Cooper wrote:
> > A symndx of STN_UNDEF is special, and means a symbol value of 0.  While
> > legitimate in the ELF standard, its existence in a livepatch is questionable
> > at best.  Until a plausible use case presents itself, reject such a 
> > relocation
> > with -EOPNOTSUPP.
> > 
> > Additionally, perform a safety check on elf->sym[symndx].sym before
> > dereferencing it, to avoid tripping over a NULL pointer when calculating val.
> > 
> > Signed-off-by: Andrew Cooper 
> 
> Reviewed-by: Konrad Rzeszutek Wilk 
> Tested-by: Konrad Rzeszutek Wilk  [x86 right now, 
> will do
> arm32 tomorrow]
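
The checks being described are of roughly this shape (an illustrative sketch
only, not the actual hunk; the exact messages and error codes are assumptions):

    if ( symndx == STN_UNDEF )
    {
        /* Legal per the ELF spec, but questionable in a livepatch. */
        printk(XENLOG_ERR "livepatch: %s: STN_UNDEF relocation\n", elf->name);
        return -EOPNOTSUPP;
    }

    if ( !elf->sym[symndx].sym )
    {
        /* Guard the dereference used to compute val. */
        printk(XENLOG_ERR "livepatch: %s: No symbol at index %u\n",
               elf->name, symndx);
        return -EINVAL;
    }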

I did that on my Cubietruck and I made the rookie mistake of not trying
a hypervisor _without_ your changes, so I don't know if this crash
(see inline) is due to your patch or something else.

Also I messed up and made the livepatch test run every time it boots, so
now it is stuck in a loop of crashes :-(

The git tree is:

git://xenbits.xen.org/people/konradwilk/xen.git staging-4.9

Stay tuned.


U-Boot SPL 2015.04 (Mar 14 2016 - 12:00:28)
DRAM: 2048 MiB
CPU: 91200Hz, AXI/AHB/APB: 3/2/2


U-Boot 2015.04 (Mar 14 2016 - 12:00:28) Allwinner Technology

CPU:   Allwinner A20 (SUN7I)
I2C:   ready
DRAM:  2 GiB
MMC:   SUNXI SD/MMC: 0
Setting up a 1024x768 vga console
In:serial
Out:   vga
Err:   vga
SCSI:  SUNXI SCSI INIT
SATA link 0 timeout.
AHCI 0001.0100 32 slots 1 ports 3 Gbps 0x1 impl SATA mode
flags: ncq stag pm led clo only pmp pio slum part ccc apst 
Net:   dwmac.1c5
starting USB...
USB0:   USB EHCI 1.00
scanning bus 0 for devices... 1 USB Device(s) found
USB1:   USB EHCI 1.00
scanning bus 1 for devices... 1 USB Device(s) found
   scanning usb for storage devices... 0 Storage Device(s) found
Hit any key to stop autoboot:  2  1  0 
switch to partitions #0, OK
mmc0 is current device
Scanning mmc 0:1...
Found U-Boot script /boot.scr
reading /boot.scr
1629 bytes read in 22 ms (72.3 KiB/s)
## Executing script at 4310
reading /xen
884744 bytes read in 72 ms (11.7 MiB/s)
reading /sun7i-a20-cubietruck.dtb
30801 bytes read in 42 ms (715.8 KiB/s)
reading /vmlinuz
5662136 bytes read in 382 ms (14.1 MiB/s)
Kernel image @ 0xaea0 [ 0x00 - 0x11b700 ]
## Flattened Device Tree blob at aec0
   Booting using the fdt blob at 0xaec0
   reserving fdt memory region: addr=aec0 size=8000
   Using Device Tree in place at aec0, end aec0afff

Starting kernel ...

 Xen 4.9-rc
(XEN) Xen version 4.9-rc (kon...@dumpdata.com) (arm-linux-gnueabihf-gcc 
(Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609) debug=y  Wed Jun 21 
21:55:01 EDT 2017
(XEN) Latest ChangeSet: Wed Jun 21 19:13:36 2017 +0100 git:e199fd6
(XEN) Processor: 410fc074: "ARM Limited", variant: 0x0, part 0xc07, rev 0x4
(XEN) 32-bit Execution:
(XEN)   Processor Features: 1131:00011011
(XEN) Instruction Sets: AArch32 A32 Thumb Thumb-2 ThumbEE Jazelle
(XEN) Extensions: GenericTimer Security
(XEN)   Debug Features: 02010555
(XEN)   Auxiliary Features: 
(XEN)   Memory Model Features: 10101105 4000 0124 02102211
(XEN)  ISA Features: 02101110 13112111 21232041 2131 10011142 
(XEN) Using PSCI-0.1 for SMP bringup
(XEN) SMP: Allowing 2 CPUs
(XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27 Freq: 24000 KHz
(XEN) GICv2: WARNING: The GICC size is too small: 0x1000 expected 0x2000
(XEN) GICv2 initialization:
(XEN) gic_dist_addr=01c81000
(XEN) gic_cpu_addr=01c82000
(XEN) gic_hyp_addr=01c84000
(XEN) gic_vcpu_addr=01c86000
(XEN) gic_maintenance_irq=25
(XEN) GICv2: 160 lines, 2 cpus, secure (IID 0100143b).
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Allocated console ring of 16 KiB.
(XEN) VFP implementer 0x41 architecture 2 part 0x30 variant 0x7 rev 0x4
(XEN) Bringing up CPU1
(XEN) CPU 1 booted.
(XEN) Brought up 2 CPUs
(XEN) P2M: 40-bit IPA
(XEN) P2M: 3 levels with order-1 root, VTCR 0x80003558
(XEN) I/O virtualisation disabled
(XEN) build-id: d406e500724be7c1443df04d783419bc70fa75b9
(XEN) alternatives: Patching with alt table 100c1464 -> 100c1494
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Loading kernel from boot module @ af60
(XEN) Allocating 1:1 mappings totalling 512MB for dom0:
(XEN) BANK[0] 0x006000-0x008000 (512MB)
(XEN) Grant table range: 0x00bfa0-0x00bfa6d000
(XEN) Loading zImage from af60 to 67a0-67f665b8
(XEN) Allocating PPI 16 for event channel interrupt
(XEN) Loading dom0 DTB to 0x6800-0x680072e0
(XEN) Scrubbing Free RAM on 1 nodes using 2 CPUs
(XEN) done.
(XEN) Initial low memory virq threshold set at 0x4000 pages.
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch 

[Xen-devel] [qemu-upstream-unstable test] 110938: regressions - FAIL

2017-06-22 Thread osstest service owner
flight 110938 qemu-upstream-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/110938/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-qemuu-win7-amd64 15 guest-localmigrate/x10 fail REGR. vs. 
106833
 test-armhf-armhf-xl-credit2 15 guest-start/debian.repeat fail REGR. vs. 106833

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds 15 guest-start/debian.repeat fail REGR. vs. 106833

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-raw 12 saverestore-support-check fail  like 106813
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-check fail  like 106833
 test-armhf-armhf-libvirt 13 saverestore-support-check fail  like 106833
 test-amd64-amd64-xl-qemuu-ws16-amd64  9 windows-install fail never pass
 test-amd64-amd64-libvirt 12 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-check fail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt  12 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  12 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  13 saverestore-support-check fail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-arm64-arm64-xl  12 migrate-support-check fail   never pass
 test-arm64-arm64-xl  13 saverestore-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 12 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 13 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit2  12 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit2  13 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-check fail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-check fail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-check fail   never pass
 test-armhf-armhf-xl  12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-check fail   never pass
 test-armhf-armhf-xl  13 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-check fail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-check fail   never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-check fail   never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-check fail   never pass
 test-amd64-i386-xl-qemuu-win10-i386  9 windows-install fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64  9 windows-install fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386  9 windows-install fail never pass

version targeted for testing:
 qemuu 414d069b38ab114b89085e44989bf57604ea86d7
baseline version:
 qemuu e97832ec6b2a7ddd48b8e6d1d848ffdfee6a31c7

Last test of basis   106833  2017-03-22 07:02:01 Z   92 days
Testing same since   110938  2017-06-21 15:39:52 Z    0 days    1 attempts


People who touched revisions under test:
  Anthony PERARD 
  Jan Beulich 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386  

Re: [Xen-devel] [PATCH net] xen-netback: correctly schedule rate-limited queues

2017-06-22 Thread David Miller
From: Wei Liu 
Date: Wed, 21 Jun 2017 10:21:22 +0100

> Add a flag to indicate if a queue is rate-limited. Test the flag in
> NAPI poll handler and avoid rescheduling the queue if true, otherwise
> we risk locking up the host. The rescheduling will be done in the
> timer callback function.
> 
> Reported-by: Jean-Louis Dupond 
> Signed-off-by: Wei Liu 
> Tested-by: Jean-Louis Dupond 

Applied.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] mini-os: use gzip -n

2017-06-22 Thread Andrew Cooper
On 22/06/17 15:09, Wei Liu wrote:
> Cc minios-devel and Samuel
>
> On Thu, Jun 22, 2017 at 03:40:26PM +0200, Bernhard M. Wiedemann wrote:
>> to not add current timestamp to
>> ioemu-stubdom.gz
>> pv-grub-x86_32.gz
>> pv-grub-x86_64.gz
>> xenstore-stubdom.gz
>>
>> to allow for reproducible builds
>>
>> Signed-off-by: Bernhard M. Wiedemann 
> Acked-by: Wei Liu 

Would it make sense to have a $(GZIP) in the same way as we abstract out
other programs, and export GZIP = gzip -n ?

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] live migration of HVM domUs with more than 32vcpus fails

2017-06-22 Thread Boris Ostrovsky
On 06/22/2017 10:39 AM, Olaf Hering wrote:
> On Thu, Jun 22, Konrad Rzeszutek Wilk wrote:
>
>> On Thu, Jun 22, 2017 at 03:57:52PM +0200, Olaf Hering wrote:
>>> It seems that live migration of HVM domUs with more than 32 vcpus causes
>>> a hang of the domU on the remote side. Both ping and 'xl console' show no
>>> reaction.
>>> This happens also with kernel-4.12. Is this a known bug?
>> Ankur had some patches for more than 32 vCPUs.
> Great, where can I get a copy?


They are queued for 4.13.
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git for-linus-4.13


-boris



signature.asc
Description: OpenPGP digital signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] live migration of HVM domUs with more than 32vcpus fails

2017-06-22 Thread Olaf Hering
On Thu, Jun 22, Konrad Rzeszutek Wilk wrote:

> On Thu, Jun 22, 2017 at 03:57:52PM +0200, Olaf Hering wrote:
> > It seems that live migration of HVM domUs with more than 32 vcpus causes
> > a hang of the domU on the remote side. Both ping and 'xl console' show no
> > reaction.
> > This happens also with kernel-4.12. Is this a known bug?
> 
> Ankur had some patches for more than 32 vCPUs.

Great, where can I get a copy?

Olaf


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] live migration of HVM domUs with more than 32vcpus fails

2017-06-22 Thread Konrad Rzeszutek Wilk
On Thu, Jun 22, 2017 at 03:57:52PM +0200, Olaf Hering wrote:
> It seems that live migration of HVM domUs with more than 32 vcpus causes
> a hang of the domU on the remote side. Both ping and 'xl console' show no
> reaction.
> This happens also with kernel-4.12. Is this a known bug?

Ankur had some patches for more than 32 vCPUs.

> 
> Olaf



> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] VT-d: fix VF of RC integrated endpoint matched to wrong VT-d unit

2017-06-22 Thread Chao Gao
On Thu, Jun 22, 2017 at 03:26:04AM -0600, Jan Beulich wrote:
 On 21.06.17 at 12:47,  wrote:
>> The problem is that for a VF of an RC integrated PF (e.g. the PF's BDF is
>> 00:02.0), we would wrongly use 00:00.0 to search for the VT-d unit.
>> 
>> To search the VT-d unit for a VF, the BDF of the PF is used. And if the
>> PF is an Extended Function, the BDF of one traditional function is
>> used.  The following line (from acpi_find_matched_drhd_unit()):
>> devfn = PCI_SLOT(pdev->info.physfn.devfn) ? 0 : pdev->info.physfn.devfn;
>> sets 'devfn' to 0 if the PF's devfn > 7. Apparently, it treats all
>> PFs which have devfn > 7 as extended functions. However, this is wrong for
>> an RC integrated PF, which is not ARI-capable but may have devfn > 7.
>
>I'm again having trouble with you talking about ARI and RC
>integrated here, but not checking for either in any way in the
>new code. Please make sure you establish the full connection
>in the description.

Sorry for this. Let me explain this again.

From SRIOV spec 3.7.3, it says:
"ARI is not applicable to Root Complex Integrated Endpoints; all other
SR-IOV Capable Devices (Devices that include at least one PF) shall
implement the ARI Capability in each Function."

So I _think_ PFs can be classified into two kinds: one is the RC integrated
PF and the other is the non-RC integrated PF. The former can't support ARI;
the latter shall support ARI. Only for extended functions should one
traditional function's BDF be used to search the VT-d unit. And
according to the PCIe spec, an Extended Function means, within an ARI Device,
a Function whose Function Number is greater than 7. So the former
can't be an extended function, while the latter is an extended function as
long as the PF's devfn > 7, which is exactly the check the original code
did. So I think the original code wasn't aware of the former
(aka RC integrated endpoints). This patch checks is_extfn
directly. All of this is only my understanding; I need your and Kevin's
help to decide whether it's right or not.

>
>> --- a/xen/drivers/passthrough/vtd/dmar.c
>> +++ b/xen/drivers/passthrough/vtd/dmar.c
>> @@ -218,8 +218,18 @@ struct acpi_drhd_unit 
>> *acpi_find_matched_drhd_unit(const 
>> struct pci_dev *pdev)
>>  }
>>  else if ( pdev->info.is_virtfn )
>>  {
>> +struct pci_dev *physfn;
>
>const
>
>>  bus = pdev->info.physfn.bus;
>> -devfn = PCI_SLOT(pdev->info.physfn.devfn) ? 0 : 
>> pdev->info.physfn.devfn;
>> +/*
>> + * Use 0 as 'devfn' to search VT-d unit when the physical function
>> + * is an Extended Function.
>> + */
>> +pcidevs_lock();
>> +physfn = pci_get_pdev(pdev->seg, bus, pdev->info.physfn.devfn);
>> +pcidevs_unlock();
>> +ASSERT(physfn);
>> +devfn = physfn->info.is_extfn ? 0 : pdev->info.physfn.devfn;
>
>This change looks to be fine if we assume that is_extfn is always
>set correctly. Looking at the Linux code setting it, I'm not sure
>though: I can't see any connection to the PF needing to be RC
>integrated there.

Linux code sets it when
 pci_ari_enabled(pci_dev->bus) && PCI_SLOT(pci_dev->devfn)

 I _think_ pci_ari_enabled(pci_dev->bus) means ARI Forwarding is enabled
 in the immediately upstream Downstream Port. Thus, I think the pci_dev
 is an ARI-capable device, for PCIe spec 6.13 says:

It is strongly recommended that software in general Set the ARI
Forwarding Enable bit in a Downstream Port only if software is certain
that the device immediately below the Downstream Port is an ARI Device.
If the bit is Set when a non-ARI Device is present, the non-ARI Device
can respond to Configuration Space accesses under what it interprets as
being different Device Numbers, and its Functions can be aliased under
multiple Device Numbers, generally leading to undesired behavior.

and the pci_dev can't be an RC integrated endpoint. From another angle, it
also means is_extfn won't be set for an RC integrated PF. Is that
right?

>
>I'd also suggest doing error handling not by ASSERT(), but by
>checking physfn in the conditional expression.

do you mean this:
devfn = (physfn && physfn->info.is_extfn) ? 0 : pdev->info.physfn.devfn;
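
If so, the whole hunk would then look roughly like this (a sketch only,
folding in the 'const' and the NULL check suggested above):

    else if ( pdev->info.is_virtfn )
    {
        const struct pci_dev *physfn;

        bus = pdev->info.physfn.bus;
        /*
         * Use devfn 0 to search the VT-d unit only when the PF really is
         * an Extended Function; an RC integrated PF keeps its own devfn.
         */
        pcidevs_lock();
        physfn = pci_get_pdev(pdev->seg, bus, pdev->info.physfn.devfn);
        pcidevs_unlock();
        devfn = (physfn && physfn->info.is_extfn) ? 0
                                                  : pdev->info.physfn.devfn;
    }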

Thanks
Chao

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

