Re: [Xen-devel] [PATCH v3 1/3] xen/flask: Drop the gen-policy.py script

2019-12-09 Thread Jan Beulich
On 09.12.2019 18:01, Andrew Cooper wrote:
> On 09/12/2019 13:38, Jan Beulich wrote:
>> On 07.12.2019 22:16, Andrew Cooper wrote:
>>> --- /dev/null
>>> +++ b/xen/xsm/flask/flask-policy.S
>>> @@ -0,0 +1,20 @@
>>> +.section .init.rodata, "a", %progbits
>>> +
>>> +/* const unsigned char xsm_flask_init_policy[] __initconst */
>>> +.align 4
>> I'm afraid .align is not universal enough to be used here - iirc
>> some architectures have it alias .p2align rather than (how e.g.
>> x86 behaves) .balign. Looks like .p2align is available in all
>> binutils versions we claim to be able to be built with. (I'm
>> not sure the one here is needed anyway, but the one below we
>> surely want.)
> 
> I can switch to p2align, but...
> 
>>
>>> +.global xsm_flask_init_policy
>>> +xsm_flask_init_policy:
>>> +.incbin "policy.bin"
>>> +.Lend:
>>> +
>>> +.type xsm_flask_init_policy, %object
>>> +.size xsm_flask_init_policy, . - xsm_flask_init_policy
>>> +
>>> +/* const unsigned int __initconst xsm_flask_init_policy_size */
>>> +.align 4
>>> +.global xsm_flask_init_policy_size
>>> +xsm_flask_init_policy_size:
>>> +.long .Lend - xsm_flask_init_policy
>> Similarly .long isn't really universal (various arches override
>> it in gas). Aiui .dc.l is intended to be portable (despite still
>> carrying the 'l' in its name, and despite even this one getting
>> overridden by two arches). But perhaps best to ask on the
>> binutils list.
> 
> ... this is not a clear or obvious way to go, not least because it makes
> a different expectation that int will never change from being 32 bits. 
> At least .long will work even if it becomes longer in a future toolchain.

There are a few targets where .long (and .int) appear to produce 2-byte
values (at the first glance, i.e. without checking very closely).

> What is used here doesn't need to be universal - it only needs to work
> for the architectures we support.

But it also would better not break silently for some future port. How
about an equivalent to Linux'es _ASM_PTR() (say ASM_WORD()), which each
architecture has to supply explicitly?

> If hand writing an asm file isn't considered good enough, then the other
> options are a C file with inline asm incbin, or `objdump
> --rename-section`.  The latter one would require a few changes elsewhere
> in the code, but only for linkage purposes.

I'm entirely fine with an assembler source here, it just needs a little
more polishing imo.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] docs/sphinx: How Xen Boots on x86

2019-12-09 Thread Jan Beulich
On 09.12.2019 17:42, Andrew Cooper wrote:
> On 09/12/2019 15:20, Jan Beulich wrote:
>> On 06.12.2019 20:34, Andrew Cooper wrote:
>>> +Objects
>>> +~~~
>>> +
>>> +To begin with, most object files are compiled and linked.  This includes 
>>> the
>>> +Multiboot 1 and 2 headers and entrypoints, including the Multiboot 2 tags 
>>> for
>>> +EFI extensions.  When ``CONFIG_PVH_GUEST`` is selected at build time, this
>>> +includes the PVH entrypoint and associated ELF notes.
>>> +
>>> +Depending on whether the compiler supports ``__attribute__((__ms_abi__))`` 
>>> or
>>> +not, either an EFI stub is included which nops/fails applicable setup 
>>> calls,
>>> +or full EFI support is included.
>> Perhaps also mention that the linker needs to support the necessary
>> binary output format? And perhaps "setup and runtime calls"?
> 
> Link time behaviour is (deliberately) in a later section.

I realize(d) this, but the statement above is simply not true without
also mentioning required linker capabilities: The object files won't
have "full EFI support included" in this case. So I'd expect a "see
also" here at the very least.

>>> +Protocols and entrypoints
>>> +~
>>> +
>>> +All headers and tags are built in ``xen/arch/x86/boot/head.S``
>>> +
>>> +The Multiboot 1 headers request aligned modules and memory information.  
>>> Entry
>>> +is via the start of the binary image, which is the ``start`` symbol.  This
>>> +entrypoint must be started in 32bit mode.
>>> +
>>> +The Multiboot 2 headers are more flexible, and in addition request that the
>>> +image be loaded as high as possible below the 4G boundary, with 2M 
>>> alignment.
>>> +Entry is still via the ``start`` symbol as with MB1.
>> Perhaps explicitly (re)state this is in 32-bit mode?
>>
>>> +Headers for the EFI MB2 extensions are also present.  These request that
>>> +``ExitBootServices()`` not be called, and register ``__efi_mb2_start`` as 
>>> an
>>> +alternative entrypoint, entered in 64bit mode.
>>> +
>>> +If ``CONFIG_PVH_GUEST`` was selected at build time, an Elf note is included
>>> +which indicates the ability to use the PVH boot protocol, and registers
>>> +``__pvh_start`` as the entrypoint, entered in 32bit mode.
>>> +
>>> +
>>> +xen.gz
>>> +~~
>>> +
>>> +The objects are linked together to form ``xen-syms`` which is an ELF64
>>> +executable with full debugging symbols.  ``xen.gz`` is formed by stripping
>>> +``xen-syms``, then repackaging the result as an ELF32 object with a single
>>> +load section at 2MB, and ``gzip``-ing the result.  Despite the ELF32 
>>> having a
>>> +fixed load address, its contents are relocatable.
>> This is a little ambiguous I guess - most of the code is PIC and as
>> such relocatable, but not in a way a boot loader could arrange for.
> 
> I don't follow your concern.
> 
> Everything which needs to be is position independent (subject to being
> loaded on a 2M boundary IIRC), and this property is requested by the MB2
> header.

Oh, sorry, it had been too many years of sym_phys() before it became
sym_offs(). You're right.

>>> +Any bootloader which unzips the binary and follows the ELF headers will 
>>> place
>>> +it at the 2M boundary and jump to ``start`` which is the identified entry
>>> +point.  However, Xen depends on being entered with the MB1 or MB2 
>>> protocols,
>>> +and will terminate otherwise.
>>> +
>>> +The MB2+EFI entrypoint depends on being entered with the MB2 protocol, and
>>> +will terminate if the entry protocol is wrong, or if EFI details aren't
>>> +provided, or if EFI Boot Services are not available.
>>> +
>>> +
>>> +xen.efi
>>> +~~~
>>> +
>>> +When a PEI-capable toolchain is found, the objects are linked together and 
>>> a
>>> +PE64 binary is created.  It can be run directly from the EFI shell, and has
>> I think it's commonly called PE32+, not PE64.
> 
> Ok., because by definition, it can stack.

How does stacking come into play here?

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] x86 / iommu: set up a scratch page in the quarantine domain

2019-12-09 Thread Tian, Kevin
> From: Jan Beulich 
> Sent: Tuesday, December 3, 2019 5:36 PM
> 
> On 28.11.2019 12:32, Jürgen Groß wrote:
> > On 28.11.19 12:17, Jan Beulich wrote:
> >> On 27.11.2019 18:11, Paul Durrant wrote:
> >>> This patch introduces a new iommu_op to facilitate a per-
> implementation
> >>> quarantine set up, and then further code for x86 implementations
> >>> (amd and vtd) to set up a read-only scratch page to serve as the source
> >>> for DMA reads whilst a device is assigned to dom_io. DMA writes will
> >>> continue to fault as before.
> >>>
> >>> The reason for doing this is that some hardware may continue to re-try
> >>> DMA (despite FLR) in the event of an error, or even BME being cleared,
> and
> >>> will fail to deal with DMA read faults gracefully. Having a scratch page
> >>> mapped will allow pending DMA reads to complete and thus such buggy
> >>> hardware will eventually be quiesced.
> >>>
> >>> NOTE: These modifications are restricted to x86 implementations only as
> >>>the buggy h/w I am aware of is only used with Xen in an x86
> >>>environment. ARM may require similar code but, since I am not
> >>>aware of the need, this patch does not modify any ARM
> implementation.
> >>>
> >>> Signed-off-by: Paul Durrant 
> >>
> >> Reviewed-by: Jan Beulich 
> >>
> >>> There is still the open question of whether use of a scratch page ought
> >>> to be gated on something, either are run-time or compile-time.
> >>
> >> I have no clear opinion either way here. The workaround seems low
> >> overhead enough that there may not be a need to have an admin (or
> >> build time) control for this.
> >>
> >> As to 4.13: The quarantining as a whole is pretty fresh. While it
> >> has been backported to security maintained trees, I'd still consider
> >> it a new feature in 4.13, and hence this workaround at least eligible
> >> for consideration.
> >
> > I agree.
> >
> > Release-acked-by: Juergen Gross 
> 
> I notice this has been committed meanwhile. I had specifically not
> done so due to the still missing VT-d ack, seeing that this wasn't
> an entirely "trivial" change.
> 

While the quarantine idea sounds good overall, I'm still not convinced
to have it the only way in place just for handling some known-buggy
device. It kills the possibility of identifying a new buggy device and then 
deciding not to use it in the first space... I thought about whether it
will get better when future IOMMU implements A/D bit - by checking
access bit being set then we'll know some buggy device exists, but, 
the scratch page is shared by all devices then we cannot rely on this 
feature to find out the actual buggy one.

Thanks
Kevin
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] xen/blkfront: Adjust indentation in xlvbd_alloc_gendisk

2019-12-09 Thread Jürgen Groß

On 09.12.19 21:14, Nathan Chancellor wrote:

Clang warns:

../drivers/block/xen-blkfront.c:1117:4: warning: misleading indentation;
statement is not part of the previous 'if' [-Wmisleading-indentation]
 nr_parts = PARTS_PER_DISK;
 ^
../drivers/block/xen-blkfront.c:1115:3: note: previous statement is here
 if (err)
 ^

This is because there is a space at the beginning of this line; remove
it so that the indentation is consistent according to the Linux kernel
coding style and clang no longer warns.

While we are here, the previous line has some trailing whitespace; clean
that up as well.

Fixes: c80a420995e7 ("xen-blkfront: handle Xen major numbers other than XENVBD")
Link: https://github.com/ClangBuiltLinux/linux/issues/791
Signed-off-by: Nathan Chancellor 


Reviewed-by: Juergen Gross 


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/svm: Fix handling of EFLAGS.RF on task switch

2019-12-09 Thread Jürgen Groß

On 03.12.19 23:30, Andrew Cooper wrote:

VT-x updates RF before vmexit, so eflags written into the outgoing TSS happens
to be correct.  SVM does not update RF before vmexit, and instead provides it
via a bit in exitinfo2.

In practice, needing RF set in the outgoing state occurs when a task gate is
used to handle faults.

Extend hvm_task_switch() with an extra_eflags parameter which gets fed into
the outgoing TSS, and fill it in suitably from the SVM vmexit information.

Signed-off-by: Andrew Cooper 


Release-acked-by: Juergen Gross 


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/svm: Fix handling of EFLAGS.RF on task switch

2019-12-09 Thread Tian, Kevin
> From: Andrew Cooper 
> Sent: Wednesday, December 4, 2019 6:31 AM
> 
> VT-x updates RF before vmexit, so eflags written into the outgoing TSS
> happens
> to be correct.  SVM does not update RF before vmexit, and instead provides
> it
> via a bit in exitinfo2.
> 
> In practice, needing RF set in the outgoing state occurs when a task gate is
> used to handle faults.
> 
> Extend hvm_task_switch() with an extra_eflags parameter which gets fed
> into
> the outgoing TSS, and fill it in suitably from the SVM vmexit information.
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Jun Nakajima 
> CC: Kevin Tian 
> CC: Juergen Gross 
> 
> Kevin: There is no help in the SDM about this.  RF is not mentioned in the
> list of state either modified or unmodified by hardware on a task switch
> vmexit.  This conclusion has been drawn from looking at the actual VMExit
> state given an XTF test poking every corner of TASK_SWITCH VMExits.

Here is what I observed in latest SDM (Oct. 2019):

27.3.3 Saving RIP, RSP, RFLAGS, and SSP
...
With the exception of the resume flag (RF; bit 16), the contents 
of the RFLAGS register is saved into the RFLAGS field. RFLAGS.RF 
is saved as follows:
...
- If the VM exit is caused by a task switch (including one caused 
by a task gate in the IDT), the value saved is that which would 
have been saved in the RFLAGS image in the old task-state 
segment (TSS) had the task switch completed normally without 
exception.
...

Based on that, fox vmx part:

Reviewed-by: Kevin Tian 

> 
> Juergen: I know its getting stupidly late in the day, but this, like the
> previous fixes, want backporting.  OTOH, the likelihood of not fixing it
> causing harm to VMs is minimal, unlike the earlier task switch fixes.
> ---
>  xen/arch/x86/hvm/hvm.c| 4 ++--
>  xen/arch/x86/hvm/svm/svm.c| 3 ++-
>  xen/arch/x86/hvm/vmx/vmx.c| 3 ++-
>  xen/include/asm-x86/hvm/hvm.h | 2 +-
>  4 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 7f556171bd..47573f71b8 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2913,7 +2913,7 @@ void hvm_prepare_vm86_tss(struct vcpu *v,
> uint32_t base, uint32_t limit)
> 
>  void hvm_task_switch(
>  uint16_t tss_sel, enum hvm_task_switch_reason taskswitch_reason,
> -int32_t errcode, unsigned int insn_len)
> +int32_t errcode, unsigned int insn_len, unsigned int extra_eflags)
>  {
>  struct vcpu *v = current;
>  struct cpu_user_regs *regs = guest_cpu_user_regs();
> @@ -2988,7 +2988,7 @@ void hvm_task_switch(
>  eflags &= ~X86_EFLAGS_NT;
> 
>  tss.eip= regs->eip + insn_len;
> -tss.eflags = eflags;
> +tss.eflags = eflags | extra_eflags;
>  tss.eax= regs->eax;
>  tss.ecx= regs->ecx;
>  tss.edx= regs->edx;
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index 0fb1908c18..6ae43999ff 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -2812,7 +2812,8 @@ void svm_vmexit_handler(struct cpu_user_regs
> *regs)
>  if ( (vmcb->exitinfo2 >> 44) & 1 )
>  errcode = (uint32_t)vmcb->exitinfo2;
> 
> -hvm_task_switch(vmcb->exitinfo1, reason, errcode, insn_len);
> +hvm_task_switch(vmcb->exitinfo1, reason, errcode, insn_len,
> +(vmcb->exitinfo2 & (1ul << 48)) ? X86_EFLAGS_RF : 0);
>  break;
>  }
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 7450cbe40d..bafc3b30c5 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -3963,7 +3963,8 @@ void vmx_vmexit_handler(struct cpu_user_regs
> *regs)
>  else
>   ecode = -1;
> 
> -hvm_task_switch(exit_qualification, reasons[source], ecode, 
> inst_len);
> +hvm_task_switch(exit_qualification, reasons[source], ecode, inst_len,
> +0 /* EFLAGS.RF already updated. */);
>  break;
>  }
>  case EXIT_REASON_CPUID:
> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-
> x86/hvm/hvm.h
> index 17fb7efa6e..1d7b66f927 100644
> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -296,7 +296,7 @@ void hvm_set_rdtsc_exiting(struct domain *d, bool_t
> enable);
>  enum hvm_task_switch_reason { TSW_jmp, TSW_iret, TSW_call_or_int };
>  void hvm_task_switch(
>  uint16_t tss_sel, enum hvm_task_switch_reason taskswitch_reason,
> -int32_t errcode, unsigned int insn_len);
> +int32_t errcode, unsigned int insn_len, unsigned int extra_eflags);
> 
>  enum hvm_access_type {
>  hvm_access_insn_fetch,
> --
> 2.11.0

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [ovmf test] 144646: regressions - FAIL

2019-12-09 Thread osstest service owner
flight 144646 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144646/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64   6 xen-buildfail REGR. vs. 144637
 build-i386-xsm6 xen-buildfail REGR. vs. 144637
 build-amd64-xsm   6 xen-buildfail REGR. vs. 144637
 build-i3866 xen-buildfail REGR. vs. 144637

Tests which did not succeed, but are not blocking:
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a

version targeted for testing:
 ovmf 0c3e8e9947a6c13b4327dd11b20acb95441701cf
baseline version:
 ovmf 804666c86e7b6f04fe5c5cfdb13199c19e0e99b0

Last test of basis   144637  2019-12-09 09:09:49 Z0 days
Testing same since   144646  2019-12-10 01:39:53 Z0 days1 attempts


People who touched revisions under test:
  Bob Feng 

jobs:
 build-amd64-xsm  fail
 build-i386-xsm   fail
 build-amd64  fail
 build-i386   fail
 build-amd64-libvirt  blocked 
 build-i386-libvirt   blocked 
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked 
 test-amd64-i386-xl-qemuu-ovmf-amd64  blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.


commit 0c3e8e9947a6c13b4327dd11b20acb95441701cf
Author: Bob Feng 
Date:   Wed Nov 20 10:58:30 2019 +0800

BaseTools: Enhance Basetool for incremental build

BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2311

Include dependency file in Makefile to enhance
incremental build

Signed-off-by: Bob Feng 

Cc: Liming Gao 
Cc: Steven Shi 
Reviewed-by: Liming Gao 

commit cb277815d5ea92718eed2d334641451ce65b0ff5
Author: Bob Feng 
Date:   Mon Dec 2 16:25:32 2019 +0800

BaseTools: Update build_rule.txt to generate dependent files.

BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2311

Enable the dependent files generation function for compilers
and Trim tool.

Signed-off-by: Bob Feng 

Cc: Liming Gao 
Cc: Steven Shi 
Reviewed-by: Liming Gao 

commit e6edbe315fc3fbd02783cb4faa9284f8d05c410d
Author: Bob Feng 
Date:   Wed Nov 20 10:58:28 2019 +0800

BaseTools: Generate dependent files for ASL and ASM files

BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2311

Implement the function in Trim tool to get the included
file list for ASL and ASM file.

Signed-off-by: Bob Feng 

Cc: Liming Gao 
Cc: Steven Shi 
Reviewed-by: Liming Gao 

commit 13c5e34a1b8bfedbd10ea038cfcbae5caeab6652
Author: Bob Feng 
Date:   Mon Dec 2 16:24:19 2019 +0800

BaseTools: Add build option for dependency file generation

BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2311

Add /showIncludes for msvc and -MMD -MF $@.deps
for GCC and CLANG

Remove /MP for msvc since /MP does not work with
/showIncludes

Signed-off-by: Bob Feng 

Cc: Liming Gao 
Cc: Steven Shi 
Cc: Michael D Kinney 
Reviewed-by: Liming Gao 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 1/2] xenbus/backend: Add memory pressure handler callback

2019-12-09 Thread SeongJae Park
On Tue, Dec 10, 2019 at 7:23 AM Jürgen Groß  wrote:
>
> On 09.12.19 20:43, SeongJae Park wrote:
> > From: SeongJae Park 
> >
> > Granting pages consumes backend system memory.  In systems configured
> > with insufficient spare memory for those pages, it can cause a memory
> > pressure situation.  However, finding the optimal amount of the spare
> > memory is challenging for large systems having dynamic resource
> > utilization patterns.  Also, such a static configuration might lacks a
> > flexibility.
> >
> > To mitigate such problems, this commit adds a memory reclaim callback to
> > 'xenbus_driver'.  Using this facility, 'xenbus' would be able to monitor
> > a memory pressure and request specific domains of specific backend
> > drivers which causing the given pressure to voluntarily release its
> > memory.
> >
> > That said, this commit simply requests every callback registered driver
> > to release its memory for every domain, rather than issueing the
> > requests to the drivers and domain in charge.  Such things would be a
> > future work.  Also, this commit focuses on memory only.  However, it
> > would be ablt to be extended for general resources.
> >
> > Signed-off-by: SeongJae Park 
> > ---
> >   drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++
> >   include/xen/xenbus.h  |  1 +
> >   2 files changed, 32 insertions(+)
> >
> > diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
> > b/drivers/xen/xenbus/xenbus_probe_backend.c
> > index b0bed4faf44c..cd5fd1cd8de3 100644
> > --- a/drivers/xen/xenbus/xenbus_probe_backend.c
> > +++ b/drivers/xen/xenbus/xenbus_probe_backend.c
> > @@ -248,6 +248,34 @@ static int backend_probe_and_watch(struct 
> > notifier_block *notifier,
> >   return NOTIFY_DONE;
> >   }
> >
> > +static int xenbus_backend_reclaim(struct device *dev, void *data)
> > +{
> > + struct xenbus_driver *drv;
> > + if (!dev->driver)
> > + return -ENOENT;
> > + drv = to_xenbus_driver(dev->driver);
> > + if (drv && drv->reclaim)
> > + drv->reclaim(to_xenbus_device(dev), DOMID_INVALID);
>
> Oh, sorry for first requesting you to add the domid as a parameter,
> but now I realize this could be handled in the xenbus driver, as
> struct xenbus_device already contains the otherend_id.
>
> Would you mind dropping the parameter again, please?

Oh, I also missed it!  Will do!


Thanks,
SeongJae Park

>
>
> Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 1/2] xenbus/backend: Add memory pressure handler callback

2019-12-09 Thread SeongJae Park
On Tue, Dec 10, 2019 at 7:11 AM Jürgen Groß  wrote:
>
> On 09.12.19 20:43, SeongJae Park wrote:
> > From: SeongJae Park 
> >
> > Granting pages consumes backend system memory.  In systems configured
> > with insufficient spare memory for those pages, it can cause a memory
> > pressure situation.  However, finding the optimal amount of the spare
> > memory is challenging for large systems having dynamic resource
> > utilization patterns.  Also, such a static configuration might lacks a
> > flexibility.
> >
> > To mitigate such problems, this commit adds a memory reclaim callback to
> > 'xenbus_driver'.  Using this facility, 'xenbus' would be able to monitor
> > a memory pressure and request specific domains of specific backend
> > drivers which causing the given pressure to voluntarily release its
> > memory.
> >
> > That said, this commit simply requests every callback registered driver
> > to release its memory for every domain, rather than issueing the
> > requests to the drivers and domain in charge.  Such things would be a
> > future work.  Also, this commit focuses on memory only.  However, it
> > would be ablt to be extended for general resources.
> >
> > Signed-off-by: SeongJae Park 
> > ---
> >   drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++
> >   include/xen/xenbus.h  |  1 +
> >   2 files changed, 32 insertions(+)
> >
> > diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
> > b/drivers/xen/xenbus/xenbus_probe_backend.c
> > index b0bed4faf44c..cd5fd1cd8de3 100644
> > --- a/drivers/xen/xenbus/xenbus_probe_backend.c
> > +++ b/drivers/xen/xenbus/xenbus_probe_backend.c
> > @@ -248,6 +248,34 @@ static int backend_probe_and_watch(struct 
> > notifier_block *notifier,
> >   return NOTIFY_DONE;
> >   }
> >
> > +static int xenbus_backend_reclaim(struct device *dev, void *data)
> > +{
> > + struct xenbus_driver *drv;
> > + if (!dev->driver)
> > + return -ENOENT;
> > + drv = to_xenbus_driver(dev->driver);
> > + if (drv && drv->reclaim)
> > + drv->reclaim(to_xenbus_device(dev), DOMID_INVALID);
> > + return 0;
> > +}
> > +
> > +/*
> > + * Returns 0 always because we are using shrinker to only detect memory
> > + * pressure.
> > + */
> > +static unsigned long xenbus_backend_shrink_count(struct shrinker *shrinker,
> > + struct shrink_control *sc)
> > +{
> > + bus_for_each_dev(_backend.bus, NULL, NULL,
> > + xenbus_backend_reclaim);
> > + return 0;
> > +}
> > +
> > +static struct shrinker xenbus_backend_shrinker = {
> > + .count_objects = xenbus_backend_shrink_count,
> > + .seeks = DEFAULT_SEEKS,
> > +};
> > +
> >   static int __init xenbus_probe_backend_init(void)
> >   {
> >   static struct notifier_block xenstore_notifier = {
> > @@ -264,6 +292,9 @@ static int __init xenbus_probe_backend_init(void)
> >
> >   register_xenstore_notifier(_notifier);
> >
> > + if (register_shrinker(_backend_shrinker))
> > + pr_warn("shrinker registration failed\n");
> > +
> >   return 0;
> >   }
> >   subsys_initcall(xenbus_probe_backend_init);
> > diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
> > index 869c816d5f8c..52aaf4f78400 100644
> > --- a/include/xen/xenbus.h
> > +++ b/include/xen/xenbus.h
> > @@ -104,6 +104,7 @@ struct xenbus_driver {
> >   struct device_driver driver;
> >   int (*read_otherend_details)(struct xenbus_device *dev);
> >   int (*is_ready)(struct xenbus_device *dev);
> > + unsigned (*reclaim)(struct xenbus_device *dev, domid_t domid);
>
> Can you please add a comment here regarding semantics of specifying
> DOMID_INVALID as domid?

Yes, of course.  Will do with the next version.


Thanks,
SeongJae Park

>
> Block maintainers, would you be fine with me carrying this series
> through the Xen tree?
>
>
> Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 2/2] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-09 Thread Jürgen Groß

On 09.12.19 20:43, SeongJae Park wrote:

From: SeongJae Park 

Each `blkif` has a free pages pool for the grant mapping.  The size of
the pool starts from zero and be increased on demand while processing
the I/O requests.  If current I/O requests handling is finished or 100
milliseconds has passed since last I/O requests handling, it checks and
shrinks the pool to not exceed the size limit, `max_buffer_pages`.

Therefore, `blkfront` running guests can cause a memory pressure in the
`blkback` running guest by attaching a large number of block devices and
inducing I/O.  System administrators can avoid such problematic
situations by limiting the maximum number of devices each guest can
attach.  However, finding the optimal limit is not so easy.  Improper
set of the limit can results in the memory pressure or a resource
underutilization.  This commit avoids such problematic situations by
squeezing the pools (returns every free page in the pool to the system)
for a while (users can set this duration via a module parameter) if a
memory pressure is detected.

Discussions
===

The `blkback`'s original shrinking mechanism returns only pages in the
pool, which are not currently be used by `blkback`, to the system.  In
other words, the pages are not mapped with foreign pages.  Because this
commit is changing only the shrink limit but uses the mechanism as is,
this commit does not introduce improper mappings related security
issues.

Once a memory pressure is detected, this commit keeps the squeezing
limit for a user-specified time duration.  The duration should be
neither too long nor too short.  If it is too long, the squeezing
incurring overhead can reduce the I/O performance.  If it is too short,
`blkback` will not free enough pages to reduce the memory pressure.
This commit sets the value as `10 milliseconds` by default because it is
a short time in terms of I/O while it is a long time in terms of memory
operations.  Also, as the original shrinking mechanism works for at
least every 100 milliseconds, this could be a somewhat reasonable
choice.  I also tested other durations (refer to the below section for
more details) and confirmed that 10 milliseconds is the one that works
best with the test.  That said, the proper duration depends on actual
configurations and workloads.  That's why this commit is allowing users
to set it as their optimal value via the module parameter.

Memory Pressure Test


To show how this commit fixes the memory pressure situation well, I
configured a test environment on a xen-running virtualization system.
On the `blkfront` running guest instances, I attach a large number of
network-backed volume devices and induce I/O to those.  Meanwhile, I
measure the number of pages that swapped in and out on the `blkback`
running guest.  The test ran twice, once for the `blkback` before this
commit and once for that after this commit.  As shown below, this commit
has dramatically reduced the memory pressure:

 pswpin  pswpout
 before  76,672  185,799
 after  2123,325

Optimal Aggressive Shrinking Duration
-

To find a best squeezing duration, I repeated the test with three
different durations (1ms, 10ms, and 100ms).  The results are as below:

 durationpswpin  pswpout
 1   852 6,424
 10  212 3,325
 100 203 3,340

As expected, the memory pressure has decreased as the duration is
increased, but the reduction stopped from the `10ms`.  Based on this
results, I chose the default duration as 10ms.

Performance Overhead Test
=

This commit could incur I/O performance degradation under severe memory
pressure because the squeezing will require more page allocations per
I/O.  To show the overhead, I artificially made a worst-case squeezing
situation and measured the I/O performance of a `blkfront` running
guest.

For the artificial squeezing, I set the `blkback.max_buffer_pages` using
the `/sys/module/xen_blkback/parameters/max_buffer_pages` file.  We set
the value to `1024` and `0`.  The `1024` is the default value.  Setting
the value as `0` is same to a situation doing the squeezing always
(worst-case).

For the I/O performance measurement, I use a simple `dd` command.

Default Performance
---

 [dom0]# echo 1024 > /sys/module/xen_blkback/parameters/max_buffer_pages
 [instance]$ for i in {1..5}; do dd if=/dev/zero of=file bs=4k 
count=$((256*512)); sync; done
 131072+0 records in
 131072+0 records out
 536870912 bytes (537 MB) copied, 11.7257 s, 45.8 MB/s
 131072+0 records in
 131072+0 records out
 536870912 bytes (537 MB) copied, 13.8827 s, 38.7 MB/s
 131072+0 records in
 131072+0 records out
 536870912 bytes (537 MB) copied, 13.8781 s, 38.7 MB/s
 131072+0 records in
 131072+0 records out
 536870912 bytes (537 MB) copied, 13.8737 s, 38.7 MB/s
 

Re: [Xen-devel] [PATCH v4 1/2] xenbus/backend: Add memory pressure handler callback

2019-12-09 Thread Jürgen Groß

On 09.12.19 20:43, SeongJae Park wrote:

From: SeongJae Park 

Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lacks a
flexibility.

To mitigate such problems, this commit adds a memory reclaim callback to
'xenbus_driver'.  Using this facility, 'xenbus' would be able to monitor
a memory pressure and request specific domains of specific backend
drivers which causing the given pressure to voluntarily release its
memory.

That said, this commit simply requests every callback registered driver
to release its memory for every domain, rather than issueing the
requests to the drivers and domain in charge.  Such things would be a
future work.  Also, this commit focuses on memory only.  However, it
would be ablt to be extended for general resources.

Signed-off-by: SeongJae Park 
---
  drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++
  include/xen/xenbus.h  |  1 +
  2 files changed, 32 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
b/drivers/xen/xenbus/xenbus_probe_backend.c
index b0bed4faf44c..cd5fd1cd8de3 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -248,6 +248,34 @@ static int backend_probe_and_watch(struct notifier_block 
*notifier,
return NOTIFY_DONE;
  }
  
+static int xenbus_backend_reclaim(struct device *dev, void *data)

+{
+   struct xenbus_driver *drv;
+   if (!dev->driver)
+   return -ENOENT;
+   drv = to_xenbus_driver(dev->driver);
+   if (drv && drv->reclaim)
+   drv->reclaim(to_xenbus_device(dev), DOMID_INVALID);


Oh, sorry for first requesting you to add the domid as a parameter,
but now I realize this could be handled in the xenbus driver, as
struct xenbus_device already contains the otherend_id.

Would you mind dropping the parameter again, please?


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 1/2] xenbus/backend: Add memory pressure handler callback

2019-12-09 Thread Jürgen Groß

On 09.12.19 20:43, SeongJae Park wrote:

From: SeongJae Park 

Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lacks a
flexibility.

To mitigate such problems, this commit adds a memory reclaim callback to
'xenbus_driver'.  Using this facility, 'xenbus' would be able to monitor
a memory pressure and request specific domains of specific backend
drivers which causing the given pressure to voluntarily release its
memory.

That said, this commit simply requests every callback registered driver
to release its memory for every domain, rather than issueing the
requests to the drivers and domain in charge.  Such things would be a
future work.  Also, this commit focuses on memory only.  However, it
would be ablt to be extended for general resources.

Signed-off-by: SeongJae Park 
---
  drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++
  include/xen/xenbus.h  |  1 +
  2 files changed, 32 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
b/drivers/xen/xenbus/xenbus_probe_backend.c
index b0bed4faf44c..cd5fd1cd8de3 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -248,6 +248,34 @@ static int backend_probe_and_watch(struct notifier_block 
*notifier,
return NOTIFY_DONE;
  }
  
+static int xenbus_backend_reclaim(struct device *dev, void *data)

+{
+   struct xenbus_driver *drv;
+   if (!dev->driver)
+   return -ENOENT;
+   drv = to_xenbus_driver(dev->driver);
+   if (drv && drv->reclaim)
+   drv->reclaim(to_xenbus_device(dev), DOMID_INVALID);
+   return 0;
+}
+
+/*
+ * Returns 0 always because we are using shrinker to only detect memory
+ * pressure.
+ */
+static unsigned long xenbus_backend_shrink_count(struct shrinker *shrinker,
+   struct shrink_control *sc)
+{
+   bus_for_each_dev(_backend.bus, NULL, NULL,
+   xenbus_backend_reclaim);
+   return 0;
+}
+
+static struct shrinker xenbus_backend_shrinker = {
+   .count_objects = xenbus_backend_shrink_count,
+   .seeks = DEFAULT_SEEKS,
+};
+
  static int __init xenbus_probe_backend_init(void)
  {
static struct notifier_block xenstore_notifier = {
@@ -264,6 +292,9 @@ static int __init xenbus_probe_backend_init(void)
  
  	register_xenstore_notifier(_notifier);
  
+	if (register_shrinker(_backend_shrinker))

+   pr_warn("shrinker registration failed\n");
+
return 0;
  }
  subsys_initcall(xenbus_probe_backend_init);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..52aaf4f78400 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -104,6 +104,7 @@ struct xenbus_driver {
struct device_driver driver;
int (*read_otherend_details)(struct xenbus_device *dev);
int (*is_ready)(struct xenbus_device *dev);
+   unsigned (*reclaim)(struct xenbus_device *dev, domid_t domid);


Can you please add a comment here regarding semantics of specifying
DOMID_INVALID as domid?

Block maintainers, would you be fine with me carrying this series
through the Xen tree?


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] xen/blkfront: Adjust indentation in xlvbd_alloc_gendisk

2019-12-09 Thread Jürgen Groß

On 09.12.19 22:07, Nick Desaulniers wrote:

On Mon, Dec 9, 2019 at 12:14 PM Nathan Chancellor
 wrote:


Clang warns:

../drivers/block/xen-blkfront.c:1117:4: warning: misleading indentation;
statement is not part of the previous 'if' [-Wmisleading-indentation]
 nr_parts = PARTS_PER_DISK;
 ^
../drivers/block/xen-blkfront.c:1115:3: note: previous statement is here
 if (err)
 ^

This is because there is a space at the beginning of this line; remove
it so that the indentation is consistent according to the Linux kernel
coding style and clang no longer warns.

While we are here, the previous line has some trailing whitespace; clean
that up as well.

Fixes: c80a420995e7 ("xen-blkfront: handle Xen major numbers other than XENVBD")
Link: https://github.com/ClangBuiltLinux/linux/issues/791
Signed-off-by: Nathan Chancellor 
---
  drivers/block/xen-blkfront.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index a74d03913822..c02be06c5299 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1113,8 +1113,8 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,


While you're here, would you please also removing the single space
before the labels in this function?


AFAIK those are intended to be there.

Having labels indented by a space makes diff not believe those are
function declarations. So a patching a function with a label won't show
the label, but the function in the diff block header.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH] xen/blkfront: Adjust indentation in xlvbd_alloc_gendisk

2019-12-09 Thread Nathan Chancellor
Clang warns:

../drivers/block/xen-blkfront.c:1117:4: warning: misleading indentation;
statement is not part of the previous 'if' [-Wmisleading-indentation]
nr_parts = PARTS_PER_DISK;
^
../drivers/block/xen-blkfront.c:1115:3: note: previous statement is here
if (err)
^

This is because there is a space at the beginning of this line; remove
it so that the indentation is consistent according to the Linux kernel
coding style and clang no longer warns.

While we are here, the previous line has some trailing whitespace; clean
that up as well.

Fixes: c80a420995e7 ("xen-blkfront: handle Xen major numbers other than XENVBD")
Link: https://github.com/ClangBuiltLinux/linux/issues/791
Signed-off-by: Nathan Chancellor 
---
 drivers/block/xen-blkfront.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index a74d03913822..c02be06c5299 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1113,8 +1113,8 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
if (!VDEV_IS_EXTENDED(info->vdevice)) {
err = xen_translate_vdev(info->vdevice, , );
if (err)
-   return err; 
-   nr_parts = PARTS_PER_DISK;
+   return err;
+   nr_parts = PARTS_PER_DISK;
} else {
minor = BLKIF_MINOR_EXT(info->vdevice);
nr_parts = PARTS_PER_EXT_DISK;
-- 
2.24.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] xen/blkfront: Adjust indentation in xlvbd_alloc_gendisk

2019-12-09 Thread Nick Desaulniers
On Mon, Dec 9, 2019 at 12:14 PM Nathan Chancellor
 wrote:
>
> Clang warns:
>
> ../drivers/block/xen-blkfront.c:1117:4: warning: misleading indentation;
> statement is not part of the previous 'if' [-Wmisleading-indentation]
> nr_parts = PARTS_PER_DISK;
> ^
> ../drivers/block/xen-blkfront.c:1115:3: note: previous statement is here
> if (err)
> ^
>
> This is because there is a space at the beginning of this line; remove
> it so that the indentation is consistent according to the Linux kernel
> coding style and clang no longer warns.
>
> While we are here, the previous line has some trailing whitespace; clean
> that up as well.
>
> Fixes: c80a420995e7 ("xen-blkfront: handle Xen major numbers other than 
> XENVBD")
> Link: https://github.com/ClangBuiltLinux/linux/issues/791
> Signed-off-by: Nathan Chancellor 
> ---
>  drivers/block/xen-blkfront.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index a74d03913822..c02be06c5299 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -1113,8 +1113,8 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,

While you're here, would you please also removing the single space
before the labels in this function?

In vim:

/^ [a-zA-Z]

turns up 5 labels with this.

> if (!VDEV_IS_EXTENDED(info->vdevice)) {
> err = xen_translate_vdev(info->vdevice, , );
> if (err)
> -   return err;
> -   nr_parts = PARTS_PER_DISK;
> +   return err;
> +   nr_parts = PARTS_PER_DISK;
> } else {
> minor = BLKIF_MINOR_EXT(info->vdevice);
> nr_parts = PARTS_PER_EXT_DISK;
> --
> 2.24.0
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Clang Built Linux" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to clang-built-linux+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/clang-built-linux/20191209201444.33243-1-natechancellor%40gmail.com.



-- 
Thanks,
~Nick Desaulniers

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] xen/blkfront: Adjust indentation in xlvbd_alloc_gendisk

2019-12-09 Thread Nathan Chancellor
On Mon, Dec 09, 2019 at 01:07:41PM -0800, Nick Desaulniers wrote:
> On Mon, Dec 9, 2019 at 12:14 PM Nathan Chancellor
>  wrote:
> >
> > Clang warns:
> >
> > ../drivers/block/xen-blkfront.c:1117:4: warning: misleading indentation;
> > statement is not part of the previous 'if' [-Wmisleading-indentation]
> > nr_parts = PARTS_PER_DISK;
> > ^
> > ../drivers/block/xen-blkfront.c:1115:3: note: previous statement is here
> > if (err)
> > ^
> >
> > This is because there is a space at the beginning of this line; remove
> > it so that the indentation is consistent according to the Linux kernel
> > coding style and clang no longer warns.
> >
> > While we are here, the previous line has some trailing whitespace; clean
> > that up as well.
> >
> > Fixes: c80a420995e7 ("xen-blkfront: handle Xen major numbers other than 
> > XENVBD")
> > Link: https://github.com/ClangBuiltLinux/linux/issues/791
> > Signed-off-by: Nathan Chancellor 
> > ---
> >  drivers/block/xen-blkfront.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> > index a74d03913822..c02be06c5299 100644
> > --- a/drivers/block/xen-blkfront.c
> > +++ b/drivers/block/xen-blkfront.c
> > @@ -1113,8 +1113,8 @@ static int xlvbd_alloc_gendisk(blkif_sector_t 
> > capacity,
> 
> While you're here, would you please also removing the single space
> before the labels in this function?
> 
> In vim:
> 
> /^ [a-zA-Z]
> 
> turns up 5 labels with this.

That should probably be a separate patch since there are only two labels
in the function I am touching here. I'll whip up a v2 if the maintainers
want it though or I'll just draft a separate patch when I am done
addressing all of the misleading indentation warnings.

Thanks for the reply!
Nathan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [qemu-mainline test] 144643: trouble: broken/fail/pass

2019-12-09 Thread osstest service owner
virtuozzo.com>
Reviewed-by: Eric Blake 
Reviewed-by: John Snow 
[eblake: commit message tweaks]
Signed-off-by: Eric Blake 

commit 8350b17be015bb872f28268bdeba1bac6c380efc
Merge: 02f9c885ed a2fad86497
Author: Peter Maydell 
Date:   Mon Dec 9 11:07:34 2019 +

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.2-20191209' 
into staging

ppc patch queue 2019-12-09

This is a last minute pull request for ppc-for-4.2.  I know it's very
late in freeze, but this does fix a regression: a bad interaction
between the new qemu and SLOF device tree construction code means that
SLOF will crash if PCI to PCI bridges are included in the system.

This PR supersedes ppc-for-4.2-20191206.  This one has only a more
minimal change to the firmware addressed only at fixing this bug and
not incorporating some other unrelated changes that happened in the
meantime.

# gpg: Signature made Mon 09 Dec 2019 04:52:19 GMT
# gpg:using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson " 
[full]
# gpg: aka "David Gibson (Red Hat) " 
[full]
# gpg: aka "David Gibson (ozlabs.org) " 
[full]
# gpg: aka "David Gibson (kernel.org) " 
[unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 
B392

* remotes/dgibson/tags/ppc-for-4.2-20191209:
  pseries: Update SLOF firmware image

Signed-off-by: Peter Maydell 

commit a2fad86497b981426dc720236c15f2a99ca674a9
Author: Alexey Kardashevskiy 
Date:   Mon Dec 9 12:07:46 2019 +1100

pseries: Update SLOF firmware image

This fixes PCI bridge regression.

Alexey Kardashevskiy (3):
  ibm,client-architecture-support: Fix stack handling
  fdt: Fix updating the tree at H_CAS
  version: update to 20191209

Signed-off-by: Alexey Kardashevskiy 
Signed-off-by: David Gibson 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [seabios test] 144644: trouble: broken/fail/pass

2019-12-09 Thread osstest service owner
flight 144644 seabios real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144644/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadowbroken
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm broken
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow broken
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 4 host-install(4) broken REGR. 
vs. 144198
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 4 host-install(4) broken 
REGR. vs. 144198
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 4 host-install(4) broken REGR. vs. 
144198

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 144198
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 144198
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 144198
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 144198
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass

version targeted for testing:
 seabios  f21b5a4aeb020f2a5e2c6503f906a9349dd2f069
baseline version:
 seabios  c9ba5276e3217ac6a1ec772dbebf568ba3a8a55d

Last test of basis   144198  2019-11-18 14:08:47 Z   21 days
Testing same since   144644  2019-12-09 21:08:58 Z0 days1 attempts


People who touched revisions under test:
  Kevin O'Connor 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm pass
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm  broken  
 test-amd64-amd64-qemuu-nested-amdfail
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64  fail
 test-amd64-amd64-xl-qemuu-ws16-amd64 fail
 test-amd64-i386-xl-qemuu-ws16-amd64  fail
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrictpass
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict pass
 test-amd64-amd64-qemuu-nested-intel  pass
 test-amd64-i386-qemuu-rhel6hvm-intel pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow broken  
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow  broken  



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

broken-job test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow broken
broken-job test-amd64-i386-xl-qemuu-debianhvm-i386-xsm broken
broken-job test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow broken
broken-step test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow host-install(4)
broken-step test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow host-install(4)
broken-step test-amd64-i386-xl-qemuu-debianhvm-i386-xsm host-install(4)

Not pushing.


commit f21b5a4aeb020f2a5e2c6503f906a9349dd2f069
Author: Kevin O'Connor 
Date:   Mon Dec 9 15:08:17 2019 -0500

docs: Note v1.13.0 release

Signed-off-by: Kevin O'Connor 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org

[Xen-devel] [xen-unstable test] 144641: regressions - trouble: broken/fail/pass

2019-12-09 Thread osstest service owner
flight 144641 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144641/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qcow2broken
 test-amd64-i386-xl-qemuu-win7-amd64 broken
 test-amd64-i386-xl-qemuu-debianhvm-amd64broken
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm broken
 test-amd64-amd64-xl-qemut-win7-amd64 broken
 test-amd64-i386-xl-xsm   broken
 test-amd64-i386-xl-qemut-debianhvm-amd64broken
 test-amd64-i386-xl-qemuu-win7-amd64  4 host-install(4) broken REGR. vs. 144631
 test-amd64-i386-xl-qemut-debianhvm-amd64 4 host-install(4) broken REGR. vs. 
144631
 test-amd64-i386-xl-qemuu-debianhvm-amd64 4 host-install(4) broken REGR. vs. 
144631
 test-amd64-i386-xl-xsm4 host-install(4)broken REGR. vs. 144635
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 4 host-install(4) broken REGR. vs. 
144635
 test-amd64-i386-qemuu-rhel6hvm-intel 12 guest-start/redhat.repeat fail REGR. 
vs. 144631

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds16 guest-start/debian.repeat fail REGR. vs. 144619

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64  4 host-install(4) broken like 144635
 test-amd64-amd64-xl-qcow2 4 host-install(4) broken like 144635
 test-amd64-amd64-xl-rtds 16 guest-localmigrate   fail  like 144619
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 144631
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 144631
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 144635
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 144635
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 144635
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 144635
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 144635
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 

[Xen-devel] [xen-4.13-testing test] 144640: regressions - trouble: broken/fail/pass

2019-12-09 Thread osstest service owner
flight 144640 xen-4.13-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144640/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-credit1  broken
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow broken
 test-amd64-amd64-pygrub  broken
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadowbroken
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  broken
 test-amd64-amd64-i386-pvgrub broken
 test-xtf-amd64-amd64-1   broken
 test-amd64-amd64-xl-xsm  broken
 test-amd64-amd64-i386-pvgrub  4 host-install(4)broken REGR. vs. 144609
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 4 host-install(4) broken REGR. 
vs. 144609
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 4 host-install(4) broken 
REGR. vs. 144609
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 4 host-install(4) broken 
REGR. vs. 144609
 test-xtf-amd64-amd64-14 host-install(4)broken REGR. vs. 144609
 test-amd64-amd64-xl-xsm   4 host-install(4)broken REGR. vs. 144609
 test-amd64-amd64-pygrub   4 host-install(4)broken REGR. vs. 144609
 test-amd64-amd64-xl-credit1   4 host-install(4)broken REGR. vs. 144609
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 debian-hvm-install fail 
REGR. vs. 144609
 test-amd64-amd64-libvirt-vhd 10 debian-di-installfail REGR. vs. 144609

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds16 guest-start/debian.repeat fail REGR. vs. 144609

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop  fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 13 

Re: [Xen-devel] [PATCH] x86/microcode: Support builtin CPU microcode

2019-12-09 Thread Eslam Elnikety

On 09.12.19 16:19, Andrew Cooper wrote:

On 09/12/2019 08:41, Eslam Elnikety wrote:

diff --git a/docs/misc/builtin-ucode.txt b/docs/misc/builtin-ucode.txt
new file mode 100644
index 00..43bb60d3eb


Instead of introducing a new file, please extend
docs/admin-guide/microcode-loading.rst

I have an in-prep docs/hypervisor-guide/microcode-loading.rst as well,
which I'll see about posting.  It would be a more appropriate place for
the technical details.



Agreed!

While the existing docs/admin-guide/microcode-loading.rst speaks a 
different tone than what I added, that documentation anyway needs to be 
updated with builtin if such support were to be added. I will adjust 
accordingly. If docs/hypervisor-guide/microcode-loading.rst makes it in 
time for v2 of this patch, I will reflect changes there too.



diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 6ced293d88..7afbe44286 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -97,6 +97,14 @@ static struct ucode_mod_blob __initdata ucode_blob;
   */
  static bool_t __initdata ucode_scan;
  
+#ifdef CONFIG_BUILTIN_UCODE

+/* builtin is the default when BUILTIN_UCODE is set */
+static bool_t __initdata ucode_builtin = 1;


bool, true.



Will fix in v2.


+
+extern const char __builtin_intel_ucode_start[], __builtin_intel_ucode_end[];
+extern const char __builtin_amd_ucode_start[], __builtin_amd_ucode_end[];
+#endif
+
  /* By default, ucode loading is done in NMI handler */
  static bool ucode_in_nmi = true;
  
@@ -110,9 +118,9 @@ void __init microcode_set_module(unsigned int idx)

  }
  
  /*

- * The format is '[|scan=, nmi=]'. Both options are
- * optional. If the EFI has forced which of the multiboot payloads is to be
- * used, only nmi= is parsed.
+ * The format is '[|scan=|builtin=, nmi=]'. All
+ * options are optional. If the EFI has forced which of the multiboot payloads
+ * is to be used, only nmi= is parsed.
   */


Please delete this, or I'll do a prereq patch to fix it and the command
line docs.  (Both are in a poor state.)



Unless you are planning that along your on-going 
docs/hypervisor-guide/microcode-loading.rst effort, I can pick up this 
clean-up/prereq patch myself. What do you have in mind? (Or point me to 
a good example and I will figure things out).



@@ -237,6 +249,48 @@ void __init microcode_grab_module(
  scan:
  if ( ucode_scan )
  microcode_scan_module(module_map, mbi);
+
+#ifdef CONFIG_BUILTIN_UCODE
+/*
+ * Do not use the builtin microcode if:
+ * (a) builtin has been explicitly turned off (e.g., ucode=no-builtin)
+ * (b) a microcode module has been specified or a scan is successful
+ */
+if ( !ucode_builtin || ucode_mod.mod_end || ucode_blob.size )
+return;
+
+/* Set ucode_start/_end to the proper blob */
+if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
+ucode_blob.size = (size_t)(__builtin_amd_ucode_end
+   - __builtin_amd_ucode_start);


No need to cast the result.  Also, preferred Xen style would be to have
- on the preceding line.



Will fix in v2.


+else if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
+ucode_blob.size = (size_t)(__builtin_intel_ucode_end
+   - __builtin_intel_ucode_start);
+else
+return;
+
+if ( !ucode_blob.size )
+{
+printk("No builtin ucode! 'ucode=builtin' is nullified.\n");
+return;
+}
+else if ( ucode_blob.size > MAX_EARLY_CPIO_MICROCODE )
+{
+printk("Builtin microcode payload too big! (%ld, we can do %d)\n",
+   ucode_blob.size, MAX_EARLY_CPIO_MICROCODE);
+ucode_blob.size = 0;
+return;
+}
+
+ucode_blob.data = xmalloc_bytes(ucode_blob.size);
+if ( !ucode_blob.data )
+return;


Any chance we can reuse the "fits" logic to avoid holding every
inapplicable blob in memory as well?



I think this would be a welcomed change. It seems to me that we have two 
ways to go about it.


1) We factor the code in the intel-/amd-specific cpu_request_microcode 
to extract logic for finding a match into its own new function, expose 
that through microcode_ops, and finally do xalloc only for the matching 
microcode when early loading is scan or builtin.


2) Cannot we just do away completely with xalloc? I see that each 
individual microcode update gets allocated anyway in 
microcode_intel.c/get_next_ucode_from_buffer() and in 
microcode_amd.c/cpu_request_microcode(). Unless I am missing something, 
the xmalloc_bytes for ucode_blob.data is redundant.


Thoughts?


+
+if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
+memcpy(ucode_blob.data, __builtin_amd_ucode_start, ucode_blob.size);
+else
+memcpy(ucode_blob.data, __builtin_intel_ucode_start, ucode_blob.size);
+#endif
  }
  
  const struct microcode_ops *microcode_ops;

diff --git a/xen/arch/x86/microcode/Makefile b/xen/arch/x86/microcode/Makefile

Re: [Xen-devel] [PATCH] xen/pciback: Prevent NULL pointer dereference in quirks_show

2019-12-09 Thread Boris Ostrovsky


On 12/9/19 1:16 PM, Nuernberger, Stefan wrote:

On Fri, 2019-12-06 at 15:15 -0500, Boris Ostrovsky wrote:

On 12/6/19 1:09 PM, Nuernberger, Stefan wrote:

On Fri, 2019-12-06 at 10:11 -0500, Boris Ostrovsky wrote:

On 12/6/19 8:48 AM, Stefan Nuernberger wrote:

From: Uwe Dannowski 
  
  		list_for_each_entry(cfg_entry, _data-

config_fields, list) {

Couldn't you have the same race here?

Not quite the same, but it might not be entirely safe yet. The
'quirks_show' takes the 'device_ids_lock' and races with unbind /
'pcistub_device_release' "which takes device_lock mutex". So this
might
now be a UAF read access instead of a NULL pointer dereference.

Yes, that's what I meant (although I don't see much difference in
this
context).

Well, the NULL ptr access causes an instant kernel panic whereas we
have not attributed crashes to the possible UAF read until now.


  We have
not observed adversarial effects in our testing (compared to the
obvious issues with NULL pointer) but that's not a guarantee of
course.

So should quirks_show actually be protected by pcistub_devices_lock
instead as are other functions that access dev_data? Does it need
both
locks in that case?

device_ids_lock protects device_ids list, which is not what you are
trying to access, so that doesn't look like right lock to hold. And
AFAICT pcistub_devices_lock is not held when device data is cleared
in
pcistub_device_release() (which I think is where we are racing).

Indeed. The xen_pcibk_quirks list does not have a separate lock to
protect it. It's either modified under 'pcistub_devices_lock', from
pcistub_remove(), or iterated over with the 'device_ids_lock' held in
quirks_show(). Also the quirks list is amended from
   pcistub_init_device()
     -> xen_pcibk_config_init_dev()
       -> xen_pcibk_config_quirks_init()
without holding any lock at all. In fact the
pcistub_init_devices_late() and pcistub_seize() functions deliberately
release the pcistub_devices_lock before calling pcistub_init_device().
This looks broken.



Indeed.




The race is between
   pcistub_remove()
     -> pcistub_device_put()
       -> pcistub_device_release()
on one side and the quirks_show() on the other side. The problematic
quirk is freed from the xen_pcibk_quirks list in pcistub_remove() early
on under pcistub_devices_lock before the associated dev_data is freed
eventually. So switching from device_ids_lock to pcistub_devices_lock
in quirks_show() could be sufficient to always have valid dev_data for
all quirks in the list.



Yes, that should do it. (I missed xen_pcibk_config_quirk_release() call, 
which is why I wasn't sure pcistub_devices_lock is held where necessary).





There is also pcistub_put_pci_dev() possibly in the race, called from
xen_pcibk_remove_device(), or xen_pcibk_xenbus_remove(), or
pcistub_remove(). The pcistub_remove() call site is safe when we switch
to pcistub_devices_lock (same reasoning as above). For the others I
currently do not see when the quirks are ever freed?



I wonder whether we should call xen_pcibk_config_quirk_release() from 
pcistub_device_release() under pcistub_devices_lock.



-boris


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v4 2/2] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-09 Thread SeongJae Park
From: SeongJae Park 

Each `blkif` has a free pages pool for the grant mapping.  The size of
the pool starts from zero and be increased on demand while processing
the I/O requests.  If current I/O requests handling is finished or 100
milliseconds has passed since last I/O requests handling, it checks and
shrinks the pool to not exceed the size limit, `max_buffer_pages`.

Therefore, `blkfront` running guests can cause a memory pressure in the
`blkback` running guest by attaching a large number of block devices and
inducing I/O.  System administrators can avoid such problematic
situations by limiting the maximum number of devices each guest can
attach.  However, finding the optimal limit is not so easy.  Improper
set of the limit can results in the memory pressure or a resource
underutilization.  This commit avoids such problematic situations by
squeezing the pools (returns every free page in the pool to the system)
for a while (users can set this duration via a module parameter) if a
memory pressure is detected.

Discussions
===

The `blkback`'s original shrinking mechanism returns only pages in the
pool, which are not currently be used by `blkback`, to the system.  In
other words, the pages are not mapped with foreign pages.  Because this
commit is changing only the shrink limit but uses the mechanism as is,
this commit does not introduce improper mappings related security
issues.

Once a memory pressure is detected, this commit keeps the squeezing
limit for a user-specified time duration.  The duration should be
neither too long nor too short.  If it is too long, the squeezing
incurring overhead can reduce the I/O performance.  If it is too short,
`blkback` will not free enough pages to reduce the memory pressure.
This commit sets the value as `10 milliseconds` by default because it is
a short time in terms of I/O while it is a long time in terms of memory
operations.  Also, as the original shrinking mechanism works for at
least every 100 milliseconds, this could be a somewhat reasonable
choice.  I also tested other durations (refer to the below section for
more details) and confirmed that 10 milliseconds is the one that works
best with the test.  That said, the proper duration depends on actual
configurations and workloads.  That's why this commit is allowing users
to set it as their optimal value via the module parameter.

Memory Pressure Test


To show how this commit fixes the memory pressure situation well, I
configured a test environment on a xen-running virtualization system.
On the `blkfront` running guest instances, I attach a large number of
network-backed volume devices and induce I/O to those.  Meanwhile, I
measure the number of pages that swapped in and out on the `blkback`
running guest.  The test ran twice, once for the `blkback` before this
commit and once for that after this commit.  As shown below, this commit
has dramatically reduced the memory pressure:

pswpin  pswpout
before  76,672  185,799
after  2123,325

Optimal Aggressive Shrinking Duration
-

To find a best squeezing duration, I repeated the test with three
different durations (1ms, 10ms, and 100ms).  The results are as below:

durationpswpin  pswpout
1   852 6,424
10  212 3,325
100 203 3,340

As expected, the memory pressure has decreased as the duration is
increased, but the reduction stopped from the `10ms`.  Based on this
results, I chose the default duration as 10ms.

Performance Overhead Test
=

This commit could incur I/O performance degradation under severe memory
pressure because the squeezing will require more page allocations per
I/O.  To show the overhead, I artificially made a worst-case squeezing
situation and measured the I/O performance of a `blkfront` running
guest.

For the artificial squeezing, I set the `blkback.max_buffer_pages` using
the `/sys/module/xen_blkback/parameters/max_buffer_pages` file.  We set
the value to `1024` and `0`.  The `1024` is the default value.  Setting
the value as `0` is same to a situation doing the squeezing always
(worst-case).

For the I/O performance measurement, I use a simple `dd` command.

Default Performance
---

[dom0]# echo 1024 > /sys/module/xen_blkback/parameters/max_buffer_pages
[instance]$ for i in {1..5}; do dd if=/dev/zero of=file bs=4k 
count=$((256*512)); sync; done
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 11.7257 s, 45.8 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8827 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8781 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8737 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 

[Xen-devel] [PATCH v4 0/2] xenbus/backend: Add a memory pressure handler callback

2019-12-09 Thread SeongJae Park
Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lacks a
flexibility.

To mitigate such problems, this patchset adds a memory reclaim callback
to 'xenbus_driver' (patch 1) and use it to mitigate the problem in
'xen-blkback' (patch 2).

Base Version


This patch is based on v5.4.  A complete tree is also available at my
public git repo:
https://github.com/sjp38/linux/tree/blkback_squeezing_v4


Patch History
-

Changes from v3 
(https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/)
 - Add general callback in xen_driver and use it (suggested by Juergen
   Gross)

Changes from v2 
(https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com)
 - Rename the module parameter and variables for brevity (aggressive
   shrinking -> squeezing)

Changes from v1 
(https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/)
 - Adjust the description to not use the term, `arbitrarily` (suggested
   by Paul Durrant)
 - Specify time unit of the duration in the parameter description,
   (suggested by Maximilian Heyne)
 - Change default aggressive shrinking duration from 1ms to 10ms
 - Merge two patches into one single patch

SeongJae Park (1):
  xen/blkback: Squeeze page pools if a memory pressure is detected

 drivers/block/xen-blkback/blkback.c | 35 +++--
 1 file changed, 33 insertions(+), 2 deletions(-)

SeongJae Park (2):
  xenbus/backend: Add memory pressure handler callback
  xen/blkback: Squeeze page pools if a memory pressure is detected

 drivers/block/xen-blkback/blkback.c   | 23 +++--
 drivers/block/xen-blkback/common.h|  1 +
 drivers/block/xen-blkback/xenbus.c|  3 ++-
 drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++
 include/xen/xenbus.h  |  1 +
 5 files changed, 56 insertions(+), 3 deletions(-)

-- 
2.17.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v4 1/2] xenbus/backend: Add memory pressure handler callback

2019-12-09 Thread SeongJae Park
From: SeongJae Park 

Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lacks a
flexibility.

To mitigate such problems, this commit adds a memory reclaim callback to
'xenbus_driver'.  Using this facility, 'xenbus' would be able to monitor
a memory pressure and request specific domains of specific backend
drivers which causing the given pressure to voluntarily release its
memory.

That said, this commit simply requests every callback registered driver
to release its memory for every domain, rather than issueing the
requests to the drivers and domain in charge.  Such things would be a
future work.  Also, this commit focuses on memory only.  However, it
would be ablt to be extended for general resources.

Signed-off-by: SeongJae Park 
---
 drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++
 include/xen/xenbus.h  |  1 +
 2 files changed, 32 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
b/drivers/xen/xenbus/xenbus_probe_backend.c
index b0bed4faf44c..cd5fd1cd8de3 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -248,6 +248,34 @@ static int backend_probe_and_watch(struct notifier_block 
*notifier,
return NOTIFY_DONE;
 }
 
+static int xenbus_backend_reclaim(struct device *dev, void *data)
+{
+   struct xenbus_driver *drv;
+   if (!dev->driver)
+   return -ENOENT;
+   drv = to_xenbus_driver(dev->driver);
+   if (drv && drv->reclaim)
+   drv->reclaim(to_xenbus_device(dev), DOMID_INVALID);
+   return 0;
+}
+
+/*
+ * Returns 0 always because we are using shrinker to only detect memory
+ * pressure.
+ */
+static unsigned long xenbus_backend_shrink_count(struct shrinker *shrinker,
+   struct shrink_control *sc)
+{
+   bus_for_each_dev(_backend.bus, NULL, NULL,
+   xenbus_backend_reclaim);
+   return 0;
+}
+
+static struct shrinker xenbus_backend_shrinker = {
+   .count_objects = xenbus_backend_shrink_count,
+   .seeks = DEFAULT_SEEKS,
+};
+
 static int __init xenbus_probe_backend_init(void)
 {
static struct notifier_block xenstore_notifier = {
@@ -264,6 +292,9 @@ static int __init xenbus_probe_backend_init(void)
 
register_xenstore_notifier(_notifier);
 
+   if (register_shrinker(_backend_shrinker))
+   pr_warn("shrinker registration failed\n");
+
return 0;
 }
 subsys_initcall(xenbus_probe_backend_init);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..52aaf4f78400 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -104,6 +104,7 @@ struct xenbus_driver {
struct device_driver driver;
int (*read_otherend_details)(struct xenbus_device *dev);
int (*is_ready)(struct xenbus_device *dev);
+   unsigned (*reclaim)(struct xenbus_device *dev, domid_t domid);
 };
 
 static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv)
-- 
2.17.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [qemu-mainline test] 144638: trouble: broken/fail/pass

2019-12-09 Thread osstest service owner
-amd64-i386-xl-pvshimfail
 test-amd64-amd64-pygrub  broken  
 test-amd64-amd64-xl-qcow2pass
 test-armhf-armhf-libvirt-raw pass
 test-amd64-i386-xl-raw   pass
 test-amd64-amd64-xl-rtds fail
 test-armhf-armhf-xl-rtds pass
 test-arm64-arm64-xl-seattle  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow  pass
 test-amd64-amd64-xl-shadow   pass
 test-amd64-i386-xl-shadowpass
 test-arm64-arm64-xl-thunderx pass
 test-amd64-amd64-libvirt-vhd pass
 test-armhf-armhf-xl-vhd  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

broken-job test-amd64-amd64-amd64-pvgrub broken
broken-job test-amd64-amd64-xl-xsm broken
broken-job test-amd64-amd64-pygrub broken
broken-job test-amd64-i386-qemuu-rhel6hvm-intel broken
broken-job test-amd64-i386-libvirt broken
broken-step test-amd64-amd64-xl-xsm host-install(4)
broken-step test-amd64-amd64-amd64-pvgrub host-install(4)
broken-step test-amd64-amd64-pygrub host-install(4)
broken-step test-amd64-i386-qemuu-rhel6hvm-intel host-install(4)
broken-step test-amd64-i386-libvirt host-install(4)

Not pushing.


commit 8350b17be015bb872f28268bdeba1bac6c380efc
Merge: 02f9c885ed a2fad86497
Author: Peter Maydell 
Date:   Mon Dec 9 11:07:34 2019 +

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.2-20191209' 
into staging

ppc patch queue 2019-12-09

This is a last minute pull request for ppc-for-4.2.  I know it's very
late in freeze, but this does fix a regression: a bad interaction
between the new qemu and SLOF device tree construction code means that
SLOF will crash if PCI to PCI bridges are included in the system.

This PR supersedes ppc-for-4.2-20191206.  This one has only a more
minimal change to the firmware addressed only at fixing this bug and
not incorporating some other unrelated changes that happened in the
meantime.

# gpg: Signature made Mon 09 Dec 2019 04:52:19 GMT
# gpg:using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson " 
[full]
# gpg: aka "David Gibson (Red Hat) " 
[full]
# gpg: aka "David Gibson (ozlabs.org) " 
[full]
# gpg: aka "David Gibson (kernel.org) " 
[unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 
B392

* remotes/dgibson/tags/ppc-for-4.2-20191209:
  pseries: Update SLOF firmware image

Signed-off-by: Peter Maydell 

commit a2fad86497b981426dc720236c15f2a99ca674a9
Author: Alexey Kardashevskiy 
Date:   Mon Dec 9 12:07:46 2019 +1100

pseries: Update SLOF firmware image

This fixes PCI bridge regression.

Alexey Kardashevskiy (3):
  ibm,client-architecture-support: Fix stack handling
  fdt: Fix updating the tree at H_CAS
  version: update to 20191209

Signed-off-by: Alexey Kardashevskiy 
Signed-off-by: David Gibson 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH-for-5.0 v3 0/6] hw/pci-host: Add Kconfig selector for IGD PCIe pass-through

2019-12-09 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20191209095002.32194-1-phi...@redhat.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

Submodule 'dtc' (https://git.qemu.org/git/dtc.git) registered for path 'dtc'
Cloning into 'dtc'...
remote: Counting objects: 5280, done.
error: RPC failed; result=18, HTTP code = 200
fatal: The remote end hung up unexpectedly
fatal: protocol error: bad pack header
Clone of 'https://git.qemu.org/git/dtc.git' into submodule path 'dtc' failed
failed to update submodule dtc
Submodule 'dtc' (https://git.qemu.org/git/dtc.git) unregistered for path 'dtc'
make[1]: *** 
[/var/tmp/patchew-tester-tmp-w4w8li1u/src/docker-src.2019-12-09-13.44.32.11934] 
Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-w4w8li1u/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real5m17.813s
user0m2.470s


The full log is available at
http://patchew.org/logs/20191209095002.32194-1-phi...@redhat.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH-for-5.0 v3 0/6] hw/pci-host: Add Kconfig selector for IGD PCIe pass-through

2019-12-09 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20191209095002.32194-1-phi...@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

Submodule 'dtc' (https://git.qemu.org/git/dtc.git) registered for path 'dtc'
Cloning into 'dtc'...
remote: Counting objects: 5280, done.
error: RPC failed; result=18, HTTP code = 200
fatal: The remote end hung up unexpectedly
fatal: protocol error: bad pack header
Clone of 'https://git.qemu.org/git/dtc.git' into submodule path 'dtc' failed
failed to update submodule dtc
Submodule 'dtc' (https://git.qemu.org/git/dtc.git) unregistered for path 'dtc'
make[1]: *** 
[/var/tmp/patchew-tester-tmp-2t3gdqmo/src/docker-src.2019-12-09-13.30.01.10260] 
Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-2t3gdqmo/src'
make: *** [docker-run-test-quick@centos7] Error 2

real9m11.026s
user0m2.757s


The full log is available at
http://patchew.org/logs/20191209095002.32194-1-phi...@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 1/3] xen/flask: Drop the gen-policy.py script

2019-12-09 Thread Daniel De Graaf

On 12/7/19 4:16 PM, Andrew Cooper wrote:

The script is Python 2 specific, and fails with string/binary issues with
Python 3:

   Traceback (most recent call last):
 File "gen-policy.py", line 14, in 
   for char in sys.stdin.read():
 File "/usr/lib/python3.5/codecs.py", line 321, in decode
   (result, consumed) = self._buffer_decode(data, self.errors, final)
   UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8c in position 0: 
invalid start byte

Fixing the script to be compatible isn't hard, but using python here is
wasteful.  Drop the script entirely, and write a short flask-policy.S instead.

Signed-off-by: Andrew Cooper 


Acked-by: Daniel De Graaf 

With either .align or .p2align as appropriate for more assemblers.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 5/6] xen/tasklet: Return -ERESTART from continue_hypercall_on_cpu()

2019-12-09 Thread Andrew Cooper
On 09/12/2019 16:52, Jan Beulich wrote:
> On 05.12.2019 23:30, Andrew Cooper wrote:
>> Some hypercalls tasklets want to create a continuation, rather than fail the
>> hypercall with a hard error.  By the time the tasklet is executing, it is too
>> late to create the continuation, and even continue_hypercall_on_cpu() doesn't
>> have enough state to do it correctly.
> I think it would be quite nice if you made clear what piece of state
> it is actually missing. To be honest, I don't recall anymore.

How to correctly mutate the registers and/or memory (which is specific
to the hypercall subop in some cases).

>> All callers of continue_hypercall_on_cpu() have been updated to turn 
>> -ERESTART
>> into a continuation, where appropriate modifications can be made to register
>> and/or memory parameters.
>>
>> This changes the continue_hypercall_on_cpu() behaviour to unconditionally
>> create a hypercall continuation, in case the tasklet wants to use it, and 
>> then
>> to have arch_hypercall_tasklet_result() cancel the continuation when a result
>> is available.  None of these hypercalls are fast paths.
>>
>> Signed-off-by: Andrew Cooper 
>> ---
>> CC: Jan Beulich 
>> CC: Wei Liu 
>> CC: Roger Pau Monné 
>> CC: Stefano Stabellini 
>> CC: Julien Grall 
>> CC: Volodymyr Babchuk 
>>
>> There is one RFC point.  The statement in the header file of "If this 
>> function
>> returns 0 then the function is guaranteed to run at some point in the 
>> future."
>> was never true.  In the case of a CPU miss, the hypercall would be blindly
>> failed with -EINVAL.
> "Was never true" sounds like "completely broken". Afaict it was true
> in all cases except the purely hypothetical one of the tasklet ending
> up executing on the wrong CPU.

There is nothing hypothetical about it.  It really will go wrong when a
CPU gets offlined.

>
>> The current behaviour with this patch is to not cancel the continuation, 
>> which
>> I think is less bad, but still not great.  Thoughts?
> Well, that's a guest live lock then aiui.

It simply continues again.  It will livelock only if the hypercall picks
a bad cpu all the time.

> Is there any way for the guest to make it out of there? If not, perhaps it'd 
> be "better" to
> crash the guest? (I don't suppose there's anything we can do to
> still make the tasklet run on the intended CPU.)

That is why I implemented it like this.  If it is a stray interaction
with a CPU offline, the next time around it will work fine.

Anything else is rather more broken.

>
>> --- a/xen/arch/arm/traps.c
>> +++ b/xen/arch/arm/traps.c
>> @@ -1489,6 +1489,7 @@ void arch_hypercall_tasklet_result(struct vcpu *v, 
>> long res)
>>  {
>>  struct cpu_user_regs *regs = >arch.cpu_info->guest_cpu_user_regs;
>>  
>> +regs->pc += 4;  /* Skip over 'hvc #XEN_HYPERCALL_TAG' */
>>  HYPERCALL_RESULT_REG(regs) = res;
>>  }
>>  
>> --- a/xen/arch/x86/hypercall.c
>> +++ b/xen/arch/x86/hypercall.c
>> @@ -170,6 +170,13 @@ void arch_hypercall_tasklet_result(struct vcpu *v, long 
>> res)
>>  {
>>  struct cpu_user_regs *regs = >arch.user_regs;
>>  
>> +/*
>> + * PV hypercalls are all 2-byte instructions (INT $0x82, SYSCALL).  HVM
>> + * hypercalls are all 3-byte instructions (VMCALL, VMMCALL).
>> + *
>> + * Move %rip forwards to complete the continuation.
>> + */
>> +regs->rip += 2 + is_hvm_vcpu(v);
>>  regs->rax = res;
>>  }
> To leave the system in consistent state, perhaps better to call
> hypercall_cancel_continuation() along with the PC/RIP updating?

Probably, yes.

>
>> --- a/xen/include/xen/domain.h
>> +++ b/xen/include/xen/domain.h
>> @@ -96,9 +96,11 @@ void domctl_lock_release(void);
>>  
>>  /*
>>   * Continue the current hypercall via func(data) on specified cpu.
>> - * If this function returns 0 then the function is guaranteed to run at some
>> - * point in the future. If this function returns an error code then the
>> - * function has not been and will not be executed.
>> + *
>> + * This function returns -ERESTART in the success case, and a higher level
>> + * caller is required to set up a hypercall continuation.  func() will be 
>> run
>> + * at some point in the future.  If this function returns any other error 
>> code
>> + * then func() has not, and will not be executed.
>>   */
>>  int continue_hypercall_on_cpu(
>>  unsigned int cpu, long (*func)(void *data), void *data);
> How is this comment any better wrt the "was never true" comment
> you've made above? The function still wouldn't be invoked in
> case of a CPU miss.

Depends now the miss came about.  It certainly stands a far better
chance now of actually executing.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 4/6] xen/hypercall: Cope with -ERESTART on more hypercall paths

2019-12-09 Thread Andrew Cooper
On 09/12/2019 16:29, Jan Beulich wrote:
> On 09.12.2019 17:25, Jan Beulich wrote:
>> On 05.12.2019 23:30, Andrew Cooper wrote:
>>> These hypercalls each use continue_hypercall_on_cpu(), whose API is about to
>>> switch to use -ERESTART.  Update the soon-to-be affected paths to cope,
>>> folding existing contination logic where applicable.
>>>
>>> In addition:
>>>  * For platform op and sysctl, insert a cpu_relax() into what is otherwise a
>>>tight spinlock loop, and make the continuation logic common at the
>>>epilogue.
>> Is this really needed with a hypercall_preempt_check() invocation
>> already in the bodies of these loops?

Yes.  The reason you're supposed to pause is to stop having memory
traffic constantly trying to pull the spinlock's cacheline into shared
state.

Racing round a tight loop constantly reading 4 or 5 memory locations is
almost as bad.

> And if it's really to be added, shouldn't it be at the bottom
> of the loop bodies rather than at the top?

It doesn't matter.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v4] x86: do not enable global pages when virtualized on AMD hardware

2019-12-09 Thread Roger Pau Monne
When using global pages a full tlb flush can only be performed by
toggling the PGE bit in CR4, which is usually quite expensive in terms
of performance when running virtualized. This is specially relevant on
AMD hardware, which doesn't have the ability to do selective CR4
trapping, but can also be relevant on Intel if the underlying
hypervisor also traps accesses to the PGE CR4 bit.

In order to avoid this performance penalty, do not use global pages
when running virtualized on AMD hardware. A command line option
'global-pages' is provided in order to allow the user to select
whether global pages will be enabled for PV guests.

The above figures are from a PV shim running on AMD hardware with
32 vCPUs:

PGE enabled, x2APIC mode:

(XEN) Global lock flush_lock: addr=82d0804b01c0, lockval=1adb1adb, not 
locked
(XEN)   lock:1841883(1375128998543), block:1658716(10193054890781)

Average lock time:   746588ns
Average block time: 6145147ns

PGE disabled, x2APIC mode:

(XEN) Global lock flush_lock: addr=82d0804af1c0, lockval=a8bfa8bf, not 
locked
(XEN)   lock:2730175(657505389886), block:2039716(2963768247738)

Average lock time:   240829ns
Average block time: 1453029ns

As seen from the above figures the lock and block time of the flush
lock is reduced to approximately 1/3 of the original value.

Note that XEN_MINIMAL_CR4 and mmu_cr4_features are not modified, and
thus global pages are left enabled for the hypervisor. This is not an
issue because the code to switch the control registers (cr3 and cr4)
already takes into account such situation and performs the necessary
flushes. The same already happens when using XPTI or PCIDE, as the
guest cr4 doesn't have global pages enabled in that case either.

Also note that the suspend and resume code is correct in writing
mmu_cr4_features into cr4 on resume, since that's the cr4 used by the
idle vCPU which is the context used by the suspend and resume routine.

Signed-off-by: Roger Pau Monné 
---
Changes since v3:
 - Expand commit message.

Changes since v2:
 - Set the default value at init if not specified by the user.
 - Use int8_t and read_mostly for opt_global_pages.

Changes since v1:
 - Provide command line option to enable/disable PGE.
 - Only disable PGE on AMD hardware when virtualized.
 - Document the global-pages option.
---
 docs/misc/xen-command-line.pandoc | 13 +
 xen/arch/x86/pv/domain.c  | 15 ++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/docs/misc/xen-command-line.pandoc 
b/docs/misc/xen-command-line.pandoc
index d9495ef6b9..7be30f2766 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1087,6 +1087,19 @@ value settable via Xen tools.
 
 Dom0 is using this value for sizing its maptrack table.
 
+### global-pages (x86)
+> `= `
+
+> Default: `true` unless running virtualized on AMD hardware
+
+Set whether the PGE bit in CR4 will be enabled for PV guests. This controls the
+usage of global pages, and thus the need to perform tlb flushes by writing to
+CR4.
+
+Note it's disabled by default when running virtualized on AMD hardware since
+AMD SVM doesn't support selective trapping of CR4, so global pages are not
+enabled in order to reduce the overhead of tlb flushes.
+
 ### guest_loglvl
 > `= [/]` where level is `none | error | warning | 
 > info | debug | all`
 
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index 4b6f48dea2..8ff733f56b 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -118,6 +118,19 @@ unsigned long pv_fixup_guest_cr4(const struct vcpu *v, 
unsigned long cr4)
 (mmu_cr4_features & PV_CR4_GUEST_VISIBLE_MASK));
 }
 
+static int8_t __read_mostly opt_global_pages = -1;
+boolean_runtime_param("global-pages", opt_global_pages);
+
+static int __init pge_init(void)
+{
+if ( opt_global_pages == -1 )
+opt_global_pages = !cpu_has_hypervisor ||
+   boot_cpu_data.x86_vendor != X86_VENDOR_AMD;
+
+return 0;
+}
+__initcall(pge_init);
+
 unsigned long pv_make_cr4(const struct vcpu *v)
 {
 const struct domain *d = v->domain;
@@ -130,7 +143,7 @@ unsigned long pv_make_cr4(const struct vcpu *v)
  */
 if ( d->arch.pv.pcid )
 cr4 |= X86_CR4_PCIDE;
-else if ( !d->arch.pv.xpti )
+else if ( !d->arch.pv.xpti && opt_global_pages )
 cr4 |= X86_CR4_PGE;
 
 /*
-- 
2.24.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 4/6] xen/hypercall: Cope with -ERESTART on more hypercall paths

2019-12-09 Thread Andrew Cooper
On 08/12/2019 12:57, Julien Grall wrote:
> Hi Andrew,
>
> On 05/12/2019 22:30, Andrew Cooper wrote:
>> These hypercalls each use continue_hypercall_on_cpu(), whose API is
>> about to
>> switch to use -ERESTART.  Update the soon-to-be affected paths to cope,
>> folding existing contination logic where applicable.
>>
>> In addition:
>>   * For platform op and sysctl, insert a cpu_relax() into what is
>> otherwise a
>>     tight spinlock loop, and make the continuation logic common at the
>>     epilogue.
>>   * Contrary to the comment in the code, kexec_exec() does return in the
>>     KEXEC_REBOOT case, needs to pass ret back to the caller.
>
> It is not entirely trivial to me that KEXEC_REBOOT refers to
> KEXEC_DEFAULT_TYPE. The more that if you look at the kexec_reboot()
> helper, it will not return (see BUG()). What may return is
> continue_hypercall_on_cpu().
>
> So would it make sense to use KEXEC_DEFAULT_TYPE?

I'm not sure why I capitalised it, but no - using KEXEC_DEFAULT_TYPE is
worse.  A casual reader is far more likely to understand kexec_reboot()
in this context.

>
> [...]
>
>> @@ -816,6 +819,13 @@ ret_t
>> do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
>>    out:
>>   spin_unlock(_lock);
>>   +    if ( ret == -ERESTART )
>> +    {
>> +    create_continuation:
>
> Shall we indent create_continuation the same way as out?

They have different scopes, and while it may look weird, this is in
accordance with our style.

>
> [...]
>
>> @@ -1263,13 +1263,25 @@ static int do_kexec_op_internal(unsigned long
>> op,
>>     long do_kexec_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void)
>> uarg)
>>   {
>> -    return do_kexec_op_internal(op, uarg, 0);
>> +    int ret = do_kexec_op_internal(op, uarg, 0);
> Shouldn't it be long (or unsigned long) here? Otherwise, the return of
> hypercall_create_continuation() may be truncated.

If you're concerned about truncation via this pattern, then there are
other areas of the code to be worred about.

However, there is nothing to truncate.  The return value of
hypercall_create_continuation() is the primary hypercall number, i.e.
__HYPERVISOR_kexec_op in this case.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/6] xen/domctl: Consolidate hypercall continuation handling at the top level

2019-12-09 Thread Andrew Cooper
On 09/12/2019 16:19, Jan Beulich wrote:
> On 05.12.2019 23:30, Andrew Cooper wrote:
>> --- a/xen/arch/x86/domctl.c
>> +++ b/xen/arch/x86/domctl.c
>> @@ -326,9 +326,12 @@ long arch_do_domctl(
>>  
>>  switch ( domctl->cmd )
>>  {
>> -
>>  case XEN_DOMCTL_shadow_op:
>>  ret = paging_domctl(d, >u.shadow_op, u_domctl, 0);
>> +/*
>> + * Continuations from paging_domctl() switch index to arch_1, and
>> + * can't use the common domctl continuation path.
>> + */
>>  if ( ret == -ERESTART )
>>  return hypercall_create_continuation(__HYPERVISOR_arch_1,
>>   "h", u_domctl);
> There's also XEN_DOMCTL_getpageframeinfo3 down from here which
> now invokes a continuation.

Fixed.

>
>> @@ -1080,6 +1068,9 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) 
>> u_domctl)
>>  if ( copyback && __copy_to_guest(u_domctl, op, 1) )
>>  ret = -EFAULT;
>>  
>> +if ( ret == -ERESTART )
>> +ret = hypercall_create_continuation(__HYPERVISOR_domctl,
>> +"h", u_domctl);
> You may want to mention in the description the bug you fix here:
> Previously the -EFAULT returning visible in context should have
> canceled any active continuation.

That would be presuming I'd even spotted it...

Having looked though the paths once again, I don't think there was a
path which continued and had copyback set, so this is at most a latent bug.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Roger Pau Monné 
> Sent: 09 December 2019 17:18
> To: Durrant, Paul 
> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Juergen
> Gross ; Stefano Stabellini ;
> Boris Ostrovsky 
> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> closed
> 
> On Mon, Dec 09, 2019 at 04:26:15PM +, Durrant, Paul wrote:
> > > > If you want unbind to actually do a proper unplug then that's extra
> work
> > > and not really something I want to tackle (and re-bind would still
> need to
> > > be toolstack initiated as something would have to re-create the
> xenstore
> > > area).
> > >
> > > Why do you say the xenstore area would need to be recreated?
> > >
> > > Setting state to closed shouldn't cause any cleanup of the xenstore
> > > area, as that should already happen for example when using pvgrub
> > > since in that case grub itself disconnects and already causes a
> > > transition to closed and a re-attachment afterwards by the guest
> > > kernel.
> > >
> >
> > For some reason, when I originally tested, the xenstore area
> disappeared. I checked again and it did not this time. I just ended up
> with a frontend stuck in state 5 (because it is the system disk and won't
> go offline) trying to talk to a non-existent backend. Upon re-bind the
> backend goes into state 5 (because it sees the 5 in the frontend) and
> leaves the guest wedged.
> 
> Likely blkfront should go back to init state, but anyway, that's not
> something that needs fixing as part of this series.
> 

Ok, cool.

I am wondering though whether we ought to suppress bind/unbind for drivers that 
don't whitelist themselves (through the new xenbus_driver flag that I'll add). 
It's somewhat misleading that the nodes are there but don't necessarily work.

  Paul


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/6] xen/domctl: Consolidate hypercall continuation handling at the top level

2019-12-09 Thread Andrew Cooper
On 08/12/2019 12:18, Julien Grall wrote:
> Hi,
>
> On 05/12/2019 22:30, Andrew Cooper wrote:
>> More paths are going start using hypercall continuations.  We could
>> add extra
>> calls to hypercall_create_continuation() but it is much easier to handle
>> -ERESTART once at the top level.
>>
>> One complication is XEN_DOMCTL_shadow_op, which for XSA-97 and ABI
>> compatibility in a security fix, turn a DOMCTL continuation into
>> __HYPERVISOR_arch_1.  This remains as it was, gaining a comment
>> explaining
>> what is going on.
>>
>> With -ERESTART handling in place, the !domctl_lock_acquire() path can
>> use the
>> normal exit path, instead of opencoding a subset of it locally.
>>
>> Signed-off-by: Andrew Cooper 
>> ---
>> CC: Jan Beulich 
>> CC: Wei Liu 
>> CC: Roger Pau Monné 
>> CC: Stefano Stabellini 
>> CC: Julien Grall 
>> CC: Volodymyr Babchuk 
>> ---
>>   xen/arch/x86/domctl.c   |  5 -
>>   xen/arch/x86/mm/hap/hap.c   |  3 +--
>>   xen/arch/x86/mm/shadow/common.c |  3 +--
>>   xen/common/domctl.c | 19 +--
>
> You seem to have missed the hypercall_create_continuation() call in
> iommu_do_pci_domctl().

So I have.  Fixed.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Roger Pau Monné
On Mon, Dec 09, 2019 at 04:26:15PM +, Durrant, Paul wrote:
> > > If you want unbind to actually do a proper unplug then that's extra work
> > and not really something I want to tackle (and re-bind would still need to
> > be toolstack initiated as something would have to re-create the xenstore
> > area).
> > 
> > Why do you say the xenstore area would need to be recreated?
> > 
> > Setting state to closed shouldn't cause any cleanup of the xenstore
> > area, as that should already happen for example when using pvgrub
> > since in that case grub itself disconnects and already causes a
> > transition to closed and a re-attachment afterwards by the guest
> > kernel.
> > 
> 
> For some reason, when I originally tested, the xenstore area disappeared. I 
> checked again and it did not this time. I just ended up with a frontend stuck 
> in state 5 (because it is the system disk and won't go offline) trying to 
> talk to a non-existent backend. Upon re-bind the backend goes into state 5 
> (because it sees the 5 in the frontend) and leaves the guest wedged.

Likely blkfront should go back to init state, but anyway, that's not
something that needs fixing as part of this series.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 1/3] xen/flask: Drop the gen-policy.py script

2019-12-09 Thread Andrew Cooper
On 09/12/2019 13:38, Jan Beulich wrote:
> On 07.12.2019 22:16, Andrew Cooper wrote:
>> --- /dev/null
>> +++ b/xen/xsm/flask/flask-policy.S
>> @@ -0,0 +1,20 @@
>> +.section .init.rodata, "a", %progbits
>> +
>> +/* const unsigned char xsm_flask_init_policy[] __initconst */
>> +.align 4
> I'm afraid .align is not universal enough to be used here - iirc
> some architectures have it alias .p2align rather than (how e.g.
> x86 behaves) .balign. Looks like .p2align is available in all
> binutils versions we claim to be able to be built with. (I'm
> not sure the one here is needed anyway, but the one below we
> surely want.)

I can switch to p2align, but...

>
>> +.global xsm_flask_init_policy
>> +xsm_flask_init_policy:
>> +.incbin "policy.bin"
>> +.Lend:
>> +
>> +.type xsm_flask_init_policy, %object
>> +.size xsm_flask_init_policy, . - xsm_flask_init_policy
>> +
>> +/* const unsigned int __initconst xsm_flask_init_policy_size */
>> +.align 4
>> +.global xsm_flask_init_policy_size
>> +xsm_flask_init_policy_size:
>> +.long .Lend - xsm_flask_init_policy
> Similarly .long isn't really universal (various arches override
> it in gas). Aiui .dc.l is intended to be portable (despite still
> carrying the 'l' in its name, and despite even this one getting
> overridden by two arches). But perhaps best to ask on the
> binutils list.

... this is not a clear or obvious way to go, not least because it makes
a different expectation that int will never change from being 32 bits. 
At least .long will work even if it becomes longer in a future toolchain.

What is used here doesn't need to be universal - it only needs to work
for the architectures we support.

If hand writing an asm file isn't considered good enough, then the other
options are a C file with inline asm incbin, or `objdump
--rename-section`.  The latter one would require a few changes elsewhere
in the code, but only for linkage purposes.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [xen-unstable-smoke test] 144639: tolerable all pass - PUSHED

2019-12-09 Thread osstest service owner
flight 144639 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144639/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  b73aad4c8b6a767ce15cc8cb65f9eeab7bfccdae
baseline version:
 xen  ae25407faaaddf4abe44137ebf0e177a8c8f9858

Last test of basis   144607  2019-12-06 19:01:07 Z2 days
Testing same since   144639  2019-12-09 14:00:23 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 
  Jeremi Piotrowski 
  Krzysztof Kolasa 
  Mark Pryor 
  Rasmus Villemoes 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   ae25407faa..b73aad4c8b  b73aad4c8b6a767ce15cc8cb65f9eeab7bfccdae -> smoke

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 5/6] xen/tasklet: Return -ERESTART from continue_hypercall_on_cpu()

2019-12-09 Thread Jan Beulich
On 05.12.2019 23:30, Andrew Cooper wrote:
> Some hypercalls tasklets want to create a continuation, rather than fail the
> hypercall with a hard error.  By the time the tasklet is executing, it is too
> late to create the continuation, and even continue_hypercall_on_cpu() doesn't
> have enough state to do it correctly.

I think it would be quite nice if you made clear what piece of state
it is actually missing. To be honest, I don't recall anymore.

> All callers of continue_hypercall_on_cpu() have been updated to turn -ERESTART
> into a continuation, where appropriate modifications can be made to register
> and/or memory parameters.
> 
> This changes the continue_hypercall_on_cpu() behaviour to unconditionally
> create a hypercall continuation, in case the tasklet wants to use it, and then
> to have arch_hypercall_tasklet_result() cancel the continuation when a result
> is available.  None of these hypercalls are fast paths.
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Stefano Stabellini 
> CC: Julien Grall 
> CC: Volodymyr Babchuk 
> 
> There is one RFC point.  The statement in the header file of "If this function
> returns 0 then the function is guaranteed to run at some point in the future."
> was never true.  In the case of a CPU miss, the hypercall would be blindly
> failed with -EINVAL.

"Was never true" sounds like "completely broken". Afaict it was true
in all cases except the purely hypothetical one of the tasklet ending
up executing on the wrong CPU.

> The current behaviour with this patch is to not cancel the continuation, which
> I think is less bad, but still not great.  Thoughts?

Well, that's a guest live lock then aiui. Is there any way for the
guest to make it out of there? If not, perhaps it'd be "better" to
crash the guest? (I don't suppose there's anything we can do to
still make the tasklet run on the intended CPU.)

> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -1489,6 +1489,7 @@ void arch_hypercall_tasklet_result(struct vcpu *v, long 
> res)
>  {
>  struct cpu_user_regs *regs = >arch.cpu_info->guest_cpu_user_regs;
>  
> +regs->pc += 4;  /* Skip over 'hvc #XEN_HYPERCALL_TAG' */
>  HYPERCALL_RESULT_REG(regs) = res;
>  }
>  
> --- a/xen/arch/x86/hypercall.c
> +++ b/xen/arch/x86/hypercall.c
> @@ -170,6 +170,13 @@ void arch_hypercall_tasklet_result(struct vcpu *v, long 
> res)
>  {
>  struct cpu_user_regs *regs = >arch.user_regs;
>  
> +/*
> + * PV hypercalls are all 2-byte instructions (INT $0x82, SYSCALL).  HVM
> + * hypercalls are all 3-byte instructions (VMCALL, VMMCALL).
> + *
> + * Move %rip forwards to complete the continuation.
> + */
> +regs->rip += 2 + is_hvm_vcpu(v);
>  regs->rax = res;
>  }

To leave the system in consistent state, perhaps better to call
hypercall_cancel_continuation() along with the PC/RIP updating?

> --- a/xen/include/xen/domain.h
> +++ b/xen/include/xen/domain.h
> @@ -96,9 +96,11 @@ void domctl_lock_release(void);
>  
>  /*
>   * Continue the current hypercall via func(data) on specified cpu.
> - * If this function returns 0 then the function is guaranteed to run at some
> - * point in the future. If this function returns an error code then the
> - * function has not been and will not be executed.
> + *
> + * This function returns -ERESTART in the success case, and a higher level
> + * caller is required to set up a hypercall continuation.  func() will be run
> + * at some point in the future.  If this function returns any other error 
> code
> + * then func() has not, and will not be executed.
>   */
>  int continue_hypercall_on_cpu(
>  unsigned int cpu, long (*func)(void *data), void *data);

How is this comment any better wrt the "was never true" comment
you've made above? The function still wouldn't be invoked in
case of a CPU miss.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] docs/sphinx: How Xen Boots on x86

2019-12-09 Thread Andrew Cooper
On 09/12/2019 15:20, Jan Beulich wrote:
> On 06.12.2019 20:34, Andrew Cooper wrote:
>> Begin to document how the x86 build of Xen boots.  It is by no means 
>> complete,
>> but is a start.
>>
>> Signed-off-by: Andrew Cooper 
>> ---
>> CC: Jan Beulich 
>> CC: Wei Liu 
>> CC: Roger Pau Monné 
>>
>> This came about while I sat in SFO waiting for a delayed flight, and was 
>> asked
>> a question by the Trenchboot folk.
>>
>> Writing it down like this already highlights some issues, such as the EFI
>> binary having MB1/MB2 headers.
> While at least the MB1 ones aren't really necessary, they also don't
> do any harm, do they?

At least one version of iPXE seems to prefer MB2 over PE, which is why I
was asked "where are Xen's entrypoints?" in the first place.

>
>> --- /dev/null
>> +++ b/docs/hypervisor-guide/x86/how-xen-boots.rst
>> @@ -0,0 +1,101 @@
>> +.. SPDX-License-Identifier: CC-BY-4.0
>> +
>> +How Xen Boots
>> +=
>> +
>> +This is an at-a-glance reference of Xen's booting capabilities and
>> +expectations.
>> +
>> +
>> +Build
>> +-
>> +
>> +A build of xen produces ``xen.gz`` and optionally ``xen.efi`` as final
>> +artefacts.
>> +
>> + * For BIOS, Xen supports the Multiboot 1 and 2 protocols.
>> +
>> + * For EFI, Xen supports Multiboot 2 with EFI extensions, and native EFI64.
>> +
>> + * For virtualisation, Xen supports starting directly with the PVH boot
>> +   protocol.
>> +
>> +
>> +Objects
>> +~~~
>> +
>> +To begin with, most object files are compiled and linked.  This includes the
>> +Multiboot 1 and 2 headers and entrypoints, including the Multiboot 2 tags 
>> for
>> +EFI extensions.  When ``CONFIG_PVH_GUEST`` is selected at build time, this
>> +includes the PVH entrypoint and associated ELF notes.
>> +
>> +Depending on whether the compiler supports ``__attribute__((__ms_abi__))`` 
>> or
>> +not, either an EFI stub is included which nops/fails applicable setup calls,
>> +or full EFI support is included.
> Perhaps also mention that the linker needs to support the necessary
> binary output format? And perhaps "setup and runtime calls"?

Link time behaviour is (deliberately) in a later section.

>
>> +Protocols and entrypoints
>> +~
>> +
>> +All headers and tags are built in ``xen/arch/x86/boot/head.S``
>> +
>> +The Multiboot 1 headers request aligned modules and memory information.  
>> Entry
>> +is via the start of the binary image, which is the ``start`` symbol.  This
>> +entrypoint must be started in 32bit mode.
>> +
>> +The Multiboot 2 headers are more flexible, and in addition request that the
>> +image be loaded as high as possible below the 4G boundary, with 2M 
>> alignment.
>> +Entry is still via the ``start`` symbol as with MB1.
> Perhaps explicitly (re)state this is in 32-bit mode?
>
>> +Headers for the EFI MB2 extensions are also present.  These request that
>> +``ExitBootServices()`` not be called, and register ``__efi_mb2_start`` as an
>> +alternative entrypoint, entered in 64bit mode.
>> +
>> +If ``CONFIG_PVH_GUEST`` was selected at build time, an Elf note is included
>> +which indicates the ability to use the PVH boot protocol, and registers
>> +``__pvh_start`` as the entrypoint, entered in 32bit mode.
>> +
>> +
>> +xen.gz
>> +~~
>> +
>> +The objects are linked together to form ``xen-syms`` which is an ELF64
>> +executable with full debugging symbols.  ``xen.gz`` is formed by stripping
>> +``xen-syms``, then repackaging the result as an ELF32 object with a single
>> +load section at 2MB, and ``gzip``-ing the result.  Despite the ELF32 having 
>> a
>> +fixed load address, its contents are relocatable.
> This is a little ambiguous I guess - most of the code is PIC and as
> such relocatable, but not in a way a boot loader could arrange for.

I don't follow your concern.

Everything which needs to be is position independent (subject to being
loaded on a 2M boundary IIRC), and this property is requested by the MB2
header.

>
>> +Any bootloader which unzips the binary and follows the ELF headers will 
>> place
>> +it at the 2M boundary and jump to ``start`` which is the identified entry
>> +point.  However, Xen depends on being entered with the MB1 or MB2 protocols,
>> +and will terminate otherwise.
>> +
>> +The MB2+EFI entrypoint depends on being entered with the MB2 protocol, and
>> +will terminate if the entry protocol is wrong, or if EFI details aren't
>> +provided, or if EFI Boot Services are not available.
>> +
>> +
>> +xen.efi
>> +~~~
>> +
>> +When a PEI-capable toolchain is found, the objects are linked together and a
>> +PE64 binary is created.  It can be run directly from the EFI shell, and has
> I think it's commonly called PE32+, not PE64.

Ok., because by definition, it can stack.

>
> Maybe also mention the "chainloader" grub command it can be used with?
> Or do we consider this uninteresting enough with modern grub?

I wasn't planning to consider chainloading, as it isn't really relevant
to how Xen starts, and can be stacked 

Re: [Xen-devel] [PATCH 3/4] xen/interface: don't discard pending work in FRONT/BACK_RING_ATTACH

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Jürgen Groß 
> Sent: 09 December 2019 13:55
> To: Durrant, Paul ; linux-ker...@vger.kernel.org;
> xen-devel@lists.xenproject.org
> Cc: Boris Ostrovsky ; Stefano Stabellini
> 
> Subject: Re: [PATCH 3/4] xen/interface: don't discard pending work in
> FRONT/BACK_RING_ATTACH
> 
> On 05.12.19 15:01, Paul Durrant wrote:
> > Currently these macros will skip over any requests/responses that are
> > added to the shared ring whilst it is detached. This, in general, is not
> > a desirable semantic since most frontend implementations will eventually
> > block waiting for a response which would either never appear or never be
> > processed.
> >
> > NOTE: These macros are currently unused. BACK_RING_ATTACH(), however,
> will
> >be used in a subsequent patch.
> >
> > Signed-off-by: Paul Durrant 
> > ---
> > Cc: Boris Ostrovsky 
> > Cc: Juergen Gross 
> > Cc: Stefano Stabellini 
> > ---
> >   include/xen/interface/io/ring.h | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/xen/interface/io/ring.h
> b/include/xen/interface/io/ring.h
> > index 3f40501fc60b..405adfed87e6 100644
> > --- a/include/xen/interface/io/ring.h
> > +++ b/include/xen/interface/io/ring.h
> > @@ -143,14 +143,14 @@ struct __name##_back_ring {
>   \
> >   #define FRONT_RING_ATTACH(_r, _s, __size) do {
> > \
> >   (_r)->sring = (_s);   
> > \
> >   (_r)->req_prod_pvt = (_s)->req_prod;  \
> > -(_r)->rsp_cons = (_s)->rsp_prod;   
> > \
> > +(_r)->rsp_cons = (_s)->req_prod;   
> > \
> >   (_r)->nr_ents = __RING_SIZE(_s, __size);  
> > \
> >   } while (0)
> >
> >   #define BACK_RING_ATTACH(_r, _s, __size) do { 
> > \
> >   (_r)->sring = (_s);   
> > \
> >   (_r)->rsp_prod_pvt = (_s)->rsp_prod;  \
> > -(_r)->req_cons = (_s)->req_prod;   
> > \
> > +(_r)->req_cons = (_s)->rsp_prod;   
> > \
> >   (_r)->nr_ents = __RING_SIZE(_s, __size);  
> > \
> >   } while (0)
> 
> Lets look at all possible scenarios where BACK_RING_ATTACH()
> might happen:
> 
> Initially (after [FRONT|BACK]_RING_INIT(), leaving _pvt away):
> req_prod=0, rsp_cons=0, rsp_prod=0, req_cons=0
> Using BACK_RING_ATTACH() is fine (no change)
> 
> Request queued:
> req_prod=1, rsp_cons=0, rsp_prod=0, req_cons=0
> Using BACK_RING_ATTACH() is fine (no change)
> 
> and taken by backend:
> req_prod=1, rsp_cons=0, rsp_prod=0, req_cons=1
> Using BACK_RING_ATTACH() is resetting req_cons to 0, will result
> in redoing request (for blk this is fine, other devices like SCSI
> tapes will have issues with that). One possible solution would be
> to ensure all taken requests are either stopped or the response
> is queued already.

Yes, it is the assumption that a backend will drain and complete any requests 
it is handling, but it will not deal with new ones being posted by the 
frontend. This does appear to be the case for blkback.

> 
> Response queued:
> req_prod=1, rsp_cons=0, rsp_prod=1, req_cons=1
> Using BACK_RING_ATTACH() is fine (no change)
> 
> Response taken:
> req_prod=1, rsp_cons=1, rsp_prod=1, req_cons=1
> Using BACK_RING_ATTACH() is fine (no change)
> 
> In general I believe the [FRONT|BACK]_RING_ATTACH() macros are not
> fine to be used in the current state, as the *_pvt fields normally not
> accessible by the other end are initialized using the (possibly
> untrusted) values from the shared ring. There needs at least to be a
> test for the values to be sane, and your change should not result in the
> same value to be read twice, as it could have changed in between.

What test would you apply to sanitize the value of the pvt pointer? Another 
option would be to have a backend write its pvt value into the xenstore backend 
area when the ring is unmapped, so that a new instance definitely resumes where 
the old one left off. The value of rsp_prod could, of course, be overwritten by 
the guest at any time and so there's little point in attempting sanitize it.

> 
> As this is an error which can happen in other OS's, too, I'd recommend
> to add the adapted macros (plus a comment regarding the possible
> problem noted above for special devices like tapes) to the Xen variant
> of ring.h.
> 

I can certainly send a patch to Xen once we agree on the final definition.

  Paul

> 
> Juergen
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 4/6] xen/hypercall: Cope with -ERESTART on more hypercall paths

2019-12-09 Thread Jan Beulich
On 09.12.2019 17:25, Jan Beulich wrote:
> On 05.12.2019 23:30, Andrew Cooper wrote:
>> These hypercalls each use continue_hypercall_on_cpu(), whose API is about to
>> switch to use -ERESTART.  Update the soon-to-be affected paths to cope,
>> folding existing contination logic where applicable.
>>
>> In addition:
>>  * For platform op and sysctl, insert a cpu_relax() into what is otherwise a
>>tight spinlock loop, and make the continuation logic common at the
>>epilogue.
> 
> Is this really needed with a hypercall_preempt_check() invocation
> already in the bodies of these loops?

And if it's really to be added, shouldn't it be at the bottom
of the loop bodies rather than at the top?

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 4/6] xen/hypercall: Cope with -ERESTART on more hypercall paths

2019-12-09 Thread Jan Beulich
On 05.12.2019 23:30, Andrew Cooper wrote:
> These hypercalls each use continue_hypercall_on_cpu(), whose API is about to
> switch to use -ERESTART.  Update the soon-to-be affected paths to cope,
> folding existing contination logic where applicable.
> 
> In addition:
>  * For platform op and sysctl, insert a cpu_relax() into what is otherwise a
>tight spinlock loop, and make the continuation logic common at the
>epilogue.

Is this really needed with a hypercall_preempt_check() invocation
already in the bodies of these loops?

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Durrant, Paul
> -Original Message-
[snip]
> >
> > Well unbind is pretty useless now IMO since bind doesn't work, and a
> transition straight to closed is just plain wrong anyway.
> 
> Why do you claim that a straight transition into the closed state is
> wrong?

It's badly documented, I agree, but have a look at 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/xen-netback/xenbus.c#n480.
 Connected -> Closed is not a valid transition, and I don't think it was ever 
intended to be.

> 
> I don't see any such mention in blkif.h, which also doesn't contain
> any guidelines regarding closing state transitions, so unless
> otherwise stated somewhere else transitions into closed can happen
> from any state IMO.
> 

They can, but it is even more poorly documented what should be done in this 
case.

> > But, we could have a flag that the backend driver sets to say that it
> supports transparent re-bind that gates this code. Would that make you
> feel more comfortable?
> 
> Having an option to leave state untouched when unbinding would be fine
> for me, otherwise state should be set to closed when unbinding. I
> don't think there's anything else that needs to be done in this
> regard, the cleanup should be exactly the same the only difference
> being the setting of all the active backends to closed state.
> 

Ok, I'll add such a flag and define it for blkback only, in patch #4 i.e. when 
it actually gains the ability to rebind.

> > If you want unbind to actually do a proper unplug then that's extra work
> and not really something I want to tackle (and re-bind would still need to
> be toolstack initiated as something would have to re-create the xenstore
> area).
> 
> Why do you say the xenstore area would need to be recreated?
> 
> Setting state to closed shouldn't cause any cleanup of the xenstore
> area, as that should already happen for example when using pvgrub
> since in that case grub itself disconnects and already causes a
> transition to closed and a re-attachment afterwards by the guest
> kernel.
> 

For some reason, when I originally tested, the xenstore area disappeared. I 
checked again and it did not this time. I just ended up with a frontend stuck 
in state 5 (because it is the system disk and won't go offline) trying to talk 
to a non-existent backend. Upon re-bind the backend goes into state 5 (because 
it sees the 5 in the frontend) and leaves the guest wedged.

  Paul


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/6] xen/domctl: Consolidate hypercall continuation handling at the top level

2019-12-09 Thread Jan Beulich
On 05.12.2019 23:30, Andrew Cooper wrote:
> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -326,9 +326,12 @@ long arch_do_domctl(
>  
>  switch ( domctl->cmd )
>  {
> -
>  case XEN_DOMCTL_shadow_op:
>  ret = paging_domctl(d, >u.shadow_op, u_domctl, 0);
> +/*
> + * Continuations from paging_domctl() switch index to arch_1, and
> + * can't use the common domctl continuation path.
> + */
>  if ( ret == -ERESTART )
>  return hypercall_create_continuation(__HYPERVISOR_arch_1,
>   "h", u_domctl);

There's also XEN_DOMCTL_getpageframeinfo3 down from here which
now invokes a continuation.

> @@ -1080,6 +1068,9 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) 
> u_domctl)
>  if ( copyback && __copy_to_guest(u_domctl, op, 1) )
>  ret = -EFAULT;
>  
> +if ( ret == -ERESTART )
> +ret = hypercall_create_continuation(__HYPERVISOR_domctl,
> +"h", u_domctl);

You may want to mention in the description the bug you fix here:
Previously the -EFAULT returning visible in context should have
canceled any active continuation.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3] x86: do not enable global pages when virtualized on AMD hardware

2019-12-09 Thread Jan Beulich
On 09.12.2019 16:52, Roger Pau Monné wrote:
> On Mon, Dec 09, 2019 at 04:39:58PM +0100, Jan Beulich wrote:
>> On 09.12.2019 16:36, Roger Pau Monné wrote:
>>> On Mon, Dec 09, 2019 at 04:04:51PM +0100, Jan Beulich wrote:
 On 09.12.2019 15:46, Roger Pau Monné wrote:
> On Mon, Dec 09, 2019 at 03:21:28PM +0100, Jan Beulich wrote:
>> On 09.12.2019 11:20, Roger Pau Monné wrote:
>>> On Wed, Dec 04, 2019 at 06:05:11PM +0100, Jan Beulich wrote:
 On 04.12.2019 17:18, Roger Pau Monné wrote:
> On Wed, Dec 04, 2019 at 05:11:42PM +0100, Jan Beulich wrote:
>> On 04.12.2019 16:12, Roger Pau Monne wrote:
>>> @@ -130,7 +143,7 @@ unsigned long pv_make_cr4(const struct vcpu *v)
>>>   */
>>>  if ( d->arch.pv.pcid )
>>>  cr4 |= X86_CR4_PCIDE;
>>> -else if ( !d->arch.pv.xpti )
>>> +else if ( !d->arch.pv.xpti && opt_global_pages )
>>>  cr4 |= X86_CR4_PGE;
>>
>> I'm sorry for noticing this only now, but what about XEN_MINIMAL_CR4,
>> which includes X86_CR4_PGE?
>
> I've tried removing PGE from XEN_MINIMAL_CR4 but it made no noticeable
> performance difference, so I left it as-is.

 My concern isn't about performance, but correctness. I admit I
 forgot for a moment that we now always write CR4 (unless the
 cached value matches the intended new one). Yet
 mmu_cr4_features (starting out as XEN_MINIMAL_CR4) is still of
 concern.

 I think this at least requires extending the description to
 discuss the correctness.
>>>
>>> Would you be fine with adding the following at the end of the commit
>>> message.
>>>
>>> "Note that XEN_MINIMAL_CR4 is not modified, and thus global pages are
>>> left enabled for the hypervisor. This is not an issue because the code
>>> to switch the control registers (cr3 and cr4) already takes into
>>> account such situation and performs the necessary flushes. The same
>>> already happens when using XPTI or PCIDE, as the guest cr4 doesn't
>>> have global pages enabled in that case either."
>>
>> Yes, this is good for XEN_MINIMAL_CR4. But I think mmu_cr4_features
>> needs discussing (or at least mentioning, if the same arguments
>> apply) as well.
>
> The same applies to mmu_cr4_features, it's fine for the hypervisor to
> use a different set of cr4 features (especially PGE) than PV guests:
> this is already the case when using XPTI or PCIDE when Xen cr4 will
> have PGE and guests cr4 won't, and switch_cr3_cr4 DTRT.
>
> So I would s/XEN_MINIMAL_CR4/XEN_MINIMAL_CR4 and mmu_cr4_features/ in
> the above proposed paragraph.

 I'm afraid it's more complicated, up to and including you making a
 possible pre-existing bug worse: The S3 resume path loads CR4 from
 mmu_cr4_features, but doesn't update the in-memory cache of the
 register.
>>>
>>> All domains are paused and the scheduler is disabled when doing S3
>>> suspend/resume, and the actual suspend and resume code is run by a
>>> tasklet which is executed in the idle vCPU context (since all domains
>>> are paused), and hence the write of CR4 with mmu_cr4_features is fine
>>> as entering guest context after resume is going to involve a call to
>>> switch_cr3_cr4 in order to switch out of the idle vCPU.
>>
>> And switch_cr3_cr4() has
>>
>> if ( old_cr4 != cr4 )
>> write_cr4(cr4);
>>
>> with old_cr4 read from the cache.
> 
> That read from the cache is fine. The idle vCPU cr4 is
> mmu_cr4_features (see write_ptbase), and hence the write done in
> __ret_point should match what's in the cache.

Ah yes.

>>> It might be clearer to just save cr4 in do_suspend_lowlevel like it's
>>> done with the rest of the control registers.
>>
>> Not just more clear, but also more reliable.
> 
> Let me prepare a patch to improve this, but I think the current patch
> at hand is correct and shouldn't be blocked by this suspend/resume
> improvement.

Fine with me, but please re-post with the enhanced description.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 4/6] Add Code Review Guide

2019-12-09 Thread Lars Kurth


> On 9 Dec 2019, at 11:02, Lars Kurth  wrote:
> 
> 
> 
> On 06/12/2019, 09:51, "Jan Beulich"  > wrote:
> 
>On 06.12.2019 00:41, Lars Kurth wrote:
>> I propose to add the following section to code-review-guide.md
>> 
>> 
>> ## Problematic Patch Reviews
>> 
>> A typical waterfall software development process is sequential with the 
>> following 
>> steps: define requirements, analyse, design, code, test and deploy. Problems 
>> uncovered by code review or testing at such a late stage can cause costly 
>> redesign 
>> and delays. The principle of **[Shift 
>> Left](https://devopedia.org/shift-left)** is to take a 
>> task that is traditionally performed at a late stage in the process and 
>> perform that task 
>> at earlier stages. The goal is to save time by avoiding refactoring.
>> 
>> Typically, problematic patch reviews uncover issues such as wrong or missed 
>> assumptions, a problematic architecture or design, or other bugs that 
>> require 
>> significant re-implementation of a patch series to fix the issue.
>> 
>> The principle of **Shift Left** also applies in code reviews. Let's assume a 
>> series has
>> a major flaw: ideally, this flaw would be picked up in the **first or second 
>> iteration** of 
>> the code review. As significant parts of the code may have to be re-written, 
>> it does not 
>> make sense for reviewers to highlight minor issues (such as style issues) 
>> until major 
>> flaws have been addressed. By providing feedback on minor issues reviewers 
>> cause 
>> the code author and themselves extra work by asking for changes to code, 
>> which 
>> ultimately may be changed later.
>> 
>> The question then becomes, how do code reviewers identify major issues 
>> early? 
>> 
>> This is where I really need help. Are there any tips and recommendations 
>> that we could give?
>> I can clearly highlight that we have RFC series, but in practice that does 
>> not solve the problem as RFCs don’t get prioritized
>> How do reviewers normally approach a series: do you a) take a big picture 
>> view first, or b) do most of you work through a series sequentially
> 
>Afaic - depends heavily on the patch / series. I wouldn't typically
>peek ahead in a series, but it has happened. But as you say
>(elsewhere) the cover letter should put in place the "big picture".
>A series should generally be reviewable going from patch to patch,
>having the cover letter in mind.
> 
> I am wondering what others do. 
> 
> I think explaining the basic work-flow from the viewpoint of a reviewer and 
> code author maybe in a separate section, which is not tied to the problem 
> case would make sense. More input from other maintainers would be valuable. 
> My gut-feel is that most reviewers "read and review" series sequentially, 
> which has implications for the author. E.g.
> - docs/design docs should be at the beginning of a series
> - key header files or changes to them should be at the beginning of a series
> - Etc
> 
>> I then propose to change the following section in communication-practice.md
>> 
>> ### Prioritize significant flaws
>> If a patch or patch series has significant flaws, such as
>> * It is built on wrong assumptions
>> * There are issues with the architecture or the design
> 
>In such a case a full review of course doesn't make much sense. But
>this is far from the typical situation. Way more often you have some
>_part_ of a patch or series which has a bigger issue, but other
>parts are in need of no or just minor changes.
> 
> I know that this is an unusual situation. But it has happened in clusters 
> frequently in the past.
> 
> I am wondering whether we should introduce some informal convention to mark 
> _part_ of a series as problematic. A simple example of how to do this in the 
> cover letter would do
> 
>> it does not make sense to do a detailed code review. In such cases, it is 
>> best to
>> focus on the major issues first and deal with style and minor issues in a 
>> subsequent
>> review. Not all series have significant flaws, but most series have 
>> different classes of 
>> changes that are required for acceptance: covering a range of major code 
>> modifications to minor code style fixes. To avoid misunderstandings between 
>> reviewers and contributors, it is important to establish and agree whether a 
>> series or 
>> part of a series has a significant flaw and agree a course of action. 
>> 
>> A pragmatic approach would be to
>> * Highlight problematic portions of a series in the cover letter 
>> * For the patch author and reviewer(s) to agree that for problematic to omit 
>> style and
>> minor issues in the review, until the significant flaw is addressed
>> 
>> This saves both the patch author and reviewer(s) time. Note that some 
>> background
>> is covered in detail in [Problematic Patch 
>> Reviews](resolving-disagreement.md#problems).
> 
>I have no issues with the suggested text in general, but I 

Re: [Xen-devel] [PATCH v3] x86: do not enable global pages when virtualized on AMD hardware

2019-12-09 Thread Roger Pau Monné
On Mon, Dec 09, 2019 at 04:39:58PM +0100, Jan Beulich wrote:
> On 09.12.2019 16:36, Roger Pau Monné wrote:
> > On Mon, Dec 09, 2019 at 04:04:51PM +0100, Jan Beulich wrote:
> >> On 09.12.2019 15:46, Roger Pau Monné wrote:
> >>> On Mon, Dec 09, 2019 at 03:21:28PM +0100, Jan Beulich wrote:
>  On 09.12.2019 11:20, Roger Pau Monné wrote:
> > On Wed, Dec 04, 2019 at 06:05:11PM +0100, Jan Beulich wrote:
> >> On 04.12.2019 17:18, Roger Pau Monné wrote:
> >>> On Wed, Dec 04, 2019 at 05:11:42PM +0100, Jan Beulich wrote:
>  On 04.12.2019 16:12, Roger Pau Monne wrote:
> > @@ -130,7 +143,7 @@ unsigned long pv_make_cr4(const struct vcpu *v)
> >   */
> >  if ( d->arch.pv.pcid )
> >  cr4 |= X86_CR4_PCIDE;
> > -else if ( !d->arch.pv.xpti )
> > +else if ( !d->arch.pv.xpti && opt_global_pages )
> >  cr4 |= X86_CR4_PGE;
> 
>  I'm sorry for noticing this only now, but what about XEN_MINIMAL_CR4,
>  which includes X86_CR4_PGE?
> >>>
> >>> I've tried removing PGE from XEN_MINIMAL_CR4 but it made no noticeable
> >>> performance difference, so I left it as-is.
> >>
> >> My concern isn't about performance, but correctness. I admit I
> >> forgot for a moment that we now always write CR4 (unless the
> >> cached value matches the intended new one). Yet
> >> mmu_cr4_features (starting out as XEN_MINIMAL_CR4) is still of
> >> concern.
> >>
> >> I think this at least requires extending the description to
> >> discuss the correctness.
> >
> > Would you be fine with adding the following at the end of the commit
> > message.
> >
> > "Note that XEN_MINIMAL_CR4 is not modified, and thus global pages are
> > left enabled for the hypervisor. This is not an issue because the code
> > to switch the control registers (cr3 and cr4) already takes into
> > account such situation and performs the necessary flushes. The same
> > already happens when using XPTI or PCIDE, as the guest cr4 doesn't
> > have global pages enabled in that case either."
> 
>  Yes, this is good for XEN_MINIMAL_CR4. But I think mmu_cr4_features
>  needs discussing (or at least mentioning, if the same arguments
>  apply) as well.
> >>>
> >>> The same applies to mmu_cr4_features, it's fine for the hypervisor to
> >>> use a different set of cr4 features (especially PGE) than PV guests:
> >>> this is already the case when using XPTI or PCIDE when Xen cr4 will
> >>> have PGE and guests cr4 won't, and switch_cr3_cr4 DTRT.
> >>>
> >>> So I would s/XEN_MINIMAL_CR4/XEN_MINIMAL_CR4 and mmu_cr4_features/ in
> >>> the above proposed paragraph.
> >>
> >> I'm afraid it's more complicated, up to and including you making a
> >> possible pre-existing bug worse: The S3 resume path loads CR4 from
> >> mmu_cr4_features, but doesn't update the in-memory cache of the
> >> register.
> > 
> > All domains are paused and the scheduler is disabled when doing S3
> > suspend/resume, and the actual suspend and resume code is run by a
> > tasklet which is executed in the idle vCPU context (since all domains
> > are paused), and hence the write of CR4 with mmu_cr4_features is fine
> > as entering guest context after resume is going to involve a call to
> > switch_cr3_cr4 in order to switch out of the idle vCPU.
> 
> And switch_cr3_cr4() has
> 
> if ( old_cr4 != cr4 )
> write_cr4(cr4);
> 
> with old_cr4 read from the cache.

That read from the cache is fine. The idle vCPU cr4 is
mmu_cr4_features (see write_ptbase), and hence the write done in
__ret_point should match what's in the cache.

> > It might be clearer to just save cr4 in do_suspend_lowlevel like it's
> > done with the rest of the control registers.
> 
> Not just more clear, but also more reliable.

Let me prepare a patch to improve this, but I think the current patch
at hand is correct and shouldn't be blocked by this suspend/resume
improvement.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3] x86: do not enable global pages when virtualized on AMD hardware

2019-12-09 Thread Jan Beulich
On 09.12.2019 16:36, Roger Pau Monné wrote:
> On Mon, Dec 09, 2019 at 04:04:51PM +0100, Jan Beulich wrote:
>> On 09.12.2019 15:46, Roger Pau Monné wrote:
>>> On Mon, Dec 09, 2019 at 03:21:28PM +0100, Jan Beulich wrote:
 On 09.12.2019 11:20, Roger Pau Monné wrote:
> On Wed, Dec 04, 2019 at 06:05:11PM +0100, Jan Beulich wrote:
>> On 04.12.2019 17:18, Roger Pau Monné wrote:
>>> On Wed, Dec 04, 2019 at 05:11:42PM +0100, Jan Beulich wrote:
 On 04.12.2019 16:12, Roger Pau Monne wrote:
> @@ -130,7 +143,7 @@ unsigned long pv_make_cr4(const struct vcpu *v)
>   */
>  if ( d->arch.pv.pcid )
>  cr4 |= X86_CR4_PCIDE;
> -else if ( !d->arch.pv.xpti )
> +else if ( !d->arch.pv.xpti && opt_global_pages )
>  cr4 |= X86_CR4_PGE;

 I'm sorry for noticing this only now, but what about XEN_MINIMAL_CR4,
 which includes X86_CR4_PGE?
>>>
>>> I've tried removing PGE from XEN_MINIMAL_CR4 but it made no noticeable
>>> performance difference, so I left it as-is.
>>
>> My concern isn't about performance, but correctness. I admit I
>> forgot for a moment that we now always write CR4 (unless the
>> cached value matches the intended new one). Yet
>> mmu_cr4_features (starting out as XEN_MINIMAL_CR4) is still of
>> concern.
>>
>> I think this at least requires extending the description to
>> discuss the correctness.
>
> Would you be fine with adding the following at the end of the commit
> message.
>
> "Note that XEN_MINIMAL_CR4 is not modified, and thus global pages are
> left enabled for the hypervisor. This is not an issue because the code
> to switch the control registers (cr3 and cr4) already takes into
> account such situation and performs the necessary flushes. The same
> already happens when using XPTI or PCIDE, as the guest cr4 doesn't
> have global pages enabled in that case either."

 Yes, this is good for XEN_MINIMAL_CR4. But I think mmu_cr4_features
 needs discussing (or at least mentioning, if the same arguments
 apply) as well.
>>>
>>> The same applies to mmu_cr4_features, it's fine for the hypervisor to
>>> use a different set of cr4 features (especially PGE) than PV guests:
>>> this is already the case when using XPTI or PCIDE when Xen cr4 will
>>> have PGE and guests cr4 won't, and switch_cr3_cr4 DTRT.
>>>
>>> So I would s/XEN_MINIMAL_CR4/XEN_MINIMAL_CR4 and mmu_cr4_features/ in
>>> the above proposed paragraph.
>>
>> I'm afraid it's more complicated, up to and including you making a
>> possible pre-existing bug worse: The S3 resume path loads CR4 from
>> mmu_cr4_features, but doesn't update the in-memory cache of the
>> register.
> 
> All domains are paused and the scheduler is disabled when doing S3
> suspend/resume, and the actual suspend and resume code is run by a
> tasklet which is executed in the idle vCPU context (since all domains
> are paused), and hence the write of CR4 with mmu_cr4_features is fine
> as entering guest context after resume is going to involve a call to
> switch_cr3_cr4 in order to switch out of the idle vCPU.

And switch_cr3_cr4() has

if ( old_cr4 != cr4 )
write_cr4(cr4);

with old_cr4 read from the cache.

> It might be clearer to just save cr4 in do_suspend_lowlevel like it's
> done with the rest of the control registers.

Not just more clear, but also more reliable.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3] x86: do not enable global pages when virtualized on AMD hardware

2019-12-09 Thread Roger Pau Monné
On Mon, Dec 09, 2019 at 04:04:51PM +0100, Jan Beulich wrote:
> On 09.12.2019 15:46, Roger Pau Monné wrote:
> > On Mon, Dec 09, 2019 at 03:21:28PM +0100, Jan Beulich wrote:
> >> On 09.12.2019 11:20, Roger Pau Monné wrote:
> >>> On Wed, Dec 04, 2019 at 06:05:11PM +0100, Jan Beulich wrote:
>  On 04.12.2019 17:18, Roger Pau Monné wrote:
> > On Wed, Dec 04, 2019 at 05:11:42PM +0100, Jan Beulich wrote:
> >> On 04.12.2019 16:12, Roger Pau Monne wrote:
> >>> @@ -130,7 +143,7 @@ unsigned long pv_make_cr4(const struct vcpu *v)
> >>>   */
> >>>  if ( d->arch.pv.pcid )
> >>>  cr4 |= X86_CR4_PCIDE;
> >>> -else if ( !d->arch.pv.xpti )
> >>> +else if ( !d->arch.pv.xpti && opt_global_pages )
> >>>  cr4 |= X86_CR4_PGE;
> >>
> >> I'm sorry for noticing this only now, but what about XEN_MINIMAL_CR4,
> >> which includes X86_CR4_PGE?
> >
> > I've tried removing PGE from XEN_MINIMAL_CR4 but it made no noticeable
> > performance difference, so I left it as-is.
> 
>  My concern isn't about performance, but correctness. I admit I
>  forgot for a moment that we now always write CR4 (unless the
>  cached value matches the intended new one). Yet
>  mmu_cr4_features (starting out as XEN_MINIMAL_CR4) is still of
>  concern.
> 
>  I think this at least requires extending the description to
>  discuss the correctness.
> >>>
> >>> Would you be fine with adding the following at the end of the commit
> >>> message.
> >>>
> >>> "Note that XEN_MINIMAL_CR4 is not modified, and thus global pages are
> >>> left enabled for the hypervisor. This is not an issue because the code
> >>> to switch the control registers (cr3 and cr4) already takes into
> >>> account such situation and performs the necessary flushes. The same
> >>> already happens when using XPTI or PCIDE, as the guest cr4 doesn't
> >>> have global pages enabled in that case either."
> >>
> >> Yes, this is good for XEN_MINIMAL_CR4. But I think mmu_cr4_features
> >> needs discussing (or at least mentioning, if the same arguments
> >> apply) as well.
> > 
> > The same applies to mmu_cr4_features, it's fine for the hypervisor to
> > use a different set of cr4 features (especially PGE) than PV guests:
> > this is already the case when using XPTI or PCIDE when Xen cr4 will
> > have PGE and guests cr4 won't, and switch_cr3_cr4 DTRT.
> > 
> > So I would s/XEN_MINIMAL_CR4/XEN_MINIMAL_CR4 and mmu_cr4_features/ in
> > the above proposed paragraph.
> 
> I'm afraid it's more complicated, up to and including you making a
> possible pre-existing bug worse: The S3 resume path loads CR4 from
> mmu_cr4_features, but doesn't update the in-memory cache of the
> register.

All domains are paused and the scheduler is disabled when doing S3
suspend/resume, and the actual suspend and resume code is run by a
tasklet which is executed in the idle vCPU context (since all domains
are paused), and hence the write of CR4 with mmu_cr4_features is fine
as entering guest context after resume is going to involve a call to
switch_cr3_cr4 in order to switch out of the idle vCPU.

It might be clearer to just save cr4 in do_suspend_lowlevel like it's
done with the rest of the control registers.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/6] xen/tasklet: Switch data parameter from unsigned long to void *.

2019-12-09 Thread Jan Beulich
On 05.12.2019 23:30, Andrew Cooper wrote:
> Most users pass a vcpu pointer, and only stopmachine_action() takes an integer
> parameter.  Switch to using void * to substantially reduce the number of
> explicit casts.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Jan Beulich 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] docs/sphinx: How Xen Boots on x86

2019-12-09 Thread Jan Beulich
On 06.12.2019 20:34, Andrew Cooper wrote:
> Begin to document how the x86 build of Xen boots.  It is by no means complete,
> but is a start.
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> 
> This came about while I sat in SFO waiting for a delayed flight, and was asked
> a question by the Trenchboot folk.
> 
> Writing it down like this already highlights some issues, such as the EFI
> binary having MB1/MB2 headers.

While at least the MB1 ones aren't really necessary, they also don't
do any harm, do they?

> --- /dev/null
> +++ b/docs/hypervisor-guide/x86/how-xen-boots.rst
> @@ -0,0 +1,101 @@
> +.. SPDX-License-Identifier: CC-BY-4.0
> +
> +How Xen Boots
> +=
> +
> +This is an at-a-glance reference of Xen's booting capabilities and
> +expectations.
> +
> +
> +Build
> +-
> +
> +A build of xen produces ``xen.gz`` and optionally ``xen.efi`` as final
> +artefacts.
> +
> + * For BIOS, Xen supports the Multiboot 1 and 2 protocols.
> +
> + * For EFI, Xen supports Multiboot 2 with EFI extensions, and native EFI64.
> +
> + * For virtualisation, Xen supports starting directly with the PVH boot
> +   protocol.
> +
> +
> +Objects
> +~~~
> +
> +To begin with, most object files are compiled and linked.  This includes the
> +Multiboot 1 and 2 headers and entrypoints, including the Multiboot 2 tags for
> +EFI extensions.  When ``CONFIG_PVH_GUEST`` is selected at build time, this
> +includes the PVH entrypoint and associated ELF notes.
> +
> +Depending on whether the compiler supports ``__attribute__((__ms_abi__))`` or
> +not, either an EFI stub is included which nops/fails applicable setup calls,
> +or full EFI support is included.

Perhaps also mention that the linker needs to support the necessary
binary output format? And perhaps "setup and runtime calls"?

> +Protocols and entrypoints
> +~
> +
> +All headers and tags are built in ``xen/arch/x86/boot/head.S``
> +
> +The Multiboot 1 headers request aligned modules and memory information.  
> Entry
> +is via the start of the binary image, which is the ``start`` symbol.  This
> +entrypoint must be started in 32bit mode.
> +
> +The Multiboot 2 headers are more flexible, and in addition request that the
> +image be loaded as high as possible below the 4G boundary, with 2M alignment.
> +Entry is still via the ``start`` symbol as with MB1.

Perhaps explicitly (re)state this is in 32-bit mode?

> +Headers for the EFI MB2 extensions are also present.  These request that
> +``ExitBootServices()`` not be called, and register ``__efi_mb2_start`` as an
> +alternative entrypoint, entered in 64bit mode.
> +
> +If ``CONFIG_PVH_GUEST`` was selected at build time, an Elf note is included
> +which indicates the ability to use the PVH boot protocol, and registers
> +``__pvh_start`` as the entrypoint, entered in 32bit mode.
> +
> +
> +xen.gz
> +~~
> +
> +The objects are linked together to form ``xen-syms`` which is an ELF64
> +executable with full debugging symbols.  ``xen.gz`` is formed by stripping
> +``xen-syms``, then repackaging the result as an ELF32 object with a single
> +load section at 2MB, and ``gzip``-ing the result.  Despite the ELF32 having a
> +fixed load address, its contents are relocatable.

This is a little ambiguous I guess - most of the code is PIC and as
such relocatable, but not in a way a boot loader could arrange for.

> +Any bootloader which unzips the binary and follows the ELF headers will place
> +it at the 2M boundary and jump to ``start`` which is the identified entry
> +point.  However, Xen depends on being entered with the MB1 or MB2 protocols,
> +and will terminate otherwise.
> +
> +The MB2+EFI entrypoint depends on being entered with the MB2 protocol, and
> +will terminate if the entry protocol is wrong, or if EFI details aren't
> +provided, or if EFI Boot Services are not available.
> +
> +
> +xen.efi
> +~~~
> +
> +When a PEI-capable toolchain is found, the objects are linked together and a
> +PE64 binary is created.  It can be run directly from the EFI shell, and has

I think it's commonly called PE32+, not PE64.

Maybe also mention the "chainloader" grub command it can be used with?
Or do we consider this uninteresting enough with modern grub?

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/microcode: Support builtin CPU microcode

2019-12-09 Thread Andrew Cooper
On 09/12/2019 08:41, Eslam Elnikety wrote:
> diff --git a/docs/misc/builtin-ucode.txt b/docs/misc/builtin-ucode.txt
> new file mode 100644
> index 00..43bb60d3eb

Instead of introducing a new file, please extend
docs/admin-guide/microcode-loading.rst

I have an in-prep docs/hypervisor-guide/microcode-loading.rst as well,
which I'll see about posting.  It would be a more appropriate place for
the technical details.

> diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
> index 6ced293d88..7afbe44286 100644
> --- a/xen/arch/x86/microcode.c
> +++ b/xen/arch/x86/microcode.c
> @@ -97,6 +97,14 @@ static struct ucode_mod_blob __initdata ucode_blob;
>   */
>  static bool_t __initdata ucode_scan;
>  
> +#ifdef CONFIG_BUILTIN_UCODE
> +/* builtin is the default when BUILTIN_UCODE is set */
> +static bool_t __initdata ucode_builtin = 1;

bool, true.

> +
> +extern const char __builtin_intel_ucode_start[], __builtin_intel_ucode_end[];
> +extern const char __builtin_amd_ucode_start[], __builtin_amd_ucode_end[];
> +#endif
> +
>  /* By default, ucode loading is done in NMI handler */
>  static bool ucode_in_nmi = true;
>  
> @@ -110,9 +118,9 @@ void __init microcode_set_module(unsigned int idx)
>  }
>  
>  /*
> - * The format is '[|scan=, nmi=]'. Both options are
> - * optional. If the EFI has forced which of the multiboot payloads is to be
> - * used, only nmi= is parsed.
> + * The format is '[|scan=|builtin=, nmi=]'. All
> + * options are optional. If the EFI has forced which of the multiboot 
> payloads
> + * is to be used, only nmi= is parsed.
>   */

Please delete this, or I'll do a prereq patch to fix it and the command
line docs.  (Both are in a poor state.)

> @@ -237,6 +249,48 @@ void __init microcode_grab_module(
>  scan:
>  if ( ucode_scan )
>  microcode_scan_module(module_map, mbi);
> +
> +#ifdef CONFIG_BUILTIN_UCODE
> +/*
> + * Do not use the builtin microcode if:
> + * (a) builtin has been explicitly turned off (e.g., ucode=no-builtin)
> + * (b) a microcode module has been specified or a scan is successful
> + */
> +if ( !ucode_builtin || ucode_mod.mod_end || ucode_blob.size )
> +return;
> +
> +/* Set ucode_start/_end to the proper blob */
> +if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
> +ucode_blob.size = (size_t)(__builtin_amd_ucode_end
> +   - __builtin_amd_ucode_start);

No need to cast the result.  Also, preferred Xen style would be to have
- on the preceding line.

> +else if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
> +ucode_blob.size = (size_t)(__builtin_intel_ucode_end
> +   - __builtin_intel_ucode_start);
> +else
> +return;
> +
> +if ( !ucode_blob.size )
> +{
> +printk("No builtin ucode! 'ucode=builtin' is nullified.\n");
> +return;
> +}
> +else if ( ucode_blob.size > MAX_EARLY_CPIO_MICROCODE )
> +{
> +printk("Builtin microcode payload too big! (%ld, we can do %d)\n",
> +   ucode_blob.size, MAX_EARLY_CPIO_MICROCODE);
> +ucode_blob.size = 0;
> +return;
> +}
> +
> +ucode_blob.data = xmalloc_bytes(ucode_blob.size);
> +if ( !ucode_blob.data )
> +return;

Any chance we can reuse the "fits" logic to avoid holding every
inapplicable blob in memory as well?

> +
> +if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
> +memcpy(ucode_blob.data, __builtin_amd_ucode_start, ucode_blob.size);
> +else
> +memcpy(ucode_blob.data, __builtin_intel_ucode_start, 
> ucode_blob.size);
> +#endif
>  }
>  
>  const struct microcode_ops *microcode_ops;
> diff --git a/xen/arch/x86/microcode/Makefile b/xen/arch/x86/microcode/Makefile
> new file mode 100644
> index 00..6d585c5482
> --- /dev/null
> +++ b/xen/arch/x86/microcode/Makefile
> @@ -0,0 +1,40 @@
> +# Copyright (C) 2019 Amazon.com, Inc. or its affiliates.
> +# Author: Eslam Elnikety 
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +
> +obj-y += builtin_ucode.o
> +
> +# Directory holding the microcode updates.
> +UCODE_DIR=$(patsubst "%",%,$(CONFIG_BUILTIN_UCODE_DIR))
> +amd-blobs := $(wildcard $(UCODE_DIR)/amd-ucode/*)
> +intel-blobs := $(wildcard $(UCODE_DIR)/intel-ucode/*)

This is a little dangerous.  I can see why you want to do it like this,
and I can't provide any obvious suggestions, but if this glob picks up
anything which isn't a microcode file, it will break the logic to search
for the right 

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Roger Pau Monné
On Mon, Dec 09, 2019 at 02:41:40PM +, Durrant, Paul wrote:
> > -Original Message-
> > From: Roger Pau Monné 
> > Sent: 09 December 2019 14:29
> > To: Durrant, Paul 
> > Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Juergen
> > Gross ; Stefano Stabellini ;
> > Boris Ostrovsky 
> > Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> > closed
> > 
> > On Mon, Dec 09, 2019 at 12:40:47PM +, Durrant, Paul wrote:
> > > > -Original Message-
> > > > From: Roger Pau Monné 
> > > > Sent: 09 December 2019 12:26
> > > > To: Durrant, Paul 
> > > > Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> > Juergen
> > > > Gross ; Stefano Stabellini ;
> > > > Boris Ostrovsky 
> > > > Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> > forced to
> > > > closed
> > > >
> > > > On Mon, Dec 09, 2019 at 12:01:38PM +, Durrant, Paul wrote:
> > > > > > -Original Message-
> > > > > > From: Roger Pau Monné 
> > > > > > Sent: 09 December 2019 11:39
> > > > > > To: Durrant, Paul 
> > > > > > Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> > > > Juergen
> > > > > > Gross ; Stefano Stabellini
> > ;
> > > > > > Boris Ostrovsky 
> > > > > > Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> > > > forced to
> > > > > > closed
> > > > > >
> > > > > > On Thu, Dec 05, 2019 at 02:01:21PM +, Paul Durrant wrote:
> > > > > > > Only force state to closed in the case when the toolstack may
> > need
> > > > to
> > > > > > > clean up. This can be detected by checking whether the state in
> > > > xenstore
> > > > > > > has been set to closing prior to device removal.
> > > > > >
> > > > > > I'm not sure I see the point of this, I would expect that a
> > failure to
> > > > > > probe or the removal of the device would leave the xenbus state as
> > > > > > closed, which is consistent with the actual driver state.
> > > > > >
> > > > > > Can you explain what's the benefit of leaving a device without a
> > > > > > driver in such unknown state?
> > > > > >
> > > > >
> > > > > If probe fails then I think it should leave the state alone. If the
> > > > > state is moved to closed then basically you just killed that
> > > > > connection to the guest (as the frontend will normally close down
> > > > > when it sees this change) so, if the probe failure was due to a bug
> > > > > in blkback or, e.g., a transient resource issue then it's game over
> > > > > as far as that guest goes.
> > > >
> > > > But the connection can be restarted by switching the backend to the
> > > > init state again.
> > >
> > > Too late. The frontend saw closed and you already lost.
> > >
> > > >
> > > > > The ultimate goal here is PV backend re-load that is completely
> > > > transparent to the guest. Modifying anything in xenstore compromises
> > that
> > > > so we need to be careful.
> > > >
> > > > That's a fine goal, but not switching to closed state in
> > > > xenbus_dev_remove seems wrong, as you have actually left the frontend
> > > > without a matching backend and with the state not set to closed.
> > > >
> > >
> > > Why is this a problem? With this series fully applied a (block) backend
> > can come and go without needing to change the state. Relying on guests to
> > DTRT is not a sustainable option for a cloud deployment.
> > >
> > > > Ie: that would be fine if you explicitly state this is some kind of
> > > > internal blkback reload, but not for the general case where blkback
> > > > has been unbound. I think we need someway to difference a blkback
> > > > reload vs a unbound.
> > > >
> > >
> > > Why do we need that though? Why is it advantageous for a backend to go
> > to closed. No PV backends cope with an unbind as-is, and a toolstack
> > initiated unplug will always set state to 5 anyway. So TBH any state
> > transition done directly in the xenbus code looks wrong to me anyway (but
> > appears to be a necessary evil to keep the toolstack working in the event
> > it spawns a backend where there is actually to driver present, or it
> > doesn't come online).
> > 
> > IMO the normal flow for unbind would be to attempt to close open
> > connections and then remove the driver: leaving frontends connected
> > without any attached backends is not correct, and will just block the
> > guest frontend until requests start timing out.
> > 
> > I can see the reasoning for doing that for the purpose of updating a
> > blkback module without guests noticing, but I would prefer that
> > leaving connections open was an option that could be given when
> > unbinding (or maybe a driver option in sysfs?), so that the default
> > behaviour would be to try to close everything when unbinding if
> > possible.
> 
> Well unbind is pretty useless now IMO since bind doesn't work, and a 
> transition straight to closed is just plain wrong anyway.

Why do you claim that a straight transition into the closed state is
wrong?

I don't see any such mention in 

Re: [Xen-devel] [PATCH v3] x86: do not enable global pages when virtualized on AMD hardware

2019-12-09 Thread Jan Beulich
On 09.12.2019 15:46, Roger Pau Monné wrote:
> On Mon, Dec 09, 2019 at 03:21:28PM +0100, Jan Beulich wrote:
>> On 09.12.2019 11:20, Roger Pau Monné wrote:
>>> On Wed, Dec 04, 2019 at 06:05:11PM +0100, Jan Beulich wrote:
 On 04.12.2019 17:18, Roger Pau Monné wrote:
> On Wed, Dec 04, 2019 at 05:11:42PM +0100, Jan Beulich wrote:
>> On 04.12.2019 16:12, Roger Pau Monne wrote:
>>> @@ -130,7 +143,7 @@ unsigned long pv_make_cr4(const struct vcpu *v)
>>>   */
>>>  if ( d->arch.pv.pcid )
>>>  cr4 |= X86_CR4_PCIDE;
>>> -else if ( !d->arch.pv.xpti )
>>> +else if ( !d->arch.pv.xpti && opt_global_pages )
>>>  cr4 |= X86_CR4_PGE;
>>
>> I'm sorry for noticing this only now, but what about XEN_MINIMAL_CR4,
>> which includes X86_CR4_PGE?
>
> I've tried removing PGE from XEN_MINIMAL_CR4 but it made no noticeable
> performance difference, so I left it as-is.

 My concern isn't about performance, but correctness. I admit I
 forgot for a moment that we now always write CR4 (unless the
 cached value matches the intended new one). Yet
 mmu_cr4_features (starting out as XEN_MINIMAL_CR4) is still of
 concern.

 I think this at least requires extending the description to
 discuss the correctness.
>>>
>>> Would you be fine with adding the following at the end of the commit
>>> message.
>>>
>>> "Note that XEN_MINIMAL_CR4 is not modified, and thus global pages are
>>> left enabled for the hypervisor. This is not an issue because the code
>>> to switch the control registers (cr3 and cr4) already takes into
>>> account such situation and performs the necessary flushes. The same
>>> already happens when using XPTI or PCIDE, as the guest cr4 doesn't
>>> have global pages enabled in that case either."
>>
>> Yes, this is good for XEN_MINIMAL_CR4. But I think mmu_cr4_features
>> needs discussing (or at least mentioning, if the same arguments
>> apply) as well.
> 
> The same applies to mmu_cr4_features, it's fine for the hypervisor to
> use a different set of cr4 features (especially PGE) than PV guests:
> this is already the case when using XPTI or PCIDE when Xen cr4 will
> have PGE and guests cr4 won't, and switch_cr3_cr4 DTRT.
> 
> So I would s/XEN_MINIMAL_CR4/XEN_MINIMAL_CR4 and mmu_cr4_features/ in
> the above proposed paragraph.

I'm afraid it's more complicated, up to and including you making a
possible pre-existing bug worse: The S3 resume path loads CR4 from
mmu_cr4_features, but doesn't update the in-memory cache of the
register. Hence some subsequent CR4 writes may wrongly be skipped,
until hitting one where an actual write is necessary. Now that you
play (more) with PGE, the situation is only going to get worse. Of
course I may well simply not have managed to spot the sync-ing of
the cached value. (VMX, otoh, takes special care to keep CPU loaded
value and cache in sync - see the bottom of vmx_do_resume().)

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 2/4] x86/apic: force phys mode if interrupt remapping is disabled

2019-12-09 Thread Roger Pau Monné
On Mon, Dec 09, 2019 at 03:30:27PM +0100, Jan Beulich wrote:
> On 09.12.2019 11:25, Roger Pau Monné wrote:
> > On Thu, Dec 05, 2019 at 10:32:34AM +0100, Jan Beulich wrote:
> >> On 04.12.2019 17:20, Roger Pau Monne wrote:
> >>> Cluster mode can only be used with interrupt remapping support, since
> >>> the top 16bits of the APIC ID are filled with the cluster ID, and
> >>> hence on systems where the physical ID is still smaller than 255 the
> >>> cluster ID is not. Force x2APIC to use physical mode if there's no
> >>> interrupt remapping support.
> >>>
> >>> Note that this requires a further patch in order to enable x2APIC
> >>> without interrupt remapping support.
> >>>
> >>> Signed-off-by: Roger Pau Monné 
> >>
> >> Reviewed-by: Jan Beulich 
> >> albeit ...
> >>
> >>> --- a/xen/arch/x86/genapic/x2apic.c
> >>> +++ b/xen/arch/x86/genapic/x2apic.c
> >>> @@ -226,7 +226,23 @@ boolean_param("x2apic_phys", x2apic_phys);
> >>>  const struct genapic *__init apic_x2apic_probe(void)
> >>>  {
> >>>  if ( x2apic_phys < 0 )
> >>> -x2apic_phys = !!(acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL);
> >>> +{
> >>> +if ( !iommu_intremap )
> >>> +/*
> >>> + * Force physical mode if there's no interrupt remapping 
> >>> support:
> >>> + * the ID in clustered mode requires a 32 bit destination 
> >>> field due
> >>> + * to the usage of the high 16 bits to store the cluster ID.
> >>> + */
> >>> +x2apic_phys = true;
> >>> +else
> >>> +x2apic_phys = !!(acpi_gbl_FADT.flags & 
> >>> ACPI_FADT_APIC_PHYSICAL);
> >>
> >> ... I wonder why you didn't make this
> >>
> >> x2apic_phys = !iommu_intremap || (acpi_gbl_FADT.flags & 
> >> ACPI_FADT_APIC_PHYSICAL);
> >>
> >> (not the least because of allowing to drop the somewhat ugly !!).
> > 
> > Feel free to do it at commit (and reindent the comment), or else I can
> > resend a new version if that's too intrusive.
> 
> Doing these adjustments at commit time ought to be fine. It's
> just that I'd prefer to wait with committing this series until
> 4.13 is fully finished.

That's fine, I don't have any hurry. All patches are Acked or RB now,
so I will hold off sending a new version. Let me know if the patches
don't apply cleanly when committing and I can send an updated version
with the minor comments from this last round.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3] x86: do not enable global pages when virtualized on AMD hardware

2019-12-09 Thread Roger Pau Monné
On Mon, Dec 09, 2019 at 03:21:28PM +0100, Jan Beulich wrote:
> On 09.12.2019 11:20, Roger Pau Monné wrote:
> > On Wed, Dec 04, 2019 at 06:05:11PM +0100, Jan Beulich wrote:
> >> On 04.12.2019 17:18, Roger Pau Monné wrote:
> >>> On Wed, Dec 04, 2019 at 05:11:42PM +0100, Jan Beulich wrote:
>  On 04.12.2019 16:12, Roger Pau Monne wrote:
> > @@ -130,7 +143,7 @@ unsigned long pv_make_cr4(const struct vcpu *v)
> >   */
> >  if ( d->arch.pv.pcid )
> >  cr4 |= X86_CR4_PCIDE;
> > -else if ( !d->arch.pv.xpti )
> > +else if ( !d->arch.pv.xpti && opt_global_pages )
> >  cr4 |= X86_CR4_PGE;
> 
>  I'm sorry for noticing this only now, but what about XEN_MINIMAL_CR4,
>  which includes X86_CR4_PGE?
> >>>
> >>> I've tried removing PGE from XEN_MINIMAL_CR4 but it made no noticeable
> >>> performance difference, so I left it as-is.
> >>
> >> My concern isn't about performance, but correctness. I admit I
> >> forgot for a moment that we now always write CR4 (unless the
> >> cached value matches the intended new one). Yet
> >> mmu_cr4_features (starting out as XEN_MINIMAL_CR4) is still of
> >> concern.
> >>
> >> I think this at least requires extending the description to
> >> discuss the correctness.
> > 
> > Would you be fine with adding the following at the end of the commit
> > message.
> > 
> > "Note that XEN_MINIMAL_CR4 is not modified, and thus global pages are
> > left enabled for the hypervisor. This is not an issue because the code
> > to switch the control registers (cr3 and cr4) already takes into
> > account such situation and performs the necessary flushes. The same
> > already happens when using XPTI or PCIDE, as the guest cr4 doesn't
> > have global pages enabled in that case either."
> 
> Yes, this is good for XEN_MINIMAL_CR4. But I think mmu_cr4_features
> needs discussing (or at least mentioning, if the same arguments
> apply) as well.

The same applies to mmu_cr4_features, it's fine for the hypervisor to
use a different set of cr4 features (especially PGE) than PV guests:
this is already the case when using XPTI or PCIDE when Xen cr4 will
have PGE and guests cr4 won't, and switch_cr3_cr4 DTRT.

So I would s/XEN_MINIMAL_CR4/XEN_MINIMAL_CR4 and mmu_cr4_features/ in
the above proposed paragraph.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Jürgen Groß 
> Sent: 09 December 2019 14:41
> To: Durrant, Paul ; Roger Pau Monné
> 
> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Stefano
> Stabellini ; Boris Ostrovsky
> 
> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> closed
> 
> On 09.12.19 15:23, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Jürgen Groß 
> >> Sent: 09 December 2019 14:10
> >> To: Durrant, Paul ; Roger Pau Monné
> >> 
> >> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> Stefano
> >> Stabellini ; Boris Ostrovsky
> >> 
> >> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced
> to
> >> closed
> >>
> >> On 09.12.19 15:06, Durrant, Paul wrote:
>  -Original Message-
>  From: Jürgen Groß 
>  Sent: 09 December 2019 13:39
>  To: Durrant, Paul ; Roger Pau Monné
>  
>  Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> >> Stefano
>  Stabellini ; Boris Ostrovsky
>  
>  Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> forced
> >> to
>  closed
> 
>  On 09.12.19 13:19, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Jürgen Groß 
> >> Sent: 09 December 2019 12:09
> >> To: Durrant, Paul ; Roger Pau Monné
> >> 
> >> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
>  Stefano
> >> Stabellini ; Boris Ostrovsky
> >> 
> >> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> >> forced
>  to
> >> closed
> >>
> >> On 09.12.19 13:03, Durrant, Paul wrote:
>  -Original Message-
>  From: Jürgen Groß 
>  Sent: 09 December 2019 11:55
>  To: Roger Pau Monné ; Durrant, Paul
>  
>  Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> >> Stefano
>  Stabellini ; Boris Ostrovsky
>  
>  Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
>  forced
> >> to
>  closed
> 
>  On 09.12.19 12:39, Roger Pau Monné wrote:
> > On Thu, Dec 05, 2019 at 02:01:21PM +, Paul Durrant wrote:
> >> Only force state to closed in the case when the toolstack may
> >> need
>  to
> >> clean up. This can be detected by checking whether the state in
>  xenstore
> >> has been set to closing prior to device removal.
> >
> > I'm not sure I see the point of this, I would expect that a
> >> failure
>  to
> > probe or the removal of the device would leave the xenbus state
> as
> > closed, which is consistent with the actual driver state.
> >
> > Can you explain what's the benefit of leaving a device without a
> > driver in such unknown state?
> 
>  And more concerning: did you check that no frontend/backend is
>  relying on the closed state to be visible without closing having
> >> been
>  set before?
> >>>
> >>> Blkfront doesn't seem to mind and I believe the Windows PV drivers
>  cope,
> >> but I don't really understand the comment since this patch is
> >> actually
> >> removing a case where the backend transitions directly to closed.
> >>
> >> I'm not speaking of blkfront/blkback only, but of net, tpm, scsi,
>  pvcall
> >> etc. frontends/backends. After all you are modifying a function
> >> common
> >> to all PV driver pairs.
> >>
> >> You are removing a state switc to "closed" in case the state was
> >> _not_
> >> "closing" before.
> >
> > Yes, which AFAIK is against the intention of the generic PV protocol
>  such that it ever existed anyway.
> 
>  While this might be the case we should _not_ break any guests
>  running now. So this kind of reasoning is dangerous.
> 
> >
> >> So any PV driver reacting to "closed" of the other end
> >> in case the previous state might not have been "closing" before is
> at
> >> risk to misbehave with your patch.
> >
> > Well, they will see nothing now. If the state was not closing, it
> gets
>  left alone, so the frontend shouldn't do anything. The only risk that
> I
>  can see is that some frontend/backend pair needed a direct 4 -> 6
>  transition to support 'unbind' before but AFAIK nothing has ever
> >> supported
>  that, and blk and net crash'n'burn if you try that on upstream as it
>  stands. A clean unplug would always set state to 5 first, since
> that's
>  part of the unplug protocol.
> 
>  That was my question: are you sure all current and previous
>  guest frontends and backends are handling unplug this way?
> 
>  Not "should handle", but "do handle".
> >>>
> >>> That depends on the toolstack. IIUC the only 'supported' toolstack is
> >> xl/libxl, which will set 'state' to 5 and 'online' to 0 to 

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Roger Pau Monné 
> Sent: 09 December 2019 14:29
> To: Durrant, Paul 
> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Juergen
> Gross ; Stefano Stabellini ;
> Boris Ostrovsky 
> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> closed
> 
> On Mon, Dec 09, 2019 at 12:40:47PM +, Durrant, Paul wrote:
> > > -Original Message-
> > > From: Roger Pau Monné 
> > > Sent: 09 December 2019 12:26
> > > To: Durrant, Paul 
> > > Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> Juergen
> > > Gross ; Stefano Stabellini ;
> > > Boris Ostrovsky 
> > > Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> forced to
> > > closed
> > >
> > > On Mon, Dec 09, 2019 at 12:01:38PM +, Durrant, Paul wrote:
> > > > > -Original Message-
> > > > > From: Roger Pau Monné 
> > > > > Sent: 09 December 2019 11:39
> > > > > To: Durrant, Paul 
> > > > > Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> > > Juergen
> > > > > Gross ; Stefano Stabellini
> ;
> > > > > Boris Ostrovsky 
> > > > > Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> > > forced to
> > > > > closed
> > > > >
> > > > > On Thu, Dec 05, 2019 at 02:01:21PM +, Paul Durrant wrote:
> > > > > > Only force state to closed in the case when the toolstack may
> need
> > > to
> > > > > > clean up. This can be detected by checking whether the state in
> > > xenstore
> > > > > > has been set to closing prior to device removal.
> > > > >
> > > > > I'm not sure I see the point of this, I would expect that a
> failure to
> > > > > probe or the removal of the device would leave the xenbus state as
> > > > > closed, which is consistent with the actual driver state.
> > > > >
> > > > > Can you explain what's the benefit of leaving a device without a
> > > > > driver in such unknown state?
> > > > >
> > > >
> > > > If probe fails then I think it should leave the state alone. If the
> > > > state is moved to closed then basically you just killed that
> > > > connection to the guest (as the frontend will normally close down
> > > > when it sees this change) so, if the probe failure was due to a bug
> > > > in blkback or, e.g., a transient resource issue then it's game over
> > > > as far as that guest goes.
> > >
> > > But the connection can be restarted by switching the backend to the
> > > init state again.
> >
> > Too late. The frontend saw closed and you already lost.
> >
> > >
> > > > The ultimate goal here is PV backend re-load that is completely
> > > transparent to the guest. Modifying anything in xenstore compromises
> that
> > > so we need to be careful.
> > >
> > > That's a fine goal, but not switching to closed state in
> > > xenbus_dev_remove seems wrong, as you have actually left the frontend
> > > without a matching backend and with the state not set to closed.
> > >
> >
> > Why is this a problem? With this series fully applied a (block) backend
> can come and go without needing to change the state. Relying on guests to
> DTRT is not a sustainable option for a cloud deployment.
> >
> > > Ie: that would be fine if you explicitly state this is some kind of
> > > internal blkback reload, but not for the general case where blkback
> > > has been unbound. I think we need someway to difference a blkback
> > > reload vs a unbound.
> > >
> >
> > Why do we need that though? Why is it advantageous for a backend to go
> to closed. No PV backends cope with an unbind as-is, and a toolstack
> initiated unplug will always set state to 5 anyway. So TBH any state
> transition done directly in the xenbus code looks wrong to me anyway (but
> appears to be a necessary evil to keep the toolstack working in the event
> it spawns a backend where there is actually to driver present, or it
> doesn't come online).
> 
> IMO the normal flow for unbind would be to attempt to close open
> connections and then remove the driver: leaving frontends connected
> without any attached backends is not correct, and will just block the
> guest frontend until requests start timing out.
> 
> I can see the reasoning for doing that for the purpose of updating a
> blkback module without guests noticing, but I would prefer that
> leaving connections open was an option that could be given when
> unbinding (or maybe a driver option in sysfs?), so that the default
> behaviour would be to try to close everything when unbinding if
> possible.

Well unbind is pretty useless now IMO since bind doesn't work, and a transition 
straight to closed is just plain wrong anyway. But, we could have a flag that 
the backend driver sets to say that it supports transparent re-bind that gates 
this code. Would that make you feel more comfortable?

If you want unbind to actually do a proper unplug then that's extra work and 
not really something I want to tackle (and re-bind would still need to be 
toolstack initiated as something would have to re-create the xenstore 

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Jürgen Groß

On 09.12.19 15:23, Durrant, Paul wrote:

-Original Message-
From: Jürgen Groß 
Sent: 09 December 2019 14:10
To: Durrant, Paul ; Roger Pau Monné

Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Stefano
Stabellini ; Boris Ostrovsky

Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
closed

On 09.12.19 15:06, Durrant, Paul wrote:

-Original Message-
From: Jürgen Groß 
Sent: 09 December 2019 13:39
To: Durrant, Paul ; Roger Pau Monné

Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;

Stefano

Stabellini ; Boris Ostrovsky

Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced

to

closed

On 09.12.19 13:19, Durrant, Paul wrote:

-Original Message-
From: Jürgen Groß 
Sent: 09 December 2019 12:09
To: Durrant, Paul ; Roger Pau Monné

Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;

Stefano

Stabellini ; Boris Ostrovsky

Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is

forced

to

closed

On 09.12.19 13:03, Durrant, Paul wrote:

-Original Message-
From: Jürgen Groß 
Sent: 09 December 2019 11:55
To: Roger Pau Monné ; Durrant, Paul

Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;

Stefano

Stabellini ; Boris Ostrovsky

Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is

forced

to

closed

On 09.12.19 12:39, Roger Pau Monné wrote:

On Thu, Dec 05, 2019 at 02:01:21PM +, Paul Durrant wrote:

Only force state to closed in the case when the toolstack may

need

to

clean up. This can be detected by checking whether the state in

xenstore

has been set to closing prior to device removal.


I'm not sure I see the point of this, I would expect that a

failure

to

probe or the removal of the device would leave the xenbus state as
closed, which is consistent with the actual driver state.

Can you explain what's the benefit of leaving a device without a
driver in such unknown state?


And more concerning: did you check that no frontend/backend is
relying on the closed state to be visible without closing having

been

set before?


Blkfront doesn't seem to mind and I believe the Windows PV drivers

cope,

but I don't really understand the comment since this patch is

actually

removing a case where the backend transitions directly to closed.

I'm not speaking of blkfront/blkback only, but of net, tpm, scsi,

pvcall

etc. frontends/backends. After all you are modifying a function

common

to all PV driver pairs.

You are removing a state switc to "closed" in case the state was

_not_

"closing" before.


Yes, which AFAIK is against the intention of the generic PV protocol

such that it ever existed anyway.

While this might be the case we should _not_ break any guests
running now. So this kind of reasoning is dangerous.




So any PV driver reacting to "closed" of the other end
in case the previous state might not have been "closing" before is at
risk to misbehave with your patch.


Well, they will see nothing now. If the state was not closing, it gets

left alone, so the frontend shouldn't do anything. The only risk that I
can see is that some frontend/backend pair needed a direct 4 -> 6
transition to support 'unbind' before but AFAIK nothing has ever

supported

that, and blk and net crash'n'burn if you try that on upstream as it
stands. A clean unplug would always set state to 5 first, since that's
part of the unplug protocol.

That was my question: are you sure all current and previous
guest frontends and backends are handling unplug this way?

Not "should handle", but "do handle".


That depends on the toolstack. IIUC the only 'supported' toolstack is

xl/libxl, which will set 'state' to 5 and 'online' to 0 to initiate an
unplug.

I guess libvirt/libxl is doing the same?



The unplug mechansism is all in libxl AFAICT, so it should be identical.


At least at SUSE we still have some customers running xend based
Xen installations with recent Linux or Windows guests.



Is that something the upstream code can/should support though? I'd be surprised 
if xend is actually doing anything different to libxl since I've been coding 
the Windows PV drivers to trigger off the combined closing/online transition 
for as long as I can remember.


I'd rather not have to carry a private patch for new Linux kernel to be
able to run on those hosts.

AFAIK you at Amazon have some quite old Xen installations, too. How are
you handling that (assuming the customer is updating the kernel to a
recent version in his guest)?


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 4/4] x86/apic: allow enabling x2APIC mode regardless of interrupt remapping

2019-12-09 Thread Jan Beulich
On 09.12.2019 11:56, Roger Pau Monné wrote:
> On Thu, Dec 05, 2019 at 10:45:59AM +0100, Jan Beulich wrote:
>> On 04.12.2019 17:20, Roger Pau Monne wrote:
>>> +switch ( iommu_enable_x2apic() )
>>>  {
>>> +case 0:
>>> +iommu_x2apic_enabled = true;
>>> +break;
>>> +
>>> +case -ENXIO: /* ACPI_DMAR_X2APIC_OPT_OUT set */
>>> +if ( x2apic_enabled )
>>> +panic("IOMMU requests xAPIC mode, but x2APIC already 
>>> enabled by firmware\n");
>>> +
>>>  printk("Not enabling x2APIC (upon firmware request)\n");
>>> -intremap_enabled = false;
>>> +iommu_x2apic_enabled = false;
>>>  goto restore_out;
>>> +
>>> +default:
>>> +printk(XENLOG_ERR "Failed to enable Interrupt Remapping\n");
>>> +iommu_x2apic_enabled = false;
>>> +break;
>>
>> I guess you still need to panic() in the failure cases if x2apic_phys
>> is false. Unless you can still properly switch from cluster to
>> physical mode at this point in time. (If you go the panic() route,
>> I'd appreciate if you could avoid making x2apic_phys non-static.)
> 
> I don't think Xen needs to check x2apic_phys or panic here, the x2apic
> probe that selects phys or cluster mode is done afterwards in
> apic_x2apic_probe, which is called after the attempt to enable
> interrupt remapping and hence will take this result into account.

Oh indeed, you're right.

>>> @@ -938,13 +931,16 @@ void __init x2apic_bsp_setup(void)
>>>  printk("Switched to APIC driver %s\n", genapic.name);
>>>  
>>>  restore_out:
>>> -/*
>>> - * NB: do not use raw mode when restoring entries if the iommu has been
>>> - * enabled during the process, because the entries need to be 
>>> translated
>>> - * and added to the remapping table in that case.
>>> - */
>>> -restore_IO_APIC_setup(ioapic_entries, !intremap_enabled);
>>> -unmask_8259A();
>>> +if ( iommu_supports_x2apic() )
>>
>> Hmm, I first wanted to suggest to use iommu_x2apic_enabled here, but
>> I realize the error cases above would then be wrong. Perhaps better
>> to leave a brief comment to this effect?
> 
> Ack, would you be fine with:
> 
> "Note that iommu_x2apic_enabled cannot be used here because if the
> IOMMU supports x2APIC but enabling failed Xen wouldn't restore the
> IO-APIC and the 8259A state correctly."

This or even more briefly "iommu_x2apic_enabled cannot be used here
in the error case". With this (which once again could be done while
committing)
Reviewed-by: Jan Beulich 

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 2/4] x86/apic: force phys mode if interrupt remapping is disabled

2019-12-09 Thread Jan Beulich
On 09.12.2019 11:25, Roger Pau Monné wrote:
> On Thu, Dec 05, 2019 at 10:32:34AM +0100, Jan Beulich wrote:
>> On 04.12.2019 17:20, Roger Pau Monne wrote:
>>> Cluster mode can only be used with interrupt remapping support, since
>>> the top 16bits of the APIC ID are filled with the cluster ID, and
>>> hence on systems where the physical ID is still smaller than 255 the
>>> cluster ID is not. Force x2APIC to use physical mode if there's no
>>> interrupt remapping support.
>>>
>>> Note that this requires a further patch in order to enable x2APIC
>>> without interrupt remapping support.
>>>
>>> Signed-off-by: Roger Pau Monné 
>>
>> Reviewed-by: Jan Beulich 
>> albeit ...
>>
>>> --- a/xen/arch/x86/genapic/x2apic.c
>>> +++ b/xen/arch/x86/genapic/x2apic.c
>>> @@ -226,7 +226,23 @@ boolean_param("x2apic_phys", x2apic_phys);
>>>  const struct genapic *__init apic_x2apic_probe(void)
>>>  {
>>>  if ( x2apic_phys < 0 )
>>> -x2apic_phys = !!(acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL);
>>> +{
>>> +if ( !iommu_intremap )
>>> +/*
>>> + * Force physical mode if there's no interrupt remapping 
>>> support:
>>> + * the ID in clustered mode requires a 32 bit destination 
>>> field due
>>> + * to the usage of the high 16 bits to store the cluster ID.
>>> + */
>>> +x2apic_phys = true;
>>> +else
>>> +x2apic_phys = !!(acpi_gbl_FADT.flags & 
>>> ACPI_FADT_APIC_PHYSICAL);
>>
>> ... I wonder why you didn't make this
>>
>> x2apic_phys = !iommu_intremap || (acpi_gbl_FADT.flags & 
>> ACPI_FADT_APIC_PHYSICAL);
>>
>> (not the least because of allowing to drop the somewhat ugly !!).
> 
> Feel free to do it at commit (and reindent the comment), or else I can
> resend a new version if that's too intrusive.

Doing these adjustments at commit time ought to be fine. It's
just that I'd prefer to wait with committing this series until
4.13 is fully finished.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Roger Pau Monné
On Mon, Dec 09, 2019 at 12:40:47PM +, Durrant, Paul wrote:
> > -Original Message-
> > From: Roger Pau Monné 
> > Sent: 09 December 2019 12:26
> > To: Durrant, Paul 
> > Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Juergen
> > Gross ; Stefano Stabellini ;
> > Boris Ostrovsky 
> > Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> > closed
> > 
> > On Mon, Dec 09, 2019 at 12:01:38PM +, Durrant, Paul wrote:
> > > > -Original Message-
> > > > From: Roger Pau Monné 
> > > > Sent: 09 December 2019 11:39
> > > > To: Durrant, Paul 
> > > > Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> > Juergen
> > > > Gross ; Stefano Stabellini ;
> > > > Boris Ostrovsky 
> > > > Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> > forced to
> > > > closed
> > > >
> > > > On Thu, Dec 05, 2019 at 02:01:21PM +, Paul Durrant wrote:
> > > > > Only force state to closed in the case when the toolstack may need
> > to
> > > > > clean up. This can be detected by checking whether the state in
> > xenstore
> > > > > has been set to closing prior to device removal.
> > > >
> > > > I'm not sure I see the point of this, I would expect that a failure to
> > > > probe or the removal of the device would leave the xenbus state as
> > > > closed, which is consistent with the actual driver state.
> > > >
> > > > Can you explain what's the benefit of leaving a device without a
> > > > driver in such unknown state?
> > > >
> > >
> > > If probe fails then I think it should leave the state alone. If the
> > > state is moved to closed then basically you just killed that
> > > connection to the guest (as the frontend will normally close down
> > > when it sees this change) so, if the probe failure was due to a bug
> > > in blkback or, e.g., a transient resource issue then it's game over
> > > as far as that guest goes.
> > 
> > But the connection can be restarted by switching the backend to the
> > init state again.
> 
> Too late. The frontend saw closed and you already lost.
> 
> > 
> > > The ultimate goal here is PV backend re-load that is completely
> > transparent to the guest. Modifying anything in xenstore compromises that
> > so we need to be careful.
> > 
> > That's a fine goal, but not switching to closed state in
> > xenbus_dev_remove seems wrong, as you have actually left the frontend
> > without a matching backend and with the state not set to closed.
> > 
> 
> Why is this a problem? With this series fully applied a (block) backend can 
> come and go without needing to change the state. Relying on guests to DTRT is 
> not a sustainable option for a cloud deployment.
> 
> > Ie: that would be fine if you explicitly state this is some kind of
> > internal blkback reload, but not for the general case where blkback
> > has been unbound. I think we need someway to difference a blkback
> > reload vs a unbound.
> > 
> 
> Why do we need that though? Why is it advantageous for a backend to go to 
> closed. No PV backends cope with an unbind as-is, and a toolstack initiated 
> unplug will always set state to 5 anyway. So TBH any state transition done 
> directly in the xenbus code looks wrong to me anyway (but appears to be a 
> necessary evil to keep the toolstack working in the event it spawns a backend 
> where there is actually to driver present, or it doesn't come online).

IMO the normal flow for unbind would be to attempt to close open
connections and then remove the driver: leaving frontends connected
without any attached backends is not correct, and will just block the
guest frontend until requests start timing out.

I can see the reasoning for doing that for the purpose of updating a
blkback module without guests noticing, but I would prefer that
leaving connections open was an option that could be given when
unbinding (or maybe a driver option in sysfs?), so that the default
behaviour would be to try to close everything when unbinding if
possible.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Jürgen Groß 
> Sent: 09 December 2019 14:10
> To: Durrant, Paul ; Roger Pau Monné
> 
> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Stefano
> Stabellini ; Boris Ostrovsky
> 
> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> closed
> 
> On 09.12.19 15:06, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Jürgen Groß 
> >> Sent: 09 December 2019 13:39
> >> To: Durrant, Paul ; Roger Pau Monné
> >> 
> >> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> Stefano
> >> Stabellini ; Boris Ostrovsky
> >> 
> >> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced
> to
> >> closed
> >>
> >> On 09.12.19 13:19, Durrant, Paul wrote:
>  -Original Message-
>  From: Jürgen Groß 
>  Sent: 09 December 2019 12:09
>  To: Durrant, Paul ; Roger Pau Monné
>  
>  Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> >> Stefano
>  Stabellini ; Boris Ostrovsky
>  
>  Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> forced
> >> to
>  closed
> 
>  On 09.12.19 13:03, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Jürgen Groß 
> >> Sent: 09 December 2019 11:55
> >> To: Roger Pau Monné ; Durrant, Paul
> >> 
> >> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
>  Stefano
> >> Stabellini ; Boris Ostrovsky
> >> 
> >> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> >> forced
>  to
> >> closed
> >>
> >> On 09.12.19 12:39, Roger Pau Monné wrote:
> >>> On Thu, Dec 05, 2019 at 02:01:21PM +, Paul Durrant wrote:
>  Only force state to closed in the case when the toolstack may
> need
> >> to
>  clean up. This can be detected by checking whether the state in
> >> xenstore
>  has been set to closing prior to device removal.
> >>>
> >>> I'm not sure I see the point of this, I would expect that a
> failure
> >> to
> >>> probe or the removal of the device would leave the xenbus state as
> >>> closed, which is consistent with the actual driver state.
> >>>
> >>> Can you explain what's the benefit of leaving a device without a
> >>> driver in such unknown state?
> >>
> >> And more concerning: did you check that no frontend/backend is
> >> relying on the closed state to be visible without closing having
> been
> >> set before?
> >
> > Blkfront doesn't seem to mind and I believe the Windows PV drivers
> >> cope,
>  but I don't really understand the comment since this patch is
> actually
>  removing a case where the backend transitions directly to closed.
> 
>  I'm not speaking of blkfront/blkback only, but of net, tpm, scsi,
> >> pvcall
>  etc. frontends/backends. After all you are modifying a function
> common
>  to all PV driver pairs.
> 
>  You are removing a state switc to "closed" in case the state was
> _not_
>  "closing" before.
> >>>
> >>> Yes, which AFAIK is against the intention of the generic PV protocol
> >> such that it ever existed anyway.
> >>
> >> While this might be the case we should _not_ break any guests
> >> running now. So this kind of reasoning is dangerous.
> >>
> >>>
>  So any PV driver reacting to "closed" of the other end
>  in case the previous state might not have been "closing" before is at
>  risk to misbehave with your patch.
> >>>
> >>> Well, they will see nothing now. If the state was not closing, it gets
> >> left alone, so the frontend shouldn't do anything. The only risk that I
> >> can see is that some frontend/backend pair needed a direct 4 -> 6
> >> transition to support 'unbind' before but AFAIK nothing has ever
> supported
> >> that, and blk and net crash'n'burn if you try that on upstream as it
> >> stands. A clean unplug would always set state to 5 first, since that's
> >> part of the unplug protocol.
> >>
> >> That was my question: are you sure all current and previous
> >> guest frontends and backends are handling unplug this way?
> >>
> >> Not "should handle", but "do handle".
> >
> > That depends on the toolstack. IIUC the only 'supported' toolstack is
> xl/libxl, which will set 'state' to 5 and 'online' to 0 to initiate an
> unplug.
> 
> I guess libvirt/libxl is doing the same?
> 

The unplug mechansism is all in libxl AFAICT, so it should be identical.

> At least at SUSE we still have some customers running xend based
> Xen installations with recent Linux or Windows guests.
> 

Is that something the upstream code can/should support though? I'd be surprised 
if xend is actually doing anything different to libxl since I've been coding 
the Windows PV drivers to trigger off the combined closing/online transition 
for as long as I can remember.

  Paul

___
Xen-devel mailing list

Re: [Xen-devel] [PATCH v3] x86: do not enable global pages when virtualized on AMD hardware

2019-12-09 Thread Jan Beulich
On 09.12.2019 11:20, Roger Pau Monné wrote:
> On Wed, Dec 04, 2019 at 06:05:11PM +0100, Jan Beulich wrote:
>> On 04.12.2019 17:18, Roger Pau Monné wrote:
>>> On Wed, Dec 04, 2019 at 05:11:42PM +0100, Jan Beulich wrote:
 On 04.12.2019 16:12, Roger Pau Monne wrote:
> @@ -130,7 +143,7 @@ unsigned long pv_make_cr4(const struct vcpu *v)
>   */
>  if ( d->arch.pv.pcid )
>  cr4 |= X86_CR4_PCIDE;
> -else if ( !d->arch.pv.xpti )
> +else if ( !d->arch.pv.xpti && opt_global_pages )
>  cr4 |= X86_CR4_PGE;

 I'm sorry for noticing this only now, but what about XEN_MINIMAL_CR4,
 which includes X86_CR4_PGE?
>>>
>>> I've tried removing PGE from XEN_MINIMAL_CR4 but it made no noticeable
>>> performance difference, so I left it as-is.
>>
>> My concern isn't about performance, but correctness. I admit I
>> forgot for a moment that we now always write CR4 (unless the
>> cached value matches the intended new one). Yet
>> mmu_cr4_features (starting out as XEN_MINIMAL_CR4) is still of
>> concern.
>>
>> I think this at least requires extending the description to
>> discuss the correctness.
> 
> Would you be fine with adding the following at the end of the commit
> message.
> 
> "Note that XEN_MINIMAL_CR4 is not modified, and thus global pages are
> left enabled for the hypervisor. This is not an issue because the code
> to switch the control registers (cr3 and cr4) already takes into
> account such situation and performs the necessary flushes. The same
> already happens when using XPTI or PCIDE, as the guest cr4 doesn't
> have global pages enabled in that case either."

Yes, this is good for XEN_MINIMAL_CR4. But I think mmu_cr4_features
needs discussing (or at least mentioning, if the same arguments
apply) as well.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] cmdline: treat hyphens and underscores the same

2019-12-09 Thread Jan Beulich
On 09.12.2019 15:06, George Dunlap wrote:
> On 12/6/19 4:42 PM, Jan Beulich wrote:
>> On 06.12.2019 17:20, Julien Grall wrote:
>>> Hi,
>>>
>>> On 06/12/2019 16:06, Jan Beulich wrote:
 On 06.12.2019 15:46, Julien Grall wrote:
> On 05/12/2019 16:50, Jan Beulich wrote:
>> On 05.12.2019 17:27, Julien Grall wrote:
>>> On 05/12/2019 15:33, Jan Beulich wrote:
 +/*
 + * String comparison functions mostly matching strcmp() / strncmp(),
 + * except that they treat '-' and '_' as matching one another.
 + */
 +static int _strcmp(const char *s1, const char *s2)
>>>
>>> I thought we were trying to avoid new function name with leading _?
>>
>> We're trying to avoid new name space violations. Such are
>> - identifiers starting with two underscores,
>> - identifiers starting with an underscore and an upper case letter,
>> - identifiers of non-static symbols starting with an underscore.
>
> I am not sure to understand why non-static symbols only. This would
> prevent you to use the the non-static symbol if you happen to re-use the
> same name.

 I'm afraid I don't understand. Anyway - what I've listed above is
 what the language standard mandates.
>>> AFAIU, for a given unit, there is only one pool of identifiers. So you 
>>> could not have an identifier used at the same time by a non-static and a 
>>> static symbol (that's exclusing the weak attribute). So it feels 
>>> slightly strange to only cover the non-static symbols.
>>
>> I guess I'm still not getting your point. What the above tells
>> us is that static symbols may start with an underscore (but
>> not followed by another one or an uppercase letter). Non-static
>> symbols may not.
>>
> Anyway, how about calling it cmdline_strncmp()? This would be easier to
> spot misuse on review (i.e using strncmp() rather than _strncmp()).

 We already have cmdline_strcmp(), or else I would indeed have used
 this prefix. No prefix (other than the lone underscore) seemed the
 next best option.
>>>
>>> As we parse an option, how about opt_strncmp()?
>>
>> I'd still like _strncmp() better here.
> 
> Why?  It doesn't tell you anything at all about what's special about the
> function.  In fact, I'd say it's confusing -- the "_" doesn't normally
> mean, "do something different and special", but "do the core of
> something which other things might call".
> 
> I'd much prefer opt_strncmp() than _strncmp().

Noted - will do.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Jürgen Groß

On 09.12.19 15:06, Durrant, Paul wrote:

-Original Message-
From: Jürgen Groß 
Sent: 09 December 2019 13:39
To: Durrant, Paul ; Roger Pau Monné

Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Stefano
Stabellini ; Boris Ostrovsky

Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
closed

On 09.12.19 13:19, Durrant, Paul wrote:

-Original Message-
From: Jürgen Groß 
Sent: 09 December 2019 12:09
To: Durrant, Paul ; Roger Pau Monné

Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;

Stefano

Stabellini ; Boris Ostrovsky

Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced

to

closed

On 09.12.19 13:03, Durrant, Paul wrote:

-Original Message-
From: Jürgen Groß 
Sent: 09 December 2019 11:55
To: Roger Pau Monné ; Durrant, Paul

Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;

Stefano

Stabellini ; Boris Ostrovsky

Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is

forced

to

closed

On 09.12.19 12:39, Roger Pau Monné wrote:

On Thu, Dec 05, 2019 at 02:01:21PM +, Paul Durrant wrote:

Only force state to closed in the case when the toolstack may need

to

clean up. This can be detected by checking whether the state in

xenstore

has been set to closing prior to device removal.


I'm not sure I see the point of this, I would expect that a failure

to

probe or the removal of the device would leave the xenbus state as
closed, which is consistent with the actual driver state.

Can you explain what's the benefit of leaving a device without a
driver in such unknown state?


And more concerning: did you check that no frontend/backend is
relying on the closed state to be visible without closing having been
set before?


Blkfront doesn't seem to mind and I believe the Windows PV drivers

cope,

but I don't really understand the comment since this patch is actually
removing a case where the backend transitions directly to closed.

I'm not speaking of blkfront/blkback only, but of net, tpm, scsi,

pvcall

etc. frontends/backends. After all you are modifying a function common
to all PV driver pairs.

You are removing a state switc to "closed" in case the state was _not_
"closing" before.


Yes, which AFAIK is against the intention of the generic PV protocol

such that it ever existed anyway.

While this might be the case we should _not_ break any guests
running now. So this kind of reasoning is dangerous.




So any PV driver reacting to "closed" of the other end
in case the previous state might not have been "closing" before is at
risk to misbehave with your patch.


Well, they will see nothing now. If the state was not closing, it gets

left alone, so the frontend shouldn't do anything. The only risk that I
can see is that some frontend/backend pair needed a direct 4 -> 6
transition to support 'unbind' before but AFAIK nothing has ever supported
that, and blk and net crash'n'burn if you try that on upstream as it
stands. A clean unplug would always set state to 5 first, since that's
part of the unplug protocol.

That was my question: are you sure all current and previous
guest frontends and backends are handling unplug this way?

Not "should handle", but "do handle".


That depends on the toolstack. IIUC the only 'supported' toolstack is xl/libxl, 
which will set 'state' to 5 and 'online' to 0 to initiate an unplug.


I guess libvirt/libxl is doing the same?

At least at SUSE we still have some customers running xend based
Xen installations with recent Linux or Windows guests.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] cmdline: treat hyphens and underscores the same

2019-12-09 Thread George Dunlap
On 12/6/19 4:42 PM, Jan Beulich wrote:
> On 06.12.2019 17:20, Julien Grall wrote:
>> Hi,
>>
>> On 06/12/2019 16:06, Jan Beulich wrote:
>>> On 06.12.2019 15:46, Julien Grall wrote:
 On 05/12/2019 16:50, Jan Beulich wrote:
> On 05.12.2019 17:27, Julien Grall wrote:
>> On 05/12/2019 15:33, Jan Beulich wrote:
>>> +/*
>>> + * String comparison functions mostly matching strcmp() / strncmp(),
>>> + * except that they treat '-' and '_' as matching one another.
>>> + */
>>> +static int _strcmp(const char *s1, const char *s2)
>>
>> I thought we were trying to avoid new function name with leading _?
>
> We're trying to avoid new name space violations. Such are
> - identifiers starting with two underscores,
> - identifiers starting with an underscore and an upper case letter,
> - identifiers of non-static symbols starting with an underscore.

 I am not sure to understand why non-static symbols only. This would
 prevent you to use the the non-static symbol if you happen to re-use the
 same name.
>>>
>>> I'm afraid I don't understand. Anyway - what I've listed above is
>>> what the language standard mandates.
>> AFAIU, for a given unit, there is only one pool of identifiers. So you 
>> could not have an identifier used at the same time by a non-static and a 
>> static symbol (that's exclusing the weak attribute). So it feels 
>> slightly strange to only cover the non-static symbols.
> 
> I guess I'm still not getting your point. What the above tells
> us is that static symbols may start with an underscore (but
> not followed by another one or an uppercase letter). Non-static
> symbols may not.
> 
 Anyway, how about calling it cmdline_strncmp()? This would be easier to
 spot misuse on review (i.e using strncmp() rather than _strncmp()).
>>>
>>> We already have cmdline_strcmp(), or else I would indeed have used
>>> this prefix. No prefix (other than the lone underscore) seemed the
>>> next best option.
>>
>> As we parse an option, how about opt_strncmp()?
> 
> I'd still like _strncmp() better here.

Why?  It doesn't tell you anything at all about what's special about the
function.  In fact, I'd say it's confusing -- the "_" doesn't normally
mean, "do something different and special", but "do the core of
something which other things might call".

I'd much prefer opt_strncmp() than _strncmp().

 -George


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Jürgen Groß 
> Sent: 09 December 2019 13:39
> To: Durrant, Paul ; Roger Pau Monné
> 
> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Stefano
> Stabellini ; Boris Ostrovsky
> 
> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> closed
> 
> On 09.12.19 13:19, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Jürgen Groß 
> >> Sent: 09 December 2019 12:09
> >> To: Durrant, Paul ; Roger Pau Monné
> >> 
> >> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> Stefano
> >> Stabellini ; Boris Ostrovsky
> >> 
> >> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced
> to
> >> closed
> >>
> >> On 09.12.19 13:03, Durrant, Paul wrote:
>  -Original Message-
>  From: Jürgen Groß 
>  Sent: 09 December 2019 11:55
>  To: Roger Pau Monné ; Durrant, Paul
>  
>  Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> >> Stefano
>  Stabellini ; Boris Ostrovsky
>  
>  Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> forced
> >> to
>  closed
> 
>  On 09.12.19 12:39, Roger Pau Monné wrote:
> > On Thu, Dec 05, 2019 at 02:01:21PM +, Paul Durrant wrote:
> >> Only force state to closed in the case when the toolstack may need
> to
> >> clean up. This can be detected by checking whether the state in
>  xenstore
> >> has been set to closing prior to device removal.
> >
> > I'm not sure I see the point of this, I would expect that a failure
> to
> > probe or the removal of the device would leave the xenbus state as
> > closed, which is consistent with the actual driver state.
> >
> > Can you explain what's the benefit of leaving a device without a
> > driver in such unknown state?
> 
>  And more concerning: did you check that no frontend/backend is
>  relying on the closed state to be visible without closing having been
>  set before?
> >>>
> >>> Blkfront doesn't seem to mind and I believe the Windows PV drivers
> cope,
> >> but I don't really understand the comment since this patch is actually
> >> removing a case where the backend transitions directly to closed.
> >>
> >> I'm not speaking of blkfront/blkback only, but of net, tpm, scsi,
> pvcall
> >> etc. frontends/backends. After all you are modifying a function common
> >> to all PV driver pairs.
> >>
> >> You are removing a state switc to "closed" in case the state was _not_
> >> "closing" before.
> >
> > Yes, which AFAIK is against the intention of the generic PV protocol
> such that it ever existed anyway.
> 
> While this might be the case we should _not_ break any guests
> running now. So this kind of reasoning is dangerous.
> 
> >
> >> So any PV driver reacting to "closed" of the other end
> >> in case the previous state might not have been "closing" before is at
> >> risk to misbehave with your patch.
> >
> > Well, they will see nothing now. If the state was not closing, it gets
> left alone, so the frontend shouldn't do anything. The only risk that I
> can see is that some frontend/backend pair needed a direct 4 -> 6
> transition to support 'unbind' before but AFAIK nothing has ever supported
> that, and blk and net crash'n'burn if you try that on upstream as it
> stands. A clean unplug would always set state to 5 first, since that's
> part of the unplug protocol.
> 
> That was my question: are you sure all current and previous
> guest frontends and backends are handling unplug this way?
> 
> Not "should handle", but "do handle".

That depends on the toolstack. IIUC the only 'supported' toolstack is xl/libxl, 
which will set 'state' to 5 and 'online' to 0 to initiate an unplug.

  Paul

> 
> 
> Juergen
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 4/4] xen-blkback: support dynamic unbind/bind

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Jürgen Groß 
> Sent: 09 December 2019 13:58
> To: Durrant, Paul ; linux-ker...@vger.kernel.org;
> xen-devel@lists.xenproject.org
> Cc: Konrad Rzeszutek Wilk ; Roger Pau Monné
> ; Jens Axboe ; Boris Ostrovsky
> ; Stefano Stabellini 
> Subject: Re: [PATCH 4/4] xen-blkback: support dynamic unbind/bind
> 
> On 05.12.19 15:01, Paul Durrant wrote:
> > By simply re-attaching to shared rings during connect_ring() rather than
> > assuming they are freshly allocated (i.e assuming the counters are zero)
> > it is possible for vbd instances to be unbound and re-bound from and to
> > (respectively) a running guest.
> >
> > This has been tested by running:
> >
> > while true; do dd if=/dev/urandom of=test.img bs=1M count=1024; done
> >
> > in a PV guest whilst running:
> >
> > while true;
> >do echo vbd-$DOMID-$VBD >unbind;
> >echo unbound;
> >sleep 5;
> >echo vbd-$DOMID-$VBD >bind;
> >echo bound;
> >sleep 3;
> >done
> >
> > in dom0 from /sys/bus/xen-backend/drivers/vbd to continuously unbind and
> > re-bind its system disk image.
> 
> Could you do the same test with mixed reads/writes and verification of
> the read/written data, please? A write-only test is not _that_
> convincing regarding correctness. It only proves the guest is not
> crashing.

Sure. I'll find something that will verify content.

> 
> I'm fine with the general approach, though.
> 

Cool, thanks,

  Paul

> 
> Juergen
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/CPUID: RSTR_FP_ERR_PTRS depends on FPU

2019-12-09 Thread Jürgen Groß

On 09.12.19 14:06, Jan Beulich wrote:

On 26.09.2019 17:23, Jürgen Groß wrote:

On 25.09.19 17:27, Jan Beulich wrote:

There's nothing to restore here if there's no FPU in the first place.

Signed-off-by: Jan Beulich 


Release-acked-by: Juergen Gross 


While I've committed the change to the unstable branch, making use of
this R-a-b now without asking would seem abusive to me. I'd expect
you don't want the 4.13 branch disturbed more than really helpful,
and hence I expect you'd rather not see this go in there now. Please
let me know if you view this differently.


I appreciate that a lot. Please don't commit this to 4.13 now. :-)


Juergen


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 4/4] xen-blkback: support dynamic unbind/bind

2019-12-09 Thread Jürgen Groß

On 05.12.19 15:01, Paul Durrant wrote:

By simply re-attaching to shared rings during connect_ring() rather than
assuming they are freshly allocated (i.e assuming the counters are zero)
it is possible for vbd instances to be unbound and re-bound from and to
(respectively) a running guest.

This has been tested by running:

while true; do dd if=/dev/urandom of=test.img bs=1M count=1024; done

in a PV guest whilst running:

while true;
   do echo vbd-$DOMID-$VBD >unbind;
   echo unbound;
   sleep 5;
   echo vbd-$DOMID-$VBD >bind;
   echo bound;
   sleep 3;
   done

in dom0 from /sys/bus/xen-backend/drivers/vbd to continuously unbind and
re-bind its system disk image.


Could you do the same test with mixed reads/writes and verification of
the read/written data, please? A write-only test is not _that_
convincing regarding correctness. It only proves the guest is not
crashing.

I'm fine with the general approach, though.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/4] xen/interface: don't discard pending work in FRONT/BACK_RING_ATTACH

2019-12-09 Thread Jürgen Groß

On 05.12.19 15:01, Paul Durrant wrote:

Currently these macros will skip over any requests/responses that are
added to the shared ring whilst it is detached. This, in general, is not
a desirable semantic since most frontend implementations will eventually
block waiting for a response which would either never appear or never be
processed.

NOTE: These macros are currently unused. BACK_RING_ATTACH(), however, will
   be used in a subsequent patch.

Signed-off-by: Paul Durrant 
---
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Stefano Stabellini 
---
  include/xen/interface/io/ring.h | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/xen/interface/io/ring.h b/include/xen/interface/io/ring.h
index 3f40501fc60b..405adfed87e6 100644
--- a/include/xen/interface/io/ring.h
+++ b/include/xen/interface/io/ring.h
@@ -143,14 +143,14 @@ struct __name##_back_ring {   
\
  #define FRONT_RING_ATTACH(_r, _s, __size) do {
\
  (_r)->sring = (_s);\
  (_r)->req_prod_pvt = (_s)->req_prod;\
-(_r)->rsp_cons = (_s)->rsp_prod; \
+(_r)->rsp_cons = (_s)->req_prod; \
  (_r)->nr_ents = __RING_SIZE(_s, __size);   \
  } while (0)
  
  #define BACK_RING_ATTACH(_r, _s, __size) do {\

  (_r)->sring = (_s);\
  (_r)->rsp_prod_pvt = (_s)->rsp_prod;\
-(_r)->req_cons = (_s)->req_prod; \
+(_r)->req_cons = (_s)->rsp_prod; \
  (_r)->nr_ents = __RING_SIZE(_s, __size);   \
  } while (0)


Lets look at all possible scenarios where BACK_RING_ATTACH()
might happen:

Initially (after [FRONT|BACK]_RING_INIT(), leaving _pvt away):
req_prod=0, rsp_cons=0, rsp_prod=0, req_cons=0
Using BACK_RING_ATTACH() is fine (no change)

Request queued:
req_prod=1, rsp_cons=0, rsp_prod=0, req_cons=0
Using BACK_RING_ATTACH() is fine (no change)

and taken by backend:
req_prod=1, rsp_cons=0, rsp_prod=0, req_cons=1
Using BACK_RING_ATTACH() is resetting req_cons to 0, will result
in redoing request (for blk this is fine, other devices like SCSI
tapes will have issues with that). One possible solution would be
to ensure all taken requests are either stopped or the response
is queued already.

Response queued:
req_prod=1, rsp_cons=0, rsp_prod=1, req_cons=1
Using BACK_RING_ATTACH() is fine (no change)

Response taken:
req_prod=1, rsp_cons=1, rsp_prod=1, req_cons=1
Using BACK_RING_ATTACH() is fine (no change)

In general I believe the [FRONT|BACK]_RING_ATTACH() macros are not
fine to be used in the current state, as the *_pvt fields normally not
accessible by the other end are initialized using the (possibly
untrusted) values from the shared ring. There needs at least to be a
test for the values to be sane, and your change should not result in the
same value to be read twice, as it could have changed in between.

As this is an error which can happen in other OS's, too, I'd recommend
to add the adapted macros (plus a comment regarding the possible
problem noted above for special devices like tapes) to the Xen variant
of ring.h.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3] CODING_STYLE: Document how to handle unexpected conditions

2019-12-09 Thread Jan Beulich
On 09.12.2019 12:29, George Dunlap wrote:
> --- a/CODING_STYLE
> +++ b/CODING_STYLE
> @@ -133,3 +133,97 @@ the end of files.  It should be:
>   * indent-tabs-mode: nil
>   * End:
>   */
> +
> +Handling unexpected conditions
> +--
> +
> +GUIDELINES:
> +
> +Passing errors up the stack should be used when the caller is already
> +expecting to handle errors, and the state when the error was
> +discovered isn’t broken, or isn't too hard to fix.
> +
> +domain_crash() should be used when passing errors up the stack is too
> +difficult, and/or when fixing up state of a guest is impractical, but
> +where fixing up the state of Xen will allow Xen to continue running.
> +This is particularly appropriate when the guest is exhibiting behavior
> +well-behaved guest should.

DYM "shouldn't"?

> +BUG_ON() should be used when you can’t pass errors up the stack, and
> +either continuing or crashing the guest would likely cause an
> +information leak or privilege escalation vulnerability.
> +
> +ASSERT() IS NOT AN ERROR HANDLING MECHANISM.  ASSERT is a way to move
> +detection of a bug earlier in the programming cycle; it is a
> +more-noticeable printk.  It should only be added after one of the
> +other three error-handling mechanisms has been evaluated for
> +reliability and security.
> +
> +RATIONALE:
> +
> +It's frequently the case that code is written with the assumption that
> +certain conditions can never happen.  There are several possible
> +actions programmers can take in these situations:
> +
> +* Programmers can simply not handle those cases in any way, other than
> +perhaps to write a comment documenting what the assumption is.
> +
> +* Programmers can try to handle the case gracefully -- fixing up
> +in-progress state and returning an error to the user.
> +
> +* Programmers can crash the guest.
> +
> +* Programmers can use ASSERT(), which will cause the check to be
> +executed in DEBUG builds, and cause the hypervisor to crash if it's
> +violated
> +
> +* Programmers can use BUG_ON(), which will cause the check to be
> +executed in both DEBUG and non-DEBUG builds, and cause the hypervisor
> +to crash if it's violated.
> +
> +In selecting which response to use, we want to achieve several goals:
> +
> +- To minimize risk of introducing security vulnerabilities,
> +  particularly as the code evolves over time
> +
> +- To efficiently spend programmer time
> +
> +- To detect violations of assumptions as early as possible
> +
> +- To minimize the impact of bugs on production use cases
> +
> +The guidelines above attempt to balance these:
> +
> +- When the caller is expecting to handle errors, and there is no
> +broken state at the time the unexpected condition is discovered, or
> +when fixing the state is straightforward, then fixing up the state and
> +returning an error is the most robust thing to do.  However, if the
> +caller isn't expecting to handle errors, or if the state is difficult
> +to fix, then returning an error may require extensive refactoring,
> +which is not a good use of programmer time when they're certain that
> +this condition cannot occur.
> +
> +- BUG_ON() will stop all hypervisor action immediately.  In situations
> +where continuing might allow an attacker to escalate privilege, a
> +BUG_ON() can change a privilege escalation or information leak into a
> +denial-of-service (an improvement).  But in situations where
> +continuing (say, returning an error) might be safe, then BUG_ON() can
> +change a benign failure into denial-of-service (a degradation).
> +
> +- domain_crash() is similar to BUG_ON(), but with a more limited
> +effect: it stops that domain immediately.  In situations where
> +continuing might cause guest or hypervisor corruption, but destroying
> +the guest allows the hypervisor to continue, this can change a more
> +serious bug into a guest denial-of-service.  But in situations where
> +returning an error might be safe, then domain_crash() can change a
> +benign failure into a guest denial-of-service.

Perhaps further put emphasis on the call tree still getting unwound
normally, which may imply further actions on the (now dying) domain
taken. Unfortunately it's not unusual for people to forget this; I
think the IOMMU code in particular was (hopefully isn't so much
anymore) a "good" example of this.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 1/3] xen/flask: Drop the gen-policy.py script

2019-12-09 Thread Jan Beulich
On 07.12.2019 22:16, Andrew Cooper wrote:
> --- /dev/null
> +++ b/xen/xsm/flask/flask-policy.S
> @@ -0,0 +1,20 @@
> +.section .init.rodata, "a", %progbits
> +
> +/* const unsigned char xsm_flask_init_policy[] __initconst */
> +.align 4

I'm afraid .align is not universal enough to be used here - iirc
some architectures have it alias .p2align rather than (how e.g.
x86 behaves) .balign. Looks like .p2align is available in all
binutils versions we claim to be able to be built with. (I'm
not sure the one here is needed anyway, but the one below we
surely want.)

> +.global xsm_flask_init_policy
> +xsm_flask_init_policy:
> +.incbin "policy.bin"
> +.Lend:
> +
> +.type xsm_flask_init_policy, %object
> +.size xsm_flask_init_policy, . - xsm_flask_init_policy
> +
> +/* const unsigned int __initconst xsm_flask_init_policy_size */
> +.align 4
> +.global xsm_flask_init_policy_size
> +xsm_flask_init_policy_size:
> +.long .Lend - xsm_flask_init_policy

Similarly .long isn't really universal (various arches override
it in gas). Aiui .dc.l is intended to be portable (despite still
carrying the 'l' in its name, and despite even this one getting
overridden by two arches). But perhaps best to ask on the
binutils list.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Jürgen Groß

On 09.12.19 13:19, Durrant, Paul wrote:

-Original Message-
From: Jürgen Groß 
Sent: 09 December 2019 12:09
To: Durrant, Paul ; Roger Pau Monné

Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Stefano
Stabellini ; Boris Ostrovsky

Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
closed

On 09.12.19 13:03, Durrant, Paul wrote:

-Original Message-
From: Jürgen Groß 
Sent: 09 December 2019 11:55
To: Roger Pau Monné ; Durrant, Paul

Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;

Stefano

Stabellini ; Boris Ostrovsky

Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced

to

closed

On 09.12.19 12:39, Roger Pau Monné wrote:

On Thu, Dec 05, 2019 at 02:01:21PM +, Paul Durrant wrote:

Only force state to closed in the case when the toolstack may need to
clean up. This can be detected by checking whether the state in

xenstore

has been set to closing prior to device removal.


I'm not sure I see the point of this, I would expect that a failure to
probe or the removal of the device would leave the xenbus state as
closed, which is consistent with the actual driver state.

Can you explain what's the benefit of leaving a device without a
driver in such unknown state?


And more concerning: did you check that no frontend/backend is
relying on the closed state to be visible without closing having been
set before?


Blkfront doesn't seem to mind and I believe the Windows PV drivers cope,

but I don't really understand the comment since this patch is actually
removing a case where the backend transitions directly to closed.

I'm not speaking of blkfront/blkback only, but of net, tpm, scsi, pvcall
etc. frontends/backends. After all you are modifying a function common
to all PV driver pairs.

You are removing a state switc to "closed" in case the state was _not_
"closing" before.


Yes, which AFAIK is against the intention of the generic PV protocol such that 
it ever existed anyway.


While this might be the case we should _not_ break any guests
running now. So this kind of reasoning is dangerous.




So any PV driver reacting to "closed" of the other end
in case the previous state might not have been "closing" before is at
risk to misbehave with your patch.


Well, they will see nothing now. If the state was not closing, it gets left alone, 
so the frontend shouldn't do anything. The only risk that I can see is that some 
frontend/backend pair needed a direct 4 -> 6 transition to support 'unbind' 
before but AFAIK nothing has ever supported that, and blk and net crash'n'burn if 
you try that on upstream as it stands. A clean unplug would always set state to 5 
first, since that's part of the unplug protocol.


That was my question: are you sure all current and previous
guest frontends and backends are handling unplug this way?

Not "should handle", but "do handle".


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/2] x86/mm: factor out the code for shattering an l3 PTE

2019-12-09 Thread Jan Beulich
On 07.12.2019 19:20, Xia, Hongyan wrote:
> On Fri, 2019-12-06 at 17:50 +, Andrew Cooper wrote:
>> On 06/12/2019 15:53, Hongyan Xia wrote:
>>> +/* Shatter an l3 entry and populate l2. If virt is passed in, also
>>> do flush. */
>>> +static void shatter_l3e(l3_pgentry_t *pl3e, l2_pgentry_t *l2t,
>>> +unsigned long virt, bool locking)
>>> +{
>>> +unsigned int i;
>>> +l3_pgentry_t ol3e = *pl3e;
>>> +
>>> +for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
>>> +l2e_write(l2t + i,
>>> +  l2e_from_pfn(l3e_get_pfn(ol3e) +
>>> +   (i << PAGETABLE_ORDER),
>>> +   l3e_get_flags(ol3e)));
>>
>> The PTE macros are especially poor for generated asm, and in cases
>> like
>> this, I'd like to improve things.
>>
>> In particular, I believe the following code has identical behaviour:
>>
>> l2_pgentry_t nl2e = l2e_from_intpte(l3e_get_intpte(ol3e));
>>
>> for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++, nl2e.l2 +=
>> PAGETABLE_ORDER )
>> l2e_write(l2t + i, nl2e);
>>
>> (although someone please double check my logic) and rather better asm
>> generation.  (I also expect there to be some discussion on whether
>> using
>> n2le.l2 directly is something we'd want to start doing.)
>>
> 
> I believe it should be nl2e.l2 += 1<<(PAGETABLE_ORDER+PAGE_SHIFT) ?

Indeed.

> Although the code rarely touches the field (.l2) directly, so maybe use
> the macros (get_intpte -> add -> from_intpte) for consistency? They
> should produce the same code if the compiler is not too stupid.

I think the to/from intpte transformations should be used sparingly
too (as being dangerous). How about we make PTEs proper unions, with
a full-field intpte_t as we have it now combined with a set of bit
fields? This would at least eliminate the need for using PAGE_SHIFT
in constructs like the above.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] XSA-255 and Arm

2019-12-09 Thread Julien Grall

Sorry I forgot to CC xen-devel.

On 09/12/2019 13:13, Julien Grall wrote:

Hi all,

I was looking at the Grant Table code over the week-end and noticed 
thart XSA-255 [1] introduced some unintended consequences on Arm.


Since the XSA, gnttab_map_frame() will remove the previous mapping (if 
any) because mapping to the new GFN.


As on Arm we don't have an M2P, the GFN is stored per frame in the 
grant-table code. This will never get cleared during unmapping (e.g. 
XENMEM_remove_from_physmap) and therefore we may end up to remove a 
mapping from someone different (Arm does not check the MFN is the 
correct one before removing mapping).


This works well on x86 because the translation MFN to GFN is using the 
M2P. Therefore, the translation will be indirectly cleared when the 
mapping is removed via XENMEM_remove_from_physmap.


I could fix the P2M code to check the MFN on removal, but this is only 
fixing on part of the problem. For instance, 
gnttab_unpopulate_status_frame() is also check whether the GFN is still 
valid for each mapping.


Without the M2P, I can only see one solution. We would need to check 
whether the GFN correspond to a grant frame and update the array on 
removal. This obviously requires to loop through an array which is not 
very great.


Any other ideas?

Cheers,

[1] grant table v2 -> v1 transition may crash Xen



--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/CPUID: RSTR_FP_ERR_PTRS depends on FPU

2019-12-09 Thread Jan Beulich
On 26.09.2019 17:23, Jürgen Groß wrote:
> On 25.09.19 17:27, Jan Beulich wrote:
>> There's nothing to restore here if there's no FPU in the first place.
>>
>> Signed-off-by: Jan Beulich 
> 
> Release-acked-by: Juergen Gross 

While I've committed the change to the unstable branch, making use of
this R-a-b now without asking would seem abusive to me. I'd expect
you don't want the 4.13 branch disturbed more than really helpful,
and hence I expect you'd rather not see this go in there now. Please
let me know if you view this differently.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/4] xen/interface: don't discard pending work in FRONT/BACK_RING_ATTACH

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of
> Jürgen Groß
> Sent: 09 December 2019 11:52
> To: Roger Pau Monné ; Durrant, Paul
> 
> Cc: xen-devel@lists.xenproject.org; Boris Ostrovsky
> ; Stefano Stabellini ;
> linux-ker...@vger.kernel.org
> Subject: Re: [Xen-devel] [PATCH 3/4] xen/interface: don't discard pending
> work in FRONT/BACK_RING_ATTACH
> 
> On 09.12.19 12:41, Roger Pau Monné wrote:
> > On Thu, Dec 05, 2019 at 02:01:22PM +, Paul Durrant wrote:
> >> Currently these macros will skip over any requests/responses that are
> >> added to the shared ring whilst it is detached. This, in general, is
> not
> >> a desirable semantic since most frontend implementations will
> eventually
> >> block waiting for a response which would either never appear or never
> be
> >> processed.
> >>
> >> NOTE: These macros are currently unused. BACK_RING_ATTACH(), however,
> will
> >>be used in a subsequent patch.
> >>
> >> Signed-off-by: Paul Durrant 
> >
> > Those headers come from Xen, and should be modified in Xen first and
> > then imported into Linux IMO.
> 
> In theory, yes. But the Xen variant doesn't contain the ATTACH macros.
> 

OOI do we have a policy about this? Re-importing headers into Linux wholesale 
is always slightly painful because of interdependencies and style checking 
issues.

  Paul

> 
> Juergen
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Roger Pau Monné 
> Sent: 09 December 2019 12:26
> To: Durrant, Paul 
> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Juergen
> Gross ; Stefano Stabellini ;
> Boris Ostrovsky 
> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> closed
> 
> On Mon, Dec 09, 2019 at 12:01:38PM +, Durrant, Paul wrote:
> > > -Original Message-
> > > From: Roger Pau Monné 
> > > Sent: 09 December 2019 11:39
> > > To: Durrant, Paul 
> > > Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> Juergen
> > > Gross ; Stefano Stabellini ;
> > > Boris Ostrovsky 
> > > Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> forced to
> > > closed
> > >
> > > On Thu, Dec 05, 2019 at 02:01:21PM +, Paul Durrant wrote:
> > > > Only force state to closed in the case when the toolstack may need
> to
> > > > clean up. This can be detected by checking whether the state in
> xenstore
> > > > has been set to closing prior to device removal.
> > >
> > > I'm not sure I see the point of this, I would expect that a failure to
> > > probe or the removal of the device would leave the xenbus state as
> > > closed, which is consistent with the actual driver state.
> > >
> > > Can you explain what's the benefit of leaving a device without a
> > > driver in such unknown state?
> > >
> >
> > If probe fails then I think it should leave the state alone. If the
> > state is moved to closed then basically you just killed that
> > connection to the guest (as the frontend will normally close down
> > when it sees this change) so, if the probe failure was due to a bug
> > in blkback or, e.g., a transient resource issue then it's game over
> > as far as that guest goes.
> 
> But the connection can be restarted by switching the backend to the
> init state again.

Too late. The frontend saw closed and you already lost.

> 
> > The ultimate goal here is PV backend re-load that is completely
> transparent to the guest. Modifying anything in xenstore compromises that
> so we need to be careful.
> 
> That's a fine goal, but not switching to closed state in
> xenbus_dev_remove seems wrong, as you have actually left the frontend
> without a matching backend and with the state not set to closed.
> 

Why is this a problem? With this series fully applied a (block) backend can 
come and go without needing to change the state. Relying on guests to DTRT is 
not a sustainable option for a cloud deployment.

> Ie: that would be fine if you explicitly state this is some kind of
> internal blkback reload, but not for the general case where blkback
> has been unbound. I think we need someway to difference a blkback
> reload vs a unbound.
> 

Why do we need that though? Why is it advantageous for a backend to go to 
closed. No PV backends cope with an unbind as-is, and a toolstack initiated 
unplug will always set state to 5 anyway. So TBH any state transition done 
directly in the xenbus code looks wrong to me anyway (but appears to be a 
necessary evil to keep the toolstack working in the event it spawns a backend 
where there is actually to driver present, or it doesn't come online).

  Paul


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13] xen: Drop bogus BOOLEAN definitions, TRUE and FALSE

2019-12-09 Thread Julien Grall

Hi Jan,

On 09/12/2019 12:11, Jan Beulich wrote:

On 06.12.2019 22:02, Andrew Cooper wrote:

On 12/11/2019 14:03, Jan Beulich wrote:

Bottom line - I'm half convinced and willing to give my ack, but
I'm not convinced you truly thought through the longer term
consequences. I'd therefore be far happier to see this patch
split into a non-controversial part (anything that's not tied to
the ACPI and EFI header imports), an ACPI, and an EFI part.


I do not want to writing the same patch again in $N years time because
review and CI missed it creeping back in.

I don't think this is an unreasonable position to take.


It for sure isn't. Yet I also don't think though my request how to
split things is. By asking for the split I'm implying that we may
still reach agreement on the controversial parts, faod. Sadly once
again there are no other opinions helping to sort which route may
be the overall preferred one.


Well, for a first, I don't think I have seen any explicit request for 
opinion so far and not all the relevant maintainers have been CCed here.


In general, I tend to stay clear from argument between you and Andrew to 
avoid been dragged into bikeshedding.


Anyway, while I understand that we want to keep as close as upstream, 
those headers are not resync very often and the changes are minimal. The 
long term consequence is not about resync but keeping bogus code that 
can be used by everyone.


So the patch itself is a good step forward to make Xen better.

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Roger Pau Monné
On Mon, Dec 09, 2019 at 12:01:38PM +, Durrant, Paul wrote:
> > -Original Message-
> > From: Roger Pau Monné 
> > Sent: 09 December 2019 11:39
> > To: Durrant, Paul 
> > Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Juergen
> > Gross ; Stefano Stabellini ;
> > Boris Ostrovsky 
> > Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> > closed
> > 
> > On Thu, Dec 05, 2019 at 02:01:21PM +, Paul Durrant wrote:
> > > Only force state to closed in the case when the toolstack may need to
> > > clean up. This can be detected by checking whether the state in xenstore
> > > has been set to closing prior to device removal.
> > 
> > I'm not sure I see the point of this, I would expect that a failure to
> > probe or the removal of the device would leave the xenbus state as
> > closed, which is consistent with the actual driver state.
> > 
> > Can you explain what's the benefit of leaving a device without a
> > driver in such unknown state?
> > 
> 
> If probe fails then I think it should leave the state alone. If the
> state is moved to closed then basically you just killed that
> connection to the guest (as the frontend will normally close down
> when it sees this change) so, if the probe failure was due to a bug
> in blkback or, e.g., a transient resource issue then it's game over
> as far as that guest goes.

But the connection can be restarted by switching the backend to the
init state again.

> The ultimate goal here is PV backend re-load that is completely transparent 
> to the guest. Modifying anything in xenstore compromises that so we need to 
> be careful.

That's a fine goal, but not switching to closed state in
xenbus_dev_remove seems wrong, as you have actually left the frontend
without a matching backend and with the state not set to closed.

Ie: that would be fine if you explicitly state this is some kind of
internal blkback reload, but not for the general case where blkback
has been unbound. I think we need someway to difference a blkback
reload vs a unbound.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 4/4] xen-blkback: support dynamic unbind/bind

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Roger Pau Monné 
> Sent: 09 December 2019 12:17
> To: Durrant, Paul 
> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Konrad
> Rzeszutek Wilk ; Jens Axboe ;
> Boris Ostrovsky ; Juergen Gross
> ; Stefano Stabellini 
> Subject: Re: [PATCH 4/4] xen-blkback: support dynamic unbind/bind
> 
> On Thu, Dec 05, 2019 at 02:01:23PM +, Paul Durrant wrote:
> > By simply re-attaching to shared rings during connect_ring() rather than
> > assuming they are freshly allocated (i.e assuming the counters are zero)
> > it is possible for vbd instances to be unbound and re-bound from and to
> > (respectively) a running guest.
> >
> > This has been tested by running:
> >
> > while true; do dd if=/dev/urandom of=test.img bs=1M count=1024; done
> >
> > in a PV guest whilst running:
> >
> > while true;
> >   do echo vbd-$DOMID-$VBD >unbind;
> >   echo unbound;
> >   sleep 5;
> >   echo vbd-$DOMID-$VBD >bind;
> >   echo bound;
> >   sleep 3;
> >   done
> 
> So this does unbind blkback while leaving the PV interface as
> connected?
> 

Yes, everything is left in place in the frontend. The backend detaches from the 
ring, closes its end of the event channels, etc. but the guest can still send 
requests which will get serviced when the new backend attaches.

  Paul

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Jürgen Groß 
> Sent: 09 December 2019 12:09
> To: Durrant, Paul ; Roger Pau Monné
> 
> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Stefano
> Stabellini ; Boris Ostrovsky
> 
> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> closed
> 
> On 09.12.19 13:03, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Jürgen Groß 
> >> Sent: 09 December 2019 11:55
> >> To: Roger Pau Monné ; Durrant, Paul
> >> 
> >> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> Stefano
> >> Stabellini ; Boris Ostrovsky
> >> 
> >> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced
> to
> >> closed
> >>
> >> On 09.12.19 12:39, Roger Pau Monné wrote:
> >>> On Thu, Dec 05, 2019 at 02:01:21PM +, Paul Durrant wrote:
>  Only force state to closed in the case when the toolstack may need to
>  clean up. This can be detected by checking whether the state in
> >> xenstore
>  has been set to closing prior to device removal.
> >>>
> >>> I'm not sure I see the point of this, I would expect that a failure to
> >>> probe or the removal of the device would leave the xenbus state as
> >>> closed, which is consistent with the actual driver state.
> >>>
> >>> Can you explain what's the benefit of leaving a device without a
> >>> driver in such unknown state?
> >>
> >> And more concerning: did you check that no frontend/backend is
> >> relying on the closed state to be visible without closing having been
> >> set before?
> >
> > Blkfront doesn't seem to mind and I believe the Windows PV drivers cope,
> but I don't really understand the comment since this patch is actually
> removing a case where the backend transitions directly to closed.
> 
> I'm not speaking of blkfront/blkback only, but of net, tpm, scsi, pvcall
> etc. frontends/backends. After all you are modifying a function common
> to all PV driver pairs.
> 
> You are removing a state switc to "closed" in case the state was _not_
> "closing" before.

Yes, which AFAIK is against the intention of the generic PV protocol such that 
it ever existed anyway.

> So any PV driver reacting to "closed" of the other end
> in case the previous state might not have been "closing" before is at
> risk to misbehave with your patch.

Well, they will see nothing now. If the state was not closing, it gets left 
alone, so the frontend shouldn't do anything. The only risk that I can see is 
that some frontend/backend pair needed a direct 4 -> 6 transition to support 
'unbind' before but AFAIK nothing has ever supported that, and blk and net 
crash'n'burn if you try that on upstream as it stands. A clean unplug would 
always set state to 5 first, since that's part of the unplug protocol.

  Paul

> 
> Juergen
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 4/4] xen-blkback: support dynamic unbind/bind

2019-12-09 Thread Roger Pau Monné
On Thu, Dec 05, 2019 at 02:01:23PM +, Paul Durrant wrote:
> By simply re-attaching to shared rings during connect_ring() rather than
> assuming they are freshly allocated (i.e assuming the counters are zero)
> it is possible for vbd instances to be unbound and re-bound from and to
> (respectively) a running guest.
> 
> This has been tested by running:
> 
> while true; do dd if=/dev/urandom of=test.img bs=1M count=1024; done
> 
> in a PV guest whilst running:
> 
> while true;
>   do echo vbd-$DOMID-$VBD >unbind;
>   echo unbound;
>   sleep 5;
>   echo vbd-$DOMID-$VBD >bind;
>   echo bound;
>   sleep 3;
>   done

So this does unbind blkback while leaving the PV interface as
connected?

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13] xen: Drop bogus BOOLEAN definitions, TRUE and FALSE

2019-12-09 Thread Jan Beulich
On 06.12.2019 22:02, Andrew Cooper wrote:
> On 12/11/2019 14:03, Jan Beulich wrote:
>> Bottom line - I'm half convinced and willing to give my ack, but
>> I'm not convinced you truly thought through the longer term
>> consequences. I'd therefore be far happier to see this patch
>> split into a non-controversial part (anything that's not tied to
>> the ACPI and EFI header imports), an ACPI, and an EFI part.
> 
> I do not want to writing the same patch again in $N years time because
> review and CI missed it creeping back in.
> 
> I don't think this is an unreasonable position to take.

It for sure isn't. Yet I also don't think though my request how to
split things is. By asking for the split I'm implying that we may
still reach agreement on the controversial parts, faod. Sadly once
again there are no other opinions helping to sort which route may
be the overall preferred one.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Jürgen Groß

On 09.12.19 13:03, Durrant, Paul wrote:

-Original Message-
From: Jürgen Groß 
Sent: 09 December 2019 11:55
To: Roger Pau Monné ; Durrant, Paul

Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Stefano
Stabellini ; Boris Ostrovsky

Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
closed

On 09.12.19 12:39, Roger Pau Monné wrote:

On Thu, Dec 05, 2019 at 02:01:21PM +, Paul Durrant wrote:

Only force state to closed in the case when the toolstack may need to
clean up. This can be detected by checking whether the state in

xenstore

has been set to closing prior to device removal.


I'm not sure I see the point of this, I would expect that a failure to
probe or the removal of the device would leave the xenbus state as
closed, which is consistent with the actual driver state.

Can you explain what's the benefit of leaving a device without a
driver in such unknown state?


And more concerning: did you check that no frontend/backend is
relying on the closed state to be visible without closing having been
set before?


Blkfront doesn't seem to mind and I believe the Windows PV drivers cope, but I 
don't really understand the comment since this patch is actually removing a 
case where the backend transitions directly to closed.


I'm not speaking of blkfront/blkback only, but of net, tpm, scsi, pvcall
etc. frontends/backends. After all you are modifying a function common
to all PV driver pairs.

You are removing a state switc to "closed" in case the state was _not_
"closing" before. So any PV driver reacting to "closed" of the other end
in case the previous state might not have been "closing" before is at
risk to misbehave with your patch.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/3] x86: relax GDT check in arch_set_info_guest()

2019-12-09 Thread Jan Beulich
On 06.12.2019 20:51, Andrew Cooper wrote:
> On 06/12/2019 11:32, Jan Beulich wrote:
>> On 06.12.2019 11:25, Andrew Cooper wrote:
>>> On 06/12/2019 10:14, Jan Beulich wrote:
 It is wrong for us to check frames beyond the guest specified limit.

 Signed-off-by: Jan Beulich 
>>> I don't completely agree.  The code has been like this since it was
>>> introduced, and is used to check data from the domain builder (inc
>>> migration), and from the guests.
>>>
>>> At the moment, every caller is required not to pass junk in unused
>>> frames, and I don't see an issue with keeping this behaviour.
>> Keeping the behavior isn't going to break anything, yes, but it
>> shouldn't have been this way to begin with. I simply don't see
>> the value of validating data we're not consuming anyway. Perhaps
>> I could say "not helpful" or "pointless" instead of "wrong" ...
> 
> But in other cases we go out of our way to check parameters (especially
> reserved fields) even when they aren't presently consumed.

Which we do to make sure we could use the fields down the road
without breaking existing callers. That's quite different from
the overzealous checking we do here.

> i.e. what do we gain (other than more complicated code) by relaxing a
> restriction we know is obeyed by every caller?

First - I don't think the code gets more complicated by this
change (nor the LDT counterpart). If anything I'm seeing a
really minor simplification (by consistently using a now
common variable). Further, if you look closely, you'll note
that the compat path is already only checking the specified
number of frames. Hence I'm bringing the non-compat one in
line, i.e. an improvement in consistency.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Jürgen Groß 
> Sent: 09 December 2019 11:55
> To: Roger Pau Monné ; Durrant, Paul
> 
> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Stefano
> Stabellini ; Boris Ostrovsky
> 
> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> closed
> 
> On 09.12.19 12:39, Roger Pau Monné wrote:
> > On Thu, Dec 05, 2019 at 02:01:21PM +, Paul Durrant wrote:
> >> Only force state to closed in the case when the toolstack may need to
> >> clean up. This can be detected by checking whether the state in
> xenstore
> >> has been set to closing prior to device removal.
> >
> > I'm not sure I see the point of this, I would expect that a failure to
> > probe or the removal of the device would leave the xenbus state as
> > closed, which is consistent with the actual driver state.
> >
> > Can you explain what's the benefit of leaving a device without a
> > driver in such unknown state?
> 
> And more concerning: did you check that no frontend/backend is
> relying on the closed state to be visible without closing having been
> set before?

Blkfront doesn't seem to mind and I believe the Windows PV drivers cope, but I 
don't really understand the comment since this patch is actually removing a 
case where the backend transitions directly to closed.

  Paul

> 
> 
> Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Roger Pau Monné 
> Sent: 09 December 2019 11:39
> To: Durrant, Paul 
> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Juergen
> Gross ; Stefano Stabellini ;
> Boris Ostrovsky 
> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> closed
> 
> On Thu, Dec 05, 2019 at 02:01:21PM +, Paul Durrant wrote:
> > Only force state to closed in the case when the toolstack may need to
> > clean up. This can be detected by checking whether the state in xenstore
> > has been set to closing prior to device removal.
> 
> I'm not sure I see the point of this, I would expect that a failure to
> probe or the removal of the device would leave the xenbus state as
> closed, which is consistent with the actual driver state.
> 
> Can you explain what's the benefit of leaving a device without a
> driver in such unknown state?
> 

If probe fails then I think it should leave the state alone. If the state is 
moved to closed then basically you just killed that connection to the guest (as 
the frontend will normally close down when it sees this change) so, if the 
probe failure was due to a bug in blkback or, e.g., a transient resource issue 
then it's game over as far as that guest goes.
The ultimate goal here is PV backend re-load that is completely transparent to 
the guest. Modifying anything in xenstore compromises that so we need to be 
careful.

  Paul

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/4] xenbus: move xenbus_dev_shutdown() into frontend code...

2019-12-09 Thread Jürgen Groß

On 09.12.19 12:55, Durrant, Paul wrote:

-Original Message-
From: Jürgen Groß 
Sent: 09 December 2019 11:34
To: Durrant, Paul ; linux-ker...@vger.kernel.org;
xen-devel@lists.xenproject.org
Cc: Boris Ostrovsky ; Stefano Stabellini

Subject: Re: [PATCH 1/4] xenbus: move xenbus_dev_shutdown() into frontend
code...

On 05.12.19 15:01, Paul Durrant wrote:

...and make it static

xenbus_dev_shutdown() is seemingly intended to cause clean shutdown of

PV

frontends when a guest is rebooted. Indeed the function waits for a
conpletion which is only set by a call to xenbus_frontend_closed().

This patch removes the shutdown() method from backends and moves
xenbus_dev_shutdown() from xenbus_probe.c into xenbus_probe_frontend.c,
renaming it appropriately and making it static.


Is this a good move considering driver domains?


I don't think it can have ever worked properly for driver domains, and with the 
rest of the patches a backend should be able go away and return unannounced (as 
long as the domain id is kept... for which patches need to be upstreamed into 
Xen).



At least I'd expect the commit message addressing the expected behavior
with rebooting a driver domain and why this patch isn't making things
worse.



For a clean reboot I'd expect the toolstack to shut down the protocol before 
rebooting the driver domain, so the backend shutdown method is moot. And I 
don't believe re-startable driver domains were something that ever made it into 
support (because of the non-persistent domid problem). I can add something to 
the commit comment to that effect if you'd like.


Yes, please do so.

With this you can add my:

Reviewed-by: Juergen Gross 


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/4] xenbus: move xenbus_dev_shutdown() into frontend code...

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Jürgen Groß 
> Sent: 09 December 2019 11:34
> To: Durrant, Paul ; linux-ker...@vger.kernel.org;
> xen-devel@lists.xenproject.org
> Cc: Boris Ostrovsky ; Stefano Stabellini
> 
> Subject: Re: [PATCH 1/4] xenbus: move xenbus_dev_shutdown() into frontend
> code...
> 
> On 05.12.19 15:01, Paul Durrant wrote:
> > ...and make it static
> >
> > xenbus_dev_shutdown() is seemingly intended to cause clean shutdown of
> PV
> > frontends when a guest is rebooted. Indeed the function waits for a
> > conpletion which is only set by a call to xenbus_frontend_closed().
> >
> > This patch removes the shutdown() method from backends and moves
> > xenbus_dev_shutdown() from xenbus_probe.c into xenbus_probe_frontend.c,
> > renaming it appropriately and making it static.
> 
> Is this a good move considering driver domains?

I don't think it can have ever worked properly for driver domains, and with the 
rest of the patches a backend should be able go away and return unannounced (as 
long as the domain id is kept... for which patches need to be upstreamed into 
Xen).

> 
> At least I'd expect the commit message addressing the expected behavior
> with rebooting a driver domain and why this patch isn't making things
> worse.
> 

For a clean reboot I'd expect the toolstack to shut down the protocol before 
rebooting the driver domain, so the backend shutdown method is moot. And I 
don't believe re-startable driver domains were something that ever made it into 
support (because of the non-persistent domid problem). I can add something to 
the commit comment to that effect if you'd like.

  Paul
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Jürgen Groß

On 09.12.19 12:39, Roger Pau Monné wrote:

On Thu, Dec 05, 2019 at 02:01:21PM +, Paul Durrant wrote:

Only force state to closed in the case when the toolstack may need to
clean up. This can be detected by checking whether the state in xenstore
has been set to closing prior to device removal.


I'm not sure I see the point of this, I would expect that a failure to
probe or the removal of the device would leave the xenbus state as
closed, which is consistent with the actual driver state.

Can you explain what's the benefit of leaving a device without a
driver in such unknown state?


And more concerning: did you check that no frontend/backend is
relying on the closed state to be visible without closing having been
set before?


Juergen


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/4] xen/interface: don't discard pending work in FRONT/BACK_RING_ATTACH

2019-12-09 Thread Jürgen Groß

On 09.12.19 12:41, Roger Pau Monné wrote:

On Thu, Dec 05, 2019 at 02:01:22PM +, Paul Durrant wrote:

Currently these macros will skip over any requests/responses that are
added to the shared ring whilst it is detached. This, in general, is not
a desirable semantic since most frontend implementations will eventually
block waiting for a response which would either never appear or never be
processed.

NOTE: These macros are currently unused. BACK_RING_ATTACH(), however, will
   be used in a subsequent patch.

Signed-off-by: Paul Durrant 


Those headers come from Xen, and should be modified in Xen first and
then imported into Linux IMO.


In theory, yes. But the Xen variant doesn't contain the ATTACH macros.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4] gnttab: don't expose host physical address without need

2019-12-09 Thread Jan Beulich
On 06.12.2019 20:48, Andrew Cooper wrote:
> On 05/12/2019 16:57, Jan Beulich wrote:
>> On 05.12.2019 16:47, Andrew Cooper wrote:
>>> On 05/12/2019 15:34, Jan Beulich wrote:
 Translated domains shouldn't see host physical addresses. While the
 address is also not supposed to be handed back even to non-translated
 domains when GNTMAP_device_map is not set (as explicitly stated by a
 comment in the public header), PV kernels (Linux at least) assume the
 field to get populated nevertheless.
>>> This really means that the public header needs correcting.  The field
>>> may not have intended to escape out of Xen, but it is defacto part of
>>> the ABI now.
>> Well, that's one of two possible routes. The other is to have, like
>> you did suggest earlier on, a mode in which we behave more strictly,
>> and current Linux then wouldn't work on such a Xen until fixed.
> 
> I'm sorely tempted to introduce a "fully strict mode" right now, behind
> a command line option, which chops off all the bits which shouldn't be
> usable in their current form.
> 
> However, nothing, not even dom0, will boot successfully, so it probably
> isn't a great idea right now.  Also, we'd have an easier time starting
> from nothing and whitelisting ok things in, than attempt to locate
> everything that we should blacklist in the current configuration.

Okay, I'll add a remark to the public header comment then for now.

 (Similarly mapkind() should check only GNTMAP_device_map.)
>>> Is this comment stale, or have I misunderstood some of the reasoning?
>> It's certainly not stale. mapkind() is used to determine whether
>> IOMMU mapping adjustments are needed. With this, it should in
>> principle only consider whether the current operation would
>> possibly alter IOMMU mapping needs. What needs doing should,
>> according to my interpretation of the originally intended design,
>> only depend on current and prior requests with GNTMAP_device_map
>> set.
> 
> That's all reasonable, but there are no edits to mapkind(), so I'm
> confused as to why this is present in the commit message.

Well, it's the same underlying problem - improper separation of
host_map from device_map. I've added this just to emphasize that
this likely wasn't an oversight, but intentional (yet wrong, at
least afaict).

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 1/2] x86/mm: factor out the code for shattering an l3 PTE

2019-12-09 Thread Hongyan Xia
map_pages_to_xen and modify_xen_mappings are performing almost exactly
the same operations when shattering an l3 PTE, the only difference
being whether we want to flush.

Signed-off-by: Hongyan Xia 

---
Changes in v2:
- improve asm.
- re-read pl3e from memory when taking the lock.
- move the allocation of l2t inside the shatter function.
---
 xen/arch/x86/mm.c | 97 +++
 1 file changed, 48 insertions(+), 49 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 7d4dd80a85..6188a968ff 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5151,6 +5151,51 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
  flush_area_local((const void *)v, f) : \
  flush_area_all((const void *)v, f))
 
+/* Shatter an l3 entry and populate l2. If virt is passed in, also do flush. */
+static int shatter_l3e(l3_pgentry_t *pl3e, unsigned long virt, bool locking)
+{
+unsigned int i;
+l3_pgentry_t ol3e;
+l2_pgentry_t ol2e, *l2t = alloc_xen_pagetable();
+
+if ( l2t == NULL )
+return -1;
+
+ol3e = *pl3e;
+ol2e = l2e_from_intpte(l3e_get_intpte(ol3e));
+
+for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
+{
+l2e_write(l2t + i, ol2e);
+ol2e = l2e_from_intpte(
+l2e_get_intpte(ol2e) + (1 << (PAGETABLE_ORDER + PAGE_SHIFT)));
+}
+if ( locking )
+spin_lock(_pgdir_lock);
+if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
+ (l3e_get_flags(*pl3e) & _PAGE_PSE) )
+{
+l3e_write_atomic(pl3e,
+l3e_from_paddr((paddr_t)virt_to_maddr(l2t), 
__PAGE_HYPERVISOR));
+l2t = NULL;
+}
+if ( locking )
+spin_unlock(_pgdir_lock);
+if ( virt )
+{
+unsigned int flush_flags =
+FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
+
+if ( (l3e_get_flags(ol3e) & _PAGE_GLOBAL) )
+flush_flags |= FLUSH_TLB_GLOBAL;
+flush_area(virt, flush_flags);
+}
+if ( l2t )
+free_xen_pagetable(l2t);
+
+return 0;
+}
+
 int map_pages_to_xen(
 unsigned long virt,
 mfn_t mfn,
@@ -5244,9 +5289,6 @@ int map_pages_to_xen(
 if ( (l3e_get_flags(ol3e) & _PAGE_PRESENT) &&
  (l3e_get_flags(ol3e) & _PAGE_PSE) )
 {
-unsigned int flush_flags =
-FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
-
 /* Skip this PTE if there is no change. */
 if ( ((l3e_get_pfn(ol3e) & ~(L2_PAGETABLE_ENTRIES *
  L1_PAGETABLE_ENTRIES - 1)) +
@@ -5267,33 +5309,9 @@ int map_pages_to_xen(
 continue;
 }
 
-pl2e = alloc_xen_pagetable();
-if ( pl2e == NULL )
+/* Pass virt to indicate we need to flush. */
+if ( shatter_l3e(pl3e, virt, locking) )
 return -ENOMEM;
-
-for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
-l2e_write(pl2e + i,
-  l2e_from_pfn(l3e_get_pfn(ol3e) +
-   (i << PAGETABLE_ORDER),
-   l3e_get_flags(ol3e)));
-
-if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
-flush_flags |= FLUSH_TLB_GLOBAL;
-
-if ( locking )
-spin_lock(_pgdir_lock);
-if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
- (l3e_get_flags(*pl3e) & _PAGE_PSE) )
-{
-l3e_write_atomic(pl3e, l3e_from_mfn(virt_to_mfn(pl2e),
-__PAGE_HYPERVISOR));
-pl2e = NULL;
-}
-if ( locking )
-spin_unlock(_pgdir_lock);
-flush_area(virt, flush_flags);
-if ( pl2e )
-free_xen_pagetable(pl2e);
 }
 
 pl2e = virt_to_xen_l2e(virt);
@@ -5578,27 +5596,8 @@ int modify_xen_mappings(unsigned long s, unsigned long 
e, unsigned int nf)
 }
 
 /* PAGE1GB: shatter the superpage and fall through. */
-pl2e = alloc_xen_pagetable();
-if ( !pl2e )
+if ( shatter_l3e(pl3e, 0, locking) )
 return -ENOMEM;
-for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
-l2e_write(pl2e + i,
-  l2e_from_pfn(l3e_get_pfn(*pl3e) +
-   (i << PAGETABLE_ORDER),
-   l3e_get_flags(*pl3e)));
-if ( locking )
-spin_lock(_pgdir_lock);
-if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
- (l3e_get_flags(*pl3e) & _PAGE_PSE) )
-{
-l3e_write_atomic(pl3e, l3e_from_mfn(virt_to_mfn(pl2e),
-__PAGE_HYPERVISOR));
-pl2e = NULL;
-}
-if ( locking )
- 

[Xen-devel] [PATCH v2 2/2] x86/mm: factor out the code for shattering an l2 PTE

2019-12-09 Thread Hongyan Xia
map_pages_to_xen and modify_xen_mappings are performing almost exactly
the same operations when shattering an l2 PTE, the only difference
being whether we want to flush.

Signed-off-by: Hongyan Xia 

---
Changes in v2:
- improve asm.
- re-read pl2e from memory when taking the lock.
- move the allocation of l1t inside the shatter function.
---
 xen/arch/x86/mm.c | 95 ---
 1 file changed, 48 insertions(+), 47 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 6188a968ff..103c97b903 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5151,6 +5151,51 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
  flush_area_local((const void *)v, f) : \
  flush_area_all((const void *)v, f))
 
+/* Shatter an l2 entry and populate l1. If virt is passed in, also do flush. */
+static int shatter_l2e(l2_pgentry_t *pl2e, unsigned long virt, bool locking)
+{
+unsigned int i;
+l2_pgentry_t ol2e;
+l1_pgentry_t ol1e, *l1t = alloc_xen_pagetable();
+
+if ( l1t == NULL )
+return -1;
+
+ol2e = *pl2e;
+ol1e = l1e_from_paddr(l2e_get_paddr(ol2e), 
lNf_to_l1f(l2e_get_flags(ol2e)));
+
+for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
+{
+l1e_write(l1t + i, ol1e);
+ol1e = l1e_from_intpte(
+l1e_get_intpte(ol1e) + (1 << PAGE_SHIFT));
+}
+if ( locking )
+spin_lock(_pgdir_lock);
+if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
+ (l2e_get_flags(*pl2e) & _PAGE_PSE) )
+{
+l2e_write_atomic(pl2e,
+l2e_from_paddr((paddr_t)virt_to_maddr(l1t), 
__PAGE_HYPERVISOR));
+l1t = NULL;
+}
+if ( locking )
+spin_unlock(_pgdir_lock);
+if ( virt )
+{
+unsigned int flush_flags =
+FLUSH_TLB | FLUSH_ORDER(PAGETABLE_ORDER);
+
+if ( l2e_get_flags(ol2e) & _PAGE_GLOBAL )
+flush_flags |= FLUSH_TLB_GLOBAL;
+flush_area(virt, flush_flags);
+}
+if ( l1t )
+free_xen_pagetable(l1t);
+
+return 0;
+}
+
 /* Shatter an l3 entry and populate l2. If virt is passed in, also do flush. */
 static int shatter_l3e(l3_pgentry_t *pl3e, unsigned long virt, bool locking)
 {
@@ -5363,9 +5408,6 @@ int map_pages_to_xen(
 }
 else if ( l2e_get_flags(*pl2e) & _PAGE_PSE )
 {
-unsigned int flush_flags =
-FLUSH_TLB | FLUSH_ORDER(PAGETABLE_ORDER);
-
 /* Skip this PTE if there is no change. */
 if ( (((l2e_get_pfn(*pl2e) & ~(L1_PAGETABLE_ENTRIES - 1)) +
l1_table_offset(virt)) == mfn_x(mfn)) &&
@@ -5384,32 +5426,9 @@ int map_pages_to_xen(
 goto check_l3;
 }
 
-pl1e = alloc_xen_pagetable();
-if ( pl1e == NULL )
+/* Pass virt to indicate we need to flush. */
+if ( shatter_l2e(pl2e, virt, locking) )
 return -ENOMEM;
-
-for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
-l1e_write([i],
-  l1e_from_pfn(l2e_get_pfn(*pl2e) + i,
-   lNf_to_l1f(l2e_get_flags(*pl2e;
-
-if ( l2e_get_flags(*pl2e) & _PAGE_GLOBAL )
-flush_flags |= FLUSH_TLB_GLOBAL;
-
-if ( locking )
-spin_lock(_pgdir_lock);
-if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
- (l2e_get_flags(*pl2e) & _PAGE_PSE) )
-{
-l2e_write_atomic(pl2e, l2e_from_mfn(virt_to_mfn(pl1e),
-__PAGE_HYPERVISOR));
-pl1e = NULL;
-}
-if ( locking )
-spin_unlock(_pgdir_lock);
-flush_area(virt, flush_flags);
-if ( pl1e )
-free_xen_pagetable(pl1e);
 }
 
 pl1e  = l2e_to_l1e(*pl2e) + l1_table_offset(virt);
@@ -5632,26 +5651,8 @@ int modify_xen_mappings(unsigned long s, unsigned long 
e, unsigned int nf)
 else
 {
 /* PSE: shatter the superpage and try again. */
-pl1e = alloc_xen_pagetable();
-if ( !pl1e )
+if ( shatter_l2e(pl2e, 0, locking) )
 return -ENOMEM;
-for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
-l1e_write([i],
-  l1e_from_pfn(l2e_get_pfn(*pl2e) + i,
-   l2e_get_flags(*pl2e) & ~_PAGE_PSE));
-if ( locking )
-spin_lock(_pgdir_lock);
-if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
- (l2e_get_flags(*pl2e) & _PAGE_PSE) )
-{
-l2e_write_atomic(pl2e, 

[Xen-devel] [PATCH v2 0/2] Refactor super page shattering

2019-12-09 Thread Hongyan Xia
map_pages_to_xen and modify_xen_mappings use almost exactly the same
page shattering logic, and the code is mingled with other PTE
manipulations so it is less obvious that the intention is page
shattering. Factor out the functions to make them reusable and to make
the intention more obvious.

Of course, there is not much difference between the shattering logic of
each level, so we could further turn the per-level functions into a
single macro, although this is not that simple since we have per-level
functions and macros all over the place and there are slight differences
between levels. Keep it per-level for now.

tree:
https://xenbits.xen.org/git-http/people/hx242/xen.git int_review

---
Changes in v2:
- rebase.
- improve asm code.
- avoid stale values when taking the lock.
- move allocation of PTE tables inside the shatter function.

Hongyan Xia (2):
  x86/mm: factor out the code for shattering an l3 PTE
  x86/mm: factor out the code for shattering an l2 PTE

 xen/arch/x86/mm.c | 192 +++---
 1 file changed, 96 insertions(+), 96 deletions(-)

-- 
2.17.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

  1   2   >