Re: [Xen-devel] [PATCHv7 00/11] CONFIG_DEBUG_VIRTUAL for arm64

2017-01-12 Thread Will Deacon
On Tue, Jan 10, 2017 at 01:35:39PM -0800, Laura Abbott wrote:
> This is v7 of the patches to add CONFIG_DEBUG_VIRTUAL for arm64. This is
> a simple reordering of patches from v6 per request of Will Deacon for ease
> of merging support for arm which depends on this series.
> 
> Laura Abbott (11):
>   lib/Kconfig.debug: Add ARCH_HAS_DEBUG_VIRTUAL
>   mm/cma: Cleanup highmem check
>   mm: Introduce lm_alias
>   kexec: Switch to __pa_symbol
>   mm/kasan: Switch to using __pa_symbol and lm_alias
>   mm/usercopy: Switch to using lm_alias
>   drivers: firmware: psci: Use __pa_symbol for kernel symbol
>   arm64: Move some macros under #ifndef __ASSEMBLY__
>   arm64: Add cast for virt_to_pfn
>   arm64: Use __pa_symbol for kernel symbols
>   arm64: Add support for CONFIG_DEBUG_VIRTUAL

I've pushed this into linux-next and, assuming it survives the
autobuilders etc I'll co-ordinate with Russell to get the common parts
pulled into the ARM tree too (so he can take Florian's series). They're
currently split out on the arm64 for-next/debug-virtual branch.

Thanks!

Will

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCHv6 00/11] CONFIG_DEBUG_VIRTUAL for arm64

2017-01-10 Thread Will Deacon
On Wed, Jan 04, 2017 at 02:30:50PM -0800, Florian Fainelli wrote:
> On 01/04/2017 03:44 AM, Will Deacon wrote:
> > On Tue, Jan 03, 2017 at 03:25:53PM -0800, Laura Abbott wrote:
> >> On 01/03/2017 02:56 PM, Florian Fainelli wrote:
> >>> On 01/03/2017 09:21 AM, Laura Abbott wrote:
> >>>> Happy New Year!
> >>>>
> >>>> This is a very minor rebase from v5. It only moves a few headers around.
> >>>> I think this series should be ready to be queued up for 4.11.
> >>>
> >>> FWIW:
> >>>
> >>> Tested-by: Florian Fainelli <f.faine...@gmail.com>
> >>>
> >>
> >> Thanks!
> >>
> >>> How do we get this series included? I would like to get the ARM 32-bit
> >>> counterpart included as well (will resubmit rebased shortly), but I have
> >>> no clue which tree this should be going through.
> >>>
> >>
> >> I was assuming this would go through the arm64 tree unless Catalin/Will
> >> have an objection to that.
> > 
> > Yup, I was planning to pick it up for 4.11.
> > 
> > Florian -- does your series depend on this? If so, then I'll need to
> > co-ordinate with Russell (probably via a shared branch that we both pull)
> > if you're aiming for 4.11 too.
> 
> Yes, pretty much everything in Laura's patch series is relevant, except
> the arm64 bits.

Ok, then. Laura -- could you please reorder your patches so that the
non-arm64 bits come first? That way, I can put those on a separate branch
and have it pulled by both arm64 and rmk, so that the prequisities are
shared between the architectures.

Thanks,

Will

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCHv6 00/11] CONFIG_DEBUG_VIRTUAL for arm64

2017-01-04 Thread Will Deacon
On Tue, Jan 03, 2017 at 03:25:53PM -0800, Laura Abbott wrote:
> On 01/03/2017 02:56 PM, Florian Fainelli wrote:
> > On 01/03/2017 09:21 AM, Laura Abbott wrote:
> >> Happy New Year!
> >>
> >> This is a very minor rebase from v5. It only moves a few headers around.
> >> I think this series should be ready to be queued up for 4.11.
> > 
> > FWIW:
> > 
> > Tested-by: Florian Fainelli 
> > 
> 
> Thanks!
> 
> > How do we get this series included? I would like to get the ARM 32-bit
> > counterpart included as well (will resubmit rebased shortly), but I have
> > no clue which tree this should be going through.
> > 
> 
> I was assuming this would go through the arm64 tree unless Catalin/Will
> have an objection to that.

Yup, I was planning to pick it up for 4.11.

Florian -- does your series depend on this? If so, then I'll need to
co-ordinate with Russell (probably via a shared branch that we both pull)
if you're aiming for 4.11 too.

Will

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] linux-next: manual merge of the xen-tip tree with the arm64 tree

2016-04-26 Thread Will Deacon
On Tue, Apr 26, 2016 at 03:00:41PM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the xen-tip tree got a conflict in:
> 
>   arch/arm64/kernel/setup.c
> 
> between commit:
> 
>   3194ac6e66cc ("arm64: Move unflatten_device_tree() call earlier.")
> 
> from the arm64 tree and commit:
> 
>   3915fea959b6 ("ARM: XEN: Move xen_early_init() before efi_init()")
> 
> from the xen-tip tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.

Thanks Stephen, looks good to me.

Will

> diff --cc arch/arm64/kernel/setup.c
> index 65f515949baa,7cf992fe6684..
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@@ -277,13 -336,13 +278,11 @@@ void __init setup_arch(char **cmdline_p
>   
>   early_ioremap_reset();
>   
>  -if (acpi_disabled) {
>  -unflatten_device_tree();
>  +if (acpi_disabled)
>   psci_dt_init();
>  -} else {
>  +else
>   psci_acpi_init();
>  -}
>   
> - xen_early_init();
> - 
>   cpu_read_bootcpu_ops();
>   smp_init_cpus();
>   smp_build_mpidr_hash();
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] Xen on ARM and ARM64: update MAINTAINERS info

2016-04-01 Thread Will Deacon
On Tue, Mar 29, 2016 at 11:01:16AM +0100, Stefano Stabellini wrote:
> Not my full time job anymore, but still maintaining it.
> 
> Signed-off-by: Stefano Stabellini <stefano.stabell...@eu.citrix.com>
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 32bafda..049aa1d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -12193,16 +12193,16 @@ F:  include/xen/
>  F:   include/uapi/xen/
>  
>  XEN HYPERVISOR ARM
> -M:   Stefano Stabellini <stefano.stabell...@eu.citrix.com>
> +M:   Stefano Stabellini <sstabell...@kernel.org>
>  L:   xen-de...@lists.xenproject.org (moderated for non-subscribers)
> -S:   Supported
> +S:   Maintained
>  F:   arch/arm/xen/
>  F:   arch/arm/include/asm/xen/
>  
>  XEN HYPERVISOR ARM64
> -M:   Stefano Stabellini <stefano.stabell...@eu.citrix.com>
> +M:   Stefano Stabellini <sstabell...@kernel.org>
>  L:   xen-de...@lists.xenproject.org (moderated for non-subscribers)
> -S:   Supported
> +S:   Maintained
>  F:   arch/arm64/xen/
>  F:   arch/arm64/include/asm/xen/

Acked-by: Will Deacon <will.dea...@arm.com>

Will

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 12/17] ARM64: ACPI: Check if it runs on Xen to enable or disable ACPI

2016-03-29 Thread Will Deacon
On Thu, Mar 24, 2016 at 10:44:31PM +0800, Shannon Zhao wrote:
> When it's a Xen domain0 booting with ACPI, it will supply a /chosen and
> a /hypervisor node in DT. So check if it needs to enable ACPI.
> 
> Signed-off-by: Shannon Zhao 
> Reviewed-by: Stefano Stabellini 
> Acked-by: Hanjun Guo 
> ---
>  arch/arm64/kernel/acpi.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
> index d1ce8e2..4e92be0 100644
> --- a/arch/arm64/kernel/acpi.c
> +++ b/arch/arm64/kernel/acpi.c
> @@ -67,10 +67,13 @@ static int __init dt_scan_depth1_nodes(unsigned long node,
>  {
>   /*
>* Return 1 as soon as we encounter a node at depth 1 that is
> -  * not the /chosen node.
> +  * not the /chosen node, or /hypervisor node when running on Xen.
>*/
> - if (depth == 1 && (strcmp(uname, "chosen") != 0))
> - return 1;
> + if (depth == 1 && (strcmp(uname, "chosen") != 0)) {
> + if (!xen_initial_domain() || (strcmp(uname, "hypervisor") != 0))
> + return 1;
> + }

Hmm, but xen_initial_domain() is false when xen isn't being used at all,
so it feels to me like this is a bit too far-reaching and is basically
claiming the "/hypervisor" namespace for Xen. Couldn't it be renamed to
"xen,hypervisor" or something?

Mark, got any thoughts on this?

Will

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 11/17] ARM: XEN: Move xen_early_init() before efi_init()

2016-03-29 Thread Will Deacon
On Sat, Mar 26, 2016 at 12:54:09PM +, Stefano Stabellini wrote:
> are you OK with this patch?

Nothing against it, but the only arm64 bit is:

> > diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> > index 450987d..6cf5051 100644
> > --- a/arch/arm64/kernel/setup.c
> > +++ b/arch/arm64/kernel/setup.c
> > @@ -313,6 +313,7 @@ void __init setup_arch(char **cmdline_p)
> >  */
> > local_async_enable();
> >  
> > +   xen_early_init();
> > efi_init();
> > arm64_memblock_init();
> >  
> > @@ -334,7 +335,6 @@ void __init setup_arch(char **cmdline_p)
> > } else {
> > psci_acpi_init();
> > }
> > -   xen_early_init();

so it's difficult to care too much ;) I do hope that there won't be a
need to split up efi_init() in future because some of it has to happen
before xen_early_init, but that doesn't sound likely at the moment.

Will

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-27 Thread Will Deacon
On Tue, Jan 26, 2016 at 11:58:20AM -0800, Paul E. McKenney wrote:
> On Tue, Jan 26, 2016 at 12:16:09PM +, Will Deacon wrote:
> > On Mon, Jan 25, 2016 at 10:03:22PM -0800, Paul E. McKenney wrote:
> > > On Mon, Jan 25, 2016 at 04:42:43PM +, Will Deacon wrote:
> > > > On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> > > > > PPC Overlapping Group-B sets version 4
> > > > > ""
> > > > > (* When the Group-B sets from two different barriers involve 
> > > > > instructions in
> > > > >the same thread, within that thread one set must contain the other.
> > > > > 
> > > > >   P0  P1  P2
> > > > >   Rx=1Wy=1Wz=2
> > > > >   dep.lwsync  lwsync
> > > > >   Ry=0Wz=1Wx=1
> > > > >   Rz=1
> > > > > 
> > > > >   assert(!(z=2))
> > > > > 
> > > > >Forbidden by ppcmem, allowed by herd.
> > > > > *)
> > > > > {
> > > > > 0:r1=x; 0:r2=y; 0:r3=z;
> > > > > 1:r1=x; 1:r2=y; 1:r3=z; 1:r4=1;
> > > > > 2:r1=x; 2:r2=y; 2:r3=z; 2:r4=1; 2:r5=2;
> > > > > }
> > > > >  P0   | P1| P2;
> > > > >  lwz r6,0(r1) | stw r4,0(r2)  | stw r5,0(r3)  ;
> > > > >  xor r7,r6,r6 | lwsync| lwsync;
> > > > >  lwzx r7,r7,r2| stw r4,0(r3)  | stw r4,0(r1)  ;
> > > > >  lwz r8,0(r3) |   |   ;
> > > > > 
> > > > > exists
> > > > > (z=2 /\ 0:r6=1 /\ 0:r7=0 /\ 0:r8=1)
> > > > 
> > > > That really hurts. Assuming that the "assert(!(z=2))" is actually there
> > > > to constrain the coherence order of z to be {0->1->2}, then I think that
> > > > this test is forbidden on arm using dmb instead of lwsync. That said, I
> > > > also don't think the Rz=1 in P0 changes anything.
> > > 
> > > What about the smp_wmb() variant of dmb that orders only stores?
> > 
> > Tricky, but I think it still works out if the coherence order of z is as
> > I described above. The line of reasoning is weird though -- I ended up
> > considering the two cases where P0 reads z before and after it reads x
> > and what that means for the read of y.
> 
> By "works out" you mean that ARM prohibits the outcome?

Yes, that's my understanding.

Will

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] documentation: Add disclaimer

2016-01-27 Thread Will Deacon
On Wed, Jan 27, 2016 at 09:35:46AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 26, 2016 at 12:11:43PM -0800, Paul E. McKenney wrote:
> > So Peter, would you like to update your patch to include yourself
> > and Will as authors?
> 
> Sure, here goes.
> 
> ---
> Subject: documentation: Add disclaimer
> 
> It appears people are reading this document as a requirements list for
> building hardware. This is not the intent of this document. Nor is it
> particularly suited for this purpose.
> 
> The primary purpose of this document is our collective attempt to define
> a set of primitives that (hopefully) allow us to write correct code on
> the myriad of SMP platforms Linux supports.
> 
> Its a definite work in progress as our understanding of these platforms,
> and memory ordering in general, progresses.
> 
> Nor does being mentioned in this document mean we think its a
> particularly good idea; the data dependency barrier required by Alpha
> being a prime example. Yes we have it, no you're insane to require it
> when building new hardware.
> 
> Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
> ---
>  Documentation/memory-barriers.txt | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)

Acked-by: Will Deacon <will.dea...@arm.com>

Will

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-27 Thread Will Deacon
On Tue, Jan 26, 2016 at 03:37:33PM -0800, Paul E. McKenney wrote:
> On Tue, Jan 26, 2016 at 12:10:10PM +, Will Deacon wrote:
> > On Mon, Jan 25, 2016 at 05:06:46PM -0800, Paul E. McKenney wrote:
> > > PPC WRCnf+addrs
> > > ""
> > > {
> > > 0:r2=x; 0:r3=y;
> > > 1:r2=x; 1:r3=y;
> > > 2:r2=x; 2:r3=y;
> > > c=a; d=b; x=c; y=d;
> > > }
> > >  P0   | P1| P2;
> > >  stw r3,0(r2) | lwz r8,0(r2)  | lwz r8,0(r3)  ;
> > >   | stw r2,0(r8)  | lwz r9,0(r8)  ;
> > > exists
> > > (1:r8=y /\ 2:r8=x /\ 2:r9=c)
> > 
> > Agreed.
> 
> OK, thank you!  Would you agree that it would be good to replace the
> current xor-based fake-dependency litmus tests with tests having real
> dependencies?

Yes, because it would look a lot more like real (kernel) code.

Will

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-26 Thread Will Deacon
On Mon, Jan 25, 2016 at 10:03:22PM -0800, Paul E. McKenney wrote:
> On Mon, Jan 25, 2016 at 04:42:43PM +, Will Deacon wrote:
> > On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> > > PPC Overlapping Group-B sets version 4
> > > ""
> > > (* When the Group-B sets from two different barriers involve instructions 
> > > in
> > >the same thread, within that thread one set must contain the other.
> > > 
> > >   P0  P1  P2
> > >   Rx=1Wy=1Wz=2
> > >   dep.lwsync  lwsync
> > >   Ry=0Wz=1Wx=1
> > >   Rz=1
> > > 
> > >   assert(!(z=2))
> > > 
> > >Forbidden by ppcmem, allowed by herd.
> > > *)
> > > {
> > > 0:r1=x; 0:r2=y; 0:r3=z;
> > > 1:r1=x; 1:r2=y; 1:r3=z; 1:r4=1;
> > > 2:r1=x; 2:r2=y; 2:r3=z; 2:r4=1; 2:r5=2;
> > > }
> > >  P0   | P1| P2;
> > >  lwz r6,0(r1) | stw r4,0(r2)  | stw r5,0(r3)  ;
> > >  xor r7,r6,r6 | lwsync| lwsync;
> > >  lwzx r7,r7,r2| stw r4,0(r3)  | stw r4,0(r1)  ;
> > >  lwz r8,0(r3) |   |   ;
> > > 
> > > exists
> > > (z=2 /\ 0:r6=1 /\ 0:r7=0 /\ 0:r8=1)
> > 
> > That really hurts. Assuming that the "assert(!(z=2))" is actually there
> > to constrain the coherence order of z to be {0->1->2}, then I think that
> > this test is forbidden on arm using dmb instead of lwsync. That said, I
> > also don't think the Rz=1 in P0 changes anything.
> 
> What about the smp_wmb() variant of dmb that orders only stores?

Tricky, but I think it still works out if the coherence order of z is as
I described above. The line of reasoning is weird though -- I ended up
considering the two cases where P0 reads z before and after it reads x
and what that means for the read of y.

Will

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-26 Thread Will Deacon
On Mon, Jan 25, 2016 at 05:06:46PM -0800, Paul E. McKenney wrote:
> On Mon, Jan 25, 2016 at 02:41:34PM +, Will Deacon wrote:
> > On Fri, Jan 15, 2016 at 11:28:45AM -0800, Paul E. McKenney wrote:
> > > On Fri, Jan 15, 2016 at 09:54:01AM -0800, Paul E. McKenney wrote:
> > > > On Fri, Jan 15, 2016 at 10:24:32AM +, Will Deacon wrote:
> > > > > See my earlier reply [1] (but also, your WRC Linux example looks more
> > > > > like a variant on WWC and I couldn't really follow it).
> > > > 
> > > > I will revisit my WRC Linux example.  And yes, creating litmus tests
> > > > that use non-fake dependencies is still a bit of an undertaking.  :-/
> > > > I am sure that it will seem more natural with time and experience...
> > > 
> > > Hmmm...  You are quite right, I did do WWC.  I need to change cpu2()'s
> > > last access from a store to a load to get WRC.  Plus the levels of
> > > indirection definitely didn't match up, did they?
> > 
> > Nope, it was pretty baffling!
> 
> "It is a service that I provide."  ;-)
> 
> > >   struct foo {
> > >   struct foo *next;
> > >   };
> > >   struct foo a;
> > >   struct foo b;
> > >   struct foo c = {  };
> > >   struct foo d = {  };
> > >   struct foo x = {  };
> > >   struct foo y = {  };
> > >   struct foo *r1, *r2, *r3;
> > > 
> > >   void cpu0(void)
> > >   {
> > >   WRITE_ONCE(x.next, );
> > >   }
> > > 
> > >   void cpu1(void)
> > >   {
> > >   r1 = lockless_dereference(x.next);
> > >   WRITE_ONCE(r1->next, );
> > >   }
> > > 
> > >   void cpu2(void)
> > >   {
> > >   r2 = lockless_dereference(y.next);
> > >   r3 = READ_ONCE(r2->next);
> > >   }
> > > 
> > > In this case, it is legal to end the run with:
> > > 
> > >   r1 ==  && r2 ==  && r3 == 
> > > 
> > > Please see below for a ppcmem litmus test.
> > > 
> > > So, did I get it right this time?  ;-)
> > 
> > The code above looks correct to me (in that it matches WRC+addrs),
> > but your litmus test:
> > 
> > > PPC WRCnf+addrs
> > > ""
> > > {
> > > 0:r2=x; 0:r3=y;
> > > 1:r2=x; 1:r3=y;
> > > 2:r2=x; 2:r3=y;
> > > c=a; d=b; x=c; y=d;
> > > }
> > >  P0   | P1| P2;
> > >  stw r3,0(r2) | lwz r8,0(r2)  | lwz r8,0(r3)  ;
> > >   | stw r2,0(r3)  | lwz r9,0(r8)  ;
> > > exists
> > > (1:r8=y /\ 2:r8=x /\ 2:r9=c)
> > 
> > Seems to be missing the address dependency on P1.
> 
> You are quite correct!  How about the following?

I think that's it!

> As before, both herd and ppcmem say that the cycle is allowed, as
> expected, given non-transitive ordering.  To prohibit the cycle, P1
> needs a suitable memory-barrier instruction.
> 
> 
> 
> PPC WRCnf+addrs
> ""
> {
> 0:r2=x; 0:r3=y;
> 1:r2=x; 1:r3=y;
> 2:r2=x; 2:r3=y;
> c=a; d=b; x=c; y=d;
> }
>  P0   | P1| P2;
>  stw r3,0(r2) | lwz r8,0(r2)  | lwz r8,0(r3)  ;
>   | stw r2,0(r8)  | lwz r9,0(r8)  ;
> exists
> (1:r8=y /\ 2:r8=x /\ 2:r9=c)

Agreed.

Will

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-26 Thread Will Deacon
On Tue, Jan 26, 2016 at 11:32:00AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 26, 2016 at 11:24:02AM +0100, Peter Zijlstra wrote:
> 
> > Yeah, this goes under the header: memory-barriers.txt is _NOT_ a
> > specification (I seem to keep repeating this).
> 
> Do we want this ?
> 
> ---
>  Documentation/memory-barriers.txt | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/Documentation/memory-barriers.txt 
> b/Documentation/memory-barriers.txt
> index a61be39c7b51..433326ebdc26 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1,3 +1,4 @@
> +
>
>LINUX KERNEL MEMORY BARRIERS
>
> @@ -5,6 +6,22 @@
>  By: David Howells 
>  Paul E. McKenney 
>  
> +==
> +DISCLAIMER
> +==
> +
> +This document is not a specification; it is intentionally (for the sake of
> +brevity) and unintentionally (due to being human) incomplete. This document 
> is
> +meant as a guide to using the various memory barriers provided by Linux, but
> +in case of any doubt (and there are many) please ask.

It might be worth adding you and me to the top of the file, to save Paul
Cc'ing us on questions (get_maintainer.pl points at poor old Corbet for
this file).

But yes, it seems that something like this is required.

Will

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-25 Thread Will Deacon
On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> On Fri, Jan 15, 2016 at 10:27:14PM +0100, Peter Zijlstra wrote:
> > On Fri, Jan 15, 2016 at 09:46:12AM -0800, Paul E. McKenney wrote:
> > > On Fri, Jan 15, 2016 at 10:13:48AM +0100, Peter Zijlstra wrote:
> > 
> > > > And the stuff we're confused about is how best to express the difference
> > > > and guarantees of these two forms of transitivity and how exactly they
> > > > interact.
> > > 
> > > Hoping my memory-barrier.txt patch helps here...
> > 
> > Yes, that seems a good start. But yesterday you raised the 'fun' point
> > of two globally ordered sequences connected by a single local link.
> 
> The conclusion that I am slowly coming to is that litmus tests should
> not be thought of as linear chains, but rather as cycles.  If you think
> of it as a cycle, then it doesn't matter where the local link is, just
> how many of them and how they are connected.

Do you have some examples of this? I'm struggling to make it work in my
mind, or are you talking specifically in the context of the kernel
memory model?

> But I will admit that there are some rather strange litmus tests that
> challenge this cycle-centric view, for example, the one shown below.
> It turns out that herd and ppcmem disagree on the outcome.  (The Power
> architects side with ppcmem.)
> 
> > And I think I'm still confused on LWSYNC (in the smp_wmb case) when one
> > of the stores looses a conflict, and if that scenario matters. If it
> > does, we should inspect the same case for other barriers.
> 
> Indeed.  I am still working on how these should be described.  My
> current thought is to be quite conservative on what ordering is
> actually respected, however, the current task is formalizing how
> RCU plays with the rest of the memory model.
> 
>   Thanx, Paul
> 
> 
> 
> PPC Overlapping Group-B sets version 4
> ""
> (* When the Group-B sets from two different barriers involve instructions in
>the same thread, within that thread one set must contain the other.
> 
>   P0  P1  P2
>   Rx=1Wy=1Wz=2
>   dep.lwsync  lwsync
>   Ry=0Wz=1Wx=1
>   Rz=1
> 
>   assert(!(z=2))
> 
>Forbidden by ppcmem, allowed by herd.
> *)
> {
> 0:r1=x; 0:r2=y; 0:r3=z;
> 1:r1=x; 1:r2=y; 1:r3=z; 1:r4=1;
> 2:r1=x; 2:r2=y; 2:r3=z; 2:r4=1; 2:r5=2;
> }
>  P0   | P1| P2;
>  lwz r6,0(r1) | stw r4,0(r2)  | stw r5,0(r3)  ;
>  xor r7,r6,r6 | lwsync| lwsync;
>  lwzx r7,r7,r2| stw r4,0(r3)  | stw r4,0(r1)  ;
>  lwz r8,0(r3) |   |   ;
> 
> exists
> (z=2 /\ 0:r6=1 /\ 0:r7=0 /\ 0:r8=1)

That really hurts. Assuming that the "assert(!(z=2))" is actually there
to constrain the coherence order of z to be {0->1->2}, then I think that
this test is forbidden on arm using dmb instead of lwsync. That said, I
also don't think the Rz=1 in P0 changes anything.

The double negatives don't help here! (it is forbidden to guarantee that
z is not always 2).

Will

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-25 Thread Will Deacon
Hi Paul,

On Fri, Jan 15, 2016 at 09:39:12AM -0800, Paul E. McKenney wrote:
> On Fri, Jan 15, 2016 at 09:55:54AM +0100, Peter Zijlstra wrote:
> > On Thu, Jan 14, 2016 at 01:29:13PM -0800, Paul E. McKenney wrote:
> > > So smp_mb() provides transitivity, as do pairs of smp_store_release()
> > > and smp_read_acquire(), 
> > 
> > But they provide different grades of transitivity, which is where all
> > the confusion lays.
> > 
> > smp_mb() is strongly/globally transitive, all CPUs will agree on the order.
> > 
> > Whereas the RCpc release+acquire is weakly so, only the two cpus
> > involved in the handover will agree on the order.
> 
> Good point!
> 
> Using grace periods in place of smp_mb() also provides strong/global
> transitivity, but also insanely high latencies.  ;-)
> 
> The patch below updates Documentation/memory-barriers.txt to define
> local vs. global transitivity.  The corresponding ppcmem litmus test
> is included below as well.
> 
> Should we start putting litmus tests for the various examples
> somewhere, perhaps in a litmus-tests directory within each participating
> architecture?  I have a pile of powerpc-related litmus tests on my laptop,
> but they probably aren't doing all that much good there.

I too would like to have the litmus tests in the kernel so that we can
refer to them from memory-barriers.txt. Ideally they wouldn't be targetted
to a particular arch, however.

> PPC local-transitive
> ""
> {
> 0:r1=1; 0:r2=u; 0:r3=v; 0:r4=x; 0:r5=y; 0:r6=z;
> 1:r1=1; 1:r2=u; 1:r3=v; 1:r4=x; 1:r5=y; 1:r6=z;
> 2:r1=1; 2:r2=u; 2:r3=v; 2:r4=x; 2:r5=y; 2:r6=z;
> 3:r1=1; 3:r2=u; 3:r3=v; 3:r4=x; 3:r5=y; 3:r6=z;
> }
>  P0   | P1   | P2   | P3   ;
>  lwz r9,0(r4) | lwz r9,0(r5) | lwz r9,0(r6) | stw r1,0(r3) ;
>  lwsync   | lwsync   | lwsync   | sync ;
>  stw r1,0(r2) | lwz r8,0(r3) | stw r1,0(r7) | lwz r9,0(r2) ;
>  lwsync   | lwz r7,0(r2) |  |  ;
>  stw r1,0(r5) | lwsync   |  |  ;
>   | stw r1,0(r6) |  |  ;
> exists
> (* (0:r9=0 /\ 1:r9=1 /\ 2:r9=1 /\ 1:r8=0 /\ 3:r9=0) *)
> (* (0:r9=1 /\ 1:r9=1 /\ 2:r9=1) *)
> (* (0:r9=0 /\ 1:r9=1 /\ 2:r9=1 /\ 1:r7=0) *)
> (0:r9=0 /\ 1:r9=1 /\ 2:r9=1 /\ 1:r7=0)

i.e. we should rewrite this using READ_ONCE/WRITE_ONCE and smp_mb() etc.

> 
> 
> commit 2cb4e83a1b5c89c8e39b8a64bd89269d05913e41
> Author: Paul E. McKenney <paul...@linux.vnet.ibm.com>
> Date:   Fri Jan 15 09:30:42 2016 -0800
> 
> documentation: Distinguish between local and global transitivity
> 
> The introduction of smp_load_acquire() and smp_store_release() had
> the side effect of introducing a weaker notion of transitivity:
> The transitivity of full smp_mb() barriers is global, but that
> of smp_store_release()/smp_load_acquire() chains is local.  This
> commit therefore introduces the notion of local transitivity and
> gives an example.
> 
> Reported-by: Peter Zijlstra <pet...@infradead.org>
> Reported-by: Will Deacon <will.dea...@arm.com>
> Signed-off-by: Paul E. McKenney <paul...@linux.vnet.ibm.com>
> 
> diff --git a/Documentation/memory-barriers.txt 
> b/Documentation/memory-barriers.txt
> index c66ba46d8079..d8109ed99342 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1318,8 +1318,82 @@ or a level of cache, CPU 2 might have early access to 
> CPU 1's writes.
>  General barriers are therefore required to ensure that all CPUs agree
>  on the combined order of CPU 1's and CPU 2's accesses.
>  
> -To reiterate, if your code requires transitivity, use general barriers
> -throughout.
> +General barriers provide "global transitivity", so that all CPUs will
> +agree on the order of operations.  In contrast, a chain of release-acquire
> +pairs provides only "local transitivity", so that only those CPUs on
> +the chain are guaranteed to agree on the combined order of the accesses.

Thanks for having a go at this. I tried defining something axiomatically,
but got stuck pretty quickly. In my scheme, I used "data-directed
transitivity" instead of "local transitivity", since the latter seems to
be a bit of a misnomer.

> +For example, switching to C code in deference to Herman Hollerith:
> +
> + int u, v, x, y, z;
> +
> + void cpu0(void)
> + {
> + r0 = smp_load_acquire();
> + WRITE_ONCE(u, 1);
> + smp_store_release(, 1);
> + }
> +
> + void cpu1(void)
&g

Re: [Xen-devel] [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-15 Thread Will Deacon
Paul,

On Thu, Jan 14, 2016 at 02:20:46PM -0800, Paul E. McKenney wrote:
> On Thu, Jan 14, 2016 at 01:24:34PM -0800, Leonid Yegoshin wrote:
> > It is not so simple, I mean "local ordering for address and data
> > dependencies". Local ordering is NOT enough. It happens that current
> > MIPS R6 doesn't require in your example smp_read_barrier_depends()
> > but in discussion it comes out that it may not. Because without
> > smp_read_barrier_depends() your example can be a part of Will's
> > WRC+addr+addr and we found some design which easily can bump into
> > this test. And that design actually performs "local ordering for
> > address and data dependencies" too.
> 
> As noted in another email in this thread, I do not believe that
> WRC+addr+addr needs to be prohibited.  Sounds like Will and I need to
> get our story straight, though.

I think you figured this out while I was sleeping, but just to confirm:

 1. The MIPS64 ISA doc [1] talks about SYNC in a way that applies only
to memory accesses appearing in *program-order* before the SYNC

 2. We need WRC+sync+addr to work, which means that the SYNC in P1 must
also capture the store in P0 as being "before" the barrier. Leonid
reckons it works, but his explanation [2] focussed on the address
dependency in P2 as to why this works. If that is the case (i.e.
address dependency provides global transitivity), then WRC+addr+addr
should also work (even though its not required).

 3. It seems that WRC+addr+addr doesn't work, so I'm still suspicious
about WRC+sync+addr, because neither the architecture document or
Leonid's explanation tell me that it should be forbidden.

Will

[1] https://imgtec.com/?do-download=4302
[2] http://lkml.kernel.org/r/569565da.2010...@imgtec.com (scroll to the end)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-14 Thread Will Deacon
On Wed, Jan 13, 2016 at 12:58:22PM -0800, Leonid Yegoshin wrote:
> On 01/13/2016 12:48 PM, Peter Zijlstra wrote:
> >On Wed, Jan 13, 2016 at 11:02:35AM -0800, Leonid Yegoshin wrote:
> >
> >>I ask HW team about it but I have a question - has it any relationship with
> >>replacing MIPS SYNC with lightweight SYNCs (SYNC_WMB etc)?
> >Of course. If you cannot explain the semantics of the primitives you
> >introduce, how can we judge the patch.
> >
> >
> You missed a point - it is a question about replacement of SYNC with
> lightweight primitives. It is NOT a question about multithread system
> behavior without any SYNC. The answer on a latest Will's question lies in
> different area.

The reason we (Peter and I) care about this isn't because we enjoy being
obstructive. It's because there is a whole load of core (i.e. portable)
kernel code that is written to the *kernel* memory model. For example,
the scheduler, RCU, mutex implementations, perf, drivers, you name it.

Consequently, it's important that the architecture back-ends implement
these portable primitives (e.g. smp_mb()) in a way that satisfies the
kernel memory model so that core code doesn't need to worry about the
underlying architecture for synchronisation purposes. You could turn
around and say "but if MIPS gets it wrong, then that's MIPS's problem",
but actually not having a general understanding of the ordering guarantees
provided by each architecture makes it very difficult for us to extend
the kernel memory model in such a way that it can be implemented
efficiently across the board *and* relied upon by core code.

The virtio patch at the start of the thread doesn't particularly concern
me. It's the other patches you linked to that implement acquire/release
that have me worried.

Will

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-14 Thread Will Deacon
On Wed, Jan 13, 2016 at 02:26:16PM -0800, Leonid Yegoshin wrote:
> On 01/13/2016 02:45 AM, Will Deacon wrote:
> >>
> >I don't think the address dependency is enough on its own. By that
> >reasoning, the following variant (WRC+addr+addr) would work too:
> >
> >
> >P0:
> >Wx = 1
> >
> >P1:
> >Rx == 1
> >
> >Wy = 1
> >
> >P2:
> >Ry == 1
> >
> >Rx = 0
> >
> >
> >So are you saying that this is also forbidden?
> >Imagine that P0 and P1 are two threads that share a store buffer. What
> >then?
> 
> OK, I collected answers and it is:
> 
> In MIPS R6 this test passes OK, I mean - P2: Rx = 1 if Ry is read as 1.
> By design.
> 
> However, it is unclear that happens in MIPS R2 1004K.

How can it be unclear? If, for example, the outcome is permitted on that
CPU, then your original reasoning for the WRC+sync+addr doesn't apply
there and SYNC is not transitive. That's what I'm trying to get to the
bottom of.

Does the MIPS kernel target a particular CPU at compile time?

> Moreover, there are voices against guarantee that it will be in future
> and that voices point me to Documentation/memory-barriers.txt section "DATA
> DEPENDENCY BARRIERS" examples which require SYNC_RMB between loading
> address/index and using that for loading data based on that address or index
> for shared data (look on CPU2 pseudo-code):
> >To deal with this, a data dependency barrier or better must be inserted
> >between the address load and the data load:
> >
> >CPU 1 CPU 2
> >===   ===
> >{ A == 1, B == 2, C = 3, P == , Q ==  }
> >B = 4;
> >
> >WRITE_ONCE(P, );
> >  Q = READ_ONCE(P);
> >   <---
> >SYNC_RMB is here
> >  D = *Q;
> ...
> >Another example of where data dependency barriers might be required is
> >where a
> >number is read from memory and then used to calculate the index for an
> >array
> >access:
> >
> >CPU 1 CPU 2
> >===   ===
> >{ M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 }
> >M[1] = 4;
> >
> >WRITE_ONCE(P, 1);
> >  Q = READ_ONCE(P);
> >   <
> >SYNC_RMB is here
> >  D = M[Q];
> 
> That voices say that there is a legitimate reason to relax HW here for
> performance if SYNC_RMB is needed anyway to work with this sequence of
> shared data.

Are you saying that MIPS needs to implement [smp_]read_barrier_depends?

> And all that is out-of-topic here in my mind. I just want to be sure that
> this patchset still provides a use of a specific lightweight SYNCs on MIPS
> vs bold and heavy generalized "SYNC 0" in any case.

We may be highjacking the thread slightly, but there are much bigger
issues at play here if you want to start using lightweight barriers to
implement relaxed kernel primitives such as smp_load_acquire and
smp_store_release.

Will

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-13 Thread Will Deacon
On Tue, Jan 12, 2016 at 12:45:14PM -0800, Leonid Yegoshin wrote:
> >The issue I have with the SYNC description in the text above is that it
> >describes the single CPU (program order) and the dual-CPU (confusingly
> >named global order) cases, but then doesn't generalise any further. That
> >means we can't sensibly reason about transitivity properties when a third
> >agent is involved. For example, the WRC+sync+addr test:
> >
> >
> >P0:
> >Wx = 1
> >
> >P1:
> >Rx == 1
> >SYNC
> >Wy = 1
> >
> >P2:
> >Ry == 1
> >
> >Rx = 0
> >
> >
> >I can't find anything to forbid that, given the text. The main problem
> >is having the SYNC on P1 affect the write by P0.
> 
> As I understand that test, the visibility of P0: W[x] = 1 is identical to P1
> and P2 here. If P1 got X before SYNC and write to Y after SYNC then
> instruction source register dependency tracking in P2 prevents a speculative
> load of X before P2 obtains Y from the same place as P0/P1 and calculate
> address of X. If some load of X in P2 happens before address dependency
> calculation it's result is discarded.

I don't think the address dependency is enough on its own. By that
reasoning, the following variant (WRC+addr+addr) would work too:


P0:
Wx = 1

P1:
Rx == 1

Wy = 1

P2:
Ry == 1

Rx = 0


So are you saying that this is also forbidden?
Imagine that P0 and P1 are two threads that share a store buffer. What
then?

> Yes, you can't find that in MIPS SYNC instruction description, it is more
> likely in CM (Coherence Manager) area. I just pointed our arch team member
> responsible for documents and he will think how to explain that.

I tried grepping the linked documents for "coherence manager" but couldn't
find anything. Is the description you refer to available anywhere?

Will

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-12 Thread Will Deacon
On Tue, Jan 12, 2016 at 11:40:12AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 12, 2016 at 11:25:55AM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 12, 2016 at 10:27:11AM +0100, Peter Zijlstra wrote:
> > > 2) the changelog _completely_ fails to explain the sync 0x11 and sync
> > > 0x12 semantics nor does it provide a publicly accessible link to
> > > documentation that does.
> > 
> > Ralf pointed me at: https://imgtec.com/mips/architectures/mips64/
> > 
> > > 3) it really should have explained what you did with
> > > smp_llsc_mb/smp_mb__before_llsc() in _detail_.
> > 
> > And reading the MIPS64 v6.04 instruction set manual, I think 0x11/0x12
> > are _NOT_ transitive and therefore cannot be used to implement the
> > smp_mb__{before,after} stuff.
> > 
> > That is, in MIPS speak, those SYNC types are Ordering Barriers, not
> > Completion Barriers. They need not be globally performed.
> 
> Which if true; and I know Will has some questions here; would also mean
> that you 'cannot' use the ACQUIRE/RELEASE barriers for your locks as was
> recently suggested by David Daney.

The issue I have with the SYNC description in the text above is that it
describes the single CPU (program order) and the dual-CPU (confusingly
named global order) cases, but then doesn't generalise any further. That
means we can't sensibly reason about transitivity properties when a third
agent is involved. For example, the WRC+sync+addr test:


P0:
Wx = 1

P1:
Rx == 1
SYNC
Wy = 1

P2:
Ry == 1

Rx = 0


I can't find anything to forbid that, given the text. The main problem
is having the SYNC on P1 affect the write by P0.

> That is, currently all architectures -- with exception of PPC -- have
> RCsc locks, but using these non-transitive things will get you RCpc
> locks.
> 
> So yes, MIPS can go RCpc for its locks and share the burden of pain with
> PPC, but that needs to be a very concious decision.

I think it's much worse than RCpc, given my interpretation of the wording.

Will

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v11 4/5] arm64: introduce CONFIG_PARAVIRT, PARAVIRT_TIME_ACCOUNTING and pv_time_ops

2015-11-17 Thread Will Deacon
On Tue, Nov 17, 2015 at 05:34:36PM +, Marc Zyngier wrote:
> On 17/11/15 17:29, Will Deacon wrote:
> > On Tue, Nov 10, 2015 at 02:11:38PM +, Stefano Stabellini wrote:
> >> On Thu, 5 Nov 2015, Stefano Stabellini wrote:
> >>> Introduce CONFIG_PARAVIRT and PARAVIRT_TIME_ACCOUNTING on ARM64.
> >>> Necessary duplication of paravirt.h and paravirt.c with ARM.
> >>>
> >>> The only paravirt interface supported is pv_time_ops.steal_clock, so no
> >>> runtime pvops patching needed.
> >>>
> >>> This allows us to make use of steal_account_process_tick for stolen
> >>> ticks accounting.
> >>>
> >>> Signed-off-by: Stefano Stabellini <stefano.stabell...@eu.citrix.com>
> >>> Acked-by: Marc Zyngier <marc.zyng...@arm.com>
> >>
> >> Ping?
> >>
> >> Catalin, Will,
> >> are you happy with this change?
> > 
> > I'm happy if Marc's happy. Marc?
> 
> My Ack is already on the tin! ;-)

Ah yes, I only saw the cc line. In which case, I assume Stefano will
take this via the xen tree.

Will

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v11 4/5] arm64: introduce CONFIG_PARAVIRT, PARAVIRT_TIME_ACCOUNTING and pv_time_ops

2015-11-17 Thread Will Deacon
On Tue, Nov 10, 2015 at 02:11:38PM +, Stefano Stabellini wrote:
> On Thu, 5 Nov 2015, Stefano Stabellini wrote:
> > Introduce CONFIG_PARAVIRT and PARAVIRT_TIME_ACCOUNTING on ARM64.
> > Necessary duplication of paravirt.h and paravirt.c with ARM.
> > 
> > The only paravirt interface supported is pv_time_ops.steal_clock, so no
> > runtime pvops patching needed.
> > 
> > This allows us to make use of steal_account_process_tick for stolen
> > ticks accounting.
> > 
> > Signed-off-by: Stefano Stabellini 
> > Acked-by: Marc Zyngier 
> 
> Ping?
> 
> Catalin, Will,
> are you happy with this change?

I'm happy if Marc's happy. Marc?

Will

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] KVM: arm64: add workaround for Cortex-A57 erratum #852523

2015-09-14 Thread Will Deacon
Hi Ian,

On Mon, Sep 14, 2015 at 04:36:28PM +0100, Ian Campbell wrote:
> On Mon, 2015-09-14 at 16:06 +0100, Will Deacon wrote:
> > When restoring the system register state for an AArch32 guest at EL2,
> > writes to DACR32_EL2 may not be correctly synchronised by Cortex-A57,
> > which can lead to the guest effectively running with junk in the DACR
> > and running into unexpected domain faults.
> 
> Thanks for the CC, dropping down to just the Xen folks/list and you guys.
> 
> The errata doc I've got doesn't yet cover this, so I've a few questions.

It should be updated in the next few days, but I wanted to get this out
ASAP since it's quite easy to hit under KVM (particularly with the new
domain-based PAN implementation for arch/arm/).

> > This patch works around the issue by re-ordering our restoration of the
> > AArch32 register aliases so that they happen before the AArch64 system
> > registers. Ensuring that the registers are restored in this order
> > guarantees that they will be correctly synchronised by the core.
> 
> Is it required that the AArch32 aliases are all restored strictly before
> the AArch64 sysregs, or just that at least one sysreg is restored after
> DACR32_EL2 (or a specific one?)?

Take your pick from: SCTLR_EL1, TCR_EL1, TTBR0_EL1, TTBR1_EL1, or
CONTEXTIDR_EL1. Writing any of those after DACR32_EL2 will avoid the
erratum.

> The Xen ctxt switch code[0] has DACR_EL2 in the midst of it all, and
> certainly followed by some sysregs, which I've got my fingers crossed
> happens to be sufficient (other than maybe adding a comment).

It looks like you restore CONTEXTIDR_EL1 fairly late, so you should be
ok.

Will

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH] KVM: arm64: add workaround for Cortex-A57 erratum #852523

2015-09-14 Thread Will Deacon
When restoring the system register state for an AArch32 guest at EL2,
writes to DACR32_EL2 may not be correctly synchronised by Cortex-A57,
which can lead to the guest effectively running with junk in the DACR
and running into unexpected domain faults.

This patch works around the issue by re-ordering our restoration of the
AArch32 register aliases so that they happen before the AArch64 system
registers. Ensuring that the registers are restored in this order
guarantees that they will be correctly synchronised by the core.

Cc: <sta...@vger.kernel.org>
Cc: Marc Zyngier <marc.zyng...@arm.com>
Signed-off-by: Will Deacon <will.dea...@arm.com>
---
 arch/arm64/kvm/hyp.S | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index 3c4f641451bb..c4016d411f4a 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -739,6 +739,9 @@ ENTRY(__kvm_vcpu_run)
// Guest context
add x2, x0, #VCPU_CONTEXT
 
+   // We must restore the 32-bit state before the sysregs, thanks
+   // to Cortex-A57 erratum #852523.
+   restore_guest_32bit_state
bl __restore_sysregs
 
skip_debug_state x3, 1f
@@ -746,7 +749,6 @@ ENTRY(__kvm_vcpu_run)
kern_hyp_va x3
bl  __restore_debug
 1:
-   restore_guest_32bit_state
restore_guest_regs
 
// That's it, no more messing around.
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] KVM: arm64: add workaround for Cortex-A57 erratum #852523

2015-09-14 Thread Will Deacon
On Mon, Sep 14, 2015 at 04:46:28PM +0100, Marc Zyngier wrote:
> On 14/09/15 16:06, Will Deacon wrote:
> > When restoring the system register state for an AArch32 guest at EL2,
> > writes to DACR32_EL2 may not be correctly synchronised by Cortex-A57,
> > which can lead to the guest effectively running with junk in the DACR
> > and running into unexpected domain faults.
> > 
> > This patch works around the issue by re-ordering our restoration of the
> > AArch32 register aliases so that they happen before the AArch64 system
> > registers. Ensuring that the registers are restored in this order
> > guarantees that they will be correctly synchronised by the core.
> > 
> > Cc: <sta...@vger.kernel.org>
> > Cc: Marc Zyngier <marc.zyng...@arm.com>
> > Signed-off-by: Will Deacon <will.dea...@arm.com>
> 
> Reviewed-by: Marc Zyngier <marc.zyng...@arm.com>
> 
> I'll queue that together with the next batch of fixes.

Great, thanks Marc.

Will

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 0/3] arm/arm64: Detect Xen support earlier

2015-02-25 Thread Will Deacon
On Wed, Feb 25, 2015 at 04:40:40PM +, Ian Campbell wrote:
 On Wed, 2015-02-25 at 16:34 +, Will Deacon wrote:
  On Wed, Feb 25, 2015 at 04:22:33PM +, Julien Grall wrote:
   Hello,
   
   Ping? Any comments from ARM's maintainers on theses patches (at least #2)?
  
  I couldn't care less :)
  
  The arm64 part is boring in the good sense of the word.
 
 May Julien take that as an Acked-by? ;-)

For the two lines under arch/arm64, sure!

 Given this is all about Xen really I suppose it makes most sense to take
 it via the Xen tree.

Fine by me.

Will

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 0/3] arm/arm64: Detect Xen support earlier

2015-02-25 Thread Will Deacon
On Wed, Feb 25, 2015 at 04:22:33PM +, Julien Grall wrote:
 Hello,
 
 Ping? Any comments from ARM's maintainers on theses patches (at least #2)?

I couldn't care less :)

The arm64 part is boring in the good sense of the word.

Will

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] linux-next: manual merge of the xen-tip tree with the arm-soc tree

2014-12-08 Thread Will Deacon
On Mon, Dec 08, 2014 at 07:49:08AM +, Stephen Rothwell wrote:
 Hi all,
 
 Today's linux-next merge of the xen-tip tree got a conflict in
 arch/arm/include/asm/dma-mapping.h between commits a3a60f81ee6f
 (dma-mapping: replace set_arch_dma_coherent_ops with
 arch_setup_dma_ops) and 4bb25789ed28 (arm: dma-mapping: plumb our
 iommu mapping ops into arch_setup_dma_ops) from the arm-soc tree and
 commit 3d5391ac6f5e (arm: introduce is_device_dma_coherent) from the
 xen-tip tree.
 
 I fixed it up (see below) and can carry the fix as necessary (no action
 is required).
 
 I also neede this merge fix patch:
 
 From: Stephen Rothwell s...@canb.auug.org.au
 Date: Mon, 8 Dec 2014 18:46:59 +1100
 Subject: [PATCH] arm: introduce is_device_dma_coherent merge fix
 
 Signed-off-by: Stephen Rothwell s...@canb.auug.org.au
 ---
  arch/arm/mm/dma-mapping.c | 1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
 index 09645f00bd17..43064cbe58f9 100644
 --- a/arch/arm/mm/dma-mapping.c
 +++ b/arch/arm/mm/dma-mapping.c
 @@ -2058,6 +2058,7 @@ void arch_setup_dma_ops(struct device *dev, u64 
 dma_base, u64 size,
   else
   dma_ops = arm_get_dma_map_ops(coherent);
  
 + dev-archdata.dma_coherent = coherent;
   set_dma_ops(dev, dma_ops);
  }

Looks good to me.

Cheers,

Will

 -- 
 2.1.3
 
 -- 
 Cheers,
 Stephen Rothwells...@canb.auug.org.au
 
 diff --cc arch/arm/include/asm/dma-mapping.h
 index 9410b7e548fc,e6e3446abdf6..
 --- a/arch/arm/include/asm/dma-mapping.h
 +++ b/arch/arm/include/asm/dma-mapping.h
 @@@ -121,13 -121,20 +121,19 @@@ static inline unsigned long dma_max_pfn
   }
   #define dma_max_pfn(dev) dma_max_pfn(dev)
   
  -static inline int set_arch_dma_coherent_ops(struct device *dev)
  -{
  -dev-archdata.dma_coherent = true;
  -set_dma_ops(dev, arm_coherent_dma_ops);
  -return 0;
  -}
  -#define set_arch_dma_coherent_ops(dev)  set_arch_dma_coherent_ops(dev)
  +#define arch_setup_dma_ops arch_setup_dma_ops
  +extern void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
  +   struct iommu_ops *iommu, bool coherent);
  +
  +#define arch_teardown_dma_ops arch_teardown_dma_ops
  +extern void arch_teardown_dma_ops(struct device *dev);
   
 + /* do not use this function in a driver */
 + static inline bool is_device_dma_coherent(struct device *dev)
 + {
 + return dev-archdata.dma_coherent;
 + }
 + 
   static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
   {
   unsigned int offset = paddr  ~PAGE_MASK;



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel