Re: [PATCH] x86/asm/64: Align start of __clear_user() loop to 16-bytes

2020-06-18 Thread Alexey Dobriyan
On Thu, Jun 18, 2020 at 04:39:35PM +, David Laight wrote:
> From: Alexey Dobriyan
> > Sent: 18 June 2020 14:17
> ...
> > > > diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
> > > > index fff28c6f73a2..b0dfac3d3df7 100644
> > > > --- a/arch/x86/lib/usercopy_64.c
> > > >

RE: [PATCH] x86/asm/64: Align start of __clear_user() loop to 16-bytes

2020-06-18 Thread David Laight
From: Alexey Dobriyan
> Sent: 18 June 2020 14:17
...
> > > diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
> > > index fff28c6f73a2..b0dfac3d3df7 100644
> > > --- a/arch/x86/lib/usercopy_64.c
> > > +++ b/arch/x86/lib/usercopy_64.c
> > > @@ -24,6 +24,7 @@ unsigned long

Re: [PATCH] x86/asm/64: Align start of __clear_user() loop to 16-bytes

2020-06-18 Thread Alexey Dobriyan
On Thu, Jun 18, 2020 at 10:48:05AM +, David Laight wrote:
> From: Matt Fleming
> > Sent: 18 June 2020 11:20
> > x86 CPUs can suffer severe performance drops if a tight loop, such as
> > the ones in __clear_user(), straddles a 16-byte instruction fetch
> > window, or worse, a 64-byte cacheline.

RE: [PATCH] x86/asm/64: Align start of __clear_user() loop to 16-bytes

2020-06-18 Thread David Laight
From: Matt Fleming
> Sent: 18 June 2020 11:20
> x86 CPUs can suffer severe performance drops if a tight loop, such as
> the ones in __clear_user(), straddles a 16-byte instruction fetch
> window, or worse, a 64-byte cacheline. This issue was discovered in the
> SUSE kernel with the following

[PATCH] x86/asm/64: Align start of __clear_user() loop to 16-bytes

2020-06-18 Thread Matt Fleming
x86 CPUs can suffer severe performance drops if a tight loop, such as the
ones in __clear_user(), straddles a 16-byte instruction fetch window, or
worse, a 64-byte cacheline. This issue was discovered in the SUSE kernel
with the following commit, 1153933703d9 ("x86/asm/64: Micro-optimize