Re: [PATCH 1/1] futex: remove duplicated code

2017-05-15 Thread Will Deacon
Hi Jiri,

On Mon, May 15, 2017 at 03:07:42PM +0200, Jiri Slaby wrote:
> There is code duplicated over all architecture's headers for
> futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr,
> and comparison of the result.
> 
> Remove this duplication and leave up to the arches only the needed
> assembly which is now in arch_futex_atomic_op_inuser.
> 
> Note that s390 removed access_ok check in d12a29703 ("s390/uaccess:
> remove pointless access_ok() checks") as access_ok there returns true.
> We introduce it back to the helper for the sake of simplicity (it gets
> optimized away anyway).

Whilst I think this is a good idea, the code in question actually results
in undefined behaviour per the C spec and is reported by UBSAN. See my
patch fixing arm64 here (which I'd forgotten about):

https://www.spinics.net/lists/linux-arch/msg38564.html

But, as stated in the thread above, I think we should go a step further
and remove FUTEX_OP_{OR,ANDN,XOR,OPARG_SHIFT} altogether. They don't
appear to be used by userspace, and this whole thing is a total mess.
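
For anyone following along, the encoded_op layout comes from
include/uapi/linux/futex.h, which packs four fields into 32 bits:

  #define FUTEX_OP(op, oparg, cmp, cmparg) \
    (((op & 0xf) << 28) | ((cmp & 0xf) << 24) | \
     ((oparg & 0xfff) << 12) | (cmparg & 0xfff))

With FUTEX_OP_OPARG_SHIFT set in the op field, the per-arch helpers compute
1 << oparg, and since oparg is a 12-bit field under userspace control, a
shift of 32 or more on a 32-bit int -- undefined behaviour -- is reachable
straight from the syscall.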

Any thoughts?

Will



Re: [PATCH 1/1] futex: remove duplicated code and fix UB

2017-06-26 Thread Will Deacon
On Mon, Jun 26, 2017 at 02:02:31PM +0200, Jiri Slaby wrote:
> On 06/23/2017, 09:51 AM, Thomas Gleixner wrote:
> > On Wed, 21 Jun 2017, Jiri Slaby wrote:
> >> diff --git a/arch/arm64/include/asm/futex.h b/arch/arm64/include/asm/futex.h
> >> index f32b42e8725d..5bb2fd4674e7 100644
> >> --- a/arch/arm64/include/asm/futex.h
> >> +++ b/arch/arm64/include/asm/futex.h
> >> @@ -48,20 +48,10 @@ do {	\
> >>  } while (0)
> >>  
> >>  static inline int
> >> -futex_atomic_op_inuser(unsigned int encoded_op, u32 __user *uaddr)
> > 
> > That unsigned int seems to be a change from the arm64 tree in next. It's
> > not upstream and it'll cause an (easy to resolve) conflict.
> 
> Ugh, I thought the arm64 fix was upstream already. Note that this patch
> just takes what is in that arm64 fix and makes it effective for all
> architectures. So I will wait with v2 until it merges upstream.
> 
> So, Will, will you incorporate Thomas' comments into your arm64 fix?

I wasn't planning to (it's already queued and I think they're just cosmetic
changes). The easiest thing is probably for you to make the changes in the
generic version when you post v2.

Will



Re: [PATCH 1/1] futex: remove duplicated code

2017-05-18 Thread Will Deacon
On Wed, May 17, 2017 at 10:01:29AM +0200, Jiri Slaby wrote:
> On 05/15/2017, 03:16 PM, Will Deacon wrote:
> > Whilst I think this is a good idea, the code in question actually results
> > in undefined behaviour per the C spec and is reported by UBSAN.
> 
> Hi, yes, I know -- this patch was the 1st of a series of 3 which I
> sent a long time ago to fix that up too. But I remember your patch, so I
> sent only this one this time.
> 
> > See my
> > patch fixing arm64 here (which I'd forgotten about):
> > 
> > https://www.spinics.net/lists/linux-arch/msg38564.html
> > 
> > But, as stated in the thread above, I think we should go a step further
> > and remove FUTEX_OP_{OR,ANDN,XOR,OPARG_SHIFT} altogether. They don't
> > appear to be used by userspace, and this whole thing is a total mess.
> > 
> > Any thoughts?
> 
> Ok, I am all for that. I think the only question is who is going to do
> the work and submit it :)? Do I understand correctly that we'd eliminate all
> these functions and the path into the kernel? But won't this break the API
> -- are there really no users of this interface?

That's the million-dollar question, really. I don't know of any code using
it, and I couldn't find any when I looked (also nothing reported by Debian
Codesearch afaict), but I was hoping linux-arch might have some thoughts
on this too.

For now, I'll queue my arm64 patch before I forget about it again!

Will



Re: [PATCH 1/1] futex: remove duplicated code

2017-05-25 Thread Will Deacon
On Mon, May 22, 2017 at 11:11:33PM +0200, Thomas Gleixner wrote:
> On Mon, 15 May 2017, Will Deacon wrote:
> > On Mon, May 15, 2017 at 03:07:42PM +0200, Jiri Slaby wrote:
> > > There is code duplicated over all architecture's headers for
> > > futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr,
> > > and comparison of the result.
> > > 
> > > Remove this duplication and leave up to the arches only the needed
> > > assembly which is now in arch_futex_atomic_op_inuser.
> > > 
> > > Note that s390 removed access_ok check in d12a29703 ("s390/uaccess:
> > > remove pointless access_ok() checks") as access_ok there returns true.
> > > We introduce it back to the helper for the sake of simplicity (it gets
> > > optimized away anyway).
> > 
> > Whilst I think this is a good idea, the code in question actually results
> > in undefined behaviour per the C spec and is reported by UBSAN. See my
> > patch fixing arm64 here (which I'd forgotten about):
> > 
> > https://www.spinics.net/lists/linux-arch/msg38564.html
> > 
> > But, as stated in the thread above, I think we should go a step further
> > and remove FUTEX_OP_{OR,ANDN,XOR,OPARG_SHIFT} altogether. They don't
> > appear to be used by userspace, and this whole thing is a total mess.
> 
> You wish. The constants are not used, but FUTEX_WAKE_OP _IS_ used by
> glibc. They seem to use only one combination:
> 
>#define FUTEX_OP_CLEAR_WAKE_IF_GT_ONE  ((4 << 24) | 1)
> 
> but I'm pretty sure that there is enough (probably horrible) code (think
> java) out there using FUTEX_WAKE_OP for whatever (non)sensical reasons in
> any available combination.

Indeed, and I'm not proposing to get rid of that. It's the grossly
over-engineered array of operations and the FUTEX_OP_OPARG_SHIFT modifier
that I think we should kill. The latter likely behaves differently across
different architectures and potentially depending on the toolchain you used
to build the kernel.
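
(For the record, decoding that single glibc constant per the FUTEX_OP()
packing gives op = FUTEX_OP_SET with oparg = 0 and cmp = FUTEX_OP_CMP_GT
with cmparg = 1, i.e. "store 0 into the second futex word and additionally
wake its waiters if the old value was greater than one" -- no shift
modifier and none of the OR/ANDN/XOR ops:

  #define FUTEX_OP_CLEAR_WAKE_IF_GT_ONE  ((4 << 24) | 1)
  /* == FUTEX_OP(FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1) */
)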

Does anybody know the history behind the interface design?

Will



Re: [PATCH v2 1/1] futex: remove duplicated code and fix UB

2017-08-24 Thread Will Deacon
Hi Jiri,

On Thu, Aug 24, 2017 at 09:31:05AM +0200, Jiri Slaby wrote:
> There is code duplicated over all architecture's headers for
> futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr,
> and comparison of the result.
> 
> Remove this duplication and leave up to the arches only the needed
> assembly which is now in arch_futex_atomic_op_inuser.
> 
> This effectively distributes Will Deacon's arm64 fix for undefined
> behaviour reported by UBSAN to all architectures. The fix was done in
> commit 5f16a046f8e1 (arm64: futex: Fix undefined behaviour with
> FUTEX_OP_OPARG_SHIFT usage). Look there for an example dump.
> 
> And as suggested by Thomas, check for negative oparg too, because it was
> also reported to trigger an undefined behaviour report.
> 
> Note that s390 removed access_ok check in d12a29703 ("s390/uaccess:
> remove pointless access_ok() checks") as access_ok there returns true.
> We introduce it back to the helper for the sake of simplicity (it gets
> optimized away anyway).

For arm64 and the core code:

Reviewed-by: Will Deacon <will.dea...@arm.com>

Although one minor thing on the core part:

> diff --git a/kernel/futex.c b/kernel/futex.c
> index 0939255fc750..3d38eaf05492 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -1551,6 +1551,45 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
>   return ret;
>  }
>  
> +static int futex_atomic_op_inuser(unsigned int encoded_op, u32 __user *uaddr)
> +{
> +	unsigned int op =	  (encoded_op & 0x70000000) >> 28;
> +	unsigned int cmp =	  (encoded_op & 0x0f000000) >> 24;
> +	int oparg = sign_extend32((encoded_op & 0x00fff000) >> 12, 12);
> +	int cmparg = sign_extend32(encoded_op & 0x00000fff, 12);
> +	int oldval, ret;
> +
> +	if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) {
> +		if (oparg < 0 || oparg > 31)
> +			return -EINVAL;
> +		oparg = 1 << oparg;
> +	}
> +
> +	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
> +		return -EFAULT;
> +
> +	ret = arch_futex_atomic_op_inuser(op, oparg, &oldval, uaddr);
> +	if (ret)
> +		return ret;

We could move the pagefault_{disable,enable} calls here, and then remove
them from the futex_atomic_op_inuser callsites elsewhere in futex.c
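
Something like this (just a sketch on top of your patch):

	pagefault_disable();
	ret = arch_futex_atomic_op_inuser(op, oparg, &oldval, uaddr);
	pagefault_enable();
	if (ret)
		return ret;

so the callers don't each have to wrap the call themselves.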

Will



Re: [PATCH v2 6/9] kbuild: consolidate Devicetree dtb build rules

2018-09-06 Thread Will Deacon
On Wed, Sep 05, 2018 at 06:53:24PM -0500, Rob Herring wrote:
> There is nothing arch specific about building dtb files other than their
> location under /arch/*/boot/dts/. Keeping each arch aligned is a pain.
> The dependencies and supported targets are all slightly different.
> Also, a cross-compiler for each arch is needed, but really the host
> compiler preprocessor is perfectly fine for building dtbs. Move the
> build rules to a common location and remove the arch specific ones. This
> is done in a single step to avoid warnings about overriding rules.
> 
> The build dependencies had been a mixture of 'scripts' and/or 'prepare'.
> These pull in several dependencies some of which need a target compiler
> (specifically devicetable-offsets.h) and aren't needed to build dtbs.
> All that is really needed is dtc, so adjust the dependencies to only be
> dtc.
> 
> This change enables support for 'dtbs_install' on some arches which were
> missing the target.
> 
> Cc: Masahiro Yamada 
> Cc: Michal Marek 
> Cc: Vineet Gupta 
> Cc: Russell King 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Yoshinori Sato 
> Cc: Michal Simek 
> Cc: Ralf Baechle 
> Cc: Paul Burton 
> Cc: James Hogan 
> Cc: Ley Foon Tan 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Chris Zankel 
> Cc: Max Filippov 
> Cc: linux-kbu...@vger.kernel.org
> Cc: linux-snps-arc@lists.infradead.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: uclinux-h8-de...@lists.sourceforge.jp
> Cc: linux-m...@linux-mips.org
> Cc: nios2-...@lists.rocketboards.org
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: linux-xte...@linux-xtensa.org
> Signed-off-by: Rob Herring 
> ---
> Please ack so I can take the whole series via the DT tree.

For arm64:

Acked-by: Will Deacon 

Will



Re: Patch "asm-generic/bitops/lock.h: Rewrite using atomic_fetch_" causes kernel crash

2018-08-31 Thread Will Deacon
On Fri, Aug 31, 2018 at 12:30:50AM +, Vineet Gupta wrote:
> On 08/30/2018 01:45 PM, Peter Zijlstra wrote:
> >>
> >> Indeed this is the mother of all issues; I tried, and the system is
> >> clearly hosed with it and works after. What's amazing is that the
> >> commit which introduced it, 4aef66c8ae9, is from 2016 ;-)
> >> Back then we had a retry branch with backoff stuff which I'd reverted
> >> for new cores, and the merge conflict somehow missed it.
> >>
> >> @PeterZ, shall I create a patch with you as author? Do I need any
> >> formal sign-offs, acks etc.?
> > Well, Will spotted it, give authorship to him, you have my ack per the
> > above.
> 
> Oops, sorry for the mixup. I have him as author now and pushed to ARC for-curr
> (will trickle into linux-next eventually).

Okey doke: you can have my Signed-off-by for all the diffs I sent. Just
stick something like [vineet: wrote commit message] in there so I don't
read it in the future and confuse myself.

Will



Re: Patch "asm-generic/bitops/lock.h: Rewrite using atomic_fetch_" causes kernel crash

2018-08-30 Thread Will Deacon
On Wed, Aug 29, 2018 at 09:16:43PM +, Vineet Gupta wrote:
> On 08/29/2018 11:33 AM, Eugeniy Paltsev wrote:
> > Hi Guys,
> > Since v4.19-rc1 we are getting a serious regression on platforms with the
> > ARC architecture. The kernel has become unstable and spontaneously crashes
> > during LTP test runs / IO tests, or even on boot.
> >
> > I don't know exactly what breaks, but bisect clearly assigns the blame to
> > this commit:
> > 84c6591103db ("locking/atomics, asm-generic/bitops/lock.h: Rewrite using
> > atomic_fetch_*()")
> > https://github.com/torvalds/linux/commit/84c6591103dbeaf393a092a3fc7b09510825f6b9
> >
> > Reverting the commit solves this problem.
> >
> > I tested v4.19-rc1 on ARM (Wandboard, i.MX6, 32-bit, quad core, ARMv7),
> > which uses the same generic bitops implementation, and it works fine.
> >
> > Do you have any ideas what went wrong?
> 
> Back in 2016, Peter had fixed this file due to a problem I reported on ARC.
> See commit f75d48644c56a ("bitops: Do not default to __clear_bit() for
> __clear_bit_unlock()").
> That made __clear_bit_unlock() use the atomic clear_bit() vs. the non-atomic
> __clear_bit(), effectively making clear_bit_unlock() and __clear_bit_unlock()
> the same.
> 
> This patch undoes that, which could explain the issues you see. @Peter, @Will?

/me grabs arc toolchain (incidentally, make.cross fuzzy matches "arc" to
"sparc", so that was fun for a few minutes).

I'll take a look today, thanks for the report.

Will



Re: Patch "asm-generic/bitops/lock.h: Rewrite using atomic_fetch_" causes kernel crash

2018-08-30 Thread Will Deacon
On Thu, Aug 30, 2018 at 11:44:11AM +0200, Peter Zijlstra wrote:
> On Wed, Aug 29, 2018 at 09:16:43PM +, Vineet Gupta wrote:
> > On 08/29/2018 11:33 AM, Eugeniy Paltsev wrote:
> > > Hi Guys,
> > > Since v4.19-rc1 we are getting a serious regression on platforms with
> > > the ARC architecture. The kernel has become unstable and spontaneously
> > > crashes during LTP test runs / IO tests, or even on boot.
> > >
> > > I don't know exactly what breaks, but bisect clearly assigns the blame
> > > to this commit:
> > > 84c6591103db ("locking/atomics, asm-generic/bitops/lock.h: Rewrite using
> > > atomic_fetch_*()")
> > > https://github.com/torvalds/linux/commit/84c6591103dbeaf393a092a3fc7b09510825f6b9
> > >
> > > Reverting the commit solves this problem.
> > >
> > > I tested v4.19-rc1 on ARM (Wandboard, i.MX6, 32-bit, quad core, ARMv7),
> > > which uses the same generic bitops implementation, and it works fine.
> > >
> > > Do you have any ideas what went wrong?
> > 
> > Back in 2016, Peter had fixed this file due to a problem I reported on
> > ARC. See commit f75d48644c56a ("bitops: Do not default to __clear_bit()
> > for __clear_bit_unlock()").
> > That made __clear_bit_unlock() use the atomic clear_bit() vs. the
> > non-atomic __clear_bit(), effectively making clear_bit_unlock() and
> > __clear_bit_unlock() the same.
> > 
> > This patch undoes that, which could explain the issues you see. @Peter,
> > @Will?
> 
> Right, so the thinking is that on platforms that suffer that issue,
> atomic_set*() should DTRT. And if you look at your spinlock based atomic
> implementation, you'll note that atomic_set() does indeed do the right
> thing.
> 
> arch/arc/include/asm/atomic.h:108

Yeah, the bit_spin_lock()/__bit_spin_unlock() race described in f75d48644c56a
boils down to concurrent atomic_long_set_release() vs
atomic_long_fetch_or_acquire(), which really needs to work.
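
For reference, the !LLSC atomic_set() Peter is pointing at takes the same
hash lock as the RMW atomics, which is what makes the set vs. fetch_or
pairing safe there. Roughly (a sketch of the arch/arc code):

	#ifndef CONFIG_ARC_HAS_LLSC
	static inline void atomic_set(atomic_t *v, int i)
	{
		unsigned long flags;

		atomic_ops_lock(flags);	/* same lock as atomic_fetch_or() */
		WRITE_ONCE(v->counter, i);
		atomic_ops_unlock(flags);
	}
	#endif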

I'll keep digging. In the meantime, Vineet, do you have any useful crash
logs and do you only see the crashes in certain configurations (e.g. SMP but
!CONFIG_ARC_HAS_LLSC)?

Will



Re: [PATCH] atomic{64}_t: Explicitly specify data storage length and alignment

2018-07-09 Thread Will Deacon
On Mon, Jul 09, 2018 at 03:47:41PM +0300, Alexey Brodkin wrote:
> Atomic instructions require data they operate on to be aligned
> according to data size. I.e. 32-bit atomic values must be 32-bit
> aligned while 64-bit values must be 64-bit aligned.
> 
> Otherwise, even if the CPU can handle unaligned normal data accesses,
> atomic instructions still fail and typically raise an exception,
> leaving us dead in the water.
> 
> This came up during a lengthy discussion here:
> http://lists.infradead.org/pipermail/linux-snps-arc/2018-July/004022.html
> 
> Signed-off-by: Alexey Brodkin 
> Cc: Will Deacon 
> Cc: Peter Zijlstra 
> Cc: Boqun Feng 
> Cc: Russell King 
> Cc: Arnd Bergmann 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Darren Hart 
> Cc: Shuah Khan 
> Cc: "Paul E. McKenney" 
> Cc: Josh Triplett 
> Cc: Steven Rostedt 
> Cc: Mathieu Desnoyers 
> Cc: Lai Jiangshan 
> Cc: David Laight 
> Cc: Geert Uytterhoeven 
> Cc: Greg Kroah-Hartman 
> ---
>  arch/arm/include/asm/atomic.h                          | 2 +-
>  include/asm-generic/atomic64.h                         | 2 +-
>  include/linux/types.h                                  | 4 ++--
>  tools/include/linux/types.h                            | 2 +-
>  tools/testing/selftests/futex/include/atomic.h         | 2 +-
>  .../rcutorture/formal/srcu-cbmc/include/linux/types.h  | 4 ++--
>  6 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
> index 66d0e215a773..2ed6d7cf1407 100644
> --- a/arch/arm/include/asm/atomic.h
> +++ b/arch/arm/include/asm/atomic.h
> @@ -267,7 +267,7 @@ ATOMIC_OPS(xor, ^=, eor)
>  
>  #ifndef CONFIG_GENERIC_ATOMIC64
>  typedef struct {
> - long long counter;
> + u64 __aligned(8) counter;
>  } atomic64_t;

Long long is 8-byte aligned per the ARM EABI, and we use the generic atomic64
infrastructure for OABI, so we don't need to change anything here afaict.

Will



Re: [PATCH 2/4] mm: speed up mremap by 500x on large regions (v2)

2018-10-29 Thread Will Deacon
On Fri, Oct 26, 2018 at 02:11:48PM -0700, Joel Fernandes wrote:
> My thinking is to take it slow and get the patch in in its current state,
> since it improves x86. Then as a next step, look into why the arm64 tlb
> flushes are that expensive and look into optimizing that. On arm64 I am
> testing on a 4.9 kernel, so I'm wondering if there are any optimizations
> since 4.9 that can help speed it up there. After that, if all else fails in
> speeding up arm64, I'll look into developing the cleanest possible solution
> where we can keep the lock held for longer and flush less.

We rewrote a good chunk of the arm64 TLB invalidation and core mmu_gather
code this merge window, so please do have another look at -rc1!

Will



Re: [PATCH v4 4/6] arm64: Utilize phys_initrd_start/phys_initrd_size

2018-11-15 Thread Will Deacon
On Mon, Nov 05, 2018 at 02:54:29PM -0800, Florian Fainelli wrote:
> ARM64 is the only architecture that re-defines
> __early_init_dt_declare_initrd() in order for that function to populate
> initrd_start/initrd_end with physical addresses instead of virtual
> addresses. Instead of having an override we can leverage
> drivers/of/fdt.c populating phys_initrd_start/phys_initrd_size to
> populate those variables for us.
> 
> Signed-off-by: Florian Fainelli 
> ---
>  arch/arm64/mm/init.c | 20 
>  1 file changed, 8 insertions(+), 12 deletions(-)

Looks ok to me:

Acked-by: Will Deacon 

Will



Re: [PATCH v4 1/4] kgdb: Remove irq flags from roundup

2018-11-14 Thread Will Deacon
On Mon, Nov 12, 2018 at 10:26:55AM -0800, Douglas Anderson wrote:
> The function kgdb_roundup_cpus() was passed a parameter that was
> documented as:
> 
> > the flags that will be used when restoring the interrupts. There is
> > local_irq_save() call before kgdb_roundup_cpus().
> 
> Nobody used those flags.  Anyone who wanted to temporarily turn on
> interrupts just did local_irq_enable() and local_irq_disable() without
> looking at them.  So we can definitely remove the flags.
> 
> Signed-off-by: Douglas Anderson 
> ---

Acked-by: Will Deacon 

I'm hopeful that you'll keep hacking on kgdb, because it definitely needs
some love in its current state.

Will



Re: [PATCH v4 2/4] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

2018-11-14 Thread Will Deacon
On Mon, Nov 12, 2018 at 10:26:56AM -0800, Douglas Anderson wrote:
> When I had lockdep turned on and dropped into kgdb I got a nice splat
> on my system.  Specifically it hit:
>   DEBUG_LOCKS_WARN_ON(current->hardirq_context)
> 
> Specifically it looked like this:
>   sysrq: SysRq : DEBUG
>   [ cut here ]
>   DEBUG_LOCKS_WARN_ON(current->hardirq_context)
>   WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 lockdep_hardirqs_on+0xf0/0x160
>   CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
>   pstate: 604003c9 (nZCv DAIF +PAN -UAO)
>   pc : lockdep_hardirqs_on+0xf0/0x160
>   ...
>   Call trace:
>lockdep_hardirqs_on+0xf0/0x160
>trace_hardirqs_on+0x188/0x1ac
>kgdb_roundup_cpus+0x14/0x3c
>kgdb_cpu_enter+0x53c/0x5cc
>kgdb_handle_exception+0x180/0x1d4
>kgdb_compiled_brk_fn+0x30/0x3c
>brk_handler+0x134/0x178
>do_debug_exception+0xfc/0x178
>el1_dbg+0x18/0x78
>kgdb_breakpoint+0x34/0x58
>sysrq_handle_dbg+0x54/0x5c
>__handle_sysrq+0x114/0x21c
>handle_sysrq+0x30/0x3c
>qcom_geni_serial_isr+0x2dc/0x30c
>   ...
>   ...
>   irq event stamp: ...45
>   hardirqs last  enabled at (...44): [...] __do_softirq+0xd8/0x4e4
>   hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
>   softirqs last  enabled at (...42): [...] _local_bh_enable+0x2c/0x34
>   softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
>   ---[ end trace adf21f830c46e638 ]---
> 
> Looking closely at it, it seems like a really bad idea to be calling
> local_irq_enable() in kgdb_roundup_cpus().  If nothing else that seems
> like it could violate spinlock semantics and cause a deadlock.
> 
> Instead, let's use a private csd alongside
> smp_call_function_single_async() to round up the other CPUs.  Using
> smp_call_function_single_async() doesn't require interrupts to be
> enabled so we can remove the offending bit of code.
> 
> In order to avoid duplicating this across all the architectures that
> use the default kgdb_roundup_cpus(), we'll add a "weak" implementation
> to debug_core.c.
> 
> Looking at all the people who previously had copies of this code,
> there were a few variants.  I've attempted to keep the variants
> working like they used to.  Specifically:
> * For arch/arc we passed NULL to kgdb_nmicallback() instead of
>   get_irq_regs().
> * For arch/mips there was a bit of extra code around
>   kgdb_nmicallback()
> 
> NOTE: In this patch we will still get into trouble if we try to round
> up a CPU that failed to round up before.  We'll try to round it up
> again and potentially hang when we try to grab the csd lock.  That's
> not new behavior but we'll still try to do better in a future patch.
> 
> Suggested-by: Daniel Thompson 
> Signed-off-by: Douglas Anderson 
> ---
> 
> Changes in v4: None
> Changes in v3:
> - No separate init call.
> - Don't round up the CPU that is doing the rounding up.
> - Add "#ifdef CONFIG_SMP" to match the rest of the file.
> - Updated desc saying we don't solve the "failed to roundup" case.
> - Document the ignored parameter.
> 
> Changes in v2:
> - Removing irq flags separated from fixing lockdep splat.
> - Don't use smp_call_function (Daniel).
> 
>  arch/arc/kernel/kgdb.c | 10 ++
>  arch/arm/kernel/kgdb.c | 12 
>  arch/arm64/kernel/kgdb.c   | 12 
>  arch/hexagon/kernel/kgdb.c | 27 ---
>  arch/mips/kernel/kgdb.c|  9 +
>  arch/powerpc/kernel/kgdb.c |  4 ++--
>  arch/sh/kernel/kgdb.c  | 12 
>  include/linux/kgdb.h   | 15 +--
>  kernel/debug/debug_core.c  | 35 +++
>  9 files changed, 53 insertions(+), 83 deletions(-)

[...]

> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> index f3cadda45f07..23f2b5613afa 100644
> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -55,6 +55,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -220,6 +221,40 @@ int __weak kgdb_skipexception(int exception, struct 
> pt_regs *regs)
>   return 0;
>  }
>  
> +#ifdef CONFIG_SMP
> +
> +/*
> + * Default (weak) implementation for kgdb_roundup_cpus
> + */
> +
> +static DEFINE_PER_CPU(call_single_data_t, kgdb_roundup_csd);
> +
> +void __weak kgdb_call_nmi_hook(void *ignored)
> +{
> + kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
> +}

I suppose you could pass the cpu as an argument, but it doesn't really
matter. Also, I think there are cases where the CSD callback can run without
having received an IPI, so we could potentially end up passing NULL for the
regs here, which probably goes boom.

> +
> +void __weak kgdb_roundup_cpus(void)
> +{
> + call_single_data_t *csd;
> + int this_cpu = get_cpu();

Do you actually need to disable preemption here? afaict, irqs are already
disabled by the kgdb core.

> + int cpu;
> +
> + for_each_cpu(cpu, cpu_online_mask) {

for_each_online_cpu(cpu) ?
I'm assuming 

Re: Patch "asm-generic/bitops/lock.h: Rewrite using atomic_fetch_" causes kernel crash

2018-08-30 Thread Will Deacon
On Thu, Aug 30, 2018 at 04:17:13PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 30, 2018 at 11:53:17AM +, Eugeniy Paltsev wrote:
> > I can see crashes with LLSC enabled in both SMP running on 4 cores
> > and SMP running on 1 core.
> 
> So you're running on LL/SC enabled hardware; that would make Will's
> patch irrelevant (although still a good idea for the hardware that does
> care about that spinlocked atomic crud).

Yeah, that's a good point. I think the !LLSC case is broken without my
patch, so we're looking at two bugs...

> Does something like the below cure things? That would confirm the
> suggestion that the change to __clear_bit_unlock() is the curprit.
> 
> If that doesn't cure things, then we've been looking in entirely the
> wrong place.

Yes, that would be worth trying. However, I also just noticed that the
fetch-ops (which are now used to implement test_and_set_bit_lock()) seem
to be missing the backwards branch in the LL/SC case. Yet another diff
below.

Will

--->8

diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index 4e0072730241..f06c5ed672b3 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -84,7 +84,7 @@ static inline int atomic_fetch_##op(int i, atomic_t *v)	\
 	"1:	llock   %[orig], [%[ctr]]		\n"	\
 	"	" #asm_op " %[val], %[orig], %[i]	\n"	\
 	"	scond   %[val], [%[ctr]]		\n"	\
-	"						\n"	\
+	"	bnz     1b				\n"	\
 	: [val] "=&r"	(val),					\
 	  [orig] "=&r" (orig)					\
 	: [ctr] "r"	(&v->counter),				\



Re: Patch "asm-generic/bitops/lock.h: Rewrite using atomic_fetch_" causes kernel crash

2018-08-30 Thread Will Deacon
Hi Eugeniy,

On Thu, Aug 30, 2018 at 11:53:17AM +, Eugeniy Paltsev wrote:
> On Thu, 2018-08-30 at 10:51 +0100, Will Deacon wrote:
> > On Thu, Aug 30, 2018 at 11:44:11AM +0200, Peter Zijlstra wrote:
> > > On Wed, Aug 29, 2018 at 09:16:43PM +, Vineet Gupta wrote:
> > > > On 08/29/2018 11:33 AM, Eugeniy Paltsev wrote:
> > > > > Hi Guys,
> > > > > Since v4.19-rc1 we are getting a serious regression on platforms
> > > > > with the ARC architecture. The kernel has become unstable and
> > > > > spontaneously crashes during LTP test runs / IO tests, or even on
> > > > > boot.
> > > > >
> > > > > I don't know exactly what breaks, but bisect clearly assigns the
> > > > > blame to this commit:
> > > > > 84c6591103db ("locking/atomics, asm-generic/bitops/lock.h: Rewrite
> > > > > using atomic_fetch_*()")
> > > > > https://github.com/torvalds/linux/commit/84c6591103dbeaf393a092a3fc7b09510825f6b9
> > > > >
> > > > > Reverting the commit solves this problem.
> > > > >
> > > > > I tested v4.19-rc1 on ARM (Wandboard, i.MX6, 32-bit, quad core,
> > > > > ARMv7), which uses the same generic bitops implementation, and it
> > > > > works fine.
> > > > > 
> > > > > Do you have any ideas what went wrong?
> > > > 
> > > > Back in 2016, Peter had fixed this file due to a problem I reported
> > > > on ARC. See commit f75d48644c56a ("bitops: Do not default to
> > > > __clear_bit() for __clear_bit_unlock()").
> > > > That made __clear_bit_unlock() use the atomic clear_bit() vs. the
> > > > non-atomic __clear_bit(), effectively making clear_bit_unlock() and
> > > > __clear_bit_unlock() the same.
> > > > 
> > > > This patch undoes that, which could explain the issues you see.
> > > > @Peter, @Will?
> > > 
> > > Right, so the thinking is that on platforms that suffer that issue,
> > > atomic_set*() should DTRT. And if you look at your spinlock based atomic
> > > implementation, you'll note that atomic_set() does indeed do the right
> > > thing.
> > > 
> > > arch/arc/include/asm/atomic.h:108
> > 
> > Yeah, the bit_spin_lock()/__bit_spin_unlock() race described in 
> > f75d48644c56a
> > boils down to concurrent atomic_long_set_release() vs
> > atomic_long_fetch_or_acquire(), which really needs to work.
> > 
> > I'll keep digging. In the meantime, Vineet, do you have any useful crash
> > logs and do you only see the crashes in certain configurations (e.g. SMP but
> > !CONFIG_ARC_HAS_LLSC)?
> 
> We don't have such a configuration (SMP with !CONFIG_ARC_HAS_LLSC).

Ok, thanks for confirming. If you could put your .config somewhere that
would be helpful, since the build fails for me with defconfig.

> I can see crashes with LLSC enabled in both SMP running on 4 cores
> and SMP running on 1 core.
> 
> 
> There are some crash logs (not sure if they are really useful):
> Crashes are quite spontaneous and mostly happen in IO-related code:

Aha: arc seems to have separate spinlocks for protecting bitops and atomics.
This is a seriously bad idea, because you only implement some of the bitops
API in the architecture and rely on asm-generic to give you bitops/lock.h,
assuming that it's going to be implemented in terms of the other parts of
the bitops API (it isn't anymore).

Here's a quick hack (totally untested, since I can't even build an arc kernel)
which moves arc over to asm-generic for all of the bitops API and ditches
the bitops lock. You could probably also drop the ffs() stuff, but I'll
leave that up to you. The "downside" is serialisation between bitops and
atomics, but I actually think you probably want that for xchg/cmpxchg
anyway.
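
For context, after 84c6591103db the asm-generic lock.h routines sit directly
on top of the atomics; roughly:

	static inline int test_and_set_bit_lock(unsigned int nr,
						volatile unsigned long *p)
	{
		long old;
		unsigned long mask = BIT_MASK(nr);

		p += BIT_WORD(nr);
		if (READ_ONCE(*p) & mask)
			return 1;

		old = atomic_long_fetch_or_acquire(mask, (atomic_long_t *)p);
		return !!(old & mask);
	}

so any bitop the architecture still implements with its own, different lock
no longer serialises against these.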

Will

--->8

diff --git a/arch/arc/include/asm/bitops.h b/arch/arc/include/asm/bitops.h
index 8da87feec59a..50b0d5a56e32 100644
--- a/arch/arc/include/asm/bitops.h
+++ b/arch/arc/include/asm/bitops.h
@@ -17,242 +17,6 @@
 
 #include 
 #include 
-#include 
-#ifndef CONFIG_ARC_HAS_LLSC
-#include 
-#endif
-
-#ifdef CONFIG_ARC_HAS_LLSC
-
-/*
- * Hardware assisted Atomic-R-M-W
- */
-
-#define BIT_OP(op, c_op, asm_op)   \
-static inline void op##_bit(unsigned long nr, vola

Re: [PATCH 3/3] bitops.h: set_mask_bits() to return old value

2019-01-14 Thread Will Deacon
On Thu, Jan 10, 2019 at 04:26:27PM -0800, Vineet Gupta wrote:
> | > Also, set_mask_bits is used in fs quite a bit and we can possibly come up
> | > with a generic llsc based implementation (w/o the cmpxchg loop)
> |
> | May I also suggest changing the return value of set_mask_bits() to old.
> |
> | You can compute the new value given old, but you cannot compute the old
> | value given new, therefore old is the better return value. Also, no
> | current user seems to use the return value, so changing it is without
> | risk.
> 
> Link: http://lkml.kernel.org/g/20150807110955.gh16...@twins.programming.kicks-ass.net
> Suggested-by: Peter Zijlstra 
> Cc: Miklos Szeredi 
> Cc: Ingo Molnar 
> Cc: Jani Nikula 
> Cc: Chris Wilson 
> Cc: Andrew Morton 
> Cc: Will Deacon 
> Signed-off-by: Vineet Gupta 
> ---
>  include/linux/bitops.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 705f7c442691..602af23b98c7 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -246,7 +246,7 @@ static __always_inline void __assign_bit(long nr, volatile unsigned long *addr,
>   new__ = (old__ & ~mask__) | bits__; \
>   } while (cmpxchg(ptr, old__, new__) != old__);  \
>   \
> - new__;      \
> + old__;  \
>  })
>  #endif

Acked-by: Will Deacon 

May also explain why no in-tree users appear to use the return value!
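
(For illustration -- the field name is hypothetical -- a caller that wants
the previous value would now write:

	old = set_mask_bits(&sb->s_flags, MASK, BITS);

and the new value is trivially recoverable as (old & ~MASK) | BITS, whereas
the old value could not be recomputed from the previously returned new one.)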

Will



Re: [PATCH v5 2/4] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

2018-11-27 Thread Will Deacon
On Mon, Nov 26, 2018 at 07:51:31PM -0800, Douglas Anderson wrote:
> When I had lockdep turned on and dropped into kgdb I got a nice splat
> on my system.  Specifically it hit:
>   DEBUG_LOCKS_WARN_ON(current->hardirq_context)
> 
> Specifically it looked like this:
>   sysrq: SysRq : DEBUG
>   [ cut here ]
>   DEBUG_LOCKS_WARN_ON(current->hardirq_context)
>   WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 lockdep_hardirqs_on+0xf0/0x160
>   CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
>   pstate: 604003c9 (nZCv DAIF +PAN -UAO)
>   pc : lockdep_hardirqs_on+0xf0/0x160
>   ...
>   Call trace:
>lockdep_hardirqs_on+0xf0/0x160
>trace_hardirqs_on+0x188/0x1ac
>kgdb_roundup_cpus+0x14/0x3c
>kgdb_cpu_enter+0x53c/0x5cc
>kgdb_handle_exception+0x180/0x1d4
>kgdb_compiled_brk_fn+0x30/0x3c
>brk_handler+0x134/0x178
>do_debug_exception+0xfc/0x178
>el1_dbg+0x18/0x78
>kgdb_breakpoint+0x34/0x58
>sysrq_handle_dbg+0x54/0x5c
>__handle_sysrq+0x114/0x21c
>handle_sysrq+0x30/0x3c
>qcom_geni_serial_isr+0x2dc/0x30c
>   ...
>   ...
>   irq event stamp: ...45
>   hardirqs last  enabled at (...44): [...] __do_softirq+0xd8/0x4e4
>   hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
>   softirqs last  enabled at (...42): [...] _local_bh_enable+0x2c/0x34
>   softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
>   ---[ end trace adf21f830c46e638 ]---
> 
> Looking closely at it, it seems like a really bad idea to be calling
> local_irq_enable() in kgdb_roundup_cpus().  If nothing else that seems
> like it could violate spinlock semantics and cause a deadlock.
> 
> Instead, let's use a private csd alongside
> smp_call_function_single_async() to round up the other CPUs.  Using
> smp_call_function_single_async() doesn't require interrupts to be
> enabled so we can remove the offending bit of code.
> 
> In order to avoid duplicating this across all the architectures that
> use the default kgdb_roundup_cpus(), we'll add a "weak" implementation
> to debug_core.c.
> 
> Looking at all the people who previously had copies of this code,
> there were a few variants.  I've attempted to keep the variants
> working like they used to.  Specifically:
> * For arch/arc we passed NULL to kgdb_nmicallback() instead of
>   get_irq_regs().
> * For arch/mips there was a bit of extra code around
>   kgdb_nmicallback()
> 
> NOTE: In this patch we will still get into trouble if we try to round
> up a CPU that failed to round up before.  We'll try to round it up
> again and potentially hang when we try to grab the csd lock.  That's
> not new behavior but we'll still try to do better in a future patch.
> 
> Suggested-by: Daniel Thompson 
> Signed-off-by: Douglas Anderson 
> ---
> 
> Changes in v5:
> - Add a comment about get_irq_regs().
> - get_cpu() => raw_smp_processor_id() in kgdb_roundup_cpus().
> - for_each_cpu() => for_each_online_cpu()
> - Error check smp_call_function_single_async()
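
(Putting those changes together, the weak implementation presumably ends up
looking roughly like this -- a sketch reconstructed from the changelog
above, so details such as the error message may differ:

	void __weak kgdb_roundup_cpus(void)
	{
		call_single_data_t *csd;
		int this_cpu = raw_smp_processor_id();
		int cpu;
		int ret;

		for_each_online_cpu(cpu) {
			/* No need to round up ourselves */
			if (cpu == this_cpu)
				continue;

			csd = &per_cpu(kgdb_roundup_csd, cpu);
			ret = smp_call_function_single_async(cpu, csd);
			if (ret)
				pr_err("kgdb: unable to round up cpu %d\n", cpu);
		}
	}
)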

For the arm64 bit:

> diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
> index 12c339ff6e75..da880247c734 100644
> --- a/arch/arm64/kernel/kgdb.c
> +++ b/arch/arm64/kernel/kgdb.c
> @@ -284,18 +284,6 @@ static struct step_hook kgdb_step_hook = {
>   .fn = kgdb_step_brk_fn
>  };
>  
> -static void kgdb_call_nmi_hook(void *ignored)
> -{
> - kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
> -}
> -
> -void kgdb_roundup_cpus(void)
> -{
> - local_irq_enable();
> - smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> - local_irq_disable();
> -}
> -

Acked-by: Will Deacon 

Will



Re: single copy atomicity for double load/stores on 32-bit systems

2019-07-02 Thread Will Deacon
On Mon, Jul 01, 2019 at 08:05:51PM +, Vineet Gupta wrote:
> On 5/31/19 1:21 AM, Peter Zijlstra wrote:
> > On Thu, May 30, 2019 at 11:22:42AM -0700, Vineet Gupta wrote:
> >> Had an interesting lunch time discussion with our hardware architects 
> >> pertinent to
> >> "minimal guarantees expected of a CPU" section of memory-barriers.txt
> >>
> >>
> >> |  (*) These guarantees apply only to properly aligned and sized scalar
> >> | variables.  "Properly sized" currently means variables that are
> >> | the same size as "char", "short", "int" and "long".  "Properly
> >> | aligned" means the natural alignment, thus no constraints for
> >> | "char", two-byte alignment for "short", four-byte alignment for
> >> | "int", and either four-byte or eight-byte alignment for "long",
> >> | on 32-bit and 64-bit systems, respectively.
> >>
> >>
> >> I'm not sure how to interpret "natural alignment" for the case of double
> >> load/stores on 32-bit systems where the hardware and ABI allow for 4 byte
> >> alignment (ARCv2 LDD/STD, ARM LDRD/STRD )
> > 
> > Natural alignment: !((uintptr_t)ptr % sizeof(*ptr))
> > 
> > For any u64 type, that would give 8 byte alignment. the problem
> > otherwise being that your data spans two lines/pages etc..
> > 
> >> I presume (and the question) that lkmm doesn't expect such 8 byte 
> >> load/stores to
> >> be atomic unless 8-byte aligned
> >>
> >> ARMv7 arch ref manual seems to confirm this. Quoting
> >>
> >> | LDM, LDC, LDC2, LDRD, STM, STC, STC2, STRD, PUSH, POP, RFE, SRS, VLDM, 
> >> VLDR,
> >> | VSTM, and VSTR instructions are executed as a sequence of word-aligned 
> >> word
> >> | accesses. Each 32-bit word access is guaranteed to be single-copy 
> >> atomic. A
> >> | subsequence of two or more word accesses from the sequence might not 
> >> exhibit
> >> | single-copy atomicity
> >>
> >> While it seems reasonable from a hardware PoV not to implement such
> >> atomicity by default, it puts an additional burden on application
> >> writers. They could be happily using a lockless algorithm with just a
> >> shared flag between 2 threads w/o need for any explicit synchronization.
> > 
> > If you're that careless with lockless code, you deserve all the pain you
> > get.
> > 
> >> But an upgrade to a new compiler which aggressively "packs" structs,
> >> rendering a long long 32-bit aligned (vs. 64-bit before), causes the
> >> code to suddenly stop working. Is the onus on them to declare such
> >> memory as C11 atomic or some such?
> > 
> > When a programmer wants guarantees they already need to know wth they're
> > doing.
> > 
> > And I'll stand by my earlier conviction that any architecture that has a
> > native u64 (be it a 64bit arch or a 32bit with double-width
> > instructions) but has an ABI that allows u32 alignment on them is daft.
> 
> So I agree with Paul's assertion that it is strange for an 8-byte type to
> be 4-byte aligned on a 64-bit system, but is it totally broken even if the
> ISA of said 64-bit arch allows LD/ST to be augmented with acq/rel,
> respectively?
> 
> Say the ISA guarantees single-copy atomicity for aligned cases (i.e. for
> 8-byte data only if it is naturally aligned), and in the absence thereof
> the programmer needs to use the proper acq/release.

Apologies if I'm missing some context here, but it's not clear to me why the
use of acquire/release instructions has anything to do with single-copy
atomicity of unaligned accesses. The ordering they provide doesn't
necessarily prevent tearing, although a CPU architecture could obviously
provide that guarantee if it wanted to. Generally though, I wouldn't expect
the two to go hand-in-hand like you're suggesting.
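
To make the distinction concrete (a sketch): with a u64 that is only 4-byte
aligned, the hardware may split even a release store into two 32-bit
accesses, so the ordering doesn't save a concurrent reader from tearing:

	u64 flag;	/* only 4-byte aligned on the ABI in question */

	/* CPU0 */
	WRITE_ONCE(flag, 0xf0f0f0f0f0f0f0f0ULL);

	/* CPU1 */
	u64 v = READ_ONCE(flag);
	/*
	 * Without single-copy atomicity, v can mix old and new halves,
	 * e.g. 0x00000000f0f0f0f0 -- and upgrading the store to a release
	 * or the load to an acquire changes the ordering, not the tearing.
	 */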

Will



Re: [PATCH 19/26] arm64: remove __iounmap

2019-08-19 Thread Will Deacon
On Sat, Aug 17, 2019 at 09:32:46AM +0200, Christoph Hellwig wrote:
> No need to indirect iounmap for arm64.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/arm64/include/asm/io.h | 3 +--
>  arch/arm64/mm/ioremap.c | 4 ++--
>  2 files changed, 3 insertions(+), 4 deletions(-)

Not sure why we did it like this...

Acked-by: Will Deacon 

Will


Re: [PATCH 19/26] arm64: remove __iounmap

2019-08-31 Thread Will Deacon
Hi Christoph,

On Fri, Aug 30, 2019 at 06:05:15PM +0200, Christoph Hellwig wrote:
> On Mon, Aug 19, 2019 at 08:36:02AM +0100, Will Deacon wrote:
> > On Sat, Aug 17, 2019 at 09:32:46AM +0200, Christoph Hellwig wrote:
> > > No need to indirect iounmap for arm64.
> > > 
> > > Signed-off-by: Christoph Hellwig 
> > > ---
> > >  arch/arm64/include/asm/io.h | 3 +--
> > >  arch/arm64/mm/ioremap.c | 4 ++--
> > >  2 files changed, 3 insertions(+), 4 deletions(-)
> > 
> > Not sure why we did it like this...
> > 
> > Acked-by: Will Deacon 
> 
> Can you just pick this one up through the arm64 tree for 5.4?

Unfortunately, it doesn't apply because the tree you've based it on has
removed ioremap_wt(). If you send a version based on mainline, I can
queue it.

Cheers,

Will



Re: [PATCH 2/2] mm/thp: Rename pmd_mknotpresent() as pmd_mknotvalid()

2020-04-20 Thread Will Deacon
On Fri, Mar 20, 2020 at 10:24:17AM +0530, Anshuman Khandual wrote:
> pmd_present() is expected to test positive after pmdp_mknotpresent() as the
> PMD entry still points to a valid huge page in memory. pmdp_mknotpresent()
> implies that the given PMD entry is just invalidated from the MMU
> perspective while still holding on to the valid huge page referred to by
> pmd_page(), without clearing the pmd_present() test. This creates the
> following situation, which is counter-intuitive.
> 
> [pmd_present(pmd_mknotpresent(pmd)) = true]
> 
> This renames pmd_mknotpresent() as pmd_mknotvalid() reflecting the helper's
> functionality more accurately while changing the above mentioned situation
> as follows. This does not create any functional change.
> 
> [pmd_present(pmd_mknotvalid(pmd)) = true]
> 
> This is not applicable for platforms that define their own pmdp_invalidate()
> via __HAVE_ARCH_PMDP_INVALIDATE. The suggestion for renaming came up during
> a previous discussion here.

Bikeshed alert: maybe pmd_mkinvalid() would be better, given that this is
a one-trick pony for pmdp_invalidate()?

Will



Re: [PATCH 2/2] mm/thp: Rename pmd_mknotpresent() as pmd_mknotvalid()

2020-04-21 Thread Will Deacon
On Tue, Apr 21, 2020 at 04:57:26AM +0530, Anshuman Khandual wrote:
> 
> 
> On 04/21/2020 02:33 AM, Will Deacon wrote:
> > On Fri, Mar 20, 2020 at 10:24:17AM +0530, Anshuman Khandual wrote:
> >> pmd_present() is expected to test positive after pmdp_mknotpresent() as
> >> the PMD entry still points to a valid huge page in memory.
> >> pmdp_mknotpresent() implies that the given PMD entry is just invalidated
> >> from the MMU perspective while still holding on to the valid huge page
> >> referred to by pmd_page(), without clearing the pmd_present() test. This
> >> creates the following situation, which is counter-intuitive.
> >>
> >> [pmd_present(pmd_mknotpresent(pmd)) = true]
> >>
> >> This renames pmd_mknotpresent() as pmd_mknotvalid() reflecting the helper's
> >> functionality more accurately while changing the above mentioned situation
> >> as follows. This does not create any functional change.
> >>
> >> [pmd_present(pmd_mknotvalid(pmd)) = true]
> >>
> >> This is not applicable for platforms that define their own
> >> pmdp_invalidate() via __HAVE_ARCH_PMDP_INVALIDATE. The suggestion for
> >> renaming came up during a previous discussion here.
> > 
> > Bikeshed alert: maybe pmd_mkinvalid() would be better, given that this is
> > a one-trick pony for pmdp_invalidate()?
> 
> I had thought about making it pmd_mkinvalid() earlier. But as we were
> replacing pmd_mknotpresent(), I went with the similar pattern
> pmd_mknotvalid(), which was originally suggested by Catalin. There is an
> existing pte_mknotpresent() in the arc platform as well. I don't have a
> very strong opinion either way and will be happy to rename, but I'm still
> wondering if we really need to.

I just think that having pmdp_invalidate() call pmd_mkinvalid() makes a lot
of sense and, since this is a pure renaming patch, that's worth taking
into consideration.
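
i.e. the generic helper then reads naturally; a sketch of
mm/pgtable-generic.c after the rename:

	pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
			      pmd_t *pmdp)
	{
		pmd_t old = pmdp_establish(vma, address, pmdp,
					   pmd_mkinvalid(*pmdp));
		flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
		return old;
	}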

If you go with pmd_mkinvalid(), then:

Acked-by: Will Deacon 

Will



Re: Flushing transparent hugepages

2020-08-18 Thread Will Deacon
On Tue, Aug 18, 2020 at 04:07:36PM +0100, Matthew Wilcox wrote:
> For example, arm64 seems confused in this scenario:
> 
> void flush_dcache_page(struct page *page)
> {
> 	if (test_bit(PG_dcache_clean, &page->flags))
> 		clear_bit(PG_dcache_clean, &page->flags);
> }
> 
> ...
> 
> void __sync_icache_dcache(pte_t pte)
> {
> 	struct page *page = pte_page(pte);
> 
> 	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> 		sync_icache_aliases(page_address(page), page_size(page));
> }
> 
> So arm64 keeps track on a per-page basis which ones have been flushed.
> page_size() will return PAGE_SIZE if called on a tail page or regular
> page, but will return PAGE_SIZE << compound_order if called on a head
> page.  So this will either over-flush, or it's missing the opportunity
> to clear the bits on all the subpages which have now been flushed.

Hmm, that seems to go all the way back to 2014 as the result of a bug fix
in 923b8f5044da ("arm64: mm: Make icache synchronisation logic huge page
aware") which has a Reported-by Mark and a CC stable, suggesting something
_was_ going wrong at the time :/ Was there a point where the tail pages
could end up with PG_arch_1 uncleared on allocation?

> What would you _like_ to see?  Would you rather flush_dcache_page()
> were called once for each subpage, or would you rather maintain
> the page-needs-flushing state once per compound page?  We could also
> introduce flush_dcache_thp() if some architectures would prefer it one
> way and one the other, although that brings into question what to do
> for hugetlbfs pages.

For arm64, we'd like to see PG_arch_1 preserved during huge page splitting
[1], but there was a worry that it might break x86 and s390. It's also not
clear to me that we can change __sync_icache_dcache() as it's called when
we're installing the entry in the page-table, so why would it be called
again for the tail pages?

Will

[1] https://lore.kernel.org/linux-arch/20200703153718.16973-8-catalin.mari...@arm.com/



Re: [PATCH v2] arch: consolidate pm_power_off callback

2021-01-12 Thread Will Deacon
On Sun, Dec 27, 2020 at 03:01:28PM +0100, Enrico Weigelt, metux IT consult wrote:
> Move the pm_power_off callback into one global place and also add a
> function for conditionally calling it (when not NULL), in order to remove
> code duplication in all individual archs.
> 
> Reported-by: kernel test robot 
> Signed-off-by: Enrico Weigelt, metux IT consult 

[...]

> diff --git a/kernel/reboot.c b/kernel/reboot.c
> index eb1b15850761..ec4cd66dd1ae 100644
> --- a/kernel/reboot.c
> +++ b/kernel/reboot.c
> @@ -53,6 +53,16 @@ int reboot_force;
>  void (*pm_power_off_prepare)(void);
>  EXPORT_SYMBOL_GPL(pm_power_off_prepare);
>  
> +void (*pm_power_off)(void);
> +EXPORT_SYMBOL_GPL(pm_power_off);
> +
> +void do_power_off(void)
> +{
> + if (pm_power_off)
> + pm_power_off();
> +}
> +EXPORT_SYMBOL_GPL(do_power_off);

Could this just live as a static inline in pm.h to avoid having to export
the extra symbol?
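
Something along these lines (a sketch):

	/* include/linux/pm.h */
	extern void (*pm_power_off)(void);

	static inline void do_power_off(void)
	{
		if (pm_power_off)
			pm_power_off();
	}

That keeps the NULL check in one place without growing the export surface.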

Will



Re: [PATCH 00/11] ARC atomics update

2021-08-06 Thread Will Deacon
On Thu, Aug 05, 2021 at 07:04:32PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 05, 2021 at 04:18:29PM +, Vineet Gupta wrote:
> > On 8/5/21 2:02 AM, Peter Zijlstra wrote:
> > > On Wed, Aug 04, 2021 at 12:15:43PM -0700, Vineet Gupta wrote:
> > > 
> > >> Vineet Gupta (10):
> > >>ARC: atomics: disintegrate header
> > >>ARC: atomic: !LLSC: remove hack in atomic_set() for for UP
> > >>ARC: atomic: !LLSC: use int data type consistently
> > >>ARC: atomic64: LLSC: elide unused atomic_{and,or,xor,andnot}_return
> > >>ARC: atomics: implement relaxed variants
> > >>ARC: bitops: fls/ffs to take int (vs long) per asm-generic defines
> > >>ARC: xchg: !LLSC: remove UP micro-optimization/hack
> > >>ARC: cmpxchg/xchg: rewrite as macros to make type safe
> > >>ARC: cmpxchg/xchg: implement relaxed variants (LLSC config only)
> > >>ARC: atomic_cmpxchg/atomic_xchg: implement relaxed variants
> > >>
> > >> Will Deacon (1):
> > >>ARC: switch to generic bitops
> > > 
> > > Didn't see any weird things:
> > > 
> > > Acked-by: Peter Zijlstra (Intel) 
> > 
> > Thx Peter. A lot of this is your code anyways ;-)
> > 
> > Any initial thoughts/comments on patch 06/11 -- is there an obvious
> > reason that the generic bitops take a signed @nr, or is the hurdle that
> > it needs to be done consistently cross-arch?
> 
> That does indeed seem daft and ready for a cleanup. Will, any
> recollection from when you touched this?

I had a patch to fix this but it blew up in the robot and I didn't get round
to reworking it:

https://lore.kernel.org/patchwork/patch/124/

Will



Re: [RFC] bitops/non-atomic: make @nr unsigned to avoid any DIV

2021-08-06 Thread Will Deacon
On Thu, Aug 05, 2021 at 12:14:08PM -0700, Vineet Gupta wrote:
> Signed math causes generation of costlier instructions such as DIV when
> they could be done by the barrel shifter.
> 
> The worse part is that this is not caught by things like bloat-o-meter
> since instruction length / symbol sizes are typically the same.
> 
> e.g.
> 
> stock (signed math)
> __
> 
> 919b4614 :
> 919b4614: div r2,r0,0x20
> ^^^
> 919b4618: add2r2,0x920f6050,r2
> 919b4620: ld_sr2,[r2,0]
> 919b4622: lsr r0,r2,r0
> 919b4626: j_s.d   [blink]
> 919b4628: bmsk_s  r0,r0,0
> 919b462a: nop_s
> 
> (patched) unsigned math
> __
> 
> 919b4614 :
> 919b4614: lsr r2,r0,0x5  @nr/32
> ^^^
> 919b4618: add2r2,0x920f6050,r2
> 919b4620: ld_sr2,[r2,0]
> 919b4622: lsr r0,r2,r0 #test_bit()
> 919b4626: j_s.d   [blink]
> 919b4628: bmsk_s  r0,r0,0
> 919b462a: nop_s

Just FYI, but on arm64 the existing codegen is alright as we have both
arithmetic and logical shifts.
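
(The helper in question is the generic test_bit(); with a signed nr the
BIT_WORD() division has to round toward zero for negative values, which is
what stops the compiler from emitting a plain logical shift. Roughly:

	/* include/asm-generic/bitops/non-atomic.h */
	static inline int test_bit(int nr, const volatile unsigned long *addr)
	{
		return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG - 1)));
	}

where BIT_WORD(nr) is nr / BITS_PER_LONG; with an unsigned nr the division
becomes nr >> 5.)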

> Signed-off-by: Vineet Gupta 
> ---
> This is an RFC for feedback; I understand this impacts every arch,
> but as of now it is only build/run tested on ARC.
> ---
>  include/asm-generic/bitops/non-atomic.h | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)

Acked-by: Will Deacon 

We should really move test_bit() into the atomic header, but I failed to fix
the resulting include mess last time I tried that.

Will
