[tip: timers/core] clocksource: mips-gic-timer: Register as sched_clock

2020-06-01 Thread tip-bot2 for Paul Burton
The following commit has been merged into the timers/core branch of tip:

Commit-ID: 48016e78d328998b1f00bcfb639adeabca51abe5
Gitweb:
https://git.kernel.org/tip/48016e78d328998b1f00bcfb639adeabca51abe5
Author: Paul Burton 
AuthorDate: Thu, 21 May 2020 23:48:16 +03:00
Committer: Daniel Lezcano 
CommitterDate: Sat, 23 May 2020 00:03:08 +02:00

clocksource: mips-gic-timer: Register as sched_clock

The MIPS GIC timer is well suited for use as sched_clock, so register it
as such.

Whilst the existing gic_read_count() function already matches the prototype
needed by sched_clock_register(), we split it into two functions in order
to remove the need to evaluate the mips_cm_is64 condition on each call,
since sched_clock should be as fast as possible.

Note that the sched_clock framework needs the clock source to be stable in
order to rely on it. So we register the MIPS GIC timer as sched_clock only
if it is stable, which is the case if either the system doesn't have CPU
frequency scaling enabled, or the CPU frequency is only changed by means of
the CPC core clock divider available on platforms with CM3 or newer.

Signed-off-by: Paul Burton 
Co-developed-by: Serge Semin 
[sergey.se...@baikalelectronics.ru: Register sched-clock if CM3 or !CPU-freq]
Signed-off-by: Serge Semin 
Cc: Alexey Malahov 
Cc: Thomas Bogendoerfer 
Cc: Ralf Baechle 
Cc: Alessandro Zummo 
Cc: Alexandre Belloni 
Cc: Arnd Bergmann 
Cc: Rob Herring 
Cc: linux-m...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: devicet...@vger.kernel.org
Signed-off-by: Daniel Lezcano 
Link: 
https://lore.kernel.org/r/20200521204818.25436-8-sergey.se...@baikalelectronics.ru
---
 drivers/clocksource/mips-gic-timer.c | 31 +++
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/clocksource/mips-gic-timer.c b/drivers/clocksource/mips-gic-timer.c
index 8b5f8ae..ef12c12 100644
--- a/drivers/clocksource/mips-gic-timer.c
+++ b/drivers/clocksource/mips-gic-timer.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -24,13 +25,10 @@ static DEFINE_PER_CPU(struct clock_event_device, gic_clockevent_device);
 static int gic_timer_irq;
 static unsigned int gic_frequency;
 
-static u64 notrace gic_read_count(void)
+static u64 notrace gic_read_count_2x32(void)
 {
unsigned int hi, hi2, lo;
 
-   if (mips_cm_is64)
-   return read_gic_counter();
-
do {
hi = read_gic_counter_32h();
lo = read_gic_counter_32l();
@@ -40,6 +38,19 @@ static u64 notrace gic_read_count(void)
return (((u64) hi) << 32) + lo;
 }
 
+static u64 notrace gic_read_count_64(void)
+{
+   return read_gic_counter();
+}
+
+static u64 notrace gic_read_count(void)
+{
+   if (mips_cm_is64)
+   return gic_read_count_64();
+
+   return gic_read_count_2x32();
+}
+
 static int gic_next_event(unsigned long delta, struct clock_event_device *evt)
 {
int cpu = cpumask_first(evt->cpumask);
@@ -228,6 +239,18 @@ static int __init gic_clocksource_of_init(struct device_node *node)
/* And finally start the counter */
clear_gic_config(GIC_CONFIG_COUNTSTOP);
 
+   /*
+* It's safe to use the MIPS GIC timer as a sched clock source only if
+* its ticks are stable, which is true on either the platforms with
+* stable CPU frequency or on the platforms with CM3 and CPU frequency
+* change performed by the CPC core clocks divider.
+*/
+   if (mips_cm_revision() >= CM_REV_CM3 || !IS_ENABLED(CONFIG_CPU_FREQ)) {
+   sched_clock_register(mips_cm_is64 ?
+gic_read_count_64 : gic_read_count_2x32,
+64, gic_frequency);
+   }
+
return 0;
 }
 TIMER_OF_DECLARE(mips_gic_timer, "mti,gic-timer",


Re: piix4-poweroff.c I/O BAR usage

2020-05-21 Thread Paul Burton
Hello,

On Thu, May 21, 2020 at 6:04 PM Maciej W. Rozycki  wrote:
>  Paul may or may not be reachable anymore, so I'll step in.

I'm reachable but lacking free time & with no access to Malta hardware
I can't claim to be too useful here, so thanks for responding :)

Before being moved to a driver (which was mostly driven by a desire to
migrate Malta to a multi-platform/generic kernel using DT) this code
was part of arch/mips/mti-malta/ where I added it in commit
b6911bba598f ("MIPS: Malta: add suspend state entry code"). My main
motivation at the time was to make QEMU exit after running poweroff,
but I did ensure it worked on real Malta boards too (at least Malta-R
with CoreFPGA6). Over the years since then it shocked a couple of
hardware people to see software power off a Malta - if the original
hardware designers had intended that to work then the knowledge had
been lost over time :)

I suspect the code was based on visws_machine_power_off():

  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/platform/visws/visws_quirks.c?h=v3.10#n125

> > pci_request_region() takes a BAR number (0-5), but here we're passing
> > PCI_BRIDGE_RESOURCES (13 if CONFIG_PCI_IOV, or 7 otherwise), which is
> > the bridge I/O window.
> >
> > I don't think this device ([8086:7113]) is a bridge, so that resource
> > should be empty.
>
>  Hmm, isn't the resource actually set up by `quirk_piix4_acpi' though?

I agree that the region used is meant to match that set up by
quirk_piix4_acpi(), which also refers to it using the
PCI_BRIDGE_RESOURCES macro.
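
For readers following along, a hedged sketch of the two call patterns being
discussed (the function and region names below are illustrative, not taken
from piix4-poweroff.c):

  #include <linux/pci.h>

  static int example_claim_regions(struct pci_dev *pdev)
  {
          int err;

          /* Usual usage: the second argument is a standard BAR index, 0-5. */
          err = pci_request_region(pdev, 0, "example-bar0");
          if (err)
                  return err;

          /*
           * What the poweroff code does instead: claim the resource that
           * quirk_piix4_acpi() populated at index PCI_BRIDGE_RESOURCES,
           * which lies beyond the standard BAR range.
           */
          return pci_request_region(pdev, PCI_BRIDGE_RESOURCES, "PIIX4 ACPI");
  }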

Thanks,
Paul


[PATCH] MIPS: tlbex: Fix build_restore_pagemask KScratch restore

2019-10-18 Thread Paul Burton
build_restore_pagemask() will restore the value of register $1/$at when
its restore_scratch argument is non-zero, and aims to do so by filling a
branch delay slot. Commit 0b24cae4d535 ("MIPS: Add missing EHB in mtc0
-> mfc0 sequence.") added an EHB instruction (Execution Hazard Barrier)
prior to restoring $1 from a KScratch register, in order to resolve a
hazard that can result in stale values of the KScratch register being
observed. In particular, P-class CPUs from MIPS with out of order
execution pipelines such as the P5600 & P6600 are affected.

Unfortunately this EHB instruction was inserted in the branch delay slot
causing the MFC0 instruction which performs the restoration to no longer
execute along with the branch. The result is that the $1 register isn't
actually restored, ie. the TLB refill exception handler clobbers it -
which is exactly the problem the EHB is meant to avoid for the P-class
CPUs.

Similarly build_get_pgd_vmalloc() will restore the value of $1/$at when
its mode argument equals refill_scratch, and suffers from the same
problem.

Fix this by in both cases moving the EHB earlier in the emitted code.
There's no reason it needs to immediately precede the MFC0 - it simply
needs to be between the MTC0 & MFC0.

This bug only affects Cavium Octeon systems which use
build_fast_tlb_refill_handler().

Signed-off-by: Paul Burton 
Fixes: 0b24cae4d535 ("MIPS: Add missing EHB in mtc0 -> mfc0 sequence.")
Cc: Dmitry Korotin 
Cc: sta...@vger.kernel.org # v3.15+
---
 arch/mips/mm/tlbex.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
index e01cb33bfa1a..41bb91f05688 100644
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -653,6 +653,13 @@ static void build_restore_pagemask(u32 **p, struct uasm_reloc **r,
   int restore_scratch)
 {
if (restore_scratch) {
+   /*
+* Ensure the MFC0 below observes the value written to the
+* KScratch register by the prior MTC0.
+*/
+   if (scratch_reg >= 0)
+   uasm_i_ehb(p);
+
/* Reset default page size */
if (PM_DEFAULT_MASK >> 16) {
uasm_i_lui(p, tmp, PM_DEFAULT_MASK >> 16);
@@ -667,12 +674,10 @@ static void build_restore_pagemask(u32 **p, struct uasm_reloc **r,
uasm_i_mtc0(p, 0, C0_PAGEMASK);
uasm_il_b(p, r, lid);
}
-   if (scratch_reg >= 0) {
-   uasm_i_ehb(p);
+   if (scratch_reg >= 0)
UASM_i_MFC0(p, 1, c0_kscratch(), scratch_reg);
-   } else {
+   else
UASM_i_LW(p, 1, scratchpad_offset(0), 0);
-   }
} else {
/* Reset default page size */
if (PM_DEFAULT_MASK >> 16) {
@@ -921,6 +926,10 @@ build_get_pgd_vmalloc64(u32 **p, struct uasm_label **l, struct uasm_reloc **r,
}
if (mode != not_refill && check_for_high_segbits) {
uasm_l_large_segbits_fault(l, *p);
+
+   if (mode == refill_scratch && scratch_reg >= 0)
+   uasm_i_ehb(p);
+
/*
 * We get here if we are an xsseg address, or if we are
 * an xuseg address above (PGDIR_SHIFT+PGDIR_BITS) boundary.
@@ -939,12 +948,10 @@ build_get_pgd_vmalloc64(u32 **p, struct uasm_label **l, struct uasm_reloc **r,
uasm_i_jr(p, ptr);
 
if (mode == refill_scratch) {
-   if (scratch_reg >= 0) {
-   uasm_i_ehb(p);
+   if (scratch_reg >= 0)
UASM_i_MFC0(p, 1, c0_kscratch(), scratch_reg);
-   } else {
+   else
UASM_i_LW(p, 1, scratchpad_offset(0), 0);
-   }
} else {
uasm_i_nop(p);
}
-- 
2.23.0



Re: [PATCH] MAINTAINERS: Use @kernel.org address for Paul Burton

2019-10-18 Thread Paul Burton
Hello,

Paul Burton wrote:
> From: Paul Burton 
> 
> Switch to using my paulbur...@kernel.org email address in order to avoid
> subject mangling that's being imposed on my previous address.

Applied to mips-fixes.

> commit 0ad8f7aa9f7e
> https://git.kernel.org/mips/c/0ad8f7aa9f7e
> 
> Signed-off-by: Paul Burton 
> Signed-off-by: Paul Burton 

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
  then please email paulbur...@kernel.org to report it. ]


[PATCH] MAINTAINERS: Use @kernel.org address for Paul Burton

2019-10-16 Thread Paul Burton
From: Paul Burton 

Switch to using my paulbur...@kernel.org email address in order to avoid
subject mangling that's being imposed on my previous address.

Signed-off-by: Paul Burton 
Signed-off-by: Paul Burton 
---
 .mailmap|  3 ++-
 MAINTAINERS | 10 +-
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/.mailmap b/.mailmap
index edcac87e76c8..10b27ecb61c0 100644
--- a/.mailmap
+++ b/.mailmap
@@ -196,7 +196,8 @@ Oleksij Rempel  

 Oleksij Rempel  
 Paolo 'Blaisorblade' Giarrusso 
 Patrick Mochel 
-Paul Burton  
+Paul Burton  
+Paul Burton  
 Peter A Jonsson 
 Peter Oruba 
 Peter Oruba 
diff --git a/MAINTAINERS b/MAINTAINERS
index a69e6db80c79..6c4dc607074a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3096,7 +3096,7 @@ S:Supported
 F: arch/arm64/net/
 
 BPF JIT for MIPS (32-BIT AND 64-BIT)
-M: Paul Burton 
+M: Paul Burton 
 L: net...@vger.kernel.org
 L: b...@vger.kernel.org
 S: Maintained
@@ -8001,7 +8001,7 @@ S:Maintained
 F: drivers/usb/atm/ueagle-atm.c
 
 IMGTEC ASCII LCD DRIVER
-M: Paul Burton 
+M: Paul Burton 
 S: Maintained
 F: Documentation/devicetree/bindings/auxdisplay/img-ascii-lcd.txt
 F: drivers/auxdisplay/img-ascii-lcd.c
@@ -10828,7 +10828,7 @@ F:  drivers/usb/image/microtek.*
 
 MIPS
 M: Ralf Baechle 
-M: Paul Burton 
+M: Paul Burton 
 M: James Hogan 
 L: linux-m...@vger.kernel.org
 W: http://www.linux-mips.org/
@@ -10842,7 +10842,7 @@ F:  arch/mips/
 F: drivers/platform/mips/
 
 MIPS BOSTON DEVELOPMENT BOARD
-M: Paul Burton 
+M: Paul Burton 
 L: linux-m...@vger.kernel.org
 S: Maintained
 F: Documentation/devicetree/bindings/clock/img,boston-clock.txt
@@ -10852,7 +10852,7 @@ F:  drivers/clk/imgtec/clk-boston.c
 F: include/dt-bindings/clock/boston-clock.h
 
 MIPS GENERIC PLATFORM
-M: Paul Burton 
+M: Paul Burton 
 L: linux-m...@vger.kernel.org
 S: Supported
 F: Documentation/devicetree/bindings/power/mti,mips-cpc.txt
-- 
2.23.0



Re: [PATCH] MIPS: Loongson: Make default kernel log buffer size as 128KB for Loongson3

2019-10-15 Thread Paul Burton
Hi Tiezhu & Huacai,

On Tue, Oct 15, 2019 at 12:00:25PM +0800, Tiezhu Yang wrote:
> On 10/15/2019 11:36 AM, Huacai Chen wrote:
> > On Tue, Oct 15, 2019 at 10:12 AM Tiezhu Yang  wrote:
> > > When I update the kernel with loongson3_defconfig on the Loongson 3A3000
> > > platform and then use the dmesg command to show the kernel ring buffer, the
> > > initial kernel messages have disappeared because the log buffer is too small;
> > > it is better to change the default kernel log buffer size from 16KB to 128KB.
> > > 
> > > Signed-off-by: Tiezhu Yang 
> > > ---
> > >   arch/mips/configs/loongson3_defconfig | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/mips/configs/loongson3_defconfig b/arch/mips/configs/loongson3_defconfig
> > > index 90ee008..3aa2201 100644
> > > --- a/arch/mips/configs/loongson3_defconfig
> > > +++ b/arch/mips/configs/loongson3_defconfig
> > > @@ -12,7 +12,7 @@ CONFIG_TASKSTATS=y
> > >   CONFIG_TASK_DELAY_ACCT=y
> > >   CONFIG_TASK_XACCT=y
> > >   CONFIG_TASK_IO_ACCOUNTING=y
> > > -CONFIG_LOG_BUF_SHIFT=14
> > > +CONFIG_LOG_BUF_SHIFT=17
> > Hi, Tiezhu,
> > 
> > Why did you choose 128KB and not 64KB or 256KB? I found 64KB is enough for
> > our cases. And if you really need more, I think 256KB could be better
> > because many platforms choose 256KB.
> 
> Hi Huacai,
> 
> Thanks for your reply and suggestion, I will send a v2 patch.

Thanks for the patches.

I actually have a slight preference for 128KB if you've no specific
need, since 128KB is the default. Some quick grepping says that of 405
defconfigs in tree (as of v5.4-rc3), we have:

  LOG_BUF_SHIFT  Count
 12  1
 13  3
 14  235
 15  18
 16  39
 17  90
 18  13
 19  2
 20  4

ie. 16KiB is by far the most common, then second most common is the
default 128KiB. 256KiB is comparatively rare.
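
To make the shift-to-size mapping behind those numbers explicit, here's a
trivial standalone sketch (not from the thread, just the arithmetic):

  #include <stdio.h>

  /* LOG_BUF_SHIFT is the log2 of the kernel log buffer size in bytes. */
  int main(void)
  {
          int shift;

          for (shift = 14; shift <= 18; shift++)
                  printf("LOG_BUF_SHIFT=%d -> %3u KiB\n",
                         shift, (1u << shift) / 1024);
          return 0;
  }

which prints 16 KiB for 14, 64 KiB for 16, 128 KiB for 17 and 256 KiB for 18.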

However, I don't think your v1 patch is quite right Tiezhu - since 17 is
the default it shouldn't be specified in the defconfig at all. Did you
manually make the change in the loongson3_defconfig file? If so please
take a look at the savedefconfig make target & try something like this:

  make ARCH=mips loongson3_defconfig
  make ARCH=mips menuconfig
  # Change LOG_BUF_SHIFT
  make ARCH=mips savedefconfig
  mv defconfig arch/mips/configs/loongson3_defconfig
  git add -i arch/mips/configs/loongson3_defconfig
  # Stage the relevant changes, drop the others

You should end up with the CONFIG_LOG_BUF_SHIFT line just getting
deleted.

If on the other hand you really do prefer 256KiB for these systems
please describe why in the commit message. It could be something as
simple as "we have lots of memory so using 256KiB isn't a big deal, and
gives us a better chance of preserving boot messages until they're
examined". But if your log is getting this big before you look at it (or
before something like systemd copies it into its journal), there's
probably something fishy going on.

Thanks,
Paul


Re: [EXTERNAL]Re: Build regressions/improvements in v5.4-rc3

2019-10-14 Thread Paul Burton
Hi Geert, Greg,

On Mon, Oct 14, 2019 at 09:04:21AM +0200, Geert Uytterhoeven wrote:
> On Mon, Oct 14, 2019 at 8:53 AM Geert Uytterhoeven  
> wrote:
> > JFYI, when comparing v5.4-rc3[1] to v5.4-rc2[3], the summaries are:
> >   - build errors: +1/-0
> 
>   + /kisskb/src/drivers/staging/octeon/ethernet-spi.c: error:
> 'OCTEON_IRQ_RML' undeclared (first use in this function):  => 198:19,
> 224:12
> 
> mips-allmodconfig
> 
> > [1] 
> > http://kisskb.ellerman.id.au/kisskb/branch/linus/head/4f5cafb5cb8471e54afdc9054d973535614f7675/
> >  (232 out of 242 configs)
> > [3] 
> > http://kisskb.ellerman.id.au/kisskb/branch/linus/head/da0c9ea146cbe92b832f1b0f694840ea8eb33cce/
> >  (233 out of 242 configs)

I believe this should be fixed by this patch:

  https://lore.kernel.org/lkml/20191007231741.2012860-1-paul.bur...@mips.com/

It's currently in staging-next as commit 17a29fea086b ("staging/octeon:
Use stubs for MIPS && !CAVIUM_OCTEON_SOC"). Could we get that merged in
the 5.4 cycle instead of 5.5?

Thanks,
Paul


[GIT PULL] MIPS fixes

2019-10-12 Thread Paul Burton
Hi Linus,

Here are a few MIPS fixes for 5.4; please pull.

Thanks,
Paul


The following changes since commit da0c9ea146cbe92b832f1b0f694840ea8eb33cce:

  Linux 5.4-rc2 (2019-10-06 14:27:30 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux.git tags/mips_fixes_5.4_2

for you to fetch changes up to 2f2b4fd674cadd8c6b40eb629e140a14db4068fd:

  MIPS: Disable Loongson MMI instructions for kernel build (2019-10-10 11:58:52 -0700)


A few MIPS fixes for 5.4:

- Build fixes for CONFIG_OPTIMIZE_INLINING=y builds in which the
  compiler may choose not to inline __xchg() & __cmpxchg().

- A build fix for Loongson configurations with GCC 9.x.

- Expose some extra HWCAP bits to indicate support for various
  instruction set extensions to userland.

- Fix bad stack access in firmware handling code for old SNI
  RM200/300/400 machines.


Jiaxun Yang (1):
  MIPS: elf_hwcap: Export userspace ASEs

Paul Burton (1):
  MIPS: Disable Loongson MMI instructions for kernel build

Thomas Bogendoerfer (3):
  MIPS: include: Mark __cmpxchg as __always_inline
  MIPS: include: Mark __xchg as __always_inline
  MIPS: fw: sni: Fix out of bounds init of o32 stack

 arch/mips/fw/sni/sniprom.c |  2 +-
 arch/mips/include/asm/cmpxchg.h|  9 +
 arch/mips/include/uapi/asm/hwcap.h | 11 +++
 arch/mips/kernel/cpu-probe.c   | 33 +
 arch/mips/loongson64/Platform  |  4 
 arch/mips/vdso/Makefile|  1 +
 6 files changed, 55 insertions(+), 5 deletions(-)




Re: [PATCH] mips: Fix unroll macro when building with Clang

2019-10-10 Thread Paul Burton
Hello,

Nathan Chancellor wrote:
> Building with Clang errors after commit 6baaeadae911 ("MIPS: Provide
> unroll() macro, use it for cache ops") since the GCC_VERSION macro
> is defined in include/linux/compiler-gcc.h, which is only included
> in compiler.h when using GCC:
> 
> In file included from arch/mips/kernel/mips-mt.c:20:
> ./arch/mips/include/asm/r4kcache.h:254:1: error: use of undeclared
> identifier 'GCC_VERSION'; did you mean 'S_VERSION'?
> __BUILD_BLAST_CACHE(i, icache, Index_Invalidate_I, Hit_Invalidate_I, 32,
> )
> ^
> ./arch/mips/include/asm/r4kcache.h:219:4: note: expanded from macro
> '__BUILD_BLAST_CACHE'
> cache_unroll(32, kernel_cache, indexop,
> ^
> ./arch/mips/include/asm/r4kcache.h:203:2: note: expanded from macro
> 'cache_unroll'
> unroll(times, _cache_op, insn, op, (addr) + (i++ * (lsize)));
> ^
> ./arch/mips/include/asm/unroll.h:28:15: note: expanded from macro
> 'unroll'
> BUILD_BUG_ON(GCC_VERSION >= 40700 &&\
>  ^
> 
> Use CONFIG_GCC_VERSION, which will always be set by Kconfig.
> Additionally, Clang 8 had improvements around __builtin_constant_p so
> use that as a lower limit for this check with Clang (although MIPS
> wasn't buildable until Clang 9); building a kernel with Clang 9.0.0
> has no issues after this change.
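
As a rough sketch of the shape of the resulting check (assuming the
BUILD_BUG_ON inside arch/mips/include/asm/unroll.h; both CONFIG_GCC_VERSION
and CONFIG_CLANG_VERSION are emitted by Kconfig and default to 0 when the
respective compiler isn't in use):

  /* Illustrative stand-in for the check inside unroll(), not a copy of
   * the real header. */
  #define unroll_times_must_be_constant(times)                          \
          BUILD_BUG_ON((CONFIG_CLANG_VERSION >= 80000 ||                \
                        CONFIG_GCC_VERSION >= 40700) &&                 \
                       !__builtin_constant_p(times))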

Applied to mips-next.

> commit df3da04880b4
> https://git.kernel.org/mips/c/df3da04880b4
> 
> Fixes: 6baaeadae911 ("MIPS: Provide unroll() macro, use it for cache ops")
> Link: https://github.com/ClangBuiltLinux/linux/issues/736
> Signed-off-by: Nathan Chancellor 
> Signed-off-by: Paul Burton 

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
  then please email paul.bur...@mips.com to report it. ]


Re: [PATCH 0/6] Clean up ARC code and fix IP22/28 early printk

2019-10-09 Thread Paul Burton
Hello,

Thomas Bogendoerfer wrote:
> While fixing the problem of EARLY_PRINTK not working on IP22/IP28
> I've removed unused ARC functions and made 32bit ARC PROMs work
> with 64bit kernels. By switching to memory detection via PROM calls,
> EARLY_PRINTK works now. And by using the regular 64bit spaces, the
> maximum memory of 384MB on Indigo2 R4k machines works, too.
> 
> Thomas Bogendoerfer (6):
>   MIPS: fw: arc: remove unused ARC code
>   MIPS: fw: arc: use call_o32 to call ARC prom from 64bit kernel
>   MIPS: Kconfig: always select ARC_MEMORY and ARC_PROMLIB for platform
>   MIPS: fw: arc: workaround 64bit kernel/32bit ARC problems
>   MIPS: SGI-IP22: set PHYS_OFFSET to memory start
>   MIPS: SGI-IP22/28: Use PROM for memory detection

Series applied to mips-next.

> MIPS: fw: arc: remove unused ARC code
>   commit d11646b5ce93
>   https://git.kernel.org/mips/c/d11646b5ce93
>   
>   Signed-off-by: Thomas Bogendoerfer 
>   Signed-off-by: Paul Burton 
> 
> MIPS: fw: arc: use call_o32 to call ARC prom from 64bit kernel
>   commit ce6c0a593b3c
>   https://git.kernel.org/mips/c/ce6c0a593b3c
>   
>   Signed-off-by: Thomas Bogendoerfer 
>   Signed-off-by: Paul Burton 
> 
> MIPS: Kconfig: always select ARC_MEMORY and ARC_PROMLIB for platform
>   commit 39b2d7565a47
>   https://git.kernel.org/mips/c/39b2d7565a47
>   
>   Signed-off-by: Thomas Bogendoerfer 
>   Signed-off-by: Paul Burton 
> 
> MIPS: fw: arc: workaround 64bit kernel/32bit ARC problems
>   commit 351889d35629
>   https://git.kernel.org/mips/c/351889d35629
>   
>   Signed-off-by: Thomas Bogendoerfer 
>   Signed-off-by: Paul Burton 
> 
> MIPS: SGI-IP22: set PHYS_OFFSET to memory start
>   commit 931e1bfea403
>   https://git.kernel.org/mips/c/931e1bfea403
>   
>   Signed-off-by: Thomas Bogendoerfer 
>   Signed-off-by: Paul Burton 
> 
> MIPS: SGI-IP22/28: Use PROM for memory detection
>   commit c0de00b286ed
>   https://git.kernel.org/mips/c/c0de00b286ed
>   
>   Signed-off-by: Thomas Bogendoerfer 
>   Signed-off-by: Paul Burton 

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
  then please email paul.bur...@mips.com to report it. ]


Re: [PATCH] MIPS: fw: sni: Fix out of bounds init of o32 stack

2019-10-09 Thread Paul Burton
Hello,

Thomas Bogendoerfer wrote:
> Use ARRAY_SIZE to calculate the top of the o32 stack.
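
A hedged sketch of the pattern being fixed (the array name and size below
are made up, not copied from arch/mips/fw/sni/sniprom.c):

  #include <linux/kernel.h>       /* ARRAY_SIZE() */

  static u32 o32_stack[4096];

  /* Buggy: sizeof() yields a byte count, so for a u32 array this points
   * four times further than intended. */
  #define O32_STACK_TOP_BAD       (&o32_stack[sizeof(o32_stack)])

  /* Fixed: ARRAY_SIZE() yields an element count, giving the usual
   * one-past-the-end "top of stack" address. */
  #define O32_STACK_TOP           (&o32_stack[ARRAY_SIZE(o32_stack)])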

Applied to mips-fixes.

> commit efcb529694c3
> https://git.kernel.org/mips/c/efcb529694c3
> 
> Signed-off-by: Thomas Bogendoerfer 
> Signed-off-by: Paul Burton 

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
  then please email paul.bur...@mips.com to report it. ]


Re: [PATCH] MIPS: include: Mark __xchg as __always_inline

2019-10-09 Thread Paul Burton
Hello,

Thomas Bogendoerfer wrote:
> Commit ac7c3e4ff401 ("compiler: enable CONFIG_OPTIMIZE_INLINING
> forcibly") allows compiler to uninline functions marked as 'inline'.
> In cace of __xchg this would cause to reference function
> __xchg_called_with_bad_pointer, which is an error case
> for catching bugs and will not happen for correct code, if
> __xchg is inlined.
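
A minimal sketch of the idiom, assuming nothing about the real MIPS
implementation beyond the undefined-reference trick named above: the
size-dispatching helper must be __always_inline so the bad-size branch is
folded away at compile time rather than surviving into an out-of-line copy.

  /* Never defined anywhere: if a call to it survives to link time, the
   * build fails - which is exactly what an uninlined __xchg caused. */
  extern unsigned long __xchg_called_with_bad_pointer(void);

  static __always_inline unsigned long example_xchg(volatile unsigned long *ptr,
                                                    unsigned long x, int size)
  {
          if (size != sizeof(*ptr))
                  return __xchg_called_with_bad_pointer();

          /* The real code uses an LL/SC sequence; a compiler builtin is
           * used here purely to keep the sketch self-contained. */
          return __atomic_exchange_n(ptr, x, __ATOMIC_SEQ_CST);
  }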

Applied to mips-fixes.

> commit 46f1619500d0
> https://git.kernel.org/mips/c/46f1619500d0
> 
> Signed-off-by: Thomas Bogendoerfer 
> Reviewed-by: Philippe Mathieu-Daudé 
> Signed-off-by: Paul Burton 

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
  then please email paul.bur...@mips.com to report it. ]


Re: [PATCH v2] MIPS: generic: Use __initconst for const init data

2019-10-08 Thread Paul Burton
Hello,

Tiezhu Yang wrote:
> Fix the following checkpatch errors:
> 
> $ ./scripts/checkpatch.pl --no-tree -f arch/mips/generic/init.c
> ERROR: Use of const init definition must use __initconst
> #23: FILE: arch/mips/generic/init.c:23:
> +static __initdata const void *fdt;
> 
> ERROR: Use of const init definition must use __initconst
> #24: FILE: arch/mips/generic/init.c:24:
> +static __initdata const struct mips_machine *mach;
> 
> ERROR: Use of const init definition must use __initconst
> #25: FILE: arch/mips/generic/init.c:25:
> +static __initdata const void *mach_match_data;
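
For context, the rule checkpatch is enforcing distinguishes init data that
is written during boot (__initdata, placed in .init.data) from init data
that is genuinely read-only (__initconst, placed in .init.rodata). A generic
illustration, not the actual change made to init.c:

  /* Read-write during init: belongs in .init.data. */
  static int example_probe_count __initdata;

  /* Never written after its initialiser: belongs in .init.rodata. */
  static const char example_banner[] __initconst = "example";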

Applied to mips-next.

> commit a14bf1dc494a
> https://git.kernel.org/mips/c/a14bf1dc494a
> 
> Fixes: eed0eabd12ef ("MIPS: generic: Introduce generic DT-based board support")
> Signed-off-by: Tiezhu Yang 
> Signed-off-by: Paul Burton 

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
  then please email paul.bur...@mips.com to report it. ]


[PATCH] staging/octeon: Use stubs for MIPS && !CAVIUM_OCTEON_SOC

2019-10-07 Thread Paul Burton
When building for a non-Cavium MIPS system with COMPILE_TEST=y, the
Octeon ethernet driver hits a number of issues due to use of macros
provided only for CONFIG_CAVIUM_OCTEON_SOC=y configurations. For
example:

  drivers/staging/octeon/ethernet-rx.c:190:6: error:
'CONFIG_CAVIUM_OCTEON_CVMSEG_SIZE' undeclared (first use in this function)
  drivers/staging/octeon/ethernet-rx.c:472:25: error:
'OCTEON_IRQ_WORKQ0' undeclared (first use in this function)

These come from various asm/ headers that a non-Octeon build will be
using a non-Octeon version of.

Fix this by using the octeon-stubs.h header for non-Cavium MIPS builds,
and only using the real asm/octeon/ headers when building a Cavium
Octeon kernel configuration.

This requires that octeon-stubs.h doesn't redefine XKPHYS_TO_PHYS, which
is defined for MIPS by asm/addrspace.h which is pulled in by many other
common asm/ headers.

Signed-off-by: Paul Burton 
Reported-by: Geert Uytterhoeven 
URL: 
https://lore.kernel.org/linux-mips/CAMuHMdXvu+BppwzsU9imNWVKea_hoLcRt9N+a29Q-QsjW=i...@mail.gmail.com/
Fixes: 171a9bae68c7 ("staging/octeon: Allow test build on !MIPS")
Cc: Matthew Wilcox (Oracle) 
Cc: Greg Kroah-Hartman 
Cc: David S. Miller 

---

 drivers/staging/octeon/octeon-ethernet.h | 2 +-
 drivers/staging/octeon/octeon-stubs.h| 5 -
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/octeon/octeon-ethernet.h b/drivers/staging/octeon/octeon-ethernet.h
index a8a864b40913..042220d86d33 100644
--- a/drivers/staging/octeon/octeon-ethernet.h
+++ b/drivers/staging/octeon/octeon-ethernet.h
@@ -14,7 +14,7 @@
 #include 
 #include 
 
-#ifdef CONFIG_MIPS
+#ifdef CONFIG_CAVIUM_OCTEON_SOC
 
 #include 
 
diff --git a/drivers/staging/octeon/octeon-stubs.h b/drivers/staging/octeon/octeon-stubs.h
index a4ac3bfb62a8..c7ff90207f8a 100644
--- a/drivers/staging/octeon/octeon-stubs.h
+++ b/drivers/staging/octeon/octeon-stubs.h
@@ -1,5 +1,8 @@
 #define CONFIG_CAVIUM_OCTEON_CVMSEG_SIZE   512
-#define XKPHYS_TO_PHYS(p)  (p)
+
+#ifndef XKPHYS_TO_PHYS
+# define XKPHYS_TO_PHYS(p) (p)
+#endif
 
 #define OCTEON_IRQ_WORKQ0 0
 #define OCTEON_IRQ_RML 0
-- 
2.23.0



Re: [PATCH] mips: check for dsp presence only once before save/restore

2019-10-07 Thread Paul Burton
Hello,

Aurabindo Jayamohanan wrote:
> {save,restore}_dsp() internally checks if the cpu has dsp support.
> Therefore, explicit check is not required before calling them in
> {save,restore}_processor_state()
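
A hedged before/after sketch of the simplification (the surrounding function
is illustrative, not a copy of arch/mips/power/cpu.c):

  #include <linux/sched.h>
  #include <asm/dsp.h>

  static void example_save_state(struct task_struct *tsk)
  {
          /* Before: the call site duplicated the capability check.
           *
           *      if (cpu_has_dsp)
           *              save_dsp(tsk);
           */

          /* After: save_dsp() already checks cpu_has_dsp internally, so
           * the bare call is sufficient. */
          save_dsp(tsk);
  }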

Applied to mips-next.

> commit 9662dd752c14
> https://git.kernel.org/mips/c/9662dd752c14
> 
> Signed-off-by: Aurabindo Jayamohanan 
> Signed-off-by: Paul Burton 

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
  then please email paul.bur...@mips.com to report it. ]


Re: [PATCH v2 4/5] MIPS: CI20: DTS: Add Leds

2019-10-07 Thread Paul Burton
Hello,

Alexandre GRIVEAUX wrote:
> Adding leds and related triggers.

Applied to mips-next.

> commit 24b0cb4f883a
> https://git.kernel.org/mips/c/24b0cb4f883a
> 
> Signed-off-by: Alexandre GRIVEAUX 
> Signed-off-by: Paul Burton 

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
  then please email paul.bur...@mips.com to report it. ]


Re: [PATCH v2 3/5] MIPS: CI20: DTS: Add IW8103 Wifi + bluetooth

2019-10-07 Thread Paul Burton
Hello,

Alexandre GRIVEAUX wrote:
> Add IW8103 Wifi + bluetooth module to device tree and related power domain.

Applied to mips-next.

> commit 948f2708f945
> https://git.kernel.org/mips/c/948f2708f945
> 
> Signed-off-by: Alexandre GRIVEAUX 
> Signed-off-by: Paul Burton 

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
  then please email paul.bur...@mips.com to report it. ]


Re: [PATCH v2 2/5] MIPS: CI20: DTS: Add I2C nodes

2019-10-07 Thread Paul Burton
Hello,

Alexandre GRIVEAUX wrote:
> Adding missing I2C nodes and some peripheral:
> - PMU
> - RTC

Applied to mips-next.

> commit 73f2b940474d
> https://git.kernel.org/mips/c/73f2b940474d
> 
> Signed-off-by: Alexandre GRIVEAUX 
> Signed-off-by: Paul Burton 

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
  then please email paul.bur...@mips.com to report it. ]


Re: [PATCH 1/2] MIPS: SGI-IP27: remove not used stuff inherited from IRIX

2019-10-07 Thread Paul Burton
Hello,

Thomas Bogendoerfer wrote:
> Most of the SN/SN0 header files are inherited from IRIX header files,
> but not all of that stuff is useful for Linux. Remove the unused parts.

Series applied to mips-next.

> MIPS: SGI-IP27: remove not used stuff inherited from IRIX
>   commit 46a73e9e6ccc
>   https://git.kernel.org/mips/c/46a73e9e6ccc
>   
>   Signed-off-by: Thomas Bogendoerfer 
>   Signed-off-by: Paul Burton 
> 
> MIPS: SGI-IP27: get rid of compact node ids
>   commit 4bf841ebf17a
>   https://git.kernel.org/mips/c/4bf841ebf17a
>   
>   Signed-off-by: Thomas Bogendoerfer 
>   Signed-off-by: Paul Burton 

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
  then please email paul.bur...@mips.com to report it. ]


Re: [PATCH v2 1/5] MIPS: JZ4780: DTS: Add I2C nodes

2019-10-07 Thread Paul Burton
Hello,

Alexandre GRIVEAUX wrote:
> Add the devicetree nodes for the I2C core of the JZ4780 SoC, disabled
> by default.

Applied to mips-next.

> commit f56a040c9faf
> https://git.kernel.org/mips/c/f56a040c9faf
> 
> Signed-off-by: Alexandre GRIVEAUX 
> Signed-off-by: Paul Burton 

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
  then please email paul.bur...@mips.com to report it. ]


Re: [PATCH v2] mips: sgi-ip27: switch from DISCONTIGMEM to SPARSEMEM

2019-10-07 Thread Paul Burton
Hello,

Mike Rapoport wrote:
> From: Mike Rapoport 
> 
> The memory initialization of SGI-IP27 is already half-way to support
> SPARSEMEM. It only had free_bootmem_with_active_regions() left-overs
> interfering with sparse_memory_present_with_active_regions().
> 
> Replace these calls with simpler memblocks_present() call in prom_meminit()
> and adjust arch/mips/Kconfig to enable SPARSEMEM and SPARSEMEM_EXTREME for
> SGI-IP27.

Applied to mips-next.

> commit 397dc00e249e
> https://git.kernel.org/mips/c/397dc00e249e
> 
> Co-developed-by: Thomas Bogendoerfer 
> Signed-off-by: Thomas Bogendoerfer 
> Signed-off-by: Mike Rapoport 
> Signed-off-by: Paul Burton 

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
  then please email paul.bur...@mips.com to report it. ]


Re: [PATCH v2 00/36] MIPS: barriers & atomics cleanups

2019-10-07 Thread Paul Burton
Hello,

Paul Burton wrote:
> This series consists of a bunch of cleanups to the way we handle memory
> barriers (though no changes to the sync instructions we use to implement
> them) & atomic memory accesses. One major goal was to ensure the
> Loongson3 LL/SC errata workarounds are applied in a safe manner from
> within inline-asm & that we can automatically verify the resulting
> kernel binary looks reasonable. Many patches are cleanups found along
> the way.
> 
> Applies atop v5.4-rc1.
> 
> Changes in v2:
> - Keep our fls/ffs implementations. Turns out GCC's builtins call
>   intrinsics in some configurations, and if we'd need to go implement
>   those then using the generic fls/ffs doesn't seem like such a win.
> - De-string __WEAK_LLSC_MB to allow use with __SYNC_ELSE().
> - Only try to build the loongson3-llsc-check tool from
>   arch/mips/Makefile when CONFIG_CPU_LOONGSON3_WORKAROUNDS is enabled.
> 
> Paul Burton (36):
>   MIPS: Unify sc beqz definition
>   MIPS: Use compact branch for LL/SC loops on MIPSr6+
>   MIPS: barrier: Add __SYNC() infrastructure
>   MIPS: barrier: Clean up rmb() & wmb() definitions
>   MIPS: barrier: Clean up __smp_mb() definition
>   MIPS: barrier: Remove fast_mb() Octeon #ifdef'ery
>   MIPS: barrier: Clean up __sync() definition
>   MIPS: barrier: Clean up sync_ginv()
>   MIPS: atomic: Fix whitespace in ATOMIC_OP macros
>   MIPS: atomic: Handle !kernel_uses_llsc first
>   MIPS: atomic: Use one macro to generate 32b & 64b functions
>   MIPS: atomic: Emit Loongson3 sync workarounds within asm
>   MIPS: atomic: Use _atomic barriers in atomic_sub_if_positive()
>   MIPS: atomic: Unify 32b & 64b sub_if_positive
>   MIPS: atomic: Deduplicate 32b & 64b read, set, xchg, cmpxchg
>   MIPS: bitops: Handle !kernel_uses_llsc first
>   MIPS: bitops: Only use ins for bit 16 or higher
>   MIPS: bitops: Use MIPS_ISA_REV, not #ifdefs
>   MIPS: bitops: ins start position is always an immediate
>   MIPS: bitops: Implement test_and_set_bit() in terms of _lock variant
>   MIPS: bitops: Allow immediates in test_and_{set,clear,change}_bit
>   MIPS: bitops: Use the BIT() macro
>   MIPS: bitops: Avoid redundant zero-comparison for non-LLSC
>   MIPS: bitops: Abstract LL/SC loops
>   MIPS: bitops: Use BIT_WORD() & BITS_PER_LONG
>   MIPS: bitops: Emit Loongson3 sync workarounds within asm
>   MIPS: bitops: Use smp_mb__before_atomic in test_* ops
>   MIPS: cmpxchg: Emit Loongson3 sync workarounds within asm
>   MIPS: cmpxchg: Omit redundant barriers for Loongson3
>   MIPS: futex: Emit Loongson3 sync workarounds within asm
>   MIPS: syscall: Emit Loongson3 sync workarounds within asm
>   MIPS: barrier: Remove loongson_llsc_mb()
>   MIPS: barrier: Make __smp_mb__before_atomic() a no-op for Loongson3
>   MIPS: genex: Add Loongson3 LL/SC workaround to ejtag_debug_handler
>   MIPS: genex: Don't reload address unnecessarily
>   MIPS: Check Loongson3 LL/SC errata workaround correctness
> 
>  arch/mips/Makefile |   3 +
>  arch/mips/Makefile.postlink|  10 +-

Series applied to mips-next.

> MIPS: Unify sc beqz definition
>   commit 878f75c7a253
>   https://git.kernel.org/mips/c/878f75c7a253
>   
>   Signed-off-by: Paul Burton 
> 
> MIPS: Use compact branch for LL/SC loops on MIPSr6+
>   commit ef85d057a605
>   https://git.kernel.org/mips/c/ef85d057a605
>   
>   Signed-off-by: Paul Burton 
> 
> MIPS: barrier: Add __SYNC() infrastructure
>   commit bf92927251b3
>   https://git.kernel.org/mips/c/bf92927251b3
>   
>   Signed-off-by: Paul Burton 
> 
> MIPS: barrier: Clean up rmb() & wmb() definitions
>   commit 21e3134b3ec0
>   https://git.kernel.org/mips/c/21e3134b3ec0
>   
>   Signed-off-by: Paul Burton 
> 
> MIPS: barrier: Clean up __smp_mb() definition
>   commit 05e6da742b5b
>   https://git.kernel.org/mips/c/05e6da742b5b
>   
>   Signed-off-by: Paul Burton 
> 
> MIPS: barrier: Remove fast_mb() Octeon #ifdef'ery
>   commit 5c12a6eff6ae
>   https://git.kernel.org/mips/c/5c12a6eff6ae
>   
>   Signed-off-by: Paul Burton 
> 
> MIPS: barrier: Clean up __sync() definition
>   commit fe0065e56227
>   https://git.kernel.org/mips/c/fe0065e56227
>   
>   Signed-off-by: Paul Burton 
> 
> MIPS: barrier: Clean up sync_ginv()
>   commit 185d7d7a5819
>   https://git.kernel.org/mips/c/185d7d7a5819
>   
>   Signed-off-by: Paul Burton 
> 
> MIPS: atomic: Fix whitespace in ATOMIC_OP macros
>   commit 36d3295c5a0d
>   https://git.kernel.org/mips/c/36d3295c5a0d
>   
>   Signed-off-by: Paul Burton 
> 
> MIPS: atomic: Handle !kernel_uses_llsc first
>   commit 9537db24c65a
>   https://git.kernel.org/mips/c/9537db24c65a
>   
>

Re: [PATCH] MIPS: include: Mark __cmpxchd as __always_inline

2019-10-07 Thread Paul Burton
Hello,

Thomas Bogendoerfer wrote:
> Commit ac7c3e4ff401 ("compiler: enable CONFIG_OPTIMIZE_INLINING
> forcibly") allows compiler to uninline functions marked as 'inline'.
> In cace of cmpxchg this would cause to reference function
> __cmpxchg_called_with_bad_pointer, which is a error case
> for catching bugs and will not happen for correct code, if
> __cmpxchg is inlined.

Applied to mips-fixes.

> commit 88356d09904b
> https://git.kernel.org/mips/c/88356d09904b
> 
> Signed-off-by: Thomas Bogendoerfer 
> [paul.bur...@mips.com: s/__cmpxchd/__cmpxchg in subject]
> Signed-off-by: Paul Burton 

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
  then please email paul.bur...@mips.com to report it. ]


[GIT PULL] MIPS fixes

2019-10-04 Thread Paul Burton
Hi Linus,

Here is a selection of fixes for arch/mips, mostly handling regressions
introduced during the v5.4 merge window; please pull.

Thanks,
Paul


The following changes since commit 54ecb8f7028c5eb3d740bb82b0f1d90f2df63c5c:

  Linux 5.4-rc1 (2019-09-30 10:35:40 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux.git tags/mips_fixes_5.4_1

for you to fetch changes up to 6822c29ddbbdeafd8d1b79ebe6c51b83efd55ae1:

  MIPS: fw/arc: Remove unused addr variable (2019-10-04 11:46:22 -0700)


Some MIPS fixes for the 5.4 cycle:

- Build fixes for Cavium Octeon & PMC-Sierra MSP systems, as well as
  all pre-MIPSr6 configurations built with binutils < 2.25.

- Boot fixes for 64-bit Loongson systems & SGI IP28 systems.

- Wire up the new clone3 syscall.

- Clean ups for a few build-time warnings.


Christophe JAILLET (1):
  mips: Loongson: Fix the link time qualifier of 'serial_exit()'

Huacai Chen (1):
  MIPS: Loongson64: Fix boot failure after dropping boot_mem_map

Jiaxun Yang (1):
  MIPS: cpu-bugs64: Mark inline functions as __always_inline

Oleksij Rempel (1):
  MIPS: dts: ar9331: fix interrupt-controller size

Paul Burton (7):
  MIPS: octeon: Include required header; fix octeon ethernet build
  MIPS: Wire up clone3 syscall
  MIPS: VDSO: Remove unused gettimeofday.c
  MIPS: VDSO: Fix build for binutils < 2.25
  MIPS: pmcs-msp71xx: Add missing MAX_PROM_MEM definition
  MIPS: pmcs-msp71xx: Remove unused addr variable
  MIPS: fw/arc: Remove unused addr variable

Thomas Bogendoerfer (2):
  MIPS: init: Fix reservation of memory between PHYS_OFFSET and mem start
  MIPS: init: Prevent adding memory before PHYS_OFFSET

 arch/mips/boot/dts/qca/ar9331.dtsi|   2 +-
 arch/mips/fw/arc/memory.c |   1 -
 arch/mips/include/asm/octeon/cvmx-ipd.h   |   1 +
 arch/mips/include/asm/unistd.h|   1 +
 arch/mips/kernel/cpu-bugs64.c |  14 +-
 arch/mips/kernel/setup.c  |   5 +-
 arch/mips/kernel/syscall.c|   1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl |   2 +-
 arch/mips/kernel/syscalls/syscall_n64.tbl |   2 +-
 arch/mips/kernel/syscalls/syscall_o32.tbl |   2 +-
 arch/mips/loongson64/common/mem.c |  35 ++--
 arch/mips/loongson64/common/serial.c  |   2 +-
 arch/mips/loongson64/loongson-3/numa.c|  11 +-
 arch/mips/pmcs-msp71xx/msp_prom.c |   4 +-
 arch/mips/vdso/Makefile   |   2 +-
 arch/mips/vdso/gettimeofday.c | 269 --
 16 files changed, 41 insertions(+), 313 deletions(-)
 delete mode 100644 arch/mips/vdso/gettimeofday.c




[PATCH v2] mtd: rawnand: au1550nd: Fix au_read_buf16() prototype

2019-10-04 Thread Paul Burton
Commit 7e534323c416 ("mtd: rawnand: Pass a nand_chip object to
chip->read_xxx() hooks") modified the prototype of the struct nand_chip
read_buf function pointer. In the au1550nd driver we have 2
implementations of read_buf. The previously mentioned commit modified
the au_read_buf() implementation to match the function pointer, but not
au_read_buf16(). This results in a compiler warning for MIPS
db1xxx_defconfig builds:

  drivers/mtd/nand/raw/au1550nd.c:443:57:
warning: pointer type mismatch in conditional expression

Fix this by updating the prototype of au_read_buf16() to take a struct
nand_chip pointer as its first argument, as is expected after commit
7e534323c416 ("mtd: rawnand: Pass a nand_chip object to chip->read_xxx()
hooks").

Note that this shouldn't have caused any functional issues at runtime,
since the offset of the struct mtd_info within struct nand_chip is 0
making mtd_to_nand() effectively a type-cast.

Signed-off-by: Paul Burton 
Fixes: 7e534323c416 ("mtd: rawnand: Pass a nand_chip object to chip->read_xxx() hooks")
Cc: Boris Brezillon 
Cc: Miquel Raynal 
Cc: David Woodhouse 
Cc: Brian Norris 
Cc: Marek Vasut 
Cc: Vignesh Raghavendra 
Cc: linux-...@lists.infradead.org
Cc: linux-m...@vger.kernel.org
Cc: sta...@vger.kernel.org # v4.20+

---

Changes in v2:
- Update kerneldoc comment too...

 drivers/mtd/nand/raw/au1550nd.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/mtd/nand/raw/au1550nd.c b/drivers/mtd/nand/raw/au1550nd.c
index 97a97a9ccc36..e10b76089048 100644
--- a/drivers/mtd/nand/raw/au1550nd.c
+++ b/drivers/mtd/nand/raw/au1550nd.c
@@ -134,16 +134,15 @@ static void au_write_buf16(struct nand_chip *this, const u_char *buf, int len)
 
 /**
  * au_read_buf16 -  read chip data into buffer
- * @mtd:   MTD device structure
+ * @this:  NAND chip object
  * @buf:   buffer to store date
  * @len:   number of bytes to read
  *
  * read function for 16bit buswidth
  */
-static void au_read_buf16(struct mtd_info *mtd, u_char *buf, int len)
+static void au_read_buf16(struct nand_chip *this, u_char *buf, int len)
 {
int i;
-   struct nand_chip *this = mtd_to_nand(mtd);
u16 *p = (u16 *) buf;
len >>= 1;
 
-- 
2.23.0



[PATCH] mtd: rawnand: au1550nd: Fix au_read_buf16() prototype

2019-10-04 Thread Paul Burton
Commit 7e534323c416 ("mtd: rawnand: Pass a nand_chip object to
chip->read_xxx() hooks") modified the prototype of the struct nand_chip
read_buf function pointer. In the au1550nd driver we have 2
implementations of read_buf. The previously mentioned commit modified
the au_read_buf() implementation to match the function pointer, but not
au_read_buf16(). This results in a compiler warning for MIPS
db1xxx_defconfig builds:

  drivers/mtd/nand/raw/au1550nd.c:443:57:
warning: pointer type mismatch in conditional expression

Fix this by updating the prototype of au_read_buf16() to take a struct
nand_chip pointer as its first argument, as is expected after commit
7e534323c416 ("mtd: rawnand: Pass a nand_chip object to chip->read_xxx()
hooks").

Note that this shouldn't have caused any functional issues at runtime,
since the offset of the struct mtd_info within struct nand_chip is 0
making mtd_to_nand() effectively a type-cast.

Signed-off-by: Paul Burton 
Fixes: 7e534323c416 ("mtd: rawnand: Pass a nand_chip object to chip->read_xxx() hooks")
Cc: Boris Brezillon 
Cc: Miquel Raynal 
Cc: David Woodhouse 
Cc: Brian Norris 
Cc: Marek Vasut 
Cc: Vignesh Raghavendra 
Cc: linux-...@lists.infradead.org
Cc: linux-m...@vger.kernel.org
Cc: sta...@vger.kernel.org # v4.20+

---

 drivers/mtd/nand/raw/au1550nd.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/mtd/nand/raw/au1550nd.c b/drivers/mtd/nand/raw/au1550nd.c
index 97a97a9ccc36..2bc818dea2a8 100644
--- a/drivers/mtd/nand/raw/au1550nd.c
+++ b/drivers/mtd/nand/raw/au1550nd.c
@@ -140,10 +140,9 @@ static void au_write_buf16(struct nand_chip *this, const u_char *buf, int len)
  *
  * read function for 16bit buswidth
  */
-static void au_read_buf16(struct mtd_info *mtd, u_char *buf, int len)
+static void au_read_buf16(struct nand_chip *this, u_char *buf, int len)
 {
int i;
-   struct nand_chip *this = mtd_to_nand(mtd);
u16 *p = (u16 *) buf;
len >>= 1;
 
-- 
2.23.0



Re: [PATCH] MIPS: init: Prevent adding memory before PHYS_OFFSET

2019-10-02 Thread Paul Burton
Hello,

Thomas Bogendoerfer wrote:
> On some SGI machines (IP28 and IP30) a small region of memory is mirrored
> to physical address 0 for the exception vectors, while the rest of the
> memory is reachable at a higher physical address. The ARC PROM marks this
> region as reserved, but with commit a94e4f24ec83 ("MIPS: init: Drop
> boot_mem_map") this chunk is used when searching for the start of RAM,
> which breaks at least IP28 and IP30 machines. To fix this,
> add_memory_region() checks for a start address < PHYS_OFFSET and ignores
> these chunks.
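
A sketch of the idea as described above (simplified; the real function also
handles the region type and reserved ranges):

  #include <linux/memblock.h>

  void __init add_memory_region(phys_addr_t start, phys_addr_t size, long type)
  {
          /* Chunks starting below PHYS_OFFSET are only mirrors of the
           * exception vectors on IP28/IP30 - ignore them. */
          if (start < PHYS_OFFSET)
                  return;

          memblock_add(start, size);
  }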

Applied to mips-fixes.

> commit bd848d1b9235
> https://git.kernel.org/mips/c/bd848d1b9235
> 
> Fixes: a94e4f24ec83 ("MIPS: init: Drop boot_mem_map")
> Signed-off-by: Thomas Bogendoerfer 
> Signed-off-by: Paul Burton 

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
  then please email paul.bur...@mips.com to report it. ]


Re: [PATCH] MIPS: init: Fix reservation of memory between PHYS_OFFSET and mem start

2019-10-02 Thread Paul Burton
Hello,

Thomas Bogendoerfer wrote:
> Fix calculation of the size for reserving memory between PHYS_OFFSET
> and real memory start.

Applied to mips-fixes.

> commit 66b416ee41ed
> https://git.kernel.org/mips/c/66b416ee41ed
> 
> Fixes: a94e4f24ec83 ("MIPS: init: Drop boot_mem_map")
> Signed-off-by: Thomas Bogendoerfer 
> Signed-off-by: Paul Burton 

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
  then please email paul.bur...@mips.com to report it. ]


Re: [PATCH] mips: Loongson: Fix the link time qualifier of 'serial_exit()'

2019-10-02 Thread Paul Burton
Hello,

Christophe JAILLET wrote:
> 'exit' functions should be marked as __exit, not __init.
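
A minimal sketch of the convention (the device below is hypothetical, not
taken from the Loongson serial driver):

  #include <linux/module.h>
  #include <linux/platform_device.h>

  static struct platform_device example_uart_device = {
          .name = "example-uart",
  };

  static int __init serial_init(void)
  {
          return platform_device_register(&example_uart_device);
  }
  module_init(serial_init);

  /* Must be __exit: unload-only code that may be discarded entirely when
   * the driver is built in. Marking it __init instead places it in memory
   * that is freed after initialisation. */
  static void __exit serial_exit(void)
  {
          platform_device_unregister(&example_uart_device);
  }
  module_exit(serial_exit);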

Applied to mips-fixes.

> commit 25b69a889b63
> https://git.kernel.org/mips/c/25b69a889b63
> 
> Fixes: 85cc028817ef ("mips: make loongsoon serial driver explicitly modular")
> Signed-off-by: Christophe JAILLET 
> Signed-off-by: Paul Burton 

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
  then please email paul.bur...@mips.com to report it. ]


Re: Build regressions/improvements in v5.4-rc1

2019-10-02 Thread Paul Burton
Hi Geert,

On Wed, Oct 02, 2019 at 11:17:26AM +0200, Geert Uytterhoeven wrote:
> > 15 error regressions:
> >   + /kisskb/build/tmp/cc1Or5dj.s: Error: can't resolve `_start' {*UND* 
> > section} - `L0 ' {.text section}:  => 663, 1200, 222, 873, 1420
> >   + /kisskb/build/tmp/cc2uWmof.s: Error: can't resolve `_start' {*UND* 
> > section} - `L0 ' {.text section}:  => 1213, 919, 688, 1434, 226
> >   + /kisskb/build/tmp/ccc6hBqd.s: Error: can't resolve `_start' {*UND* 
> > section} - `L0 ' {.text section}:  => 513, 1279, 1058, 727
> >   + /kisskb/build/tmp/cclSQ19p.s: Error: can't resolve `_start' {*UND* 
> > section} - `L0 ' {.text section}:  => 1396, 881, 1175, 671, 226
> >   + /kisskb/build/tmp/ccu3SlxY.s: Error: can't resolve `_start' {*UND* 
> > section} - `L0 ' {.text section}:  => 1238, 911, 222, 680, 1457
> 
> Various mips (allmodconfig, allnoconfig, malta_defconfig, ip22_defconfig)
> 
> Related to
> 
> /kisskb/src/arch/mips/vdso/Makefile:61: MIPS VDSO requires binutils >= 
> 2.25
> 
> ?

Hmm, this looks like fallout from the conversion to the generic VDSO
infrastructure. This patch resolves it:

  
https://lore.kernel.org/linux-mips/20191002174438.127127-2-paul.bur...@mips.com/

> >   + /kisskb/src/arch/mips/include/asm/octeon/cvmx-ipd.h: error: 
> > 'CVMX_PIP_SFT_RST' undeclared (first use in this function):  => 331:36
> >   + /kisskb/src/arch/mips/include/asm/octeon/cvmx-ipd.h: error: 
> > 'CVMX_PIP_SFT_RST' undeclared (first use in this function); did you mean 
> > 'CVMX_CIU_SOFT_RST'?:  => 331:36
> >   + /kisskb/src/arch/mips/include/asm/octeon/cvmx-ipd.h: error: storage 
> > size of 'pip_sft_rst' isn't known:  => 330:27
> 
> mips-allmodconfig (CC Matthew Wilcox)

That one's triggered by a change in the ordering of some include
directives in the drivers/staging/octeon code, and fixed by commit
0228ecf6128c ("MIPS: octeon: Include required header; fix octeon
ethernet build") in mips-next.

Thanks,
Paul


Re: [PATCH v2 5/5] MIPS: JZ4780: DTS: Add CPU nodes

2019-10-01 Thread Paul Burton
Hi Alexandre,

On Tue, Oct 01, 2019 at 09:09:48PM +0200, Alexandre GRIVEAUX wrote:
> The JZ4780 have 2 core, adding to DT.
> 
> Signed-off-by: Alexandre GRIVEAUX 
> ---
>  arch/mips/boot/dts/ingenic/jz4780.dtsi | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/arch/mips/boot/dts/ingenic/jz4780.dtsi b/arch/mips/boot/dts/ingenic/jz4780.dtsi
> index f928329b034b..9c7346724f1f 100644
> --- a/arch/mips/boot/dts/ingenic/jz4780.dtsi
> +++ b/arch/mips/boot/dts/ingenic/jz4780.dtsi
> @@ -7,6 +7,23 @@
>   #size-cells = <1>;
>   compatible = "ingenic,jz4780";
>  
> + cpus {
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + cpu@0 {
> + compatible = "ingenic,jz4780";

This should probably be something like ingenic,xburst2. JZ4780 is the
SoC. It also should be a documented binding, but I think it would be
worth holding off on the whole thing until we actually get SMP support
merged - just in case we come up with a binding that doesn't actually
work out.

So I expect I'll just apply patches 1-4 for now.

Thanks for working on it!

Paul

> + device_type = "cpu";
> + reg = <0>;
> + };
> +
> + cpu@1 {
> + compatible = "ingenic,jz4780";
> + device_type = "cpu";
> + reg = <1>;
> + };
> + };
> +
>   cpuintc: interrupt-controller {
>   #address-cells = <0>;
>   #interrupt-cells = <1>;
> -- 
> 2.20.1
> 


[PATCH v2 03/36] MIPS: barrier: Add __SYNC() infrastructure

2019-10-01 Thread Paul Burton
Introduce an asm/sync.h header which provides infrastructure that can be
used to generate sync instructions of various types, and for various
reasons. For example if we need a sync instruction that provides a full
completion barrier but only on systems which have weak memory ordering,
we can generate the appropriate assembly code using:

  __SYNC(full, weak_ordering)

When the kernel is configured to run on systems with weak memory
ordering (ie. CONFIG_WEAK_ORDERING is selected) we'll emit a sync
instruction. When the kernel is configured to run on systems with strong
memory ordering (ie. CONFIG_WEAK_ORDERING is not selected) we'll emit
nothing. The caller doesn't need to know which happened - it simply says
what it needs & when, with no concern for checking the kernel
configuration.

There are some scenarios in which we may want to emit code only when we
*didn't* emit a sync instruction. For example, some Loongson3 CPUs
suffer from a bug that requires us to emit a sync instruction prior to
each ll instruction (enabled by CONFIG_CPU_LOONGSON3_WORKAROUNDS). In
cases where this bug workaround is enabled, it's wasteful to then have
more generic code emit another sync instruction to provide barriers we
need in general. A __SYNC_ELSE() macro allows for this, providing an
extra argument that contains code to be assembled only in cases where
the sync instruction was not emitted. For example if we have a scenario
in which we generally want to emit a release barrier but for affected
Loongson3 configurations upgrade that to a full completion barrier, we
can do that like so:

  __SYNC_ELSE(full, loongson3_war, __SYNC(rl, always))

The assembly generated by these macros can be used either as inline
assembly or in assembly source files.

Differing types of sync as provided by MIPSr6 are defined, but currently
they all generate a full completion barrier except in kernels configured
for Cavium Octeon systems. There the wmb sync-type is used, and rmb
syncs are omitted, as has been the case since commit 6b07d38aaa52
("MIPS: Octeon: Use optimized memory barrier primitives."). Using
__SYNC() with the wmb or rmb types will abstract away the Octeon
specific behavior and allow us to later clean up asm/barrier.h code that
currently includes a plethora of #ifdef's.
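
As a usage sketch (consistent with how later patches in this series consume
the macros, e.g. the __sync() cleanup), the generated string can be dropped
straight into inline assembly:

  /* Full completion barrier, emitted only on weakly-ordered systems. */
  static inline void example_weak_mb(void)
  {
          asm volatile(__SYNC(full, weak_ordering) ::: "memory");
  }

  /* Full barrier where the Loongson3 workaround applies, otherwise a
   * plain release barrier. */
  static inline void example_release_mb(void)
  {
          asm volatile(__SYNC_ELSE(full, loongson3_war,
                                   __SYNC(rl, always)) ::: "memory");
  }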

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/barrier.h | 113 +
 arch/mips/include/asm/sync.h| 207 
 arch/mips/kernel/pm-cps.c   |  20 +--
 3 files changed, 219 insertions(+), 121 deletions(-)
 create mode 100644 arch/mips/include/asm/sync.h

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 9228f7386220..5ad39bfd3b6d 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -9,116 +9,7 @@
 #define __ASM_BARRIER_H
 
 #include 
-
-/*
- * Sync types defined by the MIPS architecture (document MD00087 table 6.5)
- * These values are used with the sync instruction to perform memory barriers.
- * Types of ordering guarantees available through the SYNC instruction:
- * - Completion Barriers
- * - Ordering Barriers
- * As compared to the completion barrier, the ordering barrier is a
- * lighter-weight operation as it does not require the specified instructions
- * before the SYNC to be already completed. Instead it only requires that those
- * specified instructions which are subsequent to the SYNC in the instruction
- * stream are never re-ordered for processing ahead of the specified
- * instructions which are before the SYNC in the instruction stream.
- * This potentially reduces how many cycles the barrier instruction must stall
- * before it completes.
- * Implementations that do not use any of the non-zero values of stype to define
- * different barriers, such as ordering barriers, must make those stype values
- * act the same as stype zero.
- */
-
-/*
- * Completion barriers:
- * - Every synchronizable specified memory instruction (loads or stores or both)
- *   that occurs in the instruction stream before the SYNC instruction must be
- *   already globally performed before any synchronizable specified memory
- *   instructions that occur after the SYNC are allowed to be performed, with
- *   respect to any other processor or coherent I/O module.
- *
- * - The barrier does not guarantee the order in which instruction fetches are
- *   performed.
- *
- * - A stype value of zero will always be defined such that it performs the most
- *   complete set of synchronization operations that are defined.This means
- *   stype zero always does a completion barrier that affects both loads and
- *   stores preceding the SYNC instruction and both loads and stores that are
- *   subsequent to the SYNC instruction. Non-zero values of stype may be defined
- *   by the architecture or specific implementations to perform synchronization
- *   behaviors that are less complete than that of stype zero. If an
- *   imple

[PATCH v2 18/36] MIPS: bitops: Use MIPS_ISA_REV, not #ifdefs

2019-10-01 Thread Paul Burton
Rather than #ifdef on CONFIG_CPU_* to determine whether the ins
instruction is supported we can simply check MIPS_ISA_REV to discover
whether we're targeting MIPSr2 or higher. Do so in order to clean up the
code.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/bitops.h | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 1e5739191ddf..0f5329e32e87 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -19,6 +19,7 @@
 #include  /* sigh ... */
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -76,8 +77,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
return;
}
 
-#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
-   if (__builtin_constant_p(bit) && (bit >= 16)) {
+   if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(bit) && (bit >= 16)) {
loongson_llsc_mb();
do {
__asm__ __volatile__(
@@ -90,7 +90,6 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
} while (unlikely(!temp));
return;
}
-#endif /* CONFIG_CPU_MIPSR2 || CONFIG_CPU_MIPSR6 */
 
loongson_llsc_mb();
do {
@@ -143,8 +142,7 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
return;
}
 
-#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
-   if (__builtin_constant_p(bit)) {
+   if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(bit)) {
loongson_llsc_mb();
do {
__asm__ __volatile__(
@@ -157,7 +155,6 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
} while (unlikely(!temp));
return;
}
-#endif /* CONFIG_CPU_MIPSR2 || CONFIG_CPU_MIPSR6 */
 
loongson_llsc_mb();
do {
@@ -377,8 +374,7 @@ static inline int test_and_clear_bit(unsigned long nr,
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" (res)
: "r" (1UL << bit)
: __LLSC_CLOBBER);
-#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
-   } else if (__builtin_constant_p(nr)) {
+   } else if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(nr)) {
loongson_llsc_mb();
do {
__asm__ __volatile__(
@@ -390,7 +386,6 @@ static inline int test_and_clear_bit(unsigned long nr,
: "ir" (bit)
: __LLSC_CLOBBER);
} while (unlikely(!temp));
-#endif
} else {
loongson_llsc_mb();
do {
-- 
2.23.0



[PATCH v2 07/36] MIPS: barrier: Clean up __sync() definition

2019-10-01 Thread Paul Burton
Implement __sync() using the new __SYNC() infrastructure, which will
take care of not emitting an instruction for old R3k CPUs that don't
support it. The only behavioral difference is that __sync() will now
provide a compiler barrier on these old CPUs, but that seems like
reasonable behavior anyway.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/barrier.h | 18 --
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 657ec01120a4..a117c6d95038 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -11,20 +11,10 @@
 #include 
 #include 
 
-#ifdef CONFIG_CPU_HAS_SYNC
-#define __sync()   \
-   __asm__ __volatile__(   \
-   ".set   push\n\t"   \
-   ".set   noreorder\n\t"  \
-   ".set   mips2\n\t"  \
-   "sync\n\t"  \
-   ".set   pop"\
-   : /* no output */   \
-   : /* no input */\
-   : "memory")
-#else
-#define __sync()   do { } while(0)
-#endif
+static inline void __sync(void)
+{
+   asm volatile(__SYNC(full, always) ::: "memory");
+}
 
 static inline void rmb(void)
 {
-- 
2.23.0



[PATCH v2 09/36] MIPS: atomic: Fix whitespace in ATOMIC_OP macros

2019-10-01 Thread Paul Burton
We define macros in asm/atomic.h which end each line with space
characters before a backslash to continue on the next line. Remove the
space characters leaving tabs as the whitespace used for conformity with
coding convention.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/atomic.h | 184 -
 1 file changed, 92 insertions(+), 92 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index 7578c807ef98..2d2a8a74c51b 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -42,102 +42,102 @@
  */
 #define atomic_set(v, i)   WRITE_ONCE((v)->counter, (i))
 
-#define ATOMIC_OP(op, c_op, asm_op)  \
-static __inline__ void atomic_##op(int i, atomic_t * v)
  \
-{\
-   if (kernel_uses_llsc) {   \
-   int temp; \
- \
-   loongson_llsc_mb();   \
-   __asm__ __volatile__( \
-   "   .setpush\n"   \
-   "   .set"MIPS_ISA_LEVEL"\n"   \
-   "1: ll  %0, %1  # atomic_" #op "\n"   \
-   "   " #asm_op " %0, %2  \n"   \
-   "   sc  %0, %1  \n"   \
-   "\t" __SC_BEQZ "%0, 1b  \n"   \
-   "   .setpop \n"   \
-   : "=" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)  \
-   : "Ir" (i) : __LLSC_CLOBBER); \
-   } else {  \
-   unsigned long flags;  \
- \
-   raw_local_irq_save(flags);\
-   v->counter c_op i;\
-   raw_local_irq_restore(flags); \
-   } \
+#define ATOMIC_OP(op, c_op, asm_op)\
+static __inline__ void atomic_##op(int i, atomic_t * v)
\
+{  \
+   if (kernel_uses_llsc) { \
+   int temp;   \
+   \
+   loongson_llsc_mb(); \
+   __asm__ __volatile__(   \
+   "   .setpush\n" \
+   "   .set"MIPS_ISA_LEVEL"\n" \
+   "1: ll  %0, %1  # atomic_" #op "\n" \
+   "   " #asm_op " %0, %2  \n" \
+   "   sc  %0, %1  \n" \
+   "\t" __SC_BEQZ "%0, 1b  \n" \
+   "   .setpop \n" \
+   : "=" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)\
+   : "Ir" (i) : __LLSC_CLOBBER);   \
+   } else {\
+   unsigned long flags;\
+   \
+   raw_local_irq_save(flags);  \
+   v->counter c_op i;  \
+   raw_local_irq_restore(flags);   \
+   }   \
 }
 
-#define ATOMIC_OP_RETURN(op, c_op, asm_op)   \
-static __inline__ int atomic_##op##_return_relaxed(int i, atomic_t * v)
  \
-{   

[PATCH v2 05/36] MIPS: barrier: Clean up __smp_mb() definition

2019-10-01 Thread Paul Burton
We #ifdef on Cavium Octeon CPUs, but emit the same sync instruction in
both cases. Remove the #ifdef & simply expand to the __sync() macro.

Whilst here indent the strong ordering case definitions to match the
indentation of the weak ordering ones, helping readability.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/barrier.h | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index f36cab87cfde..8a5abc1c85a6 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -89,17 +89,13 @@ static inline void wmb(void)
 #endif /* !CONFIG_CPU_HAS_WB */
 
 #if defined(CONFIG_WEAK_ORDERING)
-# ifdef CONFIG_CPU_CAVIUM_OCTEON
-#  define __smp_mb()   __sync()
-# else
-#  define __smp_mb()   __asm__ __volatile__("sync" : : :"memory")
-# endif
+# define __smp_mb()__sync()
 # define __smp_rmb()   rmb()
 # define __smp_wmb()   wmb()
 #else
-#define __smp_mb() barrier()
-#define __smp_rmb()barrier()
-#define __smp_wmb()barrier()
+# define __smp_mb()barrier()
+# define __smp_rmb()   barrier()
+# define __smp_wmb()   barrier()
 #endif
 
 /*
-- 
2.23.0



[PATCH v2 02/36] MIPS: Use compact branch for LL/SC loops on MIPSr6+

2019-10-01 Thread Paul Burton
When targeting MIPSr6 or higher make use of a compact branch in LL/SC
loops, preventing the insertion of a delay slot nop that only serves to
waste space.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/llsc.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/mips/include/asm/llsc.h b/arch/mips/include/asm/llsc.h
index 9b19f38562ac..d240a4a2d1c4 100644
--- a/arch/mips/include/asm/llsc.h
+++ b/arch/mips/include/asm/llsc.h
@@ -9,6 +9,8 @@
 #ifndef __ASM_LLSC_H
 #define __ASM_LLSC_H
 
+#include 
+
 #if _MIPS_SZLONG == 32
 #define SZLONG_LOG 5
 #define SZLONG_MASK 31UL
@@ -32,6 +34,8 @@
  */
 #if R1_LLSC_WAR
 # define __SC_BEQZ "beqzl  "
+#elif MIPS_ISA_REV >= 6
+# define __SC_BEQZ "beqzc  "
 #else
 # define __SC_BEQZ "beqz   "
 #endif
-- 
2.23.0



[PATCH v2 15/36] MIPS: atomic: Deduplicate 32b & 64b read, set, xchg, cmpxchg

2019-10-01 Thread Paul Burton
Remove the remaining duplication between 32b & 64b in asm/atomic.h by
making use of an ATOMIC_OPS() macro to generate:

  - atomic_read()/atomic64_read()
  - atomic_set()/atomic64_set()
  - atomic_cmpxchg()/atomic64_cmpxchg()
  - atomic_xchg()/atomic64_xchg()

This is consistent with the way all other functions in asm/atomic.h are
generated, and ensures consistency between the 32b & 64b functions.

Of note is that this results in the above now being static inline
functions rather than macros.
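
The generation scheme can be illustrated with a small user-space analogue; the
demo types and the volatile-cast accessors below are simplified stand-ins for
atomic_t/atomic64_t and READ_ONCE()/WRITE_ONCE(), not the kernel's definitions:

  #include <stdio.h>

  typedef struct { int counter; } demo_atomic_t;
  typedef struct { long long counter; } demo_atomic64_t;

  /* One macro expands to accessors for both widths, mirroring ATOMIC_OPS(). */
  #define DEMO_ATOMIC_OPS(pfx, type)                                  \
  static inline type pfx##_read(const pfx##_t *v)                     \
  {                                                                   \
          return *(volatile const type *)&v->counter;                 \
  }                                                                   \
                                                                      \
  static inline void pfx##_set(pfx##_t *v, type i)                    \
  {                                                                   \
          *(volatile type *)&v->counter = i;                          \
  }

  DEMO_ATOMIC_OPS(demo_atomic, int)
  DEMO_ATOMIC_OPS(demo_atomic64, long long)

  int main(void)
  {
          demo_atomic_t a = { 0 };
          demo_atomic64_t b = { 0 };

          demo_atomic_set(&a, 42);
          demo_atomic64_set(&b, 1LL << 40);
          printf("%d %lld\n", demo_atomic_read(&a), demo_atomic64_read(&b));
          return 0;
  }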

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/atomic.h | 70 +-
 1 file changed, 27 insertions(+), 43 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index 96ef50fa2817..e5ac88392d1f 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -24,24 +24,34 @@
 #include 
 #include 
 
-#define ATOMIC_INIT(i)   { (i) }
+#define ATOMIC_OPS(pfx, type)  \
+static __always_inline type pfx##_read(const pfx##_t *v)   \
+{  \
+   return READ_ONCE(v->counter);   \
+}  \
+   \
+static __always_inline void pfx##_set(pfx##_t *v, type i)  \
+{  \
+   WRITE_ONCE(v->counter, i);  \
+}  \
+   \
+static __always_inline type pfx##_cmpxchg(pfx##_t *v, type o, type n)  \
+{  \
+   return cmpxchg(&v->counter, o, n);  \
+}  \
+   \
+static __always_inline type pfx##_xchg(pfx##_t *v, type n) \
+{  \
+   return xchg(&v->counter, n);\
+}
 
-/*
- * atomic_read - read atomic variable
- * @v: pointer of type atomic_t
- *
- * Atomically reads the value of @v.
- */
-#define atomic_read(v) READ_ONCE((v)->counter)
+#define ATOMIC_INIT(i) { (i) }
+ATOMIC_OPS(atomic, int)
 
-/*
- * atomic_set - set atomic variable
- * @v: pointer of type atomic_t
- * @i: required value
- *
- * Atomically sets the value of @v to @i.
- */
-#define atomic_set(v, i)   WRITE_ONCE((v)->counter, (i))
+#ifdef CONFIG_64BIT
+# define ATOMIC64_INIT(i)  { (i) }
+ATOMIC_OPS(atomic64, s64)
+#endif
 
 #define ATOMIC_OP(pfx, op, type, c_op, asm_op, ll, sc) \
 static __inline__ void pfx##_##op(type i, pfx##_t * v) \
@@ -135,6 +145,7 @@ static __inline__ type pfx##_fetch_##op##_relaxed(type i, 
pfx##_t * v)  \
return result;  \
 }
 
+#undef ATOMIC_OPS
 #define ATOMIC_OPS(pfx, op, type, c_op, asm_op, ll, sc)
\
ATOMIC_OP(pfx, op, type, c_op, asm_op, ll, sc)  \
ATOMIC_OP_RETURN(pfx, op, type, c_op, asm_op, ll, sc)   \
@@ -254,31 +265,4 @@ ATOMIC_SIP_OP(atomic64, s64, dsubu, lld, scd)
 
 #undef ATOMIC_SIP_OP
 
-#define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
-#define atomic_xchg(v, new) (xchg(&((v)->counter), (new)))
-
-#ifdef CONFIG_64BIT
-
-#define ATOMIC64_INIT(i){ (i) }
-
-/*
- * atomic64_read - read atomic variable
- * @v: pointer of type atomic64_t
- *
- */
-#define atomic64_read(v)   READ_ONCE((v)->counter)
-
-/*
- * atomic64_set - set atomic variable
- * @v: pointer of type atomic64_t
- * @i: required value
- */
-#define atomic64_set(v, i) WRITE_ONCE((v)->counter, (i))
-
-#define atomic64_cmpxchg(v, o, n) \
-   ((__typeof__((v)->counter))cmpxchg(&((v)->counter), (o), (n)))
-#define atomic64_xchg(v, new) (xchg(&((v)->counter), (new)))
-
-#endif /* CONFIG_64BIT */
-
 #endif /* _ASM_ATOMIC_H */
-- 
2.23.0



[PATCH v2 10/36] MIPS: atomic: Handle !kernel_uses_llsc first

2019-10-01 Thread Paul Burton
Handle the !kernel_uses_llsc path first in our ATOMIC_OP(),
ATOMIC_OP_RETURN() & ATOMIC_FETCH_OP() macros & return from within the
block. This allows us to de-indent the kernel_uses_llsc path by one
level which will be useful when making further changes.
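
The shape of the transformation, in miniature (illustrative only):

  /* Before: the LL/SC body sits one level inside the condition. */
  void op_before(int fast, int *v, int i)
  {
          if (fast) {
                  *v += i;  /* stands in for the LL/SC asm block */
          } else {
                  *v -= i;  /* stands in for the irq-disabled fallback */
          }
  }

  /* After: the fallback returns early, so the common body is de-indented. */
  void op_after(int fast, int *v, int i)
  {
          if (!fast) {
                  *v -= i;
                  return;
          }

          *v += i;
  }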

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/atomic.h | 99 +-
 1 file changed, 49 insertions(+), 50 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index 2d2a8a74c51b..ace2ea005588 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -45,51 +45,36 @@
 #define ATOMIC_OP(op, c_op, asm_op)\
 static __inline__ void atomic_##op(int i, atomic_t * v)
\
 {  \
-   if (kernel_uses_llsc) { \
-   int temp;   \
+   int temp;   \
\
-   loongson_llsc_mb(); \
-   __asm__ __volatile__(   \
-   "   .setpush\n" \
-   "   .set"MIPS_ISA_LEVEL"\n" \
-   "1: ll  %0, %1  # atomic_" #op "\n" \
-   "   " #asm_op " %0, %2  \n" \
-   "   sc  %0, %1  \n" \
-   "\t" __SC_BEQZ "%0, 1b  \n" \
-   "   .setpop \n" \
-   : "=" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)\
-   : "Ir" (i) : __LLSC_CLOBBER);   \
-   } else {\
+   if (!kernel_uses_llsc) {\
unsigned long flags;\
\
raw_local_irq_save(flags);  \
v->counter c_op i;  \
raw_local_irq_restore(flags);   \
+   return; \
}   \
+   \
+   loongson_llsc_mb(); \
+   __asm__ __volatile__(   \
+   "   .setpush\n" \
+   "   .set" MIPS_ISA_LEVEL "  \n" \
+   "1: ll  %0, %1  # atomic_" #op "\n" \
+   "   " #asm_op " %0, %2  \n" \
+   "   sc  %0, %1  \n" \
+   "\t" __SC_BEQZ "%0, 1b  \n" \
+   "   .setpop \n" \
+   : "=" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)\
+   : "Ir" (i) : __LLSC_CLOBBER);   \
 }
 
 #define ATOMIC_OP_RETURN(op, c_op, asm_op) \
 static __inline__ int atomic_##op##_return_relaxed(int i, atomic_t * v)
\
 {  \
-   int result; \
-   \
-   if (kernel_uses_llsc) { \
-   int temp;   \
+   int temp, result;   \
\
-   loongson_llsc_mb(); \
-   __asm__ __volatile__(   \
-   "   .setpush\n" \
-   "   .set"MIPS_ISA_LEVEL"\n" \
-   "1: ll  %1, %2  # atomic_" #op "_return \n" \
-   "   " #asm_op " %0, %1, %3   

[PATCH v2 23/36] MIPS: bitops: Avoid redundant zero-comparison for non-LLSC

2019-10-01 Thread Paul Burton
The IRQ-disabling non-LLSC fallbacks for bitops on UP systems already
return a zero or one, so there's no need to perform another comparison
against zero. Move these comparisons into the LLSC paths to avoid the
redundant work.
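
The comparison is needed in the LL/SC paths because masking yields the bit in
place rather than a 0/1 value; a tiny standalone check (illustrative values):

  #include <assert.h>

  int main(void)
  {
          unsigned long word = 1UL << 5;

          unsigned long raw = word & (1UL << 5);         /* 0x20, not 1 */
          int boolean = (word & (1UL << 5)) != 0;        /* normalised to 1 */

          assert(raw == 0x20);
          assert(boolean == 1);
          return 0;
  }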

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/bitops.h | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 0f8ff896e86b..7671db2a7b73 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -264,6 +264,8 @@ static inline int test_and_set_bit_lock(unsigned long nr,
: "=" (temp), "+m" (*m), "=" (res)
: "ir" (BIT(bit))
: __LLSC_CLOBBER);
+
+   res = res != 0;
} else {
loongson_llsc_mb();
do {
@@ -279,12 +281,12 @@ static inline int test_and_set_bit_lock(unsigned long nr,
: __LLSC_CLOBBER);
} while (unlikely(!res));
 
-   res = temp & BIT(bit);
+   res = (temp & BIT(bit)) != 0;
}
 
smp_llsc_mb();
 
-   return res != 0;
+   return res;
 }
 
 /*
@@ -335,6 +337,8 @@ static inline int test_and_clear_bit(unsigned long nr,
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" (res)
: "ir" (BIT(bit))
: __LLSC_CLOBBER);
+
+   res = res != 0;
} else if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(nr)) {
loongson_llsc_mb();
do {
@@ -363,12 +367,12 @@ static inline int test_and_clear_bit(unsigned long nr,
: __LLSC_CLOBBER);
} while (unlikely(!res));
 
-   res = temp & BIT(bit);
+   res = (temp & BIT(bit)) != 0;
}
 
smp_llsc_mb();
 
-   return res != 0;
+   return res;
 }
 
 /*
@@ -403,6 +407,8 @@ static inline int test_and_change_bit(unsigned long nr,
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" (res)
: "ir" (BIT(bit))
: __LLSC_CLOBBER);
+
+   res = res != 0;
} else {
loongson_llsc_mb();
do {
@@ -418,12 +424,12 @@ static inline int test_and_change_bit(unsigned long nr,
: __LLSC_CLOBBER);
} while (unlikely(!res));
 
-   res = temp & BIT(bit);
+   res = (temp & BIT(bit)) != 0;
}
 
smp_llsc_mb();
 
-   return res != 0;
+   return res;
 }
 
 #include 
-- 
2.23.0



[PATCH v2 32/36] MIPS: barrier: Remove loongson_llsc_mb()

2019-10-01 Thread Paul Burton
The loongson_llsc_mb() macro is no longer used - instead barriers are
emitted as part of inline asm using the __SYNC() macro. Remove the
now-defunct loongson_llsc_mb() macro.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/barrier.h | 40 -
 arch/mips/loongson64/Platform   |  2 +-
 2 files changed, 1 insertion(+), 41 deletions(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 133afd565067..6d92d5ccdafa 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -122,46 +122,6 @@ static inline void wmb(void)
 #define __smp_mb__before_atomic()  __smp_mb__before_llsc()
 #define __smp_mb__after_atomic()   smp_llsc_mb()
 
-/*
- * Some Loongson 3 CPUs have a bug wherein execution of a memory access (load,
- * store or prefetch) in between an LL & SC can cause the SC instruction to
- * erroneously succeed, breaking atomicity. Whilst it's unusual to write code
- * containing such sequences, this bug bites harder than we might otherwise
- * expect due to reordering & speculation:
- *
- * 1) A memory access appearing prior to the LL in program order may actually
- *be executed after the LL - this is the reordering case.
- *
- *In order to avoid this we need to place a memory barrier (ie. a SYNC
- *instruction) prior to every LL instruction, in between it and any earlier
- *memory access instructions.
- *
- *This reordering case is fixed by 3A R2 CPUs, ie. 3A2000 models and later.
- *
- * 2) If a conditional branch exists between an LL & SC with a target outside
- *of the LL-SC loop, for example an exit upon value mismatch in cmpxchg()
- *or similar, then misprediction of the branch may allow speculative
- *execution of memory accesses from outside of the LL-SC loop.
- *
- *In order to avoid this we need a memory barrier (ie. a SYNC instruction)
- *at each affected branch target, for which we also use loongson_llsc_mb()
- *defined below.
- *
- *This case affects all current Loongson 3 CPUs.
- *
- * The above described cases cause an error in the cache coherence protocol;
- * such that the Invalidate of a competing LL-SC goes 'missing' and SC
- * erroneously observes its core still has Exclusive state and lets the SC
- * proceed.
- *
- * Therefore the error only occurs on SMP systems.
- */
-#ifdef CONFIG_CPU_LOONGSON3_WORKAROUNDS /* Loongson-3's LLSC workaround */
-#define loongson_llsc_mb() __asm__ __volatile__("sync" : : :"memory")
-#else
-#define loongson_llsc_mb() do { } while (0)
-#endif
-
 static inline void sync_ginv(void)
 {
asm volatile(__SYNC(ginv, always));
diff --git a/arch/mips/loongson64/Platform b/arch/mips/loongson64/Platform
index c1a4d4dc4665..28172500f95a 100644
--- a/arch/mips/loongson64/Platform
+++ b/arch/mips/loongson64/Platform
@@ -27,7 +27,7 @@ cflags-$(CONFIG_CPU_LOONGSON3)+= -Wa,--trap
 #
 # Some versions of binutils, not currently mainline as of 2019/02/04, support
 # an -mfix-loongson3-llsc flag which emits a sync prior to each ll instruction
-# to work around a CPU bug (see loongson_llsc_mb() in asm/barrier.h for a
+# to work around a CPU bug (see __SYNC_loongson3_war in asm/sync.h for a
 # description).
 #
 # We disable this in order to prevent the assembler meddling with the
-- 
2.23.0



[PATCH v2 30/36] MIPS: futex: Emit Loongson3 sync workarounds within asm

2019-10-01 Thread Paul Burton
Generate the sync instructions required to workaround Loongson3 LL/SC
errata within inline asm blocks, which feels a little safer than doing
it from C where strictly speaking the compiler would be well within its
rights to insert a memory access between the separate asm statements we
previously had, containing sync & ll instructions respectively.
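
The v2 change below relies on the usual two-level stringification trick:
__WEAK_LLSC_MB becomes a bare token so that other macros can paste or test it,
while __stringify() recovers the string form needed inside asm templates. A
standalone illustration, with __stringify() re-defined locally:

  #include <stdio.h>

  #define __stringify_1(x...)  #x
  #define __stringify(x...)    __stringify_1(x)

  #define WEAK_LLSC_MB         sync  /* bare token, not a string literal */

  int main(void)
  {
          /* Expands to the adjacent string literals "\t" "sync" "\n". */
          printf("%s", "\t" __stringify(WEAK_LLSC_MB) "\n");
          return 0;
  }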

Signed-off-by: Paul Burton 

---

Changes in v2:
- De-string __WEAK_LLSC_MB to allow its use with __SYNC_ELSE().

 arch/mips/include/asm/barrier.h | 13 +++--
 arch/mips/include/asm/futex.h   | 15 +++
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index c7e05e832da9..133afd565067 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -95,13 +95,14 @@ static inline void wmb(void)
  * ordering will be done by smp_llsc_mb() and friends.
  */
 #if defined(CONFIG_WEAK_REORDERING_BEYOND_LLSC) && defined(CONFIG_SMP)
-#define __WEAK_LLSC_MB "   sync\n"
-#define smp_llsc_mb()  __asm__ __volatile__(__WEAK_LLSC_MB : : 
:"memory")
-#define __LLSC_CLOBBER
+# define __WEAK_LLSC_MBsync
+# define smp_llsc_mb() \
+   __asm__ __volatile__(__stringify(__WEAK_LLSC_MB) : : :"memory")
+# define __LLSC_CLOBBER
 #else
-#define __WEAK_LLSC_MB "   \n"
-#define smp_llsc_mb()  do { } while (0)
-#define __LLSC_CLOBBER "memory"
+# define __WEAK_LLSC_MB
+# define smp_llsc_mb() do { } while (0)
+# define __LLSC_CLOBBER"memory"
 #endif
 
 #ifdef CONFIG_CPU_CAVIUM_OCTEON
diff --git a/arch/mips/include/asm/futex.h b/arch/mips/include/asm/futex.h
index b83b0397462d..54cf20530931 100644
--- a/arch/mips/include/asm/futex.h
+++ b/arch/mips/include/asm/futex.h
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define __futex_atomic_op(insn, ret, oldval, uaddr, oparg) \
@@ -32,7 +33,7 @@
"   .setarch=r4000  \n" \
"2: sc  $1, %2  \n" \
"   beqzl   $1, 1b  \n" \
-   __WEAK_LLSC_MB  \
+   __stringify(__WEAK_LLSC_MB) \
"3: \n" \
"   .insn   \n" \
"   .setpop \n" \
@@ -50,19 +51,19 @@
  "i" (-EFAULT) \
: "memory");\
} else if (cpu_has_llsc) {  \
-   loongson_llsc_mb(); \
__asm__ __volatile__(   \
"   .setpush\n" \
"   .setnoat\n" \
"   .setpush\n" \
"   .set"MIPS_ISA_ARCH_LEVEL"   \n" \
+   "   " __SYNC(full, loongson3_war) " \n" \
"1: "user_ll("%1", "%4")" # __futex_atomic_op\n"\
"   .setpop \n" \
"   " insn  "   \n" \
"   .set"MIPS_ISA_ARCH_LEVEL"   \n" \
"2: "user_sc("$1", "%2")"   \n" \
"   beqz$1, 1b  \n" \
-   __WEAK_LLSC_MB  \
+   __stringify(__WEAK_LLSC_MB) \
"3: \n" \
"   .insn   \n" \
"   .setpop \n" \
@@ -147,7 +148,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
"   .setarch=r4000  \n"
"2: sc  $1, %2  \n"
"   beqzl   $1, 1b  \n"
-   __WEAK_LLSC_MB
+   __stringify(__WEAK_LLSC_MB)

[PATCH v2 36/36] MIPS: Check Loongson3 LL/SC errata workaround correctness

2019-10-01 Thread Paul Burton
When Loongson3 LL/SC errata workarounds are enabled (ie.
CONFIG_CPU_LOONGSON3_WORKAROUNDS=y) run a tool to scan through the
compiled kernel & ensure that the workaround is applied correctly. That
is, ensure that:

  - Every LL or LLD instruction is preceded by a sync instruction.

  - Any branches from within an LL/SC loop to outside of that loop
target a sync instruction.

Reasoning for these conditions can be found by reading the comment above
the definition of __SYNC_loongson3_war in arch/mips/include/asm/sync.h.

This tool will help ensure that we don't inadvertently introduce code
paths that miss the required workarounds.
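
The heart of such a checker is straightforward instruction decoding; a minimal
sketch of extracting the major opcode from a 32-bit MIPS instruction word (the
sample encoding below is purely illustrative):

  #include <stdint.h>
  #include <stdio.h>

  #define OP_LL   0x30  /* major opcode of ll, in bits 31:26 */
  #define OP_LLD  0x34  /* major opcode of lld */

  static unsigned int major_op(uint32_t insn)
  {
          return insn >> 26;
  }

  int main(void)
  {
          uint32_t insn = 0xc0430000;  /* sample word whose top 6 bits are 0x30 */

          if (major_op(insn) == OP_LL || major_op(insn) == OP_LLD)
                  puts("ll/lld found - check that the preceding instruction is a sync");
          return 0;
  }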

Signed-off-by: Paul Burton 

---

Changes in v2:
- Only try to build loongson3-llsc-check from arch/mips/Makefile when
  CONFIG_CPU_LOONGSON3_WORKAROUNDS is enabled.

 arch/mips/Makefile |   3 +
 arch/mips/Makefile.postlink|  10 +-
 arch/mips/tools/.gitignore |   1 +
 arch/mips/tools/Makefile   |   5 +
 arch/mips/tools/loongson3-llsc-check.c | 307 +
 5 files changed, 325 insertions(+), 1 deletion(-)
 create mode 100644 arch/mips/tools/loongson3-llsc-check.c

diff --git a/arch/mips/Makefile b/arch/mips/Makefile
index cdc09b71febe..0a5eab626260 100644
--- a/arch/mips/Makefile
+++ b/arch/mips/Makefile
@@ -14,6 +14,9 @@
 
 archscripts: scripts_basic
$(Q)$(MAKE) $(build)=arch/mips/tools elf-entry
+ifeq ($(CONFIG_CPU_LOONGSON3_WORKAROUNDS),y)
+   $(Q)$(MAKE) $(build)=arch/mips/tools loongson3-llsc-check
+endif
$(Q)$(MAKE) $(build)=arch/mips/boot/tools relocs
 
 KBUILD_DEFCONFIG := 32r2el_defconfig
diff --git a/arch/mips/Makefile.postlink b/arch/mips/Makefile.postlink
index 4eea4188cb20..f03fdc95143e 100644
--- a/arch/mips/Makefile.postlink
+++ b/arch/mips/Makefile.postlink
@@ -3,7 +3,8 @@
 # Post-link MIPS pass
 # ===
 #
-# 1. Insert relocations into vmlinux
+# 1. Check that Loongson3 LL/SC workarounds are applied correctly
+# 2. Insert relocations into vmlinux
 
 PHONY := __archpost
 __archpost:
@@ -11,6 +12,10 @@ __archpost:
 -include include/config/auto.conf
 include scripts/Kbuild.include
 
+CMD_LS3_LLSC = arch/mips/tools/loongson3-llsc-check
+quiet_cmd_ls3_llsc = LLSCCHK $@
+  cmd_ls3_llsc = $(CMD_LS3_LLSC) $@
+
 CMD_RELOCS = arch/mips/boot/tools/relocs
 quiet_cmd_relocs = RELOCS $@
   cmd_relocs = $(CMD_RELOCS) $@
@@ -19,6 +24,9 @@ quiet_cmd_relocs = RELOCS $@
 
 vmlinux: FORCE
@true
+ifeq ($(CONFIG_CPU_LOONGSON3_WORKAROUNDS),y)
+   $(call if_changed,ls3_llsc)
+endif
 ifeq ($(CONFIG_RELOCATABLE),y)
$(call if_changed,relocs)
 endif
diff --git a/arch/mips/tools/.gitignore b/arch/mips/tools/.gitignore
index 56d34ce4..b0209450d9ff 100644
--- a/arch/mips/tools/.gitignore
+++ b/arch/mips/tools/.gitignore
@@ -1 +1,2 @@
 elf-entry
+loongson3-llsc-check
diff --git a/arch/mips/tools/Makefile b/arch/mips/tools/Makefile
index 3baee4bc6775..aaef688749f5 100644
--- a/arch/mips/tools/Makefile
+++ b/arch/mips/tools/Makefile
@@ -3,3 +3,8 @@ hostprogs-y := elf-entry
 PHONY += elf-entry
 elf-entry: $(obj)/elf-entry
@:
+
+hostprogs-$(CONFIG_CPU_LOONGSON3_WORKAROUNDS) += loongson3-llsc-check
+PHONY += loongson3-llsc-check
+loongson3-llsc-check: $(obj)/loongson3-llsc-check
+   @:
diff --git a/arch/mips/tools/loongson3-llsc-check.c 
b/arch/mips/tools/loongson3-llsc-check.c
new file mode 100644
index ..0ebddd0ae46f
--- /dev/null
+++ b/arch/mips/tools/loongson3-llsc-check.c
@@ -0,0 +1,307 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef be32toh
+/* If libc provides le{16,32,64}toh() then we'll use them */
+#elif BYTE_ORDER == LITTLE_ENDIAN
+# define le16toh(x)(x)
+# define le32toh(x)(x)
+# define le64toh(x)(x)
+#elif BYTE_ORDER == BIG_ENDIAN
+# define le16toh(x)bswap_16(x)
+# define le32toh(x)bswap_32(x)
+# define le64toh(x)bswap_64(x)
+#endif
+
+/* MIPS opcodes, in bits 31:26 of an instruction */
+#define OP_SPECIAL 0x00
+#define OP_REGIMM  0x01
+#define OP_BEQ 0x04
+#define OP_BNE 0x05
+#define OP_BLEZ0x06
+#define OP_BGTZ0x07
+#define OP_BEQL0x14
+#define OP_BNEL0x15
+#define OP_BLEZL   0x16
+#define OP_BGTZL   0x17
+#define OP_LL  0x30
+#define OP_LLD 0x34
+#define OP_SC  0x38
+#define OP_SCD 0x3c
+
+/* Bits 20:16 of OP_REGIMM instructions */
+#define REGIMM_BLTZ0x00
+#define REGIMM_BGEZ0x01
+#define REGIMM_BLTZL   0x02
+#define REGIMM_BGEZL   0x03
+#define REGIMM_BLTZAL  0x10
+#define REGIMM_BGEZAL  0x11
+#define REGIMM_BLTZALL 0x12
+#define REGIMM_BGEZALL 0x13
+
+/* Bits 5:0 of OP_SPECIAL instructions */
+#define SPECIAL_SYNC   0x0f
+
+static void usage(

[PATCH v2 21/36] MIPS: bitops: Allow immediates in test_and_{set,clear,change}_bit

2019-10-01 Thread Paul Burton
The logical operations or & xor used in the test_and_set_bit_lock(),
test_and_clear_bit() & test_and_change_bit() functions currently force
the value 1<<bit to be placed in a register. If the bit is a compile-time
constant & fits within the immediate field of an or/xor instruction (ie.
16 bits) then we can add "i" to the constraints and let the compiler use
the ori/xori immediate forms instead, avoiding the use of an extra
register.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/bitops.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index ea35a2e87b6d..7314ba5a3683 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -261,7 +261,7 @@ static inline int test_and_set_bit_lock(unsigned long nr,
"   and %2, %0, %3  \n"
"   .setpop \n"
: "=" (temp), "+m" (*m), "=" (res)
-   : "r" (1UL << bit)
+   : "ir" (1UL << bit)
: __LLSC_CLOBBER);
} else {
loongson_llsc_mb();
@@ -274,7 +274,7 @@ static inline int test_and_set_bit_lock(unsigned long nr,
"   " __SC  "%2, %1 \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" 
(res)
-   : "r" (1UL << bit)
+   : "ir" (1UL << bit)
: __LLSC_CLOBBER);
} while (unlikely(!res));
 
@@ -332,7 +332,7 @@ static inline int test_and_clear_bit(unsigned long nr,
"   and %2, %0, %3  \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" (res)
-   : "r" (1UL << bit)
+   : "ir" (1UL << bit)
: __LLSC_CLOBBER);
} else if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(nr)) {
loongson_llsc_mb();
@@ -358,7 +358,7 @@ static inline int test_and_clear_bit(unsigned long nr,
"   " __SC  "%2, %1 \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" 
(res)
-   : "r" (1UL << bit)
+   : "ir" (1UL << bit)
: __LLSC_CLOBBER);
} while (unlikely(!res));
 
@@ -400,7 +400,7 @@ static inline int test_and_change_bit(unsigned long nr,
"   and %2, %0, %3  \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" (res)
-   : "r" (1UL << bit)
+   : "ir" (1UL << bit)
: __LLSC_CLOBBER);
} else {
loongson_llsc_mb();
@@ -413,7 +413,7 @@ static inline int test_and_change_bit(unsigned long nr,
"   " __SC  "\t%2, %1   \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" 
(res)
-   : "r" (1UL << bit)
+   : "ir" (1UL << bit)
: __LLSC_CLOBBER);
} while (unlikely(!res));
 
-- 
2.23.0



[PATCH v2 31/36] MIPS: syscall: Emit Loongson3 sync workarounds within asm

2019-10-01 Thread Paul Burton
Generate the sync instructions required to workaround Loongson3 LL/SC
errata within inline asm blocks, which feels a little safer than doing
it from C where strictly speaking the compiler would be well within its
rights to insert a memory access between the separate asm statements we
previously had, containing sync & ll instructions respectively.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/kernel/syscall.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/mips/kernel/syscall.c b/arch/mips/kernel/syscall.c
index b0e25e913bdb..3ea288ca35f1 100644
--- a/arch/mips/kernel/syscall.c
+++ b/arch/mips/kernel/syscall.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -132,12 +133,12 @@ static inline int mips_atomic_set(unsigned long addr, 
unsigned long new)
  [efault] "i" (-EFAULT)
: "memory");
} else if (cpu_has_llsc) {
-   loongson_llsc_mb();
__asm__ __volatile__ (
"   .setpush\n"
"   .set"MIPS_ISA_ARCH_LEVEL"   \n"
"   li  %[err], 0   \n"
"1: \n"
+   "   " __SYNC(full, loongson3_war) " \n"
user_ll("%[old]", "(%[addr])")
"   move%[tmp], %[new]  \n"
"2: \n"
-- 
2.23.0



[PATCH v2 16/36] MIPS: bitops: Handle !kernel_uses_llsc first

2019-10-01 Thread Paul Burton
Reorder conditions in our various bitops functions that check
kernel_uses_llsc such that they handle the !kernel_uses_llsc case first.
This allows us to avoid the need to duplicate the kernel_uses_llsc check
in all the other cases. For functions that don't involve barriers common
to the various implementations, we switch to returning from within each
if block making each case easier to read in isolation.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/bitops.h | 213 -
 1 file changed, 105 insertions(+), 108 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 985d6a02f9ea..e300960717e0 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -52,11 +52,16 @@ int __mips_test_and_change_bit(unsigned long nr,
  */
 static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
 {
-   unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
+   unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
int bit = nr & SZLONG_MASK;
unsigned long temp;
 
-   if (kernel_uses_llsc && R1_LLSC_WAR) {
+   if (!kernel_uses_llsc) {
+   __mips_set_bit(nr, addr);
+   return;
+   }
+
+   if (R1_LLSC_WAR) {
__asm__ __volatile__(
"   .setpush\n"
"   .setarch=r4000  \n"
@@ -68,8 +73,11 @@ static inline void set_bit(unsigned long nr, volatile 
unsigned long *addr)
: "=" (temp), "=" GCC_OFF_SMALL_ASM() (*m)
: "ir" (1UL << bit), GCC_OFF_SMALL_ASM() (*m)
: __LLSC_CLOBBER);
+   return;
+   }
+
 #if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
-   } else if (kernel_uses_llsc && __builtin_constant_p(bit)) {
+   if (__builtin_constant_p(bit)) {
loongson_llsc_mb();
do {
__asm__ __volatile__(
@@ -80,23 +88,23 @@ static inline void set_bit(unsigned long nr, volatile 
unsigned long *addr)
: "ir" (bit), "r" (~0)
: __LLSC_CLOBBER);
} while (unlikely(!temp));
+   return;
+   }
 #endif /* CONFIG_CPU_MIPSR2 || CONFIG_CPU_MIPSR6 */
-   } else if (kernel_uses_llsc) {
-   loongson_llsc_mb();
-   do {
-   __asm__ __volatile__(
-   "   .setpush\n"
-   "   .set"MIPS_ISA_ARCH_LEVEL"   \n"
-   "   " __LL "%0, %1  # set_bit   \n"
-   "   or  %0, %2  \n"
-   "   " __SC  "%0, %1 \n"
-   "   .setpop \n"
-   : "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-   : "ir" (1UL << bit)
-   : __LLSC_CLOBBER);
-   } while (unlikely(!temp));
-   } else
-   __mips_set_bit(nr, addr);
+
+   loongson_llsc_mb();
+   do {
+   __asm__ __volatile__(
+   "   .setpush\n"
+   "   .set"MIPS_ISA_ARCH_LEVEL"   \n"
+   "   " __LL "%0, %1  # set_bit   \n"
+   "   or  %0, %2  \n"
+   "   " __SC  "%0, %1 \n"
+   "   .setpop \n"
+   : "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
+   : "ir" (1UL << bit)
+   : __LLSC_CLOBBER);
+   } while (unlikely(!temp));
 }
 
 /*
@@ -111,11 +119,16 @@ static inline void set_bit(unsigned long nr, volatile 
unsigned long *addr)
  */
 static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
-   unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
+   unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
int bit = nr & SZLONG_MASK;
unsigned long temp;
 
-   if (kernel_uses_llsc && R1_LLSC_WAR) {
+   if (!kernel_uses_llsc) {
+   __mips_clear_bit(nr, addr);
+   return;
+   }
+
+   if (R1_LLSC_WAR) {
__asm__ __volatile__(
"   .setpush   

[PATCH v2 27/36] MIPS: bitops: Use smp_mb__before_atomic in test_* ops

2019-10-01 Thread Paul Burton
Use smp_mb__before_atomic() rather than smp_mb__before_llsc() in
test_and_set_bit(), test_and_clear_bit() & test_and_change_bit(). The
_atomic() versions make semantic sense in these cases, and will allow a
later patch to omit redundant barriers for Loongson3 systems that
already include a barrier within __test_bit_op().

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/bitops.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index c08b6d225f10..a74769940fbd 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -209,7 +209,7 @@ static inline int test_and_set_bit_lock(unsigned long nr,
 static inline int test_and_set_bit(unsigned long nr,
volatile unsigned long *addr)
 {
-   smp_mb__before_llsc();
+   smp_mb__before_atomic();
return test_and_set_bit_lock(nr, addr);
 }
 
@@ -228,7 +228,7 @@ static inline int test_and_clear_bit(unsigned long nr,
int bit = nr % BITS_PER_LONG;
unsigned long res, orig;
 
-   smp_mb__before_llsc();
+   smp_mb__before_atomic();
 
if (!kernel_uses_llsc) {
res = __mips_test_and_clear_bit(nr, addr);
@@ -265,7 +265,7 @@ static inline int test_and_change_bit(unsigned long nr,
int bit = nr % BITS_PER_LONG;
unsigned long res, orig;
 
-   smp_mb__before_llsc();
+   smp_mb__before_atomic();
 
if (!kernel_uses_llsc) {
res = __mips_test_and_change_bit(nr, addr);
-- 
2.23.0



[PATCH v2 33/36] MIPS: barrier: Make __smp_mb__before_atomic() a no-op for Loongson3

2019-10-01 Thread Paul Burton
Loongson3 systems with CONFIG_CPU_LOONGSON3_WORKAROUNDS enabled already
emit a full completion barrier as part of the inline assembly containing
LL/SC loops for atomic operations. As such the barrier emitted by
__smp_mb__before_atomic() is redundant, and we can remove it.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/barrier.h | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 6d92d5ccdafa..49ff172a72b9 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -119,7 +119,17 @@ static inline void wmb(void)
 #define nudge_writes() mb()
 #endif
 
-#define __smp_mb__before_atomic()  __smp_mb__before_llsc()
+/*
+ * In the Loongson3 LL/SC workaround case, all of our LL/SC loops already have
+ * a completion barrier immediately preceding the LL instruction. Therefore we
+ * can skip emitting a barrier from __smp_mb__before_atomic().
+ */
+#ifdef CONFIG_CPU_LOONGSON3_WORKAROUNDS
+# define __smp_mb__before_atomic()
+#else
+# define __smp_mb__before_atomic() __smp_mb__before_llsc()
+#endif
+
 #define __smp_mb__after_atomic()   smp_llsc_mb()
 
 static inline void sync_ginv(void)
-- 
2.23.0



[PATCH v2 26/36] MIPS: bitops: Emit Loongson3 sync workarounds within asm

2019-10-01 Thread Paul Burton
Generate the sync instructions required to workaround Loongson3 LL/SC
errata within inline asm blocks, which feels a little safer than doing
it from C where strictly speaking the compiler would be well within its
rights to insert a memory access between the separate asm statements we
previously had, containing sync & ll instructions respectively.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/bitops.h | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index d39fca2def60..c08b6d225f10 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -31,6 +31,7 @@
asm volatile(   \
"   .setpush\n" \
"   .set" MIPS_ISA_LEVEL "  \n" \
+   "   " __SYNC(full, loongson3_war) " \n" \
"1: " __LL  "%0, %1 \n" \
"   " insn  "   \n" \
"   " __SC  "%0, %1 \n" \
@@ -47,6 +48,7 @@
asm volatile(   \
"   .setpush\n" \
"   .set" MIPS_ISA_LEVEL "  \n" \
+   "   " __SYNC(full, loongson3_war) " \n" \
"1: " __LL  ll_dst ", %2\n" \
"   " insn  "   \n" \
"   " __SC  "%1, %2 \n" \
@@ -96,12 +98,10 @@ static inline void set_bit(unsigned long nr, volatile 
unsigned long *addr)
}
 
if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(bit) && (bit >= 16)) {
-   loongson_llsc_mb();
__bit_op(*m, __INS "%0, %3, %2, 1", "i"(bit), "r"(~0));
return;
}
 
-   loongson_llsc_mb();
__bit_op(*m, "or\t%0, %2", "ir"(BIT(bit)));
 }
 
@@ -126,12 +126,10 @@ static inline void clear_bit(unsigned long nr, volatile 
unsigned long *addr)
}
 
if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(bit)) {
-   loongson_llsc_mb();
__bit_op(*m, __INS "%0, $0, %2, 1", "i"(bit));
return;
}
 
-   loongson_llsc_mb();
__bit_op(*m, "and\t%0, %2", "ir"(~BIT(bit)));
 }
 
@@ -168,7 +166,6 @@ static inline void change_bit(unsigned long nr, volatile 
unsigned long *addr)
return;
}
 
-   loongson_llsc_mb();
__bit_op(*m, "xor\t%0, %2", "ir"(BIT(bit)));
 }
 
@@ -190,7 +187,6 @@ static inline int test_and_set_bit_lock(unsigned long nr,
if (!kernel_uses_llsc) {
res = __mips_test_and_set_bit_lock(nr, addr);
} else {
-   loongson_llsc_mb();
orig = __test_bit_op(*m, "%0",
 "or\t%1, %0, %3",
 "ir"(BIT(bit)));
@@ -237,13 +233,11 @@ static inline int test_and_clear_bit(unsigned long nr,
if (!kernel_uses_llsc) {
res = __mips_test_and_clear_bit(nr, addr);
} else if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(nr)) {
-   loongson_llsc_mb();
res = __test_bit_op(*m, "%1",
__EXT "%0, %1, %3, 1;"
__INS "%1, $0, %3, 1",
"i"(bit));
} else {
-   loongson_llsc_mb();
orig = __test_bit_op(*m, "%0",
 "or\t%1, %0, %3;"
 "xor\t%1, %1, %3",
@@ -276,7 +270,6 @@ static inline int test_and_change_bit(unsigned long nr,
if (!kernel_uses_llsc) {
res = __mips_test_and_change_bit(nr, addr);
} else {
-   loongson_llsc_mb();
orig = __test_bit_op(*m, "%0",
 "xor\t%1, %0, %3",
 "ir"(BIT(bit)));
-- 
2.23.0



[PATCH v2 17/36] MIPS: bitops: Only use ins for bit 16 or higher

2019-10-01 Thread Paul Burton
set_bit() can set bits 0-15 using an ori instruction, rather than
loading the value -1 into a register & then using an ins instruction.

That is, rather than the following:

  li   t0, -1
  ll   t1, 0(t2)
  ins  t1, t0, 4, 1
  sc   t1, 0(t2)

We can have the simpler:

  ll   t1, 0(t2)
  ori  t1, t1, 0x10
  sc   t1, 0(t2)

The or path already allows immediates to be used, so simply restricting
the ins path to bits that don't fit in immediates is sufficient to take
advantage of this.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/bitops.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index e300960717e0..1e5739191ddf 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -77,7 +77,7 @@ static inline void set_bit(unsigned long nr, volatile 
unsigned long *addr)
}
 
 #if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
-   if (__builtin_constant_p(bit)) {
+   if (__builtin_constant_p(bit) && (bit >= 16)) {
loongson_llsc_mb();
do {
__asm__ __volatile__(
-- 
2.23.0



[PATCH v2 25/36] MIPS: bitops: Use BIT_WORD() & BITS_PER_LONG

2019-10-01 Thread Paul Burton
Rather than using custom SZLONG_LOG & SZLONG_MASK macros to shift & mask
a bit index to form word & bit offsets respectively, make use of the
standard BIT_WORD() & BITS_PER_LONG macros for the same purpose.

volatile is added to the definition of pointers to the long-sized word
we'll operate on, in order to prevent the compiler complaining that we
cast away the volatile qualifier of the addr argument. This should have
no effect on generated code, which in the LL/SC case is inline asm
anyway & in the non-LLSC case access is constrained by compiler barriers
provided by raw_local_irq_{save,restore}().
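
The index arithmetic itself is easy to sanity-check in user space; BITS_PER_LONG
and BIT_WORD() are re-defined locally here purely for illustration:

  #include <stdio.h>

  #define BITS_PER_LONG  (8 * sizeof(unsigned long))
  #define BIT_WORD(nr)   ((nr) / BITS_PER_LONG)

  int main(void)
  {
          unsigned long bitmap[4] = { 0 };
          unsigned long nr = 70;

          /* The same word/offset split that set_bit() now performs. */
          bitmap[BIT_WORD(nr)] |= 1UL << (nr % BITS_PER_LONG);

          printf("bit %lu -> word %lu, offset %lu, word value 0x%lx\n", nr,
                 (unsigned long)BIT_WORD(nr), (unsigned long)(nr % BITS_PER_LONG),
                 bitmap[BIT_WORD(nr)]);
          return 0;
  }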

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/bitops.h | 24 
 arch/mips/include/asm/llsc.h   |  4 
 arch/mips/lib/bitops.c | 31 +--
 3 files changed, 25 insertions(+), 34 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index fba0a842b98a..d39fca2def60 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -87,8 +87,8 @@ int __mips_test_and_change_bit(unsigned long nr,
  */
 static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
 {
-   unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
-   int bit = nr & SZLONG_MASK;
+   volatile unsigned long *m = &addr[BIT_WORD(nr)];
+   int bit = nr % BITS_PER_LONG;
 
if (!kernel_uses_llsc) {
__mips_set_bit(nr, addr);
@@ -117,8 +117,8 @@ static inline void set_bit(unsigned long nr, volatile 
unsigned long *addr)
  */
 static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
-   unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
-   int bit = nr & SZLONG_MASK;
+   volatile unsigned long *m = &addr[BIT_WORD(nr)];
+   int bit = nr % BITS_PER_LONG;
 
if (!kernel_uses_llsc) {
__mips_clear_bit(nr, addr);
@@ -160,8 +160,8 @@ static inline void clear_bit_unlock(unsigned long nr, 
volatile unsigned long *ad
  */
 static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
 {
-   unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
-   int bit = nr & SZLONG_MASK;
+   volatile unsigned long *m = &addr[BIT_WORD(nr)];
+   int bit = nr % BITS_PER_LONG;
 
if (!kernel_uses_llsc) {
__mips_change_bit(nr, addr);
@@ -183,8 +183,8 @@ static inline void change_bit(unsigned long nr, volatile 
unsigned long *addr)
 static inline int test_and_set_bit_lock(unsigned long nr,
volatile unsigned long *addr)
 {
-   unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
-   int bit = nr & SZLONG_MASK;
+   volatile unsigned long *m = &addr[BIT_WORD(nr)];
+   int bit = nr % BITS_PER_LONG;
unsigned long res, orig;
 
if (!kernel_uses_llsc) {
@@ -228,8 +228,8 @@ static inline int test_and_set_bit(unsigned long nr,
 static inline int test_and_clear_bit(unsigned long nr,
volatile unsigned long *addr)
 {
-   unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
-   int bit = nr & SZLONG_MASK;
+   volatile unsigned long *m = &addr[BIT_WORD(nr)];
+   int bit = nr % BITS_PER_LONG;
unsigned long res, orig;
 
smp_mb__before_llsc();
@@ -267,8 +267,8 @@ static inline int test_and_clear_bit(unsigned long nr,
 static inline int test_and_change_bit(unsigned long nr,
volatile unsigned long *addr)
 {
-   unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
-   int bit = nr & SZLONG_MASK;
+   volatile unsigned long *m = &addr[BIT_WORD(nr)];
+   int bit = nr % BITS_PER_LONG;
unsigned long res, orig;
 
smp_mb__before_llsc();
diff --git a/arch/mips/include/asm/llsc.h b/arch/mips/include/asm/llsc.h
index d240a4a2d1c4..c49738bc3bda 100644
--- a/arch/mips/include/asm/llsc.h
+++ b/arch/mips/include/asm/llsc.h
@@ -12,15 +12,11 @@
 #include 
 
 #if _MIPS_SZLONG == 32
-#define SZLONG_LOG 5
-#define SZLONG_MASK 31UL
 #define __LL   "ll "
 #define __SC   "sc "
 #define __INS  "ins"
 #define __EXT  "ext"
 #elif _MIPS_SZLONG == 64
-#define SZLONG_LOG 6
-#define SZLONG_MASK 63UL
 #define __LL   "lld"
 #define __SC   "scd"
 #define __INS  "dins   "
diff --git a/arch/mips/lib/bitops.c b/arch/mips/lib/bitops.c
index fba402c0879d..116d0bd8b2ae 100644
--- a/arch/mips/lib/bitops.c
+++ b/arch/mips/lib/bitops.c
@@ -7,6 +7,7 @@
  * Copyright (c) 1999, 2000  Silicon Graphics, Inc.
  */
 #include 
+#include 
 #include 
 #include 
 
@@ -19,12 +20,11 @@
  */
 void __mips_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
-   unsigned long *a = (unsigned long *)addr;
-   unsigned bit = nr & SZLONG_MASK;

[PATCH v2 34/36] MIPS: genex: Add Loongson3 LL/SC workaround to ejtag_debug_handler

2019-10-01 Thread Paul Burton
In ejtag_debug_handler we use LL & SC instructions to acquire & release
an open-coded spinlock. For Loongson3 systems affected by LL/SC errata
this requires that we insert a sync instruction prior to the LL in order
to ensure correct behavior of the LL/SC loop.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/kernel/genex.S | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/mips/kernel/genex.S b/arch/mips/kernel/genex.S
index efde27c99414..ac4f2b835165 100644
--- a/arch/mips/kernel/genex.S
+++ b/arch/mips/kernel/genex.S
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -353,6 +354,7 @@ NESTED(ejtag_debug_handler, PT_SIZE, sp)
 
 #ifdef CONFIG_SMP
 1: PTR_LA  k0, ejtag_debug_buffer_spinlock
+   __SYNC(full, loongson3_war)
ll  k0, 0(k0)
bnezk0, 1b
PTR_LA  k0, ejtag_debug_buffer_spinlock
-- 
2.23.0



[PATCH v2 22/36] MIPS: bitops: Use the BIT() macro

2019-10-01 Thread Paul Burton
Use the BIT() macro in asm/bitops.h rather than open-coding its
equivalent.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/bitops.h | 31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 7314ba5a3683..0f8ff896e86b 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -13,6 +13,7 @@
 #error only  can be included directly
 #endif
 
+#include 
 #include 
 #include 
 #include 
@@ -70,7 +71,7 @@ static inline void set_bit(unsigned long nr, volatile 
unsigned long *addr)
"   beqzl   %0, 1b  \n"
"   .setpop \n"
: "=" (temp), "=" GCC_OFF_SMALL_ASM() (*m)
-   : "ir" (1UL << bit), GCC_OFF_SMALL_ASM() (*m)
+   : "ir" (BIT(bit)), GCC_OFF_SMALL_ASM() (*m)
: __LLSC_CLOBBER);
return;
}
@@ -99,7 +100,7 @@ static inline void set_bit(unsigned long nr, volatile 
unsigned long *addr)
"   " __SC  "%0, %1 \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-   : "ir" (1UL << bit)
+   : "ir" (BIT(bit))
: __LLSC_CLOBBER);
} while (unlikely(!temp));
 }
@@ -135,7 +136,7 @@ static inline void clear_bit(unsigned long nr, volatile 
unsigned long *addr)
"   beqzl   %0, 1b  \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-   : "ir" (~(1UL << bit))
+   : "ir" (~(BIT(bit)))
: __LLSC_CLOBBER);
return;
}
@@ -164,7 +165,7 @@ static inline void clear_bit(unsigned long nr, volatile 
unsigned long *addr)
"   " __SC "%0, %1  \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-   : "ir" (~(1UL << bit))
+   : "ir" (~(BIT(bit)))
: __LLSC_CLOBBER);
} while (unlikely(!temp));
 }
@@ -213,7 +214,7 @@ static inline void change_bit(unsigned long nr, volatile 
unsigned long *addr)
"   beqzl   %0, 1b  \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-   : "ir" (1UL << bit)
+   : "ir" (BIT(bit))
: __LLSC_CLOBBER);
return;
}
@@ -228,7 +229,7 @@ static inline void change_bit(unsigned long nr, volatile 
unsigned long *addr)
"   " __SC  "%0, %1 \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-   : "ir" (1UL << bit)
+   : "ir" (BIT(bit))
: __LLSC_CLOBBER);
} while (unlikely(!temp));
 }
@@ -261,7 +262,7 @@ static inline int test_and_set_bit_lock(unsigned long nr,
"   and %2, %0, %3  \n"
"   .setpop \n"
: "=" (temp), "+m" (*m), "=" (res)
-   : "ir" (1UL << bit)
+   : "ir" (BIT(bit))
: __LLSC_CLOBBER);
} else {
loongson_llsc_mb();
@@ -274,11 +275,11 @@ static inline int test_and_set_bit_lock(unsigned long nr,
"   " __SC  "%2, %1 \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" 
(res)
-   : "ir" (1UL << bit)
+   : "ir" (BIT(bit))
: __LLSC_CLOBBER);
} while (unlikely(!res));
 
-   res = temp & (1UL << bit);
+   res = temp & BIT(bit);
}
 
smp_llsc_mb();
@@ -332,7 +333,7 @@ static inline int test_and_clear_bit(unsigned long nr,
" 

[PATCH v2 24/36] MIPS: bitops: Abstract LL/SC loops

2019-10-01 Thread Paul Burton
Introduce __bit_op() & __test_bit_op() macros which abstract away the
implementation of LL/SC loops. This cuts down on a lot of duplicate
boilerplate code, and also allows R1_LLSC_WAR to be handled outside
of the individual bitop functions.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/bitops.h | 267 -
 1 file changed, 63 insertions(+), 204 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 7671db2a7b73..fba0a842b98a 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -25,6 +25,41 @@
 #include 
 #include 
 
+#define __bit_op(mem, insn, inputs...) do {\
+   unsigned long temp; \
+   \
+   asm volatile(   \
+   "   .setpush\n" \
+   "   .set" MIPS_ISA_LEVEL "  \n" \
+   "1: " __LL  "%0, %1 \n" \
+   "   " insn  "   \n" \
+   "   " __SC  "%0, %1 \n" \
+   "   " __SC_BEQZ "%0, 1b \n" \
+   "   .setpop \n" \
+   : "="(temp), "+" GCC_OFF_SMALL_ASM()(mem) \
+   : inputs\
+   : __LLSC_CLOBBER);  \
+} while (0)
+
+#define __test_bit_op(mem, ll_dst, insn, inputs...) ({ \
+   unsigned long orig, temp;   \
+   \
+   asm volatile(   \
+   "   .setpush\n" \
+   "   .set" MIPS_ISA_LEVEL "  \n" \
+   "1: " __LL  ll_dst ", %2\n" \
+   "   " insn  "   \n" \
+   "   " __SC  "%1, %2 \n" \
+   "   " __SC_BEQZ "%1, 1b \n" \
+   "   .setpop \n" \
+   : "="(orig), "="(temp), \
+ "+" GCC_OFF_SMALL_ASM()(mem)  \
+   : inputs\
+   : __LLSC_CLOBBER);  \
+   \
+   orig;   \
+})
+
 /*
  * These are the "slower" versions of the functions and are in bitops.c.
  * These functions call raw_local_irq_{save,restore}().
@@ -54,55 +89,20 @@ static inline void set_bit(unsigned long nr, volatile 
unsigned long *addr)
 {
unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
int bit = nr & SZLONG_MASK;
-   unsigned long temp;
 
if (!kernel_uses_llsc) {
__mips_set_bit(nr, addr);
return;
}
 
-   if (R1_LLSC_WAR) {
-   __asm__ __volatile__(
-   "   .setpush\n"
-   "   .setarch=r4000  \n"
-   "1: " __LL "%0, %1  # set_bit   \n"
-   "   or  %0, %2  \n"
-   "   " __SC  "%0, %1 \n"
-   "   beqzl   %0, 1b  \n"
-   "   .setpop \n"
-   : "=" (temp), "=" GCC_OFF_SMALL_ASM() (*m)
-   : "ir" (BIT(bit)), GCC_OFF_SMALL_ASM() (*m)
-   : __LLSC_CLOBBER);
-   return;
-   }
-
if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(bit) && (bit >= 16)) {
loongson_llsc_mb();
-   do {
-   __asm__ __volatile__(
-   "   " __LL "%0, %1  # set_bit   \n"
-   "   " __INS "%0, %3, %2, 1  \n"
-   "   " __SC "%0, %1  \n"
-   : "=" (temp), "+" GCC_OFF_SM

[PATCH v2 29/36] MIPS: cmpxchg: Omit redundant barriers for Loongson3

2019-10-01 Thread Paul Burton
When building a kernel configured to support Loongson3 LL/SC workarounds
(ie. CONFIG_CPU_LOONGSON3_WORKAROUNDS=y) the inline assembly in
__xchg_asm() & __cmpxchg_asm() already emits completion barriers, and as
such we don't need to emit extra barriers from the xchg() or cmpxchg()
macros. Add compile-time constant checks causing us to omit the
redundant memory barriers.
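
A sketch of the mechanism; LOONGSON3_WAR below is a stand-in for
__SYNC_loongson3_war, which the kernel defines as a compile-time constant.
Because the guard is constant, the compiler drops the barrier call entirely in
the workaround configuration instead of testing anything at run time:

  #define LOONGSON3_WAR 1  /* stand-in; non-zero when the workaround is built in */

  static inline void barrier_demo(void)
  {
          asm volatile("" ::: "memory");  /* placeholder for smp_mb__before_llsc() */
  }

  static inline void xchg_prologue(void)
  {
          /* Constant-false condition: no barrier is emitted in this config. */
          if (!LOONGSON3_WAR)
                  barrier_demo();
  }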

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/cmpxchg.h | 26 +++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
index fc121d20a980..820df68e32e1 100644
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -94,7 +94,13 @@ static inline unsigned long __xchg(volatile void *ptr, 
unsigned long x,
 ({ \
__typeof__(*(ptr)) __res;   \
\
-   smp_mb__before_llsc();  \
+   /*  \
+* In the Loongson3 workaround case __xchg_asm() already\
+* contains a completion barrier prior to the LL, so we don't   \
+* need to emit an extra one here.  \
+*/ \
+   if (!__SYNC_loongson3_war)  \
+   smp_mb__before_llsc();  \
\
__res = (__typeof__(*(ptr)))\
__xchg((ptr), (unsigned long)(x), sizeof(*(ptr)));  \
@@ -179,9 +185,23 @@ static inline unsigned long __cmpxchg(volatile void *ptr, 
unsigned long old,
 ({ \
__typeof__(*(ptr)) __res;   \
\
-   smp_mb__before_llsc();  \
+   /*  \
+* In the Loongson3 workaround case __cmpxchg_asm() already \
+* contains a completion barrier prior to the LL, so we don't   \
+* need to emit an extra one here.  \
+*/ \
+   if (!__SYNC_loongson3_war)  \
+   smp_mb__before_llsc();  \
+   \
__res = cmpxchg_local((ptr), (old), (new)); \
-   smp_llsc_mb();  \
+   \
+   /*  \
+* In the Loongson3 workaround case __cmpxchg_asm() already \
+* contains a completion barrier after the SC, so we don't  \
+* need to emit an extra one here.  \
+*/ \
+   if (!__SYNC_loongson3_war)  \
+   smp_llsc_mb();  \
\
__res;  \
 })
-- 
2.23.0



[PATCH v2 28/36] MIPS: cmpxchg: Emit Loongson3 sync workarounds within asm

2019-10-01 Thread Paul Burton
Generate the sync instructions required to workaround Loongson3 LL/SC
errata within inline asm blocks, which feels a little safer than doing
it from C where strictly speaking the compiler would be well within its
rights to insert a memory access between the separate asm statements we
previously had, containing sync & ll instructions respectively.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/cmpxchg.h | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
index 5d3f0e3513b4..fc121d20a980 100644
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /*
@@ -36,12 +37,12 @@ extern unsigned long __xchg_called_with_bad_pointer(void)
__typeof(*(m)) __ret;   \
\
if (kernel_uses_llsc) { \
-   loongson_llsc_mb(); \
__asm__ __volatile__(   \
"   .setpush\n" \
"   .setnoat\n" \
"   .setpush\n" \
"   .set" MIPS_ISA_ARCH_LEVEL " \n" \
+   "   " __SYNC(full, loongson3_war) " \n" \
"1: " ld "  %0, %2  # __xchg_asm\n" \
"   .setpop \n" \
"   move$1, %z3 \n" \
@@ -108,12 +109,12 @@ static inline unsigned long __xchg(volatile void *ptr, 
unsigned long x,
__typeof(*(m)) __ret;   \
\
if (kernel_uses_llsc) { \
-   loongson_llsc_mb(); \
__asm__ __volatile__(   \
"   .setpush\n" \
"   .setnoat\n" \
"   .setpush\n" \
"   .set"MIPS_ISA_ARCH_LEVEL"   \n" \
+   "   " __SYNC(full, loongson3_war) " \n" \
"1: " ld "  %0, %2  # __cmpxchg_asm \n" \
"   bne %0, %z3, 2f \n" \
"   .setpop \n" \
@@ -122,11 +123,10 @@ static inline unsigned long __xchg(volatile void *ptr, 
unsigned long x,
"   " st "  $1, %1  \n" \
"\t" __SC_BEQZ  "$1, 1b \n" \
"   .setpop \n" \
-   "2: \n" \
+   "2: " __SYNC(full, loongson3_war) " \n" \
: "=" (__ret), "=" GCC_OFF_SMALL_ASM() (*m)   \
: GCC_OFF_SMALL_ASM() (*m), "Jr" (old), "Jr" (new)  \
: __LLSC_CLOBBER);  \
-   loongson_llsc_mb(); \
} else {\
unsigned long __flags;  \
\
@@ -222,11 +222,11 @@ static inline unsigned long __cmpxchg64(volatile void 
*ptr,
 */
local_irq_save(flags);
 
-   loongson_llsc_mb();
asm volatile(
"   .setpush\n"
"   .set" MIPS_ISA_ARCH_LEVEL " \n"
/* Load 64 bits from ptr */
+   "   " __SYNC(full, loongson3_war) " \n"
"1: lld %L0, %3 # __cmpxchg64   \n"
/*
 * Split the 64 bit value we loaded into the 2 registers that hold the
@@ -260,7 +260,7 @@ static inline unsigned long __cmpxchg64(volatile void *ptr,
/* If we failed, loop! */
"\t" __SC_BEQZ "%L1, 1b \n&quo

[PATCH v2 20/36] MIPS: bitops: Implement test_and_set_bit() in terms of _lock variant

2019-10-01 Thread Paul Burton
The only difference between test_and_set_bit() & test_and_set_bit_lock()
is memory ordering barrier semantics - the former provides a full
barrier whilst the latter only provides acquire semantics.

We can therefore implement test_and_set_bit() in terms of
test_and_set_bit_lock() with the addition of the extra memory barrier.
Do this in order to avoid duplicating logic.
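
The resulting wrapper is essentially the following sketch (kerneldoc comment
omitted; the body matches the shape visible in the context of later patches in
the series):

  static inline int test_and_set_bit(unsigned long nr,
          volatile unsigned long *addr)
  {
          /* Full-barrier semantics = a leading barrier plus the acquire
           * semantics already provided by the _lock variant. */
          smp_mb__before_llsc();
          return test_and_set_bit_lock(nr, addr);
  }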

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/bitops.h | 66 +++---
 arch/mips/lib/bitops.c | 26 --
 2 files changed, 13 insertions(+), 79 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 03532ae9f528..ea35a2e87b6d 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -31,8 +31,6 @@
 void __mips_set_bit(unsigned long nr, volatile unsigned long *addr);
 void __mips_clear_bit(unsigned long nr, volatile unsigned long *addr);
 void __mips_change_bit(unsigned long nr, volatile unsigned long *addr);
-int __mips_test_and_set_bit(unsigned long nr,
-   volatile unsigned long *addr);
 int __mips_test_and_set_bit_lock(unsigned long nr,
 volatile unsigned long *addr);
 int __mips_test_and_clear_bit(unsigned long nr,
@@ -236,24 +234,22 @@ static inline void change_bit(unsigned long nr, volatile 
unsigned long *addr)
 }
 
 /*
- * test_and_set_bit - Set a bit and return its old value
+ * test_and_set_bit_lock - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
- * This operation is atomic and cannot be reordered.
- * It also implies a memory barrier.
+ * This operation is atomic and implies acquire ordering semantics
+ * after the memory operation.
  */
-static inline int test_and_set_bit(unsigned long nr,
+static inline int test_and_set_bit_lock(unsigned long nr,
volatile unsigned long *addr)
 {
unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
int bit = nr & SZLONG_MASK;
unsigned long res, temp;
 
-   smp_mb__before_llsc();
-
if (!kernel_uses_llsc) {
-   res = __mips_test_and_set_bit(nr, addr);
+   res = __mips_test_and_set_bit_lock(nr, addr);
} else if (R1_LLSC_WAR) {
__asm__ __volatile__(
"   .setpush\n"
@@ -264,7 +260,7 @@ static inline int test_and_set_bit(unsigned long nr,
"   beqzl   %2, 1b  \n"
"   and %2, %0, %3  \n"
"   .setpop \n"
-   : "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" (res)
+   : "=" (temp), "+m" (*m), "=" (res)
: "r" (1UL << bit)
: __LLSC_CLOBBER);
} else {
@@ -291,56 +287,20 @@ static inline int test_and_set_bit(unsigned long nr,
 }
 
 /*
- * test_and_set_bit_lock - Set a bit and return its old value
+ * test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
- * This operation is atomic and implies acquire ordering semantics
- * after the memory operation.
+ * This operation is atomic and cannot be reordered.
+ * It also implies a memory barrier.
  */
-static inline int test_and_set_bit_lock(unsigned long nr,
+static inline int test_and_set_bit(unsigned long nr,
volatile unsigned long *addr)
 {
-   unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
-   int bit = nr & SZLONG_MASK;
-   unsigned long res, temp;
-
-   if (!kernel_uses_llsc) {
-   res = __mips_test_and_set_bit_lock(nr, addr);
-   } else if (R1_LLSC_WAR) {
-   __asm__ __volatile__(
-   "   .setpush\n"
-   "   .setarch=r4000  \n"
-   "1: " __LL "%0, %1  # test_and_set_bit  \n"
-   "   or  %2, %0, %3  \n"
-   "   " __SC  "%2, %1 \n"
-   "   beqzl   %2, 1b  \n"
-   "   and %2, %0, %3  \n"
-   "   .setpop \n"
-   : "=" (temp), "+m" (*m), "=" (res)
-   : "r" (1UL << bit)
-   : __LLSC_CLOBBER);
-   } else {
-   do {
-   __asm__ __volatile__(
-   

[PATCH v2 35/36] MIPS: genex: Don't reload address unnecessarily

2019-10-01 Thread Paul Burton
In ejtag_debug_handler() we must reload the address of
ejtag_debug_buffer_spinlock if an sc fails, since the address in k0 will
have been clobbered by the result of the sc instruction. In the case
where we simply load a non-zero value (ie. there's contention for the
lock) the address will not be clobbered & we can simply branch back to
repeat the load from memory without reloading the address into k0.

The primary motivation for this change is that it moves the target of
the bnez instruction to an instruction within the LL/SC loop (the LL
itself), which we know contains no other memory accesses & therefore
isn't affected by Loongson3 LL/SC errata.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/kernel/genex.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/mips/kernel/genex.S b/arch/mips/kernel/genex.S
index ac4f2b835165..60ede6b75a3b 100644
--- a/arch/mips/kernel/genex.S
+++ b/arch/mips/kernel/genex.S
@@ -355,8 +355,8 @@ NESTED(ejtag_debug_handler, PT_SIZE, sp)
 #ifdef CONFIG_SMP
 1: PTR_LA  k0, ejtag_debug_buffer_spinlock
__SYNC(full, loongson3_war)
-   ll  k0, 0(k0)
-   bnezk0, 1b
+2: ll  k0, 0(k0)
+   bnezk0, 2b
PTR_LA  k0, ejtag_debug_buffer_spinlock
sc  k0, 0(k0)
beqzk0, 1b
-- 
2.23.0



[PATCH v2 12/36] MIPS: atomic: Emit Loongson3 sync workarounds within asm

2019-10-01 Thread Paul Burton
Generate the sync instructions required to workaround Loongson3 LL/SC
errata within inline asm blocks, which feels a little safer than doing
it from C where strictly speaking the compiler would be well within its
rights to insert a memory access between the separate asm statements we
previously had, containing sync & ll instructions respectively.
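
For illustration, a minimal sketch of the two code shapes (hypothetical
helper functions on a MIPS target; the constraints are simplified
relative to the kernel's real macros):

  /* Old shape: the sync sits in its own asm statement, so the compiler
   * may schedule its own spills/reloads between the sync & the ll. */
  static inline int old_shape(int *p)
  {
          int tmp;

          asm volatile("sync" ::: "memory");      /* was loongson_llsc_mb() */
          asm volatile("ll      %0, %1" : "=&r"(tmp) : "m"(*p));
          return tmp;
  }

  /* New shape: sync & ll are emitted from a single asm block, so no
   * compiler-generated access can land between them. */
  static inline int new_shape(int *p)
  {
          int tmp;

          asm volatile("sync\n\t"
                       "ll      %0, %1"
                       : "=&r"(tmp) : "m"(*p));
          return tmp;
  }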

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/atomic.h | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index b834af5a7382..841ff274ada6 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define ATOMIC_INIT(i)   { (i) }
@@ -56,10 +57,10 @@ static __inline__ void pfx##_##op(type i, pfx##_t * v)  
\
return; \
}   \
\
-   loongson_llsc_mb(); \
__asm__ __volatile__(   \
"   .setpush\n" \
"   .set" MIPS_ISA_LEVEL "  \n" \
+   "   " __SYNC(full, loongson3_war) " \n" \
"1: " #ll " %0, %1  # " #pfx "_" #op "  \n" \
"   " #asm_op " %0, %2  \n" \
"   " #sc " %0, %1  \n" \
@@ -85,10 +86,10 @@ static __inline__ type pfx##_##op##_return_relaxed(type i, 
pfx##_t * v) \
return result;  \
}   \
\
-   loongson_llsc_mb(); \
__asm__ __volatile__(   \
"   .setpush\n" \
"   .set" MIPS_ISA_LEVEL "  \n" \
+   "   " __SYNC(full, loongson3_war) " \n" \
"1: " #ll " %1, %2  # " #pfx "_" #op "_return\n"\
"   " #asm_op " %0, %1, %3  \n" \
"   " #sc " %0, %2  \n" \
@@ -117,10 +118,10 @@ static __inline__ type pfx##_fetch_##op##_relaxed(type i, 
pfx##_t * v)\
return result;  \
}   \
\
-   loongson_llsc_mb(); \
__asm__ __volatile__(   \
"   .setpush\n" \
"   .set" MIPS_ISA_LEVEL "  \n" \
+   "   " __SYNC(full, loongson3_war) " \n" \
"1: " #ll " %1, %2  # " #pfx "_fetch_" #op "\n" \
"   " #asm_op " %0, %1, %3  \n" \
"   " #sc " %0, %2  \n" \
@@ -200,10 +201,10 @@ static __inline__ int atomic_sub_if_positive(int i, 
atomic_t * v)
if (kernel_uses_llsc) {
int temp;
 
-   loongson_llsc_mb();
__asm__ __volatile__(
"   .setpush\n"
"   .set"MIPS_ISA_LEVEL"\n"
+   "   " __SYNC(full, loongson3_war) " \n"
"1: ll  %1, %2  # atomic_sub_if_positive\n"
"   .setpop \n"
"   subu%0, %1, %3  \n"
@@ -213,7 +214,7 @@ static __inline__ int atomic_sub_if_positive(int i, 
atomic_t * v)
"   .set"MIPS_ISA_LEVEL"\n"
"   sc  %1, %2  \n"
"\t

[PATCH v2 13/36] MIPS: atomic: Use _atomic barriers in atomic_sub_if_positive()

2019-10-01 Thread Paul Burton
Use smp_mb__before_atomic() & smp_mb__after_atomic() in
atomic_sub_if_positive() rather than the equivalent
smp_mb__before_llsc() & smp_llsc_mb(). The former are more standard &
this preps us for avoiding redundant duplicate barriers on Loongson3 in
a later patch.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/atomic.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index 841ff274ada6..24443ef29337 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -196,7 +196,7 @@ static __inline__ int atomic_sub_if_positive(int i, 
atomic_t * v)
 {
int result;
 
-   smp_mb__before_llsc();
+   smp_mb__before_atomic();
 
if (kernel_uses_llsc) {
int temp;
@@ -237,7 +237,7 @@ static __inline__ int atomic_sub_if_positive(int i, 
atomic_t * v)
 * another barrier here.
 */
if (!__SYNC_loongson3_war)
-   smp_llsc_mb();
+   smp_mb__after_atomic();
 
return result;
 }
-- 
2.23.0



[PATCH v2 19/36] MIPS: bitops: ins start position is always an immediate

2019-10-01 Thread Paul Burton
The start position for an ins instruction is always encoded as an
immediate, so allowing registers to be used by the inline asm makes no
sense. It should never happen anyway since a bit index should always be
small enough to be treated as an immediate, but remove the nonsensical
"r" for sanity.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/bitops.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 0f5329e32e87..03532ae9f528 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -85,7 +85,7 @@ static inline void set_bit(unsigned long nr, volatile 
unsigned long *addr)
"   " __INS "%0, %3, %2, 1  \n"
"   " __SC "%0, %1  \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-   : "ir" (bit), "r" (~0)
+   : "i" (bit), "r" (~0)
: __LLSC_CLOBBER);
} while (unlikely(!temp));
return;
@@ -150,7 +150,7 @@ static inline void clear_bit(unsigned long nr, volatile 
unsigned long *addr)
"   " __INS "%0, $0, %2, 1  \n"
"   " __SC "%0, %1  \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-   : "ir" (bit)
+   : "i" (bit)
: __LLSC_CLOBBER);
} while (unlikely(!temp));
return;
@@ -383,7 +383,7 @@ static inline int test_and_clear_bit(unsigned long nr,
"   " __INS "%0, $0, %3, 1  \n"
"   " __SC  "%0, %1 \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" 
(res)
-   : "ir" (bit)
+   : "i" (bit)
: __LLSC_CLOBBER);
} while (unlikely(!temp));
} else {
-- 
2.23.0



[PATCH v2 08/36] MIPS: barrier: Clean up sync_ginv()

2019-10-01 Thread Paul Burton
Use the new __SYNC() infrastructure to implement sync_ginv(), for
consistency with much of the rest of the asm/barrier.h.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/barrier.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index a117c6d95038..c7e05e832da9 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -163,7 +163,7 @@ static inline void wmb(void)
 
 static inline void sync_ginv(void)
 {
-   asm volatile("sync\t%0" :: "i"(__SYNC_ginv));
+   asm volatile(__SYNC(ginv, always));
 }
 
 #include 
-- 
2.23.0



[PATCH v2 14/36] MIPS: atomic: Unify 32b & 64b sub_if_positive

2019-10-01 Thread Paul Burton
Unify the definitions of atomic_sub_if_positive() &
atomic64_sub_if_positive() using a macro like we do for most other
atomic functions. This allows us to share the implementation ensuring
consistency between the two. Notably this provides the appropriate
loongson3_war barriers in the atomic64_sub_if_positive() case which were
previously missing.

The code is rearranged a little to handle the !kernel_uses_llsc case
first in order to de-indent the LL/SC case & allow us not to go over 80
characters per line.
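
For reference, the unified macro is then expected to be instantiated once
per width, roughly as follows (a sketch only; the argument order follows
the ATOMIC_SIP_OP() definition in the diff below):

  ATOMIC_SIP_OP(atomic, int, subu, ll, sc)
  #ifdef CONFIG_64BIT
  ATOMIC_SIP_OP(atomic64, s64, dsubu, lld, scd)
  #endif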

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/atomic.h | 164 -
 1 file changed, 58 insertions(+), 106 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index 24443ef29337..96ef50fa2817 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -192,65 +192,71 @@ ATOMIC_OPS(atomic64, xor, s64, ^=, xor, lld, scd)
  * Atomically test @v and subtract @i if @v is greater or equal than @i.
  * The function returns the old value of @v minus @i.
  */
-static __inline__ int atomic_sub_if_positive(int i, atomic_t * v)
-{
-   int result;
-
-   smp_mb__before_atomic();
-
-   if (kernel_uses_llsc) {
-   int temp;
-
-   __asm__ __volatile__(
-   "   .setpush\n"
-   "   .set"MIPS_ISA_LEVEL"\n"
-   "   " __SYNC(full, loongson3_war) " \n"
-   "1: ll  %1, %2  # atomic_sub_if_positive\n"
-   "   .setpop \n"
-   "   subu%0, %1, %3  \n"
-   "   move%1, %0  \n"
-   "   bltz%0, 2f  \n"
-   "   .setpush\n"
-   "   .set"MIPS_ISA_LEVEL"\n"
-   "   sc  %1, %2  \n"
-   "\t" __SC_BEQZ "%1, 1b  \n"
-   "2: " __SYNC(full, loongson3_war) " \n"
-   "   .setpop \n"
-   : "=" (result), "=" (temp),
- "+" GCC_OFF_SMALL_ASM() (v->counter)
-   : "Ir" (i) : __LLSC_CLOBBER);
-   } else {
-   unsigned long flags;
+#define ATOMIC_SIP_OP(pfx, type, op, ll, sc)   \
+static __inline__ int pfx##_sub_if_positive(type i, pfx##_t * v)   \
+{  \
+   type temp, result;  \
+   \
+   smp_mb__before_atomic();\
+   \
+   if (!kernel_uses_llsc) {\
+   unsigned long flags;\
+   \
+   raw_local_irq_save(flags);  \
+   result = v->counter;\
+   result -= i;\
+   if (result >= 0)\
+   v->counter = result;\
+   raw_local_irq_restore(flags);   \
+   smp_mb__after_atomic(); \
+   return result;  \
+   }   \
+   \
+   __asm__ __volatile__(   \
+   "   .setpush\n" \
+   "   .set" MIPS_ISA_LEVEL "  \n" \
+   "   " __SYNC(full, loongson3_war) " \n" \
+   "1: " #ll " %1, %2  # atomic_sub_if_positive\n" \
+   "   .setpop \n" \
+   "   " #op " %0, %1, %3  \n" \
+   "   move%1, %0

[PATCH v2 04/36] MIPS: barrier: Clean up rmb() & wmb() definitions

2019-10-01 Thread Paul Burton
Simplify our definitions of rmb() & wmb() using the new __SYNC()
infrastructure.

The fast_rmb() & fast_wmb() macros are removed, since they only provided
a level of indirection that made the code less readable & weren't
directly used anywhere in the kernel tree.

The Octeon #ifdef'ery is removed, since the "syncw" instruction
previously used is merely an alias for "sync 4" which __SYNC() will emit
for the wmb sync type when the kernel is configured for an Octeon CPU.
Similarly __SYNC() will emit nothing for the rmb sync type in Octeon
configurations.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/barrier.h | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 5ad39bfd3b6d..f36cab87cfde 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -26,6 +26,18 @@
 #define __sync()   do { } while(0)
 #endif
 
+static inline void rmb(void)
+{
+   asm volatile(__SYNC(rmb, always) ::: "memory");
+}
+#define rmb rmb
+
+static inline void wmb(void)
+{
+   asm volatile(__SYNC(wmb, always) ::: "memory");
+}
+#define wmb wmb
+
 #define __fast_iob()   \
__asm__ __volatile__(   \
".set   push\n\t"   \
@@ -37,16 +49,9 @@
: "m" (*(int *)CKSEG1)  \
: "memory")
 #ifdef CONFIG_CPU_CAVIUM_OCTEON
-# define OCTEON_SYNCW_STR  ".set push\n.set 
arch=octeon\nsyncw\nsyncw\n.set pop\n"
-# define __syncw() __asm__ __volatile__(OCTEON_SYNCW_STR : : : "memory")
-
-# define fast_wmb()__syncw()
-# define fast_rmb()barrier()
 # define fast_mb() __sync()
 # define fast_iob()do { } while (0)
 #else /* ! CONFIG_CPU_CAVIUM_OCTEON */
-# define fast_wmb()__sync()
-# define fast_rmb()__sync()
 # define fast_mb() __sync()
 # ifdef CONFIG_SGI_IP28
 #  define fast_iob()   \
@@ -83,19 +88,14 @@
 
 #endif /* !CONFIG_CPU_HAS_WB */
 
-#define wmb()  fast_wmb()
-#define rmb()  fast_rmb()
-
 #if defined(CONFIG_WEAK_ORDERING)
 # ifdef CONFIG_CPU_CAVIUM_OCTEON
 #  define __smp_mb()   __sync()
-#  define __smp_rmb()  barrier()
-#  define __smp_wmb()  __syncw()
 # else
 #  define __smp_mb()   __asm__ __volatile__("sync" : : :"memory")
-#  define __smp_rmb()  __asm__ __volatile__("sync" : : :"memory")
-#  define __smp_wmb()  __asm__ __volatile__("sync" : : :"memory")
 # endif
+# define __smp_rmb()   rmb()
+# define __smp_wmb()   wmb()
 #else
 #define __smp_mb() barrier()
 #define __smp_rmb()barrier()
-- 
2.23.0



[PATCH v2 11/36] MIPS: atomic: Use one macro to generate 32b & 64b functions

2019-10-01 Thread Paul Burton
Cut down on duplication by generalizing the ATOMIC_OP(),
ATOMIC_OP_RETURN() & ATOMIC_FETCH_OP() macros to work for both 32b &
64b atomics, and removing the ATOMIC64_ variants. This ensures
consistency between our atomic_* & atomic64_* functions.
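
For reference, the generalized macros end up being invoked with an
explicit prefix, type & instruction mnemonics, roughly as below (a
sketch; the form matches the ATOMIC_OPS(atomic64, xor, s64, ^=, xor,
lld, scd) context line visible in later hunks of this series):

  ATOMIC_OPS(atomic, add, int, +=, addu, ll, sc)
  ATOMIC_OPS(atomic64, add, s64, +=, daddu, lld, scd)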

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/atomic.h | 196 -
 1 file changed, 45 insertions(+), 151 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index ace2ea005588..b834af5a7382 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -42,10 +42,10 @@
  */
 #define atomic_set(v, i)   WRITE_ONCE((v)->counter, (i))
 
-#define ATOMIC_OP(op, c_op, asm_op)\
-static __inline__ void atomic_##op(int i, atomic_t * v)
\
+#define ATOMIC_OP(pfx, op, type, c_op, asm_op, ll, sc) \
+static __inline__ void pfx##_##op(type i, pfx##_t * v) \
 {  \
-   int temp;   \
+   type temp;  \
\
if (!kernel_uses_llsc) {\
unsigned long flags;\
@@ -60,19 +60,19 @@ static __inline__ void atomic_##op(int i, atomic_t * v) 
\
__asm__ __volatile__(   \
"   .setpush\n" \
"   .set" MIPS_ISA_LEVEL "  \n" \
-   "1: ll  %0, %1  # atomic_" #op "\n" \
+   "1: " #ll " %0, %1  # " #pfx "_" #op "  \n" \
"   " #asm_op " %0, %2  \n" \
-   "   sc  %0, %1  \n" \
+   "   " #sc " %0, %1  \n" \
"\t" __SC_BEQZ "%0, 1b  \n" \
"   .setpop \n" \
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)\
: "Ir" (i) : __LLSC_CLOBBER);   \
 }
 
-#define ATOMIC_OP_RETURN(op, c_op, asm_op) \
-static __inline__ int atomic_##op##_return_relaxed(int i, atomic_t * v)
\
+#define ATOMIC_OP_RETURN(pfx, op, type, c_op, asm_op, ll, sc)  \
+static __inline__ type pfx##_##op##_return_relaxed(type i, pfx##_t * v)
\
 {  \
-   int temp, result;   \
+   type temp, result;  \
\
if (!kernel_uses_llsc) {\
unsigned long flags;\
@@ -89,9 +89,9 @@ static __inline__ int atomic_##op##_return_relaxed(int i, 
atomic_t * v)   \
__asm__ __volatile__(   \
"   .setpush\n" \
"   .set" MIPS_ISA_LEVEL "  \n" \
-   "1: ll  %1, %2  # atomic_" #op "_return \n" \
+   "1: " #ll " %1, %2  # " #pfx "_" #op "_return\n"\
"   " #asm_op " %0, %1, %3  \n" \
-   "   sc  %0, %2  \n" \
+   "   " #sc " %0, %2  \n" \
"\t" __SC_BEQZ "%0, 1b  \n" \
"   " #asm_op " %0, %1, %3  \n" \
"   .setpop \n" \
@@ -102,8 +102,8 @@ static __inline__ int atomic_##op##_return_relaxed(int i, 
atomic_t * v) \
return result;  \
 }
 
-#define ATOMIC_FETCH_OP(op, c_op, asm_op)  \
-static __inline__ int atomic_fetch_##op##_relaxed(int i, atomic_t * v) \
+#define ATOMIC_FETCH_OP(pfx, op, type, c_op, asm_op, ll, sc)   \
+static __i

[PATCH v2 01/36] MIPS: Unify sc beqz definition

2019-10-01 Thread Paul Burton
We currently duplicate the definition of __scbeqz in asm/atomic.h &
asm/cmpxchg.h. Move it to asm/llsc.h & rename it to __SC_BEQZ to fit
better with the existing __SC macro provided there.

We include a tab in the string in order to avoid the need for users to
indent code any further to include whitespace of their own after the
instruction mnemonic.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/atomic.h  | 28 +---
 arch/mips/include/asm/cmpxchg.h | 20 
 arch/mips/include/asm/llsc.h| 11 +++
 3 files changed, 24 insertions(+), 35 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index bb8658cc7f12..7578c807ef98 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -20,19 +20,9 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
-/*
- * Using a branch-likely instruction to check the result of an sc instruction
- * works around a bug present in R1 CPUs prior to revision 3.0 that could
- * cause ll-sc sequences to execute non-atomically.
- */
-#if R1_LLSC_WAR
-# define __scbeqz "beqzl"
-#else
-# define __scbeqz "beqz"
-#endif
-
 #define ATOMIC_INIT(i)   { (i) }
 
 /*
@@ -65,7 +55,7 @@ static __inline__ void atomic_##op(int i, atomic_t * v)   
  \
"1: ll  %0, %1  # atomic_" #op "\n"   \
"   " #asm_op " %0, %2  \n"   \
"   sc  %0, %1  \n"   \
-   "\t" __scbeqz " %0, 1b  \n"   \
+   "\t" __SC_BEQZ "%0, 1b  \n"   \
"   .setpop \n"   \
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)  \
: "Ir" (i) : __LLSC_CLOBBER); \
@@ -93,7 +83,7 @@ static __inline__ int atomic_##op##_return_relaxed(int i, 
atomic_t * v) \
"1: ll  %1, %2  # atomic_" #op "_return \n"   \
"   " #asm_op " %0, %1, %3  \n"   \
"   sc  %0, %2  \n"   \
-   "\t" __scbeqz " %0, 1b  \n"   \
+   "\t" __SC_BEQZ "%0, 1b  \n"   \
"   " #asm_op " %0, %1, %3  \n"   \
"   .setpop \n"   \
: "=" (result), "=" (temp),   \
@@ -127,7 +117,7 @@ static __inline__ int atomic_fetch_##op##_relaxed(int i, 
atomic_t * v)\
"1: ll  %1, %2  # atomic_fetch_" #op "  \n"   \
"   " #asm_op " %0, %1, %3  \n"   \
"   sc  %0, %2  \n"   \
-   "\t" __scbeqz " %0, 1b  \n"   \
+   "\t" __SC_BEQZ "%0, 1b  \n"   \
"   .setpop \n"   \
"   move%0, %1  \n"   \
: "=" (result), "=" (temp),   \
@@ -205,7 +195,7 @@ static __inline__ int atomic_sub_if_positive(int i, 
atomic_t * v)
"   .setpush\n"
"   .set"MIPS_ISA_LEVEL"\n"
"   sc  %1, %2  \n"
-   "\t" __scbeqz " %1, 1b  \n"
+   "\t" __SC_BEQZ "%1, 1b  \n"
"2: \n"
"   .setpop \n"
: "=" (result), "=" (temp),
@@ -267,7 +257,7 @@ static __inline__ void atomic64_##op(s64 i, atomic64_t * v) 
  \
"1: lld %0, %1  # atomic64_" #op "  \n"   \
"   " #asm_op " %0, %2  \n"   \
  

[PATCH v2 00/36] MIPS: barriers & atomics cleanups

2019-10-01 Thread Paul Burton
This series consists of a bunch of cleanups to the way we handle memory
barriers (though no changes to the sync instructions we use to implement
them) & atomic memory accesses. One major goal was to ensure the
Loongson3 LL/SC errata workarounds are applied in a safe manner from
within inline-asm & that we can automatically verify the resulting
kernel binary looks reasonable. Many patches are cleanups found along
the way.

Applies atop v5.4-rc1.

Changes in v2:
- Keep our fls/ffs implementations. Turns out GCC's builtins call
  intrinsics in some configurations, and if we'd need to go implement
  those then using the generic fls/ffs doesn't seem like such a win.
- De-string __WEAK_LLSC_MB to allow use with __SYNC_ELSE().
- Only try to build the loongson3-llsc-check tool from
  arch/mips/Makefile when CONFIG_CPU_LOONGSON3_WORKAROUNDS is enabled.

Paul Burton (36):
  MIPS: Unify sc beqz definition
  MIPS: Use compact branch for LL/SC loops on MIPSr6+
  MIPS: barrier: Add __SYNC() infrastructure
  MIPS: barrier: Clean up rmb() & wmb() definitions
  MIPS: barrier: Clean up __smp_mb() definition
  MIPS: barrier: Remove fast_mb() Octeon #ifdef'ery
  MIPS: barrier: Clean up __sync() definition
  MIPS: barrier: Clean up sync_ginv()
  MIPS: atomic: Fix whitespace in ATOMIC_OP macros
  MIPS: atomic: Handle !kernel_uses_llsc first
  MIPS: atomic: Use one macro to generate 32b & 64b functions
  MIPS: atomic: Emit Loongson3 sync workarounds within asm
  MIPS: atomic: Use _atomic barriers in atomic_sub_if_positive()
  MIPS: atomic: Unify 32b & 64b sub_if_positive
  MIPS: atomic: Deduplicate 32b & 64b read, set, xchg, cmpxchg
  MIPS: bitops: Handle !kernel_uses_llsc first
  MIPS: bitops: Only use ins for bit 16 or higher
  MIPS: bitops: Use MIPS_ISA_REV, not #ifdefs
  MIPS: bitops: ins start position is always an immediate
  MIPS: bitops: Implement test_and_set_bit() in terms of _lock variant
  MIPS: bitops: Allow immediates in test_and_{set,clear,change}_bit
  MIPS: bitops: Use the BIT() macro
  MIPS: bitops: Avoid redundant zero-comparison for non-LLSC
  MIPS: bitops: Abstract LL/SC loops
  MIPS: bitops: Use BIT_WORD() & BITS_PER_LONG
  MIPS: bitops: Emit Loongson3 sync workarounds within asm
  MIPS: bitops: Use smp_mb__before_atomic in test_* ops
  MIPS: cmpxchg: Emit Loongson3 sync workarounds within asm
  MIPS: cmpxchg: Omit redundant barriers for Loongson3
  MIPS: futex: Emit Loongson3 sync workarounds within asm
  MIPS: syscall: Emit Loongson3 sync workarounds within asm
  MIPS: barrier: Remove loongson_llsc_mb()
  MIPS: barrier: Make __smp_mb__before_atomic() a no-op for Loongson3
  MIPS: genex: Add Loongson3 LL/SC workaround to ejtag_debug_handler
  MIPS: genex: Don't reload address unnecessarily
  MIPS: Check Loongson3 LL/SC errata workaround correctness

 arch/mips/Makefile |   3 +
 arch/mips/Makefile.postlink|  10 +-
 arch/mips/include/asm/atomic.h | 571 +
 arch/mips/include/asm/barrier.h| 228 ++
 arch/mips/include/asm/bitops.h | 443 ++-
 arch/mips/include/asm/cmpxchg.h|  59 +--
 arch/mips/include/asm/futex.h  |  15 +-
 arch/mips/include/asm/llsc.h   |  19 +-
 arch/mips/include/asm/sync.h   | 207 +
 arch/mips/kernel/genex.S   |   6 +-
 arch/mips/kernel/pm-cps.c  |  20 +-
 arch/mips/kernel/syscall.c |   3 +-
 arch/mips/lib/bitops.c |  57 +--
 arch/mips/loongson64/Platform  |   2 +-
 arch/mips/tools/.gitignore |   1 +
 arch/mips/tools/Makefile   |   5 +
 arch/mips/tools/loongson3-llsc-check.c | 307 +
 17 files changed, 981 insertions(+), 975 deletions(-)
 create mode 100644 arch/mips/include/asm/sync.h
 create mode 100644 arch/mips/tools/loongson3-llsc-check.c

-- 
2.23.0



[PATCH v2 06/36] MIPS: barrier: Remove fast_mb() Octeon #ifdef'ery

2019-10-01 Thread Paul Burton
The definition of fast_mb() is the same in both the Octeon & non-Octeon
cases, so remove the duplication & define it only once.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 arch/mips/include/asm/barrier.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 8a5abc1c85a6..657ec01120a4 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -38,6 +38,8 @@ static inline void wmb(void)
 }
 #define wmb wmb
 
+#define fast_mb()  __sync()
+
 #define __fast_iob()   \
__asm__ __volatile__(   \
".set   push\n\t"   \
@@ -49,10 +51,8 @@ static inline void wmb(void)
: "m" (*(int *)CKSEG1)  \
: "memory")
 #ifdef CONFIG_CPU_CAVIUM_OCTEON
-# define fast_mb() __sync()
 # define fast_iob()do { } while (0)
 #else /* ! CONFIG_CPU_CAVIUM_OCTEON */
-# define fast_mb() __sync()
 # ifdef CONFIG_SGI_IP28
 #  define fast_iob()   \
__asm__ __volatile__(   \
-- 
2.23.0



[PATCH 02/37] MIPS: Use compact branch for LL/SC loops on MIPSr6+

2019-09-30 Thread Paul Burton
When targeting MIPSr6 or higher make use of a compact branch in LL/SC
loops, preventing the insertion of a delay slot nop that only serves to
waste space.

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/llsc.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/mips/include/asm/llsc.h b/arch/mips/include/asm/llsc.h
index 9b19f38562ac..d240a4a2d1c4 100644
--- a/arch/mips/include/asm/llsc.h
+++ b/arch/mips/include/asm/llsc.h
@@ -9,6 +9,8 @@
 #ifndef __ASM_LLSC_H
 #define __ASM_LLSC_H
 
+#include 
+
 #if _MIPS_SZLONG == 32
 #define SZLONG_LOG 5
 #define SZLONG_MASK 31UL
@@ -32,6 +34,8 @@
  */
 #if R1_LLSC_WAR
 # define __SC_BEQZ "beqzl  "
+#elif MIPS_ISA_REV >= 6
+# define __SC_BEQZ "beqzc  "
 #else
 # define __SC_BEQZ "beqz   "
 #endif
-- 
2.23.0



[PATCH 03/37] MIPS: barrier: Add __SYNC() infrastructure

2019-09-30 Thread Paul Burton
Introduce an asm/sync.h header which provides infrastructure that can be
used to generate sync instructions of various types, and for various
reasons. For example if we need a sync instruction that provides a full
completion barrier but only on systems which have weak memory ordering,
we can generate the appropriate assembly code using:

  __SYNC(full, weak_ordering)

When the kernel is configured to run on systems with weak memory
ordering (ie. CONFIG_WEAK_ORDERING is selected) we'll emit a sync
instruction. When the kernel is configured to run on systems with strong
memory ordering (ie. CONFIG_WEAK_ORDERING is not selected) we'll emit
nothing. The caller doesn't need to know which happened - it simply says
what it needs & when, with no concern for checking the kernel
configuration.

There are some scenarios in which we may want to emit code only when we
*didn't* emit a sync instruction. For example, some Loongson3 CPUs
suffer from a bug that requires us to emit a sync instruction prior to
each ll instruction (enabled by CONFIG_CPU_LOONGSON3_WORKAROUNDS). In
cases where this bug workaround is enabled, it's wasteful to then have
more generic code emit another sync instruction to provide barriers we
need in general. A __SYNC_ELSE() macro allows for this, providing an
extra argument that contains code to be assembled only in cases where
the sync instruction was not emitted. For example if we have a scenario
in which we generally want to emit a release barrier but for affected
Loongson3 configurations upgrade that to a full completion barrier, we
can do that like so:

  __SYNC_ELSE(full, loongson3_war, __SYNC(rl, always))

The assembly generated by these macros can be used either as inline
assembly or in assembly source files.
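
For example, a hypothetical barrier helper built on this infrastructure
could look like the following (illustrative only; the real consumers are
converted by later patches in this series):

  static inline void example_smp_barrier(void)
  {
          /* Emits a sync on CONFIG_WEAK_ORDERING kernels, nothing otherwise. */
          asm volatile(__SYNC(full, weak_ordering) ::: "memory");
  }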

Differing types of sync as provided by MIPSr6 are defined, but currently
they all generate a full completion barrier except in kernels configured
for Cavium Octeon systems. There the wmb sync-type is used, and rmb
syncs are omitted, as has been the case since commit 6b07d38aaa52
("MIPS: Octeon: Use optimized memory barrier primitives."). Using
__SYNC() with the wmb or rmb types will abstract away the Octeon
specific behavior and allow us to later clean up asm/barrier.h code that
currently includes a plethora of #ifdef's.

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/barrier.h | 113 +
 arch/mips/include/asm/sync.h| 207 
 arch/mips/kernel/pm-cps.c   |  20 +--
 3 files changed, 219 insertions(+), 121 deletions(-)
 create mode 100644 arch/mips/include/asm/sync.h

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 9228f7386220..5ad39bfd3b6d 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -9,116 +9,7 @@
 #define __ASM_BARRIER_H
 
 #include 
-
-/*
- * Sync types defined by the MIPS architecture (document MD00087 table 6.5)
- * These values are used with the sync instruction to perform memory barriers.
- * Types of ordering guarantees available through the SYNC instruction:
- * - Completion Barriers
- * - Ordering Barriers
- * As compared to the completion barrier, the ordering barrier is a
- * lighter-weight operation as it does not require the specified instructions
- * before the SYNC to be already completed. Instead it only requires that those
- * specified instructions which are subsequent to the SYNC in the instruction
- * stream are never re-ordered for processing ahead of the specified
- * instructions which are before the SYNC in the instruction stream.
- * This potentially reduces how many cycles the barrier instruction must stall
- * before it completes.
- * Implementations that do not use any of the non-zero values of stype to 
define
- * different barriers, such as ordering barriers, must make those stype values
- * act the same as stype zero.
- */
-
-/*
- * Completion barriers:
- * - Every synchronizable specified memory instruction (loads or stores or 
both)
- *   that occurs in the instruction stream before the SYNC instruction must be
- *   already globally performed before any synchronizable specified memory
- *   instructions that occur after the SYNC are allowed to be performed, with
- *   respect to any other processor or coherent I/O module.
- *
- * - The barrier does not guarantee the order in which instruction fetches are
- *   performed.
- *
- * - A stype value of zero will always be defined such that it performs the 
most
- *   complete set of synchronization operations that are defined.This means
- *   stype zero always does a completion barrier that affects both loads and
- *   stores preceding the SYNC instruction and both loads and stores that are
- *   subsequent to the SYNC instruction. Non-zero values of stype may be 
defined
- *   by the architecture or specific implementations to perform synchronization
- *   behaviors that are less complete than that of stype zero. If an
- *   implementation does not use on

[PATCH 18/37] MIPS: bitops: Only use ins for bit 16 or higher

2019-09-30 Thread Paul Burton
set_bit() can set bits 0-15 using an ori instruction, rather than
loading the value -1 into a register & then using an ins instruction.

That is, rather than the following:

  li   t0, -1
  ll   t1, 0(t2)
  ins  t1, t0, 4, 1
  sc   t1, 0(t2)

We can have the simpler:

  ll   t1, 0(t2)
  ori  t1, t1, 0x10
  sc   t1, 0(t2)

The or path already allows immediates to be used, so simply restricting
the ins path to bits that don't fit in immediates is sufficient to take
advantage of this.
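
The cut-off at bit 16 follows from ori's zero-extended 16-bit immediate
field; as a sketch, the condition added in the diff below amounts to the
following (hypothetical helper name):

  /* Constant bits 0-15 fit ori's immediate; only larger constant bit
   * numbers are worth the li + ins sequence. */
  static inline bool set_bit_prefers_ins(unsigned long bit)
  {
          return __builtin_constant_p(bit) && bit >= 16;
  }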

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/bitops.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index d3f3f37ca0b1..3ea4f172ac08 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -77,7 +77,7 @@ static inline void set_bit(unsigned long nr, volatile 
unsigned long *addr)
}
 
 #if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
-   if (__builtin_constant_p(bit)) {
+   if (__builtin_constant_p(bit) && (bit >= 16)) {
loongson_llsc_mb();
do {
__asm__ __volatile__(
-- 
2.23.0



[PATCH 22/37] MIPS: bitops: Allow immediates in test_and_{set,clear,change}_bit

2019-09-30 Thread Paul Burton
The logical operations or & xor used in the test_and_set_bit_lock(),
test_and_clear_bit() & test_and_change_bit() functions currently force
the value 1<<bit to be placed in a register, even when the bit is known
at compile time & the resulting value could be encoded as an immediate
in the or/xor instruction. Use the "ir" constraint so that the compiler
may do so.

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/bitops.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 34d6fe3f18d0..0b0ce0adce8f 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -261,7 +261,7 @@ static inline int test_and_set_bit_lock(unsigned long nr,
"   and %2, %0, %3  \n"
"   .setpop \n"
: "=" (temp), "+m" (*m), "=" (res)
-   : "r" (1UL << bit)
+   : "ir" (1UL << bit)
: __LLSC_CLOBBER);
} else {
loongson_llsc_mb();
@@ -274,7 +274,7 @@ static inline int test_and_set_bit_lock(unsigned long nr,
"   " __SC  "%2, %1 \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" 
(res)
-   : "r" (1UL << bit)
+   : "ir" (1UL << bit)
: __LLSC_CLOBBER);
} while (unlikely(!res));
 
@@ -332,7 +332,7 @@ static inline int test_and_clear_bit(unsigned long nr,
"   and %2, %0, %3  \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" (res)
-   : "r" (1UL << bit)
+   : "ir" (1UL << bit)
: __LLSC_CLOBBER);
} else if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(nr)) {
loongson_llsc_mb();
@@ -358,7 +358,7 @@ static inline int test_and_clear_bit(unsigned long nr,
"   " __SC  "%2, %1 \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" 
(res)
-   : "r" (1UL << bit)
+   : "ir" (1UL << bit)
: __LLSC_CLOBBER);
} while (unlikely(!res));
 
@@ -400,7 +400,7 @@ static inline int test_and_change_bit(unsigned long nr,
"   and %2, %0, %3  \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" (res)
-   : "r" (1UL << bit)
+   : "ir" (1UL << bit)
: __LLSC_CLOBBER);
} else {
loongson_llsc_mb();
@@ -413,7 +413,7 @@ static inline int test_and_change_bit(unsigned long nr,
"   " __SC  "\t%2, %1   \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" 
(res)
-   : "r" (1UL << bit)
+   : "ir" (1UL << bit)
: __LLSC_CLOBBER);
} while (unlikely(!res));
 
-- 
2.23.0



[PATCH 07/37] MIPS: barrier: Clean up __sync() definition

2019-09-30 Thread Paul Burton
Implement __sync() using the new __SYNC() infrastructure, which will
take care of not emitting an instruction for old R3k CPUs that don't
support it. The only behavioral difference is that __sync() will now
provide a compiler barrier on these old CPUs, but that seems like
reasonable behavior anyway.

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/barrier.h | 18 --
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 657ec01120a4..a117c6d95038 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -11,20 +11,10 @@
 #include 
 #include 
 
-#ifdef CONFIG_CPU_HAS_SYNC
-#define __sync()   \
-   __asm__ __volatile__(   \
-   ".set   push\n\t"   \
-   ".set   noreorder\n\t"  \
-   ".set   mips2\n\t"  \
-   "sync\n\t"  \
-   ".set   pop"\
-   : /* no output */   \
-   : /* no input */\
-   : "memory")
-#else
-#define __sync()   do { } while(0)
-#endif
+static inline void __sync(void)
+{
+   asm volatile(__SYNC(full, always) ::: "memory");
+}
 
 static inline void rmb(void)
 {
-- 
2.23.0



[PATCH 05/37] MIPS: barrier: Clean up __smp_mb() definition

2019-09-30 Thread Paul Burton
We #ifdef on Cavium Octeon CPUs, but emit the same sync instruction in
both cases. Remove the #ifdef & simply expand to the __sync() macro.

Whilst here indent the strong ordering case definitions to match the
indentation of the weak ordering ones, helping readability.

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/barrier.h | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index f36cab87cfde..8a5abc1c85a6 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -89,17 +89,13 @@ static inline void wmb(void)
 #endif /* !CONFIG_CPU_HAS_WB */
 
 #if defined(CONFIG_WEAK_ORDERING)
-# ifdef CONFIG_CPU_CAVIUM_OCTEON
-#  define __smp_mb()   __sync()
-# else
-#  define __smp_mb()   __asm__ __volatile__("sync" : : :"memory")
-# endif
+# define __smp_mb()__sync()
 # define __smp_rmb()   rmb()
 # define __smp_wmb()   wmb()
 #else
-#define __smp_mb() barrier()
-#define __smp_rmb()barrier()
-#define __smp_wmb()barrier()
+# define __smp_mb()barrier()
+# define __smp_rmb()   barrier()
+# define __smp_wmb()   barrier()
 #endif
 
 /*
-- 
2.23.0



[PATCH 13/37] MIPS: atomic: Use _atomic barriers in atomic_sub_if_positive()

2019-09-30 Thread Paul Burton
Use smp_mb__before_atomic() & smp_mb__after_atomic() in
atomic_sub_if_positive() rather than the equivalent
smp_mb__before_llsc() & smp_llsc_mb(). The former are more standard &
this preps us for avoiding redundant duplicate barriers on Loongson3 in
a later patch.

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/atomic.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index 841ff274ada6..24443ef29337 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -196,7 +196,7 @@ static __inline__ int atomic_sub_if_positive(int i, 
atomic_t * v)
 {
int result;
 
-   smp_mb__before_llsc();
+   smp_mb__before_atomic();
 
if (kernel_uses_llsc) {
int temp;
@@ -237,7 +237,7 @@ static __inline__ int atomic_sub_if_positive(int i, 
atomic_t * v)
 * another barrier here.
 */
if (!__SYNC_loongson3_war)
-   smp_llsc_mb();
+   smp_mb__after_atomic();
 
return result;
 }
-- 
2.23.0



[PATCH 12/37] MIPS: atomic: Emit Loongson3 sync workarounds within asm

2019-09-30 Thread Paul Burton
Generate the sync instructions required to workaround Loongson3 LL/SC
errata within inline asm blocks, which feels a little safer than doing
it from C where strictly speaking the compiler would be well within its
rights to insert a memory access between the separate asm statements we
previously had, containing sync & ll instructions respectively.

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/atomic.h | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index b834af5a7382..841ff274ada6 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define ATOMIC_INIT(i)   { (i) }
@@ -56,10 +57,10 @@ static __inline__ void pfx##_##op(type i, pfx##_t * v)  
\
return; \
}   \
\
-   loongson_llsc_mb(); \
__asm__ __volatile__(   \
"   .setpush\n" \
"   .set" MIPS_ISA_LEVEL "  \n" \
+   "   " __SYNC(full, loongson3_war) " \n" \
"1: " #ll " %0, %1  # " #pfx "_" #op "  \n" \
"   " #asm_op " %0, %2  \n" \
"   " #sc " %0, %1  \n" \
@@ -85,10 +86,10 @@ static __inline__ type pfx##_##op##_return_relaxed(type i, 
pfx##_t * v) \
return result;  \
}   \
\
-   loongson_llsc_mb(); \
__asm__ __volatile__(   \
"   .setpush\n" \
"   .set" MIPS_ISA_LEVEL "  \n" \
+   "   " __SYNC(full, loongson3_war) " \n" \
"1: " #ll " %1, %2  # " #pfx "_" #op "_return\n"\
"   " #asm_op " %0, %1, %3  \n" \
"   " #sc " %0, %2  \n" \
@@ -117,10 +118,10 @@ static __inline__ type pfx##_fetch_##op##_relaxed(type i, 
pfx##_t * v)\
return result;  \
}   \
\
-   loongson_llsc_mb(); \
__asm__ __volatile__(   \
"   .setpush\n" \
"   .set" MIPS_ISA_LEVEL "  \n" \
+   "   " __SYNC(full, loongson3_war) " \n" \
"1: " #ll " %1, %2  # " #pfx "_fetch_" #op "\n" \
"   " #asm_op " %0, %1, %3  \n" \
"   " #sc " %0, %2  \n" \
@@ -200,10 +201,10 @@ static __inline__ int atomic_sub_if_positive(int i, 
atomic_t * v)
if (kernel_uses_llsc) {
int temp;
 
-   loongson_llsc_mb();
__asm__ __volatile__(
"   .setpush\n"
"   .set"MIPS_ISA_LEVEL"\n"
+   "   " __SYNC(full, loongson3_war) " \n"
"1: ll  %1, %2  # atomic_sub_if_positive\n"
"   .setpop \n"
"   subu%0, %1, %3  \n"
@@ -213,7 +214,7 @@ static __inline__ int atomic_sub_if_positive(int i, 
atomic_t * v)
"   .set"MIPS_ISA_LEVEL"\n"
"   sc  %1, %2  \n"
"\t"

[PATCH 33/37] MIPS: barrier: Remove loongson_llsc_mb()

2019-09-30 Thread Paul Burton
The loongson_llsc_mb() macro is no longer used - instead barriers are
emitted as part of inline asm using the __SYNC() macro. Remove the
now-defunct loongson_llsc_mb() macro.

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/barrier.h | 40 -
 arch/mips/loongson64/Platform   |  2 +-
 2 files changed, 1 insertion(+), 41 deletions(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index c7e05e832da9..1a99a6c5b5dd 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -121,46 +121,6 @@ static inline void wmb(void)
 #define __smp_mb__before_atomic()  __smp_mb__before_llsc()
 #define __smp_mb__after_atomic()   smp_llsc_mb()
 
-/*
- * Some Loongson 3 CPUs have a bug wherein execution of a memory access (load,
- * store or prefetch) in between an LL & SC can cause the SC instruction to
- * erroneously succeed, breaking atomicity. Whilst it's unusual to write code
- * containing such sequences, this bug bites harder than we might otherwise
- * expect due to reordering & speculation:
- *
- * 1) A memory access appearing prior to the LL in program order may actually
- *be executed after the LL - this is the reordering case.
- *
- *In order to avoid this we need to place a memory barrier (ie. a SYNC
- *instruction) prior to every LL instruction, in between it and any earlier
- *memory access instructions.
- *
- *This reordering case is fixed by 3A R2 CPUs, ie. 3A2000 models and later.
- *
- * 2) If a conditional branch exists between an LL & SC with a target outside
- *of the LL-SC loop, for example an exit upon value mismatch in cmpxchg()
- *or similar, then misprediction of the branch may allow speculative
- *execution of memory accesses from outside of the LL-SC loop.
- *
- *In order to avoid this we need a memory barrier (ie. a SYNC instruction)
- *at each affected branch target, for which we also use loongson_llsc_mb()
- *defined below.
- *
- *This case affects all current Loongson 3 CPUs.
- *
- * The above described cases cause an error in the cache coherence protocol;
- * such that the Invalidate of a competing LL-SC goes 'missing' and SC
- * erroneously observes its core still has Exclusive state and lets the SC
- * proceed.
- *
- * Therefore the error only occurs on SMP systems.
- */
-#ifdef CONFIG_CPU_LOONGSON3_WORKAROUNDS /* Loongson-3's LLSC workaround */
-#define loongson_llsc_mb() __asm__ __volatile__("sync" : : :"memory")
-#else
-#define loongson_llsc_mb() do { } while (0)
-#endif
-
 static inline void sync_ginv(void)
 {
asm volatile(__SYNC(ginv, always));
diff --git a/arch/mips/loongson64/Platform b/arch/mips/loongson64/Platform
index c1a4d4dc4665..28172500f95a 100644
--- a/arch/mips/loongson64/Platform
+++ b/arch/mips/loongson64/Platform
@@ -27,7 +27,7 @@ cflags-$(CONFIG_CPU_LOONGSON3)+= -Wa,--trap
 #
 # Some versions of binutils, not currently mainline as of 2019/02/04, support
 # an -mfix-loongson3-llsc flag which emits a sync prior to each ll instruction
-# to work around a CPU bug (see loongson_llsc_mb() in asm/barrier.h for a
+# to work around a CPU bug (see __SYNC_loongson3_war in asm/sync.h for a
 # description).
 #
 # We disable this in order to prevent the assembler meddling with the
-- 
2.23.0



[PATCH 20/37] MIPS: bitops: ins start position is always an immediate

2019-09-30 Thread Paul Burton
The start position for an ins instruction is always encoded as an
immediate, so allowing registers to be used by the inline asm makes no
sense. It should never happen anyway since a bit index should always be
small enough to be treated as an immediate, but remove the nonsensical
"r" for sanity.

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/bitops.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index b8785bdf3507..83fd1f1c3ab4 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -85,7 +85,7 @@ static inline void set_bit(unsigned long nr, volatile 
unsigned long *addr)
"   " __INS "%0, %3, %2, 1  \n"
"   " __SC "%0, %1  \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-   : "ir" (bit), "r" (~0)
+   : "i" (bit), "r" (~0)
: __LLSC_CLOBBER);
} while (unlikely(!temp));
return;
@@ -150,7 +150,7 @@ static inline void clear_bit(unsigned long nr, volatile 
unsigned long *addr)
"   " __INS "%0, $0, %2, 1  \n"
"   " __SC "%0, %1  \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-   : "ir" (bit)
+   : "i" (bit)
: __LLSC_CLOBBER);
} while (unlikely(!temp));
return;
@@ -383,7 +383,7 @@ static inline int test_and_clear_bit(unsigned long nr,
"   " __INS "%0, $0, %3, 1  \n"
"   " __SC  "%0, %1 \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" 
(res)
-   : "ir" (bit)
+   : "i" (bit)
: __LLSC_CLOBBER);
} while (unlikely(!temp));
} else {
-- 
2.23.0



[PATCH 36/37] MIPS: genex: Don't reload address unnecessarily

2019-09-30 Thread Paul Burton
In ejtag_debug_handler() we must reload the address of
ejtag_debug_buffer_spinlock if an sc fails, since the address in k0 will
have been clobbered by the result of the sc instruction. In the case
where we simply load a non-zero value (ie. there's contention for the
lock) the address will not be clobbered & we can simply branch back to
repeat the load from memory without reloading the address into k0.

The primary motivation for this change is that it moves the target of
the bnez instruction to an instruction within the LL/SC loop (the LL
itself), which we know contains no other memory accesses & therefore
isn't affected by Loongson3 LL/SC errata.

Signed-off-by: Paul Burton 
---

 arch/mips/kernel/genex.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/mips/kernel/genex.S b/arch/mips/kernel/genex.S
index ac4f2b835165..60ede6b75a3b 100644
--- a/arch/mips/kernel/genex.S
+++ b/arch/mips/kernel/genex.S
@@ -355,8 +355,8 @@ NESTED(ejtag_debug_handler, PT_SIZE, sp)
 #ifdef CONFIG_SMP
 1: PTR_LA  k0, ejtag_debug_buffer_spinlock
__SYNC(full, loongson3_war)
-   ll  k0, 0(k0)
-   bnezk0, 1b
+2: ll  k0, 0(k0)
+   bnezk0, 2b
PTR_LA  k0, ejtag_debug_buffer_spinlock
sc  k0, 0(k0)
beqzk0, 1b
-- 
2.23.0



[PATCH 35/37] MIPS: genex: Add Loongson3 LL/SC workaround to ejtag_debug_handler

2019-09-30 Thread Paul Burton
In ejtag_debug_handler we use LL & SC instructions to acquire & release
an open-coded spinlock. For Loongson3 systems affected by LL/SC errata
this requires that we insert a sync instruction prior to the LL in order
to ensure correct behavior of the LL/SC loop.

Signed-off-by: Paul Burton 
---

 arch/mips/kernel/genex.S | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/mips/kernel/genex.S b/arch/mips/kernel/genex.S
index efde27c99414..ac4f2b835165 100644
--- a/arch/mips/kernel/genex.S
+++ b/arch/mips/kernel/genex.S
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -353,6 +354,7 @@ NESTED(ejtag_debug_handler, PT_SIZE, sp)
 
 #ifdef CONFIG_SMP
 1: PTR_LA  k0, ejtag_debug_buffer_spinlock
+   __SYNC(full, loongson3_war)
ll  k0, 0(k0)
bnezk0, 1b
PTR_LA  k0, ejtag_debug_buffer_spinlock
-- 
2.23.0



[PATCH 30/37] MIPS: cmpxchg: Omit redundant barriers for Loongson3

2019-09-30 Thread Paul Burton
When building a kernel configured to support Loongson3 LL/SC workarounds
(ie. CONFIG_CPU_LOONGSON3_WORKAROUNDS=y) the inline assembly in
__xchg_asm() & __cmpxchg_asm() already emits completion barriers, and as
such we don't need to emit extra barriers from the xchg() or cmpxchg()
macros. Add compile-time constant checks causing us to omit the
redundant memory barriers.

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/cmpxchg.h | 26 +++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
index fc121d20a980..820df68e32e1 100644
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -94,7 +94,13 @@ static inline unsigned long __xchg(volatile void *ptr, 
unsigned long x,
 ({ \
__typeof__(*(ptr)) __res;   \
\
-   smp_mb__before_llsc();  \
+   /*  \
+* In the Loongson3 workaround case __xchg_asm() already\
+* contains a completion barrier prior to the LL, so we don't   \
+* need to emit an extra one here.  \
+*/ \
+   if (!__SYNC_loongson3_war)  \
+   smp_mb__before_llsc();  \
\
__res = (__typeof__(*(ptr)))\
__xchg((ptr), (unsigned long)(x), sizeof(*(ptr)));  \
@@ -179,9 +185,23 @@ static inline unsigned long __cmpxchg(volatile void *ptr, 
unsigned long old,
 ({ \
__typeof__(*(ptr)) __res;   \
\
-   smp_mb__before_llsc();  \
+   /*  \
+* In the Loongson3 workaround case __cmpxchg_asm() already \
+* contains a completion barrier prior to the LL, so we don't   \
+* need to emit an extra one here.  \
+*/ \
+   if (!__SYNC_loongson3_war)  \
+   smp_mb__before_llsc();  \
+   \
__res = cmpxchg_local((ptr), (old), (new)); \
-   smp_llsc_mb();  \
+   \
+   /*  \
+* In the Loongson3 workaround case __cmpxchg_asm() already \
+* contains a completion barrier after the SC, so we don't  \
+* need to emit an extra one here.  \
+*/ \
+   if (!__SYNC_loongson3_war)  \
+   smp_llsc_mb();  \
\
__res;  \
 })
-- 
2.23.0



[PATCH 37/37] MIPS: Check Loongson3 LL/SC errata workaround correctness

2019-09-30 Thread Paul Burton
When Loongson3 LL/SC errata workarounds are enabled (ie.
CONFIG_CPU_LOONGSON3_WORKAROUNDS=y) run a tool to scan through the
compiled kernel & ensure that the workaround is applied correctly. That
is, ensure that:

  - Every LL or LLD instruction is preceded by a sync instruction.

  - Any branches from within an LL/SC loop to outside of that loop
target a sync instruction.

Reasoning for these conditions can be found by reading the comment above
the definition of __SYNC_loongson3_war in arch/mips/include/asm/sync.h.

This tool will help ensure that we don't inadvertently introduce code
paths that miss the required workarounds.
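
A minimal sketch of the instruction-decode checks such a tool relies on,
using the opcode values from the defines visible in the diff below (this
is not the tool's actual code):

  #include <stdbool.h>
  #include <stdint.h>

  static unsigned int opcode(uint32_t insn)
  {
          return insn >> 26;                      /* bits 31:26 */
  }

  static bool is_ll(uint32_t insn)
  {
          return opcode(insn) == 0x30 ||          /* OP_LL */
                 opcode(insn) == 0x34;            /* OP_LLD */
  }

  static bool is_sync(uint32_t insn)
  {
          /* SPECIAL opcode with the sync function field; stype ignored here */
          return opcode(insn) == 0x00 && (insn & 0x3f) == 0x0f;
  }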

Signed-off-by: Paul Burton 

---

 arch/mips/Makefile |   2 +-
 arch/mips/Makefile.postlink|  10 +-
 arch/mips/tools/.gitignore |   1 +
 arch/mips/tools/Makefile   |   5 +
 arch/mips/tools/loongson3-llsc-check.c | 307 +
 5 files changed, 323 insertions(+), 2 deletions(-)
 create mode 100644 arch/mips/tools/loongson3-llsc-check.c

diff --git a/arch/mips/Makefile b/arch/mips/Makefile
index cdc09b71febe..4ac0974cf902 100644
--- a/arch/mips/Makefile
+++ b/arch/mips/Makefile
@@ -13,7 +13,7 @@
 #
 
 archscripts: scripts_basic
-   $(Q)$(MAKE) $(build)=arch/mips/tools elf-entry
+   $(Q)$(MAKE) $(build)=arch/mips/tools elf-entry loongson3-llsc-check
$(Q)$(MAKE) $(build)=arch/mips/boot/tools relocs
 
 KBUILD_DEFCONFIG := 32r2el_defconfig
diff --git a/arch/mips/Makefile.postlink b/arch/mips/Makefile.postlink
index 4eea4188cb20..f03fdc95143e 100644
--- a/arch/mips/Makefile.postlink
+++ b/arch/mips/Makefile.postlink
@@ -3,7 +3,8 @@
 # Post-link MIPS pass
 # ===
 #
-# 1. Insert relocations into vmlinux
+# 1. Check that Loongson3 LL/SC workarounds are applied correctly
+# 2. Insert relocations into vmlinux
 
 PHONY := __archpost
 __archpost:
@@ -11,6 +12,10 @@ __archpost:
 -include include/config/auto.conf
 include scripts/Kbuild.include
 
+CMD_LS3_LLSC = arch/mips/tools/loongson3-llsc-check
+quiet_cmd_ls3_llsc = LLSCCHK $@
+  cmd_ls3_llsc = $(CMD_LS3_LLSC) $@
+
 CMD_RELOCS = arch/mips/boot/tools/relocs
 quiet_cmd_relocs = RELOCS $@
   cmd_relocs = $(CMD_RELOCS) $@
@@ -19,6 +24,9 @@ quiet_cmd_relocs = RELOCS $@
 
 vmlinux: FORCE
@true
+ifeq ($(CONFIG_CPU_LOONGSON3_WORKAROUNDS),y)
+   $(call if_changed,ls3_llsc)
+endif
 ifeq ($(CONFIG_RELOCATABLE),y)
$(call if_changed,relocs)
 endif
diff --git a/arch/mips/tools/.gitignore b/arch/mips/tools/.gitignore
index 56d34ce4..b0209450d9ff 100644
--- a/arch/mips/tools/.gitignore
+++ b/arch/mips/tools/.gitignore
@@ -1 +1,2 @@
 elf-entry
+loongson3-llsc-check
diff --git a/arch/mips/tools/Makefile b/arch/mips/tools/Makefile
index 3baee4bc6775..aaef688749f5 100644
--- a/arch/mips/tools/Makefile
+++ b/arch/mips/tools/Makefile
@@ -3,3 +3,8 @@ hostprogs-y := elf-entry
 PHONY += elf-entry
 elf-entry: $(obj)/elf-entry
@:
+
+hostprogs-$(CONFIG_CPU_LOONGSON3_WORKAROUNDS) += loongson3-llsc-check
+PHONY += loongson3-llsc-check
+loongson3-llsc-check: $(obj)/loongson3-llsc-check
+   @:
diff --git a/arch/mips/tools/loongson3-llsc-check.c 
b/arch/mips/tools/loongson3-llsc-check.c
new file mode 100644
index ..0ebddd0ae46f
--- /dev/null
+++ b/arch/mips/tools/loongson3-llsc-check.c
@@ -0,0 +1,307 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef be32toh
+/* If libc provides le{16,32,64}toh() then we'll use them */
+#elif BYTE_ORDER == LITTLE_ENDIAN
+# define le16toh(x)(x)
+# define le32toh(x)(x)
+# define le64toh(x)(x)
+#elif BYTE_ORDER == BIG_ENDIAN
+# define le16toh(x)bswap_16(x)
+# define le32toh(x)bswap_32(x)
+# define le64toh(x)bswap_64(x)
+#endif
+
+/* MIPS opcodes, in bits 31:26 of an instruction */
+#define OP_SPECIAL 0x00
+#define OP_REGIMM  0x01
+#define OP_BEQ 0x04
+#define OP_BNE 0x05
+#define OP_BLEZ0x06
+#define OP_BGTZ0x07
+#define OP_BEQL0x14
+#define OP_BNEL0x15
+#define OP_BLEZL   0x16
+#define OP_BGTZL   0x17
+#define OP_LL  0x30
+#define OP_LLD 0x34
+#define OP_SC  0x38
+#define OP_SCD 0x3c
+
+/* Bits 20:16 of OP_REGIMM instructions */
+#define REGIMM_BLTZ0x00
+#define REGIMM_BGEZ0x01
+#define REGIMM_BLTZL   0x02
+#define REGIMM_BGEZL   0x03
+#define REGIMM_BLTZAL  0x10
+#define REGIMM_BGEZAL  0x11
+#define REGIMM_BLTZALL 0x12
+#define REGIMM_BGEZALL 0x13
+
+/* Bits 5:0 of OP_SPECIAL instructions */
+#define SPECIAL_SYNC   0x0f
+
+static void usage(FILE *f)
+{
+   fprintf(f, "Usage: loongson3-llsc-check /path/to/vmlinux\n");
+}
+
+static int se16(uint16_t x)
+{
+   return (int16_t)x;
+}
+
+sta

[PATCH 23/37] MIPS: bitops: Use the BIT() macro

2019-09-30 Thread Paul Burton
Use the BIT() macro in asm/bitops.h rather than open-coding its
equivalent.

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/bitops.h | 31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 0b0ce0adce8f..35582afc057b 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -13,6 +13,7 @@
 #error only  can be included directly
 #endif
 
+#include 
 #include 
 #include 
 #include 
@@ -70,7 +71,7 @@ static inline void set_bit(unsigned long nr, volatile 
unsigned long *addr)
"   beqzl   %0, 1b  \n"
"   .setpop \n"
: "=" (temp), "=" GCC_OFF_SMALL_ASM() (*m)
-   : "ir" (1UL << bit), GCC_OFF_SMALL_ASM() (*m)
+   : "ir" (BIT(bit)), GCC_OFF_SMALL_ASM() (*m)
: __LLSC_CLOBBER);
return;
}
@@ -99,7 +100,7 @@ static inline void set_bit(unsigned long nr, volatile 
unsigned long *addr)
"   " __SC  "%0, %1 \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-   : "ir" (1UL << bit)
+   : "ir" (BIT(bit))
: __LLSC_CLOBBER);
} while (unlikely(!temp));
 }
@@ -135,7 +136,7 @@ static inline void clear_bit(unsigned long nr, volatile 
unsigned long *addr)
"   beqzl   %0, 1b  \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-   : "ir" (~(1UL << bit))
+   : "ir" (~(BIT(bit)))
: __LLSC_CLOBBER);
return;
}
@@ -164,7 +165,7 @@ static inline void clear_bit(unsigned long nr, volatile 
unsigned long *addr)
"   " __SC "%0, %1  \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-   : "ir" (~(1UL << bit))
+   : "ir" (~(BIT(bit)))
: __LLSC_CLOBBER);
} while (unlikely(!temp));
 }
@@ -213,7 +214,7 @@ static inline void change_bit(unsigned long nr, volatile 
unsigned long *addr)
"   beqzl   %0, 1b  \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-   : "ir" (1UL << bit)
+   : "ir" (BIT(bit))
: __LLSC_CLOBBER);
return;
}
@@ -228,7 +229,7 @@ static inline void change_bit(unsigned long nr, volatile 
unsigned long *addr)
"   " __SC  "%0, %1 \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-   : "ir" (1UL << bit)
+   : "ir" (BIT(bit))
: __LLSC_CLOBBER);
} while (unlikely(!temp));
 }
@@ -261,7 +262,7 @@ static inline int test_and_set_bit_lock(unsigned long nr,
"   and %2, %0, %3  \n"
"   .setpop \n"
: "=" (temp), "+m" (*m), "=" (res)
-   : "ir" (1UL << bit)
+   : "ir" (BIT(bit))
: __LLSC_CLOBBER);
} else {
loongson_llsc_mb();
@@ -274,11 +275,11 @@ static inline int test_and_set_bit_lock(unsigned long nr,
"   " __SC  "%2, %1 \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" 
(res)
-   : "ir" (1UL << bit)
+   : "ir" (BIT(bit))
: __LLSC_CLOBBER);
} while (unlikely(!res));
 
-   res = temp & (1UL << bit);
+   res = temp & BIT(bit);
}
 
smp_llsc_mb();
@@ -332,7 +333,7 @@ static inline int test_and_clear_bit(unsigned long nr,
" 

[PATCH 28/37] MIPS: bitops: Use smp_mb__before_atomic in test_* ops

2019-09-30 Thread Paul Burton
Use smp_mb__before_atomic() rather than smp_mb__before_llsc() in
test_and_set_bit(), test_and_clear_bit() & test_and_change_bit(). The
_atomic() versions make semantic sense in these cases, and will allow a
later patch to omit redundant barriers for Loongson3 systems that
already include a barrier within __test_bit_op().

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/bitops.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 9e967d6622c8..e6d97238a321 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -209,7 +209,7 @@ static inline int test_and_set_bit_lock(unsigned long nr,
 static inline int test_and_set_bit(unsigned long nr,
volatile unsigned long *addr)
 {
-   smp_mb__before_llsc();
+   smp_mb__before_atomic();
return test_and_set_bit_lock(nr, addr);
 }
 
@@ -228,7 +228,7 @@ static inline int test_and_clear_bit(unsigned long nr,
int bit = nr % BITS_PER_LONG;
unsigned long res, orig;
 
-   smp_mb__before_llsc();
+   smp_mb__before_atomic();
 
if (!kernel_uses_llsc) {
res = __mips_test_and_clear_bit(nr, addr);
@@ -265,7 +265,7 @@ static inline int test_and_change_bit(unsigned long nr,
int bit = nr % BITS_PER_LONG;
unsigned long res, orig;
 
-   smp_mb__before_llsc();
+   smp_mb__before_atomic();
 
if (!kernel_uses_llsc) {
res = __mips_test_and_change_bit(nr, addr);
-- 
2.23.0



[PATCH 16/37] MIPS: bitops: Use generic builtin ffs/fls; drop cpu_has_clo_clz

2019-09-30 Thread Paul Burton
The MIPS-specific implementations of __ffs(), ffs(), __fls() & fls()
make use of the MIPS clz instruction where possible. They do this via
inline asm, but in any configuration in which the kernel is built for a
MIPS32 or MIPS64 release 1 or higher instruction set we know that these
instructions are available & can be emitted using the __builtin_clz()
function & other associated builtins which are provided by all currently
supported versions of gcc.

When targeting an older instruction set GCC will generate a longer code
sequence similar to the fallback cases we have in our implementations.

As such, remove our custom implementations of these functions & use the
generic versions built atop compiler builtins. This allows us to drop a
significant chunk of code, along with the cpu_has_clo_clz feature macro
which was only used by these functions.

The only thing we lose here is the ability for kernels built to target a
pre-r1 ISA to opportunistically make use of clz when running on a CPU
that implements it. This seems like a small cost, and well worth paying
to simplify the code.
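
For reference, the generic version we fall back to is roughly the following
(cf. include/asm-generic/bitops/builtin-fls.h); on MIPSr1 or later GCC
lowers __builtin_clz() to a single clz instruction:

  static __always_inline int fls(unsigned int x)
  {
          /* fls(0) = 0, otherwise 32 minus the number of leading zeroes. */
          return x ? sizeof(x) * 8 - __builtin_clz(x) : 0;
  }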

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/bitops.h| 146 +-
 arch/mips/include/asm/cpu-features.h  |  10 --
 .../asm/mach-malta/cpu-feature-overrides.h|   2 -
 3 files changed, 4 insertions(+), 154 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 985d6a02f9ea..4b618afbfa5b 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -491,149 +491,11 @@ static inline void __clear_bit_unlock(unsigned long nr, 
volatile unsigned long *
nudge_writes();
 }
 
-/*
- * Return the bit position (0..63) of the most significant 1 bit in a word
- * Returns -1 if no 1 bit exists
- */
-static __always_inline unsigned long __fls(unsigned long word)
-{
-   int num;
-
-   if (BITS_PER_LONG == 32 && !__builtin_constant_p(word) &&
-   __builtin_constant_p(cpu_has_clo_clz) && cpu_has_clo_clz) {
-   __asm__(
-   "   .setpush\n"
-   "   .set"MIPS_ISA_LEVEL"\n"
-   "   clz %0, %1  \n"
-   "   .setpop \n"
-   : "=r" (num)
-   : "r" (word));
-
-   return 31 - num;
-   }
-
-   if (BITS_PER_LONG == 64 && !__builtin_constant_p(word) &&
-   __builtin_constant_p(cpu_has_mips64) && cpu_has_mips64) {
-   __asm__(
-   "   .setpush\n"
-   "   .set"MIPS_ISA_LEVEL"\n"
-   "   dclz%0, %1  \n"
-   "   .setpop \n"
-   : "=r" (num)
-   : "r" (word));
-
-   return 63 - num;
-   }
-
-   num = BITS_PER_LONG - 1;
-
-#if BITS_PER_LONG == 64
-   if (!(word & (~0ul << 32))) {
-   num -= 32;
-   word <<= 32;
-   }
-#endif
-   if (!(word & (~0ul << (BITS_PER_LONG-16)))) {
-   num -= 16;
-   word <<= 16;
-   }
-   if (!(word & (~0ul << (BITS_PER_LONG-8)))) {
-   num -= 8;
-   word <<= 8;
-   }
-   if (!(word & (~0ul << (BITS_PER_LONG-4)))) {
-   num -= 4;
-   word <<= 4;
-   }
-   if (!(word & (~0ul << (BITS_PER_LONG-2)))) {
-   num -= 2;
-   word <<= 2;
-   }
-   if (!(word & (~0ul << (BITS_PER_LONG-1))))
-   num -= 1;
-   return num;
-}
-
-/*
- * __ffs - find first bit in word.
- * @word: The word to search
- *
- * Returns 0..SZLONG-1
- * Undefined if no bit exists, so code should check against 0 first.
- */
-static __always_inline unsigned long __ffs(unsigned long word)
-{
-   return __fls(word & -word);
-}
-
-/*
- * fls - find last bit set.
- * @word: The word to search
- *
- * This is defined the same way as ffs.
- * Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32.
- */
-static inline int fls(unsigned int x)
-{
-   int r;
-
-   if (!__builtin_constant_p(x) &&
-   __builtin_constant_p(cpu_has_clo_clz) && cpu_has_clo_clz) {
-   __asm__(
-   "   .setpush\n"
-   "   .set"MIPS_ISA_LEVEL"\n"
-   "   clz %0, %1  

[PATCH 27/37] MIPS: bitops: Emit Loongson3 sync workarounds within asm

2019-09-30 Thread Paul Burton
Generate the sync instructions required to work around the Loongson3 LL/SC
errata within the inline asm blocks themselves. This feels a little safer
than doing it from C, where strictly speaking the compiler would be well
within its rights to insert a memory access between the separate asm
statements we previously had, containing the sync & ll instructions
respectively.
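
Sketched on clear_bit()'s common case (simplified; see the diff below for
the real change):

  /* Before: the workaround barrier is a separate C statement, which the
   * compiler may legally separate from the following asm block. */
  loongson_llsc_mb();
  __bit_op(*m, "and\t%0, %2", "ir"(~BIT(bit)));

  /* After: __bit_op() & __test_bit_op() emit the sync themselves,
   * immediately before the ll inside the same asm block. */
  __bit_op(*m, "and\t%0, %2", "ir"(~BIT(bit)));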

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/bitops.h | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 59fe1d5d4fc9..9e967d6622c8 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -31,6 +31,7 @@
asm volatile(   \
"   .setpush\n" \
"   .set" MIPS_ISA_LEVEL "  \n" \
+   "   " __SYNC(full, loongson3_war) " \n" \
"1: " __LL  "%0, %1 \n" \
"   " insn  "   \n" \
"   " __SC  "%0, %1 \n" \
@@ -47,6 +48,7 @@
asm volatile(   \
"   .setpush\n" \
"   .set" MIPS_ISA_LEVEL "  \n" \
+   "   " __SYNC(full, loongson3_war) " \n" \
"1: " __LL  ll_dst ", %2\n" \
"   " insn  "   \n" \
"   " __SC  "%1, %2 \n" \
@@ -96,12 +98,10 @@ static inline void set_bit(unsigned long nr, volatile 
unsigned long *addr)
}
 
if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(bit) && (bit >= 16)) {
-   loongson_llsc_mb();
__bit_op(*m, __INS "%0, %3, %2, 1", "i"(bit), "r"(~0));
return;
}
 
-   loongson_llsc_mb();
__bit_op(*m, "or\t%0, %2", "ir"(BIT(bit)));
 }
 
@@ -126,12 +126,10 @@ static inline void clear_bit(unsigned long nr, volatile 
unsigned long *addr)
}
 
if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(bit)) {
-   loongson_llsc_mb();
__bit_op(*m, __INS "%0, $0, %2, 1", "i"(bit));
return;
}
 
-   loongson_llsc_mb();
__bit_op(*m, "and\t%0, %2", "ir"(~BIT(bit)));
 }
 
@@ -168,7 +166,6 @@ static inline void change_bit(unsigned long nr, volatile 
unsigned long *addr)
return;
}
 
-   loongson_llsc_mb();
__bit_op(*m, "xor\t%0, %2", "ir"(BIT(bit)));
 }
 
@@ -190,7 +187,6 @@ static inline int test_and_set_bit_lock(unsigned long nr,
if (!kernel_uses_llsc) {
res = __mips_test_and_set_bit_lock(nr, addr);
} else {
-   loongson_llsc_mb();
orig = __test_bit_op(*m, "%0",
 "or\t%1, %0, %3",
 "ir"(BIT(bit)));
@@ -237,13 +233,11 @@ static inline int test_and_clear_bit(unsigned long nr,
if (!kernel_uses_llsc) {
res = __mips_test_and_clear_bit(nr, addr);
} else if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(nr)) {
-   loongson_llsc_mb();
res = __test_bit_op(*m, "%1",
__EXT "%0, %1, %3, 1;"
__INS "%1, $0, %3, 1",
"i"(bit));
} else {
-   loongson_llsc_mb();
orig = __test_bit_op(*m, "%0",
 "or\t%1, %0, %3;"
 "xor\t%1, %1, %3",
@@ -276,7 +270,6 @@ static inline int test_and_change_bit(unsigned long nr,
if (!kernel_uses_llsc) {
res = __mips_test_and_change_bit(nr, addr);
} else {
-   loongson_llsc_mb();
orig = __test_bit_op(*m, "%0",
 "xor\t%1, %0, %3",
 "ir"(BIT(bit)));
-- 
2.23.0



[PATCH 32/37] MIPS: syscall: Emit Loongson3 sync workarounds within asm

2019-09-30 Thread Paul Burton
Generate the sync instructions required to work around the Loongson3 LL/SC
errata within the inline asm blocks themselves. This feels a little safer
than doing it from C, where strictly speaking the compiler would be well
within its rights to insert a memory access between the separate asm
statements we previously had, containing the sync & ll instructions
respectively.

Signed-off-by: Paul Burton 
---

 arch/mips/kernel/syscall.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/mips/kernel/syscall.c b/arch/mips/kernel/syscall.c
index b0e25e913bdb..3ea288ca35f1 100644
--- a/arch/mips/kernel/syscall.c
+++ b/arch/mips/kernel/syscall.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -132,12 +133,12 @@ static inline int mips_atomic_set(unsigned long addr, 
unsigned long new)
  [efault] "i" (-EFAULT)
: "memory");
} else if (cpu_has_llsc) {
-   loongson_llsc_mb();
__asm__ __volatile__ (
"   .setpush\n"
"   .set"MIPS_ISA_ARCH_LEVEL"   \n"
"   li  %[err], 0   \n"
"1: \n"
+   "   " __SYNC(full, loongson3_war) " \n"
user_ll("%[old]", "(%[addr])")
"   move%[tmp], %[new]  \n"
"2: \n"
-- 
2.23.0



[PATCH 17/37] MIPS: bitops: Handle !kernel_uses_llsc first

2019-09-30 Thread Paul Burton
Reorder conditions in our various bitops functions that check
kernel_uses_llsc such that they handle the !kernel_uses_llsc case first.
This allows us to avoid the need to duplicate the kernel_uses_llsc check
in all the other cases. For functions that don't involve barriers common
to the various implementations, we switch to returning from within each
if block, making each case easier to read in isolation.
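
The reordering pattern, sketched on set_bit() with the loop bodies elided
(see the diff below for the full version):

  static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
  {
          if (!kernel_uses_llsc) {
                  __mips_set_bit(nr, addr);
                  return;
          }

          if (R10000_LLSC_WAR) {
                  /* branch-likely ll/sc loop */
                  return;
          }

          /* plain ll/sc loop */
  }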

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/bitops.h | 213 -
 1 file changed, 105 insertions(+), 108 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 4b618afbfa5b..d3f3f37ca0b1 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -52,11 +52,16 @@ int __mips_test_and_change_bit(unsigned long nr,
  */
 static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
 {
-   unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
+   unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
int bit = nr & SZLONG_MASK;
unsigned long temp;
 
-   if (kernel_uses_llsc && R10000_LLSC_WAR) {
+   if (!kernel_uses_llsc) {
+   __mips_set_bit(nr, addr);
+   return;
+   }
+
+   if (R10000_LLSC_WAR) {
__asm__ __volatile__(
"   .setpush\n"
"   .setarch=r4000  \n"
@@ -68,8 +73,11 @@ static inline void set_bit(unsigned long nr, volatile 
unsigned long *addr)
: "=" (temp), "=" GCC_OFF_SMALL_ASM() (*m)
: "ir" (1UL << bit), GCC_OFF_SMALL_ASM() (*m)
: __LLSC_CLOBBER);
+   return;
+   }
+
 #if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
-   } else if (kernel_uses_llsc && __builtin_constant_p(bit)) {
+   if (__builtin_constant_p(bit)) {
loongson_llsc_mb();
do {
__asm__ __volatile__(
@@ -80,23 +88,23 @@ static inline void set_bit(unsigned long nr, volatile 
unsigned long *addr)
: "ir" (bit), "r" (~0)
: __LLSC_CLOBBER);
} while (unlikely(!temp));
+   return;
+   }
 #endif /* CONFIG_CPU_MIPSR2 || CONFIG_CPU_MIPSR6 */
-   } else if (kernel_uses_llsc) {
-   loongson_llsc_mb();
-   do {
-   __asm__ __volatile__(
-   "   .setpush\n"
-   "   .set"MIPS_ISA_ARCH_LEVEL"   \n"
-   "   " __LL "%0, %1  # set_bit   \n"
-   "   or  %0, %2  \n"
-   "   " __SC  "%0, %1 \n"
-   "   .setpop \n"
-   : "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-   : "ir" (1UL << bit)
-   : __LLSC_CLOBBER);
-   } while (unlikely(!temp));
-   } else
-   __mips_set_bit(nr, addr);
+
+   loongson_llsc_mb();
+   do {
+   __asm__ __volatile__(
+   "   .setpush\n"
+   "   .set"MIPS_ISA_ARCH_LEVEL"   \n"
+   "   " __LL "%0, %1  # set_bit   \n"
+   "   or  %0, %2  \n"
+   "   " __SC  "%0, %1 \n"
+   "   .setpop \n"
+   : "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
+   : "ir" (1UL << bit)
+   : __LLSC_CLOBBER);
+   } while (unlikely(!temp));
 }
 
 /*
@@ -111,11 +119,16 @@ static inline void set_bit(unsigned long nr, volatile 
unsigned long *addr)
  */
 static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
-   unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
+   unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
int bit = nr & SZLONG_MASK;
unsigned long temp;
 
-   if (kernel_uses_llsc && R10000_LLSC_WAR) {
+   if (!kernel_uses_llsc) {
+   __mips_clear_bit(nr, addr);
+   return;
+   }
+
+   if (R10000_LLSC_WAR) {
__asm__ __volatile__(
"   .setpush\

[PATCH 26/37] MIPS: bitops: Use BIT_WORD() & BITS_PER_LONG

2019-09-30 Thread Paul Burton
Rather than using custom SZLONG_LOG & SZLONG_MASK macros to shift & mask
a bit index to form word & bit offsets respectively, make use of the
standard BIT_WORD() & BITS_PER_LONG macros for the same purpose.

volatile is added to the definition of pointers to the long-sized word
we'll operate on, in order to prevent the compiler complaining that we
cast away the volatile qualifier of the addr argument. This should have
no effect on generated code, which in the LL/SC case is inline asm
anyway & in the non-LLSC case access is constrained by compiler barriers
provided by raw_local_irq_{save,restore}().
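
The index math before & after, where m is the word we operate on and bit
the offset within it (BIT_WORD(nr) is simply nr / BITS_PER_LONG in the
generic headers):

  /* old: custom helpers from asm/llsc.h */
  m   = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
  bit = nr & SZLONG_MASK;

  /* new: standard helpers, same word & bit offsets */
  m   = &addr[BIT_WORD(nr)];
  bit = nr % BITS_PER_LONG;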

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/bitops.h | 24 
 arch/mips/include/asm/llsc.h   |  4 
 arch/mips/lib/bitops.c | 31 +--
 3 files changed, 25 insertions(+), 34 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 5701f8b41e87..59fe1d5d4fc9 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -87,8 +87,8 @@ int __mips_test_and_change_bit(unsigned long nr,
  */
 static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
 {
-   unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
-   int bit = nr & SZLONG_MASK;
+   volatile unsigned long *m = &addr[BIT_WORD(nr)];
+   int bit = nr % BITS_PER_LONG;
 
if (!kernel_uses_llsc) {
__mips_set_bit(nr, addr);
@@ -117,8 +117,8 @@ static inline void set_bit(unsigned long nr, volatile 
unsigned long *addr)
  */
 static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
-   unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
-   int bit = nr & SZLONG_MASK;
+   volatile unsigned long *m = &addr[BIT_WORD(nr)];
+   int bit = nr % BITS_PER_LONG;
 
if (!kernel_uses_llsc) {
__mips_clear_bit(nr, addr);
@@ -160,8 +160,8 @@ static inline void clear_bit_unlock(unsigned long nr, 
volatile unsigned long *ad
  */
 static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
 {
-   unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
-   int bit = nr & SZLONG_MASK;
+   volatile unsigned long *m = &addr[BIT_WORD(nr)];
+   int bit = nr % BITS_PER_LONG;
 
if (!kernel_uses_llsc) {
__mips_change_bit(nr, addr);
@@ -183,8 +183,8 @@ static inline void change_bit(unsigned long nr, volatile 
unsigned long *addr)
 static inline int test_and_set_bit_lock(unsigned long nr,
volatile unsigned long *addr)
 {
-   unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
-   int bit = nr & SZLONG_MASK;
+   volatile unsigned long *m = &addr[BIT_WORD(nr)];
+   int bit = nr % BITS_PER_LONG;
unsigned long res, orig;
 
if (!kernel_uses_llsc) {
@@ -228,8 +228,8 @@ static inline int test_and_set_bit(unsigned long nr,
 static inline int test_and_clear_bit(unsigned long nr,
volatile unsigned long *addr)
 {
-   unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
-   int bit = nr & SZLONG_MASK;
+   volatile unsigned long *m = &addr[BIT_WORD(nr)];
+   int bit = nr % BITS_PER_LONG;
unsigned long res, orig;
 
smp_mb__before_llsc();
@@ -267,8 +267,8 @@ static inline int test_and_clear_bit(unsigned long nr,
 static inline int test_and_change_bit(unsigned long nr,
volatile unsigned long *addr)
 {
-   unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
-   int bit = nr & SZLONG_MASK;
+   volatile unsigned long *m = &addr[BIT_WORD(nr)];
+   int bit = nr % BITS_PER_LONG;
unsigned long res, orig;
 
smp_mb__before_llsc();
diff --git a/arch/mips/include/asm/llsc.h b/arch/mips/include/asm/llsc.h
index d240a4a2d1c4..c49738bc3bda 100644
--- a/arch/mips/include/asm/llsc.h
+++ b/arch/mips/include/asm/llsc.h
@@ -12,15 +12,11 @@
 #include 
 
 #if _MIPS_SZLONG == 32
-#define SZLONG_LOG 5
-#define SZLONG_MASK 31UL
 #define __LL   "ll "
 #define __SC   "sc "
 #define __INS  "ins"
 #define __EXT  "ext"
 #elif _MIPS_SZLONG == 64
-#define SZLONG_LOG 6
-#define SZLONG_MASK 63UL
 #define __LL   "lld"
 #define __SC   "scd"
 #define __INS  "dins   "
diff --git a/arch/mips/lib/bitops.c b/arch/mips/lib/bitops.c
index fba402c0879d..116d0bd8b2ae 100644
--- a/arch/mips/lib/bitops.c
+++ b/arch/mips/lib/bitops.c
@@ -7,6 +7,7 @@
  * Copyright (c) 1999, 2000  Silicon Graphics, Inc.
  */
 #include 
+#include 
 #include 
 #include 
 
@@ -19,12 +20,11 @@
  */
 void __mips_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
-   unsigned long *a = (unsigned long *)addr;
-   unsigned bit = nr & SZLONG_MASK;
+   volatile un

[PATCH 31/37] MIPS: futex: Emit Loongson3 sync workarounds within asm

2019-09-30 Thread Paul Burton
Generate the sync instructions required to work around the Loongson3 LL/SC
errata within the inline asm blocks themselves. This feels a little safer
than doing it from C, where strictly speaking the compiler would be well
within its rights to insert a memory access between the separate asm
statements we previously had, containing the sync & ll instructions
respectively.

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/futex.h | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/mips/include/asm/futex.h b/arch/mips/include/asm/futex.h
index b83b0397462d..45c3e3652f48 100644
--- a/arch/mips/include/asm/futex.h
+++ b/arch/mips/include/asm/futex.h
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define __futex_atomic_op(insn, ret, oldval, uaddr, oparg) \
@@ -50,12 +51,12 @@
  "i" (-EFAULT) \
: "memory");\
} else if (cpu_has_llsc) {  \
-   loongson_llsc_mb(); \
__asm__ __volatile__(   \
"   .setpush\n" \
"   .setnoat\n" \
"   .setpush\n" \
"   .set"MIPS_ISA_ARCH_LEVEL"   \n" \
+   "   " __SYNC(full, loongson3_war) " \n" \
"1: "user_ll("%1", "%4")" # __futex_atomic_op\n"\
"   .setpop \n" \
"   " insn  "   \n" \
@@ -164,13 +165,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user 
*uaddr,
  "i" (-EFAULT)
: "memory");
} else if (cpu_has_llsc) {
-   loongson_llsc_mb();
__asm__ __volatile__(
"# futex_atomic_cmpxchg_inatomic\n"
"   .setpush\n"
"   .setnoat\n"
"   .setpush\n"
"   .set"MIPS_ISA_ARCH_LEVEL"   \n"
+   "   " __SYNC(full, loongson3_war) " \n"
"1: "user_ll("%1", "%3")"   \n"
"   bne %1, %z4, 3f \n"
"   .setpop \n"
@@ -178,8 +179,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
"   .set"MIPS_ISA_ARCH_LEVEL"   \n"
"2: "user_sc("$1", "%2")"   \n"
"   beqz$1, 1b  \n"
-   __WEAK_LLSC_MB
-   "3: \n"
+   "3: " __SYNC_ELSE(full, loongson3_war, __WEAK_LLSC_MB) "\n"
"   .insn   \n"
"   .setpop \n"
"   .section .fixup,\"ax\"  \n"
@@ -194,7 +194,6 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
: GCC_OFF_SMALL_ASM() (*uaddr), "Jr" (oldval), "Jr" (newval),
  "i" (-EFAULT)
: "memory");
-   loongson_llsc_mb();
} else
return -ENOSYS;
 
-- 
2.23.0



[PATCH 29/37] MIPS: cmpxchg: Emit Loongson3 sync workarounds within asm

2019-09-30 Thread Paul Burton
Generate the sync instructions required to work around the Loongson3 LL/SC
errata within the inline asm blocks themselves. This feels a little safer
than doing it from C, where strictly speaking the compiler would be well
within its rights to insert a memory access between the separate asm
statements we previously had, containing the sync & ll instructions
respectively.

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/cmpxchg.h | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
index 5d3f0e3513b4..fc121d20a980 100644
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /*
@@ -36,12 +37,12 @@ extern unsigned long __xchg_called_with_bad_pointer(void)
__typeof(*(m)) __ret;   \
\
if (kernel_uses_llsc) { \
-   loongson_llsc_mb(); \
__asm__ __volatile__(   \
"   .setpush\n" \
"   .setnoat\n" \
"   .setpush\n" \
"   .set" MIPS_ISA_ARCH_LEVEL " \n" \
+   "   " __SYNC(full, loongson3_war) " \n" \
"1: " ld "  %0, %2  # __xchg_asm\n" \
"   .setpop \n" \
"   move$1, %z3 \n" \
@@ -108,12 +109,12 @@ static inline unsigned long __xchg(volatile void *ptr, 
unsigned long x,
__typeof(*(m)) __ret;   \
\
if (kernel_uses_llsc) { \
-   loongson_llsc_mb(); \
__asm__ __volatile__(   \
"   .setpush\n" \
"   .setnoat\n" \
"   .setpush\n" \
"   .set"MIPS_ISA_ARCH_LEVEL"   \n" \
+   "   " __SYNC(full, loongson3_war) " \n" \
"1: " ld "  %0, %2  # __cmpxchg_asm \n" \
"   bne %0, %z3, 2f \n" \
"   .setpop \n" \
@@ -122,11 +123,10 @@ static inline unsigned long __xchg(volatile void *ptr, 
unsigned long x,
"   " st "  $1, %1  \n" \
"\t" __SC_BEQZ  "$1, 1b \n" \
"   .setpop \n" \
-   "2: \n" \
+   "2: " __SYNC(full, loongson3_war) " \n" \
: "=" (__ret), "=" GCC_OFF_SMALL_ASM() (*m)   \
: GCC_OFF_SMALL_ASM() (*m), "Jr" (old), "Jr" (new)  \
: __LLSC_CLOBBER);  \
-   loongson_llsc_mb(); \
} else {\
unsigned long __flags;  \
\
@@ -222,11 +222,11 @@ static inline unsigned long __cmpxchg64(volatile void 
*ptr,
 */
local_irq_save(flags);
 
-   loongson_llsc_mb();
asm volatile(
"   .setpush\n"
"   .set" MIPS_ISA_ARCH_LEVEL " \n"
/* Load 64 bits from ptr */
+   "   " __SYNC(full, loongson3_war) " \n"
"1: lld %L0, %3 # __cmpxchg64   \n"
/*
 * Split the 64 bit value we loaded into the 2 registers that hold the
@@ -260,7 +260,7 @@ static inline unsigned long __cmpxchg64(volatile void *ptr,
/* If we failed, loop! */
"\t" __SC_BEQZ "%L1, 1b \n&quo

[PATCH 34/37] MIPS: barrier: Make __smp_mb__before_atomic() a no-op for Loongson3

2019-09-30 Thread Paul Burton
Loongson3 systems with CONFIG_CPU_LOONGSON3_WORKAROUNDS enabled already
emit a full completion barrier as part of the inline assembly containing
LL/SC loops for atomic operations. As such, the barrier emitted by
__smp_mb__before_atomic() is redundant, and we can remove it.
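
A rough illustration of the effect at a call site, assuming a hypothetical
atomic_t counter and CONFIG_CPU_LOONGSON3_WORKAROUNDS=y:

  /* The explicit barrier below now compiles to nothing on Loongson3,
   * because the LL/SC loop inside atomic_inc() already begins with
   * __SYNC(full, loongson3_war) after the earlier patches in this
   * series. Other configurations keep the previous behaviour via
   * __smp_mb__before_llsc(). */
  smp_mb__before_atomic();
  atomic_inc(&counter);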

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/barrier.h | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 1a99a6c5b5dd..f3b5aa0938c1 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -118,7 +118,17 @@ static inline void wmb(void)
 #define nudge_writes() mb()
 #endif
 
-#define __smp_mb__before_atomic()  __smp_mb__before_llsc()
+/*
+ * In the Loongson3 LL/SC workaround case, all of our LL/SC loops already have
+ * a completion barrier immediately preceding the LL instruction. Therefore we
+ * can skip emitting a barrier from __smp_mb__before_atomic().
+ */
+#ifdef CONFIG_CPU_LOONGSON3_WORKAROUNDS
+# define __smp_mb__before_atomic()
+#else
+# define __smp_mb__before_atomic() __smp_mb__before_llsc()
+#endif
+
 #define __smp_mb__after_atomic()   smp_llsc_mb()
 
 static inline void sync_ginv(void)
-- 
2.23.0



[PATCH 08/37] MIPS: barrier: Clean up sync_ginv()

2019-09-30 Thread Paul Burton
Use the new __SYNC() infrastructure to implement sync_ginv(), for
consistency with much of the rest of asm/barrier.h.

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/barrier.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index a117c6d95038..c7e05e832da9 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -163,7 +163,7 @@ static inline void wmb(void)
 
 static inline void sync_ginv(void)
 {
-   asm volatile("sync\t%0" :: "i"(__SYNC_ginv));
+   asm volatile(__SYNC(ginv, always));
 }
 
 #include 
-- 
2.23.0



[PATCH 10/37] MIPS: atomic: Handle !kernel_uses_llsc first

2019-09-30 Thread Paul Burton
Handle the !kernel_uses_llsc path first in our ATOMIC_OP(),
ATOMIC_OP_RETURN() & ATOMIC_FETCH_OP() macros & return from within the
block. This allows us to de-indent the kernel_uses_llsc path by one
level which will be useful when making further changes.

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/atomic.h | 99 +-
 1 file changed, 49 insertions(+), 50 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index 2d2a8a74c51b..ace2ea005588 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -45,51 +45,36 @@
 #define ATOMIC_OP(op, c_op, asm_op)\
 static __inline__ void atomic_##op(int i, atomic_t * v)
\
 {  \
-   if (kernel_uses_llsc) { \
-   int temp;   \
+   int temp;   \
\
-   loongson_llsc_mb(); \
-   __asm__ __volatile__(   \
-   "   .setpush\n" \
-   "   .set"MIPS_ISA_LEVEL"\n" \
-   "1: ll  %0, %1  # atomic_" #op "\n" \
-   "   " #asm_op " %0, %2  \n" \
-   "   sc  %0, %1  \n" \
-   "\t" __SC_BEQZ "%0, 1b  \n" \
-   "   .setpop \n" \
-   : "=" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)\
-   : "Ir" (i) : __LLSC_CLOBBER);   \
-   } else {\
+   if (!kernel_uses_llsc) {\
unsigned long flags;\
\
raw_local_irq_save(flags);  \
v->counter c_op i;  \
raw_local_irq_restore(flags);   \
+   return; \
}   \
+   \
+   loongson_llsc_mb(); \
+   __asm__ __volatile__(   \
+   "   .setpush\n" \
+   "   .set" MIPS_ISA_LEVEL "  \n" \
+   "1: ll  %0, %1  # atomic_" #op "\n" \
+   "   " #asm_op " %0, %2  \n" \
+   "   sc  %0, %1  \n" \
+   "\t" __SC_BEQZ "%0, 1b  \n" \
+   "   .setpop \n" \
+   : "=" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)\
+   : "Ir" (i) : __LLSC_CLOBBER);   \
 }
 
 #define ATOMIC_OP_RETURN(op, c_op, asm_op) \
 static __inline__ int atomic_##op##_return_relaxed(int i, atomic_t * v)
\
 {  \
-   int result; \
-   \
-   if (kernel_uses_llsc) { \
-   int temp;   \
+   int temp, result;   \
\
-   loongson_llsc_mb(); \
-   __asm__ __volatile__(   \
-   "   .setpush\n" \
-   "   .set"MIPS_ISA_LEVEL"\n" \
-   "1: ll  %1, %2  # atomic_" #op "_return \n" \
-   "   " #asm_op " %0, %1, %3   

[PATCH 24/37] MIPS: bitops: Avoid redundant zero-comparison for non-LLSC

2019-09-30 Thread Paul Burton
The IRQ-disabling non-LLSC fallbacks for bitops on UP systems already
return a zero or one, so there's no need to perform another comparison
against zero. Move these comparisons into the LLSC paths to avoid the
redundant work.
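
The resulting pattern in test_and_set_bit_lock(), sketched with the loop
bodies elided (see the diff below):

  if (!kernel_uses_llsc) {
          /* The UP fallback already returns 0 or 1. */
          res = __mips_test_and_set_bit_lock(nr, addr);
  } else if (R10000_LLSC_WAR) {
          /* The branch-likely ll/sc loop leaves the masked bit in res. */
          res = res != 0;
  } else {
          /* The plain ll/sc loop leaves the old word in temp. */
          res = (temp & BIT(bit)) != 0;
  }

  smp_llsc_mb();

  return res;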

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/bitops.h | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 35582afc057b..3e5589320e83 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -264,6 +264,8 @@ static inline int test_and_set_bit_lock(unsigned long nr,
: "=" (temp), "+m" (*m), "=" (res)
: "ir" (BIT(bit))
: __LLSC_CLOBBER);
+
+   res = res != 0;
} else {
loongson_llsc_mb();
do {
@@ -279,12 +281,12 @@ static inline int test_and_set_bit_lock(unsigned long nr,
: __LLSC_CLOBBER);
} while (unlikely(!res));
 
-   res = temp & BIT(bit);
+   res = (temp & BIT(bit)) != 0;
}
 
smp_llsc_mb();
 
-   return res != 0;
+   return res;
 }
 
 /*
@@ -335,6 +337,8 @@ static inline int test_and_clear_bit(unsigned long nr,
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" (res)
: "ir" (BIT(bit))
: __LLSC_CLOBBER);
+
+   res = res != 0;
} else if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(nr)) {
loongson_llsc_mb();
do {
@@ -363,12 +367,12 @@ static inline int test_and_clear_bit(unsigned long nr,
: __LLSC_CLOBBER);
} while (unlikely(!res));
 
-   res = temp & BIT(bit);
+   res = (temp & BIT(bit)) != 0;
}
 
smp_llsc_mb();
 
-   return res != 0;
+   return res;
 }
 
 /*
@@ -403,6 +407,8 @@ static inline int test_and_change_bit(unsigned long nr,
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" (res)
: "ir" (BIT(bit))
: __LLSC_CLOBBER);
+
+   res = res != 0;
} else {
loongson_llsc_mb();
do {
@@ -418,12 +424,12 @@ static inline int test_and_change_bit(unsigned long nr,
: __LLSC_CLOBBER);
} while (unlikely(!res));
 
-   res = temp & BIT(bit);
+   res = (temp & BIT(bit)) != 0;
}
 
smp_llsc_mb();
 
-   return res != 0;
+   return res;
 }
 
 #include 
-- 
2.23.0



[PATCH 21/37] MIPS: bitops: Implement test_and_set_bit() in terms of _lock variant

2019-09-30 Thread Paul Burton
The only difference between test_and_set_bit() & test_and_set_bit_lock()
is memory ordering barrier semantics - the former provides a full
barrier whilst the latter only provides acquire semantics.

We can therefore implement test_and_set_bit() in terms of
test_and_set_bit_lock() with the addition of the extra memory barrier.
Do this in order to avoid duplicating logic.
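
The equivalence this relies on, sketched below; patch 28 later swaps the
barrier for smp_mb__before_atomic():

  static inline int test_and_set_bit(unsigned long nr,
                                     volatile unsigned long *addr)
  {
          /* Upgrade the _lock variant's acquire ordering to a full barrier. */
          smp_mb__before_llsc();
          return test_and_set_bit_lock(nr, addr);
  }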

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/bitops.h | 66 +++---
 arch/mips/lib/bitops.c | 26 --
 2 files changed, 13 insertions(+), 79 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 83fd1f1c3ab4..34d6fe3f18d0 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -31,8 +31,6 @@
 void __mips_set_bit(unsigned long nr, volatile unsigned long *addr);
 void __mips_clear_bit(unsigned long nr, volatile unsigned long *addr);
 void __mips_change_bit(unsigned long nr, volatile unsigned long *addr);
-int __mips_test_and_set_bit(unsigned long nr,
-   volatile unsigned long *addr);
 int __mips_test_and_set_bit_lock(unsigned long nr,
 volatile unsigned long *addr);
 int __mips_test_and_clear_bit(unsigned long nr,
@@ -236,24 +234,22 @@ static inline void change_bit(unsigned long nr, volatile 
unsigned long *addr)
 }
 
 /*
- * test_and_set_bit - Set a bit and return its old value
+ * test_and_set_bit_lock - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
- * This operation is atomic and cannot be reordered.
- * It also implies a memory barrier.
+ * This operation is atomic and implies acquire ordering semantics
+ * after the memory operation.
  */
-static inline int test_and_set_bit(unsigned long nr,
+static inline int test_and_set_bit_lock(unsigned long nr,
volatile unsigned long *addr)
 {
unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
int bit = nr & SZLONG_MASK;
unsigned long res, temp;
 
-   smp_mb__before_llsc();
-
if (!kernel_uses_llsc) {
-   res = __mips_test_and_set_bit(nr, addr);
+   res = __mips_test_and_set_bit_lock(nr, addr);
} else if (R10000_LLSC_WAR) {
__asm__ __volatile__(
"   .setpush\n"
@@ -264,7 +260,7 @@ static inline int test_and_set_bit(unsigned long nr,
"   beqzl   %2, 1b  \n"
"   and %2, %0, %3  \n"
"   .setpop \n"
-   : "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" (res)
+   : "=" (temp), "+m" (*m), "=" (res)
: "r" (1UL << bit)
: __LLSC_CLOBBER);
} else {
@@ -291,56 +287,20 @@ static inline int test_and_set_bit(unsigned long nr,
 }
 
 /*
- * test_and_set_bit_lock - Set a bit and return its old value
+ * test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
- * This operation is atomic and implies acquire ordering semantics
- * after the memory operation.
+ * This operation is atomic and cannot be reordered.
+ * It also implies a memory barrier.
  */
-static inline int test_and_set_bit_lock(unsigned long nr,
+static inline int test_and_set_bit(unsigned long nr,
volatile unsigned long *addr)
 {
-   unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
-   int bit = nr & SZLONG_MASK;
-   unsigned long res, temp;
-
-   if (!kernel_uses_llsc) {
-   res = __mips_test_and_set_bit_lock(nr, addr);
-   } else if (R10000_LLSC_WAR) {
-   __asm__ __volatile__(
-   "   .setpush\n"
-   "   .setarch=r4000  \n"
-   "1: " __LL "%0, %1  # test_and_set_bit  \n"
-   "   or  %2, %0, %3  \n"
-   "   " __SC  "%2, %1 \n"
-   "   beqzl   %2, 1b  \n"
-   "   and %2, %0, %3  \n"
-   "   .setpop \n"
-   : "=" (temp), "+m" (*m), "=" (res)
-   : "r" (1UL << bit)
-   : __LLSC_CLOBBER);
-   } else {
-   do {
-   __asm__ __volatile__(
-   "   .setpush

[PATCH 15/37] MIPS: atomic: Deduplicate 32b & 64b read, set, xchg, cmpxchg

2019-09-30 Thread Paul Burton
Remove the remaining duplication between 32b & 64b in asm/atomic.h by
making use of an ATOMIC_OPS() macro to generate:

  - atomic_read()/atomic64_read()
  - atomic_set()/atomic64_set()
  - atomic_cmpxchg()/atomic64_cmpxchg()
  - atomic_xchg()/atomic64_xchg()

This is consistent with the way all other functions in asm/atomic.h are
generated, and ensures consistency between the 32b & 64b functions.

Of note is that this results in the above now being static inline
functions rather than macros.
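
For instance, ATOMIC_OPS(atomic64, s64) expands to roughly the following
(matching the macro in the diff below):

  static __always_inline s64 atomic64_read(const atomic64_t *v)
  {
          return READ_ONCE(v->counter);
  }

  static __always_inline void atomic64_set(atomic64_t *v, s64 i)
  {
          WRITE_ONCE(v->counter, i);
  }

  static __always_inline s64 atomic64_cmpxchg(atomic64_t *v, s64 o, s64 n)
  {
          return cmpxchg(&v->counter, o, n);
  }

  static __always_inline s64 atomic64_xchg(atomic64_t *v, s64 n)
  {
          return xchg(&v->counter, n);
  }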

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/atomic.h | 70 +-
 1 file changed, 27 insertions(+), 43 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index 96ef50fa2817..e5ac88392d1f 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -24,24 +24,34 @@
 #include 
 #include 
 
-#define ATOMIC_INIT(i)   { (i) }
+#define ATOMIC_OPS(pfx, type)  \
+static __always_inline type pfx##_read(const pfx##_t *v)   \
+{  \
+   return READ_ONCE(v->counter);   \
+}  \
+   \
+static __always_inline void pfx##_set(pfx##_t *v, type i)  \
+{  \
+   WRITE_ONCE(v->counter, i);  \
+}  \
+   \
+static __always_inline type pfx##_cmpxchg(pfx##_t *v, type o, type n)  \
+{  \
return cmpxchg(&v->counter, o, n);  \
+}  \
+   \
+static __always_inline type pfx##_xchg(pfx##_t *v, type n) \
+{  \
return xchg(&v->counter, n);\
+}
 
-/*
- * atomic_read - read atomic variable
- * @v: pointer of type atomic_t
- *
- * Atomically reads the value of @v.
- */
-#define atomic_read(v) READ_ONCE((v)->counter)
+#define ATOMIC_INIT(i) { (i) }
+ATOMIC_OPS(atomic, int)
 
-/*
- * atomic_set - set atomic variable
- * @v: pointer of type atomic_t
- * @i: required value
- *
- * Atomically sets the value of @v to @i.
- */
-#define atomic_set(v, i)   WRITE_ONCE((v)->counter, (i))
+#ifdef CONFIG_64BIT
+# define ATOMIC64_INIT(i)  { (i) }
+ATOMIC_OPS(atomic64, s64)
+#endif
 
 #define ATOMIC_OP(pfx, op, type, c_op, asm_op, ll, sc) \
 static __inline__ void pfx##_##op(type i, pfx##_t * v) \
@@ -135,6 +145,7 @@ static __inline__ type pfx##_fetch_##op##_relaxed(type i, 
pfx##_t * v)  \
return result;  \
 }
 
+#undef ATOMIC_OPS
 #define ATOMIC_OPS(pfx, op, type, c_op, asm_op, ll, sc)
\
ATOMIC_OP(pfx, op, type, c_op, asm_op, ll, sc)  \
ATOMIC_OP_RETURN(pfx, op, type, c_op, asm_op, ll, sc)   \
@@ -254,31 +265,4 @@ ATOMIC_SIP_OP(atomic64, s64, dsubu, lld, scd)
 
 #undef ATOMIC_SIP_OP
 
-#define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
-#define atomic_xchg(v, new) (xchg(&((v)->counter), (new)))
-
-#ifdef CONFIG_64BIT
-
-#define ATOMIC64_INIT(i){ (i) }
-
-/*
- * atomic64_read - read atomic variable
- * @v: pointer of type atomic64_t
- *
- */
-#define atomic64_read(v)   READ_ONCE((v)->counter)
-
-/*
- * atomic64_set - set atomic variable
- * @v: pointer of type atomic64_t
- * @i: required value
- */
-#define atomic64_set(v, i) WRITE_ONCE((v)->counter, (i))
-
-#define atomic64_cmpxchg(v, o, n) \
-   ((__typeof__((v)->counter))cmpxchg(&((v)->counter), (o), (n)))
-#define atomic64_xchg(v, new) (xchg(&((v)->counter), (new)))
-
-#endif /* CONFIG_64BIT */
-
 #endif /* _ASM_ATOMIC_H */
-- 
2.23.0



[PATCH 00/37] MIPS: barriers & atomics cleanups

2019-09-30 Thread Paul Burton
This series consists of a bunch of cleanups to the way we handle memory
barriers (though no changes to the sync instructions we use to implement
them) & atomic memory accesses. One major goal was to ensure the
Loongson3 LL/SC errata workarounds are applied in a safe manner from
within inline-asm & that we can automatically verify the resulting
kernel binary looks reasonable. Many patches are cleanups found along
the way.

Applies atop v5.4-rc1.

Paul Burton (37):
  MIPS: Unify sc beqz definition
  MIPS: Use compact branch for LL/SC loops on MIPSr6+
  MIPS: barrier: Add __SYNC() infrastructure
  MIPS: barrier: Clean up rmb() & wmb() definitions
  MIPS: barrier: Clean up __smp_mb() definition
  MIPS: barrier: Remove fast_mb() Octeon #ifdef'ery
  MIPS: barrier: Clean up __sync() definition
  MIPS: barrier: Clean up sync_ginv()
  MIPS: atomic: Fix whitespace in ATOMIC_OP macros
  MIPS: atomic: Handle !kernel_uses_llsc first
  MIPS: atomic: Use one macro to generate 32b & 64b functions
  MIPS: atomic: Emit Loongson3 sync workarounds within asm
  MIPS: atomic: Use _atomic barriers in atomic_sub_if_positive()
  MIPS: atomic: Unify 32b & 64b sub_if_positive
  MIPS: atomic: Deduplicate 32b & 64b read, set, xchg, cmpxchg
  MIPS: bitops: Use generic builtin ffs/fls; drop cpu_has_clo_clz
  MIPS: bitops: Handle !kernel_uses_llsc first
  MIPS: bitops: Only use ins for bit 16 or higher
  MIPS: bitops: Use MIPS_ISA_REV, not #ifdefs
  MIPS: bitops: ins start position is always an immediate
  MIPS: bitops: Implement test_and_set_bit() in terms of _lock variant
  MIPS: bitops: Allow immediates in test_and_{set,clear,change}_bit
  MIPS: bitops: Use the BIT() macro
  MIPS: bitops: Avoid redundant zero-comparison for non-LLSC
  MIPS: bitops: Abstract LL/SC loops
  MIPS: bitops: Use BIT_WORD() & BITS_PER_LONG
  MIPS: bitops: Emit Loongson3 sync workarounds within asm
  MIPS: bitops: Use smp_mb__before_atomic in test_* ops
  MIPS: cmpxchg: Emit Loongson3 sync workarounds within asm
  MIPS: cmpxchg: Omit redundant barriers for Loongson3
  MIPS: futex: Emit Loongson3 sync workarounds within asm
  MIPS: syscall: Emit Loongson3 sync workarounds within asm
  MIPS: barrier: Remove loongson_llsc_mb()
  MIPS: barrier: Make __smp_mb__before_atomic() a no-op for Loongson3
  MIPS: genex: Add Loongson3 LL/SC workaround to ejtag_debug_handler
  MIPS: genex: Don't reload address unnecessarily
  MIPS: Check Loongson3 LL/SC errata workaround correctness

 arch/mips/Makefile|   2 +-
 arch/mips/Makefile.postlink   |  10 +-
 arch/mips/include/asm/atomic.h| 571 ++---
 arch/mips/include/asm/barrier.h   | 215 +--
 arch/mips/include/asm/bitops.h| 593 --
 arch/mips/include/asm/cmpxchg.h   |  59 +-
 arch/mips/include/asm/cpu-features.h  |  10 -
 arch/mips/include/asm/futex.h |   9 +-
 arch/mips/include/asm/llsc.h  |  19 +-
 .../asm/mach-malta/cpu-feature-overrides.h|   2 -
 arch/mips/include/asm/sync.h  | 207 ++
 arch/mips/kernel/genex.S  |   6 +-
 arch/mips/kernel/pm-cps.c |  20 +-
 arch/mips/kernel/syscall.c|   3 +-
 arch/mips/lib/bitops.c|  57 +-
 arch/mips/loongson64/Platform |   2 +-
 arch/mips/tools/.gitignore|   1 +
 arch/mips/tools/Makefile  |   5 +
 arch/mips/tools/loongson3-llsc-check.c| 307 +
 19 files changed, 975 insertions(+), 1123 deletions(-)
 create mode 100644 arch/mips/include/asm/sync.h
 create mode 100644 arch/mips/tools/loongson3-llsc-check.c

-- 
2.23.0



[PATCH 25/37] MIPS: bitops: Abstract LL/SC loops

2019-09-30 Thread Paul Burton
Introduce __bit_op() & __test_bit_op() macros which abstract away the
implementation of LL/SC loops. This cuts down on a lot of duplicate
boilerplate code, and also allows R10000_LLSC_WAR to be handled outside
of the individual bitop functions.
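
As an example of how the callers shrink, set_bit()'s body ends up as little
more than a single macro invocation later in the series (MIPSr2 ins fast
path omitted here):

  static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
  {
          volatile unsigned long *m = &addr[BIT_WORD(nr)];
          int bit = nr % BITS_PER_LONG;

          if (!kernel_uses_llsc) {
                  __mips_set_bit(nr, addr);
                  return;
          }

          /* OR the selected bit into the word, inside one ll/sc loop. */
          __bit_op(*m, "or\t%0, %2", "ir"(BIT(bit)));
  }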

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/bitops.h | 267 -
 1 file changed, 63 insertions(+), 204 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 3e5589320e83..5701f8b41e87 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -25,6 +25,41 @@
 #include 
 #include 
 
+#define __bit_op(mem, insn, inputs...) do {\
+   unsigned long temp; \
+   \
+   asm volatile(   \
+   "   .setpush\n" \
+   "   .set" MIPS_ISA_LEVEL "  \n" \
+   "1: " __LL  "%0, %1 \n" \
+   "   " insn  "   \n" \
+   "   " __SC  "%0, %1 \n" \
+   "   " __SC_BEQZ "%0, 1b \n" \
+   "   .setpop \n" \
+   : "="(temp), "+" GCC_OFF_SMALL_ASM()(mem) \
+   : inputs\
+   : __LLSC_CLOBBER);  \
+} while (0)
+
+#define __test_bit_op(mem, ll_dst, insn, inputs...) ({ \
+   unsigned long orig, temp;   \
+   \
+   asm volatile(   \
+   "   .setpush\n" \
+   "   .set" MIPS_ISA_LEVEL "  \n" \
+   "1: " __LL  ll_dst ", %2\n" \
+   "   " insn  "   \n" \
+   "   " __SC  "%1, %2 \n" \
+   "   " __SC_BEQZ "%1, 1b \n" \
+   "   .setpop \n" \
+   : "="(orig), "="(temp), \
+ "+" GCC_OFF_SMALL_ASM()(mem)  \
+   : inputs\
+   : __LLSC_CLOBBER);  \
+   \
+   orig;   \
+})
+
 /*
  * These are the "slower" versions of the functions and are in bitops.c.
  * These functions call raw_local_irq_{save,restore}().
@@ -54,55 +89,20 @@ static inline void set_bit(unsigned long nr, volatile 
unsigned long *addr)
 {
unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
int bit = nr & SZLONG_MASK;
-   unsigned long temp;
 
if (!kernel_uses_llsc) {
__mips_set_bit(nr, addr);
return;
}
 
-   if (R10000_LLSC_WAR) {
-   __asm__ __volatile__(
-   "   .setpush\n"
-   "   .setarch=r4000  \n"
-   "1: " __LL "%0, %1  # set_bit   \n"
-   "   or  %0, %2  \n"
-   "   " __SC  "%0, %1 \n"
-   "   beqzl   %0, 1b  \n"
-   "   .setpop \n"
-   : "=" (temp), "=" GCC_OFF_SMALL_ASM() (*m)
-   : "ir" (BIT(bit)), GCC_OFF_SMALL_ASM() (*m)
-   : __LLSC_CLOBBER);
-   return;
-   }
-
if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(bit) && (bit >= 16)) {
loongson_llsc_mb();
-   do {
-   __asm__ __volatile__(
-   "   " __LL "%0, %1  # set_bit   \n"
-   "   " __INS "%0, %3, %2, 1  \n"
-   "   " __SC "%0, %1  \n"
-   : "=" (temp), "+" GCC_OFF_SM

[PATCH 09/37] MIPS: atomic: Fix whitespace in ATOMIC_OP macros

2019-09-30 Thread Paul Burton
We define macros in asm/atomic.h which end each line with space
characters before a backslash to continue on the next line. Remove the
space characters leaving tabs as the whitespace used for conformity with
coding convention.

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/atomic.h | 184 -
 1 file changed, 92 insertions(+), 92 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index 7578c807ef98..2d2a8a74c51b 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -42,102 +42,102 @@
  */
 #define atomic_set(v, i)   WRITE_ONCE((v)->counter, (i))
 
-#define ATOMIC_OP(op, c_op, asm_op)  \
-static __inline__ void atomic_##op(int i, atomic_t * v)
  \
-{\
-   if (kernel_uses_llsc) {   \
-   int temp; \
- \
-   loongson_llsc_mb();   \
-   __asm__ __volatile__( \
-   "   .setpush\n"   \
-   "   .set"MIPS_ISA_LEVEL"\n"   \
-   "1: ll  %0, %1  # atomic_" #op "\n"   \
-   "   " #asm_op " %0, %2  \n"   \
-   "   sc  %0, %1  \n"   \
-   "\t" __SC_BEQZ "%0, 1b  \n"   \
-   "   .setpop \n"   \
-   : "=" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)  \
-   : "Ir" (i) : __LLSC_CLOBBER); \
-   } else {  \
-   unsigned long flags;  \
- \
-   raw_local_irq_save(flags);\
-   v->counter c_op i;\
-   raw_local_irq_restore(flags); \
-   } \
+#define ATOMIC_OP(op, c_op, asm_op)\
+static __inline__ void atomic_##op(int i, atomic_t * v)
\
+{  \
+   if (kernel_uses_llsc) { \
+   int temp;   \
+   \
+   loongson_llsc_mb(); \
+   __asm__ __volatile__(   \
+   "   .setpush\n" \
+   "   .set"MIPS_ISA_LEVEL"\n" \
+   "1: ll  %0, %1  # atomic_" #op "\n" \
+   "   " #asm_op " %0, %2  \n" \
+   "   sc  %0, %1  \n" \
+   "\t" __SC_BEQZ "%0, 1b  \n" \
+   "   .setpop \n" \
+   : "=" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)\
+   : "Ir" (i) : __LLSC_CLOBBER);   \
+   } else {\
+   unsigned long flags;\
+   \
+   raw_local_irq_save(flags);  \
+   v->counter c_op i;  \
+   raw_local_irq_restore(flags);   \
+   }   \
 }
 
-#define ATOMIC_OP_RETURN(op, c_op, asm_op)   \
-static __inline__ int atomic_##op##_return_relaxed(int i, atomic_t * v)
  \
-{   
