date:20200521

Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice

2020-05-21 Thread Guenter Roeck

On Fri, May 22, 2020 at 02:35:23AM +0100, Al Viro wrote:
> On Fri, May 22, 2020 at 02:29:50AM +0100, Al Viro wrote:
> > On Thu, May 21, 2020 at 06:11:08PM -0700, Guenter Roeck wrote:
> > 
> > > Mainline, with:
> > > 
> > > qemu-system-sparc -M SS-4 -kernel arch/sparc/boot/zImage -no-reboot \
> > >   -snapshot -drive file=rootfs.ext2,format=raw,if=scsi \
> > >   -append "panic=-1 slub_debug=FZPUA root=/dev/sda console=ttyS0" \
> > >   -nographic -monitor none
> > > 
> > > The machine doesn't really matter, though.
> > 
> > It does, unfortunately - try that with SS-10 and watch what happens ;-/
> 
> Ugh...  It's actually something in -m handling: -m 256 passes, -m 512
> leads to that panic.

Default seems to be 128M. Anyway, see below log for SS-10.

Guenter

---
Configuration device id QEMU version 1 machine id 64
Probing SBus slot 0 offset 0
Probing SBus slot 1 offset 0
Probing SBus slot 2 offset 0
Probing SBus slot 3 offset 0
Probing SBus slot 15 offset 0
Invalid FCode start byte
CPUs: 1 x TI,TMS390Z55
UUID: ----
Welcome to OpenBIOS v1.1 built on Oct 28 2019 17:08
  Type 'help' for detailed information
[sparc] Kernel already loaded
switching to new context:
PROMLIB: obio_ranges 1
PROMLIB: Sun Boot Prom Version 3 Revision 2
Linux version 5.7.0-rc6-00026-g03fb3acae4be (gro...@server.roeck-us.net) (gcc version 6.5.0 (Buildroot 2018.11-rc2-00071-g4310260), GNU ld (GNU Binutils) 2.31.1) #1 Thu May 21 19:17:48 PDT 2020
printk: bootconsole [earlyprom0] enabled
ARCH: SUN4M
TYPE: Sun4m SparcStation10/20
Ethernet address: 52:54:00:12:34:56
OF stdout device is: /obio/zs@0,10:a
PROM: Built device tree with 30586 bytes of memory.
Booting Linux...
Power off control detected.
Built 1 zonelists, mobility grouping on.  Total pages: 25012
Kernel command line: panic=-1 slub_debug=FZPUA root=/dev/sda console=ttyS0
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes, linear)
Sorting __ex_table...
mem auto-init: stack:off, heap alloc:off, heap free:off
Memory: 103428K/100944K available (5050K kernel code, 178K rwdata, 1212K rodata, 176K init, 158K bss, 4294964812K reserved, 0K cma-reserved, 0K highmem)
NR_IRQS: 64
Console: colour dummy device 80x25

| Locking API testsuite:

 | spin |wlock |rlock |mutex | wsem | rsem |
  --
 A-A deadlock:failed|failed|  ok  |failed|failed|failed|failed|
 A-B-B-A deadlock:failed|failed|  ok  |failed|failed|failed|failed|
 A-B-B-C-C-A deadlock:failed|failed|  ok  |failed|failed|failed|failed|
 A-B-C-A-B-C deadlock:failed|failed|  ok  |failed|failed|failed|failed|
 A-B-B-C-C-D-D-A deadlock:failed|failed|  ok  |failed|failed|failed|failed|
 A-B-C-D-B-D-D-A deadlock:failed|failed|  ok  |failed|failed|failed|failed|
 A-B-C-D-B-C-D-A deadlock:failed|failed|  ok  |failed|failed|failed|failed|
double unlock:  ok  |  ok  |failed|  ok  |failed|failed|  ok  |
  initialize held:failed|failed|failed|failed|failed|failed|failed|
  --
  recursive read-lock: |  ok  | |failed|
   recursive read-lock #2: |  ok  | |failed|
mixed read-write-lock: |failed| |failed|
mixed write-read-lock: |failed| |failed|
  mixed read-lock/lock-write ABBA: |failed| |failed|
   mixed read-lock/lock-read ABBA: |  ok  | |failed|
 mixed write-lock/lock-write ABBA: |failed| |failed|
  --
 hard-irqs-on + irq-safe-A/12:failed|failed|  ok  |
 soft-irqs-on + irq-safe-A/12:failed|failed|  ok  |
 hard-irqs-on + irq-safe-A/21:failed|failed|  ok  |
 soft-irqs-on + irq-safe-A/21:failed|failed|  ok  |
   sirq-safe-A => hirqs-on/12:failed|failed|  ok  |
   sirq-safe-A => hirqs-on/21:failed|failed|  ok  |
 hard-safe-A + irqs-on/12:failed|failed|  ok  |
 soft-safe-A + irqs-on/12:failed|failed|  ok  |
 hard-safe-A + irqs-on/21:failed|failed|  ok  |
 soft-safe-A + irqs-on/21:failed|failed|  ok  |
hard-safe-A + unsafe-B #1/123:failed|failed|  ok  |
soft-safe-A + unsafe-B #1/123:failed|failed|  ok  |
hard-safe-A + unsafe-B #1/132:failed|failed|  ok  |
soft-safe-A + unsafe-B #1/132:failed|failed|  ok  |
hard-safe-A + unsafe-B #1/213:failed|failed|  ok  |
soft-safe-A + unsafe-B #1/213:failed|failed|  ok  |
hard-safe-A + unsafe-B #1/231:failed|failed|  ok  |
soft-safe-A + unsafe-B

Re: Endless soft-lockups for compiling workload since next-20200519

2020-05-21 Thread Qian Cai

On Thu, May 21, 2020 at 11:39:38AM +0200, Peter Zijlstra wrote:
> On Thu, May 21, 2020 at 02:40:36AM +0200, Frederic Weisbecker wrote:
> > On Wed, May 20, 2020 at 02:50:56PM +0200, Peter Zijlstra wrote:
> > > On Tue, May 19, 2020 at 11:58:17PM -0400, Qian Cai wrote:
> > > > Just a heads-up. Repeatedly compiling kernels for a while would trigger
> > > > endless soft-lockups since next-20200519 on both x86_64 and powerpc.
> > > > .config are in,
> > > 
> > > Could be 90b5363acd47 ("sched: Clean up scheduler_ipi()"), although I've
> > > not seen anything like that myself. Let me go have a look.
> > > 
> > > 
> > > In as far as the logs are readable (they're a wrapped mess, please don't
> > > do that!), they contain very little useful, as is typical with IPIs :/
> > > 
> > > > [ 1167.993773][C1] WARNING: CPU: 1 PID: 0 at kernel/smp.c:127
> > > > flush_smp_call_function_queue+0x1fa/0x2e0
> > 
> > So I've tried to think of a race that could produce that and here is
> > the only thing I could come up with. It's a bit complicated unfortunately:
> 
> This:
> 
> > smp_call_function_single_async() {        smp_call_function_single_async() {
> >     // verified csd->flags != CSD_LOCK        // verified csd->flags != CSD_LOCK
> >     csd->flags = CSD_LOCK                     csd->flags = CSD_LOCK
> 
> concurrent smp_call_function_single_async() using the same csd is what
> I'm looking at as well. Now in the ILB case there is an easy cure:
> 
> (because there is only a single ilb target)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 01f94cf52783..b6d8a7b991f0 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10033,7 +10033,7 @@ static void kick_ilb(unsigned int flags)
>  	 * is idle. And the softirq performing nohz idle load balance
>  	 * will be run before returning from the IPI.
>  	 */
> -	smp_call_function_single_async(ilb_cpu, &cpu_rq(ilb_cpu)->nohz_csd);
> +	smp_call_function_single_async(ilb_cpu, &this_rq()->nohz_csd);
>  }
>  
>  /*
> 
> Qian, can you give that a spin?

Running for a few hours now. It works fine.
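
A note on the race quoted above: both callers check the lock flag and then set it as two separate steps, so each can see the flag clear before the other one sets it. The following is a minimal user-space sketch in C11 of why that check-then-set window matters and how a single atomic compare-and-exchange closes it. It is not the kernel's code and not the fix proposed in this thread; fake_csd, FAKE_CSD_LOCK, fake_csd_try_claim() and fake_csd_release() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a "descriptor is in flight" flag. */
#define FAKE_CSD_LOCK 1u

/* Hypothetical descriptor; the real kernel structure looks different. */
struct fake_csd {
	atomic_uint flags;
};

/*
 * Try to claim the descriptor. The check ("is it free?") and the set
 * ("mark it busy") happen as one atomic step, so two concurrent callers
 * cannot both succeed on the same descriptor.
 */
static bool fake_csd_try_claim(struct fake_csd *csd)
{
	unsigned int expected = 0;

	assert(csd != NULL);
	return atomic_compare_exchange_strong(&csd->flags, &expected,
					      FAKE_CSD_LOCK);
}

/* Mark the descriptor free again once the callback has run. */
static void fake_csd_release(struct fake_csd *csd)
{
	assert(csd != NULL);
	atomic_store(&csd->flags, 0);
}
```

With a non-atomic "if (flags != LOCK) flags = LOCK" sequence, the second claim in the usage above could also succeed when it races with the first, which is the situation the quoted diagram describes.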

Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice

2020-05-21 Thread Al Viro

On Fri, May 22, 2020 at 02:29:50AM +0100, Al Viro wrote:
> On Thu, May 21, 2020 at 06:11:08PM -0700, Guenter Roeck wrote:
> 
> > Mainline, with:
> > 
> > qemu-system-sparc -M SS-4 -kernel arch/sparc/boot/zImage -no-reboot \
> > -snapshot -drive file=rootfs.ext2,format=raw,if=scsi \
> > -append "panic=-1 slub_debug=FZPUA root=/dev/sda console=ttyS0" \
> > -nographic -monitor none
> > 
> > The machine doesn't really matter, though.
> 
> It does, unfortunately - try that with SS-10 and watch what happens ;-/

Ugh...  It's actually something in -m handling: -m 256 passes, -m 512
leads to that panic.

Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice

2020-05-21 Thread Al Viro

On Thu, May 21, 2020 at 06:11:08PM -0700, Guenter Roeck wrote:

> Mainline, with:
> 
> qemu-system-sparc -M SS-4 -kernel arch/sparc/boot/zImage -no-reboot \
>   -snapshot -drive file=rootfs.ext2,format=raw,if=scsi \
>   -append "panic=-1 slub_debug=FZPUA root=/dev/sda console=ttyS0" \
>   -nographic -monitor none
> 
> The machine doesn't really matter, though.

It does, unfortunately - try that with SS-10 and watch what happens ;-/

Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice

2020-05-21 Thread Guenter Roeck

On 5/21/20 5:46 PM, Al Viro wrote:
> On Thu, May 21, 2020 at 11:46:12PM +0100, Al Viro wrote:
>> On Thu, May 21, 2020 at 03:20:46PM -0700, Guenter Roeck wrote:
>>> On 5/21/20 10:27 AM, Al Viro wrote:
>>>> On Tue, May 19, 2020 at 09:54:22AM -0700, Guenter Roeck wrote:
>>>>> On Mon, May 18, 2020 at 11:48:43AM -0700, ira.we...@intel.com wrote:
>>>>>> From: Ira Weiny
>>>>>>
>>>>>> The kunmap_atomic clean up failed to remove one set of pagefault/preempt
>>>>>> enables when vaddr is not in the fixmap.
>>>>>>
>>>>>> Fixes: bee2128a09e6 ("arch/kunmap_atomic: consolidate duplicate code")
>>>>>> Signed-off-by: Ira Weiny
>>>>>
>>>>> microblazeel works with this patch, as do the nosmp sparc32 boot tests,
>>>>> but sparc32 boot tests with SMP enabled still fail with lots of messages
>>>>> such as:
>>>>
>>>> BTW, what's your setup for sparc32 boot tests?  IOW, how do you manage to
>>>> shrink the damn thing enough to have the loader cope with it?  I hadn't
>>>> been able to do that for the current mainline ;-/
>>>>
>>>
>>> defconfig seems to work just fine, even after enabling various debug
>>> and file system options.
>>
>> The hell?  How do you manage to get the kernel in?  sparc32_defconfig
>> ends up with 5316876 bytes unpacked...
> 
> Incidentally, trying to load it via -kernel/-initrd leads to
> Configuration device id QEMU version 1 machine id 64
> Probing SBus slot 0 offset 0
> Probing SBus slot 1 offset 0
> Probing SBus slot 2 offset 0
> Probing SBus slot 3 offset 0
> Probing SBus slot 15 offset 0
> Invalid FCode start byte
> CPUs: 1 x TI,TMS390Z55
> UUID: ----
> Welcome to OpenBIOS v1.1 built on Dec 27 2018 19:17
>   Type 'help' for detailed information
> [sparc] Kernel already loaded
> switching to new context:
> PROMLIB: obio_ranges 1
> PROMLIB: Sun Boot Prom Version 3 Revision 2
> Linux version 5.7.0-rc1-2-gcf51e129b968 (al@duke) (gcc version 6.3.0 20170516 (Debian 6.3.0-18), GNU ld (GNU Binutils for Debian) 2.28) #32 Thu May 21 18:36:07 EDT 2020
> printk: bootconsole [earlyprom0] enabled
> ARCH: SUN4M
> TYPE: Sun4m SparcStation10/20
> Ethernet address: 52:54:00:12:34:56
> 303MB HIGHMEM available.
> OF stdout device is: /obio/zs@0,10:a
> PROM: Built device tree with 30051 bytes of memory.
> Booting Linux...
> Power off control detected.
> Kernel panic - not syncing: Failed to allocate memory for percpu areas.
> CPU: 0 PID: 0 Comm: swapper Not tainted 5.7.0-rc1-2-gcf51e129b968 #32
> [f04f92a8 : setup_per_cpu_areas+0x58/0x90 ]
> [f04edbf4 : start_kernel+0xc0/0x4a0 ]
> [f04ed43c : continue_boot+0x324/0x334 ]
> [ : 0x0 ]
> 
> Press Stop-A (L1-A) from sun keyboard or send break
> twice on console to return to the boot prom
> ---[ end Kernel panic - not syncing: Failed to allocate memory for percpu areas. ]---
>
> Giving guest more RAM doesn't change the outcome (well, the number on the
> HIGHMEM line is obviously higher, but that's it).
> 
> So which sparc32 kernel have you booted with defconfig and how have you done
> that?
> 

Mainline, with:

qemu-system-sparc -M SS-4 -kernel arch/sparc/boot/zImage -no-reboot \
-snapshot -drive file=rootfs.ext2,format=raw,if=scsi \
-append "panic=-1 slub_debug=FZPUA root=/dev/sda console=ttyS0" \
-nographic -monitor none

The machine doesn't really matter, though. The root file system is built
with buildroot.

Note that I carry two reverts in my qemu images.

Revert "tcx: switch to load_image_mr() and remove prom_addr hack"
Revert "cg3: switch to load_image_mr() and remove prom-addr hack"

I have been carrying those since ~2017. I didn't check recently
if they are still needed.

If sparc32 is no longer supported in the upstream kernel,
would it possibly make sense to remove its support?

Guenter

Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice

2020-05-21 Thread Al Viro

On Thu, May 21, 2020 at 11:46:12PM +0100, Al Viro wrote:
> On Thu, May 21, 2020 at 03:20:46PM -0700, Guenter Roeck wrote:
> > On 5/21/20 10:27 AM, Al Viro wrote:
> > > On Tue, May 19, 2020 at 09:54:22AM -0700, Guenter Roeck wrote:
> > >> On Mon, May 18, 2020 at 11:48:43AM -0700, ira.we...@intel.com wrote:
> > >>> From: Ira Weiny 
> > >>>
> > >>> The kunmap_atomic clean up failed to remove one set of pagefault/preempt
> > >>> enables when vaddr is not in the fixmap.
> > >>>
> > >>> Fixes: bee2128a09e6 ("arch/kunmap_atomic: consolidate duplicate code")
> > >>> Signed-off-by: Ira Weiny 
> > >>
> > >> microblazeel works with this patch, as do the nosmp sparc32 boot tests,
> > >> but sparc32 boot tests with SMP enabled still fail with lots of messages
> > >> such as:
> > > 
> > > BTW, what's your setup for sparc32 boot tests?  IOW, how do you manage to
> > > shrink the damn thing enough to have the loader cope with it?  I hadn't
> > > been able to do that for the current mainline ;-/
> > > 
> > 
> > defconfig seems to work just fine, even after enabling various debug
> > and file system options.
> 
> The hell?  How do you manage to get the kernel in?  sparc32_defconfig
> ends up with 5316876 bytes unpacked...

Incidentally, trying to load it via -kernel/-initrd leads to
Configuration device id QEMU version 1 machine id 64
Probing SBus slot 0 offset 0
Probing SBus slot 1 offset 0
Probing SBus slot 2 offset 0
Probing SBus slot 3 offset 0
Probing SBus slot 15 offset 0
Invalid FCode start byte
CPUs: 1 x TI,TMS390Z55
UUID: ----
Welcome to OpenBIOS v1.1 built on Dec 27 2018 19:17
  Type 'help' for detailed information
[sparc] Kernel already loaded
switching to new context:
PROMLIB: obio_ranges 1
PROMLIB: Sun Boot Prom Version 3 Revision 2
Linux version 5.7.0-rc1-2-gcf51e129b968 (al@duke) (gcc version 6.3.0 20170516 (Debian 6.3.0-18), GNU ld (GNU Binutils for Debian) 2.28) #32 Thu May 21 18:36:07 EDT 2020
printk: bootconsole [earlyprom0] enabled
ARCH: SUN4M
TYPE: Sun4m SparcStation10/20
Ethernet address: 52:54:00:12:34:56
303MB HIGHMEM available.
OF stdout device is: /obio/zs@0,10:a
PROM: Built device tree with 30051 bytes of memory.
Booting Linux...
Power off control detected.
Kernel panic - not syncing: Failed to allocate memory for percpu areas.
CPU: 0 PID: 0 Comm: swapper Not tainted 5.7.0-rc1-2-gcf51e129b968 #32
[f04f92a8 : setup_per_cpu_areas+0x58/0x90 ]
[f04edbf4 : start_kernel+0xc0/0x4a0 ]
[f04ed43c : continue_boot+0x324/0x334 ]
[ : 0x0 ]

Press Stop-A (L1-A) from sun keyboard or send break
twice on console to return to the boot prom
---[ end Kernel panic - not syncing: Failed to allocate memory for percpu areas. ]---

Giving guest more RAM doesn't change the outcome (well, the number on the
HIGHMEM line is obviously higher, but that's it).

So which sparc32 kernel have you booted with defconfig and how have you done
that?

Re: [PATCH v2 0/7] padata: parallelize deferred page init

2020-05-21 Thread Josh Triplett

On Wed, May 20, 2020 at 02:26:38PM -0400, Daniel Jordan wrote:
> Please review and test, and thanks to Alex, Andrew, Josh, and Pavel for
> their feedback in the last version.

I re-tested v2:

Tested-by: Josh Triplett 

[0.231435] node 1 initialised, 24189223 pages in 32ms
[0.236718] node 0 initialised, 23398907 pages in 36ms

Re: [RESEND PATCH v7 3/5] powerpc/papr_scm: Fetch nvdimm health information from PHYP

2020-05-21 Thread Ira Weiny

On Wed, May 20, 2020 at 10:45:58PM +0530, Vaibhav Jain wrote:
...

> > On Wed, May 20, 2020 at 12:30:56AM +0530, Vaibhav Jain wrote:

...

> >> @@ -39,6 +78,15 @@ struct papr_scm_priv {
> >>struct resource res;
> >>struct nd_region *region;
> >>struct nd_interleave_set nd_set;
> >> +
> >> +  /* Protect dimm health data from concurrent read/writes */
> >> +  struct mutex health_mutex;
> >> +
> >> +  /* Last time the health information of the dimm was updated */
> >> +  unsigned long lasthealth_jiffies;
> >> +
> >> +  /* Health information for the dimm */
> >> +  u64 health_bitmap;
> >
> > I wonder if this should be typed big endian as you mention that it is in the
> > commit message?
> This was discussed in an earlier review of the patch series at
> https://lore.kernel.org/linux-nvdimm/878sjetcis@mpe.ellerman.id.au
> 
> Even though health bitmap is returned in big endian format (For ex
> value 0xC00 indicates bits 0,1 set), its value is never
> used. Instead only test for specific bits being set in the register is
> done.
> 
> Hence using native cpu type instead of __be64 to store this value.

ok.

> 
> >
> >>  };
> >>  
> >>  static int drc_pmem_bind(struct papr_scm_priv *p)
> >> @@ -144,6 +192,62 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
> >>return drc_pmem_bind(p);
> >>  }
> >>  
> >> +/*
> >> + * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
> >> + * health information.
> >> + */
> >> +static int __drc_pmem_query_health(struct papr_scm_priv *p)
> >> +{
> >> +  unsigned long ret[PLPAR_HCALL_BUFSIZE];
> >
> > Is this exclusive to 64bit?  Why not u64?
> Yes this is specific to 64 bit as the array holds 64 bit register values
> returned from PHYP. We can use u64 here, but that will be a departure from
> existing practice within arch/powerpc code to use an unsigned long array to
> fetch returned values for PHYP.
> 
> >
> >> +  s64 rc;
> >
> > plpar_hcall() returns long and this function returns int and rc is declared
> > s64?
> >
> > Why not have them all be long to follow plpar_hcall?
> Yes 'long' type is better suited for variable 'rc' and I will get it fixed.
> 
> But the value of variable 'rc' is never directly returned from this
> function, we always return kernel error codes instead. Hence the
> return type of this function is consistent.

Honestly masking the error return of plpar_hcall() seems a problem as well...
but ok.

Ira

> 
> >
> >> +
> >> +  /* issue the hcall */
> >> +  rc = plpar_hcall(H_SCM_HEALTH, ret, p->drc_index);
> >> +  if (rc != H_SUCCESS) {
> >> +  dev_err(&p->pdev->dev,
> >> +   "Failed to query health information, Err:%lld\n", rc);
> >> +  rc = -ENXIO;
> >> +  goto out;
> >> +  }
> >> +
> >> +  p->lasthealth_jiffies = jiffies;
> >> +  p->health_bitmap = ret[0] & ret[1];
> >> +
> >> +  dev_dbg(&p->pdev->dev,
> >> +  "Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
> >> +  ret[0], ret[1]);
> >> +out:
> >> +  return rc;
> >> +}
> >> +
> >> +/* Min interval in seconds for assuming stable dimm health */
> >> +#define MIN_HEALTH_QUERY_INTERVAL 60
> >> +
> >> +/* Query cached health info and if needed call drc_pmem_query_health */
> >> +static int drc_pmem_query_health(struct papr_scm_priv *p)
> >> +{
> >> +  unsigned long cache_timeout;
> >> +  s64 rc;
> >> +
> >> +  /* Protect concurrent modifications to papr_scm_priv */
> >> +  rc = mutex_lock_interruptible(&p->health_mutex);
> >> +  if (rc)
> >> +  return rc;
> >> +
> >> +  /* Jiffies offset for which the health data is assumed to be same */
> >> +  cache_timeout = p->lasthealth_jiffies +
> >> +  msecs_to_jiffies(MIN_HEALTH_QUERY_INTERVAL * 1000);
> >> +
> >> +  /* Fetch new health info if it's older than MIN_HEALTH_QUERY_INTERVAL */
> >> +  if (time_after(jiffies, cache_timeout))
> >> +  rc = __drc_pmem_query_health(p);
> >
> > And back to s64 after returning int?
> Agree, will change 's64 rc' to 'int rc'.
> 
> >
> >> +  else
> >> +  /* Assume cached health data is valid */
> >> +  rc = 0;
> >> +
> >> +  mutex_unlock(&p->health_mutex);
> >> +  return rc;
> >> +}
> >>  
> >>  static int papr_scm_meta_get(struct papr_scm_priv *p,
> >> struct nd_cmd_get_config_data_hdr *hdr)
> >> @@ -286,6 +390,64 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
> >>return 0;
> >>  }
> >>  
> >> +static ssize_t flags_show(struct device *dev,
> >> +  struct device_attribute *attr, char *buf)
> >> +{
> >> +  struct nvdimm *dimm = to_nvdimm(dev);
> >> +  struct papr_scm_priv *p = nvdimm_provider_data(dimm);
> >> +  struct seq_buf s;
> >> +  u64 health;
> >> +  int rc;
> >> +
> >> +  rc = drc_pmem_query_health(p);
> >
> > and back to int...
> >
> drc_pmem_query_health() returns an 'int' so the type of variable 'rc'
> looks correct to me.
> 
> > Just make them long all through...
> I think t
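
For readers following the quoted drc_pmem_query_health(): it caches the health data and only re-queries once the jiffies counter has moved past a deadline, using time_after() so that the comparison stays correct when the counter wraps. Below is a minimal user-space sketch of that pattern, assuming a free-running unsigned long tick counter. It is not the driver's code; tick_after(), struct health_cache and cache_is_stale() are hypothetical names, and tick_after() only mirrors the signed-difference idea behind the kernel macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Wraparound-safe "a is after b" for a free-running counter: the
 * unsigned difference is reinterpreted as signed, so the result stays
 * correct as long as the two values are less than half the counter
 * range apart.
 */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical cache bookkeeping; names are for illustration only. */
struct health_cache {
	unsigned long last_update; /* tick at which data was last fetched */
	unsigned long interval;    /* ticks for which the data is trusted */
};

/* True once "now" has moved past last_update + interval. */
static bool cache_is_stale(const struct health_cache *c, unsigned long now)
{
	assert(c != NULL);
	return tick_after(now, c->last_update + c->interval);
}
```

A plain "now > deadline" comparison would give the wrong answer whenever last_update + interval overflows and wraps to a small value, which is the case the signed difference handles.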

Re: Fwd: [CRON] Broken: ClangBuiltLinux/continuous-integration#1432 (master - 0aceafc)

2020-05-21 Thread Nathan Chancellor

On Thu, May 21, 2020 at 03:23:11PM -0700, Nick Desaulniers wrote:
> On Thu, May 21, 2020 at 6:00 AM Michael Ellerman  wrote:
> >
> > Nathan Chancellor  writes:
> > > On Tue, May 19, 2020 at 05:56:32PM -0700, 'Nick Desaulniers' via Clang Built Linux wrote:
> > >> Looks like our CI is still red from this:
> > >>
> > >> https://travis-ci.com/github/ClangBuiltLinux/continuous-integration/builds/166854584
> > >>
> > >> Filing a bug to follow up on:
> > >> https://github.com/ClangBuiltLinux/linux/issues/1031
> > >>
> > >> On Thu, May 7, 2020 at 8:29 PM Michael Ellerman wrote:
> > >> >
> > >> > Nick Desaulniers  writes:
> > >> > > Looks like ppc64le powernv_defconfig is suddenly failing the locking
> > >> > > torture tests, then locks up?
> > >> > > https://travis-ci.com/github/ClangBuiltLinux/continuous-integration/jobs/329211572#L3111-L3167
> > >> > > Any recent changes related here in -next?  I believe this is the first
> > >> > > failure, so I'll report back if we see this again.
> > >> >
> > >> > Thanks for the report.
> > >> >
> > >> > There's nothing newly in next-20200507 that seems related.
> > ...
> > >
> > > This is probably still a manifestation of
> > > https://github.com/ClangBuiltLinux/continuous-integration/issues/262
> > > because rekicking the tests usually fixes it.
> 
> I thought we had upgraded our version of QEMU in response to this already?
> https://github.com/ClangBuiltLinux/dockerimage/pull/44
> https://github.com/ClangBuiltLinux/dockerimage/pull/46

That was more of a bandaid than an actual fix. It happens a lot less
often with QEMU 4.2.0 but I could still reproduce that hang, very
rarely, with the POWER9 machines on it. My machines are way more
powerful than the ones on Travis, which I am sure factors into that.

The real solution is to upgrade to QEMU 5.0.0, which we could probably
do via a PPA (or through our Docker image), or wait for QEMU 4.2.1,
which should hopefully have that fix since it was CC'd for QEMU stable.

> >
> > Oh yep.
> >
> > I was looking at the RCU warning, which I still don't understand, but
> > the lockup is presumably the same problem you hit with interrupts being
> > lost.
> >
> > > We should probably just disable the torture tests like we do for x86_64
> > > for CI because we do not have access to QEMU 5.0.0 where this should be
> > > fixed. I believe it is slated for 4.2.1 as well but we still have to
> > > wait for that to be updated and packaged in Ubuntu.
> >
> > You just need to start building Qemu HEAD as part of your CI ;)
> 
> LOL
> https://github.com/ClangBuiltLinux/dockerimage/pull/46#pullrequestreview-395639442
> Yeah I think the hard part for all these dependencies is the risk
> of living on the edge of "top of tree" for all of them, and trying to
> control for some by using stable releases.  May not always be
> possible.

Unfortunately, we are at the mercy of a bunch of different parties. If
only we had a ClangBuiltLinux build server that we maintained...

Cheers,
Nathan

Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice

2020-05-21 Thread Al Viro

On Thu, May 21, 2020 at 03:20:46PM -0700, Guenter Roeck wrote:
> On 5/21/20 10:27 AM, Al Viro wrote:
> > On Tue, May 19, 2020 at 09:54:22AM -0700, Guenter Roeck wrote:
> >> On Mon, May 18, 2020 at 11:48:43AM -0700, ira.we...@intel.com wrote:
> >>> From: Ira Weiny 
> >>>
> >>> The kunmap_atomic clean up failed to remove one set of pagefault/preempt
> >>> enables when vaddr is not in the fixmap.
> >>>
> >>> Fixes: bee2128a09e6 ("arch/kunmap_atomic: consolidate duplicate code")
> >>> Signed-off-by: Ira Weiny 
> >>
> >> microblazeel works with this patch, as do the nosmp sparc32 boot tests,
> >> but sparc32 boot tests with SMP enabled still fail with lots of messages
> >> such as:
> > 
> > BTW, what's your setup for sparc32 boot tests?  IOW, how do you manage to
> > shrink the damn thing enough to have the loader cope with it?  I hadn't
> > been able to do that for the current mainline ;-/
> > 
> 
> defconfig seems to work just fine, even after enabling various debug
> and file system options.

The hell?  How do you manage to get the kernel in?  sparc32_defconfig
ends up with 5316876 bytes unpacked...

RE: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice

2020-05-21 Thread Weiny, Ira

> On 5/21/20 10:42 AM, Ira Weiny wrote:
> > On Thu, May 21, 2020 at 09:05:41AM -0700, Guenter Roeck wrote:
> >> On 5/19/20 10:13 PM, Ira Weiny wrote:
> >>> On Tue, May 19, 2020 at 12:42:15PM -0700, Guenter Roeck wrote:
> >>>> On Tue, May 19, 2020 at 11:40:32AM -0700, Ira Weiny wrote:
> >>>>> On Tue, May 19, 2020 at 09:54:22AM -0700, Guenter Roeck wrote:
> >>>>>> On Mon, May 18, 2020 at 11:48:43AM -0700, ira.we...@intel.com wrote:
> >>>>>>> From: Ira Weiny
> >>>>>>>
> >>>>>>> The kunmap_atomic clean up failed to remove one set of
> >>>>>>> pagefault/preempt enables when vaddr is not in the fixmap.
> >>>>>>>
> >>>>>>> Fixes: bee2128a09e6 ("arch/kunmap_atomic: consolidate duplicate code")
> >>>>>>> Signed-off-by: Ira Weiny
> >>>>>>
> >>>>>> microblazeel works with this patch,
> >>>>>
> >>>>> Awesome...  Andrew in my rush yesterday I should have put a
> >>>>> reported by on the patch for Guenter as well.
> >>>>>
> >>>>> Sorry about that Guenter,
> >>>>
> >>>> No worries.
> >>>>
> >>>>> Ira
> >>>>>
> >>>>>> as do the nosmp sparc32 boot tests, but sparc32 boot tests with
> >>>>>> SMP enabled still fail with lots of messages such as:
> >>>>>>
> >>>>>> BUG: Bad page state in process swapper/0  pfn:006a1
> >>>>>> page:f0933420 refcount:0 mapcount:1 mapping:(ptrval) index:0x1
> >>>>>> flags: 0x0()
> >>>>>> raw:  0100 0122  0001
> >>>>>> page dumped because: nonzero mapcount
> >>>>>> Modules linked in:
> >>>>>> CPU: 0 PID: 1 Comm: swapper/0 Tainted: GB 5.7.0-rc6-next-20200518-2-gb178d2d56f29 #1
> >>>>>> [f00e7ab8 : bad_page+0xa8/0x108 ]
> >>>>>> [f00e8b54 : free_pcppages_bulk+0x154/0x52c ]
> >>>>>> [f00ea024 : free_unref_page+0x54/0x6c ]
> >>>>>> [f00ed864 : free_reserved_area+0x58/0xec ]
> >>>>>> [f0527104 : kernel_init+0x14/0x110 ]
> >>>>>> [f000b77c : ret_from_kernel_thread+0xc/0x38 ]
> >>>>>> [ : 0x0 ]
> >>>>>>
> >>>>>> Code path leading to that message is different but always the
> >>>>>> same from free_unref_page().
> >>>
> >>> Actually it occurs to me that the patch consolidating kmap_prot is
> >>> odd for sparc 32 bit...
> >>>
> >>> Its a long shot but could you try reverting this patch?
> >>>
> >>> 4ea7d2419e3f kmap: consolidate kmap_prot definitions
> >>>
> >>
> >> That is not easy to revert, unfortunately, due to several follow-up patches.
> >
> > I have gotten your sparc tests to run and they all pass...
> >
> > 08:10:34 > ../linux-build-test/rootfs/sparc/run-qemu-sparc.sh
> > Build reference: v5.7-rc4-17-g852b6f2edc0f
> >
> 
> That doesn't look like it is linux-next, which I guess means that something
> else in linux-next breaks it. What is your qemu version ?

Ah yeah, that was just 5.7-rc4 with my patch set applied.  Yes, it must be
something else or an interaction with my patch set.

Did I see another email with Mike which may fix this?

Ira

> 
> Thanks,
> Guenter
> 
> > Building sparc32:SPARCClassic:nosmp:scsi:hd ... running . passed
> > Building sparc32:SPARCbook:nosmp:scsi:cd ... running . passed
> > Building sparc32:LX:nosmp:noapc:scsi:hd ... running . passed
> > Building sparc32:SS-4:nosmp:initrd ... running . passed
> > Building sparc32:SS-5:nosmp:scsi:hd ... running . passed
> > Building sparc32:SS-10:nosmp:scsi:cd ... running . passed
> > Building sparc32:SS-20:nosmp:scsi:hd ... running . passed
> > Building sparc32:SS-600MP:nosmp:scsi:hd ... running . passed
> > Building sparc32:Voyager:nosmp:noapc:scsi:hd ... running . passed
> > Building sparc32:SS-4:smp:scsi:hd ... running . passed
> > Building sparc32:SS-5:smp:scsi:cd ... running . passed
> > Building sparc32:SS-10:smp:scsi:hd ... running . passed
> > Building sparc32:SS-20:smp:scsi:hd ... running . passed
> > Building sparc32:SS-600MP:smp:scsi:hd ... running . passed
> > Building sparc32:Voyager:smp:noapc:scsi:hd ... running . passed
> >
> > Is there another test I need to run?
> >
> > Ira
> >
> >
> >>
> >> Guenter
> >>
> >>> Alternately I will need to figure out how to run the sparc on qemu here...
> >>>
> >>> Thanks very much for all the testing though!  :-D
> >>>
> >>> Ira
> >>>
> >>>>>>
> >>>>>> Still testing ppc images.
> >>>>>>
> >>>>
> >>>> ppc image tests are passing with this patch.
> >>>>
> >>>> Guenter
> >>>>>>

Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice

2020-05-21 Thread Guenter Roeck

On 5/21/20 10:42 AM, Ira Weiny wrote:
> On Thu, May 21, 2020 at 09:05:41AM -0700, Guenter Roeck wrote:
>> On 5/19/20 10:13 PM, Ira Weiny wrote:
>>> On Tue, May 19, 2020 at 12:42:15PM -0700, Guenter Roeck wrote:
>>>> On Tue, May 19, 2020 at 11:40:32AM -0700, Ira Weiny wrote:
>>>>> On Tue, May 19, 2020 at 09:54:22AM -0700, Guenter Roeck wrote:
>>>>>> On Mon, May 18, 2020 at 11:48:43AM -0700, ira.we...@intel.com wrote:
>>>>>>> From: Ira Weiny
>>>>>>>
>>>>>>> The kunmap_atomic clean up failed to remove one set of pagefault/preempt
>>>>>>> enables when vaddr is not in the fixmap.
>>>>>>>
>>>>>>> Fixes: bee2128a09e6 ("arch/kunmap_atomic: consolidate duplicate code")
>>>>>>> Signed-off-by: Ira Weiny
>>>>>>
>>>>>> microblazeel works with this patch,
>>>>>
>>>>> Awesome...  Andrew in my rush yesterday I should have put a reported by
>>>>> on the patch for Guenter as well.
>>>>>
>>>>> Sorry about that Guenter,
>>>>
>>>> No worries.
>>>>
>>>>> Ira
>>>>>
>>>>>> as do the nosmp sparc32 boot tests,
>>>>>> but sparc32 boot tests with SMP enabled still fail with lots of messages
>>>>>> such as:
>>>>>>
>>>>>> BUG: Bad page state in process swapper/0  pfn:006a1
>>>>>> page:f0933420 refcount:0 mapcount:1 mapping:(ptrval) index:0x1
>>>>>> flags: 0x0()
>>>>>> raw:  0100 0122  0001
>>>>>> page dumped because: nonzero mapcount
>>>>>> Modules linked in:
>>>>>> CPU: 0 PID: 1 Comm: swapper/0 Tainted: GB 5.7.0-rc6-next-20200518-2-gb178d2d56f29 #1
>>>>>> [f00e7ab8 : bad_page+0xa8/0x108 ]
>>>>>> [f00e8b54 : free_pcppages_bulk+0x154/0x52c ]
>>>>>> [f00ea024 : free_unref_page+0x54/0x6c ]
>>>>>> [f00ed864 : free_reserved_area+0x58/0xec ]
>>>>>> [f0527104 : kernel_init+0x14/0x110 ]
>>>>>> [f000b77c : ret_from_kernel_thread+0xc/0x38 ]
>>>>>> [ : 0x0 ]
>>>>>>
>>>>>> Code path leading to that message is different but always the same
>>>>>> from free_unref_page().
>>>
>>> Actually it occurs to me that the patch consolidating kmap_prot is odd for
>>> sparc 32 bit...
>>>
>>> Its a long shot but could you try reverting this patch?
>>>
>>> 4ea7d2419e3f kmap: consolidate kmap_prot definitions
>>>
>>
>> That is not easy to revert, unfortunately, due to several follow-up patches.
> 
> I have gotten your sparc tests to run and they all pass...
> 
> 08:10:34 > ../linux-build-test/rootfs/sparc/run-qemu-sparc.sh 
> Build reference: v5.7-rc4-17-g852b6f2edc0f
> 

That doesn't look like it is linux-next, which I guess means that something
else in linux-next breaks it. What is your qemu version ?

Thanks,
Guenter

> Building sparc32:SPARCClassic:nosmp:scsi:hd ... running . passed
> Building sparc32:SPARCbook:nosmp:scsi:cd ... running . passed
> Building sparc32:LX:nosmp:noapc:scsi:hd ... running . passed
> Building sparc32:SS-4:nosmp:initrd ... running . passed
> Building sparc32:SS-5:nosmp:scsi:hd ... running . passed
> Building sparc32:SS-10:nosmp:scsi:cd ... running . passed
> Building sparc32:SS-20:nosmp:scsi:hd ... running . passed
> Building sparc32:SS-600MP:nosmp:scsi:hd ... running . passed
> Building sparc32:Voyager:nosmp:noapc:scsi:hd ... running . passed
> Building sparc32:SS-4:smp:scsi:hd ... running . passed
> Building sparc32:SS-5:smp:scsi:cd ... running . passed
> Building sparc32:SS-10:smp:scsi:hd ... running . passed
> Building sparc32:SS-20:smp:scsi:hd ... running . passed
> Building sparc32:SS-600MP:smp:scsi:hd ... running . passed
> Building sparc32:Voyager:smp:noapc:scsi:hd ... running . passed
> 
> Is there another test I need to run?
> 
> Ira
> 
> 
>>
>> Guenter
>>
>>> Alternately I will need to figure out how to run the sparc on qemu here...
>>>
>>> Thanks very much for all the testing though!  :-D
>>>
>>> Ira
>>>
>>
>>>>>> Still testing ppc images.
>>>>>>
>>>>
>>>> ppc image tests are passing with this patch.
>>>>
>>>> Guenter
>>

Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice

2020-05-21 Thread Guenter Roeck

On 5/21/20 10:27 AM, Al Viro wrote:
> On Tue, May 19, 2020 at 09:54:22AM -0700, Guenter Roeck wrote:
>> On Mon, May 18, 2020 at 11:48:43AM -0700, ira.we...@intel.com wrote:
>>> From: Ira Weiny 
>>>
>>> The kunmap_atomic clean up failed to remove one set of pagefault/preempt
>>> enables when vaddr is not in the fixmap.
>>>
>>> Fixes: bee2128a09e6 ("arch/kunmap_atomic: consolidate duplicate code")
>>> Signed-off-by: Ira Weiny 
>>
>> microblazeel works with this patch, as do the nosmp sparc32 boot tests,
>> but sparc32 boot tests with SMP enabled still fail with lots of messages
>> such as:
> 
> BTW, what's your setup for sparc32 boot tests?  IOW, how do you manage to
> shrink the damn thing enough to have the loader cope with it?  I hadn't
> been able to do that for the current mainline ;-/
> 

defconfig seems to work just fine, even after enabling various debug
and file system options.

Guenter

Re: [PATCH v2 5/7] mm: parallelize deferred_init_memmap()

2020-05-21 Thread Daniel Jordan

On Thu, May 21, 2020 at 09:46:35AM -0700, Alexander Duyck wrote:
> It is more about not bothering with the extra tracking. We don't
> really need it and having it doesn't really add much in the way of
> value.

Yeah, it can probably go.

> > > > @@ -1863,11 +1892,32 @@ static int __init deferred_init_memmap(void 
> > > > *data)
> > > > goto zone_empty;
> > > >
> > > > /*
> > > > -* Initialize and free pages in MAX_ORDER sized increments so
> > > > -* that we can avoid introducing any issues with the buddy
> > > > -* allocator.
> > > > +* More CPUs always led to greater speedups on tested systems, 
> > > > up to
> > > > +* all the nodes' CPUs.  Use all since the system is otherwise 
> > > > idle now.
> > > >  */
> > > > +   max_threads = max(cpumask_weight(cpumask), 1u);
> > > > +
> > > > while (spfn < epfn) {
> > > > +   epfn_align = ALIGN_DOWN(epfn, PAGES_PER_SECTION);
> > > > +
> > > > +   if (IS_ALIGNED(spfn, PAGES_PER_SECTION) &&
> > > > +   epfn_align - spfn >= PAGES_PER_SECTION) {
> > > > +   struct definit_args arg = { zone, 
> > > > ATOMIC_LONG_INIT(0) };
> > > > +   struct padata_mt_job job = {
> > > > +   .thread_fn   = 
> > > > deferred_init_memmap_chunk,
> > > > +   .fn_arg  = &arg,
> > > > +   .start   = spfn,
> > > > +   .size= epfn_align - spfn,
> > > > +   .align   = PAGES_PER_SECTION,
> > > > +   .min_chunk   = PAGES_PER_SECTION,
> > > > +   .max_threads = max_threads,
> > > > +   };
> > > > +
> > > > +   padata_do_multithreaded(&job);
> > > > +   nr_pages += atomic_long_read(&arg.nr_pages);
> > > > +   spfn = epfn_align;
> > > > +   }
> > > > +
> > > > nr_pages += deferred_init_maxorder(&i, zone, &spfn, 
> > > > &epfn);
> > > > cond_resched();
> > > > }
> > >
> > > This doesn't look right. You are basically adding threads in addition
> > > to calls to deferred_init_maxorder.
> >
> > The deferred_init_maxorder call is there to do the remaining, non-section
> > aligned part of a range.  It doesn't have to be done this way.
> 
> It is also doing the advancing though isn't it?

Yes.  Not sure what you're getting at.  There's the 'spfn = epfn_align' before
so nothing is skipped.  It's true that the nonaligned part is done outside of
padata when it could be done by a thread that'd otherwise be waiting or idle,
which should be addressed in the next version.

> I think I resolved this with the fix for it I described in the other
> email. We just need to swap out spfn for epfn and make sure we align
> spfn with epfn_align. Then I think that takes care of possible skips.

Right, though your fix looks a lot like deferred_init_mem_pfn_range_in_zone().
Seems better to just use that and not repeat ourselves.  Lame that it's
starting at the beginning of the ranges every time, maybe it could be
generalized somehow, but I think it should be fast enough.

> > We could use deferred_init_mem_pfn_range_in_zone() instead of the for_each
> > loop.
> >
> > What I was trying to avoid by aligning down is creating a discontiguous pfn
> > range that gets passed to padata.  We already discussed how those are handled
> > by the zone iterator in the thread function, but job->size can be exaggerated
> > to include parts of the range that are never touched.  Thinking more about it
> > though, it's a small fraction of the total work and shouldn't matter.
> 
> So the problem with aligning down is that you are going to be slowed
> up as you have to go single threaded to initialize whatever remains.
> So worst case scenario is that you have a section aligned block and
> you will process all but 1 section in parallel, and then have to
> process the remaining section one max order block at a time.

Yes, aligning up is better.

> > > This should accomplish the same thing, but much more efficiently.
> >
> > Well, more cleanly.  I'll give it a try.
> 
> I agree I am not sure if it will make a big difference on x86, however
> the more ranges you have to process the faster this approach should be
> as it stays parallel the entire time rather than having to drop out
> and process the last section one max order block at a time.

Right.
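
To make the align-down versus align-up trade-off discussed above concrete, here is a minimal userspace sketch. It is not the kernel code: the helper names (PFN_ALIGN_DOWN, PFN_ALIGN_UP, split_align_down, job_size_align_up) are made up for illustration, and the section size is passed in as a plain power-of-two parameter rather than taken from PAGES_PER_SECTION. It only models how many pfns end up in the multithreaded job and how many are left for the single-threaded loop.

```c
#include <stdbool.h>

/* Power-of-two alignment helpers; userspace stand-ins, not the kernel macros. */
#define PFN_ALIGN_DOWN(x, a) ((x) & ~((a) - 1UL))
#define PFN_ALIGN_UP(x, a)   (((x) + (a) - 1UL) & ~((a) - 1UL))

struct pfn_split {
	unsigned long parallel; /* pfns covered by the multithreaded job */
	unsigned long serial;   /* pfns left for the single-threaded loop */
};

/*
 * Align-down scheme, mirroring the condition in the quoted hunk: only a
 * section-aligned start with at least one full section goes to the threads;
 * the tail past the last section boundary is handled serially.
 */
static struct pfn_split split_align_down(unsigned long spfn,
					  unsigned long epfn,
					  unsigned long section)
{
	struct pfn_split s = { 0, epfn - spfn };
	unsigned long epfn_align = PFN_ALIGN_DOWN(epfn, section);
	bool start_aligned = (spfn & (section - 1UL)) == 0;

	if (start_aligned && epfn_align >= spfn + section) {
		s.parallel = epfn_align - spfn;
		s.serial = epfn - epfn_align;
	}
	return s;
}

/*
 * Align-up scheme: the whole range goes to the threads. The job size is
 * rounded up, so it may cover pfns past epfn that are never touched.
 * Assumes spfn is already section aligned.
 */
static unsigned long job_size_align_up(unsigned long spfn,
					unsigned long epfn,
					unsigned long section)
{
	return PFN_ALIGN_UP(epfn, section) - spfn;
}
```

With a hypothetical section of 32768 pfns and a range ending one pfn short of two sections, the align-down split leaves 32767 pfns to the serial loop, while the align-up job covers both sections with a size that overshoots the real range by one pfn.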

Re: linux-next: manual merge of the rcu tree with the powerpc tree

2020-05-21 Thread Thomas Gleixner

"Paul E. McKenney"  writes:
> On Thu, May 21, 2020 at 02:51:24PM +1000, Stephen Rothwell wrote:
>> Hi all,
>> 
>> On Tue, 19 May 2020 17:23:16 +1000 Stephen Rothwell wrote:
>> >
>> > Today's linux-next merge of the rcu tree got a conflict in:
>> > 
>> >   arch/powerpc/kernel/traps.c
>> > 
>> > between commit:
>> > 
>> >   116ac378bb3f ("powerpc/64s: machine check interrupt update NMI accounting")
>> > 
>> > from the powerpc tree and commit:
>> > 
>> >   187416eeb388 ("hardirq/nmi: Allow nested nmi_enter()")
>> > 
>> > from the rcu tree.
>> > 
>> > I fixed it up (I used the powerpc tree version for now) and can carry the
>> > fix as necessary. This is now fixed as far as linux-next is concerned,
>> > but any non trivial conflicts should be mentioned to your upstream
>> > maintainer when your tree is submitted for merging.  You may also want
>> > to consider cooperating with the maintainer of the conflicting tree to
>> > minimise any particularly complex conflicts.
>> 
>> This is now a conflict between the powerpc commit and commit
>> 
>>   69ea03b56ed2 ("hardirq/nmi: Allow nested nmi_enter()")
>> 
>> from the tip tree.  I assume that the rcu and tip trees are sharing
>> some patches (but not commits) :-(
>
> We are sharing commits, and in fact 187416eeb388 in the rcu tree came
> from the tip tree.  My guess is version skew, and that I probably have
> another rebase coming up.
>
> Why is this happening?  There are sets of conflicting commits in different
> efforts, and we are trying to resolve them.  But we are getting feedback
> on some of those commits, which is probably what is causing the skew.

Correct. We had to rebase that. I don't think we will do it again. The
changes I just sent out are carefully crafted to avoid that.

Thanks,

tglx

Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier

2020-05-21 Thread Mikulas Patocka




On Thu, 21 May 2020, Dan Williams wrote:

> On Thu, May 21, 2020 at 10:03 AM Aneesh Kumar K.V
>  wrote:
> >
> > > Moving on to the patch itself--Aneesh, have you audited other persistent
> > > memory users in the kernel?  For example, drivers/md/dm-writecache.c does
> > > this:
> > >
> > > static void writecache_commit_flushed(struct dm_writecache *wc, bool 
> > > wait_for_ios)
> > > {
> > >   if (WC_MODE_PMEM(wc))
> > >   wmb(); <==
> > >  else
> > >  ssd_commit_flushed(wc, wait_for_ios);
> > > }
> > >
> > > I believe you'll need to make modifications there.
> > >
> >
> > Correct. Thanks for catching that.
> >
> >
> > I don't understand dm much, wondering how this will work with
> > non-synchronous DAX device?
> 
> That's a good point. DM-writecache needs to be cognizant of things
> like virtio-pmem that violate the rule that persistent memory writes
> can be flushed by CPU functions rather than calling back into the
> driver. It seems we need to always make the flush case a dax_operation
> callback to account for this.

dm-writecache normally sits on top of dm-linear, so it would
need to pass the wmb() call through the dm core and the dm-linear target ...
that would slow it down ... I remember that you already did it this way
some time ago and then removed it.

What's the exact problem with POWER? Could the POWER system have two types 
of persistent memory that need two different ways of flushing?

Mikulas

Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier

2020-05-21 Thread Dan Williams

On Thu, May 21, 2020 at 7:39 AM Jeff Moyer  wrote:
>
> Dan Williams  writes:
>
> >> But I agree with your concern that if we have older kernel/applications
> >> that continue to use `dcbf` on future hardware we will end up
> >> having issues w.r.t powerfail consistency. The plan is what you outlined
> >> above as tighter ecosystem control. Considering we don't have a pmem
> >> device generally available, we get both kernel and userspace upgraded
> >> to use these new instructions before such a device is made available.
>
> I thought power already supported NVDIMM-N, no?  So are you saying that
> those devices will continue to work with the existing flushing and
> fencing mechanisms?
>
> > Ok, I think a compile time kernel option with a runtime override
> > satisfies my concern. Does that work for you?
>
> The compile time option only helps when running newer kernels.  I'm not
> sure how you would even begin to audit userspace applications (keep in
> mind, not every application is open source, and not every application
> uses pmdk).  I also question the merits of forcing the administrator to
> make the determination of whether all applications on the system will
> work properly.  Really, you have to rely on the vendor to tell you the
> platform is supported, and at that point, why put further hurdles in the
> way?

I'm thoroughly confused by this. I thought this was exactly the role
of a Linux distribution vendor. ISVs qualify their application on a
hardware-platform + distribution combination and the distribution owns
picking ABI defaults like CONFIG_SYSFS_DEPRECATED regardless of
whether they can guarantee that all apps are updated to the new
semantics.

The administrator is not forced; the administrator is afforded an
override in the extreme case that they find an exception to what was
qualified and need to override the distribution's compile-time choice.

>
> The decision to require different instructions on ppc is unfortunate,
> but one I'm sure we have no control over.  I don't see any merit in the
> kernel disallowing MAP_SYNC access on these platforms.  Ideally, we'd
> have some way of ensuring older kernels don't work with these new
> platforms, but I don't think that's possible.

I see disabling MAP_SYNC as the more targeted form of "ensuring older
kernels don't work".

So I guess we agree that something should break when baseline
assumptions change, we just don't yet agree on where that break should
happen?

Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier

2020-05-21 Thread Dan Williams

On Thu, May 21, 2020 at 10:03 AM Aneesh Kumar K.V
 wrote:
>
> On 5/21/20 8:08 PM, Jeff Moyer wrote:
> > Dan Williams  writes:
> >
> >>> But I agree with your concern that if we have older kernel/applications
> >>> that continue to use `dcbf` on future hardware we will end up
> >>> having issues w.r.t powerfail consistency. The plan is what you outlined
> >>> above as tighter ecosystem control. Considering we don't have a pmem
> >>> device generally available, we get both kernel and userspace upgraded
> >>> to use these new instructions before such a device is made available.
> >
> > I thought power already supported NVDIMM-N, no?  So are you saying that
> > those devices will continue to work with the existing flushing and
> > fencing mechanisms?
> >
>
> yes. these devices can continue to use 'dcbf + hwsync' as long as we are
> running them on P9.
>
>
> >> Ok, I think a compile time kernel option with a runtime override
> >> satisfies my concern. Does that work for you?
> >
> > The compile time option only helps when running newer kernels.  I'm not
> > sure how you would even begin to audit userspace applications (keep in
> > mind, not every application is open source, and not every application
> > uses pmdk).  I also question the merits of forcing the administrator to
> > make the determination of whether all applications on the system will
> > work properly.  Really, you have to rely on the vendor to tell you the
> > platform is supported, and at that point, why put further hurdles in the
> > way?
> >
> > The decision to require different instructions on ppc is unfortunate,
> > but one I'm sure we have no control over.  I don't see any merit in the
> > kernel disallowing MAP_SYNC access on these platforms.  Ideally, we'd
> > have some way of ensuring older kernels don't work with these new
> > platforms, but I don't think that's possible.
> >
>
>
> I am currently looking at the possibility of firmware present these
> devices with different device-tree compat values. So that older
> /existing kernel won't initialize the device on newer systems. Is that a
> good compromise? We still can end up with older userspace and newer
> kernel. One of the option suggested by Jan Kara is to use a prctl flag
> to control that? (instead of kernel parameter option I posted before)
>
>
> > Moving on to the patch itself--Aneesh, have you audited other persistent
> > memory users in the kernel?  For example, drivers/md/dm-writecache.c does
> > this:
> >
> > static void writecache_commit_flushed(struct dm_writecache *wc, bool 
> > wait_for_ios)
> > {
> >   if (WC_MODE_PMEM(wc))
> >   wmb(); <==
> >  else
> >  ssd_commit_flushed(wc, wait_for_ios);
> > }
> >
> > I believe you'll need to make modifications there.
> >
>
> Correct. Thanks for catching that.
>
>
> I don't understand dm much, wondering how this will work with
> non-synchronous DAX device?

That's a good point. DM-writecache needs to be cognizant of things
like virtio-pmem that violate the rule that persistent memory writes
can be flushed by CPU functions rather than calling back into the
driver. It seems we need to always make the flush case a dax_operation
callback to account for this.
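
As a rough illustration of what "make the flush case a callback" could look like, here is a small userspace sketch. Everything in it is hypothetical: struct pmem_dev, struct pmem_flush_ops and the function names are invented for this example and are not the dax_operations interface or any dm-writecache code. The counters only stand in for a CPU barrier such as wmb() and for a call back into the driver.

```c
#include <stddef.h>

struct pmem_dev;

/* Hypothetical per-device flush operations table. */
struct pmem_flush_ops {
	void (*commit)(struct pmem_dev *dev);
};

struct pmem_dev {
	const struct pmem_flush_ops *ops;
	int cpu_barriers;   /* times a CPU-side barrier would have been issued */
	int driver_flushes; /* times the driver would have been called back */
};

/* Synchronous pmem: a CPU barrier is enough (stands in for wmb()). */
static void commit_cpu_barrier(struct pmem_dev *dev)
{
	dev->cpu_barriers++;
}

/* Non-synchronous device (virtio-pmem style): must go through the driver. */
static void commit_driver_flush(struct pmem_dev *dev)
{
	dev->driver_flushes++;
}

static const struct pmem_flush_ops sync_pmem_ops = {
	.commit = commit_cpu_barrier,
};

static const struct pmem_flush_ops async_pmem_ops = {
	.commit = commit_driver_flush,
};

/* The caller no longer hard-codes a barrier; it asks the device how to commit. */
static void commit_flushed(struct pmem_dev *dev)
{
	dev->ops->commit(dev);
}
```

The cost of this indirection on stacked targets is the concern raised elsewhere in the thread; the sketch does not model that overhead.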

Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice

2020-05-21 Thread Ira Weiny

On Thu, May 21, 2020 at 09:05:41AM -0700, Guenter Roeck wrote:
> On 5/19/20 10:13 PM, Ira Weiny wrote:
> > On Tue, May 19, 2020 at 12:42:15PM -0700, Guenter Roeck wrote:
> >> On Tue, May 19, 2020 at 11:40:32AM -0700, Ira Weiny wrote:
> >>> On Tue, May 19, 2020 at 09:54:22AM -0700, Guenter Roeck wrote:
> >>>> On Mon, May 18, 2020 at 11:48:43AM -0700, ira.we...@intel.com wrote:
> >>>>> From: Ira Weiny
> >>>>>
> >>>>> The kunmap_atomic clean up failed to remove one set of pagefault/preempt
> >>>>> enables when vaddr is not in the fixmap.
> >>>>>
> >>>>> Fixes: bee2128a09e6 ("arch/kunmap_atomic: consolidate duplicate code")
> >>>>> Signed-off-by: Ira Weiny
> >>>>
> >>>> microblazeel works with this patch,
> >>>
> >>> Awesome...  Andrew, in my rush yesterday I should have put a Reported-by on the
> >>> patch for Guenter as well.
> >>>
> >>> Sorry about that Guenter,
> >>
> >> No worries.
> >>
> >>> Ira
> >>>
> >>>> as do the nosmp sparc32 boot tests,
> >>>> but sparc32 boot tests with SMP enabled still fail with lots of messages
> >>>> such as:
> >>>>
> >>>> BUG: Bad page state in process swapper/0  pfn:006a1
> >>>> page:f0933420 refcount:0 mapcount:1 mapping:(ptrval) index:0x1
> >>>> flags: 0x0()
> >>>> raw:  0100 0122  0001
> >>>>
> >>>> page dumped because: nonzero mapcount
> >>>> Modules linked in:
> >>>> CPU: 0 PID: 1 Comm: swapper/0 Tainted: GB
> >>>> 5.7.0-rc6-next-20200518-2-gb178d2d56f29 #1
> >>>> [f00e7ab8 :
> >>>> bad_page+0xa8/0x108 ]
> >>>> [f00e8b54 :
> >>>> free_pcppages_bulk+0x154/0x52c ]
> >>>> [f00ea024 :
> >>>> free_unref_page+0x54/0x6c ]
> >>>> [f00ed864 :
> >>>> free_reserved_area+0x58/0xec ]
> >>>> [f0527104 :
> >>>> kernel_init+0x14/0x110 ]
> >>>> [f000b77c :
> >>>> ret_from_kernel_thread+0xc/0x38 ]
> >>>> [ :
> >>>> 0x0 ]
> >>>>
> >>>> Code path leading to that message is different but always the same
> >>>> from free_unref_page().
> > 
> > Actually it occurs to me that the patch consolidating kmap_prot is odd for
> > sparc 32 bit...
> > 
> > It's a long shot but could you try reverting this patch?
> > 
> > 4ea7d2419e3f kmap: consolidate kmap_prot definitions
> > 
> 
> That is not easy to revert, unfortunately, due to several follow-up patches.

I have gotten your sparc tests to run and they all pass...

08:10:34 > ../linux-build-test/rootfs/sparc/run-qemu-sparc.sh 
Build reference: v5.7-rc4-17-g852b6f2edc0f

Building sparc32:SPARCClassic:nosmp:scsi:hd ... running . passed
Building sparc32:SPARCbook:nosmp:scsi:cd ... running . passed
Building sparc32:LX:nosmp:noapc:scsi:hd ... running . passed
Building sparc32:SS-4:nosmp:initrd ... running . passed
Building sparc32:SS-5:nosmp:scsi:hd ... running . passed
Building sparc32:SS-10:nosmp:scsi:cd ... running . passed
Building sparc32:SS-20:nosmp:scsi:hd ... running . passed
Building sparc32:SS-600MP:nosmp:scsi:hd ... running . passed
Building sparc32:Voyager:nosmp:noapc:scsi:hd ... running . passed
Building sparc32:SS-4:smp:scsi:hd ... running . passed
Building sparc32:SS-5:smp:scsi:cd ... running . passed
Building sparc32:SS-10:smp:scsi:hd ... running . passed
Building sparc32:SS-20:smp:scsi:hd ... running . passed
Building sparc32:SS-600MP:smp:scsi:hd ... running . passed
Building sparc32:Voyager:smp:noapc:scsi:hd ... running . passed

Is there another test I need to run?

Ira


> 
> Guenter
> 
> > Alternately I will need to figure out how to run the sparc on qemu here...
> > 
> > Thanks very much for all the testing though!  :-D
> > 
> > Ira
> > 
> 
> >>>> Still testing ppc images.
> >>>>
> >>
> >> ppc image tests are passing with this patch.
> >>
> >> Guenter
>

Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice

2020-05-21 Thread Al Viro

On Tue, May 19, 2020 at 09:54:22AM -0700, Guenter Roeck wrote:
> On Mon, May 18, 2020 at 11:48:43AM -0700, ira.we...@intel.com wrote:
> > From: Ira Weiny 
> > 
> > The kunmap_atomic clean up failed to remove one set of pagefault/preempt
> > enables when vaddr is not in the fixmap.
> > 
> > Fixes: bee2128a09e6 ("arch/kunmap_atomic: consolidate duplicate code")
> > Signed-off-by: Ira Weiny 
> 
> microblazeel works with this patch, as do the nosmp sparc32 boot tests,
> but sparc32 boot tests with SMP enabled still fail with lots of messages
> such as:

BTW, what's your setup for sparc32 boot tests?  IOW, how do you manage to
shrink the damn thing enough to have the loader cope with it?  I hadn't
been able to do that for the current mainline ;-/

Re: [PATCH v3 6/7] powerpc/dt_cpu_ftrs: Add MMA feature

2020-05-21 Thread Paul A. Clarke

On Thu, May 21, 2020 at 11:43:40AM +1000, Alistair Popple wrote:
> Matrix multiple assist (MMA) is a new feature added to ISAv3.1 and

s/Matrix multiple assist/Matrix-Multiply Assist/

> POWER10. Support on powernv can be selected via a firmware CPU device
> tree feature which enables it via a PCR bit.
> 
> Signed-off-by: Alistair Popple 
> ---
>  arch/powerpc/include/asm/reg.h|  3 ++-
>  arch/powerpc/kernel/dt_cpu_ftrs.c | 17 -
>  2 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index dd20af367b57..88e6c78100d9 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -481,7 +481,8 @@
>  #define   PCR_VEC_DIS(__MASK(63-0))  /* Vec. disable (bit NA since 
> POWER8) */
>  #define   PCR_VSX_DIS(__MASK(63-1))  /* VSX disable (bit NA since 
> POWER8) */
>  #define   PCR_TM_DIS (__MASK(63-2))  /* Trans. memory disable (POWER8) */
> -#define   PCR_HIGH_BITS  (PCR_VEC_DIS | PCR_VSX_DIS | PCR_TM_DIS)
> +#define   PCR_MMA_DIS(__MASK(63-3)) /* Matrix-Multiply Accelerator */

s/Accelerator/Assist/

PC

> +#define   PCR_HIGH_BITS  (PCR_MMA_DIS | PCR_VEC_DIS | PCR_VSX_DIS | 
> PCR_TM_DIS)
>  /*
>   * These bits are used in the function kvmppc_set_arch_compat() to specify 
> and
>   * determine both the compatibility level which we want to emulate and the
> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
> b/arch/powerpc/kernel/dt_cpu_ftrs.c
> index 93c340906aad..0a41fce34165 100644
> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
> @@ -75,6 +75,7 @@ static struct {
>   u64 lpcr_clear;
>   u64 hfscr;
>   u64 fscr;
> + u64 pcr;
>  } system_registers;
> 
>  static void (*init_pmu_registers)(void);
> @@ -102,7 +103,7 @@ static void __restore_cpu_cpufeatures(void)
>   if (hv_mode) {
>   mtspr(SPRN_LPID, 0);
>   mtspr(SPRN_HFSCR, system_registers.hfscr);
> - mtspr(SPRN_PCR, PCR_MASK);
> + mtspr(SPRN_PCR, system_registers.pcr);
>   }
>   mtspr(SPRN_FSCR, system_registers.fscr);
> 
> @@ -555,6 +556,18 @@ static int __init feat_enable_large_ci(struct 
> dt_cpu_feature *f)
>   return 1;
>  }
> 
> +static int __init feat_enable_mma(struct dt_cpu_feature *f)
> +{
> + u64 pcr;
> +
> + feat_enable(f);
> + pcr = mfspr(SPRN_PCR);
> + pcr &= ~PCR_MMA_DIS;
> + mtspr(SPRN_PCR, pcr);
> +
> + return 1;
> +}
> +
>  struct dt_cpu_feature_match {
>   const char *name;
>   int (*enable)(struct dt_cpu_feature *f);
> @@ -629,6 +642,7 @@ static struct dt_cpu_feature_match __initdata
>   {"vector-binary16", feat_enable, 0},
>   {"wait-v3", feat_enable, 0},
>   {"prefix-instructions", feat_enable, 0},
> + {"matrix-multiply-assist", feat_enable_mma, 0},
>  };
> 
>  static bool __initdata using_dt_cpu_ftrs;
> @@ -779,6 +793,7 @@ static void __init cpufeatures_setup_finished(void)
>   system_registers.lpcr = mfspr(SPRN_LPCR);
>   system_registers.hfscr = mfspr(SPRN_HFSCR);
>   system_registers.fscr = mfspr(SPRN_FSCR);
> + system_registers.pcr = mfspr(SPRN_PCR);
> 
>   pr_info("final cpu/mmu features = 0x%016lx 0x%08x\n",
>   cur_cpu_spec->cpu_features, cur_cpu_spec->mmu_features);
> -- 
> 2.20.1
>

Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier

2020-05-21 Thread Aneesh Kumar K.V


On 5/21/20 8:08 PM, Jeff Moyer wrote:
> Dan Williams  writes:
>
>>> But I agree with your concern that if we have older kernel/applications
>>> that continue to use `dcbf` on future hardware we will end up
>>> having issues w.r.t powerfail consistency. The plan is what you outlined
>>> above as tighter ecosystem control. Considering we don't have a pmem
>>> device generally available, we get both kernel and userspace upgraded
>>> to use these new instructions before such a device is made available.
>
> I thought power already supported NVDIMM-N, no?  So are you saying that
> those devices will continue to work with the existing flushing and
> fencing mechanisms?
>

Yes, these devices can continue to use 'dcbf + hwsync' as long as we are
running them on P9.


>> Ok, I think a compile time kernel option with a runtime override
>> satisfies my concern. Does that work for you?
>
> The compile time option only helps when running newer kernels.  I'm not
> sure how you would even begin to audit userspace applications (keep in
> mind, not every application is open source, and not every application
> uses pmdk).  I also question the merits of forcing the administrator to
> make the determination of whether all applications on the system will
> work properly.  Really, you have to rely on the vendor to tell you the
> platform is supported, and at that point, why put further hurdles in the
> way?
>
> The decision to require different instructions on ppc is unfortunate,
> but one I'm sure we have no control over.  I don't see any merit in the
> kernel disallowing MAP_SYNC access on these platforms.  Ideally, we'd
> have some way of ensuring older kernels don't work with these new
> platforms, but I don't think that's possible.
>


I am currently looking at the possibility of firmware presenting these
devices with different device-tree compat values, so that older/existing
kernels won't initialize the device on newer systems. Is that a
good compromise? We can still end up with older userspace and a newer
kernel. One option, suggested by Jan Kara, is to use a prctl flag
to control that (instead of the kernel parameter option I posted before).


> Moving on to the patch itself--Aneesh, have you audited other persistent
> memory users in the kernel?  For example, drivers/md/dm-writecache.c does
> this:
>
> static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
> {
> 	if (WC_MODE_PMEM(wc))
> 		wmb(); <==
> 	else
> 		ssd_commit_flushed(wc, wait_for_ios);
> }
>
> I believe you'll need to make modifications there.
>

Correct. Thanks for catching that.


I don't understand dm much; I am wondering how this will work with a
non-synchronous DAX device?


-aneesh

Re: [RESEND PATCH v7 3/5] powerpc/papr_scm: Fetch nvdimm health information from PHYP

2020-05-21 Thread Vaibhav Jain

Michael Ellerman  writes:

> Vaibhav Jain  writes:
>> Thanks for reviewing this patch Ira. My responses below:
>> Ira Weiny  writes:
>>> On Wed, May 20, 2020 at 12:30:56AM +0530, Vaibhav Jain wrote:
>>>> Implement support for fetching nvdimm health information via
>>>> H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair
>>>> of 64-bit big-endian integers, bitwise-and of which is then stored in
>>>> 'struct papr_scm_priv' and subsequently partially exposed to
>>>> user-space via newly introduced dimm specific attribute
>>>> 'papr/flags'. Since the hcall is costly, the health information is
>>>> cached and only re-queried, 60s after the previous successful hcall.
> ...
>>>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
>>>> index f35592423380..142636e1a59f 100644
>>>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>>>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>>>> @@ -39,6 +78,15 @@ struct papr_scm_priv {
>>>>  	struct resource res;
>>>>  	struct nd_region *region;
>>>>  	struct nd_interleave_set nd_set;
>>>> +
>>>> +	/* Protect dimm health data from concurrent read/writes */
>>>> +	struct mutex health_mutex;
>>>> +
>>>> +	/* Last time the health information of the dimm was updated */
>>>> +	unsigned long lasthealth_jiffies;
>>>> +
>>>> +	/* Health information for the dimm */
>>>> +	u64 health_bitmap;
>>>
>>> I wonder if this should be typed big endian as you mention that it is in the
>>> commit message?
>> This was discussed in an earlier review of the patch series at
>> https://lore.kernel.org/linux-nvdimm/878sjetcis@mpe.ellerman.id.au
>>
>> Even though health bitmap is returned in big endian format (For ex
>> value 0xC00 indicates bits 0,1 set), its value is never
>> used. Instead only test for specific bits being set in the register is
>> done.
>
> This has already caused a lot of confusion, so let me try and clear it
> up. I will probably fail :)
>
> The value is not big endian.
>
> It's returned in a GPR (a register), from the hypervisor. The ordering
> of bytes in a register is not dependent on what endian we're executing
> in.
>
> It's true that the hypervisor will have been running big endian, and
> when it returns to us we will now be running little endian. But the
> value is unchanged, it was 0xC00 in the GPR while the HV was
> running and it's still 0xC00 when we return to Linux. You
> can see this in mambo, see below for an example.
>
>
> _However_, the specification of the bits in the bitmap value uses MSB 0
> ordering, as is traditional for IBM documentation. That means the most
> significant bit, aka. the left most bit, is called "bit 0".
>
> See: https://en.wikipedia.org/wiki/Bit_numbering#MSB_0_bit_numbering
>
> That is the opposite numbering from what most people use, and in
> particular what most code in Linux uses, which is that bit 0 is the
> least significant bit.
>
> Which is where the confusion comes in. It's not that the bytes are
> returned in a different order, it's that the bits are numbered
> differently in the IBM documentation.
>
> The way to fix this kind of thing is to read the docs, and convert all
> the bits into correct numbering (LSB=0), and then throw away the docs ;)
>
> cheers
Thanks a lot for clarifying this Mpe and for this detailed explanation.

I have removed the term Big-Endian from v8 patch description to avoid
any further confusion.
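
The renumbering described above (IBM "bit 0" is the most significant bit) can be sketched in a few lines of C. This is only an illustration: the helper names msb0_bit_mask() and msb0_bit_is_set() are invented here and are not taken from the patch, and a 64-bit register width is assumed.

```c
#include <stdint.h>

/*
 * Convert an IBM-style (MSB 0) bit number of a 64-bit register into the
 * mask that LSB 0 code would use. IBM bit 0 is the leftmost bit, so it maps
 * to a shift of 63; IBM bit 63 maps to a shift of 0.
 */
static inline uint64_t msb0_bit_mask(unsigned int ibm_bit)
{
	return UINT64_C(1) << (63 - ibm_bit);
}

/* Test a documented (MSB 0) bit in a register value. */
static inline int msb0_bit_is_set(uint64_t reg, unsigned int ibm_bit)
{
	return (reg & msb0_bit_mask(ibm_bit)) != 0;
}
```

For an illustrative register value with only its two most significant bits set, msb0_bit_is_set() reports IBM bits 0 and 1 as set and bit 2 as clear. The byte order the CPU is running in plays no part in this, which matches the point that the value in the GPR is unchanged across the endian switch.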

>
>
>
> In mambo we can set a breakpoint just before the kernel enters skiboot,
> towards the end of __opal_call. The kernel is running LE and skiboot
> runs BE.
>
>   systemsim-p9 [~/skiboot/skiboot/external/mambo] b 0xc00c1744
>   breakpoint set at [0:0:0]: 0xc00c1744 (0x000C1744) 
> Enc:0x2402004C : hrfid
>
> Then run:
>
>   systemsim-p9 [~/skiboot/skiboot/external/mambo] c
>   [0:0:0]: 0xC00C1744 (0x000C1744) Enc:0x2402004C : hrfid
>   INFO: 121671618: (121671618): ** Execution stopped: user (tcl),  **
>   121671618: ** finished running 121671618 instructions **
>
> And we stop there, on an hrfid that we haven't executed yet.
> We can print r0, to see the OPAL token:
>
>   systemsim-p9 [~/skiboot/skiboot/external/mambo] p r0
>   0x0019
>
> ie. we're calling OPAL_CONSOLE_WRITE_BUFFER_SPACE (25).
>
> And we can print the MSR:
>
>   systemsim-p9 [~/skiboot/skiboot/external/mambo] p msr
>   0x92001033
>   
>  64-bit mode (SF): 0x1 [64-bit mode]
> Hypervisor State (HV): 0x1
>Vector Available (VEC): 0x1
>   Machine Check Interrupt Enable (ME): 0x1
> Instruction Relocate (IR): 0x1
>Data Relocate (DR): 0x1
>Recoverable Interrupt (RI): 0x1
>   Little-Endian Mode (LE): 0x1 [little-endian]
>
> ie. we're little endian.
>
> We then step one instruction:
>
>   systemsim-p9 [~/skiboot/skiboot/external/mambo] s
>   [0:0:0]: 0x30002BF0

[PATCH v5 12/13] powerpc/40x: Avoid using r12 in TLB miss handlers

2020-05-21 Thread Christophe Leroy

From: Christophe Leroy 

Let's reduce the number of registers used in TLB miss handlers.

We have both r9 and r12 available for any temporary use.

r9 is enough, avoid using r12.

Signed-off-by: Christophe Leroy 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_40x.S | 70 --
 1 file changed, 33 insertions(+), 37 deletions(-)

diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
index 75238897093d..b584e81f6d19 100644
--- a/arch/powerpc/kernel/head_40x.S
+++ b/arch/powerpc/kernel/head_40x.S
@@ -255,9 +255,9 @@ _ENTRY(saved_ksp_limit)
mtspr   SPRN_SPRG_SCRATCH3, r12
mtspr   SPRN_SPRG_SCRATCH4, r9
mfcrr11
-   mfspr   r12, SPRN_PID
+   mfspr   r9, SPRN_PID
mtspr   SPRN_SPRG_SCRATCH6, r11
-   mtspr   SPRN_SPRG_SCRATCH5, r12
+   mtspr   SPRN_SPRG_SCRATCH5, r9
mfspr   r10, SPRN_DEAR  /* Get faulting address */
 
/* If we are faulting a kernel address, we have to use the
@@ -280,12 +280,12 @@ _ENTRY(saved_ksp_limit)
 4:
tophys(r11, r11)
rlwimi  r11, r10, 12, 20, 29/* Create L1 (pgdir/pmd) address */
-   lwz r12, 0(r11) /* Get L1 entry */
-   andi.   r9, r12, _PMD_PRESENT   /* Check if it points to a PTE page */
+   lwz r11, 0(r11) /* Get L1 entry */
+   andi.   r9, r11, _PMD_PRESENT   /* Check if it points to a PTE page */
beq 2f  /* Bail if no table */
 
-   rlwimi  r12, r10, 22, 20, 29/* Compute PTE address */
-   lwz r11, 0(r12) /* Get Linux PTE */
+   rlwimi  r11, r10, 22, 20, 29/* Compute PTE address */
+   lwz r11, 0(r11) /* Get Linux PTE */
 #ifdef CONFIG_SWAP
li  r9, _PAGE_PRESENT | _PAGE_ACCESSED
 #else
@@ -301,13 +301,13 @@ _ENTRY(saved_ksp_limit)
/* Create TLB tag.  This is the faulting address plus a static
 * set of bits.  These are size, valid, E, U0.
*/
-   li  r12, 0x00c0
-   rlwimi  r10, r12, 0, 20, 31
+   li  r9, 0x00c0
+   rlwimi  r10, r9, 0, 20, 31
 
b   finish_tlb_load
 
 2: /* Check for possible large-page pmd entry */
-   rlwinm. r9, r12, 2, 22, 24
+   rlwinm. r9, r11, 2, 22, 24
beq 5f
 
/* Create TLB tag.  This is the faulting address, plus a static
@@ -315,7 +315,6 @@ _ENTRY(saved_ksp_limit)
 */
ori r9, r9, 0x40
rlwimi  r10, r9, 0, 20, 31
-   mr  r11, r12
 
b   finish_tlb_load
 
@@ -323,9 +322,9 @@ _ENTRY(saved_ksp_limit)
/* The bailout.  Restore registers to pre-exception conditions
 * and call the heavyweights to help us out.
 */
-   mfspr   r12, SPRN_SPRG_SCRATCH5
+   mfspr   r9, SPRN_SPRG_SCRATCH5
mfspr   r11, SPRN_SPRG_SCRATCH6
-   mtspr   SPRN_PID, r12
+   mtspr   SPRN_PID, r9
mtcr    r11
mfspr   r9, SPRN_SPRG_SCRATCH4
mfspr   r12, SPRN_SPRG_SCRATCH3
@@ -343,9 +342,9 @@ _ENTRY(saved_ksp_limit)
mtspr   SPRN_SPRG_SCRATCH3, r12
mtspr   SPRN_SPRG_SCRATCH4, r9
mfcr    r11
-   mfspr   r12, SPRN_PID
+   mfspr   r9, SPRN_PID
mtspr   SPRN_SPRG_SCRATCH6, r11
-   mtspr   SPRN_SPRG_SCRATCH5, r12
+   mtspr   SPRN_SPRG_SCRATCH5, r9
mfspr   r10, SPRN_SRR0  /* Get faulting address */
 
/* If we are faulting a kernel address, we have to use the
@@ -368,12 +367,12 @@ _ENTRY(saved_ksp_limit)
 4:
tophys(r11, r11)
rlwimi  r11, r10, 12, 20, 29    /* Create L1 (pgdir/pmd) address */
-   lwz r12, 0(r11) /* Get L1 entry */
-   andi.   r9, r12, _PMD_PRESENT   /* Check if it points to a PTE page */
+   lwz r11, 0(r11) /* Get L1 entry */
+   andi.   r9, r11, _PMD_PRESENT   /* Check if it points to a PTE page */
beq 2f  /* Bail if no table */
 
-   rlwimi  r12, r10, 22, 20, 29    /* Compute PTE address */
-   lwz r11, 0(r12) /* Get Linux PTE */
+   rlwimi  r11, r10, 22, 20, 29    /* Compute PTE address */
+   lwz r11, 0(r11) /* Get Linux PTE */
 #ifdef CONFIG_SWAP
li  r9, _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
 #else
@@ -389,13 +388,13 @@ _ENTRY(saved_ksp_limit)
/* Create TLB tag.  This is the faulting address plus a static
 * set of bits.  These are size, valid, E, U0.
*/
-   li  r12, 0x00c0
-   rlwimi  r10, r12, 0, 20, 31
+   li  r9, 0x00c0
+   rlwimi  r10, r9, 0, 20, 31
 
b   finish_tlb_load
 
 2: /* Check for possible large-page pmd entry */
-   rlwinm. r9, r12, 2, 22, 24
+   rlwinm. r9, r11, 2, 22, 24
beq 5f
 
/* Create TLB tag.  This is the faulting address, plus a static
@@ -403,7 +402,6 @@ _ENTRY(saved_ksp_limit)
 */
ori r9, r9, 0x40
rlwimi

[PATCH v5 11/13] powerpc: Remove IBM405 Erratum #77

2020-05-21 Thread Christophe Leroy

From: Christophe Leroy 

This erratum is dedicated to IBM 405GP and STB03xxx
which are now gone.

Remove this erratum.
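
As a rough illustration of what goes away, the sketch below shows how a PPC405_ERR77-style macro either injects a dcbt ahead of stwcx. in an inline-asm template or expands to nothing. The SKETCH_* macros and the two template strings are hypothetical stand-ins written for this example; the stringify helper is only assumed to resemble the kernel's stringify_in_c().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed shape of a C-side stringify helper: stringify, then add a space. */
#define SKETCH_STRINGIFY_1(...) #__VA_ARGS__
#define SKETCH_STRINGIFY(...)   SKETCH_STRINGIFY_1(__VA_ARGS__) " "

/* Workaround enabled: emit a dcbt on the reservation address. */
#define SKETCH_ERR77_ON(ra, rb)  SKETCH_STRINGIFY(dcbt ra, rb;)
/* Workaround disabled: the macro vanishes from the template. */
#define SKETCH_ERR77_OFF(ra, rb)

/* atomic_inc-style template with the workaround compiled in. */
const char sketch_inc_with_err77[] =
	"1: lwarx   %0,0,%2\n"
	"   addic   %0,%0,1\n"
	SKETCH_ERR77_ON(0, %2)
	"   stwcx.  %0,0,%2\n"
	"   bne-    1b";

/* Same template with the workaround compiled out. */
const char sketch_inc_without_err77[] =
	"1: lwarx   %0,0,%2\n"
	"   addic   %0,%0,1\n"
	SKETCH_ERR77_OFF(0, %2)
	"   stwcx.  %0,0,%2\n"
	"   bne-    1b";

/* Return 1 if the template places a dcbt before the stwcx., else 0. */
int sketch_dcbt_precedes_stwcx(const char *tmpl)
{
	assert(tmpl != NULL);
	const char *dcbt = strstr(tmpl, "dcbt");
	const char *stwcx = strstr(tmpl, "stwcx.");
	return dcbt != NULL && stwcx != NULL && dcbt < stwcx;
}
```

With the config option gone, every call site behaves like the second template, so deleting the macro invocations does not change the generated code for the remaining platforms.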

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/asm-405.h   | 19 ---
 arch/powerpc/include/asm/atomic.h| 11 ---
 arch/powerpc/include/asm/bitops.h|  4 
 arch/powerpc/include/asm/cmpxchg.h   | 11 ---
 arch/powerpc/include/asm/futex.h |  3 ---
 arch/powerpc/include/asm/nohash/32/pgtable.h |  1 -
 arch/powerpc/include/asm/spinlock.h  |  4 
 arch/powerpc/kernel/entry_32.S   | 11 ---
 arch/powerpc/kernel/head_40x.S   |  3 ---
 arch/powerpc/platforms/40x/Kconfig   |  6 --
 10 files changed, 73 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/asm-405.h

diff --git a/arch/powerpc/include/asm/asm-405.h b/arch/powerpc/include/asm/asm-405.h
deleted file mode 100644
index 7270d3ae7c8e..
--- a/arch/powerpc/include/asm/asm-405.h
+++ /dev/null
@@ -1,19 +0,0 @@
-#ifndef _ASM_POWERPC_ASM_405_H
-#define _ASM_POWERPC_ASM_405_H
-
-#include 
-
-#ifdef __KERNEL__
-#ifdef CONFIG_IBM405_ERR77
-/* Erratum #77 on the 405 means we need a sync or dcbt before every
- * stwcx.  The old ATOMIC_SYNC_FIX covered some but not all of this.
- */
-#define PPC405_ERR77(ra,rb)    stringify_in_c(dcbt ra, rb;)
-#define PPC405_ERR77_SYNC      stringify_in_c(sync;)
-#else
-#define PPC405_ERR77(ra,rb)
-#define PPC405_ERR77_SYNC
-#endif
-#endif
-
-#endif /* _ASM_POWERPC_ASM_405_H */
diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h
index 31c231ea56b7..498785ffc25f 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -10,7 +10,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #define ATOMIC_INIT(i) { (i) }
 
@@ -47,7 +46,6 @@ static __inline__ void atomic_##op(int a, atomic_t *v)
\
__asm__ __volatile__(   \
 "1: lwarx   %0,0,%3 # atomic_" #op "\n" \
#asm_op " %0,%2,%0\n"   \
-   PPC405_ERR77(0,%3)  \
 "  stwcx.  %0,0,%3 \n" \
 "  bne-    1b\n"   \
: "=&r" (t), "+m" (v->counter)  \
@@ -63,7 +61,6 @@ static inline int atomic_##op##_return_relaxed(int a, atomic_t *v)\
__asm__ __volatile__(   \
 "1: lwarx   %0,0,%3 # atomic_" #op "_return_relaxed\n"  \
#asm_op " %0,%2,%0\n"   \
-   PPC405_ERR77(0, %3) \
 "  stwcx.  %0,0,%3\n"  \
 "  bne-    1b\n"   \
: "=&r" (t), "+m" (v->counter)  \
@@ -81,7 +78,6 @@ static inline int atomic_fetch_##op##_relaxed(int a, atomic_t *v) \
__asm__ __volatile__(   \
 "1: lwarx   %0,0,%4 # atomic_fetch_" #op "_relaxed\n"   \
#asm_op " %1,%3,%0\n"   \
-   PPC405_ERR77(0, %4) \
 "  stwcx.  %1,0,%4\n"  \
 "  bne-    1b\n"   \
: "=&r" (res), "=&r" (t), "+m" (v->counter) \
@@ -130,7 +126,6 @@ static __inline__ void atomic_inc(atomic_t *v)
__asm__ __volatile__(
 "1: lwarx   %0,0,%2 # atomic_inc\n\
addic   %0,%0,1\n"
-   PPC405_ERR77(0,%2)
 "  stwcx.  %0,0,%2 \n\
bne-    1b"
: "=&r" (t), "+m" (v->counter)
@@ -146,7 +141,6 @@ static __inline__ int atomic_inc_return_relaxed(atomic_t *v)
__asm__ __volatile__(
 "1: lwarx   %0,0,%2 # atomic_inc_return_relaxed\n"
 "  addic   %0,%0,1\n"
-   PPC405_ERR77(0, %2)
 "  stwcx.  %0,0,%2\n"
 "  bne-    1b"
: "=&r" (t), "+m" (v->counter)
@@ -163,7 +157,6 @@ static __inline__ void atomic_dec(atomic_t *v)
__asm__ __volatile__(
 "1: lwarx   %0,0,%2 # atomic_dec\n\
addic   %0,%0,-1\n"
-   PPC405_ERR77(0,%2)\
 "  stwcx.  %0,0,%2\n\
bne-    1b"
: "=&r" (t), "+m" (v->counter)
@@ -179,7 +172,6 @@ static __inline__ int atomic_dec_return_relaxed(atomic_t *v)
__asm__ __volatile__(
 "1: lwarx   %0,0,%2 # atomic_dec_return_relaxed\n"
 "  addic   %0,%0,-1\n"
-   PPC405_ERR77(0, %2)
 "  stwcx.  %0,0,%2\n"
 "  bne-    1b"
: "=&r" (t), "+m" (v->counter)
@@ -220,7 +212,6 @@ static __inline__ int ato

[PATCH v5 13/13] powerpc/40x: Don't save CR in SPRN_SPRG_SCRATCH6

2020-05-21 Thread Christophe Leroy

From: Christophe Leroy 

We have r12 available, use it to keep CR around and don't
save it in SPRN_SPRG_SCRATCH6.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_40x.S | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
index b584e81f6d19..a22a8209971b 100644
--- a/arch/powerpc/kernel/head_40x.S
+++ b/arch/powerpc/kernel/head_40x.S
@@ -254,9 +254,8 @@ _ENTRY(saved_ksp_limit)
mtspr   SPRN_SPRG_SCRATCH1, r11
mtspr   SPRN_SPRG_SCRATCH3, r12
mtspr   SPRN_SPRG_SCRATCH4, r9
-   mfcr    r11
+   mfcr    r12
mfspr   r9, SPRN_PID
-   mtspr   SPRN_SPRG_SCRATCH6, r11
mtspr   SPRN_SPRG_SCRATCH5, r9
mfspr   r10, SPRN_DEAR  /* Get faulting address */
 
@@ -323,9 +322,8 @@ _ENTRY(saved_ksp_limit)
 * and call the heavyweights to help us out.
 */
mfspr   r9, SPRN_SPRG_SCRATCH5
-   mfspr   r11, SPRN_SPRG_SCRATCH6
mtspr   SPRN_PID, r9
-   mtcr    r11
+   mtcr    r12
mfspr   r9, SPRN_SPRG_SCRATCH4
mfspr   r12, SPRN_SPRG_SCRATCH3
mfspr   r11, SPRN_SPRG_SCRATCH1
@@ -341,9 +339,8 @@ _ENTRY(saved_ksp_limit)
mtspr   SPRN_SPRG_SCRATCH1, r11
mtspr   SPRN_SPRG_SCRATCH3, r12
mtspr   SPRN_SPRG_SCRATCH4, r9
-   mfcr    r11
+   mfcr    r12
mfspr   r9, SPRN_PID
-   mtspr   SPRN_SPRG_SCRATCH6, r11
mtspr   SPRN_SPRG_SCRATCH5, r9
mfspr   r10, SPRN_SRR0  /* Get faulting address */
 
@@ -410,9 +407,8 @@ _ENTRY(saved_ksp_limit)
 * and call the heavyweights to help us out.
 */
mfspr   r9, SPRN_SPRG_SCRATCH5
-   mfspr   r11, SPRN_SPRG_SCRATCH6
mtspr   SPRN_PID, r9
-   mtcr    r11
+   mtcr    r12
mfspr   r9, SPRN_SPRG_SCRATCH4
mfspr   r12, SPRN_SPRG_SCRATCH3
mfspr   r11, SPRN_SPRG_SCRATCH1
@@ -556,9 +552,8 @@ finish_tlb_load:
/* Done...restore registers and get out of here.
*/
mfspr   r9, SPRN_SPRG_SCRATCH5
-   mfspr   r11, SPRN_SPRG_SCRATCH6
mtspr   SPRN_PID, r9
-   mtcr    r11
+   mtcr    r12
mfspr   r9, SPRN_SPRG_SCRATCH4
mfspr   r12, SPRN_SPRG_SCRATCH3
mfspr   r11, SPRN_SPRG_SCRATCH1
-- 
2.25.0

[PATCH v5 07/13] powerpc/40x: Remove EP405

2020-05-21 Thread Christophe Leroy

From: Christophe Leroy 

EP405 is an old type of board based on a 405GP which is obsolete.

Remove it.

Signed-off-by: Christophe Leroy 
---
v4: A few things from the previous patch are now here as they are not related to walnut
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/boot/Makefile   |   3 +-
 arch/powerpc/boot/dts/ep405.dts  | 230 ---
 arch/powerpc/boot/ep405.c|  71 ---
 arch/powerpc/configs/40x/ep405_defconfig |  62 --
 arch/powerpc/configs/ppc40x_defconfig|   1 -
 arch/powerpc/platforms/40x/Kconfig   |   8 -
 arch/powerpc/platforms/40x/Makefile  |   1 -
 arch/powerpc/platforms/40x/ep405.c   | 123 
 8 files changed, 1 insertion(+), 498 deletions(-)
 delete mode 100644 arch/powerpc/boot/dts/ep405.dts
 delete mode 100644 arch/powerpc/boot/ep405.c
 delete mode 100644 arch/powerpc/configs/40x/ep405_defconfig
 delete mode 100644 arch/powerpc/platforms/40x/ep405.c

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 749c27fcf2d9..63d7456b9518 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -130,7 +130,7 @@ src-wlib-$(CONFIG_EMBEDDED6xx) += ugecon.c fsl-soc.c
 src-wlib-$(CONFIG_CPM) += cpm-serial.c
 
 src-plat-y := of.c epapr.c
-src-plat-$(CONFIG_40x) += fixed-head.S ep405.c cuboot-hotfoot.c \
+src-plat-$(CONFIG_40x) += fixed-head.S cuboot-hotfoot.c \
cuboot-acadia.c \
cuboot-kilauea.c simpleboot.c
 src-plat-$(CONFIG_44x) += treeboot-ebony.c cuboot-ebony.c treeboot-bamboo.c \
@@ -275,7 +275,6 @@ image-$(CONFIG_EPAPR_BOOT)  += zImage.epapr
 #
 
 # Board ports in arch/powerpc/platform/40x/Kconfig
-image-$(CONFIG_EP405)  += dtbImage.ep405
 image-$(CONFIG_HOTFOOT) += cuImage.hotfoot
 image-$(CONFIG_ACADIA) += cuImage.acadia
 image-$(CONFIG_OBS600) += uImage.obs600
diff --git a/arch/powerpc/boot/dts/ep405.dts b/arch/powerpc/boot/dts/ep405.dts
deleted file mode 100644
index 4ac9c5ab6e6b..
--- a/arch/powerpc/boot/dts/ep405.dts
+++ /dev/null
@@ -1,230 +0,0 @@
-/*
- * Device Tree Source for EP405
- *
- * Copyright 2007 IBM Corp.
- * Benjamin Herrenschmidt 
- *
- * This file is licensed under the terms of the GNU General Public
- * License version 2.  This program is licensed "as is" without
- * any warranty of any kind, whether express or implied.
- */
-
-/dts-v1/;
-
-/ {
-   #address-cells = <1>;
-   #size-cells = <1>;
-   model = "ep405";
-   compatible = "ep405";
-   dcr-parent = <&{/cpus/cpu@0}>;
-
-   aliases {
-   ethernet0 = &EMAC;
-   serial0 = &UART0;
-   serial1 = &UART1;
-   };
-
-   cpus {
-   #address-cells = <1>;
-   #size-cells = <0>;
-
-   cpu@0 {
-   device_type = "cpu";
-   model = "PowerPC,405GP";
-   reg = <0x>;
-   clock-frequency = <2>; /* Filled in by zImage */
-   timebase-frequency = <0>; /* Filled in by zImage */
-   i-cache-line-size = <32>;
-   d-cache-line-size = <32>;
-   i-cache-size = <16384>;
-   d-cache-size = <16384>;
-   dcr-controller;
-   dcr-access-method = "native";
-   };
-   };
-
-   memory {
-   device_type = "memory";
-   reg = <0x 0x>; /* Filled in by zImage */
-   };
-
-   UIC0: interrupt-controller {
-   compatible = "ibm,uic";
-   interrupt-controller;
-   cell-index = <0>;
-   dcr-reg = <0x0c0 0x009>;
-   #address-cells = <0>;
-   #size-cells = <0>;
-   #interrupt-cells = <2>;
-   };
-
-   plb {
-   compatible = "ibm,plb3";
-   #address-cells = <1>;
-   #size-cells = <1>;
-   ranges;
-   clock-frequency = <0>; /* Filled in by zImage */
-
-   SDRAM0: memory-controller {
-   compatible = "ibm,sdram-405gp";
-   dcr-reg = <0x010 0x002>;
-   };
-
-   MAL: mcmal {
-   compatible = "ibm,mcmal-405gp", "ibm,mcmal";
-   dcr-reg = <0x180 0x062>;
-   num-tx-chans = <1>;
-   num-rx-chans = <1>;
-   interrupt-parent = <&UIC0>;
-   interrupts = <
-   0xb 0x4 /* TXEOB */
-   0xc 0x4 /* RXEOB */
-   0xa 0x4 /* SERR */
-   0xd 0x4 /* TXDE */
-   0xe 0x4 /* RXDE */>;
-   };
-
-   POB0: o

[PATCH v5 06/13] powerpc/40x: Remove WALNUT

2020-05-21 Thread Christophe Leroy

From: Christophe Leroy 

CONFIG_WALNUT is not selected by any config and is based
on 405GP which is obsolete.

Remove it.

Signed-off-by: Christophe Leroy 
---
v4: Moved a few things related to EP405 to next patch
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/boot/Makefile  |   4 +-
 arch/powerpc/boot/dts/walnut.dts| 246 
 arch/powerpc/boot/treeboot-walnut.c |  81 ---
 arch/powerpc/configs/40x/acadia_defconfig   |   1 -
 arch/powerpc/configs/40x/kilauea_defconfig  |   1 -
 arch/powerpc/configs/40x/klondike_defconfig |   1 -
 arch/powerpc/configs/40x/makalu_defconfig   |   1 -
 arch/powerpc/configs/40x/obs600_defconfig   |   1 -
 arch/powerpc/platforms/40x/Kconfig  |  10 -
 arch/powerpc/platforms/40x/Makefile |   1 -
 arch/powerpc/platforms/40x/walnut.c |  65 --
 11 files changed, 1 insertion(+), 411 deletions(-)
 delete mode 100644 arch/powerpc/boot/dts/walnut.dts
 delete mode 100644 arch/powerpc/boot/treeboot-walnut.c
 delete mode 100644 arch/powerpc/platforms/40x/walnut.c

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index d8077b7071dd..749c27fcf2d9 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -75,7 +75,6 @@ $(obj)/cuboot-hotfoot.o: BOOTCFLAGS += -mcpu=405
 $(obj)/cuboot-taishan.o: BOOTCFLAGS += -mcpu=405
 $(obj)/cuboot-katmai.o: BOOTCFLAGS += -mcpu=405
 $(obj)/cuboot-acadia.o: BOOTCFLAGS += -mcpu=405
-$(obj)/treeboot-walnut.o: BOOTCFLAGS += -mcpu=405
 $(obj)/treeboot-iss4xx.o: BOOTCFLAGS += -mcpu=405
 $(obj)/treeboot-currituck.o: BOOTCFLAGS += -mcpu=405
 $(obj)/treeboot-akebono.o: BOOTCFLAGS += -mcpu=405
@@ -132,7 +131,7 @@ src-wlib-$(CONFIG_CPM) += cpm-serial.c
 
 src-plat-y := of.c epapr.c
 src-plat-$(CONFIG_40x) += fixed-head.S ep405.c cuboot-hotfoot.c \
-   treeboot-walnut.c cuboot-acadia.c \
+   cuboot-acadia.c \
cuboot-kilauea.c simpleboot.c
 src-plat-$(CONFIG_44x) += treeboot-ebony.c cuboot-ebony.c treeboot-bamboo.c \
cuboot-bamboo.c cuboot-sam440ep.c \
@@ -278,7 +277,6 @@ image-$(CONFIG_EPAPR_BOOT)  += zImage.epapr
 # Board ports in arch/powerpc/platform/40x/Kconfig
 image-$(CONFIG_EP405)  += dtbImage.ep405
 image-$(CONFIG_HOTFOOT) += cuImage.hotfoot
-image-$(CONFIG_WALNUT) += treeImage.walnut
 image-$(CONFIG_ACADIA) += cuImage.acadia
 image-$(CONFIG_OBS600) += uImage.obs600
 
diff --git a/arch/powerpc/boot/dts/walnut.dts b/arch/powerpc/boot/dts/walnut.dts
deleted file mode 100644
index 0872862c9363..
--- a/arch/powerpc/boot/dts/walnut.dts
+++ /dev/null
@@ -1,246 +0,0 @@
-/*
- * Device Tree Source for IBM Walnut
- *
- * Copyright 2007 IBM Corp.
- * Josh Boyer 
- *
- * This file is licensed under the terms of the GNU General Public
- * License version 2.  This program is licensed "as is" without
- * any warranty of any kind, whether express or implied.
- */
-
-/dts-v1/;
-
-/ {
-   #address-cells = <1>;
-   #size-cells = <1>;
-   model = "ibm,walnut";
-   compatible = "ibm,walnut";
-   dcr-parent = <&{/cpus/cpu@0}>;
-
-   aliases {
-   ethernet0 = &EMAC;
-   serial0 = &UART0;
-   serial1 = &UART1;
-   };
-
-   cpus {
-   #address-cells = <1>;
-   #size-cells = <0>;
-
-   cpu@0 {
-   device_type = "cpu";
-   model = "PowerPC,405GP";
-   reg = <0x>;
-   clock-frequency = <2>; /* Filled in by zImage */
-   timebase-frequency = <0>; /* Filled in by zImage */
-   i-cache-line-size = <32>;
-   d-cache-line-size = <32>;
-   i-cache-size = <16384>;
-   d-cache-size = <16384>;
-   dcr-controller;
-   dcr-access-method = "native";
-   };
-   };
-
-   memory {
-   device_type = "memory";
-   reg = <0x 0x>; /* Filled in by zImage */
-   };
-
-   UIC0: interrupt-controller {
-   compatible = "ibm,uic";
-   interrupt-controller;
-   cell-index = <0>;
-   dcr-reg = <0x0c0 0x009>;
-   #address-cells = <0>;
-   #size-cells = <0>;
-   #interrupt-cells = <2>;
-   };
-
-   plb {
-   compatible = "ibm,plb3";
-   #address-cells = <1>;
-   #size-cells = <1>;
-   ranges;
-   clock-frequency = <0>; /* Filled in by zImage */
-
-   SDRAM0: memory-controller {
-   compatible = "ibm,sdram-405gp";
-   dcr-reg = <0x010 0x002>;

[PATCH v5 05/13] powerpc/40x: Remove STB03xxx

2020-05-21 Thread Christophe Leroy

From: Christophe Leroy 

CONFIG_STB03xxx is not user selectable and is not selected
by any config.

Remove it.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/cputable.c | 13 -
 arch/powerpc/platforms/40x/Kconfig |  5 -
 2 files changed, 18 deletions(-)

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 61bd8fb408b2..bdc4eab0daaf 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -1245,19 +1245,6 @@ static struct cpu_spec __initdata cpu_specs[] = {
.machine_check  = machine_check_4xx,
.platform   = "ppc405",
},
-   {   /* STB 03xxx */
-   .pvr_mask   = 0x,
-   .pvr_value  = 0x4013,
-   .cpu_name   = "STB03xxx",
-   .cpu_features   = CPU_FTRS_40X,
-   .cpu_user_features  = PPC_FEATURE_32 |
-   PPC_FEATURE_HAS_MMU | PPC_FEATURE_HAS_4xxMAC,
-   .mmu_features   = MMU_FTR_TYPE_40x,
-   .icache_bsize   = 32,
-   .dcache_bsize   = 32,
-   .machine_check  = machine_check_4xx,
-   .platform   = "ppc405",
-   },
{   /* STB 04xxx */
.pvr_mask   = 0x,
.pvr_value  = 0x4181,
diff --git a/arch/powerpc/platforms/40x/Kconfig b/arch/powerpc/platforms/40x/Kconfig
index 8669be59948c..ca8f44650647 100644
--- a/arch/powerpc/platforms/40x/Kconfig
+++ b/arch/powerpc/platforms/40x/Kconfig
@@ -86,11 +86,6 @@ config 405EZ
select IBM_EMAC_MAL_CLR_ICINTSTAT if IBM_EMAC
select IBM_EMAC_MAL_COMMON_ERR if IBM_EMAC
 
-config STB03xxx
-   bool
-   select IBM405_ERR77
-   select IBM405_ERR51
-
 config PPC4xx_GPIO
bool "PPC4xx GPIO support"
depends on 40x
-- 
2.25.0

[PATCH v5 10/13] powerpc/40x: Remove IBM405 Erratum #51

2020-05-21 Thread Christophe Leroy

From: Christophe Leroy 

This erratum was for IBM 403GCX, 405EP and STB03xxx which are
now gone.

Remove this erratum.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_40x.S | 6 --
 arch/powerpc/platforms/40x/Kconfig | 4 
 2 files changed, 10 deletions(-)

diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
index 5fe4b7ad864b..a78cacea0be0 100644
--- a/arch/powerpc/kernel/head_40x.S
+++ b/arch/powerpc/kernel/head_40x.S
@@ -426,13 +426,7 @@ _ENTRY(saved_ksp_limit)
EXCEPTION(0x1400, Trap_14, unknown_exception, EXC_XFER_STD)
EXCEPTION(0x1500, Trap_15, unknown_exception, EXC_XFER_STD)
EXCEPTION(0x1600, Trap_16, unknown_exception, EXC_XFER_STD)
-#ifdef CONFIG_IBM405_ERR51
-   /* 405GP errata 51 */
-   START_EXCEPTION(0x1700, Trap_17)
-   b DTLBMiss
-#else
EXCEPTION(0x1700, Trap_17, unknown_exception, EXC_XFER_STD)
-#endif
EXCEPTION(0x1800, Trap_18, unknown_exception, EXC_XFER_STD)
EXCEPTION(0x1900, Trap_19, unknown_exception, EXC_XFER_STD)
EXCEPTION(0x1A00, Trap_1A, unknown_exception, EXC_XFER_STD)
diff --git a/arch/powerpc/platforms/40x/Kconfig b/arch/powerpc/platforms/40x/Kconfig
index 253c047fe6fe..ebe283476461 100644
--- a/arch/powerpc/platforms/40x/Kconfig
+++ b/arch/powerpc/platforms/40x/Kconfig
@@ -75,10 +75,6 @@ config PPC4xx_GPIO
 config IBM405_ERR77
bool
 
-# All 40x-based cores, up until the 405GPR and 405EP have this errata.
-config IBM405_ERR51
-   bool
-
 config APM8018X
bool "APM8018X"
depends on 40x
-- 
2.25.0

[PATCH v5 08/13] powerpc/40x: Remove support for ISS Simulator

2020-05-21 Thread Christophe Leroy

From: Christophe Leroy 

ISS4xx has support for 405GP which is obsolete.

Remove it.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/platforms/44x/Kconfig | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/44x/Kconfig b/arch/powerpc/platforms/44x/Kconfig
index 39e93d23fb38..78ac6d67a935 100644
--- a/arch/powerpc/platforms/44x/Kconfig
+++ b/arch/powerpc/platforms/44x/Kconfig
@@ -167,8 +167,7 @@ config YOSEMITE
 
 config ISS4xx
bool "ISS 4xx Simulator"
-   depends on (44x || 40x)
-   select 405GP if 40x
+   depends on 44x
select 440GP if 44x && !PPC_47x
select PPC_FPU
select OF_RTC
-- 
2.25.0

[PATCH v5 09/13] powerpc/40x: Remove support for IBM 405GP

2020-05-21 Thread Christophe Leroy

From: Christophe Leroy 

All platforms selecting the obsolete processor are gone now.

Remove support for it.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/cputable.c | 13 -
 arch/powerpc/platforms/40x/Kconfig |  6 --
 2 files changed, 19 deletions(-)

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index bdc4eab0daaf..8ed553734919 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -1232,19 +1232,6 @@ static struct cpu_spec __initdata cpu_specs[] = {
},
 #endif /* CONFIG_PPC_8xx */
 #ifdef CONFIG_40x
-   {   /* 405GP */
-   .pvr_mask   = 0x,
-   .pvr_value  = 0x4011,
-   .cpu_name   = "405GP",
-   .cpu_features   = CPU_FTRS_40X,
-   .cpu_user_features  = PPC_FEATURE_32 |
-   PPC_FEATURE_HAS_MMU | PPC_FEATURE_HAS_4xxMAC,
-   .mmu_features   = MMU_FTR_TYPE_40x,
-   .icache_bsize   = 32,
-   .dcache_bsize   = 32,
-   .machine_check  = machine_check_4xx,
-   .platform   = "ppc405",
-   },
{   /* STB 04xxx */
.pvr_mask   = 0x,
.pvr_value  = 0x4181,
diff --git a/arch/powerpc/platforms/40x/Kconfig b/arch/powerpc/platforms/40x/Kconfig
index 5d9d96e7223a..253c047fe6fe 100644
--- a/arch/powerpc/platforms/40x/Kconfig
+++ b/arch/powerpc/platforms/40x/Kconfig
@@ -51,12 +51,6 @@ config PPC40x_SIMPLE
help
  This option enables the simple PowerPC 40x platform support.
 
-config 405GP
-   bool
-   select IBM405_ERR77
-   select IBM405_ERR51
-   select IBM_EMAC_ZMII if IBM_EMAC
-
 config 405EX
bool
select IBM_EMAC_EMAC4 if IBM_EMAC
-- 
2.25.0

[PATCH v5 01/13] powerpc: Remove Xilinx PPC405/PPC440 support

2020-05-21 Thread Christophe Leroy

From: Michal Simek 

The latest versions of the Xilinx design tools ISE and EDK were released in
October 2013. The newer tools do not support any new PPC405/PPC440 designs.
These platforms are no longer supported or tested.

The PowerPC 405/440 port has been orphaned since 2013 by
commit cdeb89943bfc ("MAINTAINERS: Fix incorrect status tag") and
commit 19624236cce1 ("MAINTAINERS: Update Grant's email address and
maintainership"),
which is why it is time to remove support for these platforms.

Signed-off-by: Michal Simek 
Acked-by: Arnd Bergmann 
Signed-off-by: Christophe Leroy 
---
 Documentation/devicetree/bindings/xilinx.txt | 143 --
 Documentation/powerpc/bootwrapper.rst|  28 +-
 arch/powerpc/Kconfig.debug   |   2 +-
 arch/powerpc/boot/Makefile   |   7 +-
 arch/powerpc/boot/dts/Makefile   |   1 -
 arch/powerpc/boot/dts/virtex440-ml507.dts| 406 
 arch/powerpc/boot/dts/virtex440-ml510.dts| 466 ---
 arch/powerpc/boot/ops.h  |   1 -
 arch/powerpc/boot/serial.c   |   5 -
 arch/powerpc/boot/uartlite.c |  79 
 arch/powerpc/boot/virtex.c   |  97 
 arch/powerpc/boot/virtex405-head.S   |  31 --
 arch/powerpc/boot/wrapper|   8 -
 arch/powerpc/configs/40x/virtex_defconfig|  75 ---
 arch/powerpc/configs/44x/virtex5_defconfig   |  74 ---
 arch/powerpc/configs/ppc40x_defconfig|   8 -
 arch/powerpc/configs/ppc44x_defconfig|   8 -
 arch/powerpc/include/asm/xilinx_intc.h   |  16 -
 arch/powerpc/include/asm/xilinx_pci.h|  21 -
 arch/powerpc/kernel/cputable.c   |  39 --
 arch/powerpc/platforms/40x/Kconfig   |  31 --
 arch/powerpc/platforms/40x/Makefile  |   1 -
 arch/powerpc/platforms/40x/virtex.c  |  54 ---
 arch/powerpc/platforms/44x/Kconfig   |  37 --
 arch/powerpc/platforms/44x/Makefile  |   2 -
 arch/powerpc/platforms/44x/virtex.c  |  60 ---
 arch/powerpc/platforms/44x/virtex_ml510.c|  30 --
 arch/powerpc/platforms/Kconfig   |   4 -
 arch/powerpc/sysdev/Makefile |   2 -
 arch/powerpc/sysdev/xilinx_intc.c|  88 
 arch/powerpc/sysdev/xilinx_pci.c | 132 --
 drivers/char/Kconfig |   2 +-
 drivers/video/fbdev/Kconfig  |   2 +-
 33 files changed, 7 insertions(+), 1953 deletions(-)
 delete mode 100644 arch/powerpc/boot/dts/virtex440-ml507.dts
 delete mode 100644 arch/powerpc/boot/dts/virtex440-ml510.dts
 delete mode 100644 arch/powerpc/boot/uartlite.c
 delete mode 100644 arch/powerpc/boot/virtex.c
 delete mode 100644 arch/powerpc/boot/virtex405-head.S
 delete mode 100644 arch/powerpc/configs/40x/virtex_defconfig
 delete mode 100644 arch/powerpc/configs/44x/virtex5_defconfig
 delete mode 100644 arch/powerpc/include/asm/xilinx_intc.h
 delete mode 100644 arch/powerpc/include/asm/xilinx_pci.h
 delete mode 100644 arch/powerpc/platforms/40x/virtex.c
 delete mode 100644 arch/powerpc/platforms/44x/virtex.c
 delete mode 100644 arch/powerpc/platforms/44x/virtex_ml510.c
 delete mode 100644 arch/powerpc/sysdev/xilinx_intc.c
 delete mode 100644 arch/powerpc/sysdev/xilinx_pci.c

diff --git a/Documentation/devicetree/bindings/xilinx.txt b/Documentation/devicetree/bindings/xilinx.txt
index d058ace29345..28199b31fe5e 100644
--- a/Documentation/devicetree/bindings/xilinx.txt
+++ b/Documentation/devicetree/bindings/xilinx.txt
@@ -86,149 +86,6 @@
xlnx,use-parity = <0>;
};
 
-   Some IP cores actually implement 2 or more logical devices.  In
-   this case, the device should still describe the whole IP core with
-   a single node and add a child node for each logical device.  The
-   ranges property can be used to translate from parent IP-core to the
-   registers of each device.  In addition, the parent node should be
-   compatible with the bus type 'xlnx,compound', and should contain
-   #address-cells and #size-cells, as with any other bus.  (Note: this
-   makes the assumption that both logical devices have the same bus
-   binding.  If this is not true, then separate nodes should be used
-   for each logical device).  The 'cell-index' property can be used to
-   enumerate logical devices within an IP core.  For example, the
-   following is the system.mhs entry for the dual ps2 controller found
-   on the ml403 reference design.
-
-   BEGIN opb_ps2_dual_ref
-   PARAMETER INSTANCE = opb_ps2_dual_ref_0
-   PARAMETER HW_VER = 1.00.a
-   PARAMETER C_BASEADDR = 0xA900
-   PARAMETER C_HIGHADDR = 0xA9001FFF
-   BUS_INTERFACE SOPB = opb_v20_0
-   PORT Sys_Intr1 = ps2_1_intr
-   PORT Sys_Intr2 = ps2_2_intr
-   PORT Clkin1 = ps2_clk_rx_1
-   PORT Clkin2 = ps2_clk_rx_2
-   PORT Clkpd1 = ps2_clk_tx_1
-   PORT Clkpd2 = ps2_c

[PATCH v5 04/13] powerpc/40x: Remove support for IBM 403GCX

2020-05-21 Thread Christophe Leroy

From: Christophe Leroy 

CONFIG_403GCX is not user selectable and is not
selected by any platform.

Remove it.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/cache.h |  2 +-
 arch/powerpc/include/asm/reg_booke.h | 54 
 arch/powerpc/include/asm/time.h  | 12 ---
 arch/powerpc/kernel/cputable.c   | 37 ---
 arch/powerpc/kernel/head_40x.S   | 45 ---
 arch/powerpc/kernel/misc_32.S|  9 -
 arch/powerpc/kernel/setup-common.c   |  4 ---
 arch/powerpc/platforms/40x/Kconfig   |  6 
 8 files changed, 1 insertion(+), 168 deletions(-)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index 609cab1d58f2..2124b7090db9 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -6,7 +6,7 @@
 
 
 /* bytes per L1 cache line */
-#if defined(CONFIG_PPC_8xx) || defined(CONFIG_403GCX)
+#if defined(CONFIG_PPC_8xx)
 #define L1_CACHE_SHIFT 4
 #define MAX_COPY_PREFETCH  1
 #define IFETCH_ALIGN_SHIFT 2
diff --git a/arch/powerpc/include/asm/reg_booke.h b/arch/powerpc/include/asm/reg_booke.h
index f26fe482fbca..ff30f1076162 100644
--- a/arch/powerpc/include/asm/reg_booke.h
+++ b/arch/powerpc/include/asm/reg_booke.h
@@ -663,60 +663,6 @@
 #define EPC_EPID   0x3fff
 #define EPC_EPID_SHIFT 0
 
-/*
- * The IBM-403 is an even more odd special case, as it is much
- * older than the IBM-405 series.  We put these down here incase someone
- * wishes to support these machines again.
- */
-#ifdef CONFIG_403GCX
-/* Special Purpose Registers (SPRNs)*/
-#define SPRN_TBHU  0x3CC   /* Time Base High User-mode */
-#define SPRN_TBLU  0x3CD   /* Time Base Low User-mode */
-#define SPRN_CDBCR 0x3D7   /* Cache Debug Control Register */
-#define SPRN_TBHI  0x3DC   /* Time Base High */
-#define SPRN_TBLO  0x3DD   /* Time Base Low */
-#define SPRN_DBCR  0x3F2   /* Debug Control Register */
-#define SPRN_PBL1  0x3FC   /* Protection Bound Lower 1 */
-#define SPRN_PBL2  0x3FE   /* Protection Bound Lower 2 */
-#define SPRN_PBU1  0x3FD   /* Protection Bound Upper 1 */
-#define SPRN_PBU2  0x3FF   /* Protection Bound Upper 2 */
-
-
-/* Bit definitions for the DBCR. */
-#define DBCR_EDM   DBCR0_EDM
-#define DBCR_IDM   DBCR0_IDM
-#define DBCR_RST(x)(((x) & 0x3) << 28)
-#define DBCR_RST_NONE  0
-#define DBCR_RST_CORE  1
-#define DBCR_RST_CHIP  2
-#define DBCR_RST_SYSTEM 3
-#define DBCR_IC    DBCR0_IC    /* Instruction Completion Debug Evnt */
-#define DBCR_BT    DBCR0_BT    /* Branch Taken Debug Event */
-#define DBCR_EDE   DBCR0_EDE   /* Exception Debug Event */
-#define DBCR_TDE   DBCR0_TDE   /* TRAP Debug Event */
-#define DBCR_FER   0x00F8  /* First Events Remaining Mask */
-#define DBCR_FT    0x0004  /* Freeze Timers on Debug Event */
-#define DBCR_IA1   0x0002  /* Instr. Addr. Compare 1 Enable */
-#define DBCR_IA2   0x0001  /* Instr. Addr. Compare 2 Enable */
-#define DBCR_D1R   0x8000  /* Data Addr. Compare 1 Read Enable */
-#define DBCR_D1W   0x4000  /* Data Addr. Compare 1 Write Enable */
-#define DBCR_D1S(x)(((x) & 0x3) << 12) /* Data Adrr. Compare 1 Size */
-#define DAC_BYTE   0
-#define DAC_HALF   1
-#define DAC_WORD   2
-#define DAC_QUAD   3
-#define DBCR_D2R   0x0800  /* Data Addr. Compare 2 Read Enable */
-#define DBCR_D2W   0x0400  /* Data Addr. Compare 2 Write Enable */
-#define DBCR_D2S(x)(((x) & 0x3) << 8)  /* Data Addr. Compare 2 Size */
-#define DBCR_SBT   0x0040  /* Second Branch Taken Debug Event */
-#define DBCR_SED   0x0020  /* Second Exception Debug Event */
-#define DBCR_STD   0x0010  /* Second Trap Debug Event */
-#define DBCR_SIA   0x0008  /* Second IAC Enable */
-#define DBCR_SDA   0x0004  /* Second DAC Enable */
-#define DBCR_JOI   0x0002  /* JTAG Serial Outbound Int. Enable */
-#define DBCR_JII   0x0001  /* JTAG Serial Inbound Int. Enable */
-#endif /* 403GCX */
-
 /* Some 476 specific registers */
 #define SPRN_SSPCR 830
 #define SPRN_USPCR 831
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 39ce95016a3a..b287cfc2dd85 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -51,24 +51,12 @@ struct div_result {
 
 static inline unsigned long get_tbl(void)
 {
-#if defined(CONFIG_403GCX)
-   unsigned long tbl;
-   asm volatile("mfspr %0, 0x3dd" : "=r" (tbl));
-   return tbl;
-#else
return mftbl();
-#endif
 }
 
 static inline unsigned int get_tbu(void)
 {
-#ifdef CONFIG_403GCX
-   unsigned int tbu;
-   asm volatile("mfspr %0, 0x3dc" : "=r" (tbu));
-   return tbu;
-#else

[PATCH v5 02/13] powerpc/40x: Rework 40x PTE access and TLB miss

2020-05-21 Thread Christophe Leroy

From: Christophe Leroy 

Commit 1bc54c03117b ("powerpc: rework 4xx PTE access and TLB miss")
reworked 44x PTE access to avoid atomic pte updates, and
left 8xx, 40x and fsl booke with atomic pte updates.
Commit 6cfd8990e27d ("powerpc: rework FSL Book-E PTE access and TLB
miss") removed atomic pte updates on fsl booke.
It went away on 8xx with commit ddfc20a3b9ae ("powerpc/8xx: Remove
PTE_ATOMIC_UPDATES").

40x is the last platform setting PTE_ATOMIC_UPDATES.

Rework PTE access and TLB miss to remove PTE_ATOMIC_UPDATES for 40x:
- Always handle DSI as a fault.
- Bail out of TLB miss handler when CONFIG_SWAP is set and
_PAGE_ACCESSED is not set.
- Bail out of ITLB miss handler when _PAGE_EXEC is not set.
- Only set WR bit when both _PAGE_RW and _PAGE_DIRTY are set.
- Remove _PAGE_HWWRITE
- Don't require PTE_ATOMIC_UPDATES anymore
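
The bail-out and WR rules listed above can be sketched in C as follows. This is only one reading of the commit message, not the assembly from the patch: the PTE_* names and both function names are invented for the example, the bit values are the ones shown in the pte-40x.h hunk, and the value used for _PAGE_PRESENT is an assumption because it does not appear in the quoted hunks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as they appear in the pte-40x.h hunk of this patch. */
#define PTE_DIRTY     0x080u  /* _PAGE_DIRTY */
#define PTE_RW        0x100u  /* _PAGE_RW (after this patch) */
#define PTE_EXEC      0x200u  /* _PAGE_EXEC */
#define PTE_ACCESSED  0x400u  /* _PAGE_ACCESSED */
/* Assumed value: _PAGE_PRESENT is not visible in the quoted hunks. */
#define PTE_PRESENT   0x002u

/*
 * Decide whether a TLB miss handler has to give up and take the slow
 * path: the PTE must be present, accessed when swap is configured,
 * and executable for an instruction-side miss.
 */
bool tlb_miss_must_bail(uint32_t pte, bool is_itlb, bool config_swap)
{
	uint32_t required = PTE_PRESENT;

	if (config_swap)
		required |= PTE_ACCESSED;
	if (is_itlb)
		required |= PTE_EXEC;
	return (pte & required) != required;
}

/*
 * Hardware WR bit for the TLB entry: set only when the PTE is both
 * writable and dirty. The dirty bit sits one position below RW, so a
 * left shift by one lines the two bits up for a single AND.
 */
uint32_t tlb_wr_bit(uint32_t pte)
{
	assert((PTE_DIRTY << 1) == PTE_RW);
	return (pte << 1) & pte & PTE_RW;
}
```

Deriving WR from the two software bits at TLB load time is what makes a separate _PAGE_HWWRITE bit unnecessary: a write to a clean page faults, and the fault handler can mark the page dirty without an atomic PTE update.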

Reported-by: kbuild test robot 
Signed-off-by: Christophe Leroy 
---
v4: Fixed build failure (missing ;)
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/32/pte-40x.h |  23 +--
 arch/powerpc/include/asm/nohash/pgtable.h|   2 -
 arch/powerpc/kernel/head_40x.S   | 177 +++
 arch/powerpc/mm/nohash/40x.c |   4 +-
 4 files changed, 34 insertions(+), 172 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pte-40x.h 
b/arch/powerpc/include/asm/nohash/32/pte-40x.h
index 12c6811e344b..2d3153cfc0d7 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-40x.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-40x.h
@@ -44,9 +44,8 @@
 #define_PAGE_WRITETHRU 0x008   /* W: caching is write-through */
 #define_PAGE_USER  0x010   /* matches one of the zone permission 
bits */
 #define_PAGE_SPECIAL   0x020   /* software: Special page */
-#define_PAGE_RW0x040   /* software: Writes permitted */
 #define_PAGE_DIRTY 0x080   /* software: dirty page */
-#define _PAGE_HWWRITE  0x100   /* hardware: Dirty & RW, set in exception */
+#define _PAGE_RW   0x100   /* hardware: WR, anded with dirty in exception 
*/
 #define _PAGE_EXEC 0x200   /* hardware: EX permission */
 #define _PAGE_ACCESSED 0x400   /* software: R: page referenced */
 
@@ -58,8 +57,8 @@
 
 #define _PAGE_KERNEL_RO0
 #define _PAGE_KERNEL_ROX   _PAGE_EXEC
-#define _PAGE_KERNEL_RW(_PAGE_DIRTY | _PAGE_RW | _PAGE_HWWRITE)
-#define _PAGE_KERNEL_RWX   (_PAGE_DIRTY | _PAGE_RW | _PAGE_HWWRITE | 
_PAGE_EXEC)
+#define _PAGE_KERNEL_RW(_PAGE_DIRTY | _PAGE_RW)
+#define _PAGE_KERNEL_RWX   (_PAGE_DIRTY | _PAGE_RW | _PAGE_EXEC)
 
 #define _PMD_PRESENT   0x400   /* PMD points to page of PTEs */
 #define _PMD_PRESENT_MASK  _PMD_PRESENT
@@ -85,21 +84,5 @@
 #define PAGE_READONLY  __pgprot(_PAGE_BASE | _PAGE_USER)
 #define PAGE_READONLY_X__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
 
-#ifndef __ASSEMBLY__
-static inline pte_t pte_wrprotect(pte_t pte)
-{
-   return __pte(pte_val(pte) & ~(_PAGE_RW | _PAGE_HWWRITE));
-}
-
-#define pte_wrprotect pte_wrprotect
-
-static inline pte_t pte_mkclean(pte_t pte)
-{
-   return __pte(pte_val(pte) & ~(_PAGE_DIRTY | _PAGE_HWWRITE));
-}
-
-#define pte_mkclean pte_mkclean
-#endif
-
 #endif /* __KERNEL__ */
 #endif /*  _ASM_POWERPC_NOHASH_32_PTE_40x_H */
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
b/arch/powerpc/include/asm/nohash/pgtable.h
index f27c967d9269..50a4b0bb8d16 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -130,12 +130,10 @@ static inline pte_t pte_exprotect(pte_t pte)
return __pte(pte_val(pte) & ~_PAGE_EXEC);
 }
 
-#ifndef pte_mkclean
 static inline pte_t pte_mkclean(pte_t pte)
 {
return __pte(pte_val(pte) & ~_PAGE_DIRTY);
 }
-#endif
 
 static inline pte_t pte_mkold(pte_t pte)
 {
diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
index 2cec543c38f0..f45d71ada48b 100644
--- a/arch/powerpc/kernel/head_40x.S
+++ b/arch/powerpc/kernel/head_40x.S
@@ -176,135 +176,16 @@ _ENTRY(saved_ksp_limit)
  * 0x0300 - Data Storage Exception
  * This happens for just a few reasons.  U0 set (but we don't do that),
  * or zone protection fault (user violation, write to protected page).
- * If this is just an update of modified status, we do that quickly
- * and exit.  Otherwise, we call heavywight functions to do the work.
+ * The other Data TLB exceptions bail out to this point
+ * if they can't resolve the lightweight TLB fault.
  */
START_EXCEPTION(0x0300, DataStorage)
-   mtspr   SPRN_SPRG_SCRATCH0, r10 /* Save some working registers */
-   mtspr   SPRN_SPRG_SCRATCH1, r11
-#ifdef CONFIG_403GCX
-   stw r12, 0(r0)
-   stw r9, 4(r0)
-   mfcrr11
-   mfspr   r12, SPRN_PID
-   stw r11, 8(r0)
-   stw r12, 12(r0)
-#else
-   mtspr   SPRN_SPRG_SCRATCH3, r12
-   mtspr   SPRN_SPRG_SCRATCH4, r9
-   mfcrr11
-   mfspr   r12, SPRN_PID
-   mtspr   SPRN

[PATCH v5 00/13] Modernise powerpc 40x

2020-05-21 Thread Christophe Leroy

v1 and v2 of this series were aiming at removing 40x entirely,
but it led to protests.

v3 is trying to start modernising powerpc 40x:
- Rework TLB miss handlers to not use PTE_ATOMIC_UPDATES and _PAGE_HWWRITE
- Remove old versions of 40x processors, namely 403 and 405GP and associated
errata.
- Last two patches are trivial changes in TLB miss handlers to reduce number
of scratch registers.

v4:
- Fixing a build failure with patch 2 due to a missing ;
- There was in patch 5 some stuff belonging to patch 6. Moved them.
- Rebased to today's powerpc/merge

v5:
- Rebase on top of 35a26089dd1c Automatic merge of 'next-test' into merge-test 
(2020-05-22 00:14)

I would have liked to test it with QEMU, but I get the following error when
trying to start QEMU for machine ref405ep.

qemu-system-ppc: Could not load PowerPC BIOS 'ppc405_rom.bin'

Can someone help with that ?

Christophe Leroy (12):
  powerpc/40x: Rework 40x PTE access and TLB miss
  powerpc/pgtable: Drop PTE_ATOMIC_UPDATES
  powerpc/40x: Remove support for IBM 403GCX
  powerpc/40x: Remove STB03xxx
  powerpc/40x: Remove WALNUT
  powerpc/40x: Remove EP405
  powerpc/40x: Remove support for ISS Simulator
  powerpc/40x: Remove support for IBM 405GP
  powerpc/40x: Remove IBM405 Erratum #51
  powerpc: Remove IBM405 Erratum #77
  powerpc/40x: Avoid using r12 in TLB miss handlers
  powerpc/40x: Don't save CR in SPRN_SPRG_SCRATCH6

Michal Simek (1):
  powerpc: Remove Xilinx PPC405/PPC440 support

 Documentation/devicetree/bindings/xilinx.txt | 143 --
 Documentation/powerpc/bootwrapper.rst|  28 +-
 arch/powerpc/Kconfig.debug   |   2 +-
 arch/powerpc/boot/Makefile   |  14 +-
 arch/powerpc/boot/dts/Makefile   |   1 -
 arch/powerpc/boot/dts/ep405.dts  | 230 -
 arch/powerpc/boot/dts/virtex440-ml507.dts| 406 
 arch/powerpc/boot/dts/virtex440-ml510.dts| 466 ---
 arch/powerpc/boot/dts/walnut.dts | 246 --
 arch/powerpc/boot/ep405.c|  71 ---
 arch/powerpc/boot/ops.h  |   1 -
 arch/powerpc/boot/serial.c   |   5 -
 arch/powerpc/boot/treeboot-walnut.c  |  81 
 arch/powerpc/boot/uartlite.c |  79 
 arch/powerpc/boot/virtex.c   |  97 
 arch/powerpc/boot/virtex405-head.S   |  31 --
 arch/powerpc/boot/wrapper|   8 -
 arch/powerpc/configs/40x/acadia_defconfig|   1 -
 arch/powerpc/configs/40x/ep405_defconfig |  62 ---
 arch/powerpc/configs/40x/kilauea_defconfig   |   1 -
 arch/powerpc/configs/40x/klondike_defconfig  |   1 -
 arch/powerpc/configs/40x/makalu_defconfig|   1 -
 arch/powerpc/configs/40x/obs600_defconfig|   1 -
 arch/powerpc/configs/40x/virtex_defconfig|  75 ---
 arch/powerpc/configs/44x/virtex5_defconfig   |  74 ---
 arch/powerpc/configs/ppc40x_defconfig|   9 -
 arch/powerpc/configs/ppc44x_defconfig|   8 -
 arch/powerpc/include/asm/asm-405.h   |  19 -
 arch/powerpc/include/asm/atomic.h|  11 -
 arch/powerpc/include/asm/bitops.h|   4 -
 arch/powerpc/include/asm/cache.h |   2 +-
 arch/powerpc/include/asm/cmpxchg.h   |  11 -
 arch/powerpc/include/asm/futex.h |   3 -
 arch/powerpc/include/asm/nohash/32/pgtable.h |  16 -
 arch/powerpc/include/asm/nohash/32/pte-40x.h |  23 +-
 arch/powerpc/include/asm/nohash/pgtable.h|   2 -
 arch/powerpc/include/asm/reg_booke.h |  54 ---
 arch/powerpc/include/asm/spinlock.h  |   4 -
 arch/powerpc/include/asm/time.h  |  12 -
 arch/powerpc/include/asm/xilinx_intc.h   |  16 -
 arch/powerpc/include/asm/xilinx_pci.h|  21 -
 arch/powerpc/kernel/cputable.c   | 102 
 arch/powerpc/kernel/entry_32.S   |  11 -
 arch/powerpc/kernel/head_40x.S   | 316 +++--
 arch/powerpc/kernel/misc_32.S|   9 -
 arch/powerpc/kernel/setup-common.c   |   4 -
 arch/powerpc/mm/nohash/40x.c |   4 +-
 arch/powerpc/platforms/40x/Kconfig   |  76 ---
 arch/powerpc/platforms/40x/Makefile  |   3 -
 arch/powerpc/platforms/40x/ep405.c   | 123 -
 arch/powerpc/platforms/40x/virtex.c  |  54 ---
 arch/powerpc/platforms/40x/walnut.c  |  65 ---
 arch/powerpc/platforms/44x/Kconfig   |  40 +-
 arch/powerpc/platforms/44x/Makefile  |   2 -
 arch/powerpc/platforms/44x/virtex.c  |  60 ---
 arch/powerpc/platforms/44x/virtex_ml510.c|  30 --
 arch/powerpc/platforms/Kconfig   |   4 -
 arch/powerpc/sysdev/Makefile |   2 -
 arch/powerpc/sysdev/xilinx_intc.c|  88 
 arch/powerpc/sysdev/xilinx_pci.c | 132 --
 drivers/char/Kconfig |   2 +-
 drivers/video/fbdev/Kconfig  |   2 +-
 62 files changed, 83 insertions(+), 3386 deletions(-)

[PATCH v5 03/13] powerpc/pgtable: Drop PTE_ATOMIC_UPDATES

2020-05-21 Thread Christophe Leroy

From: Christophe Leroy 

40x was the last user of PTE_ATOMIC_UPDATES.

Drop everything related to PTE_ATOMIC_UPDATES.

Signed-off-by: Christophe Leroy 
---
v5: Rebased on top of the 8xx hugepage series
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/32/pgtable.h | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h 
b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 717f995d21b8..46cd5428fc52 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -259,25 +259,10 @@ static inline pte_basic_t pte_update(struct mm_struct 
*mm, unsigned long addr, p
 static inline pte_basic_t pte_update(struct mm_struct *mm, unsigned long addr, 
pte_t *p,
 unsigned long clr, unsigned long set, int 
huge)
 {
-#if defined(PTE_ATOMIC_UPDATES) && !defined(CONFIG_PTE_64BIT)
-   unsigned long old, tmp;
-
-   __asm__ __volatile__("\
-1: lwarx   %0,0,%3\n\
-   andc%1,%0,%4\n\
-   or  %1,%1,%5\n"
-   PPC405_ERR77(0,%3)
-"  stwcx.  %1,0,%3\n\
-   bne-1b"
-   : "=&r" (old), "=&r" (tmp), "=m" (*p)
-   : "r" (p), "r" (clr), "r" (set), "m" (*p)
-   : "cc" );
-#else /* PTE_ATOMIC_UPDATES */
pte_basic_t old = pte_val(*p);
pte_basic_t new = (old & ~(pte_basic_t)clr) | set;
 
*p = __pte(new);
-#endif /* !PTE_ATOMIC_UPDATES */
 
 #ifdef CONFIG_44x
if ((old & _PAGE_USER) && (old & _PAGE_EXEC))
-- 
2.25.0
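
For reference, what is left of pte_update() once the lwarx/stwcx. variant
is gone is a plain read-modify-write. A stand-alone sketch, assuming a
32-bit PTE (no CONFIG_PTE_64BIT); pte_update_sketch() is a hypothetical
name and the mm/addr/huge arguments of the real function are left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-bit PTE, as on 40x without CONFIG_PTE_64BIT. */
typedef uint32_t pte_basic_t;

/*
 * Hypothetical user-space version of the remaining path: clear the bits
 * in 'clr', set the bits in 'set', store the result and return the old
 * value. No reservation loop is involved.
 */
pte_basic_t pte_update_sketch(pte_basic_t *p, pte_basic_t clr, pte_basic_t set)
{
	pte_basic_t old, new;

	assert(p != NULL);
	old = *p;
	new = (old & ~clr) | set;
	*p = new;
	return old;
}
```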

Re: [PATCH v2 5/7] mm: parallelize deferred_init_memmap()

2020-05-21 Thread Alexander Duyck

On Thu, May 21, 2020 at 8:37 AM Daniel Jordan
 wrote:
>
> On Wed, May 20, 2020 at 06:29:32PM -0700, Alexander Duyck wrote:
> > On Wed, May 20, 2020 at 11:27 AM Daniel Jordan
> > > @@ -1814,16 +1815,44 @@ deferred_init_maxorder(u64 *i, struct zone *zone, 
> > > unsigned long *start_pfn,
> > > return nr_pages;
> > >  }
> > >
> > > +struct definit_args {
> > > +   struct zone *zone;
> > > +   atomic_long_t nr_pages;
> > > +};
> > > +
> > > +static void __init
> > > +deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long 
> > > end_pfn,
> > > +  void *arg)
> > > +{
> > > +   unsigned long spfn, epfn, nr_pages = 0;
> > > +   struct definit_args *args = arg;
> > > +   struct zone *zone = args->zone;
> > > +   u64 i;
> > > +
> > > +   deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn, 
> > > start_pfn);
> > > +
> > > +   /*
> > > +* Initialize and free pages in MAX_ORDER sized increments so 
> > > that we
> > > +* can avoid introducing any issues with the buddy allocator.
> > > +*/
> > > +   while (spfn < end_pfn) {
> > > +   nr_pages += deferred_init_maxorder(&i, zone, &spfn, 
> > > &epfn);
> > > +   cond_resched();
> > > +   }
> > > +
> > > +   atomic_long_add(nr_pages, &args->nr_pages);
> > > +}
> > > +
> >
> > Personally I would get rid of nr_pages entirely. It isn't worth the
> > cache thrash to have this atomic variable bouncing around.
>
> One of the things I tried to optimize was the managed_pages atomic adds in
> __free_pages_core, but performance stayed the same on the biggest machine I
> tested when it was done once at the end of page init instead of in every thread
> for every pageblock.
>
> I'm not sure this atomic would matter either, given it's less frequent.

It is more about not bothering with the extra tracking. We don't
really need it and having it doesn't really add much in the way of
value.

> > You could
> > probably just have this function return void since all nr_pages is
> > used for is a pr_info  statement at the end of the initialization
> > which will be completely useless now anyway since we really have the
> > threads running in parallel anyway.
>
> The timestamp is still useful for observability, page init is a significant
> part of kernel boot on big machines, over 10% sometimes with these patches.

Agreed.

> It's mostly the time that matters though, I agree the number of pages is less
> important and is probably worth removing just to simplify the code.  I'll do it
> if no one sees a reason to keep it.

Sounds good.

> > We only really need the nr_pages logic in deferred_grow_zone in order
> > to track if we have freed enough pages to allow us to go back to what
> > we were doing.
> >
> > > @@ -1863,11 +1892,32 @@ static int __init deferred_init_memmap(void *data)
> > > goto zone_empty;
> > >
> > > /*
> > > -* Initialize and free pages in MAX_ORDER sized increments so
> > > -* that we can avoid introducing any issues with the buddy
> > > -* allocator.
> > > +* More CPUs always led to greater speedups on tested systems, up 
> > > to
> > > +* all the nodes' CPUs.  Use all since the system is otherwise 
> > > idle now.
> > >  */
> > > +   max_threads = max(cpumask_weight(cpumask), 1u);
> > > +
> > > while (spfn < epfn) {
> > > +   epfn_align = ALIGN_DOWN(epfn, PAGES_PER_SECTION);
> > > +
> > > +   if (IS_ALIGNED(spfn, PAGES_PER_SECTION) &&
> > > +   epfn_align - spfn >= PAGES_PER_SECTION) {
> > > +   struct definit_args arg = { zone, 
> > > ATOMIC_LONG_INIT(0) };
> > > +   struct padata_mt_job job = {
> > > +   .thread_fn   = deferred_init_memmap_chunk,
> > > +   .fn_arg  = &arg,
> > > +   .start   = spfn,
> > > +   .size= epfn_align - spfn,
> > > +   .align   = PAGES_PER_SECTION,
> > > +   .min_chunk   = PAGES_PER_SECTION,
> > > +   .max_threads = max_threads,
> > > +   };
> > > +
> > > +   padata_do_multithreaded(&job);
> > > +   nr_pages += atomic_long_read(&arg.nr_pages);
> > > +   spfn = epfn_align;
> > > +   }
> > > +
> > > nr_pages += deferred_init_maxorder(&i, zone, &spfn, 
> > > &epfn);
> > > cond_resched();
> > > }
> >
> > This doesn't look right. You are basically adding threads in addition
> > to calls to deferred_init_maxorder.
>
> The deferred_init_maxorder call is there to do the remaining, non-section
> aligned part of a range.  It doesn't have to be done this way.

It is also doing the advancing though isn't it?

> >

Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice

2020-05-21 Thread Guenter Roeck

On 5/19/20 10:13 PM, Ira Weiny wrote:
> On Tue, May 19, 2020 at 12:42:15PM -0700, Guenter Roeck wrote:
>> On Tue, May 19, 2020 at 11:40:32AM -0700, Ira Weiny wrote:
>>> On Tue, May 19, 2020 at 09:54:22AM -0700, Guenter Roeck wrote:
>>>> On Mon, May 18, 2020 at 11:48:43AM -0700, ira.we...@intel.com wrote:
>>>>> From: Ira Weiny 
>>>>>
>>>>> The kunmap_atomic clean up failed to remove one set of pagefault/preempt
>>>>> enables when vaddr is not in the fixmap.
>>>>>
>>>>> Fixes: bee2128a09e6 ("arch/kunmap_atomic: consolidate duplicate code")
>>>>> Signed-off-by: Ira Weiny 
>>>>
>>>> microblazeel works with this patch,
>>>
>>> Awesome...  Andrew, in my rush yesterday I should have put a Reported-by on
>>> the patch for Guenter as well.
>>>
>>> Sorry about that Guenter,
>>
>> No worries.
>>
>>> Ira
>>>
>>>> as do the nosmp sparc32 boot tests,
>>>> but sparc32 boot tests with SMP enabled still fail with lots of messages
>>>> such as:
>>>>
>>>> BUG: Bad page state in process swapper/0  pfn:006a1
>>>> page:f0933420 refcount:0 mapcount:1 mapping:(ptrval) index:0x1
>>>> flags: 0x0()
>>>> raw:  0100 0122  0001
>>>>
>>>> page dumped because: nonzero mapcount
>>>> Modules linked in:
>>>> CPU: 0 PID: 1 Comm: swapper/0 Tainted: GB
>>>> 5.7.0-rc6-next-20200518-2-gb178d2d56f29 #1
>>>> [f00e7ab8 :
>>>> bad_page+0xa8/0x108 ]
>>>> [f00e8b54 :
>>>> free_pcppages_bulk+0x154/0x52c ]
>>>> [f00ea024 :
>>>> free_unref_page+0x54/0x6c ]
>>>> [f00ed864 :
>>>> free_reserved_area+0x58/0xec ]
>>>> [f0527104 :
>>>> kernel_init+0x14/0x110 ]
>>>> [f000b77c :
>>>> ret_from_kernel_thread+0xc/0x38 ]
>>>> [ :
>>>> 0x0 ]
>>>>
>>>> Code path leading to that message is different but always the same
>>>> from free_unref_page().
> 
> Actually it occurs to me that the patch consolidating kmap_prot is odd for
> sparc 32 bit...
> 
> It's a long shot but could you try reverting this patch?
> 
> 4ea7d2419e3f kmap: consolidate kmap_prot definitions
> 

That is not easy to revert, unfortunately, due to several follow-up patches.

Guenter

> Alternately I will need to figure out how to run the sparc on qemu here...
> 
> Thanks very much for all the testing though!  :-D
> 
> Ira
> 

>>>> Still testing ppc images.
>>>>
>>
>> ppc image tests are passing with this patch.
>>
>> Guenter

Re: [PATCH v2 5/7] mm: parallelize deferred_init_memmap()

2020-05-21 Thread Daniel Jordan

On Thu, May 21, 2020 at 08:00:31AM -0700, Alexander Duyck wrote:
> So I was thinking about my suggestion further and the loop at the end
> isn't quite correct as I believe it could lead to gaps. The loop on
> the end should probably be:
> for_each_free_mem_pfn_range_in_zone_from(i, zone, spfn, epfn) 
> {
> if (epfn <= epfn_align)
> continue;
> if (spfn < epfn_align)
> spfn = epfn_align;
> break;
> }
> 
> That would generate a new range where epfn_align has actually ended
> and there is a range of new PFNs to process.

Whoops, my email crossed with yours.  Agreed, but see the other message.

Re: [PATCH v2 5/7] mm: parallelize deferred_init_memmap()

2020-05-21 Thread Daniel Jordan

On Wed, May 20, 2020 at 06:29:32PM -0700, Alexander Duyck wrote:
> On Wed, May 20, 2020 at 11:27 AM Daniel Jordan
> > @@ -1814,16 +1815,44 @@ deferred_init_maxorder(u64 *i, struct zone *zone, 
> > unsigned long *start_pfn,
> > return nr_pages;
> >  }
> >
> > +struct definit_args {
> > +   struct zone *zone;
> > +   atomic_long_t nr_pages;
> > +};
> > +
> > +static void __init
> > +deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn,
> > +  void *arg)
> > +{
> > +   unsigned long spfn, epfn, nr_pages = 0;
> > +   struct definit_args *args = arg;
> > +   struct zone *zone = args->zone;
> > +   u64 i;
> > +
> > +   deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn, 
> > start_pfn);
> > +
> > +   /*
> > +* Initialize and free pages in MAX_ORDER sized increments so that 
> > we
> > +* can avoid introducing any issues with the buddy allocator.
> > +*/
> > +   while (spfn < end_pfn) {
> > +   nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
> > +   cond_resched();
> > +   }
> > +
> > +   atomic_long_add(nr_pages, &args->nr_pages);
> > +}
> > +
> 
> Personally I would get rid of nr_pages entirely. It isn't worth the
> cache thrash to have this atomic variable bouncing around.

One of the things I tried to optimize was the managed_pages atomic adds in
__free_pages_core, but performance stayed the same on the biggest machine I
tested when it was done once at the end of page init instead of in every thread
for every pageblock.

I'm not sure this atomic would matter either, given it's less frequent.

> You could
> probably just have this function return void since all nr_pages is
> used for is a pr_info  statement at the end of the initialization
> which will be completely useless now anyway since we really have the
> threads running in parallel anyway.

The timestamp is still useful for observability, page init is a significant
part of kernel boot on big machines, over 10% sometimes with these patches.

It's mostly the time that matters though, I agree the number of pages is less
important and is probably worth removing just to simplify the code.  I'll do it
if no one sees a reason to keep it.

> We only really need the nr_pages logic in deferred_grow_zone in order
> to track if we have freed enough pages to allow us to go back to what
> we were doing.
>
> > @@ -1863,11 +1892,32 @@ static int __init deferred_init_memmap(void *data)
> > goto zone_empty;
> >
> > /*
> > -* Initialize and free pages in MAX_ORDER sized increments so
> > -* that we can avoid introducing any issues with the buddy
> > -* allocator.
> > +* More CPUs always led to greater speedups on tested systems, up to
> > +* all the nodes' CPUs.  Use all since the system is otherwise idle 
> > now.
> >  */
> > +   max_threads = max(cpumask_weight(cpumask), 1u);
> > +
> > while (spfn < epfn) {
> > +   epfn_align = ALIGN_DOWN(epfn, PAGES_PER_SECTION);
> > +
> > +   if (IS_ALIGNED(spfn, PAGES_PER_SECTION) &&
> > +   epfn_align - spfn >= PAGES_PER_SECTION) {
> > +   struct definit_args arg = { zone, 
> > ATOMIC_LONG_INIT(0) };
> > +   struct padata_mt_job job = {
> > +   .thread_fn   = deferred_init_memmap_chunk,
> > +   .fn_arg  = &arg,
> > +   .start   = spfn,
> > +   .size= epfn_align - spfn,
> > +   .align   = PAGES_PER_SECTION,
> > +   .min_chunk   = PAGES_PER_SECTION,
> > +   .max_threads = max_threads,
> > +   };
> > +
> > +   padata_do_multithreaded(&job);
> > +   nr_pages += atomic_long_read(&arg.nr_pages);
> > +   spfn = epfn_align;
> > +   }
> > +
> > nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
> > cond_resched();
> > }
> 
> This doesn't look right. You are basically adding threads in addition
> to calls to deferred_init_maxorder.

The deferred_init_maxorder call is there to do the remaining, non-section
aligned part of a range.  It doesn't have to be done this way.

> In addition you are spawning one
> job per section instead of per range.

That's not what's happening, all the above is doing is aligning the end of the
range down to a section.  Each thread is working on way more than a section at
a time.
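
To make the split concrete, here is a small user-space sketch of what the
alignment does to a single range. PAGES_PER_SECTION is assumed to be 32768
(128M sections with 4K pages), the macros are simplified stand-ins for the
kernel ones, and struct pfn_split / split_range() are made-up names that
exist only for this illustration:

```c
#include <assert.h>

/* Assumption: 32768 pages per section (128M sections, 4K pages). */
#define PAGES_PER_SECTION 32768UL
/* Simplified stand-ins for the kernel macros of the same name. */
#define ALIGN_DOWN(x, a)  ((x) - ((x) % (a)))
#define IS_ALIGNED(x, a)  (((x) % (a)) == 0)

struct pfn_split {
	unsigned long mt_start, mt_end;     /* handed to the helper threads */
	unsigned long tail_start, tail_end; /* left for deferred_init_maxorder */
};

/*
 * Hypothetical helper mirroring the checks in the loop quoted above: a
 * range is only handed to the threads when it starts on a section
 * boundary and spans at least one full section once its end is aligned
 * down. Whatever is left stays with the calling thread.
 */
struct pfn_split split_range(unsigned long spfn, unsigned long epfn)
{
	unsigned long epfn_align = ALIGN_DOWN(epfn, PAGES_PER_SECTION);
	struct pfn_split s = { spfn, spfn, spfn, epfn };

	assert(spfn <= epfn);
	if (IS_ALIGNED(spfn, PAGES_PER_SECTION) &&
	    epfn_align - spfn >= PAGES_PER_SECTION) {
		s.mt_end = epfn_align;
		s.tail_start = epfn_align;
	}
	return s;
}
```

For example, split_range(32768, 100000) gives the threads [32768, 98304),
i.e. two whole sections as one job, and leaves [98304, 100000) as the
unaligned tail.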

> Really you should be going for
> something more along the lines of:
> 
> while (spfn < epfn) {
> unsigned long epfn_align = ALIGN(epfn,
> PAGE_PER_SECTION);
> struct definit_args arg = { zone, ATOMIC_LONG_INIT(0)
> };
>

Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice

2020-05-21 Thread Ira Weiny

On Tue, May 19, 2020 at 12:42:15PM -0700, Guenter Roeck wrote:
> > On Tue, May 19, 2020 at 09:54:22AM -0700, Guenter Roeck wrote:
> > > as do the nosmp sparc32 boot tests,
> > > but sparc32 boot tests with SMP enabled still fail with lots of messages
> > > such as:
> > > 
> > > BUG: Bad page state in process swapper/0  pfn:006a1
> > > page:f0933420 refcount:0 mapcount:1 mapping:(ptrval) index:0x1
> > > flags: 0x0()
> > > raw:  0100 0122  0001   
> > > 
> > > page dumped because: nonzero mapcount
> > > Modules linked in:
> > > CPU: 0 PID: 1 Comm: swapper/0 Tainted: GB 
> > > 5.7.0-rc6-next-20200518-2-gb178d2d56f29 #1
> > > [f00e7ab8 :
> > > bad_page+0xa8/0x108 ]
> > > [f00e8b54 :
> > > free_pcppages_bulk+0x154/0x52c ]
> > > [f00ea024 :
> > > free_unref_page+0x54/0x6c ]
> > > [f00ed864 :
> > > free_reserved_area+0x58/0xec ]
> > > [f0527104 :
> > > kernel_init+0x14/0x110 ]
> > > [f000b77c :
> > > ret_from_kernel_thread+0xc/0x38 ]
> > > [ :
> > > 0x0 ]
> > > 
> > > Code path leading to that message is different but always the same
> > > from free_unref_page().
> > > 
> > > Still testing ppc images.
> > > 
> 
> ppc image tests are passing with this patch.

How about sparc?  I finally got your scripts to run on sparc and everything
looks to be passing?

Are we all good now?

Thanks again!
Ira

Re: [PATCH v2 5/7] mm: parallelize deferred_init_memmap()

2020-05-21 Thread Alexander Duyck

On Wed, May 20, 2020 at 6:29 PM Alexander Duyck
 wrote:
>
> On Wed, May 20, 2020 at 11:27 AM Daniel Jordan
>  wrote:
> >
> > Deferred struct page init is a significant bottleneck in kernel boot.
> > Optimizing it maximizes availability for large-memory systems and allows
> > spinning up short-lived VMs as needed without having to leave them
> > running.  It also benefits bare metal machines hosting VMs that are
> > sensitive to downtime.  In projects such as VMM Fast Restart[1], where
> > guest state is preserved across kexec reboot, it helps prevent
> > application and network timeouts in the guests.
> >
> > Multithread to take full advantage of system memory bandwidth.
> >
> > The maximum number of threads is capped at the number of CPUs on the
> > node because speedups always improve with additional threads on every
> > system tested, and at this phase of boot, the system is otherwise idle
> > and waiting on page init to finish.
> >
> > Helper threads operate on section-aligned ranges to both avoid false
> > sharing when setting the pageblock's migrate type and to avoid accessing
> > uninitialized buddy pages, though max order alignment is enough for the
> > latter.
> >
> > The minimum chunk size is also a section.  There was benefit to using
> > multiple threads even on relatively small memory (1G) systems, and this
> > is the smallest size that the alignment allows.
> >
> > The time (milliseconds) is the slowest node to initialize since boot
> > blocks until all nodes finish.  intel_pstate is loaded in active mode
> > without hwp and with turbo enabled, and intel_idle is active as well.
> >
> > Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz (Skylake, bare metal)
> >   2 nodes * 26 cores * 2 threads = 104 CPUs
> >   384G/node = 768G memory
> >
> >kernel boot deferred init
> >
> > node% (thr)speedup  time_ms (stdev)speedup  time_ms (stdev)
> >   (  0) --   4078.0 (  9.0) --   1779.0 (  8.7)
> >2% (  1)   1.4%   4021.3 (  2.9)   3.4%   1717.7 (  7.8)
> >   12% (  6)  35.1%   2644.7 ( 35.3)  80.8%341.0 ( 35.5)
> >   25% ( 13)  38.7%   2498.0 ( 34.2)  89.1%193.3 ( 32.3)
> >   37% ( 19)  39.1%   2482.0 ( 25.2)  90.1%175.3 ( 31.7)
> >   50% ( 26)  38.8%   2495.0 (  8.7)  89.1%193.7 (  3.5)
> >   75% ( 39)  39.2%   2478.0 ( 21.0)  90.3%172.7 ( 26.7)
> >  100% ( 52)  40.0%   2448.0 (  2.0)  91.9%143.3 (  1.5)
> >
> > Intel(R) Xeon(R) CPU E5-2699C v4 @ 2.20GHz (Broadwell, bare metal)
> >   1 node * 16 cores * 2 threads = 32 CPUs
> >   192G/node = 192G memory
> >
> >kernel boot deferred init
> >
> > node% (thr)speedup  time_ms (stdev)speedup  time_ms (stdev)
> >   (  0) --   1996.0 ( 18.0) --   1104.3 (  6.7)
> >3% (  1)   1.4%   1968.0 (  3.0)   2.7%   1074.7 (  9.0)
> >   12% (  4)  40.1%   1196.0 ( 22.7)  72.4%305.3 ( 16.8)
> >   25% (  8)  47.4%   1049.3 ( 17.2)  84.2%174.0 ( 10.6)
> >   37% ( 12)  48.3%   1032.0 ( 14.9)  86.8%145.3 (  2.5)
> >   50% ( 16)  48.9%   1020.3 (  2.5)  88.0%133.0 (  1.7)
> >   75% ( 24)  49.1%   1016.3 (  8.1)  88.4%128.0 (  1.7)
> >  100% ( 32)  49.4%   1009.0 (  8.5)  88.6%126.3 (  0.6)
> >
> > Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz (Haswell, bare metal)
> >   2 nodes * 18 cores * 2 threads = 72 CPUs
> >   128G/node = 256G memory
> >
> >kernel boot deferred init
> >
> > node% (thr)speedup  time_ms (stdev)speedup  time_ms (stdev)
> >   (  0) --   1682.7 (  6.7) --630.0 (  4.6)
> >3% (  1)   0.4%   1676.0 (  2.0)   0.7%625.3 (  3.2)
> >   12% (  4)  25.8%   1249.0 (  1.0)  68.2%200.3 (  1.2)
> >   25% (  9)  30.0%   1178.0 (  5.2)  79.7%128.0 (  3.5)
> >   37% ( 13)  30.6%   1167.7 (  3.1)  81.3%117.7 (  1.2)
> >   50% ( 18)  30.6%   1167.3 (  2.3)  81.4%117.0 (  1.0)
> >   75% ( 27)  31.0%   1161.3 (  4.6)  82.5%110.0 (  6.9)
> >  100% ( 36)  32.1%   1142.0 (  3.6)  85.7% 90.0 (  1.0)
> >
> > AMD EPYC 7551 32-Core Processor (Zen, kvm guest)
> >   1 node * 8 cores * 2 threads = 16 CPUs
> >   64G/node = 64G memory
> >
> >kernel boot deferred init
> >
> > node% (thr)speedup  time_ms (stdev)speedup  time_ms (stdev)
> >   (  0) --   1003.7

Re: [PATCH 0/2] powerpc: Remove support for ppc405/440 Xilinx platforms

2020-05-21 Thread Michal Simek

On 21. 05. 20 15:53, Michael Ellerman wrote:
> Christophe Leroy  writes:
>> Le 21/05/2020 à 09:02, Michael Ellerman a écrit :
>>> Arnd Bergmann  writes:
>>>> +On Wed, Apr 8, 2020 at 2:04 PM Michael Ellerman wrote:
>>>>> Benjamin Herrenschmidt  writes:
>>>>>> On Fri, 2020-04-03 at 15:59 +1100, Michael Ellerman wrote:
>>>>>>> Benjamin Herrenschmidt  writes:
>>>>>> IBM still put 40x cores inside POWER chips no ?
>>>>>
>>>>> Oh yeah that's true. I guess most folks don't know that, or that they
>>>>> run RHEL on them.
>>>>
>>>> Is there a reason for not having those dts files in mainline then?
>>>> If nothing else, it would document what machines are still being
>>>> used with future kernels.
>>>
>>> Sorry that part was a joke :D  Those chips don't run Linux.
>>>
>>
>> Nice to know :)
>>
>> What's the plan then, do we still want to keep 40x in the kernel ?
> 
> I guess we keep it for now.
> 
> Perhaps we mark it BROKEN for a few releases and see if anyone
> complains?

I would like to get at least that xilinx patch to the tree to unblock
our changes on interrupt controller.

Thanks,
Michal

Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier

2020-05-21 Thread Jeff Moyer

Dan Williams  writes:

>> But I agree with your concern that if we have older kernel/applications
>> that continue to use `dcbf` on future hardware we will end up
>> having issues w.r.t powerfail consistency. The plan is what you outlined
>> above as tighter ecosystem control. Considering we don't have a pmem
>> device generally available, we get both kernel and userspace upgraded
>> to use these new instructions before such a device is made available.

I thought power already supported NVDIMM-N, no?  So are you saying that
those devices will continue to work with the existing flushing and
fencing mechanisms?

> Ok, I think a compile time kernel option with a runtime override
> satisfies my concern. Does that work for you?

The compile time option only helps when running newer kernels.  I'm not
sure how you would even begin to audit userspace applications (keep in
mind, not every application is open source, and not every application
uses pmdk).  I also question the merits of forcing the administrator to
make the determination of whether all applications on the system will
work properly.  Really, you have to rely on the vendor to tell you the
platform is supported, and at that point, why put further hurdles in the
way?

The decision to require different instructions on ppc is unfortunate,
but one I'm sure we have no control over.  I don't see any merit in the
kernel disallowing MAP_SYNC access on these platforms.  Ideally, we'd
have some way of ensuring older kernels don't work with these new
platforms, but I don't think that's possible.

Moving on to the patch itself--Aneesh, have you audited other persistent
memory users in the kernel?  For example, drivers/md/dm-writecache.c does
this:

static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
{
	if (WC_MODE_PMEM(wc))
		wmb(); <==
	else
		ssd_commit_flushed(wc, wait_for_ios);
}

I believe you'll need to make modifications there.

Cheers,
Jeff

Re: [RESEND PATCH v7 3/5] powerpc/papr_scm: Fetch nvdimm health information from PHYP

2020-05-21 Thread Michael Ellerman

Vaibhav Jain  writes:
> Thanks for reviewing this patch Ira. My responses below:
> Ira Weiny  writes:
>> On Wed, May 20, 2020 at 12:30:56AM +0530, Vaibhav Jain wrote:
>>> Implement support for fetching nvdimm health information via
>>> H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair
>>> of 64-bit big-endian integers, bitwise-and of which is then stored in
>>> 'struct papr_scm_priv' and subsequently partially exposed to
>>> user-space via newly introduced dimm specific attribute
>>> 'papr/flags'. Since the hcall is costly, the health information is
>>> cached and only re-queried, 60s after the previous successful hcall.
...
>>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
>>> b/arch/powerpc/platforms/pseries/papr_scm.c
>>> index f35592423380..142636e1a59f 100644
>>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>>> @@ -39,6 +78,15 @@ struct papr_scm_priv {
>>> struct resource res;
>>> struct nd_region *region;
>>> struct nd_interleave_set nd_set;
>>> +
>>> +   /* Protect dimm health data from concurrent read/writes */
>>> +   struct mutex health_mutex;
>>> +
>>> +   /* Last time the health information of the dimm was updated */
>>> +   unsigned long lasthealth_jiffies;
>>> +
>>> +   /* Health information for the dimm */
>>> +   u64 health_bitmap;
>>
>> I wonder if this should be typed big endian as you mention that it is in the
>> commit message?
> This was discussed in an earlier review of the patch series at
> https://lore.kernel.org/linux-nvdimm/878sjetcis@mpe.ellerman.id.au
>
> Even though health bitmap is returned in big endian format (For ex
> value 0xC00 indicates bits 0,1 set), its value is never
> used. Instead only test for specific bits being set in the register is
> done.

This has already caused a lot of confusion, so let me try and clear it
up. I will probably fail :)

The value is not big endian.

It's returned in a GPR (a register), from the hypervisor. The ordering
of bytes in a register is not dependent on what endian we're executing
in.

It's true that the hypervisor will have been running big endian, and
when it returns to us we will now be running little endian. But the
value is unchanged, it was 0xC00 in the GPR while the HV was
running and it's still 0xC00 when we return to Linux. You
can see this in mambo, see below for an example.
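A minimal sketch of the distinction drawn above, assuming a hypothetical helper named load_be64() that is not part of the patch: byte order only matters when a value is laid out in memory, so a big-endian buffer needs decoding, while a value that already sits in a register (here, a plain C variable) needs no conversion at all.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decode a 64-bit big-endian value from memory. */
static inline uint64_t load_be64(const unsigned char *p)
{
	uint64_t v = 0;

	/* The first byte in memory is the most significant one. */
	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

static inline void endian_examples(void)
{
	/* A value stored big-endian in a memory buffer ... */
	const unsigned char buf[8] = { 0xC0, 0, 0, 0, 0, 0, 0, 0 };
	/* ... versus the same value as it would look in a register. */
	uint64_t in_register = UINT64_C(0xC000000000000000);

	assert(load_be64(buf) == in_register);
}
```

The decode step is needed for the buffer on any host, whereas in_register is used as-is whatever endian mode the CPU is running in.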

_However_, the specification of the bits in the bitmap value uses MSB 0
ordering, as is traditional for IBM documentation. That means the most
significant bit, aka. the left most bit, is called "bit 0".

See: https://en.wikipedia.org/wiki/Bit_numbering#MSB_0_bit_numbering

That is the opposite numbering from what most people use, and in
particular what most code in Linux uses, which is that bit 0 is the
least significant bit.

Which is where the confusion comes in. It's not that the bytes are
returned in a different order, it's that the bits are numbered
differently in the IBM documentation.

The way to fix this kind of thing is to read the docs, and convert all
the bits into correct numbering (LSB=0), and then throw away the docs ;)
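As an illustration of that conversion, here is a small sketch for a 64-bit value. The helper names msb0_to_lsb0() and msb0_mask() are made up for this example; the kernel's powerpc headers carry a PPC_BIT() macro that serves the same purpose.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not taken from the patch under review. */

/* LSB-0 position of what MSB-0 documentation calls "bit n" (64-bit value). */
static inline unsigned int msb0_to_lsb0(unsigned int n)
{
	return 63 - n;
}

/* Mask selecting "bit n" in MSB-0 numbering of a 64-bit value. */
static inline uint64_t msb0_mask(unsigned int n)
{
	return UINT64_C(1) << msb0_to_lsb0(n);
}

static inline void msb0_examples(void)
{
	/* MSB-0 "bit 0" is the most significant bit, i.e. LSB-0 bit 63. */
	assert(msb0_to_lsb0(0) == 63);
	/* MSB-0 "bit 63" is the least significant bit. */
	assert(msb0_mask(63) == UINT64_C(1));
	/* MSB-0 bits 0 and 1 set together give the top two bits. */
	assert((msb0_mask(0) | msb0_mask(1)) == UINT64_C(0xC000000000000000));
}
```

With masks defined this way, the code only ever tests bits in the usual LSB=0 sense and the MSB-0 numbering stays confined to one place.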

cheers

In mambo we can set a breakpoint just before the kernel enters skiboot,
towards the end of __opal_call. The kernel is running LE and skiboot
runs BE.

  systemsim-p9 [~/skiboot/skiboot/external/mambo] b 0xc00c1744
  breakpoint set at [0:0:0]: 0xc00c1744 (0x000C1744) Enc:0x2402004C : hrfid

Then run:

  systemsim-p9 [~/skiboot/skiboot/external/mambo] c
  [0:0:0]: 0xC00C1744 (0x000C1744) Enc:0x2402004C : hrfid
  INFO: 121671618: (121671618): ** Execution stopped: user (tcl),  **
  121671618: ** finished running 121671618 instructions **

And we stop there, on an hrfid that we haven't executed yet.
We can print r0, to see the OPAL token:

  systemsim-p9 [~/skiboot/skiboot/external/mambo] p r0
  0x0019

ie. we're calling OPAL_CONSOLE_WRITE_BUFFER_SPACE (25).

And we can print the MSR:

  systemsim-p9 [~/skiboot/skiboot/external/mambo] p msr
  0x92001033

                       64-bit mode (SF): 0x1 [64-bit mode]
                  Hypervisor State (HV): 0x1
                 Vector Available (VEC): 0x1
    Machine Check Interrupt Enable (ME): 0x1
              Instruction Relocate (IR): 0x1
                     Data Relocate (DR): 0x1
             Recoverable Interrupt (RI): 0x1
                Little-Endian Mode (LE): 0x1 [little-endian]

ie. we're little endian.
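The same check can be done by hand: MSR_LE is the least significant bit of the MSR. The sketch below uses a made-up helper name, msr_little_endian(), and only the low 16 bits of the values printed by mambo (0x1033 here, 0x1002 after the step into skiboot).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MSR[LE] is the least significant bit of the Machine State Register. */
#define MSR_LE UINT64_C(0x1)

/* Hypothetical helper: true if the MSR value indicates little-endian mode. */
static inline bool msr_little_endian(uint64_t msr)
{
	return (msr & MSR_LE) != 0;
}

static inline void msr_examples(void)
{
	/* Low bits of the MSR while still in the kernel: LE is set. */
	assert(msr_little_endian(0x1033));
	/* Low bits of the MSR once in skiboot: LE is clear. */
	assert(!msr_little_endian(0x1002));
}
```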

We then step one instruction:

  systemsim-p9 [~/skiboot/skiboot/external/mambo] s
  [0:0:0]: 0x30002BF0 (0x30002BF0) Enc:0x7D9FFAA6 : mfspr   r12,PIR

Now we're in skiboot. Print the MSR again:

  systemsim-p9 [~/skiboot/skiboot/external/mambo] p msr
  0x92001002

                       64-bit mode (SF): 0x1 [64-bit mode]
                  Hypervisor State (HV): 0x1
                 Vector Available (VEC): 0x1
    Machine Check Interrupt Enable (ME): 0x

Re: [PATCH 0/2] powerpc: Remove support for ppc405/440 Xilinx platforms

2020-05-21 Thread Michael Ellerman

Christophe Leroy  writes:
> On 21/05/2020 at 09:02, Michael Ellerman wrote:
>> Arnd Bergmann  writes:
>>> +On Wed, Apr 8, 2020 at 2:04 PM Michael Ellerman  wrote:
>>>> Benjamin Herrenschmidt  writes:
>>>>> On Fri, 2020-04-03 at 15:59 +1100, Michael Ellerman wrote:
>>>>>> Benjamin Herrenschmidt  writes:
>>>>> IBM still put 40x cores inside POWER chips no ?
>>>>
>>>> Oh yeah that's true. I guess most folks don't know that, or that they
>>>> run RHEL on them.
>>>
>>> Is there a reason for not having those dts files in mainline then?
>>> If nothing else, it would document what machines are still being
>>> used with future kernels.
>> 
>> Sorry that part was a joke :D  Those chips don't run Linux.
>> 
>
> Nice to know :)
>
> What's the plan then, do we still want to keep 40x in the kernel ?

I guess we keep it for now.

Perhaps we mark it BROKEN for a few releases and see if anyone
complains?

> If yes, is it ok to drop the oldies anyway as done in my series 
> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=172630 ?

Yeah let's do it. I would love to get rid of that horrible
PPC405_ERR77() sprinkled all through our atomics.

> (Note that this series will conflict with my series on hugepages on 8xx 
> due to the PTE_ATOMIC_UPDATES stuff. I can rebase the 40x modernisation 
> series on top of the 8xx hugepages series if it is worth it)

Yeah if you can rebase that would be great.

cheers

Re: linux-next: manual merge of the rcu tree with the powerpc tree

2020-05-21 Thread Paul E. McKenney

On Thu, May 21, 2020 at 02:51:24PM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> On Tue, 19 May 2020 17:23:16 +1000 Stephen Rothwell  wrote:
> >
> > Today's linux-next merge of the rcu tree got a conflict in:
> > 
> >   arch/powerpc/kernel/traps.c
> > 
> > between commit:
> > 
> >   116ac378bb3f ("powerpc/64s: machine check interrupt update NMI 
> > accounting")
> > 
> > from the powerpc tree and commit:
> > 
> >   187416eeb388 ("hardirq/nmi: Allow nested nmi_enter()")
> > 
> > from the rcu tree.
> > 
> > I fixed it up (I used the powerpc tree version for now) and can carry the
> > fix as necessary. This is now fixed as far as linux-next is concerned,
> > but any non trivial conflicts should be mentioned to your upstream
> > maintainer when your tree is submitted for merging.  You may also want
> > to consider cooperating with the maintainer of the conflicting tree to
> > minimise any particularly complex conflicts.
> 
> This is now a conflict between the powerpc commit and commit
> 
>   69ea03b56ed2 ("hardirq/nmi: Allow nested nmi_enter()")
> 
> from the tip tree.  I assume that the rcu and tip trees are sharing
> some patches (but not commits) :-(

We are sharing commits, and in fact 187416eeb388 in the rcu tree came
from the tip tree.  My guess is version skew, and that I probably have
another rebase coming up.

Why is this happening?  There are sets of conflicting commits in different
efforts, and we are trying to resolve them.  But we are getting feedback
on some of those commits, which is probably what is causing the skew.

Thanx, Paul

Re: Fwd: [CRON] Broken: ClangBuiltLinux/continuous-integration#1432 (master - 0aceafc)

2020-05-21 Thread Michael Ellerman

Nathan Chancellor  writes:
> On Tue, May 19, 2020 at 05:56:32PM -0700, 'Nick Desaulniers' via Clang Built 
> Linux wrote:
>> Looks like our CI is still red from this:
>> 
>> https://travis-ci.com/github/ClangBuiltLinux/continuous-integration/builds/166854584
>> 
>> Filing a bug to follow up on:
>> https://github.com/ClangBuiltLinux/linux/issues/1031
>> 
>> On Thu, May 7, 2020 at 8:29 PM Michael Ellerman  wrote:
>> >
>> > Nick Desaulniers  writes:
>> > > Looks like ppc64le powernv_defconfig is suddenly failing the locking
>> > > torture tests, then locks up?
>> > > https://travis-ci.com/github/ClangBuiltLinux/continuous-integration/jobs/329211572#L3111-L3167
>> > > Any recent changes related here in -next?  I believe this is the first
>> > > failure, so I'll report back if we see this again.
>> >
>> > Thanks for the report.
>> >
>> > There's nothing newly in next-20200507 that seems related.
...
>
> This is probably still a manifestation of
> https://github.com/ClangBuiltLinux/continuous-integration/issues/262
> because rekicking the tests usually fixes it.

Oh yep.

I was looking at the RCU warning, which I still don't understand, but
the lockup is presumably the same problem you hit with interrupts being
lost.

> We should probably just disable the torture tests like we do for x86_64
> for CI because we do not have access to QEMU 5.0.0 where this should be
> fixed. I believe it is slated for 4.2.1 as well but we still have to
> wait for that to be updated and packaged in Ubuntu.

You just need to start building Qemu HEAD as part of your CI ;)

cheers

Re: Endless soft-lockups for compiling workload since next-20200519

2020-05-21 Thread Frederic Weisbecker

On Thu, May 21, 2020 at 01:00:27PM +0200, Peter Zijlstra wrote:
> On Thu, May 21, 2020 at 12:49:37PM +0200, Peter Zijlstra wrote:
> > On Thu, May 21, 2020 at 11:39:39AM +0200, Peter Zijlstra wrote:
> > > On Thu, May 21, 2020 at 02:40:36AM +0200, Frederic Weisbecker wrote:
> > 
> > > This:
> > > 
> > > > smp_call_function_single_async() {      smp_call_function_single_async() {
> > > >     // verified csd->flags != CSD_LOCK      // verified csd->flags != CSD_LOCK
> > > >     csd->flags = CSD_LOCK                   csd->flags = CSD_LOCK
> > > 
> > > concurrent smp_call_function_single_async() using the same csd is what
> > > I'm looking at as well.
> > 
> > So something like this ought to cure the fundamental problem and make
> > smp_call_function_single_async() more user friendly, but also more
> > expensive.
> > 
> > The problem is that while the ILB case is easy to fix, I can't seem to
> > find an equally nice solution for the ttwu_remote_queue() case; that
> > would basically require sticking the wake_csd in task_struct, I'll also
> > post that.
> > 
> > So it's either this:
> 
> Or this:
> 
> ---
>  include/linux/sched.h | 4 
>  kernel/sched/core.c   | 7 ---
>  kernel/sched/fair.c   | 2 +-
>  kernel/sched/sched.h  | 1 -
>  4 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index f38d62c4632c..136ee400b568 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -696,6 +696,10 @@ struct task_struct {
>   struct uclamp_se                uclamp[UCLAMP_CNT];
>  #endif
>  
> +#ifdef CONFIG_SMP
> + call_single_data_t  wake_csd;
> +#endif
> +
>  #ifdef CONFIG_PREEMPT_NOTIFIERS
>   /* List of struct preempt_notifier: */
>   struct hlist_head   preempt_notifiers;
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 5b286469e26e..a7129652e89b 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2320,7 +2320,7 @@ static void ttwu_queue_remote(struct task_struct *p, 
> int cpu, int wake_flags)
>  
>   if (llist_add(&p->wake_entry, &rq->wake_list)) {
>   if (!set_nr_if_polling(rq->idle))
> - smp_call_function_single_async(cpu, &rq->wake_csd);
> + smp_call_function_single_async(cpu, &p->wake_csd);
>   else
>   trace_sched_wake_idle_without_ipi(cpu);
>   }
> @@ -2921,6 +2921,9 @@ int sched_fork(unsigned long clone_flags, struct 
> task_struct *p)
>  #endif
>  #if defined(CONFIG_SMP)
>   p->on_cpu = 0;
> + p->wake_csd = (struct __call_single_data) {
> + .func = wake_csd_func,
> + };
>  #endif
>   init_task_preempt_count(p);
>  #ifdef CONFIG_SMP
> @@ -6723,8 +6726,6 @@ void __init sched_init(void)
>   rq->avg_idle = 2*sysctl_sched_migration_cost;
>   rq->max_idle_balance_cost = sysctl_sched_migration_cost;
>  
> - rq_csd_init(rq, &rq->wake_csd, wake_csd_func);
> -
>   INIT_LIST_HEAD(&rq->cfs_tasks);
>  
>   rq_attach_root(rq, &def_root_domain);
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 01f94cf52783..b6d8a7b991f0 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10033,7 +10033,7 @@ static void kick_ilb(unsigned int flags)
>* is idle. And the softirq performing nohz idle load balance
>* will be run before returning from the IPI.
>*/
> - smp_call_function_single_async(ilb_cpu, &cpu_rq(ilb_cpu)->nohz_csd);
> + smp_call_function_single_async(ilb_cpu, &this_rq()->nohz_csd);

My fear here is that if a previous call from the same CPU but to another
target is still pending, the new one will be spuriously ignored.

Namely this could happen:

CPU 0                                           CPU 1
-----                                           -----
                                                local_irq_disable() or VMEXIT
kick_ilb() {
    smp_call_function_single_async(CPU 1,
                                   &this_rq()->nohz_csd);
}

kick_ilb() {
    smp_call_function_single_async(CPU 2,
                                   &this_rq()->nohz_csd) {
        // IPI to CPU 2 ignored
        if (csd->flags == CSD_LOCK)
            return -EBUSY;
    }
}

                                                local_irq_enable();
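The scenario above can be replayed sequentially in a small userspace sketch. Everything here is hypothetical (struct csd_sketch, send_async(), ipi_handled()); it only models the "one csd per sender, busy until the target handles the IPI" behaviour and none of the real kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-sender call_single_data_t. */
struct csd_sketch {
	bool locked;	/* models CSD_FLAG_LOCK */
	int target_cpu;	/* CPU the pending request was sent to */
};

/* Models smp_call_function_single_async(): refuse while still pending. */
static inline int send_async(struct csd_sketch *csd, int cpu)
{
	if (csd->locked)
		return -EBUSY;
	csd->locked = true;
	csd->target_cpu = cpu;
	return 0;
}

/* Models the target CPU finally running the callback and unlocking. */
static inline void ipi_handled(struct csd_sketch *csd)
{
	csd->locked = false;
}

static inline void shared_csd_example(void)
{
	struct csd_sketch sender_csd = { false, -1 };

	/* First kick goes to CPU 1, which has interrupts disabled. */
	assert(send_async(&sender_csd, 1) == 0);
	/* Second kick targets CPU 2 but reuses the sender's csd: dropped. */
	assert(send_async(&sender_csd, 2) == -EBUSY);
	/* Only once CPU 1 has handled its IPI can CPU 2 be kicked. */
	ipi_handled(&sender_csd);
	assert(send_async(&sender_csd, 2) == 0);
}
```

In this model the kick to CPU 2 is lost even though CPU 2 itself was never busy, which is the spurious -EBUSY being described.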




But I believe we can still keep the remote csd if nohz_flags() are
strictly only set before the IPI and strictly only cleared from it.

And I still don't understand why trigger_load_balance() raises the
softirq without setting the current CPU as ilb. run_rebalance_domains()
thus ignores it most of the time in the end, or it spuriously clears the
nohz_flags set by an IPI sender. Or there is something I misunderstood
there.

(Haven't checked the wake up case yet).

Re: [PATCH] ASoC: fsl: imx-pcm-dma: Don't request dma channel in probe

2020-05-21 Thread Shengjiu Wang

On Wed, May 20, 2020 at 8:38 PM Mark Brown  wrote:
>
> On Wed, May 20, 2020 at 07:22:19PM +0800, Shengjiu Wang wrote:
>
> > I see some drivers also request the DMA channel in open() or hw_params().
> > How can they avoid the deferred probe issue?
> > For example:
> > sound/arm/pxa2xx-pcm-lib.c
> > sound/soc/sprd/sprd-pcm-dma.c
>
> Other drivers having problems means those drivers should be fixed, not
> that we should copy the problems.  In the case of the PXA driver that's
> very old code which predates deferred probe by I'd guess a decade.

Thanks.

For the FE-BE case, do you have any suggestion for how to fix it?

With the DMA1->ASRC->DMA2->ESAI case, DMA1->ASRC->DMA2 is in the FE
and ESAI is in the BE. When the ESAI driver probes, the DMA3 channel is
created with ESAI's "dma:tx" (the DMA3 channel is not used in this
FE-BE case). When the FE-BE starts up, the DMA2 channel is created; it
needs ESAI's "dma:tx", so the warning comes out.

best regards
wang shengjiu

Re: Endless soft-lockups for compiling workload since next-20200519

2020-05-21 Thread Peter Zijlstra

On Thu, May 21, 2020 at 12:49:37PM +0200, Peter Zijlstra wrote:
> On Thu, May 21, 2020 at 11:39:39AM +0200, Peter Zijlstra wrote:
> > On Thu, May 21, 2020 at 02:40:36AM +0200, Frederic Weisbecker wrote:
> 
> > This:
> > 
> > > smp_call_function_single_async() {      smp_call_function_single_async() {
> > >     // verified csd->flags != CSD_LOCK      // verified csd->flags != CSD_LOCK
> > >     csd->flags = CSD_LOCK                   csd->flags = CSD_LOCK
> > 
> > concurrent smp_call_function_single_async() using the same csd is what
> > I'm looking at as well.
> 
> So something like this ought to cure the fundamental problem and make
> smp_call_function_single_async() more user friendly, but also more
> expensive.
> 
> The problem is that while the ILB case is easy to fix, I can't seem to
> find an equally nice solution for the ttwu_remote_queue() case; that
> would basically require sticking the wake_csd in task_struct, I'll also
> post that.
> 
> So it's either this:

Or this:

---
 include/linux/sched.h | 4 
 kernel/sched/core.c   | 7 ---
 kernel/sched/fair.c   | 2 +-
 kernel/sched/sched.h  | 1 -
 4 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f38d62c4632c..136ee400b568 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -696,6 +696,10 @@ struct task_struct {
        struct uclamp_se                uclamp[UCLAMP_CNT];
 #endif
 
+#ifdef CONFIG_SMP
+   call_single_data_t  wake_csd;
+#endif
+
 #ifdef CONFIG_PREEMPT_NOTIFIERS
/* List of struct preempt_notifier: */
struct hlist_head   preempt_notifiers;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5b286469e26e..a7129652e89b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2320,7 +2320,7 @@ static void ttwu_queue_remote(struct task_struct *p, int 
cpu, int wake_flags)
 
if (llist_add(&p->wake_entry, &rq->wake_list)) {
if (!set_nr_if_polling(rq->idle))
-   smp_call_function_single_async(cpu, &rq->wake_csd);
+   smp_call_function_single_async(cpu, &p->wake_csd);
else
trace_sched_wake_idle_without_ipi(cpu);
}
@@ -2921,6 +2921,9 @@ int sched_fork(unsigned long clone_flags, struct 
task_struct *p)
 #endif
 #if defined(CONFIG_SMP)
p->on_cpu = 0;
+   p->wake_csd = (struct __call_single_data) {
+   .func = wake_csd_func,
+   };
 #endif
init_task_preempt_count(p);
 #ifdef CONFIG_SMP
@@ -6723,8 +6726,6 @@ void __init sched_init(void)
rq->avg_idle = 2*sysctl_sched_migration_cost;
rq->max_idle_balance_cost = sysctl_sched_migration_cost;
 
-   rq_csd_init(rq, &rq->wake_csd, wake_csd_func);
-
INIT_LIST_HEAD(&rq->cfs_tasks);
 
rq_attach_root(rq, &def_root_domain);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 01f94cf52783..b6d8a7b991f0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10033,7 +10033,7 @@ static void kick_ilb(unsigned int flags)
 * is idle. And the softirq performing nohz idle load balance
 * will be run before returning from the IPI.
 */
-   smp_call_function_single_async(ilb_cpu, &cpu_rq(ilb_cpu)->nohz_csd);
+   smp_call_function_single_async(ilb_cpu, &this_rq()->nohz_csd);
 }
 
 /*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f7ab6334e992..c35f0ef43ab0 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1021,7 +1021,6 @@ struct rq {
 #endif
 
 #ifdef CONFIG_SMP
-   call_single_data_t  wake_csd;
struct llist_head   wake_list;
 #endif

Re: Endless soft-lockups for compiling workload since next-20200519

2020-05-21 Thread Peter Zijlstra

On Thu, May 21, 2020 at 11:39:39AM +0200, Peter Zijlstra wrote:
> On Thu, May 21, 2020 at 02:40:36AM +0200, Frederic Weisbecker wrote:

> This:
> 
> > smp_call_function_single_async() {      smp_call_function_single_async() {
> >     // verified csd->flags != CSD_LOCK      // verified csd->flags != CSD_LOCK
> >     csd->flags = CSD_LOCK                   csd->flags = CSD_LOCK
> 
> concurrent smp_call_function_single_async() using the same csd is what
> I'm looking at as well.

So something like this ought to cure the fundamental problem and make
smp_call_function_single_async() more user friendly, but also more
expensive.

The problem is that while the ILB case is easy to fix, I can't seem to
find an equally nice solution for the ttwu_remote_queue() case; that
would basically require sticking the wake_csd in task_struct, I'll also
post that.

So it's either this:

---
 kernel/smp.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 84303197caf9..d1ca2a2d1cc7 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -109,6 +109,12 @@ static __always_inline void 
csd_lock_wait(call_single_data_t *csd)
smp_cond_load_acquire(&csd->flags, !(VAL & CSD_FLAG_LOCK));
 }
 
+/*
+ * csd_lock() can use non-atomic operations to set CSD_FLAG_LOCK because its
+ * users are careful to only use CPU-local data. IOW, there is no cross-cpu
+ * lock usage. Also, you're not allowed to use smp_call_function*() from IRQs,
+ * and must be extra careful from SoftIRQ.
+ */
 static __always_inline void csd_lock(call_single_data_t *csd)
 {
csd_lock_wait(csd);
@@ -318,7 +324,7 @@ EXPORT_SYMBOL(smp_call_function_single);
 
 /**
  * smp_call_function_single_async(): Run an asynchronous function on a
- *  specific CPU.
+ *  specific CPU.
  * @cpu: The CPU to run on.
  * @csd: Pre-allocated and setup data structure
  *
@@ -339,18 +345,23 @@ EXPORT_SYMBOL(smp_call_function_single);
  */
 int smp_call_function_single_async(int cpu, call_single_data_t *csd)
 {
+   unsigned int csd_flags;
int err = 0;
 
preempt_disable();
 
-   if (csd->flags & CSD_FLAG_LOCK) {
+   /*
+* Unlike the regular smp_call_function*() APIs, this one is actually
+* usable from IRQ context, also the -EBUSY return value suggests
+* it is safe to share csd's.
+*/
+   csd_flags = READ_ONCE(csd->flags);
+   if (csd_flags & CSD_FLAG_LOCK ||
+   cmpxchg(&csd->flags, csd_flags, csd_flags | CSD_FLAG_LOCK) != csd_flags) {
err = -EBUSY;
goto out;
}
 
-   csd->flags = CSD_FLAG_LOCK;
-   smp_wmb();
-
err = generic_exec_single(cpu, csd, csd->func, csd->info);
 
 out:
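For readers less familiar with the pattern, here is a userspace sketch of the idea behind the hunk above, written with C11 atomics rather than the kernel's READ_ONCE()/cmpxchg(). The type and function names (struct csd_sketch, csd_try_lock(), csd_unlock()) and the flag value are assumptions made for illustration only. The point is that "check the flag, then set it" becomes a single compare-and-exchange, so two concurrent callers cannot both believe they own the csd.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

#define CSD_FLAG_LOCK 0x01u	/* assumed value, for illustration only */

/* Hypothetical stand-in for call_single_data_t. */
struct csd_sketch {
	atomic_uint flags;
};

/* Returns 0 if this caller took the lock, -EBUSY if someone else holds it. */
static inline int csd_try_lock(struct csd_sketch *csd)
{
	unsigned int old = atomic_load_explicit(&csd->flags,
						memory_order_relaxed);

	/* Fast path: already locked, no need to attempt the exchange. */
	if (old & CSD_FLAG_LOCK)
		return -EBUSY;

	/* Only one of several racing callers can win this exchange. */
	if (!atomic_compare_exchange_strong(&csd->flags, &old,
					    old | CSD_FLAG_LOCK))
		return -EBUSY;

	return 0;
}

/* Models the IPI handler releasing the csd once the callback has run. */
static inline void csd_unlock(struct csd_sketch *csd)
{
	atomic_fetch_and_explicit(&csd->flags, ~CSD_FLAG_LOCK,
				  memory_order_release);
}

static inline void csd_lock_example(void)
{
	struct csd_sketch csd = { 0 };

	assert(csd_try_lock(&csd) == 0);	/* first caller wins */
	assert(csd_try_lock(&csd) == -EBUSY);	/* second caller is refused */
	csd_unlock(&csd);
	assert(csd_try_lock(&csd) == 0);	/* usable again after unlock */
}
```

As noted in the mail, the price is an atomic read-modify-write on every call, where the original code used a plain store plus smp_wmb().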

Re: [PATCH 0/2] powerpc: Remove support for ppc405/440 Xilinx platforms

2020-05-21 Thread Christophe Leroy





On 21/05/2020 at 09:02, Michael Ellerman wrote:
> Arnd Bergmann  writes:
>> +On Wed, Apr 8, 2020 at 2:04 PM Michael Ellerman  wrote:
>>> Benjamin Herrenschmidt  writes:
>>>> On Fri, 2020-04-03 at 15:59 +1100, Michael Ellerman wrote:
>>>>> Benjamin Herrenschmidt  writes:
>>>> IBM still put 40x cores inside POWER chips no ?
>>>
>>> Oh yeah that's true. I guess most folks don't know that, or that they
>>> run RHEL on them.
>>
>> Is there a reason for not having those dts files in mainline then?
>> If nothing else, it would document what machines are still being
>> used with future kernels.
>
> Sorry that part was a joke :D  Those chips don't run Linux.
>

Nice to know :)

What's the plan then, do we still want to keep 40x in the kernel ?

If yes, is it ok to drop the oldies anyway as done in my series 
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=172630 ?


(Note that this series will conflict with my series on hugepages on 8xx 
due to the PTE_ATOMIC_UPDATES stuff. I can rebase the 40x modernisation 
series on top of the 8xx hugepages series if it is worth it)


Christophe

Re: Endless soft-lockups for compiling workload since next-20200519

2020-05-21 Thread Peter Zijlstra

On Thu, May 21, 2020 at 02:40:36AM +0200, Frederic Weisbecker wrote:

>                                             atomic_fetch_or(, nohz_flags(0))
> softirq() {                             #VMEXIT or anything that could stop a CPU for a while
>     run_rebalance_domain() {
>         nohz_idle_balance() {
>             atomic_andnot(NOHZ_KICK_MASK, nohz_flag(0))

I'm an idiot and didn't have enough wake-up-juice; I missed that andnot
clearing the flag again.

Yes, fun fun fun..

Re: Endless soft-lockups for compiling workload since next-20200519

2020-05-21 Thread Peter Zijlstra

On Thu, May 21, 2020 at 02:40:36AM +0200, Frederic Weisbecker wrote:
> On Wed, May 20, 2020 at 02:50:56PM +0200, Peter Zijlstra wrote:
> > On Tue, May 19, 2020 at 11:58:17PM -0400, Qian Cai wrote:
> > > Just a heads up. Repeatedly compiling kernels for a while would trigger
> > > endless soft-lockups since next-20200519 on both x86_64 and powerpc.
> > > .config are in,
> > 
> > Could be 90b5363acd47 ("sched: Clean up scheduler_ipi()"), although I've
> > not seen anything like that myself. Let me go have a look.
> > 
> > 
> > In as far as the logs are readable (they're a wrapped mess, please don't
> > do that!), they contain very little useful, as is typical with IPIs :/
> > 
> > > [ 1167.993773][C1] WARNING: CPU: 1 PID: 0 at kernel/smp.c:127
> > > flush_smp_call_function_queue+0x1fa/0x2e0
> 
> So I've tried to think of a race that could produce that and here is
> the only thing I could come up with. It's a bit complicated unfortunately:

This:

> smp_call_function_single_async() {      smp_call_function_single_async() {
>     // verified csd->flags != CSD_LOCK      // verified csd->flags != CSD_LOCK
>     csd->flags = CSD_LOCK                   csd->flags = CSD_LOCK

concurrent smp_call_function_single_async() using the same csd is what
I'm looking at as well. Now in the ILB case there is an easy cure:

(because there is only a single ilb target)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 01f94cf52783..b6d8a7b991f0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10033,7 +10033,7 @@ static void kick_ilb(unsigned int flags)
 * is idle. And the softirq performing nohz idle load balance
 * will be run before returning from the IPI.
 */
-   smp_call_function_single_async(ilb_cpu, &cpu_rq(ilb_cpu)->nohz_csd);
+   smp_call_function_single_async(ilb_cpu, &this_rq()->nohz_csd);
 }
 
 /*

Qian, can you give that a spin?

But I'm still not convinced of your scenario:

>                                         kick_ilb() {
>                                             atomic_fetch_or(, nohz_flags(0))

> atomic_fetch_or(, nohz_flags(0))        #VMENTER
> smp_call_function_single_async() {      smp_call_function_single_async() {
>     // verified csd->flags != CSD_LOCK      // verified csd->flags != CSD_LOCK
>     csd->flags = CSD_LOCK                   csd->flags = CSD_LOCK

Note that we check the return value of atomic_fetch_or() and bail if
someone else set a flag in KICK_MASK before us.

Aah, I suppose you're saying this can happen when:

  !(flags & NOHZ_KICK_MASK)

? That's not supposed to happen though.
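A userspace sketch of that check, using C11 atomics in place of the kernel's atomic_t API. The helper name claim_kick() and the mask value are assumptions for illustration. It shows both halves of the argument: when the requested flags include a KICK_MASK bit, only the first caller proceeds, and when they do not, nothing stops two callers from both proceeding.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NOHZ_KICK_MASK 0x3u	/* assumed value, for illustration only */

/*
 * Hypothetical helper modelled on the check in kick_ilb(): set the
 * requested flags and report whether this caller should send the IPI,
 * i.e. whether no kick flag was already pending.
 */
static inline bool claim_kick(atomic_uint *nohz_flags, unsigned int flags)
{
	unsigned int prev = atomic_fetch_or(nohz_flags, flags);

	return !(prev & NOHZ_KICK_MASK);
}

static inline void claim_kick_example(void)
{
	atomic_uint with_kick = 0;
	atomic_uint without_kick = 0;

	/* Flags inside KICK_MASK: the second caller sees them and bails. */
	assert(claim_kick(&with_kick, 0x1u));
	assert(!claim_kick(&with_kick, 0x1u));

	/* !(flags & NOHZ_KICK_MASK): both callers think they may proceed. */
	assert(claim_kick(&without_kick, 0x0u));
	assert(claim_kick(&without_kick, 0x0u));
}
```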


Anyway, let me go stare at the remote wake-up case, because I'm afraid
that might have the same problem too...

Re: [PATCH] input: i8042: Remove special PowerPC handling

2020-05-21 Thread Michael Ellerman

Dmitry Torokhov  writes:
> Hi Michael,
>
> On Wed, May 20, 2020 at 04:07:00PM +1000, Michael Ellerman wrote:
>> [ + Dmitry & linux-input ]
>> 
>> Nathan Chancellor  writes:
>> > This causes a build error with CONFIG_WALNUT because kb_cs and kb_data
>> > were removed in commit 917f0af9e5a9 ("powerpc: Remove arch/ppc and
>> > include/asm-ppc").
>> >
>> > ld.lld: error: undefined symbol: kb_cs
>> >> referenced by i8042-ppcio.h:28 (drivers/input/serio/i8042-ppcio.h:28)
>> >> input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a
>> >> referenced by i8042-ppcio.h:28 (drivers/input/serio/i8042-ppcio.h:28)
>> >> input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a
>> >> referenced by i8042-ppcio.h:28 (drivers/input/serio/i8042-ppcio.h:28)
>> >> input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a
>> >
>> > ld.lld: error: undefined symbol: kb_data
>> >> referenced by i8042.c:309 (drivers/input/serio/i8042.c:309)
>> >> input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a
>> >> referenced by i8042-ppcio.h:33 (drivers/input/serio/i8042-ppcio.h:33)
>> >> input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a
>> >> referenced by i8042.c:319 (drivers/input/serio/i8042.c:319)
>> >> input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a
>> >> referenced 15 more times
>> >
>> > Presumably since nobody has noticed this for the last 12 years, there is
>> > not anyone actually trying to use this driver so we can just remove this
>> > special walnut code and use the generic header so it builds for all
>> > configurations.
>> >
>> > Fixes: 917f0af9e5a9 ("powerpc: Remove arch/ppc and include/asm-ppc")
>> > Reported-by: kbuild test robot 
>> > Signed-off-by: Nathan Chancellor 
>> > ---
>> >  drivers/input/serio/i8042-ppcio.h | 57 ---
>> >  drivers/input/serio/i8042.h   |  2 --
>> >  2 files changed, 59 deletions(-)
>> >  delete mode 100644 drivers/input/serio/i8042-ppcio.h
>> 
>> This LGTM.
>> 
>> Acked-by: Michael Ellerman  (powerpc)
>> 
>> I assumed drivers/input/serio would be pretty quiet, but there's
>> actually some commits to it in linux-next. So perhaps this should go via
>> the input tree.
>> 
>> Dmitry do you want to take this, or should I take it via powerpc?
>> 
>> Original patch is here:
>>   
>> https://lore.kernel.org/lkml/20200518181043.3363953-1-natechancel...@gmail.com
>
> I'm fine with you taking it through powerpc.
>
> Acked-by: Dmitry Torokhov 

Thanks.

> Also, while I have your attention ;), could you please ack or take
> https://lore.kernel.org/lkml/20191002214854.GA114387@dtor-ws/ as I
> believe this is the last user of the input_polled_dev API and I would like
> to drop it from the tree.

Ooof. Sorry, you are very patient :)

I have put it in my next-test.

In future feel free to send me mail off-list, or ping me on irc, if I
haven't picked something up after several months!

cheers

[PATCH] powerpc/4xx: Don't unmap NULL mbase

2020-05-21 Thread Michael Ellerman

From: huhai 

Signed-off-by: huhai 
---
 arch/powerpc/platforms/4xx/pci.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/4xx/pci.c b/arch/powerpc/platforms/4xx/pci.c
index e6e2adcc7b64..c13d64c3b019 100644
--- a/arch/powerpc/platforms/4xx/pci.c
+++ b/arch/powerpc/platforms/4xx/pci.c
@@ -1242,7 +1242,7 @@ static void __init ppc460sx_pciex_check_link(struct 
ppc4xx_pciex_port *port)
if (mbase == NULL) {
printk(KERN_ERR "%pOF: Can't map internal config space !",
port->node);
-   goto done;
+   return;
}
 
while (attempt && (0 == (in_le32(mbase + PECFG_460SX_DLLSTA)
@@ -1252,9 +1252,7 @@ static void __init ppc460sx_pciex_check_link(struct 
ppc4xx_pciex_port *port)
}
if (attempt)
port->link = 1;
-done:
iounmap(mbase);
-
 }
 
 static struct ppc4xx_pciex_hwops ppc460sx_pcie_hwops __initdata = {
-- 
2.25.1

Re: [PATCH 2/3] powerpc/pci: unmap legacy INTx interrupts of passthrough IO adapters

2020-05-21 Thread Cédric Le Goater

On 4/29/20 9:51 AM, Cédric Le Goater wrote:
> When a passthrough IO adapter is removed from a pseries machine using
> hash MMU and the XIVE interrupt mode, the POWER hypervisor, pHyp,
> expects the guest OS to have cleared all page table entries related to
> the adapter. If some are still present, the RTAS call which isolates
> the PCI slot returns error 9001 "valid outstanding translations" and
> the removal of the IO adapter fails.
> 
> INTx interrupt numbers need special care because Linux maps the
> interrupts automatically in the Linux interrupt number space if they
> are presented in the device tree node describing the IO adapter. These
> interrupts are not un-mapped automatically and in case of an hot-plug
> adapter, the PCI hot-plug layer needs to handle the cleanup to make
> sure that all the page table entries of the XIVE ESB pages are
> cleared.
> 
> Cc: "Oliver O'Halloran" 
> Signed-off-by: Cédric Le Goater 


We did some tests with different passthrough adapters under LPARs (PowerVM) 
and KVM guests P8 (HPT) and P9 (HPT+Radix). Baremetal behaves correctly. 
But I would feel more comfortable if someone could give a try (PATCH1-2) 
on a different system.


Thanks,

C. 


> ---
>  arch/powerpc/kernel/pci-hotplug.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/pci-hotplug.c 
> b/arch/powerpc/kernel/pci-hotplug.c
> index bf83f76563a3..9e9c6befd7ea 100644
> --- a/arch/powerpc/kernel/pci-hotplug.c
> +++ b/arch/powerpc/kernel/pci-hotplug.c
> @@ -57,6 +57,8 @@ void pcibios_release_device(struct pci_dev *dev)
>   struct pci_controller *phb = pci_bus_to_host(dev->bus);
>   struct pci_dn *pdn = pci_get_pdn(dev);
>  
> + irq_dispose_mapping(dev->irq);
> +
>   eeh_remove_device(dev);
>  
>   if (phb->controller_ops.release_device)
>

Re: [PATCH 0/2] powerpc: Remove support for ppc405/440 Xilinx platforms

2020-05-21 Thread Michael Ellerman

Arnd Bergmann  writes:
> +On Wed, Apr 8, 2020 at 2:04 PM Michael Ellerman  wrote:
>> Benjamin Herrenschmidt  writes:
>> > On Fri, 2020-04-03 at 15:59 +1100, Michael Ellerman wrote:
>> >> Benjamin Herrenschmidt  writes:
>> > IBM still put 40x cores inside POWER chips no ?
>>
>> Oh yeah that's true. I guess most folks don't know that, or that they
>> run RHEL on them.
>
> Is there a reason for not having those dts files in mainline then?
> If nothing else, it would document what machines are still being
> used with future kernels.

Sorry that part was a joke :D  Those chips don't run Linux.

cheers
