Re: Does SMP work at all on 40x ?

2019-01-31 Thread Benjamin Herrenschmidt
On Wed, 2019-01-30 at 08:16 +0100, Christophe Leroy wrote:
> In transfer_to_handler() (entry_32.S), we have:
> 
> #if defined(CONFIG_40x) || defined(CONFIG_BOOKE)
> ...
> #ifdef CONFIG_SMP
>   CURRENT_THREAD_INFO(r9, r1)
>   lwz r9,TI_CPU(r9)
>   slwir9,r9,3
>   add r11,r11,r9
> #endif
> #endif
> 
> When running this piece of code, MMU translation is off. But r9 contains 
> the virtual addr of current_thread_info, so unless I miss something, 
> this cannot work on the 40x, can it ?
> 
> On CONFIG_BOOKE it works because phys addr = virt addr

There is no 40x SMP that I am aware of.

Cheers,
Ben.




Re: fix a layering violation in videobuf2 and improve dma_map_resource v2

2019-01-31 Thread Marek Szyprowski
Hi All,

On 2019-01-18 12:37, Christoph Hellwig wrote:
> Hi all,
>
> this series fixes a rather gross layering violation in videobuf2, which
> pokes into arm DMA mapping internals to get a DMA address for memory that
> does not have a page structure, and to do so fixes up the dma_map_resource
> implementation to not provide a somewhat dangerous default and improve
> the error handling.
>
> Changes since v1:
>  - don't apply bus offsets in dma_direct_map_resource

Works fine on older Exynos based boards with IOMMU and CMA disabled.

Tested-by: Marek Szyprowski 

Best regards
-- 
Marek Szyprowski, PhD
Samsung R Institute Poland



Re: [PATCH] powerpc/mm: Add _PAGE_SAO to _PAGE_CACHE_CTL mask

2019-01-31 Thread Alexey Kardashevskiy



On 31/01/2019 00:35, Michael Ellerman wrote:
> Reza Arbab  writes:
> 
>> On Tue, Jan 29, 2019 at 08:37:28PM +0530, Aneesh Kumar K.V wrote:
>>> Not sure what the fix is about. We set the related hash pte flags via
>>>
>>> if ((pteflags & _PAGE_CACHE_CTL) == _PAGE_TOLERANT)
>>> rflags |= HPTE_R_I;
>>> else if ((pteflags & _PAGE_CACHE_CTL) == _PAGE_NON_IDEMPOTENT)
>>> rflags |= (HPTE_R_I | HPTE_R_G);
>>> else if ((pteflags & _PAGE_CACHE_CTL) == _PAGE_SAO)
>>> rflags |= (HPTE_R_W | HPTE_R_I | HPTE_R_M);
>>
>> Again, nothing broken here, just a code readability thing. As Alexey 
>> (and Charlie) noted, given the above it is a little confusing to define 
>> _PAGE_CACHE_CTL this way:
>>
>>   #define _PAGE_CACHE_CTL  (_PAGE_NON_IDEMPOTENT | _PAGE_TOLERANT)
> 
> Yeah that's confusing I agree.
> 
> It's not really a maintainability thing, because those bits are in the
> architecture, so they can't change.
> 
>> I like Alexey's idea, maybe just use a literal?
>>
>>   #define _PAGE_CACHE_CTL 0x30
> 
> I prefer your original patch. It serves as documentation on what values
> we expect to see in that field.


As documentation, it gives an idea that there can be both
_PAGE_NON_IDEMPOTENT and _PAGE_TOLERANT set which is not true. Putting
possible values in the comment next to "#define _PAGE_CACHE_CTL" will
document it properly imho.



-- 
Alexey


[PATCH RFC v2] powerpc/64s: rewriting interrupt entry code

2019-01-31 Thread Nicholas Piggin
Finally got around to making this more or less work. I didn't quite
know where it would end up, so I haven't got patches in clean pieces
yet. But for this RFC I would rather consider the end result from a
higher level (mainly of kernel/exceptions-64s.S new style of macros).

There are two motivations for this. First of all removing some of the
CPP spaghetti of nested macros and macro parameters that themseves are
passed other defines and things. It's totally non-linear and several
levels of indirection to work out what is going on or how to change
anything in 64s.h.

Improving this is done by instead using gas .macros. These have two
really nice properties that you can't do with CPP: first is that you
can conditionally expand parts of them based on expressions; second is
that you can modify parts of them using CPP. Oh also another one --
they do not have to all be put onto one line and separated with ';'!
Nice.

[ Not sure I like asm with indentations but maybe not used to it.
  Might change that back to flat because there is not _too_ much
  nesting. ]

Anyway, it sounds wonderful, but the reality is there's still some
twisty code when you actually implment everything. Some non-linear
indirections I've put in are "additions" for masked handlers (e.g.,
mask EE for some, branch to soft-NMI watchdog for others), and one
to be able to override the stack.

Other than those, it's quite linear albeit complex, you can step
through the macros to see what code will come out.

Generated code is very different too for hopefully some good reasons.
Mostly ignore that for now. I kind of didn't know how the macros
would turn out without implementing the code changes I wanted, but
I'll go back and try to do at least some bits incrementally.

One significant and not so nice thing is that registers are not in
the same place between different handlers. SRR0 may be in r18 in
one handler, and r19 in another. The reason for this is that different
handlers want to load different variety of registers initially, but
I want to save off scratch registers in contiguous ranges to allow
coalesing in the store queue. Interrupt entry has a lot of stores
to save regs, and then it'll go and call something with a spinlock
or atomic_add_unless and have to wait for them.

So symbolic registers are assigned a number for each interrupt when
they are defined. It's a bit ugly so I'm thinking about it. The other
option to keep this optimization is to instead store registers to
different locations in the scratch save area (so SRR0 would always be
r18, but r18 may be saved at EXGEN+32 or EXGEN+40), although that is
a lot more ugliness to do the saving and still frustrates code
sharing, but it may give a nicer result.

Thanks,
Nick
---
 include/asm/exception-64s.h |  622 ---
 include/asm/hw_irq.h|   93 -
 include/asm/ppc_asm.h   |   14 
 include/asm/ptrace.h|3 
 kernel/dbell.c  |3 
 kernel/entry_64.S   |6 
 kernel/exceptions-64s.S | 3622 ++--
 kernel/irq.c|   18 
 kernel/time.c   |   66 
 kernel/traps.c  |   17 
 mm/fault.c  |   10 
 perf/core-book3s.c  |   13 
 12 files changed, 2997 insertions(+), 1490 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 3b4767ed3ec5..4be71c2504fc 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -24,48 +24,18 @@
  *  as published by the Free Software Foundation; either version
  *  2 of the License, or (at your option) any later version.
  */
-/*
- * The following macros define the code that appears as
- * the prologue to each of the exception handlers.  They
- * are split into two parts to allow a single kernel binary
- * to be used for pSeries and iSeries.
- *
- * We make as much of the exception code common between native
- * exception handlers (including pSeries LPAR) and iSeries LPAR
- * implementations as possible.
- */
-#include 
-#include 
 
-/* PACA save area offsets (exgen, exmc, etc) */
-#define EX_R9  0
-#define EX_R10 8
-#define EX_R11 16
-#define EX_R12 24
-#define EX_R13 32
-#define EX_DAR 40
-#define EX_DSISR   48
-#define EX_CCR 52
-#define EX_CFAR56
-#define EX_PPR 64
-#if defined(CONFIG_RELOCATABLE)
-#define EX_CTR 72
-#define EX_SIZE10  /* size in u64 units */
-#else
-#define EX_SIZE9   /* size in u64 units */
-#endif
+#include 
 
 /*
- * maximum recursive depth of MCE exceptions
+ * Size of register save areas in paca
  */
-#define MAX_MCE_DEPTH  4
+#define EX_SIZE12
 
 /*
- * EX_R3 is only used by the bad_stack handler. bad_stack reloads and
- * saves DAR from SPRN_DAR, and EX_DAR is not used. So EX_R3 can overlap
- * with EX_DAR.
+ * maximum recursive depth of MCE 

[PATCH] powerpc/powernv: Escalate reset when IODA reset fails

2019-01-31 Thread Oliver O'Halloran
The IODA reset is used to flush out any OS controlled state from the PHB.
This reset can fail if a PHB fatal error has occurred in early boot,
probably due to a because of a bad device. We already do a fundemental
reset of the device in some cases, so this patch just adds a test to force
a full reset if firmware reports an error when performing the IODA reset.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 1d6406a..53982f8 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3943,9 +3943,12 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
 * shutdown PCI devices correctly. We already got IODA table
 * cleaned out. So we have to issue PHB reset to stop all PCI
 * transactions from previous kernel. The ppc_pci_reset_phbs
-* kernel parameter will force this reset too.
+* kernel parameter will force this reset too. Additionally,
+* if the IODA reset above failed then use a bigger hammer.
+* This can happen if we get a PHB fatal error in very early
+* boot.
 */
-   if (is_kdump_kernel() || pci_reset_phbs) {
+   if (is_kdump_kernel() || pci_reset_phbs || rc) {
pr_info("  Issue PHB reset ...\n");
pnv_eeh_phb_reset(hose, EEH_RESET_FUNDAMENTAL);
pnv_eeh_phb_reset(hose, EEH_RESET_DEACTIVATE);
-- 
2.9.4



Re: [PATCH v02] powerpc/pseries: Check for ceded CPU's during LPAR migration

2019-01-31 Thread Tyrel Datwyler
On 01/31/2019 02:21 PM, Tyrel Datwyler wrote:
> On 01/31/2019 01:53 PM, Michael Bringmann wrote:
>> On 1/30/19 11:38 PM, Michael Ellerman wrote:
>>> Michael Bringmann  writes:
 This patch is to check for cede'ed CPUs during LPM.  Some extreme
 tests encountered a problem ehere Linux has put some threads to
 sleep (possibly to save energy or something), LPM was attempted,
 and the Linux kernel didn't awaken the sleeping threads, but issued
 the H_JOIN for the active threads.  Since the sleeping threads
 are not awake, they can not issue the expected H_JOIN, and the
 partition would never suspend.  This patch wakes the sleeping
 threads back up.
>>>
>>> I'm don't think this is the right solution.
>>>
>>> Just after your for loop we do an on_each_cpu() call, which sends an IPI
>>> to every CPU, and that should wake all CPUs up from CEDE.
>>>
>>> If that's not happening then there is a bug somewhere, and we need to
>>> work out where.
>>
>> Let me explain the scenario of the LPM case that Pete Heyrman found, and
>> that Nathan F. was working upon, previously.
>>
>> In the scenario, the partition has 5 dedicated processors each with 8 threads
>> running.
> 
> Do we CEDE processors when running dedicated? I thought H_CEDE was part of the
> Shared Processor LPAR option.

Looks like the cpuidle-pseries driver uses CEDE with dedicated processors as
long as firmware supports SPLPAR option.

> 
>>
>> From the PHYP data we can see that on VP 0, threads 3, 4, 5, 6 and 7 issued
>> a H_CEDE requesting to save energy by putting the requesting thread into
>> sleep mode.  In this state, the thread will only be awakened by H_PROD from
>> another running thread or from an external user action (power off, reboot
>> and such).  Timers and external interrupts are disabled in this mode.
> 
> Not according to PAPR. A CEDE'd processor should awaken if signaled by 
> external
> interrupt such as decrementer or IPI as well.

This statement should still apply though. From PAPR:

14.11.3.3 H_CEDE
The architectural intent of this hcall() is to have the virtual processor, which
has no useful work to do, enter a wait state ceding its processor capacity to
other virtual processors until some useful work appears, signaled either through
an interrupt or a prod hcall(). To help the caller reduce race conditions, this
call may be made with interrupts disabled but the semantics of the hcall()
enable the virtual processor’s interrupts so that it may always receive wake up
interrupt signals.

-Tyrel

> 
> -Tyrel
> 
>>
>> About 3 seconds later, as part of the LPM operation, the other 35 threads
>> have all issued a H_JOIN request.  Join is part of the LPM process where
>> the threads suspend themselves as part of the LPM operation so the partition
>> can be migrated to the target server.
>>
>> So, the current state is the the OS has suspended the execution of all the
>> threads in the partition without successfully suspending all threads as part
>> of LPM.
>>
>> Net, OS has an issue where they suspended every processor thread so nothing
>> can run.
>>
>> This appears to be slightly different than the previous LPM stalls we have
>> seen where the migration stalls because of cpus being taken offline and not
>> making the H_JOIN call.
>>
>> In this scenario we appear to have CPUs that have done an H_CEDE prior to
>> the LPM. For these CPUs we would need to do a H_PROD to wake them back up
>> so they can do a H_JOIN and allow the LPM to continue.
>>
>> The problem is that Linux has some threads that they put to sleep (probably
>> to save energy or something), LPM was attempted, Linux didn't awaken the
>> sleeping threads but issued the H_JOIN for the active threads.  Since the
>> sleeping threads don't issue the H_JOIN the partition will never suspend.
>>
>> I am checking again with Pete regarding your concerns.
>>
>> Thanks.
>>
>>>
>>>
 diff --git a/arch/powerpc/include/asm/plpar_wrappers.h 
 b/arch/powerpc/include/asm/plpar_wrappers.h
 index cff5a41..8292eff 100644
 --- a/arch/powerpc/include/asm/plpar_wrappers.h
 +++ b/arch/powerpc/include/asm/plpar_wrappers.h
 @@ -26,10 +26,8 @@ static inline void set_cede_latency_hint(u8 
 latency_hint)
get_lppaca()->cede_latency_hint = latency_hint;
  }
  
 -static inline long cede_processor(void)
 -{
 -  return plpar_hcall_norets(H_CEDE);
 -}
 +int cpu_is_ceded(int cpu);
 +long cede_processor(void);
  
  static inline long extended_cede_processor(unsigned long latency_hint)
  {
 diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
 index de35bd8f..fea3d21 100644
 --- a/arch/powerpc/kernel/rtas.c
 +++ b/arch/powerpc/kernel/rtas.c
 @@ -44,6 +44,7 @@
  #include 
  #include 
  #include 
 +#include 
  
  /* This is here deliberately so it's only used in this file */
  void enter_rtas(unsigned long);
 @@ -942,7 +943,7 @@ int 

Re: [PATCH v02] powerpc/pseries: Check for ceded CPU's during LPAR migration

2019-01-31 Thread Michael Bringmann
On 1/31/19 4:21 PM, Tyrel Datwyler wrote:
> On 01/31/2019 01:53 PM, Michael Bringmann wrote:
>> On 1/30/19 11:38 PM, Michael Ellerman wrote:
>>> Michael Bringmann  writes:
 This patch is to check for cede'ed CPUs during LPM.  Some extreme
 tests encountered a problem ehere Linux has put some threads to
 sleep (possibly to save energy or something), LPM was attempted,
 and the Linux kernel didn't awaken the sleeping threads, but issued
 the H_JOIN for the active threads.  Since the sleeping threads
 are not awake, they can not issue the expected H_JOIN, and the
 partition would never suspend.  This patch wakes the sleeping
 threads back up.
>>>
>>> I'm don't think this is the right solution.
>>>
>>> Just after your for loop we do an on_each_cpu() call, which sends an IPI
>>> to every CPU, and that should wake all CPUs up from CEDE.
>>>
>>> If that's not happening then there is a bug somewhere, and we need to
>>> work out where.
>>
>> Let me explain the scenario of the LPM case that Pete Heyrman found, and
>> that Nathan F. was working upon, previously.
>>
>> In the scenario, the partition has 5 dedicated processors each with 8 threads
>> running.
> 
> Do we CEDE processors when running dedicated? I thought H_CEDE was part of the
> Shared Processor LPAR option.
> 
>>
>> From the PHYP data we can see that on VP 0, threads 3, 4, 5, 6 and 7 issued
>> a H_CEDE requesting to save energy by putting the requesting thread into
>> sleep mode.  In this state, the thread will only be awakened by H_PROD from
>> another running thread or from an external user action (power off, reboot
>> and such).  Timers and external interrupts are disabled in this mode.
> 
> Not according to PAPR. A CEDE'd processor should awaken if signaled by 
> external
> interrupt such as decrementer or IPI as well.

Checking these points with Pete H.
Thanks.

> 
> -Tyrel
> 
>>
>> About 3 seconds later, as part of the LPM operation, the other 35 threads
>> have all issued a H_JOIN request.  Join is part of the LPM process where
>> the threads suspend themselves as part of the LPM operation so the partition
>> can be migrated to the target server.
>>
>> So, the current state is the the OS has suspended the execution of all the
>> threads in the partition without successfully suspending all threads as part
>> of LPM.
>>
>> Net, OS has an issue where they suspended every processor thread so nothing
>> can run.
>>
>> This appears to be slightly different than the previous LPM stalls we have
>> seen where the migration stalls because of cpus being taken offline and not
>> making the H_JOIN call.
>>
>> In this scenario we appear to have CPUs that have done an H_CEDE prior to
>> the LPM. For these CPUs we would need to do a H_PROD to wake them back up
>> so they can do a H_JOIN and allow the LPM to continue.
>>
>> The problem is that Linux has some threads that they put to sleep (probably
>> to save energy or something), LPM was attempted, Linux didn't awaken the
>> sleeping threads but issued the H_JOIN for the active threads.  Since the
>> sleeping threads don't issue the H_JOIN the partition will never suspend.
>>
>> I am checking again with Pete regarding your concerns.
>>
>> Thanks.
>>
>>>
>>>
 diff --git a/arch/powerpc/include/asm/plpar_wrappers.h 
 b/arch/powerpc/include/asm/plpar_wrappers.h
 index cff5a41..8292eff 100644
 --- a/arch/powerpc/include/asm/plpar_wrappers.h
 +++ b/arch/powerpc/include/asm/plpar_wrappers.h
 @@ -26,10 +26,8 @@ static inline void set_cede_latency_hint(u8 
 latency_hint)
get_lppaca()->cede_latency_hint = latency_hint;
  }
  
 -static inline long cede_processor(void)
 -{
 -  return plpar_hcall_norets(H_CEDE);
 -}
 +int cpu_is_ceded(int cpu);
 +long cede_processor(void);
  
  static inline long extended_cede_processor(unsigned long latency_hint)
  {
 diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
 index de35bd8f..fea3d21 100644
 --- a/arch/powerpc/kernel/rtas.c
 +++ b/arch/powerpc/kernel/rtas.c
 @@ -44,6 +44,7 @@
  #include 
  #include 
  #include 
 +#include 
  
  /* This is here deliberately so it's only used in this file */
  void enter_rtas(unsigned long);
 @@ -942,7 +943,7 @@ int rtas_ibm_suspend_me(u64 handle)
struct rtas_suspend_me_data data;
DECLARE_COMPLETION_ONSTACK(done);
cpumask_var_t offline_mask;
 -  int cpuret;
 +  int cpuret, cpu;
  
if (!rtas_service_present("ibm,suspend-me"))
return -ENOSYS;
 @@ -991,6 +992,11 @@ int rtas_ibm_suspend_me(u64 handle)
goto out_hotplug_enable;
}
  
 +  for_each_present_cpu(cpu) {
 +  if (cpu_is_ceded(cpu))
 +  plpar_hcall_norets(H_PROD, 
 get_hard_smp_processor_id(cpu));
 +  }
>>>
>>> There's a race condition here, there's nothing to prevent the CPUs you
>>> 

Re: [PATCH v02] powerpc/pseries: Check for ceded CPU's during LPAR migration

2019-01-31 Thread Tyrel Datwyler
On 01/31/2019 01:53 PM, Michael Bringmann wrote:
> On 1/30/19 11:38 PM, Michael Ellerman wrote:
>> Michael Bringmann  writes:
>>> This patch is to check for cede'ed CPUs during LPM.  Some extreme
>>> tests encountered a problem ehere Linux has put some threads to
>>> sleep (possibly to save energy or something), LPM was attempted,
>>> and the Linux kernel didn't awaken the sleeping threads, but issued
>>> the H_JOIN for the active threads.  Since the sleeping threads
>>> are not awake, they can not issue the expected H_JOIN, and the
>>> partition would never suspend.  This patch wakes the sleeping
>>> threads back up.
>>
>> I'm don't think this is the right solution.
>>
>> Just after your for loop we do an on_each_cpu() call, which sends an IPI
>> to every CPU, and that should wake all CPUs up from CEDE.
>>
>> If that's not happening then there is a bug somewhere, and we need to
>> work out where.
> 
> Let me explain the scenario of the LPM case that Pete Heyrman found, and
> that Nathan F. was working upon, previously.
> 
> In the scenario, the partition has 5 dedicated processors each with 8 threads
> running.

Do we CEDE processors when running dedicated? I thought H_CEDE was part of the
Shared Processor LPAR option.

> 
> From the PHYP data we can see that on VP 0, threads 3, 4, 5, 6 and 7 issued
> a H_CEDE requesting to save energy by putting the requesting thread into
> sleep mode.  In this state, the thread will only be awakened by H_PROD from
> another running thread or from an external user action (power off, reboot
> and such).  Timers and external interrupts are disabled in this mode.

Not according to PAPR. A CEDE'd processor should awaken if signaled by external
interrupt such as decrementer or IPI as well.

-Tyrel

> 
> About 3 seconds later, as part of the LPM operation, the other 35 threads
> have all issued a H_JOIN request.  Join is part of the LPM process where
> the threads suspend themselves as part of the LPM operation so the partition
> can be migrated to the target server.
> 
> So, the current state is the the OS has suspended the execution of all the
> threads in the partition without successfully suspending all threads as part
> of LPM.
> 
> Net, OS has an issue where they suspended every processor thread so nothing
> can run.
> 
> This appears to be slightly different than the previous LPM stalls we have
> seen where the migration stalls because of cpus being taken offline and not
> making the H_JOIN call.
> 
> In this scenario we appear to have CPUs that have done an H_CEDE prior to
> the LPM. For these CPUs we would need to do a H_PROD to wake them back up
> so they can do a H_JOIN and allow the LPM to continue.
> 
> The problem is that Linux has some threads that they put to sleep (probably
> to save energy or something), LPM was attempted, Linux didn't awaken the
> sleeping threads but issued the H_JOIN for the active threads.  Since the
> sleeping threads don't issue the H_JOIN the partition will never suspend.
> 
> I am checking again with Pete regarding your concerns.
> 
> Thanks.
> 
>>
>>
>>> diff --git a/arch/powerpc/include/asm/plpar_wrappers.h 
>>> b/arch/powerpc/include/asm/plpar_wrappers.h
>>> index cff5a41..8292eff 100644
>>> --- a/arch/powerpc/include/asm/plpar_wrappers.h
>>> +++ b/arch/powerpc/include/asm/plpar_wrappers.h
>>> @@ -26,10 +26,8 @@ static inline void set_cede_latency_hint(u8 latency_hint)
>>> get_lppaca()->cede_latency_hint = latency_hint;
>>>  }
>>>  
>>> -static inline long cede_processor(void)
>>> -{
>>> -   return plpar_hcall_norets(H_CEDE);
>>> -}
>>> +int cpu_is_ceded(int cpu);
>>> +long cede_processor(void);
>>>  
>>>  static inline long extended_cede_processor(unsigned long latency_hint)
>>>  {
>>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>>> index de35bd8f..fea3d21 100644
>>> --- a/arch/powerpc/kernel/rtas.c
>>> +++ b/arch/powerpc/kernel/rtas.c
>>> @@ -44,6 +44,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>  
>>>  /* This is here deliberately so it's only used in this file */
>>>  void enter_rtas(unsigned long);
>>> @@ -942,7 +943,7 @@ int rtas_ibm_suspend_me(u64 handle)
>>> struct rtas_suspend_me_data data;
>>> DECLARE_COMPLETION_ONSTACK(done);
>>> cpumask_var_t offline_mask;
>>> -   int cpuret;
>>> +   int cpuret, cpu;
>>>  
>>> if (!rtas_service_present("ibm,suspend-me"))
>>> return -ENOSYS;
>>> @@ -991,6 +992,11 @@ int rtas_ibm_suspend_me(u64 handle)
>>> goto out_hotplug_enable;
>>> }
>>>  
>>> +   for_each_present_cpu(cpu) {
>>> +   if (cpu_is_ceded(cpu))
>>> +   plpar_hcall_norets(H_PROD, 
>>> get_hard_smp_processor_id(cpu));
>>> +   }
>>
>> There's a race condition here, there's nothing to prevent the CPUs you
>> just PROD'ed from going back into CEDE before you do the on_each_cpu()
>> call below> 
>>> /* Call function on all CPUs.  One of us will make the
>>>  * rtas call
>>>  */
>>> diff --git 

Re: [PATCH v02] powerpc/pseries: Check for ceded CPU's during LPAR migration

2019-01-31 Thread Michael Bringmann
On 1/30/19 11:38 PM, Michael Ellerman wrote:
> Michael Bringmann  writes:
>> This patch is to check for cede'ed CPUs during LPM.  Some extreme
>> tests encountered a problem ehere Linux has put some threads to
>> sleep (possibly to save energy or something), LPM was attempted,
>> and the Linux kernel didn't awaken the sleeping threads, but issued
>> the H_JOIN for the active threads.  Since the sleeping threads
>> are not awake, they can not issue the expected H_JOIN, and the
>> partition would never suspend.  This patch wakes the sleeping
>> threads back up.
> 
> I'm don't think this is the right solution.
> 
> Just after your for loop we do an on_each_cpu() call, which sends an IPI
> to every CPU, and that should wake all CPUs up from CEDE.
> 
> If that's not happening then there is a bug somewhere, and we need to
> work out where.

Let me explain the scenario of the LPM case that Pete Heyrman found, and
that Nathan F. was working upon, previously.

In the scenario, the partition has 5 dedicated processors each with 8 threads
running.

>From the PHYP data we can see that on VP 0, threads 3, 4, 5, 6 and 7 issued
a H_CEDE requesting to save energy by putting the requesting thread into
sleep mode.  In this state, the thread will only be awakened by H_PROD from
another running thread or from an external user action (power off, reboot
and such).  Timers and external interrupts are disabled in this mode.

About 3 seconds later, as part of the LPM operation, the other 35 threads
have all issued a H_JOIN request.  Join is part of the LPM process where
the threads suspend themselves as part of the LPM operation so the partition
can be migrated to the target server.

So, the current state is the the OS has suspended the execution of all the
threads in the partition without successfully suspending all threads as part
of LPM.

Net, OS has an issue where they suspended every processor thread so nothing
can run.

This appears to be slightly different than the previous LPM stalls we have
seen where the migration stalls because of cpus being taken offline and not
making the H_JOIN call.

In this scenario we appear to have CPUs that have done an H_CEDE prior to
the LPM. For these CPUs we would need to do a H_PROD to wake them back up
so they can do a H_JOIN and allow the LPM to continue.

The problem is that Linux has some threads that they put to sleep (probably
to save energy or something), LPM was attempted, Linux didn't awaken the
sleeping threads but issued the H_JOIN for the active threads.  Since the
sleeping threads don't issue the H_JOIN the partition will never suspend.

I am checking again with Pete regarding your concerns.

Thanks.

> 
> 
>> diff --git a/arch/powerpc/include/asm/plpar_wrappers.h 
>> b/arch/powerpc/include/asm/plpar_wrappers.h
>> index cff5a41..8292eff 100644
>> --- a/arch/powerpc/include/asm/plpar_wrappers.h
>> +++ b/arch/powerpc/include/asm/plpar_wrappers.h
>> @@ -26,10 +26,8 @@ static inline void set_cede_latency_hint(u8 latency_hint)
>>  get_lppaca()->cede_latency_hint = latency_hint;
>>  }
>>  
>> -static inline long cede_processor(void)
>> -{
>> -return plpar_hcall_norets(H_CEDE);
>> -}
>> +int cpu_is_ceded(int cpu);
>> +long cede_processor(void);
>>  
>>  static inline long extended_cede_processor(unsigned long latency_hint)
>>  {
>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>> index de35bd8f..fea3d21 100644
>> --- a/arch/powerpc/kernel/rtas.c
>> +++ b/arch/powerpc/kernel/rtas.c
>> @@ -44,6 +44,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  /* This is here deliberately so it's only used in this file */
>>  void enter_rtas(unsigned long);
>> @@ -942,7 +943,7 @@ int rtas_ibm_suspend_me(u64 handle)
>>  struct rtas_suspend_me_data data;
>>  DECLARE_COMPLETION_ONSTACK(done);
>>  cpumask_var_t offline_mask;
>> -int cpuret;
>> +int cpuret, cpu;
>>  
>>  if (!rtas_service_present("ibm,suspend-me"))
>>  return -ENOSYS;
>> @@ -991,6 +992,11 @@ int rtas_ibm_suspend_me(u64 handle)
>>  goto out_hotplug_enable;
>>  }
>>  
>> +for_each_present_cpu(cpu) {
>> +if (cpu_is_ceded(cpu))
>> +plpar_hcall_norets(H_PROD, 
>> get_hard_smp_processor_id(cpu));
>> +}
> 
> There's a race condition here, there's nothing to prevent the CPUs you
> just PROD'ed from going back into CEDE before you do the on_each_cpu()
> call below> 
>>  /* Call function on all CPUs.  One of us will make the
>>   * rtas call
>>   */
>> diff --git a/arch/powerpc/platforms/pseries/setup.c 
>> b/arch/powerpc/platforms/pseries/setup.c
>> index 41f62ca2..48ae6d4 100644
>> --- a/arch/powerpc/platforms/pseries/setup.c
>> +++ b/arch/powerpc/platforms/pseries/setup.c
>> @@ -331,6 +331,24 @@ static int alloc_dispatch_log_kmem_cache(void)
>>  }
>>  machine_early_initcall(pseries, alloc_dispatch_log_kmem_cache);
>>  
>> +static DEFINE_PER_CPU(int, cpu_ceded);
>> +
>> +int cpu_is_ceded(int 

[PATCH] powerpc: Enable kernel XZ compression option on 44x

2019-01-31 Thread Christian Lamparter
Enable kernel XZ compression option on 44x.
Tested on a Western Digital - MyBook Live NAS.
It takes 22 seconds for the 800 MHz CPU to decompress
and boot a 2.63 MiB XZ-compressed kernel simpleImage.

Signed-off-by: Christian Lamparter 
---
 arch/powerpc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2890d36eb531..58b6ad3555e0 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -201,7 +201,7 @@ config PPC
select HAVE_IOREMAP_PROT
select HAVE_IRQ_EXIT_ON_IRQ_STACK
select HAVE_KERNEL_GZIP
-   select HAVE_KERNEL_XZ   if PPC_BOOK3S
+   select HAVE_KERNEL_XZ   if PPC_BOOK3S || 44x
select HAVE_KPROBES
select HAVE_KPROBES_ON_FTRACE
select HAVE_KRETPROBES
-- 
2.20.1



[PATCH] powerpc: drop page_is_ram() and walk_system_ram_range()

2019-01-31 Thread Christophe Leroy
Since commit c40dd2f76644 ("powerpc: Add System RAM to /proc/iomem")
it is possible to use the generic walk_system_ram_range() and
the generic page_is_ram().

Fixes: c40dd2f76644 ("powerpc: Add System RAM to /proc/iomem")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig|  3 ---
 arch/powerpc/include/asm/page.h |  1 -
 arch/powerpc/mm/mem.c   | 33 -
 3 files changed, 37 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 0a26e0075ce5..0006ca6a7664 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -479,9 +479,6 @@ config ARCH_CPU_PROBE_RELEASE
 config ARCH_ENABLE_MEMORY_HOTPLUG
def_bool y
 
-config ARCH_HAS_WALK_MEMORY
-   def_bool y
-
 config ARCH_ENABLE_MEMORY_HOTREMOVE
def_bool y
 
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 5c5ea2413413..aa4497175bd3 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -326,7 +326,6 @@ struct page;
 extern void clear_user_page(void *page, unsigned long vaddr, struct page *pg);
 extern void copy_user_page(void *to, void *from, unsigned long vaddr,
struct page *p);
-extern int page_is_ram(unsigned long pfn);
 extern int devmem_is_allowed(unsigned long pfn);
 
 #ifdef CONFIG_PPC_SMLPAR
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 33cc6f676fa6..fa9916c2c662 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -80,11 +80,6 @@ static inline pte_t *virt_to_kpte(unsigned long vaddr)
 #define TOP_ZONE ZONE_NORMAL
 #endif
 
-int page_is_ram(unsigned long pfn)
-{
-   return memblock_is_memory(__pfn_to_phys(pfn));
-}
-
 pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
  unsigned long size, pgprot_t vma_prot)
 {
@@ -176,34 +171,6 @@ int __meminit arch_remove_memory(int nid, u64 start, u64 
size,
 #endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
-/*
- * walk_memory_resource() needs to make sure there is no holes in a given
- * memory range.  PPC64 does not maintain the memory layout in /proc/iomem.
- * Instead it maintains it in memblock.memory structures.  Walk through the
- * memory regions, find holes and callback for contiguous regions.
- */
-int
-walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
-   void *arg, int (*func)(unsigned long, unsigned long, void *))
-{
-   struct memblock_region *reg;
-   unsigned long end_pfn = start_pfn + nr_pages;
-   unsigned long tstart, tend;
-   int ret = -1;
-
-   for_each_memblock(memory, reg) {
-   tstart = max(start_pfn, memblock_region_memory_base_pfn(reg));
-   tend = min(end_pfn, memblock_region_memory_end_pfn(reg));
-   if (tstart >= tend)
-   continue;
-   ret = (*func)(tstart, tend - tstart, arg);
-   if (ret)
-   break;
-   }
-   return ret;
-}
-EXPORT_SYMBOL_GPL(walk_system_ram_range);
-
 #ifndef CONFIG_NEED_MULTIPLE_NODES
 void __init mem_topology_setup(void)
 {
-- 
2.13.3



[RFC PATCH] powerpc/6xx: Don't set back MSR_RI before reenabling MMU

2019-01-31 Thread Christophe Leroy
By delaying the setting of MSR_RI, a 1% improvment is optained on
null_syscall selftest on an mpc8321.

Without this patch:

root@vgoippro:~# ./null_syscall
   1134.33 ns 378.11 cycles

With this patch:

root@vgoippro:~# ./null_syscall
   1121.85 ns 373.95 cycles

The drawback is that a machine check during that period
would be unrecoverable, but as only main memory is accessed
during that period, it shouldn't be a concern.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_32.S | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 146385b1c2da..ea28a6ab56ec 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -282,8 +282,6 @@ __secondary_hold_acknowledge:
stw r1,GPR1(r11);   \
stw r1,0(r11);  \
tovirt(r1,r11); /* set new kernel sp */ \
-   li  r10,MSR_KERNEL & ~(MSR_IR|MSR_DR); /* can take exceptions */ \
-   MTMSRD(r10);/* (except for mach check in rtas) */ \
stw r0,GPR0(r11);   \
lis r10,STACK_FRAME_REGS_MARKER@ha; /* exception frame marker */ \
addir10,r10,STACK_FRAME_REGS_MARKER@l; \
-- 
2.13.3



Re: [PATCH] powerpc/powernv/npu: Remove redundant change_pte() hook

2019-01-31 Thread Andrea Arcangeli
On Thu, Jan 31, 2019 at 06:30:22PM +0800, Peter Xu wrote:
> The change_pte() notifier was designed to use as a quick path to
> update secondary MMU PTEs on write permission changes or PFN changes.
> For KVM, it could reduce the vm-exits when vcpu faults on the pages
> that was touched up by KSM.  It's not used to do cache invalidations,
> for example, if we see the notifier will be called before the real PTE
> update after all (please see set_pte_at_notify that set_pte_at was
> called later).
> 
> All the necessary cache invalidation should all be done in
> invalidate_range() already.
> 
> CC: Benjamin Herrenschmidt 
> CC: Paul Mackerras 
> CC: Michael Ellerman 
> CC: Alistair Popple 
> CC: Alexey Kardashevskiy 
> CC: Mark Hairgrove 
> CC: Balbir Singh 
> CC: David Gibson 
> CC: Andrea Arcangeli 
> CC: Jerome Glisse 
> CC: Jason Wang 
> CC: linuxppc-dev@lists.ozlabs.org
> CC: linux-ker...@vger.kernel.org
> Signed-off-by: Peter Xu 
> ---
>  arch/powerpc/platforms/powernv/npu-dma.c | 10 --
>  1 file changed, 10 deletions(-)

Reviewed-by: Andrea Arcangeli 

It doesn't make sense to implement change_pte as an invalidate,
change_pte is not compulsory to implement so if one wants to have
invalidates only, change_pte method shouldn't be implemented in the
first place and the common code will guarantee to invoke the range
invalidates instead.

Currently the whole change_pte optimization is effectively disabled as
noted in past discussions with Jerome (because of the range
invalidates that always surrounds it), so we need to revisit the whole
change_pte logic and decide it to re-enable it or to drop it as a
whole, but in the meantime it's good to cleanup spots like below that
should leave change_pte alone.

There are several examples of mmu_notifiers_ops in the kernel that
don't implement change_pte, in fact it's the majority. Of all mmu
notifier users, only nv_nmmu_notifier_ops, intel_mmuops_change and
kvm_mmu_notifier_ops implements change_pte and as Peter found out by
source review nv_nmmu_notifier_ops, intel_mmuops_change are wrong
about it and should stop implementing it as an invalidate.

In short change_pte is only implemented correctly from KVM which can
really updates the spte and flushes the TLB but the spte update
remains and could avoid a vmexit if we figure out how to re-enable the
optimization safely (the TLB fill after change_pte in KVM EPT/shadow
secondary MMU will be looked up by the CPU in hardware).

If change_pte is implemented, it should update the mapping like KVM
does and not do an invalidate.

Thanks,
Andrea

> 
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
> b/arch/powerpc/platforms/powernv/npu-dma.c
> index 3f58c7dbd581..c003b29d870e 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -917,15 +917,6 @@ static void pnv_npu2_mn_release(struct mmu_notifier *mn,
>   mmio_invalidate(npu_context, 0, ~0UL);
>  }
>  
> -static void pnv_npu2_mn_change_pte(struct mmu_notifier *mn,
> - struct mm_struct *mm,
> - unsigned long address,
> - pte_t pte)
> -{
> - struct npu_context *npu_context = mn_to_npu_context(mn);
> - mmio_invalidate(npu_context, address, PAGE_SIZE);
> -}
> -
>  static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn,
>   struct mm_struct *mm,
>   unsigned long start, unsigned long end)
> @@ -936,7 +927,6 @@ static void pnv_npu2_mn_invalidate_range(struct 
> mmu_notifier *mn,
>  
>  static const struct mmu_notifier_ops nv_nmmu_notifier_ops = {
>   .release = pnv_npu2_mn_release,
> - .change_pte = pnv_npu2_mn_change_pte,
>   .invalidate_range = pnv_npu2_mn_invalidate_range,
>  };
>  
> -- 
> 2.17.1
> 


Re: [PATCH 3/3] videobuf2: replace a layering violation with dma_map_resource

2019-01-31 Thread Christoph Hellwig
Hi Marek,

can chance you could retest the v2 version?


Re: [PATCH v2 19/21] treewide: add checks for the return value of memblock_alloc*()

2019-01-31 Thread Max Filippov
On Mon, Jan 21, 2019 at 12:06 AM Mike Rapoport  wrote:
>
> Add check for the return value of memblock_alloc*() functions and call
> panic() in case of error.
> The panic message repeats the one used by panicing memblock allocators with
> adjustment of parameters to include only relevant ones.
>
> The replacement was mostly automated with semantic patches like the one
> below with manual massaging of format strings.
>
> @@
> expression ptr, size, align;
> @@
> ptr = memblock_alloc(size, align);
> + if (!ptr)
> +   panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__,
> size, align);
>
> Signed-off-by: Mike Rapoport 
> Reviewed-by: Guo Ren  # c-sky
> Acked-by: Paul Burton  # MIPS
> Acked-by: Heiko Carstens  # s390
> Reviewed-by: Juergen Gross  # Xen
> ---
>  arch/xtensa/mm/kasan_init.c   |  4 
>  arch/xtensa/mm/mmu.c  |  3 +++

For xtensa:
Acked-by: Max Filippov 

-- 
Thanks.
-- Max


Re: [PATCH] powerpc: Ensure gcc doesn't move around cache flushing in __patch_instruction

2019-01-31 Thread Christophe Leroy




Le 18/05/2018 à 01:00, Segher Boessenkool a écrit :

On Fri, May 18, 2018 at 08:30:27AM +1000, Benjamin Herrenschmidt wrote:

On Thu, 2018-05-17 at 14:23 -0500, Segher Boessenkool wrote:

On Thu, May 17, 2018 at 01:06:10PM +1000, Benjamin Herrenschmidt wrote:

The current asm statement in __patch_instruction() for the cache flushes
doesn't have a "volatile" statement and no memory clobber. That means
gcc can potentially move it around (or move the store done by put_user
past the flush).


volatile is completely superfluous here, except maybe as documentation:
any asm without outputs is always volatile.


I wasn't aware of that. I was drilled early on to always stick volatile
in my asm statements if they have any form of side effect :-)


If an asm without output was not marked automatically as having another
side effect, every such asm would be immediately deleted ;-)

Adding volatile as documentation for side effects can be good; it just
doesn't do much (nothing, in fact) for asms without output as far as
the compiler is concerned.


(And the memory clobber does not prevent the compiler from moving the
asm around, or duplicating it, etc., and neither does the volatile).


It prevents load/stores from moving around doesn't it ? I wanted to
make sure the store of the instruction doesn't move in/pass the asm. If
you say that's not needed then ignore the patch.


No, it's fine here, and you want either that or put exactly the memory
you are touching in a constraint (probably overkill here).  I just
wanted to say that a "memory" clobber does nothing more than say the
asm touches some unspecified memory; there is no magic other meaning
to it.  Your patch is correct, just the "volatile" part isn't needed,
and the explanation was a bit cargo-culty ;-)



Any plan to get that merged ?

Christophe


Re: use generic DMA mapping code in powerpc V4

2019-01-31 Thread Christian Zigotzky

Hi Christoph,

I compiled kernels for the X5000 and X1000 from your branch 
'powerpc-dma.6' today.


Gitweb: 
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/powerpc-dma.6


git clone git://git.infradead.org/users/hch/misc.git -b powerpc-dma.6 a

The X1000 and X5000 boot but unfortunately the P.A. Semi Ethernet 
doesn't work.


Error messages (X1000):

[   17.371736] pci :00:1a.0: overflow 0x0002691bf802+1646 of DMA 
mask  bus mask 0
[   17.371760] WARNING: CPU: 0 PID: 2496 at kernel/dma/direct.c:43 
.dma_direct_map_page+0x11c/0x200

[   17.371762] Modules linked in:
[   17.371769] CPU: 0 PID: 2496 Comm: NetworkManager Not tainted 
5.0.0-rc4-3_A-EON_AmigaOne_X1000_Nemo-54580-g8d7a724-dirty #2
[   17.371772] NIP:  c010395c LR: c0103a30 CTR: 
c0726f70
[   17.371775] REGS: c0026900e9a0 TRAP: 0700   Not tainted 
(5.0.0-rc4-3_A-EON_AmigaOne_X1000_Nemo-54580-g8d7a724-dirty)
[   17.371777] MSR:  90029032  CR: 
2400  XER: 2000

[   17.371786] IRQMASK: 0
   GPR00: c0103a30 c0026900ec30 
c1923f00 0052
   GPR04: c0026f206778 c0026f20d458 
 0346
   GPR08: 0007  
 0010
   GPR12: 22002444 c1b1 
 
   GPR16: 10382410  
 c0026bd9d820
   GPR20:  c0026919c000 
 
   GPR24: 0800 c0026919 
c002692a4180 c0026919
   GPR28: c00277ada1c8 066e 
c0026d3c68b0 0802

[   17.371823] NIP [c010395c] .dma_direct_map_page+0x11c/0x200
[   17.371827] LR [c0103a30] .dma_direct_map_page+0x1f0/0x200
[   17.371829] Call Trace:
[   17.371833] [c0026900ec30] [c0103a30] 
.dma_direct_map_page+0x1f0/0x200 (unreliable)
[   17.371840] [c0026900ecd0] [c099b7ec] 
.pasemi_mac_replenish_rx_ring+0x12c/0x2a0
[   17.371846] [c0026900eda0] [c099dc64] 
.pasemi_mac_open+0x384/0x7c0

[   17.371853] [c0026900ee40] [c0c6f484] .__dev_open+0x134/0x1e0
[   17.371858] [c0026900eee0] [c0c6f9ec] 
.__dev_change_flags+0x1bc/0x210
[   17.371863] [c0026900ef90] [c0c6fa88] 
.dev_change_flags+0x48/0xa0

[   17.371869] [c0026900f030] [c0c8c88c] .do_setlink+0x3dc/0xf60
[   17.371875] [c0026900f1b0] [c0c8dd84] 
.__rtnl_newlink+0x5e4/0x900

[   17.371880] [c0026900f5f0] [c0c8e10c] .rtnl_newlink+0x6c/0xb0
[   17.371885] [c0026900f680] [c0c89838] 
.rtnetlink_rcv_msg+0x2e8/0x3d0
[   17.371891] [c0026900f760] [c0cc0f90] 
.netlink_rcv_skb+0x120/0x170
[   17.371896] [c0026900f820] [c0c87318] 
.rtnetlink_rcv+0x28/0x40
[   17.371901] [c0026900f8a0] [c0cc03f8] 
.netlink_unicast+0x208/0x2f0
[   17.371906] [c0026900f950] [c0cc09a8] 
.netlink_sendmsg+0x348/0x460

[   17.371911] [c0026900fa30] [c0c38774] .sock_sendmsg+0x44/0x70
[   17.371915] [c0026900fab0] [c0c3a79c] 
.___sys_sendmsg+0x30c/0x320
[   17.371920] [c0026900fca0] [c0c3c3b4] 
.__sys_sendmsg+0x74/0xf0
[   17.371926] [c0026900fd90] [c0cb4da0] 
.__se_compat_sys_sendmsg+0x40/0x60

[   17.371932] [c0026900fe20] [c000a21c] system_call+0x5c/0x70
[   17.371934] Instruction dump:
[   17.371937] 6000 f8610070 3d20 6129fffe 79290020 e8e7 
7fa74840 409d00b8
[   17.371946] 3d420001 892acb59 2f89 419e00b8 <0fe0> 382100a0 
3860 e8010010

[   17.371954] ---[ end trace a81f3c344f625f76 ]---
[   17.396654] IPv6: ADDRCONF(NETDEV_UP): enp0s20f3: link is not ready



Additionally, Xorg doesn't start on a virtual e5500 QEMU machine 
anymore. I tested with the following QEMU command:


./qemu-system-ppc64 -M ppce500 -cpu e5500 -m 2048 -kernel 
/home/christian/Downloads/vmlinux-5.0-rc4-3-AmigaOne_X1000_X5000/X5000_and_QEMU_e5500/uImage-5.0 
-drive 
format=raw,file=/home/christian/Downloads/Fienix-Beta120418.img,index=0,if=virtio 
-nic user,model=e1000 -append "rw root=/dev/vda" -device virtio-vga 
-device virtio-mouse-pci -device virtio-keyboard-pci -usb -soundhw 
es1370 -smp 4


Cheers,
Christian


On 30 January 2019 at 05:40AM, Christian Zigotzky wrote:

Hi Christoph,

Thanks a lot for the updates. I will test the full branch tomorrow.

Cheers,
Christian

Sent from my iPhone


On 29. Jan 2019, at 17:34, Christoph Hellwig  wrote:


On Tue, Jan 29, 2019 at 05:14:11PM +0100, Christoph Hellwig wrote:

On Tue, Jan 29, 2019 at 04:03:32PM +0100, Christian Zigotzky wrote:
Hi Christoph,

I compiled kernels for the X5000 and X1000 from your new branch
'powerpc-dma.6-debug.2' today. The kernels boot and the P.A. Semi Ethernet
works!

Thanks for testing!  I'll prepare a new series that adds the 

[PATCH] powerpc/powernv/npu: Remove redundant change_pte() hook

2019-01-31 Thread Peter Xu
The change_pte() notifier was designed to use as a quick path to
update secondary MMU PTEs on write permission changes or PFN changes.
For KVM, it could reduce the vm-exits when vcpu faults on the pages
that was touched up by KSM.  It's not used to do cache invalidations,
for example, if we see the notifier will be called before the real PTE
update after all (please see set_pte_at_notify that set_pte_at was
called later).

All the necessary cache invalidation should all be done in
invalidate_range() already.

CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Michael Ellerman 
CC: Alistair Popple 
CC: Alexey Kardashevskiy 
CC: Mark Hairgrove 
CC: Balbir Singh 
CC: David Gibson 
CC: Andrea Arcangeli 
CC: Jerome Glisse 
CC: Jason Wang 
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-ker...@vger.kernel.org
Signed-off-by: Peter Xu 
---
 arch/powerpc/platforms/powernv/npu-dma.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
b/arch/powerpc/platforms/powernv/npu-dma.c
index 3f58c7dbd581..c003b29d870e 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -917,15 +917,6 @@ static void pnv_npu2_mn_release(struct mmu_notifier *mn,
mmio_invalidate(npu_context, 0, ~0UL);
 }
 
-static void pnv_npu2_mn_change_pte(struct mmu_notifier *mn,
-   struct mm_struct *mm,
-   unsigned long address,
-   pte_t pte)
-{
-   struct npu_context *npu_context = mn_to_npu_context(mn);
-   mmio_invalidate(npu_context, address, PAGE_SIZE);
-}
-
 static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn,
struct mm_struct *mm,
unsigned long start, unsigned long end)
@@ -936,7 +927,6 @@ static void pnv_npu2_mn_invalidate_range(struct 
mmu_notifier *mn,
 
 static const struct mmu_notifier_ops nv_nmmu_notifier_ops = {
.release = pnv_npu2_mn_release,
-   .change_pte = pnv_npu2_mn_change_pte,
.invalidate_range = pnv_npu2_mn_invalidate_range,
 };
 
-- 
2.17.1



Re: Does SMP work at all on 40x ?

2019-01-31 Thread Christophe Leroy




Le 30/01/2019 à 12:43, Michael Ellerman a écrit :

Christophe Leroy  writes:


In transfer_to_handler() (entry_32.S), we have:

#if defined(CONFIG_40x) || defined(CONFIG_BOOKE)
...
#ifdef CONFIG_SMP
CURRENT_THREAD_INFO(r9, r1)
lwz r9,TI_CPU(r9)
slwir9,r9,3
add r11,r11,r9
#endif
#endif

When running this piece of code, MMU translation is off. But r9 contains
the virtual addr of current_thread_info, so unless I miss something,
this cannot work on the 40x, can it ?

On CONFIG_BOOKE it works because phys addr = virt addr


AFAIK 40x can't be SMP:

   config SMP
depends on PPC_BOOK3S || PPC_BOOK3E || FSL_BOOKE || PPC_47x


But this stuff is all before my time.

The commit that added the SMP block was clearly only meant for BookE:

   4eaddb4d7ec3 ("[POWERPC] Make Book-E debug handling SMP safe")


Ok, then no need to worry about it. It will implicitely get fixed with 
the THREAD_INFO_IN_TASK_STRUCT series.


Christophe


Re: BUG: memcmp(): Accessing invalid memory location

2019-01-31 Thread Michael Ellerman
Adding Simon who wrote the code.

Chandan Rajendra  writes:
> When executing fstests' generic/026 test, I hit the following call trace,
>
> [  417.061038] BUG: Unable to handle kernel data access at 0xc0062ac4
> [  417.062172] Faulting instruction address: 0xc0092240
> [  417.062242] Oops: Kernel access of bad area, sig: 11 [#1]
> [  417.062299] LE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries
> [  417.062366] Modules linked in:
> [  417.062401] CPU: 0 PID: 27828 Comm: chacl Not tainted 
> 5.0.0-rc2-next-20190115-1-g6de6dba64dda #1
> [  417.062495] NIP:  c0092240 LR: c066a55c CTR: 
> 
> [  417.062567] REGS: c0062c0c3430 TRAP: 0300   Not tainted  
> (5.0.0-rc2-next-20190115-1-g6de6dba64dda)
> [  417.062660] MSR:  82009033   CR: 
> 44000842  XER: 2000
> [  417.062750] CFAR: 7fff7f3108ac DAR: c0062ac4 DSISR: 4000 
> IRQMASK: 0
>GPR00:  c0062c0c36c0 c17f4c00 
> c121a660
>GPR04: c0062ac3fff9 0004 0020 
> 275b19c4
>GPR08: 000c 46494c45 5347495f41434c5f 
> c26073a0
>GPR12:  c27a  
> 
>GPR16:    
> 
>GPR20: c0062ea70020 c0062c0c38d0 0002 
> 0002
>GPR24: c0062ac3ffe8 275b19c4 0001 
> c0062ac3
>GPR28: c0062c0c38d0 c0062ac30050 c0062ac30058 
> 
> [  417.063563] NIP [c0092240] memcmp+0x120/0x690
> [  417.063635] LR [c066a55c] xfs_attr3_leaf_lookup_int+0x53c/0x5b0
> [  417.063709] Call Trace:
> [  417.063744] [c0062c0c36c0] [c066a098] 
> xfs_attr3_leaf_lookup_int+0x78/0x5b0 (unreliable)
> [  417.063851] [c0062c0c3760] [c0693f8c] 
> xfs_da3_node_lookup_int+0x32c/0x5a0
> [  417.063944] [c0062c0c3820] [c06634a0] 
> xfs_attr_node_addname+0x170/0x6b0
> [  417.064034] [c0062c0c38b0] [c0664ffc] xfs_attr_set+0x2ac/0x340
> [  417.064118] [c0062c0c39a0] [c0758d40] __xfs_set_acl+0xf0/0x230
> [  417.064190] [c0062c0c3a00] [c0758f50] xfs_set_acl+0xd0/0x160
> [  417.064268] [c0062c0c3aa0] [c04b69b0] set_posix_acl+0xc0/0x130
> [  417.064339] [c0062c0c3ae0] [c04b6a88] 
> posix_acl_xattr_set+0x68/0x110
> [  417.064412] [c0062c0c3b20] [c04532d4] __vfs_setxattr+0xa4/0x110
> [  417.064485] [c0062c0c3b80] [c0454c2c] 
> __vfs_setxattr_noperm+0xac/0x240
> [  417.064566] [c0062c0c3bd0] [c0454ee8] vfs_setxattr+0x128/0x130
> [  417.064638] [c0062c0c3c30] [c0455138] setxattr+0x248/0x600
> [  417.064710] [c0062c0c3d90] [c0455738] path_setxattr+0x108/0x120
> [  417.064785] [c0062c0c3e00] [c0455778] sys_setxattr+0x28/0x40
> [  417.064858] [c0062c0c3e20] [c000bae4] system_call+0x5c/0x70
> [  417.064930] Instruction dump:
> [  417.064964] 7d201c28 7d402428 7c295040 38630008 38840008 408201f0 4200ffe8 
> 2c05
> [  417.065051] 4182ff6c 20c50008 54c61838 7d201c28 <7d402428> 7d293436 
> 7d4a3436 7c295040
> [  417.065150] ---[ end trace 0d060411b5e3741b ]---
>
>
> Both the memory locations passed to memcmp() had "SGI_ACL_FILE" and len
> argument of memcmp() was set to 12. s1 argument of memcmp() had the value
> 0xf4af0485, while s2 argument had the value 0xce9e316f.
>
> The following is the code path within memcmp() that gets executed for the
> above mentioned values,
>
> - Since len (i.e. 12) is greater than 7, we branch to .Lno_short.
> - We then prefetch the contents of r3 & r4 and branch to
>   .Ldiffoffset_8bytes_make_align_start.
> - Under .Ldiffoffset_novmx_cmp, Since r3 is unaligned we end up comparing
>   "SGI" part of the string. r3's value is then aligned. r4's value is
>   incremented by 3. For comparing the remaining 9 bytes, we jump to
>   .Lcmp_lt32bytes.
> - Here, 8 bytes of the remaining 9 bytes are compared and execution moves to
>   .Lcmp_rest_lt8bytes.
> - Here we execute "LD rB,0,r4". In the case of this bug, r4 has an unaligned
>   value and hence ends up accessing the "next" double word. The "next" double
>   word happens to occur after the last page mapped into the kernel's address
>   space and hence this leads to the previously listed oops.

Thanks for the analysis.

This is just a bug, we can't read past the end of the source or dest.

We have a selftest for memcmp, but clearly it doesn't exercise this
case. Here's a patch to try and trip it:

diff --git a/tools/testing/selftests/powerpc/stringloops/memcmp.c 
b/tools/testing/selftests/powerpc/stringloops/memcmp.c
index b1fa7546957f..edca3abb6ecf 100644
--- a/tools/testing/selftests/powerpc/stringloops/memcmp.c
+++ 

[PATCH v2] powerpc/32: Fix CONFIG_VIRT_CPU_ACCOUNTING_NATIVE for 40x/booke

2019-01-31 Thread Christophe Leroy
40x/booke have another path to reach 3f from transfer_to_handler,
make sure it also calls ACCOUNT_CPU_USER_ENTRY() when
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is selected.

Signed-off-by: Christophe Leroy 
---
 v2: left inside the user entry path

 arch/powerpc/kernel/entry_32.S | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 0768dfd8a64e..d4c6186aa7e8 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -166,6 +166,13 @@ transfer_to_handler:
   internal debug mode bit to do this. */
lwz r12,THREAD_DBCR0(r12)
andis.  r12,r12,DBCR0_IDM@h
+#endif
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
+   CURRENT_THREAD_INFO(r9, r1)
+   tophys(r9, r9)
+   ACCOUNT_CPU_USER_ENTRY(r9, r11, r12)
+#endif
+#if defined(CONFIG_40x) || defined(CONFIG_BOOKE)
beq+3f
/* From user and task is ptraced - load up global dbcr0 */
li  r12,-1  /* clear all pending debug events */
@@ -185,11 +192,6 @@ transfer_to_handler:
addir12,r12,-1
stw r12,4(r11)
 #endif
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
-   CURRENT_THREAD_INFO(r9, r1)
-   tophys(r9, r9)
-   ACCOUNT_CPU_USER_ENTRY(r9, r11, r12)
-#endif
 
b   3f
 
-- 
2.13.3



[PATCH v15 13/13] powerpc: clean stack pointers naming

2019-01-31 Thread Christophe Leroy
Some stack pointers used to also be thread_info pointers
and were called tp. Now that they are only stack pointers,
rename them sp.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/irq.c  | 17 +++--
 arch/powerpc/kernel/setup_64.c | 11 +++
 2 files changed, 10 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 938944c6e2ee..8a936723c791 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -659,21 +659,21 @@ void __do_irq(struct pt_regs *regs)
 void do_IRQ(struct pt_regs *regs)
 {
struct pt_regs *old_regs = set_irq_regs(regs);
-   void *curtp, *irqtp, *sirqtp;
+   void *cursp, *irqsp, *sirqsp;
 
/* Switch to the irq stack to handle this */
-   curtp = (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1));
-   irqtp = hardirq_ctx[raw_smp_processor_id()];
-   sirqtp = softirq_ctx[raw_smp_processor_id()];
+   cursp = (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1));
+   irqsp = hardirq_ctx[raw_smp_processor_id()];
+   sirqsp = softirq_ctx[raw_smp_processor_id()];
 
/* Already there ? */
-   if (unlikely(curtp == irqtp || curtp == sirqtp)) {
+   if (unlikely(cursp == irqsp || cursp == sirqsp)) {
__do_irq(regs);
set_irq_regs(old_regs);
return;
}
/* Switch stack and call */
-   call_do_irq(regs, irqtp);
+   call_do_irq(regs, irqsp);
 
set_irq_regs(old_regs);
 }
@@ -695,10 +695,7 @@ void *hardirq_ctx[NR_CPUS] __read_mostly;
 
 void do_softirq_own_stack(void)
 {
-   void *irqtp;
-
-   irqtp = softirq_ctx[smp_processor_id()];
-   call_do_softirq(irqtp);
+   call_do_softirq(softirq_ctx[smp_processor_id()]);
 }
 
 irq_hw_number_t virq_to_hw(unsigned int virq)
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 2db1c5f7d141..daa361fc6a24 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -716,19 +716,14 @@ void __init emergency_stack_init(void)
limit = min(ppc64_bolted_size(), ppc64_rma_size);
 
for_each_possible_cpu(i) {
-   void *ti;
-
-   ti = alloc_stack(limit, i);
-   paca_ptrs[i]->emergency_sp = ti + THREAD_SIZE;
+   paca_ptrs[i]->emergency_sp = alloc_stack(limit, i) + 
THREAD_SIZE;
 
 #ifdef CONFIG_PPC_BOOK3S_64
/* emergency stack for NMI exception handling. */
-   ti = alloc_stack(limit, i);
-   paca_ptrs[i]->nmi_emergency_sp = ti + THREAD_SIZE;
+   paca_ptrs[i]->nmi_emergency_sp = alloc_stack(limit, i) + 
THREAD_SIZE;
 
/* emergency stack for machine check exception handling. */
-   ti = alloc_stack(limit, i);
-   paca_ptrs[i]->mc_emergency_sp = ti + THREAD_SIZE;
+   paca_ptrs[i]->mc_emergency_sp = alloc_stack(limit, i) + 
THREAD_SIZE;
 #endif
}
 }
-- 
2.13.3



[PATCH v15 12/13] powerpc/64: Remove CURRENT_THREAD_INFO

2019-01-31 Thread Christophe Leroy
Now that current_thread_info is located at the beginning of 'current'
task struct, CURRENT_THREAD_INFO macro is not really needed any more.

This patch replaces it by loads of the value at PACACURRENT(r13).

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/exception-64s.h   |  4 ++--
 arch/powerpc/include/asm/thread_info.h |  4 
 arch/powerpc/kernel/entry_64.S | 10 +-
 arch/powerpc/kernel/exceptions-64e.S   |  2 +-
 arch/powerpc/kernel/exceptions-64s.S   |  2 +-
 arch/powerpc/kernel/idle_book3e.S  |  2 +-
 arch/powerpc/kernel/idle_power4.S  |  2 +-
 arch/powerpc/kernel/trace/ftrace_64_mprofile.S |  6 +++---
 8 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 3b4767ed3ec5..dd6a5ae7a769 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -671,7 +671,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 
 #define RUNLATCH_ON\
 BEGIN_FTR_SECTION  \
-   CURRENT_THREAD_INFO(r3, r1);\
+   ld  r3, PACACURRENT(r13);   \
ld  r4,TI_LOCAL_FLAGS(r3);  \
andi.   r0,r4,_TLF_RUNLATCH;\
beqlppc64_runlatch_on_trampoline;   \
@@ -721,7 +721,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL)
 #ifdef CONFIG_PPC_970_NAP
 #define FINISH_NAP \
 BEGIN_FTR_SECTION  \
-   CURRENT_THREAD_INFO(r11, r1);   \
+   ld  r11, PACACURRENT(r13);  \
ld  r9,TI_LOCAL_FLAGS(r11); \
andi.   r10,r9,_TLF_NAPPING;\
bnelpower4_fixup_nap;   \
diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index c959b8d66cac..8e1d0195ac36 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -17,10 +17,6 @@
 
 #define THREAD_SIZE(1 << THREAD_SHIFT)
 
-#ifdef CONFIG_PPC64
-#define CURRENT_THREAD_INFO(dest, sp)  stringify_in_c(ld dest, 
PACACURRENT(r13))
-#endif
-
 #ifndef __ASSEMBLY__
 #include 
 #include 
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 01d0706d873f..83bddacd7a17 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -166,7 +166,7 @@ system_call:/* label this so stack 
traces look sane */
li  r10,IRQS_ENABLED
std r10,SOFTE(r1)
 
-   CURRENT_THREAD_INFO(r11, r1)
+   ld  r11, PACACURRENT(r13)
ld  r10,TI_FLAGS(r11)
andi.   r11,r10,_TIF_SYSCALL_DOTRACE
bne .Lsyscall_dotrace   /* does not return */
@@ -213,7 +213,7 @@ system_call:/* label this so stack 
traces look sane */
ld  r3,RESULT(r1)
 #endif
 
-   CURRENT_THREAD_INFO(r12, r1)
+   ld  r12, PACACURRENT(r13)
 
ld  r8,_MSR(r1)
 #ifdef CONFIG_PPC_BOOK3S
@@ -348,7 +348,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
/* Repopulate r9 and r10 for the syscall path */
addir9,r1,STACK_FRAME_OVERHEAD
-   CURRENT_THREAD_INFO(r10, r1)
+   ld  r10, PACACURRENT(r13)
ld  r10,TI_FLAGS(r10)
 
cmpldi  r0,NR_syscalls
@@ -746,7 +746,7 @@ _GLOBAL(ret_from_except_lite)
mtmsrd  r10,1 /* Update machine state */
 #endif /* CONFIG_PPC_BOOK3E */
 
-   CURRENT_THREAD_INFO(r9, r1)
+   ld  r9, PACACURRENT(r13)
ld  r3,_MSR(r1)
 #ifdef CONFIG_PPC_BOOK3E
ld  r10,PACACURRENT(r13)
@@ -860,7 +860,7 @@ resume_kernel:
 1: bl  preempt_schedule_irq
 
/* Re-test flags and eventually loop */
-   CURRENT_THREAD_INFO(r9, r1)
+   ld  r9, PACACURRENT(r13)
ld  r4,TI_FLAGS(r9)
andi.   r0,r4,_TIF_NEED_RESCHED
bne 1b
diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index 20f14996281d..04ee24789f80 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -493,7 +493,7 @@ exc_##n##_bad_stack:
\
  * interrupts happen before the wait instruction.
  */
 #define CHECK_NAPPING()
\
-   CURRENT_THREAD_INFO(r11, r1);   \
+   ld  r11, PACACURRENT(r13);  \
ld  r10,TI_LOCAL_FLAGS(r11);\
andi.   r9,r10,_TLF_NAPPING;\
beq+1f; \
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 9e253ce27e08..c7c4e2d6f98f 100644

[PATCH v15 11/13] powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU

2019-01-31 Thread Christophe Leroy
Now that thread_info is similar to task_struct, its address is in r2
so CURRENT_THREAD_INFO() macro is useless. This patch removes it.

This patch also moves the 'tovirt(r2, r2)' down just before the
reactivation of MMU translation, so that we keep the physical address
of 'current' in r2 until then. It avoids a few calls to tophys().

At the same time, as the 'cpu' field is not anymore in thread_info,
TI_CPU is renamed TASK_CPU by this patch.

It also allows to get rid of a couple of
'#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE' as ACCOUNT_CPU_USER_ENTRY()
and ACCOUNT_CPU_USER_EXIT() are empty when
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not defined.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Makefile  |  2 +-
 arch/powerpc/include/asm/thread_info.h |  2 --
 arch/powerpc/kernel/asm-offsets.c  |  2 +-
 arch/powerpc/kernel/entry_32.S | 55 +++---
 arch/powerpc/kernel/epapr_hcalls.S |  5 ++--
 arch/powerpc/kernel/head_fsl_booke.S   |  5 ++--
 arch/powerpc/kernel/idle_6xx.S |  9 ++
 arch/powerpc/kernel/idle_e500.S|  8 ++---
 arch/powerpc/kernel/misc_32.S  |  3 +-
 arch/powerpc/mm/hash_low_32.S  | 14 -
 arch/powerpc/sysdev/6xx-suspend.S  |  5 ++--
 11 files changed, 38 insertions(+), 72 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 53ffe935f3b0..7de49889bd5d 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -431,7 +431,7 @@ ifdef CONFIG_SMP
 prepare: task_cpu_prepare
 
 task_cpu_prepare: prepare0
-   $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TI_CPU") 
print $$3;}' include/generated/asm-offsets.h))
+   $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == 
"TASK_CPU") print $$3;}' include/generated/asm-offsets.h))
 endif
 
 # Check toolchain versions:
diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index d91523c2c7d8..c959b8d66cac 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -19,8 +19,6 @@
 
 #ifdef CONFIG_PPC64
 #define CURRENT_THREAD_INFO(dest, sp)  stringify_in_c(ld dest, 
PACACURRENT(r13))
-#else
-#define CURRENT_THREAD_INFO(dest, sp)  stringify_in_c(mr dest, r2)
 #endif
 
 #ifndef __ASSEMBLY__
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 94ac190a0b16..03439785c2ea 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -96,7 +96,7 @@ int main(void)
 #endif /* CONFIG_PPC64 */
OFFSET(TASK_STACK, task_struct, stack);
 #ifdef CONFIG_SMP
-   OFFSET(TI_CPU, task_struct, cpu);
+   OFFSET(TASK_CPU, task_struct, cpu);
 #endif
 
 #ifdef CONFIG_LIVEPATCH
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index aea22c7b891f..a5e2d5585dcb 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -151,7 +151,6 @@ transfer_to_handler:
stw r2,_XER(r11)
mfspr   r12,SPRN_SPRG_THREAD
addir2,r12,-THREAD
-   tovirt(r2,r2)   /* set r2 to current */
beq 2f  /* if from user, fix up THREAD.regs */
addir11,r1,STACK_FRAME_OVERHEAD
stw r11,PT_REGS(r12)
@@ -161,11 +160,7 @@ transfer_to_handler:
lwz r12,THREAD_DBCR0(r12)
andis.  r12,r12,DBCR0_IDM@h
 #endif
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
-   CURRENT_THREAD_INFO(r9, r1)
-   tophys(r9, r9)
-   ACCOUNT_CPU_USER_ENTRY(r9, r11, r12)
-#endif
+   ACCOUNT_CPU_USER_ENTRY(r2, r11, r12)
 #if defined(CONFIG_40x) || defined(CONFIG_BOOKE)
beq+3f
/* From user and task is ptraced - load up global dbcr0 */
@@ -175,8 +170,7 @@ transfer_to_handler:
tophys(r11,r11)
addir11,r11,global_dbcr0@l
 #ifdef CONFIG_SMP
-   CURRENT_THREAD_INFO(r9, r1)
-   lwz r9,TI_CPU(r9)
+   lwz r9,TASK_CPU(r2)
slwir9,r9,3
add r11,r11,r9
 #endif
@@ -197,9 +191,7 @@ transfer_to_handler:
ble-stack_ovf   /* then the kernel stack overflowed */
 5:
 #if defined(CONFIG_PPC_BOOK3S_32) || defined(CONFIG_E500)
-   CURRENT_THREAD_INFO(r9, r1)
-   tophys(r9,r9)   /* check local flags */
-   lwz r12,TI_LOCAL_FLAGS(r9)
+   lwz r12,TI_LOCAL_FLAGS(r2)
mtcrf   0x01,r12
bt- 31-TLF_NAPPING,4f
bt- 31-TLF_SLEEPING,7f
@@ -208,6 +200,7 @@ transfer_to_handler:
 transfer_to_handler_cont:
 3:
mflrr9
+   tovirt(r2, r2)  /* set r2 to current */
lwz r11,0(r9)   /* virtual address of handler */
lwz r9,4(r9)/* where to go when done */
 #if defined(CONFIG_PPC_8xx) && defined(CONFIG_PERF_EVENTS)
@@ -271,11 +264,11 @@ reenable_mmu: /* re-enable 
mmu so we can */
 
 #if defined 

[PATCH v15 10/13] powerpc: 'current_set' is now a table of task_struct pointers

2019-01-31 Thread Christophe Leroy
The table of pointers 'current_set' has been used for retrieving
the stack and current. They used to be thread_info pointers as
they were pointing to the stack and current was taken from the
'task' field of the thread_info.

Now, the pointers of 'current_set' table are now both pointers
to task_struct and pointers to thread_info.

As they are used to get current, and the stack pointer is
retrieved from current's stack field, this patch changes
their type to task_struct, and renames secondary_ti to
secondary_current.

Reviewed-by: Nicholas Piggin 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/asm-prototypes.h |  4 ++--
 arch/powerpc/kernel/head_32.S |  6 +++---
 arch/powerpc/kernel/head_44x.S|  4 ++--
 arch/powerpc/kernel/head_fsl_booke.S  |  4 ++--
 arch/powerpc/kernel/smp.c | 10 --
 5 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h 
b/arch/powerpc/include/asm/asm-prototypes.h
index 1d911f68a23b..1484df6779ab 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -23,8 +23,8 @@
 #include 
 
 /* SMP */
-extern struct thread_info *current_set[NR_CPUS];
-extern struct thread_info *secondary_ti;
+extern struct task_struct *current_set[NR_CPUS];
+extern struct task_struct *secondary_current;
 void start_secondary(void *unused);
 
 /* kexec */
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 309a45779ad5..146385b1c2da 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -846,9 +846,9 @@ __secondary_start:
 #endif /* CONFIG_PPC_BOOK3S_32 */
 
/* get current's stack and current */
-   lis r1,secondary_ti@ha
-   tophys(r1,r1)
-   lwz r2,secondary_ti@l(r1)
+   lis r2,secondary_current@ha
+   tophys(r2,r2)
+   lwz r2,secondary_current@l(r2)
tophys(r1,r2)
lwz r1,TASK_STACK(r1)
 
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index f94a93b6c2f2..37117ab11584 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -1020,8 +1020,8 @@ _GLOBAL(start_secondary_47x)
/* Now we can get our task struct and real stack pointer */
 
/* Get current's stack and current */
-   lis r1,secondary_ti@ha
-   lwz r2,secondary_ti@l(r1)
+   lis r2,secondary_current@ha
+   lwz r2,secondary_current@l(r2)
lwz r1,TASK_STACK(r2)
 
/* Current stack pointer */
diff --git a/arch/powerpc/kernel/head_fsl_booke.S 
b/arch/powerpc/kernel/head_fsl_booke.S
index 11f38adbe020..4ed2a7c8e89b 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -1091,8 +1091,8 @@ __secondary_start:
bl  call_setup_cpu
 
/* get current's stack and current */
-   lis r1,secondary_ti@ha
-   lwz r2,secondary_ti@l(r1)
+   lis r2,secondary_current@ha
+   lwz r2,secondary_current@l(r2)
lwz r1,TASK_STACK(r2)
 
/* stack */
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index aa4517686f90..a41fa8924004 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -76,7 +76,7 @@
 static DEFINE_PER_CPU(int, cpu_state) = { 0 };
 #endif
 
-struct thread_info *secondary_ti;
+struct task_struct *secondary_current;
 bool has_big_cores;
 
 DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
@@ -664,7 +664,7 @@ void smp_send_stop(void)
 }
 #endif /* CONFIG_NMI_IPI */
 
-struct thread_info *current_set[NR_CPUS];
+struct task_struct *current_set[NR_CPUS];
 
 static void smp_store_cpu_info(int id)
 {
@@ -929,7 +929,7 @@ void smp_prepare_boot_cpu(void)
paca_ptrs[boot_cpuid]->__current = current;
 #endif
set_numa_node(numa_cpu_lookup_table[boot_cpuid]);
-   current_set[boot_cpuid] = task_thread_info(current);
+   current_set[boot_cpuid] = current;
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -1014,15 +1014,13 @@ static bool secondaries_inhibited(void)
 
 static void cpu_idle_thread_init(unsigned int cpu, struct task_struct *idle)
 {
-   struct thread_info *ti = task_thread_info(idle);
-
 #ifdef CONFIG_PPC64
paca_ptrs[cpu]->__current = idle;
paca_ptrs[cpu]->kstack = (unsigned long)task_stack_page(idle) +
 THREAD_SIZE - STACK_FRAME_OVERHEAD;
 #endif
idle->cpu = cpu;
-   secondary_ti = current_set[cpu] = ti;
+   secondary_current = current_set[cpu] = idle;
 }
 
 int __cpu_up(unsigned int cpu, struct task_struct *tidle)
-- 
2.13.3



[PATCH v15 09/13] powerpc: regain entire stack space

2019-01-31 Thread Christophe Leroy
thread_info is not anymore in the stack, so the entire stack
can now be used.

There is also no risk anymore of corrupting task_cpu(p) with a
stack overflow so the patch removes the test.

When doing this, an explicit test for NULL stack pointer is
needed in validate_sp() as it is not anymore implicitely covered
by the sizeof(thread_info) gap.

In the meantime, with the previous patch all pointers to the stacks
are not anymore pointers to thread_info so this patch changes them
to void*

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/irq.h   | 10 +-
 arch/powerpc/include/asm/processor.h |  3 +--
 arch/powerpc/kernel/asm-offsets.c|  1 -
 arch/powerpc/kernel/entry_32.S   | 14 --
 arch/powerpc/kernel/irq.c| 19 +--
 arch/powerpc/kernel/misc_32.S|  6 ++
 arch/powerpc/kernel/process.c| 32 +---
 arch/powerpc/kernel/setup_64.c   |  8 
 8 files changed, 38 insertions(+), 55 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index 28a7ace0a1b9..c91a60cda4fa 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -48,16 +48,16 @@ struct pt_regs;
  * Per-cpu stacks for handling critical, debug and machine check
  * level interrupts.
  */
-extern struct thread_info *critirq_ctx[NR_CPUS];
-extern struct thread_info *dbgirq_ctx[NR_CPUS];
-extern struct thread_info *mcheckirq_ctx[NR_CPUS];
+extern void *critirq_ctx[NR_CPUS];
+extern void *dbgirq_ctx[NR_CPUS];
+extern void *mcheckirq_ctx[NR_CPUS];
 #endif
 
 /*
  * Per-cpu stacks for handling hard and soft interrupts.
  */
-extern struct thread_info *hardirq_ctx[NR_CPUS];
-extern struct thread_info *softirq_ctx[NR_CPUS];
+extern void *hardirq_ctx[NR_CPUS];
+extern void *softirq_ctx[NR_CPUS];
 
 void call_do_softirq(void *sp);
 void call_do_irq(struct pt_regs *regs, void *sp);
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 15acb282a876..8179b64871ed 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -325,8 +325,7 @@ struct thread_struct {
 #define ARCH_MIN_TASKALIGN 16
 
 #define INIT_SP(sizeof(init_stack) + (unsigned long) 
_stack)
-#define INIT_SP_LIMIT \
-   (_ALIGN_UP(sizeof(struct thread_info), 16) + (unsigned long)_stack)
+#define INIT_SP_LIMIT  ((unsigned long)_stack)
 
 #ifdef CONFIG_SPE
 #define SPEFSCR_INIT \
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 1fb52206c106..94ac190a0b16 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -92,7 +92,6 @@ int main(void)
DEFINE(SIGSEGV, SIGSEGV);
DEFINE(NMI_MASK, NMI_MASK);
 #else
-   DEFINE(THREAD_INFO_GAP, _ALIGN_UP(sizeof(struct thread_info), 16));
OFFSET(KSP_LIMIT, thread_struct, ksp_limit);
 #endif /* CONFIG_PPC64 */
OFFSET(TASK_STACK, task_struct, stack);
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 3255c0840beb..aea22c7b891f 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -97,14 +97,11 @@ crit_transfer_to_handler:
mfspr   r0,SPRN_SRR1
stw r0,_SRR1(r11)
 
-   /* set the stack limit to the current stack
-* and set the limit to protect the thread_info
-* struct
-*/
+   /* set the stack limit to the current stack */
mfspr   r8,SPRN_SPRG_THREAD
lwz r0,KSP_LIMIT(r8)
stw r0,SAVED_KSP_LIMIT(r11)
-   rlwimi  r0,r1,0,0,(31-THREAD_SHIFT)
+   rlwinm  r0,r1,0,0,(31 - THREAD_SHIFT)
stw r0,KSP_LIMIT(r8)
/* fall through */
 #endif
@@ -121,14 +118,11 @@ crit_transfer_to_handler:
mfspr   r0,SPRN_SRR1
stw r0,crit_srr1@l(0)
 
-   /* set the stack limit to the current stack
-* and set the limit to protect the thread_info
-* struct
-*/
+   /* set the stack limit to the current stack */
mfspr   r8,SPRN_SPRG_THREAD
lwz r0,KSP_LIMIT(r8)
stw r0,saved_ksp_limit@l(0)
-   rlwimi  r0,r1,0,0,(31-THREAD_SHIFT)
+   rlwinm  r0,r1,0,0,(31 - THREAD_SHIFT)
stw r0,KSP_LIMIT(r8)
/* fall through */
 #endif
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 85c48911938a..938944c6e2ee 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -618,9 +618,8 @@ static inline void check_stack_overflow(void)
sp = current_stack_pointer() & (THREAD_SIZE-1);
 
/* check for stack overflow: is there less than 2KB free? */
-   if (unlikely(sp < (sizeof(struct thread_info) + 2048))) {
-   pr_err("do_IRQ: stack overflow: %ld\n",
-   sp - sizeof(struct thread_info));
+   if (unlikely(sp < 2048)) {
+   pr_err("do_IRQ: stack overflow: %ld\n", sp);
   

[PATCH v15 08/13] powerpc: Activate CONFIG_THREAD_INFO_IN_TASK

2019-01-31 Thread Christophe Leroy
This patch activates CONFIG_THREAD_INFO_IN_TASK which
moves the thread_info into task_struct.

Moving thread_info into task_struct has the following advantages:
- It protects thread_info from corruption in the case of stack
overflows.
- Its address is harder to determine if stack addresses are
leaked, making a number of attacks more difficult.

This has the following consequences:
- thread_info is now located at the beginning of task_struct.
- The 'cpu' field is now in task_struct, and only exists when
CONFIG_SMP is active.
- thread_info doesn't have anymore the 'task' field.

This patch:
- Removes all recopy of thread_info struct when the stack changes.
- Changes the CURRENT_THREAD_INFO() macro to point to current.
- Selects CONFIG_THREAD_INFO_IN_TASK.
- Modifies raw_smp_processor_id() to get ->cpu from current without
including linux/sched.h to avoid circular inclusion and without
including asm/asm-offsets.h to avoid symbol names duplication
between ASM constants and C constants.
- Modifies klp_init_thread_info() to take a task_struct pointer
argument.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/Makefile  |  7 +++
 arch/powerpc/include/asm/irq.h |  4 --
 arch/powerpc/include/asm/livepatch.h   |  6 +--
 arch/powerpc/include/asm/ptrace.h  |  2 +-
 arch/powerpc/include/asm/smp.h | 17 +++-
 arch/powerpc/include/asm/thread_info.h | 17 +---
 arch/powerpc/kernel/asm-offsets.c  |  7 ++-
 arch/powerpc/kernel/entry_32.S |  9 ++--
 arch/powerpc/kernel/exceptions-64e.S   | 11 -
 arch/powerpc/kernel/head_32.S  |  6 +--
 arch/powerpc/kernel/head_44x.S |  4 +-
 arch/powerpc/kernel/head_64.S  |  1 +
 arch/powerpc/kernel/head_booke.h   |  8 +---
 arch/powerpc/kernel/head_fsl_booke.S   |  7 ++-
 arch/powerpc/kernel/irq.c  | 79 +-
 arch/powerpc/kernel/kgdb.c | 28 
 arch/powerpc/kernel/machine_kexec_64.c |  6 +--
 arch/powerpc/kernel/process.c  |  2 +-
 arch/powerpc/kernel/setup-common.c |  2 +-
 arch/powerpc/kernel/setup_64.c | 21 -
 arch/powerpc/kernel/smp.c  |  2 +-
 arch/powerpc/net/bpf_jit32.h   |  5 +--
 23 files changed, 57 insertions(+), 195 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2890d36eb531..0a26e0075ce5 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -241,6 +241,7 @@ config PPC
select RTC_LIB
select SPARSE_IRQ
select SYSCTL_EXCEPTION_TRACE
+   select THREAD_INFO_IN_TASK
select VIRT_TO_BUS  if !PPC64
#
# Please keep this list sorted alphabetically.
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index ac033341ed55..53ffe935f3b0 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -427,6 +427,13 @@ else
 endif
 endif
 
+ifdef CONFIG_SMP
+prepare: task_cpu_prepare
+
+task_cpu_prepare: prepare0
+   $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TI_CPU") 
print $$3;}' include/generated/asm-offsets.h))
+endif
+
 # Check toolchain versions:
 # - gcc-4.6 is the minimum kernel-wide version so nothing required.
 checkbin:
diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index 2efbae8d93be..28a7ace0a1b9 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -51,9 +51,6 @@ struct pt_regs;
 extern struct thread_info *critirq_ctx[NR_CPUS];
 extern struct thread_info *dbgirq_ctx[NR_CPUS];
 extern struct thread_info *mcheckirq_ctx[NR_CPUS];
-extern void exc_lvl_ctx_init(void);
-#else
-#define exc_lvl_ctx_init()
 #endif
 
 /*
@@ -62,7 +59,6 @@ extern void exc_lvl_ctx_init(void);
 extern struct thread_info *hardirq_ctx[NR_CPUS];
 extern struct thread_info *softirq_ctx[NR_CPUS];
 
-extern void irq_ctx_init(void);
 void call_do_softirq(void *sp);
 void call_do_irq(struct pt_regs *regs, void *sp);
 extern void do_IRQ(struct pt_regs *regs);
diff --git a/arch/powerpc/include/asm/livepatch.h 
b/arch/powerpc/include/asm/livepatch.h
index 47a03b9b528b..7cb514865a28 100644
--- a/arch/powerpc/include/asm/livepatch.h
+++ b/arch/powerpc/include/asm/livepatch.h
@@ -43,13 +43,13 @@ static inline unsigned long 
klp_get_ftrace_location(unsigned long faddr)
return ftrace_location_range(faddr, faddr + 16);
 }
 
-static inline void klp_init_thread_info(struct thread_info *ti)
+static inline void klp_init_thread_info(struct task_struct *p)
 {
/* + 1 to account for STACK_END_MAGIC */
-   ti->livepatch_sp = (unsigned long *)(ti + 1) + 1;
+   task_thread_info(p)->livepatch_sp = end_of_stack(p) + 1;
 }
 #else
-static void klp_init_thread_info(struct thread_info *ti) { }
+static inline void klp_init_thread_info(struct task_struct *p) { }
 #endif /* CONFIG_LIVEPATCH */
 
 #endif /* _ASM_POWERPC_LIVEPATCH_H */
diff --git 

[PATCH v15 07/13] powerpc: Prepare for moving thread_info into task_struct

2019-01-31 Thread Christophe Leroy
This patch cleans the powerpc kernel before activating
CONFIG_THREAD_INFO_IN_TASK:
- The purpose of the pointer given to call_do_softirq() and
call_do_irq() is to point the new stack ==> change it to void* and
rename it 'sp'
- Don't use CURRENT_THREAD_INFO() to locate the stack.
- Fix a few comments.
- Replace current_thread_info()->task by current
- Makes TASK_STACK available to PPC64. PPC64 will need it to get the
stack pointer from current once the thread_info have been moved.
- In idle_6xx.S, make sure CURRENT_THREAD_INFO() is used with r1
which is the virtual address of the stack, in order to ease the switch
to r2 as we have no register having the phys address of current.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/irq.h   | 4 ++--
 arch/powerpc/include/asm/processor.h | 4 ++--
 arch/powerpc/include/asm/reg.h   | 2 +-
 arch/powerpc/kernel/asm-offsets.c| 2 +-
 arch/powerpc/kernel/entry_64.S   | 2 +-
 arch/powerpc/kernel/head_32.S| 2 +-
 arch/powerpc/kernel/head_44x.S   | 2 +-
 arch/powerpc/kernel/head_fsl_booke.S | 2 +-
 arch/powerpc/kernel/idle_6xx.S   | 3 ++-
 arch/powerpc/kernel/irq.c| 2 +-
 arch/powerpc/kernel/misc_32.S| 4 ++--
 arch/powerpc/kernel/process.c| 6 +++---
 arch/powerpc/kernel/smp.c| 4 +++-
 13 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index ee39ce56b2a2..2efbae8d93be 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -63,8 +63,8 @@ extern struct thread_info *hardirq_ctx[NR_CPUS];
 extern struct thread_info *softirq_ctx[NR_CPUS];
 
 extern void irq_ctx_init(void);
-extern void call_do_softirq(struct thread_info *tp);
-extern void call_do_irq(struct pt_regs *regs, struct thread_info *tp);
+void call_do_softirq(void *sp);
+void call_do_irq(struct pt_regs *regs, void *sp);
 extern void do_IRQ(struct pt_regs *regs);
 extern void __init init_IRQ(void);
 extern void __do_irq(struct pt_regs *regs);
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 692f7383d461..15acb282a876 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -40,7 +40,7 @@
 
 #ifndef __ASSEMBLY__
 #include 
-#include 
+#include 
 #include 
 #include 
 
@@ -326,7 +326,7 @@ struct thread_struct {
 
 #define INIT_SP(sizeof(init_stack) + (unsigned long) 
_stack)
 #define INIT_SP_LIMIT \
-   (_ALIGN_UP(sizeof(init_thread_info), 16) + (unsigned long) _stack)
+   (_ALIGN_UP(sizeof(struct thread_info), 16) + (unsigned long)_stack)
 
 #ifdef CONFIG_SPE
 #define SPEFSCR_INIT \
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 1c98ef1f2d5b..581e61db2dcf 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1062,7 +1062,7 @@
  * - SPRG9 debug exception scratch
  *
  * All 32-bit:
- * - SPRG3 current thread_info pointer
+ * - SPRG3 current thread_struct physical addr pointer
  *(virtual on BookE, physical on others)
  *
  * 32-bit classic:
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 23456ba3410a..b2b52e002a76 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -90,10 +90,10 @@ int main(void)
DEFINE(SIGSEGV, SIGSEGV);
DEFINE(NMI_MASK, NMI_MASK);
 #else
-   OFFSET(TASK_STACK, task_struct, stack);
DEFINE(THREAD_INFO_GAP, _ALIGN_UP(sizeof(struct thread_info), 16));
OFFSET(KSP_LIMIT, thread_struct, ksp_limit);
 #endif /* CONFIG_PPC64 */
+   OFFSET(TASK_STACK, task_struct, stack);
 
 #ifdef CONFIG_LIVEPATCH
OFFSET(TI_livepatch_sp, thread_info, livepatch_sp);
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 435927f549c4..01d0706d873f 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -695,7 +695,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 2:
 #endif /* CONFIG_PPC_BOOK3S_64 */
 
-   CURRENT_THREAD_INFO(r7, r8)  /* base of new stack */
+   clrrdi  r7, r8, THREAD_SHIFT/* base of new stack */
/* Note: this uses SWITCH_FRAME_SIZE rather than INT_FRAME_SIZE
   because we don't need to leave the 288-byte ABI gap at the
   top of the kernel stack. */
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 9268e5e87949..8282d25948ae 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -845,7 +845,7 @@ __secondary_start:
bl  init_idle_6xx
 #endif /* CONFIG_PPC_BOOK3S_32 */
 
-   /* get current_thread_info and current */
+   /* get current's stack and current */
lis r1,secondary_ti@ha
tophys(r1,r1)
lwz r1,secondary_ti@l(r1)
diff --git a/arch/powerpc/kernel/head_44x.S 

[PATCH v15 06/13] powerpc: Rename THREAD_INFO to TASK_STACK

2019-01-31 Thread Christophe Leroy
This patch renames THREAD_INFO to TASK_STACK, because it is in fact
the offset of the pointer to the stack in task_struct so this pointer
will not be impacted by the move of THREAD_INFO.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/kernel/asm-offsets.c| 2 +-
 arch/powerpc/kernel/entry_32.S   | 2 +-
 arch/powerpc/kernel/head_32.S| 2 +-
 arch/powerpc/kernel/head_40x.S   | 4 ++--
 arch/powerpc/kernel/head_8xx.S   | 2 +-
 arch/powerpc/kernel/head_booke.h | 4 ++--
 arch/powerpc/kernel/head_fsl_booke.S | 2 +-
 7 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 9ffc72ded73a..23456ba3410a 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -90,7 +90,7 @@ int main(void)
DEFINE(SIGSEGV, SIGSEGV);
DEFINE(NMI_MASK, NMI_MASK);
 #else
-   OFFSET(THREAD_INFO, task_struct, stack);
+   OFFSET(TASK_STACK, task_struct, stack);
DEFINE(THREAD_INFO_GAP, _ALIGN_UP(sizeof(struct thread_info), 16));
OFFSET(KSP_LIMIT, thread_struct, ksp_limit);
 #endif /* CONFIG_PPC64 */
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index d4c6186aa7e8..f1646d845404 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -1168,7 +1168,7 @@ ret_from_debug_exc:
mfspr   r9,SPRN_SPRG_THREAD
lwz r10,SAVED_KSP_LIMIT(r1)
stw r10,KSP_LIMIT(r9)
-   lwz r9,THREAD_INFO-THREAD(r9)
+   lwz r9,TASK_STACK-THREAD(r9)
CURRENT_THREAD_INFO(r10, r1)
lwz r10,TI_PREEMPT(r10)
stw r10,TI_PREEMPT(r9)
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 05b08db3901d..9268e5e87949 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -261,7 +261,7 @@ __secondary_hold_acknowledge:
tophys(r11,r1); /* use tophys(r1) if kernel */ \
beq 1f; \
mfspr   r11,SPRN_SPRG_THREAD;   \
-   lwz r11,THREAD_INFO-THREAD(r11);\
+   lwz r11,TASK_STACK-THREAD(r11); \
addir11,r11,THREAD_SIZE;\
tophys(r11,r11);\
 1: subir11,r11,INT_FRAME_SIZE  /* alloc exc. frame */
diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
index b19d78410511..3088c9f29f5e 100644
--- a/arch/powerpc/kernel/head_40x.S
+++ b/arch/powerpc/kernel/head_40x.S
@@ -115,7 +115,7 @@ _ENTRY(saved_ksp_limit)
andi.   r11,r11,MSR_PR;  \
beq 1f;  \
mfspr   r1,SPRN_SPRG_THREAD;/* if from user, start at top of   */\
-   lwz r1,THREAD_INFO-THREAD(r1); /* this thread's kernel stack   */\
+   lwz r1,TASK_STACK-THREAD(r1); /* this thread's kernel stack   */\
addir1,r1,THREAD_SIZE;   \
 1: subir1,r1,INT_FRAME_SIZE;   /* Allocate an exception frame */\
tophys(r11,r1);  \
@@ -158,7 +158,7 @@ _ENTRY(saved_ksp_limit)
beq 1f;  \
/* COMING FROM USER MODE */  \
mfspr   r11,SPRN_SPRG_THREAD;   /* if from user, start at top of   */\
-   lwz r11,THREAD_INFO-THREAD(r11); /* this thread's kernel stack */\
+   lwz r11,TASK_STACK-THREAD(r11); /* this thread's kernel stack */\
 1: addir11,r11,THREAD_SIZE-INT_FRAME_SIZE; /* Alloc an excpt frm  */\
tophys(r11,r11); \
stw r10,_CCR(r11);  /* save various registers  */\
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 20cc816b3508..ca9207013579 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -142,7 +142,7 @@ instruction_counter:
tophys(r11,r1); /* use tophys(r1) if kernel */ \
beq 1f; \
mfspr   r11,SPRN_SPRG_THREAD;   \
-   lwz r11,THREAD_INFO-THREAD(r11);\
+   lwz r11,TASK_STACK-THREAD(r11); \
addir11,r11,THREAD_SIZE;\
tophys(r11,r11);\
 1: subir11,r11,INT_FRAME_SIZE  /* alloc exc. frame */
diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
index 306e26c073a0..69e80e6d0d16 100644
--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ -55,7 +55,7 @@ END_BTB_FLUSH_SECTION
beq 1f;  \
BOOKE_CLEAR_BTB(r11)\
/* if from user, start at top of this thread's kernel stack */   \
-   lwz 

[PATCH v15 05/13] powerpc: prep stack walkers for THREAD_INFO_IN_TASK

2019-01-31 Thread Christophe Leroy
[text copied from commit 9bbd4c56b0b6
("arm64: prep stack walkers for THREAD_INFO_IN_TASK")]

When CONFIG_THREAD_INFO_IN_TASK is selected, task stacks may be freed
before a task is destroyed. To account for this, the stacks are
refcounted, and when manipulating the stack of another task, it is
necessary to get/put the stack to ensure it isn't freed and/or re-used
while we do so.

This patch reworks the powerpc stack walking code to account for this.
When CONFIG_THREAD_INFO_IN_TASK is not selected these perform no
refcounting, and this should only be a structural change that does not
affect behaviour.

Acked-by: Mark Rutland 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/process.c| 23 +--
 arch/powerpc/kernel/stacktrace.c | 29 ++---
 2 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index ce393df243aa..4ffbb677c9f5 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2027,7 +2027,7 @@ int validate_sp(unsigned long sp, struct task_struct *p,
 
 EXPORT_SYMBOL(validate_sp);
 
-unsigned long get_wchan(struct task_struct *p)
+static unsigned long __get_wchan(struct task_struct *p)
 {
unsigned long ip, sp;
int count = 0;
@@ -2053,6 +2053,20 @@ unsigned long get_wchan(struct task_struct *p)
return 0;
 }
 
+unsigned long get_wchan(struct task_struct *p)
+{
+   unsigned long ret;
+
+   if (!try_get_task_stack(p))
+   return 0;
+
+   ret = __get_wchan(p);
+
+   put_task_stack(p);
+
+   return ret;
+}
+
 static int kstack_depth_to_print = CONFIG_PRINT_STACK_DEPTH;
 
 void show_stack(struct task_struct *tsk, unsigned long *stack)
@@ -2067,6 +2081,9 @@ void show_stack(struct task_struct *tsk, unsigned long 
*stack)
int curr_frame = 0;
 #endif
 
+   if (!try_get_task_stack(tsk))
+   return;
+
sp = (unsigned long) stack;
if (tsk == NULL)
tsk = current;
@@ -2081,7 +2098,7 @@ void show_stack(struct task_struct *tsk, unsigned long 
*stack)
printk("Call Trace:\n");
do {
if (!validate_sp(sp, tsk, STACK_FRAME_OVERHEAD))
-   return;
+   break;
 
stack = (unsigned long *) sp;
newsp = stack[0];
@@ -2121,6 +2138,8 @@ void show_stack(struct task_struct *tsk, unsigned long 
*stack)
 
sp = newsp;
} while (count++ < kstack_depth_to_print);
+
+   put_task_stack(tsk);
 }
 
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
index e2c50b55138f..f80e1129c0f2 100644
--- a/arch/powerpc/kernel/stacktrace.c
+++ b/arch/powerpc/kernel/stacktrace.c
@@ -67,12 +67,17 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct 
stack_trace *trace)
 {
unsigned long sp;
 
+   if (!try_get_task_stack(tsk))
+   return;
+
if (tsk == current)
sp = current_stack_pointer();
else
sp = tsk->thread.ksp;
 
save_context_stack(trace, sp, tsk, 0);
+
+   put_task_stack(tsk);
 }
 EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
 
@@ -84,9 +89,8 @@ save_stack_trace_regs(struct pt_regs *regs, struct 
stack_trace *trace)
 EXPORT_SYMBOL_GPL(save_stack_trace_regs);
 
 #ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
-int
-save_stack_trace_tsk_reliable(struct task_struct *tsk,
-   struct stack_trace *trace)
+static int __save_stack_trace_tsk_reliable(struct task_struct *tsk,
+  struct stack_trace *trace)
 {
unsigned long sp;
unsigned long stack_page = (unsigned long)task_stack_page(tsk);
@@ -193,6 +197,25 @@ save_stack_trace_tsk_reliable(struct task_struct *tsk,
}
return 0;
 }
+
+int save_stack_trace_tsk_reliable(struct task_struct *tsk,
+ struct stack_trace *trace)
+{
+   int ret;
+
+   /*
+* If the task doesn't have a stack (e.g., a zombie), the stack is
+* "reliably" empty.
+*/
+   if (!try_get_task_stack(tsk))
+   return 0;
+
+   ret = __save_stack_trace_tsk_reliable(tsk, trace);
+
+   put_task_stack(tsk);
+
+   return ret;
+}
 EXPORT_SYMBOL_GPL(save_stack_trace_tsk_reliable);
 #endif /* CONFIG_HAVE_RELIABLE_STACKTRACE */
 
-- 
2.13.3



[PATCH v15 04/13] powerpc: Only use task_struct 'cpu' field on SMP

2019-01-31 Thread Christophe Leroy
When moving to CONFIG_THREAD_INFO_IN_TASK, the thread_info 'cpu' field
gets moved into task_struct and only defined when CONFIG_SMP is set.

This patch ensures that TI_CPU is only used when CONFIG_SMP is set and
that task_struct 'cpu' field is not used directly out of SMP code.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/kernel/head_fsl_booke.S | 2 ++
 arch/powerpc/kernel/misc_32.S| 4 
 arch/powerpc/xmon/xmon.c | 2 +-
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/head_fsl_booke.S 
b/arch/powerpc/kernel/head_fsl_booke.S
index 2386ce2a9c6e..2c21e8642a00 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -243,8 +243,10 @@ set_ivor:
li  r0,0
stwur0,THREAD_SIZE-STACK_FRAME_OVERHEAD(r1)
 
+#ifdef CONFIG_SMP
CURRENT_THREAD_INFO(r22, r1)
stw r24, TI_CPU(r22)
+#endif
 
bl  early_init
 
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index 57d2ffb2d45c..02b8cdd73792 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -183,10 +183,14 @@ _GLOBAL(low_choose_750fx_pll)
or  r4,r4,r5
mtspr   SPRN_HID1,r4
 
+#ifdef CONFIG_SMP
/* Store new HID1 image */
CURRENT_THREAD_INFO(r6, r1)
lwz r6,TI_CPU(r6)
slwir6,r6,2
+#else
+   li  r6, 0
+#endif
addis   r6,r6,nap_save_hid1@ha
stw r4,nap_save_hid1@l(r6)
 
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 757b8499aba2..a0f44f992360 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2997,7 +2997,7 @@ static void show_task(struct task_struct *tsk)
printf("%px %016lx %6d %6d %c %2d %s\n", tsk,
tsk->thread.ksp,
tsk->pid, rcu_dereference(tsk->parent)->pid,
-   state, task_thread_info(tsk)->cpu,
+   state, task_cpu(tsk),
tsk->comm);
 }
 
-- 
2.13.3



[PATCH v15 03/13] book3s/64: avoid circular header inclusion in mmu-hash.h

2019-01-31 Thread Christophe Leroy
When activating CONFIG_THREAD_INFO_IN_TASK, linux/sched.h
includes asm/current.h. This generates a circular dependency.
To avoid that, asm/processor.h shall not be included in mmu-hash.h

In order to do that, this patch moves into a new header called
asm/task_size_user64.h the information from asm/processor.h required
by mmu-hash.h

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  2 +-
 arch/powerpc/include/asm/processor.h  | 34 +-
 arch/powerpc/include/asm/task_size_user64.h   | 42 +++
 arch/powerpc/kvm/book3s_hv_hmi.c  |  1 +
 4 files changed, 45 insertions(+), 34 deletions(-)
 create mode 100644 arch/powerpc/include/asm/task_size_user64.h

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 12e522807f9f..b2aba048301e 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -23,7 +23,7 @@
  */
 #include 
 #include 
-#include 
+#include 
 #include 
 
 /*
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index ee58526cb6c2..692f7383d461 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -95,40 +95,8 @@ void release_thread(struct task_struct *);
 #endif
 
 #ifdef CONFIG_PPC64
-/*
- * 64-bit user address space can have multiple limits
- * For now supported values are:
- */
-#define TASK_SIZE_64TB  (0x4000UL)
-#define TASK_SIZE_128TB (0x8000UL)
-#define TASK_SIZE_512TB (0x0002UL)
-#define TASK_SIZE_1PB   (0x0004UL)
-#define TASK_SIZE_2PB   (0x0008UL)
-/*
- * With 52 bits in the address we can support
- * upto 4PB of range.
- */
-#define TASK_SIZE_4PB   (0x0010UL)
 
-/*
- * For now 512TB is only supported with book3s and 64K linux page size.
- */
-#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_64K_PAGES)
-/*
- * Max value currently used:
- */
-#define TASK_SIZE_USER64   TASK_SIZE_4PB
-#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_128TB
-#define TASK_CONTEXT_SIZE  TASK_SIZE_512TB
-#else
-#define TASK_SIZE_USER64   TASK_SIZE_64TB
-#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_64TB
-/*
- * We don't need to allocate extended context ids for 4K page size, because
- * we limit the max effective address on this config to 64TB.
- */
-#define TASK_CONTEXT_SIZE  TASK_SIZE_64TB
-#endif
+#include 
 
 /*
  * 32-bit user address space is 4GB - 1 page
diff --git a/arch/powerpc/include/asm/task_size_user64.h 
b/arch/powerpc/include/asm/task_size_user64.h
new file mode 100644
index ..a4043075864b
--- /dev/null
+++ b/arch/powerpc/include/asm/task_size_user64.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_TASK_SIZE_USER64_H
+#define _ASM_POWERPC_TASK_SIZE_USER64_H
+
+#ifdef CONFIG_PPC64
+/*
+ * 64-bit user address space can have multiple limits
+ * For now supported values are:
+ */
+#define TASK_SIZE_64TB  (0x4000UL)
+#define TASK_SIZE_128TB (0x8000UL)
+#define TASK_SIZE_512TB (0x0002UL)
+#define TASK_SIZE_1PB   (0x0004UL)
+#define TASK_SIZE_2PB   (0x0008UL)
+/*
+ * With 52 bits in the address we can support
+ * upto 4PB of range.
+ */
+#define TASK_SIZE_4PB   (0x0010UL)
+
+/*
+ * For now 512TB is only supported with book3s and 64K linux page size.
+ */
+#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_64K_PAGES)
+/*
+ * Max value currently used:
+ */
+#define TASK_SIZE_USER64   TASK_SIZE_4PB
+#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_128TB
+#define TASK_CONTEXT_SIZE  TASK_SIZE_512TB
+#else
+#define TASK_SIZE_USER64   TASK_SIZE_64TB
+#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_64TB
+/*
+ * We don't need to allocate extended context ids for 4K page size, because
+ * we limit the max effective address on this config to 64TB.
+ */
+#define TASK_CONTEXT_SIZE  TASK_SIZE_64TB
+#endif
+
+#endif /* CONFIG_PPC64 */
+#endif /* _ASM_POWERPC_TASK_SIZE_USER64_H */
diff --git a/arch/powerpc/kvm/book3s_hv_hmi.c b/arch/powerpc/kvm/book3s_hv_hmi.c
index e3f738eb1cac..64b5011475c7 100644
--- a/arch/powerpc/kvm/book3s_hv_hmi.c
+++ b/arch/powerpc/kvm/book3s_hv_hmi.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 void wait_for_subcore_guest_exit(void)
 {
-- 
2.13.3



[PATCH v15 02/13] powerpc/32: Fix CONFIG_VIRT_CPU_ACCOUNTING_NATIVE for 40x/booke

2019-01-31 Thread Christophe Leroy
40x/booke have another path to reach 3f from transfer_to_handler,
make sure it also calls ACCOUNT_CPU_USER_ENTRY() when
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is selected.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/entry_32.S | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 0768dfd8a64e..d4c6186aa7e8 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -166,6 +166,13 @@ transfer_to_handler:
   internal debug mode bit to do this. */
lwz r12,THREAD_DBCR0(r12)
andis.  r12,r12,DBCR0_IDM@h
+#endif
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
+   CURRENT_THREAD_INFO(r9, r1)
+   tophys(r9, r9)
+   ACCOUNT_CPU_USER_ENTRY(r9, r11, r12)
+#endif
+#if defined(CONFIG_40x) || defined(CONFIG_BOOKE)
beq+3f
/* From user and task is ptraced - load up global dbcr0 */
li  r12,-1  /* clear all pending debug events */
@@ -185,11 +192,6 @@ transfer_to_handler:
addir12,r12,-1
stw r12,4(r11)
 #endif
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
-   CURRENT_THREAD_INFO(r9, r1)
-   tophys(r9, r9)
-   ACCOUNT_CPU_USER_ENTRY(r9, r11, r12)
-#endif
 
b   3f
 
-- 
2.13.3



[PATCH v15 00/13] powerpc: Switch to CONFIG_THREAD_INFO_IN_TASK

2019-01-31 Thread Christophe Leroy
The purpose of this serie is to activate CONFIG_THREAD_INFO_IN_TASK which
moves the thread_info into task_struct.

Moving thread_info into task_struct has the following advantages:
- It protects thread_info from corruption in the case of stack
overflows.
- Its address is harder to determine if stack addresses are
leaked, making a number of attacks more difficult.

Changes in v15:
 - switched patch 1 and 2.
 - resync patch 1 with linux/next. As memblock modifications are now fully 
merged in
 linux-mm tree, this patch voids as soon as linux-mm gets merged into 
powerpc/merge branch
 - Fixed build failure on 64le due to call to __save_stack_trace_tsk_reliable() 
(patch 5)
 - Taken the renaming of THREAD_INFO to TASK_STACK out of the preparation patch 
to ease review (hence new patch 6)
 - Fixed one place where r11 (physical address of stack) was used instead of r1 
to locate
 thread_info, inducing a bug when switching to r2 which is virtual address of 
current (patch 7)
 - Keeping physical address of current in r2 until MMU translation is 
reactivated (patch 11)

Changes in v14 (ie since v13):
 - Added in front a fixup patch which conflicts with this serie
 - Added a patch for using try_get_task_stack()/put_task_stack() in stack 
walkers.
 - Fixed compilation failure in the preparation patch (by moving the 
modification
 of klp_init_thread_info() to the following patch)

Changes since v12:
 - Patch 1: Taken comment from Mike (re-introduced the 'panic' in case memblock 
allocation fails in setup_64.c
 - Patch 1: Added alloc_stack() function in setup_32.c to also panic in case of 
allocation failure.

Changes since v11:
 - Rebased on 81775f5563fa ("Automatic merge of branches 'master', 'next' and 
'fixes' into merge")
 - Added a first patch to change memblock allocs to functions returning virtual 
addrs. This removes
   the memset() which were the only remaining stuff in irq_ctx_init() and 
exc_lvl_ctx_init() at the end.
 - dropping irq_ctx_init() and exc_lvl_ctx_init() in patch 5 (powerpc: Activate 
CONFIG_THREAD_INFO_IN_TASK)
 - A few cosmetic changes in commit log and code.

Changes since v10:
 - Rebased on 21622a0d2023 ("Automatic merge of branches 'master', 'next' and 
'fixes' into merge")
  ==> Fixed conflict in setup_32.S

Changes since v9:
 - Rebased on 183cbf93be88 ("Automatic merge of branches 'master', 'next' and 
'fixes' into merge")
  ==> Fixed conflict on xmon

Changes since v8:
 - Rebased on e589b79e40d9 ("Automatic merge of branches 'master', 'next' and 
'fixes' into merge")
  ==> Main impact was conflicts due to commit 9a8dd708d547 ("memblock: rename 
memblock_alloc{_nid,_try_nid} to memblock_phys_alloc*")

Changes since v7:
 - Rebased on fb6c6ce7907d ("Automatic merge of branches 'master', 'next' and 
'fixes' into merge")

Changes since v6:
 - Fixed validate_sp() to exclude NULL sp in 'regain entire stack space' patch 
(early crash with CONFIG_KMEMLEAK)

Changes since v5:
 - Fixed livepatch_sp setup by using end_of_stack() instead of hardcoding
 - Fixed PPC_BPF_LOAD_CPU() macro

Changes since v4:
 - Fixed a build failure on 32bits SMP when include/generated/asm-offsets.h is 
not
 already existing, was due to spaces instead of a tab in the Makefile

Changes since RFC v3: (based on Nick's review)
 - Renamed task_size.h to task_size_user64.h to better relate to what it 
contains.
 - Handling of the isolation of thread_info cpu field inside CONFIG_SMP #ifdefs 
moved to a separate patch.
 - Removed CURRENT_THREAD_INFO macro completely.
 - Added a guard in asm/smp.h to avoid build failure before _TASK_CPU is 
defined.
 - Added a patch at the end to rename 'tp' pointers to 'sp' pointers
 - Renamed 'tp' into 'sp' pointers in preparation patch when relevant
 - Fixed a few commit logs
 - Fixed checkpatch report.

Changes since RFC v2:
 - Removed the modification of names in asm-offsets
 - Created a rule in arch/powerpc/Makefile to append the offset of current->cpu 
in CFLAGS
 - Modified asm/smp.h to use the offset set in CFLAGS
 - Squashed the renaming of THREAD_INFO to TASK_STACK in the preparation patch
 - Moved the modification of current_pt_regs in the patch activating 
CONFIG_THREAD_INFO_IN_TASK

Changes since RFC v1:
 - Removed the first patch which was modifying header inclusion order in timer
 - Modified some names in asm-offsets to avoid conflicts when including 
asm-offsets in C files
 - Modified asm/smp.h to avoid having to include linux/sched.h (using 
asm-offsets instead)
 - Moved some changes from the activation patch to the preparation patch.

Christophe Leroy (13):
  powerpc/irq: use memblock functions returning virtual address
  powerpc/32: Fix CONFIG_VIRT_CPU_ACCOUNTING_NATIVE for 40x/booke
  book3s/64: avoid circular header inclusion in mmu-hash.h
  powerpc: Only use task_struct 'cpu' field on SMP
  powerpc: prep stack walkers for THREAD_INFO_IN_TASK
  powerpc: Rename THREAD_INFO to TASK_STACK
  powerpc: Prepare for moving thread_info into task_struct
  powerpc: Activate 

[PATCH v15 01/13] powerpc/irq: use memblock functions returning virtual address

2019-01-31 Thread Christophe Leroy
Since only the virtual address of allocated blocks is used,
lets use functions returning directly virtual address.

Those functions have the advantage of also zeroing the block.

Suggested-by: Mike Rapoport 
Acked-by: Mike Rapoport 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/irq.c  |  5 -
 arch/powerpc/kernel/setup_32.c | 26 --
 arch/powerpc/kernel/setup_64.c | 19 +++
 3 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index bb299613a462..4a5dd8800946 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -725,18 +725,15 @@ void exc_lvl_ctx_init(void)
 #endif
 #endif
 
-   memset((void *)critirq_ctx[cpu_nr], 0, THREAD_SIZE);
tp = critirq_ctx[cpu_nr];
tp->cpu = cpu_nr;
tp->preempt_count = 0;
 
 #ifdef CONFIG_BOOKE
-   memset((void *)dbgirq_ctx[cpu_nr], 0, THREAD_SIZE);
tp = dbgirq_ctx[cpu_nr];
tp->cpu = cpu_nr;
tp->preempt_count = 0;
 
-   memset((void *)mcheckirq_ctx[cpu_nr], 0, THREAD_SIZE);
tp = mcheckirq_ctx[cpu_nr];
tp->cpu = cpu_nr;
tp->preempt_count = HARDIRQ_OFFSET;
@@ -754,12 +751,10 @@ void irq_ctx_init(void)
int i;
 
for_each_possible_cpu(i) {
-   memset((void *)softirq_ctx[i], 0, THREAD_SIZE);
tp = softirq_ctx[i];
tp->cpu = i;
klp_init_thread_info(tp);
 
-   memset((void *)hardirq_ctx[i], 0, THREAD_SIZE);
tp = hardirq_ctx[i];
tp->cpu = i;
klp_init_thread_info(tp);
diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index 947f904688b0..1f0b7629c1a6 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -196,6 +196,17 @@ static int __init ppc_init(void)
 }
 arch_initcall(ppc_init);
 
+static void *__init alloc_stack(void)
+{
+   void *ptr = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
+
+   if (!ptr)
+   panic("cannot allocate %d bytes for stack at %pS\n",
+ THREAD_SIZE, (void *)_RET_IP_);
+
+   return ptr;
+}
+
 void __init irqstack_early_init(void)
 {
unsigned int i;
@@ -203,10 +214,8 @@ void __init irqstack_early_init(void)
/* interrupt stacks must be in lowmem, we get that for free on ppc32
 * as the memblock is limited to lowmem by default */
for_each_possible_cpu(i) {
-   softirq_ctx[i] = (struct thread_info *)
-   __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
-   hardirq_ctx[i] = (struct thread_info *)
-   __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
+   softirq_ctx[i] = alloc_stack();
+   hardirq_ctx[i] = alloc_stack();
}
 }
 
@@ -224,13 +233,10 @@ void __init exc_lvl_early_init(void)
hw_cpu = 0;
 #endif
 
-   critirq_ctx[hw_cpu] = (struct thread_info *)
-   __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
+   critirq_ctx[hw_cpu] = alloc_stack();
 #ifdef CONFIG_BOOKE
-   dbgirq_ctx[hw_cpu] = (struct thread_info *)
-   __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
-   mcheckirq_ctx[hw_cpu] = (struct thread_info *)
-   __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
+   dbgirq_ctx[hw_cpu] = alloc_stack();
+   mcheckirq_ctx[hw_cpu] = alloc_stack();
 #endif
}
 }
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 236c1151a3a7..080dd515d587 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -634,19 +634,17 @@ __init u64 ppc64_bolted_size(void)
 
 static void *__init alloc_stack(unsigned long limit, int cpu)
 {
-   unsigned long pa;
+   void *ptr;
 
BUILD_BUG_ON(STACK_INT_FRAME_SIZE % 16);
 
-   pa = memblock_alloc_base_nid(THREAD_SIZE, THREAD_SIZE, limit,
-   early_cpu_to_node(cpu), MEMBLOCK_NONE);
-   if (!pa) {
-   pa = memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit);
-   if (!pa)
-   panic("cannot allocate stacks");
-   }
+   ptr = memblock_alloc_try_nid(THREAD_SIZE, THREAD_SIZE,
+MEMBLOCK_LOW_LIMIT, limit,
+early_cpu_to_node(cpu));
+   if (!ptr)
+   panic("cannot allocate stacks");
 
-   return __va(pa);
+   return ptr;
 }
 
 void __init irqstack_early_init(void)
@@ -739,20 +737,17 @@ void __init emergency_stack_init(void)
struct thread_info *ti;
 
ti = alloc_stack(limit, i);
-   memset(ti, 0, THREAD_SIZE);
 

Re: linux-next: powerpc le qemu boot failure after merge of the akpm tree

2019-01-31 Thread Stephen Rothwell
Hi Mike,

On Thu, 31 Jan 2019 09:40:18 +0200 Mike Rapoport  wrote:
>
> Andrew, can you please add the below patch to as a fixup to "treewide: add
> checks for the return value of memblock_alloc*()"?

I have added that to linux-next for tomorrow (in case Andrew doesn't
get to it).

> From 854f54b9d4fe52f477765b905a4b2c421d30f46e Mon Sep 17 00:00:00 2001
> From: Mike Rapoport 
> Date: Thu, 31 Jan 2019 09:18:50 +0200
> Subject: [PATCH] mm/sparse: don't panic if the allocation in
>  sparse_buffer_init fails

Thanks all for the quick resolution.

-- 
Cheers,
Stephen Rothwell


pgp6EaFEzzD09.pgp
Description: OpenPGP digital signature