Re: [PATCH 2/2] powerpc: Show PAGE_SIZE in __die() output

2019-01-08 Thread Christophe Leroy




Le 08/01/2019 à 13:21, Christophe Leroy a écrit :



Le 08/01/2019 à 13:05, Michael Ellerman a écrit :

The page size the kernel is built with is useful info when debugging a
crash, so add it to the output in __die().

Result looks like eg:

   kernel BUG at drivers/misc/lkdtm/bugs.c:63!
   Oops: Exception in kernel mode, sig: 5 [#1]
   LE PAGE_SIZE=64K SMP NR_CPUS=2048 NUMA pSeries
   Modules linked in: vmx_crypto kvm binfmt_misc ip_tables

Signed-off-by: Michael Ellerman 
---
  arch/powerpc/kernel/traps.c | 12 
  1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 431a86d3f772..fc972e4eee5f 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -268,6 +268,18 @@ static int __die(const char *str, struct pt_regs 
*regs, long err)

  else
  seq_buf_puts(, "BE ");
+    seq_buf_puts(, "PAGE_SIZE=");
+    if (IS_ENABLED(CONFIG_PPC_4K_PAGES))
+    seq_buf_puts(, "4K ");
+    else if (IS_ENABLED(CONFIG_PPC_16K_PAGES))
+    seq_buf_puts(, "16K ");
+    else if (IS_ENABLED(CONFIG_PPC_64K_PAGES))
+    seq_buf_puts(, "64K ");
+    else if (IS_ENABLED(CONFIG_PPC_256K_PAGES))
+    seq_buf_puts(, "256K ");


Can't we build all the above at once using PAGE_SHIFT ?

Something like (untested):

"%dK ", 1 << (PAGE_SHIFT - 10)


Or even simplier:

"%dK ", PAGE_SIZE / 1024

Christophe




Christophe


+    else
+    BUILD_BUG_ON(1);
+
  if (IS_ENABLED(CONFIG_PREEMPT))
  seq_buf_puts(, "PREEMPT ");



Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]

2019-01-08 Thread Benjamin Herrenschmidt
On Wed, 2019-01-09 at 17:32 +1100, Alexey Kardashevskiy wrote:
> I have just moved the "Mellanox Technologies MT27700 Family
> [ConnectX-4]" from garrison to firestone machine and there it does not
> produce an EEH, with the same kernel and skiboot (both upstream + my
> debug). Hm. I cannot really blame the card but I cannot see what could
> cause the difference in skiboot either. I even tried disabling NPU so
> garrison would look like firestone, still EEH'ing.

The systems have a different chip though, firestone is P8 and garrison
is P8', which a slightly different PHB revision. Worth checking if we
have anything significantly different in our inits and poke at the HW
guys.

BTW. Are the cards behind a switch in either case ?

Cheers,
Ben.




Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]

2019-01-08 Thread Benjamin Herrenschmidt
On Wed, 2019-01-09 at 15:53 +1100, Alexey Kardashevskiy wrote:
> "A PCI completion timeout occurred for an outstanding PCI-E transaction"
> it is.
> 
> This is how I bind the device to vfio:
> 
> echo vfio-pci > '/sys/bus/pci/devices/:01:00.0/driver_override'
> echo vfio-pci > '/sys/bus/pci/devices/:01:00.1/driver_override'
> echo ':01:00.0' > '/sys/bus/pci/devices/:01:00.0/driver/unbind'
> echo ':01:00.1' > '/sys/bus/pci/devices/:01:00.1/driver/unbind'
> echo ':01:00.0' > /sys/bus/pci/drivers/vfio-pci/bind
> echo ':01:00.1' > /sys/bus/pci/drivers/vfio-pci/bind
> 
> 
> and I noticed that EEH only happens with the last command. The order
> (.0,.1  or .1,.0) does not matter, it seems that putting one function to
> D3 is fine but putting another one when the first one is already in D3 -
> produces EEH. And I do not recall ever seeing this on the firestone
> machine. Weird.

Putting all functions into D3 is what allows the device to actually go
into D3.

Does it work with other devices ? We do have that bug on early P9
revisions where the attempt of bringing the link to L1 as part of the
D3 process fails in horrible ways, I thought P8 would be ok but maybe
not ...

Otherwise, it might be that our timeouts are too low (you may want to
talk to our PCIe guys internally)

Cheers,
Ben.




Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]

2019-01-08 Thread Alexey Kardashevskiy



On 09/01/2019 16:30, David Gibson wrote:
> On Wed, Jan 09, 2019 at 04:09:02PM +1100, Benjamin Herrenschmidt wrote:
>> On Mon, 2019-01-07 at 21:01 -0700, Jason Gunthorpe wrote:
>>>
 In a very cryptic way that requires manual parsing using non-public
 docs sadly but yes. From the look of it, it's a completion timeout.

 Looks to me like we don't get a response to a config space access
 during the change of D state. I don't know if it's the write of the D3
 state itself or the read back though (it's probably detected on the
 read back or a subsequent read, but that doesn't tell me which specific
 one failed).
>>>
>>> If it is just one card doing it (again, check you have latest
>>> firmware) I wonder if it is a sketchy PCI-E electrical link that is
>>> causing a long re-training cycle? Can you tell if the PCI-E link is
>>> permanently gone or does it eventually return?
>>
>> No, it's 100% reproducable on systems with that specific card model,
>> not card instance, and maybe different systems/cards as well, I'll let
>> David & Alexey comment further on that.
> 
> Well, it's 100% reproducable on a particular model of system
> (garrison) with a particular model of card.  I've had some suggestions
> that it fails with some other systems card card models, but nothing
> confirmed - the one other system model I've been able to try, which
> also had a newer card model didn't reproduce the problem.

I have just moved the "Mellanox Technologies MT27700 Family
[ConnectX-4]" from garrison to firestone machine and there it does not
produce an EEH, with the same kernel and skiboot (both upstream + my
debug). Hm. I cannot really blame the card but I cannot see what could
cause the difference in skiboot either. I even tried disabling NPU so
garrison would look like firestone, still EEH'ing.



>>> Does the card work in Gen 3 when it starts? Is there any indication of
>>> PCI-E link errors?
>>
>> Nope.
>>
>>> Everytime or sometimes?
>>>
>>> POWER 8 firmware is good? If the link does eventually come back, is
>>> the POWER8's D3 resumption timeout long enough?
>>>
>>> If this doesn't lead to an obvious conclusion you'll probably need to
>>> connect to IBM's Mellanox support team to get more information from
>>> the card side.
>>
>> We are IBM :-) So far, it seems to be that the card is doing something
>> not quite right, but we don't know what. We might need to engage
>> Mellanox themselves.
> 
> Possibly.  On the other hand, I've had it reported that this is a
> software regression at least with downstream red hat kernels.  I
> haven't yet been able to eliminate factors that might be confusing
> that, or try to find a working version upstream.

Do you have tarballs handy? I'd diff...


-- 
Alexey


Re: [PATCH] lkdtm: Add a tests for NULL pointer dereference

2019-01-08 Thread Christophe Leroy




Le 09/01/2019 à 02:14, Kees Cook a écrit :

On Fri, Dec 14, 2018 at 7:26 AM Christophe Leroy
 wrote:


Introduce lkdtm tests for NULL pointer dereference: check
access or exec at NULL address.


Why is this not already covered by the existing tests? (Is there
something special about NULL that is being missed?) I'd expect SMAP
and SMEP to cover NULL as well.


Most arches print a different message whether the faulty address is 
above or under PAGE_SIZE. Below is exemple from x86:


pr_alert("BUG: unable to handle kernel %s at %px\n",
 address < PAGE_SIZE ? "NULL pointer dereference" : "paging 
request",
 (void *)address);


Until recently, the powerpc arch didn't do it. When I implemented it 
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=49a502ea23bf9dec47f8f3c3960909ff409cd1bb), 
I needed a way to test it and couldn't find an existing one, hence this 
new LKDTM test.


But maybe I missed something ?

Christophe



-Kees



Signed-off-by: Christophe Leroy 
---
  drivers/misc/lkdtm/core.c  |  2 ++
  drivers/misc/lkdtm/lkdtm.h |  2 ++
  drivers/misc/lkdtm/perms.c | 18 ++
  3 files changed, 22 insertions(+)

diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
index bc76756b7eda..36910e1d5c09 100644
--- a/drivers/misc/lkdtm/core.c
+++ b/drivers/misc/lkdtm/core.c
@@ -157,7 +157,9 @@ static const struct crashtype crashtypes[] = {
 CRASHTYPE(EXEC_VMALLOC),
 CRASHTYPE(EXEC_RODATA),
 CRASHTYPE(EXEC_USERSPACE),
+   CRASHTYPE(EXEC_NULL),
 CRASHTYPE(ACCESS_USERSPACE),
+   CRASHTYPE(ACCESS_NULL),
 CRASHTYPE(WRITE_RO),
 CRASHTYPE(WRITE_RO_AFTER_INIT),
 CRASHTYPE(WRITE_KERN),
diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
index 3c6fd327e166..b69ee004a3f7 100644
--- a/drivers/misc/lkdtm/lkdtm.h
+++ b/drivers/misc/lkdtm/lkdtm.h
@@ -45,7 +45,9 @@ void lkdtm_EXEC_KMALLOC(void);
  void lkdtm_EXEC_VMALLOC(void);
  void lkdtm_EXEC_RODATA(void);
  void lkdtm_EXEC_USERSPACE(void);
+void lkdtm_EXEC_NULL(void);
  void lkdtm_ACCESS_USERSPACE(void);
+void lkdtm_ACCESS_NULL(void);

  /* lkdtm_refcount.c */
  void lkdtm_REFCOUNT_INC_OVERFLOW(void);
diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
index fa54add6375a..62f76d506f04 100644
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -164,6 +164,11 @@ void lkdtm_EXEC_USERSPACE(void)
 vm_munmap(user_addr, PAGE_SIZE);
  }

+void lkdtm_EXEC_NULL(void)
+{
+   execute_location(NULL, CODE_AS_IS);
+}
+
  void lkdtm_ACCESS_USERSPACE(void)
  {
 unsigned long user_addr, tmp = 0;
@@ -195,6 +200,19 @@ void lkdtm_ACCESS_USERSPACE(void)
 vm_munmap(user_addr, PAGE_SIZE);
  }

+void lkdtm_ACCESS_NULL(void)
+{
+   unsigned long tmp;
+   unsigned long *ptr = (unsigned long *)NULL;
+
+   pr_info("attempting bad read at %px\n", ptr);
+   tmp = *ptr;
+   tmp += 0xc0dec0de;
+
+   pr_info("attempting bad write at %px\n", ptr);
+   *ptr = tmp;
+}
+
  void __init lkdtm_perms_init(void)
  {
 /* Make sure we can write to __ro_after_init values during __init */
--
2.13.3






Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2019-01-08 Thread Gautham R Shenoy
Hello Thiago,

Wish you a happy 2019!

On Sat, Dec 08, 2018 at 12:40:52AM -0200, Thiago Jung Bauermann wrote:
> 
> Gautham R Shenoy  writes:
> > On Fri, Dec 07, 2018 at 04:13:11PM +0530, Gautham R Shenoy wrote:
> >> Sure. I will test the patch and report back.
> >
> > I added the following debug patch on top of your patch, and after an
> > hour's run, the system crashed. Appending the log at the end.
> 
> Thank you very much for testing! Your debug patch was very helpful as
> well.
> 
> > I suppose we still need to increase the number of tries since we wait
> > only for 2.5ms looping before giving up.
> 
> Do you think it would have helped? In the debug output you posted I
> would have expected the following message to show up if the loop
> finished too early, and it didn't:
> 
> "Querying DEAD? cpu %i (%i) shows %i\n"
> 
> So I don't think increasing the loop length would have made a
> difference. In fact, the call to smp_query_cpu_stopped() always
> succeeded on the first iteration.

I did some testing during the holidays. Here are the observations:

1) With just your patch (without any additional debug patch), if I run
DLPAR on /off operations on a system that has SMT=off, I am able to
see a crash involving RTAS stack corruption within an hour's time.

2) With the debug patch (appended below) which has additional debug to
capture the callers of stop-self, start-cpu, set-power-levels, the
system is able to perform DLPAR on/off operations on a system with
SMT=off for three days. And then, it crashed with the dead CPU showing
a "Bad kernel stack pointer". From this log, I can clearly
see that there were no concurrent calls to stop-self, start-cpu,
set-power-levels. The only concurrent RTAS calls were the dying CPU
calling "stop-self", and the CPU running the DLPAR operation calling
"query-cpu-stopped-state". The crash signature is appended below as
well.

3) Modifying your patch to remove the udelay and increase the loop
count from 25 to 1000 doesn't improve the situation. We are still able
to see the crash.

4) With my patch, even without any additional debug, I was able to
observe the system run the tests successfully for over a week (I
started the tests before the Christmas weekend, and forgot to turn it
off!)

It appears that there is a narrow race window involving rtas-stop-self
and rtas-query-cpu-stopped-state calls that can be observed with your
patch. Adding any printk's seems to reduce the probability of hitting
this race window. It might be worth the while to check with RTAS
folks, if they suspect something here.

The Crash log with this debug patch is as follows

[DEBUG] CPU 32 waiting for CPU 130 to enter rtas
cpu 130 (hwid 130) Ready to die...
[DEBUG] CPU 32 noticed CPU 130 enter rtas: tries=0, time=539
[DEBUG] CPU 32 waiting for CPU 131 to enter rtas
cpu 131 (hwid 131) Ready to die...
[DEBUG] CPU 32 noticed CPU 131 enter rtas: tries=0, time=137
[DEBUG] CPU 32 waiting for CPU 132 to enter rtas
cpu 132 (hwid 132) Ready to die...
[DEBUG] CPU 32 noticed CPU 132 enter rtas: tries=0, time=1238
[DEBUG] CPU 32 waiting for CPU 133 to enter rtas
cpu 133 (hwid 133) Ready to die...
[DEBUG] CPU 32 noticed CPU 133 enter rtas: tries=1, time=1259
[DEBUG] CPU 32 waiting for CPU 134 to enter rtas
cpu 134 (hwid 134) Ready to die...
[DEBUG] CPU 32 noticed CPU 134 enter rtas: tries=0, time=1141
[DEBUG] CPU 32 waiting for CPU 135 to enter rtas
cpu 135 (hwid 135) Ready to die...
[DEBUG] CPU 32 noticed CPU 135 enter rtas: tries=0, time=636
cpu 120 (hwid 120) Ready to die...
[DEBUG] CPU 32 waiting for CPU 120 to enter rtas
[DEBUG] CPU 32 noticed CPU 120 enter rtas: tries=0, time=53
[DEBUG] CPU 32 waiting for CPU 121 to enter rtas
cpu 121 (hwid 121) Ready to die...
Bad kernel stack pointer 1fafb400 at 1fafb4b0
Bad kernel stack pointer 1fafb4b0 at 1ec15270
Oops: Bad kernel stack pointer, sig: 6 [#1]
LE SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 121 PID: 0 Comm: swapper/121 Not tainted 
4.20.0-thiago-original-with-debug+ #57
NIP:  1ec15270 LR: 1ec1526c CTR: 1ecd26c4
REGS: c0001e577d30 TRAP: 0300   Not tainted  
(4.20.0-thiago-original-with-debug+)
MSR:  80001000   CR: 4800  XER: 0020
CFAR: 1ecd27a8 DAR: 8108 DSISR: 4000 IRQMASK: 
80009033
GPR00: 1ec1526c 1fafb4b0  
GPR04: 01a40968 00e0 dfe8 0018
GPR08: 1f82ae00 1ed025d4 1f827850 
GPR12: 01b6d998 c0001eb4e080 c003a1bdbf90 1eea3020
GPR16:  c006fdc45100 c004c8b0 c13d5300
GPR20: c006fdc45300 0008 c19d2cf8 c13d6888
GPR24: 0079 c13d688c 0002 c13d688c
GPR28: c19cecf0 03c8  1fafb4b0
NIP [1ec15270] 0x1ec15270
LR [1ec1526c] 0x1ec1526c
Call Trace:
Instruction dump:
 

Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]

2019-01-08 Thread David Gibson
On Wed, Jan 09, 2019 at 04:09:02PM +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2019-01-07 at 21:01 -0700, Jason Gunthorpe wrote:
> > 
> > > In a very cryptic way that requires manual parsing using non-public
> > > docs sadly but yes. From the look of it, it's a completion timeout.
> > > 
> > > Looks to me like we don't get a response to a config space access
> > > during the change of D state. I don't know if it's the write of the D3
> > > state itself or the read back though (it's probably detected on the
> > > read back or a subsequent read, but that doesn't tell me which specific
> > > one failed).
> > 
> > If it is just one card doing it (again, check you have latest
> > firmware) I wonder if it is a sketchy PCI-E electrical link that is
> > causing a long re-training cycle? Can you tell if the PCI-E link is
> > permanently gone or does it eventually return?
> 
> No, it's 100% reproducable on systems with that specific card model,
> not card instance, and maybe different systems/cards as well, I'll let
> David & Alexey comment further on that.

Well, it's 100% reproducable on a particular model of system
(garrison) with a particular model of card.  I've had some suggestions
that it fails with some other systems card card models, but nothing
confirmed - the one other system model I've been able to try, which
also had a newer card model didn't reproduce the problem.

> > Does the card work in Gen 3 when it starts? Is there any indication of
> > PCI-E link errors?
> 
> Nope.
> 
> > Everytime or sometimes?
> > 
> > POWER 8 firmware is good? If the link does eventually come back, is
> > the POWER8's D3 resumption timeout long enough?
> > 
> > If this doesn't lead to an obvious conclusion you'll probably need to
> > connect to IBM's Mellanox support team to get more information from
> > the card side.
> 
> We are IBM :-) So far, it seems to be that the card is doing something
> not quite right, but we don't know what. We might need to engage
> Mellanox themselves.

Possibly.  On the other hand, I've had it reported that this is a
software regression at least with downstream red hat kernels.  I
haven't yet been able to eliminate factors that might be confusing
that, or try to find a working version upstream.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]

2019-01-08 Thread Benjamin Herrenschmidt
On Mon, 2019-01-07 at 21:01 -0700, Jason Gunthorpe wrote:
> 
> > In a very cryptic way that requires manual parsing using non-public
> > docs sadly but yes. From the look of it, it's a completion timeout.
> > 
> > Looks to me like we don't get a response to a config space access
> > during the change of D state. I don't know if it's the write of the D3
> > state itself or the read back though (it's probably detected on the
> > read back or a subsequent read, but that doesn't tell me which specific
> > one failed).
> 
> If it is just one card doing it (again, check you have latest
> firmware) I wonder if it is a sketchy PCI-E electrical link that is
> causing a long re-training cycle? Can you tell if the PCI-E link is
> permanently gone or does it eventually return?

No, it's 100% reproducable on systems with that specific card model,
not card instance, and maybe different systems/cards as well, I'll let
David & Alexey comment further on that.

> Does the card work in Gen 3 when it starts? Is there any indication of
> PCI-E link errors?

Nope.

> Everytime or sometimes?
> 
> POWER 8 firmware is good? If the link does eventually come back, is
> the POWER8's D3 resumption timeout long enough?
> 
> If this doesn't lead to an obvious conclusion you'll probably need to
> connect to IBM's Mellanox support team to get more information from
> the card side.

We are IBM :-) So far, it seems to be that the card is doing something
not quite right, but we don't know what. We might need to engage
Mellanox themselves.

Cheers,
Ben.




Re: [PATCH 03/19] KVM: PPC: Book3S HV: check the IRQ controller type

2019-01-08 Thread David Gibson
On Mon, Jan 07, 2019 at 07:43:15PM +0100, Cédric Le Goater wrote:
> We will have different KVM devices for interrupts, one for the
> XICS-over-XIVE mode and one for the XIVE native exploitation
> mode. Let's add some checks to make sure we are not mixing the
> interfaces in KVM.
> 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/kvm/book3s_xive.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index f78d002f0fe0..8a4fa45f07f8 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -819,6 +819,9 @@ u64 kvmppc_xive_get_icp(struct kvm_vcpu *vcpu)
>  {
>   struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
>  
> + if (!kvmppc_xics_enabled(vcpu))
> + return -EPERM;
> +
>   if (!xc)
>   return 0;
>  
> @@ -835,6 +838,9 @@ int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, u64 icpval)
>   u8 cppr, mfrr;
>   u32 xisr;
>  
> + if (!kvmppc_xics_enabled(vcpu))
> + return -EPERM;
> +
>   if (!xc || !xive)
>   return -ENOENT;
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 02/19] powerpc/xive: add OPAL extensions for the XIVE native exploitation support

2019-01-08 Thread David Gibson
On Mon, Jan 07, 2019 at 07:43:14PM +0100, Cédric Le Goater wrote:
> The support for XIVE native exploitation mode in Linux/KVM needs a
> couple more OPAL calls to configure the sPAPR guest and to get/set the
> state of the XIVE internal structures.
> 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/include/asm/opal-api.h   | 11 ++-
>  arch/powerpc/include/asm/opal.h   |  7 ++
>  arch/powerpc/include/asm/xive.h   | 14 +++
>  arch/powerpc/sysdev/xive/native.c | 99 +++
>  .../powerpc/platforms/powernv/opal-wrappers.S |  3 +
>  5 files changed, 130 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/opal-api.h 
> b/arch/powerpc/include/asm/opal-api.h
> index 870fb7b239ea..cdfc54f78101 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -186,8 +186,8 @@
>  #define OPAL_XIVE_FREE_IRQ   140
>  #define OPAL_XIVE_SYNC   141
>  #define OPAL_XIVE_DUMP   142
> -#define OPAL_XIVE_RESERVED3  143
> -#define OPAL_XIVE_RESERVED4  144
> +#define OPAL_XIVE_GET_QUEUE_STATE143
> +#define OPAL_XIVE_SET_QUEUE_STATE144
>  #define OPAL_SIGNAL_SYSTEM_RESET 145
>  #define OPAL_NPU_INIT_CONTEXT146
>  #define OPAL_NPU_DESTROY_CONTEXT 147
> @@ -209,8 +209,11 @@
>  #define OPAL_SENSOR_GROUP_ENABLE 163
>  #define OPAL_PCI_GET_PBCQ_TUNNEL_BAR 164
>  #define OPAL_PCI_SET_PBCQ_TUNNEL_BAR 165
> -#define  OPAL_NX_COPROC_INIT 167
> -#define OPAL_LAST167
> +#define OPAL_HANDLE_HMI2 166
> +#define OPAL_NX_COPROC_INIT  167
> +#define OPAL_NPU_SET_RELAXED_ORDER   168
> +#define OPAL_NPU_GET_RELAXED_ORDER   169
> +#define OPAL_XIVE_GET_VP_STATE   170
>  
>  #define QUIESCE_HOLD 1 /* Spin all calls at entry */
>  #define QUIESCE_REJECT   2 /* Fail all calls with 
> OPAL_BUSY */
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index a55b01c90bb1..4e978d4dea5c 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -279,6 +279,13 @@ int64_t opal_xive_allocate_irq(uint32_t chip_id);
>  int64_t opal_xive_free_irq(uint32_t girq);
>  int64_t opal_xive_sync(uint32_t type, uint32_t id);
>  int64_t opal_xive_dump(uint32_t type, uint32_t id);
> +int64_t opal_xive_get_queue_state(uint64_t vp, uint32_t prio,
> +   __be32 *out_qtoggle,
> +   __be32 *out_qindex);
> +int64_t opal_xive_set_queue_state(uint64_t vp, uint32_t prio,
> +   uint32_t qtoggle,
> +   uint32_t qindex);
> +int64_t opal_xive_get_vp_state(uint64_t vp, __be64 *out_w01);
>  int64_t opal_pci_set_p2p(uint64_t phb_init, uint64_t phb_target,
>   uint64_t desc, uint16_t pe_number);
>  
> diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h
> index 32f033bfbf42..d6be3e4d9fa4 100644
> --- a/arch/powerpc/include/asm/xive.h
> +++ b/arch/powerpc/include/asm/xive.h
> @@ -132,12 +132,26 @@ extern int xive_native_configure_queue(u32 vp_id, 
> struct xive_q *q, u8 prio,
>  extern void xive_native_disable_queue(u32 vp_id, struct xive_q *q, u8 prio);
>  
>  extern void xive_native_sync_source(u32 hw_irq);
> +extern void xive_native_sync_queue(u32 hw_irq);
>  extern bool is_xive_irq(struct irq_chip *chip);
>  extern int xive_native_enable_vp(u32 vp_id, bool single_escalation);
>  extern int xive_native_disable_vp(u32 vp_id);
>  extern int xive_native_get_vp_info(u32 vp_id, u32 *out_cam_id, u32 
> *out_chip_id);
>  extern bool xive_native_has_single_escalation(void);
>  
> +extern int xive_native_get_queue_info(u32 vp_id, uint32_t prio,
> +   u64 *out_qpage,
> +   u64 *out_qsize,
> +   u64 *out_qeoi_page,
> +   u32 *out_escalate_irq,
> +   u64 *out_qflags);
> +
> +extern int xive_native_get_queue_state(u32 vp_id, uint32_t prio, u32 
> *qtoggle,
> +u32 *qindex);
> +extern int xive_native_set_queue_state(u32 vp_id, uint32_t prio, u32 qtoggle,
> +u32 qindex);
> +extern int xive_native_get_vp_state(u32 vp_id, u64 *out_state);
> +
>  #else
>  
>  static inline bool xive_enabled(void) { return false; }
> diff --git a/arch/powerpc/sysdev/xive/native.c 
> b/arch/powerpc/sysdev/xive/native.c
> index 1ca127d052a6..0c037e933e55 100644
> --- a/arch/powerpc/sysdev/xive/native.c
> +++ b/arch/powerpc/sysdev/xive/native.c
> @@ -437,6 +437,12 @@ void 

Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]

2019-01-08 Thread Alexey Kardashevskiy



On 06/01/2019 09:43, Benjamin Herrenschmidt wrote:
> On Sat, 2019-01-05 at 10:51 -0700, Jason Gunthorpe wrote:
>>
>>> Interesting.  I've investigated this further, though I don't have as
>>> many new clues as I'd like.  The problem occurs reliably, at least on
>>> one particular type of machine (a POWER8 "Garrison" with ConnectX-4).
>>> I don't yet know if it occurs with other machines, I'm having trouble
>>> getting access to other machines with a suitable card.  I didn't
>>> manage to reproduce it on a different POWER8 machine with a
>>> ConnectX-5, but I don't know if it's the difference in machine or
>>> difference in card revision that's important.
>>
>> Make sure the card has the latest firmware is always good advice..
>>
>>> So possibilities that occur to me:
>>>   * It's something specific about how the vfio-pci driver uses D3
>>> state - have you tried rebinding your device to vfio-pci?
>>>   * It's something specific about POWER, either the kernel or the PCI
>>> bridge hardware
>>>   * It's something specific about this particular type of machine
>>
>> Does the EEH indicate what happend to actually trigger it?
> 
> In a very cryptic way that requires manual parsing using non-public
> docs sadly but yes. From the look of it, it's a completion timeout.
> 
> Looks to me like we don't get a response to a config space access
> during the change of D state. I don't know if it's the write of the D3
> state itself or the read back though (it's probably detected on the
> read back or a subsequent read, but that doesn't tell me which specific
> one failed).

It is write:

pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);


> 
> Some extra logging in OPAL might help pin that down by checking the InA
> error state in the config accessor after the config write (and polling
> on it for a while as from a CPU perspective I don't knw if the write is
> synchronous, probably not).


Extra logging gives these straight after that write:

nFir:8080 0030006e 8000

PhbSts:  0018 0018

Lem: 02088000 42498e367f502eae 0008

OutErr:  0020 0020 

InAErr:  3000 2000 8080



Decoded (my fancy script):

nFir:8080 0030006e 8000

 |- PCI Nest Fault Isolation Register(FIR) NestBase+0x00 _BE_ =
8080h:
 |   [0..63]   1000 1000  
 
 |   #16 set: The PHB had a severe error and has fenced the AIB
 |   #24 set: The internal SCOM to ASB bridge has an error
 |   #29..30: Error bit from SCOM FIR engine = 0h
 |- PCI Nest FIR Mask NestBase+0x03 _BE_ = 0030006eh:
 |   [0..63]  0011  01101110  
 
 |   #10 set: Any PowerBus data hang poll error(Only checked for CI Stores)
 |   #11 set: Any PowerBus command hang error (domestic address range)
 |   #25 set: A command received ack_dead, foreign data hang, or
Link_chk_abort from the foreign interface
 |   #26 set: Any PowerBus command hang error (foreign address range)
 |   #28 set: Error bit from BARS SCOM engines, Nest domain
 |   #29..30: Error bit from SCOM FIR engine = 3h/[0..1] 11
 |- PCI Nest FIR WOF (“Who's on First”) NestBase+0x08 _BE_ =
8000h:
 |   [0..63]   1000   
 
 |   #16 set: The PHB had a severe error and has fenced the AIB
 |   #29..30: Error bit from SCOM FIR engine = 0h
 |
PhbSts:  0018 0018

 |- 0x0120 Processor Load/Store Status Register _BE_ = 0018h:
 |   [0..63]    00011000  
 
 |   #27 set: One of the PHB3’s error status register bits is set
 |   #28 set: One of the PHB3’s first error status register bits is set
 |- 0x0110 DMA Channel Status Register _BE_ = 0018h:
 |   [0..63]    00011000  
 
 |   #27 set: One of the PHB3’s error status register bits is set
 |   #28 set: One of the PHB3’s first error status register bits is set
 |
Lem: 02088000 42498e367f502eae 0008

 |- 0xC00 LEM FIR Accumulator Register _BE_ = 02088000h:
 |   [0..63]   0010   1000
1000 
 |   #22 set: CFG Access Error
 |   #44 set: PCT Timeout Error
 |   #48 set: PCT Unexpected Completion
 |- 0xC18 LEM Error Mask Register = 42498e367f502eaeh
 |- 0xC40 LEM WOF Register _BE_ = 0008h:
 |   [0..63]      1000
 
 |   #44 set: PCT Timeout Error
 |
OutErr:  0020 0020 

 |- 0xD00 Outbound Error Status Register _BE_ = 0020h:
 |   [0..63]   

Re: perf trace syscall table generation for powerpc with syscall.tbl

2019-01-08 Thread Ravi Bangoria
Hi Arnaldo,

Yes. I'm aware of it. Just that I was busy with something else so couldn't do 
it.

Thanks for reminding :). Will post a patch soon.

Ravi

On 1/8/19 10:34 PM, Arnaldo Carvalho de Melo wrote:
> Hi Ravi,
> 
>   I noticed that in:
> 
>   commit ab66dcc76d6ab8fae9d69d149ae38c42605e7fc5
>   Author: Firoz Khan 
>   Date:   Mon Dec 17 16:10:36 2018 +0530
> 
> powerpc: generate uapi header and system call table files
> 
> powerpc now generates its syscall tables headers from a syscall.tbl just
> like x86 and s390, could you please switch to using it, taking the x86
> and s390 scripts as a starting point, then test on your systems
> everything works?
> 
> Thanks,
> 
> - Arnaldo
> 



Re: [PATCH 01/19] powerpc/xive: export flags for the XIVE native exploitation mode hcalls

2019-01-08 Thread David Gibson
On Mon, Jan 07, 2019 at 07:43:13PM +0100, Cédric Le Goater wrote:
> These flags are shared between Linux/KVM implementing the hypervisor
> calls for the XIVE native exploitation mode and the driver for the
> sPAPR guests.
> 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/include/asm/xive.h  | 23 +++
>  arch/powerpc/sysdev/xive/spapr.c | 28 
>  2 files changed, 31 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h
> index 3c704f5dd3ae..32f033bfbf42 100644
> --- a/arch/powerpc/include/asm/xive.h
> +++ b/arch/powerpc/include/asm/xive.h
> @@ -93,6 +93,29 @@ extern void xive_flush_interrupt(void);
>  /* xmon hook */
>  extern void xmon_xive_do_dump(int cpu);
>  
> +/*
> + * Hcall flags shared by the sPAPR backend and KVM
> + */
> +
> +/* H_INT_GET_SOURCE_INFO */
> +#define XIVE_SPAPR_SRC_H_INT_ESB PPC_BIT(60)
> +#define XIVE_SPAPR_SRC_LSI   PPC_BIT(61)
> +#define XIVE_SPAPR_SRC_TRIGGER   PPC_BIT(62)
> +#define XIVE_SPAPR_SRC_STORE_EOI PPC_BIT(63)
> +
> +/* H_INT_SET_SOURCE_CONFIG */
> +#define XIVE_SPAPR_SRC_SET_EISN  PPC_BIT(62)
> +#define XIVE_SPAPR_SRC_MASK  PPC_BIT(63) /* unused */
> +
> +/* H_INT_SET_QUEUE_CONFIG */
> +#define XIVE_SPAPR_EQ_ALWAYS_NOTIFY  PPC_BIT(63)
> +
> +/* H_INT_SET_QUEUE_CONFIG */
> +#define XIVE_SPAPR_EQ_DEBUG  PPC_BIT(63)
> +
> +/* H_INT_ESB */
> +#define XIVE_SPAPR_ESB_STORE PPC_BIT(63)
> +
>  /* APIs used by KVM */
>  extern u32 xive_native_default_eq_shift(void);
>  extern u32 xive_native_alloc_vp_block(u32 max_vcpus);
> diff --git a/arch/powerpc/sysdev/xive/spapr.c 
> b/arch/powerpc/sysdev/xive/spapr.c
> index 575db3b06a6b..730284f838c8 100644
> --- a/arch/powerpc/sysdev/xive/spapr.c
> +++ b/arch/powerpc/sysdev/xive/spapr.c
> @@ -184,9 +184,6 @@ static long plpar_int_get_source_info(unsigned long flags,
>   return 0;
>  }
>  
> -#define XIVE_SRC_SET_EISN (1ull << (63 - 62))
> -#define XIVE_SRC_MASK (1ull << (63 - 63)) /* unused */
> -
>  static long plpar_int_set_source_config(unsigned long flags,
>   unsigned long lisn,
>   unsigned long target,
> @@ -243,8 +240,6 @@ static long plpar_int_get_queue_info(unsigned long flags,
>   return 0;
>  }
>  
> -#define XIVE_EQ_ALWAYS_NOTIFY (1ull << (63 - 63))
> -
>  static long plpar_int_set_queue_config(unsigned long flags,
>  unsigned long target,
>  unsigned long priority,
> @@ -286,8 +281,6 @@ static long plpar_int_sync(unsigned long flags, unsigned 
> long lisn)
>   return 0;
>  }
>  
> -#define XIVE_ESB_FLAG_STORE (1ull << (63 - 63))
> -
>  static long plpar_int_esb(unsigned long flags,
> unsigned long lisn,
> unsigned long offset,
> @@ -321,7 +314,7 @@ static u64 xive_spapr_esb_rw(u32 lisn, u32 offset, u64 
> data, bool write)
>   unsigned long read_data;
>   long rc;
>  
> - rc = plpar_int_esb(write ? XIVE_ESB_FLAG_STORE : 0,
> + rc = plpar_int_esb(write ? XIVE_SPAPR_ESB_STORE : 0,
>  lisn, offset, data, _data);
>   if (rc)
>   return -1;
> @@ -329,11 +322,6 @@ static u64 xive_spapr_esb_rw(u32 lisn, u32 offset, u64 
> data, bool write)
>   return write ? 0 : read_data;
>  }
>  
> -#define XIVE_SRC_H_INT_ESB (1ull << (63 - 60))
> -#define XIVE_SRC_LSI   (1ull << (63 - 61))
> -#define XIVE_SRC_TRIGGER   (1ull << (63 - 62))
> -#define XIVE_SRC_STORE_EOI (1ull << (63 - 63))
> -
>  static int xive_spapr_populate_irq_data(u32 hw_irq, struct xive_irq_data 
> *data)
>  {
>   long rc;
> @@ -349,11 +337,11 @@ static int xive_spapr_populate_irq_data(u32 hw_irq, 
> struct xive_irq_data *data)
>   if (rc)
>   return  -EINVAL;
>  
> - if (flags & XIVE_SRC_H_INT_ESB)
> + if (flags & XIVE_SPAPR_SRC_H_INT_ESB)
>   data->flags  |= XIVE_IRQ_FLAG_H_INT_ESB;
> - if (flags & XIVE_SRC_STORE_EOI)
> + if (flags & XIVE_SPAPR_SRC_STORE_EOI)
>   data->flags  |= XIVE_IRQ_FLAG_STORE_EOI;
> - if (flags & XIVE_SRC_LSI)
> + if (flags & XIVE_SPAPR_SRC_LSI)
>   data->flags  |= XIVE_IRQ_FLAG_LSI;
>   data->eoi_page  = eoi_page;
>   data->esb_shift = esb_shift;
> @@ -374,7 +362,7 @@ static int xive_spapr_populate_irq_data(u32 hw_irq, 
> struct xive_irq_data *data)
>   data->hw_irq = hw_irq;
>  
>   /* Full function page supports trigger */
> - if (flags & XIVE_SRC_TRIGGER) {
> + if (flags & XIVE_SPAPR_SRC_TRIGGER) {
>   data->trig_mmio = data->eoi_mmio;
>   return 0;
>   }
> @@ -391,8 +379,8 @@ static int xive_spapr_configure_irq(u32 hw_irq, u32 
> target, u8 prio, u32 sw_irq)
>  {
>   long rc;
>  
> - rc = 

Re: [PATCH V6 1/4] mm/cma: Add PF flag to force non cma alloc

2019-01-08 Thread Andrea Arcangeli
On Tue, Jan 08, 2019 at 10:21:07AM +0530, Aneesh Kumar K.V wrote:
> This patch add PF_MEMALLOC_NOCMA which make sure any allocation in that 
> context
> is marked non movable and hence cannot be satisfied by CMA region.
> 
> This is useful with get_user_pages_cma_migrate where we take a page pin by
> migrating pages from CMA region. Marking the section PF_MEMALLOC_NOCMA ensures
> that we avoid uncessary page migration later.
> 
> Suggested-by: Andrea Arcangeli 
> Signed-off-by: Aneesh Kumar K.V 

Reviewed-by: Andrea Arcangeli 


Re: [PATCH V6 3/4] powerpc/mm/iommu: Allow migration of cma allocated pages during mm_iommu_get

2019-01-08 Thread Andrea Arcangeli
Hello,

On Tue, Jan 08, 2019 at 10:21:09AM +0530, Aneesh Kumar K.V wrote:
> @@ -187,41 +149,25 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, 
> unsigned long ua,
>   goto unlock_exit;
>   }
>  
> + ret = get_user_pages_cma_migrate(ua, entries, 1, mem->hpages);

In terms of gup APIs, I've been wondering if this shall become
get_user_pages_longerm(FOLL_CMA_MIGRATE). So basically moving this
CMA migrate logic inside get_user_pages_longerm.

It depends if powerpc will ever need to bail on dax and/or if other
non-powerpc vfio drivers which are already bailing on dax may also
later optionally need to avoid interfering with CMA.

Aside from the API detail above, this CMA page migration logic seems a
good solution for the problem.

Thanks,
Andrea


Re: [PATCH] lkdtm: Add a tests for NULL pointer dereference

2019-01-08 Thread Kees Cook
On Fri, Dec 14, 2018 at 7:26 AM Christophe Leroy
 wrote:
>
> Introduce lkdtm tests for NULL pointer dereference: check
> access or exec at NULL address.

Why is this not already covered by the existing tests? (Is there
something special about NULL that is being missed?) I'd expect SMAP
and SMEP to cover NULL as well.

-Kees

>
> Signed-off-by: Christophe Leroy 
> ---
>  drivers/misc/lkdtm/core.c  |  2 ++
>  drivers/misc/lkdtm/lkdtm.h |  2 ++
>  drivers/misc/lkdtm/perms.c | 18 ++
>  3 files changed, 22 insertions(+)
>
> diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
> index bc76756b7eda..36910e1d5c09 100644
> --- a/drivers/misc/lkdtm/core.c
> +++ b/drivers/misc/lkdtm/core.c
> @@ -157,7 +157,9 @@ static const struct crashtype crashtypes[] = {
> CRASHTYPE(EXEC_VMALLOC),
> CRASHTYPE(EXEC_RODATA),
> CRASHTYPE(EXEC_USERSPACE),
> +   CRASHTYPE(EXEC_NULL),
> CRASHTYPE(ACCESS_USERSPACE),
> +   CRASHTYPE(ACCESS_NULL),
> CRASHTYPE(WRITE_RO),
> CRASHTYPE(WRITE_RO_AFTER_INIT),
> CRASHTYPE(WRITE_KERN),
> diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
> index 3c6fd327e166..b69ee004a3f7 100644
> --- a/drivers/misc/lkdtm/lkdtm.h
> +++ b/drivers/misc/lkdtm/lkdtm.h
> @@ -45,7 +45,9 @@ void lkdtm_EXEC_KMALLOC(void);
>  void lkdtm_EXEC_VMALLOC(void);
>  void lkdtm_EXEC_RODATA(void);
>  void lkdtm_EXEC_USERSPACE(void);
> +void lkdtm_EXEC_NULL(void);
>  void lkdtm_ACCESS_USERSPACE(void);
> +void lkdtm_ACCESS_NULL(void);
>
>  /* lkdtm_refcount.c */
>  void lkdtm_REFCOUNT_INC_OVERFLOW(void);
> diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
> index fa54add6375a..62f76d506f04 100644
> --- a/drivers/misc/lkdtm/perms.c
> +++ b/drivers/misc/lkdtm/perms.c
> @@ -164,6 +164,11 @@ void lkdtm_EXEC_USERSPACE(void)
> vm_munmap(user_addr, PAGE_SIZE);
>  }
>
> +void lkdtm_EXEC_NULL(void)
> +{
> +   execute_location(NULL, CODE_AS_IS);
> +}
> +
>  void lkdtm_ACCESS_USERSPACE(void)
>  {
> unsigned long user_addr, tmp = 0;
> @@ -195,6 +200,19 @@ void lkdtm_ACCESS_USERSPACE(void)
> vm_munmap(user_addr, PAGE_SIZE);
>  }
>
> +void lkdtm_ACCESS_NULL(void)
> +{
> +   unsigned long tmp;
> +   unsigned long *ptr = (unsigned long *)NULL;
> +
> +   pr_info("attempting bad read at %px\n", ptr);
> +   tmp = *ptr;
> +   tmp += 0xc0dec0de;
> +
> +   pr_info("attempting bad write at %px\n", ptr);
> +   *ptr = tmp;
> +}
> +
>  void __init lkdtm_perms_init(void)
>  {
> /* Make sure we can write to __ro_after_init values during __init */
> --
> 2.13.3
>


-- 
Kees Cook


Re: Kconfig label updates

2019-01-08 Thread Oliver
On Wed, Jan 9, 2019 at 9:31 AM Bjorn Helgaas  wrote:
>
> Hi,
>
> I want to update the PCI Kconfig labels so they're more consistent and
> useful to users, something like the patch below.  IIUC, the items
> below are all IBM-related; please correct me if not.
>
> I'd also like to expand (or remove) "RPA" because Google doesn't find
> anything about "IBM RPA", except Robotic Process Automation, which I
> think must be something else.
>
> Is there some text expansion of RPA that we could use that would be
> meaningful to a user, i.e., something he/she might find on a nameplate
> or in a user manual?

"RISC Platform Architecture" apparently. I couldn't find any
documentation written in the last decade that uses the term so it's
probably a hold-over from when pseries and iseries had separate
hardware platforms. Replacing "RPA" with "pseries" would be the least
confusing way to handle it.

> Ideally the PCI Kconfig labels would match the terms used in
> arch/.../Kconfig, e.g.,
>
>   config PPC_POWERNV
> bool "IBM PowerNV (Non-Virtualized) platform support"
>
>   config PPC_PSERIES
> bool "IBM pSeries & new (POWER5-based) iSeries"
>
>   config MARCH_Z900
> bool "IBM zSeries model z800 and z900"
>
>   config MARCH_Z9_109
> bool "IBM System z9"
>
> Bjorn
>
>
> diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
> index e9f78eb390d2..1c1d145bfd84 100644
> --- a/drivers/pci/hotplug/Kconfig
> +++ b/drivers/pci/hotplug/Kconfig
> @@ -112,7 +112,7 @@ config HOTPLUG_PCI_SHPC
>   When in doubt, say N.
>
>  config HOTPLUG_PCI_POWERNV
> -   tristate "PowerPC PowerNV PCI Hotplug driver"
> +   tristate "IBM PowerNV PCI Hotplug driver"
> depends on PPC_POWERNV && EEH
> select OF_DYNAMIC
> help
> @@ -125,10 +125,11 @@ config HOTPLUG_PCI_POWERNV
>   When in doubt, say N.
>
>  config HOTPLUG_PCI_RPA
> -   tristate "RPA PCI Hotplug driver"
> +   tristate "IBM Power Systems RPA PCI Hotplug driver"
> depends on PPC_PSERIES && EEH
> help
>   Say Y here if you have a RPA system that supports PCI Hotplug.
> + This includes the earlier pSeries and iSeries.
>
>   To compile this driver as a module, choose M here: the
>   module will be called rpaphp.
> @@ -136,7 +137,7 @@ config HOTPLUG_PCI_RPA
>   When in doubt, say N.
>
>  config HOTPLUG_PCI_RPA_DLPAR
> -   tristate "RPA Dynamic Logical Partitioning for I/O slots"
> +   tristate "IBM RPA Dynamic Logical Partitioning for I/O slots"
> depends on HOTPLUG_PCI_RPA
> help
>   Say Y here if your system supports Dynamic Logical Partitioning
> @@ -157,7 +158,7 @@ config HOTPLUG_PCI_SGI
>   When in doubt, say N.
>
>  config HOTPLUG_PCI_S390
> -   bool "System z PCI Hotplug Support"
> +   bool "IBM System z PCI Hotplug Support"
> depends on S390 && 64BIT
> help
>   Say Y here if you want to use the System z PCI Hotplug


Re: [PATCH v8 24/25] powerpc: Adopt nvram module for PPC64

2019-01-08 Thread Finn Thain
On Tue, 8 Jan 2019, Michael Ellerman wrote:

> > The reason why that doesn't work boils down to introspection. (This 
> > was mentioned elsewhere in this email thread.) For example, we 
> > presently have code like this,
> >
> > ssize_t nvram_get_size(void)
> > {
> >if (ppc_md.nvram_size)
> >return ppc_md.nvram_size();
> >return -1;
> > }
> > EXPORT_SYMBOL(nvram_get_size);
> >
> > This construction means we get to decide at run-time which of the NVRAM 
> > functions should be used. (Whereas your example makes a build-time 
> > decision.)
> 
> Right, but we only need to make a runtime decision on powerpc (right?).

It's needed in many places outside of powerpc. Otherwise the caller can't 
determine at run-time which ops are implemented.

Hence you have to duplicate the caller for each supported configuration 
that you build.

Already, this precludes a shared misc device implementation and belies the 
"generic" in drivers/char/generic_nvram.c.

-- 


Kconfig label updates

2019-01-08 Thread Bjorn Helgaas
Hi,

I want to update the PCI Kconfig labels so they're more consistent and
useful to users, something like the patch below.  IIUC, the items
below are all IBM-related; please correct me if not.

I'd also like to expand (or remove) "RPA" because Google doesn't find
anything about "IBM RPA", except Robotic Process Automation, which I
think must be something else.

Is there some text expansion of RPA that we could use that would be
meaningful to a user, i.e., something he/she might find on a nameplate
or in a user manual?

Ideally the PCI Kconfig labels would match the terms used in
arch/.../Kconfig, e.g.,

  config PPC_POWERNV
bool "IBM PowerNV (Non-Virtualized) platform support"

  config PPC_PSERIES
bool "IBM pSeries & new (POWER5-based) iSeries"

  config MARCH_Z900
bool "IBM zSeries model z800 and z900"

  config MARCH_Z9_109
bool "IBM System z9"

Bjorn


diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
index e9f78eb390d2..1c1d145bfd84 100644
--- a/drivers/pci/hotplug/Kconfig
+++ b/drivers/pci/hotplug/Kconfig
@@ -112,7 +112,7 @@ config HOTPLUG_PCI_SHPC
  When in doubt, say N.
 
 config HOTPLUG_PCI_POWERNV
-   tristate "PowerPC PowerNV PCI Hotplug driver"
+   tristate "IBM PowerNV PCI Hotplug driver"
depends on PPC_POWERNV && EEH
select OF_DYNAMIC
help
@@ -125,10 +125,11 @@ config HOTPLUG_PCI_POWERNV
  When in doubt, say N.
 
 config HOTPLUG_PCI_RPA
-   tristate "RPA PCI Hotplug driver"
+   tristate "IBM Power Systems RPA PCI Hotplug driver"
depends on PPC_PSERIES && EEH
help
  Say Y here if you have a RPA system that supports PCI Hotplug.
+ This includes the earlier pSeries and iSeries.
 
  To compile this driver as a module, choose M here: the
  module will be called rpaphp.
@@ -136,7 +137,7 @@ config HOTPLUG_PCI_RPA
  When in doubt, say N.
 
 config HOTPLUG_PCI_RPA_DLPAR
-   tristate "RPA Dynamic Logical Partitioning for I/O slots"
+   tristate "IBM RPA Dynamic Logical Partitioning for I/O slots"
depends on HOTPLUG_PCI_RPA
help
  Say Y here if your system supports Dynamic Logical Partitioning
@@ -157,7 +158,7 @@ config HOTPLUG_PCI_SGI
  When in doubt, say N.
 
 config HOTPLUG_PCI_S390
-   bool "System z PCI Hotplug Support"
+   bool "IBM System z PCI Hotplug Support"
depends on S390 && 64BIT
help
  Say Y here if you want to use the System z PCI Hotplug


[PATCH] powerpc/ps3: Use struct_size() in kzalloc()

2019-01-08 Thread Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding the
size of a structure that has a zero-sized array at the end, along with memory
for some number of elements for that array. For example:

struct foo {
int stuff;
void *entry[];
};

instance = kzalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 arch/powerpc/platforms/ps3/device-init.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/ps3/device-init.c 
b/arch/powerpc/platforms/ps3/device-init.c
index e7075aaff1bb..59587b75493d 100644
--- a/arch/powerpc/platforms/ps3/device-init.c
+++ b/arch/powerpc/platforms/ps3/device-init.c
@@ -354,9 +354,7 @@ static int ps3_setup_storage_dev(const struct 
ps3_repository_device *repo,
 repo->dev_index, repo->dev_type, port, blk_size, num_blocks,
 num_regions);
 
-   p = kzalloc(sizeof(struct ps3_storage_device) +
-   num_regions * sizeof(struct ps3_storage_region),
-   GFP_KERNEL);
+   p = kzalloc(struct_size(p, regions, num_regions), GFP_KERNEL);
if (!p) {
result = -ENOMEM;
goto fail_malloc;
-- 
2.20.1



Re: [PATCH] socket: move compat timeout handling into sock.c

2019-01-08 Thread Deepa Dinamani
On Tue, Jan 8, 2019 at 12:10 PM Arnd Bergmann  wrote:
>
> This is a cleanup to prepare for the addition of 64-bit time_t
> in O_SNDTIMEO/O_RCVTIMEO. The existing compat handler seems
> unnecessarily complex and error-prone, moving it all into the
> main setsockopt()/getsockopt() implementation requires half
> as much code and is easier to extend.
>
> 32-bit user space can now use old_timeval32 on both 32-bit
> and 64-bit machines, while 64-bit code can use
> __old_kernel_timeval.
>
> Signed-off-by: Arnd Bergmann 

This will make the other series so much nicer. Thank you.

Acked-by: Deepa Dinamani 


Re: [PATCH 2/3] socket: Rename SO_RCVTIMEO/ SO_SNDTIMEO with _OLD suffixes

2019-01-08 Thread Deepa Dinamani
On Tue, Jan 8, 2019 at 12:04 PM Arnd Bergmann  wrote:
>
> On Tue, Jan 8, 2019 at 6:24 AM Deepa Dinamani  wrote:
> >
> > SO_RCVTIMEO and SO_SNDTIMEO socket options use struct timeval
> > as the time format. struct timeval is not y2038 safe.
> > The subsequent patches in the series add support for new socket
> > timeout options with _NEW suffix that are y2038 safe.
> > Rename the existing options with _OLD suffix forms so that the
> > right option is enabled for userspace applications according
> > to the architecture and time_t definition of libc.
> >
> > Signed-off-by: Deepa Dinamani 
>
> Looks good overall. A few minor concerns:
>
> The description above makes it sound like there is a bug with y2038-safety
> in this particular interface, which I think is just not what you meant,
> as the change is only needed for compatiblity with new C libraries
> that work around the y2038 problem in general by changing their
> timeval definition.

Right, there is y2038 safety issue, just the libc part that needs to be handled.
I will fix the commit text.

> > diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
> > index 76976d6e50f9..c98ad9777ad9 100644
> > --- a/fs/dlm/lowcomms.c
> > +++ b/fs/dlm/lowcomms.c
> > @@ -1089,12 +1089,12 @@ static void sctp_connect_to_sock(struct connection 
> > *con)
> >  * since O_NONBLOCK argument in connect() function does not work 
> > here,
> >  * then, we should restore the default value of this attribute.
> >  */
> > -   kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, (char *),
> > +   kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO_OLD, (char *),
> >   sizeof(tv));
> > result = sock->ops->connect(sock, (struct sockaddr *), 
> > addr_len,
> >0);
> > memset(, 0, sizeof(tv));
> > -   kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, (char *),
> > +   kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO_OLD, (char *),
> >   sizeof(tv));
> >
> > if (result == -EINPROGRESS)
>
> It took me a bit to realize there that this is safe as well even if
> we don't use SO_SNDTIMEO_NEW, for the same reason.

Correct.

> > --- a/net/compat.c
> > +++ b/net/compat.c
> > @@ -378,7 +378,7 @@ static int compat_sock_setsockopt(struct socket *sock, 
> > int level, int optname,
> > return do_set_attach_filter(sock, level, optname,
> > optval, optlen);
> > if (!COMPAT_USE_64BIT_TIME &&
> > -   (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO))
> > +   (optname == SO_RCVTIMEO_OLD || optname == SO_SNDTIMEO_OLD))
> > return do_set_sock_timeout(sock, level, optname, optval, 
> > optlen);
> >
> > return sock_setsockopt(sock, level, optname, optval, optlen);
> > @@ -450,7 +450,7 @@ static int compat_sock_getsockopt(struct socket *sock, 
> > int level, int optname,
> > char __user *optval, int __user *optlen)
> >  {
> > if (!COMPAT_USE_64BIT_TIME &&
> > -   (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO))
> > +   (optname == SO_RCVTIMEO_OLD || optname == SO_SNDTIMEO_OLD))
> > return do_get_sock_timeout(sock, level, optname, optval, 
> > optlen);
> > return sock_getsockopt(sock, level, optname, optval, optlen);
> >  }
>
> I looked at the original code and noticed that it's horrible, which of course
> is not your fault, but I wonder if we should just fix it now to avoid that
> get_fs()/set_fs() hack, since that code mostly implements what you
> also have in your patch 3 (which is done more nicely).

I did think of getting rid of set_fs()/ get_fs() here.
But, I wasn't sure as the maintainers seemed to prefer to leave to the
old code as is in the other series for timestamps.

> I'll follow up with a patch to demonstrate what I mean here. Your third
> patch will then just have to add another code path so we can handle
> all of old_timespec32 (for existing 32-bit user space), __kernel_old_timespec
> (for sparc64) and __kernel_sock_timeval (for everything else).

Cool, I will rebase on top of your patch.

Thanks,
Deepa


Re: [PATCH v2 1/2] mm: add probe_user_read()

2019-01-08 Thread Kees Cook
On Tue, Jan 8, 2019 at 1:11 PM Christophe Leroy  wrote:
>
>
>
> Le 08/01/2019 à 20:48, Andrew Morton a écrit :
> > On Tue,  8 Jan 2019 07:37:44 + (UTC) Christophe Leroy 
> >  wrote:
> >
> >> In powerpc code, there are several places implementing safe
> >> access to user data. This is sometimes implemented using
> >> probe_kernel_address() with additional access_ok() verification,
> >> sometimes with get_user() enclosed in a pagefault_disable()/enable()
> >> pair, etc. :
> >>  show_user_instructions()
> >>  bad_stack_expansion()
> >>  p9_hmi_special_emu()
> >>  fsl_pci_mcheck_exception()
> >>  read_user_stack_64()
> >>  read_user_stack_32() on PPC64
> >>  read_user_stack_32() on PPC32
> >>  power_pmu_bhrb_to()
> >>
> >> In the same spirit as probe_kernel_read(), this patch adds
> >> probe_user_read().
> >>
> >> probe_user_read() does the same as probe_kernel_read() but
> >> first checks that it is really a user address.
> >>
> >> ...
> >>
> >> --- a/include/linux/uaccess.h
> >> +++ b/include/linux/uaccess.h
> >> @@ -263,6 +263,40 @@ extern long strncpy_from_unsafe(char *dst, const void 
> >> *unsafe_addr, long count);
> >>   #define probe_kernel_address(addr, retval) \
> >>  probe_kernel_read(, addr, sizeof(retval))
> >>
> >> +/**
> >> + * probe_user_read(): safely attempt to read from a user location
> >> + * @dst: pointer to the buffer that shall take the data
> >> + * @src: address to read from
> >> + * @size: size of the data chunk
> >> + *
> >> + * Returns: 0 on success, -EFAULT on error.
> >> + *
> >> + * Safely read from address @src to the buffer at @dst.  If a kernel fault
> >> + * happens, handle that and return -EFAULT.
> >> + *
> >> + * We ensure that the copy_from_user is executed in atomic context so that
> >> + * do_page_fault() doesn't attempt to take mmap_sem.  This makes
> >> + * probe_user_read() suitable for use within regions where the caller
> >> + * already holds mmap_sem, or other locks which nest inside mmap_sem.
> >> + */
> >> +
> >> +#ifndef probe_user_read
> >> +static __always_inline long probe_user_read(void *dst, const void __user 
> >> *src,
> >> +size_t size)
> >> +{
> >> +long ret;
> >> +
> >> +if (!access_ok(src, size))
> >> +return -EFAULT;
> >> +
> >> +pagefault_disable();
> >> +ret = __copy_from_user_inatomic(dst, src, size);
> >> +pagefault_enable();
> >> +
> >> +return ret ? -EFAULT : 0;
> >> +}
> >> +#endif
> >
> > Why was the __always_inline needed?
> >
> > This function is pretty large.  Why is it inlined?
> >
>
> Kees told to do that way, see https://patchwork.ozlabs.org/patch/986848/

Yeah, I'd like to make sure we can plumb the size checks down into the
user copy primitives.

-- 
Kees Cook


Re: [PATCH v2 1/2] mm: add probe_user_read()

2019-01-08 Thread Christophe Leroy




Le 08/01/2019 à 20:48, Andrew Morton a écrit :

On Tue,  8 Jan 2019 07:37:44 + (UTC) Christophe Leroy 
 wrote:


In powerpc code, there are several places implementing safe
access to user data. This is sometimes implemented using
probe_kernel_address() with additional access_ok() verification,
sometimes with get_user() enclosed in a pagefault_disable()/enable()
pair, etc. :
 show_user_instructions()
 bad_stack_expansion()
 p9_hmi_special_emu()
 fsl_pci_mcheck_exception()
 read_user_stack_64()
 read_user_stack_32() on PPC64
 read_user_stack_32() on PPC32
 power_pmu_bhrb_to()

In the same spirit as probe_kernel_read(), this patch adds
probe_user_read().

probe_user_read() does the same as probe_kernel_read() but
first checks that it is really a user address.

...

--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -263,6 +263,40 @@ extern long strncpy_from_unsafe(char *dst, const void 
*unsafe_addr, long count);
  #define probe_kernel_address(addr, retval)\
probe_kernel_read(, addr, sizeof(retval))
  
+/**

+ * probe_user_read(): safely attempt to read from a user location
+ * @dst: pointer to the buffer that shall take the data
+ * @src: address to read from
+ * @size: size of the data chunk
+ *
+ * Returns: 0 on success, -EFAULT on error.
+ *
+ * Safely read from address @src to the buffer at @dst.  If a kernel fault
+ * happens, handle that and return -EFAULT.
+ *
+ * We ensure that the copy_from_user is executed in atomic context so that
+ * do_page_fault() doesn't attempt to take mmap_sem.  This makes
+ * probe_user_read() suitable for use within regions where the caller
+ * already holds mmap_sem, or other locks which nest inside mmap_sem.
+ */
+
+#ifndef probe_user_read
+static __always_inline long probe_user_read(void *dst, const void __user *src,
+   size_t size)
+{
+   long ret;
+
+   if (!access_ok(src, size))
+   return -EFAULT;
+
+   pagefault_disable();
+   ret = __copy_from_user_inatomic(dst, src, size);
+   pagefault_enable();
+
+   return ret ? -EFAULT : 0;
+}
+#endif


Why was the __always_inline needed?

This function is pretty large.  Why is it inlined?



Kees told to do that way, see https://patchwork.ozlabs.org/patch/986848/

Christophe


[PATCH] socket: move compat timeout handling into sock.c

2019-01-08 Thread Arnd Bergmann
This is a cleanup to prepare for the addition of 64-bit time_t
in O_SNDTIMEO/O_RCVTIMEO. The existing compat handler seems
unnecessarily complex and error-prone, moving it all into the
main setsockopt()/getsockopt() implementation requires half
as much code and is easier to extend.

32-bit user space can now use old_timeval32 on both 32-bit
and 64-bit machines, while 64-bit code can use
__old_kernel_timeval.

Signed-off-by: Arnd Bergmann 
---
 net/compat.c| 66 +
 net/core/sock.c | 65 +++-
 2 files changed, 44 insertions(+), 87 deletions(-)

diff --git a/net/compat.c b/net/compat.c
index 959d1c51826d..ce8f6e8cdcd2 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -348,28 +348,6 @@ static int do_set_attach_filter(struct socket *sock, int 
level, int optname,
  sizeof(struct sock_fprog));
 }
 
-static int do_set_sock_timeout(struct socket *sock, int level,
-   int optname, char __user *optval, unsigned int optlen)
-{
-   struct compat_timeval __user *up = (struct compat_timeval __user 
*)optval;
-   struct timeval ktime;
-   mm_segment_t old_fs;
-   int err;
-
-   if (optlen < sizeof(*up))
-   return -EINVAL;
-   if (!access_ok(up, sizeof(*up)) ||
-   __get_user(ktime.tv_sec, >tv_sec) ||
-   __get_user(ktime.tv_usec, >tv_usec))
-   return -EFAULT;
-   old_fs = get_fs();
-   set_fs(KERNEL_DS);
-   err = sock_setsockopt(sock, level, optname, (char *), 
sizeof(ktime));
-   set_fs(old_fs);
-
-   return err;
-}
-
 static int compat_sock_setsockopt(struct socket *sock, int level, int optname,
char __user *optval, unsigned int optlen)
 {
@@ -377,10 +355,6 @@ static int compat_sock_setsockopt(struct socket *sock, int 
level, int optname,
optname == SO_ATTACH_REUSEPORT_CBPF)
return do_set_attach_filter(sock, level, optname,
optval, optlen);
-   if (!COMPAT_USE_64BIT_TIME &&
-   (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO))
-   return do_set_sock_timeout(sock, level, optname, optval, 
optlen);
-
return sock_setsockopt(sock, level, optname, optval, optlen);
 }
 
@@ -417,44 +391,6 @@ COMPAT_SYSCALL_DEFINE5(setsockopt, int, fd, int, level, 
int, optname,
return __compat_sys_setsockopt(fd, level, optname, optval, optlen);
 }
 
-static int do_get_sock_timeout(struct socket *sock, int level, int optname,
-   char __user *optval, int __user *optlen)
-{
-   struct compat_timeval __user *up;
-   struct timeval ktime;
-   mm_segment_t old_fs;
-   int len, err;
-
-   up = (struct compat_timeval __user *) optval;
-   if (get_user(len, optlen))
-   return -EFAULT;
-   if (len < sizeof(*up))
-   return -EINVAL;
-   len = sizeof(ktime);
-   old_fs = get_fs();
-   set_fs(KERNEL_DS);
-   err = sock_getsockopt(sock, level, optname, (char *) , );
-   set_fs(old_fs);
-
-   if (!err) {
-   if (put_user(sizeof(*up), optlen) ||
-   !access_ok(up, sizeof(*up)) ||
-   __put_user(ktime.tv_sec, >tv_sec) ||
-   __put_user(ktime.tv_usec, >tv_usec))
-   err = -EFAULT;
-   }
-   return err;
-}
-
-static int compat_sock_getsockopt(struct socket *sock, int level, int optname,
-   char __user *optval, int __user *optlen)
-{
-   if (!COMPAT_USE_64BIT_TIME &&
-   (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO))
-   return do_get_sock_timeout(sock, level, optname, optval, 
optlen);
-   return sock_getsockopt(sock, level, optname, optval, optlen);
-}
-
 int compat_sock_get_timestamp(struct sock *sk, struct timeval __user 
*userstamp)
 {
struct compat_timeval __user *ctv;
@@ -527,7 +463,7 @@ static int __compat_sys_getsockopt(int fd, int level, int 
optname,
}
 
if (level == SOL_SOCKET)
-   err = compat_sock_getsockopt(sock, level,
+   err = sock_getsockopt(sock, level,
optname, optval, optlen);
else if (sock->ops->compat_getsockopt)
err = sock->ops->compat_getsockopt(sock, level,
diff --git a/net/core/sock.c b/net/core/sock.c
index 6aa2e7e0b4fb..e50b9a2abc92 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -335,14 +335,48 @@ int __sk_backlog_rcv(struct sock *sk, struct sk_buff *skb)
 }
 EXPORT_SYMBOL(__sk_backlog_rcv);
 
+static int sock_get_timeout(long timeo, void *optval)
+{
+   struct __kernel_old_timeval tv;
+
+   if (timeo == MAX_SCHEDULE_TIMEOUT) {
+   tv.tv_sec = 0;
+   tv.tv_usec = 0;
+   } else {
+   tv.tv_sec = timeo / HZ;
+   

Re: [PATCH 2/3] socket: Rename SO_RCVTIMEO/ SO_SNDTIMEO with _OLD suffixes

2019-01-08 Thread Arnd Bergmann
On Tue, Jan 8, 2019 at 6:24 AM Deepa Dinamani  wrote:
>
> SO_RCVTIMEO and SO_SNDTIMEO socket options use struct timeval
> as the time format. struct timeval is not y2038 safe.
> The subsequent patches in the series add support for new socket
> timeout options with _NEW suffix that are y2038 safe.
> Rename the existing options with _OLD suffix forms so that the
> right option is enabled for userspace applications according
> to the architecture and time_t definition of libc.
>
> Signed-off-by: Deepa Dinamani 

Looks good overall. A few minor concerns:

The description above makes it sound like there is a bug with y2038-safety
in this particular interface, which I think is just not what you meant,
as the change is only needed for compatiblity with new C libraries
that work around the y2038 problem in general by changing their
timeval definition.

> diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
> index 76976d6e50f9..c98ad9777ad9 100644
> --- a/fs/dlm/lowcomms.c
> +++ b/fs/dlm/lowcomms.c
> @@ -1089,12 +1089,12 @@ static void sctp_connect_to_sock(struct connection 
> *con)
>  * since O_NONBLOCK argument in connect() function does not work here,
>  * then, we should restore the default value of this attribute.
>  */
> -   kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, (char *),
> +   kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO_OLD, (char *),
>   sizeof(tv));
> result = sock->ops->connect(sock, (struct sockaddr *), addr_len,
>0);
> memset(, 0, sizeof(tv));
> -   kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, (char *),
> +   kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO_OLD, (char *),
>   sizeof(tv));
>
> if (result == -EINPROGRESS)

It took me a bit to realize there that this is safe as well even if
we don't use SO_SNDTIMEO_NEW, for the same reason.

> --- a/net/compat.c
> +++ b/net/compat.c
> @@ -378,7 +378,7 @@ static int compat_sock_setsockopt(struct socket *sock, 
> int level, int optname,
> return do_set_attach_filter(sock, level, optname,
> optval, optlen);
> if (!COMPAT_USE_64BIT_TIME &&
> -   (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO))
> +   (optname == SO_RCVTIMEO_OLD || optname == SO_SNDTIMEO_OLD))
> return do_set_sock_timeout(sock, level, optname, optval, 
> optlen);
>
> return sock_setsockopt(sock, level, optname, optval, optlen);
> @@ -450,7 +450,7 @@ static int compat_sock_getsockopt(struct socket *sock, 
> int level, int optname,
> char __user *optval, int __user *optlen)
>  {
> if (!COMPAT_USE_64BIT_TIME &&
> -   (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO))
> +   (optname == SO_RCVTIMEO_OLD || optname == SO_SNDTIMEO_OLD))
> return do_get_sock_timeout(sock, level, optname, optval, 
> optlen);
> return sock_getsockopt(sock, level, optname, optval, optlen);
>  }

I looked at the original code and noticed that it's horrible, which of course
is not your fault, but I wonder if we should just fix it now to avoid that
get_fs()/set_fs() hack, since that code mostly implements what you
also have in your patch 3 (which is done more nicely).

I'll follow up with a patch to demonstrate what I mean here. Your third
patch will then just have to add another code path so we can handle
all of old_timespec32 (for existing 32-bit user space), __kernel_old_timespec
(for sparc64) and __kernel_sock_timeval (for everything else).

   Arnd


Re: [PATCH V6 0/4] mm/kvm/vfio/ppc64: Migrate compound pages out of CMA region

2019-01-08 Thread Andrew Morton
On Tue,  8 Jan 2019 10:21:06 +0530 "Aneesh Kumar K.V" 
 wrote:

> ppc64 use CMA area for the allocation of guest page table (hash page table). 
> We won't
> be able to start guest if we fail to allocate hash page table. We have 
> observed
> hash table allocation failure because we failed to migrate pages out of CMA 
> region
> because they were pinned. This happen when we are using VFIO. VFIO on ppc64 
> pins
> the entire guest RAM. If the guest RAM pages get allocated out of CMA region, 
> we
> won't be able to migrate those pages. The pages are also pinned for the 
> lifetime of the
> guest.
> 
> Currently we support migration of non-compound pages. With THP and with the 
> addition of
>  hugetlb migration we can end up allocating compound pages from CMA region. 
> This
> patch series add support for migrating compound pages. The first path adds 
> the helper
> get_user_pages_cma_migrate() which pin the page making sure we migrate them 
> out of
> CMA region before incrementing the reference count. 

Does this code do anything for architectures other than powerpc?  If
not, should we be adding the ifdefs to avoid burdening other
architectures with unused code?



Re: [PATCH v2 1/2] mm: add probe_user_read()

2019-01-08 Thread Andrew Morton
On Tue,  8 Jan 2019 07:37:44 + (UTC) Christophe Leroy 
 wrote:

> In powerpc code, there are several places implementing safe
> access to user data. This is sometimes implemented using
> probe_kernel_address() with additional access_ok() verification,
> sometimes with get_user() enclosed in a pagefault_disable()/enable()
> pair, etc. :
> show_user_instructions()
> bad_stack_expansion()
> p9_hmi_special_emu()
> fsl_pci_mcheck_exception()
> read_user_stack_64()
> read_user_stack_32() on PPC64
> read_user_stack_32() on PPC32
> power_pmu_bhrb_to()
> 
> In the same spirit as probe_kernel_read(), this patch adds
> probe_user_read().
> 
> probe_user_read() does the same as probe_kernel_read() but
> first checks that it is really a user address.
> 
> ...
>
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -263,6 +263,40 @@ extern long strncpy_from_unsafe(char *dst, const void 
> *unsafe_addr, long count);
>  #define probe_kernel_address(addr, retval)   \
>   probe_kernel_read(, addr, sizeof(retval))
>  
> +/**
> + * probe_user_read(): safely attempt to read from a user location
> + * @dst: pointer to the buffer that shall take the data
> + * @src: address to read from
> + * @size: size of the data chunk
> + *
> + * Returns: 0 on success, -EFAULT on error.
> + *
> + * Safely read from address @src to the buffer at @dst.  If a kernel fault
> + * happens, handle that and return -EFAULT.
> + *
> + * We ensure that the copy_from_user is executed in atomic context so that
> + * do_page_fault() doesn't attempt to take mmap_sem.  This makes
> + * probe_user_read() suitable for use within regions where the caller
> + * already holds mmap_sem, or other locks which nest inside mmap_sem.
> + */
> +
> +#ifndef probe_user_read
> +static __always_inline long probe_user_read(void *dst, const void __user 
> *src,
> + size_t size)
> +{
> + long ret;
> +
> + if (!access_ok(src, size))
> + return -EFAULT;
> +
> + pagefault_disable();
> + ret = __copy_from_user_inatomic(dst, src, size);
> + pagefault_enable();
> +
> + return ret ? -EFAULT : 0;
> +}
> +#endif

Why was the __always_inline needed?

This function is pretty large.  Why is it inlined?


[PATCH AUTOSEL 3.18 05/19] powerpc/pseries/cpuidle: Fix preempt warning

2019-01-08 Thread Sasha Levin
From: Breno Leitao 

[ Upstream commit 2b038cbc5fcf12a7ee1cc9bfd5da1e46dacdee87 ]

When booting a pseries kernel with PREEMPT enabled, it dumps the
following warning:

   BUG: using smp_processor_id() in preemptible [] code: swapper/0/1
   caller is pseries_processor_idle_init+0x5c/0x22c
   CPU: 13 PID: 1 Comm: swapper/0 Not tainted 
4.20.0-rc3-00090-g12201a0128bc-dirty #828
   Call Trace:
   [c00429437ab0] [c09c8878] dump_stack+0xec/0x164 (unreliable)
   [c00429437b00] [c05f2f24] check_preemption_disabled+0x154/0x160
   [c00429437b90] [c0cab8e8] pseries_processor_idle_init+0x5c/0x22c
   [c00429437c10] [c0010ed4] do_one_initcall+0x64/0x300
   [c00429437ce0] [c0c54500] kernel_init_freeable+0x3f0/0x500
   [c00429437db0] [c00112dc] kernel_init+0x2c/0x160
   [c00429437e20] [c000c1d0] ret_from_kernel_thread+0x5c/0x6c

This happens because the code calls get_lppaca() which calls
get_paca() and it checks if preemption is disabled through
check_preemption_disabled().

Preemption should be disabled because the per CPU variable may make no
sense if there is a preemption (and a CPU switch) after it reads the
per CPU data and when it is used.

In this device driver specifically, it is not a problem, because this
code just needs to have access to one lppaca struct, and it does not
matter if it is the current per CPU lppaca struct or not (i.e. when
there is a preemption and a CPU migration).

That said, the most appropriate fix seems to be related to avoiding
the debug_smp_processor_id() call at get_paca(), instead of calling
preempt_disable() before get_paca().

Signed-off-by: Breno Leitao 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 drivers/cpuidle/cpuidle-pseries.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/cpuidle/cpuidle-pseries.c 
b/drivers/cpuidle/cpuidle-pseries.c
index 6f7b01956885..b755d3a54e3c 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -237,7 +237,13 @@ static int pseries_idle_probe(void)
return -ENODEV;
 
if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
-   if (lppaca_shared_proc(get_lppaca())) {
+   /*
+* Use local_paca instead of get_lppaca() since
+* preemption is not disabled, and it is not required in
+* fact, since lppaca_ptr does not need to be the value
+* associated to the current CPU, it can be from any CPU.
+*/
+   if (lppaca_shared_proc(local_paca->lppaca_ptr)) {
cpuidle_state_table = shared_states;
max_idle_state = ARRAY_SIZE(shared_states);
} else {
-- 
2.19.1



[PATCH AUTOSEL 4.4 10/28] powerpc/pseries/cpuidle: Fix preempt warning

2019-01-08 Thread Sasha Levin
From: Breno Leitao 

[ Upstream commit 2b038cbc5fcf12a7ee1cc9bfd5da1e46dacdee87 ]

When booting a pseries kernel with PREEMPT enabled, it dumps the
following warning:

   BUG: using smp_processor_id() in preemptible [] code: swapper/0/1
   caller is pseries_processor_idle_init+0x5c/0x22c
   CPU: 13 PID: 1 Comm: swapper/0 Not tainted 
4.20.0-rc3-00090-g12201a0128bc-dirty #828
   Call Trace:
   [c00429437ab0] [c09c8878] dump_stack+0xec/0x164 (unreliable)
   [c00429437b00] [c05f2f24] check_preemption_disabled+0x154/0x160
   [c00429437b90] [c0cab8e8] pseries_processor_idle_init+0x5c/0x22c
   [c00429437c10] [c0010ed4] do_one_initcall+0x64/0x300
   [c00429437ce0] [c0c54500] kernel_init_freeable+0x3f0/0x500
   [c00429437db0] [c00112dc] kernel_init+0x2c/0x160
   [c00429437e20] [c000c1d0] ret_from_kernel_thread+0x5c/0x6c

This happens because the code calls get_lppaca() which calls
get_paca() and it checks if preemption is disabled through
check_preemption_disabled().

Preemption should be disabled because the per CPU variable may make no
sense if there is a preemption (and a CPU switch) after it reads the
per CPU data and when it is used.

In this device driver specifically, it is not a problem, because this
code just needs to have access to one lppaca struct, and it does not
matter if it is the current per CPU lppaca struct or not (i.e. when
there is a preemption and a CPU migration).

That said, the most appropriate fix seems to be related to avoiding
the debug_smp_processor_id() call at get_paca(), instead of calling
preempt_disable() before get_paca().

Signed-off-by: Breno Leitao 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 drivers/cpuidle/cpuidle-pseries.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/cpuidle/cpuidle-pseries.c 
b/drivers/cpuidle/cpuidle-pseries.c
index 07135e009d8b..601a6c3acc7f 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -240,7 +240,13 @@ static int pseries_idle_probe(void)
return -ENODEV;
 
if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
-   if (lppaca_shared_proc(get_lppaca())) {
+   /*
+* Use local_paca instead of get_lppaca() since
+* preemption is not disabled, and it is not required in
+* fact, since lppaca_ptr does not need to be the value
+* associated to the current CPU, it can be from any CPU.
+*/
+   if (lppaca_shared_proc(local_paca->lppaca_ptr)) {
cpuidle_state_table = shared_states;
max_idle_state = ARRAY_SIZE(shared_states);
} else {
-- 
2.19.1



[PATCH AUTOSEL 4.9 15/36] powerpc/pseries/cpuidle: Fix preempt warning

2019-01-08 Thread Sasha Levin
From: Breno Leitao 

[ Upstream commit 2b038cbc5fcf12a7ee1cc9bfd5da1e46dacdee87 ]

When booting a pseries kernel with PREEMPT enabled, it dumps the
following warning:

   BUG: using smp_processor_id() in preemptible [] code: swapper/0/1
   caller is pseries_processor_idle_init+0x5c/0x22c
   CPU: 13 PID: 1 Comm: swapper/0 Not tainted 
4.20.0-rc3-00090-g12201a0128bc-dirty #828
   Call Trace:
   [c00429437ab0] [c09c8878] dump_stack+0xec/0x164 (unreliable)
   [c00429437b00] [c05f2f24] check_preemption_disabled+0x154/0x160
   [c00429437b90] [c0cab8e8] pseries_processor_idle_init+0x5c/0x22c
   [c00429437c10] [c0010ed4] do_one_initcall+0x64/0x300
   [c00429437ce0] [c0c54500] kernel_init_freeable+0x3f0/0x500
   [c00429437db0] [c00112dc] kernel_init+0x2c/0x160
   [c00429437e20] [c000c1d0] ret_from_kernel_thread+0x5c/0x6c

This happens because the code calls get_lppaca() which calls
get_paca() and it checks if preemption is disabled through
check_preemption_disabled().

Preemption should be disabled because the per CPU variable may make no
sense if there is a preemption (and a CPU switch) after it reads the
per CPU data and when it is used.

In this device driver specifically, it is not a problem, because this
code just needs to have access to one lppaca struct, and it does not
matter if it is the current per CPU lppaca struct or not (i.e. when
there is a preemption and a CPU migration).

That said, the most appropriate fix seems to be related to avoiding
the debug_smp_processor_id() call at get_paca(), instead of calling
preempt_disable() before get_paca().

Signed-off-by: Breno Leitao 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 drivers/cpuidle/cpuidle-pseries.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/cpuidle/cpuidle-pseries.c 
b/drivers/cpuidle/cpuidle-pseries.c
index 166ccd711ec9..83203c59bf59 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -230,7 +230,13 @@ static int pseries_idle_probe(void)
return -ENODEV;
 
if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
-   if (lppaca_shared_proc(get_lppaca())) {
+   /*
+* Use local_paca instead of get_lppaca() since
+* preemption is not disabled, and it is not required in
+* fact, since lppaca_ptr does not need to be the value
+* associated to the current CPU, it can be from any CPU.
+*/
+   if (lppaca_shared_proc(local_paca->lppaca_ptr)) {
cpuidle_state_table = shared_states;
max_idle_state = ARRAY_SIZE(shared_states);
} else {
-- 
2.19.1



[PATCH AUTOSEL 4.9 14/36] powerpc/xmon: Fix invocation inside lock region

2019-01-08 Thread Sasha Levin
From: Breno Leitao 

[ Upstream commit 8d4a862276a9c30a269d368d324fb56529e6d5fd ]

Currently xmon needs to get devtree_lock (through rtas_token()) during its
invocation (at crash time). If there is a crash while devtree_lock is being
held, then xmon tries to get the lock but spins forever and never get into
the interactive debugger, as in the following case:

int *ptr = NULL;
raw_spin_lock_irqsave(_lock, flags);
*ptr = 0xdeadbeef;

This patch avoids calling rtas_token(), thus trying to get the same lock,
at crash time. This new mechanism proposes getting the token at
initialization time (xmon_init()) and just consuming it at crash time.

This would allow xmon to be possible invoked independent of devtree_lock
being held or not.

Signed-off-by: Breno Leitao 
Reviewed-by: Thiago Jung Bauermann 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/xmon/xmon.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 760545519a0b..687e8b8bf5c6 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -73,6 +73,9 @@ static int xmon_gate;
 #define xmon_owner 0
 #endif /* CONFIG_SMP */
 
+#ifdef CONFIG_PPC_PSERIES
+static int set_indicator_token = RTAS_UNKNOWN_SERVICE;
+#endif
 static unsigned long in_xmon __read_mostly = 0;
 
 static unsigned long adrs;
@@ -340,7 +343,6 @@ static inline void disable_surveillance(void)
 #ifdef CONFIG_PPC_PSERIES
/* Since this can't be a module, args should end up below 4GB. */
static struct rtas_args args;
-   int token;
 
/*
 * At this point we have got all the cpus we can into
@@ -349,11 +351,11 @@ static inline void disable_surveillance(void)
 * If we did try to take rtas.lock there would be a
 * real possibility of deadlock.
 */
-   token = rtas_token("set-indicator");
-   if (token == RTAS_UNKNOWN_SERVICE)
+   if (set_indicator_token == RTAS_UNKNOWN_SERVICE)
return;
 
-   rtas_call_unlocked(, token, 3, 1, NULL, SURVEILLANCE_TOKEN, 0, 0);
+   rtas_call_unlocked(, set_indicator_token, 3, 1, NULL,
+  SURVEILLANCE_TOKEN, 0, 0);
 
 #endif /* CONFIG_PPC_PSERIES */
 }
@@ -3227,6 +3229,14 @@ static void xmon_init(int enable)
__debugger_iabr_match = xmon_iabr_match;
__debugger_break_match = xmon_break_match;
__debugger_fault_handler = xmon_fault_handler;
+
+#ifdef CONFIG_PPC_PSERIES
+   /*
+* Get the token here to avoid trying to get a lock
+* during the crash, causing a deadlock.
+*/
+   set_indicator_token = rtas_token("set-indicator");
+#endif
} else {
__debugger = NULL;
__debugger_ipi = NULL;
-- 
2.19.1



[PATCH AUTOSEL 4.14 19/53] powerpc/pseries/cpuidle: Fix preempt warning

2019-01-08 Thread Sasha Levin
From: Breno Leitao 

[ Upstream commit 2b038cbc5fcf12a7ee1cc9bfd5da1e46dacdee87 ]

When booting a pseries kernel with PREEMPT enabled, it dumps the
following warning:

   BUG: using smp_processor_id() in preemptible [] code: swapper/0/1
   caller is pseries_processor_idle_init+0x5c/0x22c
   CPU: 13 PID: 1 Comm: swapper/0 Not tainted 
4.20.0-rc3-00090-g12201a0128bc-dirty #828
   Call Trace:
   [c00429437ab0] [c09c8878] dump_stack+0xec/0x164 (unreliable)
   [c00429437b00] [c05f2f24] check_preemption_disabled+0x154/0x160
   [c00429437b90] [c0cab8e8] pseries_processor_idle_init+0x5c/0x22c
   [c00429437c10] [c0010ed4] do_one_initcall+0x64/0x300
   [c00429437ce0] [c0c54500] kernel_init_freeable+0x3f0/0x500
   [c00429437db0] [c00112dc] kernel_init+0x2c/0x160
   [c00429437e20] [c000c1d0] ret_from_kernel_thread+0x5c/0x6c

This happens because the code calls get_lppaca() which calls
get_paca() and it checks if preemption is disabled through
check_preemption_disabled().

Preemption should be disabled because the per CPU variable may make no
sense if there is a preemption (and a CPU switch) after it reads the
per CPU data and when it is used.

In this device driver specifically, it is not a problem, because this
code just needs to have access to one lppaca struct, and it does not
matter if it is the current per CPU lppaca struct or not (i.e. when
there is a preemption and a CPU migration).

That said, the most appropriate fix seems to be related to avoiding
the debug_smp_processor_id() call at get_paca(), instead of calling
preempt_disable() before get_paca().

Signed-off-by: Breno Leitao 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 drivers/cpuidle/cpuidle-pseries.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/cpuidle/cpuidle-pseries.c 
b/drivers/cpuidle/cpuidle-pseries.c
index a187a39fb866..7f21c6a57178 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -239,7 +239,13 @@ static int pseries_idle_probe(void)
return -ENODEV;
 
if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
-   if (lppaca_shared_proc(get_lppaca())) {
+   /*
+* Use local_paca instead of get_lppaca() since
+* preemption is not disabled, and it is not required in
+* fact, since lppaca_ptr does not need to be the value
+* associated to the current CPU, it can be from any CPU.
+*/
+   if (lppaca_shared_proc(local_paca->lppaca_ptr)) {
cpuidle_state_table = shared_states;
max_idle_state = ARRAY_SIZE(shared_states);
} else {
-- 
2.19.1



[PATCH AUTOSEL 4.14 18/53] powerpc/xmon: Fix invocation inside lock region

2019-01-08 Thread Sasha Levin
From: Breno Leitao 

[ Upstream commit 8d4a862276a9c30a269d368d324fb56529e6d5fd ]

Currently xmon needs to get devtree_lock (through rtas_token()) during its
invocation (at crash time). If there is a crash while devtree_lock is being
held, then xmon tries to get the lock but spins forever and never get into
the interactive debugger, as in the following case:

int *ptr = NULL;
raw_spin_lock_irqsave(_lock, flags);
*ptr = 0xdeadbeef;

This patch avoids calling rtas_token(), thus trying to get the same lock,
at crash time. This new mechanism proposes getting the token at
initialization time (xmon_init()) and just consuming it at crash time.

This would allow xmon to be possible invoked independent of devtree_lock
being held or not.

Signed-off-by: Breno Leitao 
Reviewed-by: Thiago Jung Bauermann 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/xmon/xmon.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index a5938fadd031..f752f771f29d 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -78,6 +78,9 @@ static int xmon_gate;
 #define xmon_owner 0
 #endif /* CONFIG_SMP */
 
+#ifdef CONFIG_PPC_PSERIES
+static int set_indicator_token = RTAS_UNKNOWN_SERVICE;
+#endif
 static unsigned long in_xmon __read_mostly = 0;
 static int xmon_on = IS_ENABLED(CONFIG_XMON_DEFAULT);
 
@@ -357,7 +360,6 @@ static inline void disable_surveillance(void)
 #ifdef CONFIG_PPC_PSERIES
/* Since this can't be a module, args should end up below 4GB. */
static struct rtas_args args;
-   int token;
 
/*
 * At this point we have got all the cpus we can into
@@ -366,11 +368,11 @@ static inline void disable_surveillance(void)
 * If we did try to take rtas.lock there would be a
 * real possibility of deadlock.
 */
-   token = rtas_token("set-indicator");
-   if (token == RTAS_UNKNOWN_SERVICE)
+   if (set_indicator_token == RTAS_UNKNOWN_SERVICE)
return;
 
-   rtas_call_unlocked(, token, 3, 1, NULL, SURVEILLANCE_TOKEN, 0, 0);
+   rtas_call_unlocked(, set_indicator_token, 3, 1, NULL,
+  SURVEILLANCE_TOKEN, 0, 0);
 
 #endif /* CONFIG_PPC_PSERIES */
 }
@@ -3472,6 +3474,14 @@ static void xmon_init(int enable)
__debugger_iabr_match = xmon_iabr_match;
__debugger_break_match = xmon_break_match;
__debugger_fault_handler = xmon_fault_handler;
+
+#ifdef CONFIG_PPC_PSERIES
+   /*
+* Get the token here to avoid trying to get a lock
+* during the crash, causing a deadlock.
+*/
+   set_indicator_token = rtas_token("set-indicator");
+#endif
} else {
__debugger = NULL;
__debugger_ipi = NULL;
-- 
2.19.1



[PATCH AUTOSEL 4.19 38/97] powerpc/pseries/cpuidle: Fix preempt warning

2019-01-08 Thread Sasha Levin
From: Breno Leitao 

[ Upstream commit 2b038cbc5fcf12a7ee1cc9bfd5da1e46dacdee87 ]

When booting a pseries kernel with PREEMPT enabled, it dumps the
following warning:

   BUG: using smp_processor_id() in preemptible [] code: swapper/0/1
   caller is pseries_processor_idle_init+0x5c/0x22c
   CPU: 13 PID: 1 Comm: swapper/0 Not tainted 
4.20.0-rc3-00090-g12201a0128bc-dirty #828
   Call Trace:
   [c00429437ab0] [c09c8878] dump_stack+0xec/0x164 (unreliable)
   [c00429437b00] [c05f2f24] check_preemption_disabled+0x154/0x160
   [c00429437b90] [c0cab8e8] pseries_processor_idle_init+0x5c/0x22c
   [c00429437c10] [c0010ed4] do_one_initcall+0x64/0x300
   [c00429437ce0] [c0c54500] kernel_init_freeable+0x3f0/0x500
   [c00429437db0] [c00112dc] kernel_init+0x2c/0x160
   [c00429437e20] [c000c1d0] ret_from_kernel_thread+0x5c/0x6c

This happens because the code calls get_lppaca() which calls
get_paca() and it checks if preemption is disabled through
check_preemption_disabled().

Preemption should be disabled because the per CPU variable may make no
sense if there is a preemption (and a CPU switch) after it reads the
per CPU data and when it is used.

In this device driver specifically, it is not a problem, because this
code just needs to have access to one lppaca struct, and it does not
matter if it is the current per CPU lppaca struct or not (i.e. when
there is a preemption and a CPU migration).

That said, the most appropriate fix seems to be related to avoiding
the debug_smp_processor_id() call at get_paca(), instead of calling
preempt_disable() before get_paca().

Signed-off-by: Breno Leitao 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 drivers/cpuidle/cpuidle-pseries.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/cpuidle/cpuidle-pseries.c 
b/drivers/cpuidle/cpuidle-pseries.c
index 9e56bc411061..74c247972bb3 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -247,7 +247,13 @@ static int pseries_idle_probe(void)
return -ENODEV;
 
if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
-   if (lppaca_shared_proc(get_lppaca())) {
+   /*
+* Use local_paca instead of get_lppaca() since
+* preemption is not disabled, and it is not required in
+* fact, since lppaca_ptr does not need to be the value
+* associated to the current CPU, it can be from any CPU.
+*/
+   if (lppaca_shared_proc(local_paca->lppaca_ptr)) {
cpuidle_state_table = shared_states;
max_idle_state = ARRAY_SIZE(shared_states);
} else {
-- 
2.19.1



[PATCH AUTOSEL 4.19 37/97] powerpc/xmon: Fix invocation inside lock region

2019-01-08 Thread Sasha Levin
From: Breno Leitao 

[ Upstream commit 8d4a862276a9c30a269d368d324fb56529e6d5fd ]

Currently xmon needs to get devtree_lock (through rtas_token()) during its
invocation (at crash time). If there is a crash while devtree_lock is being
held, then xmon tries to get the lock but spins forever and never get into
the interactive debugger, as in the following case:

int *ptr = NULL;
raw_spin_lock_irqsave(_lock, flags);
*ptr = 0xdeadbeef;

This patch avoids calling rtas_token(), thus trying to get the same lock,
at crash time. This new mechanism proposes getting the token at
initialization time (xmon_init()) and just consuming it at crash time.

This would allow xmon to be possible invoked independent of devtree_lock
being held or not.

Signed-off-by: Breno Leitao 
Reviewed-by: Thiago Jung Bauermann 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/xmon/xmon.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 4264aedc7775..dd6badc31f45 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -75,6 +75,9 @@ static int xmon_gate;
 #define xmon_owner 0
 #endif /* CONFIG_SMP */
 
+#ifdef CONFIG_PPC_PSERIES
+static int set_indicator_token = RTAS_UNKNOWN_SERVICE;
+#endif
 static unsigned long in_xmon __read_mostly = 0;
 static int xmon_on = IS_ENABLED(CONFIG_XMON_DEFAULT);
 
@@ -358,7 +361,6 @@ static inline void disable_surveillance(void)
 #ifdef CONFIG_PPC_PSERIES
/* Since this can't be a module, args should end up below 4GB. */
static struct rtas_args args;
-   int token;
 
/*
 * At this point we have got all the cpus we can into
@@ -367,11 +369,11 @@ static inline void disable_surveillance(void)
 * If we did try to take rtas.lock there would be a
 * real possibility of deadlock.
 */
-   token = rtas_token("set-indicator");
-   if (token == RTAS_UNKNOWN_SERVICE)
+   if (set_indicator_token == RTAS_UNKNOWN_SERVICE)
return;
 
-   rtas_call_unlocked(, token, 3, 1, NULL, SURVEILLANCE_TOKEN, 0, 0);
+   rtas_call_unlocked(, set_indicator_token, 3, 1, NULL,
+  SURVEILLANCE_TOKEN, 0, 0);
 
 #endif /* CONFIG_PPC_PSERIES */
 }
@@ -3672,6 +3674,14 @@ static void xmon_init(int enable)
__debugger_iabr_match = xmon_iabr_match;
__debugger_break_match = xmon_break_match;
__debugger_fault_handler = xmon_fault_handler;
+
+#ifdef CONFIG_PPC_PSERIES
+   /*
+* Get the token here to avoid trying to get a lock
+* during the crash, causing a deadlock.
+*/
+   set_indicator_token = rtas_token("set-indicator");
+#endif
} else {
__debugger = NULL;
__debugger_ipi = NULL;
-- 
2.19.1



[PATCH AUTOSEL 4.20 108/117] KVM: PPC: Book3S HV: Apply combination of host and l1 pte rc for nested guest

2019-01-08 Thread Sasha Levin
From: Suraj Jitindar Singh 

[ Upstream commit 8b23eee4e55a32a2b51a180dfd27a8d214acc7a1 ]

The shadow page table contains ptes for translations from nested guest
address to host address. Currently when creating these ptes we take the
rc bits from the pte for the L1 guest address to host address
translation. This is incorrect as we must also factor in the rc bits
from the pte for the nested guest address to L1 guest address
translation (as contained in the L1 guest partition table for the nested
guest).

By not calculating these bits correctly L1 may not have been correctly
notified when it needed to update its rc bits in the partition table it
maintains for its nested guest.

Modify the code so that the rc bits in the resultant pte for the L2->L0
translation are the 'and' of the rc bits in the L2->L1 pte and the L1->L0
pte, also accounting for whether this was a write access when setting
the dirty bit.

Signed-off-by: Suraj Jitindar Singh 
Signed-off-by: Paul Mackerras 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kvm/book3s_hv_nested.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index fc64535e4c00..f8176ae3a5a7 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -1229,6 +1229,9 @@ static long int __kvmhv_nested_page_fault(struct kvm_vcpu 
*vcpu,
perm |= gpte.may_read ? 0UL : _PAGE_READ;
perm |= gpte.may_write ? 0UL : _PAGE_WRITE;
perm |= gpte.may_execute ? 0UL : _PAGE_EXEC;
+   /* Only set accessed/dirty (rc) bits if set in host and l1 guest ptes */
+   perm |= (gpte.rc & _PAGE_ACCESSED) ? 0UL : _PAGE_ACCESSED;
+   perm |= ((gpte.rc & _PAGE_DIRTY) && writing) ? 0UL : _PAGE_DIRTY;
pte = __pte(pte_val(pte) & ~perm);
 
/* What size pte can we insert? */
-- 
2.19.1



[PATCH AUTOSEL 4.20 107/117] KVM: PPC: Book3S HV: Align gfn to L1 page size when inserting nest-rmap entry

2019-01-08 Thread Sasha Levin
From: Suraj Jitindar Singh 

[ Upstream commit 8400f8740651c1a3081c30b46004451c448f4d5f ]

Nested rmap entries are used to store the translation from L1 gpa to L2
gpa when entries are inserted into the shadow (nested) page tables. This
rmap list is located by indexing the rmap array in the memslot by L1
gfn. When we come to search for these entries we only know the L1 page size
(which could be PAGE_SIZE, 2M or a 1G page) and so can only select a gfn
aligned to that size. This means that when we insert the entry, so we can
find it later, we need to align the gfn we use to select the rmap list
in which to insert the entry to L1 page size as well.

By not doing this we were missing nested rmap entries when modifying L1
ptes which were for a page also passed through to an L2 guest.

Signed-off-by: Suraj Jitindar Singh 
Signed-off-by: Paul Mackerras 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kvm/book3s_hv_nested.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index 401d2ecbebc5..fc64535e4c00 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -1220,6 +1220,8 @@ static long int __kvmhv_nested_page_fault(struct kvm_vcpu 
*vcpu,
return ret;
shift = kvmppc_radix_level_to_shift(level);
}
+   /* Align gfn to the start of the page */
+   gfn = (gpa & ~((1UL << shift) - 1)) >> PAGE_SHIFT;
 
/* 3. Compute the pte we need to insert for nest_gpa -> host r_addr */
 
-- 
2.19.1



[PATCH AUTOSEL 4.20 048/117] powerpc/pseries/cpuidle: Fix preempt warning

2019-01-08 Thread Sasha Levin
From: Breno Leitao 

[ Upstream commit 2b038cbc5fcf12a7ee1cc9bfd5da1e46dacdee87 ]

When booting a pseries kernel with PREEMPT enabled, it dumps the
following warning:

   BUG: using smp_processor_id() in preemptible [] code: swapper/0/1
   caller is pseries_processor_idle_init+0x5c/0x22c
   CPU: 13 PID: 1 Comm: swapper/0 Not tainted 
4.20.0-rc3-00090-g12201a0128bc-dirty #828
   Call Trace:
   [c00429437ab0] [c09c8878] dump_stack+0xec/0x164 (unreliable)
   [c00429437b00] [c05f2f24] check_preemption_disabled+0x154/0x160
   [c00429437b90] [c0cab8e8] pseries_processor_idle_init+0x5c/0x22c
   [c00429437c10] [c0010ed4] do_one_initcall+0x64/0x300
   [c00429437ce0] [c0c54500] kernel_init_freeable+0x3f0/0x500
   [c00429437db0] [c00112dc] kernel_init+0x2c/0x160
   [c00429437e20] [c000c1d0] ret_from_kernel_thread+0x5c/0x6c

This happens because the code calls get_lppaca() which calls
get_paca() and it checks if preemption is disabled through
check_preemption_disabled().

Preemption should be disabled because the per CPU variable may make no
sense if there is a preemption (and a CPU switch) after it reads the
per CPU data and when it is used.

In this device driver specifically, it is not a problem, because this
code just needs to have access to one lppaca struct, and it does not
matter if it is the current per CPU lppaca struct or not (i.e. when
there is a preemption and a CPU migration).

That said, the most appropriate fix seems to be related to avoiding
the debug_smp_processor_id() call at get_paca(), instead of calling
preempt_disable() before get_paca().

Signed-off-by: Breno Leitao 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 drivers/cpuidle/cpuidle-pseries.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/cpuidle/cpuidle-pseries.c 
b/drivers/cpuidle/cpuidle-pseries.c
index 9e56bc411061..74c247972bb3 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -247,7 +247,13 @@ static int pseries_idle_probe(void)
return -ENODEV;
 
if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
-   if (lppaca_shared_proc(get_lppaca())) {
+   /*
+* Use local_paca instead of get_lppaca() since
+* preemption is not disabled, and it is not required in
+* fact, since lppaca_ptr does not need to be the value
+* associated to the current CPU, it can be from any CPU.
+*/
+   if (lppaca_shared_proc(local_paca->lppaca_ptr)) {
cpuidle_state_table = shared_states;
max_idle_state = ARRAY_SIZE(shared_states);
} else {
-- 
2.19.1



[PATCH AUTOSEL 4.20 047/117] powerpc/xmon: Fix invocation inside lock region

2019-01-08 Thread Sasha Levin
From: Breno Leitao 

[ Upstream commit 8d4a862276a9c30a269d368d324fb56529e6d5fd ]

Currently xmon needs to get devtree_lock (through rtas_token()) during its
invocation (at crash time). If there is a crash while devtree_lock is being
held, then xmon tries to get the lock but spins forever and never get into
the interactive debugger, as in the following case:

int *ptr = NULL;
raw_spin_lock_irqsave(_lock, flags);
*ptr = 0xdeadbeef;

This patch avoids calling rtas_token(), thus trying to get the same lock,
at crash time. This new mechanism proposes getting the token at
initialization time (xmon_init()) and just consuming it at crash time.

This would allow xmon to be possible invoked independent of devtree_lock
being held or not.

Signed-off-by: Breno Leitao 
Reviewed-by: Thiago Jung Bauermann 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/xmon/xmon.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 36b8dc47a3c3..b566203d09c5 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -75,6 +75,9 @@ static int xmon_gate;
 #define xmon_owner 0
 #endif /* CONFIG_SMP */
 
+#ifdef CONFIG_PPC_PSERIES
+static int set_indicator_token = RTAS_UNKNOWN_SERVICE;
+#endif
 static unsigned long in_xmon __read_mostly = 0;
 static int xmon_on = IS_ENABLED(CONFIG_XMON_DEFAULT);
 
@@ -358,7 +361,6 @@ static inline void disable_surveillance(void)
 #ifdef CONFIG_PPC_PSERIES
/* Since this can't be a module, args should end up below 4GB. */
static struct rtas_args args;
-   int token;
 
/*
 * At this point we have got all the cpus we can into
@@ -367,11 +369,11 @@ static inline void disable_surveillance(void)
 * If we did try to take rtas.lock there would be a
 * real possibility of deadlock.
 */
-   token = rtas_token("set-indicator");
-   if (token == RTAS_UNKNOWN_SERVICE)
+   if (set_indicator_token == RTAS_UNKNOWN_SERVICE)
return;
 
-   rtas_call_unlocked(, token, 3, 1, NULL, SURVEILLANCE_TOKEN, 0, 0);
+   rtas_call_unlocked(, set_indicator_token, 3, 1, NULL,
+  SURVEILLANCE_TOKEN, 0, 0);
 
 #endif /* CONFIG_PPC_PSERIES */
 }
@@ -3688,6 +3690,14 @@ static void xmon_init(int enable)
__debugger_iabr_match = xmon_iabr_match;
__debugger_break_match = xmon_break_match;
__debugger_fault_handler = xmon_fault_handler;
+
+#ifdef CONFIG_PPC_PSERIES
+   /*
+* Get the token here to avoid trying to get a lock
+* during the crash, causing a deadlock.
+*/
+   set_indicator_token = rtas_token("set-indicator");
+#endif
} else {
__debugger = NULL;
__debugger_ipi = NULL;
-- 
2.19.1



[PATCH] powerpc/spufs: use struct_size() in kmalloc()

2019-01-08 Thread Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
int stuff;
void *entry[];
};

instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 arch/powerpc/platforms/cell/spufs/file.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/cell/spufs/file.c 
b/arch/powerpc/platforms/cell/spufs/file.c
index ae8123edddc6..48c2477e7e2a 100644
--- a/arch/powerpc/platforms/cell/spufs/file.c
+++ b/arch/powerpc/platforms/cell/spufs/file.c
@@ -2338,9 +2338,8 @@ static int spufs_switch_log_open(struct inode *inode, 
struct file *file)
goto out;
}
 
-   ctx->switch_log = kmalloc(sizeof(struct switch_log) +
-   SWITCH_LOG_BUFSIZE * sizeof(struct switch_log_entry),
-   GFP_KERNEL);
+   ctx->switch_log = kmalloc(struct_size(ctx->switch_log, log,
+ SWITCH_LOG_BUFSIZE), GFP_KERNEL);
 
if (!ctx->switch_log) {
rc = -ENOMEM;
-- 
2.20.1



Re: [PATCH v4 10/13] x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2019-01-08 Thread Peter Zijlstra
On Tue, Jan 08, 2019 at 11:36:33AM -0500, Boris Ostrovsky wrote:
> On 1/8/19 5:48 AM, Peter Zijlstra wrote:

> > On the various exclude options; they are as follows (IIUC):
> >
> >   - exclude_guest: we're a HV/host-kernel and we don't want the counter
> >to run when we run a guest context.
> >
> >   - exclude_host: we're a HV/host-kernel and we don't want the counter
> >   to run when we run in host context.
> >
> >   - exclude_hv: we're a guest and don't want the counter to run in HV
> > context.
> >
> > Now, KVM always implies exclude_hv afaict (for guests), I'm not sure
> > what, if anything Xen does on x86 (IIRC Brendan Gregg once said perf
> > works on Xen) -- nor quite sure who to ask, Boris, Jeurgen?
> 
> perf does work inside guests.
> 
> VPMU is managed by the Xen and it presents to the guest only samples
> that are associated with the guest. So from that perspective exclude_hv
> doesn't seem to be needed.
> 
> There is a VPMU mode that allows profiling whole system (host and
> guests) from dom0, and this where exclude_hv might be useful. But this
> mode, ahem, needs some work.

Thanks Boris!


Re: [1/2] powerpc/4xx/ocm: Fix phys_addr_t printf warnings

2019-01-08 Thread Christian Lamparter
On Tuesday, January 8, 2019 10:54:28 AM CET Michael Ellerman wrote:
> Christian Lamparter  writes:
> > On Wednesday, January 2, 2019 12:31:50 PM CET Michael Ellerman wrote:
> >> On Tue, 2019-01-01 at 03:56:00 UTC, Michael Ellerman wrote:
> >> > Currently the code produces several warnings, eg:
> >> > 
> >> >   arch/powerpc/platforms/4xx/ocm.c:240:38: error: format '%llx'
> >> >   expects argument of type 'long long unsigned int', but argument 3
> >> >   has type 'phys_addr_t {aka unsigned int}'
> >> >  seq_printf(m, "PhysAddr : 0x%llx\n", ocm->phys);
> >> >~~~^ ~
> >> > 
> >> > Fix it by using the special %pa[p] format for printing phys_addr_t.
> >> > Note we need to pass the value by reference for the special specifier
> >> > to work.
> >> > 
> >> > Signed-off-by: Michael Ellerman 
> >> 
> >> Series applied to powerpc fixes.
> >> 
> >> https://git.kernel.org/powerpc/c/52b88fa1e8c7bae03bb691178a9f8b
> >
> > Well, I guess I'm a late. I had issues with the getting 4.20+
> > crosscompiled on debian with make-kpkg. 
> >
> > Nevertheless, I finally got a working kernel and 
> > on the MyBook Live APM82181:
> >
> > ---
> > root@mbl:/sys/kernel/debug# cat ppc4xx_ocm/info 
> > PPC4XX OCM   : 1
> > PhysAddr : 0x00040004[p]
> > MemTotal : 32768 Bytes
> > MemTotal(NC) : 32768 Bytes
> > MemTotal(C)  : 0 Bytes
> >
> > NC.PhysAddr  : 0x00040004[p]
> > NC.VirtAddr  : 0x6bc84b36
> > NC.MemTotal  : 32768 Bytes
> > NC.MemFree   : 32768 Bytes
> >
> > C.PhysAddr   : 0x[p]
> > C.VirtAddr   : 0x  (null)
> > C.MemTotal   : 0 Bytes
> > C.MemFree: 0 Bytes
> 
> Oh right, I'm an idiot :)
> 
> The docs say:
> 
> Physical address types phys_addr_t
> --
> 
>   %pa[p]  0x01234567 or 0x0123456789abcdef
> 
> 
> And if you grep for that there's eg:
> 
>   drivers/ntb/test/ntb_tool.c: "Window Size
> \t%pa[p]\n",
> 
> So I just literally copied that.
> 
> But it's trying to indicate that the p is optional.
> 
> This should fix it, I won't merge it until you've tested it this time :)
> 
> cheers
> 
> 
> diff --git a/arch/powerpc/platforms/4xx/ocm.c 
> b/arch/powerpc/platforms/4xx/ocm.c
> index a1aaa1569d7c..f0e488d97567 100644
> --- a/arch/powerpc/platforms/4xx/ocm.c
> +++ b/arch/powerpc/platforms/4xx/ocm.c
> @@ -237,12 +237,12 @@ static int ocm_debugfs_show(struct seq_file *m, void *v)
>   continue;
>  
>   seq_printf(m, "PPC4XX OCM   : %d\n", ocm->index);
> - seq_printf(m, "PhysAddr : %pa[p]\n", &(ocm->phys));
> + seq_printf(m, "PhysAddr : %pa\n", &(ocm->phys));
>   seq_printf(m, "MemTotal : %d Bytes\n", ocm->memtotal);
>   seq_printf(m, "MemTotal(NC) : %d Bytes\n", ocm->nc.memtotal);
>   seq_printf(m, "MemTotal(C)  : %d Bytes\n\n", ocm->c.memtotal);
>  
> - seq_printf(m, "NC.PhysAddr  : %pa[p]\n", &(ocm->nc.phys));
> + seq_printf(m, "NC.PhysAddr  : %pa\n", &(ocm->nc.phys));
>   seq_printf(m, "NC.VirtAddr  : 0x%p\n", ocm->nc.virt);
>   seq_printf(m, "NC.MemTotal  : %d Bytes\n", ocm->nc.memtotal);
>   seq_printf(m, "NC.MemFree   : %d Bytes\n", ocm->nc.memfree);
> @@ -252,7 +252,7 @@ static int ocm_debugfs_show(struct seq_file *m, void *v)
>   blk->size, blk->owner);
>   }
>  
> - seq_printf(m, "\nC.PhysAddr   : %pa[p]\n", &(ocm->c.phys));
> + seq_printf(m, "\nC.PhysAddr   : %pa\n", &(ocm->c.phys));
>   seq_printf(m, "C.VirtAddr   : 0x%p\n", ocm->c.virt);
>   seq_printf(m, "C.MemTotal   : %d Bytes\n", ocm->c.memtotal);
>   seq_printf(m, "C.MemFree: %d Bytes\n", ocm->c.memfree);

Ok, with the patch applied it now looks like:

|root@mbl:/sys/kernel/debug/ppc4xx_ocm# cat info 
|PPC4XX OCM   : 1
|PhysAddr : 0x00040004
|MemTotal : 32768 Bytes
|MemTotal(NC) : 32768 Bytes
|MemTotal(C)  : 0 Bytes
|
|NC.PhysAddr  : 0x00040004
|NC.VirtAddr  : 0x54f5bce2
|NC.MemTotal  : 32768 Bytes
|NC.MemFree   : 32768 Bytes
|
|C.PhysAddr   : 0x
|C.VirtAddr   : 0x  (null)
|C.MemTotal   : 0 Bytes
|C.MemFree: 0 Bytes

... just as expected. ;)

Thanks,
Christian




perf trace syscall table generation for powerpc with syscall.tbl

2019-01-08 Thread Arnaldo Carvalho de Melo
Hi Ravi,

I noticed that in:

  commit ab66dcc76d6ab8fae9d69d149ae38c42605e7fc5
  Author: Firoz Khan 
  Date:   Mon Dec 17 16:10:36 2018 +0530

powerpc: generate uapi header and system call table files

powerpc now generates its syscall tables headers from a syscall.tbl just
like x86 and s390, could you please switch to using it, taking the x86
and s390 scripts as a starting point, then test on your systems
everything works?

Thanks,

- Arnaldo


Re: [PATCH v4 10/13] x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2019-01-08 Thread Boris Ostrovsky
On 1/8/19 5:48 AM, Peter Zijlstra wrote:
> On Mon, Jan 07, 2019 at 04:27:27PM +, Andrew Murray wrote:
>> For drivers that do not support context exclusion let's advertise the
>> PERF_PMU_CAP_NOEXCLUDE capability. This ensures that perf will
>> prevent us from handling events where any exclusion flags are set.
>> Let's also remove the now unnecessary check for exclusion flags.
>>
>> Signed-off-by: Andrew Murray 
>> ---
>>  arch/x86/events/amd/ibs.c  | 13 +
>>  arch/x86/events/amd/power.c| 10 ++
>>  arch/x86/events/intel/cstate.c | 12 +++-
>>  arch/x86/events/intel/rapl.c   |  9 ++---
>>  arch/x86/events/intel/uncore_snb.c |  9 ++---
>>  arch/x86/events/msr.c  | 10 ++
>>  6 files changed, 12 insertions(+), 51 deletions(-)
> You (correctly) don't add CAP_NO_EXCLUDE to the main x86 pmu code, but
> then you also don't check if it handles all the various exclude options
> correctly/consistently.
>
> Now; I must admit that that is a bit of a maze, but I think we can at
> least add exclude_idle and exclude_hv fails in there, nothing uses those
> afaict.
>
> On the various exclude options; they are as follows (IIUC):
>
>   - exclude_guest: we're a HV/host-kernel and we don't want the counter
>to run when we run a guest context.
>
>   - exclude_host: we're a HV/host-kernel and we don't want the counter
>   to run when we run in host context.
>
>   - exclude_hv: we're a guest and don't want the counter to run in HV
> context.
>
> Now, KVM always implies exclude_hv afaict (for guests), I'm not sure
> what, if anything Xen does on x86 (IIRC Brendan Gregg once said perf
> works on Xen) -- nor quite sure who to ask, Boris, Jeurgen?

perf does work inside guests.

VPMU is managed by the Xen and it presents to the guest only samples
that are associated with the guest. So from that perspective exclude_hv
doesn't seem to be needed.

There is a VPMU mode that allows profiling whole system (host and
guests) from dom0, and this where exclude_hv might be useful. But this
mode, ahem, needs some work.


-boris


[PATCH] powerpc/ipic: drop unused functions

2019-01-08 Thread Christophe Leroy
ipic_set_highest_priority(), ipic_enable_mcp() and ipic_disable_mcp()
are unused. This patch drops them.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/ipic.h |  3 ---
 arch/powerpc/sysdev/ipic.c  | 35 ---
 2 files changed, 38 deletions(-)

diff --git a/arch/powerpc/include/asm/ipic.h b/arch/powerpc/include/asm/ipic.h
index 3dbd47f2bffe..abad50a745db 100644
--- a/arch/powerpc/include/asm/ipic.h
+++ b/arch/powerpc/include/asm/ipic.h
@@ -69,10 +69,7 @@ enum ipic_mcp_irq {
IPIC_MCP_MU   = 7,
 };
 
-extern void ipic_set_highest_priority(unsigned int irq);
 extern void ipic_set_default_priority(void);
-extern void ipic_enable_mcp(enum ipic_mcp_irq mcp_irq);
-extern void ipic_disable_mcp(enum ipic_mcp_irq mcp_irq);
 extern u32 ipic_get_mcp_status(void);
 extern void ipic_clear_mcp_status(u32 mask);
 
diff --git a/arch/powerpc/sysdev/ipic.c b/arch/powerpc/sysdev/ipic.c
index 8030a0f55e96..fd129c8ecceb 100644
--- a/arch/powerpc/sysdev/ipic.c
+++ b/arch/powerpc/sysdev/ipic.c
@@ -771,21 +771,6 @@ struct ipic * __init ipic_init(struct device_node *node, 
unsigned int flags)
return ipic;
 }
 
-void ipic_set_highest_priority(unsigned int virq)
-{
-   struct ipic *ipic = ipic_from_irq(virq);
-   unsigned int src = virq_to_hw(virq);
-   u32 temp;
-
-   temp = ipic_read(ipic->regs, IPIC_SICFR);
-
-   /* clear and set HPI */
-   temp &= 0x7f00;
-   temp |= (src & 0x7f) << 24;
-
-   ipic_write(ipic->regs, IPIC_SICFR, temp);
-}
-
 void ipic_set_default_priority(void)
 {
ipic_write(primary_ipic->regs, IPIC_SIPRR_A, IPIC_PRIORITY_DEFAULT);
@@ -796,26 +781,6 @@ void ipic_set_default_priority(void)
ipic_write(primary_ipic->regs, IPIC_SMPRR_B, IPIC_PRIORITY_DEFAULT);
 }
 
-void ipic_enable_mcp(enum ipic_mcp_irq mcp_irq)
-{
-   struct ipic *ipic = primary_ipic;
-   u32 temp;
-
-   temp = ipic_read(ipic->regs, IPIC_SERMR);
-   temp |= (1 << (31 - mcp_irq));
-   ipic_write(ipic->regs, IPIC_SERMR, temp);
-}
-
-void ipic_disable_mcp(enum ipic_mcp_irq mcp_irq)
-{
-   struct ipic *ipic = primary_ipic;
-   u32 temp;
-
-   temp = ipic_read(ipic->regs, IPIC_SERMR);
-   temp &= (1 << (31 - mcp_irq));
-   ipic_write(ipic->regs, IPIC_SERMR, temp);
-}
-
 u32 ipic_get_mcp_status(void)
 {
return primary_ipic ? ipic_read(primary_ipic->regs, IPIC_SERSR) : 0;
-- 
2.13.3



Re: [PATCH] powerpc/irq: drop arch_early_irq_init()

2019-01-08 Thread Thomas Gleixner
On Tue, 8 Jan 2019, Christophe Leroy wrote:

> Oops, forgot to actually copy Thomas.
> 
> Le 08/01/2019 à 12:37, Christophe Leroy a écrit :
> > arch_early_irq_init() does nothing different than
> > the weak arch_early_irq_init() in kernel/softirq.c
> > 
> > Fixes: 089fb442f301 ("powerpc: Use ARCH_IRQ_INIT_FLAGS")
> > Cc: Thomas Gleixner 
> > Signed-off-by: Christophe Leroy 

Acked-by: Thomas Gleixner 

Re: [PATCH v4 05/13] arm: perf: conditionally use PERF_PMU_CAP_NO_EXCLUDE

2019-01-08 Thread Peter Zijlstra
On Tue, Jan 08, 2019 at 01:13:57PM +, Andrew Murray wrote:
> On Tue, Jan 08, 2019 at 02:10:31PM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 08, 2019 at 01:07:41PM +, Andrew Murray wrote:
> > 
> > > Yes I found lots of examples like this across the tree whilst doing this
> > > work. However I decided to initially start with simply removing duplicated
> > > code as a result of adding this flag and attempting to preserve existing
> > > functionality. I thought that if I add missing checks then the patchset
> > > will get much bigger and be harder to merge. I would like to do this 
> > > though
> > > as another non-cross-arch series.
> > > 
> > > Can we limit this patch series to the minimal changes required to fully
> > > use PERF_PMU_CAP_NO_EXCLUDE and then attempt to fix these existing 
> > > problems
> > > in subsequent patch sets?
> > 
> > Ok, but it would've been nice to see that mentioned somewhere.
> 
> I'll update the cover leter on any next revision. I'll try to be clearer next
> time with my intentions.

Could you maybe include it in the relevant patches too; like for example
the ARM one where we rely on set_event_filter() to DTRT.

So with the changelogs and subjects fixed I can take these patches and
then you can get on with cleaning up the individual drivers.


Re: [PATCH v12 01/10] powerpc/irq: use memblock functions returning virtual address

2019-01-08 Thread Mike Rapoport
Hi,

On Tue, Jan 08, 2019 at 01:43:09PM +, Christophe Leroy wrote:
> Since only the virtual address of allocated blocks is used,
> lets use functions returning directly virtual address.
> 
> Those functions have the advantage of also zeroing the block.
> 
> Suggested-by: Mike Rapoport 
> Signed-off-by: Christophe Leroy 
> ---
>  @Mike: Part of this is taken from your serie.

It seems you've found some places I've missed :)
Note that my series was merged to mmotm, so you may get a conflict with it.

>  I was not sure how to include you in the commit log, I used
>  Suggested-by: but could use any other property, so if you prefer
>  something different just tell me. Also, if you could review this patch,
>  it would be nice.

Suggested-by sounds reasonable.
 
>  arch/powerpc/kernel/irq.c  |  5 -
>  arch/powerpc/kernel/setup_32.c | 15 +--
>  arch/powerpc/kernel/setup_64.c | 16 ++--
>  3 files changed, 7 insertions(+), 29 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
> index 916ddc4aac44..4a44bc395fbc 100644
> --- a/arch/powerpc/kernel/irq.c
> +++ b/arch/powerpc/kernel/irq.c
> @@ -725,18 +725,15 @@ void exc_lvl_ctx_init(void)
>  #endif
>  #endif
>  
> - memset((void *)critirq_ctx[cpu_nr], 0, THREAD_SIZE);
>   tp = critirq_ctx[cpu_nr];
>   tp->cpu = cpu_nr;
>   tp->preempt_count = 0;
>  
>  #ifdef CONFIG_BOOKE
> - memset((void *)dbgirq_ctx[cpu_nr], 0, THREAD_SIZE);
>   tp = dbgirq_ctx[cpu_nr];
>   tp->cpu = cpu_nr;
>   tp->preempt_count = 0;
>  
> - memset((void *)mcheckirq_ctx[cpu_nr], 0, THREAD_SIZE);
>   tp = mcheckirq_ctx[cpu_nr];
>   tp->cpu = cpu_nr;
>   tp->preempt_count = HARDIRQ_OFFSET;
> @@ -754,12 +751,10 @@ void irq_ctx_init(void)
>   int i;
>  
>   for_each_possible_cpu(i) {
> - memset((void *)softirq_ctx[i], 0, THREAD_SIZE);
>   tp = softirq_ctx[i];
>   tp->cpu = i;
>   klp_init_thread_info(tp);
>  
> - memset((void *)hardirq_ctx[i], 0, THREAD_SIZE);
>   tp = hardirq_ctx[i];
>   tp->cpu = i;
>   klp_init_thread_info(tp);
> diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
> index 947f904688b0..bfe1b46a26dc 100644
> --- a/arch/powerpc/kernel/setup_32.c
> +++ b/arch/powerpc/kernel/setup_32.c
> @@ -203,10 +203,8 @@ void __init irqstack_early_init(void)
>   /* interrupt stacks must be in lowmem, we get that for free on ppc32
>* as the memblock is limited to lowmem by default */
>   for_each_possible_cpu(i) {
> - softirq_ctx[i] = (struct thread_info *)
> - __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
> - hardirq_ctx[i] = (struct thread_info *)
> - __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
> + softirq_ctx[i] = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
> + hardirq_ctx[i] = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
>   }
>  }
>  
> @@ -224,13 +222,10 @@ void __init exc_lvl_early_init(void)
>   hw_cpu = 0;
>  #endif
>  
> - critirq_ctx[hw_cpu] = (struct thread_info *)
> - __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
> + critirq_ctx[hw_cpu] = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
>  #ifdef CONFIG_BOOKE
> - dbgirq_ctx[hw_cpu] = (struct thread_info *)
> - __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
> - mcheckirq_ctx[hw_cpu] = (struct thread_info *)
> - __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
> + dbgirq_ctx[hw_cpu] = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
> + mcheckirq_ctx[hw_cpu] = memblock_alloc(THREAD_SIZE, 
> THREAD_SIZE);
>  #endif
>   }
>  }
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index 236c1151a3a7..943503f1e2c0 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -634,19 +634,10 @@ __init u64 ppc64_bolted_size(void)
>  
>  static void *__init alloc_stack(unsigned long limit, int cpu)
>  {
> - unsigned long pa;
> -
>   BUILD_BUG_ON(STACK_INT_FRAME_SIZE % 16);
>  
> - pa = memblock_alloc_base_nid(THREAD_SIZE, THREAD_SIZE, limit,
> - early_cpu_to_node(cpu), MEMBLOCK_NONE);
> - if (!pa) {
> - pa = memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit);
> - if (!pa)
> - panic("cannot allocate stacks");
> - }
> -
> - return __va(pa);
> + return memblock_alloc_try_nid(THREAD_SIZE, THREAD_SIZE, 
> MEMBLOCK_LOW_LIMIT,
> +   limit, early_cpu_to_node(cpu));

I'd prefer if you would keep the return value check and panic() here, as
I'm working on removing 

Re: [PATCH v4 1/6] powerpc: prefer memblock APIs returning virtual address

2019-01-08 Thread Mike Rapoport
Hi,

On Tue, Jan 08, 2019 at 11:02:24AM +0100, Christophe Leroy wrote:
> 
> Le 31/12/2018 à 10:29, Mike Rapoport a écrit :
> >There are a several places that allocate memory using memblock APIs that
> >return a physical address, convert the returned address to the virtual
> >address and frequently also memset(0) the allocated range.
> >
> >Update these places to use memblock allocators already returning a virtual
> >address. Use memblock functions that clear the allocated memory instead of
> >calling memset(0) where appropriate.
> >
> >The calls to memblock_alloc_base() that were not followed by memset(0) are
> >replaced with memblock_alloc_try_nid_raw(). Since the latter does not
> >panic() when the allocation fails, the appropriate panic() calls are added
> >to the call sites.
> >
> >Signed-off-by: Mike Rapoport 
> >---
> >  arch/powerpc/kernel/paca.c | 16 ++--
> >  arch/powerpc/kernel/setup_64.c | 24 ++--
> >  arch/powerpc/mm/hash_utils_64.c|  6 +++---
> >  arch/powerpc/mm/pgtable-book3e.c   |  8 ++--
> >  arch/powerpc/mm/pgtable-book3s64.c |  5 +
> >  arch/powerpc/mm/pgtable-radix.c| 25 +++--
> >  arch/powerpc/platforms/pasemi/iommu.c  |  5 +++--
> >  arch/powerpc/platforms/pseries/setup.c | 18 ++
> >  arch/powerpc/sysdev/dart_iommu.c   |  7 +--
> >  9 files changed, 51 insertions(+), 63 deletions(-)
> >
> >diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
> >index 913bfca..276d36d4 100644
> >--- a/arch/powerpc/kernel/paca.c
> >+++ b/arch/powerpc/kernel/paca.c
> >@@ -27,7 +27,7 @@
> >  static void *__init alloc_paca_data(unsigned long size, unsigned long 
> > align,
> > unsigned long limit, int cpu)
> >  {
> >-unsigned long pa;
> >+void *ptr;
> > int nid;
> > /*
> >@@ -42,17 +42,15 @@ static void *__init alloc_paca_data(unsigned long size, 
> >unsigned long align,
> > nid = early_cpu_to_node(cpu);
> > }
> >-pa = memblock_alloc_base_nid(size, align, limit, nid, MEMBLOCK_NONE);
> >-if (!pa) {
> >-pa = memblock_alloc_base(size, align, limit);
> >-if (!pa)
> >-panic("cannot allocate paca data");
> >-}
> >+ptr = memblock_alloc_try_nid(size, align, MEMBLOCK_LOW_LIMIT,
> >+ limit, nid);
> >+if (!ptr)
> >+panic("cannot allocate paca data");
> 
> AFAIKS, memblock_alloc_try_nid() panics if memblock_alloc_internal() returns
> NULL, so the above panic is useless, isn't it ?
 
My plan is to make all memblock_alloc() APIs to return NULL rather then
panic and then get rid of _nopanic variants. It's currently WIP and
hopefully I'll have the patches ready next week.

> > if (cpu == boot_cpuid)
> > memblock_set_bottom_up(false);
> >-return __va(pa);
> >+return ptr;
> >  }
> >  #ifdef CONFIG_PPC_PSERIES
> >@@ -118,7 +116,6 @@ static struct slb_shadow * __init new_slb_shadow(int 
> >cpu, unsigned long limit)
> > }
> > s = alloc_paca_data(sizeof(*s), L1_CACHE_BYTES, limit, cpu);
> >-memset(s, 0, sizeof(*s));
> > s->persistent = cpu_to_be32(SLB_NUM_BOLTED);
> > s->buffer_length = cpu_to_be32(sizeof(*s));
> >@@ -222,7 +219,6 @@ void __init allocate_paca(int cpu)
> > paca = alloc_paca_data(sizeof(struct paca_struct), L1_CACHE_BYTES,
> > limit, cpu);
> > paca_ptrs[cpu] = paca;
> >-memset(paca, 0, sizeof(struct paca_struct));
> > initialise_paca(paca, cpu);
> >  #ifdef CONFIG_PPC_PSERIES
> >diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> >index 236c115..3dcd779 100644
> >--- a/arch/powerpc/kernel/setup_64.c
> >+++ b/arch/powerpc/kernel/setup_64.c
> >@@ -634,19 +634,17 @@ __init u64 ppc64_bolted_size(void)
> >  static void *__init alloc_stack(unsigned long limit, int cpu)
> >  {
> >-unsigned long pa;
> >+void *ptr;
> > BUILD_BUG_ON(STACK_INT_FRAME_SIZE % 16);
> >-pa = memblock_alloc_base_nid(THREAD_SIZE, THREAD_SIZE, limit,
> >-early_cpu_to_node(cpu), MEMBLOCK_NONE);
> >-if (!pa) {
> >-pa = memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit);
> >-if (!pa)
> >-panic("cannot allocate stacks");
> >-}
> >+ptr = memblock_alloc_try_nid(THREAD_SIZE, THREAD_SIZE,
> >+ MEMBLOCK_LOW_LIMIT, limit,
> >+ early_cpu_to_node(cpu));
> >+if (!ptr)
> >+panic("cannot allocate stacks");
> 
> Same ?
> 
> Christophe
> 
> >-return __va(pa);
> >+return ptr;
> >  }
> >  void __init irqstack_early_init(void)
> >@@ -739,20 +737,17 @@ void __init emergency_stack_init(void)
> > struct thread_info *ti;
> > ti = alloc_stack(limit, i);
> >-memset(ti, 0, THREAD_SIZE);
> > 

RE: [PATCH 2/3] powerpc/dts/fsl: t4240rdb: use the Cortina PHY driver compatible

2019-01-08 Thread Camelia Alexandra Groza
> -Original Message-
> From: Scott Wood 
> Sent: Saturday, December 22, 2018 06:07
> To: Camelia Alexandra Groza ;
> robh...@kernel.org; mark.rutl...@arm.com; b...@kernel.crashing.org
> Cc: devicet...@vger.kernel.org; linux-ker...@vger.kernel.org;
> pau...@samba.org; linuxppc-dev@lists.ozlabs.org
> Subject: Re: [PATCH 2/3] powerpc/dts/fsl: t4240rdb: use the Cortina PHY
> driver compatible
> 
> On Wed, 2018-07-18 at 14:46 +0300, Camelia Groza wrote:
> > The Cortina PHY requires the use of the dedicated Cortina PHY driver
> > instead of the generic one.
> >
> > Signed-off-by: Camelia Groza 
> > ---
> >  arch/powerpc/boot/dts/fsl/t4240rdb.dts | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/powerpc/boot/dts/fsl/t4240rdb.dts
> > b/arch/powerpc/boot/dts/fsl/t4240rdb.dts
> > index 15eb0a3..a56a705 100644
> > --- a/arch/powerpc/boot/dts/fsl/t4240rdb.dts
> > +++ b/arch/powerpc/boot/dts/fsl/t4240rdb.dts
> > @@ -267,22 +267,22 @@
> >
> > mdio@fd000 {
> > xfiphy1: ethernet-phy@10 {
> > -   compatible = "ethernet-phy-
> ieee802.3-
> > c45";
> > +   compatible = "ethernet-phy-
> > id13e5.1002";
> > reg = <0x10>;
> > };
> >
> > xfiphy2: ethernet-phy@11 {
> > -   compatible = "ethernet-phy-
> ieee802.3-
> > c45";
> > +   compatible = "ethernet-phy-
> > id13e5.1002";
> > reg = <0x11>;
> > };
> >
> > xfiphy3: ethernet-phy@13 {
> > -   compatible = "ethernet-phy-
> ieee802.3-
> > c45";
> > +   compatible = "ethernet-phy-
> > id13e5.1002";
> > reg = <0x13>;
> > };
> >
> > xfiphy4: ethernet-phy@12 {
> > -   compatible = "ethernet-phy-
> ieee802.3-
> > c45";
> > +   compatible = "ethernet-phy-
> > id13e5.1002";
> > reg = <0x12>;
> > };
> > };
> 
> I get crashes on boot when using a dtb with this change:

Hi

Sorry for the late replay. I was on vacation. I'm looking into it.

Regards,
Camelia


[PATCH v12 10/10] powerpc: clean stack pointers naming

2019-01-08 Thread Christophe Leroy
Some stack pointers used to also be thread_info pointers
and were called tp. Now that they are only stack pointers,
rename them sp.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/irq.c  | 17 +++--
 arch/powerpc/kernel/setup_64.c | 11 +++
 2 files changed, 10 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 1aef9316345f..983e53b427c0 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -659,21 +659,21 @@ void __do_irq(struct pt_regs *regs)
 void do_IRQ(struct pt_regs *regs)
 {
struct pt_regs *old_regs = set_irq_regs(regs);
-   void *curtp, *irqtp, *sirqtp;
+   void *cursp, *irqsp, *sirqsp;
 
/* Switch to the irq stack to handle this */
-   curtp = (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1));
-   irqtp = hardirq_ctx[raw_smp_processor_id()];
-   sirqtp = softirq_ctx[raw_smp_processor_id()];
+   cursp = (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1));
+   irqsp = hardirq_ctx[raw_smp_processor_id()];
+   sirqsp = softirq_ctx[raw_smp_processor_id()];
 
/* Already there ? */
-   if (unlikely(curtp == irqtp || curtp == sirqtp)) {
+   if (unlikely(cursp == irqsp || cursp == sirqsp)) {
__do_irq(regs);
set_irq_regs(old_regs);
return;
}
/* Switch stack and call */
-   call_do_irq(regs, irqtp);
+   call_do_irq(regs, irqsp);
 
set_irq_regs(old_regs);
 }
@@ -695,10 +695,7 @@ void *hardirq_ctx[NR_CPUS] __read_mostly;
 
 void do_softirq_own_stack(void)
 {
-   void *irqtp;
-
-   irqtp = softirq_ctx[smp_processor_id()];
-   call_do_softirq(irqtp);
+   call_do_softirq(softirq_ctx[smp_processor_id()]);
 }
 
 irq_hw_number_t virq_to_hw(unsigned int virq)
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index bc55493ff164..debd8ac3834f 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -709,19 +709,14 @@ void __init emergency_stack_init(void)
limit = min(ppc64_bolted_size(), ppc64_rma_size);
 
for_each_possible_cpu(i) {
-   void *ti;
-
-   ti = alloc_stack(limit, i);
-   paca_ptrs[i]->emergency_sp = ti + THREAD_SIZE;
+   paca_ptrs[i]->emergency_sp = alloc_stack(limit, i) + 
THREAD_SIZE;
 
 #ifdef CONFIG_PPC_BOOK3S_64
/* emergency stack for NMI exception handling. */
-   ti = alloc_stack(limit, i);
-   paca_ptrs[i]->nmi_emergency_sp = ti + THREAD_SIZE;
+   paca_ptrs[i]->nmi_emergency_sp = alloc_stack(limit, i) + 
THREAD_SIZE;
 
/* emergency stack for machine check exception handling. */
-   ti = alloc_stack(limit, i);
-   paca_ptrs[i]->mc_emergency_sp = ti + THREAD_SIZE;
+   paca_ptrs[i]->mc_emergency_sp = alloc_stack(limit, i) + 
THREAD_SIZE;
 #endif
}
 }
-- 
2.13.3



[PATCH v12 09/10] powerpc/64: Remove CURRENT_THREAD_INFO

2019-01-08 Thread Christophe Leroy
Now that current_thread_info is located at the beginning of 'current'
task struct, CURRENT_THREAD_INFO macro is not really needed any more.

This patch replaces it by loads of the value at PACACURRENT(r13).

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/exception-64s.h   |  4 ++--
 arch/powerpc/include/asm/thread_info.h |  4 
 arch/powerpc/kernel/entry_64.S | 10 +-
 arch/powerpc/kernel/exceptions-64e.S   |  2 +-
 arch/powerpc/kernel/exceptions-64s.S   |  2 +-
 arch/powerpc/kernel/idle_book3e.S  |  2 +-
 arch/powerpc/kernel/idle_power4.S  |  2 +-
 arch/powerpc/kernel/trace/ftrace_64_mprofile.S |  6 +++---
 8 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 3b4767ed3ec5..dd6a5ae7a769 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -671,7 +671,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 
 #define RUNLATCH_ON\
 BEGIN_FTR_SECTION  \
-   CURRENT_THREAD_INFO(r3, r1);\
+   ld  r3, PACACURRENT(r13);   \
ld  r4,TI_LOCAL_FLAGS(r3);  \
andi.   r0,r4,_TLF_RUNLATCH;\
beqlppc64_runlatch_on_trampoline;   \
@@ -721,7 +721,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL)
 #ifdef CONFIG_PPC_970_NAP
 #define FINISH_NAP \
 BEGIN_FTR_SECTION  \
-   CURRENT_THREAD_INFO(r11, r1);   \
+   ld  r11, PACACURRENT(r13);  \
ld  r9,TI_LOCAL_FLAGS(r11); \
andi.   r10,r9,_TLF_NAPPING;\
bnelpower4_fixup_nap;   \
diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index c959b8d66cac..8e1d0195ac36 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -17,10 +17,6 @@
 
 #define THREAD_SIZE(1 << THREAD_SHIFT)
 
-#ifdef CONFIG_PPC64
-#define CURRENT_THREAD_INFO(dest, sp)  stringify_in_c(ld dest, 
PACACURRENT(r13))
-#endif
-
 #ifndef __ASSEMBLY__
 #include 
 #include 
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 01d0706d873f..83bddacd7a17 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -166,7 +166,7 @@ system_call:/* label this so stack 
traces look sane */
li  r10,IRQS_ENABLED
std r10,SOFTE(r1)
 
-   CURRENT_THREAD_INFO(r11, r1)
+   ld  r11, PACACURRENT(r13)
ld  r10,TI_FLAGS(r11)
andi.   r11,r10,_TIF_SYSCALL_DOTRACE
bne .Lsyscall_dotrace   /* does not return */
@@ -213,7 +213,7 @@ system_call:/* label this so stack 
traces look sane */
ld  r3,RESULT(r1)
 #endif
 
-   CURRENT_THREAD_INFO(r12, r1)
+   ld  r12, PACACURRENT(r13)
 
ld  r8,_MSR(r1)
 #ifdef CONFIG_PPC_BOOK3S
@@ -348,7 +348,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
/* Repopulate r9 and r10 for the syscall path */
addir9,r1,STACK_FRAME_OVERHEAD
-   CURRENT_THREAD_INFO(r10, r1)
+   ld  r10, PACACURRENT(r13)
ld  r10,TI_FLAGS(r10)
 
cmpldi  r0,NR_syscalls
@@ -746,7 +746,7 @@ _GLOBAL(ret_from_except_lite)
mtmsrd  r10,1 /* Update machine state */
 #endif /* CONFIG_PPC_BOOK3E */
 
-   CURRENT_THREAD_INFO(r9, r1)
+   ld  r9, PACACURRENT(r13)
ld  r3,_MSR(r1)
 #ifdef CONFIG_PPC_BOOK3E
ld  r10,PACACURRENT(r13)
@@ -860,7 +860,7 @@ resume_kernel:
 1: bl  preempt_schedule_irq
 
/* Re-test flags and eventually loop */
-   CURRENT_THREAD_INFO(r9, r1)
+   ld  r9, PACACURRENT(r13)
ld  r4,TI_FLAGS(r9)
andi.   r0,r4,_TIF_NEED_RESCHED
bne 1b
diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index 20f14996281d..04ee24789f80 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -493,7 +493,7 @@ exc_##n##_bad_stack:
\
  * interrupts happen before the wait instruction.
  */
 #define CHECK_NAPPING()
\
-   CURRENT_THREAD_INFO(r11, r1);   \
+   ld  r11, PACACURRENT(r13);  \
ld  r10,TI_LOCAL_FLAGS(r11);\
andi.   r9,r10,_TLF_NAPPING;\
beq+1f; \
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 9e253ce27e08..c7c4e2d6f98f 100644

[PATCH v12 08/10] powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU

2019-01-08 Thread Christophe Leroy
Now that thread_info is similar to task_struct, its address is in r2
so CURRENT_THREAD_INFO() macro is useless. This patch removes it.

At the same time, as the 'cpu' field is not anymore in thread_info,
this patch renames it to TASK_CPU.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Makefile  |  2 +-
 arch/powerpc/include/asm/thread_info.h |  2 --
 arch/powerpc/kernel/asm-offsets.c  |  2 +-
 arch/powerpc/kernel/entry_32.S | 43 --
 arch/powerpc/kernel/epapr_hcalls.S |  5 ++--
 arch/powerpc/kernel/head_fsl_booke.S   |  5 ++--
 arch/powerpc/kernel/idle_6xx.S |  8 +++
 arch/powerpc/kernel/idle_e500.S|  8 +++
 arch/powerpc/kernel/misc_32.S  |  3 +--
 arch/powerpc/mm/hash_low_32.S  | 14 ---
 arch/powerpc/sysdev/6xx-suspend.S  |  5 ++--
 11 files changed, 35 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 326e8ba6e314..5a9bf22a7534 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -431,7 +431,7 @@ ifdef CONFIG_SMP
 prepare: task_cpu_prepare
 
 task_cpu_prepare: prepare0
-   $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TI_CPU") 
print $$3;}' include/generated/asm-offsets.h))
+   $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == 
"TASK_CPU") print $$3;}' include/generated/asm-offsets.h))
 endif
 
 # Check toolchain versions:
diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index d91523c2c7d8..c959b8d66cac 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -19,8 +19,6 @@
 
 #ifdef CONFIG_PPC64
 #define CURRENT_THREAD_INFO(dest, sp)  stringify_in_c(ld dest, 
PACACURRENT(r13))
-#else
-#define CURRENT_THREAD_INFO(dest, sp)  stringify_in_c(mr dest, r2)
 #endif
 
 #ifndef __ASSEMBLY__
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 94ac190a0b16..03439785c2ea 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -96,7 +96,7 @@ int main(void)
 #endif /* CONFIG_PPC64 */
OFFSET(TASK_STACK, task_struct, stack);
 #ifdef CONFIG_SMP
-   OFFSET(TI_CPU, task_struct, cpu);
+   OFFSET(TASK_CPU, task_struct, cpu);
 #endif
 
 #ifdef CONFIG_LIVEPATCH
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index b547bd4168d8..52a061f14c7d 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -168,8 +168,7 @@ transfer_to_handler:
tophys(r11,r11)
addir11,r11,global_dbcr0@l
 #ifdef CONFIG_SMP
-   CURRENT_THREAD_INFO(r9, r1)
-   lwz r9,TI_CPU(r9)
+   lwz r9,TASK_CPU(r2)
slwir9,r9,3
add r11,r11,r9
 #endif
@@ -180,8 +179,7 @@ transfer_to_handler:
stw r12,4(r11)
 #endif
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
-   CURRENT_THREAD_INFO(r9, r1)
-   tophys(r9, r9)
+   tophys(r9, r2)
ACCOUNT_CPU_USER_ENTRY(r9, r11, r12)
 #endif
 
@@ -195,8 +193,7 @@ transfer_to_handler:
ble-stack_ovf   /* then the kernel stack overflowed */
 5:
 #if defined(CONFIG_PPC_BOOK3S_32) || defined(CONFIG_E500)
-   CURRENT_THREAD_INFO(r9, r1)
-   tophys(r9,r9)   /* check local flags */
+   tophys(r9,r2)   /* check local flags */
lwz r12,TI_LOCAL_FLAGS(r9)
mtcrf   0x01,r12
bt- 31-TLF_NAPPING,4f
@@ -345,8 +342,7 @@ _GLOBAL(DoSyscall)
mtmsr   r11
 1:
 #endif /* CONFIG_TRACE_IRQFLAGS */
-   CURRENT_THREAD_INFO(r10, r1)
-   lwz r11,TI_FLAGS(r10)
+   lwz r11,TI_FLAGS(r2)
andi.   r11,r11,_TIF_SYSCALL_DOTRACE
bne-syscall_dotrace
 syscall_dotrace_cont:
@@ -379,13 +375,12 @@ ret_from_syscall:
lwz r3,GPR3(r1)
 #endif
mr  r6,r3
-   CURRENT_THREAD_INFO(r12, r1)
/* disable interrupts so current_thread_info()->flags can't change */
LOAD_MSR_KERNEL(r10,MSR_KERNEL) /* doesn't include MSR_EE */
/* Note: We don't bother telling lockdep about it */
SYNC
MTMSRD(r10)
-   lwz r9,TI_FLAGS(r12)
+   lwz r9,TI_FLAGS(r2)
li  r8,-MAX_ERRNO
andi.   
r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
bne-syscall_exit_work
@@ -432,8 +427,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRED_STWCX)
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
andi.   r4,r8,MSR_PR
beq 3f
-   CURRENT_THREAD_INFO(r4, r1)
-   ACCOUNT_CPU_USER_EXIT(r4, r5, r7)
+   ACCOUNT_CPU_USER_EXIT(r2, r5, r7)
 3:
 #endif
lwz r4,_LINK(r1)
@@ -526,7 +520,7 @@ syscall_exit_work:
/* Clear per-syscall TIF flags if any are set.  */
 
li  r11,_TIF_PERSYSCALL_MASK
-   addir12,r12,TI_FLAGS
+   addir12,r2,TI_FLAGS
 3: lwarx   

[PATCH v12 07/10] powerpc: 'current_set' is now a table of task_struct pointers

2019-01-08 Thread Christophe Leroy
The table of pointers 'current_set' has been used for retrieving
the stack and current. They used to be thread_info pointers as
they were pointing to the stack and current was taken from the
'task' field of the thread_info.

Now, the pointers of 'current_set' table are now both pointers
to task_struct and pointers to thread_info.

As they are used to get current, and the stack pointer is
retrieved from current's stack field, this patch changes
their type to task_struct, and renames secondary_ti to
secondary_current.

Reviewed-by: Nicholas Piggin 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/asm-prototypes.h |  4 ++--
 arch/powerpc/kernel/head_32.S |  6 +++---
 arch/powerpc/kernel/head_44x.S|  4 ++--
 arch/powerpc/kernel/head_fsl_booke.S  |  4 ++--
 arch/powerpc/kernel/smp.c | 10 --
 5 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h 
b/arch/powerpc/include/asm/asm-prototypes.h
index 1d911f68a23b..1484df6779ab 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -23,8 +23,8 @@
 #include 
 
 /* SMP */
-extern struct thread_info *current_set[NR_CPUS];
-extern struct thread_info *secondary_ti;
+extern struct task_struct *current_set[NR_CPUS];
+extern struct task_struct *secondary_current;
 void start_secondary(void *unused);
 
 /* kexec */
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 309a45779ad5..146385b1c2da 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -846,9 +846,9 @@ __secondary_start:
 #endif /* CONFIG_PPC_BOOK3S_32 */
 
/* get current's stack and current */
-   lis r1,secondary_ti@ha
-   tophys(r1,r1)
-   lwz r2,secondary_ti@l(r1)
+   lis r2,secondary_current@ha
+   tophys(r2,r2)
+   lwz r2,secondary_current@l(r2)
tophys(r1,r2)
lwz r1,TASK_STACK(r1)
 
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index f94a93b6c2f2..37117ab11584 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -1020,8 +1020,8 @@ _GLOBAL(start_secondary_47x)
/* Now we can get our task struct and real stack pointer */
 
/* Get current's stack and current */
-   lis r1,secondary_ti@ha
-   lwz r2,secondary_ti@l(r1)
+   lis r2,secondary_current@ha
+   lwz r2,secondary_current@l(r2)
lwz r1,TASK_STACK(r2)
 
/* Current stack pointer */
diff --git a/arch/powerpc/kernel/head_fsl_booke.S 
b/arch/powerpc/kernel/head_fsl_booke.S
index 11f38adbe020..4ed2a7c8e89b 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -1091,8 +1091,8 @@ __secondary_start:
bl  call_setup_cpu
 
/* get current's stack and current */
-   lis r1,secondary_ti@ha
-   lwz r2,secondary_ti@l(r1)
+   lis r2,secondary_current@ha
+   lwz r2,secondary_current@l(r2)
lwz r1,TASK_STACK(r2)
 
/* stack */
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index aa4517686f90..a41fa8924004 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -76,7 +76,7 @@
 static DEFINE_PER_CPU(int, cpu_state) = { 0 };
 #endif
 
-struct thread_info *secondary_ti;
+struct task_struct *secondary_current;
 bool has_big_cores;
 
 DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
@@ -664,7 +664,7 @@ void smp_send_stop(void)
 }
 #endif /* CONFIG_NMI_IPI */
 
-struct thread_info *current_set[NR_CPUS];
+struct task_struct *current_set[NR_CPUS];
 
 static void smp_store_cpu_info(int id)
 {
@@ -929,7 +929,7 @@ void smp_prepare_boot_cpu(void)
paca_ptrs[boot_cpuid]->__current = current;
 #endif
set_numa_node(numa_cpu_lookup_table[boot_cpuid]);
-   current_set[boot_cpuid] = task_thread_info(current);
+   current_set[boot_cpuid] = current;
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -1014,15 +1014,13 @@ static bool secondaries_inhibited(void)
 
 static void cpu_idle_thread_init(unsigned int cpu, struct task_struct *idle)
 {
-   struct thread_info *ti = task_thread_info(idle);
-
 #ifdef CONFIG_PPC64
paca_ptrs[cpu]->__current = idle;
paca_ptrs[cpu]->kstack = (unsigned long)task_stack_page(idle) +
 THREAD_SIZE - STACK_FRAME_OVERHEAD;
 #endif
idle->cpu = cpu;
-   secondary_ti = current_set[cpu] = ti;
+   secondary_current = current_set[cpu] = idle;
 }
 
 int __cpu_up(unsigned int cpu, struct task_struct *tidle)
-- 
2.13.3



[PATCH v12 06/10] powerpc: regain entire stack space

2019-01-08 Thread Christophe Leroy
thread_info is not anymore in the stack, so the entire stack
can now be used.

There is also no risk anymore of corrupting task_cpu(p) with a
stack overflow so the patch removes the test.

When doing this, an explicit test for NULL stack pointer is
needed in validate_sp() as it is not anymore implicitely covered
by the sizeof(thread_info) gap.

In the meantime, with the previous patch all pointers to the stacks
are not anymore pointers to thread_info so this patch changes them
to void*

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/irq.h   | 10 +-
 arch/powerpc/include/asm/processor.h |  3 +--
 arch/powerpc/kernel/asm-offsets.c|  1 -
 arch/powerpc/kernel/entry_32.S   | 14 --
 arch/powerpc/kernel/irq.c| 19 +--
 arch/powerpc/kernel/misc_32.S|  6 ++
 arch/powerpc/kernel/process.c| 32 +---
 arch/powerpc/kernel/setup_64.c   |  8 
 8 files changed, 38 insertions(+), 55 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index 28a7ace0a1b9..c91a60cda4fa 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -48,16 +48,16 @@ struct pt_regs;
  * Per-cpu stacks for handling critical, debug and machine check
  * level interrupts.
  */
-extern struct thread_info *critirq_ctx[NR_CPUS];
-extern struct thread_info *dbgirq_ctx[NR_CPUS];
-extern struct thread_info *mcheckirq_ctx[NR_CPUS];
+extern void *critirq_ctx[NR_CPUS];
+extern void *dbgirq_ctx[NR_CPUS];
+extern void *mcheckirq_ctx[NR_CPUS];
 #endif
 
 /*
  * Per-cpu stacks for handling hard and soft interrupts.
  */
-extern struct thread_info *hardirq_ctx[NR_CPUS];
-extern struct thread_info *softirq_ctx[NR_CPUS];
+extern void *hardirq_ctx[NR_CPUS];
+extern void *softirq_ctx[NR_CPUS];
 
 void call_do_softirq(void *sp);
 void call_do_irq(struct pt_regs *regs, void *sp);
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 15acb282a876..8179b64871ed 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -325,8 +325,7 @@ struct thread_struct {
 #define ARCH_MIN_TASKALIGN 16
 
 #define INIT_SP(sizeof(init_stack) + (unsigned long) 
_stack)
-#define INIT_SP_LIMIT \
-   (_ALIGN_UP(sizeof(struct thread_info), 16) + (unsigned long)_stack)
+#define INIT_SP_LIMIT  ((unsigned long)_stack)
 
 #ifdef CONFIG_SPE
 #define SPEFSCR_INIT \
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 1fb52206c106..94ac190a0b16 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -92,7 +92,6 @@ int main(void)
DEFINE(SIGSEGV, SIGSEGV);
DEFINE(NMI_MASK, NMI_MASK);
 #else
-   DEFINE(THREAD_INFO_GAP, _ALIGN_UP(sizeof(struct thread_info), 16));
OFFSET(KSP_LIMIT, thread_struct, ksp_limit);
 #endif /* CONFIG_PPC64 */
OFFSET(TASK_STACK, task_struct, stack);
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 1f5a76283bd4..b547bd4168d8 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -97,14 +97,11 @@ crit_transfer_to_handler:
mfspr   r0,SPRN_SRR1
stw r0,_SRR1(r11)
 
-   /* set the stack limit to the current stack
-* and set the limit to protect the thread_info
-* struct
-*/
+   /* set the stack limit to the current stack */
mfspr   r8,SPRN_SPRG_THREAD
lwz r0,KSP_LIMIT(r8)
stw r0,SAVED_KSP_LIMIT(r11)
-   rlwimi  r0,r1,0,0,(31-THREAD_SHIFT)
+   rlwinm  r0,r1,0,0,(31 - THREAD_SHIFT)
stw r0,KSP_LIMIT(r8)
/* fall through */
 #endif
@@ -121,14 +118,11 @@ crit_transfer_to_handler:
mfspr   r0,SPRN_SRR1
stw r0,crit_srr1@l(0)
 
-   /* set the stack limit to the current stack
-* and set the limit to protect the thread_info
-* struct
-*/
+   /* set the stack limit to the current stack */
mfspr   r8,SPRN_SPRG_THREAD
lwz r0,KSP_LIMIT(r8)
stw r0,saved_ksp_limit@l(0)
-   rlwimi  r0,r1,0,0,(31-THREAD_SHIFT)
+   rlwinm  r0,r1,0,0,(31 - THREAD_SHIFT)
stw r0,KSP_LIMIT(r8)
/* fall through */
 #endif
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index acaeae1f11f5..1aef9316345f 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -618,9 +618,8 @@ static inline void check_stack_overflow(void)
sp = current_stack_pointer() & (THREAD_SIZE-1);
 
/* check for stack overflow: is there less than 2KB free? */
-   if (unlikely(sp < (sizeof(struct thread_info) + 2048))) {
-   pr_err("do_IRQ: stack overflow: %ld\n",
-   sp - sizeof(struct thread_info));
+   if (unlikely(sp < 2048)) {
+   pr_err("do_IRQ: stack overflow: %ld\n", sp);
   

[PATCH v12 05/10] powerpc: Activate CONFIG_THREAD_INFO_IN_TASK

2019-01-08 Thread Christophe Leroy
This patch activates CONFIG_THREAD_INFO_IN_TASK which
moves the thread_info into task_struct.

Moving thread_info into task_struct has the following advantages:
- It protects thread_info from corruption in the case of stack
overflows.
- Its address is harder to determine if stack addresses are
leaked, making a number of attacks more difficult.

This has the following consequences:
- thread_info is now located at the beginning of task_struct.
- The 'cpu' field is now in task_struct, and only exists when
CONFIG_SMP is active.
- thread_info doesn't have anymore the 'task' field.

This patch:
- Removes all recopy of thread_info struct when the stack changes.
- Changes the CURRENT_THREAD_INFO() macro to point to current.
- Selects CONFIG_THREAD_INFO_IN_TASK.
- Modifies raw_smp_processor_id() to get ->cpu from current without
including linux/sched.h to avoid circular inclusion and without
including asm/asm-offsets.h to avoid symbol names duplication
between ASM constants and C constants.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/Makefile  |  7 +++
 arch/powerpc/include/asm/irq.h |  4 --
 arch/powerpc/include/asm/ptrace.h  |  2 +-
 arch/powerpc/include/asm/smp.h | 17 +++-
 arch/powerpc/include/asm/thread_info.h | 17 +---
 arch/powerpc/kernel/asm-offsets.c  |  7 ++-
 arch/powerpc/kernel/entry_32.S |  9 ++--
 arch/powerpc/kernel/exceptions-64e.S   | 11 -
 arch/powerpc/kernel/head_32.S  |  6 +--
 arch/powerpc/kernel/head_44x.S |  4 +-
 arch/powerpc/kernel/head_64.S  |  1 +
 arch/powerpc/kernel/head_booke.h   |  8 +---
 arch/powerpc/kernel/head_fsl_booke.S   |  7 ++-
 arch/powerpc/kernel/irq.c  | 79 +-
 arch/powerpc/kernel/kgdb.c | 28 
 arch/powerpc/kernel/machine_kexec_64.c |  6 +--
 arch/powerpc/kernel/setup_64.c | 21 -
 arch/powerpc/kernel/smp.c  |  2 +-
 arch/powerpc/net/bpf_jit32.h   |  5 +--
 20 files changed, 52 insertions(+), 190 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2890d36eb531..0a26e0075ce5 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -241,6 +241,7 @@ config PPC
select RTC_LIB
select SPARSE_IRQ
select SYSCTL_EXCEPTION_TRACE
+   select THREAD_INFO_IN_TASK
select VIRT_TO_BUS  if !PPC64
#
# Please keep this list sorted alphabetically.
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 488c9edffa58..326e8ba6e314 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -427,6 +427,13 @@ else
 endif
 endif
 
+ifdef CONFIG_SMP
+prepare: task_cpu_prepare
+
+task_cpu_prepare: prepare0
+   $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TI_CPU") 
print $$3;}' include/generated/asm-offsets.h))
+endif
+
 # Check toolchain versions:
 # - gcc-4.6 is the minimum kernel-wide version so nothing required.
 checkbin:
diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index 2efbae8d93be..28a7ace0a1b9 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -51,9 +51,6 @@ struct pt_regs;
 extern struct thread_info *critirq_ctx[NR_CPUS];
 extern struct thread_info *dbgirq_ctx[NR_CPUS];
 extern struct thread_info *mcheckirq_ctx[NR_CPUS];
-extern void exc_lvl_ctx_init(void);
-#else
-#define exc_lvl_ctx_init()
 #endif
 
 /*
@@ -62,7 +59,6 @@ extern void exc_lvl_ctx_init(void);
 extern struct thread_info *hardirq_ctx[NR_CPUS];
 extern struct thread_info *softirq_ctx[NR_CPUS];
 
-extern void irq_ctx_init(void);
 void call_do_softirq(void *sp);
 void call_do_irq(struct pt_regs *regs, void *sp);
 extern void do_IRQ(struct pt_regs *regs);
diff --git a/arch/powerpc/include/asm/ptrace.h 
b/arch/powerpc/include/asm/ptrace.h
index 0b8a735b6d85..64271e562fed 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -157,7 +157,7 @@ extern int ptrace_put_reg(struct task_struct *task, int 
regno,
  unsigned long data);
 
 #define current_pt_regs() \
-   ((struct pt_regs *)((unsigned long)current_thread_info() + THREAD_SIZE) 
- 1)
+   ((struct pt_regs *)((unsigned long)task_stack_page(current) + 
THREAD_SIZE) - 1)
 /*
  * We use the least-significant bit of the trap field to indicate
  * whether we have saved the full set of registers, or only a
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 41695745032c..0de717e16dd6 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -83,7 +83,22 @@ int is_cpu_dead(unsigned int cpu);
 /* 32-bit */
 extern int smp_hw_index[];
 
-#define raw_smp_processor_id() (current_thread_info()->cpu)
+/*
+ * This is particularly ugly: it appears we can't actually get the definition
+ * of 

[PATCH v12 04/10] powerpc: Prepare for moving thread_info into task_struct

2019-01-08 Thread Christophe Leroy
This patch cleans the powerpc kernel before activating
CONFIG_THREAD_INFO_IN_TASK:
- The purpose of the pointer given to call_do_softirq() and
call_do_irq() is to point the new stack ==> change it to void* and
rename it 'sp'
- Don't use CURRENT_THREAD_INFO() to locate the stack.
- Fix a few comments.
- Replace current_thread_info()->task by current
- Rename THREAD_INFO to TASK_STASK: as it is in fact the offset of the
pointer to the stack in task_struct, this pointer will not be impacted
by the move of THREAD_INFO.
- Makes TASK_STACK available to PPC64. PPC64 will need it to get the
stack pointer from current once the thread_info have been moved.
- Modifies klp_init_thread_info() to take task_struct pointer argument.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/irq.h   | 4 ++--
 arch/powerpc/include/asm/livepatch.h | 6 +++---
 arch/powerpc/include/asm/processor.h | 4 ++--
 arch/powerpc/include/asm/reg.h   | 2 +-
 arch/powerpc/kernel/asm-offsets.c| 2 +-
 arch/powerpc/kernel/entry_32.S   | 2 +-
 arch/powerpc/kernel/entry_64.S   | 2 +-
 arch/powerpc/kernel/head_32.S| 4 ++--
 arch/powerpc/kernel/head_40x.S   | 4 ++--
 arch/powerpc/kernel/head_44x.S   | 2 +-
 arch/powerpc/kernel/head_8xx.S   | 2 +-
 arch/powerpc/kernel/head_booke.h | 4 ++--
 arch/powerpc/kernel/head_fsl_booke.S | 4 ++--
 arch/powerpc/kernel/irq.c| 2 +-
 arch/powerpc/kernel/misc_32.S| 4 ++--
 arch/powerpc/kernel/process.c| 8 
 arch/powerpc/kernel/setup-common.c   | 2 +-
 arch/powerpc/kernel/smp.c| 4 +++-
 18 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index ee39ce56b2a2..2efbae8d93be 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -63,8 +63,8 @@ extern struct thread_info *hardirq_ctx[NR_CPUS];
 extern struct thread_info *softirq_ctx[NR_CPUS];
 
 extern void irq_ctx_init(void);
-extern void call_do_softirq(struct thread_info *tp);
-extern void call_do_irq(struct pt_regs *regs, struct thread_info *tp);
+void call_do_softirq(void *sp);
+void call_do_irq(struct pt_regs *regs, void *sp);
 extern void do_IRQ(struct pt_regs *regs);
 extern void __init init_IRQ(void);
 extern void __do_irq(struct pt_regs *regs);
diff --git a/arch/powerpc/include/asm/livepatch.h 
b/arch/powerpc/include/asm/livepatch.h
index 47a03b9b528b..7cb514865a28 100644
--- a/arch/powerpc/include/asm/livepatch.h
+++ b/arch/powerpc/include/asm/livepatch.h
@@ -43,13 +43,13 @@ static inline unsigned long 
klp_get_ftrace_location(unsigned long faddr)
return ftrace_location_range(faddr, faddr + 16);
 }
 
-static inline void klp_init_thread_info(struct thread_info *ti)
+static inline void klp_init_thread_info(struct task_struct *p)
 {
/* + 1 to account for STACK_END_MAGIC */
-   ti->livepatch_sp = (unsigned long *)(ti + 1) + 1;
+   task_thread_info(p)->livepatch_sp = end_of_stack(p) + 1;
 }
 #else
-static void klp_init_thread_info(struct thread_info *ti) { }
+static inline void klp_init_thread_info(struct task_struct *p) { }
 #endif /* CONFIG_LIVEPATCH */
 
 #endif /* _ASM_POWERPC_LIVEPATCH_H */
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 692f7383d461..15acb282a876 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -40,7 +40,7 @@
 
 #ifndef __ASSEMBLY__
 #include 
-#include 
+#include 
 #include 
 #include 
 
@@ -326,7 +326,7 @@ struct thread_struct {
 
 #define INIT_SP(sizeof(init_stack) + (unsigned long) 
_stack)
 #define INIT_SP_LIMIT \
-   (_ALIGN_UP(sizeof(init_thread_info), 16) + (unsigned long) _stack)
+   (_ALIGN_UP(sizeof(struct thread_info), 16) + (unsigned long)_stack)
 
 #ifdef CONFIG_SPE
 #define SPEFSCR_INIT \
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 1c98ef1f2d5b..581e61db2dcf 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1062,7 +1062,7 @@
  * - SPRG9 debug exception scratch
  *
  * All 32-bit:
- * - SPRG3 current thread_info pointer
+ * - SPRG3 current thread_struct physical addr pointer
  *(virtual on BookE, physical on others)
  *
  * 32-bit classic:
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 9ffc72ded73a..b2b52e002a76 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -90,10 +90,10 @@ int main(void)
DEFINE(SIGSEGV, SIGSEGV);
DEFINE(NMI_MASK, NMI_MASK);
 #else
-   OFFSET(THREAD_INFO, task_struct, stack);
DEFINE(THREAD_INFO_GAP, _ALIGN_UP(sizeof(struct thread_info), 16));
OFFSET(KSP_LIMIT, thread_struct, ksp_limit);
 #endif /* CONFIG_PPC64 */
+   OFFSET(TASK_STACK, task_struct, stack);
 
 #ifdef CONFIG_LIVEPATCH
OFFSET(TI_livepatch_sp, 

[PATCH v12 03/10] powerpc: Only use task_struct 'cpu' field on SMP

2019-01-08 Thread Christophe Leroy
When moving to CONFIG_THREAD_INFO_IN_TASK, the thread_info 'cpu' field
gets moved into task_struct and only defined when CONFIG_SMP is set.

This patch ensures that TI_CPU is only used when CONFIG_SMP is set and
that task_struct 'cpu' field is not used directly out of SMP code.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/kernel/head_fsl_booke.S | 2 ++
 arch/powerpc/kernel/misc_32.S| 4 
 arch/powerpc/xmon/xmon.c | 2 +-
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/head_fsl_booke.S 
b/arch/powerpc/kernel/head_fsl_booke.S
index 2386ce2a9c6e..2c21e8642a00 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -243,8 +243,10 @@ set_ivor:
li  r0,0
stwur0,THREAD_SIZE-STACK_FRAME_OVERHEAD(r1)
 
+#ifdef CONFIG_SMP
CURRENT_THREAD_INFO(r22, r1)
stw r24, TI_CPU(r22)
+#endif
 
bl  early_init
 
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index 57d2ffb2d45c..02b8cdd73792 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -183,10 +183,14 @@ _GLOBAL(low_choose_750fx_pll)
or  r4,r4,r5
mtspr   SPRN_HID1,r4
 
+#ifdef CONFIG_SMP
/* Store new HID1 image */
CURRENT_THREAD_INFO(r6, r1)
lwz r6,TI_CPU(r6)
slwir6,r6,2
+#else
+   li  r6, 0
+#endif
addis   r6,r6,nap_save_hid1@ha
stw r4,nap_save_hid1@l(r6)
 
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 757b8499aba2..a0f44f992360 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2997,7 +2997,7 @@ static void show_task(struct task_struct *tsk)
printf("%px %016lx %6d %6d %c %2d %s\n", tsk,
tsk->thread.ksp,
tsk->pid, rcu_dereference(tsk->parent)->pid,
-   state, task_thread_info(tsk)->cpu,
+   state, task_cpu(tsk),
tsk->comm);
 }
 
-- 
2.13.3



[PATCH v12 02/10] book3s/64: avoid circular header inclusion in mmu-hash.h

2019-01-08 Thread Christophe Leroy
When activating CONFIG_THREAD_INFO_IN_TASK, linux/sched.h
includes asm/current.h. This generates a circular dependency.
To avoid that, asm/processor.h shall not be included in mmu-hash.h

In order to do that, this patch moves into a new header called
asm/task_size_user64.h the information from asm/processor.h required
by mmu-hash.h

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  2 +-
 arch/powerpc/include/asm/processor.h  | 34 +-
 arch/powerpc/include/asm/task_size_user64.h   | 42 +++
 arch/powerpc/kvm/book3s_hv_hmi.c  |  1 +
 4 files changed, 45 insertions(+), 34 deletions(-)
 create mode 100644 arch/powerpc/include/asm/task_size_user64.h

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 12e522807f9f..b2aba048301e 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -23,7 +23,7 @@
  */
 #include 
 #include 
-#include 
+#include 
 #include 
 
 /*
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index ee58526cb6c2..692f7383d461 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -95,40 +95,8 @@ void release_thread(struct task_struct *);
 #endif
 
 #ifdef CONFIG_PPC64
-/*
- * 64-bit user address space can have multiple limits
- * For now supported values are:
- */
-#define TASK_SIZE_64TB  (0x4000UL)
-#define TASK_SIZE_128TB (0x8000UL)
-#define TASK_SIZE_512TB (0x0002UL)
-#define TASK_SIZE_1PB   (0x0004UL)
-#define TASK_SIZE_2PB   (0x0008UL)
-/*
- * With 52 bits in the address we can support
- * upto 4PB of range.
- */
-#define TASK_SIZE_4PB   (0x0010UL)
 
-/*
- * For now 512TB is only supported with book3s and 64K linux page size.
- */
-#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_64K_PAGES)
-/*
- * Max value currently used:
- */
-#define TASK_SIZE_USER64   TASK_SIZE_4PB
-#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_128TB
-#define TASK_CONTEXT_SIZE  TASK_SIZE_512TB
-#else
-#define TASK_SIZE_USER64   TASK_SIZE_64TB
-#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_64TB
-/*
- * We don't need to allocate extended context ids for 4K page size, because
- * we limit the max effective address on this config to 64TB.
- */
-#define TASK_CONTEXT_SIZE  TASK_SIZE_64TB
-#endif
+#include 
 
 /*
  * 32-bit user address space is 4GB - 1 page
diff --git a/arch/powerpc/include/asm/task_size_user64.h 
b/arch/powerpc/include/asm/task_size_user64.h
new file mode 100644
index ..a4043075864b
--- /dev/null
+++ b/arch/powerpc/include/asm/task_size_user64.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_TASK_SIZE_USER64_H
+#define _ASM_POWERPC_TASK_SIZE_USER64_H
+
+#ifdef CONFIG_PPC64
+/*
+ * 64-bit user address space can have multiple limits
+ * For now supported values are:
+ */
+#define TASK_SIZE_64TB  (0x4000UL)
+#define TASK_SIZE_128TB (0x8000UL)
+#define TASK_SIZE_512TB (0x0002UL)
+#define TASK_SIZE_1PB   (0x0004UL)
+#define TASK_SIZE_2PB   (0x0008UL)
+/*
+ * With 52 bits in the address we can support
+ * upto 4PB of range.
+ */
+#define TASK_SIZE_4PB   (0x0010UL)
+
+/*
+ * For now 512TB is only supported with book3s and 64K linux page size.
+ */
+#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_64K_PAGES)
+/*
+ * Max value currently used:
+ */
+#define TASK_SIZE_USER64   TASK_SIZE_4PB
+#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_128TB
+#define TASK_CONTEXT_SIZE  TASK_SIZE_512TB
+#else
+#define TASK_SIZE_USER64   TASK_SIZE_64TB
+#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_64TB
+/*
+ * We don't need to allocate extended context ids for 4K page size, because
+ * we limit the max effective address on this config to 64TB.
+ */
+#define TASK_CONTEXT_SIZE  TASK_SIZE_64TB
+#endif
+
+#endif /* CONFIG_PPC64 */
+#endif /* _ASM_POWERPC_TASK_SIZE_USER64_H */
diff --git a/arch/powerpc/kvm/book3s_hv_hmi.c b/arch/powerpc/kvm/book3s_hv_hmi.c
index e3f738eb1cac..64b5011475c7 100644
--- a/arch/powerpc/kvm/book3s_hv_hmi.c
+++ b/arch/powerpc/kvm/book3s_hv_hmi.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 void wait_for_subcore_guest_exit(void)
 {
-- 
2.13.3



[PATCH v12 00/10] powerpc: Switch to CONFIG_THREAD_INFO_IN_TASK

2019-01-08 Thread Christophe Leroy
The purpose of this serie is to activate CONFIG_THREAD_INFO_IN_TASK which
moves the thread_info into task_struct.

Moving thread_info into task_struct has the following advantages:
- It protects thread_info from corruption in the case of stack
overflows.
- Its address is harder to determine if stack addresses are
leaked, making a number of attacks more difficult.

Changes since v11:
 - Rebased on 81775f5563fa ("Automatic merge of branches 'master', 'next' and 
'fixes' into merge")
 - Added a first patch to change memblock allocs to functions returning virtual 
addrs. This removes
   the memset() which were the only remaining stuff in irq_ctx_init() and 
exc_lvl_ctx_init() at the end.
 - dropping irq_ctx_init() and exc_lvl_ctx_init() in patch 5 (powerpc: Activate 
CONFIG_THREAD_INFO_IN_TASK)
 - A few cosmetic changes in commit log and code.

Changes since v10:
 - Rebased on 21622a0d2023 ("Automatic merge of branches 'master', 'next' and 
'fixes' into merge")
  ==> Fixed conflict in setup_32.S

Changes since v9:
 - Rebased on 183cbf93be88 ("Automatic merge of branches 'master', 'next' and 
'fixes' into merge")
  ==> Fixed conflict on xmon

Changes since v8:
 - Rebased on e589b79e40d9 ("Automatic merge of branches 'master', 'next' and 
'fixes' into merge")
  ==> Main impact was conflicts due to commit 9a8dd708d547 ("memblock: rename 
memblock_alloc{_nid,_try_nid} to memblock_phys_alloc*")

Changes since v7:
 - Rebased on fb6c6ce7907d ("Automatic merge of branches 'master', 'next' and 
'fixes' into merge")

Changes since v6:
 - Fixed validate_sp() to exclude NULL sp in 'regain entire stack space' patch 
(early crash with CONFIG_KMEMLEAK)

Changes since v5:
 - Fixed livepatch_sp setup by using end_of_stack() instead of hardcoding
 - Fixed PPC_BPF_LOAD_CPU() macro

Changes since v4:
 - Fixed a build failure on 32bits SMP when include/generated/asm-offsets.h is 
not
 already existing, was due to spaces instead of a tab in the Makefile

Changes since RFC v3: (based on Nick's review)
 - Renamed task_size.h to task_size_user64.h to better relate to what it 
contains.
 - Handling of the isolation of thread_info cpu field inside CONFIG_SMP #ifdefs 
moved to a separate patch.
 - Removed CURRENT_THREAD_INFO macro completely.
 - Added a guard in asm/smp.h to avoid build failure before _TASK_CPU is 
defined.
 - Added a patch at the end to rename 'tp' pointers to 'sp' pointers
 - Renamed 'tp' into 'sp' pointers in preparation patch when relevant
 - Fixed a few commit logs
 - Fixed checkpatch report.

Changes since RFC v2:
 - Removed the modification of names in asm-offsets
 - Created a rule in arch/powerpc/Makefile to append the offset of current->cpu 
in CFLAGS
 - Modified asm/smp.h to use the offset set in CFLAGS
 - Squashed the renaming of THREAD_INFO to TASK_STACK in the preparation patch
 - Moved the modification of current_pt_regs in the patch activating 
CONFIG_THREAD_INFO_IN_TASK

Changes since RFC v1:
 - Removed the first patch which was modifying header inclusion order in timer
 - Modified some names in asm-offsets to avoid conflicts when including 
asm-offsets in C files
 - Modified asm/smp.h to avoid having to include linux/sched.h (using 
asm-offsets instead)
 - Moved some changes from the activation patch to the preparation patch.

Christophe Leroy (10):
  powerpc/irq: use memblock functions returning virtual address
  book3s/64: avoid circular header inclusion in mmu-hash.h
  powerpc: Only use task_struct 'cpu' field on SMP
  powerpc: Prepare for moving thread_info into task_struct
  powerpc: Activate CONFIG_THREAD_INFO_IN_TASK
  powerpc: regain entire stack space
  powerpc: 'current_set' is now a table of task_struct pointers
  powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU
  powerpc/64: Remove CURRENT_THREAD_INFO
  powerpc: clean stack pointers naming

 arch/powerpc/Kconfig   |   1 +
 arch/powerpc/Makefile  |   7 ++
 arch/powerpc/include/asm/asm-prototypes.h  |   4 +-
 arch/powerpc/include/asm/book3s/64/mmu-hash.h  |   2 +-
 arch/powerpc/include/asm/exception-64s.h   |   4 +-
 arch/powerpc/include/asm/irq.h |  18 ++--
 arch/powerpc/include/asm/livepatch.h   |   6 +-
 arch/powerpc/include/asm/processor.h   |  39 +
 arch/powerpc/include/asm/ptrace.h  |   2 +-
 arch/powerpc/include/asm/reg.h |   2 +-
 arch/powerpc/include/asm/smp.h |  17 +++-
 arch/powerpc/include/asm/task_size_user64.h|  42 +
 arch/powerpc/include/asm/thread_info.h |  19 -
 arch/powerpc/kernel/asm-offsets.c  |  10 ++-
 arch/powerpc/kernel/entry_32.S |  66 +-
 arch/powerpc/kernel/entry_64.S |  12 +--
 arch/powerpc/kernel/epapr_hcalls.S |   5 +-
 arch/powerpc/kernel/exceptions-64e.S   |  13 +--
 arch/powerpc/kernel/exceptions-64s.S   |   2 +-
 arch/powerpc/kernel/head_32.S  

[PATCH v12 01/10] powerpc/irq: use memblock functions returning virtual address

2019-01-08 Thread Christophe Leroy
Since only the virtual address of allocated blocks is used,
lets use functions returning directly virtual address.

Those functions have the advantage of also zeroing the block.

Suggested-by: Mike Rapoport 
Signed-off-by: Christophe Leroy 
---
 @Mike: Part of this is taken from your serie. I was not sure how
 to include you in the commit log, I used Suggested-by: but could use
 any other property, so if you prefer something different just tell
 me. Also, if you could review this patch, it would be nice.

 arch/powerpc/kernel/irq.c  |  5 -
 arch/powerpc/kernel/setup_32.c | 15 +--
 arch/powerpc/kernel/setup_64.c | 16 ++--
 3 files changed, 7 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 916ddc4aac44..4a44bc395fbc 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -725,18 +725,15 @@ void exc_lvl_ctx_init(void)
 #endif
 #endif
 
-   memset((void *)critirq_ctx[cpu_nr], 0, THREAD_SIZE);
tp = critirq_ctx[cpu_nr];
tp->cpu = cpu_nr;
tp->preempt_count = 0;
 
 #ifdef CONFIG_BOOKE
-   memset((void *)dbgirq_ctx[cpu_nr], 0, THREAD_SIZE);
tp = dbgirq_ctx[cpu_nr];
tp->cpu = cpu_nr;
tp->preempt_count = 0;
 
-   memset((void *)mcheckirq_ctx[cpu_nr], 0, THREAD_SIZE);
tp = mcheckirq_ctx[cpu_nr];
tp->cpu = cpu_nr;
tp->preempt_count = HARDIRQ_OFFSET;
@@ -754,12 +751,10 @@ void irq_ctx_init(void)
int i;
 
for_each_possible_cpu(i) {
-   memset((void *)softirq_ctx[i], 0, THREAD_SIZE);
tp = softirq_ctx[i];
tp->cpu = i;
klp_init_thread_info(tp);
 
-   memset((void *)hardirq_ctx[i], 0, THREAD_SIZE);
tp = hardirq_ctx[i];
tp->cpu = i;
klp_init_thread_info(tp);
diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index 947f904688b0..bfe1b46a26dc 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -203,10 +203,8 @@ void __init irqstack_early_init(void)
/* interrupt stacks must be in lowmem, we get that for free on ppc32
 * as the memblock is limited to lowmem by default */
for_each_possible_cpu(i) {
-   softirq_ctx[i] = (struct thread_info *)
-   __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
-   hardirq_ctx[i] = (struct thread_info *)
-   __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
+   softirq_ctx[i] = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
+   hardirq_ctx[i] = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
}
 }
 
@@ -224,13 +222,10 @@ void __init exc_lvl_early_init(void)
hw_cpu = 0;
 #endif
 
-   critirq_ctx[hw_cpu] = (struct thread_info *)
-   __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
+   critirq_ctx[hw_cpu] = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
 #ifdef CONFIG_BOOKE
-   dbgirq_ctx[hw_cpu] = (struct thread_info *)
-   __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
-   mcheckirq_ctx[hw_cpu] = (struct thread_info *)
-   __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
+   dbgirq_ctx[hw_cpu] = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
+   mcheckirq_ctx[hw_cpu] = memblock_alloc(THREAD_SIZE, 
THREAD_SIZE);
 #endif
}
 }
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 236c1151a3a7..943503f1e2c0 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -634,19 +634,10 @@ __init u64 ppc64_bolted_size(void)
 
 static void *__init alloc_stack(unsigned long limit, int cpu)
 {
-   unsigned long pa;
-
BUILD_BUG_ON(STACK_INT_FRAME_SIZE % 16);
 
-   pa = memblock_alloc_base_nid(THREAD_SIZE, THREAD_SIZE, limit,
-   early_cpu_to_node(cpu), MEMBLOCK_NONE);
-   if (!pa) {
-   pa = memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit);
-   if (!pa)
-   panic("cannot allocate stacks");
-   }
-
-   return __va(pa);
+   return memblock_alloc_try_nid(THREAD_SIZE, THREAD_SIZE, 
MEMBLOCK_LOW_LIMIT,
+ limit, early_cpu_to_node(cpu));
 }
 
 void __init irqstack_early_init(void)
@@ -739,20 +730,17 @@ void __init emergency_stack_init(void)
struct thread_info *ti;
 
ti = alloc_stack(limit, i);
-   memset(ti, 0, THREAD_SIZE);
emerg_stack_init_thread_info(ti, i);
paca_ptrs[i]->emergency_sp = (void *)ti + THREAD_SIZE;
 
 #ifdef CONFIG_PPC_BOOK3S_64
/* emergency stack for NMI exception 

Re: [PATCH v4 05/13] arm: perf: conditionally use PERF_PMU_CAP_NO_EXCLUDE

2019-01-08 Thread Andrew Murray
On Tue, Jan 08, 2019 at 02:10:31PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 08, 2019 at 01:07:41PM +, Andrew Murray wrote:
> 
> > Yes I found lots of examples like this across the tree whilst doing this
> > work. However I decided to initially start with simply removing duplicated
> > code as a result of adding this flag and attempting to preserve existing
> > functionality. I thought that if I add missing checks then the patchset
> > will get much bigger and be harder to merge. I would like to do this though
> > as another non-cross-arch series.
> > 
> > Can we limit this patch series to the minimal changes required to fully
> > use PERF_PMU_CAP_NO_EXCLUDE and then attempt to fix these existing problems
> > in subsequent patch sets?
> 
> Ok, but it would've been nice to see that mentioned somewhere.

I'll update the cover leter on any next revision. I'll try to be clearer next
time with my intentions.

Andrew Murray


Re: [PATCH v4 10/13] x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2019-01-08 Thread Andrew Murray
On Tue, Jan 08, 2019 at 11:48:41AM +0100, Peter Zijlstra wrote:
> On Mon, Jan 07, 2019 at 04:27:27PM +, Andrew Murray wrote:
> > For drivers that do not support context exclusion let's advertise the
> > PERF_PMU_CAP_NOEXCLUDE capability. This ensures that perf will
> > prevent us from handling events where any exclusion flags are set.
> > Let's also remove the now unnecessary check for exclusion flags.
> > 
> > Signed-off-by: Andrew Murray 
> > ---
> >  arch/x86/events/amd/ibs.c  | 13 +
> >  arch/x86/events/amd/power.c| 10 ++
> >  arch/x86/events/intel/cstate.c | 12 +++-
> >  arch/x86/events/intel/rapl.c   |  9 ++---
> >  arch/x86/events/intel/uncore_snb.c |  9 ++---
> >  arch/x86/events/msr.c  | 10 ++
> >  6 files changed, 12 insertions(+), 51 deletions(-)
> 
> You (correctly) don't add CAP_NO_EXCLUDE to the main x86 pmu code, but
> then you also don't check if it handles all the various exclude options
> correctly/consistently.
> 
> Now; I must admit that that is a bit of a maze, but I think we can at
> least add exclude_idle and exclude_hv fails in there, nothing uses those
> afaict.

Yes it took me some time to make sense of it.

As per my comments in the other patch, I think you're suggesting that I
add additional checks to x86. I think they are needed but I'd prefer to
make functional changes in a separate series, I'm happy to do this.

> 
> On the various exclude options; they are as follows (IIUC):
> 
>   - exclude_guest: we're a HV/host-kernel and we don't want the counter
>to run when we run a guest context.
> 
>   - exclude_host: we're a HV/host-kernel and we don't want the counter
>   to run when we run in host context.
> 
>   - exclude_hv: we're a guest and don't want the counter to run in HV
> context.
> 
> Now, KVM always implies exclude_hv afaict (for guests),

It certaintly does for ARM.

> I'm not sure
> what, if anything Xen does on x86 (IIRC Brendan Gregg once said perf
> works on Xen) -- nor quite sure who to ask, Boris, Jeurgen?

Thanks,

Andrew Murray
> 


Re: [PATCH v4 05/13] arm: perf: conditionally use PERF_PMU_CAP_NO_EXCLUDE

2019-01-08 Thread Peter Zijlstra
On Tue, Jan 08, 2019 at 01:07:41PM +, Andrew Murray wrote:

> Yes I found lots of examples like this across the tree whilst doing this
> work. However I decided to initially start with simply removing duplicated
> code as a result of adding this flag and attempting to preserve existing
> functionality. I thought that if I add missing checks then the patchset
> will get much bigger and be harder to merge. I would like to do this though
> as another non-cross-arch series.
> 
> Can we limit this patch series to the minimal changes required to fully
> use PERF_PMU_CAP_NO_EXCLUDE and then attempt to fix these existing problems
> in subsequent patch sets?

Ok, but it would've been nice to see that mentioned somewhere.


Re: [PATCH v4 11/13] x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2019-01-08 Thread Andrew Murray
On Tue, Jan 08, 2019 at 11:49:40AM +0100, Peter Zijlstra wrote:
> On Mon, Jan 07, 2019 at 04:27:28PM +, Andrew Murray wrote:
> 
> This patch has the exact same subject as the previous one.. that seems
> sub-optimal.

Ah yes, I'll update that in subsquent revisions. (The reason for two patches
was to separate functional vs non-functional changes).

Andrew Murray


Re: [PATCH v4 05/13] arm: perf: conditionally use PERF_PMU_CAP_NO_EXCLUDE

2019-01-08 Thread Andrew Murray
On Tue, Jan 08, 2019 at 11:28:02AM +0100, Peter Zijlstra wrote:
> On Mon, Jan 07, 2019 at 04:27:22PM +, Andrew Murray wrote:
> > @@ -393,9 +386,8 @@ __hw_perf_event_init(struct perf_event *event)
> > /*
> >  * Check whether we need to exclude the counter from certain modes.
> >  */
> > +   if (armpmu->set_event_filter &&
> > +   armpmu->set_event_filter(hwc, >attr)) {
> > pr_debug("ARM performance counters do not support "
> >  "mode exclusion\n");
> > return -EOPNOTSUPP;
> 
> This then requires all set_event_filter() implementations to check all
> the various exclude options;

Yes but this isn't a new requirement, this hunk uses the absence of
set_event_filter to blanket indicate that no exclusion flags are supported.


> also, set_event_filter() failing then
> returns with -EOPNOTSUPP instead of the -EINVAL the CAP_NO_EXCLUDE
> generates, which is again inconsitent.

Yes, it's not ideal - but a step in the right direction. I wanted to limit
user visible changes as much as possible, where I've identified them I've
noted it in the commit log.

> 
> If I look at (the very first git-grep found me)
> armv7pmu_set_event_filter(), then I find it returning -EPERM (again
> inconsistent but irrelevant because the actual value is not preserved)
> for exclude_idle.
> 
> But it doesn't seem to check exclude_host at all for example.

Yes I found lots of examples like this across the tree whilst doing this
work. However I decided to initially start with simply removing duplicated
code as a result of adding this flag and attempting to preserve existing
functionality. I thought that if I add missing checks then the patchset
will get much bigger and be harder to merge. I would like to do this though
as another non-cross-arch series.

Can we limit this patch series to the minimal changes required to fully
use PERF_PMU_CAP_NO_EXCLUDE and then attempt to fix these existing problems
in subsequent patch sets?

Thanks,

Andrew Murray

> 
> > @@ -867,6 +859,9 @@ int armpmu_register(struct arm_pmu *pmu)
> > if (ret)
> > return ret;
> >  
> > +   if (!pmu->set_event_filter)
> > +   pmu->pmu.capabilities |= PERF_PMU_CAP_NO_EXCLUDE;
> > +
> > ret = perf_pmu_register(>pmu, pmu->name, -1);
> > if (ret)
> > goto out_destroy;
> > -- 
> > 2.7.4
> > 


[PATCH v2 4/4] ASoC: add imx-audmix DT binding documentation

2019-01-08 Thread Viorel Suman
Add the DT binding documentation for Audio Mixer
machine driver.

Signed-off-by: Viorel Suman 
---
 .../devicetree/bindings/sound/imx-audmix.txt   | 24 ++
 1 file changed, 24 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/sound/imx-audmix.txt

diff --git a/Documentation/devicetree/bindings/sound/imx-audmix.txt 
b/Documentation/devicetree/bindings/sound/imx-audmix.txt
new file mode 100644
index 000..6ac1230
--- /dev/null
+++ b/Documentation/devicetree/bindings/sound/imx-audmix.txt
@@ -0,0 +1,24 @@
+NXP Audio Mixer (AUDMIX) machine driver.
+
+Required properties:
+===
+  - compatible : Compatible list, contains "fsl,imx-audmix"
+
+  - model  : Short audio card description.
+
+  - dais   : Must contain a list of phandles to AUDMIX connected
+ DAIs. The current implementation requires two phandles
+ to SAI interfaces to be provided, the first SAI in the
+ list being used to route the AUDMIX output.
+
+  - audmix-controller  : Must contain the phandle to the AUDMIX device node.
+
+Machine driver configuration example:
+
+  sound-audmix-sai {
+compatible = "fsl,imx-audmix";
+model = "audmix-sai";
+dais = <>, <>;
+audmix-controller = <>;
+  };
+
-- 
2.7.4



[PATCH v2 3/4] ASoC: fsl: Add Audio Mixer machine driver

2019-01-08 Thread Viorel Suman
This patch implements Audio Mixer machine driver for NXP iMX8 SOCs.
It connects together Audio Mixer and related SAI instances.

Signed-off-by: Viorel Suman 
---
 sound/soc/fsl/Kconfig  |   9 ++
 sound/soc/fsl/Makefile |   2 +
 sound/soc/fsl/imx-audmix.c | 333 +
 3 files changed, 344 insertions(+)
 create mode 100644 sound/soc/fsl/imx-audmix.c

diff --git a/sound/soc/fsl/Kconfig b/sound/soc/fsl/Kconfig
index 0af2e056..d87c842 100644
--- a/sound/soc/fsl/Kconfig
+++ b/sound/soc/fsl/Kconfig
@@ -303,6 +303,15 @@ config SND_SOC_FSL_ASOC_CARD
 CS4271, CS4272 and SGTL5000.
 Say Y if you want to add support for Freescale Generic ASoC Sound Card.
 
+config SND_SOC_IMX_AUDMIX
+   tristate "SoC Audio support for i.MX boards with AUDMIX"
+   select SND_SOC_FSL_AUDMIX
+   select SND_SOC_FSL_SAI
+   help
+ SoC Audio support for i.MX boards with Audio Mixer
+ Say Y if you want to add support for SoC audio on an i.MX board with
+ an Audio Mixer.
+
 endif # SND_IMX_SOC
 
 endmenu
diff --git a/sound/soc/fsl/Makefile b/sound/soc/fsl/Makefile
index 4172d5a..c0dd044 100644
--- a/sound/soc/fsl/Makefile
+++ b/sound/soc/fsl/Makefile
@@ -62,6 +62,7 @@ snd-soc-imx-es8328-objs := imx-es8328.o
 snd-soc-imx-sgtl5000-objs := imx-sgtl5000.o
 snd-soc-imx-spdif-objs := imx-spdif.o
 snd-soc-imx-mc13783-objs := imx-mc13783.o
+snd-soc-imx-audmix-objs := imx-audmix.o
 
 obj-$(CONFIG_SND_SOC_EUKREA_TLV320) += snd-soc-eukrea-tlv320.o
 obj-$(CONFIG_SND_SOC_PHYCORE_AC97) += snd-soc-phycore-ac97.o
@@ -71,3 +72,4 @@ obj-$(CONFIG_SND_SOC_IMX_ES8328) += snd-soc-imx-es8328.o
 obj-$(CONFIG_SND_SOC_IMX_SGTL5000) += snd-soc-imx-sgtl5000.o
 obj-$(CONFIG_SND_SOC_IMX_SPDIF) += snd-soc-imx-spdif.o
 obj-$(CONFIG_SND_SOC_IMX_MC13783) += snd-soc-imx-mc13783.o
+obj-$(CONFIG_SND_SOC_IMX_AUDMIX) += snd-soc-imx-audmix.o
diff --git a/sound/soc/fsl/imx-audmix.c b/sound/soc/fsl/imx-audmix.c
new file mode 100644
index 000..30cce52
--- /dev/null
+++ b/sound/soc/fsl/imx-audmix.c
@@ -0,0 +1,333 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2017 NXP
+ *
+ * The code contained herein is licensed under the GNU General Public
+ * License. You may obtain a copy of the GNU General Public License
+ * Version 2 or later at the following locations:
+ *
+ * http://www.opensource.org/licenses/gpl-license.html
+ * http://www.gnu.org/copyleft/gpl.html
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "fsl_sai.h"
+#include "fsl_audmix.h"
+
+struct imx_audmix {
+   struct platform_device *pdev;
+   struct snd_soc_card card;
+   struct platform_device *audmix_pdev;
+   struct platform_device *out_pdev;
+   struct clk *cpu_mclk;
+   int num_dai;
+   struct snd_soc_dai_link *dai;
+   int num_dai_conf;
+   struct snd_soc_codec_conf *dai_conf;
+   int num_dapm_routes;
+   struct snd_soc_dapm_route *dapm_routes;
+};
+
+static const u32 imx_audmix_rates[] = {
+   8000, 12000, 16000, 24000, 32000, 48000, 64000, 96000,
+};
+
+static const struct snd_pcm_hw_constraint_list imx_audmix_rate_constraints = {
+   .count = ARRAY_SIZE(imx_audmix_rates),
+   .list = imx_audmix_rates,
+};
+
+static int imx_audmix_fe_startup(struct snd_pcm_substream *substream)
+{
+   struct snd_soc_pcm_runtime *rtd = substream->private_data;
+   struct imx_audmix *priv = snd_soc_card_get_drvdata(rtd->card);
+   struct snd_pcm_runtime *runtime = substream->runtime;
+   struct device *dev = rtd->card->dev;
+   unsigned long clk_rate = clk_get_rate(priv->cpu_mclk);
+   int ret;
+
+   if (clk_rate % 24576000 == 0) {
+   ret = snd_pcm_hw_constraint_list(runtime, 0,
+SNDRV_PCM_HW_PARAM_RATE,
+_audmix_rate_constraints);
+   if (ret < 0)
+   return ret;
+   } else {
+   dev_warn(dev, "mclk may be not supported %lu\n", clk_rate);
+   }
+
+   ret = snd_pcm_hw_constraint_minmax(runtime, SNDRV_PCM_HW_PARAM_CHANNELS,
+  1, 8);
+   if (ret < 0)
+   return ret;
+
+   return snd_pcm_hw_constraint_mask64(runtime, SNDRV_PCM_HW_PARAM_FORMAT,
+   FSL_AUDMIX_FORMATS);
+}
+
+static int imx_audmix_fe_hw_params(struct snd_pcm_substream *substream,
+  struct snd_pcm_hw_params *params)
+{
+   struct snd_soc_pcm_runtime *rtd = substream->private_data;
+   struct device *dev = rtd->card->dev;
+   bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK;
+   unsigned int fmt = SND_SOC_DAIFMT_DSP_A | SND_SOC_DAIFMT_NB_NF;
+   u32 channels = params_channels(params);
+   int ret, dir;
+
+   /* For playback the AUDMIX is slave, and for record is master */
+   fmt |= tx ? SND_SOC_DAIFMT_CBS_CFS : 

[PATCH v2 2/4] ASoC: add fsl_audmix DT binding documentation

2019-01-08 Thread Viorel Suman
Add the DT binding documentation for NXP Audio Mixer
CPU DAI driver.

Signed-off-by: Viorel Suman 
---
 .../devicetree/bindings/sound/fsl,audmix.txt   | 44 ++
 1 file changed, 44 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/sound/fsl,audmix.txt

diff --git a/Documentation/devicetree/bindings/sound/fsl,audmix.txt 
b/Documentation/devicetree/bindings/sound/fsl,audmix.txt
new file mode 100644
index 000..512d39b
--- /dev/null
+++ b/Documentation/devicetree/bindings/sound/fsl,audmix.txt
@@ -0,0 +1,44 @@
+NXP Audio Mixer (AUDMIX).
+
+The Audio Mixer is a on-chip functional module that allows mixing of two
+audio streams into a single audio stream. Audio Mixer has two input serial
+audio interfaces. These are driven by two Synchronous Audio interface
+modules (SAI). Each input serial interface carries 8 audio channels in its
+frame in TDM manner. Mixer mixes audio samples of corresponding channels
+from two interfaces into a single sample. Before mixing, audio samples of
+two inputs can be attenuated based on configuration. The output of the
+Audio Mixer is also a serial audio interface. Like input interfaces it has
+the same TDM frame format. This output is used to drive the serial DAC TDM
+interface of audio codec and also sent to the external pins along with the
+receive path of normal audio SAI module for readback by the CPU.
+
+The output of Audio Mixer can be selected from any of the three streams
+ - serial audio input 1
+ - serial audio input 2
+ - mixed audio
+
+Mixing operation is independent of audio sample rate but the two audio
+input streams must have same audio sample rate with same number of channels
+in TDM frame to be eligible for mixing.
+
+Device driver required properties:
+=
+  - compatible : Compatible list, contains "fsl,imx8qm-audmix"
+
+  - reg: Offset and length of the register set for the 
device.
+
+  - clocks : Must contain an entry for each entry in clock-names.
+
+  - clock-names: Must include the "ipg" for register access.
+
+  - power-domains  : Must contain the phandle to AUDMIX power domain node
+
+Device driver configuration example:
+==
+  audmix: audmix@5984 {
+compatible = "fsl,imx8qm-audmix";
+reg = <0x0 0x5984 0x0 0x1>;
+clocks = < IMX8QXP_AUD_AUDMIX_IPG>;
+clock-names = "ipg";
+power-domains = <_audmix>;
+  };
-- 
2.7.4



[PATCH v2 1/4] ASoC: fsl: Add Audio Mixer CPU DAI driver

2019-01-08 Thread Viorel Suman
This patch implements Audio Mixer CPU DAI driver for NXP iMX8 SOCs.
The Audio Mixer is a on-chip functional module that allows mixing of
two audio streams into a single audio stream.

Audio Mixer datasheet is available here:
https://www.nxp.com/docs/en/reference-manual/IMX8DQXPRM.pdf

Signed-off-by: Viorel Suman 
---
 sound/soc/fsl/Kconfig  |   7 +
 sound/soc/fsl/Makefile |   3 +
 sound/soc/fsl/fsl_audmix.c | 551 +
 sound/soc/fsl/fsl_audmix.h | 102 +
 4 files changed, 663 insertions(+)
 create mode 100644 sound/soc/fsl/fsl_audmix.c
 create mode 100644 sound/soc/fsl/fsl_audmix.h

diff --git a/sound/soc/fsl/Kconfig b/sound/soc/fsl/Kconfig
index 7b1d997..0af2e056 100644
--- a/sound/soc/fsl/Kconfig
+++ b/sound/soc/fsl/Kconfig
@@ -24,6 +24,13 @@ config SND_SOC_FSL_SAI
  This option is only useful for out-of-tree drivers since
  in-tree drivers select it automatically.
 
+config SND_SOC_FSL_AUDMIX
+   tristate "Audio Mixer (AUDMIX) module support"
+   select REGMAP_MMIO
+   help
+ Say Y if you want to add Audio Mixer (AUDMIX)
+ support for the NXP iMX CPUs.
+
 config SND_SOC_FSL_SSI
tristate "Synchronous Serial Interface module (SSI) support"
select SND_SOC_IMX_PCM_DMA if SND_IMX_SOC != n
diff --git a/sound/soc/fsl/Makefile b/sound/soc/fsl/Makefile
index 3c0ff31..4172d5a 100644
--- a/sound/soc/fsl/Makefile
+++ b/sound/soc/fsl/Makefile
@@ -12,6 +12,7 @@ snd-soc-p1022-rdk-objs := p1022_rdk.o
 obj-$(CONFIG_SND_SOC_P1022_RDK) += snd-soc-p1022-rdk.o
 
 # Freescale SSI/DMA/SAI/SPDIF Support
+snd-soc-fsl-audmix-objs := fsl_audmix.o
 snd-soc-fsl-asoc-card-objs := fsl-asoc-card.o
 snd-soc-fsl-asrc-objs := fsl_asrc.o fsl_asrc_dma.o
 snd-soc-fsl-sai-objs := fsl_sai.o
@@ -22,6 +23,8 @@ snd-soc-fsl-esai-objs := fsl_esai.o
 snd-soc-fsl-micfil-objs := fsl_micfil.o
 snd-soc-fsl-utils-objs := fsl_utils.o
 snd-soc-fsl-dma-objs := fsl_dma.o
+
+obj-$(CONFIG_SND_SOC_FSL_AUDMIX) += snd-soc-fsl-audmix.o
 obj-$(CONFIG_SND_SOC_FSL_ASOC_CARD) += snd-soc-fsl-asoc-card.o
 obj-$(CONFIG_SND_SOC_FSL_ASRC) += snd-soc-fsl-asrc.o
 obj-$(CONFIG_SND_SOC_FSL_SAI) += snd-soc-fsl-sai.o
diff --git a/sound/soc/fsl/fsl_audmix.c b/sound/soc/fsl/fsl_audmix.c
new file mode 100644
index 000..f1267e5
--- /dev/null
+++ b/sound/soc/fsl/fsl_audmix.c
@@ -0,0 +1,551 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NXP AUDMIX ALSA SoC Digital Audio Interface (DAI) driver
+ *
+ * Copyright 2017 NXP
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "fsl_audmix.h"
+
+#define SOC_ENUM_SINGLE_S(xreg, xshift, xtexts) \
+   SOC_ENUM_SINGLE(xreg, xshift, ARRAY_SIZE(xtexts), xtexts)
+
+static const char
+   *tdm_sel[] = { "TDM1", "TDM2", },
+   *mode_sel[] = { "Disabled", "TDM1", "TDM2", "Mixed", },
+   *width_sel[] = { "16b", "18b", "20b", "24b", "32b", },
+   *endis_sel[] = { "Disabled", "Enabled", },
+   *updn_sel[] = { "Downward", "Upward", },
+   *mask_sel[] = { "Unmask", "Mask", };
+
+static const struct soc_enum fsl_audmix_enum[] = {
+/* FSL_AUDMIX_CTR enums */
+SOC_ENUM_SINGLE_S(FSL_AUDMIX_CTR, FSL_AUDMIX_CTR_MIXCLK_SHIFT, tdm_sel),
+SOC_ENUM_SINGLE_S(FSL_AUDMIX_CTR, FSL_AUDMIX_CTR_OUTSRC_SHIFT, mode_sel),
+SOC_ENUM_SINGLE_S(FSL_AUDMIX_CTR, FSL_AUDMIX_CTR_OUTWIDTH_SHIFT, width_sel),
+SOC_ENUM_SINGLE_S(FSL_AUDMIX_CTR, FSL_AUDMIX_CTR_MASKRTDF_SHIFT, mask_sel),
+SOC_ENUM_SINGLE_S(FSL_AUDMIX_CTR, FSL_AUDMIX_CTR_MASKCKDF_SHIFT, mask_sel),
+SOC_ENUM_SINGLE_S(FSL_AUDMIX_CTR, FSL_AUDMIX_CTR_SYNCMODE_SHIFT, endis_sel),
+SOC_ENUM_SINGLE_S(FSL_AUDMIX_CTR, FSL_AUDMIX_CTR_SYNCSRC_SHIFT, tdm_sel),
+/* FSL_AUDMIX_ATCR0 enums */
+SOC_ENUM_SINGLE_S(FSL_AUDMIX_ATCR0, 0, endis_sel),
+SOC_ENUM_SINGLE_S(FSL_AUDMIX_ATCR0, 1, updn_sel),
+/* FSL_AUDMIX_ATCR1 enums */
+SOC_ENUM_SINGLE_S(FSL_AUDMIX_ATCR1, 0, endis_sel),
+SOC_ENUM_SINGLE_S(FSL_AUDMIX_ATCR1, 1, updn_sel),
+};
+
+struct fsl_audmix_state {
+   u8 tdms;
+   u8 clk;
+   char msg[64];
+};
+
+static const struct fsl_audmix_state prms[4][4] = {{
+   /* DIS->DIS, do nothing */
+   { .tdms = 0, .clk = 0, .msg = "" },
+   /* DIS->TDM1*/
+   { .tdms = 1, .clk = 1, .msg = "DIS->TDM1: TDM1 not started!\n" },
+   /* DIS->TDM2*/
+   { .tdms = 2, .clk = 2, .msg = "DIS->TDM2: TDM2 not started!\n" },
+   /* DIS->MIX */
+   { .tdms = 3, .clk = 0, .msg = "DIS->MIX: Please start both TDMs!\n" }
+}, {   /* TDM1->DIS */
+   { .tdms = 1, .clk = 0, .msg = "TDM1->DIS: TDM1 not started!\n" },
+   /* TDM1->TDM1, do nothing */
+   { .tdms = 0, .clk = 0, .msg = "" },
+   /* TDM1->TDM2 */
+   { .tdms = 3, .clk = 2, .msg = "TDM1->TDM2: Please start both TDMs!\n" },
+   /* TDM1->MIX */
+   { .tdms = 3, .clk = 0, .msg = "TDM1->MIX: Please start both TDMs!\n" }
+}, {   /* TDM2->DIS */
+   { .tdms = 2, .clk = 0, .msg = "TDM2->DIS: TDM2 not started!\n" },
+   /* TDM2->TDM1 */
+   { .tdms = 3, 

[PATCH v2 0/4] Add NXP AUDMIX device and machine drivers

2019-01-08 Thread Viorel Suman
The patchset adds NXP Audio Mixer (AUDMIX) device and machine
drivers and related DT bindings documentation.

Changes since V1:
1. Original patch split into distinct patches for the device driver and
  DT binding documentation.
2. Replaced AMIX with AUDMIX in both code and file names as it looks more
  RM-compliant.
3. Removed polarity control from CPU DAI driver as suggested by Nicolin.
4. Added machine driver and related DT binding documentation.

Viorel Suman (4):
  ASoC: fsl: Add Audio Mixer CPU DAI driver
  ASoC: add fsl_audmix DT binding documentation
  ASoC: fsl: Add Audio Mixer machine driver
  ASoC: add imx-audmix DT binding documentation

 .../devicetree/bindings/sound/fsl,audmix.txt   |  44 ++
 .../devicetree/bindings/sound/imx-audmix.txt   |  24 +
 sound/soc/fsl/Kconfig  |  16 +
 sound/soc/fsl/Makefile |   5 +
 sound/soc/fsl/fsl_audmix.c | 551 +
 sound/soc/fsl/fsl_audmix.h | 101 
 sound/soc/fsl/imx-audmix.c | 328 
 7 files changed, 1069 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/sound/fsl,audmix.txt
 create mode 100644 Documentation/devicetree/bindings/sound/imx-audmix.txt
 create mode 100644 sound/soc/fsl/fsl_audmix.c
 create mode 100644 sound/soc/fsl/fsl_audmix.h
 create mode 100644 sound/soc/fsl/imx-audmix.c

-- 
2.7.4



[PATCH] powerpc: build virtex dtb

2019-01-08 Thread Corentin Labbe
I wanted to test the virtex440-ml507 qemu machine and found that the dtb
for it was not builded.
All powerpc DTB are only built when CONFIG_OF_ALL_DTBS is set which depend on
COMPILE_TEST.

This patchs adds build of virtex dtbs depending on
CONFIG_XILINX_VIRTEX440_GENERIC_BOARD option.

Signed-off-by: Corentin Labbe 
---
 arch/powerpc/boot/dts/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/boot/dts/Makefile b/arch/powerpc/boot/dts/Makefile
index fb335d05aae8..cae14fca682e 100644
--- a/arch/powerpc/boot/dts/Makefile
+++ b/arch/powerpc/boot/dts/Makefile
@@ -4,3 +4,5 @@ subdir-y += fsl
 
 dtstree:= $(srctree)/$(src)
 dtb-$(CONFIG_OF_ALL_DTBS) := $(patsubst $(dtstree)/%.dts,%.dtb, $(wildcard 
$(dtstree)/*.dts))
+dtb-$(CONFIG_XILINX_VIRTEX440_GENERIC_BOARD) += virtex440-ml507.dtb
+dtb-$(CONFIG_XILINX_VIRTEX440_GENERIC_BOARD) += virtex440-ml510.dtb
-- 
2.19.2



Re: [PATCH 2/2] powerpc: Show PAGE_SIZE in __die() output

2019-01-08 Thread Christophe Leroy




Le 08/01/2019 à 13:05, Michael Ellerman a écrit :

The page size the kernel is built with is useful info when debugging a
crash, so add it to the output in __die().

Result looks like eg:

   kernel BUG at drivers/misc/lkdtm/bugs.c:63!
   Oops: Exception in kernel mode, sig: 5 [#1]
   LE PAGE_SIZE=64K SMP NR_CPUS=2048 NUMA pSeries
   Modules linked in: vmx_crypto kvm binfmt_misc ip_tables

Signed-off-by: Michael Ellerman 
---
  arch/powerpc/kernel/traps.c | 12 
  1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 431a86d3f772..fc972e4eee5f 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -268,6 +268,18 @@ static int __die(const char *str, struct pt_regs *regs, 
long err)
else
seq_buf_puts(, "BE ");
  
+	seq_buf_puts(, "PAGE_SIZE=");

+   if (IS_ENABLED(CONFIG_PPC_4K_PAGES))
+   seq_buf_puts(, "4K ");
+   else if (IS_ENABLED(CONFIG_PPC_16K_PAGES))
+   seq_buf_puts(, "16K ");
+   else if (IS_ENABLED(CONFIG_PPC_64K_PAGES))
+   seq_buf_puts(, "64K ");
+   else if (IS_ENABLED(CONFIG_PPC_256K_PAGES))
+   seq_buf_puts(, "256K ");


Can't we build all the above at once using PAGE_SHIFT ?

Something like (untested):

"%dK ", 1 << (PAGE_SHIFT - 10)

Christophe


+   else
+   BUILD_BUG_ON(1);
+
if (IS_ENABLED(CONFIG_PREEMPT))
seq_buf_puts(, "PREEMPT ");
  



Re: [PATCH 1/2] powerpc: Use seq_buf to avoid pr_cont() in __die()

2019-01-08 Thread Christophe Leroy




Le 08/01/2019 à 13:04, Michael Ellerman a écrit :

Using pr_cont() risks having our output interleaved with other output
from other CPUs. Instead use a seq_buf to construct the line and then
print it as a whole.


Why not simply doing a single printk() or similar on the same model as 
X86 for instance ? 
(https://elixir.bootlin.com/linux/v5.0-rc1/source/arch/x86/kernel/dumpstack.c#L368)


Christophe



Signed-off-by: Michael Ellerman 
---
  arch/powerpc/kernel/traps.c | 23 ---
  1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 64936b60d521..431a86d3f772 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -38,6 +38,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -255,26 +256,34 @@ NOKPROBE_SYMBOL(oops_end);
  
  static int __die(const char *str, struct pt_regs *regs, long err)

  {
+   char buf[128]; /* enough for all flags and a long platform name */
+   struct seq_buf s;
+
+   seq_buf_init(, buf, sizeof(buf));
+
printk("Oops: %s, sig: %ld [#%d]\n", str, err, ++die_counter);
  
  	if (IS_ENABLED(CONFIG_CPU_LITTLE_ENDIAN))

-   printk("LE ");
+   seq_buf_puts(, "LE ");
else
-   printk("BE ");
+   seq_buf_puts(, "BE ");
  
  	if (IS_ENABLED(CONFIG_PREEMPT))

-   pr_cont("PREEMPT ");
+   seq_buf_puts(, "PREEMPT ");
  
  	if (IS_ENABLED(CONFIG_SMP))

-   pr_cont("SMP NR_CPUS=%d ", NR_CPUS);
+   seq_buf_printf(, "SMP NR_CPUS=%d ", NR_CPUS);
  
  	if (debug_pagealloc_enabled())

-   pr_cont("DEBUG_PAGEALLOC ");
+   seq_buf_puts(, "DEBUG_PAGEALLOC ");
  
  	if (IS_ENABLED(CONFIG_NUMA))

-   pr_cont("NUMA ");
+   seq_buf_puts(, "NUMA ");
+
+   if (ppc_md.name)
+   seq_buf_puts(, ppc_md.name);
  
-	pr_cont("%s\n", ppc_md.name ? ppc_md.name : "");

+   printk("%s\n", buf);
  
  	if (notify_die(DIE_OOPS, str, regs, err, 255, SIGSEGV) == NOTIFY_STOP)

return 1;



[PATCH 2/2] powerpc: Show PAGE_SIZE in __die() output

2019-01-08 Thread Michael Ellerman
The page size the kernel is built with is useful info when debugging a
crash, so add it to the output in __die().

Result looks like eg:

  kernel BUG at drivers/misc/lkdtm/bugs.c:63!
  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in: vmx_crypto kvm binfmt_misc ip_tables

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/traps.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 431a86d3f772..fc972e4eee5f 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -268,6 +268,18 @@ static int __die(const char *str, struct pt_regs *regs, 
long err)
else
seq_buf_puts(, "BE ");
 
+   seq_buf_puts(, "PAGE_SIZE=");
+   if (IS_ENABLED(CONFIG_PPC_4K_PAGES))
+   seq_buf_puts(, "4K ");
+   else if (IS_ENABLED(CONFIG_PPC_16K_PAGES))
+   seq_buf_puts(, "16K ");
+   else if (IS_ENABLED(CONFIG_PPC_64K_PAGES))
+   seq_buf_puts(, "64K ");
+   else if (IS_ENABLED(CONFIG_PPC_256K_PAGES))
+   seq_buf_puts(, "256K ");
+   else
+   BUILD_BUG_ON(1);
+
if (IS_ENABLED(CONFIG_PREEMPT))
seq_buf_puts(, "PREEMPT ");
 
-- 
2.20.1



[PATCH 1/2] powerpc: Use seq_buf to avoid pr_cont() in __die()

2019-01-08 Thread Michael Ellerman
Using pr_cont() risks having our output interleaved with other output
from other CPUs. Instead use a seq_buf to construct the line and then
print it as a whole.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/traps.c | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 64936b60d521..431a86d3f772 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -255,26 +256,34 @@ NOKPROBE_SYMBOL(oops_end);
 
 static int __die(const char *str, struct pt_regs *regs, long err)
 {
+   char buf[128]; /* enough for all flags and a long platform name */
+   struct seq_buf s;
+
+   seq_buf_init(, buf, sizeof(buf));
+
printk("Oops: %s, sig: %ld [#%d]\n", str, err, ++die_counter);
 
if (IS_ENABLED(CONFIG_CPU_LITTLE_ENDIAN))
-   printk("LE ");
+   seq_buf_puts(, "LE ");
else
-   printk("BE ");
+   seq_buf_puts(, "BE ");
 
if (IS_ENABLED(CONFIG_PREEMPT))
-   pr_cont("PREEMPT ");
+   seq_buf_puts(, "PREEMPT ");
 
if (IS_ENABLED(CONFIG_SMP))
-   pr_cont("SMP NR_CPUS=%d ", NR_CPUS);
+   seq_buf_printf(, "SMP NR_CPUS=%d ", NR_CPUS);
 
if (debug_pagealloc_enabled())
-   pr_cont("DEBUG_PAGEALLOC ");
+   seq_buf_puts(, "DEBUG_PAGEALLOC ");
 
if (IS_ENABLED(CONFIG_NUMA))
-   pr_cont("NUMA ");
+   seq_buf_puts(, "NUMA ");
+
+   if (ppc_md.name)
+   seq_buf_puts(, ppc_md.name);
 
-   pr_cont("%s\n", ppc_md.name ? ppc_md.name : "");
+   printk("%s\n", buf);
 
if (notify_die(DIE_OOPS, str, regs, err, 255, SIGSEGV) == NOTIFY_STOP)
return 1;
-- 
2.20.1



Re: [PATCH] powerpc/irq: drop arch_early_irq_init()

2019-01-08 Thread Christophe Leroy

Oops, forgot to actually copy Thomas.

Le 08/01/2019 à 12:37, Christophe Leroy a écrit :

arch_early_irq_init() does nothing different than
the weak arch_early_irq_init() in kernel/softirq.c

Fixes: 089fb442f301 ("powerpc: Use ARCH_IRQ_INIT_FLAGS")
Cc: Thomas Gleixner 
Signed-off-by: Christophe Leroy 
---
  arch/powerpc/kernel/irq.c | 5 -
  1 file changed, 5 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 916ddc4aac44..bb299613a462 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -827,11 +827,6 @@ int irq_choose_cpu(const struct cpumask *mask)
  }
  #endif
  
-int arch_early_irq_init(void)

-{
-   return 0;
-}
-
  #ifdef CONFIG_PPC64
  static int __init setup_noirqdistrib(char *str)
  {



[PATCH] powerpc/irq: drop arch_early_irq_init()

2019-01-08 Thread Christophe Leroy
arch_early_irq_init() does nothing different than
the weak arch_early_irq_init() in kernel/softirq.c

Fixes: 089fb442f301 ("powerpc: Use ARCH_IRQ_INIT_FLAGS")
Cc: Thomas Gleixner 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/irq.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 916ddc4aac44..bb299613a462 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -827,11 +827,6 @@ int irq_choose_cpu(const struct cpumask *mask)
 }
 #endif
 
-int arch_early_irq_init(void)
-{
-   return 0;
-}
-
 #ifdef CONFIG_PPC64
 static int __init setup_noirqdistrib(char *str)
 {
-- 
2.13.3



Re: [PATCH v3 1/2] selftests/powerpc: Add MSR bits

2019-01-08 Thread Breno Leitao
On 1/8/19 9:14 AM, Christophe Leroy wrote:
> Le 08/01/2019 à 12:11, Breno Leitao a écrit :
>> Hi Michael,
>>
>> On 1/8/19 7:20 AM, Michael Ellerman wrote:
>>> Breno Leitao  writes:
>>>
 hi Christophe,

 On 1/3/19 3:19 PM, LEROY Christophe wrote:
> Breno Leitao  a écrit :
>
>> This patch simply adds definitions for the MSR bits and some macros to
>> test for MSR TM bits.
>>
>> This was copied from arch/powerpc/include/asm/reg.h generic MSR part.
>
> Can't we find a way to avoid duplicating such defines ?

 I think there are three possible ways, but none of them respect the 
 premises
 we are used too. These are the possible ways I can think of:

 1) Including arch/powerpc/include/asm as part of the selftest compilation
 process.
     Problem: This might break the selftest independence of the kbuild
 system.

 2) Generate a temporary header file inside selftests/include which contains
 these macros at compilation time.
     Problem: The problem as above.

 3) Define MSR fields at userspace headers (/usr/include).
     Problem: I am not sure userspace should have MSR bits information.

 Do you suggest me to investigate any other way?
>>>
>>> In this case I think we can probably just copy the few #defines we need.
>>
>> Cool. That is the very same thing as suggested by Christophe. I sent the v4
>> yesterday, with this change already.
> 
> It didn't get through. Latest is still v3 it seems, look
> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?submitter=71262

Weird, I am not finding it as well, although I've just confirmed that I have,
in fact, sent it. Anyway, let me send it again.

Thanks!


[PATCH v4] selftests/powerpc: New TM signal self test

2019-01-08 Thread Breno Leitao
A new self test that forces MSR[TS] to be set without calling any TM
instruction. This test also tries to cause a page fault at a signal
handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing
thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG
when tm_recheckpoint() is called.

This test is not deterministic, since it is hard to guarantee that the page
access will cause a page fault. In order to force more page faults at
signal context, the signal handler and the ucontext are being mapped into a
MADV_DONTNEED memory chunks.

Tests have shown that the bug could be exposed with few interactions in a
buggy kernel. This test is configured to loop 5000x, having a good chance
to hit the kernel issue in just one run.  This self test takes less than
two seconds to run.

This test uses set/getcontext because the kernel will recheckpoint
zeroed structures, causing the test to segfault, which is undesired because
the test needs to rerun, so, there is a signal handler for SIGSEGV which
will restart the test.

v2: Uses the MADV_DONTNEED memory advice
v3: Fix memcpy and 32-bits compilation
v4: Does not define unused macros

Signed-off-by: Breno Leitao 
---
 tools/testing/selftests/powerpc/include/reg.h |   8 +
 .../testing/selftests/powerpc/include/utils.h |   2 +
 tools/testing/selftests/powerpc/tm/.gitignore |   1 +
 tools/testing/selftests/powerpc/tm/Makefile   |   4 +-
 .../powerpc/tm/tm-signal-context-force-tm.c   | 184 ++
 5 files changed, 198 insertions(+), 1 deletion(-)
 create mode 100644 
tools/testing/selftests/powerpc/tm/tm-signal-context-force-tm.c

diff --git a/tools/testing/selftests/powerpc/include/reg.h 
b/tools/testing/selftests/powerpc/include/reg.h
index 52b4710469d2..a69ab6e2afd9 100644
--- a/tools/testing/selftests/powerpc/include/reg.h
+++ b/tools/testing/selftests/powerpc/include/reg.h
@@ -77,6 +77,14 @@
 #define TEXASR_TE  0x0400
 #define TEXASR_ROT 0x0200
 
+/* MSR register bits */
+#define MSR_TS_S_LG 33  /* Trans Mem state: Suspended */
+
+#define __MASK(X)   (1UL<<(X))
+
+/* macro to check TM MSR bits */
+#define MSR_TS_S__MASK(MSR_TS_S_LG)   /* Transaction Suspended */
+
 /* Vector Instructions */
 #define VSX_XX1(xs, ra, rb)(((xs) & 0x1f) << 21 | ((ra) << 16) |  \
 ((rb) << 11) | (((xs) >> 5)))
diff --git a/tools/testing/selftests/powerpc/include/utils.h 
b/tools/testing/selftests/powerpc/include/utils.h
index ae43a614835d..7636bf45d5d5 100644
--- a/tools/testing/selftests/powerpc/include/utils.h
+++ b/tools/testing/selftests/powerpc/include/utils.h
@@ -102,8 +102,10 @@ do {   
\
 
 #if defined(__powerpc64__)
 #define UCONTEXT_NIA(UC)   (UC)->uc_mcontext.gp_regs[PT_NIP]
+#define UCONTEXT_MSR(UC)   (UC)->uc_mcontext.gp_regs[PT_MSR]
 #elif defined(__powerpc__)
 #define UCONTEXT_NIA(UC)   (UC)->uc_mcontext.uc_regs->gregs[PT_NIP]
+#define UCONTEXT_MSR(UC)   (UC)->uc_mcontext.uc_regs->gregs[PT_MSR]
 #else
 #error implement UCONTEXT_NIA
 #endif
diff --git a/tools/testing/selftests/powerpc/tm/.gitignore 
b/tools/testing/selftests/powerpc/tm/.gitignore
index 208452a93e2c..951fe855f7cd 100644
--- a/tools/testing/selftests/powerpc/tm/.gitignore
+++ b/tools/testing/selftests/powerpc/tm/.gitignore
@@ -11,6 +11,7 @@ tm-signal-context-chk-fpu
 tm-signal-context-chk-gpr
 tm-signal-context-chk-vmx
 tm-signal-context-chk-vsx
+tm-signal-context-force-tm
 tm-signal-sigreturn-nt
 tm-vmx-unavail
 tm-unavailable
diff --git a/tools/testing/selftests/powerpc/tm/Makefile 
b/tools/testing/selftests/powerpc/tm/Makefile
index 75a685359129..c0734ed0ef56 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -4,7 +4,8 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr 
tm-signal-context-chk-fpu
 
 TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv 
tm-signal-stack \
tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable 
tm-trap \
-   $(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn tm-signal-sigreturn-nt
+   $(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn tm-signal-sigreturn-nt \
+   tm-signal-context-force-tm
 
 top_srcdir = ../../../../..
 include ../../lib.mk
@@ -20,6 +21,7 @@ $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64
 $(OUTPUT)/tm-resched-dscr: ../pmu/lib.c
 $(OUTPUT)/tm-unavailable: CFLAGS += -O0 -pthread -m64 -Wno-error=uninitialized 
-mvsx
 $(OUTPUT)/tm-trap: CFLAGS += -O0 -pthread -m64
+$(OUTPUT)/tm-signal-context-force-tm: CFLAGS += -pthread -m64
 
 SIGNAL_CONTEXT_CHK_TESTS := $(patsubst 
%,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS))
 $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-context-force-tm.c 
b/tools/testing/selftests/powerpc/tm/tm-signal-context-force-tm.c
new file mode 100644
index ..31717625f318
--- /dev/null
+++ 

Re: [PATCH v3 1/2] selftests/powerpc: Add MSR bits

2019-01-08 Thread Christophe Leroy



Le 08/01/2019 à 12:11, Breno Leitao a écrit :

Hi Michael,

On 1/8/19 7:20 AM, Michael Ellerman wrote:

Breno Leitao  writes:


hi Christophe,

On 1/3/19 3:19 PM, LEROY Christophe wrote:

Breno Leitao  a écrit :


This patch simply adds definitions for the MSR bits and some macros to
test for MSR TM bits.

This was copied from arch/powerpc/include/asm/reg.h generic MSR part.


Can't we find a way to avoid duplicating such defines ?


I think there are three possible ways, but none of them respect the premises
we are used too. These are the possible ways I can think of:

1) Including arch/powerpc/include/asm as part of the selftest compilation
process.
Problem: This might break the selftest independence of the kbuild system.

2) Generate a temporary header file inside selftests/include which contains
these macros at compilation time.
Problem: The problem as above.

3) Define MSR fields at userspace headers (/usr/include).
Problem: I am not sure userspace should have MSR bits information.

Do you suggest me to investigate any other way?


In this case I think we can probably just copy the few #defines we need.


Cool. That is the very same thing as suggested by Christophe. I sent the v4
yesterday, with this change already.


It didn't get through. Latest is still v3 it seems, look 
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?submitter=71262


Christophe



Thanks for looking at it.



Re: [PATCH v3 1/2] selftests/powerpc: Add MSR bits

2019-01-08 Thread Breno Leitao
Hi Michael,

On 1/8/19 7:20 AM, Michael Ellerman wrote:
> Breno Leitao  writes:
> 
>> hi Christophe,
>>
>> On 1/3/19 3:19 PM, LEROY Christophe wrote:
>>> Breno Leitao  a écrit :
>>>
 This patch simply adds definitions for the MSR bits and some macros to
 test for MSR TM bits.

 This was copied from arch/powerpc/include/asm/reg.h generic MSR part.
>>>
>>> Can't we find a way to avoid duplicating such defines ?
>>
>> I think there are three possible ways, but none of them respect the premises
>> we are used too. These are the possible ways I can think of:
>>
>> 1) Including arch/powerpc/include/asm as part of the selftest compilation
>> process.
>>Problem: This might break the selftest independence of the kbuild system.
>>
>> 2) Generate a temporary header file inside selftests/include which contains
>> these macros at compilation time.
>>Problem: The problem as above.
>>
>> 3) Define MSR fields at userspace headers (/usr/include).
>>Problem: I am not sure userspace should have MSR bits information.
>>
>> Do you suggest me to investigate any other way?
> 
> In this case I think we can probably just copy the few #defines we need.

Cool. That is the very same thing as suggested by Christophe. I sent the v4
yesterday, with this change already.

Thanks for looking at it.


Re: [PATCH v4 11/13] x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2019-01-08 Thread Peter Zijlstra
On Mon, Jan 07, 2019 at 04:27:28PM +, Andrew Murray wrote:

This patch has the exact same subject as the previous one.. that seems
sub-optimal.


Re: [PATCH v4 10/13] x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2019-01-08 Thread Peter Zijlstra
On Mon, Jan 07, 2019 at 04:27:27PM +, Andrew Murray wrote:
> For drivers that do not support context exclusion let's advertise the
> PERF_PMU_CAP_NOEXCLUDE capability. This ensures that perf will
> prevent us from handling events where any exclusion flags are set.
> Let's also remove the now unnecessary check for exclusion flags.
> 
> Signed-off-by: Andrew Murray 
> ---
>  arch/x86/events/amd/ibs.c  | 13 +
>  arch/x86/events/amd/power.c| 10 ++
>  arch/x86/events/intel/cstate.c | 12 +++-
>  arch/x86/events/intel/rapl.c   |  9 ++---
>  arch/x86/events/intel/uncore_snb.c |  9 ++---
>  arch/x86/events/msr.c  | 10 ++
>  6 files changed, 12 insertions(+), 51 deletions(-)

You (correctly) don't add CAP_NO_EXCLUDE to the main x86 pmu code, but
then you also don't check if it handles all the various exclude options
correctly/consistently.

Now; I must admit that that is a bit of a maze, but I think we can at
least add exclude_idle and exclude_hv fails in there, nothing uses those
afaict.

On the various exclude options; they are as follows (IIUC):

  - exclude_guest: we're a HV/host-kernel and we don't want the counter
   to run when we run a guest context.

  - exclude_host: we're a HV/host-kernel and we don't want the counter
  to run when we run in host context.

  - exclude_hv: we're a guest and don't want the counter to run in HV
context.

Now, KVM always implies exclude_hv afaict (for guests), I'm not sure
what, if anything Xen does on x86 (IIRC Brendan Gregg once said perf
works on Xen) -- nor quite sure who to ask, Boris, Jeurgen?



Re: [PATCH v4 05/13] arm: perf: conditionally use PERF_PMU_CAP_NO_EXCLUDE

2019-01-08 Thread Peter Zijlstra
On Mon, Jan 07, 2019 at 04:27:22PM +, Andrew Murray wrote:
> @@ -393,9 +386,8 @@ __hw_perf_event_init(struct perf_event *event)
>   /*
>* Check whether we need to exclude the counter from certain modes.
>*/
> + if (armpmu->set_event_filter &&
> + armpmu->set_event_filter(hwc, >attr)) {
>   pr_debug("ARM performance counters do not support "
>"mode exclusion\n");
>   return -EOPNOTSUPP;

This then requires all set_event_filter() implementations to check all
the various exclude options; also, set_event_filter() failing then
returns with -EOPNOTSUPP instead of the -EINVAL the CAP_NO_EXCLUDE
generates, which is again inconsitent.

If I look at (the very first git-grep found me)
armv7pmu_set_event_filter(), then I find it returning -EPERM (again
inconsistent but irrelevant because the actual value is not preserved)
for exclude_idle.

But it doesn't seem to check exclude_host at all for example.

> @@ -867,6 +859,9 @@ int armpmu_register(struct arm_pmu *pmu)
>   if (ret)
>   return ret;
>  
> + if (!pmu->set_event_filter)
> + pmu->pmu.capabilities |= PERF_PMU_CAP_NO_EXCLUDE;
> +
>   ret = perf_pmu_register(>pmu, pmu->name, -1);
>   if (ret)
>   goto out_destroy;
> -- 
> 2.7.4
> 


Re: powerpc/pasemi: Add Nemo board IRQ initroutine

2019-01-08 Thread Michael Ellerman
Darren Stevens  writes:
> Hello Michael
>
> On 22/12/2018, Michael Ellerman wrote:
>> On Sun, 2018-08-19 at 20:21:47 UTC, Darren Stevens wrote:
>>> Add a IRQ init routine for the Nemo board which inits and attatches
>>> the i8259 found in the SB600, and a cascade routine to dispatch the
>>> interrupts.
>>> 
>>> Signed-off-by: Darren Stevens 
>>
>> Applied to powerpc next, thanks.
>
> Thankyou very much! We can now build a X1000 kernel without an out-of-tree 
> patch, which is good.

Yay!

>> https://git.kernel.org/powerpc/c/51f4cc2047a4b7e9bf1b49acf06c11
>
> Sorry for not mentioning this earlier(holidays etc..), but this commit
> has my 4th patch, but the commit message from the 2nd patch. I don't
> know how, and it is not a major problem, but can we get it updated?

Urgh, cr#p. Not sure how I messed that up. I did have to edit all the
patches a bit to get them to apply so presumably somewhere along the
line the commit message got messed up to.

Unfortunately there's no way to fix it.

This email will have to suffice as documentation that it was me that
screwed it up and not you.

cheers


Re: [PATCH] selftests/powerpc: New TM signal self test

2019-01-08 Thread Michael Ellerman
Breno Leitao  writes:
> On 12/20/18 10:51 AM, Michael Ellerman wrote:
>> Breno Leitao  writes:
>> 
>>> A new self test that forces MSR[TS] to be set without calling any TM
>>> instruction. This test also tries to cause a page fault at a signal
>>> handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing
>>> thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG
>>> when tm_recheckpoint() is called.
>>>
>>> This test is not deterministic since it is hard to guarantee that the page
>>> access will cause a page fault. Tests have shown that the bug could be
>>> exposed with few interactions in a buggy kernel. This test is configured to
>>> loop 5000x, having a good chance to hit the kernel issue in just one run.
>>> This self test takes less than two seconds to run.
>>>
>>> This test uses set/getcontext because the kernel will recheckpoint
>>> zeroed structures, causing the test to segfault, which is undesired because
>>> the test needs to rerun, so, there is a signal handler for SIGSEGV which
>>> will restart the test.
>> And reference the ucontext->mcontext MSR using UCONTEXT_MSR() macro.
>> Hi Breno,
>> 
>> Thanks for the test, some of these TM tests are getting pretty advanced! :)
>> 
>> Unfortunately it doesn't build in a few configurations.
>> 
>> On Ubuntu 18.10 built with powerpc-linux-gnu-gcc I get:
>> 
>>   tm-signal-force-msr.c: In function 'trap_signal_handler':
>>   tm-signal-force-msr.c:42:19: error: 'union uc_regs_ptr' has no member 
>> named 'gp_regs'; did you mean 'uc_regs'?
>> ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S;
>>  ^~~
>>  uc_regs
>>   tm-signal-force-msr.c:17:29: error: left shift count >= width of type 
>> [-Werror=shift-count-overflow]
>>#define __MASK(X)   (1UL<<(X))
>>^~
>>   tm-signal-force-msr.c:20:25: note: in expansion of macro '__MASK'
>>#define MSR_TS_S__MASK(MSR_TS_S_LG) /* Transaction Suspended */
>>^~
>>   tm-signal-force-msr.c:42:38: note: in expansion of macro 'MSR_TS_S'
>> ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S;
>> ^~~~
>> 
>
> That is because I missed the -m64 compilation flag on Makefile. I understand
> that this test only make sense when compiled in 64 bits. Do you agree?

I think the test could work as a 32-bit binary on a 64-bit kernel, but I
don't mind if you force it to build 64-bit.

cheers


Re: [PATCH] powerpc: build dtb even without COMPILE_TEST

2019-01-08 Thread Michael Ellerman
Rob Herring  writes:
> On Fri, Jan 4, 2019 at 3:58 AM Corentin Labbe  wrote:
>>
>> I wanted to test the virtex440-ml507 qemu machine and found that the dtb
>> for it was not builded.
>
> Just do:
>
> make virtex440-ml507.dtb

I actually thought you had to do that, so I've never done anything different.

>> All powerpc DTB are only built when CONFIG_OF_ALL_DTBS is set which depend on
>> COMPILE_TEST.
>> But building DTB is not related to a "compile build test".
>
> But it is. We normally only build the dtbs for enabled platforms just
> like we only build platform/driver code for enabled platforms. It's
> hidden behind COMPILE_TEST so it only gets enabled for
> allmodconfig/allyesconfig builds.
>
>> So this patch made building of DTB independent of COMPILE_TEST (by
>> depending only on the PPC arch)
>> A better selection of which DTB to build could be done in the future
>> like that do the ARM arch.
>
> No reason you can't start that now for the platform you care about.
> You don't have to convert everyone.

Yeah that seems like the best option. I'm happy to take a patch just for
that target.

cheers


Re: [PATCH v4 1/6] powerpc: prefer memblock APIs returning virtual address

2019-01-08 Thread Christophe Leroy




Le 31/12/2018 à 10:29, Mike Rapoport a écrit :

There are a several places that allocate memory using memblock APIs that
return a physical address, convert the returned address to the virtual
address and frequently also memset(0) the allocated range.

Update these places to use memblock allocators already returning a virtual
address. Use memblock functions that clear the allocated memory instead of
calling memset(0) where appropriate.

The calls to memblock_alloc_base() that were not followed by memset(0) are
replaced with memblock_alloc_try_nid_raw(). Since the latter does not
panic() when the allocation fails, the appropriate panic() calls are added
to the call sites.

Signed-off-by: Mike Rapoport 
---
  arch/powerpc/kernel/paca.c | 16 ++--
  arch/powerpc/kernel/setup_64.c | 24 ++--
  arch/powerpc/mm/hash_utils_64.c|  6 +++---
  arch/powerpc/mm/pgtable-book3e.c   |  8 ++--
  arch/powerpc/mm/pgtable-book3s64.c |  5 +
  arch/powerpc/mm/pgtable-radix.c| 25 +++--
  arch/powerpc/platforms/pasemi/iommu.c  |  5 +++--
  arch/powerpc/platforms/pseries/setup.c | 18 ++
  arch/powerpc/sysdev/dart_iommu.c   |  7 +--
  9 files changed, 51 insertions(+), 63 deletions(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 913bfca..276d36d4 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -27,7 +27,7 @@
  static void *__init alloc_paca_data(unsigned long size, unsigned long align,
unsigned long limit, int cpu)
  {
-   unsigned long pa;
+   void *ptr;
int nid;
  
  	/*

@@ -42,17 +42,15 @@ static void *__init alloc_paca_data(unsigned long size, 
unsigned long align,
nid = early_cpu_to_node(cpu);
}
  
-	pa = memblock_alloc_base_nid(size, align, limit, nid, MEMBLOCK_NONE);

-   if (!pa) {
-   pa = memblock_alloc_base(size, align, limit);
-   if (!pa)
-   panic("cannot allocate paca data");
-   }
+   ptr = memblock_alloc_try_nid(size, align, MEMBLOCK_LOW_LIMIT,
+limit, nid);
+   if (!ptr)
+   panic("cannot allocate paca data");


AFAIKS, memblock_alloc_try_nid() panics if memblock_alloc_internal() 
returns NULL, so the above panic is useless, isn't it ?


  
  	if (cpu == boot_cpuid)

memblock_set_bottom_up(false);
  
-	return __va(pa);

+   return ptr;
  }
  
  #ifdef CONFIG_PPC_PSERIES

@@ -118,7 +116,6 @@ static struct slb_shadow * __init new_slb_shadow(int cpu, 
unsigned long limit)
}
  
  	s = alloc_paca_data(sizeof(*s), L1_CACHE_BYTES, limit, cpu);

-   memset(s, 0, sizeof(*s));
  
  	s->persistent = cpu_to_be32(SLB_NUM_BOLTED);

s->buffer_length = cpu_to_be32(sizeof(*s));
@@ -222,7 +219,6 @@ void __init allocate_paca(int cpu)
paca = alloc_paca_data(sizeof(struct paca_struct), L1_CACHE_BYTES,
limit, cpu);
paca_ptrs[cpu] = paca;
-   memset(paca, 0, sizeof(struct paca_struct));
  
  	initialise_paca(paca, cpu);

  #ifdef CONFIG_PPC_PSERIES
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 236c115..3dcd779 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -634,19 +634,17 @@ __init u64 ppc64_bolted_size(void)
  
  static void *__init alloc_stack(unsigned long limit, int cpu)

  {
-   unsigned long pa;
+   void *ptr;
  
  	BUILD_BUG_ON(STACK_INT_FRAME_SIZE % 16);
  
-	pa = memblock_alloc_base_nid(THREAD_SIZE, THREAD_SIZE, limit,

-   early_cpu_to_node(cpu), MEMBLOCK_NONE);
-   if (!pa) {
-   pa = memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit);
-   if (!pa)
-   panic("cannot allocate stacks");
-   }
+   ptr = memblock_alloc_try_nid(THREAD_SIZE, THREAD_SIZE,
+MEMBLOCK_LOW_LIMIT, limit,
+early_cpu_to_node(cpu));
+   if (!ptr)
+   panic("cannot allocate stacks");


Same ?

Christophe

  
-	return __va(pa);

+   return ptr;
  }
  
  void __init irqstack_early_init(void)

@@ -739,20 +737,17 @@ void __init emergency_stack_init(void)
struct thread_info *ti;
  
  		ti = alloc_stack(limit, i);

-   memset(ti, 0, THREAD_SIZE);
emerg_stack_init_thread_info(ti, i);
paca_ptrs[i]->emergency_sp = (void *)ti + THREAD_SIZE;
  
  #ifdef CONFIG_PPC_BOOK3S_64

/* emergency stack for NMI exception handling. */
ti = alloc_stack(limit, i);
-   memset(ti, 0, THREAD_SIZE);
emerg_stack_init_thread_info(ti, i);
paca_ptrs[i]->nmi_emergency_sp = (void *)ti + THREAD_SIZE;
  
  		/* emergency stack for machine 

Re: [PATCH v2 2/2] powerpc: use probe_user_read()

2019-01-08 Thread Russell Currey
On Tue, 2019-01-08 at 10:37 +0100, Christophe Leroy wrote:
> Hi Michael and Russell,
> 
> Any idea why:
> - checkpatch reports missing Signed-off-by:
> - Snowpatch build fails on PPC64 (it seems unrelated to the patch, 
> something wrong in lib/genalloc.c)

Upstream kernel broke for powerpc (snowpatch applies patches on top of
powerpc/next), it was fixed in commit
35004f2e55807a1a1491db24ab512dd2f770a130 which I believe is in
powerpc/next now.  I will look at rerunning tests for all the patches
that this impacted.

As for the S-o-b, no clue, I'll have a look.  Thanks for the report!

- Russell

> 
> Thanks
> Christophe
> 
> Le 08/01/2019 à 08:37, Christophe Leroy a écrit :
> > Instead of opencoding, use probe_user_read() to failessly
> > read a user location.
> > 
> > Signed-off-by: Christophe Leroy 
> > ---
> >   v2: Using probe_user_read() instead of probe_user_address()
> > 
> >   arch/powerpc/kernel/process.c   | 12 +---
> >   arch/powerpc/mm/fault.c |  6 +-
> >   arch/powerpc/perf/callchain.c   | 20 +++-
> >   arch/powerpc/perf/core-book3s.c |  8 +---
> >   arch/powerpc/sysdev/fsl_pci.c   | 10 --
> >   5 files changed, 10 insertions(+), 46 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/process.c
> > b/arch/powerpc/kernel/process.c
> > index ce393df243aa..6a4b59d574c2 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c
> > @@ -1298,16 +1298,6 @@ void show_user_instructions(struct pt_regs
> > *regs)
> >   
> > pc = regs->nip - (NR_INSN_TO_PRINT * 3 / 4 * sizeof(int));
> >   
> > -   /*
> > -* Make sure the NIP points at userspace, not kernel text/data
> > or
> > -* elsewhere.
> > -*/
> > -   if (!__access_ok(pc, NR_INSN_TO_PRINT * sizeof(int), USER_DS))
> > {
> > -   pr_info("%s[%d]: Bad NIP, not dumping instructions.\n",
> > -   current->comm, current->pid);
> > -   return;
> > -   }
> > -
> > seq_buf_init(, buf, sizeof(buf));
> >   
> > while (n) {
> > @@ -1318,7 +1308,7 @@ void show_user_instructions(struct pt_regs
> > *regs)
> > for (i = 0; i < 8 && n; i++, n--, pc += sizeof(int)) {
> > int instr;
> >   
> > -   if (probe_kernel_address((const void *)pc,
> > instr)) {
> > +   if (probe_user_read(, (void __user *)pc,
> > sizeof(instr))) {
> > seq_buf_printf(, " ");
> > continue;
> > }
> > diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> > index 887f11bcf330..ec74305fa330 100644
> > --- a/arch/powerpc/mm/fault.c
> > +++ b/arch/powerpc/mm/fault.c
> > @@ -276,12 +276,8 @@ static bool bad_stack_expansion(struct pt_regs
> > *regs, unsigned long address,
> > if ((flags & FAULT_FLAG_WRITE) && (flags &
> > FAULT_FLAG_USER) &&
> > access_ok(nip, sizeof(*nip))) {
> > unsigned int inst;
> > -   int res;
> >   
> > -   pagefault_disable();
> > -   res = __get_user_inatomic(inst, nip);
> > -   pagefault_enable();
> > -   if (!res)
> > +   if (!probe_user_read(, nip, sizeof(inst)))
> > return !store_updates_sp(inst);
> > *must_retry = true;
> > }
> > diff --git a/arch/powerpc/perf/callchain.c
> > b/arch/powerpc/perf/callchain.c
> > index 0af051a1974e..0680efb2237b 100644
> > --- a/arch/powerpc/perf/callchain.c
> > +++ b/arch/powerpc/perf/callchain.c
> > @@ -159,12 +159,8 @@ static int read_user_stack_64(unsigned long
> > __user *ptr, unsigned long *ret)
> > ((unsigned long)ptr & 7))
> > return -EFAULT;
> >   
> > -   pagefault_disable();
> > -   if (!__get_user_inatomic(*ret, ptr)) {
> > -   pagefault_enable();
> > +   if (!probe_user_read(ret, ptr, sizeof(*ret)))
> > return 0;
> > -   }
> > -   pagefault_enable();
> >   
> > return read_user_stack_slow(ptr, ret, 8);
> >   }
> > @@ -175,12 +171,8 @@ static int read_user_stack_32(unsigned int
> > __user *ptr, unsigned int *ret)
> > ((unsigned long)ptr & 3))
> > return -EFAULT;
> >   
> > -   pagefault_disable();
> > -   if (!__get_user_inatomic(*ret, ptr)) {
> > -   pagefault_enable();
> > +   if (!probe_user_read(ret, ptr, sizeof(*ret)))
> > return 0;
> > -   }
> > -   pagefault_enable();
> >   
> > return read_user_stack_slow(ptr, ret, 4);
> >   }
> > @@ -307,17 +299,11 @@ static inline int current_is_64bit(void)
> >*/
> >   static int read_user_stack_32(unsigned int __user *ptr, unsigned
> > int *ret)
> >   {
> > -   int rc;
> > -
> > if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> > ((unsigned long)ptr & 3))
> > return -EFAULT;
> >   
> > -   pagefault_disable();
> > -   rc = __get_user_inatomic(*ret, ptr);
> > -   

Re: [1/2] powerpc/4xx/ocm: Fix phys_addr_t printf warnings

2019-01-08 Thread Michael Ellerman
Christian Lamparter  writes:
> On Wednesday, January 2, 2019 12:31:50 PM CET Michael Ellerman wrote:
>> On Tue, 2019-01-01 at 03:56:00 UTC, Michael Ellerman wrote:
>> > Currently the code produces several warnings, eg:
>> > 
>> >   arch/powerpc/platforms/4xx/ocm.c:240:38: error: format '%llx'
>> >   expects argument of type 'long long unsigned int', but argument 3
>> >   has type 'phys_addr_t {aka unsigned int}'
>> >  seq_printf(m, "PhysAddr : 0x%llx\n", ocm->phys);
>> >~~~^ ~
>> > 
>> > Fix it by using the special %pa[p] format for printing phys_addr_t.
>> > Note we need to pass the value by reference for the special specifier
>> > to work.
>> > 
>> > Signed-off-by: Michael Ellerman 
>> 
>> Series applied to powerpc fixes.
>> 
>> https://git.kernel.org/powerpc/c/52b88fa1e8c7bae03bb691178a9f8b
>
> Well, I guess I'm a late. I had issues with the getting 4.20+
> crosscompiled on debian with make-kpkg. 
>
> Nevertheless, I finally got a working kernel and 
> on the MyBook Live APM82181:
>
> ---
> root@mbl:/sys/kernel/debug# cat ppc4xx_ocm/info 
> PPC4XX OCM   : 1
> PhysAddr : 0x00040004[p]
> MemTotal : 32768 Bytes
> MemTotal(NC) : 32768 Bytes
> MemTotal(C)  : 0 Bytes
>
> NC.PhysAddr  : 0x00040004[p]
> NC.VirtAddr  : 0x6bc84b36
> NC.MemTotal  : 32768 Bytes
> NC.MemFree   : 32768 Bytes
>
> C.PhysAddr   : 0x[p]
> C.VirtAddr   : 0x  (null)
> C.MemTotal   : 0 Bytes
> C.MemFree: 0 Bytes

Oh right, I'm an idiot :)

The docs say:

Physical address types phys_addr_t
--

%pa[p]  0x01234567 or 0x0123456789abcdef


And if you grep for that there's eg:

  drivers/ntb/test/ntb_tool.c: "Window Size\t%pa[p]\n",

So I just literally copied that.

But it's trying to indicate that the p is optional.

This should fix it, I won't merge it until you've tested it this time :)

cheers


diff --git a/arch/powerpc/platforms/4xx/ocm.c b/arch/powerpc/platforms/4xx/ocm.c
index a1aaa1569d7c..f0e488d97567 100644
--- a/arch/powerpc/platforms/4xx/ocm.c
+++ b/arch/powerpc/platforms/4xx/ocm.c
@@ -237,12 +237,12 @@ static int ocm_debugfs_show(struct seq_file *m, void *v)
continue;
 
seq_printf(m, "PPC4XX OCM   : %d\n", ocm->index);
-   seq_printf(m, "PhysAddr : %pa[p]\n", &(ocm->phys));
+   seq_printf(m, "PhysAddr : %pa\n", &(ocm->phys));
seq_printf(m, "MemTotal : %d Bytes\n", ocm->memtotal);
seq_printf(m, "MemTotal(NC) : %d Bytes\n", ocm->nc.memtotal);
seq_printf(m, "MemTotal(C)  : %d Bytes\n\n", ocm->c.memtotal);
 
-   seq_printf(m, "NC.PhysAddr  : %pa[p]\n", &(ocm->nc.phys));
+   seq_printf(m, "NC.PhysAddr  : %pa\n", &(ocm->nc.phys));
seq_printf(m, "NC.VirtAddr  : 0x%p\n", ocm->nc.virt);
seq_printf(m, "NC.MemTotal  : %d Bytes\n", ocm->nc.memtotal);
seq_printf(m, "NC.MemFree   : %d Bytes\n", ocm->nc.memfree);
@@ -252,7 +252,7 @@ static int ocm_debugfs_show(struct seq_file *m, void *v)
blk->size, blk->owner);
}
 
-   seq_printf(m, "\nC.PhysAddr   : %pa[p]\n", &(ocm->c.phys));
+   seq_printf(m, "\nC.PhysAddr   : %pa\n", &(ocm->c.phys));
seq_printf(m, "C.VirtAddr   : 0x%p\n", ocm->c.virt);
seq_printf(m, "C.MemTotal   : %d Bytes\n", ocm->c.memtotal);
seq_printf(m, "C.MemFree: %d Bytes\n", ocm->c.memfree);


Re: [PATCH v4 1/2] crypto: talitos - reorder code in talitos_edesc_alloc()

2019-01-08 Thread Christophe Leroy




Le 08/01/2019 à 07:56, Christophe Leroy a écrit :

This patch moves the mapping of IV after the kmalloc(). This
avoids having to unmap in case kmalloc() fails.

Signed-off-by: Christophe Leroy 


Cc: sta...@vger.kernel.org


---
  new in v4

  drivers/crypto/talitos.c | 25 +++--
  1 file changed, 7 insertions(+), 18 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 45e20707cef8..54d80e7edb86 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -1361,23 +1361,18 @@ static struct talitos_edesc *talitos_edesc_alloc(struct 
device *dev,
struct talitos_private *priv = dev_get_drvdata(dev);
bool is_sec1 = has_ftr_sec1(priv);
int max_len = is_sec1 ? TALITOS1_MAX_DATA_LEN : TALITOS2_MAX_DATA_LEN;
-   void *err;
  
  	if (cryptlen + authsize > max_len) {

dev_err(dev, "length exceeds h/w max limit\n");
return ERR_PTR(-EINVAL);
}
  
-	if (ivsize)

-   iv_dma = dma_map_single(dev, iv, ivsize, DMA_TO_DEVICE);
-
if (!dst || dst == src) {
src_len = assoclen + cryptlen + authsize;
src_nents = sg_nents_for_len(src, src_len);
if (src_nents < 0) {
dev_err(dev, "Invalid number of src SG.\n");
-   err = ERR_PTR(-EINVAL);
-   goto error_sg;
+   return ERR_PTR(-EINVAL);
}
src_nents = (src_nents == 1) ? 0 : src_nents;
dst_nents = dst ? src_nents : 0;
@@ -1387,16 +1382,14 @@ static struct talitos_edesc *talitos_edesc_alloc(struct 
device *dev,
src_nents = sg_nents_for_len(src, src_len);
if (src_nents < 0) {
dev_err(dev, "Invalid number of src SG.\n");
-   err = ERR_PTR(-EINVAL);
-   goto error_sg;
+   return ERR_PTR(-EINVAL);
}
src_nents = (src_nents == 1) ? 0 : src_nents;
dst_len = assoclen + cryptlen + (encrypt ? authsize : 0);
dst_nents = sg_nents_for_len(dst, dst_len);
if (dst_nents < 0) {
dev_err(dev, "Invalid number of dst SG.\n");
-   err = ERR_PTR(-EINVAL);
-   goto error_sg;
+   return ERR_PTR(-EINVAL);
}
dst_nents = (dst_nents == 1) ? 0 : dst_nents;
}
@@ -1425,10 +1418,10 @@ static struct talitos_edesc *talitos_edesc_alloc(struct 
device *dev,
alloc_len += sizeof(struct talitos_desc);
  
  	edesc = kmalloc(alloc_len, GFP_DMA | flags);

-   if (!edesc) {
-   err = ERR_PTR(-ENOMEM);
-   goto error_sg;
-   }
+   if (!edesc)
+   return ERR_PTR(-ENOMEM);
+   if (ivsize)
+   iv_dma = dma_map_single(dev, iv, ivsize, DMA_TO_DEVICE);
memset(>desc, 0, sizeof(edesc->desc));
  
  	edesc->src_nents = src_nents;

@@ -1445,10 +1438,6 @@ static struct talitos_edesc *talitos_edesc_alloc(struct 
device *dev,
 DMA_BIDIRECTIONAL);
}
return edesc;
-error_sg:
-   if (iv_dma)
-   dma_unmap_single(dev, iv_dma, ivsize, DMA_TO_DEVICE);
-   return err;
  }
  
  static struct talitos_edesc *aead_edesc_alloc(struct aead_request *areq, u8 *iv,




Re: [PATCH v4 2/2] crypto: talitos - fix ablkcipher for CONFIG_VMAP_STACK

2019-01-08 Thread Horia Geanta
On 1/8/2019 8:56 AM, Christophe Leroy wrote:
> [2.364486] WARNING: CPU: 0 PID: 60 at ./arch/powerpc/include/asm/io.h:837 
> dma_nommu_map_page+0x44/0xd4
> [2.373579] CPU: 0 PID: 60 Comm: cryptomgr_test Tainted: GW
>  4.20.0-rc5-00560-g6bfb52e23a00-dirty #531
> [2.384740] NIP:  c000c540 LR: c000c584 CTR: 
> [2.389743] REGS: c95abab0 TRAP: 0700   Tainted: GW  
> (4.20.0-rc5-00560-g6bfb52e23a00-dirty)
> [2.400042] MSR:  00029032   CR: 24042204  XER: 
> [2.406669]
> [2.406669] GPR00: c02f2244 c95abb60 c6262990 c95abd80 256a 0001 
> 0001 0001
> [2.406669] GPR08:  2000 0010 0010 24042202  
> 0100 c95abd88
> [2.406669] GPR16:  c05569d4 0001 0010 c95abc88 c0615664 
> 0004 
> [2.406669] GPR24: 0010 c95abc88 c95abc88  c61ae210 c7ff6d40 
> c61ae210 3d68
> [2.441559] NIP [c000c540] dma_nommu_map_page+0x44/0xd4
> [2.446720] LR [c000c584] dma_nommu_map_page+0x88/0xd4
> [2.451762] Call Trace:
> [2.454195] [c95abb60] [82000808] 0x82000808 (unreliable)
> [2.459572] [c95abb80] [c02f2244] talitos_edesc_alloc+0xbc/0x3c8
> [2.465493] [c95abbb0] [c02f2600] ablkcipher_edesc_alloc+0x4c/0x5c
> [2.471606] [c95abbd0] [c02f4ed0] ablkcipher_encrypt+0x20/0x64
> [2.477389] [c95abbe0] [c02023b0] __test_skcipher+0x4bc/0xa08
> [2.483049] [c95abe00] [c0204b60] test_skcipher+0x2c/0xcc
> [2.488385] [c95abe20] [c0204c48] alg_test_skcipher+0x48/0xbc
> [2.494064] [c95abe40] [c0205cec] alg_test+0x164/0x2e8
> [2.499142] [c95abf00] [c0200dec] cryptomgr_test+0x48/0x50
> [2.504558] [c95abf10] [c0039ff4] kthread+0xe4/0x110
> [2.509471] [c95abf40] [c000e1d0] ret_from_kernel_thread+0x14/0x1c
> [2.515532] Instruction dump:
> [2.518468] 7c7e1b78 7c9d2378 7cbf2b78 41820054 3d20c076 8089c200 3d20c076 
> 7c84e850
> [2.526127] 8129c204 7c842e70 7f844840 419c0008 <0fe0> 2f9e 
> 54847022 7c84fa14
> [2.533960] ---[ end trace bf78d94af73fe3b8 ]---
> [2.539123] talitos ff02.crypto: master data transfer error
> [2.544775] talitos ff02.crypto: TEA error: ISR 0x2000_0040
> [2.551625] alg: skcipher: encryption failed on test 1 for 
> ecb-aes-talitos: ret=22
> 
> IV cannot be on stack when CONFIG_VMAP_STACK is selected because the stack
> cannot be DMA mapped anymore.
> 
> This patch copies the IV into the extended descriptor.
> 
> Fixes: 4de9d0b547b9 ("crypto: talitos - Add ablkcipher algorithms")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Christophe Leroy 
Reviewed-by: Horia Geantă 

Thanks,
Horia


Re: [PATCH v4 1/2] crypto: talitos - reorder code in talitos_edesc_alloc()

2019-01-08 Thread Horia Geanta
On 1/8/2019 8:56 AM, Christophe Leroy wrote:
> This patch moves the mapping of IV after the kmalloc(). This
> avoids having to unmap in case kmalloc() fails.
> 
> Signed-off-by: Christophe Leroy 
Reviewed-by: Horia Geantă 

Since patch 2/2 is Cc-ing stable, this one should do the same.
Herbert, could you append the commit message before applying it?

Thanks,
Horia


Re: [PATCH v2 2/2] powerpc: use probe_user_read()

2019-01-08 Thread Christophe Leroy

Hi Michael and Russell,

Any idea why:
- checkpatch reports missing Signed-off-by:
- Snowpatch build fails on PPC64 (it seems unrelated to the patch, 
something wrong in lib/genalloc.c)


Thanks
Christophe

Le 08/01/2019 à 08:37, Christophe Leroy a écrit :

Instead of opencoding, use probe_user_read() to failessly
read a user location.

Signed-off-by: Christophe Leroy 
---
  v2: Using probe_user_read() instead of probe_user_address()

  arch/powerpc/kernel/process.c   | 12 +---
  arch/powerpc/mm/fault.c |  6 +-
  arch/powerpc/perf/callchain.c   | 20 +++-
  arch/powerpc/perf/core-book3s.c |  8 +---
  arch/powerpc/sysdev/fsl_pci.c   | 10 --
  5 files changed, 10 insertions(+), 46 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index ce393df243aa..6a4b59d574c2 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1298,16 +1298,6 @@ void show_user_instructions(struct pt_regs *regs)
  
  	pc = regs->nip - (NR_INSN_TO_PRINT * 3 / 4 * sizeof(int));
  
-	/*

-* Make sure the NIP points at userspace, not kernel text/data or
-* elsewhere.
-*/
-   if (!__access_ok(pc, NR_INSN_TO_PRINT * sizeof(int), USER_DS)) {
-   pr_info("%s[%d]: Bad NIP, not dumping instructions.\n",
-   current->comm, current->pid);
-   return;
-   }
-
seq_buf_init(, buf, sizeof(buf));
  
  	while (n) {

@@ -1318,7 +1308,7 @@ void show_user_instructions(struct pt_regs *regs)
for (i = 0; i < 8 && n; i++, n--, pc += sizeof(int)) {
int instr;
  
-			if (probe_kernel_address((const void *)pc, instr)) {

+   if (probe_user_read(, (void __user *)pc, 
sizeof(instr))) {
seq_buf_printf(, " ");
continue;
}
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 887f11bcf330..ec74305fa330 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -276,12 +276,8 @@ static bool bad_stack_expansion(struct pt_regs *regs, 
unsigned long address,
if ((flags & FAULT_FLAG_WRITE) && (flags & FAULT_FLAG_USER) &&
access_ok(nip, sizeof(*nip))) {
unsigned int inst;
-   int res;
  
-			pagefault_disable();

-   res = __get_user_inatomic(inst, nip);
-   pagefault_enable();
-   if (!res)
+   if (!probe_user_read(, nip, sizeof(inst)))
return !store_updates_sp(inst);
*must_retry = true;
}
diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index 0af051a1974e..0680efb2237b 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -159,12 +159,8 @@ static int read_user_stack_64(unsigned long __user *ptr, 
unsigned long *ret)
((unsigned long)ptr & 7))
return -EFAULT;
  
-	pagefault_disable();

-   if (!__get_user_inatomic(*ret, ptr)) {
-   pagefault_enable();
+   if (!probe_user_read(ret, ptr, sizeof(*ret)))
return 0;
-   }
-   pagefault_enable();
  
  	return read_user_stack_slow(ptr, ret, 8);

  }
@@ -175,12 +171,8 @@ static int read_user_stack_32(unsigned int __user *ptr, 
unsigned int *ret)
((unsigned long)ptr & 3))
return -EFAULT;
  
-	pagefault_disable();

-   if (!__get_user_inatomic(*ret, ptr)) {
-   pagefault_enable();
+   if (!probe_user_read(ret, ptr, sizeof(*ret)))
return 0;
-   }
-   pagefault_enable();
  
  	return read_user_stack_slow(ptr, ret, 4);

  }
@@ -307,17 +299,11 @@ static inline int current_is_64bit(void)
   */
  static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
  {
-   int rc;
-
if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
((unsigned long)ptr & 3))
return -EFAULT;
  
-	pagefault_disable();

-   rc = __get_user_inatomic(*ret, ptr);
-   pagefault_enable();
-
-   return rc;
+   return probe_user_read(ret, ptr, sizeof(*ret));
  }
  
  static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index b0723002a396..4b64ddf0db68 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -416,7 +416,6 @@ static void power_pmu_sched_task(struct perf_event_context 
*ctx, bool sched_in)
  static __u64 power_pmu_bhrb_to(u64 addr)
  {
unsigned int instr;
-   int ret;
__u64 target;
  
  	if (is_kernel_addr(addr)) {

@@ -427,13 +426,8 @@ static __u64 power_pmu_bhrb_to(u64 addr)
}
  
  	/* Userspace: need copy instruction here 

Re: [PATCH v8 24/25] powerpc: Adopt nvram module for PPC64

2019-01-08 Thread Michael Ellerman
Finn Thain  writes:
> On Mon, 7 Jan 2019, Michael Ellerman wrote:
>
>> Arnd Bergmann  writes:
>> > On Wed, Dec 26, 2018 at 1:43 AM Finn Thain  
>> > wrote:
>> >
>> >> +static ssize_t ppc_nvram_get_size(void)
>> >> +{
>> >> +   if (ppc_md.nvram_size)
>> >> +   return ppc_md.nvram_size();
>> >> +   return -ENODEV;
>> >> +}
>> >
>> >> +const struct nvram_ops arch_nvram_ops = {
>> >> +   .read   = ppc_nvram_read,
>> >> +   .write  = ppc_nvram_write,
>> >> +   .get_size   = ppc_nvram_get_size,
>> >> +   .sync   = ppc_nvram_sync,
>> >> +};
>> >
>> > Coming back to this after my comment on the m68k side, I notice that
>> > there is now a double indirection through function pointers. Have you
>> > considered completely removing the operations from ppc_md instead
>> > by having multiple copies of nvram_ops?
>> >
>> > With the current method, it does seem odd to have a single
>> > per-architecture instance of the exported structure containing
>> > function pointers. This doesn't give us the flexibility of having
>> > multiple copies in the kernel the way that ppc_md does, but it adds
>> > overhead compared to simply exporting the functions directly.
>> 
>> Yeah TBH I'm not convinced the arch ops is the best solution.
>> 
>> Why can't each arch just implement the required ops functions? On ppc
>> we'd still use ppc_md but that would be a ppc detail.
>> 
>> Optional ops are fairly easy to support by providing a default
>> implementation, eg. instead of:
>> 
>> +if (arch_nvram_ops.get_size == NULL)
>> +return -ENODEV;
>> +
>> +nvram_size = arch_nvram_ops.get_size();
>> +if (nvram_size < 0)
>> +return nvram_size;
>> 
>> 
>> We do in some header:
>> 
>> #ifndef arch_nvram_get_size
>> static inline int arch_nvram_get_size(void)
>> {
>>  return -ENODEV;
>> }
>> #endif
>> 
>> And then:
>> 
>>  nvram_size = arch_nvram_get_size();
>>  if (nvram_size < 0)
>>  return nvram_size;
>> 
>> 
>> But I haven't digested the whole series so maybe I'm missing something?
>> 
>
> The reason why that doesn't work boils down to introspection. (This was 
> mentioned elsewhere in this email thread.) For example, we presently have 
> code like this,
>
> ssize_t nvram_get_size(void)
> {
>if (ppc_md.nvram_size)
>return ppc_md.nvram_size();
>return -1;
> }
> EXPORT_SYMBOL(nvram_get_size);
>
> This construction means we get to decide at run-time which of the NVRAM 
> functions should be used. (Whereas your example makes a build-time decision.)

Right, but we only need to make a runtime decision on powerpc (right?).
So it seems to me we should isolate that in the powerpc code.

> The purpose of arch_nvram_ops is much the same. That is, it does for m68k 
> and x86 what ppc_md already does for ppc32 and ppc64. (And once these 
> platforms share an API like this, they can share more driver code, and 
> reduce duplicated code.)

> The approach taken in this series was to push the arch_nvram_ops approach 
> as far as possible, because by making everything fit into that regime it 
> immediately became apparent where architectures and platforms have 
> diverged, creating weird inconsistencies like the differences in sync 
> ioctl behaviour between ppc32 and ppc64 for core99. (Early revisions of 
> this series exposed more issues like bugs and dead code that got addressed 
> elsewhere.)

I just don't see the advantage of having arch_nvram_ops which is a
structure of function pointers that are always the same on a given arch,
vs some static inlines that implement the same ops for that arch.

> Problem is, as Arnd pointed out, powerpc doesn't need both kinds of ops 
> struct. So I'm rewriting this series in such a way that powerpc doesn't 
> have to implement both. This rewrite is going to look totally different 
> for powerpc (though not for x86 or m68k) so you might want to wait for me 
> to post v9 before spending more time on code review.

OK. I know you've been working on this series for a long time and I
don't want to roadblock it, so at the end of the day I don't feel that
strongly about it as long as the code works.

I'll wait for v9 and have another look then.

cheers


Re: [PATCH v3 1/2] selftests/powerpc: Add MSR bits

2019-01-08 Thread Michael Ellerman
Breno Leitao  writes:

> hi Christophe,
>
> On 1/3/19 3:19 PM, LEROY Christophe wrote:
>> Breno Leitao  a écrit :
>> 
>>> This patch simply adds definitions for the MSR bits and some macros to
>>> test for MSR TM bits.
>>>
>>> This was copied from arch/powerpc/include/asm/reg.h generic MSR part.
>> 
>> Can't we find a way to avoid duplicating such defines ?
>
> I think there are three possible ways, but none of them respect the premises
> we are used too. These are the possible ways I can think of:
>
> 1) Including arch/powerpc/include/asm as part of the selftest compilation
> process.
>Problem: This might break the selftest independence of the kbuild system.
>
> 2) Generate a temporary header file inside selftests/include which contains
> these macros at compilation time.
>Problem: The problem as above.
>
> 3) Define MSR fields at userspace headers (/usr/include).
>Problem: I am not sure userspace should have MSR bits information.
>
> Do you suggest me to investigate any other way?

In this case I think we can probably just copy the few #defines we need.

If we decide in future we need all or most of the MSR defines then I
would suggest we:
 - split *only* the MSR defines into arch/powerpc/include/asm/msr.h
 - include that file from reg.h
 - symlink that into the selftest code


cheers


Re: [PATCH v2 2/2] powerpc: use probe_user_read()

2019-01-08 Thread Christophe Leroy




Le 08/01/2019 à 10:04, David Hildenbrand a écrit :

On 08.01.19 08:37, Christophe Leroy wrote:

Instead of opencoding, use probe_user_read() to failessly
read a user location.

Signed-off-by: Christophe Leroy 
---
  v2: Using probe_user_read() instead of probe_user_address()

  arch/powerpc/kernel/process.c   | 12 +---
  arch/powerpc/mm/fault.c |  6 +-
  arch/powerpc/perf/callchain.c   | 20 +++-
  arch/powerpc/perf/core-book3s.c |  8 +---
  arch/powerpc/sysdev/fsl_pci.c   | 10 --
  5 files changed, 10 insertions(+), 46 deletions(-)



[snip]


diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index 918be816b097..c8a1b26489f5 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -1068,13 +1068,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
addr += mfspr(SPRN_MCAR);
  
  	if (is_in_pci_mem_space(addr)) {

-   if (user_mode(regs)) {
-   pagefault_disable();
-   ret = get_user(inst, (__u32 __user *)regs->nip);
-   pagefault_enable();
-   } else {
+   if (user_mode(regs))
+   ret = probe_user_read(, (void __user *)regs->nip,
+ sizeof(inst));


What about also adding probe_user_address ?


Michael doesn't like it, see https://patchwork.ozlabs.org/patch/1007117/

Christophe




+   else
ret = probe_kernel_address((void *)regs->nip, inst);
-   }
  
  		if (!ret && mcheck_handle_load(regs, inst)) {

regs->nip += 4;






Re: [PATCH v2 2/2] powerpc: use probe_user_read()

2019-01-08 Thread David Hildenbrand
On 08.01.19 08:37, Christophe Leroy wrote:
> Instead of opencoding, use probe_user_read() to failessly
> read a user location.
> 
> Signed-off-by: Christophe Leroy 
> ---
>  v2: Using probe_user_read() instead of probe_user_address()
> 
>  arch/powerpc/kernel/process.c   | 12 +---
>  arch/powerpc/mm/fault.c |  6 +-
>  arch/powerpc/perf/callchain.c   | 20 +++-
>  arch/powerpc/perf/core-book3s.c |  8 +---
>  arch/powerpc/sysdev/fsl_pci.c   | 10 --
>  5 files changed, 10 insertions(+), 46 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index ce393df243aa..6a4b59d574c2 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1298,16 +1298,6 @@ void show_user_instructions(struct pt_regs *regs)
>  
>   pc = regs->nip - (NR_INSN_TO_PRINT * 3 / 4 * sizeof(int));
>  
> - /*
> -  * Make sure the NIP points at userspace, not kernel text/data or
> -  * elsewhere.
> -  */
> - if (!__access_ok(pc, NR_INSN_TO_PRINT * sizeof(int), USER_DS)) {
> - pr_info("%s[%d]: Bad NIP, not dumping instructions.\n",
> - current->comm, current->pid);
> - return;
> - }
> -
>   seq_buf_init(, buf, sizeof(buf));
>  
>   while (n) {
> @@ -1318,7 +1308,7 @@ void show_user_instructions(struct pt_regs *regs)
>   for (i = 0; i < 8 && n; i++, n--, pc += sizeof(int)) {
>   int instr;
>  
> - if (probe_kernel_address((const void *)pc, instr)) {
> + if (probe_user_read(, (void __user *)pc, 
> sizeof(instr))) {
>   seq_buf_printf(, " ");
>   continue;
>   }
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 887f11bcf330..ec74305fa330 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -276,12 +276,8 @@ static bool bad_stack_expansion(struct pt_regs *regs, 
> unsigned long address,
>   if ((flags & FAULT_FLAG_WRITE) && (flags & FAULT_FLAG_USER) &&
>   access_ok(nip, sizeof(*nip))) {
>   unsigned int inst;
> - int res;
>  
> - pagefault_disable();
> - res = __get_user_inatomic(inst, nip);
> - pagefault_enable();
> - if (!res)
> + if (!probe_user_read(, nip, sizeof(inst)))
>   return !store_updates_sp(inst);
>   *must_retry = true;
>   }
> diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> index 0af051a1974e..0680efb2237b 100644
> --- a/arch/powerpc/perf/callchain.c
> +++ b/arch/powerpc/perf/callchain.c
> @@ -159,12 +159,8 @@ static int read_user_stack_64(unsigned long __user *ptr, 
> unsigned long *ret)
>   ((unsigned long)ptr & 7))
>   return -EFAULT;
>  
> - pagefault_disable();
> - if (!__get_user_inatomic(*ret, ptr)) {
> - pagefault_enable();
> + if (!probe_user_read(ret, ptr, sizeof(*ret)))
>   return 0;
> - }
> - pagefault_enable();
>  
>   return read_user_stack_slow(ptr, ret, 8);
>  }
> @@ -175,12 +171,8 @@ static int read_user_stack_32(unsigned int __user *ptr, 
> unsigned int *ret)
>   ((unsigned long)ptr & 3))
>   return -EFAULT;
>  
> - pagefault_disable();
> - if (!__get_user_inatomic(*ret, ptr)) {
> - pagefault_enable();
> + if (!probe_user_read(ret, ptr, sizeof(*ret)))
>   return 0;
> - }
> - pagefault_enable();
>  
>   return read_user_stack_slow(ptr, ret, 4);
>  }
> @@ -307,17 +299,11 @@ static inline int current_is_64bit(void)
>   */
>  static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
>  {
> - int rc;
> -
>   if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
>   ((unsigned long)ptr & 3))
>   return -EFAULT;
>  
> - pagefault_disable();
> - rc = __get_user_inatomic(*ret, ptr);
> - pagefault_enable();
> -
> - return rc;
> + return probe_user_read(ret, ptr, sizeof(*ret));
>  }
>  
>  static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx 
> *entry,
> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index b0723002a396..4b64ddf0db68 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -416,7 +416,6 @@ static void power_pmu_sched_task(struct 
> perf_event_context *ctx, bool sched_in)
>  static __u64 power_pmu_bhrb_to(u64 addr)
>  {
>   unsigned int instr;
> - int ret;
>   __u64 target;
>  
>   if (is_kernel_addr(addr)) {
> @@ -427,13 +426,8 @@ static __u64 power_pmu_bhrb_to(u64 addr)
>   }
>  
>   /* Userspace: need copy instruction here then translate it */
> 

  1   2   >