Re: [PATCH] powerpc: Add barrier_nospec to raw_copy_in_user()

2019-12-15 Thread Michal Suchánek
On Thu, Sep 19, 2019 at 12:04:27AM +1000, Michael Ellerman wrote:
> Michal Suchánek  writes:
> > On Wed,  6 Mar 2019 12:10:38 +1100
> > Suraj Jitindar Singh  wrote:
> >
> >> Commit ddf35cf3764b ("powerpc: Use barrier_nospec in copy_from_user()")
> >> added barrier_nospec() before loading from user-controlled pointers.
> >> The intention was to order the load from the potentially user-controlled
> >> pointer vs a previous branch based on an access_ok() check or similar.
> >> 
> >> In order to achieve the same result, add a barrier_nospec to the
> >> raw_copy_in_user() function before loading from such a user-controlled
> >> pointer.
> >> 
> >> Signed-off-by: Suraj Jitindar Singh 
> >> ---
> >>  arch/powerpc/include/asm/uaccess.h | 1 +
> >>  1 file changed, 1 insertion(+)
> >> 
> >> diff --git a/arch/powerpc/include/asm/uaccess.h 
> >> b/arch/powerpc/include/asm/uaccess.h
> >> index e3a731793ea2..bb615592d5bb 100644
> >> --- a/arch/powerpc/include/asm/uaccess.h
> >> +++ b/arch/powerpc/include/asm/uaccess.h
> >> @@ -306,6 +306,7 @@ extern unsigned long __copy_tofrom_user(void __user *to,
> >>  static inline unsigned long
> >>  raw_copy_in_user(void __user *to, const void __user *from, unsigned long n)
> >>  {
> >> +  barrier_nospec();
> >>return __copy_tofrom_user(to, from, n);
> >>  }
> >>  #endif /* __powerpc64__ */
> >
> > Hello,
> >
> > AFAICT this is not needed. The data cannot be used in the kernel without
> > subsequent copy_from_user which already has the nospec barrier.
> 
> Suraj did talk to me about this before sending the patch. My memory is
> that he came up with a scenario where it was at least theoretically
> useful, but he didn't mention that in the change log. He was working on
> nested KVM at the time.
> 
> I've now forgotten, and he's moved on to other things so probably won't
> remember either, if he's even checking his mail :/

In the absence of any argument for the code, and of the same code on
other architectures, I would say you were just confused when merging
this. The code is confusing, after all.

Thanks

Michal


Re: [PATCH] powerpc/fault: kernel can extend a user process's stack

2019-12-10 Thread Michal Suchánek
On Wed, Dec 11, 2019 at 12:43:37PM +1100, Daniel Axtens wrote:
> If a process page-faults trying to write beyond the end of its
> stack, we attempt to grow the stack.
> 
> However, if the kernel attempts to write beyond the end of a
> process's stack, it takes a bad fault. This can occur when the
> kernel is trying to set up a signal frame.
> 
> Permit the kernel to grow a process's stack. The same general
> limits as to how and when the stack can be grown apply: the kernel
> code still passes through expand_stack(), so anything that goes
> beyond e.g. the rlimit should still be blocked.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=205183
Fixes: 14cf11af6cf6 ("powerpc: Merge enough to start building in arch/powerpc.")
> Reported-by: Tom Lane 
> Cc: Daniel Black 
> Signed-off-by: Daniel Axtens 
> ---
>  arch/powerpc/mm/fault.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index b5047f9b5dec..00183731ea22 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -287,7 +287,17 @@ static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address,
>   if (!res)
>   return !store_updates_sp(inst);
>   *must_retry = true;
> + } else if ((flags & FAULT_FLAG_WRITE) &&
> +!(flags & FAULT_FLAG_USER)) {
> + /*
> +  * the kernel can also attempt to write beyond the end
> +  * of a process's stack - for example setting up a
> +  * signal frame. We assume this is valid, subject to
> +  * the checks in expand_stack() later.
> +  */
> + return false;
>   }
> +
>   return true;
>   }
>   return false;
> -- 
> 2.20.1
> 


Re: [PATCH v2 29/35] powerpc/perf: remove current_is_64bit()

2019-11-27 Thread Michal Suchánek
On Wed, Nov 27, 2019 at 06:41:09AM +0100, Christophe Leroy wrote:
> 
> 
> Le 26/11/2019 à 21:13, Michal Suchanek a écrit :
> > Since commit ed1cd6deb013 ("powerpc: Activate CONFIG_THREAD_INFO_IN_TASK")
> > current_is_64bit() is equivalent to !is_32bit_task().
> > Remove the redundant function.
> > 
> > Link: https://github.com/linuxppc/issues/issues/275
> > Link: https://lkml.org/lkml/2019/9/12/540
> > 
> > Fixes: linuxppc#275
> > Suggested-by: Christophe Leroy 
> > Signed-off-by: Michal Suchanek 
> 
> This change is already in powerpc/next, see 
> https://github.com/linuxppc/linux/commit/42484d2c0f82b666292faf6668c77b49a3a04bc0

Right, needs rebase.

Thanks

Michal
> 
> Christophe
> 
> > ---
> >   arch/powerpc/perf/callchain.c | 17 +
> >   1 file changed, 1 insertion(+), 16 deletions(-)
> > 
> > diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> > index c84bbd4298a0..35d542515faf 100644
> > --- a/arch/powerpc/perf/callchain.c
> > +++ b/arch/powerpc/perf/callchain.c
> > @@ -284,16 +284,6 @@ static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
> > }
> >   }
> > -static inline int current_is_64bit(void)
> > -{
> > -   /*
> > -* We can't use test_thread_flag() here because we may be on an
> > -* interrupt stack, and the thread flags don't get copied over
> > -* from the thread_info on the main stack to the interrupt stack.
> > -*/
> > -   return !test_ti_thread_flag(task_thread_info(current), TIF_32BIT);
> > -}
> > -
> >   #else  /* CONFIG_PPC64 */
> >   /*
> >* On 32-bit we just access the address and let hash_page create a
> > @@ -321,11 +311,6 @@ static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry
> >   {
> >   }
> > -static inline int current_is_64bit(void)
> > -{
> > -   return 0;
> > -}
> > -
> >   static inline int valid_user_sp(unsigned long sp, int is_64)
> >   {
> > if (!sp || (sp & 7) || sp > TASK_SIZE - 32)
> > @@ -486,7 +471,7 @@ static void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
> >   void
> >   perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
> >   {
> > -   if (current_is_64bit())
> > +   if (!is_32bit_task())
> > perf_callchain_user_64(entry, regs);
> > else
> > perf_callchain_user_32(entry, regs);
> > 


Re: [PATCH v3 2/4] powerpc/fadump: reorganize /sys/kernel/fadump_* sysfs files

2019-11-24 Thread Michal Suchánek
On Sat, Nov 16, 2019 at 08:07:29PM +0530, Sourabh Jain wrote:
> 
> 
> On 11/9/19 6:29 PM, Michal Suchánek wrote:
> > On Sat, Nov 09, 2019 at 05:53:37PM +0530, Sourabh Jain wrote:
> >> As the number of FADump sysfs files increases it is hard to manage all of
> >> them inside /sys/kernel directory. It's better to have all the FADump
> >> related sysfs files in a dedicated directory /sys/kernel/fadump. But in
> >> order to maintain the backward compatibility the /sys/kernel/fadump_*
> >> sysfs files are replicated inside /sys/kernel/fadump/ and eventually get
> >> removed in future.
> >>
> >> As the FADump sysfs files are now part of dedicated directory there is no
> >> need to prefix their name with fadump_, hence sysfs file names are also
> >> updated. For example, the fadump_enabled sysfs file is now referred to as enabled.
> >>
> >> Also consolidate ABI documentation for all the FADump sysfs files in a
> >> single file Documentation/ABI/testing/sysfs-kernel-fadump.
> >>
> >> Signed-off-by: Sourabh Jain 
> >> ---
> >>  Documentation/ABI/testing/sysfs-kernel-fadump | 41 +++
> >>  arch/powerpc/kernel/fadump.c  | 38 +
> >>  arch/powerpc/platforms/powernv/opal-core.c| 10 +++--
> >>  3 files changed, 86 insertions(+), 3 deletions(-)
> >>  create mode 100644 Documentation/ABI/testing/sysfs-kernel-fadump
> >>
> >> diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump 
> >> b/Documentation/ABI/testing/sysfs-kernel-fadump
> >> new file mode 100644
> >> index ..a77f1a5ba389
> >> --- /dev/null
> >> +++ b/Documentation/ABI/testing/sysfs-kernel-fadump
> >> @@ -0,0 +1,41 @@
> >> +What: /sys/kernel/fadump/*
> >> +Date: Nov 2019
> >> +Contact:  linuxppc-dev@lists.ozlabs.org
> >> +Description:
> >> +  The /sys/kernel/fadump/* is a collection of FADump sysfs
> >> +  files that provide information about the configuration status
> >> +  of Firmware Assisted Dump (FADump).
> >> +
> >> +What: /sys/kernel/fadump/enabled
> >> +Date: Nov 2019
> >> +Contact:  linuxppc-dev@lists.ozlabs.org
> >> +Description:  read only
> >> +  Primarily used to identify whether the FADump is enabled in
> >> +  the kernel or not.
> >> +User: Kdump service
> >> +
> >> +What: /sys/kernel/fadump/registered
> >> +Date: Nov 2019
> >> +Contact:  linuxppc-dev@lists.ozlabs.org
> >> +Description:  read/write
> >> +  Helps to control the dump collection feature from userspace.
> >> +  Writing 1 to this file enables the system to collect the
> >> +  dump and writing 0 disables it.
> >> +User: Kdump service
> >> +
> >> +What: /sys/kernel/fadump/release_mem
> >> +Date: Nov 2019
> >> +Contact:  linuxppc-dev@lists.ozlabs.org
> >> +Description:  write only
> >> +  This is a special sysfs file and is only available when
> >> +  the system is booted to capture the vmcore using FADump.
> >> +  It is used to release the memory reserved by FADump to
> >> +  save the crash dump.
> >> +
> >> +What: /sys/kernel/fadump/release_opalcore
> >> +Date: Nov 2019
> >> +Contact:  linuxppc-dev@lists.ozlabs.org
> >> +Description:  write only
> >> +  The sysfs file is available when the system is booted to
> >> +  collect the dump on an OPAL-based machine. It is used to
> >> +  release the memory used to collect the opalcore.
> >> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> >> index ed59855430b9..a9591def0c84 100644
> >> --- a/arch/powerpc/kernel/fadump.c
> >> +++ b/arch/powerpc/kernel/fadump.c
> >> @@ -1418,6 +1418,9 @@ static int fadump_region_show(struct seq_file *m, 
> >> void *private)
> >>return 0;
> >>  }
> >>  
> >> +struct kobject *fadump_kobj;
> >> +EXPORT_SYMBOL_GPL(fadump_kobj);
> >> +
> >>  static struct kobj_attribute fadump_release_attr = 
> >> __ATTR(fadump_release_mem,
> >>0200, NULL,
> >>fadump_release_memory_store);
> >> @@ -1428,6 

Re: WARN_ONCE does not warn once

2019-11-15 Thread Michal Suchánek
On Fri, Nov 15, 2019 at 09:44:34AM +0100, Michal Suchánek wrote:
> On Fri, Nov 15, 2019 at 03:43:24PM +1100, Michael Ellerman wrote:
> > Michal Suchánek  writes:
> > > On Thu, Nov 14, 2019 at 05:46:55PM +0100, Michal Suchánek wrote:
> > >> Hello,
> > >> 
> > >> on powernv with the 5.3.8 kernel I get a flood of messages on boot.
> > >> 
> > >> The messages match WARN_ONCE(1, "opal: OPAL_CONSOLE_FLUSH missing.\n");
> > >
> > > Asking around it was pointed out that WARN_ONCE warns up to as many
> > > times as you have CPUs.
> > 
> > Does it?
> > 
> > > Did not bother counting the messages but it may very well be the case it
> > > is printed once for each CPU.
> > 
> > The way it's implemented is slightly racy, but I'd be surprised if every
> > CPU hit that race all at once.
> 
> Printing a warn_once this early probably forces some peculiar timing.
> grep  WARN.*__opal_flush_console dmesg.txt | wc -l gives exactly the
> number of CPUs as shown by lscpu.
> 
And this does not change when enforcing once using an atomic.

diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index d271accf224b..dd870124b804 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -425,6 +425,7 @@ static s64 __opal_flush_console(uint32_t vtermno)
s64 rc;
 
if (!opal_check_token(OPAL_CONSOLE_FLUSH)) {
+   static atomic_t warned;
__be64 evt;
 
/*
@@ -432,7 +433,8 @@ static s64 __opal_flush_console(uint32_t vtermno)
 * the console can still be flushed by calling the polling
 * function while it has OPAL_EVENT_CONSOLE_OUTPUT events.
 */
-   WARN_ONCE(1, "opal: OPAL_CONSOLE_FLUSH missing.\n");
+   if (!atomic_xchg(&warned, 1))
+   WARN(1, "opal: OPAL_CONSOLE_FLUSH missing.\n");
 
opal_poll_events();
if (!(be64_to_cpu(evt) & OPAL_EVENT_CONSOLE_OUTPUT))

Something more tricky is going on.

Thanks

Michal


Re: WARN_ONCE does not warn once

2019-11-14 Thread Michal Suchánek
On Thu, Nov 14, 2019 at 05:46:55PM +0100, Michal Suchánek wrote:
> Hello,
> 
> on powernv with the 5.3.8 kernel I get a flood of messages on boot.
> 
> The messages match WARN_ONCE(1, "opal: OPAL_CONSOLE_FLUSH missing.\n");

Asking around it was pointed out that WARN_ONCE warns up to as many
times as you have CPUs.

Did not bother counting the messages but it may very well be the case it
is printed once for each CPU.

Thanks

Michal


WARN_ONCE does not warn once

2019-11-14 Thread Michal Suchánek
Hello,

on powernv with the 5.3.8 kernel I get a flood of messages on boot.

The messages match WARN_ONCE(1, "opal: OPAL_CONSOLE_FLUSH missing.\n");

[ cut here ]
opal: OPAL_CONSOLE_FLUSH missing.
WARNING: CPU: 0 PID: 0 at ../arch/powerpc/platforms/powernv/opal.c:435 
__opal_flush_console+0xfc/0x110
Modules linked in:
Supported: No, Unreleased kernel
CPU: 0 PID: 0 Comm: swapper Not tainted 5.3.8-11.gea9b07d-default #1 SLE15-SP2 
(unreleased)
NIP:  c00b9b9c LR: c00b9b98 CTR: c0c06ca0
REGS: c158b8c0 TRAP: 0700   Not tainted  (5.3.8-11.gea9b07d-default)
MSR:  90021033   CR: 28004222  XER: 2000
CFAR: c0132df0 IRQMASK: 1 
GPR00: c00b9b98 c158bb50 c158c600 0022 
GPR04: c0e63b7a 0002 4f534e4f435f4c41 4853554c465f454c 
GPR08: 676e697373696d20  0027 90001033 
GPR12: c01cc080 c215   
GPR16: 313114c0  c14641e0  
GPR20:  c1765738 c1763738 0001 
GPR24: c15c1ed8 c1763318 c1762dec 0001 
GPR28:  c2130d18   
NIP [c00b9b9c] __opal_flush_console+0xfc/0x110
LR [c00b9b98] __opal_flush_console+0xf8/0x110
Call Trace:
[c158bb50] [c00b9b98] __opal_flush_console+0xf8/0x110 
(unreliable)
[c158bbd0] [c00b9dbc] opal_flush_console+0x2c/0x60
[c158bc00] [c080c180] udbg_opal_putc+0x80/0xd0
[c158bc50] [c0033660] udbg_write+0x90/0x140
[c158bc90] [c01c888c] console_unlock+0x32c/0x820
[c158bd70] [c01c9768] register_console+0x5d8/0x790
[c158be10] [c0fb9dcc] register_early_udbg_console+0x9c/0xb4
[c158be80] [c0fb99c8] setup_arch+0x78/0x39c
[c158bef0] [c0fb3c00] start_kernel+0xb4/0x6bc
[c158bf90] [c000c4d4] start_here_common+0x1c/0x20
Instruction dump:
6000 3860fffe 4ba8 6000 6000 3c62ff8d 3921 3d42fff6 
38637558 992afea8 480791f5 6000 <0fe0> 4b54 48079715 6000 
random: get_random_bytes called from print_oops_end_marker+0x6c/0xa0 with 
crng_init=0
---[ end trace  ]---

This is repeated numerous times and the kernel eventually boots.

Is that an issue with the WARN_ONCE implementation?

Thanks

Michal


Re: [PATCH] Fix wrong message when RFI Flush is disable

2019-11-14 Thread Michal Suchánek
On Thu, Nov 14, 2019 at 08:07:35PM +1100, Michael Ellerman wrote:
> On Thu, 2019-05-02 at 21:09:07 UTC, Gustavo Walbon wrote:
> > From: "Gustavo L. F. Walbon" 
> > 
> > The issue was showing "Mitigation" message via sysfs whatever the state of
> > "RFI Flush", but it should show "Vulnerable" when it is disabled.
> > 
> > If you have "L1D private" feature enabled and not "RFI Flush" you are
> > vulnerable to meltdown attacks.
> > 
> > "RFI Flush" is the key feature to mitigate the meltdown whatever the
> > "L1D private" state.
> > 
> > SEC_FTR_L1D_THREAD_PRIV is a feature for Power9 only.
> > 
> > So the message should be as the truth table shows.
> > CPU | L1D private | RFI Flush |   sysfs   |
> > | --- | - | - |
> >  P9 |False|   False   | Vulnerable
> >  P9 |False|   True| Mitigation: RFI Flush
> >  P9 |True |   False   | Vulnerable: L1D private per thread
> >  P9 |True |   True| Mitigation: RFI Flush, L1D private per
> > | |   | thread
> >  P8 |False|   False   | Vulnerable
> >  P8 |False|   True| Mitigation: RFI Flush
> > 
> > Output before this fix:
> >  # cat /sys/devices/system/cpu/vulnerabilities/meltdown
> >  Mitigation: RFI Flush, L1D private per thread
> >  # echo 0 > /sys/kernel/debug/powerpc/rfi_flush
> >  # cat /sys/devices/system/cpu/vulnerabilities/meltdown
> >  Mitigation: L1D private per thread
> > 
> > Output after fix:
> >  # cat /sys/devices/system/cpu/vulnerabilities/meltdown
> >  Mitigation: RFI Flush, L1D private per thread
> >  # echo 0 > /sys/kernel/debug/powerpc/rfi_flush
> >  # cat /sys/devices/system/cpu/vulnerabilities/meltdown
> >  Vulnerable: L1D private per thread
> > 
> > Link: https://github.com/linuxppc/issues/issues/243
> > 
> > Signed-off-by: Gustavo L. F. Walbon 
> > Signed-off-by: Mauro S. M. Rodrigues 
> 
> Applied to powerpc next, thanks.
> 
> https://git.kernel.org/powerpc/c/4e706af3cd8e1d0503c25332b30cad33c97ed442
> 
> cheers

Fixes: ff348355e9c7 ("powerpc/64s: Enhance the information in cpu_show_meltdown()")

Thanks

Michal


Re: [PATCH 31/33] powerpc/64: make buildable without CONFIG_COMPAT

2019-11-13 Thread Michal Suchánek
On Wed, Nov 13, 2019 at 01:02:34PM +1000, Nicholas Piggin wrote:
> Michal Suchanek's on November 13, 2019 2:52 am:
> > There are numerous references to 32bit functions in generic and 64bit
> > code so ifdef them out.
> > 
> > Signed-off-by: Michal Suchanek 
> 
> For the most part these seem okay to me.
> 
> > diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> > index 45f1d5e54671..35874119b398 100644
> > --- a/arch/powerpc/kernel/Makefile
> > +++ b/arch/powerpc/kernel/Makefile
> > @@ -44,16 +44,16 @@ CFLAGS_btext.o += -DDISABLE_BRANCH_PROFILING
> >  endif
> >  
> >  obj-y  := cputable.o ptrace.o syscalls.o \
> > -  irq.o align.o signal_32.o pmc.o vdso.o \
> > +  irq.o align.o signal_$(BITS).o pmc.o vdso.o \
> >process.o systbl.o idle.o \
> >signal.o sysfs.o cacheinfo.o time.o \
> >prom.o traps.o setup-common.o \
> >udbg.o misc.o io.o misc_$(BITS).o \
> >of_platform.o prom_parse.o
> > -obj-$(CONFIG_PPC64)+= setup_64.o sys_ppc32.o \
> > -  signal_64.o ptrace32.o \
> > +obj-$(CONFIG_PPC64)+= setup_64.o \
> >paca.o nvram_64.o firmware.o note.o \
> >syscall_64.o
> > +obj-$(CONFIG_COMPAT)   += sys_ppc32.o ptrace32.o signal_32.o
> >  obj-$(CONFIG_VDSO32)   += vdso32/
> >  obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o
> >  obj-$(CONFIG_HAVE_HW_BREAKPOINT)   += hw_breakpoint.o
> > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> > index 00173cc904ef..c339a984958f 100644
> > --- a/arch/powerpc/kernel/entry_64.S
> > +++ b/arch/powerpc/kernel/entry_64.S
> > @@ -52,8 +52,10 @@
> >  SYS_CALL_TABLE:
> > .tc sys_call_table[TC],sys_call_table
> >  
> > +#ifdef CONFIG_COMPAT
> >  COMPAT_SYS_CALL_TABLE:
> > .tc compat_sys_call_table[TC],compat_sys_call_table
> > +#endif
> >  
> >  /* This value is used to mark exception frames on the stack. */
> >  exception_marker:
> > diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
> > index 60436432399f..61678cb0e6a1 100644
> > --- a/arch/powerpc/kernel/signal.c
> > +++ b/arch/powerpc/kernel/signal.c
> > @@ -247,7 +247,6 @@ static void do_signal(struct task_struct *tsk)
> > sigset_t *oldset = sigmask_to_save();
> > struct ksignal ksig = { .sig = 0 };
> > int ret;
> > -   int is32 = is_32bit_task();
> >  
> > BUG_ON(tsk != current);
> >  
> > @@ -277,7 +276,7 @@ static void do_signal(struct task_struct *tsk)
> >  
> > rseq_signal_deliver(&ksig, tsk->thread.regs);
> >  
> > -   if (is32) {
> > +   if (is_32bit_task()) {
> > if (ksig.ka.sa.sa_flags & SA_SIGINFO)
> > ret = handle_rt_signal32(, oldset, tsk);
> > else
> 
> This is just a clean up I guess.

It also expands directly to if(0) or if(1) for the !COMPAT cases. I am
not sure how it would work with the intermediate variable.

There was a more complex change initially, but after some additional
cleanups, removing the variable is the only part left.

> 
> > diff --git a/arch/powerpc/kernel/syscall_64.c 
> > b/arch/powerpc/kernel/syscall_64.c
> > index d00cfc4a39a9..319ebd4f494d 100644
> > --- a/arch/powerpc/kernel/syscall_64.c
> > +++ b/arch/powerpc/kernel/syscall_64.c
> > @@ -17,7 +17,6 @@ typedef long (*syscall_fn)(long, long, long, long, long, long);
> >  
> >  long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, unsigned long r0, struct pt_regs *regs)
> >  {
> > -   unsigned long ti_flags;
> > syscall_fn f;
> >  
> > if (IS_ENABLED(CONFIG_PPC_BOOK3S))
> > @@ -64,8 +63,7 @@ long system_call_exception(long r3, long r4, long r5, 
> > long r6, long r7, long r8,
> >  
> > __hard_irq_enable();
> >  
> > -   ti_flags = current_thread_info()->flags;
> > -   if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
> > +   if (unlikely(current_thread_info()->flags & _TIF_SYSCALL_DOTRACE)) {
> > /*
> >  * We use the return value of do_syscall_trace_enter() as the
> >  * syscall number. If the syscall was rejected for any reason
> > @@ -81,7 +79,7 @@ long system_call_exception(long r3, long r4, long r5, 
> > long r6, long r7, long r8,
> > /* May be faster to do array_index_nospec? */
> > barrier_nospec();
> >  
> > -   if (unlikely(ti_flags & _TIF_32BIT)) {
> > +   if (unlikely(is_32bit_task())) {
> > f = (void *)compat_sys_call_table[r0];
> >  
> > r3 &= 0xULL;
> 
> I guess this is okay, I did want to be careful about where ti_flags
> was loaded exactly, but I think DOTRACE and 32BIT are not volatile.
> Is it possible to define _TIF_32BIT to zero for 64-bit !compat case
> and have the original branch 

Re: [PATCH v3 2/4] powerpc/fadump: reorganize /sys/kernel/fadump_* sysfs files

2019-11-09 Thread Michal Suchánek
On Sat, Nov 09, 2019 at 05:53:37PM +0530, Sourabh Jain wrote:
> As the number of FADump sysfs files increases it is hard to manage all of
> them inside /sys/kernel directory. It's better to have all the FADump
> related sysfs files in a dedicated directory /sys/kernel/fadump. But in
> order to maintain the backward compatibility the /sys/kernel/fadump_*
> sysfs files are replicated inside /sys/kernel/fadump/ and eventually get
> removed in future.
> 
> As the FADump sysfs files are now part of dedicated directory there is no
> need to prefix their name with fadump_, hence sysfs file names are also
> updated. For example, the fadump_enabled sysfs file is now referred to as enabled.
> 
> Also consolidate ABI documentation for all the FADump sysfs files in a
> single file Documentation/ABI/testing/sysfs-kernel-fadump.
> 
> Signed-off-by: Sourabh Jain 
> ---
>  Documentation/ABI/testing/sysfs-kernel-fadump | 41 +++
>  arch/powerpc/kernel/fadump.c  | 38 +
>  arch/powerpc/platforms/powernv/opal-core.c| 10 +++--
>  3 files changed, 86 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-kernel-fadump
> 
> diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump 
> b/Documentation/ABI/testing/sysfs-kernel-fadump
> new file mode 100644
> index ..a77f1a5ba389
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-kernel-fadump
> @@ -0,0 +1,41 @@
> +What:/sys/kernel/fadump/*
> +Date:Nov 2019
> +Contact: linuxppc-dev@lists.ozlabs.org
> +Description:
> + The /sys/kernel/fadump/* is a collection of FADump sysfs
> + files that provide information about the configuration status
> + of Firmware Assisted Dump (FADump).
> +
> +What:/sys/kernel/fadump/enabled
> +Date:Nov 2019
> +Contact: linuxppc-dev@lists.ozlabs.org
> +Description: read only
> + Primarily used to identify whether the FADump is enabled in
> + the kernel or not.
> +User:Kdump service
> +
> +What:/sys/kernel/fadump/registered
> +Date:Nov 2019
> +Contact: linuxppc-dev@lists.ozlabs.org
> +Description: read/write
> + Helps to control the dump collection feature from userspace.
> + Writing 1 to this file enables the system to collect the
> + dump and writing 0 disables it.
> +User:Kdump service
> +
> +What:/sys/kernel/fadump/release_mem
> +Date:Nov 2019
> +Contact: linuxppc-dev@lists.ozlabs.org
> +Description: write only
> + This is a special sysfs file and is only available when
> + the system is booted to capture the vmcore using FADump.
> + It is used to release the memory reserved by FADump to
> + save the crash dump.
> +
> +What:/sys/kernel/fadump/release_opalcore
> +Date:Nov 2019
> +Contact: linuxppc-dev@lists.ozlabs.org
> +Description: write only
> + The sysfs file is available when the system is booted to
> + collect the dump on an OPAL-based machine. It is used to
> + release the memory used to collect the opalcore.
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index ed59855430b9..a9591def0c84 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -1418,6 +1418,9 @@ static int fadump_region_show(struct seq_file *m, void 
> *private)
>   return 0;
>  }
>  
> +struct kobject *fadump_kobj;
> +EXPORT_SYMBOL_GPL(fadump_kobj);
> +
>  static struct kobj_attribute fadump_release_attr = __ATTR(fadump_release_mem,
>   0200, NULL,
>   fadump_release_memory_store);
> @@ -1428,6 +1431,16 @@ static struct kobj_attribute fadump_register_attr = 
> __ATTR(fadump_registered,
>   0644, fadump_register_show,
>   fadump_register_store);
>  
> +static struct kobj_attribute release_attr = __ATTR(release_mem,
> + 0200, NULL,
> + fadump_release_memory_store);
> +static struct kobj_attribute enable_attr = __ATTR(enabled,
> + 0444, fadump_enabled_show,
> + NULL);
> +static struct kobj_attribute register_attr = __ATTR(registered,
> + 0644, fadump_register_show,
> + fadump_register_store);
> +
>  DEFINE_SHOW_ATTRIBUTE(fadump_region);
>  
>  static void fadump_init_files(void)
> @@ -1435,6 +1448,11 @@ static void fadump_init_files(void)
>   struct dentry *debugfs_file;
>   int rc = 0;
>  
> + fadump_kobj = 

Re: [PATCH] powerpc/fadump: Remove duplicate message.

2019-11-07 Thread Michal Suchánek
On Thu, Oct 24, 2019 at 01:16:51PM +0200, Michal Suchánek wrote:
> On Thu, Oct 24, 2019 at 04:08:08PM +0530, Hari Bathini wrote:
> > 
> > Michal, thanks for looking into this.
> > 
> > On 23/10/19 11:26 PM, Michal Suchanek wrote:
> > > There is duplicate message about lack of support by firmware in
> > > fadump_reserve_mem and setup_fadump. Due to different capitalization it
> > > is clear that the one in setup_fadump is shown on boot. Remove the
> > > duplicate that is not shown.
> > 
> > Actually, the message in fadump_reserve_mem() is logged. 
> > fadump_reserve_mem()
> > executes first and sets fw_dump.fadump_enabled to `0`, if fadump is not 
> > supported.
> > So, the other message in setup_fadump() doesn't get logged anymore with 
> > recent
> > changes. The right thing to do would be to remove similar message in 
> > setup_fadump() instead.
> 
> I need to re-check with a recent kernel build. I saw the message from
> setup_fadump and not the one from fadump_reserve_mem but not sure what
> the platform init code looked like in the kernel I tested with.

Indeed, I was missing the patch that changes the capitalization in
fadump_reserve_mem. In my kernel both messages are the same and the one
from fadump_reserve_mem is displayed.

Thanks

Michal


Re: [RFC PATCH 00/27] current interrupt series plus scv syscall

2019-10-30 Thread Michal Suchánek
Hello,

On Wed, Oct 02, 2019 at 01:13:52PM +1000, Nicholas Piggin wrote:
> Michal Suchánek's on September 24, 2019 7:33 pm:
> > Hello,
> > 
> > can you mark the individual patches with RFC rather than the whole
> > series?
> 
> Hey, thanks for the reviews. I'll resend all but the last two patches
> soon for merge in the next window.

Will you resend these or should I cherry-pick the part I need for the
!COMPAT?

Thanks

Michal


Re: [PATCH] powerpc/fadump: Remove duplicate message.

2019-10-24 Thread Michal Suchánek
On Thu, Oct 24, 2019 at 04:08:08PM +0530, Hari Bathini wrote:
> 
> Michal, thanks for looking into this.
> 
> On 23/10/19 11:26 PM, Michal Suchanek wrote:
> > There is duplicate message about lack of support by firmware in
> > fadump_reserve_mem and setup_fadump. Due to different capitalization it
> > is clear that the one in setup_fadump is shown on boot. Remove the
> > duplicate that is not shown.
> 
> Actually, the message in fadump_reserve_mem() is logged. fadump_reserve_mem()
> executes first and sets fw_dump.fadump_enabled to `0`, if fadump is not 
> supported.
> So, the other message in setup_fadump() doesn't get logged anymore with recent
> changes. The right thing to do would be to remove similar message in 
> setup_fadump() instead.

I need to re-check with a recent kernel build. I saw the message from
setup_fadump and not the one from fadump_reserve_mem but not sure what
the platform init code looked like in the kernel I tested with.

Thanks

Michal


Re: [RFC PATCH 23/27] powerpc/64: system call implement the bulk of the logic in C

2019-10-02 Thread Michal Suchánek
On Sun, Sep 15, 2019 at 11:28:09AM +1000, Nicholas Piggin wrote:
> System call entry and particularly exit code is beyond the limit of what
> is reasonable to implement in asm.
> 
> This conversion moves all conditional branches out of the asm code,
> except for the case that all GPRs should be restored at exit.
> 
> Null syscall test is about 5% faster after this patch, because the exit
> work is handled under local_irq_disable, and the hard mask and pending
> interrupt replay is handled after that, which avoids games with MSR.
> 
> Signed-off-by: Nicholas Piggin 
> 
> v3:
> - Fix !KUAP build [mpe]
> - Fix BookE build/boot [mpe]
> - Don't trace irqs with MSR[RI]=0
> - Don't allow syscall_exit_prepare to be ftraced, because function
>   graph tracing which traces exits barfs after the IRQ state is
>   prepared for kernel exit.
> - Fix BE syscall table to use normal function descriptors now that they
>   are called from C.
> - Comment syscall_exit_prepare.
...
> -#if defined(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) && defined(CONFIG_PPC_SPLPAR)
> -BEGIN_FW_FTR_SECTION
> - /* see if there are any DTL entries to process */
> - ld  r10,PACALPPACAPTR(r13)  /* get ptr to VPA */
> - ld  r11,PACA_DTL_RIDX(r13)  /* get log read index */
> - addir10,r10,LPPACA_DTLIDX
> - LDX_BE  r10,0,r10   /* get log write index */
> - cmpdr11,r10
> - beq+33f
> - bl  accumulate_stolen_time
> - REST_GPR(0,r1)
> - REST_4GPRS(3,r1)
> - REST_2GPRS(7,r1)
> - addir9,r1,STACK_FRAME_OVERHEAD
> -33:
> -END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
> -#endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE && CONFIG_PPC_SPLPAR */
...
> diff --git a/arch/powerpc/kernel/syscall_64.c 
> b/arch/powerpc/kernel/syscall_64.c
> new file mode 100644
> index ..1d2529824588
> --- /dev/null
> +++ b/arch/powerpc/kernel/syscall_64.c
> @@ -0,0 +1,195 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +extern void __noreturn tabort_syscall(void);
> +
> +typedef long (*syscall_fn)(long, long, long, long, long, long);
> +
> +long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, unsigned long r0, struct pt_regs *regs)
> +{
> + unsigned long ti_flags;
> + syscall_fn f;
> +
> + BUG_ON(!(regs->msr & MSR_PR));
> +
> + if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) &&
> + unlikely(regs->msr & MSR_TS_T))
> + tabort_syscall();
> +
> + account_cpu_user_entry();
> +
> +#ifdef CONFIG_PPC_SPLPAR
> + if (IS_ENABLED(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) &&
> + firmware_has_feature(FW_FEATURE_SPLPAR)) {
> + struct lppaca *lp = get_lppaca();
> +
> + if (unlikely(local_paca->dtl_ridx != lp->dtl_idx))

sparse complains about this, and in time.c it is converted like this:
if (unlikely(local_paca->dtl_ridx != be64_to_cpu(lp->dtl_idx)))
The removed code has an LDX_BE there, so there should be some conversion
involved, right?

Thanks

Michal


Re: [RFC PATCH 27/27] powerpc/64s: system call support for scv/rfscv instructions

2019-10-02 Thread Michal Suchánek
On Sun, Sep 15, 2019 at 11:28:13AM +1000, Nicholas Piggin wrote:
> Add support for the scv instruction on POWER9 and later CPUs.
> 
> For now this implements the zeroth scv vector 'scv 0', as identical
> to 'sc' system calls, with the exception that lr is not preserved, and
> it is 64-bit only. There may yet be changes made to this ABI, so it's
> for testing only.
> 
> This also doesn't yet properly handle PR KVM, or the case where a guest
> is denied AIL=3 mode. I haven't added real mode entry points, so scv
> must not be used when AIL=0, but I haven't cleared the FSCR in this
> case.
> 
> This also implements a strange hack to handle the non-implemented
> vectors, scheduling a decrementer and expecting it to interrupt and
> replay pending soft masked interrupts. This is unlikely to be a good
> idea, and needs to actually do a proper handler and replay in case an
> interrupt is pending.
> 
> It may also require some errata handling before it can be safely used
> on all POWER9 CPUs, I have to look that up.
> 
> rfscv is implemented to return from scv type system calls. It can not
> be used to return from sc system calls because those are defined to
> preserve lr.
> 
> In a comparison of getpid syscall, the test program had scv taking
> about 3 more cycles in user mode (92 vs 89 for sc), due to lr handling.
> Total cycles taken for a getpid system call on POWER9 are improved from
> 436 to 345 (26.3% faster), mostly due to reducing mtmsr and mtspr.
...
> diff --git a/arch/powerpc/kernel/syscall_64.c 
> b/arch/powerpc/kernel/syscall_64.c
> index 034b52d3d78c..3e8aa5ae8ec8 100644
> --- a/arch/powerpc/kernel/syscall_64.c
> +++ b/arch/powerpc/kernel/syscall_64.c
> @@ -15,6 +15,77 @@ extern void __noreturn tabort_syscall(void);
>  
>  typedef long (*syscall_fn)(long, long, long, long, long, long);
>  
> +#ifdef CONFIG_PPC_BOOK3S
> +long system_call_vectored_exception(long r3, long r4, long r5, long r6, long 
> r7, long r8, unsigned long r0, struct pt_regs *regs)
> +{
> + unsigned long ti_flags;
> + syscall_fn f;
> +
> + BUG_ON(!(regs->msr & MSR_RI));
> + BUG_ON(!(regs->msr & MSR_PR));
> + BUG_ON(!FULL_REGS(regs));
> + BUG_ON(regs->softe != IRQS_ENABLED);
> +
> + if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) &&
> + unlikely(regs->msr & MSR_TS_T))
> + tabort_syscall();
> +
> + account_cpu_user_entry();
> +
> +#ifdef CONFIG_PPC_SPLPAR
> + if (IS_ENABLED(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) &&
> + firmware_has_feature(FW_FEATURE_SPLPAR)) {
> + struct lppaca *lp = get_lppaca();
> +
> + if (unlikely(local_paca->dtl_ridx != lp->dtl_idx))
This adds another instance of the missing endian conversion issue.
> + accumulate_stolen_time();
> + }
> +#endif

Thanks

Michal


Re: [RFC PATCH 24/27] powerpc/64s: interrupt return in C

2019-10-02 Thread Michal Suchánek
On Sun, Sep 15, 2019 at 11:28:10AM +1000, Nicholas Piggin wrote:
> Implement the bulk of interrupt return logic in C. The asm return code
> must handle a few cases: restoring full GPRs, and emulating stack store.
> 
> The asm return code is moved into 64e for now. The new logic has made
> allowance for 64e, but I don't have a full environment that works well
> to test it, and even booting in emulated qemu is not great for stress
> testing. 64e shouldn't be too far off working with this, given a bit
> more testing and auditing of the logic.
> 
> This is slightly faster on a POWER9 (page fault speed increases about
> 1.1%), probably due to reduced mtmsrd.
> 
> Signed-off-by: Nicholas Piggin 
...
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 24621e7e5033..45c1524b6c9e 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -524,6 +524,7 @@ void giveup_all(struct task_struct *tsk)
>  }
>  EXPORT_SYMBOL(giveup_all);
>  
> +#ifdef CONFIG_PPC_BOOK3S_64
This fails build on !BOOK3S_64 because restore_fpu and restore_altivec
are used exclusively from restore_math which is now BOOK3S_64 only.
>  /*
>   * The exception exit path calls restore_math() with interrupts hard disabled
>   * but the soft irq state not "reconciled". ftrace code that calls
> @@ -564,6 +565,7 @@ void notrace restore_math(struct pt_regs *regs)
>  
>   regs->msr = msr;
>  }
> +#endif
>  
>  static void save_all(struct task_struct *tsk)
>  {

Thanks

Michal


Re: [RFC PATCH 00/27] current interrupt series plus scv syscall

2019-10-02 Thread Michal Suchánek
Hello,

can you mark the individual patches with RFC rather than the whole
series?

Thanks

Michal

On Sun, Sep 15, 2019 at 11:27:46AM +1000, Nicholas Piggin wrote:
> My interrupt entry patches have finally collided with syscall and
> interrupt exit patches, so I'll merge the series. Most patches have
> been seen already, however there have been a number of small changes
> and fixes throughout the series.
> 
> The final two patches add support for 'scv' and 'rfscv' instructions.
> 
> I'm posting this out now so we can start considering ABI and userspace
> support. We have the PPC_FEATURE2_SCV hwcap for this.
> 
> Thanks,
> Nick
> 
> Nicholas Piggin (27):
>   powerpc/64s/exception: Introduce INT_DEFINE parameter block for code
> generation
>   powerpc/64s/exception: Add GEN_COMMON macro that uses INT_DEFINE
> parameters
>   powerpc/64s/exception: Add GEN_KVM macro that uses INT_DEFINE
> parameters
>   powerpc/64s/exception: Expand EXC_COMMON and EXC_COMMON_ASYNC macros
>   powerpc/64s/exception: Move all interrupt handlers to new style code
> gen macros
>   powerpc/64s/exception: Remove old INT_ENTRY macro
>   powerpc/64s/exception: Remove old INT_COMMON macro
>   powerpc/64s/exception: Remove old INT_KVM_HANDLER
>   powerpc/64s/exception: Add ISIDE option
>   powerpc/64s/exception: move real->virt switch into the common handler
>   powerpc/64s/exception: move soft-mask test to common code
>   powerpc/64s/exception: move KVM test to common code
>   powerpc/64s/exception: remove confusing IEARLY option
>   powerpc/64s/exception: remove the SPR saving patch code macros
>   powerpc/64s/exception: trim unused arguments from KVMTEST macro
>   powerpc/64s/exception: hdecrementer avoid touching the stack
>   powerpc/64s/exception: re-inline some handlers
>   powerpc/64s/exception: Clean up SRR specifiers
>   powerpc/64s/exception: add more comments for interrupt handlers
>   powerpc/64s/exception: only test KVM in SRR interrupts when PR KVM is
> supported
>   powerpc/64s/exception: soft nmi interrupt should not use
> ret_from_except
>   powerpc/64: system call remove non-volatile GPR save optimisation
>   powerpc/64: system call implement the bulk of the logic in C
>   powerpc/64s: interrupt return in C
>   powerpc/64s/exception: remove lite interrupt return
>   powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked
>   powerpc/64s: system call support for scv/rfscv instructions
> 
>  arch/powerpc/include/asm/asm-prototypes.h |   11 -
>  .../powerpc/include/asm/book3s/64/kup-radix.h |   24 +-
>  arch/powerpc/include/asm/cputime.h|   24 +
>  arch/powerpc/include/asm/exception-64s.h  |4 -
>  arch/powerpc/include/asm/head-64.h|2 +-
>  arch/powerpc/include/asm/hw_irq.h |4 +
>  arch/powerpc/include/asm/ppc_asm.h|2 +
>  arch/powerpc/include/asm/processor.h  |2 +-
>  arch/powerpc/include/asm/ptrace.h |3 +
>  arch/powerpc/include/asm/signal.h |3 +
>  arch/powerpc/include/asm/switch_to.h  |   11 +
>  arch/powerpc/include/asm/time.h   |4 +-
>  arch/powerpc/kernel/Makefile  |3 +-
>  arch/powerpc/kernel/cpu_setup_power.S |2 +-
>  arch/powerpc/kernel/dt_cpu_ftrs.c |1 +
>  arch/powerpc/kernel/entry_64.S|  964 ++--
>  arch/powerpc/kernel/exceptions-64e.S  |  254 +-
>  arch/powerpc/kernel/exceptions-64s.S  | 2046 -
>  arch/powerpc/kernel/process.c |2 +
>  arch/powerpc/kernel/signal.h  |2 -
>  arch/powerpc/kernel/syscall_64.c  |  422 
>  arch/powerpc/kernel/syscalls/syscall.tbl  |   22 +-
>  arch/powerpc/kernel/systbl.S  |9 +-
>  arch/powerpc/kernel/time.c|9 -
>  arch/powerpc/kernel/vector.S  |2 +-
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S   |   11 -
>  arch/powerpc/kvm/book3s_segment.S |7 -
>  27 files changed, 2458 insertions(+), 1392 deletions(-)
>  create mode 100644 arch/powerpc/kernel/syscall_64.c
> 
> -- 
> 2.23.0
> 


Re: [PATCH v7 5/6] powerpc/64: Make COMPAT user-selectable disabled on littleendian by default.

2019-09-14 Thread Michal Suchánek
On Tue, 03 Sep 2019 10:00:57 +1000
Michael Ellerman  wrote:

> Michal Suchánek  writes:
> > On Mon, 02 Sep 2019 12:03:12 +1000
> > Michael Ellerman  wrote:
> >  
> >> Michal Suchanek  writes:  
> >> > On bigendian ppc64 it is common to have 32bit legacy binaries but much
> >> > less so on littleendian.
> >> 
> >> I think the toolchain people will tell you that there is no 32-bit
> >> little endian ABI defined at all, if anything works it's by accident.  
> >
> > I have seen a piece of software that workarounds code issues on 64bit
> > by always compiling 32bit code. So it does work in some way.  
> 
> What software is that?

The only one I have seen is stockfish (v9)

> 
> > Also it has been pointed out that you can still switch to BE even with
> > the 'fast-switch' removed.  
> 
> Yes we have a proper syscall for endian switching, sys_switch_endian(),
> which is definitely supported.
> 
> But that *only* switches the endian-ness of the process, it does nothing
> to the syscall layer. So any process that switches to the other endian
> must endian flip syscall arguments (that aren't in registers), or flip
> back to the native endian before calling syscalls.

In other words just installing a chroot of binaries built for the other
endian won't work. You need something like qemu to do the syscall
translation or run full VM with a kernel that has the swapped endian
syscall ABI.

Thanks

Michal


Re: [PATCH v8 5/7] powerpc/64: make buildable without CONFIG_COMPAT

2019-09-12 Thread Michal Suchánek
On Thu, 12 Sep 2019 21:36:11 +0200
Christophe Leroy  wrote:

> Le 12/09/2019 à 20:26, Michal Suchánek a écrit :
> > On Thu, 12 Sep 2019 20:02:16 +0200
> > Christophe Leroy  wrote:
> >   
> >> Le 12/09/2019 à 19:26, Michal Suchanek a écrit :  
> >>> There are numerous references to 32bit functions in generic and 64bit
> >>> code so ifdef them out.
> >>>
> >>> Signed-off-by: Michal Suchanek 
> >>> ---
> >>> v2:
> >>> - fix 32bit ifdef condition in signal.c
> >>> - simplify the compat ifdef condition in vdso.c - 64bit is redundant
> >>> - simplify the compat ifdef condition in callchain.c - 64bit is redundant
> >>> v3:
> >>> - use IS_ENABLED and maybe_unused where possible
> >>> - do not ifdef declarations
> >>> - clean up Makefile
> >>> v4:
> >>> - further makefile cleanup
> >>> - simplify is_32bit_task conditions
> >>> - avoid ifdef in condition by using return
> >>> v5:
> >>> - avoid unreachable code on 32bit
> >>> - make is_current_64bit constant on !COMPAT
> >>> - add stub perf_callchain_user_32 to avoid some ifdefs
> >>> v6:
> >>> - consolidate current_is_64bit
> >>> v7:
> >>> - remove leftover perf_callchain_user_32 stub from previous series version
> >>> v8:
> >>> - fix build again - too trigger-happy with stub removal
> >>> - remove a vdso.c hunk that causes warning according to kbuild test robot
> >>> ---
> >>>arch/powerpc/include/asm/thread_info.h |  4 +--
> >>>arch/powerpc/kernel/Makefile   |  7 ++---
> >>>arch/powerpc/kernel/entry_64.S |  2 ++
> >>>arch/powerpc/kernel/signal.c   |  3 +-
> >>>arch/powerpc/kernel/syscall_64.c   |  6 ++--
> >>>arch/powerpc/kernel/vdso.c |  3 +-
> >>>arch/powerpc/perf/callchain.c  | 39 ++
> >>>7 files changed, 33 insertions(+), 31 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/include/asm/thread_info.h 
> >>> b/arch/powerpc/include/asm/thread_info.h
> >>> index 8e1d0195ac36..c128d8a48ea3 100644
> >>> --- a/arch/powerpc/include/asm/thread_info.h
> >>> +++ b/arch/powerpc/include/asm/thread_info.h
> >>> @@ -144,10 +144,10 @@ static inline bool test_thread_local_flags(unsigned 
> >>> int flags)
> >>>   return (ti->local_flags & flags) != 0;
> >>>}
> >>>
> >>> -#ifdef CONFIG_PPC64
> >>> +#ifdef CONFIG_COMPAT
> >>>#define is_32bit_task()(test_thread_flag(TIF_32BIT))
> >>>#else
> >>> -#define is_32bit_task()  (1)
> >>> +#define is_32bit_task()  (IS_ENABLED(CONFIG_PPC32))
> >>>#endif
> >>>
> >>>#if defined(CONFIG_PPC64)  
> >>
> >> [...]
> >>  
> >>> +static inline int current_is_64bit(void)
> >>> +{
> >>> + if (!IS_ENABLED(CONFIG_COMPAT))
> >>> + return IS_ENABLED(CONFIG_PPC64);
> >>> + /*
> >>> +  * We can't use test_thread_flag() here because we may be on an
> >>> +  * interrupt stack, and the thread flags don't get copied over
> >>> +  * from the thread_info on the main stack to the interrupt stack.
> >>> +  */
> >>> + return !test_ti_thread_flag(task_thread_info(current), TIF_32BIT);
> >>> +}  
> >>
> >>
> >> Since at least commit ed1cd6deb013 ("powerpc: Activate
> >> CONFIG_THREAD_INFO_IN_TASK")
> >> [https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ed1cd6d]
> >> the above comment is wrong and current_is_64bit() is equivalent to
> >> !is_32bit_task()
> >>
> >> See https://github.com/linuxppc/issues/issues/275
> >>
> >> Christophe  
> > 
> > I aim at changing the code as little as possible here. A separate patch
> > on top removing this function would be ok?  
> 
> Yes I agree. By making prior to this patch a separate patch which drops 
> current_is_64bit() would be good. And it would reduce the size of this 
> patch by approximately one third.

Indeed, removing it before makes this patch much cleaner.

Thanks

Michal


Re: [PATCH v8 5/7] powerpc/64: make buildable without CONFIG_COMPAT

2019-09-12 Thread Michal Suchánek
On Thu, 12 Sep 2019 20:02:16 +0200
Christophe Leroy  wrote:

> Le 12/09/2019 à 19:26, Michal Suchanek a écrit :
> > There are numerous references to 32bit functions in generic and 64bit
> > code so ifdef them out.
> > 
> > Signed-off-by: Michal Suchanek 
> > ---
> > v2:
> > - fix 32bit ifdef condition in signal.c
> > - simplify the compat ifdef condition in vdso.c - 64bit is redundant
> > - simplify the compat ifdef condition in callchain.c - 64bit is redundant
> > v3:
> > - use IS_ENABLED and maybe_unused where possible
> > - do not ifdef declarations
> > - clean up Makefile
> > v4:
> > - further makefile cleanup
> > - simplify is_32bit_task conditions
> > - avoid ifdef in condition by using return
> > v5:
> > - avoid unreachable code on 32bit
> > - make is_current_64bit constant on !COMPAT
> > - add stub perf_callchain_user_32 to avoid some ifdefs
> > v6:
> > - consolidate current_is_64bit
> > v7:
> > - remove leftover perf_callchain_user_32 stub from previous series version
> > v8:
> > - fix build again - too trigger-happy with stub removal
> > - remove a vdso.c hunk that causes warning according to kbuild test robot
> > ---
> >   arch/powerpc/include/asm/thread_info.h |  4 +--
> >   arch/powerpc/kernel/Makefile   |  7 ++---
> >   arch/powerpc/kernel/entry_64.S |  2 ++
> >   arch/powerpc/kernel/signal.c   |  3 +-
> >   arch/powerpc/kernel/syscall_64.c   |  6 ++--
> >   arch/powerpc/kernel/vdso.c |  3 +-
> >   arch/powerpc/perf/callchain.c  | 39 ++
> >   7 files changed, 33 insertions(+), 31 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/thread_info.h 
> > b/arch/powerpc/include/asm/thread_info.h
> > index 8e1d0195ac36..c128d8a48ea3 100644
> > --- a/arch/powerpc/include/asm/thread_info.h
> > +++ b/arch/powerpc/include/asm/thread_info.h
> > @@ -144,10 +144,10 @@ static inline bool test_thread_local_flags(unsigned 
> > int flags)
> > return (ti->local_flags & flags) != 0;
> >   }
> >   
> > -#ifdef CONFIG_PPC64
> > +#ifdef CONFIG_COMPAT
> >   #define is_32bit_task()   (test_thread_flag(TIF_32BIT))
> >   #else
> > -#define is_32bit_task()(1)
> > +#define is_32bit_task()(IS_ENABLED(CONFIG_PPC32))
> >   #endif
> >   
> >   #if defined(CONFIG_PPC64)  
> 
> [...]
> 
> > +static inline int current_is_64bit(void)
> > +{
> > +   if (!IS_ENABLED(CONFIG_COMPAT))
> > +   return IS_ENABLED(CONFIG_PPC64);
> > +   /*
> > +* We can't use test_thread_flag() here because we may be on an
> > +* interrupt stack, and the thread flags don't get copied over
> > +* from the thread_info on the main stack to the interrupt stack.
> > +*/
> > +   return !test_ti_thread_flag(task_thread_info(current), TIF_32BIT);
> > +}  
> 
> 
> Since at least commit ed1cd6deb013 ("powerpc: Activate 
> CONFIG_THREAD_INFO_IN_TASK") 
> [https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ed1cd6d]
>  
> the above comment is wrong and current_is_64bit() is equivalent to 
> !is_32bit_task()
> 
> See https://github.com/linuxppc/issues/issues/275
> 
> Christophe

I aim at changing the code as little as possible here. A separate patch
on top removing this function would be ok?

Thanks

Michal


Re: [PATCH v3] powerpc/64: system call implement the bulk of the logic in C

2019-09-05 Thread Michal Suchánek
On Thu, 5 Sep 2019 15:25:31 +
Christophe Leroy  wrote:

> On 09/05/2019 12:35 PM, Nicholas Piggin wrote:
> > System call entry and particularly exit code is beyond the limit of what
> > is reasonable to implement in asm.
> > 
> > This conversion moves all conditional branches out of the asm code,
> > except for the case that all GPRs should be restored at exit.
> > 
> > Null syscall test is about 5% faster after this patch, because the exit
> > work is handled under local_irq_disable, and the hard mask and pending
> > interrupt replay is handled after that, which avoids games with MSR.
> > 
> > Signed-off-by: Nicholas Piggin   
> 
> Cannot apply it on latest powerpc/merge. On what does that apply ?

The v2 series had 4 patches so presumably the previous 3 are missing
here.

Thanks

Michal


Re: [PATCH v2] powerpc/fadump: when fadump is supported register the fadump sysfs files.

2019-09-04 Thread Michal Suchánek
On Thu, 29 Aug 2019 10:58:16 +0530
Hari Bathini  wrote:

> On 28/08/19 10:57 PM, Michal Suchanek wrote:
> > Currently it is not possible to distinguish the case when fadump is
> > supported by firmware and disabled in kernel and completely unsupported
> > using the kernel sysfs interface. User can investigate the devicetree
> > but it is more reasonable to provide sysfs files in case we get some
> > fadumpv2 in the future.
> > 
> > With this patch sysfs files are available whenever fadump is supported
> > by firmware.
> > 
> > Signed-off-by: Michal Suchanek 
> > ---  
> 
> [...]
> 
> > -   if (!fw_dump.fadump_supported) {
> > +   if (!fw_dump.fadump_supported && fw_dump.fadump_enabled) {
> > printk(KERN_ERR "Firmware-assisted dump is not supported on"
> > " this hardware\n");
> > -   return 0;
> > }  
> 
> The above hunk is redundant with similar message already logged during
> early boot in fadump_reserve_mem() function. I am not strongly against
> this though. So...

I see this:
[0.00] debug: ignoring loglevel setting.
[0.00] Firmware-assisted dump is not supported on this hardware
[0.00] Reserving 256MB of memory at 128MB for crashkernel (System RAM: 
8192MB)
[0.00] Allocated 5832704 bytes for 2048 pacas at c7a8
[0.00] Page sizes from device-tree:
[0.00] base_shift=12: shift=12, sllp=0x, avpnm=0x, 
tlbiel=1, penc=0
[0.00] base_shift=16: shift=16, sllp=0x0110, avpnm=0x, 
tlbiel=1, penc=1
[0.00] Page orders: linear mapping = 16, virtual = 16, io = 16, vmemmap 
= 16
[0.00] Using 1TB segments
[0.00] Initializing hash mmu with SLB

Clearly the second message is logged from the above code. The duplicate
is capitalized: "Firmware-Assisted Dump is not supported on this
hardware" and I don't see it logged. So if anything is duplicate that
should be removed it is the message in fadump_reserve_mem(). It is not
clear why that one is not seen, though.

Thanks

Michal


Re: [PATCH v7 5/6] powerpc/64: Make COMPAT user-selectable disabled on littleendian by default.

2019-09-02 Thread Michal Suchánek
On Mon, 02 Sep 2019 12:03:12 +1000
Michael Ellerman  wrote:

> Michal Suchanek  writes:
> > On bigendian ppc64 it is common to have 32bit legacy binaries but much
> > less so on littleendian.  
> 
> I think the toolchain people will tell you that there is no 32-bit
> little endian ABI defined at all, if anything works it's by accident.

I have seen a piece of software that workarounds code issues on 64bit
by always compiling 32bit code. So it does work in some way. Also it
has been pointed out that you can still switch to BE even with the
'fast-switch' removed.

> 
> So I think we should not make this selectable, unless someone puts their
> hand up to say they want it and are willing to test it and keep it
> working.

I don't really care either way.

Thanks

Michal

> 
> cheers
> 
> > v3: make configurable
> > ---
> >  arch/powerpc/Kconfig | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index 5bab0bb6b833..b0339e892329 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -264,8 +264,9 @@ config PANIC_TIMEOUT
> > default 180
> >  
> >  config COMPAT
> > -   bool
> > -   default y if PPC64
> > +   bool "Enable support for 32bit binaries"
> > +   depends on PPC64
> > +   default y if !CPU_LITTLE_ENDIAN
> > select COMPAT_BINFMT_ELF
> > select ARCH_WANT_OLD_COMPAT_IPC
> > select COMPAT_OLD_SIGACTION
> > -- 
> > 2.22.0  



Re: [PATCH v7 3/6] powerpc/perf: consolidate read_user_stack_32

2019-09-02 Thread Michal Suchánek
On Mon, 02 Sep 2019 14:01:17 +1000
Michael Ellerman  wrote:

> Michael Ellerman  writes:
> > Michal Suchanek  writes:  
> ...
> >> @@ -295,6 +279,12 @@ static inline int current_is_64bit(void)
> >>  }
> >>  
> >>  #else  /* CONFIG_PPC64 */
> >> +static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> >> +{
> >> +  return 0;
> >> +}
> >> +#endif /* CONFIG_PPC64 */  
> >
> > Ending the PPC64 else case here, and then restarting it below with an
> > ifndef means we end up with two parts of the file that define 32-bit
> > code, with a common chunk in the middle, which I dislike.
> >
> > I'd rather you add the empty read_user_stack_slow() in the existing
> > #else section and then move read_user_stack_32() below the whole ifdef
> > PPC64/else/endif section.
> >
> > Is there some reason that doesn't work?  
> 
> Gah, I missed that you split the whole file later in the series. Any
> reason you did it in two steps rather than moving patch 6 earlier in the
> series?

To make this patch readable.

Thanks

Michal


Re: [PATCH v7 6/6] powerpc/perf: split callchain.c by bitness

2019-08-31 Thread Michal Suchánek
On Fri, 30 Aug 2019 23:03:43 +0200
Michal Suchanek  wrote:

> Building callchain.c with !COMPAT proved quite ugly with all the
> defines. Splitting out the 32bit and 64bit parts looks better.
> 
> No code change intended.

valid_user_sp is broken in compat. It needs to be ifdefed for the 32bit
case.
> 
> Signed-off-by: Michal Suchanek 
> Reviewed-by: Christophe Leroy 
> ---
> v6:
>  - move current_is_64bit consolidetaion to earlier patch
>  - move defines to the top of callchain_32.c
>  - Makefile cleanup
> ---
>  arch/powerpc/perf/Makefile   |   5 +-
>  arch/powerpc/perf/callchain.c| 365 +--
>  arch/powerpc/perf/callchain.h|  11 +
>  arch/powerpc/perf/callchain_32.c | 204 +
>  arch/powerpc/perf/callchain_64.c | 185 
>  5 files changed, 405 insertions(+), 365 deletions(-)
>  create mode 100644 arch/powerpc/perf/callchain.h
>  create mode 100644 arch/powerpc/perf/callchain_32.c
>  create mode 100644 arch/powerpc/perf/callchain_64.c
> 
> diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
> index c155dcbb8691..53d614e98537 100644
> --- a/arch/powerpc/perf/Makefile
> +++ b/arch/powerpc/perf/Makefile
> @@ -1,6 +1,9 @@
>  # SPDX-License-Identifier: GPL-2.0
>  
> -obj-$(CONFIG_PERF_EVENTS)+= callchain.o perf_regs.o
> +obj-$(CONFIG_PERF_EVENTS)+= callchain.o callchain_$(BITS).o perf_regs.o
> +ifdef CONFIG_COMPAT
> +obj-$(CONFIG_PERF_EVENTS)+= callchain_32.o
> +endif
>  
>  obj-$(CONFIG_PPC_PERF_CTRS)  += core-book3s.o bhrb.o
>  obj64-$(CONFIG_PPC_PERF_CTRS)+= ppc970-pmu.o power5-pmu.o \
> diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> index 9b8dc822f531..8f30f1b47c78 100644
> --- a/arch/powerpc/perf/callchain.c
> +++ b/arch/powerpc/perf/callchain.c
> @@ -15,11 +15,9 @@
>  #include 
>  #include 
>  #include 
> -#ifdef CONFIG_COMPAT
> -#include "../kernel/ppc32.h"
> -#endif
>  #include 
>  
> +#include "callchain.h"
>  
>  /*
>   * Is sp valid as the address of the next kernel stack frame after prev_sp?
> @@ -102,367 +100,6 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx 
> *entry, struct pt_regs *re
>   }
>  }
>  
> -#ifdef CONFIG_PPC64
> -/*
> - * On 64-bit we don't want to invoke hash_page on user addresses from
> - * interrupt context, so if the access faults, we read the page tables
> - * to find which page (if any) is mapped and access it directly.
> - */
> -static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> -{
> - int ret = -EFAULT;
> - pgd_t *pgdir;
> - pte_t *ptep, pte;
> - unsigned shift;
> - unsigned long addr = (unsigned long) ptr;
> - unsigned long offset;
> - unsigned long pfn, flags;
> - void *kaddr;
> -
> - pgdir = current->mm->pgd;
> - if (!pgdir)
> - return -EFAULT;
> -
> - local_irq_save(flags);
> - ptep = find_current_mm_pte(pgdir, addr, NULL, &shift);
> - if (!ptep)
> - goto err_out;
> - if (!shift)
> - shift = PAGE_SHIFT;
> -
> - /* align address to page boundary */
> - offset = addr & ((1UL << shift) - 1);
> -
> - pte = READ_ONCE(*ptep);
> - if (!pte_present(pte) || !pte_user(pte))
> - goto err_out;
> - pfn = pte_pfn(pte);
> - if (!page_is_ram(pfn))
> - goto err_out;
> -
> - /* no highmem to worry about here */
> - kaddr = pfn_to_kaddr(pfn);
> - memcpy(buf, kaddr + offset, nb);
> - ret = 0;
> -err_out:
> - local_irq_restore(flags);
> - return ret;
> -}
> -
> -static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
> -{
> - if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned long) ||
> - ((unsigned long)ptr & 7))
> - return -EFAULT;
> -
> - pagefault_disable();
> - if (!__get_user_inatomic(*ret, ptr)) {
> - pagefault_enable();
> - return 0;
> - }
> - pagefault_enable();
> -
> - return read_user_stack_slow(ptr, ret, 8);
> -}
> -
> -static inline int valid_user_sp(unsigned long sp, int is_64)
> -{
> - if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x100000000UL) - 32)
> - return 0;
> - return 1;
> -}
> -
> -/*
> - * 64-bit user processes use the same stack frame for RT and non-RT signals.
> - */
> -struct signal_frame_64 {
> - chardummy[__SIGNAL_FRAMESIZE];
> - struct ucontext uc;
> - unsigned long   unused[2];
> - unsigned inttramp[6];
> - struct siginfo  *pinfo;
> - void*puc;
> - struct siginfo  info;
> - charabigap[288];
> -};
> -
> -static int is_sigreturn_64_address(unsigned long nip, unsigned long fp)
> -{
> - if (nip == fp + offsetof(struct signal_frame_64, tramp))
> - return 1;
> - if (vdso64_rt_sigtramp && current->mm->context.vdso_base &&
> - nip == current->mm->context.vdso_base + vdso64_rt_sigtramp)
> - return 1;
> - return 0;
> -}
> -

Re: [PATCH v7 0/6] Disable compat cruft on ppc64le v7

2019-08-31 Thread Michal Suchánek
On Sat, 31 Aug 2019 08:41:58 +0200
Christophe Leroy  wrote:

> Le 30/08/2019 à 23:03, Michal Suchanek a écrit :
> > Less code means less bugs so add a knob to skip the compat stuff.  
> 
> I guess on PPC64 you have Gigabytes of memory and thousands of bogomips, 
> hence you focus on bugs.
> 
> My main focus usually is kernel size and performance, which makes this 
> series interesting as well.
> 
> Anyway, I was wondering, would it make sense (in a following series, not 
> in this one) to make it buildable as a module, just like some of binfmt ?

I think not.

There are two cases: 32bit support is not optional because the kernel
itself is 32bit, or it is optional because the kernel is 64bit. These
cases either diverge, or the 32bit case pays the penalty of the
indirection that makes the functionality loadable.

The indirection requires some synchronization so the 32bit code cannot
be unloaded while you are trying to call it, and possibly counting
32bit tasks so you will know when you can unload the 32bit code again.

This would add more code which benefits neither performance nor size
nor reliability.

Also you can presumably run 32bit code with binfmt-misc already but
don't get stuff like perf counters handled in the kernel because it is
not native code anymore.

Thanks

Michal

> 
> Christophe
> 
> > 
> > This is tested on ppc64le top of
> > 
> > https://patchwork.ozlabs.org/cover/1153556/
> > 
> > Changes in v2: saner CONFIG_COMPAT ifdefs
> > Changes in v3:
> >   - change llseek to 32bit instead of builing it unconditionally in fs
> >   - clanup the makefile conditionals
> >   - remove some ifdefs or convert to IS_DEFINED where possible
> > Changes in v4:
> >   - cleanup is_32bit_task and current_is_64bit
> >   - more makefile cleanup
> > Changes in v5:
> >   - more current_is_64bit cleanup
> >   - split off callchain.c 32bit and 64bit parts
> > Changes in v6:
> >   - cleanup makefile after split
> >   - consolidate read_user_stack_32
> >   - fix some checkpatch warnings
> > Changes in v7:
> >   - add back __ARCH_WANT_SYS_LLSEEK to fix build with llseek
> >   - remove leftover hunk
> >   - add review tags
> > 
> > Michal Suchanek (6):
> >powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro
> >powerpc: move common register copy functions from signal_32.c to
> >  signal.c
> >powerpc/perf: consolidate read_user_stack_32
> >powerpc/64: make buildable without CONFIG_COMPAT
> >powerpc/64: Make COMPAT user-selectable disabled on littleendian by
> >  default.
> >powerpc/perf: split callchain.c by bitness
> > 
> >   arch/powerpc/Kconfig   |   5 +-
> >   arch/powerpc/include/asm/thread_info.h |   4 +-
> >   arch/powerpc/include/asm/unistd.h  |   1 +
> >   arch/powerpc/kernel/Makefile   |   7 +-
> >   arch/powerpc/kernel/entry_64.S |   2 +
> >   arch/powerpc/kernel/signal.c   | 144 +-
> >   arch/powerpc/kernel/signal_32.c| 140 -
> >   arch/powerpc/kernel/syscall_64.c   |   6 +-
> >   arch/powerpc/kernel/vdso.c |   5 +-
> >   arch/powerpc/perf/Makefile |   5 +-
> >   arch/powerpc/perf/callchain.c  | 377 +
> >   arch/powerpc/perf/callchain.h  |  11 +
> >   arch/powerpc/perf/callchain_32.c   | 204 +
> >   arch/powerpc/perf/callchain_64.c   | 185 
> >   fs/read_write.c|   3 +-
> >   15 files changed, 566 insertions(+), 533 deletions(-)
> >   create mode 100644 arch/powerpc/perf/callchain.h
> >   create mode 100644 arch/powerpc/perf/callchain_32.c
> >   create mode 100644 arch/powerpc/perf/callchain_64.c
> >   



Re: [PATCH v7 4/6] powerpc/64: make buildable without CONFIG_COMPAT

2019-08-31 Thread Michal Suchánek
On Fri, 30 Aug 2019 23:03:41 +0200
Michal Suchanek  wrote:

> There are numerous references to 32bit functions in generic and 64bit
> code so ifdef them out.
> 
> Signed-off-by: Michal Suchanek 
> ---
> v2:
> - fix 32bit ifdef condition in signal.c
> - simplify the compat ifdef condition in vdso.c - 64bit is redundant
> - simplify the compat ifdef condition in callchain.c - 64bit is redundant
> v3:
> - use IS_ENABLED and maybe_unused where possible
> - do not ifdef declarations
> - clean up Makefile
> v4:
> - further makefile cleanup
> - simplify is_32bit_task conditions
> - avoid ifdef in condition by using return
> v5:
> - avoid unreachable code on 32bit
> - make is_current_64bit constant on !COMPAT
> - add stub perf_callchain_user_32 to avoid some ifdefs
> v6:
> - consolidate current_is_64bit
> v7:
> - remove leftover perf_callchain_user_32 stub from previous series version

removed too much stuff here

> ---
>  arch/powerpc/include/asm/thread_info.h |  4 ++--
>  arch/powerpc/kernel/Makefile   |  7 +++---
>  arch/powerpc/kernel/entry_64.S |  2 ++
>  arch/powerpc/kernel/signal.c   |  3 +--
>  arch/powerpc/kernel/syscall_64.c   |  6 ++---
>  arch/powerpc/kernel/vdso.c |  5 ++---
>  arch/powerpc/perf/callchain.c  | 31 +-
>  7 files changed, 27 insertions(+), 31 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/thread_info.h 
> b/arch/powerpc/include/asm/thread_info.h
> index 8e1d0195ac36..c128d8a48ea3 100644
> --- a/arch/powerpc/include/asm/thread_info.h
> +++ b/arch/powerpc/include/asm/thread_info.h
> @@ -144,10 +144,10 @@ static inline bool test_thread_local_flags(unsigned int 
> flags)
>   return (ti->local_flags & flags) != 0;
>  }
>  
> -#ifdef CONFIG_PPC64
> +#ifdef CONFIG_COMPAT
>  #define is_32bit_task()  (test_thread_flag(TIF_32BIT))
>  #else
> -#define is_32bit_task()  (1)
> +#define is_32bit_task()  (IS_ENABLED(CONFIG_PPC32))
>  #endif
>  
>  #if defined(CONFIG_PPC64)
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index 1d646a94d96c..9d8772e863b9 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -44,16 +44,15 @@ CFLAGS_btext.o += -DDISABLE_BRANCH_PROFILING
>  endif
>  
>  obj-y:= cputable.o ptrace.o syscalls.o \
> -irq.o align.o signal_32.o pmc.o vdso.o \
> +irq.o align.o signal_$(BITS).o pmc.o vdso.o \
>  process.o systbl.o idle.o \
>  signal.o sysfs.o cacheinfo.o time.o \
>  prom.o traps.o setup-common.o \
>  udbg.o misc.o io.o misc_$(BITS).o \
>  of_platform.o prom_parse.o
> -obj-$(CONFIG_PPC64)  += setup_64.o sys_ppc32.o \
> -signal_64.o ptrace32.o \
> -paca.o nvram_64.o firmware.o \
> +obj-$(CONFIG_PPC64)  += setup_64.o paca.o nvram_64.o firmware.o \
>  syscall_64.o
> +obj-$(CONFIG_COMPAT) += sys_ppc32.o ptrace32.o signal_32.o
>  obj-$(CONFIG_VDSO32) += vdso32/
>  obj-$(CONFIG_PPC_WATCHDOG)   += watchdog.o
>  obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 2ec825a85f5b..a2dbf216f607 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -51,8 +51,10 @@
>  SYS_CALL_TABLE:
>   .tc sys_call_table[TC],sys_call_table
>  
> +#ifdef CONFIG_COMPAT
>  COMPAT_SYS_CALL_TABLE:
>   .tc compat_sys_call_table[TC],compat_sys_call_table
> +#endif
>  
>  /* This value is used to mark exception frames on the stack. */
>  exception_marker:
> diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
> index 60436432399f..61678cb0e6a1 100644
> --- a/arch/powerpc/kernel/signal.c
> +++ b/arch/powerpc/kernel/signal.c
> @@ -247,7 +247,6 @@ static void do_signal(struct task_struct *tsk)
>   sigset_t *oldset = sigmask_to_save();
>   struct ksignal ksig = { .sig = 0 };
>   int ret;
> - int is32 = is_32bit_task();
>  
>   BUG_ON(tsk != current);
>  
> @@ -277,7 +276,7 @@ static void do_signal(struct task_struct *tsk)
>  
> >   rseq_signal_deliver(&ksig, tsk->thread.regs);
>  
> - if (is32) {
> + if (is_32bit_task()) {
>   if (ksig.ka.sa.sa_flags & SA_SIGINFO)
>   ret = handle_rt_signal32(, oldset, tsk);
>   else
> diff --git a/arch/powerpc/kernel/syscall_64.c 
> b/arch/powerpc/kernel/syscall_64.c
> index 98ed970796d5..0d5cbbe54cf1 100644
> --- a/arch/powerpc/kernel/syscall_64.c
> +++ b/arch/powerpc/kernel/syscall_64.c
> @@ -38,7 +38,6 @@ typedef long (*syscall_fn)(long, long, long, long, long, 
> long);
>  
>  long system_call_exception(long r3, long r4, long r5, long 

Re: [PATCH v6 4/6] powerpc/64: make buildable without CONFIG_COMPAT

2019-08-30 Thread Michal Suchánek
On Fri, 30 Aug 2019 22:21:09 +0200
Christophe Leroy  wrote:

> Le 30/08/2019 à 20:57, Michal Suchanek a écrit :
> > There are numerous references to 32bit functions in generic and 64bit
> > code so ifdef them out.
> > 
> > Signed-off-by: Michal Suchanek 
> > ---
> > v2:
> > - fix 32bit ifdef condition in signal.c
> > - simplify the compat ifdef condition in vdso.c - 64bit is redundant
> > - simplify the compat ifdef condition in callchain.c - 64bit is redundant
> > v3:
> > - use IS_ENABLED and maybe_unused where possible
> > - do not ifdef declarations
> > - clean up Makefile
> > v4:
> > - further makefile cleanup
> > - simplify is_32bit_task conditions
> > - avoid ifdef in condition by using return
> > v5:
> > - avoid unreachable code on 32bit
> > - make is_current_64bit constant on !COMPAT
> > - add stub perf_callchain_user_32 to avoid some ifdefs
> > v6:
> > - consolidate current_is_64bit
> > ---
> >   arch/powerpc/include/asm/thread_info.h |  4 +--
> >   arch/powerpc/kernel/Makefile   |  7 +++--
> >   arch/powerpc/kernel/entry_64.S |  2 ++
> >   arch/powerpc/kernel/signal.c   |  3 +--
> >   arch/powerpc/kernel/syscall_64.c   |  6 ++---
> >   arch/powerpc/kernel/vdso.c |  5 ++--
> >   arch/powerpc/perf/callchain.c  | 37 +++---
> >   7 files changed, 33 insertions(+), 31 deletions(-)
> >   
> 
> [...]
> 
> > diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> > index b7cdcce20280..788ad2c63f18 100644
> > --- a/arch/powerpc/perf/callchain.c
> > +++ b/arch/powerpc/perf/callchain.c
> > @@ -15,7 +15,7 @@
> >   #include 
> >   #include 
> >   #include 
> > -#ifdef CONFIG_PPC64
> > +#ifdef CONFIG_COMPAT
> >   #include "../kernel/ppc32.h"
> >   #endif
> >   #include 
> > @@ -268,16 +268,6 @@ static void perf_callchain_user_64(struct 
> > perf_callchain_entry_ctx *entry,
> > }
> >   }
> >   
> > -static inline int current_is_64bit(void)
> > -{
> > -   /*
> > -* We can't use test_thread_flag() here because we may be on an
> > -* interrupt stack, and the thread flags don't get copied over
> > -* from the thread_info on the main stack to the interrupt stack.
> > -*/
> > -   return !test_ti_thread_flag(task_thread_info(current), TIF_32BIT);
> > -}
> > -
> >   #else  /* CONFIG_PPC64 */
> >   static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> >   {
> > @@ -314,11 +304,6 @@ static inline void perf_callchain_user_64(struct 
> > perf_callchain_entry_ctx *entry
> >   {
> >   }
> >   
> > -static inline int current_is_64bit(void)
> > -{
> > -   return 0;
> > -}
> > -
> >   static inline int valid_user_sp(unsigned long sp, int is_64)
> >   {
> > if (!sp || (sp & 7) || sp > TASK_SIZE - 32)
> > @@ -334,6 +319,7 @@ static inline int valid_user_sp(unsigned long sp, int 
> > is_64)
> >   
> >   #endif /* CONFIG_PPC64 */
> >   
> > +#if defined(CONFIG_PPC32) || defined(CONFIG_COMPAT)
> >   /*
> >* Layout for non-RT signal frames
> >*/
> > @@ -475,6 +461,25 @@ static void perf_callchain_user_32(struct 
> > perf_callchain_entry_ctx *entry,
> > sp = next_sp;
> > }
> >   }
> > +#else /* 32bit */
> > +static void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
> > +  struct pt_regs *regs)
> > +{
> > > +   (void)&read_user_stack_32; /* unused if !COMPAT */
> 
> You don't need that anymore do you ?

Yes, this part is no longer needed. It was removed later in the series
anyway, but this intermediate state is broken.

Thanks

Michal

> 
> Christophe
> 
> > +}
> > +#endif /* 32bit */
> > +
> > +static inline int current_is_64bit(void)
> > +{
> > +   if (!IS_ENABLED(CONFIG_COMPAT))
> > +   return IS_ENABLED(CONFIG_PPC64);
> > +   /*
> > +* We can't use test_thread_flag() here because we may be on an
> > +* interrupt stack, and the thread flags don't get copied over
> > +* from the thread_info on the main stack to the interrupt stack.
> > +*/
> > +   return !test_ti_thread_flag(task_thread_info(current), TIF_32BIT);
> > +}
> >   
> >   void
> >   perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct 
> > pt_regs *regs)
> >   
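The consolidated current_is_64bit() in the hunk above relies on IS_ENABLED()
collapsing to a compile-time constant, so the TIF_32BIT test is dead-code
eliminated on !COMPAT builds. A minimal user-space sketch of that idiom
follows; the 0/1 config macros and the tif_32bit variable are simplified
stand-ins, not the kernel's kconfig.h machinery or thread_info:

```c
#include <assert.h>

/*
 * Simplified stand-in for the kernel's IS_ENABLED() behaviour: config
 * options are modelled as plain 0/1 macros so the compiler sees a
 * constant condition and can discard the dead branch, which is what
 * makes the consolidated helper cost-free on !COMPAT kernels.
 */
#define CONFIG_COMPAT_ENABLED 0   /* flip to 1 to model a compat kernel */
#define CONFIG_PPC64_ENABLED  1

static int tif_32bit;             /* models the TIF_32BIT thread flag */

static int current_is_64bit(void)
{
	if (!CONFIG_COMPAT_ENABLED)
		return CONFIG_PPC64_ENABLED;  /* constant on !COMPAT */
	/* Only reachable when compat support is compiled in. */
	return !tif_32bit;
}
```

With CONFIG_COMPAT_ENABLED set to 0 the whole function folds to a constant 1
on a 64-bit configuration, matching the behaviour the patch aims for.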



Re: [PATCH] Revert "asm-generic: Remove unneeded __ARCH_WANT_SYS_LLSEEK macro"

2019-08-30 Thread Michal Suchánek
On Fri, 30 Aug 2019 21:54:43 +0200
Arnd Bergmann  wrote:

> On Fri, Aug 30, 2019 at 9:46 PM Michal Suchanek  wrote:
> >
> > This reverts commit caf6f9c8a326cffd1d4b3ff3f1cfba75d159d70b.
> >
> > Maybe it was needed after all.
> >
> > When CONFIG_COMPAT is disabled on ppc64 the kernel does not build.
> >
> > There is resistance to both removing the llseek syscall from the 64bit
> > syscall tables and building the llseek interface unconditionally.
> >
> > Link: https://lore.kernel.org/lkml/20190828151552.ga16...@infradead.org/
> > Link: https://lore.kernel.org/lkml/20190829214319.498c7de2@naga/
> >
> > Signed-off-by: Michal Suchanek   
> 
> This seems like the right idea in principle.
> 
> > index 5bbf587f5bc1..2f3c4bb138c4 100644
> > --- a/fs/read_write.c
> > +++ b/fs/read_write.c
> > @@ -331,7 +331,7 @@ COMPAT_SYSCALL_DEFINE3(lseek, unsigned int, fd, 
> > compat_off_t, offset, unsigned i
> >  }
> >  #endif
> >
> > -#if !defined(CONFIG_64BIT) || defined(CONFIG_COMPAT)
> > +#ifdef __ARCH_WANT_SYS_LLSEEK
> >  SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high,
> > unsigned long, offset_low, loff_t __user *, result,
> > unsigned int, whence)  
> 
> However, only reverting the patch will now break all newly added
> 32-bit architectures that don't define __ARCH_WANT_SYS_LLSEEK:
> at least nds32 and riscv32 come to mind, not sure if there is another.

AFAICT nds32 never had the syscall. Its headers were added without
__ARCH_WANT_SYS_LLSEEK before the define was removed.

The new architecture csky should be handled.

> 
> I think the easiest way however would be to combine the two checks
> above and make it
> 
> #if !defined(CONFIG_64BIT) || defined(CONFIG_COMPAT) ||
> defined(__ARCH_WANT_SYS_LLSEEK)
> 
> and then only set __ARCH_WANT_SYS_LLSEEK for powerpc.

Yes, that limits the use of __ARCH_WANT_SYS_LLSEEK, does not require
resurrecting the old headers, and may fix some architectures like nds32
that forgot to add it.

Thanks

Michal
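The combined guard proposed above can be modelled in isolation. This is an
illustrative sketch, not the actual fs/read_write.c code; the defines are
set by hand to mimic a ppc64 kernel built without CONFIG_COMPAT where the
architecture opts back in via __ARCH_WANT_SYS_LLSEEK:

```c
#include <assert.h>

/*
 * Hand-set pretend configuration: 64-bit, no compat layer, but the
 * architecture requests llseek explicitly (as powerpc would).
 */
#define CONFIG_64BIT 1
/* CONFIG_COMPAT deliberately not defined */
#define __ARCH_WANT_SYS_LLSEEK 1

/* The combined condition proposed for fs/read_write.c. */
#if !defined(CONFIG_64BIT) || defined(CONFIG_COMPAT) || \
	defined(__ARCH_WANT_SYS_LLSEEK)
#define HAVE_SYS_LLSEEK 1
#else
#define HAVE_SYS_LLSEEK 0
#endif

int have_sys_llseek(void)
{
	return HAVE_SYS_LLSEEK;
}
```

Dropping the __ARCH_WANT_SYS_LLSEEK define in this model reproduces the
ppc64 !COMPAT case where the syscall would otherwise disappear.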


Re: [PATCH v2 4/4] powerpc/64: system call implement the bulk of the logic in C

2019-08-30 Thread Michal Suchánek
On Sat, 31 Aug 2019 02:48:26 +0800
kbuild test robot  wrote:

> Hi Nicholas,
> 
> I love your patch! Yet something to improve:
> 
> [auto build test ERROR on linus/master]
> [cannot apply to v5.3-rc6 next-20190830]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Nicholas-Piggin/powerpc-64-syscalls-in-C/20190828-064221
> config: powerpc64-defconfig (attached as .config)
> compiler: powerpc64-linux-gcc (GCC) 7.4.0
> reproduce:
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> GCC_VERSION=7.4.0 make.cross ARCH=powerpc64 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot 
> 
> All errors (new ones prefixed by >>):
> 
>powerpc64-linux-ld: warning: orphan section `.gnu.hash' from `linker 
> stubs' being placed in section `.gnu.hash'.
>arch/powerpc/kernel/syscall_64.o: In function `.system_call_exception':
> >> syscall_64.c:(.text+0x180): undefined reference to `.tabort_syscall'  

Interestingly it builds and boots for me. Is this something about
dotted vs dotless symbols depending on some config options?

I see there is _GLOBAL(ret_from_fork) just below so maybe we need
_GLOBAL(tabort_syscall) instead of .globl tabort_syscall as well.

Thanks

Michal


Re: [PATCH v6 6/6] powerpc/perf: split callchain.c by bitness

2019-08-30 Thread Michal Suchánek
On Fri, 30 Aug 2019 20:57:57 +0200
Michal Suchanek  wrote:

> Building callchain.c with !COMPAT proved quite ugly with all the
> defines. Splitting out the 32bit and 64bit parts looks better.
> 

BTW the powerpc callchain.c does not match any of the patterns of PERF
CORE in MAINTAINERS (unlike the callchain implementations on other
platforms). Is that intentional?

Thanks

Michal


Re: [PATCH v5 3/5] powerpc/64: make buildable without CONFIG_COMPAT

2019-08-30 Thread Michal Suchánek
On Fri, 30 Aug 2019 06:35:13 + (UTC)
Christophe Leroy  wrote:

> On 08/29/2019 10:28 PM, Michal Suchanek wrote:
> > There are numerous references to 32bit functions in generic and 64bit
> > code so ifdef them out.
> > 
> > Signed-off-by: Michal Suchanek 
> > ---
> > v2:
> > - fix 32bit ifdef condition in signal.c
> > - simplify the compat ifdef condition in vdso.c - 64bit is redundant
> > - simplify the compat ifdef condition in callchain.c - 64bit is redundant
> > v3:
> > - use IS_ENABLED and maybe_unused where possible
> > - do not ifdef declarations
> > - clean up Makefile
> > v4:
> > - further makefile cleanup
> > - simplify is_32bit_task conditions
> > - avoid ifdef in condition by using return
> > v5:
> > - avoid unreachable code on 32bit
> > - make is_current_64bit constant on !COMPAT
> > - add stub perf_callchain_user_32 to avoid some ifdefs
> > ---
> >   arch/powerpc/include/asm/thread_info.h |  4 ++--
> >   arch/powerpc/kernel/Makefile   |  7 +++
> >   arch/powerpc/kernel/entry_64.S |  2 ++
> >   arch/powerpc/kernel/signal.c   |  3 +--
> >   arch/powerpc/kernel/syscall_64.c   |  6 ++
> >   arch/powerpc/kernel/vdso.c |  5 ++---
> >   arch/powerpc/perf/callchain.c  | 13 +++--
> >   7 files changed, 23 insertions(+), 17 deletions(-)
> >   
> [...]
> 
> > diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> > index c84bbd4298a0..881be5c4e9bb 100644
> > --- a/arch/powerpc/perf/callchain.c
> > +++ b/arch/powerpc/perf/callchain.c
> > @@ -15,7 +15,7 @@
> >   #include 
> >   #include 
> >   #include 
> > -#ifdef CONFIG_PPC64
> > +#ifdef CONFIG_COMPAT
> >   #include "../kernel/ppc32.h"
> >   #endif
> >   #include 
> > @@ -291,7 +291,8 @@ static inline int current_is_64bit(void)
> >  * interrupt stack, and the thread flags don't get copied over
> >  * from the thread_info on the main stack to the interrupt stack.
> >  */
> > -   return !test_ti_thread_flag(task_thread_info(current), TIF_32BIT);
> > +   return !IS_ENABLED(CONFIG_COMPAT) ||
> > +   !test_ti_thread_flag(task_thread_info(current), TIF_32BIT);
> >   }
> >   
> >   #else  /* CONFIG_PPC64 */
> > @@ -341,6 +342,7 @@ static inline int valid_user_sp(unsigned long sp, int 
> > is_64)
> >   
> >   #endif /* CONFIG_PPC64 */
> >   
> > +#if defined(CONFIG_PPC32) || defined(CONFIG_COMPAT)
> >   /*
> >* Layout for non-RT signal frames
> >*/
> > @@ -482,6 +484,13 @@ static void perf_callchain_user_32(struct 
> > perf_callchain_entry_ctx *entry,
> > sp = next_sp;
> > }
> >   }
> > +#else /* 32bit */
> > +static void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
> > +  struct pt_regs *regs)
> > +{
> > > +   (void)&read_user_stack_32; /* unused if !COMPAT */
> 
> That looks pretty much like a hack.
> 
> See possible alternative below.
> 
> > +}
> > +#endif /* 32bit */
> >   
> >   void
> >   perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct 
> > pt_regs *regs)
> >   
> 
> ---
>  arch/powerpc/perf/callchain.c | 62 
> +++
>  1 file changed, 27 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> index 881be5c4e9bb..1b169b32776a 100644
> --- a/arch/powerpc/perf/callchain.c
> +++ b/arch/powerpc/perf/callchain.c
> @@ -165,22 +165,6 @@ static int read_user_stack_64(unsigned long __user *ptr, 
> unsigned long *ret)
>   return read_user_stack_slow(ptr, ret, 8);
>  }
>  
> -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> -{
> - if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> - ((unsigned long)ptr & 3))
> - return -EFAULT;
> -
> - pagefault_disable();
> - if (!__get_user_inatomic(*ret, ptr)) {
> - pagefault_enable();
> - return 0;
> - }
> - pagefault_enable();
> -
> - return read_user_stack_slow(ptr, ret, 4);
> -}
> -
>  static inline int valid_user_sp(unsigned long sp, int is_64)
>  {
>   if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x1UL) - 32)
> @@ -296,25 +280,10 @@ static inline int current_is_64bit(void)
>  }
>  
>  #else  /* CONFIG_PPC64 */
> -/*
> - * On 32-bit we just access the address and let hash_page create a
> - * HPTE if necessary, so there is no need to fall back to reading
> - * the page tables.  Since this is called at interrupt level,
> - * do_page_fault() won't treat a DSI as a page fault.
> - */
> -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> -{
> - int rc;
> -
> - if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> - ((unsigned long)ptr & 3))
> - return -EFAULT;
>  
> - pagefault_disable();
> - rc = __get_user_inatomic(*ret, ptr);
> - pagefault_enable();
> -
> - return rc;
> +static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> 

Re: [PATCH v5 5/5] powerpc/perf: split callchain.c by bitness

2019-08-30 Thread Michal Suchánek
On Fri, 30 Aug 2019 08:42:25 +0200
Michal Suchánek  wrote:

> On Fri, 30 Aug 2019 06:35:11 + (UTC)
> Christophe Leroy  wrote:
> 
> > On 08/29/2019 10:28 PM, Michal Suchanek wrote:  

> >  obj-$(CONFIG_PPC_PERF_CTRS)+= core-book3s.o bhrb.o
> > diff --git a/arch/powerpc/perf/callchain_32.c 
> > b/arch/powerpc/perf/callchain_32.c
> > index 0bd4484eddaa..17c43ae03084 100644
> > --- a/arch/powerpc/perf/callchain_32.c
> > +++ b/arch/powerpc/perf/callchain_32.c
> > @@ -15,50 +15,13 @@
> >  #include 
> >  #include 
> >  #include 
> > -#ifdef CONFIG_PPC64
> > -#include "../kernel/ppc32.h"
> > -#endif
> >  #include 
> >  
> >  #include "callchain.h"
> >  
> >  #ifdef CONFIG_PPC64
> > -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> > -{
> > -   if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> > -   ((unsigned long)ptr & 3))
> > -   return -EFAULT;
> > -
> > -   pagefault_disable();
> > -   if (!__get_user_inatomic(*ret, ptr)) {
> > -   pagefault_enable();
> > -   return 0;
> > -   }
> > -   pagefault_enable();
> > -
> > -   return read_user_stack_slow(ptr, ret, 4);
> > -}
> > -#else /* CONFIG_PPC64 */
> > -/*
> > - * On 32-bit we just access the address and let hash_page create a
> > - * HPTE if necessary, so there is no need to fall back to reading
> > - * the page tables.  Since this is called at interrupt level,
> > - * do_page_fault() won't treat a DSI as a page fault.
> > - */
> > -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> > -{
> > -   int rc;
> > -
> > -   if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> > -   ((unsigned long)ptr & 3))
> > -   return -EFAULT;
> > -
> > -   pagefault_disable();
> > -   rc = __get_user_inatomic(*ret, ptr);
> > -   pagefault_enable();
> > -
> > -   return rc;
> > -}
> > +#include "../kernel/ppc32.h"
> > +#else
> >  
> >  #define __SIGNAL_FRAMESIZE32   __SIGNAL_FRAMESIZE
> >  #define sigcontext32   sigcontext
> > @@ -95,6 +58,30 @@ struct rt_signal_frame_32 {
> > int abigap[56];
> >  };
> >  
> > +/*
> > + * On 32-bit we just access the address and let hash_page create a
> > + * HPTE if necessary, so there is no need to fall back to reading
> > + * the page tables.  Since this is called at interrupt level,
> > + * do_page_fault() won't treat a DSI as a page fault.
> > + */
> > +static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> > +{
> > +   int rc;
> > +
> > +   if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> > +   ((unsigned long)ptr & 3))
> > +   return -EFAULT;
> > +
> > +   pagefault_disable();
> > +   rc = __get_user_inatomic(*ret, ptr);
> > +   pagefault_enable();
> > +
> > +   if (IS_ENABLED(CONFIG_PPC32) || !rc)
> > +   return rc;
> > +
> > +   return read_user_stack_slow(ptr, ret, 4);
> > +}
> > +
> >  static int is_sigreturn_32_address(unsigned int nip, unsigned int fp)
> >  {
> > if (nip == fp + offsetof(struct signal_frame_32, mctx.mc_pad))  
> 
> I will leave consolidating this function to somebody who knows what the
> desired semantic is. With a short ifdef section at the top of the file
> it is a low-hanging fruit.

It looks ok if done as a separate patch.

Thanks

Michal


Re: [PATCH v5 5/5] powerpc/perf: split callchain.c by bitness

2019-08-30 Thread Michal Suchánek
On Fri, 30 Aug 2019 06:35:11 + (UTC)
Christophe Leroy  wrote:

> On 08/29/2019 10:28 PM, Michal Suchanek wrote:
> > Building callchain.c with !COMPAT proved quite ugly with all the
> > defines. Splitting out the 32bit and 64bit parts looks better.
> > 
> > Also rewrite current_is_64bit as common function. No other code change
> > intended.  
> 
> Nice result.
> 
> Could look even better by merging both read_user_stack_32(), see below.
> 
> Also a possible cosmetic change to Makefile.
> 
> ---
>  arch/powerpc/perf/Makefile   |  7 ++---
>  arch/powerpc/perf/callchain_32.c | 65 
> 
>  2 files changed, 29 insertions(+), 43 deletions(-)
> 
> diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
> index e9f3202251d0..53d614e98537 100644
> --- a/arch/powerpc/perf/Makefile
> +++ b/arch/powerpc/perf/Makefile
> @@ -1,9 +1,8 @@
>  # SPDX-License-Identifier: GPL-2.0
>  
> -obj-$(CONFIG_PERF_EVENTS)+= callchain.o perf_regs.o
> -ifdef CONFIG_PERF_EVENTS
> -obj-y+= callchain_$(BITS).o
> -obj-$(CONFIG_COMPAT) += callchain_32.o
> +obj-$(CONFIG_PERF_EVENTS)+= callchain.o callchain_$(BITS).o perf_regs.o
> +ifdef CONFIG_COMPAT
> +obj-$(CONFIG_PERF_EVENTS)+= callchain_32.o
>  endif
>  
That looks good.
>  obj-$(CONFIG_PPC_PERF_CTRS)  += core-book3s.o bhrb.o
> diff --git a/arch/powerpc/perf/callchain_32.c 
> b/arch/powerpc/perf/callchain_32.c
> index 0bd4484eddaa..17c43ae03084 100644
> --- a/arch/powerpc/perf/callchain_32.c
> +++ b/arch/powerpc/perf/callchain_32.c
> @@ -15,50 +15,13 @@
>  #include 
>  #include 
>  #include 
> -#ifdef CONFIG_PPC64
> -#include "../kernel/ppc32.h"
> -#endif
>  #include 
>  
>  #include "callchain.h"
>  
>  #ifdef CONFIG_PPC64
> -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> -{
> - if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> - ((unsigned long)ptr & 3))
> - return -EFAULT;
> -
> - pagefault_disable();
> - if (!__get_user_inatomic(*ret, ptr)) {
> - pagefault_enable();
> - return 0;
> - }
> - pagefault_enable();
> -
> - return read_user_stack_slow(ptr, ret, 4);
> -}
> -#else /* CONFIG_PPC64 */
> -/*
> - * On 32-bit we just access the address and let hash_page create a
> - * HPTE if necessary, so there is no need to fall back to reading
> - * the page tables.  Since this is called at interrupt level,
> - * do_page_fault() won't treat a DSI as a page fault.
> - */
> -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> -{
> - int rc;
> -
> - if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> - ((unsigned long)ptr & 3))
> - return -EFAULT;
> -
> - pagefault_disable();
> - rc = __get_user_inatomic(*ret, ptr);
> - pagefault_enable();
> -
> - return rc;
> -}
> +#include "../kernel/ppc32.h"
> +#else
>  
>  #define __SIGNAL_FRAMESIZE32 __SIGNAL_FRAMESIZE
>  #define sigcontext32 sigcontext
> @@ -95,6 +58,30 @@ struct rt_signal_frame_32 {
>   int abigap[56];
>  };
>  
> +/*
> + * On 32-bit we just access the address and let hash_page create a
> + * HPTE if necessary, so there is no need to fall back to reading
> + * the page tables.  Since this is called at interrupt level,
> + * do_page_fault() won't treat a DSI as a page fault.
> + */
> +static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> +{
> + int rc;
> +
> + if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> + ((unsigned long)ptr & 3))
> + return -EFAULT;
> +
> + pagefault_disable();
> + rc = __get_user_inatomic(*ret, ptr);
> + pagefault_enable();
> +
> + if (IS_ENABLED(CONFIG_PPC32) || !rc)
> + return rc;
> +
> + return read_user_stack_slow(ptr, ret, 4);
> +}
> +
>  static int is_sigreturn_32_address(unsigned int nip, unsigned int fp)
>  {
>   if (nip == fp + offsetof(struct signal_frame_32, mctx.mc_pad))

I will leave consolidating this function to somebody who knows what the
desired semantic is. With a short ifdef section at the top of the file
it is a low-hanging fruit.

Thanks

Michal


Re: [PATCH 1/3] scsi: cxlflash: Fix fallthrough warnings.

2019-08-29 Thread Michal Suchánek
On Thu, 29 Aug 2019 15:34:08 -0500
Uma Krishnan  wrote:

> Below commit queued up for 5.4 includes these changes.
> 
> commit 657bd277c162580674ddb86a90c4aeb62639bff5
> Author: Gustavo A. R. Silva 
> Date:   Sun Jul 28 19:21:19 2019 -0500
> 
> Thanks,
> Uma Krishnan

Works for me as well.

Thanks

Michal

> 
> On Aug 29, 2019, at 7:32 AM, Michal Suchanek  wrote:
> > 
> > Add fallthrough comments where missing.
> > 
> > Signed-off-by: Michal Suchanek 
> > ---
> > drivers/scsi/cxlflash/main.c | 8 
> > 1 file changed, 8 insertions(+)
> > 
> > diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
> > index b1f4724efde2..f402fa9a7bec 100644
> > --- a/drivers/scsi/cxlflash/main.c
> > +++ b/drivers/scsi/cxlflash/main.c
> > @@ -753,10 +753,13 @@ static void term_intr(struct cxlflash_cfg *cfg, enum 
> > undo_level level,
> > /* SISL_MSI_ASYNC_ERROR is setup only for the primary HWQ */
> > if (index == PRIMARY_HWQ)
> > cfg->ops->unmap_afu_irq(hwq->ctx_cookie, 3, hwq);
> > + /* fall through */
> > case UNMAP_TWO:
> > cfg->ops->unmap_afu_irq(hwq->ctx_cookie, 2, hwq);
> > + /* fall through */
> > case UNMAP_ONE:
> > cfg->ops->unmap_afu_irq(hwq->ctx_cookie, 1, hwq);
> > + /* fall through */
> > case FREE_IRQ:
> > cfg->ops->free_afu_irqs(hwq->ctx_cookie);
> > /* fall through */
> > @@ -973,14 +976,18 @@ static void cxlflash_remove(struct pci_dev *pdev)
> > switch (cfg->init_state) {
> > case INIT_STATE_CDEV:
> > cxlflash_release_chrdev(cfg);
> > + /* fall through */
> > case INIT_STATE_SCSI:
> > cxlflash_term_local_luns(cfg);
> > scsi_remove_host(cfg->host);
> > + /* fall through */
> > case INIT_STATE_AFU:
> > term_afu(cfg);
> > + /* fall through */
> > case INIT_STATE_PCI:
> > cfg->ops->destroy_afu(cfg->afu_cookie);
> > pci_disable_device(pdev);
> > + /* fall through */
> > case INIT_STATE_NONE:
> > free_mem(cfg);
> > scsi_host_put(cfg->host);
> > @@ -3017,6 +3024,7 @@ static ssize_t num_hwqs_store(struct device *dev,
> > wait_event(cfg->reset_waitq, cfg->state != STATE_RESET);
> > if (cfg->state == STATE_NORMAL)
> > goto retry;
> > + /* fall through */
> > default:
> > /* Ideally should not happen */
> > dev_err(dev, "%s: Device is not ready, state=%d\n",
> > --
> > 2.12.3
> > 
> > 
> 
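The cascading-teardown pattern the patch annotates can be reduced to a short
sketch; the states and cleanup steps below are placeholders, not the
cxlflash driver's actual init states:

```c
#include <assert.h>

/*
 * Entering the switch at a late init state deliberately falls through
 * so every earlier stage's cleanup also runs; the added comments are
 * what silences -Wimplicit-fallthrough for this intentional pattern.
 */
enum init_state { STATE_NONE, STATE_PCI, STATE_SCSI };

static int pci_torn_down, scsi_torn_down;

static void teardown(enum init_state state)
{
	switch (state) {
	case STATE_SCSI:
		scsi_torn_down = 1;
		/* fall through */
	case STATE_PCI:
		pci_torn_down = 1;
		/* fall through */
	case STATE_NONE:
		break;
	}
}
```

Calling teardown(STATE_SCSI) here unwinds both stages, which is the
behaviour the /* fall through */ comments assert is intended.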



Re: [PATCH v4 1/4] powerpc: make llseek 32bit-only.

2019-08-29 Thread Michal Suchánek
On Thu, 29 Aug 2019 16:32:50 +0200
Arnd Bergmann  wrote:

> On Thu, Aug 29, 2019 at 4:19 PM Michal Suchánek  wrote:
> > On Thu, 29 Aug 2019 14:57:39 +0200 Arnd Bergmann  wrote:  
> > > On Thu, Aug 29, 2019 at 2:37 PM Michal Suchánek  
> > > wrote:  
> > > > On Thu, 29 Aug 2019 14:19:46 +0200 Arnd Bergmann  wrote: 
> > > >  
> > > > > On Thu, Aug 29, 2019 at 12:23 PM Michal Suchanek  
> > > > > wrote:
> > > > > In particular, I don't see why you single out llseek here, but leave 
> > > > > other
> > > > > syscalls that are not needed on 64-bit machines such as pread64().  
> > > >
> > > > Because llseek is not built in fs/ when building 64bit only causing a
> > > > link error.
> > > >
> > > > I initially posted a patch to build it always but it was pointed out it
> > > > is not needed, and the interface does not make sense on 64bit, and
> > > > that platforms that don't have it on 64bit now don't want that useless
> > > > code.  
> > >
> > > Ok, please put that into the changeset description then.
> > >
> > > I looked at uses of __NR__llseek in debian code search and
> > > found this one:
> > >
> > > https://codesearch.debian.net/show?file=umview_0.8.2-1.2%2Fxmview%2Fum_mmap.c=328
> > >
> > > It looks like this application will try to use llseek instead of lseek
> > > when built against kernel headers that define __NR_llseek.
> > >  
> >
> > The available documentation says this syscall is for 32bit only so
> > using it on 64bit is undefined. The interface is not well-defined in
> > that case either.  
> 
> That's generally not how it works. If there is an existing application
> that relies on the behavior of the system call interface, we should not
> change it in a way that breaks the application, regardless of what the
> documentation says. Presumably nobody cares about umview on
> powerpc64, but there might be other applications doing the same
> thing.

Actually the umview headers go out of their way to define the llseek
syscall as invalid on x86_64 so that the non-llseek path is taken. 
mview-os/xmview/defs_x86_64_um.h:#define __NR__llseek __NR_doesnotexist
It is probably an oversight that this is not done on non-x86. I am not
even sure this builds on non-x86 out of the box.

> It looks like sparc64 and parisc64 do the same thing as powerpc64,
> and provide llseek() calls that may or may not be used by
> applications.

And if they are supposed to build with !compat it should be removed
there as well.

> 
> I think your original approach of always building sys_llseek on
> powerpc64 is the safe choice here.

That's safe but adds junk to the kernel as pointed out in the reply to
that patch.

Thanks

Michal


Re: [PATCH v4 3/4] powerpc/64: make buildable without CONFIG_COMPAT

2019-08-29 Thread Michal Suchánek
On Thu, 29 Aug 2019 19:40:55 +0200
Christophe Leroy  wrote:

> Le 29/08/2019 à 12:23, Michal Suchanek a écrit :
> > There are numerous references to 32bit functions in generic and 64bit
> > code so ifdef them out.
> > 
> > Signed-off-by: Michal Suchanek 
> > ---
> > v2:
> > - fix 32bit ifdef condition in signal.c
> > - simplify the compat ifdef condition in vdso.c - 64bit is redundant
> > - simplify the compat ifdef condition in callchain.c - 64bit is redundant
> > v3:
> > - use IS_ENABLED and maybe_unused where possible
> > - do not ifdef declarations
> > - clean up Makefile
> > v4:
> > - further makefile cleanup
> > - simplify is_32bit_task conditions
> > - avoid ifdef in condition by using return
> > ---
> >   arch/powerpc/include/asm/thread_info.h |  4 ++--
> >   arch/powerpc/kernel/Makefile   |  7 +++
> >   arch/powerpc/kernel/entry_64.S |  2 ++
> >   arch/powerpc/kernel/signal.c   |  3 +--
> >   arch/powerpc/kernel/syscall_64.c   |  6 ++
> >   arch/powerpc/kernel/vdso.c |  5 ++---
> >   arch/powerpc/perf/callchain.c  | 14 ++
> >   7 files changed, 22 insertions(+), 19 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/thread_info.h 
> > b/arch/powerpc/include/asm/thread_info.h
> > index 8e1d0195ac36..c128d8a48ea3 100644
> > --- a/arch/powerpc/include/asm/thread_info.h
> > +++ b/arch/powerpc/include/asm/thread_info.h
> > @@ -144,10 +144,10 @@ static inline bool test_thread_local_flags(unsigned 
> > int flags)
> > return (ti->local_flags & flags) != 0;
> >   }
> >   
> > -#ifdef CONFIG_PPC64
> > +#ifdef CONFIG_COMPAT
> >   #define is_32bit_task()   (test_thread_flag(TIF_32BIT))
> >   #else
> > -#define is_32bit_task()(1)
> > +#define is_32bit_task()(IS_ENABLED(CONFIG_PPC32))
> >   #endif
> >   
> >   #if defined(CONFIG_PPC64)
> > diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> > index 1d646a94d96c..9d8772e863b9 100644
> > --- a/arch/powerpc/kernel/Makefile
> > +++ b/arch/powerpc/kernel/Makefile
> > @@ -44,16 +44,15 @@ CFLAGS_btext.o += -DDISABLE_BRANCH_PROFILING
> >   endif
> >   
> >   obj-y := cputable.o ptrace.o syscalls.o \
> > -  irq.o align.o signal_32.o pmc.o vdso.o \
> > +  irq.o align.o signal_$(BITS).o pmc.o vdso.o \
> >process.o systbl.o idle.o \
> >signal.o sysfs.o cacheinfo.o time.o \
> >prom.o traps.o setup-common.o \
> >udbg.o misc.o io.o misc_$(BITS).o \
> >of_platform.o prom_parse.o
> > -obj-$(CONFIG_PPC64)+= setup_64.o sys_ppc32.o \
> > -  signal_64.o ptrace32.o \
> > -  paca.o nvram_64.o firmware.o \
> > +obj-$(CONFIG_PPC64)+= setup_64.o paca.o nvram_64.o 
> > firmware.o \
> >syscall_64.o
> > +obj-$(CONFIG_COMPAT)   += sys_ppc32.o ptrace32.o signal_32.o
> >   obj-$(CONFIG_VDSO32)  += vdso32/
> >   obj-$(CONFIG_PPC_WATCHDOG)+= watchdog.o
> >   obj-$(CONFIG_HAVE_HW_BREAKPOINT)  += hw_breakpoint.o
> > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> > index 2ec825a85f5b..a2dbf216f607 100644
> > --- a/arch/powerpc/kernel/entry_64.S
> > +++ b/arch/powerpc/kernel/entry_64.S
> > @@ -51,8 +51,10 @@
> >   SYS_CALL_TABLE:
> > .tc sys_call_table[TC],sys_call_table
> >   
> > +#ifdef CONFIG_COMPAT
> >   COMPAT_SYS_CALL_TABLE:
> > .tc compat_sys_call_table[TC],compat_sys_call_table
> > +#endif
> >   
> >   /* This value is used to mark exception frames on the stack. */
> >   exception_marker:
> > diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
> > index 60436432399f..61678cb0e6a1 100644
> > --- a/arch/powerpc/kernel/signal.c
> > +++ b/arch/powerpc/kernel/signal.c
> > @@ -247,7 +247,6 @@ static void do_signal(struct task_struct *tsk)
> > sigset_t *oldset = sigmask_to_save();
> > struct ksignal ksig = { .sig = 0 };
> > int ret;
> > -   int is32 = is_32bit_task();
> >   
> > BUG_ON(tsk != current);
> >   
> > @@ -277,7 +276,7 @@ static void do_signal(struct task_struct *tsk)
> >   
> > 	rseq_signal_deliver(&ksig, tsk->thread.regs);
> >   
> > -   if (is32) {
> > +   if (is_32bit_task()) {
> > if (ksig.ka.sa.sa_flags & SA_SIGINFO)
> > ret = handle_rt_signal32(&ksig, oldset, tsk);
> > else
> > diff --git a/arch/powerpc/kernel/syscall_64.c 
> > b/arch/powerpc/kernel/syscall_64.c
> > index 98ed970796d5..0d5cbbe54cf1 100644
> > --- a/arch/powerpc/kernel/syscall_64.c
> > +++ b/arch/powerpc/kernel/syscall_64.c
> > @@ -38,7 +38,6 @@ typedef long (*syscall_fn)(long, long, long, long, long, 
> > long);
> >   
> >   long system_call_exception(long r3, long r4, long r5, long r6, long r7, 
> > 

Re: [PATCH v4 1/4] powerpc: make llseek 32bit-only.

2019-08-29 Thread Michal Suchánek
On Thu, 29 Aug 2019 14:57:39 +0200
Arnd Bergmann  wrote:

> On Thu, Aug 29, 2019 at 2:37 PM Michal Suchánek  wrote:
> > On Thu, 29 Aug 2019 14:19:46 +0200 Arnd Bergmann  wrote:  
> > > On Thu, Aug 29, 2019 at 12:23 PM Michal Suchanek  
> > > wrote:
> > > In particular, I don't see why you single out llseek here, but leave other
> > > syscalls that are not needed on 64-bit machines such as pread64().  
> >
> > Because llseek is not built in fs/ when building 64bit-only, causing a
> > link error.
> >
> > I initially posted a patch to build it always, but it was pointed out that
> > it is not needed, that the interface does not make sense on 64bit, and
> > that platforms that don't have it on 64bit now don't want that useless
> > code.  
> 
> Ok, please put that into the changeset description then.
> 
> I looked at uses of __NR__llseek in debian code search and
> found this one:
> 
> https://codesearch.debian.net/show?file=umview_0.8.2-1.2%2Fxmview%2Fum_mmap.c=328
> 
> It looks like this application will try to use llseek instead of lseek
> when built against kernel headers that define __NR_llseek.
> 

The available documentation says this syscall is for 32bit only, so
using it on 64bit is undefined. The interface is not well-defined in
that case either.

Thanks

Michal


Re: [PATCH v4 1/4] powerpc: make llseek 32bit-only.

2019-08-29 Thread Michal Suchánek
On Thu, 29 Aug 2019 14:19:46 +0200
Arnd Bergmann  wrote:

> On Thu, Aug 29, 2019 at 12:23 PM Michal Suchanek  wrote:
> >
> > Fixes: aff850393200 ("powerpc: add system call table generation support")  
> 
> This patch needs a proper explanation. The Fixes tag doesn't seem right
> here, since ppc64 has had llseek since the start in 2002 commit 3939e37587e7
> ("Add ppc64 support. This includes both pSeries (RS/6000) and iSeries
> (AS/400).").
> 
> > diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl 
> > b/arch/powerpc/kernel/syscalls/syscall.tbl
> > index 010b9f445586..53e427606f6c 100644
> > --- a/arch/powerpc/kernel/syscalls/syscall.tbl
> > +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
> > @@ -188,7 +188,7 @@
> >  137common  afs_syscall sys_ni_syscall
> >  138common  setfsuidsys_setfsuid
> >  139common  setfsgidsys_setfsgid
> > -140common  _llseek sys_llseek
> > +14032  _llseek sys_llseek
> >  141common  getdentssys_getdents
> > compat_sys_getdents
> >  142common  _newselect  sys_select  
> > compat_sys_select
> >  143common  flock   sys_flock  
> 
> In particular, I don't see why you single out llseek here, but leave other
> syscalls that are not needed on 64-bit machines such as pread64().

Because llseek is not built in fs/ when building 64bit-only, causing a
link error.

I initially posted a patch to build it always, but it was pointed out that
it is not needed, that the interface does not make sense on 64bit, and
that platforms that don't have it on 64bit now don't want that useless
code.

Thanks

Michal


Re: [PATCH v4 3/4] powerpc/64: make buildable without CONFIG_COMPAT

2019-08-29 Thread Michal Suchánek
On Thu, 29 Aug 2019 12:23:42 +0200
Michal Suchanek  wrote:

> There are numerous references to 32bit functions in generic and 64bit
> code so ifdef them out.
> 
> Signed-off-by: Michal Suchanek 
> ---
> v2:
> - fix 32bit ifdef condition in signal.c
> - simplify the compat ifdef condition in vdso.c - 64bit is redundant
> - simplify the compat ifdef condition in callchain.c - 64bit is redundant
> v3:
> - use IS_ENABLED and maybe_unused where possible
> - do not ifdef declarations
> - clean up Makefile
> v4:
> - further makefile cleanup
> - simplify is_32bit_task conditions
> - avoid ifdef in condition by using return
> ---
>  arch/powerpc/include/asm/thread_info.h |  4 ++--
>  arch/powerpc/kernel/Makefile   |  7 +++
>  arch/powerpc/kernel/entry_64.S |  2 ++
>  arch/powerpc/kernel/signal.c   |  3 +--
>  arch/powerpc/kernel/syscall_64.c   |  6 ++
>  arch/powerpc/kernel/vdso.c |  5 ++---
>  arch/powerpc/perf/callchain.c  | 14 ++
>  7 files changed, 22 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/thread_info.h 
> b/arch/powerpc/include/asm/thread_info.h
> index 8e1d0195ac36..c128d8a48ea3 100644
> --- a/arch/powerpc/include/asm/thread_info.h
> +++ b/arch/powerpc/include/asm/thread_info.h
> @@ -144,10 +144,10 @@ static inline bool test_thread_local_flags(unsigned int 
> flags)
>   return (ti->local_flags & flags) != 0;
>  }
>  
> -#ifdef CONFIG_PPC64
> +#ifdef CONFIG_COMPAT
>  #define is_32bit_task()  (test_thread_flag(TIF_32BIT))
>  #else
> -#define is_32bit_task()  (1)
> +#define is_32bit_task()  (IS_ENABLED(CONFIG_PPC32))
>  #endif
>  
>  #if defined(CONFIG_PPC64)
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index 1d646a94d96c..9d8772e863b9 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -44,16 +44,15 @@ CFLAGS_btext.o += -DDISABLE_BRANCH_PROFILING
>  endif
>  
>  obj-y:= cputable.o ptrace.o syscalls.o \
> -irq.o align.o signal_32.o pmc.o vdso.o \
> +irq.o align.o signal_$(BITS).o pmc.o vdso.o \
>  process.o systbl.o idle.o \
>  signal.o sysfs.o cacheinfo.o time.o \
>  prom.o traps.o setup-common.o \
>  udbg.o misc.o io.o misc_$(BITS).o \
>  of_platform.o prom_parse.o
> -obj-$(CONFIG_PPC64)  += setup_64.o sys_ppc32.o \
> -signal_64.o ptrace32.o \
> -paca.o nvram_64.o firmware.o \
> +obj-$(CONFIG_PPC64)  += setup_64.o paca.o nvram_64.o firmware.o \
>  syscall_64.o
> +obj-$(CONFIG_COMPAT) += sys_ppc32.o ptrace32.o signal_32.o
>  obj-$(CONFIG_VDSO32) += vdso32/
>  obj-$(CONFIG_PPC_WATCHDOG)   += watchdog.o
>  obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 2ec825a85f5b..a2dbf216f607 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -51,8 +51,10 @@
>  SYS_CALL_TABLE:
>   .tc sys_call_table[TC],sys_call_table
>  
> +#ifdef CONFIG_COMPAT
>  COMPAT_SYS_CALL_TABLE:
>   .tc compat_sys_call_table[TC],compat_sys_call_table
> +#endif
>  
>  /* This value is used to mark exception frames on the stack. */
>  exception_marker:
> diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
> index 60436432399f..61678cb0e6a1 100644
> --- a/arch/powerpc/kernel/signal.c
> +++ b/arch/powerpc/kernel/signal.c
> @@ -247,7 +247,6 @@ static void do_signal(struct task_struct *tsk)
>   sigset_t *oldset = sigmask_to_save();
>   struct ksignal ksig = { .sig = 0 };
>   int ret;
> - int is32 = is_32bit_task();
>  
>   BUG_ON(tsk != current);
>  
> @@ -277,7 +276,7 @@ static void do_signal(struct task_struct *tsk)
>  
> rseq_signal_deliver(&ksig, tsk->thread.regs);
>  
> - if (is32) {
> + if (is_32bit_task()) {
>   if (ksig.ka.sa.sa_flags & SA_SIGINFO)
> ret = handle_rt_signal32(&ksig, oldset, tsk);
>   else
> diff --git a/arch/powerpc/kernel/syscall_64.c 
> b/arch/powerpc/kernel/syscall_64.c
> index 98ed970796d5..0d5cbbe54cf1 100644
> --- a/arch/powerpc/kernel/syscall_64.c
> +++ b/arch/powerpc/kernel/syscall_64.c
> @@ -38,7 +38,6 @@ typedef long (*syscall_fn)(long, long, long, long, long, 
> long);
>  
>  long system_call_exception(long r3, long r4, long r5, long r6, long r7, long 
> r8, unsigned long r0, struct pt_regs *regs)
>  {
> - unsigned long ti_flags;
>   syscall_fn f;
>  
>   BUG_ON(!(regs->msr & MSR_PR));
> @@ -83,8 +82,7 @@ long system_call_exception(long r3, long r4, long r5, long 
> r6, long r7, long r8,
>*/
>   regs->softe = 

Re: [PATCH 1/4] fs: always build llseek.

2019-08-29 Thread Michal Suchánek
On Wed, 28 Aug 2019 23:22:00 -0700
Christoph Hellwig  wrote:

> On Wed, Aug 28, 2019 at 06:15:40PM +0200, Michal Suchánek wrote:
> > On Wed, 28 Aug 2019 08:15:52 -0700
> > Christoph Hellwig  wrote:
> >   
> > > On Tue, Aug 27, 2019 at 10:21:06PM +0200, Michal Suchanek wrote:  
> > > > 64bit !COMPAT does not build because the llseek syscall is in the 
> > > > tables.
> > > 
> > > Well, this will bloat things like 64-bit RISC-V for no good reason.
> > > Please introduce a WANT_LSEEK like symbol that ppc64 can select instead.  
> > 
> > It also builds when llseek is marked as 32bit only in syscall.tbl
> > 
> > It seems it was handled specially in some way before syscall.tbl was
> > added, though (removed in ab66dcc76d6ab8fae9d69d149ae38c42605e7fc5)  
> 
> Independent of whether you need it on a purely 64-bit build on powerpc (which
> I'll let the experts figure out), it is not needed on a purely 64-bit
> build on other platforms.  So please make sure it is still built
> conditionally, just possibly with an opt-in for powerpc.

AFAICT it is needed for all 64bit platforms with unified syscall.tbl.

I modified the syscall.tbl for powerpc so the syscall is not needed with a
64bit-only build, but other platforms are still broken. There are a few
platforms that use multiple tables, and on those the 64bit one indeed
does not contain llseek.

Thanks

Michal


Re: [PATCH v3 3/4] powerpc/64: make buildable without CONFIG_COMPAT

2019-08-29 Thread Michal Suchánek
On Wed, 28 Aug 2019 23:46:24 -0700
Christoph Hellwig  wrote:

> On Wed, Aug 28, 2019 at 06:43:50PM +0200, Michal Suchanek wrote:
> > +ifdef CONFIG_COMPAT
> > +obj-y  += sys_ppc32.o ptrace32.o signal_32.o
> > +endif  
> 
> This should be:
> 
> obj-$(CONFIG_COMPAT)  += sys_ppc32.o ptrace32.o signal_32.o

Yes, looks better.
> 
> >  /* This value is used to mark exception frames on the stack. */
> >  exception_marker:
> > diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
> > index 60436432399f..73d0f53ffc1a 100644
> > --- a/arch/powerpc/kernel/signal.c
> > +++ b/arch/powerpc/kernel/signal.c
> > @@ -277,7 +277,7 @@ static void do_signal(struct task_struct *tsk)
> >  
> > rseq_signal_deliver(&ksig, tsk->thread.regs);
> >  
> > -   if (is32) {
> > +   if ((IS_ENABLED(CONFIG_PPC32) || IS_ENABLED(CONFIG_COMPAT)) && is32) {  
> 
> I think we should fix the is_32bit_task definitions instead so that
> callers don't need this mess.  

Yes, that makes sense.

Thanks

Michal


Re: [PATCH rebased] powerpc/fadump: when fadump is supported register the fadump sysfs files.

2019-08-28 Thread Michal Suchánek
On Tue, 27 Aug 2019 17:57:31 +0530
Hari Bathini  wrote:

> Hi Michal,
> 
> Thanks for the patch. 
> 
> On 20/08/19 11:42 PM, Michal Suchanek wrote:
> > Currently it is not possible to distinguish the case when fadump is
> > supported by firmware and disabled in kernel and completely unsupported
> > using the kernel sysfs interface. User can investigate the devicetree
> > but it is more reasonable to provide sysfs files in case we get some
> > fadumpv2 in the future.
> > 
> > With this patch sysfs files are available whenever fadump is supported
> > by firmware.
> > 
> > Signed-off-by: Michal Suchanek 
> > ---
> > Rebase on top of http://patchwork.ozlabs.org/patch/1150160/
> > [v5,31/31] powernv/fadump: support holes in kernel boot memory area
> > ---
> >  arch/powerpc/kernel/fadump.c | 33 ++---
> >  1 file changed, 18 insertions(+), 15 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> > index 4b1bb3c55cf9..7ad424729e9c 100644
> > --- a/arch/powerpc/kernel/fadump.c
> > +++ b/arch/powerpc/kernel/fadump.c
> > @@ -1319,13 +1319,9 @@ static void fadump_init_files(void)
> >   */
> >  int __init setup_fadump(void)
> >  {
> > -   if (!fw_dump.fadump_enabled)
> > -   return 0;
> > -
> > -   if (!fw_dump.fadump_supported) {
> > +   if (!fw_dump.fadump_supported && fw_dump.fadump_enabled) {
> > printk(KERN_ERR "Firmware-assisted dump is not supported on"
> > " this hardware\n");
> > -   return 0;
> > }
> >  
> > fadump_show_config();
> > @@ -1333,19 +1329,26 @@ int __init setup_fadump(void)
> >  * If dump data is available then see if it is valid and prepare for
> >  * saving it to the disk.
> >  */
> > -   if (fw_dump.dump_active) {
> > +   if (fw_dump.fadump_enabled) {
> > +   if (fw_dump.dump_active) {
> > +   /*
> > +* if dump process fails then invalidate the
> > +* registration and release memory before proceeding
> > +* for re-registration.
> > +*/
> > +   if (fw_dump.ops->fadump_process(&fw_dump) < 0)
> > +   fadump_invalidate_release_mem();
> > +   }
> > /*
> > -* if dump process fails then invalidate the registration
> > -* and release memory before proceeding for re-registration.
> > +* Initialize the kernel dump memory structure for FAD
> > +* registration.
> >  */
> > -   if (fw_dump.ops->fadump_process(&fw_dump) < 0)
> > -   fadump_invalidate_release_mem();
> > -   }
> > -   /* Initialize the kernel dump memory structure for FAD registration. */
> > -   else if (fw_dump.reserve_dump_area_size)
> > -   fw_dump.ops->fadump_init_mem_struct(&fw_dump);
> > +   else if (fw_dump.reserve_dump_area_size)
> > +   fw_dump.ops->fadump_init_mem_struct(&fw_dump);
> >  
> > -   fadump_init_files();
> > +   }
> > +   if (fw_dump.fadump_supported)
> > +   fadump_init_files();
> >  
> > return 1;
> >  }
> >   
> 
> 
> Could you please move the fadump_init_files() call up and return after it,
> instead of nesting the rest of the function?

That sounds reasonable.

> Also, get rid of the error message when fadump is not supported, as it is
> already taken care of in the fadump_reserve_mem() function.

That should not be called in that case, should it?

Anyway, the message appears right next to the message about reserving
memory for kdump, so it really looks helpful in the log.

> I mean:
> 
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index 2015b1f..0e9b028 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -1322,16 +1322,16 @@ static void fadump_init_files(void)
>   */
>  int __init setup_fadump(void)
>  {
> -   if (!fw_dump.fadump_enabled)
> +   if (!fw_dump.fadump_supported)
> return 0;
>  
> -   if (!fw_dump.fadump_supported) {
> -   printk(KERN_ERR "Firmware-assisted dump is not supported on"
> -   " this hardware\n");
> -   return 0;
> -   }
> +   fadump_init_files();
>  
> fadump_show_config();
> +
> +   if (!fw_dump.fadump_enabled)
> +   return 0;

Should the init function return 0 when it did something that needs to
be undone later (i.e. registering the sysfs files)? This is probably not
very meaningful for fadump, but what is the correct way to avoid setting
a bad example?

Thanks

Michal


Re: [PATCH 1/4] fs: always build llseek.

2019-08-28 Thread Michal Suchánek
On Wed, 28 Aug 2019 08:15:52 -0700
Christoph Hellwig  wrote:

> On Tue, Aug 27, 2019 at 10:21:06PM +0200, Michal Suchanek wrote:
> > 64bit !COMPAT does not build because the llseek syscall is in the tables.  
> 
> Well, this will bloat things like 64-bit RISC-V for no good reason.
> Please introduce a WANT_LSEEK like symbol that ppc64 can select instead.

It also builds when llseek is marked as 32bit only in syscall.tbl

It seems it was handled specially in some way before syscall.tbl was
added, though (removed in ab66dcc76d6ab8fae9d69d149ae38c42605e7fc5)

Thanks

Michal


Re: [PATCH v2 4/4] powerpc/64: system call implement the bulk of the logic in C

2019-08-28 Thread Michal Suchánek
On Tue, 27 Aug 2019 23:55:48 +1000
Nicholas Piggin  wrote:

> System call entry and particularly exit code is beyond the limit of what
> is reasonable to implement in asm.
> 
> This conversion moves all conditional branches out of the asm code,
> except for the case that all GPRs should be restored at exit.
> 
> Null syscall test is about 5% faster after this patch, because the exit
> work is handled under local_irq_disable, and the hard mask and pending
> interrupt replay is handled after that, which avoids games with MSR.
> 
> Signed-off-by: Nicholas Piggin 
> ---
> Changes since v1:
> - Improve changelog
> - Lot of code cleanups, moving helpers out to proper header locations,
>   etc (Christophe).
> - Split unnecessary change that affected ppc32 out. I will submit it
>   independently (Christophe).
> 
>  arch/powerpc/include/asm/asm-prototypes.h |  11 -
>  .../powerpc/include/asm/book3s/64/kup-radix.h |  12 +-
>  arch/powerpc/include/asm/cputime.h|  22 ++
>  arch/powerpc/include/asm/ptrace.h |   3 +
>  arch/powerpc/include/asm/signal.h |   2 +
>  arch/powerpc/include/asm/switch_to.h  |   5 +
>  arch/powerpc/include/asm/time.h   |   3 +
>  arch/powerpc/kernel/Makefile  |   3 +-
>  arch/powerpc/kernel/entry_64.S| 340 +++---
>  arch/powerpc/kernel/signal.h  |   2 -
>  arch/powerpc/kernel/syscall_64.c  | 177 +
>  11 files changed, 273 insertions(+), 307 deletions(-)
>  create mode 100644 arch/powerpc/kernel/syscall_64.c
> 
> diff --git a/arch/powerpc/include/asm/asm-prototypes.h 
> b/arch/powerpc/include/asm/asm-prototypes.h
> index ec1c97a8e8cb..f00ef8924a99 100644
> --- a/arch/powerpc/include/asm/asm-prototypes.h
> +++ b/arch/powerpc/include/asm/asm-prototypes.h
> @@ -92,14 +92,6 @@ long sys_switch_endian(void);
>  notrace unsigned int __check_irq_replay(void);
>  void notrace restore_interrupts(void);
>  
> -/* ptrace */
> -long do_syscall_trace_enter(struct pt_regs *regs);
> -void do_syscall_trace_leave(struct pt_regs *regs);
> -
> -/* process */
> -void restore_math(struct pt_regs *regs);
> -void restore_tm_state(struct pt_regs *regs);
> -
>  /* prom_init (OpenFirmware) */
>  unsigned long __init prom_init(unsigned long r3, unsigned long r4,
>  unsigned long pp,
> @@ -110,9 +102,6 @@ unsigned long __init prom_init(unsigned long r3, unsigned 
> long r4,
>  void __init early_setup(unsigned long dt_ptr);
>  void early_setup_secondary(void);
>  
> -/* time */
> -void accumulate_stolen_time(void);
> -
>  /* misc runtime */
>  extern u64 __bswapdi2(u64);
>  extern s64 __lshrdi3(s64, int);
> diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h 
> b/arch/powerpc/include/asm/book3s/64/kup-radix.h
> index f254de956d6a..ef2e65ea8a73 100644
> --- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
> @@ -3,6 +3,7 @@
>  #define _ASM_POWERPC_BOOK3S_64_KUP_RADIX_H
>  
>  #include 
> +#include 
>  
>  #define AMR_KUAP_BLOCK_READ  UL(0x4000)
>  #define AMR_KUAP_BLOCK_WRITE UL(0x8000)
> @@ -56,7 +57,16 @@
>  
>  #ifdef CONFIG_PPC_KUAP
>  
> -#include 
> +#include 
> +#include 
> +
> +static inline void kuap_check_amr(void)
> +{
> +#ifdef CONFIG_PPC_KUAP_DEBUG
> + if (mmu_has_feature(MMU_FTR_RADIX_KUAP))
> + WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED);
> +#endif
> +}
>  
>  /*
>   * We support individually allowing read or write, but we don't support 
> nesting
> diff --git a/arch/powerpc/include/asm/cputime.h 
> b/arch/powerpc/include/asm/cputime.h
> index 2431b4ada2fa..f3aa9db1a3cc 100644
> --- a/arch/powerpc/include/asm/cputime.h
> +++ b/arch/powerpc/include/asm/cputime.h
> @@ -60,6 +60,28 @@ static inline void arch_vtime_task_switch(struct 
> task_struct *prev)
>  }
>  #endif
>  
> +static inline void account_cpu_user_entry(void)
> +{
> + unsigned long tb = mftb();
> +
> + get_accounting(current)->utime += (tb - 
> get_accounting(current)->starttime_user);
> + get_accounting(current)->starttime = tb;
> +}
> +static inline void account_cpu_user_exit(void)
> +{
> + unsigned long tb = mftb();
> +
> + get_accounting(current)->stime += (tb - 
> get_accounting(current)->starttime);
> + get_accounting(current)->starttime_user = tb;
> +}
> +
>  #endif /* __KERNEL__ */
> +#else /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
> +static inline void account_cpu_user_entry(void)
> +{
> +}
> +static inline void account_cpu_user_exit(void)
> +{
> +}
>  #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
>  #endif /* __POWERPC_CPUTIME_H */
> diff --git a/arch/powerpc/include/asm/ptrace.h 
> b/arch/powerpc/include/asm/ptrace.h
> index feee1b21bbd5..af363086403a 100644
> --- a/arch/powerpc/include/asm/ptrace.h
> +++ b/arch/powerpc/include/asm/ptrace.h
> @@ -138,6 +138,9 @@ extern unsigned long profile_pc(struct pt_regs *regs);
>  #define profile_pc(regs) 

Re: [PATCH v2 0/4] Disable compat cruft on ppc64le v2

2019-08-28 Thread Michal Suchánek
On Wed, 28 Aug 2019 20:57:48 +1000
Nicholas Piggin  wrote:

> Michal Suchanek's on August 28, 2019 8:30 pm:
> > With endian switch disabled by default the ppc64le compat supports
> > ppc32le only which is something next to nobody has binaries for.
> > 
> > Less code means less bugs so drop the compat stuff.  
> 
> Interesting patches, thanks for looking into it. I don't know much
> about compat and wrong endian userspaces. I think sys_switch_endian
> is enabled though, it's just a strange fast endian swap thing that
> has been disabled by default.
> 
> The first patches look pretty good. Maybe for the last one it could
> become a selectable option?

That sounds good.

> 
> 
> > I am not particularly sure about the best way to resolve the llseek
> > situation. I don't see anything in the syscall tables making it
> > 32bit-only so I suppose it should be available on 64bit as well.  
> 
> It's for 32-bit userspace only. Can we just get rid of it, or is
> there some old broken 64-bit BE userspace that tries to call it?

That sounds like a bug in creating these unified syscall tables then.
On architectures that have split tables the 64bit ones do not have
llseek. On architectures with one table the syscall is marked as common.

Thanks

Michal


Re: [PATCH v2 0/4] Disable compat cruft on ppc64le v2

2019-08-28 Thread Michal Suchánek
On Wed, 28 Aug 2019 13:08:48 +
Christophe Leroy  wrote:

> On 08/28/2019 10:30 AM, Michal Suchanek wrote:
> > With endian switch disabled by default the ppc64le compat supports
> > ppc32le only which is something next to nobody has binaries for.
> > 
> > Less code means less bugs so drop the compat stuff.
> > 
> > I am not particularly sure about the best way to resolve the llseek
> > situation. I don't see anything in the syscall tables making it
> > 32bit-only so I suppose it should be available on 64bit as well.
> > 
> > This is tested on ppc64le top of  
> 
> Really ?

Really. It boots with the unused variable. Whether unused variables are
fatal might depend on some config options or gcc features.

Thanks

Michal


Re: [PATCH v2 3/4] powerpc/64: make buildable without CONFIG_COMPAT

2019-08-28 Thread Michal Suchánek
On Wed, 28 Aug 2019 14:49:16 +0200
Christophe Leroy  wrote:

> Le 28/08/2019 à 12:30, Michal Suchanek a écrit :
> > There are numerous references to 32bit functions in generic and 64bit
> > code so ifdef them out.  
> 
> As far as possible, avoid opting things out with ifdefs. Ref 
> https://www.kernel.org/doc/html/latest/process/coding-style.html#conditional-compilation
> 
> See comment below.
> 
> > 
> > Signed-off-by: Michal Suchanek 
> > ---
> > v2:
> > - fix 32bit ifdef condition in signal.c
> > - simplify the compat ifdef condition in vdso.c - 64bit is redundant
> > - simplify the compat ifdef condition in callchain.c - 64bit is redundant
> > ---
> >   arch/powerpc/include/asm/syscall.h |  2 ++
> >   arch/powerpc/kernel/Makefile   | 15 ---
> >   arch/powerpc/kernel/entry_64.S |  2 ++
> >   arch/powerpc/kernel/signal.c   |  5 +++--
> >   arch/powerpc/kernel/syscall_64.c   |  5 +++--
> >   arch/powerpc/kernel/vdso.c |  4 +++-
> >   arch/powerpc/perf/callchain.c  | 14 ++
> >   7 files changed, 35 insertions(+), 12 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/syscall.h 
> > b/arch/powerpc/include/asm/syscall.h
> > index 38d62acfdce7..3ed3b75541a1 100644
> > --- a/arch/powerpc/include/asm/syscall.h
> > +++ b/arch/powerpc/include/asm/syscall.h
> > @@ -16,7 +16,9 @@
> >   
> >   /* ftrace syscalls requires exporting the sys_call_table */
> >   extern const unsigned long sys_call_table[];
> > +#ifdef CONFIG_COMPAT
> >   extern const unsigned long compat_sys_call_table[];
> > +#endif  
> 
> Leaving the declaration should be harmless.

Yes, removing it only allows an earlier check that the table is not used.

> 
> >   
> >   static inline int syscall_get_nr(struct task_struct *task, struct pt_regs 
> > *regs)
> >   {
> > diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> > index 1d646a94d96c..b0db365b83d8 100644
> > --- a/arch/powerpc/kernel/Makefile
> > +++ b/arch/powerpc/kernel/Makefile
> > @@ -44,16 +44,25 @@ CFLAGS_btext.o += -DDISABLE_BRANCH_PROFILING
> >   endif
> >   
> >   obj-y := cputable.o ptrace.o syscalls.o \
> > -  irq.o align.o signal_32.o pmc.o vdso.o \
> > +  irq.o align.o pmc.o vdso.o \
> >process.o systbl.o idle.o \
> >signal.o sysfs.o cacheinfo.o time.o \
> >prom.o traps.o setup-common.o \
> >udbg.o misc.o io.o misc_$(BITS).o \
> >of_platform.o prom_parse.o
> > -obj-$(CONFIG_PPC64)+= setup_64.o sys_ppc32.o \
> > -  signal_64.o ptrace32.o \
> > +ifndef CONFIG_PPC64
> > +obj-y  += signal_32.o
> > +else
> > +ifdef CONFIG_COMPAT
> > +obj-y  += signal_32.o
> > +endif
> > +endif
> > +obj-$(CONFIG_PPC64)+= setup_64.o signal_64.o \
> >paca.o nvram_64.o firmware.o \
> >syscall_64.o  
> 
> That's still a bit messy. You could have:
> 
> obj-y += signal_$(BITS).o
> obj-$(CONFIG_COMPAT) += signal_32.o
> 
> > +ifdef CONFIG_COMPAT
> > +obj-$(CONFIG_PPC64)+= sys_ppc32.o ptrace32.o
> > +endif  
> 
> AFAIK, CONFIG_COMPAT is only defined when CONFIG_PP64 is defined, so 
> could be:
> 
> obj-$(CONFIG_COMPAT)  += sys_ppc32.o ptrace32.o
> 
> And could be grouped with the above signal_32.o
> 

Looks better.

> 
> >   obj-$(CONFIG_VDSO32)  += vdso32/
> >   obj-$(CONFIG_PPC_WATCHDOG)+= watchdog.o
> >   obj-$(CONFIG_HAVE_HW_BREAKPOINT)  += hw_breakpoint.o
> > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> > index 2ec825a85f5b..a2dbf216f607 100644
> > --- a/arch/powerpc/kernel/entry_64.S
> > +++ b/arch/powerpc/kernel/entry_64.S
> > @@ -51,8 +51,10 @@
> >   SYS_CALL_TABLE:
> > .tc sys_call_table[TC],sys_call_table
> >   
> > +#ifdef CONFIG_COMPAT
> >   COMPAT_SYS_CALL_TABLE:
> > .tc compat_sys_call_table[TC],compat_sys_call_table
> > +#endif  
> 
> Can we avoid this ifdef ?

AFAICT it creates a reference to a non-existent table otherwise.

> 
> >   
> >   /* This value is used to mark exception frames on the stack. */
> >   exception_marker:
> > diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
> > index 60436432399f..ffd045e9fb57 100644
> > --- a/arch/powerpc/kernel/signal.c
> > +++ b/arch/powerpc/kernel/signal.c
> > @@ -277,14 +277,15 @@ static void do_signal(struct task_struct *tsk)
> >   
> > rseq_signal_deliver(&ksig, tsk->thread.regs);
> >   
> > +#if !defined(CONFIG_PPC64) || defined(CONFIG_COMPAT)
> > if (is32) {
> > if (ksig.ka.sa.sa_flags & SA_SIGINFO)
> > ret = handle_rt_signal32(&ksig, oldset, tsk);
> > else
> > ret = handle_signal32(&ksig, oldset, tsk);
> > 

Re: [PATCH 3/4] powerpc/64: make buildable without CONFIG_COMPAT

2019-08-27 Thread Michal Suchánek
On Tue, 27 Aug 2019 22:21:08 +0200
Michal Suchanek  wrote:

> There are numerous references to 32bit functions in generic and 64bit
> code so ifdef them out.
> 
> Signed-off-by: Michal Suchanek 
> ---
>  arch/powerpc/include/asm/syscall.h |  2 ++
>  arch/powerpc/kernel/Makefile   | 15 ---
>  arch/powerpc/kernel/entry_64.S |  2 ++
>  arch/powerpc/kernel/signal.c   |  5 +++--
>  arch/powerpc/kernel/syscall_64.c   |  5 +++--
>  arch/powerpc/kernel/vdso.c |  4 +++-
>  arch/powerpc/perf/callchain.c  | 14 ++
>  7 files changed, 35 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/syscall.h 
> b/arch/powerpc/include/asm/syscall.h
> index 38d62acfdce7..3ed3b75541a1 100644
> --- a/arch/powerpc/include/asm/syscall.h
> +++ b/arch/powerpc/include/asm/syscall.h
> @@ -16,7 +16,9 @@
>  
>  /* ftrace syscalls requires exporting the sys_call_table */
>  extern const unsigned long sys_call_table[];
> +#ifdef CONFIG_COMPAT
>  extern const unsigned long compat_sys_call_table[];
> +#endif
>  
>  static inline int syscall_get_nr(struct task_struct *task, struct pt_regs 
> *regs)
>  {
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index 1d646a94d96c..b0db365b83d8 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -44,16 +44,25 @@ CFLAGS_btext.o += -DDISABLE_BRANCH_PROFILING
>  endif
>  
>  obj-y:= cputable.o ptrace.o syscalls.o \
> -irq.o align.o signal_32.o pmc.o vdso.o \
> +irq.o align.o pmc.o vdso.o \
>  process.o systbl.o idle.o \
>  signal.o sysfs.o cacheinfo.o time.o \
>  prom.o traps.o setup-common.o \
>  udbg.o misc.o io.o misc_$(BITS).o \
>  of_platform.o prom_parse.o
> -obj-$(CONFIG_PPC64)  += setup_64.o sys_ppc32.o \
> -signal_64.o ptrace32.o \
> +ifndef CONFIG_PPC64
> +obj-y+= signal_32.o
> +else
> +ifdef CONFIG_COMPAT
> +obj-y+= signal_32.o
> +endif
> +endif
> +obj-$(CONFIG_PPC64)  += setup_64.o signal_64.o \
>  paca.o nvram_64.o firmware.o \
>  syscall_64.o
> +ifdef CONFIG_COMPAT
> +obj-$(CONFIG_PPC64)  += sys_ppc32.o ptrace32.o
> +endif
>  obj-$(CONFIG_VDSO32) += vdso32/
>  obj-$(CONFIG_PPC_WATCHDOG)   += watchdog.o
>  obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 2ec825a85f5b..a2dbf216f607 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -51,8 +51,10 @@
>  SYS_CALL_TABLE:
>   .tc sys_call_table[TC],sys_call_table
>  
> +#ifdef CONFIG_COMPAT
>  COMPAT_SYS_CALL_TABLE:
>   .tc compat_sys_call_table[TC],compat_sys_call_table
> +#endif
>  
>  /* This value is used to mark exception frames on the stack. */
>  exception_marker:
> diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
> index 60436432399f..71f4f1794f86 100644
> --- a/arch/powerpc/kernel/signal.c
> +++ b/arch/powerpc/kernel/signal.c
> @@ -277,14 +277,15 @@ static void do_signal(struct task_struct *tsk)
>  
> rseq_signal_deliver(&ksig, tsk->thread.regs);
>  
> +#ifdef CONFIG_COMPAT
This is common code so it needs 
#if !defined(CONFIG_PPC64) || defined(CONFIG_COMPAT)
>   if (is32) {
>   if (ksig.ka.sa.sa_flags & SA_SIGINFO)
> ret = handle_rt_signal32(&ksig, oldset, tsk);
>   else
> ret = handle_signal32(&ksig, oldset, tsk);
> - } else {
> + } else
> +#endif /* CONFIG_COMPAT */
> ret = handle_rt_signal64(&ksig, oldset, tsk);
> - }
>  
>   tsk->thread.regs->trap = 0;
> signal_setup_done(ret, &ksig, test_thread_flag(TIF_SINGLESTEP));
> diff --git a/arch/powerpc/kernel/syscall_64.c 
> b/arch/powerpc/kernel/syscall_64.c
> index 98ed970796d5..3f48262b512d 100644
> --- a/arch/powerpc/kernel/syscall_64.c
> +++ b/arch/powerpc/kernel/syscall_64.c
> @@ -100,6 +100,7 @@ long system_call_exception(long r3, long r4, long r5, 
> long r6, long r7, long r8,
>   /* May be faster to do array_index_nospec? */
>   barrier_nospec();
>  
> +#ifdef CONFIG_COMPAT
>   if (unlikely(ti_flags & _TIF_32BIT)) {
>   f = (void *)compat_sys_call_table[r0];
>  
> @@ -110,9 +111,9 @@ long system_call_exception(long r3, long r4, long r5, 
> long r6, long r7, long r8,
>   r7 &= 0xULL;
>   r8 &= 0xULL;
>  
> - } else {
> + } else
> +#endif /* CONFIG_COMPAT */
>   f = (void *)sys_call_table[r0];
> - }
>  
>   return f(r3, r4, r5, r6, r7, r8);
>  }
> diff --git 

Re: [PATCH] powerpc: Add barrier_nospec to raw_copy_in_user()

2019-08-21 Thread Michal Suchánek
On Wed,  6 Mar 2019 12:10:38 +1100
Suraj Jitindar Singh  wrote:

> Commit ddf35cf3764b ("powerpc: Use barrier_nospec in copy_from_user()")
> added barrier_nospec before loading from user-controlled pointers.
> The intention was to order the load from the potentially user-controlled
> pointer vs a previous branch based on an access_ok() check or similar.
> 
> In order to achieve the same result, add a barrier_nospec to the
> raw_copy_in_user() function before loading from such a user-controlled
> pointer.
> 
> Signed-off-by: Suraj Jitindar Singh 
> ---
>  arch/powerpc/include/asm/uaccess.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/powerpc/include/asm/uaccess.h 
> b/arch/powerpc/include/asm/uaccess.h
> index e3a731793ea2..bb615592d5bb 100644
> --- a/arch/powerpc/include/asm/uaccess.h
> +++ b/arch/powerpc/include/asm/uaccess.h
> @@ -306,6 +306,7 @@ extern unsigned long __copy_tofrom_user(void __user *to,
>  static inline unsigned long
>  raw_copy_in_user(void __user *to, const void __user *from, unsigned long n)
>  {
> + barrier_nospec();
>   return __copy_tofrom_user(to, from, n);
>  }
>  #endif /* __powerpc64__ */

Hello,

AFAICT this is not needed. The data cannot be used in the kernel without
subsequent copy_from_user which already has the nospec barrier.

Thanks

Michal


Re: [DOC][PATCH v5 1/4] powerpc: Document some HCalls for Storage Class Memory

2019-07-24 Thread Michal Suchánek
On Wed, 24 Jul 2019 11:08:58 +0200
Laurent Dufour  wrote:

> Le 23/07/2019 à 18:13, Vaibhav Jain a écrit :
> > This doc patch provides an initial description of the HCall op-codes
> > that are used by Linux kernel running as a guest operating
> > system (LPAR) on top of PowerVM or any other sPAPR compliant
> > hyper-visor (e.g qemu).
> > 
> > Apart from documenting the HCalls the doc-patch also provides a
> > rudimentary overview of how Hcalls are implemented inside the Linux
> > kernel and how information flows between kernel and PowerVM/KVM.  
> 
> Hi Vaibhav,
> 
> That's a good idea to introduce such a documentation.
> 
> > Signed-off-by: Vaibhav Jain 
> > ---
> > Change-log:
> > 
> > v5
> > * First patch in this patchset.
> > ---
> >   Documentation/powerpc/hcalls.txt | 140 +++
> >   1 file changed, 140 insertions(+)
> >   create mode 100644 Documentation/powerpc/hcalls.txt
> > 
> > diff --git a/Documentation/powerpc/hcalls.txt 
> > b/Documentation/powerpc/hcalls.txt
> > new file mode 100644
> > index ..cc9dd872cecd
> > --- /dev/null
> > +++ b/Documentation/powerpc/hcalls.txt
> > @@ -0,0 +1,140 @@
> > +Hyper-visor Call Op-codes (HCALLS)
> > +
> > +
> > +Overview
> > +=
> > +
> > +Virtualization on PPC64 arch is based on the PAPR specification[1] which
> > +describes run-time environment for a guest operating system and how it 
> > should
> > +interact with the hyper-visor for privileged operations. Currently there 
> > are two
> > +PAPR compliant hypervisors (PHYP):
> > +
> > +IBM PowerVM: IBM's proprietary hyper-visor that supports AIX, IBM-i and 
> > Linux as
> > +supported guests (termed as Logical Partitions or LPARS).
> > +
> > +Qemu/KVM:Supports PPC64 linux guests running on a PPC64 linux host.
> > +
> > +On PPC64 arch a virtualized guest kernel runs in a non-privileged mode 
> > (HV=0).
> > +Hence to perform a privileged operations the guest issues a Hyper-visor
> > +Call (HCALL) with necessary input operands. PHYP after performing the 
> > privilege
> > +operation returns a status code and output operands back to the guest.
> > +
> > +HCALL ABI
> > +=
> > +The ABI specification for a HCall between guest os kernel and PHYP is
> > +described in [1]. The Opcode for Hcall is set in R3 and subsequent 
> > in-arguments
> > +for the Hcall are provided in registers R4-R12. On return from 'HVCS'
> > +instruction the status code of HCall is available in R3 an the output 
> > parameters
> > +are returned in registers R4-R12.  
> 
> Would it be good to mention that values passed through the memory must be 
> stored in Big Endian format ?
> 
> > +Powerpc arch code provides convenient wrappers named plpar_hcall_xxx 
> > defined in
> > +header 'hvcall.h' to issue HCalls from the linux kernel running as guest.
> > +
> > +
> > +DRC & DRC Indexes
> > +=
> > +
> > +PAPRGuest
> > +  DR1  Hypervisor OS
> > +  +--++--+ +-+
> > +  |  |<-->|  | |  User   |
> > +  +--+  DRC1  |  |   DRC   |  Space  |
> > + |  |  Index  +-+
> > +  DR2 |  | | |
> > +  +--+|  |<--->|  Kernel |
> > +  |  |<- >|  |  HCall  | |
> > +  +--+  DRC2  +--+ +-+
> > +
> > +PHYP terms shared hardware resources like PCI devices, NVDimms etc 
> > available for
> > +use by LPARs as Dynamic Resource (DR). When a DR is allocated to an LPAR, 
> > PHYP
> > +creates a data-structure called Dynamic Resource Connector (DRC) to manage 
> > LPAR
> > +access. An LPAR refers to a DRC via an opaque 32-bit number called 
> > DRC-Index.
> > +The DRC-index value is provided to the LPAR via device-tree where its 
> > present
> > +as an attribute in the device tree node associated with the DR.  
> 
> Should you use the term 'Hypervisor' instead of 'PHYP' which is not usually 
> designing only the proprietary one ?
> 
> Thanks,
> Laurent.
> 
> > +
> > +HCALL Op-codes
> > +==
> > +
> > +Below is a partial of of HCALLs that are supported by PHYP. For the
^^ list?

Thanks

Michal
> > +corresponding opcode values please look into the header
> > +'arch/powerpc/include/asm/hvcall.h' :
> > +
> > +* H_SCM_READ_METADATA:
> > +  Input: drcIndex, offset, buffer-address, numBytesToRead
> > +  Out: None
> > +  Description:
> > +  Given a DRC Index of an NVDimm, read N-bytes from the the meta data area
> > +  associated with it, at a specified offset and copy it to provided buffer.
> > +  The metadata area stores configuration information such as label 
> > information,
> > +  bad-blocks etc. The metadata area is located out-of-band of NVDimm 
> > storage
> > +  area hence a separate access semantics is provided.
> > +
> > +* H_SCM_WRITE_METADATA:
> > +  Input: drcIndex, offset, data, numBytesToWrite
> > + 

Re: [PATCH 2/2] powerpc: avoid adjusting memory_limit for capture kernel memory reservation

2019-07-22 Thread Michal Suchánek
On Fri, 28 Jun 2019 00:51:19 +0530
Hari Bathini  wrote:

> Currently, if memory_limit is specified and it overlaps with memory to
> be reserved for capture kernel, memory_limit is adjusted to accommodate
> capture kernel. With memory reservation for capture kernel moved later
> (after enforcing memory limit), this adjustment no longer holds water.
> So, avoid adjusting memory_limit and error out instead.

Can you split the memory limit adjustment out of the memory reservation
so it can still be adjusted?

Thanks

Michal
> 
> Signed-off-by: Hari Bathini 
> ---
>  arch/powerpc/kernel/fadump.c|   16 
>  arch/powerpc/kernel/machine_kexec.c |   22 +++---
>  2 files changed, 11 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index 4eab972..a784695 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -476,22 +476,6 @@ int __init fadump_reserve_mem(void)
>  #endif
>   }
>  
> - /*
> -  * Calculate the memory boundary.
> -  * If memory_limit is less than actual memory boundary then reserve
> -  * the memory for fadump beyond the memory_limit and adjust the
> -  * memory_limit accordingly, so that the running kernel can run with
> -  * specified memory_limit.
> -  */
> - if (memory_limit && memory_limit < memblock_end_of_DRAM()) {
> - size = get_fadump_area_size();
> - if ((memory_limit + size) < memblock_end_of_DRAM())
> - memory_limit += size;
> - else
> - memory_limit = memblock_end_of_DRAM();
> - printk(KERN_INFO "Adjusted memory_limit for firmware-assisted"
> - " dump, now %#016llx\n", memory_limit);
> - }
>   if (memory_limit)
>   memory_boundary = memory_limit;
>   else
> diff --git a/arch/powerpc/kernel/machine_kexec.c 
> b/arch/powerpc/kernel/machine_kexec.c
> index c4ed328..fc5533b 100644
> --- a/arch/powerpc/kernel/machine_kexec.c
> +++ b/arch/powerpc/kernel/machine_kexec.c
> @@ -125,10 +125,8 @@ void __init reserve_crashkernel(void)
>   crashk_res.end = crash_base + crash_size - 1;
>   }
>  
> - if (crashk_res.end == crashk_res.start) {
> - crashk_res.start = crashk_res.end = 0;
> - return;
> - }
> + if (crashk_res.end == crashk_res.start)
> + goto error_out;
>  
>   /* We might have got these values via the command line or the
>* device tree, either way sanitise them now. */
> @@ -170,15 +168,13 @@ void __init reserve_crashkernel(void)
>   if (overlaps_crashkernel(__pa(_stext), _end - _stext)) {
>   printk(KERN_WARNING
>   "Crash kernel can not overlap current kernel\n");
> - crashk_res.start = crashk_res.end = 0;
> - return;
> + goto error_out;
>   }
>  
>   /* Crash kernel trumps memory limit */
>   if (memory_limit && memory_limit <= crashk_res.end) {
> - memory_limit = crashk_res.end + 1;
> - printk("Adjusted memory limit for crashkernel, now 0x%llx\n",
> -memory_limit);
> + pr_err("Crash kernel size can't exceed memory_limit\n");
> + goto error_out;
>   }
>  
>   printk(KERN_INFO "Reserving %ldMB of memory at %ldMB "
> @@ -190,9 +186,13 @@ void __init reserve_crashkernel(void)
>   if (!memblock_is_region_memory(crashk_res.start, crash_size) ||
>   memblock_reserve(crashk_res.start, crash_size)) {
>   pr_err("Failed to reserve memory for crashkernel!\n");
> - crashk_res.start = crashk_res.end = 0;
> - return;
> + goto error_out;
>   }
> +
> + return;
> +error_out:
> + crashk_res.start = crashk_res.end = 0;
> + return;
>  }
>  
>  int overlaps_crashkernel(unsigned long start, unsigned long size)
> 



Re: [PATCH v3] tpm: tpm_ibm_vtpm: Fix unallocated banks

2019-07-16 Thread Michal Suchánek
On Fri, 12 Jul 2019 00:13:57 +0300
Jarkko Sakkinen  wrote:

> On Thu, Jul 11, 2019 at 11:28:24PM +0300, Jarkko Sakkinen wrote:
> > On Thu, Jul 11, 2019 at 12:13:35PM -0400, Nayna Jain wrote:  
> > > The nr_allocated_banks and allocated banks are initialized as part of
> > > tpm_chip_register. Currently, this is done as part of auto startup
> > > function. However, some drivers, like the ibm vtpm driver, do not run
> > > auto startup during initialization. This results in uninitialized memory
> > > issue and causes a kernel panic during boot.
> > > 
> > > This patch moves the pcr allocation outside the auto startup function
> > > into tpm_chip_register. This ensures that allocated banks are initialized
> > > in any case.
> > > 
> > > Fixes: 879b589210a9 ("tpm: retrieve digest size of unknown algorithms with
> > > PCR read")
> > > Reported-by: Michal Suchanek 
> > > Signed-off-by: Nayna Jain 
> > > Reviewed-by: Mimi Zohar 
> > > Tested-by: Sachin Sant 
> > > Tested-by: Michal Suchánek   
> > 
> > Reviewed-by: Jarkko Sakkinen   
> 
> Thanks a lot! It is applied now.

Fixes the issue for me.

Thanks

Michal


Re: [PATCH] tpm: fixes uninitialized allocated banks for IBM vtpm driver

2019-07-05 Thread Michal Suchánek
On Fri, 05 Jul 2019 13:42:18 +0300
Jarkko Sakkinen  wrote:

> On Wed, 2019-07-03 at 23:32 -0400, Nayna Jain wrote:
> > The nr_allocated_banks and allocated banks are initialized as part of
> > tpm_chip_register. Currently, this is done as part of auto startup
> > function. However, some drivers, like the ibm vtpm driver, do not run
> > auto startup during initialization. This results in uninitialized memory
> > issue and causes a kernel panic during boot.
> > 
> > This patch moves the pcr allocation outside the auto startup function
> > into tpm_chip_register. This ensures that allocated banks are initialized
> > in any case.
> > 
> > Fixes: 879b589210a9 ("tpm: retrieve digest size of unknown algorithms with
> > PCR read")
> > Signed-off-by: Nayna Jain   
> 
> Please add
> 
> Reported-by: Michal Suchanek 
> 
> It is missing. Michal is there a chance you could try this out once
> Nayna send a new version?

Should be easy to test. Without the patch the machine crashes on boot,
so it is quite an obvious case. I have a few VMs with the vTPM available.

Thanks

Michal


Re: [PATCH] tpm: fixes uninitialized allocated banks for IBM vtpm driver

2019-07-04 Thread Michal Suchánek
On Thu, 4 Jul 2019 19:26:36 +0530
Sachin Sant  wrote:

> > On 04-Jul-2019, at 5:29 PM, Mimi Zohar  wrote:
> > 
> > On Wed, 2019-07-03 at 23:32 -0400, Nayna Jain wrote:  
> >> The nr_allocated_banks and allocated banks are initialized as part of
> >> tpm_chip_register. Currently, this is done as part of auto startup
> >> function. However, some drivers, like the ibm vtpm driver, do not run
> >> auto startup during initialization. This results in uninitialized memory
> >> issue and causes a kernel panic during boot.
> >> 
> >> This patch moves the pcr allocation outside the auto startup function
> >> into tpm_chip_register. This ensures that allocated banks are initialized
> >> in any case.
> >> 
> >> Fixes: 879b589210a9 ("tpm: retrieve digest size of unknown algorithms with
> >> PCR read")
> >> Signed-off-by: Nayna Jain   
> > Reviewed-by: Mimi Zohar   
> 
> Thanks for the fix. Kernel boots fine with this fix.
> 
> Tested-by: Sachin Sant 
> 

Tested-by: Michal Suchánek 

Thanks

Michal


Re: Kernel Bug/oops during boot (PowerVM LPAR w/vTPM)

2019-07-03 Thread Michal Suchánek
On Wed, 3 Jul 2019 13:17:16 +0200
Michal Suchánek  wrote:

> On Wed, 3 Jul 2019 15:54:59 +0530
> Sachin Sant  wrote:
> 
> > Booting mainline kernel on PowerVM LPAR with vTPM enabled results
> > into a kernel crash.
> > 
> > [0.365989] BUG: Kernel NULL pointer dereference at 0x0012
...
> > [0.366085] NIP [c073dd80] tpm1_pcr_extend+0x130/0x230
> > [0.366090] LR [c073dcd0] tpm1_pcr_extend+0x80/0x230
...
> 
> You need to revert (or fix up) commit 0b6cf6b97b7e ("tpm: pass an array
> of tpm_extend_digest structures to tpm_pcr_extend()". At least
> reverting it fixes the issue for me.

FTR this is the revert on lkml https://lkml.org/lkml/2019/7/1/423

> 
> Thanks
> 
> Michal
> 



Re: Kernel Bug/oops during boot (PowerVM LPAR w/vTPM)

2019-07-03 Thread Michal Suchánek
On Wed, 3 Jul 2019 15:54:59 +0530
Sachin Sant  wrote:

> Booting mainline kernel on PowerVM LPAR with vTPM enabled results
> into a kernel crash.
> 
> [0.365989] BUG: Kernel NULL pointer dereference at 0x0012
> [0.365995] Faulting instruction address: 0xc073dd80
> [0.366000] Oops: Kernel access of bad area, sig: 11 [#1]
> [0.366005] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [0.366010] Modules linked in:
> [0.366015] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 
> 5.2.0-rc7-autotest-autotest #1
> [0.366020] NIP:  c073dd80 LR: c073dcd0 CTR: 
> 
> [0.366026] REGS: c018ed8e7550 TRAP: 0380   Not tainted  
> (5.2.0-rc7-autotest-autotest)
> [0.366031] MSR:  80009033   CR: 28002448  
> XER: 20040001
> [0.366038] CFAR: c0b6d1c4 IRQMASK: 0
> [0.366038] GPR00: c073dcd0 c018ed8e77e0 c1524a00 
> 
> [0.366038] GPR04: 0003  0001 
> 000e
> [0.366038] GPR08: 0022 0001 c018e551 
> 883bfecd
> [0.366038] GPR12: 48002448 c0001ec6ee00 c00107a8 
> 
> [0.366038] GPR16:    
> 
> [0.366038] GPR20:    
> 
> [0.366038] GPR24: c018eb9eaaa0  c0bce810 
> c0e2ed28
> [0.366038] GPR28: c018e70a4000 000a 0012 
> c018e551
> [0.366085] NIP [c073dd80] tpm1_pcr_extend+0x130/0x230
> [0.366090] LR [c073dcd0] tpm1_pcr_extend+0x80/0x230
> [0.366094] Call Trace:
> [0.366098] [c018ed8e77e0] [c073dcd0] 
> tpm1_pcr_extend+0x80/0x230 (unreliable)
> [0.366105] [c018ed8e7890] [c073c8c4] tpm_pcr_extend+0xd4/0x180
> [0.366111] [c018ed8e78d0] [c05745f8] 
> ima_add_template_entry+0x198/0x320
> [0.366117] [c018ed8e79b0] [c0577058] 
> ima_store_template+0xc8/0x160
> [0.366124] [c018ed8e7a30] [c0f6081c] 
> ima_add_boot_aggregate+0xf8/0x184
> [0.366130] [c018ed8e7b30] [c0f6093c] ima_init+0x94/0xbc
> [0.366135] [c018ed8e7b90] [c0f60aac] init_ima+0x44/0xe8
> [0.366140] [c018ed8e7c10] [c0010448] 
> do_one_initcall+0x68/0x2c0
> [0.366146] [c018ed8e7ce0] [c0f14738] 
> kernel_init_freeable+0x318/0x47c
> [0.366152] [c018ed8e7db0] [c00107c4] kernel_init+0x24/0x150
> [0.366158] [c018ed8e7e20] [c000ba54] 
> ret_from_kernel_thread+0x5c/0x68
> [0.366163] Instruction dump:
> [0.366167] 7d404d2c 81210068 792a07e1 e9410070 392a0002 7d004c2c 79070020 
> 40c20048
> [0.366174] 39080014 3d21 7f884840 419d00a4 <807e> 809e0004 
> 80be0008 80de000c
> [0.366182] ---[ end trace ec40127c4fe87b2c ]---
> 
> Thanks
> -Sachin

You need to revert (or fix up) commit 0b6cf6b97b7e ("tpm: pass an array
of tpm_extend_digest structures to tpm_pcr_extend()". At least
reverting it fixes the issue for me.

Thanks

Michal



Re: Build failure in v4.4.y.queue (ppc:allmodconfig)

2019-05-09 Thread Michal Suchánek
On Thu, 9 May 2019 07:06:32 -0700
Guenter Roeck  wrote:

> On 5/9/19 6:26 AM, Michal Suchánek wrote:
> > On Thu, 9 May 2019 06:07:31 -0700
> > Guenter Roeck  wrote:
> >   
> >> On 5/9/19 2:49 AM, Michal Suchánek wrote:  
> >>> On Thu, 9 May 2019 08:53:24 +0200
> >>> Greg Kroah-Hartman  wrote:
> >>>  
> >>>> On Wed, May 08, 2019 at 01:26:42PM -0700, Guenter Roeck wrote:  
> >>>>> I see multiple instances of:
> >>>>>
> >>>>> arch/powerpc/kernel/exceptions-64s.S:839: Error:
> >>>>> attempt to move .org backwards
> >>>>>
> >>>>> in v4.4.y.queue (v4.4.179-143-gc4db218e9451).
> >>>>>
> >>>>> This is due to commit 9b2d4e06d7f1 ("powerpc/64s: Add support for a 
> >>>>> store
> >>>>> forwarding barrier at kernel entry/exit"), which is part of a large 
> >>>>> patch
> >>>>> series and can not easily be reverted.
> >>>>>
> >>>>> Guess I'll stop doing ppc:allmodconfig builds in v4.4.y ?  
> >>>>
> >>>> Michael, I thought this patch series was supposed to fix ppc issues, not
> >>>> add to them :)
> >>>>
> >>>> Any ideas on what to do here?  
> >>>
> >>> What exact code do you build?
> >>> 
> >> $ make ARCH=powerpc CROSS_COMPILE=powerpc64-linux- allmodconfig
> >> $ powerpc64-linux-gcc --version
> >> powerpc64-linux-gcc (GCC) 8.3.0
> >>  
> > 
> > Gcc should not see this file. I am asking because I do not see an .org
> > directive at line 839 of 4.4.179. I probably need some different repo
> > or extra patches to see the same code as you do.
> >   
> v4.4.179-143-gc4db218e9451 from
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> branch linux-4.4.y

Still don't see it. That branch is at 4.4.179 and c4db218e9451 does not
exist after fetching from the repo.

Anyway, here is a patch (untested):

Subject: [PATCH] Move out-of-line exception handlers after relon exception
 handlers.

The relon exception handlers need to be at a specific location, and code
inflation in the common handler code can cause:

Error: attempt to move .org backwards

Signed-off-by: Michal Suchanek 
---
 arch/powerpc/kernel/exceptions-64s.S | 88 ++--
 1 file changed, 44 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 938a30fef031..1d477d21ff09 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -772,50 +772,6 @@ kvmppc_skip_Hinterrupt:
b   .
 #endif
 
-/*
- * Code from here down to __end_handlers is invoked from the
- * exception prologs above.  Because the prologs assemble the
- * addresses of these handlers using the LOAD_HANDLER macro,
- * which uses an ori instruction, these handlers must be in
- * the first 64k of the kernel image.
- */
-
-/*** Common interrupt handlers ***/
-
-   STD_EXCEPTION_COMMON(0x100, system_reset, system_reset_exception)
-
-   STD_EXCEPTION_COMMON_ASYNC(0x500, hardware_interrupt, do_IRQ)
-   STD_EXCEPTION_COMMON_ASYNC(0x900, decrementer, timer_interrupt)
-   STD_EXCEPTION_COMMON(0x980, hdecrementer, hdec_interrupt)
-#ifdef CONFIG_PPC_DOORBELL
-   STD_EXCEPTION_COMMON_ASYNC(0xa00, doorbell_super, doorbell_exception)
-#else
-   STD_EXCEPTION_COMMON_ASYNC(0xa00, doorbell_super, unknown_exception)
-#endif
-   STD_EXCEPTION_COMMON(0xb00, trap_0b, unknown_exception)
-   STD_EXCEPTION_COMMON(0xd00, single_step, single_step_exception)
-   STD_EXCEPTION_COMMON(0xe00, trap_0e, unknown_exception)
-   STD_EXCEPTION_COMMON(0xe40, emulation_assist, 
emulation_assist_interrupt)
-   STD_EXCEPTION_COMMON_ASYNC(0xe60, hmi_exception, handle_hmi_exception)
-#ifdef CONFIG_PPC_DOORBELL
-   STD_EXCEPTION_COMMON_ASYNC(0xe80, h_doorbell, doorbell_exception)
-#else
-   STD_EXCEPTION_COMMON_ASYNC(0xe80, h_doorbell, unknown_exception)
-#endif
-   STD_EXCEPTION_COMMON_ASYNC(0xf00, performance_monitor, 
performance_monitor_exception)
-   STD_EXCEPTION_COMMON(0x1300, instruction_breakpoint, 
instruction_breakpoint_exception)
-   STD_EXCEPTION_COMMON(0x1502, denorm, unknown_exception)
-#ifdef CONFIG_ALTIVEC
-   STD_EXCEPTION_COMMON(0x1700, altivec_assist, altivec_assist_exception)
-#else
-   STD_EXCEPTION_COMMON(0x1700, altivec_assist, unknown_exception)
-#endif
-#ifdef CONFIG_CBE_RAS
-   STD_EXCEPTION_COMMON(0x1200, cbe_system_error, 
cbe_system_error_exception)
-   STD_EXCEPTION_COMMON(0x1600, cbe_maintenance, cbe_maintenance_exception)
-   STD_EXCEPTION

Re: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-09 Thread Michal Suchánek
On Thu,  9 May 2019 14:19:23 +0200
Petr Mladek  wrote:

> The commit 3e5903eb9cff70730 ("vsprintf: Prevent crash when dereferencing
> invalid pointers") broke boot on several architectures. The common
> pattern is that probe_kernel_read() is not working during early
> boot because userspace access framework is not ready.
> 
> The check is only the best effort. Let's not rush with it during
> the early boot.
> 
> Details:
> 
> 1. Report on Power:
> 
> Kernel crashes very early during boot with with CONFIG_PPC_KUAP and
> CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG
> 
> The problem is the combination of some new code called via printk(),
> check_pointer() which calls probe_kernel_read(). That then calls
> allow_user_access() (PPC_KUAP) and that uses mmu_has_feature() too early
> (before we've patched features).

There is early_mmu_has_feature for this case. mmu_has_feature does not
work before patching, so parts of the kernel that can run before patching
must use the early_ variant, which actually reads the feature bitmap
to determine the answer.

Thanks

Michal


Re: Build failure in v4.4.y.queue (ppc:allmodconfig)

2019-05-09 Thread Michal Suchánek
On Thu, 9 May 2019 06:07:31 -0700
Guenter Roeck  wrote:

> On 5/9/19 2:49 AM, Michal Suchánek wrote:
> > On Thu, 9 May 2019 08:53:24 +0200
> > Greg Kroah-Hartman  wrote:
> >   
> >> On Wed, May 08, 2019 at 01:26:42PM -0700, Guenter Roeck wrote:  
> >>> I see multiple instances of:
> >>>
> >>> arch/powerpc/kernel/exceptions-64s.S:839: Error:
> >>>   attempt to move .org backwards
> >>>
> >>> in v4.4.y.queue (v4.4.179-143-gc4db218e9451).
> >>>
> >>> This is due to commit 9b2d4e06d7f1 ("powerpc/64s: Add support for a store
> >>> forwarding barrier at kernel entry/exit"), which is part of a large patch
> >>> series and can not easily be reverted.
> >>>
> >>> Guess I'll stop doing ppc:allmodconfig builds in v4.4.y ?  
> >>
> >> Michael, I thought this patch series was supposed to fix ppc issues, not
> >> add to them :)
> >>
> >> Any ideas on what to do here?  
> > 
> > What exact code do you build?
> >  
> $ make ARCH=powerpc CROSS_COMPILE=powerpc64-linux- allmodconfig
> $ powerpc64-linux-gcc --version
> powerpc64-linux-gcc (GCC) 8.3.0
>

Gcc should not see this file. I am asking because I do not see an .org
directive at line 839 of 4.4.179. I probably need some different repo
or extra patches to see the same code as you do.

Thanks

Michal


Re: Build failure in v4.4.y.queue (ppc:allmodconfig)

2019-05-09 Thread Michal Suchánek
On Thu, 9 May 2019 08:53:24 +0200
Greg Kroah-Hartman  wrote:

> On Wed, May 08, 2019 at 01:26:42PM -0700, Guenter Roeck wrote:
> > I see multiple instances of:
> > 
> > arch/powerpc/kernel/exceptions-64s.S:839: Error:
> > attempt to move .org backwards
> > 
> > in v4.4.y.queue (v4.4.179-143-gc4db218e9451).
> > 
> > This is due to commit 9b2d4e06d7f1 ("powerpc/64s: Add support for a store
> > forwarding barrier at kernel entry/exit"), which is part of a large patch
> > series and can not easily be reverted.
> > 
> > Guess I'll stop doing ppc:allmodconfig builds in v4.4.y ?  
> 
> Michael, I thought this patch series was supposed to fix ppc issues, not
> add to them :)
> 
> Any ideas on what to do here?

What exact code do you build?

In my source the SLB relon handler starts just above this line, and
a lot of out-of-line handlers come before it. Moving some out-of-line
handlers below the parts with a fixed location should fix the build error.

Thanks

Michal


Re: [PATCH 0/2] disable NUMA affinity reassignments at runtime

2019-04-18 Thread Michal Suchánek
On Thu, 18 Apr 2019 13:56:56 -0500
Nathan Lynch  wrote:

Hello,

> Changing cpu <-> node relationships at runtime, as the pseries
> platform code attempts to do for LPM, PRRN, and VPHN is essentially
> unsupported by core subsystems. [1]

Wasn't there a patch that solved the discrepancy by removing and
re-adding the updated CPUs?

http://patchwork.ozlabs.org/patch/1051761/

Thanks

Michal


Re: [PATCH 2/2] mm/dax: Don't enable huge dax mapping by default

2019-03-06 Thread Michal Suchánek
On Wed, 06 Mar 2019 14:47:33 +0530
"Aneesh Kumar K.V"  wrote:

> Dan Williams  writes:
> 
> > On Thu, Feb 28, 2019 at 1:40 AM Oliver  wrote:  
> >>
> >> On Thu, Feb 28, 2019 at 7:35 PM Aneesh Kumar K.V
> >>  wrote:  
 
> Also even if the user decided to not use THP, by
> echo "never" > transparent_hugepage/enabled , we should continue to map
> dax fault using huge page on platforms that can support huge pages.

Is this a good idea?

This knob is there for a reason. In some situations having huge pages
can severely impact performance of the system (due to host-guest
interaction or whatever) and the ability to really turn off all THP
would be important in those cases, right?

Thanks

Michal


Re: [PATCH v04] powerpc/numa: Perform full re-add of CPU for PRRN/VPHN topology update

2019-02-18 Thread Michal Suchánek
On Mon, 18 Feb 2019 11:49:17 +0100
Michal Suchánek  wrote:

Nevermind

Looks like some version of the patch is queued in powerpc/next already.

Thanks

Michal


Re: [PATCH v04] powerpc/numa: Perform full re-add of CPU for PRRN/VPHN topology update

2019-02-18 Thread Michal Suchánek
On Thu, 14 Feb 2019 09:56:26 -0600
Michael Bringmann  wrote:

Hello,

> To: linuxppc-dev@lists.ozlabs.org
> To: linux-ker...@vger.kernel.org
> Benjamin Herrenschmidt 
> Paul Mackerras 
> Michael Ellerman 
> Nathan Lynch 
> Corentin Labbe 
> Tyrel Datwyler 
> Srikar Dronamraju 
> Guenter Roeck 
> Michael Bringmann 
> "Oliver O'Halloran" 
> Russell Currey 
> Haren Myneni 
> Al Viro 
> Kees Cook 
> Nicholas Piggin 
> Rob Herring 
> Juliet Kim 
> Thomas Falcon 
> Date: 2018-11-05 16:14:12 -0600
> Subject: [PATCH v04] powerpc/numa: Perform full re-add of CPU for PRRN/VPHN 
> topology update

Looks like something went wrong with the e-mail headers.

Maybe you should re-send?

Thanks

Michal

> 
> On pseries systems, performing changes to a partition's affinity
> can result in altering the nodes a CPU is assigned to the
> current system.  For example, some systems are subject to resource
> balancing operations by the operator or control software.  In such
> environments, system CPUs may be in node 1 and 3 at boot, and be
> moved to nodes 2, 3, and 5, for better performance.
> 
> The current implementation attempts to recognize such changes within
> the powerpc-specific version of arch_update_cpu_topology to modify a
> range of system data structures directly.  However, some scheduler
> data structures may be inaccessible, or the timing of a node change
> may still lead to corruption or error in other modules (e.g. user
> space) which do not receive notification of these changes.
> 
> This patch modifies the PRRN/VPHN topology update worker function to
> recognize an affinity change for a CPU, and to perform a full DLPAR
> remove and add of the CPU instead of dynamically changing its node
> to resolve this issue.
> 
> [Based upon patch submission:
> Subject: [PATCH] powerpc/pseries: Perform full re-add of CPU for topology 
> update post-migration
> From: Nathan Fontenot 
> Date: Tue Oct 30 05:43:36 AEDT 2018
> ]
> 
> [Replace patch submission:
> Subject: [PATCH] powerpc/topology: Update numa mask when cpu node mapping 
> changes
> From: Srikar Dronamraju 
> Date: Wed Oct 10 15:24:46 AEDT 2018
> ]
> 
> Signed-off-by: Michael Bringmann 
> ---
> Changes in v04:
>   -- Revise tests in topology_timer_fn to check vphn_enabled before 
> prrn_enabled
>   -- Remove unnecessary changes to numa_update_cpu_topology
> Changes in v03:
>   -- Fixed under-scheduling of topo updates.
> Changes in v02:
>   -- Reuse more of the previous implementation to reduce patch size
>   -- Replace former calls to numa_update_cpu_topology(false) by
>  topology_schedule_update
>   -- Make sure that we report topology changes back through
>  arch_update_cpu_topology
>   -- Fix problem observed in powerpc next kernel with updating
>  cpu_associativity_changes_mask in timer_topology_fn when both
>  prrn_enabled and vphn_enabled, and many extra CPUs are possible,
>  but not installed.
>   -- Fix problem with updating cpu_associativity_changes_mask when
>  VPHN associativity information does not arrive until after first
>  call to update topology occurs.
> ---
>  arch/powerpc/include/asm/topology.h |7 +
>  arch/powerpc/kernel/rtasd.c |2 +
>  arch/powerpc/mm/numa.c  |   47 
> +++
>  3 files changed, 34 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/topology.h 
> b/arch/powerpc/include/asm/topology.h
> index f85e2b01c3df..79505c371fd5 100644
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h
> @@ -42,7 +42,7 @@ extern void __init dump_numa_cpu_topology(void);
>  
>  extern int sysfs_add_device_to_node(struct device *dev, int nid);
>  extern void sysfs_remove_device_from_node(struct device *dev, int nid);
> -extern int numa_update_cpu_topology(bool cpus_locked);
> +extern void topology_schedule_update(void);
>  
>  static inline void update_numa_cpu_lookup_table(unsigned int cpu, int node)
>  {
> @@ -77,10 +77,7 @@ static inline void sysfs_remove_device_from_node(struct 
> device *dev,
>  {
>  }
>  
> -static inline int numa_update_cpu_topology(bool cpus_locked)
> -{
> - return 0;
> -}
> +static inline void topology_schedule_update(void) {}
>  
>  static inline void update_numa_cpu_lookup_table(unsigned int cpu, int node) 
> {}
>  
> diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
> index 8a1746d755c9..b1828de7ab78 100644
> --- a/arch/powerpc/kernel/rtasd.c
> +++ b/arch/powerpc/kernel/rtasd.c
> @@ -285,7 +285,7 @@ static void handle_prrn_event(s32 scope)
>* the RTAS event.
>*/
>   pseries_devicetree_update(-scope);
> - numa_update_cpu_topology(false);
> + topology_schedule_update();
>  }
>  
>  static void handle_rtas_event(const struct rtas_error_log *log)
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index b5d1c45c1475..eb63479f09d7 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -1077,6 +1077,8 @@ 

Re: Kernel panic when loading the IDE controller driver

2019-02-14 Thread Michal Suchánek
On Thu, 14 Feb 2019 01:17:11 -0700 (MST)
sgosavi1  wrote:

> > Maybe look around 
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?
> > id=9a0e77f28b50128df0c9e26ae489e44e29a7270a  
> 
> >Also look at ide_platform.c. I imagine there must be some way to set it 
> >up in your device tree.  
> 
> I have gone through this before and also tried the pata_platform driver but
> no success yet. I haven't found any example that passes the IO ports and IRQ
> information through the device tree to the driver code.

Hopefully there are examples of passing these values through ACPI.

Thanks

Michal


Re: [PATCH 4/4] powerpc/64s: Support shrinking the SLB for debugging

2019-01-18 Thread Michal Suchánek
On Thu, 17 Jan 2019 23:13:28 +1100
Michael Ellerman  wrote:

> On machines with 1TB segments and a 32-entry SLB it's quite hard to
> cause sufficient SLB pressure to trigger bugs caused due to badly
> timed SLB faults.
> 
> We have seen this in the past and a few years ago added the
> disable_1tb_segments command line option to force the use of 256MB
> segments. However even this allows some bugs to slip through testing
> if the SLB entry in question was recently accessed.
> 
> So add a new command line parameter for debugging which shrinks the
> SLB to the minimum size we can support. Currently that size is 3, two
> bolted SLBs and 1 for dynamic use. This creates the maximal SLB

Doesn't this violate the power of 2 requirement stated in 2/4?

Thanks

Michal

> pressure while still allowing the kernel to operate.
> 
> Signed-off-by: Michael Ellerman 
> ---
>  arch/powerpc/mm/slb.c | 14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
> index 61450a9cf30d..0f33e28f97da 100644
> --- a/arch/powerpc/mm/slb.c
> +++ b/arch/powerpc/mm/slb.c
> @@ -506,10 +506,24 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
>   asm volatile("isync" : : : "memory");
>  }
>  
> +static bool shrink_slb = false;
> +
> +static int __init parse_shrink_slb(char *p)
> +{
> + shrink_slb = true;
> + slb_set_size(0);
> +
> + return 0;
> +}
> +early_param("shrink_slb", parse_shrink_slb);
> +
>  static u32 slb_full_bitmap;
>  
>  void slb_set_size(u16 size)
>  {
> + if (shrink_slb)
> + size = SLB_NUM_BOLTED + 1;
> +
>   mmu_slb_size = size;
>  
>   if (size >= 32)



Re: [PATCH 3/4] powerpc/tm: Unset MSR[TS] if not recheckpointing

2018-12-07 Thread Michal Suchánek
On Mon, 26 Nov 2018 18:12:00 -0200
Breno Leitao  wrote:

> There is a TM Bad Thing bug that can be caused when you return from a
> signal context in a suspended transaction but with ucontext MSR[TS] unset.
> 
> This forces regs->msr[TS] to be set at syscall entrance (since the CPU
> state is transactional). It also calls treclaim() to flush the transaction
> state, which is done based on the live (mfmsr) MSR state.
> 
> Since user context MSR[TS] is not set, then restore_tm_sigcontexts() is not
> called, thus, not executing recheckpoint, keeping the CPU state as not
> transactional. When calling rfid, SRR1 will have MSR[TS] set, but the CPU
> state is non transactional, causing the TM Bad Thing with the following
> stack:
> 

Works for me on Linux 4.4 and 4.12

Tested-by: Michal Suchánek 

Thanks


Re: [PATCH 4/4] selftests/powerpc: Add checks for transactional sigreturn

2018-12-07 Thread Michal Suchánek
On Mon, 26 Nov 2018 18:12:01 -0200
Breno Leitao  wrote:

Hello

> +
> +int main(int argc, char **argv)
> +{
> + test_harness(tm_signal_sigreturn_nt, "tm_signal_sigreturn_nt");
Should be something like
 return test_harness(tm_signal_sigreturn_nt, "tm_signal_sigreturn_nt");
> +}
> +

Thanks

Michal


Re: [PATCH RFCv2 3/4] mm/memory_hotplug: Introduce and use more memory types

2018-12-04 Thread Michal Suchánek
On Fri, 30 Nov 2018 18:59:21 +0100
David Hildenbrand  wrote:

> Let's introduce new types for different kinds of memory blocks and use
> them in existing code. As I don't see an easy way to split this up,
> do it in one hunk for now.
> 
> acpi:
>  Use DIMM or DIMM_UNREMOVABLE depending on hotremove support in the kernel.
>  Properly change the type when trying to add memory that was already
>  detected and used during boot (so this memory will correctly end up as
>  "acpi" in user space).
> 
> pseries:
>  Use DIMM or DIMM_UNREMOVABLE depending on hotremove support in the kernel.
>  As far as I see, handling like in the acpi case for existing blocks is
>  not required.
> 
> probed memory from user space:
>  Use DIMM_UNREMOVABLE as there is no interface to get rid of this memory
>  again.
> 
> hv_balloon,xen/balloon:
>  Use BALLOON. As simple as that :)
> 
> s390x/sclp:
>  Use a dedicated type S390X_STANDBY as this type of memory and its
>  semantics are very s390x specific.
> 
> powernv/memtrace:
>  Only allow to use BOOT memory for memtrace. I consider this code in
>  general dangerous, but we have to keep it working ... most probably just
>  a debug feature.

I don't think it should be arbitrarily restricted like that.

Thanks

Michal


Re: use generic DMA mapping code in powerpc V4

2018-11-28 Thread Michal Suchánek
On Wed, 28 Nov 2018 16:55:30 +0100
Christian Zigotzky  wrote:

> On 28 November 2018 at 12:05PM, Michael Ellerman wrote:
> > Nothing specific yet.
> >
> > I'm a bit worried it might break one of the many old obscure platforms
> > we have that aren't well tested.
> >  
> Please don't apply the new DMA mapping code if you aren't sure that it 
> works on all supported PowerPC machines. Is the new DMA mapping code 
> really necessary? It's not really nice to rewrite code if the old code 
> works perfectly. We must not forget that we work for the end users. Does 
> the end user gain any advantage from this new code? Is it faster? The old 
> code works without any problems. 

There is another service provided to the users as well: new code that is
cleaner and simpler, which allows easier bug fixes and new features.
Without being familiar with the DMA mapping code I cannot really say if
that's the case here.

> I am also worried about this code. How 
> can I test this new DMA mapping code?

I suppose if your machine works it works for you.

Thanks

Michal


Re: [PATCH RFC] mm/memory_hotplug: Introduce memory block types

2018-11-27 Thread Michal Suchánek
On Mon, 26 Nov 2018 16:59:14 +0100
David Hildenbrand  wrote:

> On 26.11.18 15:20, Michal Suchánek wrote:
> > On Mon, 26 Nov 2018 14:33:29 +0100
> > David Hildenbrand  wrote:
> >   
> >> On 26.11.18 13:30, David Hildenbrand wrote:  
> >>> On 23.11.18 19:06, Michal Suchánek wrote:
> >   
> >>>>
> >>>> If we are going to fake the driver information we may as well add the
> >>>> type attribute and be done with it.
> >>>>
> >>>> I think the problem with the patch was more with the semantic than the
> >>>> attribute itself.
> >>>>
> >>>> What is normal, paravirtualized, and standby memory?
> >>>>
> >>>> I can understand DIMM device, balloon device, or whatever mechanism for
> >>>> adding memory you might have.
> >>>>
> >>>> I can understand "memory designated as standby by the cluster
> >>>> administrator".
> >>>>
> >>>> However, DIMM vs balloon is orthogonal to standby and should not be
> >>>> conflated into one property.
> >>>>
> >>>> paravirtualized means nothing at all in relationship to memory type and
> >>>> the desired online policy to me.
> >>>
> >>> Right, so with whatever we come up, it should allow to make a decision
> >>> in user space about
> >>> - if memory is to be onlined automatically
> >>
> >> And I will think about if we really should model standby memory. Maybe
> >> it is really better to have in user space something like (as Dan noted)  
> > 
> > If it is possible to designate the memory as standby or online in the
> > s390 admin interface and the kernel does have access to this
> > information it makes sense to forward it to userspace (as separate
> > s390-specific property). If not then you need to make some kind of
> > assumption like below and the user can tune the script according to
> > their usecase.  
> 
> Also true, standby memory really represents a distinct type of memory
> block (memory seems to be there but really isn't). Right now I am
> thinking about something like this (tried to formulate it on a very
> generic level because we can't predict which mechanism might want to
> make use of these types in the future).
> 
> 
> /*
>  * Memory block types allow user space to formulate rules if and how to
>  * online memory blocks. The types are exposed to user space as text
>  * strings in sysfs. While the typical online strategies are described
>  * along with the types, there are use cases where that can differ (e.g.
>  * use MOVABLE zone for more reliable huge page usage, use NORMAL zone
>  * due to zone imbalance or because memory unplug is not intended).
>  *
>  * MEMORY_BLOCK_NONE:
>  *  No memory block is to be created (e.g. device memory). Used internally
>  *  only.
>  *
>  * MEMORY_BLOCK_REMOVABLE:
>  *  This memory block type should be treated as if it can be
>  *  removed/unplugged from the system again. E.g. there is a hardware
>  *  interface to unplug such memory. This memory block type is usually
>  *  onlined to the MOVABLE zone, to e.g. make offlining of it more
>  *  reliable. Examples include ACPI and PPC DIMMs.
>  *
>  * MEMORY_BLOCK_UNREMOVABLE:
>  *  This memory block type should be treated as if it can not be
>  *  removed/unplugged again. E.g. there is no hardware interface to
>  *  unplug such memory. This memory block type is usually onlined to
>  *  the NORMAL zone, as offlining is not beneficial. Examples include boot
>  *  memory on most architectures and memory added via balloon devices.

AFAIK the balloon device can be inflated as well, so this does not really
describe how this memory type works in any meaningful way. Also it
should not be possible to see this kind of memory from userspace. The
balloon driver just takes existing memory that is properly backed,
allocates it for itself, and allows the hypervisor to use it. Thus it
creates the equivalent of s390 standby memory which is not backed in
the VM. When memory is reclaimed from the hypervisor, the balloon driver
frees it, making it available to the VM kernel again. However, the whole
time the memory appears present in the machine and no hotplug events
should be visible, unless the docs I am looking at are really outdated.

>  *
>  * MEMORY_BLOCK_STANDBY:
>  *  The memory block type should be treated as if it can be
>  *  removed/unplugged again, however the actual memory hot(un)plug is
>  *  performed by onlining/offlining. In virtual environments, such memory
>  *  is usually added during boot and never removed. Onlining memory will
>  *  result in memory getting allocated to a VM. This memory type is usually
>  *  not onlined automatically but explicitly by the administrator. One
>  *  example is standby memory on s390x.

Again, this does not meaningfully describe the memory type. There is
no memory on standby: there is in fact no backing at all unless you
online it. So this probably is some kind of shared memory. However, the
(de)allocation is controlled differently compared to the balloon device.
The concept is very similar, though.

Thanks

Michal


Re: [PATCH RFC] mm/memory_hotplug: Introduce memory block types

2018-11-26 Thread Michal Suchánek
On Mon, 26 Nov 2018 14:33:29 +0100
David Hildenbrand  wrote:

> On 26.11.18 13:30, David Hildenbrand wrote:
> > On 23.11.18 19:06, Michal Suchánek wrote:  

> >>
> >> If we are going to fake the driver information we may as well add the
> >> type attribute and be done with it.
> >>
> >> I think the problem with the patch was more with the semantic than the
> >> attribute itself.
> >>
> >> What is normal, paravirtualized, and standby memory?
> >>
> >> I can understand DIMM device, balloon device, or whatever mechanism for
> >> adding memory you might have.
> >>
> >> I can understand "memory designated as standby by the cluster
> >> administrator".
> >>
> >> However, DIMM vs balloon is orthogonal to standby and should not be
> >> conflated into one property.
> >>
> >> paravirtualized means nothing at all in relationship to memory type and
> >> the desired online policy to me.  
> > 
> > Right, so with whatever we come up, it should allow to make a decision
> > in user space about
> > - if memory is to be onlined automatically  
> 
> And I will think about if we really should model standby memory. Maybe
> it is really better to have in user space something like (as Dan noted)

If it is possible to designate the memory as standby or online in the
s390 admin interface and the kernel does have access to this
information it makes sense to forward it to userspace (as separate
s390-specific property). If not then you need to make some kind of
assumption like below and the user can tune the script according to
their usecase.

> 
> if (isS390x() && type == "dimm") {
>   /* don't online, on s390x system DIMMs are standby memory */
> }

Thanks

Michal


Re: Spectre+Meltdown

2018-11-23 Thread Michal Suchánek
On Wed, 10 Jan 2018 18:09:45 -0600
Li Yang  wrote:

Hello,

> On Mon, Jan 8, 2018 at 2:17 AM, Christian Zigotzky
>  wrote:
> > Hi All,
> >
> > Thanks a lot for your replies.
> >
> > @NXP developers: Could you please tell us some information?  
> 
> We have done some investigation but it is not ready to be published
> yet.  You can get more information from your support channel right
> now.

With this summary paper https://arxiv.org/abs/1811.05441 it should be
possible to take the manual for your favourite CPU, see which
exploitable optimizations it has, and see how these exploits are
mitigated.

Thanks

Michal



Re: [PATCH RFC] mm/memory_hotplug: Introduce memory block types

2018-11-23 Thread Michal Suchánek
On Fri, 23 Nov 2018 12:13:58 +0100
David Hildenbrand  wrote:

> On 28.09.18 17:03, David Hildenbrand wrote:
> > How to/when to online hotplugged memory is hard to manage for
> > distributions because different memory types are to be treated differently.
> > Right now, we need complicated udev rules that e.g. check if we are
> > running on s390x, on a physical system or on a virtualized system. But
> > there is also sometimes the demand to really online memory immediately
> > while adding in the kernel and not to wait for user space to make a
> > decision. And on virtualized systems there might be different
> > requirements, depending on "how" the memory was added (and if it will
> > eventually get unplugged again - DIMM vs. paravirtualized mechanisms).
> > 
> > On the one hand, we have physical systems where we sometimes
> > want to be able to unplug memory again - e.g. a DIMM - so we have to online
> > it to the MOVABLE zone optionally. That decision is usually made in user
> > space.
> > 
> > On the other hand, we have memory that should never be onlined
> > automatically, only when asked for by an administrator. Such memory only
> > applies to virtualized environments like s390x, where the concept of
> > "standby" memory exists. Memory is detected and added during boot, so it
> > can be onlined when requested by the administrator or some tooling.
> > Only when onlining, memory will be allocated in the hypervisor.
> > 
> > But then, we also have paravirtualized devices (namely xen and hyper-v
> > balloons), that hotplug memory that will never ever be removed from a
> > system right now using offline_pages/remove_memory. If at all, this memory
> > is logically unplugged and handed back to the hypervisor via ballooning.
> > 
> > For paravirtualized devices it is relevant that memory is onlined as
> > quickly as possible after adding - and that it is added to the NORMAL
> > zone. Otherwise, it could happen that too much memory in a row is added
> > (but not onlined), resulting in out-of-memory conditions due to the
> > additional memory for "struct pages" and friends. MOVABLE zone as well
> > as delays might be very problematic and lead to crashes (e.g. zone
> > imbalance).
> > 
> > Therefore, introduce memory block types and online memory depending on
> > it when adding the memory. Expose the memory type to user space, so user
> > space handlers can start to process only "normal" memory. Other memory
> > block types can be ignored. One thing less to worry about in user space.
> >   
> 
> So I was looking into alternatives.
> 
> 1. Provide only "normal" and "standby" memory types to user space. This
> way user space can make smarter decisions about how to online memory.
> Not really sure if this is the right way to go.
> 
> 
> 2. Use device driver information (as mentioned by Michal S.).
> 
> The problem right now is that there are no drivers for memory block
> devices. The "memory" subsystem has no drivers, so the KOBJ_ADD uevent
> will not contain a "DRIVER" information and we have no idea what kind of
> memory block device we hold in our hands.
> 
> $ udevadm info -q all -a /sys/devices/system/memory/memory0
> 
>   looking at device '/devices/system/memory/memory0':
> KERNEL=="memory0"
> SUBSYSTEM=="memory"
> DRIVER==""
> ATTR{online}=="1"
> ATTR{phys_device}=="0"
> ATTR{phys_index}==""
> ATTR{removable}=="0"
> ATTR{state}=="online"
> ATTR{valid_zones}=="none"
> 
> 
> If we would provide "fake" drivers for the memory block devices we want
> to treat in a special way in user space (e.g. standby memory on s390x),
> user space could use that information to make smarter decisions.
> 
> Adding such drivers might work. My suggestion would be to let ordinary
> DIMMs be without a driver for now and only special case standby memory
> and eventually paravirtualized memory devices (XEN and Hyper-V).
> 
> Any thoughts?

If we are going to fake the driver information we may as well add the
type attribute and be done with it.

I think the problem with the patch was more with the semantic than the
attribute itself.

What is normal, paravirtualized, and standby memory?

I can understand DIMM device, balloon device, or whatever mechanism for
adding memory you might have.

I can understand "memory designated as standby by the cluster
administrator".

However, DIMM vs balloon is orthogonal to standby and should not be
conflated into one property.

paravirtualized means nothing at all in relationship to memory type and
the desired online policy to me.

Lastly, I would suggest that if you add any property, you add it to *all*
memory that is hotplugged. That way userspace can detect whether it can
rely on the information from your patch or not. Leaving some memory
untagged makes things needlessly vague.

Thanks

Michal


Re: PIE binaries are no longer mapped below 4 GiB on ppc64le

2018-10-31 Thread Michal Suchánek
On Wed, 31 Oct 2018 19:04:14 -0300
Tulio Magno Quites Machado Filho  wrote:

> Florian Weimer  writes:
> 
> > * Tulio Magno Quites Machado Filho:
> >  
> >> I wonder if this is restricted to linker that Golang uses.
> >> Were you able to reproduce the same problem with Binutils'
> >> linker?  
> >
> > The example is carefully constructed to use the external linker.  It
> > invokes gcc, which then invokes the BFD linker in my case.  
> 
> Indeed. That question was unnecessary.  :-D
> 
> > Based on the relocations, I assume there is only so much the linker
> > can do here.  I'm amazed that it produces an executable at all, let
> > alone one that runs correctly on some kernel versions!  
> 
> Agreed.  That isn't expected to work.  Both the compiler and the
> linker have to generate PIE for it to work.
> 
> > I assume that the Go toolchain simply lacks PIE support on
> > ppc64le.  
> 
> Maybe the support is there, but it doesn't generate PIC by default?
> 
golang has -fPIC IIRC. It does not benefit from the GNU toolchain
synergy of always calling the linker with the correct flags
corresponding to the generated code, though. So when gcc flips the
default value of the switch, golang happily produces incompatible objects.

Also, I suspect some pieces of the stdlib are not compiled with the flags
you pass in for the build, so there are always some objects somewhere
that are not compatible.

Thanks

Michal


Re: PIE binaries are no longer mapped below 4 GiB on ppc64le

2018-10-31 Thread Michal Suchánek
On Wed, 31 Oct 2018 18:20:56 +0100
Florian Weimer  wrote:

> We tried to use Go to build PIE binaries, and while the Go toolchain
> is definitely not ready (it produces text relocations and problematic
> relocations in general), it exposed what could be an accidental
> userspace ABI change.
> 
> With our 4.10-derived kernel, PIE binaries are mapped below 4 GiB, so
> relocations like R_PPC64_ADDR16_HA work:
> 
...

> There are fewer mappings because the loader detects a relocation
> overflow and aborts (“error while loading shared libraries:
> R_PPC64_ADDR16_HA reloc at 0x000120f0983c for symbol `' out of
> range”), so I had to recover the mappings externally.  Disabling ASLR
> does not help.
> 
...
> 
> And it needs to be built with:
> 
>   go build -ldflags=-extldflags=-pie extld.go
> 
> I'm not entirely sure what to make of this, but I'm worried that this
> could be a regression that matters to userspace.

I encountered the same when trying to build go on ppc64le. I am not
familiar with the internals so I just let it be.

It does not seem to matter to any other userspace. Maybe it would be a
good idea to generate 64-bit relocations on 64-bit targets?

Thanks

Michal


Re: [PATCH v08 0/5] powerpc/hotplug: Update affinity for migrated CPUs

2018-10-29 Thread Michal Suchánek
On Sun, 29 Jul 2018 08:18:34 -0500
Michael Bringmann  wrote:

> The migration of LPARs across Power systems affects many attributes
> including that of the associativity of CPUs.  The patches in this
> set execute when a system is coming up fresh upon a migration target.
> They are intended to,
> 
> * Recognize changes to the associativity of CPUs recorded in internal
>   data structures when compared to the latest copies in the device
> tree.
> * Generate calls to other code layers to reset the data structures
>   related to associativity of the CPUs.
> * Re-register the 'changed' entities into the target system.
>   Re-registration of CPUs mostly entails acting as if they have been
>   newly hot-added into the target system.
> 
> Signed-off-by: Michael Bringmann 
> 
Hello,

what is the status of this patchset other than it no longer applies?

Thanks

Michal


Re: ethernet "bus" number in DTS ?

2018-10-28 Thread Michal Suchánek
On Fri, 26 Oct 2018 15:57:15 -0700
Florian Fainelli  wrote:

> On 10/23/18 11:22 PM, Michal Suchánek wrote:
> > On Tue, 23 Oct 2018 11:20:36 -0700
> > Florian Fainelli  wrote:
> >   
> >> On 10/23/18 11:02 AM, Joakim Tjernlund wrote:  
> >>> On Tue, 2018-10-23 at 10:03 -0700, Florian Fainelli wrote:
> >   
> >>>
> >>> I also noted that using status = "disabled" didn't work either to
> >>> create a fix name scheme. Even worse, all the eth I/F after gets
> >>> renumbered. It seems to me there is value in having stability in
> >>> eth I/F naming at boot. Then userspace(udev) can rename if need
> >>> be. 
> >>>
> >>> Sure would like to known more about why this feature is not
> >>> wanted ?
> >>>
> >>> I found
> >>>   https://patchwork.kernel.org/patch/4122441/
> >>> You quote policy as reason but surely it must be better to
> >>> have something stable, connected to the hardware name, than
> >>> semirandom naming?
> >>
> >> If the Device Tree nodes are ordered by ascending base register
> >> address, my understanding is that you get the same order as far as
> >> platform_device creation goes, this may not be true in the future
> >> if Rob decides to randomize that, but AFAICT this is still true.
> >> This may not work well with status = disabled properties being
> >> inserted here and there, but we have used that here and it has
> >> worked for as far as I can remember doing it.  
> > 
> > So this is unstable in several respects. First is changing the
> > enabled/disabled status in the devicetrees provided by the kernel.
> > 
> > Second is if you have hardware hotplug mechanism either by firmware
> > or by loading device overlays.
> > 
> > Third is the case when the devicetree is not built as part of the
> > kernel but is instead provided by firmware that initializes the
> > low-level hardware details. Then the ordering by address is not
> > guaranteed nor is that the same address will be used to access the
> > same interface every time. There might be multiple ways to
> > configure the hardware depending on firmware configuration and/or
> > version.
> > 
> >
> >> Second, you might want to name network devices ethX, but what if I
> >> want to name them ethernetX or fooX or barX? Should we be
> >> accepting a mechanism in the kernel that would allow someone to
> >> name the interfaces the way they want straight from a name being
> >> provided in Device Tree?  
> > 
> > Clearly if there is text Ethernet1 printed above the Ethernet port
> > we should provide a mechanism to name the port Ethernet1 by
> > default.  
> 
> Yes, but then have a specific property that is flexible enough to
> retrieve things like "vital product information". For DSA switches, we
> have an optional "label" property which names the network device
> directly like it would be found on the product's case. Ideally this
> should be much more flexible such that it can point to the appropriate
> node/firmware service/whatever to get that name, which is what some
> people have been working on lately, see [1].
> 
> [1]: https://lkml.org/lkml/2018/8/14/1039

That's nice for something unique per-device like MAC address. However,
for something per-model like port labels DT properties should suffice.

> 
> The point is don't re-purpose the aliases which is something that
> exists within Device Tree to basically provide a shorter path to a
> specific set of nodes for the boot program to mangle and muck with
> instead of having to resolve a full path to a node. At least that is
> how I conceive it.
> 
> Now what complicates the matter is that some OSes like Linux tend to
> use it to also generate seemingly stable index for peripherals that
> are numbered by index such as SPI, I2C, etc buses, which is why we are
> having this conversation here, see below for more.
> 
> >   
> >>
> >> Aliases are fine for providing relative stability within the Device
> >> Tree itself and boot programs that might need to modify the Device
> >> Tree (e.g: inserting MAC addresses) such that you don't have to
> >> encode logic to search for nodes by compatible strings etc. but
> >> outside of that use case, it seems to me that you can resolve every
> >> naming decision in user-space.  
> > 
> > However, this is pushing platform-specific knowledge to userspace.
> > The way the Ethernet interface is attached and hence the de

Re: ethernet "bus" number in DTS ?

2018-10-26 Thread Michal Suchánek
On Tue, 23 Oct 2018 11:20:36 -0700
Florian Fainelli  wrote:

> On 10/23/18 11:02 AM, Joakim Tjernlund wrote:
> > On Tue, 2018-10-23 at 10:03 -0700, Florian Fainelli wrote:  

> > 
> > I also noted that using status = "disabled" didn't work either to
> > create a fix name scheme. Even worse, all the eth I/F after gets
> > renumbered. It seems to me there is value in having stability in
> > eth I/F naming at boot. Then userspace(udev) can rename if need be. 
> > 
> > Sure would like to known more about why this feature is not wanted ?
> > 
> > I found
> >   https://patchwork.kernel.org/patch/4122441/
> > You quote policy as reason but surely it must be better to
> > have something stable, connected to the hardware name, than
> > semirandom naming?  
> 
> If the Device Tree nodes are ordered by ascending base register
> address, my understanding is that you get the same order as far as
> platform_device creation goes, this may not be true in the future if
> Rob decides to randomize that, but AFAICT this is still true. This
> may not work well with status = disabled properties being inserted
> here and there, but we have used that here and it has worked for as
> far as I can remember doing it.

So this is unstable in several respects. First is changing the
enabled/disabled status in the devicetrees provided by the kernel.

Second is if you have hardware hotplug mechanism either by firmware or
by loading device overlays.

Third is the case when the devicetree is not built as part of the
kernel but is instead provided by firmware that initializes the
low-level hardware details. Then the ordering by address is not
guaranteed nor is that the same address will be used to access the same
interface every time. There might be multiple ways to configure the
hardware depending on firmware configuration and/or version.

 
> Second, you might want to name network devices ethX, but what if I
> want to name them ethernetX or fooX or barX? Should we be accepting a
> mechanism in the kernel that would allow someone to name the
> interfaces the way they want straight from a name being provided in
> Device Tree?

Clearly if there is text Ethernet1 printed above the Ethernet port we
should provide a mechanism to name the port Ethernet1 by default.

> 
> Aliases are fine for providing relative stability within the Device
> Tree itself and boot programs that might need to modify the Device
> Tree (e.g: inserting MAC addresses) such that you don't have to
> encode logic to search for nodes by compatible strings etc. but
> outside of that use case, it seems to me that you can resolve every
> naming decision in user-space.

However, this is pushing platform-specific knowledge to userspace. The
way the Ethernet interface is attached and hence the device properties
usable for identifying the device uniquely are platform-specific. 

On the other hand, aliases are universal when provided. If they are
good enough to assign a MAC address they are good enough to provide a
stable default name.

I think this is indeed forcing the userspace to reinvent several wheels
for no good reason.

What is the problem with adding the aliases?

Thanks

Michal


Re: [PATCH RFC] mm/memory_hotplug: Introduce memory block types

2018-10-04 Thread Michal Suchánek
On Thu, 4 Oct 2018 17:45:13 +0200
David Hildenbrand  wrote:

> On 04/10/2018 17:28, Michal Suchánek wrote:

> > 
> > The state of the art is to determine what to do with hotplugged
> > memory in userspace based on platform and virtualization type.  
> 
> Exactly.
> 
> > 
> > Changing the default to depend on the driver that added the memory
> > rather than platform type should solve the issue of VMs growing
> > different types of memory device emulation.  
> 
> Yes, my original proposal (this patch) was to handle it in the kernel
> for known types. But as we learned, there might be some use cases that
> might still require to make a decision in user space.
> 
> So providing the user space either with some type hint (auto-online
> vs. standby) or the driver that added it (system vs. hyper-v ...)
> would solve the issue.

Is that not available in the udev event?

Thanks

Michal


Re: [PATCH RFC] mm/memory_hotplug: Introduce memory block types

2018-10-04 Thread Michal Suchánek
On Thu, 4 Oct 2018 10:13:48 +0200
David Hildenbrand  wrote:

ok, so what is the problem here?

Handling the hotplug in userspace through udev may be suboptimal and
kernel handling might be faster, but that's orthogonal to the problem at
hand.

The state of the art is to determine what to do with hotplugged memory
in userspace based on platform and virtualization type.

Changing the default to depend on the driver that added the memory
rather than platform type should solve the issue of VMs growing
different types of memory device emulation.

Am I missing something?

Thanks

Michal


Re: [PATCH] powerpc: Move a dereference below a NULL test

2018-09-26 Thread Michal Suchánek
On Wed, 26 Sep 2018 19:46:08 +0800
zhong jiang  wrote:

> It is safe to move a dereference below a NULL test.
> 
> Signed-off-by: zhong jiang 
> ---
>  arch/powerpc/kernel/cacheinfo.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/cacheinfo.c
> b/arch/powerpc/kernel/cacheinfo.c index a8f20e5..7f19714 100644
> --- a/arch/powerpc/kernel/cacheinfo.c
> +++ b/arch/powerpc/kernel/cacheinfo.c
> @@ -401,14 +401,13 @@ static struct cache
> *cache_lookup_or_instantiate(struct device_node *node, struct cache
> *cache; 
>   cache = cache_lookup_by_node(node);
> + if (!cache)
> + cache = cache_do_one_devnode(node, level);
>  
>   WARN_ONCE(cache && cache->level != level,

This also has a NULL test, so cache is only dereferenced when it is
non-NULL here.

Thanks

Michal


Re: [PATCH v4] powerpc: Avoid code patching freed init sections

2018-09-18 Thread Michal Suchánek
On Tue, 18 Sep 2018 10:52:09 +0200
Christophe LEROY  wrote:

> 
> 
> Le 14/09/2018 à 06:22, Nicholas Piggin a écrit :
> > On Fri, 14 Sep 2018 11:14:11 +1000
> > Michael Neuling  wrote:
> > 
> >> This stops us from doing code patching in init sections after
> >> they've been freed.
> >>
> >> In this chain:
> >>kvm_guest_init() ->
> >>  kvm_use_magic_page() ->
> >>fault_in_pages_readable() ->
> >> __get_user() ->
> >>   __get_user_nocheck() ->
> >> barrier_nospec();
> >>
> >> We have a code patching location at barrier_nospec() and
> >> kvm_guest_init() is an init function. This whole chain gets
> >> inlined, so when we free the init section (hence
> >> kvm_guest_init()), this code goes away and hence should no longer
> >> be patched.
> >>
> >> We've seen this as userspace memory corruption when using a memory
> >> checker while doing partition migration testing on powervm (this
> >> starts the code patching post migration via
> >> /sys/kernel/mobility/migration). In theory, it could also happen
> >> when using /sys/kernel/debug/powerpc/barrier_nospec.
> >>
> >> cc: sta...@vger.kernel.org # 4.13+
> >> Signed-off-by: Michael Neuling 
> >>
> >> ---
> >> For stable I've marked this as v4.13+ since that's when we
> >> refactored code-patching.c but it could go back even further than
> >> that. In reality though, I think we can only hit this since the
> >> first spectre/meltdown changes.
> >>
> >> v4:
> >>   Feedback from Christophe Leroy:
> >> - init_mem_free -> init_mem_is_free
> >> - prlog %lx -> %px
> >>
> >> v3:
> >>   Add init_mem_free flag to avoid potential race.
> >>   Feedback from Christophe Leroy:
> >> - use init_section_contains()
> >> - change order of init test for performance
> >> - use pr_debug()
> >> - remove blank line
> >>
> >> v2:
> >>Print when we skip an address
> >> ---
> >>   arch/powerpc/include/asm/setup.h | 1 +
> >>   arch/powerpc/lib/code-patching.c | 6 ++
> >>   arch/powerpc/mm/mem.c| 2 ++
> >>   3 files changed, 9 insertions(+)
> >>
> >> diff --git a/arch/powerpc/include/asm/setup.h
> >> b/arch/powerpc/include/asm/setup.h index 1a951b0046..1fffbba8d6
> >> 100644 --- a/arch/powerpc/include/asm/setup.h
> >> +++ b/arch/powerpc/include/asm/setup.h
> >> @@ -9,6 +9,7 @@ extern void ppc_printk_progress(char *s, unsigned
> >> short hex); 
> >>   extern unsigned int rtas_data;
> >>   extern unsigned long long memory_limit;
> >> +extern bool init_mem_is_free;
> >>   extern unsigned long klimit;
> >>   extern void *zalloc_maybe_bootmem(size_t size, gfp_t mask);
> >>   
> >> diff --git a/arch/powerpc/lib/code-patching.c
> >> b/arch/powerpc/lib/code-patching.c index 850f3b8f4d..6ae2777c22
> >> 100644 --- a/arch/powerpc/lib/code-patching.c
> >> +++ b/arch/powerpc/lib/code-patching.c
> >> @@ -28,6 +28,12 @@ static int __patch_instruction(unsigned int
> >> *exec_addr, unsigned int instr, {
> >>int err;
> >>   
> >> +  /* Make sure we aren't patching a freed init section */
> >> +  if (init_mem_is_free && init_section_contains(exec_addr,
> >> 4)) {
> >> +  pr_debug("Skipping init section patching addr:
> >> 0x%px\n", exec_addr);
> >> +  return 0;
> >> +  }
> > 
> > What we should do is a whitelist, make sure it's only patching the
> > sections we want it to.
> > 
> > That's a bigger job when you consider modules and things too though,
> > so this looks good for now. Thanks,
> 
> What about using kernel_text_address() for it then ? It also handles 
> modules, is it more complicated than that ?

Modules are patched separately, so they should not need to be excluded
here.

There is a different problem with modules: when the mitigation type
changes, the modules are not re-patched with the new settings.

Thanks

Michal


Re: [PATCH] powerpc: Avoid code patching freed init sections

2018-09-10 Thread Michal Suchánek
On Mon, 10 Sep 2018 12:16:35 +0200
Christophe LEROY  wrote:

> Le 10/09/2018 à 12:05, Michael Neuling a écrit :
> >   
> >>> + /* Make sure we aren't patching a freed init section */
> >>> + if (in_init_section(patch_addr) && init_freed())
> >>> + return 0;
> >>> +  
> >>
> >> Do we even need the init_freed() check?  
> > 
> > Maybe not.  If userspace isn't up, then maybe it's ok to skip.  
> 
> Euh ... Do you mean you'll skip all patches into init functions ?
> But code patching is not only for meltdown/spectrum workarounds, some
> of the patchings might be needed for the init functions themselves.

Some things like CPU feature tests have an early variant that does not
need patching, but maybe not everything does.

And some things like lwsync may also be expanded from macros or
inlines and be needed in the init code. Relying on those getting
patched seems questionable, though. Hard to tell without seeing what
is actually patched where.

Thanks

Michal


Re: [PATCH] powerpc: Avoid code patching freed init sections

2018-09-10 Thread Michal Suchánek
On Mon, 10 Sep 2018 15:44:05 +1000
Michael Neuling  wrote:

> This stops us from doing code patching in init sections after they've
> been freed.
> 
> In this chain:
>   kvm_guest_init() ->
> kvm_use_magic_page() ->
>   fault_in_pages_readable() ->
>__get_user() ->
>  __get_user_nocheck() ->
>barrier_nospec();
> 
> We have a code patching location at barrier_nospec() and
> kvm_guest_init() is an init function. This whole chain gets inlined,
> so when we free the init section (hence kvm_guest_init()), this code
> goes away and hence should no longer be patched.
> 
> We've seen this as userspace memory corruption when using a memory
> checker while doing partition migration testing on powervm (this
> starts the code patching post migration via
> /sys/kernel/mobility/migration). In theory, it could also happen when
> using /sys/kernel/debug/powerpc/barrier_nospec.
> 
> With this patch there is a small chance of a race if we code patch
> between the init section being freed and setting SYSTEM_RUNNING (in
> kernel_init()) but that seems like an impractical time and small
> window for any code patching to occur.
> 
> cc: sta...@vger.kernel.org # 4.13+
> Signed-off-by: Michael Neuling 
> 
> ---
> For stable I've marked this as v4.13+ since that's when we refactored
> code-patching.c but it could go back even further than that. In
> reality though, I think we can only hit this since the first
> spectre/meltdown changes.
> ---
>  arch/powerpc/lib/code-patching.c | 19 +++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/arch/powerpc/lib/code-patching.c
> b/arch/powerpc/lib/code-patching.c index 850f3b8f4d..a2bc08bfd8 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -23,11 +23,30 @@
>  #include 
>  #include 
>  
> +
> +static inline bool in_init_section(unsigned int *patch_addr)
> +{
> + if (patch_addr < (unsigned int *)__init_begin)
> + return false;
> + if (patch_addr >= (unsigned int *)__init_end)
> + return false;
> + return true;
> +}
> +
> +static inline bool init_freed(void)
> +{
> + return (system_state >= SYSTEM_RUNNING);
> +}
> +
>  static int __patch_instruction(unsigned int *exec_addr, unsigned int
> instr, unsigned int *patch_addr)
>  {
>   int err;
>  
> + /* Make sure we aren't patching a freed init section */
> + if (in_init_section(patch_addr) && init_freed())
> + return 0;
> +

Do we even need the init_freed() check?

What user input can we process in init-only code?

Also it would be nice to write the function+offset of the skipped patch
location into the kernel log.

Thanks

Michal


Re: [PATCH] powerpc: Avoid code patching freed init sections

2018-09-10 Thread Michal Suchánek
On Mon, 10 Sep 2018 15:44:05 +1000
Michael Neuling  wrote:

> This stops us from doing code patching in init sections after they've
> been freed.
> 
> In this chain:
>   kvm_guest_init() ->
> kvm_use_magic_page() ->
>   fault_in_pages_readable() ->
>__get_user() ->
>  __get_user_nocheck() ->
>barrier_nospec();
> 
> We have a code patching location at barrier_nospec() and
> kvm_guest_init() is an init function. This whole chain gets inlined,
> so when we free the init section (hence kvm_guest_init()), this code
> goes away and hence should no longer be patched.
> 
> We've seen this as userspace memory corruption when using a memory
> checker while doing partition migration testing on powervm (this
> starts the code patching post migration via
> /sys/kernel/mobility/migration). In theory, it could also happen when
> using /sys/kernel/debug/powerpc/barrier_nospec.
> 
> With this patch there is a small chance of a race if we code patch
> between the init section being freed and setting SYSTEM_RUNNING (in
> kernel_init()) but that seems like an impractical time and small
> window for any code patching to occur.
> 
> cc: sta...@vger.kernel.org # 4.13+
> Signed-off-by: Michael Neuling 
> 
> ---
> For stable I've marked this as v4.13+ since that's when we refactored
> code-patching.c but it could go back even further than that. In
> reality though, I think we can only hit this since the first
> spectre/meltdown changes.

Which means it affects all maintained stable trees because the
spectre/meltdown changes are backported.

Thanks

Michal


Re: [PATCH v7 8/9] powerpc/mce: Add sysctl control for recovery action on MCE.

2018-08-09 Thread Michal Suchánek
On Thu, 9 Aug 2018 18:33:33 +1000
Nicholas Piggin  wrote:

> On Thu, 9 Aug 2018 13:39:45 +0530
> Ananth N Mavinakayanahalli  wrote:
> 
> > On Thu, Aug 09, 2018 at 06:02:53PM +1000, Nicholas Piggin wrote:  
> > > On Thu, 09 Aug 2018 16:34:07 +1000
> > > Michael Ellerman  wrote:
> > > 
> > > > "Aneesh Kumar K.V"  writes:
> > > > > On 08/08/2018 08:26 PM, Michael Ellerman wrote:  
> > > > >> Mahesh J Salgaonkar  writes:  
> > > > >>> From: Mahesh Salgaonkar 
> > > > >>>
> > > > >>> Introduce recovery action for recovered memory errors
> > > > >>> (MCEs). There are soft memory errors like SLB Multihit,
> > > > >>> which can be a result of a bad hardware OR software BUG.
> > > > >>> Kernel can easily recover from these soft errors by
> > > > >>> flushing SLB contents. After the recovery kernel can still
> > > > >>> continue to function without any issue. But in some
> > > > >>> scenario's we may keep getting these soft errors until the
> > > > >>> root cause is fixed. To be able to analyze and find the
> > > > >>> root cause, best way is to gather enough data and system
> > > > >>> state at the time of MCE. Hence this patch introduces a
> > > > >>> sysctl knob where user can decide either to continue after
> > > > >>> recovery or panic the kernel to capture the dump.  
> > > > >> 
> > > > >> I'm not convinced we want this.
> > > > >> 
> > > > >> As we've discovered it's often not possible to reconstruct
> > > > >> what happened based on a dump anyway.
> > > > >> 
> > > > >> The key thing you need is the content of the SLB and that's
> > > > >> not included in a dump.
> > > > >> 
> > > > >> So I think we should dump the SLB content when we get the
> > > > >> MCE (which this series does) and any other useful info, and
> > > > >> then if we can recover we should.  
> > > > >
> > > > > The reasoning there is what if we got multi-hit due to some
> > > > > corruption in slb_cache_ptr. ie. some part of kernel is
> > > > > wrongly updating the paca data structure due to wrong
> > > > > pointer. Now that is far fetched, but then possible right?.
> > > > > Hence the idea that, if we don't have much insight into why a
> > > > > slb multi-hit occur from the dmesg which include slb content,
> > > > > slb_cache contents etc, there should be an easy way to force
> > > > > a dump that might assist in further debug.  
> > > > 
> > > > If you're debugging something complex that you can't determine
> > > > from the SLB dump then you should be running a debug kernel
> > > > anyway. And if anything you want to drop into xmon and sit
> > > > there, preserving the most state, rather than taking a dump.
> > > 
> > > I'm not saying for a dump specifically, just some form of crash.
> > > And we really should have an option to xmon on panic, but that's
> > > another story.
> > 
> > That's fine during development or in a lab, not something we could
> > enforce in a customer environment, could we?  
> 
> xmon on panic? Not something to enforce but IMO (without thinking
> about it too much but having encountered it several times) it should
> probably be tied xmon on BUG option.

You should get that with this patch and xmon=on, or am I missing
something?

Thanks

Michal


Re: [PATCH 1/2] powerpc/64s: move machine check SLB flushing to mm/slb.c

2018-08-08 Thread Michal Suchánek
On Fri,  3 Aug 2018 14:13:49 +1000
Nicholas Piggin  wrote:

> The machine check code that flushes and restores bolted segments in
> real mode belongs in mm/slb.c. This will be used by pseries machine
> check and idle code.
> 
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/include/asm/book3s/64/mmu-hash.h |  3 ++
>  arch/powerpc/kernel/mce_power.c   | 21 ++
>  arch/powerpc/mm/slb.c | 38
> +++ 3 files changed, 44 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> b/arch/powerpc/include/asm/book3s/64/mmu-hash.h index
> 2f74bdc805e0..d4e398185b3a 100644 ---
> a/arch/powerpc/include/asm/book3s/64/mmu-hash.h +++
> b/arch/powerpc/include/asm/book3s/64/mmu-hash.h @@ -497,6 +497,9 @@
> extern void hpte_init_native(void); 
>  extern void slb_initialize(void);
>  extern void slb_flush_and_rebolt(void);
> +extern void slb_flush_all_realmode(void);
> +extern void __slb_restore_bolted_realmode(void);
> +extern void slb_restore_bolted_realmode(void);
>  
>  extern void slb_vmalloc_update(void);
>  extern void slb_set_size(u16 size);
> diff --git a/arch/powerpc/kernel/mce_power.c
> b/arch/powerpc/kernel/mce_power.c index d6756af6ec78..50f7b9817246
> 100644 --- a/arch/powerpc/kernel/mce_power.c
> +++ b/arch/powerpc/kernel/mce_power.c
> @@ -62,11 +62,8 @@ static unsigned long addr_to_pfn(struct pt_regs
> *regs, unsigned long addr) #ifdef CONFIG_PPC_BOOK3S_64
>  static void flush_and_reload_slb(void)
>  {
> - struct slb_shadow *slb;
> - unsigned long i, n;
> -
>   /* Invalidate all SLBs */
> - asm volatile("slbmte %0,%0; slbia" : : "r" (0));
> + slb_flush_all_realmode();
>  
>  #ifdef CONFIG_KVM_BOOK3S_HANDLER
>   /*
> @@ -76,22 +73,10 @@ static void flush_and_reload_slb(void)
>   if (get_paca()->kvm_hstate.in_guest)
>   return;
>  #endif
> -
> - /* For host kernel, reload the SLBs from shadow SLB buffer.
> */
> - slb = get_slb_shadow();
> - if (!slb)
> + if (early_radix_enabled())
>   return;

And we lose the check that the shadow SLB exists. Is !slb equivalent to
early_radix_enabled() ...

>  
> - n = min_t(u32, be32_to_cpu(slb->persistent), SLB_MIN_SIZE);
> -
> - /* Load up the SLB entries from shadow SLB */
> - for (i = 0; i < n; i++) {
> - unsigned long rb =
> be64_to_cpu(slb->save_area[i].esid);
> - unsigned long rs =
> be64_to_cpu(slb->save_area[i].vsid); -
> - rb = (rb & ~0xFFFul) | i;
> - asm volatile("slbmte %0,%1" : : "r" (rs), "r" (rb));
> - }
> + slb_restore_bolted_realmode();
>  }
>  #endif
>  
> diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
> index cb796724a6fc..136db8652577 100644
> --- a/arch/powerpc/mm/slb.c
> +++ b/arch/powerpc/mm/slb.c
> @@ -90,6 +90,44 @@ static inline void create_shadowed_slbe(unsigned
> long ea, int ssize, : "memory" );
>  }
>  
> +/*
> + * Insert bolted entries into SLB (which may not be empty).
> + */
> +void __slb_restore_bolted_realmode(void)
> +{
> + struct slb_shadow *p = get_slb_shadow();
> + enum slb_index index;

... or can we get here at some point when the shadow SLB is not populated?

Thanks

Michal

> +
> +  /* No isync needed because realmode. */
> + for (index = 0; index < SLB_NUM_BOLTED; index++) {
> + asm volatile("slbmte  %0,%1" :
> +  : "r" (be64_to_cpu(p->save_area[index].vsid)),
> +"r" (be64_to_cpu(p->save_area[index].esid)));
> + }
> +}
> +
> +/*
> + * Insert bolted entries into an empty SLB.
> + * This is not the same as rebolt because the bolted segments
> + * (e.g., kstack) are not changed (rebolted).
> + */
> +void slb_restore_bolted_realmode(void)
> +{
> + __slb_restore_bolted_realmode();
> + get_paca()->slb_cache_ptr = 0;
> +}
> +
> +/*
> + * This flushes all SLB entries including 0, so it must be realmode.
> + */
> +void slb_flush_all_realmode(void)
> +{
> + /*
> +  * This flushes all SLB entries including 0, so it must be
> realmode.
> +  */
> + asm volatile("slbmte %0,%0; slbia" : : "r" (0));
> +}
> +
>  static void __slb_flush_and_rebolt(void)
>  {
>   /* If you change this make sure you change SLB_NUM_BOLTED



Re: [PATCH v7 8/9] powerpc/mce: Add sysctl control for recovery action on MCE.

2018-08-08 Thread Michal Suchánek
On Wed, 8 Aug 2018 21:07:11 +0530
"Aneesh Kumar K.V"  wrote:

> On 08/08/2018 08:26 PM, Michael Ellerman wrote:
> > Mahesh J Salgaonkar  writes:  
> >> From: Mahesh Salgaonkar 
> >>
> >> Introduce recovery action for recovered memory errors (MCEs).
> >> There are soft memory errors like SLB Multihit, which can be a
> >> result of a bad hardware OR software BUG. Kernel can easily
> >> recover from these soft errors by flushing SLB contents. After the
> >> recovery kernel can still continue to function without any issue.
> >> But in some scenario's we may keep getting these soft errors until
> >> the root cause is fixed. To be able to analyze and find the root
> >> cause, best way is to gather enough data and system state at the
> >> time of MCE. Hence this patch introduces a sysctl knob where user
> >> can decide either to continue after recovery or panic the kernel
> >> to capture the dump.  
> > 
> > I'm not convinced we want this.
> > 
> > As we've discovered it's often not possible to reconstruct what
> > happened based on a dump anyway.
> > 
> > The key thing you need is the content of the SLB and that's not
> > included in a dump.
> > 
> > So I think we should dump the SLB content when we get the MCE (which
> > this series does) and any other useful info, and then if we can
> > recover we should.
> >   
> 
> The reasoning there is what if we got multi-hit due to some
> corruption in slb_cache_ptr. ie. some part of kernel is wrongly
> updating the paca data structure due to wrong pointer. Now that is
> far fetched, but then possible right?. Hence the idea that, if we
> don't have much insight into why a slb multi-hit occur from the dmesg
> which include slb content, slb_cache contents etc, there should be an
> easy way to force a dump that might assist in further debug.

Nonetheless, this turns all MCEs into crashes. Are there any MCEs that
could happen during normal operation and should be handled by default?

Thanks

Michal


Re: [PATCH v7 5/9] powerpc/pseries: flush SLB contents on SLB MCE errors.

2018-08-07 Thread Michal Suchánek
Hello,


On Tue, 07 Aug 2018 19:47:14 +0530
"Mahesh J Salgaonkar"  wrote:

> From: Mahesh Salgaonkar 
> 
> On pseries, as of today system crashes if we get a machine check
> exceptions due to SLB errors. These are soft errors and can be fixed
> by flushing the SLBs so the kernel can continue to function instead of
> system crash. We do this in real mode before turning on MMU. Otherwise
> we would run into nested machine checks. This patch now fetches the
> rtas error log in real mode and flushes the SLBs on SLB errors.
> 
> Signed-off-by: Mahesh Salgaonkar 
> Signed-off-by: Michal Suchanek 
> ---
> 
> Changes in V7:
> - Fold Michal's patch into this patch.
> - Handle MSR_RI=0 and evil context case in MC handler.
> ---
>  arch/powerpc/include/asm/book3s/64/mmu-hash.h |1 
>  arch/powerpc/include/asm/machdep.h|1 
>  arch/powerpc/kernel/exceptions-64s.S  |  112
> +
> arch/powerpc/kernel/mce.c |   15 +++
> arch/powerpc/mm/slb.c |6 +
> arch/powerpc/platforms/powernv/setup.c|   11 ++
> arch/powerpc/platforms/pseries/pseries.h  |1
> arch/powerpc/platforms/pseries/ras.c  |   51 +++
> arch/powerpc/platforms/pseries/setup.c|1 9 files changed,
> 195 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> b/arch/powerpc/include/asm/book3s/64/mmu-hash.h index
> 50ed64fba4ae..cc00a7088cf3 100644 ---
> a/arch/powerpc/include/asm/book3s/64/mmu-hash.h +++
> b/arch/powerpc/include/asm/book3s/64/mmu-hash.h @@ -487,6 +487,7 @@
> extern void hpte_init_native(void); 
>  extern void slb_initialize(void);
>  extern void slb_flush_and_rebolt(void);
> +extern void slb_flush_and_rebolt_realmode(void);
>  
>  extern void slb_vmalloc_update(void);
>  extern void slb_set_size(u16 size);
> diff --git a/arch/powerpc/include/asm/machdep.h
> b/arch/powerpc/include/asm/machdep.h index a47de82fb8e2..b4831f1338db
> 100644 --- a/arch/powerpc/include/asm/machdep.h
> +++ b/arch/powerpc/include/asm/machdep.h
> @@ -108,6 +108,7 @@ struct machdep_calls {
>  
>   /* Early exception handlers called in realmode */
>   int (*hmi_exception_early)(struct pt_regs
> *regs);
> + long(*machine_check_early)(struct pt_regs
> *regs); 
>   /* Called during machine check exception to retrive fixup
> address. */ bool  (*mce_check_early_recovery)(struct
> pt_regs *regs); diff --git a/arch/powerpc/kernel/exceptions-64s.S
> b/arch/powerpc/kernel/exceptions-64s.S index
> 285c6465324a..cb06f219570a 100644 ---
> a/arch/powerpc/kernel/exceptions-64s.S +++
> b/arch/powerpc/kernel/exceptions-64s.S @@ -332,6 +332,9 @@
> TRAMP_REAL_BEGIN(machine_check_pSeries) machine_check_fwnmi:
>   SET_SCRATCH0(r13)   /* save r13 */
>   EXCEPTION_PROLOG_0(PACA_EXMC)
> +BEGIN_FTR_SECTION
> + b   machine_check_pSeries_early
> +END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE)
>  machine_check_pSeries_0:
>   EXCEPTION_PROLOG_1(PACA_EXMC, KVMTEST_PR, 0x200)
>   /*
> @@ -343,6 +346,90 @@ machine_check_pSeries_0:
>  
>  TRAMP_KVM_SKIP(PACA_EXMC, 0x200)
>  
> +TRAMP_REAL_BEGIN(machine_check_pSeries_early)
> +BEGIN_FTR_SECTION
> + EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
> + mr  r10,r1  /* Save r1 */
> + ld  r1,PACAMCEMERGSP(r13)   /* Use MC emergency
> stack */
> + subir1,r1,INT_FRAME_SIZE/* alloc stack
> frame */
> + mfspr   r11,SPRN_SRR0   /* Save SRR0 */
> + mfspr   r12,SPRN_SRR1   /* Save SRR1 */
> + EXCEPTION_PROLOG_COMMON_1()
> + EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
> + EXCEPTION_PROLOG_COMMON_3(0x200)
> + addir3,r1,STACK_FRAME_OVERHEAD
> + BRANCH_LINK_TO_FAR(machine_check_early) /* Function call ABI
> */
> + ld  r12,_MSR(r1)
> + andi.   r11,r12,MSR_PR  /* See if coming
> from user. */
> + bne 2f  /* continue in V mode
> if we are. */ +
> + /*
> +  * At this point we are not sure about what context we come
> from.
> +  * We may be in the middle of swithing stack. r1 may not be
> valid.
> +  * Hence stay on emergency stack, call
> machine_check_exception and
> +  * return from the interrupt.
> +  * But before that, check if this is an un-recoverable
> exception.
> +  * If yes, then stay on emergency stack and panic.
> +  */
> + andi.   r11,r12,MSR_RI
> + bne 1f
> +
> + /*
> +  * Check if we have successfully handled/recovered from
> error, if not
> +  * then stay on emergency stack and panic.
> +  */
> + cmpdi   r3,0/* see if we handled MCE
> successfully */
> + bne 1f  /* if handled then return from
> interrupt */ +
> + LOAD_HANDLER(r10,unrecover_mce)
> + mtspr   SPRN_SRR0,r10
> + ld  r10,PACAKMSR(r13)
> + /*
> +  * We are going down. But there are chances that we 

Re: [PATCH v5 5/7] powerpc/pseries: flush SLB contents on SLB MCE errors.

2018-07-12 Thread Michal Suchánek
On Tue, 3 Jul 2018 08:08:14 +1000
"Nicholas Piggin"  wrote:

> On Mon, 02 Jul 2018 11:17:06 +0530
> Mahesh J Salgaonkar  wrote:
> 
> > From: Mahesh Salgaonkar 
> > 
> > On pseries, as of today system crashes if we get a machine check
> > exceptions due to SLB errors. These are soft errors and can be
> > fixed by flushing the SLBs so the kernel can continue to function
> > instead of system crash. We do this in real mode before turning on
> > MMU. Otherwise we would run into nested machine checks. This patch
> > now fetches the rtas error log in real mode and flushes the SLBs on
> > SLB errors.
> > 
> > Signed-off-by: Mahesh Salgaonkar 
> > ---
> >  arch/powerpc/include/asm/book3s/64/mmu-hash.h |1 
> >  arch/powerpc/include/asm/machdep.h|1 
> >  arch/powerpc/kernel/exceptions-64s.S  |   42
> > + arch/powerpc/kernel/mce.c
> > |   16 +++- arch/powerpc/mm/slb.c |
> > 6 +++ arch/powerpc/platforms/powernv/opal.c |1 
> >  arch/powerpc/platforms/pseries/pseries.h  |1 
> >  arch/powerpc/platforms/pseries/ras.c  |   51
> > +
> > arch/powerpc/platforms/pseries/setup.c|1 9 files
> > changed, 116 insertions(+), 4 deletions(-) 
> 
> 
> > +TRAMP_REAL_BEGIN(machine_check_pSeries_early)
> > +BEGIN_FTR_SECTION
> > +   EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
> > +   mr  r10,r1  /* Save r1 */
> > +   ld  r1,PACAMCEMERGSP(r13)   /* Use MC emergency
> > stack */
> > +   subir1,r1,INT_FRAME_SIZE/* alloc stack
> > frame   */
> > +   mfspr   r11,SPRN_SRR0   /* Save SRR0 */
> > +   mfspr   r12,SPRN_SRR1   /* Save SRR1 */
> > +   EXCEPTION_PROLOG_COMMON_1()
> > +   EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
> > +   EXCEPTION_PROLOG_COMMON_3(0x200)
> > +   addir3,r1,STACK_FRAME_OVERHEAD
> > +   BRANCH_LINK_TO_FAR(machine_check_early) /* Function call
> > ABI */  
> 
> Is there any reason you can't use the existing
> machine_check_powernv_early code to do all this?
> 
> > diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> > index efdd16a79075..221271c96a57 100644
> > --- a/arch/powerpc/kernel/mce.c
> > +++ b/arch/powerpc/kernel/mce.c
> > @@ -488,9 +488,21 @@ long machine_check_early(struct pt_regs *regs)
> >  {
> > long handled = 0;
> >  
> > -   __this_cpu_inc(irq_stat.mce_exceptions);
> > +   /*
> > +* For pSeries we count mce when we go into virtual mode
> > machine
> > +* check handler. Hence skip it. Also, We can't access per
> > cpu
> > +* variables in real mode for LPAR.
> > +*/
> > +   if (early_cpu_has_feature(CPU_FTR_HVMODE))
> > +   __this_cpu_inc(irq_stat.mce_exceptions);
> >  
> > -   if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
> > +   /*
> > +* See if platform is capable of handling machine check.
> > +* Otherwise fallthrough and allow CPU to handle this
> > machine check.
> > +*/
> > +   if (ppc_md.machine_check_early)
> > +   handled = ppc_md.machine_check_early(regs);
> > +   else if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
> > handled =
> > cur_cpu_spec->machine_check_early(regs);  
> 
> Would be good to add a powernv ppc_md handler which does the
> cur_cpu_spec->machine_check_early() call now that other platforms are
> calling this code. Because those aren't valid as a fallback call, but
> specific to powernv.
> 

Something like this (untested)?

Subject: [PATCH] powerpc/powernv: define platform MCE handler.

---
 arch/powerpc/kernel/mce.c  |  3 ---
 arch/powerpc/platforms/powernv/setup.c | 11 +++
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 221271c96a57..ae17d8aa60c4 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -498,12 +498,9 @@ long machine_check_early(struct pt_regs *regs)
 
/*
 * See if platform is capable of handling machine check.
-* Otherwise fallthrough and allow CPU to handle this machine check.
 */
if (ppc_md.machine_check_early)
handled = ppc_md.machine_check_early(regs);
-   else if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
-   handled = cur_cpu_spec->machine_check_early(regs);
return handled;
 }
 
diff --git a/arch/powerpc/platforms/powernv/setup.c 
b/arch/powerpc/platforms/powernv/setup.c
index f96df0a25d05..b74c93bc2e55 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -431,6 +431,16 @@ static unsigned long pnv_get_proc_freq(unsigned int cpu)
return ret_freq;
 }
 
+static long pnv_machine_check_early(struct pt_regs *regs)
+{
+   long handled = 0;
+
+   if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
+   handled = cur_cpu_spec->machine_check_early(regs);
+
+   return handled;
+}
+
 

Re: [PATCH v6 5/8] powerpc/pseries: flush SLB contents on SLB MCE errors.

2018-07-10 Thread Michal Suchánek
Hello,

On Wed, 04 Jul 2018 23:28:21 +0530
"Mahesh J Salgaonkar"  wrote:

> From: Mahesh Salgaonkar 
> 
> On pseries, as of today system crashes if we get a machine check
> exceptions due to SLB errors. These are soft errors and can be fixed
> by flushing the SLBs so the kernel can continue to function instead of
> system crash. We do this in real mode before turning on MMU. Otherwise
> we would run into nested machine checks. This patch now fetches the
> rtas error log in real mode and flushes the SLBs on SLB errors.
> 
> Signed-off-by: Mahesh Salgaonkar 
> ---
>  arch/powerpc/include/asm/book3s/64/mmu-hash.h |1 
>  arch/powerpc/include/asm/machdep.h|1 
>  arch/powerpc/kernel/exceptions-64s.S  |   42
> + arch/powerpc/kernel/mce.c
> |   16 +++- arch/powerpc/mm/slb.c |6
> +++ arch/powerpc/platforms/pseries/pseries.h  |1 
>  arch/powerpc/platforms/pseries/ras.c  |   51
> +
> arch/powerpc/platforms/pseries/setup.c|1 8 files changed,
> 116 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> b/arch/powerpc/include/asm/book3s/64/mmu-hash.h index
> 50ed64fba4ae..cc00a7088cf3 100644 ---
> a/arch/powerpc/include/asm/book3s/64/mmu-hash.h +++
> b/arch/powerpc/include/asm/book3s/64/mmu-hash.h @@ -487,6 +487,7 @@
> extern void hpte_init_native(void); 
>  extern void slb_initialize(void);
>  extern void slb_flush_and_rebolt(void);
> +extern void slb_flush_and_rebolt_realmode(void);
>  
>  extern void slb_vmalloc_update(void);
>  extern void slb_set_size(u16 size);
> diff --git a/arch/powerpc/include/asm/machdep.h
> b/arch/powerpc/include/asm/machdep.h index ffe7c71e1132..fe447e0d4140
> 100644 --- a/arch/powerpc/include/asm/machdep.h
> +++ b/arch/powerpc/include/asm/machdep.h
> @@ -108,6 +108,7 @@ struct machdep_calls {
>  
>   /* Early exception handlers called in realmode */
>   int (*hmi_exception_early)(struct pt_regs
> *regs);
> + int (*machine_check_early)(struct pt_regs
> *regs); 
>   /* Called during machine check exception to retrive fixup
> address. */ bool  (*mce_check_early_recovery)(struct
> pt_regs *regs); diff --git a/arch/powerpc/kernel/exceptions-64s.S
> b/arch/powerpc/kernel/exceptions-64s.S index
> f283958129f2..0038596b7906 100644 ---
> a/arch/powerpc/kernel/exceptions-64s.S +++
> b/arch/powerpc/kernel/exceptions-64s.S @@ -332,6 +332,9 @@
> TRAMP_REAL_BEGIN(machine_check_pSeries) machine_check_fwnmi:
>   SET_SCRATCH0(r13)   /* save r13 */
>   EXCEPTION_PROLOG_0(PACA_EXMC)
> +BEGIN_FTR_SECTION
> + b   machine_check_pSeries_early
> +END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE)
>  machine_check_pSeries_0:
>   EXCEPTION_PROLOG_1(PACA_EXMC, KVMTEST_PR, 0x200)
>   /*
> @@ -343,6 +346,45 @@ machine_check_pSeries_0:
>  
>  TRAMP_KVM_SKIP(PACA_EXMC, 0x200)
>  
> +TRAMP_REAL_BEGIN(machine_check_pSeries_early)
> +BEGIN_FTR_SECTION
> + EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
> + mr  r10,r1  /* Save r1 */
> + ld  r1,PACAMCEMERGSP(r13)   /* Use MC emergency
> stack */
> + subir1,r1,INT_FRAME_SIZE/* alloc stack
> frame */
> + mfspr   r11,SPRN_SRR0   /* Save SRR0 */
> + mfspr   r12,SPRN_SRR1   /* Save SRR1 */
> + EXCEPTION_PROLOG_COMMON_1()
> + EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
> + EXCEPTION_PROLOG_COMMON_3(0x200)
> + addir3,r1,STACK_FRAME_OVERHEAD
> + BRANCH_LINK_TO_FAR(machine_check_early) /* Function call ABI
> */ +
> + /* Move original SRR0 and SRR1 into the respective regs */
> + ld  r9,_MSR(r1)
> + mtspr   SPRN_SRR1,r9
> + ld  r3,_NIP(r1)
> + mtspr   SPRN_SRR0,r3
> + ld  r9,_CTR(r1)
> + mtctr   r9
> + ld  r9,_XER(r1)
> + mtxer   r9
> + ld  r9,_LINK(r1)
> + mtlrr9
> + REST_GPR(0, r1)
> + REST_8GPRS(2, r1)
> + REST_GPR(10, r1)
> + ld  r11,_CCR(r1)
> + mtcrr11
> + REST_GPR(11, r1)
> + REST_2GPRS(12, r1)
> + /* restore original r1. */
> + ld  r1,GPR1(r1)
> + SET_SCRATCH0(r13)   /* save r13 */
> + EXCEPTION_PROLOG_0(PACA_EXMC)
> + b   machine_check_pSeries_0
> +END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE)
> +
>  EXC_COMMON_BEGIN(machine_check_common)
>   /*
>* Machine check is different because we use a different
> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> index efdd16a79075..221271c96a57 100644
> --- a/arch/powerpc/kernel/mce.c
> +++ b/arch/powerpc/kernel/mce.c
> @@ -488,9 +488,21 @@ long machine_check_early(struct pt_regs *regs)
>  {
>   long handled = 0;
>  
> - __this_cpu_inc(irq_stat.mce_exceptions);
> + /*
> +  * For pSeries we count mce when we go into virtual mode
> machine
> +  * check handler. Hence skip it. Also, We can't access per
> cpu
> +  * 

Re: [PATCH v6 8/8] powernv/pseries: consolidate code for mce early handling.

2018-07-09 Thread Michal Suchánek
On Fri, 6 Jul 2018 19:40:24 +1000
Nicholas Piggin  wrote:

> On Wed, 04 Jul 2018 23:30:12 +0530
> Mahesh J Salgaonkar  wrote:
> 
> > From: Mahesh Salgaonkar 
> > 
> > Now that other platforms also implement a real mode mce handler,
> > let's consolidate the code by sharing the existing powernv machine
> > check early code. Rename machine_check_powernv_early to
> > machine_check_common_early and reuse the code.
> > 
> > Signed-off-by: Mahesh Salgaonkar 
> > ---
> >  arch/powerpc/kernel/exceptions-64s.S |   56 +++---
> >  1 file changed, 11 insertions(+), 45 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/exceptions-64s.S
> > b/arch/powerpc/kernel/exceptions-64s.S
> > index 0038596b7906..3e877ec55d50 100644
> > --- a/arch/powerpc/kernel/exceptions-64s.S
> > +++ b/arch/powerpc/kernel/exceptions-64s.S
> > @@ -243,14 +243,13 @@ EXC_REAL_BEGIN(machine_check, 0x200, 0x100)
> >  SET_SCRATCH0(r13)   /* save r13 */
> >  EXCEPTION_PROLOG_0(PACA_EXMC)
> >  BEGIN_FTR_SECTION
> > -   b   machine_check_powernv_early
> > +   b   machine_check_common_early
> >  FTR_SECTION_ELSE
> > b   machine_check_pSeries_0
> >  ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
> >  EXC_REAL_END(machine_check, 0x200, 0x100)
> >  EXC_VIRT_NONE(0x4200, 0x100)
> > -TRAMP_REAL_BEGIN(machine_check_powernv_early)
> > -BEGIN_FTR_SECTION
> > +TRAMP_REAL_BEGIN(machine_check_common_early)
> > EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
> > /*
> >  * Register contents:
> > @@ -306,7 +305,9 @@ BEGIN_FTR_SECTION
> >  /* Save r9 through r13 from EXMC save area to stack frame. */
> >  EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
> >  mfmsr   r11 /* get MSR value */
> > +BEGIN_FTR_SECTION
> >  ori r11,r11,MSR_ME  /* turn on ME bit */
> > +END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
> >  ori r11,r11,MSR_RI  /* turn on RI bit */
> >  LOAD_HANDLER(r12, machine_check_handle_early)
> >  1: mtspr   SPRN_SRR0,r12
> > @@ -325,7 +326,6 @@ BEGIN_FTR_SECTION
> >  andc r11,r11,r10 /* Turn off MSR_ME */
> >  b   1b
> > b   .   /* prevent speculative execution */
> > -END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
> >  
> >  TRAMP_REAL_BEGIN(machine_check_pSeries)
> > .globl machine_check_fwnmi
> > @@ -333,7 +333,7 @@ machine_check_fwnmi:
> > SET_SCRATCH0(r13)   /* save r13 */
> > EXCEPTION_PROLOG_0(PACA_EXMC)
> >  BEGIN_FTR_SECTION
> > -   b   machine_check_pSeries_early
> > +   b   machine_check_common_early
> >  END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE)
> >  machine_check_pSeries_0:
> > EXCEPTION_PROLOG_1(PACA_EXMC, KVMTEST_PR, 0x200)
> > @@ -346,45 +346,6 @@ machine_check_pSeries_0:
> >  
> >  TRAMP_KVM_SKIP(PACA_EXMC, 0x200)
> >  
> > -TRAMP_REAL_BEGIN(machine_check_pSeries_early)
> > -BEGIN_FTR_SECTION
> > -   EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
> > -   mr  r10,r1  /* Save r1 */
> > -   ld  r1,PACAMCEMERGSP(r13)   /* Use MC emergency stack */
> > -   subi r1,r1,INT_FRAME_SIZE   /* alloc stack frame */
> > -   mfspr   r11,SPRN_SRR0   /* Save SRR0 */
> > -   mfspr   r12,SPRN_SRR1   /* Save SRR1 */
> > -   EXCEPTION_PROLOG_COMMON_1()
> > -   EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
> > -   EXCEPTION_PROLOG_COMMON_3(0x200)
> > -   addi r3,r1,STACK_FRAME_OVERHEAD
> > -   BRANCH_LINK_TO_FAR(machine_check_early) /* Function call ABI */
> > -
> > -   /* Move original SRR0 and SRR1 into the respective regs */
> > -   ld  r9,_MSR(r1)
> > -   mtspr   SPRN_SRR1,r9
> > -   ld  r3,_NIP(r1)
> > -   mtspr   SPRN_SRR0,r3
> > -   ld  r9,_CTR(r1)
> > -   mtctr   r9
> > -   ld  r9,_XER(r1)
> > -   mtxer   r9
> > -   ld  r9,_LINK(r1)
> > -   mtlr r9
> > -   REST_GPR(0, r1)
> > -   REST_8GPRS(2, r1)
> > -   REST_GPR(10, r1)
> > -   ld  r11,_CCR(r1)
> > -   mtcr r11
> > -   REST_GPR(11, r1)
> > -   REST_2GPRS(12, r1)
> > -   /* restore original r1. */
> > -   ld  r1,GPR1(r1)
> > -   SET_SCRATCH0(r13)   /* save r13 */
> > -   EXCEPTION_PROLOG_0(PACA_EXMC)
> > -   b   machine_check_pSeries_0
> > -END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE)
> > -
> >  EXC_COMMON_BEGIN(machine_check_common)
> > /*
> >  * Machine check is different because we use a different
> > @@ -483,6 +444,9 @@ EXC_COMMON_BEGIN(machine_check_handle_early)
> > bl  machine_check_early
> > std r3,RESULT(r1)   /* Save result */
> > ld  r12,_MSR(r1)
> > +BEGIN_FTR_SECTION
> > +   bne 9f  /* pSeries: continue to V mode. */
> > +END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE)  
> 
> Should this be "b 9f" ? Although...
> 
> >  
> >  #ifdef CONFIG_PPC_P7_NAP
> > /*
> > @@ -564,7 +528,9 @@ EXC_COMMON_BEGIN(machine_check_handle_early)
> >  9:
> > /* Deliver the machine check to host kernel in V mode. */
> > MACHINE_CHECK_HANDLER_WINDUP
> > -   b   machine_check_pSeries
> > +   SET_SCRATCH0(r13)  

Re: [PATCH v5 5/7] powerpc/pseries: flush SLB contents on SLB MCE errors.

2018-07-03 Thread Michal Suchánek
On Tue, 3 Jul 2018 08:08:14 +1000
Nicholas Piggin  wrote:

> On Mon, 02 Jul 2018 11:17:06 +0530
> Mahesh J Salgaonkar  wrote:
> 
> > From: Mahesh Salgaonkar 
> > 
> > On pseries, as of today, the system crashes if we get machine check
> > exceptions due to SLB errors. These are soft errors and can be
> > fixed by flushing the SLBs so the kernel can continue to function
> > instead of crashing. We do this in real mode before turning on the
> > MMU; otherwise we would run into nested machine checks. This patch
> > now fetches the rtas error log in real mode and flushes the SLBs on
> > SLB errors.
> > 
> > Signed-off-by: Mahesh Salgaonkar 
> > ---
> >  arch/powerpc/include/asm/book3s/64/mmu-hash.h |1 
> >  arch/powerpc/include/asm/machdep.h|1 
> >  arch/powerpc/kernel/exceptions-64s.S  |   42 +
> >  arch/powerpc/kernel/mce.c |   16 +++-
> >  arch/powerpc/mm/slb.c |6 +++
> >  arch/powerpc/platforms/powernv/opal.c |1 
> >  arch/powerpc/platforms/pseries/pseries.h  |1 
> >  arch/powerpc/platforms/pseries/ras.c  |   51 +
> >  arch/powerpc/platforms/pseries/setup.c|1 
> >  9 files changed, 116 insertions(+), 4 deletions(-)
> 
> 
> > +TRAMP_REAL_BEGIN(machine_check_pSeries_early)
> > +BEGIN_FTR_SECTION
> > +   EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
> > +   mr  r10,r1  /* Save r1 */
> > +   ld  r1,PACAMCEMERGSP(r13)   /* Use MC emergency stack */
> > +   subi r1,r1,INT_FRAME_SIZE   /* alloc stack frame */
> > +   mfspr   r11,SPRN_SRR0   /* Save SRR0 */
> > +   mfspr   r12,SPRN_SRR1   /* Save SRR1 */
> > +   EXCEPTION_PROLOG_COMMON_1()
> > +   EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
> > +   EXCEPTION_PROLOG_COMMON_3(0x200)
> > +   addi r3,r1,STACK_FRAME_OVERHEAD
> > +   BRANCH_LINK_TO_FAR(machine_check_early) /* Function call ABI */
> 
> Is there any reason you can't use the existing
> machine_check_powernv_early code to do all this?

Code sharing is nice, but if we envision this going to stable kernels,
butchering the existing handler is going to be a nightmare. The code
differs quite a bit between kernel versions.

This code, as is, requires the commit that introduces
EXCEPTION_PROLOG_COMMON_1 and should then work on Linux 3.14+.

Thanks

Michal


Re: [v3 PATCH 5/5] powerpc/pseries: Display machine check error details.

2018-07-02 Thread Michal Suchánek
On Fri, 8 Jun 2018 11:51:36 +1000
Nicholas Piggin  wrote:

> On Thu, 07 Jun 2018 22:59:04 +0530
> Mahesh J Salgaonkar  wrote:
> 
> > From: Mahesh Salgaonkar 
> > 
> > Extract the MCE error details from the RTAS extended log and display
> > them on the console.
> > 
> > With this patch you should now see mce logs like below:
> > 
> > [  142.371818] Severe Machine check interrupt [Recovered]
> > [  142.371822]   NIP [dca301b8]: init_module+0x1b8/0x338 [bork_kernel]
> > [  142.371822]   Initiator: CPU
> > [  142.371823]   Error type: SLB [Multihit]
> > [  142.371824] Effective address: dca7
> > 
> > Signed-off-by: Mahesh Salgaonkar 
> > ---
> >  arch/powerpc/include/asm/rtas.h  |5 +
> >  arch/powerpc/platforms/pseries/ras.c |  128 +-
> >  2 files changed, 131 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/rtas.h
> > b/arch/powerpc/include/asm/rtas.h
> > index 3f2fba7ef23b..8100a95c133a 100644
> > --- a/arch/powerpc/include/asm/rtas.h
> > +++ b/arch/powerpc/include/asm/rtas.h
> > @@ -190,6 +190,11 @@ static inline uint8_t rtas_error_extended(const struct rtas_error_log *elog)
> >  return (elog->byte1 & 0x04) >> 2;
> >  }
> >  
> > +static inline uint8_t rtas_error_initiator(const struct
> > rtas_error_log *elog) +{
> > +   return (elog->byte2 & 0xf0) >> 4;
> > +}
> > +
> >  #define rtas_error_type(x) ((x)->byte3)
> >  
> >  static inline
> > diff --git a/arch/powerpc/platforms/pseries/ras.c
> > b/arch/powerpc/platforms/pseries/ras.c
> > index e56759d92356..cd9446980092 100644
> > --- a/arch/powerpc/platforms/pseries/ras.c
> > +++ b/arch/powerpc/platforms/pseries/ras.c
> > @@ -422,7 +422,130 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
> >  return 0; /* need to perform reset */
> >  }
> >  
> > -static int mce_handle_error(struct rtas_error_log *errp)
> > +#define VAL_TO_STRING(ar, val) ((val < ARRAY_SIZE(ar)) ? ar[val] : "Unknown")
> > +
> > +static void pseries_print_mce_info(struct pt_regs *regs,
> > +   struct rtas_error_log *errp, int disposition)
> > +{
> > +   const char *level, *sevstr;
> > +   struct pseries_errorlog *pseries_log;
> > +   struct pseries_mc_errorlog *mce_log;
> > +   uint8_t error_type, err_sub_type;
> > +   uint8_t initiator = rtas_error_initiator(errp);
> > +   uint64_t addr;
> > +
> > +   static const char * const initiators[] = {
> > +   "Unknown",
> > +   "CPU",
> > +   "PCI",
> > +   "ISA",
> > +   "Memory",
> > +   "Power Mgmt",
> > +   };
> > +   static const char * const mc_err_types[] = {
> > +   "UE",
> > +   "SLB",
> > +   "ERAT",
> > +   "TLB",
> > +   "D-Cache",
> > +   "Unknown",
> > +   "I-Cache",
> > +   };
> > +   static const char * const mc_ue_types[] = {
> > +   "Indeterminate",
> > +   "Instruction fetch",
> > +   "Page table walk ifetch",
> > +   "Load/Store",
> > +   "Page table walk Load/Store",
> > +   };
> > +
> > +   /* SLB sub errors valid values are 0x0, 0x1, 0x2 */
> > +   static const char * const mc_slb_types[] = {
> > +   "Parity",
> > +   "Multihit",
> > +   "Indeterminate",
> > +   };
> > +
> > +   /* TLB and ERAT sub errors valid values are 0x1, 0x2, 0x3 */
> > +   static const char * const mc_soft_types[] = {
> > +   "Unknown",
> > +   "Parity",
> > +   "Multihit",
> > +   "Indeterminate",
> > +   };
> > +
> > +   pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
> > +   if (pseries_log == NULL)
> > +   return;
> > +
> > +   mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
> > +
> > +   error_type = rtas_mc_error_type(mce_log);
> > +   err_sub_type = rtas_mc_error_sub_type(mce_log);
> > +
> > +   switch (rtas_error_severity(errp)) {
> > +   case RTAS_SEVERITY_NO_ERROR:
> > +   level = KERN_INFO;
> > +   sevstr = "Harmless";
> > +   break;
> > +   case RTAS_SEVERITY_WARNING:
> > +   level = KERN_WARNING;
> > +   sevstr = "";
> > +   break;
> > +   case RTAS_SEVERITY_ERROR:
> > +   case RTAS_SEVERITY_ERROR_SYNC:
> > +   level = KERN_ERR;
> > +   sevstr = "Severe";
> > +   break;
> > +   case RTAS_SEVERITY_FATAL:
> > +   default:
> > +   level = KERN_ERR;
> > +   sevstr = "Fatal";
> > +   break;
> > +   }
> > +
> > +   printk("%s%s Machine check interrupt [%s]\n", level, sevstr,
> > +   disposition == RTAS_DISP_FULLY_RECOVERED ?
> > +   "Recovered" : "Not recovered");
> > +   if (user_mode(regs)) {
> > +   printk("%s  NIP: [%016lx] PID: %d Comm: %s\n", level,
> > +   regs->nip, current->pid, current->comm);
> > +   regs->nip, current->pid, current->comm);
> > +   } else {
> > +   printk("%s  NIP [%016lx]: %pS\n", level, regs->nip,
> > +   (void *)regs->nip);
> > +   }  
> 
> I 

Re: [PATCH v2 3/3] powerpc/fsl: Implement cpu_show_spectre_v1/v2 for NXP PowerPC Book3E

2018-06-12 Thread Michal Suchánek
On Tue, 12 Jun 2018 02:59:11 +
Bharat Bhushan  wrote:

> Hi Diana,
> 
> > -Original Message-
> > From: Diana Craciun [mailto:diana.crac...@nxp.com]
> > Sent: Monday, June 11, 2018 6:23 PM
> > To: linuxppc-dev@lists.ozlabs.org
> > Cc: m...@ellerman.id.au; o...@buserror.net; Leo Li
> > ; Bharat Bhushan ;
> > Diana Madalina Craciun 
> > Subject: [PATCH v2 3/3] powerpc/fsl: Implement
> > cpu_show_spectre_v1/v2 for NXP PowerPC Book3E  
> 
> Please add some description

To me the subject is self-explanatory. It implements a kernel interface
that was already described elsewhere.

What are you missing here?

Thanks

Michal

> 
> > 
> > Signed-off-by: Diana Craciun 
> > ---
> >  arch/powerpc/Kconfig   |  2 +-
> >  arch/powerpc/kernel/security.c | 15 +++
> >  2 files changed, 16 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index 940c955..a781d60 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -170,7 +170,7 @@ config PPC
> > select GENERIC_CLOCKEVENTS_BROADCASTif SMP
> > select GENERIC_CMOS_UPDATE
> > select GENERIC_CPU_AUTOPROBE
> > -   select GENERIC_CPU_VULNERABILITIES  if PPC_BOOK3S_64
> > +   select GENERIC_CPU_VULNERABILITIES  if PPC_BOOK3S_64 || PPC_FSL_BOOK3E
> > select GENERIC_IRQ_SHOW
> > select GENERIC_IRQ_SHOW_LEVEL
> > select GENERIC_SMP_IDLE_THREAD
> > diff --git a/arch/powerpc/kernel/security.c
> > b/arch/powerpc/kernel/security.c
> > index 797c975..aceaadc 100644
> > --- a/arch/powerpc/kernel/security.c
> > +++ b/arch/powerpc/kernel/security.c
> > @@ -183,3 +183,18 @@ ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute *attr, c
> >  }
> >  #endif /* CONFIG_PPC_BOOK3S_64 */
> > 
> > +#ifdef CONFIG_PPC_FSL_BOOK3E
> > +ssize_t cpu_show_spectre_v1(struct device *dev, struct device_attribute *attr, char *buf)
> > +{
> > +   if (barrier_nospec_enabled)
> > +   return sprintf(buf, "Mitigation: __user pointer sanitization\n");
> > +
> > +   return sprintf(buf, "Vulnerable\n");
> > +}
> > +
> > +ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute *attr, char *buf)
> > +{
> > +   return sprintf(buf, "Vulnerable\n");
> > +}
> > +#endif /* CONFIG_PPC_FSL_BOOK3E */
> > +
> > --
> > 2.5.5  
> 



Re: [RFC PATCH] powerpc/64s: remove POWER9 DD1 support

2018-06-10 Thread Michal Suchánek
On Sun, 10 Jun 2018 23:30:27 +1000
Nicholas Piggin  wrote:

> POWER9 DD1 was never a product. It is no longer supported by upstream
> firmware, and it is not effectively supported in Linux due to lack of
> testing.
> 

> diff --git a/arch/powerpc/kvm/book3s_xive_template.c
> b/arch/powerpc/kvm/book3s_xive_template.c
> index 99c3620b40d9..487f1f6650cc 100644
> --- a/arch/powerpc/kvm/book3s_xive_template.c
> +++ b/arch/powerpc/kvm/book3s_xive_template.c
> @@ -25,18 +25,6 @@ static void GLUE(X_PFX,ack_pending)(struct kvmppc_xive_vcpu *xc)
>  */
>   eieio();
>  
> - /*
> -  * DD1 bug workaround: If PIPR is less favored than CPPR
> -  * ignore the interrupt or we might incorrectly lose an IPB
> -  * bit.
> -  */
> - if (cpu_has_feature(CPU_FTR_POWER9_DD1)) {
> - __be64 qw1 = __x_readq(__x_tima + TM_QW1_OS);
> - u8 pipr = be64_to_cpu(qw1) & 0xff;
> - if (pipr >= xc->hw_cppr)
> - return;
> - }
> -
>   /* Perform the acknowledge OS to register cycle. */
>   ack = be16_to_cpu(__x_readw(__x_tima + TM_SPC_ACK_OS_REG));
>  
> @@ -105,7 +93,7 @@ static void GLUE(X_PFX,source_eoi)(u32 hw_irq, struct xive_irq_data *xd)
>  *
>  * For LSIs, using the HW EOI cycle works around a problem
>  * on P9 DD1 PHBs where the other ESB accesses don't work
> -  * properly.
> +  * properly. XXX: can this be removed?
>*/
>   if (xd->flags & XIVE_IRQ_FLAG_LSI)
>   __x_readq(__x_eoi_page(xd) +

Maybe this should really be removed, or the comment changed to explain
why it is still useful?

Thanks

Michal


Re: pkeys on POWER: Access rights not reset on execve

2018-06-08 Thread Michal Suchánek
On Fri, 8 Jun 2018 15:51:03 +0200
Florian Weimer  wrote:

> On 06/08/2018 03:49 PM, Michal Suchánek wrote:
> > On Fri, 8 Jun 2018 14:57:06 +0200
> > Florian Weimer  wrote:
> >   
> >> On 06/08/2018 02:54 PM, Michal Suchánek wrote:  
> >>> On Fri, 8 Jun 2018 12:44:53 +0200
> >>> Florian Weimer  wrote:
> >>>  
> >>>> On 06/08/2018 12:15 PM, Michal Suchánek wrote:  
> >>>>> On Fri, 8 Jun 2018 07:53:51 +0200
> >>>>> Florian Weimer  wrote:
> >>>>> 
> >>>>>> On 06/08/2018 04:34 AM, Ram Pai wrote:  
> >>>>>>>>
> >>>>>>>> So the remaining question at this point is whether the Intel
> >>>>>>>> behavior (default-deny instead of default-allow) is
> >>>>>>>> preferable.  
> >>>>>>>
> >>>>>>> Florian, remind me what behavior needs to fixed?  
> >>>>>>
> >>>>>> See the other thread.  The Intel register equivalent to the AMR
> >>>>>> by default disallows access to yet-unallocated keys, so that
> >>>>>> threads which are created before key allocation do not
> >>>>>> magically gain access to a key allocated by another thread.
> >>>>>>
> >>>>>
> >>>>> That does not make any sense. The threads share the address
> >>>>> space so they should also share the keys.
> >>>>>
> >>>>> Or in other words the keys are supposed to be acceleration of
> >>>>> mprotect() so if mprotect() magically gives access to threads
> >>>>> that did not call it so should pkey functions. If they cannot
> >>>>> do that then they fail the primary purpose.  
> >>>>
> >>>> That's not how protection keys work.  The access rights are
> >>>> thread-specific, so that you can change them locally, without
> >>>> synchronization and expensive inter-node communication.
> >>>> 
> >>>
> >>> And the association of a key with part of the address space is
> >>> thread-local as well?  
> >>
> >> No, that part is still per-process.  
> > 
> > So as said above it does not make sense to make keys per-thread.  
> 
> The keys are still global, but the access rights are per-thread and
> have to be for reliability reasons.
> 

Oh, right. The association of keys to memory is independent of key
allocation. However, to change a key's permissions or its memory
association you need to allocate it first. And key allocation is
propagated lazily between threads so you do not have to stop the world
to allocate a key. So if the default permissions of an unallocated key
allow access, then allocating a key and associating it with memory
makes that memory accessible to threads that are not yet aware the key
has been allocated, which is not desirable.

Sounds sensible.

Thanks

Michal


Re: pkeys on POWER: Access rights not reset on execve

2018-06-08 Thread Michal Suchánek
On Fri, 8 Jun 2018 14:57:06 +0200
Florian Weimer  wrote:

> On 06/08/2018 02:54 PM, Michal Suchánek wrote:
> > On Fri, 8 Jun 2018 12:44:53 +0200
> > Florian Weimer  wrote:
> >   
> >> On 06/08/2018 12:15 PM, Michal Suchánek wrote:  
> >>> On Fri, 8 Jun 2018 07:53:51 +0200
> >>> Florian Weimer  wrote:
> >>>  
> >>>> On 06/08/2018 04:34 AM, Ram Pai wrote:  
> >>>>>>
> >>>>>> So the remaining question at this point is whether the Intel
> >>>>>> behavior (default-deny instead of default-allow) is
> >>>>>> preferable.  
> >>>>>
> >>>>> Florian, remind me what behavior needs to fixed?  
> >>>>
> >>>> See the other thread.  The Intel register equivalent to the AMR
> >>>> by default disallows access to yet-unallocated keys, so that
> >>>> threads which are created before key allocation do not magically
> >>>> gain access to a key allocated by another thread.
> >>>> 
> >>>
> >>> That does not make any sense. The threads share the address space
> >>> so they should also share the keys.
> >>>
> >>> Or in other words the keys are supposed to be acceleration of
> >>> mprotect() so if mprotect() magically gives access to threads that
> >>> did not call it so should pkey functions. If they cannot do that
> >>> then they fail the primary purpose.  
> >>
> >> That's not how protection keys work.  The access rights are
> >> thread-specific, so that you can change them locally, without
> >> synchronization and expensive inter-node communication.
> >>  
> > 
> > And the association of a key with part of the address space is
> > thread-local as well?  
> 
> No, that part is still per-process.

So as said above it does not make sense to make keys per-thread.

Thanks

Michal

