Re: [PATCH] Doc: admin-guide: Add entry for kvm_cma_resv_ratio kernel param

2020-09-16 Thread Satheesh Rajendran
Hi Randy,

Thanks for the comments, will send a V2 fixing them.

On Tue, Sep 15, 2020 at 11:18:52PM -0700, Randy Dunlap wrote:
> On 9/15/20 11:11 PM, sathn...@linux.vnet.ibm.com wrote:
> > From: Satheesh Rajendran 
> > 
> > Add document entry for kvm_cma_resv_ratio kernel param which
> > is used to alter the KVM contiguous memory allocation percentage
> > for hash pagetable allocation used by hash mode PowerPC KVM guests.
> > 
> > Cc: linux-ker...@vger.kernel.org
> > Cc: kvm-...@vger.kernel.org
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Cc: Paul Mackerras 
> > Cc: Michael Ellerman 
> > Cc: Jonathan Corbet   
> > Signed-off-by: Satheesh Rajendran 
> > ---
> >  Documentation/admin-guide/kernel-parameters.txt | 9 +
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> > b/Documentation/admin-guide/kernel-parameters.txt
> > index a1068742a6df..9cb126573c71 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -599,6 +599,15 @@
> > altogether. For more information, see
> > include/linux/dma-contiguous.h
> >  
> > +kvm_cma_resv_ratio=n
> > +[PPC]
> 
> You can put [PPC] on the line above.
> 
sure
> > +Reserves given percentage from system memory area for
> > +contiguous memory allocation for KVM hash pagetable
> > +allocation.
> > +Bydefault it reserves 5% of total system memory.
> 
>  By default
> 
> > +Format: 
> > +Default: 5
> > +
> 
> and please use tabs for indentation, not all spaces.
> 
sure
> > cmo_free_hint=  [PPC] Format: { yes | no }
> > Specify whether pages are marked as being inactive
> > when they are freed.  This is used in CMO environments
> > 
> 
> Entries in kernel-parameters.txt should be sorted into dictionary order,
> so please put that with the other kvm parameters.
> 
sure.
> thanks.
> -- 
> ~Randy
> 

Thanks!
-Satheesh.
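
For readers of the archive, the effect of the parameter discussed above can be sketched as simple arithmetic. This is only an illustration under the assumption that the reservation is a plain percentage of total system memory, as the proposed documentation text says; the helper name and constant are hypothetical and are not kernel symbols.

```c
#include <stdint.h>

/* Default ratio quoted in the proposed documentation entry (5%). */
#define SKETCH_DEFAULT_RESV_RATIO 5

/*
 * Hypothetical helper, not a kernel function: bytes that a given
 * kvm_cma_resv_ratio would reserve out of total_mem_bytes.
 * Multiplies first, so it assumes total_mem_bytes * ratio_percent
 * fits in 64 bits.
 */
uint64_t sketch_cma_resv_bytes(uint64_t total_mem_bytes,
			       unsigned int ratio_percent)
{
	return (total_mem_bytes * ratio_percent) / 100;
}
```

With the default ratio, a host with 100 GiB of memory would set aside 5 GiB under this model; booting with kvm_cma_resv_ratio=10 would double that.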


Re: [PATCH v5 05/10] powerpc/smp: Dont assume l2-cache to be superset of sibling

2020-09-13 Thread Satheesh Rajendran
On Fri, Sep 11, 2020 at 09:55:23PM +1000, Michael Ellerman wrote:
> Srikar Dronamraju  writes:
> > Current code assumes that cpumask of cpus sharing a l2-cache mask will
> > always be a superset of cpu_sibling_mask.
> >
> > Lets stop that assumption. cpu_l2_cache_mask is a superset of
> > cpu_sibling_mask if and only if shared_caches is set.
> 
> I'm seeing oopses with this:
> 
> [0.117392][T1] smp: Bringing up secondary CPUs ...
> [0.156515][T1] smp: Brought up 2 nodes, 2 CPUs
> [0.158265][T1] numa: Node 0 CPUs: 0
> [0.158520][T1] numa: Node 1 CPUs: 1
> [0.167453][T1] BUG: Unable to handle kernel data access on read at 
> 0x800041228298
> [0.167992][T1] Faulting instruction address: 0xc018c128
> [0.168817][T1] Oops: Kernel access of bad area, sig: 11 [#1]
> [0.168964][T1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [0.169417][T1] Modules linked in:
> [0.170047][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> 5.9.0-rc2-00095-g7430ad5aa700 #209
> [0.170305][T1] NIP:  c018c128 LR: c018c0cc CTR: 
> c004dce0
> [0.170498][T1] REGS: c0007e343880 TRAP: 0380   Not tainted  
> (5.9.0-rc2-00095-g7430ad5aa700)
> [0.170602][T1] MSR:  82009033   CR: 
> 4400  XER: 
> [0.170985][T1] CFAR: c018c288 IRQMASK: 0
> [0.170985][T1] GPR00:  c0007e343b10 
> c173e400 4000
> [0.170985][T1] GPR04:  0800 
> 0800 
> [0.170985][T1] GPR08:  c122c298 
> c0003fffc000 c0007fd05ce8
> [0.170985][T1] GPR12: c0007e0119f8 c193 
> 8ade 
> [0.170985][T1] GPR16: c0007e3c0640 0917 
> c0007e3c0658 0008
> [0.170985][T1] GPR20: c15d0bb8 8ade 
> c0f57400 c1817c28
> [0.170985][T1] GPR24: c176dc80 c0007e3c0890 
> c0007e3cfe00 
> [0.170985][T1] GPR28: c1772310 c0007e011900 
> c0007e3c0800 0001
> [0.172750][T1] NIP [c018c128] build_sched_domains+0x808/0x14b0
> [0.172900][T1] LR [c018c0cc] build_sched_domains+0x7ac/0x14b0
> [0.173186][T1] Call Trace:
> [0.173484][T1] [c0007e343b10] [c018bfe8] 
> build_sched_domains+0x6c8/0x14b0 (unreliable)
> [0.173821][T1] [c0007e343c50] [c018dcdc] 
> sched_init_domains+0xec/0x130
> [0.174037][T1] [c0007e343ca0] [c10d59d8] 
> sched_init_smp+0x50/0xc4
> [0.174207][T1] [c0007e343cd0] [c10b45c4] 
> kernel_init_freeable+0x1b4/0x378
> [0.174378][T1] [c0007e343db0] [c00129fc] 
> kernel_init+0x24/0x158
> [0.174740][T1] [c0007e343e20] [c000d9d0] 
> ret_from_kernel_thread+0x5c/0x6c
> [0.175050][T1] Instruction dump:
> [0.175626][T1] 554905ee 71480040 7d2907b4 4182016c 2c29 3920006e 
> 913e002c 41820034
> [0.175841][T1] 7c6307b4 e9300020 78631f24 7d58182a <7d2a482a> 
> f93e0080 7d404828 314a0001
> [0.178340][T1] ---[ end trace 6876b88dd1d4b3bb ]---
> [0.178512][T1]
> [1.180458][T1] Kernel panic - not syncing: Attempted to kill init! 
> exitcode=0x000b
> 
> That's qemu:
> 
> qemu-system-ppc64 -nographic -vga none -M pseries -cpu POWER8 \
>   -kernel build~/vmlinux \
>   -m 2G,slots=2,maxmem=4G \
>   -object memory-backend-ram,size=1G,id=m0 \
>   -object memory-backend-ram,size=1G,id=m1 \
>   -numa node,nodeid=0,memdev=m0 \
>   -numa node,nodeid=1,memdev=m1 \
>   -smp 2,sockets=2,maxcpus=2  \

PowerKVM guest vCPUs do not yet have L2 and L3 cache elements.
I raised this bug some time ago; it is probably related:
https://bugs.launchpad.net/qemu/+bug/1774605


Regards,
-Satheesh.
> 
> 
> On mambo I get:
> 
> [0.005069][T1] smp: Bringing up secondary CPUs ...
> [0.011656][T1] smp: Brought up 2 nodes, 8 CPUs
> [0.011682][T1] numa: Node 0 CPUs: 0-3
> [0.011709][T1] numa: Node 1 CPUs: 4-7
> [0.012015][T1] BUG: arch topology borken
> [0.012040][T1]  the SMT domain not a subset of the CACHE domain
> [0.012107][T1] BUG: Unable to handle kernel data access on read at 
> 0x8001012e7398
> [0.012142][T1] Faulting instruction address: 0xc01aa4f0
> [0.012174][T1] Oops: Kernel access of bad area, sig: 11 [#1]
> [0.012206][T1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
> [0.012236][T1] Modules linked in:
> [0.012264][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> 5.9.0-rc2-00095-g7430ad5aa700 #1
> [0.012304][T1] NIP:  c01aa4f0 LR: c01aa498 CTR: 
> 
> [0.012341][T1] REGS: c000ef583880 TRAP: 0380   Not tainted  
> 

Re: [PATCH 0/7] Optimization to improve cpu online/offline on Powerpc

2020-07-29 Thread Satheesh Rajendran
On Mon, Jul 27, 2020 at 01:25:25PM +0530, Srikar Dronamraju wrote:
> Anton reported that his 4096 cpu (1024 cores in a socket) was taking too
> long to boot. He also analyzed that most of the time was being spent on
> updating cpu_core_mask.
> 
> Here are some optimizations and fixes to make ppc64_cpu --smt=8/ppc64_cpu
> --smt=1 run faster and hence boot the kernel also faster.
> 
> It's based on top of my v4 coregroup support patchset.
> http://lore.kernel.org/lkml/20200727053230.19753-1-sri...@linux.vnet.ibm.com/t/#u
> 
> The first two patches should solve Anton's immediate problem.
> With the unofficial patches, Anton reported that the boot time came down
> from 30 mins to 6 seconds (basically a high core count in a single socket
> configuration). Satheesh also reported similar numbers.
> 
> The rest are simple cleanups/optimizations.
> 
> Since cpu_core_mask is an exported symbol for a long duration, lets retain
> as a snapshot of cpumask_of_node.

Boot tested on a P9 KVM guest.

without this series:
# dmesg|grep smp
[0.066624] smp: Bringing up secondary CPUs ...
[  347.521264] smp: Brought up 1 node, 2048 CPUs

with this series:
# dmesg|grep smp
[0.067744] smp: Bringing up secondary CPUs ...
[    5.416910] smp: Brought up 1 node, 2048 CPUs

Tested-by: Satheesh Rajendran 

Regards,
-Satheesh
> 
> Architecture:        ppc64le
> Byte Order:          Little Endian
> CPU(s):              160
> On-line CPU(s) list: 0-159
> Thread(s) per core:  4
> Core(s) per socket:  20
> Socket(s):           2
> NUMA node(s):        2
> Model:               2.2 (pvr 004e 1202)
> Model name:          POWER9, altivec supported
> CPU max MHz:         3800.
> CPU min MHz:         2166.
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            512K
> L3 cache:            10240K
> NUMA node0 CPU(s):   0-79
> NUMA node8 CPU(s):   80-159
> 
> without patch (powerpc/next)
> [0.099347] smp: Bringing up secondary CPUs ...
> [0.832513] smp: Brought up 2 nodes, 160 CPUs
> 
> with powerpc/next + coregroup support patchset
> [0.099241] smp: Bringing up secondary CPUs ...
> [0.835627] smp: Brought up 2 nodes, 160 CPUs
> 
> with powerpc/next + coregroup + this patchset
> [0.097232] smp: Bringing up secondary CPUs ...
> [0.528457] smp: Brought up 2 nodes, 160 CPUs
> 
> x ppc64_cpu --smt=1
> + ppc64_cpu --smt=4
> 
> without patch
> N           Min           Max        Median           Avg        Stddev
> x 100         11.82         17.06         14.01         14.05     1.2665247
> + 100         12.25         16.59         13.86       14.1143      1.164293
> 
> with patch
> N           Min           Max        Median           Avg        Stddev
> x 100         12.68         16.15         14.24        14.238    0.75489246
> + 100         12.93         15.85         14.35       14.2897    0.60041813
> 
> Cc: linuxppc-dev 
> Cc: LKML 
> Cc: Michael Ellerman 
> Cc: Nicholas Piggin 
> Cc: Anton Blanchard 
> Cc: Oliver O'Halloran 
> Cc: Nathan Lynch 
> Cc: Michael Neuling 
> Cc: Gautham R Shenoy 
> Cc: Satheesh Rajendran 
> Cc: Ingo Molnar 
> Cc: Peter Zijlstra 
> Cc: Valentin Schneider 
> 
> Srikar Dronamraju (7):
>   powerpc/topology: Update topology_core_cpumask
>   powerpc/smp: Stop updating cpu_core_mask
>   powerpc/smp: Remove get_physical_package_id
>   powerpc/smp: Optimize remove_cpu_from_masks
>   powerpc/smp: Limit cpus traversed to within a node.
>   powerpc/smp: Stop passing mask to update_mask_by_l2
>   powerpc/smp: Depend on cpu_l1_cache_map when adding cpus
> 
>  arch/powerpc/include/asm/smp.h  |  5 --
>  arch/powerpc/include/asm/topology.h |  7 +--
>  arch/powerpc/kernel/smp.c   | 79 +
>  3 files changed, 24 insertions(+), 67 deletions(-)
> 
> -- 
> 2.17.1
> 


[PATCH V2] powerpc/pseries/svm: Remove unwanted check for shared_lppaca_size

2020-06-19 Thread Satheesh Rajendran
Early secure guest boot hits the crash below when the number of vCPUs
makes the LPPACA area end exactly on a page boundary (64, 128, etc. for
a 64k PAGE size and a 1k LPPACA size), due to the BUG_ON assert that
fires when shared_lppaca_size equals shared_lppaca_total_size,

 [0.00] Partition configured for 64 cpus.
 [0.00] CPU maps initialized for 1 thread per core
 [0.00] [ cut here ]
 [0.00] kernel BUG at arch/powerpc/kernel/paca.c:89!
 [0.00] Oops: Exception in kernel mode, sig: 5 [#1]
 [0.00] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries

which is not necessary, let's remove it.

Fixes: bd104e6db6f0 ("powerpc/pseries/svm: Use shared memory for LPPACA structures")
Cc: linux-ker...@vger.kernel.org
Cc: Michael Ellerman 
Cc: Thiago Jung Bauermann 
Cc: Ram Pai 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Reviewed-by: Laurent Dufour 
Reviewed-by: Thiago Jung Bauermann 
Signed-off-by: Satheesh Rajendran 
---

V2:
Added Reviewed-by tags from Thiago and Laurent.
Added Fixes tag as per Thiago's suggestion.

V1: 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200609105731.14032-1-sathn...@linux.vnet.ibm.com/
 
---
 arch/powerpc/kernel/paca.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 2168372b792d..74da65aacbc9 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -87,7 +87,7 @@ static void *__init alloc_shared_lppaca(unsigned long size, 
unsigned long align,
 * This is very early in boot, so no harm done if the kernel crashes at
 * this point.
 */
-   BUG_ON(shared_lppaca_size >= shared_lppaca_total_size);
+   BUG_ON(shared_lppaca_size > shared_lppaca_total_size);
 
return ptr;
 }
-- 
2.26.2
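
The boundary case described in the commit message above can be checked with a little arithmetic. The following is a minimal user-space sketch, assuming 64k pages and the 1k LPPACA_SIZE from paca.c; the SKETCH_ names and helper functions are hypothetical and only mirror the comparison changed by the patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE   0x10000UL /* 64k pages, as in the report */
#define SKETCH_LPPACA_SIZE 0x400UL   /* 1k per LPPACA, as in paca.c */

/* Round x up to the next multiple of the (power-of-two) page size. */
size_t sketch_page_align(size_t x)
{
	return (x + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Bytes used once every one of nr_cpus LPPACAs has been handed out. */
size_t sketch_used_size(size_t nr_cpus)
{
	return nr_cpus * SKETCH_LPPACA_SIZE;
}

/* Old check: BUG_ON(used >= total). Fires when the area is exactly full. */
bool sketch_old_check_fires(size_t nr_cpus)
{
	size_t total = sketch_page_align(sketch_used_size(nr_cpus));

	return sketch_used_size(nr_cpus) >= total;
}

/* New check: BUG_ON(used > total). Fires only on a real overflow. */
bool sketch_new_check_fires(size_t nr_cpus)
{
	size_t total = sketch_page_align(sketch_used_size(nr_cpus));

	return sketch_used_size(nr_cpus) > total;
}
```

With 64 CPUs the area is 64 * 1k = 64k, which is exactly one page, so the last allocation makes the used size equal to the total and the old comparison trips even though nothing overflowed. With 63 CPUs the total still rounds up to 64k and neither check fires.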



[PATCH V2] powerpc/pseries/svm: Drop unused align argument in alloc_shared_lppaca() function

2020-06-12 Thread Satheesh Rajendran
Argument "align" in alloc_shared_lppaca() was unused inside the
function. Let's drop it and update code comment for page alignment.

Cc: linux-ker...@vger.kernel.org
Cc: Thiago Jung Bauermann 
Cc: Ram Pai 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Michael Ellerman 
Reviewed-by: Thiago Jung Bauermann 
Signed-off-by: Satheesh Rajendran 
---

V2:
Added Reviewed-by tag from Thiago.
Dropped align argument as per Michael's suggestion.
Modified commit msg.

V1: 
http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200609113909.17236-1-sathn...@linux.vnet.ibm.com/
---
 arch/powerpc/kernel/paca.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 8d96169c597e..a174d64d9b4d 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -57,8 +57,8 @@ static void *__init alloc_paca_data(unsigned long size, 
unsigned long align,
 
 #define LPPACA_SIZE 0x400
 
-static void *__init alloc_shared_lppaca(unsigned long size, unsigned long 
align,
-   unsigned long limit, int cpu)
+static void *__init alloc_shared_lppaca(unsigned long size, unsigned long 
limit,
+   int cpu)
 {
size_t shared_lppaca_total_size = PAGE_ALIGN(nr_cpu_ids * LPPACA_SIZE);
static unsigned long shared_lppaca_size;
@@ -68,6 +68,12 @@ static void *__init alloc_shared_lppaca(unsigned long size, 
unsigned long align,
if (!shared_lppaca) {
memblock_set_bottom_up(true);
 
+   /* See Documentation/powerpc/ultravisor.rst for more details
+*
+* UV/HV data sharing is at PAGE granularity. In order to
+* minimize the number of pages shared and maximize the
+* use of a page, let's use page alignment.
+*/
shared_lppaca =
memblock_alloc_try_nid(shared_lppaca_total_size,
   PAGE_SIZE, MEMBLOCK_LOW_LIMIT,
@@ -122,7 +128,7 @@ static struct lppaca * __init new_lppaca(int cpu, unsigned 
long limit)
return NULL;
 
if (is_secure_guest())
-   lp = alloc_shared_lppaca(LPPACA_SIZE, 0x400, limit, cpu);
+   lp = alloc_shared_lppaca(LPPACA_SIZE, limit, cpu);
else
lp = alloc_paca_data(LPPACA_SIZE, 0x400, limit, cpu);
 
-- 
2.26.2



[PATCH] powerpc/pseries/svm: Fixup align argument in alloc_shared_lppaca() function

2020-06-09 Thread Satheesh Rajendran
Argument "align" in alloc_shared_lppaca() function was unused inside the
function. Let's fix it and update code comment.

Cc: linux-ker...@vger.kernel.org
Cc: Thiago Jung Bauermann 
Cc: Ram Pai 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Signed-off-by: Satheesh Rajendran 
---
 arch/powerpc/kernel/paca.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 8d96169c597e..9088e107fb43 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -70,7 +70,7 @@ static void *__init alloc_shared_lppaca(unsigned long size, 
unsigned long align,
 
shared_lppaca =
memblock_alloc_try_nid(shared_lppaca_total_size,
-  PAGE_SIZE, MEMBLOCK_LOW_LIMIT,
+  align, MEMBLOCK_LOW_LIMIT,
   limit, NUMA_NO_NODE);
if (!shared_lppaca)
panic("cannot allocate shared data");
@@ -122,7 +122,14 @@ static struct lppaca * __init new_lppaca(int cpu, unsigned 
long limit)
return NULL;
 
if (is_secure_guest())
-   lp = alloc_shared_lppaca(LPPACA_SIZE, 0x400, limit, cpu);
+   /*
+* See Documentation/powerpc/ultravisor.rst for more details
+*
+* UV/HV data sharing is at PAGE granularity. In order to minimize
+* the number of pages shared and maximize the use of a page,
+* let's use page alignment.
+*/
+   lp = alloc_shared_lppaca(LPPACA_SIZE, PAGE_SIZE, limit, cpu);
else
lp = alloc_paca_data(LPPACA_SIZE, 0x400, limit, cpu);
 
-- 
2.26.2



[PATCH] powerpc/pseries/svm: Remove unwanted check for shared_lppaca_size

2020-06-09 Thread Satheesh Rajendran
Early secure guest boot hits the crash below when the number of vCPUs
makes the LPPACA area end exactly on a page boundary (64, 128, etc. for
a 64k PAGE size and a 1k LPPACA size), due to the BUG_ON assert that
fires when shared_lppaca_size equals shared_lppaca_total_size,

 [0.00] Partition configured for 64 cpus.
 [0.00] CPU maps initialized for 1 thread per core
 [0.00] [ cut here ]
 [0.00] kernel BUG at arch/powerpc/kernel/paca.c:89!
 [0.00] Oops: Exception in kernel mode, sig: 5 [#1]
 [0.00] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries

which is not necessary, let's remove it.

Cc: linux-ker...@vger.kernel.org
Cc: Thiago Jung Bauermann 
Cc: Ram Pai 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Signed-off-by: Satheesh Rajendran 
---
 arch/powerpc/kernel/paca.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 949eceb25..10b7c54a7 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -86,7 +86,7 @@ static void *__init alloc_shared_lppaca(unsigned long size, 
unsigned long align,
 * This is very early in boot, so no harm done if the kernel crashes at
 * this point.
 */
-   BUG_ON(shared_lppaca_size >= shared_lppaca_total_size);
+   BUG_ON(shared_lppaca_size > shared_lppaca_total_size);
 
return ptr;
 }
-- 
2.26.2



Re: [PATCH v3] selftests: powerpc: Fix CPU affinity for child process

2020-06-09 Thread Satheesh Rajendran
On Tue, Jun 09, 2020 at 01:44:23PM +0530, Harish wrote:
> On systems with large number of cpus, test fails trying to set
> affinity by calling sched_setaffinity() with smaller size for
> affinity mask. This patch fixes it by making sure that the size of
> allocated affinity mask is dependent on the number of CPUs as
> reported by get_nprocs().
> 
> Fixes: 00b7ec5c9cf3 ("selftests/powerpc: Import Anton's context_switch2 benchmark")
> Reported-by: Shirisha Ganta 
> Signed-off-by: Sandipan Das 
> Signed-off-by: Harish 
> ---

Reviewed-by: Satheesh Rajendran 

> v2: 
> https://lore.kernel.org/linuxppc-dev/20200609034005.520137-1-har...@linux.ibm.com/
> 
> Changes from v2:
> - Interchanged size and ncpus as suggested by Satheesh
> - Revert the exit code as suggested by Satheesh
> - Added NULL check for the affinity mask as suggested by Kamalesh
> - Freed the affinity mask allocation after affinity is set
>   as suggested by Kamalesh
> - Changed "cpu set" to "affinity mask" in the commit message
> 
> ---
>  .../powerpc/benchmarks/context_switch.c   | 21 ++-
>  1 file changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/testing/selftests/powerpc/benchmarks/context_switch.c 
> b/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> index a2e8c9da7fa5..d50cc05df495 100644
> --- a/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> +++ b/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -104,8 +105,9 @@ static void start_thread_on(void *(*fn)(void *), void 
> *arg, unsigned long cpu)
> 
>  static void start_process_on(void *(*fn)(void *), void *arg, unsigned long 
> cpu)
>  {
> - int pid;
> - cpu_set_t cpuset;
> + int pid, ncpus;
> + cpu_set_t *cpuset;
> + size_t size;
> 
>   pid = fork();
>   if (pid == -1) {
> @@ -116,14 +118,23 @@ static void start_process_on(void *(*fn)(void *), void 
> *arg, unsigned long cpu)
>   if (pid)
>   return;
> 
> - CPU_ZERO(&cpuset);
> - CPU_SET(cpu, &cpuset);
> + ncpus = get_nprocs();
> + size = CPU_ALLOC_SIZE(ncpus);
> + cpuset = CPU_ALLOC(ncpus);
> + if (!cpuset) {
> + perror("malloc");
> + exit(1);
> + }
> + CPU_ZERO_S(size, cpuset);
> + CPU_SET_S(cpu, size, cpuset);
> 
> - if (sched_setaffinity(0, sizeof(cpuset), &cpuset)) {
> + if (sched_setaffinity(0, size, cpuset)) {
>   perror("sched_setaffinity");
> + CPU_FREE(cpuset);
>   exit(1);
>   }
> 
> + CPU_FREE(cpuset);
>   fn(arg);
> 
>   exit(0);
> -- 
> 2.24.1
> 
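
The allocation pattern used in the quoted patch can be tried outside the selftest. Below is a minimal stand-alone sketch using the glibc dynamic CPU set macros (CPU_ALLOC, CPU_ALLOC_SIZE, CPU_ZERO_S, CPU_SET_S, CPU_FREE); the two helper names are hypothetical and are not part of context_switch.c.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for the CPU_*_S macros and sched_setaffinity() */
#endif
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/sysinfo.h>

/*
 * Hypothetical helper: allocate an affinity mask big enough for ncpus
 * CPUs with only 'cpu' set. Stores the mask size in bytes in *size.
 * Returns NULL if the allocation fails; the caller frees with CPU_FREE().
 */
cpu_set_t *alloc_single_cpu_mask(unsigned long cpu, int ncpus, size_t *size)
{
	cpu_set_t *cpuset;

	/* Compute the size only after ncpus is known (the v2 review point). */
	*size = CPU_ALLOC_SIZE(ncpus);
	cpuset = CPU_ALLOC(ncpus);
	if (!cpuset)
		return NULL;

	CPU_ZERO_S(*size, cpuset);
	CPU_SET_S(cpu, *size, cpuset);
	return cpuset;
}

/* Hypothetical helper: pin the calling process to 'cpu'. 0 on success. */
int pin_self_to_cpu(unsigned long cpu)
{
	size_t size;
	cpu_set_t *cpuset = alloc_single_cpu_mask(cpu, get_nprocs(), &size);
	int rc;

	if (!cpuset) {
		perror("CPU_ALLOC");
		return -1;
	}

	rc = sched_setaffinity(0, size, cpuset);
	if (rc)
		perror("sched_setaffinity");

	/* Free the mask on both the success and the failure path. */
	CPU_FREE(cpuset);
	return rc ? -1 : 0;
}
```

A child process would call pin_self_to_cpu(cpu) right after fork(), much as start_process_on() does. Note that get_nprocs() reports the number of online CPUs, so on a system with sparse CPU numbering a mask sized this way may not cover every valid CPU index; sizing from the configured CPU count is one possible alternative.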


Re: [PATCH v2] selftests: powerpc: Fix CPU affinity for child process

2020-06-09 Thread Satheesh Rajendran
On Tue, Jun 09, 2020 at 09:10:05AM +0530, Harish wrote:
> On systems with large number of cpus, test fails trying to set
> affinity for child process by calling sched_setaffinity() with 
> smaller size for cpuset. This patch fixes it by making sure that
> the size of allocated cpu set is dependent on the number of CPUs
> as reported by get_nprocs().
> 
> Fixes: 00b7ec5c9cf3 ("selftests/powerpc: Import Anton's context_switch2 benchmark")
> Reported-by: Shirisha Ganta 
> Signed-off-by: Harish 
> Signed-off-by: Sandipan Das 
> ---
>  .../powerpc/benchmarks/context_switch.c| 18 --
>  1 file changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/testing/selftests/powerpc/benchmarks/context_switch.c 
> b/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> index a2e8c9da7fa5..de6c49d6f88f 100644
> --- a/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> +++ b/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -104,8 +105,9 @@ static void start_thread_on(void *(*fn)(void *), void 
> *arg, unsigned long cpu)
> 
>  static void start_process_on(void *(*fn)(void *), void *arg, unsigned long 
> cpu)
>  {
> - int pid;
> - cpu_set_t cpuset;
> + int pid, ncpus;
> + cpu_set_t *cpuset;
> + size_t size;
> 
>   pid = fork();
>   if (pid == -1) {
> @@ -116,12 +118,16 @@ static void start_process_on(void *(*fn)(void *), void 
> *arg, unsigned long cpu)
>   if (pid)
>   return;
> 
> - CPU_ZERO(&cpuset);
> - CPU_SET(cpu, &cpuset);
> + size = CPU_ALLOC_SIZE(ncpus);
> + ncpus = get_nprocs();
The above two lines should be interchanged: ncpus is used to compute size
before it has been assigned.

> + cpuset = CPU_ALLOC(ncpus);
> + CPU_ZERO_S(size, cpuset);
> + CPU_SET_S(cpu, size, cpuset);
> 
> - if (sched_setaffinity(0, sizeof(cpuset), &cpuset)) {
> + if (sched_setaffinity(0, size, cpuset)) {
>   perror("sched_setaffinity");
> - exit(1);
> + CPU_FREE(cpuset);
> + exit(-1);
Do we need to change the exit code here?
Other frameworks might rely on the previous value.

Regards,
-Satheesh.
>   }
> 
>   fn(arg);
> -- 

> 2.24.1
> 


Re: [mainline][Oops][bisected 2ba3e6 ] 5.7.0 boot fails with kernel panic on powerpc

2020-06-03 Thread Satheesh Rajendran
On Wed, Jun 03, 2020 at 03:32:57PM +0200, Joerg Roedel wrote:
> On Wed, Jun 03, 2020 at 04:20:57PM +0530, Abdul Haleem wrote:
> > @Joerg, Could you please have a look?
> 
> Can you please try the attached patch?

Hi Joerg,

I hit a similar boot failure on a Power9 baremetal box (call trace in the
Note below); your patch below solved it in my environment and I am able to
boot the system fine.

...
Fedora 31 (Thirty One)
Kernel 5.7.0-gd6f9469a0-dirty on an ppc64le (hvc0)

 login:


Tested-by: Satheesh Rajendran 

Note: for the record, here is the boot failure call trace.

[0.023555] mempolicy: Enabling automatic NUMA balancing. Configure with 
numa_balancing= or the kernel.numa_balancing sysctl
[0.023582] pid_max: default: 163840 minimum: 1280
[0.035014] BUG: Unable to handle kernel data access on read at 
0xc060
[0.035058] Faulting instruction address: 0xc0382304
[0.035074] Oops: Kernel access of bad area, sig: 11 [#1]
[0.035097] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
[0.035113] Modules linked in:
[0.035136] CPU: 24 PID: 0 Comm: swapper/24 Not tainted 5.7.0-gd6f9469a0 #1
[0.035161] NIP:  c0382304 LR: c038407c CTR: 
[0.035197] REGS: c167f930 TRAP: 0300   Not tainted  
(5.7.0-gd6f9469a0)
[0.035241] MSR:  92009033   CR: 
42022422  XER: 
[0.035294] CFAR: c03822fc DAR: c060 DSISR: 4000 
IRQMASK: 0 
[0.035294] GPR00: c038407c c167fbc0 c168090[  
150.252645597,5] OPAL: Reboot request...
[  150.252928266,5] RESET: Initiating fast reboot 1...
0 c008 
[0.035294] GPR04:  01ff c008001f 
0060 
[0.035294] GPR08: 6000 0005 c060 
c0080020 
[0.035294] GPR12: 22022422 c187 c000 
c008 
[0.035294] GPR16: c00807ff c0080020  
c060 
[0.035294] GPR20: c0080800 c0080800 c00807ff 
c00807ff 
[0.035294] GPR24: c163f7c8 c172d0c0 0001 
0001 
[0.035294] GPR28: c1708000 c172d0c8  
c0080800 
[0.035622] NIP [c0382304] map_kernel_range_noflush+0x274/0x510
[0.035657] LR [c038407c] __vmalloc_node_range+0x2ec/0x3a0
[0.035690] Call Trace:
[0.035709] [c167fbc0] [c038d848] 
__alloc_pages_nodemask+0x158/0x3f0 (unreliable)
[0.035750] [c167fc90] [c038407c] 
__vmalloc_node_range+0x2ec/0x3a0
[0.035787] [c167fd40] [c0384268] __vmalloc+0x58/0x70
[0.035823] [c167fdb0] [c1056db8] 
alloc_large_system_hash+0x204/0x304
[0.035870] [c167fe60] [c105c1f0] vfs_caches_init+0xd8/0x138
[0.035916] [c167fee0] [c10242a0] start_kernel+0x644/0x6ec
[0.035960] [c167ff90] [c000ca9c] 
start_here_common+0x1c/0x400
[0.036004] Instruction dump:
[0.036016] 3af4 6000 6000 38c90010 7f663036 7d667a14 7cc600d0 
7d713038 
[0.036038] 38d1 7c373040 41810008 7e91a378  2c25 418201b4 
7f464830 
[0.036083] ---[ end trace c7e72029dfacc217 ]---
[0.036114] 
[1.036223] Kernel panic - not syncing: Attempted to kill the idle task!
[1.036858] Rebooting in 10 seconds..


Regards,
-Satheesh.

> 
> diff --git a/include/asm-generic/5level-fixup.h 
> b/include/asm-generic/5level-fixup.h
> index 58046ddc08d0..afbab31fbd7e 100644
> --- a/include/asm-generic/5level-fixup.h
> +++ b/include/asm-generic/5level-fixup.h
> @@ -17,6 +17,11 @@
>   ((unlikely(pgd_none(*(p4d))) && __pud_alloc(mm, p4d, address)) ? \
>   NULL : pud_offset(p4d, address))
> 
> +#define pud_alloc_track(mm, p4d, address, mask)  
> \
> + ((unlikely(pgd_none(*(p4d))) && 
> \
> +   (__pud_alloc(mm, p4d, address) || 
> ({*(mask)|=PGTBL_P4D_MODIFIED;0;})))?   \
> +   NULL : pud_offset(p4d, address))
> +
>  #define p4d_alloc(mm, pgd, address)  (pgd)
>  #define p4d_alloc_track(mm, pgd, address, mask)  (pgd)
>  #define p4d_offset(pgd, start)   (pgd)
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 7e07f4f490cb..d46bf03b804f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2088,35 +2088,35 @@ static inline pud_t *pud_alloc(struct mm_struct *mm, 
> p4d_t *p4d,
>   NULL : pud_offset(p4d, address);
>  }
> 
> -static inline p4d_t *p4d_alloc_track(struct mm_struct *mm, pgd_t *pgd,
> +static inline pud_t *pud_alloc_track(struct mm_struct *mm, p4d_t *p4d,
>   

Re: [PATCH 2/3] powerpc/numa: Early request for home node associativity

2019-08-23 Thread Satheesh Rajendran
On Thu, Aug 22, 2019 at 08:12:34PM +0530, Srikar Dronamraju wrote:
> Currently the kernel detects if its running on a shared lpar platform
> and requests home node associativity before the scheduler sched_domains
> are setup. However between the time NUMA setup is initialized and the
> request for home node associativity, workqueue initializes its per node
> cpumask. The per node workqueue possible cpumask may turn invalid
> after home node associativity resulting in weird situations like
> workqueue possible cpumask being a subset of workqueue online cpumask.

Tested this series on a Power KVM guest, expecting it to fix
https://github.com/linuxppc/issues/issues/167, but I still see the warning
below while doing vcpu hotplug with NUMA nodes. Please advise if I am missing
anything, or if this series is not intended to fix the above issue.

Env:
HW: Power8
Host/Guest Kernel: 5.3.0-rc5-00172-g13e3f1076e29 (linux master + this series)
Qemu: 4.0.90 (v4.1.0-rc3)

Guest Config:
..
 4
...
/home/kvmci/linux/vmlinux
root=/dev/sda2 rw console=tty0 console=ttyS0,115200 
init=/sbin/init  initcall_debug numa=debug crashkernel=1024M selinux=0
...
  

  
  


Event: 
vcpu hotplug

[root@atest-guest ~]# [   41.447170] random: crng init done
[   41.448153] random: 7 urandom warning(s) missed due to ratelimiting
[   51.727256] VPHN hcall succeeded. Reset polling...
[   51.826301] adding cpu 2 to node 1
[   51.856238] WARNING: workqueue cpumask: online intersect > possible intersect
[   51.916297] VPHN hcall succeeded. Reset polling...
[   52.036272] adding cpu 3 to node 1


Regards,
-Satheesh.
> 
> This can be fixed by requesting home node associativity earlier just
> before NUMA setup. However at the NUMA setup time, kernel may not be in
> a position to detect if its running on a shared lpar platform. So
> request for home node associativity and if the request fails, fallback
> on the device tree property.
> 
> However home node associativity requires cpu's hwid which is set in
> smp_setup_pacas. Hence call smp_setup_pacas before numa_setup_cpus.
> 
> Signed-off-by: Srikar Dronamraju 
> Cc: Michael Ellerman 
> Cc: Nicholas Piggin 
> Cc: Nathan Lynch 
> Cc: linuxppc-dev@lists.ozlabs.org
> Reported-by: Satheesh Rajendran 
> Reported-by: Abdul Haleem 
> ---
>  arch/powerpc/kernel/setup-common.c |  5 +++--
>  arch/powerpc/mm/numa.c | 28 +++-
>  2 files changed, 30 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/setup-common.c 
> b/arch/powerpc/kernel/setup-common.c
> index 1f8db66..9135dba 100644
> --- a/arch/powerpc/kernel/setup-common.c
> +++ b/arch/powerpc/kernel/setup-common.c
> @@ -888,6 +888,9 @@ void __init setup_arch(char **cmdline_p)
>   /* Check the SMT related command line arguments (ppc64). */
>   check_smt_enabled();
> 
> +#ifdef CONFIG_SMP
> + smp_setup_pacas();
> +#endif
>   /* Parse memory topology */
>   mem_topology_setup();
> 
> @@ -899,8 +902,6 @@ void __init setup_arch(char **cmdline_p)
>* so smp_release_cpus() does nothing for them.
>*/
>  #ifdef CONFIG_SMP
> - smp_setup_pacas();
> -
>   /* On BookE, setup per-core TLB data structures. */
>   setup_tlb_core_data();
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 88b5157..7965d3b 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -461,6 +461,21 @@ static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
>   return nid;
>  }
> 
> +static int vphn_get_nid(unsigned long cpu)
> +{
> + __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
> + long rc;
> +
> + /* Use associativity from first thread for all siblings */
> + rc = hcall_vphn(get_hard_smp_processor_id(cpu),
> + VPHN_FLAG_VCPU, associativity);
> +
> + if (rc == H_SUCCESS)
> + return  associativity_to_nid(associativity);
> +
> + return NUMA_NO_NODE;
> +}
> +
>  /*
>   * Figure out to which domain a cpu belongs and stick it there.
>   * Return the id of the domain used.
> @@ -490,7 +505,18 @@ static int numa_setup_cpu(unsigned long lcpu)
>   goto out;
>   }
> 
> - nid = of_node_to_nid_single(cpu);
> + /*
> +  * On a shared lpar, the device tree might not have the correct node
> +  * associativity.  At this time lppaca, or its __old_status field
> +  * may not be updated. Hence request an explicit associativity
> +  * irrespective of whether the lpar is shared or dedicated.  Use the
> +  * device tree property as a fallback.
> +  */
> + if (firmware_has_feature(FW_FEATURE_VPHN))
> + nid = vphn_get_nid(lcpu);
> +
> + if (nid == NUMA_NO_NODE)
> + nid = of_node_to_nid_single(cpu);
> 
>  out_present:
>   if (nid < 0 || !node_possible(nid))
> -- 
> 1.8.3.1
> 



Re: [PATCH] powerpc: mm: Limit rma_size to 1TB when running without HV mode

2019-07-10 Thread Satheesh Rajendran
On Wed, Jul 10, 2019 at 03:20:18PM +1000, Suraj Jitindar Singh wrote:
> The virtual real mode addressing (VRMA) mechanism is used when a
> partition is using HPT (Hash Page Table) translation and performs
> real mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this
> mode effective address bits 0:23 are treated as zero (i.e. the access
> is aliased to 0) and the access is performed using an implicit 1TB SLB
> entry.
> 
> The size of the RMA (Real Memory Area) is communicated to the guest as
> the size of the first memory region in the device tree. And because of
> the mechanism described above can be expected to not exceed 1TB. In the
> event that the host erroneously represents the RMA as being larger than
> 1TB, guest accesses in real mode to memory addresses above 1TB will be
> aliased down to below 1TB. This means that a memory access performed in
> real mode may differ to one performed in virtual mode for the same memory
> address, which would likely have unintended consequences.
> 
> To avoid this outcome have the guest explicitly limit the size of the
> RMA to the current maximum, which is 1TB. This means that even if the
> first memory block is larger than 1TB, only the first 1TB should be
> accessed in real mode.
> 
> Signed-off-by: Suraj Jitindar Singh 
> ---
>  arch/powerpc/mm/book3s64/hash_utils.c | 8 
>  1 file changed, 8 insertions(+)

Hi,

Tested this patch; a Power8 compat guest now boots fine with mem >1024G on a
Power9 host.

Tested-by: Satheesh Rajendran 

Host: P9; kernel: 5.2.0-00915-g5ad18b2e60b7

Before this patch:
The guest crashes:
[0.00] BUG: Kernel NULL pointer dereference at 0x0028
[0.00] Faulting instruction address: 0xc102caa0
[0.00] Oops: Kernel access of bad area, sig: 11 [#1]
[0.00] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[0.00] Modules linked in:
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 5.2.0-03135-ge9a83bd23220 #24
[0.00] NIP:  c102caa0 LR: c102ca84 CTR: 
[0.00] REGS: c1603ba0 TRAP: 0380   Not tainted  
(5.2.0-03135-ge9a83bd23220)
[0.00] MSR:  80001033   CR: 24000428  XER: 
2000
[0.00] CFAR: c102c1d8 IRQMASK: 1 
[0.00] GPR00: c102ca84 c1603e30 c1605300 
0100 
[0.00] GPR04:   c0ff8000 
c1863dc8 
[0.00] GPR08: 2028  c0ff8000 
0009 
[0.00] GPR12:  c18f 7dc5fef0 
012e1220 
[0.00] GPR16: 012e10a0 fffd 7dc5fef0 
0130fcc0 
[0.00] GPR20: 0014 01a8 2fff 
fffd 
[0.00] GPR24: 01dc c000 c1641ed8 
c1641b78 
[0.00] GPR28:   0100 
 
[0.00] NIP [c102caa0] emergency_stack_init+0xb8/0x118
[0.00] LR [c102ca84] emergency_stack_init+0x9c/0x118
[0.00] Call Trace:
[0.00] [c1603e30] [c102ca84] 
emergency_stack_init+0x9c/0x118 (unreliable)
[0.00] [c1603e80] [c102bd54] setup_arch+0x2fc/0x388
[0.00] [c1603ef0] [c1023ccc] start_kernel+0xa4/0x660
[0.00] [c1603f90] [c000b774] start_here_common+0x1c/0x528
[0.00] Instruction dump:
[0.00] 7ffc07b4 7fc3f378 7bfd1f24 7f84e378 4bfff6e9 3f620004 3b7bc878 
7f84e378 
[0.00] 39434000 7fc3f378 e93b 7d29e82a  4bfff6c5 e93b 
7f84e378 
[0.00] random: get_random_bytes called from print_oops_end_marker+0x6c/0xa0 
with crng_init=0
[0.00] ---[ end trace  ]---
[0.00] 
[0.00] Kernel panic - not syncing: Attempted to kill the idle task!

-
With this patch:
# virsh start --console p8
Domain p8 started
Connected to domain p8
..
..
Fedora 27 (Twenty Seven)
Kernel 5.2.0-03136-gf709b0494ad9 on an ppc64le (hvc0)

atest-guest login: 
# free -g
              total        used        free      shared  buff/cache   available
Mem:           1028           0        1027           0           0        1025
Swap:             0           0

Regards,
-Satheesh.

> 
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c 
> b/arch/powerpc/mm/book3s64/hash_utils.c
> index 28ced26f2a00..4d0e2cce9cd5 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -1901,11 +1901,19 @@ void hash__setup_initial_memory_limit(phys_addr_t 
> first_memblock_base,
>*
>* For guests on platforms before POWER9, we clamp the it limit to 1G
>* to avoid some funky things such as RTAS bugs etc...
> +  * On POWER9 we limit to 1TB in case the host erroneously told us that
> +  * the RMA was >1TB. 

Re: [PATCH] recordmcount: Fix spurious mcount entries on powerpc

2019-06-27 Thread Satheesh Rajendran
On Thu, Jun 27, 2019 at 12:08:01AM +0530, Naveen N. Rao wrote:
> The recent change enabling HAVE_C_RECORDMCOUNT on powerpc started
> showing the following issue:
> 
>   # modprobe kprobe_example
>ftrace-powerpc: Not expected bl: opcode is 3c4c0001
>WARNING: CPU: 0 PID: 227 at kernel/trace/ftrace.c:2001 
> ftrace_bug+0x90/0x318
>Modules linked in:
>CPU: 0 PID: 227 Comm: modprobe Not tainted 5.2.0-rc6-00678-g1c329100b942 #2
>NIP:  c0264318 LR: c025d694 CTR: c0f5cd30
>REGS: c1f2b7b0 TRAP: 0700   Not tainted  
> (5.2.0-rc6-00678-g1c329100b942)
>MSR:  90010282b033   CR: 
> 28228222  XER: 
>CFAR: c02642fc IRQMASK: 0
>
>NIP [c0264318] ftrace_bug+0x90/0x318
>LR [c025d694] ftrace_process_locs+0x4f4/0x5e0
>Call Trace:
>[c1f2ba40] [0004] 0x4 (unreliable)
>[c1f2bad0] [c025d694] ftrace_process_locs+0x4f4/0x5e0
>[c1f2bb90] [c020ff10] load_module+0x25b0/0x30c0
>[c1f2bd00] [c0210cb0] sys_finit_module+0xc0/0x130
>[c1f2be20] [c000bda4] system_call+0x5c/0x70
>Instruction dump:
>419e0018 2f83 419e00bc 2f83ffea 409e00cc 481c 0fe0 3c62ff96
>3901 3940 386386d0 48c4 <0fe0> 3ce20003 3901 3c62ff96
>---[ end trace 4c438d5cebf78381 ]---
>ftrace failed to modify
>[] 0xc008012a0008
> actual:   01:00:4c:3c
>Initializing ftrace call sites
>ftrace record flags: 200
> (0)
> expected tramp: c006af4c
> 
> Looking at the relocation records in __mcount_loc showed a few spurious
> entries:
>   RELOCATION RECORDS FOR [__mcount_loc]:
>   OFFSET   TYPE  VALUE
>    R_PPC64_ADDR64.text.unlikely+0x0008
>   0008 R_PPC64_ADDR64.text.unlikely+0x0014
>   0010 R_PPC64_ADDR64.text.unlikely+0x0060
>   0018 R_PPC64_ADDR64.text.unlikely+0x00b4
>   0020 R_PPC64_ADDR64.init.text+0x0008
>   0028 R_PPC64_ADDR64.init.text+0x0014
> 
> The first entry in each section is incorrect. Looking at the relocation
> records, the spurious entries correspond to the R_PPC64_ENTRY records:
>   RELOCATION RECORDS FOR [.text.unlikely]:
>   OFFSET   TYPE  VALUE
>    R_PPC64_REL64 .TOC.-0x0008
>   0008 R_PPC64_ENTRY *ABS*
>   0014 R_PPC64_REL24 _mcount
>   
> 
> The problem is that we are not validating the return value from
> get_mcountsym() in sift_rel_mcount(). With this entry, mcountsym is 0,
> but Elf_r_sym(relp) also ends up being 0. Fix this by ensuring mcountsym
> is valid before processing the entry.
> 
> Fixes: c7d64b560ce80 ("powerpc/ftrace: Enable C Version of recordmcount")
> Signed-off-by: Naveen N. Rao 
> ---
>  scripts/recordmcount.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
Hi,

The latest commit on the linuxppc merge branch, 850f6274c5b5, was failing to
boot on an IBM Power8 box; this patch resolves the failure.
https://github.com/linuxppc/issues/issues/254

# git log -2 --oneline
0e0f55b31ea8 (HEAD -> issue_254) recordmcount: Fix spurious mcount entries on powerpc
850f6274c5b5 (origin/merge, merge) Automatic merge of branches 'master', 'next' and 'fixes' into merge
# uname -r
5.2.0-rc6-00123-g0e0f55b31ea8
 
Tested-by: Satheesh Rajendran 

Regards,
-Satheesh
> 
> diff --git a/scripts/recordmcount.h b/scripts/recordmcount.h
> index 13c5e6c8829c..47fca2c69a73 100644
> --- a/scripts/recordmcount.h
> +++ b/scripts/recordmcount.h
> @@ -325,7 +325,8 @@ static uint_t *sift_rel_mcount(uint_t *mlocp,
>   if (!mcountsym)
>   mcountsym = get_mcountsym(sym0, relp, str0);
> 
> - if (mcountsym == Elf_r_sym(relp) && !is_fake_mcount(relp)) {
> + if (mcountsym && mcountsym == Elf_r_sym(relp) &&
> + !is_fake_mcount(relp)) {
>   uint_t const addend =
>   _w(_w(relp->r_offset) - recval + mcount_adjust);
>   mrelp->r_offset = _w(offbase
> -- 
> 2.22.0
> 



Re: [PATCH v3 3/3] powerpc: Add support to initialize ima policy rules

2019-06-10 Thread Satheesh Rajendran
On Mon, Jun 10, 2019 at 04:33:57PM -0400, Nayna Jain wrote:
> PowerNV secure boot relies on the kernel IMA security subsystem to
> perform the OS kernel image signature verification. Since each secure
> boot mode has different IMA policy requirements, dynamic definition of
> the policy rules based on the runtime secure boot mode of the system is
> required. On systems that support secure boot, but have it disabled,
> only measurement policy rules of the kernel image and modules are
> defined.
> 
> This patch defines the arch-specific implementation to retrieve the
> secure boot mode of the system and accordingly configures the IMA policy
> rules.
> 
> This patch provides arch-specific IMA policies if PPC_SECURE_BOOT
> config is enabled.
> 
> Signed-off-by: Nayna Jain 
> ---
>  arch/powerpc/Kconfig   | 14 +
>  arch/powerpc/kernel/Makefile   |  1 +
>  arch/powerpc/kernel/ima_arch.c | 54 ++
>  include/linux/ima.h|  3 +-
>  4 files changed, 71 insertions(+), 1 deletion(-)
>  create mode 100644 arch/powerpc/kernel/ima_arch.c

Hi,

This series fails to build against the linuxppc/merge tree with
`ppc64le_defconfig`:

arch/powerpc/platforms/powernv/secboot.c:14:6: error: redefinition of 'get_powerpc_sb_mode'
   14 | bool get_powerpc_sb_mode(void)
      |      ^~~
In file included from arch/powerpc/platforms/powernv/secboot.c:11:
./arch/powerpc/include/asm/secboot.h:15:20: note: previous definition of 'get_powerpc_sb_mode' was here
   15 | static inline bool get_powerpc_sb_mode(void)
      |                    ^~~
make[3]: *** [scripts/Makefile.build:278: arch/powerpc/platforms/powernv/secboot.o] Error 1
make[3]: *** Waiting for unfinished jobs
make[2]: *** [scripts/Makefile.build:489: arch/powerpc/platforms/powernv] Error 2
make[1]: *** [scripts/Makefile.build:489: arch/powerpc/platforms] Error 2
make: *** [Makefile:1071: arch/powerpc] Error 2
make: *** Waiting for unfinished jobs

Regards,
-Satheesh

> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 8c1c636308c8..9de77bb14f54 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -902,6 +902,20 @@ config PPC_MEM_KEYS
> 
> If unsure, say y.
> 
> +config PPC_SECURE_BOOT
> + prompt "Enable PowerPC Secure Boot"
> + bool
> + default n
> + depends on PPC64
> + depends on OPAL_SECVAR
> + depends on IMA
> + depends on IMA_ARCH_POLICY
> + help
> +   Linux on POWER with firmware secure boot enabled needs to define
> +   security policies to extend secure boot to the OS.This config
> +   allows user to enable OS Secure Boot on PowerPC systems that
> +   have firmware secure boot support.
> +
>  endmenu
> 
>  config ISA_DMA_API
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index 0ea6c4aa3a20..75c929b41341 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -131,6 +131,7 @@ ifdef CONFIG_IMA
>  obj-y+= ima_kexec.o
>  endif
>  endif
> +obj-$(CONFIG_PPC_SECURE_BOOT)+= ima_arch.o
> 
>  obj-$(CONFIG_AUDIT)  += audit.o
>  obj64-$(CONFIG_AUDIT)+= compat_audit.o
> diff --git a/arch/powerpc/kernel/ima_arch.c b/arch/powerpc/kernel/ima_arch.c
> new file mode 100644
> index ..1767bf6e6550
> --- /dev/null
> +++ b/arch/powerpc/kernel/ima_arch.c
> @@ -0,0 +1,54 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 IBM Corporation
> + * Author: Nayna Jain 
> + *
> + * ima_arch.c
> + *  - initialize ima policies for PowerPC Secure Boot
> + */
> +
> +#include 
> +#include 
> +
> +bool arch_ima_get_secureboot(void)
> +{
> + bool sb_mode;
> +
> + sb_mode = get_powerpc_sb_mode();
> + if (sb_mode)
> + return true;
> + else
> + return false;
> +}
> +
> +/*
> + * File signature verification is not needed, include only measurements
> + */
> +static const char *const default_arch_rules[] = {
> + "measure func=KEXEC_KERNEL_CHECK template=ima-modsig",
> + "measure func=MODULE_CHECK template=ima-modsig",
> + NULL
> +};
> +
> +/* Both file signature verification and measurements are needed */
> +static const char *const sb_arch_rules[] = {
> + "measure func=KEXEC_KERNEL_CHECK template=ima-modsig",
> + "measure func=MODULE_CHECK template=ima-modsig",
> + "appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig|modsig 
> template=ima-modsig",
> +#if !IS_ENABLED(CONFIG_MODULE_SIG)
> + "appraise func=MODULE_CHECK appraise_type=imasig|modsig 
> template=ima-modsig",
> +#endif
> + NULL
> +};
> +
> +/*
> + * On PowerPC, file measurements are to be added to the IMA measurement list
> + * irrespective of the secure boot state of the system. Signature 
> verification
> + * is conditionally enabled based on the secure boot state.
> + */
> +const char *const *arch_get_ima_policy(void)
> +{
> + 

Re: [PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-10 Thread Satheesh Rajendran
On Mon, Jun 10, 2019 at 03:49:48PM +1000, Nicholas Piggin wrote:
> Nicholas Piggin's on June 10, 2019 2:38 pm:
> > +static int vmap_hpages_range(unsigned long start, unsigned long end,
> > +  pgprot_t prot, struct page **pages,
> > +  unsigned int page_shift)
> > +{
> > +   BUG_ON(page_shift != PAGE_SIZE);
> > +   return vmap_pages_range(start, end, prot, pages);
> > +}
> 
> That's a false positive BUG_ON for !HUGE_VMAP configs. I'll fix that
> and repost after a round of feedback.

Sure. Here is the crash log from that false-positive BUG_ON on a Power8 host:

[0.001718] pid_max: default: 163840 minimum: 1280
[0.010437] [ cut here ]
[0.010461] kernel BUG at mm/vmalloc.c:473!
[0.010471] Oops: Exception in kernel mode, sig: 5 [#1]
[0.010481] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
[0.010491] Modules linked in:
[0.010503] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0-rc3-ga7ee9421d #1
[0.010515] NIP:  c034dbd8 LR: c034dc80 CTR: 
[0.010527] REGS: c15bf9a0 TRAP: 0700   Not tainted  
(5.2.0-rc3-ga7ee9421d)
[0.010537] MSR:  92029033   CR: 
22022422  XER: 2000
[0.010559] CFAR: c034dc88 IRQMASK: 0
[0.010559] GPR00: c034dc80 c15bfc30 c15c2f00 
c00c01fd0e00
[0.010559] GPR04:  2322  
0040
[0.010559] GPR08: c00ff908 0400 0400 
0100
[0.010559] GPR12: 42022422 c17a 0001035ae7d8 
0400
[0.010559] GPR16: 0400 818e c0ee08c8 

[0.010559] GPR20: 0001 2b22 0b20 
0022
[0.010559] GPR24: c007f92c7880 0b22 0001 
c00a
[0.010559] GPR28: c008 0400  
0b20
[0.010664] NIP [c034dbd8] __vmalloc_node_range+0x1f8/0x410
[0.010677] LR [c034dc80] __vmalloc_node_range+0x2a0/0x410
[0.010686] Call Trace:
[0.010695] [c15bfc30] [c034dc80] 
__vmalloc_node_range+0x2a0/0x410 (unreliable)
[0.010711] [c15bfd30] [c034de40] __vmalloc+0x50/0x60
[0.010724] [c15bfda0] [c101e54c] 
alloc_large_system_hash+0x200/0x304
[0.010738] [c15bfe60] [c10235bc] vfs_caches_init+0xd8/0x138
[0.010752] [c15bfee0] [c0fe428c] start_kernel+0x5c4/0x668
[0.010767] [c15bff90] [c000b774] 
start_here_common+0x1c/0x528
[0.010777] Instruction dump:
[0.010785] 6000 7c691b79 418200dc e9180020 79ea1f24 7d28512a 40920170 
8138002c
[0.010803] 394f0001 794f0020 7c095040 4181ffbc <0fe0> 6000 3f41 
4bfffedc
[0.010826] ---[ end trace dd0217488686d653 ]---
[0.010834]
[1.010946] Kernel panic - not syncing: Attempted to kill the idle task!
[1.011061] Rebooting in 10 seconds..

Regards,
-Satheesh.
> 
> Thanks,
> Nick
> 
> 



Re: [RFC PATCH V2 3/3] mm/nvdimm: Use correct #defines instead of opencoding

2019-05-22 Thread Satheesh Rajendran
On Wed, May 22, 2019 at 01:57:01PM +0530, Aneesh Kumar K.V wrote:
> The npfns-related change is needed to fix the kernel message
> 
> "number of pfns truncated from 2617344 to 163584"
> 
> The change makes sure the npfns value stored in the superblock is correct.
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  drivers/nvdimm/label.c   | 2 +-
>  drivers/nvdimm/pfn_devs.c| 6 +++---
>  drivers/nvdimm/region_devs.c | 8 
>  3 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/nvdimm/label.c b/drivers/nvdimm/label.c
> index f3d753d3169c..bc6de8fb0153 100644
> --- a/drivers/nvdimm/label.c
> +++ b/drivers/nvdimm/label.c
> @@ -361,7 +361,7 @@ static bool slot_valid(struct nvdimm_drvdata *ndd,
> 
>   /* check that DPA allocations are page aligned */
>   if ((__le64_to_cpu(nd_label->dpa)
> - | __le64_to_cpu(nd_label->rawsize)) % SZ_4K)
> + | __le64_to_cpu(nd_label->rawsize)) % PAGE_SIZE)
>   return false;
> 
>   /* check checksum */
> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> index 39fa8cf8ef58..9fc2e514e28a 100644
> --- a/drivers/nvdimm/pfn_devs.c
> +++ b/drivers/nvdimm/pfn_devs.c
> @@ -769,8 +769,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
>* when populating the vmemmap. This *should* be equal to
>* PMD_SIZE for most architectures.
>*/
> - offset = ALIGN(start + reserve + 64 * npfns,
> - max(nd_pfn->align, PMD_SIZE)) - start;
> + offset = ALIGN(start + reserve + sizeof(struct page) * npfns,
> +max(nd_pfn->align, PMD_SIZE)) - start;
>   } else if (nd_pfn->mode == PFN_MODE_RAM)
>   offset = ALIGN(start + reserve, nd_pfn->align) - start;
>   else
> @@ -782,7 +782,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
>   return -ENXIO;
>   }
> 
> - npfns = (size - offset - start_pad - end_trunc) / SZ_4K;
> + npfns = (size - offset - start_pad - end_trunc) / PAGE_SIZE;
>   pfn_sb->mode = cpu_to_le32(nd_pfn->mode);
>   pfn_sb->dataoff = cpu_to_le64(offset);
>   pfn_sb->npfns = cpu_to_le64(npfns);
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index b4ef7d9ff22e..2d8facea5a03 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -994,10 +994,10 @@ static struct nd_region *nd_region_create(struct 
> nvdimm_bus *nvdimm_bus,
>   struct nd_mapping_desc *mapping = _desc->mapping[i];
>   struct nvdimm *nvdimm = mapping->nvdimm;
> 
> - if ((mapping->start | mapping->size) % SZ_4K) {
> - dev_err(_bus->dev, "%s: %s mapping%d is not 4K 
> aligned\n",
> - caller, dev_name(>dev), i);
> -
> + if ((mapping->start | mapping->size) % PAGE_SIZE) {
> + dev_err(_bus->dev,
> + "%s: %s mapping%d is not 4K aligned\n",

s/not 4K aligned/not PAGE_SIZE aligned/ ?

I think the error message needs to be changed as well.

Regards,
-Satheesh.
> + caller, dev_name(>dev), i);
>   return NULL;
>   }
> 
> -- 
> 2.21.0
> 



Re: [PATCH v5 00/16] KVM: PPC: Book3S HV: add XIVE native exploitation mode

2019-04-29 Thread Satheesh Rajendran
On Wed, Apr 10, 2019 at 07:04:32PM +0200, Cédric Le Goater wrote:
> Hello,
> 
> GitHub trees available here :
> 
> QEMU sPAPR:
> 
>   https://github.com/legoater/qemu/commits/xive-next
>   
> Linux/KVM:
> 
>   https://github.com/legoater/linux/commits/xive-5.1

Hi,

A XIVE guest (both ic-mode=dual and ic-mode=xive) fails to boot with more than
64G of guest memory; up to 64G it boots fine.

Note: an XICS guest (ic-mode=xics) with the same configuration boots fine.

Tested with the current latest code (v6), detailed below.

HW: Power9 DD 2.2

Qemu:
# git log -1
commit 34cc68411a5ada92df6ef968c32bad424911474c (HEAD -> xive-next, origin/xive-next)
Author: Cédric Le Goater 
Date:   Thu Apr 18 18:31:37 2019 +0200

    spapr/irq: add KVM support to the 'dual' machine

Kernel Guest/Host: (Host kernel built with `ppc64le_defconfig`, Guest kernel
built with `ppc64le_guest_defconfig`)
# git log -1
commit fac6994841aa8cfa5af02552f2eb9858fee9a25d (HEAD -> xive-5.1, origin/xive-5.1, origin/HEAD)
Author: Cédric Le Goater 
Date:   Thu Apr 18 08:46:33 2019 +0200

    KVM: PPC: Book3S HV: XIVE: replace the 'destroy' method by a 'release' method


Qemu Commandline:
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
QEMU_AUDIO_DRV=none /home/sath/qemu/ppc64-softmmu/qemu-system-ppc64 \
 -name guest=vm2,debug-threads=on -S \
 -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-13-vm2/master-key.aes \
 -machine pseries-4.0,accel=kvm,usb=off,dump-guest-core=off \
 -m 66560 -realtime mlock=off \
 -smp 56,sockets=1,cores=28,threads=2 \
 -uuid 5510791f-f156-4f5a-8c3d-30cfa7a4c7a2 \
 -display none -no-user-config -nodefaults \
 -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-13-vm2/monitor.sock,server,nowait \
 -mon chardev=charmonitor,id=monitor,mode=control \
 -rtc base=utc -no-shutdown -boot strict=on \
 -kernel /home/sath/linux/vmlinux \
 -append 'root=/dev/sda2 rw console=tty0 console=ttyS0,115200 init=/sbin/init initcall_debug selinux=0 secure=on' \
 -device qemu-xhci,id=usb,bus=pci.0,addr=0x3 \
 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2 \
 -drive file=/home/sath/tests/data/avocado-vt/images/jeos-27-ppc64le_vm2.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 \
 -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 \
 -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=27 \
 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:57:58:59,bus=pci.0,addr=0x1 \
 -chardev pty,id=charserial0 \
 -device spapr-vty,chardev=charserial0,id=serial0,reg=0x3000 \
 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 \
 -M pseries,ic-mode=dual \
 -msg timestamp=on


Guest Console:

Escape character is ^]
Populating /vdevice methods
Populating /vdevice/vty@3000
Populating /vdevice/nvram@7100
Populating /pci@8002000
 00 0800 (D) : 1af4 1000virtio [ net ]
 00 1000 (D) : 1af4 1004virtio [ scsi ]
Populating /pci@8002000/scsi@2
   SCSI: Looking for devices
  100 DISK : "QEMU QEMU HARDDISK2.5+"
 00 1800 (D) : 1b36 000dserial bus [ usb-xhci ]
 00 2000 (D) : 1af4 1002unknown-legacy-device*
No NVRAM common partition, re-initializing...
Scanning USB 
  XHCI: Initializing
Using default console: /vdevice/vty@3000
Detected RAM kernel at 40 (17fe068 bytes) 
 
  Welcome to Open Firmware

  Copyright (c) 2004, 2017 IBM Corporation All rights reserved.
  This program and the accompanying materials are made available
  under the terms of the BSD License available at
  http://www.opensource.org/licenses/bsd-license.php

Booting from memory...
OF stdout device is: /vdevice/vty@3000
Preparing to boot Linux version 5.1.0-rc5-176614-gfac6994841aa 
(root@kvmupstream) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04)) #2 SMP Wed 
Apr 24 07:58:04 EDT 2019
Detected machine type: 0101
command line: root=/dev/sda2 rw console=tty0 console=ttyS0,115200 
init=/sbin/init initcall_debug selinux=0 secure=on
Max number of cores passed to firmware: 1024 (NR_CPUS = 2048)
Calling ibm,client-architecture-support...

SLOF **
QEMU Starting
 Build Date = Jan 14 2019 18:00:39
 FW Version = git-a5b428e1c1eae703
 Press "s" to enter Open Firmware.

Populating /vdevice methods
Populating /vdevice/vty@3000
Populating /vdevice/nvram@7100
Populating /pci@8002000
 00 0800 (D) : 1af4 1000virtio [ net ]
 00 1000 (D) : 1af4 1004virtio [ scsi ]
Populating /pci@8002000/scsi@2
   SCSI: Looking for devices
  100 DISK : "QEMU QEMU HARDDISK2.5+"
 00 1800 (D) : 1b36 000dserial bus [ usb-xhci ]
 00 2000 (D) : 1af4 1002unknown-legacy-device*
Scanning USB 
  XHCI: Initializing
Using default console: /vdevice/vty@3000
Detected RAM kernel at 

Re: [PATCH v8 1/2] powerpc/64s: reimplement book3s idle code in C

2019-04-08 Thread Satheesh Rajendran
Hi,

I hit the kernel crash below during Power8 host boot with this patch series
applied on top of powerpc merge branch commit
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=merge=6a821ffee18a6e6c0027c523fa8c958df98ca361

The kernel was built with ppc64le_defconfig.

Host Console log:
[0.454666] EEH: PCI Enhanced I/O Error Handling Enabled
[0.456524] create_dump_obj: New platform dump. ID = 0x4 Size 7457968
[0.457627] opal-power: OPAL EPOW, DPO support detected.
[0.457722] BUG: Unable to handle kernel data access at 0xff76184a
[0.457733] Faulting instruction address: 0xc001a94c
[0.457740] Oops: Kernel access of bad area, sig: 11 [#1]
[0.457745] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
[0.457750] Modules linked in:
[0.457756] CPU: 58 PID: 0 Comm: swapper/58 Not tainted 5.1.0-rc2-gd0ae6c548 
#1
[0.457762] NIP:  c001a94c LR: c00a6e9c CTR: c0008000
[0.457768] REGS: c00f272b7b50 TRAP: 0380   Not tainted  
(5.1.0-rc2-gd0ae6c548)
[0.457773] MSR:  90001033   CR: 24004222  
XER: 
[0.457781] CFAR: c00a6e98 IRQMASK: 1 
[0.457781] GPR00: c00a6e9c c00f272b7de0 0004 
0006 
[0.457781] GPR04: c00a5dd4 24004222 c00f272b7d48 
0001 
[0.457781] GPR08: 0002 ff761844 c00f27250c00 
c3feb1676be1 
[0.457781] GPR12: 4400 c009d380 c00ffe60ff90 
 
[0.457781] GPR16:   c004b4d0 
c004b4a0 
[0.457781] GPR20: c1526214 0800 0001 
c1521b78 
[0.457781] GPR24: 003a  0008 
 
[0.457781] GPR28: c1526140 0001 0400 
c1525ce0 
[0.457829] NIP [c001a94c] irq_set_pending_from_srr1+0x1c/0x50
[0.457835] LR [c00a6e9c] power7_idle+0x3c/0x50
[0.457839] Call Trace:
[0.457843] [c00f272b7de0] [c00a6e98] power7_idle+0x38/0x50 
(unreliable)
[0.457849] [c00f272b7e00] [c00210f4] arch_cpu_idle+0x54/0x160
[0.457856] [c00f272b7e30] [c0c47bc4] default_idle_call+0x74/0x88
[0.457862] [c00f272b7e50] [c0158f54] do_idle+0x2f4/0x3d0
[0.457868] [c00f272b7ec0] [c0159288] cpu_startup_entry+0x38/0x40
[0.457874] [c00f272b7ef0] [c004dae4] start_secondary+0x654/0x680
[0.457881] [c00f272b7f90] [c000b25c] 
start_secondary_prolog+0x10/0x14
[0.457886] Instruction dump:
[0.457890] 992d098b 7c630034 5463d97e 4e800020 6000 3c4c014d 38424dd0 
7c0802a6 
[0.457898] 6000 3d22ff76 78637722 39291840 
[0.457900] BUG: Unable to handle kernel data access at 0xff76184a
[0.457901] <7d4918ae> 2b8a00ff 419e001c 892d098b 
[0.457907] Faulting instruction address: 0xc001a94c
[0.457910] BUG: Unable to handle kernel data access at 0xff76184a
[0.457915] ---[ end trace fa7343cfd21c8798 ]---
[0.457919] Faulting instruction address: 0xc001a94c
[0.458961] BUG: Unable to handle kernel data access at 0xff76184a
[0.458963] BUG: Unable to handle kernel data access at 0xff76184a
[0.458964] BUG: Unable to handle kernel data access at 0xff76184a
[0.458966] BUG: Unable to handle kernel data access at 0xff76184a
[0.458968] BUG: Unable to handle kernel data access at 0xff76184a
[0.458970] BUG: Unable to handle kernel data access at 0xff76184a
[0.458972] Faulting instruction address: 0xc001a94c
[0.458973] Faulting instruction address: 0xc001a94c
[0.458974] Faulting instruction address: 0xc001a94c
[0.458975] Faulting instruction address: 0xc001a94c
[0.458976] Faulting instruction address: 0xc001a94c
[0.458978] initcall 
__machine_initcall_powernv_pnv_init_idle_states+0x0/0xb30 returned 0 after 0 
usecs
[0.458981] calling  __machine_initcall_powernv_opal_time_init+0x0/0x150 @ 1
[0.458982] Faulting instruction address: 0xc001a94c
[0.459022] BUG: Unable to handle kernel data access at 0xff76184a
[0.459040] Faulting instruction address: 0xc001a94c
[0.459043] initcall __machine_initcall_powernv_opal_time_init+0x0/0x150 
returned 0 after 0 usecs
[0.459044] BUG: Unable to handle kernel data access at 0xff76184c
[0.459045] Faulting instruction address: 0xc001a94c
[0.459060] calling  __machine_initcall_powernv_rng_init+0x0/0x334 @ 1
[0.459084] powernv-rng: Registering arch random hook.
[0.459141] BUG: Unable to handle kernel data access at 0xff76184a
[0.459147] Faulting instruction address: 0xc001a94c
[0.459191] BUG: Unable to handle kernel data access at 0xff76184a
[0.459199] Faulting 

Re: [PATCH] Disable kcov for slb routines.

2019-03-06 Thread Satheesh Rajendran
On Mon, Mar 04, 2019 at 01:55:51PM +0530, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar 
> 
> The kcov instrumentation inside SLB routines causes duplicate SLB entries
> to be added resulting into SLB multihit machine checks.
> Disable kcov instrumentation on slb.o
> 
> Signed-off-by: Mahesh Salgaonkar 
> ---
>  arch/powerpc/mm/Makefile |1 +
>  1 file changed, 1 insertion(+)

Fixes: https://github.com/linuxppc/issues/issues/230

Tested-by: Satheesh Rajendran 

> 
> diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
> index d4d32e229ace..f9cb40684746 100644
> --- a/arch/powerpc/mm/Makefile
> +++ b/arch/powerpc/mm/Makefile
> @@ -60,3 +60,4 @@ obj-$(CONFIG_PPC_MEM_KEYS)  += pkeys.o
>  # This is necessary for booting with kcov enabled on book3e machines
>  KCOV_INSTRUMENT_tlb_nohash.o := n
>  KCOV_INSTRUMENT_fsl_booke_mmu.o := n
> +KCOV_INSTRUMENT_slb.o := n
> 



Re: [PATCH v3 0/4] Fixes for 3 separate NMI reentrancy bugs

2019-02-25 Thread Satheesh Rajendran
On Tue, Feb 26, 2019 at 04:08:57PM +1000, Nicholas Piggin wrote:
> This series fixes several similar but unrelated bugs with NMIs
> clobbering live registers without noticing it, because MSR[RI] is set.
> Pretty rare bugs, but serious silent corruption consequences.
> 
> For the most part these can be observed and tested quite easily
> with the mambo simulator, except that it does not seem to follow
> the architecture wrt leaving MSR[RI] unchanged for HV interrupts.
> Mambo clears MSR[RI], so you have to account for that manually.
> 
> Since v1:
> - Fixed several build bugs.
> 
> Since v2:
> - Improved changelog and comments.
> - Fixed the NIA test for virt mode interrupts.

I hit the crash below on a Power8 box with this series applied on the linuxppc
merge branch, built with `ppc64le_defconfig`:

UnknownStateTransition: Something happened system state="8" and we transitioned 
to UNKNOWN state.  Review the following for more details
Message="OpTestSystem in run_IPLing and Exception="Kernel OOPS (machine in 
state '5'): Oops: Kernel access of bad area, sig: 11 [#1]
[0.00] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
[0.00] Modules linked in:
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc7-gf46b87021 #1
[0.00] NIP:  c0c1306c LR: c0c12f64 CTR: c033d860
[0.00] REGS: c14878b0 TRAP: 0380   Not tainted  
(5.0.0-rc7-gf46b87021)
[0.00] MSR:  90001033   CR: 28002224  
XER: 
[0.00] CFAR: c0c12f7c IRQMASK: 1 
[0.00] GPR00: c0c12f64 c1487b40 c1488400 
f000 
[0.00] GPR04: c1487b18 c1487b20  
c1388400 
[0.00] GPR08: f000 f008  
0008 
[0.00] GPR12: c15e1ed0 c167  
 
[0.00] GPR16:   c15e0d40 
0001 
[0.00] GPR20:   0800 
c1413b90 
[0.00] GPR24: c1413b98 0070 0008 
 
[0.00] GPR28:   00701000 
 
[0.00] NIP [c0c1306c] memmap_init_zone+0x258/0x308
[0.00] LR [c0c12f64] memmap_init_zone+0x150/0x308
[0.00] Call Trace:
[0.00] [c1487b40] [c0c12f64] 
memmap_init_zone+0x150/0x308 (unreliable)
[0.00] [c1487be0] [c0f87acc] 
free_area_init_node+0x480/0x518
[0.00] [c1487cf0] [c0f88630] 
free_area_init_nodes+0x838/0x940
[0.00] [c1487e10] [c0f6340c] paging_init+0x8c/0xa8
[0.00] [c1487e80] [c0f5bc00] setup_arch+0x3b4/0x3f0
[0.00] [c1487ef0] [c0f53b68] start_kernel+0x94/0x630
[0.00] [c1487f90] [c000b37c] 
start_here_common+0x1c/0x520
[0.00] Instruction dump:
[0.00] 71290002 41820014 ebea0008 7cc6fa14 78df8402 4870 3d22000c 
7bea3664 
[0.00] 39299d20 e909 7c685214 39230008  fa290018 fa290020 
fa290030 
[0.00] random: get_random_bytes called from 
print_oops_end_marker+0x40/0x80 with crng_init=0
[0.00] ---[ end trace  ]---
[0.00] 
[0.00] Kernel panic - not syncing: Attempted to kill the idle task!
[0.00] Rebooting in 10 seconds" caused the system to go to UNKNOWN_BAD 
and the system will be stopping."

Regards,
-Satheesh.
> 
> Nicholas Piggin (4):
>   powerpc/64s: Fix HV NMI vs HV interrupt recoverability test
>   powerpc/64s: system reset interrupt preserve HSRRs
>   powerpc/64s: Prepare to handle data interrupts vs d-side MCE
> reentrancy
>   powerpc/64s: Fix data interrupts vs d-side MCE reentrancy
> 
>  arch/powerpc/include/asm/asm-prototypes.h |  8 ++
>  arch/powerpc/include/asm/nmi.h|  2 +
>  arch/powerpc/kernel/exceptions-64s.S  | 92 +++
>  arch/powerpc/kernel/mce.c |  3 +
>  arch/powerpc/kernel/traps.c   | 91 +-
>  5 files changed, 179 insertions(+), 17 deletions(-)
> 
> -- 
> 2.18.0
> 



Re: [PATCH v5 0/5] powerpc: system call table generation support

2018-12-17 Thread Satheesh Rajendran
Hi Firoz,

On Thu, Dec 13, 2018 at 02:32:45PM +0530, Firoz Khan wrote:
> The purpose of this patch series is to make it easy to add, modify,
> or delete system call table entries by changing an entry in the
> syscall.tbl file instead of manually changing many files. The other
> goal is to unify the system call table generation implementation
> across all the architectures.
> 
> The system call tables are in a different format in each
> architecture, so it is difficult to manually add, modify, or delete
> system calls in the respective files. To make this easy, keep a
> script that generates the uapi header file and the syscall table
> file.
> 
> syscall.tbl contains the list of available system calls along with
> the system call number and the corresponding entry point. Adding a
> new system call to this architecture is possible by adding a new
> entry to the syscall.tbl file.
> 
> A new table entry consists of:
> - System call number.
> - ABI.
> - System call name.
> - Entry point name.
>   - Compat entry name, if required.
>   - spu entry name, if required.
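
As an aside, a minimal sketch of reading one such entry. The struct, the
parser, and the sample lines are illustrative assumptions based only on the
field list above (the optional spu column is not modelled); this is not the
series' generator script.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one syscall.tbl line. */
struct tbl_entry {
	int nr;			/* system call number */
	char abi[16];		/* ABI column */
	char name[64];		/* system call name */
	char entry[64];		/* entry point name */
	char compat[64];	/* compat entry name, empty if absent */
};

/* Parse a whitespace-separated line; returns 0 on success, -1 otherwise. */
static int parse_tbl_line(const char *line, struct tbl_entry *e)
{
	int n;

	e->compat[0] = '\0';
	n = sscanf(line, "%d %15s %63s %63s %63s",
		   &e->nr, e->abi, e->name, e->entry, e->compat);
	return n >= 4 ? 0 : -1;	/* the compat column is optional */
}

int main(void)
{
	/* Made-up sample lines in the described column order. */
	const char *lines[] = {
		"0 nospu restart_syscall sys_restart_syscall",
		"11 nospu execve sys_execve compat_sys_execve",
	};
	struct tbl_entry e;
	size_t i;

	for (i = 0; i < sizeof(lines) / sizeof(lines[0]); i++) {
		if (parse_tbl_line(lines[i], &e) == 0)
			printf("%d: %s -> %s (compat: %s)\n", e.nr, e.name,
			       e.entry, e.compat[0] ? e.compat : "none");
	}
	return 0;
}
```
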
> 
> The ARM, s390 and x86 architectures already have similar support. I
> leveraged their implementation to come up with a generic solution.
> 
> I have done the same work for alpha, ia64, m68k, microblaze, mips,
> parisc, sh, sparc, and xtensa. The git repository mentioned below
> contains more details about the workflow.
> 
> https://github.com/frzkhn/system_call_table_generator/
> 
> Finally, this is the groundwork for solving the Y2038 issue. We need
> to add two dozen system calls to solve it, so this patch series will
> make that easy: each new system call is just a new entry in
> syscall.tbl.
> 
> Changes since v4:
>  - DOTSYM macro removed for ppc32, which was causing
>    the compilation error.
> 
> Changes since v3:
>  - split compat syscall table out from native table.
>  - modified the script to add new line in the generated
>    file.
> 
> Changes since v2:
>  - modified/optimized the syscall.tbl to avoid duplicate
>    for the spu entries.
>  - updated the syscalltbl.sh to meet the above point.
> 
> Changes since v1:
>  - optimized/updated the syscall table generation
>    scripts.
>  - fixed all mixed indentation issues in syscall.tbl.
>  - added "comments" in syscall_*.tbl.
>  - changed from generic-y to generated-y in Kbuild.
> 
> Firoz Khan (5):
>   powerpc: add __NR_syscalls along with NR_syscalls
>   powerpc: move macro definition from asm/systbl.h
>   powerpc: add system call table generation support
>   powerpc: split compat syscall table out from native table
>   powerpc: generate uapi header and system call table files

I tried to apply the series on Linus' "master" branch and on the linuxppc-dev
(https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git)
"merge" branch; it failed to apply on both.

# git am mbox
Applying: powerpc: add __NR_syscalls along with NR_syscalls
Applying: powerpc: move macro definition from asm/systbl.h
Applying: powerpc: add system call table generation support
Applying: powerpc: split compat syscall table out from native table
Applying: powerpc: generate uapi header and system call table files
error: patch failed: arch/powerpc/include/uapi/asm/Kbuild:1
error: arch/powerpc/include/uapi/asm/Kbuild: patch does not apply
Patch failed at 0005 powerpc: generate uapi header and system call table files
Use 'git am --show-current-patch' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
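
The failing patch can be picked out of output like the above mechanically.
A minimal Python sketch, assuming the "Patch failed at NNNN <subject>" line
format shown here; the helper name failed_patch is made up for illustration:

```python
import re

def failed_patch(am_output):
    """Return (patch_number, subject) from `git am` output, or None if nothing failed."""
    match = re.search(r"^Patch failed at (\d+) (.*)$", am_output, re.MULTILINE)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).strip()

# Tail of the output quoted above.
sample = """Applying: powerpc: split compat syscall table out from native table
Applying: powerpc: generate uapi header and system call table files
error: patch failed: arch/powerpc/include/uapi/asm/Kbuild:1
error: arch/powerpc/include/uapi/asm/Kbuild: patch does not apply
Patch failed at 0005 powerpc: generate uapi header and system call table files
"""

print(failed_patch(sample))
```

Here that points at patch 5 of the series, the one touching
arch/powerpc/include/uapi/asm/Kbuild.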

I then tried the linuxppc-dev
(https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git)
"next" branch: the series applied, compiled with ppc64le_defconfig and
booted on an IBM Power8 box.

# uname -r
4.20.0-rc2-gdd2690d2c

Looks like the patch series needs a rebase onto the latest kernel.


Thanks,
-Satheesh.

> 
>  arch/powerpc/Makefile   |   3 +
>  arch/powerpc/include/asm/Kbuild |   4 +
>  arch/powerpc/include/asm/syscall.h  |   3 +-
>  arch/powerpc/include/asm/systbl.h   | 396 --
>  arch/powerpc/include/asm/unistd.h   |   3 +-
>  arch/powerpc/include/uapi/asm/Kbuild|   2 +
>  arch/powerpc/include/uapi/asm/unistd.h  | 389 +
>  arch/powerpc/kernel/Makefile|  10 -
>  arch/powerpc/kernel/entry_64.S  |   7 +-
>  arch/powerpc/kernel/syscalls/Makefile   |  63 
>  arch/powerpc/kernel/syscalls/syscall.tbl| 427 
> 
>  arch/powerpc/kernel/syscalls/syscallhdr.sh  |  37 +++
>  arch/powerpc/kernel/syscalls/syscalltbl.sh  |  36 +++
>  arch/powerpc/kernel/systbl.S|  40 ++-
>  arch/powerpc/kernel/systbl_chk.c|  60 
>  arch/powerpc/kernel/vdso.c  | 

Re: [PATCH v4 0/5] powerpc: system call table generation support

2018-12-06 Thread Satheesh Rajendran
On Fri, Dec 07, 2018 at 11:41:35AM +0530, Firoz Khan wrote:
> The purpose of this patch series is to make it easy to
> add, modify or delete system calls by changing an entry
> in the syscall.tbl file instead of manually changing
> many files. The other goal is to unify the system call
> table generation support across all architectures.
> 
> The system call tables are in a different format on
> every architecture, so it is difficult to add, modify
> or delete system calls in the respective files by hand.
> To make this easier, a script generates the uapi header
> file and the syscall table file.
> 
> syscall.tbl contains the list of available system
> calls along with the system call number and the
> corresponding entry point. A new system call can be
> added to this architecture by adding a new entry to
> the syscall.tbl file.
> 
> A new table entry consists of:
> - System call number.
> - ABI.
> - System call name.
> - Entry point name.
>   - Compat entry name, if required.
>   - spu entry name, if required.
> 
> ARM, s390 and x86 already have similar support. I
> leveraged their implementations to come up with a
> generic solution.
> 
> I have done the same work for alpha, ia64, m68k,
> microblaze, mips, parisc, sh, sparc and xtensa. The
> git repository below contains more details about the
> workflow.
> 
> https://github.com/frzkhn/system_call_table_generator/
> 
> Finally, this is the groundwork for solving the Y2038
> issue. We need to add two dozen system calls to solve
> it, and this patch series makes that easy: each new
> system call is a new entry in syscall.tbl.
> 
> Changes since v3:
>  - split compat syscall table out from native table.
>  - modified the script to add new line in the generated
>    file.
Hi Firoz,

This version (v4) booted fine on an IBM Power8 box.
Compiled with 'ppc64le_defconfig' against
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=merge=a26b21082959cee3075b3edb7ef23071c7e0b09a

For reference, the failure seen with v3:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-November/182110.html
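
For readers new to the series, the table-driven approach described in the
cover letter can be sketched roughly as below. This is a simplified Python
illustration, not the series' syscallhdr.sh; the sample rows, the column
order (number, ABI, name, entry point, optional compat entry) and the helper
names are assumptions based on the cover letter's description.

```python
def parse_table(text):
    """Parse syscall.tbl-style rows: number, ABI, name, entry point, optional compat entry."""
    rows = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments, skip blank lines
        if not line:
            continue
        fields = line.split()
        number, abi, name, entry = fields[:4]
        compat = fields[4] if len(fields) > 4 else None
        rows.append({"nr": int(number), "abi": abi, "name": name,
                     "entry": entry, "compat": compat})
    return rows

def header_lines(rows):
    """Emit one '#define __NR_<name> <number>' line per table row."""
    return ["#define __NR_%s\t%d" % (row["name"], row["nr"]) for row in rows]

# Illustrative rows only, not copied from the series.
sample = """
# nr  abi     name    entry       compat
3     common  read    sys_read
4     common  write   sys_write
11    common  execve  sys_execve  compat_sys_execve
"""

rows = parse_table(sample)
print("\n".join(header_lines(rows)))
```

Adding a system call then amounts to adding one row to the table; the header
is regenerated from it.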

Regards,
-Satheesh.
> 
> Changes since v2:
>  - modified/optimized the syscall.tbl to avoid duplicate
>    for the spu entries.
>  - updated the syscalltbl.sh to meet the above point.
> 
> Changes since v1:
>  - optimized/updated the syscall table generation
>    scripts.
>  - fixed all mixed indentation issues in syscall.tbl.
>  - added "comments" in syscall_*.tbl.
>  - changed from generic-y to generated-y in Kbuild.
> 
> Firoz Khan (5):
>   powerpc: add __NR_syscalls along with NR_syscalls
>   powerpc: move macro definition from asm/systbl.h
>   powerpc: add system call table generation support
>   powerpc: split compat syscall table out from native table
>   powerpc: generate uapi header and system call table files
> 
>  arch/powerpc/Makefile   |   3 +
>  arch/powerpc/include/asm/Kbuild |   4 +
>  arch/powerpc/include/asm/syscall.h  |   3 +-
>  arch/powerpc/include/asm/systbl.h   | 396 --
>  arch/powerpc/include/asm/unistd.h   |   3 +-
>  arch/powerpc/include/uapi/asm/Kbuild|   2 +
>  arch/powerpc/include/uapi/asm/unistd.h  | 389 +
>  arch/powerpc/kernel/Makefile|  10 -
>  arch/powerpc/kernel/entry_64.S  |   7 +-
>  arch/powerpc/kernel/syscalls/Makefile   |  63 
>  arch/powerpc/kernel/syscalls/syscall.tbl| 427 
> 
>  arch/powerpc/kernel/syscalls/syscallhdr.sh  |  37 +++
>  arch/powerpc/kernel/syscalls/syscalltbl.sh  |  36 +++
>  arch/powerpc/kernel/systbl.S|  40 ++-
>  arch/powerpc/kernel/systbl_chk.c|  60 
>  arch/powerpc/kernel/vdso.c  |   7 +-
>  arch/powerpc/platforms/cell/spu_callbacks.c |  17 +-
>  17 files changed, 606 insertions(+), 898 deletions(-)
>  delete mode 100644 arch/powerpc/include/asm/systbl.h
>  create mode 100644 arch/powerpc/kernel/syscalls/Makefile
>  create mode 100644 arch/powerpc/kernel/syscalls/syscall.tbl
>  create mode 100644 arch/powerpc/kernel/syscalls/syscallhdr.sh
>  create mode 100644 arch/powerpc/kernel/syscalls/syscalltbl.sh
>  delete mode 100644 arch/powerpc/kernel/systbl_chk.c
> 
> -- 
> 1.9.1
> 



Re: [PATCH 0/3] System call table generation support

2018-11-29 Thread Satheesh Rajendran
On Thu, Nov 29, 2018 at 01:48:16PM +0530, Firoz Khan wrote:
> Hi Sathish,
> 
> Thanks for your email.
> 
> On Thu, 29 Nov 2018 at 12:05, Satheesh Rajendran
>  wrote:
> >
> > On Fri, Sep 14, 2018 at 02:02:57PM +0530, Firoz Khan wrote:
> > > The purpose of this patch series is:
> > > 1. We can easily add/modify/delete system call by changing entry
> > > in syscall.tbl file. No need to manually edit many files.
> > >
> > > 2. It is easy to unify the system call implementation across all
> > > the architectures.
> > >
> > > > The system call tables are in a different format on every architecture,
> > > > so it is difficult to add or modify system calls in the respective
> > > > files by hand. To make this easier, a script generates the header file
> > > > and the syscall table file, which also unifies them across all
> > > > architectures.
> > > >
> > > > syscall.tbl contains the list of available system calls along with
> > > > the system call number and the corresponding entry point. A new system
> > > > call can be added to this architecture by adding a new entry to
> > > > the syscall.tbl file.
> > >
> > > > A new table entry consists of:
> > > - System call number.
> > > - ABI.
> > > - System call name.
> > > - Entry point name.
> > > - Compat entry name, if required.
> > >
> > > > ARM, s390 and x86 already have similar support. I leveraged their
> > > > implementations to come up with a generic solution.
> > > >
> > > > I have done the same work for alpha, m68k, microblaze, ia64, mips,
> > > > parisc, sh, sparc and xtensa, but I am starting by sending the patch
> > > > for one architecture for review. The git repository below contains
> > > > more details.
> > > Git repo:- https://github.com/frzkhn/system_call_table_generator/
> > >
> > > > Finally, this is the groundwork for solving the Y2038 issue. We
> > > > need to add or change two dozen system calls to solve it,
> > > > so this patch series will make it easy to convert existing
> > > > system calls to Y2038-compatible ones.
> > >
> > > > I started working on system call table generation on 4.17-rc1. I used
> > > > Marcin's script - https://github.com/hrw/syscalls-table - to generate
> > > > the syscall.tbl file, which is the input to the system call table
> > > > generation script. A couple of system calls have been added in the
> > > > latest rc release, so running Marcin's script on the latest release
> > > > would generate a new syscall.tbl. For now I still use the old
> > > > syscall.tbl; once all reviews are over I'll update syscall.tbl alone
> > > > to match the tip of the kernel. The impact is that a few of the
> > > > system calls won't work.
> > >
> > > Firoz Khan (3):
> > >   powerpc: Replace NR_syscalls macro from asm/unistd.h
> > >   powerpc: Add system call table generation support
> > >   powerpc: uapi header and system call table file generation
> > >
> > >  arch/powerpc/Makefile   |   3 +
> > >  arch/powerpc/include/asm/Kbuild |   3 +
> > >  arch/powerpc/include/asm/unistd.h   |   3 +-
> > >  arch/powerpc/include/uapi/asm/Kbuild|   2 +
> > >  arch/powerpc/include/uapi/asm/unistd.h  | 391 
> > > +---
> > >  arch/powerpc/kernel/Makefile|   3 +-
> > >  arch/powerpc/kernel/syscall_table_32.S  |   9 +
> > >  arch/powerpc/kernel/syscall_table_64.S  |  17 ++
> > >  arch/powerpc/kernel/syscalls/Makefile   |  51 
> > >  arch/powerpc/kernel/syscalls/syscall_32.tbl | 378 
> > > +++
> > >  arch/powerpc/kernel/syscalls/syscall_64.tbl | 372 
> > > ++
> > >  arch/powerpc/kernel/syscalls/syscallhdr.sh  |  37 +++
> > >  arch/powerpc/kernel/syscalls/syscalltbl.sh  |  38 +++
> > >  arch/powerpc/kernel/systbl.S|  50 
> > >  14 files changed, 916 insertions(+), 441 deletions(-)
> > >  create mode 100644 arch/powerpc/kernel/syscall_table_32.S
> > >  create mode 100644 arch/powerpc/kernel/syscall_table_64.S
> > >  create mode 100644 arch/powerpc/kernel/syscalls/Makefile
> > >  create mode 100644 arch/powerpc/kernel/syscalls/syscall_32.tbl
> > >  create mode 100644 

Re: [PATCH] powerpc/numa: fix hot-added CPU on memory-less node

2018-11-15 Thread Satheesh Rajendran
On Wed, Nov 14, 2018 at 06:03:19PM +0100, Laurent Vivier wrote:
> Trying to hotplug a CPU on an empty NUMA node (without
> memory or CPU) crashes the kernel when the CPU is onlined.
> 
> During the onlining process, the kernel calls start_secondary()
> that ends by calling
> set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]))
> that relies on NODE_DATA(nid)->node_zonelists and in our case
> NODE_DATA(nid) is NULL.
> 
> To fix that, add the same checking as we already have in
> find_and_online_cpu_nid(): if NODE_DATA() is NULL, use
> the first online node.
> 
> Bug: https://github.com/linuxppc/linux/issues/184
> Fixes: ea05ba7c559c8e5a5946c3a94a2a266e9a6680a6
>(powerpc/numa: Ensure nodes initialized for hotplug)
> Signed-off-by: Laurent Vivier 
> ---
>  arch/powerpc/mm/numa.c | 9 +
>  1 file changed, 9 insertions(+)

This patch causes a regression in the cold-plug NUMA case (Case 1) and in
the hotplug + reboot case (Case 2): all vcpus end up in node 0.


Env: HW: Power8 Host.
Kernel: 4.20-rc2 + this patch

Case 1:
1. Boot a guest with 8 vcpus (all available), spread out across 4 NUMA nodes.
[guest XML not preserved in this archive; only the vcpu count (8) survives]


2. Check lscpu --- all vcpus are added to node0 --> NOK

# lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  1
Core(s) per socket:  8
Socket(s):           1
NUMA node(s):        4
Model:               2.1 (pvr 004b 0201)
Model name:          POWER8 (architected), altivec supported
Hypervisor vendor:   KVM
Virtualization type: para
L1d cache:           64K
L1i cache:           32K
NUMA node0 CPU(s):   0-7
NUMA node1 CPU(s):
NUMA node2 CPU(s):
NUMA node3 CPU(s):

Without this patch it works fine:
# lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  1
Core(s) per socket:  8
Socket(s):           1
NUMA node(s):        4
Model:               2.1 (pvr 004b 0201)
Model name:          POWER8 (architected), altivec supported
Hypervisor vendor:   KVM
Virtualization type: para
L1d cache:           64K
L1i cache:           32K
NUMA node0 CPU(s):   0,1
NUMA node1 CPU(s):   2,3
NUMA node2 CPU(s):   4,5
NUMA node3 CPU(s):   6,7
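
The comparison between the two outputs can be automated. The following is a
minimal Python sketch, assuming the "NUMA nodeN CPU(s):" line format shown
above; the helper names are made up for illustration and this is not part of
any existing test suite:

```python
import re

def expand_cpu_list(spec):
    """Expand an lscpu CPU list such as "0,1,4-7" into a list of CPU numbers."""
    cpus = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus

def numa_layout(lscpu_text):
    """Map NUMA node number -> list of CPUs, from "NUMA nodeN CPU(s):" lines."""
    layout = {}
    for line in lscpu_text.splitlines():
        match = re.match(r"NUMA node(\d+) CPU\(s\):\s*(.*)$", line)
        if match:
            layout[int(match.group(1))] = expand_cpu_list(match.group(2))
    return layout

def empty_nodes(layout):
    """Return the nodes that ended up with no CPUs."""
    return sorted(node for node, cpus in layout.items() if not cpus)

with_patch = """NUMA node0 CPU(s):   0-7
NUMA node1 CPU(s):
NUMA node2 CPU(s):
NUMA node3 CPU(s):
"""
without_patch = """NUMA node0 CPU(s):   0,1
NUMA node1 CPU(s):   2,3
NUMA node2 CPU(s):   4,5
NUMA node3 CPU(s):   6,7
"""

print(empty_nodes(numa_layout(with_patch)))
print(empty_nodes(numa_layout(without_patch)))
```

A non-empty result for a guest whose nodes were all configured with vcpus
flags the behaviour reported here.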


Case 2:
1. Boot a guest with 8 vcpus (2 available, 6 possible), spread out across
4 NUMA nodes.
[guest XML not preserved in this archive; only the vcpu count (8) survives]


2. Hotplug all vcpus
# lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  1
Core(s) per socket:  8
Socket(s):           1
NUMA node(s):        2
Model:               2.1 (pvr 004b 0201)
Model name:          POWER8 (architected), altivec supported
Hypervisor vendor:   KVM
Virtualization type: para
L1d cache:           64K
L1i cache:           32K
NUMA node0 CPU(s):   0,1,4-7
NUMA node1 CPU(s):   2,3


3. Reboot the guest
# lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  1
Core(s) per socket:  8
Socket(s):           1
NUMA node(s):        4
Model:               2.1 (pvr 004b 0201)
Model name:          POWER8 (architected), altivec supported
Hypervisor vendor:   KVM
Virtualization type: para
L1d cache:           64K
L1i cache:           32K
NUMA node0 CPU(s):   0-7
NUMA node1 CPU(s):
NUMA node2 CPU(s):
NUMA node3 CPU(s):


Without this patch, Case 2 crashes the guest during hotplug, i.e. the
issue reported in https://github.com/linuxppc/linux/issues/184

Regards,
-Satheesh.

> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 3a048e98a132..1b2d25a3c984 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -483,6 +483,15 @@ static int numa_setup_cpu(unsigned long lcpu)
>   if (nid < 0 || !node_possible(nid))
>   nid = first_online_node;
> 
> + if (NODE_DATA(nid) == NULL) {
> + /*
> +  * Default to using the nearest node that has memory installed.
> +  * Otherwise, it would be necessary to patch the kernel MM code
> +  * to deal with more memoryless-node error conditions.
> +  */
> + nid = first_online_node;
> + }
> +
>   map_cpu_to_node(lcpu, nid);
>   of_node_put(cpu);
>  out:
> -- 
> 2.17.2
> 



Re: [PATCH] powerpc: Add KVM guest defconfig

2018-11-12 Thread Satheesh Rajendran
On Mon, Nov 12, 2018 at 11:24:08PM +1100, Michael Ellerman wrote:
> Satheesh Rajendran  writes:
> 
> > On Thu, Nov 08, 2018 at 04:23:07PM -0200, Breno Leitao wrote:
> >> hi Satheesh,
> >> 
> >> On 11/08/2018 03:08 AM, sathn...@linux.vnet.ibm.com wrote:
> >> > --- /dev/null
> >> > +++ b/arch/powerpc/configs/guest.config
> >> > @@ -0,0 +1,14 @@
> >> > +CONFIG_VIRTIO_BLK=y
> >> > +CONFIG_VIRTIO_BLK_SCSI=y
> >> > +CONFIG_SCSI_VIRTIO=y
> >> > +CONFIG_VIRTIO_NET=y
> >> > +CONFIG_NET_FAILOVER=y
> >> > +CONFIG_VIRTIO_CONSOLE=y
> >> > +CONFIG_VIRTIO=y
> >> > +CONFIG_VIRTIO_PCI=y
> >> > +CONFIG_KVM_GUEST=y
> >> > +CONFIG_EPAPR_PARAVIRT=y
> >> > +CONFIG_XFS_FS=y
> >> 
> >> Why a guest kernel needs to have XFS integrated in the core image? I am
> >> wondering if it is a requirement from another CONFIG_ option.
> >
> > Idea is to have a working config which would boot guest without initramfs,
> > other FS(like EXT4) is already integrated in the core image, 
> > thought this would be helpful for distributions, which default XFS as root 
> > disk.
> 
> Maybe we should switch XFS_FS to Y in ppc64_defconfig ?

Sure, makes sense; I will send it for ppc64_defconfig instead.
In addition, I have a few more symbols to enable for cgroups,
memory hotplug and NUMA balancing.
I guess these symbols can also go into ppc64_defconfig itself?

i.e,

CONFIG_CGROUP_SCHED=y
CONFIG_MEMCG=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_NUMA_BALANCING=y
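
Whether a given defconfig already enables these can be checked with a short
script. A minimal Python sketch follows; the defconfig fragment in it is
hypothetical (it is not the real ppc64_defconfig) and the helper names are
made up for illustration:

```python
def enabled_symbols(config_text):
    """Collect symbols set to y from CONFIG_FOO=y lines, ignoring comments."""
    symbols = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value == "y":
            symbols.add(name)
    return symbols

def missing_symbols(wanted, config_text):
    """Return the wanted symbols that the config does not build in."""
    have = enabled_symbols(config_text)
    return [symbol for symbol in wanted if symbol not in have]

# Symbols proposed in this mail.
wanted = [
    "CONFIG_CGROUP_SCHED", "CONFIG_MEMCG", "CONFIG_CGROUP_FREEZER",
    "CONFIG_CGROUP_DEVICE", "CONFIG_CGROUP_CPUACCT", "CONFIG_CGROUP_PERF",
    "CONFIG_MEMORY_HOTPLUG", "CONFIG_MEMORY_HOTREMOVE", "CONFIG_NUMA_BALANCING",
]

# Hypothetical fragment, for illustration only.
defconfig = """CONFIG_MEMCG=y
CONFIG_XFS_FS=m
# CONFIG_NUMA_BALANCING is not set
CONFIG_CGROUP_SCHED=y
"""

print(missing_symbols(wanted, defconfig))
```

Note that a symbol built as a module (=m) counts as missing here, which
matches the intent of booting without an initramfs.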

Thanks!
-Satheesh.
> 
> cheers
> 



Re: [PATCH] powerpc: Add KVM guest defconfig

2018-11-09 Thread Satheesh Rajendran
On Thu, Nov 08, 2018 at 04:23:07PM -0200, Breno Leitao wrote:
> hi Satheesh,
> 
> On 11/08/2018 03:08 AM, sathn...@linux.vnet.ibm.com wrote:
> > --- /dev/null
> > +++ b/arch/powerpc/configs/guest.config
> > @@ -0,0 +1,14 @@
> > +CONFIG_VIRTIO_BLK=y
> > +CONFIG_VIRTIO_BLK_SCSI=y
> > +CONFIG_SCSI_VIRTIO=y
> > +CONFIG_VIRTIO_NET=y
> > +CONFIG_NET_FAILOVER=y
> > +CONFIG_VIRTIO_CONSOLE=y
> > +CONFIG_VIRTIO=y
> > +CONFIG_VIRTIO_PCI=y
> > +CONFIG_KVM_GUEST=y
> > +CONFIG_EPAPR_PARAVIRT=y
> > +CONFIG_XFS_FS=y
> 
> Why a guest kernel needs to have XFS integrated in the core image? I am
> wondering if it is a requirement from another CONFIG_ option.

The idea is to have a working config that boots a guest without an initramfs.
Other filesystems (like EXT4) are already built into the core image, so I
thought this would be helpful for distributions that default to XFS for the
root disk.

Hope this is fine?

> 
> If it is not a strict requirement from another config, I think we can keep it
> as defined at ppc64_defconfig, which defines it as module (CONFIG_XFS_FS=m).
> 
> Thanks for this patch, very useful.

Thanks Breno! :-)

-Satheesh.

> Breno
> 



Re: System not booting since dm changes? (was Linux 4.20-rc1)

2018-11-05 Thread Satheesh Rajendran
On Mon, Nov 05, 2018 at 08:51:57AM -0500, Mike Snitzer wrote:
> On Mon, Nov 05 2018 at  5:25am -0500,
> Michael Ellerman  wrote:
> 
> > Linus Torvalds  writes:
> > ...
> > > Mike Snitzer (1):
> > > device mapper updates
> > 
> > Hi Mike,
> > 
> > Replying here because I can't find the device-mapper pull or the patch
> > in question on LKML. I guess I should be subscribed to dm-devel.
> > 
> > We have a box that doesn't boot any more, bisect points at one of:
> > 
> >   cef6f55a9fb4 Mike Snitzer   dm table: require that request-based DM 
> > be layered on blk-mq devices 
> >   953923c09fe8 Mike Snitzer   dm: rename DM_TYPE_MQ_REQUEST_BASED to 
> > DM_TYPE_REQUEST_BASED 
> >   6a23e05c2fe3 Jens Axboe dm: remove legacy request-based IO path 
> > 
> > 
> > It's a Power8 system running Rawhide, it does have multipath, but I'm
> > told it was setup by the Fedora installer, ie. nothing fancy.
> > 
> > The symptom is the system can't find its root filesystem and drops into
> > the initramfs shell. The dmesg includes a bunch of errors like below:
> > 
> >   [   43.263460] localhost multipathd[1344]: sdb: fail to get serial
> >   [   43.268762] localhost multipathd[1344]: mpatha: failed in domap for 
> > addition of new path sdb
> >   [   43.268762] localhost multipathd[1344]: uevent trigger error
> >   [   43.282065] localhost kernel: device-mapper: table: table load 
> > rejected: not all devices are blk-mq request-stackable
> ...
> >
> > Any ideas what's going wrong here?
> 
> "table load rejected: not all devices are blk-mq request-stackable"
> speaks to the fact that you aren't using blk-mq for scsi (aka scsi-mq).
> 
> You need to use scsi_mod.use_blk_mq=Y on the kernel commandline (or set
> CONFIG_SCSI_MQ_DEFAULT in your kernel config)

Thanks Mike! The above solution worked and the system boots fine now :-)

# uname -r
4.20.0-rc1+
# cat /proc/cmdline 
root=/dev/mapper/fedora_ltc--test--ci2-root ro 
rd.lvm.lv=fedora_ltc-test-ci2/root rd.lvm.lv=fedora_ltc-test-ci2/swap 
scsi_mod.use_blk_mq=Y

CONFIG_SCSI_MQ_DEFAULT was not set in my kernel config; I will set it in
future runs.
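
For CI runs, the command-line side of this can be checked before booting.
A minimal Python sketch, assuming that "Y", "y" and "1" are the spellings to
treat as enabled; the helper names are made up for illustration:

```python
def cmdline_params(cmdline):
    """Split a kernel command line into (key, value) pairs; bare flags get value None."""
    pairs = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        pairs.append((key, value if sep else None))
    return pairs

def scsi_mq_enabled(cmdline):
    """True if the last scsi_mod.use_blk_mq= setting on the command line is truthy."""
    enabled = False
    for key, value in cmdline_params(cmdline):
        if key == "scsi_mod.use_blk_mq":
            # Assumption: only these spellings count as enabled.
            enabled = value in ("Y", "y", "1")
    return enabled

# The command line shown above, joined back into one line.
cmdline = ("root=/dev/mapper/fedora_ltc--test--ci2-root ro "
           "rd.lvm.lv=fedora_ltc-test-ci2/root rd.lvm.lv=fedora_ltc-test-ci2/swap "
           "scsi_mod.use_blk_mq=Y")

print(scsi_mq_enabled(cmdline))
```

This only inspects the command line; a kernel built with
CONFIG_SCSI_MQ_DEFAULT set would not need the parameter at all.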

Thanks Michael!

Regards,
-Satheesh.

> 
> Mike
>