[PATCH 16/18] powerpc/pseries: remove memory "re-add" implementation

2020-06-11 Thread Nathan Lynch
dlpar_memory() no longer has any callers which pass
PSERIES_HP_ELOG_ACTION_READD. Remove this case and the corresponding
unreachable code.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/include/asm/rtas.h   |  1 -
 .../platforms/pseries/hotplug-memory.c | 42 ---
 2 files changed, 43 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 0107d724e9da..55f9a154c95d 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -215,7 +215,6 @@ inline uint16_t pseries_errorlog_length(struct pseries_errorlog *sect)
 
 #define PSERIES_HP_ELOG_ACTION_ADD 1
 #define PSERIES_HP_ELOG_ACTION_REMOVE  2
-#define PSERIES_HP_ELOG_ACTION_READD   3
 
 #define PSERIES_HP_ELOG_ID_DRC_NAME    1
 #define PSERIES_HP_ELOG_ID_DRC_INDEX   2
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 5ace2f9a277e..67ece3ac9ac2 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -487,40 +487,6 @@ static int dlpar_memory_remove_by_index(u32 drc_index)
return rc;
 }
 
-static int dlpar_memory_readd_by_index(u32 drc_index)
-{
-   struct drmem_lmb *lmb;
-   int lmb_found;
-   int rc;
-
-   pr_info("Attempting to update LMB, drc index %x\n", drc_index);
-
-   lmb_found = 0;
-   for_each_drmem_lmb(lmb) {
-   if (lmb->drc_index == drc_index) {
-   lmb_found = 1;
-   rc = dlpar_remove_lmb(lmb);
-   if (!rc) {
-   rc = dlpar_add_lmb(lmb);
-   if (rc)
-   dlpar_release_drc(lmb->drc_index);
-   }
-   break;
-   }
-   }
-
-   if (!lmb_found)
-   rc = -EINVAL;
-
-   if (rc)
-   pr_info("Failed to update memory at %llx\n",
-   lmb->base_addr);
-   else
-   pr_info("Memory at %llx was updated\n", lmb->base_addr);
-
-   return rc;
-}
-
 static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 {
struct drmem_lmb *lmb, *start_lmb, *end_lmb;
@@ -617,10 +583,6 @@ static int dlpar_memory_remove_by_index(u32 drc_index)
 {
return -EOPNOTSUPP;
 }
-static int dlpar_memory_readd_by_index(u32 drc_index)
-{
-   return -EOPNOTSUPP;
-}
 
 static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 {
@@ -902,10 +864,6 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
break;
}
 
-   break;
-   case PSERIES_HP_ELOG_ACTION_READD:
-   drc_index = hp_elog->_drc_u.drc_index;
-   rc = dlpar_memory_readd_by_index(drc_index);
break;
default:
pr_err("Invalid action (%d) specified\n", hp_elog->action);
-- 
2.25.4



Re: ppc64le and 32-bit LE userland compatibility

2020-06-11 Thread Christophe Leroy




On 06/06/2020 at 01:54, Will Springer wrote:

On Saturday, May 30, 2020 3:17:24 PM PDT Will Springer wrote:

On Saturday, May 30, 2020 8:37:43 AM PDT Christophe Leroy wrote:

There is a series at
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=173231
to switch powerpc to the Generic C VDSO.

Can you try it and see whether it fixes your issue?

Christophe


Sure thing, I spotted that after making the initial post. Will report
back with results.

Will [she/her]


Sorry for the wait, I just sat down to work on this again yesterday.

Tested this series on top of stable/linux-5.7.y (5.7.0 at the time of
writing), plus the one-line signal handler patch. Had to rewind to the
state of powerpc/merge at the time of the mail before the patch would
apply, then cherry-picked to 5.6 until I realized the patchset used some
functionality that didn't land until 5.7, so I moved it there.

Good news is that `date` now works correctly with the vdso call in 32-bit
LE. Bad news is it seems to have broken things on the 64-bit side—in my
testing, Void kicks off runit but hangs after starting eudev, and in a
Debian Stretch system, systemd doesn't get to the point of printing
anything whatsoever. (I had to `init=/bin/sh` to confirm the date worked
in ppcle, although in ppc64le running `date` also hung the system when it
made the vdso call...) Not sure how to approach debugging that, so I'd
appreciate any pointers.



Does it break only the ppc64le vdso, or also the ppc64 (BE) vdso?

I never had a chance to run any tests on ppc64, as I only have a kernel
cross compiler.


Would you have a chance to build and run vdsotest from 
https://github.com/nathanlynch/vdsotest ?
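
For a quick check that needs nothing beyond libc, a minimal sketch along these lines could compare the time returned through libc (normally the vDSO path) with the raw system call. It is not part of vdsotest; the function names are made up for illustration, and it assumes struct timespec has the layout the kernel expects for SYS_clock_gettime (true on 64-bit, and on 32-bit with a 32-bit time_t).

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Convert a timespec to nanoseconds. */
static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Hypothetical helper: read CLOCK_REALTIME once through libc (which
 * normally goes through the vDSO) and once through the raw system call.
 * Returns the absolute difference in nanoseconds, or -1 if either call
 * fails.
 */
static int64_t vdso_vs_syscall_delta_ns(void)
{
    struct timespec via_libc, via_syscall;

    if (clock_gettime(CLOCK_REALTIME, &via_libc) != 0)
        return -1;
    if (syscall(SYS_clock_gettime, CLOCK_REALTIME, &via_syscall) != 0)
        return -1;

    int64_t delta = ts_to_ns(&via_syscall) - ts_to_ns(&via_libc);
    return delta < 0 ? -delta : delta;
}
```

Built once as a 32-bit and once as a 64-bit binary, a return value of -1 or a very large delta would point at the side that misbehaves. If the vDSO call itself hangs, as described above for ppc64le, the function never returns.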


Thanks
Christophe


Re: [PATCH v4] powerpc/fadump: fix race between pstore write and fadump crash trigger

2020-06-11 Thread kernel test robot
Hi Sourabh,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on linus/master linux/master v5.7 next-20200611]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Sourabh-Jain/powerpc-fadump-fix-race-between-pstore-write-and-fadump-crash-trigger/20200605-033545
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-randconfig-r002-20200612 (attached as .config)
compiler: powerpc64le-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>, old ones prefixed by <<):

arch/powerpc/kernel/fadump.c: In function 'crash_fadump':
>> arch/powerpc/kernel/fadump.c:700:15: error: 'cpus_in_crash' undeclared (first use in this function)
700 |   atomic_inc(&cpus_in_crash);
|   ^
arch/powerpc/kernel/fadump.c:700:15: note: each undeclared identifier is reported only once for each function it appears in
arch/powerpc/kernel/fadump.c: In function 'fadump_update_elfcore_header':
arch/powerpc/kernel/fadump.c:755:17: warning: variable 'elf' set but not used [-Wunused-but-set-variable]
755 |  struct elfhdr *elf;
| ^~~
arch/powerpc/kernel/fadump.c: At top level:
arch/powerpc/kernel/fadump.c:1706:22: warning: no previous prototype for 'arch_reserved_kernel_pages' [-Wmissing-prototypes]
1706 | unsigned long __init arch_reserved_kernel_pages(void)
|  ^~

vim +/cpus_in_crash +700 arch/powerpc/kernel/fadump.c

   678  
   679  void crash_fadump(struct pt_regs *regs, const char *str)
   680  {
   681          unsigned int msecs;
   682          struct fadump_crash_info_header *fdh = NULL;
   683          int old_cpu, this_cpu;
   684          unsigned int ncpus = num_online_cpus() - 1; /* Do not include first CPU */
   685  
   686          if (!should_fadump_crash())
   687                  return;
   688  
   689          /*
   690           * old_cpu == -1 means this is the first CPU which has come here,
   691           * go ahead and trigger fadump.
   692           *
   693           * old_cpu != -1 means some other CPU has already on it's way
   694           * to trigger fadump, just keep looping here.
   695           */
   696          this_cpu = smp_processor_id();
   697          old_cpu = cmpxchg(&crashing_cpu, -1, this_cpu);
   698  
   699          if (old_cpu != -1) {
 > 700                  atomic_inc(&cpus_in_crash);
   701  
   702                  /*
   703                   * We can't loop here indefinitely. Wait as long as fadump
   704                   * is in force. If we race with fadump un-registration this
   705                   * loop will break and then we go down to normal panic path
   706                   * and reboot. If fadump is in force the first crashing
   707                   * cpu will definitely trigger fadump.
   708                   */
   709                  while (fw_dump.dump_registered)
   710                          cpu_relax();
   711                  return;
   712          }
   713  
   714          fdh = __va(fw_dump.fadumphdr_addr);
   715          fdh->crashing_cpu = crashing_cpu;
   716          crash_save_vmcoreinfo();
   717  
   718          if (regs)
   719                  fdh->regs = *regs;
   720          else
   721                  ppc_save_regs(&fdh->regs);
   722  
   723          fdh->online_mask = *cpu_online_mask;
   724  
   725          /*
   726           * If we came in via system reset, wait a while for the secondary
   727           * CPUs to enter.
   728           */
   729          if (TRAP(&(fdh->regs)) == 0x100) {
   730                  msecs = CRASH_TIMEOUT;
   731                  while ((atomic_read(&cpus_in_crash) < ncpus) && (--msecs > 0))
   732                          mdelay(1);
   733          }
   734  
   735          fw_dump.ops->fadump_trigger(fdh, str);
   736  }
   737  
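
The listing above relies on a "first CPU wins" election: cmpxchg() on crashing_cpu lets exactly one CPU proceed, and every later CPU bumps cpus_in_crash and spins. A minimal userspace sketch of that pattern, using C11 atomics in place of the kernel primitives, might look like the following. The function name claim_crash() and the two variables here are illustrative stand-ins, not the kernel's definitions, and the sketch says nothing about why cpus_in_crash is undeclared in this particular config.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's crashing_cpu and cpus_in_crash. */
static atomic_int crashing_cpu = -1;
static atomic_int cpus_in_crash = 0;

/*
 * Returns true if this_cpu won the race and should trigger the dump.
 * Every later caller is counted in cpus_in_crash, so the winner can
 * wait until the expected number of CPUs has checked in.
 */
static bool claim_crash(int this_cpu)
{
    int expected = -1;

    /* Succeeds only while crashing_cpu still holds -1. */
    if (atomic_compare_exchange_strong(&crashing_cpu, &expected, this_cpu))
        return true;

    atomic_fetch_add(&cpus_in_crash, 1);
    return false;
}
```

In this model the first caller gets true and records its CPU number; later callers get false and increment the counter, which corresponds to the atomic_inc() flagged at line 700.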

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


[PATCH 18/18] powerpc/pseries: remove obsolete memory hotplug DT notifier code

2020-06-11 Thread Nathan Lynch
pseries_update_drconf_memory() runs from a DT notifier in response to
an update to the ibm,dynamic-memory property of the
/ibm,dynamic-reconfiguration-memory node. This property is an older,
less compact format than the ibm,dynamic-memory-v2 property used in
most currently supported firmwares. There has never been an equivalent
function for the v2 property.

pseries_update_drconf_memory() compares the 'assigned' flag for each
LMB in the old vs new properties and adds or removes the block
accordingly. However, it appears to be of no actual utility:

* Partition suspension and PRRNs are specified only to change LMBs'
  NUMA affinity information. This notifier should be a no-op for those
  scenarios since the assigned flags should not change.

* The memory hotplug/DLPAR path has a hack which short-circuits
  execution of the notifier:
     dlpar_memory()
        ...
        rtas_hp_event = true;
        drmem_update_dt()
           of_update_property()
              pseries_memory_notifier()
                 pseries_update_drconf_memory()
                    if (rtas_hp_event) return;

So this code only makes sense as a relic of the time when more of the
DLPAR workflow took place in user space. I don't see a purpose for it
now.
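
To make the argument concrete, here is a small userspace model of what the notifier does, assuming plain host-endian flag arrays instead of the big-endian of_drconf_cell_v1 entries. All names in it (first_lmb_change(), hp_event_in_progress, MEM_ASSIGNED and its value) are illustrative and not taken from the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_ASSIGNED 0x00000008u /* illustrative flag bit */

enum lmb_change { LMB_UNCHANGED, LMB_REMOVED, LMB_ADDED };

/* Models rtas_hp_event: set while the DLPAR path updates the property. */
static bool hp_event_in_progress;

/*
 * Report the first LMB whose 'assigned' flag differs between the old
 * and new property contents, storing its position in *index. While a
 * hotplug event is in progress the comparison is skipped entirely,
 * which mirrors the short-circuit in the notifier.
 */
static enum lmb_change first_lmb_change(const uint32_t *old_flags,
                                        const uint32_t *new_flags,
                                        size_t entries, size_t *index)
{
    if (hp_event_in_progress)
        return LMB_UNCHANGED;

    for (size_t i = 0; i < entries; i++) {
        bool was = (old_flags[i] & MEM_ASSIGNED) != 0;
        bool is = (new_flags[i] & MEM_ASSIGNED) != 0;

        if (was && !is) {
            *index = i;
            return LMB_REMOVED;
        }
        if (!was && is) {
            *index = i;
            return LMB_ADDED;
        }
    }
    return LMB_UNCHANGED;
}
```

In this model the function can only report a change when the flags differ and no hotplug event is in progress. Per the two points above, affinity-only updates leave the flags alone and the DLPAR path always sets the guard, so neither case reaches the add/remove branches.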

Signed-off-by: Nathan Lynch 
---
 .../platforms/pseries/hotplug-memory.c | 65 +--
 1 file changed, 1 insertion(+), 64 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 67ece3ac9ac2..73a5dcd977e1 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -22,8 +22,6 @@
 #include 
 #include "pseries.h"
 
-static bool rtas_hp_event;
-
 unsigned long pseries_memory_block_size(void)
 {
struct device_node *np;
@@ -871,11 +869,8 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
break;
}
 
-   if (!rc) {
-   rtas_hp_event = true;
+   if (!rc)
rc = drmem_update_dt();
-   rtas_hp_event = false;
-   }
 
unlock_device_hotplug();
return rc;
@@ -911,60 +906,6 @@ static int pseries_add_mem_node(struct device_node *np)
return (ret < 0) ? -EINVAL : 0;
 }
 
-static int pseries_update_drconf_memory(struct of_reconfig_data *pr)
-{
-   struct of_drconf_cell_v1 *new_drmem, *old_drmem;
-   unsigned long memblock_size;
-   u32 entries;
-   __be32 *p;
-   int i, rc = -EINVAL;
-
-   if (rtas_hp_event)
-   return 0;
-
-   memblock_size = pseries_memory_block_size();
-   if (!memblock_size)
-   return -EINVAL;
-
-   if (!pr->old_prop)
-   return 0;
-
-   p = (__be32 *) pr->old_prop->value;
-   if (!p)
-   return -EINVAL;
-
-   /* The first int of the property is the number of lmb's described
-* by the property. This is followed by an array of of_drconf_cell
-* entries. Get the number of entries and skip to the array of
-* of_drconf_cell's.
-*/
-   entries = be32_to_cpu(*p++);
-   old_drmem = (struct of_drconf_cell_v1 *)p;
-
-   p = (__be32 *)pr->prop->value;
-   p++;
-   new_drmem = (struct of_drconf_cell_v1 *)p;
-
-   for (i = 0; i < entries; i++) {
-   if ((be32_to_cpu(old_drmem[i].flags) & DRCONF_MEM_ASSIGNED) &&
-   (!(be32_to_cpu(new_drmem[i].flags) & DRCONF_MEM_ASSIGNED))) {
-   rc = pseries_remove_memblock(
-   be64_to_cpu(old_drmem[i].base_addr),
-memblock_size);
-   break;
-   } else if ((!(be32_to_cpu(old_drmem[i].flags) &
-   DRCONF_MEM_ASSIGNED)) &&
-   (be32_to_cpu(new_drmem[i].flags) &
-   DRCONF_MEM_ASSIGNED)) {
-   rc = memblock_add(be64_to_cpu(old_drmem[i].base_addr),
- memblock_size);
-   rc = (rc < 0) ? -EINVAL : 0;
-   break;
-   }
-   }
-   return rc;
-}
-
 static int pseries_memory_notifier(struct notifier_block *nb,
   unsigned long action, void *data)
 {
@@ -978,10 +919,6 @@ static int pseries_memory_notifier(struct notifier_block *nb,
case OF_RECONFIG_DETACH_NODE:
err = pseries_remove_mem_node(rd->dn);
break;
-   case OF_RECONFIG_UPDATE_PROPERTY:
-   if (!strcmp(rd->prop->name, "ibm,dynamic-memory"))
-   err = pseries_update_drconf_memory(rd);
-   break;
}
return notifier_from_errno(err);
 }
-- 
2.25.4



[PATCH 17/18] powerpc/pseries: remove dlpar_cpu_readd()

2020-06-11 Thread Nathan Lynch
dlpar_cpu_readd() is unused now.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/include/asm/topology.h  |  1 -
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 19 ---
 2 files changed, 20 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index b2c346c5e16f..f0b6300e7dd3 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -115,7 +115,6 @@ int get_physical_package_id(int cpu);
 #define topology_core_cpumask(cpu) (per_cpu(cpu_core_map, cpu))
 #define topology_core_id(cpu)  (cpu_to_core_id(cpu))
 
-int dlpar_cpu_readd(int cpu);
 #endif
 #endif
 
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index dbfabb185eb5..4bad7a83addc 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -779,25 +779,6 @@ static int dlpar_cpu_add_by_count(u32 cpus_to_add)
return rc;
 }
 
-int dlpar_cpu_readd(int cpu)
-{
-   struct device_node *dn;
-   struct device *dev;
-   u32 drc_index;
-   int rc;
-
-   dev = get_cpu_device(cpu);
-   dn = dev->of_node;
-
-   rc = of_property_read_u32(dn, "ibm,my-drc-index", &drc_index);
-
-   rc = dlpar_cpu_remove_by_index(drc_index);
-   if (!rc)
-   rc = dlpar_cpu_add(drc_index);
-
-   return rc;
-}
-
 int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
 {
u32 count, drc_index;
-- 
2.25.4



[PATCH 15/18] powerpc/pseries: remove prrn special case from DT update path

2020-06-11 Thread Nathan Lynch
pseries_devicetree_update() is no longer called with PRRN_SCOPE. The
purpose of prrn_update_node() was to remove and then add back a LMB
whose NUMA assignment had changed. This has never been reliable, and
this codepath has been default-disabled for several releases. Remove
prrn_update_node().

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/platforms/pseries/mobility.c | 27 ---
 1 file changed, 27 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index c0b09f6f0ae3..78cd772a579b 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -244,29 +244,6 @@ static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
return rc;
 }
 
-static void prrn_update_node(__be32 phandle)
-{
-   struct pseries_hp_errorlog hp_elog;
-   struct device_node *dn;
-
-   /*
-* If a node is found from a the given phandle, the phandle does not
-* represent the drc index of an LMB and we can ignore.
-*/
-   dn = of_find_node_by_phandle(be32_to_cpu(phandle));
-   if (dn) {
-   of_node_put(dn);
-   return;
-   }
-
-   hp_elog.resource = PSERIES_HP_ELOG_RESOURCE_MEM;
-   hp_elog.action = PSERIES_HP_ELOG_ACTION_READD;
-   hp_elog.id_type = PSERIES_HP_ELOG_ID_DRC_INDEX;
-   hp_elog._drc_u.drc_index = phandle;
-
-   handle_dlpar_errorlog(&hp_elog);
-}
-
 int pseries_devicetree_update(s32 scope)
 {
char *rtas_buf;
@@ -305,10 +282,6 @@ int pseries_devicetree_update(s32 scope)
break;
case UPDATE_DT_NODE:
update_dt_node(phandle, scope);
-
-   if (scope == PRRN_SCOPE)
-   prrn_update_node(phandle);
-
break;
case ADD_DT_NODE:
drc_index = *data++;
-- 
2.25.4



[PATCH 14/18] powerpc/numa: remove arch_update_cpu_topology

2020-06-11 Thread Nathan Lynch
Since arch_update_cpu_topology() doesn't do anything on powerpc now,
remove it and associated dead code.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/include/asm/topology.h |  6 --
 arch/powerpc/mm/numa.c  | 10 --
 2 files changed, 16 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 658aad65912b..b2c346c5e16f 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -43,7 +43,6 @@ extern void __init dump_numa_cpu_topology(void);
 
 extern int sysfs_add_device_to_node(struct device *dev, int nid);
 extern void sysfs_remove_device_from_node(struct device *dev, int nid);
-extern int numa_update_cpu_topology(bool cpus_locked);
 
 static inline void update_numa_cpu_lookup_table(unsigned int cpu, int node)
 {
@@ -78,11 +77,6 @@ static inline void sysfs_remove_device_from_node(struct device *dev,
 {
 }
 
-static inline int numa_update_cpu_topology(bool cpus_locked)
-{
-   return 0;
-}
-
 static inline void update_numa_cpu_lookup_table(unsigned int cpu, int node) {}
 
 static inline int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 26fcc947dd2d..e437a9ac4956 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1200,16 +1200,6 @@ int find_and_online_cpu_nid(int cpu)
return new_nid;
 }
 
-int numa_update_cpu_topology(bool cpus_locked)
-{
-   return 0;
-}
-
-int arch_update_cpu_topology(void)
-{
-   return numa_update_cpu_topology(true);
-}
-
 static int topology_update_init(void)
 {
topology_inited = 1;
-- 
2.25.4



[PATCH 12/18] powerpc/rtasd: simplify handle_rtas_event(), emit message on events

2020-06-11 Thread Nathan Lynch
prrn_is_enabled() always returns false/0, so handle_rtas_event() can
be simplified and some dead code can be removed. Use machine_is()
instead of #ifdef to run this code only on pseries, and add an
informational ratelimited message that we are ignoring the
events. PRRN events are relatively rare in normal operation and
usually arise from operator-initiated actions such as a DPO (Dynamic
Platform Optimizer) run.

Eventually we do want to consume these events and update the device
tree, but that needs more care to be safe vs LPM and DLPAR.
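
For readers unfamiliar with rate-limited logging, the idea behind a ratelimited message is roughly "at most N messages per time window". The following is a userspace sketch of that idea only; it is not the kernel's implementation, and the struct, the function name and the 5000 ms / 10 message figures used with it are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative interval/burst limiter state. */
struct ratelimit {
    uint64_t interval_ms;  /* length of one window */
    unsigned int burst;    /* messages allowed per window */
    uint64_t window_start; /* start time of the current window */
    unsigned int printed;  /* messages emitted in this window */
};

/* Returns true if a message may be emitted at time now_ms. */
static bool ratelimit_allow(struct ratelimit *rl, uint64_t now_ms)
{
    /* Open a fresh window once the previous one has expired. */
    if (now_ms - rl->window_start >= rl->interval_ms) {
        rl->window_start = now_ms;
        rl->printed = 0;
    }

    if (rl->printed >= rl->burst)
        return false; /* over budget: suppress the message */

    rl->printed++;
    return true;
}
```

With a limiter initialised as { 5000, 10, 0, 0 }, the first ten calls inside a window return true and further calls return false until the window rolls over. That is the behaviour wanted for an event that is normally rare but could arrive in bursts.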

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/kernel/rtasd.c | 28 +++-
 1 file changed, 3 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 89b798f8f656..8561dfb33f24 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -273,37 +273,15 @@ void pSeries_log_error(char *buf, unsigned int err_type, int fatal)
}
 }
 
-#ifdef CONFIG_PPC_PSERIES
-static void handle_prrn_event(s32 scope)
-{
-   /*
-* For PRRN, we must pass the negative of the scope value in
-* the RTAS event.
-*/
-   pseries_devicetree_update(-scope);
-   numa_update_cpu_topology(false);
-}
-
 static void handle_rtas_event(const struct rtas_error_log *log)
 {
-   if (rtas_error_type(log) != RTAS_TYPE_PRRN || !prrn_is_enabled())
+   if (!machine_is(pseries))
return;
 
-   /* For PRRN Events the extended log length is used to denote
-* the scope for calling rtas update-nodes.
-*/
-   handle_prrn_event(rtas_error_extended_log_length(log));
+   if (rtas_error_type(log) == RTAS_TYPE_PRRN)
+   pr_info_ratelimited("Platform resource reassignment ignored.\n");
 }
 
-#else
-
-static void handle_rtas_event(const struct rtas_error_log *log)
-{
-   return;
-}
-
-#endif
-
 static int rtas_log_open(struct inode * inode, struct file * file)
 {
return 0;
-- 
2.25.4



[PATCH 13/18] powerpc/numa: remove prrn_is_enabled()

2020-06-11 Thread Nathan Lynch
All users of this prrn_is_enabled() are gone; remove it.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/include/asm/topology.h | 5 -
 arch/powerpc/mm/numa.c  | 5 -
 2 files changed, 10 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 537c638582eb..658aad65912b 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -93,13 +93,8 @@ static inline int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
 #endif /* CONFIG_NUMA */
 
 #if defined(CONFIG_NUMA) && defined(CONFIG_PPC_SPLPAR)
-extern int prrn_is_enabled(void);
 extern int find_and_online_cpu_nid(int cpu);
 #else
-static inline int prrn_is_enabled(void)
-{
-   return 0;
-}
 static inline int find_and_online_cpu_nid(int cpu)
 {
return 0;
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index dec7ce3b5e67..26fcc947dd2d 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1210,11 +1210,6 @@ int arch_update_cpu_topology(void)
return numa_update_cpu_topology(true);
 }
 
-int prrn_is_enabled(void)
-{
-   return 0;
-}
-
 static int topology_update_init(void)
 {
topology_inited = 1;
-- 
2.25.4



[PATCH 10/18] powerpc/numa: remove timed_topology_update()

2020-06-11 Thread Nathan Lynch
timed_topology_update is a no-op now, so remove it and all call sites.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/include/asm/topology.h  | 5 -
 arch/powerpc/mm/numa.c   | 9 -
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 2 --
 3 files changed, 16 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 2db7ba789720..379e2cc3789f 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -97,7 +97,6 @@ extern int start_topology_update(void);
 extern int stop_topology_update(void);
 extern int prrn_is_enabled(void);
 extern int find_and_online_cpu_nid(int cpu);
-extern int timed_topology_update(int nsecs);
 #else
 static inline int start_topology_update(void)
 {
@@ -115,10 +114,6 @@ static inline int find_and_online_cpu_nid(int cpu)
 {
return 0;
 }
-static inline int timed_topology_update(int nsecs)
-{
-   return 0;
-}
 
 #endif /* CONFIG_NUMA && CONFIG_PPC_SPLPAR */
 
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index b220e5b60140..6c579ac3e679 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1124,14 +1124,6 @@ u64 memory_hotplug_max(void)
 #ifdef CONFIG_PPC_SPLPAR
 static int topology_inited;
 
-/*
- * Change polling interval for associativity changes.
- */
-int timed_topology_update(int nsecs)
-{
-   return 0;
-}
-
 /*
  * Retrieve the new associativity information for a virtual processor's
  * home node.
@@ -1147,7 +1139,6 @@ static long vphn_get_associativity(unsigned long cpu,
switch (rc) {
case H_SUCCESS:
dbg("VPHN hcall succeeded. Reset polling...\n");
-   timed_topology_update(0);
goto out;
 
case H_FUNCTION:
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index d4b346355bb9..dbfabb185eb5 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -263,7 +263,6 @@ static int dlpar_offline_cpu(struct device_node *dn)
break;
 
cpu_maps_update_done();
-   timed_topology_update(1);
rc = device_offline(get_cpu_device(cpu));
if (rc)
goto out;
@@ -302,7 +301,6 @@ static int dlpar_online_cpu(struct device_node *dn)
if (get_hard_smp_processor_id(cpu) != thread)
continue;
cpu_maps_update_done();
-   timed_topology_update(1);
find_and_online_cpu_nid(cpu);
rc = device_online(get_cpu_device(cpu));
if (rc) {
-- 
2.25.4



[PATCH 11/18] powerpc/numa: remove start/stop_topology_update()

2020-06-11 Thread Nathan Lynch
These APIs have become no-ops, so remove them and all call sites.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/include/asm/topology.h   | 10 --
 arch/powerpc/mm/numa.c| 20 
 arch/powerpc/platforms/pseries/mobility.c |  4 
 arch/powerpc/platforms/pseries/suspend.c  |  5 +
 4 files changed, 1 insertion(+), 38 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 379e2cc3789f..537c638582eb 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -93,19 +93,9 @@ static inline int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
 #endif /* CONFIG_NUMA */
 
 #if defined(CONFIG_NUMA) && defined(CONFIG_PPC_SPLPAR)
-extern int start_topology_update(void);
-extern int stop_topology_update(void);
 extern int prrn_is_enabled(void);
 extern int find_and_online_cpu_nid(int cpu);
 #else
-static inline int start_topology_update(void)
-{
-   return 0;
-}
-static inline int stop_topology_update(void)
-{
-   return 0;
-}
 static inline int prrn_is_enabled(void)
 {
return 0;
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 6c579ac3e679..dec7ce3b5e67 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1157,8 +1157,6 @@ static long vphn_get_associativity(unsigned long cpu,
, rc);
break;
}
-
-   stop_topology_update();
 out:
return rc;
 }
@@ -1212,22 +1210,6 @@ int arch_update_cpu_topology(void)
return numa_update_cpu_topology(true);
 }
 
-/*
- * Start polling for associativity changes.
- */
-int start_topology_update(void)
-{
-   return 0;
-}
-
-/*
- * Disable polling for VPHN associativity changes.
- */
-int stop_topology_update(void)
-{
-   return 0;
-}
-
 int prrn_is_enabled(void)
 {
return 0;
@@ -1235,8 +1217,6 @@ int prrn_is_enabled(void)
 
 static int topology_update_init(void)
 {
-   start_topology_update();
-
topology_inited = 1;
return 0;
 }
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 10d982997736..c0b09f6f0ae3 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -388,8 +388,6 @@ static ssize_t migration_store(struct class *class,
if (rc)
return rc;
 
-   stop_topology_update();
-
do {
rc = rtas_ibm_suspend_me(streamid);
if (rc == -EAGAIN)
@@ -401,8 +399,6 @@ static ssize_t migration_store(struct class *class,
 
post_mobility_fixup();
 
-   start_topology_update();
-
return count;
 }
 
diff --git a/arch/powerpc/platforms/pseries/suspend.c b/arch/powerpc/platforms/pseries/suspend.c
index f789693f61f4..81e0ac58d620 100644
--- a/arch/powerpc/platforms/pseries/suspend.c
+++ b/arch/powerpc/platforms/pseries/suspend.c
@@ -145,11 +145,8 @@ static ssize_t store_hibernate(struct device *dev,
ssleep(1);
} while (rc == -EAGAIN);
 
-   if (!rc) {
-   stop_topology_update();
+   if (!rc)
rc = pm_suspend(PM_SUSPEND_MEM);
-   start_topology_update();
-   }
 
stream_id = 0;
 
-- 
2.25.4



[PATCH 09/18] powerpc/numa: stub out numa_update_cpu_topology()

2020-06-11 Thread Nathan Lynch
Previous changes have removed the code which sets bits in
cpu_associativity_changes_mask and thus it is never modified at
runtime. From this we can reason that numa_update_cpu_topology()
always returns 0 without doing anything. Remove the body of
numa_update_cpu_topology() and remove all code which becomes
unreachable as a result.
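
The reasoning can be illustrated with a small userspace model in which a 64-bit integer stands in for the cpumask. The names changes_mask, mask_weight() and update_topology_model() are illustrative and not kernel symbols.

```c
#include <stdint.h>

/* Stands in for cpu_associativity_changes_mask; nothing ever writes it. */
static uint64_t changes_mask;

/* Count the set bits in the mask, as cpumask_weight() does for a cpumask. */
static int mask_weight(uint64_t mask)
{
    int n = 0;

    while (mask) {
        mask &= mask - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Models the early exit at the top of the topology update path. */
static int update_topology_model(void)
{
    if (mask_weight(changes_mask) == 0)
        return 0; /* always taken while no code sets a bit */

    return 1; /* stands in for the update work, now unreachable */
}
```

Because no remaining code sets a bit in the mask, the weight is always zero and the early return is always taken. Everything after it is dead code, which is why the body can be reduced to a bare "return 0".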

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/mm/numa.c | 193 +
 1 file changed, 1 insertion(+), 192 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 8749d7f2b1a6..b220e5b60140 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1122,14 +1122,6 @@ u64 memory_hotplug_max(void)
 
 /* Virtual Processor Home Node (VPHN) support */
 #ifdef CONFIG_PPC_SPLPAR
-struct topology_update_data {
-   struct topology_update_data *next;
-   unsigned int cpu;
-   int old_nid;
-   int new_nid;
-};
-
-static cpumask_t cpu_associativity_changes_mask;
 static int topology_inited;
 
 /*
@@ -1219,192 +1211,9 @@ int find_and_online_cpu_nid(int cpu)
return new_nid;
 }
 
-/*
- * Update the CPU maps and sysfs entries for a single CPU when its NUMA
- * characteristics change. This function doesn't perform any locking and is
- * only safe to call from stop_machine().
- */
-static int update_cpu_topology(void *data)
-{
-   struct topology_update_data *update;
-   unsigned long cpu;
-
-   if (!data)
-   return -EINVAL;
-
-   cpu = smp_processor_id();
-
-   for (update = data; update; update = update->next) {
-   int new_nid = update->new_nid;
-   if (cpu != update->cpu)
-   continue;
-
-   unmap_cpu_from_node(cpu);
-   map_cpu_to_node(cpu, new_nid);
-   set_cpu_numa_node(cpu, new_nid);
-   set_cpu_numa_mem(cpu, local_memory_node(new_nid));
-   vdso_getcpu_init();
-   }
-
-   return 0;
-}
-
-static int update_lookup_table(void *data)
-{
-   struct topology_update_data *update;
-
-   if (!data)
-   return -EINVAL;
-
-   /*
-* Upon topology update, the numa-cpu lookup table needs to be updated
-* for all threads in the core, including offline CPUs, to ensure that
-* future hotplug operations respect the cpu-to-node associativity
-* properly.
-*/
-   for (update = data; update; update = update->next) {
-   int nid, base, j;
-
-   nid = update->new_nid;
-   base = cpu_first_thread_sibling(update->cpu);
-
-   for (j = 0; j < threads_per_core; j++) {
-   update_numa_cpu_lookup_table(base + j, nid);
-   }
-   }
-
-   return 0;
-}
-
-/*
- * Update the node maps and sysfs entries for each cpu whose home node
- * has changed. Returns 1 when the topology has changed, and 0 otherwise.
- *
- * cpus_locked says whether we already hold cpu_hotplug_lock.
- */
 int numa_update_cpu_topology(bool cpus_locked)
 {
-   unsigned int cpu, sibling, changed = 0;
-   struct topology_update_data *updates, *ud;
-   cpumask_t updated_cpus;
-   struct device *dev;
-   int weight, new_nid, i = 0;
-
-   if (topology_inited)
-   return 0;
-
-   weight = cpumask_weight(&cpu_associativity_changes_mask);
-   if (!weight)
-   return 0;
-
-   updates = kcalloc(weight, sizeof(*updates), GFP_KERNEL);
-   if (!updates)
-   return 0;
-
-   cpumask_clear(&updated_cpus);
-
-   for_each_cpu(cpu, &cpu_associativity_changes_mask) {
-   /*
-* If siblings aren't flagged for changes, updates list
-* will be too short. Skip on this update and set for next
-* update.
-*/
-   if (!cpumask_subset(cpu_sibling_mask(cpu),
-   &cpu_associativity_changes_mask)) {
-   pr_info("Sibling bits not set for associativity "
-   "change, cpu%d\n", cpu);
-   cpumask_or(&cpu_associativity_changes_mask,
-   &cpu_associativity_changes_mask,
-   cpu_sibling_mask(cpu));
-   cpu = cpu_last_thread_sibling(cpu);
-   continue;
-   }
-
-   new_nid = find_and_online_cpu_nid(cpu);
-
-   if (new_nid == numa_cpu_lookup_table[cpu]) {
-   cpumask_andnot(&cpu_associativity_changes_mask,
-   &cpu_associativity_changes_mask,
-   cpu_sibling_mask(cpu));
-   dbg("Assoc chg gives same node %d for cpu%d\n",
-   new_nid, cpu);
-   cpu = cpu_last_thread_sibling(cpu);
-   continue;
-   }
-
-   

[PATCH 08/18] powerpc/numa: remove vphn_enabled and prrn_enabled internal flags

2020-06-11 Thread Nathan Lynch
These flags are always zero now; remove them and suitably adjust the
remaining references to them.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/mm/numa.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 8415481a7f13..8749d7f2b1a6 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1130,8 +1130,6 @@ struct topology_update_data {
 };
 
 static cpumask_t cpu_associativity_changes_mask;
-static const int vphn_enabled;
-static const int prrn_enabled;
 static int topology_inited;
 
 /*
@@ -1292,7 +1290,7 @@ int numa_update_cpu_topology(bool cpus_locked)
struct device *dev;
int weight, new_nid, i = 0;
 
-   if (!prrn_enabled && !vphn_enabled && topology_inited)
+   if (topology_inited)
return 0;
 
weight = cpumask_weight(&cpu_associativity_changes_mask);
@@ -1432,7 +1430,7 @@ int stop_topology_update(void)
 
 int prrn_is_enabled(void)
 {
-   return prrn_enabled;
+   return 0;
 }
 
 static int topology_update_init(void)
-- 
2.25.4



[PATCH 07/18] powerpc/numa: remove unreachable topology workqueue code

2020-06-11 Thread Nathan Lynch
Since vphn_enabled is always 0, we can remove the call to
topology_schedule_update() and remove the code which becomes
unreachable as a result.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/mm/numa.c | 14 --
 1 file changed, 14 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 6207297490a8..8415481a7f13 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1414,17 +1414,6 @@ int arch_update_cpu_topology(void)
return numa_update_cpu_topology(true);
 }
 
-static void topology_work_fn(struct work_struct *work)
-{
-   rebuild_sched_domains();
-}
-static DECLARE_WORK(topology_work, topology_work_fn);
-
-static void topology_schedule_update(void)
-{
-   schedule_work(&topology_work);
-}
-
 /*
  * Start polling for associativity changes.
  */
@@ -1450,9 +1439,6 @@ static int topology_update_init(void)
 {
start_topology_update();
 
-   if (vphn_enabled)
-   topology_schedule_update();
-
topology_inited = 1;
return 0;
 }
-- 
2.25.4



[PATCH 06/18] powerpc/numa: remove unreachable topology timer code

2020-06-11 Thread Nathan Lynch
Since vphn_enabled is always 0, we can stub out
timed_topology_update() and remove the code which becomes unreachable.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/mm/numa.c | 21 ---------------------
 1 file changed, 21 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 1b89bacb8975..6207297490a8 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1129,13 +1129,9 @@ struct topology_update_data {
int new_nid;
 };
 
-#define TOPOLOGY_DEF_TIMER_SECS 60
-
 static cpumask_t cpu_associativity_changes_mask;
 static const int vphn_enabled;
 static const int prrn_enabled;
-static void reset_topology_timer(void);
-static int topology_timer_secs = 1;
 static int topology_inited;
 
 /*
@@ -1143,15 +1139,6 @@ static int topology_inited;
  */
 int timed_topology_update(int nsecs)
 {
-   if (vphn_enabled) {
-   if (nsecs > 0)
-   topology_timer_secs = nsecs;
-   else
-   topology_timer_secs = TOPOLOGY_DEF_TIMER_SECS;
-
-   reset_topology_timer();
-   }
-
return 0;
 }
 
@@ -1438,14 +1425,6 @@ static void topology_schedule_update(void)
schedule_work(&topology_work);
 }
 
-static struct timer_list topology_timer;
-
-static void reset_topology_timer(void)
-{
-   if (vphn_enabled)
-   mod_timer(&topology_timer, jiffies + topology_timer_secs * HZ);
-}
-
 /*
  * Start polling for associativity changes.
  */
-- 
2.25.4



[PATCH 05/18] powerpc/numa: make vphn_enabled, prrn_enabled flags const

2020-06-11 Thread Nathan Lynch
Previous changes have made it so these flags are never changed;
enforce this by making them const.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/mm/numa.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 9e20f12e6caf..1b89bacb8975 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1132,8 +1132,8 @@ struct topology_update_data {
 #define TOPOLOGY_DEF_TIMER_SECS 60
 
 static cpumask_t cpu_associativity_changes_mask;
-static int vphn_enabled;
-static int prrn_enabled;
+static const int vphn_enabled;
+static const int prrn_enabled;
 static void reset_topology_timer(void);
 static int topology_timer_secs = 1;
 static int topology_inited;
-- 
2.25.4



[PATCH 04/18] powerpc/numa: remove unreachable topology update code

2020-06-11 Thread Nathan Lynch
Since the topology_updates_enabled flag is now always false, remove it
and the code which has become unreachable. This is the minimum change
that prevents 'defined but unused' warnings emitted by the compiler
after stubbing out the start/stop_topology_updates() functions.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/mm/numa.c | 149 +
 1 file changed, 2 insertions(+), 147 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 34d95de77bdd..9e20f12e6caf 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -984,8 +984,6 @@ static int __init early_numa(char *p)
 }
 early_param("numa", early_numa);
 
-static const bool topology_updates_enabled;
-
 #ifdef CONFIG_MEMORY_HOTPLUG
 /*
  * Find the node associated with a hot added memory section for
@@ -1133,7 +1131,6 @@ struct topology_update_data {
 
 #define TOPOLOGY_DEF_TIMER_SECS 60
 
-static u8 vphn_cpu_change_counts[NR_CPUS][MAX_DISTANCE_REF_POINTS];
 static cpumask_t cpu_associativity_changes_mask;
 static int vphn_enabled;
 static int prrn_enabled;
@@ -1158,63 +1155,6 @@ int timed_topology_update(int nsecs)
return 0;
 }
 
-/*
- * Store the current values of the associativity change counters in the
- * hypervisor.
- */
-static void setup_cpu_associativity_change_counters(void)
-{
-   int cpu;
-
-   /* The VPHN feature supports a maximum of 8 reference points */
-   BUILD_BUG_ON(MAX_DISTANCE_REF_POINTS > 8);
-
-   for_each_possible_cpu(cpu) {
-   int i;
-   u8 *counts = vphn_cpu_change_counts[cpu];
-   volatile u8 *hypervisor_counts = lppaca_of(cpu).vphn_assoc_counts;
-
-   for (i = 0; i < distance_ref_points_depth; i++)
-   counts[i] = hypervisor_counts[i];
-   }
-}
-
-/*
- * The hypervisor maintains a set of 8 associativity change counters in
- * the VPA of each cpu that correspond to the associativity levels in the
- * ibm,associativity-reference-points property. When an associativity
- * level changes, the corresponding counter is incremented.
- *
- * Set a bit in cpu_associativity_changes_mask for each cpu whose home
- * node associativity levels have changed.
- *
- * Returns the number of cpus with unhandled associativity changes.
- */
-static int update_cpu_associativity_changes_mask(void)
-{
-   int cpu;
-   cpumask_t *changes = &cpu_associativity_changes_mask;
-
-   for_each_possible_cpu(cpu) {
-   int i, changed = 0;
-   u8 *counts = vphn_cpu_change_counts[cpu];
-   volatile u8 *hypervisor_counts = lppaca_of(cpu).vphn_assoc_counts;
-
-   for (i = 0; i < distance_ref_points_depth; i++) {
-   if (hypervisor_counts[i] != counts[i]) {
-   counts[i] = hypervisor_counts[i];
-   changed = 1;
-   }
-   }
-   if (changed) {
-   cpumask_or(changes, changes, cpu_sibling_mask(cpu));
-   cpu = cpu_last_thread_sibling(cpu);
-   }
-   }
-
-   return cpumask_weight(changes);
-}
-
 /*
  * Retrieve the new associativity information for a virtual processor's
  * home node.
@@ -1498,16 +1438,6 @@ static void topology_schedule_update(void)
schedule_work(&topology_work);
 }
 
-static void topology_timer_fn(struct timer_list *unused)
-{
-   if (prrn_enabled && cpumask_weight(&cpu_associativity_changes_mask))
-   topology_schedule_update();
-   else if (vphn_enabled) {
-   if (update_cpu_associativity_changes_mask() > 0)
-   topology_schedule_update();
-   reset_topology_timer();
-   }
-}
 static struct timer_list topology_timer;
 
 static void reset_topology_timer(void)
@@ -1516,69 +1446,12 @@ static void reset_topology_timer(void)
mod_timer(&topology_timer, jiffies + topology_timer_secs * HZ);
 }
 
-#ifdef CONFIG_SMP
-
-static int dt_update_callback(struct notifier_block *nb,
-   unsigned long action, void *data)
-{
-   struct of_reconfig_data *update = data;
-   int rc = NOTIFY_DONE;
-
-   switch (action) {
-   case OF_RECONFIG_UPDATE_PROPERTY:
-   if (of_node_is_type(update->dn, "cpu") &&
-   !of_prop_cmp(update->prop->name, "ibm,associativity")) {
-   u32 core_id;
-   of_property_read_u32(update->dn, "reg", &core_id);
-   rc = dlpar_cpu_readd(core_id);
-   rc = NOTIFY_OK;
-   }
-   break;
-   }
-
-   return rc;
-}
-
-static struct notifier_block dt_update_nb = {
-   .notifier_call = dt_update_callback,
-};
-
-#endif
-
 /*
  * Start polling for associativity changes.
  */
 int start_topology_update(void)
 {
-   int rc = 0;
-
-   if (!topology_updates_enabled)
-   return 0;
-
-   

[PATCH 02/18] powerpc/rtas: don't online CPUs for partition suspend

2020-06-11 Thread Nathan Lynch
Partition suspension, used for hibernation and migration, requires
that the OS place all but one of the LPAR's processor threads into one
of two states prior to calling the ibm,suspend-me RTAS function:

  * the architected offline state (via RTAS stop-self); or
  * the H_JOIN hcall, which does not return until the partition
resumes execution

Using H_CEDE as the offline mode, introduced by
commit 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU into
an appropriate offline state"), means that any threads which are
offline from Linux's point of view must be moved to one of those two
states before a partition suspension can proceed.

This was eventually addressed in commit 120496ac2d2d ("powerpc: Bring
all threads online prior to migration/hibernation"), which added code
to temporarily bring up any offline processor threads so they can call
H_JOIN. Conceptually this is fine, but the implementation has had
multiple races with cpu hotplug operations initiated from user
space[1][2][3], the error handling is fragile, and it generates
user-visible cpu hotplug events which is a lot of noise for a platform
feature that's supposed to minimize disruption to workloads.

With commit 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU
into an appropriate offline state") reverted, this code becomes
unnecessary, so remove it. Since any offline CPUs now are truly
offline from the platform's point of view, it is no longer necessary
to bring up CPUs only to have them call H_JOIN and then go offline
again upon resuming. Only active threads are required to call H_JOIN;
stopped threads can be left alone.
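
As a rough illustration of the rule described above (this is not code from
the patch), the precondition for calling ibm,suspend-me can be modeled in
userspace C. The enum, its values, and suspend_allowed() are hypothetical
names invented for this sketch; THREAD_CEDED stands in for a thread left in
H_CEDE by the old cede offline mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a processor thread's platform-visible state. */
enum thread_state {
	THREAD_RUNNING,	/* executing normally */
	THREAD_CEDED,	/* parked in H_CEDE (old "cede offline" mode) */
	THREAD_STOPPED,	/* offlined via RTAS stop-self */
	THREAD_JOINED,	/* waiting in H_JOIN until the partition resumes */
};

/*
 * Return true if the partition may call ibm,suspend-me: exactly one
 * thread is still running and every other thread is stopped or joined.
 */
bool suspend_allowed(const enum thread_state *threads, size_t n)
{
	size_t running = 0;

	assert(threads != NULL);

	for (size_t i = 0; i < n; i++) {
		switch (threads[i]) {
		case THREAD_RUNNING:
			running++;
			break;
		case THREAD_STOPPED:
		case THREAD_JOINED:
			break;
		case THREAD_CEDED:
			/* Neither architected-offline nor joined. */
			return false;
		}
	}

	return running == 1;
}
```

In this model a ceded thread blocks suspension, which is why the old code
had to online such threads first. With stop-self as the only offline mode,
that case no longer arises.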

[1] commit a6717c01ddc2 ("powerpc/rtas: use device model APIs and
serialization during LPM")
[2] commit 9fb603050ffd ("powerpc/rtas: retry when cpu offline races
with suspend/migration")
[3] commit dfd718a2ed1f ("powerpc/rtas: Fix a potential race between
CPU-Offline & Migration")

Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation")
Signed-off-by: Nathan Lynch 
---
 arch/powerpc/include/asm/rtas.h  |   2 -
 arch/powerpc/kernel/rtas.c   | 122 +--
 arch/powerpc/platforms/pseries/suspend.c |  22 +---
 3 files changed, 3 insertions(+), 143 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 014968f25f7e..0107d724e9da 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -253,8 +253,6 @@ extern int rtas_set_indicator_fast(int indicator, int index, int new_value);
 extern void rtas_progress(char *s, unsigned short hex);
 extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data);
 extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data);
-extern int rtas_online_cpus_mask(cpumask_var_t cpus);
-extern int rtas_offline_cpus_mask(cpumask_var_t cpus);
 extern int rtas_ibm_suspend_me(u64 handle);
 
 struct rtc_time;
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index a09eba03f180..806d554ce357 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -843,96 +843,6 @@ static void rtas_percpu_suspend_me(void *info)
__rtas_suspend_cpu((struct rtas_suspend_me_data *)info, 1);
 }
 
-enum rtas_cpu_state {
-   DOWN,
-   UP,
-};
-
-#ifndef CONFIG_SMP
-static int rtas_cpu_state_change_mask(enum rtas_cpu_state state,
-   cpumask_var_t cpus)
-{
-   if (!cpumask_empty(cpus)) {
-   cpumask_clear(cpus);
-   return -EINVAL;
-   } else
-   return 0;
-}
-#else
-/* On return cpumask will be altered to indicate CPUs changed.
- * CPUs with states changed will be set in the mask,
- * CPUs with status unchanged will be unset in the mask. */
-static int rtas_cpu_state_change_mask(enum rtas_cpu_state state,
-   cpumask_var_t cpus)
-{
-   int cpu;
-   int cpuret = 0;
-   int ret = 0;
-
-   if (cpumask_empty(cpus))
-   return 0;
-
-   for_each_cpu(cpu, cpus) {
-   struct device *dev = get_cpu_device(cpu);
-
-   switch (state) {
-   case DOWN:
-   cpuret = device_offline(dev);
-   break;
-   case UP:
-   cpuret = device_online(dev);
-   break;
-   }
-   if (cpuret < 0) {
-   pr_debug("%s: cpu_%s for cpu#%d returned %d.\n",
-   __func__,
-   ((state == UP) ? "up" : "down"),
-   cpu, cpuret);
-   if (!ret)
-   ret = cpuret;
-   if (state == UP) {
-   /* clear bits for unchanged cpus, return */
-   cpumask_shift_right(cpus, cpus, cpu);
-   

[PATCH 03/18] powerpc/numa: remove ability to enable topology updates

2020-06-11 Thread Nathan Lynch
Remove the /proc/powerpc/topology_updates interface and the
topology_updates=on/off command line argument. The internal
topology_updates_enabled flag remains for now, but always false.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/mm/numa.c | 71 +-
 1 file changed, 1 insertion(+), 70 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 9fcf2d195830..34d95de77bdd 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -984,27 +984,7 @@ static int __init early_numa(char *p)
 }
 early_param("numa", early_numa);
 
-/*
- * The platform can inform us through one of several mechanisms
- * (post-migration device tree updates, PRRN or VPHN) that the NUMA
- * assignment of a resource has changed. This controls whether we act
- * on that. Disabled by default.
- */
-static bool topology_updates_enabled;
-
-static int __init early_topology_updates(char *p)
-{
-   if (!p)
-   return 0;
-
-   if (!strcmp(p, "on")) {
-   pr_warn("Caution: enabling topology updates\n");
-   topology_updates_enabled = true;
-   }
-
-   return 0;
-}
-early_param("topology_updates", early_topology_updates);
+static const bool topology_updates_enabled;
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 /*
@@ -1632,52 +1612,6 @@ int prrn_is_enabled(void)
return prrn_enabled;
 }
 
-static int topology_read(struct seq_file *file, void *v)
-{
-   if (vphn_enabled || prrn_enabled)
-   seq_puts(file, "on\n");
-   else
-   seq_puts(file, "off\n");
-
-   return 0;
-}
-
-static int topology_open(struct inode *inode, struct file *file)
-{
-   return single_open(file, topology_read, NULL);
-}
-
-static ssize_t topology_write(struct file *file, const char __user *buf,
- size_t count, loff_t *off)
-{
-   char kbuf[4]; /* "on" or "off" plus null. */
-   int read_len;
-
-   read_len = count < 3 ? count : 3;
-   if (copy_from_user(kbuf, buf, read_len))
-   return -EINVAL;
-
-   kbuf[read_len] = '\0';
-
-   if (!strncmp(kbuf, "on", 2)) {
-   topology_updates_enabled = true;
-   start_topology_update();
-   } else if (!strncmp(kbuf, "off", 3)) {
-   stop_topology_update();
-   topology_updates_enabled = false;
-   } else
-   return -EINVAL;
-
-   return count;
-}
-
-static const struct proc_ops topology_proc_ops = {
-   .proc_read  = seq_read,
-   .proc_write = topology_write,
-   .proc_open  = topology_open,
-   .proc_release   = single_release,
-};
-
 static int topology_update_init(void)
 {
start_topology_update();
@@ -1685,9 +1619,6 @@ static int topology_update_init(void)
if (vphn_enabled)
topology_schedule_update();
 
-   if (!proc_create("powerpc/topology_updates", 0644, NULL, &topology_proc_ops))
-   return -ENOMEM;
-
topology_inited = 1;
return 0;
 }
-- 
2.25.4



[PATCH 00/18] remove extended cede offline mode and bogus topology update code

2020-06-11 Thread Nathan Lynch
Two major parts to this series:

1. Removal of the extended cede offline mode for CPUs as well as the
   partition suspend code which accommodates it by temporarily
   onlining all CPUs prior to suspending the LPAR. This solves some
   accounting problems, simplifies the pseries CPU hotplug code, and
   greatly uncomplicates the existing partition suspend code, easing
   a much-needed transition to the Linux suspend framework. The two
   patches which make up this part have been posted before:

   https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=180718

   and they are simply incorporated unchanged into the larger series
   here, with Gautham's Reviewed-by added to patch #1.

2. Removal of the long-disabled "topology update" code, most of which
   resides in mm/numa.c, but there are pieces in pseries and rtasd to
   excise as well. This code was an attempt to honor changes in a
   partition's NUMA properties arising from resource reassignments
   which occur as part of a migration, VPHN change, or a Dynamic
   Platform Optimizer operation. Its main technique is to remove and
   re-add affected processors and LMBs and hope in vain that the
   changes in cpu-node and physaddr-node relationships aren't
   disruptive. We want to provide user space with some indication that
   Linux's logical NUMA representation has become out of sync with the
   platform's assignments, but we need to get this unusable stuff out
   of the way before this code can sustain new features.
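
For readers unfamiliar with the code being removed, the VPHN
change-detection step (update_cpu_associativity_changes_mask() in patch 04)
boils down to comparing counters published by the hypervisor against a
cached copy. The following is a userspace sketch only: REF_POINTS and
assoc_counters_changed() are hypothetical names, and plain arrays stand in
for the per-CPU VPA fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The removed code notes that VPHN supports at most 8 reference points. */
#define REF_POINTS 8

/*
 * Compare the counters published by the hypervisor with the cached
 * copy. Returns 1 and refreshes the cache if any counter in the first
 * 'depth' levels moved, otherwise returns 0.
 */
int assoc_counters_changed(uint8_t cached[REF_POINTS],
			   const volatile uint8_t hv[REF_POINTS],
			   size_t depth)
{
	int changed = 0;

	assert(depth <= REF_POINTS);

	for (size_t i = 0; i < depth; i++) {
		if (hv[i] != cached[i]) {
			cached[i] = hv[i];
			changed = 1;
		}
	}

	return changed;
}
```

Detecting a change was never the hard part. Acting on it by removing and
re-adding CPUs and LMBs is what this series argues cannot work.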

Nathan Lynch (18):
  powerpc/pseries: remove cede offline state for CPUs
  powerpc/rtas: don't online CPUs for partition suspend
  powerpc/numa: remove ability to enable topology updates
  powerpc/numa: remove unreachable topology update code
  powerpc/numa: make vphn_enabled, prrn_enabled flags const
  powerpc/numa: remove unreachable topology timer code
  powerpc/numa: remove unreachable topology workqueue code
  powerpc/numa: remove vphn_enabled and prrn_enabled internal flags
  powerpc/numa: stub out numa_update_cpu_topology()
  powerpc/numa: remove timed_topology_update()
  powerpc/numa: remove start/stop_topology_update()
  powerpc/rtasd: simplify handle_rtas_event(), emit message on events
  powerpc/numa: remove prrn_is_enabled()
  powerpc/numa: remove arch_update_cpu_topology
  powerpc/pseries: remove prrn special case from DT update path
  powerpc/pseries: remove memory "re-add" implementation
  powerpc/pseries: remove dlpar_cpu_readd()
  powerpc/pseries: remove obsolete memory hotplug DT notifier code

 Documentation/core-api/cpu_hotplug.rst|   7 -
 arch/powerpc/include/asm/rtas.h   |   3 -
 arch/powerpc/include/asm/topology.h   |  27 -
 arch/powerpc/kernel/rtas.c| 122 +
 arch/powerpc/kernel/rtasd.c   |  28 +-
 arch/powerpc/mm/numa.c| 486 --
 arch/powerpc/platforms/pseries/hotplug-cpu.c  | 189 +--
 .../platforms/pseries/hotplug-memory.c| 107 +---
 arch/powerpc/platforms/pseries/mobility.c |  31 --
 .../platforms/pseries/offline_states.h|  38 --
 arch/powerpc/platforms/pseries/pmem.c |   1 -
 arch/powerpc/platforms/pseries/smp.c  |  28 +-
 arch/powerpc/platforms/pseries/suspend.c  |  27 +-
 13 files changed, 22 insertions(+), 1072 deletions(-)
 delete mode 100644 arch/powerpc/platforms/pseries/offline_states.h

-- 
2.25.4



[PATCH 01/18] powerpc/pseries: remove cede offline state for CPUs

2020-06-11 Thread Nathan Lynch
This effectively reverts commit 3aa565f53c39 ("powerpc/pseries: Add
hooks to put the CPU into an appropriate offline state"), which added
an offline mode for CPUs which uses the H_CEDE hcall instead of the
architected stop-self RTAS function in order to facilitate "folding"
of dedicated mode processors on PowerVM platforms to achieve energy
savings. This has been the default offline mode since its
introduction.

There's nothing about stop-self that would prevent the hypervisor from
achieving the energy savings available via H_CEDE, so the original
premise of this change appears to be flawed.

I also have encountered the claim that the transition to and from
ceded state is much faster than stop-self/start-cpu. Certainly we
would not want to use stop-self as an *idle* mode. That is what H_CEDE
is for. However, this difference is insignificant in the context of
Linux CPU hotplug, where the latency of an offline or online operation
on current systems is on the order of 100ms, mainly attributable to
all the various subsystems' cpuhp callbacks.

The cede offline mode also prevents accurate accounting, as discussed
before:
https://lore.kernel.org/linuxppc-dev/1571740391-3251-1-git-send-email-...@linux.vnet.ibm.com/

Unconditionally use stop-self to offline processor threads. This is
the architected method for offlining CPUs on PAPR systems.

The "cede_offline" boot parameter is rendered obsolete.

Removing this code enables the removal of the partition suspend code
which temporarily onlines all present CPUs.

Fixes: 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state")

Signed-off-by: Nathan Lynch 
Reviewed-by: Gautham R. Shenoy 
---
 Documentation/core-api/cpu_hotplug.rst|   7 -
 arch/powerpc/platforms/pseries/hotplug-cpu.c  | 170 ++
 .../platforms/pseries/offline_states.h|  38 
 arch/powerpc/platforms/pseries/pmem.c |   1 -
 arch/powerpc/platforms/pseries/smp.c  |  28 +--
 5 files changed, 15 insertions(+), 229 deletions(-)
 delete mode 100644 arch/powerpc/platforms/pseries/offline_states.h

diff --git a/Documentation/core-api/cpu_hotplug.rst b/Documentation/core-api/cpu_hotplug.rst
index 4a50ab7817f7..b1ae1ac159cf 100644
--- a/Documentation/core-api/cpu_hotplug.rst
+++ b/Documentation/core-api/cpu_hotplug.rst
@@ -50,13 +50,6 @@ Command Line Switches
 
   This option is limited to the X86 and S390 architecture.
 
-``cede_offline={"off","on"}``
-  Use this option to disable/enable putting offlined processors to an extended
-  ``H_CEDE`` state on supported pseries platforms. If nothing is specified,
-  ``cede_offline`` is set to "on".
-
-  This option is limited to the PowerPC architecture.
-
 ``cpu0_hotplug``
   Allow to shutdown CPU0.
 
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 3e8cbfe7a80f..d4b346355bb9 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -35,54 +35,10 @@
 #include 
 
 #include "pseries.h"
-#include "offline_states.h"
 
 /* This version can't take the spinlock, because it never returns */
 static int rtas_stop_self_token = RTAS_UNKNOWN_SERVICE;
 
-static DEFINE_PER_CPU(enum cpu_state_vals, preferred_offline_state) =
-   CPU_STATE_OFFLINE;
-static DEFINE_PER_CPU(enum cpu_state_vals, current_state) = CPU_STATE_OFFLINE;
-
-static enum cpu_state_vals default_offline_state = CPU_STATE_OFFLINE;
-
-static bool cede_offline_enabled __read_mostly = true;
-
-/*
- * Enable/disable cede_offline when available.
- */
-static int __init setup_cede_offline(char *str)
-{
-   return (kstrtobool(str, &cede_offline_enabled) == 0);
-}
-
-__setup("cede_offline=", setup_cede_offline);
-
-enum cpu_state_vals get_cpu_current_state(int cpu)
-{
-   return per_cpu(current_state, cpu);
-}
-
-void set_cpu_current_state(int cpu, enum cpu_state_vals state)
-{
-   per_cpu(current_state, cpu) = state;
-}
-
-enum cpu_state_vals get_preferred_offline_state(int cpu)
-{
-   return per_cpu(preferred_offline_state, cpu);
-}
-
-void set_preferred_offline_state(int cpu, enum cpu_state_vals state)
-{
-   per_cpu(preferred_offline_state, cpu) = state;
-}
-
-void set_default_offline_state(int cpu)
-{
-   per_cpu(preferred_offline_state, cpu) = default_offline_state;
-}
-
 static void rtas_stop_self(void)
 {
static struct rtas_args args;
@@ -101,9 +57,7 @@ static void rtas_stop_self(void)
 
 static void pseries_mach_cpu_die(void)
 {
-   unsigned int cpu = smp_processor_id();
unsigned int hwcpu = hard_smp_processor_id();
-   u8 cede_latency_hint = 0;
 
local_irq_disable();
idle_task_exit();
@@ -112,49 +66,6 @@ static void pseries_mach_cpu_die(void)
else
xics_teardown_cpu();
 
-   if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
-   set_cpu_current_state(cpu, 

Re: [RFC PATCH v2 3/3] ASoC: fsl_asrc_dma: Reuse the dma channel if available in Back-End

2020-06-11 Thread Nicolin Chen
On Fri, Jun 12, 2020 at 10:17:08AM +0800, Shengjiu Wang wrote:

> > > diff --git a/sound/soc/fsl/fsl_asrc_common.h 
> > > b/sound/soc/fsl/fsl_asrc_common.h

> > > + * @req_dma_chan_dev_to_dev: flag for release dev_to_dev chan
> >
> > Since we only have dma_request call for back-end only:
> > + * @req_dma_chan: flag to release back-end dma chan
> 
> I prefer to use the description "flag to release dev_to_dev chan"
> because we won't release the dma chan of the back-end. if the chan
> is from the back-end, it is owned by the back-end component.

TBH, it just looks too long. But I wouldn't have a problem with it if
you insist.

> > > @@ -273,19 +299,21 @@ static int fsl_asrc_dma_hw_params(struct 
> > > snd_soc_component *component,
> > >  static int fsl_asrc_dma_hw_free(struct snd_soc_component *component,
> > >   struct snd_pcm_substream *substream)
> > >  {
> > > + bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK;
> > >   struct snd_pcm_runtime *runtime = substream->runtime;
> > >   struct fsl_asrc_pair *pair = runtime->private_data;
> > > + u8 dir = tx ? OUT : IN;
> > >
> > >   snd_pcm_set_runtime_buffer(substream, NULL);
> > >
> > > - if (pair->dma_chan[IN])
> > > - dma_release_channel(pair->dma_chan[IN]);
> > > + if (pair->dma_chan[!dir])
> > > + dma_release_channel(pair->dma_chan[!dir]);
> > >
> > > - if (pair->dma_chan[OUT])
> > > - dma_release_channel(pair->dma_chan[OUT]);
> > > + if (pair->dma_chan[dir] && pair->req_dma_chan_dev_to_dev)
> > > + dma_release_channel(pair->dma_chan[dir]);
> >
> > Why we only apply this to one direction?
> 
> if the chan is from the back-end, it is owned by the back-end
> component, so it should be released by the back-end component,
> not here. That's why I added the flag "req_dma_chan".

Ah... I forgot that IN and OUT refer to the front-end and back-end. The
naming isn't very good indeed. We should probably add a comment
somewhere as a reminder.

Thanks


[PATCH kernel] powerpc/xive: Ignore kmemleak false positives

2020-06-11 Thread Alexey Kardashevskiy
xive_native_provision_pages() allocates memory and passes the pointer to
OPAL, so kmemleak cannot find any use of the pointer in kernel memory and
produces a false positive report (below). Even if kmemleak did scan OPAL
memory, it is unable to deal with __pa() addresses anyway.

This silences the warning.
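
To illustrate why the report is a false positive, here is a toy userspace
model of a conservative leak scan. It is a deliberate simplification and
not how kmemleak is implemented: TOY_PAGE_OFFSET, toy_pa(), and
toy_scan_finds() are hypothetical names, and the address used in the usage
note is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical linear-map offset standing in for what __pa() subtracts. */
#define TOY_PAGE_OFFSET 0xc000000000000000ULL

/* Toy equivalent of __pa(): convert a linear-map address to physical. */
uint64_t toy_pa(uint64_t vaddr)
{
	assert(vaddr >= TOY_PAGE_OFFSET);
	return vaddr - TOY_PAGE_OFFSET;
}

/*
 * Toy conservative scan: an object counts as referenced only if some
 * scanned word holds its virtual address.
 */
bool toy_scan_finds(const uint64_t *words, size_t n, uint64_t obj_vaddr)
{
	for (size_t i = 0; i < n; i++) {
		if (words[i] == obj_vaddr)
			return true;
	}
	return false;
}
```

If the only surviving copy of the pointer is toy_pa(0xc000000012340000ULL),
the scan does not match it against the object's virtual address, so the
object looks leaked even though firmware still uses it.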

unreferenced object 0xc000200350c4 (size 65536):
  comm "qemu-system-ppc", pid 2725, jiffies 4294946414 (age 70776.530s)
  hex dump (first 32 bytes):
02 00 00 00 50 00 00 00 00 00 00 00 00 00 00 00  P...
01 00 08 07 00 00 00 00 00 00 00 00 00 00 00 00  
  backtrace:
[<81ff046c>] xive_native_alloc_vp_block+0x120/0x250
[] kvmppc_xive_compute_vp_id+0x248/0x350 [kvm]
[] kvmppc_xive_connect_vcpu+0xc0/0x520 [kvm]
[<6acbc81c>] kvm_arch_vcpu_ioctl+0x308/0x580 [kvm]
[<89c69580>] kvm_vcpu_ioctl+0x19c/0xae0 [kvm]
[<902ae91e>] ksys_ioctl+0x184/0x1b0
[] sys_ioctl+0x48/0xb0
[<01b2c127>] system_call_exception+0x124/0x1f0
[] system_call_common+0xe8/0x214

Signed-off-by: Alexey Kardashevskiy 
---

Does kmemleak actually check the OPAL memory? Because if it did, we
would still have a warning as kmemleak does not trace __pa() addresses
anyway.
---
 arch/powerpc/sysdev/xive/native.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/sysdev/xive/native.c b/arch/powerpc/sysdev/xive/native.c
index 71b881e554fc..cb58ec7ce77a 100644
--- a/arch/powerpc/sysdev/xive/native.c
+++ b/arch/powerpc/sysdev/xive/native.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -647,6 +648,7 @@ static bool xive_native_provision_pages(void)
pr_err("Failed to allocate provisioning page\n");
return false;
}
+   kmemleak_ignore(p);
opal_xive_donate_page(chip, __pa(p));
}
return true;
-- 
2.17.1



Re: [RFC PATCH v2 3/3] ASoC: fsl_asrc_dma: Reuse the dma channel if available in Back-End

2020-06-11 Thread Shengjiu Wang
On Fri, Jun 12, 2020 at 8:33 AM Nicolin Chen  wrote:
>
> On Wed, Jun 10, 2020 at 06:05:49PM +0800, Shengjiu Wang wrote:
> > The dma channel has been requested by Back-End cpu dai driver already.
> > If fsl_asrc_dma requests dma chan with same dma:tx symlink, then
> > there will be below warning with SDMA.
> >
> > [   48.174236] fsl-esai-dai 2024000.esai: Cannot create DMA dma:tx symlink
> >
> > or with EDMA the request operation will fail for EDMA channel
> > can only be requested once.
> >
> > So If we can reuse the dma channel of Back-End, then the issue can be
> > fixed.
> >
> > In order to get the dma channel which is already requested in Back-End.
> > we use the exported two functions (snd_soc_lookup_component_nolocked
> > and soc_component_to_pcm). If we can get the dma channel, then reuse it,
> > if can't, then request a new one.
> >
> > Signed-off-by: Shengjiu Wang 
> > ---
> >  sound/soc/fsl/fsl_asrc_common.h |  2 ++
> >  sound/soc/fsl/fsl_asrc_dma.c| 52 +
> >  2 files changed, 42 insertions(+), 12 deletions(-)
>
> > diff --git a/sound/soc/fsl/fsl_asrc_common.h 
> > b/sound/soc/fsl/fsl_asrc_common.h
> > index 77665b15c8db..09512bc79b80 100644
> > --- a/sound/soc/fsl/fsl_asrc_common.h
> > +++ b/sound/soc/fsl/fsl_asrc_common.h
> > @@ -32,6 +32,7 @@ enum asrc_pair_index {
> >   * @dma_chan: inputer and output DMA channels
> >   * @dma_data: private dma data
> >   * @pos: hardware pointer position
> > + * @req_dma_chan_dev_to_dev: flag for release dev_to_dev chan
>
> Since we only have dma_request call for back-end only:
> + * @req_dma_chan: flag to release back-end dma chan

I prefer to use the description "flag to release dev_to_dev chan"
because we won't release the dma chan of the back-end. If the chan
is from the back-end, it is owned by the back-end component.
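
The ownership rule can be sketched in standalone C as follows. This is only
a model of the idea, not the driver code: struct toy_chan, struct toy_pair,
toy_acquire(), and toy_release() are hypothetical names, and a reference
counter stands in for the real request/release calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a DMA channel; 'refs' counts outstanding requests. */
struct toy_chan {
	int refs;
};

struct toy_pair {
	struct toy_chan *chan;
	bool requested_here;	/* plays the role of the proposed flag */
};

/* Reuse the back-end's channel when one exists, else request our own. */
void toy_acquire(struct toy_pair *pair, struct toy_chan *be_chan,
		 struct toy_chan *fresh)
{
	assert(pair != NULL && fresh != NULL);

	if (be_chan) {
		pair->chan = be_chan;
		pair->requested_here = false;
	} else {
		fresh->refs++;
		pair->chan = fresh;
		pair->requested_here = true;
	}
}

/* Release only a channel that this component requested itself. */
void toy_release(struct toy_pair *pair)
{
	assert(pair != NULL);

	if (pair->chan && pair->requested_here)
		pair->chan->refs--;

	pair->chan = NULL;
	pair->requested_here = false;
}
```

A channel borrowed from the back-end keeps its reference count across
toy_release(), which mirrors leaving the release to the back-end component.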

>
> > diff --git a/sound/soc/fsl/fsl_asrc_dma.c b/sound/soc/fsl/fsl_asrc_dma.c
> > index d6a3fc5f87e5..5ecb77d466d3 100644
> > --- a/sound/soc/fsl/fsl_asrc_dma.c
> > +++ b/sound/soc/fsl/fsl_asrc_dma.c
> > @@ -160,6 +161,9 @@ static int fsl_asrc_dma_hw_params(struct 
> > snd_soc_component *component,
> >   substream_be = snd_soc_dpcm_get_substream(be, stream);
> >   dma_params_be = snd_soc_dai_get_dma_data(dai, substream_be);
> >   dev_be = dai->dev;
> > + component_be = snd_soc_lookup_component_nolocked(dev_be, 
> > SND_DMAENGINE_PCM_DRV_NAME);
> > + if (component_be)
> > + tmp_chan = 
> > soc_component_to_pcm(component_be)->chan[substream->stream];
>
> Should we use substream_be->stream or just substream->stream?

substream_be->stream should be better.

>
> And would be better to add these lines right before we really use
> tmp_chan because there's still some distance till it reaches that
> point. And would be better to have a line of comments too.

ok.

>
> > @@ -205,10 +209,14 @@ static int fsl_asrc_dma_hw_params(struct 
> > snd_soc_component *component,
> >*/
> >   if (!asrc->use_edma) {
> >   /* Get DMA request of Back-End */
> > - tmp_chan = dma_request_slave_channel(dev_be, tx ? "tx" : 
> > "rx");
> > + if (!tmp_chan) {
> > + tmp_chan_new = dma_request_slave_channel(dev_be, tx ? 
> > "tx" : "rx");
> > + tmp_chan = tmp_chan_new;
>
> This is a bit confusing...though I finally got it :)
> So probably better to have a line of comments.

ok.

>
> > @@ -220,9 +228,26 @@ static int fsl_asrc_dma_hw_params(struct 
> > snd_soc_component *component,
> >
> >   pair->dma_chan[dir] =
> >   dma_request_channel(mask, filter, &pair->dma_data);
> > + pair->req_dma_chan_dev_to_dev = true;
> >   } else {
> > - pair->dma_chan[dir] =
> > - asrc->get_dma_channel(pair, dir);
> > + /*
> > +  * With EDMA, there is two dma channels can be used for p2p,
> > +  * one is from ASRC, one is from another peripheral
> > +  * (ESAI or SAI). Previously we select the dma channel of 
> > ASRC,
> > +  * but find an issue for ideal ratio case, there is no control
> > +  * for data copy speed, the speed is faster than sample
> > +  * frequency.
> > +  *
> > +  * So we switch to use dma channel of peripheral (ESAI or 
> > SAI),
> > +  * that copy speed of DMA is controlled by data consumption
> > +  * speed in the peripheral FIFO.
> > +  */
>
> This sounds like a different issue and should be fixed separately?
> If you prefer not to, better to move this one to commit log, other
> than having a changelog here, in my opinion.

OK, will move it to the commit log.

>
> Since it no longer uses get_dma_channel() for EDMA case, we should
> update the comments at the top as well.
>
> > + pair->req_dma_chan_dev_to_dev = false;
> > + 

Re: [RFC PATCH v2 3/3] ASoC: fsl_asrc_dma: Reuse the dma channel if available in Back-End

2020-06-11 Thread Nicolin Chen
On Wed, Jun 10, 2020 at 06:05:49PM +0800, Shengjiu Wang wrote:
> The dma channel has already been requested by the Back-End cpu dai driver.
> If fsl_asrc_dma requests a dma chan with the same dma:tx symlink, then
> the warning below appears with SDMA.
> 
> [   48.174236] fsl-esai-dai 2024000.esai: Cannot create DMA dma:tx symlink
> 
> or with EDMA the request operation will fail, because an EDMA channel
> can only be requested once.
> 
> So if we can reuse the dma channel of the Back-End, the issue can be
> fixed.
> 
> In order to get the dma channel which is already requested in the Back-End,
> we use the two exported functions (snd_soc_lookup_component_nolocked
> and soc_component_to_pcm). If we can get the dma channel, then reuse it;
> if we can't, then request a new one.
> 
> Signed-off-by: Shengjiu Wang 
> ---
>  sound/soc/fsl/fsl_asrc_common.h |  2 ++
>  sound/soc/fsl/fsl_asrc_dma.c| 52 +
>  2 files changed, 42 insertions(+), 12 deletions(-)

> diff --git a/sound/soc/fsl/fsl_asrc_common.h b/sound/soc/fsl/fsl_asrc_common.h
> index 77665b15c8db..09512bc79b80 100644
> --- a/sound/soc/fsl/fsl_asrc_common.h
> +++ b/sound/soc/fsl/fsl_asrc_common.h
> @@ -32,6 +32,7 @@ enum asrc_pair_index {
>   * @dma_chan: inputer and output DMA channels
>   * @dma_data: private dma data
>   * @pos: hardware pointer position
> + * @req_dma_chan_dev_to_dev: flag for release dev_to_dev chan

Since we have a dma_request call for the back-end only:
+ * @req_dma_chan: flag to release back-end dma chan

> diff --git a/sound/soc/fsl/fsl_asrc_dma.c b/sound/soc/fsl/fsl_asrc_dma.c
> index d6a3fc5f87e5..5ecb77d466d3 100644
> --- a/sound/soc/fsl/fsl_asrc_dma.c
> +++ b/sound/soc/fsl/fsl_asrc_dma.c
> @@ -160,6 +161,9 @@ static int fsl_asrc_dma_hw_params(struct 
> snd_soc_component *component,
>   substream_be = snd_soc_dpcm_get_substream(be, stream);
>   dma_params_be = snd_soc_dai_get_dma_data(dai, substream_be);
>   dev_be = dai->dev;
> + component_be = snd_soc_lookup_component_nolocked(dev_be, 
> SND_DMAENGINE_PCM_DRV_NAME);
> + if (component_be)
> + tmp_chan = 
> soc_component_to_pcm(component_be)->chan[substream->stream];

Should we use substream_be->stream or just substream->stream?

It would also be better to add these lines right before we actually use
tmp_chan, because there is still some distance until it reaches that
point. A line of comments would help too.

> @@ -205,10 +209,14 @@ static int fsl_asrc_dma_hw_params(struct 
> snd_soc_component *component,
>*/
>   if (!asrc->use_edma) {
>   /* Get DMA request of Back-End */
> - tmp_chan = dma_request_slave_channel(dev_be, tx ? "tx" : "rx");
> + if (!tmp_chan) {
> + tmp_chan_new = dma_request_slave_channel(dev_be, tx ? 
> "tx" : "rx");
> + tmp_chan = tmp_chan_new;

This is a bit confusing...though I finally got it :)
So probably better to have a line of comments.
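
The "reuse if available, otherwise request and remember ownership" pattern under review can be sketched in isolation. This is only a minimal model, not the driver code: struct chan, request_chan(), release_chan(), get_be_chan() and put_be_chan() are hypothetical stand-ins for struct dma_chan, dma_request_slave_channel() and dma_release_channel().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dma_chan; not the kernel type. */
struct chan {
	const char *name;
};

/* Hypothetical stand-in for dma_request_slave_channel(). */
static struct chan *request_chan(const char *name)
{
	static struct chan fresh;

	fresh.name = name;
	return &fresh;
}

/* Hypothetical stand-in for dma_release_channel(); counts calls. */
static int released;

static void release_chan(struct chan *c)
{
	(void)c;
	released++;
}

/*
 * Reuse the channel the Back-End already owns if there is one;
 * otherwise request a new one and record that we own it.
 */
static struct chan *get_be_chan(struct chan *be_chan, bool tx, bool *requested)
{
	*requested = false;
	if (be_chan)
		return be_chan;

	*requested = true;
	return request_chan(tx ? "tx" : "rx");
}

/* Release only a channel that we requested ourselves. */
static void put_be_chan(struct chan *c, bool requested)
{
	if (requested)
		release_chan(c);
}
```

Keeping the ownership flag next to the lookup makes it harder to release a channel that still belongs to the Back-End.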

> @@ -220,9 +228,26 @@ static int fsl_asrc_dma_hw_params(struct 
> snd_soc_component *component,
>  
>   pair->dma_chan[dir] =
>   dma_request_channel(mask, filter, >dma_data);
> + pair->req_dma_chan_dev_to_dev = true;
>   } else {
> - pair->dma_chan[dir] =
> - asrc->get_dma_channel(pair, dir);
> + /*
> +  * With EDMA, there is two dma channels can be used for p2p,
> +  * one is from ASRC, one is from another peripheral
> +  * (ESAI or SAI). Previously we select the dma channel of ASRC,
> +  * but find an issue for ideal ratio case, there is no control
> +  * for data copy speed, the speed is faster than sample
> +  * frequency.
> +  *
> +  * So we switch to use dma channel of peripheral (ESAI or SAI),
> +  * that copy speed of DMA is controlled by data consumption
> +  * speed in the peripheral FIFO.
> +  */

This sounds like a different issue and should be fixed separately?
If you prefer not to, better to move this one to the commit log, rather
than having a changelog here, in my opinion.

Since it no longer uses get_dma_channel() for EDMA case, we should
update the comments at the top as well.

> + pair->req_dma_chan_dev_to_dev = false;
> + pair->dma_chan[dir] = tmp_chan;
> + if (!pair->dma_chan[dir]) {
> + pair->dma_chan[dir] = dma_request_slave_channel(dev_be, 
> tx ? "tx" : "rx");
> + pair->req_dma_chan_dev_to_dev = true;
> + }
>   }

Now there are some duplicated lines between these if-else routines, so
combining my previous comments, we can do (sample change, not tested):

@@ -197,18 +199,29 @@ static int fsl_asrc_dma_hw_params(struct 
snd_soc_component *component,
dma_cap_set(DMA_SLAVE, mask);

Re: [PATCH v4 1/2] powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'

2020-06-11 Thread Segher Boessenkool
On Thu, Jun 11, 2020 at 03:43:55PM -0700, Nick Desaulniers wrote:
> Segher, Christophe, I suspect Clang is missing support for the %L and %U
> output templates [1].

The arch/powerpc kernel first used the %U output modifier in 0c176fa80fdf
(from 2016), and %L in b8b572e1015f (2008).  include/asm-ppc (and ppc64)
have had %U since 2005 (1da177e4c3f4), and %L as well (0c541b4406a6).

> I've implemented support for some of these before
> in Clang via the documentation at [2], but these seem to be machine
> specific?

Yes, almost all output modifiers are.  Only %l, %a, %n, and part of %c
are generic (and %% and %= and on some targets, %{, %|, %}).

> Can you please point me to documentation/unit tests/source for
> these so that I can figure out what they should be doing, and look into
> implementing them in Clang?

The PowerPC part of
https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints
(sorry, no anchor) documents %U.

Traditionally the source code is the documentation for this.  The code
here starts with the comment
  /* Write second word of DImode or DFmode reference.  Works on register
 or non-indexed memory only.  */
(which is very out-of-date itself, it works fine for e.g. TImode as well,
but alas).

Unit tests are completely unsuitable for most compiler things like this.

The source code is gcc/config/rs6000/rs6000.c, easiest is to search for
'L' (with those quotes).  Function print_operand.
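
As a rough illustration of the "second word" behaviour described by the quoted comment, here is a toy model in plain C. It is a sketch under assumptions, not GCC code: struct operand, WORD_BYTES and print_second_word() are hypothetical, a 32-bit target with 4-byte words is assumed, and only the register and non-indexed memory cases are modelled.

```c
#include <stdio.h>

/* Hypothetical operand model; GCC works on RTL, not on this struct. */
enum op_kind { OP_REG, OP_MEM };

struct operand {
	enum op_kind kind;
	int reg;	/* register number, or base register for OP_MEM */
	long disp;	/* displacement for OP_MEM, ignored for OP_REG */
};

#define WORD_BYTES 4	/* assumption: 32-bit target */

/*
 * Format the second word of a two-word operand: the next register
 * for a register pair, or the same base with the displacement
 * advanced by one word for non-indexed memory.
 */
static int print_second_word(char *buf, size_t len, const struct operand *op)
{
	if (op->kind == OP_REG)
		return snprintf(buf, len, "%d", op->reg + 1);

	return snprintf(buf, len, "%ld(%d)", op->disp + WORD_BYTES, op->reg);
}
```

Under this model a 64-bit value held in registers 4 and 5 has second word "5", and one stored at 8(3) has second word "12(3)".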

HtH,


Segher


[PATCH v3 2/2] powerpc: configs: remove CMDLINE_BOOL

2020-06-11 Thread Chris Packham
Regenerate defconfigs to remove CONFIG_CMDLINE_BOOL and the default
CONFIG_CMDLINE where applicable.

Signed-off-by: Chris Packham 
---
Changes in v3:
- new

 arch/powerpc/configs/44x/akebono_defconfig | 2 --
 arch/powerpc/configs/44x/arches_defconfig  | 2 --
 arch/powerpc/configs/44x/bamboo_defconfig  | 2 --
 arch/powerpc/configs/44x/bluestone_defconfig   | 2 --
 arch/powerpc/configs/44x/canyonlands_defconfig | 2 --
 arch/powerpc/configs/44x/currituck_defconfig   | 2 --
 arch/powerpc/configs/44x/eiger_defconfig   | 2 --
 arch/powerpc/configs/44x/fsp2_defconfig| 1 -
 arch/powerpc/configs/44x/icon_defconfig| 2 --
 arch/powerpc/configs/44x/iss476-smp_defconfig  | 1 -
 arch/powerpc/configs/44x/katmai_defconfig  | 2 --
 arch/powerpc/configs/44x/rainier_defconfig | 2 --
 arch/powerpc/configs/44x/redwood_defconfig | 2 --
 arch/powerpc/configs/44x/sam440ep_defconfig| 2 --
 arch/powerpc/configs/44x/sequoia_defconfig | 2 --
 arch/powerpc/configs/44x/taishan_defconfig | 2 --
 arch/powerpc/configs/44x/warp_defconfig| 1 -
 arch/powerpc/configs/holly_defconfig   | 1 -
 arch/powerpc/configs/mvme5100_defconfig| 3 +--
 arch/powerpc/configs/ps3_defconfig | 2 --
 arch/powerpc/configs/skiroot_defconfig | 1 -
 arch/powerpc/configs/storcenter_defconfig  | 1 -
 22 files changed, 1 insertion(+), 38 deletions(-)

diff --git a/arch/powerpc/configs/44x/akebono_defconfig 
b/arch/powerpc/configs/44x/akebono_defconfig
index 7705a5c3f4ea..60d5fa2c3b93 100644
--- a/arch/powerpc/configs/44x/akebono_defconfig
+++ b/arch/powerpc/configs/44x/akebono_defconfig
@@ -19,8 +19,6 @@ CONFIG_HIGHMEM=y
 CONFIG_HZ_100=y
 CONFIG_IRQ_ALL_CPUS=y
 # CONFIG_COMPACTION is not set
-CONFIG_CMDLINE_BOOL=y
-CONFIG_CMDLINE=""
 # CONFIG_SUSPEND is not set
 CONFIG_NET=y
 CONFIG_PACKET=y
diff --git a/arch/powerpc/configs/44x/arches_defconfig 
b/arch/powerpc/configs/44x/arches_defconfig
index 82c6f49b8dcb..41d04e70d4fb 100644
--- a/arch/powerpc/configs/44x/arches_defconfig
+++ b/arch/powerpc/configs/44x/arches_defconfig
@@ -11,8 +11,6 @@ CONFIG_MODULE_UNLOAD=y
 # CONFIG_BLK_DEV_BSG is not set
 # CONFIG_EBONY is not set
 CONFIG_ARCHES=y
-CONFIG_CMDLINE_BOOL=y
-CONFIG_CMDLINE=""
 CONFIG_NET=y
 CONFIG_PACKET=y
 CONFIG_UNIX=y
diff --git a/arch/powerpc/configs/44x/bamboo_defconfig 
b/arch/powerpc/configs/44x/bamboo_defconfig
index 679213214a75..acbce718eaa8 100644
--- a/arch/powerpc/configs/44x/bamboo_defconfig
+++ b/arch/powerpc/configs/44x/bamboo_defconfig
@@ -9,8 +9,6 @@ CONFIG_MODULE_UNLOAD=y
 # CONFIG_BLK_DEV_BSG is not set
 CONFIG_BAMBOO=y
 # CONFIG_EBONY is not set
-CONFIG_CMDLINE_BOOL=y
-CONFIG_CMDLINE=""
 CONFIG_NET=y
 CONFIG_PACKET=y
 CONFIG_UNIX=y
diff --git a/arch/powerpc/configs/44x/bluestone_defconfig 
b/arch/powerpc/configs/44x/bluestone_defconfig
index 8006a5728afd..37088f250c9e 100644
--- a/arch/powerpc/configs/44x/bluestone_defconfig
+++ b/arch/powerpc/configs/44x/bluestone_defconfig
@@ -11,8 +11,6 @@ CONFIG_EXPERT=y
 # CONFIG_COMPAT_BRK is not set
 CONFIG_BLUESTONE=y
 # CONFIG_EBONY is not set
-CONFIG_CMDLINE_BOOL=y
-CONFIG_CMDLINE=""
 CONFIG_NET=y
 CONFIG_PACKET=y
 CONFIG_UNIX=y
diff --git a/arch/powerpc/configs/44x/canyonlands_defconfig 
b/arch/powerpc/configs/44x/canyonlands_defconfig
index ccc14eb7a2f1..61776ade572b 100644
--- a/arch/powerpc/configs/44x/canyonlands_defconfig
+++ b/arch/powerpc/configs/44x/canyonlands_defconfig
@@ -11,8 +11,6 @@ CONFIG_MODULE_UNLOAD=y
 # CONFIG_BLK_DEV_BSG is not set
 # CONFIG_EBONY is not set
 CONFIG_CANYONLANDS=y
-CONFIG_CMDLINE_BOOL=y
-CONFIG_CMDLINE=""
 CONFIG_NET=y
 CONFIG_PACKET=y
 CONFIG_UNIX=y
diff --git a/arch/powerpc/configs/44x/currituck_defconfig 
b/arch/powerpc/configs/44x/currituck_defconfig
index be76e066df01..34c86b3abecb 100644
--- a/arch/powerpc/configs/44x/currituck_defconfig
+++ b/arch/powerpc/configs/44x/currituck_defconfig
@@ -17,8 +17,6 @@ CONFIG_HIGHMEM=y
 CONFIG_HZ_100=y
 CONFIG_MATH_EMULATION=y
 CONFIG_IRQ_ALL_CPUS=y
-CONFIG_CMDLINE_BOOL=y
-CONFIG_CMDLINE=""
 # CONFIG_SUSPEND is not set
 CONFIG_NET=y
 CONFIG_PACKET=y
diff --git a/arch/powerpc/configs/44x/eiger_defconfig 
b/arch/powerpc/configs/44x/eiger_defconfig
index 1abaa63e067f..509300f400e2 100644
--- a/arch/powerpc/configs/44x/eiger_defconfig
+++ b/arch/powerpc/configs/44x/eiger_defconfig
@@ -10,8 +10,6 @@ CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
 # CONFIG_EBONY is not set
 CONFIG_EIGER=y
-CONFIG_CMDLINE_BOOL=y
-CONFIG_CMDLINE=""
 CONFIG_PCIEPORTBUS=y
 # CONFIG_PCIEASPM is not set
 CONFIG_NET=y
diff --git a/arch/powerpc/configs/44x/fsp2_defconfig 
b/arch/powerpc/configs/44x/fsp2_defconfig
index e67fc041ca3e..30845ce0885a 100644
--- a/arch/powerpc/configs/44x/fsp2_defconfig
+++ b/arch/powerpc/configs/44x/fsp2_defconfig
@@ -28,7 +28,6 @@ CONFIG_476FPE_ERR46=y
 CONFIG_SWIOTLB=y
 CONFIG_KEXEC=y
 CONFIG_CRASH_DUMP=y
-CONFIG_CMDLINE_BOOL=y
 CONFIG_CMDLINE="ip=on rw"
 # CONFIG_SUSPEND is not set
 # CONFIG_PCI is not set
diff --git 

[PATCH v3 1/2] powerpc: Remove inaccessible CMDLINE default

2020-06-11 Thread Chris Packham
Since commit cbe46bd4f510 ("powerpc: remove CONFIG_CMDLINE #ifdef mess")
CONFIG_CMDLINE has always had a value regardless of CONFIG_CMDLINE_BOOL.

For example:

 $ make ARCH=powerpc defconfig
 $ cat .config
 # CONFIG_CMDLINE_BOOL is not set
 CONFIG_CMDLINE=""

When enabling CONFIG_CMDLINE_BOOL this value is kept making the 'default
"..." if CONFIG_CMDLINE_BOOL' ineffective.

 $ ./scripts/config --enable CONFIG_CMDLINE_BOOL
 $ cat .config
 CONFIG_CMDLINE_BOOL=y
 CONFIG_CMDLINE=""

Remove CONFIG_CMDLINE_BOOL and the inaccessible default.
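
The reason the conditional default never takes effect can be shown with a toy model. This is a sketch only and not the Kconfig implementation: struct symbol and resolve() are hypothetical, and the assumed rule is simply that a value already saved in .config takes precedence over any default.

```c
#include <stddef.h>

/* Toy model of a Kconfig string symbol; not the real Kconfig structures. */
struct symbol {
	const char *saved;		/* value already in .config, NULL if absent */
	const char *cond_default;	/* 'default "..." if CMDLINE_BOOL' */
	const char *plain_default;	/* unconditional default */
};

/* A saved value wins; defaults are consulted only when nothing is saved. */
static const char *resolve(const struct symbol *s, int cmdline_bool)
{
	if (s->saved)
		return s->saved;
	if (cmdline_bool && s->cond_default)
		return s->cond_default;

	return s->plain_default;
}
```

Because defconfig already writes CONFIG_CMDLINE="" into .config, the first branch is always taken and the conditional default is never reached.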

Signed-off-by: Chris Packham 
Reviewed-by: Christophe Leroy 
---

Changes in v3:
- none

Changes in v2:
- Rebase on top of Linus's tree
- Fix some typos in commit message
- Add review from Christophe
- Remove CONFIG_CMDLINE_BOOL

 arch/powerpc/Kconfig | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9fa23eb320ff..51abc59c3334 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -859,12 +859,8 @@ config PPC_DENORMALISATION
  Add support for handling denormalisation of single precision
  values.  Useful for bare metal only.  If unsure say Y here.
 
-config CMDLINE_BOOL
-   bool "Default bootloader kernel arguments"
-
 config CMDLINE
-   string "Initial kernel command string" if CMDLINE_BOOL
-   default "console=ttyS0,9600 console=tty0 root=/dev/sda2" if CMDLINE_BOOL
+   string "Initial kernel command string"
default ""
help
  On some platforms, there is currently no way for the boot loader to
-- 
2.27.0



[PATCH v3 0/2] powerpc: CMDLINE config cleanup

2020-06-11 Thread Chris Packham
This series cleans up the config options related to the boot command line.

Chris Packham (2):
  powerpc: Remove inaccessible CMDLINE default
  powerpc: configs: remove CMDLINE_BOOL

 arch/powerpc/Kconfig   | 6 +-
 arch/powerpc/configs/44x/akebono_defconfig | 2 --
 arch/powerpc/configs/44x/arches_defconfig  | 2 --
 arch/powerpc/configs/44x/bamboo_defconfig  | 2 --
 arch/powerpc/configs/44x/bluestone_defconfig   | 2 --
 arch/powerpc/configs/44x/canyonlands_defconfig | 2 --
 arch/powerpc/configs/44x/currituck_defconfig   | 2 --
 arch/powerpc/configs/44x/eiger_defconfig   | 2 --
 arch/powerpc/configs/44x/fsp2_defconfig| 1 -
 arch/powerpc/configs/44x/icon_defconfig| 2 --
 arch/powerpc/configs/44x/iss476-smp_defconfig  | 1 -
 arch/powerpc/configs/44x/katmai_defconfig  | 2 --
 arch/powerpc/configs/44x/rainier_defconfig | 2 --
 arch/powerpc/configs/44x/redwood_defconfig | 2 --
 arch/powerpc/configs/44x/sam440ep_defconfig| 2 --
 arch/powerpc/configs/44x/sequoia_defconfig | 2 --
 arch/powerpc/configs/44x/taishan_defconfig | 2 --
 arch/powerpc/configs/44x/warp_defconfig| 1 -
 arch/powerpc/configs/holly_defconfig   | 1 -
 arch/powerpc/configs/mvme5100_defconfig| 3 +--
 arch/powerpc/configs/ps3_defconfig | 2 --
 arch/powerpc/configs/skiroot_defconfig | 1 -
 arch/powerpc/configs/storcenter_defconfig  | 1 -
 23 files changed, 2 insertions(+), 43 deletions(-)

-- 
2.27.0



[no subject]

2020-06-11 Thread ndesaulniers

Date: Thu, 11 Jun 2020 15:38:38 -0700
From: Nick Desaulniers 
To: Michael Ellerman ,
christophe.le...@c-s.fr, seg...@kernel.crashing.org
Cc: Christophe Leroy ,
Benjamin Herrenschmidt ,
Paul Mackerras , npig...@gmail.com,
seg...@kernel.crashing.org, linuxppc-dev@lists.ozlabs.org,
linux-ker...@vger.kernel.org, clang-built-li...@googlegroups.com
Subject: Re: [PATCH v4 1/2] powerpc/uaccess: Implement unsafe_put_user()
 using 'asm goto'
Message-ID: <20200611223838.ga60...@google.com>
References:  
<23e680624680a9a5405f4b88740d2596d4b17c26.1587143308.git.christophe.le...@c-s.fr>

 <49ybky13szz9...@ozlabs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <49ybky13szz9...@ozlabs.org>

On Fri, May 29, 2020 at 02:24:16PM +1000, Michael Ellerman wrote:
> On Fri, 2020-04-17 at 17:08:51 UTC, Christophe Leroy wrote:
> > unsafe_put_user() is designed to take benefit of 'asm goto'.
> >
> > Instead of using the standard __put_user() approach and branch
> > based on the returned error, use 'asm goto' and make the
> > exception code branch directly to the error label. There is
> > no code anymore in the fixup section.
> >
> > This change significantly simplifies functions using
> > unsafe_put_user()
> ...
> >
> > Signed-off-by: Christophe Leroy 
> > Reviewed-by: Segher Boessenkool 
>
> Applied to powerpc topic/uaccess-ppc, thanks.
>
> https://git.kernel.org/powerpc/c/334710b1496af8a0960e70121f850e209c20958f
>
> cheers


Hello!  It seems this patch broke our ppc32 builds, and we had to
disable them [0]. :(

From what I can tell, although Michael mentioned this was merged on May
29, our CI of -next was green for ppc32 until June 4, and mainline then
went red on June 6.  So this patch only got 2 days of soak time before the
merge window opened.

A general issue with the -next workflow seems to be that patches get
different amounts of soak time.  For higher risk patches like this one,
can I please ask that they be held back a release if close to the merge
window?

Segher, Christophe, I suspect Clang is missing support for the %L and %U
output templates [1]. I've implemented support for some of these before
in Clang via the documentation at [2], but these seem to be machine
specific? Can you please point me to documentation/unit tests/source for
these so that I can figure out what they should be doing, and look into
implementing them in Clang?

(Apologies for the tone of this email; I had typed up a nice fuller
report with links, but it seemed that mutt wrote out an empty postponed
file, and I kind of just want to put my laptop in the garbage right now.
I suspect our internal SMTP tool will also mess up some headers, but
let's see (also, too lazy+angry right now to solve).)

[0] https://github.com/ClangBuiltLinux/continuous-integration/pull/279
[1] https://bugs.llvm.org/show_bug.cgi?id=46186
[2]  
https://gcc.gnu.org/onlinedocs/gccint/Output-Template.html#Output-Template


Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone

2020-06-11 Thread Atish Patra
On Sun, Jun 7, 2020 at 1:01 AM Alexandre Ghiti  wrote:
>
> This is a preparatory patch for relocatable kernel.
>
> The kernel used to be linked at PAGE_OFFSET address and used to be loaded
> physically at the beginning of the main memory. Therefore, we could use
> the linear mapping for the kernel mapping.
>
> But the relocated kernel base address will be different from PAGE_OFFSET
> and since in the linear mapping, two different virtual addresses cannot
> point to the same physical address, the kernel mapping needs to lie outside
> the linear mapping.
>
> In addition, because modules and BPF must be close to the kernel (inside
> +-2GB window), the kernel is placed at the end of the vmalloc zone minus
> 2GB, which leaves room for modules and BPF. The kernel could not be
> placed at the beginning of the vmalloc zone since other vmalloc
> allocations from the kernel could get all the +-2GB window around the
> kernel which would prevent new modules and BPF programs to be loaded.
>
> Signed-off-by: Alexandre Ghiti 
> Reviewed-by: Zong Li 
> ---
>  arch/riscv/boot/loader.lds.S |  3 +-
>  arch/riscv/include/asm/page.h| 10 +-
>  arch/riscv/include/asm/pgtable.h | 38 ++---
>  arch/riscv/kernel/head.S |  3 +-
>  arch/riscv/kernel/module.c   |  4 +--
>  arch/riscv/kernel/vmlinux.lds.S  |  3 +-
>  arch/riscv/mm/init.c | 58 +---
>  arch/riscv/mm/physaddr.c |  2 +-
>  8 files changed, 88 insertions(+), 33 deletions(-)
>
> diff --git a/arch/riscv/boot/loader.lds.S b/arch/riscv/boot/loader.lds.S
> index 47a5003c2e28..62d94696a19c 100644
> --- a/arch/riscv/boot/loader.lds.S
> +++ b/arch/riscv/boot/loader.lds.S
> @@ -1,13 +1,14 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
>
>  #include 
> +#include 
>
>  OUTPUT_ARCH(riscv)
>  ENTRY(_start)
>
>  SECTIONS
>  {
> -   . = PAGE_OFFSET;
> +   . = KERNEL_LINK_ADDR;
>
> .payload : {
> *(.payload)
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index 2d50f76efe48..48bb09b6a9b7 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -90,18 +90,26 @@ typedef struct page *pgtable_t;
>
>  #ifdef CONFIG_MMU
>  extern unsigned long va_pa_offset;
> +extern unsigned long va_kernel_pa_offset;
>  extern unsigned long pfn_base;
>  #define ARCH_PFN_OFFSET(pfn_base)
>  #else
>  #define va_pa_offset   0
> +#define va_kernel_pa_offset0
>  #define ARCH_PFN_OFFSET(PAGE_OFFSET >> PAGE_SHIFT)
>  #endif /* CONFIG_MMU */
>
>  extern unsigned long max_low_pfn;
>  extern unsigned long min_low_pfn;
> +extern unsigned long kernel_virt_addr;
>
>  #define __pa_to_va_nodebug(x)  ((void *)((unsigned long) (x) + va_pa_offset))
> -#define __va_to_pa_nodebug(x)  ((unsigned long)(x) - va_pa_offset)
> +#define linear_mapping_va_to_pa(x) ((unsigned long)(x) - va_pa_offset)
> +#define kernel_mapping_va_to_pa(x) \
> +   ((unsigned long)(x) - va_kernel_pa_offset)
> +#define __va_to_pa_nodebug(x)  \
> +   (((x) >= PAGE_OFFSET) ? \
> +   linear_mapping_va_to_pa(x) : kernel_mapping_va_to_pa(x))
>
>  #ifdef CONFIG_DEBUG_VIRTUAL
>  extern phys_addr_t __virt_to_phys(unsigned long x);
> diff --git a/arch/riscv/include/asm/pgtable.h 
> b/arch/riscv/include/asm/pgtable.h
> index 35b60035b6b0..94ef3b49dfb6 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -11,23 +11,29 @@
>
>  #include 
>
> -#ifndef __ASSEMBLY__
> -
> -/* Page Upper Directory not used in RISC-V */
> -#include 
> -#include 
> -#include 
> -#include 
> -
> -#ifdef CONFIG_MMU
> +#ifndef CONFIG_MMU
> +#define KERNEL_VIRT_ADDR   PAGE_OFFSET
> +#define KERNEL_LINK_ADDR   PAGE_OFFSET
> +#else
> +/*
> + * Leave 2GB for modules and BPF that must lie within a 2GB range around
> + * the kernel.
> + */
> +#define KERNEL_VIRT_ADDR   (VMALLOC_END - SZ_2G + 1)
> +#define KERNEL_LINK_ADDR   KERNEL_VIRT_ADDR
>
>  #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
>  #define VMALLOC_END  (PAGE_OFFSET - 1)
>  #define VMALLOC_START(PAGE_OFFSET - VMALLOC_SIZE)
>
>  #define BPF_JIT_REGION_SIZE(SZ_128M)
> -#define BPF_JIT_REGION_START   (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
> -#define BPF_JIT_REGION_END (VMALLOC_END)
> +#define BPF_JIT_REGION_START   PFN_ALIGN((unsigned long)&_end)
> +#define BPF_JIT_REGION_END (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE)
> +

As these mappings have changed a few times in recent months including
this one, I think it would be better to have virtual memory layout
documentation in RISC-V similar to other architectures.

If you can include the page table layout for 3/4 level page tables in
the same document, that would be really helpful.
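
To make the layout in the quoted hunk concrete, here is a small sketch of the address arithmetic and of the two-way virtual-to-physical translation. The macros mirror the ones quoted above, but the PAGE_OFFSET value is an assumption chosen for illustration, and struct offsets and va_to_pa() are hypothetical helpers rather than kernel code.

```c
#include <stdint.h>

/* Assumption: an Sv39-style PAGE_OFFSET picked for illustration only. */
#define PAGE_OFFSET		UINT64_C(0xffffffe000000000)
#define SZ_2G			UINT64_C(0x80000000)

#define KERN_VIRT_SIZE		(-PAGE_OFFSET)
#define VMALLOC_SIZE		(KERN_VIRT_SIZE >> 1)
#define VMALLOC_END		(PAGE_OFFSET - 1)
#define VMALLOC_START		(PAGE_OFFSET - VMALLOC_SIZE)
/* The kernel sits in the last 2GB of the vmalloc zone. */
#define KERNEL_VIRT_ADDR	(VMALLOC_END - SZ_2G + 1)

/* Hypothetical container for the two offsets the patch introduces. */
struct offsets {
	uint64_t va_pa_offset;		/* linear mapping: va - pa */
	uint64_t va_kernel_pa_offset;	/* kernel mapping: va - pa */
};

/* Same test as the patched __va_to_pa_nodebug(): pick the mapping by address. */
static uint64_t va_to_pa(const struct offsets *o, uint64_t va)
{
	if (va >= PAGE_OFFSET)
		return va - o->va_pa_offset;	/* linear mapping */

	return va - o->va_kernel_pa_offset;	/* kernel mapping */
}
```

With this PAGE_OFFSET the vmalloc zone spans 0xffffffd000000000 to 0xffffffdfffffffff and KERNEL_VIRT_ADDR works out to 0xffffffdf80000000, so every kernel address falls below PAGE_OFFSET and takes the second branch.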

> +#ifdef CONFIG_64BIT
> +#define VMALLOC_MODULE_START   BPF_JIT_REGION_END
> +#define VMALLOC_MODULE_END (((unsigned long)&_start & PAGE_MASK) + SZ_2G)
> +#endif
>
>  /*
>   * 

Re: [PATCH v2] powerpc: Remove inaccessible CMDLINE default

2020-06-11 Thread Chris Packham

On 11/06/20 5:46 pm, Christophe Leroy wrote:
>
>
> Le 11/06/2020 à 05:41, Chris Packham a écrit :
>> Since commit cbe46bd4f510 ("powerpc: remove CONFIG_CMDLINE #ifdef mess")
>> CONFIG_CMDLINE has always had a value regardless of CONFIG_CMDLINE_BOOL.
>>
>> For example:
>>
>>   $ make ARCH=powerpc defconfig
>>   $ cat .config
>>   # CONFIG_CMDLINE_BOOL is not set
>>   CONFIG_CMDLINE=""
>>
>> When enabling CONFIG_CMDLINE_BOOL this value is kept making the 'default
>> "..." if CONFIG_CMDLINE_BOOL' ineffective.
>>
>>   $ ./scripts/config --enable CONFIG_CMDLINE_BOOL
>>   $ cat .config
>>   CONFIG_CMDLINE_BOOL=y
>>   CONFIG_CMDLINE=""
>>
>> Remove CONFIG_CMDLINE_BOOL and the inaccessible default.
>
> You also have to remove all CONFIG_CMDLINE_BOOL from the defconfigs

OK. I'll do so as a follow-up patch and send a v3.

>
> Christophe
>
>>
>> Signed-off-by: Chris Packham 
>> Reviewed-by: Christophe Leroy 
>> ---
>> It took me a while to get round to sending a v2, for a refresher v1 
>> can be found here:
>>
>> http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20190802050232.22978-1-chris.pack...@alliedtelesis.co.nz/
>>  
>>
>>
>> Changes in v2:
>> - Rebase on top of Linus's tree
>> - Fix some typos in commit message
>> - Add review from Christophe
>> - Remove CONFIG_CMDLINE_BOOL
>>
>>   arch/powerpc/Kconfig | 6 +-
>>   1 file changed, 1 insertion(+), 5 deletions(-)
>>
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index 9fa23eb320ff..51abc59c3334 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -859,12 +859,8 @@ config PPC_DENORMALISATION
>>     Add support for handling denormalisation of single precision
>>     values.  Useful for bare metal only.  If unsure say Y here.
>>   -config CMDLINE_BOOL
>> -    bool "Default bootloader kernel arguments"
>> -
>>   config CMDLINE
>> -    string "Initial kernel command string" if CMDLINE_BOOL
>> -    default "console=ttyS0,9600 console=tty0 root=/dev/sda2" if 
>> CMDLINE_BOOL
>> +    string "Initial kernel command string"
>>   default ""
>>   help
>>     On some platforms, there is currently no way for the boot 
>> loader to
>>

Re: [PATCH v2] All arch: remove system call sys_sysctl

2020-06-11 Thread Eric W. Biederman
Rich Felker  writes:

> On Thu, Jun 11, 2020 at 12:01:11PM -0500, Eric W. Biederman wrote:
>> Rich Felker  writes:
>> 
>> > On Thu, Jun 11, 2020 at 06:43:00AM -0500, Eric W. Biederman wrote:
>> >> Xiaoming Ni  writes:
>> >> 
>> >> > Since the commit 61a47c1ad3a4dc ("sysctl: Remove the sysctl system 
>> >> > call"),
>> >> > sys_sysctl is actually unavailable: any input can only return an error.
>> >> >
>> >> > We have been warning about people using the sysctl system call for years
>> >> > and believe there are no more users.  Even if there are users of this
>> >> > interface if they have not complained or fixed their code by now they
>> >> > probably are not going to, so there is no point in warning them any
>> >> > longer.
>> >> >
>> >> > So completely remove sys_sysctl on all architectures.
>> >> 
>> >> 
>> >> 
>> >> >
>> >> > Signed-off-by: Xiaoming Ni 
>> >> >
>> >> > changes in v2:
>> >> >   According to Kees Cook's suggestion, completely remove sys_sysctl on 
>> >> > all arch
>> >> >   According to Eric W. Biederman's suggestion, update the commit log
>> >> >
>> >> > V1: 
>> >> > https://lore.kernel.org/lkml/1591683605-8585-1-git-send-email-nixiaom...@huawei.com/
>> >> >   Delete the code of sys_sysctl and return -ENOSYS directly at the 
>> >> > function entry
>> >> > ---
>> >> >  include/uapi/linux/sysctl.h|  15 --
>> >> [snip]
>> >> 
>> >> > diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h
>> >> > index 27c1ed2..84b44c3 100644
>> >> > --- a/include/uapi/linux/sysctl.h
>> >> > +++ b/include/uapi/linux/sysctl.h
>> >> > @@ -27,21 +27,6 @@
>> >> >  #include 
>> >> >  #include 
>> >> >  
>> >> > -#define CTL_MAXNAME 10 /* how many path components do we allow 
>> >> > in a
>> >> > -  call to sysctl?   In other words, 
>> >> > what is
>> >> > -  the largest acceptable value for the 
>> >> > nlen
>> >> > -  member of a struct __sysctl_args to 
>> >> > have? */
>> >> > -
>> >> > -struct __sysctl_args {
>> >> > -   int __user *name;
>> >> > -   int nlen;
>> >> > -   void __user *oldval;
>> >> > -   size_t __user *oldlenp;
>> >> > -   void __user *newval;
>> >> > -   size_t newlen;
>> >> > -   unsigned long __unused[4];
>> >> > -};
>> >> > -
>> >> >  /* Define sysctl names first */
>> >> >  
>> >> >  /* Top-level names: */
>> >> [snip]
>> >> 
>> >> The uapi header change does not make sense.  The entire point of the
>> >> header is to allow userspace programs to be able to call sys_sysctl.
>> >> It either needs to all stay or all go.
>> >> 
>> >> As the concern with the uapi header is about userspace programs being
>> >> able to compile please leave the header for now.
>> >> 
>> >> We should leave auditing userspace and seeing if userspace code will
>> >> still compile if we remove this header for a separate patch.  The
>> >> concerns and justifications for the uapi header are completely different
>> >> than for removing the sys_sysctl implementation.
>> >> 
>> >> Otherwise
>> >> Acked-by: "Eric W. Biederman" 
>> >
>> > The UAPI header should be kept because it's defining an API not just
>> > for the kernel the headers are supplied with, but for all past
>> > kernels. In particular programs needing a failsafe CSPRNG source that
>> > works on old kernels may (do) use this as a fallback only if modern
>> > syscalls are missing. Removing the syscall is no problem since it
>> > won't be used, but if you remove the types/macros from the UAPI
>> > headers, they'll have to copy that into their own sources.
>> 
>> May we assume you know of at least one piece of userspace that will fail
>> to compile if this header file is removed?
>
> I know at least one piece of software is using SYS_sysctl for a
> fallback CSPRNG source. I'm not 100% sure that they're using the
> kernel headers; they might have copied it already. I'm also not sure
> how many there are.
>
> Regardless, I think the principle stands. There's no need to remove
> definitions that are essentially maintenance-free now that the
> interface is no longer available in new kernels, and doing so
> contributes to the myth that you're supposed to use kernel headers
> matching runtime kernel rather than it always being safe to use latest
> headers.

If there is no one using the definitions, removing them saves people
having to remember what they are there for.

The big rule is don't break userspace.  The goal is to allow people to
upgrade their kernel without needing to worry about userspace breaking,
and to be able to downgrade to the extent possible to help in tracking
bugs.

Not being able to compile userspace seems like a pretty clear cut case.
Although there are some fuzzy edges given the history of the kernel
headers.  Things like your libc requiring kernel headers to be processed
before they can be used.  I think there are still some kernel headers
that have that restriction when 

Re: [PATCH v2] All arch: remove system call sys_sysctl

2020-06-11 Thread Eric W. Biederman
Rich Felker  writes:

> On Thu, Jun 11, 2020 at 06:43:00AM -0500, Eric W. Biederman wrote:
>> Xiaoming Ni  writes:
>> 
>> > Since the commit 61a47c1ad3a4dc ("sysctl: Remove the sysctl system call"),
>> > sys_sysctl is actually unavailable: any input can only return an error.
>> >
>> > We have been warning about people using the sysctl system call for years
>> > and believe there are no more users.  Even if there are users of this
>> > interface if they have not complained or fixed their code by now they
>> > probably are not going to, so there is no point in warning them any
>> > longer.
>> >
>> > So completely remove sys_sysctl on all architectures.
>> 
>> 
>> 
>> >
>> > Signed-off-by: Xiaoming Ni 
>> >
>> > changes in v2:
>> >   According to Kees Cook's suggestion, completely remove sys_sysctl on all 
>> > arch
>> >   According to Eric W. Biederman's suggestion, update the commit log
>> >
>> > V1: 
>> > https://lore.kernel.org/lkml/1591683605-8585-1-git-send-email-nixiaom...@huawei.com/
>> >   Delete the code of sys_sysctl and return -ENOSYS directly at the 
>> > function entry
>> > ---
>> >  include/uapi/linux/sysctl.h|  15 --
>> [snip]
>> 
>> > diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h
>> > index 27c1ed2..84b44c3 100644
>> > --- a/include/uapi/linux/sysctl.h
>> > +++ b/include/uapi/linux/sysctl.h
>> > @@ -27,21 +27,6 @@
>> >  #include 
>> >  #include 
>> >  
>> > -#define CTL_MAXNAME 10/* how many path components do we allow 
>> > in a
>> > - call to sysctl?   In other words, what is
>> > - the largest acceptable value for the nlen
>> > - member of a struct __sysctl_args to have? */
>> > -
>> > -struct __sysctl_args {
>> > -  int __user *name;
>> > -  int nlen;
>> > -  void __user *oldval;
>> > -  size_t __user *oldlenp;
>> > -  void __user *newval;
>> > -  size_t newlen;
>> > -  unsigned long __unused[4];
>> > -};
>> > -
>> >  /* Define sysctl names first */
>> >  
>> >  /* Top-level names: */
>> [snip]
>> 
>> The uapi header change does not make sense.  The entire point of the
>> header is to allow userspace programs to be able to call sys_sysctl.
>> It either needs to all stay or all go.
>> 
>> As the concern with the uapi header is about userspace programs being
>> able to compile please leave the header for now.
>> 
>> We should leave auditing userspace and seeing if userspace code will
>> still compile if we remove this header for a separate patch.  The
>> concerns and justifications for the uapi header are completely different
>> than for removing the sys_sysctl implementation.
>> 
>> Otherwise
>> Acked-by: "Eric W. Biederman" 
>
> The UAPI header should be kept because it's defining an API not just
> for the kernel the headers are supplied with, but for all past
> kernels. In particular programs needing a failsafe CSPRNG source that
> works on old kernels may (do) use this as a fallback only if modern
> syscalls are missing. Removing the syscall is no problem since it
> won't be used, but if you remove the types/macros from the UAPI
> headers, they'll have to copy that into their own sources.
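
A sketch of what such a fallback with a private copy of the definitions could look like follows. It is illustrative only and not taken from any particular project: struct local_sysctl_args and legacy_sysctl_random() are hypothetical names, the numeric values for CTL_KERN, KERN_RANDOM and RANDOM_UUID are quoted from memory of the UAPI header and should be checked against it, and the syscall number is used only where the libc headers define SYS__sysctl.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Private copy of the layout the patch removes from the UAPI header. */
struct local_sysctl_args {
	int *name;
	int nlen;
	void *oldval;
	size_t *oldlenp;
	void *newval;
	size_t newlen;
	unsigned long unused[4];
};

/*
 * Try to read up to 16 random bytes through the legacy syscall.
 * Returns the number of bytes read, or -1 with errno set.
 */
static long legacy_sysctl_random(void *buf, size_t len)
{
#ifdef SYS__sysctl
	/* Assumed values: CTL_KERN = 1, KERN_RANDOM = 40, RANDOM_UUID = 6. */
	int name[] = { 1, 40, 6 };
	size_t outlen = len > 16 ? 16 : len;
	struct local_sysctl_args args = {
		.name = name,
		.nlen = 3,
		.oldval = buf,
		.oldlenp = &outlen,
	};

	if (syscall(SYS__sysctl, &args) != 0)
		return -1;

	return (long)outlen;
#else
	(void)buf;
	(void)len;
	errno = ENOSYS;
	return -1;
#endif
}
```

On a kernel without the syscall this is expected to fail, and a caller would then move on to its next entropy source.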

May we assume you know of at least one piece of userspace that will fail
to compile if this header file is removed?

Eric



Re: [PATCH v2] All arch: remove system call sys_sysctl

2020-06-11 Thread Rich Felker
On Thu, Jun 11, 2020 at 12:01:11PM -0500, Eric W. Biederman wrote:
> Rich Felker  writes:
> 
> > On Thu, Jun 11, 2020 at 06:43:00AM -0500, Eric W. Biederman wrote:
> >> Xiaoming Ni  writes:
> >> 
> >> > Since the commit 61a47c1ad3a4dc ("sysctl: Remove the sysctl system 
> >> > call"),
> >> > sys_sysctl is actually unavailable: any input can only return an error.
> >> >
> >> > We have been warning about people using the sysctl system call for years
> >> > and believe there are no more users.  Even if there are users of this
> >> > interface if they have not complained or fixed their code by now they
> >> > probably are not going to, so there is no point in warning them any
> >> > longer.
> >> >
> >> > So completely remove sys_sysctl on all architectures.
> >> 
> >> 
> >> 
> >> >
> >> > Signed-off-by: Xiaoming Ni 
> >> >
> >> > changes in v2:
> >> >   According to Kees Cook's suggestion, completely remove sys_sysctl on 
> >> > all arch
> >> >   According to Eric W. Biederman's suggestion, update the commit log
> >> >
> >> > V1: 
> >> > https://lore.kernel.org/lkml/1591683605-8585-1-git-send-email-nixiaom...@huawei.com/
> >> >   Delete the code of sys_sysctl and return -ENOSYS directly at the 
> >> > function entry
> >> > ---
> >> >  include/uapi/linux/sysctl.h|  15 --
> >> [snip]
> >> 
> >> > diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h
> >> > index 27c1ed2..84b44c3 100644
> >> > --- a/include/uapi/linux/sysctl.h
> >> > +++ b/include/uapi/linux/sysctl.h
> >> > @@ -27,21 +27,6 @@
> >> >  #include 
> >> >  #include 
> >> >  
> >> > -#define CTL_MAXNAME 10  /* how many path components do we allow 
> >> > in a
> >> > -   call to sysctl?   In other words, 
> >> > what is
> >> > -   the largest acceptable value for the 
> >> > nlen
> >> > -   member of a struct __sysctl_args to 
> >> > have? */
> >> > -
> >> > -struct __sysctl_args {
> >> > -int __user *name;
> >> > -int nlen;
> >> > -void __user *oldval;
> >> > -size_t __user *oldlenp;
> >> > -void __user *newval;
> >> > -size_t newlen;
> >> > -unsigned long __unused[4];
> >> > -};
> >> > -
> >> >  /* Define sysctl names first */
> >> >  
> >> >  /* Top-level names: */
> >> [snip]
> >> 
> >> The uapi header change does not make sense.  The entire point of the
> >> header is to allow userspace programs to be able to call sys_sysctl.
> >> It either needs to all stay or all go.
> >> 
> >> As the concern with the uapi header is about userspace programs being
> >> able to compile please leave the header for now.
> >> 
> >> We should leave auditing userspace and seeing if userspace code will
> >> still compile if we remove this header for a separate patch.  The
> >> concerns and justifications for the uapi header are completely different
> >> than for removing the sys_sysctl implementation.
> >> 
> >> Otherwise
> >> Acked-by: "Eric W. Biederman" 
> >
> > The UAPI header should be kept because it's defining an API not just
> > for the kernel the headers are supplied with, but for all past
> > kernels. In particular programs needing a failsafe CSPRNG source that
> > works on old kernels may (do) use this as a fallback only if modern
> > syscalls are missing. Removing the syscall is no problem since it
> > won't be used, but if you remove the types/macros from the UAPI
> > headers, they'll have to copy that into their own sources.
> 
> May we assume you know of at least one piece of userspace that will fail
> to compile if this header file is removed?

I know at least one piece of software is using SYS_sysctl for a
fallback CSPRNG source. I'm not 100% sure that they're using the
kernel headers; they might have copied it already. I'm also not sure
how many there are.

Regardless, I think the principle stands. There's no need to remove
definitions that are essentially maintenance-free now that the
interface is no longer available in new kernels, and doing so
contributes to the myth that you're supposed to use kernel headers
matching runtime kernel rather than it always being safe to use latest
headers.

Rich


Re: [PATCH v2] All arch: remove system call sys_sysctl

2020-06-11 Thread Rich Felker
On Thu, Jun 11, 2020 at 06:43:00AM -0500, Eric W. Biederman wrote:
> Xiaoming Ni  writes:
> 
> > Since the commit 61a47c1ad3a4dc ("sysctl: Remove the sysctl system call"),
> > sys_sysctl is actually unavailable: any input can only return an error.
> >
> > We have been warning about people using the sysctl system call for years
> > and believe there are no more users.  Even if there are users of this
> > interface if they have not complained or fixed their code by now they
> > probably are not going to, so there is no point in warning them any
> > longer.
> >
> > So completely remove sys_sysctl on all architectures.
> 
> 
> 
> >
> > Signed-off-by: Xiaoming Ni 
> >
> > changes in v2:
> >   According to Kees Cook's suggestion, completely remove sys_sysctl on all 
> > arch
> >   According to Eric W. Biederman's suggestion, update the commit log
> >
> > V1: 
> > https://lore.kernel.org/lkml/1591683605-8585-1-git-send-email-nixiaom...@huawei.com/
> >   Delete the code of sys_sysctl and return -ENOSYS directly at the function 
> > entry
> > ---
> >  include/uapi/linux/sysctl.h|  15 --
> [snip]
> 
> > diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h
> > index 27c1ed2..84b44c3 100644
> > --- a/include/uapi/linux/sysctl.h
> > +++ b/include/uapi/linux/sysctl.h
> > @@ -27,21 +27,6 @@
> >  #include 
> >  #include 
> >  
> > -#define CTL_MAXNAME 10 /* how many path components do we allow 
> > in a
> > -  call to sysctl?   In other words, what is
> > -  the largest acceptable value for the nlen
> > -  member of a struct __sysctl_args to have? */
> > -
> > -struct __sysctl_args {
> > -   int __user *name;
> > -   int nlen;
> > -   void __user *oldval;
> > -   size_t __user *oldlenp;
> > -   void __user *newval;
> > -   size_t newlen;
> > -   unsigned long __unused[4];
> > -};
> > -
> >  /* Define sysctl names first */
> >  
> >  /* Top-level names: */
> [snip]
> 
> The uapi header change does not make sense.  The entire point of the
> header is to allow userspace programs to be able to call sys_sysctl.
> It either needs to all stay or all go.
> 
> As the concern with the uapi header is about userspace programs being
> able to compile please leave the header for now.
> 
> We should leave auditing userspace and seeing if userspace code will
> still compile if we remove this header for a separate patch.  The
> concerns and justifications for the uapi header are completely different
> than for removing the sys_sysctl implementation.
> 
> Otherwise
> Acked-by: "Eric W. Biederman" 

The UAPI header should be kept because it's defining an API not just
for the kernel the headers are supplied with, but for all past
kernels. In particular programs needing a failsafe CSPRNG source that
works on old kernels may (do) use this as a fallback only if modern
syscalls are missing. Removing the syscall is no problem since it
won't be used, but if you remove the types/macros from the UAPI
headers, they'll have to copy that into their own sources.

Rich
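
To illustrate the point made in the message above, here is a minimal sketch of a userspace CSPRNG helper that prefers getrandom and only falls back to the old sysctl syscall when the toolchain still exposes its number. It is an assumption of this sketch, not code from any project named in the thread: struct compat_sysctl_args, the COMPAT_* constants, fill_random() and demo_seed() are hypothetical names, and the struct is a private copy of the layout that the patch would drop from the UAPI header.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Private copy of the struct __sysctl_args layout (hypothetical name). */
struct compat_sysctl_args {
    int *name;
    int nlen;
    void *oldval;
    size_t *oldlenp;
    void *newval;
    size_t newlen;
    unsigned long unused[4];
};

/* Private copies of the kernel.random.uuid sysctl path components. */
enum { COMPAT_CTL_KERN = 1, COMPAT_KERN_RANDOM = 40, COMPAT_RANDOM_UUID = 6 };

#ifdef SYS__sysctl
/* Legacy path: every call yields one 16-byte random UUID. */
static int random_via_sysctl(unsigned char *buf, size_t len)
{
    int name[3] = { COMPAT_CTL_KERN, COMPAT_KERN_RANDOM, COMPAT_RANDOM_UUID };
    size_t done = 0;

    while (done < len) {
        unsigned char uuid[16];
        size_t got = sizeof uuid;
        struct compat_sysctl_args args;

        memset(&args, 0, sizeof args);
        args.name = name;
        args.nlen = 3;
        args.oldval = uuid;
        args.oldlenp = &got;
        if (syscall(SYS__sysctl, &args) != 0 || got == 0)
            return -1;
        size_t n = got < len - done ? got : len - done;
        memcpy(buf + done, uuid, n);
        done += n;
    }
    return 0;
}
#endif

/* Returns 0 when buf was filled, -1 when no source was usable. */
static int fill_random(unsigned char *buf, size_t len)
{
#ifdef SYS_getrandom
    size_t done = 0;

    while (done < len) {
        long n = syscall(SYS_getrandom, buf + done, len - done, 0);
        if (n <= 0)
            break;              /* e.g. ENOSYS on an old kernel */
        done += (size_t)n;
    }
    if (done == len)
        return 0;
#endif
#ifdef SYS__sysctl
    return random_via_sysctl(buf, len);
#else
    return -1;
#endif
}

/* Usage sketch: request one 16-byte seed. */
static void demo_seed(void)
{
    unsigned char seed[16];

    assert(fill_random(seed, sizeof seed) == 0);
}
```

The sketch never includes linux/sysctl.h, which is exactly the duplication the message warns about: once the definitions leave the UAPI header, each such program has to carry its own copy.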


Re: Linux powerpc new system call instruction and ABI

2020-06-11 Thread Segher Boessenkool
Hi!

On Thu, Jun 11, 2020 at 06:12:01PM +1000, Nicholas Piggin wrote:
> Calling convention
> --
> The proposal is for scv 0 to provide the standard Linux system call ABI 
> with the following differences from sc convention[1]:
> 
> - lr is to be volatile across scv calls. This is necessary because the 
>   scv instruction clobbers lr. From previous discussion, this should be 
>   possible to deal with in GCC clobbers and CFI.
> 
> - cr1 and cr5-cr7 are volatile. This matches the C ABI and would allow the
>   kernel system call exit to avoid restoring the volatile cr registers
>   (although we probably still would anyway to avoid information leaks).
> 
> - Error handling: The consensus among kernel, glibc, and musl is to move to
>   using negative return values in r3 rather than CR0[SO]=1 to indicate error,
>   which matches most other architectures, and is closer to a function call.

What about cr0 then?  Will it be volatile as well (exactly like for
function calls)?

> Notes
> -
> - r0,r4-r8 are documented as volatile in the ABI, but the kernel patch as
>   submitted currently preserves them. This is to leave room for deciding
>   which way to go with these.

The kernel has to set it to *something* that doesn't leak information ;-)


Segher


Re: [PATCH v5 2/4] riscv: Introduce CONFIG_RELOCATABLE

2020-06-11 Thread Alex Ghiti

Hi Jerome,

On 6/10/20 at 10:10 AM, Jerome Forissier wrote:

On 6/7/20 9:59 AM, Alexandre Ghiti wrote:
[...]


+config RELOCATABLE
+   bool
+   depends on MMU
+   help
+  This builds a kernel as a Position Independent Executable (PIE),
+  which retains all relocation metadata required to relocate the
+  kernel binary at runtime to a different virtual address than the
+  address it was linked at.
+  Since RISCV uses the RELA relocation format, this requires a
+  relocation pass at runtime even if the kernel is loaded at the
+  same address it was linked at.

Is this true? I thought that the GNU linker would write the "proper"
values by default, contrary to the LLVM linker (ld.lld) which would need
a special flag: --apply-dynamic-relocs (by default the relocated places
are set to zero). At least, it is my experience with Aarch64 on a
different project. So, sorry if I'm talking nonsense here -- I have not
looked at the details.




It seems that you're right, at least for aarch64, since they explicitly 
pass the --no-apply-dynamic-relocs option. I retried booting without 
relocating at runtime, and it fails on riscv. Could this be arch-specific?
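
For reference, the runtime pass under discussion can be sketched as below. This is a simplified userspace model under stated assumptions, not the kernel's implementation: struct model_rela mirrors the ELF64 RELA entry layout, MODEL_R_RELATIVE stands in for R_RISCV_RELATIVE, and apply_relative_relocs() and demo_relocs() are hypothetical names. With RELA the addend lives in the relocation entry, so the target word only holds a usable value if either the linker pre-applied it or this pass runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_R_RELATIVE 3u     /* stand-in for R_RISCV_RELATIVE */

/* Same field layout as an ELF64 RELA entry (hypothetical name). */
struct model_rela {
    uint64_t r_offset;          /* link-time address of the word to patch */
    uint64_t r_info;            /* relocation type in the low 32 bits */
    int64_t  r_addend;          /* link-time value of the target */
};

/* Patch every RELATIVE entry; returns how many entries were applied. */
static size_t apply_relative_relocs(unsigned char *image, size_t image_size,
                                    uint64_t link_base, uint64_t load_base,
                                    const struct model_rela *rela, size_t count)
{
    uint64_t delta = load_base - link_base;
    size_t applied = 0;

    for (size_t i = 0; i < count; i++) {
        uint64_t where = rela[i].r_offset - link_base;
        uint64_t value = (uint64_t)rela[i].r_addend + delta;

        if ((rela[i].r_info & 0xffffffffu) != MODEL_R_RELATIVE)
            continue;           /* other types are out of scope here */
        if (where > image_size || image_size - where < sizeof value)
            continue;           /* never write outside the image */
        memcpy(image + where, &value, sizeof value);
        applied++;
    }
    return applied;
}

/* Usage sketch: image loaded at its link address, target word left zero. */
static void demo_relocs(void)
{
    unsigned char image[16] = { 0 };
    const struct model_rela rela = { 0x1008, MODEL_R_RELATIVE, 0x1000 };
    uint64_t word;

    assert(apply_relative_relocs(image, sizeof image, 0x1000, 0x1000,
                                 &rela, 1) == 1);
    memcpy(&word, image + 8, sizeof word);
    assert(word == 0x1000);
}
```

In the demo the load address equals the link address, yet the word is still zero until the pass runs. That matches the behaviour described for a linker that does not pre-apply dynamic relocations.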


Thanks,

Alex



Re: [PATCH v11 5/6] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods

2020-06-11 Thread Vaibhav Jain
Dan Williams  writes:

> On Wed, Jun 10, 2020 at 5:10 AM Vaibhav Jain  wrote:
>>
>> Dan Williams  writes:
>>
>> > On Tue, Jun 9, 2020 at 10:54 AM Vaibhav Jain  wrote:
>> >>
>> >> Thanks Dan for the consideration and taking time to look into this.
>> >>
>> >> My responses below:
>> >>
>> >> Dan Williams  writes:
>> >>
>> >> > On Mon, Jun 8, 2020 at 5:16 PM kernel test robot  wrote:
>> >> >>
>> >> >> Hi Vaibhav,
>> >> >>
>> >> >> Thank you for the patch! Perhaps something to improve:
>> >> >>
>> >> >> [auto build test WARNING on powerpc/next]
>> >> >> [also build test WARNING on linus/master v5.7 next-20200605]
>> >> >> [cannot apply to linux-nvdimm/libnvdimm-for-next scottwood/next]
>> >> >> [if your patch is applied to the wrong git tree, please drop us a note 
>> >> >> to help
>> >> >> improve the system. BTW, we also suggest to use '--base' option to 
>> >> >> specify the
>> >> >> base tree in git format-patch, please see 
>> >> >> https://stackoverflow.com/a/37406982]
>> >> >>
>> >> >> url:
>> >> >> https://github.com/0day-ci/linux/commits/Vaibhav-Jain/powerpc-papr_scm-Add-support-for-reporting-nvdimm-health/20200607-211653
>> >> >> base:   
>> >> >> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
>> >> >> config: powerpc-randconfig-r016-20200607 (attached as .config)
>> >> >> compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 
>> >> >> e429cffd4f228f70c1d9df0e5d77c08590dd9766)
>> >> >> reproduce (this is a W=1 build):
>> >> >> wget 
>> >> >> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross
>> >> >>  -O ~/bin/make.cross
>> >> >> chmod +x ~/bin/make.cross
>> >> >> # install powerpc cross compiling tool for clang build
>> >> >> # apt-get install binutils-powerpc-linux-gnu
>> >> >> # save the attached .config to linux build tree
>> >> >> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross 
>> >> >> ARCH=powerpc
>> >> >>
>> >> >> If you fix the issue, kindly add following tag as appropriate
>> >> >> Reported-by: kernel test robot 
>> >> >>
>> >> >> All warnings (new ones prefixed by >>, old ones prefixed by <<):
>> >> >>
>> >> >> In file included from :1:
>> >> >> >> ./usr/include/asm/papr_pdsm.h:69:20: warning: field 'hdr' with 
>> >> >> >> variable sized type 'struct nd_cmd_pkg' not at the end of a struct 
>> >> >> >> or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
>> >> >> struct nd_cmd_pkg hdr;  /* Package header containing sub-cmd */
>> >> >
>> >> > Hi Vaibhav,
>> >> >
>> >> [.]
>> >> > This looks like it's going to need another round to get this fixed. I
>> >> > don't think 'struct nd_pdsm_cmd_pkg' should embed a definition of
>> >> > 'struct nd_cmd_pkg'. An instance of 'struct nd_cmd_pkg' carries a
>> >> > payload that is the 'pdsm' specifics. As the code has it now it's
>> >> > defined as a superset of 'struct nd_cmd_pkg' and the compiler warning
>> >> > is pointing out a real 'struct' organization problem.
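
The layout suggested in the paragraph above, where the package header carries the command-specific data as its payload instead of being embedded in a larger struct, can be sketched as follows. All names here (model_pkg_hdr, model_pdsm_payload, model_pkg_new, model_pkg_body) are hypothetical stand-ins and not the real ndctl or kernel definitions; only the general shape is taken from the discussion.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Header ending in a flexible array member, like a generic command package. */
struct model_pkg_hdr {
    uint32_t command;
    uint32_t size_in;           /* number of payload bytes that follow */
    unsigned char payload[];
};

/* Command-specific body that travels inside the payload. */
struct model_pdsm_payload {
    uint32_t status;
    uint32_t health;
};

/* Allocate header and body as one buffer; the header stays the outer type. */
static struct model_pkg_hdr *model_pkg_new(uint32_t command,
                                           const struct model_pdsm_payload *body)
{
    struct model_pkg_hdr *hdr = calloc(1, sizeof *hdr + sizeof *body);

    if (!hdr)
        return NULL;
    hdr->command = command;
    hdr->size_in = (uint32_t)sizeof *body;
    memcpy(hdr->payload, body, sizeof *body);
    return hdr;
}

/* Copy the body back out; returns -1 if the payload size does not match. */
static int model_pkg_body(const struct model_pkg_hdr *hdr,
                          struct model_pdsm_payload *out)
{
    if (hdr->size_in != sizeof *out)
        return -1;
    memcpy(out, hdr->payload, sizeof *out);
    return 0;
}
```

Because no struct places the variable-sized header anywhere but at the start of the allocation, this arrangement should not trigger -Wgnu-variable-sized-type-not-at-end.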
>> >> >
>> >> > Given the soak time needed in -next after the code is finalized this
>> >> > there's no time to do another round of updates and still make the v5.8
>> >> > merge window.
>> >>
>> >> Agreed that this looks bad, a solution will probably need some more
>> >> review cycles resulting in this series missing the merge window.
>> >>
>> >> I am investigating into the possible solutions for this reported issue
>> >> and made few observations:
>> >>
>> >> I see command pkg for Intel, Hpe, Msft and Hyperv families using a
>> >> similar layout of embedding nd_cmd_pkg at the head of the
>> >> command-pkg. struct nd_pdsm_cmd_pkg is following the same pattern.
>> >>
>> >> struct nd_pdsm_cmd_pkg {
>> >> struct nd_cmd_pkg hdr;
>> >> /* other members */
>> >> };
>> >>
>> >> struct ndn_pkg_msft {
>> >> struct nd_cmd_pkg gen;
>> >> /* other members */
>> >> };
>> >> struct nd_pkg_intel {
>> >> struct nd_cmd_pkg gen;
>> >> /* other members */
>> >> };
>> >> struct ndn_pkg_hpe1 {
>> >> struct nd_cmd_pkg gen;
>> >> /* other members */
>> [.]
>> >
>> > In those cases the other members are a union and there is no second
>> > variable length array. Perhaps that is why those definitions are not
>> > getting flagged? I'm not seeing anything in ndctl build options that
>> > would explicitly disable this warning, but I'm not sure if the ndctl
>> > build environment is missing this build warning by accident.
>>
>> I tried building ndctl master with clang-10 with CC=clang and
>> CFLAGS="". Seeing the same warning messages reported for all command
>> package struct for existing command families.
>>
>> ./hpe1.h:334:20: warning: field 'gen' with variable sized type 'struct 
>> nd_cmd_pkg' not at the end of a struct or class is a GNU extension 
>> [-Wgnu-variable-sized-type-not-at-end]
>> struct nd_cmd_pkg gen;
>>   ^
>> ./msft.h:59:20: warning: field 'gen' with variable sized type 'struct 
>> nd_cmd_pkg' not at 

Re: PowerPC KVM-PR issue

2020-06-11 Thread Christian Zigotzky

On 10 June 2020 at 01:23 pm, Christian Zigotzky wrote:

On 10 June 2020 at 11:06 am, Christian Zigotzky wrote:

On 10 June 2020 at 00:18 am, Christian Zigotzky wrote:

Hello,

KVM-PR doesn't work anymore on my Nemo board [1]. I figured out that 
the Git kernels and the kernel 5.7 are affected.


Error message: Fienix kernel: kvmppc_exit_pr_progint: emulation at 
700 failed ()


I can boot virtual QEMU PowerPC machines with KVM-PR with the kernel 
5.6 without any problems on my Nemo board.


I tested it with QEMU 2.5.0 and QEMU 5.0.0 today.

Could you please check KVM-PR on your PowerPC machine?

Thanks,
Christian

[1] https://en.wikipedia.org/wiki/AmigaOne_X1000


I figured out that the PowerPC updates 5.7-1 [1] are responsible for 
the KVM-PR issue. Please test KVM-PR on your PowerPC machines and 
check the PowerPC updates 5.7-1 [1].


Thanks

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d38c07afc356ddebaa3ed8ecb3f553340e05c969



I tested the latest Git kernel with Mac-on-Linux/KVM-PR today. 
Unfortunately I can't use KVM-PR with MoL anymore because of this 
issue (see screenshots [1]). Please check the PowerPC updates 5.7-1.


Thanks

[1]
- 
https://i.pinimg.com/originals/0c/b3/64/0cb364a40241fa2b7f297d4272bbb8b7.png
- 
https://i.pinimg.com/originals/9a/61/d1/9a61d170b1c9f514f7a78a3014ffd18f.png



Hi All,

I bisected today because of the KVM-PR issue.

Result:

9600f261acaaabd476d7833cec2dd20f2919f1a0 is the first bad commit
commit 9600f261acaaabd476d7833cec2dd20f2919f1a0
Author: Nicholas Piggin 
Date:   Wed Feb 26 03:35:21 2020 +1000

    powerpc/64s/exception: Move KVM test to common code

    This allows more code to be moved out of unrelocated regions. The
    system call KVMTEST is changed to be open-coded and remain in the
    tramp area to avoid having to move it to entry_64.S. The custom nature
    of the system call entry code means the hcall case can be made more
    streamlined than regular interrupt handlers.

    mpe: Incorporate fix from Nick:

    Moving KVM test to the common entry code missed the case of HMI and
    MCE, which do not do __GEN_COMMON_ENTRY (because they don't want to
    switch to virt mode).

    This means a MCE or HMI exception that is taken while KVM is running a
    guest context will not be switched out of that context, and KVM won't
    be notified. Found by running sigfuz in guest with patched host on
    POWER9 DD2.3, which causes some TM related HMI interrupts (which are
    expected and supposed to be handled by KVM).

    This fix adds a __GEN_REALMODE_COMMON_ENTRY for those handlers to add
    the KVM test. This makes them look a little more like other handlers
    that all use __GEN_COMMON_ENTRY.

    Signed-off-by: Nicholas Piggin 
    Signed-off-by: Michael Ellerman 
    Link: 
https://lore.kernel.org/r/20200225173541.1549955-13-npig...@gmail.com


:040000 040000 ec21cec22d165f8696d69532734cb2985d532cb0 87dd49a9cd7202ec79350e8ca26cea01f1dbd93d M	arch


-

The following commit is the problem: powerpc/64s/exception: Move KVM 
test to common code [1]


These changes were included in the PowerPC updates 5.7-1. [2]

Another test:

git checkout d38c07afc356ddebaa3ed8ecb3f553340e05c969 (PowerPC updates 
5.7-1 [2] ) -> KVM-PR doesn't work.


After that: git revert d38c07afc356ddebaa3ed8ecb3f553340e05c969 -m 1 -> 
KVM-PR works.


Could you please check the first bad commit? [1]

Thanks,
Christian


[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9600f261acaaabd476d7833cec2dd20f2919f1a0
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d38c07afc356ddebaa3ed8ecb3f553340e05c969


Re: [PATCH] powerpc/64: indirect function call use bctrl rather than blrl in ret_from_kernel_thread

2020-06-11 Thread Christophe Leroy




On 11/06/2020 at 14:11, Nicholas Piggin wrote:

Using blrl for an indirect function call is not recommended, as it may
corrupt the link stack predictor.

This is not a performance critical path but this should be fixed for
consistency.


There's exactly the same code in entry_32.S.
Should it be changed there too ... for consistency :) ?

ppc32 also uses blrl for calling the syscall handler; should that be changed 
as well?


Christophe



Signed-off-by: Nicholas Piggin 
---
  arch/powerpc/kernel/entry_64.S | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 223c4f008e63..f59a17471d4d 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -400,12 +400,12 @@ _GLOBAL(ret_from_fork)
  _GLOBAL(ret_from_kernel_thread)
bl  schedule_tail
REST_NVGPRS(r1)
-   mtlr    r14
+   mtctr   r14
mr  r3,r15
  #ifdef PPC64_ELF_ABI_v2
mr  r12,r14
  #endif
-   blrl
+   bctrl
li  r3,0
b   .Lsyscall_exit
  



[PATCH] powerpc/64: indirect function call use bctrl rather than blrl in ret_from_kernel_thread

2020-06-11 Thread Nicholas Piggin
Using blrl for an indirect function call is not recommended, as it may
corrupt the link stack predictor.

This is not a performance critical path but this should be fixed for
consistency.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/entry_64.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 223c4f008e63..f59a17471d4d 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -400,12 +400,12 @@ _GLOBAL(ret_from_fork)
 _GLOBAL(ret_from_kernel_thread)
bl  schedule_tail
REST_NVGPRS(r1)
-   mtlr    r14
+   mtctr   r14
mr  r3,r15
 #ifdef PPC64_ELF_ABI_v2
mr  r12,r14
 #endif
-   blrl
+   bctrl
li  r3,0
b   .Lsyscall_exit
 
-- 
2.23.0



Re: [PATCH v2] All arch: remove system call sys_sysctl

2020-06-11 Thread Eric W. Biederman
Xiaoming Ni  writes:

> Since the commit 61a47c1ad3a4dc ("sysctl: Remove the sysctl system call"),
> sys_sysctl is actually unavailable: any input can only return an error.
>
> We have been warning about people using the sysctl system call for years
> and believe there are no more users.  Even if there are users of this
> interface if they have not complained or fixed their code by now they
> probably are not going to, so there is no point in warning them any
> longer.
>
> So completely remove sys_sysctl on all architectures.



>
> Signed-off-by: Xiaoming Ni 
>
> changes in v2:
>   According to Kees Cook's suggestion, completely remove sys_sysctl on all 
> arch
>   According to Eric W. Biederman's suggestion, update the commit log
>
> V1: 
> https://lore.kernel.org/lkml/1591683605-8585-1-git-send-email-nixiaom...@huawei.com/
>   Delete the code of sys_sysctl and return -ENOSYS directly at the function 
> entry
> ---
>  include/uapi/linux/sysctl.h|  15 --
[snip]

> diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h
> index 27c1ed2..84b44c3 100644
> --- a/include/uapi/linux/sysctl.h
> +++ b/include/uapi/linux/sysctl.h
> @@ -27,21 +27,6 @@
>  #include 
>  #include 
>  
> -#define CTL_MAXNAME 10   /* how many path components do we allow 
> in a
> -call to sysctl?   In other words, what is
> -the largest acceptable value for the nlen
> -member of a struct __sysctl_args to have? */
> -
> -struct __sysctl_args {
> - int __user *name;
> - int nlen;
> - void __user *oldval;
> - size_t __user *oldlenp;
> - void __user *newval;
> - size_t newlen;
> - unsigned long __unused[4];
> -};
> -
>  /* Define sysctl names first */
>  
>  /* Top-level names: */
[snip]

The uapi header change does not make sense.  The entire point of the
header is to allow userspace programs to be able to call sys_sysctl.
It either needs to all stay or all go.

As the concern with the uapi header is about userspace programs being
able to compile please leave the header for now.

We should leave auditing userspace and seeing if userspace code will
still compile if we remove this header for a separate patch.  The
concerns and justifications for the uapi header are completely different
than for removing the sys_sysctl implementation.

Otherwise
Acked-by: "Eric W. Biederman" 


Eric


[PATCH] powerpc/kvm/book3s64/nested: Fix kernel crash with nested kvm

2020-06-11 Thread Aneesh Kumar K.V
__pa() checks the address passed to it and triggers a BUG if it is
below PAGE_OFFSET.

 #define __pa(x)                                                \
({                                                              \
        VIRTUAL_BUG_ON((unsigned long)(x) < PAGE_OFFSET);       \
        (unsigned long)(x) & 0x0fffUL;                          \
})

kvmhv_copy_tofrom_guest_radix() uses a NULL value for to/from to
indicate the direction of the copy. Avoid calling __pa() if the
value is NULL.
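
The guard can be modelled in userspace as below. This is a sketch under stated assumptions rather than the kernel code: MODEL_PAGE_OFFSET and MODEL_PA_MASK are illustrative 64-bit values chosen for this example, and model_pa() and model_pa_or_zero() are hypothetical names that stand in for __pa() and the conditional used in the fix.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants for a 64-bit linear map (assumed for this sketch). */
#define MODEL_PAGE_OFFSET 0xc000000000000000ULL
#define MODEL_PA_MASK     0x0fffffffffffffffULL

/* Stand-in for __pa(): rejects anything below the linear map. */
static uint64_t model_pa(uint64_t vaddr)
{
    assert(vaddr >= MODEL_PAGE_OFFSET);
    return vaddr & MODEL_PA_MASK;
}

/* Stand-in for the fix: a zero (NULL) address means "no buffer in this
 * direction", so pass 0 through instead of translating it. */
static uint64_t model_pa_or_zero(uint64_t vaddr)
{
    return vaddr != 0 ? model_pa(vaddr) : 0;
}
```

Calling model_pa(0) directly would trip the assertion, which corresponds to the BUG in the trace below; model_pa_or_zero(0) simply yields 0.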

kernel BUG at arch/powerpc/kvm/book3s_64_mmu_radix.c:43!
cpu 0x70: Vector: 700 (Program Check) at [c018a2187360]
pc: c0161b30: __kvmhv_copy_tofrom_guest_radix+0x130/0x1f0
lr: c0161d5c: kvmhv_copy_from_guest_radix+0x3c/0x80



[c018a2187670] c0161d5c kvmhv_copy_from_guest_radix+0x3c/0x80
[c018a21876b0] c014feb8 kvmhv_load_from_eaddr+0x48/0xc0
[c018a21876e0] c0135828 kvmppc_ld+0x98/0x1e0
[c018a2187780] c013bc20 kvmppc_load_last_inst+0x50/0x90
[c018a21877b0] c015e9e8 kvmppc_hv_emulate_mmio+0x288/0x2b0
[c018a2187810] c0164888 kvmppc_book3s_radix_page_fault+0xd8/0x2b0
[c018a21878c0] c015ed8c kvmppc_book3s_hv_page_fault+0x37c/0x1050
[c018a2187a00] c015a518 kvmppc_vcpu_run_hv+0xbb8/0x1080
[c018a2187b20] c013d204 kvmppc_vcpu_run+0x34/0x50
[c018a2187b40] c013949c kvm_arch_vcpu_ioctl_run+0x2fc/0x410
[c018a2187bd0] c012a2a4 kvm_vcpu_ioctl+0x2b4/0x8f0
[c018a2187d50] c05b12a4 ksys_ioctl+0xf4/0x150
[c018a2187da0] c05b1328 sys_ioctl+0x28/0x80
[c018a2187dc0] c0030584 system_call_exception+0x104/0x1d0
[c018a2187e20] c000ca68 system_call_common+0xe8/0x214

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 02219e28b1e4..84acb4769487 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -40,7 +40,8 @@ unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int 
pid,
/* Can't access quadrants 1 or 2 in non-HV mode, call the HV to do it */
if (kvmhv_on_pseries())
return plpar_hcall_norets(H_COPY_TOFROM_GUEST, lpid, pid, eaddr,
- __pa(to), __pa(from), n);
+ (to != NULL) ? __pa(to): 0,
+ (from != NULL) ? __pa(from): 0, n);
 
quadrant = 1;
if (!pid)
-- 
2.26.2



Re: [PATCH? v2] powerpc: Hard wire PT_SOFTE value to 1 in gpr_get() too

2020-06-11 Thread Jan Kratochvil
On Thu, 11 Jun 2020 12:58:31 +0200, Oleg Nesterov wrote:
> On 06/11, Madhavan Srinivasan wrote:
> > On 6/10/20 8:37 PM, Oleg Nesterov wrote:
> > > > This is not consistent and this breaks
> > > > http://sourceware.org/systemtap/wiki/utrace/tests/user-regs-peekpoke
> 
> this is 404.

Attaching the testcase, the CVS web interface no longer works on
sourceware.org.


Jan
/* Test case for PTRACE_SETREGS modifying the requested registers.
   x86* counterpart of the s390* testcase `user-area-access.c'.

   This software is provided 'as-is', without any express or implied
   warranty.  In no event will the authors be held liable for any damages
   arising from the use of this software.

   Permission is granted to anyone to use this software for any purpose,
   including commercial applications, and to alter it and redistribute it
   freely.  */

/* FIXME: EFLAGS should be tested restricted on the appropriate bits.  */

#define _GNU_SOURCE 1

#if defined __powerpc__ || defined __sparc__
# define user_regs_struct pt_regs
#endif

#ifdef __ia64__
#define ia64_fpreg ia64_fpreg_DISABLE
#define pt_all_user_regs pt_all_user_regs_DISABLE
#endif  /* __ia64__ */
#include 
#ifdef __ia64__
#undef ia64_fpreg
#undef pt_all_user_regs
#endif  /* __ia64__ */
#include 
#include 
#include 
#if defined __i386__ || defined __x86_64__
#include 
#endif
#include 

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

/* ia64 has PTRACE_SETREGS but it has no USER_REGS_STRUCT.  */
#if !defined PTRACE_SETREGS || defined __ia64__

int
main (void)
{
  return 77;
}

#else   /* PTRACE_SETREGS */

/* The minimal alignment we use for the random access ranges.  */
#define REGALIGN (sizeof (long))

static pid_t child;

static void
cleanup (void)
{
  if (child > 0)
kill (child, SIGKILL);
  child = 0;
}

static void
handler_fail (int signo)
{
  cleanup ();
  signal (SIGABRT, SIG_DFL);
  abort ();
}

int
main (void)
{
  long l;
  int status, i;
  pid_t pid;
  union
{
  struct user_regs_struct user;
  unsigned char byte[sizeof (struct user_regs_struct)];
} u, u2;
  int start;

  setbuf (stdout, NULL);
  atexit (cleanup);
  signal (SIGABRT, handler_fail);
  signal (SIGALRM, handler_fail);
  signal (SIGINT, handler_fail);
  i = alarm (10);
  assert (i == 0);

  child = fork ();
  switch (child)
{
case -1:
  assert_perror (errno);
  assert (0);

case 0:
  l = ptrace (PTRACE_TRACEME, 0, NULL, NULL);
  assert (l == 0);

  // Prevent rt_sigprocmask() call called by glibc after raise().
  syscall (__NR_tkill, getpid (), SIGSTOP);
  assert (0);

default:
  break;
}

  pid = waitpid (child, &status, 0);
  assert (pid == child);
  assert (WIFSTOPPED (status));
  assert (WSTOPSIG (status) == SIGSTOP);

  /* Fetch U2 from the inferior.  */
  errno = 0;
# ifdef __sparc__
  l = ptrace (PTRACE_GETREGS, child, &u2, NULL);
# else
  l = ptrace (PTRACE_GETREGS, child, NULL, &u2);
# endif
  assert_perror (errno);
  assert (l == 0);

  /* Initialize U with a pattern.  */
  for (i = 0; i < sizeof u.byte; i++)
u.byte[i] = i;
#ifdef __x86_64__
  /* non-EFLAGS modifications fail with EIO,  EFLAGS gets back different.  */
  u.user.eflags = u2.user.eflags;
  u.user.cs = u2.user.cs;
  u.user.ds = u2.user.ds;
  u.user.es = u2.user.es;
  u.user.fs = u2.user.fs;
  u.user.gs = u2.user.gs;
  u.user.ss = u2.user.ss;
  u.user.fs_base = u2.user.fs_base;
  u.user.gs_base = u2.user.gs_base;
  /* RHEL-4 refuses to set too high (and invalid) PC values.  */
  u.user.rip = (unsigned long) handler_fail;
  /* 2.6.25 always truncates and sign-extends orig_rax.  */
  u.user.orig_rax = (int) u.user.orig_rax;
#endif  /* __x86_64__ */
#ifdef __i386__
  /* These values get back different.  */
  u.user.xds = u2.user.xds;
  u.user.xes = u2.user.xes;
  u.user.xfs = u2.user.xfs;
  u.user.xgs = u2.user.xgs;
  u.user.xcs = u2.user.xcs;
  u.user.eflags = u2.user.eflags;
  u.user.xss = u2.user.xss;
  /* RHEL-4 refuses to set too high (and invalid) PC values.  */
  u.user.eip = (unsigned long) handler_fail;
#endif  /* __i386__ */
#ifdef __powerpc__
  /* These fields are constrained.  */
  u.user.msr = u2.user.msr;
# ifdef __powerpc64__
  u.user.softe = u2.user.softe;
# else
  u.user.mq = u2.user.mq;
# endif /* __powerpc64__ */
  u.user.trap = u2.user.trap;
  u.user.dar = u2.user.dar;
  u.user.dsisr = u2.user.dsisr;
  u.user.result = u2.user.result;
#endif  /* __powerpc__ */

  /* Poke U.  */
# ifdef __sparc__
  l = ptrace (PTRACE_SETREGS, child, &u, NULL);
# else
  l = ptrace (PTRACE_SETREGS, child, NULL, &u);
# endif
  assert (l == 0);

  /* Peek into U2.  */
# ifdef __sparc__
  l = ptrace (PTRACE_GETREGS, child, &u2, NULL);
# else
  l = ptrace (PTRACE_GETREGS, child, NULL, &u2);
# endif
  assert (l == 0);

  /* Verify it matches.  */
  if (memcmp (&u, &u2, sizeof u.byte) != 0)
{
  for (start = 0; start + REGALIGN <= sizeof u.byte; start += REGALIGN)
if (*(unsigned long *) (u.byte + 

Re: [PATCH? v2] powerpc: Hard wire PT_SOFTE value to 1 in gpr_get() too

2020-06-11 Thread Oleg Nesterov
On 06/11, Madhavan Srinivasan wrote:
>
>
> On 6/10/20 8:37 PM, Oleg Nesterov wrote:
> >Hi,
> >
> >looks like this patch was forgotten.
>
> yep, I missed this. But mpe did have comments for the patch.
>
> https://lkml.org/lkml/2019/9/19/107

Yes, and I thought that I had replied... apparently not, sorry!

So let me repeat, I am fine either way, I do not understand this
ppc-specific code and I can't really test this change.

Let me quote that email from Michael:

> We could do it like below. I'm 50/50 though on whether it's worth it, or
> if we should just go with the big ifdef like in your patch.

up to you ;)

Hmm. And yes,

> >>This is not consistent and this breaks
> >>http://sourceware.org/systemtap/wiki/utrace/tests/user-regs-peekpoke

this is 404.

Jan, could you correct the link above?

Oleg.



Re: [PATCH? v2] powerpc: Hard wire PT_SOFTE value to 1 in gpr_get() too

2020-06-11 Thread Madhavan Srinivasan




On 6/10/20 8:37 PM, Oleg Nesterov wrote:

Hi,

looks like this patch was forgotten.


yep, I missed this. But mpe did have comments for the patch.

https://lkml.org/lkml/2019/9/19/107

Maddy


Do you think this should be fixed or should we document that
PTRACE_GETREGS is not consistent with PTRACE_PEEKUSER on ppc64?


On 09/17, Oleg Nesterov wrote:

I don't have a ppc machine, this patch wasn't even compile tested,
could you please review?

The commit a8a4b03ab95f ("powerpc: Hard wire PT_SOFTE value to 1 in
ptrace & signals") changed ptrace_get_reg(PT_SOFTE) to report 0x1,
but PTRACE_GETREGS still copies pt_regs->softe as is.

This is not consistent and this breaks
http://sourceware.org/systemtap/wiki/utrace/tests/user-regs-peekpoke

Reported-by: Jan Kratochvil 
Signed-off-by: Oleg Nesterov 
---
  arch/powerpc/kernel/ptrace.c | 25 +
  1 file changed, 25 insertions(+)

diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index 8c92feb..291acfb 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -363,11 +363,36 @@ static int gpr_get(struct task_struct *target, const struct user_regset *regset,
BUILD_BUG_ON(offsetof(struct pt_regs, orig_gpr3) !=
 offsetof(struct pt_regs, msr) + sizeof(long));
  
+#ifdef CONFIG_PPC64

+   if (!ret)
+   ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+ &target->thread.regs->orig_gpr3,
+ offsetof(struct pt_regs, orig_gpr3),
+ offsetof(struct pt_regs, softe));
+
+   if (!ret) {
+   unsigned long softe = 0x1;
+   ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, &softe,
+ offsetof(struct pt_regs, softe),
+ offsetof(struct pt_regs, softe) +
+ sizeof(softe));
+   }
+
+   BUILD_BUG_ON(offsetof(struct pt_regs, trap) !=
+offsetof(struct pt_regs, softe) + sizeof(long));
+
+   if (!ret)
+   ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+ &target->thread.regs->trap,
+ offsetof(struct pt_regs, trap),
+ sizeof(struct user_pt_regs));
+#else
if (!ret)
ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
  &target->thread.regs->orig_gpr3,
  offsetof(struct pt_regs, orig_gpr3),
  sizeof(struct user_pt_regs));
+#endif
if (!ret)
ret = user_regset_copyout_zero(&pos, &count, &kbuf, &ubuf,
   sizeof(struct user_pt_regs), -1);
--
2.5.0





[PATCH 2/2] powerpc/64s: system call support for scv/rfscv instructions

2020-06-11 Thread Nicholas Piggin
Add support for the scv instruction on POWER9 and later CPUs.

For now this implements the zeroth scv vector 'scv 0', as identical to
'sc' system calls, with the exception that lr is not preserved, nor are
volatile cr registers, and error is not indicated with CR0[SO], but by
returning a negative errno.

rfscv is implemented to return from scv type system calls. It can not be
used to return from sc system calls because those are defined to
preserve lr.

getpid syscall cost on POWER9 is reduced by 26% (428 to 318 cycles),
largely due to fewer mtmsr and mtspr instructions.

Signed-off-by: Nicholas Piggin 
---
 Documentation/powerpc/syscall64-abi.rst   |  42 --
 arch/powerpc/include/asm/asm-prototypes.h |   2 +-
 arch/powerpc/include/asm/exception-64s.h  |   6 +
 arch/powerpc/include/asm/head-64.h|   2 +-
 arch/powerpc/include/asm/ppc-opcode.h |   2 +
 arch/powerpc/include/asm/ppc_asm.h|   2 +
 arch/powerpc/include/asm/ptrace.h |   7 +-
 arch/powerpc/include/asm/setup.h  |   4 +-
 arch/powerpc/include/asm/sstep.h  |   1 +
 arch/powerpc/kernel/cpu_setup_power.S |  10 +-
 arch/powerpc/kernel/cputable.c|   3 +-
 arch/powerpc/kernel/dt_cpu_ftrs.c |   1 +
 arch/powerpc/kernel/entry_64.S| 171 +-
 arch/powerpc/kernel/exceptions-64s.S  | 123 +++-
 arch/powerpc/kernel/process.c |  10 +-
 arch/powerpc/kernel/setup_64.c|   5 +-
 arch/powerpc/kernel/signal.c  |  19 ++-
 arch/powerpc/kernel/syscall_64.c  |  32 +++-
 arch/powerpc/lib/sstep.c  |  14 ++
 arch/powerpc/platforms/pseries/setup.c|   8 +-
 arch/powerpc/xmon/xmon.c  |   1 +
 21 files changed, 421 insertions(+), 44 deletions(-)

diff --git a/Documentation/powerpc/syscall64-abi.rst b/Documentation/powerpc/syscall64-abi.rst
index e49f69f941b9..46caaadbb029 100644
--- a/Documentation/powerpc/syscall64-abi.rst
+++ b/Documentation/powerpc/syscall64-abi.rst
@@ -5,6 +5,15 @@ Power Architecture 64-bit Linux system call ABI
 syscall
 ===
 
+Invocation
+----------
+The syscall is made with the sc instruction, and returns with execution
+continuing at the instruction following the sc instruction.
+
+If PPC_FEATURE2_SCV appears in the AT_HWCAP2 ELF auxiliary vector, the
+scv 0 instruction is an alternative that may provide better performance,
+with some differences to calling sequence.
+
 syscall calling sequence\ [1]_ matches the Power Architecture 64-bit ELF ABI
 specification C function calling sequence, including register preservation
 rules, with the following differences.
@@ -12,16 +21,23 @@ rules, with the following differences.
 .. [1] Some syscalls (typically low-level management functions) may have
different calling sequences (e.g., rt_sigreturn).
 
-Parameters and return value
----------------------------
+Parameters
+----------
 The system call number is specified in r0.
 
 There is a maximum of 6 integer parameters to a syscall, passed in r3-r8.
 
-Both a return value and a return error code are returned. cr0.SO is the return
-error code, and r3 is the return value or error code. When cr0.SO is clear,
-the syscall succeeded and r3 is the return value. When cr0.SO is set, the
-syscall failed and r3 is the error code that generally corresponds to errno.
+Return value
+------------
+- For the sc instruction, both a value and an error condition are returned.
+  cr0.SO is the error condition, and r3 is the return value. When cr0.SO is
+  clear, the syscall succeeded and r3 is the return value. When cr0.SO is set,
+  the syscall failed and r3 is the error value (that normally corresponds to
+  errno).
+
+- For the scv 0 instruction, the return value indicates failure if it is
+  -4095..-1 (i.e., it is >= -MAX_ERRNO (-4095) as an unsigned comparison),
+  in which case the error value is the negated return value.
 
 Stack
 -
@@ -34,22 +50,23 @@ Register preservation rules match the ELF ABI calling sequence with the
 following differences:
 
 === = 
+--- For the sc instruction, differences with the ELF ABI ---
 r0  Volatile  (System call number.)
 r3  Volatile  (Parameter 1, and return value.)
 r4-r8   Volatile  (Parameters 2-6.)
-cr0 Volatile  (cr0.SO is the return error condition)
+cr0 Volatile  (cr0.SO is the return error condition.)
 cr1, cr5-7  Nonvolatile
 lr  Nonvolatile
+
+--- For the scv 0 instruction, differences with the ELF ABI ---
+r0  Volatile  (System call number.)
+r3  Volatile  (Parameter 1, and return value.)
+r4-r8   Volatile  (Parameters 2-6.)
 === = 
 
 All floating point and vector data registers as well as control and status
 registers are nonvolatile.
 
-Invocation
---
-The syscall is performed with the sc 

[PATCH 1/2] powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked

2020-06-11 Thread Nicholas Piggin
The scv instruction causes an interrupt which can enter the kernel with
MSR[EE]=1, thus allowing interrupts to hit at any time. These must not
be taken as normal interrupts, because they come from MSR[PR]=0 context,
and yet the kernel stack is not yet set up and r13 is not set to the
PACA.

Treat this as a soft-masked interrupt regardless of the soft masked
state. This does not affect behaviour yet, because currently all
interrupts are taken with MSR[EE]=0.
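
As a reading aid, the decision made by the new test in __GEN_COMMON_BODY can be modelled in C roughly as below. This is only a sketch: the function and parameter names are hypothetical, the real code is the assembly in the hunk that follows, and MSR_PR is assumed to be bit 14 (1 << 14).

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_PR (1ULL << 14)   /* assumed: problem state (user mode) bit */

/*
 * Returns true when a maskable interrupt should take the masked path.
 * msr and nia are the interrupted context's MSR and next instruction
 * address; imask is the interrupt's (non-zero) soft-mask bit.
 */
static bool interrupt_is_masked(uint64_t msr, uint64_t nia,
                                uint64_t end_interrupts,
                                uint8_t irq_soft_mask, uint8_t imask)
{
    /* Coming from user: skip the soft-mask tests entirely. */
    if (msr & MSR_PR)
        return false;

    /* Kernel code below __end_interrupts is implicitly soft-masked. */
    if (nia < end_interrupts)
        return true;

    /* Otherwise test the soft mask state against this interrupt's bit. */
    return (irq_soft_mask & imask) != 0;
}
```

The second test is the new one: it returns true whatever the stored soft mask state is, which is what "regardless of the soft masked state" means above.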

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index e70ebb5c318c..388e34665b4a 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -508,8 +508,24 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real)
 
 .macro __GEN_COMMON_BODY name
.if IMASK
+   .if ! ISTACK
+   .error "No support for masked interrupt to use custom stack"
+   .endif
+
+   /* If coming from user, skip soft-mask tests. */
+   andi.   r10,r12,MSR_PR
+   bne 2f
+
+   /* Kernel code running below __end_interrupts is implicitly
+* soft-masked */
+   LOAD_HANDLER(r10, __end_interrupts)
+   cmpld   r11,r10
+   li  r10,IMASK
+   blt-    1f
+
+   /* Test the soft mask state against our interrupt's bit */
lbz r10,PACAIRQSOFTMASK(r13)
-   andi.   r10,r10,IMASK
+1: andi.   r10,r10,IMASK
/* Associate vector numbers with bits in paca->irq_happened */
.if IVEC == 0x500 || IVEC == 0xea0
li  r10,PACA_IRQ_EE
@@ -540,7 +556,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real)
 
.if ISTACK
andi.   r10,r12,MSR_PR  /* See if coming from user  */
-   mr  r10,r1  /* Save r1  */
+2: mr  r10,r1  /* Save r1  */
subi    r1,r1,INT_FRAME_SIZE    /* alloc frame on kernel stack  */
beq-    100f
ld  r1,PACAKSAVE(r13)   /* kernel stack to use  */
@@ -2838,7 +2854,8 @@ masked_interrupt:
ld  r10,PACA_EXGEN+EX_R10(r13)
ld  r11,PACA_EXGEN+EX_R11(r13)
ld  r12,PACA_EXGEN+EX_R12(r13)
-   /* returns to kernel where r13 must be set up, so don't restore it */
+   ld  r13,PACA_EXGEN+EX_R13(r13)
+   /* May return to masked low address where r13 is not set up */
.if \hsrr
HRFI_TO_KERNEL
.else
@@ -2997,6 +3014,10 @@ EXC_COMMON_BEGIN(ppc64_runlatch_on_trampoline)
 
 USE_FIXED_SECTION(virt_trampolines)
/*
+* All code below __end_interrupts is treated as soft-masked. If
+* any code runs here with MSR[EE]=1, it must then cope with pending
+* soft interrupt being raised (i.e., by ensuring it is replayed).
+*
 * The __end_interrupts marker must be past the out-of-line (OOL)
 * handlers, so that they are copied to real address 0x100 when running
 * a relocatable kernel. This ensures they can be reached from the short
-- 
2.23.0



Linux powerpc new system call instruction and ABI

2020-06-11 Thread Nicholas Piggin
Thanks to everyone who has given feedback on the proposed new system
call instruction and ABI, I think it has reached agreement and the
implementation can be merged into Linux.

I have a hacked glibc implementation (it doesn't do all the right
HWCAP detection and misses a few things) with which I've tested several
things, including some kernel selftests (involving signals and syscalls).

System Call Vectored (scv) ABI
==============================

The scv instruction was introduced with POWER9 / ISA3, together with its
rfscv counterpart. The benefit of these instructions is performance:
they trade the slower SRR0/1 registers for the faster LR/CTR registers,
and enter the kernel with MSR[EE] and MSR[RI] left enabled, which can
reduce MSR updates. The scv instruction has 128 levels (not enough to
cover the Linux system call space).

Assignment and advertisement
----------------------------
The proposal is to assign scv levels conservatively, and advertise them
with HWCAP feature bits as we add support for more.

Linux has not enabled FSCR[SCV] yet, so executing the scv instruction will
cause the kernel to log a "SCV facility unavailable" message, and deliver a
SIGILL with ILL_ILLOPC to the process. Linux has defined a HWCAP2 bit
PPC_FEATURE2_SCV for SCV support, but does not set it.

This change allocates the zero level ('scv 0'), advertised with
PPC_FEATURE2_SCV, which will be used to provide normal Linux system
calls (equivalent to 'sc').

Attempting to execute scv with other levels will cause a SIGILL to be
delivered the same as before, but will not log a "SCV facility unavailable"
message (because the processor facility is enabled).
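
For userspace, the advertisement check could look roughly like the sketch below. The helper name is hypothetical, and the PPC_FEATURE2_SCV value (0x00100000) is what I believe the powerpc uapi cputable.h defines; please check it against your headers.

```c
#include <stdbool.h>

/* Assumed value; normally provided by the powerpc uapi <asm/cputable.h>. */
#ifndef PPC_FEATURE2_SCV
#define PPC_FEATURE2_SCV 0x00100000UL
#endif

/* True if the AT_HWCAP2 word advertises 'scv 0' system call support. */
static bool hwcap2_has_scv(unsigned long hwcap2)
{
    return (hwcap2 & PPC_FEATURE2_SCV) != 0;
}
```

On a powerpc64 Linux system the argument would typically come from getauxval(AT_HWCAP2) in <sys/auxv.h>. A libc would fall back to 'sc' whenever this returns false.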

Calling convention
------------------
The proposal is for scv 0 to provide the standard Linux system call ABI 
with the following differences from sc convention[1]:

- lr is to be volatile across scv calls. This is necessary because the 
  scv instruction clobbers lr. From previous discussion, this should be 
  possible to deal with in GCC clobbers and CFI.

- cr1 and cr5-cr7 are volatile. This matches the C ABI and would allow the
  kernel system call exit to avoid restoring the volatile cr registers
  (although we probably still would anyway to avoid information leaks).

- Error handling: The consensus among kernel, glibc, and musl is to move to
  using negative return values in r3 rather than CR0[SO]=1 to indicate error,
  which matches most other architectures, and is closer to a function call.
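
The difference in error handling can be sketched in C as follows. This is an illustration only, not the glibc or kernel code: struct sys_result and both helper names are hypothetical, and r3 / cr0_so stand for the register state seen after the system call instruction returns.

```c
#include <stdbool.h>

#define MAX_ERRNO 4095UL

struct sys_result {
    long value;   /* return value, meaningful when err == 0 */
    int err;      /* positive errno, or 0 on success */
};

/* sc convention: CR0[SO] set means failure and r3 holds the errno. */
static struct sys_result decode_sc(unsigned long r3, bool cr0_so)
{
    struct sys_result res = { 0, 0 };

    if (cr0_so)
        res.err = (int)r3;
    else
        res.value = (long)r3;
    return res;
}

/* scv 0 convention: r3 in -4095..-1 means failure, errno is -r3. */
static struct sys_result decode_scv0(unsigned long r3)
{
    struct sys_result res = { 0, 0 };

    if (r3 >= -MAX_ERRNO)   /* unsigned comparison */
        res.err = (int)-(long)r3;
    else
        res.value = (long)r3;
    return res;
}
```

With scv 0 the caller no longer has to read CR0 after the call, which is the sense in which the convention is closer to a function call.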

Notes
-----
- r0,r4-r8 are documented as volatile in the ABI, but the kernel patch as
  submitted currently preserves them. This is to leave room for deciding
  which way to go with these. Some small benefit was found by preserving
  them[2] but I'm not convinced it's worth deviating from the C function
  call ABI just for this. Release code should follow the ABI.

Previous discussions:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/208691.html
https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/209268.html

[1] https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst
[2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/209263.html

The following patches to add scv support to Linux are posted to

 https://lists.ozlabs.org/pipermail/linuxppc-dev/

Nicholas Piggin (2):
  powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked
  powerpc/64s: system call support for scv/rfscv instructions

Thanks,
Nick

-- 
2.23.0



Re: [PATCH v2] All arch: remove system call sys_sysctl

2020-06-11 Thread Will Deacon
On Thu, Jun 11, 2020 at 11:54:00AM +0800, Xiaoming Ni wrote:
> Since the commit 61a47c1ad3a4dc ("sysctl: Remove the sysctl system call"),
> sys_sysctl is actually unavailable: any input can only return an error.
> 
> We have been warning about people using the sysctl system call for years
> and believe there are no more users.  Even if there are users of this
> interface if they have not complained or fixed their code by now they
> probably are not going to, so there is no point in warning them any
> longer.
> 
> So completely remove sys_sysctl on all architectures.
> 
> Signed-off-by: Xiaoming Ni 
> 
> changes in v2:
>   According to Kees Cook's suggestion, completely remove sys_sysctl on all 
> arch
>   According to Eric W. Biederman's suggestion, update the commit log
> 
> V1: 
> https://lore.kernel.org/lkml/1591683605-8585-1-git-send-email-nixiaom...@huawei.com/
>   Delete the code of sys_sysctl and return -ENOSYS directly at the function 
> entry
> ---
>  arch/alpha/kernel/syscalls/syscall.tbl |   2 +-
>  arch/arm/configs/am200epdkit_defconfig |   1 -
>  arch/arm/tools/syscall.tbl |   2 +-
>  arch/arm64/include/asm/unistd32.h  |   4 +-

For the arm/arm64 parts:

Acked-by: Will Deacon 

Will


[Bug 205183] PPC64: Signal delivery fails with SIGSEGV if between about 1KB and 4KB bytes of stack remain

2020-06-11 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=205183

--- Comment #4 from Daniel Black (dan...@linux.ibm.com) ---
Still broken.

danielgb@talos2:~$ gcc -g -Wall -O stacktest.c
danielgb@talos2:~$  ./a.out 124 &
[1] 494618
danielgb@talos2:~$ cat /proc/$(pidof a.out)/maps | grep stack
7fffcde8-7fffcdfb rw-p  00:00 0 [stack]
danielgb@talos2:~$ kill -USR1 %1
danielgb@talos2:~$ signal delivered, stack base 0x7fffcdfb top 0x7fffcde81427 (1240025 used)

[1]+  Done                    ./a.out 124
danielgb@talos2:~$ ./a.out 1241000 &
[1] 494677
danielgb@talos2:~$ kill -USR1 %1
danielgb@talos2:~$ 
[1]+  Segmentation fault  ./a.out 1241000
danielgb@talos2:~$ 
danielgb@talos2:~$ dmesg | grep a.out
[10617.616145] a.out[494587]: bad frame in setup_rt_frame: 7fffdea30010 nip 00011a0a09fc lr 7fffa1c404c8
[10865.752876] a.out[494677]: bad frame in setup_rt_frame: 7fffcc420030 nip 000135a70a3c lr 7fff952604c8
danielgb@talos2:~$ uname -a
Linux talos2 5.7.0-rc5-77151-gfea086b627a0 #1 SMP Mon May 11 16:00:00 AEST 2020 ppc64le ppc64le ppc64le GNU/Linux

-- 
You are receiving this mail because:
You are watching the assignee of the bug.