[PATCH v4 16/24] powerpc/pseries: Implement signed update for PLPKS objects

2023-01-19 Thread Andrew Donnellan
From: Nayna Jain 

The Platform Keystore provides a signed update interface which can be used
to create, replace or append to certain variables in the PKS in a secure
fashion, with the hypervisor requiring that the update be signed using the
Platform Key.

Implement an interface to the H_PKS_SIGNED_UPDATE hcall in the plpks
driver to allow signed updates to PKS objects.

(The plpks driver doesn't need to do any cryptography or otherwise handle
the actual signed variable contents - that will be handled by userspace
tooling.)

Signed-off-by: Nayna Jain 
[ajd: split patch, add timeout handling and misc cleanups]
Co-developed-by: Andrew Donnellan 
Signed-off-by: Andrew Donnellan 
Signed-off-by: Russell Currey 

---

v3: Merge plpks fixes and signed update series with secvar series

Fix error code handling in plpks_confirm_object_flushed() (ruscur)

Pass plpks_var struct to plpks_signed_update_var() by reference (mpe)

Consistent constant naming scheme (ruscur)

v4: Fix MAX_HCALL_OPCODE rebasing issue (npiggin)
---
 arch/powerpc/include/asm/hvcall.h  |  1 +
 arch/powerpc/include/asm/plpks.h   |  5 ++
 arch/powerpc/platforms/pseries/plpks.c | 71 --
 3 files changed, 72 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index 95fd7f9485d5..c099780385dd 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -335,6 +335,7 @@
 #define H_RPT_INVALIDATE   0x448
 #define H_SCM_FLUSH0x44C
 #define H_GET_ENERGY_SCALE_INFO0x450
+#define H_PKS_SIGNED_UPDATE0x454
 #define H_WATCHDOG 0x45C
 #define MAX_HCALL_OPCODE   H_WATCHDOG
 
diff --git a/arch/powerpc/include/asm/plpks.h b/arch/powerpc/include/asm/plpks.h
index 7c5f51a9af7c..e7204e6c0ca4 100644
--- a/arch/powerpc/include/asm/plpks.h
+++ b/arch/powerpc/include/asm/plpks.h
@@ -68,6 +68,11 @@ struct plpks_var_name_list {
struct plpks_var_name varlist[];
 };
 
+/**
+ * Updates the authenticated variable. It expects NULL as the component.
+ */
+int plpks_signed_update_var(struct plpks_var *var, u64 flags);
+
 /**
  * Writes the specified var and its data to PKS.
  * Any caller of PKS driver should present a valid component type for
diff --git a/arch/powerpc/platforms/pseries/plpks.c 
b/arch/powerpc/platforms/pseries/plpks.c
index 1189246b03dc..796ed5544ee5 100644
--- a/arch/powerpc/platforms/pseries/plpks.c
+++ b/arch/powerpc/platforms/pseries/plpks.c
@@ -81,6 +81,12 @@ static int pseries_status_to_err(int rc)
err = -ENOENT;
break;
case H_BUSY:
+   case H_LONG_BUSY_ORDER_1_MSEC:
+   case H_LONG_BUSY_ORDER_10_MSEC:
+   case H_LONG_BUSY_ORDER_100_MSEC:
+   case H_LONG_BUSY_ORDER_1_SEC:
+   case H_LONG_BUSY_ORDER_10_SEC:
+   case H_LONG_BUSY_ORDER_100_SEC:
err = -EBUSY;
break;
case H_AUTHORITY:
@@ -184,14 +190,17 @@ static struct label *construct_label(char *component, u8 
varos, u8 *name,
 u16 namelen)
 {
struct label *label;
-   size_t slen;
+   size_t slen = 0;
 
if (!name || namelen > PLPKS_MAX_NAME_SIZE)
return ERR_PTR(-EINVAL);
 
-   slen = strlen(component);
-   if (component && slen > sizeof(label->attr.prefix))
-   return ERR_PTR(-EINVAL);
+   // Support NULL component for signed updates
+   if (component) {
+   slen = strlen(component);
+   if (slen > sizeof(label->attr.prefix))
+   return ERR_PTR(-EINVAL);
+   }
 
// The label structure must not cross a page boundary, so we align to 
the next power of 2
label = kzalloc(roundup_pow_of_two(sizeof(*label)), GFP_KERNEL);
@@ -397,6 +406,58 @@ static int plpks_confirm_object_flushed(struct label 
*label,
return pseries_status_to_err(rc);
 }
 
+int plpks_signed_update_var(struct plpks_var *var, u64 flags)
+{
+   unsigned long retbuf[PLPAR_HCALL9_BUFSIZE] = {0};
+   int rc;
+   struct label *label;
+   struct plpks_auth *auth;
+   u64 continuetoken = 0;
+   u64 timeout = 0;
+
+   if (!var->data || var->datalen <= 0 || var->namelen > 
PLPKS_MAX_NAME_SIZE)
+   return -EINVAL;
+
+   if (!(var->policy & PLPKS_SIGNEDUPDATE))
+   return -EINVAL;
+
+   auth = construct_auth(PLPKS_OS_OWNER);
+   if (IS_ERR(auth))
+   return PTR_ERR(auth);
+
+   label = construct_label(var->component, var->os, var->name, 
var->namelen);
+   if (IS_ERR(label)) {
+   rc = PTR_ERR(label);
+   goto out;
+   }
+
+   do {
+   rc = plpar_hcall9(H_PKS_SIGNED_UPDATE, retbuf,
+ virt_to_phys(auth), virt_to_phys(label),
+ label->size, var->policy, flags,
+ 

[PATCH v4 15/24] powerpc/pseries: Expose PLPKS config values, support additional fields

2023-01-19 Thread Andrew Donnellan
From: Nayna Jain 

The plpks driver uses the H_PKS_GET_CONFIG hcall to retrieve configuration
and status information about the PKS from the hypervisor.

Update _plpks_get_config() to handle some additional fields. Add getter
functions to allow the PKS configuration information to be accessed from
other files. Validate that the values we're getting comply with the spec.

While we're here, move the config struct in _plpks_get_config() off the
stack - it's getting large and we also need to make sure it doesn't cross
a page boundary.

Signed-off-by: Nayna Jain 
[ajd: split patch, extend to support additional v3 API fields, minor fixes]
Co-developed-by: Andrew Donnellan 
Signed-off-by: Andrew Donnellan 
Signed-off-by: Russell Currey 

---

v3: Merge plpks fixes and signed update series with secvar series

Refresh config values in plpks_get_usedspace() (ajd)

Validate the config values being returned comply with spec (ruscur)

Return maxobjlabelsize as is (ruscur)

Move plpks.h to include/asm (ruscur)

Fix checkpatch checks (ruscur)
---
 arch/powerpc/include/asm/plpks.h   |  58 ++
 arch/powerpc/platforms/pseries/plpks.c | 149 +++--
 2 files changed, 195 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/plpks.h b/arch/powerpc/include/asm/plpks.h
index 6466aadd7145..7c5f51a9af7c 100644
--- a/arch/powerpc/include/asm/plpks.h
+++ b/arch/powerpc/include/asm/plpks.h
@@ -96,6 +96,64 @@ int plpks_read_fw_var(struct plpks_var *var);
  */
 int plpks_read_bootloader_var(struct plpks_var *var);
 
+/**
+ * Returns if PKS is available on this LPAR.
+ */
+bool plpks_is_available(void);
+
+/**
+ * Returns version of the Platform KeyStore.
+ */
+u8 plpks_get_version(void);
+
+/**
+ * Returns hypervisor storage overhead per object, not including the size of
+ * the object or label. Only valid for config version >= 2
+ */
+u16 plpks_get_objoverhead(void);
+
+/**
+ * Returns maximum password size. Must be >= 32 bytes
+ */
+u16 plpks_get_maxpwsize(void);
+
+/**
+ * Returns maximum object size supported by Platform KeyStore.
+ */
+u16 plpks_get_maxobjectsize(void);
+
+/**
+ * Returns maximum object label size supported by Platform KeyStore.
+ */
+u16 plpks_get_maxobjectlabelsize(void);
+
+/**
+ * Returns total size of the configured Platform KeyStore.
+ */
+u32 plpks_get_totalsize(void);
+
+/**
+ * Returns used space from the total size of the Platform KeyStore.
+ */
+u32 plpks_get_usedspace(void);
+
+/**
+ * Returns bitmask of policies supported by the hypervisor.
+ */
+u32 plpks_get_supportedpolicies(void);
+
+/**
+ * Returns maximum byte size of a single object supported by the hypervisor.
+ * Only valid for config version >= 3
+ */
+u32 plpks_get_maxlargeobjectsize(void);
+
+/**
+ * Returns bitmask of signature algorithms supported for signed updates.
+ * Only valid for config version >= 3
+ */
+u64 plpks_get_signedupdatealgorithms(void);
+
 #endif // CONFIG_PSERIES_PLPKS
 
 #endif // _ASM_POWERPC_PLPKS_H
diff --git a/arch/powerpc/platforms/pseries/plpks.c 
b/arch/powerpc/platforms/pseries/plpks.c
index 91f3f623a2c7..1189246b03dc 100644
--- a/arch/powerpc/platforms/pseries/plpks.c
+++ b/arch/powerpc/platforms/pseries/plpks.c
@@ -24,8 +24,16 @@ static u8 *ospassword;
 static u16 ospasswordlength;
 
 // Retrieved with H_PKS_GET_CONFIG
+static u8 version;
+static u16 objoverhead;
 static u16 maxpwsize;
 static u16 maxobjsize;
+static s16 maxobjlabelsize;
+static u32 totalsize;
+static u32 usedspace;
+static u32 supportedpolicies;
+static u32 maxlargeobjectsize;
+static u64 signedupdatealgorithms;
 
 struct plpks_auth {
u8 version;
@@ -206,32 +214,149 @@ static struct label *construct_label(char *component, u8 
varos, u8 *name,
 static int _plpks_get_config(void)
 {
unsigned long retbuf[PLPAR_HCALL_BUFSIZE] = { 0 };
-   struct {
+   struct config {
u8 version;
u8 flags;
-   __be32 rsvd0;
+   __be16 rsvd0;
+   __be16 objoverhead;
__be16 maxpwsize;
__be16 maxobjlabelsize;
__be16 maxobjsize;
__be32 totalsize;
__be32 usedspace;
__be32 supportedpolicies;
-   __be64 rsvd1;
-   } __packed config;
+   __be32 maxlargeobjectsize;
+   __be64 signedupdatealgorithms;
+   u8 rsvd1[476];
+   } __packed * config;
size_t size;
-   int rc;
+   int rc = 0;
+
+   size = sizeof(*config);
+
+   // Config struct must not cross a page boundary. So long as the struct
+   // size is a power of 2, this should be fine as alignment is guaranteed
+   config = kzalloc(size, GFP_KERNEL);
+   if (!config) {
+   rc = -ENOMEM;
+   goto err;
+   }
+
+   rc = plpar_hcall(H_PKS_GET_CONFIG, retbuf, virt_to_phys(config), size);
+
+   if (rc != H_SUCCESS) {
+   rc = 

[PATCH v4 17/24] powerpc/pseries: Log hcall return codes for PLPKS debug

2023-01-19 Thread Andrew Donnellan
From: Russell Currey 

The plpks code converts hypervisor return codes into their Linux
equivalents so that users can understand them.  Having access to the
original return codes is really useful for debugging, so add a
pr_debug() so we don't lose information from the conversion.

Signed-off-by: Russell Currey 
Signed-off-by: Andrew Donnellan 
---
 arch/powerpc/platforms/pseries/plpks.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/plpks.c 
b/arch/powerpc/platforms/pseries/plpks.c
index 796ed5544ee5..96a026a37285 100644
--- a/arch/powerpc/platforms/pseries/plpks.c
+++ b/arch/powerpc/platforms/pseries/plpks.c
@@ -117,6 +117,8 @@ static int pseries_status_to_err(int rc)
err = -EINVAL;
}
 
+   pr_debug("Converted hypervisor code %d to Linux %d\n", rc, err);
+
return err;
 }
 
-- 
2.39.0



[PATCH v4 24/24] integrity/powerpc: Support loading keys from pseries secvar

2023-01-19 Thread Andrew Donnellan
From: Russell Currey 

The secvar object format is only in the device tree under powernv.
We now have an API call to retrieve it in a generic way, so we should
use that instead of having to handle the DT here.

Add support for pseries secvar, with the "ibm,plpks-sb-v1" format.
The object format is expected to be the same, so there shouldn't be any
functional differences between objects retrieved from powernv and
pseries.

Signed-off-by: Russell Currey 
Signed-off-by: Andrew Donnellan 

---

v3: New patch

v4: Pass format buffer size (stefanb, npiggin)
---
 .../integrity/platform_certs/load_powerpc.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/security/integrity/platform_certs/load_powerpc.c 
b/security/integrity/platform_certs/load_powerpc.c
index dee51606d5f4..d4ce91bf3fec 100644
--- a/security/integrity/platform_certs/load_powerpc.c
+++ b/security/integrity/platform_certs/load_powerpc.c
@@ -10,7 +10,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include "keyring_handler.h"
@@ -59,16 +58,22 @@ static int __init load_powerpc_certs(void)
void *db = NULL, *dbx = NULL;
u64 dbsize = 0, dbxsize = 0;
int rc = 0;
-   struct device_node *node;
+   ssize_t len;
+   char buf[32];
 
if (!secvar_ops)
return -ENODEV;
 
-   /* The following only applies for the edk2-compat backend. */
-   node = of_find_compatible_node(NULL, NULL, "ibm,edk2-compat-v1");
-   if (!node)
+   len = secvar_ops->format(buf, 32);
+   if (len <= 0)
return -ENODEV;
 
+   // Check for known secure boot implementations from OPAL or PLPKS
+   if (strcmp("ibm,edk2-compat-v1", buf) && strcmp("ibm,plpks-sb-v1", 
buf)) {
+   pr_err("Unsupported secvar implementation \"%s\", not loading 
certs\n", buf);
+   return -ENODEV;
+   }
+
/*
 * Get db, and dbx. They might not exist, so it isn't an error if we
 * can't get them.
@@ -103,8 +108,6 @@ static int __init load_powerpc_certs(void)
kfree(dbx);
}
 
-   of_node_put(node);
-
return rc;
 }
 late_initcall(load_powerpc_certs);
-- 
2.39.0



[PATCH v4 13/24] powerpc/pseries: Move plpks.h to include directory

2023-01-19 Thread Andrew Donnellan
From: Russell Currey 

Move plpks.h from platforms/pseries/ to include/asm/. This is necessary
for later patches to make use of the PLPKS from code in other subsystems.

Signed-off-by: Russell Currey 
Signed-off-by: Andrew Donnellan 

---

v3: New patch
---
 .../powerpc/{platforms/pseries => include/asm}/plpks.h | 10 +++---
 arch/powerpc/platforms/pseries/plpks.c |  3 +--
 2 files changed, 8 insertions(+), 5 deletions(-)
 rename arch/powerpc/{platforms/pseries => include/asm}/plpks.h (89%)

diff --git a/arch/powerpc/platforms/pseries/plpks.h 
b/arch/powerpc/include/asm/plpks.h
similarity index 89%
rename from arch/powerpc/platforms/pseries/plpks.h
rename to arch/powerpc/include/asm/plpks.h
index 275ccd86bfb5..8295502ee93b 100644
--- a/arch/powerpc/platforms/pseries/plpks.h
+++ b/arch/powerpc/include/asm/plpks.h
@@ -6,8 +6,10 @@
  * Platform keystore for pseries LPAR(PLPKS).
  */
 
-#ifndef _PSERIES_PLPKS_H
-#define _PSERIES_PLPKS_H
+#ifndef _ASM_POWERPC_PLPKS_H
+#define _ASM_POWERPC_PLPKS_H
+
+#ifdef CONFIG_PSERIES_PLPKS
 
 #include 
 #include 
@@ -68,4 +70,6 @@ int plpks_read_fw_var(struct plpks_var *var);
  */
 int plpks_read_bootloader_var(struct plpks_var *var);
 
-#endif
+#endif // CONFIG_PSERIES_PLPKS
+
+#endif // _ASM_POWERPC_PLPKS_H
diff --git a/arch/powerpc/platforms/pseries/plpks.c 
b/arch/powerpc/platforms/pseries/plpks.c
index a01cf2ff140a..13e6daadb179 100644
--- a/arch/powerpc/platforms/pseries/plpks.c
+++ b/arch/powerpc/platforms/pseries/plpks.c
@@ -18,8 +18,7 @@
 #include 
 #include 
 #include 
-
-#include "plpks.h"
+#include 
 
 #define PKS_FW_OWNER0x1
 #define PKS_BOOTLOADER_OWNER 0x2
-- 
2.39.0



[PATCH v4 20/24] powerpc/pseries: Add helpers to get PLPKS password

2023-01-19 Thread Andrew Donnellan
From: Russell Currey 

Add helper functions to get the PLPKS password. This will be used in a
later patch to support passing the password between kernels over kexec.

Signed-off-by: Russell Currey 
Signed-off-by: Andrew Donnellan 

---

v3: New patch
---
 arch/powerpc/include/asm/plpks.h   | 11 +++
 arch/powerpc/platforms/pseries/plpks.c | 10 ++
 2 files changed, 21 insertions(+)

diff --git a/arch/powerpc/include/asm/plpks.h b/arch/powerpc/include/asm/plpks.h
index 0c49969b0864..08355c89f5fd 100644
--- a/arch/powerpc/include/asm/plpks.h
+++ b/arch/powerpc/include/asm/plpks.h
@@ -171,6 +171,17 @@ u32 plpks_get_maxlargeobjectsize(void);
  */
 u64 plpks_get_signedupdatealgorithms(void);
 
+/**
+ * Returns the PLPKS password generated by the hypervisor.
+ * Should only be used to prepare a different OS to use the PLPKS, i.e. kexec.
+ */
+u8 *plpks_get_password(void);
+
+/**
+ * Returns the length of the PLPKS password in bytes.
+ */
+u16 plpks_get_passwordlen(void);
+
 #endif // CONFIG_PSERIES_PLPKS
 
 #endif // _ASM_POWERPC_PLPKS_H
diff --git a/arch/powerpc/platforms/pseries/plpks.c 
b/arch/powerpc/platforms/pseries/plpks.c
index 5d9c6a3b2014..b3c7410a4f13 100644
--- a/arch/powerpc/platforms/pseries/plpks.c
+++ b/arch/powerpc/platforms/pseries/plpks.c
@@ -359,6 +359,16 @@ u64 plpks_get_signedupdatealgorithms(void)
return signedupdatealgorithms;
 }
 
+u8 *plpks_get_password(void)
+{
+   return ospassword;
+}
+
+u16 plpks_get_passwordlen(void)
+{
+   return ospasswordlength;
+}
+
 bool plpks_is_available(void)
 {
int rc;
-- 
2.39.0



[PATCH v4 05/24] powerpc/secvar: Use sysfs_emit() instead of sprintf()

2023-01-19 Thread Andrew Donnellan
From: Russell Currey 

The secvar format string and object size sysfs files are both ASCII
text, and should use sysfs_emit().  No functional change.

Suggested-by: Greg Kroah-Hartman 
Signed-off-by: Russell Currey 
Signed-off-by: Andrew Donnellan 

---

v2: New patch (gregkh)
---
 arch/powerpc/kernel/secvar-sysfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/secvar-sysfs.c 
b/arch/powerpc/kernel/secvar-sysfs.c
index 001cdbcdb9d2..462cacc0ca60 100644
--- a/arch/powerpc/kernel/secvar-sysfs.c
+++ b/arch/powerpc/kernel/secvar-sysfs.c
@@ -35,7 +35,7 @@ static ssize_t format_show(struct kobject *kobj, struct 
kobj_attribute *attr,
if (rc)
goto out;
 
-   rc = sprintf(buf, "%s\n", format);
+   rc = sysfs_emit(buf, "%s\n", format);
 
 out:
of_node_put(node);
@@ -57,7 +57,7 @@ static ssize_t size_show(struct kobject *kobj, struct 
kobj_attribute *attr,
return rc;
}
 
-   return sprintf(buf, "%llu\n", dsize);
+   return sysfs_emit(buf, "%llu\n", dsize);
 }
 
 static ssize_t data_read(struct file *filep, struct kobject *kobj,
-- 
2.39.0



[PATCH v4 14/24] powerpc/pseries: Move PLPKS constants to header file

2023-01-19 Thread Andrew Donnellan
From: Russell Currey 

Move the constants defined in plpks.c to plpks.h, and standardise their
naming, so that PLPKS consumers can make use of them later on.

Signed-off-by: Russell Currey 
Co-developed-by: Andrew Donnellan 
Signed-off-by: Andrew Donnellan 

---

v3: New patch
---
 arch/powerpc/include/asm/plpks.h   | 36 +---
 arch/powerpc/platforms/pseries/plpks.c | 57 ++
 2 files changed, 53 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/include/asm/plpks.h b/arch/powerpc/include/asm/plpks.h
index 8295502ee93b..6466aadd7145 100644
--- a/arch/powerpc/include/asm/plpks.h
+++ b/arch/powerpc/include/asm/plpks.h
@@ -14,14 +14,40 @@
 #include 
 #include 
 
-#define OSSECBOOTAUDIT 0x4000
-#define OSSECBOOTENFORCE 0x2000
-#define WORLDREADABLE 0x0800
-#define SIGNEDUPDATE 0x0100
+// Object policy flags from supported_policies
+#define PLPKS_OSSECBOOTAUDIT   PPC_BIT32(1) // OS secure boot must be 
audit/enforce
+#define PLPKS_OSSECBOOTENFORCE PPC_BIT32(2) // OS secure boot must be enforce
+#define PLPKS_PWSETPPC_BIT32(3) // No access without password set
+#define PLPKS_WORLDREADABLEPPC_BIT32(4) // Readable without authentication
+#define PLPKS_IMMUTABLEPPC_BIT32(5) // Once written, object 
cannot be removed
+#define PLPKS_TRANSIENTPPC_BIT32(6) // Object does not persist 
through reboot
+#define PLPKS_SIGNEDUPDATE PPC_BIT32(7) // Object can only be modified by 
signed updates
+#define PLPKS_HVPROVISIONEDPPC_BIT32(28) // Hypervisor has provisioned 
this object
 
-#define PLPKS_VAR_LINUX0x02
+// Signature algorithm flags from signed_update_algorithms
+#define PLPKS_ALG_RSA2048  PPC_BIT(0)
+#define PLPKS_ALG_RSA4096  PPC_BIT(1)
+
+// Object label OS metadata flags
+#define PLPKS_VAR_LINUX0x02
 #define PLPKS_VAR_COMMON   0x04
 
+// Flags for which consumer owns an object is owned by
+#define PLPKS_FW_OWNER 0x1
+#define PLPKS_BOOTLOADER_OWNER 0x2
+#define PLPKS_OS_OWNER 0x3
+
+// Flags for label metadata fields
+#define PLPKS_LABEL_VERSION0
+#define PLPKS_MAX_LABEL_ATTR_SIZE  16
+#define PLPKS_MAX_NAME_SIZE239
+#define PLPKS_MAX_DATA_SIZE4000
+
+// Timeouts for PLPKS operations
+#define PLPKS_MAX_TIMEOUT  5000 // msec
+#define PLPKS_FLUSH_SLEEP  10 // msec
+#define PLPKS_FLUSH_SLEEP_RANGE400
+
 struct plpks_var {
char *component;
u8 *name;
diff --git a/arch/powerpc/platforms/pseries/plpks.c 
b/arch/powerpc/platforms/pseries/plpks.c
index 13e6daadb179..91f3f623a2c7 100644
--- a/arch/powerpc/platforms/pseries/plpks.c
+++ b/arch/powerpc/platforms/pseries/plpks.c
@@ -20,19 +20,6 @@
 #include 
 #include 
 
-#define PKS_FW_OWNER0x1
-#define PKS_BOOTLOADER_OWNER 0x2
-#define PKS_OS_OWNER0x3
-
-#define LABEL_VERSION  0
-#define MAX_LABEL_ATTR_SIZE 16
-#define MAX_NAME_SIZE  239
-#define MAX_DATA_SIZE  4000
-
-#define PKS_FLUSH_MAX_TIMEOUT 5000 //msec
-#define PKS_FLUSH_SLEEP  10 //msec
-#define PKS_FLUSH_SLEEP_RANGE 400
-
 static u8 *ospassword;
 static u16 ospasswordlength;
 
@@ -59,7 +46,7 @@ struct label_attr {
 
 struct label {
struct label_attr attr;
-   u8 name[MAX_NAME_SIZE];
+   u8 name[PLPKS_MAX_NAME_SIZE];
size_t size;
 };
 
@@ -122,7 +109,7 @@ static int pseries_status_to_err(int rc)
 static int plpks_gen_password(void)
 {
unsigned long retbuf[PLPAR_HCALL_BUFSIZE] = { 0 };
-   u8 *password, consumer = PKS_OS_OWNER;
+   u8 *password, consumer = PLPKS_OS_OWNER;
int rc;
 
// The password must not cross a page boundary, so we align to the next 
power of 2
@@ -159,7 +146,7 @@ static struct plpks_auth *construct_auth(u8 consumer)
 {
struct plpks_auth *auth;
 
-   if (consumer > PKS_OS_OWNER)
+   if (consumer > PLPKS_OS_OWNER)
return ERR_PTR(-EINVAL);
 
// The auth structure must not cross a page boundary and must be
@@ -171,7 +158,7 @@ static struct plpks_auth *construct_auth(u8 consumer)
auth->version = 1;
auth->consumer = consumer;
 
-   if (consumer == PKS_FW_OWNER || consumer == PKS_BOOTLOADER_OWNER)
+   if (consumer == PLPKS_FW_OWNER || consumer == PLPKS_BOOTLOADER_OWNER)
return auth;
 
memcpy(auth->password, ospassword, ospasswordlength);
@@ -191,7 +178,7 @@ static struct label *construct_label(char *component, u8 
varos, u8 *name,
struct label *label;
size_t slen;
 
-   if (!name || namelen > MAX_NAME_SIZE)
+   if (!name || namelen > PLPKS_MAX_NAME_SIZE)
return ERR_PTR(-EINVAL);
 
slen = strlen(component);
@@ -206,9 +193,9 @@ static struct label *construct_label(char *component, u8 
varos, u8 *name,
if (component)
memcpy(>attr.prefix, component, slen);
 
-   

[PATCH v4 11/24] powerpc/secvar: Warn when PAGE_SIZE is smaller than max object size

2023-01-19 Thread Andrew Donnellan
Due to sysfs constraints, when writing to a variable, we can only handle
writes of up to PAGE_SIZE.

It's possible that the maximum object size is larger than PAGE_SIZE, in
which case, print a warning on boot so that the user is aware.

Signed-off-by: Andrew Donnellan 
Signed-off-by: Russell Currey 

---

v3: New patch (ajd)
---
 arch/powerpc/kernel/secvar-sysfs.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/kernel/secvar-sysfs.c 
b/arch/powerpc/kernel/secvar-sysfs.c
index e1d4c70dd327..6dd9b4f6f87c 100644
--- a/arch/powerpc/kernel/secvar-sysfs.c
+++ b/arch/powerpc/kernel/secvar-sysfs.c
@@ -221,6 +221,7 @@ static int secvar_sysfs_load_static(void)
 
 static int secvar_sysfs_init(void)
 {
+   u64 max_size;
int rc;
 
if (!secvar_ops) {
@@ -270,6 +271,14 @@ static int secvar_sysfs_init(void)
goto err;
}
 
+   // Due to sysfs limitations, we will only ever get a write buffer of
+   // up to 1 page in size. Print a warning if this is potentially going
+   // to cause problems, so that the user is aware.
+   secvar_ops->max_size(_size);
+   if (max_size > PAGE_SIZE)
+   pr_warn_ratelimited("PAGE_SIZE (%lu) is smaller than maximum 
object size (%llu), writes are limited to PAGE_SIZE\n",
+   PAGE_SIZE, max_size);
+
return 0;
 err:
kobject_put(secvar_kobj);
-- 
2.39.0



[PATCH v4 08/24] powerpc/secvar: Clean up init error messages

2023-01-19 Thread Andrew Donnellan
Remove unnecessary prefixes from error messages in secvar_sysfs_init()
(the file defines pr_fmt, so putting "secvar:" in every message is
unnecessary). Make capitalisation and punctuation more consistent.

Signed-off-by: Andrew Donnellan 
Signed-off-by: Russell Currey 

---

v3: New patch (ajd)
---
 arch/powerpc/kernel/secvar-sysfs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/secvar-sysfs.c 
b/arch/powerpc/kernel/secvar-sysfs.c
index d2f65a67845c..53ac01e0eb0b 100644
--- a/arch/powerpc/kernel/secvar-sysfs.c
+++ b/arch/powerpc/kernel/secvar-sysfs.c
@@ -194,13 +194,13 @@ static int secvar_sysfs_init(void)
int rc;
 
if (!secvar_ops) {
-   pr_warn("secvar: failed to retrieve secvar operations.\n");
+   pr_warn("Failed to retrieve secvar operations\n");
return -ENODEV;
}
 
secvar_kobj = kobject_create_and_add("secvar", firmware_kobj);
if (!secvar_kobj) {
-   pr_err("secvar: Failed to create firmware kobj\n");
+   pr_err("Failed to create firmware kobj\n");
return -ENOMEM;
}
 
@@ -212,7 +212,7 @@ static int secvar_sysfs_init(void)
 
secvar_kset = kset_create_and_add("vars", NULL, secvar_kobj);
if (!secvar_kset) {
-   pr_err("secvar: sysfs kobject registration failed.\n");
+   pr_err("sysfs kobject registration failed\n");
kobject_put(secvar_kobj);
return -ENOMEM;
}
-- 
2.39.0



[PATCH v4 06/24] powerpc/secvar: Handle format string in the consumer

2023-01-19 Thread Andrew Donnellan
From: Russell Currey 

The code that handles the format string in secvar-sysfs.c is entirely
OPAL specific, so create a new "format" op in secvar_operations to make
the secvar code more generic.  No functional change.

Signed-off-by: Russell Currey 
Signed-off-by: Andrew Donnellan 

---

v2: Use sysfs_emit() instead of sprintf() (gregkh)

v3: Enforce format string size limit (ruscur)

v4: Pass the buffer size as an argument, not using a macro (stefanb,
npiggin)

Fix error reporting (npiggin)
---
 arch/powerpc/include/asm/secvar.h|  1 +
 arch/powerpc/kernel/secvar-sysfs.c   | 27 +++-
 arch/powerpc/platforms/powernv/opal-secvar.c | 25 ++
 3 files changed, 35 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/secvar.h 
b/arch/powerpc/include/asm/secvar.h
index 4ce3f12c5613..2d9816dff128 100644
--- a/arch/powerpc/include/asm/secvar.h
+++ b/arch/powerpc/include/asm/secvar.h
@@ -17,6 +17,7 @@ struct secvar_operations {
int (*get)(const char *key, u64 key_len, u8 *data, u64 *data_size);
int (*get_next)(const char *key, u64 *key_len, u64 keybufsize);
int (*set)(const char *key, u64 key_len, u8 *data, u64 data_size);
+   ssize_t (*format)(char *buf, size_t bufsize);
 };
 
 #ifdef CONFIG_PPC_SECURE_BOOT
diff --git a/arch/powerpc/kernel/secvar-sysfs.c 
b/arch/powerpc/kernel/secvar-sysfs.c
index 462cacc0ca60..4beec935f5e7 100644
--- a/arch/powerpc/kernel/secvar-sysfs.c
+++ b/arch/powerpc/kernel/secvar-sysfs.c
@@ -21,26 +21,17 @@ static struct kset *secvar_kset;
 static ssize_t format_show(struct kobject *kobj, struct kobj_attribute *attr,
   char *buf)
 {
-   ssize_t rc = 0;
-   struct device_node *node;
-   const char *format;
-
-   node = of_find_compatible_node(NULL, NULL, "ibm,secvar-backend");
-   if (!of_device_is_available(node)) {
-   rc = -ENODEV;
-   goto out;
-   }
+   char tmp[32];
+   ssize_t len = secvar_ops->format(tmp, sizeof(tmp));
 
-   rc = of_property_read_string(node, "format", );
-   if (rc)
-   goto out;
+   if (len > 0)
+   return sysfs_emit(buf, "%s\n", tmp);
+   else if (len < 0)
+   pr_err("Error %zd reading format string\n", len);
+   else
+   pr_err("Got empty format string from backend\n");
 
-   rc = sysfs_emit(buf, "%s\n", format);
-
-out:
-   of_node_put(node);
-
-   return rc;
+   return -EIO;
 }
 
 
diff --git a/arch/powerpc/platforms/powernv/opal-secvar.c 
b/arch/powerpc/platforms/powernv/opal-secvar.c
index 4c0a3b030fe0..e33bb703ecbc 100644
--- a/arch/powerpc/platforms/powernv/opal-secvar.c
+++ b/arch/powerpc/platforms/powernv/opal-secvar.c
@@ -98,10 +98,35 @@ static int opal_set_variable(const char *key, u64 ksize, u8 
*data, u64 dsize)
return opal_status_to_err(rc);
 }
 
+static ssize_t opal_secvar_format(char *buf, size_t bufsize)
+{
+   ssize_t rc = 0;
+   struct device_node *node;
+   const char *format;
+
+   node = of_find_compatible_node(NULL, NULL, "ibm,secvar-backend");
+   if (!of_device_is_available(node)) {
+   rc = -ENODEV;
+   goto out;
+   }
+
+   rc = of_property_read_string(node, "format", );
+   if (rc)
+   goto out;
+
+   rc = snprintf(buf, bufsize, "%s", format);
+
+out:
+   of_node_put(node);
+
+   return rc;
+}
+
 static const struct secvar_operations opal_secvar_ops = {
.get = opal_get_variable,
.get_next = opal_get_next_variable,
.set = opal_set_variable,
+   .format = opal_secvar_format,
 };
 
 static int opal_secvar_probe(struct platform_device *pdev)
-- 
2.39.0



[PATCH v4 10/24] powerpc/secvar: Allow backend to populate static list of variable names

2023-01-19 Thread Andrew Donnellan
Currently, the list of variables is populated by calling
secvar_ops->get_next() repeatedly, which is explicitly modelled on the
OPAL API (including the keylen parameter).

For the upcoming PLPKS backend, we have a static list of variable names.
It is messy to fit that into get_next(), so instead, let the backend put
a NULL-terminated array of variable names into secvar_ops->var_names,
which will be used if get_next() is undefined.

Signed-off-by: Andrew Donnellan 
Signed-off-by: Russell Currey 

---

v3: New patch (ajd/mpe)
---
 arch/powerpc/include/asm/secvar.h  |  4 ++
 arch/powerpc/kernel/secvar-sysfs.c | 67 --
 2 files changed, 50 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/secvar.h 
b/arch/powerpc/include/asm/secvar.h
index 5ed141c711b0..467f83c24084 100644
--- a/arch/powerpc/include/asm/secvar.h
+++ b/arch/powerpc/include/asm/secvar.h
@@ -21,6 +21,10 @@ struct secvar_operations {
ssize_t (*format)(char *buf, size_t bufsize);
int (*max_size)(u64 *max_size);
const struct attribute **config_attrs;
+
+   // NULL-terminated array of fixed variable names
+   // Only used if get_next() isn't provided
+   const char * const *var_names;
 };
 
 #ifdef CONFIG_PPC_SECURE_BOOT
diff --git a/arch/powerpc/kernel/secvar-sysfs.c 
b/arch/powerpc/kernel/secvar-sysfs.c
index d7936d8c7478..e1d4c70dd327 100644
--- a/arch/powerpc/kernel/secvar-sysfs.c
+++ b/arch/powerpc/kernel/secvar-sysfs.c
@@ -157,9 +157,31 @@ static int secvar_sysfs_config(struct kobject *kobj)
return 0;
 }
 
-static int secvar_sysfs_load(void)
+static int add_var(const char *name)
 {
struct kobject *kobj;
+   int rc;
+
+   kobj = kzalloc(sizeof(*kobj), GFP_KERNEL);
+   if (!kobj)
+   return -ENOMEM;
+
+   kobject_init(kobj, _ktype);
+
+   rc = kobject_add(kobj, _kset->kobj, "%s", name);
+   if (rc) {
+   pr_warn("kobject_add error %d for attribute: %s\n", rc,
+   name);
+   kobject_put(kobj);
+   return rc;
+   }
+
+   kobject_uevent(kobj, KOBJ_ADD);
+   return 0;
+}
+
+static int secvar_sysfs_load(void)
+{
u64 namesize = 0;
char *name;
int rc;
@@ -177,31 +199,26 @@ static int secvar_sysfs_load(void)
break;
}
 
-   kobj = kzalloc(sizeof(*kobj), GFP_KERNEL);
-   if (!kobj) {
-   rc = -ENOMEM;
-   break;
-   }
-
-   kobject_init(kobj, _ktype);
-
-   rc = kobject_add(kobj, _kset->kobj, "%s", name);
-   if (rc) {
-   pr_warn("kobject_add error %d for attribute: %s\n", rc,
-   name);
-   kobject_put(kobj);
-   kobj = NULL;
-   }
-
-   if (kobj)
-   kobject_uevent(kobj, KOBJ_ADD);
-
+   rc = add_var(name);
} while (!rc);
 
kfree(name);
return rc;
 }
 
+static int secvar_sysfs_load_static(void)
+{
+   const char * const *name_ptr = secvar_ops->var_names;
+   int rc;
+   while (*name_ptr) {
+   rc = add_var(*name_ptr);
+   if (rc)
+   return rc;
+   name_ptr++;
+   }
+   return 0;
+}
+
 static int secvar_sysfs_init(void)
 {
int rc;
@@ -243,7 +260,15 @@ static int secvar_sysfs_init(void)
goto err;
}
 
-   secvar_sysfs_load();
+   if (secvar_ops->get_next)
+   rc = secvar_sysfs_load();
+   else
+   rc = secvar_sysfs_load_static();
+
+   if (rc) {
+   pr_err("Failed to create variable attributes\n");
+   goto err;
+   }
 
return 0;
 err:
-- 
2.39.0



[PATCH v4 09/24] powerpc/secvar: Extend sysfs to include config vars

2023-01-19 Thread Andrew Donnellan
From: Russell Currey 

The forthcoming pseries consumer of the secvar API wants to expose a
number of config variables.  Allowing secvar implementations to provide
their own sysfs attributes makes it easy for consumers to expose what
they need to.

This is not being used by the OPAL secvar implementation at present, and
the config directory will not be created if no attributes are set.

Signed-off-by: Russell Currey 
Co-developed-by: Andrew Donnellan 
Signed-off-by: Andrew Donnellan 

---

v3: Remove unnecessary "secvar:" prefix from error messages (ajd)

Merge config attributes into secvar_operations (mpe)
---
 arch/powerpc/include/asm/secvar.h  |  2 ++
 arch/powerpc/kernel/secvar-sysfs.c | 33 +-
 2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/secvar.h 
b/arch/powerpc/include/asm/secvar.h
index b97ab793cc8a..5ed141c711b0 100644
--- a/arch/powerpc/include/asm/secvar.h
+++ b/arch/powerpc/include/asm/secvar.h
@@ -10,6 +10,7 @@
 
 #include 
 #include 
+#include 
 
 extern const struct secvar_operations *secvar_ops;
 
@@ -19,6 +20,7 @@ struct secvar_operations {
int (*set)(const char *key, u64 key_len, u8 *data, u64 data_size);
ssize_t (*format)(char *buf, size_t bufsize);
int (*max_size)(u64 *max_size);
+   const struct attribute **config_attrs;
 };
 
 #ifdef CONFIG_PPC_SECURE_BOOT
diff --git a/arch/powerpc/kernel/secvar-sysfs.c 
b/arch/powerpc/kernel/secvar-sysfs.c
index 53ac01e0eb0b..d7936d8c7478 100644
--- a/arch/powerpc/kernel/secvar-sysfs.c
+++ b/arch/powerpc/kernel/secvar-sysfs.c
@@ -144,6 +144,19 @@ static int update_kobj_size(void)
return 0;
 }
 
+static int secvar_sysfs_config(struct kobject *kobj)
+{
+   struct attribute_group config_group = {
+   .name = "config",
+   .attrs = (struct attribute **)secvar_ops->config_attrs,
+   };
+
+   if (secvar_ops->config_attrs)
+   return sysfs_create_group(kobj, _group);
+
+   return 0;
+}
+
 static int secvar_sysfs_load(void)
 {
struct kobject *kobj;
@@ -206,26 +219,36 @@ static int secvar_sysfs_init(void)
 
rc = sysfs_create_file(secvar_kobj, _attr.attr);
if (rc) {
-   kobject_put(secvar_kobj);
-   return -ENOMEM;
+   pr_err("Failed to create format object\n");
+   rc = -ENOMEM;
+   goto err;
}
 
secvar_kset = kset_create_and_add("vars", NULL, secvar_kobj);
if (!secvar_kset) {
pr_err("sysfs kobject registration failed\n");
-   kobject_put(secvar_kobj);
-   return -ENOMEM;
+   rc = -ENOMEM;
+   goto err;
}
 
rc = update_kobj_size();
if (rc) {
pr_err("Cannot read the size of the attribute\n");
-   return rc;
+   goto err;
+   }
+
+   rc = secvar_sysfs_config(secvar_kobj);
+   if (rc) {
+   pr_err("Failed to create config directory\n");
+   goto err;
}
 
secvar_sysfs_load();
 
return 0;
+err:
+   kobject_put(secvar_kobj);
+   return rc;
 }
 
 late_initcall(secvar_sysfs_init);
-- 
2.39.0



[PATCH v4 01/24] powerpc/pseries: Fix handling of PLPKS object flushing timeout

2023-01-19 Thread Andrew Donnellan
plpks_confirm_object_flushed() uses the H_PKS_CONFIRM_OBJECT_FLUSHED hcall
to check whether changes to an object in the Platform KeyStore have been
flushed to non-volatile storage.

The hcall returns two output values, the return code and the flush status.
plpks_confirm_object_flushed() polls the hcall until either the flush
status has updated, the return code is an error, or a timeout has been
exceeded.

While we're still polling, the hcall is returning H_SUCCESS (0) as the
return code. In the timeout case, this means that upon exiting the polling
loop, rc is 0, and therefore 0 is returned to the user.

Handle the timeout case separately and return ETIMEDOUT if triggered.

Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform KeyStore")
Reported-by: Benjamin Gray 
Signed-off-by: Andrew Donnellan 
Tested-by: Russell Currey 
Reviewed-by: Russell Currey 
Signed-off-by: Russell Currey 

---

v3: Merge plpks fixes and signed update series with secvar series

Neaten how we return at the end of the function (ruscur)

v4: Move up in series (npiggin)
---
 arch/powerpc/platforms/pseries/plpks.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/plpks.c 
b/arch/powerpc/platforms/pseries/plpks.c
index 4edd1585e245..9e85b6d85b0b 100644
--- a/arch/powerpc/platforms/pseries/plpks.c
+++ b/arch/powerpc/platforms/pseries/plpks.c
@@ -248,6 +248,7 @@ static int plpks_confirm_object_flushed(struct label *label,
struct plpks_auth *auth)
 {
unsigned long retbuf[PLPAR_HCALL_BUFSIZE] = { 0 };
+   bool timed_out = true;
u64 timeout = 0;
u8 status;
int rc;
@@ -259,22 +260,26 @@ static int plpks_confirm_object_flushed(struct label 
*label,
 
status = retbuf[0];
if (rc) {
+   timed_out = false;
if (rc == H_NOT_FOUND && status == 1)
rc = 0;
break;
}
 
-   if (!rc && status == 1)
+   if (!rc && status == 1) {
+   timed_out = false;
break;
+   }
 
usleep_range(PKS_FLUSH_SLEEP,
 PKS_FLUSH_SLEEP + PKS_FLUSH_SLEEP_RANGE);
timeout = timeout + PKS_FLUSH_SLEEP;
} while (timeout < PKS_FLUSH_MAX_TIMEOUT);
 
-   rc = pseries_status_to_err(rc);
+   if (timed_out)
+   return -ETIMEDOUT;
 
-   return rc;
+   return pseries_status_to_err(rc);
 }
 
 int plpks_write_var(struct plpks_var var)
-- 
2.39.0



[PATCH v4 00/24] pSeries dynamic secure boot secvar interface + platform keyring loading

2023-01-19 Thread Andrew Donnellan
This series exposes an interface to userspace for reading and writing
secure variables contained within the PowerVM LPAR Platform KeyStore
(PLPKS) for the purpose of configuring dynamic secure boot, and adds
the glue required to load keys from the PLPKS into the platform keyring.

This series builds on past work by Nayna Jain[0] in exposing PLPKS
variables to userspace.  Rather than being a generic interface for
interacting with the keystore, however, we use the existing powerpc
secvar infrastructure to only expose objects in the keystore used
for dynamic secure boot.  This has the benefit of leveraging an
existing interface and making the implementation relatively minimal.

This series integrates a previous series to fix some bugs in PLPKS
and implement support for signed updates[1], and a cleanup patch from
Michael Ellerman[2].

There are a few relevant details to note about the implementation:

 * New additions to the secvar API: format(), max_size(), config_attrs,
   var_names

 * New optional sysfs directory "config/" for arbitrary ASCII variables

 * Some OPAL-specific code has been relocated from secvar-sysfs.c to
   powernv platform code.  Would appreciate any powernv testing!

 * Variable names are fixed and only those used for secure boot are
   exposed.  This is not a generic PLPKS interface, but also doesn't
   preclude one being added in future.

With this series, both powernv and pseries platforms support dynamic
secure boot through the same interface.

[0]: 
https://lore.kernel.org/linuxppc-dev/20221106210744.603240-1-na...@linux.ibm.com/
[1]: 
https://lore.kernel.org/linuxppc-dev/20221220071626.1426786-1-...@linux.ibm.com/
[2]: 
https://lore.kernel.org/linuxppc-dev/20230112023819.1692452-1-...@ellerman.id.au/

v1: 
https://lore.kernel.org/linuxppc-dev/20221228072943.429266-1-rus...@russell.cc/
v2: 
https://lore.kernel.org/linuxppc-dev/20221230042014.154483-1-rus...@russell.cc/
v3: 
https://lore.kernel.org/linuxppc-dev/20230118061049.1006141-1-...@linux.ibm.com/

=

Changes in v4:

Fix the build when CONFIG_PSERIES_PLPKS=n (snowpatch)

Shuffled fixes to the front the series (npiggin)

Pass buffer size in secvar_operations->format() (stefanb, npiggin)

Return an error when set_secvar_ops() fails (npiggin)

Add some extra null checks (stefanb, gjoyce)

Add commit message comment elaborating on PAGE_SIZE issues (joel)

Fix error handling in the kexec code (ruscur)

Fix hvcall.h MAX_HCALL_OPCODE rebasing issue (npiggin)

Changes in v3:

Integrate Andrew's PLPKS bugfixes and enhancements series and Michael
Ellerman's u64 cleanup patch into this series (and update the other
patches to use u64)

New patches to load keys from the PLPKS into the kernel's platform
keyring (ruscur)

New patches to pass PLPKS password to new kernels when kexecing
(ruscur)

Improve handling of format strings (ruscur)

Clean up secvar error messages (ajd)

Merge config attributes into secvar_operations (mpe)

Add a new static variable names API rather than (ab)using get_next()
(ajd/mpe)

Warning message when PAGE_SIZE is smaller than the max object size
(ajd)

Move plpks.h to the include directory, and move a bunch of constants
in there with a consistent naming scheme

Refresh PLPKS config values whenever plpks_get_usedspace() is called
(ajd)

Extra validation on PLPKS config values (ruscur)

Return maxobjlabelsize to userspace as is without subtracting overhead 
(ruscur)

Fix error code handling in plpks_confirm_object_flushed() (ruscur)

Pass plpks_var struct to plpks_signed_update_var() by reference (mpe)

Make the data buffer in plpks_read_var() caller-allocated to reduce
number of allocations/copies (mpe)

Rework the Kconfig options so that PSERIES_PLPKS is a hidden option,
turned on by enabling PPC_SECURE_BOOT, and the PLPKS secvar code is
activated by PPC_SECVAR_SYSFS to match powernv (ajd)

Use machine_arch_initcall() rather than device_initcall() so we don't
break powernv (mpe)

Improve ABI documentation (mpe)

Return -EIO on most read errors (mpe)

Add "grubdbx" variable (Sudhakar)

Use utf8s_to_utf16s() rather than our own "UCS-2" conversion code (mpe)

Fix SB_VERSION data length (ruscur)

Stop prepending policy data on read (ruscur)

Don't print errors to the kernel log when reading non-existent
variables (Sudhakar)

Miscellaneous code style, checkpatch cleanups

Changes in v2:

Remove unnecessary config vars from sysfs and document the others,
thanks to review from Greg.  If we end up needing to expose more, we
can add them later and update the docs.

Use sysfs_emit() instead of sprintf() for all sysfs strings

Change the size of the sysfs binary attributes to include the 8-byte
flags header, preventing truncation of large writes.

Andrew Donnellan (8):
  powerpc/pseries: Fix handling of 

[PATCH v4 03/24] powerpc/secvar: Use u64 in secvar_operations

2023-01-19 Thread Andrew Donnellan
From: Michael Ellerman 

There's no reason for secvar_operations to use uint64_t vs the more
common kernel type u64.

The types are compatible, but they require different printk format
strings which can lead to confusion.

Change all the secvar related routines to use u64.

Signed-off-by: Michael Ellerman 
Reviewed-by: Russell Currey 
Reviewed-by: Andrew Donnellan 
Signed-off-by: Andrew Donnellan 

---

v3: Include new patch
---
 arch/powerpc/include/asm/secvar.h| 9 +++--
 arch/powerpc/kernel/secvar-sysfs.c   | 8 
 arch/powerpc/platforms/powernv/opal-secvar.c | 9 +++--
 security/integrity/platform_certs/load_powerpc.c | 4 ++--
 4 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/secvar.h 
b/arch/powerpc/include/asm/secvar.h
index 4cc35b58b986..07ba36f868a7 100644
--- a/arch/powerpc/include/asm/secvar.h
+++ b/arch/powerpc/include/asm/secvar.h
@@ -14,12 +14,9 @@
 extern const struct secvar_operations *secvar_ops;
 
 struct secvar_operations {
-   int (*get)(const char *key, uint64_t key_len, u8 *data,
-  uint64_t *data_size);
-   int (*get_next)(const char *key, uint64_t *key_len,
-   uint64_t keybufsize);
-   int (*set)(const char *key, uint64_t key_len, u8 *data,
-  uint64_t data_size);
+   int (*get)(const char *key, u64 key_len, u8 *data, u64 *data_size);
+   int (*get_next)(const char *key, u64 *key_len, u64 keybufsize);
+   int (*set)(const char *key, u64 key_len, u8 *data, u64 data_size);
 };
 
 #ifdef CONFIG_PPC_SECURE_BOOT
diff --git a/arch/powerpc/kernel/secvar-sysfs.c 
b/arch/powerpc/kernel/secvar-sysfs.c
index 1ee4640a2641..001cdbcdb9d2 100644
--- a/arch/powerpc/kernel/secvar-sysfs.c
+++ b/arch/powerpc/kernel/secvar-sysfs.c
@@ -47,7 +47,7 @@ static ssize_t format_show(struct kobject *kobj, struct 
kobj_attribute *attr,
 static ssize_t size_show(struct kobject *kobj, struct kobj_attribute *attr,
 char *buf)
 {
-   uint64_t dsize;
+   u64 dsize;
int rc;
 
rc = secvar_ops->get(kobj->name, strlen(kobj->name) + 1, NULL, );
@@ -64,8 +64,8 @@ static ssize_t data_read(struct file *filep, struct kobject 
*kobj,
 struct bin_attribute *attr, char *buf, loff_t off,
 size_t count)
 {
-   uint64_t dsize;
char *data;
+   u64 dsize;
int rc;
 
rc = secvar_ops->get(kobj->name, strlen(kobj->name) + 1, NULL, );
@@ -166,9 +166,9 @@ static int update_kobj_size(void)
 
 static int secvar_sysfs_load(void)
 {
-   char *name;
-   uint64_t namesize = 0;
struct kobject *kobj;
+   u64 namesize = 0;
+   char *name;
int rc;
 
name = kzalloc(NAME_MAX_SIZE, GFP_KERNEL);
diff --git a/arch/powerpc/platforms/powernv/opal-secvar.c 
b/arch/powerpc/platforms/powernv/opal-secvar.c
index 14133e120bdd..ef89861569e0 100644
--- a/arch/powerpc/platforms/powernv/opal-secvar.c
+++ b/arch/powerpc/platforms/powernv/opal-secvar.c
@@ -54,8 +54,7 @@ static int opal_status_to_err(int rc)
return err;
 }
 
-static int opal_get_variable(const char *key, uint64_t ksize,
-u8 *data, uint64_t *dsize)
+static int opal_get_variable(const char *key, u64 ksize, u8 *data, u64 *dsize)
 {
int rc;
 
@@ -71,8 +70,7 @@ static int opal_get_variable(const char *key, uint64_t ksize,
return opal_status_to_err(rc);
 }
 
-static int opal_get_next_variable(const char *key, uint64_t *keylen,
- uint64_t keybufsize)
+static int opal_get_next_variable(const char *key, u64 *keylen, u64 keybufsize)
 {
int rc;
 
@@ -88,8 +86,7 @@ static int opal_get_next_variable(const char *key, uint64_t 
*keylen,
return opal_status_to_err(rc);
 }
 
-static int opal_set_variable(const char *key, uint64_t ksize, u8 *data,
-uint64_t dsize)
+static int opal_set_variable(const char *key, u64 ksize, u8 *data, u64 dsize)
 {
int rc;
 
diff --git a/security/integrity/platform_certs/load_powerpc.c 
b/security/integrity/platform_certs/load_powerpc.c
index a2900cb85357..1e4f80a4e71c 100644
--- a/security/integrity/platform_certs/load_powerpc.c
+++ b/security/integrity/platform_certs/load_powerpc.c
@@ -18,7 +18,7 @@
 /*
  * Get a certificate list blob from the named secure variable.
  */
-static __init void *get_cert_list(u8 *key, unsigned long keylen, uint64_t 
*size)
+static __init void *get_cert_list(u8 *key, unsigned long keylen, u64 *size)
 {
int rc;
void *db;
@@ -51,7 +51,7 @@ static __init void *get_cert_list(u8 *key, unsigned long 
keylen, uint64_t *size)
 static int __init load_powerpc_certs(void)
 {
void *db = NULL, *dbx = NULL;
-   uint64_t dbsize = 0, dbxsize = 0;
+   u64 dbsize = 0, dbxsize = 0;
int rc = 0;
struct device_node *node;
 
-- 
2.39.0



[PATCH v4 02/24] powerpc/pseries: Fix alignment of PLPKS structures and buffers

2023-01-19 Thread Andrew Donnellan
A number of structures and buffers passed to PKS hcalls have alignment
requirements, which could on occasion cause problems:

- Authorisation structures must be 16-byte aligned and must not cross a
  page boundary

- Label structures must not cross page boundaries

- Password output buffers must not cross page boundaries

Round up the allocations of these structures/buffers to the next power of
2 to make sure this happens.

Reported-by: Benjamin Gray 
Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform KeyStore")
Signed-off-by: Andrew Donnellan 
Reviewed-by: Russell Currey 
Signed-off-by: Russell Currey 

---

v3: Merge plpks fixes and signed update series with secvar series

v4: Fix typo in commit message

Move up in series (npiggin)
---
 arch/powerpc/platforms/pseries/plpks.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/plpks.c 
b/arch/powerpc/platforms/pseries/plpks.c
index 9e85b6d85b0b..a01cf2ff140a 100644
--- a/arch/powerpc/platforms/pseries/plpks.c
+++ b/arch/powerpc/platforms/pseries/plpks.c
@@ -126,7 +126,8 @@ static int plpks_gen_password(void)
u8 *password, consumer = PKS_OS_OWNER;
int rc;
 
-   password = kzalloc(maxpwsize, GFP_KERNEL);
+   // The password must not cross a page boundary, so we align to the next 
power of 2
+   password = kzalloc(roundup_pow_of_two(maxpwsize), GFP_KERNEL);
if (!password)
return -ENOMEM;
 
@@ -162,7 +163,9 @@ static struct plpks_auth *construct_auth(u8 consumer)
if (consumer > PKS_OS_OWNER)
return ERR_PTR(-EINVAL);
 
-   auth = kzalloc(struct_size(auth, password, maxpwsize), GFP_KERNEL);
+   // The auth structure must not cross a page boundary and must be
+   // 16 byte aligned. We align to the next largest power of 2
+   auth = kzalloc(roundup_pow_of_two(struct_size(auth, password, 
maxpwsize)), GFP_KERNEL);
if (!auth)
return ERR_PTR(-ENOMEM);
 
@@ -196,7 +199,8 @@ static struct label *construct_label(char *component, u8 
varos, u8 *name,
if (component && slen > sizeof(label->attr.prefix))
return ERR_PTR(-EINVAL);
 
-   label = kzalloc(sizeof(*label), GFP_KERNEL);
+   // The label structure must not cross a page boundary, so we align to 
the next power of 2
+   label = kzalloc(roundup_pow_of_two(sizeof(*label)), GFP_KERNEL);
if (!label)
return ERR_PTR(-ENOMEM);
 
-- 
2.39.0



[PATCH v4 04/24] powerpc/secvar: Warn and error if multiple secvar ops are set

2023-01-19 Thread Andrew Donnellan
From: Russell Currey 

The secvar code only supports one consumer at a time.

Multiple consumers aren't possible at this point in time, but we'd want
it to be obvious if it ever could happen.

Signed-off-by: Russell Currey 
Co-developed-by: Andrew Donnellan 
Signed-off-by: Andrew Donnellan 

---

v4: Return an error and don't actually try to set secvar_operations if the
warning is triggered (npiggin)
---
 arch/powerpc/include/asm/secvar.h| 4 ++--
 arch/powerpc/kernel/secvar-ops.c | 8 ++--
 arch/powerpc/platforms/powernv/opal-secvar.c | 4 +---
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/secvar.h 
b/arch/powerpc/include/asm/secvar.h
index 07ba36f868a7..4ce3f12c5613 100644
--- a/arch/powerpc/include/asm/secvar.h
+++ b/arch/powerpc/include/asm/secvar.h
@@ -21,11 +21,11 @@ struct secvar_operations {
 
 #ifdef CONFIG_PPC_SECURE_BOOT
 
-extern void set_secvar_ops(const struct secvar_operations *ops);
+extern int set_secvar_ops(const struct secvar_operations *ops);
 
 #else
 
-static inline void set_secvar_ops(const struct secvar_operations *ops) { }
+static inline int set_secvar_ops(const struct secvar_operations *ops) { return 
0; }
 
 #endif
 
diff --git a/arch/powerpc/kernel/secvar-ops.c b/arch/powerpc/kernel/secvar-ops.c
index 6a29777d6a2d..9c8dd4e7c270 100644
--- a/arch/powerpc/kernel/secvar-ops.c
+++ b/arch/powerpc/kernel/secvar-ops.c
@@ -8,10 +8,14 @@
 
 #include 
 #include 
+#include 
 
-const struct secvar_operations *secvar_ops __ro_after_init;
+const struct secvar_operations *secvar_ops __ro_after_init = NULL;
 
-void set_secvar_ops(const struct secvar_operations *ops)
+int set_secvar_ops(const struct secvar_operations *ops)
 {
+   if (WARN_ON_ONCE(secvar_ops))
+   return -1;
secvar_ops = ops;
+   return 0;
 }
diff --git a/arch/powerpc/platforms/powernv/opal-secvar.c 
b/arch/powerpc/platforms/powernv/opal-secvar.c
index ef89861569e0..4c0a3b030fe0 100644
--- a/arch/powerpc/platforms/powernv/opal-secvar.c
+++ b/arch/powerpc/platforms/powernv/opal-secvar.c
@@ -113,9 +113,7 @@ static int opal_secvar_probe(struct platform_device *pdev)
return -ENODEV;
}
 
-   set_secvar_ops(_secvar_ops);
-
-   return 0;
+   return set_secvar_ops(_secvar_ops);
 }
 
 static const struct of_device_id opal_secvar_match[] = {
-- 
2.39.0



Re: [PATCH v3 08/24] powerpc/secvar: Allow backend to populate static list of variable names

2023-01-19 Thread Andrew Donnellan
On Thu, 2023-01-19 at 11:10 +1000, Nicholas Piggin wrote:
> > diff --git a/arch/powerpc/include/asm/secvar.h
> > b/arch/powerpc/include/asm/secvar.h
> > index ebf95386d720..c8bee1834b54 100644
> > --- a/arch/powerpc/include/asm/secvar.h
> > +++ b/arch/powerpc/include/asm/secvar.h
> > @@ -23,6 +23,10 @@ struct secvar_operations {
> > ssize_t (*format)(char *buf);
> > int (*max_size)(u64 *max_size);
> > const struct attribute **config_attrs;
> > +
> > +   // NULL-terminated array of fixed variable names
> > +   // Only used if get_next() isn't provided
> > +   const char * const *var_names;
> 
> The other way you could go is provide a sysfs_init() ops call here,
> and export the add_var as a library function that backends can use.

True, I think I'll keep it as is for now but I'll have a think about
whether to do that in a later patch.

-- 
Andrew DonnellanOzLabs, ADL Canberra
a...@linux.ibm.com   IBM Australia Limited


Re: Calculating array sizes in C - was: Re: Build regressions/improvements in v6.2-rc1

2023-01-19 Thread Rob Landley



On 1/19/23 16:11, Michael.Karcher wrote:
> Isn't this supposed to be caught by this check:

  a, __same_type(a, NULL)

 ?
>>>
>>> Yeah, but gcc thinks it is smarter than us...
>>> Probably it drops the test, assuming UB cannot happen.
>> Hmm, sounds like a GGC bug to me then. Not sure how to fix this then.
> 
> 
> I don't see a clear bug at this point. We are talking about the C expression
> 
>    __same_type((void*)0, (void*)0)? 0 : sizeof((void*)0)/sizeof(*((void*0))

*(void*) is type "void" which does not have a size.

The problem is gcc "optimizing out" an earlier type check, the same way it
"optimizes out" checks for signed integer math overflowing, or "optimizes out" a
comparison to pointers from two different local variables from different
function calls trying to calculate the amount of stack used, or "optimizes out"
using char *x = (char *)1; as a flag value and then doing "if (!(x-1)) because
it can "never happen"...
> I suggest to file a bug against gcc complaining about a "spurious 
> warning", and using "-Werror -Wno-error-sizeof-pointer-div" until gcc is 
> adapted to not emit the warning about the pointer division if the result 
> is not used.

Remember when gcc got rewritten in c++ starting in 2007?

Historically the main marketing push of C++ was that it contains the whole of C
and therefore MUST be just as good a language, the same way a mud pie contains
an entire glass of water and therefore MUST be just as good a beverage. Anything
C can do that C++ _can't_ do is seen as an existential threat by C++ developers.
They've worked dilligently to "fix" C not being a giant pile of "undefined
behavior" the way C++ is for 15 years now.

I have... opinions on this.

> Regards,
>    Michael Karcher

Rob


Re: [PATCH v3 04/24] powerpc/secvar: Handle format string in the consumer

2023-01-19 Thread Russell Currey
On Thu, 2023-01-19 at 11:17 +1000, Nicholas Piggin wrote:
> On Wed Jan 18, 2023 at 4:10 PM AEST, Andrew Donnellan wrote:
> > From: Russell Currey 
> > 
> > The code that handles the format string in secvar-sysfs.c is
> > entirely
> > OPAL specific, so create a new "format" op in secvar_operations to
> > make
> > the secvar code more generic.  No functional change.
> > 
> > Signed-off-by: Russell Currey 
> > Signed-off-by: Andrew Donnellan 
> > 
> > ---
> > 
> > v2: Use sysfs_emit() instead of sprintf() (gregkh)
> > 
> > v3: Enforce format string size limit (ruscur)
> > ---
> >  arch/powerpc/include/asm/secvar.h    |  3 +++
> >  arch/powerpc/kernel/secvar-sysfs.c   | 23 
> > --
> >  arch/powerpc/platforms/powernv/opal-secvar.c | 25
> > 
> >  3 files changed, 33 insertions(+), 18 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/secvar.h
> > b/arch/powerpc/include/asm/secvar.h
> > index 07ba36f868a7..8b6475589120 100644
> > --- a/arch/powerpc/include/asm/secvar.h
> > +++ b/arch/powerpc/include/asm/secvar.h
> > @@ -11,12 +11,15 @@
> >  #include 
> >  #include 
> >  
> > +#define SECVAR_MAX_FORMAT_LEN  30 // max length of string returned
> > by ->format()
> > +
> >  extern const struct secvar_operations *secvar_ops;
> >  
> >  struct secvar_operations {
> > int (*get)(const char *key, u64 key_len, u8 *data, u64
> > *data_size);
> > int (*get_next)(const char *key, u64 *key_len, u64
> > keybufsize);
> > int (*set)(const char *key, u64 key_len, u8 *data, u64
> > data_size);
> > +   ssize_t (*format)(char *buf);
> >  };
> >  
> >  #ifdef CONFIG_PPC_SECURE_BOOT
> > diff --git a/arch/powerpc/kernel/secvar-sysfs.c
> > b/arch/powerpc/kernel/secvar-sysfs.c
> > index 462cacc0ca60..d3858eedd72c 100644
> > --- a/arch/powerpc/kernel/secvar-sysfs.c
> > +++ b/arch/powerpc/kernel/secvar-sysfs.c
> > @@ -21,26 +21,13 @@ static struct kset *secvar_kset;
> >  static ssize_t format_show(struct kobject *kobj, struct
> > kobj_attribute *attr,
> >    char *buf)
> >  {
> > -   ssize_t rc = 0;
> > -   struct device_node *node;
> > -   const char *format;
> > -
> > -   node = of_find_compatible_node(NULL, NULL, "ibm,secvar-
> > backend");
> > -   if (!of_device_is_available(node)) {
> > -   rc = -ENODEV;
> > -   goto out;
> > -   }
> > +   char tmp[SECVAR_MAX_FORMAT_LEN];
> > +   ssize_t len = secvar_ops->format(tmp);
> >  
> > -   rc = of_property_read_string(node, "format", );
> > -   if (rc)
> > -   goto out;
> > +   if (len <= 0)
> > +   return -EIO;
> 
> AFAIKS this does have a functional change, it loses the return value.
> Why not return len if it is < 0, and -EIO if len == 0?

In v2 mpe suggested the following:

   I'm not sure you should pass that raw error back to sysfs. Some of
   the
   values could be confusing, eg. if you return -EINVAL it looks like a
   parameter to the read() syscall was invalid. Might be better to just
   return -EIO.
   
Following that advice, I don't think we should return something other
than -EIO, but we should at least pr_err() to document the error - this
isn't something that should ever fail.

> 
> Thanks,
> Nick



Re: Calculating array sizes in C - was: Re: Build regressions/improvements in v6.2-rc1

2023-01-19 Thread Michael.Karcher

Isn't this supposed to be caught by this check:


 a, __same_type(a, NULL)

?


Yeah, but gcc thinks it is smarter than us...
Probably it drops the test, assuming UB cannot happen.

Hmm, sounds like a GGC bug to me then. Not sure how to fix this then.



I don't see a clear bug at this point. We are talking about the C expression

  __same_type((void*)0, (void*)0)? 0 : sizeof((void*)0)/sizeof(*((void*0))

This expression is valid (assuming __same_type works, which is a GCC 
extension), and should return 0. As of now, I have no indication that 
this expression does not return 0. Also, it is true that this expression 
contains the suspicious pattern "sizeof(void*)/sizeof(void)", which is 
does not calculate the size of any array. GCC is free to emit as much 
warnings is it wants for any kind of expressions. From a C standard 
point of view, it's just a "quality of implementation" issue, and an 
implementation that emits useless warnings is of low quality, but not 
non-conforming.


In this case, we requested that gcc refuses to compile if it emits any 
kind of warning, which instructs gcc to reject programs that would be 
valid according to the C standard, but are deemed to be "likely incorrect".


I suggest to file a bug against gcc complaining about a "spurious 
warning", and using "-Werror -Wno-error-sizeof-pointer-div" until gcc is 
adapted to not emit the warning about the pointer division if the result 
is not used.



Regards,
  Michael Karcher



Re: [PATCH v3 05/24] powerpc/secvar: Handle max object size in the consumer

2023-01-19 Thread Greg Joyce
On Wed, 2023-01-18 at 17:10 +1100, Andrew Donnellan wrote:
> From: Russell Currey 
> 
> Currently the max object size is handled in the core secvar code with
> an
> entirely OPAL-specific implementation, so create a new max_size() op
> and
> move the existing implementation into the powernv platform.  Should
> be
> no functional change.
> 
> Signed-off-by: Russell Currey 
> Signed-off-by: Andrew Donnellan 
> 
> ---
> 
> v3: Change uint64_t type to u64 (mpe)
> ---
>  arch/powerpc/include/asm/secvar.h|  1 +
>  arch/powerpc/kernel/secvar-sysfs.c   | 17 +++--
>  arch/powerpc/platforms/powernv/opal-secvar.c | 19
> +++
>  3 files changed, 23 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/secvar.h
> b/arch/powerpc/include/asm/secvar.h
> index 8b6475589120..b2cb9bb7c540 100644
> --- a/arch/powerpc/include/asm/secvar.h
> +++ b/arch/powerpc/include/asm/secvar.h
> @@ -20,6 +20,7 @@ struct secvar_operations {
>   int (*get_next)(const char *key, u64 *key_len, u64 keybufsize);
>   int (*set)(const char *key, u64 key_len, u8 *data, u64
> data_size);
>   ssize_t (*format)(char *buf);
> + int (*max_size)(u64 *max_size);
>  };
>  
>  #ifdef CONFIG_PPC_SECURE_BOOT
> diff --git a/arch/powerpc/kernel/secvar-sysfs.c
> b/arch/powerpc/kernel/secvar-sysfs.c
> index d3858eedd72c..031ef37bca99 100644
> --- a/arch/powerpc/kernel/secvar-sysfs.c
> +++ b/arch/powerpc/kernel/secvar-sysfs.c
> @@ -128,27 +128,16 @@ static struct kobj_type secvar_ktype = {
>  static int update_kobj_size(void)
>  {
>  
> - struct device_node *node;
>   u64 varsize;
> - int rc = 0;
> + int rc = secvar_ops->max_size();
>  
> - node = of_find_compatible_node(NULL, NULL, "ibm,secvar-
> backend");
> - if (!of_device_is_available(node)) {
> - rc = -ENODEV;
> - goto out;
> - }
> -
> - rc = of_property_read_u64(node, "max-var-size", );
>   if (rc)
> - goto out;
> + return rc;
>  
>   data_attr.size = varsize;
>   update_attr.size = varsize;
>  
> -out:
> - of_node_put(node);
> -
> - return rc;
> + return 0;
>  }
>  
>  static int secvar_sysfs_load(void)
> diff --git a/arch/powerpc/platforms/powernv/opal-secvar.c
> b/arch/powerpc/platforms/powernv/opal-secvar.c
> index 623c6839e66c..c9b9fd3730df 100644
> --- a/arch/powerpc/platforms/powernv/opal-secvar.c
> +++ b/arch/powerpc/platforms/powernv/opal-secvar.c
> @@ -122,11 +122,30 @@ static ssize_t opal_secvar_format(char *buf)
>   return rc;
>  }
>  
> +static int opal_secvar_max_size(u64 *max_size)
> +{
> + int rc;
> + struct device_node *node;
> +
> + node = of_find_compatible_node(NULL, NULL, "ibm,secvar-
> backend");

I assume that node could be NULL and this code relies on
of_device_is_available() and of_node_put() checking for a NULL node
pointer? Would it be safer just to return -ENODEV if node is NULL?

> + if (!of_device_is_available(node)) {
> + rc = -ENODEV;
> + goto out;
> + }
> +
> + rc = of_property_read_u64(node, "max-var-size", max_size);
> +
> +out:
> + of_node_put(node);
> + return rc;
> +}
> +
>  static const struct secvar_operations opal_secvar_ops = {
>   .get = opal_get_variable,
>   .get_next = opal_get_next_variable,
>   .set = opal_set_variable,
>   .format = opal_secvar_format,
> + .max_size = opal_secvar_max_size,
>  };
>  
>  static int opal_secvar_probe(struct platform_device *pdev)



Re: [PATCH 39/41] kernel/fork: throttle call_rcu() calls in vm_area_free

2023-01-19 Thread Paul E. McKenney
On Thu, Jan 19, 2023 at 11:47:36AM -0800, Suren Baghdasaryan wrote:
> On Thu, Jan 19, 2023 at 11:20 AM Paul E. McKenney  wrote:
> >
> > On Thu, Jan 19, 2023 at 10:52:03AM -0800, Suren Baghdasaryan wrote:
> > > On Thu, Jan 19, 2023 at 4:59 AM Michal Hocko  wrote:
> > > >
> > > > On Mon 09-01-23 12:53:34, Suren Baghdasaryan wrote:
> > > > > call_rcu() can take a long time when callback offloading is enabled.
> > > > > Its use in the vm_area_free can cause regressions in the exit path 
> > > > > when
> > > > > multiple VMAs are being freed. To minimize that impact, place VMAs 
> > > > > into
> > > > > a list and free them in groups using one call_rcu() call per group.
> > > >
> > > > After some more clarification I can understand how call_rcu might not be
> > > > super happy about thousands of callbacks to be invoked and I do agree
> > > > that this is not really optimal.
> > > >
> > > > On the other hand I do not like this solution much either.
> > > > VM_AREA_FREE_LIST_MAX is arbitrary and it won't really help all that
> > > > much with processes with a huge number of vmas either. It would still be
> > > > in housands of callbacks to be scheduled without a good reason.
> > > >
> > > > Instead, are there any other cases than remove_vma that need this
> > > > batching? We could easily just link all the vmas into linked list and
> > > > use a single call_rcu instead, no? This would both simplify the
> > > > implementation, remove the scaling issue as well and we do not have to
> > > > argue whether VM_AREA_FREE_LIST_MAX should be epsilon or epsilon + 1.
> > >
> > > Yes, I agree the solution is not stellar. I wanted something simple
> > > but this is probably too simple. OTOH keeping all dead vm_area_structs
> > > on the list without hooking up a shrinker (additional complexity) does
> > > not sound too appealing either. WDYT about time domain throttling to
> > > limit draining the list to say once per second like this:
> > >
> > > void vm_area_free(struct vm_area_struct *vma)
> > > {
> > >struct mm_struct *mm = vma->vm_mm;
> > >bool drain;
> > >
> > >free_anon_vma_name(vma);
> > >
> > >spin_lock(>vma_free_list.lock);
> > >list_add(>vm_free_list, >vma_free_list.head);
> > >mm->vma_free_list.size++;
> > > -   drain = mm->vma_free_list.size > VM_AREA_FREE_LIST_MAX;
> > > +   drain = jiffies > mm->last_drain_tm + HZ;
> > >
> > >spin_unlock(>vma_free_list.lock);
> > >
> > > -   if (drain)
> > > +   if (drain) {
> > >   drain_free_vmas(mm);
> > > + mm->last_drain_tm = jiffies;
> > > +   }
> > > }
> > >
> > > Ultimately we want to prevent very frequent call_rcu() calls, so
> > > throttling in the time domain seems appropriate. That's the simplest
> > > way I can think of to address your concern about a quick spike in VMA
> > > freeing. It does not place any restriction on the list size and we
> > > might have excessive dead vm_area_structs if after a large spike there
> > > are no vm_area_free() calls but I don't know if that's a real problem,
> > > so not sure we should be addressing it at this time. WDYT?
> >
> > Just to double-check, we really did try the very frequent call_rcu()
> > invocations and we really did see a problem, correct?
> 
> Correct. More specifically with CONFIG_RCU_NOCB_CPU=y we saw
> regressions when a process exits and all its VMAs get destroyed,
> causing a flood of call_rcu()'s.

Thank you for the reminder, real problem needs solution.  ;-)

Thanx, Paul

> > Although it is not perfect, call_rcu() is designed to take a fair amount
> > of abuse.  So if we didn't see a real problem, the frequent call_rcu()
> > invocations might be a bit simpler.


Re: [PATCH 39/41] kernel/fork: throttle call_rcu() calls in vm_area_free

2023-01-19 Thread Suren Baghdasaryan
On Thu, Jan 19, 2023 at 11:20 AM Paul E. McKenney  wrote:
>
> On Thu, Jan 19, 2023 at 10:52:03AM -0800, Suren Baghdasaryan wrote:
> > On Thu, Jan 19, 2023 at 4:59 AM Michal Hocko  wrote:
> > >
> > > On Mon 09-01-23 12:53:34, Suren Baghdasaryan wrote:
> > > > call_rcu() can take a long time when callback offloading is enabled.
> > > > Its use in the vm_area_free can cause regressions in the exit path when
> > > > multiple VMAs are being freed. To minimize that impact, place VMAs into
> > > > a list and free them in groups using one call_rcu() call per group.
> > >
> > > After some more clarification I can understand how call_rcu might not be
> > > super happy about thousands of callbacks to be invoked and I do agree
> > > that this is not really optimal.
> > >
> > > On the other hand I do not like this solution much either.
> > > VM_AREA_FREE_LIST_MAX is arbitrary and it won't really help all that
> > > much with processes with a huge number of vmas either. It would still be
> > > in housands of callbacks to be scheduled without a good reason.
> > >
> > > Instead, are there any other cases than remove_vma that need this
> > > batching? We could easily just link all the vmas into linked list and
> > > use a single call_rcu instead, no? This would both simplify the
> > > implementation, remove the scaling issue as well and we do not have to
> > > argue whether VM_AREA_FREE_LIST_MAX should be epsilon or epsilon + 1.
> >
> > Yes, I agree the solution is not stellar. I wanted something simple
> > but this is probably too simple. OTOH keeping all dead vm_area_structs
> > on the list without hooking up a shrinker (additional complexity) does
> > not sound too appealing either. WDYT about time domain throttling to
> > limit draining the list to say once per second like this:
> >
> > void vm_area_free(struct vm_area_struct *vma)
> > {
> >struct mm_struct *mm = vma->vm_mm;
> >bool drain;
> >
> >free_anon_vma_name(vma);
> >
> >spin_lock(>vma_free_list.lock);
> >list_add(>vm_free_list, >vma_free_list.head);
> >mm->vma_free_list.size++;
> > -   drain = mm->vma_free_list.size > VM_AREA_FREE_LIST_MAX;
> > +   drain = jiffies > mm->last_drain_tm + HZ;
> >
> >spin_unlock(>vma_free_list.lock);
> >
> > -   if (drain)
> > +   if (drain) {
> >   drain_free_vmas(mm);
> > + mm->last_drain_tm = jiffies;
> > +   }
> > }
> >
> > Ultimately we want to prevent very frequent call_rcu() calls, so
> > throttling in the time domain seems appropriate. That's the simplest
> > way I can think of to address your concern about a quick spike in VMA
> > freeing. It does not place any restriction on the list size and we
> > might have excessive dead vm_area_structs if after a large spike there
> > are no vm_area_free() calls but I don't know if that's a real problem,
> > so not sure we should be addressing it at this time. WDYT?
>
> Just to double-check, we really did try the very frequent call_rcu()
> invocations and we really did see a problem, correct?

Correct. More specifically with CONFIG_RCU_NOCB_CPU=y we saw
regressions when a process exits and all its VMAs get destroyed,
causing a flood of call_rcu()'s.

>
> Although it is not perfect, call_rcu() is designed to take a fair amount
> of abuse.  So if we didn't see a real problem, the frequent call_rcu()
> invocations might be a bit simpler.
>
> Thanx, Paul


Re: [PATCH] modpost: support arbitrary symbol length in modversion

2023-01-19 Thread Miguel Ojeda
On Wed, Jan 18, 2023 at 8:02 AM Masahiro Yamada  wrote:
>
> - *.mod.c is kept human readable.

On the topic of `.mod.c` readability: for approaches that may be less
readable, we could improve that by adding some extra comments or
rearrange things in a different way (it is a generated file, after
all!).

For instance, for the original approach: https://godbolt.org/z/6oh45axnc

Cheers,
Miguel


Re: [PATCH 39/41] kernel/fork: throttle call_rcu() calls in vm_area_free

2023-01-19 Thread Paul E. McKenney
On Thu, Jan 19, 2023 at 10:52:03AM -0800, Suren Baghdasaryan wrote:
> On Thu, Jan 19, 2023 at 4:59 AM Michal Hocko  wrote:
> >
> > On Mon 09-01-23 12:53:34, Suren Baghdasaryan wrote:
> > > call_rcu() can take a long time when callback offloading is enabled.
> > > Its use in the vm_area_free can cause regressions in the exit path when
> > > multiple VMAs are being freed. To minimize that impact, place VMAs into
> > > a list and free them in groups using one call_rcu() call per group.
> >
> > After some more clarification I can understand how call_rcu might not be
> > super happy about thousands of callbacks to be invoked and I do agree
> > that this is not really optimal.
> >
> > On the other hand I do not like this solution much either.
> > VM_AREA_FREE_LIST_MAX is arbitrary and it won't really help all that
> > much with processes with a huge number of vmas either. It would still be
> > in housands of callbacks to be scheduled without a good reason.
> >
> > Instead, are there any other cases than remove_vma that need this
> > batching? We could easily just link all the vmas into linked list and
> > use a single call_rcu instead, no? This would both simplify the
> > implementation, remove the scaling issue as well and we do not have to
> > argue whether VM_AREA_FREE_LIST_MAX should be epsilon or epsilon + 1.
> 
> Yes, I agree the solution is not stellar. I wanted something simple
> but this is probably too simple. OTOH keeping all dead vm_area_structs
> on the list without hooking up a shrinker (additional complexity) does
> not sound too appealing either. WDYT about time domain throttling to
> limit draining the list to say once per second like this:
> 
> void vm_area_free(struct vm_area_struct *vma)
> {
>struct mm_struct *mm = vma->vm_mm;
>bool drain;
> 
>free_anon_vma_name(vma);
> 
>spin_lock(>vma_free_list.lock);
>list_add(>vm_free_list, >vma_free_list.head);
>mm->vma_free_list.size++;
> -   drain = mm->vma_free_list.size > VM_AREA_FREE_LIST_MAX;
> +   drain = jiffies > mm->last_drain_tm + HZ;
> 
>spin_unlock(>vma_free_list.lock);
> 
> -   if (drain)
> +   if (drain) {
>   drain_free_vmas(mm);
> + mm->last_drain_tm = jiffies;
> +   }
> }
> 
> Ultimately we want to prevent very frequent call_rcu() calls, so
> throttling in the time domain seems appropriate. That's the simplest
> way I can think of to address your concern about a quick spike in VMA
> freeing. It does not place any restriction on the list size and we
> might have excessive dead vm_area_structs if after a large spike there
> are no vm_area_free() calls but I don't know if that's a real problem,
> so not sure we should be addressing it at this time. WDYT?

Just to double-check, we really did try the very frequent call_rcu()
invocations and we really did see a problem, correct?

Although it is not perfect, call_rcu() is designed to take a fair amount
of abuse.  So if we didn't see a real problem, the frequent call_rcu()
invocations might be a bit simpler.

Thanx, Paul


Re: [PATCH 39/41] kernel/fork: throttle call_rcu() calls in vm_area_free

2023-01-19 Thread Paul E. McKenney
On Thu, Jan 19, 2023 at 01:52:14PM +0100, Michal Hocko wrote:
> On Wed 18-01-23 11:01:08, Suren Baghdasaryan wrote:
> > On Wed, Jan 18, 2023 at 10:34 AM Paul E. McKenney  
> > wrote:
> [...]
> > > There are a couple of possibilities here.
> > >
> > > First, if I am remembering correctly, the time between the call_rcu()
> > > and invocation of the corresponding callback was taking multiple seconds,
> > > but that was because the kernel was built with CONFIG_LAZY_RCU=y in
> > > order to save power by batching RCU work over multiple call_rcu()
> > > invocations.  If this is causing a problem for a given call site, the
> > > shiny new call_rcu_hurry() can be used instead.  Doing this gets back
> > > to the old-school non-laziness, but can of course consume more power.
> > 
> > That would not be the case because CONFIG_LAZY_RCU was not an option
> > at the time I was profiling this issue.
> > Laxy RCU would be a great option to replace this patch but
> > unfortunately it's not the default behavior, so I would still have to
> > implement this batching in case lazy RCU is not enabled.
> > 
> > >
> > > Second, there is a much shorter one-jiffy delay between the call_rcu()
> > > and the invocation of the corresponding callback in kernels built with
> > > either CONFIG_NO_HZ_FULL=y (but only on CPUs mentioned in the nohz_full
> > > or rcu_nocbs kernel boot parameters) or CONFIG_RCU_NOCB_CPU=y (but only
> > > on CPUs mentioned in the rcu_nocbs kernel boot parameters).  The purpose
> > > of this delay is to avoid lock contention, and so this delay is incurred
> > > only on CPUs that are queuing callbacks at a rate exceeding 16K/second.
> > > This is reduced to a per-jiffy limit, so on a HZ=1000 system, a CPU
> > > invoking call_rcu() at least 16 times within a given jiffy will incur
> > > the added delay.  The reason for this delay is the use of a separate
> > > ->nocb_bypass list.  As Suren says, this bypass list is used to reduce
> > > lock contention on the main ->cblist.  This is not needed in old-school
> > > kernels built without either CONFIG_NO_HZ_FULL=y or CONFIG_RCU_NOCB_CPU=y
> > > (including most datacenter kernels) because in that case the callbacks
> > > enqueued by call_rcu() are touched only by the corresponding CPU, so
> > > that there is no need for locks.
> > 
> > I believe this is the reason in my profiled case.
> > 
> > >
> > > Third, if you are instead seeing multiple milliseconds of CPU consumed by
> > > call_rcu() in the common case (for example, without the aid of interrupts,
> > > NMIs, or SMIs), please do let me know.  That sounds to me like a bug.
> > 
> > I don't think I've seen such a case.
> > Thanks for clarifications, Paul!
> 
> Thanks for the explanation Paul. I have to say this has caught me as a
> surprise. There are just not enough details about the benchmark to
> understand what is going on but I find it rather surprising that
> call_rcu can induce a higher overhead than the actual kmem_cache_free
> which is the callback. My naive understanding has been that call_rcu is
> really fast way to defer the execution to the RCU safe context to do the
> final cleanup.

If I am following along correctly (ha!), then your "induce a higher
overhead" should be something like "induce a higher to-kfree() latency".

Of course, there already is a higher latency-to-kfree via call_rcu()
than via a direct call to kfree(), and callback-offload CPUs that are
being flooded with callbacks raise that latency a jiffy or so more in
order to avoid lock contention.

If this becomes a problem, the callback-offloading code can be a bit
smarter about avoiding lock contention, but need to see a real problem
before I make that change.  But if there is a real problem I will of
course fix it.

Or did I miss a turn in this discussion?

Thanx, Paul


Re: [PATCH] modpost: support arbitrary symbol length in modversion

2023-01-19 Thread Lucas De Marchi

On Wed, Jan 18, 2023 at 04:01:29PM +0900, Masahiro Yamada wrote:

On Wed, Jan 18, 2023 at 4:23 AM Lucas De Marchi
 wrote:


On Tue, Jan 17, 2023 at 06:51:44PM +0100, Michal Suchánek wrote:
>Hello,
>
>On Fri, Jan 13, 2023 at 06:18:41PM +, Gary Guo wrote:
>> On Thu, 12 Jan 2023 14:40:59 -0700
>> Lucas De Marchi  wrote:
>>
>> > On Wed, Jan 11, 2023 at 04:11:51PM +, Gary Guo wrote:
>> > >
>> > > struct modversion_info {
>> > >- unsigned long crc;
>> > >- char name[MODULE_NAME_LEN];
>> > >+ /* Offset of the next modversion entry in relation to this one. */
>> > >+ u32 next;
>> > >+ u32 crc;
>> > >+ char name[0];
>> >
>> > although not really exported as uapi, this will break userspace as this is
>> > used in the  elf file generated for the modules. I think
>> > this change must be made in a backward compatible way and kmod updated
>> > to deal with the variable name length:
>> >
>> > kmod $ git grep "\[64"
>> > libkmod/libkmod-elf.c:  char name[64 - sizeof(uint32_t)];
>> > libkmod/libkmod-elf.c:  char name[64 - sizeof(uint64_t)];
>> >
>> > in kmod we have both 32 and 64 because a 64-bit kmod can read both 32
>> > and 64 bit module, and vice versa.
>> >
>>
>> Hi Lucas,
>>
>> Thanks for the information.
>>
>> The change can't be "truly" backward compatible, in a sense that
>> regardless of the new format we choose, kmod would not be able to decode
>> symbols longer than "64 - sizeof(long)" bytes. So the list it retrieves
>> is going to be incomplete, isn't it?
>>
>> What kind of backward compatibility should be expected? It could be:
>> * short symbols can still be found by old versions of kmod, but not
>>   long symbols;
>
>That sounds good. Not everyone is using rust, and with this option
>people who do will need to upgrade tooling, and people who don't care
>don't need to do anything.

that could be it indeed. My main worry here is:

"After the support is added in kmod, kmod needs to be able to output the
correct information regardless if the module is from before/after the
change in the kernel and also without relying on kernel version."
Just changing the struct modversion_info doesn't make that possible.

Maybe adding the long symbols in another section? Or ble
just increase to 512 and add the size to a
"__versions_hdr" section. If we then output a max size per module,
this would offset a little bit the additional size gained for the
modules using rust. And the additional 0's should compress well
so I'm not sure the additional size is that much relevant here.






I also thought of new section(s) for long symbols.



One idea is to have separate sections for CRCs and symbol names.




section __version_crc:
  0x12345678
  0x23456789
  0x34567890


section __version_sym:
 "very_very_very_very_long_symbol"
 "another_very_very_very_very_very_long_symbol"
 "yet_another_very_very_very_very_very_long_symbol"




You can iterate in each section with this:

  crc += sizeof(u32);
  name += strlen(name) + 1;


Benefits:
 - No next pointer
 - No padding
   - *.mod.c is kept human readable.


I like this option. It would be better than the current one.

Lucas De Marchi







BTW, the following is impossible
because the pointer reference to .rodata
is not available at this point?

struct modversion_info {
u32 crc;
const char *name:
};



--
Best Regards
Masahiro Yamada


Re: [PATCH v7 3/8] powerpc/crash: update kimage_arch struct

2023-01-19 Thread Laurent Dufour
On 15/01/2023 16:02:01, Sourabh Jain wrote:
> Add a new member "fdt_index" to kimage_arch struct to hold the index of
> the FDT (Flattened Device Tree) segment in the kexec segment array.
> 
> Having direct access to FDT segment will help arch crash hotplug handler
> to avoid looping kexec segment array to identify the FDT segment index
> for every FDT update on hotplug events.
> 
> The fdt_index is initialized during the kexec load for both kexec_load and
> kexec_file_load system call.
> 
> Signed-off-by: Sourabh Jain 
> ---
>  arch/powerpc/include/asm/kexec.h  |  7 +++
>  arch/powerpc/kexec/core_64.c  | 27 +++
>  arch/powerpc/kexec/elf_64.c   |  6 ++
>  arch/powerpc/kexec/file_load_64.c |  5 +
>  4 files changed, 45 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/kexec.h 
> b/arch/powerpc/include/asm/kexec.h
> index 8090ad7d97d9d..5a322c1737661 100644
> --- a/arch/powerpc/include/asm/kexec.h
> +++ b/arch/powerpc/include/asm/kexec.h
> @@ -103,6 +103,10 @@ void kexec_copy_flush(struct kimage *image);
>  struct crash_mem;
>  int update_cpus_node(void *fdt);
>  int get_crash_memory_ranges(struct crash_mem **mem_ranges);
> +#if defined(CONFIG_CRASH_HOTPLUG)
> +int machine_kexec_post_load(struct kimage *image);
> +#define machine_kexec_post_load machine_kexec_post_load
> +#endif
>  #endif
>  
>  #if defined(CONFIG_CRASH_DUMP) && defined(CONFIG_PPC_RTAS)
> @@ -118,6 +122,9 @@ extern const struct kexec_file_ops kexec_elf64_ops;
>  struct kimage_arch {
>   struct crash_mem *exclude_ranges;
>  
> +#if defined(CONFIG_CRASH_HOTPLUG)
> + int fdt_index;
> +#endif
>   unsigned long backup_start;
>   void *backup_buf;
>   void *fdt;
> diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
> index 0b292f93a74cc..3d4fe1aa6f761 100644
> --- a/arch/powerpc/kexec/core_64.c
> +++ b/arch/powerpc/kexec/core_64.c
> @@ -77,6 +77,33 @@ int machine_kexec_prepare(struct kimage *image)
>   return 0;
>  }
>  
> +#if defined(CONFIG_CRASH_HOTPLUG)

I think you should add a small function header describing that this
function is recording the index of the FDT segment for later use.

> +int machine_kexec_post_load(struct kimage *kimage)
> +{
> + int i;
> + void *ptr;
> + unsigned long mem;
> +
> + /* Mark fdt_index invalid */
> + kimage->arch.fdt_index = -1;

Is that really needed?
This is already done in arch_kexec_kernel_image_probe() called before this
function, isn't it?

> +
> + if (kimage->type != KEXEC_TYPE_CRASH)
> + return 0;
> +
> + for (i = 0; i < kimage->nr_segments; i++) {
> + mem = kimage->segment[i].mem;
> + ptr = __va(mem);
> +
> + if (ptr && fdt_magic(ptr) == FDT_MAGIC) {
> + kimage->arch.fdt_index = i;
> + break;
> + }
> + }
> +
> + return 0;
> +}
> +#endif
> +
>  /* Called during kexec sequence with MMU off */
>  static notrace void copy_segments(unsigned long ind)
>  {
> diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
> index eeb258002d1e0..2a17f171661f1 100644
> --- a/arch/powerpc/kexec/elf_64.c
> +++ b/arch/powerpc/kexec/elf_64.c
> @@ -123,6 +123,12 @@ static void *elf64_load(struct kimage *image, char 
> *kernel_buf,
>   kbuf.buf_align = PAGE_SIZE;
>   kbuf.top_down = true;
>   kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
> +
> +#if defined(CONFIG_CRASH_HOTPLUG)
> + image->arch.fdt_index = image->nr_segments;

I'm sorry, I'm not familliar with that code, could you explain why
fdt_index has to be assigned here, and to that value?

> +#endif
> + kbuf.memsz = fdt_totalsize(fdt);
> +
>   ret = kexec_add_buffer();
>   if (ret)
>   goto out_free_fdt;
> diff --git a/arch/powerpc/kexec/file_load_64.c 
> b/arch/powerpc/kexec/file_load_64.c
> index 9bc70b4d8eafc..725f74d1b928c 100644
> --- a/arch/powerpc/kexec/file_load_64.c
> +++ b/arch/powerpc/kexec/file_load_64.c
> @@ -1153,6 +1153,11 @@ int arch_kexec_kernel_image_probe(struct kimage 
> *image, void *buf,
>   return ret;
>   }
>  
> +#if defined(CONFIG_CRASH_HOTPLUG)
> + /* Mark fdt_index invalid */
> + image->arch.fdt_index = -1;
> +#endif
> +
>   return kexec_image_probe_default(image, buf, buf_len);
>  }
>  



Re: [PATCH 17/41] mm/mmap: move VMA locking before anon_vma_lock_write call

2023-01-19 Thread Suren Baghdasaryan
On Thu, Jan 19, 2023 at 1:31 AM Michal Hocko  wrote:
>
> On Wed 18-01-23 13:48:13, Suren Baghdasaryan wrote:
> > On Wed, Jan 18, 2023 at 1:33 PM Michal Hocko  wrote:
> [...]
> > > So it will become:
> > > Move VMA flag modification (which now implies VMA locking) before
> > > vma_adjust_trans_huge() to ensure the modifications are done after VMA
> > > has been locked. Because vma_adjust_trans_huge() modifies the VMA and such
> > > modifications should be done under VMA write-lock protection.
> > >
> > > which is effectivelly saying
> > > vma_adjust_trans_huge() modifies the VMA and such modifications should
> > > be done under VMA write-lock protection so move VMA flag modifications
> > > before so all of them are covered by the same write protection.
> > >
> > > right?
> >
> > Yes, and the wording in the latter version is simpler to understand
> > IMO, so I would like to adopt it. Do you agree?
>
> of course.

Will update in the next respin. Thanks!

> --
> Michal Hocko
> SUSE Labs


Re: [PATCH 39/41] kernel/fork: throttle call_rcu() calls in vm_area_free

2023-01-19 Thread Suren Baghdasaryan
On Thu, Jan 19, 2023 at 4:59 AM Michal Hocko  wrote:
>
> On Mon 09-01-23 12:53:34, Suren Baghdasaryan wrote:
> > call_rcu() can take a long time when callback offloading is enabled.
> > Its use in the vm_area_free can cause regressions in the exit path when
> > multiple VMAs are being freed. To minimize that impact, place VMAs into
> > a list and free them in groups using one call_rcu() call per group.
>
> After some more clarification I can understand how call_rcu might not be
> super happy about thousands of callbacks to be invoked and I do agree
> that this is not really optimal.
>
> On the other hand I do not like this solution much either.
> VM_AREA_FREE_LIST_MAX is arbitrary and it won't really help all that
> much with processes with a huge number of vmas either. It would still be
> in housands of callbacks to be scheduled without a good reason.
>
> Instead, are there any other cases than remove_vma that need this
> batching? We could easily just link all the vmas into linked list and
> use a single call_rcu instead, no? This would both simplify the
> implementation, remove the scaling issue as well and we do not have to
> argue whether VM_AREA_FREE_LIST_MAX should be epsilon or epsilon + 1.

Yes, I agree the solution is not stellar. I wanted something simple
but this is probably too simple. OTOH keeping all dead vm_area_structs
on the list without hooking up a shrinker (additional complexity) does
not sound too appealing either. WDYT about time domain throttling to
limit draining the list to say once per second like this:

void vm_area_free(struct vm_area_struct *vma)
{
   struct mm_struct *mm = vma->vm_mm;
   bool drain;

   free_anon_vma_name(vma);

   spin_lock(>vma_free_list.lock);
   list_add(>vm_free_list, >vma_free_list.head);
   mm->vma_free_list.size++;
-   drain = mm->vma_free_list.size > VM_AREA_FREE_LIST_MAX;
+   drain = jiffies > mm->last_drain_tm + HZ;

   spin_unlock(>vma_free_list.lock);

-   if (drain)
+   if (drain) {
  drain_free_vmas(mm);
+ mm->last_drain_tm = jiffies;
+   }
}

Ultimately we want to prevent very frequent call_rcu() calls, so
throttling in the time domain seems appropriate. That's the simplest
way I can think of to address your concern about a quick spike in VMA
freeing. It does not place any restriction on the list size and we
might have excessive dead vm_area_structs if after a large spike there
are no vm_area_free() calls but I don't know if that's a real problem,
so not sure we should be addressing it at this time. WDYT?


>
> --
> Michal Hocko
> SUSE Labs


Re: [PATCH] powerpc: add crtsavres.o to always-y instead of extra-y

2023-01-19 Thread Nathan Chancellor
On Thu, Jan 19, 2023 at 05:24:47PM +0900, Masahiro Yamada wrote:
> crtsavres.o is linked to modules, but as commit d0e628cd817f ("kbuild:
> doc: clarify the difference between extra-y and always-y") explained,
> 'make modules' does not build extra-y.
> 
> The following command fails:
> 
>   $ make ARCH=powerpc LLVM=1 mrproper ps3_defconfig modules
> [snip]
> LD [M]  arch/powerpc/platforms/cell/spufs/spufs.ko
>   ld.lld: error: cannot open arch/powerpc/lib/crtsavres.o: No such file or 
> directory
>   make[1]: *** [scripts/Makefile.modfinal:61: 
> arch/powerpc/platforms/cell/spufs/spufs.ko] Error 1
>   make: *** [Makefile:1924: modules] Error 2
> 
> Signed-off-by: Masahiro Yamada 

Reviewed-by: Nathan Chancellor 

> ---
> 
>  arch/powerpc/lib/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
> index c53618c34b70..aa34854bc9f5 100644
> --- a/arch/powerpc/lib/Makefile
> +++ b/arch/powerpc/lib/Makefile
> @@ -43,7 +43,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION)  += 
> error-inject.o
>  # so it is only needed for modules, and only for older linkers which
>  # do not support --save-restore-funcs
>  ifndef CONFIG_LD_IS_BFD
> -extra-$(CONFIG_PPC64)+= crtsavres.o
> +always-$(CONFIG_PPC64)   += crtsavres.o
>  endif
>  
>  obj-$(CONFIG_PPC_BOOK3S_64) += copyuser_power7.o copypage_power7.o \
> -- 
> 2.34.1
> 


Re: [PATCH v7 4/8] crash: add phdr for possible CPUs in elfcorehdr

2023-01-19 Thread Laurent Dufour
On 15/01/2023 16:02:02, Sourabh Jain wrote:
> On architectures like PowerPC the crash notes are available for all
> possible CPUs. So let's populate the elfcorehdr for all possible
> CPUs having crash notes to avoid updating elfcorehdr during in-kernel
> crash update on CPU hotplug events.
> 
> The similar technique is used in kexec-tool for kexec_load case.
> 
> Signed-off-by: Sourabh Jain 
> ---
>  kernel/crash_core.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)

This patch is not applying on ppc/next (53ab112a9508).

As far as I could see, crash_prepare_elf64_headers() is defined in the file
kernel/kexec_file.c and that's not recent, see babac4a84a88 (kexec_file,
x86: move re-factored code to generic side, 2018-04-13)

Am I missing something?

> 
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 910d377ea317e..19f987b3851e8 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -364,8 +364,8 @@ int crash_prepare_elf64_headers(struct kimage *image, 
> struct crash_mem *mem,
>   ehdr->e_ehsize = sizeof(Elf64_Ehdr);
>   ehdr->e_phentsize = sizeof(Elf64_Phdr);
>  
> - /* Prepare one phdr of type PT_NOTE for each present CPU */
> - for_each_present_cpu(cpu) {
> + /* Prepare one phdr of type PT_NOTE for possible CPU with crash note. */
> + for_each_possible_cpu(cpu) {
>  #ifdef CONFIG_CRASH_HOTPLUG
>   if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
>   /* Skip the soon-to-be offlined cpu */
> @@ -373,8 +373,11 @@ int crash_prepare_elf64_headers(struct kimage *image, 
> struct crash_mem *mem,
>   continue;
>   }
>  #endif
> - phdr->p_type = PT_NOTE;
>   notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
> + if (!notes_addr)
> + continue;
> +
> + phdr->p_type = PT_NOTE;
>   phdr->p_offset = phdr->p_paddr = notes_addr;
>   phdr->p_filesz = phdr->p_memsz = sizeof(note_buf_t);
>   (ehdr->e_phnum)++;



[PATCH] KVM: PPC: Fix refactoring goof in kvmppc_e500mc_init()

2023-01-19 Thread Sean Christopherson
Fix a build error due to a mixup during a recent refactoring.  The error
was reported during code review, but the fixed up patch didn't make it
into the final commit.

Fixes: 474856bad921 ("KVM: PPC: Move processor compatibility check to module 
init")
Link: https://lore.kernel.org/all/87cz93snqc@mpe.ellerman.id.au
Cc: Michael Ellerman 
Signed-off-by: Sean Christopherson 
---
 arch/powerpc/kvm/booke.c  | 2 +-
 arch/powerpc/kvm/e500mc.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 7b4920e9fd26..a647bfc322b8 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1208,7 +1208,7 @@ int kvmppc_handle_exit(struct kvm_vcpu *vcpu, unsigned 
int exit_nr)
 
 /*
  * On cores with Vector category, KVM is loaded only if CONFIG_ALTIVEC,
- * see kvmppc_core_check_processor_compat().
+ * see kvmppc_e500mc_check_processor_compat().
  */
 #ifdef CONFIG_ALTIVEC
case BOOKE_INTERRUPT_ALTIVEC_UNAVAIL:
diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
index 611532a0dedc..a309138927ff 100644
--- a/arch/powerpc/kvm/e500mc.c
+++ b/arch/powerpc/kvm/e500mc.c
@@ -168,7 +168,7 @@ static void kvmppc_core_vcpu_put_e500mc(struct kvm_vcpu 
*vcpu)
kvmppc_booke_vcpu_put(vcpu);
 }
 
-int kvmppc_core_check_processor_compat(void)
+int kvmppc_e500mc_check_processor_compat(void)
 {
int r;
 
@@ -390,7 +390,7 @@ static int __init kvmppc_e500mc_init(void)
 
r = kvmppc_e500mc_check_processor_compat();
if (r)
-   return kvmppc_e500mc;
+   goto err_out;
 
r = kvmppc_booke_init();
if (r)

base-commit: 91dc252b0dbb6879e4067f614df1e397fec532a1
-- 
2.39.0.246.g2a6d74b583-goog



Re: [PATCH V3] tools/perf/tests: Fix string substitutions in build id test

2023-01-19 Thread Arnaldo Carvalho de Melo
Em Thu, Jan 19, 2023 at 03:49:15PM +, David Laight escreveu:
> From: Athira Rajeev
> > Sent: 19 January 2023 14:27
> ...
> > The test script "tests/shell/buildid.sh" uses some of the
> > string substitution ways which are supported in bash, but not in
> > "sh" or other shells. Above error on line number 69 that reports
> > "Bad substitution" is:
> 
> Looks better - assuming it works :-)

:-)

Can I take this as an:

Acked-by: David Laight 

?

- Arnaldo


Re: [PATCH v9 00/10] phy: Add support for Lynx 10G SerDes

2023-01-19 Thread Sean Anderson
On 1/18/23 11:54, Vinod Koul wrote:
> On 17-01-23, 11:46, Sean Anderson wrote:
>> 
>> I noticed that this series is marked "changes requested" on patchwork.
>> However, I have received only automated feedback. I have done my best
>> effort to address feedback I have received on prior revisions. I would
>> appreciate getting another round of review before resending this series.
> 
> Looking at the series, looks like kernel-bot sent some warnings on the
> series so I was expecting an updated series for review
> 

Generally, multiple reviewers will comment on a patch, even if another
reviewer finds something which needs to be changed. This is a one-line
fix, so I would appreciate getting more substantial feedback before
respinning. Every time I send a new series I have to rebase and test on
hardware. It's work that I would rather do when there is something to be
gained.

--Sean


Re: [PATCH] modpost: support arbitrary symbol length in modversion

2023-01-19 Thread Gary Guo
On Thu, 19 Jan 2023 16:18:57 +0100
Michal Suchánek  wrote:

> On Thu, Jan 19, 2023 at 03:09:36PM +, Gary Guo wrote:
> > On Tue, 17 Jan 2023 11:22:45 -0800
> > Lucas De Marchi  wrote:
> >   
> > > And the additional 0's should compress well
> > > so I'm not sure the additional size is that much relevant here.  
> > 
> > I am not sure why compression is mentioned here. I don't think section
> > in .ko files are compressed.  
> 
> There is the option to compress the whole .ko files, and it's commonly
> used.

Hi Michal,

I am aware that there is an option but I am surprised to hear that it's
commonly used. I don't think that's enabled by default, and certainly
Debian/Ubuntu does not have it enabled.

Best,
Gary


Re: [PATCH v2 0/9] jevents/pmu-events improvements

2023-01-19 Thread Ian Rogers
On Wed, Dec 21, 2022 at 2:34 PM Ian Rogers  wrote:
>
> Add an optimization to jevents using the metric code, rewrite metrics
> in terms of each other in order to minimize size and improve
> readability. For example, on Power8
> other_stall_cpi is rewritten from:
> "PM_CMPLU_STALL / PM_RUN_INST_CMPL - PM_CMPLU_STALL_BRU_CRU / 
> PM_RUN_INST_CMPL - PM_CMPLU_STALL_FXU / PM_RUN_INST_CMPL - PM_CMPLU_STALL_VSU 
> / PM_RUN_INST_CMPL - PM_CMPLU_STALL_LSU / PM_RUN_INST_CMPL - 
> PM_CMPLU_STALL_NTCG_FLUSH / PM_RUN_INST_CMPL - PM_CMPLU_STALL_NO_NTF / 
> PM_RUN_INST_CMPL"
> to:
> "stall_cpi - bru_cru_stall_cpi - fxu_stall_cpi - vsu_stall_cpi - 
> lsu_stall_cpi - ntcg_flush_cpi - no_ntf_stall_cpi"
> Which more closely matches the definition on Power9.
>
> A limitation of the substitutions are that they depend on strict
> equality and the shape of the tree. This means that for "a + b + c"
> then a substitution of "a + b" will succeed while "b + c" will fail
> (the LHS for "+ c" is "a + b" not just "b").
>
> Separate out the events and metrics in the pmu-events tables saving
> 14.8% in the table size while making it that metrics no longer need to
> iterate over all events and vice versa. These changes remove evsel's
> direct metric support as the pmu_event no longer has a metric to
> populate it. This is a minor issue as the code wasn't working
> properly, metrics for this are rare and can still be properly ran
> using '-M'.
>
> Add an ability to just build certain models into the jevents generated
> pmu-metrics.c code. This functionality is appropriate for operating
> systems like ChromeOS, that aim to minimize binary size and know all
> the target CPU models.
>
> v2. Rebase. Modify the code that skips rewriting a metric with the
> same name with itself, to make the name check case insensitive.
>
> Ian Rogers (9):
>   perf jevents metric: Correct Function equality
>   perf jevents metric: Add ability to rewrite metrics in terms of others
>   perf jevents: Rewrite metrics in the same file with each other
>   perf pmu-events: Separate metric out of pmu_event
>   perf stat: Remove evsel metric_name/expr
>   perf jevents: Combine table prefix and suffix writing
>   perf pmu-events: Introduce pmu_metrics_table
>   perf jevents: Generate metrics and events as separate tables
>   perf jevents: Add model list option

Ping. Looking for reviews.

Thanks,
Ian

>  tools/perf/arch/arm64/util/pmu.c |  23 +-
>  tools/perf/arch/powerpc/util/header.c|   4 +-
>  tools/perf/builtin-list.c|  20 +-
>  tools/perf/builtin-stat.c|   1 -
>  tools/perf/pmu-events/Build  |   3 +-
>  tools/perf/pmu-events/empty-pmu-events.c | 111 ++-
>  tools/perf/pmu-events/jevents.py | 353 ++-
>  tools/perf/pmu-events/metric.py  |  79 -
>  tools/perf/pmu-events/metric_test.py |  10 +
>  tools/perf/pmu-events/pmu-events.h   |  26 +-
>  tools/perf/tests/expand-cgroup.c |   4 +-
>  tools/perf/tests/parse-metric.c  |   4 +-
>  tools/perf/tests/pmu-events.c|  68 ++---
>  tools/perf/util/cgroup.c |   1 -
>  tools/perf/util/evsel.c  |   2 -
>  tools/perf/util/evsel.h  |   2 -
>  tools/perf/util/metricgroup.c| 203 +++--
>  tools/perf/util/metricgroup.h|   4 +-
>  tools/perf/util/parse-events.c   |   2 -
>  tools/perf/util/pmu.c|  44 +--
>  tools/perf/util/pmu.h|  10 +-
>  tools/perf/util/print-events.c   |  32 +-
>  tools/perf/util/print-events.h   |   3 +-
>  tools/perf/util/python.c |   7 -
>  tools/perf/util/stat-shadow.c| 112 ---
>  tools/perf/util/stat.h   |   1 -
>  26 files changed, 666 insertions(+), 463 deletions(-)
>
> --
> 2.39.0.314.g84b9a713c41-goog
>


RE: [PATCH V3] tools/perf/tests: Fix string substitutions in build id test

2023-01-19 Thread David Laight
From: Athira Rajeev
> Sent: 19 January 2023 14:27
...
> The test script "tests/shell/buildid.sh" uses some of the
> string substitution ways which are supported in bash, but not in
> "sh" or other shells. Above error on line number 69 that reports
> "Bad substitution" is:

Looks better - assuming it works :-)

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: sched/debug: CPU hotplug operation suffers in a large cpu systems

2023-01-19 Thread Phil Auld
Hi Greg, et alia,

On Tue, Dec 13, 2022 at 03:31:06PM +0100 Greg Kroah-Hartman wrote:
> On Tue, Dec 13, 2022 at 08:22:58AM -0500, Phil Auld wrote:

> > > 
> > > The idea seems good, the implementation might need a bit of work :)
> > 
> > More than the one comment below? Let me know.
> 
> No idea, resubmit a working patch and I'll review it properly :)
> 

I finally got this posted, twice :(. Sorry for the delay, I ran into
what turned out to be an unrelated issue while testing it, plus end of
the year holidays and what not. 

https://lore.kernel.org/lkml/20230119150758.880189-1-pa...@redhat.com/T/#u


Cheers,
Phil

-- 



Re: [PATCH v2] of: Fix of platform build on powerpc due to bad of disaply code

2023-01-19 Thread Thomas Zimmermann

Hi

Am 19.01.23 um 14:23 schrieb Michal Suchánek:

On Thu, Jan 19, 2023 at 02:11:13PM +0100, Thomas Zimmermann wrote:

Hi

Am 19.01.23 um 11:24 schrieb Christophe Leroy:



Le 19/01/2023 à 10:53, Michal Suchanek a écrit :

The commit 2d681d6a23a1 ("of: Make of framebuffer devices unique")
breaks build because of wrong argument to snprintf. That certainly
avoids the runtime error but is not the intended outcome.

Also use standard device name format of-display.N for all created
devices.

Fixes: 2d681d6a23a1 ("of: Make of framebuffer devices unique")
Signed-off-by: Michal Suchanek 
---
v2: Update the device name format
---
drivers/of/platform.c | 12 
1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index f2a5d679a324..8c1b1de22036 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -525,7 +525,9 @@ static int __init of_platform_default_populate_init(void)
if (IS_ENABLED(CONFIG_PPC)) {
struct device_node *boot_display = NULL;
struct platform_device *dev;
-   int display_number = 1;
+   int display_number = 0;
+   char buf[14];


Can you declare that in the for block where it is used instead ?


+   char *of_display_format = "of-display.%d";


Should be const ?


That should be static const of_display_format[] = then


Why? It sounds completely fine to have a const pointer to a string
constatnt.


Generally speaking:

'static' because your const pointer is then not a local variable, so it 
takes pressure off the stack. For global variables, you don't want them 
to show up in any linker symbol tables.


The string "of-display.%d" is stored as an array in the ELF data 
section. And your char pointer is a reference to that array. For static 
pointers, these indirections take CPU cycles to update when the loader 
has to relocate sections. If you declare of_display_format[] directly as 
array, you avoid the reference and work directly with the array.


Of course, this is a kernel module and the string is self-contained 
within the function. So the compiler can probably detect that and 
optimize the code to be like the 'static const []' version. It's still 
good to follow best practices, as someone might copy from this function.


Best regards
Thomas



Thanks

Michal






int ret;
/* Check if we have a MacOS display without a node spec */
@@ -556,7 +558,10 @@ static int __init of_platform_default_populate_init(void)
if (!of_get_property(node, "linux,opened", NULL) ||
!of_get_property(node, "linux,boot-display", NULL))
continue;
-   dev = of_platform_device_create(node, "of-display", 
NULL);
+   ret = snprintf(buf, sizeof(buf), of_display_format, 
display_number++);
+   if (ret >= sizeof(buf))
+   continue;



Can you make buf big enough to avoid that ?

And by the way could it be called something else than 'buf' ?

See exemple here :
https://elixir.bootlin.com/linux/v6.1/source/drivers/fsi/fsi-occ.c#L690



+   dev = of_platform_device_create(node, buf, NULL);
if (WARN_ON(!dev))
return -ENOMEM;
boot_display = node;
@@ -564,10 +569,9 @@ static int __init of_platform_default_populate_init(void)
}
for_each_node_by_type(node, "display") {
-   char *buf[14];
if (!of_get_property(node, "linux,opened", NULL) || 
node == boot_display)
continue;
-   ret = snprintf(buf, "of-display-%d", display_number++);
+   ret = snprintf(buf, sizeof(buf), of_display_format, 
display_number++);
if (ret >= sizeof(buf))
continue;
of_platform_device_create(node, buf, NULL);


--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev






--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH] modpost: support arbitrary symbol length in modversion

2023-01-19 Thread Michal Suchánek
On Thu, Jan 19, 2023 at 03:09:36PM +, Gary Guo wrote:
> On Tue, 17 Jan 2023 11:22:45 -0800
> Lucas De Marchi  wrote:
> 
> > On Tue, Jan 17, 2023 at 06:51:44PM +0100, Michal Suchánek wrote:
> > >Hello,
> > >
> > >On Fri, Jan 13, 2023 at 06:18:41PM +, Gary Guo wrote:  
> > >> On Thu, 12 Jan 2023 14:40:59 -0700
> > >> Lucas De Marchi  wrote:
> > >>  
> > >> > On Wed, Jan 11, 2023 at 04:11:51PM +, Gary Guo wrote:  
> > >> > >
> > >> > > struct modversion_info {
> > >> > >- unsigned long crc;
> > >> > >- char name[MODULE_NAME_LEN];
> > >> > >+ /* Offset of the next modversion entry in relation to this one. 
> > >> > >*/
> > >> > >+ u32 next;
> > >> > >+ u32 crc;
> > >> > >+ char name[0];  
> > >> >
> > >> > although not really exported as uapi, this will break userspace as 
> > >> > this is
> > >> > used in the  elf file generated for the modules. I think
> > >> > this change must be made in a backward compatible way and kmod updated
> > >> > to deal with the variable name length:
> > >> >
> > >> > kmod $ git grep "\[64"
> > >> > libkmod/libkmod-elf.c:  char name[64 - sizeof(uint32_t)];
> > >> > libkmod/libkmod-elf.c:  char name[64 - sizeof(uint64_t)];
> > >> >
> > >> > in kmod we have both 32 and 64 because a 64-bit kmod can read both 32
> > >> > and 64 bit module, and vice versa.
> > >> >  
> > >>
> > >> Hi Lucas,
> > >>
> > >> Thanks for the information.
> > >>
> > >> The change can't be "truly" backward compatible, in a sense that
> > >> regardless of the new format we choose, kmod would not be able to decode
> > >> symbols longer than "64 - sizeof(long)" bytes. So the list it retrieves
> > >> is going to be incomplete, isn't it?
> > >>
> > >> What kind of backward compatibility should be expected? It could be:
> > >> * short symbols can still be found by old versions of kmod, but not
> > >>   long symbols;  
> > >
> > >That sounds good. Not everyone is using rust, and with this option
> > >people who do will need to upgrade tooling, and people who don't care
> > >don't need to do anything.  
> > 
> > that could be it indeed. My main worry here is:
> > 
> > "After the support is added in kmod, kmod needs to be able to output the
> > correct information regardless if the module is from before/after the
> > change in the kernel and also without relying on kernel version."
> > Just changing the struct modversion_info doesn't make that possible.
> > 
> > Maybe adding the long symbols in another section?
> 
> Yeah, that's what I imagined how it could be implemented when I said
> "short symbols can still be found by old versions of kmod, but not long
> symbols".
> 
> > Or ble just increase to 512 and add the size to a
> > "__versions_hdr" section. If we then output a max size per module,
> > this would offset a little bit the additional size gained for the
> > modules using rust.
> 
> That format isn't really elegant IMO. And symbol length can vary a lot,
> having all symbols dictated by the longest symbol doesn't sound a good
> approach.
> 
> > And the additional 0's should compress well
> > so I'm not sure the additional size is that much relevant here.
> 
> I am not sure why compression is mentioned here. I don't think section
> in .ko files are compressed.

There is the option to compress the whole .ko files, and it's commonly
used.

Thanks

Michal


Re: [PATCH v2] of: Fix of platform build on powerpc due to bad of disaply code

2023-01-19 Thread Rob Herring
On Thu, Jan 19, 2023 at 3:53 AM Michal Suchanek  wrote:
>
> The commit 2d681d6a23a1 ("of: Make of framebuffer devices unique")
> breaks build because of wrong argument to snprintf. That certainly
> avoids the runtime error but is not the intended outcome.
>
> Also use standard device name format of-display.N for all created
> devices.
>
> Fixes: 2d681d6a23a1 ("of: Make of framebuffer devices unique")
> Signed-off-by: Michal Suchanek 
> ---
> v2: Update the device name format
> ---
>  drivers/of/platform.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)

As this is the only fix I have queued, if you respin, send the
original fix with fixes squashed.

Rob


Re: [PATCH] modpost: support arbitrary symbol length in modversion

2023-01-19 Thread Gary Guo
On Tue, 17 Jan 2023 11:22:45 -0800
Lucas De Marchi  wrote:

> On Tue, Jan 17, 2023 at 06:51:44PM +0100, Michal Suchánek wrote:
> >Hello,
> >
> >On Fri, Jan 13, 2023 at 06:18:41PM +, Gary Guo wrote:  
> >> On Thu, 12 Jan 2023 14:40:59 -0700
> >> Lucas De Marchi  wrote:
> >>  
> >> > On Wed, Jan 11, 2023 at 04:11:51PM +, Gary Guo wrote:  
> >> > >
> >> > > struct modversion_info {
> >> > >-   unsigned long crc;
> >> > >-   char name[MODULE_NAME_LEN];
> >> > >+   /* Offset of the next modversion entry in relation to this one. 
> >> > >*/
> >> > >+   u32 next;
> >> > >+   u32 crc;
> >> > >+   char name[0];  
> >> >
> >> > although not really exported as uapi, this will break userspace as this 
> >> > is
> >> > used in the  elf file generated for the modules. I think
> >> > this change must be made in a backward compatible way and kmod updated
> >> > to deal with the variable name length:
> >> >
> >> > kmod $ git grep "\[64"
> >> > libkmod/libkmod-elf.c:  char name[64 - sizeof(uint32_t)];
> >> > libkmod/libkmod-elf.c:  char name[64 - sizeof(uint64_t)];
> >> >
> >> > in kmod we have both 32 and 64 because a 64-bit kmod can read both 32
> >> > and 64 bit module, and vice versa.
> >> >  
> >>
> >> Hi Lucas,
> >>
> >> Thanks for the information.
> >>
> >> The change can't be "truly" backward compatible, in a sense that
> >> regardless of the new format we choose, kmod would not be able to decode
> >> symbols longer than "64 - sizeof(long)" bytes. So the list it retrieves
> >> is going to be incomplete, isn't it?
> >>
> >> What kind of backward compatibility should be expected? It could be:
> >> * short symbols can still be found by old versions of kmod, but not
> >>   long symbols;  
> >
> >That sounds good. Not everyone is using rust, and with this option
> >people who do will need to upgrade tooling, and people who don't care
> >don't need to do anything.  
> 
> that could be it indeed. My main worry here is:
> 
> "After the support is added in kmod, kmod needs to be able to output the
> correct information regardless if the module is from before/after the
> change in the kernel and also without relying on kernel version."
> Just changing the struct modversion_info doesn't make that possible.
> 
> Maybe adding the long symbols in another section?

Yeah, that's what I imagined how it could be implemented when I said
"short symbols can still be found by old versions of kmod, but not long
symbols".

> Or ble just increase to 512 and add the size to a
> "__versions_hdr" section. If we then output a max size per module,
> this would offset a little bit the additional size gained for the
> modules using rust.

That format isn't really elegant IMO. And symbol length can vary a lot,
having all symbols dictated by the longest symbol doesn't sound a good
approach.

> And the additional 0's should compress well
> so I'm not sure the additional size is that much relevant here.

I am not sure why compression is mentioned here. I don't think section
in .ko files are compressed.

(sorry forget to reply-all, re-send email to the list)

Best,
Gary


[PATCH v4 7/7] sched, smp: Trace smp callback causing an IPI

2023-01-19 Thread Valentin Schneider
Context
===

The newly-introduced ipi_send_cpumask tracepoint has a "callback" parameter
which so far has only been fed with NULL.

While CSD_TYPE_SYNC/ASYNC and CSD_TYPE_IRQ_WORK share a similar backing
struct layout (meaning their callback func can be accessed without caring
about the actual CSD type), CSD_TYPE_TTWU doesn't even have a function
attached to its struct. This means we need to check the type of a CSD
before eventually dereferencing its associated callback.

This isn't as trivial as it sounds: the CSD type is stored in
__call_single_node.u_flags, which get cleared right before the callback is
executed via csd_unlock(). This implies checking the CSD type before it is
enqueued on the call_single_queue, as the target CPU's queue can be flushed
before we get to sending an IPI.

Furthermore, send_call_function_single_ipi() only has a CPU parameter, and
would need to have an additional argument to trickle down the invoked
function. This is somewhat silly, as the extra argument will always be
pushed down to the function even when nothing is being traced, which is
unnecessary overhead.

Changes
===

send_call_function_single_ipi() is only used by smp.c, and is defined in
sched/core.c as it contains scheduler-specific ops (set_nr_if_polling() of
a CPU's idle task).

Split it into two parts: the scheduler bits remain in sched/core.c, and the
actual IPI emission is moved into smp.c. This lets us define an
__always_inline helper function that can take the related callback as
parameter without creating useless register pressure in the non-traced path
which only gains a (disabled) static branch.

Do the same thing for the multi IPI case.

Signed-off-by: Valentin Schneider 
---
 kernel/sched/core.c | 18 +++-
 kernel/sched/smp.h  |  2 +-
 kernel/smp.c| 72 +
 3 files changed, 66 insertions(+), 26 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c0d09eb249603..9733c3ecdbf16 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3816,16 +3816,20 @@ void sched_ttwu_pending(void *arg)
rq_unlock_irqrestore(rq, );
 }
 
-void send_call_function_single_ipi(int cpu)
+/*
+ * Prepare the scene for sending an IPI for a remote smp_call
+ *
+ * Returns true if the caller can proceed with sending the IPI.
+ * Returns false otherwise.
+ */
+bool call_function_single_prep_ipi(int cpu)
 {
-   struct rq *rq = cpu_rq(cpu);
-
-   if (!set_nr_if_polling(rq->idle)) {
-   trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, NULL);
-   arch_send_call_function_single_ipi(cpu);
-   } else {
+   if (set_nr_if_polling(cpu_rq(cpu)->idle)) {
trace_sched_wake_idle_without_ipi(cpu);
+   return false;
}
+
+   return true;
 }
 
 /*
diff --git a/kernel/sched/smp.h b/kernel/sched/smp.h
index 2eb23dd0f2856..21ac44428bb02 100644
--- a/kernel/sched/smp.h
+++ b/kernel/sched/smp.h
@@ -6,7 +6,7 @@
 
 extern void sched_ttwu_pending(void *arg);
 
-extern void send_call_function_single_ipi(int cpu);
+extern bool call_function_single_prep_ipi(int cpu);
 
 #ifdef CONFIG_SMP
 extern void flush_smp_call_function_queue(void);
diff --git a/kernel/smp.c b/kernel/smp.c
index 821b5986721ac..5cd680a7e78ef 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -161,9 +161,18 @@ void __init call_function_init(void)
 }
 
 static __always_inline void
-send_call_function_ipi_mask(const struct cpumask *mask)
+send_call_function_single_ipi(int cpu, smp_call_func_t func)
 {
-   trace_ipi_send_cpumask(mask, _RET_IP_, NULL);
+   if (call_function_single_prep_ipi(cpu)) {
+   trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, func);
+   arch_send_call_function_single_ipi(cpu);
+   }
+}
+
+static __always_inline void
+send_call_function_ipi_mask(const struct cpumask *mask, smp_call_func_t func)
+{
+   trace_ipi_send_cpumask(mask, _RET_IP_, func);
arch_send_call_function_ipi_mask(mask);
 }
 
@@ -430,12 +439,16 @@ static void __smp_call_single_queue_debug(int cpu, struct 
llist_node *node)
struct cfd_seq_local *seq = this_cpu_ptr(_seq_local);
struct call_function_data *cfd = this_cpu_ptr(_data);
struct cfd_percpu *pcpu = per_cpu_ptr(cfd->pcpu, cpu);
+   struct __call_single_data *csd;
+
+   csd = container_of(node, call_single_data_t, node.llist);
+   WARN_ON_ONCE(!(CSD_TYPE(csd) & (CSD_TYPE_SYNC | CSD_TYPE_ASYNC)));
 
cfd_seq_store(pcpu->seq_queue, this_cpu, cpu, CFD_SEQ_QUEUE);
if (llist_add(node, _cpu(call_single_queue, cpu))) {
cfd_seq_store(pcpu->seq_ipi, this_cpu, cpu, CFD_SEQ_IPI);
cfd_seq_store(seq->ping, this_cpu, cpu, CFD_SEQ_PING);
-   send_call_function_single_ipi(cpu);
+   send_call_function_single_ipi(cpu, csd->func);
cfd_seq_store(seq->pinged, this_cpu, cpu, CFD_SEQ_PINGED);
} else {

[PATCH v4 6/7] smp: reword smp call IPI comment

2023-01-19 Thread Valentin Schneider
Accessing the call_single_queue hasn't involved a spinlock since 2014:

  6897fc22ea01 ("kernel: use lockless list for smp_call_function_single")

The llist operations (namely cmpxchg() and xchg()) provide similar ordering
guarantees, update the comment to lessen confusion.

Signed-off-by: Valentin Schneider 
---
 kernel/smp.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 93b4386cd3096..821b5986721ac 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -495,9 +495,10 @@ void __smp_call_single_queue(int cpu, struct llist_node 
*node)
 #endif
 
/*
-* The list addition should be visible before sending the IPI
-* handler locks the list to pull the entry off it because of
-* normal cache coherency rules implied by spinlocks.
+* The list addition should be visible to the target CPU when it pops
+* the head of the list to pull the entry off it in the IPI handler
+* because of normal cache coherency rules implied by the underlying
+* llist ops.
 *
 * If IPIs can go out of order to the cache coherency protocol
 * in an architecture, sufficient synchronisation should be added
-- 
2.31.1



[PATCH v4 5/7] treewide: Trace IPIs sent via smp_send_reschedule()

2023-01-19 Thread Valentin Schneider
To be able to trace invocations of smp_send_reschedule(), rename the
arch-specific definitions of it to arch_smp_send_reschedule() and wrap it
into an smp_send_reschedule() that contains a tracepoint.

Changes to include the declaration of the tracepoint were driven by the
following coccinelle script:

  @func_use@
  @@
  smp_send_reschedule(...);

  @include@
  @@
  #include 

  @no_include depends on func_use && !include@
  @@
#include <...>
  +
  + #include 

Signed-off-by: Valentin Schneider 
[csky bits]
Acked-by: Guo Ren 
[riscv bits]
Acked-by: Palmer Dabbelt 
---
 arch/alpha/kernel/smp.c  |  2 +-
 arch/arc/kernel/smp.c|  2 +-
 arch/arm/kernel/smp.c|  2 +-
 arch/arm/mach-actions/platsmp.c  |  2 ++
 arch/arm64/kernel/smp.c  |  2 +-
 arch/csky/kernel/smp.c   |  2 +-
 arch/hexagon/kernel/smp.c|  2 +-
 arch/ia64/kernel/smp.c   |  4 ++--
 arch/loongarch/kernel/smp.c  |  4 ++--
 arch/mips/include/asm/smp.h  |  2 +-
 arch/mips/kernel/rtlx-cmp.c  |  2 ++
 arch/openrisc/kernel/smp.c   |  2 +-
 arch/parisc/kernel/smp.c |  4 ++--
 arch/powerpc/kernel/smp.c|  6 --
 arch/powerpc/kvm/book3s_hv.c |  3 +++
 arch/powerpc/platforms/powernv/subcore.c |  2 ++
 arch/riscv/kernel/smp.c  |  4 ++--
 arch/s390/kernel/smp.c   |  2 +-
 arch/sh/kernel/smp.c |  2 +-
 arch/sparc/kernel/smp_32.c   |  2 +-
 arch/sparc/kernel/smp_64.c   |  2 +-
 arch/x86/include/asm/smp.h   |  2 +-
 arch/x86/kvm/svm/svm.c   |  4 
 arch/x86/kvm/x86.c   |  2 ++
 arch/xtensa/kernel/smp.c |  2 +-
 include/linux/smp.h  | 11 +--
 virt/kvm/kvm_main.c  |  2 ++
 27 files changed, 52 insertions(+), 26 deletions(-)

diff --git a/arch/alpha/kernel/smp.c b/arch/alpha/kernel/smp.c
index f4e20f75438f8..38637eb9eebd5 100644
--- a/arch/alpha/kernel/smp.c
+++ b/arch/alpha/kernel/smp.c
@@ -562,7 +562,7 @@ handle_ipi(struct pt_regs *regs)
 }
 
 void
-smp_send_reschedule(int cpu)
+arch_smp_send_reschedule(int cpu)
 {
 #ifdef DEBUG_IPI_MSG
if (cpu == hard_smp_processor_id())
diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
index ad93fe6e4b77d..409cfa4675b40 100644
--- a/arch/arc/kernel/smp.c
+++ b/arch/arc/kernel/smp.c
@@ -292,7 +292,7 @@ static void ipi_send_msg(const struct cpumask *callmap, 
enum ipi_msg_type msg)
ipi_send_msg_one(cpu, msg);
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
ipi_send_msg_one(cpu, IPI_RESCHEDULE);
 }
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 45b8ca2ce521f..dea24a6e0ed6f 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -744,7 +744,7 @@ void __init set_smp_ipi_range(int ipi_base, int n)
ipi_setup(smp_processor_id());
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE);
 }
diff --git a/arch/arm/mach-actions/platsmp.c b/arch/arm/mach-actions/platsmp.c
index f26618b435145..7b208e96fbb67 100644
--- a/arch/arm/mach-actions/platsmp.c
+++ b/arch/arm/mach-actions/platsmp.c
@@ -20,6 +20,8 @@
 #include 
 #include 
 
+#include 
+
 #define OWL_CPU1_ADDR  0x50
 #define OWL_CPU1_FLAG  0x5c
 
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 937d2623e06ba..8d108edc4a89f 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -976,7 +976,7 @@ void __init set_smp_ipi_range(int ipi_base, int n)
ipi_setup(smp_processor_id());
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE);
 }
diff --git a/arch/csky/kernel/smp.c b/arch/csky/kernel/smp.c
index 4b605aa2e1d65..fd7f81be16dd6 100644
--- a/arch/csky/kernel/smp.c
+++ b/arch/csky/kernel/smp.c
@@ -140,7 +140,7 @@ void smp_send_stop(void)
on_each_cpu(ipi_stop, NULL, 1);
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
send_ipi_message(cpumask_of(cpu), IPI_RESCHEDULE);
 }
diff --git a/arch/hexagon/kernel/smp.c b/arch/hexagon/kernel/smp.c
index 4ba93e59370c4..4e8bee25b8c68 100644
--- a/arch/hexagon/kernel/smp.c
+++ b/arch/hexagon/kernel/smp.c
@@ -217,7 +217,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
}
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
send_ipi(cpumask_of(cpu), IPI_RESCHEDULE);
 }
diff --git a/arch/ia64/kernel/smp.c b/arch/ia64/kernel/smp.c
index e2cc59db86bc2..ea4f009a232b4 100644
--- a/arch/ia64/kernel/smp.c
+++ b/arch/ia64/kernel/smp.c
@@ -220,11 +220,11 @@ kdump_smp_send_init(void)
  * Called with preemption disabled.
  */
 void

[PATCH v4 4/7] irq_work: Trace self-IPIs sent via arch_irq_work_raise()

2023-01-19 Thread Valentin Schneider
IPIs sent to remote CPUs via irq_work_queue_on() are now covered by
trace_ipi_send_cpumask(), add another instance of the tracepoint to cover
self-IPIs.

Signed-off-by: Valentin Schneider 
Reviewed-by: Steven Rostedt (Google) 
---
 kernel/irq_work.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 7afa40fe5cc43..c33e88e32a67a 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -22,6 +22,8 @@
 #include 
 #include 
 
+#include 
+
 static DEFINE_PER_CPU(struct llist_head, raised_list);
 static DEFINE_PER_CPU(struct llist_head, lazy_list);
 static DEFINE_PER_CPU(struct task_struct *, irq_workd);
@@ -74,6 +76,16 @@ void __weak arch_irq_work_raise(void)
 */
 }
 
+static __always_inline void irq_work_raise(struct irq_work *work)
+{
+   if (trace_ipi_send_cpumask_enabled() && arch_irq_work_has_interrupt())
+   trace_ipi_send_cpumask(cpumask_of(smp_processor_id()),
+  _RET_IP_,
+  work->func);
+
+   arch_irq_work_raise();
+}
+
 /* Enqueue on current CPU, work must already be claimed and preempt disabled */
 static void __irq_work_queue_local(struct irq_work *work)
 {
@@ -99,7 +111,7 @@ static void __irq_work_queue_local(struct irq_work *work)
 
/* If the work is "lazy", handle it from next tick if any */
if (!lazy_work || tick_nohz_tick_stopped())
-   arch_irq_work_raise();
+   irq_work_raise(work);
 }
 
 /* Enqueue the irq work @work on the current CPU */
-- 
2.31.1



[PATCH v4 3/7] smp: Trace IPIs sent via arch_send_call_function_ipi_mask()

2023-01-19 Thread Valentin Schneider
This simply wraps around the arch function and prepends it with a
tracepoint, similar to send_call_function_single_ipi().

Signed-off-by: Valentin Schneider 
Reviewed-by: Steven Rostedt (Google) 
---
 kernel/smp.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index e2ca1e2f31274..93b4386cd3096 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -160,6 +160,13 @@ void __init call_function_init(void)
smpcfd_prepare_cpu(smp_processor_id());
 }
 
+static __always_inline void
+send_call_function_ipi_mask(const struct cpumask *mask)
+{
+   trace_ipi_send_cpumask(mask, _RET_IP_, NULL);
+   arch_send_call_function_ipi_mask(mask);
+}
+
 #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
 
 static DEFINE_STATIC_KEY_FALSE(csdlock_debug_enabled);
@@ -970,7 +977,7 @@ static void smp_call_function_many_cond(const struct 
cpumask *mask,
if (nr_cpus == 1)
send_call_function_single_ipi(last_cpu);
else if (likely(nr_cpus > 1))
-   arch_send_call_function_ipi_mask(cfd->cpumask_ipi);
+   send_call_function_ipi_mask(cfd->cpumask_ipi);
 
cfd_seq_store(this_cpu_ptr(_seq_local)->pinged, this_cpu, 
CFD_SEQ_NOCPU, CFD_SEQ_PINGED);
}
-- 
2.31.1



[PATCH v4 1/7] trace: Add trace_ipi_send_cpumask()

2023-01-19 Thread Valentin Schneider
trace_ipi_raise() is unsuitable for generically tracing IPI sources due to
its "reason" argument being an uninformative string (on arm64 all you get
is "Function call interrupts" for SMP calls).

Add a variant of it that exports a target cpumask, a callsite and a callback.

Signed-off-by: Valentin Schneider 
Reviewed-by: Steven Rostedt (Google) 
---
 include/trace/events/ipi.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/include/trace/events/ipi.h b/include/trace/events/ipi.h
index 0be71dad6ec03..b1125dc27682c 100644
--- a/include/trace/events/ipi.h
+++ b/include/trace/events/ipi.h
@@ -35,6 +35,28 @@ TRACE_EVENT(ipi_raise,
TP_printk("target_mask=%s (%s)", __get_bitmask(target_cpus), 
__entry->reason)
 );
 
+TRACE_EVENT(ipi_send_cpumask,
+
+   TP_PROTO(const struct cpumask *cpumask, unsigned long callsite, void 
*callback),
+
+   TP_ARGS(cpumask, callsite, callback),
+
+   TP_STRUCT__entry(
+   __cpumask(cpumask)
+   __field(void *, callsite)
+   __field(void *, callback)
+   ),
+
+   TP_fast_assign(
+   __assign_cpumask(cpumask, cpumask_bits(cpumask));
+   __entry->callsite = (void *)callsite;
+   __entry->callback = callback;
+   ),
+
+   TP_printk("cpumask=%s callsite=%pS callback=%pS",
+ __get_cpumask(cpumask), __entry->callsite, __entry->callback)
+);
+
 DECLARE_EVENT_CLASS(ipi_handler,
 
TP_PROTO(const char *reason),
-- 
2.31.1



[PATCH v4 2/7] sched, smp: Trace IPIs sent via send_call_function_single_ipi()

2023-01-19 Thread Valentin Schneider
send_call_function_single_ipi() is the thing that sends IPIs at the bottom
of smp_call_function*() via either generic_exec_single() or
smp_call_function_many_cond(). Give it an IPI-related tracepoint.

Note that this ends up tracing any IPI sent via __smp_call_single_queue(),
which covers __ttwu_queue_wakelist() and irq_work_queue_on() "for free".

Signed-off-by: Valentin Schneider 
Reviewed-by: Steven Rostedt (Google) 
Acked-by: Ingo Molnar 
---
 arch/arm/kernel/smp.c   | 3 ---
 arch/arm64/kernel/smp.c | 1 -
 kernel/sched/core.c | 7 +--
 kernel/smp.c| 4 
 4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 36e6efad89f30..45b8ca2ce521f 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -48,9 +48,6 @@
 #include 
 #include 
 
-#define CREATE_TRACE_POINTS
-#include 
-
 /*
  * as from 2.5, kernels no longer have an init_tasks structure
  * so we need some other way of telling a new secondary core
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index ffc5d76cf6955..937d2623e06ba 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -51,7 +51,6 @@
 #include 
 #include 
 
-#define CREATE_TRACE_POINTS
 #include 
 
 DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bb1ee6d7bddea..c0d09eb249603 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -81,6 +81,7 @@
 #include 
 #include 
 #undef CREATE_TRACE_POINTS
+#include 
 
 #include "sched.h"
 #include "stats.h"
@@ -3819,10 +3820,12 @@ void send_call_function_single_ipi(int cpu)
 {
struct rq *rq = cpu_rq(cpu);
 
-   if (!set_nr_if_polling(rq->idle))
+   if (!set_nr_if_polling(rq->idle)) {
+   trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, NULL);
arch_send_call_function_single_ipi(cpu);
-   else
+   } else {
trace_sched_wake_idle_without_ipi(cpu);
+   }
 }
 
 /*
diff --git a/kernel/smp.c b/kernel/smp.c
index 06a413987a14a..e2ca1e2f31274 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -26,6 +26,10 @@
 #include 
 #include 
 
+#define CREATE_TRACE_POINTS
+#include 
+#undef CREATE_TRACE_POINTS
+
 #include "smpboot.h"
 #include "sched/smp.h"
 
-- 
2.31.1



[PATCH v4 0/7] Generic IPI sending tracepoint

2023-01-19 Thread Valentin Schneider
Background
==

Detecting IPI *reception* is relatively easy, e.g. using
trace_irq_handler_{entry,exit} or even just function-trace
flush_smp_call_function_queue() for SMP calls.  

Figuring out their *origin*, is trickier as there is no generic tracepoint tied
to e.g. smp_call_function():

o AFAIA x86 has no tracepoint tied to sending IPIs, only receiving them
  (cf. trace_call_function{_single}_entry()).
o arm/arm64 do have trace_ipi_raise(), which gives us the target cpus but also a
  mostly useless string (smp_calls will all be "Function call interrupts").
o Other architectures don't seem to have any IPI-sending related tracepoint.  

I believe one reason those tracepoints used by arm/arm64 ended up as they were
is because these archs used to handle IPIs differently from regular interrupts
(the IRQ driver would directly invoke an IPI-handling routine), which meant 
they 
never showed up in trace_irq_handler_{entry, exit}. The trace_ipi_{entry,exit}
tracepoints gave a way to trace IPI reception but those have become redundant as
of: 

  56afcd3dbd19 ("ARM: Allow IPIs to be handled as normal interrupts")
  d3afc7f12987 ("arm64: Allow IPIs to be handled as normal interrupts")

which gave IPIs a "proper" handler function used through
generic_handle_domain_irq(), which makes them show up via
trace_irq_handler_{entry, exit}.

Changing stuff up
=

Per the above, it would make sense to reshuffle trace_ipi_raise() and move it
into generic code. This also came up during Daniel's talk on Osnoise at the CPU
isolation MC of LPC 2022 [1]. 

Now, to be useful, such a tracepoint needs to export:
o targeted CPU(s)
o calling context

The only way to get the calling context with trace_ipi_raise() is to trigger a
stack dump, e.g. $(trace-cmd -e ipi* -T echo 42).

This is instead introducing a new tracepoint which exports the relevant context
(callsite, and requested callback for when the callsite isn't helpful), and is
usable by all architectures as it sits in generic code. 

Another thing worth mentioning is that depending on the callsite, the _RET_IP_
fed to the tracepoint is not always useful - generic_exec_single() doesn't tell
you much about the actual callback being sent via IPI, which is why the new
tracepoint also has a @callback argument.

Patches
===

o Patches 1-5 spread out the tracepoint across relevant sites.
  Patch 5 ends up sprinkling lots of #include  which I'm not
  the biggest fan of, but is the least horrible solution I've been able to come
  up with so far.
  
o Patch 7 is trying to be smart about tracing the callback associated with the
  IPI.

This results in having IPI trace events for:

o smp_call_function*()
o smp_send_reschedule()
o irq_work_queue*()
o standalone uses of __smp_call_single_queue()

This is incomplete, just looking at arm64 there's more IPI types that aren't
covered: 

  IPI_CPU_STOP,
  IPI_CPU_CRASH_STOP,
  IPI_TIMER,
  IPI_WAKEUP,

but apart from IPI_TIMER (cf. tick_broadcast()), those IPIs are both unfrequent
and accompanied with identifiable interference (stopper or cpuhp threads being
scheduled). I've added a point in my todolist to handle those in a later series
for the sake of completeness.

Links
=

[1]: https://youtu.be/5gT57y4OzBM?t=14234

Revisions
=

v3 -> v4


o Rebased against 6.2-rc4
  Re-ran my coccinelle scripts for the treewide change; only loongarch needed
  changes
o Dropped cpumask trace event field patch (now in 6.2-rc1)
o Applied RB and Ack tags
  Ingo, I wasn't sure if you meant to Ack the whole series or just the patch you
  replied to, so since I didn't want to unlawfully forge any tag I only added
  the one.
o Did a small pass on comments and changelogs

v2 -> v3


o Dropped the generic export of smp_send_reschedule(), turned it into a macro
  and a bunch of imports
o Dropped the send_call_function_single_ipi() macro madness, split it into sched
  and smp bits using some of Peter's suggestions

v1 -> v2


o Ditched single-CPU tracepoint
o Changed tracepoint signature to include callback
o Changed tracepoint callsite field to void *; the parameter is still UL to save
  up on casts due to using _RET_IP_.
o Fixed linking failures due to not exporting smp_send_reschedule()

git range-diff v3 vs v4
=

1:  6820c1880d97d < -:  - tracing: Add __cpumask to denote a trace 
event field that is a cpumask_t
2:  ef594e168af0d ! 1:  8f1309849c859 trace: Add trace_ipi_send_cpumask()
@@ Commit message
 its "reason" argument being an uninformative string (on arm64 all you 
get
 is "Function call interrupts" for SMP calls).
 
-Add a variant of it that exports a target CPU, a callsite and a 
callback.
+Add a variant of it that exports a target cpumask, a callsite and a 
callback.
 
 Signed-off-by: Valentin Schneider 
 Reviewed-by: Steven Rostedt (Google) 
3:  17ccdc591aec9 ! 2:  3e0f952a905ce sched, smp: 

[PATCH V3] tools/perf/tests: Fix string substitutions in build id test

2023-01-19 Thread Athira Rajeev
The perf test named “build id cache operations” skips with below
error on some distros:

<<>>
 78: build id cache operations   :
test child forked, pid 01
WARNING: wine not found. PE binaries will not be run.
test binaries: /tmp/perf.ex.SHA1.PKz /tmp/perf.ex.MD5.Gt3 
./tests/shell/../pe-file.exe
DEBUGINFOD_URLS=
Adding 4abd406f041feb4f10ecde3fc30fd0639e1a91cb /tmp/perf.ex.SHA1.PKz: Ok
build id: 4abd406f041feb4f10ecde3fc30fd0639e1a91cb
./tests/shell/buildid.sh: 69: ./tests/shell/buildid.sh: Bad substitution
test child finished with -2
build id cache operations: Skip
<<>>

The test script "tests/shell/buildid.sh" uses some of the
string substitution ways which are supported in bash, but not in
"sh" or other shells. Above error on line number 69 that reports
"Bad substitution" is:

<<>>
link=${build_id_dir}/.build-id/${id:0:2}/${id:2}
<<>>

Here the way of getting first two characters from id ie,
${id:0:2} and similarly expressions like ${id:2} is not
recognised in "sh". So the line errors and instead of
hitting failure, the test gets skipped as shown in logs.
So the syntax issue causes test not to be executed in
such cases. Similarly usage : "${@: -1}" [ to pick last
argument passed to a function] in “test_record” doesn’t
work in all distros.

Fix this by using alternative way with shell substitution
to pick required characters from the string. Also fix the
usage of “${@: -1}” to work in all cases.

Another usage in “test_record” is:
<<>>
${perf} record --buildid-all -o ${data} $@ &> ${log}
<<>>

This causes the perf record to start in background and
Results in the data file not being created by the time
"check" function is invoked. Below log shows perf record
result getting displayed after the call to "check" function.

<<>>
running: perf record /tmp/perf.ex.SHA1.EAU
build id: 4abd406f041feb4f10ecde3fc30fd0639e1a91cb
link: /tmp/perf.debug.mLT/.build-id/4a/bd406f041feb4f10ecde3fc30fd0639e1a91cb
failed: link 
/tmp/perf.debug.mLT/.build-id/4a/bd406f041feb4f10ecde3fc30fd0639e1a91cb does 
not exist
test child finished with -1
build id cache operations: FAILED!
root@machine:~/athira/linux/tools/perf# Couldn't synthesize bpf events.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.010 MB /tmp/perf.data.bFF ]
<<>>

Fix this by redirecting output instead of using “&” which
starts the command in background.

Signed-off-by: Athira Rajeev 
Acked-by: Ian Rogers 
---
Changelog:
>From v2 -> v3
- Addressed review comments from David Laight
  for using shell substitutions.

>From v1 -> v2
- Added Acked-by from Ian.
- Rebased to tmp.perf/urgent of:
 git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git

 tools/perf/tests/shell/buildid.sh | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/tools/perf/tests/shell/buildid.sh 
b/tools/perf/tests/shell/buildid.sh
index aaf851108ca3..0ce22ea0a7f1 100755
--- a/tools/perf/tests/shell/buildid.sh
+++ b/tools/perf/tests/shell/buildid.sh
@@ -66,7 +66,9 @@ check()
esac
echo "build id: ${id}"
 
-   link=${build_id_dir}/.build-id/${id:0:2}/${id:2}
+   id_file=${id#??}
+   id_dir=${id%$id_file}
+   link=$build_id_dir/.build-id/$id_dir/$id_file
echo "link: ${link}"
 
if [ ! -h $link ]; then
@@ -74,7 +76,7 @@ check()
exit 1
fi
 
-   file=${build_id_dir}/.build-id/${id:0:2}/`readlink ${link}`/elf
+   file=${build_id_dir}/.build-id/$id_dir/`readlink ${link}`/elf
echo "file: ${file}"
 
# Check for file permission of original file
@@ -130,20 +132,22 @@ test_record()
 {
data=$(mktemp /tmp/perf.data.XXX)
build_id_dir=$(mktemp -d /tmp/perf.debug.XXX)
-   log=$(mktemp /tmp/perf.log.XXX)
+   log_out=$(mktemp /tmp/perf.log.out.XXX)
+   log_err=$(mktemp /tmp/perf.log.err.XXX)
perf="perf --buildid-dir ${build_id_dir}"
 
echo "running: perf record $@"
-   ${perf} record --buildid-all -o ${data} $@ &> ${log}
+   ${perf} record --buildid-all -o ${data} $@ 1>${log_out} 2>${log_err}
if [ $? -ne 0 ]; then
echo "failed: record $@"
-   echo "see log: ${log}"
+   echo "see log: ${log_err}"
exit 1
fi
 
-   check ${@: -1}
+   args="$*"
+   check ${args##* }
 
-   rm -f ${log}
+   rm -f ${log_out} ${log_err}
rm -rf ${build_id_dir}
rm -rf ${data}
 }
-- 
2.39.0



Re: [PATCH V2] tools/perf/tests: Fix string substitutions in build id test

2023-01-19 Thread Athira Rajeev



> On 19-Jan-2023, at 5:32 PM, David Laight  wrote:
> 
> From: Athira Rajeev
>> Sent: 19 January 2023 11:31
> ...
>> diff --git a/tools/perf/tests/shell/buildid.sh 
>> b/tools/perf/tests/shell/buildid.sh
>> index aaf851108ca3..43e43e131be7 100755
>> --- a/tools/perf/tests/shell/buildid.sh
>> +++ b/tools/perf/tests/shell/buildid.sh
>> @@ -66,7 +66,7 @@ check()
>>  esac
>>  echo "build id: ${id}"
>> 
>> -link=${build_id_dir}/.build-id/${id:0:2}/${id:2}
>> +link=${build_id_dir}/.build-id/$(echo ${id}|cut -c 1-2)/$(echo 
>> ${id}|cut -c 3-)
>>  echo "link: ${link}"
> 
> That is horrid, why not just use valid shell substitutions, eg:
>   id_file=${id#??}
>   id_dir=${id%$id_file}
>   link=$build_id_dir/.build-id/$id_dir/$id_file
> 
> ...
>> -check ${@: -1}
>> +check $last
> 
> Since this is the end of the shell function you can avoid the eval
> by doing:
>   shift $(($# - 1))
>   check $1
> or maybe:
>   args="$*"
>   check ${args##* }
> 
> Those should be ok in all posix shells.
> 
>   David
> 
Hi David,

Thanks for the review. I will post a V3 addressing these changes.

Athira
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 
> 1PT, UK
> Registration No: 1397386 (Wales)



Re: [PATCH v2] of: Fix of platform build on powerpc due to bad of disaply code

2023-01-19 Thread Michal Suchánek
On Thu, Jan 19, 2023 at 02:11:13PM +0100, Thomas Zimmermann wrote:
> Hi
> 
> Am 19.01.23 um 11:24 schrieb Christophe Leroy:
> > 
> > 
> > Le 19/01/2023 à 10:53, Michal Suchanek a écrit :
> > > The commit 2d681d6a23a1 ("of: Make of framebuffer devices unique")
> > > breaks build because of wrong argument to snprintf. That certainly
> > > avoids the runtime error but is not the intended outcome.
> > > 
> > > Also use standard device name format of-display.N for all created
> > > devices.
> > > 
> > > Fixes: 2d681d6a23a1 ("of: Make of framebuffer devices unique")
> > > Signed-off-by: Michal Suchanek 
> > > ---
> > > v2: Update the device name format
> > > ---
> > >drivers/of/platform.c | 12 
> > >1 file changed, 8 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > > index f2a5d679a324..8c1b1de22036 100644
> > > --- a/drivers/of/platform.c
> > > +++ b/drivers/of/platform.c
> > > @@ -525,7 +525,9 @@ static int __init 
> > > of_platform_default_populate_init(void)
> > >   if (IS_ENABLED(CONFIG_PPC)) {
> > >   struct device_node *boot_display = NULL;
> > >   struct platform_device *dev;
> > > - int display_number = 1;
> > > + int display_number = 0;
> > > + char buf[14];
> > 
> > Can you declare that in the for block where it is used instead ?
> > 
> > > + char *of_display_format = "of-display.%d";
> > 
> > Should be const ?
> 
> That should be static const of_display_format[] = then

Why? It sounds completely fine to have a const pointer to a string
constatnt.

Thanks

Michal

> 
> > 
> > >   int ret;
> > >   /* Check if we have a MacOS display without a node spec 
> > > */
> > > @@ -556,7 +558,10 @@ static int __init 
> > > of_platform_default_populate_init(void)
> > >   if (!of_get_property(node, "linux,opened", 
> > > NULL) ||
> > >   !of_get_property(node, 
> > > "linux,boot-display", NULL))
> > >   continue;
> > > - dev = of_platform_device_create(node, "of-display", 
> > > NULL);
> > > + ret = snprintf(buf, sizeof(buf), of_display_format, 
> > > display_number++);
> > > + if (ret >= sizeof(buf))
> > > + continue;
> > 
> > 
> > Can you make buf big enough to avoid that ?
> > 
> > And by the way could it be called something else than 'buf' ?
> > 
> > See exemple here :
> > https://elixir.bootlin.com/linux/v6.1/source/drivers/fsi/fsi-occ.c#L690
> > 
> > 
> > > + dev = of_platform_device_create(node, buf, NULL);
> > >   if (WARN_ON(!dev))
> > >   return -ENOMEM;
> > >   boot_display = node;
> > > @@ -564,10 +569,9 @@ static int __init 
> > > of_platform_default_populate_init(void)
> > >   }
> > >   for_each_node_by_type(node, "display") {
> > > - char *buf[14];
> > >   if (!of_get_property(node, "linux,opened", 
> > > NULL) || node == boot_display)
> > >   continue;
> > > - ret = snprintf(buf, "of-display-%d", display_number++);
> > > + ret = snprintf(buf, sizeof(buf), of_display_format, 
> > > display_number++);
> > >   if (ret >= sizeof(buf))
> > >   continue;
> > >   of_platform_device_create(node, buf, NULL);
> 
> -- 
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Ivo Totev





Re: [PATCH v2] of: Fix of platform build on powerpc due to bad of disaply code

2023-01-19 Thread Thomas Zimmermann

Hi

Am 19.01.23 um 11:24 schrieb Christophe Leroy:



Le 19/01/2023 à 10:53, Michal Suchanek a écrit :

The commit 2d681d6a23a1 ("of: Make of framebuffer devices unique")
breaks build because of wrong argument to snprintf. That certainly
avoids the runtime error but is not the intended outcome.

Also use standard device name format of-display.N for all created
devices.

Fixes: 2d681d6a23a1 ("of: Make of framebuffer devices unique")
Signed-off-by: Michal Suchanek 
---
v2: Update the device name format
---
   drivers/of/platform.c | 12 
   1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index f2a5d679a324..8c1b1de22036 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -525,7 +525,9 @@ static int __init of_platform_default_populate_init(void)
if (IS_ENABLED(CONFIG_PPC)) {
struct device_node *boot_display = NULL;
struct platform_device *dev;
-   int display_number = 1;
+   int display_number = 0;
+   char buf[14];


Can you declare that in the for block where it is used instead ?


+   char *of_display_format = "of-display.%d";


Should be const ?


That should be static const of_display_format[] = then




int ret;
   
   		/* Check if we have a MacOS display without a node spec */

@@ -556,7 +558,10 @@ static int __init of_platform_default_populate_init(void)
if (!of_get_property(node, "linux,opened", NULL) ||
!of_get_property(node, "linux,boot-display", NULL))
continue;
-   dev = of_platform_device_create(node, "of-display", 
NULL);
+   ret = snprintf(buf, sizeof(buf), of_display_format, 
display_number++);
+   if (ret >= sizeof(buf))
+   continue;



Can you make buf big enough to avoid that ?

And by the way could it be called something else than 'buf' ?

See exemple here :
https://elixir.bootlin.com/linux/v6.1/source/drivers/fsi/fsi-occ.c#L690



+   dev = of_platform_device_create(node, buf, NULL);
if (WARN_ON(!dev))
return -ENOMEM;
boot_display = node;
@@ -564,10 +569,9 @@ static int __init of_platform_default_populate_init(void)
}
   
   		for_each_node_by_type(node, "display") {

-   char *buf[14];
if (!of_get_property(node, "linux,opened", NULL) || 
node == boot_display)
continue;
-   ret = snprintf(buf, "of-display-%d", display_number++);
+   ret = snprintf(buf, sizeof(buf), of_display_format, 
display_number++);
if (ret >= sizeof(buf))
continue;
of_platform_device_create(node, buf, NULL);


--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH 39/41] kernel/fork: throttle call_rcu() calls in vm_area_free

2023-01-19 Thread Michal Hocko
On Mon 09-01-23 12:53:34, Suren Baghdasaryan wrote:
> call_rcu() can take a long time when callback offloading is enabled.
> Its use in the vm_area_free can cause regressions in the exit path when
> multiple VMAs are being freed. To minimize that impact, place VMAs into
> a list and free them in groups using one call_rcu() call per group.

After some more clarification I can understand how call_rcu might not be
super happy about thousands of callbacks to be invoked and I do agree
that this is not really optimal.

On the other hand I do not like this solution much either.
VM_AREA_FREE_LIST_MAX is arbitrary and it won't really help all that
much with processes with a huge number of vmas either. It would still be
in housands of callbacks to be scheduled without a good reason.

Instead, are there any other cases than remove_vma that need this
batching? We could easily just link all the vmas into linked list and
use a single call_rcu instead, no? This would both simplify the
implementation, remove the scaling issue as well and we do not have to
argue whether VM_AREA_FREE_LIST_MAX should be epsilon or epsilon + 1.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 39/41] kernel/fork: throttle call_rcu() calls in vm_area_free

2023-01-19 Thread Michal Hocko
On Wed 18-01-23 11:01:08, Suren Baghdasaryan wrote:
> On Wed, Jan 18, 2023 at 10:34 AM Paul E. McKenney  wrote:
[...]
> > There are a couple of possibilities here.
> >
> > First, if I am remembering correctly, the time between the call_rcu()
> > and invocation of the corresponding callback was taking multiple seconds,
> > but that was because the kernel was built with CONFIG_LAZY_RCU=y in
> > order to save power by batching RCU work over multiple call_rcu()
> > invocations.  If this is causing a problem for a given call site, the
> > shiny new call_rcu_hurry() can be used instead.  Doing this gets back
> > to the old-school non-laziness, but can of course consume more power.
> 
> That would not be the case because CONFIG_LAZY_RCU was not an option
> at the time I was profiling this issue.
> Laxy RCU would be a great option to replace this patch but
> unfortunately it's not the default behavior, so I would still have to
> implement this batching in case lazy RCU is not enabled.
> 
> >
> > Second, there is a much shorter one-jiffy delay between the call_rcu()
> > and the invocation of the corresponding callback in kernels built with
> > either CONFIG_NO_HZ_FULL=y (but only on CPUs mentioned in the nohz_full
> > or rcu_nocbs kernel boot parameters) or CONFIG_RCU_NOCB_CPU=y (but only
> > on CPUs mentioned in the rcu_nocbs kernel boot parameters).  The purpose
> > of this delay is to avoid lock contention, and so this delay is incurred
> > only on CPUs that are queuing callbacks at a rate exceeding 16K/second.
> > This is reduced to a per-jiffy limit, so on a HZ=1000 system, a CPU
> > invoking call_rcu() at least 16 times within a given jiffy will incur
> > the added delay.  The reason for this delay is the use of a separate
> > ->nocb_bypass list.  As Suren says, this bypass list is used to reduce
> > lock contention on the main ->cblist.  This is not needed in old-school
> > kernels built without either CONFIG_NO_HZ_FULL=y or CONFIG_RCU_NOCB_CPU=y
> > (including most datacenter kernels) because in that case the callbacks
> > enqueued by call_rcu() are touched only by the corresponding CPU, so
> > that there is no need for locks.
> 
> I believe this is the reason in my profiled case.
> 
> >
> > Third, if you are instead seeing multiple milliseconds of CPU consumed by
> > call_rcu() in the common case (for example, without the aid of interrupts,
> > NMIs, or SMIs), please do let me know.  That sounds to me like a bug.
> 
> I don't think I've seen such a case.
> Thanks for clarifications, Paul!

Thanks for the explanation Paul. I have to say this has caught me as a
surprise. There are just not enough details about the benchmark to
understand what is going on but I find it rather surprising that
call_rcu can induce a higher overhead than the actual kmem_cache_free
which is the callback. My naive understanding has been that call_rcu is
really fast way to defer the execution to the RCU safe context to do the
final cleanup.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] powerpc: remove checks for binutils older than 2.25

2023-01-19 Thread Masahiro Yamada
On Thu, Jan 19, 2023 at 9:12 PM Joel Stanley  wrote:
>
> On Thu, 19 Jan 2023 at 08:24, Masahiro Yamada  wrote:
> >
> > Commit e4412739472b ("Documentation: raise minimum supported version of
> > binutils to 2.25") allows us to remove the checks for old binutils.
> >
> > There is no more user for ld-ifversion. Remove it as well.
>
> ppc kernels fail to link with 2.27 under some configurations:
>
>  https://github.com/linuxppc/issues/issues/388
>
> We may want to use ld-ifversion to exclude that version.




For LLD, CONFIG option is directly checked.


masahiro@zoe:~/ref/linux(master)$ git grep  CONFIG_LLD_VERSION
Makefile:ifeq ($(call test-lt, $(CONFIG_LLD_VERSION), 13),y)
arch/riscv/Makefile:ifeq ($(call test-lt, $(CONFIG_LLD_VERSION), 15),y)
arch/x86/Makefile:ifeq ($(call test-lt, $(CONFIG_LLD_VERSION), 13),y)
scripts/Kbuild.include:# Usage: $(call test-lt, $(CONFIG_LLD_VERSION), 15)











> >
> > Signed-off-by: Masahiro Yamada 
> > ---
> >
> >  arch/powerpc/Makefile | 22 +-
> >  arch/powerpc/lib/Makefile |  2 +-
> >  scripts/Makefile.compiler |  4 
> >  3 files changed, 2 insertions(+), 26 deletions(-)
> >
> > diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> > index dc4cbf0a5ca9..3d265b16c0ae 100644
> > --- a/arch/powerpc/Makefile
> > +++ b/arch/powerpc/Makefile
> > @@ -42,18 +42,13 @@ machine-$(CONFIG_PPC64) += 64
> >  machine-$(CONFIG_CPU_LITTLE_ENDIAN) += le
> >  UTS_MACHINE := $(subst $(space),,$(machine-y))
> >
> > -# XXX This needs to be before we override LD below
> > -ifdef CONFIG_PPC32
> > -KBUILD_LDFLAGS_MODULE += arch/powerpc/lib/crtsavres.o
> > -else
> > -ifeq ($(call ld-ifversion, -ge, 22500, y),y)
> > +ifeq ($(CONFIG_PPC64)$(CONFIG_LD_IS_BFD),yy)
> >  # Have the linker provide sfpr if possible.
> >  # There is a corresponding test in arch/powerpc/lib/Makefile
> >  KBUILD_LDFLAGS_MODULE += --save-restore-funcs
> >  else
> >  KBUILD_LDFLAGS_MODULE += arch/powerpc/lib/crtsavres.o
> >  endif
> > -endif
> >
> >  ifdef CONFIG_CPU_LITTLE_ENDIAN
> >  KBUILD_CFLAGS  += -mlittle-endian
> > @@ -389,8 +384,6 @@ vdso_prepare: prepare0
> > $(build)=arch/powerpc/kernel/vdso 
> > include/generated/vdso64-offsets.h)
> >  endif
> >
> > -archprepare: checkbin
> > -
> >  archheaders:
> > $(Q)$(MAKE) $(build)=arch/powerpc/kernel/syscalls all
> >
> > @@ -405,16 +398,3 @@ else
> > $(eval KBUILD_CFLAGS += -mstack-protector-guard-offset=$(shell awk 
> > '{if ($$2 == "TASK_CANARY") print $$3;}' include/generated/asm-offsets.h))
> >  endif
> >  endif
> > -
> > -PHONY += checkbin
> > -# Check toolchain versions:
> > -# - gcc-4.6 is the minimum kernel-wide version so nothing required.
> > -checkbin:
> > -   @if test "x${CONFIG_LD_IS_LLD}" != "xy" -a \
> > -   "x$(call ld-ifversion, -le, 22400, y)" = "xy" ; then \
> > -   echo -n '*** binutils 2.24 miscompiles weak symbols ' ; \
> > -   echo 'in some circumstances.' ; \
> > -   echo'*** binutils 2.23 do not define the TOC symbol ' ; 
> > \
> > -   echo -n '*** Please use a different binutils version.' ; \
> > -   false ; \
> > -   fi
> > diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
> > index 4de71cbf6e8e..c53618c34b70 100644
> > --- a/arch/powerpc/lib/Makefile
> > +++ b/arch/powerpc/lib/Makefile
> > @@ -42,7 +42,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION)+= 
> > error-inject.o
> >  # 64-bit linker creates .sfpr on demand for final link (vmlinux),
> >  # so it is only needed for modules, and only for older linkers which
> >  # do not support --save-restore-funcs
> > -ifeq ($(call ld-ifversion, -lt, 22500, y),y)
> > +ifndef CONFIG_LD_IS_BFD
> >  extra-$(CONFIG_PPC64)  += crtsavres.o
> >  endif
> >
> > diff --git a/scripts/Makefile.compiler b/scripts/Makefile.compiler
> > index 3d8adfd34af1..ad07a4efc253 100644
> > --- a/scripts/Makefile.compiler
> > +++ b/scripts/Makefile.compiler
> > @@ -72,7 +72,3 @@ clang-min-version = $(call test-ge, 
> > $(CONFIG_CLANG_VERSION), $1)
> >  # ld-option
> >  # Usage: KBUILD_LDFLAGS += $(call ld-option, -X, -Y)
> >  ld-option = $(call try-run, $(LD) $(KBUILD_LDFLAGS) $(1) -v,$(1),$(2),$(3))
> > -
> > -# ld-ifversion
> > -# Usage:  $(call ld-ifversion, -ge, 22252, y)
> > -ld-ifversion = $(shell [ $(CONFIG_LD_VERSION)0 $(1) $(2)0 ] && echo $(3) 
> > || echo $(4))
> > --
> > 2.34.1
> >



-- 
Best Regards
Masahiro Yamada


Re: [PATCH] powerpc: remove checks for binutils older than 2.25

2023-01-19 Thread Joel Stanley
On Thu, 19 Jan 2023 at 08:24, Masahiro Yamada  wrote:
>
> Commit e4412739472b ("Documentation: raise minimum supported version of
> binutils to 2.25") allows us to remove the checks for old binutils.
>
> There is no more user for ld-ifversion. Remove it as well.

ppc kernels fail to link with 2.27 under some configurations:

 https://github.com/linuxppc/issues/issues/388

We may want to use ld-ifversion to exclude that version.

>
> Signed-off-by: Masahiro Yamada 
> ---
>
>  arch/powerpc/Makefile | 22 +-
>  arch/powerpc/lib/Makefile |  2 +-
>  scripts/Makefile.compiler |  4 
>  3 files changed, 2 insertions(+), 26 deletions(-)
>
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index dc4cbf0a5ca9..3d265b16c0ae 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -42,18 +42,13 @@ machine-$(CONFIG_PPC64) += 64
>  machine-$(CONFIG_CPU_LITTLE_ENDIAN) += le
>  UTS_MACHINE := $(subst $(space),,$(machine-y))
>
> -# XXX This needs to be before we override LD below
> -ifdef CONFIG_PPC32
> -KBUILD_LDFLAGS_MODULE += arch/powerpc/lib/crtsavres.o
> -else
> -ifeq ($(call ld-ifversion, -ge, 22500, y),y)
> +ifeq ($(CONFIG_PPC64)$(CONFIG_LD_IS_BFD),yy)
>  # Have the linker provide sfpr if possible.
>  # There is a corresponding test in arch/powerpc/lib/Makefile
>  KBUILD_LDFLAGS_MODULE += --save-restore-funcs
>  else
>  KBUILD_LDFLAGS_MODULE += arch/powerpc/lib/crtsavres.o
>  endif
> -endif
>
>  ifdef CONFIG_CPU_LITTLE_ENDIAN
>  KBUILD_CFLAGS  += -mlittle-endian
> @@ -389,8 +384,6 @@ vdso_prepare: prepare0
> $(build)=arch/powerpc/kernel/vdso 
> include/generated/vdso64-offsets.h)
>  endif
>
> -archprepare: checkbin
> -
>  archheaders:
> $(Q)$(MAKE) $(build)=arch/powerpc/kernel/syscalls all
>
> @@ -405,16 +398,3 @@ else
> $(eval KBUILD_CFLAGS += -mstack-protector-guard-offset=$(shell awk 
> '{if ($$2 == "TASK_CANARY") print $$3;}' include/generated/asm-offsets.h))
>  endif
>  endif
> -
> -PHONY += checkbin
> -# Check toolchain versions:
> -# - gcc-4.6 is the minimum kernel-wide version so nothing required.
> -checkbin:
> -   @if test "x${CONFIG_LD_IS_LLD}" != "xy" -a \
> -   "x$(call ld-ifversion, -le, 22400, y)" = "xy" ; then \
> -   echo -n '*** binutils 2.24 miscompiles weak symbols ' ; \
> -   echo 'in some circumstances.' ; \
> -   echo'*** binutils 2.23 do not define the TOC symbol ' ; \
> -   echo -n '*** Please use a different binutils version.' ; \
> -   false ; \
> -   fi
> diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
> index 4de71cbf6e8e..c53618c34b70 100644
> --- a/arch/powerpc/lib/Makefile
> +++ b/arch/powerpc/lib/Makefile
> @@ -42,7 +42,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION)+= 
> error-inject.o
>  # 64-bit linker creates .sfpr on demand for final link (vmlinux),
>  # so it is only needed for modules, and only for older linkers which
>  # do not support --save-restore-funcs
> -ifeq ($(call ld-ifversion, -lt, 22500, y),y)
> +ifndef CONFIG_LD_IS_BFD
>  extra-$(CONFIG_PPC64)  += crtsavres.o
>  endif
>
> diff --git a/scripts/Makefile.compiler b/scripts/Makefile.compiler
> index 3d8adfd34af1..ad07a4efc253 100644
> --- a/scripts/Makefile.compiler
> +++ b/scripts/Makefile.compiler
> @@ -72,7 +72,3 @@ clang-min-version = $(call test-ge, 
> $(CONFIG_CLANG_VERSION), $1)
>  # ld-option
>  # Usage: KBUILD_LDFLAGS += $(call ld-option, -X, -Y)
>  ld-option = $(call try-run, $(LD) $(KBUILD_LDFLAGS) $(1) -v,$(1),$(2),$(3))
> -
> -# ld-ifversion
> -# Usage:  $(call ld-ifversion, -ge, 22252, y)
> -ld-ifversion = $(shell [ $(CONFIG_LD_VERSION)0 $(1) $(2)0 ] && echo $(3) || 
> echo $(4))
> --
> 2.34.1
>


RE: [PATCH V2] tools/perf/tests: Fix string substitutions in build id test

2023-01-19 Thread David Laight
From: Athira Rajeev
> Sent: 19 January 2023 11:31
...
> diff --git a/tools/perf/tests/shell/buildid.sh 
> b/tools/perf/tests/shell/buildid.sh
> index aaf851108ca3..43e43e131be7 100755
> --- a/tools/perf/tests/shell/buildid.sh
> +++ b/tools/perf/tests/shell/buildid.sh
> @@ -66,7 +66,7 @@ check()
>   esac
>   echo "build id: ${id}"
> 
> - link=${build_id_dir}/.build-id/${id:0:2}/${id:2}
> + link=${build_id_dir}/.build-id/$(echo ${id}|cut -c 1-2)/$(echo 
> ${id}|cut -c 3-)
>   echo "link: ${link}"

That is horrid, why not just use valid shell substitutions, eg:
id_file=${id#??}
id_dir=${id%$id_file}
link=$build_id_dir/.build-id/$id_dir/$id_file

...
> - check ${@: -1}
> + check $last

Since this is the end of the shell function you can avoid the eval
by doing:
shift $(($# - 1))
check $1
or maybe:
args="$*"
check ${args##* }

Those should be ok in all posix shells.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: [PATCH v2] of: Fix of platform build on powerpc due to bad of disaply code

2023-01-19 Thread Erhard F.
On Thu, 19 Jan 2023 10:53:23 +0100
Michal Suchanek  wrote:

> The commit 2d681d6a23a1 ("of: Make of framebuffer devices unique")
> breaks build because of wrong argument to snprintf. That certainly
> avoids the runtime error but is not the intended outcome.
> 
> Also use standard device name format of-display.N for all created
> devices.
> 
> Fixes: 2d681d6a23a1 ("of: Make of framebuffer devices unique")
> Signed-off-by: Michal Suchanek 
> ---
> v2: Update the device name format

Hi Michal!

Just tested your 'of: Make of framebuffer devices unique' + 'v2 of: Fix of 
platform build on powerpc due to bad of disaply code' on my G4 and can confirm 
they fix the original issue 
(https://bugzilla.kernel.org/show_bug.cgi?id=216095). Thanks!

Also ofdrm gets loaded now without error messages:
 # modprobe -v ofdrm
insmod 
/lib/modules/6.2.0-rc4-PMacG4+/kernel/drivers/video/fbdev/core/cfbcopyarea.ko 
insmod 
/lib/modules/6.2.0-rc4-PMacG4+/kernel/drivers/video/fbdev/core/sysimgblt.ko 
insmod 
/lib/modules/6.2.0-rc4-PMacG4+/kernel/drivers/video/fbdev/core/sysfillrect.ko 
insmod 
/lib/modules/6.2.0-rc4-PMacG4+/kernel/drivers/video/fbdev/core/cfbimgblt.ko 
insmod 
/lib/modules/6.2.0-rc4-PMacG4+/kernel/drivers/video/fbdev/core/syscopyarea.ko 
insmod 
/lib/modules/6.2.0-rc4-PMacG4+/kernel/drivers/video/fbdev/core/cfbfillrect.ko 
insmod /lib/modules/6.2.0-rc4-PMacG4+/kernel/drivers/gpu/drm/drm_kms_helper.ko 
insmod 
/lib/modules/6.2.0-rc4-PMacG4+/kernel/drivers/gpu/drm/drm_shmem_helper.ko 
insmod /lib/modules/6.2.0-rc4-PMacG4+/kernel/drivers/gpu/drm/tiny/ofdrm.ko 

However I get no monitor output yet, despite ofdrm is loaded:
 # lsmod | grep -i drm
ofdrm   9736  0
drm_shmem_helper5704  1 ofdrm
drm_kms_helper101736  1 ofdrm
cfbfillrect 2896  1 drm_kms_helper
syscopyarea 2400  1 drm_kms_helper
cfbimgblt   2256  1 drm_kms_helper
sysfillrect 2920  1 drm_kms_helper
sysimgblt   2296  1 drm_kms_helper
cfbcopyarea 2376  1 drm_kms_helper
drm   288960  3 drm_shmem_helper,ofdrm,drm_kms_helper
drm_panel_orientation_quirks   16  1 drm

I use DRM [=m], DRM_OFDRM [=m], DRM_RADEON [=n], DRM_FBDEV_EMULATION [=y], FB 
[=y] and FB_OF [=n] in my kernel test .config. As far as I understand  
DRM_OFDRM and FB_OF would be mutually exclusive? Also does not seem to make a 
difference whether FB_SIMPLE [=y] is set.

Regards,
Erhard


dmesg_62-rc4_g4
Description: Binary data


Re: [PATCH 1/2] tools/perf: Fix the file mode with copyfile while adding file to build-id cache

2023-01-19 Thread Athira Rajeev



> On 18-Jan-2023, at 7:20 PM, Arnaldo Carvalho de Melo  wrote:
> 
> Em Mon, Jan 16, 2023 at 10:31:30AM +0530, Athira Rajeev escreveu:
>> The test "build id cache operations" fails on powerpc
>> As below:
>> 
>>  Adding 5a0fd882b53084224ba47b624c55a469 ./tests/shell/../pe-file.exe: Ok
>>  build id: 5a0fd882b53084224ba47b624c55a469
>>  link: /tmp/perf.debug.ZTu/.build-id/5a/0fd882b53084224ba47b624c55a469
>>  file: 
>> /tmp/perf.debug.ZTu/.build-id/5a/../../root/linux/tools/perf/tests/pe-file.exe/5a0fd882b53084224ba47b624c55a469/elf
>>  failed: file 
>> /tmp/perf.debug.ZTu/.build-id/5a/../../root/linux/tools/perf/tests/pe-file.exe/5a0fd882b53084224ba47b624c55a469/elf
>>  does not exist
>>  test child finished with -1
>>   end 
>>  build id cache operations: FAILED!
>> 
>> The failing test is when trying to add pe-file.exe to
>> build id cache.
>> 
>> Perf buildid-cache can be used to add/remove/manage
>> files from the build-id cache. "-a" option is used to
>> add a file to the build-id cache.
>> 
>> Simple command to do so for a PE exe file:
>> # ls -ltr tests/pe-file.exe
>> -rw-r--r--. 1 root root 75595 Jan 10 23:35 tests/pe-file.exe
>> The file is in home directory.
>> 
>> # mkdir  /tmp/perf.debug.TeY1
>> # perf --buildid-dir /tmp/perf.debug.TeY1 buildid-cache -v
>>   -a tests/pe-file.exe
>> 
>> The above will create ".build-id" folder in build id
>> directory, which is /tmp/perf.debug.TeY1. Also adds file
>> to this folder under build id. Example:
>> 
>> # ls -ltr /tmp/perf.debug.TeY1/.build-id/5a/0fd882b53084224ba47b624c55a469/
>> total 76
>> -rw-r--r--. 1 root root 0 Jan 11 00:38 probes
>> -rwxr-xr-x. 1 root root 75595 Jan 11 00:38 elf
>> 
>> We can see in the results that file mode for original
>> file and file in build id directory is different. ie,
>> build id file has executable permission whereas original
>> file doesn’t have.
>> 
>> The code path and function ( build_id_cache__add ) to
>> add file to cache is in "util/build-id.c". In
>> build_id_cache__add() function, it first attempts to link
>> the original file to destination cache folder. If linking
>> the file fails ( which can happen if the destination and
>> source is on a different mount points ), it will copy the
>> file to destination. Here copyfile() routine explicitly uses
>> mode as "755" and hence file in the destination will have
>> executable permission.
>> 
>> Code snippet:
>> 
>> if (link(realname, filename) && errno != EEXIST &&
>>   copyfile(name, filename))
>> 
>> Strace logs:
>> 
>>  172285 link("/home//linux/tools/perf/tests/pe-file.exe", 
>> "/tmp/perf.debug.TeY1/home//linux/tools/perf/tests/pe-file.exe/5a0fd882b53084224ba47b624c55a469/elf")
>>  = -1 EXDEV (Invalid cross-device link)
>>  172285 newfstatat(AT_FDCWD, "tests/pe-file.exe", {st_mode=S_IFREG|0644, 
>> st_size=75595, ...}, 0) = 0
>>  172285 openat(AT_FDCWD, 
>> "/tmp/perf.debug.TeY1/home//linux/tools/perf/tests/pe-file.exe/5a0fd882b53084224ba47b624c55a469/.elf.KbAnsl",
>>  O_RDWR|O_CREAT|O_EXCL, 0600) = 3
>>  172285 fchmod(3, 0755)  = 0
>>  172285 openat(AT_FDCWD, "tests/pe-file.exe", O_RDONLY) = 4
>>  172285 mmap(NULL, 75595, PROT_READ, MAP_PRIVATE, 4, 0) = 0x7fffa5cd
>>  172285 pwrite64(3, 
>> "MZ\220\0\3\0\0\0\4\0\0\0\377\377\0\0\270\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 
>> 75595, 0) = 75595
>> 
>> Whereas if the link succeeds, it succeeds in the first
>> attempt itself and the file in the build-id dir will
>> have same permission as original file.
>> 
>> Example, above uses /tmp. Instead if we use
>> "--buildid-dir /home/build", linking will work here
>> since mount points are same. Hence the destination file
>> will not have executable permission.
>> 
>> Since the testcase "tests/shell/buildid.sh" always looks
>> for executable file, test fails in powerpc environment
>> when test is run from /root.
>> 
>> The patch adds a change in build_id_cache__add to use
>> copyfile_mode which also passes the file’s original mode as
>> argument. This way the destination file mode also will
>> be same as original file.
>> 
>> Signed-off-by: Athira Rajeev 
> 
> Thanks, applied both patches, IIRC there were an Acked-by from Ian for
> this one? Or was that reference a cut'n'paste error with that other
> series for the #/bin/bash changes?
> 
> - Arnaldo
> 

Hi Arnaldo,

Thanks for picking up the changes in this patch set.

I re-based the patch from the other series ( for which I had got Acked-by from 
Ian ) on top of tmp.perf/urgent
and posted it as single V2 here:

https://lore.kernel.org/linux-perf-users/20230119113054.31742-1-atraj...@linux.vnet.ibm.com/T/#u

Thanks
Athira

>> ---
>> tools/perf/util/build-id.c | 10 +++---
>> 1 file changed, 7 insertions(+), 3 deletions(-)
>> 
>> diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
>> index a839b30c981b..ea9c083ab1e3 100644
>> --- a/tools/perf/util/build-id.c
>> +++ 

[PATCH V2] tools/perf/tests: Fix string substitutions in build id test

2023-01-19 Thread Athira Rajeev
The perf test named “build id cache operations” skips with below
error on some distros:

<<>>
 78: build id cache operations   :
test child forked, pid 01
WARNING: wine not found. PE binaries will not be run.
test binaries: /tmp/perf.ex.SHA1.PKz /tmp/perf.ex.MD5.Gt3 
./tests/shell/../pe-file.exe
DEBUGINFOD_URLS=
Adding 4abd406f041feb4f10ecde3fc30fd0639e1a91cb /tmp/perf.ex.SHA1.PKz: Ok
build id: 4abd406f041feb4f10ecde3fc30fd0639e1a91cb
./tests/shell/buildid.sh: 69: ./tests/shell/buildid.sh: Bad substitution
test child finished with -2
build id cache operations: Skip
<<>>

The test script "tests/shell/buildid.sh" uses some of the
string substitution ways which are supported in bash, but not in
"sh" or other shells. Above error on line number 69 that reports
"Bad substitution" is:

<<>>
link=${build_id_dir}/.build-id/${id:0:2}/${id:2}
<<>>

Here the way of getting first two characters from id ie,
${id:0:2} and similarly expressions like ${id:2} is not
recognised in "sh". So the line errors and instead of
hitting failure, the test gets skipped as shown in logs.
So the syntax issue causes test not to be executed in
such cases. Similarly usage : "${@: -1}" [ to pick last
argument passed to a function] in “test_record” doesn’t
work in all distros.

Fix this by using alternative way with "cut" command
to pick "n" characters from the string. Also fix the usage
of “${@: -1}” to work in all cases.

Another usage in “test_record” is:
<<>>
${perf} record --buildid-all -o ${data} $@ &> ${log}
<<>>

This causes the perf record to start in background and
Results in the data file not being created by the time
"check" function is invoked. Below log shows perf record
result getting displayed after the call to "check" function.

<<>>
running: perf record /tmp/perf.ex.SHA1.EAU
build id: 4abd406f041feb4f10ecde3fc30fd0639e1a91cb
link: /tmp/perf.debug.mLT/.build-id/4a/bd406f041feb4f10ecde3fc30fd0639e1a91cb
failed: link 
/tmp/perf.debug.mLT/.build-id/4a/bd406f041feb4f10ecde3fc30fd0639e1a91cb does 
not exist
test child finished with -1
build id cache operations: FAILED!
root@machine:~/athira/linux/tools/perf# Couldn't synthesize bpf events.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.010 MB /tmp/perf.data.bFF ]
<<>>

Fix this by redirecting output instead of using “&” which
starts the command in background.

Signed-off-by: Athira Rajeev 
Acked-by: Ian Rogers 
---
Changelog:
- Added Acked-by from Ian.
- Rebased to tmp.perf/urgent of:
  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git

 tools/perf/tests/shell/buildid.sh | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/tools/perf/tests/shell/buildid.sh 
b/tools/perf/tests/shell/buildid.sh
index aaf851108ca3..43e43e131be7 100755
--- a/tools/perf/tests/shell/buildid.sh
+++ b/tools/perf/tests/shell/buildid.sh
@@ -66,7 +66,7 @@ check()
esac
echo "build id: ${id}"
 
-   link=${build_id_dir}/.build-id/${id:0:2}/${id:2}
+   link=${build_id_dir}/.build-id/$(echo ${id}|cut -c 1-2)/$(echo 
${id}|cut -c 3-)
echo "link: ${link}"
 
if [ ! -h $link ]; then
@@ -74,7 +74,7 @@ check()
exit 1
fi
 
-   file=${build_id_dir}/.build-id/${id:0:2}/`readlink ${link}`/elf
+   file=${build_id_dir}/.build-id/$(echo ${id}|cut -c 1-2)/`readlink 
${link}`/elf
echo "file: ${file}"
 
# Check for file permission of original file
@@ -130,20 +130,22 @@ test_record()
 {
data=$(mktemp /tmp/perf.data.XXX)
build_id_dir=$(mktemp -d /tmp/perf.debug.XXX)
-   log=$(mktemp /tmp/perf.log.XXX)
+   log_out=$(mktemp /tmp/perf.log.out.XXX)
+   log_err=$(mktemp /tmp/perf.log.err.XXX)
perf="perf --buildid-dir ${build_id_dir}"
+   eval last=\${$#}
 
echo "running: perf record $@"
-   ${perf} record --buildid-all -o ${data} $@ &> ${log}
+   ${perf} record --buildid-all -o ${data} $@ 1>${log_out} 2>${log_err}
if [ $? -ne 0 ]; then
echo "failed: record $@"
-   echo "see log: ${log}"
+   echo "see log: ${log_err}"
exit 1
fi
 
-   check ${@: -1}
+   check $last
 
-   rm -f ${log}
+   rm -f ${log_out} ${log_err}
rm -rf ${build_id_dir}
rm -rf ${data}
 }
-- 
2.31.1



Re: [PATCH v2] of: Fix of platform build on powerpc due to bad of disaply code

2023-01-19 Thread Michal Suchánek
Hello,

On Thu, Jan 19, 2023 at 10:24:07AM +, Christophe Leroy wrote:
> 
> 
> Le 19/01/2023 à 10:53, Michal Suchanek a écrit :
> > The commit 2d681d6a23a1 ("of: Make of framebuffer devices unique")
> > breaks build because of wrong argument to snprintf. That certainly
> > avoids the runtime error but is not the intended outcome.
> > 
> > Also use standard device name format of-display.N for all created
> > devices.
> > 
> > Fixes: 2d681d6a23a1 ("of: Make of framebuffer devices unique")
> > Signed-off-by: Michal Suchanek 
> > ---
> > v2: Update the device name format
> > ---
> >   drivers/of/platform.c | 12 
> >   1 file changed, 8 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > index f2a5d679a324..8c1b1de22036 100644
> > --- a/drivers/of/platform.c
> > +++ b/drivers/of/platform.c
> > @@ -525,7 +525,9 @@ static int __init 
> > of_platform_default_populate_init(void)
> > if (IS_ENABLED(CONFIG_PPC)) {
> > struct device_node *boot_display = NULL;
> > struct platform_device *dev;
> > -   int display_number = 1;
> > +   int display_number = 0;
> > +   char buf[14];
> 
> Can you declare that in the for block where it is used instead ?

No, there are two for blocks.

> 
> > +   char *of_display_format = "of-display.%d";
> 
> Should be const ?

Yes, could be.

> 
> > int ret;
> >   
> > /* Check if we have a MacOS display without a node spec */
> > @@ -556,7 +558,10 @@ static int __init 
> > of_platform_default_populate_init(void)
> > if (!of_get_property(node, "linux,opened", NULL) ||
> > !of_get_property(node, "linux,boot-display", NULL))
> > continue;
> > -   dev = of_platform_device_create(node, "of-display", 
> > NULL);
> > +   ret = snprintf(buf, sizeof(buf), of_display_format, 
> > display_number++);
> > +   if (ret >= sizeof(buf))
> > +   continue;
> 
> 
> Can you make buf big enough to avoid that ?

It would be a bit fragile that way.

The buffer would have to theoretically accomodate
"of-display.-9223372036854775808", and any change to the format requires
recalculating the length, by hand.

Of course, the memory would run out way before allocating that many
devices so it's kind of pointless to try and accomodate all possible
device numbers.

> 
> And by the way could it be called something else than 'buf' ?
> 
> See exemple here : 
> https://elixir.bootlin.com/linux/v6.1/source/drivers/fsi/fsi-occ.c#L690

Yes, that is quite possible. Nonetheless, just like 'ret' generic
variable names also work.

Thanks

Michal


Re: [PATCH v2] of: Fix of platform build on powerpc due to bad of disaply code

2023-01-19 Thread Christophe Leroy


Le 19/01/2023 à 10:53, Michal Suchanek a écrit :
> The commit 2d681d6a23a1 ("of: Make of framebuffer devices unique")
> breaks build because of wrong argument to snprintf. That certainly
> avoids the runtime error but is not the intended outcome.
> 
> Also use standard device name format of-display.N for all created
> devices.
> 
> Fixes: 2d681d6a23a1 ("of: Make of framebuffer devices unique")
> Signed-off-by: Michal Suchanek 
> ---
> v2: Update the device name format
> ---
>   drivers/of/platform.c | 12 
>   1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> index f2a5d679a324..8c1b1de22036 100644
> --- a/drivers/of/platform.c
> +++ b/drivers/of/platform.c
> @@ -525,7 +525,9 @@ static int __init of_platform_default_populate_init(void)
>   if (IS_ENABLED(CONFIG_PPC)) {
>   struct device_node *boot_display = NULL;
>   struct platform_device *dev;
> - int display_number = 1;
> + int display_number = 0;
> + char buf[14];

Can you declare that in the for block where it is used instead ?

> + char *of_display_format = "of-display.%d";

Should be const ?

>   int ret;
>   
>   /* Check if we have a MacOS display without a node spec */
> @@ -556,7 +558,10 @@ static int __init of_platform_default_populate_init(void)
>   if (!of_get_property(node, "linux,opened", NULL) ||
>   !of_get_property(node, "linux,boot-display", NULL))
>   continue;
> - dev = of_platform_device_create(node, "of-display", 
> NULL);
> + ret = snprintf(buf, sizeof(buf), of_display_format, 
> display_number++);
> + if (ret >= sizeof(buf))
> + continue;


Can you make buf big enough to avoid that ?

And by the way could it be called something else than 'buf' ?

See exemple here : 
https://elixir.bootlin.com/linux/v6.1/source/drivers/fsi/fsi-occ.c#L690


> + dev = of_platform_device_create(node, buf, NULL);
>   if (WARN_ON(!dev))
>   return -ENOMEM;
>   boot_display = node;
> @@ -564,10 +569,9 @@ static int __init of_platform_default_populate_init(void)
>   }
>   
>   for_each_node_by_type(node, "display") {
> - char *buf[14];
>   if (!of_get_property(node, "linux,opened", NULL) || 
> node == boot_display)
>   continue;
> - ret = snprintf(buf, "of-display-%d", display_number++);
> + ret = snprintf(buf, sizeof(buf), of_display_format, 
> display_number++);
>   if (ret >= sizeof(buf))
>   continue;
>   of_platform_device_create(node, buf, NULL);


Re: [PATCH v2] of: Fix of platform build on powerpc due to bad of disaply code

2023-01-19 Thread Thomas Zimmermann



Am 19.01.23 um 10:53 schrieb Michal Suchanek:

The commit 2d681d6a23a1 ("of: Make of framebuffer devices unique")
breaks build because of wrong argument to snprintf. That certainly
avoids the runtime error but is not the intended outcome.

Also use standard device name format of-display.N for all created
devices.

Fixes: 2d681d6a23a1 ("of: Make of framebuffer devices unique")
Signed-off-by: Michal Suchanek 


Reviewed-by: Thomas Zimmermann 

Thanks again for taking care of this issue.


---
v2: Update the device name format
---
  drivers/of/platform.c | 12 
  1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index f2a5d679a324..8c1b1de22036 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -525,7 +525,9 @@ static int __init of_platform_default_populate_init(void)
if (IS_ENABLED(CONFIG_PPC)) {
struct device_node *boot_display = NULL;
struct platform_device *dev;
-   int display_number = 1;
+   int display_number = 0;
+   char buf[14];
+   char *of_display_format = "of-display.%d";
int ret;
  
  		/* Check if we have a MacOS display without a node spec */

@@ -556,7 +558,10 @@ static int __init of_platform_default_populate_init(void)
if (!of_get_property(node, "linux,opened", NULL) ||
!of_get_property(node, "linux,boot-display", NULL))
continue;
-   dev = of_platform_device_create(node, "of-display", 
NULL);
+   ret = snprintf(buf, sizeof(buf), of_display_format, 
display_number++);
+   if (ret >= sizeof(buf))
+   continue;
+   dev = of_platform_device_create(node, buf, NULL);
if (WARN_ON(!dev))
return -ENOMEM;
boot_display = node;
@@ -564,10 +569,9 @@ static int __init of_platform_default_populate_init(void)
}
  
  		for_each_node_by_type(node, "display") {

-   char *buf[14];
if (!of_get_property(node, "linux,opened", NULL) || 
node == boot_display)
continue;
-   ret = snprintf(buf, "of-display-%d", display_number++);
+   ret = snprintf(buf, sizeof(buf), of_display_format, 
display_number++);
if (ret >= sizeof(buf))
continue;
of_platform_device_create(node, buf, NULL);


--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


OpenPGP_signature
Description: OpenPGP digital signature


[PATCH v2] of: Fix of platform build on powerpc due to bad of disaply code

2023-01-19 Thread Michal Suchanek
The commit 2d681d6a23a1 ("of: Make of framebuffer devices unique")
breaks build because of wrong argument to snprintf. That certainly
avoids the runtime error but is not the intended outcome.

Also use standard device name format of-display.N for all created
devices.

Fixes: 2d681d6a23a1 ("of: Make of framebuffer devices unique")
Signed-off-by: Michal Suchanek 
---
v2: Update the device name format
---
 drivers/of/platform.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index f2a5d679a324..8c1b1de22036 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -525,7 +525,9 @@ static int __init of_platform_default_populate_init(void)
if (IS_ENABLED(CONFIG_PPC)) {
struct device_node *boot_display = NULL;
struct platform_device *dev;
-   int display_number = 1;
+   int display_number = 0;
+   char buf[14];
+   char *of_display_format = "of-display.%d";
int ret;
 
/* Check if we have a MacOS display without a node spec */
@@ -556,7 +558,10 @@ static int __init of_platform_default_populate_init(void)
if (!of_get_property(node, "linux,opened", NULL) ||
!of_get_property(node, "linux,boot-display", NULL))
continue;
-   dev = of_platform_device_create(node, "of-display", 
NULL);
+   ret = snprintf(buf, sizeof(buf), of_display_format, 
display_number++);
+   if (ret >= sizeof(buf))
+   continue;
+   dev = of_platform_device_create(node, buf, NULL);
if (WARN_ON(!dev))
return -ENOMEM;
boot_display = node;
@@ -564,10 +569,9 @@ static int __init of_platform_default_populate_init(void)
}
 
for_each_node_by_type(node, "display") {
-   char *buf[14];
if (!of_get_property(node, "linux,opened", NULL) || 
node == boot_display)
continue;
-   ret = snprintf(buf, "of-display-%d", display_number++);
+   ret = snprintf(buf, sizeof(buf), of_display_format, 
display_number++);
if (ret >= sizeof(buf))
continue;
of_platform_device_create(node, buf, NULL);
-- 
2.35.3



Re: [PATCH 17/41] mm/mmap: move VMA locking before anon_vma_lock_write call

2023-01-19 Thread Michal Hocko
On Wed 18-01-23 13:48:13, Suren Baghdasaryan wrote:
> On Wed, Jan 18, 2023 at 1:33 PM Michal Hocko  wrote:
[...]
> > So it will become:
> > Move VMA flag modification (which now implies VMA locking) before
> > vma_adjust_trans_huge() to ensure the modifications are done after VMA
> > has been locked. Because vma_adjust_trans_huge() modifies the VMA and such
> > modifications should be done under VMA write-lock protection.
> >
> > which is effectivelly saying
> > vma_adjust_trans_huge() modifies the VMA and such modifications should
> > be done under VMA write-lock protection so move VMA flag modifications
> > before so all of them are covered by the same write protection.
> >
> > right?
> 
> Yes, and the wording in the latter version is simpler to understand
> IMO, so I would like to adopt it. Do you agree?

of course.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] of: Make of framebuffer devices unique

2023-01-19 Thread Thomas Zimmermann

Hi

Am 19.01.23 um 10:01 schrieb Michal Suchánek:

On Thu, Jan 19, 2023 at 09:00:44AM +0100, Thomas Zimmermann wrote:

Hi Michal,

thanks for fixing this issue. But the review time was way too short. Please
see my comments below.

Am 18.01.23 um 22:46 schrieb Michal Suchánek:

On Wed, Jan 18, 2023 at 09:13:05PM +0100, Erhard F. wrote:

On Tue, 17 Jan 2023 17:58:04 +0100
Michal Suchanek  wrote:


Since Linux 5.19 this error is observed:

sysfs: cannot create duplicate filename '/devices/platform/of-display'

This is because multiple devices with the same name 'of-display' are
created on the same bus.

Update the code to create numbered device names for the non-boot
disaplay.

cc: linuxppc-dev@lists.ozlabs.org
References: https://bugzilla.kernel.org/show_bug.cgi?id=216095
Fixes: 52b1b46c39ae ("of: Create platform devices for OF framebuffers")
Reported-by: Erhard F. 
Suggested-by: Thomas Zimmermann 
Signed-off-by: Michal Suchanek 
---
   drivers/of/platform.c | 8 +++-
   1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 81c8c227ab6b..f2a5d679a324 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -525,6 +525,7 @@ static int __init of_platform_default_populate_init(void)
if (IS_ENABLED(CONFIG_PPC)) {
struct device_node *boot_display = NULL;
struct platform_device *dev;
+   int display_number = 1;
int ret;
/* Check if we have a MacOS display without a node spec */
@@ -561,10 +562,15 @@ static int __init of_platform_default_populate_init(void)
boot_display = node;
break;
}
+
for_each_node_by_type(node, "display") {
+   char *buf[14];


Another issue here: This should simply be buf[14]; not a pointer. Is 14 
chars enough for the string plus a full number?



if (!of_get_property(node, "linux,opened", NULL) || 
node == boot_display)
continue;
-   of_platform_device_create(node, "of-display", NULL);
+   ret = snprintf(buf, "of-display-%d", display_number++);


The second argument to snprintf() is sizeof(buf); the number of 
characters in buf.




Platform devices use a single dot (.) as separator before the index.
Counting starts at zero. See /sys/bus/platform/ for examples. Can we please
stick with that scheme? Generated names would then be of-display.0,
of-display.1, etc.


Yes, there was surprisingly no bikeshedding.

Do we also want to change the name of the device that did manage to
instantiate before?

This scheme changes the name only for those that did not in the past,
hence "of-display" and "of-display-%d", starting from 1.


I find that very confusing. It is is better to count all devices from 0. 
I don't expect this to be an issue for userspace. But if necessary, 
devfs can create softlinks to of-display.0.


Best regards
Thomas



Sure, replacing '-' with '.' is easy enough, and using the same format
for both as well.

Thanks

Michal



Best regards
Thomas




+   if (ret >= sizeof(buf))
+   continue;
+   of_platform_device_create(node, buf, NULL);
}
} else {
--
2.35.3



Thank you for the patch Michal!

It applies on 6.2-rc4 but I get this build error with my config:


Indeed, it's doubly bad.

Where is the kernel test robot when you need it?

It should not be that easy to miss this file but clearly it can happen.

I will send a fixup.

Sorry about the mess.

Thanks

Michal


--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev






--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH] of: Make of framebuffer devices unique

2023-01-19 Thread Michal Suchánek
On Thu, Jan 19, 2023 at 09:00:44AM +0100, Thomas Zimmermann wrote:
> Hi Michal,
> 
> thanks for fixing this issue. But the review time was way too short. Please
> see my comments below.
> 
> Am 18.01.23 um 22:46 schrieb Michal Suchánek:
> > On Wed, Jan 18, 2023 at 09:13:05PM +0100, Erhard F. wrote:
> > > On Tue, 17 Jan 2023 17:58:04 +0100
> > > Michal Suchanek  wrote:
> > > 
> > > > Since Linux 5.19 this error is observed:
> > > > 
> > > > sysfs: cannot create duplicate filename '/devices/platform/of-display'
> > > > 
> > > > This is because multiple devices with the same name 'of-display' are
> > > > created on the same bus.
> > > > 
> > > > Update the code to create numbered device names for the non-boot
> > > > disaplay.
> > > > 
> > > > cc: linuxppc-dev@lists.ozlabs.org
> > > > References: https://bugzilla.kernel.org/show_bug.cgi?id=216095
> > > > Fixes: 52b1b46c39ae ("of: Create platform devices for OF framebuffers")
> > > > Reported-by: Erhard F. 
> > > > Suggested-by: Thomas Zimmermann 
> > > > Signed-off-by: Michal Suchanek 
> > > > ---
> > > >   drivers/of/platform.c | 8 +++-
> > > >   1 file changed, 7 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > > > index 81c8c227ab6b..f2a5d679a324 100644
> > > > --- a/drivers/of/platform.c
> > > > +++ b/drivers/of/platform.c
> > > > @@ -525,6 +525,7 @@ static int __init 
> > > > of_platform_default_populate_init(void)
> > > > if (IS_ENABLED(CONFIG_PPC)) {
> > > > struct device_node *boot_display = NULL;
> > > > struct platform_device *dev;
> > > > +   int display_number = 1;
> > > > int ret;
> > > > /* Check if we have a MacOS display without a node spec 
> > > > */
> > > > @@ -561,10 +562,15 @@ static int __init 
> > > > of_platform_default_populate_init(void)
> > > > boot_display = node;
> > > > break;
> > > > }
> > > > +
> > > > for_each_node_by_type(node, "display") {
> > > > +   char *buf[14];
> > > > if (!of_get_property(node, "linux,opened", 
> > > > NULL) || node == boot_display)
> > > > continue;
> > > > -   of_platform_device_create(node, "of-display", 
> > > > NULL);
> > > > +   ret = snprintf(buf, "of-display-%d", 
> > > > display_number++);
> 
> Platform devices use a single dot (.) as separator before the index.
> Counting starts at zero. See /sys/bus/platform/ for examples. Can we please
> stick with that scheme? Generated names would then be of-display.0,
> of-display.1, etc.

Yes, there was surprisingly no bikeshedding.

Do we also want to change the name of the device that did manage to
instantiate before?

This scheme changes the name only for those that did not in the past,
hence "of-display" and "of-display-%d", starting from 1.

Sure, replacing '-' with '.' is easy enough, and using the same format
for both as well.

Thanks

Michal

> 
> Best regards
> Thomas
> 
> 
> 
> > > > +   if (ret >= sizeof(buf))
> > > > +   continue;
> > > > +   of_platform_device_create(node, buf, NULL);
> > > > }
> > > > } else {
> > > > -- 
> > > > 2.35.3
> > > > 
> > > 
> > > Thank you for the patch Michal!
> > > 
> > > It applies on 6.2-rc4 but I get this build error with my config:
> > 
> > Indeed, it's doubly bad.
> > 
> > Where is the kernel test robot when you need it?
> > 
> > It should not be that easy to miss this file but clearly it can happen.
> > 
> > I will send a fixup.
> > 
> > Sorry about the mess.
> > 
> > Thanks
> > 
> > Michal
> 
> -- 
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Ivo Totev





Re: [PATCH 01/11] ARM: dts: ti: Fix pca954x i2c-mux node names

2023-01-19 Thread Tony Lindgren
* Geert Uytterhoeven  [221202 18:50]:
> "make dtbs_check":
> 
> arch/arm/boot/dts/am3874-iceboard.dtb: pca9548@70: $nodename:0: 
> 'pca9548@70' does not match '^(i2c-?)?mux'
>   From schema: 
> Documentation/devicetree/bindings/i2c/i2c-mux-pca954x.yaml
> arch/arm/boot/dts/am3874-iceboard.dtb: pca9548@70: Unevaluated properties 
> are not allowed ('#address-cells', '#size-cells', 'i2c@0', 'i2c@1', 'i2c@2', 
> 'i2c@3', 'i2c@4', 'i2c@5', 'i2c@6', 'i2c@7' were unexpected)
>   From schema: 
> Documentation/devicetree/bindings/i2c/i2c-mux-pca954x.yaml
> ...
> 
> Fix this by renaming PCA954x nodes to "i2c-mux", to match the I2C bus
> multiplexer/switch DT bindings and the Generic Names Recommendation in
> the Devicetree Specification.

Applying this patch into omap-for-v6.3/dt thanks.

Tony


[PATCH] powerpc: add crtsavres.o to always-y instead of extra-y

2023-01-19 Thread Masahiro Yamada
crtsavres.o is linked to modules, but as commit d0e628cd817f ("kbuild:
doc: clarify the difference between extra-y and always-y") explained,
'make modules' does not build extra-y.

The following command fails:

  $ make ARCH=powerpc LLVM=1 mrproper ps3_defconfig modules
[snip]
LD [M]  arch/powerpc/platforms/cell/spufs/spufs.ko
  ld.lld: error: cannot open arch/powerpc/lib/crtsavres.o: No such file or 
directory
  make[1]: *** [scripts/Makefile.modfinal:61: 
arch/powerpc/platforms/cell/spufs/spufs.ko] Error 1
  make: *** [Makefile:1924: modules] Error 2

Signed-off-by: Masahiro Yamada 
---

 arch/powerpc/lib/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index c53618c34b70..aa34854bc9f5 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -43,7 +43,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION)+= 
error-inject.o
 # so it is only needed for modules, and only for older linkers which
 # do not support --save-restore-funcs
 ifndef CONFIG_LD_IS_BFD
-extra-$(CONFIG_PPC64)  += crtsavres.o
+always-$(CONFIG_PPC64) += crtsavres.o
 endif
 
 obj-$(CONFIG_PPC_BOOK3S_64) += copyuser_power7.o copypage_power7.o \
-- 
2.34.1



[PATCH] powerpc: remove checks for binutils older than 2.25

2023-01-19 Thread Masahiro Yamada
Commit e4412739472b ("Documentation: raise minimum supported version of
binutils to 2.25") allows us to remove the checks for old binutils.

There is no more user for ld-ifversion. Remove it as well.

Signed-off-by: Masahiro Yamada 
---

 arch/powerpc/Makefile | 22 +-
 arch/powerpc/lib/Makefile |  2 +-
 scripts/Makefile.compiler |  4 
 3 files changed, 2 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index dc4cbf0a5ca9..3d265b16c0ae 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -42,18 +42,13 @@ machine-$(CONFIG_PPC64) += 64
 machine-$(CONFIG_CPU_LITTLE_ENDIAN) += le
 UTS_MACHINE := $(subst $(space),,$(machine-y))
 
-# XXX This needs to be before we override LD below
-ifdef CONFIG_PPC32
-KBUILD_LDFLAGS_MODULE += arch/powerpc/lib/crtsavres.o
-else
-ifeq ($(call ld-ifversion, -ge, 22500, y),y)
+ifeq ($(CONFIG_PPC64)$(CONFIG_LD_IS_BFD),yy)
 # Have the linker provide sfpr if possible.
 # There is a corresponding test in arch/powerpc/lib/Makefile
 KBUILD_LDFLAGS_MODULE += --save-restore-funcs
 else
 KBUILD_LDFLAGS_MODULE += arch/powerpc/lib/crtsavres.o
 endif
-endif
 
 ifdef CONFIG_CPU_LITTLE_ENDIAN
 KBUILD_CFLAGS  += -mlittle-endian
@@ -389,8 +384,6 @@ vdso_prepare: prepare0
$(build)=arch/powerpc/kernel/vdso 
include/generated/vdso64-offsets.h)
 endif
 
-archprepare: checkbin
-
 archheaders:
$(Q)$(MAKE) $(build)=arch/powerpc/kernel/syscalls all
 
@@ -405,16 +398,3 @@ else
$(eval KBUILD_CFLAGS += -mstack-protector-guard-offset=$(shell awk '{if 
($$2 == "TASK_CANARY") print $$3;}' include/generated/asm-offsets.h))
 endif
 endif
-
-PHONY += checkbin
-# Check toolchain versions:
-# - gcc-4.6 is the minimum kernel-wide version so nothing required.
-checkbin:
-   @if test "x${CONFIG_LD_IS_LLD}" != "xy" -a \
-   "x$(call ld-ifversion, -le, 22400, y)" = "xy" ; then \
-   echo -n '*** binutils 2.24 miscompiles weak symbols ' ; \
-   echo 'in some circumstances.' ; \
-   echo'*** binutils 2.23 do not define the TOC symbol ' ; \
-   echo -n '*** Please use a different binutils version.' ; \
-   false ; \
-   fi
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 4de71cbf6e8e..c53618c34b70 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -42,7 +42,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION)+= 
error-inject.o
 # 64-bit linker creates .sfpr on demand for final link (vmlinux),
 # so it is only needed for modules, and only for older linkers which
 # do not support --save-restore-funcs
-ifeq ($(call ld-ifversion, -lt, 22500, y),y)
+ifndef CONFIG_LD_IS_BFD
 extra-$(CONFIG_PPC64)  += crtsavres.o
 endif
 
diff --git a/scripts/Makefile.compiler b/scripts/Makefile.compiler
index 3d8adfd34af1..ad07a4efc253 100644
--- a/scripts/Makefile.compiler
+++ b/scripts/Makefile.compiler
@@ -72,7 +72,3 @@ clang-min-version = $(call test-ge, $(CONFIG_CLANG_VERSION), 
$1)
 # ld-option
 # Usage: KBUILD_LDFLAGS += $(call ld-option, -X, -Y)
 ld-option = $(call try-run, $(LD) $(KBUILD_LDFLAGS) $(1) -v,$(1),$(2),$(3))
-
-# ld-ifversion
-# Usage:  $(call ld-ifversion, -ge, 22252, y)
-ld-ifversion = $(shell [ $(CONFIG_LD_VERSION)0 $(1) $(2)0 ] && echo $(3) || 
echo $(4))
-- 
2.34.1



Re: [PATCH] of: Make of framebuffer devices unique

2023-01-19 Thread Thomas Zimmermann

Hi Michal,

thanks for fixing this issue. But the review time was way too short. 
Please see my comments below.


Am 18.01.23 um 22:46 schrieb Michal Suchánek:

On Wed, Jan 18, 2023 at 09:13:05PM +0100, Erhard F. wrote:

On Tue, 17 Jan 2023 17:58:04 +0100
Michal Suchanek  wrote:


Since Linux 5.19 this error is observed:

sysfs: cannot create duplicate filename '/devices/platform/of-display'

This is because multiple devices with the same name 'of-display' are
created on the same bus.

Update the code to create numbered device names for the non-boot
disaplay.

cc: linuxppc-dev@lists.ozlabs.org
References: https://bugzilla.kernel.org/show_bug.cgi?id=216095
Fixes: 52b1b46c39ae ("of: Create platform devices for OF framebuffers")
Reported-by: Erhard F. 
Suggested-by: Thomas Zimmermann 
Signed-off-by: Michal Suchanek 
---
  drivers/of/platform.c | 8 +++-
  1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 81c8c227ab6b..f2a5d679a324 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -525,6 +525,7 @@ static int __init of_platform_default_populate_init(void)
if (IS_ENABLED(CONFIG_PPC)) {
struct device_node *boot_display = NULL;
struct platform_device *dev;
+   int display_number = 1;
int ret;
  
  		/* Check if we have a MacOS display without a node spec */

@@ -561,10 +562,15 @@ static int __init of_platform_default_populate_init(void)
boot_display = node;
break;
}
+
for_each_node_by_type(node, "display") {
+   char *buf[14];
if (!of_get_property(node, "linux,opened", NULL) || 
node == boot_display)
continue;
-   of_platform_device_create(node, "of-display", NULL);
+   ret = snprintf(buf, "of-display-%d", display_number++);


Platform devices use a single dot (.) as separator before the index. 
Counting starts at zero. See /sys/bus/platform/ for examples. Can we 
please stick with that scheme? Generated names would then be 
of-display.0, of-display.1, etc.


Best regards
Thomas




+   if (ret >= sizeof(buf))
+   continue;
+   of_platform_device_create(node, buf, NULL);
}
  
  	} else {

--
2.35.3



Thank you for the patch Michal!

It applies on 6.2-rc4 but I get this build error with my config:


Indeed, it's doubly bad.

Where is the kernel test robot when you need it?

It should not be that easy to miss this file but clearly it can happen.

I will send a fixup.

Sorry about the mess.

Thanks

Michal


--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


OpenPGP_signature
Description: OpenPGP digital signature