Re: [PATCH next v2 2/5] locking/osq_lock: Optimise the vcpu_is_preempted() check.

2024-01-07 Thread kernel test robot



Hello,

kernel test robot noticed a 10.7% improvement of 
stress-ng.netlink-task.ops_per_sec on:


commit: d93300891f810c9498d09a6ecea2403d7a3546f0 ("[PATCH next v2 2/5] 
locking/osq_lock: Optimise the vcpu_is_preempted() check.")
url: 
https://github.com/intel-lab-lkp/linux/commits/David-Laight/locking-osq_lock-Defer-clearing-node-locked-until-the-slow-osq_lock-path/20240101-055853
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 
610a9b8f49fbcf1100716370d3b5f6f884a2835a
patch link: 
https://lore.kernel.org/all/3a9d1782cd50436c99ced8c10175b...@acums.aculab.com/
patch subject: [PATCH next v2 2/5] locking/osq_lock: Optimise the 
vcpu_is_preempted() check.

testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz 
(Ice Lake) with 256G memory
parameters:

nr_threads: 100%
testtime: 60s
sc_pid_max: 4194304
class: scheduler
test: netlink-task
cpufreq_governor: performance






Details are as below:


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240108/202401081557.641738f5-oliver.s...@intel.com

=
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/sc_pid_max/tbox_group/test/testcase/testtime:
  
scheduler/gcc-12/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/4194304/lkp-icl-2sp8/netlink-task/stress-ng/60s

commit: 
  ff787c1fd0 ("locking/osq_lock: Defer clearing node->locked until the slow 
osq_lock() path.")
  d93300891f ("locking/osq_lock: Optimise the vcpu_is_preempted() check.")

ff787c1fd0c13f9b d93300891f810c9498d09a6ecea
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
      3880 ±  7%     +26.4%       4903 ± 18%  vmstat.system.cs
      0.48 ±126%     -99.8%       0.00 ±141%  perf-sched.sch_delay.avg.ms.__cond_resched.aa_sk_perm.security_socket_recvmsg.sock_recvmsg.__sys_recvfrom
      0.16 ± 23%     -38.9%       0.10 ± 32%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.genl_rcv_msg
      1.50 ±118%     -99.9%       0.00 ±142%  perf-sched.sch_delay.max.ms.__cond_resched.aa_sk_perm.security_socket_recvmsg.sock_recvmsg.__sys_recvfrom
      2.55 ± 97%     -83.7%       0.42 ±145%  perf-sched.wait_time.max.ms.__cond_resched.__mutex_lock.constprop.0.genl_rcv_msg
  32244865           +10.7%   35709040        stress-ng.netlink-task.ops
    537413           +10.7%     595150        stress-ng.netlink-task.ops_per_sec
     38094 ± 12%     +42.2%      54160 ± 27%  stress-ng.time.involuntary_context_switches
     42290 ± 11%     +39.8%      59117 ± 23%  stress-ng.time.voluntary_context_switches
    143.50 ±  7%     -28.8%     102.17 ± 15%  perf-c2c.DRAM.local
      4955 ±  3%     -26.4%       3647 ±  4%  perf-c2c.DRAM.remote
      4038 ±  2%     -18.8%       3277 ±  4%  perf-c2c.HITM.local
      3483 ±  3%     -21.1%       2747 ±  5%  perf-c2c.HITM.remote
      7521 ±  2%     -19.9%       6024 ±  4%  perf-c2c.HITM.total
      0.42 ±  3%     -16.2%       0.35 ±  5%  perf-stat.i.MPKI
 1.066e+10            +9.6%  1.169e+10        perf-stat.i.branch-instructions
     51.90            -2.5       49.42 ±  2%  perf-stat.i.cache-miss-rate%
  22517746 ±  3%     -13.4%   19503564 ±  5%  perf-stat.i.cache-misses
      3730 ±  7%     +29.2%       4819 ± 19%  perf-stat.i.context-switches
      3.99            -3.1%       3.86        perf-stat.i.cpi
      9535 ±  3%     +15.4%      11003 ±  5%  perf-stat.i.cycles-between-cache-misses
      0.00 ±  3%      +0.0        0.00 ±  3%  perf-stat.i.dTLB-load-miss-rate%
 1.419e+10           -14.9%  1.207e+10        perf-stat.i.dTLB-loads
 8.411e+08            +8.4%  9.118e+08        perf-stat.i.dTLB-stores
  5.36e+10            +3.1%  5.524e+10        perf-stat.i.instructions
      0.26            +7.0%       0.28 ±  5%  perf-stat.i.ipc
    837.29 ±  3%      -9.8%     755.30 ±  4%  perf-stat.i.metric.K/sec
    401.41            -4.1%     385.10        perf-stat.i.metric.M/sec
   6404966           -23.2%    4920722        perf-stat.i.node-load-misses
    141818 ±  4%     -22.2%     110404 ±  4%  perf-stat.i.node-loads
     69.54           +13.8       83.36        perf-stat.i.node-store-miss-rate%
   3935319           +10.4%    4345724        perf-stat.i.node-store-misses
   1626033           -52.6%     771187 ±  5%  perf-stat.i.node-stores
      0.42 ±  3%     -16.0%       0.35 ±  5%  perf-stat.overall.MPKI
      0.11            -0.0        0.10 ±  8%  perf-stat.overall.branch-miss-rate%
     51.32            -2.5       48.79 ±  2%  perf-stat.overall.cache-miss-rate%
      4.06            -3.0%       3.94        perf-stat.overall.cpi
      9677 ±  3%     +15.6%      11187 ±  5%

[PATCH v2 5/6] LoongArch: KVM: Add physical cpuid map support

2024-01-07 Thread Bibo Mao
Physical cpuid is used for irq routing by irqchips such as the ipi/msi/
extioi interrupt controllers. The physical cpuid is stored in the CSR
register LOONGARCH_CSR_CPUID and cannot be changed once the vcpu is
created. Since different irqchips define the physical cpuid with
different widths, KVM uses the smallest one (that of extioi), so the
max cpuid size is defined as 256.
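
As an illustration of how the map is meant to be consumed (this sketch is
not part of the patch; kvm_deliver_irq() is a made-up placeholder for
whatever delivery an emulated irqchip would actually perform):

	/* Illustrative only: route an interrupt to the vcpu that owns a
	 * given physical cpuid via the map added below. */
	static int route_irq_to_phys_cpuid(struct kvm *kvm, int cpuid)
	{
		struct kvm_vcpu *vcpu;

		vcpu = kvm_get_vcpu_by_cpuid(kvm, cpuid);
		if (!vcpu)
			return -EINVAL;

		kvm_deliver_irq(vcpu);	/* hypothetical delivery helper */
		return 0;
	}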

Signed-off-by: Bibo Mao 
---
 arch/loongarch/include/asm/kvm_host.h | 26 
 arch/loongarch/include/asm/kvm_vcpu.h |  1 +
 arch/loongarch/kvm/vcpu.c | 61 ++-
 arch/loongarch/kvm/vm.c   | 11 +
 4 files changed, 98 insertions(+), 1 deletion(-)

diff --git a/arch/loongarch/include/asm/kvm_host.h 
b/arch/loongarch/include/asm/kvm_host.h
index 0e89db020481..93acba84f87e 100644
--- a/arch/loongarch/include/asm/kvm_host.h
+++ b/arch/loongarch/include/asm/kvm_host.h
@@ -64,6 +64,30 @@ struct kvm_world_switch {
 
 #define MAX_PGTABLE_LEVELS 4
 
+/*
+ * Physical cpuid is used for interrupt routing; different hardware blocks
+ * define the physical cpuid width differently:
+ *  For the LOONGARCH_CSR_CPUID register, the max cpuid size is 512
+ *  For IPI HW, the max dest CPUID size is 1024
+ *  For the extioi interrupt controller, the max dest CPUID size is 256
+ *  For the MSI interrupt controller, the max supported CPUID size is 65536
+ *
+ * Currently the max CPUID is defined as 256 for the KVM hypervisor; in the
+ * future it may be expanded to 4096, covering at most 16 packages with
+ * up to 256 vcpus per package.
+ */
+#define KVM_MAX_PHYID  256
+
+struct kvm_phyid_info {
+   struct kvm_vcpu *vcpu;
+   boolenabled;
+};
+
+struct kvm_phyid_map {
+   int max_phyid;
+   struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
+};
+
 struct kvm_arch {
/* Guest physical mm */
kvm_pte_t *pgd;
@@ -71,6 +95,8 @@ struct kvm_arch {
unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
unsigned int  pte_shifts[MAX_PGTABLE_LEVELS];
unsigned int  root_level;
+   struct mutex  phyid_map_lock;
+   struct kvm_phyid_map  *phyid_map;
 
s64 time_offset;
struct kvm_context __percpu *vmcs;
diff --git a/arch/loongarch/include/asm/kvm_vcpu.h 
b/arch/loongarch/include/asm/kvm_vcpu.h
index 0e87652f780a..3019e260a3ae 100644
--- a/arch/loongarch/include/asm/kvm_vcpu.h
+++ b/arch/loongarch/include/asm/kvm_vcpu.h
@@ -61,6 +61,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
 void kvm_restore_timer(struct kvm_vcpu *vcpu);
 
 int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
+struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid);
 
 /*
  * Loongarch KVM guest interrupt handling
diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
index cf1c4d64c1b7..9dc40a80ab5a 100644
--- a/arch/loongarch/kvm/vcpu.c
+++ b/arch/loongarch/kvm/vcpu.c
@@ -274,6 +274,63 @@ static int _kvm_getcsr(struct kvm_vcpu *vcpu, unsigned int 
id, u64 *val)
return 0;
 }
 
+static inline int kvm_set_cpuid(struct kvm_vcpu *vcpu, u64 val)
+{
+   int cpuid;
+   struct loongarch_csrs *csr = vcpu->arch.csr;
+   struct kvm_phyid_map  *map;
+
+   if (val >= KVM_MAX_PHYID)
+   return -EINVAL;
+
+   cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
+   if (cpuid == 0) {
+   kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, val);
+   map = vcpu->kvm->arch.phyid_map;
+   map->phys_map[val].enabled  = true;
+   map->phys_map[val].vcpu = vcpu;
+
+   mutex_lock(&vcpu->kvm->arch.phyid_map_lock);
+   if (map->max_phyid < val)
+   map->max_phyid = val;
+   mutex_unlock(&vcpu->kvm->arch.phyid_map_lock);
+   } else if (cpuid != val)
+   return -EINVAL;
+
+   return 0;
+}
+
+struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid)
+{
+   struct kvm_phyid_map  *map;
+
+   if (cpuid >= KVM_MAX_PHYID)
+   return NULL;
+
+   map = kvm->arch.phyid_map;
+   if (map->phys_map[cpuid].enabled)
+   return map->phys_map[cpuid].vcpu;
+
+   return NULL;
+}
+
+static inline void kvm_drop_cpuid(struct kvm_vcpu *vcpu)
+{
+   int cpuid;
+   struct loongarch_csrs *csr = vcpu->arch.csr;
+   struct kvm_phyid_map  *map;
+
+   map = vcpu->kvm->arch.phyid_map;
+   cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
+   if (cpuid >= KVM_MAX_PHYID)
+   return;
+
+   if (map->phys_map[cpuid].enabled) {
+   map->phys_map[cpuid].vcpu = NULL;
+   map->phys_map[cpuid].enabled = false;
+   }
+}
+
 static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
 {
int ret = 0, gintc;
@@ -291,7 +348,8 @@ static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int 
id, u64 val)
kvm_set_sw_gcsr(csr, LOONGARCH_CSR_ESTAT, gintc);
 
return ret;
-   }
+  

[PATCH v2 6/6] LoongArch: Add pv ipi support on LoongArch system

2024-01-07 Thread Bibo Mao
On LoongArch systems, the ipi hardware uses iocsr registers: there is one
iocsr register access on the ipi sender side and two iocsr accesses on the
ipi receiver side (in the ipi interrupt handler). In VM mode every iocsr
register access traps into the hypervisor, so a single hardware ipi
notification causes three traps.

This patch adds pv ipi support for VMs: a hypercall instruction is used on
the ipi sender side, and the hypervisor injects an SWI into the VM. In the
SWI interrupt handler only the estat CSR register is written to clear the
irq, and estat CSR accesses do not trap into the hypervisor. So with pv
ipi, the sender traps into the hypervisor once and the receiver does not
trap at all, for a total of one trap per ipi.

This patch also adds ipi multicast support, using a method similar to x86.
With multicast support, one ipi notification can be sent to at most
64 vcpus at a time, which greatly reduces the number of traps into the
hypervisor.
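
A rough sketch of that multicast send path (not the code from this patch,
whose send side is truncated below; the hypercall argument layout here is
an assumption, and pv_send_ipi_mask_sketch() is a made-up name):

	static void pv_send_ipi_mask_sketch(const struct cpumask *mask,
					    unsigned int action)
	{
		unsigned int cpu, min;
		u64 bitmap = 0;

		if (cpumask_empty(mask))
			return;

		/* Pack targets into one 64-bit bitmap relative to the lowest cpu. */
		min = cpumask_first(mask);
		for_each_cpu(cpu, mask)
			bitmap |= 1ULL << (cpu - min);	/* assumes spread < 64 cpus */

		/* One trap for the whole group; KVM_HC_FUNC_IPI is added below. */
		kvm_hypercall3(KVM_HC_FUNC_IPI, bitmap, action, min);
	}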

Signed-off-by: Bibo Mao 
---
 arch/loongarch/include/asm/hardirq.h   |   1 +
 arch/loongarch/include/asm/kvm_host.h  |   1 +
 arch/loongarch/include/asm/kvm_para.h  | 124 +
 arch/loongarch/include/asm/loongarch.h |   1 +
 arch/loongarch/kernel/irq.c|   2 +-
 arch/loongarch/kernel/paravirt.c   | 110 ++
 arch/loongarch/kernel/smp.c|   2 +-
 arch/loongarch/kvm/exit.c  |  70 +-
 arch/loongarch/kvm/vcpu.c  |   1 +
 9 files changed, 308 insertions(+), 4 deletions(-)

diff --git a/arch/loongarch/include/asm/hardirq.h 
b/arch/loongarch/include/asm/hardirq.h
index 9f0038e19c7f..8a611843c1f0 100644
--- a/arch/loongarch/include/asm/hardirq.h
+++ b/arch/loongarch/include/asm/hardirq.h
@@ -21,6 +21,7 @@ enum ipi_msg_type {
 typedef struct {
unsigned int ipi_irqs[NR_IPI];
unsigned int __softirq_pending;
+   atomic_t messages ____cacheline_aligned_in_smp;
} ____cacheline_aligned irq_cpustat_t;
 
 DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/loongarch/include/asm/kvm_host.h 
b/arch/loongarch/include/asm/kvm_host.h
index 93acba84f87e..f21c60ce58be 100644
--- a/arch/loongarch/include/asm/kvm_host.h
+++ b/arch/loongarch/include/asm/kvm_host.h
@@ -43,6 +43,7 @@ struct kvm_vcpu_stat {
u64 idle_exits;
u64 cpucfg_exits;
u64 signal_exits;
+   u64 hvcl_exits;
 };
 
 #define KVM_MEM_HUGEPAGE_CAPABLE   (1UL << 0)
diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index 41200e922a82..a25a84e372b9 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -9,6 +9,10 @@
 #define HYPERVISOR_VENDOR_SHIFT8
 #define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + 
code)
 
+#define KVM_HC_CODE_SERVICE0
+#define KVM_HC_SERVICE HYPERCALL_CODE(HYPERVISOR_KVM, 
KVM_HC_CODE_SERVICE)
+#define  KVM_HC_FUNC_IPI   1
+
 /*
  * LoongArch hypcall return code
  */
@@ -16,6 +20,126 @@
 #define KVM_HC_INVALID_CODE-1UL
 #define KVM_HC_INVALID_PARAMETER   -2UL
 
+/*
+ * Hypercall interface for the KVM hypervisor
+ *
+ * a0: function identifier
+ * a1-a6: up to six arguments
+ *
+ * The return value is placed in v0.
+ */
+static __always_inline long kvm_hypercall(u64 fid)
+{
+   register long ret asm("v0");
+   register unsigned long fun asm("a0") = fid;
+
+   __asm__ __volatile__(
+   "hvcl "__stringify(KVM_HC_SERVICE)
+   : "=r" (ret)
+   : "r" (fun)
+   : "memory"
+   );
+
+   return ret;
+}
+
+static __always_inline long kvm_hypercall1(u64 fid, unsigned long arg0)
+{
+   register long ret asm("v0");
+   register unsigned long fun asm("a0") = fid;
+   register unsigned long a1  asm("a1") = arg0;
+
+   __asm__ __volatile__(
+   "hvcl "__stringify(KVM_HC_SERVICE)
+   : "=r" (ret)
+   : "r" (fun), "r" (a1)
+   : "memory"
+   );
+
+   return ret;
+}
+
+static __always_inline long kvm_hypercall2(u64 fid,
+   unsigned long arg0, unsigned long arg1)
+{
+   register long ret asm("v0");
+   register unsigned long fun asm("a0") = fid;
+   register unsigned long a1  asm("a1") = arg0;
+   register unsigned long a2  asm("a2") = arg1;
+
+   __asm__ __volatile__(
+   "hvcl "__stringify(KVM_HC_SERVICE)
+   : "=r" (ret)
+   : "r" (fun), "r" (a1), "r" (a2)
+   : "memory"
+   );
+
+   return ret;
+}
+
+static __always_inline long kvm_hypercall3(u64 fid,
+   unsigned long arg0, unsigned long arg1, unsigned long arg2)
+{
+   register long ret asm("v0");
+   register unsigned long fun asm("a0") = fid;
+   register unsigned long a1  asm("a1") = arg0;
+   register unsigned long a2  

[PATCH v2 4/6] LoongArch: Add paravirt interface for guest kernel

2024-01-07 Thread Bibo Mao
This patch adds a paravirt interface for the guest kernel. The function
pv_guest_init() first checks whether the system is running in VM mode;
if so, it calls kvm_para_available() to detect whether the current VMM
is the KVM hypervisor. The paravirt functions only work when the VMM is
KVM, which is the only hypervisor currently supported on LoongArch.
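
For context, the pv_steal_clock static call added below is only plumbing;
a pv time backend would hook it roughly as follows. This is a sketch only:
para_steal_clock() and pv_time_init_sketch() are hypothetical names, not
part of this patch.

	static u64 para_steal_clock(int cpu)
	{
		/* e.g. read per-cpu steal time published by the hypervisor */
		return 0;
	}

	static void __init pv_time_init_sketch(void)
	{
		static_call_update(pv_steal_clock, para_steal_clock);
		static_key_slow_inc(&paravirt_steal_enabled);
	}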

Signed-off-by: Bibo Mao 
---
 arch/loongarch/Kconfig|  9 
 arch/loongarch/include/asm/kvm_para.h |  7 
 arch/loongarch/include/asm/paravirt.h | 27 
 .../include/asm/paravirt_api_clock.h  |  1 +
 arch/loongarch/kernel/Makefile|  1 +
 arch/loongarch/kernel/paravirt.c  | 41 +++
 arch/loongarch/kernel/setup.c |  2 +
 7 files changed, 88 insertions(+)
 create mode 100644 arch/loongarch/include/asm/paravirt.h
 create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
 create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index ee123820a476..d8ccaf46a50d 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -564,6 +564,15 @@ config CPU_HAS_PREFETCH
bool
default y
 
+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+ the kernel is theoretically slower and slightly larger.
+
 config ARCH_SUPPORTS_KEXEC
def_bool y
 
diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index 9425d3b7e486..41200e922a82 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
 #ifndef _ASM_LOONGARCH_KVM_PARA_H
 #define _ASM_LOONGARCH_KVM_PARA_H
 
+/*
+ * Hypcall code field
+ */
+#define HYPERVISOR_KVM 1
+#define HYPERVISOR_VENDOR_SHIFT8
+#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + 
code)
+
 /*
  * LoongArch hypcall return code
  */
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..b64813592ba0
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_guest_init(void);
+#else
+static inline int pv_guest_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 
 obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 
 obj-$(CONFIG_SMP)  += smp.o
 
diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
new file mode 100644
index ..21d01d05791a
--- /dev/null
+++ b/arch/loongarch/kernel/paravirt.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct static_key paravirt_steal_enabled;
+struct static_key paravirt_steal_rq_enabled;
+
+static u64 native_steal_clock(int cpu)
+{
+   return 0;
+}
+
+DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
+
+static bool kvm_para_available(void)
+{
+   static int hypervisor_type;
+   int config;
+
+   if (!hypervisor_type) {
+   config = read_cpucfg(CPUCFG_KVM_SIG);
+   if (!memcmp(&config, KVM_SIGNATURE, 4))
+   hypervisor_type = HYPERVISOR_KVM;
+   }
+
+   return hypervisor_type == HYPERVISOR_KVM;
+}
+
+int __init pv_guest_init(void)
+{
+   if (!cpu_has_hypervisor)
+   return 0;
+   if (!kvm_para_available())
+   return 0;
+
+   return 1;
+}
diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
index d183a745fb85..fa680bdd0bd1 

[PATCH v2 3/6] LoongArch: SMP: Refine ipi ops on LoongArch platform

2024-01-07 Thread Bibo Mao
This patch refines ipi handling on the LoongArch platform. It makes
three changes:

1. Add a generic get_percpu_irq() API and replace the percpu irq helpers
get_ipi_irq/get_pmc_irq/get_timer_irq with it.

2. Change the meaning of the action parameter of loongson_send_ipi_single()
and loongson_send_ipi_mask(). The sender now passes an action code rather
than a bitmap: the ipi hardware takes the action code and converts it into
a bitmap in the ipi message buffer, so the ipi receiver still sees the
bitmap encoding.

3. Add smp_ops on the LoongArch platform so that pv ipi can be plugged in
later (see the sketch below).
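
A minimal sketch of what point 3 enables (illustrative only; pv_ipi_single()
and pv_ipi_init_sketch() are hypothetical names, and the real pv
implementation arrives in patch 6/6):

	static void pv_ipi_single(int cpu, unsigned int action)
	{
		/* hypercall-based sender, see patch 6/6 */
	}

	static void __init pv_ipi_init_sketch(void)
	{
		smp_ops.call_func_single_ipi = pv_ipi_single;
		/* smp_ops.call_func_ipi and smp_ops.ipi_init can be overridden too */
	}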

Signed-off-by: Bibo Mao 
---
 arch/loongarch/include/asm/hardirq.h |  4 ++
 arch/loongarch/include/asm/irq.h | 10 -
 arch/loongarch/include/asm/smp.h | 31 +++
 arch/loongarch/kernel/irq.c  | 22 +--
 arch/loongarch/kernel/perf_event.c   | 14 +--
 arch/loongarch/kernel/smp.c  | 58 +++-
 arch/loongarch/kernel/time.c | 12 +-
 7 files changed, 71 insertions(+), 80 deletions(-)

diff --git a/arch/loongarch/include/asm/hardirq.h 
b/arch/loongarch/include/asm/hardirq.h
index 0ef3b18f8980..9f0038e19c7f 100644
--- a/arch/loongarch/include/asm/hardirq.h
+++ b/arch/loongarch/include/asm/hardirq.h
@@ -12,6 +12,10 @@
 extern void ack_bad_irq(unsigned int irq);
 #define ack_bad_irq ack_bad_irq
 
+enum ipi_msg_type {
+   IPI_RESCHEDULE,
+   IPI_CALL_FUNCTION,
+};
 #define NR_IPI 2
 
 typedef struct {
diff --git a/arch/loongarch/include/asm/irq.h b/arch/loongarch/include/asm/irq.h
index 218b4da0ea90..00101b6d601e 100644
--- a/arch/loongarch/include/asm/irq.h
+++ b/arch/loongarch/include/asm/irq.h
@@ -117,8 +117,16 @@ extern struct fwnode_handle *liointc_handle;
 extern struct fwnode_handle *pch_lpc_handle;
 extern struct fwnode_handle *pch_pic_handle[MAX_IO_PICS];
 
-extern irqreturn_t loongson_ipi_interrupt(int irq, void *dev);
+static inline int get_percpu_irq(int vector)
+{
+   struct irq_domain *d;
+
+   d = irq_find_matching_fwnode(cpuintc_handle, DOMAIN_BUS_ANY);
+   if (d)
+   return irq_create_mapping(d, vector);
 
+   return -EINVAL;
+}
 #include 
 
 #endif /* _ASM_IRQ_H */
diff --git a/arch/loongarch/include/asm/smp.h b/arch/loongarch/include/asm/smp.h
index f81e5f01d619..330f1cb3741c 100644
--- a/arch/loongarch/include/asm/smp.h
+++ b/arch/loongarch/include/asm/smp.h
@@ -12,6 +12,13 @@
 #include 
 #include 
 
+struct smp_ops {
+   void (*call_func_ipi)(const struct cpumask *mask, unsigned int action);
+   void (*call_func_single_ipi)(int cpu, unsigned int action);
+   void (*ipi_init)(void);
+};
+
+extern struct smp_ops smp_ops;
 extern int smp_num_siblings;
 extern int num_processors;
 extern int disabled_cpus;
@@ -24,8 +31,6 @@ void loongson_prepare_cpus(unsigned int max_cpus);
 void loongson_boot_secondary(int cpu, struct task_struct *idle);
 void loongson_init_secondary(void);
 void loongson_smp_finish(void);
-void loongson_send_ipi_single(int cpu, unsigned int action);
-void loongson_send_ipi_mask(const struct cpumask *mask, unsigned int action);
 #ifdef CONFIG_HOTPLUG_CPU
 int loongson_cpu_disable(void);
 void loongson_cpu_die(unsigned int cpu);
@@ -59,9 +64,12 @@ extern int __cpu_logical_map[NR_CPUS];
 
 #define cpu_physical_id(cpu)   cpu_logical_map(cpu)
 
-#define SMP_BOOT_CPU   0x1
-#define SMP_RESCHEDULE 0x2
-#define SMP_CALL_FUNCTION  0x4
+#define ACTTION_BOOT_CPU   0
+#define ACTTION_RESCHEDULE 1
+#define ACTTION_CALL_FUNCTION  2
+#define SMP_BOOT_CPU   BIT(ACTTION_BOOT_CPU)
+#define SMP_RESCHEDULE BIT(ACTTION_RESCHEDULE)
+#define SMP_CALL_FUNCTION  BIT(ACTTION_CALL_FUNCTION)
 
 struct secondary_data {
unsigned long stack;
@@ -71,7 +79,8 @@ extern struct secondary_data cpuboot_data;
 
 extern asmlinkage void smpboot_entry(void);
 extern asmlinkage void start_secondary(void);
-
+extern void arch_send_call_function_single_ipi(int cpu);
+extern void arch_send_call_function_ipi_mask(const struct cpumask *mask);
 extern void calculate_cpu_foreign_map(void);
 
 /*
@@ -79,16 +88,6 @@ extern void calculate_cpu_foreign_map(void);
  */
 extern void show_ipi_list(struct seq_file *p, int prec);
 
-static inline void arch_send_call_function_single_ipi(int cpu)
-{
-   loongson_send_ipi_single(cpu, SMP_CALL_FUNCTION);
-}
-
-static inline void arch_send_call_function_ipi_mask(const struct cpumask *mask)
-{
-   loongson_send_ipi_mask(mask, SMP_CALL_FUNCTION);
-}
-
 #ifdef CONFIG_HOTPLUG_CPU
 static inline int __cpu_disable(void)
 {
diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
index 883e5066ae44..1b58f7c3eed9 100644
--- a/arch/loongarch/kernel/irq.c
+++ b/arch/loongarch/kernel/irq.c
@@ -87,23 +87,9 @@ static void __init init_vec_parent_group(void)
acpi_table_parse(ACPI_SIG_MCFG, early_pci_mcfg_parse);
 }
 
-static int __init get_ipi_irq(void)
-{
-   struct irq_domain *d 

[PATCH v2 0/6] LoongArch: Add pv ipi support on LoongArch VM

2024-01-07 Thread Bibo Mao
This patchset adds pv ipi support for VMs. On a physical machine the ipi
hardware uses IOCSR registers, but in VM mode every vcpu access to an IOCSR
register traps into the hypervisor. SWI is an interrupt mechanism similar
to SGI on ARM: software can send an interrupt to a CPU, except that on
LoongArch an SWI can currently only be sent to the local CPU. So SWI cannot
be used for IPI on real hardware, but it can be used in a VM when combined
with a hypercall. This patchset uses SWI for the IPI mechanism, with SWI
injection done via hypercall; sending an IPI then costs one trap, and the
SWI interrupt handler costs none.

This patchset passes the runltp testcases, and its unixbench score is 99%
of the physical machine's on a single-way 3C5000. Here are the unixbench
scores with 16 cores on a single-way 3C5000 machine.

UnixBench score on 3C5000 machine with 16 cores
Dhrystone 2 using register variables          116700.0  339749961.8   29113.1
Double-Precision Whetstone                        55.0      57716.9   10494.0
Execl Throughput                                  43.0      33563.4    7805.4
File Copy 1024 bufsize 2000 maxblocks           3960.0    1017912.5    2570.5
File Copy 256 bufsize 500 maxblocks             1655.0     260061.4    1571.4
File Copy 4096 bufsize 8000 maxblocks           5800.0    3216109.4    5545.0
Pipe Throughput                                12440.0   18404312.0   14794.5
Pipe-based Context Switching                    4000.0    3395856.2    8489.6
Process Creation                                 126.0      55684.8    4419.4
Shell Scripts (1 concurrent)                      42.4      55901.8   13184.4
Shell Scripts (8 concurrent)                       6.0       7396.5   12327.5
System Call Overhead                           15000.0    6997351.4    4664.9
System Benchmarks Index Score                                          7288.6

UnixBench score on VM with 16 cores
Dhrystone 2 using register variables          116700.0  341649555.5   29275.9
Double-Precision Whetstone                        55.0      57490.9   10452.9
Execl Throughput                                  43.0      33663.8    7828.8
File Copy 1024 bufsize 2000 maxblocks           3960.0    1047631.2    2645.5
File Copy 256 bufsize 500 maxblocks             1655.0     286671.0    1732.2
File Copy 4096 bufsize 8000 maxblocks           5800.0    3243588.2    5592.4
Pipe Throughput                                12440.0   16353087.8   13145.6
Pipe-based Context Switching                    4000.0    3100690.0    7751.7
Process Creation                                 126.0      51502.1    4087.5
Shell Scripts (1 concurrent)                      42.4      56665.3   13364.4
Shell Scripts (8 concurrent)                       6.0       7412.1   12353.4
System Call Overhead                           15000.0    6962239.6    4641.5
System Benchmarks Index Score                                          7205.8

---
Change in V2:
  1. Add hw cpuid map support since ipi routing uses hw cpuid
  2. Refine changelog description
  3. Add hypercall statistic support for vcpu
  4. Set percpu pv ipi message buffer aligned with cacheline
  5. Refine pv ipi send logic: do not send an ipi message if there is
already a pending ipi message.
---

Bibo Mao (6):
  LoongArch: KVM: Add hypercall instruction emulation support
  LoongArch: KVM: Add cpucfg area for kvm hypervisor
  LoongArch/smp: Refine ipi ops on LoongArch platform
  LoongArch: Add paravirt interface for guest kernel
  LoongArch: KVM: Add physical cpuid map support
  LoongArch: Add pv ipi support on LoongArch system

 arch/loongarch/Kconfig|   9 +
 arch/loongarch/include/asm/Kbuild |   1 -
 arch/loongarch/include/asm/hardirq.h  |   5 +
 arch/loongarch/include/asm/inst.h |   1 +
 arch/loongarch/include/asm/irq.h  |  10 +-
 arch/loongarch/include/asm/kvm_host.h |  27 +++
 arch/loongarch/include/asm/kvm_para.h | 157 ++
 arch/loongarch/include/asm/kvm_vcpu.h |   1 +
 arch/loongarch/include/asm/loongarch.h|  10 ++
 arch/loongarch/include/asm/paravirt.h |  27 +++
 .../include/asm/paravirt_api_clock.h  |   1 +
 arch/loongarch/include/asm/smp.h  |  31 ++--
 arch/loongarch/include/uapi/asm/Kbuild|   2 -
 arch/loongarch/kernel/Makefile|   1 +
 arch/loongarch/kernel/irq.c   |  24 +--
 arch/loongarch/kernel/paravirt.c  | 151 +
 arch/loongarch/kernel/perf_event.c|  14 +-
 arch/loongarch/kernel/setup.c |   2 +
 arch/loongarch/kernel/smp.c   |  60 ---
 arch/loongarch/kernel/time.c  |  12 +-
 arch/loongarch/kvm/exit.c | 122 --
 arch/loongarch/kvm/vcpu.c |  62 ++-
 arch/loongarch/kvm/vm.c   |  11 ++
 23 files changed, 639 insertions(+), 102 deletions(-)
 create 

[PATCH v2 2/6] LoongArch: KVM: Add cpucfg area for kvm hypervisor

2024-01-07 Thread Bibo Mao
The system traps into the hypervisor when a guest executes the cpucfg
instruction. Hardware currently only uses cpucfg areas 0 - 20, so the
area 0x1000 -- 0x10ff is reserved here for the KVM hypervisor, and the
scheme can be extended for other hypervisors in the future.
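
Guest-side consumption of this window looks roughly as follows; this is a
sketch that mirrors kvm_para_available() added in patch 4/6 of this series
(guest_on_kvm() is a made-up name):

	static bool guest_on_kvm(void)
	{
		int sig = read_cpucfg(CPUCFG_KVM_SIG);

		/* KVM_SIGNATURE is "KVM\0", returned by the emulation below */
		return !memcmp(&sig, KVM_SIGNATURE, 4);
	}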

Signed-off-by: Bibo Mao 
---
 arch/loongarch/include/asm/inst.h  |  1 +
 arch/loongarch/include/asm/loongarch.h |  9 +
 arch/loongarch/kvm/exit.c  | 46 +-
 3 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/arch/loongarch/include/asm/inst.h 
b/arch/loongarch/include/asm/inst.h
index d8f637f9e400..ad120f924905 100644
--- a/arch/loongarch/include/asm/inst.h
+++ b/arch/loongarch/include/asm/inst.h
@@ -67,6 +67,7 @@ enum reg2_op {
revhd_op= 0x11,
extwh_op= 0x16,
extwb_op= 0x17,
+   cpucfg_op   = 0x1b,
iocsrrdb_op = 0x19200,
iocsrrdh_op = 0x19201,
iocsrrdw_op = 0x19202,
diff --git a/arch/loongarch/include/asm/loongarch.h 
b/arch/loongarch/include/asm/loongarch.h
index 46366e783c84..a03b466555a1 100644
--- a/arch/loongarch/include/asm/loongarch.h
+++ b/arch/loongarch/include/asm/loongarch.h
@@ -158,6 +158,15 @@
 #define  CPUCFG48_VFPU_CG  BIT(2)
 #define  CPUCFG48_RAM_CG   BIT(3)
 
+/*
+ * cpucfg index area: 0x1000 -- 0x10ff
+ * SW emulation for KVM hypervisor
+ */
+#define CPUCFG_KVM_BASE0x1000UL
+#define CPUCFG_KVM_SIZE0x100
+#define CPUCFG_KVM_SIG CPUCFG_KVM_BASE
+#define  KVM_SIGNATURE "KVM\0"
+#define CPUCFG_KVM_FEATURE (CPUCFG_KVM_BASE + 4)
 #ifndef __ASSEMBLY__
 
 /* CSR */
diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
index 59e5fe221982..e233d7b3b76d 100644
--- a/arch/loongarch/kvm/exit.c
+++ b/arch/loongarch/kvm/exit.c
@@ -206,10 +206,37 @@ int kvm_emu_idle(struct kvm_vcpu *vcpu)
return EMULATE_DONE;
 }
 
-static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
+static int kvm_emu_cpucfg(struct kvm_vcpu *vcpu, larch_inst inst)
 {
int rd, rj;
unsigned int index;
+
+   rd = inst.reg2_format.rd;
+   rj = inst.reg2_format.rj;
+   ++vcpu->stat.cpucfg_exits;
+   index = vcpu->arch.gprs[rj];
+
+   /*
+* By LoongArch Reference Manual 2.2.10.5
+* Return value is 0 for undefined cpucfg index
+*/
+   switch (index) {
+   case 0 ... (KVM_MAX_CPUCFG_REGS - 1):
+   vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
+   break;
+   case CPUCFG_KVM_SIG:
+   vcpu->arch.gprs[rd] = *(unsigned int *)KVM_SIGNATURE;
+   break;
+   default:
+   vcpu->arch.gprs[rd] = 0;
+   break;
+   }
+
+   return EMULATE_DONE;
+}
+
+static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
+{
unsigned long curr_pc;
larch_inst inst;
enum emulation_result er = EMULATE_DONE;
@@ -224,21 +251,8 @@ static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
er = EMULATE_FAIL;
switch (((inst.word >> 24) & 0xff)) {
case 0x0: /* CPUCFG GSPR */
-   if (inst.reg2_format.opcode == 0x1B) {
-   rd = inst.reg2_format.rd;
-   rj = inst.reg2_format.rj;
-   ++vcpu->stat.cpucfg_exits;
-   index = vcpu->arch.gprs[rj];
-   er = EMULATE_DONE;
-   /*
-* By LoongArch Reference Manual 2.2.10.5
-* return value is 0 for undefined cpucfg index
-*/
-   if (index < KVM_MAX_CPUCFG_REGS)
-   vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
-   else
-   vcpu->arch.gprs[rd] = 0;
-   }
+   if (inst.reg2_format.opcode == cpucfg_op)
+   er = kvm_emu_cpucfg(vcpu, inst);
break;
case 0x4: /* CSR{RD,WR,XCHG} GSPR */
er = kvm_handle_csr(vcpu, inst);
-- 
2.39.3




[PATCH v2 1/6] LoongArch: KVM: Add hypercall instruction emulation support

2024-01-07 Thread Bibo Mao
On LoongArch systems, the hypercall instruction is supported when the
system runs in VM mode. This patch adds a dummy handler that emulates
the hypercall instruction, rather than injecting an EXCCODE_INE
invalid-instruction exception.
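
From the guest's point of view the effect is simply this (a sketch only:
kvm_hypercall() is added in patch 6/6, the code value is arbitrary, and
probe_hypercall_sketch() is a hypothetical name):

	static void probe_hypercall_sketch(void)
	{
		long ret = kvm_hypercall(0);	/* any code; all are unserviced for now */

		/* No EXCCODE_INE is injected; KVM just returns an error in a0. */
		if (ret == KVM_HC_INVALID_CODE)
			pr_info("hvcl executed, code not serviced by KVM\n");
	}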

Signed-off-by: Bibo Mao 
---
 arch/loongarch/include/asm/Kbuild  |  1 -
 arch/loongarch/include/asm/kvm_para.h  | 26 ++
 arch/loongarch/include/uapi/asm/Kbuild |  2 --
 arch/loongarch/kvm/exit.c  | 10 ++
 4 files changed, 36 insertions(+), 3 deletions(-)
 create mode 100644 arch/loongarch/include/asm/kvm_para.h
 delete mode 100644 arch/loongarch/include/uapi/asm/Kbuild

diff --git a/arch/loongarch/include/asm/Kbuild 
b/arch/loongarch/include/asm/Kbuild
index 93783fa24f6e..22991a6f0e2b 100644
--- a/arch/loongarch/include/asm/Kbuild
+++ b/arch/loongarch/include/asm/Kbuild
@@ -23,4 +23,3 @@ generic-y += poll.h
 generic-y += param.h
 generic-y += posix_types.h
 generic-y += resource.h
-generic-y += kvm_para.h
diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
new file mode 100644
index ..9425d3b7e486
--- /dev/null
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_KVM_PARA_H
+#define _ASM_LOONGARCH_KVM_PARA_H
+
+/*
+ * LoongArch hypcall return code
+ */
+#define KVM_HC_STATUS_SUCCESS  0
+#define KVM_HC_INVALID_CODE-1UL
+#define KVM_HC_INVALID_PARAMETER   -2UL
+
+static inline unsigned int kvm_arch_para_features(void)
+{
+   return 0;
+}
+
+static inline unsigned int kvm_arch_para_hints(void)
+{
+   return 0;
+}
+
+static inline bool kvm_check_and_clear_guest_paused(void)
+{
+   return false;
+}
+#endif /* _ASM_LOONGARCH_KVM_PARA_H */
diff --git a/arch/loongarch/include/uapi/asm/Kbuild 
b/arch/loongarch/include/uapi/asm/Kbuild
deleted file mode 100644
index 4aa680ca2e5f..
--- a/arch/loongarch/include/uapi/asm/Kbuild
+++ /dev/null
@@ -1,2 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-generic-y += kvm_para.h
diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
index e708a1786d6b..59e5fe221982 100644
--- a/arch/loongarch/kvm/exit.c
+++ b/arch/loongarch/kvm/exit.c
@@ -650,6 +650,15 @@ static int kvm_handle_fpu_disabled(struct kvm_vcpu *vcpu)
return RESUME_GUEST;
 }
 
+static int kvm_handle_hypcall(struct kvm_vcpu *vcpu)
+{
+   update_pc(>arch);
+
+   /* Treat it as a noop instruction; only set the return value */
+   vcpu->arch.gprs[LOONGARCH_GPR_A0] = KVM_HC_INVALID_CODE;
+   return RESUME_GUEST;
+}
+
 /*
  * LoongArch KVM callback handling for unimplemented guest exiting
  */
@@ -679,6 +688,7 @@ static exit_handle_fn kvm_fault_tables[EXCCODE_INT_START] = 
{
[EXCCODE_TLBM]  = kvm_handle_write_fault,
[EXCCODE_FPDIS] = kvm_handle_fpu_disabled,
[EXCCODE_GSPR]  = kvm_handle_gspr,
+   [EXCCODE_HVC]   = kvm_handle_hypcall,
 };
 
 int kvm_handle_fault(struct kvm_vcpu *vcpu, int fault)
-- 
2.39.3




Re: [PATCH v6 2/3] vduse: Temporarily fail if control queue features requested

2024-01-07 Thread Jason Wang
On Fri, Jan 5, 2024 at 6:14 PM Maxime Coquelin
 wrote:
>
>
>
> On 1/5/24 10:59, Eugenio Perez Martin wrote:
> > On Fri, Jan 5, 2024 at 9:12 AM Maxime Coquelin
> >  wrote:
> >>
> >>
> >>
> >> On 1/5/24 03:45, Jason Wang wrote:
> >>> On Thu, Jan 4, 2024 at 11:38 PM Maxime Coquelin
> >>>  wrote:
> 
>  Virtio-net driver control queue implementation is not safe
>  when used with VDUSE. If the VDUSE application does not
>  reply to control queue messages, it currently ends up
>  hanging the kernel thread sending this command.
> 
>  Some work is on-going to make the control queue
>  implementation robust with VDUSE. Until it is completed,
>  let's fail features check if any control-queue related
>  feature is requested.
> 
>  Signed-off-by: Maxime Coquelin 
>  ---
> drivers/vdpa/vdpa_user/vduse_dev.c | 13 +
> 1 file changed, 13 insertions(+)
> 
>  diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
>  b/drivers/vdpa/vdpa_user/vduse_dev.c
>  index 0486ff672408..94f54ea2eb06 100644
>  --- a/drivers/vdpa/vdpa_user/vduse_dev.c
>  +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
>  @@ -28,6 +28,7 @@
> #include 
> #include 
> #include 
>  +#include 
> #include 
> 
> #include "iova_domain.h"
>  @@ -46,6 +47,15 @@
> 
> #define IRQ_UNBOUND -1
> 
>  +#define VDUSE_NET_INVALID_FEATURES_MASK \
>  +   (BIT_ULL(VIRTIO_NET_F_CTRL_VQ) |\
>  +BIT_ULL(VIRTIO_NET_F_CTRL_RX)   |  \
>  +BIT_ULL(VIRTIO_NET_F_CTRL_VLAN) |  \
>  +BIT_ULL(VIRTIO_NET_F_GUEST_ANNOUNCE) | \
>  +BIT_ULL(VIRTIO_NET_F_MQ) | \
>  +BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR) |  \
>  +BIT_ULL(VIRTIO_NET_F_RSS))
> >>>
> >>> We need to make this as well:
> >>>
> >>> VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> >>
> >> I missed it, and see others have been added in the Virtio spec
> >> repository (BTW, I see this specific one is missing in the dependency
> >> list [0], I will submit a patch).
> >>
> >> I wonder if it is not simpler to just check whether
> >> VIRTIO_NET_F_CTRL_VQ is requested. As we fail instead of masking out,
> >> the VDUSE driver won't be the one violating the spec so it should be
> >> good?
> >>
> >> It will avoid having to update the mask if new features depending on it
> >> are added (or forgetting to update it).
> >>
> >> WDYT?
> >>
> >
> > I think it is safer to work with a whitelist, instead of a blacklist.
> > As any new feature might require code changes in QEMU. Is that
> > possible?
>
> Well, that's how it was done in previous revision. :)
>
> I changed to a blacklist for consistency with block device's WCE feature
> check after Jason's comment.
>
> I'm not sure moving back to a whitelist brings much advantages when
> compared to the effort of keeping it up to date. Just blacklisting
> VIRTIO_NET_F_CTRL_VQ is enough in my opinion.

I think this makes sense.

Thanks
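
For reference, the CTRL_VQ-only variant being discussed would look roughly
like this inside features_is_valid() (a sketch of the proposal, not the
posted patch):

	/* Reject any config that could expose a control queue; the other
	 * control-path features depend on VIRTIO_NET_F_CTRL_VQ per the spec. */
	if (config->device_id == VIRTIO_ID_NET &&
	    (config->features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ)))
		return false;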

>
> Thanks,
> Maxime
>
> >> Thanks,
> >> Maxime
> >>
> >> [0]:
> >> https://github.com/oasis-tcs/virtio-spec/blob/5fc35a7efb903fc352da81a6d2be5c01810b68d3/device-types/net/description.tex#L129
> >>> Other than this,
> >>>
> >>> Acked-by: Jason Wang 
> >>>
> >>> Thanks
> >>>
>  +
> struct vduse_virtqueue {
>    u16 index;
>    u16 num_max;
>  @@ -1680,6 +1690,9 @@ static bool features_is_valid(struct 
>  vduse_dev_config *config)
>    if ((config->device_id == VIRTIO_ID_BLOCK) &&
>    (config->features & (1ULL << 
>  VIRTIO_BLK_F_CONFIG_WCE)))
>    return false;
>  +   else if ((config->device_id == VIRTIO_ID_NET) &&
>  +   (config->features & 
>  VDUSE_NET_INVALID_FEATURES_MASK))
>  +   return false;
> 
>    return true;
> }
>  --
>  2.43.0
> 
> >>>
> >>
> >>
> >
>




[PATCH] tracing histograms: Simplify parse_actions() function

2024-01-07 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

The parse_actions() function uses 'len = str_has_prefix()' to test which
action is in the string being parsed, but then it repeats the same logic
for each different action. This can be simplified and the duplicate code
removed, since 'len' already contains the length of the matched prefix
and can be used for all actions.
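
For reference, str_has_prefix() returns the length of the matched prefix
(or 0 on no match), which is what makes a single 'len' usable for every
action; a minimal illustration:

	len = str_has_prefix(str, "onmax(");	/* 6 when str starts with "onmax(" */
	if (len)
		action_str = str + len;		/* points just past the '(' */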

Link: https://lore.kernel.org/all/20240107112044.6702c...@gandalf.local.home/

Signed-off-by: Steven Rostedt (Google) 
---
 kernel/trace/trace_events_hist.c | 49 
 1 file changed, 24 insertions(+), 25 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 5ecf3c8bde20..6ece1308d36a 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -4805,36 +4805,35 @@ static int parse_actions(struct hist_trigger_data 
*hist_data)
int len;
 
for (i = 0; i < hist_data->attrs->n_actions; i++) {
+   enum handler_id hid = 0;
+   char *action_str;
+
str = hist_data->attrs->action_str[i];
 
-   if ((len = str_has_prefix(str, "onmatch("))) {
-   char *action_str = str + len;
+   if ((len = str_has_prefix(str, "onmatch(")))
+   hid = HANDLER_ONMATCH;
+   else if ((len = str_has_prefix(str, "onmax(")))
+   hid = HANDLER_ONMAX;
+   else if ((len = str_has_prefix(str, "onchange(")))
+   hid = HANDLER_ONCHANGE;
 
-   data = onmatch_parse(tr, action_str);
-   if (IS_ERR(data)) {
-   ret = PTR_ERR(data);
-   break;
-   }
-   } else if ((len = str_has_prefix(str, "onmax("))) {
-   char *action_str = str + len;
+   action_str = str + len;
 
-   data = track_data_parse(hist_data, action_str,
-   HANDLER_ONMAX);
-   if (IS_ERR(data)) {
-   ret = PTR_ERR(data);
-   break;
-   }
-   } else if ((len = str_has_prefix(str, "onchange("))) {
-   char *action_str = str + len;
+   switch (hid) {
+   case HANDLER_ONMATCH:
+   data = onmatch_parse(tr, action_str);
+   break;
+   case HANDLER_ONMAX:
+   case HANDLER_ONCHANGE:
+   data = track_data_parse(hist_data, action_str, hid);
+   break;
+   default:
+   data = ERR_PTR(-EINVAL);
+   break;
+   }
 
-   data = track_data_parse(hist_data, action_str,
-   HANDLER_ONCHANGE);
-   if (IS_ERR(data)) {
-   ret = PTR_ERR(data);
-   break;
-   }
-   } else {
-   ret = -EINVAL;
+   if (IS_ERR(data)) {
+   ret = PTR_ERR(data);
break;
}
 
-- 
2.42.0




Re: [PATCH v5 11/34] function_graph: Have the instances use their own ftrace_ops for filtering

2024-01-07 Thread Google
On Fri, 5 Jan 2024 17:09:10 +
Mark Rutland  wrote:

> On Mon, Dec 18, 2023 at 10:13:46PM +0900, Masami Hiramatsu (Google) wrote:
> > From: Steven Rostedt (VMware) 
> > 
> > Allow for instances to have their own ftrace_ops part of the fgraph_ops
> > that makes the funtion_graph tracer filter on the set_ftrace_filter file
> > of the instance and not the top instance.
> > 
> > This also change how the function_graph handles multiple instances on the
> > shadow stack. Previously we use ARRAY type entries to record which one
> > is enabled, and this makes it a bitmap of the fgraph_array's indexes.
> > Previous function_graph_enter() expects calling back from
> > prepare_ftrace_return() function which is called back only once if it is
> > enabled. But this introduces different ftrace_ops for each fgraph
> > instance and those are called from ftrace_graph_func() one by one. Thus
> > we can not loop on the fgraph_array(), and need to reuse the ret_stack
> > pushed by the previous instance. Finding the ret_stack is easy because
> > we can check the ret_stack->func. But that is not enough for the self-
> > recursive tail-call case. Thus fgraph uses the bitmap entry to find it
> > is already set (this means that entry is for previous tail call).
> > 
> > Signed-off-by: Steven Rostedt (VMware) 
> > Signed-off-by: Masami Hiramatsu (Google) 
> 
> As a heads-up, while testing the topic/fprobe-on-fgraph branch on arm64, I get
> a warning which bisets down to this commit:

Hmm, so does this happen when enabling function graph tracer?

> 
> | Testing tracer function_graph: 
> | [ cut here ]
> | WARNING: CPU: 2 PID: 0 at arch/arm64/kernel/stacktrace.c:84 
> arch_stack_walk+0x3c0/0x3d8
> | Modules linked in:
> | CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.7.0-rc2-00026-gea1e68a341c2 #12
> | Hardware name: linux,dummy-virt (DT)
> | pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> | pc : arch_stack_walk+0x3c0/0x3d8
> | lr : arch_stack_walk+0x260/0x3d8
> | sp : 80008318be00
> | x29: 80008318be00 x28: 03c0ae80 x27: 
> | x26:  x25: 03c0ae80 x24: 
> | x23: 8000800234c8 x22: 80008002dc30 x21: 800080035d10
> | x20: 80008318bee8 x19: 800080023460 x18: 800083453c68
> | x17:  x16: 800083188000 x15: 8ccc5058
> | x14: 0004 x13: 800082b8c4f0 x12: 
> | x11: 800081fba9b0 x10: 80008318bff0 x9 : 800080010798
> | x8 : 80008002dc30 x7 : 03c0ae80 x6 : 
> | x5 :  x4 : 8000832a3c18 x3 : 80008318bff0
> | x2 : 80008002dc30 x1 : 80008002dc30 x0 : 80008002dc30
> | Call trace:
> |  arch_stack_walk+0x3c0/0x3d8
> |  return_address+0x40/0x80
> |  trace_hardirqs_on+0x8c/0x198
> |  __do_softirq+0xe8/0x440
> | ---[ end trace  ]---
> 
> That's a warning in arm64's unwind_recover_return_address() function, which
> fires when ftrace_graph_ret_addr() finds return_to_handler:
> 
>   if (state->task->ret_stack &&
>   (state->pc == (unsigned long)return_to_handler)) {
>   unsigned long orig_pc;
>   orig_pc = ftrace_graph_ret_addr(state->task, NULL, state->pc,
>   (void *)state->fp);
>   if (WARN_ON_ONCE(state->pc == orig_pc))
>   return -EINVAL;
>   state->pc = orig_pc;
>   }
> 
> The rationale there is that since tail calls are (currently) disabled on 
> arm64,
> the only reason for ftrace_graph_ret_addr() to return return_to_handler is 
> when
> it fails to find the original return address.

Yes. What about the FP check?

> 
> Does this change make it legitimate for ftrace_graph_ret_addr() to return
> return_to_handler in other cases, or is that a bug?

It should be a bug to be fixed.

> 
> Either way, we'll need *some* way to recover the original return addresss...

At least it needs to dump the shadow stack so that we can analyze what
happened. 

Thank you!

> 
> Mark.
> 
> > ---
> >  Changes in v4:
> >   - Simplify get_ret_stack() sanity check and use WARN_ON_ONCE() for
> > obviously wrong value.
> >   - Do not check ret == return_to_handler but always read the previous
> > ret_stack in ftrace_push_return_trace() to check it is reusable.
> >   - Set the bit 0 of the bitmap entry always in function_graph_enter()
> > because it uses bit 0 to check re-usability.
> >   - Fix to ensure the ret_stack entry is bitmap type when checking the
> > bitmap.
> >  Changes in v3:
> >   - Pass current fgraph_ops to the new entry handler
> >(function_graph_enter_ops) if fgraph use ftrace.
> >   - Add fgraph_ops::idx in this patch.
> >   - Replace the array type with the bitmap type so that it can record
> > which fgraph is called.
> >   - Fix some helper function to use passed task_struct instead of current.
> >   - Reduce the ret-index size to 

Re: [PATCH v5 22/34] tracing: Rename ftrace_regs_return_value to ftrace_regs_get_return_value

2024-01-07 Thread Google
On Fri, 5 Jan 2024 17:14:06 +
Mark Rutland  wrote:

> On Mon, Dec 18, 2023 at 10:15:59PM +0900, Masami Hiramatsu (Google) wrote:
> > From: Masami Hiramatsu (Google) 
> > 
> > Rename ftrace_regs_return_value to ftrace_regs_get_return_value as same as
> > other ftrace_regs_get/set_* APIs.
> > 
> > Signed-off-by: Masami Hiramatsu (Google) 
> 
> Acked-by: Mark Rutland 
> 
> Since this is a trivial cleanup, it might make sense to move this to the start
> of the series, so that it can be queued even if the later parts need more 
> work.

Thanks for your Ack! and OK, let me move this to the start of this series.

> 
> Mark.
> 
> > ---
> >  Changes in v3:
> >   - Newly added.
> > ---
> >  arch/loongarch/include/asm/ftrace.h |2 +-
> >  arch/powerpc/include/asm/ftrace.h   |2 +-
> >  arch/s390/include/asm/ftrace.h  |2 +-
> >  arch/x86/include/asm/ftrace.h   |2 +-
> >  include/linux/ftrace.h  |2 +-
> >  kernel/trace/fgraph.c   |2 +-
> >  6 files changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/loongarch/include/asm/ftrace.h 
> > b/arch/loongarch/include/asm/ftrace.h
> > index a11996eb5892..a9c3d0f2f941 100644
> > --- a/arch/loongarch/include/asm/ftrace.h
> > +++ b/arch/loongarch/include/asm/ftrace.h
> > @@ -70,7 +70,7 @@ ftrace_regs_set_instruction_pointer(struct ftrace_regs 
> > *fregs, unsigned long ip)
> > regs_get_kernel_argument(&(fregs)->regs, n)
> >  #define ftrace_regs_get_stack_pointer(fregs) \
> > kernel_stack_pointer(&(fregs)->regs)
> > -#define ftrace_regs_return_value(fregs) \
> > +#define ftrace_regs_get_return_value(fregs) \
> > regs_return_value(&(fregs)->regs)
> >  #define ftrace_regs_set_return_value(fregs, ret) \
> > regs_set_return_value(&(fregs)->regs, ret)
> > diff --git a/arch/powerpc/include/asm/ftrace.h 
> > b/arch/powerpc/include/asm/ftrace.h
> > index 9e5a39b6a311..7e138e0e3baf 100644
> > --- a/arch/powerpc/include/asm/ftrace.h
> > +++ b/arch/powerpc/include/asm/ftrace.h
> > @@ -69,7 +69,7 @@ ftrace_regs_get_instruction_pointer(struct ftrace_regs 
> > *fregs)
> > regs_get_kernel_argument(&(fregs)->regs, n)
> >  #define ftrace_regs_get_stack_pointer(fregs) \
> > kernel_stack_pointer(&(fregs)->regs)
> > -#define ftrace_regs_return_value(fregs) \
> > +#define ftrace_regs_get_return_value(fregs) \
> > regs_return_value(&(fregs)->regs)
> >  #define ftrace_regs_set_return_value(fregs, ret) \
> > regs_set_return_value(&(fregs)->regs, ret)
> > diff --git a/arch/s390/include/asm/ftrace.h b/arch/s390/include/asm/ftrace.h
> > index 5a82b08f03cd..01e775c98425 100644
> > --- a/arch/s390/include/asm/ftrace.h
> > +++ b/arch/s390/include/asm/ftrace.h
> > @@ -88,7 +88,7 @@ ftrace_regs_set_instruction_pointer(struct ftrace_regs 
> > *fregs,
> > regs_get_kernel_argument(&(fregs)->regs, n)
> >  #define ftrace_regs_get_stack_pointer(fregs) \
> > kernel_stack_pointer(&(fregs)->regs)
> > -#define ftrace_regs_return_value(fregs) \
> > +#define ftrace_regs_get_return_value(fregs) \
> > regs_return_value(&(fregs)->regs)
> >  #define ftrace_regs_set_return_value(fregs, ret) \
> > regs_set_return_value(&(fregs)->regs, ret)
> > diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
> > index 0b306c82855d..a061f8832b20 100644
> > --- a/arch/x86/include/asm/ftrace.h
> > +++ b/arch/x86/include/asm/ftrace.h
> > @@ -64,7 +64,7 @@ arch_ftrace_get_regs(struct ftrace_regs *fregs)
> > regs_get_kernel_argument(&(fregs)->regs, n)
> >  #define ftrace_regs_get_stack_pointer(fregs) \
> > kernel_stack_pointer(&(fregs)->regs)
> > -#define ftrace_regs_return_value(fregs) \
> > +#define ftrace_regs_get_return_value(fregs) \
> > regs_return_value(&(fregs)->regs)
> >  #define ftrace_regs_set_return_value(fregs, ret) \
> > regs_set_return_value(&(fregs)->regs, ret)
> > diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> > index 79875a00c02b..da2a23f5a9ed 100644
> > --- a/include/linux/ftrace.h
> > +++ b/include/linux/ftrace.h
> > @@ -187,7 +187,7 @@ static __always_inline bool ftrace_regs_has_args(struct 
> > ftrace_regs *fregs)
> > regs_get_kernel_argument(ftrace_get_regs(fregs), n)
> >  #define ftrace_regs_get_stack_pointer(fregs) \
> > kernel_stack_pointer(ftrace_get_regs(fregs))
> > -#define ftrace_regs_return_value(fregs) \
> > +#define ftrace_regs_get_return_value(fregs) \
> > regs_return_value(ftrace_get_regs(fregs))
> >  #define ftrace_regs_set_return_value(fregs, ret) \
> > regs_set_return_value(ftrace_get_regs(fregs), ret)
> > diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
> > index 088432b695a6..9a60acaacc96 100644
> > --- a/kernel/trace/fgraph.c
> > +++ b/kernel/trace/fgraph.c
> > @@ -783,7 +783,7 @@ static void fgraph_call_retfunc(struct ftrace_regs 
> > *fregs,
> > trace.rettime = trace_clock_local();
> >  #ifdef CONFIG_FUNCTION_GRAPH_RETVAL
> > if (fregs)
> > -   trace.retval = ftrace_regs_return_value(fregs);

Re: [PATCH] tracefs/eventfs: Use root and instance inodes as default ownership

2024-01-07 Thread Steven Rostedt
On Sun, 7 Jan 2024 13:29:12 -0500
Steven Rostedt  wrote:

> > 
> > IOW, the inode_permission() in lookup_one_len() that eventfs does is
> > redundant and just wrong.  
> 
> I don't think so.

Just to make it clear. eventfs has nothing to do with mkdir instance/foo.
It exists without that. Although one rationale for eventfs was so
that the instance directories wouldn't recreate the same tens of
thousands of event inodes and dentries for every mkdir done.

-- Steve



Re: Re: [PATCH v3 1/3] kasan: switch kunit tests to console tracepoints

2024-01-07 Thread Paul Heidekrüger
On 12.12.2023 10:32, Marco Elver wrote:
> On Tue, 12 Dec 2023 at 10:19, Paul Heidekrüger  
> wrote:
> >
> > On 12.12.2023 00:37, Andrey Konovalov wrote:
> > > On Tue, Dec 12, 2023 at 12:35 AM Paul Heidekrüger
> > >  wrote:
> > > >
> > > > Using CONFIG_FTRACE=y instead of CONFIG_TRACEPOINTS=y produces the same 
> > > > error
> > > > for me.
> > > >
> > > > So
> > > >
> > > > CONFIG_KUNIT=y
> > > > CONFIG_KUNIT_ALL_TESTS=n
> > > > CONFIG_FTRACE=y
> > > > CONFIG_KASAN=y
> > > > CONFIG_KASAN_GENERIC=y
> > > > CONFIG_KASAN_KUNIT_TEST=y
> > > >
> > > > produces
> > > >
> > > > ➜   ./tools/testing/kunit/kunit.py run 
> > > > --kunitconfig=mm/kasan/.kunitconfig --arch=arm64
> > > > Configuring KUnit Kernel ...
> > > > Regenerating .config ...
> > > > Populating config with:
> > > > $ make ARCH=arm64 O=.kunit olddefconfig CC=clang
> > > > ERROR:root:Not all Kconfig options selected in kunitconfig were 
> > > > in the generated .config.
> > > > This is probably due to unsatisfied dependencies.
> > > > Missing: CONFIG_KASAN_KUNIT_TEST=y
> > > >
> > > > By that error message, CONFIG_FTRACE appears to be present in the 
> > > > generated
> > > > config, but CONFIG_KASAN_KUNIT_TEST still isn't. Presumably,
> > > > CONFIG_KASAN_KUNIT_TEST is missing because of an unsatisfied 
> > > > dependency, which
> > > > must be CONFIG_TRACEPOINTS, unless I'm missing something ...
> > > >
> > > > If I just generate an arm64 defconfig and select CONFIG_FTRACE=y,
> > > > CONFIG_TRACEPOINTS=y shows up in my .config. So, maybe this is 
> > > > kunit.py-related
> > > > then?
> > > >
> > > > Andrey, you said that the tests have been working for you; are you 
> > > > running them
> > > > with kunit.py?
> > >
> > > No, I just run the kernel built with a config file that I put together
> > > based on defconfig.
> >
> > Ah. I believe I've figured it out.
> >
> > When I add CONFIG_STACK_TRACER=y in addition to CONFIG_FTRACE=y, it works.
> 
> CONFIG_FTRACE should be enough - maybe also check x86 vs. arm64 to debug more.

See below.

> > CONFIG_STACK_TRACER selects CONFIG_FUNCTION_TRACER, CONFIG_FUNCTION_TRACER
> > selects CONFIG_GENERIC_TRACER, CONFIG_GENERIC_TRACER selects 
> > CONFIG_TRACING, and
> > CONFIG_TRACING selects CONFIG_TRACEPOINTS.
> >
> > CONFIG_BLK_DEV_IO_TRACE=y also works instead of CONFIG_STACK_TRACER=y, as it
> > directly selects CONFIG_TRACEPOINTS.
> >
> > CONFIG_FTRACE=y on its own does not appear to suffice for kunit.py on arm64.
> 
> When you build manually with just CONFIG_FTRACE, is CONFIG_TRACEPOINTS 
> enabled?

When I add CONFIG_FTRACE and enter-key my way through the FTRACE prompts - I 
believe because CONFIG_FTRACE is a menuconfig? - at the beginning of a build, 
CONFIG_TRACEPOINTS does get set on arm64, yes.

On x86, the defconfig already includes CONFIG_TRACEPOINTS.

I also had a closer look at how kunit.py builds its configs.
I believe it does something along the following lines:

cp <kunitconfig> .kunit/.config
make ARCH=arm64 O=.kunit olddefconfig

On arm64, that isn't enough to set CONFIG_TRACEPOINTS; same behaviour when run 
outside of kunit.py.

For CONFIG_TRACEPOINTS, `make ARCH=arm64 menuconfig` shows:

Symbol: TRACEPOINTS [=n]
Type  : bool
Defined at init/Kconfig:1920
Selected by [n]:
- TRACING [=n]
- BLK_DEV_IO_TRACE [=n] && FTRACE [=y] && SYSFS [=y] && BLOCK [=y]

So, CONFIG_TRACING and CONFIG_BLK_DEV_IO_TRACE are the only two options that
select CONFIG_TRACEPOINTS; since neither is enabled, CONFIG_TRACEPOINTS stays
unset on arm64.

For CONFIG_TRACING we have:

Symbol: TRACING [=n]
Type  : bool
Defined at kernel/trace/Kconfig:157
Selects: RING_BUFFER [=n] && STACKTRACE [=y] && TRACEPOINTS [=n] && NOP_TRACER [=n] && BINARY_PRINTF [=n] && EVENT_TRACING [=n] && TRACE_CLOCK [=y] && TASKS_RCU [=n]
Selected by [n]:
- DRM_I915_TRACE_GEM [=n] && HAS_IOMEM [=y] && DRM_I915 [=n] && EXPERT [=n] && DRM_I915_DEBUG_GEM [=n]
- DRM_I915_TRACE_GTT [=n] && HAS_IOMEM [=y] && DRM_I915 [=n] && EXPERT [=n] && DRM_I915_DEBUG_GEM [=n]
- PREEMPTIRQ_TRACEPOINTS [=n] && (TRACE_PREEMPT_TOGGLE [=n] || TRACE_IRQFLAGS [=n])
- GENERIC_TRACER [=n]
- ENABLE_DEFAULT_TRACERS [=n] && FTRACE [=y] && !GENERIC_TRACER [=n]
- FPROBE_EVENTS [=n] && FTRACE [=y] && FPROBE [=n] && HAVE_REGS_AND_STACK_ACCESS_API [=y]
- KPROBE_EVENTS [=n] && FTRACE [=y] && KPROBES [=n] && HAVE_REGS_AND_STACK_ACCESS_API [=y]
- UPROBE_EVENTS [=n] && FTRACE [=y] && ARCH_SUPPORTS_UPROBES [=y] && MMU [=y] && PERF_EVENTS [=n]
- SYNTH_EVENTS [=n] && FTRACE [=y]
- USER_EVENTS [=n] && FTRACE [=y]
- HIST_TRIGGERS [=n] && FTRACE [=y] && ARCH_HAVE_NMI_SAFE_CMPXCHG [=y]
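
So one working combination on arm64, based on the configs quoted earlier in
this thread, is the following .kunitconfig (CONFIG_STACK_TRACER=y works
equally well in place of CONFIG_BLK_DEV_IO_TRACE=y):

CONFIG_KUNIT=y
CONFIG_KUNIT_ALL_TESTS=n
CONFIG_FTRACE=y
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
CONFIG_KASAN_KUNIT_TEST=y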

> > I believe the reason my .kunitconfig as well as the 

Re: [PATCH] tracefs/eventfs: Use root and instance inodes as default ownership

2024-01-07 Thread Steven Rostedt
On Sun, 7 Jan 2024 13:42:39 +0100
Christian Brauner  wrote:
> 
> So, I tried to do an exploratory patch even though I promised myself not
> to do it. But hey...
> 
> Some notes:
> 
> * Permission handling for idmapped mounts is done completely in the
>   VFS. That's the case for all filesytems that don't have a custom
>   ->permission() handler. So there's nothing to do for us here.  
> 
> * Idmapped mount handling for ->getattr() is done completely by the VFS
>   if the filesystem doesn't have a custom ->getattr() handler. So we're
>   done here.
> 
> * Tracefs doesn't support attribute changes via ->setattr() (chown(),
>   chmod etc.). So there's nothing to here.
> 
> * Eventfs does support attribute changes via ->setattr(). But it relies
>   on simple_setattr() which is already idmapped mount aware. So there's
>   nothing for us to do.
> 
> * Ownership is inherited from the parent inode (tracefs) or optionally
>   from stashed ownership information (eventfs). That means the idmapping
>   is irrelevant here. It's similar to the "inherit gid from parent
>   directory" logic we have in some circumstances. TL;DR nothing to do
>   here as well.

The reason ownership is inherited from the parent is because the inodes
are created at boot up before user space starts.

eventfs inodes are created on demand after user space starts, so it
needs to either use the "default" ownership and permissions, or, if the
user changed a specific file/directory, save that change and apply it
again if the inode/dentry are reclaimed and later referenced and
recreated.
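
In pseudo-code the rule is roughly the following (illustrative only, not
the actual eventfs code; "saved_attr" stands for whatever eventfs stashed
when the user chown()ed/chmod()ed the entry):

	if (saved_attr) {			/* user changed this entry before */
		inode->i_uid  = saved_attr->ia_uid;
		inode->i_gid  = saved_attr->ia_gid;
		inode->i_mode = saved_attr->ia_mode;
	} else {				/* default */
		inode->i_uid  = parent->i_uid;
		inode->i_gid  = parent->i_gid;
	}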

> 
> * Tracefs supports the creation of instances from userspace via mkdir.
>   For example,
> 
>   mkdir /sys/kernel/tracing/instances/foo
> 
>   And here the idmapping is relevant so we need to make the helpers
>   aware of the idmapping.
> 
>   I just went and plumbed this through to most helpers.
> 
> There's some subtlety in eventfs. Afaict, the directories and files for
> the individual events are created on-demand during lookup or readdir.
> 
> The ownership of these events is again inherited from the parent inode
> or recovered from stored state. In both cases the actual idmapping is
> irrelevant.
> 
> The callchain here is:
> 
> eventfs_root_lookup("xfs", "events")
> -> create_{dir,file}_dentry("xfs", "events")
>-> create_{dir,file}("xfs", "events")
>   -> eventfs_start_creating("xfs", "events")
>  -> lookup_one_len("xfs", "events")  
> 
> And the subtlety is that lookup_one_len() does permission checking on
> the parent inode (IOW, if you want a dentry for "blech" under "events"
> it'll do a permission check on events->d_inode) for exec permissions
> and then goes on to give you a new dentry.
> 
> Usually this call would have to be changed to lookup_one() and the
> idmapping be handed down to it. But I think that's irrelevant here.
> 
> Lookup generally doesn't need to be aware of idmappings at all. The
> permission checking is done purely in the vfs via may_lookup() and the
> idmapping is irrelevant because we always initialize inodes with the
> filesystem-level ownership (see the idmappings.rst documentation if
> you're interested in excessive details; otherwise you get inode aliases,
> which you really don't want).
> 
> For tracefs it would not matter for lookup per se but only because
> tracefs seemingly creates inodes/dentries during lookup (and readdir()).

tracefs creates the inodes/dentries at boot up; it's eventfs that does
it on demand during lookup.

For inodes/dentries:

 /sys/kernel/tracing/* is all created at boot up, except for "events".
 /sys/kernel/tracing/events/* is created on demand.

> 
> But imho the permission checking done in current eventfs_root_lookup()
> via lookup_one_len() is meaningless in any way; possibly even
> (conceptually) wrong.
> 
> Because, the actual permission checking for the creation of the eventfs
> entries isn't really done during lookup or readdir, it's done when mkdir
> is called:
> 
> mkdir /sys/kernel/tracing/instances/foo

No. That creates an entire new tracefs instance, which happens to
include another eventfs directory.

The eventfs directories are "/sys/kernel/tracing/events" and
"/sys/kernel/tracing/instances/*/events".

eventfs has tens of thousands of files and directories, which is why I
changed it to create inodes/dentries on demand and reclaim them when
they are no longer accessed.

 # ls /sys/kernel/tracing/events/

will create the inodes and dentries, and a memory stress program will
free those created inodes and dentries.

> 
> Here, all possible entries beneath foo including "events" and further
> below are recorded and stored. So once mkdir returns it basically means
> that it succeeded with the creation of all the necessary directories and
> files. For all purposes the foo/events/ directory and below have all the
> entries that matter. They have been created. It's comparable to them not
> being in the {d,i}cache, I guess.

No. Only the meta data is created for the eventfs 

Re: [PATCH] tracefs/eventfs: Use root and instance inodes as default ownership

2024-01-07 Thread Christian Brauner
On Sun, Jan 07, 2024 at 06:42:33PM +0100, Christian Brauner wrote:
> On Sun, Jan 07, 2024 at 01:42:39PM +0100, Christian Brauner wrote:
> > > > So tracefs supports remounting with different uid/gid mount options and
> > > > then actually wades through _all_ of the inodes and changes their
> > > > ownership internally? What's the use-case for this? Containers?
> > > 
> > > No, in fact tracing doesn't work well with containers as tracing is global
> > > to the entire machine. It can work with privileged containers though.
> > 
> > At least the tracefs interface is easily supportable within a delegation
> > model. IOW, you have a privileged process that delegates relevant
> > portions to a container via idmapped mounts _without_ doing the insane thing
> > and making it mountable by a container aka the fs-to-CVE pipeline.
> > 
> > > 
> > > The reason for this is because tracefs was based off of debugfs where the
> > > files and directories are created at boot up and mounted later. The reason
> > > to do this was to allow users to mount with gid=GID to allow a given group
> > > to have access to tracing. Without this update, tracefs would ignore it
> > > like debugfs and proc does today.
> > > 
> > > I think it's time I explain the purpose of tracefs and how it came to be.
> > > 
> > > The tracing system required a way to control tracing and read the traces.
> > > It could have just used a new system like perf (although
> > > /sys/kernel/debug/tracing predates perf), where it created a single 
> > > ioctl()
> > > like system call to do everything.
> > > 
> > > As the ftrace tracing came from PREEMPT_RT latency tracer and my own 
> > > logdev
> > > tracer, which both have an embedded background, I chose an interface that
> > > could work with just an unmodified version of busybox. That is, I wanted 
> > > it
> > > to work with just cat and echo.
> > > 
> > > The main difference with tracefs compared to other file systems is that it
> > > is a control interface, where writes happen as much as reads. The data 
> > > read
> > > is controlled. The closest thing I can think of is how cgroups work.
> > > 
> > > As tracing is a privileged operation, but something that could be changed
> > > to allow a group to have access to, I wanted to make it easy for an admin
> > > to decide who gets to do what at boot up via the /etc/fstab file.
> > 
> > Yeah, ok. I think you could achieve the same thing via idmapped mounts. You
> > just need to swap out the mnt on /sys/kernel/tracing with an idmapped mount.
> > 
> > mount(8) should just give you the ability to specify "map the ids I 
> > explicitly
> > want to remap to something else and for the rest use the identity mapping". 
> > I
> > wanted that for other reasons anyway.
> > 
> > So in one of the next versions of mount(8) you can then do (where --beneath
> > means place the mount beneath the current one and --replace is
> > self-explanatory):
> > 
> > sudo mount --beneath -o X-mount.idmap='g:0:1234:1 u:0:0:1' 
> > /sys/kernel/tracing
> > sudo umount /sys/kernel/tracing
> > 
> > or as a shortcut provided by mount(8):
> > 
> > sudo mount --replace -o X-mount.idmap='g:0:1234:1 u:0:0:1' 
> > /sys/kernel/tracing 
> > 
> > In both cases you replace the mount without unmounting tracefs.
> > 
> > I can illustrate this right now though:
> > 
> > user1@localhost:~$ sudo mount --bind -o X-mount.idmap='g:0:1000:1 
> > u:0:1000:1' /sys/kernel/tracing/ /mnt/
> > 
> > # This is a tool I wrote for testing the patchset I wrote back then.
> > user1@localhost:~/data/move-mount-beneath$ sudo ./move-mount --beneath 
> > --detached /mnt /sys/kernel/tracing
> > Mounting beneath top mount
> > Creating anonymous mount
> > Attaching mount /mnt -> /sys/kernel/tracing
> > Creating single detached mount
> > 
> > user1@localhost:~/data/move-mount-beneath$
> > 
> > # Now there's two mounts stacked on top of each other.
> > user1@localhost:~/data/move-mount-beneath$ findmnt | grep tracing
> > | `-/sys/kernel/tracing      tracefs  tracefs  rw,nosuid,nodev,noexec,relatime,idmapped
> > |   `-/sys/kernel/tracing    tracefs  tracefs  rw,nosuid,nodev,noexec,relatime
> > 
> > user1@localhost:~/data/move-mount-beneath$ sudo ls -al 
> > /sys/kernel/tracing/| head
> > total 0
> > drwx--  6 root root 0 Jan  7 13:33 .
> > drwxr-xr-x 16 root root 0 Jan  7 13:33 ..
> > -r--r-  1 root root 0 Jan  7 13:33 README
> > -r--r-  1 root root 0 Jan  7 13:33 available_events
> > -r--r-  1 root root 0 Jan  7 13:33 available_filter_functions
> > -r--r-  1 root root 0 Jan  7 13:33 available_filter_functions_addrs
> > -r--r-  1 root root 0 Jan  7 13:33 available_tracers
> > -rw-r-  1 root root 0 Jan  7 13:33 buffer_percent
> > -rw-r-  1 root root 0 Jan  7 13:33 buffer_size_kb
> > 
> > # Reveal updated mount
> > user1@localhost:~/data/move-mount-beneath$ sudo umount /sys/kernel/tracing
> > 
> > user1@localhost:~/data/move-mount-beneath$ findmnt | grep tracing
> > | 

Re: [PATCH] tracefs/eventfs: Use root and instance inodes as default ownership

2024-01-07 Thread Christian Brauner
On Sun, Jan 07, 2024 at 01:42:39PM +0100, Christian Brauner wrote:
> > > So tracefs supports remounting with different uid/gid mount options and
> > > then actually wades through _all_ of the inodes and changes their
> > > ownership internally? What's the use-case for this? Containers?
> > 
> > No, in fact tracing doesn't work well with containers as tracing is global
> > to the entire machine. It can work with privileged containers though.
> 
> At least the tracefs interface is easily supportable within a delegation
> model. IOW, you have a privileged process that delegates relevant
> portions to a container via idmapped mounts _without_ doing the insane thing
> and making it mountable by a container aka the fs-to-CVE pipeline.
> 
> > 
> > The reason for this is because tracefs was based off of debugfs where the
> > files and directories are created at boot up and mounted later. The reason
> > to do this was to allow users to mount with gid=GID to allow a given group
> > to have access to tracing. Without this update, tracefs would ignore it
> > like debugfs and proc does today.
> > 
> > I think it's time I explain the purpose of tracefs and how it came to be.
> > 
> > The tracing system required a way to control tracing and read the traces.
> > It could have just used a new system like perf (although
> > /sys/kernel/debug/tracing predates perf), where it created a single ioctl()
> > like system call to do everything.
> > 
> > As the ftrace tracing came from PREEMPT_RT latency tracer and my own logdev
> > tracer, which both have an embedded background, I chose an interface that
> > could work with just an unmodified version of busybox. That is, I wanted it
> > to work with just cat and echo.
> > 
> > The main difference with tracefs compared to other file systems is that it
> > is a control interface, where writes happen as much as reads. The data read
> > is controlled. The closest thing I can think of is how cgroups work.
> > 
> > As tracing is a privileged operation, but something that could be changed
> > to allow a group to have access to, I wanted to make it easy for an admin
> > to decide who gets to do what at boot up via the /etc/fstab file.
> 
> Yeah, ok. I think you could achieve the same thing via idmapped mounts. You
> just need to swap out the mnt on /sys/kernel/tracing with an idmapped mount.
> 
> mount(8) should just give you the ability to specify "map the ids I explicitly
> want to remap to something else and for the rest use the identity mapping". I
> wanted that for other reasons anyway.
> 
> So in one of the next versions of mount(8) you can then do (where --beneath
> means place the mount beneath the current one and --replace is
> self-explanatory):
> 
> sudo mount --beneath -o X-mount.idmap='g:0:1234:1 u:0:0:1' /sys/kernel/tracing
> sudo umount /sys/kernel/tracing
> 
> or as a shortcut provided by mount(8):
> 
> sudo mount --replace -o X-mount.idmap='g:0:1234:1 u:0:0:1' 
> /sys/kernel/tracing 
> 
> In both cases you replace the mount without unmounting tracefs.
> 
> I can illustrate this right now though:
> 
> user1@localhost:~$ sudo mount --bind -o X-mount.idmap='g:0:1000:1 u:0:1000:1' 
> /sys/kernel/tracing/ /mnt/
> 
> # This is a tool I wrote for testing the patchset I wrote back then.
> user1@localhost:~/data/move-mount-beneath$ sudo ./move-mount --beneath 
> --detached /mnt /sys/kernel/tracing
> Mounting beneath top mount
> Creating anonymous mount
> Attaching mount /mnt -> /sys/kernel/tracing
> Creating single detached mount
> 
> user1@localhost:~/data/move-mount-beneath$
> 
> # Now there's two mounts stacked on top of each other.
> user1@localhost:~/data/move-mount-beneath$ findmnt | grep tracing
> | `-/sys/kernel/tracing      tracefs  tracefs  rw,nosuid,nodev,noexec,relatime,idmapped
> |   `-/sys/kernel/tracing    tracefs  tracefs  rw,nosuid,nodev,noexec,relatime
> 
> user1@localhost:~/data/move-mount-beneath$ sudo ls -al /sys/kernel/tracing/| 
> head
> total 0
> drwx--  6 root root 0 Jan  7 13:33 .
> drwxr-xr-x 16 root root 0 Jan  7 13:33 ..
> -r--r-  1 root root 0 Jan  7 13:33 README
> -r--r-  1 root root 0 Jan  7 13:33 available_events
> -r--r-  1 root root 0 Jan  7 13:33 available_filter_functions
> -r--r-  1 root root 0 Jan  7 13:33 available_filter_functions_addrs
> -r--r-  1 root root 0 Jan  7 13:33 available_tracers
> -rw-r-  1 root root 0 Jan  7 13:33 buffer_percent
> -rw-r-  1 root root 0 Jan  7 13:33 buffer_size_kb
> 
> # Reveal updated mount
> user1@localhost:~/data/move-mount-beneath$ sudo umount /sys/kernel/tracing
> 
> user1@localhost:~/data/move-mount-beneath$ findmnt | grep tracing
> | `-/sys/kernel/tracing      tracefs  tracefs  rw,nosuid,nodev,noexec,relatime,idmapped
> 
> user1@localhost:~/data/move-mount-beneath$ sudo ls -al /sys/kernel/tracing/| 
> head
> total 0
> drwx--  6 user1 user1 0 Jan  7 13:33 .
> drwxr-xr-x 16 root  root  0 Jan  7 13:33 ..
> -r--r-  1 user1 user1 0 

Re: [RFC PATCH 4/5] regulator: add 88pm88x regulators driver

2024-01-07 Thread Mark Brown
On Sun, Jan 07, 2024 at 10:49:20AM +0100, Karel Balej wrote:
> On Fri Jan 5, 2024 at 4:18 PM CET, Mark Brown wrote:

> > Why are we adding an of_compatible here?  It's redundant, the MFD split
> > is a feature of Linux internals not of the hardware, and the existing
> > 88pm8xx MFD doesn't use them.

> In feedback on my MFD series, Rob Herring pointed out that there is no
> need to have a devicetree node for a subdevice if it only contains
> "compatible" as the MFD driver can instantiate subdevices itself. I
> understood that this is what he was referring to, but now I suspect that
> it is sufficient for the mfd_cell.name to be set to the subdevice driver
> name for this - is that correct?

That's what I'd expect, yes.
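
For reference, a minimal sketch of the name-based matching discussed here
(placeholder names, not the 88pm88x patch itself): the MFD core registers
a platform device for each cell, and a platform driver whose .driver.name
equals the cell's .name is probed for it, with no of_compatible and no
devicetree node needed.

/* Sketch with placeholder names, not the actual patch. */
static const struct mfd_cell example_devs[] = {
        {
                .name = "example-regulator",    /* matched against the driver name */
                .id = 0,                        /* distinguishes the instances */
        },
        {
                .name = "example-regulator",
                .id = 1,
        },
};

static int example_regulator_probe(struct platform_device *pdev)
{
        /* pdev->id is derived from the cell's .id */
        return 0;
}

static struct platform_driver example_regulator_driver = {
        .driver = {
                .name = "example-regulator",    /* this is what triggers probing */
        },
        .probe = example_regulator_probe,
};
module_platform_driver(example_regulator_driver);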




Re: [RFC PATCH 4/5] regulator: add 88pm88x regulators driver

2024-01-07 Thread Karel Balej
On Sun Jan 7, 2024 at 11:35 AM CET, Krzysztof Kozlowski wrote:
> On 28/12/2023 10:39, Karel Balej wrote:
> > diff --git a/drivers/mfd/88pm88x.c b/drivers/mfd/88pm88x.c
> > index 69a8e39d43b3..999d0539b720 100644
> > --- a/drivers/mfd/88pm88x.c
> > +++ b/drivers/mfd/88pm88x.c
> > @@ -68,6 +68,21 @@ static struct mfd_cell pm886_devs[] = {
> > .num_resources = ARRAY_SIZE(pm88x_onkey_resources),
> > .resources = pm88x_onkey_resources,
> > },
> > +   {
> > +   .name = "88pm88x-regulator",
> > +   .id = PM88X_REGULATOR_ID_LDO2,
> > +   .of_compatible = "marvell,88pm88x-regulator",
> > +   },
> > +   {
> > +   .name = "88pm88x-regulator",
> > +   .id = PM88X_REGULATOR_ID_LDO15,
> > +   .of_compatible = "marvell,88pm88x-regulator",
> > +   },
> > +   {
> > +   .name = "88pm88x-regulator",
> > +   .id = PM886_REGULATOR_ID_BUCK2,
> > +   .of_compatible = "marvell,88pm88x-regulator",
>
> Same compatible per each regulator looks suspicious, if not even wrong.
> What are these?

The original attempt at upstreaming this MFD had a different compatible
for each regulator, which was not correct according to the reviewers at
the time. I have thus used the same compatible for all regulators and
made the distinction in the regulator driver (using the .id property).
But I think that the problem here is again that I have confused the
purpose of the .name and .of_compatible properties of struct mfd_cell - if a
driver is probed based on the .name property, then I indeed should not need
a compatible for the regulator driver at all.
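
A rough sketch of what I mean (placeholder descriptors, not the real
driver): the shared driver can pick the right regulator purely from the
platform device id that the MFD core derives from the cell's .id.

/* Sketch only, placeholder data, not the actual 88pm88x regulator driver. */
static const struct regulator_desc example_regulators[] = {
        [0] = { .name = "ldo2" },
        [1] = { .name = "ldo15" },
        [2] = { .name = "buck2" },
};

static int example_regulator_probe(struct platform_device *pdev)
{
        const struct regulator_desc *desc;

        if (pdev->id < 0 || pdev->id >= ARRAY_SIZE(example_regulators))
                return -EINVAL;

        desc = &example_regulators[pdev->id];
        /* ... fill a struct regulator_config and register desc with
         * devm_regulator_register(&pdev->dev, desc, &config) ... */
        return 0;
}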

>
> Best regards,
> Krzysztof

Best regards,
K. B.



Re: [RFC PATCH 4/5] regulator: add 88pm88x regulators driver

2024-01-07 Thread Karel Balej
Krzysztof,

On Sun Jan 7, 2024 at 11:34 AM CET, Krzysztof Kozlowski wrote:
> On 07/01/2024 10:49, Karel Balej wrote:
> > Mark,
> > 
> > On Fri Jan 5, 2024 at 4:18 PM CET, Mark Brown wrote:
> >> On Thu, Dec 28, 2023 at 10:39:13AM +0100, Karel Balej wrote:
> >>
> >>> @@ -68,6 +68,21 @@ static struct mfd_cell pm886_devs[] = {
> >>>   .num_resources = ARRAY_SIZE(pm88x_onkey_resources),
> >>>   .resources = pm88x_onkey_resources,
> >>>   },
> >>> + {
> >>> + .name = "88pm88x-regulator",
> >>> + .id = PM88X_REGULATOR_ID_LDO2,
> >>> + .of_compatible = "marvell,88pm88x-regulator",
> >>> + },
> >>
> >> Why are we adding an of_compatible here?  It's redundant, the MFD split
> >> is a feature of Linux internals not of the hardware, and the existing
> >> 88pm8xx MFD doesn't use them.
> > 
> > In feedback on my MFD series, Rob Herring pointed out that there is no
> > need to have a devicetree node for a subdevice if it only contains
> > "compatible" as the MFD driver can instantiate subdevices itself. I
> > understood that this is what he was referring to, but now I suspect that
> > it is sufficient for the mfd_cell.name to be set to the subdevice driver
> > name for this - is that correct?
>
> I think Rob was only referring to "no need to have a devicetree node".

Yes, but I thought that the presence of the compatible in the node is what
triggers instantiation of the driver, and that adding it here instead was
necessary for that to happen if the node was removed. But like I
said, I now think only the .name property is relevant for that.

> But you added here a devicetree node, plus probably undocumented compatible.
>
> Does it even pass the checkpatch?

It does, but you were correct in your previous messages that I have not
run `make dt_binding_check` for this (I assume that is what you meant
when you said that I did not test it). I was not aware of it when sending
the MFD series, and this series would fail with the same problems Rob
pointed out for that one, which is the main reason why I only asked for
feedback on the new parts. Sorry about that; next time I will be sure to
fix all known problems first before building on top of them.
>
> Best regards,
> Krzysztof

Thank you,
K. B.



Re: [PATCH] tracefs/eventfs: Use root and instance inodes as default ownership

2024-01-07 Thread Christian Brauner
> > So tracefs supports remounting with different uid/gid mount options and
> > then actually wades through _all_ of the inodes and changes their
> > ownership internally? What's the use-case for this? Containers?
> 
> No, in fact tracing doesn't work well with containers as tracing is global
> to the entire machine. It can work with privileged containers though.

At least the tracefs interface is easily supportable within a delegation
model. IOW, you have a privileged process that delegates relevant
portions to a container via idmapped mounts _without_ doing the insane thing
and making it mountable by a container aka the fs-to-CVE pipeline.

> 
> The reason for this is because tracefs was based off of debugfs where the
> files and directories are created at boot up and mounted later. The reason
> to do this was to allow users to mount with gid=GID to allow a given group
> to have access to tracing. Without this update, tracefs would ignore it
> like debugfs and proc does today.
> 
> I think it's time I explain the purpose of tracefs and how it came to be.
> 
> The tracing system required a way to control tracing and read the traces.
> It could have just used a new system like perf (although
> /sys/kernel/debug/tracing predates perf), where it created a single ioctl()
> like system call to do everything.
> 
> As the ftrace tracing came from PREEMPT_RT latency tracer and my own logdev
> tracer, which both have an embedded background, I chose an interface that
> could work with just an unmodified version of busybox. That is, I wanted it
> to work with just cat and echo.
> 
> The main difference with tracefs compared to other file systems is that it
> is a control interface, where writes happen as much as reads. The data read
> is controlled. The closest thing I can think of is how cgroups work.
> 
> As tracing is a privileged operation, but something that could be changed
> to allow a group to have access to, I wanted to make it easy for an admin
> to decide who gets to do what at boot up via the /etc/fstab file.

Yeah, ok. I think you could achieve the same thing via idmapped mounts. You
just need to swap out the mnt on /sys/kernel/tracing with an idmapped mount.

mount(8) should just give you the ability to specify "map the ids I explicitly
want to remap to something else and for the rest use the identity mapping". I
wanted that for other reasons anyway.

So in one of the next versions of mount(8) you can then do (where --beneath
means place the mount beneath the current one and --replace is
self-explanatory):

sudo mount --beneath -o X-mount.idmap='g:0:1234:1 u:0:0:1' /sys/kernel/tracing
sudo umount /sys/kernel/tracing

or as a shortcut provided by mount(8):

sudo mount --replace -o X-mount.idmap='g:0:1234:1 u:0:0:1' /sys/kernel/tracing 

In both cases you replace the mount without unmounting tracefs.

I can illustrate this right now though:

user1@localhost:~$ sudo mount --bind -o X-mount.idmap='g:0:1000:1 u:0:1000:1' 
/sys/kernel/tracing/ /mnt/

# This is a tool I wrote for testing the patchset I wrote back then.
user1@localhost:~/data/move-mount-beneath$ sudo ./move-mount --beneath 
--detached /mnt /sys/kernel/tracing
Mounting beneath top mount
Creating anonymous mount
Attaching mount /mnt -> /sys/kernel/tracing
Creating single detached mount

user1@localhost:~/data/move-mount-beneath$

# Now there's two mounts stacked on top of each other.
user1@localhost:~/data/move-mount-beneath$ findmnt | grep tracing
| `-/sys/kernel/tracing      tracefs  tracefs  rw,nosuid,nodev,noexec,relatime,idmapped
|   `-/sys/kernel/tracing    tracefs  tracefs  rw,nosuid,nodev,noexec,relatime

user1@localhost:~/data/move-mount-beneath$ sudo ls -al /sys/kernel/tracing/| 
head
total 0
drwx--  6 root root 0 Jan  7 13:33 .
drwxr-xr-x 16 root root 0 Jan  7 13:33 ..
-r--r-  1 root root 0 Jan  7 13:33 README
-r--r-  1 root root 0 Jan  7 13:33 available_events
-r--r-  1 root root 0 Jan  7 13:33 available_filter_functions
-r--r-  1 root root 0 Jan  7 13:33 available_filter_functions_addrs
-r--r-  1 root root 0 Jan  7 13:33 available_tracers
-rw-r-  1 root root 0 Jan  7 13:33 buffer_percent
-rw-r-  1 root root 0 Jan  7 13:33 buffer_size_kb

# Reveal updated mount
user1@localhost:~/data/move-mount-beneath$ sudo umount /sys/kernel/tracing

user1@localhost:~/data/move-mount-beneath$ findmnt | grep tracing
| `-/sys/kernel/tracing      tracefs  tracefs  rw,nosuid,nodev,noexec,relatime,idmapped

user1@localhost:~/data/move-mount-beneath$ sudo ls -al /sys/kernel/tracing/| 
head
total 0
drwx--  6 user1 user1 0 Jan  7 13:33 .
drwxr-xr-x 16 root  root  0 Jan  7 13:33 ..
-r--r-  1 user1 user1 0 Jan  7 13:33 README
-r--r-  1 user1 user1 0 Jan  7 13:33 available_events
-r--r-  1 user1 user1 0 Jan  7 13:33 available_filter_functions
-r--r-  1 user1 user1 0 Jan  7 13:33 available_filter_functions_addrs
-r--r-  1 user1 user1 0 Jan  7 13:33 available_tracers

Re: [RFC PATCH 4/5] regulator: add 88pm88x regulators driver

2024-01-07 Thread Krzysztof Kozlowski
On 28/12/2023 10:39, Karel Balej wrote:
> diff --git a/drivers/mfd/88pm88x.c b/drivers/mfd/88pm88x.c
> index 69a8e39d43b3..999d0539b720 100644
> --- a/drivers/mfd/88pm88x.c
> +++ b/drivers/mfd/88pm88x.c
> @@ -68,6 +68,21 @@ static struct mfd_cell pm886_devs[] = {
>   .num_resources = ARRAY_SIZE(pm88x_onkey_resources),
>   .resources = pm88x_onkey_resources,
>   },
> + {
> + .name = "88pm88x-regulator",
> + .id = PM88X_REGULATOR_ID_LDO2,
> + .of_compatible = "marvell,88pm88x-regulator",
> + },
> + {
> + .name = "88pm88x-regulator",
> + .id = PM88X_REGULATOR_ID_LDO15,
> + .of_compatible = "marvell,88pm88x-regulator",
> + },
> + {
> + .name = "88pm88x-regulator",
> + .id = PM886_REGULATOR_ID_BUCK2,
> + .of_compatible = "marvell,88pm88x-regulator",

Same compatible per each regulator looks suspicious, if not even wrong.
What are these?

Best regards,
Krzysztof




Re: [RFC PATCH 4/5] regulator: add 88pm88x regulators driver

2024-01-07 Thread Krzysztof Kozlowski
On 07/01/2024 10:49, Karel Balej wrote:
> Mark,
> 
> On Fri Jan 5, 2024 at 4:18 PM CET, Mark Brown wrote:
>> On Thu, Dec 28, 2023 at 10:39:13AM +0100, Karel Balej wrote:
>>
>>> @@ -68,6 +68,21 @@ static struct mfd_cell pm886_devs[] = {
>>> .num_resources = ARRAY_SIZE(pm88x_onkey_resources),
>>> .resources = pm88x_onkey_resources,
>>> },
>>> +   {
>>> +   .name = "88pm88x-regulator",
>>> +   .id = PM88X_REGULATOR_ID_LDO2,
>>> +   .of_compatible = "marvell,88pm88x-regulator",
>>> +   },
>>
>> Why are we adding an of_compatible here?  It's redundant, the MFD split
>> is a feature of Linux internals not of the hardware, and the existing
>> 88pm8xx MFD doesn't use them.
> 
> In feedback on my MFD series, Rob Herring pointed out that there is no
> need to have a devicetree node for a subdevice if it only contains
> "compatible" as the MFD driver can instantiate subdevices itself. I
> understood that this is what he was referring to, but now I suspect that
> it is sufficient for the mfd_cell.name to be set to the subdevice driver
> name for this - is that correct?

I think Rob was only referring to "no need to have a devicetree node".
But you added here a devicetree node, plus probably undocumented compatible.

Does it even pass the checkpatch?

Best regards,
Krzysztof




Re: [RFC PATCH 4/5] regulator: add 88pm88x regulators driver

2024-01-07 Thread Karel Balej
Mark,

On Fri Jan 5, 2024 at 4:18 PM CET, Mark Brown wrote:
> On Thu, Dec 28, 2023 at 10:39:13AM +0100, Karel Balej wrote:
>
> > @@ -68,6 +68,21 @@ static struct mfd_cell pm886_devs[] = {
> > .num_resources = ARRAY_SIZE(pm88x_onkey_resources),
> > .resources = pm88x_onkey_resources,
> > },
> > +   {
> > +   .name = "88pm88x-regulator",
> > +   .id = PM88X_REGULATOR_ID_LDO2,
> > +   .of_compatible = "marvell,88pm88x-regulator",
> > +   },
>
> Why are we adding an of_compatible here?  It's redundant, the MFD split
> is a feature of Linux internals not of the hardware, and the existing
> 88pm8xx MFD doesn't use them.

In feedback on my MFD series, Rob Herring pointed out that there is no
need to have a devicetree node for a subdevice if it only contains
"compatible" as the MFD driver can instantiate subdevices itself. I
understood that this is what he was referring to, but now I suspect that
it is sufficient for the mfd_cell.name to be set to the subdevice driver
name for this - is that correct?

Thank you,
K. B.