[PATCH] tracing/kprobes: Fix the order of argument descriptions
The order of descriptions should be consistent with the argument list of
the function, so "kretprobe" should be the second one.

  int __kprobe_event_gen_cmd_start(struct dynevent_cmd *cmd, bool kretprobe,
                                   const char *name, const char *loc, ...)

Fixes: 2a588dd1d5d6 ("tracing: Add kprobe event command generation functions")
Suggested-by: Mukesh Ojha
Signed-off-by: Yujie Liu
---
 kernel/trace/trace_kprobe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index e834f149695b..47812aa16bb5 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1020,9 +1020,9 @@ EXPORT_SYMBOL_GPL(kprobe_event_cmd_init);
 /**
  * __kprobe_event_gen_cmd_start - Generate a kprobe event command from arg list
  * @cmd: A pointer to the dynevent_cmd struct representing the new event
+ * @kretprobe: Is this a return probe?
  * @name: The name of the kprobe event
  * @loc: The location of the kprobe event
- * @kretprobe: Is this a return probe?
  * @...: Variable number of arg (pairs), one pair for each field
  *
  * NOTE: Users normally won't want to call this function directly, but
--
2.34.1
Re: [PATCH v7 2/3] mm/memory_hotplug: split memmap_on_memory requests across memblocks
On Mon, 2023-10-30 at 11:20 +0100, David Hildenbrand wrote:
> On 26.10.23 00:44, Vishal Verma wrote:
> >
[..]
> > @@ -2146,11 +2186,69 @@ void try_offline_node(int nid)
> >  }
> >  EXPORT_SYMBOL(try_offline_node);
> >
> > -static int __ref try_remove_memory(u64 start, u64 size)
> > +static void __ref remove_memory_blocks_and_altmaps(u64 start, u64 size)
> >  {
> > -        struct memory_block *mem;
> > -        int rc = 0, nid = NUMA_NO_NODE;
> > +        unsigned long memblock_size = memory_block_size_bytes();
> >          struct vmem_altmap *altmap = NULL;
> > +        struct memory_block *mem;
> > +        u64 cur_start;
> > +        int rc;
> > +
> > +        /*
> > +         * For memmap_on_memory, the altmaps could have been added on
> > +         * a per-memblock basis. Loop through the entire range if so,
> > +         * and remove each memblock and its altmap.
> > +         */
>
> /*
>  * altmaps were added on a per-memblock basis; we have to process
>  * each individual memory block.
>  */
>
> > +        for (cur_start = start; cur_start < start + size;
> > +             cur_start += memblock_size) {
> > +                rc = walk_memory_blocks(cur_start, memblock_size, &mem,
> > +                                        test_has_altmap_cb);
> > +                if (rc) {
> > +                        altmap = mem->altmap;
> > +                        /*
> > +                         * Mark altmap NULL so that we can add a debug
> > +                         * check on memblock free.
> > +                         */
> > +                        mem->altmap = NULL;
> > +                }
>
> Simpler (especially, we know that there must be an altmap):
>
> mem = find_memory_block(pfn_to_section_nr(cur_start));
> altmap = mem->altmap;
> mem->altmap = NULL;
>
> I think we might be able to remove test_has_altmap_cb() then.
>
> > +
> > +                remove_memory_block_devices(cur_start, memblock_size);
> > +
> > +                arch_remove_memory(cur_start, memblock_size, altmap);
> > +
> > +                /* Verify that all vmemmap pages have actually been freed. */
> > +                if (altmap) {
>
> There must be an altmap, so this can be done unconditionally.

Hi David,

All other comments make sense, making those changes now. However for this
one, does the WARN() below go away then?

I was wondering if maybe arch_remove_memory() is responsible for freeing
the altmap here, and at this stage we're just checking if that happened.
If it didn't, WARN and then free it. I drilled down the path, and I don't
see the altmap actually getting freed in vmem_altmap_free(), but I wasn't
sure if it was meant to free it as altmap->alloc went down to 0.

> > +                        WARN(altmap->alloc, "Altmap not fully unmapped");
> > +                        kfree(altmap);
> > +                }
> > +        }
> > +}
> >
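The chunked walk that remove_memory_blocks_and_altmaps() performs over [start, start + size) is plain arithmetic; a userspace sketch of just that loop (the 128 MiB block size is an assumed example — the real value comes from memory_block_size_bytes() and is architecture-dependent):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: the real block size is memory_block_size_bytes(). */
#define DEMO_MEMBLOCK_SIZE (128ULL << 20)

/* Count how many per-memblock iterations (one altmap lookup plus one
 * memblock removal each) the loop would perform for a hot-remove range. */
uint64_t demo_count_memblocks(uint64_t start, uint64_t size)
{
        uint64_t cur_start, n = 0;

        for (cur_start = start; cur_start < start + size;
             cur_start += DEMO_MEMBLOCK_SIZE)
                n++;
        return n;
}
```

This mirrors why the patch loops rather than removing the whole range at once: with memmap_on_memory, each memblock may carry its own altmap.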
Re: [PATCH] eventfs: Hold eventfs_mutex when calling callback functions
On Mon, 30 Oct 2023 at 21:10, Steven Rostedt wrote:
>
> From: "Steven Rostedt (Google)"
>
> The callback function that is used to create inodes and dentries is not
> protected by anything and the data that is passed to it could become
> stale. After eventfs_remove_dir() is called by the tracing system, it is
> free to remove the events that are associated to that directory.
> Unfortunately, that means the callbacks must not be called after that.
>
>            CPU0                           CPU1
>
>  eventfs_root_lookup() {
>                                  eventfs_remove_dir() {
>                                      mutex_lock(&eventfs_mutex);
>                                      ei->is_freed = set;
>                                      mutex_unlock(&eventfs_mutex);
>                                  }
>                                  kfree(event_call);
>
>      for (...) {
>          entry = &ei->entries[i];
>          r = entry->callback() {
>              call = data;  // call == event_call above
>              if (call->flags ...)
>
>  [ USE AFTER FREE BUG ]
>
> The safest way to protect this is to wrap the callback with:
>
>  mutex_lock(&eventfs_mutex);
>  if (!ei->is_freed)
>      r = entry->callback();
>  else
>      r = -1;
>  mutex_unlock(&eventfs_mutex);
>
> This will make sure that the callback will not be called after it is
> freed. But now it needs to be known that the callback is called while
> holding internal eventfs locks, and that it must not call back into the
> eventfs / tracefs system. There's no reason it should anyway, but document
> that as well.
>
> Link: https://lore.kernel.org/all/CA+G9fYu9GOEbD=rR5eMR-=HJ8H6rMsbzDC2ZY5=y50wpwae...@mail.gmail.com/
>
> Reported-by: Linux Kernel Functional Testing
> Reported-by: Naresh Kamboju
> Signed-off-by: Steven Rostedt (Google)

Tested-by: Linux Kernel Functional Testing
Tested-by: Naresh Kamboju

> ---
>  fs/tracefs/event_inode.c | 22 ++--
>  include/linux/tracefs.h  | 43 --

Linaro LKFT
https://lkft.linaro.org
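The locking pattern described above — take the subsystem mutex, check the is_freed flag, and only then invoke the callback — can be modeled in plain userspace C. All names here are illustrative stand-ins for the eventfs structures, not the kernel's:

```c
#include <assert.h>
#include <pthread.h>

/* Stand-in for the internal eventfs mutex. */
pthread_mutex_t demo_eventfs_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Minimal model of an eventfs entry with a teardown flag. */
struct demo_entry {
        int is_freed;
        int (*callback)(void *data);
        void *data;
};

/* A sample callback; in the kernel this would create inodes/dentries. */
int demo_cb(void *data)
{
        (void)data;
        return 1;
}

/* The guarded-call pattern: the callback only runs while the mutex is
 * held and the entry has not been torn down, so the data it reads
 * cannot be freed out from under it. */
int demo_call_guarded(struct demo_entry *e)
{
        int r;

        pthread_mutex_lock(&demo_eventfs_mutex);
        if (!e->is_freed)
                r = e->callback(e->data); /* data still valid under the lock */
        else
                r = -1;                   /* entry gone; refuse the call */
        pthread_mutex_unlock(&demo_eventfs_mutex);
        return r;
}

struct demo_entry demo_live_entry  = { 0, demo_cb, 0 };
struct demo_entry demo_freed_entry = { 1, demo_cb, 0 };
```

The remover side would set is_freed under the same mutex before freeing the callback's data, which is exactly the ordering the patch relies on.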
Re: [PATCH v2] eventfs: Test for ei->is_freed when accessing ei->dentry
On Mon, 30 Oct 2023 at 20:11, Steven Rostedt wrote:
>
> On Mon, 30 Oct 2023 12:37:08 +0530
> Naresh Kamboju wrote:
>
> > I have tested the linux-trace.git trace/core and run selftests ftrace
> > the reported kernel panic [1] & [2] has been fixed but found
>
> Good to know. Can I add "Tested-by" from you for that bug fix?

Please add,
Tested-by: Linux Kernel Functional Testing
Tested-by: Naresh Kamboju

> > "general protection fault" at kernel/trace/trace_events.c:2439.
>
> Can you test with the below patch?

Applied this patch on top of linux-trace.git trace/core and the test
passed. All the reported issues have been resolved.

> Also, can I ask what are you testing this on that makes it trigger so
> easily? As I'm not able to trigger these in my tests, even though they are
> indeed bugs.

I used the following build artifacts and ran the ftrace selftests
(ftracetest-ktap):

kernel:
  url: https://storage.tuxsuite.com/public/linaro/naresh/builds/2XT84xYJIMmKApmqOOtKhnLdCyz/bzImage
modules:
  url: https://storage.tuxsuite.com/public/linaro/naresh/builds/2XT84xYJIMmKApmqOOtKhnLdCyz/modules.tar.xz
  compression: xz
nfsrootfs:
  url: https://storage.tuxboot.com/debian/bookworm/amd64/rootfs.tar.xz
kselftest:
  url: https://storage.tuxsuite.com/public/linaro/naresh/builds/2XT84xYJIMmKApmqOOtKhnLdCyz/kselftest.tar.xz

Steps to test:
==============
1) Boot x86 machine with above bzImage, modules, rootfs and copy
   kselftest.tar.xz from above url and unzip.
2) cd /opt/kselftests/default-in-kernel/ftrace
3) ./run_kselftest.sh -c ftrace

# selftests: ftrace: ftracetest-ktap
# unlink: cannot unlink '/opt/kselftests/default-in-kernel/ftrace/logs/latest': No such file or directory
# TAP version 13
# 1..129
# ok 1 Basic trace file check
...
# # Totals: pass:126 fail:0 xfail:1 xpass:0 skip:1 error:1

Test logs:
https://lkft.validation.linaro.org/scheduler/job/6985018#L4742

>
> -- Steve
>
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 7ad7496bd597..7a0b54ddda24 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -609,7 +609,13 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
>                 entry = &ei->entries[i];
>                 if (strcmp(name, entry->name) == 0) {
>                         void *cdata = data;
> -                       r = entry->callback(name, &mode, &cdata, &fops);
> +                       mutex_lock(&eventfs_mutex);
> +                       /* If ei->is_freed, then the event itself may be too */
> +                       if (!ei->is_freed)
> +                               r = entry->callback(name, &mode, &cdata, &fops);
> +                       else
> +                               r = -1;
> +                       mutex_unlock(&eventfs_mutex);
>                         if (r <= 0)
>                                 continue;
>                         ret = simple_lookup(dir, dentry, flags);
> @@ -743,7 +749,13 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file)
>                 void *cdata = data;
>                 entry = &ei->entries[i];
>                 name = entry->name;
> -               r = entry->callback(name, &mode, &cdata, &fops);
> +               mutex_lock(&eventfs_mutex);
> +               /* If ei->is_freed, then the event itself may be too */
> +               if (!ei->is_freed)
> +                       r = entry->callback(name, &mode, &cdata, &fops);
> +               else
> +                       r = -1;
> +               mutex_unlock(&eventfs_mutex);
>                 if (r <= 0)
>                         continue;
>                 d = create_file_dentry(ei, i, parent, name, mode, cdata, fops, false);
Re: [PATCH 9/9] arm64: dts: qcom: qcm6490-fairphone-fp5: Enable WiFi
On 27.10.2023 16:20, Luca Weiss wrote:
> Now that the WPSS remoteproc is enabled, enable wifi so we can use it.
>
> Signed-off-by: Luca Weiss
> ---
>  arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts b/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
> index d65eef30091b..e7e20f73cbe6 100644
> --- a/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
> +++ b/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
> @@ -713,3 +713,7 @@ &venus {
>          firmware-name = "qcom/qcm6490/fairphone5/venus.mbn";
>          status = "okay";
>  };
> +
> +&wifi {
> +        status = "okay";

qcom,ath11k-calibration-variant?

Konrad
Re: [PATCH 8/9] arm64: dts: qcom: qcm6490-fairphone-fp5: Enable various remoteprocs
On 27.10.2023 16:20, Luca Weiss wrote:
> Enable the ADSP, CDSP, MPSS and WPSS that are found on the SoC.
>
> Signed-off-by: Luca Weiss
> ---
>  arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts b/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
> index cc092735ce17..d65eef30091b 100644
> --- a/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
> +++ b/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
> @@ -490,6 +490,26 @@ _id_1 {
>          status = "okay";
>  };
>
> +&remoteproc_adsp {
> +        firmware-name = "qcom/qcm6490/fairphone5/adsp.mdt";
> +        status = "okay";
> +};
> +
> +&remoteproc_cdsp {
> +        firmware-name = "qcom/qcm6490/fairphone5/cdsp.mdt";
> +        status = "okay";
> +};
> +
> +&remoteproc_mpss {
> +        firmware-name = "qcom/qcm6490/fairphone5/modem.mdt";
> +        status = "okay";
> +};
> +
> +&remoteproc_wpss {
> +        firmware-name = "qcom/qcm6490/fairphone5/wpss.mdt";

mbn?

Konrad
[GIT PULL] tracing/tools: Updates for 6.7
Steven,

Tracing tools changes for 6.7:

RTLA:
 - On rtla/utils.c, initialize the 'found' variable to avoid garbage
   when a mount point is not found.

Verification:
 - Remove duplicated imports on dot2k python script

Please pull the latest tracing-tools-v6.7 tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux.git
tracing-tools-v6.7

Tag SHA1: eadb7e76bc7ce4174d4905a29a4a07cd835508fe
Head SHA1: 696444a544ecd6d62c1edc89516b376cefb28929

Alessandro Carminati (Red Hat) (1):
      verification/dot2k: Delete duplicate imports

Colin Ian King (1):
      rtla: Fix uninitialized variable found

 tools/tracing/rtla/src/utils.c | 2 +-
 tools/verification/dot2/dot2k  | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)
---
diff --git a/tools/tracing/rtla/src/utils.c b/tools/tracing/rtla/src/utils.c
index 623a38908ed5..c769d7b3842c 100644
--- a/tools/tracing/rtla/src/utils.c
+++ b/tools/tracing/rtla/src/utils.c
@@ -538,7 +538,7 @@ static const int find_mount(const char *fs, char *mp, int sizeof_mp)
 {
         char mount_point[MAX_PATH];
         char type[100];
-        int found;
+        int found = 0;
         FILE *fp;

         fp = fopen("/proc/mounts", "r");

diff --git a/tools/verification/dot2/dot2k b/tools/verification/dot2/dot2k
index 9dcd38abe20a..d4d7e52d549e 100644
--- a/tools/verification/dot2/dot2k
+++ b/tools/verification/dot2/dot2k
@@ -15,8 +15,6 @@ if __name__ == '__main__':
     import os
     import platform
     import sys
-    import sys
-    import argparse

     parser = argparse.ArgumentParser(description='transform .dot file into kernel rv monitor')
     parser.add_argument('-d', "--dot", dest="dot_file", required=True)
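The rtla fix matters because find_mount() returns 'found' unmodified when no line in /proc/mounts matches, so an uninitialized flag would return garbage. A self-contained stand-in for that control flow (a string table replaces the real fscanf() parsing of /proc/mounts; names are illustrative):

```c
#include <assert.h>
#include <string.h>

/* Return 1 if 'fs' appears in the mounts table, 0 otherwise. If 'found'
 * were left uninitialized, the no-match path would return stack garbage,
 * which is exactly the bug the pull request fixes. */
int demo_find_fs(const char *fs, const char *const *mounts, int n)
{
        int found = 0; /* must start at 0: returned as-is on no match */

        for (int i = 0; i < n; i++) {
                if (strcmp(mounts[i], fs) == 0) {
                        found = 1;
                        break;
                }
        }
        return found;
}

const char *const demo_mounts[] = { "proc", "sysfs", "tracefs" };
```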
[PATCH v6 11/12] Docs/x86/sgx: Add description for cgroup support
From: Sean Christopherson

Add initial documentation of how to regulate the distribution of SGX
Enclave Page Cache (EPC) memory via the Miscellaneous cgroup controller.

Signed-off-by: Sean Christopherson
Co-developed-by: Kristen Carlson Accardi
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
Cc: Sean Christopherson
---
V6:
- Remove mentioning of VMM specific behavior on handling SIGBUS
- Remove statement of forced reclamation, add statement to specify
  ENOMEM returned when no reclamation possible.
- Added statements on the non-preemptive nature for the max limit
- Dropped Reviewed-by tag because of changes

V4:
- Fix indentation (Randy)
- Change misc.events file to be read-only
- Fix a typo for 'subsystem'
- Add behavior when VMM overcommit EPC with a cgroup (Mikko)
---
 Documentation/arch/x86/sgx.rst | 74 ++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/Documentation/arch/x86/sgx.rst b/Documentation/arch/x86/sgx.rst
index d90796adc2ec..dfc8fac13ab2 100644
--- a/Documentation/arch/x86/sgx.rst
+++ b/Documentation/arch/x86/sgx.rst
@@ -300,3 +300,77 @@ to expected failures and handle them as follows:
    first call. It indicates a bug in the kernel or the userspace client
    if any of the second round of ``SGX_IOC_VEPC_REMOVE_ALL`` calls has a
    return code other than 0.
+
+
+Cgroup Support
+==============
+
+The "sgx_epc" resource within the Miscellaneous cgroup controller regulates
+distribution of SGX EPC memory, which is a subset of system RAM that is used
+to provide SGX-enabled applications with protected memory, and is otherwise
+inaccessible, i.e. shows up as reserved in /proc/iomem and cannot be
+read/written outside of an SGX enclave.
+
+Although current systems implement EPC by stealing memory from RAM, for all
+intents and purposes the EPC is independent from normal system memory, e.g.
+must be reserved at boot from RAM and cannot be converted between EPC and
+normal memory while the system is running. The EPC is managed by the SGX
+subsystem and is not accounted by the memory controller. Note that this is
+true only for EPC memory itself, i.e. normal memory allocations related to
+SGX and EPC memory, e.g. the backing memory for evicted EPC pages, are
+accounted, limited and protected by the memory controller.
+
+Much like normal system memory, EPC memory can be overcommitted via virtual
+memory techniques and pages can be swapped out of the EPC to their backing
+store (normal system memory allocated via shmem). The SGX EPC subsystem is
+analogous to the memory subsystem, and it implements limit and protection
+models for EPC memory.
+
+SGX EPC Interface Files
+-----------------------
+
+For a generic description of the Miscellaneous controller interface files,
+please see Documentation/admin-guide/cgroup-v2.rst
+
+All SGX EPC memory amounts are in bytes unless explicitly stated otherwise.
+If a value which is not PAGE_SIZE aligned is written, the actual value used
+by the controller will be rounded down to the closest PAGE_SIZE multiple.
+
+  misc.capacity
+        A read-only flat-keyed file shown only in the root cgroup. The
+        sgx_epc resource will show the total amount of EPC memory available
+        on the platform.
+
+  misc.current
+        A read-only flat-keyed file shown in the non-root cgroups. The
+        sgx_epc resource will show the current active EPC memory usage of
+        the cgroup and its descendants. EPC pages that are swapped out to
+        backing RAM are not included in the current count.
+
+  misc.max
+        A read-write single value file which exists on non-root cgroups.
+        The sgx_epc resource will show the EPC usage hard limit. The
+        default is "max".
+
+        If a cgroup's EPC usage reaches this limit, EPC allocations, e.g.
+        for page fault handling, will be blocked until EPC can be reclaimed
+        from the cgroup. If there are no pages left that are reclaimable
+        within the same group, the kernel returns ENOMEM.
+
+        The EPC pages allocated for a guest VM by the virtual EPC driver
+        are not reclaimable by the host kernel. In case the guest cgroup's
+        limit is reached and no reclaimable pages left in the same cgroup,
+        the virtual EPC driver returns SIGBUS to the user space process to
+        indicate failure on new EPC allocation requests.
+
+        The misc.max limit is non-preemptive. If a user writes a limit
+        lower than the current usage to this file, the cgroup will not
+        preemptively deallocate pages currently in use, and will only start
+        blocking the next allocation and reclaiming EPC at that time.
+
+  misc.events
+        A read-only flat-keyed file which exists on non-root cgroups.
+        A value change in this file generates a file modified event.
+
+          max
+                The number of times the cgroup has triggered a reclaim
+                due to its EPC usage approaching (or
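The documented rounding rule for misc.max — values that are not PAGE_SIZE aligned are rounded down to the closest PAGE_SIZE multiple — is a single bitmask; a userspace sketch (the 4096-byte page size is an assumption for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative page size; the kernel uses the architecture's PAGE_SIZE. */
#define DEMO_PAGE_SIZE 4096ULL

/* Round a misc.max byte value down to the nearest page multiple, as the
 * controller does for unaligned writes. */
uint64_t demo_epc_round_down(uint64_t bytes)
{
        return bytes & ~(DEMO_PAGE_SIZE - 1);
}
```

Because DEMO_PAGE_SIZE is a power of two, clearing the low bits is equivalent to `(bytes / PAGE_SIZE) * PAGE_SIZE`.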
[PATCH v6 10/12] x86/sgx: Implement EPC reclamation for cgroup
From: Kristen Carlson Accardi

Currently all reclaimable pages are tracked only in the global LRU list,
and only the global reclaimer (ksgxd) performs reclamation when the
global free page counts are lower than a threshold. When a cgroup limit
is reached, the cgroup also needs to try to reclaim pages allocated
within the group. This patch enables per-cgroup reclamation.

Add a helper function, sgx_lru_list(), that for a given EPC page returns
the LRU list of the cgroup that is assigned to the EPC page at
allocation time. This helper is used to replace the hard-coded global
LRU wherever appropriate: modify sgx_mark/unmark_page_reclaimable() to
track EPC pages in the LRU list of the appropriate cgroup; modify
sgx_do_epc_reclamation() to return unreclaimed pages back to the proper
cgroup.

Implement the reclamation flow for cgroups, encapsulated in the
top-level function sgx_epc_cgroup_reclaim_pages(). Just like the global
reclaimer, the cgroup reclaimer first isolates candidate pages for
reclaim, then invokes sgx_do_epc_reclamation(). The only difference is
that a cgroup does a pre-order walk on its subtree to scan for candidate
pages from its own LRU and the LRUs in its descendants.

In some contexts, e.g. page fault handling, only asynchronous
reclamation is allowed. Create a workqueue, 'sgx_epc_cg_wq', plus the
corresponding work item and function definitions, to support the
asynchronous reclamation. Add a boolean parameter for
sgx_epc_cgroup_try_charge() to indicate whether synchronous reclaim is
allowed or not. Both synchronous and asynchronous flows invoke the same
top-level reclaim function, sgx_epc_cgroup_reclaim_pages().

All reclaimable pages are tracked in per-cgroup LRUs when the cgroup
controller is enabled. Update the original global reclaimer to reclaim
from the root cgroup in that case, also calling
sgx_epc_cgroup_reclaim_pages().

Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
---
V6:
- Drop EPC OOM killer. (Dave, Michal)
- Patch restructuring: this includes part split from the patch, "Limit
  process EPC usage with misc cgroup controller", and combined with
  "Prepare for multiple LRUs"
- Removed force reclamation ignoring 'youngness' of the pages
- Removed checking for capacity in reclamation loop.
---
 arch/x86/kernel/cpu/sgx/epc_cgroup.c | 224 ++-
 arch/x86/kernel/cpu/sgx/epc_cgroup.h |  19 ++-
 arch/x86/kernel/cpu/sgx/main.c       |  71 ++---
 3 files changed, 289 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
index 500627d0563f..110d44c0ef7c 100644
--- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
@@ -5,6 +5,38 @@
 #include
 #include "epc_cgroup.h"

+#define SGX_EPC_RECLAIM_MIN_PAGES 16U
+
+static struct workqueue_struct *sgx_epc_cg_wq;
+
+static inline u64 sgx_epc_cgroup_page_counter_read(struct sgx_epc_cgroup *epc_cg)
+{
+        return atomic64_read(&epc_cg->cg->res[MISC_CG_RES_SGX_EPC].usage) / PAGE_SIZE;
+}
+
+static inline u64 sgx_epc_cgroup_max_pages(struct sgx_epc_cgroup *epc_cg)
+{
+        return READ_ONCE(epc_cg->cg->res[MISC_CG_RES_SGX_EPC].max) / PAGE_SIZE;
+}
+
+/*
+ * Get the lower bound of limits of a cgroup and its ancestors. Used in
+ * sgx_epc_cgroup_reclaim_work_func() to determine if EPC usage of a cgroup is
+ * over its limit or its ancestors', hence reclamation is needed.
+ */
+static inline u64 sgx_epc_cgroup_max_pages_to_root(struct sgx_epc_cgroup *epc_cg)
+{
+        struct misc_cg *i = epc_cg->cg;
+        u64 m = U64_MAX;
+
+        while (i) {
+                m = min(m, READ_ONCE(i->res[MISC_CG_RES_SGX_EPC].max));
+                i = misc_cg_parent(i);
+        }
+
+        return m / PAGE_SIZE;
+}
+
 static inline struct sgx_epc_cgroup *sgx_epc_cgroup_from_misc_cg(struct misc_cg *cg)
 {
         return (struct sgx_epc_cgroup *)(cg->res[MISC_CG_RES_SGX_EPC].priv);
@@ -15,12 +47,188 @@ static inline bool sgx_epc_cgroup_disabled(void)
         return !cgroup_subsys_enabled(misc_cgrp_subsys);
 }

+/**
+ * sgx_epc_cgroup_lru_empty() - check if a cgroup tree has no pages on its LRUs
+ * @root:       root of the tree to check
+ *
+ * Return: %true if all cgroups under the specified root have empty LRU lists.
+ * Used to avoid livelocks due to a cgroup having a non-zero charge count but
+ * no pages on its LRUs, e.g. due to a dead enclave waiting to be released or
+ * because all pages in the cgroup are unreclaimable.
+ */
+bool sgx_epc_cgroup_lru_empty(struct misc_cg *root)
+{
+        struct cgroup_subsys_state *css_root;
+        struct cgroup_subsys_state *pos;
+        struct sgx_epc_cgroup *epc_cg;
+        bool ret = true;
+
+        /*
+         * Caller ensure css_root ref acquired
+         */
+        css_root = &root->css;
+
+        rcu_read_lock();
+        css_for_each_descendant_pre(pos, css_root) {
+                if
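The ancestor walk in sgx_epc_cgroup_max_pages_to_root() — take the minimum limit along the parent chain — is easy to model in userspace. A minimal sketch (the struct name and the demo hierarchy are invented for illustration; the kernel walks misc_cg_parent() instead):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a misc cgroup node with an sgx_epc max limit. */
struct demo_misc_cg {
        uint64_t max;               /* limit, in bytes here for simplicity */
        struct demo_misc_cg *parent;
};

/* Lower bound of the limits of a cgroup and all of its ancestors: a
 * cgroup can never use more than the tightest limit on its path to the
 * root, which is why the reclaim worker checks this bound. */
uint64_t demo_max_to_root(const struct demo_misc_cg *cg)
{
        uint64_t m = UINT64_MAX;

        for (const struct demo_misc_cg *i = cg; i; i = i->parent)
                m = m < i->max ? m : i->max;
        return m;
}

/* Demo hierarchy: root(100) -> child(50) -> leaf(200). */
struct demo_misc_cg demo_root  = { 100, NULL };
struct demo_misc_cg demo_child = { 50, &demo_root };
struct demo_misc_cg demo_leaf  = { 200, &demo_child };
```

Note how the leaf's own generous limit (200) is irrelevant: the effective bound is the ancestor minimum (50).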
[PATCH v6 12/12] selftests/sgx: Add scripts for EPC cgroup testing
The scripts rely on cgroup-tools package from libcgroup [1].

To run selftests for epc cgroup:

sudo ./run_epc_cg_selftests.sh

With different cgroups, the script starts one or multiple concurrent SGX
selftests, each to run one unclobbered_vdso_oversubscribed test. Each
such test tries to load an enclave of EPC size equal to the EPC capacity
available on the platform. The script checks results against the
expectation set for each cgroup and reports success or failure.

The script creates 3 different cgroups at the beginning with the
following expectations:

1) SMALL - intentionally small enough to fail the test loading an
   enclave of size equal to the capacity.
2) LARGE - large enough to run up to 4 concurrent tests but fail some if
   more than 4 concurrent tests are run. The script starts 4 expecting
   at least one test to pass, and then starts 5 expecting at least one
   test to fail.
3) LARGER - limit is the same as the capacity, large enough to run lots
   of concurrent tests. The script starts 10 of them and expects all to
   pass. Then it reruns the same test with one process randomly killed
   and checks that usage is zero after all processes exit.

To watch misc cgroup 'current' changes during testing, run this in a
separate terminal:

./watch_misc_for_tests.sh current

[1] https://github.com/libcgroup/libcgroup/blob/main/README

Signed-off-by: Haitao Huang
---
V5:
- Added script with automatic results checking, removed the interactive
  script.
- The script can run independently from the series below.
---
 .../selftests/sgx/run_epc_cg_selftests.sh | 196 ++++++++++++++++++
 .../selftests/sgx/watch_misc_for_tests.sh |  13 ++
 2 files changed, 209 insertions(+)
 create mode 100755 tools/testing/selftests/sgx/run_epc_cg_selftests.sh
 create mode 100755 tools/testing/selftests/sgx/watch_misc_for_tests.sh

diff --git a/tools/testing/selftests/sgx/run_epc_cg_selftests.sh b/tools/testing/selftests/sgx/run_epc_cg_selftests.sh
new file mode 100755
index ..72b93f694753
--- /dev/null
+++ b/tools/testing/selftests/sgx/run_epc_cg_selftests.sh
@@ -0,0 +1,196 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright(c) 2023 Intel Corporation.
+
+TEST_ROOT_CG=selftest
+cgcreate -g misc:$TEST_ROOT_CG
+if [ $? -ne 0 ]; then
+    echo "# Please make sure cgroup-tools is installed, and misc cgroup is mounted."
+    exit 1
+fi
+TEST_CG_SUB1=$TEST_ROOT_CG/test1
+TEST_CG_SUB2=$TEST_ROOT_CG/test2
+TEST_CG_SUB3=$TEST_ROOT_CG/test1/test3
+TEST_CG_SUB4=$TEST_ROOT_CG/test4
+
+cgcreate -g misc:$TEST_CG_SUB1
+cgcreate -g misc:$TEST_CG_SUB2
+cgcreate -g misc:$TEST_CG_SUB3
+cgcreate -g misc:$TEST_CG_SUB4
+
+# Default to V2
+CG_ROOT=/sys/fs/cgroup
+if [ ! -d "/sys/fs/cgroup/misc" ]; then
+    echo "# cgroup V2 is in use."
+else
+    echo "# cgroup V1 is in use."
+    CG_ROOT=/sys/fs/cgroup/misc
+fi
+
+CAPACITY=$(grep "sgx_epc" "$CG_ROOT/misc.capacity" | awk '{print $2}')
+# This is below number of VA pages needed for enclave of capacity size. So
+# should fail oversubscribed cases
+SMALL=$(( CAPACITY / 512 ))
+
+# At least load one enclave of capacity size successfully, maybe up to 4.
+# But some may fail if we run more than 4 concurrent enclaves of capacity size.
+LARGE=$(( SMALL * 4 ))
+
+# Load lots of enclaves
+LARGER=$CAPACITY
+
+echo "# Setting up limits."
+echo "sgx_epc $SMALL" | tee $CG_ROOT/$TEST_CG_SUB1/misc.max
+echo "sgx_epc $LARGE" | tee $CG_ROOT/$TEST_CG_SUB2/misc.max
+echo "sgx_epc $LARGER" | tee $CG_ROOT/$TEST_CG_SUB4/misc.max
+
+timestamp=$(date +%Y%m%d_%H%M%S)
+
+test_cmd="./test_sgx -t unclobbered_vdso_oversubscribed"
+
+echo "# Start unclobbered_vdso_oversubscribed with SMALL limit, expecting failure..."
+# Always use leaf node of misc cgroups so it works for both v1 and v2
+# these may fail on OOM
+cgexec -g misc:$TEST_CG_SUB3 $test_cmd >cgtest_small_$timestamp.log 2>&1
+if [[ $? -eq 0 ]]; then
+    echo "# Fail on SMALL limit, not expecting any test passes."
+    cgdelete -r -g misc:$TEST_ROOT_CG
+    exit 1
+else
+    echo "# Test failed as expected."
+fi
+
+echo "# PASSED SMALL limit."
+
+echo "# Start 4 concurrent unclobbered_vdso_oversubscribed tests with LARGE limit,
+    expecting at least one success"
+pids=()
+for i in {1..4}; do
+    (
+        cgexec -g misc:$TEST_CG_SUB2 $test_cmd >cgtest_large_positive_$timestamp.$i.log 2>&1
+    ) &
+    pids+=($!)
+done
+
+any_success=0
+for pid in "${pids[@]}"; do
+    wait "$pid"
+    status=$?
+    if [[ $status -eq 0 ]]; then
+        any_success=1
+        echo "# Process $pid returned successfully."
+    fi
+done
+
+if [[ $any_success -eq 0 ]]; then
+    echo "# Failed on LARGE limit positive testing, no test passes."
+    cgdelete -r -g misc:$TEST_ROOT_CG
+    exit 1
+fi
+
+echo "# PASSED LARGE limit positive testing."
+
+echo "# Start 5 concurrent unclobbered_vdso_oversubscribed tests with LARGE limit,
+    expecting at least one failure"
+pids=()
+for i in {1..5}; do
+    (
+        cgexec -g misc:$TEST_CG_SUB2 $test_cmd
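The limit derivation at the top of the script can be checked in isolation. A sketch with an assumed 512 MiB EPC capacity (the real script reads the value from misc.capacity):

```shell
# Hypothetical capacity value, for illustration only.
CAPACITY=$((512 * 1024 * 1024))

# Mirrors the script's arithmetic: SMALL is far below what one
# capacity-sized enclave needs, so loading must fail; LARGE fits
# roughly four concurrent capacity-sized tests.
SMALL=$(( CAPACITY / 512 ))
LARGE=$(( SMALL * 4 ))

echo "SMALL=$SMALL LARGE=$LARGE"
```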
[PATCH v6 08/12] x86/sgx: Use a list to track to-be-reclaimed pages
From: Sean Christopherson

Change sgx_reclaim_pages() to use a list rather than an array for
storing the epc_pages which will be reclaimed. This change is needed to
transition to the LRU implementation for EPC cgroup support.

When the EPC cgroup is implemented, the reclaiming process will do a
pre-order tree walk for the subtree starting from the limit-violating
cgroup. When each node is visited, candidate pages are selected from its
"reclaimable" LRU list and moved into this temporary list. Passing a
list from node to node for temporary storage in this walk is more
straightforward than using an array.

Signed-off-by: Sean Christopherson
Co-developed-by: Kristen Carlson Accardi
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
Cc: Sean Christopherson
---
V6:
- Remove extra list_del_init and style fix (Kai)

V4:
- Changes needed for patch reordering
- Revised commit message

V3:
- Removed list wrappers
---
 arch/x86/kernel/cpu/sgx/main.c | 35 +++---
 1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index e27ac73d8843..33bcba313d40 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -296,12 +296,11 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
  */
 static void sgx_reclaim_pages(void)
 {
-        struct sgx_epc_page *chunk[SGX_NR_TO_SCAN];
         struct sgx_backing backing[SGX_NR_TO_SCAN];
+        struct sgx_epc_page *epc_page, *tmp;
         struct sgx_encl_page *encl_page;
-        struct sgx_epc_page *epc_page;
         pgoff_t page_index;
-        int cnt = 0;
+        LIST_HEAD(iso);
         int ret;
         int i;

@@ -317,7 +316,7 @@ static void sgx_reclaim_pages(void)

                 if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) {
                         sgx_epc_page_set_state(epc_page, SGX_EPC_PAGE_RECLAIM_IN_PROGRESS);
-                        chunk[cnt++] = epc_page;
+                        list_move_tail(&epc_page->list, &iso);
                 } else
                         /* The owner is freeing the page. No need to add the
                          * page back to the list of reclaimable pages.
@@ -326,8 +325,11 @@ static void sgx_reclaim_pages(void)
         }
         spin_unlock(&sgx_global_lru.lock);

-        for (i = 0; i < cnt; i++) {
-                epc_page = chunk[i];
+        if (list_empty(&iso))
+                return;
+
+        i = 0;
+        list_for_each_entry_safe(epc_page, tmp, &iso, list) {
                 encl_page = epc_page->owner;

                 if (!sgx_reclaimer_age(epc_page))
@@ -342,6 +344,7 @@ static void sgx_reclaim_pages(void)
                         goto skip;
                 }

+                i++;
                 encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
                 mutex_unlock(&encl_page->encl->lock);
                 continue;
@@ -349,27 +352,19 @@ static void sgx_reclaim_pages(void)
 skip:
                 spin_lock(&sgx_global_lru.lock);
                 sgx_epc_page_set_state(epc_page, SGX_EPC_PAGE_RECLAIMABLE);
-                list_add_tail(&epc_page->list, &sgx_global_lru.reclaimable);
+                list_move_tail(&epc_page->list, &sgx_global_lru.reclaimable);
                 spin_unlock(&sgx_global_lru.lock);

                 kref_put(&encl_page->encl->refcount, sgx_encl_release);
-
-                chunk[i] = NULL;
-        }
-
-        for (i = 0; i < cnt; i++) {
-                epc_page = chunk[i];
-                if (epc_page)
-                        sgx_reclaimer_block(epc_page);
         }

-        for (i = 0; i < cnt; i++) {
-                epc_page = chunk[i];
-                if (!epc_page)
-                        continue;
+        list_for_each_entry(epc_page, &iso, list)
+                sgx_reclaimer_block(epc_page);

+        i = 0;
+        list_for_each_entry_safe(epc_page, tmp, &iso, list) {
                 encl_page = epc_page->owner;
-                sgx_reclaimer_write(epc_page, &backing[i]);
+                sgx_reclaimer_write(epc_page, &backing[i++]);

                 kref_put(&encl_page->encl->refcount, sgx_encl_release);
                 sgx_epc_page_reset_state(epc_page);
--
2.25.1
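The list-based isolation this patch introduces can be modeled with a minimal userspace copy of the kernel's circular doubly-linked list. The function names mirror the kernel API, but this is a self-contained sketch, not kernel code:

```c
#include <assert.h>

struct list_head {
        struct list_head *prev, *next;
};

void list_init(struct list_head *h)
{
        h->prev = h->next = h;
}

static void __list_del(struct list_head *e)
{
        e->prev->next = e->next;
        e->next->prev = e->prev;
}

void list_add_tail(struct list_head *e, struct list_head *h)
{
        e->prev = h->prev;
        e->next = h;
        h->prev->next = e;
        h->prev = e;
}

/* O(1) detach-and-append: the operation sgx_reclaim_pages() now uses to
 * move a candidate from the global LRU onto the temporary 'iso' list,
 * with no chunk[]/cnt bookkeeping. */
void list_move_tail(struct list_head *e, struct list_head *h)
{
        __list_del(e);
        list_add_tail(e, h);
}

int list_empty(const struct list_head *h)
{
        return h->next == h;
}

/* Build a 4-entry LRU, isolate two pages, and return the iso count. */
int demo_isolate(void)
{
        struct list_head lru, iso, pages[4];
        int n = 0;

        list_init(&lru);
        list_init(&iso);
        for (int i = 0; i < 4; i++)
                list_add_tail(&pages[i], &lru);

        list_move_tail(lru.next, &iso);
        list_move_tail(lru.next, &iso);

        for (struct list_head *p = iso.next; p != &iso; p = p->next)
                n++;
        return list_empty(&lru) ? -1 : n;
}
```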
[PATCH v6 09/12] x86/sgx: Restructure top-level EPC reclaim function
From: Sean Christopherson To prepare for per-cgroup reclamation, separate the top-level reclaim function, sgx_reclaim_epc_pages(), into two separate functions: - sgx_isolate_epc_pages() scans and isolates reclaimable pages from a given LRU list. - sgx_do_epc_reclamation() performs the real reclamation for the already isolated pages. Create a new function, sgx_reclaim_epc_pages_global(), calling those two in succession, to replace the original sgx_reclaim_epc_pages(). The above two functions will serve as building blocks for the reclamation flows in later EPC cgroup implementation. sgx_do_epc_reclamation() returns the number of reclaimed pages. The EPC cgroup will use the result to track reclaiming progress. sgx_isolate_epc_pages() returns the additional number of pages to scan for current epoch of reclamation. The EPC cgroup will use the result to determine if more scanning to be done in LRUs in its children groups. Signed-off-by: Sean Christopherson Co-developed-by: Kristen Carlson Accardi Signed-off-by: Kristen Carlson Accardi Co-developed-by: Haitao Huang Signed-off-by: Haitao Huang Cc: Sean Christopherson --- V6: - Restructure patches to make it easier to review. (Kai) - Fix unused nr_to_scan (Kai) --- arch/x86/kernel/cpu/sgx/main.c | 97 ++ arch/x86/kernel/cpu/sgx/sgx.h | 8 +++ 2 files changed, 72 insertions(+), 33 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 33bcba313d40..e8848b493eb7 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -281,33 +281,23 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, mutex_unlock(>lock); } -/* - * Take a fixed number of pages from the head of the active page pool and - * reclaim them to the enclave's private shmem files. Skip the pages, which have - * been accessed since the last scan. Move those pages to the tail of active - * page pool so that the pages get scanned in LRU like fashion. 
+/** + * sgx_isolate_epc_pages() - Isolate pages from an LRU for reclaim + * @lru: LRU from which to reclaim + * @nr_to_scan:Number of pages to scan for reclaim + * @dst: Destination list to hold the isolated pages * - * Batch process a chunk of pages (at the moment 16) in order to degrade amount - * of IPI's and ETRACK's potentially required. sgx_encl_ewb() does degrade a bit - * among the HW threads with three stage EWB pipeline (EWB, ETRACK + EWB and IPI - * + EWB) but not sufficiently. Reclaiming one page at a time would also be - * problematic as it would increase the lock contention too much, which would - * halt forward progress. + * Return: remaining pages to scan, i.e, @nr_to_scan minus the number of pages scanned. */ -static void sgx_reclaim_pages(void) +unsigned int sgx_isolate_epc_pages(struct sgx_epc_lru_list *lru, unsigned int nr_to_scan, + struct list_head *dst) { - struct sgx_backing backing[SGX_NR_TO_SCAN]; - struct sgx_epc_page *epc_page, *tmp; struct sgx_encl_page *encl_page; - pgoff_t page_index; - LIST_HEAD(iso); - int ret; - int i; + struct sgx_epc_page *epc_page; - spin_lock(_global_lru.lock); - for (i = 0; i < SGX_NR_TO_SCAN; i++) { - epc_page = list_first_entry_or_null(_global_lru.reclaimable, - struct sgx_epc_page, list); + spin_lock(>lock); + for (; nr_to_scan > 0; --nr_to_scan) { + epc_page = list_first_entry_or_null(>reclaimable, struct sgx_epc_page, list); if (!epc_page) break; @@ -316,23 +306,53 @@ static void sgx_reclaim_pages(void) if (kref_get_unless_zero(_page->encl->refcount) != 0) { sgx_epc_page_set_state(epc_page, SGX_EPC_PAGE_RECLAIM_IN_PROGRESS); - list_move_tail(_page->list, ); + list_move_tail(_page->list, dst); } else /* The owner is freeing the page. No need to add the * page back to the list of reclaimable pages. */ sgx_epc_page_reset_state(epc_page); } - spin_unlock(_global_lru.lock); + spin_unlock(>lock); + + return nr_to_scan; +} + +/** + * sgx_do_epc_reclamation() - Perform reclamation for isolated EPC pages. 
+ * @iso: List of isolated pages for reclamation + * + * Take a list of EPC pages and reclaim them to the enclave's private shmem files. Do not + * reclaim the pages that have been accessed since the last scan, and move each of those pages + * to the tail of its tracking LRU list. + * + * Limit the number of pages to be processed up to SGX_NR_TO_SCAN_MAX per call in order to + * degrade amount of IPI's and ETRACK's potentially required. sgx_encl_ewb() does degrade a bit +
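The isolate/reclaim split above hinges on sgx_isolate_epc_pages() returning the unused scan budget, so a caller (later, the cgroup walker) can decide whether to keep scanning LRUs in descendant groups. Below is a minimal user-space sketch of that contract, using a toy singly-linked list instead of the kernel's list_head; all names here are illustrative, not the kernel API:

```c
#include <assert.h>
#include <stddef.h>

/* Toy node standing in for struct sgx_epc_page. */
struct page {
    struct page *next;
    int busy; /* non-zero: owner is freeing it, skip isolation */
};

/*
 * Move up to nr_to_scan non-busy pages from the head of *src onto *dst.
 * Mirrors the sgx_isolate_epc_pages() contract: the return value is the
 * leftover scan budget, so 0 means the full budget was consumed.
 */
static unsigned int isolate_pages(struct page **src, unsigned int nr_to_scan,
                                  struct page **dst)
{
    for (; nr_to_scan > 0; --nr_to_scan) {
        struct page *p = *src;
        if (!p)
            break;          /* LRU empty: report leftover budget */
        *src = p->next;
        if (p->busy)
            continue;       /* dropped, but still consumes scan budget */
        p->next = *dst;     /* push onto isolation list (LIFO here) */
        *dst = p;
    }
    return nr_to_scan;
}
```

A caller that gets a non-zero return knows this list ran dry and can carry the remaining budget to the next LRU, which is exactly how the later cgroup patches use the real function.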
[PATCH v6 07/12] x86/sgx: Introduce EPC page states
Use the lower 2 bits in the flags field of sgx_epc_page struct to track EPC states and define an enum for possible states for EPC pages tracked for reclamation. Add the RECLAIM_IN_PROGRESS state to explicitly indicate a page that is identified as a candidate for reclaiming, but has not yet been reclaimed, instead of relying on list_empty(_page->list). A later patch will replace the array on stack with a temporary list to store the candidate pages, so list_empty() should no longer be used for this purpose. Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Co-developed-by: Kristen Carlson Accardi Signed-off-by: Kristen Carlson Accardi Signed-off-by: Haitao Huang Cc: Sean Christopherson --- V6: - Drop UNRECLAIMABLE and use only 2 bits for states (Kai) - Combine the patch for RECLAIM_IN_PROGRESS - Style fixes (Jarkko and Kai) --- arch/x86/kernel/cpu/sgx/encl.c | 2 +- arch/x86/kernel/cpu/sgx/main.c | 33 +- arch/x86/kernel/cpu/sgx/sgx.h | 62 +++--- 3 files changed, 76 insertions(+), 21 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index 279148e72459..17dc108d3ff7 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -1315,7 +1315,7 @@ void sgx_encl_free_epc_page(struct sgx_epc_page *page) { int ret; - WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED); + WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_STATE_MASK); ret = __eremove(sgx_get_epc_virt_addr(page)); if (WARN_ONCE(ret, EREMOVE_ERROR_MESSAGE, ret, ret)) diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index d347acd717fd..e27ac73d8843 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -315,13 +315,14 @@ static void sgx_reclaim_pages(void) list_del_init(_page->list); encl_page = epc_page->owner; - if (kref_get_unless_zero(_page->encl->refcount) != 0) + if (kref_get_unless_zero(_page->encl->refcount) != 0) { + sgx_epc_page_set_state(epc_page, 
SGX_EPC_PAGE_RECLAIM_IN_PROGRESS); chunk[cnt++] = epc_page; - else + } else /* The owner is freeing the page. No need to add the * page back to the list of reclaimable pages. */ - epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED; + sgx_epc_page_reset_state(epc_page); } spin_unlock(_global_lru.lock); @@ -347,6 +348,7 @@ static void sgx_reclaim_pages(void) skip: spin_lock(_global_lru.lock); + sgx_epc_page_set_state(epc_page, SGX_EPC_PAGE_RECLAIMABLE); list_add_tail(_page->list, _global_lru.reclaimable); spin_unlock(_global_lru.lock); @@ -370,7 +372,7 @@ static void sgx_reclaim_pages(void) sgx_reclaimer_write(epc_page, [i]); kref_put(_page->encl->refcount, sgx_encl_release); - epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED; + sgx_epc_page_reset_state(epc_page); sgx_free_epc_page(epc_page); } @@ -509,7 +511,8 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void) void sgx_mark_page_reclaimable(struct sgx_epc_page *page) { spin_lock(_global_lru.lock); - page->flags |= SGX_EPC_PAGE_RECLAIMER_TRACKED; + WARN_ON_ONCE(sgx_epc_page_reclaimable(page->flags)); + page->flags |= SGX_EPC_PAGE_RECLAIMABLE; list_add_tail(>list, _global_lru.reclaimable); spin_unlock(_global_lru.lock); } @@ -527,16 +530,13 @@ void sgx_mark_page_reclaimable(struct sgx_epc_page *page) int sgx_unmark_page_reclaimable(struct sgx_epc_page *page) { spin_lock(_global_lru.lock); - if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) { - /* The page is being reclaimed. 
*/ - if (list_empty(>list)) { - spin_unlock(_global_lru.lock); - return -EBUSY; - } - - list_del(>list); - page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED; + if (sgx_epc_page_reclaim_in_progress(page->flags)) { + spin_unlock(_global_lru.lock); + return -EBUSY; } + + list_del(>list); + sgx_epc_page_reset_state(page); spin_unlock(_global_lru.lock); return 0; @@ -623,6 +623,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page) struct sgx_epc_section *section = _epc_sections[page->section]; struct sgx_numa_node *node = section->node; + WARN_ON_ONCE(page->flags & (SGX_EPC_PAGE_STATE_MASK)); if (page->epc_cg) { sgx_epc_cgroup_uncharge(page->epc_cg); page->epc_cg = NULL; @@ -635,7 +636,7 @@ void sgx_free_epc_page(struct
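Patch 07/12 packs the reclaim state into the low 2 bits of the page's flags word, with set/reset helpers that leave the remaining flag bits untouched. A user-space sketch of that bit-packing pattern follows; the constant values and helper names are illustrative and need not match the kernel's actual enum:

```c
#include <assert.h>

/* Low 2 bits of the flags word hold the reclaim state. */
#define PAGE_STATE_MASK                 0x3UL
#define PAGE_STATE_UNTRACKED            0x0UL
#define PAGE_STATE_RECLAIMABLE          0x1UL
#define PAGE_STATE_RECLAIM_IN_PROGRESS  0x2UL

struct page {
    unsigned long flags;
};

/* Replace only the state bits, preserving all other flags. */
static void page_set_state(struct page *p, unsigned long state)
{
    p->flags = (p->flags & ~PAGE_STATE_MASK) | (state & PAGE_STATE_MASK);
}

/* Back to the untracked state. */
static void page_reset_state(struct page *p)
{
    p->flags &= ~PAGE_STATE_MASK;
}

static int page_reclaimable(unsigned long flags)
{
    return (flags & PAGE_STATE_MASK) == PAGE_STATE_RECLAIMABLE;
}
```

The same masking trick is why the patch can WARN on `flags & STATE_MASK` at free time: any non-zero state bits mean the page was freed while still tracked.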
[PATCH v6 04/12] x86/sgx: Implement basic EPC misc cgroup functionality
From: Kristen Carlson Accardi

Implement support for cgroup control of SGX Enclave Page Cache (EPC) memory using the misc cgroup controller. EPC memory is independent from normal system memory, e.g. it must be reserved at boot from RAM and cannot be converted between EPC and normal memory while the system is running. EPC is managed by the SGX subsystem and is not accounted by the memory controller.

Much like normal system memory, EPC memory can be overcommitted via virtual memory techniques and pages can be swapped out of the EPC to their backing store (normal system memory, e.g. shmem). The SGX EPC subsystem is analogous to the memory subsystem, and the SGX EPC controller is in turn analogous to the memory controller; it implements limit and protection models for EPC memory.

The misc controller provides a mechanism to set a hard limit of EPC usage via the "sgx_epc" resource in "misc.max". The total EPC memory available on the system is reported via the "sgx_epc" resource in "misc.capacity".

This patch was modified from the previous version to only add the basic EPC cgroup structure, account allocations for cgroup usage (charge/uncharge), set up misc cgroup callbacks, and set the total EPC capacity.

For now, the EPC cgroup simply blocks additional EPC allocation in sgx_alloc_epc_page() when the limit is reached. Reclaimable pages are still tracked in the global active list and are only reclaimed by the global reclaimer when the total free page count is lower than a threshold.

Later patches will reorganize the tracking and reclamation code in the global reclaimer and implement per-cgroup tracking and reclaiming.
Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Co-developed-by: Haitao Huang Signed-off-by: Haitao Huang --- V6: - Split the original large patch"Limit process EPC usage with misc cgroup controller" and restructure it (Kai) --- arch/x86/Kconfig | 13 arch/x86/kernel/cpu/sgx/Makefile | 1 + arch/x86/kernel/cpu/sgx/epc_cgroup.c | 103 +++ arch/x86/kernel/cpu/sgx/epc_cgroup.h | 36 ++ arch/x86/kernel/cpu/sgx/main.c | 28 arch/x86/kernel/cpu/sgx/sgx.h| 3 + 6 files changed, 184 insertions(+) create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.c create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 66bfabae8814..e17c5dc3aea4 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1921,6 +1921,19 @@ config X86_SGX If unsure, say N. +config CGROUP_SGX_EPC + bool "Miscellaneous Cgroup Controller for Enclave Page Cache (EPC) for Intel SGX" + depends on X86_SGX && CGROUP_MISC + help + Provides control over the EPC footprint of tasks in a cgroup via + the Miscellaneous cgroup controller. + + EPC is a subset of regular memory that is usable only by SGX + enclaves and is very limited in quantity, e.g. less than 1% + of total DRAM. + + Say N if unsure. + config X86_USER_SHADOW_STACK bool "X86 userspace shadow stack" depends on AS_WRUSS diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile index 9c1656779b2a..12901a488da7 100644 --- a/arch/x86/kernel/cpu/sgx/Makefile +++ b/arch/x86/kernel/cpu/sgx/Makefile @@ -4,3 +4,4 @@ obj-y += \ ioctl.o \ main.o obj-$(CONFIG_X86_SGX_KVM) += virt.o +obj-$(CONFIG_CGROUP_SGX_EPC) += epc_cgroup.o diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c b/arch/x86/kernel/cpu/sgx/epc_cgroup.c new file mode 100644 index ..500627d0563f --- /dev/null +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c @@ -0,0 +1,103 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright(c) 2022 Intel Corporation. 
+ +#include +#include +#include "epc_cgroup.h" + +static inline struct sgx_epc_cgroup *sgx_epc_cgroup_from_misc_cg(struct misc_cg *cg) +{ + return (struct sgx_epc_cgroup *)(cg->res[MISC_CG_RES_SGX_EPC].priv); +} + +static inline bool sgx_epc_cgroup_disabled(void) +{ + return !cgroup_subsys_enabled(misc_cgrp_subsys); +} + +/** + * sgx_epc_cgroup_try_charge() - hierarchically try to charge a single EPC page + * + * Returns EPC cgroup or NULL on success, -errno on failure. + */ +struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(void) +{ + struct sgx_epc_cgroup *epc_cg; + int ret; + + if (sgx_epc_cgroup_disabled()) + return NULL; + + epc_cg = sgx_epc_cgroup_from_misc_cg(get_current_misc_cg()); + ret = misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE); + + if (!ret) { + /* No epc_cg returned, release ref from get_current_misc_cg() */ + put_misc_cg(epc_cg->cg); + return ERR_PTR(-ENOMEM); + } + + /* Ref released in sgx_epc_cgroup_uncharge() */ + return epc_cg; +} + +/** + * sgx_epc_cgroup_uncharge() -
[PATCH v6 05/12] x86/sgx: Add sgx_epc_lru_list to encapsulate LRU list
From: Sean Christopherson

Introduce a data structure to wrap the existing reclaimable list and its spinlock. Each cgroup will later have one instance of this structure to track EPC pages allocated for processes associated with the same cgroup. Just like the global SGX reclaimer (ksgxd), an EPC cgroup reclaims pages from the reclaimable list in this structure when its usage approaches its limit.

Signed-off-by: Sean Christopherson
Co-developed-by: Kristen Carlson Accardi
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
Cc: Sean Christopherson
---
V6:
- removed introduction to unreclaimables in commit message.

V4:
- Removed unneeded comments for the spinlock and the non-reclaimables. (Kai, Jarkko)
- Revised the commit to add introduction comments for unreclaimables and multiple LRU lists. (Kai)
- Reordered the patches: delay all changes for unreclaimables to later, and this one becomes the first change in the SGX subsystem.

V3:
- Removed the helper functions and revised commit messages.
---
 arch/x86/kernel/cpu/sgx/sgx.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index b1786774b8d2..0fbe6a2a159b 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -86,6 +86,21 @@ static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page)
        return section->virt_addr + index * PAGE_SIZE;
 }
 
+/*
+ * Contains EPC pages tracked by the global reclaimer (ksgxd) or an EPC
+ * cgroup.
+ */
+struct sgx_epc_lru_list {
+       spinlock_t lock;
+       struct list_head reclaimable;
+};
+
+static inline void sgx_lru_init(struct sgx_epc_lru_list *lru)
+{
+       spin_lock_init(&lru->lock);
+       INIT_LIST_HEAD(&lru->reclaimable);
+}
+
 struct sgx_epc_page *__sgx_alloc_epc_page(void);
 void sgx_free_epc_page(struct sgx_epc_page *page);
-- 
2.25.1
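The sgx_epc_lru_list wrapper is just a spinlock plus a circular list_head. For readers unfamiliar with the kernel's intrusive-list idiom, here is a minimal self-contained re-implementation of the operations this series relies on (INIT_LIST_HEAD, list_add_tail, list_del, list_empty); locking is omitted and this is a sketch of the idiom, not the kernel's list.h:

```c
#include <assert.h>

/* Circular doubly linked list; an empty list points at itself. */
struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
    h->next = h->prev = h;
}

static int list_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Insert n just before the head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Unlink n and make it point at itself, like list_del_init(). */
static void list_del(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = n;
}
```

Because the list_head is embedded in the tracked object (struct sgx_epc_page embeds one), moving a page between LRUs is pointer surgery with no allocation, which is why the reclaimer can shuffle pages under a spinlock.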
[PATCH v6 06/12] x86/sgx: Use sgx_epc_lru_list for existing active page list
From: Sean Christopherson In future each cgroup needs a LRU list to track reclaimable pages. For now just replace the existing sgx_active_page_list in the reclaimer and its spinlock with a global sgx_epc_lru_list struct. Signed-off-by: Sean Christopherson Co-developed-by: Kristen Carlson Accardi Signed-off-by: Kristen Carlson Accardi Co-developed-by: Haitao Huang Signed-off-by: Haitao Huang Cc: Sean Christopherson --- V5: - Spelled out SECS, VA (Jarkko) V4: - No change, only reordered the patch. V3: - Remove usage of list wrapper --- arch/x86/kernel/cpu/sgx/main.c | 39 +- 1 file changed, 20 insertions(+), 19 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 07606f391540..d347acd717fd 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -28,10 +28,9 @@ static DEFINE_XARRAY(sgx_epc_address_space); /* * These variables are part of the state of the reclaimer, and must be accessed - * with sgx_reclaimer_lock acquired. + * with sgx_global_lru.lock acquired. 
*/ -static LIST_HEAD(sgx_active_page_list); -static DEFINE_SPINLOCK(sgx_reclaimer_lock); +static struct sgx_epc_lru_list sgx_global_lru; static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0); @@ -306,13 +305,13 @@ static void sgx_reclaim_pages(void) int ret; int i; - spin_lock(_reclaimer_lock); + spin_lock(_global_lru.lock); for (i = 0; i < SGX_NR_TO_SCAN; i++) { - if (list_empty(_active_page_list)) + epc_page = list_first_entry_or_null(_global_lru.reclaimable, + struct sgx_epc_page, list); + if (!epc_page) break; - epc_page = list_first_entry(_active_page_list, - struct sgx_epc_page, list); list_del_init(_page->list); encl_page = epc_page->owner; @@ -324,7 +323,7 @@ static void sgx_reclaim_pages(void) */ epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED; } - spin_unlock(_reclaimer_lock); + spin_unlock(_global_lru.lock); for (i = 0; i < cnt; i++) { epc_page = chunk[i]; @@ -347,9 +346,9 @@ static void sgx_reclaim_pages(void) continue; skip: - spin_lock(_reclaimer_lock); - list_add_tail(_page->list, _active_page_list); - spin_unlock(_reclaimer_lock); + spin_lock(_global_lru.lock); + list_add_tail(_page->list, _global_lru.reclaimable); + spin_unlock(_global_lru.lock); kref_put(_page->encl->refcount, sgx_encl_release); @@ -380,7 +379,7 @@ static void sgx_reclaim_pages(void) static bool sgx_should_reclaim(unsigned long watermark) { return atomic_long_read(_nr_free_pages) < watermark && - !list_empty(_active_page_list); + !list_empty(_global_lru.reclaimable); } /* @@ -432,6 +431,8 @@ static bool __init sgx_page_reclaimer_init(void) ksgxd_tsk = tsk; + sgx_lru_init(_global_lru); + return true; } @@ -507,10 +508,10 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void) */ void sgx_mark_page_reclaimable(struct sgx_epc_page *page) { - spin_lock(_reclaimer_lock); + spin_lock(_global_lru.lock); page->flags |= SGX_EPC_PAGE_RECLAIMER_TRACKED; - list_add_tail(>list, _active_page_list); - spin_unlock(_reclaimer_lock); + list_add_tail(>list, _global_lru.reclaimable); + 
spin_unlock(_global_lru.lock); } /** @@ -525,18 +526,18 @@ void sgx_mark_page_reclaimable(struct sgx_epc_page *page) */ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page) { - spin_lock(_reclaimer_lock); + spin_lock(_global_lru.lock); if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) { /* The page is being reclaimed. */ if (list_empty(>list)) { - spin_unlock(_reclaimer_lock); + spin_unlock(_global_lru.lock); return -EBUSY; } list_del(>list); page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED; } - spin_unlock(_reclaimer_lock); + spin_unlock(_global_lru.lock); return 0; } @@ -574,7 +575,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) break; } - if (list_empty(_active_page_list)) + if (list_empty(_global_lru.reclaimable)) return ERR_PTR(-ENOMEM); if (!reclaim) { -- 2.25.1
[PATCH v6 02/12] cgroup/misc: Export APIs for SGX driver
From: Kristen Carlson Accardi Export misc_cg_root() so the SGX EPC cgroup can access and do extra setup during initialization, e.g., set callbacks and private data previously defined. The SGX EPC cgroup will reclaim EPC pages when a usage in a cgroup reaches its or ancestor's limit. This requires a walk from the current cgroup up to the root similar to misc_cg_try_charge(). Export misc_cg_parent() to enable this walk. Signed-off-by: Kristen Carlson Accardi Co-developed-by: Haitao Huang Signed-off-by: Haitao Huang --- V6: - Make commit messages more concise and split the original patch into two(Kai) --- include/linux/misc_cgroup.h | 24 kernel/cgroup/misc.c| 21 - 2 files changed, 32 insertions(+), 13 deletions(-) diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h index 5dc509c27c3d..2a3b1f8dc669 100644 --- a/include/linux/misc_cgroup.h +++ b/include/linux/misc_cgroup.h @@ -68,6 +68,7 @@ struct misc_cg { struct misc_res res[MISC_CG_RES_TYPES]; }; +struct misc_cg *misc_cg_root(void); u64 misc_cg_res_total_usage(enum misc_res_type type); int misc_cg_set_capacity(enum misc_res_type type, u64 capacity); int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, u64 amount); @@ -87,6 +88,20 @@ static inline struct misc_cg *css_misc(struct cgroup_subsys_state *css) return css ? container_of(css, struct misc_cg, css) : NULL; } +/** + * misc_cg_parent() - Get the parent of the passed misc cgroup. + * @cgroup: cgroup whose parent needs to be fetched. + * + * Context: Any context. + * Return: + * * struct misc_cg* - Parent of the @cgroup. + * * %NULL - If @cgroup is null or the passed cgroup does not have a parent. + */ +static inline struct misc_cg *misc_cg_parent(struct misc_cg *cgroup) +{ + return cgroup ? css_misc(cgroup->css.parent) : NULL; +} + /* * get_current_misc_cg() - Find and get the misc cgroup of the current task. 
* @@ -111,6 +126,15 @@ static inline void put_misc_cg(struct misc_cg *cg) } #else /* !CONFIG_CGROUP_MISC */ +static inline struct misc_cg *misc_cg_root(void) +{ + return NULL; +} + +static inline struct misc_cg *misc_cg_parent(struct misc_cg *cg) +{ + return NULL; +} static inline u64 misc_cg_res_total_usage(enum misc_res_type type) { diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c index d971ede44ebf..fa464324ccf8 100644 --- a/kernel/cgroup/misc.c +++ b/kernel/cgroup/misc.c @@ -40,18 +40,13 @@ static struct misc_cg root_cg; static u64 misc_res_capacity[MISC_CG_RES_TYPES]; /** - * parent_misc() - Get the parent of the passed misc cgroup. - * @cgroup: cgroup whose parent needs to be fetched. - * - * Context: Any context. - * Return: - * * struct misc_cg* - Parent of the @cgroup. - * * %NULL - If @cgroup is null or the passed cgroup does not have a parent. + * misc_cg_root() - Return the root misc cgroup. */ -static struct misc_cg *parent_misc(struct misc_cg *cgroup) +struct misc_cg *misc_cg_root(void) { - return cgroup ? css_misc(cgroup->css.parent) : NULL; + return _cg; } +EXPORT_SYMBOL_GPL(misc_cg_root); /** * valid_type() - Check if @type is valid or not. 
@@ -150,7 +145,7 @@ int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, u64 amount) if (!amount) return 0; - for (i = cg; i; i = parent_misc(i)) { + for (i = cg; i; i = misc_cg_parent(i)) { res = >res[type]; new_usage = atomic64_add_return(amount, >usage); @@ -163,12 +158,12 @@ int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, u64 amount) return 0; err_charge: - for (j = i; j; j = parent_misc(j)) { + for (j = i; j; j = misc_cg_parent(j)) { atomic64_inc(>res[type].events); cgroup_file_notify(>events_file); } - for (j = cg; j != i; j = parent_misc(j)) + for (j = cg; j != i; j = misc_cg_parent(j)) misc_cg_cancel_charge(type, j, amount); misc_cg_cancel_charge(type, i, amount); return ret; @@ -190,7 +185,7 @@ void misc_cg_uncharge(enum misc_res_type type, struct misc_cg *cg, u64 amount) if (!(amount && valid_type(type) && cg)) return; - for (i = cg; i; i = parent_misc(i)) + for (i = cg; i; i = misc_cg_parent(i)) misc_cg_cancel_charge(type, i, amount); } EXPORT_SYMBOL_GPL(misc_cg_uncharge); -- 2.25.1
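The charge walk exported above follows a common hierarchical-accounting pattern: attempt the charge on every ancestor up to the root, and unwind the partial charges if any level refuses. A simplified single-threaded sketch of misc_cg_try_charge()'s structure follows; it uses plain longs instead of atomic64 ops and checks before charging rather than charge-then-cancel, so it is an illustration of the shape, not the exact kernel logic:

```c
#include <assert.h>
#include <stddef.h>

/* Toy cgroup node with a parent pointer, like misc_cg_parent() exposes. */
struct cg {
    struct cg *parent;
    long usage, max;
};

/* Charge amount on cg and all ancestors; on the first level whose max
 * would be exceeded, roll back the levels already charged. */
static int try_charge(struct cg *cg, long amount)
{
    struct cg *i, *j;

    for (i = cg; i; i = i->parent) {
        if (i->usage + amount > i->max)
            goto err_charge;
        i->usage += amount;
    }
    return 0;

err_charge:
    /* Unwind only the levels below the one that failed. */
    for (j = cg; j != i; j = j->parent)
        j->usage -= amount;
    return -1;
}
```

The two-loop structure (charge up, unwind from the bottom back to the failing level) is the same shape as the err_charge path in kernel/cgroup/misc.c, and is what the SGX EPC cgroup reuses for its own hierarchy walk.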
[PATCH v6 03/12] cgroup/misc: Add SGX EPC resource type
From: Kristen Carlson Accardi

Add SGX EPC memory, MISC_CG_RES_SGX_EPC, to be a valid resource type for the misc controller.

Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
---
V6:
- Split the original patch into this and the preceding one (Kai)
---
 include/linux/misc_cgroup.h | 4
 kernel/cgroup/misc.c        | 4
 2 files changed, 8 insertions(+)

diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h
index 2a3b1f8dc669..368f6c5fccae 100644
--- a/include/linux/misc_cgroup.h
+++ b/include/linux/misc_cgroup.h
@@ -17,6 +17,10 @@ enum misc_res_type {
        MISC_CG_RES_SEV,
        /* AMD SEV-ES ASIDs resource */
        MISC_CG_RES_SEV_ES,
+#endif
+#ifdef CONFIG_CGROUP_SGX_EPC
+       /* SGX EPC memory resource */
+       MISC_CG_RES_SGX_EPC,
 #endif
        MISC_CG_RES_TYPES
 };
diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
index fa464324ccf8..a22500851fe8 100644
--- a/kernel/cgroup/misc.c
+++ b/kernel/cgroup/misc.c
@@ -24,6 +24,10 @@ static const char *const misc_res_name[] = {
        /* AMD SEV-ES ASIDs resource */
        "sev_es",
 #endif
+#ifdef CONFIG_CGROUP_SGX_EPC
+       /* Intel SGX EPC memory bytes */
+       "sgx_epc",
+#endif
 };
-- 
2.25.1
[PATCH v6 01/12] cgroup/misc: Add per resource callbacks for CSS events
From: Kristen Carlson Accardi The misc cgroup controller (subsystem) currently does not perform resource type specific action for Cgroups Subsystem State (CSS) events: the 'css_alloc' event when a cgroup is created and the 'css_free' event when a cgroup is destroyed. Define callbacks for those events and allow resource providers to register the callbacks per resource type as needed. This will be utilized later by the EPC misc cgroup support implemented in the SGX driver. Also add per resource type private data for those callbacks to store and access resource specific data. Signed-off-by: Kristen Carlson Accardi Co-developed-by: Haitao Huang Signed-off-by: Haitao Huang --- V6: - Create ops struct for per resource callbacks (Jarkko) - Drop max_write callback (Dave, Michal) - Style fixes (Kai) --- include/linux/misc_cgroup.h | 14 ++ kernel/cgroup/misc.c| 27 --- 2 files changed, 38 insertions(+), 3 deletions(-) diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h index e799b1f8d05b..5dc509c27c3d 100644 --- a/include/linux/misc_cgroup.h +++ b/include/linux/misc_cgroup.h @@ -27,16 +27,30 @@ struct misc_cg; #include +/** + * struct misc_operations_struct: per resource callback ops. + * @alloc: invoked for resource specific initialization when cgroup is allocated. + * @free: invoked for resource specific cleanup when cgroup is deallocated. + */ +struct misc_operations_struct { + int (*alloc)(struct misc_cg *cg); + void (*free)(struct misc_cg *cg); +}; + /** * struct misc_res: Per cgroup per misc type resource * @max: Maximum limit on the resource. * @usage: Current usage of the resource. * @events: Number of times, the resource limit exceeded. + * @priv: resource specific data. + * @misc_ops: resource specific operations. 
*/ struct misc_res { u64 max; atomic64_t usage; atomic64_t events; + void *priv; + const struct misc_operations_struct *misc_ops; }; /** diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c index 79a3717a5803..d971ede44ebf 100644 --- a/kernel/cgroup/misc.c +++ b/kernel/cgroup/misc.c @@ -383,23 +383,37 @@ static struct cftype misc_cg_files[] = { static struct cgroup_subsys_state * misc_cg_alloc(struct cgroup_subsys_state *parent_css) { + struct misc_cg *parent_cg, *cg; enum misc_res_type i; - struct misc_cg *cg; + int ret; if (!parent_css) { - cg = _cg; + parent_cg = cg = _cg; } else { cg = kzalloc(sizeof(*cg), GFP_KERNEL); if (!cg) return ERR_PTR(-ENOMEM); + parent_cg = css_misc(parent_css); } for (i = 0; i < MISC_CG_RES_TYPES; i++) { WRITE_ONCE(cg->res[i].max, MAX_NUM); atomic64_set(>res[i].usage, 0); + if (parent_cg->res[i].misc_ops && parent_cg->res[i].misc_ops->alloc) { + ret = parent_cg->res[i].misc_ops->alloc(cg); + if (ret) + goto alloc_err; + } } return >css; + +alloc_err: + for (i = 0; i < MISC_CG_RES_TYPES; i++) + if (parent_cg->res[i].misc_ops && parent_cg->res[i].misc_ops->free) + cg->res[i].misc_ops->free(cg); + kfree(cg); + return ERR_PTR(ret); } /** @@ -410,7 +424,14 @@ misc_cg_alloc(struct cgroup_subsys_state *parent_css) */ static void misc_cg_free(struct cgroup_subsys_state *css) { - kfree(css_misc(css)); + struct misc_cg *cg = css_misc(css); + enum misc_res_type i; + + for (i = 0; i < MISC_CG_RES_TYPES; i++) + if (cg->res[i].misc_ops && cg->res[i].misc_ops->free) + cg->res[i].misc_ops->free(cg); + + kfree(cg); } /* Cgroup controller callbacks */ -- 2.25.1
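The css_alloc/css_free flow in this patch iterates a per-resource ops table and, when one alloc callback fails, invokes the free callbacks so resources that were already set up get released. A compact user-space sketch of that ops-table-with-unwind pattern follows (all types and names are illustrative stand-ins for misc_cg_alloc()/misc_cg_free()):

```c
#include <assert.h>
#include <stddef.h>

#define RES_TYPES 3

struct group;

/* Per-resource callback table, like struct misc_operations_struct. */
struct res_ops {
    int  (*alloc)(struct group *g, int res);
    void (*free)(struct group *g, int res);
};

struct group {
    const struct res_ops *ops[RES_TYPES];
    int allocated[RES_TYPES];
};

/* Run each registered alloc hook; if one fails, run the free hooks so
 * earlier resources are released (mirrors misc_cg_alloc's alloc_err path,
 * which also calls free for every type with a registered hook). */
static int group_init(struct group *g)
{
    int i;

    for (i = 0; i < RES_TYPES; i++) {
        if (g->ops[i] && g->ops[i]->alloc && g->ops[i]->alloc(g, i))
            goto err;
    }
    return 0;
err:
    for (i = 0; i < RES_TYPES; i++)
        if (g->ops[i] && g->ops[i]->free)
            g->ops[i]->free(g, i);
    return -1;
}
```

This requires each free callback to be safe to call even when its alloc never ran (or ran and failed), which is a constraint worth keeping in mind for resource providers registering these hooks.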
[PATCH v6 00/12] Add Cgroup support for SGX EPC memory
SGX Enclave Page Cache (EPC) memory allocations are separate from normal RAM allocations and are managed solely by the SGX subsystem. The existing cgroup memory controller cannot be used to limit or account for SGX EPC memory, which is a desirable feature in some environments, e.g., support for pod-level control in a Kubernetes cluster on a VM or bare-metal host [1,2].

This patchset implements support for sgx_epc memory within the misc cgroup controller. The user can use the misc cgroup controller to set and enforce a max limit on total EPC usage per cgroup. The implementation reports current usage and events of reaching the limit per cgroup, as well as the total system capacity.

With the EPC misc controller enabled, every EPC page allocation is accounted toward a cgroup's usage, reflected in the 'sgx_epc' entry in the 'misc.current' interface file of the cgroup. Much like normal system memory, EPC memory can be overcommitted via virtual memory techniques, and pages can be swapped out of the EPC to their backing store (normal system memory allocated via shmem, accounted by the memory controller). When the EPC usage of a cgroup reaches its hard limit (the 'sgx_epc' entry in the 'misc.max' file), the cgroup starts a reclamation process to swap out some EPC pages within the same cgroup and its descendants to their backing store.

Although the SGX architecture supports swapping for all pages, to avoid extra complexity this implementation does not support swapping for certain page types, e.g. Version Array (VA) pages, and treats them as unreclaimable pages. When the limit is reached but nothing is left in the cgroup for reclamation, i.e., only unreclaimable pages remain, any new EPC allocation in the cgroup will result in an ENOMEM error.

The EPC pages allocated for guest VMs by the virtual EPC driver are not reclaimable by the host kernel [5]. Therefore they are also treated as unreclaimable from the cgroup's point of view.
And the virtual EPC driver translates an ENOMEM error resulting from an EPC allocation request into a SIGBUS to the user process.

This work was originally authored by Sean Christopherson a few years ago, and previously modified by Kristen C. Accardi to utilize the misc cgroup controller rather than a custom controller. I have been updating the patches based on review comments since V2 [3, 4, 10], simplified the implementation/design, and fixed some stability issues found from testing.

The patches are organized as follows:
- Patches 1-3 are prerequisite misc cgroup changes for adding new APIs, structs, and resource types.
- Patch 4 implements a basic misc controller for EPC without reclamation.
- Patches 5-9 prepare for per-cgroup reclamation.
  * Separate out the existing infrastructure of tracking reclaimable pages from the global reclaimer (ksgxd) to a newly created LRU list struct.
  * Separate out reusable top-level functions for reclamation.
- Patch 10 adds support for per-cgroup reclamation.
- Patch 11 adds documentation for the EPC cgroup.
- Patch 12 adds test scripts.

I appreciate your review and providing tags if appropriate.

---
V6:
- Dropped OOM killing path, only implement non-preemptive enforcement of max limit (Dave, Michal)
- Simplified reclamation flow by taking out sgx_epc_reclaim_control, forced reclamation by ignoring 'age'.
- Restructured patches: split misc API + resource types patch and the big EPC cgroup patch (Kai, Michal)
- Dropped some Tested-by/Reviewed-by tags due to significant changes
- Added more selftests

v5:
- Replace the manual test script with a selftest script.
- Restore the "From" tag for some patches to Sean (Kai)
- Style fixes (Jarkko)

v4:
- Collected "Tested-by" from Mikko. I kept it for now as no functional changes in v4.
- Rebased on to v6.6_rc1 and reordered patches as described above.
- Separated out the bug fixes [7,8,9]. This series depends on those patches.
(Dave, Jarkko)
- Added comments in commit message to give more preview of what's to come next. (Jarkko)
- Fixed some documentation errors, gaps, style (Mikko, Randy)
- Fixed some comments, typos, style in code (Mikko, Kai)
- Patch format and background for reclaimable vs unreclaimable (Kai, Jarkko)
- Fixed typo (Pavel)
- Exclude the previous fixes/enhancements for self-tests. Patch 18 now depends on series [6]
- Use the same To list for cover and all patches. (Solo)

v3:
- Added EPC states to replace flags in sgx_epc_page struct. (Jarkko)
- Unrolled wrappers for cond_resched, list (Dave)
- Separate patches for adding reclaimable and unreclaimable lists. (Dave)
- Other improvements on patch flow, commit messages, styles. (Dave, Jarkko)
- Simplified the cgroup tree walking with plain css_for_each_descendant_pre.
- Fixed race conditions and crashes.
- OOM killer to wait for the victim enclave pages being reclaimed.
- Unblock the user by handling misc_max_write callback asynchronously.
- Rebased onto 6.4 and no longer base this series on the MCA
Re: [PATCH v2 1/3] dt-bindings: remoteproc: qcom,sm8550-pas: document the SM8650 PAS
On 30/10/2023 11:03, Neil Armstrong wrote:
> Document the DSP Peripheral Authentication Service on the SM8650 Platform.
>
> Signed-off-by: Neil Armstrong
> ---
>  .../bindings/remoteproc/qcom,sm8550-pas.yaml | 44 +-
>  1 file changed, 43 insertions(+), 1 deletion(-)

Reviewed-by: Krzysztof Kozlowski

Best regards,
Krzysztof
Re: [PATCH] eventfs: Fix kerneldoc of eventfs_remove_rec()
On 10/30/2023 9:45 PM, Steven Rostedt wrote: From: "Steven Rostedt (Google)" The eventfs_remove_rec() had some missing parameters in the kerneldoc comment above it. Also, rephrase the description a bit more to have a bit more correct grammar. Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode"); Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202310052216.4sgqaswo-...@intel.com/ Signed-off-by: Steven Rostedt (Google) Reviewed-by: Mukesh Ojha -Mukesh --- fs/tracefs/event_inode.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 5a3cc5394294..1c28e013201f 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -977,9 +977,11 @@ static void free_rcu_ei(struct rcu_head *head) /** * eventfs_remove_rec - remove eventfs dir or file from list * @ei: eventfs_inode to be removed. + * @head: the list head to place the deleted @ei and children + * @level: prevent recursion from going more than 3 levels deep. * - * This function recursively remove eventfs_inode which - * contains info of file or dir. + * This function recursively removes eventfs_inodes which + * contains info of files and/or directories. */ static void eventfs_remove_rec(struct eventfs_inode *ei, struct list_head *head, int level) {
[PATCH] eventfs: Fix kerneldoc of eventfs_remove_rec()
From: "Steven Rostedt (Google)"

The eventfs_remove_rec() function had some missing parameters in the kerneldoc comment above it. Also, rephrase the description a bit to use more correct grammar.

Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
Reported-by: kernel test robot
Closes: https://lore.kernel.org/oe-kbuild-all/202310052216.4sgqaswo-...@intel.com/
Signed-off-by: Steven Rostedt (Google)
---
 fs/tracefs/event_inode.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 5a3cc5394294..1c28e013201f 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -977,9 +977,11 @@ static void free_rcu_ei(struct rcu_head *head)
 /**
  * eventfs_remove_rec - remove eventfs dir or file from list
  * @ei: eventfs_inode to be removed.
+ * @head: the list head to place the deleted @ei and children
+ * @level: prevent recursion from going more than 3 levels deep.
  *
- * This function recursively remove eventfs_inode which
- * contains info of file or dir.
+ * This function recursively removes eventfs_inodes which
+ * contains info of files and/or directories.
  */
 static void eventfs_remove_rec(struct eventfs_inode *ei, struct list_head *head, int level)
 {
-- 
2.42.0
Re: [PATCH v6 3/4] remoteproc: zynqmp: add pm domains support
Hi Mathieu, I agree to all the comments, I will address them in next revision. Thanks, Tanmay On 10/18/23 12:38 PM, Mathieu Poirier wrote: > Good morning, > > On Thu, Oct 12, 2023 at 09:22:28PM -0700, Tanmay Shah wrote: > > Use TCM pm domains extracted from device-tree > > to power on/off TCM using general pm domain framework. > > > > Signed-off-by: Tanmay Shah > > --- > > > > Changes in v6: > > - Remove spurious change > > - Handle errors in add_pm_domains function > > - Remove redundant code to handle errors from remove_pm_domains > > > > drivers/remoteproc/xlnx_r5_remoteproc.c | 262 ++-- > > 1 file changed, 243 insertions(+), 19 deletions(-) > > > > diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c > > b/drivers/remoteproc/xlnx_r5_remoteproc.c > > index 4395edea9a64..04e95d880184 100644 > > --- a/drivers/remoteproc/xlnx_r5_remoteproc.c > > +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c > > @@ -16,6 +16,7 @@ > > #include > > #include > > #include > > +#include > > > > #include "remoteproc_internal.h" > > > > @@ -102,6 +103,12 @@ static const struct mem_bank_data > > zynqmp_tcm_banks_lockstep[] = { > > * @rproc: rproc handle > > * @pm_domain_id: RPU CPU power domain id > > * @ipi: pointer to mailbox information > > + * @num_pm_dev: number of tcm pm domain devices for this core > > + * @pm_dev1: pm domain virtual devices for power domain framework > > + * @pm_dev_link1: pm domain device links after registration > > + * @pm_dev2: used only in lockstep mode. second core's pm domain virtual > > devices > > + * @pm_dev_link2: used only in lockstep mode. 
second core's pm device > > links after > > + * registration > > */ > > struct zynqmp_r5_core { > > struct device *dev; > > @@ -111,6 +118,11 @@ struct zynqmp_r5_core { > > struct rproc *rproc; > > u32 pm_domain_id; > > struct mbox_info *ipi; > > + int num_pm_dev; > > + struct device **pm_dev1; > > s/pm_dev1/pm_dev_core0 > > > + struct device_link **pm_dev_link1; > > s/pm_dev_link1/pm_dev_core0_link; > > > + struct device **pm_dev2; > > s/pm_dev2/pm_dev_core1 > > > + struct device_link **pm_dev_link2; > > s/pm_dev_link2/pm_dev_core1_link; > > > }; > > > > /** > > @@ -575,12 +587,21 @@ static int add_tcm_carveout_split_mode(struct rproc > > *rproc) > > bank_size = r5_core->tcm_banks[i]->size; > > pm_domain_id = r5_core->tcm_banks[i]->pm_domain_id; > > > > - ret = zynqmp_pm_request_node(pm_domain_id, > > -ZYNQMP_PM_CAPABILITY_ACCESS, 0, > > -ZYNQMP_PM_REQUEST_ACK_BLOCKING); > > - if (ret < 0) { > > - dev_err(dev, "failed to turn on TCM 0x%x", > > pm_domain_id); > > - goto release_tcm_split; > > + /* > > +* If TCM information is available in device-tree then > > +* in that case, pm domain framework will power on/off TCM. > > +* In that case pm_domain_id is set to 0. If hardcode > > +* bindings from driver is used, then only this driver will > > +* use pm_domain_id. > > +*/ > > + if (pm_domain_id) { > > + ret = zynqmp_pm_request_node(pm_domain_id, > > + > > ZYNQMP_PM_CAPABILITY_ACCESS, 0, > > + > > ZYNQMP_PM_REQUEST_ACK_BLOCKING); > > + if (ret < 0) { > > + dev_err(dev, "failed to turn on TCM 0x%x", > > pm_domain_id); > > + goto release_tcm_split; > > + } > > This should go in the next patch. 
> > > } > > > > dev_dbg(dev, "TCM carveout split mode %s addr=%llx, da=0x%x, > > size=0x%lx", > > @@ -646,13 +667,16 @@ static int add_tcm_carveout_lockstep_mode(struct > > rproc *rproc) > > for (i = 0; i < num_banks; i++) { > > pm_domain_id = r5_core->tcm_banks[i]->pm_domain_id; > > > > - /* Turn on each TCM bank individually */ > > - ret = zynqmp_pm_request_node(pm_domain_id, > > -ZYNQMP_PM_CAPABILITY_ACCESS, 0, > > -ZYNQMP_PM_REQUEST_ACK_BLOCKING); > > - if (ret < 0) { > > - dev_err(dev, "failed to turn on TCM 0x%x", > > pm_domain_id); > > - goto release_tcm_lockstep; > > + if (pm_domain_id) { > > + /* Turn on each TCM bank individually */ > > + ret = zynqmp_pm_request_node(pm_domain_id, > > + > > ZYNQMP_PM_CAPABILITY_ACCESS, 0, > > + > > ZYNQMP_PM_REQUEST_ACK_BLOCKING); > > + if (ret < 0) { > > + dev_err(dev, "failed to turn on TCM 0x%x", > > +
[PATCH v2] eventfs: Save ownership and mode
From: "Steven Rostedt (Google)" Now that inodes and dentries are created on the fly, they are also reclaimed on memory pressure. Since the ownership and file mode are saved in the inode, if they are freed, any changes to the ownership and mode will be lost. To counter this, if the user changes the permissions or ownership, save them, and when creating the inodes again, restore those changes. Signed-off-by: Steven Rostedt (Google) --- Changes since v1: https://lore.kernel.org/all/20231029182345.0803e...@rorschach.local.home/ - Fixed kerneldoc reported by https://lore.kernel.org/all/202310301225.vjemupal-...@intel.com/ fs/tracefs/event_inode.c | 148 +++ fs/tracefs/internal.h| 15 2 files changed, 150 insertions(+), 13 deletions(-) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 45bddce7c747..7ad7496bd597 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -33,6 +33,15 @@ */ static DEFINE_MUTEX(eventfs_mutex); +/* Mode is unsigned short, use the upper bits for flags */ +enum { + EVENTFS_SAVE_MODE = BIT(16), + EVENTFS_SAVE_UID= BIT(17), + EVENTFS_SAVE_GID= BIT(18), +}; + +#define EVENTFS_MODE_MASK (EVENTFS_SAVE_MODE - 1) + /* * The eventfs_inode (ei) itself is protected by SRCU. It is released from * its parent's list and will have is_freed set (under eventfs_mutex). 
@@ -47,8 +56,89 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file); static int dcache_readdir_wrapper(struct file *file, struct dir_context *ctx); static int eventfs_release(struct inode *inode, struct file *file); +static void update_attr(struct eventfs_attr *attr, struct iattr *iattr) +{ + unsigned int ia_valid = iattr->ia_valid; + + if (ia_valid & ATTR_MODE) { + attr->mode = (attr->mode & ~EVENTFS_MODE_MASK) | + (iattr->ia_mode & EVENTFS_MODE_MASK) | + EVENTFS_SAVE_MODE; + } + if (ia_valid & ATTR_UID) { + attr->mode |= EVENTFS_SAVE_UID; + attr->uid = iattr->ia_uid; + } + if (ia_valid & ATTR_GID) { + attr->mode |= EVENTFS_SAVE_GID; + attr->gid = iattr->ia_gid; + } +} + +static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry, + struct iattr *iattr) +{ + const struct eventfs_entry *entry; + struct eventfs_inode *ei; + const char *name; + int ret; + + mutex_lock(_mutex); + ei = dentry->d_fsdata; + /* The LSB is set when the eventfs_inode is being freed */ + if (((unsigned long)ei & 1UL) || ei->is_freed) { + /* Do not allow changes if the event is about to be removed. */ + mutex_unlock(_mutex); + return -ENODEV; + } + + /* Preallocate the children mode array if necessary */ + if (!(dentry->d_inode->i_mode & S_IFDIR)) { + if (!ei->entry_attrs) { + ei->entry_attrs = kzalloc(sizeof(*ei->entry_attrs) * ei->nr_entries, + GFP_KERNEL); + if (!ei->entry_attrs) { + ret = -ENOMEM; + goto out; + } + } + } + + ret = simple_setattr(idmap, dentry, iattr); + if (ret < 0) + goto out; + + /* +* If this is a dir, then update the ei cache, only the file +* mode is saved in the ei->m_children, and the ownership is +* determined by the parent directory. 
+*/ + if (dentry->d_inode->i_mode & S_IFDIR) { + update_attr(>attr, iattr); + + } else { + name = dentry->d_name.name; + + for (int i = 0; i < ei->nr_entries; i++) { + entry = >entries[i]; + if (strcmp(name, entry->name) == 0) { + update_attr(>entry_attrs[i], iattr); + break; + } + } + } + out: + mutex_unlock(_mutex); + return ret; +} + static const struct inode_operations eventfs_root_dir_inode_operations = { .lookup = eventfs_root_lookup, + .setattr= eventfs_set_attr, +}; + +static const struct inode_operations eventfs_file_inode_operations = { + .setattr= eventfs_set_attr, }; static const struct file_operations eventfs_file_operations = { @@ -59,10 +149,30 @@ static const struct file_operations eventfs_file_operations = { .release= eventfs_release, }; +static void update_inode_attr(struct inode *inode, struct eventfs_attr *attr, umode_t mode) +{ + if (!attr) { + inode->i_mode = mode; + return; + } + + if (attr->mode & EVENTFS_SAVE_MODE)
Re: [PATCH] eventfs: Hold eventfs_mutex when calling callback functions
I think I figured out why claws-mail adds a backslash to quotes when sending. It allows you to add more than one Cc on a line, and I think (I haven't tested it yet) if you add multiple names with quotes on the same line, it thinks that it's a single name and will backslash internal quotes. Hmm. Anyway, I had to remove all quoted names to reply to this email. Hopefully claws-mail doesn't screw it up again :-p I may also need to upgrade claws-mail, as I build my own because I have some things enabled that the distro version does not include. -- Steve On Mon, 30 Oct 2023 11:40:47 -0400 Steven Rostedt wrote: > From: "Steven Rostedt (Google)" > > The callback function that is used to create inodes and dentries is not > protected by anything and the data that is passed to it could become > stale. After eventfs_remove_dir() is called by the tracing system, it is > free to remove the events that are associated to that directory. > Unfortunately, that means the callbacks must not be called after that. > > CPU0 CPU1 > > eventfs_root_lookup() { >eventfs_remove_dir() { > mutex_lock(_mutex); > ei->is_freed = set; > mutex_unlock(_mutex); >} >kfree(event_call); > > for (...) { > entry = >entries[i]; > r = entry->callback() { > call = data;// call == event_call above > if (call->flags ...) > > [ USE AFTER FREE BUG ] > > The safest way to protect this is to wrap the callback with: > > mutex_lock(_mutex); > if (!ei->is_freed) > r = entry->callback(); > else > r = -1; > mutex_unlock(_mutex); > > This will make sure that the callback will not be called after it is > freed. But now it needs to be known that the callback is called while > holding internal eventfs locks, and that it must not call back into the > eventfs / tracefs system. There's no reason it should anyway, but document > that as well. 
> > Link: > https://lore.kernel.org/all/CA+G9fYu9GOEbD=rR5eMR-=HJ8H6rMsbzDC2ZY5=y50wpwae...@mail.gmail.com/ > > Reported-by: Linux Kernel Functional Testing > Reported-by: Naresh Kamboju > Signed-off-by: Steven Rostedt (Google) > --- > fs/tracefs/event_inode.c | 22 ++-- > include/linux/tracefs.h | 43 > 2 files changed, 63 insertions(+), 2 deletions(-) > > diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c > index 7ad7496bd597..5a3cc5394294 100644 > --- a/fs/tracefs/event_inode.c > +++ b/fs/tracefs/event_inode.c > @@ -609,7 +609,13 @@ static struct dentry *eventfs_root_lookup(struct inode > *dir, > entry = >entries[i]; > if (strcmp(name, entry->name) == 0) { > void *cdata = data; > - r = entry->callback(name, , , ); > + mutex_lock(_mutex); > + /* If ei->is_freed, then the event itself may be too */ > + if (!ei->is_freed) > + r = entry->callback(name, , , ); > + else > + r = -1; > + mutex_unlock(_mutex); > if (r <= 0) > continue; > ret = simple_lookup(dir, dentry, flags); > @@ -743,7 +749,13 @@ static int dcache_dir_open_wrapper(struct inode *inode, > struct file *file) > void *cdata = data; > entry = >entries[i]; > name = entry->name; > - r = entry->callback(name, , , ); > + mutex_lock(_mutex); > + /* If ei->is_freed, then the event itself may be too */ > + if (!ei->is_freed) > + r = entry->callback(name, , , ); > + else > + r = -1; > + mutex_unlock(_mutex); > if (r <= 0) > continue; > d = create_file_dentry(ei, i, parent, name, mode, cdata, fops, > false); > @@ -821,6 +833,10 @@ static void free_ei(struct eventfs_inode *ei) > * data = A pointer to @data, and the callback may replace it, which will > * cause the file created to pass the new data to the open() call. > * fops = the fops to use for the created file. > + * > + * NB. @callback is called while holding internal locks of the eventfs > + * system. The callback must not call any code that might also call into > + * the tracefs or eventfs system or it will risk creating a deadlock. 
> */ > struct eventfs_inode *eventfs_create_dir(const char *name, struct > eventfs_inode *parent, >const struct eventfs_entry *entries, > @@ -880,6 +896,8 @@ struct eventfs_inode
[PATCH] eventfs: Hold eventfs_mutex when calling callback functions
From: "Steven Rostedt (Google)" The callback function that is used to create inodes and dentries is not protected by anything and the data that is passed to it could become stale. After eventfs_remove_dir() is called by the tracing system, it is free to remove the events that are associated to that directory. Unfortunately, that means the callbacks must not be called after that. CPU0 CPU1 eventfs_root_lookup() { eventfs_remove_dir() { mutex_lock(_mutex); ei->is_freed = set; mutex_unlock(_mutex); } kfree(event_call); for (...) { entry = >entries[i]; r = entry->callback() { call = data; // call == event_call above if (call->flags ...) [ USE AFTER FREE BUG ] The safest way to protect this is to wrap the callback with: mutex_lock(_mutex); if (!ei->is_freed) r = entry->callback(); else r = -1; mutex_unlock(_mutex); This will make sure that the callback will not be called after it is freed. But now it needs to be known that the callback is called while holding internal eventfs locks, and that it must not call back into the eventfs / tracefs system. There's no reason it should anyway, but document that as well. 
Link: https://lore.kernel.org/all/CA+G9fYu9GOEbD=rR5eMR-=HJ8H6rMsbzDC2ZY5=y50wpwae...@mail.gmail.com/ Reported-by: Linux Kernel Functional Testing Reported-by: Naresh Kamboju Signed-off-by: Steven Rostedt (Google) --- fs/tracefs/event_inode.c | 22 ++-- include/linux/tracefs.h | 43 2 files changed, 63 insertions(+), 2 deletions(-) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 7ad7496bd597..5a3cc5394294 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -609,7 +609,13 @@ static struct dentry *eventfs_root_lookup(struct inode *dir, entry = >entries[i]; if (strcmp(name, entry->name) == 0) { void *cdata = data; - r = entry->callback(name, , , ); + mutex_lock(_mutex); + /* If ei->is_freed, then the event itself may be too */ + if (!ei->is_freed) + r = entry->callback(name, , , ); + else + r = -1; + mutex_unlock(_mutex); if (r <= 0) continue; ret = simple_lookup(dir, dentry, flags); @@ -743,7 +749,13 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file) void *cdata = data; entry = >entries[i]; name = entry->name; - r = entry->callback(name, , , ); + mutex_lock(_mutex); + /* If ei->is_freed, then the event itself may be too */ + if (!ei->is_freed) + r = entry->callback(name, , , ); + else + r = -1; + mutex_unlock(_mutex); if (r <= 0) continue; d = create_file_dentry(ei, i, parent, name, mode, cdata, fops, false); @@ -821,6 +833,10 @@ static void free_ei(struct eventfs_inode *ei) * data = A pointer to @data, and the callback may replace it, which will * cause the file created to pass the new data to the open() call. * fops = the fops to use for the created file. + * + * NB. @callback is called while holding internal locks of the eventfs + * system. The callback must not call any code that might also call into + * the tracefs or eventfs system or it will risk creating a deadlock. 
*/ struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode *parent, const struct eventfs_entry *entries, @@ -880,6 +896,8 @@ struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode * @data: The default data to pass to the files (an entry may override it). * * This function creates the top of the trace event directory. + * + * See eventfs_create_dir() for use of @entries. */ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry *parent, const struct eventfs_entry *entries, diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h index 13359b1a35d1..7a5fe17b6bf9 100644 --- a/include/linux/tracefs.h +++ b/include/linux/tracefs.h @@ -23,9 +23,52 @@ struct file_operations; struct eventfs_file; +/** + * eventfs_callback - A callback function to create dynamic files in eventfs + * @name: The
Re: [PATCH 7/9] arm64: dts: qcom: sc7280: Add CDSP node
Hi, On Mon, Oct 30, 2023 at 7:43 AM Luca Weiss wrote: > > On Mon Oct 30, 2023 at 3:11 PM CET, Doug Anderson wrote: > > Hi, > > > > On Mon, Oct 30, 2023 at 2:12 AM Luca Weiss wrote: > > > > > > On Mon Oct 30, 2023 at 10:04 AM CET, Mukesh Ojha wrote: > > > > > > > > > > > > On 10/27/2023 7:50 PM, Luca Weiss wrote: > > > > > Add the node for the ADSP found on the SC7280 SoC, using standard > > > > > Qualcomm firmware. > > > > > > > > > > The memory region for sc7280-chrome-common.dtsi is taken from msm-5.4 > > > > > yupik.dtsi since the other areas also seem to match that file there, > > > > > though I cannot be sure there. > > > > > > > > > > Signed-off-by: Luca Weiss > > > > > --- > > > > > arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi | 5 + > > > > > arch/arm64/boot/dts/qcom/sc7280.dtsi | 138 > > > > > + > > > > > 2 files changed, 143 insertions(+) > > > > > > > > > > diff --git a/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi > > > > > b/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi > > > > > index eb55616e0892..6e5a9d4c1fda 100644 > > > > > --- a/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi > > > > > +++ b/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi > > > > > @@ -29,6 +29,11 @@ adsp_mem: memory@8670 { > > > > > no-map; > > > > > }; > > > > > > > > > > + cdsp_mem: memory@88f0 { > > > > > + reg = <0x0 0x88f0 0x0 0x1e0>; > > > > > + no-map; > > > > > + }; > > > > > + > > > > > > > > Just a question, why to do it here, if chrome does not use this ? > > > > > > Other memory regions in sc7280.dtsi also get referenced but not actually > > > defined in that file, like mpss_mem and wpss_mem. Alternatively we can > > > also try and solve this differently, but then we should probably also > > > adjust mpss and wpss to be consistent. > > > > > > Apart from either declaring cdsp_mem in sc7280.dtsi or > > > "/delete-property/ memory-region;" for CDSP I don't really have better > > > ideas though. 
> > > > > > I also imagine these ChromeOS devices will want to enable cdsp at some > > > point but I don't know any plans there. > > > > Given that "remoteproc_cdsp" has status "disabled" in the dtsi, it > > feels like the dtsi shouldn't be reserving memory. I guess maybe > > memory regions can't be status "disabled"? > > Hi Doug, > > That's how it works in really any qcom dtsi though. I think in most/all > cases normally the reserved-memory is already declared in the SoC dtsi > file and also used with the memory-region property. > > I wouldn't be against adjusting sc7280.dtsi to match the way it's done > in the other dtsi files though, so to have all the required labels > already defined in the dtsi so it doesn't rely on these labels being > defined in the device dts. > > In other words, currently if you include sc7280.dtsi and try to build, > you first have to define the labels mpss_mem and wpss_mem (after this > patch series also cdsp_mem and adsp_mem) for it to build. > > I'm quite neutral either way, let me know :) I haven't done a ton of thinking about this, so if I'm spouting gibberish then feel free to ignore me. :-P It just feels weird that when all the "dtsi" files are combined and you look at what you end up on a sc7280 Chrome board that you'll be reserving 32MB of memory for a device that's set (in the same device tree) to be "disabled", right? ...the 32MB is completely wasted, I think. If we wanted to enable the CDSP we'd have to modify the device tree anyway, so it seems like that same modification would set the CDSP to "okay" and also reserve the memory... In that vein, it seems like maybe you could move the "cdsp_mem" to the SoC .dsti file with a status of "disabled". . I guess we don't do that elsewhere, but maybe we should be? As far as I can tell without testing it (just looking at fdt_scan_reserved_mem()) this should work... -Doug
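Doug's suggestion — move `cdsp_mem` into the SoC dtsi but mark it `status = "disabled"` so the 32MB is only reserved on boards that actually enable the CDSP — could look roughly like the fragment below. This is a hypothetical, untested sketch: the full `reg` values are an assumption (the quoted mail shows them truncated), and whether `fdt_scan_reserved_mem()` honors `status` here is exactly the open question in the thread.

```dts
reserved-memory {
	#address-cells = <2>;
	#size-cells = <2>;
	ranges;

	cdsp_mem: memory@88f00000 {
		reg = <0x0 0x88f00000 0x0 0x1e00000>;
		no-map;
		/* Not reserved unless a board dts flips this to "okay"
		 * together with the remoteproc_cdsp node. */
		status = "disabled";
	};
};
```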
Re: [PATCH 7/9] arm64: dts: qcom: sc7280: Add CDSP node
On Mon Oct 30, 2023 at 3:11 PM CET, Doug Anderson wrote: > Hi, > > On Mon, Oct 30, 2023 at 2:12 AM Luca Weiss wrote: > > > > On Mon Oct 30, 2023 at 10:04 AM CET, Mukesh Ojha wrote: > > > > > > > > > On 10/27/2023 7:50 PM, Luca Weiss wrote: > > > > Add the node for the ADSP found on the SC7280 SoC, using standard > > > > Qualcomm firmware. > > > > > > > > The memory region for sc7280-chrome-common.dtsi is taken from msm-5.4 > > > > yupik.dtsi since the other areas also seem to match that file there, > > > > though I cannot be sure there. > > > > > > > > Signed-off-by: Luca Weiss > > > > --- > > > > arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi | 5 + > > > > arch/arm64/boot/dts/qcom/sc7280.dtsi | 138 > > > > + > > > > 2 files changed, 143 insertions(+) > > > > > > > > diff --git a/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi > > > > b/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi > > > > index eb55616e0892..6e5a9d4c1fda 100644 > > > > --- a/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi > > > > +++ b/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi > > > > @@ -29,6 +29,11 @@ adsp_mem: memory@8670 { > > > > no-map; > > > > }; > > > > > > > > + cdsp_mem: memory@88f0 { > > > > + reg = <0x0 0x88f0 0x0 0x1e0>; > > > > + no-map; > > > > + }; > > > > + > > > > > > Just a question, why to do it here, if chrome does not use this ? > > > > Other memory regions in sc7280.dtsi also get referenced but not actually > > defined in that file, like mpss_mem and wpss_mem. Alternatively we can > > also try and solve this differently, but then we should probably also > > adjust mpss and wpss to be consistent. > > > > Apart from either declaring cdsp_mem in sc7280.dtsi or > > "/delete-property/ memory-region;" for CDSP I don't really have better > > ideas though. > > > > I also imagine these ChromeOS devices will want to enable cdsp at some > > point but I don't know any plans there. 
> > Given that "remoteproc_cdsp" has status "disabled" in the dtsi, it > feels like the dtsi shouldn't be reserving memory. I guess maybe > memory regions can't be status "disabled"? Hi Doug, That's how it works in really any qcom dtsi though. I think in most/all cases normally the reserved-memory is already declared in the SoC dtsi file and also used with the memory-region property. I wouldn't be against adjusting sc7280.dtsi to match the way it's done in the other dtsi files though, so to have all the required labels already defined in the dtsi so it doesn't rely on these labels being defined in the device dts. In other words, currently if you include sc7280.dtsi and try to build, you first have to define the labels mpss_mem and wpss_mem (after this patch series also cdsp_mem and adsp_mem) for it to build. I'm quite neutral either way, let me know :) Regards Luca > > -Doug
Re: [PATCH v2] eventfs: Test for ei->is_freed when accessing ei->dentry
On Mon, 30 Oct 2023 12:37:08 +0530 Naresh Kamboju wrote: > > I have tested the linux-trace.git trace/core and run selftests ftrace > the reported kernel panic [1] & [2] has been fixed but found Good to know. Can I add "Tested-by" from you for that bug fix? > "general protection fault" at kernel/trace/trace_events.c:2439. Can you test with the below patch? Also, can I ask what are you testing this on that makes it trigger so easily? As I'm not able to trigger these in my tests, even though they are indeed bugs. -- Steve diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 7ad7496bd597..7a0b54ddda24 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -609,7 +609,13 @@ static struct dentry *eventfs_root_lookup(struct inode *dir, entry = >entries[i]; if (strcmp(name, entry->name) == 0) { void *cdata = data; - r = entry->callback(name, , , ); + mutex_lock(_mutex); + /* If ei->is_freed, then the event itself may be too */ + if (!ei->is_freed) + r = entry->callback(name, , , ); + else + r = -1; + mutex_unlock(_mutex); if (r <= 0) continue; ret = simple_lookup(dir, dentry, flags); @@ -743,7 +749,13 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file) void *cdata = data; entry = >entries[i]; name = entry->name; - r = entry->callback(name, , , ); + mutex_lock(_mutex); + /* If ei->is_freed, then the event itself may be too */ + if (!ei->is_freed) + r = entry->callback(name, , , ); + else + r = -1; + mutex_unlock(_mutex); if (r <= 0) continue; d = create_file_dentry(ei, i, parent, name, mode, cdata, fops, false);
Re: [PATCH 7/9] arm64: dts: qcom: sc7280: Add CDSP node
Hi, On Mon, Oct 30, 2023 at 2:12 AM Luca Weiss wrote: > > On Mon Oct 30, 2023 at 10:04 AM CET, Mukesh Ojha wrote: > > > > > > On 10/27/2023 7:50 PM, Luca Weiss wrote: > > > Add the node for the ADSP found on the SC7280 SoC, using standard > > > Qualcomm firmware. > > > > > > The memory region for sc7280-chrome-common.dtsi is taken from msm-5.4 > > > yupik.dtsi since the other areas also seem to match that file there, > > > though I cannot be sure there. > > > > > > Signed-off-by: Luca Weiss > > > --- > > > arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi | 5 + > > > arch/arm64/boot/dts/qcom/sc7280.dtsi | 138 > > > + > > > 2 files changed, 143 insertions(+) > > > > > > diff --git a/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi > > > b/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi > > > index eb55616e0892..6e5a9d4c1fda 100644 > > > --- a/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi > > > +++ b/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi > > > @@ -29,6 +29,11 @@ adsp_mem: memory@8670 { > > > no-map; > > > }; > > > > > > + cdsp_mem: memory@88f0 { > > > + reg = <0x0 0x88f0 0x0 0x1e0>; > > > + no-map; > > > + }; > > > + > > > > Just a question, why to do it here, if chrome does not use this ? > > Other memory regions in sc7280.dtsi also get referenced but not actually > defined in that file, like mpss_mem and wpss_mem. Alternatively we can > also try and solve this differently, but then we should probably also > adjust mpss and wpss to be consistent. > > Apart from either declaring cdsp_mem in sc7280.dtsi or > "/delete-property/ memory-region;" for CDSP I don't really have better > ideas though. > > I also imagine these ChromeOS devices will want to enable cdsp at some > point but I don't know any plans there. Given that "remoteproc_cdsp" has status "disabled" in the dtsi, it feels like the dtsi shouldn't be reserving memory. I guess maybe memory regions can't be status "disabled"? -Doug
Re: [PATCH for-next] tracing/kprobes: Add symbol counting check when module loads
On Sun, 29 Oct 2023 12:10:46 +0900 "Masami Hiramatsu (Google)" wrote: > From: Masami Hiramatsu (Google) > > Check the number of probe target symbols in the target module when > the module is loaded. If the probe is not on the unique name symbols > in the module, it will be rejected at that point. > > Note that the symbol which has a unique name in the target module > will be accepted even if there are same-name symbols in the > kernel or other modules. > > Signed-off-by: Masami Hiramatsu (Google) > --- > kernel/trace/trace_kprobe.c | 112 > ++- > 1 file changed, 68 insertions(+), 44 deletions(-) Reviewed-by: Steven Rostedt (Google) -- Steve

Re: [PATCH v2 2/3] remoteproc: qcom: pas: make region assign more generic
On 10/30/2023 3:33 PM, Neil Armstrong wrote: The current memory region assign only supports a single memory region. But new platforms introduces more regions to make the memory requirements more flexible for various use cases. Those new platforms also shares the memory region between the DSP and HLOS. To handle this, make the region assign more generic in order to support more than a single memory region and also permit setting the regions permissions as shared. Signed-off-by: Neil Armstrong --- drivers/remoteproc/qcom_q6v5_pas.c | 102 - 1 file changed, 66 insertions(+), 36 deletions(-) diff --git a/drivers/remoteproc/qcom_q6v5_pas.c b/drivers/remoteproc/qcom_q6v5_pas.c index 913a5d2068e8..4829fd26e17d 100644 --- a/drivers/remoteproc/qcom_q6v5_pas.c +++ b/drivers/remoteproc/qcom_q6v5_pas.c @@ -33,6 +33,8 @@ #define ADSP_DECRYPT_SHUTDOWN_DELAY_MS 100 +#define MAX_ASSIGN_COUNT 2 + struct adsp_data { int crash_reason_smem; const char *firmware_name; @@ -51,6 +53,9 @@ struct adsp_data { int ssctl_id; int region_assign_idx; + int region_assign_count; + bool region_assign_shared; + int region_assign_vmid; }; struct qcom_adsp { @@ -87,15 +92,18 @@ struct qcom_adsp { phys_addr_t dtb_mem_phys; phys_addr_t mem_reloc; phys_addr_t dtb_mem_reloc; - phys_addr_t region_assign_phys; + phys_addr_t region_assign_phys[MAX_ASSIGN_COUNT]; void *mem_region; void *dtb_mem_region; size_t mem_size; size_t dtb_mem_size; - size_t region_assign_size; + size_t region_assign_size[MAX_ASSIGN_COUNT]; int region_assign_idx; - u64 region_assign_perms; + int region_assign_count; + bool region_assign_shared; + int region_assign_vmid; + u64 region_assign_perms[MAX_ASSIGN_COUNT]; struct qcom_rproc_glink glink_subdev; struct qcom_rproc_subdev smd_subdev; @@ -590,37 +598,52 @@ static int adsp_alloc_memory_region(struct qcom_adsp *adsp) static int adsp_assign_memory_region(struct qcom_adsp *adsp) { - struct reserved_mem *rmem = NULL; - struct qcom_scm_vmperm perm; + struct qcom_scm_vmperm 
perm[MAX_ASSIGN_COUNT]; + unsigned int perm_size = 1; AFAICS, not need of initialization. struct device_node *node; - int ret; + int offset, ret; Nit: one variable per line. if (!adsp->region_assign_idx) Not related to this patch.. Should not this be valid only for > 1 ? return 0; - node = of_parse_phandle(adsp->dev->of_node, "memory-region", adsp->region_assign_idx); - if (node) - rmem = of_reserved_mem_lookup(node); - of_node_put(node); - if (!rmem) { - dev_err(adsp->dev, "unable to resolve shareable memory-region\n"); - return -EINVAL; - } + for (offset = 0; offset < adsp->region_assign_count; ++offset) { + struct reserved_mem *rmem = NULL; + + node = of_parse_phandle(adsp->dev->of_node, "memory-region", + adsp->region_assign_idx + offset); + if (node) + rmem = of_reserved_mem_lookup(node); + of_node_put(node); + if (!rmem) { + dev_err(adsp->dev, "unable to resolve shareable memory-region index %d\n", + offset); + return -EINVAL; > + } - perm.vmid = QCOM_SCM_VMID_MSS_MSA; - perm.perm = QCOM_SCM_PERM_RW; + if (adsp->region_assign_shared) { + perm[0].vmid = QCOM_SCM_VMID_HLOS; + perm[0].perm = QCOM_SCM_PERM_RW; + perm[1].vmid = adsp->region_assign_vmid; + perm[1].perm = QCOM_SCM_PERM_RW; + perm_size = 2; + } else { + perm[0].vmid = adsp->region_assign_vmid; + perm[0].perm = QCOM_SCM_PERM_RW; + perm_size = 1; + } - adsp->region_assign_phys = rmem->base; - adsp->region_assign_size = rmem->size; - adsp->region_assign_perms = BIT(QCOM_SCM_VMID_HLOS); + adsp->region_assign_phys[offset] = rmem->base; + adsp->region_assign_size[offset] = rmem->size; + adsp->region_assign_perms[offset] = BIT(QCOM_SCM_VMID_HLOS); Do we need array for this, is this changing ? - ret = qcom_scm_assign_mem(adsp->region_assign_phys, - adsp->region_assign_size, - >region_assign_perms, - , 1); - if (ret < 0) { - dev_err(adsp->dev, "assign memory failed\n"); - return ret; + ret = qcom_scm_assign_mem(adsp->region_assign_phys[offset], +
Re: [PATCH v7 2/3] mm/memory_hotplug: split memmap_on_memory requests across memblocks
On 26.10.23 00:44, Vishal Verma wrote: The MHP_MEMMAP_ON_MEMORY flag for hotplugged memory is restricted to 'memblock_size' chunks of memory being added. Adding a larger span of memory precludes memmap_on_memory semantics. For users of hotplug such as kmem, large amounts of memory might get added from the CXL subsystem. In some cases, this amount may exceed the available 'main memory' to store the memmap for the memory being added. In this case, it is useful to have a way to place the memmap on the memory being added, even if it means splitting the addition into memblock-sized chunks. Change add_memory_resource() to loop over memblock-sized chunks of memory if caller requested memmap_on_memory, and if other conditions for it are met. Teach try_remove_memory() to also expect that a memory range being removed might have been split up into memblock sized chunks, and to loop through those as needed. This does preclude being able to use PUD mappings in the direct map; a proposal to how this could be optimized in the future is laid out here[1]. Almost there, I think :) +static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group, + u64 start, u64 size) +{ + unsigned long memblock_size = memory_block_size_bytes(); + u64 cur_start; + int ret; + + for (cur_start = start; cur_start < start + size; +cur_start += memblock_size) { + struct mhp_params params = { .pgprot = +pgprot_mhp(PAGE_KERNEL) }; + struct vmem_altmap mhp_altmap = { + .base_pfn = PHYS_PFN(cur_start), + .end_pfn = PHYS_PFN(cur_start + memblock_size - 1), + }; + + mhp_altmap.free = memory_block_memmap_on_memory_pages(); + params.altmap = kmemdup(_altmap, sizeof(struct vmem_altmap), + GFP_KERNEL); + if (!params.altmap) + return -ENOMEM; Best to cleanup here instead of handling it in the caller [as noted by Vishal, we might not be doing that yet]. Using remove_memory_blocks_and_altmaps() on the fully processed range sounds reasonable. 
maybe something like

	ret = arch_add_memory(nid, cur_start, memblock_size, &params);
	if (ret) {
		kfree(params.altmap);
		goto out;
	}

	ret = create_memory_block_devices(cur_start, memblock_size,
					  params.altmap, group);
	if (ret) {
		arch_remove_memory(cur_start, memblock_size, NULL);
		kfree(params.altmap);
		goto out;
	}

	if (ret && cur_start != start)
		remove_memory_blocks_and_altmaps(start, cur_start - start);
	return ret;

> +
> +		/* call arch's memory hotadd */
> +		ret = arch_add_memory(nid, cur_start, memblock_size, &params);
> +		if (ret < 0) {
> +			kfree(params.altmap);
> +			return ret;
> +		}
> +
> +		/* create memory block devices after memory was added */
> +		ret = create_memory_block_devices(cur_start, memblock_size,
> +						  params.altmap, group);
> +		if (ret) {
> +			arch_remove_memory(cur_start, memblock_size, NULL);
> +			kfree(params.altmap);
> +			return ret;
> +		}
> +	}
> +
> +	return 0;
> +}
> +

[...]

>  static int check_cpu_on_node(int nid)
>  {
>  	int cpu;
> @@ -2146,11 +2186,69 @@ void try_offline_node(int nid)
>  }
>  EXPORT_SYMBOL(try_offline_node);
>
> -static int __ref try_remove_memory(u64 start, u64 size)
> +static void __ref remove_memory_blocks_and_altmaps(u64 start, u64 size)
>  {
> -	struct memory_block *mem;
> -	int rc = 0, nid = NUMA_NO_NODE;
> +	unsigned long memblock_size = memory_block_size_bytes();
>  	struct vmem_altmap *altmap = NULL;
> +	struct memory_block *mem;
> +	u64 cur_start;
> +	int rc;
> +
> +	/*
> +	 * For memmap_on_memory, the altmaps could have been added on
> +	 * a per-memblock basis. Loop through the entire range if so,
> +	 * and remove each memblock and its altmap.
> +	 */

/*
 * altmaps were added on a per-memblock basis; we have to process
 * each individual memory block.
 */

> +	for (cur_start = start; cur_start < start + size;
> +	     cur_start += memblock_size) {
> +		rc = walk_memory_blocks(cur_start, memblock_size, &mem,
> +					test_has_altmap_cb);
> +		if (rc) {
> +			altmap = mem->altmap;
> +			/*
> +			 * Mark altmap NULL so that we can add a debug
> +			 * check on memblock free.
> +			 */
> +			mem->altmap = NULL;
> +		}

Simpler (especially, we know that there must be an altmap):

	mem =
[PATCH v2 2/3] remoteproc: qcom: pas: make region assign more generic
The current memory region assign only supports a single memory region.

But new platforms introduce more regions to make the memory requirements more flexible for various use cases. Those new platforms also share the memory region between the DSP and HLOS.

To handle this, make the region assign more generic in order to support more than a single memory region and also permit setting the regions permissions as shared.

Signed-off-by: Neil Armstrong
---
 drivers/remoteproc/qcom_q6v5_pas.c | 102 -
 1 file changed, 66 insertions(+), 36 deletions(-)

diff --git a/drivers/remoteproc/qcom_q6v5_pas.c b/drivers/remoteproc/qcom_q6v5_pas.c
index 913a5d2068e8..4829fd26e17d 100644
--- a/drivers/remoteproc/qcom_q6v5_pas.c
+++ b/drivers/remoteproc/qcom_q6v5_pas.c
@@ -33,6 +33,8 @@
 
 #define ADSP_DECRYPT_SHUTDOWN_DELAY_MS	100
 
+#define MAX_ASSIGN_COUNT 2
+
 struct adsp_data {
 	int crash_reason_smem;
 	const char *firmware_name;
@@ -51,6 +53,9 @@ struct adsp_data {
 	int ssctl_id;
 
 	int region_assign_idx;
+	int region_assign_count;
+	bool region_assign_shared;
+	int region_assign_vmid;
 };
 
 struct qcom_adsp {
@@ -87,15 +92,18 @@ struct qcom_adsp {
 	phys_addr_t dtb_mem_phys;
 	phys_addr_t mem_reloc;
 	phys_addr_t dtb_mem_reloc;
-	phys_addr_t region_assign_phys;
+	phys_addr_t region_assign_phys[MAX_ASSIGN_COUNT];
 	void *mem_region;
 	void *dtb_mem_region;
 	size_t mem_size;
 	size_t dtb_mem_size;
-	size_t region_assign_size;
+	size_t region_assign_size[MAX_ASSIGN_COUNT];
 
 	int region_assign_idx;
-	u64 region_assign_perms;
+	int region_assign_count;
+	bool region_assign_shared;
+	int region_assign_vmid;
+	u64 region_assign_perms[MAX_ASSIGN_COUNT];
 
 	struct qcom_rproc_glink glink_subdev;
 	struct qcom_rproc_subdev smd_subdev;
@@ -590,37 +598,52 @@ static int adsp_alloc_memory_region(struct qcom_adsp *adsp)
 
 static int adsp_assign_memory_region(struct qcom_adsp *adsp)
 {
-	struct reserved_mem *rmem = NULL;
-	struct qcom_scm_vmperm perm;
+	struct qcom_scm_vmperm perm[MAX_ASSIGN_COUNT];
+	unsigned int perm_size = 1;
 	struct device_node *node;
-	int ret;
+	int offset, ret;
 
 	if (!adsp->region_assign_idx)
 		return 0;
 
-	node = of_parse_phandle(adsp->dev->of_node, "memory-region", adsp->region_assign_idx);
-	if (node)
-		rmem = of_reserved_mem_lookup(node);
-	of_node_put(node);
-	if (!rmem) {
-		dev_err(adsp->dev, "unable to resolve shareable memory-region\n");
-		return -EINVAL;
-	}
+	for (offset = 0; offset < adsp->region_assign_count; ++offset) {
+		struct reserved_mem *rmem = NULL;
+
+		node = of_parse_phandle(adsp->dev->of_node, "memory-region",
+					adsp->region_assign_idx + offset);
+		if (node)
+			rmem = of_reserved_mem_lookup(node);
+		of_node_put(node);
+		if (!rmem) {
+			dev_err(adsp->dev, "unable to resolve shareable memory-region index %d\n",
+				offset);
+			return -EINVAL;
+		}
 
-	perm.vmid = QCOM_SCM_VMID_MSS_MSA;
-	perm.perm = QCOM_SCM_PERM_RW;
+		if (adsp->region_assign_shared)  {
+			perm[0].vmid = QCOM_SCM_VMID_HLOS;
+			perm[0].perm = QCOM_SCM_PERM_RW;
+			perm[1].vmid = adsp->region_assign_vmid;
+			perm[1].perm = QCOM_SCM_PERM_RW;
+			perm_size = 2;
+		} else {
+			perm[0].vmid = adsp->region_assign_vmid;
+			perm[0].perm = QCOM_SCM_PERM_RW;
+			perm_size = 1;
+		}
 
-	adsp->region_assign_phys = rmem->base;
-	adsp->region_assign_size = rmem->size;
-	adsp->region_assign_perms = BIT(QCOM_SCM_VMID_HLOS);
+		adsp->region_assign_phys[offset] = rmem->base;
+		adsp->region_assign_size[offset] = rmem->size;
+		adsp->region_assign_perms[offset] = BIT(QCOM_SCM_VMID_HLOS);
 
-	ret = qcom_scm_assign_mem(adsp->region_assign_phys,
-				  adsp->region_assign_size,
-				  &adsp->region_assign_perms,
-				  &perm, 1);
-	if (ret < 0) {
-		dev_err(adsp->dev, "assign memory failed\n");
-		return ret;
+		ret = qcom_scm_assign_mem(adsp->region_assign_phys[offset],
+					  adsp->region_assign_size[offset],
+					  &adsp->region_assign_perms[offset],
+					  perm, perm_size);
+		if (ret < 0) {
[PATCH v2 0/3] remoteproc: qcom: Introduce DSP support for SM8650
Add the bindings and driver changes for DSP support on the SM8650 platform in order to enable the aDSP, cDSP and MPSS subsystems to boot.

Compared to SM8550, where SM8650 uses the same dual firmware files (dtb file and main firmware), the memory zones requirement has changed:
- cDSP: now requires 2 memory zones to be configured as shared between the cDSP and the HLOS subsystem
- MPSS: in addition to the memory zone required for the SM8550 MPSS, another one is required to be configured for MPSS usage only.

In order to handle this and avoid code duplication, the region_assign_* code path has been made more generic and is able to handle multiple DSP-only memory zones (for MPSS) or DSP-HLOS shared memory zones (cDSP) in the same region_assign functions.

Dependencies: None

For convenience, a regularly refreshed linux-next based git tree containing all the SM8650 related work is available at:
https://git.codelinaro.org/neil.armstrong/linux/-/tree/topic/sm8650/upstream/integ

Signed-off-by: Neil Armstrong
---
Changes in v2:
- Fixed sm8650 entries in allOf:if:then to match Krzysztof's comments
- Collected reviewed-by on patch 3
- Link to v1: https://lore.kernel.org/r/20231025-topic-sm8650-upstream-remoteproc-v1-0-a8d20e4ce...@linaro.org

---
Neil Armstrong (3):
  dt-bindings: remoteproc: qcom,sm8550-pas: document the SM8650 PAS
  remoteproc: qcom: pas: make region assign more generic
  remoteproc: qcom: pas: Add SM8650 remoteproc support

 .../bindings/remoteproc/qcom,sm8550-pas.yaml |  44 +-
 drivers/remoteproc/qcom_q6v5_pas.c           | 152 -
 2 files changed, 159 insertions(+), 37 deletions(-)
---
base-commit: fe1998aa935b44ef873193c0772c43bce74f17dc
change-id: 20231016-topic-sm8650-upstream-remoteproc-66a87eeb6fee

Best regards,
--
Neil Armstrong
[PATCH v2 1/3] dt-bindings: remoteproc: qcom,sm8550-pas: document the SM8650 PAS
Document the DSP Peripheral Authentication Service on the SM8650 Platform. Signed-off-by: Neil Armstrong --- .../bindings/remoteproc/qcom,sm8550-pas.yaml | 44 +- 1 file changed, 43 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml b/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml index 58120829fb06..4e8ce9e7e9fa 100644 --- a/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml +++ b/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml @@ -19,6 +19,9 @@ properties: - qcom,sm8550-adsp-pas - qcom,sm8550-cdsp-pas - qcom,sm8550-mpss-pas + - qcom,sm8650-adsp-pas + - qcom,sm8650-cdsp-pas + - qcom,sm8650-mpss-pas reg: maxItems: 1 @@ -49,6 +52,7 @@ properties: - description: Memory region for main Firmware authentication - description: Memory region for Devicetree Firmware authentication - description: DSM Memory region + - description: DSM Memory region 2 required: - compatible @@ -63,6 +67,7 @@ allOf: enum: - qcom,sm8550-adsp-pas - qcom,sm8550-cdsp-pas +- qcom,sm8650-adsp-pas then: properties: interrupts: @@ -71,7 +76,26 @@ allOf: maxItems: 5 memory-region: maxItems: 2 -else: + - if: + properties: +compatible: + enum: +- qcom,sm8650-cdsp-pas +then: + properties: +interrupts: + maxItems: 5 +interrupt-names: + maxItems: 5 +memory-region: + minItems: 3 + maxItems: 3 + - if: + properties: +compatible: + enum: +- qcom,sm8550-mpss-pas +then: properties: interrupts: minItems: 6 @@ -79,12 +103,28 @@ allOf: minItems: 6 memory-region: minItems: 3 + maxItems: 3 + - if: + properties: +compatible: + enum: +- qcom,sm8650-mpss-pas +then: + properties: +interrupts: + minItems: 6 +interrupt-names: + minItems: 6 +memory-region: + minItems: 4 + maxItems: 4 - if: properties: compatible: enum: - qcom,sm8550-adsp-pas +- qcom,sm8650-adsp-pas then: properties: power-domains: @@ -101,6 +141,7 @@ allOf: compatible: enum: - qcom,sm8550-mpss-pas +- qcom,sm8650-mpss-pas then: properties: 
power-domains: @@ -116,6 +157,7 @@ allOf: compatible: enum: - qcom,sm8550-cdsp-pas +- qcom,sm8650-cdsp-pas then: properties: power-domains: -- 2.34.1
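To make the new per-variant constraints concrete, here is a hedged sketch of what a consumer node could look like under this binding. The node name, unit address, and phandle labels are all invented for illustration; only the compatible string and the four-entry memory-region requirement come from the schema above.

```dts
/* Hypothetical example only: addresses and labels are made up. It shows
 * the four memory-region entries the qcom,sm8650-mpss-pas branch of the
 * binding requires (main firmware, dtb firmware, DSM, DSM 2). */
remoteproc@4080000 {
	compatible = "qcom,sm8650-mpss-pas";
	reg = <0x04080000 0x10000>;

	memory-region = <&mpss_mem>, <&q6_mpss_dtb_mem>,
			<&mpss_dsm_mem>, <&mpss_dsm_mem_2>;

	/* ...interrupts, clocks, power-domains per the binding... */
};
```

For the sm8650-cdsp-pas variant the schema instead pins memory-region to exactly three entries, and the sm8550 variants keep their original counts.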
[PATCH v2 3/3] remoteproc: qcom: pas: Add SM8650 remoteproc support
Add DSP Peripheral Authentication Service support for the SM8650 platform.

Reviewed-by: Dmitry Baryshkov
Signed-off-by: Neil Armstrong
---
 drivers/remoteproc/qcom_q6v5_pas.c | 50 ++
 1 file changed, 50 insertions(+)

diff --git a/drivers/remoteproc/qcom_q6v5_pas.c b/drivers/remoteproc/qcom_q6v5_pas.c
index 4829fd26e17d..c593e6d529b3 100644
--- a/drivers/remoteproc/qcom_q6v5_pas.c
+++ b/drivers/remoteproc/qcom_q6v5_pas.c
@@ -1195,6 +1195,53 @@ static const struct adsp_data sm8550_mpss_resource = {
 	.region_assign_vmid = QCOM_SCM_VMID_MSS_MSA,
 };
 
+static const struct adsp_data sm8650_cdsp_resource = {
+	.crash_reason_smem = 601,
+	.firmware_name = "cdsp.mdt",
+	.dtb_firmware_name = "cdsp_dtb.mdt",
+	.pas_id = 18,
+	.dtb_pas_id = 0x25,
+	.minidump_id = 7,
+	.auto_boot = true,
+	.proxy_pd_names = (char*[]){
+		"cx",
+		"mxc",
+		"nsp",
+		NULL
+	},
+	.load_state = "cdsp",
+	.ssr_name = "cdsp",
+	.sysmon_name = "cdsp",
+	.ssctl_id = 0x17,
+	.region_assign_idx = 2,
+	.region_assign_count = 1,
+	.region_assign_shared = true,
+	.region_assign_vmid = QCOM_SCM_VMID_CDSP,
+};
+
+static const struct adsp_data sm8650_mpss_resource = {
+	.crash_reason_smem = 421,
+	.firmware_name = "modem.mdt",
+	.dtb_firmware_name = "modem_dtb.mdt",
+	.pas_id = 4,
+	.dtb_pas_id = 0x26,
+	.minidump_id = 3,
+	.auto_boot = false,
+	.decrypt_shutdown = true,
+	.proxy_pd_names = (char*[]){
+		"cx",
+		"mss",
+		NULL
+	},
+	.load_state = "modem",
+	.ssr_name = "mpss",
+	.sysmon_name = "modem",
+	.ssctl_id = 0x12,
+	.region_assign_idx = 2,
+	.region_assign_count = 2,
+	.region_assign_vmid = QCOM_SCM_VMID_MSS_MSA,
+};
+
 static const struct of_device_id adsp_of_match[] = {
 	{ .compatible = "qcom,msm8226-adsp-pil", .data = &adsp_resource_init},
 	{ .compatible = "qcom,msm8953-adsp-pil", .data = &msm8996_adsp_resource},
@@ -1247,6 +1294,9 @@ static const struct of_device_id adsp_of_match[] = {
 	{ .compatible = "qcom,sm8550-adsp-pas", .data = &sm8550_adsp_resource},
 	{ .compatible = "qcom,sm8550-cdsp-pas", .data = &sm8550_cdsp_resource},
 	{ .compatible = "qcom,sm8550-mpss-pas", .data = &sm8550_mpss_resource},
+	{ .compatible = "qcom,sm8650-adsp-pas", .data = &sm8550_adsp_resource},
+	{ .compatible = "qcom,sm8650-cdsp-pas", .data = &sm8650_cdsp_resource},
+	{ .compatible = "qcom,sm8650-mpss-pas", .data = &sm8650_mpss_resource},
 	{ },
 };
 MODULE_DEVICE_TABLE(of, adsp_of_match);
-- 
2.34.1
Re: [PATCH 1/3] dt-bindings: remoteproc: qcom,sm8550-pas: document the SM8650 PAS
On 30/10/2023 09:29, Neil Armstrong wrote: > Ok, I fixed all that > >> >>> + - if: >>> + properties: >>> +compatible: >>> + enum: >>> +- qcom,sm8650-mpss-pas >>> +then: >> >> I am not sure if keeping it in the same binding as sm8550 avoids that >> much duplication. > > Yes it does, 70% is the bindings would be the same, still if it's still > preferable I can duplicate. > Then let's keep how you propose - in one binding. Best regards, Krzysztof
Re: [PATCH 7/9] arm64: dts: qcom: sc7280: Add CDSP node
On Mon Oct 30, 2023 at 10:04 AM CET, Mukesh Ojha wrote: > > > On 10/27/2023 7:50 PM, Luca Weiss wrote: > > Add the node for the ADSP found on the SC7280 SoC, using standard > > Qualcomm firmware. > > > > The memory region for sc7280-chrome-common.dtsi is taken from msm-5.4 > > yupik.dtsi since the other areas also seem to match that file there, > > though I cannot be sure there. > > > > Signed-off-by: Luca Weiss > > --- > > arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi | 5 + > > arch/arm64/boot/dts/qcom/sc7280.dtsi | 138 > > + > > 2 files changed, 143 insertions(+) > > > > diff --git a/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi > > b/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi > > index eb55616e0892..6e5a9d4c1fda 100644 > > --- a/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi > > +++ b/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi > > @@ -29,6 +29,11 @@ adsp_mem: memory@8670 { > > no-map; > > }; > > > > + cdsp_mem: memory@88f0 { > > + reg = <0x0 0x88f0 0x0 0x1e0>; > > + no-map; > > + }; > > + > > Just a question, why to do it here, if chrome does not use this ? Other memory regions in sc7280.dtsi also get referenced but not actually defined in that file, like mpss_mem and wpss_mem. Alternatively we can also try and solve this differently, but then we should probably also adjust mpss and wpss to be consistent. Apart from either declaring cdsp_mem in sc7280.dtsi or "/delete-property/ memory-region;" for CDSP I don't really have better ideas though. I also imagine these ChromeOS devices will want to enable cdsp at some point but I don't know any plans there. 
Regards Luca > > -Mukesh > > > camera_mem: memory@8ad0 { > > reg = <0x0 0x8ad0 0x0 0x50>; > > no-map; > > diff --git a/arch/arm64/boot/dts/qcom/sc7280.dtsi > > b/arch/arm64/boot/dts/qcom/sc7280.dtsi > > index cc153f4e6979..e15646289bf7 100644 > > --- a/arch/arm64/boot/dts/qcom/sc7280.dtsi > > +++ b/arch/arm64/boot/dts/qcom/sc7280.dtsi > > @@ -3815,6 +3815,144 @@ nsp_noc: interconnect@a0c { > > qcom,bcm-voters = <_bcm_voter>; > > }; > > > > + remoteproc_cdsp: remoteproc@a30 { > > + compatible = "qcom,sc7280-cdsp-pas"; > > + reg = <0 0x0a30 0 0x1>; > > + > > + interrupts-extended = < GIC_SPI 578 > > IRQ_TYPE_LEVEL_HIGH>, > > + <_smp2p_in 0 > > IRQ_TYPE_EDGE_RISING>, > > + <_smp2p_in 1 > > IRQ_TYPE_EDGE_RISING>, > > + <_smp2p_in 2 > > IRQ_TYPE_EDGE_RISING>, > > + <_smp2p_in 3 > > IRQ_TYPE_EDGE_RISING>, > > + <_smp2p_in 7 > > IRQ_TYPE_EDGE_RISING>; > > + interrupt-names = "wdog", "fatal", "ready", "handover", > > + "stop-ack", "shutdown-ack"; > > + > > + clocks = < RPMH_CXO_CLK>; > > + clock-names = "xo"; > > + > > + power-domains = < SC7280_CX>, > > + < SC7280_MX>; > > + power-domain-names = "cx", "mx"; > > + > > + interconnects = <_noc MASTER_CDSP_PROC 0 _virt > > SLAVE_EBI1 0>; > > + > > + memory-region = <_mem>; > > + > > + qcom,qmp = <_qmp>; > > + > > + qcom,smem-states = <_smp2p_out 0>; > > + qcom,smem-state-names = "stop"; > > + > > + status = "disabled"; > > + > > + glink-edge { > > + interrupts-extended = < IPCC_CLIENT_CDSP > > + > > IPCC_MPROC_SIGNAL_GLINK_QMP > > + > > IRQ_TYPE_EDGE_RISING>; > > + mboxes = < IPCC_CLIENT_CDSP > > + IPCC_MPROC_SIGNAL_GLINK_QMP>; > > + > > + label = "cdsp"; > > + qcom,remote-pid = <5>; > > + > > + fastrpc { > > + compatible = "qcom,fastrpc"; > > + qcom,glink-channels = > > "fastrpcglink-apps-dsp"; > > + label = "cdsp"; > > + qcom,non-secure-domain; > > + #address-cells = <1>; > > + #size-cells = <0>; > > + > > + compute-cb@1 { > > + compatible = > > "qcom,fastrpc-compute-cb"; > > +
Re: [PATCH 7/9] arm64: dts: qcom: sc7280: Add CDSP node
On 10/27/2023 7:50 PM, Luca Weiss wrote: Add the node for the ADSP found on the SC7280 SoC, using standard Qualcomm firmware. The memory region for sc7280-chrome-common.dtsi is taken from msm-5.4 yupik.dtsi since the other areas also seem to match that file there, though I cannot be sure there. Signed-off-by: Luca Weiss --- arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi | 5 + arch/arm64/boot/dts/qcom/sc7280.dtsi | 138 + 2 files changed, 143 insertions(+) diff --git a/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi b/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi index eb55616e0892..6e5a9d4c1fda 100644 --- a/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi +++ b/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi @@ -29,6 +29,11 @@ adsp_mem: memory@8670 { no-map; }; + cdsp_mem: memory@88f0 { + reg = <0x0 0x88f0 0x0 0x1e0>; + no-map; + }; + Just a question, why to do it here, if chrome does not use this ? -Mukesh camera_mem: memory@8ad0 { reg = <0x0 0x8ad0 0x0 0x50>; no-map; diff --git a/arch/arm64/boot/dts/qcom/sc7280.dtsi b/arch/arm64/boot/dts/qcom/sc7280.dtsi index cc153f4e6979..e15646289bf7 100644 --- a/arch/arm64/boot/dts/qcom/sc7280.dtsi +++ b/arch/arm64/boot/dts/qcom/sc7280.dtsi @@ -3815,6 +3815,144 @@ nsp_noc: interconnect@a0c { qcom,bcm-voters = <_bcm_voter>; }; + remoteproc_cdsp: remoteproc@a30 { + compatible = "qcom,sc7280-cdsp-pas"; + reg = <0 0x0a30 0 0x1>; + + interrupts-extended = < GIC_SPI 578 IRQ_TYPE_LEVEL_HIGH>, + <_smp2p_in 0 IRQ_TYPE_EDGE_RISING>, + <_smp2p_in 1 IRQ_TYPE_EDGE_RISING>, + <_smp2p_in 2 IRQ_TYPE_EDGE_RISING>, + <_smp2p_in 3 IRQ_TYPE_EDGE_RISING>, + <_smp2p_in 7 IRQ_TYPE_EDGE_RISING>; + interrupt-names = "wdog", "fatal", "ready", "handover", + "stop-ack", "shutdown-ack"; + + clocks = < RPMH_CXO_CLK>; + clock-names = "xo"; + + power-domains = < SC7280_CX>, + < SC7280_MX>; + power-domain-names = "cx", "mx"; + + interconnects = <_noc MASTER_CDSP_PROC 0 _virt SLAVE_EBI1 0>; + + memory-region = <_mem>; + + qcom,qmp = <_qmp>; + 
+ qcom,smem-states = <_smp2p_out 0>; + qcom,smem-state-names = "stop"; + + status = "disabled"; + + glink-edge { + interrupts-extended = < IPCC_CLIENT_CDSP + IPCC_MPROC_SIGNAL_GLINK_QMP + IRQ_TYPE_EDGE_RISING>; + mboxes = < IPCC_CLIENT_CDSP + IPCC_MPROC_SIGNAL_GLINK_QMP>; + + label = "cdsp"; + qcom,remote-pid = <5>; + + fastrpc { + compatible = "qcom,fastrpc"; + qcom,glink-channels = "fastrpcglink-apps-dsp"; + label = "cdsp"; + qcom,non-secure-domain; + #address-cells = <1>; + #size-cells = <0>; + + compute-cb@1 { + compatible = "qcom,fastrpc-compute-cb"; + reg = <1>; + iommus = <_smmu 0x11a1 0x0420>, +<_smmu 0x1181 0x0420>; + }; + + compute-cb@2 { + compatible = "qcom,fastrpc-compute-cb"; + reg = <2>; + iommus = <_smmu 0x11a2 0x0420>, +<_smmu 0x1182 0x0420>; + }; + + compute-cb@3 { + compatible = "qcom,fastrpc-compute-cb"; +
Re: [PATCH v2 2/3] usb: typec: fsa4480: Add support to swap SBU orientation
On Fri, Oct 20, 2023 at 11:33:19AM +0200, Luca Weiss wrote: > On some hardware designs the AUX+/- lanes are connected reversed to > SBU1/2 compared to the expected design by FSA4480. > > Made more complicated, the otherwise compatible Orient-Chip OCP96011 > expects the lanes to be connected reversed compared to FSA4480. > > * FSA4480 block diagram shows AUX+ connected to SBU2 and AUX- to SBU1. > * OCP96011 block diagram shows AUX+ connected to SBU1 and AUX- to SBU2. > > So if OCP96011 is used as drop-in for FSA4480 then the orientation > handling in the driver needs to be reversed to match the expectation of > the OCP96011 hardware. > > Support parsing the data-lanes parameter in the endpoint node to swap > this in the driver. > > The parse_data_lanes_mapping function is mostly taken from nb7vpq904m.c. > > Reviewed-by: Neil Armstrong > Signed-off-by: Luca Weiss Reviewed-by: Heikki Krogerus > --- > drivers/usb/typec/mux/fsa4480.c | 71 > + > 1 file changed, 71 insertions(+) > > diff --git a/drivers/usb/typec/mux/fsa4480.c b/drivers/usb/typec/mux/fsa4480.c > index e0ee1f621abb..cb7cdf90cb0a 100644 > --- a/drivers/usb/typec/mux/fsa4480.c > +++ b/drivers/usb/typec/mux/fsa4480.c > @@ -60,6 +60,7 @@ struct fsa4480 { > unsigned int svid; > > u8 cur_enable; > + bool swap_sbu_lanes; > }; > > static const struct regmap_config fsa4480_regmap_config = { > @@ -76,6 +77,9 @@ static int fsa4480_set(struct fsa4480 *fsa) > u8 enable = FSA4480_ENABLE_DEVICE; > u8 sel = 0; > > + if (fsa->swap_sbu_lanes) > + reverse = !reverse; > + > /* USB Mode */ > if (fsa->mode < TYPEC_STATE_MODAL || > (!fsa->svid && (fsa->mode == TYPEC_MODE_USB2 || > @@ -179,12 +183,75 @@ static int fsa4480_mux_set(struct typec_mux_dev *mux, > struct typec_mux_state *st > return ret; > } > > +enum { > + NORMAL_LANE_MAPPING, > + INVERT_LANE_MAPPING, > +}; > + > +#define DATA_LANES_COUNT 2 > + > +static const int supported_data_lane_mapping[][DATA_LANES_COUNT] = { > + [NORMAL_LANE_MAPPING] = { 0, 1 }, > + 
[INVERT_LANE_MAPPING] = { 1, 0 }, > +}; > + > +static int fsa4480_parse_data_lanes_mapping(struct fsa4480 *fsa) > +{ > + struct fwnode_handle *ep; > + u32 data_lanes[DATA_LANES_COUNT]; > + int ret, i, j; > + > + ep = fwnode_graph_get_next_endpoint(dev_fwnode(>client->dev), > NULL); > + if (!ep) > + return 0; > + > + ret = fwnode_property_read_u32_array(ep, "data-lanes", data_lanes, > DATA_LANES_COUNT); > + if (ret == -EINVAL) > + /* Property isn't here, consider default mapping */ > + goto out_done; > + if (ret) { > + dev_err(>client->dev, "invalid data-lanes property: %d\n", > ret); > + goto out_error; > + } > + > + for (i = 0; i < ARRAY_SIZE(supported_data_lane_mapping); i++) { > + for (j = 0; j < DATA_LANES_COUNT; j++) { > + if (data_lanes[j] != supported_data_lane_mapping[i][j]) > + break; > + } > + > + if (j == DATA_LANES_COUNT) > + break; > + } > + > + switch (i) { > + case NORMAL_LANE_MAPPING: > + break; > + case INVERT_LANE_MAPPING: > + fsa->swap_sbu_lanes = true; > + break; > + default: > + dev_err(>client->dev, "invalid data-lanes mapping\n"); > + ret = -EINVAL; > + goto out_error; > + } > + > +out_done: > + ret = 0; > + > +out_error: > + fwnode_handle_put(ep); > + > + return ret; > +} > + > static int fsa4480_probe(struct i2c_client *client) > { > struct device *dev = >dev; > struct typec_switch_desc sw_desc = { }; > struct typec_mux_desc mux_desc = { }; > struct fsa4480 *fsa; > + int ret; > > fsa = devm_kzalloc(dev, sizeof(*fsa), GFP_KERNEL); > if (!fsa) > @@ -193,6 +260,10 @@ static int fsa4480_probe(struct i2c_client *client) > fsa->client = client; > mutex_init(>lock); > > + ret = fsa4480_parse_data_lanes_mapping(fsa); > + if (ret) > + return ret; > + > fsa->regmap = devm_regmap_init_i2c(client, _regmap_config); > if (IS_ERR(fsa->regmap)) > return dev_err_probe(dev, PTR_ERR(fsa->regmap), "failed to > initialize regmap\n"); > > -- > 2.42.0 -- heikki
Re: [PATCH 1/3] dt-bindings: remoteproc: qcom,sm8550-pas: document the SM8650 PAS
On 27/10/2023 09:36, Krzysztof Kozlowski wrote: On 25/10/2023 09:35, Neil Armstrong wrote: Document the DSP Peripheral Authentication Service on the SM8650 Platform. Signed-off-by: Neil Armstrong --- .../bindings/remoteproc/qcom,sm8550-pas.yaml | 41 +- 1 file changed, 40 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml b/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml index 58120829fb06..316371c8ee6e 100644 --- a/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml +++ b/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml @@ -19,6 +19,9 @@ properties: - qcom,sm8550-adsp-pas - qcom,sm8550-cdsp-pas - qcom,sm8550-mpss-pas + - qcom,sm8650-adsp-pas + - qcom,sm8650-cdsp-pas + - qcom,sm8650-mpss-pas reg: maxItems: 1 @@ -49,6 +52,7 @@ properties: - description: Memory region for main Firmware authentication - description: Memory region for Devicetree Firmware authentication - description: DSM Memory region + - description: DSM Memory region 2 required: - compatible @@ -63,6 +67,7 @@ allOf: enum: - qcom,sm8550-adsp-pas - qcom,sm8550-cdsp-pas +- qcom,sm8650-adsp-pas then: properties: interrupts: @@ -71,7 +76,25 @@ allOf: maxItems: 5 memory-region: maxItems: 2 -else: + - if: + properties: +compatible: + enum: +- qcom,sm8650-cdsp-pas +then: + properties: +interrupts: + minItems: 5 maxItems +interrupt-names: + minItems: 5 maxItems +memory-region: + minItems: 3 maxItems: 3 + - if: + properties: +compatible: + enum: +- qcom,sm8550-mpss-pas +then: properties: interrupts: minItems: 6 @@ -79,12 +102,26 @@ allOf: minItems: 6 memory-region: minItems: 3 You need to add here maxItems. Ok, I fixed all that + - if: + properties: +compatible: + enum: +- qcom,sm8650-mpss-pas +then: I am not sure if keeping it in the same binding as sm8550 avoids that much duplication. Yes it does, 70% is the bindings would be the same, still if it's still preferable I can duplicate. 
Thanks, Neil Best regards, Krzysztof
Re: [PATCH v2] bus: mhi: host: Add tracing support
On 10/27/2023 8:59 PM, Jeffrey Hugo wrote:
On 10/23/2023 1:11 AM, Krishna Chaitanya Chundru wrote:
On 10/20/2023 8:33 PM, Jeffrey Hugo wrote:
On 10/13/2023 3:52 AM, Krishna chaitanya chundru wrote:

This change adds ftrace support for following functions which helps in debugging the issues when there is Channel state & MHI state change and also when we receive data and control events:

1. mhi_intvec_threaded_handler
2. mhi_process_data_event_ring
3. mhi_process_ctrl_ev_ring
4. mhi_gen_tre
5. mhi_update_channel_state
6. mhi_tryset_pm_state
7. mhi_pm_st_worker

Signed-off-by: Krishna chaitanya chundru
---
Changes in v2:
- Passing the raw state into the trace event and using __print_symbolic() as suggested by bjorn.
- Change mhi_pm_st_worker to mhi_pm_st_transition as suggested by bjorn.
- Fixed the kernel test robot issues.
- Link to v1: https://lore.kernel.org/r/20231005-ftrace_support-v1-1-23a2f394f...@quicinc.com
---
 MAINTAINERS                     |   1 +
 drivers/bus/mhi/host/init.c     |   3 +
 drivers/bus/mhi/host/internal.h |   1 +
 drivers/bus/mhi/host/main.c     |  32 +++--
 drivers/bus/mhi/host/pm.c       |   6 +-
 include/trace/events/mhi_host.h | 287 
 6 files changed, 317 insertions(+), 13 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 35977b269d5e..4339c668a6ab 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13862,6 +13862,7 @@ F:	Documentation/mhi/
 F:	drivers/bus/mhi/
 F:	drivers/pci/endpoint/functions/pci-epf-mhi.c
 F:	include/linux/mhi.h
+F:	include/trace/events/mhi_host.h
 
 MICROBLAZE ARCHITECTURE
 M:	Michal Simek
diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c
index f78aefd2d7a3..3afa90a204fd 100644
--- a/drivers/bus/mhi/host/init.c
+++ b/drivers/bus/mhi/host/init.c
@@ -20,6 +20,9 @@
 #include "internal.h"
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/mhi_host.h>

This feels redundant to me. A few lines ago we included internal.h, and internal.h includes trace/events/mhi_host.h

As Steve mentioned, this is a mandatory step for creating trace points & trace events.

I understand this creates the trace points, and that needs to be done in C code. It still seems redundant because we are including the header twice (and I am aware trace has the special multi-header read functionality for this). The duplicate include still feels weird, but I have not come up with a better way to structure this.

We will use this way for now then, we will check in parallel if there is a way to avoid this and change it in the future.

+
 static DEFINE_IDA(mhi_controller_ida);
 
 const char * const mhi_ee_str[MHI_EE_MAX] = {
diff --git a/drivers/bus/mhi/host/internal.h b/drivers/bus/mhi/host/internal.h
index 2e139e76de4c..a80a317a59a9 100644
--- a/drivers/bus/mhi/host/internal.h
+++ b/drivers/bus/mhi/host/internal.h
@@ -7,6 +7,7 @@
 #ifndef _MHI_INT_H
 #define _MHI_INT_H
 
+#include <trace/events/mhi_host.h>
 #include "../common.h"
 
 extern struct bus_type mhi_bus_type;
diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c
index dcf627b36e82..fcdb728ba49f 100644
--- a/drivers/bus/mhi/host/main.c
+++ b/drivers/bus/mhi/host/main.c
@@ -246,6 +246,11 @@ static void *mhi_to_virtual(struct mhi_ring *ring, dma_addr_t addr)
 	return (addr - ring->iommu_base) + ring->base;
 }
 
+dma_addr_t mhi_to_physical(struct mhi_ring *ring, void *addr)
+{
+	return (addr - ring->base) + ring->iommu_base;
+}

This seems to be poorly named since we are using the iommu_base which suggests we are converting to an IOVA. Why do we need this though? This seems like it might be a security issue, or at the very least, not preferred, and I'm struggling to figure out what value this provides to you or me when looking at the log.

I will rename the function to reflect that it is converting to IOVA. In the MHI TRE we write the IOVA address; to correlate between TRE events in the MHI ring and the event we are processing we want to log the IOVA address. As we are logging only the IOVA address which is provided in the devicetree and not the original physical address we are not expecting any security issues here.
Correct me if I was wrong. The IOVA is not provided by DT, it is a runtime allocated value provided by the IOMMU, if present. If not present, then it is a physical address. Remember, x86 does not use devicetree. While the IOVA (with an iommu) is not technically a physical address, but is treated as such by the device. I can imagine an attacker doing bad things if they get a hold of the value. Sure we will remove it. Still, you haven't indicated why this is useful. The TRE ring elements has address in the IOVA format when we want to correlate the address with the TRE elements in the dumps it will easier with this way. Anyway we will not expose this as you suggested as it might expose physical address in some platforms. + static void mhi_add_ring_element(struct
Re: [PATCH v2] eventfs: Test for ei->is_freed when accessing ei->dentry
Hi Steven,

> Are you sure it was applied correctly?

Please ignore the build warnings / errors; they were caused by the patch
not applying successfully.

> Perhaps check out the branch I have and let me know if you get the same
> errors.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git trace/core

I have tested linux-trace.git trace/core and run the ftrace selftests. The
reported kernel panics [1] & [2] have been fixed, but a new "general
protection fault" was found at kernel/trace/trace_events.c:2439.

[1] https://lore.kernel.org/all/CA+G9fYt6pY+tMZEOg=soeywqoe19fgp3ur15sgowkdk+_x8...@mail.gmail.com/
[2] https://lore.kernel.org/linux-trace-kernel/20231028164650.4f5ea...@rorschach.local.home/

Reported-by: Linux Kernel Functional Testing

Test log:
==
kselftest: Running tests in ftrace
TAP version 13
1..1
# timeout set to 0
# selftests: ftrace: ftracetest-ktap
...
# ok 46 Test creation and deletion of trace instances while setting an event
[  332.783872] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bcb: [#1] PREEMPT SMP PTI
[  332.794585] CPU: 1 PID: 5165 Comm: ls Not tainted 6.6.0-rc4 #1
[  332.800429] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.7 12/07/2021
[  332.807820] RIP: 0010:event_callback (kernel/trace/trace_events.c:2439)
[  332.812353] Code: f6 48 c7 c6 09 78 6b a0 41 55 49 89 cd 41 54 49 89 d4 53 48 8b 02 48 89 fb 4c 8b 78 10 e8 10 8b 19 01 85 c0 0f 84 d2 00 00 00 <41> 8b 47 60 a8 08 0f 85 9c 00 00 00 49 8b 47 10 48 83 78 18 00 74

All code
========
   0:	f6 48 c7 c6          	testb  $0xc6,-0x39(%rax)
   4:	09 78 6b             	or     %edi,0x6b(%rax)
   7:	a0 41 55 49 89 cd 41 	movabs 0x495441cd89495541,%al
   e:	54 49
  10:	89 d4                	mov    %edx,%esp
  12:	53                   	push   %rbx
  13:	48 8b 02             	mov    (%rdx),%rax
  16:	48 89 fb             	mov    %rdi,%rbx
  19:	4c 8b 78 10          	mov    0x10(%rax),%r15
  1d:	e8 10 8b 19 01       	callq  0x1198b32
  22:	85 c0                	test   %eax,%eax
  24:	0f 84 d2 00 00 00    	je     0xfc
  2a:*	41 8b 47 60          	mov    0x60(%r15),%eax		<-- trapping instruction
  2e:	a8 08                	test   $0x8,%al
  30:	0f 85 9c 00 00 00    	jne    0xd2
  36:	49 8b 47 10          	mov    0x10(%r15),%rax
  3a:	48 83 78 18 00       	cmpq   $0x0,0x18(%rax)
  3f:	74                   	.byte 0x74

Code starting with the faulting instruction
===========================================
   0:	41 8b 47 60          	mov    0x60(%r15),%eax
   4:	a8 08                	test   $0x8,%al
   6:	0f 85 9c 00 00 00    	jne    0xa8
   c:	49 8b 47 10          	mov    0x10(%r15),%rax
  10:	48 83 78 18 00       	cmpq   $0x0,0x18(%rax)
  15:	74                   	.byte 0x74

[  332.831089] RSP: 0018:ba5700967b50 EFLAGS: 00010202
[  332.836315] RAX: 0001 RBX: a06fb5d7 RCX: ba5700967bc8
[  332.843439] RDX: 0069 RSI: a06b7809 RDI: a06fb5d7
[  332.850563] RBP: ba5700967b78 R08: 8dbe12250134 R09:
[  332.857722] R10: 8dbe122500e0 R11: 8dbe0d8f8fc8 R12: ba5700967bd0
[  332.864846] R13: ba5700967bc8 R14: ba5700967bc6 R15: 6b6b6b6b6b6b6b6b
[  332.871971] FS: 7fc5a5062d00() GS:8dc167a8() knlGS:
[  332.880057] CS: 0010 DS: ES: CR0: 80050033
[  332.885801] CR2: 7f7f8dc50a1c CR3: 000113c84003 CR4: 003706e0
[  332.892928] DR0: DR1: DR2:
[  332.900058] DR3: DR6: fffe0ff0 DR7: 0400
[  332.907183] Call Trace:
[  332.909658]  <TASK>
[  332.911760]  ? show_regs (arch/x86/kernel/dumpstack.c:479)
[  332.915166]  ? die_addr (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:460)
[  332.918486]  ? exc_general_protection (arch/x86/kernel/traps.c:697 arch/x86/kernel/traps.c:642)
[  332.923191]  ? asm_exc_general_protection (arch/x86/include/asm/idtentry.h:564)
[  332.928073]  ? event_callback (kernel/trace/trace_events.c:2439)
[  332.931996]  dcache_dir_open_wrapper (fs/tracefs/event_inode.c:745)
[  332.936636]  ? __pfx_dcache_dir_open_wrapper (fs/tracefs/event_inode.c:683)
[  332.941772]  do_dentry_open (fs/open.c:929)
[  332.945646]  vfs_open (fs/open.c:1064)
[  332.948800]  path_openat (fs/namei.c:3639 fs/namei.c:3796)
[  332.952465]  ? ___slab_alloc (mm/slub.c:810 mm/slub.c:3265)
[  332.956384]  ? nfs_ctx_key_to_expire (fs/nfs/write.c:1281)
[  332.960917]  ? trace_preempt_on (kernel/trace/trace_preemptirq.c:105)
[  332.964929]  do_filp_open (fs/namei.c:3823)
[  332.968511]  do_sys_openat2 (fs/open.c:1422)
[  332.972173]  __x64_sys_openat (fs/open.c:1448)
[  332.976014]  do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[  332.979601]  ? trace_hardirqs_on_prepare (kernel/trace/trace_preemptirq.c:47 (discriminator 16) kernel/trace/trace_preemptirq.c:42 (discriminator 16))
[  332.984436]  ? syscall_exit_to_user_mode (kernel/entry/common.c:299)
[  332.989219]  ?
Re: [PATCH v3 04/13] mm/execmem, arch: convert remaining overrides of module_alloc to execmem
On Thu, Oct 26, 2023 at 11:24:39AM +0100, Will Deacon wrote:
> On Thu, Oct 26, 2023 at 11:58:00AM +0300, Mike Rapoport wrote:
> > On Mon, Oct 23, 2023 at 06:14:20PM +0100, Will Deacon wrote:
> > > On Mon, Sep 18, 2023 at 10:29:46AM +0300, Mike Rapoport wrote:
> > > > diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
> > > > index dd851297596e..cd6320de1c54 100644
> > > > --- a/arch/arm64/kernel/module.c
> > > > +++ b/arch/arm64/kernel/module.c

...

> > > > -	if (module_direct_base) {
> > > > -		p = __vmalloc_node_range(size, MODULE_ALIGN,
> > > > -					 module_direct_base,
> > > > -					 module_direct_base + SZ_128M,
> > > > -					 GFP_KERNEL | __GFP_NOWARN,
> > > > -					 PAGE_KERNEL, 0, NUMA_NO_NODE,
> > > > -					 __builtin_return_address(0));
> > > > -	}
> > > > +	module_init_limits();
> > >
> > > Hmm, this used to be run from subsys_initcall(), but now you're running
> > > it _really_ early, before random_init(), so randomization of the module
> > > space is no longer going to be very random if we don't have early entropy
> > > from the firmware or the CPU, which is likely to be the case on most SoCs.
> >
> > Well, it will be as random as KASLR. Won't that be enough?
>
> I don't think that's true -- we have the 'kaslr-seed' property for KASLR,
> but I'm not seeing anything like that for the module randomisation and I
> also don't see why we need to set these limits so early.

x86 needs execmem initialized before ftrace_init(), so I thought it would
be best to set up execmem along with most of MM in mm_core_init().

I'll move execmem initialization for !x86 to a later point, say
core_initcall.

> Will

--
Sincerely yours,
Mike.
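The deferral Mike proposes above would look roughly like the sketch below. This is illustrative only, not code from the series: `execmem_late_init` is a hypothetical name, and `execmem_init()` stands in for whatever setup hook the series uses. The point is that a core_initcall runs after random_init(), so the module-area randomization can draw on real entropy.

```c
/*
 * Sketch only (not from the patch series): on architectures that do not
 * need execmem before ftrace_init(), defer the setup to a core_initcall
 * instead of calling it from mm_core_init().
 */
static int __init execmem_late_init(void)
{
	execmem_init();		/* assumed setup hook; name illustrative */
	return 0;
}
core_initcall(execmem_late_init);
```

x86 would keep the early mm_core_init() path, since it needs execmem ready before ftrace_init().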